# spark-scala-examples

**Repository Path**: doubo151/spark-scala-examples

## Basic Information

- **Project Name**: spark-scala-examples
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-03-13
- **Last Updated**: 2023-05-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Explanations of all the Spark SQL, RDD, DataFrame, and Dataset examples in this project are available at https://sparkbyexamples.com/. All of these examples are written in Scala and tested in our development environment.

# Table of Contents (Spark Examples in Scala)

## Spark RDD Examples

- [Create a Spark RDD using Parallelize](https://sparkbyexamples.com/apache-spark-rdd/how-to-create-an-rdd-using-parallelize/)
- [Spark – Read multiple text files into single RDD?](https://sparkbyexamples.com/apache-spark-rdd/spark-read-multiple-text-files-into-a-single-rdd/)
- [Spark load CSV file into RDD](https://sparkbyexamples.com/apache-spark-rdd/spark-load-csv-file-into-rdd/)
- [Different ways to create Spark RDD](https://sparkbyexamples.com/apache-spark-rdd/different-ways-to-create-spark-rdd/)
- [Spark – How to create an empty RDD?](https://sparkbyexamples.com/apache-spark-rdd/spark-how-to-create-an-empty-rdd/)
- [Spark RDD Transformations with examples](https://sparkbyexamples.com/apache-spark-rdd/spark-rdd-transformations/)
- [Spark RDD Actions with examples](https://sparkbyexamples.com/apache-spark-rdd/spark-rdd-actions/)
- [Spark Pair RDD Functions](https://sparkbyexamples.com/apache-spark-rdd/spark-pair-rdd-functions/)
- [Spark Repartition() vs Coalesce()](https://sparkbyexamples.com/spark/spark-repartition-vs-coalesce/)
- [Spark Shuffle Partitions](https://sparkbyexamples.com/spark/spark-shuffle-partitions/)
- [Spark Persistence Storage Levels](https://sparkbyexamples.com/spark/spark-persistence-storage-levels/)
- [Spark RDD Cache and Persist with Example](https://sparkbyexamples.com/apache-spark-rdd/spark-rdd-cache-and-persist-example/)
- [Spark Broadcast Variables](https://sparkbyexamples.com/spark/spark-broadcast-variables/)
- [Spark Accumulators Explained](https://sparkbyexamples.com/spark/spark-accumulators/)
- [Convert Spark RDD to DataFrame | Dataset](https://sparkbyexamples.com/apache-spark-rdd/convert-spark-rdd-to-dataframe-dataset/)

## Spark SQL Tutorial

- [Spark Create DataFrame with Examples](https://sparkbyexamples.com/spark/different-ways-to-create-a-spark-dataframe/)
- [Spark DataFrame withColumn](https://sparkbyexamples.com/spark/spark-dataframe-withcolumn/)
- [Ways to Rename column on Spark DataFrame](https://sparkbyexamples.com/spark/rename-a-column-on-spark-dataframes/)
- [Spark – How to Drop a DataFrame/Dataset column](https://sparkbyexamples.com/spark/spark-drop-column-from-dataframe-dataset/)
- [Working with Spark DataFrame Where Filter](https://sparkbyexamples.com/spark/spark-dataframe-where-filter/)
- [Spark SQL “case when” and “when otherwise”](https://sparkbyexamples.com/spark/spark-case-when-otherwise-example/)
- [Collect() – Retrieve data from Spark RDD/DataFrame](https://sparkbyexamples.com/spark/spark-dataframe-collect/)
- [Spark – How to remove duplicate rows](https://sparkbyexamples.com/spark/spark-remove-duplicate-rows/)
- [How to Pivot and Unpivot a Spark DataFrame](https://sparkbyexamples.com/spark/how-to-pivot-table-and-unpivot-a-spark-dataframe/)
- [Spark SQL Data Types with Examples](https://sparkbyexamples.com/spark/spark-sql-dataframe-data-types/)
- [Spark SQL StructType & StructField with examples](https://sparkbyexamples.com/spark/spark-sql-structtype-on-dataframe/)
- [Spark schema – explained with examples](https://sparkbyexamples.com/spark/spark-schema-explained-with-examples/)
- [Spark Groupby Example with DataFrame](https://sparkbyexamples.com/spark/using-groupby-on-dataframe/)
- [Spark – How to Sort DataFrame column explained](https://sparkbyexamples.com/spark/spark-how-to-sort-dataframe-column-explained/)
- [Spark SQL Join Types with examples](https://sparkbyexamples.com/spark/spark-sql-dataframe-join/)
- [Spark DataFrame Union and UnionAll](https://sparkbyexamples.com/spark/spark-dataframe-union-and-union-all/)
- [Spark map vs mapPartitions transformation](https://sparkbyexamples.com/spark/spark-map-vs-mappartitions-transformation/)
- [Spark foreachPartition vs foreach | what to use?](https://sparkbyexamples.com/spark/spark-foreachpartition-vs-foreach-explained/)
- [Spark DataFrame Cache and Persist Explained](https://sparkbyexamples.com/spark/spark-dataframe-cache-and-persist-explained/)
- [Spark SQL UDF (User Defined Functions)](https://sparkbyexamples.com/spark/spark-sql-udf/)
- [Spark SQL DataFrame Array (ArrayType) Column](https://sparkbyexamples.com/spark/spark-array-arraytype-dataframe-column/)
- [Working with Spark DataFrame Map (MapType) column](https://sparkbyexamples.com/spark/spark-dataframe-map-maptype-column/)
- [Spark SQL – Flatten Nested Struct column](https://sparkbyexamples.com/spark/spark-flatten-nested-struct-column/)
- [Spark – Flatten nested array to single array column](https://sparkbyexamples.com/spark/spark-flatten-nested-array-column-to-single-column/)
- [Spark explode array and map columns to rows](https://sparkbyexamples.com/spark/explode-spark-array-and-map-dataframe-column/)

## Spark SQL Functions

- [Spark SQL String Functions Explained](https://sparkbyexamples.com/spark/usage-of-spark-sql-string-functions/)
- [Spark SQL Date and Time Functions](https://sparkbyexamples.com/spark/spark-sql-date-and-time-functions/)
- [Spark SQL Array functions – complete list](https://sparkbyexamples.com/spark/spark-sql-array-functions/)
- [Spark SQL Map functions – complete list](https://sparkbyexamples.com/spark/spark-sql-map-functions/)
- [Spark SQL Sort functions – complete list](https://sparkbyexamples.com/spark/spark-sql-sort-functions/)
- [Spark SQL Aggregate Functions](https://sparkbyexamples.com/spark/spark-sql-aggregate-functions/)
- [Spark Window Functions with Examples](https://sparkbyexamples.com/spark/spark-sql-window-functions/)

## Spark Data Source API

- [Spark Read CSV file into DataFrame](https://sparkbyexamples.com/spark/spark-read-csv-file-into-dataframe/)
- [Spark Read and Write JSON file into DataFrame](https://sparkbyexamples.com/spark/spark-read-and-write-json-file/)
- [Spark Read and Write Apache Parquet](https://sparkbyexamples.com/spark/spark-read-write-dataframe-parquet-example/)
- [Spark Read XML file using Databricks API](https://sparkbyexamples.com/spark/spark-read-write-xml/)
- [Read & Write Avro files using Spark DataFrame](https://sparkbyexamples.com/spark/read-write-avro-file-spark-dataframe/)
- [Using Avro Data Files From Spark SQL 2.3.x or earlier](https://sparkbyexamples.com/spark/using-avro-data-files-from-spark-sql-2-3-x/)
- [Spark Read from & Write to HBase table | Example](https://sparkbyexamples.com/spark/spark-read-write-using-hbase-spark-connector/)
- [Create Spark DataFrame from HBase using Hortonworks](https://sparkbyexamples.com/spark/create-spark-dataframe-from-hbase-using-hortonworks/)
- [Spark Read ORC file into DataFrame](https://sparkbyexamples.com/spark/spark-read-orc-file-into-dataframe/)
- [Spark 3.0 Read Binary File into DataFrame](https://sparkbyexamples.com/spark/spark-read-binary-file-into-dataframe/)

## Spark Streaming & Kafka

- [Spark Streaming – Different Output modes explained](https://sparkbyexamples.com/spark/spark-streaming-outputmode/)
- [Spark Streaming files from a directory](https://sparkbyexamples.com/spark/spark-streaming-read-json-files-from-directory/)
- [Spark Streaming – Reading data from TCP Socket](https://sparkbyexamples.com/spark/spark-streaming-from-tcp-socket/)
- [Spark Streaming with Kafka Example](https://sparkbyexamples.com/spark/spark-streaming-with-kafka/)
- [Spark Streaming – Kafka messages in Avro format](https://sparkbyexamples.com/spark/spark-streaming-consume-and-produce-kafka-messages-in-avro-format/)
- [Spark SQL Batch Processing – Produce and Consume Apache Kafka Topic](https://sparkbyexamples.com/spark/spark-batch-processing-produce-consume-kafka-topic/)
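## Quick Taste

As a taste of the RDD topics linked above (`parallelize`, transformations, actions), here is a minimal sketch, not taken from this repository, assuming a local `SparkSession` with the `spark-sql` dependency on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object RddSketch extends App {
  // local[*] runs Spark on all local cores; app name is arbitrary.
  val spark = SparkSession.builder().master("local[*]").appName("RddSketch").getOrCreate()
  val sc = spark.sparkContext

  // Create an RDD from a local collection (see "Create a Spark RDD using Parallelize").
  val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))

  // A transformation (lazy) followed by an action (triggers execution).
  val doubled = rdd.map(_ * 2)
  println(doubled.reduce(_ + _)) // sum of the doubled values

  spark.stop()
}
```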
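The DataFrame articles in the Spark SQL Tutorial section follow the same pattern: build a small DataFrame, derive columns, and aggregate. A minimal sketch (the sample rows and column names here are hypothetical placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataFrameSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("DataFrameSketch").getOrCreate()
  import spark.implicits._ // enables toDF on local Seqs

  // Hypothetical sample data, similar in shape to the linked articles' datasets.
  val df = Seq(("James", "Sales", 3000), ("Anna", "Sales", 4600), ("Robert", "IT", 4100))
    .toDF("name", "dept", "salary")

  // withColumn adds a derived column; groupBy + agg summarizes per department
  // (see "Spark DataFrame withColumn" and "Spark Groupby Example with DataFrame").
  df.withColumn("bonus", col("salary") * 0.1)
    .groupBy("dept")
    .agg(sum("salary").as("total_salary"))
    .show()

  spark.stop()
}
```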
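For the Data Source API section, reading and writing files reduces to `spark.read` / `df.write` with format-specific options. A sketch assuming a hypothetical `people.csv` input path:

```scala
import org.apache.spark.sql.SparkSession

object CsvSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("CsvSketch").getOrCreate()

  // "people.csv" is a placeholder path; header/inferSchema are the common
  // options shown in "Spark Read CSV file into DataFrame".
  val df = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("people.csv")

  df.printSchema()

  // Write the same data back out as Parquet (see "Spark Read and Write Apache Parquet").
  df.write.mode("overwrite").parquet("people.parquet")

  spark.stop()
}
```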
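The Spark Streaming & Kafka articles build on Structured Streaming's `readStream`/`writeStream` API. A sketch of the TCP-socket word count (start a server with `nc -lk 9999` first); host, port, and output mode here are illustrative choices:

```scala
import org.apache.spark.sql.SparkSession

object SocketStreamSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("SocketStreamSketch").getOrCreate()
  import spark.implicits._

  // Read lines from a TCP socket (see "Spark Streaming – Reading data from TCP Socket").
  val lines = spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()

  // Split lines into words and count occurrences of each word.
  val counts = lines.as[String]
    .flatMap(_.split(" "))
    .groupBy("value")
    .count()

  // "complete" mode re-emits the full aggregate on every trigger
  // (see "Spark Streaming – Different Output modes explained").
  val query = counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()

  query.awaitTermination()
}
```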