Spark Batch Processing - Caching, UDFs, DataFrames, Datasets, SparkSQL, and Parquet (Day 2 Lecture)
Data Engineering Boot Camp V2 Combined Track
Week 4: Batch Pipelines with Apache Spark V2
54 mins
SQL
Data Modeling
ETL/ELT
Apache Spark
Description
In this lecture, we dive deeper into Spark, focusing on optimization: caching, temporary views, UDFs, the trade-offs among the DataFrame, Dataset, and SparkSQL APIs, Parquet, and tuning considerations.