Spark Batch Processing - Caching, UDFs, DataFrames, Datasets, SparkSQL, and Parquet (Day 2 Lecture)

Week 4: Batch Pipelines with Apache Spark V2
54 mins
SQL, Data Modeling, ETL/ELT, Apache Spark

Description

In this lecture, we dive deeper into Spark, covering caching as an optimization, temporary views, UDFs, the trade-offs between the DataFrame, Dataset, and SparkSQL APIs, the Parquet file format, and tuning considerations.