Spark Batch Processing - Data Partitioning, Performance Optimization, and Iceberg Tables (Day 1 Lab)

Week 4: Batch Pipelines with Apache Spark V2
45 mins
Python, Apache Spark, Apache Iceberg

Description

In this lab video, the presenter demonstrates how to run Spark code and explore data partitioning. Key topics include running cells, monitoring the kernel's status, troubleshooting, the importance of using 'collect', the effects of terminating Spark sessions, and the differences between a global sort and a partition-level sort, including their impact on performance. The video also covers writing data to Iceberg tables and analyzing dataset sizes.
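
The difference between a global sort and a partition sort can be sketched with a small plain-Python model. This is not the lab's code and it does not use Spark: the partitions are ordinary lists, the values are made up, and the helper names (`partition_sort`, `global_sort`, `flatten`) are hypothetical. In PySpark the corresponding DataFrame methods are `sort` (global) and `sortWithinPartitions` (per partition).

```python
# Plain-Python model of the two sorting strategies (illustrative only, not Spark code).
from typing import List

Partitions = List[List[int]]


def partition_sort(partitions: Partitions) -> Partitions:
    """Sort each partition on its own; no rows move between partitions."""
    return [sorted(p) for p in partitions]


def global_sort(partitions: Partitions) -> Partitions:
    """Model a global sort: rows are redistributed so that each partition
    holds a contiguous range of values. In Spark this redistribution is a
    shuffle; here it is simplified to sorting all rows and slicing them."""
    rows = sorted(row for p in partitions for row in p)
    n = len(partitions)
    size = -(-len(rows) // n)  # ceiling division: rows per output partition
    return [rows[i * size:(i + 1) * size] for i in range(n)]


def flatten(partitions: Partitions) -> List[int]:
    """Read the partitions back in order, as one list."""
    return [row for p in partitions for row in p]


# Made-up data: three partitions of three rows each.
partitions = [[9, 2, 7], [4, 8, 1], [6, 3, 5]]

locally_sorted = partition_sort(partitions)
globally_sorted = global_sort(partitions)

print(locally_sorted)   # [[2, 7, 9], [1, 4, 8], [3, 5, 6]]
print(globally_sorted)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

In this model the partition sort leaves every row in its original partition, so reading the partitions back in order does not give a fully ordered result. The global sort does give a total order, but only by moving rows between partitions, and in Spark that data movement (the shuffle) is typically the expensive step.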