

Speed up your Spark Scripts and overcome errors

What you will learn

The three main causes of performance issues in Apache Spark

How to overcome shuffle-induced performance issues in Apache Spark

How to overcome skew-induced performance issues in Apache Spark

How to overcome spill-induced performance issues in Apache Spark

Description

Spark is a powerful framework for processing large datasets in parallel. But with its complex architecture come frequent performance issues.

In my experience, it can be frustrating to search everywhere for an online resource that explains the inner workings of Spark, and how to address these issues, clearly enough that you fully understand them. So, I created this course!

This is not a code-along course; it assumes you already know how to code in Spark. Instead, it focuses on how to resolve the performance issues you encounter during your development journey! We will walk through all of the theory, and you'll come away with actionable steps to resolve your performance issues.


In this course, we will cover:

  • The Apache Spark Architecture
  • The types of deployment modes in Apache Spark
  • The structure of jobs in Apache Spark
  • How to handle the three main performance concerns in Spark

If you don’t yet know how to code in Spark, you can join my 60-minute crash course in PySpark, here on Udemy.

Let’s get to work understanding why your scripts are not performing as you hoped, and resolving your performance issues together. Shuffle, skew and spill will be concerns of the past after this course!

Language: English

Content

Apache Spark Performance Optimization

Introduction
Spark Architecture
Spark Performance & Config Changes Article
Deployment Modes in Spark
Reviewing Cluster vs Client Deployment Modes
Jobs, Stages & Tasks in Spark
Introduction to Performance Concerns in Spark
What is Shuffle?
Further Insight into Shuffle
How do we identify Shuffle?
Resolve Shuffle: Broadcast Joins
Resolve Shuffle: ReduceBy()
Resolve Shuffle: Config
What is Skew?
More About Skew
How to Identify Skew
How to Resolve Skew
Coalesce Vs Repartitioning Article
What is Spill?
How to Prevent Spill
Wrapping up!