Learn all the fundamentals of PySpark
What you will learn
PySpark, Apache Spark, Big Data Analytics, Big Data Processing, Python
Description
Spark is one of the most in-demand Big Data processing frameworks right now.
This course will take you through the core concepts of PySpark. We will work to enable you to do most of the things you'd do in SQL or in Python's pandas library, that is:
- Getting hold of data
- Handling missing data and cleaning data up
- Aggregating your data
- Filtering it
- Pivoting it
- And writing it back
All of these skills will enable you to leverage Spark on large datasets and start getting value from your data; a brief sketch of what these steps look like in PySpark follows below.
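A minimal sketch of the operations listed above, assuming a hypothetical CSV file sales.csv with columns region, product, and amount (these names are invented for illustration and are not part of the course material):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-basics").getOrCreate()

# Getting hold of data: read a CSV into a DataFrame (hypothetical file)
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Handling missing data and cleaning it up
df = df.dropDuplicates().na.drop(subset=["amount"])

# Filtering it
df = df.filter(F.col("amount") > 0)

# Aggregating your data
totals = df.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# Pivoting it: one column per product, summed amounts per region
pivoted = df.groupBy("region").pivot("product").sum("amount")

# Writing it back
pivoted.write.mode("overwrite").parquet("sales_by_region_and_product")

spark.stop()
```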
Let's get started.
Language: English
Content
Introduction
Introduction
How is this course structured?
A Scenario To Get Us Started
Introduction to our development environment
Introduction to our dataset & dataframes
Environment configuration code snippet
Ingesting & Cleaning Data
Answering our scenario questions
Core Concepts
Bringing data into dataframes
Inspecting A Dataframe
Handling Null & Duplicate Values
Selecting & Filtering Data
Applying Multiple Filters
Running SQL on Dataframes
Adding Calculated Columns
Group By And Aggregation
Writing Dataframe To Files
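As a taste of the Core Concepts topics listed above, here is a small, self-contained sketch of two of them, running SQL on DataFrames and adding calculated columns; the data, view name, and column names are invented for illustration and are not taken from the course:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("core-concepts-sketch").getOrCreate()

# A small in-memory DataFrame so the example is self-contained (hypothetical data)
df = spark.createDataFrame(
    [("books", 2, 9.99), ("games", 1, 24.50)],
    ["category", "quantity", "unit_price"],
)

# Adding a calculated column
df = df.withColumn("total_price", F.col("quantity") * F.col("unit_price"))

# Running SQL on DataFrames: register a temporary view and query it
df.createOrReplaceTempView("orders")
spark.sql(
    "SELECT category, SUM(total_price) AS revenue FROM orders GROUP BY category"
).show()

spark.stop()
```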
Challenge
Challenge Overview
Challenge Solution
Conclusion
Thanks for joining me to learn PySpark!