Learn all the fundamentals of PySpark
What you will learn
PySpark, Apache Spark, Big Data Analytics, Big Data Processing, Python
Description
Spark is one of the most in-demand Big Data processing frameworks right now.
This course will take you through the core concepts of PySpark. We will work to enable you to do most of the things you'd do in SQL or in Python's pandas library, that is:
- Getting hold of data
- Handling missing data and cleaning data up
- Aggregating your data
- Filtering it
- Pivoting it
- And writing it back
All of these skills will enable you to leverage Spark on large datasets and start getting value from your data; a brief sketch of what these steps look like in PySpark follows below.
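A minimal sketch of the operations listed above, assuming a hypothetical CSV file sales.csv with columns region, product, and amount (these names are invented for illustration and are not part of the course material):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-basics").getOrCreate()

# Getting hold of data: read a CSV into a DataFrame (hypothetical file)
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Handling missing data and cleaning it up
df = df.dropDuplicates().na.drop(subset=["amount"])

# Filtering it
df = df.filter(F.col("amount") > 0)

# Aggregating your data
totals = df.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# Pivoting it: one column per product, summed amounts per region
pivoted = df.groupBy("region").pivot("product").sum("amount")

# Writing it back
pivoted.write.mode("overwrite").parquet("sales_by_region_and_product")

spark.stop()
```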
Let's get started.
Language: English
Content
Introduction
Introduction
How is this course structured?
A Scenario To Get Us Started
Introduction to our development environment
Introduction to our dataset & dataframes
Environment configuration code snippet
Ingesting & Cleaning Data
Answering our scenario questions
Core Concepts
Bringing data into dataframes
Inspecting A Dataframe
Handling Null & Duplicate Values
Selecting & Filtering Data
Applying Multiple Filters
Running SQL on Dataframes
Adding Calculated Columns
Group By And Aggregation
Writing Dataframe To Files
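As a taste of the Core Concepts topics listed above, here is a small, self-contained sketch of two of them, running SQL on DataFrames and adding calculated columns; the data, view name, and column names are invented for illustration and are not taken from the course:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("core-concepts-sketch").getOrCreate()

# A small in-memory DataFrame so the example is self-contained (hypothetical data)
df = spark.createDataFrame(
    [("books", 2, 9.99), ("games", 1, 24.50)],
    ["category", "quantity", "unit_price"],
)

# Adding a calculated column
df = df.withColumn("total_price", F.col("quantity") * F.col("unit_price"))

# Running SQL on DataFrames: register a temporary view and query it
df.createOrReplaceTempView("orders")
spark.sql(
    "SELECT category, SUM(total_price) AS revenue FROM orders GROUP BY category"
).show()

spark.stop()
```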
Challenge
Challenge Overview
Challenge Solution
Conclusion
Thanks for joining me to learn PySpark!