Learn all the fundamentals of PySpark

What you will learn

PySpark, Apache Spark, Big Data Analytics, Big Data Processing, Python

Description

Spark is one of the most in-demand Big Data processing frameworks right now.

This course takes you through the core concepts of PySpark. The goal is to enable you to do most of the things you would do in SQL or with the Python pandas library, namely:


  • Getting hold of data
  • Handling missing data and cleaning data up
  • Aggregating your data
  • Filtering it
  • Pivoting it
  • And writing it back

All of these things will enable you to leverage Spark on large datasets and start getting value from your data.

Let’s get started.

Language: English

Content

Introduction
  • Introduction
  • How is this course structured
  • A Scenario To Get Us Started
  • Introduction to our development environment
  • Introduction to our dataset & dataframes
  • Environment configuration code snippet
  • Ingesting & Cleaning Data
  • Answering our scenario questions

Core Concepts
  • Bringing data into dataframes
  • Inspecting A Dataframe
  • Handling Null & Duplicate Values
  • Selecting & Filtering Data
  • Applying Multiple Filters
  • Running SQL on Dataframes
  • Adding Calculated Columns
  • Group By And Aggregation
  • Writing Dataframe To Files

Challenge
  • Challenge Overview
  • Challenge Solution

Conclusion
  • Thanks for joining me to learn PySpark!