Building Big Data Pipelines with PySpark + MongoDB + Bokeh - StudyBullet.com

Post published: 10 January, 2022
Post category: StudyBullet-5

What you will learn

PySpark Programming

Data Analysis

Python and Bokeh

Data Transformation and Manipulation

Data Visualization

Big Data Machine Learning

Geo Mapping

Geospatial Machine Learning

Creating Dashboards

Description

Welcome to the Building Big Data Pipelines with PySpark & MongoDB & Bokeh course. In this course we will be building an intelligent data pipeline using big data technologies like Apache Spark and MongoDB.

We will be building an ETLP pipeline. ETLP stands for Extract, Transform, Load and Predict.

These are the different stages of the data pipeline that our data has to go through in order for it to become useful at the end. Once the data has gone through this pipeline we will be able to use it for building reports and dashboards for data analysis.
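
To make the four ETLP stages concrete, here is a minimal sketch in plain Python. It is not the course's code: the CSV text, the `feature` and `label` columns, and every function name are made up for illustration. A dict stands in for a MongoDB collection, and a hand-written least-squares fit stands in for an MLlib model, so the example runs without Spark or MongoDB installed.

```python
import csv
import io

# Made-up sample input; a real pipeline would read from files or an API.
RAW_CSV = """id,feature,label
1,1.0,3.0
2,2.0,5.0
3,,7.5
4,3.0,7.0
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing values and cast strings to numbers."""
    clean = []
    for row in rows:
        if not row["feature"] or not row["label"]:
            continue  # skip incomplete records
        clean.append({
            "id": int(row["id"]),
            "feature": float(row["feature"]),
            "label": float(row["label"]),
        })
    return clean


def load(rows, store):
    """Load: write rows into a store keyed by id (a dict standing in for a database collection)."""
    for row in rows:
        store[row["id"]] = row
    return store


def fit_line(rows):
    """Fit label = slope * feature + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(r["feature"] for r in rows) / n
    mean_y = sum(r["label"] for r in rows) / n
    sxx = sum((r["feature"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["feature"] - mean_x) * (r["label"] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, feature):
    """Predict: apply the fitted line to a new feature value."""
    slope, intercept = model
    return slope * feature + intercept


store = {}
rows = transform(extract(RAW_CSV))
load(rows, store)
model = fit_line(list(store.values()))
prediction = predict(model, 4.0)
```

Keeping each stage as its own function means any one of them can later be swapped for a PySpark, MongoDB or MLlib equivalent without touching the others.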

The data pipeline that we will build will comprise data processing using PySpark, predictive modelling using Spark’s MLlib machine learning library, and data analysis using MongoDB and Bokeh.

You will learn how to create data processing pipelines using PySpark
You will learn machine learning with geospatial data using the Spark MLlib library
You will learn data analysis using PySpark, MongoDB and Bokeh inside a Jupyter notebook
You will learn how to manipulate, clean and transform data using PySpark DataFrames
You will learn basic geo mapping
You will learn how to create dashboards
You will also learn how to create a lightweight server to serve Bokeh dashboards
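
As a small illustration of the geo mapping item above (not taken from the course material): tile-based map plots, including those Bokeh draws over web map tiles, expect coordinates in the Web Mercator projection rather than raw longitude and latitude. The sketch below shows that conversion using only the standard library. The function name and the sample point are hypothetical.

```python
import math

# Equatorial radius in metres used by the spherical Web Mercator projection.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in metres.

    Valid for latitudes strictly between -90 and 90 degrees; web maps
    usually clip to roughly +/-85.05 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Hypothetical sample point, used only to show the call.
x, y = lonlat_to_web_mercator(-122.4194, 37.7749)
```

The resulting `x` and `y` values can be stored as extra columns next to the original longitude and latitude, so that the plotting step does not have to repeat the conversion.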

Language: English

Content

Introduction

Introduction

Setup and Installations

Python Installation

Installing Third Party Libraries

Installing Apache Spark

Installing Java (Optional)

Testing Apache Spark Installation

Installing MongoDB

Installing NoSQL Booster for MongoDB

Data Processing with PySpark and MongoDB

Integrating PySpark with Jupyter Notebook

Data Extraction

Data Transformation

Loading Data into MongoDB

Machine Learning with PySpark and MLlib

Data Pre-processing

Building the Predictive Model

Creating the Prediction Dataset

Data Visualization

Loading the Data Sources from MongoDB

Creating a Map Plot

Creating a Bar Chart

Creating a Magnitude Plot

Creating a Grid Plot

Creating the Data Pipeline Scripts

Installing Visual Studio Code

Creating the PySpark ETL Script

Creating the Machine Learning Script

Creating the Dashboard Server

Source Code and Notebook

Source Code and Notebook

Tags: Big Data, ETL, Free Courses, Geospatial, PySpark, StudyBullet