



What you will learn

  • PySpark Programming
  • Data Analysis
  • Python and Bokeh
  • Data Transformation and Manipulation
  • Data Visualization
  • Big Data Machine Learning
  • Geo Mapping
  • Geospatial Machine Learning
  • Creating Dashboards

Description

Welcome to the Building Big Data Pipelines with PySpark & MongoDB & Bokeh course. In this course we will build an intelligent data pipeline using big data technologies such as Apache Spark and MongoDB.

We will build an ETLP pipeline, where ETLP stands for Extract, Transform, Load and Predict.


These are the stages our data has to go through before it becomes useful. Once the data has gone through this pipeline, we will be able to use it to build reports and dashboards for data analysis.

 

The data pipeline that we will build comprises data processing using PySpark, predictive modelling using Spark’s MLlib machine learning library, and data analysis using MongoDB and Bokeh.
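To make the order of the four ETLP stages concrete, here is a minimal sketch in plain Python. It is not the course's code: in-memory lists stand in for PySpark DataFrames, a dict stands in for a MongoDB collection, and a mean-value baseline stands in for an MLlib model. The record fields (`id`, `lat`, `lon`, `value`) and all function names are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical raw records standing in for an extracted source file.
RAW_ROWS = [
    {"id": "1", "lat": "10.0", "lon": "20.0", "value": "4.0"},
    {"id": "2", "lat": "", "lon": "21.0", "value": "5.0"},  # incomplete row
    {"id": "3", "lat": "12.0", "lon": "22.0", "value": "6.0"},
]


def extract():
    """Extract: return the raw records (assumed stand-in for a PySpark read)."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: drop rows with empty fields and cast strings to numbers."""
    cleaned = []
    for row in rows:
        if all(row.values()):  # an empty string marks a missing field
            cleaned.append({
                "id": int(row["id"]),
                "lat": float(row["lat"]),
                "lon": float(row["lon"]),
                "value": float(row["value"]),
            })
    return cleaned


def load(rows, store):
    """Load: write cleaned rows into a dict (assumed stand-in for MongoDB)."""
    for row in rows:
        store[row["id"]] = row
    return store


def predict(store):
    """Predict: mean-value baseline (assumed stand-in for an MLlib model)."""
    baseline = mean(row["value"] for row in store.values())
    return {key: baseline for key in store}


def run_pipeline():
    """Run the stages in ETLP order and return the store and predictions."""
    store = {}
    load(transform(extract()), store)
    return store, predict(store)


if __name__ == "__main__":
    stored, predictions = run_pipeline()
    print(sorted(stored), predictions)
```

Keeping each stage as a separate function mirrors the idea that data must pass through every stage in turn; in a real pipeline each function body would be replaced by the corresponding PySpark, MongoDB, or MLlib step.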

 

  • You will learn how to create data processing pipelines using PySpark

  • You will learn machine learning with geospatial data using the Spark MLlib library

  • You will learn data analysis using PySpark, MongoDB and Bokeh inside Jupyter Notebook

  • You will learn how to manipulate, clean and transform data using PySpark DataFrames

  • You will learn basic geo mapping

  • You will learn how to create dashboards

  • You will also learn how to create a lightweight server to serve Bokeh dashboards

 

 

Language: English

Content

Introduction

Introduction

Setup and Installations

Python Installation
Installing Third Party Libraries
Installing Apache Spark
Installing Java (Optional)
Testing Apache Spark Installation
Installing MongoDB
Installing NoSQL Booster for MongoDB

Data Processing with PySpark and MongoDB

Integrating PySpark with Jupyter Notebook
Data Extraction
Data Transformation
Loading Data into MongoDB

Machine Learning with PySpark and MLlib

Data Pre-processing
Building the Predictive Model
Creating the Prediction Dataset

Data Visualization

Loading the Data Sources from MongoDB
Creating a Map Plot
Creating a Bar Chart
Creating a Magnitude Plot
Creating a Grid Plot

Creating the Data Pipeline Scripts

Installing Visual Studio Code
Creating the PySpark ETL Script
Creating the Machine Learning Script
Creating the Dashboard Server

Source Code and Notebook

Source Code and Notebook