

What you will learn

Get started with an introduction to Apache Spark.

Get an overview of Spark.

Learn how to install and set up Spark using VirtualBox.

Learn how to install and set up Spark on AWS EC2.

Learn about the DataFrame in Apache Spark.

Learn about machine learning in Spark.

Learn about Linear Regression using MLlib.

Learn about Logistic Regression using MLlib.

Learn about SVM Classifier using MLlib.

Learn about K-Nearest Neighbors using MLlib.

Learn about Naive Bayes using MLlib.


Learn about Tree-Based Algorithms.

Description

Welcome to the wonderful online course on Apache Spark.

Apache Spark is an open-source, general-purpose cluster computing system. It is built for fast, iterative computation and can process large datasets in parallel across a cluster of machines. Although its primary role is that of a distributed computing engine, it also includes libraries for machine learning (MLlib) and graph processing (GraphX), which makes it useful for applications such as artificial intelligence.

PySpark is the Python API for Apache Spark. It lets you write Spark programs in Python while Spark distributes the computation across the CPU cores of one machine or a cluster of machines, caching data in memory where possible. This lets you run large-scale analytics jobs interactively. Python is easy to learn and fun to use, and with PySpark you can build machine learning workflows directly in Python code instead of copying and pasting blocks of code from notebooks.

In this course, you will cover:

  • Get started with an introduction to Apache Spark.
  • Get an overview of Spark.
  • Learn how to install and set up Spark using VirtualBox.
  • Learn how to install and set up Spark on AWS EC2.
  • Learn about the DataFrame and its implementation in Spark.
  • Learn about machine learning in Spark.
  • Learn about Linear Regression and its implementation using MLlib.
  • Learn about Logistic Regression and its implementation using MLlib.
  • Learn about the SVM Classifier and its implementation using MLlib.
  • Learn about K-Nearest Neighbors and its implementation using MLlib.
  • Learn about Naive Bayes and its implementation using MLlib.
  • Learn about Tree-Based Algorithms and their implementation.

After finishing this course, you will have a solid working knowledge of Apache Spark. The course also includes quizzes.

You will also have access to all the resources used in this course.

Instructor support: quick answers to any questions you have.

Enroll now and make the best use of this course.

Language: English

Content

Getting started

Course Introduction
Why distributed systems?
Overview of Hadoop

Overview of Spark

Why Spark?
Spark Framework
Why should we use Python for Spark?
Spark versions and their updates

Spark Installation and setup using VirtualBox

Download and install VirtualBox
Download and install Ubuntu on VirtualBox
Installing Java, Scala, Py4J, and Jupyter Notebook
Installing and setting up Spark

Spark installation and setup using AWS EC2

Creating a free AWS account
Creating an EC2 instance with Ubuntu
Connecting via SSH from Windows and Linux/macOS
Installing Java, Scala, Py4J, and Jupyter Notebook

DataFrame in PySpark

What is a SparkSession?
Overview of DataFrames in PySpark
Create DataFrames in PySpark
Basic DataFrame operations
Filter, groupBy, and pivot operations
Date-time objects and handling missing values in PySpark
Drop duplicates in PySpark
PySpark UDF

Machine Learning in PySpark

Introduction to Machine Learning
Spark’s Machine Learning Library (MLlib)

Linear Regression using MLlib

Introduction to Linear Regression
Linear regression implementation with MLlib
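As a plain-Python illustration of what a linear regression fit computes (not MLlib's implementation), the sketch below derives the ordinary least-squares slope and intercept for a single feature: slope = Cov(x, y) / Var(x) and intercept = mean(y) - slope * mean(x). In PySpark, the corresponding estimator is `pyspark.ml.regression.LinearRegression`, which fits this kind of model on a DataFrame with a vector `features` column and distributes the work across the cluster. The function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = Cov(x, y) / Var(x); the 1/n factors cancel, so use raw sums.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1 (made-up data).
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```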

Logistic Regression using MLlib

Introduction to Logistic Regression part 1
Introduction to Logistic Regression part 2
Multinomial Logistic Regression implementation with MLlib
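A minimal plain-Python sketch of multinomial (softmax) logistic regression trained with batch gradient descent, shown only to illustrate the model. MLlib's `pyspark.ml.classification.LogisticRegression` (with `family="multinomial"`) uses its own optimizers and regularization. The toy data, learning rate, and epoch count are assumptions chosen so this small example converges.

```python
import math


def softmax(scores):
    # Subtract the largest score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_softmax_regression(X, y, num_classes, lr=0.1, epochs=3000):
    """Batch gradient descent on the mean multinomial cross-entropy loss.

    X: list of feature lists; y: integer labels in range(num_classes).
    Returns one weight list per class; the last entry of each is its bias.
    """
    d = len(X[0])
    n = len(X)
    W = [[0.0] * (d + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(num_classes)]
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]  # constant 1 feature for the bias term
            probs = softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])
            for k in range(num_classes):
                # d(loss)/d(W_k) = (p_k - 1[y == k]) * x
                err = probs[k] - (1.0 if yi == k else 0.0)
                for j in range(d + 1):
                    grad[k][j] += err * xb[j]
        for k in range(num_classes):
            for j in range(d + 1):
                W[k][j] -= lr * grad[k][j] / n
    return W


def predict_class(W, x):
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(Wk, xb)) for Wk in W]
    return max(range(len(W)), key=lambda k: scores[k])


# Three well-separated 2-D clusters, one per class (made-up data).
X = [[0.0, 0.0], [0.2, 0.1], [3.0, 0.0], [3.1, 0.2], [0.0, 3.0], [0.1, 3.2]]
y = [0, 0, 1, 1, 2, 2]
W = fit_softmax_regression(X, y, num_classes=3)
print([predict_class(W, x) for x in X])  # [0, 0, 1, 1, 2, 2]
```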

SVM Classifier using MLlib

SVM Classifier Implementation
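For the SVM lecture, here is a hypothetical plain-Python sketch of a linear soft-margin SVM trained by subgradient descent on the L2-regularized hinge loss. MLlib's `pyspark.ml.classification.LinearSVC` also trains a linear SVM on the hinge loss, but with a different optimizer. The hyperparameters and data below are illustrative assumptions, and labels are encoded as -1/+1 for this sketch.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)).

    Labels y must be -1 or +1. The bias b is not regularized.
    """
    d = len(X[0])
    n = len(X)
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge is active: point is inside the margin or misclassified
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict_svm(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable data.
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict_svm(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```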

K-Nearest Neighbors using MLlib

KNN implementation using PySpark MLlib
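Note that, as far as we know, Spark's MLlib does not ship a built-in k-nearest-neighbors classifier (it does provide approximate nearest-neighbor search through `BucketedRandomProjectionLSH`), so a PySpark KNN is typically assembled from DataFrame operations or third-party packages. The plain-Python sketch below shows the algorithm itself: Euclidean distance to every training point, then a majority vote among the k closest. The function name and data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort (point, label) pairs by Euclidean distance to the query; keep the k closest.
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order among equal counts,
    # so a tie goes to the tied label that appears first in distance order.
    return votes.most_common(1)[0][0]


# Two made-up clusters.
train_X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [6.0, 6.0], [6.0, 7.0], [7.0, 6.0]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, [1.5, 1.5]))  # a
print(knn_predict(train_X, train_y, [6.5, 6.5]))  # b
```

Because every prediction scans all training points, plain KNN costs O(n) distance computations per query, which is one reason approximate methods such as LSH are used at Spark scale.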

Naive Bayes using MLlib

Naive Bayes Implementation using PySpark MLlib
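A plain-Python sketch of multinomial Naive Bayes with Laplace (add-alpha) smoothing, the text-classification variant that MLlib's `pyspark.ml.classification.NaiveBayes` uses by default (`modelType="multinomial"`). This is not MLlib's code; the tiny spam/ham corpus and helper names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and add-alpha smoothed log likelihoods from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies in that class
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    log_prior = {c: math.log(cnt / n_docs) for c, cnt in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(log_prior, log_lik, tokens):
    # Sum log-probabilities instead of multiplying probabilities to avoid underflow.
    # Words never seen in training are skipped.
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "now"],
    ["meeting", "at", "noon"],
    ["lunch", "meeting", "today"],
]
labels = ["spam", "spam", "ham", "ham"]
log_prior, log_lik = train_multinomial_nb(docs, labels)
print(predict_nb(log_prior, log_lik, ["cheap", "buy"]))     # spam
print(predict_nb(log_prior, log_lik, ["meeting", "noon"]))  # ham
```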

Tree-Based Algorithms

Decision Tree Algorithm Implementation using PySpark
Random Forest Algorithm Implementation using PySpark
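Decision trees choose splits that make child nodes purer. The plain-Python sketch below computes Gini impurity (the default `impurity` for MLlib's `DecisionTreeClassifier`) and scans one numeric feature for the threshold that minimizes the weighted impurity of the two children. A random forest, such as MLlib's `RandomForestClassifier`, trains many such trees on resampled data and random feature subsets and combines their predictions. The helpers and data here are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2, where p_k is the fraction of labels in class k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the threshold on one numeric feature that minimizes weighted child impurity.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (threshold, weighted_impurity); threshold is None if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: impurity of the unsplit parent
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(gini(labels))                    # 0.5
print(best_threshold(values, labels))  # (6.5, 0.0)
```

A full tree applies this search recursively to each child (over every feature) until a depth or purity limit is reached.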

Outro Section

Conclusion
How to Get Your Certificate of Completion

Outro

Bonus