

Hands-on course focusing on data engineering and analysis on Azure Databricks using Spark SQL

What you will learn

Azure Databricks

Data Lakehouse

Delta Lake

Spark SQL

PySpark

Big Data

Real World Scenarios

Description

Databricks is one of the most in-demand big data tools. It is a fast, easy-to-use, collaborative Spark-based big data analytics service designed for data science, machine learning and data engineering workflows.

The course is packed with lectures, code-along videos and dedicated challenge sections, which should be more than enough to keep you engaged and learning. As an added bonus, you will have lifetime access to all the lectures. I have also provided detailed notebooks as downloadable assets; they contain step-by-step documentation with additional resources and links.

I have kept the delivery of the course engaging and concise: the curriculum is extensive yet delivered efficiently. The course provides hands-on training using a variety of data sets.




The course aims to teach you PySpark, Spark SQL in Python and the Databricks Lakehouse architecture.

You will primarily use Databricks on Microsoft Azure, along with other services such as Azure Data Lake Storage Gen2.

The course will cover a variety of areas including:

  • Set Up and Overview
  • Azure Databricks Notebooks
  • Spark SQL
  • Reading and Writing Data
  • Data Analysis and Transformation with Spark SQL in Python
  • Charts and Dashboards in Databricks Notebooks
  • Databricks Medallion Architecture
  • Accessing Data in Cloud Object Storage
  • Hive Metastore
  • Databases, Tables and Views in Databricks
  • Delta Lake / Databricks Lakehouse Architecture

Language: English

Content

Course Overview / Introduction to Spark and Databricks

Course Introduction
Big Data
Hadoop, Spark and Databricks
Apache Spark Architecture
Spark vs Databricks Comparison
Resource: Comparing Apache Spark vs Databricks

Azure and Databricks Set Up

Azure Account Set Up
Azure UI Overview
Resource: Azure Resources
Creating your Databricks Service
Databricks UI Overview
Clusters
Resource: Pricing, Cluster Pools and Runtime Versions
How to use Databricks Notebooks
Mix Languages and add Markdown text in your Notebook
Databricks Utilities Module and FileStore Utilities
Resource: How to use Notebooks
IMPORTANT – Download Course Resource Notebooks
Cost Management and Cancelling your Subscription
Resource: Cancelling your Azure Subscription

Reading and Writing Data

Dataset Download
Databricks FileStore
Resource: File Types
Reading Data
Writing Data
Parquet Files
Deleting Files and Folders

Data Analysis and Transformation with Spark SQL

Selecting and Renaming Columns
Adding New Columns
Changing Data Types
Math Functions and Simple Arithmetic
Sort Functions
String Functions
Datetime Functions
Filtering DataFrames
Conditional Statements
Using SQL Expressions with expr()
Removing Columns
Grouping your DataFrame
Pivot your DataFrame
Joining DataFrames
Union
Unpivot your DataFrame
Pandas

Utilising the Medallion Architecture in Databricks

Medallion Architecture
Resource: Medallion Architecture

Challenge Section: Customer Orders

Dataset Download and DBFS Upload
Assignment 1: Bronze to Silver
Assignment 1 Solutions Walkthrough
Assignment 2: Silver to Gold
Assignment 2 Solutions Walkthrough

Visualizations and Dashboards

Visualizations and Dashboards

Accessing Data from Azure Data Lake Storage (ADLS) with Databricks

Creating an ADLS Gen2 Account
(Optional) Storage Explorer
Accessing via Access Keys
Accessing via SAS Token
Mounting ADLS to DBFS Overview
Mounting ADLS to DBFS Demo
Secret Scopes
End to End Walkthrough Example

Hive Metastore, Databases, Tables and Views

Running SQL on DataFrames
Hive Metastore and Creating Databases
Managed Tables
Specifying a Location for your Underlying Managed Table Data
Unmanaged (External) Tables
Permanent Views

Challenge Section: Employees

Dataset Download and ADLS Upload
Assignment: Employees
Assignment Solutions Walkthrough

Databricks Data Lakehouse / Delta Lake

Databricks Data Lakehouse / Delta Lake Overview
Delta Lake Data Files
Deleting and Updating Records
Merge Into
Table Utility Commands

Modularize Code and Link Notebooks

Running a Notebook from another Notebook
Text Widgets