Ultimate Azure Data Factory: Cloud Data Engineering
Real-world Modern Data Warehouse project for Data Engineers using Azure Data Factory, SQL, Data Lake, and Databricks

What you will learn

You will learn how to build data pipelines in Azure Data Factory (ADF) through a step-by-step approach.

You will learn how to ingest data in different formats into Azure Data Lake Gen2 using Azure Data Factory (ADF)

You will learn how to use and build various types of transformations in Azure Data Factory (ADF)

You will learn hands-on implementations of building generic artifacts in Azure Data Factory (ADF) such as Flowlets and Templates

You will learn how to transform data into the Medallion layers in Azure Data Lake Gen2 using Data Flows in Azure Data Factory (ADF)

You will learn how to implement ETL/ELT using Azure Data Factory (ADF) in order to implement a Data Warehouse

You will learn how to create generic metadata-driven pipelines in Azure Data Factory (ADF) to implement the ETL/ELT processes

You will learn the concepts of the Modern Data Warehouse Architecture and the Delta Lake

You will learn the concepts of Slowly Changing Dimensions and how to implement them in Azure Data Factory (ADF)

You will learn how to load transformed data from Azure Data Lake Storage Gen2 to Azure SQL Database using Azure Data Factory (ADF)

You will learn how to implement a Delta Lake using Databricks Notebook Activity in Azure Data Factory (ADF) and load into Azure Data Lake Storage Gen2

You will learn how to transform your raw data into a finished data warehouse using Azure Data Factory (ADF) and then visualize it in Power BI

You will learn how to build pipelines using good practices and naming standards as in a typical real-world data engineering project

You will learn how to implement different types of Triggers in Azure Data Factory (ADF) and how to schedule your data pipelines

You will learn how to monitor pipelines using Azure Data Factory (ADF), Azure Monitor, and how to recover from pipeline failures

By the end of this course you will have covered the Azure Data Factory topics required for the Azure Data Engineer Associate certification exam (DP-203)

Description

Welcome!

Data engineering is a thriving focus in the IT industry, with Microsoft’s Azure Data Factory emerging as a sought-after tool in cloud-based data engineering.

Join this course for a step-by-step journey into mastering Azure Data Factory (ADF). Using a real-world scenario of an e-commerce company grappling with data integration and insights, we’ll explore the data of an online wine retailer, showcasing how implementing a modern data warehouse with ADF can provide solutions.

Distinguishing itself from other Udemy offerings on Azure Data Factory and Data Engineering Technologies, this course guides you hands-on in transforming raw data into a Modern Data Warehouse using Azure Data Factory (ADF). Upon completion, you’ll gain proficiency in ADF, ready to tackle real-world data engineering projects.

Given the course’s focus on real-world business scenarios, it adopts a sequential approach mirroring how such requirements unfold in actual projects. This method ensures you not only implement business needs but also grasp the technical concepts explained at each stage of implementing data pipelines with Azure Data Factory (ADF).

This course covers more than just modern data warehouse concepts like architecture, medallion layers, and delta lake. You’ll also gain expertise in utilizing diverse Azure ecosystem solutions, including Azure Data Lake Storage, Azure SQL Database, and Azure Databricks. Additionally, you’ll learn to visually represent the completed data warehouse through Power BI reports.

This course covers the concepts and skills assessed in the Azure Data Engineer Associate certification exam (DP-203). It equips you with those skills, but it is designed for comprehensive learning rather than solely for passing the certification.


I appreciate your time, and I’ve crafted this course to be practical and focused. I aim for simplicity and conciseness, starting from the basics and ensuring proficiency in the technologies covered.

Currently the course teaches you the following:

Azure Data Factory

  • Building a modern data warehouse architecture for a data engineering solution using Azure technologies such as Azure Data Factory (ADF), Azure Data Lake Gen2, Azure SQL Database, Azure Databricks, Azure Key Vault, and Microsoft Power BI.
  • Ingesting data from varied sources and in diverse formats into Azure Data Lake Gen2 using Azure Data Factory.
  • Comprehending Azure concepts, including resources and their provisioning methods.
  • Learning to incorporate and use tools such as Azure Storage Explorer, Azure Data Studio, and Visual Studio Code in the development workflow.
  • Implementing Azure Data Factory (ADF) pipelines using different control flow activities such as Get Metadata, ForEach, If Conditions, etc.
  • Using Parameters and Variables in Pipelines, Datasets and Linked Services to create generic parameter-driven pipelines in Azure Data Factory (ADF).
  • Using parameters in conjunction with Azure Key Vault to create generic parameter-driven pipelines in Azure Data Factory (ADF).
  • Implementing Mapping Data Flows to create transformation logic to handle a variety of transformation scenarios such as Filter, Conditional Split, Derived Column, Aggregate, Join, Select, and Sink transformation.
  • Developing universal components in data pipelines, such as Flowlets, and mastering the swift development of data processing needs through pre-built pipeline templates.
  • Learning how to implement error handling in data pipelines and control pipeline flow.
  • Implementing data quality rules using the Assert transformation within a data pipeline.
  • Implementing data pipelines to handle common slowly changing dimension scenarios such as SCD Type 1 and SCD Type 2.
  • Implementing data pipelines to load a Fact table.
  • Learning how to debug data pipelines and resolve issues.
  • Implementing pipeline scheduling using different types of triggers such as Event Trigger, Schedule Trigger and Tumbling Window Trigger in Azure Data Factory (ADF).
  • Implementing Azure Data Factory pipelines to invoke Mapping Data Flows and executing them.
  • Creating ADF pipelines to execute Databricks Notebook activities to carry out transformations and implement a Delta Lake table.
  • Creating pipeline dependencies and using the Pipeline activity to orchestrate the ETL/ELT process.
  • Implementing trigger dependencies to understand how to chain pipelines and orchestrate the data flow.
  • Monitoring data pipelines, creating alert notifications, and reporting data factory metrics using Azure Data Factory Monitor.
  • Understanding how to monitor Azure Data Factory pipelines with Azure Monitor using specific Data Factory metrics.
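
To make the difference between the SCD Type 1 and SCD Type 2 scenarios mentioned in the list above concrete, here is a minimal sketch in plain Python. It is not the course's implementation (the course builds these with Data Flows and stored procedures); the function names and the `valid_from`, `valid_to`, and `is_current` columns are assumptions chosen for illustration.

```python
from datetime import date


def apply_scd1(dimension, incoming, key):
    """Type 1: overwrite changed attributes in place, so no history is kept."""
    by_key = {row[key]: dict(row) for row in dimension}
    for row in incoming:
        # Merge the incoming attributes over the existing row (or insert a new one).
        by_key[row[key]] = {**by_key.get(row[key], {}), **row}
    return list(by_key.values())


def apply_scd2(dimension, incoming, key, load_date):
    """Type 2: close the current version and append a new version on change."""
    result = [dict(row) for row in dimension]  # work on copies of the input rows
    current = {row[key]: row for row in result if row["is_current"]}
    for row in incoming:
        existing = current.get(row[key])
        if existing is not None:
            unchanged = all(existing.get(col) == val for col, val in row.items())
            if unchanged:
                continue  # nothing to do for identical records
            # Expire the old version instead of overwriting it.
            existing["valid_to"] = load_date
            existing["is_current"] = False
        new_row = {**row, "valid_from": load_date, "valid_to": None, "is_current": True}
        result.append(new_row)
        current[row[key]] = new_row
    return result


# Hypothetical product dimension with one current row.
products = [
    {"product_id": 1, "price": 10.0,
     "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True},
]
changes = [{"product_id": 1, "price": 12.0}]
history = apply_scd2(products, changes, "product_id", date(2024, 1, 1))
```

With Type 1 the price would simply be overwritten; with Type 2 `history` holds two rows for the same product, the expired one and the current one, so past prices remain queryable.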

Modern Data Warehouse

  • Understand the different types of Data Warehouse Architectures.
  • Understand the concepts of a Delta Lake.
  • Understand the Dimensional Model and a Star Schema based Data Warehouse.
  • Understand the concept of Medallion Layers and how to implement it within the Azure Data Lake Storage.
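
As a rough illustration of the Medallion Layers idea, the sketch below moves a few records through bronze (raw), silver (cleaned), and gold (aggregated) stages in plain Python. The course implements these layers in Azure Data Lake Storage with Data Flows; the field names, sample rows, and the `to_silver` and `to_gold` helpers here are hypothetical.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean raw rows: trim text, cast types, drop invalid rows and duplicates."""
    seen = set()
    silver = []
    for raw in bronze_rows:
        try:
            row = {
                "order_id": int(raw["order_id"]),
                "product": raw["product"].strip().title(),
                "amount": float(raw["amount"]),
            }
        except (KeyError, ValueError, AttributeError):
            continue  # skip rows that cannot be parsed
        if row["order_id"] in seen:
            continue  # skip duplicate orders
        seen.add(row["order_id"])
        silver.append(row)
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows into a reporting-ready total per product."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["product"]] += row["amount"]
    return dict(totals)


# Bronze: data exactly as it was ingested, including a duplicate and a bad value.
bronze = [
    {"order_id": "1", "product": " merlot ", "amount": "19.50"},
    {"order_id": "1", "product": " merlot ", "amount": "19.50"},
    {"order_id": "2", "product": "Riesling", "amount": "n/a"},
    {"order_id": "3", "product": "merlot", "amount": "20.50"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer is kept separately, so the raw input stays available for reprocessing while reports read only from the aggregated layer.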

Azure Databricks

  • Understand how to create an Azure Databricks workspace and Databricks clusters, mount storage accounts, create Databricks notebooks, perform transformations using Databricks notebooks, and invoke Databricks notebooks from Azure Data Factory.
  • Understand the implementation of a Delta Lake table using Azure Databricks Notebook activity from an Azure Data Factory pipeline.
  • Understand the concepts of Optimizing a Delta Lake Table, Time Travel, Vacuuming, and Delta Logs.

Azure Resources and Azure Storage Solutions

  • Learn the different approaches to creating Azure Resources.
  • Learn how to create an Azure Storage Account resource, create containers, and upload data into the storage account through the Azure Portal or Azure Storage Explorer.
  • Learn how to create an Azure SQL Database resource, understand the pricing tiers, create an admin user, create tables, load data, query the database, and interact with Azure SQL Database through Azure Data Studio.

Language: English

Content

Overview

Welcome
What you will learn?
Goal of this course
Commitment
Course Materials
Course Slides

Introduction

Introduction to Azure Data Factory
Why Azure Data Factory?
What is Azure Data Factory?
Benefits of Azure Data Factory
Azure Account
User Interface Azure Portal
Module Summary

Project Overview

Hands-On Project Overview
Business Case for the Project
Solution Requirements
Architectural Patterns
Modern Data Warehouse Architecture
Hands-On Project Architecture
Repositories
Module Summary

Environment

Module Overview
Software Tools
Software Tools Setup
Azure Resources
Setup Azure Resources
Setup Azure Resources in Azure Portal
Setup Azure Resource Group
Setup Azure Data Lake Storage
Setup Azure Data Factory Resource
Setup Azure Sql DB Resource
Review Azure Resources
Setup Azure Data Studio
Setup Azure Storage Explorer
Module Summary

Building a Data Pipeline

Module Overview
Building Blocks of Azure Data Factory – Main Components
Building Blocks of Azure Data Factory – Pipelines and Activities
Building Blocks of Azure Data Factory – How they Tie Together
Azure Data Factory User Interface – Main Page
Azure Data Factory User Interface – Authoring Canvas
Data Sources
Data Sources – Data Ingestion
Data Sources – Data Organization
Building the Data Pipeline
Building the Data Pipeline – Creating the Containers
Building the Data Pipeline – Creating the Pipeline
Building the Data Pipeline – Review and Organize
Importing Semi-Structured Data
Importing Semi-Structured Data – Building the Pipeline
Importing Semi-Structured Data – Organizing the Pipeline
Importing Semi-Structured Data – Recap of the Lesson
Naming Conventions
Module Summary

Pipeline Activities and Parameters

Module Overview
Activities
Activity Dependencies
Activity Dependencies – Examples
Copy Activity
Copy Activity Concepts – Examples
Expressions and Variables
Expressions and Variables – Examples
Parameters
Parameters – Examples
Azure Key Vault – Overview
Azure Key Vault – Setup
Azure Key Vault – Create Linked Service
Importing Semi-Structured Data
Module Summary

Mapping Data Flows

Module Overview
Introduction to Mapping Data Flows
Scenarios for Mapping Data Flows
User Interface of Mapping Data Flows
User Interface of Mapping Data Flows – Debug Feature
Implementing a Mapping Data Flow – Overview
Implementing a Mapping Data Flow – Pipeline and Data Sources
Implementing a Mapping Data Flow – Adding Transformations
Implementing a Mapping Data Flow – Pipeline Execution
Mapping Data Flow – Concepts
Mapping Data Flow – Concepts Example
Performance of Mapping Data Flows – Integration Runtime
Performance of Mapping Data Flows
Module Summary

Implementing Flowlets

Module Overview
Introduction to Flowlets
Scenarios for Flowlets
User Interface of Flowlets – Overview
User Interface of Flowlets – Create a Demo Flowlet
Implementing a Flowlet – Create Flowlet
Implementing a Flowlet – Use the Flowlet
Module Summary

Controlling Pipeline Flow

Module Overview
Asserts
Implementing Asserts – Assert Expect True
Implementing Asserts – Identifying Error Rows
Implementing Asserts – Processing Error Rows
Error Handling Overview
Implementing Error Handling – Fail Activity
Implementing Error Handling – Capturing Errors
Implementing Error Handling – Logging Errors
Implementing Error Handling – Review of Error Pipeline
Integrating Data Quality and Error Handling
Building Pipelines using Pre-Built Templates
Module Summary

Building the Data Warehouse – Part 1

Module Overview
Data Warehouse Overview
Data Warehouse Models
Data Warehouse Vino World
Data Process
Building the Azure Sql Database – Create the Stage Tables
Building the Azure Sql Database – Create the DW Tables
Building the Staging Layer Master Data – Master data
Building the Staging Layer Master Data – Product data
Building the Staging Layer Master Data – Metadata approach
Building the Staging Layer Master Data – Create Parameter Datasets
Building the Staging Layer Master Data – Create Metadata Pipeline
Building the Staging Layer Master Data – Pipeline execution
Building the Staging Layer Transaction Data
Building the Staging Layer Product Data – Combine Product Data
Building the Staging Layer Transaction Data – Combine Sales Data
Module Summary

Building the Data Warehouse – Part 2

Module Overview
Dimensions – Overview of Dimensions
Dimensions – Slowly Changing Dimensions
Dimensions – Master Dimensions and SCD Type
Building Type1 Dimensions – Using Data Flows
Building Type 1 Dimensions – Pipeline Review
Building Type 1 Dimensions – Using Stored Procedures
Dimensions – Overview of Type2 Dimensions
Building Type 2 Dimensions – Product Dimension
Building Type 2 Dimensions – Using Data Flows – Step1
Building Type 2 Dimensions – Using Data Flows – Step2
Building Type 2 Dimensions – Pipeline Review
Building Type 2 Dimensions – Using Stored Procedures
Building Dimensions – Build remaining dimensions
Facts – Overview
Building Facts
Data Warehouse Review and Data Analysis
Module Summary

Building the Delta Lake

Module Overview
Recap of what we implemented
What we will implement
Azure Databricks
What is Azure Databricks
Core Artifacts of Azure Databricks
Setup Azure Databricks
Setup Databricks Resource
Databricks UI Overview
Databricks Cluster Overview
Create Databricks Cluster
Azure Service Principal and Access to Data Lake Storage
Mount Azure Data Lake Storage
Overview of Delta Lake Implementation
What is a Delta Lake
Create Data Source for the Delta Table
Create Delta Table
Load Delta Table
Update Delta Table
Delta Table Concepts
Create Linked Service to Databricks from Data Factory
Executing Databricks Notebook from Data Factory
Module Summary

Presentation Layer

Module Overview
Overview – Modern Data Warehouse
Overview – What we implemented
Overview – What we will implement
PowerBI – Installation
PowerBI – Overview
PowerBI – Connecting to the Data Warehouse
PowerBI – Building the Tabular Model
PowerBI – Building the Report
PowerBI – Report Requirements
PowerBI – Report Review
Module Summary

Triggers

Module Overview
Overview of Triggers
Approach to Pipeline Execution
Implementing a Master Pipeline
Executing the Master Pipeline
Implementing Event-based triggers
Executing Event-based Triggers
Scheduling Pipelines
Creating a Tumbling Window Trigger
Module Summary

Monitoring

Module Overview
Executing Event-based triggers
Overview of Data Factory Monitoring
What do we monitor in Azure Data Factory
Visual Monitoring in Azure Data Factory
Pipeline Recovery
Setup Alerts
Validate the Alert
Metrics
Module Summary

Conclusion

Summary