Mastering Amazon Redshift and Serverless for Data Engineers

In-Depth Course on Amazon Redshift, Redshift Serverless, Integration with EMR, AWS Step Functions, AWS Lambda and more

What you will learn

Getting Started with Amazon Redshift using AWS Web Console

Copy Data from s3 into AWS Redshift Tables using Redshift Queries or Commands

Develop Applications using Redshift Cluster using Python as Programming Language

Copy Data from s3 into AWS Redshift Tables using Python as Programming Language

Create Tables using Databases setup on AWS Redshift Database Server using Distribution Keys and Sort Keys

Run AWS Redshift Federated Queries connecting to traditional RDBMS Databases such as Postgres

Perform ETL using AWS Redshift Federated Queries using Redshift Capacity

Integration of AWS Redshift and AWS Glue Catalog to run queries using Redshift Spectrum

Run AWS Redshift Spectrum Queries using Glue Catalog Tables on Datalake setup using AWS s3

Getting Started with Amazon Redshift Serverless by creating Workgroup and Namespace

Integration of AWS EMR Cluster with Amazon Redshift using Serverless Workgroup

Develop and Deploy Spark Application on AWS EMR Cluster where the processed data will be loaded into Amazon Redshift Serverless Workgroup

Description

AWS or Amazon Redshift is one of the key AWS Services used in building Data Warehouses or Data Marts to serve reports and dashboards for business users. As part of this course, you will end up learning AWS or Amazon Redshift by going through all the important features of AWS or Amazon Redshift to build Data Warehouses or Data Marts.

We have covered features such as Federated Queries, Redshift Spectrum, Integration with Python, AWS Lambda Functions, Integration of Redshift with EMR, and End-to-End Pipeline using AWS Step Functions.

Get Instant Notification of New Courses on our Telegram channel.

Here is the detailed outline of the course.

First, we will understand how to Get Started with Amazon Redshift using AWS Web Console. We will see how to create a cluster, how to connect to the cluster, and also how to run the queries using a Web-based query editor. We will also go ahead and create a Database and tables in the Redshift Cluster. Once we set up a Database and tables, we will also go through the details related to CRUD Operations against tables in Databases in Redshift Cluster.
Once we have the databases and tables in Redshift Cluster, it is time for us to understand how to get data into the tables in Redshift Cluster. One of the common approaches we use to get data into the Redshift cluster is by Copying Data from s3 into Redshift Tables. We will go through the step-by-step process of copying the data into Redshift tables from s3 using the copy command.
Python is one of the prominent programming languages to build Data Engineering or ETL Applications. It is extensively used to build ETL Jobs to get data into Database Tables in Redshift Cluster. Once we understand how to get data from s3 to Redshift tables using Copy Command, we will learn how to Develop Python-based Data Engineering or ETL Applications using Redshift Cluster. We will learn how to perform CRUD operations and also how to take run COPY Commands using Python-based programs.
Once we understand how to build applications using Redshift Cluster, we will go through some of the key concepts used while creating Redshift Tables with Distkeys and Sortkeys.
We can also connect to remote databases such as Postgres and run queries directly on the remote database tables using Redshift Federated Queries and also we can run queries on top of Glue or Athena Catalog using Redshift Spectrum. You will learn how to leverage Redshift Federated Queries and Spectrum to process data in remote Database tables or s3 without copying the data.
You will also get an overview of Amazon Redshift Serverless as part of Getting Started with Amazon Redshift Serverless.
Once you learn Amazon Redshift Serverless, you will end up deploying a Pipeline where a Spark Application is deployed on AWS EMR Cluster which will load the data processed by Spark into Redshift.

English

language

Content

Introduction to Mastering Amazon Redshift and Serverless for Data Engineers

Getting Started with Amazon Redshift

Getting Started with Amazon Redshift – Introduction

Create Redshift Cluster using Free Trial

Connecting to Database using Redshift Query Editor

Get list of tables querying information schema

Run Queries against Redshift Tables using Query Editor

Create Redshift Table using Primary Key

Insert Data into Redshift Tables

Update Data in Redshift Tables

Delete data from Redshift tables

Redshift Saved Queries using Query Editor

Deleting Redshift Cluster

Restore Redshift Cluster from Snapshot

Copy Data from s3 into Redshift Tables

Copy Data from s3 to Redshift – Introduction

Setup Data in s3 for Redshift Copy

Create Database and Table for Redshift Copy Command

Create IAM User with full access on s3 for Redshift Copy

Run Copy Command to copy data from s3 to Redshift Table

Troubleshoot Errors related to Redshift Copy Command

Run Copy Command to copy from s3 to Redshift table

Validate using queries against Redshift Table

Overview of Redshift Copy Command

Create IAM Role for Redshift to access s3

Copy Data from s3 to Redshift table using IAM Role

Setup JSON Dataset in s3 for Redshift Copy Command

Copy JSON Data from s3 to Redshift table using IAM Role

Develop Applications using Redshift Cluster

Develop application using Redshift Cluster – Introduction

Allocate Elastic Ip for Redshift Cluster

Enable Public Accessibility for Redshift Cluster

Update Inbound Rules in Security Group to access Redshift Cluster

Create Database and User in Redshift Cluster

Connect to database in Redshift using psql

Change Owner on Redshift Tables

Download Redshift JDBC Jar file

Connect to Redshift Databases using IDEs such as SQL Workbench

Setup Python Virtual Environment for Redshift

Run Simple Query against Redshift Database Table using Python

Truncate Redshift Table using Python

Create IAM User to copy from s3 to Redshift Tables

Validate Access of IAM User using Boto3

Run Redshift Copy Command using Python

Redshift Tables with Distkeys and Sortkeys

Redshift Tables with Distkeys and Sortkeys – Introduction

Quick Review of Redshift Architecture

Create multi-node Redshift Cluster

Connect to Redshift Cluster using Query Editor

Create Redshift Database

Create Redshift Database User

Create Redshift Database Schema

Default Distribution Style of Redshift Table

Grant Select Permissions on Catalog to Redshift Database User

Update Search Path to query Redshift system tables

Validate table with DISTSTYLE AUTO

Create Cluster from Snapshot to the original state

Overview of Node Slices in Redshift Cluster

Overview of Distribution Styles

Distribution Strategies for retail tables in Redshift

Create Redshift tables with distribution style all

Troubleshoot and Fix Load or Copy Errors

Create Redshift Table with Distribution Style Auto

Create Redshift Tables using Distribution Style Key

Delete Cluster with manual snapshot

Redshift Federated Queries and Spectrum

Redshift Federated Queries and Spectrum – Introduction

Overview of integrating RDS and Redshift for Federated Queries

Create IAM Role for Redshift Cluster

Setup Postgres Database Server for Redshift Federated Queries

Create tables in Postgres Database for Redshift Federated Queries

Creating Secret using Secrets Manager for Postgres Database

Accessing Secret Details using Python Boto3

Reading Json Data to Dataframe using Pandas

Write JSON Data to Database Tables using Pandas

Create IAM Policy for Secret and associate with Redshift Role

Create Redshift Cluster using IAM Role with permissions on secret

Create Redshift External Schema to Postgres Database

Update Redshift Cluster Network Settings for Federated Queries

Performing ETL using Redshift Federated Queries

Clean up resources added for Redshift Federated Queries

Grant Access on Glue Data Catalog to Redshift Cluster for Spectrum

Setup Redshift Clusters to run queries using Spectrum

Quick Recap of Glue Catalog Database and Tables for Redshift Spectrum

Create External Schema using Redshift Spectrum

Run Queries using Redshift Spectrum

Cleanup the Redshift Cluster

Getting Started with Amazon Serverless Redshift

Create Workgroup and Namespace for Amazon Redshift Serverless

Overview of Amazon Redshift Serverless Namespaces and Workgroups

Quick Preview of Amazon Redshift Serverless Dashboard

Validate Amazon Redshift Serverless Workgroup by running a query

Enable Public Accessbility to Redshift Serverless Workgroup

Understand Redshift Serverless Workgroup Capacity measured in RPUs

Setup Redshift Spectrum Schema using Redshift Serverless

Introduction to Setup Redshift Spectrum Database using Redshift Serverless

Setup Files in S3 for Glue Catalog and Redshift Spectrum Database Tables

Cleanup Glue Catalog Database and Crawler using AWS Glue Console

Create Glue Crawler to Setup Glue Catalog Database and Tables for Redshift Shift

Run Glue Crawler to Create Glue Catalog Database and Tables for Redshift Spectu

Create Redshift Serverless Workgroup and Namespace for Redshift Spectrum

Accessing Redshift using Jupyter Based Environment of VS Code

Create Database and User for Data Mart using AWS Redshift Query Editor

Create Database and User for Data Mart using Jupyter Notebooks

Create External Schema in Redshift Database using Glue Catalog Database

Validate External Schema Setup using Redshift Query Editor

Basic SQL Queries using AWS Redshift SQL

Introduction to Basic SQL Queries using AWS Redshift SQL

Overview of Using WITH Clause in Redshift SQL Queries

Overview of Using Views in Redshift SQL Queries

Filtering Data using AWS Redshift SQL

Filtering Data using Boolean AND in Redshift SQL

Filtering Data using LIKE Operator in Redshift SQL

Filtering Data using Boolean OR and IN Operators in Redshift SQL

Overview of Count and Sum using Redshift SQL

Getting Total Average using Redshift SQL

Perform Total Aggregations based on Condition using Redshift SQL

Get Count and Distinct Count using Redshift SQL

Get Sum and Average on Order Item Measures using Redshift SQL

Perform Grouped Aggregations using Redshift SQL

Filtering on Aggregate Results using HAVING on GROUP BY

Overview of Order Of Execution of SQL using Group By and Having

Overview of Joins using Redshift Tables

Integration of AWS EMR with Amazon Redshift

Create AWS EC2 Elastic IP and Key Pair for AWS EMR Cluster

Create Shell Script for AWS EMR Bootstrap Action to install boto3

Create AWS EMR Cluster to integrate with Amazon Redshift

Attach Elastic IP to the AWS EMR Master Node and Validate SSH Connectivity

Setup Project for AWS EMR and Redshift Integration using VS Code Remote Deve

Setup Amazon Redshift Serverless Workgroup and Validate Connetivity

Connect to Redshift Serverless Workgroup from AWS EMR Master using psql

Setup Required Database and User in Amazon Redshift Serverless Workgroup

Install Python Library psycopg2 to connect to Redshift Databases using Python

Validate Redshift Connectivity using Python from AWS EMR Master Node

Create and Validate Redshift Database Tables

Create Secret for Redshift Database using AWS Secrets Manager

Validate Python Boto3 on Master Node of AWS EMR Cluster

Read Secret from AWS Secrets Manager using Python Boto3

Validate Redshift Connectivity from Master Node of AWS EMR Cluster

Launch Pyspark CLI with Redshift Dependencies on AWS EMR Master Node

Validate Redshift Connectivity using Spark on AWS EMR Cluster

Develop Code to Validate Spark and Redshift Integration using EMR

Setup GHActivity Data in AWS s3

Read and Process Data using Pyspark to write into Redshift Table

Develop Write Logic to load Spark Dataframe into Redshift Table

Validate Spark Load Process to Amazon Redshift Table

Understanding AWS s3 Temp Location specified in Spark Applications

Conclusion on Integration of AWS EMR with Amazon Redshift

Develop Applications using Spark SQL on AWS EMR Cluster

Setup AWS EMR Cluster to develop applications using Spark SQL

Setup Visual Studio Code Workspace using AWS EMR Master Node

Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node

Setup Required Data Sets for Spark SQL

Upload Retail DB Files to AWS s3 using AWS CLI commands

Spark SQL and Temporary Views using Spark SQL on AWS EMR Cluster

Create Spark SQL Temporary Views for Orders and Order Items

Join and Aggregate using Spark SQL on AWS EMR Cluster

Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster

Develop Script using Spark SQL Commands

Parameterize Bucket Name in Spark SQL Script

Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node

Deploy Spark SQL Script as Step on AWS EMR Cluster

Conclusion to Develop Spark SQL Applications on EMR Cluster

Develop Applications using Redshift and Python boto3

Introduction to Integration of AWS Lambda Functions and Redshift

Setup Redshift Serverless Workgroup and Namespace

Setup Workspace for Integration of AWS Lambda Functions and Redshift

Validate JSON Data in AWS s3 using Pandas

Get Redshift Cluster Details using Python boto3

Get Redshift Serverless Details using Python Boto3

Run SQL Queries using Redshift Serverless and Python Boto3

Capture Redshift Query Results using Python Boto3

Create Database and User in Redshift Serverless Namespace

Create Table in Redshift Serverless Namespace

Overview of Python Boto3 Waiters

Run Queries against Redshift Table using Boto3 without credentials

Create and Validate Secret using AWS Secrets Manager for Redshift Workgroup

Copy Processed Data from AWS s3 into Redshift Table

Conclusion on Developing Applications using Redshift and Python Boto3

Integration of AWS Lambda Functions and Redshift

Introduction to Integration of AWS Lambda Functions and Redshift

Getting Started with Lambda Function using boto3

Running Lambda Function using AWS Lambda Console

Troubleshoot issues of AWS Lambda Functions using Cloudwatch Logs

Check Python Boto3 Version in AWS Lambda Function Run Time Environment

Overview of adding Lambda Layer to Upgrade Python Boto3 of Lambda Runtime

Copy Zip File with Latest Boto3 to AWS s3 for Lambda Layer

Create Lambda Layer to Upgrade Python Boto3 of Lambda Runtime

Create Function to Copy Data into Redshift Table using boto3

Update Lambda Handler to copy data to Redshift Table

Grant Permissions on Redshift Secret to AWS Lambda Function via IAM Role

Grant Permissions on Redshift Data API to AWS Lambda Function via IAM Role

Review Redshift Workgroup and Truncate Table before running Lambda Function

Run AWS Lambda Function to Copy Data to Redshift Table

Validate Data Copied by AWS Lambda Function in Redshift Table by running queries

Data Pipeline using AWS Step Functions with EMR and Redshift

Introduction to Data Pipeline using AWS Step Functions with EMR and Redshift

Getting Started with State Machines or Data Pipelines using AWS Step Function

Review Execution Details of State Machine or Data Pipeline using AWS Step Functi

Manage State Machines using AWS Step Functions State Machines Dashboard

Create State Machine with AWS Lambda Function to Copy Data From s3 to Redshift

Update State Machine with Permissions on Lambda to Copy Data From s3 to Redshift

Run State Machine with AWS Lambda Function to Copy Data From s3 to Redshift Tab

Overview of Managing AWS EMR Clusters using Boto3

Overview of AWS boto3 to Manage AWS EMR Clusters

Create AWS EMR Job Flow Cluster using Python Boto3

Add Spark SQL Script as Step to AWS EMR Cluster using Boto3

Overview of AWS EMR Waiters using Python Boto3

Terminate AWS EMR Cluster using waiters and Python Boto3

Overview of AWS Step Functions State Machine to execute Spark SQL on EMR

Create State Machine using AWS Step Function to create EMR Cluster

Grant Permissions to State Machine via Role to Create AWS EMR Cluster

Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions

Add Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine

Pass AWS EMR Step Details as Input to State Machine at Execution Time

Validate Spark SQL Script Execution as AWS EMR Step using State Machine

Create Data Pipeline with EMR and Redshift Integration using AWS Step Function

Grant Permissions on AWS EMR to role of State Machine with EMR and Redshift Inte

Run AWS Step Function State Machine with EMR and Redshift Integration

Validate AWS State Machine Execution with EMR and Redshift Integration

Best Practices to Build State Machines with AWS EMR and Redshift Integration

Enroll for Free