• Post category:StudyBullet-14
  • Reading time:23 mins read


In-Depth Course on Amazon Redshift, Redshift Serverless, Integration with EMR, AWS Step Functions, AWS Lambda and more

What you will learn

Getting Started with Amazon Redshift using AWS Web Console

Copy Data from s3 into AWS Redshift Tables using Redshift Queries or Commands

Develop Applications using Redshift Cluster using Python as Programming Language

Copy Data from s3 into AWS Redshift Tables using Python as Programming Language

Create Tables using Databases setup on AWS Redshift Database Server using Distribution Keys and Sort Keys

Run AWS Redshift Federated Queries connecting to traditional RDBMS Databases such as Postgres

Perform ETL using AWS Redshift Federated Queries using Redshift Capacity

Integration of AWS Redshift and AWS Glue Catalog to run queries using Redshift Spectrum

Run AWS Redshift Spectrum Queries using Glue Catalog Tables on Datalake setup using AWS s3

Getting Started with Amazon Redshift Serverless by creating Workgroup and Namespace

Integration of AWS EMR Cluster with Amazon Redshift using Serverless Workgroup

Develop and Deploy Spark Application on AWS EMR Cluster where the processed data will be loaded into Amazon Redshift Serverless Workgroup

Description

AWS or Amazon Redshift is one of the key AWS Services used in building Data Warehouses or Data Marts to serve reports and dashboards for business users. As part of this course, you will end up learning AWS or Amazon Redshift by going through all the important features of AWS or Amazon Redshift to build Data Warehouses or Data Marts.

We have covered features such as Federated Queries, Redshift Spectrum, Integration with Python, AWS Lambda Functions, Integration of Redshift with EMR, and End-to-End Pipeline using AWS Step Functions.


Get Instant Notification of New Courses on our Telegram channel.


Here is the detailed outline of the course.

  • First, we will understand how to Get Started with Amazon Redshift using AWS Web Console. We will see how to create a cluster, how to connect to the cluster, and also how to run the queries using a Web-based query editor. We will also go ahead and create a Database and tables in the Redshift Cluster. Once we set up a Database and tables, we will also go through the details related to CRUD Operations against tables in Databases in Redshift Cluster.
  • Once we have the databases and tables in Redshift Cluster, it is time for us to understand how to get data into the tables in Redshift Cluster. One of the common approaches we use to get data into the Redshift cluster is by Copying Data from s3 into Redshift Tables. We will go through the step-by-step process of copying the data into Redshift tables from s3 using the copy command.
  • Python is one of the prominent programming languages to build Data Engineering or ETL Applications. It is extensively used to build ETL Jobs to get data into Database Tables in Redshift Cluster. Once we understand how to get data from s3 to Redshift tables using Copy Command, we will learn how to Develop Python-based Data Engineering or ETL Applications using Redshift Cluster. We will learn how to perform CRUD operations and also how to take run COPY Commands using Python-based programs.
  • Once we understand how to build applications using Redshift Cluster, we will go through some of the key concepts used while creating Redshift Tables with Distkeys and Sortkeys.
  • We can also connect to remote databases such as Postgres and run queries directly on the remote database tables using Redshift Federated Queries and also we can run queries on top of Glue or Athena Catalog using Redshift Spectrum. You will learn how to leverage Redshift Federated Queries and Spectrum to process data in remote Database tables or s3 without copying the data.
  • You will also get an overview of Amazon Redshift Serverless as part of Getting Started with Amazon Redshift Serverless.
  • Once you learn Amazon Redshift Serverless, you will end up deploying a Pipeline where a Spark Application is deployed on AWS EMR Cluster which will load the data processed by Spark into Redshift.
English
language

Content

Introduction to Mastering Amazon Redshift and Serverless for Data Engineers

Introduction to Mastering Amazon Redshift and Serverless for Data Engineers

Getting Started with Amazon Redshift

Getting Started with Amazon Redshift – Introduction
Create Redshift Cluster using Free Trial
Connecting to Database using Redshift Query Editor
Get list of tables querying information schema
Run Queries against Redshift Tables using Query Editor
Create Redshift Table using Primary Key
Insert Data into Redshift Tables
Update Data in Redshift Tables
Delete data from Redshift tables
Redshift Saved Queries using Query Editor
Deleting Redshift Cluster
Restore Redshift Cluster from Snapshot

Copy Data from s3 into Redshift Tables

Copy Data from s3 to Redshift – Introduction
Setup Data in s3 for Redshift Copy
Create Database and Table for Redshift Copy Command
Create IAM User with full access on s3 for Redshift Copy
Run Copy Command to copy data from s3 to Redshift Table
Troubleshoot Errors related to Redshift Copy Command
Run Copy Command to copy from s3 to Redshift table
Validate using queries against Redshift Table
Overview of Redshift Copy Command
Create IAM Role for Redshift to access s3
Copy Data from s3 to Redshift table using IAM Role
Setup JSON Dataset in s3 for Redshift Copy Command
Copy JSON Data from s3 to Redshift table using IAM Role

Develop Applications using Redshift Cluster

Develop application using Redshift Cluster – Introduction
Allocate Elastic Ip for Redshift Cluster
Enable Public Accessibility for Redshift Cluster
Update Inbound Rules in Security Group to access Redshift Cluster
Create Database and User in Redshift Cluster
Connect to database in Redshift using psql
Change Owner on Redshift Tables
Download Redshift JDBC Jar file
Connect to Redshift Databases using IDEs such as SQL Workbench
Setup Python Virtual Environment for Redshift
Run Simple Query against Redshift Database Table using Python
Truncate Redshift Table using Python
Create IAM User to copy from s3 to Redshift Tables
Validate Access of IAM User using Boto3
Run Redshift Copy Command using Python

Redshift Tables with Distkeys and Sortkeys

Redshift Tables with Distkeys and Sortkeys – Introduction
Quick Review of Redshift Architecture
Create multi-node Redshift Cluster
Connect to Redshift Cluster using Query Editor
Create Redshift Database
Create Redshift Database User
Create Redshift Database Schema
Default Distribution Style of Redshift Table
Grant Select Permissions on Catalog to Redshift Database User
Update Search Path to query Redshift system tables
Validate table with DISTSTYLE AUTO
Create Cluster from Snapshot to the original state
Overview of Node Slices in Redshift Cluster
Overview of Distribution Styles
Distribution Strategies for retail tables in Redshift
Create Redshift tables with distribution style all
Troubleshoot and Fix Load or Copy Errors
Create Redshift Table with Distribution Style Auto
Create Redshift Tables using Distribution Style Key
Delete Cluster with manual snapshot

Redshift Federated Queries and Spectrum

Redshift Federated Queries and Spectrum – Introduction
Overview of integrating RDS and Redshift for Federated Queries
Create IAM Role for Redshift Cluster
Setup Postgres Database Server for Redshift Federated Queries
Create tables in Postgres Database for Redshift Federated Queries
Creating Secret using Secrets Manager for Postgres Database
Accessing Secret Details using Python Boto3
Reading Json Data to Dataframe using Pandas
Write JSON Data to Database Tables using Pandas
Create IAM Policy for Secret and associate with Redshift Role
Create Redshift Cluster using IAM Role with permissions on secret
Create Redshift External Schema to Postgres Database
Update Redshift Cluster Network Settings for Federated Queries
Performing ETL using Redshift Federated Queries
Clean up resources added for Redshift Federated Queries
Grant Access on Glue Data Catalog to Redshift Cluster for Spectrum
Setup Redshift Clusters to run queries using Spectrum
Quick Recap of Glue Catalog Database and Tables for Redshift Spectrum
Create External Schema using Redshift Spectrum
Run Queries using Redshift Spectrum
Cleanup the Redshift Cluster

Getting Started with Amazon Serverless Redshift

Create Workgroup and Namespace for Amazon Redshift Serverless
Overview of Amazon Redshift Serverless Namespaces and Workgroups
Quick Preview of Amazon Redshift Serverless Dashboard
Validate Amazon Redshift Serverless Workgroup by running a query
Enable Public Accessbility to Redshift Serverless Workgroup
Understand Redshift Serverless Workgroup Capacity measured in RPUs

Setup Redshift Spectrum Schema using Redshift Serverless

Introduction to Setup Redshift Spectrum Database using Redshift Serverless
Setup Files in S3 for Glue Catalog and Redshift Spectrum Database Tables
Cleanup Glue Catalog Database and Crawler using AWS Glue Console
Create Glue Crawler to Setup Glue Catalog Database and Tables for Redshift Shift
Run Glue Crawler to Create Glue Catalog Database and Tables for Redshift Spectu
Create Redshift Serverless Workgroup and Namespace for Redshift Spectrum
Accessing Redshift using Jupyter Based Environment of VS Code
Create Database and User for Data Mart using AWS Redshift Query Editor
Create Database and User for Data Mart using Jupyter Notebooks
Create External Schema in Redshift Database using Glue Catalog Database
Validate External Schema Setup using Redshift Query Editor

Basic SQL Queries using AWS Redshift SQL

Introduction to Basic SQL Queries using AWS Redshift SQL
Overview of Using WITH Clause in Redshift SQL Queries
Overview of Using Views in Redshift SQL Queries
Filtering Data using AWS Redshift SQL
Filtering Data using Boolean AND in Redshift SQL
Filtering Data using LIKE Operator in Redshift SQL
Filtering Data using Boolean OR and IN Operators in Redshift SQL
Overview of Count and Sum using Redshift SQL
Getting Total Average using Redshift SQL
Perform Total Aggregations based on Condition using Redshift SQL
Get Count and Distinct Count using Redshift SQL
Get Sum and Average on Order Item Measures using Redshift SQL
Perform Grouped Aggregations using Redshift SQL
Filtering on Aggregate Results using HAVING on GROUP BY
Overview of Order Of Execution of SQL using Group By and Having
Overview of Joins using Redshift Tables

Integration of AWS EMR with Amazon Redshift

Create AWS EC2 Elastic IP and Key Pair for AWS EMR Cluster
Create Shell Script for AWS EMR Bootstrap Action to install boto3
Create AWS EMR Cluster to integrate with Amazon Redshift
Attach Elastic IP to the AWS EMR Master Node and Validate SSH Connectivity
Setup Project for AWS EMR and Redshift Integration using VS Code Remote Deve
Setup Amazon Redshift Serverless Workgroup and Validate Connetivity
Connect to Redshift Serverless Workgroup from AWS EMR Master using psql
Setup Required Database and User in Amazon Redshift Serverless Workgroup
Install Python Library psycopg2 to connect to Redshift Databases using Python
Validate Redshift Connectivity using Python from AWS EMR Master Node
Create and Validate Redshift Database Tables
Create Secret for Redshift Database using AWS Secrets Manager
Validate Python Boto3 on Master Node of AWS EMR Cluster
Read Secret from AWS Secrets Manager using Python Boto3
Validate Redshift Connectivity from Master Node of AWS EMR Cluster
Launch Pyspark CLI with Redshift Dependencies on AWS EMR Master Node
Validate Redshift Connectivity using Spark on AWS EMR Cluster
Develop Code to Validate Spark and Redshift Integration using EMR
Setup GHActivity Data in AWS s3
Read and Process Data using Pyspark to write into Redshift Table
Develop Write Logic to load Spark Dataframe into Redshift Table
Validate Spark Load Process to Amazon Redshift Table
Understanding AWS s3 Temp Location specified in Spark Applications
Conclusion on Integration of AWS EMR with Amazon Redshift

Develop Applications using Spark SQL on AWS EMR Cluster

Setup AWS EMR Cluster to develop applications using Spark SQL
Setup Visual Studio Code Workspace using AWS EMR Master Node
Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node
Setup Required Data Sets for Spark SQL
Upload Retail DB Files to AWS s3 using AWS CLI commands
Spark SQL and Temporary Views using Spark SQL on AWS EMR Cluster
Create Spark SQL Temporary Views for Orders and Order Items
Join and Aggregate using Spark SQL on AWS EMR Cluster
Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster
Develop Script using Spark SQL Commands
Parameterize Bucket Name in Spark SQL Script
Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node
Deploy Spark SQL Script as Step on AWS EMR Cluster
Conclusion to Develop Spark SQL Applications on EMR Cluster

Develop Applications using Redshift and Python boto3

Introduction to Integration of AWS Lambda Functions and Redshift
Setup Redshift Serverless Workgroup and Namespace
Setup Workspace for Integration of AWS Lambda Functions and Redshift
Validate JSON Data in AWS s3 using Pandas
Get Redshift Cluster Details using Python boto3
Get Redshift Serverless Details using Python Boto3
Run SQL Queries using Redshift Serverless and Python Boto3
Capture Redshift Query Results using Python Boto3
Create Database and User in Redshift Serverless Namespace
Create Table in Redshift Serverless Namespace
Overview of Python Boto3 Waiters
Run Queries against Redshift Table using Boto3 without credentials
Create and Validate Secret using AWS Secrets Manager for Redshift Workgroup
Copy Processed Data from AWS s3 into Redshift Table
Conclusion on Developing Applications using Redshift and Python Boto3

Integration of AWS Lambda Functions and Redshift

Introduction to Integration of AWS Lambda Functions and Redshift
Getting Started with Lambda Function using boto3
Running Lambda Function using AWS Lambda Console
Troubleshoot issues of AWS Lambda Functions using Cloudwatch Logs
Check Python Boto3 Version in AWS Lambda Function Run Time Environment
Overview of adding Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
Copy Zip File with Latest Boto3 to AWS s3 for Lambda Layer
Create Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
Create Function to Copy Data into Redshift Table using boto3
Update Lambda Handler to copy data to Redshift Table
Grant Permissions on Redshift Secret to AWS Lambda Function via IAM Role
Grant Permissions on Redshift Data API to AWS Lambda Function via IAM Role
Review Redshift Workgroup and Truncate Table before running Lambda Function
Run AWS Lambda Function to Copy Data to Redshift Table
Validate Data Copied by AWS Lambda Function in Redshift Table by running queries

Data Pipeline using AWS Step Functions with EMR and Redshift

Introduction to Data Pipeline using AWS Step Functions with EMR and Redshift
Getting Started with State Machines or Data Pipelines using AWS Step Function
Review Execution Details of State Machine or Data Pipeline using AWS Step Functi
Manage State Machines using AWS Step Functions State Machines Dashboard
Create State Machine with AWS Lambda Function to Copy Data From s3 to Redshift
Update State Machine with Permissions on Lambda to Copy Data From s3 to Redshift
Run State Machine with AWS Lambda Function to Copy Data From s3 to Redshift Tab
Overview of Managing AWS EMR Clusters using Boto3
Overview of AWS boto3 to Manage AWS EMR Clusters
Create AWS EMR Job Flow Cluster using Python Boto3
Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
Overview of AWS EMR Waiters using Python Boto3
Terminate AWS EMR Cluster using waiters and Python Boto3
Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
Create State Machine using AWS Step Function to create EMR Cluster
Grant Permissions to State Machine via Role to Create AWS EMR Cluster
Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
Add Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
Pass AWS EMR Step Details as Input to State Machine at Execution Time
Validate Spark SQL Script Execution as AWS EMR Step using State Machine
Create Data Pipeline with EMR and Redshift Integration using AWS Step Function
Grant Permissions on AWS EMR to role of State Machine with EMR and Redshift Inte
Run AWS Step Function State Machine with EMR and Redshift Integration
Validate AWS State Machine Execution with EMR and Redshift Integration
Best Practices to Build State Machines with AWS EMR and Redshift Integration