What you will learn

Understanding of the entire data integration process using PDI

Extracting data from all popular data sources including Excel, JSON, Zipped files, TXT files and even cloud storage

Cleaning the data using Pentaho Data Integration

Applying business rules on the data in PDI

Different types of Data transformations

Loading the data into different formats

Managing SQL databases using PDI

Metadata Injection – a powerful tool offered by PDI

Understanding of the concepts of data marts and data warehouse

Description

What is ETL?

The ETL (extract, transform, load) process is the most popular method of collecting data from multiple sources and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics.
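The three stages can be sketched in a few lines of plain Python. This is only an illustration of the extract, transform, load idea and not how PDI itself works (PDI builds these pipelines graphically in Spoon). The sample data, the `sales` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for one source file.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,carol,7.25
"""

def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, convert amounts, drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Run the three stages in order and return what landed in the table."""
    load(transform(extract(RAW_CSV)), conn)
    query = "SELECT order_id, customer, amount FROM sales ORDER BY order_id"
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

Keeping each stage in its own function mirrors the way an ETL tool separates input, cleaning and output steps, so any one stage can be swapped out without touching the others.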

Why Pentaho for ETL?

Pentaho has phenomenal ETL, data analysis, metadata management and reporting capabilities. Pentaho is faster than other ETL tools (including Talend). Its user-friendly GUI is easy to use and quick to learn, which makes Pentaho a great choice for beginners. Pentaho Data Integration (PDI) is also an important skill in the data analytics field.

How much can I earn?

In the US, the median salary of an ETL developer is $74,835 per year, and in India the average salary is Rs. 7,06,902 per year. Accenture, Tata Consultancy Services, Cognizant Technology Solutions, Capgemini, IBM and Infosys are among the major recruiters of people skilled in ETL tools, and Pentaho ETL is one of the most sought-after skills that recruiters look for. Demand for Pentaho Data Integration (PDI) skills is increasing day by day.

What makes us qualified to teach you?

The course is taught by Abhishek and Pukhraj, who have been teaching Data Science and Machine Learning for over a decade. We have experience in teaching and implementing Pentaho ETL and Pentaho Data Integration (PDI) for data mining and data analysis purposes.

We are also the creators of some of the most popular online courses – with over 150,000 enrollments and thousands of 5-star reviews like these:

I had an awesome moment taking this course. It broaden my knowledge more on the power use of Excel as an analytical tools. Kudos to the instructor! – Sikiru

Very insightful, learning very nifty tricks and enough detail to make it stick in your mind. – Armand


Our Promise

Teaching our students is our job and we are committed to it. If you have any questions about the course content on Pentaho and ETL, the practice sheets or any related topic, you can always post a question in the course or send us a direct message.

Download Practice files, take Quizzes, and complete Assignments

Each lecture comes with a practice sheet for you to follow along. You can also take quizzes to check your understanding of ETL and Pentaho Data Integration concepts. Each section contains a practice assignment so that you can apply what you have learned. The solution to each assignment is also shared so that you can review your performance.

By the end of this course, your confidence in using Pentaho ETL and Pentaho Data Integration (PDI) will soar. You’ll have a thorough understanding of how to use Pentaho for ETL and Pentaho Data Integration (PDI) techniques for study or as a career opportunity.

Go ahead and click the enroll button, and I’ll see you in lesson 1 of this Pentaho ETL course!

Cheers

Start-Tech Academy

Language: English

Content

Introduction

Welcome to the course
Course Resources

Pentaho Data Integration (PDI) Installation and Setup

Setting up environment and installing PDI
Opening Spoon – The Graphical UI

A Simple ETL Demonstration

The example problem statement
Demonstration of a PDI transformation
Demonstration of a PDI Job

The ETL process: The practical part begins here

Data and the ETL process

DATA EXTRACTION: Extracting tabular data

Manually entering data into PDI
Inputting Data from a TXT (text) file
Input from multiple CSV files at the same time
Inputting Data from an Excel file
Extracting Data from Zipped files

DATA EXTRACTION: Extracting non-tabular data

Extracting from XML
Extracting from JSON

Extracting from an SQL table

Plan for importing sales Data
Creating Sales table in SQL
Extracting from an SQL table

Storing and Retrieving Data from Cloud storage

Storing Data on AWS S3
Reading data from AWS S3

Merging Data Streams

Concepts: Merging Data Streams
Sorted Merge Step

Data Cleansing

Introduction to Data Cleansing
Value Mapper Step
Replace in String Step
Fuzzy Match concepts
Fuzzy Match Step in PDI
Fuzzy Match Algorithms
Formula Step and changing data format
Common Data Cleaning Steps

Data Validation

Introduction to Data validation
Data validation 1 – String-to-Int and integer range validations
Data validation 2 – Checking Reference Values using stream look-up
Data validation 3 – Order date < shipping date using calculator step
Common Data Validation steps

Error Handling

Correcting the errors and merging with main stream
Writing the errors to the log
Writing the errors to a separate file

Transformation and Analytics steps

Concatenating Address Fields
Data Aggregation using Group-by
Normalization and Denormalization
Number Range Step