Big Data and Hadoop Training Introduction
Introduction to Big Data Hadoop
Scenario of Big Data Hadoop
Write Anatomy
Continuation of Write Anatomy
Read Anatomy
Continuation of Read Anatomy
Word Count in Hadoop
Running Hadoop Application
Continuation Hadoop Application
Working on Sample Program
Creating Method Map
Iterable Values
Output Path
Try Catch Block
Hadoop Architecture and HDFS
Introduction to Hadoop Admin
Limitations of Existing System
Hadoop Key Characteristics
Hadoop Distributed File System
Storage Layer of Hadoop
Hadoop 1.0 Core Components
FS Images
Secondary Name Node
HDFS Architecture
Block Placement Policy
Assignments
Hadoop Architecture Cluster Setup
Installation of Hadoop in VMware Workstation
Hadoop Package Installation
Configuration of Host Name and Gateway
Copying of ISO File to CentOS
Installation of SSH File Using Yum
Copy the Public Key to Authorized Key in SSH
Setup for Block Size and Mapped
Create ssh-keygen for HD User
Start the Map Reduce in Hadoop
Creating a Clone for Hadoop
Changing the Hostname
Configuring Hadoop Site
Slave File Configuration
Creating Name Node and Data Node in Hadoop
Understanding HDFS
Hadoop Core Config Files
Hadoop Cluster and Passwordless SSH
Configuring Rack Awareness
Configuring Rack Awareness Continues
Running DFS Admin Report
Hadoop Map Reduce
Running Hadoop NameNode
Executing Hadoop Command
Writing File in Hadoop Cluster
Understanding FS Command
Directories of Data
File System Check
Writing Data in HDFS
Checkpointing Node
Merging the Metadata
Cluster in Safe Mode
Cluster in Maintenance Mode
Commissioning of Data Nodes
Name Node
Validating the Data Node
Storage Considerations
MapReduce Fundamentals
Secondary Sort Hadoop
Creating Composite Key
Continue on Composite Key
Word Count Group
Importance of Partition
Hadoop FS – LS
Joins in Hadoop
Creating Configuration Object
Setup Method
Map Side Join Mapper
Hadoop Commands
Combiner in Hadoop
Continue on Combiner in Hadoop
Uploading Combiner Jar
Introduction to Real World
Ratings Mapper
Movie and Ratings Runner
Movie and Rating Calc Jar
Total Ratings By A User
User Rating Reducer
User Rating Class
Yarn Basic Tutorial
Node Manager
MapReduce Advanced
Running a MapReduce Program
Running a MapReduce Program Continues
HDFS File System
Combination of Word Count Functionality
Word Count With Tools
Log Processor
Advanced MapReduce and PIG
More on Advanced MapReduce
Executing Similar Program
HDI Data and Export Data
Creating New Java Class
Text Out Inverted Indexer
Introduction to MapReduce on Hadoop
Java Build Path
Local MapReduce
Using MapReduce
Sequence File Format
Parse Weblogs
Page View Mapper
Analytics Program
Analytics Program Continue
Inverted Index Map Reduce
Friend of a Friend
Cloudera Local Host
Cloudera Local Host Output
Final Module MapReduce Program
Strands
File Path Filter
Example
Example Continue
HIVE Fundamentals
Introduction to HIVE
HIVE Database
Load Data Command
How to Replace Column
External Table
HIVE Metastore
What is Hive Partition
Creating Partition Table
Insert Overwrite Table
Dynamic Partition True
Hive Bucketing
Decomposing Data Sets
Hive Joins
Hive Joins Continue
Skew Join
What is Serde
Serde in Hive
Hive UDF
Hive UDF Continues
More Hive UDF
Maxcale Function
Hive Example Use Case
Hive Advanced
Introduction to Hive Concepts and Hands-on Demonstration
Internal Table and External Table
Inserting Data Into Tables
Date and Mathematical Functions
Conditional Statements
Explode and Lateral View
Sorting
Join
Map Join
Static and Dynamic Partitioning
More on Dynamic Partitioning
Alter Command
MSCK Command
Bucketing
Table Sampling
Archiving
Ranks
Creating Views
Advantages of Views and Altering Views
What is Indexing
Compact and Bitmap Index Running Time
Hive Commands in Bash Shell
Hive Variables – Hiveconf
Hive Variables – Hiveconf in Bash Shell
Configuring a Hive Var Variable
Variable Substitution
Word Count
Hive Architecture
Parallelism in Hive
Table Properties in Hive
Null Format Properties
Null Format Properties Continues
Purge Commands in Hive
Slowly Changing Dimension
Implement the SCD
Example of the SCD
How to Load XML Data in Hive
How to Load XML Data in Hive Continue
No Drop and Offline in Hive
Immutable Table
How to Create Hive RC File
Multiple Tables
Merging Hive Created Files and Function rLike
Various Configuration Settings in Hive
Various Configuration Settings in Hive Continues
Compressing Various Files in Hive
Different Modes in Hive
File Compression in Hive
Type of Mode in Hive
Comparison of Internal and External Table
PIG Fundamentals
Introduction to Pig
Features of Apache Pig
Pig Vs Hive
Apache Pig Local and MR Modes
Launching Local Modes
Data Types in Pig
Pig Commands – Store and Load
Load Command
Pig Commands – Group
CoGroup Operator
Join and Cross Operators in Pig
Join and Cross Operators in Pig Continues
Union and Split Operators in Pig
More on Split Operators
Filter, Distinct and Foreach
Pig Functions
Pig Functions Continues
Input Data Size
PIG Advanced
Getting Started with PIG
Installation Process
PIG Latin
Uploading the File in HDFS
PIG Script
PIG Latin Basics
Up and Running with Pig
Loading and Storage
Loading and Storage Continue
Debugging
Grunt Shell
UDFs and Piggy Bank
NoSQL Fundamentals
A Brief History of NoSQL
Schema Agnostic
Nonrelational
Enterprise NoSQL
Recent Trends in IT
NoSQL Benefits and Precautions
Managing Different Data Types
Triple and Graph Store
Hybrid NoSQL Databases
Applying Consistency Method
Choosing ACID or BASE?
Developing Application on NoSQL
Semantics
Public Cloud
Managing Availability
Versioning Data
Apache Mahout
What is Mahout
Mahout Architecture
Subversion Installation
Item Based Recommendation
Example – CBayes Classifier
Command Line Options
Canopy Clustering
Basic Recommender
Practical Examples
Mahout Seqdumper Command
Running Code through Eclipse
Reading from Code
Introduction to Apache Mahout Deep Dive
Use Cases
Recommendation
Example – Tanimoto Distance
How to Use Mahout?
Exercise
Example – Evaluation
Deep Dive Canopy Clustering
Classification
Vector File
Naïve Bayes Classifier from Code
KMeans Clustering
Logistic Regression
Apache Oozie
Introduction to Apache Oozie
Discuss Action in Detail
Discuss Parameters
Email Action in Oozie
Hadoop FS Action in Oozie
Hive Action in Oozie
Hive Action in Oozie Continue
Control Node
Control Node Continue
Pig Action in Oozie
Pig Action in Oozie Continues
Oozie Coordinators
Oozie Workflow Applications
Oozie Workflow Applications Continues
Apache Flume
Introduction to Flume
Data Flow in Flume
Flume Netcat Example
Apache Storm
Introduction
Description of Hadoop
Storm Introduction
Apache Storm History
Features of Apache Storm
Architecture of Apache Storm
Architecture Explanation in Detail
Topology
Spouts and Bolts
Stream
Installation Process
Stream Grouping
Stream Grouping Continue
Reliability
Tasks
Workers
Java Installation and Zookeeper
Zookeeper Installation
Eclipse Installation
Command Line Client
Parallelism in Storm Topology
Apache Avro
Introduction to Apache Avro
Using Avro with Sqoop
Supported Primitive Data Types in Avro
Apache Spark Fundamentals
Introduction to Apache Spark
Spark Context
Spark Components
Introduction to Spark RDD Basics
Use of Filter Function
RDD Transformations in Spark
RDD Transformations in Spark Continues
RDD Persistence in Spark
Group Sort and Actions on Pair RDDs
Spark File Formats
Spark File Formats Continues
Apache Spark Advanced
Introduction to Connecting to Twitter Using Spark
Flowchart of Spark
Components of Spark
Different Services Running on YARN
Introduction to Scala
Case Classes and Pattern Matching
Installation of Scala
Variables and Functions
Variables and Functions Continues
Loops
Collections
More on Collections
Abstract Class
Example of the Abstract Class
Trait
Example of the Trait
Exception
Practical Example of Exceptions
Customize Exceptions of Scala Project
Modifiers
Strings
Methods in Strings
Methods in Strings Continue
Array
RDD in Spark
RDD in Spark Continues
Different Operations
Transformation Operations
Action Operations
Action Operations Continues
Maven Creation
Create Scala Project
Difference between Hadoop 1.x and 2.x
Connection to Twitter Using Spark Streaming
How to Connect Twitter Using Spark Application
More on Connect Twitter Using Spark Application
Hadoop Project 01 – Sales Data Analysis
Introduction to Sales Data Analysis Using Hadoop – HDFS
Working with Problem Statement 2
Working with Problem Statement 3
Working with Problem Statement 4
Working with Problem Statement 5
Working with Problem Statement 6
Hadoop Project 02 – Tourism Survey Analysis
Introduction to Tourism Survey Analysis Using HDFS
Average Money Spent by Tourists in Our Country
Join Country and Nationality
Total No. of Tourists Less than 18
Change the Country Name Column
Number of Males from Australia
Tourism Survey General Detail and Spending Details
Hadoop Project 03 – Faculty Data Management
Introduction to Faculty Data Management Using HDFS
Education Industry
Adding New Column in Faculty Database Management
Changing Column Name and Data Type
Drop Column From Table and Add New Column
Hadoop Project 04 – E-Commerce Sales Analysis
Introduction to E-Commerce Sales Analysis Using Hadoop
Customer Detail not from USA
Customer Detail Account Created After 2009
Customer Details whose Sales are Less than 3600$
Details of Customer Named 'Anushka'
Hadoop Project 05 – Salary Analysis
Part time Employee using Salary Analysis
Details of Administrative Assistance
Data Sets in Ascending Order
Job Title for Each Department
Changing Name to Employee Name
Total Number of Employees on Hourly Basis
Annual Salary Taken By Finance Department
Hadoop Project 06 – Health Survey Analysis using HDFS
Introduction to Health Analysis
Show Rows Data From Health Data Table
Adding New Data in Health Data Table
Get Data From HDFS Database from SQL Database
Getting Data in New HDFS Directory from SQL
Export Data Table From HDFS to SQL
Get Details of City Population in Health Dataset
Hadoop Project 07 – Traffic Violation Analysis
Introduction to Traffic Violation Analysis
Introduction to Traffic Violation Analysis Continues
Get Table From SQL to HDFS Directory
Output of Table From SQL to HDFS Directory
List Databases and Tables of SQL in HDFS
Create and Execute jobs in Traffic Violation
Import Data for Personal Injuries from SQL
Get Data For State Maryland
Extract Data of Traffic Violation from HDFS to MySQL
Hadoop Project 08 – PIG/MapReduce – Analyze Loan Dataset
Introduction to Analyze the Loan Data Set
Introduction to Analyze the Loan Data Set Continues
Overall Average Risk
Coding Average Risk
Coding Average Risk Continues
More on Average Risk
Average Risk Per Location
Average Risk per Loan Type
Calculate Average Risk Per Category
Calculate Average Risk Per category Continues
Comparable Interface in MapReduce
Implementation and Execution MapReduce
Average Risk Per Category in PIG
Average Risk Per Category and Location in PIG
Average Risk Per Category and Location in PIG Continues
Average Risk Per Category in Hive
Analysis Bank Loan Dataset in HIVE
Analysis Bank Loan Dataset in HIVE Continues
Understanding of Sqoop and Get RDBMS Data in HDFS
Hadoop Project 09 – HIVE – Case Study on Telecom Industry
Introduction of Hive
Simple and Complex Datatype in Hive
Clusters
Database Command in Hive
Tables Commands in Hive
Manage Tables
External Tables
Introduction to Partitioning
Partition Command
Bucketing
Table Contr Services in Hive
Example of Contr Services
Example of Contr Services Continues
Creating Contract All Table
Hadoop Project 10 – HIVE/MapReduce – Customers Complaints Analysis
Introduction to Customer Complaint Project in Big Data
Complaint Filed Under Each File
Creating Driver Files and Jar Manifest
Creating Driver Files and Jar Manifest Continues
Complaint Filed from Particular Location
User Defined Location
List of Complaint Grouped By Location
Hadoop Project 11 – HIVE/PIG/MapReduce/Sqoop – Social Media Analysis
Introduction to Social Media Industry
Book Marking Website
Book Marking Website Continues
Understanding Sqoop
Get Data from RDBMS to HDFS
Execute Map Reduce Program in order to Process XML File
Analyze Book Performance By Reviews Using Code
Analyze Book Performance By Reviews Using Code Continues
Analyse Book By Location
Example of Analyse Book By Location
Analyse Book Reader Against Author
How to process XML File in PIG
How to process XML File in PIG Continues
Analyze Book Performance in XML File in PIG
More on Analyze Book Performance in XML File in PIG
Pig XML File Output Using Book
Pig XML File Output Using Location
Pig XML File Output Using Location Continues
Understanding Complex Data Set Using Hive
Understanding Complex Data Set Using Hive Continues
Create Array in Map Reduce Using Hive
Book Marking Type Data Set Using Complex Type
Output of Book Marking Type Data Set
Hadoop Project 12 – HIVE/PIG – Sensor Data Analysis
Introduction to Sensor Data Analysis
Introduction to Sensor Data Analysis Continues
Example of Sensor Data Analysis
Understanding Basics of Big Data and MapReduce
More on Big Data and MapReduce
Converting Json File into Simple Text Format
Converting Json File into Simple Text Format Continues
Output for Json File format
Difference Between Pig, MapReduce and Hive
More on Pig, MapReduce and Hive
Sensor Data Processing in Pig
Working With Pig Function
Types of Function in Pig
Example of Pig Function
Working on Use Cases Using Functions in PIG
Use Case Data Flow in Pig
Ratio Data Flow in Pig
More on Use Case in Pig
More on Use Case in Pig Continues
Example of Ratio Education in Pig
Approach Process the Json File in Hive
Features and Query in Hive
Work on Json Use Cases Using Hive
Work on Json Use Cases Using Hive Continues
Output of Json Use Cases Using Hive
More on Json Use Cases in Hive
Summary of Sensor Data Processing
Hadoop Project 13 – PIG/MapReduce – Youtube Data Analysis
Introduction to Youtube Data Analysis Using Hadoop
Introduction to Youtube Data Analysis Using Hadoop Continues
Data Preparation For Youtube Data Analysis using Hadoop
Basics of Big Data and Map Reduce
More on Big Data and Map Reduce
Working with Analysis Scenario using Map Reduce
Example of Youtube Analyser using Map Reduce
Output of Youtube Analyser in Map Reduce
High Rated Youtube Video Analyser in Map Reduce
Implementation and Output in Map Reduce
Basics of PIG
Basics of PIG Continues
Analyze Youtube Data using PIG Implementation
Example of PIG Implementation
Output of PIG Implementation
Youtube Video Analyzer using Hive
Creating Youtube Video Analyzer using Hive
Analysis Youtube Videos using Hive Query
Analysis Youtube Videos using Hive Query Continues
More on Hive Query Languages
Conclusion
Hadoop and HDFS Fundamentals on Cloudera
What is Big Data?
Processing Big Data
Distributed storage and processing
Understanding Map Reduce
Introduction to module 2
Introduction to Cloudera environment
Understanding Hadoop environment installed on Cloudera
Understanding metadata configuration on Hadoop
Understanding HDFS web UI and HUE
HDFS shell Commands
Few more HDFS shell Commands
Accessing HDFS through Java program
Log Data Analysis with Hadoop
Introduction to Log Processing
Summarizing Log Files
MapReduce Program
Execute MapReduce Program
Big Data Technology
Executing Big Data Tool
Writing Map Reduce Program
Array List Searching
Processing Files In Map Reduce
Conclusion