

Big Data, Hadoop, MapReduce, HDFS, HIVE, PIG, Mahout, NoSQL, Oozie, Flume, Storm, Avro, Spark, Sqoop, Cloudera and more

What you will learn

Learn the concepts of Hadoop and Big Data

Learn in detail the concepts of MapReduce, HDFS, HIVE, PIG

Learn Mahout, NoSQL, Oozie, Flume, Storm, Avro, Spark, Sqoop, Cloudera and more

Perform Data Analytics using Hadoop

Master the concepts of Hadoop framework

Get experience with different configurations of a Hadoop cluster

Work with real-time projects using Hadoop

Description

Learn from well-crafted study materials on Big Data, Hadoop, MapReduce, HDFS, HIVE, PIG, Mahout, NoSQL, Oozie, Flume, Storm, Avro, Spark, Sqoop, Cloudera, Data Analysis, Survey Analysis, Data Management, Sales Analysis, Salary Analysis, Traffic Analysis, Loan Analysis, Log Data Analysis, YouTube Data Analysis, and Sensor Data Analysis. Learn by doing, with hands-on examples of analyzing big data. The course suits a mixed audience, ranging from developers to data scientists who use procedural languages in the Hadoop space. Discover and learn the fundamentals of Hadoop, and become comfortable managing the development and deployment of Hadoop applications.

What is Big Data

Big data is a collection of large datasets that cannot be processed using traditional techniques. Various tools and techniques are used to collect and process this data, which includes structured, semi-structured, and unstructured data. Big data comes from many fields, such as:

  • Black box data
  • Social media data
  • Stock exchange data
  • Power Grid Data
  • Transport Data
  • Search Engine Data

Benefits of Big Data

Big data has become very important and is emerging as one of the crucial technologies in today's world. The benefits of big data are listed below:

Companies can use big data to measure the effectiveness of their marketing campaigns, promotions, and other advertising media

Big data helps companies plan their production

Using the information provided through big data, companies can deliver better and quicker service to their customers

Big data supports better decision making, which increases operational efficiency and reduces business risk

Big data handles huge volumes of data in real time and thus enables data privacy and security to a great extent

Challenges faced by Big Data

The major challenges of big data are as follows:




  • Curation
  • Storage
  • Searching
  • Transfer
  • Analysis
  • Presentation

What is Hadoop

Hadoop is an open source software framework used for storing data of any type and for running applications on clusters of commodity hardware. It offers large processing power and can handle a very large number of concurrent tasks. Open source here means it is free to download and use, although commercial distributions of Hadoop are also available in the market. Hadoop has four basic components: Hadoop Common, Hadoop Distributed File System (HDFS), MapReduce, and Yet Another Resource Negotiator (YARN).
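
To make the MapReduce component more concrete, here is a minimal sketch of the programming model in plain Python. It does not use Hadoop itself: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names chosen for illustration, and everything runs in memory on a single machine, whereas Hadoop distributes the same three steps across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on in-memory lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

In a real Hadoop job the map and reduce functions would typically be written against the Hadoop Java API, and the shuffle step would be handled by the framework rather than by user code.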

Benefits of Hadoop Course

Many organizations use Hadoop because of its ability to store and process huge amounts of any type of data. Other benefits of Hadoop include:

  • Computing Power
  • Flexibility
  • Fault Tolerance
  • Low Cost
  • Scalability
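
Fault tolerance and scalability in the list above come largely from HDFS storing each file as blocks that are replicated across several machines. The sketch below illustrates that idea only. It assumes a 128 MB block size and a replication factor of 3 (common HDFS defaults), uses hypothetical node names, and places replicas round-robin, which is a simplification and not the rack-aware policy that HDFS actually applies.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128      # assumed block size (a common HDFS default)
REPLICATION_FACTOR = 3   # assumed replication factor (a common HDFS default)


def place_blocks(file_size_mb: int, nodes: List[str]) -> Dict[int, List[str]]:
    """Split a file into blocks and assign each block to distinct nodes.

    Round-robin placement is a simplification for illustration; real HDFS
    uses a rack-aware placement policy.
    """
    if len(nodes) < REPLICATION_FACTOR:
        raise ValueError("need at least as many nodes as replicas")
    block_count = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    placement: Dict[int, List[str]] = {}
    for block_id in range(block_count):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(REPLICATION_FACTOR)
        ]
    return placement


def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    """A file stays readable if every block keeps at least one live replica."""
    return all(
        any(node != failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    # Hypothetical four-node cluster holding one 300 MB file.
    layout = place_blocks(300, ["node1", "node2", "node3", "node4"])
    print(layout)
    print(readable_after_failure(layout, "node2"))
```

Adding more nodes to the list spreads blocks over more machines, which is the sense in which storage scales out rather than up.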

Uses of Hadoop

Many organizations use Hadoop today for the following purposes:

Low cost storage and active data archive

Staging area for a data warehouse and analytics store

Data lake

Sandbox for discovery and analysis

Recommendation Systems

Language: English

Content

Big Data and Hadoop Training Introduction

Introduction to Big Data Hadoop
Scenario of Big Data Hadoop
Write Anatomy
Continuation of Write Anatomy
Read Anatomy
Continuation of Read Anatomy
Word Count in Hadoop
Running Hadoop Application
Continuation of Hadoop Application
Working on Sample Program
Creating Method Map
Iterable Values
Output Path
Scary Catch Box

Hadoop Architecture and HDFS

Introduction to Hadoop Admin
Limitations of Existing System
Hadoop Key Characteristics
Hadoop Distributed File System
Storage Layer of Hadoop
Hadoop 1.0 Core Components
FS Images
Secondary Name Node
HDFS Architecture
Block Placement Policy
Assignments
Hadoop Architecture Cluster Setup
Installation of Hadoop in Vmware Workstation
Hadoop Package Installation
Configuration of Host Name and Gateway
Copying of ISO File to Centos
Installation of SSH File Using Yum
Copy the Public Key to Authorized Key in SSH
Setup for Block Size and Mapped
Create SSH -keygen for HD User
Start the Map Reduce in Hadoop
Creating a Clone for Hadoop
Changing the Hostname
Configuring Hadoop Site
Slave File Configuration
Creating Name node and Data Node In Hadoop
Understanding HDFS
Hadoop Core Config Files
Hadoop Cluster and Password less SSH
Configuring Rack Awareness
Configuring Rack Awareness Continues
Running DFS Admin Report
Hadoop Map Reduce
Running Hadoop NameNode
Executing Hadoop Command
Writing File in Hadoop Cluster
Understanding FS Command
Directories of Data
File System Check
Writing Data in HDFS
Checkpointing Node
Merging the Metadata
Cluster in Safe Mode
Cluster in Maintenance Mode
Commissioning of Data Nodes
Name Node
Validating the Data Node
Storage Considerations

MapReduce Fundamentals

Secondary Sort Hadoop
Creating Composite Key
Continue on Composite Key
Word Count Group
Importance of Partition
Hadoop FS – LS
Joins in Hadoop
Creating Configuration Object
Setup Method
Map Side Join Mapper
Hadoop Commands
Combiner in Hadoop
Continue on Combiner in Hadoop
Uploading Combiner Jar
Introduction to Real World
Ratings Mapper
Movie and Ratings Runner
Movie and Rating Calc Jar
Total Ratings By A User
User Rating Reducer
User Rating Class
Yarn Basic Tutorial
Node Manager

MapReduce Advanced

Running a MapReduce Program
Running a MapReduce Program Continues
HDFS File System
Combination of Word Count Functionality
Word Count With Tools
Log Processor
Advanced MapReduce and PIG
More on Advanced MapReduce
Executing Similar Program
HDI Data and Export Data
Creating New Java Class
Text Out Inverted Indexer
Introduction to MapReduce on Hadoop
Java Build Path
Local MapReduce
Using MapReduce
Sequence file Format
Parse Weblogs
Page View Mapper
Analytics Program
Analytics Program Continue
Inverted Index Map Reduce
Friend of a Friend
Cloudera Local Host
Cloudera Local Host Output
Final Module MapReduce Program
Strands
File Path Filter
Example
Example Continue

HIVE Fundamentals

Introduction to HIVE
HIVE Data Base
Load Data Command
How to Replace Column
External Table
HIVE Metastore
What is Hive Partition
Creating Partition Table
Insert Overwrite Table
Dynamic Partition True
Hive Bucketing
Decomposing Data Sets
Hive Joins
Hive Joins Continue
Skew Join
What is Serde
Serde in Hive
Hive UDF
Hive UDF Continues
More Hive UDF
Maxcale Function
Hive Example Use Case

Hive Advanced

Introduction to Hive Concepts and Hands-on Demonstration
Internal Table and External Table
Inserting Data Into Tables
Date and Mathematical Functions
Conditional Statements
Explode and Lateral View
Sorting
Join
Map Join
Static and Dynamic Partitioning
More on Dynamic Partitioning
Alter Command
MSCK Command
Bucketing
Table Sampling
Archiving
Ranks
Creating Views
Advantages of views and Altering Views
What is Indexing
Compact and Bitmap Index Running Time
Hive Commands in Bash Shell
Hive Variables – Hiveconf
Hive Variables – Hiveconf in Bash Shell
Configuring a Hive Var Variable
Variable Substitution
Word Count
Hive Architecture
Parallelism in Hive
Table Properties in Hive
Null Format Properties
Null Format Properties Continues
Purge Commands in Hive
Slowly Changing Dimension
Implement the SCD
Example of the SCD
How to Load XML Data in Hive
How to Load XML Data in Hive Continue
No Drop and Offline in Hive
Immutable Table
How to Create Hive RC File
Multiple Tables
Merging Hive Created Files and Function rLike
Various Configuration Settings in Hive
Various Configuration Settings in Hive Continues
Compressing Various Files in Hive
Different Modes in Hive
File Compression in Hive
Type of Mode in Hive
Comparison of Internal and External Table

PIG Fundamentals

Introduction to Pig
Features of Apache Pig
Pig Vs Hive
Apache Pig Local and MR Modes
Launching Local Modes
Data Types in Pig
Pig Commands – Store and Load
Load Command
Pig Commands – Group
CoGroup Operator
Join and Cross operators in Pig
Join and Cross operators in Pig Continues
Union and Split Operators in Pig
More on Split Operators
Filter Distinct and For each
Pig Functions
Pig Functions Continues
Input Data Size

PIG Advanced

Getting Started with PIG
Installation Process
PIG Latin
Uploading the File in HDFS
PIG Script
PIG Latin Basics
Up and Running with Pig
Loading and Storage
Loading and Storage Continue
Debugging
Grunt Shell
UDFs and Piggy Bank

NoSQL Fundamentals

A Brief History of NoSQL
Schema Agnostic
Nonrelational
Enterprise NoSQL
Recent Trends in IT
NoSQL Benefits and Precautions
Managing Different Data Types
Triple and Graph Store
Hybrid NoSQL Databases
Applying Consistency Method
Choosing ACID or BASE?
Developing Application on NoSQL
Semantics
Public Cloud
Managing Availability
Versioning Data

Apache Mahout

What is Mahout
Mahout Architecture
Subversion Installation
Item Based Recommendation
Example- CBayes Classifier
Command Line Options
Canopy Clustering
Basic Recommender
Practical Examples
Mahout Seqdumper Command
Running Code through Eclipse
Reading from Code
Introduction to Apache Mahout Deep Dive
Use Cases
Recommendation
Example – Tanimoto Distance
How to Use Mahout?
Exercise
Example – Evaluation
Deep Dive Canopy Clustering
Classification
Vector File
Naïve Bayes Classifier from Code
KMeans Clustering
Logistic Regression

Apache Oozie

Introduction to Apache Oozie
Discuss Action in Detail
Discuss Parameters
Email Action in Oozie
Hadoop FS Action in Oozie
Hive Action in Oozie
Hive Action in Oozie Continue
Control Node
Control Node Continue
Pig Action in Oozie
Pig Action in Oozie Continues
Oozie Coordinators
Oozie Workflow Applications
Oozie Workflow Applications Continues

Apache Flume

Introduction to Flume
Data Flow in Flume
Flume Netcat Example

Apache Storm

Introduction
Description of Hadoop
Storm Introduction
Apache Storm History
Features of Apache Storm
Architecture of Apache Storm
Architecture Explanation in Detail
Topology
Spouts and Bolts
Stream
Installation Process
Stream Grouping
Stream Grouping Continue
Reliability
Tasks
Workers
Java Installation and Zookeeper
Zookeeper installation
Eclipse Installation
Command line Client
Parallelism in Storm Topology

Apache Avro

Introduction to Apache Avro
Using Avro with Sqoop
Supported Primitive Data Types in Avro

Apache Spark Fundamentals

Introduction to Apache Spark
Spark Context
Spark Components
Introduction to Spark RDD Basics
Use of Filter Function
RDD Transformations in Spark
RDD Transformations in Spark Continues
RDD Persistence in Spark
Group Sort and Actions on Pair RDDs
Spark File Formats
Spark File Formats Continues

Apache Spark Advanced

Introduction to Connecting to Twitter Using Spark
Flowchart of Spark
Components of Spark
Different Services Running on YARN
Introduction to Scala
Case Classes and Pattern Matching
Installation of Scala
Variables and Functions
Variables and Functions Continues
Loops
Collections
More on Collections
Abstract Class
Example of the Abstract Class
Trait
Example of the Trait
Exception
Practical Example of Exceptions
Customize Exceptions of Scala Project
Modifiers
Strings
Methods in Strings
Methods in Strings Continue
Array
RDD in Spark
RDD in Spark Continues
Different Operations
Transformation Operations
Action Operations
Action Operations Continues
Maven Creation
Create Scala Project
Difference between Hadoop 1.x and 2.x
Connection to Twitter Using Spark Streaming
How to Connect Twitter Using Spark Application
More on Connect Twitter Using Spark Application

Hadoop Project 01 – Sales Data Analysis

Introduction to Sales Data Analysis Using Hadoop- HDFS
Working with Problem Statement 2
Working with Problem Statement 3
Working with Problem Statement 4
Working with Problem Statement 5
Working with Problem Statement 6

Hadoop Project 02 – Tourism Survey Analysis

Introduction to Tourism Survey Analysis Using HDFS
Average of Money Spent By Tourists in our Country
Join Country and Nationality
Total No. of Tourists Less than 18
Change the Country Name Column
Number of Males from Australia
Tourism Survey General Detail and Spending Details

Hadoop Project 03 – Faculty Data Management

Introduction to Faculty Data Management Using HDFS
Education Industry
Adding New Column in Faculty Database Management
Changing Column Name and Data Type
Drop Column From Table and Add New Column

Hadoop Project 04 – E-Commerce Sales Analysis

Introduction to E-Commerce Sales Analysis Using Hadoop
Customer Detail not from USA
Customer Detail Account Created After 2009
Customer Details whose Sales are Less than $3600
Details of Customer Named 'Anushka'

Hadoop Project 05 – Salary Analysis

Part time Employee using Salary Analysis
Details of Administrative Assistance
Data Sets in Ascending Order
Job Title for Each Department
Changing Name to Employee Name
Total number of Employee in Hourly Basis
Annual Salary Taken By Finance Department

Hadoop Project 06 – Health Survey Analysis using HDFS

Introduction to Health Analysis
Show Rows Data From Health Data Table
Adding New Data in Health Data Table
Get Data From HDFS Database from SQL Database
Getting Data in New HDFS Directory from SQL
Export Data Table From HDFS to SQL
Get Details of City Population in Health Dataset

Hadoop Project 07 – Traffic Violation Analysis

Introduction to Traffic Violation Analysis
Introduction to Traffic Violation Analysis Continues
Get Table From SQL to HDFS Directory
Output of Table From SQL to HDFS Directory
List Databases and Tables of SQL in HDFS
Create and Execute jobs in Traffic Violation
Import Data for Personal Injuries from SQL
Get Data For State Maryland
Extract Data of Traffic Violation from HDFS to MySQL

Hadoop Project 08 – PIG/MapReduce – Analyze Loan Dataset

Introduction to Analyze the Loan Data Set
Introduction to Analyze the Loan Data Set Continues
Overall Average Risk
Coding Average Risk
Coding Average Risk Continues
More on Average Risk
Average Risk Per Location
Average Risk per Loan Type
Calculate Average Risk Per Category
Calculate Average Risk Per category Continues
Comparable Interface in MapReduce
Implementation and Execution MapReduce
Average Risk Per Category in PIG
Average Risk Per Category and Location in PIG
Average Risk Per Category and Location in PIG Continues
Average Risk Per Category in Hive
Analysis Bank Loan Dataset in HIVE
Analysis Bank Loan Dataset in HIVE Continues
Understanding of Sqoop and Get RDBMS Data in HDFS

Hadoop Project 09 – HIVE – Case Study on Telecom Industry

Introduction of Hive
Simple and Complex Datatype in Hive
Clusters
Database Command in Hive
Tables Commands in Hive
Manage Tables
External Tables
Introduction to Partitioning
Partition Command
Bucketing
Table Contr Services in Hive
Example of Contr Services
Example of Contr Services Continues
Creating Contract All Table

Hadoop Project 10 – HIVE/MapReduce – Customers Complaints Analysis

Introduction to Customer Complaint Project in Big Data
Complaint Filed Under Each File
Creating Driver Files and Jar Manifest
Creating Driver Files and Jar Manifest Continues
Complaint Filed from Particular Location
User Defined Location
List of Complaint Grouped By Location

Hadoop Project 11 – HIVE/PIG/MapReduce/Sqoop – Social Media Analysis

Introduction to Social Media Industry
Book Marking Website
Book Marking Website Continues
Understanding Sqoop
Get Data from RDBMS to HDFS
Execute Map Reduce Program in order to Process XML File
Analyze Book Performance By Reviews Using Code
Analyze Book Performance By Reviews Using Code Continues
Analyse Book By Location
Example of Analyse Book By Location
Analyse Book Reader Against Author
How to process XML File in PIG
How to process XML File in PIG Continues
Analyze Book Performance in XML File in PIG
More on Analyze Book Performance in XML File in PIG
Pig XML File Output Using Book
Pig XML File Output Using Location
Pig XML File Output Using Location Continues
Understanding Complex Data Set Using Hive
Understanding Complex Data Set Using Hive Continues
Create Array in Map Reduce Using Hive
Book Marking Type Data Set Using Complex Type
Output of Book Marking Type Data Set

Hadoop Project 12 – HIVE/PIG – Sensor Data Analysis

Introduction to Sensor Data Analysis
Introduction to Sensor Data Analysis Continues
Example of Sensor Data Analysis
Understanding Basics of Big Data and MapReduce
More on Big Data and MapReduce
Converting Json File into Simple Text Format
Converting Json File into Simple Text Format Continues
Output for Json File format
Difference Between Pig, MapReduce and Hive
More on Pig, MapReduce and Hive
Sensor Data Processing in Pig
Working With Pig Function
Types of Function in Pig
Example of Pig Function
Working on Use Cases Using Functions in PIG
Use Case Data Flow in Pig
Ratio Data Flow in Pig
More on Use Case in Pig
More on Use Case in Pig Continues
Example of Ratio Education in Pig
Approach Process the Json File in Hive
Features and Query in Hive
Work on Json Use Cases Using Hive
Work on Json Use Cases Using Hive Continues
Output of Json Use Cases Using Hive
More on Json Use Cases in Hive
Summary of Sensor Data Processing

Hadoop Project 13 – PIG/MapReduce – Youtube Data Analysis

Introduction to Youtube Data Analysis Using Hadoop
Introduction to Youtube Data Analysis Using Hadoop Continues
Data Preparation For Youtube Data Analysis using Hadoop
Basics of Big Data and Map Reduce
More on Big Data and Map Reduce
Working with Analysis Scenario using Map Reduce
Example of Youtube Analyser using Map Reduce
Output of Youtube Analyser in Map Reduce
High Rated Youtube Video Analyser in Map Reduce
Implementation and Output in Map Reduce
Basics of PIG
Basics of PIG Continues
Analyze Youtube Data using PIG Implementation
Example of PIG Implementation
Output of PIG Implementation
Youtube Video Analyzer using Hive
Creating Youtube Video Analyzer using Hive
Analysis Youtube Videos using Hive Query
Analysis Youtube Videos using Hive Query Continues
More on Hive Query Languages
Conclusion

Hadoop and HDFS Fundamentals on Cloudera

What is Big Data?
Processing Big Data
Distributed storage and processing
Understanding Map Reduce
Introduction to module 2
Introduction to Cloudera environment
Understanding Hadoop environment installed on Cloudera
Understanding metadata configuration on Hadoop
Understanding HDFS web UI and HUE
HDFS shell Commands
Few more HDFS shell Commands
Accessing HDFS through Java program

Log Data Analysis with Hadoop

Introduction to Log Processing
Summarizing Log Files
MapReducing Programme
Execute MapReduce Program
Big Data Technology
Executing Big Data Tool
Writing Map Reduce Program
Array List Searching
Processing Files In Map Reduce
Conclusion