

Learn everything about Apache Hive, a modern data warehouse.

What you will learn

Why Hive is necessary for a Data Engineer

The goal of this course is to help you become familiar with the bits and bytes of Apache Hive

Learn the A to Z of Apache Hive (from basic to advanced level).

Hands-on experience with Apache Hive and real-time use cases

Description

The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command-line tool and JDBC driver are provided to connect users to Hive.
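
For instance, here is a minimal HiveQL sketch of that idea; the table name, columns, and HDFS path are illustrative assumptions, not material from the course:

CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  ip         STRING,
  event_time STRING,
  url        STRING,
  status     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/raw/web_logs';   -- structure projected onto files already sitting in HDFS

-- Ordinary SQL over the projected structure
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status
ORDER BY hits DESC;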

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Hive! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Apache Hive!

Built on top of Apache Hadoop, Hive provides the following features:

  • Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
  • A mechanism to impose structure on a variety of data formats
  • Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
  • Query execution via Apache Tez, Apache Spark, or MapReduce
  • Procedural language with HPL-SQL
  • Sub-second query retrieval via Hive LLAP, Apache YARN and Apache Slider.

Hive provides standard SQL functionality, including many of the later SQL:2003, SQL:2011, and SQL:2016 features for analytics.
Hive’s SQL can also be extended with user code via user defined functions (UDFs), user defined aggregates (UDAFs), and user defined table functions (UDTFs).
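
For example, a built-in function is called just as in standard SQL, while a custom UDF is registered from a jar before use. In this sketch the jar path, the UDF class, and the customers table are placeholders, not anything shipped with Hive:

-- Hypothetical jar containing a user defined function
ADD JAR hdfs:///user/hive/udfs/my-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_phone
  AS 'com.example.hive.udf.NormalizePhone';  -- hypothetical UDF class

SELECT upper(name),              -- built-in string function
       year(order_date),         -- built-in date function
       normalize_phone(phone)    -- the user defined function registered above
FROM customers
LIMIT 10;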

There is not a single “Hive format” in which data must be stored. Hive comes with built-in connectors for comma- and tab-separated values (CSV/TSV) text files, Apache Parquet, Apache ORC, and other formats. Users can extend Hive with connectors for other formats. Please see File Formats and Hive SerDe in the Developer Guide for details.
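
To make that concrete, here is a small sketch (with made-up table names) of the same data declared once as delimited text and once as ORC; only the row format and STORED AS clauses change:

CREATE TABLE trips_csv (
  trip_id BIGINT,
  city    STRING,
  fare    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Rewrite the same rows into a columnar ORC table
CREATE TABLE trips_orc STORED AS ORC
AS SELECT * FROM trips_csv;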

Hive is not designed for online transaction processing (OLTP) workloads. It is best used for traditional data warehousing tasks.

Hive is designed to maximize scalability (scale out with more machines added dynamically to the Hadoop cluster), performance, extensibility, fault-tolerance, and loose-coupling with its input formats.

We will learn

1) Apache Hive Overview

2) Apache Hive Architecture

3) Installation and Configuration


4) How a Hive query flows through the system.

5) Hive Features, Limitations, and Data Model

6) Data Types, Data Definition Language, and Data Manipulation Language

7) Hive Views, Partitions, and Bucketing

8) Built-in Functions and Operators

9) Joins in Apache Hive

10) Frequently Asked Interview Questions and Answers

11) 2 Real-time Projects

My goal is to provide you with practical tools that will benefit you in the future, while giving you the opportunity to apply them to real use cases.

I am really excited you are here, and I hope you follow all the way to the end of the course. It is fairly straightforward and easy to follow: I will show you each line of code step by step and explain what it does and why we are doing it. So I invite you to work through all the lectures. All right, I will see you soon in the course.

Language: English

Content

Introduction

Introduction to Course
Introduction to Apache Hive
Hive Architecture
How a Hive query flows through the system.
(Optional) Introduction to Big Data
(Optional) What is Hadoop
Hive Features
Hive Limitation

Installing Apache Hive on Ubuntu (Linux) Machine

Installation Steps of Hadoop
Installation Steps of Apache Hive

Hive Data Model

Hive Data Model Diagram
Tables
Partitions
Buckets or Clusters

Hive Data Types

Hive Data Types
Primitive Type
Complex Type

Hive Data Definition Language

Create Database
Drop Database
Alter Database
Use Database
Show Database
Describe Database
Create Table
Create Table (Hands On)
Create Table (Hands On) with all Primitive Datatype
Create Table (Hands On) with all Complex Datatype
Managed and External Tables
Managed and External Tables (Hands On)
Storage Formats
Show Tables
Describe Tables
Drop Table
Alter Table
Truncate Table

Hive Data Manipulation Language

LOAD
SELECT
INSERT
UPDATE
DELETE

Hive View, Metastore, Partitions, and Bucketing

View
View (Hands On)
Metastore
Partitions
Partitions (Hands On)
Bucketing
Bucketing (Hands On)

Hive Built-In Functions

Date Functions
Mathematical Functions
String Functions

Built-in Operators

Relational Operators
Arithmetic Operators
Logical Operators
String Operators

Hive Join

Joins
Inner Join (Hands On)
Left Outer Join (Hands On)
Right Outer Join (Hands On)
Full Outer Join (Hands On)

Frequently Asked Interview Questions and Answers

How to create a Hive table with a multi-character delimiter? (sketched below)
How to load data from a .txt file into a table stored as ORC in Hive?
How to skip header rows from a table in Hive? (sketched below)
How to create a single Hive table for small files without degrading performance?
How do you load a CSV file into the Hive warehouse using a built-in SerDe?
Is it possible to change the default location of a managed table?
Can Hive queries be executed from script files? How?
Can we run Unix shell commands from Hive? Give an example.
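
As a taste of the style of answer these lectures cover, here is a minimal HiveQL sketch for two of the questions above, skipping header rows and reading a multi-character delimiter. The table names and columns are made up, and the MultiDelimitSerDe package name can differ between Hive versions:

-- Skip the first (header) row of each text file behind the table
CREATE TABLE sales_csv (id INT, item STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

-- Fields separated by the multi-character delimiter '||'
CREATE TABLE sales_multi (id INT, item STRING, amount DOUBLE)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ('field.delim'='||')
STORED AS TEXTFILE;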

Hands On Projects (2 Projects)

Installing Apache Zeppelin (0.10.1)
(Hands On) Downloading files
Data Files for the Project
(Hands On) Configure Hive Interpreter in Apache Zeppelin
Configure Hive Interpreter in Apache Zeppelin
Hadoop Configuration Setting
Download Source Code for Project
Starting Hadoop, Hive, Zeppelin and Uploading Source Code
Project 1 Part 1
Project 1 Part 2
Project 1 Part 3
Project 1 Part 4
Project 1 Part 5
Project 2 Part 1
Project 2 Part 2
Project 2 Part 3
Project 2 Part 4
Project 2 Part 5
Project 2 Part 6