Data Science Projects with Linear Regression, Logistic Regression, Random Forest, SVM, KNN, K-Means, XGBoost, PCA, etc.

What you will learn

The fundamental concepts and techniques of machine learning, including supervised and unsupervised learning

The implementation of various machine learning algorithms such as linear regression, logistic regression, k-nearest neighbors, decision trees, etc.

Techniques for building and evaluating machine learning models, such as feature selection, feature engineering, and model evaluation techniques.

The different types of model evaluation metrics, such as accuracy, precision, and recall, and how to interpret them.

The use of machine learning libraries such as scikit-learn and pandas to build and evaluate models.

Hands-on experience working on real-world datasets and projects that will give students the opportunity to apply the concepts and techniques learned throughout.

The ability to analyze, interpret and present the results of machine learning models.

Understanding of the trade-offs between different machine learning algorithms, and their advantages and disadvantages.

Understanding of the best practices for developing, implementing, and interpreting machine learning models.

Skills in troubleshooting common machine learning problems and debugging machine learning models.

Description

Welcome to our Machine Learning Projects course! This course is designed for individuals who want to gain hands-on experience in developing and implementing machine learning models. Throughout the course, you will learn the concepts and techniques necessary to build and evaluate machine-learning models using real-world datasets.

We cover the basics of machine learning, including supervised and unsupervised learning, and the types of problems that can be solved using these techniques. You will also learn about common machine learning algorithms, such as linear regression, k-nearest neighbors, and decision trees.

ML Prerequisites Lectures

  1. Python Crash Course: It is an introductory-level course designed to help learners quickly pick up the basics of the Python programming language.
  2. Numpy: It is a library in Python that provides support for large multi-dimensional arrays of homogeneous data types, and a large collection of high-level mathematical functions to operate on these arrays.
  3. Pandas: It is a library in Python that provides easy-to-use data structures and data analysis tools. It is built on top of Numpy and is widely used for data cleaning, transformation, and manipulation.
  4. Matplotlib: It is a plotting library in Python that provides a wide range of visualization tools and support for different types of plots. It is widely used for data exploration and visualization.
  5. Seaborn: It is a library built on top of Matplotlib that provides higher-level APIs for easier and more attractive plotting. It is widely used for statistical data visualization.
  6. Plotly: It is an open-source library in Python that provides interactive and web-based visualizations. It supports a wide range of plots and is widely used for creating interactive dashboards and data visualization for the web.

ML Models Covered in This Course

  1. Linear Regression: A supervised learning algorithm used for predicting a continuous target variable based on a set of independent variables. It assumes a linear relationship between the independent and dependent variables.
  2. Logistic Regression: A supervised learning algorithm used for predicting a binary outcome based on a set of independent variables. It uses a logistic function to model the probability of the outcome.
  3. Decision Trees: A supervised learning algorithm that uses a tree-like model of decisions and their possible consequences. It is often used for classification and regression tasks.
  4. Random Forest: A supervised learning algorithm that combines multiple decision trees to increase the accuracy and stability of the predictions. It is an ensemble method that reduces overfitting and improves the generalization of the model.
  5. Support Vector Machine (SVM): A supervised learning algorithm used for classification and regression tasks. For classification, it finds the boundary (or hyperplane) that separates the different classes in the data with the largest margin.
  6. K-Nearest Neighbors (KNN): A supervised learning algorithm used for classification and regression tasks. It finds the k nearest points to a new data point and predicts the majority class of those points for classification, or the average of their target values for regression.
  7. Hyperparameter Tuning: It is the process of systematically searching for the best combination of hyperparameters for a machine learning model. It is used to optimize the performance of the model and to prevent overfitting by finding the optimal set of parameters that work well on unseen data.
  8. AdaBoost: A supervised ensemble method that trains weak learners in sequence, increasing the weights of the observations that earlier learners misclassified so that later learners focus on them. It is used mainly for classification tasks.
  9. XGBoost: A supervised learning algorithm that is an extension of a gradient boosting algorithm. It is widely used in Kaggle competitions and industry projects.
  10. CatBoost: A supervised learning algorithm that is designed to handle categorical variables effectively.
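As an illustration of the KNN item above, here is a minimal sketch of majority-vote classification in plain Python. The function name `knn_predict` and the toy points are made up for this example and are not taken from the course material; in practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used instead.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small groups of points in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # a
print(knn_predict(points, labels, (5.1, 5.0)))  # b
```

For regression, the vote would be replaced by the mean of the neighbours' target values. Because the prediction depends on raw distances, features are usually standardized first.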

Unsupervised Models

Clustering algorithms can be broadly classified into three types: centroid-based, density-based, and hierarchical. Centroid-based algorithms, such as k-means, group data points based on their proximity to a centroid, or center point. Density-based algorithms, such as DBSCAN, group data points based on their density in the feature space. Hierarchical algorithms, which can be agglomerative or divisive, build a hierarchy of clusters by iteratively merging or dividing clusters.
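To make the centroid-based idea concrete, the following is a rough sketch of the classic k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. The function name `kmeans`, the fixed starting centroids, and the toy data are assumptions chosen for illustration; real projects would typically rely on a library implementation with a better initialization strategy.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Alternate assignment and centroid-update steps until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    return labels, centroids


# Toy data (hypothetical): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, which is one reason library implementations run several initializations and keep the best one.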




  1. K-Means: A centroid-based clustering algorithm that groups data points based on their proximity to a centroid. It is widely used for clustering large datasets.
  2. DBSCAN: A density-based clustering algorithm that groups data points based on their density in the feature space. It is useful for identifying clusters of arbitrary shape.
  3. Hierarchical Clustering: An algorithm that builds a hierarchy of clusters by merging or dividing clusters iteratively. It can be agglomerative or divisive in nature.
  4. Spectral Clustering: A clustering algorithm that finds clusters by using eigenvectors of the similarity matrix of the data.
  5. Principal Component Analysis (PCA): A dimensionality reduction technique that projects data onto a lower-dimensional space while preserving the most important information.
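The PCA item above can be sketched with NumPy's singular value decomposition. This is one common way to compute principal components, not necessarily the approach used in the course; the function name `pca` and the synthetic data are assumptions made for this example.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each direction.
    explained = singular_values**2 / np.sum(singular_values**2)
    projected = X_centered @ components.T
    return projected, components, explained[:n_components], mean


# Synthetic 3-D data (hypothetical) that lies close to a single line.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack(
    [t, 2 * t + 0.1 * rng.normal(size=200), 0.1 * rng.normal(size=200)]
)

projected, components, ratio, mean = pca(X, n_components=1)
# Map the 1-D projection back to the original 3-D space.
reconstructed = projected @ components + mean
print(projected.shape, ratio)
```

Since the data here varies almost entirely along one direction, a single component retains nearly all of the variance, and the reconstruction stays close to the original points.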

Advanced Models

  1. Deep Learning Introduction: Deep learning is a subfield of machine learning that uses artificial neural networks with many layers, called deep neural networks, to model and solve complex problems such as image recognition and natural language processing. It is based on the idea that a network with multiple layers can automatically learn representations of the data at different levels of abstraction. The Multi-layer Perceptron (MLP) is a feedforward artificial neural network that maps input data onto a set of appropriate outputs. It is trained with supervised learning and can be used for both classification and regression tasks.
  2. Natural Language Processing (NLP): A field of Artificial Intelligence that deals with the interaction between human language and computers. One common NLP technique is term frequency-inverse document frequency (tf-idf), a statistical measure that reflects how important a word is to a document within a corpus of documents. The importance increases proportionally to the number of times the word appears in the document but is offset by how many documents in the corpus contain the word. Tf-idf is used for tasks such as text classification, text clustering, and information retrieval, as well as document summarization and feature extraction for text data.
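The tf-idf description above can be worked through in a few lines of plain Python. This sketch assumes the textbook definition, tf = (count of term in document) / (document length) and idf = ln(N / number of documents containing the term); the function name `tf_idf` and the sample sentences are made up for illustration. Library implementations may differ in detail: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf and normalizes each document vector by default.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Tiny hypothetical corpus, tokenized by whitespace.
docs = [
    "free prize claim your free prize".split(),
    "claim your seat for the meeting".split(),
    "your meeting notes".split(),
]
scores = tf_idf(docs)
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In the first document, "your" scores zero because it appears in every document, while "free" and "prize" score highest because they are frequent there and absent elsewhere.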

Are there any course requirements or prerequisites?

  • No prior Python programming experience required
  • Have a computer (either Mac, Windows, or Linux)
  • Desire to learn!

Who this course is for:

  • Beginner Python programmers.
  • Beginner Data Science programmers.
  • Students of Data Science and Machine Learning.
  • Anyone interested in learning more about Python, data science, or data visualizations.
  • Anyone interested in the rapidly expanding world of data science!
  • Developers who want to work in analytics and visualization projects.
  • Anyone who wants to explore and understand data before applying machine learning.

Throughout the course, you will have access to a team of experienced instructors who will provide guidance and support as you work on your projects. You will also have access to a community of fellow students who will provide additional support and feedback as you work on your projects.

The course is self-paced, which means you can complete the modules and projects at your own pace.

Language: English

Content

Introduction

Course Introduction
Machine Learning Introduction
Install Anaconda and Python on Windows
Install Anaconda in Linux
Jupyter Notebook Introduction and Keyboard Shortcuts

Python Crash Course

Arithmetic Operations in Python
Data Types in Python
Variable Casting
Strings Operation in Python
String Slicing in Python
String Formatting and Modification
Boolean Variables and Evaluation
List in Python
Tuple in Python
Set
Dictionary
Conditional Statements – If Else
While Loops
For Loops
Functions
Working with Date and Time
File Handling Read and Write

Numpy Crash Course

Numpy Introduction – Create Numpy Array
Array Indexing and Slicing
Numpy Data Types
np.nan and np.inf
Statistical Operations
Shape(), Reshape(), Ravel(), Flatten()
arange(), linspace(), range(), random(), zeros(), and ones()
Where
Numpy Array Read and Write
Concatenation and Sorting

Pandas for Data Analysis

Pandas Series Introduction Part 1
Pandas Series Introduction Part 2
Pandas Series Read From File
Apply Python's Built-in Functions to Series
apply() for Pandas Series
Pandas DataFrame Creation from Scratch
Read Files as DataFrame
Columns Manipulation Part 1
Columns Manipulation Part 2
Arithmetic Operations
NULL Values Handling
DataFrame Data Filtering Part 1
DataFrame Data Filtering Part 2
Handling Unique and Duplicated Values
Retrieve Rows by Index Label
Replace Cell Values
Rename, Delete Index and Columns
Lambda Apply
Pandas Groupby
Groupby Multiple Columns
Merging, Joining, and Concatenation Part 1
Concatenation
Merge and Join
Working with Datetime
Read Stock Data from YAHOO Finance

Matplotlib for Data Analysis

Matplotlib Introduction
Matplotlib Line Plot Part 1
IMDB Movie Revenue Line Plot Part 1
IMDB Movie Revenue Line Plot Part 2
Line Plot Rank vs Runtime Votes Metascore
Line Styling and Putting Labels
Scatter, Bar, and Histogram Plot Part 1
Scatter, Bar, and Histogram Plot Part 2
Subplot Part 1
Subplot Part 2
Subplots
Creating a Zoomed Sub-Figure of a Figure
xlim and ylim, legend, grid, xticks, yticks
Pie Chart and Figure Save

Seaborn for Data Analysis

Introduction
Scatter Plot
Hue, Style and Size Part 1
Hue, Style and Size Part 2
Line Plot Part 1
Line Plot Part 2
Line Plot Part 3
Subplots
sns.lineplot() and sns.scatterplot()
cat plot
Box Plot
Boxen Plot
Violin Plot
Bar Plot
Point Plot
Joint Plot
Pair Plot
Regression Plot
Controlling Plotted Figure Aesthetics

Data Visualization in Pandas

IRIS Dataset Introduction
Load IRIS Dataset
Line Plot
Secondary Axis
Bar and Barh Plot
Stacked Bar Plot
Histogram
Box Plot
Area and Scatter Plot
Hexbin Plot
Pie Chart
Scatter Matrix and Subplots

Data Visualization with Plotly

Introduction to Plotly and Cufflinks
Plotly Line Plot
Scatter Plot
Stacked Bar Plot
Box and Area Plot
3D Plot
Hist Plot, Bubble Plot and Heatmap

Linear Regression

Linear Regression Introduction
Regression Examples
Types of Linear Regression
Assessing the performance of the model
Bias-Variance tradeoff
What is sklearn and train-test-split
Python Package Upgrade and Import
Load Boston Housing Dataset
Dataset Analysis
Exploratory Data Analysis – Pair Plot
Exploratory Data Analysis – Hist Plot
Exploratory Data Analysis – Heatmap
Train Test Split and Model Training
How to Evaluate the Regression Model Performance
Plot True House Price vs Predicted Price
Plotting Learning Curves Part 1
Plotting Learning Curves Part 2
Machine Learning Model Interpretability – Residuals Plot
Machine Learning Model Interpretability – Prediction Error Plot

Logistic Regression

Logistic Regression Introduction
Sigmoid Function
Decision Boundary
Titanic Dataset Introduction
Dataset Loading
EDA – Heatmap and Density Plot
Missing Age Imputation Part 1
Missing Age Imputation Part 2
Imputation of Missing Embark Town
Data Types Correction and Mapping
One-Hot Encoding
Train Test Split
Model Building Training and Evaluation
Feature Selection – Recursive Feature Elimination
Accuracy, F1-Score, P, R, AUC_ROC Curve Part 1
Accuracy, F1-Score, P, R, AUC_ROC Curve Part 2
Accuracy, F1-Score, P, R, AUC_ROC Curve Part 3
ROC Curve and AUC Part 1
ROC Curve and AUC Part 2
ROC Curve and AUC Part 3

Support Vector Machine

SVM Introduction
SVM Kernels
Breast Cancer Dataset Introduction
Dataset Loading
Cancer Data Visualization Part 1
Cancer Data Visualization Part 2
Data Standardization
Train Test Split
Linear SVM Model Building and Training
Linear SVM Model on Scaled Feature
Polynomial, Sigmoid, RBF Kernels in SVM

Cross Validation and Hyperparameter Tuning

Cross Validation Regularization and Hyperparameter Optimization Introduction
ML Model Training Process
Breast Cancer Dataset Loading
Data Visualization
Train Test Split
Linear Regression and SVM Model Training
Regularization Introduction
Manual Hyperparameter Adjustment
Types of Cross Validation
K-Fold and LeaveOneOut Cross Validation
Grid Search Hyperparameter Tuning
Random Grid Search Hyperparameter Tuning

K-Nearest Neighbor (KNN)

KNN Introduction
How KNN Works
Wine Dataset Loading
Data Visualization
Train Test Split and Standardization
KNN Model Building and Training
Hyperparameter Tuning
Pros and Cons of KNN

Decision Tree

Decision Tree Introduction
How Decision Tree Works
What are Attribute Selection Measures (ASM)
Dataset Loading
Dataset Visualization
Train Test Split
Model Training and Evaluation
Tree Visualization
Hyperparameter Optimization
Diabetes Dataset Loading
Decision Tree Regression

Random Forest

Ensemble Learning Bagging and Boosting Introduction
Random Forest Introduction
Dataset Introduction
Data Visualization
Train Test Split and One-Hot Encoding
Random Forest Classifier Training and Evaluation
Data Loading for Random Forest Regression
Random Forest Regression Model Building
Hyperparameter Optimization

Boosting Algorithms

Boosting Algorithms Introduction
Heart-Disease Dataset Understanding
Data Visualization Part 1
Train Test Split
AdaBoost Model Training
AdaBoost Hyperparameter Tuning
XGBoost Introduction
XGBoost Model Training and Hyperparameter Tuning
CatBoost Model Training
CatBoost Hyperparameter Optimization

K-Means Clustering

Introduction to Unsupervised Learning
Introduction to K-Means
How to Choose Best Number of Clusters
K-Means Clustering with Scikit-Learn
Application of Unsupervised Learning
Customers Data Loading
Data Visualization
K-Means Clustering Data Preparation
K-Means Clustering for Age and Spending Score
Clusters Visualization
Decision Boundary Visualization
Putting Everything Together
Selecting Optimum Number of Clusters
Clustering for Annual Income vs Spending Score
3D Clustering Part 1
3D Clustering Part 2

Density Based Clustering

DBSCAN Introduction
Generate Dataset
DBSCAN Clustering
Spectral Clustering
Spectral Clustering Coding

Hierarchical Clustering

Hierarchical Clustering Introduction
Important Terms in Hierarchical Clustering
Stock Market Data Loading
Hierarchical Clustering Coding

Principal Component Analysis (PCA)

PCA Introduction
How PCA is Done.
MNIST Dataset Loading and Understanding
PCA Applications
PCA Coding
PCA Compression Analysis
Data Reconstruction
Choosing the Right Number of Principal Components
Data Reconstruction with 95% Information
Classification Comparison with and without PCA

Introduction to Deep Learning

What is Neuron
Multi-Layer Perceptron
Shallow vs Deep Neural Networks
Activation Function
What is Back Propagation
Optimizers in Deep Learning
Steps to Build Neural Network
Install TensorFlow in Windows
Install TensorFlow in Linux
Customer Churn Dataset Loading
Data Visualization Part 1
Data Visualization Part 2
Data Preprocessing
Import Neural Networks APIs
How to Get Input Shape and Class Weights
Neural Network Model Building
Model Summary Explanation
Model Training
Model Evaluation
Model Save and Load
Prediction on Real-Life Data

Introduction to Natural Language Processing (NLP)

Introduction to NLP
What are Key NLP Techniques
Overview of NLP Tools
Common Challenges in NLP
Bag of Words – The Simplest Word Embedding Technique
Term Frequency – Inverse Document Frequency (TF-IDF)
Load Spam Dataset
Text Preprocessing
Feature Engineering
Pair Plot
Train Test Split
TF-IDF Vectorization
Model Evaluation and Prediction on Real Data
Model Load and Store