Learn how to use NumPy, Pandas, Seaborn, and Matplotlib for Machine Learning and more!
What you will learn
Use Python for Data Science and Machine Learning
Learn to use NumPy for Numerical Data
Learn to use Seaborn for statistical plots
Learn to create interactive plots and visualizations
How to handle missing values in your dataset
How to derive maximum value from your data
How to transform categorical data
Use JupyterLab/Jupyter Notebook
How to apply EDA (through an assignment)
Customize graphs, modifying colors, lines, fonts, and more
Create a variety of charts: Bar Charts, Line Charts, Stacked Charts, Pie Charts, Histograms, KDE plots, Boxplots, Auto Correlation plots, Scatter P
Description
Are you ready to start your path to becoming a Data Scientist?
What is exploratory data analysis?
Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.
EDA is primarily used to see what data can reveal beyond the formal modeling or hypothesis testing task and provides a better understanding of data set variables and the relationships between them. It can also help determine if the statistical techniques you are considering for data analysis are appropriate. Originally developed by American mathematician John Tukey in the 1970s, EDA techniques continue to be a widely used method in the data discovery process today.
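As a rough illustration of what "summarizing the main characteristics" of a data set can look like in practice, here is a minimal sketch using pandas. The tiny table below is made-up data with hypothetical column names; it is not the course's dataset, and the calls shown are only one of many ways to get a first overview.

```python
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 54],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
    "sex": ["male", "female", "female", "female", "male"],
})

# Summary statistics for the numeric columns (count, mean, std, quartiles, ...).
summary = df.describe()
print(summary)

# How often each category appears in a non-numeric column.
sex_counts = df["sex"].value_counts()
print(sex_counts)

# Pairwise correlation between the numeric variables.
correlations = df[["age", "fare"]].corr()
print(correlations)
```

The same three calls work on a real data set loaded with `pd.read_csv`, which is usually where a first pass of EDA starts.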
Why is exploratory data analysis important in data science?
The main purpose of EDA is to help look at data before making any assumptions. It can help identify obvious errors, as well as better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables.
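One common way to flag outliers during EDA is the 1.5 × IQR rule. The sketch below assumes that rule as the detection method (the text above does not prescribe one) and uses made-up fare values purely for illustration.

```python
import pandas as pd

# Hypothetical fare values; the last one is deliberately extreme.
fares = pd.Series([7.25, 8.05, 7.92, 13.00, 26.55, 8.46, 512.33], name="fare")

# Interquartile range: the spread of the middle 50% of the data.
q1 = fares.quantile(0.25)
q3 = fares.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are treated as outliers
# (a widely used convention, assumed here rather than taken from the text).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = fares[(fares < lower) | (fares > upper)]
print(outliers)
```

Whether a flagged value is an error or a genuine but rare event is a judgment call that the rule itself cannot make.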
Data scientists can use exploratory analysis to ensure the results they produce are valid and applicable to any desired business outcomes and goals. EDA also helps stakeholders by confirming they are asking the right questions. EDA can help answer questions about standard deviations, categorical variables, and confidence intervals. Once EDA is complete and insights are drawn, its features can then be used for more sophisticated data analysis or modeling, including machine learning.
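To make the mention of standard deviations and confidence intervals concrete, here is a small sketch with NumPy. The ages are made-up values, and the 95% interval uses a normal approximation (z = 1.96), which is an assumption of this example; for a sample this small, a t-based interval would usually be a better choice.

```python
import numpy as np

# Hypothetical sample of ages, made up for illustration.
ages = np.array([22, 38, 26, 35, 54, 2, 27, 14, 4, 58], dtype=float)

n = ages.size
mean = ages.mean()

# Sample standard deviation (ddof=1 divides by n - 1 instead of n).
std = ages.std(ddof=1)

# Standard error of the mean.
sem = std / np.sqrt(n)

# Approximate 95% confidence interval for the mean,
# assuming a normal approximation (z = 1.96).
z = 1.96
ci_low = mean - z * sem
ci_high = mean + z * sem

print(f"mean={mean:.2f}, std={std:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```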
Programming Language Used
Python: an interpreted, object-oriented programming language with dynamic semantics. Its high-level, built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components. Python and EDA can be used together to identify missing values in a data set, which is important so you can decide how to handle missing values for machine learning.
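A minimal sketch of identifying missing values with pandas might look like the following. The data frame is made up for illustration, and filling with the median is just one possible strategy chosen for this example, not a recommendation from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps; column names and values are made up.
df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "cabin": ["C85", None, None, "C123", None],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# Number of missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Share of missing values per column (between 0 and 1).
missing_share = df.isna().mean()
print(missing_share)

# One possible way to handle a numeric column: fill gaps with the median.
df_filled = df.assign(age=df["age"].fillna(df["age"].median()))
print(df_filled)
```

Columns with a very high share of missing values, such as `cabin` in this toy example, are often dropped or recoded instead of filled.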
Topics covered in this course:
- Programming with Python
- NumPy with Python
- Using Pandas Data Frames to solve complex tasks
- Use Pandas to work with files
- Use Matplotlib and Seaborn for data visualizations
- Exploratory Data Analysis (EDA) of Titanic Dataset
- and much, much more!
By the end of this course you will:
- Have an understanding of how to program in Python.
- Know how to create and manipulate arrays using NumPy and Python.
- Know how to use Pandas to create and analyze data sets.
- Know how to use the Matplotlib and Seaborn libraries to create beautiful data visualizations.
- Have an amazing portfolio of Python data analysis skills!
- Have experience creating visualizations for real-life projects.
Content