Effective Data Wrangling and Exploration with R – Part II
What you will learn
You are going to learn how to perform string manipulation with base R and the stringr package
You are going to master regular expressions with base R and the stringr package
You are going to learn how to perform categorical data manipulation with base R and the forcats package
You are going to master date and datetime manipulation with base R and the chron and lubridate packages
Description
This course will teach you all you need to know to manipulate string, date, and categorical data effectively in R. We shall make use of base R and the stringr, forcats, chron, and lubridate packages in this course. This course is the second in a series of four courses dealing with data wrangling and exploration in R. The others are:
- Importing and exporting data in R, which covers importing csv, tab, txt, xlsx and other file types into R
- Effective Data frame manipulation in R, which uses base R and the dplyr, tidyr, data.table, and sqldf packages to manipulate data frames
- Effective Data cleaning and exploration
In this course, we are going to look at the following:
- reading and writing raw text data
- formatting strings
- joining and splitting strings
- subsetting strings
- cleaning strings
- performing set operations on strings
- regex functions in both base R and stringr
- performing regex operations
- creating factors and ordered factors
- factor attributes and structure
- inspecting factors
- manipulating factors
- converting strings and numbers to factors and vice versa
- the Date class
- the POSIXt classes
- the chron package
- the lubridate package
- creating dates and date-times from integers and strings
- extracting date-time parts
- getting the current date and date-time
- performing date-time calculations
- rounding datetimes
- formatting datetimes
- timespans: durations, periods, and intervals
- importing date-time columns
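As a rough taste of the kind of operations listed above, here is a minimal sketch using base R only. The sample values and variable names are made up for illustration and are not taken from the course exercise files.

```r
# --- Strings: cleaning, splitting, joining, formatting ---
name  <- "  Ada Lovelace "
clean <- trimws(name)                               # remove surrounding white space
parts <- strsplit(clean, " ", fixed = TRUE)[[1]]    # split on a literal space
label <- paste0(toupper(parts[2]), ", ", parts[1])  # join the pieces back together
price <- sprintf("%.2f", 3.14159)                   # format a number as a string

# --- Regular expressions ---
ids     <- c("item-001", "item-42", "misc")
has_num <- grepl("[0-9]+$", ids)                    # which ids end in digits?
nums    <- sub("^item-0*", "", ids[has_num])        # strip the prefix and leading zeros

# --- Factors ---
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)                     # ordered factor with explicit levels
codes <- as.integer(sizes)                          # underlying integer codes

# Converting a numeric-looking factor back to numbers:
# go through character first, otherwise you get the level codes.
f       <- factor(c("10", "20", "10"))
f_as_no <- as.numeric(as.character(f))

# --- Dates and date-times ---
d        <- as.Date("2024-02-28")
later    <- d + 2                                   # date arithmetic in days
gap_days <- as.numeric(as.Date("2024-03-01") - d)   # difference between two dates
d_text   <- format(d, "%d/%m/%Y")                   # format a date as a string

dt      <- as.POSIXct("2024-02-28 13:45:10", tz = "UTC")
dt_hour <- format(round(dt, "hours"), "%H:%M")      # round to the nearest hour
```

The stringr, forcats, chron, and lubridate packages offer their own functions for the same tasks; the sketch above sticks to base R so that it runs without installing anything.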
We hope you enjoy this course as much as we enjoyed developing it.
Language: English
Content
Course Introduction
Introduction
Exercise files
String manipulation with base R
Section Objectives
Reading and writing raw text
String length and case folding
Joining strings using the cat() function
Joining strings using the paste() and paste0() functions
Joining strings using the sprintf() function
Formatting numbers with the format() function
Formatting numbers with the formatC() function
Formatting numbers using the scales package
Splitting strings using the strsplit() function
Extracting and replacing parts of a string
Removing white spaces
Performing set operations and conclusion
String manipulation with stringr
Section Objectives and Introduction
String length and formatting
Joining and splitting strings – I
Joining and splitting strings – II
Extracting and replacing parts of a string
Removing white spaces
Sorting strings
Duplicating strings and conclusion
Pattern matching using regular expressions
Section Objectives
What is a regular expression?
Base R regex functions
Stringr regex functions
Matching sequences
Alternates and ranges
Anchors and Word Boundaries
Quantifiers
Groups
Lookaround and conclusion
Manipulating Categorical data with Base R
Section objectives and What is a factor?
Creating a factor
Factor attributes and structure
Manipulating factors
Ordered Factors
Converting numeric and character vectors to factors and vice versa
Manipulating Categorical data with the forcats package
Section Objectives and Converting to a factor
Inspecting factors
Reordering levels
Restructuring levels and labels
Remove and add levels
Date manipulation with base R – The Date class
The class Date – Section objectives
Creating Dates from strings and integers
Getting the current date
Extracting date parts
Performing calculations with dates
Summary statistics with dates
Formatting dates and conclusion
Date manipulation with base R – The POSIXt class
The class POSIXt – Section objectives
Creating Datetimes from strings and integers
Extracting datetime parts
Getting the current datetime
Performing calculations with datetimes
Summary statistics with datetimes
Rounding datetimes
Formatting datetimes
Loading columns as date or datetime and conclusion
Date Manipulation with the chron package
The chron package – Section objectives
Creating Dates from strings and integers
Extracting date and time parts
Getting the current date and time
Performing calculations with dates
Summary statistics with datetimes
Rounding datetimes
Formatting datetimes and conclusion
Date Manipulation with the lubridate package
The lubridate package – Section objectives
Creating Dates from strings and integers
Extracting datetime parts
Getting the current Date and datetime
Performing Date and datetime calculations
Summary statistics with dates and datetimes
Timespan – Duration
Timespan – Periods
Timespan – Intervals
Rounding datetimes
Formatting datetimes as strings
Reading datetime columns as datetime and conclusion