
Hands-on text mining and natural language processing (NLP) training for data science applications in R

What you will learn

Read in data from different sources, including databases

Basic web scraping: extracting text and tabular data from HTML pages

Social media mining from Facebook and Twitter

Extract information relating to tweets and posts

Analyze text data for emotions

Carry out sentiment analysis

Implement natural language processing (NLP) on different types of text data


Do You Want to Gain an Edge by Gleaning Novel Insights from Social Media?

Do You Want to Harness the Power of Unstructured Text and Social Media to Predict Trends?

Over the past decade there has been an explosion in social media sites, and sites like Facebook and Twitter are now used for everything from sharing information to distributing news. Social media both captures and sets trends. Mining unstructured text data and social media is the latest frontier of machine learning and data science.


My name is Minerva Singh and I am an Oxford University MPhil (Geography and Environment) graduate. I recently finished a PhD at Cambridge University (Tropical Ecology and Conservation). I have several years of experience analyzing real-life data from different sources using data science techniques and producing publications for international peer-reviewed journals. Unlike courses that focus on theory and outdated methods, this course teaches practical techniques for harnessing both text data and social media to build predictive models. We will cover web scraping, text mining and natural language processing, along with mining social media sites like Twitter and Facebook for text data. Additionally, you will learn to apply both exploratory data analysis and machine learning techniques to gain actionable insights from text and social media data.



My course will help you implement these methods using real data obtained from different sources. Many courses use made-up data that does not prepare students to implement R-based data science in real life. After taking this course, you’ll be able to use packages like caret and dplyr to work with real data in R. You will also learn to use the common social media mining and natural language processing packages to extract insights from text data. I will also introduce you to some important practical case studies, such as identifying important words in a text and predicting movie sentiments based on textual reviews. You will extract tweets pertaining to trending topics, analyze their underlying sentiments, and identify topics with Latent Dirichlet Allocation. By the end of the course, you’ll have covered it all: extracting text data from websites, extracting data from social media sites, and analyzing both using visualization, statistics, machine learning and deep learning!

Start analyzing data for your own projects, whatever your skill level, and impress potential employers with actual examples of your data science projects.


  • Data structures and reading data into R, including CSV, Excel, JSON and HTML data
  • Web scraping using R
  • Extracting text data from Twitter and Facebook using APIs
  • Extract and clean data from the FourSquare app
  • Exploratory data analysis of textual data
  • Common Natural Language Processing techniques such as sentiment analysis and topic modelling
  • Implement machine learning techniques such as clustering, regression and classification on textual data
  • Network analysis

Plus, you will apply your newly gained skills by completing a practical text analysis assignment.

We will spend some time dealing with some of the theoretical concepts. However, the majority of the course will focus on implementing different techniques on real data and interpreting the results.

In each video, you will learn a new concept or technique that you can apply to your own projects.

All the data and code used in the course have been made available free of charge, and you can use them as you like. You will also have access to additional lectures that are added in the future for FREE.




Introduction to the Course: The Key Concepts and Software Tools

About the Course and Instructor
Data and Scripts For the Course
Introduction to R and RStudio
Conclusion to Section 1

Reading in Data from Different Sources

Read in CSV & Excel Data
Read in Data from Online CSV
Read in Zipped File
Read Data from a Database
Read in JSON Data
Read in Data from PDF Documents
Read in Tables from PDF Documents
Conclusion to Section 2

Webscraping: Extract Data from Webpages

Read in Data From Online Google Sheets
Read in Data from Online HTML Tables: Part 1
Read in Data from Online HTML Tables: Part 2
Get and Clean Data from HTML Tables
Read Text Data from an HTML Page
Introduction to Selector Gadget
More Webscraping With rvest: IMDb Webpage
Another Way of Accessing Webpage Elements
Conclusions to Section 3

Introduction to APIs

What is an API?
Extract Text Data from Guardian Newspaper

Text Data Mining from Social Media

Extract Data from Facebook
Get More out of Facebook
Set up a Twitter App for Mining Data from Twitter
Extract Tweets Using R
More Twitter Data Extraction Using R
Get Tweet Locations
Get Location Specific Trends
Learn More About the Followers of a Twitter Handle
Another Way of Extracting Information From Twitter: the rtweet Package
Geolocation Specific Tweets With “rtweet”
More Data Extraction Using rtweet
Locations of Tweets
Mining GitHub Using R
Set up the FourSquare App
Extract Reviews for Venues on FourSquare
Conclusions to Section 5

Exploring Text Data For Preliminary Ideas

Explore Tweet Data
A Brief Explanation
EDA With Text Data
Examine Multiple Document Corpus of Text
Brief Introduction to tidytext
Text Exploration & Visualization with tidytext
Explore Multiple Texts with tidytext
Count Unique Words in Tweets
Visualizing Text Data as TF-IDF
TF-IDF in Graphical Form
Conclusions to Section 6

Natural Language Processing: Sentiment Analysis

Wordclouds for Visualizing Tweet Sentiments: India’s Demonetization Policy
Wordclouds for Visualizing Reviews
Tidy Wordclouds
Quanteda Wordcloud
Word Frequency in Text Data
Tweet Sentiments: Mugabe’s Ouster
Tidy Sentiments- Sentiment Analysis Using tidytext
Examine the Polarity of Text
Examine the Polarity of Tweets
Topic Modelling a Document
Topic Modelling Multiple Documents
Topic Modelling Tweets Using Quanteda
Conclusions to Section 7

Text Data and Machine Learning

Clustering for Text Data
Clustering Tweets with Quanteda
Identify Spam Emails with Supervised Classification
Introduction to RTextTools
More on RTextTools
Classifying Textual Data
ML Approaches For Predicting a Binary Outcome in Text Data
ML Approaches For Predicting a Multi-Class Outcome in Text Data

Network Analysis

A Small (Social) Network
A More Theoretical Explanation
Build & Visualize a Network
Network of Emails
More on Network Visualization
Analysis of Tweet Network
Identify Word Pair Networks
Network of Words