

Learn and Master Web Scraping Using the Scrapy Framework With This Step-By-Step, In-Depth Guide

What you will learn

Define the Steps Involved in Web Scraping and Creating Web Crawlers

Install and Set Up Scrapy in Windows, Mac OS, Ubuntu (Linux) & Anaconda Environments

Send Requests to URLs to Scrape Websites Using a Scrapy Spider

Get the HTML Response From URL and Parse it for Web Scraping

Select Desired Data From Websites Using Scrapy Selector, CSS Selectors & XPath

Crawl Websites With Scrapy Spiders to Get Data And Export It to JSON, CSV, XLSX (Excel) and XML Files

Use Scrapy Shell Commands to Test & Verify CSS Selectors or XPath

Export and Save Scraped Data to Online Databases Like MongoDB Using Scrapy Item Pipelines

Define Scrapy Items to Organize Scraped Data And Load Items Using Scrapy ItemLoaders with Input & Output Processors

Scrape Data From Multiple Web Pages Using Scrapy Pagination And Extract Data From HTML Tables

Log Into Websites Using Scrapy FormRequest With CSRF Tokens

Scrape Dynamic/JavaScript Rendered Websites Using Scrapy-Playwright And Interact With Web Elements, Take Screenshots of Websites or Save Them as PDF

Identify API Calls From a Website and Scrape Data From API Using Scrapy Request

Description

Web scraping is the process of fetching web pages and extracting the desired data from them. In this course, you’ll learn and master web scraping using Python and Scrapy with a step-by-step, in-depth guide.

A Step-By-Step Guide

Assuming that you don’t know anything about web scraping, Scrapy, or even what web scraping means, we will start from the complete basics. In the first section, you’ll learn the web scraping process step-by-step (with infographics, no code): how to scrape data from websites and how Scrapy helps you do it.

After getting the basics clear and having an idea of how web scraping works, we will start web scraping using Python and the Scrapy framework! Again, we’ll move step-by-step and perform each step learned in the basics with bite-sized lessons. We’ll take it slow so that it’s easier for you to understand every step involved in scraping and extracting data from websites.

Web Scraping & Scrapy Essentials

Once you’ve built an actual web scraper, you’ll have a firsthand idea of how web scraping works. Next, we’ll cover the essential concepts of web scraping and Scrapy.


  • CSS Selectors to select web elements
  • XPath to select web elements
  • Scrapy Shell to test & verify selectors
  • Items to organize extracted data
  • Load Items using ItemLoaders with input & output processors
  • Export data to JSON, CSV, XLSX (Excel) & XML file formats
  • Save extracted data to online databases like MongoDB using Item Pipelines

Master Web Scraping In-Depth

Learning how to scrape websites and the essentials already makes you a capable web scraper, but we’ll take this even further and learn advanced web scraping techniques to become an expert!

  • Follow links from one webpage to another
  • Crawl multiple pages and extract data, i.e. pagination
  • Scrape data using Regular Expressions (RegEx)
  • Extract Data From HTML Tables
  • Log Into Websites Using Scrapy FormRequest
  • Handle CSRF-protected login forms
  • Scrape Dynamic or JavaScript Rendered Websites using Scrapy Playwright
    • Interact with web elements, e.g. fill forms, click buttons
    • Handle Infinite Scroll websites
    • Wait for elements when content takes time to load
    • Take screenshots of websites
    • Save websites as PDF
  • Identify API calls from websites and scrape data from APIs
  • Use middleware in a Scrapy project
  • Configure settings in a Scrapy project
  • Use and Rotate User-Agents & Proxies
  • Web scraping Best Practices

Real-World Projects

After mastering web scraping, it’s time to put it into practice! That’s why you’ll build three projects as well:

  • Champions League Table [ ESPN ]
  • Product Tracker [ Amazon ]
  • Scraper Application [ GUI ]

Join us in this in-depth course where you’ll learn about web scraping from scratch and master the process of extracting data from websites step-by-step. Check out the preview lessons to get started and learn how web scraping works! See you there~

Language: English

Content

Introduction

What is Web Scraping?
How Does Web Scraping Work?
Web Scraping With Scrapy

Scrapy Installation

Scrapy Installation for Windows
Scrapy Installation for Ubuntu (Linux)
Scrapy Installation for Mac
Scrapy Installation for Anaconda
Creating Scrapy Project
Project Walkthrough

Scrapy Spider

Creating Spider
Sending Request
Getting the Response
Scrapy CSS Selector
Selecting All The Data
Extracting Data
Spider Overview

CSS Selectors

CSS Selectors v/s XPath : How to Select Web Elements?
Tagname, Class and Id Selectors
Attribute Selectors
CSS Selectors Cheat Sheet

XPath

XPath Expressions
XPath Attribute Selectors
XPath text( ) Function
XPath Cheat Sheet

Scrapy Shell

What is the Scrapy Shell and How to Use it?
fetch( ) Response
Shell Configuration

Scrapy Items

Structuring Data Into Scrapy Item
Using Item in Spiders
Define Input and Output Processors For Item Fields
Loading Items with Scrapy ItemLoaders
Items, Processors & ItemLoaders Overview

Exporting Data

Output Extracted Data In JSON, CSV & XML Format
Overwrite Previous Output
Appending Data to Previous Output

Scrapy Item Pipeline

How to use Scrapy Item Pipelines?
Saving Data Locally to Excel (XLSX) Files
Enable Item Pipelines in Settings
MongoDB (Account) Setup
Saving Data To MongoDB

Pagination

Extracting Links From href Attributes
Send Request to the Next Page
start_requests( ) method

Following Links

How to Follow Links?
How to Select Data Using Regular Expressions With Scrapy
Setting Up Custom Callback Function
Parse Product Details Page

Scraping Tables

HTML Tables
Selecting Tables Data
Extract Data From HTML Tables

Logging Into Websites

Data Hidden Behind Login Forms
Inspecting HTML Forms and Website Activity With Dev Tools
Logging Into Websites With FormRequest
CSRF Protected Login Forms
Extract CSRF Values From Forms

Scraping JavaScript Rendered Websites

What are JavaScript Rendered/Dynamic Websites?
scrapy-playwright Installation
Setting Up Playwright in Scrapy Project
Using Playwright To Render Websites
Scraping Data From Dynamic Websites

Scrapy Playwright

Playwright Overview
Playwright Page Object
Logging In With Playwright
Dynamic Websites With Loading Screens
Wait For Selector/Elements Using Page Coroutines
Dynamic Websites With Infinite Scroll
Taking Screenshot of Websites
Rendering Websites To PDF

API Endpoints

Identifying API Calls
Requesting Data From API
Extracting Data From API

Settings

Scrapy Project Settings
Robots.txt
Middleware
Autothrottle Extension

User Agents & Proxies

What are User Agents?
User Agents With Scrapy
What are Proxies?
Proxies With Scrapy

Tips & Tricks

Spider Arguments
Standalone Spiders
Scrapy Shell With bpython
Scrapy get vs extract Methods
Logging

Project #1: Champions League Table From ESPN.com

Overview
Website Visual Inspection
Finding The Selectors
Building The Spider: Extract Teams Data
Building The Spider: Extract Teams Details

Project #2: Amazon Product Rank

Overview
Scraper Visualization
Finding The Selectors
Building The Spider

Project #3: Extending Scraper With GUI

Scraper Application
Building The GUI (Application Interface)
Running the Spider From the Application