

Learn and Master Web Scraping Using the Scrapy Framework with this Step-By-Step, In-Depth Guide

What you will learn

Define the Steps Involved in Web Scraping and Creating Web Crawlers

Install and Set Up Scrapy in Windows, Mac OS, Ubuntu (Linux) & Anaconda Environments

Send Request to a URL to Scrape Websites Using Scrapy Spider

Get the HTML Response From URL and Parse it for Web Scraping

Select Desired Data From Websites Using Scrapy Selector, CSS Selectors & XPath

Scrapy Crawl Spiders to Get Data From Websites And Extract it to JSON, CSV, XLSX ( Excel ) and XML Files

Use Scrapy Shell Commands to Test & Verify CSS Selectors or XPath

Export and Save Scraped Data to Online Databases Like MongoDB Using Scrapy Item Pipelines

Define Scrapy Items to Organize Scraped Data And Load Items Using Scrapy Itemloaders with Input & Output Processors

Scrape Data From Multiple Web Pages Using Scrapy Pagination And Extract Data From HTML Tables

Log Into Websites Using Scrapy FormRequest With CSRF Tokens

Scrape Dynamic/JavaScript Rendered Websites Using Scrapy-Playwright And Interact With Web Elements, Take Screenshot of Websites or Save as PDF

Identify API Calls From a Website and Scrape Data From API Using Scrapy Request

Description

Web scraping is the process of extracting desired data from websites, and in this course you’ll learn and master web scraping using Python and Scrapy with a step-by-step, in-depth guide.

A Step-By-Step Guide

Assuming that you don’t know anything about web scraping, Scrapy, or even what web scraping means, we will start from the complete basics. In the first section, you’ll learn about the web scraping process step by step (with infographics – no code), how to scrape data from websites, and how to use Scrapy to do it.

After getting the basics clear and having an idea of how web scraping works, we will start web scraping using Python & the Scrapy framework! Again, we’ll move step by step and perform each step learned in the basics with bite-sized lessons. We’ll take it slow so that it’s easier for you to understand each and every step involved in scraping and extracting data from websites.

Web Scraping & Scrapy Essentials

Having built an actual web scraper, you’ll have firsthand experience of how web scraping works. Now it’s crucial to cover the essential concepts of web scraping and Scrapy, which we will do next.




  • CSS Selectors to select web elements
  • XPath to select web elements
  • Scrapy Shell to test & verify selectors
  • Items to organize extracted data
  • Load Items using ItemLoaders with input & output processors
  • Export data to JSON, CSV, XLSX (Excel) & XML file formats
  • Save extracted data to online databases like MongoDB using ItemPipelines

Master Web Scraping In-Depth

Learning how to scrape websites and the essentials already makes you a complete web scraper, but we’ll take this even further and learn advanced web scraping techniques to become an expert!

  • Follow links in a webpage to another page
  • Crawl multiple pages and extract data i.e. Pagination
  • Scrape data using Regular Expressions (RegEx)
  • Extract Data From HTML Tables
  • Login Into Websites Using Scrapy FormRequest
  • Bypass CSRF protected Login forms
  • Scrape Dynamic or JavaScript Rendered Websites using Scrapy Playwright
    • Interact with web elements: fill forms, click buttons, etc.
    • Handle Infinite Scroll websites
    • Wait For Elements when contents/data takes time to load
    • Take Screenshot of websites
    • Save websites as PDF
  • Identify API calls from websites and scrape data from APIs
  • Use middleware in a scrapy project
  • Configure settings in a scrapy project
  • Use and Rotate User-Agents & Proxies
  • Web scraping Best Practices

Real-World Projects

After mastering web scraping, you need projects to put your skills into practice! That’s why you’ll build three projects as well:

  • Champions League Table [ ESPN ]
  • Product Tracker [ Amazon ]
  • Scraper Application [ GUI ]

Join us in this in-depth course where you’ll learn about web scraping from scratch and master the process of extracting data from websites step-by-step. Check out the preview lessons to get started and learn how web scraping works! See you there~

Language: English

Content

Introduction

What is Web Scraping?
How Does Web Scraping Work?
Web Scraping With Scrapy

Scrapy Installation

Scrapy Installation for Windows
Scrapy Installation for Ubuntu (Linux)
Scrapy Installation for Mac
Scrapy Installation for Anaconda
Creating Scrapy Project
Project Walkthrough

Scrapy Spider

Creating Spider
Sending Request
Getting the Response
Scrapy CSS Selector
Selecting All The Data
Extracting Data
Spider Overview

CSS Selectors

CSS Selectors vs XPath: How to Select Web Elements?
Tagname, Class and Id Selectors
Attribute Selectors
CSS Selectors Cheat Sheet

XPath

XPath Expressions
XPath Attribute Selectors
XPath text( ) Function
XPath Cheat Sheet

Scrapy Shell

What is the Scrapy Shell and How to Use it?
fetch( ) Response
Shell Configuration

Scrapy Items

Structuring Data Into Scrapy Item
Using Item in Spiders
Define Input and Output Processors For Item Fields
Loading Items with Scrapy ItemLoaders
Items, Processors & ItemLoaders Overview

Exporting Data

Output Extracted Data In JSON, CSV & XML Format
Overwrite Previous Output
Appending Data to Previous Output
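Exporting is driven from the command line: Scrapy infers the format from the file extension, and the flag controls overwrite vs append behavior (the spider name `quotes` is a placeholder):

```shell
# -O overwrites the output file each run; -o appends to an existing one
scrapy crawl quotes -O quotes.json
scrapy crawl quotes -o quotes.csv
scrapy crawl quotes -O quotes.xml
```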

Scrapy Item Pipeline

How to use Scrapy Item Pipelines?
Saving Data Locally to Excel ( XLSX ) Files
Enable Item Pipelines in Settings
MongoDB (Account) Setup
Saving Data To MongoDB

Pagination

Extracting Links From href Attributes
Send Request to the Next Page
start_requests( ) method

Following Links

How to Follow Links?
How to Select Data Using Regular Expressions With Scrapy
Setting Up Custom Callback Function
Parse Product Details Page

Scraping Tables

HTML Tables
Selecting Tables Data
Extract Data From HTML Tables

Logging Into Websites

Data Hidden Behind Login Forms
Inspecting HTML Forms and Website Activity With Dev Tools
Logging Into Websites With FormRequest
CSRF Protected Login Forms
Extract CSRF Values From Forms

Scraping JavaScript Rendered Websites

What are JavaScript Rendered/Dynamic Websites?
scrapy-playwright Installation
Setting Up Playwright in Scrapy Project
Using Playwright To Render Websites
Scraping Data From Dynamic Websites
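Setting up scrapy-playwright comes down to a couple of project settings plus an opt-in flag per request. This is a configuration sketch following the scrapy-playwright README; adapt it to your project:

```python
# settings.py — hand requests to Playwright so JavaScript gets rendered
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# In the spider, a request opts in to browser rendering via meta:
#     yield scrapy.Request(url, meta={"playwright": True})
```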

Scrapy Playwright

Playwright Overview
Playwright Page Object
Logging In With Playwright
Dynamic Websites With Loading Screens
Wait For Selector/Elements Using Page Coroutines
Dynamic Websites With Infinite Scroll
Taking Screenshot of Websites
Rendering Websites To PDF

API Endpoints

Identifying API Calls
Requesting Data From API
Extracting Data From API

Settings

Scrapy Project Settings
Robots.txt
Middleware
Autothrottle Extension
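The settings touched on in this section live in the project’s `settings.py`; a sketch of the relevant options (values are illustrative defaults, not recommendations):

```python
# settings.py — polite-crawling options covered in this section
ROBOTSTXT_OBEY = True          # respect the site's robots.txt rules
AUTOTHROTTLE_ENABLED = True    # adapt the request rate to server latency
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

DOWNLOADER_MIDDLEWARES = {
    # enable middlewares by dotted path; the number sets the order, None disables
    # "myproject.middlewares.MyDownloaderMiddleware": 543,
}
```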

User Agents & Proxies

What are User Agents?
User Agents With Scrapy
What are Proxies?
Proxies With Scrapy

Tips & Tricks

Spider Arguments
Standalone Spiders
Scrapy Shell With bpython
Scrapy get() vs extract() Methods
Logging

Project #1: Champions League Table From ESPN.com

Overview
Website Visual Inspection
Finding The Selectors
Building The Spider: Extract Teams Data
Building The Spider: Extract Teams Details

Project #2: Amazon Product Rank

Overview
Scraper Visualization
Finding The Selectors
Building The Spider

Project #3: Extending Scraper With GUI

Scraper Application
Building The GUI (Application Interface)
Running the Spider From the Application