RuchirChawdhry/1_scraping_python_modules.md

Last active May 17, 2020 13:53




Scraping Resources


1_scraping_python_modules.md

Python Modules for Scraping:

Scraping and Parsing

selectolax
AdvancedHTMLParser
grequests
parsel
mechanicalsoup
beautifulsoup4
gazpacho
cloudscraper
cfscrape
ipwhois
saas
parse-utils
looter
xlseries
sriram-twitter-scraper

Scrapy

scrapy
scrapyrt
scrapy-splash
scrapy-autoextract
scrapy-pagestorage
scrapy-jsonschema
scrapy-wayback-middleware
scrapy-rss
scrapy-rotating-proxies
django-dynamic-scraper

Specific

yt-videos-list
twint
play-scraper
instagramscraper
instalooter
instabotnet
linkedin-scraper
google-search-results-serpwow
youtubedata
TikTokApi
imgur-scraper
tropescraper
google-search-results
pastepwn
wikitablescrape
recipe-scrapers
name-scraper
lyrics-extractor
newsman
ludoj-scraper

JSON

python-rapidjson
orjson
jsonslicer
nujson
yapic.json

Text & Data Manipulation

htmldate
newspaper3k
acora
hext
boltons (boltons.strutils)
w3lib
textnormaliser
hyperlink
shorttext
postal
readability
cypunct
justext
iso4217parse
isbnlib

Image, Audio & File Manipulation

tesserocr
imagecodecs
imagecodecs-lite
miniaudio
pysndfile
pdfquery

Performance

cnamedtuples
pybase64
lz4 and zstd
pikepdf and PyMuPDF
fortuna, pyewacket, and rng
cytoolz
psutil
libuuid
hoedown

PS: I've left out the obvious ones, such as requests, pandas, and numpy.


2_scraping_links.md

Links:
