aswintim/web-scraper


web-scraper

This is a scraper script that crawls a domain and collects every link it finds, using multi-threaded spiders.

Warning: the script spawns multiple threads, which can slow down your computer!
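The spider code itself isn't shown here, but the idea behind a multi-threaded link crawler can be sketched roughly like this (names such as crawl and gather_links are illustrative, not taken from the repo):

```python
import threading
from queue import Queue
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkFinder(HTMLParser):
    """Collects absolute URLs from every <a href=...> tag on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.add(urljoin(self.base_url, value))

def gather_links(page_url, html):
    """Return the set of absolute links found in one page's HTML."""
    parser = LinkFinder(page_url)
    parser.feed(html)
    return parser.links

def crawl(homepage, num_threads=4, max_pages=50):
    """Breadth-first crawl of a single domain using worker threads."""
    domain = urlparse(homepage).netloc
    queue = Queue()
    crawled = set()
    lock = threading.Lock()
    queue.put(homepage)

    def worker():
        while True:
            url = queue.get()
            try:
                with lock:
                    if url in crawled or len(crawled) >= max_pages:
                        continue
                    crawled.add(url)
                html = urlopen(url, timeout=10).read().decode('utf-8', 'replace')
                for link in gather_links(url, html):
                    if urlparse(link).netloc == domain:  # stay on the same domain
                        queue.put(link)
            except Exception:
                pass  # skip pages that fail to download or parse
            finally:
                queue.task_done()

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    queue.join()
    return crawled
```

Each worker pulls a URL off a shared queue, downloads the page, and feeds newly discovered same-domain links back into the queue; the lock keeps the visited set consistent across threads.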

To run:

  1. In main.py, set the values of the variables PROJECT_NAME and HOMEPAGE
  2. From your terminal, run the script: python main.py
  3. The crawl will take some time to complete
  4. When it finishes, the collected links will be in "<PROJECT_NAME>/crawled.txt"
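Step 1 refers to two variables at the top of main.py; the edit might look like this (the values below are placeholders, not from the repo):

```python
# Top of main.py -- set these before running.
PROJECT_NAME = 'example_project'   # directory where crawled.txt will be written
HOMEPAGE = 'https://example.com/'  # starting URL handed to the spiders
```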

To see the total number of pages crawled:

  1. Go to the <PROJECT_NAME> directory
  2. Run from the terminal: python number.py

Resource used: thenewboston
