This scraper script crawls a site and collects every link it finds on each page, using multiple spider threads.
Warning: it runs many threads at once, which can slow down your computer.
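The multi-threaded spider setup described above can be sketched as a pool of worker threads pulling URLs off a shared queue. This is illustrative only; the names (NUMBER_OF_THREADS, work, create_workers, crawled) are hypothetical stand-ins, not the script's actual code.

```python
import threading
from queue import Queue

NUMBER_OF_THREADS = 4   # hypothetical thread count
queue = Queue()
crawled = []
lock = threading.Lock()

def work():
    # Each spider thread pulls a URL off the shared queue until it is drained.
    while True:
        url = queue.get()
        with lock:
            crawled.append(url)   # stand-in for the real "crawl this page" step
        queue.task_done()

def create_workers():
    for _ in range(NUMBER_OF_THREADS):
        t = threading.Thread(target=work, daemon=True)
        t.start()
```

With daemon threads, the workers exit automatically when the main program finishes; `queue.join()` blocks until every queued URL has been processed.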
Usage:
- In main.py, set the PROJECT_NAME and HOMEPAGE variables
- From your terminal, run the script: python main.py
- The crawl can take some time to complete
- When it finishes, the collected links will be in "<PROJECT_NAME>/crawled.txt"
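A minimal sketch of the configuration described in the steps above: PROJECT_NAME and HOMEPAGE come from the instructions, while the placeholder values and the derived output path are assumptions for illustration.

```python
import os

PROJECT_NAME = 'example'                  # output directory; set this to your project name
HOMEPAGE = 'https://example.com/'         # site to crawl; set this to your target URL

# The collected links end up in <PROJECT_NAME>/crawled.txt (assumed layout)
CRAWLED_FILE = os.path.join(PROJECT_NAME, 'crawled.txt')
```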
To see the total number of crawled pages:
- Go to the <PROJECT_NAME> directory
- Run from the terminal: python number.py
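Since number.py is not shown here, this is only a guess at its behavior: counting the crawled pages by counting the non-empty lines in crawled.txt. The function name and default path are hypothetical.

```python
def count_pages(path='crawled.txt'):
    # One link per line in crawled.txt (assumed format); skip blank lines.
    with open(path) as f:
        return sum(1 for line in f if line.strip())

if __name__ == '__main__':
    print(count_pages())
```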
Resource used: thenewboston