Tiki Crawler with Scrapy
pip install -r requirements.txt
cd tiki
If you cannot install Scrapy on macOS, please refer to this link for a proper guide.
To crawl by keyword:
scrapy crawl <spider_name> -o <output_file_path>:<file_format> \
-s IMAGES_STORE=<image_saved_path> \
-s FEED_EXPORT_INDENT=<indent_for_json> \
-a keyword="<your_keyword>" \
-a parser_type=<parser_type> \
-a sort_type=<product_list_sort_type> \
-a num_products=<number_of_product_to_crawl>
Ex:
scrapy crawl tiki_crawler -o data/data.csv -s IMAGES_STORE=data/images -a keyword="iPhone"
To crawl by category:
scrapy crawl <spider_name> -o <output_file_path>.<file_format> \
-s IMAGES_STORE=<image_saved_path> \
-a category="<category_name>" \
-a sort_type=<product_list_sort_type> \
-a num_products=<number_of_product_to_crawl>
Ex:
scrapy crawl tiki_crawler -o data/data.json -s IMAGES_STORE=data/images -a category="Điện thoại Smartphone"
- Only (sub)categories listed in subcategories are supported.
- Supported output file formats: csv, json, jsonl, pickle, xml, marshal. See Scrapy's feed exports documentation for more information.
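If you want to launch crawls from a script instead of typing the commands above, the argument list can be assembled in Python. This is only a minimal sketch: `build_crawl_command` is a hypothetical helper that is not part of this project, and the spider name `tiki_crawler` is taken from the examples above. It only builds the command; it does not run it.

```python
import shlex


def build_crawl_command(output_path, keyword=None, category=None,
                        images_store=None, sort_type=None, num_products=None,
                        spider_name="tiki_crawler"):
    """Return the argv list for one crawl (hypothetical helper, not project code)."""
    # The README states that keyword and category cannot be combined.
    if keyword is not None and category is not None:
        raise ValueError("keyword and category cannot be used at the same time")

    argv = ["scrapy", "crawl", spider_name, "-o", output_path]
    if images_store is not None:
        argv += ["-s", f"IMAGES_STORE={images_store}"]
    if keyword is not None:
        argv += ["-a", f"keyword={keyword}"]
    if category is not None:
        argv += ["-a", f"category={category}"]
    if sort_type is not None:
        argv += ["-a", f"sort_type={sort_type}"]
    if num_products is not None:
        argv += ["-a", f"num_products={num_products}"]
    return argv


if __name__ == "__main__":
    cmd = build_crawl_command("data/data.json",
                              category="Điện thoại Smartphone",
                              images_store="data/images")
    # shlex.join quotes arguments that contain spaces, so the printed
    # line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, cwd="tiki")` would start the crawl without any shell quoting issues, since each argument is already a separate list element.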
- -o <OUTPUT_FILE>: The output path, filename and format go after the -o argument. This argument must be provided every time you run a crawl.
- -s IMAGES_STORE: Output directory for downloaded images. Default directory: data/images.
- -s FEED_EXPORT_INDENT: Number of spaces used to indent the output json file. Default indent: 4. For a lighter output file, set this argument to 0.
- -a keyword: Search products by keywords. Remember to put your keywords in quotes "" to avoid spacing errors.
- -a category: Search products by category. This argument and the keyword argument cannot be used at the same time.
- -a sort_type: Order in which products are listed. Supported options: popular, top_seller, newest, asc, desc. Default: popular.
- -a parser_type: Which parser is used to get product information. Supported options: api, html. Default: api. Works only with keyword.
- -a num_products: Number of products you want to crawl. Default: 50.
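The rules above (defaults, supported options, and keyword versus category) can be summarized in code. Scrapy passes every `-a` value to the spider as a string, so a number such as `num_products` has to be converted. The function below is a hypothetical sketch of such a check, written from the argument descriptions only; it is not the spider's actual implementation, and the name `normalize_spider_args` is an assumption.

```python
# Option sets and defaults copied from the argument descriptions above.
SORT_TYPES = {"popular", "top_seller", "newest", "asc", "desc"}
PARSER_TYPES = {"api", "html"}


def normalize_spider_args(keyword=None, category=None, sort_type="popular",
                          parser_type="api", num_products="50"):
    """Validate -a style arguments and return them as a dict (illustrative only)."""
    if keyword is not None and category is not None:
        raise ValueError("keyword and category cannot be used at the same time")
    if sort_type not in SORT_TYPES:
        raise ValueError(f"unsupported sort_type: {sort_type!r}")
    # The README notes that parser_type only takes effect for keyword crawls.
    if parser_type not in PARSER_TYPES:
        raise ValueError(f"unsupported parser_type: {parser_type!r}")

    # -a values arrive as strings, so convert before using as a count.
    count = int(num_products)
    if count <= 0:
        raise ValueError("num_products must be a positive integer")

    return {
        "keyword": keyword,
        "category": category,
        "sort_type": sort_type,
        "parser_type": parser_type,
        "num_products": count,
    }


if __name__ == "__main__":
    print(normalize_spider_args(keyword="iPhone", num_products="20"))
```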
To understand what I've done in this project, please refer to my notes below:
- Scrapy at a glance: scrapy.
- Crawling by keyword: keyword.
- Crawling by category: category.
- Personal experience and comparison: comparision.