fintech (i.e. financial technology)
"fintech_spider" is a Scrapy-based spider for crawling large amounts of financial data from the Internet.
The data crawled by "fintech_spider" has been used by 嗅金牛 and 数知源.
Directory | Author | Usage |
---|---|---|
Anti_Anti_Spider | hee | |
demo | | Demonstrations (e.g. PhantomJS, proxies) |
demo/ArticleSpider | lxw | |
demo/geetestcrack.py | hee | |
demo/phantomjs_proxy | lxw | Adds an IP proxy to PhantomJS |
demo/user_agent.txt | hee | A large number of User-Agents |
README.md | lxw | Documentation for this project |
Spiders | | Python scripts that crawl the data we need from the Internet |
Spiders/CJO_case_demo.md | lxw | Sample cases and the main idea of how to crawl data from 中国裁判文书网 (China Judgements Online) |
Spiders/CJOSpider | lxw | Spiders (with Scrapy) for crawling data from 中国裁判文书网 (China Judgements Online) |
Spiders/CJOSpider_wo_scrapy.py | lxw | Spider (without Scrapy) for crawling data from 中国裁判文书网 (China Judgements Online) |
Spiders/CninfoSpider | hee | Spiders for crawling data from 巨潮资讯 (CNINFO) |
Spiders/CNKI_Patent | lxw | Spiders for crawling patent data from 中国知网 (CNKI) |
Spiders/NECIPSSpider | lxw | Spiders for crawling data from 国家企业信用信息公示系统 (National Enterprise Credit Information Publicity System) |
Spiders/NECIPSSpider_wo_scrapy.py | lxw | Spider (without Scrapy) for crawling data from 国家企业信用信息公示系统 (National Enterprise Credit Information Publicity System) |
Spiders/new_three_board | lxw | Spiders for crawling data from 全国中小企业股份转让系统 (National Equities Exchange and Quotations, NEEQ) |
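- Update README.md with the purpose of each committed directory (also list any key files in subdirectories)
- CJOSpider
  - [NO, in principle re-running CJOSpider.py should be enough] Add code to crawl the tasks in the Redis TASKS_HASH that have not finished crawling (there must be fewer than CONCURRENT_REQUESTS of them?)
  - [NO, in principle re-running CJODocIDSpider.py should be enough] Add code to crawl the tasks in the Redis DOC_ID_HASH that have not finished crawling
- NECIPSSpider
  - Add a Referer header to NECIPSSpider_wo_scrapy.py
  - Use a thread pool in NECIPSSpider_wo_scrapy.py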
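
The two NECIPSSpider items above (a Referer header and a thread pool) could be sketched roughly as follows. This is a minimal sketch using only the Python standard library, not the project's actual code: `build_request`, `fetch`, `crawl_all`, the `REFERER` and `USER_AGENT` values, and the worker count are all hypothetical names and placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Placeholder values for illustration only; the real spider would use
# the target site's own page URL and a User-Agent from its own list.
REFERER = "https://example.com/"
USER_AGENT = "Mozilla/5.0 (compatible; example-spider)"


def build_request(url, referer=REFERER, user_agent=USER_AGENT):
    """Build a request that carries Referer and User-Agent headers."""
    return Request(url, headers={"Referer": referer, "User-Agent": user_agent})


def fetch(url, timeout=10):
    """Download one URL and return the raw response body as bytes."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


def crawl_all(urls, fetch_func=fetch, max_workers=8):
    """Fetch all URLs on a thread pool.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it, so one failure does not stop the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch_func, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep the error for later retry
                results[url] = exc
    return results
```

Passing a custom `fetch_func` keeps the pool logic independent of the network code, so it can be exercised offline or swapped for a proxy-aware downloader. URLs whose value is an exception can be collected and retried in a later pass.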