fintech_spider

fintech (i.e., financial technology)

"fintech_spider" is a Scrapy-based spider for crawling large amounts of financial data from the Internet.

The data crawled by "fintech_spider" has been used by 嗅金牛 and 数知源.

Directory	Author	Usage
Anti_Anti_Spider	hee

demo		Demonstrations (e.g. PhantomJS, proxies)
demo/ArticleSpider	lxw
demo/geetestcrack.py	hee
demo/phantomjs_proxy	lxw	Using an IP proxy with PhantomJS
demo/user_agent.txt	hee	A large list of User-Agent strings

README.md	lxw	Documentation for this project

Spiders		Python scripts that crawl the data we need from the Internet
Spiders/CJO_case_demo.md	lxw	Example cases and the main idea of how to crawl data from 中国裁判文书网 (China Judgements Online)
Spiders/CJOSpider	lxw	Spiders (w/ Scrapy) for crawling data from 中国裁判文书网 (China Judgements Online)
Spiders/CJOSpider_wo_scrapy.py	lxw	Spider (w/o Scrapy) for crawling data from 中国裁判文书网 (China Judgements Online)
Spiders/CninfoSpider	hee	Spiders for crawling data from 巨潮资讯 (Cninfo)
Spiders/CNKI_Patent	lxw	Spiders for crawling patent data from 中国知网 (CNKI)
Spiders/NECIPSSpider	lxw	Spiders (w/ Scrapy) for crawling data from 国家企业信用信息公示系统 (National Enterprise Credit Information Publicity System)
Spiders/NECIPSSpider_wo_scrapy.py	lxw	Spider (w/o Scrapy) for crawling data from 国家企业信用信息公示系统 (National Enterprise Credit Information Publicity System)
Spiders/new_three_board	lxw	Spiders for crawling data from 全国中小企业股份转让系统 (National Equities Exchange and Quotations)
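
The table lists demo/user_agent.txt as a large collection of User-Agents. As a rough illustration of how such a list could be used to vary request headers, here is a minimal sketch. It assumes the file holds one User-Agent string per line, and the names `load_user_agents` and `RandomUserAgentMiddleware` are hypothetical and not taken from this repository. The class follows the shape of a Scrapy downloader middleware (`process_request(request, spider)`), but wiring it into a project's settings is left out.

```python
import random
from typing import Iterable, List, Optional


def load_user_agents(lines: Iterable[str]) -> List[str]:
    """Return stripped User-Agent strings, skipping blank lines.

    Assumes one User-Agent per line (an assumption about the file format).
    """
    return [line.strip() for line in lines if line.strip()]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware; not part of this repository."""

    def __init__(self, user_agents: List[str], rng: Optional[random.Random] = None):
        if not user_agents:
            raise ValueError("at least one User-Agent is required")
        self.user_agents = user_agents
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Pick a random User-Agent for each outgoing request.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None lets the request continue through the pipeline.
        return None
```

With a real file, the list could be built with `load_user_agents(open("demo/user_agent.txt"))`, again assuming the one-per-line layout.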

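[NO, in principle simply re-running CJOSpider.py should be enough] Add code to crawl the tasks in the Redis TASKS_HASH that have not finished crawling (is their number always less than CONCURRENT_REQUESTS?)
[NO, in principle simply re-running CJODocIDSpider.py should be enough] Add code to crawl the tasks in the Redis DOC_ID_HASH that have not finished crawling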
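
For reference, the idea behind the two notes above (finding hash entries that have not finished crawling) might look roughly like the sketch below. How TASKS_HASH and DOC_ID_HASH actually encode task state is not documented here, so the `"0"` status value and the function name `unfinished_tasks` are purely hypothetical. A plain dict stands in for the Redis hash so that the example runs without a server. With redis-py, a similar mapping could be read with `hgetall`.

```python
from typing import List, Mapping

# Hypothetical marker for "not finished"; the real encoding is an assumption.
UNFINISHED = "0"


def unfinished_tasks(tasks: Mapping[str, str], unfinished: str = UNFINISHED) -> List[str]:
    """Return the sorted keys of tasks whose status equals the unfinished marker."""
    return sorted(key for key, status in tasks.items() if status == unfinished)


# A plain dict stands in for a snapshot of the Redis hash.
snapshot = {"task:a": "1", "task:b": "0", "task:c": "0"}
print(unfinished_tasks(snapshot))  # ['task:b', 'task:c']
```

The returned keys could then be re-queued, although the notes above suggest that re-running the spider should already cover this case.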
