WenZuHuai/fintech_spider

fintech_spider

fintech (i.e. financial technology)

"fintech_spider" is a Scrapy-based spider that crawls large amounts of financial data from the Internet.

The data crawled by "fintech_spider" has been used by 嗅金牛 and 数知源.

Structure of fintech_spider

| Directory | Author | Usage |
| --- | --- | --- |
| Anti_Anti_Spider | hee | |
| demo | | Demonstrations (e.g. PhantomJS, proxies) |
| demo/ArticleSpider | lxw | |
| demo/geetestcrack.py | hee | |
| demo/phantomjs_proxy | lxw | Adds an IP proxy to PhantomJS |
| demo/user_agent.txt | hee | A large collection of User-Agent strings |
| README.md | lxw | Documentation for this project |
| Spiders | | Python scripts that crawl the data we need from the Internet |
| Spiders/CJO_case_demo.md | lxw | Sample cases and the main idea of how to crawl data from 中国裁判文书网 (China Judgements Online) |
| Spiders/CJOSpider | lxw | Spiders (with Scrapy) for crawling data from 中国裁判文书网 (China Judgements Online) |
| Spiders/CJOSpider_wo_scrapy.py | lxw | Spider (without Scrapy) for crawling data from 中国裁判文书网 (China Judgements Online) |
| Spiders/CninfoSpider | hee | Spiders for crawling data from 巨潮资讯 (Cninfo) |
| Spiders/CNKI_Patent | lxw | Spiders for crawling patent data from 中国知网 (CNKI) |
| Spiders/NECIPSSpider | lxw | Spiders for crawling data from 国家企业信用信息公示系统 (National Enterprise Credit Information Publicity System) |
| Spiders/NECIPSSpider_wo_scrapy.py | lxw | Spider (without Scrapy) for crawling data from 国家企业信用信息公示系统 (National Enterprise Credit Information Publicity System) |
| Spiders/new_three_board | lxw | Spiders for crawling data from 全国中小企业股份转让系统 (National Equities Exchange and Quotations) |
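
The demo entries mention a User-Agent list and an IP proxy for PhantomJS. The sketch below shows one way these two pieces might be prepared. It assumes that user_agent.txt holds one User-Agent string per line (the file format is not described here). The helper names, the sample lines, and the proxy address are hypothetical and are not taken from the repository. `--proxy` and `--proxy-type` are PhantomJS command-line options.

```python
import random


def load_user_agents(lines):
    """Return the non-empty, stripped User-Agent strings from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def pick_user_agent(user_agents, rng=random):
    """Pick one User-Agent at random so consecutive requests look less uniform."""
    return rng.choice(user_agents)


def phantomjs_proxy_args(host, port, proxy_type="http"):
    """Build the PhantomJS command-line arguments that route traffic through a proxy."""
    return [
        "--proxy={}:{}".format(host, port),
        "--proxy-type={}".format(proxy_type),
    ]


# Hypothetical stand-in for the contents of demo/user_agent.txt.
SAMPLE_LINES = [
    "Mozilla/5.0 (X11; Linux x86_64)\n",
    "\n",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)\n",
]

if __name__ == "__main__":
    agents = load_user_agents(SAMPLE_LINES)
    print(pick_user_agent(agents))
    # 127.0.0.1:8080 is a placeholder proxy address.
    print(phantomjs_proxy_args("127.0.0.1", 8080))
```

With a real file, `load_user_agents(open("demo/user_agent.txt"))` would replace the sample list, and the returned argument list could be appended to the `phantomjs` command line.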

TODO

He Chen:

  1. Update README.md with the purpose of each committed directory (if a subdirectory contains key files, list them as well)

Xiaowei Liu:

  • CJOSpider
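  1. [NO, in principle simply re-running CJOSpider.py should be enough] Add code to crawl the tasks in the Redis TASKS_HASH that have not finished crawling (there must be fewer than CONCURRENT_REQUESTS of them?)
  2. [NO, in principle simply re-running CJODocIDSpider.py should be enough] Add code to crawl the tasks in the Redis DOC_ID_HASH that have not finished crawling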
  • NECIPSSpider
  1. add a Referer header to NECIPSSpider_wo_scrapy.py
  2. threadpool for NECIPSSpider_wo_scrapy.py
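
The two NECIPSSpider items above (a Referer header and a thread pool) could look roughly like the following. This is only a sketch using the Python standard library, not the project's actual code: the function names, URLs, and default User-Agent are hypothetical, and the real script may use a different HTTP client.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def build_request(url, referer, user_agent="Mozilla/5.0"):
    """Create a request that carries Referer and User-Agent headers."""
    return urllib.request.Request(
        url,
        headers={"Referer": referer, "User-Agent": user_agent},
    )


def fetch(url, referer, timeout=10):
    """Download one page; performs network I/O when called."""
    with urllib.request.urlopen(build_request(url, referer), timeout=timeout) as resp:
        return resp.read()


def crawl_all(urls, fetch_one, max_workers=8):
    """Run fetch_one over urls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))


if __name__ == "__main__":
    # Placeholder URLs; replace with the real list and detail pages.
    referer = "http://example.com/list"
    urls = ["http://example.com/detail/1", "http://example.com/detail/2"]
    pages = crawl_all(urls, lambda u: fetch(u, referer), max_workers=2)
    print([len(p) for p in pages])
```

A thread pool suits this kind of I/O-bound work because threads spend most of their time waiting on the network. `max_workers` also gives a single place to cap concurrency against the target site.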
