scrapy command note
scrapy basic commands
Scrapy has some useful subcommands, like "startproject", which I introduced in a previous entry:
Quickly build a crawler with a Python framework. "Python Framework Scrapy" - Kenkyu Hack
This is a note on Scrapy subcommands.
startproject
Create a Scrapy project.

$ scrapy startproject newproject
You can edit the python files under the newproject directory.
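For reference, a freshly created project looks roughly like the layout below (the exact files depend on your Scrapy version):

```
newproject/
├── scrapy.cfg           # project configuration file
└── newproject/          # the project's python module
    ├── __init__.py
    ├── items.py         # item definitions
    ├── pipelines.py     # item pipelines
    ├── settings.py      # project settings
    └── spiders/         # your spiders go here
        └── __init__.py
```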
genspider
Create a new spider and check the available templates.

$ scrapy genspider -t basic newspider01 example.com
Created spider 'newspider01' using template 'basic' in module:
  scrapy_sample.spiders.newspider01
This creates a spider named "newspider01" that crawls "http://www.example.com/".
The following command shows the available templates.
$ scrapy genspider -l
basic
crawl
csvfeed
xmlfeed
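For reference, the file generated by the basic template looks roughly like the sketch below. This is an assumption-heavy sketch: the exact contents depend on your Scrapy version, and BaseSpider is stubbed out here so it runs without Scrapy installed (in the real generated file it is imported from Scrapy).

```python
# Sketch of newproject/spiders/newspider01.py as produced by the 'basic'
# template. BaseSpider below is a stand-in stub, NOT the real Scrapy class,
# so the sketch is self-contained.
class BaseSpider(object):
    """Stand-in for Scrapy's BaseSpider."""

class Newspider01Spider(BaseSpider):
    name = "newspider01"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def parse(self, response):
        # Scrapy calls this once per downloaded response;
        # your extraction logic goes here.
        pass

print(Newspider01Spider.name)        # newspider01
print(Newspider01Spider.start_urls)  # ['http://www.example.com/']
```

You then fill in parse() and run the spider with the crawl subcommand described next.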
crawl
Start crawling with a spider.

$ scrapy crawl newspider01
list
Show all spiders in the project.

$ scrapy list
newspider01
newspider02
view
Open a web page in a browser.

$ scrapy view http://www.example.com/
This opens the given URL in your browser.
shell
Check parameters in a python console.

$ scrapy shell http://www.example.com/some/page.html
...
[s] Available Scrapy objects:
[s]   hxs        <HtmlXPathSelector xpath=None data=u'<html><head><title>Example Domain</title'>
[s]   item       {}
[s]   request    <GET http://www.example.com/some/page.html>
[s]   response   <200 http://www.iana.org/domains/example>
[s]   settings   <CrawlerSettings module=<module 'scrapy_sample.settings' from '/Users/shinya/scrapy_sample/scrapy_sample/settings.pyc'>>
[s]   spider     <BaseSpider 'default' at 0x10a0ef190>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
>>> print hxs
<HtmlXPathSelector xpath=None data=u'<html><head><title>Example Domain</title'>
In this console you can check the parameters the spider took from the URL.
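The hxs object above is an XPath selector over the fetched page. As a rough stand-in for what an expression like hxs.select('//title/text()') does (Scrapy's selectors are actually built on lxml, not the standard library), the same kind of extraction can be sketched with xml.etree on a small, well-formed snippet:

```python
# Conceptual sketch of XPath extraction as done in the scrapy shell,
# using only the standard library on a well-formed snippet
# (real-world HTML needs a forgiving parser such as lxml).
from xml.etree import ElementTree

html = "<html><head><title>Example Domain</title></head><body></body></html>"
root = ElementTree.fromstring(html)

# XPath-style lookup: all <title> elements anywhere under the root
titles = [el.text for el in root.findall(".//title")]
print(titles)  # ['Example Domain']
```

In the real shell you would experiment with such expressions interactively against the fetched response before copying them into your spider's parse().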
You can see more information here.