Default template for my scraping projects. Usually it's a few scrapers plus a REST API on top, for a JavaScript/mobile frontend or another service to consume.

Requirements:
- MongoDB
- Redis

Clone the repo:

    git clone https://github.com/istinspring/imscrape-template
    cd imscrape-template

Install project dependencies:

    pip install -r requirements.txt
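
The pinned requirements.txt in the repo is authoritative; purely as an illustration of the stack described in this README, it would contain roughly the following (the scraping libraries `requests`/`lxml` are assumptions, not taken from the repo):

```
eve        # REST API layer on top of MongoDB
celery     # task queue for the crawlers
redis      # client for the Redis broker
pymongo    # MongoDB driver (also a dependency of eve)
honcho     # Procfile-based process manager
requests   # HTTP client for the scrapers (assumption)
lxml       # HTML parsing for the scrapers (assumption)
```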

Run the test scraper:

    python cli.py -T github
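
The scraper dispatch in cli.py is not reproduced in this README; as a rough sketch of the structure implied by the -T flag (the `scrapers` package and its per-module `run()` entry point are assumed names, not taken from the repo):

```python
# cli.py -- illustrative sketch only; the real cli.py in the repo may differ
import argparse
import importlib


def main():
    parser = argparse.ArgumentParser(description="Run a scraper by name")
    parser.add_argument("-T", "--target", required=True,
                        help="scraper to run, e.g. 'github'")
    args = parser.parse_args()

    # -T github -> import scrapers/github.py and call its run() entry point
    # (the scrapers package and the run() convention are assumptions)
    scraper = importlib.import_module("scrapers.%s" % args.target)
    scraper.run()


if __name__ == "__main__":
    main()
```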

Run the REST API:

    python api.py
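
The sample response further down carries Eve's characteristic `_etag`/`_created`/`_meta` fields, so api.py is presumably a standard Eve application. A minimal sketch, assuming Eve with an inline settings dict (the real project appears to keep configuration in settings.py; the schema fields mirror the sample output, the database name is made up):

```python
# api.py -- minimal Eve sketch (assumed configuration, not the actual file)
from eve import Eve

settings = {
    "MONGO_HOST": "localhost",
    "MONGO_PORT": 27017,
    "MONGO_DBNAME": "imscrape",   # assumed database name
    "DOMAIN": {
        # endpoint name taken from the sample response below
        "github_favorites": {
            "schema": {
                "repo_name": {"type": "string"},
                "source_url": {"type": "string"},
                "author": {"type": "string"},
                "stars": {"type": "integer"},
                "forks": {"type": "integer"},
                "watchers": {"type": "integer"},
                "commits": {"type": "integer"},
            }
        }
    },
}

app = Eve(settings=settings)

if __name__ == "__main__":
    app.run(port=8000)
```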

Run the API and Celery together using the Procfile:

    honcho start
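
honcho starts every process listed in a Procfile in the project root. The repository's Procfile isn't shown here, but for the two commands above it would look roughly like this (the `tasks` module name for the Celery app is an assumption):

```
api: python api.py
worker: celery -A tasks worker --loglevel=info
```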

Run the crawler as a Celery task by passing the -c option to the cli.py (command line interface) script:

    python cli.py -T github -c
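
A minimal sketch of how the Celery side might look, assuming a tasks.py module and the same hypothetical `scrapers.<name>.run()` convention as above; with -c, cli.py would call `run_scraper.delay(args.target)` instead of running the scraper inline:

```python
# tasks.py -- illustrative Celery sketch (assumed names; the broker URL would
# normally come from settings.py / environment variables)
import importlib

from celery import Celery

app = Celery("imscrape", broker="redis://localhost:6379/0")


@app.task
def run_scraper(name):
    # same dispatch as cli.py, but executed by a Celery worker
    scraper = importlib.import_module("scrapers.%s" % name)
    scraper.run()
```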

Then open http://localhost:8000 (both JSON and XML are supported via the `Accept` header).
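
For example, assuming Eve's standard content negotiation (the `github_favorites` endpoint name is taken from the sample response below):

```bash
# request the collection as XML
curl -H "Accept: application/xml" http://localhost:8000/github_favorites

# the same collection as JSON
curl -H "Accept: application/json" http://localhost:8000/github_favorites
```

An XML response looks like this: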

    <resource href="github_favorites" title="github_favorites">
      <link rel="last" href="github_favorites?page=3" title="last page"/>
      <link rel="next" href="github_favorites?page=2" title="next page"/>
      <link rel="parent" href="/" title="home"/>
      <_meta>
        <max_results>10</max_results>
        <page>1</page>
        <total>25</total>
      </_meta>
      <resource href="github_favorites/555179e74dc7822d62abc2b5" title="Github_favorite">
        <_created>Tue, 12 May 2015 03:56:23 GMT</_created>
        <_etag>83dc8829ef61dfeb52111e04cb04a6e534b36810</_etag>
        <_id>555179e74dc7822d62abc2b5</_id>
        <_updated>Tue, 12 May 2015 03:56:23 GMT</_updated>
        <author>https://github.com/square</author>
        <commits>29</commits>
        <forks>120</forks>
        <repo_name>leakcanary</repo_name>
        <source_url>https://github.com/square/leakcanary</source_url>
        <stars>2116</stars>
        <watchers>155</watchers>
      </resource>
      ...
    </resource>

- database POST/update via internal Eve (+ validate data)
- add honcho
- init settings.py constants from environment variables (see the sketch below)
- add celery
- add Makefile
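
For the settings.py-from-environment-variables item, a minimal sketch of the usual pattern (the variable names and defaults are assumptions, not the project's actual settings):

```python
# settings.py -- sketch of reading constants from environment variables
import os

MONGO_HOST = os.environ.get("MONGO_HOST", "localhost")
MONGO_PORT = int(os.environ.get("MONGO_PORT", "27017"))
MONGO_DBNAME = os.environ.get("MONGO_DBNAME", "imscrape")

REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", REDIS_URL)
```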