BulkDownloader

Bulk Download

This module downloads and works with the weekly bulk Patent zip files, along with other external data resources.

Features

Async Downloads
Automatic retry on failure
Restartable (currently synchronous downloads only)

Sources

USPTO Bulk Patent Download, grants and applications from Google or Reedtech
Patent CPC Classification Scheme
FDA "NDA" Drug Database

Additional sources can be added to sources.xml

Download Bulk Patent Zips from USPTO

Downloads from https://bulkdata.uspto.gov/

  gov.uspto.bulkdata.cli2.BulkData

    Options:
      --type=application               Data type: [grant, application, gazette]
      --date="20140101-20161231"       Single date range or comma-separated list of date ranges
      --limit=0
      --skip=0
      --outdir="../download"
      --async=false
      --filename="ipa140109.zip"
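The `--date` value above is either one `yyyyMMdd-yyyyMMdd` range or several such ranges joined by commas. As a rough illustration of that format only, here is a minimal sketch of how such a value could be split into start/end pairs. The class `DateRangeSketch` and its `parse` method are hypothetical names made up for this example; they are not part of this module, and the module's own parsing may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BulkDownloader: illustrates the --date format.
class DateRangeSketch {
    // BASIC_ISO_DATE parses dates written without separators, e.g. 20140101.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    /** Parses "20140101-20161231,20170101-20170131" into {start, end} pairs. */
    static List<LocalDate[]> parse(String value) {
        List<LocalDate[]> ranges = new ArrayList<>();
        for (String part : value.split(",")) {
            String[] bounds = part.trim().split("-");
            if (bounds.length != 2) {
                throw new IllegalArgumentException("Expected start-end, got: " + part);
            }
            LocalDate start = LocalDate.parse(bounds[0].trim(), FMT);
            LocalDate end = LocalDate.parse(bounds[1].trim(), FMT);
            if (end.isBefore(start)) {
                throw new IllegalArgumentException("End precedes start in: " + part);
            }
            ranges.add(new LocalDate[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print each parsed range as "start to end".
        for (LocalDate[] range : parse("20140101-20161231")) {
            System.out.println(range[0] + " to " + range[1]);
        }
    }
}
```

The sketch rejects reversed ranges up front so that a mistyped value fails before any download starts; whether the real CLI does the same is not stated here.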

Download other External Resources

 gov.uspto.bulkdata.cli.Download

    Options:
      --available             Display available sources
      --source=cpc            Source provider: [cpc, fda, reedtech, google]
      --type=cpc_scheme       Data type: [cpc_scheme, nda, patent_grant, patent_application]
      --limit=1
      --skip=0
      --outdir="../download"
      --async=false
      --filename="ipa140109.zip"

Extract Patent Documents

 gov.uspto.bulkdata.cli.ExtractPatent --source="download/ipa150101.zip" --skip 0 --limit 5 --outdir="download" --aps=false

View single Patent Document

 gov.uspto.bulkdata.cli.Look

   Options:
      --source="download/ipa150101.zip"
      --skip=0
      --limit=1
      --num=100                    Display by iteration number in bulk file
      --id=US3931903A1             Display by Patent ID
      --fields=id,title,family     Fields to display
      --out=download/patent.xml    Output to File instead of STDOUT
      --aps=true                   Viewing a Greenbook Patent
      
   Fields:
      raw        Display raw Document
      object     Display Patent toString()
      id
      title
      abstract
      description
      citations
      claims
      assignee
      inventor
      classification
      family

Dump a single Patent XML Document by its position in the zip file, here the 3rd document:

 gov.uspto.bulkdata.cli.Look --source="download/ipa150305.zip" --num=3 --fields=xml --out=download/patent.xml
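To make "the 3rd document" concrete: the sketch below assumes the bulk file, once unzipped, is a sequence of XML documents written one after another, each beginning with an `<?xml` declaration at the start of a line. That layout is an assumption of this example, and `NthDocumentSketch` is a hypothetical class, not the code behind `Look`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of BulkDownloader: picks the n-th document
// out of a stream of concatenated XML documents.
class NthDocumentSketch {
    /** Returns the n-th (1-based) document, or null if there are fewer than n. */
    static String nth(Reader in, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n is 1-based");
        }
        StringBuilder doc = new StringBuilder();
        int index = 0; // number of document starts seen so far
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("<?xml")) {
                    if (index == n) {
                        break; // the following document begins; stop reading
                    }
                    index++;
                }
                if (index == n) {
                    doc.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return doc.length() > 0 ? doc.toString() : null;
    }

    public static void main(String[] args) {
        String bulk = "<?xml version=\"1.0\"?>\n<a/>\n<?xml version=\"1.0\"?>\n<b/>\n";
        System.out.print(nth(new StringReader(bulk), 2));
    }
}
```

Because it stops reading as soon as the following document begins, lookup by position only scans the file up to the requested document.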

Dump a single Patent XML Document by ID (note: this may be slow, as it parses each document to check its ID):

 gov.uspto.bulkdata.cli.Look --source="download/ipa150305.zip" --id=US3931903A1 --fields=xml --out=download/patent.xml
 # id requirements: country code, patent id without leading zeros, and kind code.
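The ID requirements above (country code, number without leading zeros, kind code) can be illustrated with a small sketch that assembles an ID such as `US3931903A1` from its parts. `PatentIdSketch` and `build` are hypothetical names used only for this example, and the sketch covers plain numeric patent numbers only; prefixed numbers (for example design or reissue patents) are not handled.

```java
// Hypothetical helper, not part of BulkDownloader: assembles an ID in the
// form expected by --id from its three parts.
class PatentIdSketch {
    /** Joins country code, number (leading zeros stripped), and kind code. */
    static String build(String country, String number, String kind) {
        // Strip leading zeros but always keep at least one digit.
        String digits = number.trim().replaceFirst("^0+(?=\\d)", "");
        return country.trim().toUpperCase() + digits + kind.trim().toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(build("US", "03931903", "A1"));
    }
}
```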
