This module provides the ability to download and work with weekly bulk Patent zip files, along with other external data resources.
- Async Downloads
- Automatic retry on failure
- Restartable (currently only synchronous downloads)
- USPTO Bulk Patent Download, grants and applications from Google or Reedtech
- Patent CPC Classification Scheme
- FDA "NDA" Drug Database
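The feature list above mentions automatic retry on failure but does not describe the policy the module uses. As a rough illustration only, a retry loop with exponential backoff could look like the sketch below. `RetrySketch`, `withRetry`, and its parameters are hypothetical names chosen for this example and are not part of the module.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not the module's actual retry code.
final class RetrySketch {

    /**
     * Runs the task until it succeeds or maxAttempts is reached,
     * doubling the wait between attempts (assumed policy).
     */
    static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // back off exponentially
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task; a real caller would perform the download here.
        String result = withRetry(() -> "downloaded", 3, 100);
        System.out.println(result);
    }
}
```

Wrapping a download call in a helper like this keeps the retry policy in one place, so the attempt count and delay can be tuned without touching the download code.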
Downloads from https://bulkdata.uspto.gov/
gov.uspto.bulkdata.cli2.BulkData
Options:
--type=application Data type: [grant, application, gazette]
--date="20140101-20161231" Single date range or comma-separated list of date ranges
--limit=0
--skip=0
--outdir="../download"
--async=false
--filename="ipa140109.zip"
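To make the --date format concrete, here is a minimal sketch of how a value such as "20140101-20161231,20170101-20170131" could be split into ranges. It assumes each range is written START-END with both dates in yyyyMMdd form, as in the example above. `DateRangeSketch` is a hypothetical helper for illustration, and the tool's own parsing may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of gov.uspto.bulkdata.
final class DateRangeSketch {
    // BASIC_ISO_DATE matches the yyyyMMdd shape used in the --date example.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    /** Returns one {start, end} pair per comma-separated START-END range. */
    static List<LocalDate[]> parse(String value) {
        List<LocalDate[]> ranges = new ArrayList<>();
        for (String part : value.split(",")) {
            String[] ends = part.trim().split("-");
            if (ends.length != 2) {
                throw new IllegalArgumentException("Expected START-END, got: " + part);
            }
            LocalDate start = LocalDate.parse(ends[0].trim(), FMT);
            LocalDate end = LocalDate.parse(ends[1].trim(), FMT);
            if (end.isBefore(start)) {
                throw new IllegalArgumentException("Range ends before it starts: " + part);
            }
            ranges.add(new LocalDate[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (LocalDate[] r : parse("20140101-20161231")) {
            System.out.println(r[0] + " to " + r[1]);
        }
    }
}
```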
gov.uspto.bulkdata.cli.Download
Options:
--available Display available sources
--source=cpc Source provider: [cpc, fda, reedtech, google]
--type=cpc_scheme Data type: [cpc_scheme, nda, patent_grant, patent_application]
--limit=1
--skip=0
--outdir="../download"
--async=false
--filename="ipa140109.zip"
gov.uspto.bulkdata.cli.ExtractPatent --source="download/ipa150101.zip" --skip 0 --limit 5 --outdir="download" --aps=false
gov.uspto.bulkdata.cli.Look
Options:
--source="download/ipa150101.zip"
--skip=0
--limit=1
--num=100 Display by iteration number in bulk file
--id=US3931903A1 Display by Patent ID
--fields=id,title,family Fields to display
--out=download/patent.xml Output to File instead of STDOUT
--aps=true Viewing a Greenbook Patent
Fields:
raw Display raw Document
object Display Patent toString()
id
title
abstract
description
citations
claims
assignee
inventor
classification
family
gov.uspto.bulkdata.cli.Look --source="download/ipa150305.zip" --num=3 --fields=xml --out=download/patent.xml
Dump a single Patent XML Document by ID (note: this may be slow, as it parses each document to check its ID):
gov.uspto.bulkdata.cli.Look --source="download/ipa150305.zip" --id=US3931903A1 --fields=xml --out=download/patent.xml
# ID requirements: country code, patent number without leading zeros, and kind code.
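The ID shape described above can be checked before running a lookup. The sketch below normalizes an ID by upper-casing it and stripping leading zeros from the numeric part. It assumes a two-letter country code, a purely numeric patent number, and a kind code made of one letter plus an optional digit (as in US3931903A1). IDs with letter prefixes are not handled. `PatentIdSketch` is a hypothetical helper, not part of the module.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of gov.uspto.bulkdata.
final class PatentIdSketch {
    // Groups: country code, number without leading zeros, kind code (assumed shapes).
    private static final Pattern ID =
            Pattern.compile("^([A-Z]{2})0*([1-9][0-9]*)([A-Z][0-9]?)$");

    /** Returns the ID as country code + number without leading zeros + kind code. */
    static String normalize(String raw) {
        Matcher m = ID.matcher(raw.trim().toUpperCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized patent ID: " + raw);
        }
        return m.group(1) + m.group(2) + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(normalize("US03931903A1"));
    }
}
```

The normalized value can then be passed to the --id option of gov.uspto.bulkdata.cli.Look.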