Bulk download British Library Heritage Made Digital Newspapers 📰

Living-with-machines/hmd_newspaper_dl

hmd_download

Bulk download Heritage Made Digital digitised newspapers from the British Library Research Repository

This command line tool is intended to make it easy to bulk download Heritage Made Digital Newspapers from the British Library Research Repository.

The tool has been used by Living with Machines but may be of use to other people. Because it is designed to download the collection in bulk, it is most likely to be useful if you want either:

  • all HMD newspapers
  • a random sample, e.g. 10 newspapers

This tool was developed for internal use, so it might not be suitable for your needs. If you have problems with the tool or suggestions for improving it, please open an issue.

Install

The tool was developed using nbdev, so although all of the code for this tool lives inside a single Jupyter notebook, you can still install it as a Python package. At the moment this is done via GitHub:

python -m pip install git+https://github.com/Living-with-machines/hmd_newspaper_dl

It is recommended to install the package inside a virtual environment. Since this is a command line tool, one simple option is pipx, which will install the tool inside a new virtual environment for you:

pipx install git+https://github.com/Living-with-machines/hmd_newspaper_dl

How to use

Once you have installed the package, a console script named hmd_download will be available:

usage: hmd_download [-h] [--n_threads N_THREADS] [--subset SUBSET] save_dir

Download HMD newspaper from iro to `save_dir` using `n_threads`

positional arguments:
  save_dir               Output Directory

optional arguments:
  -h, --help             show this help message and exit
  --n_threads N_THREADS  Number threads to use (default: 8)
  --subset SUBSET        Download subset of HMD

By default this downloads all available newspaper titles. If you only want a subset, pass the --subset option with the number of titles you want. At the moment the subset is a random selection of titles.
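If you want to call the tool from a Python script rather than typing the command by hand, a minimal sketch might look like the following. It only assembles the argument list from the options shown in the usage text above. The helper name build_hmd_download_command and the directory name hmd_data are illustrative and not part of the package.

```python
import shlex


def build_hmd_download_command(save_dir, n_threads=8, subset=None):
    """Assemble an hmd_download command line as a list of arguments.

    Hypothetical helper for illustration only. It uses the flags listed
    in the tool's usage text: --n_threads, --subset and save_dir.
    """
    cmd = ["hmd_download", "--n_threads", str(n_threads)]
    if subset is not None:
        # --subset takes the number of (randomly selected) titles to download
        cmd += ["--subset", str(subset)]
    cmd.append(str(save_dir))
    return cmd


if __name__ == "__main__":
    # Request a random sample of 10 titles, saved into ./hmd_data
    cmd = build_hmd_download_command("hmd_data", subset=10)
    print(shlex.join(cmd))
    # To actually start the download (requires the package to be installed):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Leaving subset as None omits the --subset flag, which corresponds to the default behaviour of downloading every available title.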

Feedback

This tool was put together for internal use by Living with Machines but is shared in case it is helpful for other people. If you have feedback, run into problems, or want to suggest changes, please open a new issue.