Bulk download Heritage Made Digital digitised newspapers from the British Library Research Repository
This command line tool is intended to make it easy to bulk download Heritage Made Digital Newspapers from the British Library Research Repository.
The tool has been used by Living with Machines but may be of use to other people. Because it is designed to download the collection in bulk, it is most likely to be useful if you want either:
- all HMD newspapers
- a random sample, e.g. 10 newspapers
This tool was developed for internal use, so it might not be suitable for your needs. If you have problems with the tool or suggestions for improving it, please open an issue.
The tool was developed using nbdev, so although all of the code for this tool lives inside a single Jupyter notebook, you can still install it as a Python package. At the moment this is done via GitHub:
python -m pip install git+https://github.com/Living-with-machines/hmd_newspaper_dl
It is recommended to install the package inside a virtual environment. Since this is a command line tool, one simple option is pipx, which will install the tool inside a new virtual environment for you:
pipx install git+https://github.com/Living-with-machines/hmd_newspaper_dl
Once you have installed the package, a console script, hmd_download, will be available:
usage: hmd_download [-h] [--n_threads N_THREADS] [--subset SUBSET] save_dir
Download HMD newspaper from iro to `save_dir` using `n_threads`
positional arguments:
  save_dir               Output Directory
optional arguments:
  -h, --help             show this help message and exit
  --n_threads N_THREADS  Number threads to use (default: 8)
  --subset SUBSET        Download subset of HMD
By default, the tool downloads all available newspaper titles. If you only want a subset, pass the --subset option to specify how many titles you want. At the moment this is just a random selection.
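To make the idea of "a random subset, fetched with several threads" concrete, here is a minimal Python sketch. It is not the tool's actual implementation: the function names `choose_titles` and `download_all`, the `seed` parameter, and the placeholder `fetch` callable are all hypothetical and exist only for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def choose_titles(
    titles: Sequence[str],
    subset: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Return every title, or a random sample of `subset` titles.

    Hypothetical helper: mirrors the described behaviour of --subset,
    not the tool's real code.
    """
    if subset is None:
        return list(titles)
    if subset < 0 or subset > len(titles):
        raise ValueError("subset must be between 0 and the number of titles")
    # Sampling without replacement, so no title is picked twice.
    return random.Random(seed).sample(list(titles), subset)


def download_all(
    titles: Sequence[str],
    fetch: Callable[[str], str],
    n_threads: int = 8,
) -> List[str]:
    """Apply `fetch` to each title using a pool of `n_threads` threads.

    Results come back in the same order as `titles`.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(fetch, titles))


if __name__ == "__main__":
    # Made-up title names; a real run would list titles from the repository.
    all_titles = [f"title_{i}" for i in range(20)]
    picked = choose_titles(all_titles, subset=5)
    # Placeholder fetch function that performs no network access.
    print(download_all(picked, lambda t: f"would download {t}", n_threads=4))
```

Threads suit this kind of job because downloading is mostly waiting on the network, so several requests can be in flight at once.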
This tool was put together for internal Living with Machines use but is shared in case it is helpful to other people. If you have feedback, run into problems, or want to suggest changes, please open a new issue.