
realzza/xenopy

 
 


birdData

BirdData is a Python wrapper for the Xeno-canto API 2.0. It lets users download bird data with a single command and supports multithreaded downloading.

Environment

Clone the repository:

git clone git@github.com:realzza/birdData.git

Set up the environment:

pip install -r requirement.txt

Download Metadata

Metadata is a short description of each recording. Metadata files typically contain information such as the recordist, recording time, country, location, latitude, longitude, altitude, and recording length. Below is an example of a metadata file.

{
    "id": "426350", 
    "gen": "Abroscopus", 
    "sp": "superciliaris", 
    "ssp": "", 
    "en": "Yellow-bellied Warbler", 
    "rec": "Peter Boesman", 
    "cnt": "India", 
    "loc": "Eagle Nest, Sessni area and lower, Arrunachal Pradesh",
    "lat": "27.0223", 
    "lng": "92.4139",
    "alt": "", 
    "type": "song", 
    "url": "//xeno-canto.org/426350", 
    "file": "https://xeno-canto.org/426350/download"
}
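Because each metadata file is plain JSON, Python's standard json module is enough to read one. The sketch below assumes a file shaped like the example above; the helper name summarize_recording is hypothetical and not part of birdData. It converts the string coordinates to floats and turns the protocol-relative url field into a full HTTPS link.

```python
import json

# A trimmed copy of the metadata example above.
SAMPLE = """{
    "id": "426350",
    "gen": "Abroscopus",
    "sp": "superciliaris",
    "en": "Yellow-bellied Warbler",
    "cnt": "India",
    "lat": "27.0223",
    "lng": "92.4139",
    "type": "song",
    "url": "//xeno-canto.org/426350",
    "file": "https://xeno-canto.org/426350/download"
}"""


def summarize_recording(meta: dict) -> dict:
    """Hypothetical helper: pull a few useful fields out of one metadata record."""
    # Coordinates are stored as strings and may be empty, so convert carefully.
    lat = float(meta["lat"]) if meta.get("lat") else None
    lng = float(meta["lng"]) if meta.get("lng") else None
    url = meta["url"]
    return {
        # Lowercase scientific name, the form the download scripts expect.
        "species": f'{meta["gen"]} {meta["sp"]}'.lower(),
        "common_name": meta.get("en", ""),
        # "url" is protocol-relative ("//..."), so prepend a scheme.
        "page": ("https:" + url) if url.startswith("//") else url,
        "audio": meta["file"],
        "coords": (lat, lng),
    }


if __name__ == "__main__":
    print(summarize_recording(json.loads(SAMPLE)))
```

For the example record, species comes out as "abroscopus superciliaris", the lowercase form used by the download commands further down.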

Use download_meta.py to download metadata files. Customize your query by setting one or more of the parameters below before requesting metadata from the Xeno-canto API.

optional arguments:
  -h, --help           show this help message and exit
  --gen GEN            genus
  --ssp SSP            subspecies
  --cnt CNT            country
  --type TYPE          type
  --rmk RMK            remark
  --lat LAT            latitude
  --lon LON            longitude
  --loc LOC            location
  --box BOX            box:LAT_MIN,LON_MIN,LAT_MAX,LON_MAX
  --area AREA          Continent
  --since SINCE        e.g. since:2012-11-09
  --year YEAR          year
  --month MONTH        month
  --output OUTPUT      output directory. default: `dataset/metadata/`
  --attempts ATTEMPTS
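As a rough illustration of how these options could become an API request, the sketch below joins the non-empty options into tag:value search terms and URL-encodes them. It assumes the Xeno-canto API 2.0 endpoint https://xeno-canto.org/api/2/recordings with query and page parameters; the helper build_query_url is hypothetical, and download_meta.py may assemble its requests differently.

```python
from urllib.parse import urlencode

# Assumed API 2.0 endpoint; download_meta.py may use a different base URL.
API_URL = "https://xeno-canto.org/api/2/recordings"

# Search tags that correspond to the command-line options listed above.
TAGS = ("gen", "ssp", "cnt", "type", "rmk", "lat", "lon",
        "loc", "box", "area", "since", "year", "month")


def build_query_url(page: int = 1, **params: str) -> str:
    """Hypothetical helper: turn option values into an API request URL."""
    terms = []
    for tag in TAGS:
        value = params.get(tag)
        if not value:
            continue  # skip options the user left unset
        if " " in value:
            value = f'"{value}"'  # quote multi-word values, e.g. cnt:"united kingdom"
        terms.append(f"{tag}:{value}")
    if not terms:
        raise ValueError("at least one search parameter is required")
    return f"{API_URL}?{urlencode({'query': ' '.join(terms), 'page': page})}"
```

For the sample command below, build_query_url(cnt="China", loc="Shanghai", since="2022-01-01") returns https://xeno-canto.org/api/2/recordings?query=cnt%3AChina+loc%3AShanghai+since%3A2022-01-01&page=1.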

A sample metadata download:

python download-meta.py --cnt China --loc Shanghai --since 2022-01-01 --output test/

Refer to the Xeno-canto Search Tips for definitions of the parameters above.
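API results arrive in pages, so a metadata downloader has to walk through every page and retry failed requests. The sketch below assumes an API 2.0 reply containing numPages and recordings keys, assumes --attempts means the number of tries per page, and writes each record to <output>/<id>.json; that file layout and the helpers fetch_with_retries and download_all_pages are hypothetical. The fetch argument is any function that returns the decoded JSON for a page number, which keeps the sketch runnable offline.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict


def fetch_with_retries(fetch: Callable[[int], Dict], page: int,
                       attempts: int = 3, delay: float = 1.0) -> Dict:
    """Call fetch(page), trying up to `attempts` times before giving up."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(page)
        except Exception as err:  # e.g. network errors or malformed JSON
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)  # wait a little before the next try
    raise RuntimeError(f"page {page} failed after {attempts} attempts") from last_error


def download_all_pages(fetch: Callable[[int], Dict], output: str,
                       attempts: int = 3, delay: float = 1.0) -> int:
    """Save each recording's metadata as <output>/<id>.json and return the count."""
    out_dir = Path(output)
    out_dir.mkdir(parents=True, exist_ok=True)
    first = fetch_with_retries(fetch, 1, attempts, delay)
    saved = 0
    for page in range(1, int(first["numPages"]) + 1):
        # Page 1 was already fetched to learn how many pages there are.
        data = first if page == 1 else fetch_with_retries(fetch, page, attempts, delay)
        for rec in data["recordings"]:
            (out_dir / f"{rec['id']}.json").write_text(json.dumps(rec, indent=4))
            saved += 1
    return saved


if __name__ == "__main__":
    # Fake two-page reply with made-up IDs so the sketch runs without network access.
    fake_pages = {
        1: {"numPages": 2, "recordings": [{"id": "1"}, {"id": "2"}]},
        2: {"numPages": 2, "recordings": [{"id": "3"}]},
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(download_all_pages(lambda p: fake_pages[p], tmp, delay=0))
```

In real use, fetch would request the page from the API and decode the JSON reply.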

Download Recordings

Single-thread

Download audio data for one bird species. Use its scientific name in lowercase, e.g. cettia cetti.

python download.py --name "cettia cetti"

Download audio data for every species listed in a file. Format requirement: one name per line (names separated by "\n").

python download.py --name name_file

General Usage:

usage: download.py [-h] --name NAME

download bird audios

optional arguments:
  -h, --help   show this help message and exit
  --name NAME  [1] name of one bird species; [2] file of bird species spaced
               by '\n'
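Since --name accepts either a single species or a file of names, the script has to tell the two apart. One plausible approach, sketched below, is to treat the value as a file whenever such a path exists; this rule and the helper resolve_names are assumptions, not taken from download.py.

```python
import os
import tempfile
from typing import List


def resolve_names(name: str) -> List[str]:
    """Return the species to download for one --name value.

    An existing file is read as one species per line (blank lines skipped);
    anything else is treated as a single species name.
    """
    if os.path.isfile(name):
        with open(name, encoding="utf-8") as fh:
            names = fh.read().splitlines()
    else:
        names = [name]
    # Scientific names are expected in lowercase, e.g. "cettia cetti".
    return [n.strip().lower() for n in names if n.strip()]


if __name__ == "__main__":
    print(resolve_names("cettia cetti"))  # a single species
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "name_file")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("Cettia cetti\n\nabroscopus superciliaris\n")
        print(resolve_names(path))  # a file of species, one per line
```

With a name file containing "Cettia cetti", a blank line, and "abroscopus superciliaris", resolve_names returns ['cettia cetti', 'abroscopus superciliaris'].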

Multi-thread

Usage

Speed up downloading using multiple threads.

python download-mult.py --name "cettia cetti" --process-ratio 0.6

Download multiple species listed in a file. Format requirement: one name per line (names separated by "\n").

python download-mult.py --name name_file --process-ratio 0.6

General Usage:

usage: download-mult.py [-h] --name NAME [--process-ratio PROCESS_RATIO]

download bird audios

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           [1] name of one bird species; [2] file of bird species
                        spaced by '\n'
  --process-ratio PROCESS_RATIO
                        float in [0, 1]; share of CPU used when downloading audio
                        [default: 0.8]
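The help text does not spell out how --process-ratio becomes a worker count. A plausible reading, sketched below under that assumption, is to multiply the number of CPU cores by the ratio and hand the downloads to a multiprocessing.Pool (the kill scripts suggest download-mult.py runs several processes). num_workers and download_one are hypothetical names, and download_one is only a placeholder for the real download.

```python
import multiprocessing as mp
from typing import Optional


def num_workers(process_ratio: float, cpu_count: Optional[int] = None) -> int:
    """Turn a ratio in [0, 1] into a number of worker processes (at least 1)."""
    if not 0.0 <= process_ratio <= 1.0:
        raise ValueError("process ratio must be between 0 and 1")
    cpus = cpu_count if cpu_count is not None else mp.cpu_count()
    return max(1, int(cpus * process_ratio))  # round down, never below one worker


def download_one(url: str) -> str:
    """Placeholder for the real per-file download; here it just echoes the URL."""
    return url


if __name__ == "__main__":
    urls = ["https://xeno-canto.org/426350/download"]  # URL from the metadata example
    with mp.Pool(processes=num_workers(0.6)) as pool:
        for result in pool.map(download_one, urls):
            print(result)
```

Under this reading, num_workers(0.8, cpu_count=8) returns 6, because 8 × 0.8 = 6.4 is rounded down.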

Kill multiprocess

Multiprocess programs are hard to kill manually. To deal with this, download-mult.py automatically generates a kill.sh script once downloading has started. Stop the program with:

bash kill.sh
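A kill script of this kind only needs the process IDs of the parent and its workers. The sketch below writes such a script; the helper write_kill_script and the use of a plain kill (which sends SIGTERM) are assumptions, and the kill.sh generated by download-mult.py may look different.

```python
import multiprocessing
import os
import stat
import tempfile
from typing import Iterable


def write_kill_script(pids: Iterable[int], path: str = "kill.sh") -> str:
    """Write a bash script that sends SIGTERM to every given PID; return its text."""
    script = "#!/bin/bash\nkill " + " ".join(str(pid) for pid in pids) + "\n"
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(script)
    # Add the owner's execute bit so ./kill.sh works too; `bash kill.sh` works regardless.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return script


if __name__ == "__main__":
    # The parent process plus any worker processes that are currently alive.
    pids = [os.getpid()] + [p.pid for p in multiprocessing.active_children()]
    print(write_kill_script(pids, os.path.join(tempfile.gettempdir(), "kill.sh")))
```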

Failed download backup

Failed downloads are recorded in bad_urls.txt, so you can re-download them later if necessary.
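A failure log like bad_urls.txt can be as simple as one URL per line, appended whenever a download fails and read back later for a retry. The sketch below assumes exactly that format; record_failure and load_failures are hypothetical helpers, not functions from birdData.

```python
import tempfile
from pathlib import Path
from typing import List


def record_failure(url: str, log_path: str = "bad_urls.txt") -> None:
    """Append one failed URL to the log, one URL per line (assumed format)."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(url + "\n")


def load_failures(log_path: str = "bad_urls.txt") -> List[str]:
    """Read the log back for a retry, dropping blank lines and duplicates in order."""
    path = Path(log_path)
    if not path.exists():
        return []
    lines = (line.strip() for line in path.read_text(encoding="utf-8").splitlines())
    return [url for url in dict.fromkeys(lines) if url]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = str(Path(tmp) / "bad_urls.txt")
        record_failure("https://xeno-canto.org/426350/download", log)
        record_failure("https://xeno-canto.org/426350/download", log)
        print(load_failures(log))  # the duplicate entry appears only once
```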

Align Dataset

The downloaded bird recordings are in .mp3 format, which lightweight feature-extraction libraries such as soundfile and audiofile do not support (librosa can read them but is much slower than these two). Use the alignDataset-mult.py script to convert the .mp3 files into .wav files that these libraries can read and to align their sample rates.

python alignDataset-mult.py --dataDir dataset/audio --outDir ./wavs --process 24 

Usage

usage: alignDataset-mult.py [-h] [--dataDir DATADIR] [--outDir OUTDIR]
                            [--process PROCESS]

align samplerate of dataset

optional arguments:
  -h, --help         show this help message and exit
  --dataDir DATADIR  path to the input dir
  --outDir OUTDIR    path to the output dir
  --process PROCESS  number of processes to run
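One way to do such a conversion is to delegate it to ffmpeg, which decodes .mp3 and resamples with its -ar option. The sketch below assumes ffmpeg is installed, mirrors the input folder layout under the output directory, and uses a hypothetical target sample rate of 16000 Hz; alignDataset-mult.py may use a different decoder, layout, or rate. The helpers plan_conversions, ffmpeg_command, and convert are hypothetical, and convert returns False on failure so the file could be logged the way bad_aligns.txt records failures.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path
from typing import List, Tuple

TARGET_SR = 16000  # hypothetical target sample rate in Hz


def plan_conversions(data_dir: str, out_dir: str) -> List[Tuple[Path, Path]]:
    """Pair every .mp3 under data_dir with a .wav path under out_dir, keeping subfolders."""
    src_root, dst_root = Path(data_dir), Path(out_dir)
    return [
        (mp3, (dst_root / mp3.relative_to(src_root)).with_suffix(".wav"))
        for mp3 in sorted(src_root.rglob("*.mp3"))
    ]


def ffmpeg_command(src: Path, dst: Path, sr: int = TARGET_SR) -> List[str]:
    """Build an ffmpeg call that decodes src and writes a WAV resampled to sr Hz."""
    return ["ffmpeg", "-y", "-loglevel", "error", "-i", str(src), "-ar", str(sr), str(dst)]


def convert(pair: Tuple[Path, Path]) -> bool:
    """Convert one file; return False on failure so it can be logged as a bad case."""
    src, dst = pair
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        result = subprocess.run(ffmpeg_command(src, dst), capture_output=True)
    except FileNotFoundError:  # ffmpeg is not installed or not on PATH
        return False
    return result.returncode == 0


if __name__ == "__main__":
    pairs = plan_conversions("dataset/audio", "./wavs")
    with Pool(processes=24) as pool:  # mirrors --process 24 in the example
        ok = pool.map(convert, pairs)
    for (src, _), good in zip(pairs, ok):
        if not good:
            print("failed:", src)
```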

Kill multiprocess

Stop the conversion with:

bash kill_align.sh

Bad transformation backup

Failed conversions are recorded in bad_aligns.txt.

To-do

  • [12.29] multiprocess download
  • [1.1] Automated killing script for multiprocess program
  • [1.1] Bad url backup for trace back
  • define sample rate prior to download

Contact

Feel free to open an issue if you encounter any problems. Have fun!