Automatically split downloads in chunks for queries with >4000 records #29
Open
Description
Just a small possible enhancement, but would it be possible to have the download function automatically split queries into chunks when the length of list_of_accession_ids is >4000?
At the moment I do this myself, e.g. to fetch the most recently uploaded records:
```r
df = query(
  credentials = credentials,
  from_subm = as.character(GISAID_max_submdate),
  to_subm = as.character(today),
  fast = TRUE
)
dim(df) # 103356 1
```
```r
# function to split a vector into chunks of at most chunk_length elements
chunk = function(x, chunk_length = 4000) split(x, ceiling(seq_along(x) / chunk_length))
chunks = chunk(df$accession_id)

# download each chunk and row-bind the results
downloads = do.call(rbind, lapply(seq_along(chunks), function(i) {
  message(paste0("Downloading batch ", i, " out of ", length(chunks)))
  Sys.sleep(3)
  download(credentials = credentials,
           list_of_accession_ids = chunks[[i]])
}))

dim(downloads) # 103356 29
names(downloads)
#  [1] "strain"                "virus"                 "accession_id"
#  [4] "genbank_accession"     "date"                  "region"
#  [7] "country"               "division"              "location"
# [10] "region_exposure"       "country_exposure"      "division_exposure"
# [13] "segment"               "length"                "host"
# [16] "age"                   "sex"                   "Nextstrain_clade"
# [19] "pangolin_lineage"      "GISAID_clade"          "originating_lab"
# [22] "submitting_lab"        "authors"               "url"
# [25] "title"                 "paper_url"             "date_submitted"
# [28] "purpose_of_sequencing" "sequence"
```
Even better would be to also have this parallelized (if GISAID allows that), as the above is still relatively slow: it currently takes about 1.5 hours to download these 103K records from the last 5 days. When I tried a chunk size of 5000 I got a server error, so I reduced it to 4000 and that seemed to work.
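If GISAID does tolerate concurrent requests, a minimal sketch of parallelizing the same chunks would be something like the following (using the base parallel package via forking, so Linux/macOS only, and reusing the chunks object from above; whether one set of credentials can be used from several workers at once is an open question, so I have kept the number of cores small on purpose):

```r
library(parallel)

# Hypothetical sketch: download a few chunks at a time in parallel via
# forking (mclapply), then row-bind the per-chunk results as before.
downloads = do.call(rbind, mclapply(seq_along(chunks), function(i) {
  download(credentials = credentials,
           list_of_accession_ids = chunks[[i]])
}, mc.cores = 2))
```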