Skip to content

"negative length vectors are not allowed" ? #41

Closed
@ireneg

Description

@ireneg

Hi guys,

I'm trying to simulate sequence under a bunch of different conditions. The vast majority of them work, but I've been hitting an uninformative error message with others:

extraVars <- commandArgs(trailingOnly=T)
#extraVars <- c("ortho_FALSE", 0.93, 50) # for testing only

library(polyester)
library(Biostrings)

setwd("~/orthoexon/simNGS/hg38_macFas5")

# FASTA annotation
hsFastaPath <- paste0(getwd(), "/fasta/hg38_ensembl84_orthoexon_", extraVars[2], "pc_", extraVars[1], ".fa")
hsFasta <- readDNAStringSet(hsFastaPath)

# ~20x coverage ----> reads per transcript = transcriptlength/readlength * 20
# here all transcripts will have ~equal FPKM
hsDepth <- round(20 * width(hsFasta) / 100)

# Simulate humans:
dir.create(file.path(paste0(getwd(), "/polyester/", extraVars[1], "_", extraVars[2], "_", extraVars[3], "bp/hg38")), showWarnings = F, recursive=T)
    simulate_experiment(hsFastaPath, reads_per_transcript=hsDepth, num_reps=10, fold_changes=matrix(1, nrow=length(hsFasta)), outdir=paste0(getwd(), "/polyester/", extraVars[1], "_", extraVars[2], "_", extraVars[3], "bp/hg38"), readlen=as.numeric(extraVars[3]), paired=F, seed=round(11051984*as.numeric(extraVars[2])))

Often times, simulate_experiment fails fairly quickly with this error message:

Error in .Call(.NAME, ..., PACKAGE = PACKAGE) : 
  negative length vectors are not allowed

I launch many jobs with the same script, for multiple values of extraVars[2], and only some are repeatedly and predictably failing; extraVars[3] is always either 50 or 100. I realise the problem is probably coming from my fasta input, but the files I use differ by thousands of lines and checking them manually is not really going to work, so any intuition you might have as to what is causing the error would really help me clean them up - thank you!!

(And, not to be captain obvious here, but here are the lengths for one of the failed runs:

summary(width(hsFasta))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     30     293     793    1486    2090   98920 

summary(hsDepth)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    6.0    59.0   159.0   297.1   418.0 19780.0 

there are no obvious negative lengths that I can find.)

Oh, also - for some reason this command yields 20 replicates instead of 10 per run of simulate_experiment. Any idea why? Not that I don't appreciate having twice as many samples as I need, I just couldn't find the answer in the manual.

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.40.2   XVector_0.12.1      IRanges_2.6.1      
[4] S4Vectors_0.10.3    BiocGenerics_0.18.0 polyester_1.8.3    

loaded via a namespace (and not attached):
[1] zlibbioc_1.18.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions