Description
Hi guys,
I'm trying to simulate sequence under a bunch of different conditions. The vast majority of them work, but I've been hitting an uninformative error message with others:
extraVars <- commandArgs(trailingOnly=T)
#extraVars <- c("ortho_FALSE", 0.93, 50) # for testing only
library(polyester)
library(Biostrings)
setwd("~/orthoexon/simNGS/hg38_macFas5")
# FASTA annotation
hsFastaPath <- paste0(getwd(), "/fasta/hg38_ensembl84_orthoexon_", extraVars[2], "pc_", extraVars[1], ".fa")
hsFasta <- readDNAStringSet(hsFastaPath)
# ~20x coverage ----> reads per transcript = transcriptlength/readlength * 20
# here all transcripts will have ~equal FPKM
hsDepth <- round(20 * width(hsFasta) / 100)
# Simulate humans:
dir.create(file.path(paste0(getwd(), "/polyester/", extraVars[1], "_", extraVars[2], "_", extraVars[3], "bp/hg38")), showWarnings = F, recursive=T)
simulate_experiment(hsFastaPath, reads_per_transcript=hsDepth, num_reps=10, fold_changes=matrix(1, nrow=length(hsFasta)), outdir=paste0(getwd(), "/polyester/", extraVars[1], "_", extraVars[2], "_", extraVars[3], "bp/hg38"), readlen=as.numeric(extraVars[3]), paired=F, seed=round(11051984*as.numeric(extraVars[2])))
Often times, simulate_experiment
fails fairly quickly with this error message:
Error in .Call(.NAME, ..., PACKAGE = PACKAGE) :
negative length vectors are not allowed
I launch many jobs with the same script, for multiple values of extraVars[2], and only some are repeatedly and predictably failing; extraVars[3] is always either 50 or 100. I realise the problem is probably coming from my fasta input, but the files I use differ by thousands of lines and checking them manually is not really going to work, so any intuition you might have as to what is causing the error would really help me clean them up - thank you!!
(And, not to be captain obvious here, but here are the lengths for one of the failed runs:
summary(width(hsFasta))
Min. 1st Qu. Median Mean 3rd Qu. Max.
30 293 793 1486 2090 98920
summary(hsDepth)
Min. 1st Qu. Median Mean 3rd Qu. Max.
6.0 59.0 159.0 297.1 418.0 19780.0
there are no obvious negative lengths that I can find.)
Oh, also - for some reason this command yields 20 replicates instead of 10 per run of simulate_experiment. Any idea why? Not that I don't appreciate having twice as many samples as I need, I just couldn't find the answer in the manual.
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] Biostrings_2.40.2 XVector_0.12.1 IRanges_2.6.1
[4] S4Vectors_0.10.3 BiocGenerics_0.18.0 polyester_1.8.3
loaded via a namespace (and not attached):
[1] zlibbioc_1.18.0