A tool for prioritising identity-by-descent (IBD) variants in Whole Genome Sequencing (WGS) data from families with rare heritable diseases. IBDVar consists of a command-line variant prioritisation pipeline and an interactive Shiny dashboard for starting the pipeline and visualising its output.
- Overview
- System Requirements
- Variant Prioritisation Pipelines
- Shiny Dashboard
- Questions, Feature Requests, Bug Reports and Issues
- Licence
- Collaborators
The use of IBDVar follows a three-step process:
The prioritisation pipeline is composed of two sub-pipelines, one for short variants and one for structural variants (SV), that are started independently. Users can upload a multi-sample VCF file and configure either pipeline in the Shiny dashboard, or run the pipelines on a multi-sample VCF file at the command line using a configuration file. Once a pipeline has completed, its output can be explored interactively in the corresponding pipeline tab in the Shiny dashboard. Unique to the tool is the integration of IBD segment detection into variant prioritisation for WGS data. An overview of the key steps is shown below.
For running the bash pipeline backend:
- Linux OS (developed and tested on Ubuntu 22 LTS)
- R (>=4.2)
- BCFtools (1.15.1)
- ClinVar VCF file (GRCh38)
- IBIS (v1.20.9)
- Variant Effect Predictor (VEP)
- CADD (v1.6) plugin resources (SNVs and indels)
- CCDS (release number 22) text file
For deploying the Shiny dashboard, the following R dependencies are required:
- shiny
- shinydashboard
- shinyFiles
- shinyjs
- htmlwidgets
- dplyr
- jsonlite
- purrr
- readxl
- DT
- ideogram
- reshape2
To install these R packages, type the following in an R console:
install.packages(c("shiny", "shinydashboard", "shinyFiles", "shinyjs", "htmlwidgets", "dplyr", "jsonlite", "purrr", "readxl", "DT", "reshape2"))
To install the ideogram library, locate the ideogram tarball file (.tar.gz) and type:
install.packages("path/to/ideogram_0.0.0.9000.tar.gz", type="source", repos=NULL)
IBDVar can prioritise both short variants and structural variants (SV) from multi-sample VCF files generated by the Illumina DRAGEN pipeline. Both prioritisation pipelines can be started from the command line or from the "Start pipeline" tab of the Shiny dashboard.
The input is a multi-sample VCF file containing short variants (indels/SNPs) called by the Illumina DRAGEN pipeline (see the Illumina website for details). The VCF file should adhere to the version 4.2 specification. The pipeline expects chromosome names to be prefixed with "chr"; however, the tool checks that naming is consistent between the input VCF and the annotation resources used in the pipeline.
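Before starting a run, it can be useful to see which chromosome naming style the input uses. The sketch below is not part of IBDVar: the function name vcf_contig_style is hypothetical, and it assumes uncompressed VCF text on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IBDVar): report whether the records in a
# VCF stream use "chr"-prefixed chromosome names.
# Reads uncompressed VCF text on stdin and prints one of:
#   chr     every record's CHROM starts with "chr"
#   no_chr  no record's CHROM starts with "chr"
#   mixed   both styles occur
#   empty   no variant records were found
vcf_contig_style() {
    awk -F'\t' '
        /^#/      { next }              # skip header lines
        NF == 0   { next }              # skip blank lines
        $1 ~ /^chr/ { n_chr++; next }
                  { n_other++ }
        END {
            if (n_chr && n_other) print "mixed"
            else if (n_chr)       print "chr"
            else if (n_other)     print "no_chr"
            else                  print "empty"
        }'
}

# Tiny inline example with two records; prints "chr".
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr2\t200\t.\tC\tT\n' \
    | vcf_contig_style
```

For a compressed file, the same function can be fed from a decompressor, for example: zcat /path/to/input.vcf.gz | vcf_contig_style (the path is a placeholder).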
To run the short variants pipeline at the command line, you will need to create a configuration file with parameters (with "=" separating the parameter and its value) described in the table below:
| Category | Configuration parameter | Description |
| --- | --- | --- |
| General settings | in_vcf | Path to the input short variants VCF produced by the Illumina DRAGEN Germline Pipeline |
| | out_dir | Path to the directory where pipeline output is written |
| | threads | Number of threads (CPUs) used to execute the pipeline (default: 4) |
| QC filtering | GQ | Minimum genotype quality threshold for each sample (default: 20) |
| | DP | Minimum (FORMAT) read depth threshold per sample (default: 10) |
| | MAF | Minimum allele frequency for variants to be selected for the PLINK dataset |
| IBD detection | mind | Maximum proportion of missing genotype data per sample, e.g. 0.1 excludes samples with > 10% missing genotype data (default: 0.1) |
| | geno | Select variants with missing call rates lower than the provided value (default: 0.1) |
| | max_af | Maximum allele frequency threshold for rare variants in the gnomAD, ESP or 1000 Genomes Project populations (default: 0.05) |
| | ibis_mt1 | Minimum number of markers for IBIS to call a segment IBD1 |
| | ibis_mt2 | Minimum number of markers for IBIS to call a segment IBD2 |
| | genes | A list of genes of interest for selecting variants in the specified genes (optional) |
| Tools | tools_dir | Base directory path for tools required by the pipeline (optional) |
| | plink | PLINK2 directory path |
| | vep | VEP executable file path |
| | ibis | IBIS directory path |
| Resources | resources | Base directory path for resources (optional) |
| | clinvar | ClinVar VCF file path |
| | genetic_map | File path for the genetic recombination map of the human genome |
| | cadd | CADD plugin resource directory path |
Click here for an example of a short variants config file.
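As a rough illustration of the "parameter=value" layout described above, the sketch below writes a partial configuration file and reads one value back. Every path is a placeholder, the numeric values are the defaults listed in the table, and parameters without a documented default (MAF, ibis_mt1, ibis_mt2, genes) are left out on purpose. The helper config_value is hypothetical and is not the parser used by IBDVar.

```shell
#!/usr/bin/env bash
# Sketch only: all paths are placeholders and the numbers are the defaults
# from the parameter table. MAF, ibis_mt1, ibis_mt2 and genes have no
# documented default, so they are omitted here and must be added as needed.
config_dir=$(mktemp -d)
config="$config_dir/pipeline.config"

cat > "$config" <<'EOF'
in_vcf=/path/to/family.vcf.gz
out_dir=/path/to/output
threads=4
GQ=20
DP=10
mind=0.1
geno=0.1
max_af=0.05
plink=/path/to/plink2_dir
vep=/path/to/vep
ibis=/path/to/ibis_dir
clinvar=/path/to/clinvar.vcf.gz
genetic_map=/path/to/genetic_map
cadd=/path/to/cadd_resources
EOF

# Hypothetical helper (not IBDVar's own parser): print the value stored for
# a parameter name in a "parameter=value" file.
config_value() {
    awk -F'=' -v key="$1" \
        '$1 == key { print substr($0, length(key) + 2); exit }' "$2"
}

config_value threads "$config"   # prints 4
```

Reading values back this way is a quick check for typos before a multi-hour run is started.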
As the short variants pipeline can take a few hours to complete, it is highly recommended to run it inside a GNU Screen session so that the pipeline is not terminated abruptly, for example, by a connection drop or a sudden SSH session termination. To install GNU Screen on Ubuntu/Debian systems:
sudo apt update
sudo apt install screen
On CentOS/Fedora type:
sudo yum install screen
To create a screen, type screen in the terminal, or create a named screen by typing the following:
screen -S <screen_name>
Attach the screen to the terminal as follows:
screen -r <screen_name>
Once the screen is attached execute the pipeline as described in the usage section below.
After attaching the screen to the terminal and starting the short variants pipeline, detach the screen by pressing Ctrl+A followed by d, or by typing in the terminal:
screen -d <screen_name>
This will allow exiting of the terminal window without terminating the pipeline. To reattach the screen, simply type in the terminal:
screen -r <screen_name>
./short_variants.sh -c pipeline.config [-m in_vcf.md5sum ]
- -c: config file (ending with .config) containing all parameters to execute the pipeline (required)
- -m: md5sum file used to perform an md5sum check on the input VCF file specified in the config file
- -h: help message with usage details and options
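The format expected for the -m file is not spelled out here. Assuming it is the standard output of the coreutils md5sum command, a checksum file could be produced and verified as sketched below; demo.vcf is a stand-in for the real input VCF.

```shell
#!/usr/bin/env bash
# Sketch, assuming the -m file uses the standard coreutils md5sum format
# ("<checksum>  <file name>"). demo.vcf is a stand-in for the real input VCF.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

printf '##fileformat=VCFv4.2\n' > demo.vcf

# Record the checksum next to the file ...
md5sum demo.vcf > demo.vcf.md5sum

# ... and verify it later, e.g. after copying the VCF to the analysis server.
md5sum -c demo.vcf.md5sum   # prints "demo.vcf: OK"
```

Because the checksum file stores the file name as it was given to md5sum, the check is normally run from the directory that contains the VCF.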
The input is a multi-sample VCF file containing structural variants called (using Manta) by the Illumina DRAGEN pipeline (see the Illumina website for details). The VCF file should adhere to the version 4.2 specification. The pipeline expects chromosome names to be prefixed with "chr"; however, the tool checks that naming is consistent between the input VCF and the annotation resources used in the pipeline.
To start the structural variants pipeline at the command line, you will need to create a configuration file using the parameters specified in the table below:
| Category | Configuration parameter | Description |
| --- | --- | --- |
| General settings | sv_vcf | Input VCF file path |
| | out_dir | Directory path for pipeline output |
| | threads | Number of threads (CPUs) |
| Variant selection | ibd_seg | IBD segment file path (from the short variants pipeline) for selecting SVs in IBD segments |
| | genes | A list of genes of interest used to filter variants |
| Tools | tools_dir | Base directory for tools (optional) |
| Resources | resources | Base directory for resources (optional) |
| | ccds | CCDS directory path |
Click here for an example of a structural variants config file.
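A mistyped parameter name is easy to miss in a hand-written config. The sketch below writes a partial structural variants config with placeholder paths and then lists any parameter name that does not appear in the table above. The function unknown_sv_params is hypothetical and not part of IBDVar, and the threads value shown is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: not part of IBDVar. Paths are placeholders and threads=4 is
# an illustrative value. Optional entries (genes, tools_dir, resources) are
# omitted.
config_dir=$(mktemp -d)
config="$config_dir/sv_pipeline.config"

cat > "$config" <<'EOF'
sv_vcf=/path/to/family.sv.vcf.gz
out_dir=/path/to/sv_output
threads=4
ibd_seg=/path/to/ibd_segment_file
ccds=/path/to/ccds_dir
EOF

# Hypothetical check: print every parameter name that is not listed in the
# structural variants table (for example a typo such as "thread").
# Prints nothing when all names are known.
unknown_sv_params() {
    awk -F'=' '
        BEGIN {
            n = split("sv_vcf out_dir threads ibd_seg genes tools_dir resources ccds", names, " ")
            for (i = 1; i <= n; i++) known[names[i]] = 1
        }
        NF == 0 { next }                 # ignore blank lines
        !($1 in known) { print $1 }
    ' "$1"
}

unknown_sv_params "$config"   # prints nothing for the file above
```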
./structural_variants.sh -c pipeline.config
- -c: config file (ending with .config) containing all parameters to execute the pipeline (required)
- -h: help message with usage details and options
The Shiny dashboard allows users to start prioritisation pipelines for short or structural variants and to analyse the output interactively.
To start the Shiny dashboard on the Cranfield University server, log into the Linux server hosting the tool and enter the application URL (which can be requested from the author) in a web browser. (Note that development and testing were performed using the Google Chrome browser, so performance may vary with other browsers.) The Shiny dashboard can also be started in RStudio; however, this is not recommended, since most of the views have been configured for browser display and running in RStudio may affect the performance of the tool.
The "Start Pipeline" tab will be open first by default.
In the "Start Pipeline" tab you can start the short variants or structural variants pipeline by selecting an input VCF file, output folder for results and configuring parameters listed in the respective pipeline box.
Once parameters have been specified, click Start in the respective pipeline box to run the pipeline. A notification message should appear in the bottom right corner indicating that the pipeline has started.
In the "Short Variants" tab you can explore the short variants pipeline output interactively.
The tab features:
- A "Files" box to upload the following files, which are located in the "final_output" folder of the output folder specified when the pipeline was run:
  - A prioritised and annotated list of variants produced by the short variants prioritisation pipeline
  - An IBIS IBD segment file produced by the pipeline
  - An optional file containing a list of genes of interest, used to filter the variants by these genes
- Interactive variants table: users can filter, sort, search and download a TSV file of the variants reported in the table.
- Filters panel: contains a series of checkboxes to filter variants by CADD score, predicted consequence, SIFT and PolyPhen calls, clinical significance (ClinVar) and VEP predicted impact (loss of function etc.).
- Interactive ideogram: filters the variants in the interactive data table below by the IBD region clicked by the user. A tooltip reporting the chromosome number, start and end position of an IBD region is displayed when the user hovers over that region.
- "Summary" box summarising:
  - total number of variants
  - number of pathogenic variants identified by ClinVar
  - number of detected IBD segments
  - total number of deleterious missense variants predicted by SIFT, PolyPhen and CADD
  - number of loss of function variants
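The ideogram filter described above amounts to keeping the variants whose position falls inside a chosen IBD region. A minimal command-line sketch of that idea follows; the three-column layout (chromosome, position, label) and the function name variants_in_region are hypothetical and do not reflect IBDVar's actual output format or dashboard code.

```shell
#!/usr/bin/env bash
# Sketch only: the three-column variant layout (chromosome, position, label)
# is hypothetical and is not IBDVar's output format.
# Keep the variants whose position falls inside one region (inclusive ends).
variants_in_region() {
    # $1 = chromosome, $2 = region start, $3 = region end; variants on stdin
    awk -F'\t' -v chrom="$1" -v lo="$2" -v hi="$3" \
        '$1 == chrom && $2 + 0 >= lo + 0 && $2 + 0 <= hi + 0'
}

# Only varA lies on chr1 between positions 100 and 500.
printf 'chr1\t150\tvarA\nchr1\t900\tvarB\nchr2\t150\tvarC\n' \
    | variants_in_region chr1 100 500
```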
In the "Structural Variants" tab, the prioritised SV calls from the pipeline can be interactively explored using filters and an interactive data table.
SV tab features include:
- "Files" box for uploading the prioritised list of SV calls (.tsv file)
- Interactive table of variants that can be filtered, sorted, searched and downloaded as a TSV file.
- "Summary" tab providing summary statistics on the various counts of SV types and also the mean SV lengths.
- Filters panel containing checkboxes to filter the variants table by: SV type, chromosome number, precision of breakpoints of called SVs and genes of interest.
For any questions, feature requests, bug reports or issues regarding the latest version of IBDVar, please click on the "issues" tab present at the top-left of the GitHub repository page.
This codebase was developed as part of an MSc thesis project (MSc Applied Bioinformatics, Cranfield University 2021-2022) under the supervision of Dr Alexey Larionov.