A Nextflow pipeline for processing targeted NGS BRCA data
This document gives an overview of a Nextflow pipeline for variant calling on BRCA genes from Illumina amplicon data, along with instructions for running it.
The pipeline performs the following steps:
- Quality Control: Assess the quality of the sequencing reads using FastQC.
- Read Alignment: Align reads to the reference genome using BWA-MEM.
- Merge BAM Files: Merge aligned BAM files by sample ID.
- BAM Processing: Process BAM files (sorting, marking duplicates) using SAMtools and elPrep.
- Quality Assessment: Assess the quality of the alignments using Qualimap.
- BAM to CRAM Conversion: Convert BAM files to CRAM format.
- Variant Calling: Call variants using Strelka and DeepVariant.
- Variant Filtering: Filter variant calls using BCFtools.
- Variant Annotation: Annotate variants using ANNOVAR.
- Quality Control Summary: Generate a summary of QC metrics using MultiQC.
Prerequisites:
- Nextflow installed on your system.
- Required bioinformatics tools installed (e.g., BWA, SAMtools, FastQC).
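A quick way to confirm the prerequisites before launching a run is to check that the tools are on the `PATH`. The following is a minimal sketch, not part of the pipeline itself. The executable names (`nextflow`, `bwa`, `samtools`, `fastqc`) are assumptions based on the tools named in this README, and the list is deliberately incomplete; extend it to match your installation.

```python
import shutil

# Assumed executable names for tools mentioned in this README.
# This list is illustrative and incomplete; add the others you rely on.
REQUIRED_TOOLS = ["nextflow", "bwa", "samtools", "fastqc"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH.

    `which` is injectable so the check can be tested without the tools.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found on the PATH.")
```

If the pipeline is run with containers or a Conda profile, some of these tools may not need to be installed on the host, so treat a "missing" result as a prompt to check rather than a hard failure.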
nextflow run main.nf --csv test.csv --debug true --outdir results
Prepare a CSV file containing the paths to your input sequencing reads. The CSV file should have the following columns:
- sampleId: Unique identifier for each sample.
- part: Part of the sample (e.g., replicate number).
- read1: Path to the first read file.
- read2: Path to the second read file.
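The sample sheet described above can be checked before a run. The sketch below assumes a comma-delimited file with a header row containing the four column names; the sample IDs and FASTQ paths in `EXAMPLE` are hypothetical placeholders. It also groups rows by `sampleId`, which mirrors how the pipeline merges BAM files per sample, though it is not the pipeline's own code.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ["sampleId", "part", "read1", "read2"]


def load_sample_sheet(handle):
    """Validate the sample sheet and group read pairs by sampleId."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))

    samples = defaultdict(list)
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        empty = [col for col in REQUIRED_COLUMNS if not (row[col] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty value for " + ", ".join(empty))
        samples[row["sampleId"]].append((row["part"], row["read1"], row["read2"]))
    return dict(samples)


# Hypothetical content: sample names and paths are placeholders.
EXAMPLE = """sampleId,part,read1,read2
S1,1,/data/S1_L001_R1.fastq.gz,/data/S1_L001_R2.fastq.gz
S1,2,/data/S1_L002_R1.fastq.gz,/data/S1_L002_R2.fastq.gz
S2,1,/data/S2_L001_R1.fastq.gz,/data/S2_L001_R2.fastq.gz
"""

if __name__ == "__main__":
    grouped = load_sample_sheet(io.StringIO(EXAMPLE))
    for sample_id, parts in grouped.items():
        print(f"{sample_id}: {len(parts)} part(s)")
```

To check a real file, pass an open file handle instead of the in-memory example, for instance `load_sample_sheet(open("samples.csv", newline=""))`.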
The pipeline also uses the following reference files:
- FASTA of the reference genome: hs38DH.fa
- BRCA1/2 positions in BED format: brca.bed.gz
- Path to the ANNOVAR database: /annovar/hg38
- ANNOVAR script: /annovar/table_annovar.pl
- Prepare Input Files: Ensure your input files are in the correct format and paths are specified correctly in the CSV file.
- Set Parameters: Define the output directory and the path to the CSV file.
- Execute Pipeline: Run the pipeline using the following command:
nextflow run <path_to_pipeline.nf> --csv <path_to_input_csv> --outdir <output_directory>
Replace <path_to_pipeline.nf>, <path_to_input_csv>, and <output_directory> with your actual file paths and directory.
nextflow run brca_variant_calling.nf --csv samples.csv --outdir results
This command runs the pipeline on the input samples specified in samples.csv and writes the results to the results directory.
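For scripted or batch runs, the same command can be assembled programmatically. This is a minimal sketch that uses only the `--csv` and `--outdir` options shown in this README; the `build_command` helper is hypothetical, and actually launching the run requires Nextflow on the `PATH`.

```python
import shlex
import subprocess


def build_command(pipeline, csv_path, outdir):
    """Build the nextflow invocation as an argument list (no shell quoting issues)."""
    return ["nextflow", "run", pipeline, "--csv", csv_path, "--outdir", outdir]


def run_pipeline(pipeline, csv_path, outdir):
    """Launch the run and raise CalledProcessError if Nextflow exits non-zero."""
    subprocess.run(build_command(pipeline, csv_path, outdir), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch is safe to try.
    cmd = build_command("brca_variant_calling.nf", "samples.csv", "results")
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.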
The pipeline generates various output files, including:
- Quality control reports.
- Aligned BAM/CRAM files.
- Variant call files (VCF).
- Filtered and annotated variant call files.
- A summary report of quality metrics.
A workflow for clinical profiling of BRCA genes in Chilean breast cancer patients via targeted sequencing. Evelin González, Rodrigo Moreno Salinas, Manuel Muñoz, Soledad Lantadilla Herrera, Mylene Cabrera Morales, Pastor Jullian, Waleska Ebner Durrels, Gonzalo Vigueras Stari, Javier Anabalón Ramos, Juan Francisco Miquel, Lilian Jara, Carol Moraga, Alex Di Genova. medRxiv 2024.09.25.24314295; doi: https://doi.org/10.1101/2024.09.25.24314295