DBS-Pro Analysis

About

This pipeline analyses sequencing data from DBS-Pro experiments for protein and PrEST quantification. The DBS-Pro method uses barcoded antibodies to quantify surface proteins in droplets, for example to study single exosomes.

Overview of DBS-Pro pipeline run on three samples.

The pipeline takes single-end FASTQ files as input, with a construct such as those specified in Standard constructs. For each sample the DBS is extracted (extract_dbs) and clustered (dbs_cluster) to enable error correction of the DBS sequences (correct_dbs). At the same time the ABC and UMI are extracted from the same read (extract_abc_umi), and the UMIs are then demultiplexed based on their ABC (demultiplex_abc). For each ABC the UMIs are grouped by DBS and then clustered to correct errors (umi_cluster). Finally, the corrected sequences are combined into a read-specific DBS, ABC and UMI combination, and these combinations are tallied to create the final output in the form of a TSV (integrate). If there are multiple samples, these are also merged to generate a combined TSV (merge_data). A final report is also generated to enable some basic QC of the data. Also see the demo for a step-by-step walkthrough of a typical workflow.

^{DBS: Droplet Barcode Sequence. Reads sharing this sequence originate from the same droplet.}
^{ABC: Antibody Barcode Sequence. Identifies which antibody was present in the droplet.}
^{UMI: Unique Molecular Identifier. Identifies how many antibodies with a particular ABC were present in the droplet.}
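
To make the final tallying step concrete, here is a minimal Python sketch of counting reads per DBS, ABC and UMI combination and then counting distinct UMIs per DBS and ABC. It is an illustration only and not the pipeline's actual implementation; the read tuples and the function names `tally` and `umis_per_target` are made up for this example.

```python
from collections import Counter

# Hypothetical, already error-corrected reads as (DBS, ABC, UMI) tuples.
reads = [
    ("ACGT", "ABC01", "AAAAAA"),
    ("ACGT", "ABC01", "AAAAAA"),
    ("ACGT", "ABC01", "CCCCCC"),
    ("TTGA", "ABC02", "GGGGGG"),
]


def tally(reads):
    """Count reads for every distinct (DBS, ABC, UMI) combination."""
    return Counter(reads)


def umis_per_target(read_counts):
    """Count distinct UMIs for every (DBS, ABC) pair."""
    # Iterating a Counter yields each distinct key once.
    return Counter((dbs, abc) for dbs, abc, _umi in read_counts)


read_counts = tally(reads)
umi_counts = umis_per_target(read_counts)
```

Here `read_counts` corresponds to the per-combination read count, while `umi_counts` approximates the number of antibody molecules seen per droplet and target.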

Setup

First, make sure conda is installed on your system.

Clone the git repository.

git clone https://github.com/FrickTobias/DBS-Pro

Move into the git folder and install all dependencies in a conda environment.
```
cd DBS-Pro
```
For reproducibility the *.lock files are used.

2.1. For OSX use:
```
conda create --name dbspro --file environment.osx-64.lock
```
2.2. For LINUX use:
```
conda create --name dbspro --file environment.linux-64.lock
```
2.3. Using flexible dependencies (not recommended):
```
conda env create --name dbspro --file environment.yml
```
This option will likely introduce newer versions of the software and dependencies that have not yet been tested.
Activate the conda environment.
```
conda activate dbspro
```
Install the dbspro package.
```
pip install .
```
For development, please use pip install -e .[dev].

Usage

Prepare a FASTA with each of the antibody barcodes used in your experiment. The entry name will be used to define the targets. Also make sure that each sequence is prefixed with ^, which is used for demultiplexing. See the example FASTA below:

>ABC01
^ATGCTG
>ABC02
^GTAGAT
>ABC03
^CTAGCA
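
Before initializing an analysis it can help to check that every sequence in the FASTA carries the `^` prefix. The following Python sketch does this for single-line FASTA entries like the ones above. The function name `read_abc_fasta` is hypothetical and not part of the dbspro package.

```python
def read_abc_fasta(text):
    """Parse FASTA text into {name: sequence}, checking the '^' prefix.

    Assumes one sequence line per entry, as in the example FASTA.
    """
    barcodes = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is None:
            raise ValueError("sequence line found without a preceding header")
        elif not line.startswith("^"):
            raise ValueError(f"{name}: sequence must start with '^'")
        else:
            # Store the barcode without the '^' prefix.
            barcodes[name] = line[1:]
            name = None
    return barcodes


example = ">ABC01\n^ATGCTG\n>ABC02\n^GTAGAT\n>ABC03\n^CTAGCA\n"
barcodes = read_abc_fasta(example)
```

To check a file on disk, pass its contents, for instance `read_abc_fasta(open("ABCs.fasta").read())`.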

Use dbspro init to create an analysis folder. Provide the FASTA with the antibody barcodes (here named ABCs.fasta), a directory name and one or more FASTQ files for the samples.

dbspro init --abc ABCs.fasta <output-folder> <sample1.fastq>

If you have several samples you can instead provide a CSV file with one line per sample in the format: </path/to/sample.fastq>,<sample_name>. This enables you to name your samples as you wish. With a CSV the initialization is as follows:

dbspro init --abc ABCs.fasta --sample-csv samples.csv <output-folder>
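
If you have many samples, the CSV can be generated rather than typed by hand. The sketch below builds the file contents in the `<path>,<sample_name>` line format described above. The paths and sample names are placeholders, the helper `samples_to_csv` is hypothetical, and it assumes that no header row is expected.

```python
import csv
import io

# Placeholder FASTQ paths and the sample names to give them.
samples = [
    ("/path/to/sample1.fastq", "control"),
    ("/path/to/sample2.fastq", "treated"),
]


def samples_to_csv(samples):
    """Return CSV text with one '<path>,<sample_name>' line per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(samples)
    return buffer.getvalue()


csv_text = samples_to_csv(samples)
# To save it: open("samples.csv", "w").write(csv_text)
```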

Once the directory has been successfully initialized, move into the directory

cd <output-folder>

and check the current (default) configs using

dbspro config

Any changes to the configs should primarily be made through the dbspro config command so that the parameters are validated. You can check the construct layout by running dbspro config --print-construct. Some standard constructs are also defined, see Standard constructs. Once the configs are updated you are ready to run the full analysis using this command:

dbspro run

For more information on how to run use dbspro run -h.

Output files

The main output is a TSV file data.tsv.gz with the following columns:

| Column name | Description |
| --- | --- |
| `Barcode` | The DBS sequence |
| `Target` | Target name (acquired from ABC FASTA headers) |
| `UMI` | The UMI sequence |
| `ReadCount` | Number of reads with this DBS, Target and UMI combination |
| `Sample` | Sample name |
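
As a rough illustration of how this table can be used, the Python sketch below counts the distinct UMIs per sample, barcode and target using only the standard library. The demo file it writes is made-up data that merely follows the documented columns, and `count_umis` is a hypothetical helper, not a dbspro function.

```python
import csv
import gzip
import os
import tempfile
from collections import Counter


def count_umis(path):
    """Return {(Sample, Barcode, Target): number of distinct UMIs}."""
    seen = set()
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            seen.add((row["Sample"], row["Barcode"], row["Target"], row["UMI"]))
    # Drop the UMI from each key and count how many UMIs shared that key.
    return Counter(key[:3] for key in seen)


# Small made-up file with the documented columns, for demonstration only.
demo_rows = [
    "Barcode\tTarget\tUMI\tReadCount\tSample",
    "ACGT\tABC01\tAAAAAA\t2\tS1",
    "ACGT\tABC01\tCCCCCC\t1\tS1",
    "TTGA\tABC02\tGGGGGG\t5\tS1",
]
demo_path = os.path.join(tempfile.mkdtemp(), "data.tsv.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("\n".join(demo_rows) + "\n")

umi_counts = count_umis(demo_path)
```

For a real run, call `count_umis("data.tsv.gz")` from inside the analysis folder.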

For convenience, anndata h5ad files with count matrices are also generated for each sample. These can be used for downstream analysis using Scanpy. To import the data use the following code:

import scanpy as sc
adata = sc.read_h5ad("mysample.h5ad")
adata

The pipeline also generates a report report.html with some basic QC metrics.

Standard constructs

The most common constructs are included as presets, which can be selected using the -c/--construct parameter in dbspro config. Currently available constructs include:

dbspro_v1

Sequence: 5'-CGATGCTAATCAGATCA BDVHBDVHBDVHBDVHBDVH AAGAGTCAATAGACCATCTAACAGGATTCAGGTA XXXXX NNNNNN TTATATCACGACAAGAG-3'
Name:        |       H1      | |       DBS        | |               H2               | |ABC| |UMI | |       H3      |
Size (bp):   |       17      | |        20        | |               34               | | 5 | | 6  | |       17      |

This is the DBS-Pro construct used in the publication Stiller et al. 2019.
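
To show how the layout maps onto a read, the following Python sketch splits a dbspro_v1 read into its DBS, ABC and UMI with a regular expression. It requires exact matches to the handle sequences and tolerates no sequencing errors, so it is only a teaching aid and not how the pipeline performs extraction. The function `split_read` and the example read are invented for this illustration.

```python
import re

# Handle sequences copied from the dbspro_v1 layout above.
H1 = "CGATGCTAATCAGATCA"
H2 = "AAGAGTCAATAGACCATCTAACAGGATTCAGGTA"
H3 = "TTATATCACGACAAGAG"

# IUPAC codes: B = not A, D = not C, V = not T, H = not G.
# The DBS is five repeats of BDVH, 20 bp in total.
DBS = "(?:[CGT][AGT][ACG][ACT]){5}"

CONSTRUCT = re.compile(
    f"{H1}(?P<dbs>{DBS}){H2}(?P<abc>[ACGT]{{5}})(?P<umi>[ACGT]{{6}}){H3}"
)


def split_read(read):
    """Return a dict with 'dbs', 'abc' and 'umi', or None if no match."""
    match = CONSTRUCT.search(read)
    return match.groupdict() if match else None


# Made-up read that follows the layout exactly.
read = H1 + "CAGA" * 5 + H2 + "ATGCT" + "GGATCC" + H3
parts = split_read(read)
```

The same approach should carry over to the other presets by swapping in their handle sequences and segment lengths.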

dbspro_v2

Sequence: 5'-CAGTCTGAGCGGTTCAACAGG BDVHBDVHBDVHBDVHBDVH GCGGTCGTGCTGTATTGTCTCCCACCATGACTAACGCGCTTG XXXXX NNNNNN CACCTGACGCACTGAATACGC-3'
Name:        |         H1        | |       DBS        | |                   H2                   | |ABC| |UMI | |         H3        |
Size (bp):   |         21        | |        20        | |                   42                   | | 5 | | 6  | |         21        |

This is the DBS-Pro construct used in the publication Banijamali et al. 2022.

pba

Sequence: 5'-NNNNNNNNNNNNNNN ACCTGAGACATCATAATAGCA XXXXX NNNNNN CATTACTAGGAATCACACGCAGAT-3'
Name:        |     DBS     | |         H2        | |ABC| |UMI | |          H3          |
Size (bp):   |      15     | |         21        | | 5 | | 6  | |          24          |

This is the construct used in the article Wu et al. 2019 which introduces the Proximity Barcoding Assay (PBA).

Demo

A short demonstration of the pipeline and some downstream analysis is available in the following Jupyter Notebook. This can also be used to test that the conda environment is set up properly.

Development

For notes on development see doc/development.

Publications

Check out version v0.1 for the pipeline used in:

Stiller, C., Aghelpasand, H., Frick, T., Westerlund, K., Ahmadian, A., & Eriksson Karlström, A. (2019). Fast and efficient Fc-specific photoaffinity labelling to produce antibody-DNA-conjugates. Bioconjugate chemistry.

Version v0.3 was used in:

Banijamali, M., Höjer, P., Nagy, A., Hååg, P., Gomero, E. P., Stiller, C., Kaminskyy, V. O., Ekman, S., Lewensohn, R., Karlström, A. E., Viktorsson, K., & Ahmadian, A. (2022). Characterizing Single Extracellular Vesicles by Droplet Barcode Sequencing for Protein Analysis. Journal of Extracellular Vesicles, e12277.
