DCoNA is a statistical tool that identifies pairwise interactions whose correlation changes significantly between two conditions. DCoNA was designed to test this hypothesis for a predefined list of source and target pairs ("Network" regime). However, DCoNA can also be used in the complete-network regime, when such a list is not given ("Exhaustive" regime). In this regime, DCoNA tests the hypothesis for all possible pairs of molecules in the expression data. Aside from hypothesis testing, DCoNA can test whether the significantly altered correlations of a particular source molecule are overrepresented among all significantly changed correlations. DCoNA can also compute the mean, median, and other quantiles of the z-statistics associated with a particular molecule and its targets to determine a trend in correlation changes.
If you use DCoNA in work contributing to a scientific publication, we ask you to cite our publication:
Anton Zhiyanov, Narek Engibaryan, Stepan Nersisyan, Maxim Shkurnikov, Alexander Tonevitsky, Differential co-expression network analysis with DCoNA reveals isomiR targeting aberrations in prostate cancer, Bioinformatics, Volume 39, Issue 2, February 2023, btad051
https://doi.org/10.1093/bioinformatics/btad051
pip install dcona
You can try DCoNA on the TCGA-PRAD test dataset.
You can use DCoNA either as a Python module or as a command-line tool.
A detailed description of the functions, with a data example and a test launch.
dcona.ztest tests, for each pair of genes, the hypothesis that their correlation is the same in the reference and experimental groups.
- Python usage:
dcona.ztest(data_df, description_df, reference_group, experimental_group, correlation='spearman', alternative='two-sided', interaction=None, repeats_number=None, output_dir=None, process_number=None)
- Command-line usage:
dcona ztest config.json
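The classical technique for comparing two correlation coefficients estimated on independent groups is the Fisher z-transformation test. The plain-Python sketch below illustrates that technique only; it is not DCoNA's code, and DCoNA's own statistic may differ in details (for example, in how Spearman correlations or repeats_number are handled). The helper names and the sign convention (experimental minus reference) are assumptions.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def compare_correlations(r_ref, n_ref, r_exp, n_exp, alternative="two-sided"):
    """Fisher z-test for the difference between two independent correlations.

    Hypothetical helper, not part of the dcona API. The z-statistic is
    oriented as experimental minus reference (an assumption).
    """
    if n_ref <= 3 or n_exp <= 3:
        raise ValueError("each group needs more than 3 samples")
    # The Fisher transformation atanh(r) makes r approximately normal
    # with variance 1 / (n - 3).
    se = math.sqrt(1.0 / (n_ref - 3) + 1.0 / (n_exp - 3))
    z = (math.atanh(r_exp) - math.atanh(r_ref)) / se
    if alternative == "two-sided":
        p = 2.0 * (1.0 - normal_cdf(abs(z)))
    elif alternative == "less":
        p = normal_cdf(z)
    elif alternative == "greater":
        p = 1.0 - normal_cdf(z)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


# Hypothetical values: a strong correlation in the reference group
# that almost vanishes in the experimental group.
z, p = compare_correlations(r_ref=0.8, n_ref=50, r_exp=0.1, n_exp=60)
print(f"z = {z:.3f}, p = {p:.2e}")
```

A large negative z here means the correlation dropped in the experimental group; with many pairs tested, the resulting p-values would still need a multiple-testing correction.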
dcona.zscore aggregates the correlation changes of a source molecule with all of its targets.
- Python usage:
dcona.zscore(data_df, description_df, reference_group, experimental_group, correlation='spearman', score='mean', alternative='two-sided', interaction=None, repeats_number=None, output_dir=None, process_number=None)
- Command-line usage:
dcona zscore config.json
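To make the aggregation idea concrete, here is a minimal sketch that groups per-pair z-statistics by source molecule and summarizes them with a mean or a median. It assumes the z-statistics are already available as a dict keyed by (source, target); aggregate_by_source and the molecule names are hypothetical, and only "mean" and "median" are handled here, although DCoNA also mentions other quantiles.

```python
import statistics
from collections import defaultdict


def aggregate_by_source(pair_z, score="mean"):
    """Summarize z-statistics of all (source, target) pairs per source.

    Hypothetical helper, not part of dcona. Supports 'mean' and 'median'.
    """
    by_source = defaultdict(list)
    for (source, _target), z in pair_z.items():
        by_source[source].append(z)

    summarize = {"mean": statistics.fmean, "median": statistics.median}.get(score)
    if summarize is None:
        raise ValueError(f"unsupported score: {score!r}")
    return {source: summarize(values) for source, values in by_source.items()}


# Hypothetical z-statistics for two source molecules.
pair_z = {
    ("src_1", "tgt_A"): 2.0,
    ("src_1", "tgt_B"): -1.0,
    ("src_1", "tgt_C"): 3.0,
    ("src_2", "tgt_A"): -0.5,
}
print(aggregate_by_source(pair_z, "mean"))
print(aggregate_by_source(pair_z, "median"))
```

A positive aggregate suggests that the source's correlations tend to shift in one direction across its targets, while a value near zero suggests no consistent trend.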
dcona.hypergeom groups pairs with significantly changed correlations by source molecule and finds overrepresented groups using the hypergeometric test.
- Python usage:
dcona.hypergeom(ztest_df, alternative='two-sided', oriented=True, output_dir=None)
- Command-line usage:
You should launch ztest and then hypergeom with the same config file:
dcona hypergeom config.json
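The hypergeometric overrepresentation test asks how surprising it is that a source molecule has k significant pairs, given N tested pairs in total, K significant pairs overall, and n pairs belonging to that source. The sketch below computes the upper-tail probability P(X >= k) with the standard library only. It illustrates the test itself; hypergeom_upper_tail and the numbers are hypothetical, and DCoNA's alternative and oriented options are not modeled.

```python
import math


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = math.comb(N, n)
    # Sum the probability mass for every admissible count i >= k.
    hits = sum(
        math.comb(K, i) * math.comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return hits / total


# Hypothetical counts: 1000 tested pairs, 50 significant overall;
# the source molecule has 20 pairs, 8 of them significant
# (only about 1 would be expected by chance).
p = hypergeom_upper_tail(k=8, N=1000, K=50, n=20)
print(f"overrepresentation p-value: {p:.2e}")
```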
To run the tool from the command line, you need a config.json file containing the data file paths and the tool usage parameters:
{
"data_path": "./example/data/data.csv",
"description_path": "./example/data/description.csv",
"interaction_path": "./example/data/interactions.csv",
"output_dir_path": "./../output/",
"reference_group": "Normal",
"experimental_group": "Tumor",
"correlation": "spearman",
"alternative": "two-sided",
"score": "mean",
"repeats_number": 500,
"process_number": 2
}
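If you prepare several runs from Python, the config file can be written with the standard json module. In this minimal sketch, BASE_CONFIG and write_config are hypothetical names, not part of dcona; the values are copied from the example config, and the exhaustive variant simply drops the interaction_path key, as the data description instructs.

```python
import json

# Values copied from the example config.json.
BASE_CONFIG = {
    "data_path": "./example/data/data.csv",
    "description_path": "./example/data/description.csv",
    "interaction_path": "./example/data/interactions.csv",
    "output_dir_path": "./../output/",
    "reference_group": "Normal",
    "experimental_group": "Tumor",
    "correlation": "spearman",
    "alternative": "two-sided",
    "score": "mean",
    "repeats_number": 500,
    "process_number": 2,
}


def write_config(path, exhaustive=False, **overrides):
    """Write a DCoNA config file (hypothetical helper, not part of dcona).

    Without interaction_path, DCoNA runs in the exhaustive regime.
    """
    config = {**BASE_CONFIG, **overrides}
    if exhaustive:
        config.pop("interaction_path", None)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=4)
    return config


write_config("config.json")  # network regime
write_config("config_exhaustive.json", exhaustive=True, correlation="pearson")
```

The resulting files are then passed to the command-line tool, e.g. dcona ztest config_exhaustive.json.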
Both relative and absolute file paths can be used.
Data description:
- data_path: data.csv contains an expression table. Rows correspond to genes, miRNAs, isomiRNAs or other molecules; columns correspond to samples (patients) from the two groups. Structure of data.csv:

  |        | sample_1 | ... | sample_n |
  |--------|----------|-----|----------|
  | gene_1 | 1.2345   | ... | 1.2345   |
  | ...    | ...      | ... | ...      |
  | gene_n | 1.2345   | ... | 1.2345   |

- description_path: description.csv divides patients into two non-intersecting groups (e.g. Normal and Tumor patients); a patient cannot belong to both groups simultaneously. Structure of description.csv:

  | Sample   | Group       |
  |----------|-------------|
  | sample_1 | condition_1 |
  | ...      | ...         |
  | sample_n | condition_2 |

  Column names have to be exactly Sample and Group.
- interaction_path (optional): interaction.csv contains source/target pairs; correlations are computed only among these pairs (in the network regime). Delete this line from the config file if you want to launch the exhaustive regime. Structure of interaction.csv:

  | Source        | Target        |
  |---------------|---------------|
  | source_gene_1 | target_gene_2 |
  | ...           | ...           |
  | source_gene_n | target_gene_n |

  Column names have to be exactly Source and Target.
- output_dir_path is a path to an output directory.
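Before a long run, it can help to check the input files against the structure described above. The sketch below uses only the standard library and assumes comma-separated files whose first data.csv column holds molecule names; check_description, check_interactions and check_data are hypothetical helpers, not part of dcona, and the sample data are made up.

```python
import csv
import io


def check_description(text):
    """Return {sample: group}; require Sample/Group columns and disjoint groups."""
    reader = csv.DictReader(io.StringIO(text))
    if not {"Sample", "Group"} <= set(reader.fieldnames or []):
        raise ValueError("description.csv needs columns named exactly 'Sample' and 'Group'")
    groups = {}
    for row in reader:
        sample, group = row["Sample"], row["Group"]
        if sample in groups and groups[sample] != group:
            raise ValueError(f"sample {sample!r} is listed in two groups")
        groups[sample] = group
    return groups


def check_interactions(text):
    """Return (source, target) pairs; require Source/Target columns."""
    reader = csv.DictReader(io.StringIO(text))
    if not {"Source", "Target"} <= set(reader.fieldnames or []):
        raise ValueError("interaction.csv needs columns named exactly 'Source' and 'Target'")
    return [(row["Source"], row["Target"]) for row in reader]


def check_data(text, groups, reference_group, experimental_group):
    """Count data.csv samples per group and require both groups to be present."""
    header = next(csv.reader(io.StringIO(text)))
    counts = {}
    for sample in header[1:]:  # assumption: column 0 holds molecule names
        group = groups.get(sample)
        if group is not None:
            counts[group] = counts.get(group, 0) + 1
    for name in (reference_group, experimental_group):
        if counts.get(name, 0) == 0:
            raise ValueError(f"no samples of group {name!r} in data.csv")
    return counts


# Made-up miniature inputs.
DESCRIPTION = "Sample,Group\ns1,Normal\ns2,Normal\ns3,Tumor\ns4,Tumor\n"
DATA = ",s1,s2,s3,s4\ngene_1,1.0,2.0,3.0,4.0\ngene_2,2.0,1.0,4.0,3.0\n"
INTERACTIONS = "Source,Target\ngene_1,gene_2\n"

groups = check_description(DESCRIPTION)
print(check_interactions(INTERACTIONS))
print(check_data(DATA, groups, "Normal", "Tumor"))
```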
Usage parameters:
- reference_group, experimental_group are the names of the patient groups, as they appear in the Group column of description.csv.
- correlation: spearman or pearson; defines the type of correlation used by the tool.
- alternative: two-sided, less or greater. TODO: describe the parameter meaning in the ztest and zscore regimes.
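As a reminder of what the correlation parameter switches between: Spearman's coefficient is Pearson's coefficient computed on ranks (tied values receive average ranks), so it captures any monotone relationship, not only a linear one. The plain-Python sketch below, with hypothetical helper names, illustrates this; it is not DCoNA's code.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotone but strongly non-linear
print(f"pearson = {pearson(x, y):.3f}, spearman = {spearman(x, y):.3f}")
```

Because it only uses ranks, Spearman's coefficient is also less sensitive to outliers such as the last value of y.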
DCoNA has two working regimes:
- Network (interactions) regime: performs calculations only on the given gene pairs. Requires an interaction.csv file.
- Exhaustive (all vs all) regime: generates all possible gene pairs from the genes listed in data.csv and performs calculations. An interaction.csv file is not needed.
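The practical difference between the two regimes is the set of pairs that gets tested. The sketch below assumes that the exhaustive regime tests unordered pairs (correlation is symmetric) and that, in the network regime, pairs whose molecules are absent from data.csv are skipped; both are assumptions, and candidate_pairs is a hypothetical helper, not part of dcona.

```python
from itertools import combinations


def candidate_pairs(molecules, interactions=None):
    """Pairs to test: the given interactions (network) or all pairs (exhaustive)."""
    if interactions is not None:
        present = set(molecules)
        # Assumption: pairs with molecules missing from data.csv are skipped.
        return [(s, t) for s, t in interactions if s in present and t in present]
    # Assumption: unordered pairs, since correlation is symmetric.
    return list(combinations(molecules, 2))


genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
print(candidate_pairs(genes))  # exhaustive: n * (n - 1) / 2 pairs
print(candidate_pairs(genes, [("gene_1", "gene_2"), ("gene_1", "gene_9")]))
```

With n molecules the exhaustive regime tests n(n - 1)/2 pairs, so 20,000 genes already give about 2 * 10^8 pairs; this quadratic growth is why a predefined interaction list keeps runs much smaller.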