zhiyanov/DCoNA

DCoNA: tool for fast Differential Correlation Network Analysis

DCoNA is a statistical tool for identifying pairwise interactions whose correlation changes significantly between two conditions. DCoNA was designed to test this hypothesis for a predefined list of source and target pairs ("Network" regime). However, DCoNA can also be used in the complete-network regime, when no list is given ("Exhaustive" regime). In this regime, DCoNA tests the hypothesis for all possible pairs of molecules in the expression data. Aside from hypothesis testing, DCoNA can test whether the significantly altered correlations of a particular source molecule are overrepresented among all significantly changed correlations. DCoNA can also compute the mean, median, and other quantiles of the z-statistics associated with a particular molecule and its targets to determine a trend in correlation changes.

If you use DCoNA in work contributing to a scientific publication, we ask you to cite our publication:

Anton Zhiyanov, Narek Engibaryan, Stepan Nersisyan, Maxim Shkurnikov, Alexander Tonevitsky, Differential co-expression network analysis with DCoNA reveals isomiR targeting aberrations in prostate cancer, Bioinformatics, Volume 39, Issue 2, February 2023, btad051
https://doi.org/10.1093/bioinformatics/btad051

Installation

Installation using pip

pip install dcona

Downloading example dataset

You can try DCoNA on the TCGA-PRAD test dataset.

Usage

You can use DCoNA either as a Python module or as a command-line tool.

A detailed description of the functions, with example data and a test launch, is also available.

Available functions

dcona.ztest

It tests the hypothesis of correlation equivalence between pairs of genes, i.e. that the correlation of a pair is the same in the reference and experimental groups.

dcona.ztest(data_df, description_df, reference_group, experimental_group, correlation='spearman', alternative='two-sided', interaction=None, repeats_number=None, output_dir=None, process_number=None)
  • Command-line usage:
    dcona ztest config.json
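For intuition, the test for a single pair can be sketched with the classical Fisher z-transformation. This is a textbook illustration and not DCoNA's code: the function name `fisher_z_test`, the Pearson-style standard error 1 / (n - 3), the sign convention (experimental minus reference) and the numbers in the usage line are all assumptions made here.

```python
import math


def fisher_z_test(r_ref, n_ref, r_exp, n_exp, alternative="two-sided"):
    """Compare two correlation coefficients with the Fisher z-transformation.

    Assumptions: the standard error 1 / (n - 3) is the textbook value for
    Pearson correlation, and the statistic is experimental minus reference.
    This is an illustration, not DCoNA's implementation.
    """
    # atanh maps a correlation in (-1, 1) to an approximately normal variable.
    z_ref, z_exp = math.atanh(r_ref), math.atanh(r_exp)
    se = math.sqrt(1.0 / (n_ref - 3) + 1.0 / (n_exp - 3))
    z = (z_exp - z_ref) / se

    # Standard normal CDF written with the complementary error function.
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    if alternative == "two-sided":
        p = 2.0 * min(cdf, 1.0 - cdf)
    elif alternative == "less":
        p = cdf
    elif alternative == "greater":
        p = 1.0 - cdf
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p


# Hypothetical pair: weak correlation in the reference group, strong in the experimental one.
z, p = fisher_z_test(r_ref=0.1, n_ref=100, r_exp=0.8, n_exp=100)
```

A large absolute z with a small p suggests that the correlation of the pair differs between the two groups.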

dcona.zscore

It aggregates the correlation changes of a source molecule with all its targets.

dcona.zscore(data_df, description_df, reference_group, experimental_group, correlation='spearman', score='mean', alternative='two-sided', interaction=None, repeats_number=None, output_dir=None, process_number=None)
  • Command-line usage:
    dcona zscore config.json
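The aggregation idea can be illustrated with plain Python. The sketch below groups per-pair z-statistics by source molecule and summarizes each group; the function `aggregate_z`, the input layout and the values are hypothetical and only mirror the `score='mean'` / median options mentioned above, not DCoNA's internals.

```python
import statistics
from collections import defaultdict


def aggregate_z(z_by_pair, score="mean"):
    """Summarize the z-statistics of each source molecule over its targets.

    z_by_pair maps (source, target) tuples to a z-statistic. The input
    layout and this function are hypothetical, for illustration only.
    """
    grouped = defaultdict(list)
    for (source, _target), z in z_by_pair.items():
        grouped[source].append(z)

    if score == "mean":
        summarize = statistics.mean
    elif score == "median":
        summarize = statistics.median
    else:
        raise ValueError("this sketch supports only 'mean' and 'median'")
    return {source: summarize(values) for source, values in grouped.items()}


# Made-up z-statistics for two source molecules.
example = {
    ("miR-A", "gene_1"): 2.0,
    ("miR-A", "gene_2"): 1.0,
    ("miR-A", "gene_3"): 3.0,
    ("miR-B", "gene_1"): -1.0,
    ("miR-B", "gene_4"): 0.5,
}
scores = aggregate_z(example, score="mean")
```

A consistently positive or negative summary for a source molecule points to a trend in how its correlations with targets change.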

dcona.hypergeom

It groups pairs with changed correlations by the source molecules and finds overrepresented groups using the hypergeometric test.

dcona.hypergeom(ztest_df, alternative='two-sided', oriented=True, output_dir=None)
  • Command-line usage:
    You should launch ztest and then hypergeom with the same config file.
    dcona hypergeom config.json
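The overrepresentation idea can be sketched as an upper-tail hypergeometric probability. The function `hypergeom_upper_tail` and its argument names are assumptions for illustration; how DCoNA counts pairs and handles the `alternative` and `oriented` options is not reproduced here.

```python
from math import comb


def hypergeom_upper_tail(total_pairs, significant_pairs, source_pairs, source_significant):
    """Probability of seeing at least `source_significant` significant pairs
    among the `source_pairs` pairs of one source molecule, if the
    `significant_pairs` significant pairs were spread at random over all
    `total_pairs` tested pairs. Illustrative sketch, not DCoNA's code.
    """
    denominator = comb(total_pairs, source_pairs)
    upper = min(source_pairs, significant_pairs)
    numerator = sum(
        comb(significant_pairs, k) * comb(total_pairs - significant_pairs, source_pairs - k)
        for k in range(source_significant, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 10 tested pairs, 5 of them significant;
# one source molecule has 4 pairs and all 4 are significant.
p_value = hypergeom_upper_tail(10, 5, 4, 4)
```

A small probability suggests that the source molecule carries more significantly changed correlations than expected by chance.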

Data structure for CLI launch

To run the tool from the command line you need the following data:

  • config.json containing data filenames and tool usage parameters
{
	"data_path": "./example/data/data.csv",
	"description_path": "./example/data/description.csv",
	"interaction_path": "./example/data/interactions.csv",
	"output_dir_path": "./../output/",
	
	"reference_group": "Normal",
	"experimental_group": "Tumor",

	"correlation": "spearman",
	"alternative": "two-sided",
	"score": "mean",
	"repeats_number": 500,
	"process_number": 2
}

Both relative and absolute file paths can be used.
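Before launching the tool, the config file can be sanity-checked with a few lines of Python. This is a minimal sketch: the helper `load_config` and the choice of which keys to treat as required are assumptions made here, and DCoNA itself may validate the file differently.

```python
import json

# Assumption: these keys are treated as mandatory in this sketch only.
REQUIRED_KEYS = (
    "data_path",
    "description_path",
    "output_dir_path",
    "reference_group",
    "experimental_group",
)


def load_config(text):
    """Parse config text and report which regime it would select."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    if config["reference_group"] == config["experimental_group"]:
        raise ValueError("reference and experimental groups must differ")
    # Without interaction_path the tool is expected to run in the exhaustive regime.
    regime = "network" if "interaction_path" in config else "exhaustive"
    return config, regime


example_text = """
{
    "data_path": "./example/data/data.csv",
    "description_path": "./example/data/description.csv",
    "interaction_path": "./example/data/interactions.csv",
    "output_dir_path": "./../output/",
    "reference_group": "Normal",
    "experimental_group": "Tumor",
    "correlation": "spearman",
    "alternative": "two-sided",
    "score": "mean",
    "repeats_number": 500,
    "process_number": 2
}
"""
config, regime = load_config(example_text)
```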

Data description:

  • data_path : data.csv contains an expression table. Each row of the table corresponds to a gene, miRNA, isomiRNA or other item. Each column corresponds to a patient (sample); the patients are taken from two different groups.

    Structure of data.csv :

    sample_1 ... sample_n
    gene_1 1.2345 ... 1.2345
    ... ... ... ...
    gene_n 1.2345 ... 1.2345
  • description_path : description.csv divides patients into two non-intersecting groups (e.g. Normal and Tumor patients). It is assumed that a patient does not belong to both groups simultaneously.

    Structure of description.csv:

    Sample Group
    sample_1 condition_1
    ... ...
    sample_n condition_2

    Column names have to be exactly Sample and Group.

  • interaction_path (optional): interaction.csv contains source/target pairs; correlations will be computed only for these pairs (network mode). Delete this line from the config file if you want to run the exhaustive mode.

    Structure of interaction.csv:

    Source Target
    source_gene_1 target_gene_2
    ... ...
    source_gene_n target_gene_n

    Column names have to be exactly Source and Target.

  • output_dir_path is a path to an output directory.
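The constraints described above can be checked before a run. The sketch below works on already-parsed values (lists and tuples); the helper `check_inputs` is hypothetical, and the rule that every interaction name must be a row of data.csv is an assumption here, not a documented DCoNA requirement.

```python
def check_inputs(data_columns, gene_names, description_rows, interaction_rows=None):
    """Return a list of human-readable problems found in the input tables.

    data_columns     -- sample names from the header of data.csv
    gene_names       -- row names of data.csv
    description_rows -- (Sample, Group) tuples from description.csv
    interaction_rows -- optional (Source, Target) tuples from interaction.csv
    """
    problems = []
    columns = set(data_columns)
    group_of = {}
    for sample, group in description_rows:
        # A patient must not belong to both groups at once.
        if sample in group_of and group_of[sample] != group:
            problems.append(f"{sample} is assigned to both {group_of[sample]} and {group}")
        group_of[sample] = group
        if sample not in columns:
            problems.append(f"{sample} has no column in data.csv")

    genes = set(gene_names)
    for source, target in interaction_rows or []:
        for name in (source, target):
            if name not in genes:
                problems.append(f"{name} is not a row of data.csv")
    return problems


# Made-up miniature inputs.
problems = check_inputs(
    data_columns=["sample_1", "sample_2"],
    gene_names=["gene_1", "gene_2"],
    description_rows=[("sample_1", "Normal"), ("sample_2", "Tumor")],
    interaction_rows=[("gene_1", "gene_2")],
)
```

An empty list means that the miniature inputs are consistent with the layout described above.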

Usage parameters:

  • reference_group, experimental_group are names of the patient groups.

  • correlation : spearman or pearson, defines the type of correlation that will be used in the tool.

  • alternative : two-sided, less or greater.

    TODO: describe the parameter meaning in ztest and zscore regimes.

Network and exhaustive regimes

DCoNA has two working regimes:

  • Network (interactions) regime - performs calculations only on given gene pairs. Requires an interaction.csv file.
  • Exhaustive (all vs all) regime - generates all possible gene pairs from genes listed in data.csv and performs calculations. An interaction.csv file is not needed.
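
The difference between the two regimes can be sketched as follows. The helper `pairs_to_test` is hypothetical, and treating exhaustive pairs as unordered is an assumption made here because correlation is symmetric; DCoNA may enumerate pairs differently.

```python
from itertools import combinations


def pairs_to_test(genes, interactions=None):
    """Return the gene pairs that would be tested in each regime.

    With `interactions` given (network regime) only those pairs are used;
    otherwise (exhaustive regime) every unordered pair of genes is generated.
    """
    if interactions is not None:
        return list(interactions)
    return list(combinations(genes, 2))


genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
network_pairs = pairs_to_test(genes, interactions=[("gene_1", "gene_2")])
exhaustive_pairs = pairs_to_test(genes)
```

For n genes the exhaustive regime yields n(n-1)/2 unordered pairs, so its cost grows quadratically with the size of the expression table.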