Gene regulatory network reconstruction from pseudotemporal single-cell gene expression data. Standalone MATLAB implementation of the SINGE algorithm. This code has been tested on MATLAB R2014b and R2018a on Linux, MATLAB R2020a on macOS, and MATLAB R2018a on Windows.
The software was formerly called SCINGE and has been renamed SINGE.
If you use the SINGE software please cite:
Atul Deshpande, Li-Fang Chu, Ron Stewart, Anthony Gitter. Network inference with Granger causality ensembles on single-cell transcriptomic data. bioRxiv 2019. doi:10.1101/534834
The SINGE-supplemental repository contains additional scripts, analyses, and results related to this manuscript.
The dependencies vary based on how SINGE is run. Setup instructions for each mode are described below.
The full SINGE pipeline runs multiple Generalized Lasso Granger (GLG) tests to infer different directed networks for different hyperparameters and subsamples of the data. These directed networks are then aggregated into a final predicted network. For small or medium datasets and relatively few hyperparameter combinations, SINGE can be run in a "standalone" mode where all the GLG tests and the aggregation step are run serially. However, for larger datasets or more hyperparameter combinations, the GLG tests can be run in parallel on a single machine or multiple machines. After all GLG tests terminate, the results can be aggregated separately.
The standalone and parallel modes are accessible in three ways: MATLAB, compiled MATLAB executables with a wrapper Bash script, or Docker.
Running SINGE through MATLAB requires the source code in this repository and the glmnet_matlab package as a dependency.
Unzip glmnet_matlab.zip in either the root directory that contains SINGE_Example.m or the code subdirectory.
Then use SINGE.m to run SINGE in the standalone mode or SINGE_GLG_Test.m and SINGE_Aggregate.m to run each stage separately.
SINGE can be run through MATLAB in Linux, macOS, or Windows but may not work in all Windows environments.
SINGE.m usage:
SINGE(Data,gene_list,outdir,hyperparameter_file)
SINGE_Example.m demonstrates a simple example with the hyperparameters specified in default_hyperparameters.txt.
It runs SINGE on data1/X_SCODE_data.mat and writes the results to the Output directory.
Requires Bash, the operating system-specific compiled SINGE code, and a compatible MATLAB runtime library, which can be downloaded from https://www.mathworks.com/products/compiler/matlab-runtime.html
Starting with release 0.4.0, the compiled executables for Linux, SINGE_GLG_Test and SINGE_Aggregate, are available from the GitHub releases page.
Download these executables and place them in the same directory as the wrapper scripts SINGE.sh, run_SINGE_GLG_Test.sh, and run_SINGE_Aggregate.sh from this repository.
The compiled code has been tested with the R2018a runtime in Linux.
Starting with release 0.4.1, the compiled executables for macOS are available from the GitHub releases page in the file SINGE_mac.tgz.
Download these executables, untar them with tar -xf SINGE_mac.tgz, and place them in the same directory as the wrapper scripts SINGE.sh, run_SINGE_GLG_Test_mac.sh, and run_SINGE_Aggregate_mac.sh from this repository.
The compiled code has been tested with the R2020a runtime in macOS.
There are no compiled executables for Windows. We recommend running SINGE through Docker in Windows if a compatible MATLAB environment is not available.
Bash wrapper script usage:
bash SINGE.sh runtime_dir mode Data gene_list outdir [hyperparameter_file] [hyperparameter_number]
hyperparameter_file is required only for the standalone and GLG modes.
hyperparameter_number is required only for GLG mode.
Use bash SINGE.sh -h to print the complete usage message.
The SINGE.sh script automatically detects whether it is running on Linux or macOS and uses the appropriate wrapper script and executables.
bash SINGE.sh PATH_TO_RUNTIME standalone data1/X_SCODE_data.mat data1/gene_list.mat Output default_hyperparameters.txt
bash SINGE.sh PATH_TO_RUNTIME GLG data1/X_SCODE_data.mat data1/gene_list.mat Output default_hyperparameters.txt 2
bash SINGE.sh PATH_TO_RUNTIME Aggregate data1/X_SCODE_data.mat data1/gene_list.mat Output
Replace PATH_TO_RUNTIME with the path to the MATLAB runtime.
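The GLG and Aggregate examples above can be chained so that every hyperparameter combination is tested before the results are aggregated. The following is a minimal sketch, not part of the repository: the helper names run_cmd and singe_glg_then_aggregate are hypothetical, and it assumes that hyperparameter_number counts the combinations in the hyperparameter file starting from 1. By default it only prints the commands so that they can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch only: print (or run) one GLG command per hyperparameter combination,
# followed by a single Aggregate command.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: singe_glg_then_aggregate RUNTIME_DIR NUM_COMBINATIONS
# NUM_COMBINATIONS is an assumption: the number of combinations in the file.
singe_glg_then_aggregate() {
  local runtime_dir="$1" num_combinations="$2"
  local data="data1/X_SCODE_data.mat"
  local genes="data1/gene_list.mat"
  local outdir="Output"
  local hyperparameters="default_hyperparameters.txt"
  local i

  for ((i = 1; i <= num_combinations; i++)); do
    run_cmd bash SINGE.sh "$runtime_dir" GLG "$data" "$genes" "$outdir" "$hyperparameters" "$i"
  done
  # Aggregate only after every GLG test has finished.
  run_cmd bash SINGE.sh "$runtime_dir" Aggregate "$data" "$genes" "$outdir"
}

singe_glg_then_aggregate PATH_TO_RUNTIME 3
```

The loop here is serial. Because each GLG command is independent of the others, the printed commands could instead be submitted as separate jobs on one or more machines, with the Aggregate command run once they have all completed.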
Requires Docker.
The most straightforward way to run SINGE through Docker is with the SINGE.sh wrapper script.
The usage is the same as the examples above except the script name and MATLAB runtime path do not need to be specified.
We recommend specifying the version of the Docker image.
docker run -v $(pwd):/SINGE -w /SINGE agitter/singe:0.4.0 standalone data1/X_SCODE_data.mat data1/gene_list.mat Output default_hyperparameters.txt
Alternatively, arbitrary commands can be run inside Docker by overriding the default entry point.
docker run -v $(pwd):/SINGE -w /SINGE --entrypoint "/bin/bash" agitter/singe:0.4.0 -c "source ~/.bashrc; conda activate singe-test; tests/compare_example_output.sh Output"
This example is part of the SINGE test code, which only runs when called from the root of the SINGE git repository.
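Since the Docker usage mirrors the wrapper script without the script name and runtime path, the GLG and Aggregate modes can be written the same way as the standalone Docker command. The sketch below prints those two commands with a pinned image tag. The helper name singe_docker_cmd and the variable SINGE_IMAGE are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print the docker run command for a SINGE mode with a pinned image tag.

SINGE_IMAGE="agitter/singe:0.4.0"   # explicit version rather than a floating tag

# Usage: singe_docker_cmd MODE ARGS...
# The command is printed for inspection; a working directory containing
# spaces would need quoting before the printed line is reused.
singe_docker_cmd() {
  echo docker run -v "$(pwd):/SINGE" -w /SINGE "$SINGE_IMAGE" "$@"
}

singe_docker_cmd GLG data1/X_SCODE_data.mat data1/gene_list.mat Output default_hyperparameters.txt 2
singe_docker_cmd Aggregate data1/X_SCODE_data.mat data1/gene_list.mat Output
```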
- Data - Path to matfile with ordered single-cell expression data (sparse matrix X), pseudotime values (array ptime), optional indices of regulators (array of index values regix), and optional branching information (matrix branches). For example, the data in data1/X_SCODE_data.mat represents a linear trajectory, and data_bifurcated/X_data_bifurcated.mat represents a branching trajectory with two branches.
- gene_list - Path to file containing list of gene names corresponding to the rows in the expression data matrix X in Data (e.g., data1/gene_list.mat)
- outdir - Path to folder for storing results from individual GLG Tests
- hyperparameter_file - Path to file containing a list of GLG hyperparameter combinations for the hyperparameters described below
Additional input for compiled MATLAB code with R2018a runtime
- runtime_dir - Path to MATLAB R2018a runtime library
GLG hyperparameters:
- --ID - Numeric identifier for the GLG hyperparameter set, which should be unique for each hyperparameter set and replicate index
- --lambda - Sparsity parameter (lambda = 0 results in a non-sparse solution)
- --dT - Time resolution for GLG test
- --num-lags - Number of lags for GLG test
- --kernel-width - Gaussian kernel width for GLG test
- --replicate - Replicate index
- --family - Distribution family of the gene expression values (options = gaussian, poisson; default = gaussian)
- --prob-zero-removal - Probability of removing zero-valued samples as part of the zero-handling strategy (default = 0)
- --prob-remove-samples - Sample removal rate for obtaining subsampled replicates (default = 0.2)
- --date - Valid date in the dd-mmm-yyyy or mm/dd/yyyy format.
See default_hyperparameters.txt for an example hyperparameters file.
Users can generate their own hyperparameter file using the bash script scripts/generate_hyperparameters.sh, which takes hyperparameter values from the files scripts/lambda.txt, scripts/kernel.txt, scripts/time.txt, scripts/probzeroremoval.txt, and scripts/probremovesample.txt.
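To illustrate what a set of hyperparameter combinations looks like, the sketch below emits one line per pairing of a lambda value with a kernel width, using the flags listed above. It is not the repository's generate_hyperparameters.sh. The function name hyperparameter_grid, the example values, and the one-combination-per-line layout are assumptions made for illustration, so compare the result against default_hyperparameters.txt before using it.

```shell
#!/usr/bin/env bash
# Sketch only: emit one line per (lambda, kernel width) combination.
# The values and the line layout are illustrative assumptions.

# Usage: hyperparameter_grid "LAMBDAS" "KERNEL_WIDTHS"  (space-separated lists)
hyperparameter_grid() {
  local lambdas="$1" widths="$2"
  local id=1 lambda width
  for lambda in $lambdas; do
    for width in $widths; do
      # Each combination gets its own numeric ID.
      echo "--ID $id --lambda $lambda --dT 3 --num-lags 5 --kernel-width $width --replicate 1"
      id=$((id + 1))
    done
  done
}

hyperparameter_grid "0 0.01" "0.5 1"
```

Two lambda values and two kernel widths give four lines with IDs 1 through 4. Adding further hyperparameters to the grid means adding further nested loops.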
See USAGE.md for guidelines on setting hyperparameters and running SINGE on a new dataset.
- SINGE_Ranked_Edge_List.txt - File with list of ranked edges according to their SINGE scores
- SINGE_Gene_Influence.txt - File with list of genes ranked according to their SINGE influence.
When running SINGE v0.5.0 on a dataset with a branching trajectory (indicated by the presence of the matrix branches in the mat file), SINGE_Ranked_Edge_List.txt and SINGE_Gene_Influence.txt are calculated for the entire branching process by combining the results of the individual GLG tests from all branches. Alternatively, the user can store the individual GLG test results from each branch in a separate folder and call SINGE Aggregate on each folder to obtain branch-specific network inference.
The master branch of this repository may be unstable as new features are implemented. Use a versioned release for stable data analysis.
Because the subsampling and zero-removal stages involve pseudo-random sample removals, SINGE generates a random seed using input hyperparameters, including the date input. The results can be reproduced by providing the same inputs and date from a previous experiment.
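One way to keep a run reproducible is to record the date in one of the accepted formats at the time of the experiment and supply the same value later. The sketch below produces today's date in the dd-mmm-yyyy form. The function name singe_run_date is hypothetical, and where the recorded value is stored is left to the user.

```shell
#!/usr/bin/env bash
# Sketch only: produce the current date in dd-mmm-yyyy form for the --date hyperparameter.

# LC_ALL=C keeps the month abbreviation in English (Jan, Feb, ...).
singe_run_date() {
  LC_ALL=C date +%d-%b-%Y
}

run_date="$(singe_run_date)"
echo "--date $run_date"
```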
The tests directory contains test scripts and reference output files to test SINGE.
GitHub Actions is used to run several types of tests in a Linux environment and to deploy a temporary Docker image to DockerHub every time the repository's master branch is updated.
The tests build the SINGE Docker image, run SINGE on the example data in multiple ways using Docker, and compare the generated output with the reference output.
GitHub Actions is also used to test SINGE in a macOS environment. The tests install the MATLAB runtime, run the compiled SINGE code on the example data, and compare the generated output with the reference output. The macOS tests use a more permissive threshold when comparing the generated and reference adjacency matrices due to minor operating system-specific differences in the output.
The compiled version of SINGE for Linux is generated by compiling the MATLAB code in MATLAB R2018a:
mcc -N -m -R -singleCompThread -R -nodisplay -R -nojvm -a ./glmnet_matlab/ -a ./code/ SINGE_GLG_Test.m
mcc -N -m -R -singleCompThread -R -nodisplay -R -nojvm -a ./code/ SINGE_Aggregate.m
compile_SINGE.sh is used for testing to compile SINGE and confirm the source .m files match the versions used to create the binaries.
The compiled version of SINGE for macOS is generated by running the compile_SINGE_mac.sh script in MATLAB R2020a.
SINGE is available under the MIT License, Copyright © 2019 Atul Deshpande, Anthony Gitter.
The file iLasso_for_SINGE.m has been modified from iLasso.m.
The original third-party code is available under the MIT License, Copyright © 2014 USC-Melady.
The compiled version of SINGE includes the glmnet_matlab package, which is available under the GPL-2 license.