Star-CDP is a method for estimating cell lineage trees from CRISPR/Cas9-induced mutations under the Star Homoplasy parsimony criterion (Sashittal et al., 2023). In the input characters, 0 represents the unedited state, positive integers represent edited states, and -1 represents the missing or ambiguous state. Under the Star Homoplasy assumption, only mutations from the unedited state (0) to an edited state are allowed (note that this is equivalent to Camin-Sokal parsimony for binary characters, where 0 is the ancestral/unedited state and 1 is the only derived/edited state). Our approach is novel in that it is guaranteed to return an optimal solution to the problem that obeys the set of constraints (clades, i.e., subsets of cells) given as input. In practice, the clade constraints are generated from candidate cell lineage trees recovered in prior analyses or heuristic search, as described in this example.
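To make the criterion concrete, here is a minimal Python sketch that scores a single character on a fixed tree under the Star Homoplasy assumption. It is an illustration, not Star-CDP's implementation: the nested-tuple tree encoding, the function name `star_homoplasy_score`, and the optional `weights` mapping are all assumptions made for this example, and how Star-CDP converts mutation probabilities into weights is not specified here. The idea is that every maximal clade whose observed (non-missing) leaves share one edited state needs exactly one 0 -> y mutation on the edge above it.

```python
MISSING = -1   # missing or ambiguous state
UNEDITED = 0   # unedited (ancestral) state


def star_homoplasy_score(tree, states, weights=None):
    """Minimum (weighted) number of 0 -> y mutations for one character.

    tree    : a leaf label (str) or a tuple of subtrees (hypothetical encoding)
    states  : dict mapping leaf label -> state (-1, 0, or a positive integer)
    weights : optional dict mapping edited state -> cost; default cost is 1
    """
    def cost_of(state):
        return 1.0 if weights is None else weights[state]

    def is_edited_clade(seen):
        # True when all observed leaves below share a single edited state.
        return len(seen) == 1 and UNEDITED not in seen

    def visit(node):
        # Returns (observed states below node, cost already charged below node).
        if isinstance(node, str):
            state = states[node]
            return (set() if state == MISSING else {state}), 0.0
        results = [visit(child) for child in node]
        seen = set().union(*(child_seen for child_seen, _ in results))
        cost = sum(child_cost for _, child_cost in results)
        if len(seen) > 1 or UNEDITED in seen:
            # This node must be unedited, so each child clade that is
            # uniformly edited pays for one mutation on its incoming edge.
            for child_seen, _ in results:
                if is_edited_clade(child_seen):
                    cost += cost_of(next(iter(child_seen)))
        return seen, cost

    seen, cost = visit(tree)
    if is_edited_clade(seen):
        # The whole tree shares one edited state: one mutation above the root.
        cost += cost_of(next(iter(seen)))
    return cost


example_tree = ((("a", "b"), "c"), ("d", "e"))
example_states = {"a": 1, "b": 1, "c": -1, "d": 0, "e": 1}
print(star_homoplasy_score(example_tree, example_states))
```

In this example, state 1 arises twice (once above the clade containing a, b, and c, and once above e), so the unweighted score is 2. The repeated origin of the same edited state is the homoplasy that the criterion permits.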
To build Star-CDP, use the following commands:
git clone https://github.com/molloy-lab/Star-CDP.git
cd Star-CDP/src
make
Note: On Linux, we successfully compiled with gcc version 8.5.0. On Mac OS X, we successfully compiled with Apple clang version 15.0.0, which requires Apple command line tools to be installed. This can be done with the following commands:
# sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install
and then following the instructions in the pop-up window.
Building Star-CDP requires:
- C++ 17 or above
- ASTRAL
- Boost 1.80.0 or above
To install ASTRAL, download it and extract the zip folder into the src directory before running Star-CDP:
git clone https://github.com/smirarab/ASTRAL.git
mv ASTRAL tmp-ASTRAL
unzip tmp-ASTRAL/Astral.*.zip
rm -rf tmp-ASTRAL
To install Boost:
- On Mac, it can be installed via Homebrew with the following command:
brew install boost
This installs an up-to-date Boost (> 1.80.0) to /usr/local/include and /usr/local/lib (or in /opt/homebrew on Apple Silicon).
- On Linux, it can be installed with the following commands:
sudo apt update
sudo apt install libboost-all-dev
This installs an up-to-date Boost (> 1.80.0) to /usr/local.
These default Boost paths are already set in our Makefile. However, if Boost has been installed in a different path, BOOST_INCLUDE_PATH and BOOST_LIB_PATH should be changed accordingly.
Star-CDP requires the following three input files:
- A file containing the character matrix: a comma-separated values (CSV) file whose rows represent cells and whose columns represent target sites, in the same format as the Startle input character matrix. Values of the character matrix must be either non-negative integers or '-1', with 0 indicating the unmutated state, positive integers indicating mutated states, and '-1' indicating missing data.
- A priors file containing the probabilities of all mutations: a comma-separated values (CSV) file with exactly three columns, in the same format as the Startle input priors CSV file. The first column represents the site $x$, the second column represents a mutated state $y$, and the third column represents the probability that the mutation $0 \to y$ occurs at site $x$.
- A trees file containing the trees that define the search space: a file whose lines are Newick strings. This file can be constructed via heuristic search; refer to this example for more details.
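The value constraints above can be checked before running Star-CDP. The following Python sketch is one way to do so; it is not part of Star-CDP. It assumes (this is not stated above) that both CSV files begin with a header row and that the first column of the character matrix holds the cell label. The function names and the sample file contents are hypothetical.

```python
import csv
import io


def parse_character_matrix(text):
    """Parse a character matrix CSV; assumes a header row and a cell-label column."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header, body = rows[0], rows[1:]
    matrix = {}
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"row for {row[0]!r} has {len(row)} fields, expected {len(header)}")
        values = [int(v) for v in row[1:]]  # int() rejects non-integer entries
        if any(v < -1 for v in values):
            raise ValueError(f"cell {row[0]!r} has a state below -1")
        matrix[row[0]] = values
    return header[1:], matrix


def parse_priors(text):
    """Parse a three-column priors CSV: site, mutated state, probability."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    priors = {}
    for row in rows[1:]:  # skip the assumed header row
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}")
        site, state, prob = row[0], int(row[1]), float(row[2])
        if state <= 0:
            raise ValueError(f"mutated state must be a positive integer, got {state}")
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
        priors[(site, state)] = prob
    return priors


# Hypothetical file contents, used only to exercise the checks.
MATRIX_CSV = "cell,site1,site2\ncellA,0,2\ncellB,1,-1\n"
PRIORS_CSV = "site,state,probability\nsite1,1,0.7\nsite2,2,0.3\n"

sites, matrix = parse_character_matrix(MATRIX_CSV)
priors = parse_priors(PRIORS_CSV)
print(sites, matrix, priors)
```

Both functions raise `ValueError` on the first violation, so a malformed file is reported before any tree search starts.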
To run Star-CDP, we recommend working through this example.
The usage options can be viewed with this command:
./star-cdp -h
The output should be:
Star-CDP version 1.0.0
COMMAND: ./star-cdp
===================================== Star-CDP =====================================
Star-CDP is a program that solves the large Star Homoplasy parsimony within a
clade-constrained version of tree space.
USAGE for large parsimony problem:
./star-cdp -i <input character file>
-m <input mutation probability file>
-t <input trees from heuritic search or other sources>
-g <label of outgroup (unedited / ancestor)>
-o <output tree file>
USAGE for small parsimony problem, i.e., computing score for given tree:
./star-cdp -i <input character file>
-m <input mutation probability file>
-q <input tree for scoring>
OPTIONS:
[-h|--help]
Prints this help message.
(-i|--input) <input characters file>
Name of file containing input characters in CSV format
(-m|--mutations) <mutations probability file>
Name of file containing mutations probability
[(-t|--trees) <input trees file>]
Name of file containing trees in newick format for constructing solution
space with ASTRAL
(-x|-g|--outgroup) <outgroup or unedited cell label>
Comma separated list of outgroup cells (e.g. unedited cell) used to root
solution space
[(-q) <input species file>]
Name of file containing tree for scoring in newick format
[(-o|--output) <output file>]
Prefix of file for writing output tree (default: stdout)
[(-nosupp)]
Turn off the calculation of clade support, which is the fraction of optimal
solutions clade appears in
[(-consensus)]
Compute greedy, majority, and strict consensus based on clade support
[(-e|--equal)]
Using equal weight for all mutations
[(-memory)]
Amount of memory given to ASTRAL.(Defualt:16000M)
Contact: Post issue to Github (https://github.com/molloy-lab/Star-CDP)
or email Junyan Dai ([email protected]) & Erin Molloy ([email protected])
If you use Star-CDP in your work, please cite:
Dai and Molloy, 2024, Star-CDP, https://github.com/molloy-lab/Star-CDP/
====================================================================================
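When Star-CDP is called from a script, the large parsimony invocation shown in the help text can be assembled programmatically. The Python sketch below uses only flags listed above (`-i`, `-m`, `-t`, `-g`, `-o`, `-e`); the helper name `build_star_cdp_command` and all file names and the outgroup label are hypothetical placeholders.

```python
import shlex


def build_star_cdp_command(characters, priors, trees, outgroup, output,
                           binary="./star-cdp", equal_weights=False):
    """Return the argument list for the large parsimony usage of star-cdp."""
    cmd = [
        binary,
        "-i", characters,  # input character matrix (CSV)
        "-m", priors,      # mutation probability file (CSV)
        "-t", trees,       # Newick trees defining the constrained search space
        "-g", outgroup,    # label of the outgroup / unedited cell
        "-o", output,      # prefix of the output tree file
    ]
    if equal_weights:
        cmd.append("-e")   # use equal weight for all mutations
    return cmd


# Placeholder file names; substitute your own inputs.
cmd = build_star_cdp_command("characters.csv", "priors.csv", "trees.nwk",
                             "root", "star_cdp_out")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the compiled binary. Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.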