Rarefaction is better than robust Aitchison PCA and other compositional data analysis methods at controlling for uneven sequencing effort
This is a reanalysis of Martino et al.'s 2019 manuscript, "A Novel Sparse Compositional Technique Reveals Microbial Perturbations," published in mSystems. The benchmarking code, primarily written in Python, was obtained from its GitHub repository. I have placed a copy of the repository, as originally posted on January 28, 2019, in the deicode-benchmarking directory.
Make sure conda and mamba are installed. Installing TinyTeX may take some finagling. On macOS it seems to need to be installed in the Library folder of the home directory (i.e., ~/Library/TinyTeX) using install.packages. See this issue for more clues on installing tinytex.
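Before building, a quick preflight check can confirm the prerequisites are in place. This is a hypothetical sketch, not part of the repository: `check_tools` and `check_tinytex_dir` are illustrative names, and it assumes bash. It reports which commands are missing from PATH and whether a TinyTeX directory exists at the macOS location mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of the repository); assumes bash.

# Print where each named command lives; return the number of commands not found on PATH.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Report whether a TinyTeX directory exists (default: the macOS location ~/Library/TinyTeX).
check_tinytex_dir() {
  local dir="${1:-$HOME/Library/TinyTeX}"
  if [ -d "$dir" ]; then
    echo "TinyTeX directory found: $dir"
  else
    echo "no TinyTeX directory at $dir"
    return 1
  fi
}
```

For example, run `check_tools conda mamba` and then `check_tinytex_dir`. If TinyTeX's bin directory is on your PATH, adding `tlmgr` (the TeX Live package manager that TinyTeX ships) to the `check_tools` list is another quick test.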
mamba config --set channel_priority strict
mamba env create -f workflow/envs/martino.yml
mamba activate martino
Use Snakemake (installed in the martino environment) to build the project:
snakemake --use-conda --conda-frontend mamba -c10
Alternatively, on a cluster with Slurm, you can run:
sbatch slurm/default.slurm
Conda environments:
workflow/envs/martino.yml - primary environment
workflow/envs/rpca.yml - environment for running the RPCA-based analysis. It includes a number of packages that were required to run different parts of the scripts from the deicode_benchmarking repository.
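The two ways of launching the build described above (Snakemake locally, or the Slurm batch script on a cluster) can be wrapped in one command. This is a hypothetical sketch, not part of the repository: `run_workflow` and the `DRY_RUN` variable are illustrative, and it assumes bash, that it is run from the project root inside the activated martino environment, and that a machine with `sbatch` on its PATH should submit to Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical launcher (not part of the repository); assumes bash and that it is
# run from the project root inside the activated martino environment.
# Submits slurm/default.slurm when sbatch is on PATH, otherwise runs Snakemake locally.
# Set DRY_RUN=1 to print the chosen command instead of running it.
run_workflow() {
  local cores="${1:-10}"
  local -a cmd
  if command -v sbatch >/dev/null 2>&1; then
    cmd=(sbatch slurm/default.slurm)
  else
    cmd=(snakemake --use-conda --conda-frontend mamba -c"$cores")
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_workflow 4` prints the command that would be used. The core count only affects the local run, since the Slurm script is assumed to request its own resources.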
Python/NumPy threw errors about the use of np.int in these files, since np.int was deprecated in NumPy 1.20 and removed in 1.24. The miniconda3 directory was in ~/opt/ on my Mac but in ~/ on our Linux cluster.
~/opt/miniconda3/envs/rpca/lib/python3.11/site-packages/deicode/_optspace.py
~/opt/miniconda3/envs/rpca/lib/python3.11/site-packages/gemelli/optspace.py
.snakemake/conda/*/lib/python3.11/site-packages/deicode/_optspace.py
.snakemake/conda/*/lib/python3.11/site-packages/gemelli/optspace.py
I replaced np.int with int and everything seemed to work fine. The files in .snakemake/ belong to the conda environments that Snakemake creates locally when run with --use-conda.
It's worth updating TinyTeX (mine updated to v2023.12) to work with Quarto:
quarto install tools tinytex