This repository contains a command-line implementation of SPURF, which takes a single antibody heavy chain DNA sequence and returns its inferred substitution profile together with a logo plot of that profile. SPURF feeds cached data from a large-scale Rep-Seq dataset into a statistical model that infers a detailed, clonal-family-specific substitution profile for a single input sequence. Source code for fitting the SPURF model from scratch on another dataset is also provided. Results and methods are described in our preprint. The dataset used in the paper is available in our Zenodo bucket.
Clone this GitHub repo recursively to get the necessary submodules:
git clone --recursive https://github.com/krdav/SPURF.git
cd SPURF
git pull --recurse-submodules https://github.com/krdav/SPURF.git
There are two supported ways of installing the command-line implementation of SPURF: 1) using Conda on Linux and 2) using Docker and the provided Dockerfile. The Conda installation has been tested on our own servers and on a fresh Ubuntu installation running in VirtualBox. With VirtualBox, SPURF can therefore be installed on both Mac and Windows. Alternatively, Docker can be used on any platform that supports it.
First, install Conda for Python 2. Miniconda is sufficient and much faster to install. Remember to run source ~/.bashrc if you continue the installation in the same terminal window.
Install dependencies with apt-get:
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y libz-dev cmake scons libgsl0-dev libncurses5-dev libxml2-dev libxslt1-dev mafft hmmer
Use the INSTALL executable (./INSTALL) to install the required Python environment and partis.
After installation, the Conda environment needs to be loaded every time before use, like this:
source activate SPURF
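Because the environment has to be activated in every new shell, a small guard at the top of a wrapper script can catch a forgotten activation early. The following is a minimal sketch, not part of SPURF: the function name require_spurf_env is hypothetical, and it assumes that Conda exposes the active environment name in the CONDA_DEFAULT_ENV variable (some Conda versions may store a path there instead).

```shell
# Hypothetical helper, not shipped with SPURF.
# Assumption: Conda sets CONDA_DEFAULT_ENV to the name of the active environment.
require_spurf_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "SPURF" ]; then
        # Report on stderr and signal failure to the caller.
        echo "Conda environment SPURF is not active; run: source activate SPURF" >&2
        return 1
    fi
}
```

A wrapper script could call require_spurf_env || exit 1 before invoking SPURF.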
First, install Docker.
We have a Docker image on Docker Hub that is automatically kept up to date with the master branch of this repository. It can be pulled and used directly:
sudo docker pull krdav/spurf
Alternatively you can build the container yourself from inside the main repository directory:
sudo docker build -t spurf .
To run this container, use a command such as the following (see the modifications below):
sudo docker run -it -v host-dir:/host krdav/spurf /bin/bash
- replace host-dir with the local directory to which you would like access inside your container
- replace /host with the place you would like this directory to be mounted
- if you built your own container, use spurf in place of krdav/spurf
Detach using ctrl-p ctrl-q.
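The substitutions above can be sketched as a small helper that prints the resulting command. This is an illustration only: the function name spurf_docker_cmd is hypothetical, the mount point is kept at /host, and the directory is resolved to an absolute path because Docker may treat a bare name passed to -v as a named volume rather than a host directory. Paths containing spaces would need extra quoting before the printed command is run.

```shell
# Hypothetical helper, not part of SPURF: prints (does not run) the docker
# command for a given host directory and, optionally, a locally built image.
spurf_docker_cmd() {
    local host_dir image abs_dir
    host_dir="$1"
    image="${2:-krdav/spurf}"   # pass "spurf" if you built the container yourself
    # Resolve to an absolute path; fail if the directory does not exist.
    abs_dir="$(cd "$host_dir" && pwd)" || return 1
    printf 'sudo docker run -it -v %s:/host %s /bin/bash\n' "$abs_dir" "$image"
}
```

For example, spurf_docker_cmd /tmp spurf prints the command for a locally built image with /tmp mounted at /host.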
SPURF is wrapped in an Rscript named run_SPURF.R that takes three inputs:
- an antibody heavy chain DNA sequence
- (optional) the basename for the two output files, which are a substitution profile and a logo plot
- the model type (i.e. l2 or jaccard).
Example run:
Rscript --vanilla run_SPURF.R <input_sequence> <output_base> <model_type>
E.g.:
Rscript --vanilla run_SPURF.R CGCAGGACTGTTGANGCCTTCGGAGACCCTGTCCCTCACCTGCGTTGTCTCTGGCGGGTCCTTCAGTGATTACTACTGGAGCTGGATCCATCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGGAGCACCAACTACAACCCGTCCCTCGAAAGTCGAGCCACCATATCAGTAGACACGTCCCAGAACAACCTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACTCGGCTGTGTATTACTGTGCGAGAGGCCCGACTACAATGGCTCACGACTTTGACTACTGGGGCCAGGGAACCCTGGTCACC seqXYZ_SPURF_output l2
By default, the model type l2 is used.
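To process several sequences, the call above can be wrapped in a loop. The sketch below is an assumption-laden illustration, not part of SPURF: the function names and the tab-separated input format (name, then sequence) are hypothetical, and the character check (A, C, G, T and N, since the example sequence contains an N) is only a pre-filter, not a statement about what run_SPURF.R itself accepts.

```shell
# Hypothetical pre-check: succeeds only for a non-empty string made of
# A, C, G, T or N (either case). Not SPURF's own validation.
is_dna_sequence() {
    case "$1" in
        "" | *[!ACGTNacgtn]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical batch wrapper: reads "name<TAB>sequence" lines from a file and
# runs SPURF once per valid sequence. The model type defaults to l2.
run_spurf_batch() {
    local tsv model name seq tab
    tsv="$1"
    model="${2:-l2}"
    tab="$(printf '\t')"
    while IFS="$tab" read -r name seq || [ -n "$name" ]; do
        if is_dna_sequence "$seq"; then
            # Redirect stdin so Rscript cannot consume the rest of the input file.
            Rscript --vanilla run_SPURF.R "$seq" "${name}_SPURF_output" "$model" < /dev/null
        else
            echo "Skipping ${name}: not a plain DNA sequence" >&2
        fi
    done < "$tsv"
}
```

Running run_spurf_batch sequences.tsv jaccard from the repository directory, with the SPURF environment active, would then write one pair of output files per input line; sequences.tsv is a placeholder file name.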