Reorganization of the 3D genome pinpoints non-coding drivers of primary prostate tumors

This repository contains all the data and analysis related to Reorganization of the 3D genome pinpoints non-coding drivers of primary prostate tumors.

The published version of the paper is available in Cancer Res.

A preprint version of this article is available on bioRxiv.

A reproducible run of this work can be found on CodeOcean.

Usage

To download all the code, scripts, and results, use git clone:

git clone https://github.com/LupienLab/3d-reorganization-prostate-cancer.git

This does not download the raw sequencing data. The repository contains only placeholder folders for the raw data; the FASTQ files themselves are available from the European Genome-Phenome Archive (EGA).

Data Type	EGA Accession Number
Whole genome sequencing	EGAS00001000900
RNA-seq	EGAS00001000900
ChIP-seq (H3K27ac)	EGAS00001002496
Hi-C	EGAS00001005014

Processed Hi-C data can be found on the Gene Expression Omnibus (accession GSE164347).

Raw Hi-C data from other studies can be found with the links and accession numbers below.

Data	Repository	Accession Number
22Rv1, RWPE1, and C4-2B	GEO	GSE118629
H1-hESC (Rep 1)	4D Nucleome	4DNFI6HDY7WZ
H1-hESC (Rep 2)	4D Nucleome	4DNFITH978XV
HAP-1 (Rep 1)	4D Nucleome	4DNFIT64Q7A3
HAP-1 (Rep 2)	4D Nucleome	4DNFINSKEZND
GM12878 (Rep 1)	4D Nucleome	4DNFIIV4M7TF
GM12878 (Rep 2)	4D Nucleome	4DNFIXVAKX9Q

Project Structure

This repository is structured as follows:

.
├── data/                # directory where all non-analysis data is stored
│   ├── External/        # data from other papers, collaborators
│   ├── Raw/             # raw data generated for this specific project along with pre-processing scripts and data
│   └── Processed/       # data from `Raw/` that has been aggregated or processed in some other way beyond the standard raw pre-processing
├── code/
│   ├── Result1/         # analysis scripts and logs for `result1`
│   ├── Result2/         # analysis scripts and logs for `result2`
│   └── ...
├── results/
│   ├── Result1/         # results for `result1`
│   ├── Result2/         # results for `result2`
│   └── ...
├── README.md            # this file
└── environment.yaml     # Anaconda environment YAML file for the entire project

To re-run any of the analyses in the code/ folders:

Build and activate the conda environment stored in environment.yaml

conda env create --file environment.yaml -n <ENV_NAME>
conda activate <ENV_NAME>

Navigate to the result directory of interest
Run snakemake

That should regenerate the entire set of results for that specific folder. You can preview what needs to be run with a dry run, snakemake -n, before running the analyses.
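The dry-run-then-run sequence described above can be wrapped in a small shell helper. This is a minimal sketch, not a script shipped with the repository: the function name `rerun_analysis`, its arguments, and the default of one core are assumptions made for illustration. It expects the conda environment to be active already, and it passes `--cores` explicitly because recent Snakemake releases require it for real runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): preview, then run,
# the Snakemake workflow in one analysis folder.

rerun_analysis() {
    local target_dir="$1"    # folder containing the Snakefile (placeholder path)
    local cores="${2:-1}"    # assumed default; adjust to your machine

    if [ ! -d "$target_dir" ]; then
        echo "error: no such directory: $target_dir" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$target_dir" &&
        # Dry run: list the jobs Snakemake would execute without running them.
        snakemake -n &&
        # Real run, reached only if the dry run succeeded.
        snakemake --cores "$cores"
    )
}
```

After sourcing the file, a call such as `rerun_analysis path/to/analysis/folder 4` previews the pending jobs and then runs them on four cores; the path shown is a placeholder. Chaining with `&&` means a failed dry run (for example, a missing input file) stops before any job starts.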

