This dataset enables data retrieval with DataLad (version 0.12.2 or later) from the HCP Open Access dataset for users who have accepted the WU-Minn HCP Consortium Open Access Data Use Terms and obtained valid AWS credentials via db.humanconnectome.org.
The Human Connectome Project (HCP) aims to construct a map of the complete structural and functional neural connections in vivo within and across individuals.
Its 'WU-Minn HCP Open Access Data' release includes high-resolution 3T MR scans from young healthy adult twins and non-twin siblings (ages 22-35) using four imaging modalities: structural images (T1w and T2w), resting-state fMRI (rfMRI), task fMRI (tfMRI), and high angular resolution diffusion imaging (dMRI). It further includes behavioral and other individual subject measure data for all subjects, and MEG data and 7T MR data for a subset of subjects (twin pairs).
To retrieve HCP Open Access data via DataLad with this dataset, you need to agree to the WU-Minn HCP Consortium Open Access Data Use Terms and obtain valid AWS credentials:
- Create an account at http://db.humanconnectome.org.
- Log into your account and accept the data use terms of the "WU-Minn HCP Data - 1200 Subjects" data release.
- Enable Amazon S3 access for your Amazon account to get an access key ID and a secret access key (click on the button with the S3 logo).
- The access key ID is a character string similar to this:
AKIAXOX5CT57GHZ4SVFV
- The secret access key is a character string similar to this:
vntFcVA+YI0Ii3tVZPpdyQrgp2H05YjesyXKGE+n
You will be asked to supply your AWS credentials the first time you use datalad get to retrieve file content of your choice from the HCP Open Access dataset. You should only need to provide credentials once; all subsequent datalad get commands will retrieve data without asking for them again.
Note that a subset of files was previously not available via the S3 bucket and was instead retrieved from the ConnectomeDB database. This has since been rectified, but the ConnectomeDB download option was kept as a redundant fallback. When retrieving files from ConnectomeDB, you will be asked to supply your ConnectomeDB credentials (user name and password) instead of your AWS credentials. As with AWS credentials, you will only need to supply these credentials once.
Each HCP1200/<subject-id> directory in this dataset is a DataLad subdataset. The command

datalad get -n <subject-id>

clones this subdataset and allows you to access the subject's release notes (in the release-notes subdirectory). Within each subject's subdataset, one DataLad subdataset exists for each additional available subdirectory (e.g., MEG, T1w, etc., as far as available for the particular subject).
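For example, assuming the commands are run from the root of a clone of this dataset (the subject ID 100307 is purely illustrative; substitute an ID that actually appears under HCP1200/):

```shell
# Install the subject's subdataset without downloading any file content.
datalad get -n HCP1200/100307

# The release notes directory can now be browsed.
ls HCP1200/100307/release-notes

# Modality subdirectories are subdatasets of their own and need a
# separate `datalad get -n` before their contents become visible.
datalad get -n HCP1200/100307/T1w
```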
If you have never used DataLad before, please read the section on DataLad datasets below.
This repository is a DataLad dataset. It allows fine-grained data access, down to the level of single files of the HCP Open Access dataset, without hosting the HCP data itself. In order to use this repository for data retrieval, DataLad is required. It is a free and open source command line tool, available for all major operating systems, that builds on Git and git-annex to allow sharing, synchronizing, and version controlling collections of large files. You can find information on how to install DataLad at handbook.datalad.org/en/latest/intro/installation.html.
A DataLad dataset can be cloned by running

datalad clone <url>
Once a dataset is cloned, it is a light-weight directory on your local machine. At this point, it contains only small metadata and information on the identity of the files in the dataset, but not actual content of the (sometimes large) data files.
After cloning a dataset, you can retrieve file contents by running
datalad get <path/to/directory/or/file>
This command will trigger a download of the files, directories, or subdatasets you have specified.
DataLad datasets can contain other datasets, so called subdatasets. If you clone the top-level dataset, subdatasets do not yet contain metadata and information on the identity of files, but appear to be empty directories. In order to retrieve file availability metadata in subdatasets, run
datalad get -n <path/to/subdataset>
Afterwards, you can browse the retrieved metadata to find out about subdataset contents, and retrieve individual files with datalad get. If you use datalad get <path/to/subdataset>, all contents of the subdataset will be downloaded at once.
DataLad datasets can be updated. The command datalad update will fetch updates and store them on a different branch (by default remotes/origin/master). Running

datalad update --merge

will pull available updates and integrate them in one go.
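In practice, the two variants look like this:

```shell
# Fetch updates only; they land on remotes/origin/master and your
# local branch is left untouched until you merge.
datalad update

# Fetch and integrate available updates in one go.
datalad update --merge
```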
More information on DataLad and how to use it can be found in the DataLad Handbook at handbook.datalad.org. The chapter "DataLad datasets" can help you to familiarize yourself with the concept of a dataset.