Brain Imaging Data Structure and Center for Reproducible Neuroscience

Brain Imaging Data Structure CHRIS GORGOLEWSKI STANFORD UNIVERSITY

Upload: krzysztof-gorgolewski

Post on 21-Jan-2017


Page 1: Brain Imaging Data Structure and Center for Reproducible Neuroscience

Brain Imaging Data Structure

CHRIS GORGOLEWSKI, STANFORD UNIVERSITY

Page 2

Getting lost in your data

Page 3

Getting lost in your data

• MRI has been used to study the human brain for over 20 years.

• Despite similarities in experimental designs and data types, each researcher tends to organize and describe their data in their own way.

http://www.nature.com/news/brain-imaging-fmri-2-0-1.10365

Page 4

Getting lost in your data

Heterogeneity in data description practices causes:

• problems in sharing data (even within the same lab),

• unnecessary manual metadata input when running processing pipelines,

• no way to automatically validate completeness of a given dataset,

• difficulties in combining data from multi-center studies.

Page 5

Brain Imaging Data Structure

Brain Imaging Data Structure (BIDS) is a new way of standardizing, describing, and organizing the results of a human neuroimaging experiment.

Page 6

Who is it for?

1. Lab PIs. It will make handing over a dataset from one student/postdoc to another easy.

2. Workflow developers. It’s easier to write pipelines expecting a particular file organization.

3. Database curators. Accepting one dataset format will make curation easier.

Page 7

Principles behind BIDS

1. Adoption is crucial.

2. Don’t reinvent the wheel.

3. Some metadata is better than no metadata.

4. Don’t rely on external software (databases) or complicated file formats (RDF).

5. Aim to capture 80% of experiments but give the remaining 20% space to extend the standard.

Page 8

Implementation

1. Some metadata is encoded in the folder structure.

2. Some metadata is replicated in the file name for simplicity.

3. Use of tab separated files for tabular data.

4. Use of NIfTI files for imaging data.

5. Use of JSON files for dictionary type metadata.

6. Use of legacy text file formats for b vectors/values and physiological data.

7. Make certain folder hierarchy levels optional for simplicity.

8. Allow arbitrary files not covered by the spec to be included in any way the researchers deem appropriate.

Page 9

Why TSV?

1. Simple text format with wide software support.

2. Strings with commas do not need to be escaped by quotation marks.
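
The second point can be illustrated with a minimal sketch that uses Python's standard csv module. The column names and the sample row are made up for illustration; `write_table` is a hypothetical helper, not part of any BIDS tool.

```python
import csv
import io

# Hypothetical row: the free-text field contains commas.
rows = [
    {"participant_id": "sub-001", "notes": "left-handed, wears glasses, no caffeine"},
]


def write_table(rows, delimiter):
    """Serialize rows with the given delimiter and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["participant_id", "notes"],
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = write_table(rows, "\t")
csv_text = write_table(rows, ",")

# In the TSV output the commas are ordinary characters, so nothing is quoted.
# The CSV writer has to wrap the same field in quotation marks.
print(tsv_text)
print(csv_text)
```

With a tab delimiter the notes field is written verbatim, whereas the comma-delimited version needs quoting that every downstream reader then has to undo.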

Page 10

Why NIfTI?

Pros:

1. Widest support from software packages.

2. Designed for neuroimaging.

Cons:

1. Poor metadata support.

2. Memory-mapped random access to compressed NIfTI is hard to implement.

Page 11

Why JSON?

1. Simple text (you can use notepad to edit).

2. Wide support from different programming languages.

3. Simpler than XML, but almost as powerful.

4. Extensible with linked data.

Page 12

BIDS features

1. Handles multiple sessions and runs

2. Supports sparse acquisition (via slice timing)

3. Supports continuous acquisition covariates (breathing, cardiac, etc.)

4. Supports multiple field map formats

5. Supports multiple types of anatomical scans

6. Supports functional MRI: both task based and resting state.

7. Supports diffusion data (together with the corresponding bvec and bval files)

8. Supports behavioral variables on the level of subjects (demographics), sessions, and runs.

Page 13

Folder organization (simplified)

sub-control01/
    anat/
        sub-control01_T1w.nii.gz
        sub-control01_T1w.json
        sub-control01_T2w.nii.gz
        sub-control01_T2w.json
    func/
        sub-control01_task-nback_bold.nii.gz
        sub-control01_task-nback_bold.json
        sub-control01_task-nback_events.tsv
        sub-control01_task-nback_cont-physio.tsv
        sub-control01_task-nback_cont-physio.json
        sub-control01_task-nback_sbref.nii.gz
    dwi/
        sub-control01_dwi.nii.gz
        sub-control01_dwi.bval
        sub-control01_dwi.bvec
    fmap/
        sub-control01_phasediff.nii.gz
        sub-control01_phasediff.json
        sub-control01_magnitude1.nii.gz
    sub-control01_scans.tsv
participants.tsv
dataset_description.json
README
CHANGES
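
Because the file names repeat metadata as key-value pairs, a pipeline can recover it without opening the files. The following is a minimal sketch of that idea, assuming names follow the pattern shown on this slide (key-value pairs joined by underscores, then a suffix and an extension). `parse_bids_name` is a hypothetical helper, not a function from any BIDS tool.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into key-value pairs, suffix and extension."""
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition(".")
    # The last underscore-separated chunk is the suffix (e.g. "bold", "T1w").
    *pairs, suffix = stem.split("_")
    # Each remaining chunk is a "key-value" pair such as "sub-control01".
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, extension


entities, suffix, extension = parse_bids_name("sub-control01_task-nback_bold.nii.gz")
print(entities, suffix, extension)
```

The sketch does no validation; checking a dataset against the specification is the job of a dedicated tool such as bids-validator.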

Page 14

Example events file

onset   duration   trial_type   ResponseTime
1.2     0.6        go           1.435
5.6     0.6        stop         1.739
…
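
An events file like this can be read with nothing beyond the Python standard library. The sketch below embeds the two rows from the slide as tab-separated text; `read_events` is a hypothetical helper, and treating every column except trial_type as numeric is an assumption made for this example.

```python
import csv
import io

# The two rows shown on the slide, written as tab-separated text.
events_tsv = (
    "onset\tduration\ttrial_type\tResponseTime\n"
    "1.2\t0.6\tgo\t1.435\n"
    "5.6\t0.6\tstop\t1.739\n"
)


def read_events(handle):
    """Return a list of events with the numeric columns converted to float."""
    events = []
    for row in csv.DictReader(handle, delimiter="\t"):
        events.append({
            "onset": float(row["onset"]),
            "duration": float(row["duration"]),
            "trial_type": row["trial_type"],
            "ResponseTime": float(row["ResponseTime"]),
        })
    return events


events = read_events(io.StringIO(events_tsv))

# Collect the onsets of one condition, e.g. to build a model regressor later.
go_onsets = [event["onset"] for event in events if event["trial_type"] == "go"]
print(go_onsets)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory buffer.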

Page 15

Example metadata file

{
  "RepetitionTime": 3.0,
  "EchoTime": 0.0003,
  "FlipAngle": 78,
  "SliceTiming": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8],
  "MultibandAccelerationFactor": 4,
  "ParallelReductionFactorInPlane": 2
}
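
Keeping metadata in JSON makes simple automated checks possible. The sketch below loads two of the fields from the example and verifies that every slice onset falls within one repetition. That rule is a plausible sanity check assumed for this illustration, not a quotation from the specification, and `check_slice_timing` is a hypothetical helper.

```python
import json

# Two fields copied from the example metadata file on this slide.
sidecar_text = """
{
  "RepetitionTime": 3.0,
  "SliceTiming": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8]
}
"""


def check_slice_timing(metadata):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    tr = metadata["RepetitionTime"]
    for index, onset in enumerate(metadata["SliceTiming"]):
        # Assumed rule: each slice must be acquired within one repetition.
        if not 0 <= onset < tr:
            problems.append(f"slice {index}: onset {onset} s is outside [0, {tr}) s")
    return problems


metadata = json.loads(sidecar_text)
problems = check_slice_timing(metadata)
print(len(metadata["SliceTiming"]), "slices,", len(problems), "problems")
```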

Page 16

Example demographics file

participant_id   age   sex
sub-001          34    M
sub-002          12    F
sub-003          33    F
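
Since participant IDs have to line up with the subject folder names, a table like this is easy to check automatically. The following sketch flags IDs that do not start with a lowercase "sub-" label. The sample table, the exact pattern, and the helper `find_bad_ids` are assumptions made for this illustration.

```python
import csv
import io
import re

# Hypothetical table; the second ID is deliberately miscapitalized.
participants_tsv = (
    "participant_id\tage\tsex\n"
    "sub-001\t34\tM\n"
    "Sub-002\t12\tF\n"
    "sub-003\t33\tF\n"
)

# Assumed convention: "sub-" followed by letters and digits, case-sensitive.
ID_PATTERN = re.compile(r"sub-[A-Za-z0-9]+")


def find_bad_ids(handle):
    """Return the participant IDs that do not match the assumed pattern."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["participant_id"]
        for row in reader
        if not ID_PATTERN.fullmatch(row["participant_id"])
    ]


bad_ids = find_bad_ids(io.StringIO(participants_tsv))
print(bad_ids)
```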

Page 17

Keys to success

1. Involve the community in the design process.

2. Provide a good validation tool (browser based!).

3. Build tools/workflows/pipelines that make adopting BIDS worthwhile (AA, Nipype, C-PAC, etc.)

4. Get support from databases (LORIS, COINS, SciTran, OpenfMRI, XNAT, etc.)

Page 18

Existing tools

1. bids-validator: https://github.com/INCF/bids-validator (demo)

2. openfmri2bids: https://github.com/INCF/openfmri2bids

3. bidsutils: https://github.com/INCF/bidsutils

4. dcm2niix: https://github.com/neurolabusc/dcm2niix

5. dicm2nii: http://www.mathworks.com/matlabcentral/fileexchange/42997-dicom-to-nifti-converter--nifti-tool-and-viewer

6. Quality Assessment Protocol: http://preprocessed-connectomes-project.github.io/quality-assessment-protocol

7. SciTran: https://scitran.github.io

Page 19

Upcoming tools

1. OpenfMRI (internal format)

2. XNAT (import)

3. COINS (export)

4. heudiconv (conversion)

5. LORIS (import)

6. C-PAC (import)

7. NIAK (import)

8. Nipype (import)

Page 20

Why do I care?

Page 21

Data sharing drives progress

Page 22

Data sharing drives progress

$878,400: how much it would cost to perform studies using OpenfMRI data if it did not exist

Page 23

Convincing people to share data is hard

1. Publication as an incentive (data papers – Gorgolewski et al. 2013)

2. Sharing only statistical derivatives (NeuroVault – Gorgolewski et al. 2014)

Page 24

Poldrack and Gorgolewski, 2014

Page 25

Convincing people to share data is hard

1. Publication as an incentive (data papers – Gorgolewski et al. 2013)

2. Sharing only statistical derivatives (NeuroVault – Gorgolewski et al. 2014)

3. Journal policies (see PLoS One, F1000Research, Scientific Data)

Page 26

Data sharing fears

1. Fear of being scooped

2. Fear of someone finding a mistake

3. Misconceptions about the ownership of the data

Page 27

Page 28

Stanford | Center for Reproducible Neuroscience

Analyzing for reproducibility

reproducibility.stanford.edu

• Automated quality control reporting

• Data analysis service

• Using cutting-edge, robust and well-tested methods

• Leveraging supercomputer power not accessible to most labs

• Quantify reproducibility by out-of-sample prediction estimates

• “Glass box”: in-depth documentation describing all data analysis steps

Page 29

Stanford | Center for Reproducible Neuroscience

Analyzing for reproducibility

reproducibility.stanford.edu

• The service is completely free of charge

• Under one condition: the data will be publicly available after a grace period

Page 30

Stanford | Center for Reproducible Neuroscience

Analyzing for reproducibility

reproducibility.stanford.edu

• CRN will:

  • Make more data publicly available

  • Improve access to best methods and algorithms (including yours!)

  • Enable automatic data exploration and hypothesis generation

  • Foster the culture of looking at out-of-sample predictions and effect sizes

Page 31

Acknowledgments

The Poldrack Lab @ Stanford

Data Sharing Task Force

Page 32

bids.neuroimaging.io