Building Open Science Communities For Imaging Research
Open Source EHR Summit & Workshop
October 17-18, 2012
National Harbor, MD
Fred Prior [1], J. Kirby [2], J. Freymann [2], C. Jaffe [3], K. Smith [1], B. Vendt [1], K. Clark [1], L. Tarbox [1]
[1] Washington University School of Medicine; [2] SAIC-Frederick; [3] Boston University
Open Science
A somewhat nebulous concept in the scientific
community which has been construed to mean...
Using Open Source software in scientific research;
Making data and tools available to the public to enhance basic
science education;
Freely sharing tools, data and results among scientists;
Making scientific results available in Open Access journals;
Finding innovative solutions to scientific problems via crowd
sourcing.
Using Open Source software to capture and manage
Open Data to encourage and support research and
education.
Creating Research Communities around an Open Data resource.
Open Data
Really open access to data that was collected for one
purpose but is being reused for new purposes.
The NCI Cancer Imaging Program (CIP) has funded The
Cancer Imaging Archive (TCIA), an open image archive
service to support cancer research.
TCIA collects, de-identifies, curates and manages rich
collections of oncology images and related clinical and trial
information.
Since June of 2011, over 63 institutions have submitted
more than 19.5 million images.
Over 1500 registered users from 93 countries have
downloaded more than 13.5 TB of information.
TCIA encourages and supports cancer-related open science communities by hosting and
managing the image archive and providing project wiki space. It is also in the process of adding
searchable clinical trial data repositories to facilitate collaborative research.
http://www.cancerimagingarchive.net/
Open Source Software supporting Open Science
Private Cloud computing environment based on XCP and Xen virtualization and
LAMP Stack (CentOS/Apache/MySQL/PHP)
National Biomedical Imaging Archive (NBIA) software provides image
management https://wiki.nci.nih.gov/display/NBIA/National+Biomedical+Imaging+Archive+-+NBIA
Clinical Trial Processor (CTP) software supports de-identification and secure
transport http://mircwiki.rsna.org/index.php?title=CTP-The_RSNA_Clinical_Trial_Processor
Confluence Wiki supports documentation and community outreach
Request Tracker (RT) Help Ticket tracking system manages help desk functions
http://www.bestpractical.com/rt/
Extensible Neuroimaging Archive Toolkit (XNAT) supports project focused view
of multiple image collections and research workgroups http://xnat.org/about/xnat-publications.html
AIME image annotation and markup management system allows users to
add content to the collections http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3015142/
Managing an Open Access Information Resource
A CTP pipeline de-identifies images in
compliance with DICOM PS 3.15, Appendix E,
and queues them for transmission to the intake
server.
The intake server accepts multiple concurrent
submissions. These are curated (quality control
review) before being moved to the public server
and made open access.
The TCIA wiki supports research groups by
providing a platform to describe the scope and
intent of collections and to provide metadata,
including the posting of conference abstracts and
publications.
Privileged image consumers have been granted
special permissions to query/download
collection-specific limited-access images, while
“general” consumers cannot.
The public server hosts curated images available
for download and the open source NBIA application for
doing so. Downloads are recorded in NBIA’s
MySQL database.
The operations management Support Center’s
trouble-ticket system tracks image-consumers’
issues and resolutions. Consumers reach the
support center via email or telephone.
Image consumers view and download images
from TCIA and can save lists of images that
may be shared with colleagues. TCIA supports
searches by collection, anatomy, modality,
scanner manufacturer, dates, and many
modality-event specific criteria.
A dashboard reporting system provides TCIA
management with current and historical
consumer counts, collection image counts, and
image downloads.
Clark, et al., SIIM 2012, http://erl.wustl.edu/publications/posters/2012posters.html
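The workflow described above (de-identify, send to intake, curate, release, then serve open or limited-access downloads) can be sketched as a small state model. This is only an illustration under stated assumptions: the class names, stage names, collection names, and functions below are hypothetical and are not taken from NBIA or CTP.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical stages mirroring the workflow described in the text."""
    DEIDENTIFIED = "de-identified by the submitting site's CTP pipeline"
    INTAKE = "received by the intake server, awaiting curation"
    PUBLIC = "released on the public server"


@dataclass
class Submission:
    collection: str
    stage: Stage = Stage.DEIDENTIFIED
    passed_curation: bool = False  # set after the quality control review


def send_to_intake(sub: Submission) -> None:
    """Move a de-identified submission onto the intake server."""
    if sub.stage is not Stage.DEIDENTIFIED:
        raise ValueError("only de-identified submissions can be sent to intake")
    sub.stage = Stage.INTAKE


def release(sub: Submission) -> None:
    """Publish a submission, but only after it has passed curation."""
    if sub.stage is not Stage.INTAKE or not sub.passed_curation:
        raise ValueError("submission must be on intake and curated before release")
    sub.stage = Stage.PUBLIC


@dataclass
class Consumer:
    name: str
    # Limited-access collections this consumer has been granted (assumed model).
    granted: set = field(default_factory=set)


def can_download(consumer: Consumer, collection: str, limited_access: set) -> bool:
    """Open collections are available to everyone; limited ones need a grant."""
    return collection not in limited_access or collection in consumer.granted


if __name__ == "__main__":
    sub = Submission("Example-Collection")
    send_to_intake(sub)
    sub.passed_curation = True
    release(sub)
    print(sub.stage.name)
```

Keeping curation as an explicit precondition of `release` reflects the point that images become open access only after quality control review.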
TCIA DICOM Knowledge Base
Open Data derived from clinical practice or human subjects research
must be obtained with proper consent and fully de-identified before
being made publicly available, to protect patient privacy and comply
with applicable law.
The Cancer Imaging Archive (TCIA) staff has accumulated a wealth of
knowledge on best practices and procedures for DICOM image
de-identification.
In order to share this information with the wider research community, a
wiki-based knowledge base was created:
https://wiki.cancerimagingarchive.net/display/Public/De-identification+Knowledge+Base
DICOM permits private extensions to the standard. These “private
tag” attributes may contain scientifically valuable information that must
be identified and retained.
De-identification scripts are constructed for each combination of
modality, vendor, and software version, and are made available.
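As a rough illustration of the private-tag handling described above, the sketch below relies on the DICOM convention that private data elements use odd group numbers, and keeps only those private tags listed for a given (modality, vendor, software version) combination. The rule table, vendor name, version string, and tag choices are hypothetical placeholders, not TCIA's actual scripts. Attributes are modeled as a plain dictionary rather than through a DICOM library, and removal of identifying standard attributes is assumed to happen in a separate step.

```python
from typing import Any, Dict, FrozenSet, Tuple

Tag = Tuple[int, int]  # (group, element)

# Hypothetical retention rules keyed by (modality, vendor, software version).
# Each entry lists private tags judged scientifically valuable for that combination.
RETAIN_RULES: Dict[Tuple[str, str, str], FrozenSet[Tag]] = {
    ("MR", "VendorA", "1.0"): frozenset({(0x0043, 0x1039)}),
}


def is_private(tag: Tag) -> bool:
    """DICOM private data elements use odd group numbers."""
    group, _element = tag
    return group % 2 == 1


def filter_private_tags(
    attributes: Dict[Tag, Any], modality: str, vendor: str, version: str
) -> Dict[Tag, Any]:
    """Drop private tags unless the matching rule says to retain them.

    Standard (even-group) attributes pass through untouched here.
    """
    keep = RETAIN_RULES.get((modality, vendor, version), frozenset())
    return {
        tag: value
        for tag, value in attributes.items()
        if not is_private(tag) or tag in keep
    }
```

With no matching rule, the function falls back to removing every private tag, which is the conservative default for an unknown scanner configuration.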
NCI’s Collection Acceptance Criteria
CIP Advisory Group prioritizes based on:
NCI grant / contract award data sharing requirements
Analysis of imaging features to be used as biomarkers
Creation of correlative signatures for multi-platform biomarkers
Creation of algorithms for detection of cancer
Testing and validating quantitative analysis techniques
Unique characteristics for clinical training.
Adding new collections is a continuous process.
Current TCIA Collections
Collection                       # Subjects  # Images    Access
TCGA-BRCA                        61          60,633      Public
TCGA-GBM                         294         636,029     Public
TCGA-LGG                         26          15,411      Public
TCGA-KIRC                        102         48,781      Public
REMBRANDT                        130         110,020     Public
RIDER Lung PET CT                244         269,522     Public
RIDER Breast MRI                 10          4,800       Public
RIDER Neuro MRI                  19          70,220      Public
RIDER Lung CT                    32          15,716      Public
RIDER Phantom PET-CT             20          2,231       Public
RIDER Phantom MRI                10          7,061       Public
LIDC-IDRI                        1,010       244,527     Public
FDA Lung Phantom                 3           634,256     Public
Breast-Diagnosis                 88          105,050     Public
QIN Breast                       22          26,409      Public
QIBA CT-1C                       1           69,258      Public
CT Colonography                  825         941,774     Public
Prostate-Diagnosis               53          18,584      Public
QIN Lung                         15          1,168       Restricted
QIN Phantom                      3           475         Restricted
QIN Lung Segmentation Challenge  100         7,975       Restricted
QIN Prostate                     22          25,981      Restricted
NaF Prostate                     5           41,404      Restricted
Prostate-MRI                     26          22,036      Restricted
Renal Training                   8           7,576       Restricted
Head-Neck Cetuximab              96          15,199      Restricted
NLST                             26,722      15,364,335  Restricted
TCIA Supported Research
An international research community has free access to information.
Cancer researchers can use this data to test new hypotheses
and develop new analysis techniques to advance our scientific
understanding of cancer.
Engineers and developers can build new analysis tools and techniques using this data as test material for developing and validating algorithms.
Educators can use it as a teaching tool for introducing students to medical imaging technology and cancer phenotypes.
NCI supported, TCIA centric research communities
CIP TCGA Radiology Initiative
Lung Image Database Consortium
Quantitative Imaging Network
National Lung Screening Trial
CIP TCGA Radiology Initiative
Driven by input from its scientific community,
the Cancer Imaging Program (CIP) finds itself
at the junction of two powerful scientific
requisites:
the need for cross-disciplinary research and inter-institutional
data-sharing to speed scientific discovery and reduce redundancy;
the need to provide imaging phenotype data to augment
large scale genomic analysis.
Generate uniform data set from multiple sites
Access Training Data Sets
Federated annotation tools pull data from a central location
Publications can point to specific Collections and Shared Lists
TCIA Use Case: How TCIA Enables the TCGA Phenotype Research Groups
Visualization and Annotation Tools Use TCIA Collections and Annotation Repository
Radiologists analyze TCIA collections using ClearCanvas
visualization and AIM annotations.
Annotations and markup are stored back into TCIA.
Lung Image Database Consortium (LIDC)
The Lung Image Database Consortium research project
(LIDC-IDRI) involves the generation of marked-up
annotated lesions on the diagnostic and lung cancer
screening thoracic CT scans found in the LIDC-IDRI
image collection.
The project has created a web-accessible international
resource for development, training, and evaluation of
computer-assisted diagnostic (CAD) methods for lung
cancer detection and diagnosis.
https://wiki.cancerimagingarchive.net/display/Public/Lung+Image+Database+Consortium
Quantitative Imaging Network (QIN)
QIN’s mission is to improve the role of quantitative
imaging for clinical decision making in oncology by the
development and validation of data acquisition, analysis
methods, and tools to tailor treatment to individual
patients and to predict or monitor the response to drug
or radiation therapy.
To date, fifteen centers of imaging excellence have
been selected through the NIH peer review process.
TCIA actively supports five cross-network research
initiatives with both public and private data collections.
https://wiki.cancerimagingarchive.net/display/Public/Quantitative+Imaging+Network+Collections
Scientific Output from TCIA Supported Research Initiatives
20 peer reviewed publications
32 scientific presentations
A partial list of publications may be found
here:
http://cancerimagingarchive.net/publications.html
More are in preparation as the work is
ongoing and new projects and collections
are continually being added.
Conclusions
The Cancer Imaging Archive is an investment in Open
Science by the National Cancer Institute and provides Open
Access to cancer images, trial data, and mechanisms for
collaborative research.
Medical imaging research has been slow to adopt Open
Science, but initiatives such as TCIA are producing
substantial progress in this new direction.
Open science communities have formed around TCIA data
collections and are gaining traction as evidenced by a
steadily increasing output of abstracts and publications.