wci pop sci feb 2011
DESCRIPTION
Talk given to the Emory Cancer Control and Population Science Program, 2/17/2011, describing Biomedical Informatics, Integrative Cancer Research, caBIG and CTSA
TRANSCRIPT
Biomedical Informatics and Integrative Cancer Research
Joel Saltz MD, PhD
Director, Center for Comprehensive Informatics
Objectives
• Brain Tumor in Silico Center
• Whole Exome Sequencing and Hypertension in African American Populations
• Biomedical Informatics: caBIG, CTSA Informatics Tools and Infrastructure
Integrative Analysis: Tumor Microenvironment
• Structural and functional differentiation within tumor
• Molecular pathways are time and space dependent
• “Field effects” – gradient of genetic, epigenetic changes
• Radiology, microscopy, high throughput genetic, genomic, epigenetic studies, flow cytometry, microCT, nanotechnologies …
• Create biomarkers to understand disease progression, response to treatment
Tumors are organs consisting of many interdependent cell types
• From John E. Niederhuber, M.D. Director National Cancer Institute, NIH presented at Integrating and Leveraging the Physical Sciences to Open a New Frontier in Oncology, Feb 2008
Informatics Requirements
• Parallel initiatives: Pathology, Radiology, “omics”
• Exploit synergies between all initiatives to improve ability to forecast survival and response.
[Figure labels: Radiology Imaging; Pathologic Features; “Omic” Data; Patient Outcome]
Structural Complexity
Tumor Microenvironment (roughly 25 TB/cm2 tissue)
In Silico Center for Translational NeuroOncology Informatics
Director: Joel Saltz, MD, PhD; PI Dan Brat MD, PhD
AIMS
1. Determine genetic and gene expression correlates of high resolution nuclear morphometry in the diffuse gliomas and their relation to MR features using Rembrandt and TCGA datasets.
2. Determine the influence of tumor micro-environment on gene expression profiling and genetic classification using TCGA data
3. Examine the gene expression profile of low grade gliomas that progress to GBM for predictive clustering, prognostic significance and correlates with pathologic and radiologic features.
4. Identify correlates of MRI enhancement patterns in astrocytic neoplasms with underlying vascular changes and gene expression profiles.
In Silico Program Objectives (from NCI)
• In silico is an expression used to mean "performed on computer or via computer simulation." (Wikipedia)
• In silico science centers: support investigator-initiated, hypothesis-driven research in the etiology, treatment, and prevention of cancer using in silico methods
• Generating and publishing novel cancer research findings leveraging caBIG tools and infrastructure
• Identifying novel bioinformatics processes and tools to exploit existing data resources
• Encouraging the development of additional data resources and caBIG analytic services
• Assessing the capabilities of current caBIG tools
• Emory, Columbia, Georgetown, Fred Hutchinson Cancer, Translational Genomics Research Institute
TCGA: Large Scale Integrative multi-“omic” Cancer Study
TCGA Research Network
Digital Pathology
Neuroimaging
Distinguish (and maybe redefine) astrocytic, oligodendroglial and oligoastrocytic tumors using TCGA and Rembrandt
Important since treatment and outcome differ
• Link nuclear shape, texture to biological and clinical behavior
• How is nuclear shape, texture related to gene expression category defined by clustering analysis of Rembrandt data sets?
• Relate nuclear morphometry and gene expression to neuroimaging features (Vasari feature set)
• Genetic and gene expression correlates of high resolution nuclear morphometry and relation to MR features using Rembrandt and TCGA datasets.
TCGA Brain Pathology Criteria
Attributes that Relate to Entire Specimen
Roughly 200 TCGA specimens; Three Reviewers with Dan Brat adjudicating
Not Present: not detected on any block
Present: detected on any block
Abundant: present in ≥ 50% of 10x fields in ≥ 50% of blocks
• Microvascular hyperplasia elements (1,2)
– Complex/glomeruloid
– Circumferential endothelial hyperplasia
• Necrosis elements (3,4)
– Multiple serpentine pseudopalisading pattern
– Zonal necrosis
• Small cell component
• Gemistocytes
• “Oligodendroglioma-like” component with perinuclear cytoplasmic halos
• Perineuronal and/or perivascular satellitosis
• Multi-nucleated/giant cells
• Epithelial metaplasia
• Mesenchymal metaplasia
• Entrapped gray matter
• Entrapped white matter
• Micro-mineralization
• Inflammation
– Macrophage/histiocytic infiltrates
– Lymphocytic infiltrates
– Polymorphonuclear leukocytic infiltrates
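The three-level scale on this slide (Not Present / Present / Abundant) can be written as a small rule. The sketch below is only one reading of that rule. The input layout (one list of per-field booleans per tissue block) and the name `score_attribute` are assumptions made for illustration and are not part of the TCGA review protocol.

```python
from typing import List


def score_attribute(blocks: List[List[bool]]) -> str:
    """Score one attribute for a specimen.

    `blocks` holds one list per tissue block; each inner list marks, per 10x
    field, whether the attribute was seen (hypothetical input layout).
    """
    # Not Present: the attribute is not detected on any block.
    if not any(any(fields) for fields in blocks):
        return "Not Present"
    # A block counts toward "Abundant" when >= 50% of its 10x fields are positive.
    dense_blocks = sum(
        1 for fields in blocks if fields and sum(fields) / len(fields) >= 0.5
    )
    # Abundant: >= 50% of blocks meet the per-block field threshold.
    if dense_blocks / len(blocks) >= 0.5:
        return "Abundant"
    # Otherwise the attribute was detected somewhere, but not widely.
    return "Present"
```

In practice the three reviewers' scores would still need adjudication, which this sketch does not model.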
Characterization of specific microanatomic structures
Characterization of neoplastic nuclei
• Nuclear size (area and perimeter)
• shape (eccentricity, circularity, major axis, minor axis, Fourier shape descriptor and extent ratio)
• intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis)
Characterization of regions of angiogenesis
• endothelial hypertrophy
• endothelial hyperplasia
• microvascular hyperplasia
• glomeruloid proliferation
• area of angiogenesis region
• shape (how the region departs from a fitted tubular structure)
• normalized color
Feature Extraction
TCGA Whole Slide Images
Jun Kong
[Figure: Oligodendroglioma vs. Astrocytoma; nuclear qualities; class assignment (scale 1 to 10)]
Astrocytoma vs Oligodendroglioma
Overlap in genetics, gene expression, histology
Astrocytoma vs Oligodendroglioma
• Assess nuclear size (area and perimeter), shape (eccentricity, circularity, major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
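A few of the shape features named above can be computed from a binary mask of one segmented nucleus. The following is a minimal NumPy sketch, not the pipeline used in the talk: the function name `nuclear_shape_features` is hypothetical, the axis lengths come from the second moments of the pixel coordinates (a common ellipse-fit convention, assumed here), and perimeter, circularity, Fourier descriptors, intensity and texture are left out.

```python
import numpy as np


def nuclear_shape_features(mask: np.ndarray) -> dict:
    """Area, ellipse axes, eccentricity and extent ratio of one binary nucleus mask."""
    ys, xs = np.nonzero(mask)
    area = int(ys.size)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    # Second central moments of the pixel coordinates (2x2 covariance matrix).
    coords = np.stack([ys, xs], axis=1).astype(float)
    centered = coords - coords.mean(axis=0)
    cov = centered.T @ centered / area

    # Eigenvalues in ascending order: variance along the minor and major axis.
    minor_var, major_var = np.clip(np.linalg.eigvalsh(cov), 0.0, None)

    # Axis lengths of the ellipse with the same second moments as the region.
    major_axis = 4.0 * np.sqrt(major_var)
    minor_axis = 4.0 * np.sqrt(minor_var)

    # Eccentricity: 0 for a circle-like region, approaching 1 for an elongated one.
    if major_var > 0:
        eccentricity = float(np.sqrt(max(0.0, 1.0 - minor_var / major_var)))
    else:
        eccentricity = 0.0

    # Extent ratio: region area divided by the area of its bounding box.
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    extent = area / (height * width)

    return {
        "area": area,
        "major_axis": float(major_axis),
        "minor_axis": float(minor_axis),
        "eccentricity": eccentricity,
        "extent": extent,
    }
```

A round nucleus gives an eccentricity near 0 and an elongated one a value near 1, which is the kind of contrast the oligodendroglioma versus astrocytoma comparison relies on.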
Whole slide scans from 15 TCGA GBMs (69 slides)
7 purely astrocytic in morphology; 7 with 2+ oligo components
399,233 nuclei analyzed for astro/oligo features
Cases were categorized based on ratio of oligo/astro cells
Machine-based Classification of TCGA GBMs (J Kong)
TCGA Gene Expression Query: c-Met overexpression
Separation: p = 1.4 × 10^-22
[Figure: Imaging, Pathology, Molecular; Time 1 – 8 yrs]
Examine gene expression profiles of low grade gliomas that progress to GBM for predictive clustering and correlates with pathologic and radiologic features.
Classical, Proneural, Neural, Mesenchymal
Hierarchical clustering of 176 Rembrandt samples using TCGA classification genes defines four major subtypes.
(Lee Cooper and Carlos Moreno)
75 lower-grade gliomas in REMBRANDT (p < 0.0003).
Lee Cooper, Carlos Moreno
Predicting Recurrence/Survival
43 oligodendrogliomas in REMBRANDT (p < 0.0002).
Neuroimaging Correlates
Define relationship between contrast-enhancement, perfusion and permeability with vascular changes
Correlate MR characteristics defined by the Vasari Feature Set with pathologic grade, vascular morphology and gene expression profiles
Angiogenesis Segmentation
[Figure: segmentation pipeline. Stage labels: H&E Image, Color Deconvolution, Hematoxylin Image, Eosin Image, Spatial Norm., Density Calculation, Density Image, Boundary Smoothing, Object ID, Segmented Vessels]
Eosin intensity image
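The Color Deconvolution stage separates an H&E image into hematoxylin and eosin channels. A minimal NumPy sketch of that one step follows. The stain vectors are the commonly quoted Ruifrok and Johnston values and are an assumption, since the slide does not give the matrix actually used; the function name `color_deconvolve` is also hypothetical.

```python
import numpy as np

# Optical-density vectors per stain (rows: hematoxylin, eosin, third/residual).
# Commonly quoted values, assumed here; not taken from the talk.
STAIN_OD = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])


def color_deconvolve(rgb: np.ndarray) -> np.ndarray:
    """Return per-pixel stain amounts, shape (H, W, 3), for an RGB image in 0..255."""
    # Normalize each stain vector to unit length.
    stains = STAIN_OD / np.linalg.norm(STAIN_OD, axis=1, keepdims=True)
    # Beer-Lambert: optical density is the negative log of transmitted light.
    # The +1 and 256 keep the log finite for pixel value 0.
    od = -np.log((rgb.astype(float) + 1.0) / 256.0)
    # od = amounts @ stains, so amounts = od @ inverse(stains).
    return od @ np.linalg.inv(stains)
```

Channel 1 of the result is the eosin amount per pixel, which is the kind of image the later density and smoothing stages would operate on.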
Angiogenic Segmentation
States of Angiogenesis
Endothelial Hypertrophy
Complex Microvascular Hyperplasia
Endothelial Hyperplasia
Lee Cooper, Sharath Cholleti
Recent Findings from Integrated Analysis of Necrosis, Angiogenesis, Gene Expression in GBM
• Lee A.D. Cooper; Carlos S. Moreno; Candace S. Chisolm; Christina Appin; David A. Gutman; Jun Kong; Tahsin Kurc; Joel H. Saltz; Daniel J. Brat
• Frozen sections from 88 GBM samples were manually marked to identify regions of necrosis and angiogenic vessels exhibiting endothelial hypertrophy, hyperplasia, or complex microvascular proliferation
• Markups were used to calculate extent of both necrosis and angiogenesis as a percentage of total tissue area
• Gene expression from the HT-HGU133A platform analyzed using Significance Analysis of Microarrays (SAM); Cox Proportional Hazards modeling to identify mRNAs significantly associated with extent of necrosis and/or angiogenesis using a false discovery rate cutoff of < 5%
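To illustrate what a false discovery rate cutoff of 5% does to a list of per-gene p-values, here is a short sketch. It uses the Benjamini-Hochberg step-up procedure, which is a swapped-in, simpler technique: SAM estimates its FDR by permutation, and the slide does not say how the cutoff was applied to the Cox models. The function name and any p-values fed to it are illustrative only.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag which hypotheses are kept at the given false discovery rate."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Everything ranked at or below the cutoff is kept (step-up rule).
    keep = [False] * m
    for idx in order[:cutoff_rank]:
        keep[idx] = True
    return keep
```

Because the rule is step-up, a p-value that misses its own threshold is still kept when a larger p-value further down the ranking passes.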
Recent Findings from Integrated Analysis of Necrosis, Angiogenesis, Gene Expression in GBM
• Associated with necrosis were master regulators of the mesenchymal tumor subtype, including C/EBP-B, C/EBP-D, STAT3, FOSL2, and RUNX1
• IPA analysis of genes correlated with necrosis identified significantly enriched canonical pathways including:
– HIF-1α (p = 3.0e-7), NFκB (p = 1.4e-3)
– IL-6 (p = 6.9e-6), FGF (p = 2.7e-5)
– ERK/MAPK (p = 1.2e-4)
– Protein Kinase A signaling (p = 1.9e-4)
– Thrombin signaling (p = 5.2e-3)
– HGF signaling (p = 0.023)
Vasari Imaging Criteria
(Adam Flanders, TJU; Dan Rubin, Stanford; Lori Dodd, NCI)
• Require standardized validated feature sets to describe de novo disease.
• Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology:
– To define a comprehensive set of imaging features of cancer
– For reporting imaging results
– To provide a more quantitative, reproducible basis for assessing baseline disease and treatment response
Classify Imaging Features of Entire Tumor and Resected Specimen
Record features of the entire tumor at baseline.
Distinguish features that comprise tissue in resected specimen.
Imaging Features of Resected Specimen
• Extent of resection of enhancing tumor
• Extent of resection of nCET
• Extent of resection of vasogenic edema
Defining Rich Set of Qualitative and Quantitative Image Biomarkers
• Community-driven ontology development project; collaboration with ASNR
• Imaging features (5 categories)
– Location of lesion
– Morphology of lesion margin (definition, thickness, enhancement, diffusion)
– Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio)
– Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling)
– Resection features (extent of nCE tissue, CE tissue, resected components)
Results: Reader Agreement
• High inter-observer agreement among the three readers
– (kappa = 0.68, p<0.001)
• Percentage agreement was also high for most features individually
– 22 of 30 features (73%) had agreement greater than 50%
– Twelve features (40%) had >80% agreement
– No feature had less than 20% agreement
• Feature agreement rose substantially when used with tolerance (+/- 1).
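For readers unfamiliar with kappa statistics, the sketch below computes Fleiss' kappa, one standard agreement measure for more than two readers. Which kappa variant the study used is not stated on the slide, so treating it as Fleiss' kappa is an assumption, and the input layout is hypothetical.

```python
import numpy as np


def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for a table where counts[i, j] is the number of readers
    who assigned case i to category j. Every row must sum to the same number
    of readers (three in the slide's setting)."""
    counts = np.asarray(counts, dtype=float)
    n_cases = counts.shape[0]
    n_readers = counts[0].sum()

    # Share of all assignments that went to each category.
    p_category = counts.sum(axis=0) / (n_cases * n_readers)

    # Per-case agreement: fraction of reader pairs that agree on that case.
    per_case = (np.sum(counts ** 2, axis=1) - n_readers) / (
        n_readers * (n_readers - 1)
    )

    observed = per_case.mean()           # mean observed agreement
    expected = np.sum(p_category ** 2)   # agreement expected by chance
    return float((observed - expected) / (1.0 - expected))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. The tolerance-based agreement (+/- 1) on the slide is a separate calculation not shown here.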
Preliminary Relationships of Features to Survival
• Cox proportional hazards models were fit to each of the thirty features related to overall survival.
• Features associated with lower survival included (p<.0001):
– Proportion of enhancing tissue at baseline.
– Thick or nodular enhancement characteristics.
– Contralateral hemisphere invasion.
• Proportion of non contrast enhancing tumor (nCET) had positive correlation with survival.
• Tumor size at baseline had no relationship to survival.
Recent Findings Relating Radiology, Pathology, “Omics”
• Linear regression models incorporating multiple imaging features or a single VASARI feature (ependymal extension) and tumor gene expression can be used to predict patient survival.
• Multiple statistically significant associations between imaging and genomic features in glioblastomas. EGFR mutant tumors were significantly larger than TP53 mutant tumors, and were more likely to demonstrate pial involvement. CDKN2A homozygous deletion associated with an ill-defined nonenhancing tumor margin and enhancing pial involvement.
• Significant association between minimal enhancing tumor (≤5% proportion of the overall tumor) and Proneural classification (p=0.0006). Significant association between a >5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008).
Minority Grid
Grady, Kaiser-Atlanta, MSM-East Point, Jackson-Hinds
Morehouse, Emory, Jackson Heart Study, University of Washington, Baylor
• Aim 1: Establish organizational framework as consortium of academic medical centers and minority-serving “safety net” medical care facilities
• Aim 2: Establish an EHR-linked bioinformatics/bio-repository infrastructure that facilitates in-depth genotyping, phenotypic characterization and longitudinal surveillance of minority patients
• Aim 3: Demonstrate utility of MH-GRID with a “use case” project that defines genetic, personal and social-environmental determinants of severe hypertension in African Americans
• This platform could also be leveraged to carry out cancer studies
Overall Goals of Minority Grid
• Breadth and nature of genomic variation associated with clinical phenotypes among patients of various bio-geographical ancestral groups
• Bio-ancestry-specific, low frequency/major effect DNA variants that contribute to racial differences in drug responsiveness, health outcomes and health disparities
• Characterization of admixture
• Long term outcome of patients with at-risk variants revealed by whole exome sequencing
Approach
• Identification of 1200 cases, 1200 controls
– Controls have longitudinal followup with BP consistently below 120/80
• Whole exome sequencing
• Detection of new common variants and rare/low frequency variants
• EHR data, interview data: health literacy, perceived stress, dietary intake, physical activity, neighborhood characteristics (via geocoding)
• Clinical Laboratory analyses: electrolytes, plasma creatinine, lipid profile, glucose, estimated GFR
• Project funded for roughly 2 months and is getting underway
Transcontinental Railway: The Golden Spike - Triumph of Standards
Semantic Interoperability: Same ideas, different words
Challenges
• Unprecedented magnitude of change throughout the system
• Constant flow of information to manage
• Legacy systems
• Cultural barriers
The ca“BIG” Picture
(from Ken Buetow)
The ca“BIG” Picture
The cancer Biomedical Informatics Grid (caBIG):
• Standards-based vocabulary, data elements, data models facilitate information exchange
• Common, widely distributed infrastructure permits cancer community to focus on innovation
• Collection of interoperable component-based applications developed to common standards
• Cancer information is widely available to diverse communities (from Ken Buetow)
Biomedical Informatics and Middleware
[Figure labels: Grid: Brings in Information; Information Integration: Translates and Integrates Information; Grid: Disseminates Information; Natural Language Processing; Ontologies]
caGrid -- “Octopus middleware”
caGrid Components
– Language (metadata, ontologies)
– Grid Service Graphical Development Toolkit (Introduce)
– DICOM compatibility (IVI middleware)
– Security (GAARDS)
– Advertisement and Discovery
– Workflow
Integrated BIP
• Architecture working group to design a common architecture
• Collaborative projects
– Security infrastructure
– Testing framework
– Bioinformatics support
– Registry implementation at Grady for quality improvement and cardiovascular research
– LIMS deployment for biospecimen management
– i2b2 deployment for clinical data
• Leverage institutional strengths for education and training
• Leverage over $3.8M in grant and internal funding this year
ACTSI-wide Federated Data Warehouse System
• Develop integrative, federated ACTSI information warehouse
• Integrated clinical/imaging/“omic”/biomarker/tissue information should always be available
• A virtually centralized, big Atlanta-wide information warehouse that has all relevant data
• Patients seen and information gathered at any ACTSI site, specimens sent to any affiliated core, imaging carried out at any affiliated site
• “Give me all gene expression, SNP, virtual slide images, hematology studies and CMV serologies for kidney transplant candidates accrued into Study X or Study Y between Feb 2011 and Jan 2012 who were on the kidney transplant waiting list as of November 1, 2010.”
• Development efforts: Security, Web Portal, Common Data Elements & Vocabularies, Identifiers, High-performance Computing middleware, Testing framework.
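The kidney transplant request on this slide amounts to a filter over candidates followed by a projection onto five data types. The sketch below shows that logic over in-memory records. It is not the ACTSI warehouse design: the `Candidate` class, its field names, the data-type keys and the function names are all hypothetical, and a real federated system would run the filter at each site.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# Data types named in the example request (keys are hypothetical labels).
WANTED = ("gene_expression", "snp", "virtual_slide", "hematology", "cmv_serology")


@dataclass
class Candidate:
    patient_id: str
    study: str
    accrued: date                      # date accrued into the study
    waitlisted: Optional[date] = None  # date placed on the waiting list
    delisted: Optional[date] = None    # date removed from the list, if any
    records: Dict[str, List[str]] = field(default_factory=dict)  # type -> record IDs


def on_waitlist(c: Candidate, as_of: date) -> bool:
    """True if the candidate was on the waiting list on the given date."""
    if c.waitlisted is None or c.waitlisted > as_of:
        return False
    return c.delisted is None or c.delisted > as_of


def example_query(candidates: List[Candidate]) -> Dict[str, Dict[str, List[str]]]:
    """Return the wanted record IDs for every candidate matching the request."""
    start, end = date(2011, 2, 1), date(2012, 1, 31)  # Feb 2011 through Jan 2012
    as_of = date(2010, 11, 1)                         # waiting-list reference date
    result: Dict[str, Dict[str, List[str]]] = {}
    for c in candidates:
        in_study = c.study in ("Study X", "Study Y")
        in_window = start <= c.accrued <= end
        if in_study and in_window and on_waitlist(c, as_of):
            # Missing data types come back as empty lists rather than errors.
            result[c.patient_id] = {k: c.records.get(k, []) for k in WANTED}
    return result
```

The point of the slide is that the requester should not need to know which site, core or imaging facility holds each record type; the sketch hides that behind the `records` dictionary.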
[Figure: Information Warehouse overview (Data Integration: Acquisition, Transfer, Information Warehouse, User Access)]
[Source systems: CPOE, OR system, Patient Mgmt, Dictated reports, Pathology reports, ADT, Lab, Respiratory, Blood, Endoscopy, Cardiology, Siemens Img, Patient Billing, Practice Plans, Pt Satisfaction, Cancer Genetics, Wound Images, Tissue, Pulmonary, Genomic Data]
[Load schedules: Real time, Daily, Weekly, Monthly]
[User access: Web, Multi-Dimensional Analysis & Data Mining, Ad-hoc Query, Error Report, Benchmarking, De-Identification, Honest Broker, Wound Center, Research, Web Scorecards & Dashboards, Image Analysis, Text Mining, NLP]
[Domains: Business, Clinical, Research, External; Meta Data]
Crucial to Leverage Institutional Data
Ohio State Information Warehouse Infrastructure
ACTSI-wide Federated Data Warehouse
Enhanced Registries
• Linked Databases for Research
• Leverages common data elements and models and existing standards. Initially for cardiovascular disease, diabetes and co-morbidities.
• Derived data elements represent categories of data and temporal patterns of interest.
• Linked to source data – initially, the Emory Healthcare Clinical Data Warehouse and the Grady Health System Diabetes Patient Tracking System.
• Supports end-user researcher query and analysis.
• Research PACS
• Federated support for management of image data.
• DICOM standards and Grid services for federated access.
• Management of image analysis results.
Registry Project Status
• Co-morbidity registry prototype completed that exports demographics, encounters, readmissions, discharge diagnoses and diagnosis categories, and medication categories to Excel pivot tables
• Has been used by Emory Healthcare to identify co-morbidities associated with readmissions for patient populations at high risk
• System development is ongoing
1997: Virtual Microscope at Hopkins/Maryland
Distinguishing Characteristic in Gliomas
Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images
Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities
Nuclear Qualities
Oligodendroglioma: round shaped with smooth, regular texture
Astrocytoma: elongated with rough, irregular texture
PAIS Database
Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide)
Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
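One kind of spatial query this implies is "which nucleus markups fall inside a human-annotated region". The sketch below answers that with a plain ray-casting point-in-polygon test in Python. It stands in for, and does not reproduce, whatever spatial support the DB2 implementation provides; the function names, the dictionary layout and the use of a vertex-average centroid are all assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(pt: Point, polygon: Polygon) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(polygon: Polygon) -> Point:
    """Average of the vertices; a rough stand-in for the true area centroid."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def markups_in_region(markups: Dict[str, Polygon], region: Polygon) -> List[str]:
    """IDs of markups whose centroid lies inside the annotated region."""
    return [mid for mid, poly in markups.items()
            if point_in_polygon(centroid(poly), region)]
```

With around a million markups per slide, a linear scan like this would be too slow; a spatial index in the database is what makes such queries practical.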
PAIS Database and Analysis Pipeline
Suite of analysis algorithms and pipelines that carry out the following tasks:
1. segmentation of cells and nuclei;
2. characterization of shape and texture features of segmented nuclei;
3. storage of nuclei meta-data in relational database;
4. mechanism supporting spatial queries for human-annotated nuclei;
5. machine learning methods that integrate information from features to accomplish classification tasks.
Image Mining for Comparative Analysis of Expression Patterns in Tissue Microarray
(PI’s: Foran and Saltz)
Build reference library of expression signatures, integrate state-of-the-art multi-spectral imaging capability and build a deployable clinical decision support system for analyzing imaged specimens.
Technologies and computational tools developed during the course of the project to be tested on a Grid-enabled, virtual laboratory established among strategic sites located at CINJ, Emory, RU, UPenn, OSU, and ASU.
Funded by NIH through grant #5R01LM009239-02, David J. Foran, Ph.D.
ACTSI: Example Active Biomedical Informatics Projects
• In Silico Study of Brain Tumors
• Minority Health Genomics and Translational Research Bio-Repository Database (MH-GRID)
• ACTSI Cardiovascular, Diabetes, Brain Tumor Registry
• Early Hospital Readmission
• CFAR (Center for AIDS Research) HIV/Cancer Project
• Radiation Therapy and Quantitative Imaging
• Integrative Analysis of Text and Discrete Data Related to Smoking Cessation and Asthma
• Metadata Analysis of Glycan Structures
• Semantic Query and Analysis of Integrative Datasets in Renal Transplant Clinical Studies (CTOT-C)
Thanks to:
• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)
• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads
• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others
• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz
• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone
• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)
Thanks!