TRANSCRIPT

  • Slide 1
  • Monthly Program Update February 9, 2012 Andrew J. Buckler, MS Principal Investigator WITH FUNDING SUPPORT PROVIDED BY NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
  • Slide 2
  • Agenda 10,000 foot view of where weve been and where were going Implementation-independent computational model for quantitative imaging development For each QI-Bench app, a demonstration of what exists and a gap analysis relative to model (including first demo of Formulate, and review of major Execute update) 22
  • Slide 3
  • 10,000 Foot View

    | Period | Activity | Test Bed | Developers, Users |
    | --- | --- | --- | --- |
    | 2009 -> Winter 2011 | User needs and requirements analysis | | 1, 2 |
    | Spring/Summer 2011 | Initial Execute (including RDSM and BAS) and desktop Analyze (initial prototyping) | | 3, 4 |
    | Autumn 2011 | Initial Specify (including QIBO and BiomarkerDB) | Individual time point demonstrator | 5, 10 |
    | Winter 2012 | Initial Formulate, and major update to Execute | 3A Pilot | 5, 30 |
    | Spring/Summer 2012 | Computational model to drive architecture, and support for semi-automated workflows | 3A Pivotal and longitudinal change demonstrator | |
    | Autumn 2012 -> 2013 | Exercise end-to-end chain with reproducible workflows | (various) | |
  • Slide 4
  • Where we left off last month: overview diagram of the QI-Bench chain, with Specify (QIBO, RDF triple store) feeding Formulate (reference data sets), Execute, and Analyze, with feedback. Example relations link CT Volumetry, CT (obtained_by), Tumor growth (measure_of), and Therapeutic Efficacy (used_for); Analyze applies a linear model of the form Y = β0 + β1(QIB) + β2·T + e_ij.
  • Slide 5
  • DNF Model: Data

    // Data resources:
    RawDataType      = ImagingDataType | NonImagingDataType | ClinicalVariableType
    CollectedValue   = Value + Uncertainty
    DataService      = { RawData | CollectedValue }   // implication being that contents may change over time
    ReferenceDataSet = { RawData | CollectedValue }   // with fixed refresh policy and documented (controlled) provenance

    // Derived from analysis of one or more ReferenceDataSets:
    TechnicalPerformance = Uncertainty | CoefficientOfVariation | CoefficientOfReliability | ...
    ClinicalPerformance  = ReceiverOperatingCharacteristic | PPV/NPV | RegressionCoefficient | ...
    SummaryStatistic     = TechnicalPerformance | ClinicalPerformance
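    To make the type algebra above concrete, here is a minimal sketch in Python of how these data resources might be represented; the class and field names are illustrative assumptions, not QI-Bench code.

```python
# Minimal, illustrative sketch of the DNF data model (names are assumptions,
# not QI-Bench code). A CollectedValue pairs a value with its uncertainty;
# a ReferenceDataSet fixes its refresh policy and records provenance.
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class RawData:
    kind: str          # "imaging" | "non-imaging" | "clinical-variable"
    uri: str           # where the bit stream lives

@dataclass
class CollectedValue:
    value: float
    uncertainty: float

@dataclass
class DataService:
    # Contents may change over time (no fixed refresh policy).
    items: List[Union[RawData, CollectedValue]] = field(default_factory=list)

@dataclass
class ReferenceDataSet:
    # Fixed refresh policy and documented (controlled) provenance.
    items: List[Union[RawData, CollectedValue]] = field(default_factory=list)
    refresh_policy: str = "frozen"
    provenance: List[str] = field(default_factory=list)
```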
  • Slide 6
  • DNF Model: Knowledge

    // Managed as Knowledge store:
    Relation    = subject property object (property object)
    BiomarkerDB = { Relation }

    // Examples:
      OntologyConcept has Instance
    | Biomarker isUsedFor BiologicalUse                 // use
    | Biomarker isMeasuredBy AssayMethod                // method
    | AssayMethod usesTemplate AimTemplate              // template
    | AimTemplate includes CollectedValuePrompt         // prompt
    | ClinicalContext appliesTo IndicatedBiology        // biology
    | (AssayMethod targets BiologicalTarget) withStrength TechnicalPerformance
    | (Biomarker pertainsTo ClinicalContext) withStrength ClinicalPerformance
    | generalizations beyond this
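    As an illustration of the knowledge store, the sketch below expresses a few of the example relations as RDF triples, using rdflib as a stand-in for the project's triple store; the qibo: namespace and term spellings are assumptions for the sketch, not the actual QIBO ontology.

```python
# Illustrative only: expressing BiomarkerDB relations as RDF triples with
# rdflib. The qibo: URIs and term names are placeholders for this sketch.
from rdflib import Graph, Namespace

QIBO = Namespace("http://example.org/qibo#")   # hypothetical namespace
g = Graph()
g.bind("qibo", QIBO)

# Biomarker isMeasuredBy AssayMethod / isUsedFor BiologicalUse
g.add((QIBO.TumorVolumeChange, QIBO.isMeasuredBy, QIBO.CTVolumetry))
g.add((QIBO.TumorVolumeChange, QIBO.isUsedFor, QIBO.TherapeuticEfficacy))
# AssayMethod usesTemplate AimTemplate
g.add((QIBO.CTVolumetry, QIBO.usesTemplate, QIBO.VolumetricCTAimTemplate))
# The parenthesized "withStrength" relations would need an N-ary/reification pattern.

print(g.serialize(format="turtle"))
```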
  • Slide 7
  • Requirements drive function
    // Business Requirements
    FNIH, QIBA, and C-Path participants don't have a way to provide a precise specification of the context for use and applicable assay methods (to allow semantic labeling):
        BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
  • Slide 8
  • Requirements drive function (cont.)
    Researchers and consortia don't have the ability to exploit existing data resources with high precision and recall:
        ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService});
  • Slide 9
  • Requirements drive function (cont.)
    Technology developers and contract research organizations don't have a way to do large-scale quantitative runs:
        ReferenceDataSet.CollectedValue+ = Execute (ReferenceDataSet.RawData);
  • Slide 10
  • Requirements drive function (cont.)
    The community lacks a way to apply definitive statistical analyses of annotation and image markup over a specified context for use:
        BiomarkerDB.SummaryStatistic+ = Analyze ({ReferenceDataSet.CollectedValue});
  • Slide 11
  • Requirements drive function (cont.)
    Industry lacks standardized ways to report and submit data electronically:
        efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet});
  • Slide 12
  • Layered requirement: Reproducible Workflows with Documented Provenance
    Layered across the five business requirements above:
        BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
        ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService});
        ReferenceDataSet.CollectedValue+ = Execute (ReferenceDataSet.RawData);
        BiomarkerDB.SummaryStatistic+ = Analyze ({ReferenceDataSet.CollectedValue});
        efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet});
  • Slide 13
  • Implementation-independent computational model. Computational and informatics design implications:
    - It is possible to model the chain as a process to achieve prescribed levels of statistical significance for validity and utility.
    - It is possible to apply logical and statistical inference to address generalizability of the results across clinical contexts.
    - Of course, the bottleneck remains availability of data; the purpose here is to define informatics services that make the best use of that data, namely: how to optimize the information content from any given experimental study, and how to incorporate individual study results into a formally defined description of the biomarker acceptable to regulatory agencies.

    // Altogether:
    efiling transactions = Package (Analyze (Execute (Formulate (Specify (biomarker domain expertise), DataService))));
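    Read inside-out, the composite expression above is plain function composition. The following schematic sketch mirrors the slide's data flow; the function bodies are placeholders, not QI-Bench implementations.

```python
# Schematic sketch of the composite expression above. Function bodies are
# placeholders (not QI-Bench implementations); only the data flow matters.
def Specify(domain_expertise):                 # -> BiomarkerDB
    return {"relations": [], "summary_statistics": [], "source": domain_expertise}

def Formulate(biomarker_db, data_services):    # -> ReferenceDataSet+
    return [{"raw_data": list(data_services), "collected_values": []}]

def Execute(reference_data_sets):              # RawData -> CollectedValue+
    for rds in reference_data_sets:
        rds["collected_values"].append({"value": 0.0, "uncertainty": 0.0})
    return reference_data_sets

def Analyze(reference_data_sets):              # -> SummaryStatistic+
    return [{"statistic": "placeholder"}]

def Package(biomarker_db, reference_data_sets):  # -> e-filing transactions
    return {"biomarker_db": biomarker_db, "reference_data_sets": reference_data_sets}

# Altogether, as on the slide:
bdb = Specify("biomarker domain expertise")
rds = Execute(Formulate(bdb, ["DataService"]))
bdb["summary_statistics"] = Analyze(rds)       # Analyze feeds BiomarkerDB.SummaryStatistic+
efiling = Package(bdb, rds)
```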
  • Slide 14
  • BiomarkerDB = Specify (domain expertise to describe biomarkers);

    | Role | Use Case | Supported Now | Gap vs. Model |
    | --- | --- | --- | --- |
    | Domain expert | Add/edit knowledge, Web App (thin) | Can create triples in local store for local community | Need to create knowledge spanning domains and communities (not just local) |
    | Informaticist | Curate knowledge, Desktop (thick) | Triples can be curated singly; ontology curated in Protégé | Need to edit clusters of triples (not just one at a time); need system support for input to ontology-level curation |
    | IT systems expert | Link ontologies, Server-side | All ontologies on BioPortal are supported | Can't link across ontologies directly |
  • Slide 15
  • Project directions for Specify
    - Extend the QIBO to link to existing established ontologies:
      1. leverage the BFO upper ontology to align different ontologies
      2. convert portions of BRIDG and LSDAM to ontology models in OWL
    - Automated conversion done in two steps:
      1. convert the current Sparx Enterprise Architect XMI to EMF UML format
      2. export the resulting EMF UML into an RDF/OWL representation using TopBraid Composer
    - Provide a GUI to traverse the QIBO concepts according to their relationships and create statements represented as RDF triples, stored in an RDF store. Example: "Image is from Patient AND Patient has age > 60 AND Patient has Disease where Disease has value Lung Cancer AND Patient has Smoking Status where Smoking Status has value True." This translates to "Find me all images that are from a patient older than 60, diagnosed with lung cancer, and a smoker." (See the SPARQL sketch below.)
    - Each set of RDF triples will be stored as a profile in Bio2RDF.
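    As referenced above, here is a minimal sketch of the SPARQL that such a profile might translate to, assuming hypothetical qibo: predicate names (they stand in for whatever the QIBO-linked ontologies actually define).

```python
# The example profile from this slide, written out as a SPARQL query string.
# Predicate and class names (qibo:hasAge, qibo:hasDisease, ...) are
# hypothetical stand-ins, not terms from the real QIBO.
PROFILE_QUERY = """
PREFIX qibo: <http://example.org/qibo#>
SELECT ?image WHERE {
  ?image   qibo:isFromPatient    ?patient .
  ?patient qibo:hasAge           ?age .
  ?patient qibo:hasDisease       qibo:LungCancer .
  ?patient qibo:hasSmokingStatus true .
  FILTER (?age > 60)
}
"""
```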
  • Slide 16
  • ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService});

    | Role | Use Case | Supported Now | Gap vs. Model |
    | --- | --- | --- | --- |
    | Domain expert | Find data with high precision and recall, Web App (thin) | Perform saved searches and organize data | Granular, role-based security with single sign-on to both private and public data resources |
    | Informaticist | Form and expose queries to find data, Desktop (thick) | Use UML models | Define queries in terms of RDF triples driven by ontologies (not UML) |
    | IT systems expert | Configure knowledge resources and data services, Server-side | Configure resources that use caGrid | Prepare and support a more flexible method (e.g., SPARQL) so as not to be limited by caGrid |
  • Slide 17
  • Project directions for Formulate
    - Enable users to select the profiles (sets of RDF triples) created in Specify, execute a query, and retrieve the results in various forms.
    - Assemble/transform the set of RDF triples into SPARQL queries:
      1. form an uninterrupted chain linking the instance of the input class from the ontology to the desired output class
      2. formulate/invoke the necessary SPARQL queries against the web services
    - Activities (query front end): load the models used in Specify into caB2B; a transformer that takes the Specify source and loads it into caB2B; expose the query through the caB2B Web Client; create a form-based query parameterization utility. Longer term: a thick client will assist modification of the query or facilitate reparameterization.
    - Activities (end points): end point representation (caGrid Service/SPARQL); the target language is SPARQL, not DCQL; provide SPARQL end point wrappers for some projects of interest (pilots); possibly a universal wrapper for caGrid services; for sure one for DICOM and TCIA/Midas.
    - Activities (results): results in RDF; transformed results in CSV and some form digestible by Midas, either directly stored (direct interfacing) or manually loaded (indirect interfacing). (A minimal query-to-CSV sketch follows.)
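    A minimal sketch of the results step, assuming an rdflib graph loaded from some data service export; file names and predicates are placeholders, not project artifacts. It runs a profile-style SPARQL query and flattens the RDF results to CSV.

```python
# Sketch: run a profile-style SPARQL query over a local RDF graph and write
# the results as CSV (the kind of flat form Midas could ingest). The source
# file, predicates, and output name are placeholders for illustration.
import csv
from rdflib import Graph

query = """
PREFIX qibo: <http://example.org/qibo#>
SELECT ?image ?patient WHERE {
  ?image qibo:isFromPatient ?patient .
}
"""

g = Graph()
g.parse("reference_data.ttl", format="turtle")       # placeholder RDF source

with open("formulate_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "patient"])
    for row in g.query(query):
        writer.writerow([str(row.image), str(row.patient)])
```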
  • Slide 18
  • ReferenceDataSet.CollectedValue+ = Execute (ReferenceDataSet.RawData);

    | Role | Use Case | Supported Now | Gap vs. Model |
    | --- | --- | --- | --- |
    | Domain expert | Set up studies to gather imaging results, Web App (thin) | Warehouse data flexibly, initiate automated and (soon) semi-automated runs | More complete automated support for data curation and script editing (current automated support for curation is limited) |
    | Informaticist | Establish metadata standards and define scripted runs, Desktop (thick) | Single time point and (soon) longitudinal change studies on single data types | Build composite markers with multiple parameters per modality and spanning multiple modalities |
    | IT systems expert | Configure data marts of mixed data types, Server-side | Support multiple and mixed types of bit streams | Explicit support for IHE Profiles |
  • Slide 19
  • Project directions for Execute
    - Script to write Image Formation content into the Biomarker Database for provenance of Reference Data Sets: an application for pulling in data from the Image Formation schema to populate the biomarker database. This data will originate from the DICOM imagery imported into QI-Bench.
    - Laboratory protocol for the NBIA Connector and Batch Analysis Service: a laboratory protocol describing the use of the NBIA Connector and the Image Formation script to import data into QI-Bench, and the use of the Batch Analysis Service for server-side processing.
    - Support change-analysis biomarkers in serial studies (up to two time points in the current period, extensible to additional time points in subsequent development iterations): support experiments including at minimum two time points. An example is the change in volume or SUV, rather than (only) estimation of the value at one time point; see the sketch after this list.
    - Document and solidify the API harness for execution modules of the Batch Analysis Service: this task includes the documentation and complete specification of the BatchMake Application API.
    - Support scripted reader studies: support reader studies through worklist items specified via AIM templates, as well as Query/Retrieve via DICOM standards for interaction with reader stations. ClearCanvas will serve as the target reader station for the first implementation.
    - Automated support for export of data to Analyze: including generation of annotation and image markup output from reference algorithms (i.e., LSTK for volumetric CT and Slicer3D for SUV) based on AIM templates instead of the current hard-coded implementation. An AIM template is an .xsd file.
    - Automated support for import of data from Formulate: this task includes refactoring and stabilization of the NBIA Connector in order to incorporate its functionality into Formulate.
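    For the change-analysis item above, the quantity of interest reduces to a simple percent change between two time points; a sketch follows (the function name and signature are illustrative, not part of the Execute API).

```python
# Sketch of the change-analysis quantity described above: percent change in a
# measurement (e.g., tumor volume or SUV) between two time points.
def percent_change(baseline: float, follow_up: float) -> float:
    """Return 100 * (follow_up - baseline) / baseline."""
    if baseline == 0:
        raise ValueError("baseline measurement must be non-zero")
    return 100.0 * (follow_up - baseline) / baseline

# Example: a lesion shrinking from 4200 mm^3 to 3150 mm^3 is a -25.0% change.
print(percent_change(baseline=4200.0, follow_up=3150.0))
```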
  • Slide 20
  • BiomarkerDB.SummaryStatistic+ = Analyze ({ReferenceDataSet.CollectedValue});

    | Role | Use Case | Supported Now | Gap vs. Model |
    | --- | --- | --- | --- |
    | Domain expert | Run and modify statistical analyses, Web App (thin) | n/a | Create the web app |
    | Informaticist | Configure set of analysis scripts in toolbox, Desktop (thick) | Run relevant calculations at technical performance level | Persist statistical calculation results as N-ary relations in the knowledge store, across larger data sets and at the clinical level too |
    | IT systems expert | Configure data input and output services, Server-side | Connects to (some) caGrid data services | Connect directly to RDSM and the knowledge store (and, through Formulate, a broader set of data services) |
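    As one example of a calculation "at the technical performance level" (per the DNF model, TechnicalPerformance includes CoefficientOfVariation), here is a minimal pure-Python sketch; the measurement numbers are made-up illustration data, not project results.

```python
# Sketch of one technical-performance calculation of the kind the Analyze
# toolbox runs: coefficient of variation over repeated collected values.
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV (%) = 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)

repeat_volumes_mm3 = [4180.0, 4230.0, 4150.0, 4210.0]   # repeated measurements (illustrative)
print(f"CV = {coefficient_of_variation(repeat_volumes_mm3):.2f}%")
```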
  • Slide 21
  • Analyze Current Capabilities
  • Slide 22
  • Project directions for Analyze

    [Architecture diagram: AD Server (current MVT, AIM/DICOM, experiment data, metadata, modified XIP Host) connected to an AD Client / Web Client with XIP LIB, R LIB, data access, and a desktop GUI; cached objects (AIM/DICOM, etc.); data services providing images, patient info, annotations, collections, and experiments; non-grid data sources. MVT = Measurement Variability Tool.]

    Scope:
    - GWT (or Tapestry) UI; create a web client version
    - Change from DB2 to a RESTful service layer; add calculation results to the persistent database
    - Implemented according to Open Source Development Initiative (OSDI) recommendations, in such a way as to enable the enhancement roadmap, and integrated with projects driving advanced semantics and FDA support
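    For the "RESTful service layer / persist calculation results" direction, here is a minimal sketch of what such a service could look like, using Flask and SQLite purely for illustration; the endpoint path, payload fields, and storage are assumptions, not the Analyze design.

```python
# Sketch of a REST endpoint that persists calculation results. Everything
# here (route, payload fields, SQLite storage) is illustrative only.
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)
DB = "analyze_results.db"

def init_db():
    with sqlite3.connect(DB) as con:
        con.execute("CREATE TABLE IF NOT EXISTS results "
                    "(biomarker TEXT, statistic TEXT, value REAL)")

@app.route("/results", methods=["POST"])
def add_result():
    payload = request.get_json()   # expects {"biomarker", "statistic", "value"}
    with sqlite3.connect(DB) as con:
        con.execute("INSERT INTO results VALUES (?, ?, ?)",
                    (payload["biomarker"], payload["statistic"], payload["value"]))
    return jsonify(status="stored"), 201

if __name__ == "__main__":
    init_db()
    app.run()
```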
  • Slide 23
  • efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet});

    | Role | Use Case | Supported Now | Gap vs. Model |
    | --- | --- | --- | --- |
    | Domain expert | Define and pull a data package, Web App (thin) | n/a | Define and pull a data package |
    | Informaticist | Define data mappings, Desktop (thick) | n/a | Define data mappings |
    | IT systems expert | Connect to electronic regulatory systems, Server-side | n/a | Connect to electronic regulatory systems |

    Package is a lower-priority application; the current effort is to participate in vocabulary standards efforts, staging implementation for later.
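    Package is explicitly staged for later; purely as a thought experiment, a data package could be as simple as a zip bundle with a manifest, as sketched below (file names and manifest fields are assumptions, not a QI-Bench design).

```python
# Illustrative only: bundling a BiomarkerDB export and reference data files
# with a JSON manifest. Not a QI-Bench design; names are placeholders.
import json
import zipfile

def package(biomarker_db_export: str, reference_data_files: list, out_path: str):
    """Bundle a BiomarkerDB export and reference data files with a manifest."""
    manifest = {"biomarker_db": biomarker_db_export,
                "reference_data": reference_data_files}
    with zipfile.ZipFile(out_path, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        zf.write(biomarker_db_export)
        for path in reference_data_files:
            zf.write(path)

# package("biomarker_db.ttl", ["reference_data.ttl"], "efiling_package.zip")
```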
  • Slide 24
  • Outlook

    | Period | Activity | Test Bed | Developers, Users |
    | --- | --- | --- | --- |
    | 2009 -> Winter 2011 | User needs and requirements analysis | | 1, 2 |
    | Spring/Summer 2011 | Initial Execute (including RDSM and BAS) and desktop Analyze (initial prototyping) | | 3, 4 |
    | Autumn 2011 | Initial Specify (including QIBO and BiomarkerDB) | Individual time point demonstrator | 5, 10 |
    | Winter 2012 | Initial Formulate, and major update to Execute | 3A Pilot | 5, 30 |
    | Spring/Summer 2012 | Computational model to drive architecture, and support for semi-automated workflows | 3A Pivotal and longitudinal change demonstrator | |
    | Autumn 2012 -> 2013 | Exercise end-to-end chain with reproducible workflows | (various) | |
  • Slide 26
  • Value proposition of QI-Bench
    - Efficiently collect and exploit evidence establishing standards for optimized quantitative imaging:
      - Users want confidence in the read-outs
      - Pharma wants to use them as endpoints
      - Device/SW companies want to market products that produce them without huge costs
      - The public wants to trust the decisions that they contribute to
    - By providing a verification framework to develop precompetitive specifications and support test harnesses to curate and utilize reference data
    - Doing so as an accessible and open resource facilitates collaboration among diverse stakeholders
  • Slide 27
  • Summary: QI-Bench Contributions
    - We make it practical to increase the magnitude of data for increased statistical significance.
    - We provide practical means to grapple with massive data sets.
    - We address the problem of efficient use of resources to assess limits of generalizability.
    - We make formal specification accessible to diverse groups of experts who are not skilled in, or interested in, knowledge engineering.
    - We map both medical and technical domain expertise into representations well suited to emerging capabilities of the semantic web.
    - We enable a mechanism to assess compliance with standards or requirements within specific contexts for use.
    - We take a toolbox approach to statistical analysis.
    - We provide the capability in a manner accessible to varying levels of collaborative models, from individual companies or institutions, to larger consortia or public-private partnerships, to fully open public access.
  • Slide 28
  • QI-Bench Structure / Acknowledgements
    - Prime: BBMSC (Andrew Buckler, Gary Wernsing, Mike Sperling, Matt Ouellette)
    - Co-Investigators: Kitware (Rick Avila, Patrick Reynolds, Julien Jomier, Mike Grauer); Stanford (David Paik, Tiffany Ting Liu)
    - Financial support as well as technical content: NIST (Mary Brady, Alden Dima, Guillaume Radde)
    - Collaborators / Colleagues / Idea Contributors: FDA (Nick Petrick, Marios Gavrielides); UCLA (Grace Kim); UMD (Eliot Siegel, Joe Chen, Ganesh Saiprasad); VUmc (Otto Hoekstra); Northwestern (Pat Mongkolwat); Georgetown (Baris Suzek)
    - Industry: Pharma: Novartis (Stefan Baumann), Merck (Richard Baumgartner); Device/Software: Definiens (Maria Athelogou), Claron Technologies (Ingmar Bitter)
    - Coordinating Programs: RSNA QIBA (e.g., Dan Sullivan, Binsheng Zhao); under consideration: CTMM TraIT (Andre Dekker, Jeroen Belien)