July 2012 Update
July 12, 2012

Andrew J. Buckler, MS
Principal Investigator, QI-Bench

WITH FUNDING SUPPORT PROVIDED BY NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY

Agenda for Today
• Approach, plans, and progress on Testing
• Analysis Modules
  – Overview
  – Bias-Linearity Demo
• Second development iteration


Testing: System Under Test
• Functionality perspective:
  – Specify, Formulate, Execute, Analyze, Package
• Range of supported information:
  – _loc, _dcm, _seg, _chg, and _cov data types


SUBJID ARM DOSE REGION AGE COUNTRY RACE SEX


Testing: Risk-based, Multiple Scopes
• Risk analysis (RA) specifies what level of unit/module, integration, verification, and validation is needed based on application
• Validation itself:
  – Installation Qualification (IQ)
  – Operational Qualification (OQ)
  – Performance Qualification (PQ):
    • capacity
    • speed
    • correctness (including curation and computation)
    • usability
    • utility


TEST PLANS, PROTOCOLS, AND REPORTS


Analysis Modules

Module: Method Comparison
Specification: Radar plots and related methodology based on readings from multiple methods on data set with ground truth.
Status: Currently have 3A pilot in R, not yet generalized but straightforward to do so. Plan to refine based on Metrology Workshop results and include case of comparison without truth also.

Module: Bias and Linearity
Specification: According to Metrology Workshop specifications.
Status: Demonstrate version today that works from summary statistics, e.g., to support meta-analysis. Plan to add analysis of individual reads.

Module: Test-retest Reliability
Specification: According to Metrology Workshop specifications.
Status: Prototype demonstrated last month. Plan to build real module in next month.

Module: Reproducibility (including detailed factor analysis)
Specification: Accepts as input fractional factorial data of cross-sectional biomarker estimates with range of fixed and random factors, produces mixed effects model.
Status: Module under development that will support both meta-analysis as well as direct data.

Module: Variance Components Assessment
Specification: Accepts as input longitudinal change data, estimates variance due to various non-treatment factors.
Status: Module under development to support direct data.
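To illustrate the kind of calculation a Bias and Linearity analysis can perform when only summary statistics are available (as in a meta-analysis), here is a minimal sketch. It is not the QI-Bench module, and it uses Python for illustration only. The `LevelSummary` layout, the function names, the choice of weights (n / sd², the inverse variance of each reported mean), and the example numbers are all assumptions made for this sketch; the Metrology Workshop specifications may define bias and linearity differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LevelSummary:
    """Summary statistics reported at one known truth level (hypothetical layout)."""
    truth: float  # reference (ground-truth) value, must be nonzero for percent bias
    mean: float   # mean of the measurements at this level
    sd: float     # standard deviation of the measurements (must be > 0)
    n: int        # number of measurements


def bias_per_level(levels: List[LevelSummary]) -> List[Tuple[float, float, float]]:
    """Return (truth, bias, percent bias) for each truth level."""
    return [
        (lv.truth, lv.mean - lv.truth, 100.0 * (lv.mean - lv.truth) / lv.truth)
        for lv in levels
    ]


def weighted_linearity(levels: List[LevelSummary]) -> Tuple[float, float]:
    """Weighted least-squares fit of mean = intercept + slope * truth.

    Each level is weighted by n / sd**2, the inverse variance of its mean.
    A slope near 1 suggests a linear response; the intercept estimates
    the constant part of the bias.
    """
    w = [lv.n / lv.sd ** 2 for lv in levels]
    sw = sum(w)
    xbar = sum(wi * lv.truth for wi, lv in zip(w, levels)) / sw
    ybar = sum(wi * lv.mean for wi, lv in zip(w, levels)) / sw
    sxx = sum(wi * (lv.truth - xbar) ** 2 for wi, lv in zip(w, levels))
    sxy = sum(wi * (lv.truth - xbar) * (lv.mean - ybar) for wi, lv in zip(w, levels))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Made-up example values, for illustration only (not data from any study).
example_levels = [
    LevelSummary(truth=100.0, mean=105.0, sd=10.0, n=10),
    LevelSummary(truth=200.0, mean=205.0, sd=10.0, n=10),
    LevelSummary(truth=400.0, mean=405.0, sd=10.0, n=10),
]

biases = bias_per_level(example_levels)
intercept, slope = weighted_linearity(example_levels)

for truth, bias, pct in biases:
    print(f"truth={truth:g}  bias={bias:g}  percent bias={pct:.2f}%")
print(f"intercept={intercept:.3f}  slope={slope:.3f}")
```

Working from per-level means, standard deviations, and counts keeps the calculation usable on published results. Analysis of individual reads would replace the summary weights with a fit to the raw measurements.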


Second development iteration: content and priorities

Functionality, Theoretical Base, Test Beds

Domain Specific Language

Executable Specifications

Computational Model

Enterprise vocabulary / data service registry

End-to-end Specify -> Package workflows

Curation pipeline workflows

DICOM:
• Segmentation objects
• Query/retrieve
• Structured Reporting

Worklist for scripted reader studies

Improved query / search tools (including link of Formulate and Execute)

Continued expansion of Analyze tool box

Further analysis of 1187/4140, 1C, and other data sets using LSTK and/or use API to other algorithms

Support more 3A-like challenges

Integration of detection into pipeline

Meta-analysis of reported results using Analyze

False-positive reduction in lung cancer screening

Other biomarkers


Value proposition of QI-Bench
• Efficiently collect and exploit evidence establishing standards for optimized quantitative imaging:
  – Users want confidence in the read-outs
  – Pharma wants to use them as endpoints
  – Device/SW companies want to market products that produce them without huge costs
  – Public wants to trust the decisions that they contribute to
• By providing a verification framework to develop precompetitive specifications and support test harnesses to curate and utilize reference data
• Doing so as an accessible and open resource facilitates collaboration among diverse stakeholders


Summary: QI-Bench Contributions
• We make it practical to increase the magnitude of data for increased statistical significance.
• We provide practical means to grapple with massive data sets.
• We address the problem of efficient use of resources to assess limits of generalizability.
• We make formal specification accessible to diverse groups of experts that are not skilled or interested in knowledge engineering.
• We map both medical as well as technical domain expertise into representations well suited to emerging capabilities of the semantic web.
• We enable a mechanism to assess compliance with standards or requirements within specific contexts for use.
• We take a “toolbox” approach to statistical analysis.
• We provide the capability in a manner which is accessible to varying levels of collaborative models, from individual companies or institutions to larger consortia or public-private partnerships to fully open public access.


QI-Bench Structure / Acknowledgements
• Prime: BBMSC (Andrew Buckler, Gary Wernsing, Mike Sperling, Matt Ouellette, Kjell Johnson, Jovanna Danagoulian)
• Co-Investigators
  – Kitware (Rick Avila, Patrick Reynolds, Julien Jomier, Mike Grauer)
  – Stanford (David Paik)
• Financial support as well as technical content: NIST (Mary Brady, Alden Dima, John Lu)
• Collaborators / Colleagues / Idea Contributors
  – Georgetown (Baris Suzek)
  – FDA (Nick Petrick, Marios Gavrielides)
  – UMD (Eliot Siegel, Joe Chen, Ganesh Saiprasad, Yelena Yesha)
  – Northwestern (Pat Mongkolwat)
  – UCLA (Grace Kim)
  – VUmc (Otto Hoekstra)
• Industry
  – Pharma: Novartis (Stefan Baumann), Merck (Richard Baumgartner)
  – Device/Software: Definiens, Median, Intio, GE, Siemens, Mevis, Claron Technologies, …
• Coordinating Programs
  – RSNA QIBA (e.g., Dan Sullivan, Binsheng Zhao)
  – Under consideration: CTMM TraIT (Andre Dekker, Jeroen Belien)
