
EDM Forum

EDM Forum Community

Webinars Events

8-29-2013

Cultivating Collaboration - Sharing Data, Code, and Tools to Accelerate the Science of Healthcare

Anthony D'Amico, Kaiser Family Foundation

Xiaoqian Jiang, UC San Diego, x1jiang@ucsd.edu

Daniella Meeker, RAND Corporation, dmeeker@rand.org

Fred Trotter, DocGraph Journal and Not Only Development

Dave Clifford, Avicenna

Follow this and additional works at: http://repository.academyhealth.org/webinars

Part of the Health Services Research Commons, and the Social and Behavioral Sciences Commons

This Video/Media is brought to you for free and open access by the Events at EDM Forum Community. It has been accepted for inclusion in Webinars by an authorized administrator of EDM Forum Community.

Recommended Citation

D'Amico, Anthony; Jiang, Xiaoqian; Meeker, Daniella; Trotter, Fred; and Clifford, Dave, "Cultivating Collaboration - Sharing Data, Code, and Tools to Accelerate the Science of Healthcare" (2013). Webinars. Paper 12. http://repository.academyhealth.org/webinars/12

Cultivating Collaboration – Sharing Data, Code, and Tools to Accelerate the Science of Healthcare

Anthony D’Amico, Kaiser Family Foundation; Xiaoqian Jiang, University of California-San Diego; Daniella Meeker, RAND Corporation; Fred Trotter, DocGraph Journal and Not Only Development; Dave Clifford, Avicenna

August 29, 2013

Welcome

Erin Holve, Ph.D., M.P.H., M.P.P.

– Senior Director of Research & Education, AcademyHealth

– Principal Investigator of the EDM Forum

– eGEMs Editor-in-Chief

Follow the conversation on Twitter!

#eGEMs @edm_ah @academyhealth

AcademyHealth: Improving Health & Health Care

AcademyHealth is a leading national organization serving the fields of health services and policy research and the professionals who produce and use this important work.

Together with our members, we offer programs and services that support the development and use of rigorous, relevant and timely evidence to:

1. Increase the quality, accessibility and value of health care,

2. Reduce disparities, and

3. Improve health.

A trusted broker of information, AcademyHealth brings stakeholders together to address the current and future needs of an evolving health system, inform health policy, and translate evidence into action.

The audio and slide presentation will be delivered directly to your computer

Speakers or headphones are required to hear the audio portion of the webinar.

If you do not hear any audio now, check your computer’s speaker settings and volume.

If you need an alternate method of accessing audio, please submit a question through the Q&A pod.

Technical Assistance

Live technical assistance:

– Call Adobe Connect at (800) 422-3623

Refer to the ‘Technical Assistance’ box in the bottom left corner for tips to resolve common technical difficulties.

Please turn off your pop-up blocker in order to take a survey

To submit a question:

1. Click in the Q&A box on the left side of your screen

2. Type your question into the dialog box and click the Send button

Questions may be submitted at any time during the presentation

Advancing the National Dialogue on Use of HIT for Research & Quality Improvement

Electronic Data Methods (EDM) Forum Goals

– Work with the community to identify cross-cutting
  • Challenges
  • Opportunities
  • Research priorities

– Provide opportunities for collaborative learning

– Ensure widespread promotion of tools, techniques, and findings

Join the Discussion: Sign up at edmforum@academyhealth.org

Health Data Ecosystem

www.hhs.gov/open/datasets/communityhealthdata

*Researchers are innovators too…

The Landscape: Electronic Health Data Initiatives

The Data Quality Collaborative

Collaborative working group of leading experts

Developing a comprehensive data quality assessment framework and guidelines for the CER community

Seeks feedback from the community through the EDM Forum

eRepository

New Brief! An Organizing Framework for New Informatics Tools and Approaches

AcademyHealth. “Informatics Tools and Approaches To Facilitate the Use of Electronic Data for CER, PCOR, and QI: Resources Developed by the PROSPECT, DRN, and Enhanced Registry Projects,” EDM Forum, August 2013.

New eJournal! eGEMs (Generating Evidence and Methods to improve patient outcomes)

Peer-reviewed and open access ejournal

Submissions must:

– Address use of electronic clinical data (i.e. EHRs) for research and quality improvement

– Highlight generalizable ‘lessons learned’ to accelerate translation, dissemination, and implementation of health science

– Explain why investigators’ work contributes to improving patient outcomes

Great Interest to Date

12 published manuscripts (since 1/17/13)

5,800+ publication downloads (as of 8/26/13)

20+ papers currently under review

Forthcoming Special Issues

– Ways Decision Makers Can Use Evidence to Improve Patient Outcomes in Learning Health Systems
  Guest Editor: Wade Aubry, University of California, San Francisco

– Methods for CER, PCOR, and QI Using Electronic Clinical Data in a Learning Health System
  Guest Editor: Michael Stoto, Georgetown University

For more information about eGEMs submission guidelines, visit http://repository.academyhealth.org/egems

Transforming the Research Enterprise

“Make the idea bigger”

How to sustainably link emerging data and tools in a marketplace of people and ideas committed to transforming patient care and outcomes?

Discovery

Implementation

Research

Care

Learning Objectives

Build awareness of opportunities to engage in open data and research communities

Learn about coding in R for federal surveys; techniques to facilitate distributed analyses; and use provider data for research

Improve users' experience with new tools and data by involving potential users in different stages of development

Explore opportunities to build your career by engaging in open source data and research activities

Today’s Faculty

Anthony D’Amico, Kaiser Family Foundation

Xiaoqian Jiang, University of California-San Diego

Daniella Meeker, RAND Corporation

Fred Trotter, DocGraph Journal and Not Only Development

Dave Clifford, Avicenna, LLC

How to Analyze Survey Data for Free with the R Language

Anthony Damico, Statistical Analyst, Kaiser Family Foundation

[Flowchart: Why are you here?]

Decision points:
– Do you analyze survey data for work or pleasure?
– Do you speak any R?
– Do you analyze survey data with SAS, SUDAAN, Stata, or SPSS?
– Are you concerned that proprietary software makes statistical research difficult to reproduce?
– Does it bother you that your analyses might all be wrong?
– Do you mind the price tag?

Destinations:
– Analyze Survey Data with the scripts on http://asdfree.com
– Learn R by watching two-minute videos on http://twotorials.com
– Read the “Getting Started with R” Guide on http://flowingdata.com
– Enroll in the free “Computing for Data Analysis” on http://coursera.com
– Hopefully you’ll never have to change jobs
– My sincerest apologies

Branch labels: yeah; nah; required by supervisor; ...so you’re using Excel; ..but Anthony, I hate the sound of your voice; ..but I need something structured; done

Complex Sampling

Sample geographies first, then sample individuals within those geographies.
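A minimal sketch of the two-stage idea above, in Python rather than R so it runs with the standard library alone. The county names, sample sizes, and the `two_stage_sample` helper are hypothetical and do not come from any of the surveys in this talk; real surveys use stratification and unequal selection probabilities that this toy example ignores.

```python
import random

def two_stage_sample(population, n_clusters, n_per_cluster, seed=0):
    """Stage 1: sample geographies. Stage 2: sample individuals within them.

    population maps a geography name to its list of individuals.
    Returns (geography, individual, weight) tuples, where weight is the
    inverse of the individual's overall selection probability.
    """
    rng = random.Random(seed)
    geographies = sorted(population)
    chosen = rng.sample(geographies, n_clusters)       # stage 1
    p_cluster = n_clusters / len(geographies)
    sample = []
    for geo in chosen:
        people = population[geo]
        k = min(n_per_cluster, len(people))
        p_person = k / len(people)                     # stage 2 probability
        weight = 1.0 / (p_cluster * p_person)
        for person in rng.sample(people, k):
            sample.append((geo, person, weight))
    return sample

# Hypothetical population: 8 counties of different sizes (50, 60, ..., 120 people).
population = {
    f"county_{c}": [f"c{c}_p{i}" for i in range(50 + 10 * c)] for c in range(8)
}
sample = two_stage_sample(population, n_clusters=4, n_per_cluster=5)

# Summing the weights estimates the population size from the sample alone.
estimated_total = sum(weight for _, _, weight in sample)
```

Because individuals in small and large geographies are selected with different probabilities, unweighted averages of such a sample are generally biased, which is why survey software needs the design information.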

Complex Sample Survey Data Sets

American Community Survey (ACS); IPUMS - American Community Survey (IPUMS-USA); American Time Use Survey (ATUS); Behavioral Risk Factor Surveillance System (BRFSS); Consumer Assessment of Healthcare Providers and Systems (CAHPS); Consumer Expenditure Survey (CE); Current Population Survey (CPS); IPUMS - Current Population Survey (IPUMS-CPS); Employer Health Benefits Survey (EHBS); General Social Survey (GSS); Health and Retirement Study (HRS); Medicare Current Beneficiary Survey (MCBS); Medical Expenditure Panel Survey (MEPS); National Health and Nutrition Examination Survey (NHANES); National Health Interview Survey (NHIS); National Longitudinal Study of Adolescent Health (AddHealth); National Longitudinal Surveys (NLS); National Survey on Drug Use and Health (NSDUH); Panel Study of Income Dynamics (PSID); Survey of Business Owners (SBO); Survey of Consumer Finances (SCF); Survey of Income and Program Participation (SIPP); Youth Risk Behavior Surveillance System (YRBSS)

twotorials.com

asdfree.com

1) Download Automation

2) Replication Scripts

3) Current Analysis Examples


Accelerating Open Science in Healthcare Through Open Code, Data and Process

Xiaoqian Jiang, Ph.D.

Division of Biomedical Informatics, University of California San Diego

Experience based on Grid Logistic Regression development

Open Code

Allow software to be freely used, modified, and shared.

License                                                  Year
BSD 3-Clause "New" or "Revised" license                  1988
BSD 2-Clause "Simplified" or "FreeBSD" license           1988
MIT license                                              1988
Apache License 2.0                                       2004
Eclipse Public License                                   2004
Common Development and Distribution License              2005
GNU General Public License (GPL)                         2007
GNU Library or "Lesser" General Public License (LGPL)    2007
Mozilla Public License 2.0                               2012

Open Code

Webservices and locations
• Privacy Preserving SVM: http://privacy.ucsd.edu:8080/ppsvm/
• Web Grid Logistic Regression: http://dbmi-engine.ucsd.edu/webglore/
• Interactive Matching Patients And randomized Clinical Trials: http://dbmi-engine.ucsd.edu/IMPACT/

Software and deposits
• Distributed Cox backbone: https://code.google.com/p/distributed-cox/
• Randomized clinical trial matching backbone: https://code.google.com/p/grouprct/
• Grid Logistic Regression backbone: https://code.google.com/p/glore/
• Sequential minimal optimization based SVM: http://hwanjoyu.org/svm-java/
• Web-based model calibration framework: https://code.google.com/p/webcalibsis/
• Differential PCA algorithm: https://code.google.com/p/dpca/

More on http://idash.ucsd.edu/idash-softwaretools

Open tutorial

Open Data

• Data use agreements across institutions
  o Limited and complicated
  o Specific to a particular study
  o Resources for sharing are limited
  o Security/privacy constraints are hard for small institutions to follow
• Sharing data today
  o Little incentive
  o Only one model: users download data
  o Yes/No decision on sharing

Thanks Dr. Ohno-Machado for this slide.

Accelerating Open Science

• Research is a process; sharing our experience may accelerate science

Open Science: healthcare research
– Data collection
– Algorithm development
– Software implementation
– Results verification

Two-stage development
– Backbone development and verification
– UI prototype and soliciting UX advice
– Integrated system that leaves room for extension

Constantly checking users’ experience

My Experience in Developing Grid Logistic Regression

Two-stage biomedical webservice development

Improving users' experience by involving potential users in different stages of development

Motivation

• Traditional approaches to data sharing have limitations that undermine the ability of researchers and clinicians to access, aggregate, and meaningfully analyze patient records at the point of care.

• WebGLORE is a webservice for biomedical researchers to build a global predictive logistic regression model without sharing data.

Share model vs. Disseminate data

[Diagram: each site keeps its own patient data; sites exchange only aggregated information, e.g., marginal distributions, sufficient statistics, kernel matrices]

Developers and expertise

[Diagram: team expertise (machine learning, statistics, signal processing, predictive modeling) and technologies (JAVA, PHP, JSP, UI, HTML, CSS)]

Google Code as Version Control


Foundation of GLORE

• Suppose m-1 features are consistent over k sites

• In each iteration, intermediary results of an m×m matrix and an m-dimensional vector are transmitted to k-1 sites
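The exchange described above can be sketched as a distributed Newton-Raphson fit. This is an illustrative toy, not the GLORE or WebGLORE code: it assumes the m×m matrix is each site's local information matrix and the m-dimensional vector is each site's local score (gradient), with m counting the m-1 features plus an intercept. The function names and the eight-patient data set are hypothetical.

```python
import math

def local_summaries(X, y, beta):
    """Runs inside one site. Only the m x m matrix H and the m-vector g leave the site."""
    m = len(beta)
    H = [[0.0] * m for _ in range(m)]
    g = [0.0] * m
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        p = 1.0 / (1.0 + math.exp(-z))
        w = p * (1.0 - p)
        for a in range(m):
            g[a] += row[a] * (label - p)
            for b in range(m):
                H[a][b] += w * row[a] * row[b]
    return H, g

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_distributed(sites, m, iterations=25):
    """Coordinator: sum the per-site summaries, take a Newton step, repeat."""
    beta = [0.0] * m
    for _ in range(iterations):
        H_total = [[0.0] * m for _ in range(m)]
        g_total = [0.0] * m
        for X, y in sites:
            H, g = local_summaries(X, y, beta)
            for a in range(m):
                g_total[a] += g[a]
                for b in range(m):
                    H_total[a][b] += H[a][b]
        step = solve(H_total, g_total)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical data: first column is the intercept, second is one feature (m = 2).
site_a = ([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0]], [0, 1, 0, 1])
site_b = ([[1.0, 1.0], [1.0, 2.5], [1.0, 3.5], [1.0, 0.2]], [0, 0, 1, 0])

beta_distributed = fit_distributed([site_a, site_b], m=2)
beta_pooled = fit_distributed([(site_a[0] + site_b[0], site_a[1] + site_b[1])], m=2)
```

Since the summaries add up across sites, the distributed fit matches the fit on the pooled data, yet no patient-level row ever leaves a site.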

Backbone implementation

• R backbone
  o https://www.dropbox.com/s/gmnrqgifdq9tjd7/glore_R.zip
• JAVA backbone
  o https://code.google.com/p/glore/

Human factor and user experience are important!


Check Point 1:

Performance Validation

Check User Experience

A first thought about UX

[Screenshots of the client interface]
– Set up task parameters
– Set up task parameters: filling in task details
– Join a task
– Show result

Check Point 2:

Check Potential Users’ Satisfaction

Potential Users’ feedback

• Advantages
  o Easy to implement
  o Flexibility in developing complex interfaces
  o Friendly to tools and packages that sit on local clients
• Disadvantages
  o Healthcare environments are reluctant to install third-party software
  o Communication through pre-specified ports is a security concern
  o Does not support all platforms unless implemented individually

First webservice development

WebGLORE 1.0

• An easy-to-use software as a service for healthcare should be:
  o Plug-in ready (user protected)
  o Deployable in a variety of hosting environments (platform friendly)
  o Security and firewall compatible (security-enhanced network)

Applet-Servlet architecture


Check Point 3:

Check Potential Users’ Satisfaction

Critical advice from testers

• Pros
  o Transparent model construction procedures, which allow participants to see the intermediary steps
  o Visualization of the model helps users understand model performance and reveals important factors
• Cons
  o Users cannot see their historical activities
  o Users cannot change the user profile
  o Repeated warnings from the JAVA applet in browsers are annoying

Second webservice development

WebGLORE 2.0

http://dbmi-engine.ucsd.edu/webglore2/

Generate reports

Check Point 4:

Check System Validity

Experiments

• CA-19 and CA-125 data

[Figure: running time (seconds) comparison]

Predictor   Estimate   Std. Error   Z-value    Pr(>|z|)
Intercept   -1.4645    0.3881       -3.7739    1.61E-04
CA19         0.0274    0.0085        3.2063    1.34E-03
CA125        0.0163    0.0077        2.1008    3.57E-02

H-L test p-value = 0.891

AUC = 0.891

Experiments

• Breast cancer biomarkers (CA-19, CA-125)

H-L test p-value = 0.891

AUC = 0.891

• Edinburgh myocardial infarction data

H-L test p-value = 0.430

AUC = 0.699

Predictor           Estimate   Std. Error   Z-value     Pr(>|z|)
Intercept           -4.3485    0.2968       -14.6508    0.00E+00
Pain in left arm     0.1816    0.2680         0.6777    4.98E-01
Pain in right arm    0.1764    0.3061         0.5763    5.64E-01
Nausea               0.1323    0.3862         0.3426    7.32E-01
Hypoperfusion        2.2511    0.6590         3.4160    6.36E-04
ST elevation         5.5556    0.4404        12.6150    0.00E+00
New Q waves          4.1453    0.6747         6.1435    8.07E-10
ST depression        3.4173    0.2815        12.1392    0.00E+00
T wave inversion     1.2030    0.2635         4.5649    5.00E-06
Sweating             0.2721    0.2510         1.0837    2.79E-01
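For readers unfamiliar with the AUC figures reported for these experiments, here is a hedged sketch of how a fitted logistic model turns marker values into predicted probabilities and how AUC summarizes their ranking. The coefficients are the CA19/CA125 estimates reported in the slides; the patient values and outcomes are made up for illustration, so the resulting AUC is not the study's result.

```python
import math

# Coefficients as reported in the slides for the CA-19 / CA-125 model.
INTERCEPT, B_CA19, B_CA125 = -1.4645, 0.0274, 0.0163

def predicted_probability(ca19, ca125):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*CA19 + b2*CA125)))."""
    z = INTERCEPT + B_CA19 * ca19 + B_CA125 * ca125
    return 1.0 / (1.0 + math.exp(-z))

def auc(labels, scores):
    """Probability that a random positive case scores above a random negative one.

    Ties count as half a win. Quadratic in sample size, fine for small data.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical patients: (CA19, CA125, outcome). Not data from the study.
patients = [(10, 12, 0), (25, 30, 0), (60, 20, 1), (15, 90, 0), (120, 150, 1), (40, 35, 1)]
scores = [predicted_probability(ca19, ca125) for ca19, ca125, _ in patients]
labels = [outcome for _, _, outcome in patients]
toy_auc = auc(labels, scores)
```

An AUC of 0.5 means the model ranks cases no better than chance and 1.0 means perfect separation; the H-L test reported alongside it addresses calibration, which this sketch does not cover.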

Check Point 5:

External Validation

Experiments

• Cincinnati data (ImproveCareNow!)

A quality improvement and research collaborative focused on improving the care and outcomes of children with Inflammatory Bowel Disease

Site 1 - 245 observations on 5 patients.

Site 2 - 563 observations on 24 patients.

Features

F1 - patient id
F2 - weeks to response
F3 - patient on biologics
F4 - days since diagnosis
F5 - gender
F6 - race
F7 - age in years at start of treatment
F8 - extent of disease
F9 - patient on thiopurine
F10 - patient on methotrexate
F11 - patient on salicylate
F12 - patient on steroids
F13 - days since diagnosis (recorded variable)
F14 - gender (recorded variable)
F15 - race (recorded variable)
F16 - race (factor variable)
F17 - patient on steroid (factor variable)
F18 - patient on salicylate (factor variable)
F19 - patient on thiopurine (factor variable)
F20 - patient on methotrexate (factor variable)
F21 - patient diagnosis
F22 - patient diagnosis (factor variable)


Target

Target = responded to treatment (i.e., improvement in condition)


Predictor   Beta      SE         Z-statistics  df  p       Odds ratio
Intercept   4.8802    2581.989   0.0019        1   0.9985  N/A
F1          0.0034    0.0016     2.1977        1   0.028   1.0035
F2          0.1143    0.0373     3.0652        1   0.0022  1.1211
F3          1.8766    0.9398     1.9969        1   0.0458  6.5311
F4          0.0027    0.0012     2.206         1   0.0274  1.0027
F5          -1.7232   1290.995   -0.0013       1   0.9989  0.1785
F6          -0.7147   0.4921     -1.4523       1   0.1464  0.4893
F7          -0.5522   0.1909     -2.8926       1   0.0038  0.5757
F8          0.0673    0.1231     0.5469        1   0.5845  1.0696
F9          -0.8537   2236.068   -0.0004       1   0.9997  0.4259
F10         0         3162.278   0             1   1       1
F11         0.5396    2236.068   0.0002        1   0.9998  1.7154
F12         0.3057    2236.068   0.0001        1   0.9999  1.3576
F13         0.0245    1.0657     0.023         1   0.9816  1.0248
F14         0.7519    1290.995   0.0006        1   0.9995  2.1211
F15         0.5949    2236.068   0.0003        1   0.9998  1.8128
F16         0.5949    2236.068   0.0003        1   0.9998  1.8128
F17         0.3057    2236.068   0.0001        1   0.9999  1.3576
F18         0.5396    2236.068   0.0002        1   0.9998  1.7154
F19         -0.8537   2236.068   -0.0004       1   0.9997  0.4259
F20         0         3162.278   0             1   1       1
F21         -0.3472   2236.068   -0.0002       1   0.9999  0.7066
F22         -0.3472   2236.068   -0.0002       1   0.9999  0.7066

Calibration Error = 0.05

AUC = 0.744

HL-C = 0.26

HL-H = 0.59
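The columns of the coefficient table are related by standard formulas, sketched here under the assumption that the Z-statistic is a Wald statistic (Beta divided by SE), that p is the two-sided normal p-value, and that the odds ratio is exp(Beta). The `wald_summary` helper is hypothetical; the inputs are the Beta and SE values for F3 and F7 from the table, and small differences from the reported figures come from rounding of the published inputs.

```python
import math

def wald_summary(beta, se):
    """Return (z, two-sided p-value, odds ratio) for one logistic coefficient."""
    z = beta / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    odds_ratio = math.exp(beta)
    return z, p, odds_ratio

# Beta and SE copied from the table rows for F3 and F7.
z_f3, p_f3, or_f3 = wald_summary(1.8766, 0.9398)
z_f7, p_f7, or_f7 = wald_summary(-0.5522, 0.1909)
```

Read this way, an odds ratio above 1 (F3) means the odds of responding to treatment increase with the predictor, and one below 1 (F7) means they decrease.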

Acknowledgements

• We thank Dr. Hamish Fraser and Dr. Kelly Zou for providing the clinical data
• We thank Dr. Keith Marsolo for the helpful advice
• We thank EDM Forum and iDASH for supporting this research!

Discussion Questions

• What is the most favorable format of open software the community wants?
  o Privacy Preserving Support Vector Machine (AMIA’12)
  o Grid Logistic Regression (AMIA’13, Bioinformatics)
  o Distributed Cox Proportional Hazard Model (submitted to BMC)

How do you like to share?
  o SaaS: healthcare professionals, end-user services
  o PaaS: researchers, developers, collaborators
  o IaaS: operators, developers, collaborators

• What are the features you envision to have in order to facilitate code, data, and process sharing?

Thanks

Cultivating Collaboration – Sharing Data, Code, and Tools to Accelerate the Science of Healthcare

29 August, 2013, EDM Forum Webinar

Daniella Meeker, RAND Corporation

Research: Structured Data And Code

Academic healthcare science

• Text-based journals are the currency of continued funding
• A journal article eliminates structure and information from original data and puts it into a file cabinet
• Obscuring methods and data
• Slow

[Diagram labels: dissemination, publication, value, infrastructure]

Academic healthcare science

• Methods cannot be exchanged and replicated
• Data is rarely exchanged and re-analyzed for robustness
• Redundant work
• Publication bias
• No infrastructure for efficient collaboration
  • Code sharing
  • Metadata standards
• No incentives for collaboration in the scientific community
  • Journal articles are released slowly and without detail
  • Data has greater utility to investigators if it is hoarded
  • Academic funding model does not support sustainable infrastructure

Commercial health data science, AKA business intelligence

• Environment – the “real” learning health system
  • Health care practice is moving more quickly than health services research.
  • Post-ACA, providers and plans are motivated to leverage their data to find efficiencies
  • Despite regulation, healthcare is among the fastest growing segments of cloud computing: infrastructure as a service (IaaS) and software as a service (SaaS)
• Funding model for commercial healthcare data science supports creation of scalable tools and an efficient marketplace for tools and analysis
  • Software engineers are part of staff
  • Analytic services
• Incentives for dissemination are mixed

Collaboration Infrastructure models from other sciences

• Open Science Grid: physics, nanotechnology, structural biology
  • OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
• LIGO: physics/astrophysics
  • Established practices and metadata standards
  • 1 PB data in last science run, distributed worldwide
• ESGF
  • 1.2 PB climate data delivered to 23,000 users; 600+ pubs
• Collage – Executable papers: computer science

Incentivizing a Learning Health System

• Research and practice must become interoperable
  • Requires commitment to a single standard across multiple agencies
  • In the age of BI, relevant research must go beyond secondary analysis and link basic biology and biomedicine data with patient-reported data
• Repositories and clearinghouses are a good start, but not enough… LHS requires searchable assets with high utility
  • discoverable
  • standards for metadata and coding practices
  • computable artifacts
  • application sandboxes with realistically simulated data
• Incentives for collaboration and sharing
  • Create a marketplace for reusable tools that links tool utility and reuse to research funding.

TOXNET

If you really love your data, you will set it free.

-ft

Pursuing Open Data in Healthcare

Why?

How?

Our Two Efforts

DocGraph

toEleven

Why Open Source your Data?

From Eric Raymond's “The Cathedral and the Bazaar”

The Tragedy of the Commons

vs

The Magic Cauldron

How?

Prepare to receive the secret recipe for successfully running an Open Source project:

In seriousness

Let your community connect with each other. You need a mailing list

Visit ours at DocGraph.org

Use either

Google Groups

Discourse

DocGraph

Is a graph data set of the healthcare system

It shows how doctors, hospitals, labs, etc. work together to provide care

Based on a FOIA request to CMS

~50 Million Edges

~2 Million nodes

Crowdfunded

Asked for $15k to develop data set, and got $60k on Medstartr
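To make "graph data set" concrete, here is a small sketch of how an edge list of provider relationships could be loaded and summarized. Everything in it is hypothetical: the provider identifiers, the edge weights, and the three-field record layout are assumptions for illustration, since the slides do not describe the actual DocGraph file format.

```python
from collections import defaultdict

# Hypothetical edge records: (from_provider, to_provider, weight).
# The weight stands in for whatever strength-of-relationship measure the data carries.
edges = [
    ("A", "B", 120),
    ("A", "C", 45),
    ("B", "C", 30),
    ("D", "A", 15),
]

def build_graph(edge_list):
    """Directed, weighted adjacency map: graph[src][dst] = weight."""
    graph = defaultdict(dict)
    for src, dst, weight in edge_list:
        graph[src][dst] = weight
    return graph

def weighted_in_degree(graph):
    """Total incoming weight per provider, a rough proxy for who receives the most traffic."""
    totals = defaultdict(int)
    for targets in graph.values():
        for dst, weight in targets.items():
            totals[dst] += weight
    return dict(totals)

graph = build_graph(edges)
incoming = weighted_in_degree(graph)
nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
```

At the scale quoted in the slides (about 50 million edges and 2 million nodes), a dictionary-of-dictionaries in memory may be too heavy, and a graph database or a sparse-matrix representation would be a more realistic choice.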

Open Data set

Download the Open Data Set for $1

Open Source version requires research be contributed back

Join the mailing list

Do something amazing

toEleven

Part of a “grand plan” with Ian Eslick

Born out of AcademyHealth collaboration

Goals:

Make research translation to digital interventions sustainable by dramatically lowering development and ongoing maintenance costs.

toEleven

Is the mobile app front end for Ian’s n=1 server backend components

Developing with CCHMC around iMigraine applications

About to announce a Food Database Project

Dave Clifford


Thank You

Please take a moment to fill out the brief evaluation which will appear in your browser.
