Observational Health Data Sciences and Informatics (OHDSI)
George Hripcsak, MD, MS
Columbia University Medical Center
NewYork-Presbyterian Hospital
Seattle Symposium on Health Care Data Analytics
Observational Health Data Sciences and Informatics (OHDSI, pronounced "Odyssey")
A multi-stakeholder, interdisciplinary, international collaborative with a coordinating center at Columbia University
Mission: To improve health, by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care
Aiming for a 1,000,000,000 patient data network
http://ohdsi.org
OHDSI's global research community
• >140 collaborators from 20 different countries
• Experts in informatics, statistics, epidemiology, clinical sciences
• Active participation from academia, government, industry, providers
• Currently 600 million patient records in 52 databases
http://ohdsi.org/who-we-are/collaborators/
Why large-scale analysis is needed in healthcare
All drugs
All health outcomes of interest
Patient-level predictions for personalized evidence require big data
Do 2 million patients seem excessive or unnecessary?
• Imagine a provider wants to compare her patient with other patients with the same gender (50%), in the same 10-year age group (10%), and with the same comorbidity of Type 2 diabetes (5%)
• Imagine the patient is concerned about the risk of ketoacidosis (0.5%) associated with two alternative treatments they are considering
• With 2 million patients, you'd only expect to observe 25 similar patients with the event, and would only be powered to observe a relative risk > 2.0
Aggregated data across a health system of 1,000 providers may contain 2,000,000 patients
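
The figure of 25 patients can be checked with a few lines of arithmetic. The sketch below multiplies the proportions quoted on the slide, assuming (as the slide implicitly does) that gender, age group, comorbidity, and event risk are independent. The function and variable names are illustrative, and the power statement about relative risk > 2.0 is not reproduced here.

```python
def expected_similar_patients(total, proportions):
    """Multiply a population size by a series of independent proportions."""
    count = total
    for p in proportions:
        count *= p
    return count

total_patients = 2_000_000
# Proportions from the slide: same gender, same 10-year age group, Type 2 diabetes.
matching = expected_similar_patients(total_patients, [0.50, 0.10, 0.05])
# Ketoacidosis risk of 0.5% applied to the matched group (independence is an assumption).
with_event = matching * 0.005

print(round(matching), round(with_event))
```

So only about 5,000 of the 2 million patients resemble the patient in question, and about 25 of those would be expected to have the event.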
Evidence OHDSI seeks to generate from observational data
• Clinical characterization
– Natural history: Who has diabetes, and who takes metformin?
– Quality improvement: What proportion of patients with diabetes experience complications?
• Population-level estimation
– Safety surveillance: Does metformin cause lactic acidosis?
– Comparative effectiveness: Does metformin cause lactic acidosis more than glyburide?
• Patient-level prediction
– Precision medicine: Given everything you know about me, if I take metformin, what is the chance I will get lactic acidosis?
– Disease interception: Given everything you know about me, what is the chance I will develop diabetes?
OHDSI's approach to open science
[Diagram labels: Open source software; Open science; Enable users to do something; Generate evidence]
• Open science is about sharing the journey to evidence generation
• Open-source software can be part of the journey, but it's not a final destination
• Open processes can enhance the journey through improved reproducibility of research and expanded adoption of scientific best practices
Data + Analytics + Domain expertise
Standardizing workflows to enable transparent, reproducible research
[Diagram labels: Open science; Generate evidence; Database summary; Cohort definition; Cohort summary; Compare cohorts; Exposure-outcome summary; Effect estimation & calibration; Compare databases]
Population-level estimation for comparative effectiveness research:
Is <intervention X> better than <intervention Y> in reducing the risk of <condition Z>?
Defined inputs:
• Target exposure
• Comparator group
• Outcome
• Time-at-risk
• Model specification
Consistent outputs:
• analysis specifications for transparency and reproducibility (protocol + source code)
• only aggregate summary statistics (no patient-level data)
• model diagnostics to evaluate accuracy
• results as evidence to be disseminated
• static for reporting (e.g. via publication)
• interactive for exploration (e.g. via app)
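
One way to picture the "defined inputs" is as a small, immutable specification object from which the research question is rendered. This is only a sketch: the class name, field names, and the example time-at-risk and model values are hypothetical and do not come from any OHDSI package. The drug and outcome names are borrowed from the metformin example earlier in the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EstimationSpec:
    """Hypothetical container for the five defined inputs on the slide."""
    target_exposure: str
    comparator_group: str
    outcome: str
    time_at_risk_days: int      # assumed representation of time-at-risk
    model_specification: str    # free-text label, for illustration only

    def question(self) -> str:
        # Fill the slide's template: Is X better than Y in reducing the risk of Z?
        return (f"Is {self.target_exposure} better than {self.comparator_group} "
                f"in reducing the risk of {self.outcome}?")

spec = EstimationSpec(
    target_exposure="metformin",
    comparator_group="glyburide",
    outcome="lactic acidosis",
    time_at_risk_days=365,                       # illustrative value
    model_specification="Cox proportional hazards",  # illustrative value
)
print(spec.question())
```

Freezing the dataclass reflects the slide's emphasis on reproducibility: once a specification is written down, it is shared and executed unchanged.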
OHDSI Distinguishing Features
• International effort (size & coverage)
– 43 source terminologies from around the world
• Open science (depth)
– Infrastructure serves the science
– Stack: Terminology, CDM, ETL, QA, Visualization, Novel analytic methods, Clinical research
• Full information model
How OHDSI Works
[Diagram labels]
OHDSI Data Partners
Source data warehouse, with identifiable patient-level data
ETL
Standardized, de-identified patient-level database (OMOP CDM v5)
Standardized large-scale analytics
Analysis results
OHDSI Coordinating Center
Analytics development and testing
Research and education
Data network support
Summary statistics results repository
OHDSI.org
Comparative effectiveness
Predictive modeling
Consistency; Temporality; Strength; Plausibility; Experiment; Coherence; Biological gradient; Specificity; Analogy
Deep information model: OMOP CDM v5.0.1
Standardized vocabularies: Concept, Vocabulary, Domain, Concept_class, Concept_relationship, Relationship, Concept_synonym, Concept_ancestor, Source_to_concept_map, Drug_strength, Cohort_definition, Attribute_definition
Standardized meta-data: CDM_source
Standardized clinical data: Person, Observation_period, Specimen, Death, Visit_occurrence, Procedure_occurrence, Drug_exposure, Device_exposure, Condition_occurrence, Measurement, Note, Observation, Fact_relationship
Standardized health system data: Location, Care_site, Provider
Standardized health economics: Payer_plan_period, Cost
Standardized derived elements: Cohort, Cohort_attribute, Drug_era, Dose_era, Condition_era
Extensive vocabularies
Preparing your data for analysis
Patient-level data in source system/schema; ETL design; ETL implement; ETL test; Patient-level data in OMOP CDM
OHDSI tools built to help
WhiteRabbit: profile your source data
RabbitInAHat: map your source structure to CDM tables and fields
ATHENA: standardized vocabularies for all CDM domains
Usagi: map your source codes to CDM vocabulary
CDM: DDL, index, constraints for Oracle, SQL Server, PostgreSQL; Vocabulary tables with loading scripts
ACHILLES: profile your CDM data; review data quality assessment; explore population-level summaries
http://github.com/OHDSI
OHDSI Forums: Public discussions for OMOP CDM implementers/developers
ACHILLES Heel Data Validation
ATLAS to build, visualize, and analyze cohorts
Characterize the cohorts of interest
LAERTES: Knowledge base of what we know: literature, labeling, spontaneous reporting
OHDSI in Action
• Generate evidence
– Randomized trial is the gold standard
– Observational research is supporting
• Can it become a partnership?
Characterization
• Today we carry out RCTs without clear knowledge of actual practice
• There will be no RCTs without an observational precursor
– It will be required to characterize a population using large-scale observational data before designing an RCT
– Disease burden
– Actual treatment practice
– Time on therapy
– Course and complication rate
– Done now somewhat through literature and pilot studies
Treatment Pathways
[Diagram categories: Global stakeholders; Evidence; Conduits; Local stakeholders; Inputs]
[Diagram labels: Public; Industry; Regulator; Academics; RCT; Obs; Literature; Lay press; Social media; Guidelines; Formulary; Labels; Advertising; Clinician; Patient; Family; Consultant; Indication; Feasibility; Cost; Preference]
Network process
1. Join the collaborative
2. Propose a study to the open collaborative
3. Write protocol
– http://www.ohdsi.org/web/wiki/doku.php?id=research:studies
4. Code it, run it locally, debug it (minimize others' work)
5. Publish it: https://github.com/ohdsi
6. Each node voluntarily executes on their CDM
7. Centrally share results
8. Collaboratively explore results and jointly publish findings
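
Steps 6 and 7 can be sketched as follows: each node runs the same code against its own data and returns only aggregate counts, which are then pooled centrally. The function names and the toy rows are hypothetical and purely illustrative. This is not OHDSI study code, and a real node would query its OMOP CDM database rather than an in-memory list.

```python
from collections import Counter

def run_locally(patient_rows):
    """Executed at a data partner: returns aggregate counts only, never patient rows."""
    return Counter(first_drug for _patient_id, first_drug in patient_rows)

def pool(site_results):
    """Executed centrally: sums the aggregate counts shared by each node."""
    total = Counter()
    for counts in site_results:
        total.update(counts)
    return total

# Synthetic toy rows of (patient_id, first_drug); patient ids never leave the site.
site_a = [(1, "metformin"), (2, "metformin"), (3, "glyburide")]
site_b = [(1, "metformin"), (2, "glyburide")]

pooled = pool([run_locally(site_a), run_locally(site_b)])
print(dict(pooled))
```

Because only the counters cross site boundaries, the same pattern works for partners with different privacy and research regulations.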
OHDSI in action: Chronic disease treatment pathways
• Conceived at AMIA (15 Nov 2014)
• Protocol written, code written and tested at 2 sites (30 Nov 2014)
• Analysis submitted to OHDSI network (2 Dec 2014)
• Results submitted for 7 databases (5 Dec 2014)
OHDSI participating data partners
Abbreviation | Name | Description | Population, millions
AUSOM | Ajou University School of Medicine | South Korea; inpatient hospital EHR | 2
CCAE | MarketScan Commercial Claims and Encounters | US; private-payer claims | 119
CPRD | UK Clinical Practice Research Datalink | UK; EHR from general practice | 11
CUMC | Columbia University Medical Center | US; inpatient EHR | 4
GE | GE Centricity | US; outpatient EHR | 33
INPC | Regenstrief Institute, Indiana Network for Patient Care | US; integrated health exchange | 15
JMDC | Japan Medical Data Center | Japan; private-payer claims | 3
MDCD | MarketScan Medicaid Multi-State | US; public-payer claims | 17
MDCR | MarketScan Medicare Supplemental and Coordination of Benefits | US; private and public-payer claims | 9
OPTUM | Optum ClinFormatics | US; private-payer claims | 40
STRIDE | Stanford Translational Research Integrated Database Environment | US; inpatient EHR | 2
HKU | Hong Kong University | Hong Kong; EHR | 1
Treatment pathway event flow
Proceedings of the National Academy of Sciences, 2016
T2DM: All databases
Treatment pathways for diabetes
[Figure labels: First drug; Second drug; Only drug]
[Figure panels: Type 2 Diabetes Mellitus, Hypertension, Depression, by database: OPTUM, GE, MDCD, CUMC, INPC, MDCR, CPRD, JMDC, CCAE]
Population-level heterogeneity across systems, and patient-level heterogeneity within systems
HTN: All databases
Patient-level heterogeneity
25% of HTN patients (10% of others) have a unique path despite 250M pop
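
A "unique path" statistic of this kind could be computed roughly as below. The sketch assumes a pathway is the ordered sequence of drugs a patient received and that "unique" means no other patient shares that exact sequence. Both the definition and the toy pathways are assumptions for illustration, not the study's code or data.

```python
from collections import Counter

def unique_path_fraction(pathways):
    """Fraction of patients whose ordered drug sequence is shared by no one else."""
    counts = Counter(tuple(path) for path in pathways)
    unique = sum(1 for path in pathways if counts[tuple(path)] == 1)
    return unique / len(pathways)

# Synthetic toy pathways, one list per patient (order matters).
toy_pathways = [
    ["metformin"],
    ["metformin"],
    ["metformin", "glyburide"],
    ["glyburide", "metformin"],
]
fraction = unique_path_fraction(toy_pathways)
print(fraction)
```

In the toy data the two single-drug patients share a path, while the two two-drug patients each have a sequence of their own, so half of the patients are on a unique path.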
Monotherapy – diabetes
[Chart: y-axis 0 to 1; x-axis 1989 to 2009; series: AUSOM (S Korea*), CCAE (US#), CPRD (UK*), CUMC (US*), GE (US*), INPC (US*#), JMDC (Japan#), MDCD (US#), MDCR (US#), OPTUM (US#), STRIDE (US*)]
General upward trend in monotherapy
Monotherapy – HTN
[Chart: y-axis 0 to 1; x-axis 1989 to 2009; series: AUSOM (S Korea*), CCAE (US#), CPRD (UK*), CUMC (US*), GE (US*), INPC (US*#), JMDC (Japan#), MDCD (US#), MDCR (US#), OPTUM (US#), STRIDE (US*)]
Academic medical centers differ from general practices
Monotherapy – diabetes
[Chart: y-axis 0 to 1; x-axis 1989 to 2009; series: AUSOM (S Korea*), CCAE (US#), CPRD (UK*), CUMC (US*), GE (US*), INPC (US*#), JMDC (Japan#), MDCD (US#), MDCR (US#), OPTUM (US#), STRIDE (US*)]
General practices, whether EHR or claims, have similar profiles
Conclusions: Network research
• It is feasible to encode the world population in a single data model
– Over 600,000,000 records by voluntary effort (682,000,000)
• Generating evidence is feasible
• Stakeholders willing to share results
• Able to accommodate vast differences in privacy and research regulation