new a. holzinger 709.049 mi, 04.11 - human-centered.ai · 2015. 11. 4. · biomedical data...

Status as of Tue, 03.11.2015, 10:00

Dear students, welcome to the 4th lecture of our course. Please remember from the last lecture: modeling of knowledge, medical ontologies, classification efforts and the International Classification of Diseases (ICD); Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT); Medical Subject Headings (MeSH); Unified Medical Language System (UMLS).

Please always be aware of the definition of biomedical informatics (Medizinische Informatik): Biomedical informatics is the interdisciplinary field that studies and pursues the effective use of biomedical data, information, and knowledge for scientific inquiry, problem solving, and decision making, motivated by efforts to improve human health (and well-being).

1WS 2015

A. Holzinger                                                         LV 709.049                                         Mi, 04.11.2015

Bayes' Rule; Biomedical data warehouse; Business hospital information system; Clinical workflow; Data integration; Enterprise data modeling; Information retrieval (IR); Probabilistic model; Quality of information retrieval; Set theoretic model; Vector Space Model (VSM)

At the end of this fourth lecture you ...

... have an overview of the general architecture of a hospital information system (details in Lecture 10: Medical Information Systems and Biomedical Knowledge Management);
... know some principles of hospital databases;
... have an overview of some biomedical databases;
... are familiar with some basics of information retrieval.

Among other problems, some key challenges include: increasingly large and complex data sets ("Big Data") due to data-intensive research; increasing amounts of non-standardized and unstructured information (e.g. free text); data quality, data integration, universal access; privacy, security, safety and data protection issues (see → Lecture 11); time aspects in databases (Gschwandtner, Gärtner, Aigner & Miksch, 2012), (Johnston & Weis, 2010).

"Big Data resources are all a waste of time and money if data analysts cannot find, or fail to comprehend, the basic information that describes the data held in the resources" (Berman, 2013b). Data identification is certainly the most underappreciated and least understood Big Data issue. Measurements, annotations, properties, and classes of information have no informational meaning unless they are attached to an identifier that distinguishes one data object from all other data objects and that links together all of the information associated with the identified data object (Berman, 2013a).

Communication of data between application systems must ensure security to avoid improper access, because trust, or the lack thereof, is the most essential factor blocking the adoption of rapidly evolving Web technology paradigms such as software as a service (SaaS) or data distribution services such as cloud computing (Sreenivasaiah, 2010).

Before we discuss information systems and learn about databases, let us start with a look into the hospital ...

In this slide we see a typical hospital scenario: medical professionals are surrounded by information technology. An old dream of hospital managers has always been the "all-digital hospital": to digitalize all workflows and to store all data electronically, moving towards a paperless hospital. Although much effort has been spent on this goal, most hospitals worldwide are still far from being an "all-digital hospital" (Waterson, Glenn & Eason, 2012).

An interesting study: All hospitals in the province of Styria (Austria) are well equipped with sophisticated information technology, which provides all-encompassing on-screen patient information. Previous research on the theoretical properties, advantages and disadvantages of reading from paper vs. reading from a screen resulted in the assumption that reading from a screen is slower, less accurate and more tiring. However, recent flat-screen technology, especially on the basis of LCD, is of such high quality that this assumption should now be challenged. As the electronic storage and presentation of information has many advantages in addition to faster transfer and processing of the information, the use of electronic screens in clinics should outperform the traditional hard copy in both execution and preference ratings. In a study at the county hospital Styria, Austria, 111 medical professionals working in a real-life setting were each asked to read original and authentic diagnosis reports, a gynecological report and an internal medical document, on both screen and paper in a randomly assigned order. Reading comprehension was measured by the Chunked Reading Test, and the speed and accuracy of reading performance were quantified. In order to get a full understanding of the clinicians' preferences, subjective ratings were also collected. Wilcoxon signed-rank tests showed no significant differences in reading performance between paper and screen. However, the medical professionals showed a significant (90%) preference for reading from paper. Despite the high quality and the benefits of electronic media, paper still has some qualities which cannot be provided electronically to date (Holzinger et al., 2011).

BTW: Graz University Hospital is the flagship hospital of the Styrian KAGES (with 23 county hospitals) and is amongst the largest hospitals in Europe.

Mega issues related to hospital information systems include: data integration, data fusion, standardization issues, clinical process analysis, modeling, compliance issues, evidence-based treatment and decision support, privacy, security, safety and data protection, and knowledge discovery and data mining. All of these are connected with the central topic of this lecture: databases.

BTW: KAGES uses openMEDOCS, based on ish.med, which is in turn based on SAP R/3. An overview of different business hospital information system vendors can be found here:

Teamwork in the hospital requires a lot of communication and information exchange. The vision of a business enterprise hospital information system is to cover all workflows, organizational processes and information flows electronically.

Note: The quality of the work of physicians is heavily influenced by the usability of their available equipment. In the slide you see a typical work meeting of medical professionals, where they discuss patient cases jointly. It is important to study and understand the workflows of the end users and to involve them in the development of information systems as early as possible through a user-centered design process (Holzinger, 2003). Experiments showed that by studying the workflows, engineers gain deep insights into how to develop an appropriate application for a specified target end user group (Holzinger, Geierhofer, Ackerl & Searle, 2005).

The aforementioned goal of an "all-digital hospital" produces "big data", and a remarkable share of this data is unstructured text. Interestingly, the main and most important output is the medical report (Arztbrief). In the example it is the report on a medical image: the relevant item is not the image itself but the report (Holzinger, Geierhofer & Errath, 2007b). Handling unstructured data is a mega challenge for computers.

Let us briefly compare human intelligence with machine intelligence. A good example of the complexity we are facing in hospital information processing is the difference between chess and human natural language processing: Chess has a finite, mathematically well-defined search space, hence a well-defined computational space, with a limited number of moves and states, grounded in explicit, unambiguous mathematical rules. Human language is exactly the opposite: ambiguous, contextual and implicit; grounded in the human cognitive space, with a seemingly infinite number of ways to express one and the same meaning.

Note: IBM Deep Blue defeated the World Chess Champion Garry Kasparov in a six-game match in 1997. A number of factors contributed to this success, including: a single-chip chess search engine, a massively parallel system with multiple levels of parallelism, a strong emphasis on search extensions, a complex evaluation function, and effective use of a Grandmaster game database. Technically, Deep Blue was a massively parallel system designed for carrying out chess game tree searches. The system was composed of a 30-node IBM RS/6000 SP computer and 480 single-chip chess search engines, with 16 chess chips per SP processor. The SP system consisted of 28 nodes with 120 MHz P2SC processors and 2 nodes with 135 MHz P2SC processors. The nodes communicated with each other via a high-speed switch, and all nodes had 1 GB of RAM and 4 GB of disk. During the 1997 match with Kasparov, the system ran the AIX 4.2 operating system. The chess chips in Deep Blue were each capable of searching up to 2.5 million chess positions per second and communicated with their host node via a fast micro channel bus (Campbell, Hoane & Hsu, 2002).

You may remember what we learned in the last lecture about workflows and workflow modelling.

Healthcare processes require the cooperation of different organizational units and various medical disciplines. In such an environment, optimal process support becomes crucial. In this slide we see a typical organizational process for medical order entry and result reporting, which is used to coordinate the inter-departmental communication between a ward (or ambulatory setting) and the radiology unit. The depicted process is not tailored to a specific clinical pathway, but shows an example of a characteristic organizational procedure of the hospital: An order (in German: Anweisung, Verschreibung) is placed by a physician at the ward or in an ambulatory setting. The indication is checked in the radiology department and, depending on the result, the order placer is informed whether the request has been rejected or scheduled. The actual radiological examination and the corresponding documentation are done in the examination room. The radiology report is generated afterwards and has to be validated by the physician with his or her signature. The report is then sent back to the order placer. This is an example of a fundamental process of clinical practice; it captures the organizational knowledge necessary to coordinate the healthcare process among different people and organizational units, i.e., the focus is on the support of core organizational processes (Lenz & Reichert, 2007).
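The order entry and result reporting process described above can be read as a small state machine. The following is only a minimal Python sketch under that assumption; the state names and the `advance`/`run` helpers are illustrative and are not taken from Lenz & Reichert or from any hospital system.

```python
from enum import Enum, auto


class OrderState(Enum):
    """Hypothetical states of a radiology order (names are illustrative)."""
    PLACED = auto()      # physician places the order at the ward
    REJECTED = auto()    # radiology rejects after checking the indication
    SCHEDULED = auto()   # radiology accepts and schedules the examination
    EXAMINED = auto()    # examination and documentation are done
    REPORTED = auto()    # radiology report has been generated
    VALIDATED = auto()   # report has been signed by the physician
    RETURNED = auto()    # report has been sent back to the order placer


# Allowed transitions, following the textual process description.
TRANSITIONS = {
    OrderState.PLACED: {OrderState.REJECTED, OrderState.SCHEDULED},
    OrderState.SCHEDULED: {OrderState.EXAMINED},
    OrderState.EXAMINED: {OrderState.REPORTED},
    OrderState.REPORTED: {OrderState.VALIDATED},
    OrderState.VALIDATED: {OrderState.RETURNED},
    OrderState.REJECTED: set(),   # terminal state
    OrderState.RETURNED: set(),   # terminal state
}


def advance(current, target):
    """Return the target state if the transition is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


def run(path):
    """Walk a list of target states starting from PLACED; return the final state."""
    state = OrderState.PLACED
    for target in path:
        state = advance(state, target)
    return state
```

A workflow engine would attach actors, timestamps and documents to each transition; the sketch only shows that the process fixes which steps may follow which.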

This is just to give you an idea of how complicated such processes can be; you can imagine how difficult it is to digitalize all the data involved.

The medical treatment process is often described as a diagnostic-therapeutic cycle (Bemmel & Musen, 1997) comprising observation, reasoning, and action. Please remember that in medicine we deal with uncertain information (Holzinger & Simonic, 2011), and each pass of the diagnostic-therapeutic cycle can be seen as a step in decreasing the uncertainty about the patient's disease. Consequently, the observation process always starts with the patient history ("looking into the past") and proceeds with diagnostic procedures which are selected based on the available information. The aim of the HIS is to assist healthcare personnel in making informed decisions. Maybe the most important question to be answered is how to determine what is relevant. Availability of relevant information is a precondition for (good) medical decisions, and medical knowledge guides these decisions (Lenz & Reichert, 2007). Following the principles of evidence-based medicine (EBM), physicians are required to formulate questions based on patients' problems, search the literature for answers, evaluate the evidence for its validity and usefulness, and finally apply the (new) information to the patient's treatment (Hawkins, 2005). The limiting factor is the short time a clinician has to make a decision (Gigerenzer & Gaissmaier, 2011).
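One way to picture how each pass of the cycle reduces uncertainty is Bayes' rule applied to a diagnostic test. This is a minimal sketch, not a method given in the lecture. The prior, sensitivity and specificity values are invented for illustration, and applying the same test twice assumes the two results are conditionally independent, which real tests often are not.

```python
def posterior(prior, sensitivity, specificity, positive=True):
    """Bayes' rule for a binary test: P(disease | test result)."""
    if positive:
        p_result_if_disease = sensitivity
        p_result_if_healthy = 1.0 - specificity
    else:
        p_result_if_disease = 1.0 - sensitivity
        p_result_if_healthy = specificity
    evidence = p_result_if_disease * prior + p_result_if_healthy * (1.0 - prior)
    return p_result_if_disease * prior / evidence


# Hypothetical numbers: 10% prior from the patient history,
# a test with 90% sensitivity and 80% specificity.
p0 = 0.10
p1 = posterior(p0, 0.90, 0.80, positive=True)   # after the first positive result
p2 = posterior(p1, 0.90, 0.80, positive=True)   # after a second positive result
print(round(p1, 3), round(p2, 3))
```

With these invented numbers the estimate moves from 0.10 to about 0.33 and then to about 0.69: each observation sharpens the belief without making it certain.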

This slide shows a classical conceptual model: The heart is a central data and communication structure. Patients (logically) "enter" the system on the left side through the admission, transfer and discharge functions of the core, and leave the system, at least partially, through the right side. The main focus is on a central database, although alternative solutions have opted for a more distributed construction of databases. Nonetheless, central ordering principles have to be kept in order to achieve the necessary integration of information and its distribution to the various points where it is needed, be it in the area of hospital management or in the field of care provision. This central database serves the central operational purposes of the hospital in the context of its dual goals (Haux et al., 1998), (Reichertz, 2006), (Haux, 2006).

Here you see the typical architecture of such a system.

ICU = Intensive Care Unit
NICU = Neonatal Intensive Care Unit
PICU = Pediatric Intensive Care Unit

There are many different application architectures in use, and we will come back to them later in → Lecture 10, so this is just ONE example of an enterprise business hospital information system, as it is called professionally. For now, however, we want to concentrate on some technical issues of databases.

In a hospital there are data, data, data, ...

In this classical image by (Shortliffe, Perrault, Wiederhold & Fagan, 2001) it becomes very obvious that databases are central components of a hospital information system. The next slide is very interesting: there we see a historical example from the "stone age" of computer science.

This picture by (Gardner, Pryor & Warner, 1999) is interesting insofar as it clearly shows a mega issue that persists to the present day: to integrate and fuse different data and to make them accessible to the clinician. While there is much research on the integration of heterogeneous information systems, a shortcoming lies in the integration of the available data. Just to clarify the difference between data integration and data fusion:

Data integration involves combining data residing in different distributed sources and providing users with a unified view of, and access to, these data. It has become the focus of extensive theoretical and practical work, and numerous open problems remain unsolved (Lenzerini, 2002). Data fusion is the process of merging multiple records representing the same real-world object into a single, consistent, accurate, and useful representation (Bleiholder & Naumann, 2008). The trend towards P4 medicine (Predictive, Preventive, Participatory, Personalized) has resulted in a sheer mass of generated (-omics) data; hence a main challenge is the integration and fusion of heterogeneous data sources, especially the integration of data from the clinical domain with sources from the biological domain.
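The difference between the two terms can be made concrete with a toy example. This is a minimal Python sketch under simplifying assumptions: the record fields, the source names `HIS` and `LAB`, and the conflict rule (first non-missing value wins) are all invented for illustration; real fusion needs far more careful conflict resolution.

```python
def integrate(*sources):
    """Data integration: one unified view over records from several sources."""
    unified = []
    for name, records in sources:
        for rec in records:
            unified.append({**rec, "source": name})  # keep provenance
    return unified


def fuse(records, key="patient_id"):
    """Data fusion: merge all records describing the same object into one.

    Conflict rule (an assumption): the first non-missing value wins.
    """
    fused = {}
    for rec in records:
        target = fused.setdefault(rec[key], {})
        for field, value in rec.items():
            if field == "source":
                continue
            if value is not None and target.get(field) is None:
                target[field] = value
    return fused


# Hypothetical records from two hypothetical systems.
his = [{"patient_id": "P1", "name": "Doe, J.", "blood_type": None}]
lab = [{"patient_id": "P1", "name": None, "blood_type": "A+"},
       {"patient_id": "P2", "name": None, "blood_type": "0-"}]

view = integrate(("HIS", his), ("LAB", lab))   # three records, side by side
merged = fuse(view)                            # one record per patient
```

After integration the user still sees three separate records; after fusion there is exactly one consolidated record per patient identifier.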

Integration and data fusion for data analysis; the central goal is to support decision-making processes; data virtualization; abstract layer; business intelligence; service-oriented architecture.

A database (DB) is an organized collection of data held in a certain data structure (e.g. hash table, adjacency matrix, graph structure, etc.). A database management system (DBMS) is the software which operates the DB. Well-known DBMSs include: Oracle, IBM DB2, Microsoft SQL Server, Microsoft Access, MySQL, SQLite. Examples of graph databases include InfoGrid, Neo4j, and BrightstarDB. A DB is generally not portable across DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC. Database system (DBS) = DB + DBMS. The term database system emphasizes that data is managed in terms of accuracy, availability, resilience, and usability. A data warehouse (DWH) is an integrated repository used for reporting and long-term storage of analysis data. Data marts (DM) are access layers of a DWH and are used as temporary repositories for data analysis.
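To illustrate DB versus DBMS, here is a minimal sketch using SQLite (one of the DBMSs listed above) through Python's standard `sqlite3` module. The tables `patient` and `report` and their contents are invented for this example and do not reflect any real hospital schema.

```python
import sqlite3

# The DBMS (SQLite) creates and operates an in-memory DB.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: patients and their free-text reports.
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE report ("
    " id INTEGER PRIMARY KEY,"
    " patient_id INTEGER NOT NULL REFERENCES patient(id),"
    " text TEXT NOT NULL)"
)

conn.execute("INSERT INTO patient (id, name) VALUES (?, ?)", (1, "Doe, J."))
conn.executemany(
    "INSERT INTO report (patient_id, text) VALUES (?, ?)",
    [(1, "Chest X-ray: no findings."), (1, "Follow-up: unchanged.")],
)

# Standard SQL is what lets applications talk to different DBMSs.
rows = conn.execute(
    "SELECT p.name, COUNT(r.id) "
    "FROM patient p JOIN report r ON r.patient_id = p.id "
    "GROUP BY p.id"
).fetchall()
print(rows)
```

The query itself is plain SQL; the file format and storage engine underneath belong to the DBMS, which is why the DB as such is not portable.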

Recommended reading includes (Plattner, 2013) and (Robinson, Webber & Eifrem, 2013): Robinson, I., Webber, J. & Eifrem, E. 2013. Graph Databases, O'Reilly Media. Plattner, H. 2013. A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases, Heidelberg New York Dordrecht London, Springer. One of the standard textbooks is the 6th edition of "Database System Concepts" by (Silberschatz, Korth & Sudarshan, 2010).

A DWH is an integrated system, specifically designed for enterprise business decision support, and can be used in hospitals and in biomedical applications. In Slide 4-13 we see an example of a hospital data warehouse: On the left are the (heterogeneous) data sources, such as PACS (Picture Archiving & Communication System) and RIS (Radiological Information System), and, apart from the core HIS, some special databases which can also include proprietary and legacy systems. For the data staging and area servers, the Common Object Request Broker Architecture (CORBA) is used, a standard defined by the Object Management Group (OMG) that supports interoperability across multiple platforms (Zhang, Zhang, Tjandra & Wong, 2004). This is a standard hospital information architecture and, typically, comes with no integration of laboratory data sources and, above all, no omics data integration, for example from pathology or a biobank.

A DWH can be subdivided into so-called data marts (DM), which can be seen as specific access layers of a DWH, each oriented to a specific team. Slide 4-14 shows the architecture of the Mayo Clinic DWH, in which each component of the architecture is instantiated incrementally on demand. Data integration proceeds from left to right (leftmost you see the primary data sources; moving right, the data are integrated into staging and replication services, with further refinement). The layers are: 1) Subjects = the highest-level areas that define the activities of the enterprise (e.g. Individual); 2) Concepts = the collections of data that are contained in one or more subject areas (e.g. Patient, Provider, Referrer, etc.); 3) Business Information Models = the organization of the data that support the processes and workflows of the enterprise's defined Concepts (Chute, Beck, Fisk & Mohr, 2010).
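The idea of a data mart as a team-specific access layer can be sketched with an SQL view over a warehouse table. This is only an illustration, assuming a single invented fact table `encounter` and an invented view `radiology_mart`; it says nothing about how the Mayo Clinic DWH is actually implemented.

```python
import sqlite3

dwh = sqlite3.connect(":memory:")

# Hypothetical warehouse fact table (all values invented).
dwh.execute(
    "CREATE TABLE encounter (patient_id TEXT, department TEXT, year INTEGER, cost REAL)"
)
dwh.executemany(
    "INSERT INTO encounter VALUES (?, ?, ?, ?)",
    [
        ("P1", "radiology", 2014, 120.0),
        ("P2", "radiology", 2015, 80.0),
        ("P3", "cardiology", 2015, 300.0),
        ("P1", "radiology", 2015, 150.0),
    ],
)

# The "data mart": a pre-aggregated access layer for one team only.
dwh.execute(
    "CREATE VIEW radiology_mart AS "
    "SELECT year, COUNT(*) AS encounters, SUM(cost) AS total_cost "
    "FROM encounter WHERE department = 'radiology' GROUP BY year"
)

mart = dwh.execute(
    "SELECT year, encounters, total_cost FROM radiology_mart ORDER BY year"
).fetchall()
print(mart)
```

The radiology team queries only its own aggregated slice, while the warehouse table underneath remains the single integrated repository.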

Cloud computing is a good example of software as a service: flexible space via the network. This reminds us of the early days of computing, with mainframes and thin-client terminals.

A standard environment for the production and processing of genomic data can be seen in Slide 4-15: Sequencing labs submit their data to large databases, e.g. GenBank at the National Center for Biotechnology Information (NCBI); the EMBL database of the European Bioinformatics Institute; the DNA Data Bank of Japan (DDBJ); the Short Read Archive (SRA); the Gene Expression Omnibus (GEO); or the microarray database ArrayExpress. These maintain, organize and distribute the sequencing data. Most users access the information either through web-based applications or through integrators, such as Ensembl, the University of California at Santa Cruz (UCSC) Genome Browser, or Galaxy. The end users have to download genomic data from these primary and secondary sources (Stein, 2010).

Remember: Sequencing is the process of determining the precise order of nucleotides within a DNA molecule, i.e. the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery, and produces large data sets. Sequencing has become indispensable for basic biological research and in numerous applied fields such as diagnostics and biotechnology.

Note: A biobank is a physical place which stores biological specimens and, in some cases, also data (Roden et al., 2008).

Here we see a cloud-based genome informatics system. Instead of separate genome data sets stored at various locations, the data sets are stored in the cloud as virtual databases. Web services run on top of these data sets, including the primary archives and the integrators, running as virtual machines within the cloud. Casual users, who are accustomed to accessing the data via NCBI, DDBJ, Ensembl or UCSC, work as usual; the fact that these servers are located inside the cloud is invisible to them. Power users can continue to download the data, but have an attractive alternative: instead of moving the data to the computational cluster, they move the computational cluster to the data (Stein, 2010).

Note: Cloud computing is based on the sharing of resources to achieve coherence and economies of scale over a network (similar to the electricity grid). At the foundation of cloud computing is the broader concept of converged infrastructure and shared services. Cloud providers offer their services according to several fundamental models: 1) Infrastructure as a service (IaaS), 2) Platform as a service (PaaS), 3) Software as a service (SaaS).

Just one example of a cloud-based service: The Master Index is the core entity of the PACS Cloud and contains information about the other modules, including Gateways and Cloud Slaves (repository and database). It also provides authentication services to institutional gateways, and all identifiable information related to patients is stored in a master index database, which is fundamental for ensuring confidentiality and privacy. The Cloud Slaves provide, on the one hand, storage of sightless data (object repositories) and, on the other hand, a database containing all non-identifiable metadata extracted from DICOM studies, i.e. the most demanding task concerning computational power (Bastiao-Silva, Costa, Silva & Oliveira, 2011).

We have to distinguish between federated data and warehoused data. A federated database system is a meta-database management system which transparently maps multiple heterogeneous and autonomous database systems into a single federated database. This can be a "virtual database", without the physical data integration found in data warehouses. In the slide we can see the data integration architecture on the y-axis and the knowledge representation methodologies on the x-axis, and where current data integration systems lie along this continuum. The essence of this image is that there is no "best solution": A system designed to have full control of data and fast queries can have difficulty expressing and integrating complex biological concepts. Systems that employ highly expressive knowledge representation methodologies such as ontologies are better able to represent and integrate complex biological concepts, but have much less tractable queries (Louie et al., 2007).
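The "virtual database" idea can be sketched as a thin layer that forwards a query to autonomous member databases and merges the answers, without copying the data into one place. This is a deliberately simplified Python sketch: the class `FederatedView`, the table `sample` and all rows are invented, and both members share one schema, whereas a real federated system must also map heterogeneous schemas.

```python
import sqlite3


def make_source(rows):
    """Create one autonomous member database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sample (patient_id TEXT, gene TEXT)")
    db.executemany("INSERT INTO sample VALUES (?, ?)", rows)
    return db


class FederatedView:
    """Forwards one query to every member database and merges the answers."""

    def __init__(self, members):
        self.members = members  # mapping: source name -> connection

    def genes_for(self, patient_id):
        hits = []
        for name, db in self.members.items():
            cursor = db.execute(
                "SELECT gene FROM sample WHERE patient_id = ?", (patient_id,)
            )
            hits.extend((name, gene) for (gene,) in cursor)
        return sorted(hits)


# Two independent sources; the data stay where they are.
clinic = make_source([("P1", "BRCA1")])
biobank = make_source([("P1", "TP53"), ("P2", "EGFR")])
fed = FederatedView({"clinic": clinic, "biobank": biobank})
print(fed.genes_for("P1"))
```

In a warehouse the two sources would first be loaded into one integrated store; here each query pays the cost of visiting every member, which is one side of the trade-off described above.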

Obviously there is a difference between the databases for the hospital information system (HIS) and the databases used for scientific work.


Whereas databases for use in the HIS are process-centered and central to the electronic patient record, biomedical databases are libraries of all sorts of life science data, collected from scientific experiments and computational analyses. Such databases contain experimental biological data from clinical work, genomics, proteomics, metabolomics, microarray gene expression, phylogenetics, pharmacogenomics, etc.

Examples:
Text: e.g. PubMed, OMIM (Online Mendelian Inheritance in Man);
Sequence data: e.g. Entrez, GenBank (DNA), UniProt (protein);
Protein structures: e.g. PDB, Structural Classification of Proteins (SCOP), CATH (Protein Structure Classification).

An overview can be found in (Masic & Milinovic, 2012), available online as open access via: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3544328

Note: Pharmacogenomics is the study of how genetic makeup affects an individual's response to drugs; it deals with the influence of genetic variation on drug response in patients by correlating gene expression or single-nucleotide polymorphisms with efficacy and toxicity. The central aim is to optimize drug therapy to ensure maximum effectiveness with minimal adverse effects; this is a core element on the way towards personalized medicine.


A good video can be seen here: https://www.youtube.com/watch?v=DSHhep_w6pk

The Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies helps students and researchers understand all aspects of biomedicine, from protein synthesis to health and disease. As a member of the wwPDB, the RCSB PDB curates and annotates PDB data.

The RCSB PDB builds upon the data by creating tools and resources for research and education in molecular biology, structural biology, computational biology, and beyond.

Remember: Proteins are the molecules used by the cell for performing and controlling cellular processes, including degradation and biosynthesis of molecules, physiological signaling, energy storage and conversion, formation of cellular structures, etc. Protein structures are determined with crystallographic X-ray methods or by nuclear magnetic resonance spectroscopy. Once the atomic coordinates of the protein structure have been determined, a table of these coordinates is deposited in the Protein Data Bank (PDB), an international repository for 3D structure files: http://www.rcsb.org/pdb/ This database is handled by the RCSB (Research Collaboratory for Structural Bioinformatics) at Rutgers University and UC San Diego. The PDB is the most important source of protein structures. Before a new protein structure is added, the data are carefully examined to guarantee the quality of the structure. The PDB data file contains, among other things, the coordinates of all atoms of the protein (Wiltgen & Holzinger, 2005), (Wiltgen, Holzinger & Tilz, 2007).
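To make the idea of a "table of coordinates" concrete, here is a minimal Python sketch that reads the x, y, z coordinates from ATOM records. It assumes the fixed-column layout of the legacy PDB text format (atom name in columns 13-16, coordinates in columns 31-54). The helper names format_atom and parse_atoms and the two sample atoms are invented for illustration and are not taken from a real PDB entry.

```python
def format_atom(serial, name, res, chain, resseq, x, y, z):
    """Build one ATOM record in the fixed-column layout (illustrative helper)."""
    return (
        "ATOM".ljust(6)      # columns 1-6: record type
        + f"{serial:>5}"     # columns 7-11: atom serial number
        + " "                # column 12: blank
        + f"{name:<4}"       # columns 13-16: atom name
        + " "                # column 17: alternate location (blank here)
        + f"{res:>3}"        # columns 18-20: residue name
        + " "                # column 21: blank
        + f"{chain:1}"       # column 22: chain identifier
        + f"{resseq:>4}"     # columns 23-26: residue sequence number
        + " " * 4            # columns 27-30: insertion code and padding
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"  # columns 31-54: x, y, z coordinates
    )


def parse_atoms(pdb_text):
    """Extract atom name, residue, chain and coordinates from ATOM/HETATM lines."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "residue": line[17:20].strip(),
                "chain": line[21],
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


# Two made-up atoms, only to exercise the parser.
sample = "\n".join([
    "HEADER    TOY EXAMPLE",
    format_atom(1, " N", "MET", "A", 1, 43.958, -5.980, 26.690),
    format_atom(2, " CA", "MET", "A", 1, 44.017, -4.550, 27.075),
    "END",
])
atoms = parse_atoms(sample)
for atom in atoms:
    print(atom["name"], atom["residue"], atom["chain"], atom["xyz"])
```

Slicing by column rather than splitting on whitespace is deliberate in this sketch: in the fixed-column format, adjacent numeric fields may touch each other without a separating blank.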


A PDB structure entry should be cited with its PDB ID and primary reference. For example:
PDB ID: 102L
D.W. Heinz, W.A. Baase, F.W. Dahlquist, B.W. Matthews (1993) How Amino-Acid Insertions are Allowed in an Alpha-Helix of T4 Lysozyme. Nature 361: 561.

An entry without a published reference can be cited with the PDB ID, author names, and title:
PDB ID: 1CI0
W. Shi, D.A. Ostrov, S.E. Gerchman, V. Graziano, H. Kycia, B. Studier, S.C. Almo, S.K. Burley, New York Structural GenomiX Research Consortium (NYSGXRC). The Structure of PNP Oxidase from S. cerevisiae.

An entry may also be referenced using its Digital Object Identifier (DOI). The DOIs for PDB entries all have the same format: 10.2210/pdbXXXX/pdb, where XXXX should be replaced with the desired PDB ID. The DOI can be used as part of a URL to obtain the data file (http://dx.doi.org/10.2210/pdb4hhb/pdb), or can be entered in a DOI resolver (such as http://www.crossref.org/) to automatically link to pdb4hhb.ent.gz on the main PDB ftp archive (ftp://ftp.wwpdb.org). For example, the DOI for PDB entry 4HHB is "10.2210/pdb4hhb/pdb". This links directly to the entry in the PDB file format on the FTP server.

Images from Structure Summary pages should cite the RCSB PDB and the PDB entry:
Image from the RCSB PDB (www.rcsb.org) of PDB ID 1BNA (H.R. Drew, R.M. Wing, T. Takano, C. Broka, S. Tanaka, K. Itakura, R.E. Dickerson (1981) Structure of a B-DNA dodecamer: conformation and dynamics. Proc. Natl. Acad. Sci. USA 78: 2179-2183).

Images created using PDB data and other software should cite the PDB ID and the molecular graphics program used:
Image of 1AOI (K. Luger, A.W. Mader, R.K. Richmond, D.F. Sargent, T.J. Richmond (1997) Structure of the core particle at 2.8 Å resolution. Nature 389: 251-260) created with Protein Workshop (J.L. Moreland, A. Gramada, O.V. Buzko, Q. Zhang, P.E. Bourne (2005) The Molecular Biology Toolkit (MBT): a modular platform for developing molecular visualization applications. BMC Bioinformatics 6: 21).
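The DOI pattern described above is regular enough to be written as a small helper. The following Python sketch builds the DOI and the resolver URL from a PDB ID. The function names are hypothetical, and the ID check (one digit followed by three alphanumeric characters) is an assumption about classic four-character PDB IDs, not something stated in the citation guidelines.

```python
import re


def pdb_doi(pdb_id):
    """Return the DOI of a PDB entry, following the pattern 10.2210/pdbXXXX/pdb."""
    pdb_id = pdb_id.strip().lower()
    # Assumption: a classic PDB ID is a digit followed by three alphanumerics.
    if not re.fullmatch(r"[0-9][a-z0-9]{3}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id}/pdb"


def pdb_doi_url(pdb_id, resolver="http://dx.doi.org/"):
    """Prefix the DOI with a resolver address to obtain a clickable URL."""
    return resolver + pdb_doi(pdb_id)


print(pdb_doi("4HHB"))
print(pdb_doi_url("4HHB"))
```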


Remember the structural dimensions which we discussed in Lecture 1 and Lecture 2. This slide by (Kampen, 2013) is a very nice overview of various databases addressing the different microscopic dimensions. Additionally, the data at the level of the hospital information systems are added, so that you have a good summary of the aforementioned. If we set aside literature databases and ontologies (in the upper right corner of this slide), we start with:
Genome databases: Ensembl http://www.ensembl.org/index.html
Nucleotide sequence: EMBL-Bank http://www.ebi.ac.uk/ena/
Gene expression: ArrayExpress http://www.ebi.ac.uk/arrayexpress
Proteomes: UniProt http://www.uniprot.org/
Proteins: InterPro http://www.ebi.ac.uk/interpro/
Protein structure: PDB http://www.rcsb.org/pdb/home/home.do
Protein interactions: IntAct http://www.ebi.ac.uk/intact/
Chemical entities: ChEMBL https://www.ebi.ac.uk/chembl/
Pathways: Reactome http://www.reactome.org/
Systems: BioModels http://www.ebi.ac.uk/biomodels-main/


Ensembl (not to be confused with "ensemble" ;-) is a good example of a genome database. It is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute, launched in 1999 in response to the imminent completion of the Human Genome Project (Flicek et al., 2011). Its aim remains to provide a centralized resource for geneticists and molecular biologists studying the genomes of our own species and of other vertebrates and model organisms. Ensembl provides one of several well-known genome browsers for the retrieval of genomic information.


ArrayExpress is a database of functional genomics experiments that can be queried and whose data can be downloaded. It includes gene expression data from microarray and high-throughput sequencing studies. Data are collected according to the MIAME and MINSEQE standards. Experiments are submitted directly to ArrayExpress or are imported from the NCBI GEO database.

MIAME = Minimum Information About a Microarray Experiment. This is the information needed to interpret the results of the experiment unambiguously and potentially to reproduce the experiment (Brazma et al., 2001). The six most critical elements contributing towards MIAME are:
1) The raw data for each hybridisation (e.g., CEL or GPR files);
2) The final processed (normalised) data for the set of hybridisations in the experiment;
3) The essential sample annotation, including experimental factors and their values;
4) The experimental design, including sample-data relationships;
5) The annotation of the array (e.g., gene identifiers, genomic coordinates, probe oligonucleotide sequences or reference commercial array catalog number); and
6) The laboratory and data processing protocols (e.g., which normalisation method has been used to obtain the final processed data);
see: http://www.mged.org/Workgroups/MIAME/miame.html


IntAct is an open-source database for protein-protein interactions. The web interface provides both textual and graphical representations of such protein interactions, and allows exploring interaction networks in the context of the Gene Ontology (GO) annotations of the interacting proteins. Moreover, a web service allows direct computational access to retrieve interaction networks in XML format. IntAct contains binary and complex interactions imported from the literature and curated in collaboration with the Swiss-Prot team, making intensive use of controlled vocabularies to ensure data consistency (Hermjakob et al., 2004). http://www.ebi.ac.uk/intact


The BioModels Database is a freely accessible online resource for storing, viewing, retrieving, and analyzing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behavior of each simulation model are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies and linked to relevant data resources. Models can be examined online or downloaded in various formats, and reaction network diagrams can be generated from the models in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large-scale models into smaller sub-models. The system provides a range of web services that external software systems can use to access up-to-date data from the database (Li et al., 2010). http://www.ebi.ac.uk/biomodels/

Note: Quantitative models of biochemical and cellular systems are used to answer research questions in the biological sciences, and digital modeling is of growing interest in molecular and systems biology. A well-known example is the Virtual Human (Kell, 2007).


The largest monastery library in the world: a good example of a well-defined knowledge space.

Yes, perfectly correct: this Golden Retriever is bringing back the wooden stick; he is retrieving it. This is exactly what the word "to retrieve" means: bringing something back.


Please remember the basic difference between retrieval and discovery: retrieval is bringing back an already known object, whereas discovery is finding something which was previously unknown. In other words: retrieval deals with known objects, and discovery/mining is finding new things, in our case new insight (sensemaking) into data. Slide 4-26 makes it clear:

Maimon & Rokach (2010) define Knowledge Discovery in Databases (KDD) as an automatic, exploratory analysis and modeling of large data repositories and the organized process of identifying valid, novel, useful and understandable patterns from large and complex data sets. Data Mining (DM) is the core of the KDD process (Witten, Frank & Hall, 2011). The term KDD actually goes back to the machine learning and Artificial Intelligence (AI) community (Piatetsky-Shapiro, 2000). Interestingly, the first application in this area was again in medical informatics: the program Rx was the first that analyzed data from about 50,000 Stanford patients and looked for unexpected side effects of drugs (Blum & Wiederhold, 1985). The term really became popular with the paper by Fayyad, Piatetsky-Shapiro & Smyth (1996), who described the KDD process as consisting of 9 subsequent steps:


1. Learning from the application domain: includes understanding relevant previous knowledge, the goals of the application and a certain amount of domain expertise;
2. Creating a target dataset: includes selecting a dataset or focusing on a subset of variables or data samples on which discovery shall be performed;
3. Data cleansing (and preprocessing): includes removing noise or outliers, strategies for handling missing data, etc.;
4. Data reduction and projection: includes finding useful features to represent the data, dimensionality reduction, etc.;
5. Choosing the function of data mining: includes deciding the purpose and principle of the model for mining algorithms (e.g., summarization, classification, regression and clustering);
6. Choosing the data mining algorithm: includes selecting method(s) to be used for searching for patterns in the data, such as deciding which models and parameters may be appropriate (e.g., models for categorical data are different from models on vectors over reals) and matching a particular data mining method with the criteria of the KDD process;
7. Data mining: searching for patterns of interest in a representational form or a set of such representations, including classification rules or trees, regression, clustering, sequence modeling, dependency and line analysis;
8. Interpretation: includes interpreting the discovered patterns and possibly returning to any of the previous steps, as well as possible visualization of the extracted patterns, removing redundant or irrelevant patterns and translating the useful ones into terms understandable by users;
9. Using discovered knowledge: includes incorporating this knowledge into the performance of the system, taking actions based on the knowledge or documenting it and reporting it to interested parties, as well as checking for, and resolving, potential conflicts with previously believed knowledge (Holzinger, 2013).
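The steps above can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the records are invented toy data, the mining function is fixed to classification (step 5), and the "algorithm" (step 6) is a single-feature threshold rule, chosen only because it fits in a few lines. It is not the method of any of the cited authors.

```python
# Step 2: target dataset (invented toy records, not real patient data).
records = [
    {"age": 34, "dose_mg": 10, "side_effect": False},
    {"age": 71, "dose_mg": 40, "side_effect": True},
    {"age": 58, "dose_mg": None, "side_effect": True},   # missing value
    {"age": 45, "dose_mg": 20, "side_effect": False},
    {"age": 66, "dose_mg": 35, "side_effect": True},
    {"age": 29, "dose_mg": 15, "side_effect": False},
]


def cleanse(rows):
    """Step 3: handle missing data, here simply by dropping incomplete records."""
    return [r for r in rows if all(v is not None for v in r.values())]


def project(rows, feature, label):
    """Step 4: reduce each record to one feature and the class label."""
    return [(r[feature], r[label]) for r in rows]


def mine_threshold_rule(pairs):
    """Steps 5-7: search for the threshold with the fewest misclassifications,
    using the rule 'label is True when value >= threshold'."""
    best = None
    for threshold in sorted({value for value, _ in pairs}):
        errors = sum((value >= threshold) != label for value, label in pairs)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


def interpret(feature, threshold, errors, n):
    """Step 8: translate the pattern into terms a user can read."""
    return (f"side effect predicted when {feature} >= {threshold} "
            f"({errors} of {n} records misclassified)")


clean = cleanse(records)
pairs = project(clean, "dose_mg", "side_effect")
threshold, errors = mine_threshold_rule(pairs)
print(interpret("dose_mg", threshold, errors, len(pairs)))
```

Steps 1 and 9 have no code counterpart here: understanding the domain and acting on the result remain human tasks.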

In information retrieval, a query q is defined as a formulation q = (N, L); matching q against an index I, written Matching(q, I), retrieves the relevant data to satisfy the search query (Baeza-Yates & Ribeiro-Neto, 2011).


Please remember the difference between data objects and information objects: data is an abstract representation in the computational space, whereas information is perceivable in the cognitive space. (Note that this does not mean that information is automatically knowledge; to gain knowledge we must use both our perception and our cognition, i.e. human intelligence.)


An excellent starting point for distinguishing between data retrieval (DR) and information retrieval (IR) is the work of (Van Rijsbergen, 1979): the most important difference is that the data model in DR is deterministic, whereas we speak about probable information in the IR model; hence information retrieval is probabilistic (Simonic & Holzinger, 2010).
* Monothetic = type in which all members are identical on all characteristics;
** Polythetic = type in which all members are similar, but not identical.
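The contrast can be sketched in a few lines of Python. The record keys and texts are invented, and the relevance score (the share of query words found in a document) is only a stand-in for a proper probabilistic model; it is meant to show the difference between an exact-match lookup and a ranked partial match, nothing more.

```python
# Invented toy collection: record key -> free-text finding.
records = {
    "P001": "acute myocardial infarction, anterior wall",
    "P002": "chronic heart failure with reduced ejection fraction",
    "P003": "suspected myocardial ischemia, stress test ordered",
}


def data_retrieval(key):
    """DR: deterministic exact match; the answer is the stored record or nothing."""
    return records.get(key)


def information_retrieval(query):
    """IR: partial match; documents are ranked by a relevance estimate."""
    terms = query.lower().split()
    scored = []
    for key, text in records.items():
        words = set(text.lower().replace(",", " ").split())
        score = sum(t in words for t in terms) / len(terms)
        if score > 0:
            scored.append((score, key))
    # Highest score first; ties are broken by key to keep the order stable.
    return [key for score, key in sorted(scored, key=lambda s: (-s[0], s[1]))]


print(data_retrieval("P002"))
print(information_retrieval("myocardial infarction"))
```

In the DR case a slightly wrong key returns nothing at all, whereas the IR function still returns the partially matching document P003, only with a lower rank.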


IR can be defined as the recall of already existing information; it does not aim at the discovery of new structures, which is the goal of Knowledge Discovery and Data Mining (see → Lecture 6). As we have already heard several times, in hospital information systems most of the data consists of medical documents, which consist mostly of unstructured information: text. But what is text? From a computational perspective, text consists of sequences of character strings, the syntax (Hotho, Nürnberger & Paaß, 2005); hence it is an abstract representation of natural language, and the challenges lie in semantics (meaning). Text processing belongs to the field of natural language processing (NLP), which is highly interdisciplinary, dealing with the interaction between the cognitive space (natural languages) and the computational space (formal languages). As such, NLP is closely related to HCI. Text mining is a subfield of data mining. The original goal of IR was to find documents which contain answers to questions, not to find the answers themselves (Hearst, 1999). For this purpose statistical measures and methods are used, and we need a formal description first.


This is the general principle: the end user formulates a query via the user interface, in the form of a text operation (the "user need"). The next step is the representation of the documents (logical document view D in the formal model in → Slide 4-30) and the representation of the reasoning strategy, the query logical view Q (compare with → Slide 4-30 and → Slide 4-31). The result is a ranking of the retrieved documents, which is displayed via the user interface.


Modeling the IR process is complex, because we are dealing with imprecise, vague and uncertain elements; it is difficult to formalize due to the high influence of human factors, i.e. relevance and information needs, which are highly subjective and context specific. However, in the definition of any IR model we can identify some common aspects (Canfora & Cerulo, 2004). The first step is the representation of documents and information needs. From these representations a reasoning strategy can be defined, which solves a representation similarity problem to compute the relevance of documents with respect to the queries. Various strategies have been introduced with the aim of improving the IR process. We classify these methodologies under two main aspects: Representation (query & document, see → Slide 4-33) and Reasoning (application of diverse methods, see → Slide 4-34). Let the IR model be a quadruple

Eq. 4-1   IR = {D, Q, F, R(q_i, d_j)}

D is a set composed of logical views (representation component) of the documents within a collection;
Q is a set of logical views (representation component) of the user information needs (these are called queries);
F is a framework for modeling document representations, queries and their relationships (reasoning component); this includes sets and Boolean relations, vectors and linear algebra operations, sample spaces and probability distributions;
R(q_i, d_j) is a ranking function (→ Slide 4-31) that associates a real number with a query representation q_i ∈ Q and a document representation d_j ∈ D. Such a ranking defines an ordering among the documents with regard to the query q_i.

The end user in → Slide 4-29 formulates a query in the form of a text operation; the next step is the representation (logical view D) of the documents and the representation of the reasoning strategy; both logical views D and Q (compare with Slide 4-31) result in a ranking of the retrieved documents.
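As a rough illustration of Eq. 4-1, the quadruple can be mirrored in a small Python data structure. This is a sketch under simplifying assumptions: a logical view is reduced to a set of index terms, the framework F is left implicit (plain set operations), and the ranking function overlap is an invented example of R(q_i, d_j), not one prescribed by the cited literature. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

View = FrozenSet[str]  # assumed logical view: a set of index terms


@dataclass
class IRModel:
    D: Dict[str, View]                 # logical views of the documents
    Q: Dict[str, View]                 # logical views of the information needs
    R: Callable[[View, View], float]   # ranking function R(q_i, d_j)

    def rank(self, query_id: str) -> List[Tuple[str, float]]:
        """Order all documents by R for the given query, best first."""
        q = self.Q[query_id]
        scores = [(doc_id, self.R(q, d)) for doc_id, d in self.D.items()]
        return sorted(scores, key=lambda s: (-s[1], s[0]))


def overlap(q: View, d: View) -> float:
    """A deliberately simple R: the fraction of query terms found in the document."""
    return len(q & d) / len(q) if q else 0.0


model = IRModel(
    D={
        "d1": frozenset({"protein", "structure", "database"}),
        "d2": frozenset({"gene", "expression", "database"}),
        "d3": frozenset({"hospital", "workflow"}),
    },
    Q={"q1": frozenset({"protein", "database"})},
    R=overlap,
)
ranking = model.rank("q1")
print(ranking)
```

Because R is passed in as a function, the same structure could hold a Boolean, vector-space or probabilistic ranking without changing D and Q.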


The logical views D and Q result in the ranking function R(q_i, d_j) according to (Baeza-Yates & Ribeiro-Neto, 2011).

Spoken: R of q subscript i and d subscript j.


Guess which algorithm this is? A short description can be found in Hastie, T., Tibshirani, R. & Friedman, J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, New York, Springer.


Yes! There are a lot of different methods, each with particular advantages and disadvantages; we cannot discuss much here, but we can get a rough overview.


The representation component is an essential part of every IR system, as it is the representation of the information itself (visible to the user): information can be processed only if it is represented in an appropriate way. Queries are the representation of the information needs of a user.

Note: A text can be characterized by four attributes: syntax, structure, semantics, and style. A text has a given syntax and a structure, which are usually dictated by the application or by the person who created it. Text also has semantics, specified by the author of the document. Additionally, a document may have a presentation style associated with it, which specifies how it should be displayed or printed. In many approaches to text representation the style is coupled with the document syntax and structure (e.g. LaTeX). XML separates the representation of syntax and structure, defined either by a DTD or an XSD, from style, which is captured by XSL (Canfora & Cerulo, 2004).

Note: An n-gram is a contiguous subsequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs, according to the application. An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram"; size 4 is a "four-gram" and size 5 or more is simply called an "n-gram". Some language models built from n-grams are "(n−1)-order Markov models". An n-gram model is a type of probabilistic model for predicting the next item in such a sequence. n-gram models are used in various areas of statistical natural language processing and genetic sequence analysis.
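The n-gram definition translates almost directly into code. The following Python sketch extracts n-grams from any sequence (letters, words or base pairs) and builds a bigram count table, which can be read as a first-order Markov model for guessing the next item. The function names and the example sequences are chosen for illustration only.

```python
from collections import Counter, defaultdict


def ngrams(items, n):
    """Return all contiguous subsequences of length n as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    items = list(items)
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def next_item_counts(items):
    """Count, for each item, which items follow it (a bigram table)."""
    counts = defaultdict(Counter)
    for previous, following in ngrams(items, 2):
        counts[previous][following] += 1
    return counts


# Base pairs as items: trigrams of a short DNA string.
print(ngrams("GATTACA", 3))

# Words as items: bigrams of a sentence.
print(ngrams("the patient has fever".split(), 2))

# After "G", the only observed next item in this toy string is "A".
model = next_item_counts("GATTACA")
print(model["G"].most_common(1))
```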


Deep learning algorithms are based on distributed representations, with the assumption that observed data is generated by the interactions of many different factors on different levels. Deep learning adds the assumption that these factors are organized into multiple levels, corresponding to different levels of abstraction or composition; varying numbers of layers and layer sizes can be used to provide different amounts of abstraction. Bengio, Y.; Courville, A.; Vincent, P. (2013). "Representation Learning: A Review and New Perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8): 1798-1828.

Reasoning refers to the set of methods, models, and technologies used to match document and query representations in the retrieval task. Closely related to the reasoning component is the concept of relevance. The primary goal of an IR system is to retrieve the documents relevant to a query. The reasoning component defines the framework to measure the relevance between documents and queries using their representations (Canfora & Cerulo, 2004). Google, for example, uses a keyword-based vector space model (see → Slide 4-38) along with graph-based probability theories and fuzzy set theories. Slide 4-35 shows a concise overview of some selected methods, according to various document properties.


There are many methods of IR; for details consult a standard reference, e.g. Baeza-Yates & Ribeiro-Neto (2011). Set-theoretic approaches include the classic set-based Boolean model, the Extended Boolean model and the fuzzy approach; algebraic approaches include the Generalized Vector Model, Latent Semantic Indexing (LSI) and neural networks; and the probabilistic approach includes Bayesian networks, language models and inference networks. We will discuss only a few, and these very briefly, so that you have a quick overview: the set-theoretic approach (Boolean model) in Slide 4-36 and Slide 4-37; the Vector Space Model in Slide 4-38 to Slide 4-42; and the probabilistic model in Slide 4-43 to Slide 4-44.


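Documents and queries are represented as sets of index terms; queries are Boolean expressions (AND, OR, NOT). For the Boolean model, the index term weight variables are binary, i.e. w_{i,j} ∈ {0, 1}. A query q is a conventional Boolean expression. Let q_dnf be the disjunctive normal form of the query q. Further, let q_cc be any of the conjunctive components of q_dnf. The similarity of a document d_j to the query q is defined as follows: sim(d_j, q) = 1 if there exists a conjunctive component q_cc of q_dnf whose binary term weights all agree with those of d_j, and sim(d_j, q) = 0 otherwise.

If sim(d_j, q) = 1, then the Boolean model predicts that the document d_j is relevant to the query q. Otherwise the prediction is that the document is not relevant. For details please refer to (Baeza-Yates & Ribeiro-Neto, 2011).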
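A minimal Python sketch of this matching rule follows. It assumes that a document is given as the set of index terms it contains and that the query has already been converted to disjunctive normal form by hand; each conjunctive component maps every query term to 1 (must occur) or 0 (must not occur). The function name, the example query and the index terms are invented for illustration.

```python
def sim_boolean(doc_terms, q_dnf):
    """Return 1 if some conjunctive component of q_dnf agrees with the document, else 0.

    doc_terms: set of index terms present in the document (binary weights).
    q_dnf: list of conjunctive components, each a dict term -> 0 or 1.
    """
    for q_cc in q_dnf:
        if all((term in doc_terms) == bool(required) for term, required in q_cc.items()):
            return 1
    return 0


# q = diabetes AND (insulin OR NOT metformin), written out in DNF over
# the term order (diabetes, insulin, metformin): (1,1,1) OR (1,1,0) OR (1,0,0).
q_dnf = [
    {"diabetes": 1, "insulin": 1, "metformin": 1},
    {"diabetes": 1, "insulin": 1, "metformin": 0},
    {"diabetes": 1, "insulin": 0, "metformin": 0},
]

documents = {
    "d1": {"diabetes", "insulin"},
    "d2": {"diabetes", "metformin"},
    "d3": {"insulin"},
    "d4": {"diabetes"},
}
results = {name: sim_boolean(terms, q_dnf) for name, terms in documents.items()}
print(results)
```

The result is strictly 0 or 1 for every document, which makes the model's main limitation visible: there is no notion of a partial match and therefore no ranking.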


The Boolean model has several advantages: it is easy to understand, its formalism is exact, and the query language is expressive. However, it also has serious disadvantages: there are no partial matches, the "bag-of-words" representation does not accurately consider the semantics of documents (Vallet, Fernández & Castells, 2005), the query language is complicated, and finally the retrieved documents cannot be ranked.

The Extended Boolean Model (EBM) by (Salton, Fox & Wu, 1983) overcomes some of these disadvantages by making use of partial matching and term weights, similar to the vector space model. Moreover, vector-processing systems suffer from one major disadvantage: the structure inherent in the standard Boolean query formulation is absent. The EBM therefore combines the characteristics of the Vector Space Model with the properties of Boolean algebra. Hence, the EBM can also be applied when the initial query statements are available as natural language formulations of user needs, rather than as conventional Boolean formulations.
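One common formulation of the EBM is the p-norm model, sketched below in Python. The sketch assumes that the term weights of a document are already normalised to the range [0, 1] and that all query terms count equally; the function names are hypothetical. With p = 1 both operators reduce to the plain average of the weights (vector-space-like behaviour), and as p grows they approach the strict Boolean OR and AND.

```python
def sim_or(weights, p=2.0):
    """p-norm similarity of a document to the query t1 OR t2 OR ... OR tn."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def sim_and(weights, p=2.0):
    """p-norm similarity of a document to the query t1 AND t2 AND ... AND tn."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


# A document that contains the first query term fully and the second not at all.
weights = [1.0, 0.0]
print(sim_or(weights, p=2.0), sim_and(weights, p=2.0))
print(sim_or(weights, p=1.0), sim_and(weights, p=1.0))
```

Unlike the strict Boolean model, a document matching only one of two AND-ed terms receives a score above zero, so partial matches can be ranked.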


The vector space model (VSM) represents documents as vectors in an m-dimensional space (Salton, Wong & Yang, 1975). Thus, documents can be compared by vector operations, and queries can be performed by encoding the query terms, in the same way as the documents, in a query vector. This query vector can be compared to each document, which returns a result list by ordering the documents according to the computed similarity. The main task of the vector space representation of documents is to find an appropriate encoding of the feature vector. Each element of a vector usually represents a word (see → Slide 4-40) of the document collection. The size of the vector is defined by the number of words in the complete document collection.

The easiest way of document encoding is to use binary term vectors, which means that a vector element is set to 1 if the corresponding word is used in the document and to 0 if it is not (Equation 4-4). This encoding results in a simple Boolean comparison. To improve performance, term weighting schemes are usually used, where the weights reflect the importance of a word in a specific document of the considered collection. Large weights are assigned to terms that are used frequently in relevant documents but rarely in the whole document collection (Salton & Buckley, 1988). Thus a weight w(d, t) for a term t in document d is computed as the term frequency tf(d, t) times the inverse document frequency idf(t), which describes the term specificity within the document collection.

The ranking can be made by using the cosine similarity (see → Slide 4-41). The cosine of the angle between two vectors is a measure of how "similar" they are, which in turn is a measure of the similarity of the underlying strings. If the vectors are of unit length, the cosine of the angle between them is simply the dot product of the vectors (Tata & Patel, 2007).
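The whole procedure (term weighting, query encoding, cosine ranking) fits into a short Python sketch. Two points are assumptions rather than statements from the text: the inverse document frequency is taken in its common form idf(t) = log(N / df(t)), with N documents in the collection and df(t) documents containing t, and the three toy documents are invented. Function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict name -> list of words. Returns vocabulary, idf table and weight vectors."""
    vocabulary = sorted({w for words in docs.values() for w in words})
    n_docs = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    idf = {t: math.log(n_docs / df[t]) for t in vocabulary}  # assumed idf form
    vectors = {}
    for name, words in docs.items():
        tf = Counter(words)
        vectors[name] = [tf[t] * idf[t] for t in vocabulary]  # w(d, t) = tf(d, t) * idf(t)
    return vocabulary, idf, vectors


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_words, docs):
    """Encode the query like a document and order the documents by cosine similarity."""
    vocabulary, idf, vectors = tfidf_vectors(docs)
    q_tf = Counter(query_words)
    q_vec = [q_tf[t] * idf[t] for t in vocabulary]
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda s: (-s[1], s[0]))


docs = {
    "d1": "protein structure database".split(),
    "d2": "gene expression database".split(),
    "d3": "protein protein interaction network".split(),
}
ranking = rank("protein interaction".split(), docs)
for name, score in ranking:
    print(name, round(score, 3))
```

Most entries of each weight vector are zero, since a document uses only a small part of the vocabulary; a production system would therefore store the vectors in a sparse structure rather than as full lists.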


As a result we get a matrix representation, and now we can apply vector algebra or, more precisely, linear algebra; here we are still in R^3. Mathematically, we can work in arbitrarily high-dimensional spaces. The major problem involved is the mapping back into R^2. One very positive aspect is that the resulting matrices are typically sparse, which saves a lot of computational power.


Turney, P. D. & Pantel, P. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, (1), 141-188.

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. Turney & Pantel (2010) survey the use of VSMs for semantic processing of text. They organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. They survey a broad range of applications in these three categories and take a detailed look at a specific open source project in each category. Their goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.


Page 64: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

The information need Q is expressed as a vector with m components; retrieval then ranks similar documents by their cosine similarity to Q in the m-dimensional vector space model.

The advantage of this method is that it is a simple mathematical model. The matrices are sparse (i.e. a favourable data structure). Retrieval can be performed in O(n), so ranking is relatively fast.

Disadvantage: the word order is lost (bag-of-words approach).

There are many further methods, e.g. Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA).
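The cosine ranking described above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions: raw term counts as weights, whitespace tokenization, and invented example documents. The function names are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Bag-of-words vector stored sparsely as {term: count}."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (similarity, document index) pairs, best match first."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(doc)), i) for i, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical toy corpus (invented for this example).
docs = ["fracture of the femur", "fever and cough", "femur fracture with fever"]
ranking = rank("femur fracture", docs)
```

Each document is visited once per query, which matches the linear retrieval cost mentioned above. The sketch also shows the bag-of-words drawback: `to_vector("femur fracture")` and `to_vector("fracture femur")` are identical, so word order has no influence on the ranking.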


Page 65: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

The advantages of the algebraic VSM include that it is easy to understand, partial matches are possible, documents can be sorted by rank, and it uses term-weighting schemes; on the other side there is a higher computational effort to calculate similarity, and the “bag-of-words” representation does not accurately consider the semantics of documents (Vallet, Fernández & Castells, 2005).


Page 66: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

For the probabilistic model, the index weight variables are all binary, i.e. ωij ∈ {0,1}, ωiq ∈ {0,1}. A query q is a subset of index terms. Let R be the set of documents known (or initially guessed) to be relevant. Let R̅ be the complement of R (this is the set of non-relevant documents). Let P(R|dj) be the probability that the document dj is relevant to the query q and P(R̅|dj) be the probability that dj is non-relevant to q. The similarity sim(dj, q) of the document dj to the query q is defined as the ratio: sim(dj, q) = P(R|dj) / P(R̅|dj).


Page 67: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

As in all models we have certain pros and cons. The probabilistic model has a big advantage: the documents can be ranked by relevance. On the disadvantageous side, it is a binary model (binary weights), the index terms are assumed to be independent, there is a lack of document normalization, and there is a need to guess the initial separation of documents into relevant and non-relevant sets.
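The following Python sketch illustrates how a first ranking could be obtained before any relevance information is available. It assumes the classic binary independence formulation, in which the odds P(R|dj) / P(R̅|dj) are rank-equivalent to a sum of per-term log-odds weights. The initial guesses (0.5 for the probability of a term occurring in a relevant document, and a smoothed document frequency for non-relevant documents) are common textbook defaults and are assumptions here, as are all names and the toy data.

```python
import math


def initial_term_weights(docs):
    """docs: list of sets of index terms (binary weights).

    Returns a log-odds weight per term using assumed initial guesses.
    """
    n_docs = len(docs)
    doc_freq = {}
    for terms in docs:
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weights = {}
    for term, n_t in doc_freq.items():
        p = 0.5                             # assumed P(term present | relevant)
        q = (n_t + 0.5) / (n_docs + 1.0)    # assumed, smoothed P(term present | non-relevant)
        weights[term] = math.log(p / (1 - p)) + math.log((1 - q) / q)
    return weights


def score(query_terms, doc_terms, weights):
    """Sum the weights of the terms shared by query and document."""
    return sum(weights.get(term, 0.0) for term in query_terms & doc_terms)


# Hypothetical toy collection (invented for this example).
docs = [
    {"fracture", "femur"},
    {"fever", "cough"},
    {"femur", "fever", "fracture"},
    {"cough", "fever"},
    {"fever"},
]
weights = initial_term_weights(docs)
query = {"femur", "fracture"}
scores = [score(query, doc, weights) for doc in docs]
```

The sketch makes the listed drawbacks visible: terms are only present or absent, each term contributes independently, document length plays no role, and the starting probabilities are guesses that would normally be refined once some relevant documents are known.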


Page 68: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Well, there are two main measurements.


Page 69: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Recall and Precision – hard as a bone

Following this definition: Recall = Correct / (Correct + Missing) and Precision = Correct / (Correct + Spurious)

Precision P is the fraction of retrieved documents that are relevant to the search:
P = |{set of relevant docs} ∩ {set of found docs}| / |{set of found docs}|

Recall R is the fraction of the documents that are relevant to the query that are successfully retrieved:
R = |{set of relevant docs} ∩ {set of found docs}| / |{set of relevant docs}|

A combination of precision and recall is the harmonic mean of both, which is called F-measure:
F = 2 ∙ (P ∙ R) / (P + R)

In classification 5 terms are used: true positives (= correct); true negatives (= correct); false positives (= spurious); false negatives (= spurious); not detected (= missing).
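The set-based definitions above translate directly into code. The following Python sketch is a minimal illustration; the function name and the document identifiers are invented for the example.

```python
def precision_recall_f(relevant, found):
    """Compute precision, recall and F-measure from two sets of document ids."""
    relevant, found = set(relevant), set(found)
    correct = len(relevant & found)  # relevant documents that were actually found
    precision = correct / len(found) if found else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: four relevant documents, three found, two of them correct.
relevant = {"d1", "d2", "d3", "d4"}
found = {"d2", "d3", "d5"}
p, r, f = precision_recall_f(relevant, found)
```

Here "d5" is spurious and "d1" and "d4" are missing, so precision is 2/3 and recall is 2/4. The guards for empty sets are a design choice of this sketch; other conventions for the undefined cases are possible.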


Page 70: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

In this slide we see an overview of the linguistic processing pipeline that describes the steps performed from the document to its semantic representation. The domain knowledge used in the semantic retrieval system is modeled in the form of the medical semantic network ID MACS® (MSN). It uses the Wingert Nomenclature (WNC) as its medical terminology. The WNC is based on the German version of SNOMED developed by Friedrich Wingert. Although its main focus is on German, it also supports, to a lesser extent, several other languages including English and French. The MSN forms a simple ontology whose concepts are organized in a taxonomy (isA hierarchy) and a mereology (anatomical partOf hierarchy). Further relations between concepts are modeled by labeled edges. The MSN is divided into several subdomains, including:
– topography (i.e., anatomical concepts)
– morphology (e.g., fracture, fever)
– function (e.g., respiration)
– diseases (e.g., glaucoma)
– agents (e.g., pathogens, pharmaceutical substances)
Currently, the MSN contains more than 90,000 terms and 300,000 unique relations.

The query language follows a simple grammar, namely:
  Query ::= Disjunction
  Disjunction ::= Conjunction | Conjunction ";" Disjunction
  Conjunction ::= Atom | Atom "," Conjunction
  Atom ::= Term | "!" Term
Thus a query forms a Boolean expression in disjunctive form over search terms.

Semantic query expansion has been discussed in several previous works (Kingsland, Harbourt, Syed & Schuyler, 1993), (Aronson, Rindflesch & Browne, 1994), (Efthimiadis, 1996). The approach is as follows: each search term is indexed (using the linguistic processing methods described above) and replaced by the identifier of the WNC concept matching the term. These concept identifiers are called WNC indices. If the search term refers to a combination of several concepts in the WNC (e.g., Gastroparesis = Stomach + Paresis), the search term is replaced by a conjunction of the corresponding WNC indices (Kreuzthaler et al., 2011).
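Because the query grammar has no parentheses, a query can be parsed by splitting on ";" and then on ",". The Python sketch below is an illustration of that idea only, not the parser of the system described above. It assumes that terms are plain strings without ";", "," or "!", and it matches terms literally instead of mapping them to WNC indices. All names are hypothetical.

```python
def parse_query(text):
    """Parse a query into disjunctive form.

    Returns a list of conjunctions; each conjunction is a list of
    (term, negated) pairs.
    """
    disjunction = []
    for conjunction_text in text.split(";"):
        conjunction = []
        for atom_text in conjunction_text.split(","):
            atom = atom_text.strip()
            negated = atom.startswith("!")
            term = atom[1:].strip() if negated else atom
            if not term:
                raise ValueError(f"empty term in query: {text!r}")
            conjunction.append((term, negated))
        disjunction.append(conjunction)
    return disjunction


def matches(parsed, doc_terms):
    """True if at least one conjunction is fully satisfied by the document."""
    return any(
        all((term in doc_terms) != negated for term, negated in conjunction)
        for conjunction in parsed
    )


# Hypothetical query: (stomach AND paresis) OR (NOT fever)
parsed = parse_query("stomach, paresis; !fever")
```

For example, a document indexed with {"stomach", "paresis", "fever"} satisfies the first conjunction, while {"stomach", "fever"} satisfies neither and is rejected.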


Page 71: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

As can be seen from this slide, the medical domain expert outperforms the other retrieval methods, achieving high precision at a high recall level. Interestingly, the semantic-based information retrieval tool achieves approximately the same recall level as the medical domain expert while having a lower precision value. This performance result is good, considering the effort the medical domain expert has to make to translate the information need into a query string. In contrast, the input for the information retrieval tool is short and clear, so less effort has to be made to transform the information need into the query language understood by the information retrieval tool.

Keyword search has a high precision value but a lower recall value. This result is clear when considering the fact that information needs that can be described by using these keyword(s) will achieve a high precision value. So, if documents are found they will be relevant, but the recall level will generally suffer. Looking at Slide 4-47, keyword search achieves approximately the same precision as IR Tool One but a far worse recall. It is also possible that no search results are found at all when using the keyword search methodology, as can be seen for the "Neubildung, Darm" (German: neoplasm, intestine) information need (see Appendix B and Appendix A). In contrast, for this information need, IR Tool One has about the same precision-recall levels as the medical domain expert, reflecting the semantic processing chain of the tool.

The LSA statistical retrieval method has, when compared to the other methods, a lower precision for all measured recall levels. This result gives the impression that LSA is applicable for getting high precision values for a particular amount of search results but hard to use to achieve both high precision and high recall values, which is needed for example in clinical studies.


Page 72: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

The future of big data is … big, and there will be many challenges for us to solve!


Page 73: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

The grand question of the future is how to make sense of the data; mega questions include: “What is interesting?” and “What is relevant?”


Page 74: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety


Page 75: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety


Page 76: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety


Page 77: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Adverse Drug Events (ADE) are very common, and therefore order entry must be handled with special care. Medication orders in different medication systems: (a) Kardex system, (b) TIMED system, and (c) CPOE system.

Physicians must enter their medication orders into the system; nurses may not accept any hand-written prescription. A physician enters a medication order by selecting a drug and its dosage form, strength, administration route, dosage regimen, start date and time.
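The fields a physician selects can be pictured as one structured record. The Python sketch below is only an illustration of such a record: the class name, field names and example values are invented, and it does not reflect the data model of any particular CPOE system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MedicationOrder:
    """Hypothetical record holding the fields listed above."""
    drug: str
    dosage_form: str
    strength: str
    administration_route: str
    dosage_regimen: str
    start: datetime

    def __post_init__(self):
        # A structured order leaves no field blank, unlike a hand-written note.
        for name in ("drug", "dosage_form", "strength",
                     "administration_route", "dosage_regimen"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


# Invented example values, for illustration only.
order = MedicationOrder(
    drug="Paracetamol",
    dosage_form="tablet",
    strength="500 mg",
    administration_route="oral",
    dosage_regimen="every 8 hours",
    start=datetime(2015, 11, 4, 8, 0),
)
```

Forcing every order through such a record is one way to see why electronic entry can reduce ambiguity compared with hand-written prescriptions: incomplete orders are rejected at entry time.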


Page 78: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Comparison showed that the medication ordering and administration process after the implementation resembles that of the Kardex system, while it is completely different from that of the TIMED system. In both Kardex and TIMED units, we compared nurse attitudes towards the computerized process in the post-implementation phase with their attitudes towards the paper-based process in the pre-implementation phase.


Page 79: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

NO = No Stock

The medication ordering and administration processes in the Kardex system and the TIMED system; MO (Medication Order); HIS (Hospital Information System); NS (Non-Stock); for requesting urgent NS drugs, nurses often directly referred to the pharmacy with hand-written requests.

Comparison of Figs. 2 and 3 shows that the medication ordering and administration process after the implementation resembles that of the Kardex system, while it is completely different from that of the TIMED system. In both Kardex and TIMED units, we compared nurse attitudes towards the computerized process in the post-implementation phase with their attitudes towards the paper-based process in the pre-implementation phase.

There is no clear definition about this, but it is definitely about management of data, information and knowledge for decision support. Let us look into a practical example, the physician order, where a lot of errors happened in the past due to a mess of paper-based orders producing a lot of paper chaos (you all know the post-it syndrome).


Page 80: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Observations and results of investigations (including history, signs, and symptoms) are converted by clinical staff into decisions and appropriate actions. Control usually requires the use of records and external sources of knowledge.

The care of each patient can be considered to be a control loop in which data from observations and investigations lead to decisions and actions designed to take care of a patient's problems and their consequences in a safe, effective, and legitimate manner. This loop occurs in all specialties and is the source of all the activities of a health care facility such as a hospital. Though complex, these activities can be set out as four concentric shells.


Page 81: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

The clinical control loop is at the core of a complex organisation represented by four “shells” that exchange data. Activity shells of the clinical control loop:

Clinical management shell: Assessment of observations and results of investigations. Formulations of decisions including those based on observations, investigations, and procedures carried out during a consultation.

Clinical administrative shell: Administrative activities which facilitate the clinical management shell and link it to the other shells, such as arranging appointments and investigations, clinical correspondence, filing results, and clinical audit.

Clinical services shell: Investigative, therapeutic, and general services provided by laboratories, imaging facilities, therapy units, operating theatres, wards, supplies departments, transport, etc.

General management shell: General management of health care, by hospital managers, financial controllers, health care purchasers, and statutory authorities.


Page 82: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Example of a visualized information system architecture, here of the computer-supported part of the hospital information system of the Medical School Hanover from 1984 ([1], p. 9).


Page 83: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Mayo’s Enterprise Data Modeling (EDM) provides a context for Mayo enterprise activities.


Page 84: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Care2x is a generic multi-language open-source project that implements a modern Hospital Information System. The project was started in May 2002 with the release of the first beta version of Care2x by a nurse who was dissatisfied with the HIS in the hospital where he was working. Since then the development team has grown to over 100 members from over 20 countries. Care2x is a web-based HIS that is built upon other open-source projects: the Apache web server from the Apache Foundation (http://www.apache.org/), the script language PHP (http://www.php.org/) and the relational database management system mySQL (http://www.mysql.com/). There exist several source code branches that try to integrate the option to choose from other RDBMS like Oracle and postgreSQL. The latter is already supported in the current version at the time of writing: “deployment 2.1”. For our investigations we have chosen the most feature-rich version that was available from the Care2x web page in early fall 2004. This release had the version number “pre-deployment 2.0.2”. Some minor deficiencies that we report later may already be fixed in the current version “deployment 2.1”.


Page 85: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

This is just to show you an example of a global database schema. Each molecule (“Molecules” table) may have more than one conformation (“Conformations” table) and it may come from more than one source (“Sources” table). There are two types of experiments (“Experiments” table) that are done on molecules: computational docking and biological assays. The results (“DockingResults” and “AssayResults” tables) of these experiments were captured in the database. Each type of experiment is done on a particular p53 mutant (“Mutants” table) and has a score (“Scores” table) associated with it.
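To give a feeling for such a schema, the following Python sketch creates a reduced version with the standard `sqlite3` module. Only the table names come from the description above; all column names, keys, constraints and sample rows are assumptions, and the “Sources”, “DockingResults” and “AssayResults” tables are left out to keep the example short.

```python
import sqlite3

# Reduced, hypothetical version of the schema; columns are invented.
SCHEMA = """
CREATE TABLE Molecules (
    molecule_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE Conformations (
    conformation_id INTEGER PRIMARY KEY,
    molecule_id     INTEGER NOT NULL REFERENCES Molecules(molecule_id)
);
CREATE TABLE Mutants (
    mutant_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE Experiments (
    experiment_id INTEGER PRIMARY KEY,
    kind          TEXT NOT NULL CHECK (kind IN ('docking', 'assay')),
    molecule_id   INTEGER NOT NULL REFERENCES Molecules(molecule_id),
    mutant_id     INTEGER NOT NULL REFERENCES Mutants(mutant_id)
);
CREATE TABLE Scores (
    experiment_id INTEGER PRIMARY KEY REFERENCES Experiments(experiment_id),
    value         REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented sample rows: one molecule with two conformations.
conn.execute("INSERT INTO Molecules VALUES (1, 'molecule-A')")
conn.executemany("INSERT INTO Conformations VALUES (?, ?)", [(1, 1), (2, 1)])
conn.execute("INSERT INTO Mutants VALUES (1, 'mutant-X')")
conn.execute("INSERT INTO Experiments VALUES (1, 'docking', 1, 1)")
conn.execute("INSERT INTO Scores VALUES (1, -7.5)")

conformations_per_molecule = conn.execute(
    "SELECT m.name, COUNT(c.conformation_id) "
    "FROM Molecules m JOIN Conformations c ON c.molecule_id = m.molecule_id "
    "GROUP BY m.molecule_id"
).fetchall()
```

The one-to-many relation between molecules and conformations shows up as a foreign key in “Conformations”, and the two experiment types are modeled here with a CHECK constraint; a real schema may well split them differently.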


Page 86: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

Open Database Connectivity (ODBC): API in C for accessing DBMS.

System architecture and the hybrid strategy for data integration. Docking and small molecule data use the mediation approach, while the functional and structural assay data use the data warehousing approach. The CRDB is both a mediator and a data warehouse. “Mutants” and “Molecular” are data marts of the warehouse. The ODBC drivers are wrappers in the mediation approach. Dashed lines indicate integration planned in the future.


Page 87: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

The atomic coordinates of a protein are deposited into the Protein Data Bank (PDB), an international repository for 3D structure files. At the moment PDB contains more than 26,000 protein structures.


Page 88: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

We will deal with visualizations in lecture 9; here is just an appetizer of what you can display.

This shows a cervical cancer query visualization. The Gene nodes are positioned using both chromosome number and organism name. This positioning method allows users to focus on a particular gene and species using NVSS’s slider filters. Nodes are size-coded according to their indegree, which provides an additional visual cue about the node’s importance.
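Size-coding by indegree can be sketched as follows. This Python example is an illustration only: the edge list is invented, and the linear scaling between a minimum and a maximum node size is an assumption, not the mapping actually used by NVSS.

```python
from collections import Counter


def indegrees(edges):
    """Count incoming edges per node for a list of (source, target) pairs."""
    counts = Counter(target for _, target in edges)
    nodes = {node for edge in edges for node in edge}
    return {node: counts.get(node, 0) for node in nodes}


def node_sizes(degrees, min_size=5.0, max_size=25.0):
    """Map indegree linearly onto a display size (assumed scaling)."""
    top = max(degrees.values(), default=0)
    if top == 0:
        return {node: min_size for node in degrees}
    span = max_size - min_size
    return {node: min_size + span * degree / top for node, degree in degrees.items()}


# Hypothetical graph: three papers mention geneA, one also mentions geneB.
edges = [
    ("paper1", "geneA"),
    ("paper2", "geneA"),
    ("paper3", "geneA"),
    ("paper3", "geneB"),
]
degrees = indegrees(edges)
sizes = node_sizes(degrees)
```

In this toy graph geneA receives the largest size because three edges point to it, while nodes without incoming edges stay at the minimum size.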


Page 89: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety


Page 90: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety


Page 91: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety


Page 92: New A. Holzinger 709.049 Mi, 04.11 - human-centered.ai · 2015. 11. 4. · Biomedical data warehouse Business hospital information system Clinical workflow ... , security, safety

http://psychology.wikia.com/wiki/Information_retrieval
http://www.eecs.wsu.edu/mgd/gdb.html (Graph Datasets)
