swat4 ls fca_slides

Refining Health Outcomes of Interest using Formal Concept Analysis and Semantic Query Expansion

Olivier Curé (1), Henri Maurer (2), Paea Le Pendu (3), Nigam Shah (3)

1: CNRS LIGM lab, UPEM, France; 2: Edinburgh University, UK; 3: BMIR lab, Stanford University, USA

Problem setting

● Applications need to select, extract, compare and analyze groups of patients using Electronic Health Records (EHRs)

● This requires defining Health Outcomes of Interest (HOIs), e.g. myocardial infarction, chronic obstructive pulmonary disease.

● With clinical text, these definitions should capture variations of terms and ensure good precision and recall of the text-mining process.

Problem setting (2)

● It is not practical to define these HOIs precisely with concept identifiers, e.g. UMLS CUIs.

● We provide a solution that produces and refines HOI definitions from terms provided by the end-user.

● Our solution aims to propose sound and complete definitions in a best-effort way.

Approach overview

[Figure: approach overview diagram. Labels: Diseases, Procedures, Drugs, Devices; BioPortal knowledge; terms, concepts; Terminology3 DB; Semantic Query Expansion; Formal Concept Analysis; Statistics-Based Pruning]

Semantic Query Expansion

● Improve search results by expanding queries with the transitive closure of the subsumption relationship of ontology concepts.

● Queries can be generalized (resp. specialized) via expansions with ancestors (resp. descendants).

● Ex: expanding a query with 'neoplasm' or 'tumor' when searching for 'cancer'.
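The expansion described above can be sketched on a toy is-a hierarchy. The hierarchy, its labels and the function names below are illustrative assumptions, not the ontologies or code used in this work.

```python
# Toy is-a hierarchy: concept -> set of direct parents (hypothetical labels).
PARENTS = {
    "lung cancer": {"cancer"},
    "cancer": {"neoplasm"},
    "neoplasm": {"disease"},
}


def ancestors(concept, parents):
    """Transitive closure of the direct-parent relation for one concept."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(concept, parents):
    """All concepts that have `concept` among their ancestors."""
    return {c for c in parents if concept in ancestors(c, parents)}


def expand(concept, parents, generalize=True):
    """Generalize with ancestors, or specialize with descendants."""
    if generalize:
        extra = ancestors(concept, parents)
    else:
        extra = descendants(concept, parents)
    return {concept} | extra


generalized = expand("cancer", PARENTS)                     # adds ancestors
specialized = expand("cancer", PARENTS, generalize=False)   # adds descendants
```

In this sketch, generalizing 'cancer' adds 'neoplasm' and 'disease', while specializing adds 'lung cancer'.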

Formal Concept Analysis

● FCA derives abstract conceptual descriptions from a set of objects described by attributes.

● Used in machine learning and knowledge management.

● A formal context is a triple (G, M, I), where G is a set of objects, M a set of attributes, and I ⊆ G × M a binary relation between G and M.

● A formal context can be represented as a matrix.
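As a small illustration of the matrix view, the sketch below turns a formal context (G, M, I) into a 0/1 cross table. The object and attribute names are hypothetical placeholders.

```python
# Hypothetical toy context (G, M, I).
G = ["g1", "g2", "g3"]                              # objects
M = ["m1", "m2"]                                    # attributes
I = {("g1", "m1"), ("g1", "m2"), ("g2", "m2")}      # incidence relation


def cross_table(objects, attributes, relation):
    """One row per object, one column per attribute, 1 where (g, m) is in I."""
    return [[1 if (g, m) in relation else 0 for m in attributes]
            for g in objects]


matrix = cross_table(G, M, I)
for g, row in zip(G, matrix):
    print(g, row)
```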

FCA (2)

{1,2}-{CF1,F1,CF2,F2}

{3}-{CF1,F1,MF2,F2}

{6}-{BLF1,F1,MF2,F2}

{4,5}-{BLF1,F1,BLF2,F2}

{1,2,3}-{CF1,F1,F2}

{3,6}-{MF2,F1,F2}

{4,5,6}-{BLF1,F1,F2}

{1,2,3,4,5,6}-{F1,F2}
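The formal concepts listed above can be reproduced with a naive enumeration. The context below is an assumption: it is inferred from the object concepts in the list (the original cross table is not shown here), and the brute-force closure over all object subsets is only a sketch, not the standard FCA algorithm used in this work.

```python
from itertools import combinations

# Context inferred from the listed object concepts (an assumption).
CONTEXT = {
    1: {"CF1", "F1", "CF2", "F2"},
    2: {"CF1", "F1", "CF2", "F2"},
    3: {"CF1", "F1", "MF2", "F2"},
    4: {"BLF1", "F1", "BLF2", "F2"},
    5: {"BLF1", "F1", "BLF2", "F2"},
    6: {"BLF1", "F1", "MF2", "F2"},
}


def intent(objects, context):
    """Attributes shared by all given objects (all attributes if none given)."""
    shared = set().union(*context.values())
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def extent(attributes, context):
    """Objects that have every given attribute."""
    return frozenset(g for g, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset, context)
            concepts.add((extent(b, context), b))
    return concepts


concepts = formal_concepts(CONTEXT)
for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), sorted(b))
```

Under this assumed context the enumeration yields the eight concepts listed above plus the bottom concept, which has an empty extent and all attributes as its intent.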

Method

● SQE: relational database approach

– We use the ontologies stored in Stanford's DB and its materialization of concept subsumption (almost 14 million entries).

● FCA: objects and attributes of the formal context are concept identifiers (UMLS concept identifiers).
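A materialized subsumption closure makes ancestor lookups a single indexed query. The sketch below uses an in-memory SQLite database. The table name `is_a_closure`, its columns and the identifiers are hypothetical and do not reflect the actual schema of the Stanford database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (descendant, ancestor) pair of the closure.
conn.execute("CREATE TABLE is_a_closure (descendant TEXT, ancestor TEXT)")
conn.execute("CREATE INDEX idx_desc ON is_a_closure (descendant)")
conn.executemany(
    "INSERT INTO is_a_closure VALUES (?, ?)",
    [("C3", "C2"), ("C3", "C1"), ("C2", "C1")],  # made-up identifiers
)


def ancestors_of(connection, concept_id):
    """All ancestors of a concept, read directly from the materialized closure."""
    rows = connection.execute(
        "SELECT ancestor FROM is_a_closure WHERE descendant = ? ORDER BY ancestor",
        (concept_id,),
    )
    return [row[0] for row in rows]


print(ancestors_of(conn, "C3"))
```

Because the closure is precomputed, no recursive traversal is needed at query time. The trade-off is storage, which is consistent with the large number of entries mentioned above.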

Method (3)

● To improve relevance, an FCA-based pruning approach identifies potential concepts among the discovered ones.

● The formal context is composed of matching concepts as objects and candidate concepts as attributes.

● Thus the binary relation corresponds to the subsumption relationship.

Method (4)

● Ex: 10365: “hyperlipoproteinemia type iv” and 740154: “disease, disorder or finding”

● Standard FCA algorithms are used to define the FCA lattice.
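The context construction (matching concepts as objects, candidate concepts as attributes, subsumption as the binary relation) can be sketched as follows. The identifiers `m1`, `d1`, ... and the `ANCESTORS` mapping are hypothetical stand-ins for concept identifiers and the materialized closure.

```python
# Hypothetical materialized closure: matching concept -> all of its ancestors.
ANCESTORS = {
    "m1": {"d1", "d2"},
    "m2": {"d2"},
    "m3": {"d2", "d3"},
}


def build_context(matching, ancestors):
    """Return (objects, attributes, incidence) for the formal context.

    (m, d) is in the incidence relation when candidate d subsumes matching m.
    """
    objects = sorted(matching)
    attributes = sorted(set().union(*(ancestors.get(m, set()) for m in objects)))
    incidence = {(m, d) for m in objects for d in ancestors.get(m, set())}
    return objects, attributes, incidence


objects, attributes, incidence = build_context({"m1", "m2", "m3"}, ANCESTORS)
```

The resulting triple can then be passed to any standard FCA algorithm to compute the lattice.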

Method (5)

● Qualifying a discovered concept is performed using a top-down navigation of the FCA lattice.

● For each formal concept <Ai, Bi>, we compute the transitive closure of the subconcepts of Ai (resp. Bi), denoted LAi (resp. LBi).

● If |LBi ∩ LAi| / |LBi| ≥ Θ, where Θ is a predefined pruning threshold, then Bi is a potential concept.
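The pruning test can be sketched as below. Two points are assumptions of this sketch rather than statements from the slides: the closure includes the seed concepts themselves, and the `CHILDREN` hierarchy is a made-up toy.

```python
# Hypothetical hierarchy: concept -> set of direct subconcepts.
CHILDREN = {
    "b": {"a1", "a2", "x"},
    "a1": {"y"},
}


def closure(seeds, children):
    """Seeds plus all of their transitive subconcepts (seeds included by assumption)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def coverage(extent, intent, children):
    """|LB ∩ LA| / |LB| for a formal concept <extent, intent>."""
    la = closure(extent, children)
    lb = closure(intent, children)
    return len(lb & la) / len(lb) if lb else 0.0


def is_potential(extent, intent, children, theta):
    """True when the intent passes the pruning threshold theta."""
    return coverage(extent, intent, children) >= theta


ratio = coverage({"a1", "a2"}, {"b"}, CHILDREN)
```

Here LA = {a1, a2, y} and LB = {b, a1, a2, x, y}, so the ratio is 3/5. The intent would be kept with Θ = 0.5 and pruned with Θ = 0.75.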

Method (6)

● Concept sets:

– M: matching

– D: discovered

– P: potential

– C: other concepts

Example

● Search on Hypercholesterolemia on 18 ontologies provides:

– 20 matching concepts (i.e., FCA objects)

– 102 discovered concepts (i.e., FCA attributes)

● Generates an FCA lattice with 67 formal concepts

● First formal concept satisfying a Θ=.75 pruning threshold is at the 4th level of the lattice: only 4 concepts out of 16 LBi are covered by LAi .

● These 4 concepts have the following preferred labels: “hypercholesterolemia”, “cholesterolosis”, “secondary hypercholesterolemia” and “hyperlipidemia”.

Method (7)

● We include interactions with the end-user to validate our potential discoveries.

● Hence the domain expert has the final decision on acceptance/rejection of a proposition.

● Important issue: trade-off between user interactions and precision/recall of results.

● The end-user can validate whenever she wants.

● Interactions are performed in a web interface providing additional information on the search (clinical text snippets, number of patients).

Evaluation

● The i2b2 obesity NLP reference set is used as the evaluation data set.

● The gold standard consists of the results of a previous experiment conducted at Stanford.

● Evaluation in terms of specificity, sensitivity and duration of computation (on commodity hardware)
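For reference, sensitivity and specificity follow from the counts of a confusion matrix against the gold standard. The counts in this sketch are invented for illustration and are not results from this evaluation.

```python
def sensitivity(true_pos, false_neg):
    """Share of gold-standard positives that were retrieved: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of gold-standard negatives that were rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Invented counts, for illustration only.
sens = sensitivity(true_pos=8, false_neg=2)
spec = specificity(true_neg=9, false_pos=1)
```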

Evaluation (2)

● An improvement of 2% and 3% on sensitivity and specificity, respectively.

● Computation takes on the order of seconds on a standard laptop.

Evaluation (3)

● More interesting is that some of our false negatives seem to be relevant to the search.

● Some of these false negatives come from the matching and also the potential (i.e. FCA-based) approaches:

● Matching example:

– “sitosterolemia” for hypercholesterolemia

● Potential examples:

– “h/o: raised blood, familial hyperlipoproteinemia”, “fh: raised blood lipids” for hypercholesterolemia, while the gold standard contains concepts such as “hyperlipoproteinemia type ii”, which confirms the relevance of using a semantic approach.

● Note that among our true positives, depending on the use case, a significant number of items have been retrieved from the potential concept set, i.e., using our FCA statistical approach.

Conclusion

● We have proposed a semi-automatic solution for defining HOIs.

● The approach uses SQE and FCA enriched with a statistical approach.

● Our results are comparable to state-of-the-art methods.

● It refines HOI definitions efficiently with relevant terms/concepts.

Future work

● Conduct user-driven evaluations with clinicians and researchers.

● Analyze acceptance/rejection of end-users in practical scenarios.

● Use active learning over past query refinements to improve future queries.

● Study our method's impact on mining EHR clinical notes and on cohort-building tools.

Thanks

Questions?

ocure@univ-mlv.fr
