
Semantic Interoperability for Health Network

Deliverable 4.1: Initial models covering the heart failure use case

[Version 6, Dec 10, 2012]

Call: FP7-ICT-2011-7

Grant agreement for: Network of Excellence (NoE)

Project acronym: SemanticHealthNet

Project full title: Semantic Interoperability for Health Network

Grant agreement no.: 288408

Budget: 3.222.380 EURO

Funding: 2.945.364 EURO

Start: 01.12.2011 - End: 30.11.2014

Website: www.semantichealthnet.eu

Coordinators:

The SemanticHealthNet project is partially funded by the European Commission.

SemanticHealthNet

D4.1 Initial models covering the heart failure use case of 64

Document description

Deliverable: D4.1

Publishable summary:

The SemanticHealthNet (SHN) project faces the challenge of semantic interoperability (SIOp) of clinical information. Despite multiple initiatives over the last decade, major problems remain unsolved. A key requirement for SIOp is to univocally identify the meaning of clinical information across different systems and formalisms. Currently, many different representations encode the same semantic content. Following the SemanticHEALTH roadmap, this should be addressed by a closer integration between information models, as used in electronic health records, and terminologies and ontologies. This is complicated by the overlapping zone between both, which multiplies the possible representations of exactly the same piece of clinical information.

The work put forward by SemanticHealthNet WP4 departs from the analysis of typical examples of non-interoperability, coming from two usage scenarios, viz. transfer of meaningful information from one system to another, and standardized semantic retrieval across content from different systems. WP4 has started to analyse how the semantic equivalence of clinical information can be ascertained. It is currently developing general principles and guidelines for properly identifying which things have to be represented by information models (mainly openEHR, EN13606 and HL7) and which by ontologies (mainly SNOMED CT) in order to represent clinical information in an unambiguous way. These general principles and guidelines are reflected in the semantic resources developed.

According to the work plan, the focus should be on (i) chronic heart failure and (ii) cardiovascular prevention, including all clinical aspects like diagnosis, observations, or medication. This will drive the resources to be developed. The first project year has focused on (i) and the aspects of diagnostic statements.

SemanticHealthNet is not proposing a new standard. Instead it tries to formally describe the current interplay between information structures and clinical vocabularies and to introduce a semantic layer by which all facets of the complex meaning of components of the electronic health record can be addressed. To this end, WP4 makes consistent use of formal ontologies based on description logics and Semantic Web standards. This is documented by the concrete models created during the first project year, which are described in depth in this document. They are mainly intended as easy-to-understand exemplars, built to demonstrate the feasibility of the ontological approach.

This deliverable provides a snapshot of work in progress. Several issues are under discussion, and full consensus among all stakeholders has not yet been reached. Controversial issues are addressed at several places in the text. One example of current discussions is the approximation of inherently higher-order referential statements ("diagnosis is about clinical situation") by means of a subset of first-order logic. Another example is an engineering bottleneck, which consists in the problem of eliciting the implicit knowledge contained in current information models and "upgrading" the static elements of information models by references to a formal ontology.

Status:

Version:

Public: □ No □ Yes

Deadline:

Contact: Stefan Schulz [email protected]

Editors: Catalina Martínez-Costa, Stefan Schulz

Table of Contents

1. Introduction and objectives

1.1 The SemanticHealthNet Project

1.2 Approach and methods

1.3 Objectives

2 Background

2.1 EHR standards

2.2 SNOMED CT

3 The Semantic Interoperability (SIOp) challenge

3.1 SIOp Scenarios

3.2 SIOp and Information models

3.3 SIOp and SNOMED CT Situation With Explicit Context

4 Ontologies and Description Logics (DL)

5 Shared logical framework based on OWL DL ontologies

5.1 Information vs. clinical reality

5.2 Information entities

5.3 Clinical Situations

5.4 Ontology-based binding of information entities to clinical situations

5.5 Negations

6 Discussion on alternatives, options, consequences and criteria for choice

6.1 Creation of clinical models

6.2 Query of clinical data instances of isosemantic models

7 Models and Use Cases

7.1 Use case 1 (Data transfer)

7.1.1 Phase 1: Semantic clinical information extraction

7.1.2 Phase 2: Clinical data exchange

7.1.3 Phase 3: Syntactic clinical information mapping

7.1.4 Comments on Use case 1

7.2 Use case 2 (Data abstraction for querying)

7.2.1 Comments on Use case 2

8 Notes about the deliverable

9 Annex I Heart care summary use case

10 Annex II: Clinical Situation

11 Annex III: Notes about ContSys

1. Introduction and objectives

1.1 The SemanticHealthNet Project

Semantic interoperability (SIOp) of clinical data is a vital prerequisite for enabling patient-centred care and advanced clinical and biomedical research. SemanticHealthNet (SHN) will develop a scalable and sustainable pan-European organisational and governance process to achieve this objective across healthcare systems and institutions. One important aspect is the development of a logical – ontological framework to support interoperability, which is compliant with current clinical vocabularies and information models, in particular SNOMED CT, HL7 CDA, ISO 13606 and openEHR. Alternatively the SHN project might provide guidance to implementers of the Standards Development Organizations (SDOs) and SNOMED CT.

According to the work plan, the clinical focus on chronic heart failure and cardiovascular prevention (including all clinical aspects like diagnosis, observables, medication) will drive the resources to be developed. We expect the exemplars in cardiology and public health to be specific enough to permit comprehensive development and validation of these resources, and yet typical enough for wider generalisation of the methodology and its governance. SemanticHealthNet will capture the needs articulated by clinicians and public health experts for evidence-based, patient-centred integrated care in these domains. Existing European consensus in the management of chronic heart failure and cardiovascular prevention will then be integrated in EHR architectures, clinical data structures, terminologies and ontology by leading technical experts.

Clinical and Industrial Advisory Boards will provide links with other domains in which these results can be used beneficially. The involvement of health authorities, clinical professionals, insurers, ministries of health, vendors, and purchasers will ensure that the project approach and results are realistically adoptable and viable. This work will also build on the SemanticHEALTH¹ and CALLIOPE² roadmaps for eHealth interoperability.

A business model to justify strategic investments, including the opportunity costs for key stakeholders such as Standards Development Organisations and industry, will be defined. Links with the epSOS large-scale pilot and the eHealth Governance Initiative will inform the shape of the Virtual Organisation that this Network will establish to sustain semantic interoperability developments and their adoption.

The consortium comprises 17 Partners and more than 40 internationally recognised experts, including experts from the USA and Canada, ensuring a global impact.

Partners

1 Stroetmann, V. N.; Kalra, D.; et al. (2009) Semantic Interoperability for Better Health and Safer Healthcare: Research and Development Roadmap for Europe: SemanticHEALTH Report. European Commission; Luxembourg: Office for Official Publications of the European

2 European eHealth Interoperability Roadmap: http://ec.europa.eu/information_society/activities/health/docs/cip/calliope-roadmap-122010.pdf

1. Research in Advanced Medical Informatics and Telematics (RAMIT) – BE (Admin Coordinator)

2. Imperial College London (Imperial) – UK

3. University of Hull (UHULL) – UK

4. University Hospitals of Geneva (HUG) – CH

5. World Health Organization (WHO)

6. The University of Manchester (UoM) – UK

7. Medical University of Graz (MUG) – AT

8. International Health Terminology Standards Development Organisation (IHTSDO)

9. Institut National de la Santé et la Recherche Médicale (INSERM) – FR

10. Ocean Informatics (Ocean)

11. Health Level 7 International Foundation (HL7 International)

12. EN13606 Association (EN13606)

13. Empirica Gesellschaft für Kommunikations- und Technologieforschung mbH (EMPIRICA) – DE

14. Standing Committee of European Doctors (CPME) – BE

15. European Coordination Committee of the Radiological, Electromedical and Healthcare IT Industry (COCIR) – BE

16. Whittington NHS Trust (WHIT) – UK

17. European Institute for Health Records (EuroRec) (NoE Coordinator)

External Experts

1. Alan Rector (University of Manchester, UK)

2. Daniel Karlsson (University of Linköping, Sweden)

3. Diego Boscá (University of Valencia, Spain)

4. Jesualdo Tomás Fernández Breis (University of Murcia, Spain)

5. Mathias Brochhausen (University of Arkansas for Medical Sciences, USA)

6. Rahil Qamar (NHS Connecting for Health (CFH), UK)

7. William Hogan (University of Arkansas for Medical Sciences, USA)

8. William Goossen (Hogeschool Windesheim and Results 4 Care, NL)

Project Plan

Workstream I:

WP1: Patient care exemplar (heart failure)

WP2: Public health exemplar (coronary prevention)

WP3: Stakeholder validation

Workstream II:

WP4: Harmonised resources

WP5: Infostructure and tools

WP6: Industrial engagement

Workstream III:

WP7: Adoption and sustainability

WP8: European Virtual Organisation

WP9: Project management, dissemination, promotion

1.2 Approach and methods

Semantic Interoperability of clinical information within and across Electronic Health Record (EHR) systems should fulfil the following conditions:

• The sending system represents clinical information in such a way that the receiving system understands it with no or minimal loss of accuracy.

• Humans and computers can query and aggregate meaningful EHR data across different systems.

Although much of the investment in EHRs has been at national levels, the challenge of semantic interoperability is a global one, not only to support cross-border health care but also to support large-scale multi-national research and valid international comparisons. Multiple initiatives have been taken by different Standards Development Organizations (SDOs), but major challenges remain unsolved. A basic requirement is to univocally identify the meaning of clinical information across systems and to create convergence among a large range of heterogeneous representations. Nowadays there are three layers of artefacts to represent the meaning of clinical information:

1. Generic information models for representing EHR data, such as those provided by ISO 13606, openEHR and the HL7 Clinical Statement Pattern (HL7 CSP).

2. Clinical element definitions like ISO 13606/openEHR archetypes, Detailed Clinical Models (DCMs) or Intermountain CEMs, which are re-usable models that describe all the items around a topic such as a BP measurement or a laboratory test result; and clinical data set definitions or templates such as HL7 CDA or ADL templates, which consist of specific combinations of clinical element definitions for a particular use case or purpose, tailored to the needs of structured data acquisition, use and exchange.

3. Clinical vocabularies such as LOINC, ICD and SNOMED CT, the latter being increasingly based on formal-ontological principles and logic.

Terminologies link different (intra- and interlingual) language expressions (terms) to common meaning identifiers (often called concepts). Ontologies provide ordering schemes for domain entities by formally describing classes and properties. SNOMED CT is a hybrid in this sense, as it is not only a repository of terms but also an ontology that describes and categorizes the clinical entities the terms denote. It is the latter aspect that is of interest in SemanticHealthNet. We will therefore consider SNOMED CT as a clinical ontology. We will use the term "clinical vocabulary" for all systems of semantic reference, whether formal or not.

For semantic interoperability a linkage (binding) between information models and ontologies is indispensable. In the final report of SemanticHEALTH it is stated that sharing clinical meaning does not automatically imply identical terms and data structures: different physical or logical EHR representations may have a common meaning, i.e. they may be semantically equivalent. Therefore the goal of semantic interoperability is to be able to recognize and process semantically equivalent information homogeneously, even if instances are heterogeneously represented. The sources of variety are different combinations of information models, ontologies, or isosemantic encodings within the same information model or ontology, e.g. pre- vs. post-coordinated expressions in SNOMED CT. Different encodings of the same clinical information also exist across and within vocabularies, where the same meaning is represented at different levels of granularity and precision. For instance, in LOINC, there are about 130 entries for blood pressure.

Nevertheless, the main challenge encountered in real projects involving clinical models is not the binding to external ontologies but the implied ontological relationships between information items, which are currently only controlled by human intelligence.

According to the same report, the use of ontologies like SNOMED CT, together with EHR standards, should be embedded into a framework capable of identifying equivalent clinical information even if heterogeneously represented. In order to achieve this goal, SemanticHEALTH recommends that formal ontologies should play an important role to univocally represent the meaning of each clinical information item and to map semantically equivalent expressions within and between EHRs, thus supporting semantic interoperability. Formal ontologies consist of logical axioms that convey the meaning of terms for a particular community³. The set of logical axioms that define a representational unit (concept, class, represented by a unique preferred name) is named its intensional definition. Dependable exchange of clinical data requires that a common intensional definition per representational unit exists. In this way, ontologies are based on the understanding of the members of a community and help to reduce ambiguity in communication⁴.

Throughout the life sciences, ontology resources have increasingly been developed in recent years, and more and more experience in ontology theory and engineering has accumulated in the (bio-)medical informatics community. OWL⁵, the Web Ontology Language, supported by the Protégé editor⁶ and several description logic reasoners, has been established as a de facto standard in biomedical ontology research and practice. It represents a compromise between expressiveness and computational cost, and therefore has known limitations in either aspect.

1.3 Objectives

SemanticHealthNet WP4 will adapt, modify, combine, and apply existing formalisms rather than developing new ones. The major outputs will be:

• Specific information resources for heart failure, heart disease prevention and other conditions (in partnership with collaborating EC projects), on the basis of which generic solutions can be built;

3 Bishr Y, Pundt H, Kuhn W, Radwan M. Probing the concept of information communities - a first step toward semantic interoperability. Interoperating Geographic Information Systems. 1999;495:203-215.

4 Hakimpour F, Geppert A. Global Schema Generation Using Formal Ontologies. In: Proceedings of the 21st International Conference on Conceptual Modeling. ER '02. London, UK; 2002. p. 307-321.

5 OWL 2 Web Ontology Language Document Overview. http://www.w3.org/TR/owl2-overview/

6 Protégé. http://protege.stanford.edu/

• Semantic resources and methodologies for meeting requirements for serving populations, including public health and health service management and research, intended for use across heterogeneous data sources;

• A generalized methodology for harmonizing the semantics between different information models and clinical vocabularies such as terminologies, ontologies, and classification systems. This methodology is to be applied and scaled in the coming years to the rest of health care, to be co-coordinated and governed by the Virtual Organization;

• Recommendations for further R&D by SDOs and research funding organizations such as the EC.

2 Background

2.1 EHR standards

In the last twenty years many countries have invested considerable research effort in Electronic Health Records (EHR). Several EHR standards such as HL7 (CDA) or ISO 13606 and EHR specifications such as openEHR were proposed in order to standardize the way clinical information is represented.

Each of these EHR standards and specifications proposes to structure clinical information based on its own reference model. Each model has established a nested data structure that organizes the content of the clinical record. The granularity and semantic explicitness of these structures differ from one specification to another. For instance, in openEHR there are predefined structures for instructions and for observations, whereas in ISO 13606 there is a generic concept that is used for both of them. This means that if ISO 13606 is used, important contextual information, e.g. whether what is represented refers to an observation or an instruction, will have to be provided by other resources, such as clinical terminologies or modelling patterns on the archetype level, like those that will be part of the renewed ISO 13606 standard.
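The difference can be made concrete with a small sketch. The Python classes below are deliberately simplified stand-ins and not the actual reference model classes of either specification: `Observation` and `Instruction` mimic dedicated entry types in the openEHR style, `GenericEntry` mimics a single generic entry concept in the ISO 13606 style, and the `kind_code` field with its two values is a hypothetical placeholder for the terminology binding or archetype pattern that would have to carry the missing context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Stand-in for a dedicated 'observation' entry type (openEHR style)."""
    name: str


@dataclass
class Instruction:
    """Stand-in for a dedicated 'instruction' entry type (openEHR style)."""
    name: str


@dataclass
class GenericEntry:
    """Stand-in for a single generic entry type (ISO 13606 style)."""
    name: str
    # Hypothetical external code stating what kind of entry this is.
    kind_code: Optional[str] = None


def entry_kind(entry) -> str:
    """Return 'observation', 'instruction' or 'unknown' for an entry."""
    if isinstance(entry, Observation):
        return "observation"  # the structure itself carries the context
    if isinstance(entry, Instruction):
        return "instruction"
    if isinstance(entry, GenericEntry) and entry.kind_code in ("observation", "instruction"):
        return entry.kind_code  # the context comes from an external resource
    return "unknown"
```

With the dedicated types the kind of entry can be read off the structure, whereas a `GenericEntry` without `kind_code` stays ambiguous until another resource supplies that information.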

Moreover, current information models offer freedom to structure the same clinical record information by using a broad array of different combinations of information model entities and there are many inbuilt assumptions implied in the structural relationships established between these information entities.

Regardless of which information model is chosen, it is not sufficient on its own to encode content in an interoperable way. A clinical vocabulary is required to express specific content, and information models provide the possibility to link each of their entities to some standardized coded content (such as SNOMED CT codes). However, the same clinical vocabulary can be linked to a clinical model at different levels of granularity. All these aspects are discussed later.

2.2 SNOMED CT

The use of a system of standardized semantic reference is essential for the interoperability of clinical information. Among many, SNOMED CT (the Systematized Nomenclature of Medicine - Clinical Terms) is certainly the most comprehensive one⁷. Originally developed by the College of American Pathologists (CAP), it is the result of the merger of SNOMED-RT (Reference Terminology) and the NHS Clinical Terms v3 (Read Codes or CTV3). SNOMED CT is owned, maintained, and further developed by the IHTSDO (International Health Terminology Standards Development Organisation)⁸.

SNOMED CT includes more than 311,000 concepts, to which terms in several languages are connected. The concepts are uniquely identified by a concept ID (e.g. the concept 22298006 refers to Myocardial infarction) and embedded into a complex taxonomy. Many concepts are further described or defined using description logic axioms. SNOMED CT can therefore be regarded as a terminology system rooted in a formal ontology, which will allow integrating it into the semantic framework of SemanticHealthNet. This will certainly provide significant advantages, but there are also known drawbacks and therefore challenges, especially due to its limitation to the EL++ dialect, a rather inexpressive variant of OWL DL, which, e.g., lacks negation and universal restrictions. Although SNOMED CT’s ongoing redesign efforts increasingly consider principles of formal ontology, its top-level categories and relations reflect the legacy of its predecessors more than the precepts of formal ontologies. In addition, SNOMED CT also includes characteristics of information models. Particularly in the Findings and Situation hierarchies we find complex clinical expressions like Chronic disease - treatment changed, Family history of radiation therapy, or Suspected deep vein thrombosis, which refer to complex clinical situations and not to medical processes or objects. These have arisen precisely because SNOMED CT is often used in situations in which there is no information model: the legacy of UK primary care is that a record consists of a list of codes. The upshot is that much content that belongs in the information model is in the codes because there was nowhere else to put it. This might lead to conflicting semantics when applied in the various information models, and it brings us to a central concern of SemanticHealthNet: the notion of the context in which a clinical statement is made or a code is used.

7 http://en.wikipedia.org/wiki/SNOMED_CT

8 http://www.ihtsdo.org

SemanticHealthNet proposes to be as inclusive as possible with regard to all possibilities of encoding, including those SNOMED CT concepts which represent lengthy clinical statements rather than clinical entities themselves. It therefore does not primarily take a prescriptive attitude, in the sense of recommending that certain SNOMED CT concepts not be used, even where alternatives exist that are preferable from an ontological point of view. Therefore, interpreting context-laden SNOMED CT concepts consistently with the EHR information models will require the definition of some semantic patterns throughout the project. The SNOMED CT hierarchy Situation with explicit context will require a detailed examination. The standard EL++ description logic cannot be used for many of the concepts contained therein. An alternative solution using richer description logics is currently in progress within SemanticHealthNet. This is an important aspect of the goal of developing an overall viewpoint on SNOMED CT for SemanticHealthNet.
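To illustrate what the restriction to EL++ means in practice, the following sketch models a handful of description logic constructors as plain Python classes and checks whether a class expression uses only named classes, conjunction and existential restriction. It is a simplified check of the core EL constructors only and ignores the further features of EL++. All class names, the function `within_el` and the class and role names used in the examples are illustrative assumptions and are not taken from SNOMED CT or from any SemanticHealthNet resource.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Named:
    """A named class, e.g. Thrombosis."""
    name: str


@dataclass(frozen=True)
class And:
    """Conjunction of two class expressions."""
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Some:
    """Existential restriction: role some filler."""
    role: str
    filler: "Expr"


@dataclass(frozen=True)
class Only:
    """Universal restriction: role only filler (not available in EL)."""
    role: str
    filler: "Expr"


@dataclass(frozen=True)
class Not:
    """Negation of a class expression (not available in EL)."""
    operand: "Expr"


Expr = Union[Named, And, Some, Only, Not]


def within_el(expr: Expr) -> bool:
    """True if expr uses only named classes, conjunction and 'some'."""
    if isinstance(expr, Named):
        return True
    if isinstance(expr, And):
        return within_el(expr.left) and within_el(expr.right)
    if isinstance(expr, Some):
        return within_el(expr.filler)
    return False  # Only and Not fall outside the core EL constructors
```

Under these assumptions, an expression such as `And(Named("Thrombosis"), Some("hasLocation", Named("DeepVein")))` passes the check, whereas `Not(Named("Thrombosis"))`, as might be needed to state the absence of a condition, does not.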

Another difficulty is ambiguity in SNOMED CT. Numerous concepts occur more than once, with slightly different shades of meaning. For example, for systolic blood pressure SNOMED CT offers two possible encodings, namely the observable entity (i.e. 72313002 |systolic blood pressure|) or the clinical finding (i.e. 163030003 |on examination - Systolic BP reading|). According to SNOMED CT, observables are used to code elements/entities to which a value can be assigned, while findings represent the result of a clinical observation, assessment or judgement. Clinical models bind information entities to observables or clinical findings in an idiosyncratic way, and this hampers semantic interoperability. Therefore it is necessary to provide some guidelines and general principles for their use. The IHTSDO Observable Entities SIG is already studying how to remodel them in order to avoid these problems.
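The practical effect of the two encodings can be sketched as follows. The two codes are the ones quoted above; the equivalence table `CANONICAL_CODE`, the `CodedValue` class and the helper functions are hypothetical and hand-written for this illustration, whereas a real solution would derive such equivalences from ontology axioms rather than from a lookup table.

```python
from dataclasses import dataclass

# Codes quoted in the text above.
SYSTOLIC_BP_OBSERVABLE = "72313002"  # |systolic blood pressure|
SYSTOLIC_BP_FINDING = "163030003"    # |on examination - Systolic BP reading|

# Hypothetical, hand-written equivalence table (illustration only).
CANONICAL_CODE = {
    SYSTOLIC_BP_OBSERVABLE: SYSTOLIC_BP_OBSERVABLE,
    SYSTOLIC_BP_FINDING: SYSTOLIC_BP_OBSERVABLE,
}


@dataclass(frozen=True)
class CodedValue:
    """A coded information entity together with its recorded value."""
    code: str
    value: float
    unit: str


def normalise(item: CodedValue) -> CodedValue:
    """Replace the code by its canonical counterpart, if one is known."""
    canonical = CANONICAL_CODE.get(item.code, item.code)
    return CodedValue(canonical, item.value, item.unit)


def same_measurement(a: CodedValue, b: CodedValue) -> bool:
    """Compare two coded values after normalising their codes."""
    return normalise(a) == normalise(b)
```

Without the normalisation step, a query for the observable code would miss records in which the same reading was bound to the finding code, which is the interoperability problem described above.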

Performance is an important issue when reasoning is done. Reasoners such as Snorocket or ELK have been optimized for large ontologies like SNOMED CT and provide high-performance implementations of the polynomial-time classification algorithm for EL++. Snorocket, e.g., classifies SNOMED CT in less than a minute and has been included in the IHTSDO Workbench software. The possibility of extracting value sets for specific use cases will also be studied in order to make the use of SNOMED CT in the EHR feasible.

3 The Semantic Interoperability (SIOp) challenge

According to the SemanticHEALTH report, full semantic interoperability consists of the ability of information systems to automatically access, interpret and present all necessary medical information required and located in another system. Neither language gaps nor technological differences should prevent the receiving system from seamlessly integrating the received information into the local record, thus providing a complete picture of the patient’s health as if it had been collected locally.

This SIOp definition is overly ambitious. In practice we will have to accept limitations to the degree of SIOp that can be achieved. It is implausible that we can achieve less ambiguous communication via information systems than we can in face-to-face communication. Misunderstandings of words and descriptions are inevitable. The intention to send EHR data from one system, insert them into a very different one and process them homogeneously is rather unrealistic. Therefore a compromise on the level of SIOp to be achieved for different use cases will have to be found.

3.1 SIOp Scenarios

There are at least four kinds of SIOp scenarios: (1) data acquisition and communication at patient level for routine care, (2) data acquisition and communication at patient level for clinical trials, (3) statistical aggregation of patient information (1 or 2) for research purposes, (4) aggregation for health management and health service remuneration. The degree of detail required for each is different, although the technical issues may be similar.

SemanticHEALTH investigated which areas are the most promising for shorter-term investment, according to the costs and benefits of different degrees of SIOp and different purposes. For instance, the achievement of partial SIOp for a minimal data set is expected to yield high added value at moderate cost. A high degree of SIOp of EHRs for direct patient care purposes would be very costly, but could also yield very significant benefits. For the purposes of public health research, striving for partial SIOp should come at moderate costs and yield high benefits. Finally, a high degree of SIOp for the purpose of research and knowledge sharing would yield very significant benefits, albeit at high costs.

With regard to costs, SemanticHEALTH also stated that implementation and utilisation are the main issues to be considered, the latter being critical for the realization of benefits. Although it has long been the goal that all data should be collected in the course of routine care, experience has suggested limits due to time and financial constraints that may make it impractical to collect data in the detail required using the present systems. Other costs will include the local customization of terminologies, change management that requires additional training and education, and the harmonisation of data collections. The commitment of large vendors to interoperable solutions (except HL7v2, LOINC, DICOM) has been limited.

SemanticHEALTH concluded that the costs of SIOp are justified by the benefits, first of all the speed and accuracy of accessing meaningful health-related data. Key aspects are savings in the working time of medical staff and gains in efficiency and clinical outcomes through better access to patient information across disciplines, care settings and countries. A special focus must be put on patient safety. Other benefits will be the strengthening of the patient's role in the doctor-patient relationship, shorter reaction times to global threats such as pandemics, and greater confidence in the information used for audit, planning and performance management.

3.2 SIOp and Information models

EHR standards were devised to structure and organize clinical information in a way that would be easy to communicate and process. But they allow many different ways of representing the same clinical information, not only between different information model standards but also within the same standard. The binding of information entities to clinical vocabularies is meant to provide explicit semantics to the clinical content represented by these structural models. Bindings were developed to comply with different kinds of vocabularies, and clinical vocabularies can be used within multiple different information model frameworks. There are two consequences. Firstly, present EHR standards include their own vocabulary, with mostly hidden (non-explicit) ontological assumptions. Secondly, vocabularies include context-laden concepts, which correspond to complex clinical statements rather than to terms. As a result there is a zone of overlap between EHR information models and clinical vocabularies, which gives rise to different representations of exactly the same clinical information. This is one of the main barriers to SIOp, i.e. the lack of a strict separation between the information represented by clinical vocabularies and by EHR information models.

As a very simple example of this overlapping zone, Figure 3-1 shows three representations of the statement "cancer confirmed". In the upper left part it is represented using an EHR information model (e.g. openEHR) by means of two information entities: one representing the disease diagnosed (cancer) and the other representing the status of the diagnosis (confirmed). In the upper right part, by contrast, it is represented using the single SNOMED CT concept cancer confirmed. But does "being confirmed" alter anything in the nature of a cancer? The answer is no; what can be confirmed is its diagnosis, i.e. the information that refers to the SNOMED CT concept. Finally, in the bottom part of the same figure, the same statement is represented using openEHR information model entities (for the diagnosis and its status) and SNOMED CT (for the disease diagnosed). Note that there is no explicit semantic link between the information entities that represent the diagnosis and its status.

[Figure 3-1; panel labels: "EHR Information Model (openEHR)", "Terminology (SNOMED CT)", "EHR Information Model / Terminology"; terminology concept shown: 395099008 |cancer confirmed|]

Figure 3-1 Information model & clinical vocabulary representing the statement "cancer confirmed". Top left: all information is in the information model. Top right: all information is in a SNOMED CT concept. Bottom: the information model represents the epistemic part of the statement and refers to SNOMED CT, which represents the ontological part.
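
The three encodings of "cancer confirmed" can be compared mechanically once each is decomposed into a referent (the clinical entity) and an epistemic status. The following Python sketch is purely illustrative: the dictionary layouts, the `PRECOORDINATED` lookup table and the function name `normalise` are assumptions made for this example and are not part of openEHR, SNOMED CT or the SHN resources.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    referent: str  # clinical entity (ontological part)
    status: str    # epistemic status of the diagnosis


# Hypothetical lookup: pre-coordinated codes that blend a clinical entity
# with an epistemic status, split here into their two parts.
PRECOORDINATED = {
    "395099008": Statement(referent="cancer", status="confirmed"),
}


def normalise(entry: dict) -> Statement:
    """Map one of three illustrative encodings to a common form."""
    kind = entry["kind"]
    if kind == "information_model_only":
        # Both parts are free-standing information entities.
        return Statement(entry["diagnosis"], entry["status"])
    if kind == "terminology_only":
        # A single context-laden code carries both parts.
        return PRECOORDINATED[entry["code"]]
    if kind == "mixed":
        # The information model holds the status and points to a concept.
        return Statement(entry["diagnosis_concept"], entry["status"])
    raise ValueError(f"unknown encoding: {kind}")


top_left = {"kind": "information_model_only", "diagnosis": "cancer", "status": "confirmed"}
top_right = {"kind": "terminology_only", "code": "395099008"}
bottom = {"kind": "mixed", "diagnosis_concept": "cancer", "status": "confirmed"}

# All three encodings collapse to one normalised statement.
normalised = {normalise(e) for e in (top_left, top_right, bottom)}
print(normalised)
```

The sketch hard-codes the decomposition of the pre-coordinated code; deciding such decompositions in a principled way is exactly what the ontological analysis in this document is about.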



Different clinical models that represent the same semantic content in a different way (usually with a different structure) are called isosemantic models. According to the SemanticHEALTH roadmap, both (information) structure and vocabulary are needed to achieve SIOp: a set of codes without any information structure is not enough, and vice versa. Thus, the SHN challenge consists of detecting the information that is semantically equivalent among isosemantic models. This will require developing general principles and guidelines to properly identify which things have to be represented by EHR information models and which by clinical vocabularies, so that information is represented in an unambiguous way. These general principles and guidelines will be reflected in the semantic resources (ontologies) developed.

Information models represent the entities in which information is recorded. These entities are (immaterial) information objects like documents, headings, sections, entries, etc. They represent the contextual and epistemic aspects in the EHR and they usually provide answers to the questions "where" (e.g. healthcare facility), "when" (e.g. date time), "who" (e.g. clinician) and "how" (e.g. clinical process), whereas the referent in the real world is referred to by "what" (e.g. clinical condition)9 questions.

This brings us to a general theory of reference10. In human discourse, but also in representational artefacts like pictures, the existence of some specific information entity (an image, a sentence) about a type T does not necessarily imply the existence of any instance of T in the world. Examples of this abound in medical documentation. Doctors talk about details of a planned operation without knowing whether this operation will ever take place. They prescribe a drug regime without knowing whether the patient will ever take the medicines. A child is suspected of bacterial meningitis and is treated as such although, retrospectively, there was only a viral infection. Nevertheless, the medical concept of meningitis is present in the physician's deliberations just as it is documented and coded in the record, albeit in the context of suspicion.

Information entities as used in clinical documents do not necessarily refer to entities of proven existence on the side of the patient. On the one hand they are about states of human knowledge (e.g. what a health professional thinks, hypothesizes, plans), and subsequent speech acts that manifest themselves in record entries with words like "suspected", "confirmed", "ordered" etc. On the other hand there are clinical codes that denote types of clinical entities (concepts). SNOMED CT often provides intensional definitions for a type or concept11, whereas its extension is the class of all particular entities of this type. In SNOMED CT, e.g. 254701007|basalioma| represents the class of all (individual) basalioma and the axioms attached convey invariant features for an entity to qualify as a basalioma, e.g. that it is located at the skin. However, we should be able to talk about basalioma, without referring (or without knowing whether we refer) to any particular entity in the extension of the abovementioned concept.

It is tempting to draw, in theory, a clear line between these two types of semantic artefacts, viz. models of information and models of the domain. However, as we have demonstrated, clinical vocabularies often blend ontological with epistemic aspects, as shown with the SNOMED CT concept 395099008 |Cancer confirmed (situation)|, where cancer is a disease (i.e. a clinical entity), whereas the whole statement is a diagnosis (the information that it is confirmed being an attribute of the diagnosis). It is even more apparent with a concept such as 160244002 |No known allergies (situation)|. It exists due to coding convenience, whereas from a purist point of view, only the clinical entity allergy should come from a clinical vocabulary, and the epistemic "envelope" should be represented by an information model.

9 See also related discussions in the CIMI and SIAMS projects: http://informatics.mayo.edu/CIMI/images/c/c6/GF_EN13606_SIAMS_for_CIMI.pdf
10 http://plato.stanford.edu/entries/reference/
11 We do not distinguish between "Type", "Class", and "Concept" in this text.

In SNOMED CT both concepts are placed under the Situation with Explicit Context concept model category. Here, concepts include epistemic information (i.e. what is known about a situation) such as "confirmed", mixed with contextual information such as the temporal reference (e.g. "past"), and the reference to a clinical concept, which may or may not be instantiated in the particular case (e.g. "cancer" or "allergy"). Therefore, as mentioned earlier (see Section 2.2), this segment of SNOMED CT will require reinterpretation in order to use its terms in a consistent way within the EHR.

In practice, the line between both kinds of entities is difficult to draw: SNOMED CT and other ontologies and terminologies provide concepts that include information entity aspects, while information model entities in turn carry some implicit meaning that should be represented using the clinical vocabulary (see Figure 3-1).

Examples are OBSERVATION or EVALUATION from the openEHR or HL7 information models. An example of information models including both epistemic and ontological information is shown below (see Figure 3-‐2) with an excerpt from the OpenEHR diagnosis archetype12:

EVALUATION[at0000.1] matches {    -- Diagnosis
    data matches {
        ITEM_TREE[at0001] matches {    -- structure
            items cardinality matches {1..*; ordered} matches {
                ELEMENT[at0002.1] matches {    -- Diagnosis
                    value matches {
                        DV_CODED_TEXT matches {
                            defining_code matches {[ac0.1]}    -- Any term that 'is_a' diagnosis
                        }
                    }
                }
                ELEMENT[at0.32] occurrences matches {0..1} matches {    -- Status
                    value matches {
                        DV_CODED_TEXT matches {
                            defining_code matches {
                                [local::
                                at0.33,    -- provisional
                                at0.34]    -- working
                            }
                        }
                    }
                }
                ...
                CLUSTER[at0011] occurrences matches {0..*} matches {    -- Location
                    items cardinality matches {1..2; ordered} matches {
                        ELEMENT[at0012] occurrences matches {0..1} matches {    -- Body Site
                            value matches {
                                DV_CODED_TEXT matches {
                                    defining_code matches {[ac0000]}    -- Any term that describes a body site
                                }
                            }
                        }
                    }
                }
                ...

Figure 3-2 Excerpt of the OpenEHR diagnosis archetype

The information model entity "ELEMENT[at0002.1]" and the information model entity "CLUSTER[at0011]" should together specify, possibly by means of reference to classes in a clinical ontology, the focus condition of the diagnosis. On the other hand, the entity "ELEMENT[at0.32]" specifies the epistemic status of the diagnosis, i.e. whether it is a provisional or a working diagnosis.

12 OpenEHR Clinical Knowledge Manager: http://openehr.org/knowledge/.



Thus, as in the case of clinical vocabularies mixing representation of clinical entities with contextual and epistemic information described above, also in information modelling the boundary between representation of clinical and information entities is unclear.

This is a known problem that has already been addressed in other works. The HL7 community, as a result of the Terminfo Project13, provided a guide for the use of SNOMED CT in the HL7 V3 Clinical Statement pattern. Sundvall et al.14 highlighted the difficulty of binding clinical information models to clinical vocabularies and presented a tool that supports the binding of openEHR archetypes to SNOMED CT. The UK National Health Service (NHS) has also developed a guide on terminology binding in the context of the Logical Record Architecture (LRA)15, following an EN13606-based logical model. ISO DTS 13972 on Detailed clinical models, characteristics and processes16 gives guidance on defining a concept as represented in a detailed clinical model using a clinical vocabulary, and on linking specific data elements and specific values to codes. Principles of modelling clinical information are suggested by SIAMS17. This project aims at the creation of standardised generic semantic patterns, in order to provide harmonization with other resources such as ContSys (CEN/ISO System of Concepts for Continuity of Care), HISA (Health Information Services Architecture), coding systems, value sets, ontologies and archetypes. More recently, the Mayo Clinic, in the context of the SHARPn project on the secondary use of EHR data, has developed CEM (Clinical Element Model)18. Both CEM and CIMI (Clinical Information Model Initiative)19 aim to provide a common format for the representation of clinical information in a way that is semantically interoperable, which implies providing a method for terminology / ontology binding.

To identify clinical information independently of how it is represented is a hard task, but not impossible. For instance, comparing different information models that were built to collect information on the same clinical topic, e.g. blood pressure, reveals that several aspects of the information remain consistent, independent of the modelling approach. These consistent components are the reference to the clinical topic (e.g. blood pressure), the explicit listing of the separate data elements (systolic blood pressure, diastolic blood pressure, average blood pressure), and influencing factors (such as single data elements for body position, level of exercise, body location for measurement, instrument used), value set expression (such as standing, sitting and lying for body position) and precise code bindings to these specifics. In addition explicit relationships can be defined, as are data types for observations and more. All these information components are available in specific OpenEHR and 13606 archetypes, HL7 templates, and Detailed Clinical Model instances.
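
As a rough illustration of such a comparison, the Python sketch below flattens two differently structured models of the same topic and checks that they expose the same data elements and value sets. Both model layouts and the helper `leaf_elements` are hypothetical and chosen only for this example; real archetypes, templates and Detailed Clinical Models are considerably richer.

```python
def leaf_elements(node: dict) -> dict:
    """Return {data element name: value set} for every leaf of a nested model.

    A leaf is a key whose value is not a dict; its value is either None
    (no enumerated value set) or a collection of allowed values.
    """
    leaves = {}
    for name, value in node.items():
        if isinstance(value, dict):
            leaves.update(leaf_elements(value))  # descend into a grouping node
        else:
            leaves[name] = None if value is None else frozenset(value)
    return leaves


# Hypothetical model A: measurements and influencing factors grouped separately.
model_a = {
    "blood pressure": {
        "data": {
            "systolic blood pressure": None,
            "diastolic blood pressure": None,
        },
        "state": {"body position": ["standing", "sitting", "lying"]},
    }
}

# Hypothetical model B: a flat list of the same data elements.
model_b = {
    "systolic blood pressure": None,
    "diastolic blood pressure": None,
    "body position": ["lying", "sitting", "standing"],
}

# The structures differ, but the data elements and value sets coincide.
same_elements = leaf_elements(model_a) == leaf_elements(model_b)
print(same_elements)
```

Matching on element names only works here because both toy models happen to use identical labels; in practice the match would have to rely on code bindings rather than on strings.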

We prefer to call this "information on a topic" and not "concept". Statements like "Archetypes are used in openEHR to model clinical concepts"20 are misleading. The meaning of concepts such as Arterial Blood Pressure is represented in ontologies, e.g. that it is a pressure; that it is a property of blood in the arteries, etc. A clinical model on blood pressure defines the data elements required to collect information around a clinical concept such as blood pressure, according to a given clinical context.

13 Using SNOMED CT in HL7 Version 3; Implementation Guide, Release 1.5: http://www.hl7.org/v3ballot/html/infrastructure/terminfo/terminfo.html
14 Sundvall E, Qamar R, Nystrom M, Forss M, Petersson H, Karlsson D, et al. Integration of tools for binding archetypes to SNOMED CT. BMC Medical Informatics and Decision Making. 2008;8(Suppl1):S7
15 Logical Record Architecture for Health and Social Care: http://www.connectingforhealth.nhs.uk/systemsandservices/data/lra
16 ISO/CD TS 13972 Health informatics -- Detailed clinical models, characteristics and processes
17 http://informatics.mayo.edu/CIMI/images/c/c6/GF_EN13606_SIAMS_for_CIMI.pdf
18 SHARPn project: http://informatics.mayo.edu/sharp/
19 CIMI initiative: http://informatics.mayo.edu/CIMI/
20 http://en.wikipedia.org/wiki/OpenEHR



SemanticHealthNet is, at this stage, more descriptive than prescriptive. Instead of introducing new models, it proposes a formal-ontological framework that can be used to logically describe existing structures and then to provide SIOp between different but isosemantic representations of the same clinical information. Being descriptive does not preclude providing recommendations to Standard Development Organizations, especially where patient safety issues are concerned. No new "standard" is defined; rather, a semantic layer is specified that should be capable of mediating between clinical models and clinical vocabularies, which will require the definition of semantic patterns. This does not mean, however, that SHN will not issue good practice recommendations in order to prevent repeating the mistakes of the past.

Models that will be created within SemanticHealthNet are therefore intended as exemplars. They are built to demonstrate the feasibility of the SemanticHealthNet approach, which attempts to make the ontology model binding explicit by adhering to strict modelling principles grounded in formal ontologies. We consider the accumulated experience on ontological modelling and upper-‐ontology creation as a useful resource. Our hypothesis is that strict adherence to these principles would reduce, but not eliminate, the abovementioned representational variety (see Section 5).

3.3 SIOp and SNOMED CT Situation With Explicit Context

As mentioned in Section 2.2, SemanticHealthNet proposed to be as inclusive as possible with regard to all possibilities of encoding. This includes keeping all SNOMED CT hierarchies within its scope. The consistent use of context-laden SNOMED CT concepts (from the Situation with explicit context hierarchy) within information models is currently subject to discussion within the consortium. The SNOMED CT documentation states that a concept includes context information if its name explicitly represents information that might otherwise be represented by another, less context-rich concept in a particular structural placement within a record. If context is embedded in the meaning of a concept, then context elements typically alter the meaning in such a way that the resulting concept is no longer a subtype of the original one. In order to represent these kinds of concepts, the SNOMED CT documentation proposes a context model. For example, the model for Skin Cancer Unknown would be expressed as follows:

Skin Cancer Unknown ISA Situation-with-explicit-context
    [SubjectRelationshipContext] Subject of record
    [AssociatedFinding] Skin cancer
    [TemporalContext] Current or specified time
    [FindingContext] Unknown

If instead we would like to say that there is No Skin Cancer, the value of [FindingContext] has to be changed to Known absent. As can be observed, epistemic information (what is known about the situation), such as Suspected or Changed, is mixed in the same SNOMED CT concept with contextual information (e.g. the temporal reference) and with the reference to a clinical concept, which may or may not be instantiated in the particular case. As much as this may seem justified, at least pragmatically, it becomes problematic when the above frame-like representation is translated into description logics, following the official conversion routine21:

21 We use Manchester Syntax. Classes (concepts) are written in italics, object properties (relations) in bold face.



Skin Cancer Unknown subClassOf Situation with explicit context and
    SubjectRelationshipContext some Subject of record and
    AssociatedFinding some Skin cancer and
    TemporalContext some Current or specified time and
    FindingContext some Unknown

As existentially quantified relations are, besides taxonomic (IsA) links, the only way EL++ offers to link two SNOMED CT concepts, unintended interpretations arise, such as, e.g., that for each instance of the concept (each record entry) Skin Cancer Unknown there is at least one instance of Skin Cancer. This is an obvious contradiction to the intended meaning.
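
The unintended consequence of the existential restriction can be made concrete with a toy finite interpretation. The Python sketch below is only an illustration under simplifying assumptions: the interpretation format, the function `satisfies_some` and the individual names `entry1` and `tumour1` are invented for this example, and only the single conjunct `AssociatedFinding some Skin cancer` is checked.

```python
def satisfies_some(interp: dict, subclass: str, role: str, filler: str) -> bool:
    """Check 'subclass subClassOf (role some filler)' in a finite interpretation."""
    filler_ext = interp["classes"].get(filler, set())
    for x in interp["classes"].get(subclass, set()):
        # All individuals that x is linked to via the role.
        successors = {y for (s, y) in interp["roles"].get(role, set()) if s == x}
        if not successors & filler_ext:
            return False  # x has no role successor in the filler class
    return True


# Intended reading: a record entry says that skin cancer is unknown,
# and no skin cancer instance is asserted for this patient.
intended = {
    "classes": {"SkinCancerUnknown": {"entry1"}, "SkinCancer": set()},
    "roles": {"AssociatedFinding": set()},
}

# What the axiom forces: every such entry is linked to some skin cancer instance.
forced = {
    "classes": {"SkinCancerUnknown": {"entry1"}, "SkinCancer": {"tumour1"}},
    "roles": {"AssociatedFinding": {("entry1", "tumour1")}},
}

axiom = ("SkinCancerUnknown", "AssociatedFinding", "SkinCancer")
intended_ok = satisfies_some(intended, *axiom)
forced_ok = satisfies_some(forced, *axiom)
print(intended_ok, forced_ok)
```

The intended situation is not a model of the axiom, whereas the interpretation containing an actual skin cancer instance is, which is the contradiction described above.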



4 Ontologies and Description Logics (DL)

Ontologies

Increasingly, clinical vocabularies, particularly SNOMED CT and the upcoming ICD-11, are based on ontologies. Computer scientists envision (formal) ontologies mostly as explicit formal specifications of an abstract, simplified view of the world, a so-called conceptualization, which is bound to a specific purpose22 and driven by requirements for intelligent systems. More recently it has been stressed that ontology building should strive to axiomatically describe objective reality beyond a specific, application-oriented realm, based on the philosophical discipline of Ontology as the study of what there is23.

In the last decade, the term Applied Ontology24 has been increasingly used for interdisciplinary ontology research and engineering projects that integrate the computer science and philosophy perspectives. Ontological analyses ("what is x?") need to be distinguished from epistemological analyses ("how do I know that x is?"). The Semantic Web community has set out to create standards for the meaningful exchange of information, contributing mainly standards like RDF(S), SKOS, and OWL. Among them, only OWL fulfils our requirements for using formal ontologies within SHN. There are several philosophical tendencies regarding the commitment to reality. One approach, Scientific Realism, has been very actively propagated by ontologists like Barry Smith25. To date, many biomedical ontologies subscribe, more or less strictly, to this approach, some aspects of which are easily reconciled with the thinking of medical practitioners and biomedical researchers.

In SemanticHealthNet we follow the line of Applied Ontology with the purpose of providing semantic interoperability to clinical information. This requires learning from all these different interpretations and applying this knowledge to achieve our purpose.

Representation

There are several languages used in ontology engineering. The most popular one is OWL (Web Ontology Language), recommended by the W3C as the ontology language for the Web and based on Description Logics (DL), which are subsets of first-order logic. OWL allows for the use of DL reasoners such as FaCT++, Pellet or HermiT. There are two OWL versions, OWL 1 and OWL 2. We will refer here to OWL 2 DL in general and OWL 2 EL in particular. The latter is used by SNOMED CT. As it allows reasoning in polynomial time, it scales up to huge ontologies like SNOMED CT. However, it lacks negation and value restrictions. The practical scalability of reasoning is crucial for the adoption of semantic technologies in real settings. Increasingly expressive ontologies have been built in recent years, and current DL reasoners cannot reason with them efficiently.
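
To make the expressivity limits tangible, the Python sketch below checks whether a class expression stays within a small EL-like fragment. It is a simplified illustration, not an implementation of the OWL 2 EL profile: the tuple encoding, the constructor keywords and the function `in_el_fragment` are assumptions of this example, and only conjunction, existential restriction, negation, value restriction and disjunction are modelled.

```python
# Class expressions as nested tuples, for example:
#   ("some", "AssociatedFinding", "SkinCancer")
#   ("and", "Situation", ("some", "AssociatedFinding", "SkinCancer"))
# A plain string stands for a named class.

# Constructors outside the fragment: negation, value restriction, disjunction.
NON_EL_CONSTRUCTORS = {"not", "only", "or"}


def in_el_fragment(expr) -> bool:
    """Return True if expr uses only named classes, 'and' and 'some'."""
    if isinstance(expr, str):  # named class
        return True
    op = expr[0]
    if op == "and":            # ("and", operand, operand, ...)
        return all(in_el_fragment(e) for e in expr[1:])
    if op == "some":           # ("some", role, filler)
        return in_el_fragment(expr[2])
    if op in NON_EL_CONSTRUCTORS:
        return False
    raise ValueError(f"unknown constructor: {op}")


finding_present = ("and", "Situation", ("some", "AssociatedFinding", "SkinCancer"))
finding_absent = ("and", "Situation", ("not", ("some", "AssociatedFinding", "SkinCancer")))
only_skin_cancer = ("only", "AssociatedFinding", "SkinCancer")

print(in_el_fragment(finding_present))
print(in_el_fragment(finding_absent))
print(in_el_fragment(only_skin_cancer))
```

The second expression shows why statements about absence cannot be expressed directly in this fragment: they need negation.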

22 Gruber T. Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human Computer Studies, 43(5-6):907-928, 1995.
23 Quine W. "On What There Is", Review of Metaphysics 2: 21-38, 1948. Reprinted in From a Logical Point of View (Cambridge: Harvard University Press, 1953), pages 1-19.
24 Gruber, T. (2008). Liu, Ling; Özsu, M. Tamer. eds. Ontology. Springer-Verlag. ISBN 978-0-387-49616-0.
25 Barry Smith and Werner Ceusters, Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies, Applied Ontology, 5, 2010, 139–188.



Upper level Ontologies

Upper level ontologies provide a basic, mostly domain-independent set of categories, relations and axioms, which are easily reusable across specific domain applications. Without them, if we give one modelling task to different people, the resulting representations will be heterogeneous and of limited interoperability. Upper level ontologies are meant to address this problem by standardising the development process and restricting the choices of ontology engineers. Examples of upper level ontologies are BFO (Basic Formal Ontology)26, DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)27, GFO (General Formal Ontology)28 or SUMO (Suggested Upper Merged Ontology)29. Below upper level ontologies, top-domain ontologies further elaborate an upper level ontology for a specific domain. BioTop30 is an example of this. The rationale for its creation was to constrain upper level categories by a set of canonical relations, derived from the OBO Relation Ontology (RO), at a time when BFO was restricted to a mere taxonomy of mutually disjoint classes, without including any relations and related axioms. In SemanticHealthNet we will use its reduced version, BioTopLite31, with the focus on representing clinical information entities.

The uppermost level of BioTopLite exhibits ten mutually disjoint top categories: Condition, Disposition, ImmaterialObject, MaterialObject, InformationObject, Quality, Role, ValueRegion, Process and Time. BioTopLite borrows relations (OWL object properties) from the RO, viz. proper-part-of, located-in, has-participant, but it extends this scope by additional relations such as has-realization, inheres-in etc. It contains numerous axioms, such as domain/range restrictions of object properties, relationship chains, as well as existential and value restrictions at the level of class definitions. BioTopLite users are advised not to add new object properties, as the existing ones are considered to be exhaustive. Thus, building domain ontologies under BioTopLite heavily constrains the freedom of the ontology engineer. This is intended, as it warrants a higher predictability of the domain ontologies produced. As many spontaneous design decisions bear the risk of violating constraints, it is mandatory to classify the ontology (using DL classifiers like FaCT++ or HermiT), even after minor modification steps.
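
The effect of mutually disjoint top categories can be sketched in a few lines of Python. This is only an illustration of the disjointness constraint, not a substitute for classifying the ontology with a DL reasoner; the function `disjointness_violations` and the example entities are hypothetical, and the check looks at directly asserted categories only.

```python
from itertools import combinations

# The ten top categories listed above.
TOP_CATEGORIES = [
    "Condition", "Disposition", "ImmaterialObject", "MaterialObject",
    "InformationObject", "Quality", "Role", "ValueRegion", "Process", "Time",
]

# Mutual disjointness means every pair of distinct categories is disjoint.
DISJOINT_PAIRS = {frozenset(pair) for pair in combinations(TOP_CATEGORIES, 2)}


def disjointness_violations(asserted_types: dict) -> list:
    """Return (entity, category A, category B) for entities in two disjoint categories."""
    violations = []
    for entity, categories in asserted_types.items():
        for a, b in combinations(sorted(categories), 2):
            if frozenset((a, b)) in DISJOINT_PAIRS:
                violations.append((entity, a, b))
    return violations


# Hypothetical assertions made by an ontology engineer.
asserted = {
    "diagnosis_entry_1": {"InformationObject"},
    "badly_modelled_item": {"InformationObject", "Process"},
}

pair_count = len(DISJOINT_PAIRS)
violations = disjointness_violations(asserted)
print(pair_count)
print(violations)
```

A reasoner would additionally detect violations that only follow from inherited or inferred class membership, which is why classification after each modification step remains necessary.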

Maintenance and interoperability issues

The medical domain is constantly evolving, which requires changes and adaptations of ontologies. New concepts are added and existing ones are retired. Versioning mechanisms will therefore be necessary to deal with these changes. Specific techniques and tools would have to be defined to make reliable use of ontologies. Resolvable, persistent identifiers (URIs) for classes and relations in an ontology are an important mechanism to assure interoperability across version changes. In the life sciences domain there are initiatives such as Identifiers.org32 for providing persistent URIs used to identify data for the scientific community. Another important issue will be how to establish links between the clinical model and the ontology communities, which allow new content to be added to ontologies in a reliable way, avoiding local definitions. Much experience on this has been collected in the OBO Foundry project with ontologies like the Gene Ontology, using a request tracker mechanism33.

26 http://www.ifomis.org/bfo
27 http://www.loa.istc.cnr.it/DOLCE.html
28 http://www.onto-med.de/ontologies/gfo/
29 http://www.ontologyportal.org/
30 http://www.imbi.uni-freiburg.de/ontology/biotop/
31 http://purl.org/biotop/biotoplite.owl
32 http://identifiers.org
33 http://sourceforge.net/tracker/?group_id=36855&atid=440764



5 Shared logical framework based on OWL DL ontologies

5.1 Information vs. clinical reality

In Section 3 the lack of a strict separation between information and clinical entities was identified as one of the main barriers to SIOp. In this section we introduce the shared logical framework we propose in order to clearly establish the boundary between both. We will show how the proposed framework allows semantically equivalent information from different clinical models to be detected.

The framework is based on the following assumption: Clinical information exists as soon as it is created, whereas the existence of the referent can mostly not be taken for granted: Diagnoses are often hypothetical, findings may be error-‐prone, symptoms biased. Diagnostic statements may contradict themselves, although the underlying clinical reality is the same. Therefore we do not claim the (material) existence of the referent of a statement in a clinical model. This is exemplified by Figure 5-‐1. The rectangles divide the realm of Information (healthcare context) and the realm of the clinical reality, here called the Possible Patient Scenario (described by ontology classes), in order to make clear that this world can only be possibly grasped by the statements made in an EHR. The outermost orange box (i.e. information) provides all the health care information related to healthcare delivery stakeholders (e.g. the clinician), the healthcare facility, and the times of consultation and recording. It also includes the representation of an EHR excerpt defined to record the past history of a patient and their clinical diagnosis by using information entities according to some EHR standard (e.g. openEHR). The innermost green box (Possible Patient Scenario) consists of clinical concepts or types (e.g. cancer, diabetes, lung, etc.), as introduced by a clinical ontology like SNOMED CT. This means the formal description of things described by medical terms, as the possible scenario of the patient and the clinical situations in which he or she participates.

In present clinical models this information is often represented only by means of information entities or codes from a clinical vocabulary, or by arbitrary combinations of the two. Our hypothesis is that the use of the proposed shared logical framework should provide an unambiguous representation of clinical information (i.e. an unambiguous way of relating the different kinds of entities). There is some experience of this in the context of the Detailed Clinical Modeling approach, in which the medical context, data element specification and terminology / ontology binding have been combined34.

34 ISO DTS 13972. Health informatics -‐-‐ Detailed clinical models, characteristics and processes. Geneva, ISO.



Figure 5-‐1 Information entities / Ontology (possible patient scenario)

Our approach consists of developing a Common Model of Meaning (CMM) in which the distinction between information / vocabulary (ontology, terminology) is clearly identified, and which allows both SNOMED CT and EHR standards to be combined in an unambiguous way, by following certain patterns (see Figure 5-2). The CMM is based on formal ontologies. However, as will be further discussed, ontologies do not have to be used to represent the whole model but just those parts of it that allow semantic equivalences among different representations of clinical information to be detected.

[Figure 5-2; labels: "Common Model Of Meaning", "Terminology", "SNOMED CT", "EHR standards", "Information Entities"]

Figure 5-2 Common Model of Meaning mapping

Figure 5-3 depicts our common framework, which consists of the following three ontologies:

• Top-‐level ontology: as top-‐level ontology, BioTopLite is used (see Section 4). It will help to constrain the way in which information and domain ontologies combine.

• Domain ontology: as ontology for representing the healthcare domain, parts of SNOMED CT will be used. Its concepts are treated as classes35 in the sense of OWL. Classes aggregate individual entities according to their invariant features. They inherit their basic properties from top-‐level ontology categories like Process, Quality, Condition.

• EHR Information entity ontology: information entities are outcomes of clinical actions, such as observations, investigations, or evaluations. They refer to clinical concepts and represent the epistemic and contextual aspects of the information (e.g. past history, confirmed). The concepts of this ontology will be represented as subclasses of the top-level category Information object of BioTopLite. This category is disjoint from all other categories, i.e. nothing can be both an information entity and, e.g., a process, a quality, or a material entity.

35 Apart from "Linkage concepts", which correspond to OWL object properties

The framework proposed will make use of description logics (OWL-‐DL), supported by reasoners such as HermiT. Figure 5-‐3 depicts the architecture. In order to provide SIOp, we propose to annotate the information entities of clinical models by using DL expressions conforming to the shared logical and ontological framework presented. In this way, by means of using a DL reasoner the equivalence between the different DL expressions can be computed. For instance, if we want to obtain the past history of some disease from some patient, and there are three possible encodings for representing that information, we want to obtain it independently of the encoding used. This means that the use of a DL reasoner should tell us if they have the same or similar meaning.
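
The retrieval scenario can be approximated with a toy example. The Python sketch below is not a DL reasoner and does not use HermiT: it replaces OWL classification with a simple walk up a hand-written is-a hierarchy. All class names, entry identifiers and the functions `subsumed_by` and `retrieve` are assumptions made for illustration and are not taken from SNOMED CT or the SHN ontologies.

```python
# Hypothetical is-a links (child -> parent).
IS_A = {
    "HeartFailure": "HeartDisease",
    "HeartDisease": "Disease",
    "PastHistoryItem": "ClinicalInformationItem",
}


def subsumed_by(cls, ancestor) -> bool:
    """True if cls equals ancestor or reaches it by following is-a links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False


# Each record entry, whatever its native encoding, is annotated with a pair:
# (information entity class, clinical class it refers to).
ANNOTATIONS = {
    "system_A/entry_17": ("PastHistoryItem", "HeartFailure"),
    "system_B/entry_03": ("PastHistoryItem", "HeartDisease"),
    "system_C/entry_42": ("ClinicalInformationItem", "HeartFailure"),
}


def retrieve(info_class: str, referent_class: str) -> list:
    """Return entries whose annotation is subsumed by the query pair."""
    return sorted(
        entry
        for entry, (info, referent) in ANNOTATIONS.items()
        if subsumed_by(info, info_class) and subsumed_by(referent, referent_class)
    )


# Query: past history items that refer to some heart disease.
hits = retrieve("PastHistoryItem", "HeartDisease")
print(hits)
```

The entry from the third system is not returned because its annotation does not state that it is a past history item; with full DL expressions, a reasoner would make this kind of decision from the axioms rather than from a hand-written lookup.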

Figure 5-‐3 Architecture schema of the approach

5.2 Information entities

The ontology of EHR information entities aims to describe the contextual and epistemic aspects of information. They are created as the result of the execution of processes. To what this information actually refers will be subject of further deliberation.

A natural candidate for an information artefact ontology would be the Information Artifact Ontology (IAO)36. There is one major drawback of using IAO in our proposal: in IAO all information entities refer to some entity that has to exist in reality. In SHN, however, we have to include the cases in which an information entity refers to some type of thing (or concept) for which no instance exists in our realm of interest. For instance, there is information about a (supposed) tumour of a patient X, who is, in reality, healthy.

In SHN, information entities have been put under the BioTopLite category information object. This class is defined as a piece of information that exists independently of any potential material carrier. As subclasses of information object we have placed the following classes:

• SHN_clinical_information_item: it is a clinical information entity that refers to some clinical situation of a patient in which he/she might have some clinical condition(s).

• SHN_clinical_information_set: it is a clinical information entity that aggregates other data items of the same type that have something in common.

36 http://code.google.com/p/information-‐artifact-‐ontology/



• SHN_clinical_composition: it is a collection of clinical information entities that refers to some patient healthcare situation.

Additionally, the SHN_InformationObjectAttribute class has been created as a subclass of the BioTopLite Quality category in order to represent qualities of information entities such as some epistemic terms (e.g. suspected, probable, etc.). We are aware that ascribing qualities to information entities may contradict assumptions in certain upper-level ontologies. They should be considered attributions to statements rather than qualities in a strict sense. So far they are primitives, but they could be fully defined in the future. For instance, "confirmed diagnosis" would then be a diagnosis that is the outcome of a confirmation process.

The central part of Figure 5-4 provides a graphical representation of the present status of the EHR ontology of information entities. It will be further developed throughout the project. The left and right parts of the same figure correspond to the clinical process and clinical domain ontologies (SNOMED CT). Clinical information will be represented by using the information entity and the domain ontologies. However, it is also important to represent how the information was obtained. Thus, it is also necessary to define a Clinical process ontology that represents all the healthcare activities, tasks, etc. (and their interactions) involved in the generation of some specific clinical information. This will also require defining the boundary between the context represented by the information entities (epistemic) and the context represented by the clinical process involved in the acquisition and recording of the clinical information, and how they relate to each other. In the left part of the figure it can be observed that the relation outcomeOf relates the clinical process and the information entity ontologies. We consider that clinical information referred to by information entities is obtained as the result (outcomeOf) of the performance of some clinical action(s) (clinical process model).

At this moment further developments on this are still a matter of research in the project. Standards such as ContSys37, or the ContSys-based implementation made by the Swedish National Board of Health and Welfare, and how they relate to the SNOMED CT Procedure hierarchy in the presented approach, will be the subject of future study.

Figure 5-4 Relationship between Clinical Processes, Information Entities and SNOMED CT ontologies. Clinical concepts are referred to but need not be instantiated (even "confirmed" diagnoses can be wrong)

37 ContSys CEN/ISO 13940, currently in the process of being harmonised with CEN/ISO 13606 in SIAMS http://www.contsys.net

5.3 Clinical Situations

In order to fully understand the previous figure it is necessary to introduce the notion of Clinical situation. An analysis of the SNOMED CT terms under the category "clinical finding", together with their taxonomic links, has shown that an interpretation of these finding concepts as clinical conditions (pathological entities) is problematic. For instance, the Tetralogy of Fallot38 is in SNOMED a subconcept of Pulmonic valve stenosis. If both are seen as pathological entities, one would rather assert that Tetralogy of Fallot has Pulmonic valve stenosis as part. However, the taxonomic relation is acceptable if Tetralogy of Fallot means "patients while having a tetralogy of Fallot", and Pulmonic valve stenosis means "patients while having a stenosis of the pulmonic valves". We assume that the desire for this kind of inference motivated the design of the findings hierarchy.

We have therefore proposed to interpret SNOMED finding concepts as clinical situations, i.e. as phases of a patient's life during which a certain condition (structure, disposition, process)39 exists at all times. A clinical situation can also include more than one condition. In that case, a clinical situation of a person with two conditions a and b would be classified as a subtype of clinical situation with condition a and as a subtype of clinical situation with condition b.

We have formally analysed and described the relationship between a clinical condition (finding, disorder) and a clinical situation40 (see Annex II: Clinical Situation). We have introduced the relation includes as linking situations to conditions. In the finding hierarchy this relation would correspond to the role group relation41,42.

Although the complete notion of clinical situation cannot be fully defined in OWL, the following subclass axiom holds, e.g. for ClinicalSituationWithLungCancer (the SCT finding concept lung cancer):

ClinicalSituationWithLungCancer equivalentTo (includes some ((finding_site some LungStructure) and (associated_morphology some MalignantNeoplasm)))

This is just an ontological clarification of what is implicitly meant by ClinicalSituationWithLungCancer. In order to carry out the harmonization between SNOMED CT and the new ICD-11 disease classification, IHTSDO is studying the interpretation of SNOMED CT disorders as clinical situations. It is still subject to discussion whether all SNOMED CT finding concepts are in principle to be interpreted as denoting situations (under the above definition). In this case, the whole expression would conflate into the SNOMED CT concept lung cancer.

38 http://en.wikipedia.org/wiki/Tetralogy_of_Fallot

39 Schulz S, Spackman K, James A, Cocos C, Boeker M. Scalable representations of diseases in biomedical ontologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S6.

40 Andrade A, Martínez-Costa C, Brochhausen M, Spackman K, Schulz S. The Ontology of Clinical Situations. Currently under review at JAMIA.

41 Spackman KA, Dionne R, Mays E, Weis J. Role grouping as an extension to the description logic of Ontylog, motivated by concept modeling in SNOMED. Proc AMIA Symp. 2002:712-6.

42 Schulz S, Hanser S, Hahn U, Rogers J. The semantics of procedures and diseases in SNOMED CT. Methods Inf Med. 2006;45(4):354-8.

5.4 Ontology-based binding of information entities to clinical situations

Once clinical situations have been introduced, we will show how clinical situations represented by SNOMED CT concepts are bound to the ontology of information entities. This is done by the primitive relation isAbout as defined in the IAO ontology for relating an information artefact to an entity (e.g. clinical entity). The following OWL DL expression (a) shows how the fact that the patient is diagnosed with lung cancer would be represented.

SHN_clinical_information_item and isAbout QUANTIFIER ClinicalSituationWithLungCancer (a)

In this expression SHN_clinical_information_item will represent the outcome of a diagnosis, if we subscribe to the definition of diagnosis as "the identification by a medical provider of a condition, disease, or injury made by evaluating the symptoms and signs presented by a patient"43. The relation isAbout is linked by a quantifier to the clinical concept.

We will refine the above expression as follows (b):

DiagnosisWithLungCancer equivalentTo SHN_clinical_information_item and isAbout some ClinicalSituationWithLungCancer

and outcomeOf some DiagnosingAction (b)

If the diagnosis included more than one condition, such as lung cancer and diabetes, it would be represented as (c):

DiagnosisWithLungCancer&Diabetes equivalentTo Diagnosis and isAbout some ClinicalSituationWithLungCancer

and isAbout some ClinicalSituationWithDiabetes (c)

with SHN_clinical_information_item and outcomeOf some DiagnosingAction being equivalent to Diagnosis.

Then, the above diagnosis (c) would be classified as a subtype of a diagnosis with lung cancer and as subtype of diagnosis with diabetes. A query for getting all patients with lung cancer would therefore provide those who have lung cancer and also diabetes.
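
The classification behaviour described for (c) can be illustrated with a minimal Python sketch. It is not a DL reasoner: it assumes that every defined class is a plain conjunction of a named parent and isAbout some restrictions with atomic fillers, so that subsumption reduces to a subset test on the conjuncts. The class DiagnosisWithDiabetes, the function names and the data layout are illustrative assumptions and not part of the SHN resources.

```python
# Each defined class is modelled as a frozenset of conjuncts.
# A conjunct is a named class or a ("some", relation, filler) triple.
DEFINITIONS = {
    "DiagnosisWithLungCancer": frozenset({
        "Diagnosis",
        ("some", "isAbout", "ClinicalSituationWithLungCancer"),
    }),
    "DiagnosisWithDiabetes": frozenset({  # assumed by analogy with (b)
        "Diagnosis",
        ("some", "isAbout", "ClinicalSituationWithDiabetes"),
    }),
    "DiagnosisWithLungCancer&Diabetes": frozenset({
        "Diagnosis",
        ("some", "isAbout", "ClinicalSituationWithLungCancer"),
        ("some", "isAbout", "ClinicalSituationWithDiabetes"),
    }),
}


def is_subclass_of(sub: str, sup: str) -> bool:
    """A conjunction is subsumed by any conjunction built from a subset of its conjuncts."""
    return DEFINITIONS[sup] <= DEFINITIONS[sub]


def subclasses_of(sup: str) -> list[str]:
    """All defined classes (including sup itself) that are subsumed by sup."""
    return sorted(name for name in DEFINITIONS if is_subclass_of(name, sup))


if __name__ == "__main__":
    print(subclasses_of("DiagnosisWithLungCancer"))
```

Under these assumptions a query for DiagnosisWithLungCancer also returns the combined diagnosis (c), which mirrors the retrieval behaviour described above.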

DiagnosingAction is defined in the Clinical Process ontology shown in the left part of Figure 5-4. This ontology will represent the different healthcare processes, activities, etc. and their interactions, that will provide clinical information.

Diagnoses as an outcome of cognitive processes do not necessarily correspond to existing entities on the side of the patient. Diagnostic statements may convey some probability or certainty, they may express merely suspicions or at-risk situations, and they can be plainly wrong. It would therefore be inadequate in our framework to make existential assumptions like in (d), in which it is stated that every lung cancer diagnosis implies the existence of lung cancer:

LungCancerDiagnosis equivalentTo Diagnosis and isAbout some ClinicalSituationWithLungCancer (d)

43 http://www.thefreedictionary.com/diagnosis

In standard description logics (as used in biomedical ontologies), backed by the realist ontology paradigm, formal expressions are supposed to have single interpretations and are therefore evaluated to true or false. This precludes having sets of interpretations, i.e. possible worlds, which would have the advantage of representing contingent information. In our case, this would allow us to express LungCancerDiagnosis as something that possibly refers to real lung cancer44. Modal logics could overcome this problem, but they are not supported by the ontology formalisms that are in practical use in the biomedical community.

The impossibility of expressing contingent information in OWL-DL and the unacceptability of asserting existence where existence cannot be guaranteed require some robust approximation. This is the reason why we advocate the use of the universal quantifier (only) for relating information entities with other entities.

By stating

Diagnosis_X equivalentTo Diagnosis and isAbout only X (e)

we restrict ourselves to state that the Diagnosis of this type is about X in case it is about anything, or reformulated to a semantically equivalent statement:

Diagnosis_X equivalentTo Diagnosis and not (isAbout some (not X)) (f)

i.e. it is not about anything else than X.
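
The equivalence of (e) and (f) can be checked by brute force on a small finite interpretation. The following Python sketch is only an illustration under stated assumptions: the domain of three situations and the extension of X are hypothetical, and the complement "not X" is taken relative to that finite domain.

```python
from itertools import chain, combinations

DOMAIN = {"s1", "s2", "s3"}  # hypothetical situations
X = {"s1", "s2"}             # hypothetical extension of class X


def only(fillers: set[str], cls: set[str]) -> bool:
    """isAbout only cls: every filler, if there is any, belongs to cls."""
    return all(f in cls for f in fillers)


def some(fillers: set[str], cls: set[str]) -> bool:
    """isAbout some cls: at least one filler belongs to cls."""
    return any(f in cls for f in fillers)


def not_some_not(fillers: set[str], cls: set[str]) -> bool:
    """not (isAbout some (not cls)), complement taken within DOMAIN."""
    return not some(fillers, DOMAIN - cls)


def all_filler_sets():
    """Every possible set of isAbout fillers over DOMAIN, including the empty set."""
    items = sorted(DOMAIN)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return (set(s) for s in subsets)


if __name__ == "__main__":
    for fillers in all_filler_sets():
        print(sorted(fillers), only(fillers, X), not_some_not(fillers, X))
```

The two readings agree for every filler set. The empty filler set satisfies only X but not some X, which is the vacuous case ("about X in case it is about anything") that the universal restriction deliberately admits.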

We will come back to discuss the shortcomings of this approach and discuss alternatives.

The epistemic state of an information entity like a diagnosis can be further specified by assigning an appropriate attribute to the information entity, e.g. that the diagnosis is suspected:

SuspectedLungCancerDiagnosis equivalentTo Diagnosis and isAbout only ClinicalSituationWithLungCancer

and hasInformationObjectAttribute some Suspected (g)

Once the diagnosis is confirmed, it can be represented by modifying it as follows:

LungCancerDiagnosis equivalentTo Diagnosis and isAbout only ClinicalSituationWithLungCancer and hasInformationObjectAttribute some Confirmed (h)

It is still subject to debate whether, in the case of a confirmed diagnosis, the expression (h) should be enhanced as follows:

LungCancerDiagnosis equivalentTo Diagnosis and isAbout only ClinicalSituationWithLungCancer and isAbout some ClinicalSituationWithLungCancer

and hasInformationObjectAttribute some Confirmed (i)

This would require that a confirmed diagnosis can never be false. A counterargument would be that the qualification of a diagnosis as confirmed is an act in the responsibility of the physician, which could nevertheless be based on wrong information. Querying clinical data on "confirmed" diagnoses and comparing them to a ground truth, e.g. from pathology, would be a relevant task for quality assurance. The decision of what is taken as the ground truth would nevertheless depend on the use case. It would therefore be safer for the representation of clinical information if no existential claim is made at all at the level of recording. Claims about the existence of entities on the patient's side would then be made only when it comes to data analysis.

44 The lack of expressing possibilities in standard DL [Rector.owled.2008] [doi: 10.1186/1471-2105-8-377]

Our solution is challenged by the frequent cases in which the basic diagnosis is confirmed but the cause of the disease is only suspected. We are on the safe side if we represent this by two separate information entities:

LungCancerConfirmedDiagnosis equivalentTo Diagnosis and isAbout only ClinicalSituationWithLungCancer and hasInformationObjectAttribute some Confirmed (j)

LungCancerCausedByHeavyCigaretteSmokerSuspectedDiagnosis equivalentTo Diagnosis and isAbout only (ClinicalSituationWithLungCancer and dueTo some ClinicalSituationOfHeavyCigaretteSmoker) and hasInformationObjectAttribute some Suspected (k)

We would get undesired results if we enhanced (j) with an existential statement:

LungCancerConfirmedDiagnosis equivalentTo Diagnosis and isAbout some ClinicalSituationWithLungCancer and isAbout only ClinicalSituationWithLungCancer and hasInformationObjectAttribute some Confirmed (i)

This implies that a patient situation of the type ClinicalSituationWithLungCancer exists.

cs123 Type ClinicalSituationWithLungCancer

If we apply (k) to this, then cs123 could be reclassified as an instance of ClinicalSituationWithLungCancer and dueTo some ClinicalSituationOfHeavyCigaretteSmoker. This would clearly be wrong. To be safe, every diagnosis should be represented as a separate statement that refers to the clinical concept only via value restriction. We hypothesize that this provides sufficient reasoning power. However, further-reaching interpretations that assert the existence of entities on the side of the patients should be aware of this problem.
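
The unwanted reclassification can be reproduced with a small Python sketch. It only mimics one inference step (propagating an isAbout only restriction to the existing fillers) and rests on an explicit assumption that is not spelled out in the text: the information entity typed by (k) is linked by isAbout to the same situation individual cs123 that the existential statement introduced. The individual names d_confirmed and d_suspected and the function name are hypothetical.

```python
LUNG_CANCER = "ClinicalSituationWithLungCancer"
SMOKER_CAUSED = (
    "ClinicalSituationWithLungCancer and "
    "dueTo some ClinicalSituationOfHeavyCigaretteSmoker"
)

# Asserted types of the individuals (d_* names are hypothetical).
TYPES = {
    "d_confirmed": {"LungCancerConfirmedDiagnosis"},
    "d_suspected": {"LungCancerCausedByHeavyCigaretteSmokerSuspectedDiagnosis"},
    "cs123": {LUNG_CANCER},
}

# Filler class of the 'isAbout only' restriction in (j) and (k).
ONLY = {
    "LungCancerConfirmedDiagnosis": LUNG_CANCER,
    "LungCancerCausedByHeavyCigaretteSmokerSuspectedDiagnosis": SMOKER_CAUSED,
}


def apply_value_restrictions(types, is_about, only_restrictions):
    """Type every isAbout filler with the class required by 'isAbout only'."""
    inferred = {ind: set(t) for ind, t in types.items()}
    for subject, fillers in is_about.items():
        for cls in types.get(subject, ()):
            target = only_restrictions.get(cls)
            if target is None:
                continue
            for filler in fillers:
                inferred.setdefault(filler, set()).add(target)
    return inferred


if __name__ == "__main__":
    # Existential reading: both entities point to the asserted situation cs123.
    linked = {"d_confirmed": {"cs123"}, "d_suspected": {"cs123"}}
    print(apply_value_restrictions(TYPES, linked, ONLY)["cs123"])
    # Value restriction only: no filler is asserted, so nothing is inferred.
    print(apply_value_restrictions(TYPES, {}, ONLY)["cs123"])
```

With the links in place, cs123 acquires the smoker-related type although the cause was only suspected; without asserted fillers its type stays unchanged.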

The expression (j) describes that the patient has been diagnosed with lung cancer and the expression (k) states that the possible cause of the lung cancer is smoking. Therefore, the EHR of the patient would be annotated with the two above expressions. This representation will allow obtaining this patient as an answer to the query "Give me all patients with lung cancer" and also as the answer to the query "Give me all patients with lung cancer probably caused by smoking".

All additional assertions, e.g. who made the diagnosis, when it was made etc. (i.e. healthcare context) can also be added to the above DL expression as follows:

Diagnosis and isAbout only (Cancer and hasLocus some Lung)      WHAT?
and (outcomeOf some (DiagnosingAction and      HOW?
hasParticipant some (HumanOrganism and bearerOf some ClinicianRole)) and      WHO?
hasPointInTime some (PointInTime and denotedBy some DateTime) and      WHEN?
hasLocus some HealthcareFacility)      WHERE?
and hasInformationObjectAttribute some Suspected

However, even if using description logics syntax for a global representation of information models plus ontologies, the reasoning would be restricted to the filler of the isAbout relation, which may become arbitrarily complex, according to the degree of postcoordination needed. All additional assertions, e.g. who made the diagnosis, when it was made etc. could then be made at A-box (data) level or being represented by the information structures.

This would appropriately address the fact that DL representations follow an open world semantics, whereas information models – at least implicitly – follow a closed world semantics. From the implementation side it would be important to distinguish two steps: DL reasoning and querying would be restricted to the WHAT clause, whereas database querying would be addressed by the HOW, WHO, WHEN, and WHERE clauses. This is in line with the current debates at CIMI, but requires more in-depth analysis.
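
The two-step split can be sketched in Python as follows. This is a hedged illustration, not the SHN implementation: a hand-written superclass table stands in for the classified ontology, the class ClinicalSituationWithCancer, the Record fields and the sample data are all hypothetical, and the WHO and WHEN values are treated as plain closed-world data.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical stand-in for a classified ontology: class -> direct superclasses.
SUPERCLASSES = {
    "ClinicalSituationWithLungCancer": {"ClinicalSituationWithCancer"},
    "ClinicalSituationWithCancer": {"ClinicalSituation"},
    "ClinicalSituationWithDiabetes": {"ClinicalSituation"},
    "ClinicalSituation": set(),
}


@dataclass(frozen=True)
class Record:
    patient: str
    what: str   # filler of isAbout, resolved by terminological reasoning
    who: str    # recording clinician, plain data
    when: date  # recording date, plain data


RECORDS = [
    Record("p1", "ClinicalSituationWithLungCancer", "dr_a", date(2012, 5, 1)),
    Record("p2", "ClinicalSituationWithDiabetes", "dr_b", date(2012, 6, 1)),
    Record("p3", "ClinicalSituationWithLungCancer", "dr_a", date(2010, 1, 1)),
]


def is_subsumed_by(cls: str, ancestor: str) -> bool:
    """Step 1 helper: walk the (assumed acyclic) superclass table."""
    if cls == ancestor:
        return True
    return any(is_subsumed_by(p, ancestor) for p in SUPERCLASSES.get(cls, ()))


def query(records: list[Record], what: str, since: date) -> list[str]:
    """Step 1 filters on WHAT via subsumption, step 2 on WHEN via plain data."""
    step1 = [r for r in records if is_subsumed_by(r.what, what)]
    return [r.patient for r in step1 if r.when >= since]


if __name__ == "__main__":
    print(query(RECORDS, "ClinicalSituationWithCancer", date(2012, 1, 1)))
```

Keeping the second step outside the reasoner reflects the closed-world reading of the information model: a missing date or clinician simply fails the filter instead of remaining an open possibility.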

Now that the shared logical framework has been introduced, we will explain how it can be used to detect semantically equivalent expressions and thus to enable semantic interoperability.

In order to provide semantic interoperability, we propose to annotate the information entities of clinical models with DL expressions conforming to the shared logical framework explained above. A DL reasoner will then compute the equivalence between the different DL expressions. Figure 5-5 shows three isosemantic models for representing that the patient does not have a past history of diabetes. By using a DL reasoner we will know whether the information represented by the three forms is semantically equivalent or not.

Figure 5-5 No past history of diabetes mellitus

How the binding between information entities and clinical entities or concepts should be ontologically expressed is currently subject to intensive discussions. It has to be further analysed whether the relation isAbout appropriately captures this purpose. Information artefacts in clinical models are fractal. A whole archetype is about many different things, and only its atomic structures can be expected to be about one thing only, e.g. about one (kind of) disorder, like in the diagnostic statement "Confirmed diagnosis of congestive heart failure". Even this statement is, additionally, about a patient. Using isAbout in a broader sense would then violate the use of the universal restriction operator "only". We therefore introduce subrelations such as isAboutSituation or isAboutSubjectOfRecord.

The bigger problem, however, is that a complete account of "statements about" would require higher-order logics, whereas we are bound to a subset of first-order logic. The challenge is therefore to find appropriate "logical approximations", i.e. first-order projections of higher-order statements. The approach underlying our current models and examples makes use of universal quantification (isAboutSituation only).

The issue is that the diagnostic statement is second order. It is about understanding how the information entity relates to the clinical class, but not necessarily to any member of that class.

Higher order statements can be expressed in OWL full (or in OWL 2 with "puns"45) as follows:

TentativeDiagnosis and isAboutSituation value HeartFailureSituation

The keyword value indicates that the statement is about the class HeartFailureSituation (OWL Full) or a pun on that class (OWL-DL).

Expressed this way, the statement is "It is tentatively believed that this patient situation belongs to the class of HeartFailureSituations." In this reading there is no need for any heart failure, hypothetical or otherwise, because the statement is second order and about a statement that references the class of situations including heart failure rather than any individual heart failure situation. (If the tentative belief turns out to be true, then an instance of HeartFailureSituation must exist, but not otherwise, and we may never know). Using classes themselves in this way is an alternative for general issues around value sets46.

The use of value and puns requires implementing auxiliary mechanisms to achieve all the intended results, since OWL reasoners do not understand the link between the individual HeartFailureSituation and the class HeartFailureSituation.

A more radical but possibly more practical solution would be to say that what we should be doing is querying the ontology rather than formulating a class definition. Queries are second order, i.e. about the ontology, but allow negation as failure. Possible query languages include SPARQL 1.1 with the OWL entailment semantics (just being implemented but not easily available at the moment). A possible temporising, or even longer term solution might be a language such as Ocean Informatics' query language TQL. It would allow for expressions such as "Hypertension except hypertension in Pregnancy" with the expected semantics of negation as failure rather than provable falsehood. It also allows compensation for "idiosyncrasies" in SNOMED or other terminologies. In a natural extension of Manchester syntax this could look like:

TentativeDiagnosis and isAbout query (SELECT X: WHERE X subClassOf HeartFailureSituation)

This is just a way of selecting a collection of classes as values rather than a single class. As with OWL Full or puns, it would need additional mechanisms beyond standard OWL reasoners.

45 "Puns" are an approximation for the OWL Full intent that avoids the reasoner having to deal with higher order constructs, at the cost of allowing them to be expressed but reasoning about them totally separately.

46 A Rector, R Qamar, T Marley: Binding ontologies and coding systems to electronic health records and messages, Applied Ontology 4 (2009), 51-69. http://www.cs.man.ac.uk/~rector/papers/Terminology-binding-final-revision-embedded-single-rector%20copy.pdf

5.5 Negations

Negations are crucial in medical documentation. From a representational point of view the problem is that negations can occur at several places, both in the information model and in the ontology. Figure 5-5 depicts three possibilities to express that there is no past history of diabetes. In the two first ones the negation is represented in the information model, in the heading (e.g. No Previous History) and as a check list (e.g. Yes, No, Unknown) respectively. In the third one it is represented in classes from SNOMED CT (Not diabetic).

In Figure 5-6 we have added free text annotations for each information entity into which the representation of the statement No history of diabetes can be dissected. They describe the meaning of each of the information entities used to represent the clinical statement. In this case these annotations include the values selected for the patient (green comments).

Figure 5-6: Free text annotations – no past history of diabetes

If we transform the above free text annotations into DL expressions, for each of the three representations we will obtain the following expressions for each of the information entities (see Table 5-1). Note that each table corresponds to one of the representations and that the dark grey row represents the green free text comments (i.e. value).

Textual description | DL annotation

Representation 1
No Previous History | InformationEntity and isAboutSituation only ClinicalSituationWithoutDisorder and outcomeOf some HistoryTakingAction
Value (Diabetes) | InformationEntity and isAboutSituation only ClinicalSituationWithoutDiabetes and outcomeOf some HistoryTakingAction

Representation 2
Past History | InformationEntity and isAboutSituation only ClinicalSituation and outcomeOf some HistoryTakingAction
Value (No diabetes) | InformationEntity and isAboutSituation only ClinicalSituationWithoutDiabetes and outcomeOf some HistoryTakingAction

Representation 3
Past History | InformationEntity and isAboutSituation only ClinicalSituation and outcomeOf some HistoryTakingAction
Value (No diabetes) | InformationEntity and isAboutSituation only ClinicalSituationWithoutDiabetes and outcomeOf some HistoryTakingAction

Table 5-1 Isosemantic DL annotations for no past history of diabetes

SNOMED CT currently does not allow the use of negation and disjunction, whereas OWL DL supports both. Negation can be embedded in the clinical situation definition as in the following approximative definition:

ClinicalSituationWithoutX equivalentTo ClinicalSituation and not hasProcessualPart some X

Assuming that in SNOMED CT clinical findings are interpreted as clinical situations as introduced above, this will require adding negation and disjointness capabilities to SNOMED CT, which means moving SNOMED CT to the OWL DL representation. Kaiser Permanente is currently investigating the benefits and effects on classification performance of introducing limited negation into SNOMED CT.

We have created three instances in the ontology, one for each of the three forms, and we have added to them the corresponding axioms of the previous tables. If we launch the reasoner and make a query to retrieve all the patients with no past history of diabetes, it will show the three instances as result:

Figure 5: DL reasoning result – no past history of diabetes

However, if we make a query for obtaining the patients without past history of condition, we will only get the instance from form 1 (No_Past_History_Diabetes_F1). This is due to the fact that not having a past history of diabetes mellitus does not imply not having a past history of any condition.
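
The difference between the two query results can be reproduced with a minimal Python sketch. It is not the reasoner run described in the text: a hand-written superclass table replaces classification, it encodes the assumption that a situation without any disorder is also a situation without diabetes, and the instance names for forms 2 and 3 are formed by analogy with No_Past_History_Diabetes_F1. An instance is retrieved when at least one of its isAboutSituation only fillers is subsumed by the queried class, which is a sufficient (not complete) check.

```python
# Assumed subsumption table: class -> direct superclasses.
SUPERCLASSES = {
    "ClinicalSituationWithoutDisorder": {"ClinicalSituationWithoutDiabetes"},
    "ClinicalSituationWithoutDiabetes": {"ClinicalSituation"},
    "ClinicalSituation": set(),
}

# Fillers of 'isAboutSituation only' per instance, following Table 5-1.
ANNOTATIONS = {
    "No_Past_History_Diabetes_F1": {
        "ClinicalSituationWithoutDisorder",
        "ClinicalSituationWithoutDiabetes",
    },
    "No_Past_History_Diabetes_F2": {  # hypothetical name
        "ClinicalSituation",
        "ClinicalSituationWithoutDiabetes",
    },
    "No_Past_History_Diabetes_F3": {  # hypothetical name
        "ClinicalSituation",
        "ClinicalSituationWithoutDiabetes",
    },
}


def is_subsumed_by(cls: str, ancestor: str) -> bool:
    """Walk the (assumed acyclic) superclass table upwards."""
    if cls == ancestor:
        return True
    return any(is_subsumed_by(p, ancestor) for p in SUPERCLASSES.get(cls, ()))


def retrieve(query_class: str) -> list[str]:
    """Instances with at least one filler subsumed by the queried situation class."""
    return sorted(
        name
        for name, fillers in ANNOTATIONS.items()
        if any(is_subsumed_by(f, query_class) for f in fillers)
    )


if __name__ == "__main__":
    print(retrieve("ClinicalSituationWithoutDiabetes"))
    print(retrieve("ClinicalSituationWithoutDisorder"))
```

Under these assumptions the first query returns all three instances and the second only the instance of form 1, because the subsumption between the two negated classes holds in one direction only.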

Figure 5-6: DL reasoning result – no past history of condition

6 Discussion on alternatives, options, consequences and criteria for choice

In the context of SemanticHealthNet it has proven necessary to discuss the options we have for representing clinical information and which criteria should be followed in order to choose the most suitable representation. We think that more than one representation is needed, depending on the specific purpose. We have identified two main use cases:

1) Creation of clinical models

2) Query of clinical data instances of isosemantic models

6.1 Creation of clinical models

A representation for the definition of clinical models will check their syntactic and semantic consistency. For instance, in the context of the definition of openEHR and ISO 13606 archetypes, ADL was selected as representational language and the Archetype Object Model as the constraint model. There are already several archetype editors available that allow checking the syntactic correctness of the models created (with regard to the ADL grammar). However, checking whether these models are semantically correct is something that at present is not supported by these tools. The semantic consistency of clinical models must take into consideration not only the consistency of the information model (Reference Model) itself but also the consistency with regard to the semantics provided by the bindings. In order to support the definition of new models, more than one representation formalism might be needed: for instance, a graphical representation which is targeted to the clinician and which defines the model (e.g. a kind of mind map), another one to check the syntactic consistency (e.g. ADL parser) and another one for checking the semantic consistency (e.g. ontology) of the model.

Although ontologies are, in theory, the most suitable representation to be used to ensure semantic consistency of the models, we have to admit that this would require very expressive representations that are currently not available for SNOMED CT. At least rough category constraints can be obtained by declaring (most of) the SNOMED CT hierarchies as mutually disjoint. Thus, violations of category restrictions (e.g. that a diagnostic statement is about a drug) could be prevented. SemanticHealthNet is currently analysing ways to define value sets referring to SNOMED CT concepts. The idea formulated in the above model (including closed-world SPARQL queries into OWL axioms) could be taken up and adapted to this purpose.

As part of the clinical model creation task, there are some other related activities that can benefit from the representation formalism used. Examples of this are quality assurance of models, clinical data validation or management of clinical models. This last task will comprise searching activities and detection of inconsistencies among clinical models (e.g. wrong specializations).

Clinical models such as the one proposed by Mayo Clinic, the Clinical Element Model (CEM)47, or the one that will result from the CIMI initiative are examples of modelling proposals that might be suitable for the creation of models. They are prescriptive proposals, i.e. they define rules that clinical models have to follow when created.

47 http://informatics.mayo.edu/sharp/index.php/CEMS

6.2 Query of clinical data instances of isosemantic models

As opposed to the prescriptive nature of the modelling proposals for defining new models, this second use case focuses on the representation of clinical data from isosemantic models, in order to support the homogeneous query of data from heterogeneous medical repositories. It assumes that present hospital information systems represent clinical information using different EHR models (proprietary or based on some standard), and that the cost of implementing a mapping to a new model that supports semantic interoperability is too high.

A representation that fits these requirements should provide a semantic layer to current information systems. This semantic layer would sit between these information systems and a sophisticated query system. Its main roles will be to provide a consistent semantic representation of clinical data and to allow detecting semantic equivalences among them. Thus, a sophisticated query system will use the results from the semantic layer in order to build queries for retrieving data from heterogeneous medical repositories.

For building the semantic layer, we assume that the approach presented in this document, in Section 5, could be one of the most suitable options.

However, the costs of providing this semantic layer must not be underestimated. This approach requires the effort of creating a semantically consistent representation of clinical data. This is a tedious and non-trivial task, which cannot be achieved without very good support from appropriate informatics tools.

In order to make this approach feasible, a clinical model editor should internally support the creation of this semantic representation. It could access the ontology resources we propose during model design; an author could then be guided to construct well-formed models that are consistent internally, and as consistent as is appropriate with other models that have previously been developed and published. In fact, it is at this point that both use cases, (1) clinical models creation and (2) query of clinical data instances of isosemantic models, can play together. In the first use case we mentioned that ontologies are the most suitable representation to be used to ensure the semantic consistency of the models created, and this is just one of the services we are providing with the semantic layer. The other service would be the detection of semantically equivalent expressions among heterogeneous representations of clinical data (i.e. semantic interoperability).

Discussion

The approach of the Clinical Information Modeling Initiative (CIMI)48 is to define a set of patterns for the creation of clinical models; and their binding to a clinical vocabulary has been identified as one of the biggest sources of problems. A similar situation occurred in the context of the NHS Logical Record Architecture (LRA)49 project, and current DCM practice follows similar lines: every data element is bound to specific SNOMED CT or other terminology concepts, or to a specified reference set. We are conscious that the use of ontology-based vocabularies such as SNOMED CT is of great value and indispensable for achieving SIOp, but one consequence of its use is that it makes the creation of clinical models harder. The main problem is the inclusion of information aspects in SNOMED CT, as explained in previous sections, but sometimes the selection of one or another term for encoding a given piece of information is also not trivial. This situation led the LRA approach to develop a way of excluding some SNOMED CT terms when defining the bindings. In a similar vein, CIMI proposes to constrain the scope of SNOMED concepts to which information entities can be bound. In our opinion this can be a provisional solution (a first step to SIOp), which may work in some bounded environment, but given the huge size of the ontology, excluding all the undesired terms would become unmanageable and might end up breaking SIOp. SemanticHealthNet also acknowledges that those problematic terms exist because they are used to represent a complex clinical statement with one code. They should therefore motivate us to represent even these complicated statements in a principled way.

48 http://informatics.mayo.edu/CIMI/index.php/Main_Page

49 http://www.connectingforhealth.nhs.uk/systemsandservices/data/lra

We think that, in order to avoid ad-hoc and non-scalable solutions, solid principles have to be followed (such as the ones provided by formal and upper-level ontologies), which will sometimes also make the way to SIOp more painful. As in the case of the above-mentioned clinical modelling approaches (LRA and CIMI), some subsets from SNOMED CT will have to be selected, but the criteria to exclude them will always have a clear foundation, which may require suggesting new semantic representational patterns for them.

The approaches of CIMI, LRA and the one presented here in the SemanticHealthNet context are not so different from each other.

We think that the combined efforts from all of them can make really progress towards Semantic Interoperability. As mentioned earlier, the semantic layer provided by SemanticHealthNet could provide a semantic consistency checking of the models created as well the detection of semantic equivalent expressions in other models (e.g. pre-‐ vs. post-‐coordination). On the other hand, the patterns provided by clinical modelling approaches as CIMI, could define where to place all the epistemic and contextual aspects of information and assure the syntactic consistency of the models created, as well as providing other tasks related with the creation of clinical models (e.g. searching & management of clinical models, clinical data validation, etc.).

7 Models and Use Cases

In this section two different SIOp approaches are presented. Use case 1 determines the equivalence between the information models used by different EHR systems and tries to classify the exchanged clinical data in the receiver system. Use case 2 determines the equivalence between clinical data gathered in heterogeneous ways, so that systems and humans can successfully query the desired information from heterogeneous representations. The section starts from simple data acquisition forms (see Figure 7-1), describes them using different clinical models, and demonstrates semantic annotations in OWL DL, the semantic equivalence of which can be proven by a DL classifier. In both use cases it is supposed that each system uses a different EHR standard. However, SIOp is not only an issue when using different EHR standards but also when using the same one, in which many different isosemantic models can be built. In practice the greater heterogeneity will not be between two different information models but between different clinical models that have been created by different teams with overlapping scope.

The examples chosen are fictitious and deliberately simple. They are not intended to be real examples but just a way of clarifying some of the issues presented in this document. We hope that they sufficiently exemplify how important aspects of the heart failure use case can be addressed. On the other hand, they are generic enough that solutions for other use cases can be abstracted from them. Currently, SemanticHealthNet Workstream one (WPs 1-3) is working together with WP4 on defining a Heart Care summary report that we will later use as input to create more realistic and complete clinical models.

For the time being it is not yet clear which of these two (major) types of use cases will prevail in SemanticHealthNet, and which variations of these use cases may be introduced. For instance, one variation would be that an ontology-based semantic annotation (such as the outcome of approaches following use case type 1) is imported into an "empty" information model.

Figure 7-1 Heterogeneous clinical application forms for recording the same information

The figure shows screenshots of three different fictitious and simplified user forms, intended to record some clinical information obtained in a clinical encounter, more specifically the clinical history of a patient and his diagnosis. Each form captures the same clinical information but in a different way, which provides us with a semantic interoperability use case. We can imagine a scenario in which the three EHR systems to which these clinical forms belong would like to exchange this kind of information. In the following subsections the two proposed interoperability use cases are explained based on this example.

7.1 Use case 1 (Data transfer)

Data elements (d1, d2, ... , dn) recorded in System X using clinical model C following standard S are transferred to System Y using clinical model D and following standard T.

The exchange process will consist in the following phases:

• Phase 1: Semantic clinical information extraction
• Phase 2: Clinical data exchange
• Phase 3: Syntactic clinical information mapping

7.1.1 Phase 1: Semantic clinical information extraction

This phase consists of identifying the same clinical information in each of the systems independently of how it has been recorded or captured. To do so, each of the information entities will be annotated with Description Logics (OWL-DL) expressions according to the ontological framework proposed in Section 5. These DL annotations will not be the same for each form, but by using a DL reasoner they should be classified according to their definition.

Figure 7-2 depicts the excerpt from each clinical form that states that the patient does not have a history of diabetes mellitus. It can be observed how this information has been captured in a different way in each one. In form one there is a text field with the text "Diabetes" and three radio buttons with the options "Yes, No, Unknown" respectively, where the second option has been selected. In form two there is a text field named "Other diseases" and the SNOMED CT code "no diabetes" has been selected from a combo box. Finally, in form three this information has been recorded in the part of the form named "Others", where the SNOMED CT code has also been selected from a combo box. In all three forms the information has been recorded in the patient history section.

Figure 7-2 Heterogeneous representations for No History of diabetes

In Figure 7-2, we have added free text annotations for each information entity into which the representation of the statement no history of diabetes can be dissected. The yellow comments describe the meaning of each of the information entities used to represent the clinical statement. The green comments describe the values selected for the patient. We can imagine that each of these forms implements a different EHR standard. Below, excerpts of the clinical models corresponding to forms one and two are provided, according to the ISO 13606 and openEHR standards respectively. The same information can similarly be represented in Detailed Clinical Model format or as Health Level 7 clinical statements. For ease of reading, only two examples are provided.

Figure 7-3 ISO 13606 clinical model used for representing No History of diabetes (form one)

Figure 7-4 OpenEHR clinical model used for representing No History of diabetes (form two)

It can be observed how the comments added to the information entities correspond to the yellow text annotations of Figure 7-2. In the ISO 13606 model, the information entity ELEMENT[at0002] is bound to the SNOMED CT concept Diabetes mellitus. However, in the OpenEHR model the information entity ELEMENT[at0003] is not bound to a specific kind of disorder. The OpenEHR model in this case allows encoding any kind of disorder. In order to identify the semantic equivalence between the information represented by the different clinical forms, we will add the DL annotations shown in Table 7-1 and Table 7-2 for each of the clinical entities of the ISO 13606 and OpenEHR representations.

We have created DL annotations only for the clinical models without including the concrete values (patient data). Later, we will show the corresponding data instances for both ISO 13606 and openEHR models that include the patient data and we will try to classify them according to the different clinical form representations.

SECTION[at0000] matches {    -- History of problem / condition
    members cardinality matches {1..*; unordered} matches {
        ENTRY[at0001] matches {    -- Problem / Condition
            items cardinality matches {1..*; unordered} matches {
                ELEMENT[at0002] matches {    -- Diabetes Mellitus
                    value matches {
                        SIMPLE_TEXT[at0003] matches {    -- SIMPLE_TEXT
                            originalText matches {"Yes","No","Unknown"}
                        }
                    }
                }
            }…
        }
    }
}

SECTION[at0000] matches {    -- History of condition
    items cardinality matches {0..*; unordered; unique} matches {
        EVALUATION[at0001] matches {    -- Other disorders
            data matches {
                ITEM_TREE[at0002] matches {    -- ITEM_TREE
                    items cardinality matches {0..*; unordered} matches {
                        ELEMENT[at0003] matches {    -- Disorder
                            value matches {
                                DV_CODED_TEXT[at0004] matches {*}    -- DV_CODED_TEXT
                            }
                        }
                    }
                }…
            }
        }
    }
}

In order to identify the information entity we are annotating in the clinical model, each one is identified by using the ADL path (footnote 50) shown in the tables. These annotations are based on the ontological infrastructure presented in Section 5. If we have a look at the DL annotations added to the ISO 13606 clinical model, it can be observed that they do not include the clinical data, i.e. "no diabetes mellitus", but only the information entity that refers to the clinical entity "diabetes mellitus". Because it uses only the universal quantifier, this annotation does not assert the existence or absence of the disease (i.e. SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDiabetesMellitus).

Table 7-1 ISO 13606 representation according to Fig. 7-3 (Form one)

A similar situation occurs with the annotations added to the openEHR clinical model. However, in this case we know that the patient might have a disorder but we do not know exactly which one (i.e. SHN_clinical_information_item and (isAboutSituation only SHN_SituationOfPersonWithDisorder)).

Table 7-2 openEHR representation according to Fig. 7-4 (Form two)

The conjunction of all the DL annotations provided in each table forms the DL expression that corresponds to the ISO 13606 and openEHR representations of the clinical models for History of diabetes (see Table 7-3).

50 http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/overview.pdf

ISO 13606

| ADL path | Textual description | DL annotations |
|---|---|---|
| content[at0000]/ | History of problem / condition | SHN_clinical_information_item and (outcomeOf some SCT_history_taking) and (isAboutSituation only SHN_Clinical_Situation) |
| content[at0000]/members[at0001]/ | Problem / Condition | SHN_clinical_information_item and isAboutSituation only SHN_ClinicalSituation |
| content[at0000]/members[at0001]/items[at0002]/ | Diabetes mellitus | SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDiabetesMellitus |

openEHR

| ADL path | Textual description | DL annotations |
|---|---|---|
| content[at0000]/ | History of condition | SHN_clinical_information_item and (outcomeOf some SCT_history_taking) and (isAboutSituation only SHN_Clinical_Situation) |
| content[at0000]/items[at0001]/ | Other disorders | SHN_clinical_information_item and (isAboutSituation only SHN_SituationOfPersonWithDisorder) |
| content[at0000]/items[at0001]/data[at0002]/ | Disorder | SHN_clinical_information_item and (isAboutSituation only SHN_SituationOfPersonWithDisorder) |

| openEHR (Form two) | DL annotations (History of disorder) |
|---|---|
| History of condition and Other disorders and Disorder | SHN_clinical_information_item and (outcomeOf some SCT_history_taking) and (isAboutSituation only SHN_Clinical_Situation) and SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDisorder and SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDisorder |

| ISO 13606 (Form one) | DL annotations (History of diabetes) |
|---|---|
| History of condition and Problem / Condition and Diabetes mellitus | SHN_clinical_information_item and (outcomeOf some SCT_history_taking) and (isAboutSituation only SHN_Clinical_Situation) and SHN_clinical_information_item and isAboutSituation only SHN_ClinicalSituation and SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDiabetes |

Table 7-3 DL expression for openEHR and ISO 13606 clinical models
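As a rough illustration of how per-entity annotations could be combined into a single expression of the kind shown in Table 7-3, the following Python sketch keys each annotation by its ADL path and joins them with "and". The dictionary layout and the function name conjoin_annotations are assumptions made for this example and are not part of any SemanticHealthNet tooling; the paths and class names are copied from the ISO 13606 annotations of Table 7-1.

```python
# Hypothetical sketch: DL annotations keyed by ADL path (values copied from Table 7-1).
ISO_13606_ANNOTATIONS = {
    "content[at0000]/": (
        "SHN_clinical_information_item and (outcomeOf some SCT_history_taking) "
        "and (isAboutSituation only SHN_Clinical_Situation)"
    ),
    "content[at0000]/members[at0001]/": (
        "SHN_clinical_information_item and isAboutSituation only SHN_ClinicalSituation"
    ),
    "content[at0000]/members[at0001]/items[at0002]/": (
        "SHN_clinical_information_item and isAboutSituation only "
        "SHN_SituationOfPersonWithDiabetesMellitus"
    ),
}


def conjoin_annotations(annotations):
    """Join all annotations into one expression, outermost entity first."""
    # A shorter path is an ancestor of a longer one here, so sorting by
    # path length walks the model from the section down to the element.
    ordered = sorted(annotations, key=len)
    # Each annotation is wrapped in parentheses to keep the conjuncts apart.
    return " and ".join("(" + annotations[path] + ")" for path in ordered)


expression = conjoin_annotations(ISO_13606_ANNOTATIONS)
print(expression)
```

The parentheses around each conjunct are a design choice of this sketch; they only make the grouping explicit.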

If we now launch the DL reasoner, the three representations for No history of diabetes (i.e. NoHistory_of_diabetes_record_X, where X corresponds to the form number) will be classified as shown in Figure 7-5. Note that the representation for form three (NoHistory_of_diabetes_record_3) is more generic than the other two representations. In form three the information has been recorded in the part of the form named "Others" and it could therefore refer to any condition of the patient. Directly classified under the representation for form three is the representation for form two. In this case the information has been recorded in the part of the form named "Other diseases" and therefore we know that the clinical data recorded should be a kind of disease. Finally, the representation for form one is the most specific one. In this case the text field with the text "Diabetes" in the form specifies the disease about which we want to obtain some information.

Figure 7-5 Reasoning with the DL annotations for the No history of diabetes use case
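The classification described above can be imitated with a very small Python sketch. It is only a toy: a real DL reasoner does far more than this, and the sketch relies on one rule, namely that (R only C) is subsumed by (R only D) whenever C is a subclass of D. The is-a hierarchy between the three situation classes and the choice of SHN_ClinicalSituation as the filler for form three are assumptions made for the example.

```python
# Toy illustration only: a real OWL DL reasoner handles far more than this.
# Assumed (hypothetical) is-a hierarchy between the situation classes.
PARENT = {
    "SHN_SituationOfPersonWithDiabetesMellitus": "SHN_SituationOfPersonWithDisorder",
    "SHN_SituationOfPersonWithDisorder": "SHN_ClinicalSituation",
    "SHN_ClinicalSituation": None,
}

# Filler of the most specific "isAboutSituation only ..." restriction per form.
RECORD_FILLER = {
    "NoHistory_of_diabetes_record_1": "SHN_SituationOfPersonWithDiabetesMellitus",
    "NoHistory_of_diabetes_record_2": "SHN_SituationOfPersonWithDisorder",
    "NoHistory_of_diabetes_record_3": "SHN_ClinicalSituation",  # assumed
}


def is_subclass(child, ancestor):
    """True if `child` equals `ancestor` or lies below it in PARENT."""
    while child is not None:
        if child == ancestor:
            return True
        child = PARENT[child]
    return False


def is_subsumed(record, other):
    """(R only C) is subsumed by (R only D) whenever C is a subclass of D."""
    return is_subclass(RECORD_FILLER[record], RECORD_FILLER[other])


for record in RECORD_FILLER:
    supers = [o for o in RECORD_FILLER if o != record and is_subsumed(record, o)]
    print(record, "is classified under", supers)
```

Under these assumptions the record for form one ends up below the record for form two, which in turn ends up below the record for form three.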

For each of the clinical information entities that compose the history of the patient, as well as for the diagnosis section, the same annotation process has to be performed. Figure 7-6 shows the final DL representation for form one; the others would have a similar representation. Note that Diagnosis_1, History_FH_of_heart_failure_record_1, etc. are defined OWL classes but their full definition has been omitted here.

Figure 7-6 Final DL representation for form one

7.1.2 Phase 2: Clinical data exchange

This phase consists of exchanging clinical data between a system X and a system Y by using the equivalences between the DL annotations established in phase one and adding the DL representation of the clinical data to be exchanged. The requesting system will receive these data if they are correctly classified in it.

We will explain it by using the same example as in phase one. Imagine that forms one and two from Figure 7-1 belong to the EHR systems X and Y, based on the ISO 13606 and openEHR standards respectively. System Y asks system X for the history of disorders of some patient by using the openEHR archetype shown in Figure 7-4. Assuming that the equivalences between the different DL annotations for each information entity have been inferred in phase one (see Figure 7-5), the next step consists of exchanging the clinical data. Figure 7-7 shows an excerpt of the data extract generated for the ISO 13606 clinical model. These data should be received by the OpenEHR system.

Figure 7-7 Excerpt of ISO 13606 EHR data extract

Table 7-4 shows all the DL annotations for both the ISO 13606 and openEHR system representations. The dark grey row at the bottom of the ISO 13606 OWL DL expressions corresponds to the DL expression for the clinical data to be exchanged (no diabetes mellitus) from the ISO 13606 system to the openEHR one.

Class: EHR_record_one
    EquivalentTo:
        shn_composition
        and ('has abstract part' some Diagnosis_1)
        and ('has abstract part' some History_FH_of_heart_failure_record_1)
        and ('has abstract part' some History_of_chest_pain_record_1)
        and ('has abstract part' some NoHistory_of_diabetes_record_1)
    SubClassOf:
        shn_composition

ELEMENT[at0002] matches {    -- Diabetes Mellitus
    value matches {
        SIMPLE_TEXT[at0003] matches {    -- SIMPLE_TEXT
            originalText matches {"Yes","No","Unknown"}
        }
    }

<items type="ELEMENT">
    <archetype_id>at0002</archetype_id>
    <name type="SIMPLE_TEXT">
        <originalText>Diabetes mellitus</originalText>
    </name>
    <value type="SIMPLE_TEXT">
        <archetype_id>at0003</archetype_id>
        <originalText>No</originalText>
    </value>
</items>
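To make the structure of the ISO 13606 extract more concrete, the following Python sketch reads the XML fragment with the standard library and pulls out the element name and the recorded value. The function name read_element and the returned dictionary layout are assumptions made for this illustration; the XML itself is copied from the excerpt above.

```python
import xml.etree.ElementTree as ET

# XML fragment copied from the ISO 13606 data extract excerpt.
EXTRACT = """
<items type="ELEMENT">
    <archetype_id>at0002</archetype_id>
    <name type="SIMPLE_TEXT">
        <originalText>Diabetes mellitus</originalText>
    </name>
    <value type="SIMPLE_TEXT">
        <archetype_id>at0003</archetype_id>
        <originalText>No</originalText>
    </value>
</items>
"""


def read_element(xml_text):
    """Return the archetype node, the element name and the recorded value."""
    root = ET.fromstring(xml_text.strip())
    return {
        "archetype_id": root.findtext("archetype_id"),
        "name": root.findtext("name/originalText"),
        "value": root.findtext("value/originalText"),
    }


element = read_element(EXTRACT)
print(element)
```

The value "No" only becomes meaningful when read together with the element name "Diabetes mellitus", which is why both are returned side by side.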

OpenEHR ⇐ ISO 13606

| openEHR (Form two) | DL annotations (History of disorder) |
|---|---|
| History of condition and Other disorders and Disorder | SHN_clinical_information_item and (outcomeOf some SCT_history_taking) and (isAboutSituation only SHN_Clinical_Situation) and SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDisorder and SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDisorder |

| ISO 13606 (Form one) | DL annotations (History of diabetes) |
|---|---|
| History of condition and Problem / Condition and Diabetes mellitus | SHN_clinical_information_item and (outcomeOf some SCT_history_taking) and (isAboutSituation only SHN_Clinical_Situation) and SHN_clinical_information_item and isAboutSituation only SHN_ClinicalSituation and SHN_clinical_information_item and isAboutSituation only SHN_SituationOfPersonWithDiabetes and |
| Clinical data (no diabetes mellitus) | SHN_clinical_information_item and isAboutSituation some SHN_SituationOfPersonWithoutDiabetes |

Table 7-4 DL expressions for information exchange between ISO 13606 and OpenEHR systems

If we launch the DL reasoner again (see Figure 7-8), the clinical data from the ISO 13606 system will be properly classified in the openEHR system. It can be observed in the figure how "NoHistory_of_diabetes_record_1_withData", which corresponds to the ISO 13606 DL annotations shown in Table 7-4, is classified under the class "NoHistory_of_diabetes_record_2", which represents the DL annotations shown in the same table for the OpenEHR representation.

Figure 7-8 Reasoning with the DL annotations for the No history of diabetes use case

7.1.3 Phase 3: Syntactic clinical information mapping

This phase consists of transforming the clinical information extracted from a system X into a representation according to another system Y. First, in phase one, the equivalence between the different representations of clinical information has been inferred. Second, in phase two, clinical data have been properly classified in the receiver system. Then, in this phase, in order to integrate the received data into the receiver system, a set of syntactic mapping rules will have to be applied to the clinical information from system X in order to transform it into a representation according to system Y. We will explain it by using the same example as in phases one and two, in which the diabetes information is exchanged from system X (ISO 13606) to system Y (openEHR). Therefore, this step will consist of transforming each of the corresponding ISO 13606 information entities to openEHR in order to represent the clinical information from system X in system Y as if it had been recorded in the latter. This requires applying a set of previously defined syntactic mapping rules, in this case established to transform ISO 13606 information entities into openEHR ones.

| openEHR: Textual description | openEHR: ADL path | ISO 13606: Textual description | ISO 13606: ADL path |
|---|---|---|---|
| History of condition | content[at0000]/ | History of problem / condition | content[at0000]/ |
| Other disorders | content[at0000]/items[at0001]/ | Problem / Condition | content[at0000]/members[at0001]/ |
| Disorder | content[at0000]/items[at0001]/data[at0002]/items[at0003]/ | Diabetes mellitus | content[at0000]/members[at0001]/items[at0002]/ |
| No Diabetes mellitus | content[at0000]/items[at0001]/data[at0002]/items[at0003]/value[at0004]/value | No | content[at0000]/members[at0001]/items[at0002]/value[at0003]/originalText/ |

Table 7-5 Correspondence between syntactic paths from ISO 13606 and openEHR clinical models

In Table 7-5 (above), each row shows an OpenEHR to ISO 13606 information entity mapping and its corresponding ADL paths in both standard representations. The last row in the table corresponds to the value, i.e. the clinical data recorded by the clinician about the patient. The value has to be interpreted together with the information entity that contains it. For instance, the information entity ELEMENT[at0002], which refers to Diabetes mellitus, has to be interpreted together with the value "No" (i.e. No diabetes mellitus). The same occurs in the openEHR representation, where the information entity ELEMENT[at0003], which refers to the annotation Disorder, has to be interpreted together with the value (i.e. the SNOMED CT expression "No diabetes mellitus").

Thus, in the openEHR representation the value consists of the SNOMED CT code for No diabetes mellitus, while in ISO 13606 it consists of the string "No", which belongs to the information entity ELEMENT[at0002], which in turn refers to Diabetes Mellitus. In order to represent these data according to OpenEHR, both the information entity and its value will have to be interpreted together, and this will require additional transformation rules. In this case a rule would transform the information entity ELEMENT[at0002], which refers to Diabetes mellitus and has the value "No" represented as a simple text datatype, into the SNOMED CT code that represents the No diabetes information in OpenEHR as a coded text datatype. Figure 7-9 depicts an excerpt of the clinical data exchanged and represented according to OpenEHR.
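A transformation rule of this kind could look roughly like the following Python sketch. It is a minimal illustration under stated assumptions: the names PATH_MAP, NO_DIABETES_EXPRESSION and map_value are hypothetical, only the single combination "Diabetes mellitus" with value "No" is covered, and the path correspondence and the SNOMED CT expression are copied from Table 7-5 and from the openEHR data extract respectively.

```python
# ISO 13606 value path and its openEHR counterpart, copied from Table 7-5.
ISO_VALUE_PATH = (
    "content[at0000]/members[at0001]/items[at0002]/value[at0003]/originalText/"
)
PATH_MAP = {
    ISO_VALUE_PATH: (
        "content[at0000]/items[at0001]/data[at0002]/items[at0003]/value[at0004]/value"
    ),
}

# Post-coordinated SNOMED CT expression as it appears in the openEHR extract.
NO_DIABETES_EXPRESSION = (
    "373572006 |clinical finding absent|: "
    "{246090004|associated finding|=73211009|diabetes mellitus|}"
)


def map_value(iso_path, element_name, value):
    """Map an ISO 13606 simple text value to an openEHR coded text value."""
    openehr_path = PATH_MAP[iso_path]
    # The value is only meaningful together with the element that holds it.
    if element_name == "Diabetes mellitus" and value == "No":
        coded = {
            "type": "DV_CODED_TEXT",
            "terminology_id": "SNOMED CT",
            "code_string": NO_DIABETES_EXPRESSION,
            "value": "no diabetes",
        }
        return openehr_path, coded
    # Other combinations would need their own rules; none are invented here.
    raise ValueError(f"no transformation rule for {element_name!r} = {value!r}")


path, coded = map_value(ISO_VALUE_PATH, "Diabetes mellitus", "No")
print(path)
print(coded["code_string"])
```

Raising an error for uncovered combinations, rather than guessing a code, reflects the point made in the text that such rules have to be defined explicitly.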

Figure 7-9 Excerpt of openEHR EHR data extract

7.1.4 Comments on Use case 1

The above use case has shown an example of an interoperability scenario in which it is possible first to compute the equivalence between the isosemantic models without taking into account the values (data instances) and then to classify the clinical data from one system in another one. We have also assumed that it was possible to define the syntactic mappings between the openEHR and ISO 13606 clinical models without information loss.

However, the above use case corresponds to an ideal situation in which the clinical model entities that represent the history of diabetes are correctly classified. In other cases, for instance the family history of heart failure, applying the reasoner to the OWL DL annotations added to the information entities yields a different result:

Since the information entity used in form three to encode the family history of heart failure is represented as "History / Others", which only states that it refers to the history of some condition, it is not classified under either of the representations of forms one and two, where the information entities refer explicitly to the family history of heart failure and the family history of disease. This means, for instance, that the clinical data from the system that uses form one will not be received by the system that uses form three.

In phase three, when syntactic mappings are applied in order to represent the clinical data from a system B in the corresponding representation of a system A, it will not always be possible to do the mapping without information loss. Moreover, the rules needed to map specific clinical data that are represented at different granularity levels in the two systems are not obvious.

ELEMENT[at0003] matches {    -- Disorder
    value matches {
        DV_CODED_TEXT[at0004] matches {*}    -- DV_CODED_TEXT
    }
}

<items type="ELEMENT">
    <archetype_node_id>at0003</archetype_node_id>
    <name type="DV_TEXT">
        <value>Disorder</value>
    </name>
    <value type="DV_CODED_TEXT">
        <archetype_node_id>at0004</archetype_node_id>
        <defining_code type="CODE_PHRASE">
            <terminology_id>SNOMED CT</terminology_id>
            <code_string>
                373572006 |clinical finding absent|: {246090004|associated finding|=73211009|diabetes mellitus|}
            </code_string>
        </defining_code>
        <value>no diabetes</value>
    </value>
</items>

7.2 Use case 2 (Data abstraction for querying)

This use case refrains from converting the data from one clinical model into another. Here, the syntactical interoperability is out of reach, even in the case that both models represent exactly the same information. Interoperability is therefore limited to an exchange of the semantic annotations in description logics.

Let us assume that two systems X and Y follow information model standards S and T respectively. For the same kind of clinical content System X uses model C; whereas System Y uses model D. Each system has the clinical data stored according to these models. These clinical data are annotated with DL expressions (clinical model + values). Additional metadata (authors, time, location, etc.) are stored separately.

Each system will export its clinical data to an RDF repository, thus making them available for querying. The inferred model resulting from the OWL DL annotations made to each of the models C and D will also be exported to RDF. It will be possible to make SPARQL queries to retrieve data from both the system X and system Y RDF repositories in a homogeneous way by using the inferred annotations.
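The idea of querying both repositories through the inferred annotations can be sketched in plain Python without an RDF library. The triples, the prefixes x:, y: and ex:, and the class names below are all hypothetical; the match function plays roughly the role that a SPARQL query with a single rdf:type triple pattern would play against a real RDF store.

```python
# Hypothetical triples exported by systems X and Y after classification.
# Lines marked "inferred" stand for types added by the DL reasoner.
TRIPLES = [
    ("x:diag1", "rdf:type", "ex:Form1Diagnosis"),
    ("x:diag1", "rdf:type", "ex:SuspectedHeartFailureDiagnosis"),  # inferred
    ("y:diag9", "rdf:type", "ex:Form2Diagnosis"),
    ("y:diag9", "rdf:type", "ex:SuspectedHeartFailureDiagnosis"),  # inferred
    ("y:diag10", "rdf:type", "ex:Form2Diagnosis"),
]


def match(pattern, triples):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in triples
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


def subjects_of_type(class_name, triples=TRIPLES):
    """All subjects typed with class_name, regardless of the source system."""
    hits = match((None, "rdf:type", class_name), triples)
    return sorted({subject for subject, _, _ in hits})


print(subjects_of_type("ex:SuspectedHeartFailureDiagnosis"))
```

Because the query addresses the inferred class rather than a form-specific one, records from both systems are returned by the same call.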

Clinical Information is queried and shown to the receiver system but cannot be straightforwardly integrated in it (from abstract representation to more concrete one).

The following example focuses on showing how the semantic annotations added to clinical data from isosemantic models can be used to detect that they are semantically equivalent.

Figure 7-10 shows three user interfaces (with appropriate value sets) from three fictitious EHR systems.

Figure 7-10 Heterogeneous clinical application forms for recording the same information

Each user interface allows acquiring some kind of information about a clinical case (demographics, time, and other metadata are left out for the sake of simplicity). What the three interfaces have in common is that they refer to some piece of diagnostic information, in short a diagnosis. Diagnosis here should be understood as the statement of what is known about the disease/disorder of a patient. A diagnosis DiagD does not necessarily have a referent D in the real world: if a diagnosis DiagD has the status suspected, as is typical for admission diagnoses, then we cannot infer that there is really any instance of D in the patient.

In the following figure, each form consists of fixed and variable elements (values). Each of these elements contributes some part of the overall meaning of the information as a whole. The yellow comments shown in Figure 7-11 describe in text what the information represented is about.

Figure 7-11 Three different forms representing the same kind of information, with text annotations of all of its fixed and variable elements

The whole picture is only given by a combination of these annotations. Translating the free text annotations into OWL-DL axioms, one by one, yields the annotated forms shown in Figure 7-12, Figure 7-13 and Figure 7-14.

Figure 7-12: Example of "filled" form with DL annotations. Note the different UI paradigms, here drop-down menu vs. check boxes

In this example, on purpose, the filled forms are semantically equivalent, which can be shown by automated reasoning (see Figure 7-18). If we consider the models without values, we clearly see the semantic differences:

Figure 7-13 Example of a filled information template in which the whole information is included in one term from a clinical terminology. The DL annotation anticipates the supposedly "correct" SNOMED CT representation of the concept "Suspected heart failure caused by physical exercise"

Figure 7-14: Example of a filled information template with DL annotations. In contrast to Fig. 7-12, the element "physical exercise" is here part of a value set and not a fixed element of the information model

Figure 7-15: Example of "empty" information templates.

The first one reduces the scope to organ failure diagnoses; the second one allows for any diagnosis. Different degrees of freedom apply regarding status and cause between the models.

The OWL DL annotation pattern for an empty clinical information model requires placeholders. It can only be used for reasoning if all placeholders are filled by values. During the creation of the clinical models, some OWL DL expressions could be specified by leaving the specific values open and only specifying the value sets.

Figure 7-16: Example of an empty information template with corresponding DL expressions in which the variable elements are referred to as ?x, ?y, ?z
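The role of the placeholders can be illustrated with a short Python sketch. The template string, the property names hasStatus and hasCause, and the value names are hypothetical and chosen only for this example; the sketch shows that an expression is regarded as ready for reasoning only once no placeholder such as ?x remains.

```python
import re

# Hypothetical annotation pattern for an "empty" information template.
TEMPLATE = (
    "Diagnosis and (isAboutSituation only ?x) "
    "and (hasStatus some ?y) and (hasCause some ?z)"
)

# A placeholder is a question mark followed by one lowercase letter.
PLACEHOLDER = re.compile(r"\?[a-z]\b")


def fill(template, values):
    """Replace each placeholder that has a value; leave the others as they are."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(0), m.group(0)), template)


def ready_for_reasoning(expression):
    """An expression is usable for reasoning only if no placeholder is left."""
    return PLACEHOLDER.search(expression) is None


partial = fill(TEMPLATE, {"?x": "HeartFailureSituation"})
full = fill(
    TEMPLATE,
    {"?x": "HeartFailureSituation", "?y": "Suspected", "?z": "PhysicalExercise"},
)
print(ready_for_reasoning(partial), ready_for_reasoning(full))
```

In a real setting the admissible values for each placeholder would come from the value sets bound to the model rather than from free strings.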

Figure 7-16 depicts how diagnosis instances, i.e. the real clinical data as embedded into an information model, are annotated by OWL-DL (T-Box) expressions. These annotations have the form:

Diagnosis_y#123 Type Diagnosis and (isAboutSituation only Situation_Y) and….

Later, in order to add the clinical data instances, the 'isAboutSituation only' is filled by an atomic or composed SNOMED CT concept using the OWL-EL language. Special patterns apply to negated statements. The additional conjuncts further specify the diagnosis (e.g. whether confirmed or suspected). It will have to be seen in the future which additional information (time, place, author, patient, etc.) will have to be managed outside this framework.

The following figure shows three instances, one for each form, defined according to the OWL DL axioms shown in the previous figures (Figure 7-12, Figure 7-13 and Figure 7-14).

Figure 7-17: Diagnosis instances

The check for the semantic equivalence of the information represented by each form can then be performed by a DL reasoner completely at the T-Box level. Queries could then be formulated as DL queries as follows:

All three information instances found

Figure 7-18: DL classification (1)

Figure 7-18 and Figure 7-19 show how the three instances of diagnosis are retrieved independently of the granularity at which the query is performed.

Figure 7-19 DL classification (2)

7.2.1 Comments on Use case 2

The above use case has shown an example of an interoperability scenario in which the clinical information captured by three isosemantic models is computed as semantically equivalent if both the "empty forms" and the values are used together for reasoning.

It could be summarized as consisting of the following steps:

• Manual semantic annotation of (empty) information model and value sets using the proposed ontology and SNOMED CT
• Filling of the information model with data
• Automated semantic annotation of the model and the clinical data

A translation problem might occur in step 1, which consists of correctly adding semantics to the information model. Ideally, this step would make explicit all implicit assumptions contained in the information model. As it involves humans, who have to be trained and need to provide high-quality work, it might be error prone. It is a process that needs to be quality assured, as it will have a major impact on the workflows and tooling.

This use case could be interpreted as a variant of use case one. However, it does not also pursue syntactic interoperability, i.e. the integration of the clinical data in the receiver system. Instead, it provides a semantic layer that sits between current information systems and a sophisticated query system. The main roles of this layer will be to provide a consistent semantic representation of clinical data and to allow the detection of equivalent semantics among them. Thus, a sophisticated query system will use the results from the semantic layer in order to build queries for retrieving data from heterogeneous medical repositories. This use case does not aim to provide reverse translation from the ontological representation in OWL back to the original representation according to a specific EHR standard and ontology.

At this moment we consider this the most realistic scenario for providing SIOp. WP4 is looking for more practical evidence of this. However, other alternatives, or combinations of those already presented, might be the subject of further study. More discussion about this can also be found in Section 6.

8 Notes about the deliverable

At the time of writing there is not a complete consensus about all the aspects discussed in this deliverable. Open issues have been highlighted in the text. The deliverable documents the work and the discussions during the first ten months of SemanticHealthNet WP4. It constitutes a snapshot of lively discussions at the end of November 2012. It successfully documents the problems that exist when trying to formalize SIOp using a deep ontological analysis and principled representation languages and upper models. Preliminary results show the need for compromises due to practical implementation issues. It also anticipates the challenges of tooling and training when it comes to practical realizations.

The examples provided are deliberately simple and fictitious. They should not be analysed too seriously from a clinical point of view, but seen as exemplars that are understandable both for medical experts and medical informaticians. They are meant to highlight several facets of clinical documentation and to serve as blueprints that can support real clinical applications. Currently, WP4 together with WS1 is working on developing a heart care summary that will later be used as a SIOp use case and thus used to test the framework presented here. Whereas the current examples are very diagnosis-driven, the heart care summary will also shift the focus to other aspects of documentation.

Further discussions will follow in the consortium, taking this deliverable as a source in which most of the challenges to achieving SIOp are presented. In the longer term, the practicality and tractability of the suggested approaches have to be studied and tested in real-world settings in the framework of SemanticHealthNet Workstream 2, and translated into a more accessible form to enable a broader group of implementers to appreciate and learn from the experience.

9 Annex I: Heart care summary use case

The SemanticHealthNet Network of Excellence project held a two-day technical meeting on 26th and 27th April in Copenhagen, attended by partners of WP4 (see participants list). Three external experts were also invited to the meeting.

One of the main objectives of this first technical meeting was to represent an example use case using different EHR standards together with SNOMED CT. To this end, one month before the meeting, the partners from the openEHR, HL7, EN 13606 and IHTSDO organizations received the following use case:

USE CASE #1: HEART CARE SUMMARY

VISIT DATE: 21/07/2004

PATIENT: Minnie Mouse

HOSPITAL: Hospital no 1

HISTORY: An 80-year-old patient presents herself in an outpatient clinic due to breathlessness and a lot of fatigue. The primary diagnosis is heart failure. Allergies are unknown. The patient is not diabetic and there is no answer to the question whether the patient had pain.

HEART FAILURE EVALUATION: BP is 104/58. Current status of heart failure is moderate to severe, and the disease is stable.

ON EXAMINATION (21/07/2004): Oedema around knees. JVP at ear lobes level. Tachycardia. Most recent ECG (17/07/2004) shows a heart rate of 73, atrial fibrillation and previous anterior infarct.

REQUESTED TESTS: glucose tolerance test

Patient was recommended to start spironolactone 25mg/d. Check potassium in two weeks.

On the first meeting day, the work package leaders presented some basic ideas of their suggested methodology for harmonizing heterogeneous documentation artefacts.

The partners' resulting models were presented and discussed one by one. Another objective of this meeting was to present a possible solution that could serve as a bridge between the different representations provided. On the second day, the WP leaders presented their ontological framework as a possible solution and gave an overview of some of its general aspects. Afterwards, some fragments of the use case were first compared in terms of the different information model specifications and then represented according to the proposed ontology. Finally, an overview was given of how each particular representation could be transformed to the ontological formalism. There was no fundamental criticism of the proposed ontology-based method. However, there was not enough time to discuss many controversial aspects. This happened in subsequent e-mail discussions, which are also within the scope of this report.

The next subsections intend to provide a brief summary of the presentations given by the different partners. A more extended explanation of each one can be seen in the Annex section.

In the following, we show how the same clinical information is represented heterogeneously depending on the EHR standards and clinical vocabularies used. More specifically, we show its representation according to openEHR, HL7 CDA and SNOMED CT. Attention is paid to the SNOMED CT bindings provided and to the information embedded in the information model structures. We compare these representations and then provide their definition in OWL using the ontological framework above. Finally, we discuss some of the problems we found.

The example chosen is the clinical information statement "The patient is not diabetic", extracted from the history section of the heart care summary report. In openEHR, this statement could be represented using the archetype openEHR-EHR-EVALUATION.exclusion-problem_diagnosis.v1. An extract of its encoding is shown in Figure 9-1.

Figure 9-1: OpenEHR: "The patient is not diabetic"

By means of a template, the value of ELEMENT[at0.9] can be fixed to the default coded text value 73211009 |Diabetes mellitus|, a reference to the SNOMED CT code 73211009 accompanied by the preferred term "Diabetes mellitus" for the sake of human readability. Here we can observe how information entities may carry some semantics of their own, as happens with the openEHR ELEMENT entity, which has the meaning "No previous history of" embedded. If the same statement is represented using the HL7 CDA standard, the encoding depicted in Figure 9-2 is obtained.

EVALUATION[at0000.1] matches { -- Exclusion statement - Problems and Diagnoses
    data matches {
        ITEM_TREE[at0001] matches { -- Tree
            items cardinality matches {0..*; unordered} matches {
                ELEMENT[at0.9] occurrences matches {0..1} matches { -- No previous history of
                    value matches { -- 73211009 | diabetes mellitus |
                        DV_TEXT matches {*}
                    }
                }
                ...
            }
        }
    }
}

Figure 9-2: HL7 CDA: "The patient is not diabetic"

It can be observed that the fact that the patient does not have diabetes mellitus is defined inside a problem section template as an observation, a kind of HL7 ACT entity, whose value is the following post-coordinated SNOMED CT expression:

373572006 |clinical finding absent| : { 246090004 |associated finding| = 73211009 |diabetes mellitus|}

It is noteworthy that the treatment of negation is fundamentally different from that in openEHR. Whereas in openEHR the whole archetype embodies the negative meaning, the HL7 TermInfo project recommends in this case, to avoid confusion, not using any negation semantics within the information model (although this is not forbidden). This requires that the target ontology cater for negation, which explains the need for a post-coordinated SNOMED CT expression as demonstrated.
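To make the structure of the post-coordinated expression above more tangible, the following minimal Python sketch splits it into its focus concept, attribute and value. It is only an illustration: the function name `parse_expression` is hypothetical, and the regular expression handles just the single-attribute shape used in this example, not the full SNOMED CT compositional grammar.

```python
import re

# One concept reference: an identifier optionally followed by "|term|".
CONCEPT = r"(\d+)\s*(?:\|\s*([^|]*?)\s*\|)?"

# Assumed shape (only what this example needs): focus : { attribute = value }
PATTERN = re.compile(
    r"^\s*" + CONCEPT + r"\s*:\s*\{\s*" + CONCEPT + r"\s*=\s*" + CONCEPT + r"\s*\}\s*$"
)


def parse_expression(expression):
    """Split a single-attribute post-coordinated expression into its parts."""
    match = PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    focus_id, focus_term, attr_id, attr_term, value_id, value_term = match.groups()
    return {
        "focus": (focus_id, focus_term),
        "attribute": (attr_id, attr_term),
        "value": (value_id, value_term),
    }


parsed = parse_expression(
    "373572006 |clinical finding absent| : "
    "{ 246090004 |associated finding| = 73211009 |diabetes mellitus|}"
)
print(parsed["focus"], parsed["value"])
```

The negation lives entirely in the focus concept (clinical finding absent), while the condition itself appears only as the value of the associated finding attribute.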

SNOMED CT, through its hierarchy of "context-dependent categories", provides concepts for expressing information entities such as document sections, thus blending information entities and domain-ontology aspects (see Figure 9-3):

422625006 |history of present illness section| 373572006 |clinical finding absent| : { 246090004 |associated finding| = 73211009 |diabetes mellitus|}

Figure 9-3: SNOMED CT: "The patient is not diabetic"

<component>
  <section>
    <templateId root='2.16.840...'/>
    <title>Problems</title>
    ...
    <entry typeCode="DRIV">
      <act classCode="ACT" moodCode="EVN">
        <templateId root='2.16.840...'/>
        ...
        <entryRelationship typeCode="SUBJ">
          <observation classCode="OBS" moodCode="EVN">
            <templateId root='2.16.840.1...'/>
            <code code="ASSERTION" codeSystem="2.16.840..."/>
            <value xsi:type="CD" ... displayName="not diabetic"/>
            <entryRelationship typeCode="REFR">
              <observation classCode="OBS" moodCode="EVN">
                <templateId root='2.16.840...'/>
                <code code="33999-4" ... displayName="Status"/>
                <statusCode code="completed"/>
                <value xsi:type="CE" ... displayName="Active"/>
              </observation>
            </entryRelationship>
          </observation>
        </entryRelationship>
      </act>
    </entry>
    ...
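As a rough illustration of how the coded content of an HL7 CDA entry like the one above could be read programmatically, the following Python sketch walks a simplified copy of the entry and lists each observation's code and value. The fragment is an assumption made for the sake of a runnable example: the elided attributes and template identifiers are left out, and the HL7 default namespace of a real CDA document is omitted.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the entry shown above (assumption:
# elided attributes dropped, no HL7 default namespace declared).
CDA_FRAGMENT = """
<entry xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" typeCode="DRIV">
  <act classCode="ACT" moodCode="EVN">
    <entryRelationship typeCode="SUBJ">
      <observation classCode="OBS" moodCode="EVN">
        <code code="ASSERTION"/>
        <value xsi:type="CD" displayName="not diabetic"/>
        <entryRelationship typeCode="REFR">
          <observation classCode="OBS" moodCode="EVN">
            <code code="33999-4" displayName="Status"/>
            <statusCode code="completed"/>
            <value xsi:type="CE" displayName="Active"/>
          </observation>
        </entryRelationship>
      </observation>
    </entryRelationship>
  </act>
</entry>
"""


def observation_values(xml_text):
    """Return (code, displayName of value) for every observation, outermost first."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for observation in root.iter("observation"):
        code = observation.find("code")    # direct child only
        value = observation.find("value")  # direct child only
        pairs.append((code.get("code"), value.get("displayName")))
    return pairs


values = observation_values(CDA_FRAGMENT)
print(values)
```

The outer observation carries the clinical statement and the nested one its status, so the clinical meaning is spread over both the value and the surrounding information model structure.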

Our examples demonstrate that for the same piece of information there are different possible representations: in openEHR the negation is expressed by the archetype proper and the clinical meaning is provided by the binding to one simple SNOMED CT code; in HL7 CDA it is achieved via the binding to a complex SNOMED CT expression that embodies the negative meaning. The latter is also the case with the SNOMED only encoding, where the SNOMED CT expression from the second example is wrapped into a contextual information entity, using the concept 422625006 |history of present illness section| from SNOMED CT.

How can we make sure that the three encodings have equivalent meanings? SemanticHealthNet will examine the role of an ontological framework for semantic annotations that can be generated from the diverse information model / ontology combinations, and from which semantic equivalence should be computed automatically. The framework uses OWL-DL as a language that allows a logic-based rendering of the information, which can then be submitted to description logics classifiers for testing semantic equivalence. In this framework, the fact that a person does not have diabetes mellitus in their clinical history is expressed by means of a complex semantic annotation of the information. The piece of information that states the absence of diabetes in the history is identified as an instance (or annotation) of an OWL class, here called DocumentItemAboutSituationWithoutDiabetes, which is, ontologically, a specialization of the class InformationEntity. By means of the isAboutSituation relation it is related to a logical expression that defines the clinical situation under scrutiny (ClinicalSituationOfPersonWithoutDiabetes):

ClinicalSituationOfPersonWithoutDiabetes equivalentTo processualPartOf some (BiologicalLife and (hasParticipant some (HumanOrganism and (not (locusOf some SCT_Diabetes)))))

DocumentItemAboutSituationWithoutDiabetes equivalentTo (abstractPartOf some Document) and (isAboutSituation some ClinicalSituationOfPersonWithoutDiabetes)

In order to obtain this ontological representation, some patterns and mapping rules first have to be defined at the level of the archetypes or clinical models. For instance, SNOMED CT concepts of the type finding with explicit context, rather than plain findings, must be used when binding HL7 documents and archetypes to SNOMED CT. Clinical finding present, clinical finding absent, clinical finding suspected and clinical finding unknown are subtypes of the concept finding with explicit context, and they could easily be transformed to our ontological representation. For instance, this would be the representation for the case clinical finding absent:

DocumentItem and isAboutSituation some (Situation and not associatedFinding some ClinicalCondition)

In the above representation the relation associatedFinding has been used for simplicity; it is defined as:

associatedFinding = processualPartOf o hasParticipant o locusOf
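Read as a composition of relations, the definition above says that a situation has an associated finding whenever it is a processual part of a life whose participant is the locus of that finding. The following Python sketch shows this reading on a tiny set of made-up individuals. The individual names and the helper `compose` are hypothetical, and plain relational composition is used here in place of an OWL property chain evaluated by a reasoner.

```python
def compose(*relations):
    """Compose binary relations left to right: (a, c) holds if a r1 b and b r2 c."""
    result = relations[0]
    for relation in relations[1:]:
        result = {
            (a, c)
            for (a, b) in result
            for (b2, c) in relation
            if b == b2
        }
    return result


# Hypothetical individuals, for illustration only.
processual_part_of = {("situation1", "life1")}
has_participant = {("life1", "patient1")}
locus_of = {("patient1", "diabetes1")}

associated_finding = compose(processual_part_of, has_participant, locus_of)
print(associated_finding)
```

With these three facts the composed relation links `situation1` directly to `diabetes1`, which is the shortcut that the simplified absent-finding pattern relies on.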

As a result, all meaning-bearing elements in clinical documents would be annotated by logical expressions in OWL-DL, which refer to one or more SNOMED CT concepts. The equivalence of such expressions could then be verified by DL reasoners. However, some issues need further study, such as the representation of context. The statement that the patient has no diabetes mellitus does not say anything about its precise context. Questions such as "does it refer to the result of some examination that has just been done?" or "is it part of his/her past history?" have to be answered. In the representations provided above, the context has also been expressed in different ways. In openEHR the context was embedded in the information entities (No previous history of), and in HL7 CDA it was conveyed by inclusion in the problems section. In SNOMED CT a concept from the record artefact hierarchy was used (history of present illness section). Therefore, the context of the finding has to be clearly identified, and to do so it is necessary to establish patterns that will depend on the EHR standard used.
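The dependence of such patterns on the EHR standard can be sketched in a few lines of Python. The example below is a toy normalisation, not the OWL-based approach discussed in this annex: the class `Annotation` and the three `from_*` functions are hypothetical, and each one hard-codes an assumed pattern for where the negation and the context are found in the respective encoding.

```python
from dataclasses import dataclass

DIABETES = "73211009"          # |diabetes mellitus|
FINDING_ABSENT = "373572006"   # |clinical finding absent|


@dataclass(frozen=True)
class Annotation:
    finding: str   # SNOMED CT code of the clinical condition
    negated: bool  # True when the statement denies the condition
    context: str   # where the source encoding carried the context


def from_openehr(archetype_id, element_meaning, code):
    # Assumed pattern: an "exclusion" archetype carries the negation and
    # the ELEMENT meaning carries the context.
    return Annotation(code, "exclusion" in archetype_id, element_meaning)


def from_hl7_cda(section_title, context_code, code):
    # Assumed pattern: the negation sits in the SNOMED CT expression and
    # the context is the enclosing section.
    return Annotation(code, context_code == FINDING_ABSENT, section_title)


def from_snomed(section_term, context_code, code):
    # Assumed pattern: as for CDA, with the section given as a SNOMED CT concept.
    return Annotation(code, context_code == FINDING_ABSENT, section_term)


def same_core(a, b):
    """Compare finding and negation only, ignoring the context."""
    return (a.finding, a.negated) == (b.finding, b.negated)


openehr = from_openehr(
    "openEHR-EHR-EVALUATION.exclusion-problem_diagnosis.v1",
    "No previous history of",
    DIABETES,
)
cda = from_hl7_cda("Problems", FINDING_ABSENT, DIABETES)
snomed = from_snomed("history of present illness section", FINDING_ABSENT, DIABETES)

print(same_core(openehr, cda), same_core(cda, snomed))
print({openehr.context, cda.context, snomed.context})
```

The three encodings agree on the finding and its negation, but each reports a different context string, which is the open issue described above.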

10 Annex II: Clinical Situation

(from Manuscript "The Ontology of Clinical Situations" by A. Andrade et al., submitted to JAMIA)

Discourse about clinical situations requires first explicitly capturing the meaning of medical terms or codes as used in the electronic health record and secondly identifying the ontological categories of the clinical entities denoted by these terms. This demands clear definitions of what clinical conditions, or more specifically, what diseases and disorders are. In OGMS it is proposed that the term "disease" denotes a subclass of disposition whose members, if realized, are realized in pathological processes due to one or more "disorders" in the organism bearing the disease instance. The term "disorder" denotes combinations of physical components that are clinically abnormal and not part of any larger such combination. For example, the disease Heart Failure is a disposition to undergo tachycardia and pulmonary congestion (pathological processes), caused by a physical disorder of the heart, such as myocardial cell death after a heart infarction.

However, medical language (and therefore terminology) tends to disregard such systematic distinctions between disorder, disease and pathological process. Instead, it uses "disease" and "disorder" in sometimes idiosyncratic ways. It is therefore not surprising that, in SNOMED CT, entities named "disease" and "disorder", as well as signs and symptoms, appear intermixed under a single category, Clinical Finding, regardless of whether they denote processes, abnormal body parts, or dispositions. Moreover, clinical findings are not necessarily pathologic, as shown by 309534003|Normal height|. This can be observed in ICD, too, e.g. Z55.0 Illiteracy. The inclusion criterion for a term is not primarily whether it denotes a pathological entity but whether it is of clinical interest (see footnote 51). We therefore ignore whether an entity is "abnormal" or "pathological" and introduce Clinical condition as a common category containing all entities that are referred to by medical documentation and science:

ClinicalCondition equivalentTo ClinicalMaterialEntity or ClinicalDisposition or ClinicalProcess

This disjunctive class is not meant to blur the boundaries between material entities, dispositions and processes. That "cancer" denotes malignant tissue in one context and malignant growth in another one does not mean at all that instances of cancer are both occurrents and continuants. Which type it instantiates is deliberately left open, so that in the moment a term or a code is used, a disambiguation may possibly occur according to the context of use. All competing definitions of Clinical Situation we will introduce in the following will refer to this disjunctive class ClinicalCondition.

The problem now is to find an easy and ontologically consistent way to relate Clinical situations with Clinical Conditions. For this purpose we introduce the relation includes, which can be distinguished into the following cases:

1. A process x coexists with a process y. y is a part of x which temporally coincides.

2. A process x coexists with a material entity y.

51 We do not find it important at this point to deliberate any further about the boundary between the normal and the abnormal, which we regard as fuzzy and of debatable ontological relevance.

3. A process coexists with a disposition. This means that the process includes a material entity while that entity bears the disposition.

includes(x, y) := ∀x ∀y Process(x) ∧
    (Process(y) ∧ ∃r ∃s (SpatiotemporalRegion(r) ∧ SpatiotemporalRegion(s) ∧
        'occupies spatiotemporal region'(x, r) ∧ 'occupies spatiotemporal region'(y, r) ∧
        'occurrent part of'(r, s)))
    ∨
    (MaterialEntity(y) ∧ ∀t ('exists at'(x, t) → 'exists at'(y, t) ∧
        ∃s ∃r (SpatiotemporalRegion(s) ∧ SpatialRegion(r) ∧
        'occupies spatiotemporal region'(x, s) ∧ 'projects onto'(s, r, t) ∧ 'located at'(y, r, t))))
    ∨
    (Disposition(y) ∧ ∀t ('exists at'(x, t) → 'exists at'(y, t) ∧
        ∃s ∃r ∃m (SpatiotemporalRegion(s) ∧ SpatialRegion(r) ∧ MaterialEntity(m) ∧
        'occupies spatiotemporal region'(x, s) ∧ 'projects onto'(s, r, t) ∧
        'located at'(m, r, t) ∧ 'bearer of'(m, y, t))))
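The three cases of the relation includes can be approximated in a much simplified, time-only model. The Python sketch below is an assumption-laden illustration rather than an implementation of the formula: spatial and spatiotemporal regions are dropped, every entity simply has an integer time interval, the process case is read as "the interval of y lies within that of x", and the names `Entity`, `within` and the sample individuals are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str                          # "process", "material" or "disposition"
    start: int                         # first time point at which it exists
    end: int                           # last time point at which it exists
    bearer: Optional["Entity"] = None  # only used for dispositions


def within(inner, outer):
    """True if the interval of inner lies inside the interval of outer."""
    return outer.start <= inner.start and inner.end <= outer.end


def includes(x, y):
    """Time-only approximation of the three cases of 'includes'."""
    if x.kind != "process":
        return False
    if y.kind == "process":
        # Case 1 (assumed reading): y is a temporal part of x.
        return within(y, x)
    if y.kind == "material":
        # Case 2: y exists at every time at which x exists.
        return within(x, y)
    if y.kind == "disposition":
        # Case 3: y and its material bearer exist throughout x.
        return (
            y.bearer is not None
            and y.bearer.kind == "material"
            and within(x, y)
            and within(x, y.bearer)
        )
    return False


phase = Entity("phase of life", "process", 10, 20)
tachycardia = Entity("tachycardia", "process", 12, 15)
heart = Entity("heart", "material", 0, 100)
heart_failure = Entity("heart failure", "disposition", 5, 100, bearer=heart)

print(includes(phase, tachycardia), includes(phase, heart), includes(phase, heart_failure))
```

In this simplified reading a phase of life includes a process that happens during it, a material entity that exists throughout it, and a disposition whose bearer exists throughout it.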

A ClinicalSituation_X is a temporal part (phase) of a person's life during which some ClinicalCondition_X is present

This definition restricts clinical situation to phases of a person's life and follows closely the meaning embedded into the above relation includes:

ClinicalSituation_X equivalentTo Process and ('temporal part of' some Life) and (includes some ClinicalCondition_X)

With ClinicalSituation being a Process that is 'temporal part of' some Life, CancerSituation would be represented as:

CancerSituation equivalentTo ClinicalSituation and includes some Cancer

It is here of no consequence whether the cancer is seen as a material entity or a process. A situation without cancer is also a situation:

SituationWithoutCancer equivalentTo ClinicalSituation and not (includes some Cancer)

Accordingly, a ClinicalSituation_X_Y is a phase of a patient's life for which some ClinicalCondition_X and some ClinicalCondition_Y are wholly present:

ClinicalSituation_X_Y equivalentTo ClinicalSituation and (includes some ClinicalCondition_X) and (includes some ClinicalCondition_Y)

This is consistent with numerous SNOMED CT concepts where C with D is a subclass of both C and D: A clinical situation with cancer and dyspnea would be both subsumed by CancerSituation and by DyspneaSituation. Similarly, KidneyHematomaSituation would be a common child of both HemorrhageSituation and RenalMassSituation, just as a TetralogyOfFallotSituation would be a common child of PulmonicValveStenosisSituation and VentricularSeptumDefectSituation (among others). However, a RenalMassSituation would only be a KidneyHematomaSituation as long as both conditions are equally present.
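The subsumption behaviour described above can be illustrated with a small structural check. The Python sketch below is a toy under stated assumptions, not a description logics classifier: each situation is reduced to the set of conditions it "includes some" of, negated situations are not covered, and the small condition taxonomy (a kidney hematoma being both a hemorrhage and a renal mass) is assumed for the example.

```python
# Assumed toy taxonomy of clinical conditions: child -> direct parents.
CONDITION_PARENTS = {
    "KidneyHematoma": {"Hemorrhage", "RenalMass"},
}


def ancestors_or_self(condition):
    """All conditions that the given condition is a kind of, including itself."""
    seen = {condition}
    stack = [condition]
    while stack:
        for parent in CONDITION_PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if every condition required by general is covered by specific."""
    return all(
        any(required in ancestors_or_self(given) for given in specific)
        for required in general
    )


# Situations as the sets of conditions they include.
cancer_situation = {"Cancer"}
dyspnea_situation = {"Dyspnea"}
cancer_dyspnea_situation = {"Cancer", "Dyspnea"}
kidney_hematoma_situation = {"KidneyHematoma"}
hemorrhage_situation = {"Hemorrhage"}
renal_mass_situation = {"RenalMass"}

print(subsumes(cancer_situation, cancer_dyspnea_situation))
print(subsumes(kidney_hematoma_situation, renal_mass_situation))
```

The first check succeeds and the second fails, mirroring the point that a situation with both conditions is a child of each single-condition situation, but not the other way round.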

11 Annex III: Notes about ContSys

The ContSys standard introduces concepts with direct traceability to an explicit model of clinical processes, with a special focus on the health state of an individual. Aspects that compose the health state are defined in ContSys as health conditions. Different types of health conditions (observed/perceived, assessed, considered, target, prognostic and risk conditions) can be linked to clinical situations and/or to steps in the clinical process through the reference clinical models representing the clinical context. Links are needed for both clinical content and clinical process context. The characteristics each clinical reference model should include can be identified from current information models and vocabularies. From reference clinical models, with their exhaustive gross list of characteristics/attributes from different sources, more specific clinical models for different types of clinical processes (e.g. the clinical process for heart failure) can be derived. This would mean that a symptom such as fatigue in chronic heart failure could be represented by such a specialisation of the reference clinical model for body function. Depending on the use case, only those attributes in the reference model that are found relevant will be populated with values.

Clinical processes modify clinical conditions. For instance, an observation value is assessed with some uncertainty and later identified as a considered condition. After further process steps with investigation activities, more observations are added as separate perceived conditions that can later be used as criteria for a more holistic description of the health state. Criteria are added successively, and commonly the considered condition is transformed (and relabelled) into a perceived condition. If the genesis of the perceived condition is assessed as reasonably sure, the condition is transformed into an assessed condition. The life cycle of a clinical process contains such refinements of the perceptions of the health state. The effects of treatments, which influence the health state, are correspondingly observed and registered as new health conditions.
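The refinement steps just described can be pictured as a small state machine. The Python sketch below encodes only the two relabellings named in the paragraph above; the names `TRANSITIONS`, `advance` and `life_cycle` are hypothetical, and ContSys itself does not prescribe this representation.

```python
# Relabellings named in the text; anything else is treated as undefined here.
TRANSITIONS = {
    "considered": "perceived",  # criteria have been added successively
    "perceived": "assessed",    # genesis assessed as reasonably sure
}


def advance(condition_type):
    """Return the next condition type, or fail if none is defined."""
    if condition_type not in TRANSITIONS:
        raise ValueError(f"no further refinement defined for {condition_type!r}")
    return TRANSITIONS[condition_type]


def life_cycle(start):
    """List the condition types a condition passes through from start."""
    steps = [start]
    while steps[-1] in TRANSITIONS:
        steps.append(TRANSITIONS[steps[-1]])
    return steps


print(life_cycle("considered"))
```

Starting from a considered condition, the sketch walks through the perceived and assessed stages and then stops, since the text names no refinement beyond an assessed condition.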

In SemanticHealthNet clinical conditions are indirectly represented through clinical situations. To what extent the ContSys notions of condition and situation come close to the ones introduced in SemanticHealthNet remains an open question.

The process model in ContSys is based on the definition of process in ISO 9000:2005 “set of interrelated or interacting activities which transform inputs into outputs”. Processes are categorised by their customer, input, refinement object and output respectively. The customer of the process is the individual or the patient. The input is an individual’s health status and the process consists of activities aiming to indirectly or directly add value to the individual’s health status. There is a core process and to support this core process there are almost always several supportive processes which influence and interact with the core process.

Examples are:

1. Health overview – past and present health conditions and health care activities

2. Diagnosis: perceived condition as criteria – assessed condition

3. Motivating condition – considered condition – investigating activity – resulting perceived condition

4. Motivating condition – target condition – treatment activity – resulting perceived condition

5. Health problem: health conditions as criteria for health problem

6. Health related risk: underlying condition/activity – triggering event – risk condition/activity – resulting condition/activity

In SemanticHealthNet, the closest counterpart to the idea of “reference clinical models for clinical situations” is possibly the idea of Healthcare situation, which is distinct from Clinical Situation and has not yet been studied in depth. Healthcare situations will comprise a set of related clinical situations of the patient system plus surrounding situations (which do not comprise the patient system) with regard to some healthcare delivery.

In ContSys, the reference clinical models for steps in the clinical process are combinations of situational models and sometimes models for single concepts. Here too, the combinations are based on experience from analyses of a great number of clinical processes. There are three basic types: health request, needs assessment and activity plan.

In SemanticHealthNet the links between the clinical processes involved in the clinical information generation are not represented at this moment. This would correspond to the relations defined between the clinical actions that have as a result some information referring to some clinical situation of the patient.
