Tracking Referents (based on OIC, December 1, 2006)

Barry SMITH and Werner CEUSTERS, Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, NY, USA. http://www.org.buffalo.edu/RTU

TRANSCRIPT

Page 1: Tracking Referents (based on OIC, December 1, 2006)

New York State Center of Excellence in Bioinformatics & Life Sciences

R T U New York State Center of Excellence in Bioinformatics & Life Sciences

R T U

Tracking Referents

(based on OIC, December 1, 2006)

Barry SMITH and Werner CEUSTERS

Center of Excellence in Bioinformatics and Life Sciences

University at Buffalo, NY, USA

http://www.org.buffalo.edu/RTU

Page 2: Tracking Referents (based on OIC, December 1, 2006)

Representational artifacts, classified according to the sort of entities they are about

• Primarily about particulars
– Non-formalized: news reports
– Formalized: inventories, referent tracking database
• Primarily about universals / types
– Non-formalized: scientific theories, textbooks
– Formalized: ontologies, terminologies

Page 3: Tracking Referents (based on OIC, December 1, 2006)

A realist view of the world

• The world consists of
– entities that are
  • either particulars or universals;
  • either occurrents or continuants;
  • either dependent or independent;
– together with relations between these entities:
  • <particular, universal>, e.g. is-instance-of;
  • <particular, particular>, e.g. is-member-of;
  • <universal, universal>, e.g. is_a (is-subtype-of).

Page 4: Tracking Referents (based on OIC, December 1, 2006)

A realist view of the world (1)

[Figure: universals/types (airplane, philosopher, airport, president) and instances/particulars (Enola Gay, Barry Smith, JFK, George Bush), linked by "instance of".]

Page 5: Tracking Referents (based on OIC, December 1, 2006)

A realist view of the world (2)

[Figure: along a time axis t, continuants (Enola Gay, Barry Smith, JFK, George Bush) and occurrents (flying, meeting).]

Page 6: Tracking Referents (based on OIC, December 1, 2006)

A realist view of the world (3)

[Figure: along a time axis t, universals (philosopher, president, child, adult) and particulars (Barry Smith, George Bush), linked by "instance-at t".]

Page 7: Tracking Referents (based on OIC, December 1, 2006)

Inadequate representational units

• “JFK” “Enola Gay”

• “Barry Smith” “George Bush”

Page 8: Tracking Referents (based on OIC, December 1, 2006)

Proposed Solution: Referent Tracking

• Purpose:
– explicit reference to the concrete individual entities relevant to the accurate description of a scene

[Cartoon caption: "Now! That should clear up a few things around here!"]

Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.

Page 9: Tracking Referents (based on OIC, December 1, 2006)

Numbers instead of words

• Method:

– Introduce an Instance Unique Identifier (IUI) for each relevant particular (individual) entity

Page 10: Tracking Referents (based on OIC, December 1, 2006)

Essentials of Referent Tracking

• generating universally unique identifiers;
• deciding which particulars should receive an IUI;
• finding out whether or not a particular has already been assigned an IUI (each particular should receive at most one IUI);
• using IUIs to make statements;
• determining the truth values of statements in which IUIs are used;
• correcting errors in the assignment of IUIs.

Page 11: Tracking Referents (based on OIC, December 1, 2006)

IUI generation

• Universally Unique IDs:
– recently standardized through ISO/IEC 9834-8:2004,
– specifies format and generation rules enabling users to produce 128-bit identifiers that are either guaranteed or have a high probability of being globally unique
– Meaningless strings
– Central management or certification not needed to guarantee uniqueness
  • (But use as IUI requires this)
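The generation step can be sketched with Python's standard `uuid` module, which produces 128-bit UUIDs. This is only an illustration: the function name `new_iui` is hypothetical, and the random (version 4) variant gives a high probability of uniqueness rather than a guarantee. Registration in a central IUI-repository, which the slide says is still required, is not shown.

```python
import uuid


def new_iui() -> str:
    """Return a fresh random (version 4) UUID as a candidate IUI string.

    Hypothetical helper: the string carries no meaning about the particular
    it will later be assigned to.
    """
    return str(uuid.uuid4())


candidate = new_iui()
parsed = uuid.UUID(candidate)

# A UUID is 16 bytes, i.e. 128 bits.
print(candidate, "bits:", len(parsed.bytes) * 8, "version:", parsed.version)
```

Because the identifier is generated locally, no coordination is needed to create it; deciding whether a particular already has an IUI is a separate lookup against the repository.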

Page 12: Tracking Referents (based on OIC, December 1, 2006)

IUI assignment

• = an act carried out by the first ‘cognitive agent’ who recognizes the need to acknowledge the existence of a particular it has information about by labeling it with an IUI.
• ‘cognitive agent’:
– A person;
– An organisation;
– A device or software agent, e.g.
  • Bank note printer
  • Image analysis software

Page 13: Tracking Referents (based on OIC, December 1, 2006)

Criteria for IUI assignment (1)

1. Different for continuants and for occurrents
2. The continuant is in front of you, you can see it, photograph it
– The photograph gets an IUI; your act (occurrent) of taking the photo gets an IUI
3. The occurrent occurs in your presence, you can make a video
– The video gets an IUI; your act (occurrent) of taking the video gets an IUI
4. When assigning an IUI you may not know exactly what the particular is (which type it instantiates)

Page 14: Tracking Referents (based on OIC, December 1, 2006)

Criteria for IUI assignment (2)

2. The particular’s existence ‘may not already have been determined as the existence of something else’:
• Morning star and evening star
• Himalaya: two observers not knowing they observed the same thing
3. May not have already been assigned an IUI.
4. It must be relevant to do so:
• Personal decision, (scientific) community guideline, ...
• Possibilities offered by the EHR system
• If an IUI has been assigned by somebody, everybody else making statements about the particular should use it

Page 15: Tracking Referents (based on OIC, December 1, 2006)

Assertion of assignments

• IUI assignment is an act whose execution has to be asserted in the IUI-repository:
– <da, Ai, td>
  • da: IUI of the registering agent
  • Ai: the assertion of the assignment <pa, pp, tap, c>
    » pa: IUI of the author of the assertion
    » pp: IUI of the particular
    » tap: time of the assignment
    » c: optional description for identification
  • td: time of registering Ai in the IUI-repository
• Neither td nor tap gives any information about when #pp started to exist. This might be asserted in statements providing information about #pp.
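The nested tuple <da, Ai, td> can be written down as two small record types. The sketch below is one possible encoding, not the authors' implementation: the class names, the `register` helper and the use of Python `datetime` values for tap and td are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AssignmentAssertion:
    """Ai = <pa, pp, tap, c> (field names follow the slide)."""
    pa: str                   # IUI of the author of the assertion
    pp: str                   # IUI of the particular
    tap: datetime             # time of the assignment
    c: Optional[str] = None   # optional description for identification


@dataclass(frozen=True)
class RegisteredAssignment:
    """<da, Ai, td> as stored in the IUI-repository."""
    da: str                   # IUI of the registering agent
    ai: AssignmentAssertion   # the assertion of the assignment
    td: datetime              # time of registering Ai in the repository


def register(da: str, pa: str, description: Optional[str] = None) -> RegisteredAssignment:
    """Hypothetical helper: mint an IUI for a particular and register it.

    In this sketch assignment and registration happen at the same instant,
    so tap == td; in practice td may be later than tap.
    """
    now = datetime.now(timezone.utc)
    ai = AssignmentAssertion(pa=pa, pp=str(uuid.uuid4()), tap=now, c=description)
    return RegisteredAssignment(da=da, ai=ai, td=now)


record = register(da="agent-iui", pa="author-iui", description="photograph of a lesion")
print(record.ai.pp, record.td.isoformat())
```

Note that nothing in the record says when the particular itself began to exist, which matches the last bullet above.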

Page 16: Tracking Referents (based on OIC, December 1, 2006)

PTP statements - particular to particular

• ordered sextuples of the form <sa, ta, r, o, P, tr>

sa is the IUI of the author of the statement,

ta a reference to the time when the statement is made,

r a reference to a relationship (available in o) obtaining between the particulars referred to in P,

o a reference to the ontology from which r is taken,

P an ordered list of IUIs referring to the particulars between which r obtains, and,

tr a reference to the time at which the relationship obtains.

• P contains as many IUIs as required by the arity of r. In most cases, P will be an ordered pair such that r obtains between the particular represented by the first IUI and the one referred to by the second IUI.
• As with A statements, these statements must also be accompanied by a meta-statement capturing when the sextuple became available to the referent tracking system.
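A minimal sketch of the sextuple, with the arity rule for P enforced at construction time. The `RELATION_ARITY` table, the ontology label and the IUI strings are hypothetical placeholders; only the six field names come from the slide.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical arity table; a real system would read this from the ontology o.
RELATION_ARITY = {"is-member-of": 2}


@dataclass(frozen=True)
class PtoPStatement:
    """<sa, ta, r, o, P, tr>"""
    sa: str              # IUI of the author of the statement
    ta: str              # time when the statement is made
    r: str               # relationship, available in o
    o: str               # ontology from which r is taken
    P: Tuple[str, ...]   # ordered IUIs of the particulars between which r obtains
    tr: str              # time at which the relationship obtains

    def __post_init__(self) -> None:
        expected = RELATION_ARITY.get(self.r)
        if expected is not None and len(self.P) != expected:
            raise ValueError(
                f"{self.r} needs {expected} particulars, got {len(self.P)}"
            )


stmt = PtoPStatement(
    sa="#author", ta="2006-12-01", r="is-member-of", o="example-ontology",
    P=("#person", "#committee"), tr="2006",
)
print(stmt.P[0], stmt.r, stmt.P[1])
```

Keeping P as an ordered tuple preserves the direction of the relation: the first IUI stands in r to the second.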

Page 17: Tracking Referents (based on OIC, December 1, 2006)

PTCL statements – particular to class

<sa, ta, inst, o, p, cl, tr>

sa is the IUI of the author of the statement,

ta a reference to the time when the statement is made,

inst a reference to an instance relationship available in o obtaining between p and cl,

o a reference to the ontology from which inst and cl are taken,

p the IUI referring to the particular whose inst relationship with cl is asserted,

cl the class in o to which p enjoys the inst relationship, and,

tr a reference to the time at which the relationship obtains.
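The role of tr can be illustrated with a small query for the classes a particular instantiates at a given time. This is a sketch under stated assumptions: tr is encoded here as a (first year, last year) pair with `None` for an open end, and the IUI "#42", the ontology label and the years are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PtoCLStatement:
    """<sa, ta, inst, o, p, cl, tr>"""
    sa: str     # IUI of the author of the statement
    ta: str     # time when the statement is made
    inst: str   # instance relationship available in o
    o: str      # ontology from which inst and cl are taken
    p: str      # IUI of the particular
    cl: str     # class in o
    tr: Tuple[int, Optional[int]]  # assumed encoding: (first year, last year or None)


def classes_at(statements: List[PtoCLStatement], p: str, year: int) -> List[str]:
    """Return the classes that particular p is asserted to instantiate in `year`."""
    found = []
    for s in statements:
        start, end = s.tr
        if s.p == p and start <= year and (end is None or year <= end):
            found.append(s.cl)
    return sorted(found)


statements = [
    PtoCLStatement("#author", "2006", "instance-of", "example-ontology", "#42", "child", (1950, 1967)),
    PtoCLStatement("#author", "2006", "instance-of", "example-ontology", "#42", "adult", (1968, None)),
]
print(classes_at(statements, "#42", 1960))
print(classes_at(statements, "#42", 1990))
```

The same particular keeps one IUI throughout; only the time-indexed class assertions about it differ.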

Page 18: Tracking Referents (based on OIC, December 1, 2006)

Other Advantages

• mapping as by-product of tracking
– Descriptions about the same particular using different ontologies/concept-based systems
• Quality control of ontologies and concept-based systems
– Systematic “inconsistent” descriptions within or across terminologies may indicate poor definition of the respective terms
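Mapping as a by-product can be pictured as counting how often two terms from different systems are used to describe the same IUI. The sketch below assumes class assertions reduced to (IUI, ontology, class) triples; the function name, the ontology labels and the sample terms are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple

Term = Tuple[str, str]  # (ontology, class name)


def candidate_mappings(assertions: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[Term, Term], int]:
    """Count, per pair of terms from different ontologies, how many
    particulars have been described with both terms."""
    by_particular = defaultdict(set)
    for iui, ontology, cls in assertions:
        by_particular[iui].add((ontology, cls))

    counts: Dict[Tuple[Term, Term], int] = defaultdict(int)
    for terms in by_particular.values():
        for a, b in combinations(sorted(terms), 2):
            if a[0] != b[0]:  # only pairs that cross ontologies
                counts[(a, b)] += 1
    return dict(counts)


assertions = [
    ("#1", "O1", "fracture"), ("#1", "O2", "broken bone"),
    ("#2", "O1", "fracture"), ("#2", "O2", "broken bone"),
    ("#3", "O1", "fracture"), ("#3", "O2", "sprain"),
]
for pair, n in sorted(candidate_mappings(assertions).items()):
    print(pair, n)
```

Pairs with high counts are mapping candidates; a particular that attracts apparently conflicting terms, such as "#3" here, is the kind of case the quality-control bullet points at.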

Page 19: Tracking Referents (based on OIC, December 1, 2006)

Dynamic aspects

Page 20: Tracking Referents (based on OIC, December 1, 2006)

Accept that everything may change:

1. changes in the underlying reality:
• Particulars and universals come and go

2. changes in our (scientific) understanding:
• The planet Vulcan does not exist

3. reassessments of what is considered to be relevant for inclusion (notion of purpose).

4. encoding mistakes introduced during data entry or ontology development.

Page 21: Tracking Referents (based on OIC, December 1, 2006)

Reality versus beliefs, both in evolution

[Figure: along a time axis t, Reality (universals U1 and U2, particular p3) and Belief (O-#0, O-#1, O-#2 and IUI-#3).]

Legend: “denotes” = what constitutes the meaning of representational units. Therefore: O-#0 is meaningless.

Page 22: Tracking Referents (based on OIC, December 1, 2006)

An “optimal” representational artifact (2)

• Each representational unit in such a representational artifact would designate
– (1) a single portion of reality (POR), which is
– (2) relevant to its purposes and such that
– (3) the authors intended to use this representational unit to designate this POR, and
– (4) there would be no PORs objectively relevant to these purposes that are not referred to in the representational artifact.

Page 23: Tracking Referents (based on OIC, December 1, 2006)

Sources of error

• assertion errors: sources may be in error as to what is the case in their target domain;

• relevance errors: sources and analysts may be in error as to what is objectively relevant to a given purpose;

• encoding errors: they may not successfully encode their underlying cognitive representations, so that particular representational units fail to point to the intended PORs.

Page 24: Tracking Referents (based on OIC, December 1, 2006)

Key requirement for updating

Any change in an ontology or data repository should be associated with the reason for that change, so that it is possible to assess later what kind of mistake has been made!

Page 25: Tracking Referents (based on OIC, December 1, 2006)

Example: a person’s gender

• In John Smith’s EHR:
– At t1: “male”; at t2: “female”
• What are the possibilities?
– Change in reality:
  • transgender surgery
  • change in legal self-identification
– Change in understanding: it was female from the very beginning but interpreted wrongly
– Correction of data entry mistake (was understood as male, but wrongly transcribed)
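One way to make the reason for a change mandatory is to store it as a required, enumerated field of every change record. The sketch below is illustrative only: the class and function names are hypothetical, and the four reason codes are taken from the kinds of change the slides list (reality, understanding, relevance, encoding or data entry mistakes).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ChangeReason(Enum):
    REALITY_CHANGED = "change in reality"
    UNDERSTANDING_CHANGED = "change in understanding"
    RELEVANCE_REASSESSED = "reassessment of relevance"
    ENCODING_MISTAKE = "correction of an encoding or data entry mistake"


@dataclass(frozen=True)
class Change:
    subject: str          # IUI or record the change applies to
    attribute: str
    old: str
    new: str
    time: str
    reason: ChangeReason  # required: a change without a reason is rejected


def record_change(log: List[Change], change: Change) -> None:
    """Append a change to the log, refusing entries without a valid reason."""
    if not isinstance(change.reason, ChangeReason):
        raise ValueError("every change must state its reason")
    log.append(change)


log: List[Change] = []
record_change(log, Change("#john-smith", "gender", "male", "female", "t2",
                          ChangeReason.ENCODING_MISTAKE))
print(log[0].old, "->", log[0].new, "because:", log[0].reason.value)
```

With the reason stored, a later reader can tell whether the value at t1 was true at the time (reality changed) or was already wrong (misunderstanding or transcription error).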

Page 26: Tracking Referents (based on OIC, December 1, 2006)

A realism-based metric for data quality

• Must be able to deal with a variety of problems by which matching endeavors thus far have been affected
– different authors may have different though still veridical views on the same portion of reality,
– authors may make mistakes,
  • when interpreting reality, or
  • when formulating their interpretations in their chosen representation language
– a matcher can never be sure to what the expressions in a repository actually refer (no God’s eye perspective),
– if two ontologies are developed at different times, reality itself may have changed in the intervening period.

Page 27: Tracking Referents (based on OIC, December 1, 2006)

An example: merging data from two sources

Reality exists before any observation

[Figure: R]

And also most structures in reality are there in advance

Page 28: Tracking Referents (based on OIC, December 1, 2006)

The author of O1 acknowledges the existence of some Portion Of Reality (POR)

[Figure: R and B1]

Some portions of reality escape his attention.

Page 29: Tracking Referents (based on OIC, December 1, 2006)

He considers only some of them relevant for O1, and thus represents only part, here with Int = R+.

[Figure: R, B1 and O1, with #1, RU1B1 and RU1O1]

• Both RU1B1 and RU1O1 are representational units referring to #1;
• RU1O1 is NOT a representation of RU1B1;
• RU1O1 is created through concretization of RU1B1 in some medium.

Page 30: Tracking Referents (based on OIC, December 1, 2006)

Similarly concerning the author of O2

[Figure: R, B1, B2, O1 and O2]

Page 31: Tracking Referents (based on OIC, December 1, 2006)

Creation of the mapping

[Figure: R, B1, B2, O1, O2 and Om]

Page 32: Tracking Referents (based on OIC, December 1, 2006)

Two (out of many other) possible configurations

#1 was not considered to be relevant for O2, but is considered to be relevant for Om.

The author of O1 made an encoding mistake, so that his ontology contains a reference to a non-intended referent, and this is copied into Om.

[Figure: two diagrams, each showing R, B1, B2, O1, O2 and Om]

Page 33: Tracking Referents (based on OIC, December 1, 2006)

Typology of expressions included in and excluded from an ontology in light of relevance and relation to external reality

Page 34: Tracking Referents (based on OIC, December 1, 2006)

Typology of expressions included in and excluded from an ontology in light of relevance and relation to external reality

Valid presence in the representation

Valid absence in the representation

Page 35: Tracking Referents (based on OIC, December 1, 2006)

Typology of expressions included in and excluded from an ontology in light of relevance and relation to external reality

Unjustified presence in the representation

Unjustified absence in the representation

But sometimes you get lucky …

Page 36: Tracking Referents (based on OIC, December 1, 2006)

The original beliefs are usually not accessible

[Figure: R, B1, B2, O1, O2 and Om]

Page 37: Tracking Referents (based on OIC, December 1, 2006)

The original beliefs are usually not accessible

[Figure: R, O1, O2 and Om]

• But if the ontologies are well documented and representations intelligible, then many such beliefs can be inferred, and mistakes found.

Page 38: Tracking Referents (based on OIC, December 1, 2006)

For concept-based systems, there is also no reality

[Figure: R, O1, O2 and Om]

Page 39: Tracking Referents (based on OIC, December 1, 2006)

But what must hold if both ontologies are believed to be right can be believed to mirror reality

[Figure: O1, Om and O2]

Page 40: Tracking Referents (based on OIC, December 1, 2006)

The principle of forced backward belief

[Figure: O1, Om and O2]

A lot of information loss

Page 41: Tracking Referents (based on OIC, December 1, 2006)

A decision support tool for dealing with inconsistencies?

• O1:
– Holds that penguins are birds, birds fly
• O2:
– Holds that penguins are birds, penguins don’t fly
• The problem for Om:
– Which source ontology to believe?
– What might be the source of the inconsistency?
  • O1 is right and penguins do fly
  • O1 is wrong and either penguins are not birds or not all birds fly
  • Both are right but the representational units ‘penguin’, ‘bird’ and ‘fly’ do not refer to the same entities in reality.
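The inconsistency itself can be found mechanically, even though choosing between the explanations cannot. The sketch below encodes each ontology as a set of (subject, predicate, object) triples; the predicate names "is_a", "can" and "cannot" and both function names are assumptions made for this illustration, not part of any particular ontology language.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def inherited_claims(ontology: Set[Triple]) -> Set[Triple]:
    """Propagate non-is_a claims from each parent class down to its subclasses."""
    is_a = {(s, o) for s, p, o in ontology if p == "is_a"}
    claims = {t for t in ontology if t[1] != "is_a"}
    changed = True
    while changed:  # repeat until no new claim is derived (handles is_a chains)
        changed = False
        for child, parent in is_a:
            for s, p, o in list(claims):
                if s == parent and (child, p, o) not in claims:
                    claims.add((child, p, o))
                    changed = True
    return claims


def conflicts(o1: Set[Triple], o2: Set[Triple]) -> List[Tuple[str, str]]:
    """Return (subject, ability) pairs asserted both as 'can' and as 'cannot'
    once the two ontologies are merged."""
    merged = inherited_claims(o1 | o2)
    return sorted((s, o) for s, p, o in merged
                  if p == "can" and (s, "cannot", o) in merged)


O1 = {("penguin", "is_a", "bird"), ("bird", "can", "fly")}
O2 = {("penguin", "is_a", "bird"), ("penguin", "cannot", "fly")}
print(conflicts(O1, O2))
```

The check only flags that ‘penguin’ and ‘fly’ clash in Om; deciding whether O1 is wrong, or whether the units refer to different entities, still needs the kind of reasoning the slide lists.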

Page 42: Tracking Referents (based on OIC, December 1, 2006)

Possible evolutions through updates

Page 43: Tracking Referents (based on OIC, December 1, 2006)

Possible evolutions through updates

Example: a relevant entity ceases to exist, but the representation is not updated:

Page 44: Tracking Referents (based on OIC, December 1, 2006)

Updating is an active process

• authors assume in good faith that
– all included representational units are of the P+1 type, and
– all they are aware of, but not included, are of the A+1 or A+2 type.

• If they become aware of a mistake, they make a change under the assumption that their changes are also towards the P+1, A+1, or A+2 cases.

• Thus at that time, they know of what type the previous entry must have been, given their belief about what the current one is, and the reason for the change.

Page 45: Tracking Referents (based on OIC, December 1, 2006)

This leads to a calculus …

• NOT:

– to demonstrate how good an individual version of an ontology is,

• But rather

– to measure how much it improved (hopefully) as compared to its predecessors.

• Principle: recursive belief revision

Page 46: Tracking Referents (based on OIC, December 1, 2006)

Backward belief revision over time

Reality: a POR exists and is not relevant

[Figure: beliefs "At t about t" compared with reality (labels R, P); score shown: -2]

• At time t, an analyst correctly perceives the existence of some particular, but considers it relevant while it isn’t, and he makes an encoding error such that the representational unit does not refer.

• There is thus a -2 error with respect to reality, but this remains, of course, unknown.

Page 47: Tracking Referents (based on OIC, December 1, 2006)

Backward belief revision over time

Reality: a POR exists and is not relevant

[Figure: beliefs "At t about t", "At t+1 about t+1" and "At t+1 about t" compared with reality (labels R, P); score shown: -2]

• At t+1, he corrects the encoding mistake, which forces him to believe that at t, the unit-reality configuration was of type P-4 rather than P+1.

Page 48: Tracking Referents (based on OIC, December 1, 2006)

Backward belief revision over time

Reality: a POR exists and is not relevant

[Figure: beliefs "At t about t", "At t+1 about t+1" and "At t+1 about t" compared with reality (labels R, P); scores shown: -2, -1, -1]

• Although he believes that the current situation is P+1, it is in reality P-6, where it was P-7 before.

• The real error is now -1, while the perceived error with respect to t is also -1.

Page 49: Tracking Referents (based on OIC, December 1, 2006)

Backward belief revision over time

Reality: a POR exists and is not relevant

[Figure: beliefs "At t about t", "At t+1 about t+1" and "At t+1 about t" compared with reality (labels R, P); scores shown: -2, -1, -1]

• At t+2, he believes that the posited POR in fact does not exist.

Page 50: Tracking Referents (based on OIC, December 1, 2006)

Backward belief revision over time

Reality: a POR exists and is not relevant

[Figure: beliefs "At t about t", "At t+1 about t+1", "At t+1 about t", "At t+2 about t+2", "At t+2 about t+1" and "At t+2 about t" compared with reality (labels R, P); scores shown: -2, -1, -1, -1, -3, -5]

Page 51: Tracking Referents (based on OIC, December 1, 2006)

Conclusion

• Realist ontology is a powerful quality assurance tool for building high quality ontologies AND high quality databases;

• Referent tracking, based on realist ontology, is a means to remove the ambiguity in data that cannot be solved by realist ontology alone;

– It is a form of “adult” annotation

• Application of RT requires a globally accessible repository

• The use of “meaningless” IUIs allows very strict safety and security measures to be implemented.