Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University Funded by NSF

Post on 19-Dec-2015


Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views

Joachim Biskup
Universität Dortmund

and

David W. Embley
Brigham Young University

Funded by NSF

Information Exchange

[Diagram labels: Source, Target; Information Extraction; Schema Matching. Callouts: "Leverage this …" and "… to do this".]

Presentation Outline

• Overview
• Matching (Direct)
• Matching (Derived)
• Matching Algorithm
• Summary

Requirements

1. f is an injective function.
2. f maps obj. sets to obj. sets and rel. sets to rel. sets.
3. f respects rel-set arities.
4. f respects referential integrity.
5. f respects types.
6. f respects real-world identity.
7. f’s coercions are G/S compatible.
8. f respects subset constraints.
9. f respects mutual-exclusion constraints.
10. f respects union constraints.
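A few of these requirements can be checked mechanically. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the `Element` record, the kind labels, the sample element names, and `check_mapping` are all hypothetical, and only requirements 1 to 3 are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical model element: an object set or a relationship set."""
    name: str
    kind: str        # "object_set" or "relationship_set"
    arity: int = 1   # number of connected object sets (1 for object sets)


def check_mapping(f):
    """Return the violations of requirements 1-3 found in mapping f.

    f is a dict from target Element to source Element.
    """
    problems = []
    # Requirement 1: f is injective (no two targets share a source).
    sources = list(f.values())
    if len(set(sources)) != len(sources):
        problems.append("not injective")
    for target, source in f.items():
        # Requirement 2: object sets map to object sets,
        # relationship sets to relationship sets.
        if target.kind != source.kind:
            problems.append(f"kind mismatch: {target.name} -> {source.name}")
        # Requirement 3: relationship-set arities are preserved.
        elif target.arity != source.arity:
            problems.append(f"arity mismatch: {target.name} -> {source.name}")
    return problems


# Made-up elements, for illustration only.
city_t = Element("City", "object_set")
city_s = Element("Town", "object_set")
serves_t = Element("Airport serves City", "relationship_set", 2)
serves_s = Element("Airline flies to City", "relationship_set", 2)

ok = check_mapping({city_t: city_s, serves_t: serves_s})
bad = check_mapping({city_t: city_s, serves_t: city_s})
```

Here `ok` is empty, while `bad` reports both a non-injective mapping and a kind mismatch. The remaining requirements (referential integrity, types, constraints) would need the full model and are left out.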

User Interaction (IDS Statements)

• Issue
  – Explains the issue
  – Example: units, may need transformation
• Default
  – Explains the default option
  – Example: if no transformation, no conversion
• Suggestion
  – Gives a suggestion about how to resolve the issue
  – Example: if needed, specify the conversion
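One way to picture an IDS statement is as a three-field record. The sketch below is an assumption about shape only: the class name `IDSStatement`, its field names, and the `render` method are hypothetical, and the wording of the sample statement paraphrases the units example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IDSStatement:
    """Hypothetical record for one Issue/Default/Suggestion statement."""
    issue: str
    default: str
    suggestion: str

    def render(self) -> str:
        # Present the three parts in the order the user would read them.
        return (f"Issue: {self.issue}\n"
                f"Default: {self.default}\n"
                f"Suggestion: {self.suggestion}")


units = IDSStatement(
    issue="Units may differ; values may need transformation.",
    default="If no transformation is given, no conversion is applied.",
    suggestion="If needed, specify the conversion.",
)
text = units.render()
```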

Theorem

Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation.

Proof: the paper is the proof …

Target (Graphical View)

Target (Textual View)

Source Example (Assumed to be Populated)


Matching (Direct)

• Object Sets

• Relationship Sets

Object-Set Type Compatibility

<a, b>
1. type(a) = type(b)
2. type(a) ⊃ type(b)
3. type(a) ⊂ type(b)
4. type(a) ≠ type(b)

type(a) = type(b)

• Same type
  – string = string, but Airport ≠ Head Of State
  – Need better matching techniques
• Same type, different units
  – Size vs. Nr Sq Km
  – Need unit conversion
• Same type, different format
  – Date = Date, but 01/02/2002 ≠ Jan 2, 2002
  – Need format conversion
• Same type, same units and format, different assumptions
  – Altitude = Altitude, but altitude of aircraft and spacecraft differ
  – Need same assumptions
• Same type, same units and format, same assumption, OIDs
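The unit and format conversions mentioned above might look like the following sketch. Two assumptions are made that the slides do not state: that Size is recorded in square miles, and that 01/02/2002 is written month first. The function names are hypothetical.

```python
from datetime import datetime

# Exact by definition: one mile is 1.609344 km, so one square mile
# is 1.609344 ** 2 square kilometres.
SQ_KM_PER_SQ_MILE = 2.589988110336

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def sq_miles_to_sq_km(size_sq_miles: float) -> float:
    """Unit conversion (assumes the source Size is in square miles)."""
    return size_sq_miles * SQ_KM_PER_SQ_MILE


def reformat_date(text: str) -> str:
    """Format conversion: 'MM/DD/YYYY' to 'Mon D, YYYY' (month first assumed)."""
    d = datetime.strptime(text, "%m/%d/%Y")
    # An explicit month table avoids locale-dependent month names.
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


converted = reformat_date("01/02/2002")   # "Jan 2, 2002"
```

If the source instead wrote dates day first, the same string would mean 1 February, which is exactly the kind of assumption an IDS statement would have to surface.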

type(a) ⊃ type(b) and type(a) ⊂ type(b)

• Real ⊃ Integer or Video ⊃ Image
  – Target has greater discriminating power
  – Can add .0 or make a video of a single image (?)
• Integer ⊂ Real or Image ⊂ Video
  – Source has greater discriminating power
  – Can round off or select one of the frames (?)
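The coercions suggested here can be sketched in a few lines. This is an illustration only: representing a video as a list of frames and always picking the first frame are assumptions, and the slides themselves mark the video cases with a question mark.

```python
def widen_number(value: int) -> float:
    """Integer source, Real target: 'add .0'. No information is lost."""
    return float(value)


def narrow_number(value: float) -> int:
    """Real source, Integer target: round off. Information may be lost.

    Python's round() rounds halves to the nearest even integer.
    """
    return round(value)


def image_to_video(image):
    """Image source, Video target: a video consisting of a single frame."""
    return [image]


def video_to_image(video, frame_index: int = 0):
    """Video source, Image target: select one frame (the first by default)."""
    return video[frame_index]


wide = widen_number(2)        # 2.0
narrow = narrow_number(2.7)   # 3
```

Widening is lossless and can be applied silently; narrowing discards information, so it is a natural candidate for user confirmation.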

type(a) ≠ type(b)

• Image ≠ String
  – Mismatch, even if same attribute (e.g. both City)
  – Types can help discard potential matches
• String(5) ≠ Integer
  – But suppose the integer is 2
  – Might work, but is “2.000” ok?
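Taken together, the type cases for a candidate match <a, b> (a in the target, b in the source) amount to a four-way classification: same type, target type broader, source type broader, or unrelated. A minimal sketch follows, assuming a hand-written table of which types are narrower than which. The table and the name `classify_types` are hypothetical, and only the two pairs mentioned in the slides are listed.

```python
# Pairs (narrow, broad): every value of `narrow` can be read as a value
# of `broad`. Only the pairs named in the slides are included.
NARROWER_THAN = {("Integer", "Real"), ("Image", "Video")}


def classify_types(type_a: str, type_b: str) -> str:
    """Classify a candidate match <a, b> by type (a = target, b = source)."""
    if type_a == type_b:
        return "equal"             # units, format, assumptions may still differ
    if (type_b, type_a) in NARROWER_THAN:
        return "target broader"    # source values can be widened
    if (type_a, type_b) in NARROWER_THAN:
        return "source broader"    # source values must be narrowed
    return "mismatch"              # candidate can usually be discarded


result = classify_types("Real", "Integer")   # "target broader"
```

A "mismatch" is a hint rather than a verdict: as the String(5) and Integer example shows, a conversion may still exist, but it should not be applied without asking.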

Relationship Match Requirements

• Referential integrity

• Constraints
  – Cardinality
  – Mandatory/Optional

Referential Integrity

[Diagram: Target and Source columns with object sets a, b, a’, b’, and a’’.]

The types of a, a’, and a’’ can all be different, but not arbitrary. Example: a (String), a’ (Integer), a’’ (Real).

Relationship-Set Constraint Compatibility

<a, b>
1. constr(a) <=> constr(b)
2. (constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))
3. ¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b))
4. ¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

constr(a) <=> constr(b)

[Diagram: Person and Car connected by the relationship sets “owns” and “drives”; Person and Car connected by an unnamed relationship set “?”.]

Need more information to resolve: Perhaps “?” is “purchased.”

(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

[Diagram: City and City Map relationship sets for target (a) and source (b).]

The target (a) expects many maps, but the source can’t supply them.

¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b))

[Diagram: City and City Map relationship sets for target (a) and source (b).]

The target (a) expects one map, but the source can supply many.

¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

[Diagram: City and City Map relationship sets for target (a) and source (b).]

The target (a) expects at least one and potentially many maps, but the source may have none or at most one.
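The City and City Map examples can be sketched by modeling each participation constraint as a cardinality range. This is one possible reading, not the authors' formalization: the `(min, max)` encoding, the function names, and the concrete ranges chosen for "one" and "many" are assumptions.

```python
from typing import Optional, Tuple

# (min, max) maps per city; max=None means unbounded.
Card = Tuple[int, Optional[int]]


def implies(x: Card, y: Card) -> bool:
    """True if every population allowed by x is also allowed by y."""
    (x_min, x_max), (y_min, y_max) = x, y
    max_ok = y_max is None or (x_max is not None and x_max <= y_max)
    return x_min >= y_min and max_ok


def classify_constraints(constr_a: Card, constr_b: Card) -> str:
    """Compare target constraint a with source constraint b."""
    a_from_b = implies(constr_b, constr_a)   # constr(a) <= constr(b)
    b_from_a = implies(constr_a, constr_b)   # constr(a) => constr(b)
    if a_from_b and b_from_a:
        return "equivalent"
    if a_from_b:
        return "source stricter"   # source data always fits the target
    if b_from_a:
        return "target stricter"   # source data may violate the target
    return "incomparable"


many, at_most_one, at_least_one = (0, None), (0, 1), (1, None)

case_many_vs_one = classify_constraints(many, at_most_one)
case_one_vs_many = classify_constraints(at_most_one, many)
case_neither = classify_constraints(at_least_one, at_most_one)
```

Under these assumed ranges, the three examples come out as "source stricter", "target stricter", and "incomparable" respectively. Only the first is safe to populate without checking the source data.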

Matching (Derived)

• Generalization/Specialization
• Composite Values
• Derived Relationship Sets
• Displayable/Nondisplayable Object Sets

Generalization/Specialization

• For a target object set, a source object set may:
  – have no overlap (just ignore)
  – have a proper subset (accept or find missing generalization)
  – have the same values (direct match)
  – have a proper superset (hard, except for roles)
  – overlap (like proper subset and proper superset)
• Consider roles and missing generalizations
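The five cases listed above correspond to the possible relationships between two value sets. The sketch below assumes both extents are available as Python sets, which is a simplification (the target is typically what is being populated), and the function name and sample values are hypothetical.

```python
def compare_value_sets(target: set, source: set) -> str:
    """Describe the source object set's values relative to the target's."""
    if not (target & source):
        return "no overlap"        # just ignore
    if source == target:
        return "same values"       # direct match
    if source < target:
        return "proper subset"     # accept, or find missing generalization
    if source > target:
        return "proper superset"   # hard, except for roles
    return "overlap"               # partly subset, partly superset


# Made-up values, for illustration only.
target_cities = {"Provo", "Dortmund", "Oslo"}
relation = compare_value_sets(target_cities, {"Provo", "Oslo"})
```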

Roles

[Diagram labels: target: City, Travel Video; source: City, Clip: Video; role: Video With City Scene.]

Missing Generalization

[Diagram labels: target and source; City Map, Country Map; City Map: Image, Country Map: Image; Map: Image.]

Composite Values

• Composite in Source (split)
• Composite in Target (merge)
• Examples of Derived Relationships
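Splitting and merging composite values can be illustrated with a running time that is stored either as one Time value or as separate Nr Hours and Nr Minutes. The "H:MM" text format and the function names below are assumptions made for the sketch.

```python
from typing import Tuple


def split_time(time: str) -> Tuple[int, int]:
    """Composite in source: split 'H:MM' into (nr_hours, nr_minutes)."""
    hours, minutes = time.split(":")
    return int(hours), int(minutes)


def merge_time(nr_hours: int, nr_minutes: int) -> str:
    """Composite in target: merge hours and minutes into 'H:MM'."""
    return f"{nr_hours}:{nr_minutes:02d}"


parts = split_time("1:45")    # (1, 45)
whole = merge_time(1, 5)      # "1:05"
```

The two functions are inverses of each other on well-formed values, so a value can make the round trip without loss.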

Composite in Source

[Diagram: target and source models of Video with Time, Nr Hours, and Nr Minutes.]

Note also that we generated a source path.

Composite in Target

[Diagram: target and source models of Video with Time, Nr Hours, and Nr Minutes.]

Displayable/Nondisplayable Object-Set Matches

• Nondisplayable in Source: find a key

• Nondisplayable in Target: create a key

Nondisplayable in Source

[Diagram: target Airport and source Airport; other labels: City, Airline, “flies to”, “serves”.]

No Key: Discard Match

Nondisplayable in Source

[Diagram: target Airport and source Airport; other labels: City, Airline, “flies to”, “serves”, Airport Name.]

One Key: Choose it

Nondisplayable in Source

[Diagram: target Airport and source Airport; other labels: City, Airline, “flies to”, “serves”, Airport Name, Airport Code.]

Two or more Keys: Choose One
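The three key cases for a nondisplayable source object set, together with key creation for a nondisplayable target object set, can be sketched as follows. The function names are hypothetical. Taking the first candidate when several keys exist is an arbitrary choice made for the sketch, and the generated key format is likewise an assumption.

```python
from itertools import count
from typing import Callable, Optional, Sequence


def choose_key(candidate_keys: Sequence[str]) -> Optional[str]:
    """Find a displayable key for a nondisplayable source object set.

    No key: return None (discard the match).
    One key: choose it.
    Two or more keys: choose one (here simply the first listed).
    """
    if not candidate_keys:
        return None
    return candidate_keys[0]


def make_key_generator(prefix: str) -> Callable[[], str]:
    """Create keys for a nondisplayable target object set."""
    counter = count(1)
    return lambda: f"{prefix}-{next(counter)}"


no_key = choose_key([])
one_key = choose_key(["Airport Name"])
several = choose_key(["Airport Name", "Airport Code"])

new_airport_key = make_key_generator("Airport")
first_key = new_airport_key()   # "Airport-1"
```

With several candidate keys, a real system might prefer the key with the best coverage or ask the user rather than take the first one.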


Matching Algorithm

Sample Match Table

Pictorial View of Match Table

[Diagram labels: target, source.]

Summary

Concluding Remarks

• QED (the theorem holds)

Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation.

Proof: the paper is the proof …

Pictorial View of Match Table

[Diagram legend: t = target; s = source; f = the mapping; t’ = submodel; t’ has a valid interpretation.]

Concluding Remarks

• QED (the theorem holds)
• Merge (several sources)
  – All sources extracted to same view
  – Union merge
    • Object identity problems
    • Constraint problems
• Source Modeling (convert to OSM)
• Framework defined, but not implemented
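The union merge of several sources, and the object identity problem it raises, can be illustrated with a small sketch. Since the slides note that the framework is not implemented, this is purely hypothetical: the function name, the tuple representation, and the sample rows are made up.

```python
def union_merge(*extractions):
    """Union the tuples extracted from several sources into one target view."""
    merged = set()
    for rows in extractions:
        merged.update(rows)
    return merged


# Made-up (city name, airport code) rows from two sources.
source_1 = {("Salt Lake City", "SLC")}
source_2 = {("Salt Lake City", "SLC"), ("Salt Lake", "SLC")}

view = union_merge(source_1, source_2)
```

Identical rows collapse, so `view` holds two rows, but "Salt Lake City" and "Salt Lake" remain distinct even though they may denote the same city. That is the object identity problem. The merged view may also violate a target constraint, such as one city per airport code, that each source satisfied on its own.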