
Page 1: Learning the Semantic Meaning of a Concept from the Web Yang Yu and Yun Peng May 30, 2007 yangyu1@umbc.edu, ypeng@umbc.edu

Learning the Semantic Meaning of a Concept from the Web

Yang Yu and Yun Peng
May 30, 2007
yangyu1@umbc.edu, ypeng@umbc.edu

2

The Problem

Manually preparing training data for each concept in text-classification-based ontology mapping is expensive.

[Figure: the LIVING_THINGS ontology; the figure also carries the label "Exemplars"]

LIVING_THINGS
  ANIMAL
    HUMAN
      MAN
      WOMAN
    CAT
  PLANT
    TREE
      ARBOR
      FRUTEX
    GRASS

3

Our Approach

Automatically collecting training data.

Benefits
  Reduce the amount of human work

http://www.google.com/

4

Overview

Background
  The Semantic Web and ontology
  Ontology mapping
Approach
Prototype System
Experimental Results
  WEAPONS ontology
  LIVING_THINGS ontology
Limitations and Conclusions

5

Semantic Web and Ontology Mapping

The Semantic Web
  “an extension of the current web”
  ontology files and programs that use them
Ontology Mapping
  Interoperability problem
  Mapping
    r = f(Ci, Cj), where i = 1, …, n and j = 1, …, m;
    r ∈ {equivalent, subClassOf, superClassOf, complement, overlapped, other}

6

Approaches to Ontology Mapping

Manual mapping
String matching
Text classification
  The semantic meaning of a concept can be reflected in the training data (exemplars) that use the concept
  Probabilistic feature model
  Classification
  Results highly dependent on the quality of exemplars

7

Motivation and Proposal

Preparing exemplars manually is costly

Billions of documents available on the web
Search engines

8

The Proposal

Use the concepts defined in an ontology, together with their semantic information, to form queries, then process the search results to obtain exemplars

Verification
  Build a prototype system
  Check ontology mapping results

9

System overview – Part I

[Figure: pipeline diagram. Boxes and labels shown: Ontology A, Parser, Queries, Search Engine, Links to Web Pages, Retriever, WWW, HTML Docs, Processor, Text Files]

1. Whole file

2. Only sentences containing search keywords
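
The second option above (keeping only sentences that contain search keywords) might look roughly like the sketch below. The function name, the regex-based sentence splitter, and the case-insensitive substring match are all assumptions made for illustration; the slides do not say how the prototype splits or filters text.

```python
import re


def keyword_sentences(text, keywords):
    """Return the sentences of `text` that mention at least one keyword.

    Hypothetical helper: sentence splitting is a naive regex split on
    '.', '!' or '?' followed by whitespace, and matching is a
    case-insensitive substring test (so "cat" also matches "category").
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]


if __name__ == "__main__":
    page_text = "The cat sat on the mat. Stocks fell today. A tree gives shade to a cat."
    for sentence in keyword_sentences(page_text, ["cat"]):
        print(sentence)
```

Keeping the whole file preserves more context, while the sentence filter trades context for less off-topic text; the slides report results for both variants.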

10

System overview – Part II

[Figure: pipeline diagram. Boxes and labels shown: Ontology A, Ontology B, Text Files (A), Text Files (B), Model Builder, Feature Model, Rainbow, Calculator, Mapping Results]

11

The model builder

[Figure: the LIVING_THINGS ontology, drawn twice on the slide]

LIVING_THINGS
  ANIMAL
    HUMAN
      MAN
      WOMAN
    CAT
  PLANT
    TREE
      ARBOR
      FRUTEX
    GRASS

Mutually exclusive and exhaustive
Leaf classes
C+ and C-
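
A minimal sketch of how leaf classes could be collected from the LIVING_THINGS hierarchy. The slide only names "C+ and C-"; reading C+ as the leaf classes under a concept and C- as the remaining leaf classes is an assumption, as are the dict encoding and the function names.

```python
# Hierarchy as drawn on the slides; the dict encoding is an assumption.
ONTOLOGY = {
    "LIVING_THINGS": ["ANIMAL", "PLANT"],
    "ANIMAL": ["HUMAN", "CAT"],
    "HUMAN": ["MAN", "WOMAN"],
    "PLANT": ["TREE", "GRASS"],
    "TREE": ["ARBOR", "FRUTEX"],
}


def leaves(concept):
    """Return the leaf classes below `concept` (or the concept itself if it is a leaf)."""
    children = ONTOLOGY.get(concept, [])
    if not children:
        return [concept]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def split_leaves(concept, root="LIVING_THINGS"):
    """Assumed reading of C+ / C-: leaves under `concept` versus all other leaves."""
    positive = leaves(concept)
    negative = [c for c in leaves(root) if c not in positive]
    return positive, negative


if __name__ == "__main__":
    c_plus, c_minus = split_leaves("HUMAN")
    print("C+:", c_plus)
    print("C-:", c_minus)
```

Because every exemplar falls under exactly one leaf, the leaf classes partition the exemplars, which is one way to satisfy "mutually exclusive and exhaustive".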

12

The calculator

The naïve Bayes text classifier tends to give extreme values (1 or 0)

Conditional probabilities are calculated from the raw classification data by averaging

13

An Example of the Calculator

[Figure: exemplars of APC are passed to the Classifier, which assigns each one to a category in WeaponsA.n3 (TANK-VEHICLE, AIR-DEFENSE-GUN, SAUDI-NAVAL-MISSILE-CRAFT). Caption: Ontology for Weapons]

Categories in WeaponsA.n3      Num. of exemplars
TANK-VEHICLE                   170
AIR-DEFENSE-GUN                20
SAUDI-NAVAL-MISSILE-CRAFT      10
Total                          200

P(TANK-VEHICLE | APC) = 170 / 200 = 0.85

P(AIR-DEFENSE-GUN | APC) = 20 / 200 = 0.10

P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 10 / 200 = 0.05
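
The arithmetic of this example can be sketched as follows: each exemplar of APC gets one predicted category, and the conditional probability is the fraction of exemplars assigned to that category. The function name and the list-of-labels input are assumptions; the counts are the ones on the slide.

```python
from collections import Counter


def conditional_probabilities(predicted_labels):
    """Estimate P(category | concept) as the share of the concept's
    exemplars that the classifier assigned to each category."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Classifier output for the 200 APC exemplars, as counted on the slide.
    predictions = (
        ["TANK-VEHICLE"] * 170
        + ["AIR-DEFENSE-GUN"] * 20
        + ["SAUDI-NAVAL-MISSILE-CRAFT"] * 10
    )
    for label, p in conditional_probabilities(predictions).items():
        print(f"P({label} | APC) = {p:.2f}")
```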

14

Experiments with WEAPONS ontology

WeaponsA.n3 and WeaponsB.n3
  Information Interoperation and Integration Conference (http://www.atl.lmco.com/projects/ontology/i3con.html)
Both have over 80 classes defined
More than 60 classes are leaf classes

15

Part of WeaponsA.n3

[Figure: class hierarchy diagram. Classes shown: WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, TANK-VEHICLE, MODERN-NAVAL-SHIP, PATROL-CRAFT, AIRCRAFT-CARRIER, WARPLANE, SUPER-ETENDARD]

16

Part of WeaponsB.n3

[Figure: class hierarchy diagram. Classes shown: WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, TANK-VEHICLE, LIGHT-TANK, APC, MODERN-NAVAL-SHIP, PATROL-WARTER-CRAFT, PATROL-BOAT-RIVER, PATROL-BOAT, AIRCRAFT-CARRIER, LIGHT-AIRCRAFT-CARRIER, WARPLANE, FIGHTER-PLANE, FIGHTER-ATTACK-PLANE, SUPER-ETENDARD-FIGHTER]

17

Expected Results

[Figure: expected mappings between classes of WeaponsA.n3 and WeaponsB.n3. Classes shown: TANK-VEHICLE, SUPER-ETENDARD, PATROL-CRAFT, AIRCRAFT-CARRIER, LIGHT-TANK, APC, PATROL-WARTER-CRAFT, LIGHT-AIRCRAFT-CARRIER, PATROL-BOAT-RIVER, PATROL-BOAT, FIGHTER-PLANE, FIGHTER-ATTACK-PLANE, SUPER-ETENDARD-FIGHTER]

18

A Typical Report

Report for APC: P(APC | Ci) where i = 1 … 63

SELF-PROPELLED-ARTILLERY    0.357180681
TANK-VEHICLE                0.277139274
ICBM                        0.10423636
MRBM                        0.080615147
TOWED-ARTILLERY             0.054724102
SUPPORT-VESSEL              0.023265054
PATROL-CRAFT                0.019570325
MOLOTOV-COCKTAIL            0.015032411
TORPEDO-CRAFT               0.013677696
SUPER-ETENDARD              0.009856519
MORTAR                      0.00772997
AIR-DEFENSE-GUN             0.002997109
MACHINE-GUN                 0.000211772
MOLOTOV-COCKTAIL            0.000187578
TRUCK-BOMB                  0.000171675
AS-9-KYLE-ALCM              0.000156403
ARABIL-100-MISSILE          0.000111953
AL-HIJARAH-MISSILE          7.65E-05
OGHAB-MISSILE               7.12E-05
BADAR-2000                  4.28E-05
…                           …

19

Classes with highest conditional probability

New Classes               Whole file              Prob    Sentences with Keywords     Prob
LIGHT-AIRCRAFT-CARRIER    AIRCRAFT-CARRIER        0.65    AIRCRAFT-CARRIER            0.57
APC                       SILKWORM-MISSILE-MOD    0.46    SELF-PROPELLED-ARTILLERY    0.36
SUPER-ETENDARD-FIGHTER    SILKWORM-MISSILE-MOD    0.66    MRBM                        0.51
FIGHTER-ATTACK-PLANE      SILKWORM-MISSILE-MOD    0.83    MRBM                        0.38
PATROL-WATERCRAFT         SILKWORM-MISSILE-MOD    0.28    PATROL-CRAFT                0.52
PATROL-BOAT-RIVER         SILKWORM-MISSILE-MOD    0.65    PATROL-CRAFT                0.54
PATROL-BOAT               SILKWORM-MISSILE-MOD    0.51    PATROL-CRAFT                0.66
LIGHT-TANK                SILKWORM-MISSILE-MOD    0.56    TANK-VEHICLE                0.3
FIGHTER-PLANE             AIRCRAFT-CARRIER        0.49    MRBM                        0.38
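
Picking the class with the highest conditional probability from a report can be sketched as below. The function name is hypothetical; the sample values are the top four rows of the APC report shown earlier in the deck.

```python
def best_match(report):
    """Return the (class, probability) pair with the highest probability."""
    if not report:
        raise ValueError("report must contain at least one class")
    return max(report.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Top rows of the APC report from the slides.
    apc_report = {
        "SELF-PROPELLED-ARTILLERY": 0.357180681,
        "TANK-VEHICLE": 0.277139274,
        "ICBM": 0.10423636,
        "MRBM": 0.080615147,
    }
    label, prob = best_match(apc_report)
    print(f"{label} {prob:.2f}")
```

Taking only the top class discards how close the runner-up is; for APC the expected class TANK-VEHICLE is second in the report.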

20

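Experiment with LIVING_THINGS ontology

(1) P(MAN | HUMAN), P(WOMAN | HUMAN)
(2) Find a mapping for GIRL

[Figure: HUMAN with subclasses MAN and WOMAN]

[Figure: the LIVING_THINGS ontology with the new concept GIRL; levels marked Level1, Level2, Level3]

LIVING_THINGS
  ANIMAL
    HUMAN
      MAN
      WOMAN
    CAT
  PLANT
    TREE
      ARBOR
      FRUTEX
    GRASS

GIRL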

21

Experiment Results (1)

[Figure: HUMAN with subclasses MAN and WOMAN]

Results of experiment (1)

P(MAN | HUMAN) = 0.62

P(WOMAN | HUMAN) = 0.38

22

Experiment Results (2)

[Figure: the LIVING_THINGS ontology with the new concept GIRL; levels marked Level1, Level2, Level3]

Labels on the slide's result tables: "Without clustering on exemplars", "With clustering on exemplars", "with additional classes", clusty.com. The extracted text does not preserve which label belongs to which group of values below.

P(ANIMAL | GIRL)        0.83
P(PLANT | GIRL)         0.17
P(HUMAN | GIRL)         0.92
P(CAT | GIRL)           0.08
P(WOMAN | GIRL)         0.63
P(MAN | GIRL)           0.37

P(DOG | GIRL)           0.56
P(CAT | GIRL)           0.01
P(HUMAN | GIRL)         0.43
P(PYCNOGONID | GIRL)    0

P(ANIMAL | GIRL)        0.76
P(PLANT | GIRL)         0.23
P(HUMAN | GIRL)         0.70
P(CAT | GIRL)           0.30
P(MAN | GIRL)           0
P(WOMAN | GIRL)         1
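
One way to read the level-by-level results is as a top-down walk: at each level, follow the child with the highest conditional probability given GIRL. The sketch below assumes that reading; the slides do not state that the prototype descends greedily, and the function name and dict encodings are illustrative. The probabilities are one of the groups of values reported on this slide.

```python
# Hierarchy from the slides; the dict encoding is an assumption.
CHILDREN = {
    "LIVING_THINGS": ["ANIMAL", "PLANT"],
    "ANIMAL": ["HUMAN", "CAT"],
    "HUMAN": ["MAN", "WOMAN"],
    "PLANT": ["TREE", "GRASS"],
    "TREE": ["ARBOR", "FRUTEX"],
}

# P(class | GIRL), taken from one group of values on the slide.
P_GIVEN_GIRL = {
    "ANIMAL": 0.83, "PLANT": 0.17,
    "HUMAN": 0.92, "CAT": 0.08,
    "WOMAN": 0.63, "MAN": 0.37,
}


def descend(children, probs, root):
    """Walk from `root` to a leaf, choosing the most probable child at each level."""
    path = [root]
    node = root
    while children.get(node):
        node = max(children[node], key=lambda c: probs.get(c, 0.0))
        path.append(node)
    return path


if __name__ == "__main__":
    print(" > ".join(descend(CHILDREN, P_GIVEN_GIRL, "LIVING_THINGS")))
```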

23

Additional Experiments: Different Queries

Queries augmented with class properties

Concepts         Queries
living things    Living+things
animal           Living+things+animal+Animalia
plant            Living+things+plant+Plantae
cat              Living+things+animal+Animalia+cat+Felidae
human            Living+things+animal+Animalia+human+intelligent
man              Living+things+animal+Animalia+human+intelligent+man+male
woman            Living+things+animal+Animalia+human+intelligent+woman+female
tree             Living+things+plant+Plantae+tree
grass            Living+things+plant+Plantae+grass
frutex           Living+things+plant+Plantae+tree+Frutex
arbor            Living+things+plant+Plantae+tree+arbor
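
The queries on this slide follow a visible pattern: the path from the root to the concept, with each class followed by its property terms, joined by "+". A minimal sketch of that pattern is below; the PARENT and PROPERTIES tables and the function name are assumptions reconstructed from the listed queries, not the prototype's actual data structures.

```python
# Parent links and property terms inferred from the queries on the slide.
PARENT = {
    "animal": "Living things", "plant": "Living things",
    "human": "animal", "cat": "animal",
    "man": "human", "woman": "human",
    "tree": "plant", "grass": "plant",
    "arbor": "tree", "frutex": "tree",
}
PROPERTIES = {
    "animal": ["Animalia"], "plant": ["Plantae"], "cat": ["Felidae"],
    "human": ["intelligent"], "man": ["male"], "woman": ["female"],
}


def build_query(concept):
    """Join the root-to-concept path, plus each class's property terms, with '+'."""
    path = []
    node = concept
    while node is not None:
        path.append(node)
        node = PARENT.get(node)
    terms = []
    for name in reversed(path):
        terms.extend(name.split())          # "Living things" -> "Living", "things"
        terms.extend(PROPERTIES.get(name, []))
    return "+".join(terms)


if __name__ == "__main__":
    for concept in ["woman", "cat", "arbor"]:
        print(concept, "->", build_query(concept))
```

Including the ancestors in the query narrows the sense of an ambiguous word such as "tree" or "man" toward the meaning intended in the ontology.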

24

Experiment Results (3)

Results of experiment (1) with new queries

[Figure: HUMAN with subclasses MAN and WOMAN]

Conditional Probability    Whole    Keyword Sentences
P(MAN | HUMAN)             0.91     0.93
P(WOMAN | HUMAN)           0.09     0.07

Results of experiment (2) with new queries

[Figure: the LIVING_THINGS ontology with the new concept GIRL; levels marked Level1, Level2, Level3]

Conditional Probability    Whole    Keyword Sentences
P(ANIMAL | GIRL)           0.9      0.83
P(PLANT | GIRL)            0.1      0.17
P(HUMAN | GIRL)            0.78     0.83
P(CAT | GIRL)              0.22     0.17
P(MAN | GIRL)              0.14     0.16
P(WOMAN | GIRL)            0.86     0.84

25

Limitation 1: Relevancy != similarity

[Figure: diagram of the search results for concept A, with these regions labeled]

Search results for concept A
Text related to concept A
Text for concept A, i.e. desired exemplars
Text against concept A
Text for related concept B

26

Limitation 2: “Conditional Probability”

An exemplar is a combination of strings that represents some usage of a concept.
An exemplar is not an instance of a concept.
The way we calculate conditional probability is an estimation.

[Figure: HUMAN with subclasses MAN and WOMAN]

27

Limitation 3: Popularity != relevancy

Limited by a search engine’s algorithm
  PageRank™
  Popularity does not equal relevancy
Weight cannot be specified for words in a search query

28

Related Research

UMBC OntoMapper
  Sushama Prasad, Peng Yun and Finin Tim, A Tool for Mapping between Two Ontologies Using Explicit Information, AAMAS 2002 Workshop on Ontologies and Agent Systems, 2002.
CAIMEN
  Lacher S. Martin and Groh Georg, Facilitating the Exchange of Explicit Knowledge through Ontology Mappings, Proc. of the Fourteenth International FLAIRS Conference, 2001.
GLUE
  Doan Anhai, Madhavan Jayant, Dhamankar Robin, Domingos Pedro, and Halevy Alon, Learning to Match Ontologies on the Semantic Web, WWW2002, May 2002.
Google Conditional Probability
  P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77
  P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26
  Wyatt D., Philipose M., and Choudhury T., Unsupervised Activity Recognition Using Automatically Mined Common Sense. Proceedings of AAAI-05, pp. 21-27.
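
The two Google conditional probabilities above are ratios of search hit counts. The sketch below only reproduces that arithmetic; the slide does not say which queries produced each count, so the parameter names are assumptions and no search API is called.

```python
def hit_count_ratio(numerator_hits, denominator_hits):
    """Estimate a conditional probability as a ratio of two hit counts."""
    if denominator_hits <= 0:
        raise ValueError("denominator_hits must be positive")
    return numerator_hits / denominator_hits


if __name__ == "__main__":
    # Hit counts quoted on the slide.
    print(f"P(HUMAN | MAN)   = {hit_count_ratio(1.77e9, 2.29e9):.2f}")
    print(f"P(HUMAN | WOMAN) = {hit_count_ratio(0.6e9, 2.29e9):.2f}")
```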

29

Conclusion and Future Work

Text retrieved from the web can be used as exemplars for text-classification-based ontology mapping
Many parameters affect the quality of the exemplars
There is noise in the processed documents
Future work
  Clustering
  Restrict search to highly relevant sites and web resources

30

Questions

Thank you
yangyu1@umbc.edu, ypeng@umbc.edu