SemRank: Ranking Complex Relationship Search Results on the Semantic Web. Kemafor Anyanwu, Angela Maduko, Amit Sheth, LSDIS lab, University of Georgia. Paper presentation at WWW2005, Chiba, Japan. Kemafor Anyanwu, Angela Maduko, and Amit Sheth. SemRank: Ranking Complex Relationship Search Results on the Semantic Web, Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005, pp. 117-127. This work is funded by NSF-ITR-IDM Award#0325464 titled ‘SemDIS: Discovering Complex Relationships in the Semantic Web’ and NSF-ITR-IDM Award#0219649 titled ‘Semantic Association Identification and Knowledge Discovery for National Security Applications.’


Page 1: SemRank: Ranking Complex Relationship Search Results on the Semantic Web

Kemafor Anyanwu, Angela Maduko, Amit Sheth
LSDIS lab, University of Georgia

Paper presentation at WWW2005, Chiba, Japan

Kemafor Anyanwu, Angela Maduko, and Amit Sheth. SemRank: Ranking Complex Relationship Search Results on the Semantic Web, Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005, pp. 117-127.

This work is funded by NSF-ITR-IDM Award#0325464 titled ‘SemDIS: Discovering Complex Relationships in the Semantic Web’ and NSF-ITR-IDM Award#0219649 titled ‘Semantic Association Identification and Knowledge Discovery for National Security Applications.’

Page 2:

Outline
• The Problem
• The SemRank relevance model
• SemRank computational issues in the SSARK system
• Evaluating SemRank: strategy and issues
• Related Work
• Conclusion and Future Work

Page 3:

The Problem
• [Anyanwu et al WWW2003] proposed a query operator for finding complex relationships between entities
• [Angles et al ESWC05] presented a survey of graph-based query operations that should be enabled on the Semantic Web
• Question: How can results of relationship query operations be ranked?

Page 4:

The Relationship Ranking Problem
query q = (1, 3) (a pair of nodes)

1. Find the subgraph that covers q
2. List the results in order of relevance (could be done with step 1 or as a separate step)

[Figure: example graph with numbered nodes (1 to 8) and lettered edges (a to h), next to a numbered list of result paths between nodes 1 and 3]
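The two steps on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the graph `toy_graph`, the function `simple_paths`, and the length-based ordering are hypothetical stand-ins, not the SSARK query operator or the SemRank ordering.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge-labeled directed graph: node -> list of (edge_label, neighbor).
Graph = Dict[int, List[Tuple[str, int]]]


def simple_paths(graph: Graph, source: int, target: int) -> List[List[str]]:
    """Return every simple path from source to target as a list of edge labels."""
    results: List[List[str]] = []

    def visit(node: int, seen: Set[int], labels: List[str]) -> None:
        if node == target:
            results.append(list(labels))
            return
        for label, nxt in graph.get(node, []):
            if nxt not in seen:  # never revisit a node on the current path
                seen.add(nxt)
                labels.append(label)
                visit(nxt, seen, labels)
                labels.pop()
                seen.remove(nxt)

    visit(source, {source}, [])
    return results


# Step 1: find the paths that connect the query pair q = (1, 3).
toy_graph: Graph = {
    1: [("a", 2), ("b", 4), ("e", 5)],
    2: [("c", 3)],
    4: [("d", 3)],
    5: [("f", 2)],
}
paths = simple_paths(toy_graph, 1, 3)

# Step 2: order the results. Path length is a placeholder relevance key;
# a relevance model such as SemRank would replace it.
ranked = sorted(paths, key=len)
```

Keeping enumeration and ordering as separate functions mirrors the slide's remark that ranking can be done together with step 1 or as a separate step.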

Page 5:

Things to think about
• Relevance as best match vs. ????
• Homogeneous (hyperlinks) vs. heterogeneous relationships
• Should relevance be fixed for all situations?
• Size of result set potentially large

Page 6:

The SemRank Model

Page 7:

SemRank’s Design Philosophy

• Tenet 1: Thou shall support variable rankings

• Tenet 2: Thou must not burden the user with complex query specification

• Tenet 3: Thou shall support mainstream search paradigms

Page 8:

SemRank’s Key Concepts

• Modulative Ranking
• Relevance: Search Mode + Predictability
• Refraction Count
  – How varied is the result from what is expected from schema?
• Information Gain
  – How much information does a user gain by being informed about a result?
• S-Match
  – Best semantic match with user need (if provided)

Page 9:

[Figure: adjustable search mode, shown with two settings]
• High Information Gain, High Refraction Count, High S-Match
• Low Information Gain, Low Refraction Count, High S-Match

Page 10:

Modulative Rank Function

• Typical preference or rank function
  – $Rank_i = \sum_j w_{ij} \cdot attr_{ij}$
• What we want is, given
  – µ, a weight function parameter
  – attributes attr1, attr2, …, attrk (e.g. length)
  – for each attribute, select an appropriate weight function from g1, g2, …, gm, e.g. gi(µ) = µ
    • each gi is some function of µ
• Then
  – $Rank_i(\mu) = \sum_k g_j(\mu) \cdot attr_{ik}$
    • where gj is the weight function selected for attrk
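A minimal sketch of a modulative rank function, assuming the rank is a sum over attributes of a µ-dependent weight times the attribute value. The attribute names, the weight function `1 - mu`, and the sample values are hypothetical; only `g(mu) = mu` comes from the slide.

```python
from typing import Callable, Dict

WeightFn = Callable[[float], float]


def modulated_rank(attrs: Dict[str, float],
                   weight_fns: Dict[str, WeightFn],
                   mu: float) -> float:
    """Sum g(mu) * attr over all attributes, using each attribute's own g."""
    return sum(weight_fns[name](mu) * value for name, value in attrs.items())


# One weight function per attribute (assumed choices for illustration).
weight_fns: Dict[str, WeightFn] = {
    "length": lambda mu: 1.0 - mu,  # hypothetical: matters less as mu grows
    "novelty": lambda mu: mu,       # the slide's example g_i(mu) = mu
}

result_attrs = {"length": 0.5, "novelty": 2.0}
rank_at_0 = modulated_rank(result_attrs, weight_fns, 0.0)
rank_at_1 = modulated_rank(result_attrs, weight_fns, 1.0)
```

The same result gets a different score as µ moves, which is how a single parameter can support variable rankings without a complex query specification.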

Page 11:

Refraction as a measure of predictability

Page 12:

Refraction

• The path “enrolled_in -> taught_by -> married_to” doesn’t exist anywhere at schema layer
• We say that the path refracts at node 3
• High refraction count in a path -> low predictability

[Figure: schema layer with classes Student, Course, Professor and Spouse and properties enrolled_in, taught_by and married_to; instance path 1 -enrolled_in-> 2 -taught_by-> 3 -married_to-> 4]

Page 13:

Semantic Summary

[Figure: matrix over classes C1 to C5 listing the properties (p1 to p5) that relate each pair of classes, next to the corresponding class graph. Row entries as extracted (column positions not recoverable): C1: p1, p2, p1, p2, p3; C2: p5, p4; C3: p1, p2, p1, p2, p3; C4: p4, p5; C5: none. A second panel labeled "Representative Ontology Class" shows C1 with C3 and C2 with C4.]

Page 14:

Semantic Summary & Refraction
• A Semantic Summary is a graph of representative ontology classes with appropriate relations as arcs
• For a path p = r1, p1, r2, p2, r3, there is a refraction at r2 if
  – the summary does not contain both p1(ROCi, ROCj) and p2(ROCj, ROCk) (or vice versa), where ROCi, ROCj, ROCk are the representative ontology classes of r1, r2, r3 respectively
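The refraction test can be sketched as a lookup against the summary graph. This is an illustration under assumptions: the summary is modeled as a set of (class, property, class) arcs, every resource maps to exactly one representative ontology class, and the "vice versa" (reversed direction) case is not handled. The arc for married_to and the class assigned to node 4 are hypothetical.

```python
from typing import Dict, List, Set, Tuple

SummaryArc = Tuple[str, str, str]  # (source ROC, property, target ROC)


def refraction_nodes(path_nodes: List[int],
                     path_props: List[str],
                     roc_of: Dict[int, str],
                     summary: Set[SummaryArc]) -> List[int]:
    """Return the interior nodes at which the path refracts."""
    refracting: List[int] = []
    for i in range(1, len(path_nodes) - 1):
        roc_prev = roc_of[path_nodes[i - 1]]
        roc_mid = roc_of[path_nodes[i]]
        roc_next = roc_of[path_nodes[i + 1]]
        incoming = (roc_prev, path_props[i - 1], roc_mid)
        outgoing = (roc_mid, path_props[i], roc_next)
        # Refraction: the two consecutive arcs are not both in the summary.
        if not (incoming in summary and outgoing in summary):
            refracting.append(path_nodes[i])
    return refracting


summary: Set[SummaryArc] = {
    ("Student", "enrolled_in", "Course"),
    ("Course", "taught_by", "Professor"),
    ("Spouse", "married_to", "Spouse"),  # hypothetical domain and range
}
roc_of = {1: "Student", 2: "Course", 3: "Professor", 4: "Spouse"}
nodes = [1, 2, 3, 4]
props = ["enrolled_in", "taught_by", "married_to"]

refracting = refraction_nodes(nodes, props, roc_of, summary)
refraction_count = len(refracting)
```

With these inputs the only refraction is at node 3, so the refraction count of the path is 1.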

Page 15:

Information content and Information gain

Page 16:

Measuring Information Content of a Property
• Content is related to uncertainty removed
• Typically measured as some function of its probability
  – High probability -> low information content
• For p ∈ P, where P is the set of property types, its information content ISP can be measured as:
  – $\mathit{ISP}(p_k) = \log_2(1 / Pr(p = p_k)) = -\log_2([[p_k]] / [[P]])$
• ISP(p) is maximum when
  – $Pr_i = 1 / [[P]]$, in which case it equals $\log [[P]]$
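A minimal sketch of the property information content, assuming the probability of a property is its share of all property occurrences. The function name and the frequency counts are made up for illustration.

```python
import math
from typing import Dict


def information_content(freq: Dict[str, int]) -> Dict[str, float]:
    """Map each property to -log2 of its relative frequency (counts must be > 0)."""
    total = sum(freq.values())
    return {prop: -math.log2(count / total) for prop, count in freq.items()}


# Hypothetical frequency distribution of property types.
freq = {"enrolled_in": 4, "taught_by": 3, "audits": 1}
ic = information_content(freq)

# With a uniform distribution every property has the same content, log2 of the
# number of property types.
uniform = information_content({"p1": 2, "p2": 2, "p3": 2, "p4": 2})
```

The rare property `audits` gets the highest value, matching the slide's point that high probability means low information content.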

Page 17:

Information Content of a Property Sequence – global perspective
• The information content of a sequence of properties p1, p2, p3, …, pk is
  – max(ISP(pi)), 1 ≤ i ≤ k

[Figure: a path with properties p1 (Prob = high), p2 (Prob = low) and p3 (Prob = high); p2 is marked as the weak point, and the information content is dependent on p2]
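The global view of a property sequence can be sketched as a maximum over per-property values. The probabilities below are invented to mirror the figure (high, low, high); the function name is hypothetical.

```python
import math
from typing import Dict, List


def sequence_information_content(props: List[str], prob: Dict[str, float]) -> float:
    """Global information content of a sequence: the largest -log2(probability)."""
    return max(-math.log2(prob[p]) for p in props)


# Hypothetical probabilities: p2 is the rare property on the path.
prob = {"p1": 0.5, "p2": 0.125, "p3": 0.375}
sequence = ["p1", "p2", "p3"]

value = sequence_information_content(sequence, prob)
# The property that determines the value is the least probable one.
decisive = min(sequence, key=lambda p: prob[p])
```

Because the maximum is taken, one rare property is enough to give the whole sequence a high information content.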

Page 18:

Information Content – Local Perspective

• Global high information content but local low information content
• Given (a, p1, b), what is the information content with respect to only the valid possibilities between a and b?
• For (a, p1, b), valid(p1) is P = (ROCi, ROCj), with a ∈ ROCi and b ∈ ROCj, and superproperties
• Recompute probabilities based on P (local)
  – I = min(NI(pi) + average of other NI

Page 19:

Total Information Content

Total information content =

Information content from global perspective +

Information content from local perspective

Page 20:

S-Match
Relevance Specification as keywords

Page 21:

Keywords: published_in, located_in

Page 22:

S-Match

• Uses the “best semantic match” paradigm
• For a keyword ki and a property pj on a path:
  – $SemMatch(k_i, p_j) = (2^d)^{-1}$, with $0 < (2^d)^{-1} \le 1$, where
    • d is the minimum distance between the properties in a property hierarchy
• For a path ps, its S-Match value is:
  – the sum of the max(SemMatch(ki, pj))
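A rough sketch of S-Match follows. Several choices here are assumptions rather than statements from the slides: keywords are taken to be property names, distance is measured through the nearest common ancestor, unrelated properties score 0, and the sum runs over keywords with the maximum taken over the properties on the path. The hierarchy itself is hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical property hierarchy: property -> parent property (None for a root).
Hierarchy = Dict[str, Optional[str]]


def ancestors(prop: str, parent: Hierarchy) -> List[str]:
    """Return prop followed by its chain of ancestors up to the root."""
    chain: List[str] = []
    node: Optional[str] = prop
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain


def hierarchy_distance(a: str, b: str, parent: Hierarchy) -> Optional[int]:
    """Edges between a and b via their nearest common ancestor, or None."""
    up_a, up_b = ancestors(a, parent), ancestors(b, parent)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    return None


def sem_match(keyword: str, prop: str, parent: Hierarchy) -> float:
    d = hierarchy_distance(keyword, prop, parent)
    return 0.0 if d is None else 1.0 / (2 ** d)  # assumed 0 when unrelated


def s_match(keywords: List[str], path_props: List[str], parent: Hierarchy) -> float:
    """Sum over keywords of the best match among the path's properties."""
    return sum(max(sem_match(k, p, parent) for p in path_props) for k in keywords)


parent: Hierarchy = {
    "associated_with": None,
    "published_in": "associated_with",
    "located_in": "associated_with",
    "printed_in": "published_in",
}
score = s_match(["published_in", "located_in"], ["printed_in", "located_in"], parent)
```

Here `located_in` matches exactly (distance 0, value 1) and `published_in` is one step from `printed_in` (value 0.5), so the path scores 1.5.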

Page 23:

Putting it all together …….

Page 24:

SemRank

• For a search mode µ and a path ps:
• Modulated information gain for ps, Iµ(ps)
  – $I_\mu(ps) = (1-\mu)(I(ps))^{-1} + \mu I(ps)$
• Modulated Refraction Count RCµ(ps)
  – $RC_\mu(ps) = \mu \cdot RC(ps)$
• $SEMRANK(ps) = I_\mu(ps) \times (1 + RC_\mu(ps)) \times (1 + \text{S-Match}(ps))$
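A small sketch of how the three factors combine, assuming the search mode µ (a value between 0 and 1) enters the formulas as (1 - µ) / I + µ · I for information gain and µ · RC for refraction count. That placement of µ is a reading of the slide's formulas, and the input values are invented.

```python
def modulated_information_gain(info: float, mu: float) -> float:
    """(1 - mu) * info^-1 + mu * info; info must be positive."""
    return (1.0 - mu) / info + mu * info


def modulated_refraction_count(rc: int, mu: float) -> float:
    """Refraction count scaled by the search mode."""
    return mu * rc


def semrank(info: float, rc: int, s_match: float, mu: float) -> float:
    """Product of modulated gain, (1 + modulated refraction), (1 + S-Match)."""
    return (modulated_information_gain(info, mu)
            * (1.0 + modulated_refraction_count(rc, mu))
            * (1.0 + s_match))


# Hypothetical path: information content 2.0, one refraction, S-Match 1.5.
score_mu_0 = semrank(info=2.0, rc=1, s_match=1.5, mu=0.0)
score_mu_1 = semrank(info=2.0, rc=1, s_match=1.5, mu=1.0)
```

Under this reading, µ = 0 rewards low information content and ignores refraction, while µ = 1 rewards high information content and refraction; S-Match raises the score in both modes.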

Page 25:

Computing SemRank in SSARK

Page 26:

The SSARK system

[Figure: SSARK architecture. Components shown: RDF Documents, Loader, Preprocessor, Storage Manager, LtStore, UtStore, Index Manager with FDIX, PHIX and ROIX, Query Processor, LAC (Look Ahead Cache), Ranking Engine producing pipelined top-k results, RC (Result Cache), and a User SubSystem with a Query & Result Interface (query of the form x ?? ?? ?? y). Phases shown: Preprocessing phase, Query Processing phase, Ranking phase.]

Page 27:

Approach

[Figure: example graph with nodes 1 to 5 and edges a to g; the Query Processor builds a tree whose leaves are edges of the graph paired with numeric values]

• Query Processor
• Ranking engine
  – Assigns SemRank* values to leaves of the tree, i.e. edges on the path

* without refraction count

Page 28:

The Index Subsystem
• FDIX – Frequency Distribution IndeX
  – Stores the frequency distribution of properties
• ROIX – Representative Ontology IndeX
  – Maps classes to Representative Ontology Classes
  – Stores the semantic summary graph
• PHIX – Property Hierarchy IndeX
  – Uses the Dewey Decimal labeling scheme to encode the hierarchical relationships in a property hierarchy
  – Used for computing S-Match (match between keywords and properties in a path)
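To show why Dewey-style labels are convenient for a property hierarchy, here is a small sketch of a distance computation on such labels. The label strings and function names are hypothetical, and this is not the PHIX implementation; it assumes both labels belong to the same tree, so they share at least the root component.

```python
from typing import List


def parse_dewey(label: str) -> List[int]:
    """Split a label such as '1.2.1' into its integer components."""
    return [int(part) for part in label.split(".")]


def dewey_distance(label_a: str, label_b: str) -> int:
    """Number of hierarchy edges between two nodes of the same tree.

    The longest common prefix identifies the nearest common ancestor, so the
    distance is the number of components each label has beyond that prefix.
    """
    a, b = parse_dewey(label_a), parse_dewey(label_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


parent_child = dewey_distance("1.2", "1.2.1")
cousins = dewey_distance("1.2.1", "1.3")
same = dewey_distance("1.2", "1.2")
```

Since the labels alone carry the ancestry, the distance needed for a keyword-to-property match can be computed without traversing the hierarchy.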

Page 29:

Top-K Evaluation

[Figure: step-by-step top-k evaluation over a tree whose leaves carry (edge, value) pairs: a, 3; b, 2; c, 4; d, 1; e, 2; f, 5; g, 3; h, 1; i, 6. Intermediate entries combine leaf labels with summed values, e.g. ab, 5; gh, 4; ce, 6; cf, 9; g.i, 9.]

Final Top_k:
1. g.i, 18
2. c.f, 9
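One way to picture a single combination step from the figure is a join of two candidate lists that keeps only the k best sums. This is a hedged sketch, not the SSARK top-k algorithm: the function `top_k_join`, the exhaustive pairing, and the use of a plain sum are assumptions, and only the leaf values come from the figure.

```python
import heapq
from itertools import product
from typing import List, Tuple

Scored = Tuple[str, int]  # (label of a partial path, value)


def top_k_join(left: List[Scored], right: List[Scored], k: int) -> List[Scored]:
    """Concatenate every left/right pair, add the values, keep the k largest."""
    combined = (
        (l_label + r_label, l_value + r_value)
        for (l_label, l_value), (r_label, r_value) in product(left, right)
    )
    return heapq.nlargest(k, combined, key=lambda item: item[1])


# Leaf values taken from the figure: g, 3; h, 1; i, 6; a, 3; b, 2.
best_g = top_k_join([("g", 3)], [("h", 1), ("i", 6)], k=2)
best_ab = top_k_join([("a", 3)], [("b", 2)], k=1)
```

A pipelined implementation would avoid forming every pair; the exhaustive product is used here only to keep the example short.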

Page 30:

Evaluation Issues
• Data set needs
  – Entities described with a variety of relationships
  – Richly connected hierarchies
  – Realistic frequency distributions
• Synthetically generated realistic small data set using human defined rules
  – e.g. |(p = “audits”)| ≤ 0.1 |(p = “enrolls”)|

Page 31:

µ = 0

Page 32:

µ = 1

Page 33:

Related Work

• Semantic searching and ranking of entities on the Semantic Web
  – Rocha et al WWW2004, Guha et al WWW2003, Stojanovic et al ISWC 2003, Zhuge et al WWW2003
• Semantic ranking of relationships
  – Halaschek VLDB demo 2004, Aleman-Meza et al SWDB03

Page 34:

Future Work
• Comprehensive evaluation
• Including some measures for importance of nodes in the paths
• Revise the Modulation function
• Optimizing Top-K evaluation
  – Decreasing height of tree
  – Estimation techniques for a closer approximation to SemRank ordering

Page 35:

Data, demos, more publications at SemDis project web site (Google: semdis)

Thank You