entityrank: searching entities directly and holistically - tao cheng, xifeng yan, kevin chen-chuan...

EntityRank: Searching Entities Directly and Holistically - Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang, CS Department, UIUC. Presented By: Md. Abdus Salam, PhD Student, CSE Department, UTA


Post on 18-Dec-2015


TRANSCRIPT

Page 1: EntityRank: Searching Entities Directly and Holistically - Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang CS Department, UIUC Presented By: Md. Abdus Salam

EntityRank: Searching Entities Directly and Holistically

- Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang

CS Department, UIUC

Presented By: Md. Abdus Salam, PhD Student, CSE Department, UTA

Page 2:

Motivating Scenario

Customer service phone number of Amazon?

Page 3:

Search on Amazon?

Page 4:

Search on Google?

Page 5:

Many, Many Similar Cases

The email of Luis Gravano?
What profs are doing databases at UIUC?
The papers and presentations of ICDE 2007?
Due date of SIGMOD 2008?
Sale price of “Canon PowerShot A400”?
“Hamlet” books available at bookstores?

Often we are looking for data entities (e.g. emails, dates, prices), not pages.

Page 6:

What you search is not what you want.

Page 7:

From pages to entities

Traditional Search: Keywords -> Results

Entity Search: Entities -> Results, Support

Page 8:

Concretely, what is meant by Entity Search?

Page 9:

Entity Search Problem

Given:

Input: Keywords & Entities (optionally with a pattern)

E.g. Amazon Customer Service #phone

Output: Ranked Entity Tuples

(Figure: ranked result tuples with scores 0.90, 0.80, 0.60, ...)
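To make the input/output contract above concrete, here is a minimal Python sketch. It assumes that a `#`-prefixed token names an entity type and every other token is a plain keyword; the names `EntityQuery`, `parse_query`, and `rank_tuples` are hypothetical, and the optional pattern mentioned on the slide is not modeled.

```python
from typing import Dict, List, NamedTuple, Tuple


class EntityQuery(NamedTuple):
    keywords: List[str]       # plain context keywords, e.g. "Amazon"
    entity_types: List[str]   # entity types requested with '#', e.g. "phone"


def parse_query(text: str) -> EntityQuery:
    """Split a query string into keywords and '#'-prefixed entity types."""
    keywords, entity_types = [], []
    for token in text.split():
        if token.startswith("#") and len(token) > 1:
            entity_types.append(token[1:].lower())
        else:
            keywords.append(token)
    return EntityQuery(keywords, entity_types)


def rank_tuples(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order candidate entity values by descending score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


query = parse_query("Amazon Customer Service #phone")
print(query.keywords, query.entity_types)
```

How the scores passed to `rank_tuples` are computed is the subject of the ranking slides that follow; this sketch only fixes the shape of the input and the output.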

Page 10:

Challenge: How to rank entities?

Page 11:

Characteristic I: Contextual

- Utilize entities’ surrounding context

Content

Context

Page 12:

Characteristic II: Uncertain

- Extractions are not “perfect”

Page 13:

Characteristic III: Holistic

- Many pieces of evidence from multiple sources

Page 14:

Characteristic IV: Discriminative

- Web pages are of varying quality

Page 15:

Characteristic V: Associative

- Tell true associations from accidental ones

Example: Finding Prof. Luis Gravano’s Email

Observation: [email protected] appears very frequently with keywords “Luis”, “Gravano”

However, such association is only accidental as [email protected] appears on many pages.

Page 16:

EntityRank: The Impression Model

Tireless Observer

Access Layer: Global Aggregation

Recognition Layer: Local Assessment

Validation Layer: Hypothesis Testing

(Figure: ranked output tuples with scores 0.90, 0.80, 0.60, ...)

Page 17:

Recognition Layer: Local Assessment

Contextual
Uncertain
Holistic
Discriminative
Associative

Input: L1, L2: d

Output:

$p(q(\text{(800) 201-7575}) \mid d)$

$p(q(\text{408-376-7400}) \mid d)$

$p(q(\text{9.02.2006}) \mid d)$
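The slide names the output p(q(t) | d) but not the formula behind it. The Python sketch below is one possible reading under stated assumptions, not the authors' definition: each entity occurrence carries an extraction confidence (the "uncertain" part), and a context factor decays linearly with the distance to the farthest of the nearest keyword hits (the "contextual" part). The function name `local_scores`, the `window` parameter, and the linear decay are all hypothetical, and the token positions are illustrative.

```python
def local_scores(keyword_positions, entity_occurrences, window=20):
    """Return {entity value: local score} for a single document.

    keyword_positions:  {keyword: [token positions in this document]}
    entity_occurrences: [(token position, entity value, extraction confidence)]
    window:             assumed distance at which context support drops to 0
    """
    scores = {}
    for pos, value, conf in entity_occurrences:
        # Distance from this occurrence to the nearest hit of each keyword.
        nearest = [
            min(abs(pos - p) for p in hits)
            for hits in keyword_positions.values()
            if hits
        ]
        if len(nearest) < len(keyword_positions):
            context = 0.0  # at least one keyword is absent from the document
        else:
            span = max(nearest, default=0)
            context = max(0.0, 1.0 - span / window)
        # Keep the best-supported occurrence of each distinct value.
        scores[value] = max(scores.get(value, 0.0), conf * context)
    return scores


# Illustrative positions for one document.
doc = {"Amazon": [3], "Customer": [8, 24], "Service": [9]}
phones = [(13, "800-202-7575", 1.0), (78, "800-322-9266", 1.0)]
print(local_scores(doc, phones))
```

In this toy document the number close to all three keywords gets a positive score, while the one far from "Amazon" gets zero, which is the behaviour the "contextual" characteristic asks for.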

Page 18:

Access Layer: Global Aggregation

Contextual
Uncertain
Holistic
Discriminative
Associative

Holistic, Discriminative

Input:

$d_1$: $p(q(\text{800-201-7575}) \mid d_1)$

$d_2$: $p(q(\text{800-201-7575}) \mid d_2)$

$d_3$: $p(q(\text{800-201-7575}) \mid d_3)$

Output:

$p_o = \sum_{d} p(d)\, p(q(\text{800-201-7575}) \mid d)$
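The aggregation on this slide is a weighted sum of per-document scores. A minimal Python sketch follows, assuming p(d) is some page-quality weight normalized over the documents at hand; the weights and local scores below are made up, and the name `aggregate` is hypothetical.

```python
def aggregate(local, doc_weights):
    """Compute p_o(value) = sum over d of p(d) * p(q(value) | d).

    local:       {doc id: {entity value: local score p(q(value) | d)}}
    doc_weights: {doc id: unnormalized quality weight}, assumed to stand in for p(d)
    """
    total = sum(doc_weights.values())
    if total <= 0:
        raise ValueError("document weights must sum to a positive number")
    p_o = {}
    for doc, scores in local.items():
        p_d = doc_weights.get(doc, 0.0) / total  # normalize so the p(d) sum to 1
        for value, p_local in scores.items():
            p_o[value] = p_o.get(value, 0.0) + p_d * p_local
    return p_o


local = {
    "d1": {"800-201-7575": 0.5},
    "d2": {"800-201-7575": 1.0, "800-322-9266": 0.5},
}
weights = {"d1": 1.0, "d2": 3.0}  # d2 is treated as the more reputable page
print(aggregate(local, weights))
```

Summing over documents covers the "holistic" characteristic, and weighting by p(d) covers the "discriminative" one: a hit on a heavily weighted page counts more than the same hit on a lightly weighted page.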

Page 19:

Validation Layer: Hypothesis Testing

Contextual
Uncertain
Holistic
Discriminative
Associative

Input:

Collection E over D: $p_o$

Virtual Collection E' over D' (randomize): $p_r$

Output:

$\mathrm{Score}(q(t)) = 2\left(p_o \log\frac{p_o}{p_r} + (1 - p_o)\log\frac{1 - p_o}{1 - p_r}\right)$
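The validation score compares the observed probability p_o with the probability p_r obtained from the randomized virtual collection. A small Python sketch of that formula follows; the function name `validation_score` is hypothetical, the natural logarithm is an assumption (the slide does not state the base), and the sample probabilities are made up.

```python
import math


def validation_score(p_o: float, p_r: float) -> float:
    """Score = 2 * (p_o*log(p_o/p_r) + (1-p_o)*log((1-p_o)/(1-p_r)))."""
    if not 0.0 <= p_o <= 1.0:
        raise ValueError("p_o must lie in [0, 1]")
    if not 0.0 < p_r < 1.0:
        raise ValueError("p_r must lie strictly between 0 and 1")

    def term(a: float, b: float) -> float:
        # By convention 0 * log(0 / b) is taken as 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)

    return 2.0 * (term(p_o, p_r) + term(1.0 - p_o, 1.0 - p_r))


# An association seen far more often than chance predicts scores high ...
print(validation_score(0.9, 0.1))
# ... while one seen exactly as often as chance predicts scores zero.
print(validation_score(0.3, 0.3))
```

This is how an accidental association is told apart from a true one: an entity that co-occurs with the keywords only about as often as the randomized collection predicts has p_o close to p_r and therefore a score near zero.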

Page 20:

EntityRank: The Scoring Function

Local Recognition, Global Aggregation, Validation

Page 21:

Query Processing

Keyword posting lists for Amazon, Customer, Service (Doc: Posting), one list per line:

d1: 8, 25; d3: 5; d6: 10; d7: 3; d9: 7, 33
d3: 11; d5: 66; d7: 8, 24
d3: 12; d7: 9; d8: 44

Entity posting list for #phone (Doc: Posting):

d2: (42, 851-0400, 0.8)
d3: (18, 800-202-7575, 1.0)
d7: (13, 800-202-7575, 1.0), (78, 800-322-9266, 1.0)

Sort-merge Join:

800-202-7575: p1
800-202-7575: p2
800-322-9266: p3

Aggregation:

800-202-7575: p4
800-322-9266: p5

Hypothesis Test -> Result
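The sort-merge join step can be sketched in Python as a multi-way intersection over posting lists sorted by document id. This is a generic sketch rather than the authors' implementation: the function name `merge_join` is hypothetical, document ids are written as integers for simplicity, and the assignment of each list to a particular keyword is an assumption made for illustration.

```python
def merge_join(posting_lists):
    """Return the doc ids that occur in every posting list.

    Each posting list is a list of (doc_id, payload) pairs sorted by doc_id.
    """
    if not posting_lists:
        return []
    cursors = [0] * len(posting_lists)
    matches = []
    while all(c < len(pl) for c, pl in zip(cursors, posting_lists)):
        current = [pl[c][0] for c, pl in zip(cursors, posting_lists)]
        target = max(current)
        if all(doc == target for doc in current):
            matches.append(target)            # every list contains this doc
            cursors = [c + 1 for c in cursors]
        else:
            # Advance only the cursors that are behind the largest doc id.
            cursors = [c + (doc < target) for c, doc in zip(cursors, current)]
    return matches


amazon = [(1, [8, 25]), (3, [5]), (6, [10]), (7, [3]), (9, [7, 33])]
customer = [(3, [11]), (5, [66]), (7, [8, 24])]
service = [(3, [12]), (7, [9]), (8, [44])]
phone = [
    (2, [(42, "851-0400", 0.8)]),
    (3, [(18, "800-202-7575", 1.0)]),
    (7, [(13, "800-202-7575", 1.0), (78, "800-322-9266", 1.0)]),
]
print(merge_join([amazon, customer, service, phone]))
```

Only the documents that survive the join need a local score; those per-document scores are then aggregated per entity value and passed to the hypothesis test.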

Page 22:

Experiment Setup

Corpus: General crawl of the Web (Aug 2006), around 2TB with 93M pages.

Entities: Phone (8.8M distinct instances), Email (4.6M distinct instances)

System: A cluster of 34 machines

Page 23:

Comparing EntityRank to the Following Different Approaches

Characteristics (columns): Contextual, Uncertain, Holistic, Discriminative, Associative

Approaches (rows): Naïve, Local, Global, Combine, Without, EntityRank

Page 24:

Online Demo.

Page 25:

Example Query Results

Page 26:

Conclusions

Formulate the entity search problem

Study and define the characteristics of entity search

Conceptual Impression Model and concrete EntityRank framework for ranking entities

An online prototype with a real Web corpus

Page 27:

Thank You!

Questions?

Page 28:

Reference

T. Cheng, X. Yan, and K. C.-C. Chang. EntityRank: Searching Entities Directly and Holistically. In Proceedings of the 33rd Very Large Data Bases Conference (VLDB 2007), pages 387-398, Vienna, Austria, September 2007.

http://www-forward.cs.uiuc.edu/talks/2007/entityrank-vldb07-cyc-sep07.ppt