re-contextualization and contextual entity exploration · supporting interpretations of forgotten...

30
1 Re-contextualization and contextual Entity exploration Sebastian Holzki June 7, 2016 Sebastian Holzki 1

Upload: others

Post on 17-Jun-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

1

Re-contextualization and

contextual Entity exploration

Sebastian Holzki

June 7, 2016 Sebastian Holzki 1

Page 2: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

- PAPER PRESENTATION -

LEVERAGING KNOWLEDGE BASES FOR CONTEXTUAL

ENTITY EXPLORATION

Authors: Joonseok Lee, Ariel Fuxman, Bo Zhao, and Yuanhua Lv

June 7, 2016 Sebastian Holzki 2

Page 3: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Information needs

• Primary

June 7, 2016 Sebastian Holzki 3

Page 4: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Contextual entity exploration problem

June 7, 2016 Sebastian Holzki

User selection Context

Knowledge Base Knowledge Graph

4

Text to entity

mapping

Page 5: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Using knowledge bases effectively

• Build a focused subgraph of the knowledge graph

• User selection

• Context around user selection

• Explore the graph for other relevant entities

June 7, 2016 Sebastian Holzki 5

Figure: Lee, Fuxman, Bo and Yuanhua [1], (adapted)

Context: blue

User-selection: gray

Page 6: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Creating a reasonable ranking

June 7, 2016 Sebastian Holzki

Context

Focused subgraph

6

Text to entity

mapping

Deliver a limited

number of entities,

ordered by relevance

1 …

2 …

3 …

4 …

Page 7: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

How to measure relevance?

• Ability to connect user-selected entity and context entities

• Context-Selection Betweenness

• Overall relevance to user-selected entity

• Personalized Random Walk

June 7, 2016 Sebastian Holzki 7

Figure: Lee, Fuxman, Bo and Yuanhua [1], (adapted)

Page 8: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Context-Selection betweenness (background)

Employing „Normalized Wikipedia Distance“ *

Iu and Iv: Neighbours of u and v

• Measures the semantic distance of Wikipedia pages (u and v)

• More common neighbours → less distant

• Normalized by corpus size

June 7, 2016 Sebastian Holzki 8

* Already introduced within this course to measure entity coherence, namely: mw-coh.

Page 9: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Context-Selection betweenness

June 7, 2016 Sebastian Holzki 9

• θ = 0.5: Emphasizes more relevant contexts

• Weight relative to path-length l and number of all shortest paths k

• Normalized by weight of all shortest paths Z (Not only those including node v)

Result: Nodes v connecting s and c through short paths have

higher Context-Selection Betweenness

Page 10: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Personalized Random Walk

Motivation:

Consider the overall relevance of an entity

• „Personalized“ means allowing arbitrary jumps:

• Probability for user selected entity: xs

• Probabilities for context entities: xc/|C|

• Result: emphasize on entities near user selection or context

Reminder:

„Random Walk“ assigns stationary probabilities to nodes in a graph,

by randomly „walking“ through the graph.

June 7, 2016 Sebastian Holzki 10

Page 11: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Score Aggregation

1. Large graphs produce low probabilities

2. Context-Selection Betweenness is biased by number of context

entities

3. Trust in Context-Selection Betweenness is affected by graph-size

and number of context entities

June 7, 2016 Sebastian Holzki 11

1

2

3

Page 12: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Implementation details

• Wikipedia (2014) as knowledge base

• Entity linking through Wikipedia Hyperlinks

• Personalized Random Walk Parameters: xs = 0.05; xc = 0

• Aggregated score used as ranking

• User selection generated by crowd workers

• Evaluation of ranking also done by crowd workers

June 7, 2016 Sebastian Holzki 12

Page 13: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Results: comparison to baselines

June 7, 2016 Sebastian Holzki 13

Source: Lee, Fuxman, Bo and Yuanhua [1]

Page 14: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Results: scores and aggregation

June 7, 2016 Sebastian Holzki 14

Source: Lee, Fuxman, Bo and Yuanhua [1]

Page 15: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Summary

• Switching between primary and secondary sources

• Contextual entity exploration

• Relevance in knowledge graphs

• Context-Selection Betweenness + Personalized Random Walk

• Comparison to baseline

Future work (Source: [1])

• Different contextualization systems have low degree of overlap

• Idea: combine them and re-rank

June 7, 2016 Sebastian Holzki 15

Page 16: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

[Backup] Normalized Wikipedia Distance Example

June 7, 2016 Sebastian Holzki 16

| Iu | = 4

| Iv | = 3

| Iu ∩ Iv| = 2

|V| = 9

NWD = (log(4) - log(2)) / (log(9) - log(3)) ≈ 0.631

Page 17: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

[Backup] Results: influence of entity linking

June 7, 2016 Sebastian Holzki 17

Source: Lee, Fuxman, Bo and Yuanhua [1]

Page 18: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

- PAPER PRESENTATION -

BACK TO THE PAST: SUPPORTING INTERPRETATIONS

OF FORGOTTEN STORIES BY TIME-AWARE RE-

CONTEXTUALIZATION

Authors: Nam Khanh Tran, Andrea Ceroni, Nattiya Kanhabua and Claudia Niederée

June 7, 2016 Sebastian Holzki 18

Page 19: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Context of documents from the past

June 7, 2016 Sebastian Holzki 19

Source: Course slides of Nam Khanh Tran [3]

Information item Time-aware context

Page 20: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Four goals of contextualizing „the past“

1. Relevance: Context c has to be relevant to information item d

2. Complementarity: Avoid displaying duplicate information

3. Time-awareness: Consider creation-time of d

4. Conciseness: Show brief context, not whole pages

June 7, 2016 Sebastian Holzki 20

Page 21: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Prerequisites

• Document with contextualization hooks

• Contextualization hook:

„… parts of document, that require contextualization.“ [2]

Entities, concept mentions, …

• Knowledge base

• Facility to provide proper context for hooks (Proper: with regard to the four goals)

June 7, 2016 Sebastian Holzki 21

Page 22: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Pre processed

knowledge base

Main concept of the paper

June 7, 2016 Sebastian Holzki 22

Document Hook

identification

Query

formulation

Index of

contextualization units Context

retrieval

Context

ranking

Ranked context

1 …

2 …

3 …

Source: Course slides of Nam Khanh Tran [3] and original paper [2]

Page 23: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Query formulation

1. Document-based queries (title, lead paragraph)

2. Hook-based queries (all hooks, each hook on its own)

3. Learning to select promising queries out of power-set of hooks

• Trained on Recall@k

• Some prediction-features:

• Linguistic: entities, document length, number of nouns, …

• Temporal scope (±2 years), temporal similarity, temporal document frequency

• …

June 7, 2016 Sebastian Holzki 23

Query Performance Prediction (QPP)

Page 24: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Context retrieval

• Query-likelihood language model:

• Rank contextualization units c according to the probability P(c|q)

=> How likely is it, that c generates the given query q at random

• Assuming independent query terms

• Serves for relevance goal

June 7, 2016 Sebastian Holzki 24

Page 25: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Learning to re-rank context

• Contextualization units initially ranked by retrieval model

• Goal of reranking: Achieve high precision-measure

• Also consider a variety of features

• Trained by examples: [proper context => document]

• Incorporated features:

• Topic diversity

• Entity difference, text difference (complementarity)

• Temporal similarity of context and document

June 7, 2016 Sebastian Holzki 25

Page 26: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Evaluation: query formulation methods

June 7, 2016 Sebastian Holzki 26

Highlighted: variations of machine-learned

approach (Query Performance Prediction)

Page 27: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Evaluation: re-ranking context

RF: learning to rank approach using RankForest

LM: standard query-likelihood language model

LM-T: modification of LM including time-awareness

• Same query formulation method QPP@100

• Learning to rank performs significantly better

• Complementarity plays an important role

June 7, 2016 Sebastian Holzki 27

Baselines {

Page 28: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Summary

• Contextualization need for documents from the past

• Usual entity contextualization falls short

• Time-awareness and complementarity improve context quality

• Query formulation optimized for recall

• Context retrieval and ranking optimized for precision

• Results show significant improvements

June 7, 2016 Sebastian Holzki 28

Page 29: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Topic: Contextualization – overall summary

• Focus on user-selection/hooks

• Well structured and large knowledge bases are mandatory

• Some goals require more effort (time-awareness)

• Automated contextualization makes life a lot easier

Thanks for your attention!

June 7, 2016 Sebastian Holzki 29

Page 30: Re-contextualization and contextual Entity exploration · Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International

Sources

[1] Lee, Joonseok and Fuxman, Ariel and Zhao, Bo and Lv, Yuanhua. Leveraging Knowledge Bases for

Contextual Entity Exploration. Proceedings of the 21th ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining, 2015.

[2] Nam Khanh Tran, Andrea Ceroni, Nattiya Kanhabua and Claudia Niederée. Back to the Past:

Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of

the Eighth International Conference on Web Search and Data Mining (WSDM'15), Shanghai, China,

February, 2015

[3] Nam Khanh Tran. Contextualization (Course slides). Lecture “Web Science” at Leibniz Universität

Hannover, May 3, 2016.

June 7, 2016 Sebastian Holzki 30