Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Post on 24-Dec-2015


Page 1: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Metro Maps of

Dafna Shahaf

Carlos Guestrin

Eric Horvitz

Page 2: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

"The abundance of books is a distraction"

Lucius Annaeus Seneca, 4 BC – 65 AD

Page 3: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

… and it does not get any better

• 129,864,880 Books (Google estimate)

• Research:
– PubMed: 19 million papers (One paper added per minute!)
– Scopus: 40 million papers

Page 4: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Papers

Innovative Papers

Page 5: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

So, you want to understand a research topic…

Now what?

Page 6: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Search Engines are Great

• But do not show how it all fits together

Page 7: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Timeline Systems

Page 8: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Research is not Linear

Page 9: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Metro Map

• A map is a set of lines of articles
• Each line follows a coherent narrative thread
• Temporal Dynamics + Structure

[Figure: example metro map; labels: austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

Page 10: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Map Definition

• A map M is a pair (G, P) where
– G = (V, E) is a directed graph
– P is a set of paths in G (metro lines)
– Each e ∈ E must belong to at least one metro line

[Figure: example metro map; labels: austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
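The map definition on this slide can be checked mechanically. The following is a minimal sketch, assuming vertices are article labels, edges are ordered pairs, and each metro line is a sequence of vertices; the function names and the toy data are hypothetical and are not taken from the talk.

```python
def path_edges(path):
    """Return the set of consecutive (u, v) pairs along a path."""
    return set(zip(path, path[1:]))


def is_valid_map(vertices, edges, lines):
    """Check M = (G, P): each line is a path in G, and every edge of G
    belongs to at least one line."""
    vertices = set(vertices)
    edges = set(edges)
    used = set()
    for line in lines:
        # Every stop on the line must be a vertex of G.
        if any(v not in vertices for v in line):
            return False
        line_edges = path_edges(line)
        # Every hop on the line must be an edge of G.
        if not line_edges <= edges:
            return False
        used |= line_edges
    # No edge may be left outside all metro lines.
    return used == edges


# Toy data (hypothetical): two lines sharing the stop "austerity".
V = {"bailout", "austerity", "protests", "strike", "junk status"}
E = {("bailout", "austerity"), ("austerity", "protests"),
     ("protests", "strike"), ("junk status", "austerity")}
P = [["bailout", "austerity", "protests", "strike"],
     ["junk status", "austerity"]]
print(is_valid_map(V, E, P))
```

Adding an edge to E that no line uses makes the check fail, which is exactly the third condition of the definition.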

Page 11: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Game Plan

Objective | Algorithm | Does it work?

Page 12: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Properties of a Good Map

1. Coherence

???

Page 13: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Coherence: Main Idea

Connecting the Dots [S, Guestrin, KDD’10]

Coherence is not a property of local interactions:

[Figure: chain of articles 1 2 3 4 5; word labels: Greece, Europe, Italy, Republican, Protest, Debt default]

Incoherent: Each pair shares different words

Page 14: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Coherence: Main Idea

Connecting the Dots [S, Guestrin, KDD’10]

A more-coherent chain:

[Figure: chain of articles 1 2 3 4 5; word labels: Greece, Austerity, Italy, Republican, Protest, Debt default]

Coherent: a small number of words captures the story
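The contrast between the two chains can be illustrated with a toy score. This is a simplified sketch, not the formulation from the cited KDD’10 paper: articles are assumed to be plain sets of words, a chain is assumed to have at least two articles, and `global_score` brute-forces a small budget `k` of words that must explain every transition. All names and data are hypothetical.

```python
from itertools import combinations


def pairwise_score(chain):
    """Local view: the weakest link, counting any words shared by neighbors."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def global_score(chain, k):
    """Global view: choose k words for the whole chain, then score the
    weakest link using only those chosen words."""
    vocab = sorted(set().union(*chain))
    best = 0
    for words in combinations(vocab, min(k, len(vocab))):
        active = set(words)
        weakest = min(len(a & b & active) for a, b in zip(chain, chain[1:]))
        best = max(best, weakest)
    return best


# Each neighboring pair shares a different word.
incoherent = [{"greece", "europe"}, {"europe", "italy"},
              {"italy", "republican"}, {"republican", "protest"},
              {"protest", "debt"}]
# Two words run through the whole chain.
coherent = [{"greece", "austerity", "debt"},
            {"greece", "austerity", "protest"},
            {"greece", "austerity", "italy"},
            {"greece", "austerity", "default"},
            {"greece", "austerity", "europe"}]

print(pairwise_score(incoherent), global_score(incoherent, 2))
print(pairwise_score(coherent), global_score(coherent, 2))
```

The first chain looks acceptable link by link, but no small word set covers all of its transitions, which is the sense in which coherence is not a property of local interactions.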

Page 15: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Words are too Simple

[Figure: chain of three papers, 1 2 3; labels: Probability, Network, Cost, Sensor networks, Bayesian networks, Social networks]

Page 16: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Using the Citation Graph

• Create a graph per word
– All papers mentioning the word
– Edge weight = strength of influence [El-Arini, Guestrin KDD ’11]

[Figure: citation graph for the word "Network", papers numbered 1 to 9]

Where did paper 8 get the idea?

Do papers 8 and 9 mean the same thing?
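One possible reading of the per-word graph is sketched below. It assumes citations are given as (cited, citing) pairs with a precomputed influence weight; how that weight is estimated is the subject of the cited El-Arini and Guestrin work and is not reproduced here. The function names, the paper numbers, and the weights are hypothetical, and isolated papers are omitted because only edges are stored.

```python
from collections import defaultdict


def word_graphs(paper_words, citations):
    """Build one edge set per word.

    paper_words: {paper: set of words it mentions}
    citations:   {(cited, citing): influence weight}
    Returns {word: {(cited, citing): weight}}, keeping an edge only when
    both endpoints mention the word.
    """
    graphs = defaultdict(dict)
    for (cited, citing), weight in citations.items():
        for word in paper_words[cited] & paper_words[citing]:
            graphs[word][(cited, citing)] = weight
    return dict(graphs)


def strongest_source(graph, paper):
    """Return the cited paper with the heaviest edge into `paper`, or None."""
    incoming = {src: w for (src, dst), w in graph.items() if dst == paper}
    return max(incoming, key=incoming.get) if incoming else None


# Hypothetical toy data: "network" is used in two different senses.
paper_words = {1: {"network", "sensor"}, 5: {"network", "social"},
               8: {"network", "sensor"}, 9: {"network", "social"}}
citations = {(1, 8): 0.9, (5, 8): 0.2, (5, 9): 0.7}

graphs = word_graphs(paper_words, citations)
print(strongest_source(graphs["network"], 8))
print(strongest_source(graphs["network"], 9))
```

In this toy data papers 8 and 9 both mention "network" but trace it to different sources, which is one way the two questions on the slide could be answered.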

Page 17: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Words are too Simple

[Figure: chain of three papers, 1 2 3; labels: Probability, Network, Cost, Sensor networks, Bayesian networks, Social networks]

Incoherent

Page 18: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Properties of a Good Map

1. Coherence

Is it enough?

Page 19: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Max-coherence Map

Query: Reinforcement Learning

Page 20: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Properties of a Good Map

1. Coherence

2. Coverage

Should cover diverse topics important to the user

Page 21: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Coverage: What to Cover?

• Perhaps words?
• Not enough:

SVM in Oracle Database 10g, Milenova et al., VLDB '05

Support Vector Machines in Relational Databases, Ruping, SVM '02

[Figure: the two papers, labeled 1 and 2]

Page 22: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Similar Content

1 2

Page 23: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Different Impact

Citing Venues and Authors:

Affected more authors/venues

Very little intersection

[Figure: citing venues and authors of papers 1 and 2]

Page 24: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

What to Cover?

• Instead of words…
• Cover papers
• A paper covers papers that it had an impact on
• High-coverage map: impact on a lot of the corpus
• Why descendants?
• Soft notion: [0,1]

Page 25: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

p has High Impact on q if…

Many paths (especially short)

coherent

[Figure: papers p, q, and r; citing-text snippets: "Note that our protocol is different from previous work…" and "We use the algorithm of…"]

Formalize with coherent random walks
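The "many paths, especially short" intuition can be sketched with a plain random walk over references. This is a simplification under stated assumptions: the citation graph is acyclic, the walker starts at q, stops before each step with probability `stop`, and otherwise moves to a uniformly random cited paper. The coherence restriction mentioned on the slide is not modeled, and `impact` and its parameters are hypothetical names.

```python
from functools import lru_cache


def impact(p, q, references, stop=0.3):
    """Probability that a walk from q, following references, reaches p.

    references: {paper: tuple of papers it cites}; assumed acyclic.
    """
    @lru_cache(maxsize=None)
    def reach(node):
        if node == p:
            return 1.0
        cited = references.get(node, ())
        if not cited:
            return 0.0
        # Survive the stop check, then average over the cited papers.
        return (1.0 - stop) * sum(reach(c) for c in cited) / len(cited)

    return reach(q)


# q cites p directly and also through r, so two paths lead from q to p.
references = {"q": ("p", "r"), "r": ("p",)}
print(impact("p", "q", references, stop=0.5))
```

Longer paths pay the stop penalty more often, so short paths contribute more, and the result always lies in [0, 1], matching the soft notion of coverage.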

Page 26: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Map Coverage

• Documents cover pieces of the corpus:

[Figure: Corpus Coverage]
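One common way to combine soft per-paper coverage into a map-level score is a "noisy-or": a corpus paper is missed only if every paper on the map misses it. The sketch below assumes that form and uniform weights over the corpus; the slides do not spell out the exact formula, so treat `map_coverage` and the `cover` table as hypothetical.

```python
def map_coverage(map_papers, cover, corpus):
    """Average soft coverage of the corpus by the papers on the map.

    cover: {(p, q): value in [0, 1]} meaning how much p covers q;
           missing pairs count as 0.
    """
    total = 0.0
    for q in corpus:
        miss = 1.0
        for p in map_papers:
            miss *= 1.0 - cover.get((p, q), 0.0)
        total += 1.0 - miss
    return total / len(corpus)


cover = {("a", "x"): 0.5, ("b", "x"): 0.5, ("a", "y"): 1.0}
corpus = ["x", "y", "z"]
print(map_coverage(["a"], cover, corpus))
print(map_coverage(["a", "b"], cover, corpus))
```

Adding paper b helps less than its standalone coverage suggests because x is already half covered by a. This diminishing-returns behavior is what makes such a coverage function submodular.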

Page 27: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

High-coverage, Coherent Map

Page 28: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Properties of a Good Map

1. Coherence

2. Coverage

3. Connectivity

Page 29: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Definition: Connectivity

• Experimented with formulations
• Users do not care about connection type
• Encourage connections between pairs of lines

Page 30: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Lines with No Intersection

Solution: Reward lines that had impact on each other

[Figure: Perceptrons, SVM, Optimizing Kernels for SVM; Face Detection, SVM for Facial Recognition]

Page 31: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Tying it all Together: Map Objective

• Coherence
– Either coherent or not: Constraint
• Coverage
– Must have!
• Connectivity
– Nice to have

Consider all coherent maps with maximum possible coverage.

Find the most connected one.
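The objective reads as a lexicographic choice: coherence filters, coverage ranks, connectivity breaks ties. The sketch below applies that ordering to an explicit list of candidate maps, which is only feasible for toy inputs. The three scoring callables are assumed to be supplied by the caller, and `intersecting_pairs` is a simple stand-in that counts pairs of lines sharing a paper (it does not reward lines that influenced each other without intersecting).

```python
from itertools import combinations


def intersecting_pairs(lines):
    """Count pairs of metro lines that share at least one paper."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))


def pick_map(candidates, is_coherent, coverage, connectivity, tol=1e-9):
    """Among coherent maps with maximum coverage, return the most connected."""
    coherent = [m for m in candidates if is_coherent(m)]
    if not coherent:
        return None
    best_cov = max(coverage(m) for m in coherent)
    top = [m for m in coherent if coverage(m) >= best_cov - tol]
    return max(top, key=connectivity)


# Hypothetical candidates: each map is a tuple of lines.
m1 = (("a", "b"), ("c", "d"))   # coherent, coverage 0.8, lines disjoint
m2 = (("a", "b"), ("b", "c"))   # coherent, coverage 0.8, lines meet at "b"
m3 = (("a", "x"), ("x", "y"))   # highest coverage but treated as incoherent
scores = {m1: 0.8, m2: 0.8, m3: 0.9}

best = pick_map([m1, m2, m3], lambda m: m is not m3, scores.get,
                intersecting_pairs)
print(best)
```

The incoherent map is discarded despite its higher coverage, and connectivity only decides between the two maps that tie on coverage.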

Page 32: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Game Plan

Objective | Algorithm | Does it work?

Page 33: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Approach Overview

Documents D

1. Coherence graph G

2. Coverage function f

f( ) = ?

3. Increase Connectivity

Page 34: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Coherence Graph: Main Idea

• Vertices correspond to short coherent chains
• Directed edges between chains which can be conjoined and remain coherent

[Figure: short chains 1 2 3, 4 5 6, and 5 8 9; conjoined chain 1 2 3 5 8 9]
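The construction on this slide can be sketched as follows, assuming short chains are tuples of article ids and that a predicate deciding whether a conjoined chain is still coherent is available. The predicate used here, strictly increasing ids as a stand-in for chronological order, is only a placeholder for a real coherence test, and both function names are hypothetical.

```python
def build_coherence_graph(short_chains, is_coherent):
    """Add a directed edge a -> b whenever a followed by b stays coherent."""
    edges = {chain: [] for chain in short_chains}
    for a in short_chains:
        for b in short_chains:
            if a != b and is_coherent(a + b):
                edges[a].append(b)
    return edges


def path_to_chain(path):
    """Flatten a path of short chains into one long chain of articles."""
    chain = []
    for vertex in path:
        chain.extend(vertex)
    return tuple(chain)


def increasing(chain):
    """Placeholder coherence test: article ids must strictly increase."""
    return all(x < y for x, y in zip(chain, chain[1:]))


chains = [(1, 2, 3), (4, 5, 6), (5, 8, 9)]
graph = build_coherence_graph(chains, increasing)
print(graph[(1, 2, 3)])
print(path_to_chain([(1, 2, 3), (5, 8, 9)]))
```

Every path in the resulting graph flattens to a chain, so searching for good chains becomes a search over graph paths.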

Page 35: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Finding High-Coverage Chains

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles

[Figure: short chains 1 2 3, 4 5 6, and 5 8 9; is Cover(1 2 3 4 5 6) > Cover(1 2 3 5 8 9)?]

Page 36: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Reformulation

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles
• Submodular orienteering
– [Chekuri and Pal, 2005]
– Quasipolynomial time recursive greedy
– O(log OPT) approximation

[Callout: Orienteering; a function of the nodes visited]
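To make the problem statement concrete, here is an exhaustive search over paths, which is not the recursive greedy algorithm of Chekuri and Pal and carries none of its guarantees; it is exponential and meant only for toy graphs. It assumes "length K" means K vertices, that the graph is an adjacency dict, and that `coverage` is a set function of the vertices visited. The `covers` table is hypothetical.

```python
def best_path(graph, k, coverage):
    """Return (path, score) for the k-vertex simple path with highest coverage."""
    best, best_score = None, float("-inf")

    def extend(path):
        nonlocal best, best_score
        if len(path) == k:
            score = coverage(path)
            if score > best_score:
                best, best_score = list(path), score
            return
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    for start in graph:
        extend([start])
    return best, best_score


# Vertices are short chains; edges say which chains can be conjoined.
graph = {(1, 2, 3): [(4, 5, 6), (5, 8, 9)], (4, 5, 6): [], (5, 8, 9): []}
# Hypothetical pieces of the corpus covered by each short chain.
covers = {(1, 2, 3): {"a", "b"}, (4, 5, 6): {"b"}, (5, 8, 9): {"c", "d"}}


def coverage(path):
    """Number of distinct corpus pieces covered along the path."""
    return len(set().union(*(covers[v] for v in path)))


print(best_path(graph, 2, coverage))
```

The reward depends on the set of nodes visited rather than on a sum of per-node rewards, which is why the problem is an orienteering variant with a submodular objective.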

Page 37: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Approach Overview: Recap

Documents D

1. Coherence graph G
– Encodes all coherent chains as graph paths

2. Coverage function f
– f( ) = ?
– Submodular orienteering [Chekuri & Pal, 2005]
– Quasipoly time recursive greedy
– O(log OPT) approximation

3. Increase Connectivity

Page 38: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Example Map: Reinforcement Learning

[Figure: map for the query; one keyword set per line]

• multi-agent cooperative joint team
• mdp states pomdp transition option
• control motor robot skills arm
• bandit regret dilemma exploration arm
• q-learning bound optimal rmax mdp

Page 39: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Example Map Detail: SVM

Page 40: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Game Plan

Objective | Algorithm | Does it work?

Page 41: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

User Study

• Tricky!
– No double-blind, no within-subject
– Domain: understandable yet unfamiliar
– Reinforcement Learning (RL)

Page 42: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

User Study

• 30 participants
• First-year grad student, Reinforcement Learning project
• Update a survey paper from 1996
• Identify research directions + relevant papers
– Google Scholar
– Map and Google Scholar
– Baselines: Map, Wikipedia

Page 43: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Results (in a nutshell)

[Figure: two charts, each comparing Google and Us; axis label: Better]

Map users find better papers, and cover more important areas

Page 44: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

User Comments

Helpful

noticed directions I didn't know about

great starting point

… get a basic idea of what science is up to

why don't you draw words on edges?

Legend is confusing

hard to get an idea from paper title alone

Page 45: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Conclusions

• Formulated metrics characterizing good maps for the scientific domain
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization

Thank you!