Metro Maps of Science
Dafna Shahaf, Carlos Guestrin, Eric Horvitz
"The abundance of books is a distraction."
Lucius Annaeus Seneca (4 BC – 65 AD)
… and it does not get any better
• 129,864,880 books (Google estimate)
• Research:
  – PubMed: 19 million papers (one paper added per minute!)
  – Scopus: 40 million papers
[Figure: papers vs. innovative papers]
So, you want to understand a research topic…
Now what?
Search Engines are Great
• But they do not show how it all fits together
Timeline Systems
Research is not Linear
Metro Map
• A map is a set of lines of articles
• Each line follows a coherent narrative thread
• Temporal dynamics + structure
[Figure: example metro map with lines labeled austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]
Map Definition
• A map M is a pair (G, P), where
  – G = (V, E) is a directed graph
  – P is a set of paths in G (metro lines)
  – Each e ∈ E must belong to at least one metro line
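A minimal Python sketch of this definition, assuming papers are identified by strings and each metro line is given as a sequence of vertices; the class and method names are hypothetical. It checks that every line is a path in G and that every edge of G lies on at least one line.

```python
from dataclasses import dataclass

Edge = tuple[str, str]

@dataclass
class MetroMap:
    vertices: set[str]       # V: papers
    edges: set[Edge]         # E: directed edges (u, v)
    lines: list[list[str]]   # P: metro lines, each a vertex sequence

    def is_valid(self) -> bool:
        # Every edge must connect two vertices of G
        if any(u not in self.vertices or v not in self.vertices for u, v in self.edges):
            return False
        on_some_line: set[Edge] = set()
        for line in self.lines:
            if any(v not in self.vertices for v in line):
                return False
            steps = set(zip(line, line[1:]))
            # A metro line must be a path in G: each step is an edge of G
            if not steps <= self.edges:
                return False
            on_some_line |= steps
        # Each e in E must belong to at least one metro line
        return on_some_line == self.edges
```

Dropping the second line from a valid map leaves an edge on no line, so the map becomes invalid.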
Game Plan
Objective | Algorithm | Does it work?
Properties of a Good Map
1. Coherence
???
Coherence: Main Idea (Connecting the Dots [Shahaf, Guestrin, KDD'10])
[Figure: a chain of articles 1 2 3 4 5, with the words Greece, Europe, Italy, Republican, Protest, Debt default]
Coherence is not a property of local interactions:
Incoherent: each pair shares different words
[Figure: a chain of articles 1 2 3 4 5, with the words Greece, Austerity, Italy, Republican, Protest, Debt default]
A more-coherent chain:
Coherent: a small number of words captures the story
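A simplified proxy for this idea, not the KDD'10 definition (which scores word influence along the chain): a chain scores well only if a set of at most k words supports every transition. The helper names `local_score` and `coherence`, and the toy word sets echoing the two slides, are invented for illustration.

```python
from itertools import combinations

def local_score(chain):
    # Weakest transition when *all* shared words count (the purely local view)
    return min(len(a & b) for a, b in zip(chain, chain[1:]))

def coherence(chain, k):
    # Simplified proxy: choose at most k words W; a transition only counts
    # shared words in W; the chain is as strong as its weakest transition
    vocab = sorted(set().union(*chain))
    best = 0
    for size in range(1, k + 1):
        for words in combinations(vocab, size):
            w = set(words)
            score = min(len(a & b & w) for a, b in zip(chain, chain[1:]))
            best = max(best, score)
    return best

# Toy chains (invented word sets)
incoherent = [{"greece", "europe"}, {"europe", "italy"}, {"italy", "republican"},
              {"republican", "protest"}, {"protest", "debt default"}]
coherent = [{"greece", "debt default"}, {"greece", "austerity", "debt default"},
            {"austerity", "italy", "debt default"}, {"italy", "protest", "debt default"},
            {"protest", "debt default"}]
```

Every neighboring pair in the first chain shares a word, yet no single word supports all transitions; in the second chain one word ("debt default") carries the whole story.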
Words are too Simple
[Figure: a chain of papers 1 2 3, with the labels Probability, Network, Cost, Sensor networks, Bayesian networks, Social networks]
Using the Citation Graph
• Create a graph per word
  – Vertices: all papers mentioning the word
  – Edge weight = strength of influence [El-Arini, Guestrin, KDD'11]
[Figure: the graph for the word "Network", over papers 1–9]
• Where did paper 8 get the idea?
• Do papers 8 and 9 mean the same thing by "network"?
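A sketch of the per-word graph under simplifying assumptions: edges are unweighted citations (the slide's influence weights [El-Arini, Guestrin, KDD'11] are omitted), and `word_graph` and `ancestors` are hypothetical helpers. Following citations backwards from a paper within one word's graph gives a rough answer to "where did it get the idea?". The toy papers and citations are invented.

```python
from collections import deque

def word_graph(words_of, cites, word):
    # Keep only papers mentioning `word`, and citations between them
    nodes = {p for p, ws in words_of.items() if word in ws}
    graph = {p: [] for p in nodes}
    for citing, cited in cites:
        if citing in nodes and cited in nodes:
            graph[citing].append(cited)
    return graph

def ancestors(graph, paper):
    # Breadth-first search along citations: papers this one may build on
    seen, queue = set(), deque([paper])
    while queue:
        for cited in graph.get(queue.popleft(), []):
            if cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return seen

# Toy data (invented): paper 5 lacks the word, so it drops out of the graph
words_of = {1: {"network", "bayesian"}, 2: {"network", "sensor"},
            3: {"network", "bayesian"}, 5: {"probability"},
            8: {"network", "bayesian"}, 9: {"network", "sensor"}}
cites = [(8, 3), (3, 1), (9, 2), (8, 5)]
g = word_graph(words_of, cites, "network")
```

In this toy data, papers 8 and 9 have disjoint ancestry inside the "network" graph, which hints (but does not prove) that they use the word differently.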
[Figure: the chain 1 2 3 again (Sensor networks, Bayesian networks, Social networks), now marked Incoherent]
Properties of a Good Map
1. Coherence
Is it enough?
[Figure: max-coherence map for the query Reinforcement Learning]
Properties of a Good Map
1. Coherence
2. Coverage
Should cover diverse topics important to the user
Coverage: What to Cover?
• Perhaps words?
• Not enough:
  1. SVM in Oracle Database 10g (Milenova et al., VLDB '05)
  2. Support Vector Machines in Relational Databases (Ruping, SVM '02)
  Similar content, different impact
  [Figure: citing venues and authors of papers 1 and 2: one affected more authors/venues; very little intersection]
What to Cover?
• Instead of words…
• Cover papers
• A paper covers the papers it had an impact on
• High-coverage map: impact on a lot of the corpus
• Why descendants?
  [Figure callouts: "Note that our protocol is different from previous work…" / "We use the algorithm of…"]
• Soft notion: [0,1]
p has high impact on q if…
  – Many paths (especially short)
  – Coherent
• Formalize with coherent random walks
[Figure: papers p, q, and r connected by citation paths]
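One way to make the soft notion concrete, assuming a plain random walk rather than the coherent walk on the slide: start at q, follow a uniformly random reference with probability `alpha` (stop otherwise), and report the probability of reaching p. Short paths lose less to stopping, and extra paths add probability, so both raise the score. `impact` and `alpha` are illustrative names; the citation graph is assumed acyclic.

```python
def impact(cites, p, q, alpha=0.5):
    # cites: paper -> list of papers it cites (assumed acyclic)
    memo = {}

    def reach(node):
        # Probability that a walk at `node` eventually arrives at p
        if node == p:
            return 1.0
        if node in memo:
            return memo[node]
        refs = cites.get(node, [])
        value = alpha * sum(reach(r) for r in refs) / len(refs) if refs else 0.0
        memo[node] = value
        return value

    return reach(q)

# Toy citation lists (invented): q cites a and p; a cites p
cites = {"q": ["a", "p"], "a": ["p"], "p": []}
```

Here q reaches p both directly and through a, so its score (0.375) beats a single two-step chain q, a, p (0.25). Making the walk coherent, as the slide proposes, would additionally restrict which references the walk may follow.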
Map Coverage
• Documents cover pieces of the corpus
[Figure: corpus and coverage; a high-coverage, coherent map]
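A sketch of a map-level coverage score, assuming a noisy-OR combination (one common submodular choice, not confirmed by the slides): an item of the corpus stays uncovered only if every document on the map misses it. `cover`, `weights`, and `map_coverage` are hypothetical names.

```python
import math

def map_coverage(cover, docs, weights):
    # cover[d][item] in [0, 1]: how much document d covers a corpus item
    # weights[item]: importance of that item
    total = 0.0
    for item, w in weights.items():
        # Probability that every document on the map misses this item
        miss = math.prod(1.0 - cover[d].get(item, 0.0) for d in docs)
        total += w * (1.0 - miss)
    return total

# Toy coverage values (invented)
cover = {"d1": {"x": 0.5}, "d2": {"x": 0.5, "y": 1.0}}
weights = {"x": 1.0, "y": 1.0}
```

Here d1 adds 0.5 on its own but only 0.25 once d2 is on the map: the diminishing-returns (submodular) behavior that the orienteering formulation relies on.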
Properties of a Good Map
1. Coherence
2. Coverage
3. Connectivity
Definition: Connectivity
• Experimented with several formulations
• Users do not care about connection type
• Encourage connections between pairs of lines
Lines with no intersection
Solution: reward lines that had impact on each other
[Figure: two lines with no shared papers; titles shown: Perceptrons, SVM, Optimizing Kernels for SVM, Face Detection, SVM for Facial Recognition]
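A hypothetical connectivity score in the spirit of this slide: count pairs of lines that either share a paper or have some paper on one line with impact at least `threshold` on a paper on the other. The threshold, the `impact` callback, and the grouping of the slide's titles into two lines are assumptions.

```python
from itertools import combinations

def connectivity(lines, impact, threshold=0.1):
    # impact(p, q): any function returning a value in [0, 1]
    score = 0
    for a, b in combinations(lines, 2):
        if set(a) & set(b):
            score += 1  # lines intersect
        elif any(max(impact(p, q), impact(q, p)) >= threshold for p in a for q in b):
            score += 1  # no intersection, but one line influenced the other
    return score

# Toy lines and impact table (invented)
line1 = ["Perceptrons", "SVM", "Optimizing Kernels for SVM"]
line2 = ["Face Detection", "SVM for Facial Recognition"]
table = {("SVM", "SVM for Facial Recognition"): 0.6}
```

Without the impact term these two lines would score 0; rewarding impact lets the map connect them.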
Tying it all Together: Map Objective
• Coherence
  – Either coherent or not: constraint
• Coverage
  – Must have!
• Connectivity
  – Nice to have
Consider all coherent maps with maximum possible coverage.
Find the most connected one.
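The objective reads as a lexicographic maximum. This sketch applies it to an explicit list of candidate maps (feasible only for toy inputs); `best_map` and its callbacks are hypothetical stand-ins for the coherence constraint, coverage, and connectivity.

```python
def best_map(candidates, is_coherent, coverage, connectivity):
    # Coherence is a hard constraint: discard incoherent maps outright
    coherent = [m for m in candidates if is_coherent(m)]
    if not coherent:
        return None
    # Tuple keys compare lexicographically: coverage first, connectivity breaks ties
    return max(coherent, key=lambda m: (coverage(m), connectivity(m)))

# Toy candidates (invented)
candidates = [
    {"name": "m1", "coherent": True, "cov": 3, "conn": 1},
    {"name": "m2", "coherent": True, "cov": 3, "conn": 2},
    {"name": "m3", "coherent": False, "cov": 5, "conn": 5},
    {"name": "m4", "coherent": True, "cov": 2, "conn": 9},
]
```

With real-valued coverage, exact ties are rare, so a practical version would likely treat near-maximal coverage as tied; the slide does not specify that tolerance.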
Game Plan
Objective | Algorithm | Does it work?
Approach Overview
Documents D …
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase connectivity
Coherence Graph: Main Idea
• Vertices correspond to short coherent chains
• Directed edges between chains which can be conjoined and remain coherent
[Figure: chains 1 2 3, 4 5 6, and 5 8 9 as vertices; conjoining 1 2 3 with 5 8 9 gives the coherent chain 1 2 3 5 8 9]
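A sketch of building this graph, assuming chains are tuples of paper ids and `is_coherent` is a caller-supplied predicate; `coherence_graph` and the toy predicate (chronological ids with small gaps) are purely illustrative. Each vertex is a short coherent chain, and an edge a → b is added when a followed by b is still coherent.

```python
def coherence_graph(chains, is_coherent):
    # Vertices: short chains that pass the coherence test
    vertices = [c for c in chains if is_coherent(c)]
    edges = {v: [] for v in vertices}
    for a in vertices:
        for b in vertices:
            # Edge a -> b if conjoining the two chains stays coherent
            if a != b and is_coherent(a + b):
                edges[a].append(b)
    return edges

def toy_coherent(chain):
    # Hypothetical predicate: increasing paper ids, gaps of at most 3
    return all(0 < b - a <= 3 for a, b in zip(chain, chain[1:]))

graph = coherence_graph([(1, 2, 3), (4, 5, 6), (5, 8, 9)], toy_coherent)
```

With the toy predicate, (1, 2, 3) links to both (4, 5, 6) and (5, 8, 9), so 1 2 3 4 5 6 and 1 2 3 5 8 9 both appear as paths.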
Finding High-Coverage Chains
• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of the underlying articles
[Figure: Is Cover(1 2 3 4 5 6) > Cover(1 2 3 5 8 9)?]
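For intuition only, an exhaustive search over paths of exactly k chain-vertices, scoring each concatenated chain with a coverage function. It takes exponential time, which motivates the orienteering reformulation; the Chekuri–Pal recursive greedy is not implemented here. `best_path`, the topic-count coverage, and the toy graph are invented.

```python
def best_path(edges, coverage, k):
    # edges: chain-vertex -> list of successor chain-vertices
    best, best_score = None, float("-inf")

    def extend(path):
        nonlocal best, best_score
        if len(path) == k:
            chain = tuple(p for v in path for p in v)  # concatenate the chains
            score = coverage(chain)
            if score > best_score:
                best, best_score = chain, score
            return
        for nxt in edges[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in edges:
        extend([start])
    return best, best_score

# Toy coherence graph and per-paper topics (invented)
edges = {(1, 2, 3): [(4, 5, 6), (5, 8, 9)], (4, 5, 6): [], (5, 8, 9): []}
topics = {1: {"a"}, 2: {"a"}, 3: {"b"}, 4: {"b"}, 5: {"c"}, 6: {"c"}, 8: {"d"}, 9: {"e"}}
cov = lambda chain: len(set().union(*(topics[p] for p in chain)))
```

Under this toy coverage, 1 2 3 5 8 9 covers five topics against three for 1 2 3 4 5 6, answering the slide's question for this made-up data.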
Reformulation
• Submodular orienteering
  – [Chekuri and Pal, 2005]
  – Quasipolynomial-time recursive greedy
  – O(log OPT) approximation
• Orienteering: the reward is a function of the nodes visited
Approach Overview: Recap
Documents D …
1. Coherence graph G: encodes all coherent chains as graph paths
2. Coverage function f: f( ) = ?
3. Increase connectivity
Finding high-coverage paths in G under f: submodular orienteering [Chekuri & Pal, 2005]; quasipolynomial-time recursive greedy; O(log OPT) approximation
Example Map: Reinforcement Learning
[Figure: example map; metro lines labeled by word sets: multi-agent, cooperative, joint, team / mdp, states, pomdp, transition, option / control, motor, robot, skills, arm / bandit, regret, dilemma, exploration, arm / q-learning, bound, optimal, rmax, mdp]
Example Map Detail: SVM
Game Plan
Objective | Algorithm | Does it work?
User Study
• Tricky!
  – No double-blind, no within-subject
  – Domain: understandable yet unfamiliar
  – Reinforcement Learning (RL)
User Study
• 30 participants
• First-year grad student, Reinforcement Learning project
• Update a survey paper from 1996
• Identify research directions + relevant papers
  – Google Scholar
  – Map and Google Scholar
  – Baselines: Map, Wikipedia
Results (in a nutshell)
[Figure: two panels comparing Google and Us, with an axis marked "Better"]
Map users find better papers and cover more important areas.
User Comments
• Helpful: "noticed directions I didn't know about", "great starting point", "… get a basic idea of what science is up to"
• "why don't you draw words on edges?"
• "Legend is confusing"
• "hard to get an idea from paper title alone"
Conclusions
• Formulated metrics characterizing good maps for the scientific domain
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization
Thank you!