CiteSight:Contextual Citation Recommendation with Differential Search
Avishay Livne1, Vivek Gokuladas2, Jaime Teevan3, Susan Dumais3, Eytan Adar1
1University of Michigan, 2Qualcomm, 3Microsoft
#SIGIR18 #JaimesBackyard
Search Engines Focus on Speed
Why Do We Cite?
• Paying homage to pioneers
• Giving credit for related work
• Identifying methodology
• Providing background
• Correcting one’s work
• Correcting the work of others
• Substantiating claims
• …
[Garfield, 1965]
How Do We Cite?
• Many resources
  – Search engines
  – Bibliographic tools
  – Colleagues
• Work practice
  – Papers we know
  – Papers we should know
Why × How = 2 Specs
• Spec 1
  – I know what I want, give it to me now
  – Citation context:
    • “… calculating the differences between blocks of text [”
• Spec 2
  – I don’t know or can’t remember what I want
    • [cite]
• Complex, dynamic search space = slow
  – Inherent trade-off
• Can we build a system to support both?
The CiteSight User Interface
Split World Into Two
Stuff I don’t know about
Stuff I want fast = stuff I know about
Microsoft Academic
Strategy
• Small, personalized index
  – Updated dynamically
    • What you’ve cited before
    • What you’ve cited now
    • What other people have cited
      – Venue, co-citation, etc.
• Run a big index for everything else
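The small, dynamically updated index described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CiteSight's implementation: the class `PersonalIndex` and the `cocitation_neighbors` map (paper id to ids of papers frequently co-cited with it) are hypothetical, and venue signals are left out. The index is seeded with the author's past citations and grows every time the draft gains a new citation.

```python
class PersonalIndex:
    """Hypothetical small, personalized index (the 'cache').

    Holds paper ids only; the big index for everything else is assumed
    to live elsewhere.
    """

    def __init__(self, cocitation_neighbors):
        # Assumed input: paper id -> list of ids co-cited with it by other authors.
        self.cocitation_neighbors = cocitation_neighbors
        self.papers = set()

    def add_history(self, past_citations):
        """Seed with what the author has cited before."""
        for paper_id in past_citations:
            self._add_with_neighbors(paper_id)

    def on_cite(self, paper_id):
        """Update when the current draft cites a paper (what you've cited now)."""
        self._add_with_neighbors(paper_id)

    def _add_with_neighbors(self, paper_id):
        self.papers.add(paper_id)
        # What other people have cited alongside this paper.
        self.papers.update(self.cocitation_neighbors.get(paper_id, []))

    def __contains__(self, paper_id):
        return paper_id in self.papers
```

Seeding with `add_history(["p1"])` when `p1` is co-cited with `p2` and `p3` puts all three in the cache; a later `on_cite("p4")` pulls in `p4` and its own neighbors. Keeping only ids here keeps the structure small enough to search instantly.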
Ranking
• Query: citation context
  – “… calculating the differences between blocks of text [”
• Dynamic recommendations
  – Immediately: search the cache
  – In the background: search the full index
• Rank retrieved papers:
  – Gradient boosted regression tree
  – Features: network + text
    • Popularity, author similarity, textual similarity, …
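The two-speed retrieval above (cache now, full index in the background) can be sketched with Python's standard `concurrent.futures` module. The function names, the toy term-overlap `search`, and the use of plain dictionaries as "indices" are assumptions for illustration only. The gradient boosted ranking step is not reproduced, so results come back in index order.

```python
from concurrent.futures import ThreadPoolExecutor


def search(index, query_terms):
    """Toy retrieval (hypothetical): ids of papers sharing any term with the query."""
    return [pid for pid, text in index.items() if query_terms & set(text.lower().split())]


def recommend(query, cache, full_index, executor):
    """Return cache hits right away and a Future for the full-index hits."""
    terms = set(query.lower().split())
    immediate = search(cache, terms)  # fast path: small personalized index
    background = executor.submit(search, full_index, terms)  # slow path: big index
    return immediate, background


def merge(immediate, later):
    """Append full-index hits that the cache did not already return."""
    seen = set(immediate)
    return immediate + [pid for pid in later if pid not in seen]


if __name__ == "__main__":
    cache = {"p1": "block text differences"}
    full_index = {"p1": "block text differences", "p3": "differences between files"}
    with ThreadPoolExecutor(max_workers=1) as executor:
        now, pending = recommend("differences between blocks", cache, full_index, executor)
        print("immediate:", now)
        print("merged:", merge(now, pending.result()))
```

An interface would render `immediate` at once and fold in the background hits when the future completes; in the system described above, the merged candidates would then go to the trained ranker.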
Citation Context
• Citation context is really good at picking out “winners”
• People talk about a paper the same way as you!
• Not the same way the author talks about their work
Paper text
Bob et al. introduced ABC in […]
XYZ is similar to ABC […]
We utilize ABC to…[…]
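One way to use citation context, sketched here as an assumption rather than CiteSight's actual text features, is to describe each candidate paper by the sentences in which other papers cite it (like the three sentences about ABC above) and compare the user's citation context against that profile. The sketch uses bag-of-words cosine similarity; all helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def context_profile(citing_contexts):
    """Bag of words over every sentence in which other papers cite this one."""
    profile = Counter()
    for ctx in citing_contexts:
        profile.update(tokens(ctx))
    return profile


def rank_by_context(query_context, contexts_by_paper):
    """Rank candidate papers by similarity of their citation contexts to the query."""
    q = Counter(tokens(query_context))
    scores = {pid: cosine(q, context_profile(ctxs)) for pid, ctxs in contexts_by_paper.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because the profile is built from how citers phrase things rather than from the paper's own abstract, a query such as "We utilize ABC to" matches the ABC contexts directly.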
That’s nice…
[Figure: distribution of citations (S. Redner, 1998)]
Context Coupling
[Figure: popular paper A, less-popular paper B]
• A and B are related
  – Co-cited: when B is mentioned, A is mentioned too
• “Borrow” contexts from A to B
• Borrowed context used as a feature in ranking papers
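A minimal sketch of context coupling under stated assumptions: two papers count as co-cited when they appear in the same reference list, "popularity" is approximated by how many citation contexts a paper already has, and contexts flow only from the more-contexted paper (A) into the less-contexted one (B). The threshold `min_cocitations` and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of papers appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def borrow_contexts(contexts_by_paper, reference_lists, min_cocitations=1):
    """Give less-cited papers the citation contexts of papers they are co-cited with."""
    borrowed = defaultdict(list)
    for (a, b), n in cocitation_counts(reference_lists).items():
        if n < min_cocitations:
            continue
        ctx_a = contexts_by_paper.get(a, [])
        ctx_b = contexts_by_paper.get(b, [])
        # Borrow from the more popular paper into the less popular one.
        if len(ctx_a) > len(ctx_b):
            borrowed[b].extend(ctx_a)
        elif len(ctx_b) > len(ctx_a):
            borrowed[a].extend(ctx_b)
    return dict(borrowed)
```

The borrowed contexts for B would then be scored against the query with the same similarity used for B's own contexts and passed to the ranker as a separate feature, so a rarely cited paper can still surface when it sits next to a popular one.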
CiteSight Evaluation
• Can CiteSight predict existing citations?
  – 1000 randomly selected CS papers (2011)
    • Criteria: 20-40 citations
  – 5-fold cross validation
  – Metric: NDCG
    • Gain of 1 when it guesses a correct citation
    • Gain related to # of co-citations for close guesses
• User feedback from 5 CS grad students
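The metric above can be sketched as standard NDCG@k with graded gains. The talk fixes a gain of 1 for a correct citation and a gain "related to # of co-citations" for close guesses; the partial-credit formula below (`close_gain * n / (1 + n)`) is a hypothetical stand-in. The ideal ranking is assumed to list every true citation first, then the known co-cited papers by gain, so scores stay within [0, 1].

```python
import math


def ndcg_at_k(ranked, true_citations, cocited_with, k=10, close_gain=0.5):
    """NDCG@k for one held-out paper.

    ranked: recommended paper ids, best first (no duplicates)
    true_citations: set of ids the paper actually cites (gain 1)
    cocited_with: id -> times it is co-cited with a true citation
    close_gain: HYPOTHETICAL cap on partial credit for close guesses
    """

    def gain(pid):
        if pid in true_citations:
            return 1.0
        n = cocited_with.get(pid, 0)
        return close_gain * n / (1 + n)  # assumed saturating partial credit

    dcg = sum(gain(pid) / math.log2(rank + 2) for rank, pid in enumerate(ranked[:k]))
    ideal_gains = sorted(
        [1.0] * len(true_citations)
        + [gain(pid) for pid in cocited_with if pid not in true_citations],
        reverse=True,
    )[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal_gains))
    return dcg / idcg if idcg else 0.0
```

Averaging this score over held-out papers in each cross-validation fold gives a single NDCG@10 figure of the kind reported in the results.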
Results
• Large improvement
  – Context coupling
  – All features

Features            NDCG@10
Text only           40.8%
Context coupling    46.5%
All features        61.9%
Results
• Large improvement
  – Context coupling
  – All features
  – Citation-related features > text
• More info = better
  – Authors
  – Citations, to a point

Features               NDCG@10
Text only              40.8%
Context coupling       46.5%
+ keywords             46.5%
+ title                46.6%
+ authors similarity   47.5%
+ abstract             47.8%
+ citation count       48.6%
+ venue relevancy      49.2%
+ citations            53.0%
+ co-citations         56.7%
+ authors history      57.6%
All features           61.9%
Cache v. Corpus
• Relevance
  – Cache accounts for 46% of NDCG@10 of the corpus
  – 10% cache is better
• Speed
  – Cache: 6 ms
    • Instantaneous!
  – Corpus: 450 ms
Summary
• Differential need for speed
• CiteSight: differential search
  – Two different use cases = two indices
    1. Local index updated dynamically, contextually
    2. Global index with full content
  – Context coupling improves relevance
  – Local index improves speed
    • Able to provide instantaneous results
    • Often relevant because contextually updated
Questions?