From Social Bookmarking to Social Summarization: An Experiment in Community-Based Summary Generation
Oisin Boydell, Barry Smyth, Adaptive Information Cluster, School of Computer Science and Informatics, University College Dublin
2007 Intelligent User Interfaces
Presented by Sharon HSIAO, 2007.10.05

Page 1: Oisin Boydell, Barry Smyth

From Social Bookmarking to Social Summarization: An Experiment in Community-Based Summary Generation

Oisin Boydell, Barry Smyth

Adaptive Information Cluster, School of Computer Science and Informatics

University College Dublin

2007 Intelligent User Interfaces

Presented by Sharon HSIAO 2007.10.05

Page 2

Agenda

• Introduction

• A novel way to generate a social summary

• Evaluation & Methodology

• Experiments

• Discussion

• Conclusion

Page 3

Introduction

• Traditional summarization techniques may perform well in general; however, they may not extract the core content of a document in a way that meets the needs and preferences of individual users or of a community of users

Page 4

Summarization

• 2 broad approaches to summarization:
– Extraction
  • Open Text Summarizer (OTS)
  • MEAD Summarizer
  • Use word occurrence and positional information to extract high-scoring sentences
– Abstraction
  • Relies heavily on syntactic
  • Representation is conceptual
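
To make the extraction approach concrete, here is a minimal Python sketch that scores each sentence by word occurrence and position and keeps the top k. It is a generic illustration under stated assumptions, not the actual scoring used by OTS or MEAD; the stopword list, the `position_weight` parameter, and the linear position bonus are hypothetical choices.

```python
import re
from collections import Counter

# Illustrative sketch: all names, weights, and the stopword list are hypothetical.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for", "on"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2, position_weight: float = 0.5) -> list[str]:
    """Return the k highest-scoring sentences in their original order.

    score = mean document frequency of the sentence's content words
            + position_weight * (1 - i / n), which favours earlier sentences.
    """
    sentences = split_sentences(text)
    n = len(sentences)
    freq = Counter(w for s in sentences for w in content_words(s))
    scored = []
    for i, s in enumerate(sentences):
        words = content_words(s)
        occurrence = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((occurrence + position_weight * (1 - i / n), i))
    top = sorted(scored, reverse=True)[:k]
    # Re-sort the selected sentences by index so the summary reads in page order.
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```

Sentences that repeat the document's frequent words, and sentences near the top of the page, rise to the top; that is the intuition behind occurrence-plus-position extraction.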

Page 5

Web Page Summarization

• HTML markup
• In-linking text
• Search engine click-through
• Sentence-selection algorithm: web content + query click-through
– the weight of query words is increased according to their frequency within the query collection
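
The query-weighting idea can be sketched as follows: each word's weight is its frequency in the query collection, and sentences are ranked by the summed weights of their distinct words. This is an illustrative assumption about how such weighting might look, not the published sentence-selection algorithm; `query_weights`, `score_sentence`, and `select_sentences` are hypothetical names.

```python
import re
from collections import Counter

# Illustrative sketch: function names and the scoring rule are hypothetical.


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def query_weights(queries: list[str]) -> Counter:
    """Weight of each word = how often it occurs across the query collection."""
    return Counter(w for q in queries for w in tokens(q))


def score_sentence(sentence: str, weights: Counter) -> int:
    """Sum the query-collection weights of the sentence's distinct words.
    Words never seen in a query contribute 0 (Counter returns 0 for them)."""
    return sum(weights[w] for w in set(tokens(sentence)))


def select_sentences(sentences: list[str], queries: list[str], k: int = 1) -> list[str]:
    """Return the k sentences with the highest query-weighted scores."""
    weights = query_weights(queries)
    return sorted(sentences, key=lambda s: score_sentence(s, weights), reverse=True)[:k]
```

A word that appears in many queries ("jaguar" below) pulls its sentences up the ranking more strongly than a word that appears once.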

Social summarization

Interaction or usage data can be used to good effect to generate high-quality summaries of Web pages

Page 6

Idea of Social Summarization

1. A page $p$ can be associated with a set of queries, $Q(p) = \{q_1, \ldots, q_n\}$

2. For a given query, $q_i$, the search engine (SE) will produce a query-sensitive snippet, $S_{SE}(p, q_i)$, which contains a number of sentence fragments

3. The social summary for $p$, $SS_{SE}(p)$, can be constructed from the combination of the fragments associated with $Q(p)$

– the fragments are rank-ordered according to their importance
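
Step 2 depends on a query-sensitive snippet generator. The sketch below is a hypothetical stand-in: it picks the sentences of a page that share the most terms with a query. Real search engines (and the Lucene snippet generator used in the experiments) are more sophisticated; the sentence splitter, `max_fragments`, and term-overlap scoring are assumptions.

```python
import re

# Illustrative sketch: a hypothetical stand-in for a search engine's snippet generator.


def term_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def page_sentences(page_text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]


def query_snippet(page_text: str, query: str, max_fragments: int = 2) -> list[str]:
    """Return at most max_fragments sentences sharing the most terms with the
    query, listed in page order; sentences with no shared term are ignored."""
    q_terms = term_set(query)
    scored = [(len(q_terms & term_set(s)), i, s) for i, s in enumerate(page_sentences(page_text))]
    matching = [t for t in scored if t[0] > 0]
    best = sorted(matching, key=lambda t: (-t[0], t[1]))[:max_fragments]
    return [s for _, _, s in sorted(best, key=lambda t: t[1])]


def snippets_for_page(page_text: str, queries: list[str]) -> dict[str, list[str]]:
    """One snippet per query associated with the page."""
    return {q: query_snippet(page_text, q) for q in queries}
```

Running `snippets_for_page` over every query in the page's query set yields the pool of fragments that step 3 combines.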

Page 7

Generating a social summary

1. extract the snippet texts, $S(b_i, p)$, to produce a set of sentence fragments

2. normalise sentence fragments to cope with fragment overlap and subsumption

3. score each sentence fragment according to its frequency of occurrence across the snippets

4. rank-order the normalised fragments to produce the final summary
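
The four steps above can be sketched as a single function. Counting a fragment at most once per snippet, the lowercase/whitespace normalisation, and the rule that folds a fragment contained in a longer fragment into that longer one are hypothetical choices; the slides do not specify how overlap and subsumption are actually handled.

```python
from collections import Counter

# Illustrative sketch: the normalisation and subsumption rules are hypothetical.


def normalise(fragment: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(fragment.lower().split())


def social_summary(snippets: list[list[str]]) -> list[tuple[str, int]]:
    """Build a rank-ordered social summary from per-bookmark snippets.

    snippets: one list of sentence fragments per snippet S(b_i, p).
    Returns (fragment, score) pairs, highest score first.
    """
    # Step 1: pool the fragments, counting each at most once per snippet.
    counts = Counter()
    for snippet in snippets:
        counts.update({normalise(f) for f in snippet if f.strip()})

    # Step 2: a fragment contained in a longer fragment is folded into it.
    merged = Counter()
    for f in sorted(counts, key=len, reverse=True):
        host = next((g for g in merged if f in g), None)
        merged[host or f] += counts[f]

    # Steps 3-4: score = frequency; rank-order, ties broken alphabetically.
    return sorted(merged.items(), key=lambda t: (-t[1], t[0]))
```

Processing longer fragments first guarantees that any fragment able to absorb a shorter one is already in `merged` when the shorter one is checked.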

Page 8

Page 9

Setup & Methodology

• Data from Del.icio.us
• 3781 bookmarked pages
• Tags: up to a maximum of 50 per page
• 1386 pages contained description text within the HTML meta-content description tag
• Compared with OTS and MEAD
• Lucene snippet generator (Apache Foundation)
• ROUGE (Recall-Oriented Understudy for Gisting Evaluation): compares generated summaries to gold-standard summaries by counting overlapping n-grams, word sequences, and word pairs
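
For reference, ROUGE-N recall is the number of reference n-grams that also appear in the candidate summary (with counts clipped), divided by the total number of n-grams in the reference. A minimal sketch, assuming whitespace tokenisation and no stemming or stopword removal (the official ROUGE toolkit offers such options); the function names are hypothetical.

```python
from collections import Counter

# Illustrative sketch of ROUGE-N recall; tokenisation is deliberately simple.


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram overlap divided by the number of n-grams in the
    reference (gold-standard) summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as often as the candidate contains it.
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", 5 of the 6 reference unigrams are matched (recall 5/6), but only 3 of the 5 reference bigrams (recall 0.6).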

Page 10

Experiment 1: Comparison of Summary Quality

• The average length of the social summaries (SS) was 24% of the original page length

Page 11

Experiment 2: Summary Length vs. Quality

• Consider the quality of summaries of different lengths, obtained by eliminating low-scoring fragments from the final social summary
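
One simple way to vary summary length, sketched under the assumption that the social summary is a list of (fragment, score) pairs sorted by score, is to keep only the top fraction of fragments. `prune_summary` and its `keep_fraction` parameter are hypothetical; the slide does not say which length thresholds were used.

```python
# Illustrative sketch: the function and its parameter are hypothetical.


def prune_summary(ranked: list[tuple[str, int]], keep_fraction: float) -> list[tuple[str, int]]:
    """Keep the highest-scoring fraction of a rank-ordered social summary.

    ranked: (fragment, score) pairs, highest score first.
    keep_fraction: in (0, 1]; the kept count is rounded to the nearest whole
    fragment, and at least one fragment is kept when any exist.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(len(ranked) * keep_fraction))
    return ranked[:k]
```

Sweeping `keep_fraction` over, say, 0.25, 0.5, 0.75 and 1.0 and scoring each pruned summary would trace a length-versus-quality curve.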

Page 12

Experiment 3: Search Activity vs. Quality

• Consider the relationship between the number of available cues (bookmark tags, in this case) and summary quality

• Query sets of 1-10, 11-20, 21-30, 31-40, and 41-50 queries

• Queries were selected randomly, producing nearly 25,000 different summaries in total
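
The sampling can be sketched like this: for each size bucket, draw a random subset of a page's tags whose size falls inside the bucket. Picking the exact size uniformly within the bucket, and skipping buckets a page has too few tags for, are assumptions; the slide does not describe either detail, and `sample_query_sets` is a hypothetical name.

```python
import random

# Illustrative sketch: bucket handling and size selection are assumptions.
BUCKETS = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]


def sample_query_sets(tags: list[str], rng: random.Random) -> dict[tuple[int, int], list[str]]:
    """For each size bucket, draw a random subset of the page's tags whose
    size falls in the bucket. Buckets larger than the tag list are skipped."""
    sets = {}
    for lo, hi in BUCKETS:
        if len(tags) < lo:
            continue
        size = rng.randint(lo, min(hi, len(tags)))  # inclusive on both ends
        sets[(lo, hi)] = rng.sample(tags, size)      # sampling without replacement
    return sets
```

Passing an explicitly seeded `random.Random` makes each run reproducible.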

Page 13

• SS produces summaries with recall scores that are 31% better than the OTS summaries and approximately 28% better than the MEAD summaries

Page 14

Discussion

• Query-Focused Social Summaries
– generating a more focused social summary that is informed, perhaps, by the context provided by some target user query, $SS(p, q_T)$

– top-ranking results may be associated with longer (more detailed) social summaries than lower-ranking results
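
A hypothetical sketch of both ideas: reorder an existing social summary so fragments sharing terms with the target query come first, then truncate it to a length that shrinks as the result's rank gets worse. The term-overlap ordering, `base_length`, and the one-fragment-per-rank decay are illustrative assumptions, not the authors' method.

```python
import re

# Illustrative sketch: ordering rule, base_length, and decay are hypothetical.


def terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_focused_summary(ranked: list[tuple[str, int]], target_query: str,
                          result_rank: int, base_length: int = 4) -> list[str]:
    """ranked: (fragment, score) pairs from a social summary.
    result_rank: 1 for the top search result, 2 for the next, and so on."""
    q = terms(target_query)
    # More shared terms with the target query first; then higher social score.
    focused = sorted(ranked, key=lambda t: (-len(q & terms(t[0])), -t[1]))
    # Lose one fragment per rank position, but always keep at least one.
    length = max(1, base_length - (result_rank - 1))
    return [frag for frag, _ in focused[:length]]
```

With the query "jaguar parts", the fragment mentioning Jaguar parts leads every summary, while a result at rank 10 keeps only that single fragment.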

Page 15

• Community-Focused Social Summaries
– the social summarization technique can be used to generate query-focused snippets that better reflect the niche needs of a particular community of searchers

– identify those queries that have led to the past selection of p by community members and that are similar to $q_T$

– E.g., “Jaguar parts”
  • “Genuine Jaguar, Land Rover and Range Rover OEM and brand name aftermarket parts”
  • “The one-stop-shop for genuine restoration Jaguar parts for all classic models including S Type, X Type, X300 - XJR, ...”
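
Identifying the relevant community queries could look like the sketch below, which scans a community's log of (query, selected page) pairs. Jaccard term overlap and the `threshold` of 0.3 are hypothetical choices for "similar to q_T", and the log format is also an assumption.

```python
import re

# Illustrative sketch: log format, similarity measure, and threshold are hypothetical.


def terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def community_queries(selection_log: list[tuple[str, str]], page: str,
                      target_query: str, threshold: float = 0.3) -> list[str]:
    """Return the distinct queries (in log order) that led community members
    to select `page` and whose term similarity to the target query is at
    least `threshold`."""
    q_t = terms(target_query)
    found = []
    for query, selected in selection_log:
        if selected == page and query not in found and jaccard(terms(query), q_t) >= threshold:
            found.append(query)
    return found
```

The returned queries would then play the role of the page's query set in the usual snippet-and-aggregate pipeline, so the resulting snippet reflects only this community's view of the page.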

Page 16

Preliminary results

• Extracted the top 100 bookmarked pages for the tag “travel”

• Then extracted the top bookmark tags used to label each of these pages, generating a new set of tags (e.g., European travel, travel tips…)

• 1153 bookmarked pages, 5291 unique sets of terms, 6290 unique users

• Training & test sets
– 5 random training/test splits
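
Repeated random splitting can be sketched as below. The slide does not state the training/test ratio or exactly what was split (pages, bookmarks, or users), so `test_fraction=0.2`, the generic `items` list, and the name `random_splits` are assumptions.

```python
import random

# Illustrative sketch: test_fraction and the seed handling are hypothetical.


def random_splits(items: list, n_splits: int = 5, test_fraction: float = 0.2, seed: int = 0):
    """Yield n_splits independent random (train, test) partitions of items."""
    rng = random.Random(seed)
    n_test = max(1, round(len(items) * test_fraction))
    for _ in range(n_splits):
        shuffled = items[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]
```

Averaging the evaluation score over the five splits reduces the influence of any single lucky or unlucky partition.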

Page 17

Conclusion

• The social summarization technique produces higher-quality summaries

• Query-focused social summaries provide searchers with improved result-snippet summaries

• Community-focused summaries better reflect the needs of communities of like-minded users