![Page 1: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/1.jpg)
A System for Query-Specific Document Summarization
Ramakrishna Varadarajan,
Vagelis Hristidis.
FLORIDA INTERNATIONAL UNIVERSITY,
School of Computing and Information Sciences,
Miami.
![Page 2: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/2.jpg)
Slide 1
Roadmap
•Need for query-specific summaries
•Our approach– Building a document graph
– Definition of summary
– Rank Summaries
• Efficient computation of summaries
• Evaluation of summarization process– Quality
– Performance
•Related Work
• Conclusions Florida International University (FIU)
![Page 3: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/3.jpg)
Slide 2
Roadmap
•Need for query-specific summaries
•Our approach– Building a document graph
– Definition of summary
– Rank Summaries
• Efficient computation of summaries
• Evaluation of summarization process– Quality
– Performance
•Related Work
• Conclusions Florida International University (FIU)
![Page 4: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/4.jpg)
Slide 3
Need for Query-Specific Summaries
• Locating relevant information is hard.
• Summaries are helpful because:– Provide a Quick preview of the document.
– Allow users to quickly decide relevance.
– Save user’s browsing time.
• Success of Web search engines – Query specific snippetsare important.
• Two categories of summaries:– Query-Independent – Most of prior works.
– Query-Specific – Applicable to web search engines.Florida International University (FIU)
![Page 5: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/5.jpg)
Slide 4
Motivation
Query-Specific Summaries
Florida International University (FIU)
![Page 6: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/6.jpg)
Slide 5
Motivation
Drawbacks
• Association between query keywords is unclear.
• Naïve approach for summarization.
• Ignores semantic relations between keywords in the document.
Summarization research till date
•Mostly Query-Independent.
• Not applicable for web search.
Florida International University (FIU)
![Page 7: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/7.jpg)
Slide 6
Roadmap
•Need for query-specific summaries
•Our approach– Building a document graph
– Definition of summary
– Rank Summaries
• Efficient computation of summaries
• Evaluation of summarization process– Quality
– Performance
•Related Work
• Conclusions Florida International University (FIU)
![Page 8: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/8.jpg)
Slide 7
Our Approach
• Document � graph
• We call it Document Graph.
Three Steps
Step 1: Preprocess
• Build a document graph, G.
Step 2: Summary Generation
• Given a query Q and a document graph G,
Summaries � Spanning Trees that cover all keywords
Step 3: Rank spanning trees.Florida International University (FIU)
![Page 9: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/9.jpg)
Slide 8
Building Document Graphs
• Parse the document.
• Split it into text fragments (using delimiters or tags).
• Text Fragments represented as Nodes
• Add an edge between 2 nodes, if semantically related.
• Edges : Semantic Links
• Edge weights: Degree of association
Florida International University (FIU)
![Page 10: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/10.jpg)
Slide 9
Example
v0
v3
v1
v2
0.026
0.015
0.015
v7 v13
v14
0.027
0.017 0.043v10
0.023
v4v8
v5
0.015
0.058 0.091
0.0150.053
0.027
0.028
0.015
v6
0.044
0.06
0.015
0.015
0.015
0.032
0.060
v90.015 0.015
v12
v110.015
0.015
0.015
0.015
v15v16 0.015
0.037
Sample DocumentDocument Graph
• Parsing delimiter – NewLine.
• Text Fragments – Paragraphs.
• 17 text fragments (v0…v16).
• 17 nodes in Document Graph.Florida International University (FIU)
![Page 11: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/11.jpg)
Slide 10
Input parameters for Document Graph construction
–Parsing Delimiters
• For Plain Text – Newline or Period
• For HTML – Tags (<p>,<br>,<ul><ol>,<table>…etc.)
–Threshold for Edge weights
• Tradeoff of Quality and Performance.
• Edges with weights lesser, are not added.
–Maximum Fragment Size
• Limit on Node Size
Florida International University (FIU)
![Page 12: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/12.jpg)
Slide 11
Computing edges of Document Graphs
• For every pair of nodes,
� Common Words are used (stops words – ignored)
� Thesaurus and stemmer used (rely on Oracle Intermedia Text services)
� If EScore(e) ≥ threshold, an edge is added.
• Special Case
� Adjacent Text Fragments.
� Share Close Proximity.
� Weight = Max (EScore(e) ,threshold).
Florida International University (FIU)
![Page 13: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/13.jpg)
Slide 12
Edge Scoring
• EScore
A tf*idf adaptation.
–Query Independent.
–Edge e(u,v)
w – common word,
t (v) – text fragment corresponding to node v.
Size (v) –number of words in text fragment t(v).
( )
))(())((
))())),(()),(((
)( ))()((
vtsizeutsize
widfwvttfwutft
eEScore vtutw
+
⋅+=
∑∈ I
Florida International University (FIU)
![Page 14: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/14.jpg)
Slide 13
Example (cont’d)
v0
v3
v1
v2
0.026
0.015
0.015
v7 v13
v14
0.027
0.017 0.043v10
0.023
v4v8
v5
0.015
0.058 0.091
0.0150.053
0.027
0.028
0.015
v6
0.044
0.06
0.015
0.015
0.015
0.032
0.060
v90.015 0.015
v12
v110.015
0.015
0.015
0.015
v15v16 0.015
0.037
Sample DocumentDocument Graph
Common words:
• BrainGate,
• CyberkineticsReasons for high weight
• Rare Words (idf is large).
Florida International University (FIU)
![Page 15: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/15.jpg)
Slide 14
Computing Query-Specific Summaries
• Given a Query, Q and a Document Graph, G:
Summary �Minimal Total Spanning Tree.
Minimal Total Spanning Tree
• Total – Every keyword in at least one node (AND semantics)
• Minimal – To avoid redundancy (Eliminating useless leaves)
Summarization Problem
Given – Document Graph G and a Query Q
Find – Top (best) Minimal Total Spanning Tree (Summary)Florida International University (FIU)
![Page 16: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/16.jpg)
Slide 15
Example
v0
v3
v1
v2
0.026
0.015
0.015
v7 v13
v14
0.027
0.017 0.043v10
0.023
v4v8
v5
0.015
0.058 0.091
0.0150.053
0.027
0.028
0.015
v6
0.044
0.06
0.015
0.015
0.015
0.032
0.060
v90.015 0.015
v12
v110.015
0.015
0.015
0.015
v15v16 0.015
0.037
Sample Document Document Graph
v100.017v0
0.046 0.008Top Summary for “Brain Chip Research"
Score = 67.74
Brain chip offers hope for paralyzed. Donoghue’s initial research published in the science journal Nature in 2002 consisted of attaching an implant to a monkey’s brain that enabled it to play a simple pinball computer game remotely.
Florida International University (FIU)
![Page 17: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/17.jpg)
Slide 16
Summary Scoring Function
Requirements
Properties of Good Summaries :
• Highly relevant nodes (fragments) improve Score.
• Loose semantic Links degrade Score.
• Large spanning trees get a degraded Score.
• Based on Query-dependent & Query-Independent factors.
Summary Scoring
• This function satisfies these requirements.
• Best Summary has minimum score
∑∑∈
∈
+=Teedge
Tvnode
vNScoreb
eEScorea
)(
1)(
1Score(T)
a andb are calibrating parameters.
(a=1 & b=0.5)Florida International University (FIU)
![Page 18: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/18.jpg)
Slide 17
Summary Node Scoring
• Node Scoring
–Widely used Okapi weighting.
–Query Dependent.
–NScore (v) =
N – Number of Documents in the collection.
tf – Term Frequency .
df – Document Frequency.
avdl – Average Document Length.
qtfk
qtfk
tfavdl
dlbbk
tfk
df
dfN
dQt++
++−
++
+−∑∈ 3
3
1
1
,
)1(.
))1((
)1(.
5.0
5.0ln
Florida International University (FIU)
![Page 19: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/19.jpg)
Slide 18
Roadmap
•Need for query-specific summaries
•Our approach– Building a document graph
– Definition of summary
– Rank Summaries
•Efficient computation of summaries
• Evaluation of summarization process– Quality
– Performance
•Related Work
• Conclusions Florida International University (FIU)
![Page 20: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/20.jpg)
Slide 19
ALGORITHMS
• Adaptations of BANKS [ICDE02] Algorithms
• Input : Document Graph G and Query Q
• Output : Minimal Total Spanning trees (Summaries)
• Enumeration Algorithm.
• Expanding Search Algorithm.
Pre-computation:
–A Full text Index.
–All Pairs shortest paths for each document graph
(edge weight of edge e= 1/Escore(e)).
Florida International University (FIU)
![Page 21: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/21.jpg)
Slide 20
Roadmap
•Need for query-specific summaries
•Our approach– Building a document graph
– Definition of summary
– Rank Summaries
• Efficient computation of summaries
•Evaluation of summarization process– Quality
– Performance
•Related Work
• Conclusions Florida International University (FIU)
![Page 22: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/22.jpg)
Slide 21
User Surveys
• To evaluate the Quality of Summaries
• Subjects : 15 Students from FIU (all levels & various majors).
• Users evaluate summaries based on their Quality.
• Rating: 1 (least descriptive) to 5 (most descriptive)
• Surveys
–Comparison with Google & MSN Desktop.
–Comparison with DUC 2005 datasets.
Florida International University (FIU)
![Page 23: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/23.jpg)
Slide 22
Comparison with Google & MSN Desktop Engines
3.674.001.003.001.672.005
4.004.673.001.672.671.674
4.004.933.000.672.673.003
3.334.333.002.003.332.002
3.674.873.672.333.672.331
D2D1D2D1D2D1
Our ApproachMSN DesktopGoogle Desktop
Queries
Computer network security projectDeleted computer software5
Large research grantsWorm affected agencies4
Software projectsRecovering worm deleted files3
Algorithms development researchAnti-virus protection2
IT Research awardsMicrosoft worm protection1
Document D2Document D1Queries
![Page 24: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/24.jpg)
Slide 23
Performance Experiments
01
23
45
67
2 3 4 5Number of keywords in query
Pro
cess
ing tim
e(m
sec)
Mult i-result EnumerationMult i-result Expanding
0
1
2
3
4
5
6
2 3 4 5Number of keywords in the query
Pro
cess
ing T
ime (m
sec)
Top-1 Enumeration
Top-1 Expanding
Average times to calculate node weights
17.3311.509.375.31Time (msec)
5432Number of keywords
1.81.41.31.1Top-1 Expanding Search Algorithm.
2.782.11.81.4Top-1 Enumeration Algorithm.
5432Number of keywords
Average ranks of Top-1 Algorithms
News articles from science section of cnn.com
Florida International University (FIU)
![Page 25: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/25.jpg)
Slide 24
Roadmap
•Need for query-specific summaries
•Our approach– Building a document graph
– Definition of summary
– Rank Summaries
• Efficient computation of summaries
• Evaluation of summarization process– Quality
– Performance
•Related Work
• Conclusions Florida International University (FIU)
![Page 26: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/26.jpg)
Slide 25
Related Work
Document Summarization
• Mostly Query-Independent
• Summarizing Web Pages– Berger et.al [SIGIR 2000] synthesizes summaries.
– Paris et.al [CIKM 2000] uses anchor text (ignores content).
• Splitting Web pages in to blocks– Song et.al [WWW2004] Block importance models (learning algorithms)
– Cai et.al [SIGIR 2004] Block level link analysis
• Document modeled as Graphs– Lexrank : Sentence Centrality using link analysis.
– TextRank: “representative” sentences using link analysis.
Keyword Search in Data Graphs
– BANKS [ICDE 2002]: group-steiner tree problem
– DISCOVER, DBXplorer.
– XRANK[2003]: search in XML documents.Florida International University (FIU)
![Page 27: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/27.jpg)
Slide 26
Conclusions
• Method for Query-Specific Summarization.
• Exploiting inherent structure of documents for the purpose of Summarization.
• Enhanced User Satisfaction – User Surveys.
A Prototype of the System available at:
http://dbir.cs.fiu.edu/summarization
Florida International University (FIU)
![Page 28: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/28.jpg)
Slide 27
Thank You !!!
Questions ???
Florida International University (FIU)
![Page 29: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/29.jpg)
Slide 28
Enumeration Algorithm
Florida International University (FIU)
![Page 30: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/30.jpg)
Slide 29
Expanding Search Algorithm
w1
v4
v2
w3
v30.01
0.02
0.02
Florida International University (FIU)
![Page 31: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/31.jpg)
Slide 30
Comparison with DUC peers
3.672.33FT921-743.673.67FT943-16238
4.172.83FT922-133534.174.00FT943-16477
4.332.00FT921-9373.002.83FT931-3563
4.002.00FT922-1903.332.50FT944-8297
2.504.00FT921-77864.662.33FT941-3237
Our Approach
DUC PeerDoc. ID
Our approach
DUCPeerDoc. ID
Query 2 (Women in Parliaments)DUC Topic ID: d321f
Query 1 (International Organized Crime)
DUC Topic ID: d301i
Florida International University (FIU)
![Page 32: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/32.jpg)
Slide 31
DEMO
Florida International University (FIU)
![Page 33: A System for Query-Specific Document Summarization](https://reader031.vdocument.in/reader031/viewer/2022020701/61f981325cf743744d76c284/html5/thumbnails/33.jpg)
Slide 32
DEMO
Florida International University (FIU)