social context summarization · •solving via dual wing factor graph model •experimental result...
Post on 16-Oct-2020
9 Views
Preview:
TRANSCRIPT
1
Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, Juanzi Li
Tsinghua University IBM, China Research Lab
July 25, 2011, SIGIR 2011, Beijing
Social Context Summarization
2
Outline
• Motivation
• Prior Work
• Social Context Summarization
• Solving via Dual Wing Factor Graph Model
• Experimental Result and Case Study
• Conclusion and Future Work
3
Motivation
• Document summarization
Document Summary
4
Motivation
• Context and social context
Web document Related documents
Hyperlinks
Click-through data
Discussing Tweets
Opinions/comments
Associated authors/users
User network
Message network
5
Motivation
• How do (social) contexts help summarization?
Related documents
Discussing Tweets
User network
Message network
Textual information • Integrate context information to
help estimate the informativeness of sentences
Social influence • If a user is an opinion leader, his/her comments
should be more important than others.
Information propagation • If a comment has been forwarded or replied by
many other users, the comment should be more important than others.
6
Challenges
• To incorporate users, user generated contents, social networks, and implicit networks (such as the forward/reply network).
How to formally define the social context?
• To capture the dependency between the social context information and the quality of summaries.
How to formalize the problem in a principled framework?
• The social context contains inevitable noise.
How to validate on a real-world data set?
7
Outline
• Motivation
• Prior Work
• Social Context Summarization
• Solving via Dual Wing Factor Graph Model
• Experimental Result and Case Study
• Conclusion and Future Work
8
Prior Work
• Document summarization w/o context
0.5 0.7 0.4 0.5 0.6
0.3 0.7 0.4 0.5 0.6 0.5 0.4 0.5 0.3 0.6
Sentence Scoring
Linguistic features • rhetorical structure, lexical chains Statistical features • term significance, sentences similarity
Sentence Selection
Unsupervised approaches Supervised approaches • Binary classification: Naïve
Bayes [Kupiec:95], Maximum Entropy [Osborne:02], Genetic Algorithm [Yeh:05]
• Sequence labeling: HMM [Conroy:01], CRF [Shen:07]
9
Prior Work
• Binary classification vs. sequence labeling
– Example: document d with sentence set Sd = {s1, s2, s3, s4} associated with Yd = {y1, y2, y3, y4}.
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4g12 g23 g34
4
1
( )i
i
p f
x
Sequence labeling
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4
Binary classification
4 3
, 1
1 1
( )i i i
i i
p f g
x
10
Prior Work
• Binary classification vs. sequence labeling
– Example: document d with sentence set Sd = {s1, s2, s3, s4} associated with Yd = {y1, y2, y3, y4}.
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4g12 g23 g34
4
1
( )i
i
p f
x
Sequence labeling
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4
Binary classification
4 3
, 1
1 1
( )i i i
i i
p f g
x
Factor graph representation
Nodes • represent variables or factors Edges • associate factors with variables
11
Prior Work
• Document summarization with context
–Contents from external documents [Wan:08], cited articles [Mei:08], documents selected by users [Mani:98]
–Text surrounding the hyperlink [Amitay:00, Delort:03]
–Search engine click-through data [Sun:05]
–Commented sentence selection [Hu:08] or opinionated text [Ganesan:10, Paul:10]
12
Prior Work
• Document summarization with context
–Contents from external documents [Wan:08], cited articles [Mei:08], documents selected by users [Mani:98]
–Text surrounding the hyperlink [Amitay:00, Delort:03]
–Search engine click-through data [Sun:05]
–Commented sentence selection [Hu:08] or opinionated text [Ganesan:10, Paul:10]
LIMITATION
Lacks of a unified approach
To explore the non-textual information from social context
To take full advantage of conventional techniques in summarization
!
13
Outline
• Motivation
• Prior Work
• Social Context Summarization
• Solving via Dual Wing Factor Graph Model
• Experimental Result and Case Study
• Conclusion and Future Work
14
Social Context Summarization
• Social Context and SCAN (in Twitter scenario)
Web document d
Associated tweet set Md = {mi}
Associated user set Ud = {ui}
User relations Ed
u = {(ui, uj)}
Tweet relations Ed
m = {(mi, mj)}
Social context of d Cd = <Md, Ud>
s1 s2 s3 s4
Sentence relations Ed
s = {(si, sj)} Social Context Augmented
Network (SCAN) Gd = (Sd, Cd, Ed),
Ed = Eds ∪ Ed
m ∪ Edu
15
Social Context Summarization
• SCAN
– Three-layer perspective
• Summarization objective
DocumentLayer
MicroblogLayer
UserLayer
f0llow
f0llow
u1
u2
u3
reply
retweetreply
d2
retweet
m6
m5
m7
m8
d1
s1 s2 s3 s4
m9
m10s1 s3
m7 m9
A summary consists of
Important tweets Md*
Important sentences Sd*
Subproblem 1: Key Sentence
Extraction
Subproblem 2: Tweet
Summarization
16
Social Context Summarization
• SCAN
– Three-layer perspective
• Summarization objective
DocumentLayer
MicroblogLayer
UserLayer
f0llow
f0llow
u1
u2
u3
reply
retweetreply
d2
retweet
m6
m5
m7
m8
d1
s1 s2 s3 s4
m9
m10s1 s3
m7 m9
A summary consists of
Important tweets Md*
Important sentences Sd*
Subproblem 1: Key Sentence
Extraction
Subproblem 2: Tweet
Summarization
QUESTION
How to capture the mutual reinforcement between the two subproblems to facilitate generating a HQ summary?
?
17
Outline
• Motivation
• Prior Work
• Social Context Summarization
• Solving via Dual Wing Factor Graph Model
• Experimental Result and Case Study
• Conclusion and Future Work
18
Sentence & Social Context Scoring
Sentence & Social Context Selection
Summarized Social Context
Social Context
Modeling
Document Modeling
Testing Set
Web Document
Social Context
TrainingSet
Social Context
Web Document
Solving via Dual Wing Factor Graph Model
• Overview
– Supervised framework
Take advantage of conventional techniques in summarization
Explore the non-textual information from social context
Reinforce!
19
m5 m6 m7 m8
y5 y6 y7 y8
f5 f6 f7g56 g67
g68
f8
Solving via Dual Wing Factor Graph Model
• Modeling
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4g12 g23 g34Document
Layer
MicroblogLayer
UserLayer
f0llow
f0llow
u1
u2
u3
reply
retweetreply
d2
retweet
m6
m5
m7
m8
d1
s1 s2 s3 s4
m9
m10
Local attribute factor
Sentence dependency factor
Tweet dependency factor
20
Solving via Dual Wing Factor Graph Model
• Modeling
m5 m6 m7 m8
y5 y6 y7 y8
f5 f6 f7g56 g67
g68
f8
m5 m6 m7 m8
y5 y6 y7 y8
f5 f6 f7
g56 g67
g68
f8
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4g12 g23 g34
h15 h16 h37 h47
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4g12 g23 g34Document
Layer
MicroblogLayer
UserLayer
f0llow
f0llow
u1
u2
u3
reply
retweetreply
d2
retweet
m6
m5
m7
m8
d1
s1 s2 s3 s4
m9
m10
Local attribute factor
Sentence dependency factor
Tweet dependency factor
Cross-domain dependency factor
21
Solving via Dual Wing Factor Graph Model
• Modeling
m5 m6 m7 m8
y5 y6 y7 y8
f5 f6 f7g56 g67
g68
f8
m5 m6 m7 m8
y5 y6 y7 y8
f5 f6 f7
g56 g67
g68
f8
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4g12 g23 g34
h15 h16 h37 h47
s1 s2 s3 s4
y1 y2 y3 y4
f1 f2 f3 f4g12 g23 g34Document
Layer
MicroblogLayer
UserLayer
f0llow
f0llow
u1
u2
u3
reply
retweetreply
d2
retweet
m6
m5
m7
m8
d1
s1 s2 s3 s4
m9
m10
Local attribute factor
Sentence dependency factor
Tweet dependency factor
Cross-domain dependency factor
QUESTION
How to define these factors?
?
22
Solving via Dual Wing Factor Graph Model
• Modeling
– Local attribute factor
• Maximum entropy
– Dependency factor
• Binary value
, ,( , ) e x p
i c c i c i c if y x y
,
i f s o m e c o n d i t i o n h o ld s( , , )
1 o t h e r w i s e
c
i j c c i j
eg y y
y𝑖 ≠ 1 or y𝑗 ≠ 1
What condition?
Sentence dependency
Tweet dependency
Cross-domain dependency
y𝑖 ≤ y𝑗 y𝑖 = y𝑗
si sj mi mj si mj
Only for similarity > θg or θh
23
Solving via Dual Wing Factor Graph Model
• Modeling
– Objective function
• Parameter estimation
– L-BFGS
• Inference algorithm
– Loopy belief propagation
– Could be distributed easily!
, ,
,
1m a x ( , ) ( , , ) ( , , )
i c c i i j c c i j i j i j
i j T c C
f y g y y h y yZ
Parameters
24
Outline
• Motivation
• Prior Work
• Social Context Summarization
• Solving via Dual Wing Factor Graph Model
• Experimental Result and Case Study
• Conclusion and Future Work
25
Experimental Result and Case Study
• Data preparation
• Distribution of frequency
• Evaluate on only HQ websites
– CNN, BBC, MTV, ESPN, Mashable.
4,874,389 users
404,544,462 tweets
200,000 most frequent URLs
12,964,166 tweets
Ad. pages
26
Experimental Result and Case Study
• Annotation method
– Manual annotation on
– 1,145 HITs (Human Intelligent Tasks) from 158 different users
• Evaluation methods
– F1 and ROUGE-1,2
• Baseline methods
– Supervised: SVM, logistic regression (LR), CRF, SVM+, LR+ (include neighbors’ features)
– Unsupervised: random, Doc/ParaLead, PageRank
27
Experimental Result and Case Study
• Feature
– 6 basic features for sentences
– 5 basic features for tweets
• author’s PageRank (influence from user network)
• Result
00.10.20.30.40.50.60.7
00.10.20.30.40.50.60.7
F1 R1 R2
sentence tweet
Improved with social context Not improved
28
Experimental Result and Case Study
• Analysis on results
– Why not significantly improved on Tweet Summarization?
– Significantly improved on the MTV and ESPN domains
• Fewer documents and corresponding tweets
Users' interests Annotations Tweets
29
Experimental Result and Case Study
• Analysis on results
– Impact of thresholds θg and θh
Smaller θh More LQ relations
Greater θh Less HQ relations
Smaller θg More LQ relations
Greater θg Less HQ relations
30
Experimental Result and Case Study
• Analysis on results
– Factor contribution analysis
sentence
tweet
Sentence position in paragraph
Sentence length
Averaged TF-IDF
Tweet length
31
Experimental Result and Case Study
• Case study
This is both weird and macabre at the same time.
Weekend At Bernie's come true (or "Wochenende an Bernie" in German)
BBC News, 'Women try to take body on plane at Liverpool airport'
Please don't disturb my friend, he's dead tired <--- Lmao
Please don't disturb my friend, he's dead tired
And in comedy news today, two women try to re-enact Weekend at Bernie’s at Liverpool Airport
…
…
…
Police have arrested two women after they tried to take the body of a dead relative on to a plane at Liverpool John Lennon Airport.
The women - his widow and step-daughter - said they thought he was asleep.
"I [did not] kill my Willi.
My Willi is my god.
I [have loved] my Willi for 22 years."
And she insisted that with his eyes closed they believed he was asleep.
“A dead person you cannot carry to Germany, there are too many people checking and security.
How can you bring a dead person to Germany?”
0.50 0.55 0.50 0.49 0.59 0.50
0.49 0.50 0.50 0.48 0.50
0.50
0.50
0.49
0.51
0.50
0.51
0.51
0.51
0.51
0.49
0.49
0.61 0.55 0.50 0.39 0.76 0.52
0.50 0.55 0.50 0.49 0.59 0.50
0.50 0.50 0.52 0.50 0.50
0.51 0.50 0.51 0.50 0.50
Sentence relationTweet relationSentence localTweet localSentence-tweet relation
High-freq terms in Doc: women, dead person, body, Liverpool Airport High-freq term in Tweets: Liverpool airport, Weekend At Bernie’s
“Women try to take body on plane at Liverpool airport
additional
32
Outline
• Motivation
• Prior Work
• Social Context Summarization
• Solving via Dual Wing Factor Graph Model
• Experimental Result and Case Study
• Conclusion and Future Work
33
Conclusion and Future Work • Conclusion
– Formally defined the social context that incorporates users, user generated contents, and associated networks.
– Proposed a Dual Wing Factor Graph Model to capture the dependency among SCANs.
– Conducted experiments on real world data set, and the proposed method outperforms the baseline methods.
• Future work – Explore friendship among user network
more “elegantly”.
34
Thanks! QA
Zi Yang
July 25, 2011, SIGIR 2011, Beijing
top related