Conversation Disentanglement in Sports Discourse
Anthony Wong. 6/01/11.
![Page 1: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/1.jpg)
Conversation Disentanglement in Sports Discourse
Anthony Wong
6/01/11
![Page 2: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/2.jpg)
Importance of Topic
What is conversation disentanglement?
◦ Clustering task, dividing a transcript into a number of smaller, separate conversations
Conversation disentanglement has a couple of practical applications:
◦ Summary generation
◦ User-interface systems like automatic threading
![Page 3: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/3.jpg)
Basis of my Approach
Michael Elsner and Eugene Charniak (2008)
◦ Uses lexical and non-lexical features to cluster different threads
  ◦ Time between utterances, same speaker, number of shared words, “content” words
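As a rough illustration of the feature types listed on this slide, the sketch below computes them for one pair of utterances. It is not the Elsner/Charniak code: the `Utterance` class, the `pair_features` function, and the tiny `STOP_WORDS` list are hypothetical names, and treating "content" words as non-stop-words is an assumption.

```python
from dataclasses import dataclass

# Hypothetical stop-word list; anything not in it is treated as a "content" word.
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "in", "on", "i", "but"}


@dataclass
class Utterance:
    time: int      # timestamp of the line (unit assumed, e.g. seconds)
    speaker: str
    text: str


def tokens(text):
    """Return the set of lowercased words, with edge punctuation stripped."""
    words = (w.strip(".,!?:;'\"()@-") for w in text.lower().split())
    return {w for w in words if w}


def pair_features(a, b):
    """Features describing how likely two utterances belong to one thread."""
    shared = tokens(a.text) & tokens(b.text)
    return {
        "time_gap": abs(b.time - a.time),
        "same_speaker": a.speaker == b.speaker,
        "shared_words": len(shared),
        "shared_content_words": len(shared - STOP_WORDS),
    }
```

A classifier trained on such pairwise features could then score whether two lines belong together, with a clustering step grouping lines into threads.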
![Page 4: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/4.jpg)
Proposed Project Overview
Follow the methodology in Elsner and Charniak’s paper
◦ Create and annotate a dataset of sports discourse
Use the existing Elsner/Charniak model to provide baseline classification results and see how well their model adapts to a different chat domain
Test out different feature combinations to hopefully raise performance
(?) Compare results with the Elsner/Charniak paper in some meaningful way
![Page 5: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/5.jpg)
Progress so far
![Page 6: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/6.jpg)
Retrieving and preparing data
![Page 7: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/7.jpg)
Retrieving and preparing data
![Page 8: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/8.jpg)
Annotating the data
![Page 9: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/9.jpg)
Annotating the data
T1 715 KateC : Sam - this is going to be painful, isn't it?
T1 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
T2 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
T3 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
T3 715 Arnold : Holy impossibilities , Batman - that won't happen.
T4 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
T5 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
T5 715 ZachHarper : I don't think it works that way
T6 715 Aras : Jared!
T6 715 JaredWade : Aras.
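The annotated lines on this slide appear to follow a fixed layout: thread label, a number, speaker, a colon, then the message. A minimal parsing sketch under that reading is shown here. `parse_line` and `AnnotatedLine` are hypothetical names, and interpreting the number (715 here) as a timestamp is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Thread label, numeric field, speaker (no spaces), " : ", then the message text.
LINE_RE = re.compile(r"^(T\d+)\s+(\d+)\s+(\S+)\s+:\s+(.*)$")


class AnnotatedLine(NamedTuple):
    thread: str
    time: int      # assumed to be a timestamp
    speaker: str
    text: str


def parse_line(line: str) -> Optional[AnnotatedLine]:
    """Parse one annotated chat line; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    thread, time, speaker, text = match.groups()
    return AnnotatedLine(thread, int(time), speaker, text)
```

Returning `None` for unmatched lines, rather than raising, makes it easier to count and inspect lines that did not survive data preparation.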
![Page 10: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/10.jpg)
Annotating the data
The annotated part of this transcript has 399 lines.
177 unique threads.
The average conversation length is 2.25423728814.
The median conversation length is 2.
The entropy is 7.0155726118 bits.
The median chat has 0.0 interruptions per line.
The average block of 10 contains 6.25706940874 threads.
The line-averaged conversation density is 2.77944862155.
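Some of the statistics above can be reproduced from the thread labels alone. The sketch below is not the script that produced these numbers; `thread_stats` is a hypothetical helper, and it assumes that "entropy" means the Shannon entropy of the distribution of lines over threads. The interruption, block-of-10, and density figures are not covered.

```python
import math
from collections import Counter
from statistics import median


def thread_stats(labels):
    """Summarize an annotation given one thread label per chat line."""
    if not labels:
        raise ValueError("need at least one labelled line")
    sizes = Counter(labels)          # thread label -> number of lines
    total = len(labels)
    # Shannon entropy (in bits) of the share of lines held by each thread.
    entropy = -sum((n / total) * math.log2(n / total) for n in sizes.values())
    return {
        "lines": total,
        "threads": len(sizes),
        "mean_length": total / len(sizes),
        "median_length": median(sizes.values()),
        "entropy_bits": entropy,
    }
```

The average conversation length is simply lines divided by threads, which matches the slide: 399 / 177 ≈ 2.254.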
![Page 11: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/11.jpg)
Running Elsner model as is
T1 715 KateC : Sam - this is going to be painful, isn't it?
T2 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
T3 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
T4 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
T5 715 Arnold : Holy impossibilities , Batman - that won't happen.
T6 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
T7 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
T8 715 ZachHarper : I don't think it works that way
T9 715 Aras : Jared!
T9 715 JaredWade : Aras.
![Page 12: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/12.jpg)
Running Elsner model as is
368 unique threads.
The average conversation length is 1.08423913043.
The median conversation length is 1.
The entropy is 8.48485646504 bits.
The median chat has 0.0 interruptions per line.
The average block of 10 contains 9.52699228792 threads.
The line-averaged conversation density is 1.42355889724.
![Page 13: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/13.jpg)
Editing the model and evaluation
Still in progress
◦ A lot of room for improvement
◦ Many different feature combinations to try
Need to get evaluation code running
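One simple way to compare a model's threading against the hand annotation is a local agreement score: for every line and each of the k lines before it, check whether both annotations agree on "same thread" versus "different thread". The sketch below is in the spirit of the local metric reported by Elsner and Charniak, but the exact definition here is an assumption and `local_agreement` is a hypothetical name, not their evaluation code.

```python
def local_agreement(gold, predicted, k=3):
    """Fraction of nearby line pairs on which two annotations agree.

    gold and predicted hold one thread label per line, in chat order.
    Labels only need to be consistent within each list, not across lists.
    """
    if len(gold) != len(predicted):
        raise ValueError("annotations must cover the same lines")
    agree = 0
    total = 0
    for i in range(len(gold)):
        # Compare line i with each of the (up to) k lines before it.
        for j in range(max(0, i - k), i):
            same_gold = gold[i] == gold[j]
            same_pred = predicted[i] == predicted[j]
            agree += same_gold == same_pred
            total += 1
    return agree / total if total else 1.0
```

Because it only looks at same/different decisions, this score does not require matching thread labels between the two annotations, which suits a model that produces many more threads than the annotator did.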
![Page 14: Conversation Disentanglement in Sports Discourse](https://reader036.vdocument.in/reader036/viewer/2022062314/56814900550346895db62be5/html5/thumbnails/14.jpg)
Issues
Documentation for Elsner code is good, but my Python is not
Integration issues between my data and Elsner code
MEGA Model Optimization Package (megam)