CMU TDT Report
12-13 November 2001
The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Chun Jin, Jian Zhang
Language Technologies Institute, CMU




Time Line for TDT Activities
(Re)Start: Summer 2001
Baseline FSD, Link, Det: Sept 2001
Evaluation (of baseline): Oct 2001
New Techniques: Nov 2001 onwards
    Topic-conditional Novelty
    Situated NE's (all tasks)
    Source-conditional interpolated training


Baseline FSD Method
(Unconditional) dissimilarity with past
Decision threshold on most-similar story
(Linear) temporal decay
Length-filter (for teasers)
Cosine similarity with standard weights:

$\mathrm{tfidf} = (1 + \log(\mathrm{tf})) \cdot \log(N / \mathrm{idf})$
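The baseline above can be sketched in a few lines of Python. This is a minimal illustration, not the CMU system. It assumes the denominator in the weighting formula is the term's document frequency, that the linear decay scales similarity by a story's position in a fixed look-back window, and that stories below a minimum length (teasers) are never flagged as first stories. The names `tfidf_vector`, `is_first_story`, `threshold`, `window`, and `min_len`, and all default values, are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Weight each term as (1 + log tf) * log(N / df).

    Assumption: the slide's denominator is read as document frequency.
    Terms with no recorded document frequency are skipped.
    """
    counts = Counter(tokens)
    return {
        term: (1.0 + math.log(tf)) * math.log(n_docs / doc_freq[term])
        for term, tf in counts.items()
        if doc_freq.get(term, 0) > 0
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_first_story(new_vec, new_len, past, threshold=0.2, window=100, min_len=10):
    """Flag a story as new if its best decayed similarity to the past is low.

    past: vectors of earlier stories, oldest first.
    threshold, window, min_len: hypothetical values, to be tuned on held-out data.
    """
    if new_len < min_len:  # length filter: very short teasers are not flagged
        return False
    best = 0.0
    for age, vec in enumerate(reversed(past[-window:])):
        decay = 1.0 - age / window  # linear decay; the most recent story has age 0
        best = max(best, decay * cosine(new_vec, vec))
    return best < threshold
```

The decision depends only on the single most similar past story, so one near-duplicate in the window is enough to suppress a first-story flag.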


FSD Results

              Story weighted   Topic weighted
P(miss)       .6028            .6028
P(F/A)        .0207            .0186
Cost          .0141            .0143
Norm Cost     .7043            .7217
Opt N. Cost   .6807            .6807
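For readers who want to relate the Cost and Norm Cost rows to the miss and false-alarm rates, here is a small sketch of the detection cost as it is customarily defined in TDT evaluations. The constants (`c_miss=1.0`, `c_fa=0.1`, `p_target=0.02`) are not stated on the slide and are assumed here. With them, the story-weighted column is reproduced to within rounding. The topic-weighted column is presumably an average of per-topic costs, so it is not expected to follow from the two pooled rates shown.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Expected cost of misses and false alarms (constants are assumed)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Cost divided by the cheaper of the two trivial systems
    (always say "new" or never say "new")."""
    baseline = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / baseline


# Story-weighted rates from the results table.
cost = detection_cost(0.6028, 0.0207)
norm = normalized_cost(0.6028, 0.0207)
print(round(cost, 4), round(norm, 4))  # prints 0.0141 0.7042
```

The last digit of the normalized value differs from the table (.7043), most likely because the table was computed from unrounded rates.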


Comparative FSD DET Curves


FSD Observations
Cross-site comparable baselines (cost = .7)
Data/labeling issues (from error analysis)
    "Events-vs-Topics" issue (e.g. Asia crisis)
    A few mislabeled stories wreak havoc for FSD
    Eager auto-segmentation a problem (misses)
Recommendations for TDT labeling
    FSD on true events, or events within topic(s)
    Change auto-segmentation optimality criterion??
Recommendations for TDT researchers
    Keep working hard on FSD: not cracked yet


New FSD Directions
Topic-conditional models
    E.g. "airplane," "investigation," "FAA," "FBI," "casualties": topic, not event
    "TWA 800," "March 12, 1997": event
    First categorize into topic, then use maximally-discriminative terms within topic
Rely on situated named entities
    E.g. "Arcan as victim," "Sharon as peacemaker"
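One way to picture "maximally-discriminative terms within topic" is to recompute inverse document frequency over only the stories already assigned to a topic. Terms shared by the whole topic then get little or no weight, and event-specific terms stand out. The slide does not say how discriminative terms are chosen, so this is an assumed stand-in rather than the method actually proposed. The function name and the toy stories (with placeholder tokens `crash_b` and `crash_c`) are made up for illustration.

```python
import math
from collections import Counter


def within_topic_idf(topic_stories):
    """Compute log(n / df) using only stories assigned to one topic.

    topic_stories: list of token lists (hypothetical pre-categorized input).
    """
    n = len(topic_stories)
    df = Counter(term for story in topic_stories for term in set(story))
    return {term: math.log(n / count) for term, count in df.items()}


# Toy "airline accident" topic: four made-up stories about three events.
airline_topic = [
    ["airplane", "investigation", "faa", "twa800"],
    ["airplane", "investigation", "fbi", "twa800"],
    ["airplane", "investigation", "faa", "crash_b"],
    ["airplane", "casualties", "fbi", "crash_c"],
]
idf = within_topic_idf(airline_topic)

# "airplane" appears in every story of the topic, so its weight is zero;
# event-specific tokens such as "twa800" keep a positive weight.
for term in ("airplane", "investigation", "twa800", "crash_b"):
    print(term, round(idf[term], 3))
```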


A New Approach to First Story Detection for TDT


Baseline Story-Link Detection
Use same term-weighting and cosine similarity as FSD and detection
Decision thresholds conditioned on language and source
    Lower threshold for cross-language
    Lower threshold cross-ASR/newswire
Thresholds trained on development set
15% improvement over universal threshold
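The conditioned thresholds described above can be sketched as a lookup keyed by the kind of story pair, with each threshold fit separately on development data. The story representation (dicts with `lang` and `source` keys), the condition names, and the training criterion (fewest classification errors) are all assumptions. The slide does not state which criterion was optimized.

```python
def pair_condition(story_a, story_b):
    """Classify a story pair by how its two sides differ (hypothetical scheme)."""
    if story_a["lang"] != story_b["lang"]:
        return "cross_language"
    if story_a["source"] != story_b["source"]:  # e.g. ASR vs. newswire
        return "cross_source"
    return "same"


def train_threshold(dev_pairs):
    """Pick the observed similarity that makes the fewest errors as a cutoff.

    dev_pairs: list of (similarity, is_linked) from a development set.
    """
    candidates = sorted({sim for sim, _ in dev_pairs})

    def errors(th):
        return sum((sim >= th) != label for sim, label in dev_pairs)

    return min(candidates, key=errors)


def train_conditioned(dev_set):
    """Fit one threshold per pair condition.

    dev_set: list of (story_a, story_b, similarity, is_linked).
    """
    by_condition = {}
    for a, b, sim, label in dev_set:
        by_condition.setdefault(pair_condition(a, b), []).append((sim, label))
    return {cond: train_threshold(pairs) for cond, pairs in by_condition.items()}


def linked(story_a, story_b, similarity, thresholds):
    """Declare a link if similarity clears the threshold for this pair type."""
    return similarity >= thresholds[pair_condition(story_a, story_b)]
```

Because cross-language and cross-source pairs tend to score lower even when they discuss the same topic, fitting them separately lets their cutoffs drop without loosening the same-condition cutoff.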


Primary Link


CMU Link


CMU2 Link


CMU Detection

               Auto-segmented boundaries   Pre-established boundaries
Cdet (basic)   .0076                       .0063
Cdet (norm)    .3786                       .3138

Incremental Retrospective Clustering
Group-Average in Forward Deferral Window
Same cosine similarity and term weighting as FSD
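Group-average clustering over one deferral window can be sketched as follows. The code repeatedly merges the two clusters with the highest mean pairwise cosine similarity until no pair exceeds a threshold. It covers only the stories buffered in a single window. How window clusters are merged with earlier ones is not described on the slide and is left out. The function names and the `threshold` default are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_average(cluster_a, cluster_b):
    """Mean cosine similarity over all cross-cluster story pairs."""
    sims = [cosine(u, v) for u in cluster_a for v in cluster_b]
    return sum(sims) / len(sims)


def cluster_window(vectors, threshold=0.3):
    """Agglomeratively cluster the stories buffered in one deferral window.

    threshold is a hypothetical stopping value, to be tuned on held-out data.
    """
    clusters = [[v] for v in vectors]
    while len(clusters) > 1:
        # Find the most similar pair of clusters (i < j by construction).
        (i, j), best = max(
            (((i, j), group_average(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Deferring the decision until the window closes lets later stories in the window influence how earlier ones are grouped, which a strictly online assignment cannot do.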