
CMU TDT Report 12-13 November 2001

The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Chun Jin, Jian Zhang

Language Technologies Institute, CMU

Time Line for TDT Activities
  (Re)Start: Summer 2001
  Baseline FSD, Link, Det: Sept 2001
  Evaluation (of baseline): Oct 2001
  New Techniques: Nov 2001 onwards
    Topic-conditional novelty
    Situated NEs (all tasks)
    Source-conditional interpolated training

Baseline FSD Method
  (Unconditional) dissimilarity with past
  Decision threshold on most-similar story
  (Linear) temporal decay
  Length filter (for teasers)
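The baseline steps listed above can be sketched roughly as follows. This is a minimal illustration and not the CMU system: the function name, the default threshold, the decay window, the minimum token count, and the rule that short stories are never flagged are all assumptions, and `jaccard` is only a stand-in for the cosine similarity the slides describe.

```python
from typing import Callable, List, Sequence, Tuple


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (token-set overlap); the slides use tf-idf cosine."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def first_story_flags(
    stories: Sequence[str],
    similarity: Callable[[str, str], float] = jaccard,
    threshold: float = 0.3,   # hypothetical decision threshold
    window: int = 100,        # hypothetical: similarity decays linearly to 0 over this many stories
    min_tokens: int = 5,      # hypothetical length filter for teasers
) -> List[bool]:
    """Return True for each story judged to be a first story."""
    flags: List[bool] = []
    past: List[Tuple[int, str]] = []  # (position in stream, text) of stories kept as history
    for i, story in enumerate(stories):
        # Length filter: very short stories (teasers) are never flagged (assumed behaviour).
        if len(story.split()) < min_tokens:
            flags.append(False)
            continue
        # Find the most similar past story, discounting older stories linearly.
        best = 0.0
        for j, old in past:
            decay = max(0.0, 1.0 - (i - j) / window)
            best = max(best, decay * similarity(story, old))
        # The story is novel if even its best match falls below the threshold.
        flags.append(best < threshold)
        past.append((i, story))
    return flags
```

In this sketch the decision depends only on the single most similar past story, which mirrors the "decision threshold on most-similar story" item; a real system would tune the threshold and window on development data.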

  Cosine similarity with standard weights:
    tfidf = (1 + log(tf)) * log(N / idf)
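As an illustration of the weighting and similarity named above, here is a small sketch. It assumes the usual form of the weight, (1 + log tf) * log(N / df), where df is the number of documents containing the term; the symbol in the denominator of the slide's formula may have meant something slightly different. The function names and the dictionary-based vector representation are choices made for this example only.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def tfidf_vector(tokens: Iterable[str], doc_freq: Dict[str, int], n_docs: int) -> Dict[str, float]:
    """Weight each term as (1 + log tf) * log(N / df); terms with unknown df are skipped."""
    vec: Dict[str, float] = {}
    for term, tf in Counter(tokens).items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # no document-frequency statistics for this term
        vec[term] = (1.0 + math.log(tf)) * math.log(n_docs / df)
    return vec


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A term that occurs in every document gets weight log(N / N) = 0 and so contributes nothing to the similarity, which is the intended effect of the inverse-document-frequency factor.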

FSD Results

                Story weighted   Topic weighted
P(miss)         .6028            .6028
P(F/A)          .0207            .0186
Cost            .0141            .0143
Norm Cost       .7043            .7217
Opt N. Cost     .6807            .6807
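For readers unfamiliar with how the cost rows relate to the error rates, the following sketch computes a detection cost and its normalized form. The constants (miss cost 1.0, false-alarm cost 0.1, target prior 0.02) are not given on the slide; they are assumed here as the customary TDT evaluation settings, and the function names are made up for this example. With these assumptions, the story-weighted P(miss) and P(F/A) above reproduce the reported story-weighted cost to within rounding.

```python
def detection_cost(p_miss: float, p_fa: float,
                   c_miss: float = 1.0,      # assumed cost of a miss
                   c_fa: float = 0.1,        # assumed cost of a false alarm
                   p_target: float = 0.02    # assumed prior probability of a target
                   ) -> float:
    """Expected cost: weighted sum of the miss and false-alarm probabilities."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def normalized_cost(cost: float,
                    c_miss: float = 1.0,
                    c_fa: float = 0.1,
                    p_target: float = 0.02) -> float:
    """Divide by the cost of the better trivial system (always 'yes' or always 'no')."""
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))


story_cost = detection_cost(0.6028, 0.0207)
story_norm = normalized_cost(story_cost)
```

The topic-weighted column averages per-topic figures, so it is not expected to follow from the pooled error rates in the same way.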

Comparative FSD DET Curves

FSD Observations
  Cross-site comparable baselines (cost = .7)
  Data/labeling issues (from error analysis)
    “Events-vs-Topics” issue (e.g. Asia crisis)
    A few mislabeled stories wreak havoc for FSD
    Eager auto-segmentation a problem (misses)
  Recommendations for TDT labeling
    FSD on true events, or events within topic(s)
    Change auto-segmentation optimality criterion?
  Recommendations for TDT researchers
    Keep working hard on FSD; it is not cracked yet

New FSD Directions
  Topic-conditional models
    E.g. “airplane,” “investigation,” “FAA,” “FBI,” “casualties” indicate the topic, not the event
    “TWA 800,” “March 12, 1997” indicate the event
    First categorize into topic, then use maximally-discriminative terms within topic
  Rely on situated named entities
    E.g. “Arcan as victim,” “Sharon as peacemaker”

A New Approach to First Story Detection for TDT

Baseline Story-Link Detection
  Use same term weighting and cosine similarity as FSD and detection
  Decision thresholds conditioned on language and source
    Lower threshold for cross-language
    Lower threshold for cross-ASR/newswire
    Thresholds trained on development set
    15% improvement over universal threshold
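One way to picture condition-dependent thresholds is the sketch below. It is an illustration under assumptions, not the CMU procedure: the condition key (same language, same source type), the error criterion used for training (an unweighted count of misses plus false alarms), and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Condition = Tuple[bool, bool]  # (same_language, same_source_type); hypothetical key


def train_thresholds(dev: Iterable[Tuple[float, Condition, bool]]) -> Dict[Condition, float]:
    """For each condition, pick the threshold with the fewest errors on development pairs."""
    by_cond: Dict[Condition, List[Tuple[float, bool]]] = defaultdict(list)
    for sim, cond, linked in dev:
        by_cond[cond].append((sim, linked))
    thresholds: Dict[Condition, float] = {}
    for cond, pairs in by_cond.items():
        best_t, best_err = None, None
        # Candidate thresholds are the observed similarity scores themselves.
        for t in sorted({s for s, _ in pairs}):
            err = sum((s >= t) != linked for s, linked in pairs)
            if best_err is None or err < best_err:
                best_t, best_err = t, err
        thresholds[cond] = best_t
    return thresholds


def link_decision(sim: float, lang_a: str, lang_b: str, src_a: str, src_b: str,
                  thresholds: Dict[Condition, float], default: float) -> bool:
    """Declare a link if the similarity clears the threshold for this pair's condition."""
    key = (lang_a == lang_b, src_a == src_b)
    return sim >= thresholds.get(key, default)
```

Cross-language pairs tend to score lower because of translation noise, so a trained threshold for that condition would typically come out lower than the same-language one, consistent with the slide.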

Primary Link

CMU Link

CMU2 Link

CMU Detection

                Auto-segmented boundaries   Pre-established boundaries
Cdet (basic)    .0076                       .0063
Cdet (norm)     .3786                       .3138

Incremental retrospective clustering
Group-average in forward deferral window
Same cosine similarity and term weights as FSD
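The slide gives only the outline of the detection method, so the following is a loose sketch of one possible reading: stories are buffered in a deferral window, clustered within the window by group-average agglomerative merging, and the resulting groups are then attached to earlier clusters or started as new ones. The batching scheme, the single shared threshold, and all function names are assumptions; `sim` would be the tf-idf cosine in the setting described.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def group_average(a: Sequence[str], b: Sequence[str], sim: Callable[[str, str], float]) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))


def cluster_batch(items: Sequence[str], sim: Callable[[str, str], float],
                  threshold: float) -> List[List[str]]:
    """Agglomerative group-average clustering of one deferral window."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            score = group_average(clusters[i], clusters[j], sim)
            if score >= best_score:
                best_pair, best_score = (i, j), score
        if best_pair is None:
            break  # no pair is similar enough to merge
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def incremental_clustering(stream: Sequence[str], sim: Callable[[str, str], float],
                           threshold: float, window: int) -> List[List[str]]:
    """Process the stream one deferral window at a time, growing a global cluster list."""
    clusters: List[List[str]] = []
    for start in range(0, len(stream), window):
        for new in cluster_batch(stream[start:start + window], sim, threshold):
            best_idx, best_score = None, threshold
            for idx, old in enumerate(clusters):
                score = group_average(new, old, sim)
                if score >= best_score:
                    best_idx, best_score = idx, score
            if best_idx is None:
                clusters.append(list(new))       # start a new cluster
            else:
                clusters[best_idx].extend(new)   # attach to the closest existing cluster
    return clusters
```

Deferring the decision until the window is full lets later stories in the same window influence how an earlier story is grouped, at the price of delayed output.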
