carnegie mellon nod and multilingual status report april 1998 carnegie mellon university howard d....

CarnegieMellon

NoD and Multilingual Status Report

April 1998

Carnegie Mellon University

Howard D. Wactlar

Digital Video Library


MLI and NoD Tasks

• Data collection & preparation - English, Serbo-Croatian, and German

• Multilingual speech recognition enhancements

• Video and audio segmentation

• Multilingual indexing, retrieval, search

• Summarization-on-demand

• Annotations

• User studies

• Additional languages and functionalities

• Demonstration as a network-based service


Accomplishments to Apr 98

We are achieving what we proposed and beyond

• Advances in capability (research => integrated function)

• Infrastructure evolution & growth

• Testbed activity and extension

• Related research and outreach


Accomplishments to Apr 98 (cont’d)

• Serbo-Croatian demonstration system

• Automated and dynamic abstraction and summarization for improved navigation

• Topic detection and assignment for subject browsing

• Dynamically improved speech recognition for index generation

• Coherent story segmentation through corpus-specific, rule-based analysis

more ...


Accomplishments to Apr 98 (cont’d)

• Video-OCR for improved name/face identification

• Multi-level annotations to mark and share commentary

• Web interface enabling “slide show” viewing over slow links

• Database restructuring to enable size growth and function evolution

• Remote testbeds with access to daily updated news


Automated Abstraction and Summarization

• Critical to efficient navigation of video

• Improved automatic title generation

• Dynamic “poster frame” icons - query based

• Skims smoothed through enhanced language models and rule-based scene selection


“Naïve” Poster Frame Result List (Uses First Shot Image)


Query-based Poster Frame Result List


Query-based Poster Frame Selection Process

1. Decompose video segment into shots.

2. Compute representative frame for each shot.

3. Locate query scoring words (shown by arrows).

4. Use frame from highest scoring shot.
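
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Shot` record, the `pick_poster_frame` function, and the scoring rule (a plain count of query words spoken during each shot) are hypothetical and are not taken from the Informedia system, whose actual scoring is not described on the slide. Shot boundaries and representative frames (steps 1 and 2) are assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot of a video segment (all fields are illustrative)."""
    start: float      # shot start time in seconds
    end: float        # shot end time in seconds
    frame_id: str     # identifier of the shot's representative frame


def pick_poster_frame(shots, timed_words, query_terms):
    """Return the representative frame of the shot with the most query hits.

    timed_words: list of (time_in_seconds, word) pairs from the transcript.
    Falls back to the first shot (the "naive" choice) when nothing matches.
    """
    query = {term.lower() for term in query_terms}
    best_shot, best_score = shots[0], 0
    for shot in shots:
        # Step 3: count query words whose timestamps fall inside this shot.
        score = sum(
            1
            for time, word in timed_words
            if shot.start <= time < shot.end and word.lower() in query
        )
        if score > best_score:  # strict '>' keeps the earliest shot on ties
            best_shot, best_score = shot, score
    # Step 4: use the frame from the highest scoring shot.
    return best_shot.frame_id
```

With no matching words the function returns the first shot's frame, which reproduces the naive result list as a fallback.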


Topic Detection and Tracking

Enhances browsing and discovery over directed search

Different methods from several areas being evaluated

• Information retrieval

– vector space methods

– relevance feedback

• Speech recognition

– hidden Markov models

• Statistics

– k-nearest neighbors

– exponential models


KNN-based Topic Detection

• Build training index with pre-labeled topics

– 45000 Broadcast News stories from 1995 and 1996

– 3178 different news topics occurring > 10 times

• Search for top 10 related stories in training index

• Lookup topics for related stories

• Re-weight topics by story relevance (select top 5)

• At 5 topics: Recall = .491, Relevance = .482
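
The procedure above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the whitespace tokenizer, and the use of raw term-count cosine similarity as the "story relevance" weight are all assumptions made for the example. The defaults `k=10` and `n_topics=5` mirror the numbers on the slide.

```python
from collections import Counter, defaultdict
import math


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_topics(story, training, k=10, n_topics=5):
    """Assign topics to `story` from its k most similar labeled stories.

    training: list of (text, [topic, ...]) pairs with pre-labeled topics.
    """
    query = Counter(story.lower().split())
    scored = [
        (cosine(query, Counter(text.lower().split())), topics)
        for text, topics in training
    ]
    # Search for the top-k related stories in the training index.
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weights = defaultdict(float)
    for similarity, topics in neighbors:
        for topic in topics:
            weights[topic] += similarity  # re-weight topic by story relevance
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, weight in ranked[:n_topics] if weight > 0]
```

Summing similarities means a topic shared by several close neighbors outranks a topic attached to a single, slightly closer story; other weighting schemes would be equally plausible here.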


Speech Recognition for Index Generation

• Integrate closed captioning with speech-recognition-generated transcription

• Improve accuracy by automatic daily expansion of language model from closed captioning, e.g. “Dodi Fayed”

• Participated (with Claritech) in TREC Spoken Document track

– large text retrieval evaluation benchmarks (NIST/DARPA)

– scored second due to out-of-vocabulary (OOV) words (CIA, well-known, torched)
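
As a rough illustration of the vocabulary side of the daily expansion mentioned above, the sketch below finds words that appear in closed-caption text but are missing from the recognizer's word list, and adds them. Everything here is an assumption for the example: the function name, the regular-expression tokenizer, and the `min_count` filter are hypothetical, and a real system would also have to update language model probabilities and pronunciations, which this sketch does not attempt.

```python
import re


def expand_vocabulary(vocabulary, caption_text, min_count=2):
    """Add caption words missing from `vocabulary` that occur often enough.

    vocabulary: a set of lowercase words, updated in place.
    Returns the sorted list of newly added (previously OOV) words.
    """
    counts = {}
    for word in re.findall(r"[a-z']+", caption_text.lower()):
        counts[word] = counts.get(word, 0) + 1
    new_words = sorted(
        word for word, count in counts.items()
        if count >= min_count and word not in vocabulary
    )
    vocabulary.update(new_words)
    return new_words
```

The `min_count` threshold is one simple way to keep one-off caption typos out of the vocabulary; whether such a filter is appropriate depends on the caption quality.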


Segmentation - Creating the Video Paragraph

Break up a video stream into semantically coherent pieces

• corpus-specific analysis

• language model approaches

• video structure analysis


Segmentation - Commercial Detection

Look for several potential indicators in multiple passes

• detect lapses in closed-caption (cc) capture greater than some threshold

• occurrence of black frames

• rate of scene change and motion
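
One way to combine the indicators listed above is sketched below. It is a simplified, single-pass illustration rather than the multi-pass procedure the slide refers to, and all names and thresholds (`find_commercial_spans`, `max_caption_gap`, `min_cut_rate`) are hypothetical. The sketch assumes black-frame times, scene-cut times, and caption timestamps have already been extracted, and it ignores motion.

```python
def find_commercial_spans(black_frame_times, cut_times, caption_times,
                          max_caption_gap=10.0, min_cut_rate=0.3):
    """Return (start, end) spans, in seconds, flagged as likely commercials.

    Black frames delimit candidate spans. A span is flagged when it contains
    a caption lapse longer than max_caption_gap, or when its scene cuts per
    second reach min_cut_rate.
    """
    spans = []
    boundaries = sorted(black_frame_times)
    for start, end in zip(boundaries, boundaries[1:]):
        duration = end - start
        if duration <= 0:
            continue
        cuts = sum(1 for t in cut_times if start <= t < end)
        captions = sorted(t for t in caption_times if start <= t < end)
        # Gaps are measured between consecutive captions and the span edges.
        edges = [start] + captions + [end]
        longest_gap = max(b - a for a, b in zip(edges, edges[1:]))
        if longest_gap > max_caption_gap or cuts / duration >= min_cut_rate:
            spans.append((start, end))
    return spans
```

Treating black frames as the only possible span boundaries is a design choice of this sketch; it keeps the logic short but would miss commercials that are not bracketed by black frames.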

[Figure: Ad Removal based on Black Frame and Scene Change Detection. Timeline rows: Truth, Hypothesis, Black frames, Scene change.]


Segmentation - Language Models

Novel application to find shift in topic within a document

• Adaptive exponential language models improve as they see more material from current topic

e.g., probable distance of “managed care” to “physicians”

• Static language models are pre-computed likelihoods of short-range adjacency (e.g. trigrams)

• Compare the predictive performance of the two models

i.e., assigned probability to the next observed words

• A segment boundary is likely to exist when the adaptive model shows a dip in performance relative to the short-range model
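
The comparison described above can be illustrated with a toy example. Note the substitutions, which are assumptions made to keep the sketch short: the adaptive model here is a simple cache model (a static unigram model interpolated with counts from recent words) rather than an adaptive exponential model, and the static model is a smoothed unigram rather than a trigram. The function names and parameters are hypothetical. The trace is the per-word log ratio of the two models' probabilities; it tends to be positive inside a topic and to dip when new-topic words arrive.

```python
import math
from collections import Counter, deque


def log_ratio_trace(words, background_counts, history=50, cache_weight=0.5):
    """Return log(P_adaptive / P_static) for each word in `words`."""
    vocab_size = len(background_counts) + 1  # +1 bucket for unseen words
    total = sum(background_counts.values())
    recent = deque(maxlen=history)           # sliding window of recent words
    cache = Counter()                        # word counts inside the window
    trace = []
    for word in words:
        # Static model: add-one smoothed unigram from background counts.
        p_static = (background_counts.get(word, 0) + 1) / (total + vocab_size)
        # Adaptive model: static model mixed with the recent-word cache.
        p_cache = cache[word] / len(recent) if recent else 0.0
        p_adaptive = cache_weight * p_cache + (1 - cache_weight) * p_static
        trace.append(math.log(p_adaptive / p_static))
        if len(recent) == recent.maxlen:
            cache[recent[0]] -= 1            # oldest word leaves the window
        recent.append(word)
        cache[word] += 1
    return trace


def dip_positions(trace):
    """Indices where the log ratio drops from positive to negative."""
    return [i for i in range(1, len(trace)) if trace[i - 1] > 0 > trace[i]]
```

In practice the raw trace is noisy, so some smoothing or a trained decision rule would be needed before treating a dip as a segment boundary.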


[Figure: A plot of the ratio of the two language models as a function of the relative position in a segment. Horizontal axis: -500 to 500; vertical axis: -0.05 to 0.25.]


Video OCR

Image component crucial to news corpus

Capture of text overlaid on the video image

Detected, filtered, OCR’d, incorporated into content and indexed


Video OCR Block Diagram

Video => Text Area Detection => Text Area Preprocessing => Commercial OCR => ASCII Text


[Figure panels: Video Frames (1/2 s intervals); Filtered Frames; AND-ed Frames]
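
The AND-ing step named in the figure can be sketched as a pixel-wise logical AND over several filtered frames. This is an illustration only: the function name and the frame representation (lists of rows of 0/1 values) are assumptions, and the filtering that produces the binary frames is not shown. The presumed rationale, which the slide does not state, is that overlaid text stays in place across frames sampled at 1/2 s intervals while the background changes, so only persistent pixels survive the AND.

```python
def and_frames(binary_frames):
    """Pixel-wise AND of equally sized binary frames (lists of rows of 0/1)."""
    result = [row[:] for row in binary_frames[0]]  # copy the first frame
    for frame in binary_frames[1:]:
        for y, row in enumerate(frame):
            for x, pixel in enumerate(row):
                result[y][x] &= pixel              # keep pixels set in every frame
    return result
```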


Text Detection False Alarms

[Figure panels: Video Frame; Filtered and AND-ed Frame]


Text Detection Misses

[Figure panels: Video Frame; Filtered and AND-ed Frame]


Challenges for VOCR Preprocessing

• The resolution of video text is very low (<10×10 pixels per character).

• Text detection and extraction are complicated by complex backgrounds.


VOCR Preprocessing Problems


Video OCR - Results

Character recognition - 83%

Word recognition - 70%

Language model post-processing will improve word recognition rate, but new names and places will not be in the language model

Important adjunct to Name-It: name/face correlation through co-occurrence matrices


Annotations

Annotation fields contain metadata automatically derived from the content (e.g. topics, chyron)

Annotations are included in the index (searchable separately or combined with transcript)

Personal annotations are typed or spoken comments that are established on a per-user basis

• bookmarking or commentary

• fully indexed and searchable with other data


Web Interface

Long-time concern about video fidelity on the Internet

Compromise is a slide show of high-quality JPEG images and continuous audio

Not all navigation tools translate directly

Required substantive change in interface specification

Browsing improved over full video interface

User effectiveness versus full video to be explored


Infrastructure Evolution and Growth

Conversion of underlying database architecture (ONGOING)

• extends functionality

– e.g. date filtering => “What’s new?” query

• improved interoperability

– fully distributed, replicated function

• increased scale

• negative impact on query performance (improving)

Summer-long ruggedization program for reliable processing and quality control

900 hours on-line, terabyte data store

12 Alphas for parallel processing (and experiments)


Testbeds

Corpus

• CNN data: 620 hours + 12 hrs/wk

– Early Prime, World View, Impact, Science & Technology Week, Earth Matters, Travel Guide, Your Health

Distant high speed network access

• Informedia-Net attached to both vBNS and AAI nets

• enables attachment of clients to CMU servers from selected locations

• clients at DARPA, SPAWAR (forthcoming), NSA


Serbo-Croatian LVCSR on the Dictation and Broadcast News Domain

• Informedia (English)

– CMU Informedia Group (Howard Wactlar, Alex Hauptmann, Ricky Houghton, et al.)

– CMU Sphinx Group

• Multilingual Speech Recognition

– CMU/UKA Interactive Systems Labs - JanusRTk (Alex Waibel, Michael Finke, Petra Geutner, Peter Scheytt)

• Translation/Cross Language Retrieval

– CMU Language Technologies Institute (Jaime Carbonell, Eric Nyberg, Bob Frederking, Paul Kennedy, et al.)


Serbo-Croatian Broadcast News Recognition

• Initial database: Globalphone Serbo-Croatian (UKA)

• Broadcast news: Collected by satellite from Germany (UKA)

• 15 hours transcribed

• Janus recognition toolkit: 15 languages

• Janus applied to Serbo-Croatian broadcast news

• Problem: Morphology, large number of inflections

• Competitive performance already: 26% WER
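
For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard word-level edit distance. The function name and the whitespace tokenization are assumptions for the example; the scoring tool actually used for the figure above is not stated in the slides.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing in output
            insertion = current[j - 1] + 1   # extra word in output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Because every inflected form counts as a distinct word, a highly inflected language is penalized for any wrong ending, which is one reason morphology is listed above as a problem.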


Vocabulary Growth Per Broadcast

[Figure: Broadcast News System. Vertical axis: Words (0 to 25000); horizontal axis: News Broadcasts (1 to 21).]


Serbo-Croatian BN Speech Performance

[Figure: Broadcast News System, WER [%] by month. Plotted values in chart order: August 73.6, September 43.6, October 36.0, December 29.5, January 26.0. Chart annotations: Language Normalization; Hypothesis Driven Lexicon Adaptation.]


Proposed National Research Data Testbed

Informedia dataset and infrastructure as a benchmarkable testbed for research in spoken language and visual documents

Potential for establishing on-line public domain video archive

• e.g. all government-produced video for training and public information

• fully indexed and searchable


Project Genoa Contributions

• Code to extract video to place in a CIP

• Processing changes to index I-frames

• Code to run Web browser to play the MPEG segment

• Working towards a generic Web-based interface

• Other CMU: Meeting browser

• Full access to client but not full source code


[Figure: network architecture diagram. Recoverable labels: CMU Informedia Server and CMU Informedia Client (NOD), Pittsburgh, PA; CrisisBrowse Client (SpIKE/Visage/NOD?); CrisisBrowse Server; Netscape; Mass Storage; CIP Server; Starlight; BWD; JTF Planner; MIDB (S), Sybase; MDITDS (S), Sybase; JEDS; OSIS (U); CIA Factbook (U); JANES (U); HPKB (U); World Energy Database (U), Access; WWW (U); Intelink-S. Sites: DIA, Wash, DC; CIA, Langley, VA; SAIC, San Diego, CA; DARPA TIE, Arlington, VA. Links: Internet, SIPRNET, DISN LES, dial-in, Network Neighborhood, http. Content types: mpeg, jpeg, txt, html. Classification legend: Pseudo-TS/SCI, Secret, Unclassified.]


Future Plans - Near Term

• Complete full-function Web interface

• Foreign language system unification

• S-C language models for improved query and selection

• S-C segmentation

• System completeness, robustness

• Should we pursue?

– Regular capture & processing

– Delivery to testbeds


Future Plans - Long Term

• NSA’s formal evaluation will help guide modifications and new features

• Other languages - Korean? Chinese?

• Translation? Translation tools?

• Named entity extraction: people, places, faces

• Geospatial correlation and visualization

• More content and multiple sources

• Multidocument summarization
