Carnegie Mellon NoD and Multilingual Status Report, April 1998. Carnegie Mellon University, Howard D. Wactlar. Digital Video Library


Page 1

Carnegie Mellon

NoD and Multilingual Status Report, April 1998

Carnegie Mellon University, Howard D. Wactlar

Digital Video Library

Page 2

MLI and NoD Tasks

• Data collection & preparation: English, Serbo-Croatian, and German

• Multilingual speech recognition enhancements

• Video and audio segmentation

• Multilingual indexing, retrieval, search

• Summarization-on-demand

• Annotations

• User studies

• Additional languages and functionalities

• Demonstration as a network-based service

Page 3

Accomplishments to Apr 98

We are achieving what we proposed and beyond

• Advances in capability (research => integrated function)

• Infrastructure evolution & growth

• Testbed activity and extension

• Related research and outreach

Page 4

Accomplishments to Apr 98 (cont’d)

• Serbo-Croatian demonstration system

• Automated and dynamic abstraction and summarization for improved navigation

• Topic detection and assignment for subject browsing

• Dynamically improved speech recognition for index generation

• Coherent story segmentation through corpus-specific, rule-based analysis

more ...

Page 5

Accomplishments to Apr 98 (cont’d)

• Video-OCR for improved name/face identification

• Multi-level annotations to mark and share commentary

• Web interface enabling “slide show” viewing over slow links

• Database restructuring to enable size growth and function evolution

• Remote testbeds with access to daily updated news

Page 6

Automated Abstraction and Summarization

• Critical to efficient navigation of video

• Improved automatic title generation

• Dynamic, query-based “poster frame” icons

• Skims smoothed through enhanced language models and rule-based scene selection


Page 8

“Naïve” Poster Frame Result List (Uses First Shot Image)

Page 9

Query-based Poster Frame Result List

Page 10

Query-based Poster Frame Selection Process

1. Decompose the video segment into shots.

2. Compute a representative frame for each shot.

3. Locate query-scoring words (shown by arrows).

4. Use the frame from the highest-scoring shot.
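The four selection steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Informedia implementation: shots are assumed to be already detected (step 1), each carries a hypothetical `frame_id` standing in for its representative frame (step 2) plus the transcript words spoken during it, and a shot is scored simply by counting query-word occurrences.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    # Hypothetical shot record: frame_id stands in for the representative
    # frame from step 2; words are the transcript words spoken during the shot.
    frame_id: str
    words: list = field(default_factory=list)


def score_shot(shot, query_terms):
    """Step 3: count transcript words in the shot that match a query term."""
    return sum(1 for w in shot.words if w.lower() in query_terms)


def select_poster_frame(shots, query):
    """Step 4: return the representative frame of the highest-scoring shot.

    Falls back to the first shot's frame (the "naive" poster frame)
    when no shot contains a query word.
    """
    if not shots:
        raise ValueError("segment has no shots")
    query_terms = {t.lower() for t in query.split()}
    best = max(shots, key=lambda s: score_shot(s, query_terms))
    if score_shot(best, query_terms) == 0:
        return shots[0].frame_id
    return best.frame_id


# Step 1 (shot decomposition) is assumed done; these shots are made-up examples.
shots = [
    Shot("f001", ["the", "president", "spoke"]),
    Shot("f045", ["weather", "in", "the", "midwest"]),
    Shot("f090", ["midwest", "flooding", "relief"]),
]
print(select_poster_frame(shots, "midwest flooding"))  # f090
```

Falling back to the first shot keeps the result identical to the “naive” poster frame whenever the query gives no evidence.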

Page 11

Topic Detection and Tracking

Enhances browsing and discovery beyond directed search

Methods from several areas are being evaluated:

• Information retrieval: vector space methods, relevance feedback

• Speech recognition: hidden Markov models

• Statistics: k-nearest neighbors, exponential models

Page 12

KNN-based Topic Detection

• Build a training index with pre-labeled topics

- 45,000 Broadcast News stories from 1995 and 1996

- 3,178 distinct news topics, each occurring more than 10 times

• Search the training index for the top 10 related stories

• Look up the topics of the related stories

• Re-weight topics by story relevance (select the top 5)

• At 5 topics: Recall = 0.491, Relevance = 0.482
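The KNN procedure above can be sketched as follows. The sketch assumes stories are bag-of-words term-frequency vectors compared by cosine similarity (one common vector space choice; the slide does not state the weighting), and that a topic's weight is the sum of the similarities of the neighbors labeled with it. `k=10` and `top_n=5` follow the slide; the function names and example data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_topics(story, training, k=10, top_n=5):
    """Assign topics to `story` from its k nearest pre-labeled training stories.

    story:    Counter of term frequencies for the new story
    training: list of (Counter, set_of_topics) pairs (the training index)
    Returns up to top_n topics ranked by summed neighbor similarity.
    """
    # Search the training index for the k most similar stories.
    neighbors = sorted(
        ((cosine(story, vec), topics) for vec, topics in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Look up each neighbor's topics and re-weight them by story relevance.
    weights = Counter()
    for sim, topics in neighbors:
        for topic in topics:
            weights[topic] += sim
    return [topic for topic, w in weights.most_common(top_n) if w > 0]


# Made-up three-story training index.
training = [
    (Counter("jury verdict trial simpson".split()), {"Simpson trial"}),
    (Counter("trial defense jury".split()), {"Simpson trial", "courts"}),
    (Counter("budget congress vote".split()), {"federal budget"}),
]
story = Counter("simpson jury deliberates verdict".split())
print(knn_topics(story, training, k=2, top_n=2))  # ['Simpson trial', 'courts']
```

Summing similarities lets a topic shared by several close neighbors outrank one that appears on a single, slightly closer story.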

Page 13

Speech Recognition for Index Generation

• Integrate closed captioning with transcripts generated by speech recognition

• Improve accuracy by automatically expanding the language model every day with words from closed captioning, e.g., “Dodi Fayed”

• Participated (with Claritech) in the TREC Spoken Document track

– large-scale text retrieval evaluation benchmarks (NIST/DARPA)

– placed second; the shortfall was due to out-of-vocabulary (OOV) words (e.g., CIA, well-known, torched)
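The daily vocabulary expansion can be illustrated with a small sketch. It assumes the recognizer's vocabulary is a plain set of words and that a caption word is added once it appears at least `min_count` times in the day's captions; `min_count`, the tokenizer, and all function names are hypothetical. A real system would also need pronunciations and language-model probabilities for each new word, which are not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; keeps internal hyphens and apostrophes (e.g. well-known)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def oov_rate(words, vocabulary):
    """Fraction of tokens not covered by the vocabulary."""
    if not words:
        return 0.0
    return sum(1 for w in words if w not in vocabulary) / len(words)


def expand_vocabulary(vocabulary, captions, min_count=2):
    """Return (extended vocabulary, newly added words) using frequent OOV caption words."""
    counts = Counter(tokenize(captions))
    new_words = {w for w, c in counts.items() if w not in vocabulary and c >= min_count}
    return vocabulary | new_words, new_words


# Made-up vocabulary and caption text for one day.
vocab = {"the", "police", "said", "car", "was", "in", "paris"}
captions = "Dodi Fayed was in the car. Police said Dodi Fayed and the driver."
new_vocab, added = expand_vocabulary(vocab, captions)
print(sorted(added))  # ['dodi', 'fayed']
```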

Page 14

Segmentation - Creating the Video Paragraph

Break up a video stream into semantically coherent pieces

• corpus-specific analysis

• language model approaches

• video structure analysis

Page 15

Segmentation - Commercial Detection

Look for several potential indicators in multiple passes:

• lapses in closed-caption (CC) capture longer than some threshold

• occurrence of black frames

• rate of scene change and motion
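A minimal sketch of combining the three indicators follows. It assumes per-frame features have already been extracted (mean brightness, whether closed-caption text is present, and whether a scene cut occurs at that frame). The thresholds, the window length, and the "two of three indicators" rule are hypothetical, not the values used by Informedia, and for brevity the sketch evaluates all indicators in a single pass per window rather than in multiple passes.

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    # Hypothetical per-frame measurements for one sampled frame.
    mean_luma: float   # average brightness, 0-255
    has_caption: bool  # closed-caption text active at this frame
    scene_cut: bool    # a shot boundary was detected at this frame


def is_black(frame, luma_threshold=20.0):
    """A frame counts as black when its mean brightness is below the threshold."""
    return frame.mean_luma < luma_threshold


def longest_cc_gap(window):
    """Length of the longest run of frames without closed-caption text."""
    longest = run = 0
    for frame in window:
        run = 0 if frame.has_caption else run + 1
        longest = max(longest, run)
    return longest


def commercial_windows(frames, window=10, cc_gap=6, cut_rate=0.3):
    """Return one flag per non-overlapping window: True if it looks like an ad.

    A window is flagged when at least two of three indicators fire:
    a closed-caption lapse, a black frame, or a high scene-change rate.
    """
    flags = []
    for start in range(0, len(frames), window):
        w = frames[start:start + window]
        indicators = [
            longest_cc_gap(w) >= cc_gap,
            any(is_black(f) for f in w),
            sum(f.scene_cut for f in w) / len(w) >= cut_rate,
        ]
        flags.append(sum(indicators) >= 2)
    return flags


# Made-up example: ten captioned news frames, then ten uncaptioned frames
# that start with a black frame and cut every other frame.
news = [FrameFeatures(120.0, True, False) for _ in range(10)]
ad = [FrameFeatures(5.0 if i == 0 else 140.0, False, i % 2 == 0) for i in range(10)]
print(commercial_windows(news + ad))  # [False, True]
```

Requiring agreement between indicators is one way to avoid flagging, for example, a single fade to black inside a news story.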

Page 16

Ad Removal based on Black Frame and Scene Change Detection

[Figure: ground-truth vs. hypothesized ad segments, with black frames and scene changes marked]

Page 17

Segmentation - Language Models

A novel application: finding a shift in topic within a document

• Adaptive exponential language models improve as they see more material from the current topic

e.g., how likely “physicians” is to appear within some distance of “managed care”

• Static language models precompute the likelihood of short-range adjacency (e.g., trigrams)

• Compare the predictive performance of the two models

i.e., the probability each assigns to the next observed words

• A segment boundary is likely where the adaptive model’s performance dips relative to the short-range model
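The boundary test can be sketched as follows. The sketch assumes each model has already assigned a probability to every observed word (training the adaptive exponential model and the trigram model is out of scope). It scores each position by the log ratio of adaptive to static probability, smooths that score over a hypothetical window, and reports local minima below a hypothetical threshold as candidate boundaries near the dip.

```python
import math


def log_ratio(p_adaptive, p_static):
    """Per-word log ratio of adaptive to static model probability."""
    return [math.log(a / s) for a, s in zip(p_adaptive, p_static)]


def smooth(values, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def candidate_boundaries(p_adaptive, p_static, window=3, threshold=0.0):
    """Positions where the adaptive model dips relative to the static model."""
    scores = smooth(log_ratio(p_adaptive, p_static), window)
    return [
        i for i in range(1, len(scores) - 1)
        if scores[i] < threshold
        and scores[i] <= scores[i - 1]
        and scores[i] <= scores[i + 1]
    ]


# Made-up probabilities: the adaptive model beats the static one except
# for three words where the topic apparently changes.
p_static = [0.25] * 10
p_adaptive = [0.5] * 4 + [0.125] * 3 + [0.5] * 3
print(candidate_boundaries(p_adaptive, p_static))  # [5]
```

Working with the log ratio keeps the test symmetric: a word the adaptive model predicts twice as well scores as far above zero as one it predicts half as well scores below.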

Page 18

A plot of the ratio of the two language models as a function of the relative position in a segment (x-axis: relative position, -500 to 500; y-axis: ratio, -0.05 to 0.25).

Page 19

Video OCR

The image component is crucial to the news corpus

Text overlaid on the video image is captured:

detected, filtered, OCR’d, incorporated into the content, and indexed

Page 20

Video OCR Block Diagram

Video => Text Area Detection => Text Area Preprocessing => Commercial OCR => ASCII Text

Page 21

[Figure: video frames sampled at 1/2 s intervals, the filtered frames, and the AND-ed frames]
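The AND step relies on overlaid text staying fixed across consecutive sampled frames while the background changes. A minimal sketch, assuming each frame has already been filtered into a binary mask of text-like pixels (the filter itself is not shown and the function name is hypothetical): a pixel survives only if it is set in every frame of the group.

```python
def and_frames(masks):
    """AND together binary text-candidate masks from consecutive frames.

    masks: list of equally sized 2-D lists of 0/1 values (rows of pixels),
           e.g. filtered frames sampled at 1/2 s intervals.
    Returns a mask keeping only pixels set in every frame.
    """
    if not masks:
        raise ValueError("need at least one mask")
    rows, cols = len(masks[0]), len(masks[0][0])
    if any(len(m) != rows or any(len(r) != cols for r in m) for m in masks):
        raise ValueError("masks must all have the same shape")
    return [
        [int(all(m[r][c] for m in masks)) for c in range(cols)]
        for r in range(rows)
    ]


# Made-up 2x3 masks from two consecutive frames.
frame_a = [[1, 1, 0], [0, 1, 1]]
frame_b = [[1, 1, 1], [0, 0, 1]]
print(and_frames([frame_a, frame_b]))  # [[1, 1, 0], [0, 0, 1]]
```

Static background edges that happen to look like text survive the AND as well, which is one source of the false alarms shown next.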

Page 22

Text Detection False Alarms

[Figure: video frame and the corresponding filtered, AND-ed frame]

Page 23

Text Detection Misses

[Figure: video frame and the corresponding filtered, AND-ed frame]

Page 24

Challenges for VOCR Preprocessing

• The resolution of video text is very low (less than 10×10 pixels per character).

• Text detection and extraction are complicated by complex backgrounds.

Page 25

VOCR Preprocessing Problems


Page 27

Video OCR - Results

Character recognition: 83%

Word recognition: 70%

Language-model post-processing will improve the word recognition rate, but new names and places will not be in the language model

Important adjunct to Name-It: name/face correlation through co-occurrence matrices
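The name/face correlation can be illustrated with a simple co-occurrence count. This sketch assumes each video segment has already been reduced to a set of candidate names (e.g., from video OCR or the transcript) and a set of face-cluster IDs; the inputs, function names, and the normalization by face frequency are hypothetical and only show the shape of the computation, not Name-It’s actual scoring.

```python
from collections import Counter


def cooccurrence(segments):
    """Count how often each (name, face) pair appears in the same segment.

    segments: iterable of (names, faces) pairs, each a set of labels.
    """
    counts = Counter()
    for names, faces in segments:
        for name in names:
            for face in faces:
                counts[(name, face)] += 1
    return counts


def best_face_for_names(segments):
    """For each name, pick the face with the highest co-occurrence count
    normalized by how often that face appears overall."""
    segments = list(segments)  # iterated twice below
    counts = cooccurrence(segments)
    face_freq = Counter(face for _, faces in segments for face in faces)
    best = {}
    for (name, face), c in counts.items():
        score = c / face_freq[face]
        if name not in best or score > best[name][1]:
            best[name] = (face, score)
    return {name: face for name, (face, _) in best.items()}


# Made-up segments: an anchor face appears everywhere, so normalization
# keeps it from being matched to every name.
segments = [
    ({"Smith"}, {"face1", "anchor"}),
    ({"Smith"}, {"face1", "anchor"}),
    ({"Jones"}, {"face2", "anchor"}),
]
print(best_face_for_names(segments))  # {'Smith': 'face1', 'Jones': 'face2'}
```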

Page 28

Annotations

Annotation fields contain metadata automatically derived from the content (e.g., topics, chyrons)

Annotations are included in the index (searchable separately or combined with the transcript)

Personal annotations are typed or spoken comments kept on a per-user basis

• bookmarking or commentary

• fully indexed and searchable with other data
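Indexing annotations alongside the transcript, searchable separately or combined, can be sketched as a fielded inverted index. The class, the field names (`transcript`, `chyron`, `personal`), and the per-user filtering are hypothetical; the sketch only shows how one index can answer both field-restricted and combined queries while keeping personal annotations visible to their author alone.

```python
from collections import defaultdict


class FieldedIndex:
    """Inverted index mapping (field, term) -> set of segment IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.fields = set()
        # Personal annotations are keyed by user so only the owner sees them.
        self.personal = defaultdict(set)

    def add(self, segment_id, field, text, user=None):
        for term in text.lower().split():
            if field == "personal":
                self.personal[(user, term)].add(segment_id)
            else:
                self.fields.add(field)
                self.postings[(field, term)].add(segment_id)

    def search(self, term, fields=None, user=None):
        """Return segment IDs containing `term`.

        fields=None combines all shared fields; otherwise only the listed
        fields are searched. A user's own personal annotations are included
        when `user` is given.
        """
        term = term.lower()
        hits = set()
        for field in (self.fields if fields is None else fields):
            hits |= self.postings.get((field, term), set())
        if user is not None:
            hits |= self.personal.get((user, term), set())
        return hits


# Made-up segments and annotations.
idx = FieldedIndex()
idx.add("seg1", "transcript", "the senate passed the budget")
idx.add("seg2", "chyron", "Budget Director")
idx.add("seg2", "personal", "check budget numbers", user="alice")
print(sorted(idx.search("budget")))  # ['seg1', 'seg2']
```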

Page 29

Web Interface

Long-standing concern about video fidelity over the Internet

Compromise: a slide show of high-quality JPEG images with continuous audio

Not all navigation tools translate directly

Required substantive changes to the interface specification

Browsing improved over the full-video interface

User effectiveness versus full video is yet to be explored

Page 30

Infrastructure Evolution and Growth

Conversion of underlying database architecture (ONGOING)

• extends functionality

- e.g., date filtering => “What’s new?” query

• improved interoperability

- fully distributed, replicated function

• increased scale

• negative impact on query performance (improving)

Summer-long ruggedization program for reliable processing and quality control

900 hours online, terabyte data store

12 Alphas for parallel processing (and experiments)
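The date filtering that enables a “What’s new?” query can be illustrated with Python’s built-in `sqlite3` module. The table, column, and index names are hypothetical and the schema is far simpler than the restructured Informedia database; the point is only that storing a broadcast date per segment turns “What’s new?” into an indexed range query.

```python
import sqlite3


def whats_new(conn, since):
    """Return (segment_id, title) for segments broadcast on or after `since` (YYYY-MM-DD)."""
    rows = conn.execute(
        "SELECT segment_id, title FROM segments "
        "WHERE broadcast_date >= ? ORDER BY broadcast_date DESC, segment_id",
        (since,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments (segment_id TEXT PRIMARY KEY, title TEXT, broadcast_date TEXT)"
)
# An index on the date column lets the range query avoid scanning every row.
conn.execute("CREATE INDEX idx_segments_date ON segments (broadcast_date)")
conn.executemany(
    "INSERT INTO segments VALUES (?, ?, ?)",
    [
        ("s1", "Science & Technology Week", "1998-03-28"),
        ("s2", "Earth Matters", "1998-04-04"),
        ("s3", "Your Health", "1998-04-05"),
    ],
)
print(whats_new(conn, "1998-04-01"))
# [('s3', 'Your Health'), ('s2', 'Earth Matters')]
```

ISO 8601 date strings sort lexicographically in date order, so plain text comparison is enough here.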

Page 31

Testbeds

Corpus

• CNN data: 620 hours + 12 hrs/week (Early Prime, World View, Impact, Science & Technology Week, Earth Matters, Travel Guide, Your Health)

Distant high-speed network access

• Informedia-Net attached to both vBNS and AAI nets

• enables attachment of clients to CMU servers from selected locations

• clients at DARPA, SPAWAR (forthcoming), NSA

Page 32

Serbo-Croatian LVCSR in the Dictation and Broadcast News Domains

• Informedia (English)

– CMU Informedia Group (Howard Wactlar, Alex Hauptmann, Ricky Houghton, et al.)

– CMU Sphinx Group

• Multilingual Speech Recognition

– CMU/UKA Interactive Systems Labs: JanusRTk (Alex Waibel, Michael Finke, Petra Geutner, Peter Scheytt)

• Translation/Cross-Language Retrieval

– CMU Language Technologies Institute (Jaime Carbonell, Eric Nyberg, Bob Frederking, Paul Kennedy, et al.)

Page 33

Serbo-Croatian Broadcast News Recognition

• Initial database: GlobalPhone Serbo-Croatian (UKA)

• Broadcast news: collected by satellite from Germany (UKA)

• 15 hours transcribed

• Janus recognition toolkit: 15 languages

• Janus applied to Serbo-Croatian broadcast news

• Problem: morphology, large number of inflections

• Competitive performance already: 26% WER
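Word error rate (WER), the figure quoted above, is the word-level edit distance between the recognizer’s hypothesis and the reference transcript divided by the number of reference words: WER = (S + D + I) / N for substitutions, deletions, and insertions. A minimal sketch using standard dynamic programming follows; the example sentences are made up. With a highly inflected language such as Serbo-Croatian, a wrong inflectional ending counts as a full substitution, as in the example, where only the verb ending differs (usvojila vs. usvojio).

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between ref[:i-1] and hyp[:j] (previous DP row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


ref = "vlada je danas usvojila zakon"
hyp = "vlada je danas usvojio zakon"
print(word_error_rate(ref, hyp))  # 0.2
```

Because insertions count, WER can exceed 100% when the hypothesis is much longer than the reference.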

Page 34

Vocabulary Growth Per Broadcast (Broadcast News System)

[Figure: vocabulary size in words (y-axis, 0 to 25,000) versus number of news broadcasts (x-axis, 1 to 21)]
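A vocabulary growth curve like the one plotted can be computed by counting distinct word forms as broadcasts accumulate. This sketch assumes each broadcast is available as a transcript string and uses lowercase whitespace tokenization; the function name and the example sentences are made up. In a highly inflected language every new inflected form (here vladi and vladama, forms of vlada) adds to the count.

```python
def vocabulary_growth(broadcasts):
    """Return the cumulative number of distinct words after each broadcast."""
    seen = set()
    growth = []
    for transcript in broadcasts:
        seen.update(transcript.lower().split())
        growth.append(len(seen))
    return growth


broadcasts = [
    "vlada je usvojila zakon",
    "vlada je usvojila novi zakon",
    "zakon o vladi i vladama",
]
print(vocabulary_growth(broadcasts))  # [4, 5, 9]
```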

Page 35

Serbo-Croatian BN Speech Performance (Broadcast News System)

Word error rate (WER, %) by month:

• August: 73.6

• September: 43.6

• October: 36.0

• December: 29.5

• January: 26.0

Improvements marked on the chart: language normalization; hypothesis-driven lexicon adaptation

Page 36

Proposed National Research Data Testbed

Informedia dataset and infrastructure as a benchmarkable testbed for research in spoken language and visual documents

Potential for establishing an online, public-domain video archive

• e.g., all government-produced video for training and public information

• fully indexed and searchable

Page 37

Project Genoa Contributions

• Code to extract video to place in a CIP

• Processing changes to index I-frames

• Code to run Web browser to play the MPEG segment

• Working towards a generic Web-based interface

• Other CMU: Meeting browser

• Full access to client but not full source code

Page 38

[Figure: Project Genoa architecture diagram. Components: CMU Informedia Server and CMU Informedia Client (NOD) in Pittsburgh, PA; CrisisBrowse client (SpIKE/Visage/NOD?) via Netscape, CrisisBrowse server, CIP server, mass storage, Starlight, BWD, and JTF Planner. Data sources: MIDB (S) and MDITDS (S) on Sybase, JEDS (SAIC, San Diego, CA), OSIS (U), CIA Factbook (U), JANES (U), HPKB (U), WWW (U), and World Energy Database (U), Access. Sites: DIA (Washington, DC), CIA (Langley, VA), SAIC (San Diego, CA), DARPA TIE (Arlington, VA). Links: Internet, Intelink-S, SIPRNET, DISN LES, dial-in, Network Neighborhood, HTTP. Exchanged formats: MPEG, JPEG, text, HTML. Classification levels: pseudo-TS/SCI, Secret, Unclassified. Several connections and databases are marked as undecided (“?”).]

Page 39

Future Plans - Near Term

• Complete full-function Web interface

• Foreign language system unification

• Serbo-Croatian (S-C) language models for improved query and selection

• S-C segmentation

• System completeness, robustness

• Should we pursue?

– Regular capture & processing

– Delivery to testbeds

Page 40

Future Plans - Long Term

• NSA’s formal evaluation will help guide modifications and new features

• Other languages - Korean? Chinese?

• Translation? Translation tools?

• Named entity extraction: people, places, faces

• Geospatial correlation and visualization

• More content and multiple sources

• Multidocument summarization
