CarnegieMellon
NoD and Multilingual Status Report
April 1998
Carnegie Mellon University
Howard D. Wactlar
Digital Video Library
MLI and NoD Tasks
• Data collection & preparation - English, Serbo-Croatian, and German
• Multilingual speech recognition enhancements
• Video and audio segmentation
• Multilingual indexing, retrieval, search
• Summarization-on-demand
• Annotations
• User studies
• Additional languages and functionalities
• Demonstration as a network-based service
Accomplishments to Apr 98
We are achieving what we proposed and beyond
• Advances in capability (research => integrated function)
• Infrastructure evolution & growth
• Testbed activity and extension
• Related research and outreach
Accomplishments to Apr 98 (cont’d)
• Serbo-Croatian demonstration system
• Automated and dynamic abstraction and summarization for improved navigation
• Topic detection and assignment for subject browsing
• Dynamically improved speech recognition for index generation
• Coherent story segmentation through corpus specific, rule-based analysis
more ...
Accomplishments to Apr 98 (cont’d)
• Video-OCR for improved name/face identification
• Multi-level annotations to mark and share commentary
• Web interface enabling “slide show” viewing over slow links
• Database restructuring to enable size growth and function evolution
• Remote testbeds with access to daily updated news
Automated Abstraction and Summarization
• Critical to efficient navigation of video
• Improved automatic title generation
• Dynamic “poster frame” icons - query based
• Skims smoothed through enhanced language models and rule-based scene selection
“Naïve” Poster Frame Result List (Uses First Shot Image)
Query-based Poster Frame Result List
Query-based Poster Frame Selection Process
1. Decompose video segment into shots.
2. Compute representative frame for each shot.
3. Locate query scoring words (shown by arrows).
4. Use frame from highest scoring shot.
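The four steps above can be sketched roughly as follows. This is a minimal illustration, not the Informedia implementation: the `Shot` class, the `pick_poster_frame` function, the word-timestamp input, and the per-word query scores are all hypothetical, and the sketch assumes shot boundaries and representative frames (steps 1 and 2) have already been computed.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    start: float      # shot start time in seconds
    end: float        # shot end time in seconds
    key_frame: str    # identifier of the shot's representative frame


def pick_poster_frame(shots, word_times, query_scores):
    """Return the key frame of the shot whose transcript words score highest."""
    best_shot, best_score = shots[0], 0.0   # fall back to the first shot
    for shot in shots:
        # Step 3: sum the scores of query words spoken during this shot.
        score = sum(
            query_scores.get(word.lower(), 0.0)
            for word, t in word_times
            if shot.start <= t < shot.end
        )
        # Step 4: remember the highest scoring shot.
        if score > best_score:
            best_shot, best_score = shot, score
    return best_shot.key_frame
```

When no query word falls inside any shot, the sketch returns the first shot's frame, which matches the "naïve" behavior of using the first shot image.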
Topic Detection and Tracking
Enhances browsing and discovery over directed search
Different methods from several areas being evaluated
• Information retrieval
  – vector space methods
  – relevance feedback
• Speech recognition
  – hidden Markov models
• Statistics
  – k-nearest neighbors
  – exponential models
KNN-based Topic Detection
• Build training index with pre-labeled topics
  – 45000 Broadcast News stories from 1995 and 1996
  – 3178 different news topics occurring > 10 times
• Search for top 10 related stories in training index
• Lookup topics for related stories
• Re-weight topics by story relevance (select top 5)
• At 5 topics: Recall .491, Relevance .482
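The lookup and re-weighting steps above might look roughly like this. It is a sketch under stated assumptions: the slide does not say how story similarity or topic weights are computed, so the bag-of-words cosine similarity, the "sum of neighbor similarities" weighting, and the names `cosine` and `knn_topics` are all illustrative choices.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_topics(story_text, training, k=10, n_topics=5):
    """training is a list of (story_text, [topic, ...]) pairs with pre-labeled topics."""
    query = Counter(story_text.lower().split())
    scored = [(cosine(query, Counter(text.lower().split())), topics)
              for text, topics in training]
    # Search for the k most related stories in the training index.
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Look up each neighbour's topics and weight them by story relevance.
    weights = defaultdict(float)
    for similarity, topics in neighbours:
        for topic in topics:
            weights[topic] += similarity
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, weight in ranked[:n_topics] if weight > 0]
```

With the defaults `k=10` and `n_topics=5` the sketch mirrors the "top 10 related stories, select top 5 topics" setting on the slide.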
Speech Recognition for Index Generation
• Integrate closed captioning with speech recognition generated transcription
• Improve accuracy by automatic daily expansion of language model from closed captioning, e.g. “Dodi Fayed”
• Participated (with Claritech) in TREC Spoken Document track
  – large text retrieval evaluation benchmarks (NIST/DARPA)
  – scored second due to out-of-vocabulary (OOV) words (CIA, well-known, torched)
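One simple way to find candidate words for the daily expansion is to collect caption words that the recognizer vocabulary lacks. The sketch below assumes that reading; the function name `new_vocabulary`, the tokenizing regular expression, and the `min_count` filter are hypothetical, and the real system's language model update is not described on the slide.

```python
import re
from collections import Counter


def new_vocabulary(caption_text, vocabulary, min_count=2):
    """Return caption words missing from the vocabulary, most frequent first."""
    words = re.findall(r"[a-z][a-z'-]*", caption_text.lower())
    counts = Counter(w for w in words if w not in vocabulary)
    # Require a minimum count so one-off caption typos are not added.
    return [w for w, c in counts.most_common() if c >= min_count]
```

Words returned by such a filter would still need pronunciations and language model probabilities before a recognizer could use them.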
Segmentation - Creating the Video Paragraph
Break up a video stream into semantically coherent pieces
• corpus-specific analysis
• language model approaches
• video structure analysis
Segmentation - Commercial Detection
Look for several potential indicators in multiple passes
• detect lapses in closed-caption (cc) capture greater than some threshold
• occurrence of black frames
• rate of scene change and motion
[Figure: Ad Removal based on Black Frame and Scene Change Detection. Timeline rows: truth, hypothesis, black frames, scene change.]
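The three indicators above could be combined along these lines. This is only a sketch: the slide does not give the combination rule or any thresholds, so requiring all three indicators at once, the values of `min_gap`, `slack`, and `min_cut_rate`, and the function names are assumptions made for illustration.

```python
def caption_gaps(caption_times, min_gap):
    """Pass 1: intervals between consecutive captions longer than min_gap seconds."""
    return [(a, b) for a, b in zip(caption_times, caption_times[1:]) if b - a > min_gap]


def commercial_candidates(caption_times, black_frame_times, cut_times,
                          min_gap=10.0, slack=2.0, min_cut_rate=0.2):
    """Later passes: keep caption gaps bracketed by black frames with frequent cuts."""
    candidates = []
    for start, end in caption_gaps(caption_times, min_gap):
        black_at_start = any(abs(t - start) <= slack for t in black_frame_times)
        black_at_end = any(abs(t - end) <= slack for t in black_frame_times)
        cuts = sum(1 for t in cut_times if start <= t <= end)
        # Scene changes per second inside the gap.
        if black_at_start and black_at_end and cuts / (end - start) >= min_cut_rate:
            candidates.append((start, end))
    return candidates
```

All times are in seconds from the start of the broadcast.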
Segmentation - Language Models
Novel application to find shift in topic within a document
• Adaptive exponential language models improve as they see more material from current topic
  – e.g., probable distance of “managed care” to “physicians”
• Static language models are pre-computed likelihoods of short-range adjacency (e.g. trigrams)
• Compare the predictive performance of the two models
  – i.e., the probability each assigns to the next observed words
• A segment boundary is likely to exist when the adaptive model shows a dip in performance relative to the short-range model
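The comparison can be illustrated with a much simpler pair of models than the ones named above. In this sketch a cache of recent words stands in for the adaptive exponential model and a fixed unigram table stands in for the static trigram model; both substitutions, the mixing weight, the window size, and the function names are assumptions made only to show the "dip in relative performance" idea.

```python
import math
from collections import Counter, deque


def log_ratios(words, static_prob, cache_size=50, cache_weight=0.5, floor=1e-6):
    """Per-word log(p_adaptive / p_static); the adaptive model mixes in recent words."""
    cache, counts, ratios = deque(), Counter(), []
    for word in words:
        p_static = static_prob.get(word, floor)
        p_cache = counts[word] / len(cache) if cache else 0.0
        p_adaptive = (1 - cache_weight) * p_static + cache_weight * p_cache
        ratios.append(math.log(p_adaptive / p_static))
        # Update the cache after scoring, so each word is predicted before it is seen.
        cache.append(word)
        counts[word] += 1
        if len(cache) > cache_size:
            counts[cache.popleft()] -= 1
    return ratios


def boundary_candidates(ratios, window=3, threshold=0.0):
    """Positions where the windowed mean ratio falls below the threshold."""
    means = [sum(ratios[i:i + window]) / window
             for i in range(len(ratios) - window + 1)]
    return [i for i in range(1, len(means)) if means[i] < threshold <= means[i - 1]]
```

While the topic stays the same, the cache predicts repeated words well and the ratio stays positive; at a topic shift the new words are missing from the cache and the ratio drops.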
[Figure: A plot of the ratio of the two language models as a function of the relative position in a segment. Horizontal axis from -500 to 500; vertical axis from -0.05 to 0.25.]
Video OCR
Image component crucial to news corpus
Capture of text overlaid on the video image
Detected, filtered, OCR’d, incorporated into content and indexed
Video OCR Block Diagram
Video => Text Area Detection => Text Area Preprocessing => Commercial OCR => ASCII Text
[Figure: three panels showing video frames (1/2 s intervals), filtered frames, and AND-ed frames.]
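The idea behind AND-ing filtered frames is that overlaid text stays in place while the background changes, so only pixels that pass the filter in every sampled frame survive. The sketch below assumes that reading and uses a plain brightness threshold as a stand-in for the text filter, which the slides do not specify; `binarize` and `and_frames` are hypothetical names, and frames are nested lists of grayscale values.

```python
def binarize(frame, threshold=128):
    """Stand-in for the text filter: mark pixels at or above the threshold."""
    return [[pixel >= threshold for pixel in row] for row in frame]


def and_frames(frames, threshold=128):
    """Keep only pixels that pass the filter in every frame."""
    masks = [binarize(frame, threshold) for frame in frames]
    result = masks[0]
    for mask in masks[1:]:
        result = [[a and b for a, b in zip(row_a, row_b)]
                  for row_a, row_b in zip(result, mask)]
    return result
```

Bright background regions that move between frames drop out of the result, while static caption pixels remain.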
Text Detection False Alarms
[Figure: video frame alongside the filtered and AND-ed frame.]
Text Detection Misses
[Figure: video frame alongside the filtered and AND-ed frame.]
Challenges for VOCR Preprocessing
• The resolution of video text is very low (less than 10×10 pixels per character).
• Text detection and extraction are complicated by complex backgrounds.
VOCR Preprocessing Problems
Video OCR - Results
Character recognition - 83%
Word recognition - 70%
Language model post-processing will improve word recognition rate, but new names and places will not be in language model
Important adjunct to Name-It: name/face correlation through co-occurrence matrices
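A co-occurrence matrix for name/face correlation can be pictured as a table of counts over (name, face) pairs seen in the same segment. The sketch below shows only that counting idea; it is not the Name-It algorithm, and the segment format, the face identifiers, and the function names are hypothetical.

```python
from collections import Counter


def cooccurrence(segments):
    """Count how often each (name, face) pair appears in the same segment."""
    counts = Counter()
    for names, faces in segments:
        for name in set(names):
            for face in set(faces):
                counts[(name, face)] += 1
    return counts


def best_face(counts, name):
    """Return the face that co-occurs most often with the name, or None."""
    candidates = {face: n for (who, face), n in counts.items() if who == name}
    return max(candidates, key=candidates.get) if candidates else None
```

Names recovered by Video OCR would feed the `names` side of each segment alongside names found in the transcript.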
Annotations
Annotation fields contain metadata automatically derived from the content (e.g. topics, chyron)
Annotations are included in the index (searchable separately or combined with transcript)
Personal annotations are typed or spoken comments that are established on a per-user basis
• bookmarking or commentary
• fully indexed and searchable with other data
Web Interface
Long-time concern about video fidelity on internet
Compromise is slide show of high quality JPEG images and continuous audio
Not all navigation tools translate directly
Required substantive change in interface specification
Browsing improved over full video interface
User effectiveness versus full video to be explored
Infrastructure Evolution and Growth
Conversion of underlying database architecture (ONGOING)
• extends functionality
  – e.g. date filtering => “What’s new?” query
• improved interoperability
  – fully distributed, replicated function
• increased scale
• negative impact on query performance (improving)
Summer-long ruggedization program for reliable processing and quality control
900 hours on-line, terabyte data store
12 Alphas for parallel processing (and experiments)
Testbeds
Corpus
• CNN data: 620 hours + 12 hrs/wk
  – Early Prime, World View, Impact, Science & Technology Week, Earth Matters, Travel Guide, Your Health
Distant high speed network access
• Informedia-Net attached to both vBNS and AAI nets
• enables attachment of clients to CMU servers from selected locations
• clients at DARPA, SPAWAR (forthcoming), NSA
Serbo-Croatian LVCSR on the Dictation and Broadcast News Domain
• Informedia (English)
  – CMU Informedia Group (Howard Wactlar, Alex Hauptmann, Ricky Houghton, et al.)
  – CMU Sphinx Group
• Multilingual Speech Recognition
  – CMU/UKA Interactive Systems Labs - JanusRTk (Alex Waibel, Michael Finke, Petra Geutner, Peter Scheytt)
• Translation/Cross Language Retrieval
  – CMU Language Technologies Institute (Jaime Carbonell, Eric Nyberg, Bob Frederking, Paul Kennedy, et al.)
Serbo-Croatian Broadcast News Recognition
• Initial database: Globalphone Serbo-Croatian (UKA)
• Broadcast news: Collected by satellite from Germany (UKA)
• 15 hours transcribed
• Janus recognition toolkit: 15 languages
• Janus applied to Serbo-Croatian broadcast news
• Problem: Morphology, large number of inflections
• Competitive performance already: 26% WER
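Word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard word-level edit distance; the function name is illustrative, and real scoring tools also apply text normalization that is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(substitution, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1] / len(ref)
```

A WER of 26% therefore means roughly one word error for every four reference words.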
Vocabulary Growth Per Broadcast
[Figure: Broadcast News System. Words (0 to 25000) plotted against news broadcasts (1 to 21).]
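A vocabulary growth curve of this kind can be computed by counting the distinct word forms seen so far after each broadcast. The sketch below assumes whitespace-separated transcripts and case-insensitive matching; the function name is illustrative.

```python
def vocabulary_growth(broadcasts):
    """Return the cumulative number of distinct words after each transcript."""
    seen, growth = set(), []
    for transcript in broadcasts:
        seen.update(transcript.lower().split())
        growth.append(len(seen))
    return growth
```

Because every inflected form counts as a separate word, a highly inflected language keeps adding new entries as more broadcasts arrive.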
Serbo-Croatian BN Speech Performance
[Figure: Broadcast News System, WER [%] by month: August 73.6, September 43.6, October 36.0, December 29.5, January 26.0. Chart annotations: Language Normalization; Hypothesis Driven Lexicon Adaptation.]
Proposed National Research Data Testbed
Informedia dataset and infrastructure as a benchmarkable testbed for research in spoken language and visual documents
Potential for establishing on-line public domain video archive
• e.g. all government produced video for training and public information
• fully indexed and searchable
Project Genoa Contributions
• Code to extract video to place in a CIP
• Processing changes to index I-frames
• Code to run Web browser to play the MPEG segment
• Working towards a generic Web-based interface
• Other CMU: Meeting browser
• Full access to client but not full source code
[Figure: system architecture diagram. Labels include: CMU Informedia Server; CMU Informedia Client (NOD); CrisisBrowse Server; CrisisBrowse Client (SpIKE/Visage/NOD?); Netscape; CIP Server; Mass Storage; Starlight; BWD; JTF Planner. Data sources: MIDB (S), MDITDS (S), JEDS, OSIS (U), CIA Factbook (U), JANES (U), HPKB (U), World Energy Database (U), Intelink-S, WWW (U); database labels Sybase and Access. Sites: DIA Wash, DC; Pittsburgh, PA; CIA Langley, VA; SAIC San Diego, CA; DARPA TIE Arlington, VA. Networks: Internet, SIPRNET, DISN LES, dial-in, Network Neighborhood, http. Content types: mpeg, jpeg, txt, html. Classification legend: Pseudo-TS/SCI, Secret, Unclassified.]
Future Plans - Near Term
• Complete full-function Web interface
• Foreign language system unification
• S-C language models for improved query and selection
• S-C segmentation
• System completeness, robustness
• Should we pursue?
  – Regular capture & processing
  – Delivery to testbeds
Future Plans - Long Term
• NSA’s formal evaluation will help guide modifications and new features
• Other languages - Korean? Chinese?
• Translation? Translation tools?
• Named entity extraction: people, places, faces
• Geospatial correlation and visualization
• More content and multiple sources
• Multidocument summarization