Question Answering from Errorful Multimedia Streams
AQUAINT PI Meeting – June 2002
Howard D. Wactlar, Carnegie Mellon University, USA
Digital Video Library
Outline
• Goals for QA from multimedia
• Background
- Informedia
- Information extraction
• Determining answer information
• Presenting the answer and follow-up
Why is Multimedia Important
• TV and radio broadcasts record human events across the globe
• Broadcast interviews, analysis and opinions created globally provide varied interpretive perspectives and context
• Images of people, events, maps and charts provide additional content not conveyed orally
- May be correlated with the spoken words
• Some pictures are worth a thousand words
Annual Video and Audio Production
Commercial
• 4,500 motion pictures -> 9,000 hrs/yr (4.5 TB)
• 33,000 TV stations x 4 hrs/day -> 48,000,000 hrs/yr (24,000 TB)
• 44,000 radio stations x 4 hrs/day -> 65,500,000 hrs/yr (3,275 TB)
Personal
• Photographs: 80 billion images -> 410,000 TB/yr
• Home videos: 1.4 billion tapes -> 300,000 TB/yr
• X-rays: 2 billion -> 17,000 TB/yr
Surveillance
• Airports: 14,000 terminals x 140 cameras x 24 hrs/day -> 48 M hrs/day
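The production estimates above are simple products: units × hours per day × 365 days, and hours × a storage rate. The short script redoes that arithmetic as a sanity check. The slide does not state bit rates, so the GB-per-hour values are back-computed from its own totals and should be read as assumptions, not quoted data.

```python
# Back-of-envelope check of the "Annual Video and Audio Production" figures.
# Assumption: the slide gives no bit rates; the GB-per-hour values below are
# back-computed from the slide's own totals (TB / hours).
DAYS_PER_YEAR = 365
GB_PER_TB = 1_000  # decimal units, matching the slide's round numbers


def annual_hours(stations: int, hours_per_day: int) -> int:
    """Hours of new programming per year for a class of broadcasters."""
    return stations * hours_per_day * DAYS_PER_YEAR


def storage_tb(hours: float, gb_per_hour: float) -> float:
    """Terabytes needed to store `hours` of media at `gb_per_hour`."""
    return hours * gb_per_hour / GB_PER_TB


tv_hours = annual_hours(33_000, 4)      # slide: 48,000,000 hrs/yr
radio_hours = annual_hours(44_000, 4)   # slide: 65,500,000 hrs/yr

# Implied rates from the slide's totals (assumption, see above).
tv_rate = 24_000 * GB_PER_TB / 48_000_000     # about 0.5 GB per hour of video
radio_rate = 3_275 * GB_PER_TB / 65_500_000   # about 0.05 GB per hour of audio

# Surveillance: terminals x cameras x hours per day.
camera_hours_per_day = 14_000 * 140 * 24      # slide: 48 M hrs/day

print(f"TV:    {tv_hours:,} hrs/yr, {storage_tb(tv_hours, tv_rate):,.0f} TB")
print(f"Radio: {radio_hours:,} hrs/yr, {storage_tb(radio_hours, radio_rate):,.0f} TB")
print(f"Airport cameras: {camera_hours_per_day:,} hrs/day")
```

Run as-is, it reports about 48.2 million TV hours (roughly 24,090 TB), about 64.2 million radio hours and about 47 million airport camera-hours per day, which agree with the slide's rounded figures to within a few percent.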
Background
REQUIREMENTS:
- Automated process for information extraction from video
- Full-content search and retrieval from any spoken-language or visual document
Establishment of large video libraries as a network-searchable information resource
Mission: Enable Search and Discovery in the Video Medium
APPROACH: Integration of machine speech, image and natural language understanding for library creation and exploration
Exploit operational Informedia DVL infrastructure and technology
Informedia System Architecture
Figure: Information Collection & Analysis (OFFLINE): Surveillance, Broadcast TV and Radio sources; Digital Encoding; Speech Analysis; Image Analysis; Entity Extraction; Face and OCR Text Recognition; Processing; Indexing; Indexed Database (Indexed Segmented Transcript; Compressed Audio/Video & Images); Distribution To Users
Figure: Information Exploration & Discovery (ONLINE): Analyst; Multimodal Queries; Browsing and Query Refinement; Relevant Result Set; Requested Segment or Summarization
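The architecture separates offline collection and indexing from online exploration by an analyst. As a minimal sketch of that split, assuming segments have already been produced by speech recognition, OCR and face matching, the toy code builds one inverted index over all modalities offline and ranks segments by matched query terms online. `Segment`, `build_index` and `search` are hypothetical names; this is not Informedia's implementation, which uses far richer indexing and ranking.

```python
# Toy sketch of an offline/online split (hypothetical names;
# not the actual Informedia implementation).
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    video_id: str
    start_s: float                 # segment start time in seconds
    end_s: float                   # segment end time in seconds
    transcript: str                # speech recognizer output (errorful)
    ocr_text: str = ""             # on-screen text found by OCR
    faces: list[str] = field(default_factory=list)  # names from face matching


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; ASR text has no reliable case or punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_index(segments: list[Segment]) -> dict[str, set[int]]:
    """OFFLINE: map each term, from any modality, to the segments containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, seg in enumerate(segments):
        terms = tokens(seg.transcript) | tokens(seg.ocr_text)
        for name in seg.faces:
            terms |= tokens(name)
        for term in terms:
            index[term].add(i)
    return index


def search(query: str, index: dict[str, set[int]],
           segments: list[Segment]) -> list[Segment]:
    """ONLINE: rank segments by the number of distinct query terms they match."""
    hits: dict[int, int] = defaultdict(int)
    for term in tokens(query):
        for i in index.get(term, set()):
            hits[i] += 1
    ranked = sorted(hits, key=lambda i: (-hits[i], i))
    return [segments[i] for i in ranked]


if __name__ == "__main__":
    library = [
        Segment("news1", 0.0, 30.0, "officials met in tehran today"),
        Segment("news1", 30.0, 60.0, "weather for the weekend",
                ocr_text="Tehran forecast"),
        Segment("news2", 0.0, 20.0, "sports results"),
    ]
    idx = build_index(library)
    for seg in search("Tehran officials", idx, library):
        print(seg.video_id, seg.start_s, seg.end_s)
```

In the example, the second segment is found only through its OCR text, which is the point of indexing every modality together.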
Related Language Processing Work
• MUC, DUC, TREC (especially the QA track)
- Pronoun and anaphora resolution
- Part-of-speech tagging
- Fact extraction
- Summarization
- Question answering
…all with a focus on electronic text
Why is Multimedia Hard
• It’s a fundamentally linear, temporal medium
• Speech, image and language understanding are all errorful, ambiguous and incomplete
• Information must be time-synchronized and correlated across modalities for both produced and natural video
• Verbal content lacks:
- sentence boundaries,
- punctuation,
- capitalization
…all of which enable syntactic analysis
• Image recognition w/o known context is very limited
• Many errors from many sources!
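Because speech transcripts lack sentence boundaries and punctuation, downstream parsers need some substitute. One common heuristic, assumed here rather than taken from the slides, is to break the recognizer's word stream at long pauses using its word timings. `Word`, `segment_by_pauses` and the 0.5 s threshold are illustrative choices.

```python
# Pause-based segmentation of an unpunctuated ASR word stream
# (an assumed heuristic; names and threshold are illustrative).
from typing import NamedTuple, Optional


class Word(NamedTuple):
    text: str
    start_s: float  # word start time in seconds
    end_s: float    # word end time in seconds


def segment_by_pauses(words: list[Word], min_pause_s: float = 0.5) -> list[list[str]]:
    """Start a new pseudo-sentence wherever the silence before a word is >= min_pause_s."""
    sentences: list[list[str]] = []
    current: list[str] = []
    prev_end: Optional[float] = None
    for w in words:
        if current and prev_end is not None and w.start_s - prev_end >= min_pause_s:
            sentences.append(current)
            current = []
        current.append(w.text)
        prev_end = w.end_s
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    stream = [Word("the", 0.0, 0.2), Word("talks", 0.2, 0.6), Word("ended", 0.6, 1.0),
              Word("officials", 1.8, 2.3), Word("left", 2.3, 2.6)]
    print(segment_by_pauses(stream))  # [['the', 'talks', 'ended'], ['officials', 'left']]
```

Pauses are only a proxy: speakers also pause mid-sentence, which is one more source of the errors the slide lists.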
Why We Think the Problems are Tractable
• Lots of data enables LEARNING systems
• We have shown that complete or perfect information is not necessary
• Utilize multiple sources of information jointly: text, image, audio, web text and databases
Research Focus
• Determining the answer information
- Resolving co-references
- Discovering semantic relations
- Learning information flow
- Hardening uncertain information
• Organizing and presenting the answer result
- Text summaries
- Augmenting contextual material
- Maps, charts and images to allow follow-up questions
- Explicit representation of uncertainty
Resolving Co-references
• When the same person is mentioned (or seen, or identified)
• Places referenced (in words, on signs, on maps)
• Organizations cited (verbally, on signage, in charts)
• Requires:
- Pronoun resolution
- Merge multiple spellings, abbreviations and contractions
- Merge across media (OCR, audio, text, faces)
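Merging spellings, abbreviations and cross-media mentions is, at its simplest, string matching over names extracted from transcripts, OCR and text. A minimal sketch using Python's standard `difflib` treats two mentions as the same entity when they normalize to the same string, when one is the initials of the other, or when their character similarity exceeds a threshold. The helper names and the 0.85 threshold are illustrative assumptions; pronoun resolution and face matching would need separate models.

```python
# Illustrative name matching for co-reference across spellings, abbreviations
# and media (helper names and threshold are assumptions).
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def initials(name: str) -> str:
    """'United Nations' -> 'un': first letter of each normalized word."""
    return "".join(word[0] for word in normalize(name).split())


def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Heuristically decide whether two mentions name the same entity."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return True
    # Abbreviation: one mention is the initials of the other ("U.N." vs "United Nations").
    ca, cb = na.replace(" ", ""), nb.replace(" ", "")
    if (len(ca) > 1 and ca == initials(b)) or (len(cb) > 1 and cb == initials(a)):
        return True
    # Spelling variants and OCR/ASR errors: character-level similarity.
    return SequenceMatcher(None, na, nb).ratio() >= threshold


if __name__ == "__main__":
    print(same_entity("U.N.", "United Nations"))  # True
    print(same_entity("Qaddafi", "Gaddafi"))      # True
    print(same_entity("CNN", "ABC"))              # False
```

A fixed similarity threshold trades missed merges against false ones; a learned matcher would be the natural next step.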
Mining Links and Learning Semantic Relations
• Visualize co-occurrence in documents, in location, in time
- Location can be variably sized regions
- Times can be arbitrary periods
• Finding semantic roles for related named entities
- E.g., Dr. X is CEO of company Y
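Co-occurrence over arbitrary time periods can be counted by bucketing time-stamped entity mentions into fixed-length periods and counting the entity pairs that share a bucket. The sketch assumes mentions arrive as `(entity, time_in_seconds)` pairs; the function name and input format are hypothetical, and the same bucketing would work for location if the key were a map grid cell instead of a period. Co-occurrence only proposes candidate links; assigning a semantic role such as "CEO of" needs further extraction.

```python
# Count entity pairs that co-occur within the same time period
# (hypothetical input format: (entity, time_in_seconds) pairs).
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(mentions: list[tuple[str, float]],
                        period_s: float) -> Counter:
    """Return a Counter mapping (entity_a, entity_b) -> number of shared periods."""
    periods: dict[int, set[str]] = defaultdict(set)
    for entity, t in mentions:
        periods[int(t // period_s)].add(entity)   # assign mention to its period
    counts: Counter = Counter()
    for entities in periods.values():
        # sorted() gives each unordered pair one canonical key
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    mentions = [("Dr. X", 10.0), ("Company Y", 20.0),
                ("Dr. X", 4000.0), ("Company Y", 4100.0),
                ("City Z", 8000.0)]
    print(cooccurrence_counts(mentions, period_s=3600.0))
```

Changing `period_s` is how an analyst would widen or narrow the window, matching the slide's point that periods can be arbitrary.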
Active Hardening of Evidence
• Extracted information is noisy
• Acquire new supporting or falsifying evidence from other sources (web)
- On-demand or
- Automatically when original evidence is weak
…resulting in higher-fidelity information
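One way to make "hardening" concrete, assuming each new piece of evidence can be summarized as a likelihood ratio (above 1 supports the fact, below 1 falsifies it) and that pieces are independent, is a log-odds update. The sketch fetches extra evidence automatically only when the current confidence is below a threshold; calling `harden` directly corresponds to the on-demand mode. `fetch_evidence` stands in for a hypothetical web lookup, and the names and the 0.7 threshold are assumptions, not part of the described system.

```python
# Log-odds "hardening" of a noisy extracted fact (a naive-Bayes-style sketch;
# assumes independent evidence expressed as likelihood ratios).
import math
from typing import Callable, Iterable

EPS = 1e-9


def logit(p: float) -> float:
    p = min(max(p, EPS), 1 - EPS)   # keep log-odds finite at 0 and 1
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def harden(confidence: float, likelihood_ratios: Iterable[float]) -> float:
    """Update a confidence with new evidence; ratios > 1 support, < 1 falsify."""
    score = logit(confidence)
    for lr in likelihood_ratios:
        score += math.log(lr)
    return sigmoid(score)


def maybe_harden(confidence: float,
                 fetch_evidence: Callable[[], Iterable[float]],
                 threshold: float = 0.7) -> float:
    """Automatic mode: only search for more evidence when the original is weak."""
    if confidence >= threshold:
        return confidence
    return harden(confidence, fetch_evidence())


if __name__ == "__main__":
    # Hypothetical web lookup returning two supporting pieces of evidence.
    print(maybe_harden(0.5, lambda: [3.0, 3.0]))   # ~0.9
```

The independence assumption is the weak point: two web pages copying the same wire story are not independent evidence, which is exactly what the information-flow analysis tries to detect.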
Learning Information Flow
Figure: Sources: CNN, ABC, Radio Deutsche Welle (Germany), Radio Tehran (Iran), Wiretap 1 (Saudi Arabia), Hidden Sources 1-4. Topics: lifestyle news; news on the Middle East. Link types: tightly correlated, information flow, conditional information flow; lag labels 3-6 days and 407 days.
Learning Information Flow
• Where did a fact originate?
• Multiple sources report facts over time, with small changes
- E.g., different newspapers get the same story from an AP or Reuters source, yet each story ‘looks’ different
- Imagery frequently is reused as well
• Columbia’s Newsblaster exploits this idea for summarization of the core story sentences
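To trace where a fact originated, one simple approach (assumed here, not necessarily what Newsblaster does) is to find all reports whose wording overlaps strongly with a given report, measured by Jaccard similarity of word trigrams, and pick the earliest. `Report`, `likely_origin` and the 0.4 threshold are hypothetical; the day difference between the origin and a later report gives a rough propagation lag.

```python
# Finding the likely origin of a reused story via word-trigram overlap
# (an assumed method; names and the 0.4 threshold are illustrative).
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    source: str
    day: int       # publication day, e.g. days since the start of collection
    text: str      # transcript or article text


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word sequences; tolerant of small edits between versions."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_origin(target: Report, reports: list[Report],
                  threshold: float = 0.4) -> Optional[Report]:
    """Earliest report whose wording overlaps strongly with `target`."""
    t = shingles(target.text)
    related = [r for r in reports if jaccard(t, shingles(r.text)) >= threshold]
    return min(related, key=lambda r: r.day, default=None)


if __name__ == "__main__":
    wire = Report("Wire service", 1, "officials confirmed the pipeline deal was signed on monday")
    paper = Report("Paper A", 4, "officials confirmed the pipeline deal was signed on monday in the capital")
    other = Report("Paper B", 2, "local team wins championship after dramatic final")
    origin = likely_origin(paper, [wire, paper, other])
    print(origin.source, paper.day - origin.day)  # Wire service 3
```

Word overlap only handles text; reused imagery, also mentioned on the slide, would need a visual near-duplicate measure instead.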
Integrated Analysis Environment
• Summarize multimedia information visually and textually
• Allow explicit display of and control over acceptable level of uncertainty
• Show link structure of entities and relations
• Interactive visualization for drill-down and follow-up
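Explicit control over uncertainty can be as simple as a user-set confidence floor applied before the link structure is drawn. In this sketch, which assumes extracted relations come as `(subject, relation, object, confidence)` tuples (a hypothetical format), raising `min_conf` prunes weak links from the graph the analyst sees.

```python
# User-controlled uncertainty filter feeding a link-structure view
# (hypothetical tuple format: subject, relation, object, confidence).
from collections import defaultdict

Relation = tuple[str, str, str, float]


def link_structure(relations: list[Relation],
                   min_conf: float) -> dict[str, list[tuple[str, str, float]]]:
    """Adjacency list of relations whose confidence is at least `min_conf`."""
    graph: dict[str, list[tuple[str, str, float]]] = defaultdict(list)
    for subj, rel, obj, conf in relations:
        if conf >= min_conf:          # hide links the analyst deems too uncertain
            graph[subj].append((rel, obj, conf))
    return dict(graph)


if __name__ == "__main__":
    rels = [("Dr. X", "CEO of", "Company Y", 0.9),
            ("Dr. X", "met", "Official Z", 0.4)]
    print(link_structure(rels, min_conf=0.5))  # {'Dr. X': [('CEO of', 'Company Y', 0.9)]}
```

Keeping the confidence on each edge lets the interface display it as well as filter on it.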
Strategic Advantages of Multimedia Analysis and Response
• Collect Large Amounts of Data
• Learning Approaches
• Leverage across media types
• Perfection is not necessary (80% solution may be ok)
• User in the loop filters remaining errors
• Effective interfaces and visualizations