TRECVID Evaluations
Mei-Chen Yeh, 03/27/2012
Introduction
• Text REtrieval Conference (TREC)
  – Organized by the National Institute of Standards and Technology (NIST)
  – Supported by government agencies
  – Annual evaluation (NOT a competition)
  – Different “tracks” over the years, e.g. web retrieval, email spam filtering, question answering, routing, spoken documents, OCR, video (standalone conference from 2001)
• TREC Video Retrieval Evaluation (TRECVID)
Introduction
• Objectives of TRECVID
  – Promote progress in content-based analysis and retrieval from digital videos
  – Provide open, metrics-based evaluation
  – Model real-world situations
Introduction
• Evaluation is driven by participants
• The collection is fixed and available in the spring
  – 50% of the data is used for development, 50% for testing
• Test queries become available in July, with one month until submission
• More details: http://trecvid.nist.gov/
TRECVID Video Collections

• Test data
  – Broadcast news
  – TV programs
  – Surveillance videos
  – Video rushes provided by the BBC
  – Documentary and educational materials supplied by the Netherlands Institute for Sound and Vision (2007-2009)
  – Gatwick airport surveillance videos provided by the UK Home Office (2009)
  – Web videos (2010)
• Languages
  – English
  – Arabic
  – Chinese
Collection History
Collection History
• 2011
  – 19,200 online videos (150 GB, 600 hours)
  – 50 hours of airport surveillance videos
• 2012
  – 27,200 online videos (200 GB, 800 hours)
  – 21,000 equal-length, short clips of BBC rush video
  – Airport surveillance videos (not yet announced)
  – A ~4,000-hour collection of Internet multimedia
Tasks
• Semantic indexing (SIN)
• Known-item search (KIS)
• Content-based copy detection (CCD) – by 2011
• Interactive surveillance event detection (SED)
• Instance search (INS)
• Multimedia event detection (MED)
• Multimedia event recounting (MER) – since 2012
Semantic indexing
• System task:
  – Given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection, ranked according to their likelihood of containing the concept.
• 500 concepts (since 2011)
• “Concept pair” queries (2012)
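The core of a semantic-indexing submission is this ranking step. A minimal sketch (the shot IDs, scores, and `rank_shots` helper are all invented for illustration; a real system would score each shot with trained concept detectors):

```python
# Sketch: turn per-shot detector scores for one concept into a
# TRECVID-style ranked list of at most 2000 shot IDs.
# Shot IDs and scores below are invented, not real TRECVID data.

def rank_shots(scores, max_results=2000):
    """Sort shot IDs by descending concept score and truncate."""
    ranked = sorted(scores, key=lambda shot: scores[shot], reverse=True)
    return ranked[:max_results]

# Hypothetical detector outputs for a single concept such as "Boy".
scores = {"shot1_2": 0.91, "shot3_7": 0.15, "shot2_4": 0.66}
print(rank_shots(scores))  # ['shot1_2', 'shot2_4', 'shot3_7']
```

Truncating to 2000 results per concept matches the submission limit stated above.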
Examples

• Boy (One or more male children)
• Teenager
• Scientists (Images of people who appear to be scientists)
• Dark-skinned people
• Handshaking
• Running
• Throwing
• Eaters (Putting food or drink in his/her mouth)
• Sadness
• Anger
• Windy (Scenes showing windy weather)

Full list
Example (concept pair)

• Beach + Mountain
• Old_People + Flags
• Animal + Snow
• Bird + Waterscape_waterfront
• Dog + Indoor
• Driver + Female_Human_Face
• Person + Underwater
• Table + Telephone
• Two_People + Vegetation
• Car + Bicycle
Known-item search
• Models the situation in which someone knows of a video, has seen it before, believes it is contained in a collection, but doesn't know where to look.
• Inputs
  – A text-only description of the desired video
  – A test collection of videos
• Outputs
  – Top-ranked videos (automatic or interactive mode)
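An automatic known-item run reduces to text retrieval over whatever metadata accompanies the videos. A minimal sketch, assuming plain TF-IDF ranking and an invented toy corpus (not real TRECVID data):

```python
import math
from collections import Counter

# Sketch of automatic known-item search: rank videos by summed TF-IDF
# weight of the query terms found in each video's metadata text.
# The corpus below is invented for illustration.

def tfidf_rank(query, docs):
    """Rank doc IDs by TF-IDF score of shared query terms, best first."""
    n = len(docs)
    df = Counter()  # document frequency of each term
    for text in docs.values():
        df.update(set(text.lower().split()))

    def score(text):
        tf = Counter(text.lower().split())
        return sum(tf[t] * math.log(n / df[t])
                   for t in query.lower().split() if t in tf)

    return sorted(docs, key=lambda d: score(docs[d]), reverse=True)

docs = {
    "v1": "guy talking about rain that keeps raining",
    "v2": "apartment cleaning schedule discussion",
    "v3": "news about weather",
}
print(tfidf_rank("keeps raining", docs))  # 'v1' ranks first
```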
Examples
• Find the video with the guy talking about how it just keeps raining.
• Find the video about some guys in their apartment talking about some cleaning schedule.
• Find the video where a guy talks about the FBI and Britney Spears.
• Find the video with the guy in a yellow T-shirt with the big letter M on it.
• …

http://www-nlpir.nist.gov/projects/tv2010/ki.examples.html
Content-based copy detection
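One common family of approaches matches compact frame fingerprints between query and reference videos. A toy sketch, assuming an average-hash fingerprint over tiny grayscale frames (a real system would decode actual video and use far more robust features):

```python
# Sketch of content-based copy detection via frame fingerprints:
# hash each frame to a bit string, then flag pairs whose Hamming
# distance falls below a threshold. Frames here are toy 2x2
# grayscale arrays invented for illustration.

def average_hash(frame):
    """1 bit per pixel: brighter than the frame's mean or not."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(int(p > mean) for p in pixels)

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

def is_copy(f1, f2, max_dist=2):
    return hamming(average_hash(f1), average_hash(f2)) <= max_dist

original = [[10, 200], [220, 30]]
brightened = [[40, 230], [250, 60]]   # same content, brightness shifted
different = [[200, 10], [30, 220]]

print(is_copy(original, brightened))  # True
print(is_copy(original, different))   # False
```

Hashing against the frame's own mean makes the fingerprint invariant to global brightness shifts, which is one of the transformations copy-detection systems are evaluated on.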
Surveillance event detection

• Detects human behaviors in vast amounts of surveillance video, in real time!
• For public safety and security
• Event examples
  – Person runs
  – Cell to ear
  – Object put
  – People meet
  – Embrace
  – Pointing
  – …
Instance search
• Finds video segments of a specific person, object, or place, given a visual example.
Instance search
• Input
  – a collection of test clips
  – a collection of queries that delimit a person, object, or place entity in some example video
• Output
  – for each query, up to the 1000 clips most likely to contain a recognizable instance of the entity
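One way to sketch such a system is bag-of-visual-words matching: represent the query example and each test clip as histograms over a visual vocabulary and rank clips by similarity. The "words" and clips below are symbolic stand-ins invented for illustration (a real system would quantize SIFT-like local descriptors):

```python
import math
from collections import Counter

# Toy instance search: cosine similarity between bag-of-visual-words
# histograms of the query and each clip. Vocabulary entries are
# symbolic stand-ins for quantized local descriptors.

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def instance_search(query_words, clips, max_results=1000):
    """Rank clip IDs by similarity to the query, best first."""
    q = Counter(query_words)
    ranked = sorted(clips, key=lambda c: cosine(q, Counter(clips[c])),
                    reverse=True)
    return ranked[:max_results]

clips = {
    "clipA": ["arch", "window", "arch", "door"],
    "clipB": ["tree", "car", "road"],
}
print(instance_search(["arch", "door", "arch"], clips))  # clipA first
```

The 1000-clip cap mirrors the per-query output limit stated above.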
Query examples
Multimedia event detection

• System task
  – Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos, and give the strength of evidence for each such judgment.
• In 2010
  – Making a cake: one or more people make a cake
  – Batting a run in: within a single play during a baseball-type game, a batter hits a ball and one or more runners (possibly including the batter) score a run
  – Assembling a shelter: one or more people construct a temporary or semi-permanent shelter for humans that could provide protection from the elements
• 15 new events were released for 2011; the 2012 events are not yet announced.
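The required output can be sketched as scoring every (event, clip) pair and thresholding each score into a yes/no decision. The event names, clip IDs, and scores below stand in for a hypothetical classifier's outputs:

```python
# Sketch of MED output: for each (event, clip) pair, report a
# present/absent decision plus the strength of evidence.
# Scores are stubbed stand-ins for a trained event classifier.

def med_decisions(scores, threshold=0.5):
    """scores: {event: {clip: confidence}} -> (event, clip, present, conf) rows."""
    rows = []
    for event, per_clip in scores.items():
        for clip, conf in per_clip.items():
            rows.append((event, clip, conf >= threshold, conf))
    return rows

scores = {"Making_a_cake": {"clip01": 0.83, "clip02": 0.12}}
for row in med_decisions(scores):
    print(row)
```

Reporting the raw confidence alongside the binary decision lets the evaluation trade off misses against false alarms at different operating points.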
Multimedia event recounting

• New in 2012
• Task
  – Once a multimedia event detection system has found an event in a video clip, it is useful for a human user to be able to examine the evidence on which the system's decision was based. An important goal is for that evidence to be semantically meaningful to a human.
• Input
  – a clip and an event kit (name, definition, explication – a textual exposition of the terms and concepts – evidential descriptions, and illustrative video exemplars)
• Output
  – a clear, concise, text-only (alphanumeric) recounting or summary of the key evidence that the event does in fact occur in the video
Schedule
• Feb.: call for participation
• Apr.: guidelines completed
• Jun.–Jul.: query data released
• Sep.: submissions due
• Oct.: results returned
• Nov.: papers due
• Dec.: workshop
Call for partners

• Standardized evaluations and comparisons
• Testing on large collections
• Failures are not embarrassing, and can be presented at the TRECVID workshop!
• Anyone can participate!
  – A “priceless” resource for researchers