
The TREC Interactive Video Track and Content-based Retrieval from Digital Video

Rong Yan

http://www.cs.cmu.edu/~christel/MM2002/syllabus.htm

© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon

State-of-the-art Multimedia Search Engines

• Recalling Homework 1 and Homework 2
• Better for simple concepts
  • e.g. Two people kissing, A picture of a giraffe
• Don't work for complex queries
  • e.g. A picture of a brick home with black shutters and white pillars, with a pickup truck in front of it (image)


Examples

• Find pictures of a giraffe
  • Keyword: giraffe
  • http://images.google.com/images?hl=en&lr=lang_zh-CN%7Clang_en&ie=UTF-8&oe=UTF-8&q=giraffe+
• A picture of a brick home with black shutters and white pillars, with a pickup truck in front of it (image)
  • Keywords: brick home shutters
  • http://images.google.com/images?hl=en&lr=lang_zh-CN%7Clang_en&ie=UTF-8&oe=UTF-8&q=brick+home+shutters+


Why does this happen?

• Most of these search engines are keyword based
  • "False" multimedia search engines
  • You have to represent your idea in keywords
  • These keywords are expected to appear in the filename or the corresponding webpage
• Therefore…
  • Unable to handle the semantic meaning of images
  • Unable to handle visual position
  • Unable to handle time information
  • Unable to use images as queries
  • …


Solution

• Excerpted from your homework

• …I found that the Google Image Search was not as good as expected. Altavista was the more useful multimedia search engine. However, most of them just did a search based on the filename or the matching keywords within the site it was located. I think it would be great to have multimedia search engine intelligent enough to associate its own keywords based on what's in the image.



Our Solution: Content-based Information Retrieval (CBIR)



Mainly Video Retrieval in this lecture


Content-based Video Retrieval

• Application
• Implementation
• Effort on TREC02 video track
  • Feature Extraction Task (High-level Semantic Features)
  • Manual Retrieval Task (One-run Retrieval)
  • Interactive Retrieval Task (Multiple-run with Feedback)
  • Results & Demo
• Conclusion


Application

• Increasing demand for visual information retrieval
  • Retrieve useful information from databases
  • Sharing and distributing video data through computer networks
• Example: BBC
  • BBC archive has 500k+ queries plus 1M new items … per year
  • From the BBC …
    • Police car with blue light flashing
    • Government plan to improve reading standards
    • Two shot of Kenneth Clarke and William Hague


Application (Cont.)

• Video Surveillance
  • Find where else the person appears
• Experience On-Demand
  • Help to remember previous events
• Provide useful information while traveling
  • Equipment in cars to retrieve useful multimedia information according to your location/preference
• …
• Video content is plentiful … it's now available digitally … we can work on it directly … so it follows


Typical Retrieval Framework

• User: provides query information that represents his or her information need
• Database: stores a large collection of video data
• Goal: find the most relevant shots from the database
  • Shots: a "paragraph" in video, typically 20 – 40 seconds, which is the basic unit of video retrieval


Sample Query

• Text: Find pictures of George Washington
• Image:
• Video:


Bridging the Gap

[Diagram labels: User, Video Database, Result]


Automatically Structure Video Data

• The first step for video retrieval: video "programmes" are structured into logical scenes and physical shots
• If dealing with text, then the structure is obvious:
  • paragraph, section, topic, page, etc.
• All text-based indexing, retrieval, linking, etc. builds upon this structure
• Automatic shot boundary detection and selection of representative keyframes is usually the first step
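The slides do not say how shot boundaries or keyframes are computed. As a minimal sketch of one common approach, the code below compares gray-level histograms of consecutive frames and declares a cut when they differ strongly, then takes the middle frame of each shot as its keyframe. The frame representation (a flat list of 0-255 gray values), the bin count, the threshold, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized histogram of the 0-255 gray values in one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [count / total for count in counts]


def detect_shot_boundaries(frames: Sequence[Sequence[int]],
                           threshold: float = 0.5,
                           bins: int = 8) -> List[int]:
    """Return every index i at which a new shot is assumed to start."""
    boundaries = []
    previous = None
    for index, frame in enumerate(frames):
        current = gray_histogram(frame, bins)
        if previous is not None:
            # L1 distance between consecutive histograms, in the range [0, 2].
            difference = sum(abs(a - b) for a, b in zip(previous, current))
            if difference > threshold:
                boundaries.append(index)
        previous = current
    return boundaries


def pick_keyframes(frame_count: int, boundaries: Sequence[int]) -> List[int]:
    """Pick the middle frame of each shot as its representative keyframe."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [frame_count]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]


# Toy video: three dark frames followed by two bright frames.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, dark, bright, bright]
cuts = detect_shot_boundaries(frames)
keyframes = pick_keyframes(len(frames), cuts)
print(cuts, keyframes)
```

A fixed threshold like this only catches hard cuts; gradual transitions such as fades and dissolves would need a different test.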


Typical automatic structuring of video

[Diagram: a video document is divided into a set of shots; keyframe browser combined with transcript or object-based search]


Bridging the Gap

[Diagram labels: User, Information Need, Video Database, Video Structure, Result]


Ideal solution

[Diagram labels: User, Information Need, Video Database, Video Structure, Result. Central step: understand the semantic meaning and retrieve]


Ideal solution (Cont.)

However,
1. It is hard to represent a query in natural language and for the computer to understand it
2. Computers have no experience
3. Other representation restrictions, like position and time


Alternative Solution

[Diagram labels: User, Video Database, Match and combine, Result. Both sides provide evidence of relevant information (text, image, audio)]


Evidence-based Retrieval System

• General framework for current video retrieval systems
• Video retrieval based on the evidence from both users and database, including
  • Text information
  • Image information
  • Motion information
  • Audio information
• Return a relevance score for each piece of evidence
• Combination of the scores
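The slide names score combination but not a particular rule. One simple possibility, sketched here under the assumption that each modality returns a relevance score per shot, is a weighted sum. The modality names, weights, and shot identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def combine_scores(evidence: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of the per-modality relevance scores of each shot."""
    combined: Dict[str, float] = {}
    for modality, scores in evidence.items():
        weight = weights.get(modality, 0.0)  # unknown modalities contribute nothing
        for shot_id, score in scores.items():
            combined[shot_id] = combined.get(shot_id, 0.0) + weight * score
    return combined


def rank(combined: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort shots by combined score, best first."""
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: a shot missing from a modality simply gets no credit there.
evidence = {
    "text": {"shot_1": 0.9, "shot_2": 0.2},
    "image": {"shot_2": 0.8, "shot_3": 0.5},
}
weights = {"text": 0.5, "image": 0.5}
ranking = rank(combine_scores(evidence, weights))
print(ranking)
```

In practice the scores from different modalities would probably need normalizing to a common range before being added, since a text score and an image distance are not directly comparable.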


Keyword-based System

[Diagram labels: User, Keyword, Video Database, Automatic Annotation (including filename, video title, caption, related web page)]


Keyword-based System

[Diagram labels: User, Keyword, Video Database, Automatic Annotation, Manual Annotation]


Manual Annotation

• Manually creating annotations/keywords for image / video data
• Examples: Gettyimage.com (image retrieval)
• Pros:
  • Represents the semantic meaning of video
• Cons:
  • Time-consuming, labor-intensive
  • Keywords are not enough to represent the information need


Speech and OCR transcription

[Diagram labels: User, Keyword, Video Database, Annotation, Speech Transcription, OCR Transcription]


Query using speech/OCR information

Query: Find pictures of Harry Hertz, Director of the National Quality Program, NIST

Speech: We’re looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …

OCR:H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director
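The OCR line above shows why exact keyword matching is brittle: "H,arry" is not the string "Harry". A hedged sketch of one workaround is approximate string matching, here with Python's standard `difflib`. The slides do not state how the actual system matched queries against OCR output; the function name and the 0.8 cutoff are assumptions.

```python
from difflib import get_close_matches
from typing import Sequence


def ocr_match_score(query_terms: Sequence[str], ocr_text: str,
                    cutoff: float = 0.8) -> float:
    """Fraction of query terms with an approximate match among the OCR tokens."""
    if not query_terms:
        return 0.0
    tokens = ocr_text.lower().split()
    matched = 0
    for term in query_terms:
        # get_close_matches returns the tokens whose similarity ratio >= cutoff.
        if get_close_matches(term.lower(), tokens, n=1, cutoff=cutoff):
            matched += 1
    return matched / len(query_terms)


ocr_text = "H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director"
score = ocr_match_score(["Harry", "Hertz", "Director"], ocr_text)
miss = ocr_match_score(["giraffe"], ocr_text)
print(score, miss)
```

Lowering the cutoff tolerates noisier OCR at the price of more false matches.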


What do we lack?

[Diagram labels: User, Keyword, Video Database, Annotation, Speech Transcription, OCR Transcription, Image Information]


Image-based Retrieval

[Diagram labels: User, Keyword, Query Images, Video Database, Text Information, Image Feature]


Global Low-level Image Feature

• Color-based Feature
  • Color Histogram
  • Color Percentage
  • Color Correlogram
  • Color Moments
• Texture-based Feature
  • Gabor Filter
  • Wavelet
• Shape/Structure Feature
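As an illustration of the simplest feature in the list, the sketch below computes a global color histogram. It assumes an image is given as a list of (R, G, B) tuples with 0-255 channels and quantizes each channel into a small number of bins; the bin count and function name are illustrative choices, not taken from the slides.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 2) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel ** 3 entries."""
    counts = [0] * (bins_per_channel ** 3)
    for red, green, blue in pixels:
        # Quantize each 0-255 channel value to a bin index.
        r_bin = red * bins_per_channel // 256
        g_bin = green * bins_per_channel // 256
        b_bin = blue * bins_per_channel // 256
        counts[(r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin] += 1
    total = len(pixels) or 1
    return [count / total for count in counts]


# Toy image: three pure-red pixels and one pure-blue pixel.
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
hist = color_histogram(pixels)
print(hist)
```

The histogram discards where colors occur in the image, which is why it counts as a global feature.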


Regional Low-level Image Feature

• Segmentation into objects
• Extract low-level features from each region


Image Search

• Feature Representation
  • Image: represented as a series of real numbers, or a vector of features, (f1, …, fn)
  • Distance Function: the distance between two vectors, typically Euclidean distance
• We believe "Nearest is relevant"
  • The nearest images in the database are relevant to the query images
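The "nearest is relevant" rule can be sketched as a brute-force nearest-neighbor search under Euclidean distance. The feature vectors and file names below are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_images(query: Sequence[float],
                   database: Dict[str, Sequence[float]],
                   k: int = 2) -> List[Tuple[str, float]]:
    """Return the k database images closest to the query, nearest first."""
    scored = [(image_id, euclidean_distance(query, features))
              for image_id, features in database.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


# Hypothetical 3-dimensional feature vectors (f1, f2, f3).
database = {
    "giraffe.jpg": [0.9, 0.1, 0.0],
    "house.jpg": [0.1, 0.8, 0.1],
    "tree.jpg": [0.7, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
results = nearest_images(query, database, k=2)
print(results)
```

A linear scan is fine at this scale; a large collection would normally call for an index structure.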


Finding Similar Images


But…

• Low-level features don't work in all cases


High-level Image Feature

• Objects: Persons, Roads, Cars, Skies…
• Scenes: Indoors, Outdoors, Cityscape, Landscape, Water, Office, Factory…
• Events: Parade, Explosion, Picnic, Playing Soccer…
• Generated from low-level features


Image-based Retrieval

[Diagram labels: User, Keyword, Query Images, Video Database, Text Information, Image Feature (Low-level Feature, High-level Feature)]


More Evidence in Video Retrieval

[Diagram labels: User, Keyword, Query Images, Motion, Audio, Video Database, Text Information, Image Information, Motion Information, Audio Information]


Combination of multi-modal results

• Different characteristics of multi-modal information
  • Text-based information: better for middle- and high-level queries
    • e.g. Find the video clip of dancing women wearing dresses
  • Image-based information: better for low- and middle-level queries
    • e.g. Find the video clip of green trees
• Combination of multi-modal information


Other Useful Techniques

• Query Expansion
• Cross-Modal Relation
• Relevance Feedback
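The slide only names relevance feedback. One classic way to realize it, shown here purely as an illustration and not as the method used in the lecture's system, is a Rocchio-style update: move the query vector toward the shots the user marked relevant and away from those marked non-relevant. The weights `alpha`, `beta`, and `gamma` are conventional but arbitrary values.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]], size: int) -> List[float]:
    """Component-wise mean of the vectors, or a zero vector if there are none."""
    if not vectors:
        return [0.0] * size
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rocchio_update(query: Sequence[float],
                   relevant: Sequence[Sequence[float]],
                   non_relevant: Sequence[Sequence[float]],
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   gamma: float = 0.15) -> List[float]:
    """Return alpha*query + beta*mean(relevant) - gamma*mean(non_relevant)."""
    positive = centroid(relevant, len(query))
    negative = centroid(non_relevant, len(query))
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, positive, negative)]


# Hypothetical 2-dimensional feature space.
query = [1.0, 0.0]
relevant = [[0.0, 1.0], [0.0, 1.0]]
non_relevant = [[1.0, 0.0]]
updated = rocchio_update(query, relevant, non_relevant)
print(updated)
```

The updated vector is then used for the next retrieval round, which matches the "multiple-run with feedback" setting of the interactive task.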


Recap

• Video retrieval bridges the gap between the user's information need and the video database
• Multi-modal evidence
  • Text-based (most popular)
  • Image-based
  • Motion-based
  • Audio-based
• Combination of the evidence


Content-based Video Retrieval

• Application
• Implementation
• TREC video track
  • Feature Extraction Task (High-level Semantic Features)
  • Manual Retrieval Task (One-run Retrieval)
  • Interactive Retrieval Task (Multiple-run with Feedback)
  • Results & Demo
• Conclusion


Introduction to TREC Video Retrieval Track

• Full Name: Text REtrieval Conference
• TREC Video Track web site: http://www-nlpir.nist.gov/projects/trecvid/
• TREC series sponsored by the National Institute of Standards and Technology (NIST) with additional support from other U.S. government agencies
  • Goal is to encourage research in information retrieval


Introduction to TREC Video Retrieval Track

• Video Retrieval Track started in 2001
  • Goal is investigation of content-based retrieval from digital video
  • Focus on the shot as the unit of information retrieval rather than the scene or story/segment/clip
• Current state-of-the-art Video Retrieval Competition
  • 17 active participants, including groups from CMU, IBM Research, Microsoft Research Asia, MediaMill, LIMSI, Dublin City University


Main tasks in TREC

• Shot Boundary Detection
• Semantic Feature Extraction Task
• Video Retrieval Task
  • Manual Retrieval: a human formulates a query, and the system then retrieves automatically from the collection
  • Interactive Retrieval: full human access and feedback


Where are they?

[Diagram labels: User, Keyword, Query Images, Video Database, Text Information, Image Feature (Low-level Feature, High-level Feature). TREC tasks marked on the diagram: Shot Boundary Detection, Feature Extraction, Retrieval Task]


Video Data

• Difficult to get video data for use in TREC because of copyright (©)
• Used mainly the Internet Archive
  • advertising, educational, industrial, amateur films 1930-1970
  • produced by corporations, non-profit organisations, trade groups, etc.
  • Noisy, strange color, but real archive data
  • 73.3 hours partitioned as follows:
    • Search test: 40.12 hours
    • Feature development (training and validation): 23.26 hours
    • Feature test: 5.07 hours
    • Shot boundary test: 4.85 hours


Shot Boundary Detection

• Fundamental primitive of most/all work in content-based video retrieval

[Diagram: a video document is divided into a set of video shots]


Feature Extraction

• Extract high-level semantic features from video
• Assign a video clip to one or more of several categories of video

[Figure labels: High-level features: Cityscape, Lake, Trees, Water, Sky]


Feature Extraction

• Interesting in itself, but its importance increases when it serves to help video navigation and search

• Benefits:

• Retrieval - Find video from a particular class

• Filtering - Remove irrelevant and distracting information categories from summaries and visualizations
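Both benefits can be sketched with one small filter, assuming each shot carries a confidence value per high-level feature. The data layout, the 0.5 threshold, and the shot identifiers are hypothetical; only the feature names come from the slides.

```python
from typing import Dict, List, Sequence


def filter_shots(shots: Dict[str, Dict[str, float]],
                 required: Sequence[str] = (),
                 excluded: Sequence[str] = (),
                 threshold: float = 0.5) -> List[str]:
    """Keep shots that show every required feature and no excluded feature."""
    kept = []
    for shot_id, confidences in shots.items():
        has_required = all(confidences.get(name, 0.0) >= threshold for name in required)
        has_excluded = any(confidences.get(name, 0.0) >= threshold for name in excluded)
        if has_required and not has_excluded:
            kept.append(shot_id)
    return kept


# Hypothetical detector confidences per shot; a missing feature counts as 0.
shots = {
    "shot_1": {"Outdoors": 0.9, "Cityscape": 0.8, "Face": 0.1},
    "shot_2": {"Indoors": 0.7, "Face": 0.9},
    "shot_3": {"Outdoors": 0.6, "Landscape": 0.7},
}
# Retrieval: find outdoor shots. Filtering: drop the cityscapes among them.
outdoor_no_city = filter_shots(shots, required=["Outdoors"], excluded=["Cityscape"])
print(outdoor_no_city)
```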


The Features

Face
Clip contains at least one human face with the nose, mouth, and both eyes visible. Pictures of a face meeting the above conditions count

People
Clip contains a group of two or more humans, each of which is at least partially visible and is recognizable as a human

On-screen Text
Clip contains superimposed text large enough to be read


The Features

Indoor
Clip contains a recognizably indoor location, i.e., inside a building

Outdoor
Clip contains a recognizably outdoor location, i.e., one outside of buildings

Cityscape
Clip contains a recognizably city/urban/suburban setting

Landscape
Clip contains a predominantly natural inland setting, i.e., one with little or no evidence of development by humans. Scenes with bodies of water that are clearly inland may be included


Non-Video (Audio) Features

Speech
A human voice uttering words is recognizable as such in this segment

Instrumental Sound
Sound produced by one or more musical instruments is recognizable as such in this segment

Monologues
Segment contains an event in which a single person is at least partially visible and speaks for a long time without interruption by another speaker. Pauses are ok if short


TREC02 Results

[Chart: average precision (0 to 1) per feature for each submitted run, with the average precision cap and a random baseline. Features: Outdoors, Indoors, Face, People, Cityscape, Landscape, Text overlay, Speech, Instrumental sound, Monolog. Runs: CMU_r1, A_CMU_r2, CLIPS-LIT_GEOD, CLIPS-LIT-LIMSU, DCUFE2002, Eurecom1, Fudan_FE_Sys1, Fudan_FE_Sys2, IBM-1, IBM-2, MediaMill1, MediaMill2, MSRA, UnivO_MT1, UnivO_MT2]


Video Search Task

• The most important task and final goal
• Manual & Interactive Search Task


Queries for 2002 TREC Video Track

• Specific item or person
  • Eddie Rickenbacker, James Chandler, George Washington, Golden Gate Bridge, Price Tower in Bartlesville, OK
• Specific fact
  • Arch in Washington Square Park in NYC, map of continental US
• Instances of a category
  • football players, overhead views of cities, one or more women standing in long dresses
• Instances of events/activities
  • people spending leisure time at the beach, one or more musicians with audible music, crowd walking in an urban environment, locomotive approaching the viewer



• XML Representation

<!DOCTYPE videoTopic SYSTEM "videoTopics.dtd"><videoTopic num="077">

<textDescription text="Find pictures of George Washington" />

<imageExample src="http://www.cia.gov/csi/monograph/firstln/955pres2.gif" desc="face" />

<videoExample src="01681.mpg" start="09m25.938s" stop="09m29.308s" desc="face" />

</videoTopic>



• Text: Find pictures of George Washington
• Image:
• Video:
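The XML topic representation above can be read with Python's standard `xml.etree.ElementTree`. This is a minimal sketch: the element and attribute names are copied from the slide, while the `parse_topic` function and the shape of the returned dictionary are illustrative assumptions. The external DTD named in the DOCTYPE is not fetched or validated.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

TOPIC_XML = """<!DOCTYPE videoTopic SYSTEM "videoTopics.dtd">
<videoTopic num="077">
<textDescription text="Find pictures of George Washington" />
<imageExample src="http://www.cia.gov/csi/monograph/firstln/955pres2.gif" desc="face" />
<videoExample src="01681.mpg" start="09m25.938s" stop="09m29.308s" desc="face" />
</videoTopic>"""


def parse_topic(xml_text: str) -> Dict[str, Any]:
    """Collect the text, image, and video parts of one videoTopic element."""
    root = ET.fromstring(xml_text)
    return {
        "num": root.get("num"),
        "text": [e.get("text") for e in root.findall("textDescription")],
        "images": [e.get("src") for e in root.findall("imageExample")],
        "videos": [(e.get("src"), e.get("start"), e.get("stop"))
                   for e in root.findall("videoExample")],
    }


topic = parse_topic(TOPIC_XML)
print(topic["num"], topic["text"])
```

Each of the three parts maps onto one kind of evidence: the text description feeds text search, and the image and video examples feed image-based search.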


Evaluation Metric

• Goal: Maximize the Mean Average Precision
  • Result set limited to 100 shots
  • Precision = (# relevant shots retrieved)/(total # shots retrieved)
  • Average precision: compute precision after each retrieved relevant shot and then average these precisions over the total number of retrieved relevant shots in the collection for that topic
  • Submitting the maximum number of shots per result set can never lower the average precision for that submission
  • Mean Average Precision = average of the average precision measures for each topic
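A minimal sketch of the metric follows. It assumes the usual non-interpolated definition, reading the denominator as the number of relevant shots in the collection for the topic; under that reading, appending more shots to a result list can only add non-negative terms, which agrees with the slide's remark about submitting the maximum number of shots. Function names and shot identifiers are illustrative, and the ranked list is assumed to contain no duplicates.

```python
from typing import Collection, List, Sequence, Tuple


def average_precision(ranked_shots: Sequence[str],
                      relevant_shots: Collection[str],
                      max_results: int = 100) -> float:
    """Average of the precision values measured at each relevant retrieved shot."""
    relevant = set(relevant_shots)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, shot_id in enumerate(ranked_shots[:max_results], start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(topics: List[Tuple[Sequence[str], Collection[str]]]) -> float:
    """Mean of the average precision over all (ranked list, relevant set) topics."""
    if not topics:
        return 0.0
    return sum(average_precision(ranked, relevant)
               for ranked, relevant in topics) / len(topics)


# Topic 1: relevant shots at ranks 1 and 3; one relevant shot is never retrieved.
ranked_1, relevant_1 = ["s1", "s2", "s3", "s4"], {"s1", "s3", "s9"}
# Topic 2: the single relevant shot is found at rank 2.
ranked_2, relevant_2 = ["a", "b"], {"b"}

ap_1 = average_precision(ranked_1, relevant_1)
map_score = mean_average_precision([(ranked_1, relevant_1), (ranked_2, relevant_2)])
print(ap_1, map_score)
```

For topic 1 the precisions at the two hits are 1/1 and 2/3, so the average precision is (1 + 2/3) / 3 = 5/9.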


CMU Manual Retrieval System

[Diagram labels: Query (Text, Image), Retrieval Agents, Text Score, Image Score, PRF Score, Movie Info, Final Score]


Snapshot of the system


Manual Search Result

[Chart: precision (0 to 1) vs. recall (0 to 1) for manual search runs: Prous Science, IBM-2, CMU_MANUAL1, IBM-3, LL10_T, CLIPS+ASR, Fudan_Search_Sys4, CLIPS+ASR+X, ICMKM-2, UMDMqtrec]


CMU Interactive Search System

• New interface based on the Informedia system
• Multiple document storyboards
• Query context plays a key role in filtering image sets to manageable sizes
• TREC 2002 image feature set offers additional filtering capabilities for indoor, outdoor, faces, people, etc.
• Displaying filter count and distribution guides their use in manipulating the storyboard views


Snapshot of the system


Filter Interface for using Image Features


Interactive runs top 10 (of 13)

[Chart: precision (0 to 1) vs. recall (0 to 1). Runs shown: CMUInfInt1, DCUTrec11B.1, DCUTrec11C.2, CMU_INTERACTIVE_2, UnivO_MT5, IBM-4, DCUTrec11B.3, DCUTrec11C.4, UMDIqtrec, MSRA.Q-Video.2a, Prous Science, IBM-2, CMU_MANUAL1]


Mean AvgP vs mean elapsed time

[Chart: mean average precision (0 to 1) and mean elapsed time (0 to 30 mins.) per run. Runs: IuVf1, CMUInfInt1, CMU_INTERACTIVE, DCUTrec11B.1, DCUTrec11B.3, DCUTrec11C.2, DCUTrec11C.4, IBM-4, ICMKM-1, MSRA.Q-Video.1, MSRA.Q-Video.2.A, UMDIqtrec, UNivO_MT5]

Wide variation in elapsed time. Not the dominant factor in effectiveness.


Demo

• CMU Interactive Search System
• IBM Video Retrieval System: http://mp7.i2.ibm.com/


Conclusion

• The goal of content-based video retrieval is to build a more intelligent video retrieval engine via semantic meaning
• Many applications in daily life
• Combine evidence from different aspects
• Hot research topic, but few commercial systems
• State-of-the-art performance is still unacceptable for normal users; there is room to improve
