Reliving on Demand: A Total Viewer Experience
DESCRIPTION
Enabling media reliving experiences that are aesthetically pleasing, interactive, and semantically drivable as they center on people, locations, time, and events discovered in a media collection.
TRANSCRIPT
RELIVING ON DEMAND: A TOTAL VIEWER EXPERIENCE
Vivek K. Singh1*, Jiebo Luo2, Dhiraj Joshi2, Phoury Lei2, Madirakshi Das2, Peter Stubler2
ACM International Conference on Multimedia – ACM MM 2011
1 University of California, Irvine, 2 Kodak Research Laboratories, Rochester, NY
* Work was done when the author was interning at Kodak Research Laboratories, Eastman Kodak Company, Rochester, NY, USA.
Why do people take pictures?
1. Digital re-living
2. Sharing it with family and friends
What’s available today?
• Commercial slideshows (Picasa, iPhoto, ACDSee):
• Focus on visual appearance only.
• Don’t understand/utilize semantics (except “FaceMovie”)
• Research efforts: Semantic analysis
• No interaction
• Interaction on demand
• Allow different users to dynamically re-direct the flow of the media reliving experience
Platforms
Desktop
Digital frame
HDTV
Kodak Gallery
Mobile
Kiosk
Preview
• Re-living of events in the user’s life, based on WHO, WHERE, and WHEN.
Outline
• Preview
• Design principles
• System design
• Under the hood (sneak peek)
• Evaluations
Design principles
1. User controllable:
• Responsive to user demand (overcoming intent gap)
2. Semantically drivable:
• Events as organizing units
• Who, when, where; what
3. Aesthetically pleasing:
• Dynamic presentation
• Multimodal (songs, images, videos)
Retrieval vs. Browsing vs. Reliving
• Media by itself is uninteresting unless it performs a function (e.g. reliving, sharing) for the human user
• Retrieval
• Fetching data. Strong intent (e.g. search)
• Browsing
• Piecemeal reliving. Weak intent (e.g. YouTube)
• Reliving
• Valuable middle ground.
• Semantically re-direct the flow if desired.
System overview
System overview: Approach
Media data structure
• Media properties: Type, URL, Height, width
• Aesthetic properties: Aesthetic IVI
• Semantic properties: dateTime, subjects, location
• Suitability properties: Score
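The property groups above could be carried in a single record per media item. The following is only a minimal sketch: the field names, types, and the (lat, lon) convention are assumptions for illustration, and the slide does not state the scale of the aesthetic or suitability values.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class MediaItem:
    """One image or video, grouped by the four property kinds on the slide."""
    # Media properties
    media_type: str                      # e.g. "image" or "video" (assumed values)
    url: str
    width: int
    height: int
    # Aesthetic properties
    aesthetic_ivi: float = 0.0           # aesthetic value; scale is an assumption
    # Semantic properties
    date_time: Optional[datetime] = None
    subjects: List[str] = field(default_factory=list)   # labeled people
    location: Optional[Tuple[float, float]] = None      # (lat, lon), assumed order
    # Suitability properties
    score: float = 0.0                   # suitability for the current criterion

    @property
    def aspect_ratio(self) -> float:
        """Width divided by height, useful when fitting layout holes."""
        return self.width / self.height


item = MediaItem("image", "file:///photos/example.jpg", width=300, height=200)
```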
Pre-processing
• Media Collection
• Metadata Repository
• Date and Time Extraction
• Location Information Extraction
• Event Clustering
• Aesthetics Value Extraction
• Face Detection
• Face Clustering
• Geographic Clustering
• Face Labeling
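The slide names an event clustering step but does not say how it works. As an illustration only, the sketch below swaps in a simple time-gap rule: media whose capture times are closer than a threshold fall into the same event. The function name and the 6-hour default are assumptions, not the authors' method.

```python
from datetime import datetime, timedelta
from typing import List


def cluster_by_time_gap(timestamps: List[datetime],
                        max_gap: timedelta = timedelta(hours=6)
                        ) -> List[List[datetime]]:
    """Group capture times into events; a gap above max_gap starts a new event."""
    events: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] <= max_gap:
            events[-1].append(ts)      # close enough to the previous item
        else:
            events.append([ts])        # gap too large: begin a new event
    return events


stamps = [
    datetime(2010, 1, 3, 9, 0),
    datetime(2010, 1, 1, 10, 0),
    datetime(2010, 1, 1, 11, 0),
]
events = cluster_by_time_gap(stamps)
```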
Reordering of event list
• Basic idea
• Time
• People
• Location
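One way to read the reordering step is as a sort of the event list by relevance to whatever the viewer just selected. The sketch below assumes hypothetical relevance measures (time difference, share of media showing the person, great-circle distance); the slides do not specify the actual ranking functions, and the `Event` fields are illustrative.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Event:
    event_id: str
    start: datetime
    people: Dict[str, int]            # person name -> number of media showing them
    n_media: int
    location: Tuple[float, float]     # (lat, lon), assumed order


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def reorder_events(events: List[Event], criteria_type: str, criteria_value) -> List[Event]:
    """Return events sorted so the most relevant one comes first."""
    if criteria_type == "time":
        key = lambda e: abs((e.start - criteria_value).total_seconds())
    elif criteria_type == "person":
        key = lambda e: -e.people.get(criteria_value, 0) / max(e.n_media, 1)
    elif criteria_type == "loc":
        key = lambda e: haversine_km(e.location, criteria_value)
    else:
        raise ValueError(f"unknown criteria type: {criteria_type}")
    return sorted(events, key=key)


a = Event("a", datetime(2008, 6, 20), {"Jiebo": 5}, 10, (61.2176, -149.8987))
b = Event("b", datetime(2009, 8, 1), {"Jiebo": 8, "Joyce": 8}, 10, (25.1019, 102.7575))
by_person = [e.event_id for e in reorder_events([a, b], "person", "Jiebo")]
```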
Choosing layout
• Default:
i = 2, 3, 4, 5
Choose transitions
• If (criteria=time || criteria=loc)
• Slide In/Out
• If (criteria=person_i)
• Face2Face transition
Transform(θ1, trans.X1, trans.Y1, scale1)
Transform(θ2, trans.X2, trans.Y2, scale2)
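The transition rule and the two Transform(θ, trans.X, trans.Y, scale) calls can be read as follows: for a person criterion, each image gets its own rotation, translation, and scale so that the selected face lands on the same screen position in both. The sketch below assumes a face is described by its center, size, and in-plane angle; the `Face` fields and the `align_face` helper are hypothetical, not taken from the system.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Face:
    cx: float      # face center x in image pixels
    cy: float      # face center y in image pixels
    size: float    # face width in pixels
    angle: float   # in-plane rotation in radians


@dataclass
class Transform:
    theta: float
    trans_x: float
    trans_y: float
    scale: float

    def apply(self, x: float, y: float) -> Tuple[float, float]:
        """Rotate by theta, scale, then translate a point."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return (self.scale * (c * x - s * y) + self.trans_x,
                self.scale * (s * x + c * y) + self.trans_y)


def choose_transition(criteria_type: str) -> str:
    """Mirror the slide's rule: time/loc -> Slide In/Out, person -> Face2Face."""
    if criteria_type in ("time", "loc"):
        return "Slide In/Out"
    if criteria_type == "person":
        return "Face2Face"
    raise ValueError(f"unknown criteria type: {criteria_type}")


def align_face(face: Face, target_cx: float, target_cy: float,
               target_size: float) -> Transform:
    """Transform that makes the face upright, target-sized, and centered on the target."""
    theta = -face.angle
    scale = target_size / face.size
    c, s = math.cos(theta), math.sin(theta)
    tx = target_cx - scale * (c * face.cx - s * face.cy)
    ty = target_cy - scale * (s * face.cx + c * face.cy)
    return Transform(theta, tx, ty, scale)


# Both faces are mapped to the same on-screen spot, so a cross-fade keeps them aligned.
t1 = align_face(Face(120, 80, 40, 0.1), 400, 300, 100)
t2 = align_face(Face(500, 260, 90, -0.2), 400, 300, 100)
```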
Choose song
• If (criteria=time)
• Select seasonal songs (easily extensible to finer grain)
• If (criteria=loc)
• Select regional songs
• If (criteria=person_i)
• Select age-based songs (easily extensible to gender)
• Taken from a library of available songs
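The song rules above amount to a lookup keyed on the active criterion. A minimal sketch follows, assuming a hypothetical song library with placeholder file names, northern-hemisphere seasons, a region label supplied by an earlier step, and a simple child/adult age split; none of these specifics come from the slides.

```python
from datetime import datetime

# Hypothetical library: keys and file names are placeholders.
SONG_LIBRARY = {
    ("season", "winter"): "winter.mp3",
    ("season", "spring"): "spring.mp3",
    ("season", "summer"): "summer.mp3",
    ("season", "autumn"): "autumn.mp3",
    ("region", "asia"): "asia.mp3",
    ("region", "north_america"): "north_america.mp3",
    ("age", "child"): "child.mp3",
    ("age", "adult"): "adult.mp3",
}
DEFAULT_SONG = "default.mp3"

# Month (1-12) to season, assuming the northern hemisphere.
_SEASONS = ("winter", "winter", "spring", "spring", "spring", "summer",
            "summer", "summer", "autumn", "autumn", "autumn", "winter")


def choose_song(criteria_type: str, criteria_value, library=SONG_LIBRARY) -> str:
    """Pick a song for the active criterion, falling back to a default."""
    if criteria_type == "time":          # criteria_value: datetime
        key = ("season", _SEASONS[criteria_value.month - 1])
    elif criteria_type == "loc":         # criteria_value: region label (assumed)
        key = ("region", criteria_value)
    elif criteria_type == "person":      # criteria_value: age in years (assumed)
        key = ("age", "child" if criteria_value < 18 else "adult")
    else:
        raise ValueError(f"unknown criteria type: {criteria_type}")
    return library.get(key, DEFAULT_SONG)


song = choose_song("time", datetime(2010, 12, 25))
```

Keying the library on (kind, value) pairs keeps the finer-grained extensions mentioned on the slide (for example, gender) to a matter of adding entries.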
Show images
• In time order
• Higher score => more display time
• Auto-zoom-crop
• Find center to focus on
• Match the aspect ratio required
• Multiple Holes in transitions
• Token passing amongst holes
• Representative image as background
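Two of the display rules above lend themselves to a short sketch: giving higher-scored images more screen time, and cropping around a focus point to the aspect ratio a hole requires. Proportional time allocation and the clamped crop below are assumptions about how these rules might be realized; the function names are illustrative.

```python
from typing import List, Tuple


def display_times(scores: List[float], total_seconds: float) -> List[float]:
    """Split total_seconds across images in proportion to their scores."""
    total_score = sum(scores)
    if total_score <= 0:
        # No usable scores: fall back to equal time per image.
        return [total_seconds / len(scores)] * len(scores) if scores else []
    return [total_seconds * s / total_score for s in scores]


def crop_box(width: float, height: float, cx: float, cy: float,
             target_aspect: float) -> Tuple[float, float, float, float]:
    """Largest crop with target_aspect (w/h), centered on (cx, cy) where possible.

    Returns (left, top, crop_width, crop_height), clamped to the image bounds.
    """
    if width / height > target_aspect:
        crop_h = height                   # image is too wide: keep full height
        crop_w = height * target_aspect
    else:
        crop_w = width                    # image is too tall: keep full width
        crop_h = width / target_aspect
    left = min(max(cx - crop_w / 2, 0), width - crop_w)
    top = min(max(cy - crop_h / 2, 0), height - crop_h)
    return (left, top, crop_w, crop_h)


times = display_times([1.0, 3.0], total_seconds=8.0)
box = crop_box(400, 300, cx=350, cy=150, target_aspect=1.0)
```

In the example the focus point sits near the right edge, so the square crop is pushed back inside the image rather than centered exactly on it.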
Logging user sessions
<Interaction>
<Click>
<GlobalEventID>urn:guid:f1337996-3c28-4345-b4fb-c4f1b788fc05</GlobalEventID>
<SortedEventID>0</SortedEventID>
<TimeStamp>10:17:47 AM</TimeStamp>
<Criteria_type>gps</Criteria_type>
<Criteria_value>61.2175937710438 , -149.898739309764</Criteria_value>
<HotSpotClick>False</HotSpotClick>
</Click>
<Snapshot>
<Locations>
<loc>-149.898739309764,61.2175937710438</loc>
<loc>-73.508556462585,40.5956603174603</loc>
<loc>102.757525301205,25.1018832329317</loc>
<loc>104.195397,35.86166</loc>
<loc>6.09306585111111,52.7236709366667</loc>
</Locations>
<People>
<peo>Jiebo</peo>
<peo>Joyce</peo>
<peo>Xinping</peo>
<peo></peo>
<peo></peo>
</People>
<SortedEvents>
<eve>urn:guid:f1337996-3c28-4345-b4fb-c4f1b788fc05</eve>
<eve>urn:guid:f1337996-3c28-4345-b4fb-c4f1b788fc05</eve>
<eve>urn:guid:f1337996-3c28-4345-b4fb-c4f1b788fc05</eve>
<eve>urn:guid:f1337996-3c28-4345-b4fb-c4f1b788fc05</eve>
<eve>urn:guid:f1337996-3c28-4345-b4fb-c4f1b788fc05</eve>
<eve></eve>
</SortedEvents>
<PicsShown>
<pic>c:\data\jiebo\cvpr2008\103_5972.jpg</pic>
<pic>c:\data\jiebo\cvpr2008\103_5973.jpg</pic>
<pic>c:\data\jiebo\lijiang-shangrila-day2\108_0043.jpg</pic>
<pic>c:\data\jiebo\lijiang-shangrila-day2\108_0044.jpg</pic>
</PicsShown>
</Snapshot>
</Interaction>
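A log record in this shape can be read back with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the sample; the `parse_interaction` helper and the dictionary it returns are illustrative, not part of the logged system. Note that in the sample the click's `Criteria_value` lists 61.2…, -149.8… while the snapshot's `<loc>` entries list -149.8…, 61.2…, so the two fields appear to use opposite coordinate orders; the code keeps the click's order as written.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample log record, for illustration.
SAMPLE = """<Interaction>
  <Click>
    <GlobalEventID>urn:guid:f1337996-3c28-4345-b4fb-c4f1b788fc05</GlobalEventID>
    <SortedEventID>0</SortedEventID>
    <TimeStamp>10:17:47 AM</TimeStamp>
    <Criteria_type>gps</Criteria_type>
    <Criteria_value>61.2175937710438 , -149.898739309764</Criteria_value>
    <HotSpotClick>False</HotSpotClick>
  </Click>
  <Snapshot>
    <People><peo>Jiebo</peo><peo>Joyce</peo><peo></peo></People>
  </Snapshot>
</Interaction>"""


def parse_interaction(xml_text: str) -> dict:
    """Extract the click fields and the non-empty people from one record."""
    root = ET.fromstring(xml_text)
    click = root.find("Click")
    if click is None:
        raise ValueError("record has no <Click> element")
    record = {child.tag: (child.text or "").strip() for child in click}
    if record.get("Criteria_type") == "gps":
        first, second = (float(p) for p in record["Criteria_value"].split(","))
        record["Criteria_value"] = (first, second)   # order kept as logged
    record["HotSpotClick"] = record.get("HotSpotClick") == "True"
    # Empty <peo> slots are skipped.
    record["People"] = [p.text for p in root.findall("./Snapshot/People/peo") if p.text]
    return record


rec = parse_interaction(SAMPLE)
```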
Evaluations
• Experiments with 11 families
• 35 user interaction sessions logged
• Roles
• 1st person (owner)
• 2nd person (immediate family)
• 3rd person (friends, cousins)
Age of contributing photographers: 23 to 56
No. of images/videos in the collection: 2,091 to 10,522
No. of calendar years in time span: 3 to 10
No. of tagged people in the collection: 26 to 137
No. of places in the collection: 19 to 45
Experiment 1: Comparison with commercially available options
Experiment 2: Use of different features across different user demographics
          All    1st party  2nd party  3rd party
Females   1.14   1.49       1.13       1.01
Males     1.41   1.25       2.08       1.43
Both      1.30   1.27       1.28       1.35
Active vs. passive?
• Clicks per axis
• Stickiness: time spent after clicks
Future work
• Choosing songs more generically/smartly
• Choosing optimal spatio-temporal placement of images in the slide show
• Choosing layout
• Choosing transition time?
• Supporting multiple axes simultaneously
• Previews