CAIP 2015: Introduction on Visual Lifelogging, Part I
Deep learning and egocentric vision
Petia Radeva & Mariella Dimiccoli
www.cvc.uab.es/~petia
Barcelona Perceptual Computing Laboratory (BCNPCL), Universitat de Barcelona &
Computer Vision Center
Index
1. Egocentric Vision and Lifelogging
2. Deep learning
3. Temporal egocentric segmentation for event extraction
4. Social analysis from egocentric images
Wearable cameras and the life-logging trend
IMEC, University of Eindhoven, March, 2015
Shipments of wearable computing devices worldwide by category from 2013 to 2015 (in millions)
Technology is “running”!
Evolution of life-logging apparatus, including wearable computer, camera, and viewfinder with wireless Internet connection. Early apparatus used separate transmitting and receiving antennas. Later apparatus evolved toward the appearance of ordinary eyeglasses in the late 1980s and early 1990s.
“Quantified Self & Life-logging Meets Internet of Things (IOT)”, Mazlan Abbas.
Visual life-logging
Definition: Visual life-logging consists of acquiring images related to an individual through a wearable camera.
Benefits:
• A digital memory of people you met, conversations you had, places you visited, and events you participated in.
  • This memory would be searchable, retrievable, and shareable.
• A 24/7/365 monitoring of daily activities.
  • This data could serve as a warning system and also as a personal base upon which to diagnose illness and to prescribe medicines.
• A way of organizing, shaping, and “reading” your own life.
  • A complete archive of your work and play, and your work habits. Deep comparative analysis of your activities could assist your productivity, creativity, and consumptivity.
• To the degree this life-log is shared, this archive of information can be leveraged to help others work and to amplify social interactions; in the biological realm, shared medical logs could rapidly advance medical discoveries.
Will life-logging and the Internet of Things help us know ourselves better?
Wouldn't you love to know…
• Where you're going? Who you've interacted with?
• How long you've spoken to friends? Your affinity to your connections?
• How long it takes to get to work?
• The tone of your messages? The amount you text, tweet, or update?
• How much exercise you're getting?
• How much you get distracted? Where most of your time is spent?
Life-logging everything you see, do, feel, speak, experience and hear is almost here.
Wearable cameras can provide many benefits, such as assistive technologies to help people:
• see better,
• store and remember better,
• work better,
• function better,
• remember and recognize names and faces…
AMDO'2014
The three trends of the mainstream
Lifelogging
• Lifelogging is keeping everything by using technology to capture events, documents and experiences and keep them in a chronological timeline.
Life-streaming (social networking)
• Life-streaming is sharing details of your life with other people.
Self-quantified (health & welfare)
• Self-quantifying monitors your biological status, captured for the purpose of improving and sharing it.
Social networks & lifelogging
• Facebook originally began as a college networking website in 2004 and has expanded beyond what anyone could imagine.
  • Facebook has more than 350 million active users, and more than 2.5 billion photos are uploaded every month.
• Flickr.com is an image and video hosting website launched in 2004 by a company called Ludicorp.
  • Flickr offers its users comprehensive photo organisation and tagging, which makes it the web's number one website for bloggers to share their photos.
  • Flickr has around 44 million users.
“To Log or Not to Log? Privacy Risks and Solutions for Lifelogging and Continuous Activity Sharing Applications”, Blaine Price, ENISA, Centre for Research in Computing.
• YouTube is the most popular video sharing site that lets anyone upload short videos for private or public viewing.
• Founded in 2005 by Chad Hurley, Steve Chen and Jawed Karim, YouTube is one of the top five most visited websites in the world, with more than 5 billion videos uploaded monthly.
• Twitter.com is a social networking website created in 2006, offering a micro-blogging service that lets users send and read short text messages, known as tweets, of up to 140 characters in length.
• Twitter is becoming more popular and, as a result, more than 3 million tweets are published daily.
• The Instagram community has grown to more than 200 million Instagrammers capturing and sharing their lives every month.
• As Instagram exceeds 20 billion photos shared to date, we look back in wonder at the beauty and importance of everything this community has created.
Health, welfare & lifelogging
What is Quantified Self (QS)? The continuous tracking of various aspects of our physical bodies:
• how many calories we burn;
• how many hours a week we spend commuting or sitting;
• our body fat percentage;
• how many steps we take in a day;
• how long we sleep;…
Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw 2012.
How many self-trackers/self-quantifiers are there in the audience?
Increasingly continuous and automated data collection for:
Physical Activities
• Miles, steps, calories, repetitions, sets, METs
Diet and Nutrition
• Calories consumed, carbs, fat, protein, specific ingredients, glycemic index, satiety, portions, supplement doses, tastiness, cost, location
Psychological, Mental, and Cognitive States and Traits
• Mood, happiness, irritation, emotion, anxiety, esteem, depression, confidence, IQ, alertness, focus, selective/sustained/divided attention, reaction, memory, verbal fluency, patience, creativity, reasoning, psychomotor vigilance
Environmental Variables
• Location, architecture, weather, noise, pollution, clutter, light, season
Situational Variables
• Context, situation, gratification of situation, time of day, day of week
Social Variables
• Influence, trust, charisma, karma, current role/status in the group or social network
Living “digitally”
Life signals: The marketplace is soon to be flooded with inexpensive sensors (smart phones already integrate cameras, microphones, GPS, and accelerometers):
• collecting a wide variety of data in digital form (e.g., continuous video and sound of one's daily activities, sport performance, body weight, or sleep patterns),
• automatically geolocating and uploading personal data to websites,
• providing statistical and visualization tools for forecasting and trend analysis.
Continuous and complete life signals: If one factors in the gradual elimination of paper in favor of digital forms for commercial transactions, communication, documentation, etc., and the continually plummeting costs of digital storage,
• the picture of a world where “everyone is on the record all the time” does not seem far-fetched.
Digital & on the cloud: Furthermore, for the first time in history, these different types of records (images, quantitative data, audio recordings, written documents, etc.) are available for transmission, storage, indexation, analysis, retrieval, and visualization in a single medium.
Do you know that…?
http://ideas.ted.com/2014/11/17/the-economic-impact-of-bad-meetings/
University of Groningen, November, 2014
Meetings and us
• Is our impression of how we spend our time correct?
• How can we extract a summary of a meeting event in order to display it and optimize our activities?
• What about recording our whole day, day by day?
CAIP, Malta, September, 2015
Is it feasible to record everything that happens in a person's life?
• In 1970, a disk storing 20 MB was the size of a washing machine and cost $20,000.
• Today a TB (one trillion bytes) costs $100 and is the size of a paperback book.
• By 2020 a TB will cost the same as a good cup of coffee and will probably be in your cell phone.
• $100 will then buy around 250 TB of storage, enough to hold tens of thousands of hours of video and tens of millions of photographs.
  • This should satisfy most life-loggers' recording needs for an entire life.
• In fact, digital storage capacity is increasing faster than our ability to pull information back out.
  • Since 2000 it has become trivial and cheap to sock away tremendous piles of data.
“Total Recall: How the E-Memory Revolution Will Change Everything”, Gordon Bell and Jim Gemmell, 2009, Dutton, Penguin Group.
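The storage estimate above can be checked with rough arithmetic. The sketch below assumes about 2 GB per hour of compressed video, about 2 MB per photo, and an even split of the 250 TB between the two; these figures are illustrative assumptions and do not come from the book.

```python
TB = 10**12  # decimal terabyte, in bytes


def storage_capacity(total_tb=250, video_share=0.5,
                     bytes_per_video_hour=2 * 10**9,  # assumed: ~2 GB per hour
                     bytes_per_photo=2 * 10**6):      # assumed: ~2 MB per photo
    """Return (hours of video, number of photos) that fit in total_tb."""
    total = total_tb * TB
    video_hours = total * video_share / bytes_per_video_hour
    photos = total * (1 - video_share) / bytes_per_photo
    return video_hours, photos


hours, photos = storage_capacity()
print(f"{hours:,.0f} hours of video and {photos:,.0f} photos")
```

Under these assumptions the result falls in the "tens of thousands of hours" and "tens of millions of photographs" range quoted on the slide; other bitrates or photo sizes shift the numbers proportionally.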
Moore's Law (1965): “Transistor density that can be etched onto the silicon wafer of a microchip doubles every two years”.
The hard part is no longer deciding what to hold on to, but how to efficiently organize it, sort it, access it, and find patterns and meaning in it.
This is a primary challenge for the engineers who will fully unleash the power of Total Recall.
Data management
The really big issue here is that you might, individually, not worry about recording and even publishing details of your personal life.
• But you are publishing details of your friends, family and business contacts at the same time.
Ethical guidelines for wearable cameras
• Anonymity and confidentiality: Researchers coding image data should:
  • not discuss the content with anyone outside of the team,
  • not identify anyone they recognize in the images,
  • be aware of how sensitive the data are.
• Data encryption: Confidentiality can be protected by configuring devices and using specialist viewing software to make the images accessible only to the research team (e.g., when a device is lost).
  • Devices should be configured so that data can only be retrieved by the research team. It should be impossible for participants or third parties who find devices to access the images.
• Data storage: Collected images should be stored securely and password-protected, according to national regulations.
Understanding user privacy requirements and risks from emerging technologies
• People are bad at:
1. understanding the future value of revealing private information today,
2. understanding the risks from technology they have not yet used or heard of.
Wearable cameras can be very useful: An estimated one million Russian motorists have dashboard video cameras installed in their cars.
Police officers carry video camera units, using Velcro to place these cameras in police wagons, as well as helmet cams, ear cams, chest cams with audio capability, GPS locators and taser cams.
• “Even with only half of the 54 uniformed patrol officers wearing cameras at any given time, a department in USA had an 88 % decline in the number of complaints filed against officers, compared with the 12 months before the study”, The New York Times, 4th of July, 2013.
The world's leading police body-worn video camera is deployed by over 4000 agencies in 16 countries.
Benefits and potential applications
• It will take quite some time for people to feel comfortable with ‘always connected’ devices that can discreetly take photos or videos. Will the benefits outweigh the negatives?
“Quantified Self & Life-logging Meets Internet of Things (IOT)”, Dr. Mazlan Abbas, MIMOS Berhad
Life-logging and Egocentric Vision
Egocentric vision
“Taking photos can impair your ability to remember”
Next time you're at a museum or an event, think before you snap a photo!
“SenseCam has already made an impact in memory enhancement”
• What do we have?
  • Self-centric, automatically captured images of our life
  • Objective photos versus subjective (traditional) photos
  • The photos most similar to our memory
• What are we looking for?
  • Semantic information (life-logs, NOT data)
  • A Search Engine for the Self is required! (life-logging)
Egocentric vision
• What we want:
Events to be extracted from life-logging images
• What we have:
Wealth of life-logging data
• We propose an energy-based approach for motion-based event segmentation of life-logging sequences of low temporal resolution.
• The segmentation is reached by integrating different kinds of image features and classifiers into a graph-cut framework to ensure consistent sequence treatment.
Complete dataset of a day captured with SenseCam (more than 4,100 images)
The choice of device depends on: 1) where it is worn: a camera hung around the neck has the advantage of being considered less obtrusive for the user, and 2) its temporal resolution: a camera with a low frame rate will capture less motion information, but we will need to process less data. We chose a SenseCam or Narrative: cameras hung around the neck or pinned on clothing that capture 2-4 frames per minute.
Or the hell of life-logging data
Towards lifestyle characterization
We want to extract life-logging information about:
• Events
• Activities
• Social interactions, etc.
• Memorable moments
• Personal habits and context
• Lifestyle… healthy lifestyle…
Our egocentric vision research:
• Video segmentation for events extraction
• Motion-based video segmentation towards activities recognition
• Human tracking, towards social interaction and key-frame extraction
• Active learning for object recognition
• Object discovery for lifestyle characterization, etc.
What? Where? When? Who?
Egocentric vision: an overview
Marc Bolaños, Mariella Dimiccoli, Petia Radeva, “Towards Storytelling from Visual Lifelogging: An Overview”, arXiv:1507.06120, 2015 (submitted to a journal).
Egocentric informativeness through deep learning
[Figure: Day Lifelog → Informativeness CNN → Informative Images]
Temporal video segmentation
Life-logging Video Segments Extraction
[Pipeline figure: Data set (thousands of images) → Features Extraction (RGB+HOG, CNN) → Clustering (Ward, k-Means, Spectral Clustering) → Event extraction → Event characterization → Daily Video Summarization]
Estefania Talavera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei and Petia Radeva. R-clustering for Egocentric Video Segmentation, IbPRIA'15, Santiago de Compostela, 2015 (extended version submitted to a journal).
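As a rough illustration of the clustering stage in the pipeline above, the sketch below merges temporally adjacent frames using Ward's criterion until a requested number of segments remains. It is a simplified, temporally constrained stand-in and not the R-clustering method of the cited paper; the function names, the toy feature vectors and the fixed segment count are assumptions made for the example.

```python
def ward_cost(a, b):
    """Increase in within-segment variance if segments a and b are merged."""
    na, nb = len(a), len(b)
    dim = len(a[0])
    mean_a = [sum(v[d] for v in a) / na for d in range(dim)]
    mean_b = [sum(v[d] for v in b) / nb for d in range(dim)]
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return na * nb / (na + nb) * sq_dist


def segment_photostream(features, n_segments):
    """Greedily merge adjacent segments; return (first, last) frame indices."""
    segments = [[f] for f in features]  # start with one segment per frame
    while len(segments) > n_segments:
        costs = [ward_cost(segments[i], segments[i + 1])
                 for i in range(len(segments) - 1)]
        i = costs.index(min(costs))      # cheapest adjacent merge
        segments[i:i + 2] = [segments[i] + segments[i + 1]]
    bounds, start = [], 0
    for seg in segments:
        bounds.append((start, start + len(seg) - 1))
        start += len(seg)
    return bounds


# Toy 2-D features standing in for RGB+HOG or CNN descriptors.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0],
          [9.0, 0.0], [9.1, 0.1]]
print(segment_photostream(frames, n_segments=3))
```

Restricting merges to neighbouring segments keeps every event contiguous in time, which plain k-means or spectral clustering on the features alone would not guarantee.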
Motion-based video segmentation
• Event segmentation in LL data is characterized by the movement of the person wearing the device.
• Group consecutive frames into three general event classes:
  • “Static” (person is not moving)
  • “In Transit” (person is moving or running)
  • “Moving Camera” (person is in the same area performing some action).
M. Bolaños, M. Garolera, P. Radeva, “Video segmentation of life-logging videos”, AMDO'2014, Springer-Verlag, 2014, p. 1-9.
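The idea of enforcing temporally consistent labels over the three event classes can be sketched as an energy minimisation on a chain of frames. The slides mention a graph-cut framework; the example below swaps in Viterbi dynamic programming with a Potts switching penalty, which is exact on a 1-D chain and simpler to show. The per-frame costs and the penalty value are hypothetical.

```python
LABELS = ("Static", "In Transit", "Moving Camera")


def smooth_labels(unary_costs, switch_penalty=1.0):
    """Minimise sum of per-frame costs plus a penalty per label change.

    unary_costs[t][k] is the cost of giving frame t the label LABELS[k]
    (lower means the classifier finds that label more likely).
    """
    k = len(LABELS)
    best = [list(unary_costs[0])]  # best[t][c]: cheapest path ending in c
    back = []                      # back-pointers for path recovery
    for t in range(1, len(unary_costs)):
        row, ptr = [], []
        for cur in range(k):
            cands = [best[-1][prev] + (0.0 if prev == cur else switch_penalty)
                     for prev in range(k)]
            p = min(range(k), key=cands.__getitem__)
            row.append(cands[p] + unary_costs[t][cur])
            ptr.append(p)
        best.append(row)
        back.append(ptr)
    state = min(range(k), key=best[-1].__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [LABELS[s] for s in path]


# Frame 2 is a noisy prediction inside a static stretch.
costs = [[0.1, 0.9, 0.9], [0.1, 0.9, 0.9], [0.6, 0.4, 0.9],
         [0.1, 0.9, 0.9], [0.9, 0.1, 0.9], [0.9, 0.2, 0.9]]
print(smooth_labels(costs, switch_penalty=1.0))
```

With a zero penalty the result is the per-frame best label; raising the penalty removes isolated label flips and yields longer, more stable events.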
Visual summary through keyframe extraction
M. Bolaños, R. Mestre, E. Talavera, X. Giró-i-Nieto, P. Radeva, “Visual Summary of Egocentric Photostreams by Representative Keyframes”, arXiv preprint arXiv:1505.01130, 2015, IEEE International Conference on Multimedia and Expo Workshops (ICMEW).
Results
Joint work between UPC & UB.
Semantic Summarization
• Images should be:
1. Not repetitive
2. Not usual
• And answer:
1. Who?
2. What?
3. How?
4. When?
5. Where?
In order to construct the storytelling:
Results:
We define a criterion to choose the keyframes based on saliency, objectness and faceness.
Master's thesis of Aniol Lidon, supervised by Xavier Giró-i-Nieto & Petia Radeva. Joint work between UPC & UB.
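One simple way to combine saliency, objectness and faceness into a keyframe criterion is a weighted sum per frame, picking the highest-scoring frame of each event. The slides do not state the actual criterion, so the weights, score ranges and function names below are hypothetical.

```python
# Hypothetical weights; the real criterion is not given in the slides.
WEIGHTS = {"saliency": 0.4, "objectness": 0.3, "faceness": 0.3}


def keyframe_score(frame, weights=WEIGHTS):
    """Weighted sum of the per-frame cues (each assumed to lie in [0, 1])."""
    return sum(weights[name] * frame[name] for name in weights)


def select_keyframe(event_frames, weights=WEIGHTS):
    """Return the index of the highest-scoring frame within one event."""
    return max(range(len(event_frames)),
               key=lambda i: keyframe_score(event_frames[i], weights))


event = [
    {"saliency": 0.2, "objectness": 0.9, "faceness": 0.0},
    {"saliency": 0.7, "objectness": 0.6, "faceness": 0.8},
    {"saliency": 0.9, "objectness": 0.1, "faceness": 0.1},
]
print(select_keyframe(event))
```

Changing the weights shifts the summary towards people (faceness), things (objectness) or visually striking frames (saliency).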
Food detection in egocentric vision
How can we automatically detect every instance of a dish in all of its variants, shapes and positions, and in such a large number of images?
The main problems that arise are:
• Complexity and variability of the data.
• Huge amounts of data to analyse (up to 100,000 images per month).
Any efficient supervised classifier needs a huge training set! Using human time to label millions of images is too expensive.
We need an automatic aid for labelling huge amounts of images.
Active learning for nutrition-related objects annotation
Marc Bolaños, Maite Garolera, Petia Radeva, "Active Labeling Application Applied to Food-Related Object Recognition", 5th Workshop on Multimedia for Cooking and Eating Activities: CEA2013, ACM International Conference on Multimedia 2013, Barcelona, October, 2013.
Active Learning
• Goal: Minimize human effort by guiding the process of creating the training set.
Effect of active learning: improvement of learning performance (left), improvement of training time (right).
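A common way to guide the creation of a training set is uncertainty sampling: ask the human to label the images the current classifier is least sure about. The sketch below ranks unlabelled samples by the entropy of their predicted class probabilities. This is a generic query strategy chosen for illustration; the slides do not say which strategy the cited work uses, and the probabilities are made up.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(class_probs, n_queries):
    """Indices of the n_queries most uncertain unlabelled samples."""
    ranked = sorted(range(len(class_probs)),
                    key=lambda i: entropy(class_probs[i]),
                    reverse=True)
    return ranked[:n_queries]


# Hypothetical classifier outputs (food vs. non-food) for four images.
predictions = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]
print(select_queries(predictions, n_queries=2))
```

After the selected images are labelled, the classifier is retrained and the loop repeats, so human effort goes to the samples expected to help most.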
Object discovery for context characterisation
• What is the context of a person day by day?
Do these sets belong to the same person?
[Figure: image sets labelled Person 1, Person 1, Person 2]
M. Bolaños, P. Radeva, “Object Discovery for Egocentric Videos Based on Convolutional Neural Network Features”, Workshop on Storytelling, ECCV, 2014.
Marc Bolaños, Maite Garolera and Petia Radeva. Object Discovery using CNN Features in Egocentric Videos, IbPRIA'15, Santiago de Compostela, 2015 (submitted to a journal).
Social analysis
• Who was with us? For how long?
• Humans are important
• Video segmentation based on presence of people
  • Each segment is an event related to a specific person
• The intuitive solution is to track visible people
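Once a tracker has assigned a person identity to each frame, segmenting by presence of people reduces to grouping consecutive frames with the same identity. The sketch below assumes such per-frame labels already exist and uses a hypothetical 30 seconds between frames to answer "who was with us, and for how long?".

```python
from itertools import groupby


def person_segments(frame_labels, seconds_per_frame=30):
    """Group consecutive frames showing the same person.

    frame_labels holds one person ID per frame, or None when nobody is
    visible. Returns (person, first_frame, last_frame, duration_seconds).
    """
    segments = []
    start = 0
    for person, run in groupby(frame_labels):
        length = len(list(run))
        if person is not None:
            segments.append((person, start, start + length - 1,
                             length * seconds_per_frame))
        start += length
    return segments


# Hypothetical tracker output for seven consecutive frames.
labels = ["anna", "anna", None, "bob", "bob", "bob", "anna"]
for segment in person_segments(labels):
    print(segment)
```

Summing the durations per person gives the total time spent with each individual over the day.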
• A tracking method independent of background changes and temporal resolution could fit this data
M. Aghaei, P. Radeva, “Bag-of-Tracklets for Person Tracking in Life-Logging Data”, CCIA'2014. Extended version submitted to a journal.
Analysis of human interaction
Maedeh Aghaei, Mariella Dimiccoli, Petia Radeva, “Towards Social Interaction Detection in Egocentric Photo-streams” (submitted).
F-formations extraction from egocentric vision
Life-logging analysis for MCI treatment
Goal: to develop tools for memory reinforcement for people with MCI and Alzheimer's disease.
• To develop a life-logging system recording specific autobiographical episodes for later stimulation of episodic memory function, known to be deficient in MCI.
• To explore the association between biomarker changes and cognitive, functional and emotional outcomes.
• To learn more about the underlying biological mechanisms by which effective behavioural interventions improve cognitive and functional outcomes.
Inspired by the case of SenseCam use with a patient with herpes encephalitis.
But what are the memorable moments?
Life-logging for welfare
• how to extract semantic units related to the lifestyle and their context relation,
• how to segment life-log data into meaningful events,
• what are the semantic units that characterize the lifestyle of individuals,
• what is their relation and how the context affects them,
• how to extract and characterize lifestyle patterns,
• what is the healthstyle, etc.
To derive lifestyle patterns from visual life-logs and to conduct a study on the feasibility of automatic generation of lifestyle patterns and interpretations, to be used in the future to improve the lifestyle of individuals.
Towards lifestyle characterization
Towards visualizing summarized lifestyle data to ease the management of the user's healthy habits (sedentary lifestyles, nutritional activity, etc.).
Life-logging can help us accomplish our goals: taking photos of our everyday life and being able to analyse what we eat, starting with dish recognition.
Aim: to develop tools based on lifelogging to show the relation between nutrition and cognition. European proposal submitted to the call in April 2015.
Conclusions
• Life-logging is a very recent trend, growing very fast and with a huge potential.
• Technology for data acquisition and storage is ready.
• Life-loggers are waiting for egocentric vision tools to process the huge amount of data.
• We developed several algorithms for:
  • Video segmentation
  • Event detection
  • Object and scene recognition
  • Key frame extraction
  • Video summarization and browsing
  • Social interaction
  • Storytelling
• Huge potential for numerous applications to health & welfare that will strongly influence the future of the technology