Semantic and Diverse Summarization of Egocentric Photo Events
TRANSCRIPT
![Page 1: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/1.jpg)
Semantic and Diverse Summarization of Egocentric Photo Events
Aniol Lidon Baulida
Master Computer Vision (UAB, UPC, UPF, UOC)
Advisors:
Xavier Giró Nieto, Image Processing Group, Universitat Politècnica de Catalunya
Petia Radeva, Barcelona Perceptual Computing Lab, Universitat de Barcelona
![Page 2: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/2.jpg)
Collaboration
Barcelona Perceptual Computing Laboratory:
Marc Bolaños, Petia Radeva
Image Processing Group:
Xavier Giró
Grup de Recerca Cervell, Cognició i Conducta:
Maite Garolera
Institute of Creative Media Technologies:
Matthias Zeppelzauer
![Page 3: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/3.jpg)
Motivation
• In 2013, 44.4 million people were living with dementia worldwide.
• “Cognitive Stimulation Therapy”
![Page 4: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/4.jpg)
Motivation
• Lifelogging with Narrative Clip.
• Up to 2,000 to 3,000 images per day!
• Summarization is needed.
![Page 5: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/5.jpg)
Goal
Automatically summarize events.
• Sorting by priority.
• Trade-off between relevance and diversity.
• Obtaining sorted ranks.
![Page 8: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/8.jpg)
State of the art
• This project continues the work started by Ricard Mestre.
– Event segmentation and selection of the most repetitive image from an event.
• Off-the-shelf algorithms used:
– Informativeness network: provided by Marc Bolaños (to be published).
– Blur detection: Crete et al. The blur effect: perception and estimation with a new no-reference perceptual blur metric.
– Saliency maps: provided by Kevin McGuinness (to be published).
– Face detection: Zhu et al. Face detection, pose estimation, and landmark localization in the wild.
– Object candidates: Arbelaez et al. Multiscale Combinatorial Grouping.
– Object detector: Hoffman et al. Large Scale Detection through Adaptation.
– Affective: Campos et al. Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction.
![Page 9: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/9.jpg)
Pipeline
![Page 11: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/11.jpg)
Prefiltering
Aim: Removing uninformative images.
Informativeness network
Fine-tuning by Human Annotations
Filtering out: Discarding absolutely uninformative frames.
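A minimal sketch of the filtering step, assuming the informativeness network outputs one score per frame in [0, 1]. The function name `prefilter` and the threshold of 0.5 are illustrative assumptions, not values taken from the slides.

```python
def prefilter(scores, threshold=0.5):
    """Return the indices of frames whose informativeness score reaches the threshold.

    `scores` is assumed to hold one network output per frame, in capture order.
    """
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical scores for four frames of one event.
scores = [0.91, 0.05, 0.40, 0.77]
kept = prefilter(scores, threshold=0.5)
print(kept)
```

Frames that fall below the threshold are dropped before any relevance scoring, so later stages only rank frames that carry some information.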
![Page 14: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/14.jpg)
Relevance
What is relevance?
Frame-level:
• Repeated.
• Unusual.
• WHAT? Representative of an activity.
• WHO? Social interactions.
• WHERE? Environment.
• WHEN an event has occurred.
• HOW activity occurred.
![Page 15: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/15.jpg)
Relevance
What is relevance?
Frame-level:
• WHAT? Representative of an activity.
– Saliency maps
– Object detection
• WHO? Social interactions.
– Face detection
– Sentiment analysis (affectivity)
![Page 16: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/16.jpg)
Relevance Ranking: pipeline
Prefiltering
Diversity re-ranking
![Page 17: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/17.jpg)
Relevance ranking
Saliency maps
SalNet CNN
Aim: Determining interesting zones.
Scoring for relevance: Averaging all saliency-map values.
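A short sketch of the saliency score described above: the relevance of a frame is the mean of its saliency-map values. The map is assumed to be a 2D list of floats (for example, SalNet output scaled to [0, 1]); the function name is hypothetical.

```python
def saliency_relevance(saliency_map):
    """Average all values of a 2D saliency map into one relevance score."""
    values = [v for row in saliency_map for v in row]
    if not values:
        raise ValueError("saliency map is empty")
    return sum(values) / len(values)


# Tiny 2x2 map standing in for a full-resolution saliency map.
example_map = [[0.0, 1.0],
               [0.5, 0.5]]
score = saliency_relevance(example_map)
print(score)
```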
![Page 18: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/18.jpg)
Relevance ranking
Objects
LSDA (Large Scale Detection through Adaptation) object detector
Aim: Finding well-defined objects.
Scoring for relevance: Summing the scores of all detected objects.
![Page 19: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/19.jpg)
Relevance ranking
Faces
Face detection, pose estimation, and landmark localization in the wild.
Aim: Finding well-defined faces.
Scoring for relevance: Summing exponentially all face confidences.
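The slide does not spell out what "summing exponentially" means. One possible reading, shown here purely as an assumption, is to sum the exponential of each detection confidence, so that confident faces weigh far more than weak ones. The function name and the input format (a list of raw detector confidences) are hypothetical.

```python
import math


def face_relevance(confidences):
    """Sum exp(confidence) over all detected faces; 0 when no face is found.

    This is one interpretation of the slide, not a confirmed formula.
    """
    return sum(math.exp(c) for c in confidences)


# A frame with two detected faces versus a frame with none.
with_faces = face_relevance([0.0, 1.0])
without_faces = face_relevance([])
print(with_faces, without_faces)
```

Under this reading, each extra face always increases the score, and a single high-confidence face can outweigh several doubtful detections.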
![Page 23: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/23.jpg)
Diversity re-ranking
Re-ranking by Soft Max Diversity Fusion
Color similarity
Face similarity
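The slides name the method but do not define it, so the following is not Soft Max Diversity Fusion itself. It is a generic greedy diversification sketch under stated assumptions: each frame has a relevance score, `distance[i][j]` is a precomputed dissimilarity between frames, and each step picks the frame that maximizes relevance times its distance to the closest frame already selected.

```python
def diversity_rerank(relevance, distance):
    """Greedy re-ranking that trades relevance against novelty.

    relevance: list of per-frame scores.
    distance:  square matrix (list of lists) of pairwise dissimilarities.
    Returns frame indices in their new order.
    """
    if not relevance:
        return []
    remaining = list(range(len(relevance)))
    # Start from the most relevant frame.
    first = max(remaining, key=lambda i: relevance[i])
    order = [first]
    remaining.remove(first)
    while remaining:
        # Favor frames that are relevant AND far from everything chosen so far.
        best = max(
            remaining,
            key=lambda i: relevance[i] * min(distance[i][j] for j in order),
        )
        order.append(best)
        remaining.remove(best)
    return order


# Frames 0 and 1 are near-duplicates; frame 2 is less relevant but different.
relevance = [0.9, 0.85, 0.5]
distance = [
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
order = diversity_rerank(relevance, distance)
print(order)
```

In the example the near-duplicate frame 1 is pushed to the end even though it is the second most relevant, which is the behavior a diversity re-ranking stage is meant to produce.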
![Page 26: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/26.jpg)
Similarity measure
ImageNet
Euclidean distance between features (L2 norm).
CNN trained with the ImageNet DB (1000 classes) using the CaffeNet architecture.
Fully connected layer 8 removed.
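A minimal sketch of the distance above, assuming each image is represented by a feature vector taken from the network once layer fc8 is removed (presumably the activations of the last remaining fully connected layer). The three-dimensional vectors below are toy stand-ins for the real features.

```python
import math


def euclidean_distance(a, b):
    """L2 distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy feature vectors; real CNN features would be much longer.
features_a = [0.0, 0.0, 1.0]
features_b = [3.0, 4.0, 1.0]
d = euclidean_distance(features_a, features_b)
print(d)
```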
![Page 29: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/29.jpg)
Assessment
INTERMEDIATE VALIDATION
Validation of automatic approach, using manually annotated summaries:
• 7 datasets with labelled ground truth
FINAL EVALUATION
Psychologists' feedback:
• 2 online questionnaires
• Mean Opinion Score
![Page 30: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/30.jpg)
Subjective problem
Precision
GROUND-TRUTH SELECTED
![Page 31: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/31.jpg)
Metric
Mean Normalized Sum of Max Similarities (MNSMS)
[Figure: the MNSMS curve plotted against n (%), built step by step for n = 1 to n = 4 by comparing the ground truth with the top n images of the sorted list of results and summing the similarities; the final score is the area under the curve (AUC).]
Normalization in both axes:
• Y: Divide by GT samples.
• X: Reshape samples to N bins.
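The following sketch shows one way to compute the curve described above. It rests on assumptions that the slides only imply: `similarity[g][r]` holds a similarity in [0, 1] between ground-truth image g and the result ranked r, the value at n is the sum over ground-truth images of their best similarity to any of the top n results divided by the number of ground-truth images, the curve is resampled to a fixed number of bins, and the final score is the mean of the resampled curve (a discrete AUC). All function names are hypothetical.

```python
def sms_curve(similarity):
    """Normalized sum of max similarities for n = 1 .. number of results."""
    n_gt = len(similarity)
    n_results = len(similarity[0])
    best = [0.0] * n_gt  # best similarity found so far for each GT image
    curve = []
    for r in range(n_results):
        best = [max(best[g], similarity[g][r]) for g in range(n_gt)]
        curve.append(sum(best) / n_gt)  # Y axis: divide by GT samples
    return curve


def resample(curve, bins):
    """X axis: map a curve of any length onto a fixed number of bins."""
    n = len(curve)
    # Integer ceiling of (b + 1) * n / bins, converted to a 0-based index.
    return [curve[((b + 1) * n + bins - 1) // bins - 1] for b in range(bins)]


def auc(curve):
    """Discrete area under a curve sampled on a unit-width x axis."""
    return sum(curve) / len(curve)


# Two ground-truth images, three ranked results.
similarity = [
    [1.0, 0.0, 0.0],  # GT image 0 is matched by the first result
    [0.0, 0.0, 1.0],  # GT image 1 is only matched by the third result
]
curve = sms_curve(similarity)
score = auc(resample(curve, 3))
print(curve, score)
```

A ranking that covers every ground-truth image early makes the curve rise quickly and yields a larger area, which is why the AUC rewards relevance and diversity together.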
![Page 36: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/36.jpg)
Assessment
INTERMEDIATE VALIDATION
Validation of automatic approach, using manually annotated summaries:
• 7 datasets with labelled ground truth
• MNSMS (ImageNet) AUC
FINAL EVALUATION
Psychologists' feedback:
• 2 online questionnaires
• Mean Opinion Score
![Page 37: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/37.jpg)
Intermediate validation
Prefiltering
• Informativeness network
• Hand-crafted estimators
• No prefiltering
![Page 38: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/38.jpg)
Intermediate validation
Saliency relevance
• SalNet
• SalNet + Gaussian
Objects relevance
• LSDA (object detector)
• MCG (object candidates)
[Bar charts: saliency relevance AUC (SalNet vs. SalNet + Gauss) and objects relevance AUC (LSDA vs. MCG); y-axis from 0.7 to 0.9.]
![Page 39: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/39.jpg)
Intermediate validation
Affective relevance
• Positive
• Negative
• Extremum
• Random
Sentiment analysis CNN
• 2 classes: positive / negative
![Page 40: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/40.jpg)
Assessment
INTERMEDIATE VALIDATION
Validation of automatic approach, using manually annotated summaries:
• 7 datasets with labelled ground truth
• MNSMS (ImageNet) AUC
FINAL EVALUATION
Psychologists' feedback:
• 2 rounds of online questionnaires
• Mean Opinion Score
![Page 41: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/41.jpg)
Final evaluation
SIMILARITY
• ImageNet CNN (fc8 removed)
• Places CNN (fc8 removed)
• LSDA (only spatial NMS)
• Fusion (ImageNet + Places + LSDA)
(Diversity re-ranking + Weight fusion in MNSMS)
![Page 43: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/43.jpg)
Final evaluation
MEAN OPINION SCORE
• ImageNet configuration
• Uniform Sampling
• Ground-truth (previous manual annotation)
![Page 45: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/45.jpg)
Final results
Representativity of summaries:
Preferred summary:
Mean Opinion Score (1 worst - 5 best)
![Page 46: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/46.jpg)
Generalization
MediaEval diversity task
• APPLICATION: Finding more information about a place to visit.
• GOAL: Provide a ranked list of Flickr photos for a predefined set of queries. The refined list should be both relevant to the query and also diverse.
A. Lidon, M. Bolaños, M. Seidl, X. Giro-i Nieto, P. Radeva, and M. Zeppelzauer, “Upc-ub-stp @ mediaeval 2015 diversity task: Iterative reranking of relevant images,” in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[Bar chart: Run 1 F1@20 (Visual); y-axis from 0.40 to 0.56.]
![Page 47: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/47.jpg)
Conclusions
• Contributions:
– Mean Normalized Sum of Max Similarities.
– New criterion for semantic diversity (based on LSDA).
– New method for diversity fusion.
– Online evaluation questionnaires.
![Page 48: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/48.jpg)
Conclusions
• Tested in two applications:
– Memory reinforcement for mild dementia.
– Diverse Social Images Task from the scientific MediaEval benchmark.
• Mean Opinion Score of 4.6 out of 5.00.
• Publications:
– Working-notes paper in the MediaEval challenge.
– Wearable and Ego-vision Systems for Augmented Experience of the journal IEEE Transactions on Human-Machine Systems.
• Code available: https://imatge.upc.edu/web/resources/semantic-and-diverse-summarization-egocentric-photo-events-software
![Page 49: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/49.jpg)
Future work
• Further work on other relevance criteria.
• Higher level of semantics.
• Automatically determine the summary length.
![Page 50: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/50.jpg)
Thanks for your attention!
![Page 51: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/51.jpg)
Prefiltering
Hand-crafted estimators
• Blur: Crete et al.
• Black and burned frames: color mean
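A rough sketch of how black and burned (overexposed) frames could be flagged from the mean color, assuming 8-bit RGB pixels. The thresholds of 30 and 225 and the function name are illustrative assumptions; the slides give no values, and blur detection (Crete et al.) is not covered here.

```python
def classify_exposure(pixels, dark_threshold=30, bright_threshold=225):
    """Label a frame as 'black', 'burned', or 'ok' from its mean intensity.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    """
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += (r + g + b) / 3  # per-pixel mean over the three channels
        count += 1
    if count == 0:
        raise ValueError("frame has no pixels")
    mean = total / count
    if mean < dark_threshold:
        return "black"
    if mean > bright_threshold:
        return "burned"
    return "ok"


# Three tiny hypothetical frames.
print(classify_exposure([(0, 0, 0), (10, 10, 10)]))
print(classify_exposure([(255, 255, 255), (250, 250, 250)]))
print(classify_exposure([(120, 130, 140)]))
```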
Informativeness network
• CNN trained with ImageNet + Places.
• Fine-tuned with human annotations: relevant / irrelevant.
by Marc Bolaños (UB)
![Page 52: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/52.jpg)
Relevance ranking
Affective
• VitorNet CNN (2-class sentiment predictions)
by Victor Campos (UPC)
![Page 53: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/53.jpg)
Relevance ranking
Late fusion
• Score normalization:
– By rank
– By score
• Aggregate scores
Using MNSMS, weights will be learned.
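A small sketch of the late-fusion steps listed above, under stated assumptions: normalization by score is taken to be min-max scaling, normalization by rank maps the best item to 1 and the worst to 0, and aggregation is a weighted sum. The weights in the example are made up; per the slide they would be learned using MNSMS.

```python
def normalize_by_score(scores):
    """Min-max scale raw scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def normalize_by_rank(scores):
    """Replace each score by its rank position: best -> 1.0, worst -> 0.0."""
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    normalized = [0.0] * n
    for rank, i in enumerate(order):
        normalized[i] = 1.0 - rank / (n - 1)
    return normalized


def fuse(score_lists, weights):
    """Weighted sum of several normalized score lists, one value per frame."""
    n = len(score_lists[0])
    return [sum(w * s[i] for w, s in zip(weights, score_lists)) for i in range(n)]


# Hypothetical saliency and face scores for two frames.
saliency = normalize_by_score([0.2, 0.6])
faces = normalize_by_rank([3.0, 1.0])
fused = fuse([saliency, faces], weights=[0.25, 0.75])
print(fused)
```

Rank normalization discards score magnitudes and is robust to outliers, while score normalization keeps the relative gaps between frames; which one works better is an empirical question.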
![Page 54: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/54.jpg)
Similarity measure
ImageNet
CNN trained with the ImageNet DB (1000 classes) using the CaffeNet architecture.
Fully connected layer 8 removed.
Places
CNN trained with the Places DB (476 classes) using the CaffeNet architecture.
Fully connected layer 8 removed.
LSDA
Object detector: Large Scale Detection through Adaptation (7500 classes).
Knowledge transfer: classifiers without bounding-box annotated data into detectors.
Two post-processing steps of non-maximum suppression.
![Page 55: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/55.jpg)
Result
MediaEval diversity task
• APPLICATION: Finding more information about a place to visit.
• GOAL: Provide a ranked list of Flickr photos for a predefined set of queries. The refined list should be both relevant to the query and also diverse.
• Ranking for relevance: Informativeness network, Textual
• Filtering: Keep N% top results
• Distance computation: ImageNet, Places, Textual
• Diversity: Diverse top results
![Page 56: Semantic and Diverse Summarization of Egocentric Photo Events](https://reader036.vdocument.in/reader036/viewer/2022070522/58eec8211a28abd0528b4643/html5/thumbnails/56.jpg)
Result
MediaEval diversity task
[Chart: results for the Visual, Textual, Multi, and Crediv. Multi runs.]