High‐Level Semantic Modeling
Shih‐Fu Chang
Digital Video Multimedia Lab, Columbia University
CVPR Tutorial, June 2014
1200 SentiBank Concepts
Predict Sentiment
Visual Aesthetics
• Datta et al., ECCV 2006; Naila Murray et al., CVPR 2012 (AVA)
• AVA Dataset: 250,000 images from 963 DPChallenge challenges with aesthetics scores and semantic/style labels
Aesthetics is Subjective
‐ Non‐Conventional Style/Subject Tends to Cause Large Score Variations
Murray et al., CVPR 2012 (AVA)
Score Distributions of Each Image Vary but Form Patterns
Murray et al., CVPR 2012 (AVA)
Gaussian distribution: a reasonable fit
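To make the Gaussian fit concrete, here is a minimal sketch that estimates a mean and standard deviation from one image's vote histogram. The vote counts are hypothetical and are not taken from AVA; the function names are illustrative only.

```python
from math import exp, pi, sqrt

def fit_gaussian(votes):
    """votes[s - 1] is the number of raters who gave score s (s = 1..len(votes))."""
    n = sum(votes)
    mean = sum(s * c for s, c in enumerate(votes, start=1)) / n
    var = sum(c * (s - mean) ** 2 for s, c in enumerate(votes, start=1)) / n
    return mean, sqrt(var)

def gaussian_pdf(x, mean, std):
    """Density of the fitted Gaussian at score x."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

# Hypothetical vote counts for scores 1..10 (made up for illustration).
votes = [0, 2, 8, 20, 40, 40, 20, 8, 2, 0]
mean, std = fit_gaussian(votes)

# Compare the empirical fraction of votes per score with the fitted density.
total = sum(votes)
fit_error = sum(abs(c / total - gaussian_pdf(s, mean, std))
                for s, c in enumerate(votes, start=1))
```

A small `fit_error` suggests the Gaussian is a reasonable description of that image's scores; a large one flags images whose score distribution has a different shape.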
Semantics Also Plays An Important Role
Murray et al., CVPR 2012
‐ Many less attractive classes are associated with negative semantics
What makes video pleasing – NHK 1000 Videos Aesthetics Ranking at ACMMM13
Cinematographic Evaluation
Additional Filter for Web video search
Personal Video Collection
Aesthetically Pleasing Not‐so Pleasing
(Bhattacharya et al, ACMMM 2013)
Computational Video Aesthetics
[Figure: an input video is decomposed into shots, keyframes, and cells. Features shown: camera motion, foreground motion, texture dynamics; semantics, sentiments; sharpness, eye sensitivity, dark channel. Shot‐level, frame‐level, and cell‐level aesthetic models are combined into a fused aesthetic model that predicts the appeal of a query video.]
(Bhattacharya et al, ACMMM 2013)
Which Images are More Memorable (Phillip Isola et al., 2011)
S.F. Chang 13
More memorable (+): person, floor, car. Less memorable (‐): sky, tree, mountain
For Content to be Viral, it Needs to be Emotional
Psychology emotion wheel (8 emotions, by Robert Plutchik)
Plenty on the Web: “For content to go viral, it needs to be emotional.” ‐ Dan Jones
The Power of Social Visual Multimedia
@BarackObama: Four more years. @Brynn4NY: Rollercoaster at sea.
2012 Tweets of the Year
Classifying Image Emotions
Machajdik and Hanbury, ACMMM 2010
IAPS Affect Data Set; Art Affect Data Set
How Do People Describe Emotions in Web Photos?
‐‐ Web mining to discover visual emotions in social media
Build Sentiment Ontology
Psychology emotion wheel (8 emotions), Robert Plutchik, ‘91
Discover sentiment words
Select Concepts
SAD EYES
MISTY WOODS
Analyze tags with strong sentiments
Borth, Ji, Chen, Breuel, Chang, Large‐Scale Visual Sentiment Ontology, ACM Multimedia 2013
Concurrent tags with emotions
From 6 million tags on Flickr and YouTube
Color code: text sentiment values
From Machine Vision Perspective: Not all concepts/entities are detectable!
‐‐ Which 1000 concepts to focus on in pictures?
Computational Focus – Adjective‐Noun Pair (ANP)
• Adjectives (268): express emotions
– positive: beautiful, amazing, cute
– negative: sad, angry, dark
• Nouns (1187): possible detection
– people, places, animals, food, objects, weather
• Standard steps:
– remove entities like “hot dog” via Wikipedia
– choose sentiment‐rich ANP concepts with NLP tools “SentiWordNet” and “SentiStrength”
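The standard steps above can be sketched as a small filter over candidate pairs. This is an illustration under stated assumptions, not the VSO pipeline itself: the sentiment values are made up and stand in for SentiWordNet/SentiStrength scores, the entity set stands in for the Wikipedia check, and a full cross product replaces the mining of pairs from tags.

```python
from itertools import product

# Toy sentiment lexicon (made-up values, standing in for SentiWordNet/SentiStrength).
ADJ_SENTIMENT = {"beautiful": 0.8, "cute": 0.7, "sad": -0.7, "hot": 0.5, "old": 0.0}
NOUNS = ["dog", "eyes", "woods"]
# Stand-in for the Wikipedia check: adjective-noun strings that are really entities.
NAMED_ENTITIES = {"hot dog"}

def candidate_anps(adjectives, nouns, entities, min_strength=0.3):
    """Return (phrase, sentiment) pairs that are not entities and carry sentiment."""
    anps = []
    for adj, noun in product(adjectives, nouns):
        phrase = f"{adj} {noun}"
        if phrase in entities:
            continue  # e.g. "hot dog" names a food, not a hot dog
        score = adjectives[adj]
        if abs(score) >= min_strength:  # keep only sentiment-rich adjectives
            anps.append((phrase, score))
    return anps

anps = candidate_anps(ADJ_SENTIMENT, NOUNS, NAMED_ENTITIES)
phrases = [phrase for phrase, _ in anps]
```

The `min_strength` threshold is a hypothetical parameter; in practice the cut-off would be tuned together with tag frequency and detectability.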
ANP Ontology (noun)
• 6 levels
– ANIMALS
– FLORA
– PERSON
– OBJECTS
– NATURAL PLACES
– MAN‐MADE PLACES
– VEHICLE
– FICTIONAL_CREATURES
– FOOD
– ABSTRACT_CONCEPTS
– ART_PHOTOGRAPHY
– EVENTS
– ACTION
– WEATHER_CONDITIONS
– TIME
ANP Ontology (adjective)
• adjectives (2 levels)
– weather (stormy, cold, sunny)
– people (young, attractive)
– animals (cute, fluffy)
– places (haunted, misty)
– food (yummy, salty)
– object (colorful, beautiful)
Open Issues …
• How will the visual sentiment ontology change over different domains?
– Differences in photo style, quality, user groups, culture, tasks
• How to link mid‐level concepts to high‐level emotions?
– currently based on association
Next Step: Teach Machine to Recognize Visual Sentiments
SentiBank (1200 Detectors)
Build Sentiment Ontology
Psychology emotion wheel (24 emotions: 8 basic emotions at 3 intensities)
Discover sentiment words
Select Adj‐Noun Pairs
Train Classifiers
Performance Filtering
Sentiment Prediction
SAD EYES
MISTY WOODS
Image Features
• Generic features
– Color Histogram (3x256 dim.)
– GIST descriptor (512 dim.)
– Local Binary Pattern (52 dim.)
– SIFT Bag‐of‐Words (1,000 codewords, 2‐layer spatial pyramid, max pooling)
– Attribute descriptor (2,000 dim.)
• Special features
– Object detection (people, objects, etc.)
– Aesthetics features (color schemes, layout, etc.)
– Face and attributes
– Improve accuracy 9%‐30%
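As an illustration of the simplest generic feature in the list, the sketch below builds a 3x256‐dimensional color histogram from 8‐bit RGB pixels. The per‐channel normalization and the function name are assumptions; the slide only gives the dimensionality.

```python
def color_histogram(pixels):
    """pixels: iterable of (r, g, b) tuples with 8-bit values.

    Returns 3 * 256 bins: red in [0, 256), green in [256, 512), blue in [512, 768).
    Each channel's 256 bins are normalized to sum to 1.
    """
    hist = [0.0] * (3 * 256)
    n = 0
    for r, g, b in pixels:
        hist[r] += 1
        hist[256 + g] += 1
        hist[512 + b] += 1
        n += 1
    return [h / n for h in hist] if n else hist

# Four example pixels (made up): two pure red, one pure blue, one dark gray-blue.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (10, 20, 30)]
feature = color_histogram(pixels)
```

The resulting vector can be concatenated with the other descriptors before training a classifier.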
Aesthetic Features
• Dark Channel [He et al. ‘09]
– minimum of local intensity
• Sharpness [Vu et al. ‘11]
– sharpness of local image regions by spectral and spatial measures
• Depth of Field
– wavelet decomposition in HSV color space (low vs high)
• Color Harmony [Nishiyama et al. ‘11]
– Using local histogram of Moon‐Spencer model, which defines compatibility of two color values (example of compatible color)
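The dark channel entry ("minimum of local intensity") can be sketched as a minimum over color channels followed by a minimum over a local window. This is a plain‐Python illustration; the window radius and the nested‐list image layout are assumptions chosen for readability rather than speed.

```python
def dark_channel(image, radius=1):
    """image[y][x] = (r, g, b).

    Returns, per pixel, the minimum over the three channels and over a
    (2 * radius + 1) x (2 * radius + 1) window clipped at the image border.
    """
    h, w = len(image), len(image[0])
    pixel_min = [[min(px) for px in row] for row in image]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            out[y][x] = min(pixel_min[j][i] for j in ys for i in xs)
    return out

# A bright 3x3 patch with one dark pixel in the middle (made-up values).
image = [
    [(200, 210, 220), (190, 200, 210), (180, 190, 200)],
    [(200, 210, 220), (5, 90, 100), (180, 190, 200)],
    [(200, 210, 220), (190, 200, 210), (180, 190, 200)],
]
dc = dark_channel(image, radius=1)
dc_pointwise = dark_channel(image, radius=0)
```

With `radius=1` the single dark pixel dominates every window of this small patch, which is why the dark channel responds to local shadows and saturated colors.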
What Do Humans Expect to See?
‐ from small annotation experiments
masochismtango@flickr, houseofduke@flickr, HIKARU Pan@flickr, Jules3000@flickr
springlakecake@flickr, ebonique2007@flickr, hurlham@flickr, houseofduke@flickr
“Smiling dog”: tongue visible, mouth open, face camera, close shot, pink tongue, open mouth, frontal dog face
“Tired dog”: lying on floor or surface, closed eyes, yawning, resting, no action, fore legs, paws, face on floor
So, Need to Link to Objects + Attributes
Training Images of Same Noun (e.g. dog)
Object (noun) Detector: HOG, DPM
Yes / No (Discard)
Feature Extraction
• object/background/whole
• SIFT, GIST, LBP, color
• aesthetics (symmetry, white balance, etc.)
• composition (object size, position)
Soft Adj. Labels: ConceptNet, SentiStrength, human labels
Weighted SVMs: ANP Classifiers + Feature selection
Adjectives: cute, sad, wet
cute dog
sad dog
wet dog
SamFan1@flickr
Bahman Farzad@flickr
zoompict@flickr
BloodyGoku21@flickr
Testing Images
Hierarchical: Object + Affect Attributes
• Testing
dog? → Candidates + Features → ANP Classifiers: cute dog? sad dog? wet dog?
face? → Candidates + Features → ANP Classifiers: mad face? silly face?
car? → Candidates + Features → ANP Classifiers: hot car? tiny car? safe car?
Fuse Noun Score → Max Score → Output: sweet
epSos.de@flickr, Karf Oohlu@flickr, rollinoldman@flickr, green_lover@flickr, NiH@google+, houseofduke@flickr, paevalill@flickr, ccdoh1@flickr, flatworldsedge@flickr
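A minimal sketch of the hierarchical test procedure: run noun detectors first, score adjectives only for nouns that pass, fuse the noun score into each ANP score, and output the maximum. All scores are made up, the threshold is hypothetical, and fusing by product is an assumption; the slide does not state the fusion rule.

```python
# Hypothetical detector outputs for one test image (all numbers are made up).
noun_scores = {"dog": 0.9, "face": 0.2, "car": 0.05}
adjective_scores = {
    "dog": {"cute": 0.7, "sad": 0.2, "wet": 0.1},
    "face": {"mad": 0.6, "silly": 0.3},
    "car": {"hot": 0.5, "tiny": 0.4, "safe": 0.1},
}

def predict_anp(noun_scores, adjective_scores, noun_threshold=0.1):
    """Return the (ANP, fused score) with the highest fused score, or None."""
    best = None
    for noun, n_score in noun_scores.items():
        if n_score < noun_threshold:
            continue  # noun detector says "No": skip its ANP classifiers
        for adj, a_score in adjective_scores[noun].items():
            fused = n_score * a_score  # fusion by product is an assumption
            if best is None or fused > best[1]:
                best = (f"{adj} {noun}", fused)
    return best

best_anp, best_score = predict_anp(noun_scores, adjective_scores)
```

Gating on the noun first keeps the number of ANP classifiers evaluated per image small, since only adjectives of detected nouns are scored.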
Tricky Issue: Concept Subjectivity
• Attributes are subjective, ambiguous, and overlapping
– E.g., cute dog, fluffy dog, cuddly dog
• Solution
– Need a way to handle soft label overlap
– Model overlap proportion in ∝SVM
[Figure: overlapping image sets for “Cute dog”, “Tiny dog”, and “Fluffy dog”]
The ∝SVM Algorithm
F. Yu; D. Liu; S. Kumar; T. Jebara; S.‐F. Chang. ∝SVM for learning with label proportions. ICML13
• Formulation: a label prediction loss plus a proportion loss
• Learned with alternating optimization or a relaxed convex form
Image set of “Fluffy Dog” has proportion pk being “Cute Dog”
Image set of “Tiny Dog” has proportion pk being “Cute Dog”
Proportions can be approximated by ConceptNet
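The formula itself did not survive on this slide. As a hedged reconstruction, the objective in the cited ICML 2013 paper has roughly the following form, with one term for the label prediction loss and one for the proportion loss. The notation is an assumption: $N$ training images $x_i$ with unknown labels $y_i$, $K$ image sets $B_k$ with given proportions $p_k$, trade‐off weights $C$ and $C_p$, a feature map $\varphi$, and $\tilde{p}_k(y)$ the fraction of positives that the current labels assign to set $k$.

```latex
\min_{y,\,w,\,b}\;
  \frac{1}{2}\, w^\top w
  + C \sum_{i=1}^{N} L\big(y_i,\; w^\top \varphi(x_i) + b\big)
  + C_p \sum_{k=1}^{K} L_p\big(\tilde{p}_k(y),\; p_k\big)
\quad \text{s.t. } y_i \in \{-1, 1\},
\qquad
\tilde{p}_k(y) = \frac{\lvert \{\, i \in B_k : y_i = 1 \,\} \rvert}{\lvert B_k \rvert}
```

Here $L$ would typically be a hinge loss and $L_p$ an absolute difference between predicted and given proportions. Alternating optimization then switches between fitting $(w, b)$ with the labels fixed and updating the labels $y$ with the classifier fixed.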
Example: ∝SVM for Video Event Recognition
• Model the proportion of positive instances in each event
• Detecting complex events in ~100,000 videos with 20% gain
K.‐T. Lai; F. X. Yu; M.‐S. Chen; S.‐F. Chang. CVPR 2014
VSO/SentiBank Resources
Ontology and 1,200 Classifiers
http://visual-sentiment-ontology.appspot.com/
1200 Classifiers Predict Sentiment
Application: Live Sentiment Prediction
True stuff. I have mad respect for all the ladies that DO NOT give in to abortion.
#groundzero #hurricanesandy #newjersey
Ouch mr police man
PhotoTweet Stream:
@nickespo89
@charleslawrence @radiodario
Positive? Neutral? Negative?
Viewer Response Depends …
• Responses depend on viewer’s perspective
• Amazon Mechanical Turk sentiment labeling over 2000 photo tweets
True stuff. I have mad respect for all the ladies that DO NOT give in to abortion.
Amazon Mechanical Turk Sentiment/Emotion Labels:
(image‐based labeling)
worker 1: Positive, trust:acceptance
worker 2: Neutral, interest:unlabeled, sad:pensiveness
worker 3: Positive, interest:interest
(text‐based labeling)
worker 1: Positive, joy:serenity, trust:acceptance
worker 2: Positive, anger:neutral, interest:interest, joy:serenity, trust:acceptance
worker 3: Negative, sad:sadness
(text‐image‐based labeling)
worker 1: Positive, joy:serenity, sad:neutral
worker 2: Positive, interest:interest, joy:joy, sad:neutral, surprise:distraction
worker 3: Positive, joy:serenity, surprise:neutral, trust:trust
@nickespo89
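The worker labels for this tweet can be summarized per labeling condition with a majority vote and an agreement flag. The labels below are copied from the slide; the aggregation rule (majority, with ties reported as no consensus) is an assumption for illustration.

```python
from collections import Counter

# Sentiment labels from the slide: three workers per labeling condition.
labels = {
    "image": ["Positive", "Neutral", "Positive"],
    "text": ["Positive", "Positive", "Negative"],
    "text-image": ["Positive", "Positive", "Positive"],
}

def summarize(votes):
    """Return (majority label, or None on a tie; whether all workers agree)."""
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    tie = len(ranked) > 1 and ranked[1][1] == top_count
    return (None if tie else top_label), len(ranked) == 1

summary = {condition: summarize(votes) for condition, votes in labels.items()}
```

For this example only the text‐image condition reaches full agreement, which matches the point that responses depend on what the viewer is shown.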
Response also Depends on Topic
• Viewer disagreement varies across topics
• Text more controversial than image in invoking responses
[Figure: % of sentiment labels disagreed by all viewers, per topic]
Photo Tweet Sentiment Tracking (during Hurricane Sandy)
• Goal: track sentiment evolution during natural disaster
• Data collection:
– Oct 25 – Nov 02, 2012
– Related popular hashtags: #prayforusa, #frankenstorm, #nyc, #hurricane, #sandy, #hurricanesandy, #staysafe, #redcross, #myheartgoesouttoyou, …
– 2000 Photo Tweets collected
• Ground Truth Labeling:
– 1340 labels (positive or negative) agreed by 2 annotators
• Training Classifier:
– Text (SentiStrength)
– Visual (SentiBank, Logistic Regression)
– Training/Testing ratio: 4:1
– 5‐fold cross‐validation
– Accuracy: 60.7% (text), 66.4% (visual), 72% (Text‐Visual Combined)
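The evaluation protocol above can be sketched as follows: with 1340 labeled tweets, 5‐fold cross‐validation gives exactly the 4:1 training/testing ratio. The `late_fusion` helper is an assumption; the slide does not say how the text and visual scores are combined, and a weighted average is only one simple option.

```python
def k_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_items // k + (1 if f < n_items % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size

def late_fusion(text_score, visual_score, weight=0.5):
    """Weighted average of two scores in [0, 1] (fusion rule is an assumption)."""
    return weight * text_score + (1 - weight) * visual_score

# 1340 agreed labels, as on the slide.
splits = list(k_fold_splits(1340, k=5))
fused = late_fusion(0.2, 0.8)
```

In practice the indices would be shuffled before splitting so that each fold mixes tweets from different days and hashtags.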
Publisher (expressed) vs. Viewer (evoked) Affects
Discover sentiment words
SAD EYES
MISTY WOODS
BEAUTIFUL DOG MOODY
Publisher
Viewer
Viewer Affect Concepts (VAC)
Popular Responses
Responses for “SAD” images
What do viewers say about images of different emotions?
Publisher Affect vs. Viewer Response
• How do Publisher Affect Concepts evoke different Viewer Affect Concepts?
Great. now i’m hungry.
Looks so delicious!!
PAC‐VAC Correlation Models
Viewer Affect Concept (VAC): delicious, hungry, yummy, nice, happy, tasty, fat, cool
Publisher Affect Concept (PAC): yummy meat, traditional celebration, hot food, delicious food, great food, outdoor party, …
SentiBank [Borth, ACM MM’13]
Probabilistic PAC‐VAC Correlation Models
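- Measuring PAC-VAC Co-occurrences:
$$P(p_k \mid v_j; \theta) = \frac{\sum_{i=1}^{D} B_{ik}\, P(v_j \mid d_i)}{\sum_{i=1}^{D} P(v_j \mid d_i)}, \qquad B_{ik}: \text{presence of } p_k \text{ in the metadata of } d_i$$
- Recommending images by Multivariate Bernoulli formulation:
$$P(d_i \mid v_j; \theta) = \prod_{k=1}^{A} \Big( P(p_k \mid d_i)\, P(p_k \mid v_j; \theta) + \big(1 - P(p_k \mid d_i)\big)\big(1 - P(p_k \mid v_j; \theta)\big) \Big)$$
where $P(p_k \mid d_i)$ is the visual‐based detection score of $p_k$ in $d_i$
- Predicting VACs for a given image by Bayes model:
$$P(v_j \mid d_i; \theta) = \frac{P(v_j \mid \theta)\, P(d_i \mid v_j; \theta)}{P(d_i \mid \theta)}$$
[Figure labels: VAC, PAC, image]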
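The three steps on this slide (co‐occurrence estimate, multivariate Bernoulli likelihood, Bayes rule) can be sketched end to end on toy data. All numbers are made up, the two PACs and two VACs are chosen only for illustration, and the uniform prior over VACs is an assumption.

```python
from math import prod

def cooccurrence(B, V):
    """Estimate P(p_k | v_j) from training images.

    B[i][k] is 1 if PAC k appears in the metadata of image i, else 0.
    V[i][j] is P(v_j | d_i) for training image i.
    Returns table[j][k].
    """
    D, A, J = len(B), len(B[0]), len(V[0])
    table = []
    for j in range(J):
        denom = sum(V[i][j] for i in range(D))
        table.append([sum(B[i][k] * V[i][j] for i in range(D)) / denom
                      for k in range(A)])
    return table

def likelihood(scores, p_given_v):
    """Multivariate Bernoulli P(d | v_j) from per-PAC detection scores of one image."""
    return prod(s * q + (1 - s) * (1 - q) for s, q in zip(scores, p_given_v))

def predict_vacs(scores, table, prior):
    """Bayes rule: posterior over VACs, normalized by the sum over all VACs."""
    joint = [prior[j] * likelihood(scores, table[j]) for j in range(len(table))]
    z = sum(joint)
    return [x / z for x in joint]

# Toy setup: PACs = ["yummy food", "misty woods"], VACs = ["hungry", "peaceful"].
B = [[1, 0], [1, 0], [0, 1], [0, 1]]
V = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
table = cooccurrence(B, V)

# Query image: detectors report strong "yummy food", weak "misty woods".
posterior = predict_vacs([0.9, 0.1], table, prior=[0.5, 0.5])
```

Using detection scores rather than hard 0/1 decisions in the likelihood lets uncertain detections contribute proportionally instead of being thresholded away.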
Application: Comment Assistant
lovely moody shot ‐ so peaceful!
[Figure: comment assistant pipeline. SentiBank Detection → PAC → PAC‐VAC Correlation Models → Predicted VACs (“wonderful,” “lovely,” “nice,” “peaceful,” “moody,” “serene,” …) → Candidate Comment Collection → Comment Selection. A Comment Tutor is also shown.]
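One simple way to picture the comment selection step is to score each candidate comment by the predicted VACs it contains and keep the best one. This is a hypothetical scoring rule for illustration, not the method of the cited work, and the probabilities are made up.

```python
# Predicted VAC probabilities for one image (hypothetical values).
predicted_vacs = {"lovely": 0.30, "peaceful": 0.25, "moody": 0.20,
                  "nice": 0.15, "serene": 0.10}

candidates = [
    "lovely moody shot - so peaceful!",
    "nice capture",
    "great. now i'm hungry.",
]

def score_comment(comment, vac_probs):
    """Sum the probabilities of the predicted VACs that occur in the comment."""
    words = {w.strip(".,!?-").lower() for w in comment.split()}
    return sum(p for vac, p in vac_probs.items() if vac in words)

def select_comment(candidates, vac_probs):
    """Pick the candidate comment with the highest VAC score."""
    return max(candidates, key=lambda c: score_comment(c, vac_probs))

best_comment = select_comment(candidates, predicted_vacs)
best_comment_score = score_comment(best_comment, predicted_vacs)
```

A comment about food scores zero here because none of its words are among the VACs predicted for this image.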
Summary: Affect/Emotion Attributes
1200 Classifiers Predict Sentiment
Publisher Affect Concept: yummy food
Viewer Affect Concept: great . . . now i'm hungry . . .
• Retrieval
• Authoring
• Recommendation
• Social communication
References

Visual Affect
• Machajdik, Jana, and Allan Hanbury. "Affective image classification using features inspired by psychology and art theory." In ACM Multimedia, 2010.

Visual Emotion and Sentiment
• Borth, Damian, Rongrong Ji, Tao Chen, Thomas Breuel, and Shih‐Fu Chang. "Large‐scale visual sentiment ontology and detectors using adjective noun pairs." In ACM Multimedia, 2013.
• Chen, Yan‐Ying, Tao Chen, Winston H. Hsu, Hong‐Yuan Mark Liao, and Shih‐Fu Chang. "Predicting Viewer Affective Comments Based on Image Content in Social Media." In ACM ICMR, 2014.

Visual Aesthetics
• Datta, Ritendra, Dhiraj Joshi, Jia Li, and James Z. Wang. "Studying aesthetics in photographic images using a computational approach." In ECCV, 2006.
• Murray, Naila, Luca Marchesotti, and Florent Perronnin. "AVA: A Large‐Scale Database for Aesthetic Visual Analysis." In CVPR, 2012.

Visual Aesthetics, Interestingness, Memorability
• Dhar, S., V. Ordonez, and T. L. Berg. "High level describable attributes for predicting aesthetics and interestingness." In CVPR, 2011.
• Gygli, Michael, Helmut Grabner, Hayko Riemenschneider, Fabian Nater, and Luc Van Gool. "The interestingness of images." In ICCV, 2013.
• Bhattacharya, Subhabrata, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih‐Fu Chang, and Mubarak Shah. "Towards a comprehensive computational model for aesthetic assessment of videos." In ACM Multimedia, 2013.
• Isola, Phillip, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "What makes an image memorable?" In CVPR, 2011.

Image Style
• Karayev, Sergey, Aaron Hertzmann, Holger Winnemoeller, Aseem Agarwala, and Trevor Darrell. "Recognizing Image Style." arXiv preprint arXiv:1311.3715 (2013).