25-1
Digital Forensics: Image Annotation and Feature Extraction
Guest Lecture
Lei Wang, Latifur Khan, Bhavani Thuraisingham
October 2007
25-2
Outline
How do we retrieve images?
Motivation
Annotation
Correspondence: Models
Enhancement
Future Work
Results
Reference
25-3
How do we retrieve images?
Use Google Image Search. Google uses filenames and surrounding text, and ignores the contents of the images.
25-4
Motivation
How do we retrieve images/videos?
CBIR (content-based image retrieval) is based on similarity search over visual features. It doesn't support textual queries, and it doesn't capture "semantics".
Alternative: automatically annotate images, then retrieve them based on the textual annotations.
Example annotations: tiger, grass.
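Once images carry annotations, retrieval reduces to text search. A minimal sketch, assuming each image already has a set of keywords (the image IDs and annotations below are hypothetical): an inverted index maps each word to the images annotated with it, and a multi-word query returns the images whose annotation contains every query word.

```python
from collections import defaultdict

def build_index(annotations):
    """Map each keyword to the set of image IDs annotated with it."""
    index = defaultdict(set)
    for image_id, words in annotations.items():
        for word in words:
            index[word.lower()].add(image_id)
    return index

def search(index, query):
    """Return the IDs of images whose annotation contains every query keyword."""
    terms = [t.lower() for t in query.split()]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical annotations (produced manually or by an auto-annotation model).
annotations = {
    "img1": ["tiger", "grass"],
    "img2": ["lion", "grass"],
    "img3": ["sky", "mountain"],
}
index = build_index(annotations)
print(sorted(search(index, "grass")))        # ['img1', 'img2']
print(sorted(search(index, "tiger grass")))  # ['img1']
```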
25-5
Motivation
There is a gap between the perceptual level and the conceptual level.
Semantic gap: it is hard to represent semantic meaning using low-level image features such as color, texture and shape.
For example, CBIR may answer the query "red ball" with an image of a red rose.
[Figure: query image and the image retrieved by CBIR]
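To make the semantic gap concrete, here is a toy query-by-example sketch. It assumes each image is summarized by a hypothetical mean (R, G, B) feature (the numbers are invented for illustration, not taken from the lecture) and ranks database images by Euclidean distance to the query. The nearest match to a red ball is a red rose, because color alone cannot tell them apart.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(query_feature, database):
    """Rank image names by distance between low-level feature vectors."""
    return sorted(database, key=lambda name: euclidean(query_feature, database[name]))

# Hypothetical mean (R, G, B) features in [0, 1].
database = {
    "red_rose": (0.80, 0.10, 0.15),
    "green_field": (0.20, 0.70, 0.20),
    "blue_sky": (0.30, 0.50, 0.90),
}
red_ball = (0.85, 0.12, 0.10)
print(query_by_example(red_ball, database)[0])  # red_rose
```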
25-6
Motivation
Most current automatic image annotation and retrieval approaches consider:
keywords;
low-level image features for visual tokens/regions/objects;
the correspondence between keywords and visual tokens.
Our goal is to develop automated image annotation techniques with better accuracy.
25-7
Annotation
25-8
Annotation
Major steps:
Segmentation into regions
Clustering to construct blob-tokens
Analysis of the correspondence between keywords and blob-tokens
Automatic annotation
25-9
Annotation: Segmentation & Clustering
[Figure: images → segments → blob-tokens]
25-10
Annotation: Correspondence/Linking
Our purpose is to find the correspondence between words and blob-tokens, e.g., P(tiger | V1), P(V2 | grass), …
25-11
Auto Annotation
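[Figure: training images annotated with words such as tiger, grass and lion; a new image whose annotation (??) is to be predicted]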
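Once P(w | v) has been estimated, annotating a new image is a lookup. A minimal sketch with a hypothetical probability table: each blob-token is labeled with its most probable word, and image-level keywords are ranked by summing P(w | v) over the image's blob-tokens. The summing rule is an assumption made for illustration, not necessarily the scoring used by the models in this lecture.

```python
from collections import defaultdict

def annotate_blobs(blobs, p_word_given_blob):
    """Label each blob-token with its most probable word."""
    return [max(p_word_given_blob[v], key=p_word_given_blob[v].get) for v in blobs]

def annotate_image(blobs, p_word_given_blob, k=2):
    """Score words by summing P(w | v) over the image's blob-tokens; keep the top k."""
    scores = defaultdict(float)
    for v in blobs:
        for word, p in p_word_given_blob[v].items():
            scores[word] += p
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical P(w | v) table, e.g. learned by a correspondence model.
p_word_given_blob = {
    "V1": {"tiger": 0.7, "grass": 0.2, "lion": 0.1},
    "V2": {"tiger": 0.1, "grass": 0.8, "lion": 0.1},
    "V3": {"tiger": 0.2, "grass": 0.1, "lion": 0.7},
}
test_blobs = ["V1", "V2", "V2"]
print(annotate_blobs(test_blobs, p_word_given_blob))  # ['tiger', 'grass', 'grass']
print(annotate_image(test_blobs, p_word_given_blob))  # ['grass', 'tiger']
```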
25-12
Segmentation: Image Vocabulary
Can we represent all the images with a finite set of symbols? Text documents consist of words; images consist of visual terms.
Example visual terms: V123, V89, V988, V4552, V12336, V2, V765, V9887
copyright © R. Manmatha
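Treating an image like a text document, its regions can be counted as a "bag" of visual terms over a finite vocabulary. A minimal sketch, assuming each region has already been mapped to a visterm ID; the vocabulary below is hypothetical and reuses labels from the slide only as examples.

```python
from collections import Counter

def bag_of_visterms(visterm_ids, vocabulary):
    """Return a fixed-length count vector over a finite visterm vocabulary."""
    counts = Counter(visterm_ids)
    return [counts[v] for v in vocabulary]

vocabulary = ["V2", "V89", "V123", "V988"]  # hypothetical finite vocabulary
image = ["V123", "V89", "V123", "V2"]       # visterms of one image's regions
print(bag_of_visterms(image, vocabulary))   # [1, 1, 2, 0]
```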
25-13
Construction of Visual Terms
Segment the images (e.g., with Blobworld or the Normalized Cuts algorithm).
Cluster the segments; each cluster is a visual term (visterm/blob-token).
[Figure: images → segments → visterms/blob-tokens V1–V6]
25-14
Discrete Visual terms
Rectangular partitioning works better! Partition each keyframe into rectangles, then cluster the rectangles across images. This avoids the segmentation problem to some extent.
copyright © R. Manmatha
25-15
Visual terms
Alternatively, partition each image using a rectangular grid and cluster the cells. This actually works better.
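A minimal sketch of the rectangular-grid alternative, assuming NumPy, an H × W × 3 image array, and mean color as the only per-cell feature (a simplification; the lecture's feature set is richer). The cell features from all images would then be pooled and clustered across images to form visual terms.

```python
import numpy as np

def grid_cells(image, rows, cols):
    """Split an H x W x C image into rows x cols rectangular cells (row-major order)."""
    h, w = image.shape[:2]
    ys = np.linspace(0, h, rows + 1, dtype=int)
    xs = np.linspace(0, w, cols + 1, dtype=int)
    return [image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            for i in range(rows) for j in range(cols)]

def cell_features(cells):
    """Describe each cell by its mean color (one row per cell)."""
    return np.array([cell.reshape(-1, cell.shape[-1]).mean(axis=0) for cell in cells])

image = np.zeros((4, 6, 3))
image[:2, :, 0] = 1.0  # top half red, bottom half black
features = cell_features(grid_cells(image, 2, 3))
print(features.shape)  # (6, 3)
```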
25-16
Grid vs Segmentation
Segmentation vs. rectangular partition. Results: rectangular partitioning performs better than segmentation! The model is learned over many images, whereas segmentation works on one image at a time.
25-17
Feature Extraction & Clustering
Feature extraction: color, texture, shape.
K-means clustering: generates a finite set of visual terms. Each cluster's centroid represents a visual term.
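A minimal sketch of building visual terms with K-means (Lloyd's algorithm) in NumPy. Assumptions not stated in the lecture: squared Euclidean distance, farthest-first initialization of the centroids, and toy 2-D features in place of real region features. Each centroid plays the role of a visual term, and a new region's visterm ID is the index of its nearest centroid.

```python
import numpy as np

def nearest_centroid(features, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def init_farthest_first(features, k, rng):
    """Pick one random point, then repeatedly the point farthest from those chosen."""
    chosen = [features[rng.integers(len(features))]]
    for _ in range(k - 1):
        d = np.min([((features - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(features[d.argmax()])
    return np.array(chosen, dtype=float)

def kmeans(features, k, iterations=100, seed=0):
    """Lloyd's K-means on an (n, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = init_farthest_first(features, k, rng)
    for _ in range(iterations):
        labels = nearest_centroid(features, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, nearest_centroid(features, centroids)

# Toy 2-D region features forming two well-separated groups.
features = np.array([[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
                     [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]])
centroids, visterm_ids = kmeans(features, k=2)
new_region = np.array([[4.9, 5.2]])
print(nearest_centroid(new_region, centroids)[0] == visterm_ids[3])  # True
```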
25-18
Co-Occurrence Models
Mori et al. 1999: create the co-occurrence table using a training set of annotated images.
Drawbacks: the model tends to annotate with high-frequency words, and context is ignored. Joint probability models are needed.
Co-occurrence counts (rows: visterms; columns: words):
      w1  w2  w3  w4
V1    12   2   0   1
V2    32  40  13  32
V3    13  12   0   0
V4    65  43  12   0
P(w1 | v1) = 12/(12+2+0+1) = 0.8
P(v3 | w2) = 12/(2+40+12+43) ≈ 0.12
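The two conditional probabilities above come from normalizing the same count table along different axes: P(w | v) divides by the row total for visterm v, and P(v | w) divides by the column total for word w. A short check using the counts from the slide:

```python
# Co-occurrence counts from the slide: rows are visterms, columns are words.
counts = {
    "V1": {"w1": 12, "w2": 2, "w3": 0, "w4": 1},
    "V2": {"w1": 32, "w2": 40, "w3": 13, "w4": 32},
    "V3": {"w1": 13, "w2": 12, "w3": 0, "w4": 0},
    "V4": {"w1": 65, "w2": 43, "w3": 12, "w4": 0},
}

def p_word_given_visterm(w, v):
    """P(w | v): normalize the count along the visterm's row."""
    return counts[v][w] / sum(counts[v].values())

def p_visterm_given_word(v, w):
    """P(v | w): normalize the count along the word's column."""
    return counts[v][w] / sum(row[w] for row in counts.values())

print(p_word_given_visterm("w1", "V1"))            # 0.8
print(round(p_visterm_given_word("V3", "w2"), 2))  # 0.12
```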
25-19
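Correspondence: Translation Model (TM)
$\Pr(f \mid e) = \sum_{a} \Pr(f, a \mid e)$
$\Pr(w \mid v) = \sum_{a} \Pr(w, a \mid v)$
In machine translation, e is an English sentence, f a French sentence, and a ranges over the word alignments between them; for images, the visterms v take the place of e and the annotation words w take the place of f.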
25-20
Translation Models
Duygulu et al. 2002: use classical IBM machine translation models to translate visterms into words.
IBM machine translation models need a bilingual corpus to train the models.
Mary did not slap the green witch ↔ Mary no daba una bofetada a la bruja verde
V2 V4 V6 ↔ Maui people dance
V1 V34 V321 V21 ↔ tiger grass sky
…
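A compact sketch of IBM Model 1 trained with EM, the simplest of the IBM models, treating each image's visterms as the "source sentence" and its annotation words as the "target sentence". Assumptions: no NULL token, uniform initialization, a fixed number of iterations, and a toy corpus; Model 1 is used here only as the simplest instance and is not necessarily the variant Duygulu et al. used. The result t[v][w] estimates P(w | v).

```python
from collections import defaultdict

def train_model1(pairs, iterations=20):
    """Estimate t[v][w] = P(w | v) with IBM Model 1 EM.

    pairs: list of (visterms, words) for each training image.
    Simplification: no NULL visterm, uniform initialization.
    """
    words = {w for _, ws in pairs for w in ws}
    visterms = {v for vs, _ in pairs for v in vs}
    t = {v: {w: 1.0 / len(words) for w in words} for v in visterms}
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (v, w) links
        total = defaultdict(float)  # expected number of links from v
        for vs, ws in pairs:
            for w in ws:
                # E-step: share this word among the image's visterms in proportion to t.
                norm = sum(t[v][w] for v in vs)
                for v in vs:
                    frac = t[v][w] / norm
                    count[(v, w)] += frac
                    total[v] += frac
        # M-step: renormalize the expected counts for each visterm.
        for v in visterms:
            for w in words:
                t[v][w] = count[(v, w)] / total[v] if total[v] else 0.0
    return t

# Toy parallel corpus: each image's visterms paired with its annotation words.
pairs = [
    (["V1", "V2"], ["tiger", "grass"]),
    (["V2", "V3"], ["grass", "sky"]),
    (["V1", "V3"], ["tiger", "sky"]),
]
t = train_model1(pairs)
for v in sorted(t):
    print(v, max(t[v], key=t[v].get))
# V1 tiger
# V2 grass
# V3 sky
```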
25-21
Correspondence (TM)
[Figure: matrix diagram with dimensions labeled N, W and B]
25-22
Correspondence (TM)
[Figure: matrix diagram with dimensions labeled N, W and B, highlighting the word/blob-token pair Wi, Bj]
25-23
Results: Dataset
Corel Stock Photo CDs: 600 CDs, each consisting of 100 images on the same topic.
We select 5,000 images (4,500 for training, 500 for testing). Each image has a manual annotation.
Vocabulary: 374 words and 500 blobs.
Example annotations: "sun city sky mountain", "grizzly bear meadow water".
25-24
Results: Experimental Context
3,000 training objects; 300 images for testing.
Each object is represented by a 30-dimensional vector of color, texture, and shape features.
25-25
Results
Each image object/blob-token has 30 features:
Size: the portion of the image covered by the region.
Position: the coordinates of the region's center of mass, normalized by the image dimensions.
Color: the average and standard deviation of (R, G, B) and (L, a, b) over the region.
Texture: the average and variance of 16 filter responses: four difference-of-Gaussian filters with different sigmas, and 12 oriented filters aligned in 30-degree increments.
Shape: six features (area, x, y, boundary, convexity, and moment of inertia).
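A partial sketch of these per-region features, assuming NumPy, an RGB image array, and a boolean mask selecting the region. It computes only size, normalized center-of-mass position, and the mean and standard deviation of (R, G, B); the (L, a, b) statistics, the 16 filter responses, and the shape features are omitted, so the vector has 9 entries rather than the full set.

```python
import numpy as np

def region_features(image, mask):
    """Size, normalized position, and per-channel RGB mean/std of one region.

    image: H x W x 3 array; mask: H x W boolean array selecting the region.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    size = mask.sum() / mask.size              # fraction of the image covered
    position = [xs.mean() / w, ys.mean() / h]  # center of mass, normalized
    pixels = image[mask]                       # (num_pixels, 3)
    color = np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])
    return np.concatenate([[size], position, color])

image = np.zeros((4, 4, 3))
image[:2, :2] = [1.0, 0.5, 0.0]   # a uniform orange square in the top-left corner
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
print(region_features(image, mask))
```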
25-26
Results
Examples of automatic annotation
25-27
Results
The number of segments annotated correctly among
299 testing segments for different models
25-28
Results
Correspondence based on K-means: PTK.
Correspondence based on weighted feature selection: PTS.
With GDR, the dimensionality of each image object is reduced (say, from 30 to 20) before applying K-means and the remaining steps.
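GDR is not spelled out here. As a stand-in for the "reduce, then cluster" idea, the sketch below uses PCA (a different, standard technique, not GDR) to project hypothetical 30-feature image objects down to 20 dimensions; the reduced vectors would then be clustered with K-means as before.

```python
import numpy as np

def pca_reduce(features, dims):
    """Project (n, d) features onto their top `dims` principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T

rng = np.random.default_rng(0)
objects = rng.normal(size=(100, 30))  # hypothetical 30-feature image objects
reduced = pca_reduce(objects, 20)
print(reduced.shape)                  # (100, 20)
```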
25-29
Results
Precision: $p = \frac{\mathrm{NumCorrect}}{\mathrm{NumRetrieved}}$
Recall: $r = \frac{\mathrm{NumCorrect}}{\mathrm{NumExist}}$
NumCorrect is the number of retrieved images whose original annotation contains the query keyword.
NumRetrieved is the number of retrieved images.
NumExist is the total number of images in the test set whose annotation contains the query keyword.
Common E-measure: $E = 1 - \frac{2}{1/p + 1/r}$
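A minimal sketch of these per-query metrics, assuming the retrieval results and the ground truth are given as collections of image IDs (the IDs below are hypothetical). When p or r is zero the harmonic mean is zero, so the sketch returns E = 1 in that case instead of dividing by zero.

```python
def retrieval_scores(retrieved, relevant):
    """Precision, recall and E-measure for one keyword query.

    retrieved: IDs returned for the query (NumRetrieved = len(retrieved)).
    relevant: test-set IDs whose annotation contains the keyword (NumExist).
    """
    num_correct = len(set(retrieved) & set(relevant))
    p = num_correct / len(retrieved) if retrieved else 0.0
    r = num_correct / len(relevant) if relevant else 0.0
    # E = 1 - 2 / (1/p + 1/r); if p or r is 0 the harmonic mean is 0, so E = 1.
    e = 1 - 2 / (1 / p + 1 / r) if p > 0 and r > 0 else 1.0
    return p, r, e

p, r, e = retrieval_scores(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, round(r, 3), round(e, 3))  # 0.5 0.667 0.429
```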
25-30
Results: Precision, Recall and E-measure
Precision of retrieval for different models
25-31
Results: Precision, Recall and E-measure
Recall of retrieval for different models
25-32
Results: Precision, Recall and E-measure
E-measure of retrieval for different models