ALIP: Automatic Linguistic Indexing of Pictures
![Page 1: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/1.jpg)
ALIP: Automatic Linguistic Indexing of Pictures
Jia Li
The Pennsylvania State University
![Page 2: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/2.jpg)
“Building, sky, lake, landscape, Europe, tree”
Can a computer do this?
![Page 3: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/3.jpg)
Outline
- Background
- Statistical image modeling approach
  - The system architecture
  - The image model
- Experiments
- Conclusions and future work
![Page 4: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/4.jpg)
Image Database
The image database contains categorized images.
Each category is annotated with a few words, e.g., “Landscape, glacier” or “Africa, wildlife”.
Each category of images is referred to as a concept.
![Page 5: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/5.jpg)
A Category of Images
Annotation: “man, male, people, cloth, face”
![Page 6: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/6.jpg)
ALIP: Automatic Linguistic Indexing of Pictures
Learn relations between annotation words and images using the training database.
Profile each category by a statistical image model: 2-D Multiresolution Hidden Markov Model (2-D MHMM).
Assess the similarity between an image and a category by its likelihood under the profiling model.
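The likelihood-based matching described above can be sketched in a few lines. This is only an illustration under a strong simplifying assumption: each category is profiled by a diagonal Gaussian over feature vectors rather than by a 2-D MHMM, and the function names (`fit_gaussian`, `log_likelihood`, `rank_categories`) and the toy data are hypothetical.

```python
import math

def fit_gaussian(vectors):
    """Fit a diagonal Gaussian (per-dimension mean and variance).

    Stand-in for the 2-D MHMM profile of a category (an assumption made
    here for brevity, not the model used by ALIP).
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # A small floor keeps variances positive for degenerate training sets.
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(vectors, model):
    """Sum of diagonal-Gaussian log densities over an image's feature vectors."""
    mean, var = model
    total = 0.0
    for v in vectors:
        for x, m, s2 in zip(v, mean, var):
            total += -0.5 * (math.log(2 * math.pi * s2) + (x - m) ** 2 / s2)
    return total

def rank_categories(image_vectors, models):
    """Return category names sorted from most to least likely."""
    scores = {name: log_likelihood(image_vectors, m) for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy training database: two categories, each with a few 2-D feature vectors.
training = {
    "landscape, glacier": [[0.9, 0.1], [0.8, 0.2], [1.0, 0.15]],
    "Africa, wildlife": [[0.2, 0.9], [0.1, 0.8], [0.3, 1.0]],
}
models = {name: fit_gaussian(vs) for name, vs in training.items()}
ranking = rank_categories([[0.85, 0.12], [0.95, 0.2]], models)
```

The structure (train one profile per category, then rank categories by the likelihood of a new image) is what carries over; swapping in a richer model only changes `fit_gaussian` and `log_likelihood`.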
![Page 7: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/7.jpg)
Outline
- Background
- Statistical image modeling approach
  - The system architecture
  - The image model
- Experiments
- Conclusions and future work
![Page 8: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/8.jpg)
Training Process
![Page 9: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/9.jpg)
Automatic Annotation Process
![Page 10: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/10.jpg)
Training
Training images used to train a concept with description “man, male, people, cloth, face”
![Page 11: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/11.jpg)
Outline
- Background
- Statistical image modeling approach
  - The system architecture
  - The image model
- Experiments
- Conclusions and future work
![Page 12: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/12.jpg)
2-D HMM
- Regard an image as a grid. A feature vector is computed for each node.
- Each node exists in a hidden state.
- The states are governed by a Markov mesh (a causal Markov random field).
- Given the state, the feature vector is conditionally independent of other feature vectors and follows a normal distribution.
- The states are introduced to efficiently model the spatial dependence among feature vectors.
- The states are not observable, which makes estimation difficult.
![Page 13: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/13.jpg)
2-D HMM
The underlying states are governed by a Markov mesh.
Node ordering: (i’, j’) < (i, j) if i’ < i, or if i’ = i and j’ < j.
Context of node (i, j): the set of states of all nodes (i’, j’) with (i’, j’) < (i, j).
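The ordering above is a raster-scan (row-major) order, and the context of a node is everything that precedes it in that order. A minimal sketch, with hypothetical helper names `precedes` and `context`:

```python
def precedes(a, b):
    """True if node a = (i', j') comes before node b = (i, j) in raster order."""
    (i1, j1), (i2, j2) = a, b
    return i1 < i2 or (i1 == i2 and j1 < j2)

def context(node, rows, cols):
    """All grid nodes that precede `node`; their states form its context."""
    return [(i, j) for i in range(rows) for j in range(cols)
            if precedes((i, j), node)]

# Context of the centre node of a 3x3 grid: the whole first row plus
# the node to its left in the same row.
ctx = context((1, 1), rows=3, cols=3)
```

Python tuples compare lexicographically, so `a < b` on `(row, column)` tuples yields the same order as `precedes(a, b)`.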
![Page 14: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/14.jpg)
2-D MHMM
- Incorporate features at multiple resolutions.
- Provide more flexibility for modeling statistical dependence.
- Reduce computation by representing context information hierarchically.
- Filtering, e.g., by wavelet transform
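One simple way to obtain the multiple resolutions is sketched below. It assumes 2x2 block averaging as the filter, which is proportional to the low-low band of a Haar wavelet transform; the slides do not say which wavelet is used, and `downsample` and `build_pyramid` are hypothetical names.

```python
def downsample(grid):
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [[(grid[2 * k][2 * l] + grid[2 * k][2 * l + 1]
              + grid[2 * k + 1][2 * l] + grid[2 * k + 1][2 * l + 1]) / 4.0
             for l in range(cols)]
            for k in range(rows)]

def build_pyramid(grid, levels):
    """Return the resolutions ordered from coarsest to finest."""
    pyramid = [grid]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]

# A 4x4 toy image gives a 1x1, a 2x2, and the original 4x4 resolution.
image = [[1, 1, 5, 5],
         [1, 1, 5, 5],
         [2, 2, 8, 8],
         [2, 2, 8, 8]]
pyramid = build_pyramid(image, levels=3)
```

Features would then be computed on each level of `pyramid`, so that coarse levels summarize the context that finer levels refine.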
![Page 15: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/15.jpg)
2-D MHMM
An image is a pyramid grid.
A Markovian dependence is assumed across resolutions.
Given the state of a parent node, the states of its child nodes follow a Markov mesh with transition probabilities depending on the parent state.
![Page 16: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/16.jpg)
2-D MHMM
First-order Markov dependence across resolutions.
![Page 17: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/17.jpg)
2-D MHMM
- The child nodes at resolution r of node (k, l) at resolution r-1 (formula on the slide image).
- Conditional independence given the parent state (formula on the slide image).
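The two formulas on this slide survive only in the image. A plausible reconstruction, assuming the usual quadtree layout in which each node has four children at the next finer resolution, is given below. The symbols $\mathbb{D}(k,l)$ for the child set, $\mathbb{N}^{(r)}$ for the set of nodes at resolution $r$, and $s^{(r)}_{i,j}$ for the state of node $(i,j)$ at resolution $r$ are notation assumed here.

```latex
% Children at resolution r of node (k, l) at resolution r-1 (quadtree assumption)
\mathbb{D}(k,l) = \{(2k,\,2l),\ (2k+1,\,2l),\ (2k,\,2l+1),\ (2k+1,\,2l+1)\}

% Given the parent states, sibling groups are conditionally independent
P\bigl\{ s^{(r)}_{i,j} : (i,j) \in \mathbb{N}^{(r)} \bigm|
         s^{(r-1)}_{k,l} : (k,l) \in \mathbb{N}^{(r-1)} \bigr\}
  = \prod_{(k,l) \in \mathbb{N}^{(r-1)}}
    P\bigl\{ s^{(r)}_{i,j} : (i,j) \in \mathbb{D}(k,l) \bigm| s^{(r-1)}_{k,l} \bigr\}
```

Read this way, the joint distribution of states at resolution r factors into one term per parent, each covering only that parent's four children.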
![Page 18: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/18.jpg)
2-D MHMM
Statistical dependence among the states of sibling blocks is characterized by a 2-D HMM.
The transition probability depends on:
- The neighboring states in both directions
- The state of the parent block
![Page 19: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/19.jpg)
2-D MHMM (Summary)
2-D MHMM finds “modes” of the feature vectors and characterizes their inter- and intra-scale spatial dependence.
![Page 20: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/20.jpg)
Estimation of 2-D HMM
- Parameters to be estimated:
  - Transition probabilities
  - Mean and covariance matrix of each Gaussian distribution
- EM algorithm is applied for ML estimation.
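To make the parameter list concrete, here is a minimal sketch of what an M-step could look like once the E-step has produced posterior weights. It assumes scalar features (so a variance replaces the covariance matrix) and assumes that transitions are indexed by the states of two neighboring nodes; `m_step_gaussian` and `m_step_transitions` are hypothetical names, and the E-step itself is not shown.

```python
def m_step_gaussian(features, posteriors):
    """Weighted mean and variance for one state.

    features:   scalar feature u for each node (scalar is an assumption).
    posteriors: posterior probability that each node is in this state.
    """
    weight = sum(posteriors)
    mean = sum(p * u for p, u in zip(posteriors, features)) / weight
    var = sum(p * (u - mean) ** 2 for p, u in zip(posteriors, features)) / weight
    return mean, var

def m_step_transitions(expected_counts):
    """Normalize expected transition counts into probabilities.

    expected_counts[(m, n)][l] is the expected number of nodes in state l
    whose two neighboring nodes are in states m and n.
    """
    probs = {}
    for neighbors, counts in expected_counts.items():
        total = sum(counts)
        if total > 0:
            probs[neighbors] = [c / total for c in counts]
        else:
            # No evidence for this neighbor configuration: fall back to uniform.
            probs[neighbors] = [1.0 / len(counts)] * len(counts)
    return probs

mean, var = m_step_gaussian([1.0, 3.0], [1.0, 1.0])
transitions = m_step_transitions({(0, 0): [3.0, 1.0], (0, 1): [0.0, 0.0]})
```

EM alternates this update with an E-step that recomputes the posterior weights under the current parameters, repeating until the likelihood stops improving.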
![Page 21: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/21.jpg)
EM Iteration
![Page 22: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/22.jpg)
EM Iteration
![Page 23: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/23.jpg)
Computation Issues
An approximation to the classification EM approach
![Page 24: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/24.jpg)
Annotation Process
Rank the categories by the likelihoods of an image to be annotated under their profiling 2-D MHMMs.
Select annotation words from those used to describe the top ranked categories.
Statistical significance is computed for each candidate word. Words that are unlikely to have appeared by chance are selected. Favor the selection of rare words.
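The slides do not give the significance formula. One natural choice, assumed here, is a hypergeometric tail: if m of the n categories carry a word and j of the k top-ranked categories carry it, how likely is it to see at least j by chance? The function names and the toy categories below are hypothetical.

```python
from math import comb

def chance_probability(n, m, k, j):
    """P(at least j of k randomly drawn categories carry the word),
    when m of all n categories are annotated with it."""
    upper = min(m, k)
    hits = sum(comb(m, i) * comb(n - m, k - i) for i in range(j, upper + 1))
    return hits / comb(n, k)

def select_words(top_categories, all_categories, threshold):
    """Keep words whose appearance among the top categories is unlikely by chance."""
    n, k = len(all_categories), len(top_categories)
    candidates = set().union(*(all_categories[c] for c in top_categories))
    scored = []
    for word in candidates:
        m = sum(word in words for words in all_categories.values())
        j = sum(word in all_categories[c] for c in top_categories)
        scored.append((chance_probability(n, m, k, j), word))
    return [word for p, word in sorted(scored) if p < threshold]

# Toy annotation database: "people" is common, "snow" is rare.
categories = {
    "c0": {"snow", "people"}, "c1": {"snow", "people"},
    "c2": {"people"}, "c3": {"people"}, "c4": {"people"},
    "c5": {"people", "food"}, "c6": {"food"}, "c7": {"people"},
}
selected = select_words(["c0", "c1", "c2"], categories, threshold=0.2)
```

In this toy case "people" occurs in all three top categories but also in most of the database, so its chance probability is high (35/56), while the rare word "snow" scores 6/56 and is selected. This is the sense in which the procedure favors rare words.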
![Page 25: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/25.jpg)
Outline
- Background
- Statistical image modeling approach
  - The system architecture
  - The image model
- Experiments
- Conclusions and future work
![Page 26: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/26.jpg)
Initial Experiment
600 concepts, each trained with 40 images
15 minutes Pentium CPU time per concept, train only once
highly parallelizable algorithm
![Page 27: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/27.jpg)
Preliminary Results
Computer predictions (one line per test image):
- People, Europe, man-made, water
- Building, sky, lake, landscape, Europe, tree
- People, Europe, female
- Food, indoor, cuisine, dessert
- Snow, animal, wildlife, sky, cloth, ice, people
![Page 28: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/28.jpg)
More Results
![Page 29: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/29.jpg)
Results: using our own photographs
- P: photographer annotation
- Underlined words: words predicted by the computer
- (Parentheses): words not in the learned “dictionary” of the computer
![Page 30: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/30.jpg)
Systematic Evaluation
10 classes: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.
![Page 31: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/31.jpg)
600-class Classification
- Task: classify a given image to one of the 600 semantic classes.
- Gold standard: the photographer/publisher classification.
- This procedure provides lower bounds of the accuracy measures because:
  - There can be overlaps of semantics among classes (e.g., “Europe” vs. “France” vs. “Paris”, or “tigers I” vs. “tigers II”).
  - Training images in the same class may not be visually similar (e.g., the class of “sport events” includes different sports and different shooting angles).
- Result: with 11,200 test images, ALIP selected the exact class as the best choice 15% of the time.
  - I.e., ALIP is about 90 times more accurate than a random-drawing system, which would pick the exact class 1/600 of the time (0.15 × 600 = 90).
![Page 32: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/32.jpg)
More Information
http://www.stat.psu.edu/~jiali/index.demo.html
J. Li, J. Z. Wang, “Automatic linguistic indexing of pictures by a statistical modeling approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1075-1088, 2003.
![Page 33: ALIP: Automatic Linguistic Indexing of Pictures](https://reader036.vdocument.in/reader036/viewer/2022081516/56814b6f550346895db85aef/html5/thumbnails/33.jpg)
Conclusions
- Automatic Linguistic Indexing of Pictures
  - Highly challenging
  - Much more to be explored
- Statistical modeling has shown some success.
- To be explored:
  - Training image database is not categorized.
  - Better modeling techniques.
  - Real-world applications.