Hierarchical Models of Vision: Machine Learning/Computer Vision Alan Yuille UCLA: Dept. Statistics Joint App. Computer Science, Psychiatry, Psychology Dept. Brain and Cognitive Engineering, Korea University

Post on 17-Dec-2015

Hierarchical Models of Vision: Machine Learning/Computer Vision

Alan Yuille

UCLA: Dept. Statistics

Joint App. Computer Science, Psychiatry, Psychology

Dept. Brain and Cognitive Engineering, Korea University

Structure of Talk

• Comments on the relations between Cognitive Science and Machine Learning.

• Comments about Cog. Sci., ML, and Neuroscience.

• Three related Hierarchical Machine Learning Models.

• (I) Convolutional Networks.

• (II) Structured Discriminative Models.

• (III) Grammars and Compositional Models.

• The examples will be on vision, but the techniques are generally applicable.

Cognitive Science helps Machine Learning

• Cognitive Science is useful to ML because the human visual system has many desirable properties (not present in most ML systems):

• (i) flexible, adaptive, robust,

• (ii) capable of learning from limited data, ability to transfer,

• (iii) able to perform multiple tasks,

• (iv) closely coupled to reasoning, language, and other cognitive abilities.

• Cognitive Scientists search for fundamental theories and not incremental pragmatic solutions.

Cognitive Science and Machine Learning

• Machine Learning is useful to Cog. Sci. because it has experience dealing with complex tasks on huge datasets (e.g., the fundamental problem of vision).

• Machine Learning, together with Computer Vision, has developed a very large number of mathematical and computational techniques, which seem necessary to deal with the complexities of the world.

• Data drives the modeling tools. Simple data requires only simple tools. But simple tasks also require simple tools. (neglected by CV).

Combining Cognitive and ML

• Augmented Reality – we need computer systems that can interact with humans.

• How can a visually impaired person best be helped by a ML/CV system? The user wants to be able to ask the computer questions (e.g., “who was that person?”), i.e., interact with it as if it were a human. Turing tests for vision (S. Geman and D. Geman).

• Image Analyst (Medicine, Military) – wants a ML system that can reason about images, make analogies to other images, and so on.

Data Set Dilemmas

• Too complicated a dataset: requires a lot of engineering to perform well (“neural network tricks”, N students testing 100x N parameter settings).

• Too simple a dataset: results may not generalize to the real world. It may focus on side issues.

• Tyranny of Datasets: You can only evaluate performance on a limited set of tasks (e.g., can do “object classification” and not “object segmentation” or “cat part detection”, or ask “what is the cat doing?”).

Datasets and Generalization

• Machine Learning methods are tested on large benchmarked datasets.

• Two of the applications involve 20,000 and 1,000,000 images.

• Critical Issues of Machine Learning:

• (I) Learnability: will the results generalize to new datasets?

• (II) Inference: can we compute properties fast enough?

• Theoretical Results: Probably Approximately Correct (PAC) Theorems.

Vision: The Data and the Problem

• Complexity, Variability, and Ambiguity of Images.

• Enormous range of visual tasks that can be performed. Set of all images is practically infinite.

• 30,000 objects, 1,000 scenes.

• How can humans interpret images in 150 msec?

• Fundamental Problem: complexity.

Neuroscience: Bio-Inspired

• Theoretical Models of the Visual Cortex (e.g., T. Poggio) are hierarchical and closely related to convolutional nets.

• Generative models (later in this talk) may help explain the increasing evidence of top-down mechanisms.

• Behavior-to-Brain: propose models for the visual cortex that can be tested by fMRI, multi-electrodes, and related techniques.

• (multi-electrodes T.S. Lee, fMRI D.K. Kersten).

• Caveat: real neurons don’t behave like neurons in textbooks…

• Conjecture: Structure of the Brain and ML systems is driven by the statistical structure of the environment. The Pattern Theory Manifesto.

Hierarchical Models of Vision

• Why Hierarchies?

• Bio-inspired: Mimics the structure of the human/macaque visual system.

• Computer Vision Architectures: low-, middle-, high-level. From ambiguous low-level to unambiguous high-level.

• Optimal design: for representing, learning, and retrieving image patterns?

Three Types of Hierarchies:

• (I) Convolutional Neural Networks: ImageNet Dataset.

• Krizhevsky, Sutskever, and Hinton (2013).

• LeCun, Salakhutdinov.

• (II) Discriminative Part-Based Models (McAllester, Ramanan, Felzenszwalb 2008, L. Zhu et al. 2010). PASCAL dataset.

• (III) Generative Models. Grammars and Compositional Models. (Geman, Mumford, SC Zhu, L. Zhu,…).

Example I: Convolutional Nets

• Krizhevsky, Sutskever, and Hinton (2013).

• Dataset: ImageNet (Fei-Fei Li).

• 1,000,000 images.

• 1,000 objects.

• Task: detect and localize objects.

Example I: Neural Network

• Architecture: Neural Network.

• Convolutional: each hidden unit applies the same localized linear filter to the input.
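The weight sharing described on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `conv2d_valid`, the toy 4 x 4 image, and the horizontal-difference filter are hypothetical and not taken from the talk, and a single filter with no nonlinearity or pooling is used.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Apply one shared linear filter at every valid image position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Each hidden unit sees only its own local patch,
            # but all units use the same filter weights.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical toy input: a 4 x 4 ramp image and a horizontal difference filter.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, -1.0]])
hidden = conv2d_valid(image, kernel)
print(hidden.shape)
```

Because the same `kernel` is reused at every position, the number of parameters depends on the filter size and not on the image size.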

Example I: Neurons

Example I: The Hierarchy.

Example I: Model Details

• New model.

Learning

Example I: Learnt Filters

• Image features learnt – the usual suspects.

Example I: Dropout

Example I: Results

Example I: Conclusion

• This convolutional net was the most successful algorithm on the ImageNet Challenge 2012.

• It requires a very large amount of data to train.

• Devil is in the details (“tricks for neural networks”).

• Algorithm implemented on Graphics Processing Units (GPUs) to deal with complexity of inference and learning.

Example II: Structured Discriminative Models.

• Star Models: McAllester, Felzenszwalb, Ramanan. 2008.

• Objects are made from “parts” (not semantic parts).

• Discriminative Models:

• Hierarchical variant: L. Zhu, Y. Chen, et al. 2010.

• Learning: latent support-vector machines.

• Inference: window search plus dynamic programming.

• Application: Pascal object detection challenge. 20,000 images, 20 objects.

• Task: identify and localize (bounding box).

Example II: Mixture Models

• Each object is represented by six models, to allow for different viewpoints.

• Energy function/Probabilistic Model defined on hierarchical graph.

• Nodes represent parts which can move relative to each other, enabling spatial deformations.

• Constraints on deformations are imposed by potentials on the graph structure.

Parent-Child spatial constraints. Parts: blue (1), yellow (9), purple (36).

Deformations of Horse. Deformations of Car.

Example II: Mixture Models:

• Each object is represented by 6 hierarchical models (mixture of models).

• These mixture components account for pose/viewpoint changes.

Example II: Features and Potentials

• Edge-Like Cues: Histogram of Gradients (HOGs).

• Appearance Cues: Bag of Words Models (dictionary obtained by clustering SIFT or HOG features).

• Learning: (i) weights for the importance of features, (ii) weights for the spatial relations between parts.

Example II: Learning by Latent SVM

• The Graph Structure is known.

• The training data is partly supervised. It gives image regions labeled by object/non-object.

• But you do not know which mixture (viewpoint) component or the positions of the parts. These are hidden variables.

• Learning: Latent Support Vector Machine (Latent SVM).

• Learn the weights while simultaneously estimating the hidden variables (part positions, viewpoint).

Example II: Details (1)

• Each hierarchy is a 3-layer tree.

• Each node represents a part.

• Total of 46 nodes: (1 + 9 + 4 x 9).

• Each node has a spatial position (parts can “move” or are “active”).

• Graph edges from parent to child impose spatial constraints.
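The node count on this slide can be checked by building the tree explicitly. The sketch below is a hypothetical illustration: the function name `build_three_layer_tree` and the integer node ids are assumptions, and only the 1 + 9 + 4 x 9 layout comes from the slide.

```python
def build_three_layer_tree(n_mid=9, n_leaf_per_mid=4):
    """Return a child -> parent map: one root, n_mid parts, n_leaf_per_mid sub-parts each."""
    parent = {0: None}  # node 0 is the root (the whole object)
    next_id = 1
    for _ in range(n_mid):
        mid = next_id
        parent[mid] = 0  # second-layer part attached to the root
        next_id += 1
        for _ in range(n_leaf_per_mid):
            parent[next_id] = mid  # third-layer sub-part attached to its part
            next_id += 1
    return parent

parent = build_three_layer_tree()
n_nodes = len(parent)
# Every non-root node contributes one parent-child edge carrying a spatial constraint.
n_edges = sum(1 for p in parent.values() if p is not None)
print(n_nodes, n_edges)
```

Since the graph is a tree, the number of parent-child spatial constraints is one less than the number of nodes.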

Example II: Details (2)

• The object model has variables:

1. $p$ – represents the position of the parts.

2. $V$ – specifies which mixture component (e.g. pose).

3. $y$ – specifies whether the object is present or not.

4. $w$ – model parameter (to be learnt).

• Note: during learning the part positions $p$ and the pose $V$ are unknown, so they are latent variables and will be expressed as $h = (V, p)$.

Example II: Details (3)

• The “energy” of the model is defined to be $w \cdot \Phi(x, y, h)$, where $x$ is the image in the region.

• The object is detected by solving: $(y^*, h^*) = \arg\max_{y,h} \, w \cdot \Phi(x, y, h)$.

• If $y^* = 1$ then we have detected the object.

• If so, $h^* = (V^*, p^*)$ specifies the mixture component and the positions of the parts.

Example II: Details (4)

• There are three types of potential terms in $\Phi(x, y, h)$:

(1) Spatial terms $\Phi_{shape}(y, h)$, which specify the distribution on the positions of the parts.

(2) Data terms $\Phi_{HOG}(x, y, h)$ for the edges of the object, defined using HOG features.

(3) Regional appearance data terms $\Phi_{HOW}(x, y, h)$, defined by histograms of words (HOWs, using grey SIFT features and K-means).

Example II: Details (5)

• Edge-like: Histogram of Oriented Gradients, HOGs (upper row).

• Regional: Histogram Of Words, HOWs (bottom row).

• Dense sampling: 13950 HOGs + 27600 HOWs.

Example II: Details (6)

• Detecting an object requires solving:

$$(y^*, h^*) = \arg\max_{y,h} \, w \cdot \Phi(x, y, h)$$

for each image region.

• We solve this by scanning over the subwindows of the image, using dynamic programming to estimate the part positions $p$, and doing exhaustive search over $y$ and $V$.
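The dynamic programming step for the part positions can be sketched as max-sum message passing on a tree. This is a simplified toy, not the talk's implementation: positions are one-dimensional, the deformation cost is an assumed quadratic penalty, and the names `subtree_scores`, `appearance`, `offset`, and `stiffness` are hypothetical. The scan over subwindows and the exhaustive search over mixture components are omitted.

```python
import numpy as np

def subtree_scores(node, children, appearance, offset, stiffness=1.0):
    """Return msg[q]: best total score of the subtree rooted at `node` when it sits at position q."""
    msg = np.asarray(appearance[node], dtype=float).copy()
    pos = np.arange(len(msg))
    for child in children.get(node, []):
        child_msg = subtree_scores(child, children, appearance, offset, stiffness)
        # penalty[q, p]: deformation cost of placing the child at p when this node is at q.
        penalty = stiffness * (pos[None, :] - pos[:, None] - offset[child]) ** 2
        # For each parent position q, keep the best child position p.
        msg += np.max(child_msg[None, :] - penalty, axis=1)
    return msg

# Hypothetical toy model: a root with two parts on a line of 5 positions.
children = {0: [1, 2]}
appearance = {0: [0, 0, 1, 0, 0], 1: [0, 3, 0, 0, 0], 2: [0, 0, 0, 2, 0]}
offset = {1: -1, 2: 1}  # preferred displacement of each part from its parent

root_scores = subtree_scores(0, children, appearance, offset)
best_root = int(np.argmax(root_scores))
print(best_root, root_scores[best_root])
```

Each node is visited once and each parent-child pair costs one pass over all position pairs, which is what makes exact inference on the tree tractable compared with enumerating all joint part placements.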

Example II: Details (7)

• The input to learning is a set of labeled image regions $\{(x_i, y_i) : i = 1, \ldots, N\}$.

• Learning requires us to estimate the parameters $w$.

• This is done while simultaneously estimating the hidden variables $h = (V, p)$.

Example II: Details (8)

• We use Yu and Joachims’s (2009) formulation of latent SVM.

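• This specifies a non-convex criterion to be minimized:

$$\min_w \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \Big( \max_{y,h} \big[ w \cdot \Phi(x_i, y, h) + L(y_i, y, h) \big] - \max_{h} \big[ w \cdot \Phi(x_i, y_i, h) \big] \Big)$$

• This can be re-expressed in terms of a convex plus a concave part:

$$\min_w \; \underbrace{\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \max_{y,h} \big[ w \cdot \Phi(x_i, y, h) + L(y_i, y, h) \big]}_{\text{convex}} \; \underbrace{- \, C \sum_{i=1}^{N} \max_{h} \big[ w \cdot \Phi(x_i, y_i, h) \big]}_{\text{concave}}$$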
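To make the convex/concave split concrete, the sketch below evaluates a latent SVM criterion of this form on a tiny discrete problem: a regularizer plus, for each example, a loss-augmented maximum over labels and hidden variables (convex in the weights) minus a maximum over hidden variables at the true label (concave in the weights). The feature map `phi`, the 0/1 loss, and the two-example dataset are hypothetical stand-ins chosen only so that the maxima can be enumerated by brute force.

```python
import numpy as np

LABELS = (-1, 1)
LATENT = (0, 1, 2)

def phi(x, y, h):
    """Hypothetical joint feature: the hidden variable h selects one entry of x."""
    feat = np.zeros(len(x))
    feat[h] = y * x[h]
    return feat

def loss(y_true, y, h):
    """0/1 loss on the label; the hidden variable is not penalized here."""
    return 0.0 if y == y_true else 1.0

def latent_svm_objective(w, data, C=1.0):
    """Return (total, convex_part, concave_part) of the latent SVM criterion."""
    convex = 0.5 * np.dot(w, w)
    concave = 0.0
    for x, y_true in data:
        # Loss-augmented max over labels and hidden variables: convex in w.
        convex += C * max(np.dot(w, phi(x, y, h)) + loss(y_true, y, h)
                          for y in LABELS for h in LATENT)
        # Minus the best score for the true label over hidden variables: concave in w.
        concave -= C * max(np.dot(w, phi(x, y_true, h)) for h in LATENT)
    return convex + concave, convex, concave

data = [(np.array([1.0, 0.0, 2.0]), 1), (np.array([0.0, -1.0, 1.0]), -1)]
total_zero, _, _ = latent_svm_objective(np.zeros(3), data)
total_ones, convex_ones, concave_ones = latent_svm_objective(np.ones(3), data)
print(total_zero, total_ones)
```

In a real detector the maxima would be computed by the inference procedure (window search plus dynamic programming) rather than by enumeration.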

Example II: Details (9)

• Yu and Joachims (2009) propose the CCCP algorithm (Yuille and Rangarajan 2001) to minimize this criterion.

• This iterates between estimating the hidden variables and the parameters (like the EM algorithm).

• We propose a variant – incremental CCCP – which is faster.

• Result: our method works well for learning the parameters without complex initialization.
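The CCCP idea can be illustrated on a one-dimensional toy problem. This is a minimal sketch under stated assumptions, not the latent SVM procedure or the incremental variant: the energy E(w) = w^4 - 2w^2 and its split into a convex part w^4 and a concave part -2w^2 are chosen only because the convex subproblem has a closed-form solution. In the latent SVM setting, linearizing the concave part corresponds to filling in the hidden variables with their current best values.

```python
def cccp(w0, n_iter=50):
    """CCCP for E(w) = w**4 - 2*w**2, split as convex w**4 plus concave -2*w**2."""
    w = w0
    energies = [w ** 4 - 2 * w ** 2]
    for _ in range(n_iter):
        # Linearize the concave part at the current w: its gradient is -4*w.
        slope = -4.0 * w
        # Minimize the convex surrogate w**4 + slope*w exactly: 4*w**3 + slope = 0.
        target = -slope / 4.0
        sign = 1.0 if target >= 0 else -1.0
        w = sign * abs(target) ** (1.0 / 3.0)
        energies.append(w ** 4 - 2 * w ** 2)
    return w, energies

w_final, energies = cccp(w0=0.1)
print(w_final, energies[-1])
```

Each iteration cannot increase the energy, which is the property that makes CCCP a safe way to handle the non-convex criterion; like EM, it reaches a local rather than a guaranteed global minimum.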

Example II: Details (10)

• Iterative Algorithm:

– Step 1: fill in the latent positions with the best score (DP).

– Step 2: solve the structural SVM problem using a partial negative training set (incrementally enlarged).

• Initialization:

– No pretraining (no clustering).

– No displacement of all nodes (no deformation).

– Pose assignment: maximum overlapping.

• Simultaneous multi-layer learning.

Detection Results on PASCAL 2010: Cat

Example II: Cat Results

Example II: Horse Results

Example II: Car Results

Example II: Conclusion

• All current methods that perform well on the Pascal Object Detection Challenge use these types of models.

• Performance is fairly good for medium to large objects. Errors are understandable – cat versus dog, car versus train.

• But it seems highly unlikely that this is how humans perform these tasks (humans can probably learn from much less data).

• The devil is in the details. Small “engineering” changes can yield big improvements.

• Improved results by combining these “top-down” object models with “bottom-up” edge cues: Fidler, Mottaghi, Yuille, Urtasun. CVPR 2013.

Example III: Grammars/Compositional Models

• Generative models of objects and scenes.

• These models have explicit representation of parts, e.g., can “parse” objects instead of just detect them.

• Explicit Representations give the ability to perform multiple tasks (arguably closer to human cognition).

• Part sharing: efficiency of inference and learning.

• Adaptive and Flexible. Can learn from little data.

• Tyranny of Datasets: “will they work on Pascal?”.

Example III: Generative Models

• Basic Grammars (Grenander, Fu, Mjolsness, Biederman).

• Images are generated from dictionaries of elementary components, with stochastic rules for spatial and structural relations.

Example III: Analysis by Synthesis

• Analyze an image by inverting image formation.

• Inverse problem: determine how the data was generated, how was it caused?

• Inverse computer graphics.

Example III: Real Images

• Image Parsing: (Z. Tu, X. Chen, A.L. Yuille, and S.C. Zhu 2003).

• Learn probabilistic models for the visual patterns that can appear in images.

• Interpret/understand an image by decomposing it into its constituent parts.

• Inference algorithm: bottom-up and top-down.

Example III: Advantages

• Rich Explicit Representations enable:

• Understanding of objects, scenes, and events.

• Reasoning about functions and roles of objects, goals and intentions of agents, predicting the outcomes of events. SC Zhu – MURI.

Example III: Advantages

• Ability to transfer between contexts and generalize or extrapolate (e.g., from Cow to Yak). Reduces hypothesis space – PAC Theory.

• Ability to reason about the system, intervene, do diagnostics.

• Allows the system to answer many different questions based on the same underlying knowledge structure.

• Scale up to multiple objects by part-sharing.

Example III: Car Detection

• Kokkinos and Yuille 2010. A 3-layer model.

• Object made from parts: Car = Red-Part AND Blue-Part AND Green-Part.

• Parts are made by AND-ing contours: Red-Part = Con-1 AND Con-2…

• These contours correspond to AND-ing tokens extracted from the image.

The model has flexible geometry to deal with different types of cars: an SUV looks different than a Prius.

Parts move relative to the object. Contours can move relative to the parts. Quantify this spatial variation by a probability distribution which is learnt from data.

Example III: Generative Models.

Example III: Analogy -- Building a puzzle

Bottom-Up solution: Combine pieces until you build the car. Does not exploit the box cover.

Top-Down solution: Try fitting each piece to the box cover. Most pieces are uniform/irrelevant.

Bottom-Up/Top-Down solution: Form car-like structures, but use the cover to suggest combinations.

Uses AI from McAllester and Felzenszwalb.

Example III: Localize and Parse

Example III

• Summary.

• Car/Object is represented as a hierarchical graphical model.

• Inference algorithm: message passing/dynamic programming/A*.

• Learning algorithms: parameter estimation. Multi-instance learning (Latent SVM is a special case).

Example III: Part Sharing.

• Exploit part-sharing to deal with multiple objects.

• More efficient inference and representation, with exponential gains: quantified in Yuille and Mottaghi, ICML 2013.

• Learning requires less data: a part learnt for a Cow can be used for a Yak.

Example III: AND/OR Graphs for Baseball

• Part sharing enables the model to deal with objects with multiple poses and viewpoints (~100).

• Inference and Learning by bottom-up and top-down processing:

Example III: Results on Baseball Players:

• Performed well on benchmarked datasets.

• Zhu, Chen, Lin, Lin, Yuille CVPR 2008, 2010.

Example III: Structure Learning

• Task: given 10 training images, no labeling, no alignment, highly ambiguous features.

– Estimate Graph structure (nodes and edges).

– Estimate the parameters.

Combinatorial Explosion problem

Correspondence is unknown

Example III: Unsupervised Learning

• Structure Induction.

• Bridges the gap between low-, mid-, and high-level vision.

• Between Chomsky and Hinton?

Example III: Learning Multiple Objects

• Unsupervised learning algorithm to learn parts shared between different objects.

• Zhu, Chen, Freeman, Torralba, Yuille 2010.

• Structure Induction: learning the graph structures and learning the parameters.

Example III: Many Objects/Viewpoints

• 120 templates: 5 viewpoints & 26 classes

Example III: Learn Hierarchical Dictionary.

• Low-level to Mid-level to High-level.

• Automatically shares parts and stops.

Example III: Part Sharing decreases with Levels

Example III: Summary

• These generative models with explicit rich representations offer potential advantages: flexibility, adaptability, transfer.

• Enable reasoning about functions and roles of objects, goals and intentions of agents, predicting the outcomes of events.

• Access to semantic descriptions. Making analogies between images.

• Augmented Reality – e.g. computer vision system communicating with a visually impaired human.

• “In the long term models will be generative”. G. Hinton. 2013.

Conclusions

• Three examples of Hierarchical Models of Vision.

• Convolutional Networks, Structured Discriminative Models, Generative Grammars/Compositional Models.

• Relations to Neuroscience.

• Machine Learning and Cognitive Science.

• Augmented Reality: Humans and Computers.

• Importance of Data and Tasks.

Theoretical Frameworks

• All three models formulated in terms of probability distributions/energy functions defined over graphs or grammars.

• Discriminative versus Generative models.

• P(W|I) versus P(I|W) P(W).

• Representation: are properties represented explicitly? (Requirement for performing tasks).

• Inference algorithms and learning algorithms.

• Generalization (PAC theorems).
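The contrast between the two model families can be written out with Bayes' rule. This is a standard identity and not an equation taken from the slides; $W$ denotes the interpretation and $I$ the image, and the sum over $W'$ assumes a discrete set of interpretations.

```latex
% Discriminative: model the posterior over interpretations directly.
P(W \mid I)

% Generative: model the likelihood and the prior, then invert with Bayes' rule.
P(W \mid I) = \frac{P(I \mid W)\, P(W)}{\sum_{W'} P(I \mid W')\, P(W')}
```

The generative route needs a model of how images arise from interpretations, which is what makes analysis by synthesis possible.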

A Probabilistic Model is defined by four elements

• (i) Graph Structure – Nodes/Edges – Representation

• (ii) State Variables – W – input I – Representation

• (iii) Potentials – Phi – Probability

• (iv) Parameters/Weights – Lambda – Probability

• The state variables are defined at the graph nodes.

• The potentials and parameters are defined over the graph edges, and relate the model to the image I.
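One common way to combine these elements into a single distribution is the Gibbs (exponential family) form below. It is given as an assumed standard form, not as the talk's own equation; the normalization constant $Z(I, \lambda)$ is a symbol introduced here for illustration, and the sum assumes discrete state variables.

```latex
P(W \mid I; \lambda) = \frac{1}{Z(I, \lambda)} \exp\{ \lambda \cdot \Phi(W, I) \},
\qquad
Z(I, \lambda) = \sum_{W'} \exp\{ \lambda \cdot \Phi(W', I) \}
```

Here the graph structure determines which entries of $\Phi$ exist, the state variables $W$ are its arguments, and the weights $\lambda$ set the relative importance of each potential.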