UP-STAT 2015 Abstract Presentation - Statistical and Machine Learning Methods for Image Processing



TRANSCRIPT

Page 1

Statistical and Machine Learning Methods for Image Processing:

Foundational Concepts and Tools from the Past, Present and Future

Matthew A. Corsetti

CQAS, RIT

Page 2

Statistical Learning in Image Processing

• Unsupervised
  • noise removal
  • image reconstruction
  • image annotation

• Supervised
  • image recognition
  • image classification

[Figure: captioned photograph]
Human: "A group of men playing Frisbee in the park."
Computer model: "A group of young people playing a game of Frisbee."

Page 3

Unsupervised Learning in Image Processing

A Primary Objective:
• Improve image quality via local processing (noise removal, deconvolution, etc.)

Typical Noise Removal Problem:
• Given an image f that has been corrupted by noise ε,
• extract from f + ε a smooth image h,
• such that h ≈ f (a minimal code sketch follows at the end of this slide)

Positron Emission Tomography (PET):
• Emission-based technology to map metabolic activity in the body
• Uses radioactive material
• Trade-off between image quality and exposure time

[Figure: noisy image f + ε, underlying image f, and smoothed estimate h; photo of a PET machine]
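The noise-removal setup above (recover h ≈ f from the observed f + ε) can be illustrated with the simplest kind of local processing. Below is a minimal sketch, my own illustration rather than anything from the talk: it corrupts a synthetic image with Gaussian noise and smooths it with a 3×3 mean filter.

```python
import numpy as np

def mean_filter(img, k=3):
    """Denoise by replacing each pixel with the mean of its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")  # replicate edges at the border
    out = np.zeros(img.shape)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
f = np.zeros((64, 64))
f[16:48, 16:48] = 1.0                            # underlying image f
noisy = f + rng.normal(scale=0.3, size=f.shape)  # observed image f + ε
h = mean_filter(noisy)                           # smooth estimate h ≈ f
print("RMSE of f + ε :", np.sqrt(np.mean((noisy - f) ** 2)))
print("RMSE of h     :", np.sqrt(np.mean((h - f) ** 2)))
```

A mean filter trades detail for noise suppression; the Bayesian machinery later in the deck replaces this ad hoc smoothing with a model-based estimate.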

Page 4

Supervised Learning in Image Processing

Primary Objectives:
• Address more global properties of an image (locate, recognize, and/or classify objects within an image)
• Example: face detection
  • The image is divided into many smaller overlapping frames at different locations, orientations, and scales
  • Each frame is classified as to whether it contains face-like textures
  • Locations with the highest probabilities of containing a face are returned (a sliding-window detector; a sketch follows at the end of this slide)

Applications:
• Google StreetView system: locations of detected faces are blurred so as to preserve privacy
• Auto-focus: locations of detected faces are used as sites for the selective auto-focus in most modern digital cameras
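A minimal sketch of the sliding-window idea described above, again my own illustration: it slides frames of several sizes across the image, scores each with a classifier, and returns the highest-scoring locations. The scorer is a hypothetical stand-in; a real detector would use a model trained on face-like textures and would also vary orientation.

```python
import numpy as np

def face_probability(patch):
    """Hypothetical stand-in for a trained face classifier: any function
    mapping a patch to a probability could be plugged in here."""
    return float(patch.mean())  # toy score: mean brightness

def sliding_window_detect(img, sizes=(16, 24, 32), stride=4, top_k=5):
    """Score overlapping frames at many locations and scales; return the
    top_k most face-like hits as (row, col, size, score)."""
    hits = []
    for s in sizes:
        for i in range(0, img.shape[0] - s + 1, stride):
            for j in range(0, img.shape[1] - s + 1, stride):
                score = face_probability(img[i:i + s, j:j + s])
                hits.append((score, i, j, s))
    hits.sort(reverse=True)
    return [(i, j, s, score) for score, i, j, s in hits[:top_k]]

rng = np.random.default_rng(1)
image = rng.random((64, 64))
for i, j, s, score in sliding_window_detect(image):
    print(f"frame at ({i}, {j}), size {s}: score {score:.3f}")
```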

Page 5

The Bayesian Approach

Common applications of Bayesian statistics in image analysis and processing include:

• Adequate modelling of the noise structure
• Confidence statements about the output results
• Regularization of underdetermined systems through the use of a prior distribution

Bayes' Theorem (writing x for the underlying image and y for the observed data):

p(x | y) = p(y | x) p(x) / p(y)

• x: the underlying image, the corresponding signal
• p(x): the prior distribution, which provides prior beliefs about the properties of the underlying image
• p(y | x): the likelihood model, which describes the data-formation process for the underlying image
• p(y): the marginal of the data, uninformative regarding x
• p(x | y): the posterior, which allows for inference about x (written out for the denoising problem below)
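To make these roles concrete, here is the factorization for the noise-removal problem of the earlier slide, under the illustrative assumption (mine, not the talk's) of i.i.d. Gaussian noise with variance σ²:

```latex
% Denoising: y = x + ε with ε ~ N(0, σ² I); prior p(x) left unspecified
\begin{align*}
p(x \mid y) &= \frac{p(y \mid x)\, p(x)}{p(y)} \;\propto\; p(y \mid x)\, p(x), \\
p(y \mid x) &= \prod_i \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(y_i - x_i)^2}{2\sigma^2} \right).
\end{align*}
```

The prior p(x), for instance the Ising or Potts models introduced below, is what regularizes the otherwise underdetermined reconstruction.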

Page 6

Markov Random Fields

• Markov Random Fields are undirected graphical models that represent dependencies between random variables through the use of undirected edges, and they can be used to model the prior of an image

• A Markov Random Field (MRF) is a graph G = (V, E)
  • V represents the set of nodes in the graph
  • Each node i ∈ V is associated with a random variable x_i
  • E is the set of undirected edges which connect the nodes of graph G and represent their direct dependencies

• The neighborhood of node x_i, denoted N(x_i), is composed of a specific number of neighboring nodes close to node x_i

• The Markov blanket of node x_i is the neighborhood of node x_i that contains only the nodes directly connected with node x_i, and it is the smallest set of nodes that renders node x_i conditionally independent of all other nodes in the graph

Page 7

Markov Random Fields

• The size of N(x_i) can certainly be adjusted

• Common sizes include:
  • 4 nearest neighbors (see the sketch at the end of this slide)
  • 8 nearest neighbors
  • 12 nearest neighbors

• Node x_i is conditionally independent of all other nodes in the graph G so long as the nodes of its Markov blanket are provided

• The Markov blanket in this example is represented by the yellow and green nodes, as they form direct edges with node x_i

[Figure: lattice neighborhood around node x_i]
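On a 2D pixel lattice the 4-nearest-neighbor case is simple to write down. A minimal sketch, my own illustration: the function returns the Markov blanket of a pixel under that neighborhood structure, handling image borders.

```python
def markov_blanket_4(i, j, n_rows, n_cols):
    """Markov blanket of pixel (i, j) on an n_rows x n_cols lattice MRF
    with 4-nearest-neighbor structure: simply its direct neighbors."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < n_cols]

print(markov_blanket_4(0, 0, 4, 4))  # corner pixel: only 2 neighbors
print(markov_blanket_4(2, 2, 4, 4))  # interior pixel: all 4 neighbors
```

Given these neighbors, the pixel is conditionally independent of every other pixel in the image, which is what makes the local Gibbs updates used for the Ising model later in the deck cheap.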

Page 8

Factorization of Markov Random Fields

Pairwise Markov Property:
• Any two nodes x_i and x_j that do not form an edge are conditionally independent given all other nodes
• For example, node a is conditionally independent of node g given all other nodes (i.e., b, c, d, e, f, h)

Cliques:
• The MRF has several fully connected subgraphs known as cliques
• Nodes that are not connected are conditionally independent and thus belong to different cliques
• A maximal clique is a clique that cannot be made any larger without losing full connectivity
• The maximal cliques in the example provided are (enumerated mechanically in the sketch below):

  C = { (a, b, c), (b, c, d), (b, d, e), (c, d, f), (d, e, f), (e, f, g), (f, h) }

[Figure: example MRF with nodes a through h]
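As a check on the list above, maximal cliques can be enumerated mechanically. A small sketch, my addition, using networkx: it rebuilds the example graph from the edges implied by the clique list, then asks for the maximal cliques.

```python
import networkx as nx
from itertools import combinations

# Edges implied by the maximal cliques listed above
cliques = [("a", "b", "c"), ("b", "c", "d"), ("b", "d", "e"), ("c", "d", "f"),
           ("d", "e", "f"), ("e", "f", "g"), ("f", "h")]

G = nx.Graph()
for clique in cliques:
    G.add_edges_from(combinations(clique, 2))  # every pair in a clique is an edge

for c in nx.find_cliques(G):  # Bron-Kerbosch enumeration of maximal cliques
    print(sorted(c))
```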

Page 9

Hammersley-Clifford Theorem

• Instead of associating a conditional probability distribution with each node, a potential function (or factor) ψ_c is associated with each maximal clique c so as to represent the prior

• Hammersley-Clifford Theorem: a distribution p satisfies the conditional independence properties of an MRF iff p can be represented as a product of factors (clique potentials), one per maximal clique:

  p(x) = (1/Z) ∏_{c ∈ C} ψ_c(x_c)

• C is the set of all maximal cliques in the graph
• Z is the partition function (a normalizing constant; computed by brute force in the sketch at the end of this slide)

[Figure: example MRF with nodes a through h]

For the example MRF, the factorization reads:

$$p(x) = \frac{\psi_{abc}(x_a, x_b, x_c)\,\psi_{bcd}(x_b, x_c, x_d)\cdots\psi_{efg}(x_e, x_f, x_g)}{\sum_{x}\psi_{abc}(x_a, x_b, x_c)\,\psi_{bcd}(x_b, x_c, x_d)\cdots\psi_{efg}(x_e, x_f, x_g)}$$
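The denominator is the partition function Z, and it is what makes MRFs expensive: it sums over every joint configuration. For the eight binary nodes of this example (2^8 = 256 states) it can be computed by brute force. A sketch, my own illustration with arbitrary placeholder potentials:

```python
import itertools
import math

nodes = "abcdefgh"
cliques = [("a", "b", "c"), ("b", "c", "d"), ("b", "d", "e"), ("c", "d", "f"),
           ("d", "e", "f"), ("e", "f", "g"), ("f", "h")]

def psi(clique, assignment):
    """Placeholder clique potential: rewards cliques whose members agree
    (any positive function of the clique's variables would do)."""
    values = {assignment[v] for v in clique}
    return math.exp(1.0) if len(values) == 1 else 1.0

Z = 0.0
for state in itertools.product([0, 1], repeat=len(nodes)):
    assignment = dict(zip(nodes, state))
    weight = 1.0
    for c in cliques:
        weight *= psi(c, assignment)  # product of potentials, one per clique
    Z += weight

print("Z =", Z)  # p(x) = (1/Z) * product of clique potentials
```

On a real image the nodes are pixels, so exhaustive summation is hopeless; this is why sampling methods such as the Gibbs sampler below are used instead.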

Page 10

The Ising Model

• The Ising model is an MRF that is used in modeling the behavior of magnets
• Let x_i ∈ {−1, +1} represent an atom's spin, which can point either up or down
  • Ferromagnets: neighboring spins tend to line up in the same direction
  • Anti-ferromagnets: neighboring spins tend not to line up in the same direction

• The MRF Ising model can be represented as a 2D lattice in which neighboring variables are connected

• The pairwise clique potential is defined as:

  ψ_ij(x_i, x_j) = exp(J_ij x_i x_j)

• J_ij represents the coupling strength between nodes i and j
• Often all edges are assumed to have equal strength, and so J_ij = J
• If J > 0, neighboring spins are likely to be in the same state (ferromagnets)
• If J < 0, neighboring spins are unlikely to be in the same state (anti-ferromagnets)

[Image: ferrofluid]
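One way to see the effect of J is to draw samples from the Ising prior with a Gibbs sampler, resampling each pixel from its conditional given its Markov blanket (the 4-neighborhood from the earlier slide). A minimal sketch, my own illustration rather than the talk's code:

```python
import numpy as np

def gibbs_ising(n=32, J=0.9, sweeps=100, seed=0):
    """Sample a spin lattice x in {-1, +1}^(n x n) from the Ising prior
    p(x) ∝ exp(J * sum of x_i * x_j over neighboring pairs)."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # Sum of spins in the Markov blanket of pixel (i, j)
                s = sum(x[r, c] for r, c in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                        if 0 <= r < n and 0 <= c < n)
                # Conditional from the pairwise potentials: p(+1) = σ(2 J s)
                p_up = 1.0 / (1.0 + np.exp(-2.0 * J * s))
                x[i, j] = 1 if rng.random() < p_up else -1
    return x

sample = gibbs_ising()
print("fraction of +1 spins:", (sample == 1).mean())
```

Larger J produces larger aligned clusters, which is the behavior the next two slides display for the Ising and Potts priors.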

Page 11

The Ising Model as the Prior

[Figure: Ising model samples; J = the coupling strength parameter]

Page 12

The Potts Model

• A generalization of the Ising model to multiple discrete states
• If x_i ∈ {1, 2, …, K}, then the clique potential function is:

  ψ_ij(x_i, x_j) = exp(J · I(x_i = x_j)), where I(·) is the indicator function

• The Potts model is frequently used as a prior for image segmentation because it states that pixels that are neighbors are likely to have the same label (a sketch of this scoring follows at the end of this slide)
• The Potts prior is appropriate for regularization in supervised learning tasks, but not for image segmentation in unsupervised learning, because the segments it produces do not accurately represent segments observed in nature

[Figure: Potts samples at three coupling strengths, with the critical value in the middle]
• J = 1.32: many small clusters
• J = 1.34: the critical value of the strength parameter; a mix of small and large clusters
• J = 1.36: larger clusters
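Up to its normalizing constant, the log of the Potts prior simply counts agreeing neighbor pairs weighted by J, which is why it favors spatially coherent labelings. A short sketch, my own illustration:

```python
import numpy as np

def potts_log_prior(labels, J):
    """Unnormalized log p(labels) under a 4-neighbor Potts prior:
    J times the number of neighboring pixel pairs with equal labels."""
    agree = (labels[:-1, :] == labels[1:, :]).sum()   # vertical neighbor pairs
    agree += (labels[:, :-1] == labels[:, 1:]).sum()  # horizontal neighbor pairs
    return J * float(agree)

rng = np.random.default_rng(2)
random_labels = rng.integers(0, 3, size=(32, 32))  # K = 3, i.i.d. labels
coherent = np.zeros((32, 32), dtype=int)
coherent[:, 16:] = 1                               # two clean segments
print("random segmentation:  ", potts_log_prior(random_labels, J=1.0))
print("coherent segmentation:", potts_log_prior(coherent, J=1.0))
```

The coherent two-segment labeling scores far higher than the random one, illustrating the regularizing pull toward smooth segmentations.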

Page 13

A Final Example: the Prior p(x)
• A model that encodes our prior beliefs about the underlying image x
• Represents how we believe the components x_i of x behave

[Figure: cell bodies image and cell nuclei image, alongside results from the Ising model, smoothed Ising model, Potts model, and smoothed Potts model]

Page 14

Thank you

Questions & Answers