PATTERN RECOGNITION
Fatoş Tunay Yarman Vural
Textbook: Pattern Recognition and Machine Learning, C. Bishop
Recommended Book: Pattern Theory, U. Grenander, M. Miller


TRANSCRIPT

Page 1:

PATTERN RECOGNITION
Fatoş Tunay Yarman Vural

Textbook: Pattern Recognition and Machine Learning, C. Bishop

Recommended Book: Pattern Theory, U. Grenander, M. Miller

Page 2:

Course Requirements

Final: 50%
Project: 50%
• Literature survey report: 1 April
• Algorithm development: 1 May
• Full paper with implementation: 1 June

Page 3:

Content

1. What is a Pattern
2. Probability Theory
3. Bayesian Paradigm
4. Information Theory
5. Linear Methods
6. Kernel Methods
7. Graph Methods

Page 4:

WHAT IS A PATTERN

• Structures regulated by rules
• Goal: represent empirical knowledge in mathematical form
• The mathematics of perception
• Need: algebra, probability theory, graph theory

Page 5:

What you perceive is not what you hear:

ACTUAL SOUND
1. The ?eel is on the shoe
2. The ?eel is on the car
3. The ?eel is on the table
4. The ?eel is on the orange

PERCEIVED WORDS
1. The heel is on the shoe
2. The wheel is on the car
3. The meal is on the table
4. The peel is on the orange

(Warren & Warren, 1970)

Statistical inference is being used!

Page 6:

All flows! (Heraclitus)

• It is only the invariance, the permanent facts, that enable us to find the meaning in a world of flux.

• We can only perceive variances
• Our aim is to find the invariant laws of our varying observations

Pattern Recognition

Page 7:

ASSUMPTION

SOURCE: hypotheses, classes, objects

CHANNEL: noisy

OBSERVATION: multiple sensors, variations

Page 8:

Example

Handwritten Digit Recognition

Page 9:

Polynomial Curve Fitting
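The model equation on this slide is an image that did not survive the transcript; in Bishop's notation the polynomial model is:

$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \ldots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$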

Page 10:

Sum-of-Squares Error Function
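The error function shown here, in Bishop's standard form:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2$$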

Page 11:

0th Order Polynomial

Page 12:

1st Order Polynomial

Page 13:

3rd Order Polynomial

Page 14:

9th Order Polynomial
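The fitted-curve figures for M = 0, 1, 3, 9 are lost in this transcript. A minimal Python sketch of the underlying experiment, assuming Bishop's sin(2πx) toy data with Gaussian noise (the noise level and variable names are illustrative):

import numpy as np

rng = np.random.default_rng(0)

# Bishop's toy data: t = sin(2*pi*x) + Gaussian noise, N = 10 points
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

for M in (0, 1, 3, 9):
    # least-squares fit of an order-M polynomial
    w = np.polyfit(x, t, deg=M)
    residuals = np.polyval(w, x) - t
    E = 0.5 * np.sum(residuals ** 2)  # sum-of-squares error on the training set
    print(f"M = {M}: E(w*) = {E:.4f}")

With M = 9, the ten coefficients can interpolate the ten training points exactly, so the training error drops to (numerically) zero while the curve oscillates wildly between the points: the over-fitting discussed on the next slide.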

Page 15:

Over-fitting

Root-Mean-Square (RMS) Error:
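The formula on this slide, in Bishop's form:

$$E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^*) / N}$$

Dividing by N allows comparison across different data set sizes, and the square root puts the error on the same scale as the target t.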

Page 16:

Polynomial Coefficients

Page 17:

Data Set Size: N = 15

9th Order Polynomial

Page 18:

Data Set Size: N = 100

9th Order Polynomial

Page 19:

Regularization

Penalize large coefficient values
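The regularized error function, in Bishop's form, with λ controlling the trade-off between data fit and penalty:

$$\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 + \frac{\lambda}{2} \| \mathbf{w} \|^2$$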

Page 20:

Regularization: ln λ = -18

Page 21:

Regularization: ln λ = 0

Page 22:

Regularization: E_RMS vs. ln λ

Page 23:

Polynomial Coefficients

Page 24:

Probability Theory

Apples and Oranges

Page 25:

Probability Theory

Marginal Probability

Conditional Probability
Joint Probability

Page 26:

Probability Theory

Sum Rule

Product Rule

Page 27:

The Rules of Probability

Sum Rule

Product Rule
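The two rules, whose equations are lost in the transcript, in Bishop's notation:

$$\text{sum rule:} \quad p(X) = \sum_{Y} p(X, Y)$$

$$\text{product rule:} \quad p(X, Y) = p(Y \mid X)\, p(X)$$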

Page 28:

Bayes’ Theorem

posterior ∝ likelihood × prior
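The theorem itself, lost with the slide's image:

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$$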

Page 29:

Probability Densities
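The defining property of a density, as in PRML:

$$p(x \in (a, b)) = \int_a^b p(x)\, \mathrm{d}x, \qquad p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x = 1$$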

Page 30:

Transformed Densities

Note (Markus Svensén): this figure was taken from Solution 1.4 in the web edition of the PRML solutions manual, available at http://research.microsoft.com/~cmbishop/PRML. A fuller explanation of the figure is given in the text of that solution.
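The change-of-variables formula the figure illustrates: for x = g(y),

$$p_y(y) = p_x(x) \left| \frac{\mathrm{d}x}{\mathrm{d}y} \right| = p_x(g(y))\, |g'(y)|$$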
Page 31:

Expectations

Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)
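The standard definitions behind these headings:

$$\mathbb{E}[f] = \sum_x p(x)\, f(x) \quad \text{(discrete)}, \qquad \mathbb{E}[f] = \int p(x)\, f(x)\, \mathrm{d}x \quad \text{(continuous)}$$

$$\mathbb{E}_x[f \mid y] = \sum_x p(x \mid y)\, f(x), \qquad \mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)$$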

Page 32:

Variances and Covariances
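In Bishop's notation:

$$\mathrm{var}[f] = \mathbb{E}\left[ (f(x) - \mathbb{E}[f(x)])^2 \right] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2$$

$$\mathrm{cov}[x, y] = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\, \mathbb{E}[y]$$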

Page 33:

The Gaussian Distribution
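The univariate Gaussian density:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$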

Page 34:

Gaussian Mean and Variance
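The moments shown on this slide:

$$\mathbb{E}[x] = \mu, \qquad \mathbb{E}[x^2] = \mu^2 + \sigma^2, \qquad \mathrm{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \sigma^2$$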

Page 35:

The Multivariate Gaussian
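For a D-dimensional x:

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$$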

Page 36:

Gaussian Parameter Estimation

Likelihood function
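For i.i.d. observations x = (x_1, ..., x_N):

$$p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)$$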

Page 37:

Maximum (Log) Likelihood
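The log-likelihood and its maximizers, lost with the slide's images:

$$\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)$$

$$\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2$$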

Page 38:

Properties of μ_ML and σ²_ML
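$$\mathbb{E}[\mu_{ML}] = \mu, \qquad \mathbb{E}[\sigma^2_{ML}] = \left( \frac{N-1}{N} \right) \sigma^2$$

The ML estimate of the mean is unbiased, but the ML variance systematically underestimates σ²; multiplying by N/(N-1) removes the bias.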

Page 39:

Curve Fitting Re-visited
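The probabilistic model for curve fitting, with precision β = 1/σ²:

$$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\left( t \mid y(x, \mathbf{w}), \beta^{-1} \right)$$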

Page 40:

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error, E(w).

Page 41:

Predictive Distribution
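The predictive distribution obtained by plugging in the ML estimates:

$$p(t \mid x, \mathbf{w}_{ML}, \beta_{ML}) = \mathcal{N}\left( t \mid y(x, \mathbf{w}_{ML}), \beta_{ML}^{-1} \right)$$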

Page 42:

MAP: A Step towards Bayes

Determine w_MAP by minimizing the regularized sum-of-squares error.
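The reasoning behind this slide: with a Gaussian prior p(w|α) = N(w | 0, α⁻¹I), maximizing the posterior p(w | x, t) is equivalent to minimizing

$$\frac{\beta}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 + \frac{\alpha}{2} \mathbf{w}^{\mathrm{T}} \mathbf{w}$$

i.e. regularized least squares with λ = α/β.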

Page 43:

Bayesian Curve Fitting
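The fully Bayesian treatment integrates over w rather than choosing a point estimate:

$$p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, \mathrm{d}\mathbf{w} = \mathcal{N}\left( t \mid m(x), s^2(x) \right)$$

For this model the integral is Gaussian, with a predictive variance s²(x) that depends on x.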

Page 44:

Bayesian Predictive Distribution

Page 45:

Model Selection

Cross-Validation
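A minimal sketch of S-fold cross-validation for choosing the polynomial order M; the data-generating setup, fold count, and function names are illustrative, not from the slides:

import numpy as np

def cv_rms(x, t, M, S=4, seed=0):
    """Mean held-out RMS error of an order-M polynomial over S folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), S)
    errors = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(x)), val)
        w = np.polyfit(x[train], t[train], deg=M)   # fit on S-1 folds
        r = np.polyval(w, x[val]) - t[val]          # evaluate on the held-out fold
        errors.append(np.sqrt(np.mean(r ** 2)))
    return np.mean(errors)

# choose the order with the lowest cross-validated error
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
best_M = min(range(10), key=lambda M: cv_rms(x, t, M))
print("selected M =", best_M)

Setting S = N (leave-one-out) uses the data most efficiently but multiplies the number of training runs by N.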

Page 46:

Curse of Dimensionality

Page 47:

Curse of Dimensionality

Polynomial curve fitting, M = 3

Gaussian Densities in higher dimensions
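Two facts from PRML §1.4 that the lost figures illustrate: for D input variables, an order-M polynomial needs a number of coefficients that grows like D^M; and for a sphere of radius 1 in D dimensions, the fraction of its volume lying in a thin shell between radius 1 - ε and 1 is

$$1 - (1 - \epsilon)^D \to 1 \quad \text{as } D \to \infty$$

so in high dimensions most of the volume (and most of a Gaussian's probability mass) concentrates in a thin shell.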

Page 48:

Decision Theory

Inference step: determine either p(x, C_k) or p(C_k|x).

Decision step: for given x, determine the optimal t.

Page 49:

Minimum Misclassification Rate
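The misclassification rate for two classes, in Bishop's notation:

$$p(\text{mistake}) = p(\mathbf{x} \in \mathcal{R}_1, \mathcal{C}_2) + p(\mathbf{x} \in \mathcal{R}_2, \mathcal{C}_1) = \int_{\mathcal{R}_1} p(\mathbf{x}, \mathcal{C}_2)\, \mathrm{d}\mathbf{x} + \int_{\mathcal{R}_2} p(\mathbf{x}, \mathcal{C}_1)\, \mathrm{d}\mathbf{x}$$

It is minimized by assigning each x to the class with the largest posterior p(C_k|x).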

Page 50:

Minimum Expected Loss

Example: classify medical images as ‘cancer’ or ‘normal’

(Loss matrix: rows indexed by truth, columns by decision.)
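The matrix itself is lost in the transcript; the values used for this example in PRML (Figure 1.25) are:

                 decide cancer   decide normal
true cancer            0             1000
true normal            1                0

A missed cancer is penalized a thousand times more heavily than a false alarm.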

Page 51:

Minimum Expected Loss

Regions R_j are chosen to minimize the expected loss:
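$$\mathbb{E}[L] = \sum_k \sum_j \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)\, \mathrm{d}\mathbf{x}$$

This is achieved by assigning each x to the class j that minimizes Σ_k L_kj p(C_k|x).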

Page 52:

Reject Option
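The criterion behind the reject option, as in PRML §1.5.3: decide only when the largest posterior exceeds a threshold θ, and reject (e.g. defer to a human expert) when

$$\max_k p(\mathcal{C}_k \mid \mathbf{x}) \le \theta$$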

Page 53:

Why Separate Inference and Decision?

• Minimizing risk (loss matrix may change over time)
• Reject option
• Unbalanced class priors
• Combining models

Page 54:

Decision Theory for Regression

Inference step: determine p(t|x).

Decision step: for a given x, make an optimal prediction, y(x), for t.

Loss function:
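The expected loss for regression:

$$\mathbb{E}[L] = \iint L(t, y(\mathbf{x}))\, p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t$$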

Page 55:

The Squared Loss Function
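For L(t, y(x)) = {y(x) - t}²:

$$\mathbb{E}[L] = \iint \left\{ y(\mathbf{x}) - t \right\}^2 p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t, \qquad y^*(\mathbf{x}) = \mathbb{E}_t[t \mid \mathbf{x}]$$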

Page 56:

y(x), obtained by minimizing the expected squared loss with respect to y, is the mean of the conditional distribution p(t|x).

Minimum expected loss = squared deviation of the predictor from the conditional mean, plus the variance of the distribution of t, averaged over x.

The second term can be regarded as noise: since it is independent of y(x), it represents the irreducible minimum value of the loss function.
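The decomposition in full:

$$\mathbb{E}[L] = \int \left\{ y(\mathbf{x}) - \mathbb{E}[t \mid \mathbf{x}] \right\}^2 p(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int \mathrm{var}[t \mid \mathbf{x}]\, p(\mathbf{x})\, \mathrm{d}\mathbf{x}$$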

Page 57:

Minkowski Loss: L_q = |y - t|^q
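The minimizer of the expected Minkowski loss is the conditional mean for q = 2, the conditional median for q = 1, and the conditional mode as q → 0.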

Page 58:

Generative vs Discriminative

Generative approach: model the class-conditional densities p(x|C_k) and priors p(C_k) (or the joint p(x, C_k)), then use Bayes' theorem to obtain p(C_k|x).

Discriminative approach: model p(C_k|x) directly.

Page 59:

Information Theory: Claude Shannon (1916-2001)

Goal: find the amount of information carried by a specific value of a random variable. We need something intuitive.

Page 60:

Information Theory: C. Shannon

Information: giving form or shape to the mind

Assumptions: source → receiver
• Information is the quality of a message
• It may be a truth or a lie
• If the amount of information in the received message increases, the message is more accurate
• Source and receiver need a common alphabet to communicate the message

Page 61:

Quantification of information.

Given an r.v. X with distribution p(x), what is the amount of information received when we observe an outcome x?

Self-information: h(x) = -log p(x)
Low probability → high information (surprise)
Base e: nats; base 2: bits

Page 62:

Entropy: expected value of self-information; the average information needed to specify the state of a random variable.

Why does H(X) measure information?
• It makes sense intuitively
• "Nobody knows what entropy really is, so in any discussion you will always have an advantage." (von Neumann)
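The definition, for a discrete random variable:

$$H[X] = -\sum_x p(x) \log_2 p(x)$$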

Page 63:

Entropy

Noiseless coding theorem:
• Entropy is a lower bound on the average number of bits needed to transmit the state of a random variable.

Example: a discrete r.v. x with 8 possible states (the alphabet); how many bits (binary symbols) do we need to transmit the state of x?

All states equally likely:
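$$H[x] = -8 \times \frac{1}{8} \log_2 \frac{1}{8} = 3 \text{ bits}$$

matching the 3-bit code needed to index 8 states; a non-uniform distribution has lower entropy and admits a shorter average code.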

Page 64:

Entropy

Page 65:

Entropy and Multiplicity

In how many ways can N identical objects be allocated in M bins?

Entropy is maximized when the objects are spread evenly over the bins, p_i = 1/M, giving H = ln M.
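The multiplicity and its large-N limit (via Stirling's approximation), with p_i = n_i/N:

$$W = \frac{N!}{\prod_i n_i!}, \qquad H = \frac{1}{N} \ln W \longrightarrow -\sum_{i=1}^{M} p_i \ln p_i$$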

Page 66:

Entropy

Page 67:

Differential Entropy

Put bins of width Δ along the real line
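In the limit Δ → 0 the discrete entropy diverges like -ln Δ, and the remaining Δ-independent part is the differential entropy:

$$H[x] = -\int p(x) \ln p(x)\, \mathrm{d}x$$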

Differential entropy is maximized (for fixed variance σ²) when p(x) is Gaussian,

in which case

$$H[x] = \frac{1}{2} \left\{ 1 + \ln(2\pi\sigma^2) \right\}$$

Page 68:

Conditional Entropy
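The equations for this slide, in Bishop's form:

$$H[\mathbf{x}, \mathbf{y}] = H[\mathbf{y} \mid \mathbf{x}] + H[\mathbf{x}], \qquad H[\mathbf{y} \mid \mathbf{x}] = -\iint p(\mathbf{x}, \mathbf{y}) \ln p(\mathbf{y} \mid \mathbf{x})\, \mathrm{d}\mathbf{y}\, \mathrm{d}\mathbf{x}$$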

Page 69:

The Kullback-Leibler Divergence

The average additional amount of information required to specify the value of x when using q(x) instead of the true distribution p(x).
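$$\mathrm{KL}(p \| q) = -\int p(\mathbf{x}) \ln \left\{ \frac{q(\mathbf{x})}{p(\mathbf{x})} \right\} \mathrm{d}\mathbf{x} \ge 0$$

with equality if and only if p = q; note that KL(p‖q) ≠ KL(q‖p) in general.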

Page 70:

Mutual Information
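The final slide's definition, in Bishop's form:

$$I[\mathbf{x}, \mathbf{y}] = \mathrm{KL}\big( p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x})\, p(\mathbf{y}) \big) = H[\mathbf{x}] - H[\mathbf{x} \mid \mathbf{y}] = H[\mathbf{y}] - H[\mathbf{y} \mid \mathbf{x}]$$

Mutual information is zero if and only if x and y are independent.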