Post on 22-Dec-2015
Review
Rong Jin
Comparison of Different Classification Models
The goal of all classifiers:
  Predict the class label y for an input x
  Estimate p(y|x)
K Nearest Neighbor (kNN) Approach
(figures: neighborhoods for k = 1 and k = 4)
Probability interpretation: estimate p(y|x) as

    p(y|x) = |{ x_i : y_i = y, x_i in N(x) }| / |N(x)|,   where N(x) is the neighborhood around x
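The neighborhood estimate above can be sketched in a few lines of Python (the one-dimensional data and the name `knn_posterior` are illustrative, not from the slides):

```python
from collections import Counter

def knn_posterior(x, data, k):
    """Estimate p(y|x) as the fraction of the k nearest neighbors with label y.

    `data` is a list of (x_i, y_i) pairs with scalar features, so the
    neighborhood N(x) is just the k points closest to x in absolute distance.
    """
    neighbors = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    counts = Counter(y for _, y in neighbors)
    return {y: c / k for y, c in counts.items()}

data = [(0.0, 'a'), (0.2, 'a'), (0.9, 'b'), (1.1, 'b'), (1.3, 'b')]
print(knn_posterior(0.95, data, 3))  # the 3 nearest points (0.9, 1.1, 1.3) are all 'b'
```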
K Nearest Neighbor Approach (KNN)
What is the appropriate size for the neighborhood N(x)?
  Leave-one-out approach
Weighted K nearest neighbor:
  A neighbor is defined through a weight function
  Estimate p(y|x):

    p(y|x) = Σ_i w(x_i) δ(y, y_i) / Σ_i w(x_i),   where  w(x_i) = exp( −||x − x_i||² / (2σ²) )

How to estimate the appropriate value for σ²?
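The soft-neighbor estimate with Gaussian weights can be sketched as follows (a minimal one-dimensional version; the data and function name are made up for illustration):

```python
import math

def weighted_knn_posterior(x, data, sigma2):
    """p(y|x) = sum_i w(x_i) d(y, y_i) / sum_i w(x_i), with Gaussian
    weights w(x_i) = exp(-(x - x_i)^2 / (2 sigma^2))."""
    weights = [(math.exp(-(x - xi) ** 2 / (2 * sigma2)), yi) for xi, yi in data]
    total = sum(w for w, _ in weights)
    labels = {yi for _, yi in data}
    return {y: sum(w for w, yi in weights if yi == y) / total for y in labels}

# A point halfway between the two classes gets a 50/50 posterior.
data = [(-1.0, 'a'), (1.0, 'b')]
p = weighted_knn_posterior(0.0, data, 1.0)
```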
Weighted K Nearest Neighbor
Leave-one-out + maximum likelihood:
  Estimate the leave-one-out probability

    p(y_i|x_i) = Σ_{j≠i} w_i(x_j) δ(y_j, y_i) / Σ_{j≠i} w_i(x_j),   w_i(x_j) = exp( −||x_i − x_j||² / (2σ²) )

  Leave-one-out likelihood of the training data

    l_LOO = Σ_{i=1}^{n} log p(y_i|x_i) = Σ_{i=1}^{n} log [ Σ_{j≠i} w_i(x_j) δ(y_j, y_i) / Σ_{j≠i} w_i(x_j) ]

Search for the optimal σ² by maximizing the leave-one-out likelihood.
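The σ² search by leave-one-out likelihood can be sketched as a grid search (the toy data and candidate grid are made up; the slides only specify the objective, not the search procedure):

```python
import math

def loo_log_likelihood(data, sigma2):
    """Leave-one-out log-likelihood sum_i log p(y_i | x_i), where each
    p(y_i|x_i) uses Gaussian weights over all points j != i."""
    total = 0.0
    for i, (xi, yi) in enumerate(data):
        ws = [(math.exp(-(xi - xj) ** 2 / (2 * sigma2)), yj)
              for j, (xj, yj) in enumerate(data) if j != i]
        denom = sum(w for w, _ in ws)
        num = sum(w for w, yj in ws if yj == yi)
        total += math.log(max(num / denom, 1e-12))  # guard against log(0)
    return total

# On this cleanly clustered data the tightest candidate bandwidth wins.
data = [(0.0, 'a'), (0.1, 'a'), (1.0, 'b'), (1.1, 'b')]
best = max([0.01, 0.05, 0.25, 1.0], key=lambda s2: loo_log_likelihood(data, s2))
```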
Gaussian Generative Model
p(y|x) ∝ p(x|y) p(y): posterior = likelihood × prior
Estimate p(x|y) and p(y)
Allocate a separate set of parameters θ_y = (μ_y, σ_y) for each class y ∈ {1, 2, …, c}:

    p(x|y; θ) = p(x; θ_y)

Maximum likelihood estimation:

    p(x|y) = (1/√(2πσ_y²)) exp( −(x − μ_y)² / (2σ_y²) )

    l = Σ_{i=1}^{N} log p(x_i|y_i) p(y_i)
      = Σ_{i=1}^{N} [ −(x_i − μ_{y_i})² / (2σ_{y_i}²) − ½ log 2π − log σ_{y_i} + log p_{y_i} ]
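The per-class maximum likelihood fit and the resulting posterior can be sketched as follows (one-dimensional, with made-up data; the function names are illustrative):

```python
import math

def fit_gaussian_generative(xs, ys):
    """Per-class MLE: mean, variance, and prior p_y for each class label."""
    params = {}
    n = len(xs)
    for y in set(ys):
        pts = [x for x, yi in zip(xs, ys) if yi == y]
        mu = sum(pts) / len(pts)
        var = sum((x - mu) ** 2 for x in pts) / len(pts)
        params[y] = (mu, var, len(pts) / n)
    return params

def posterior(x, params):
    """p(y|x) proportional to p(x|y) p(y) with Gaussian class-conditionals."""
    scores = {y: p / math.sqrt(2 * math.pi * var) *
                 math.exp(-(x - mu) ** 2 / (2 * var))
              for y, (mu, var, p) in params.items()}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

xs = [0.0, 0.2, 1.0, 1.2]
ys = ['a', 'a', 'b', 'b']
params = fit_gaussian_generative(xs, ys)
```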
Gaussian Generative Model
It is difficult to estimate p(x|y) if x is of high dimensionality.
Naïve Bayes: factor the class-conditional over the d features,

    p(x|y; θ) = p(x^(1)|y; θ) p(x^(2)|y; θ) ⋯ p(x^(d)|y; θ)

Essentially a linear model.
How to make a Gaussian generative model discriminative? (μ_y, σ_y) of each class are estimated only from the data belonging to that class → lack of discriminative power.
Gaussian Generative Model
Maximum likelihood estimation of the conditional model:

    p(y|x) = p(x|y) p(y) / Σ_{y'=1}^{c} p(x|y') p(y')
           = [ p_y (1/√(2πσ_y²)) exp( −(x − μ_y)² / (2σ_y²) ) ]
             / [ Σ_{y'=1}^{c} p_{y'} (1/√(2πσ_{y'}²)) exp( −(x − μ_{y'})² / (2σ_{y'}²) ) ]

    l = Σ_{i=1}^{N} log p(y_i|x_i)
      = Σ_{i=1}^{N} [ log p_{y_i} − ½ log(2πσ_{y_i}²) − (x_i − μ_{y_i})² / (2σ_{y_i}²)
                      − log Σ_{y'=1}^{c} p_{y'} (1/√(2πσ_{y'}²)) exp( −(x_i − μ_{y'})² / (2σ_{y'}²) ) ]

How to optimize this objective function?
Gaussian Generative Model
Bound optimization algorithm:

    θ  = {p_1, …, p_c, μ_1, …, μ_c, σ_1, …, σ_c}: parameters of the current iteration
    θ' = {p'_1, …, p'_c, μ'_1, …, μ'_c, σ'_1, …, σ'_c}: parameters of the last iteration

    l(θ) − l(θ') = Σ_{i=1}^{N} [ log ( p_{y_i} N(x_i; μ_{y_i}, σ_{y_i}²) / ( p'_{y_i} N(x_i; μ'_{y_i}, σ'_{y_i}²) ) )
                   − log ( Σ_{y'=1}^{c} p_{y'} N(x_i; μ_{y'}, σ_{y'}²) / Σ_{y'=1}^{c} p'_{y'} N(x_i; μ'_{y'}, σ'_{y'}²) ) ]

where N(x; μ, σ²) = (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) ).
Gaussian Generative Model
Bounding the second term with log x ≤ x − 1:

    l(θ) − l(θ') ≥ Σ_{i=1}^{N} [ log ( p_{y_i} N(x_i; μ_{y_i}, σ_{y_i}²) / ( p'_{y_i} N(x_i; μ'_{y_i}, σ'_{y_i}²) ) )
                   + 1 − Σ_{y'=1}^{c} p_{y'} N(x_i; μ_{y'}, σ_{y'}²) / Σ_{y'=1}^{c} p'_{y'} N(x_i; μ'_{y'}, σ'_{y'}²) ]

We have decoupled the interaction of the parameters between different classes.
Question: how to handle x with multiple features?
Logistic Regression Model
A linear decision boundary: w·x + b

    w·x + b > 0  →  positive
    w·x + b < 0  →  negative

A probabilistic model p(y|x):

    p(y|x) = 1 / (1 + exp( −y (w·x + b) ))

Maximum likelihood approach for estimating the weights w and threshold b:

    l(D_train) = Σ_{i=1}^{N} log p(y_i|x_i)
               = Σ_{i: y_i=+1} log [ 1 / (1 + exp(−(w·x_i + b))) ] + Σ_{i: y_i=−1} log [ 1 / (1 + exp(w·x_i + b)) ]
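Maximizing this likelihood can be sketched with plain gradient ascent on a one-dimensional toy problem (the optimizer, step size, and data are illustrative choices, not prescribed by the slides):

```python
import math

def train_logistic(data, steps=2000, lr=0.5):
    """Maximize sum_i log sigma(y_i (w x_i + b)) by gradient ascent (1-D sketch)."""
    w = b = 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            # d/dw log sigma(y(wx+b)) = y x sigma(-y(wx+b))
            s = 1.0 / (1.0 + math.exp(y * (w * x + b)))  # sigma(-y(wx+b))
            gw += y * x * s
            gb += y * s
        w += lr * gw / len(data)
        b += lr * gb / len(data)
    return w, b

data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]
w, b = train_logistic(data)
```

After training, every example sits on the correct side of the boundary, i.e. y_i (w x_i + b) > 0.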
Logistic Regression Model
Overfitting issue. Example: text classification
  A word that appears in only one document will be assigned an infinitely large weight
Solution: regularization

    l(D_train) = Σ_{i=1}^{N} log p(y_i|x_i) − s Σ_{j=1}^{m} w_j²
               = Σ_{i: y_i=+1} log [ 1 / (1 + exp(−(w·x_i + b))) ] + Σ_{i: y_i=−1} log [ 1 / (1 + exp(w·x_i + b)) ] − s Σ_{j=1}^{m} w_j²

where s Σ_j w_j² is the regularization term.
Kernelized Logistic Regression Model
Map x → φ(x) and expand the weight vector over the training points, w = Σ_{i=1}^{N} α_i φ(x_i); inner products then become kernel evaluations:

    w·x → Σ_{i=1}^{N} α_i K(x_i, x)

    p(y|x) = 1 / (1 + exp( −y Σ_{i=1}^{N} α_i K(x_i, x) ))

    l_reg(α) = Σ_{i=1}^{N} log [ 1 / (1 + exp( −y_i Σ_{j=1}^{N} α_j K(x_j, x_i) )) ] − c Σ_{i,j=1}^{N} α_i α_j K(x_i, x_j)

A non-linear logistic regression model.
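A runnable sketch of the kernelized objective, again optimized by plain gradient ascent (the RBF kernel, its width, and the regularization constant are assumed values for illustration):

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF kernel on scalars: K(a, b) = exp(-gamma (a - b)^2)."""
    return math.exp(-gamma * (a - b) ** 2)

def train_kernel_logistic(data, c=0.01, steps=500, lr=0.5):
    """Gradient ascent on l_reg(alpha) = sum_i log sigma(y_i f(x_i)) - c alpha' K alpha,
    where f(x) = sum_j alpha_j K(x_j, x)."""
    n = len(data)
    K = [[rbf(data[i][0], data[j][0]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(steps):
        f = [sum(alpha[j] * K[j][i] for j in range(n)) for i in range(n)]
        grad = []
        for k in range(n):
            # d/d alpha_k of the log-likelihood part, minus the regularizer gradient
            g = sum(data[i][1] * K[k][i] / (1.0 + math.exp(data[i][1] * f[i]))
                    for i in range(n))
            g -= 2 * c * sum(alpha[j] * K[k][j] for j in range(n))
            grad.append(g)
        alpha = [a + lr * g / n for a, g in zip(alpha, grad)]
    return alpha

def predict(x, alpha, data):
    """f(x) = sum_j alpha_j K(x_j, x); its sign is the predicted class."""
    return sum(a * rbf(xj, x) for a, (xj, _) in zip(alpha, data))

data = [(-1.5, -1), (-0.5, -1), (0.5, 1), (1.5, 1)]
alpha = train_kernel_logistic(data)
```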
Non-linear Logistic Regression Model: Hierarchical Mixture of Experts
Group linear classifiers into a tree structure
Group 1
g1(x)
m1,1(x)
Group Layer
Expert Layer
r(x)
Group 2
g2(x)
m1,2(x) m2,1(x) m2,2(x)
    p(y|x) = r(+1|x) [ g_1(+1|x) m_{1,1}(y|x) + g_1(−1|x) m_{1,2}(y|x) ]
             + r(−1|x) [ g_2(+1|x) m_{2,1}(y|x) + g_2(−1|x) m_{2,2}(y|x) ]

The products generate nonlinearity in the prediction function.
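The two-level mixture can be sketched directly from the tree, with every gate and expert modeled as a logistic unit (the parameter layout and name `hme_posterior` are my own for illustration):

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

def hme_posterior(x, r, g, m):
    """p(y=+1|x) for a two-level hierarchical mixture of experts.

    r is the (w, b) pair of the root gate, g holds the two group gates,
    and m holds the four experts m[group][expert]; each is logistic in x."""
    def p(wb):  # p(+1|x) under a single logistic unit
        w, b = wb
        return sigma(w * x + b)
    r1 = p(r)  # root: probability of routing to group 1
    group1 = p(g[0]) * p(m[0][0]) + (1 - p(g[0])) * p(m[0][1])
    group2 = p(g[1]) * p(m[1][0]) + (1 - p(g[1])) * p(m[1][1])
    return r1 * group1 + (1 - r1) * group2

# Arbitrary parameters; the output is a valid probability in (0, 1).
p_plus = hme_posterior(0.3, (1.0, 0.0), [(2.0, 0.0), (-2.0, 0.0)],
                       [[(3.0, 0.0), (-3.0, 0.0)], [(1.0, 0.5), (-1.0, -0.5)]])
```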
It can be a rough assumption that all data points fit a single linear model, but it is usually appropriate to assume a local linear model.
KNN can be viewed as a localized model without any parameters. Can we extend the KNN approach by introducing a localized linear model?
Non-linear Logistic Regression Model
Localized Logistic Regression Model
Similar to weighted KNN:
  Weigh each training example by

    w(x_i) = exp( −||x − x_i||² / (2σ²) )

  Build a logistic regression model using the weighted examples:

    l_reg(x) = Σ_{i=1}^{N} w(x_i) log [ 1 / (1 + exp( −y_i (w·x_i + b) )) ] − c ||w||₂²

(here w(x_i) denotes the example weight, while w is the weight vector of the local linear model)
Conditional Exponential Model
An extension of the logistic regression model to the multi-class case: a different set of weights w_y and threshold b_y for each class y.

    p(y|x; θ) = (1/Z(x)) exp( b_y + x·w_y ),   Z(x) = Σ_y exp( b_y + x·w_y )

Translation invariance: the model is unchanged if all (w_y, b_y) are shifted by a constant, so we can fix w_1 = 0, b_1 = 0.
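The model (a softmax over class scores) and its translation invariance can be sketched as follows (the class weights are arbitrary illustrative values):

```python
import math

def conditional_exponential(x, weights):
    """p(y|x) = exp(b_y + x.w_y) / Z(x) with class-specific (w_y, b_y).

    Shifting every w_y by a common vector and every b_y by a common constant
    multiplies all scores by the same factor, which cancels in Z(x); this is
    why one class can be pinned at w_1 = 0, b_1 = 0."""
    scores = {y: math.exp(b + sum(wi * xi for wi, xi in zip(w, x)))
              for y, (w, b) in weights.items()}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

weights = {1: ([0.0, 0.0], 0.0), 2: ([1.0, -1.0], 0.5), 3: ([-1.0, 1.0], 0.5)}
p = conditional_exponential([2.0, 1.0], weights)
```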
Maximum Entropy Model
Find the simplest model that matches the data; use iterative scaling methods for optimization.

    max_{p(y|x)}  H(y|x) = −(1/N) Σ_{i=1}^{N} Σ_y p(y|x_i) log p(y|x_i)

    subject to
      (1/N) Σ_{i=1}^{N} p(y|x_i) x_i = (1/N) Σ_{i=1}^{N} δ(y, y_i) x_i   for every y
      Σ_y p(y|x_i) = 1

Maximizing the entropy prefers uniform distributions; the constraints force the model to be consistent with the observed data.
Support Vector Machine
Classification margin. Maximum margin principle:
  Separate the data as far as possible from the decision boundary
Two objectives:
  Minimize the classification error over the training data
  Maximize the classification margin
Support vectors:
  Only the support vectors have an impact on the location of the decision boundary
(figure: classes +1 and −1 separated by the boundary w·x + b = 0, with the support vectors on the margin)
Support Vector Machine
Separable case:

    {w*, b*} = argmin_{w,b} ||w||²
    subject to  y_i (w·x_i + b) ≥ 1,   i = 1, …, N

Noisy case:

    {w*, b*} = argmin_{w,b} ||w||² + c Σ_{i=1}^{N} ξ_i
    subject to  y_i (w·x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,   i = 1, …, N
Quadratic programming!
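The slides solve this as a quadratic program; as a lightweight stand-in, the equivalent unconstrained objective ||w||² + c Σ_i max(0, 1 − y_i(w·x_i + b)) can be minimized by subgradient descent (a one-dimensional sketch with made-up data; the step size and constants are illustrative, not the slides' method):

```python
def train_linear_svm(data, c=10.0, steps=2000, lr=0.01):
    """Minimize ||w||^2 + c * sum_i max(0, 1 - y_i (w x_i + b)) by
    subgradient descent on a 1-D problem."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = 2 * w, 0.0
        for x, y in data:
            if y * (w * x + b) < 1:  # inside the margin: hinge term is active
                gw -= c * y * x
                gb -= c * y
        w -= lr * gw
        b -= lr * gb
    return w, b

data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]
w, b = train_linear_svm(data)
```

On this separable toy set the solution approaches the max-margin boundary w ≈ 1, b ≈ 0, with the points at x = ±1 acting as support vectors.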
Logistic Regression Model vs. Support Vector Machine
Logistic regression model:

    {w*, b*} = argmin_{w,b} Σ_{i=1}^{N} log( 1 + exp( −y_i (w·x_i + b) ) ) + s Σ_{j=1}^{m} w_j²

Support vector machine:

    {w*, b*} = argmin_{w,b} c Σ_{i=1}^{N} ξ_i + ||w||²
    subject to  y_i (w·x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,   i = 1, …, N

The regularization terms are identical; the two models differ in the loss function used to punish mistakes.
Logistic Regression Model vs. Support Vector Machine
(figure: both loss functions plotted against the margin y(w·x + b); the logistic loss decreases smoothly, while the SVM hinge loss drops to zero at y(w·x + b) = 1)
Logistic regression differs from the support vector machine only in the loss function.
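The two losses from the figure can be evaluated side by side as functions of the margin z = y(w·x + b):

```python
import math

def logistic_loss(z):
    """Logistic loss as a function of the margin z = y (w.x + b)."""
    return math.log(1.0 + math.exp(-z))

def hinge_loss(z):
    """SVM hinge loss: linear penalty inside the margin, zero once z >= 1."""
    return max(0.0, 1.0 - z)

for z in [-2, -1, 0, 1, 2]:
    print(f"z={z:+d}  logistic={logistic_loss(z):.3f}  hinge={hinge_loss(z):.3f}")
```

The table this prints shows the difference concretely: the hinge loss is exactly zero for z ≥ 1, while the logistic loss stays positive everywhere.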
Kernel Tricks
Introduce nonlinearity into the discriminative models. Diffusion kernel:
  A graph Laplacian L encodes local similarity:

    L_{i,j} = s(x_i, x_j)                 if i ≠ j
    L_{i,i} = −Σ_{k≠i} s(x_i, x_k)

  The diffusion kernel propagates local similarity information into a global one:

    K = e^{βL},   or equivalently   dK/dβ = LK
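A small sketch of the construction, computing the matrix exponential by its truncated power series (the three-node chain graph and β value are illustrative; a library routine such as a dedicated `expm` would normally be used instead):

```python
def graph_laplacian(S):
    """L_ij = s(x_i, x_j) for i != j and L_ii = -sum_{k != i} s(x_i, x_k)."""
    n = len(S)
    return [[S[i][j] if i != j else -sum(S[i][k] for k in range(n) if k != i)
             for j in range(n)] for i in range(n)]

def diffusion_kernel(L, beta, terms=30):
    """K = exp(beta L) via the truncated series sum_k (beta L)^k / k!."""
    n = len(L)
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    P = [row[:] for row in K]  # running term (beta L)^k / k!
    for k in range(1, terms):
        P = [[sum(P[i][m] * L[m][j] for m in range(n)) * beta / k
              for j in range(n)] for i in range(n)]
        K = [[K[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return K

# Chain graph 0 - 1 - 2: nodes 0 and 2 have no direct similarity,
# yet the diffusion kernel assigns them a positive global similarity.
S = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
K = diffusion_kernel(graph_laplacian(S), 0.5)
```

Because each row of L sums to zero, each row of K sums to one, so the kernel behaves like diffusion of similarity mass over the graph.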
Fisher Kernel
Derive a kernel function from a generative model. Key idea:
  Map a point x in the original input space into the model space
  The similarity of two data points is measured in the model space

    Original Input Space → Model Space
    x_1 → φ(x_1)
    x_2 → φ(x_2)
Kernel Methods in Generative Models
Usually, kernels can be introduced to a generative model through a Gaussian process:
  Define a "kernelized" covariance matrix
  It must be positive semi-definite, similar to Mercer's condition
Multi-class SVM
SVMs can only handle two-class outputs. One-against-all:
  Learn N SVMs
    SVM 1 learns "Output == 1" vs "Output != 1"
    SVM 2 learns "Output == 2" vs "Output != 2"
    ⋮
    SVM N learns "Output == N" vs "Output != N"
Error Correcting Output Code (ECOC)
Encode each class into a bit vector over the binary classifiers S1–S4:

    class   S1  S2  S3  S4
    A        1   0   0   1
    B        0   1   0   1
    C        0   1   1   0
    x        1   1   1   0

Classify x with each binary classifier, then predict the class whose code word has the smallest Hamming distance to the resulting bit vector (here d(x, A) = 3, d(x, B) = 3, d(x, C) = 1, so x is labeled C).
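The decoding step can be sketched directly from the table (the codebook below is the one in the table; `ecoc_decode` is an illustrative name):

```python
codebook = {'A': [1, 0, 0, 1], 'B': [0, 1, 0, 1], 'C': [0, 1, 1, 0]}

def ecoc_decode(outputs, codebook):
    """Pick the class whose code word is closest in Hamming distance to the
    bit vector produced by the binary classifiers S1..S4."""
    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))
    return min(codebook, key=lambda y: hamming(outputs, codebook[y]))

print(ecoc_decode([1, 1, 1, 0], codebook))  # nearest code word is C's
```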
Ordinal Regression
A special class of multi-class classification problems: there is a natural ordinal relationship between the classes.
Maximum margin principle:
  The computation of the margin involves multiple classes
(figure: 'good', 'OK', and 'bad' regions separated by two parallel boundaries along w)
Ordinal Regression
Maximize the smaller of the two margins:

    {w*, b_1*, b_2*} = argmax_{w, b_1, b_2} margin(w, b_1, b_2)
                     = argmax_{w, b_1, b_2} min( margin_1(w, b_1), margin_2(w, b_2) )
                     = argmax_{w, b_1, b_2} min( min_{x_i ∈ D_g ∪ D_o} |x_i·w − b_1| / ||w||,
                                                 min_{x_i ∈ D_o ∪ D_b} |x_i·w − b_2| / ||w|| )
    subject to
      x_i ∈ D_g:  x_i·w − b_1 > 0
      x_i ∈ D_o:  x_i·w − b_1 < 0,  x_i·w − b_2 > 0
      x_i ∈ D_b:  x_i·w − b_2 < 0

Equivalent formulation:

    {w*, b_1*, b_2*} = argmin_{w, b_1, b_2} ||w||²
    subject to
      x_i ∈ D_g:  x_i·w − b_1 ≥ 1
      x_i ∈ D_o:  b_1 − x_i·w ≥ 1,  x_i·w − b_2 ≥ 1
      x_i ∈ D_b:  b_2 − x_i·w ≥ 1
Decision Tree
From slides of Andrew Moore
Decision Tree
A greedy approach for generating a decision tree:
1. Choose the most informative feature, using the mutual information measure
2. Split the data set according to the values of the selected feature
3. Recurse until each data item is classified correctly
Attributes with real values:
  Quantize the real value into a discrete one
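Step 1, choosing the most informative feature, can be sketched as computing the mutual information (information gain) of each candidate feature (the tiny weather-style data set is made up for illustration):

```python
import math

def entropy(labels):
    """H(Y) = -sum_y p(y) log2 p(y) over a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, feature):
    """Mutual information between a discrete feature and the class label:
    H(Y) - sum_v p(feature = v) H(Y | feature = v)."""
    n = len(rows)
    values = {r[feature] for r in rows}
    cond = sum(
        len(sub) / n * entropy(sub)
        for v in values
        for sub in [[y for r, y in zip(rows, labels) if r[feature] == v]]
    )
    return entropy(labels) - cond

rows = [{'outlook': 'sunny', 'windy': 0}, {'outlook': 'sunny', 'windy': 1},
        {'outlook': 'rain', 'windy': 0}, {'outlook': 'rain', 'windy': 1}]
labels = ['no', 'no', 'yes', 'yes']
best = max(['outlook', 'windy'], key=lambda f: information_gain(rows, labels, f))
```

Here 'outlook' perfectly predicts the label (gain of one full bit) while 'windy' carries no information, so the greedy procedure would split on 'outlook' first.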
Decision Tree
The overfitting problem. Tree pruning:
  Reduced-error pruning
  Rule post-pruning
Generalized Decision Tree
(left figure: a decision tree with simple axis-aligned data partitions over Attribute 1 and Attribute 2; right figure: a decision tree using classifiers for data partition, where each node is a linear classifier)