Chapter 4
CONCEPTS OF LEARNING, CLASSIFICATION AND REGRESSION
Cios / Pedrycz / Swiniarski / Kurgan
© 2007 Cios / Pedrycz / Swiniarski / Kurgan
Outline
• Main Modes of Learning
• Types of Classifiers
• Approximation, Generalization and Memorization
Main Modes of Learning
• Unsupervised learning
• Supervised learning
• Reinforcement learning
• Learning with knowledge hints and semi-supervised learning
Unsupervised Learning
Unsupervised learning, e.g., clustering, is concerned with the automatic discovery of structure in data without any supervision.
Given a dataset X = {x1, x2, …, xN} of N patterns, where each xk is characterized by a set of attributes, determine its structure, i.e., identify and describe the groups (clusters) present within X.
Examples of Clusters
[Figure: panels (a) to (d) in the (x1, x2) plane. Geometry of clusters (groups) and four ways of grouping patterns.]
Defining Distance/Closeness of Data
The distance function d(x, y) plays a pivotal role when grouping data.
Conditions for a distance metric:
• d(x, x) = 0
• d(x, y) = d(y, x) (symmetry)
• d(x, z) + d(z, y) ≥ d(x, y) (triangle inequality)
Examples of Distance Functions
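Hamming distance: $d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$
Euclidean distance: $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Tchebyschev distance: $d(\mathbf{x}, \mathbf{y}) = \max_{i} |x_i - y_i|$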
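The three distance functions can be sketched in a few lines of Python. This is a minimal illustration, assuming patterns are given as equal-length sequences of numbers; the function names and the sample points are chosen here for illustration only.

```python
import math

def hamming(x, y):
    """Sum of absolute coordinate differences (called Hamming distance on the slide)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def tchebyschev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# Illustrative points (assumed, not from the slides)
x = (1.0, 2.0)
y = (4.0, 6.0)
print(hamming(x, y), euclidean(x, y), tchebyschev(x, y))  # 7.0 5.0 4.0
```

For the same pair of points the Tchebyschev distance is never larger than the Euclidean distance, which in turn is never larger than the Hamming distance.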
Hamming/Euclidean/Tchebyschev Distances
[Figure: Hamming, Euclidean, and Tchebyschev distances d]
Supervised Learning
We are given a collection of data (patterns) together with target values, which come in one of two forms:
• discrete labels, in which case we have a classification problem
• values of a continuous variable, in which case we have a regression or approximation problem
Examples of Classifiers
[Figure: three panels showing a linear classifier, a piecewise-linear classifier, and a nonlinear classifier, each labeled with its function Φ(x)]
Reinforcement Learning
Reinforcement learning is guided by less detailed information (supervision mechanism) than in the case of supervised learning.
It comes in the form of reinforcement information (reinforcement signal).
For instance, given “c” classes, the reinforcement signal r(ω) could be binary:
$$r(\omega) = \begin{cases} 1, & \text{if the class label is even } (\omega_2, \omega_4, \ldots) \\ -1, & \text{otherwise} \end{cases}$$
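As a small sketch of how little information such a signal carries, the binary reinforcement signal can be written as a function of the class index. The integer encoding of classes (1 for ω1, 2 for ω2, and so on) and the value c = 5 are assumptions made for this illustration.

```python
def reinforcement_signal(class_index):
    """Return +1 for even class labels (omega_2, omega_4, ...) and -1 otherwise."""
    return 1 if class_index % 2 == 0 else -1

c = 5  # assumed number of classes
signals = {k: reinforcement_signal(k) for k in range(1, c + 1)}
print(signals)  # {1: -1, 2: 1, 3: -1, 4: 1, 5: -1}
```

With five classes the learner only sees two distinct signal values, so classes ω1, ω3 and ω5 cannot be told apart from the signal alone.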
Reinforcement Learning
Reinforcement in classification: partial guidance through class combination
[Figure: classifier with reinforcement signal r(z)]
Reinforcement Learning
Reinforcement in regression: the thresholded version of the target signal
[Figure: regression model with reinforcement signal r(z)]
Reinforcement Learning
Reinforcement in regression: partial guidance through an aggregate (average) of the signal
[Figure: regression model with reinforcement signal r(z)]
Semi-supervised Learning
Often, we possess some domain knowledge when clustering. It may be in the form of a small portion of data being labeled.
[Figure: data set in which a few patterns are marked as labeled patterns]
Learning with Proximity Hints
Instead of class labels, we may have pairs of data for which proximity levels have been provided.
[Figure: pairs of patterns annotated with their proximity levels]
Advantages:
• The number of classes is not required
• Only some selected pairs of data are considered
Classification Problem
Classifiers are algorithms that discriminate between classes of patterns.
Depending upon the number of classes in the problem, we talk about two- and many-class classifiers.
The design of the classifier depends upon the character of data, number of classes, learning algorithm, and validation procedures.
A classifier can be regarded as a mapping (F) from the feature space to the class space:
F: X → {ω1, ω2, …, ωc}
Two-Class Classifier and Output Coding
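[Figure: classifier with input x and output y; panels (a) and (b) illustrate coding of the output y for classes ω1 and ω2]
Examples of output coding:
• y in [0, ½] if the pattern belongs to class ω1; y in [½, 1] if the pattern belongs to class ω2
• Φ(x) < 0 if x belongs to ω1; Φ(x) ≥ 0 if x belongs to ω2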
Multi Class Classifier
[Figure: classifier with input x and outputs y1, y2, …, yc]
Maximum of class membership: select the class i0 for which
i0 = arg max {y1, y2, …, yc}
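The maximum-membership rule can be sketched as follows. The membership values are made up for illustration, and breaking ties in favor of the lowest class index is an assumption, since the slide does not say how ties are handled.

```python
def select_class(memberships):
    """Return the 1-based index i0 of the largest class membership.

    Ties go to the lowest index (an assumption; the slide does not specify).
    """
    best = max(range(len(memberships)), key=lambda i: memberships[i])
    return best + 1

y = [0.1, 0.7, 0.2]  # illustrative outputs y1, y2, y3
i0 = select_class(y)
print(i0)  # 2
```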
Multi Class Dichotomic Classifier
We can split the c-class problem into a set of two-class problems.
In each, we consider one class, say ω1; the other class is formed by all the patterns that do not belong to ω1.
Binary/dichotomic decision:
Φ1(x) ≥ 0 if x belongs to ω1
Φ1(x) < 0 if x does not belong to ω1
Multi Class Dichotomic Classifier
Dichotomic decision:
Φ1(x) ≥ 0 if x belongs to ω1
Φ1(x) < 0 if x does not belong to ω1
Cases:
• only one classifier generates a nonnegative value
• several classifiers identify the pattern as belonging to their respective classes: conflicting class assignment
• no classifier issues a classification decision: undefined class assignment
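The three cases can be made concrete with a short sketch. It assumes the outputs of the c dichotomic classifiers for one pattern are already available as a list of numbers; the function name and the returned tuples are illustrative choices, not part of the slides.

```python
def dichotomic_decision(outputs):
    """Combine the outputs of c one-against-the-rest classifiers for one pattern.

    outputs[i] is the value produced by the classifier for class i + 1;
    a nonnegative value means that classifier claims the pattern.
    """
    winners = [i + 1 for i, value in enumerate(outputs) if value >= 0]
    if len(winners) == 1:
        return ("class", winners[0])   # exactly one classifier claims the pattern
    if len(winners) > 1:
        return ("conflict", winners)   # several classifiers claim the pattern
    return ("undefined", None)         # no classifier claims the pattern

print(dichotomic_decision([-0.4, 0.9, -1.2]))   # ('class', 2)
print(dichotomic_decision([0.3, 0.1, -2.0]))    # ('conflict', [1, 2])
print(dichotomic_decision([-1.0, -2.0, -3.0]))  # ('undefined', None)
```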
Multi Class Dichotomic Classifier
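[Figure: feature space divided into regions "ω1" / "not ω1" and "ω2" / "not ω2", showing the regions assigned to ω1 and ω2, a region of conflict, and a region with lack of decision]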
Classification vs. Regression
In contrast to classification, in regression we have:
• a continuous output variable
• the objective of building a model (regressor) so that a certain approximation error is minimized
For a data set formed by pairs of input-output data (xk, yk), k = 1, 2, …, N, where yk is in R, the regression model (regressor) has the form of some mapping F(x) such that for any xk we obtain F(xk) that is as close to yk as possible.
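As one concrete instance of "as close as possible", the sketch below fits a straight line F(x) = w0 + w1 x by minimizing the sum of squared errors. The choice of a linear model, the squared-error criterion, and the sample data are assumptions made for illustration; the slides do not fix a particular error measure.

```python
def fit_line(xs, ys):
    """Least-squares fit of F(x) = w0 + w1 * x for scalar inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Illustrative data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_line(xs, ys)

# Sum of squared errors between F(xk) and yk over the data set
sse = sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys))
print(w0, w1, sse)
```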
Examples of Regression Models
[Figure: three plots of y versus x, panels (a) to (c), illustrating linearly distributed data with high dispersion, nonlinearly distributed data with low dispersion, and linearly distributed data with low dispersion]
Main Categories of Classifiers
Explicit and implicit characterization of classifiers:
(a) Explicitly specified function - such as linear, polynomial, neural network, etc.
(b) Implicit – no formula but rather a description, such as a decision tree, nearest neighbor classifier, etc.
Nearest-Neighbor Classifier
Classify x by considering the class of its nearest neighbor:
L = arg min_k ||x – xk||
The class of x is the same as the class to which xL belongs.
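A minimal sketch of the nearest-neighbor rule follows. It assumes the Euclidean norm for ||x – xk||, integer class labels, and a tiny made-up training set.

```python
import math

def nearest_neighbor_class(x, patterns, labels):
    """Return the label of the training pattern closest to x (Euclidean norm assumed)."""
    L = min(range(len(patterns)), key=lambda k: math.dist(x, patterns[k]))
    return labels[L]

# Illustrative training data: two patterns of class 1, one of class 2
patterns = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = [1, 1, 2]

print(nearest_neighbor_class((4.0, 4.5), patterns, labels))  # 2
print(nearest_neighbor_class((0.2, 0.1), patterns, labels))  # 1
```

No formula for the decision boundary is ever written down, which is why the slides list this classifier among the implicitly characterized ones.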
Decision Trees
Boundaries are always parallel to the coordinate axes.
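[Figure: a decision tree with tests x1 < a and x2 > b (yes/no branches) and leaves class-1 and class-2, shown next to the corresponding partition of the (x1, x2) plane by the thresholds a and b]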
Linear Classifiers
Linear function of the features (variables)
Φ(x) = w0 + w1x1 + w2x2 + … + wnxn
Parameters of the classifier: w0, w1, …, wn
Geometry: line, plane, hyperplane
Linear separability of data
Example: Φ(x1, x2) = 0.7 + 1.3x1 − 2.5x2
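The example function 0.7 + 1.3x1 − 2.5x2 can be evaluated directly. The sketch below assumes the sign convention from the two-class slide (a negative value means class 1, a nonnegative value means class 2); the test points are made up.

```python
def phi(x1, x2):
    """Example linear discriminant from the slide."""
    return 0.7 + 1.3 * x1 - 2.5 * x2

def classify(x1, x2):
    """Assumed convention: negative value -> class 1, nonnegative value -> class 2."""
    return 1 if phi(x1, x2) < 0 else 2

print(phi(1.0, 0.0), classify(1.0, 0.0))  # value 2.0, class 2
print(phi(0.0, 1.0), classify(0.0, 1.0))  # value about -1.8, class 1
```

The boundary between the two classes is the line where the function equals zero, i.e. 0.7 + 1.3x1 − 2.5x2 = 0.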
Linear Classifiers
Linear classifiers can be described in a compact form by using vector notation:
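$\Phi(\mathbf{x}) = \mathbf{w}^T \tilde{\mathbf{x}}$
where $\mathbf{w} = [w_0\ w_1\ \ldots\ w_n]^T$ and $\tilde{\mathbf{x}} = [1\ x_1\ x_2\ \ldots\ x_n]^T$.
Note that $\tilde{\mathbf{x}}$ is defined in an extended (augmented) input space, that is, $\tilde{\mathbf{x}} = [1\ \mathbf{x}^T]^T$.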
Nonlinear Classifiers
Polynomial classifiers
$\Phi(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n + w_{n+1} x_1^2 + w_{n+2} x_2^2 + \ldots + w_{2n} x_n^2 + w_{2n+1} x_1 x_2 + \ldots$
have nonlinear boundaries formed at the expense of increased dimensionality of the feature space.
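The remark about increased dimensionality can be illustrated with a second-order expansion: the classifier stays linear in its weights, but it operates on a longer feature vector. The sketch below stops at second-order terms, and the weights (which place the boundary on the unit circle) are hypothetical values chosen for illustration.

```python
from itertools import combinations

def quadratic_features(x):
    """Map x to [1, x1..xn, x1^2..xn^2, x1*x2, ...] (terms up to second order)."""
    linear = list(x)
    squares = [xi * xi for xi in x]
    cross = [a * b for a, b in combinations(x, 2)]
    return [1.0] + linear + squares + cross

def polynomial_discriminant(w, x):
    """Linear combination of the expanded features."""
    return sum(wi * zi for wi, zi in zip(w, quadratic_features(x)))

# Hypothetical weights: -1 + x1^2 + x2^2, so the boundary is the unit circle
w = [-1.0, 0.0, 0.0, 1.0, 1.0, 0.0]

print(quadratic_features([2.0, 3.0]))            # [1.0, 2.0, 3.0, 4.0, 9.0, 6.0]
print(polynomial_discriminant(w, [0.5, 0.5]))    # -0.5 (inside the circle)
print(polynomial_discriminant(w, [2.0, 0.0]))    # 3.0 (outside the circle)
```

A two-dimensional input already becomes a six-dimensional feature vector, and the count grows quickly with n and with the polynomial order.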
Performance Assessment
Loss function: L(ω1, ω2) and L(ω2, ω1)
[Figure: classifier outputs for classes ω1 and ω2, marking correct classifications and the losses L(ω1, ω2) and L(ω2, ω1)]
Performance Assessment
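A performance index is used to measure the quality of the classifier and can be expressed for the k-th data point as:
$$e(\mathbf{x}_k) = \begin{cases} L(\omega_1, \omega_2), & \text{if } \mathbf{x}_k \text{ was misclassified as belonging to } \omega_2 \\ L(\omega_2, \omega_1), & \text{if } \mathbf{x}_k \text{ was misclassified as belonging to } \omega_1 \\ 0, & \text{otherwise} \end{cases}$$
We sum up the above expressions over all data to express the total cumulative error:
$$Q = \sum_{k=1}^{N} e(\mathbf{x}_k)$$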
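The cumulative error can be computed with a short sketch. The loss values, the integer class encoding, and the sample labels below are hypothetical; the dictionary key (t, p) stands for the loss incurred when a pattern of class t is assigned to class p.

```python
def pointwise_loss(true_class, predicted_class, loss):
    """Loss for one data point: zero when the classification is correct."""
    if true_class == predicted_class:
        return 0.0
    return loss[(true_class, predicted_class)]

def cumulative_error(true_classes, predicted_classes, loss):
    """Sum of pointwise losses over all data points."""
    return sum(pointwise_loss(t, p, loss)
               for t, p in zip(true_classes, predicted_classes))

# Hypothetical losses: assigning a class-2 pattern to class 1 costs five times more
loss = {(1, 2): 1.0, (2, 1): 5.0}
true_classes = [1, 1, 2, 2, 2]
predicted = [1, 2, 2, 1, 1]

Q = cumulative_error(true_classes, predicted, loss)
print(Q)  # 11.0
```

Unequal losses let the performance index penalize one kind of mistake more heavily than the other, which a plain count of errors cannot do.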
Generalization Aspects of Classification/Regression Models
Performance is assessed with regard to unseen data. Typically, the available data are split into two or three disjoint subsets:
• Training
• Validation
• Testing
Training set: used to carry out training (learning) of the classifier. All optimization activities are guided by the performance index, and its changes are reported for the training data.
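A three-way split into disjoint subsets can be sketched as follows. The 60/20/20 proportions, the fixed random seed, and the function name are assumptions for illustration; the slides do not prescribe any particular proportions.

```python
import random

def split_data(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the data and cut it into disjoint training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

train, val, test = split_data(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```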
Overtraining and Validation Sets
Consider polynomial classifiers.
[Figure: performance index versus order of the polynomial, with one curve for the training set and one for the validation set]
The validation set is essential in selecting the structure of a classifier. By using the validation set, we can determine an optimal order of the polynomial.
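The selection procedure can be sketched on a small synthetic problem. To keep the example short, it swaps the polynomial classifier for a one-dimensional polynomial regression and uses the mean squared error as the performance index; the data generator, the noise level, and the range of candidate orders are all assumptions made for illustration.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def polyfit(xs, ys, order):
    """Least-squares coefficients (lowest power first) via the normal equations."""
    m = order + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve(A, b)

def mse(coef, xs, ys):
    """Mean squared error of the polynomial with the given coefficients."""
    def predict(x):
        return sum(c * x ** i for i, c in enumerate(coef))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def make_data(n):
    """Noisy samples of an assumed quadratic target on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 - 2.0 * x + 3.0 * x * x + rng.gauss(0.0, 0.2) for x in xs]
    return xs, ys

x_train, y_train = make_data(15)
x_val, y_val = make_data(15)

orders = range(0, 7)
train_err, val_err = [], []
for order in orders:
    coef = polyfit(x_train, y_train, order)      # fit on the training set only
    train_err.append(mse(coef, x_train, y_train))
    val_err.append(mse(coef, x_val, y_val))      # evaluate on the validation set

# Choose the order with the lowest validation error
best_order = min(orders, key=lambda p: val_err[p])
print(best_order, train_err, val_err)
```

The training error can only decrease as the order grows, so it cannot be used to choose the order; the validation error is the quantity that signals when additional terms stop helping.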
Approximation, Generalization and Memorization
Approximation – generalization dilemma: excellent performance on the training set but unacceptable performance on the testing set.
Memorization effect: the data become memorized (including noisy data points), and thus the classifier exhibits poor generalization abilities.
Approximation, Generalization and Memorization
A nonlinear classifier can produce zero classification error on the training data and still have poor generalization ability.