Kernel Methods
Jong Cheol Jeong
Outline
6.1 One-Dimensional Kernel Smoothers
  6.1.1 Local Linear Regression
  6.1.2 Local Polynomial Regression
6.2 Selecting the Width of the Kernel
6.3 Local Regression in $R^p$
6.4 Structured Local Regression Models in $R^p$
6.5 Local Likelihood and Other Models
6.6 Kernel Density Estimation and Classification
6.7 Radial Basis Functions and Kernels
6.8 Mixture Models for Density Estimation and Classification
Kernel Function:
The kernel function is a weighting function: it assigns weights to nearby data points when forming an estimate.
One-Dimensional Kernel Smoothers
$k$-nearest-neighbor average:

$\hat f(x) = \mathrm{Ave}\big(y_i \mid x_i \in N_k(x)\big)$  (6.1)
One-Dimensional Kernel Smoothers
Nadaraya-Watson kernel-weighted average:

$\hat f(x_0) = \dfrac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}$  (6.2)

with the Epanechnikov quadratic kernel

$K_\lambda(x_0, x) = D\left(\dfrac{|x - x_0|}{\lambda}\right)$  (6.3)

$D(t) = \begin{cases} \frac{3}{4}(1 - t^2) & \text{if } |t| \le 1; \\ 0 & \text{otherwise.} \end{cases}$  (6.4)
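As a minimal NumPy sketch of Eqs. (6.2)-(6.4) (the function names and data are illustrative, not from the slides):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov quadratic kernel D(t) of Eq. (6.4)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average of Eq. (6.2), with K from Eq. (6.3)."""
    w = epanechnikov((x - x0) / lam)          # weights for each training point
    return np.sum(w * y) / np.sum(w)          # weighted average of the y_i
```

Because the weights die off smoothly within the window, the fitted curve is continuous, unlike the discontinuous $k$-NN average of Eq. (6.1).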
One-Dimensional Kernel Smoothers
Adaptive neighborhoods with kernels
$K_\lambda(x_0, x) = D\left(\dfrac{|x - x_0|}{h_\lambda(x_0)}\right)$  (6.5)

For $k$-nearest neighborhoods, $h_k(x_0) = |x_0 - x_{[k]}|$, where $x_{[k]}$ is the $k$th closest $x_i$ to $x_0$.

Tri-cube kernel:

$D(t) = \begin{cases} (1 - |t|^3)^3 & \text{if } |t| \le 1; \\ 0 & \text{otherwise.} \end{cases}$  (6.6)
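A short sketch of the tri-cube kernel (6.6) and the adaptive $k$-NN width $h_k(x_0)$ of Eq. (6.5); the helper names are my own:

```python
import numpy as np

def tricube(t):
    """Tri-cube kernel D(t) of Eq. (6.6)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def knn_bandwidth(x0, x, k):
    """Adaptive width h_k(x0) = |x0 - x_[k]| of Eq. (6.5):
    the distance from x0 to its k-th closest training point."""
    return np.sort(np.abs(np.asarray(x) - x0))[k - 1]
```

With this adaptive width, the neighborhood stretches in sparse regions and shrinks in dense ones, keeping roughly $k$ points in play at every $x_0$.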
One-Dimensional Kernel Smoothers
Nearest-neighbor kernel vs. Epanechnikov kernel
Local Linear Regression
Local Linear Regression
Locally weighted linear regression
$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\, \big[y_i - \alpha(x_0) - \beta(x_0)\, x_i\big]^2$  (6.7)

Estimate function with the equivalent kernel:

$\hat f(x_0) = b(x_0)^T \big(\mathbf{B}^T \mathbf{W}(x_0)\, \mathbf{B}\big)^{-1} \mathbf{B}^T \mathbf{W}(x_0)\, \mathbf{y}$  (6.8)
$\quad\; = \sum_{i=1}^N l_i(x_0)\, y_i$  (6.9)

where $b(x)^T = (1, x)$, $\mathbf{B}$ is the $N \times 2$ regression matrix with $i$th row $b(x_i)^T$, and $\mathbf{W}(x_0)$ is the $N \times N$ diagonal matrix with $i$th diagonal element $K_\lambda(x_0, x_i)$.
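The closed form (6.8) is a small weighted least squares solve at each target point. A sketch (Epanechnikov weights are my choice; any kernel works):

```python
import numpy as np

def local_linear(x0, x, y, lam):
    """Local linear fit at x0, Eqs. (6.7)-(6.8):
    b(x0)^T (B^T W(x0) B)^{-1} B^T W(x0) y."""
    t = (x - x0) / lam
    w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)  # diagonal of W(x0)
    B = np.column_stack([np.ones_like(x), x])             # N x 2 regression matrix
    BtW = B.T * w                                         # B^T W(x0) without forming W
    beta = np.linalg.solve(BtW @ B, BtW @ y)              # (alpha(x0), beta(x0))
    return np.array([1.0, x0]) @ beta                     # b(x0)^T beta
```

Because the fit is exact on linear data, local linear regression removes the first-order boundary bias that the plain Nadaraya-Watson average of (6.2) suffers from.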
Local Polynomial Regression
Local quadratic regression
$\min_{\alpha(x_0),\, \beta_j(x_0),\, j = 1, \dots, d} \sum_{i=1}^N K_\lambda(x_0, x_i)\Big[y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0)\, x_i^j\Big]^2$  (6.11)
Local quadratic regression corrects the bias of local linear fits in regions of curvature, "trimming the hills and filling the valleys."
Local Polynomial Regression
Bias-variance tradeoff in selecting the polynomial degree
Selecting the Width of the Kernel
Bias-variance tradeoff in selecting the width
• If the window is narrow, the variance of the estimate will be relatively large, while the bias will tend to be small.
• If the window is wide, the variance will be relatively small, while the bias will tend to be higher.
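A standard way to navigate this tradeoff (an approach from general practice, not spelled out on this slide) is to pick $\lambda$ by leave-one-out cross-validation on a Nadaraya-Watson smoother:

```python
import numpy as np

def loocv_error(x, y, lam):
    """Leave-one-out CV error of a Nadaraya-Watson smoother at width lam."""
    err = 0.0
    for i in range(len(x)):
        t = (x - x[i]) / lam
        w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)  # Epanechnikov
        w[i] = 0.0                      # predict x_i from the other points
        if w.sum() == 0.0:
            return np.inf               # window too narrow: no neighbors left
        err += (y[i] - np.dot(w, y) / w.sum()) ** 2
    return err / len(x)

def select_width(x, y, grid):
    """Pick the lambda on the grid with the smallest LOOCV error."""
    return min(grid, key=lambda lam: loocv_error(x, y, lam))
```

A too-narrow window returns infinite error (some points lose all neighbors), and a too-wide one is penalized through its bias, so the minimizer sits in between.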
Local Regression in $R^p$
Local regression in $p$ dimensions
$\min_{\beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\, \big(y_i - b(x_i)^T \beta(x_0)\big)^2$  (6.12)

where $b(X)$ is a vector of polynomial terms in $X$ of maximum degree $d$, and

$K_\lambda(x_0, x) = D\left(\dfrac{\lVert x - x_0 \rVert}{\lambda}\right)$  (6.13)

$D$ can be a radial (e.g. Epanechnikov) or tri-cube function.
Structured Local Regression Models in $R^p$
Structured kernels
$K_{\lambda, \mathbf{A}}(x_0, x) = D\left(\dfrac{\sqrt{(x - x_0)^T \mathbf{A}\, (x - x_0)}}{\lambda}\right)$  (6.14)

where $\mathbf{A}$ is a positive semidefinite matrix; restricting its eigenvalues downgrades or removes whole coordinates.
A
When the ratio of dimension to sample size is unfavorable, local regression does not help us much unless we are willing to make structural assumptions about the model.
- Downgrading or omitting coordinates can reduce the error.
Equation (6.13) gives equal weight to each coordinate, so we can modify the kernel in order to control the weight placed on each coordinate.
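A sketch of the structured kernel (6.14); the choice of tri-cube $D$ and the function name are my own:

```python
import numpy as np

def structured_kernel(x0, x, A, lam):
    """Structured kernel K_{lam,A}(x0, x) of Eq. (6.14), with a
    tri-cube D (the choice of D is an assumption, not fixed by (6.14))."""
    d = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
    t = np.sqrt(d @ A @ d) / lam            # Mahalanobis-style distance
    return (1.0 - t**3) ** 3 if t <= 1.0 else 0.0

# Downweighting a coordinate: A = diag(1, 0) makes the kernel
# ignore the second coordinate entirely.
```

With $\mathbf{A} = \mathbf{I}$ this reduces to the spherical kernel (6.13); a diagonal $\mathbf{A}$ with small or zero entries downgrades or omits the corresponding coordinates.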
Structured Regression functions
Fitting a regression function, considering all levels of interaction:

$E(Y \mid X) = f(X_1, X_2, \dots, X_p)$
ANOVA decompositions: a statistical device for analyzing the variance attributable to different variables and identifying dependencies on subsets of the variables.

$f(X_1, X_2, \dots, X_p) = \alpha + \sum_j g_j(X_j) + \sum_{k < l} g_{kl}(X_k, X_l) + \cdots$  (6.15)

Structure is imposed by eliminating some of the higher-order terms.
Structured Regression functions
Varying coefficient models: a special case of structured models. Divide the $p$ predictors in $X$ into a set $(X_1, X_2, \dots, X_q)$ with $q < p$, and let $Z = (X_{q+1}, \dots, X_p)$ denote the remaining predictors. Assume the conditionally linear model

$f(X) = \alpha(Z) + \beta_1(Z)\, X_1 + \cdots + \beta_q(Z)\, X_q$  (6.16)

For given $Z$ this constructs a linear model, but each of the coefficients can vary with $Z$. The model is fit by locally weighted least squares:

$\min_{\alpha(z_0),\, \beta(z_0)} \sum_{i=1}^N K_\lambda(z_0, z_i)\, \big(y_i - \alpha(z_0) - \beta_1(z_0)\, x_{1i} - \cdots - \beta_q(z_0)\, x_{qi}\big)^2$  (6.17)
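The criterion (6.17) is again a weighted least squares solve, with the kernel weights computed in $z$ and the linear model in $x_1, \dots, x_q$. A sketch with a scalar $z$ (names and the Epanechnikov choice are assumptions):

```python
import numpy as np

def varying_coef_fit(z0, z, X, y, lam):
    """Fit the varying coefficient model (6.16) at z0 via the locally
    weighted least squares criterion (6.17)."""
    t = (z - z0) / lam
    w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)  # kernel weights in z
    B = np.column_stack([np.ones(len(y)), X])             # columns: 1, x1, ..., xq
    BtW = B.T * w
    return np.linalg.solve(BtW @ B, BtW @ y)              # [alpha(z0), beta_1(z0), ...]
```

Sliding $z_0$ across its range and refitting traces out the coefficient functions $\alpha(\cdot)$ and $\beta_j(\cdot)$.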
Questions
• Section 6.2 details how we may select the optimal lambda parameter for a kernel. How do we select the optimal kernel function? Are there kernels that tend to outperform others in most cases? If not, are there ways to determine a kernel that may perform well without doing an experiment?
Questions
• One benefit of using kernels with SVMs is that we can expand the dimensionality of the dataset and make it more likely to find a separating hyperplane with a hard margin. But section 6.3 says that for local regression, the proportion of points near the boundary increases to 1 as the dimensionality increases, so the predictions we make will have even more bias. Is there a compromise solution that will work, or is the kernel trick best applied in classification problems?
Questions?