Regression
Albert Bifet
May 2012
COMP423A/COMP523A Data Stream Mining
Outline
1. Introduction
2. Stream Algorithmics
3. Concept Drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming
Data Streams
Big Data & Real Time
Regression
Definition: Given a numeric class attribute, a regression algorithm builds a model that predicts, for every unlabelled instance I, a numeric value as accurately as possible.
y = f (x)
Example: Stock-market price prediction

Example: Airplane delays
Evaluation
1. Error estimation: Hold-out or Prequential
2. Evaluation performance measures: MSE or MAE
3. Statistical significance validation: Nemenyi test
Evaluation Framework
2. Performance Measures
Regression mean measures
- Mean square error:

  $MSE = \sum_i (f(x_i) - y_i)^2 / N$

- Root mean square error:

  $RMSE = \sqrt{MSE} = \sqrt{\sum_i (f(x_i) - y_i)^2 / N}$
Forgetting mechanism for estimating measures: a sliding window of size w with the most recent observations.
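The sliding-window forgetting mechanism can be sketched as follows; the class name and default window size are illustrative, not part of any particular framework:

```python
from collections import deque
from math import sqrt

class SlidingWindowError:
    """Prequential MSE/RMSE estimated over the w most recent observations.

    A deque with maxlen=w automatically discards the oldest squared
    error when a new one arrives, implementing the forgetting mechanism.
    """
    def __init__(self, w=1000):
        self.window = deque(maxlen=w)  # squared errors of recent predictions

    def update(self, prediction, target):
        self.window.append((prediction - target) ** 2)

    def mse(self):
        return sum(self.window) / len(self.window)

    def rmse(self):
        return sqrt(self.mse())
```

With `w = 3`, after observing squared errors 1, 4, 0, 9 the window holds only the last three, so the estimate reflects recent behaviour rather than the whole stream.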
2. Performance Measures
Regression relative measures
- Relative square error:

  $RSE = \sum_i (f(x_i) - y_i)^2 \,/\, \sum_i (\bar{y} - y_i)^2$

- Root relative square error:

  $RRSE = \sqrt{RSE} = \sqrt{\sum_i (f(x_i) - y_i)^2 \,/\, \sum_i (\bar{y} - y_i)^2}$

where $\bar{y}$ is the mean of the observed values.
Forgetting mechanism for estimating measures: a sliding window of size w with the most recent observations.
2. Performance Measures
Regression absolute measures
- Mean absolute error:

  $MAE = \sum_i |f(x_i) - y_i| \,/\, N$

- Relative absolute error:

  $RAE = \sum_i |f(x_i) - y_i| \,/\, \sum_i |\bar{y} - y_i|$
Forgetting mechanism for estimating measures: a sliding window of size w with the most recent observations.
Linear Methods for Regression
Linear Least Squares fitting
- Linear regression model:

  $f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j = X\beta$

- Minimize the residual sum of squares:

  $RSS(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2 = (y - X\beta)'(y - X\beta)$

- Solution:

  $\hat{\beta} = (X'X)^{-1} X'y$
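The closed-form solution can be computed directly; a minimal sketch with NumPy, where the design matrix $X$ carries a leading column of ones for the intercept $\beta_0$:

```python
import numpy as np

def least_squares(X, y):
    """Closed-form linear regression: beta = (X'X)^{-1} X'y.

    Solving the normal equations (X'X) beta = X'y with linalg.solve
    is numerically preferable to forming the inverse explicitly.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Recover y = 1 + 2x from a few noiseless points.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # first column: intercept
y = 1.0 + 2.0 * x
beta = least_squares(X, y)  # beta[0] -> 1.0, beta[1] -> 2.0
```

Note that this is a batch solution: it needs all N instances at once, which motivates the incremental methods on the following slides.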
Perceptron
[Figure: a perceptron with five inputs (Attribute 1, ..., Attribute 5), weights $w_1, \ldots, w_5$, and output $h_{\vec{w}}(\vec{x}_i)$]

- Data stream: $\langle \vec{x}_i, y_i \rangle$
- Classical perceptron: $h_{\vec{w}}(\vec{x}_i) = \vec{w}^T \vec{x}_i$
- Minimize the mean-square error: $J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2$
Perceptron
- Minimize the mean-square error: $J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2$
- Stochastic gradient descent: $\vec{w} = \vec{w} - \eta \nabla J$
- Gradient of the error function:

  $\nabla J = -\sum_i (y_i - h_{\vec{w}}(\vec{x}_i))\, \vec{x}_i$

- Weight update rule:

  $\vec{w} = \vec{w} + \eta \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))\, \vec{x}_i$
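The update rule above processes one instance at a time, which suits a stream; a minimal sketch (function name and learning rate are illustrative):

```python
import numpy as np

def perceptron_sgd(stream, n_features, eta=0.01):
    """Online linear regression trained with stochastic gradient descent.

    For each instance (x_i, y_i) the weights move one step against the
    gradient of J(w) = 1/2 (y_i - w.x_i)^2:
        w <- w + eta * (y_i - w.x_i) * x_i
    """
    w = np.zeros(n_features)
    for x, y in stream:
        error = y - w @ x          # residual of the current prediction
        w += eta * error * x       # single-instance gradient step
    return w
```

On a noiseless stream generated by a fixed linear target, the weights converge toward the true coefficients as more instances arrive.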
Fast Incremental Model Tree with Drift Detection (FIMT-DD)
Differences between FIMT-DD and the Hoeffding Tree (HT):

1. Splitting criterion
2. Numeric attribute handling using BINTREE
3. Linear models at the leaves
4. Concept drift handling: Page-Hinkley test
5. Alternate tree adaptation strategy
Splitting Criterion
Standard Deviation Reduction Measure

- Classification:

  Information Gain = Entropy(before split) - Entropy(after split)

  $Entropy = -\sum_{i=1}^{c} p_i \log p_i$

  $Gini\ Index = \sum_{i=1}^{c} p_i (1 - p_i) = 1 - \sum_{i=1}^{c} p_i^2$

- Regression:

  Gain = SD(before split) - SD(after split)

  $Standard\ Deviation\ (SD) = \sqrt{\sum_i (\bar{y} - y_i)^2 / N}$
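The regression gain can be sketched as follows for a binary split; weighting each branch's SD by its fraction of the instances is a common convention and an assumption here, as the slide does not spell it out:

```python
from math import sqrt

def sd(values):
    """Standard deviation: sqrt(sum((mean - y_i)^2) / N)."""
    m = sum(values) / len(values)
    return sqrt(sum((m - y) ** 2 for y in values) / len(values))

def sd_reduction(parent, left, right):
    """Gain = SD(before split) - weighted SD(after split).

    `parent` holds the targets reaching the node; `left` and `right`
    are the targets in the two branches of a candidate split.
    """
    n = len(parent)
    after = (len(left) / n) * sd(left) + (len(right) / n) * sd(right)
    return sd(parent) - after
```

A split that separates the targets [1, 1, 1, 5, 5, 5] into pure branches reduces the SD from 2.0 to 0.0, giving the maximum possible gain for that node.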
Numeric Handling Methods
Exhaustive Binary Tree (BINTREE – Gama et al, 2003)
- Closest implementation of a batch method
- Incrementally updates a binary tree as data is observed
- Issues: high memory cost, high cost of split search, sensitivity to data order
Page Hinckley Test
- The CUSUM test:

  $g_0 = 0, \quad g_t = \max(0,\, g_{t-1} + \epsilon_t - \upsilon)$

  if $g_t > h$ then alarm and $g_t = 0$

- The Page-Hinkley test:

  $g_0 = 0, \quad g_t = g_{t-1} + (\epsilon_t - \upsilon)$

  $G_t = \min(g_1, \ldots, g_t)$

  if $g_t - G_t > h$ then alarm and $g_t = 0$
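The Page-Hinkley test above translates directly into a small online detector; the default values for $\upsilon$ and $h$ below are illustrative, as suitable values depend on the scale of the monitored errors:

```python
class PageHinkley:
    """Page-Hinkley change detector, following the slide's formulation.

    Accumulates g_t = g_{t-1} + (eps_t - upsilon), tracks the running
    minimum G_t, and raises an alarm when g_t - G_t exceeds h.
    """
    def __init__(self, upsilon=0.005, h=50.0):
        self.upsilon = upsilon  # tolerated magnitude of change
        self.h = h              # detection threshold
        self.g = 0.0            # cumulative sum g_t
        self.G = 0.0            # running minimum G_t

    def update(self, eps):
        """Feed one error observation; return True if change is detected."""
        self.g += eps - self.upsilon
        self.G = min(self.G, self.g)
        if self.g - self.G > self.h:
            self.g = 0.0        # reset after the alarm, as on the slide
            self.G = 0.0
            return True
        return False
```

Feeding a stream of small errors keeps $g_t$ near its minimum; a sustained jump in the error level makes $g_t - G_t$ grow until the alarm fires.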
Lazy Methods
k-Nearest Neighbours (kNN):

1. Predict the mean value of the k nearest neighbours:

   $f(x_q) = \sum_{i=1}^{k} f(x_i) \,/\, k$

2. The prediction depends on the distance function used.
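Both points can be sketched together: a lazy predictor that averages the targets of the k closest stored examples, with the distance function as a pluggable parameter. Keeping the examples in a sliding window is an assumption here, in line with the forgetting mechanism used for evaluation:

```python
def knn_predict(window, query, k=3, dist=None):
    """Predict the mean target of the k stored examples closest to query.

    `window` is a list of (x, y) pairs, e.g. a sliding window of the
    most recent labelled instances from the stream. `dist` defaults to
    Euclidean distance but any metric can be supplied.
    """
    if dist is None:
        dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    neighbours = sorted(window, key=lambda xy: dist(xy[0], query))[:k]
    return sum(y for _, y in neighbours) / k
```

No model is built in advance: all work happens at prediction time, which is what makes the method lazy, and why swapping the distance function changes the predictions without any retraining.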