Regression
Albert Bifet
May 2012
COMP423A/COMP523A Data Stream Mining
Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming
Data Streams
Big Data & Real Time
Regression
Definition
Given a numeric class attribute, a regression algorithm builds a model that predicts for every unlabelled instance I a numeric value with accuracy:
y = f(x)

Example
Stock-market price prediction

Example
Airplane delays
Evaluation
1. Error estimation: Hold-out or Prequential (see the sketch after this list)
2. Evaluation performance measures: MSE or MAE
3. Statistical significance validation: Nemenyi test
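A minimal sketch of prequential (test-then-train) error estimation, assuming a model object with hypothetical predict and learn methods; each instance is used first for testing and then for training:

def prequential_mse(model, stream):
    """Prequential MSE: test on each instance, then train on it."""
    total, n = 0.0, 0
    for x, y in stream:
        y_hat = model.predict(x)      # test first ...
        total += (y_hat - y) ** 2
        n += 1
        model.learn(x, y)             # ... then train
    return total / n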
Evaluation Framework
2. Performance Measures
Regression mean measures
I Mean square error:
    MSE = \sum_i (f(x_i) - y_i)^2 / N
I Root mean square error:
    RMSE = \sqrt{MSE} = \sqrt{\sum_i (f(x_i) - y_i)^2 / N}
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
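A minimal sketch of the sliding-window forgetting mechanism for the mean measures above (class and method names are illustrative):

import math
from collections import deque

class WindowedError:
    """MSE and RMSE over the squared errors of the w most recent observations."""
    def __init__(self, w):
        self.errors = deque(maxlen=w)   # oldest errors are forgotten automatically

    def update(self, y_pred, y_true):
        self.errors.append((y_pred - y_true) ** 2)

    def mse(self):
        return sum(self.errors) / len(self.errors)

    def rmse(self):
        return math.sqrt(self.mse())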
2. Performance Measures
Regression relative measures
I Relative square error:
    RSE = \sum_i (f(x_i) - y_i)^2 / \sum_i (\bar{y} - y_i)^2
I Root relative square error:
    RRSE = \sqrt{RSE} = \sqrt{\sum_i (f(x_i) - y_i)^2 / \sum_i (\bar{y} - y_i)^2}
where \bar{y} is the mean of the observed values y_i.
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
2. Performance Measures
Regression absolute measures
I Mean absolute error:
    MAE = \sum_i |f(x_i) - y_i| / N
I Relative absolute error:
    RAE = \sum_i |f(x_i) - y_i| / \sum_i |\bar{y} - y_i|
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
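The relative and absolute measures can be estimated over the same kind of sliding window; a sketch that stores the (prediction, true value) pairs so the window mean \bar{y} can be recomputed:

from collections import deque

class WindowedRelativeError:
    """RSE and RAE over the w most recent (prediction, true value) pairs."""
    def __init__(self, w):
        self.pairs = deque(maxlen=w)

    def update(self, y_pred, y_true):
        self.pairs.append((y_pred, y_true))

    def _y_bar(self):
        return sum(y for _, y in self.pairs) / len(self.pairs)

    def rse(self):
        y_bar = self._y_bar()
        num = sum((p - y) ** 2 for p, y in self.pairs)
        return num / sum((y_bar - y) ** 2 for _, y in self.pairs)

    def rae(self):
        y_bar = self._y_bar()
        num = sum(abs(p - y) for p, y in self.pairs)
        return num / sum(abs(y_bar - y) for _, y in self.pairs)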
Linear Methods for Regression
Linear Least Squares fitting
I Linear Regression Model
    f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j = X\beta
I Minimize the residual sum of squares
    RSS(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2 = (y - X\beta)'(y - X\beta)
I Solution:
    \hat{\beta} = (X'X)^{-1} X'y
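A minimal numpy sketch of the closed-form solution above; a linear solve replaces the explicit inverse for numerical stability, and a column of ones is prepended so that the intercept \beta_0 is learned too:

import numpy as np

def least_squares(X, y):
    """Least-squares fit: solve (X'X) beta = X'y for beta."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # intercept column for beta_0
    return np.linalg.solve(X.T @ X, X.T @ y)      # avoids forming the inverse

# Usage sketch: beta = least_squares(X, y); y_hat = np.r_[1, x] @ beta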
Perceptron
[Diagram: a linear perceptron with inputs Attribute 1 ... Attribute 5, weights w_1 ... w_5, and output h_{\vec{w}}(\vec{x}_i)]
I Data stream: \langle \vec{x}_i, y_i \rangle
I Classical perceptron: h_{\vec{w}}(\vec{x}_i) = \vec{w}^T \vec{x}_i
I Minimize mean-square error: J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2
Perceptron
I Minimize mean-square error: J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2
I Stochastic Gradient Descent: \vec{w} = \vec{w} - \eta \nabla J_{\vec{x}_i}
I Gradient of the error function:
    \nabla J = -\sum_i (y_i - h_{\vec{w}}(\vec{x}_i)) \vec{x}_i
I Weight update rule:
    \vec{w} = \vec{w} + \eta \sum_i (y_i - h_{\vec{w}}(\vec{x}_i)) \vec{x}_i
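A minimal sketch of a streaming perceptron regressor implementing the update rule above, one instance at a time (the learning rate value is an arbitrary choice):

import numpy as np

class StreamingPerceptron:
    """Linear model h_w(x) = w . x trained by stochastic gradient descent."""
    def __init__(self, n_features, eta=0.01):
        self.w = np.zeros(n_features)
        self.eta = eta

    def predict(self, x):
        return self.w @ x

    def learn(self, x, y):
        # One SGD step on instance (x, y): w <- w + eta * (y - h_w(x)) * x
        self.w += self.eta * (y - self.predict(x)) * x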
Fast Incremental Model Tree with Drift Detection (FIMT-DD)
Differences between FIMT-DD and the Hoeffding Tree (HT):
1. Splitting criterion
2. Numeric attribute handling using BINTREE
3. Linear model at the leaves
4. Concept drift handling: Page-Hinckley test
5. Alternate-tree adaptation strategy
Splitting Criterion
Standard Deviation Reduction (SDR) measure
I Classification
    Information Gain = Entropy(before split) - Entropy(after split)
    Entropy = -\sum_{i=1}^{c} p_i \log p_i
    Gini Index = \sum_{i=1}^{c} p_i (1 - p_i) = 1 - \sum_{i=1}^{c} p_i^2
I Regression
    Gain = SD(before split) - SD(after split)
    Standard Deviation (SD) = \sqrt{\sum_i (y_i - \bar{y})^2 / N}
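A minimal sketch of the standard deviation reduction for one candidate binary split; weighting each branch by its fraction of the examples is assumed, as in the usual batch definition:

import math

def sd(ys):
    """Standard deviation of a list of target values."""
    y_bar = sum(ys) / len(ys)
    return math.sqrt(sum((y - y_bar) ** 2 for y in ys) / len(ys))

def sd_reduction(left, right):
    """Gain = SD(before split) - weighted SD(after split)."""
    ys = left + right
    n = len(ys)
    return sd(ys) - (len(left) / n * sd(left) + len(right) / n * sd(right))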
Numeric Handling Methods
Exhaustive Binary Tree (BINTREE – Gama et al, 2003)
I Closest implementation of a batch method
I Incrementally update a binary tree as data is observed
I Issues: high memory cost, high cost of split search, sensitivity to data order
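A minimal sketch of the exhaustive binary tree idea: every distinct observed value becomes a node, and each node keeps the target statistics needed to score the candidate split "attribute <= value". This is an illustration of the approach, not the exact BINTREE structure; evaluating a split at a node also requires accumulating the statistics of its ancestors on the path from the root.

class BinTreeNode:
    """One observed numeric value, with target statistics for split search."""
    def __init__(self, value):
        self.value = value
        self.left = self.right = None
        self.n = 0          # instances reaching this node with x <= value
        self.sum_y = 0.0    # sum of their targets (for SD computation)
        self.sum_y2 = 0.0   # sum of their squared targets

    def insert(self, x, y):
        if x <= self.value:
            self.n += 1
            self.sum_y += y
            self.sum_y2 += y * y
            if x < self.value:
                if self.left is None:
                    self.left = BinTreeNode(x)
                else:
                    self.left.insert(x, y)
        else:
            if self.right is None:
                self.right = BinTreeNode(x)
            else:
                self.right.insert(x, y)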
Page Hinckley Test
I The CUSUM test
    g_0 = 0, \quad g_t = \max(0, g_{t-1} + \epsilon_t - \upsilon)
    if g_t > h then alarm and g_t = 0
I The Page-Hinckley test
    g_0 = 0, \quad g_t = g_{t-1} + (\epsilon_t - \upsilon)
    G_t = \min_{s \le t} g_s
    if g_t - G_t > h then alarm and g_t = 0
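A minimal sketch of the Page-Hinckley test as defined above, where epsilon_t is the monitored quantity at time t (e.g. the model's absolute error), upsilon the magnitude of allowed changes, and h the detection threshold (the default values here are arbitrary):

class PageHinckley:
    """Alarms when the cumulative sum g_t rises h above its running minimum G_t."""
    def __init__(self, upsilon=0.005, h=50.0):
        self.upsilon = upsilon
        self.h = h
        self.g = 0.0        # cumulative sum g_t
        self.g_min = 0.0    # running minimum G_t

    def update(self, epsilon):
        self.g += epsilon - self.upsilon
        self.g_min = min(self.g_min, self.g)
        if self.g - self.g_min > self.h:
            self.g = self.g_min = 0.0   # reset after the alarm, as in the slide
            return True                 # change detected
        return False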
Lazy Methods
kNN Nearest Neighbours:
1. Mean value of the k nearest neighbours
    f(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)
2. Depends on the distance function
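A minimal sketch of kNN regression over a sliding window of stored instances; Euclidean distance and the window size are assumptions, since the slide leaves the distance function open:

import heapq
import math
from collections import deque

class KNNRegressor:
    """Predicts the mean target of the k nearest stored neighbours."""
    def __init__(self, k=5, window=1000):
        self.k = k
        self.store = deque(maxlen=window)   # forget the oldest instances

    def learn(self, x, y):
        self.store.append((x, y))

    def predict(self, xq):
        # the k stored instances closest to the query point xq
        nearest = heapq.nsmallest(self.k, self.store,
                                  key=lambda xy: math.dist(xq, xy[0]))
        return sum(y for _, y in nearest) / len(nearest)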