modern mdl meets data mining insight, theory, and...

60
Modern MDL Meets Data Mining Insight, Theory, and Practice Part IVDynamic Setting Kenji Yamanishi Graduate School of Information Science and Technology, the University of Tokyo August 4 th 2019 KDD Tutorial

Upload: others

Post on 11-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Modern MDL Meets Data MiningInsight, Theory, and Practice

ーPart IVーDynamic Setting

Kenji YamanishiGraduate School of Information Science and

Technology, the University of TokyoAugust 4th 2019

KDD Tutorial

Page 2: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Part IV. Dynamic Setting

4.1. Change Detection with MDL Change Statistics4.1.1. Change Detection4.1.2. MDL Change Statistics4.1.3. Sequential Gradual Change Detection4.1.4. Adaptive Windowing

4.2. Model Change Detection with MDL Principle4.2.1. MDL Model Change Statistics4.2.2. Dynamic Model Selection4.2.3. Clustering Change Detection4.2.4. Model Change Sign Detection

Page 3: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

4.1. Change Detection with MDL Change Statistics.

Page 4: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Detecting emergence of bursts of anomalies

4.1.1 Change DetectionWhat’s Change Detection?

Page 5: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

t=a :change point

is largeDissimilarity Measure=Kullback-Leibler divergence

Definition of Change Point

Page 6: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Application to Malware Detection

Detecting SQL Injection via change point detection

0

5

10

15

20

25

30

35

40

changescore

Malware Attack

→time

22 hours

Sign-Scanning

Page 7: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Why Change Detection?

Time Series Event behind changeAccess log Malware

Computer usage log FraudSyslog Failure

Sensor data AccidentTweet Topic Emergence

Real estate transaction Economics crisisUsage transaction Market trend

Visual field loss Glaucoma

Page 8: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Previous Work■ Abrupt Change detection:

[Hinkley 1970] [Hsu 1977][Basseville, Nikiforov 1993](CUSUM)[Guralnik, Srivastava 1998] [Fearnhead, Liu 2007]

■ On-line abrupt change detection:[Yamanishi,Takeuchi 2002] [Kiefer et al.2004] [Takeuchi, Yamanishi 2006] [Adams,MacKay 2007]

■ Incremental change detection(Concept drift)[Zliobaite 2009] [Gama et al. 2013]

■ Continuous change detection[Miyaguchi, Yamanishi 2015] [Yamanishi Miyaguchi 2016]

No studies on unifying approaches to detecting gradual changes as well as abrupt ones

Page 9: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Abrupt Change Detection

Gradual Change Detection[Yamanishi Miyaguchi BigData 2016]

{Miyaguchi Yamanishi JDSA2018][Kaneko Miyaguchi Yamanishi BIgData2016]

Model Change Detection[Yamanishi Fukushima IEEE IT 2018]

[Hirai Yamanishi KDD2012][Hayashi Yamanishi DAMI 2014]

Model Change Sign Detection[Hirai Yamansihi BigData 2018]

Unifying gradual and abrupt change detection

New Directions of Change Detection

MDL

Page 10: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

4.1.2 MDL Change StatisticsHypothesis Testing Framework

10

parametric class of prob. densities

Likelihood test cannot be applied

t is change pt

t is not change pt

Page 11: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

MDL Change Statistics

If the data can be compressed significantly more by changing the distribution at time t,then that point may be thought of as a change point.

Basic Idea

timet

C.f. [Yamanishi Miyaguchi BigData2016] [Vreeken Leeuwen DAMI2014][Hooi et al. CIKM2018] [Guralnik and Srivastava KDD1999]

Page 12: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

NML CodelengthParametric model

NML Codelength(Normalized Maximum Likelihood (NML) Codelength)

Parametric Complexity

k:# parameters

=Cn

(Fisher Information)where

Page 13: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

MDL-change statistics

Formal Definition of MDL Change Statistics

NML Code-lengthfor unchange

NML Code-length for change

[Yamanishi Miyaguchi BigData2016]

Page 14: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Performance Evaluation Metrics

The performance measure of hypothesis testing

Type I error probability:

=The probability that H0 is true but H1 is accepted.

(False alarm rate)

Type II error probability

=The probability that H1 is true but H0 is accepted.

(Overlooking rate)

Page 15: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Theoretical Performance of MDL-TestTheorem 4.1.1(Error probabilities for MDL-test)

[Yamanishi Miyaguchi BigData2016]

:NML distribution

where

Error probabilities converge to zero exponentially with model complexity-based exponents.

(False alarm rate)

(Overlooking rate)

Page 16: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

4.1.3.Sequential Gradual Change Detection

16

Detecting change symptom from data stream

Challenges:Real-time detection of sign of changes

Abrupt change⇒Conventional target ⇒Our new target

Gradual change

ChangeSymptom

Changepoint

Page 17: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Score Curve

Sequential MDL Change Detection(S-MDL)

MDL Change Statistics

[Yamanishi, Miyaguchi BigData2016]Sequentially compute MDL change statistics with fixed window

Change point

Page 18: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Sequential MDL Change DetectionSequential variant 2h: window size

Runs linearlyin window size

Page 19: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

19

Example 4.1.1. (Gaussian distributions)

MDL change statistics at time t:

:

Page 20: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

20

Example 4.1.2. (Poisson distributions)

MDL change statistics at time t:

Page 21: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

21

Example 4.2.3. (Linear Regression)

MDL change statistics at time t:

Page 22: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

22

■Total Benefit (How early)

■#False Alarms(How reliably)

■Performance Measure

Experiments: Synthetic Data

Evaluation metricsbenefit

Tt*

true

1

β

β

β

β

AUC

threshold

Area under curve

β

β

t

Page 23: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

23

Experiments: Synthetic DataJumping means

replacing the step function H(·) with a slope function S(·) s.t.

where H(x) is the Heaviside step function that takes 1 ifx ≥ 0, otherwise 0

AbruptChange

GradualChange

Page 24: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

24

Experiments: Synthetic DataJumping variances

replacing the step function H(·) with a slope function S(·) s.t.

where H(x) is the Heaviside step function that takes 1 ifx ≥ 0, otherwise 0

AbruptChange

GradualChange

Page 25: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

25

Experiments: Synthetic DataJumping means:

Jumping variances:

IRL: Inverse Run Length [Adams and MacKay 2007] CF: ChangeFinder [Takeuchi and Yamanishi 2006] MDL1: Proposed method with independent Gaussian MDL2: Proposed method with linear regression

AUC

AUC

[Yamanishi, Miyaguchi BigData2016]

Page 26: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

26

Experiments: Real Data(Security)SQL injection symptom detection

■A time series of IP-URL counts, where each datum was the maximum # of total counts of records sent from an identical IP address to an identical URL within 15 minutes.

■Total records =8632

■MDL1 and MDL2 employ Poisson distributions

Data provided by LAC Corporation

[Yamanishi, Miyaguchi BigData2016]

Page 27: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

27

Experiments: Real Data

SQL injection Attack

Real symptom security analysts confirmed

-SQL injection symptom detection-

Detected symptom caused by gradual increase of IP-URL accounts

Page 28: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

How do you choose window size?

Page 29: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

29

Compute statisticsfor all division points in the window Determine window size

SCAW: Sequentially compute MDL change statistics with Adaptive Windowing (ADWIN) [Bifet & Gavaldà SDM07]

• If a statistics value exceeds threshold, it shrinks its window

• Cost-saving version (ADWIN2)- Narrowing down the number of division points from to

→ no need to choose window size heuristically

4.1.4. Adaptive Windowing

[Kaneko, Miyaguchi, Yamanishi BigData2017]

Page 30: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Asymptotic Reliability

[Kaneko, Miyaguchi, Yamanishi BigData2017]

• Asymptotic reliability assures:“the number of false-alarms stays finite as the data size grows when the target process does not contain any changes.”

Threshold Hyperparameter

Theorem 4.1.2

Page 31: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

・ Precision-recall plots

PHT: Page-Hinkley Test [Hinkle 70] ADWIN [Bifet & Gavaldà 07]CF: ChangeFinder [Takeuchi & Yamanishi 06]BOCPD: Bayesian online chnagepoint detection [Adams & MacKay 07] 31

SCAW achieves highest performance

Experimental Result: Synthetic Data

[Kaneko, Miyaguchi. Yamanishi BigData2017]

Page 32: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

- time series data217×325,440

32

SCAW2

S-MDL

SCAW is the better choice as a stream change detection

Data provided and evaluated by Toray Corp.

-

• Increase in the amount of an ingredient from early Apr. in 2015• A temporary stop of the boiler system on Mar. 15th in 2015

Experimental Results: Real Data ーFailure Sign Detection-

Window size

ChangeScore

Adaptive window

Fixed window

Detected signs of real failures in an industrial boiler system

Real FailureSigns of failures

[Kaneko, Miyaguchi. Yamanishi BigData2017]

Page 33: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

4.2. Model Change Detection with MDL Principle

Page 34: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Related Work・Tracking Piecewise Stationary Sources

[Shamir Merhav IEEE IT1999] [Killick, Fearnhead, Eckley JASA2012] [Davis, Yau EJS2013]

・Switching Distribution[Erven, Grunwald, Rooij JRoyalStat 2013]

・Tracking Best Experts / Derandomization[Herbster, Warmuth JML 1998] [Vovk ML99]

・Dynamic Model Selection [Yamanishi, Maruyama KDD2005, IEEE IT2007][Davis, Lee, Rodriguez JASA 2006][Hirai Yamanishi KDD2012] [Yamanishi Fukushima IEEE IT2019]

・Concept Drift[J. Gama, I. Zlibait, A. Bifet, M. Pechenizkiy, Bouchachia,ACM Survey 2013]

Page 35: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

4.2.1. MDL Model Change Statistics

M0*M2*

M1*

NML codelength for change

[Yamanishi Fukushima IEEE Inform Theory 2018]MDL-Change Statistics

NML codelength for unchange

modelparameter

Parametric Complexity

Page 36: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

[Yamanishi Fukushima IEEE Inform Theory 2018]Theorem 4.1.3

Theoretical Result on MDL-Test

Type I and II error probabilities converge exponentially to zerowhere exponents depend on parametric complexities

(False alarm prob.)

(Overlooking prob.)

MDL change statisticsMDL Test:

Page 37: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

4.2.3. Dynamic Model Selection (DMS)-Multiple model change detection-

Find a model sequence that minimizes total description length

Predictive Codelengthfor data sequence

PredictiveCodelengthfor model sequence

DMS(Dynamic Model Selection)criterion

[Yamanishi and Maruyama KDD2005, IEEE IT 2007]

Computable viaDynamic Programming

Model class

Page 38: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Probabilistic Setting of DMS

■Predictive distribution for data sequence

■モデル遷移確率

Sequentially normalized maximum likelihood code-length

■Model transition probability

Maximum Likelihood Prediction

Bayes Prediction

SNML Prediction

Page 39: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

1) Model sequence selection using dynamic programming

2)Estimating model transition prob. via Krischevsky-Trofimovestimator

DMS Algorithm

# change points needed to be M at time t

Page 40: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

■What’s Syslog?• Event sequences collected with BSD syslog protocol• Warning messages about devices

[Yamanishi, Maruyama KDD2005]

時間

Anomalyscore

Detect failures early and identify their

patterns

Application to Failure Detection from Syslog

Page 41: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

j-th session of syslog :

Syslog sessions are modeled with HMM mixtures

∑=

=K

kkjkkj PP

1)|()|( θπθ yy

where ∑ ∏ ∏−

= =+=

),...,(

1

1 111

1

)|()|()()|(jT

j j

xx

T

t

T

tttkttkkkjk xybxxaxP γθy

),...,( 1 jTxx :latent variables

Syslog Modeling with HMM Mixtures

K: #syslog behaviorpatterns

:session length

Syslogseq.

State seq..

Page 42: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Bridge Error(2002/1/10)

Systen down

SystemLock-up(2002/1/15)

33025:Jan 15 15:03:59 WARN:swsig:sw_SigGetMem: alloc failed(256)33026:Jan 15 15:03:59 WARN:swsig:sw_SigGetMem: alloc failed(256)

42253:Jan 19 22:26:33 ERR :bridge:!brdgursrv: queue is full. discarding a message.

Systemlock up(2001/11/13)

Memory Exhaustion 2001/11/20)

Memory exhaustion(2001/11/11)

http://fbi-award.jp/sentan/jusyou/2005/nec.pdf

Experiments: Failure Detection#syslog patterns changed two days before system down.

# syslog patterns

Page 43: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

43

4.2.3. Clustering Change Detection

TimeChangepoint

Changepoint

Detecting changes of number of clusters and clustering assignments

Page 44: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

DMS for Complete Variable ModelIncrementally Application of DMS to complete variable model

[Hirai Yamanishi KDD2012]

Z: latent variable…Cluster index of X

Complete variable model

Page 45: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Incremental DMS Criterion

NML codelength for Clustered data sequence

Codelength forcluster change

Slice total codelength time-wisely, then select # clusters and cluster assignment at each time

Slice time-wisely

[Hirai Yamanishi KDD2012] See also [Sun et al. KDD2007][Satoh Yamanishi ICDM2013]

Page 46: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Application to Gaussian Mixture ModelComplete variable model of Gaussian mixture model

Upper bound on NML codelength for GMM

[Hirai and Yamanishi IEEE IT 2019]

Page 47: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Tracking changes of customer structuresfrom beer transaction behavior data(QPR)

Data providedby M-Cube

Experimental Results: Real Data-Market Structure Change Detection-

Period: Nov.2011-Jan. 2012 #customers: 3185Data for each customer at t=consumption volume of 14 brands

beer during 14 days until time t

timechange change

3185

14 dim

14 days

[Hirai Yamanishi KDD2012]

Page 48: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

48

ConsumptionFrom Dec. 19th

To Jan. 1st.

Change of #clusters was detected at time when year-end demand increased vastly.

ConsumptionFrom Jan. 9th

To Jan. 22nd

Page 49: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

2012/2/1 49

平均消費量(ml) cluster 1 cluster 2 cluster 3

ビールA 184 0 117

ビールB 91 0 95

プレミアムA 108 0 80

プレミアムB 113 0 43

ビールC 0 0 126ビールD 0 0 140

第三のビールA 93 41 43第三のビールB 0 198 121第三のビールC 0 303 103第三のビールD 0 120 182

発泡酒A 0 75 48オフA 0 0 157

オフB 0 114 34

オフC 0 0 83

総購入量 589 852 1373人数(人) 598 376 311

cluster 1 cluster 2 cluster 3 cluster 4 cluster 5

84 0 131 50 229

123 0 248 0 0

153 0 174 73 0

176 0 105 0 0

0 0 146 122 0

0 0 72 192 0

101 131 130 0 0

0 34 406 0 131

0 107 112 46 236

0 202 431 0 0

0 107 87 0 0

0 0 169 138 0

0 215 74 0 0

0 0 61 83 0

637 796 2348 705 596

397 190 123 162 363

• Year-end demands of Beer A and 3rd

world Beer C rapidly increased, they led to form new additional clusters

Clustering Structure Change

Page 50: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

4.2.4. Model Change Sign Detection

k=3 k=4k=?

Model uncertainty increases

Page 51: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Problem SettingProblem setting time t

Page 52: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Structural EntropyStructural Entropy … measuring uncertainty of model selection

[Hirai Yamanishi BigData 2018]

where

Or for complete variable model

Page 53: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Model Change Sign Detection via Structural Entropy

Modeldimension

Structural uncertainty

Change sign

[Hirai Yamanishi BigData 2018]See also [Ohsawa RevSNS 2018]

Page 54: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Experimental Results: Synthetic DataChange sign can be detected by looking at rise up of structural entropy

[Hirai Yamanishi BigData2016]

Page 55: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Experimental Results: Real DataSigns of customer clustering structure changes can be detected by looking at rise up of structural entropy

Page 56: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

Summary• The MDL change statistics is a theoretically justified methodology

for measuring the change score either for parameter changes or model changes.

• For gradual change detection, apply sequential MDL statistics with adaptive/non-adaptive windowing to conduct real-time event detection.

• For multiple model change detection, conduct Dynamic Model Selection(DMS) to obtain optimal model sequences.

• For clustering structure change detection, apply DMS to latent variable models sequentially to catch up latent structure changes.

• Signs of model changes may be detected by looking at structural entropy measuring model uncertainty.

Page 57: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

References■ 4.1. MDL change statistics・J. Vreeken, M. van Leeuwen, A. Siebes: “Krimp: mining itemsets that compress,”

Data Mining and Knowledge Discovery, Vol. 23, 1, pp 169-214, 2011.・K.Yamanishi and K.Miyaguchi: “Detecting gradual changes from data stream using

MDL- change statistics,“ Proceedings of 2016 IEEE International Conference on BigData (IEEE BigData2016), pp:156-163, 2016.

・R. Kaneko, K.Miyaguchi, and K.Yamanishi:“Detecting Changes in Streaming Data with Information-Theoretic Windowing," Proceedings of 2017 IEEE International Conference on Big Data (BigData2017 ), pp: 646-655, 2017.

・B.Hooi, L.Akoglu,D.Eswaran,A.Pandey, A.Jereminov,L.Pileggi, C.Faloutsos: “ChangeDAR: Online localized change detection for sensor data on a graph,”in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp:507-516, 2018.C.f. Adaptive window algorithm

・A.Bifet and R.Gavaldà: “Learning from time-changing data with adaptive windowing,”in Proceedings of the 2007 SIAM International Conference on Data Mining, 2007.C.f. Predictive change statistics

・V.Guralnik and J.Srivastava: “Event detection from time series data,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp:33–42, 1999.

Page 58: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

References■ 4.2. Dynamic Model Selection・K.Yamanishi and Y.Maruyama: “Dynamic syslog mining for network failure

monitoring,” Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD2005), pp:499-508, 2005.

・K.Yamanishi and Y.Maruyama: “Dynamic model selection with its applications tonovelty detection,” IEEE Transactions on Information Theory, Vol. 53, NO. 6, pp:2180-2189, 2007.

・K.Yamanishi and S. Fukushima: " Model change detection with the MDL principle", IEEE Transactions on Information Theory, 64(9), pp: 6115-6126, 2018.

■ 4.2. Topics Related to Dynamic Model Selection・M.Herbster and M.Warmuth: “Tracking the best experts,” Machine Learning, 32,

pp:151–178,1998.・V. Vovk: “Derandomizing stochastic prediction strategies," Machine Learning,

vol. 35, no. 3, pp. 247-282, 1999.・J.Kleinberg: “Bursty and hierarchical structure in stream,” Data Mining and Knowledge Discovery, 7, pp:373—397, 2003.

Page 59: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

References■ 4.2. Topics Related to Dynamic Model Selection(Cont.)・R.A.Davis, T.C.M.Lee, G.A.Rofriguez-Yam: “Structural break estimation for nonstationary time series models,” Journal of American Statistical Associations, 101, pp:223-239, 2006.・X. Xuan and K. Murphy. Modeling changing dependency structure in multivariate

time series,” Proceedings of the 24th International Conference on Machine Learning, (ICML2007), pp.1055--1062, 2007.・T.Erven, P.Grunwald, and S.Rooij: “Catching up by switching sooner: a predictive approach to adaptive estimation with an application to the AIC-BIC dilemma," Jr.Royal Stat.Soc.Ser.B, vol. 74, no. Issue 3, pp. 361–417, 2012.

・R.Killick, P.Fearnhead, and I.A.Eckley: “Optimal detection of changepoints with a linear computational cost,” Journal of American Statistical Associations, 107:500, pp:1590-1598, 2012.・J. Gama, I. Zlibait, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey onconcept drift adaptation," ACM Computing Survey, 2013.・Y.Hayashi and K.Yamanishi: “Sequential network change detection with its

applications to ad impact relation analysis,“ Data Mining and Knowledge Discovery: Vol. 29, Issue 1 ,pp: 137-167, 2015.

Page 60: Modern MDL Meets Data Mining Insight, Theory, and Practiceeda.mmci.uni-saarland.de/events/mdldm19/slides/mdldm19-part4.pdf · Modern MDL Meets Data Mining Insight, Theory, and Practice

References■ 4.2.3. Clustering Change Detection・M. Song and H.Wang: “Highly efficient incremental estimation of Gaussian mixture

models for online data stream clustering,” Intelligent Computing, 2005.・J. Sun, C. Faloutsos, S.Papadimitriou,P. S. Yu: “GraphScope: parameter-free mining of large time-evolving graphs,” Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2007), pp: 687-696, 2007.

・S. Hirai and K.Yamanishi: 〝Detecting changes of clustering structures using normalized maximum likelihood coding." Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD2012), pp:343-351, 2012・S.Sato and K.Yamanishi:〝Graph partitioning change detection using tree-based

clustering," Proceedings of IEEE International Conference on Data Mining (ICDM2013),pp:1169-1174, 2013.

■ 4.2.4. Model Change Sign Detection・S. Hirai and K.Yamanishi: “Detecting Latent Structure Uncertainty with Structural

Entropy”, Proceedings of IEEE International Conference on BigData (BigData2018), Dec. 2018.

・Y. Ohsawa: “Graph-based entropy for detecting explanatory signs of changes in market,” The Review of Social Network Strategies, Vol 12, 2, pp:183-203, 2018.