Time Series Bitmap Experiments

• This file contains full-color, large-scale versions of the experiments shown in the paper, plus additional experiments that were omitted because of space constraints.

• Note that in every case, all the data is freely available.

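[Figure: dendrogram of the 30 time series. Leaf order: 1, 2, 17, 18, 27, 28, 29, 30, 3, 4, 5, 6, 21, 22, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 26, 25, 23, 24]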

The clustering achieved on 15 pairs of samples from 15 diverse datasets. The red lines in the dendrogram draw attention to objectively incorrect subtrees.

Parameters: Level 3, N = 100, n = 10
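The parameters above are not defined in this file. The following is a minimal sketch of how such a bitmap might be computed, under several assumptions: N is the sliding-window length, n is the number of SAX symbols per window, the alphabet has four letters, counts are normalized by the largest count, and bitmaps are compared by a sum of squared differences. The function names are illustrative and are not taken from the paper.

```python
import math
from itertools import product
from statistics import NormalDist, mean, pstdev

ALPHABET = "abcd"
# Breakpoints that cut a standard normal into four equiprobable regions.
BREAKPOINTS = [NormalDist().inv_cdf(q) for q in (0.25, 0.5, 0.75)]


def znorm(values):
    """Rescale to zero mean and unit variance (constant input maps to zeros)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def sax_word(window, n):
    """Reduce a window to n segment means, then map each mean to a symbol."""
    normed = znorm(window)
    size = len(normed) / n  # assumes len(window) >= n
    word = []
    for i in range(n):
        segment = normed[int(i * size):int((i + 1) * size)]
        avg = mean(segment)
        word.append(ALPHABET[sum(avg > b for b in BREAKPOINTS)])
    return "".join(word)


def bitmap(series, level=3, N=100, n=10):
    """Normalized count of every length-`level` subword over all sliding windows."""
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=level)}
    for start in range(len(series) - N + 1):
        word = sax_word(series[start:start + N], n)
        for j in range(n - level + 1):
            counts[word[j:j + level]] += 1
    peak = max(counts.values()) or 1
    return {k: c / peak for k, c in counts.items()}


def bitmap_distance(a, b):
    """Sum of squared differences between corresponding cells (assumed measure)."""
    return sum((a[k] - b[k]) ** 2 for k in a)


# Usage: a level-3 bitmap has 4**3 = 64 cells, whatever the series length.
sine = [math.sin(2 * math.pi * t / 50) for t in range(400)]
print(len(bitmap(sine)))
```

Because every series is reduced to the same fixed set of cells, two bitmaps can be compared cell by cell even when the underlying series differ in length.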

Dataset 1: Heterogeneous Data, Part 1

Data Key

1 MotorCurrent:
2 MotorCurrent:
3 Video Surveillance: Ann, gun
4 Video Surveillance: Ann, no gun
5 Video Surveillance: Eamonn, gun
6 Video Surveillance: Eamonn, no gun
7 Power Demand: Jan-March (Italian)
8 Power Demand: April-June (Italian)
9 Great Lakes (Erie)
10 Great Lakes (Ontario)
11 Buoy Sensor: North Salinity
12 Buoy Sensor: East Salinity
13 Koski ECG: slow 1
14 Koski ECG: slow 2
15 Koski ECG: fast 1
16 Koski ECG: fast 2
17 Exchange Rate: Swiss Franc
18 Exchange Rate: German Mark
19 Furnace: heating input
20 Furnace: cooling input
21 Reel 2: angular speed
22 Reel 2: tension
23 Balloon1
24 Balloon2 (lagged)
25 Evaporator: feed flow
26 Evaporator: vapor flow
27 Shuttle Inertia Sensor X
28 Shuttle Inertia Sensor X
29 Shuttle Inertia Sensor Z
30 Shuttle Inertia Sensor Z

Data is in ASCII file “time_series_bitmap_1”

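[Figure: dendrogram of the 30 time series. Leaf order: 1, 2, 29, 30, 21, 22, 5, 6, 3, 4, 9, 10, 11, 12, 17, 18, 28, 27, 7, 8, 13, 14, 15, 16, 19, 26, 20, 25, 23, 24]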

Dataset 1: Heterogeneous Data, Part 2

If we do the clustering with only level 2 information, the clustering is very slightly worse, but still quite robust considering that we are only using 1.6% of the information available in the time series.




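[Figure: dendrogram of the 30 time series. Leaf order: 1, 2, 17, 18, 27, 28, 29, 30, 9, 10, 11, 12, 21, 22, 3, 4, 5, 6, 13, 14, 15, 16, 7, 8, 19, 20, 26, 25, 23, 24]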


Dataset 1: Heterogeneous Data, Part 3

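[Figure: dendrogram of the 30 time series. Leaf order: 1, 2, 17, 18, 30, 29, 27, 28, 3, 4, 9, 10, 11, 12, 21, 22, 5, 6, 7, 8, 13, 14, 15, 16, 19, 20, 26, 25, 23, 24]

[Figure: dendrogram of the 30 time series. Leaf order: 1, 2, 17, 18, 27, 28, 29, 30, 3, 4, 5, 6, 21, 22, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 26, 25, 23, 24]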


Changing the parameters by up to 50% either way has little effect on the quality of the clustering. Here are some random examples.



Dataset 1: Heterogeneous Data, Part 4

We compared our approach to a Markov model based approach and an ARIMA based approach. For both competitors we spent one hour of human time trying to find the best parameters.



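[Figure: dendrogram of the 30 time series. Leaf order: 1, 28, 2, 17, 18, 27, 5, 6, 21, 22, 29, 30, 3, 4, 7, 8, 9, 10, 12, 11, 13, 16, 15, 14, 19, 23, 24, 26, 20, 25]

[Figure: dendrogram of the 30 time series. Leaf order: 1, 17, 18, 28, 2, 5, 6, 22, 3, 4, 10, 21, 13, 14, 15, 16, 7, 11, 9, 19, 8, 12, 20, 25, 26, 23, 24, 27, 29, 30]

[Figure: dendrogram of the 30 time series. Leaf order: 1, 2, 17, 18, 27, 28, 29, 30, 3, 4, 5, 6, 21, 22, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 26, 25, 23, 24]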


Segmental Markov model [1]

Mixtures of ARMA models [2]

[Figure: dendrogram of the 20 ECG time series. Leaf order: 1, 2, 3, 4, 5, 11, 13, 12, 14, 15, 6, 9, 10, 7, 8, 16, 18, 17, 19, 20]

Dataset 2: Homogeneous Data, Part 1

Data Key

• Cluster 1 (datasets 1-5): BIDMC Congestive Heart Failure Database (chfdb): record chf02

  Start times at 0, 82, 150, 200, 250, respectively

• Cluster 2 (datasets 6-10): BIDMC Congestive Heart Failure Database (chfdb): record chf15

• Cluster 3 (datasets 11-15): Long Term ST Database (ltstdb): record 20021

• Cluster 4 (datasets 16-20): MIT-BIH Noise Stress Test Database (nstdb): record 118e6


Here we cluster 5 randomly chosen subsections from each of 4 different ECG datasets.
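The dendrograms in this file come from hierarchical clustering of pairwise distances. The linkage method is not stated here, so the sketch below assumes average linkage and runs on a small made-up distance matrix. The function name and the numbers are illustrative only.

```python
def average_linkage(dist):
    """Agglomerative clustering on a square distance matrix.

    Returns the merges in order as (members_a, members_b, height) tuples.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                # Average distance over all cross-cluster pairs.
                height = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or height < best[0]:
                    best = (height, x, y)
        height, x, y = best
        merges.append((clusters[x], clusters[y], height))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


# Made-up distances: items 0 and 1 are close, items 2 and 3 are close.
toy = [
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [9, 8, 0, 2],
    [10, 9, 2, 0],
]
for left, right, height in average_linkage(toy):
    print(left, right, height)
```

A clustering is objectively correct in the sense used in this file when each pair, or each group of five, is merged internally before it is joined to anything else.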

Dataset 2: Homogeneous Data, Part 2

[Figure: dendrogram of the 20 ECG time series. Leaf order: 1, 5, 2, 3, 4, 11, 13, 12, 14, 15, 16, 20, 18, 19, 17, 6, 9, 10, 7, 8]

The bitmap approach is defined (and very robust) when the time series are of different lengths.

Dataset 3: MIT ECG Arrhythmia Data Part 1

In Ge and Smyth 2000, this dataset was explored with segmental hidden Markov models. After carefully adjusting the parameters, they reported 98% classification accuracy. Using time series bitmaps with virtually any parameter settings, we get perfect classification and clustering.

We can get perfect classification using one-nearest-neighbor classification, or we can project the data into two-dimensional space (see next slide) and get perfect accuracy using a simple linear classifier, a decision tree or SVD.

(Dataset donated by Padhraic Smyth and Seyoung Kim)
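One-nearest-neighbor accuracy of the kind mentioned above is commonly estimated by leave-one-out evaluation. The source does not say which protocol was used, so the sketch below assumes leave-one-out. The four short vectors stand in for flattened bitmaps, and both the vectors and the labels are invented for illustration.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_out_1nn(items, labels, distance):
    """Fraction of items whose nearest *other* item carries the same label."""
    correct = 0
    for i, item in enumerate(items):
        nearest = min(
            (j for j in range(len(items)) if j != i),
            key=lambda j: distance(item, items[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(items)


# Hypothetical stand-ins for flattened bitmaps of two classes.
toy_bitmaps = [
    (0.9, 0.1, 0.0, 0.2),
    (1.0, 0.2, 0.1, 0.1),
    (0.1, 1.0, 0.8, 0.0),
    (0.0, 0.9, 1.0, 0.1),
]
toy_labels = ["normal", "normal", "abnormal", "abnormal"]
print(leave_one_out_1nn(toy_bitmaps, toy_labels, squared_distance))
```

Any distance function over bitmaps can be passed in place of `squared_distance`; the evaluation loop does not depend on how the bitmaps were built.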

[Figure: dendrograms of the MIT ECG arrhythmia time series]

Segmental Markov model [1]

[Figure: scatter plot of points labeled 1 to 56; axis ticks from 0.35 to 0.55]


Dataset 3: MIT ECG Arrhythmia Data Part 2

Dataset 4: MotorCurrent Part 1

[Figure: dendrograms of the MotorCurrent samples]

The bitmap approach is completely phase independent, which may be useful for certain datasets.

Consider the MotorCurrent dataset (donated by Richard J. Povinelli). Here the problem is to distinguish between normal motor operations and “broken connectors”. If we attempt to cluster this dataset with Euclidean distance or DTW, the fact that the samples are out of phase confuses the algorithm (far left). The bitmap approach, however, can easily produce objectively correct clusterings.

In this problem the time series bitmaps are very similar between classes, and humans will find it hard to distinguish them. Nevertheless, there is enough information to achieve correct clusterings.
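The effect of phase on the two kinds of distance can be illustrated on a synthetic signal. The sketch below is a deliberately simplified stand-in for a bitmap: it assigns one of four symbols to every sample and counts length-2 subwords, with no sliding window or segment averaging. The signal, the shift, and all names are assumptions made for the illustration.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, pstdev

# Breakpoints that cut a standard normal into four equiprobable regions.
BREAKPOINTS = [NormalDist().inv_cdf(q) for q in (0.25, 0.5, 0.75)]


def symbols(series):
    """Map each z-normalized sample to one of four symbols."""
    mu, sigma = mean(series), pstdev(series)
    return "".join(
        "abcd"[sum((v - mu) / sigma > b for b in BREAKPOINTS)] for v in series
    )


def subword_frequencies(series, level=2):
    """Count every length-`level` subword, normalized by the largest count."""
    text = symbols(series)
    counts = Counter(text[i:i + level] for i in range(len(text) - level + 1))
    peak = max(counts.values())
    return {word: c / peak for word, c in counts.items()}


def frequency_distance(a, b):
    """Sum of squared differences over the union of observed subwords."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def euclidean(x, y):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


# Ten periods of a two-tone signal, and the same signal half a period later.
signal = [
    math.sin(2 * math.pi * (t + 0.5) / 40) + 0.3 * math.sin(2 * math.pi * (t + 0.5) / 8)
    for t in range(400)
]
rotated = signal[20:] + signal[:20]

print("Euclidean:", euclidean(signal, rotated))
print("Subword frequencies:",
      frequency_distance(subword_frequencies(signal), subword_frequencies(rotated)))
```

The point-by-point distance is large because the two copies are out of phase, while the subword counts barely change because they record which local shapes occur, not where they occur.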

Euclidean Distance

This drawing shows the correlation of muscle depolarization and ECG tracings at corresponding times.

• Phase 0 denotes ventricular depolarization. This is seen on the ECG as the beginning of the QRS complex.
• Phase 1 denotes the initial rapid repolarization due to closing of fast sodium channels. This is seen as the large drop in mV on the ECG.
• Phase 2 represents the plateau stage, during which inflow and outflow currents are balanced. The ECG returns to baseline.
• Phase 3 is repolarization. Potassium channels open and calcium channels close. The ECG shows the repolarizing T wave.
• Phase 4 is the recovery phase. Both the muscle tracing and ECG return to baseline levels.

[Figure: muscle and ECG tracings on a horizontal axis from 0 to 500, annotated with: ventricular depolarization, initial rapid repolarization, “plateau” stage, repolarization, recovery phase]

More information about the Kalpakis_ECG demonstration

[Figure: dendrogram of a subset of the Kalpakis_ECG dataset]

A clustering of a subset of the Kalpakis_ECG dataset.

Note that while ECGs have incredible variability, the 5 non-ECGs clearly stand out in the bitmap representation.

http://www.physionet.org/cgi-bin/chart?database=mitdb&record=210&annotator=atr&tstart=21&width=small

Anomaly detection: MITdb/210 dataset

[Figure: ECG trace from 0:19 to 0:27 with the corresponding anomaly score. Annotated event: fusion of ventricular and normal beat]
