Time Series Bitmap Experiments
• This file contains full-color, large-scale versions of the experiments shown in the paper, plus additional experiments that were omitted because of space constraints.
• Note that in every case, all the data is freely available.
[Figure: dendrogram of the 30 time series. Leaf labels as extracted: 1 2 17 18 27 28 29 30 3 4 5 6 21 22 7 8 9 10 11 12 13 14 15 16 19 20 26 25 23 24]
The clustering achieved on 15 pairs of samples from 15 diverse datasets. The red lines in the dendrogram draw attention to objectively incorrect subtrees.
Parameters: Level 3, N = 100, n = 10
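The slides list only the parameter values. As a rough sketch, assuming that Level is the subword length (so a level-L bitmap has 4^L cells), N is the sliding-window length, n is the number of SAX symbols per window, and two bitmaps are compared by the sum of squared cell differences, the pipeline could look like the following. The function names, the four-letter alphabet, and the toy signals are illustrative assumptions and are not taken from the slides.

```python
import math
import random
from bisect import bisect_right
from itertools import product

ALPHABET = "abcd"  # assumed alphabet size of 4
# Cut points that split a standard normal distribution into 4 equiprobable regions.
BREAKPOINTS = (-0.6745, 0.0, 0.6745)


def sax_word(window, n):
    """Z-normalize one window, average it into n segments, map each mean to a symbol."""
    mu = sum(window) / len(window)
    sd = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window))
    seg = len(window) / n  # segment width; assumes len(window) >= n
    symbols = []
    for i in range(n):
        chunk = window[int(i * seg):int((i + 1) * seg)]
        z_mean = (sum(chunk) / len(chunk) - mu) / sd if sd > 0 else 0.0
        symbols.append(ALPHABET[bisect_right(BREAKPOINTS, z_mean)])
    return "".join(symbols)


def bitmap(series, level, N, n):
    """Count every length-`level` subword over all sliding windows; scale so the peak is 1."""
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=level)}
    for start in range(len(series) - N + 1):
        word = sax_word(series[start:start + N], n)
        for j in range(n - level + 1):
            counts[word[j:j + level]] += 1
    peak = max(counts.values()) or 1  # guard against a series shorter than N
    return {cell: c / peak for cell, c in counts.items()}


def bitmap_distance(a, b):
    """Sum of squared differences between corresponding cells."""
    return sum((a[cell] - b[cell]) ** 2 for cell in a)


# Toy stand-ins for "a pair of samples from one dataset" and an unrelated series.
rng = random.Random(0)
pair_a = [math.sin(2 * math.pi * t / 25 + 0.4) for t in range(400)]
pair_b = [math.sin(2 * math.pi * t / 25 + 0.4) for t in range(1007, 1407)]
noise = [rng.gauss(0, 1) for _ in range(400)]

bm_pair_a, bm_pair_b, bm_noise = (
    bitmap(s, level=3, N=100, n=10) for s in (pair_a, pair_b, noise)
)
print("same source:     ", bitmap_distance(bm_pair_a, bm_pair_b))
print("different source:", bitmap_distance(bm_pair_a, bm_noise))
```

Under these assumptions, the two samples drawn from the same signal should end up much closer to each other than to the noise series, which is the property the dendrograms rely on.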
Dataset 1: Heterogeneous Data, Part 1
Data Key
1 MotorCurrent:
2 MotorCurrent:
3 Video Surveillance: Ann, gun
4 Video Surveillance: Ann, no gun
5 Video Surveillance: Eamonn, gun
6 Video Surveillance: Eamonn, no gun
7 Power Demand: Jan-March (Italian)
8 Power Demand: April-June (Italian)
9 Great Lakes (Erie)
10 Great Lakes (Ontario)
11 Buoy Sensor: North Salinity
12 Buoy Sensor: East Salinity
13 Koski ECG: slow 1
14 Koski ECG: slow 2
15 Koski ECG: fast 1
16 Koski ECG: fast 2
17 Exchange Rate: Swiss Franc
18 Exchange Rate: German Mark
19 Furnace: heating input
20 Furnace: cooling input
21 Reel 2: angular speed
22 Reel 2: tension
23 Balloon1
24 Balloon2 (lagged)
25 Evaporator: feed flow
26 Evaporator: vapor flow
27 Shuttle Inertia Sensor X
28 Shuttle Inertia Sensor X
29 Shuttle Inertia Sensor Z
30 Shuttle Inertia Sensor Z
Data is in the ASCII file “time_series_bitmap_1”.
[Figure: dendrogram of the 30 time series. Leaf labels as extracted: 1 2 29 30 21 22 5 6 3 4 9 10 11 12 17 18 28 27 7 8 13 14 15 16 19 26 20 25 23 24]
Dataset 1: Heterogeneous Data, Part 2
If we cluster using only level 2 information, the result is very slightly worse but still quite robust, considering that we are using only 1.6% of the information available in the time series.
Parameters: Level 2, N = 100, n = 10
[Figure: dendrogram of the 30 time series. Leaf labels as extracted: 1 2 17 18 27 28 29 30 9 10 11 12 21 22 3 4 5 6 13 14 15 16 7 8 19 20 26 25 23 24]
Parameters: Level 3, N = 64, n = 8
Dataset 1: Heterogeneous Data, Part 3
[Figure: dendrogram of the 30 time series. Leaf labels as extracted: 1 2 17 18 30 29 27 28 3 4 9 10 11 12 21 22 5 6 7 8 13 14 15 16 19 20 26 25 23 24]
Parameters: Level 3, N = 77, n = 11
[Figure: dendrogram of the 30 time series. Leaf labels as extracted: 1 2 17 18 27 28 29 30 3 4 5 6 21 22 7 8 9 10 11 12 13 14 15 16 19 20 26 25 23 24]
Parameters: Level 3, N = 54, n = 9
Changing the parameters by up to 50% either way has little effect on the quality of the clustering. Here are some random examples.
Dataset 1: Heterogeneous Data, Part 4
We compared our approach to a Markov-model-based approach and an ARIMA-based approach. For both competitors, we spent one hour of human time trying to find the best parameters.
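[Figure: three dendrograms of the 30 time series. Panel labels: Parameters: Level 3, N = 100, n = 10; Segmental Markov model [1]; Mixtures of ARMA models [2].]
Leaf labels as extracted, one line per dendrogram:
1 28 2 17 18 27 5 6 21 22 29 30 3 4 7 8 9 10 12 11 13 16 15 14 19 23 24 26 20 25
1 17 18 28 2 5 6 22 3 4 10 21 13 14 15 16 7 11 9 19 8 12 20 25 26 23 24 27 29 30
1 2 17 18 27 28 29 30 3 4 5 6 21 22 7 8 9 10 11 12 13 14 15 16 19 20 26 25 23 24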
[Figure: dendrogram of the 20 ECG subsections. Leaf labels as extracted: 1 2 3 4 5 11 13 12 14 15 6 9 10 7 8 16 18 17 19 20]
Dataset 2: Homogeneous Data, Part 1
Here we cluster 5 randomly chosen subsections from each of 4 different ECG datasets.
Parameters: Level 3, N = 50, n = 10
Data Key
Cluster 1 (datasets 1 ~ 5): BIDMC Congestive Heart Failure Database (chfdb): record chf02. Start times at 0, 82, 150, 200, 250, respectively.
Cluster 2 (datasets 6 ~ 10): BIDMC Congestive Heart Failure Database (chfdb): record chf15. Start times at 0, 82, 150, 200, 250, respectively.
Cluster 3 (datasets 11 ~ 15): Long Term ST Database (ltstdb): record 20021. Start times at 0, 50, 100, 150, 200, respectively.
Cluster 4 (datasets 16 ~ 20): MIT-BIH Noise Stress Test Database (nstdb): record 118e6. Start times at 0, 50, 100, 150, 200, respectively.
Dataset 2: Homogeneous Data, Part 2
[Figure: dendrogram of the 20 ECG subsections. Leaf labels as extracted: 1 5 2 3 4 11 13 12 14 15 16 20 18 19 17 6 9 10 7 8]
The bitmap approach remains well defined (and very robust) when the time series have different lengths.
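One way to see why unequal lengths need not be a problem: if each bitmap holds subword frequencies scaled by its own peak count, a longer series simply contributes more windows to the same cells. The sketch below assumes that peak scaling, a SAX alphabet of size 4, and a squared-difference distance; the helper names and the toy signals are hypothetical, and only the parameter values (Level 3, N = 50, n = 10) come from the slides.

```python
import math
import random
from bisect import bisect_right
from collections import Counter

BREAKPOINTS = (-0.6745, 0.0, 0.6745)  # quartiles of a standard normal


def bitmap(series, level, N, n):
    """Peak-normalized counts of length-`level` SAX subwords over all sliding windows."""
    counts = Counter()
    seg = N / n  # width of one averaging segment; assumes N >= n
    for start in range(len(series) - N + 1):
        w = series[start:start + N]
        mu = sum(w) / N
        sd = math.sqrt(sum((x - mu) ** 2 for x in w) / N) or 1.0
        word = ""
        for i in range(n):
            chunk = w[int(i * seg):int((i + 1) * seg)]
            word += "abcd"[bisect_right(BREAKPOINTS, (sum(chunk) / len(chunk) - mu) / sd)]
        counts.update(word[j:j + level] for j in range(n - level + 1))
    peak = max(counts.values(), default=1)
    return {cell: c / peak for cell, c in counts.items()}


def bitmap_distance(a, b):
    # Cells that never occur in one bitmap count as zero.
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def wave(length):
    return [math.sin(2 * math.pi * t / 25 + 0.4) for t in range(length)]


rng = random.Random(0)
short_wave, long_wave = wave(300), wave(700)        # same source, different lengths
noise = [rng.gauss(0, 1) for _ in range(500)]       # different source, third length

maps = [bitmap(s, level=3, N=50, n=10) for s in (short_wave, long_wave, noise)]
same_source = bitmap_distance(maps[0], maps[1])
other_source = bitmap_distance(maps[0], maps[2])
print(same_source, other_source)
```

No truncation, padding, or resampling is needed here because the comparison happens between fixed-size bitmaps rather than between the raw series.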
Dataset 3: MIT ECG Arrhythmia Data, Part 1
In Ge and Smyth 2000, this dataset was explored with segmental hidden Markov models. After carefully adjusting the parameters, they reported 98% classification accuracy. Using time series bitmaps with virtually any parameter settings, we get perfect classification and clustering.
We can get perfect classification using one-nearest-neighbor classification, or we can project the data into two-dimensional space (see next slide) and get perfect accuracy using a simple linear classifier, a decision tree or SVD.
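The one-nearest-neighbor evaluation mentioned above can be sketched as a leave-one-out loop over bitmap vectors. The slides do not say how the evaluation was run, so leave-one-out, the squared-difference distance, and every value in `toy_bitmaps` below are assumptions for illustration; the four-cell vectors merely mimic the shape of a level 1 bitmap.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length bitmap vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_1nn_accuracy(vectors, labels):
    """Classify each item by its nearest other item and report the hit rate."""
    correct = 0
    for i, query in enumerate(vectors):
        nearest = min(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: squared_distance(query, vectors[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)


# Made-up level 1 bitmaps (4 cells each) with made-up class labels.
toy_bitmaps = [
    [1.0, 0.2, 0.1, 0.9],
    [0.9, 0.3, 0.1, 1.0],
    [0.2, 1.0, 0.9, 0.1],
    [0.1, 0.9, 1.0, 0.3],
]
toy_labels = ["class A", "class A", "class B", "class B"]

accuracy = loo_1nn_accuracy(toy_bitmaps, toy_labels)
print(accuracy)
```

With real data, `toy_bitmaps` would be replaced by the flattened bitmap of each ECG instance and `toy_labels` by the annotated classes.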
(Dataset donated by Padhraic Smyth and Seyoung Kim)
[Figure: two dendrograms of the MIT ECG Arrhythmia instances. Panel labels: Parameters: Level 1, N = 60, n = 12; Segmental Markov model [1].]
Dataset 3: MIT ECG Arrhythmia Data, Part 2
[Figure: projection of the data into two-dimensional space, with points labeled 1 to 56 and axis ticks from 0.35 to 0.55.]
Parameters: Level 1, N = 60, n = 12
Dataset 4: MotorCurrent, Part 1
[Figure: two dendrograms of the MotorCurrent samples.]
The bitmap approach is completely phase independent, which may be useful for certain datasets.
Consider the MotorCurrent dataset (donated by Richard J. Povinelli). Here the problem is to distinguish between normal motor operations and “broken connectors”. If we attempt to cluster this dataset with Euclidean distance or DTW, the fact that the samples are out of phase confuses the algorithm (far left). The bitmap approach, however, can easily produce objectively correct clusterings.
In this problem the time series bitmaps are very similar between classes, and humans will find it hard to distinguish them. Nevertheless, there is enough information to achieve correct clusterings.
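The phase argument can be illustrated on a toy signal: shifting a periodic series by about half a period makes the point-by-point Euclidean distance large, while the subword frequencies that make up a bitmap barely change. This is a sketch under assumptions: the SAX alphabet of size 4, the peak scaling, the squared-difference distance, the parameter values, and the sine signal are all illustrative and are not taken from the MotorCurrent experiment.

```python
import math
from bisect import bisect_right
from collections import Counter

BREAKPOINTS = (-0.6745, 0.0, 0.6745)  # quartiles of a standard normal


def bitmap(series, level, N, n):
    """Peak-normalized counts of length-`level` SAX subwords over all sliding windows."""
    counts = Counter()
    seg = N / n  # width of one averaging segment; assumes N >= n
    for start in range(len(series) - N + 1):
        w = series[start:start + N]
        mu = sum(w) / N
        sd = math.sqrt(sum((x - mu) ** 2 for x in w) / N) or 1.0
        word = ""
        for i in range(n):
            chunk = w[int(i * seg):int((i + 1) * seg)]
            word += "abcd"[bisect_right(BREAKPOINTS, (sum(chunk) / len(chunk) - mu) / sd)]
        counts.update(word[j:j + level] for j in range(n - level + 1))
    peak = max(counts.values(), default=1)
    return {cell: c / peak for cell, c in counts.items()}


def bitmap_distance(a, b):
    # Cells that never occur in one bitmap count as zero.
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The same periodic signal, sampled in phase and roughly half a period out of phase.
signal = [math.sin(2 * math.pi * t / 25 + 0.4) for t in range(400)]
shifted = [math.sin(2 * math.pi * (t + 12) / 25 + 0.4) for t in range(400)]

euclid_gap = euclidean(signal, shifted)
bitmap_gap = bitmap_distance(
    bitmap(signal, level=2, N=100, n=10),
    bitmap(shifted, level=2, N=100, n=10),
)
print("Euclidean:", euclid_gap, "bitmap:", bitmap_gap)
```

The sliding window visits every phase of the signal in both series, so the two bitmaps collect nearly the same subword counts even though the raw samples disagree almost everywhere.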
[Figure panel label: Euclidean Distance]
This drawing shows the correlation of muscle depolarization and ECG tracings at corresponding times.
Phase 0 denotes ventricular depolarization. This is seen on the ECG as the beginning of the QRS complex.
Phase 1 denotes the initial rapid repolarization due to the closing of fast sodium channels. This is seen as the large drop in mV on the ECG.
Phase 2 represents the plateau stage, during which inflow and outflow currents are balanced. The ECG returns to baseline.
Phase 3 is repolarization. Potassium channels open and calcium channels close. The ECG shows the repolarizing T wave.
Phase 4 is the recovery phase. Both the muscle tracing and the ECG return to baseline levels.
[Figure: two traces on a horizontal axis from 0 to 500, annotated with ventricular depolarization, initial rapid repolarization, “plateau” stage, repolarization, and recovery phase.]
More information about the Kalpakis_ECG demonstration
[Figure: dendrogram] A clustering of a subset of the Kalpakis_ECG dataset.
Note that while ECGs have incredible variability, the 5 non-ECGs clearly stand out in the bitmap representation.
Anomaly detection: MITdb/210 dataset
http://www.physionet.org/cgi-bin/chart?database=mitdb&record=210&annotator=atr&tstart=21&width=small
[Figure: ECG excerpt from 0:19 to 0:27 with the annotation “Fusion of ventricular and normal beat”, together with the anomaly score.]