Positive Unlabeled Learning for Time Series Classification
Nguyen Minh Nhut, Xiao-Li Li, See-Kiong Ng
Institute for Infocomm Research, Singapore
Outline
Introduction and Related Work
The Proposed Technique: Learning from Common Local Clusters (LCLC)
Evaluation Experiments
Conclusions
Discussion and Questions
1. Introduction
Traditional Supervised Learning
– Given a set of labeled training examples of n classes, the system uses this set to build a classifier.
– The classifier is then used to classify new examples into the n classes.
Supervised learning typically requires a large number of labeled examples:
– Human labeling can be expensive, time-consuming, and sometimes even impossible.
Unlabeled Data
Unlabeled data are usually plentiful.
Can we label only a small number of examples and make use of a large number of unlabeled examples to learn?
Unlabeled data contain information which can be used to improve the classification accuracy.
Positive-Unlabeled (PU) Learning
Positive set: a set P of examples of the class of interest.
Unlabeled set: a set U of unlabeled (mixed) examples, containing instances both from P's class and not from it (negative examples).
Build a classifier: use P and U to classify the data in U as well as future test data (illustrated below).
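As a concrete illustration, here is a minimal Python sketch (with entirely hypothetical, synthetic data) of how a PU problem arises from a fully labeled pool: only a handful of positives keep their labels, and everything else becomes U.

```python
import numpy as np

# Hypothetical setup: 100 synthetic "time series" of length 50 with
# hidden ground-truth labels y; in a real PU task y is never observed.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = rng.integers(0, 2, size=100)

pos_idx = np.flatnonzero(y == 1)
labeled = rng.choice(pos_idx, size=5, replace=False)  # known positives
P = X[labeled]                          # positive set P
U = np.delete(X, labeled, axis=0)       # unlabeled mix of pos and neg
```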
PU Learning for Time Series Classification
PU learning is applicable in a wide range of domains, such as text classification, biomedical informatics, pattern recognition, and recommendation systems.
However, the application of PU learning to time series data has been relatively unexplored because of:
– High correlation between features.
– The lack of the joint probability distribution over words and collocations that text-oriented PU methods rely on.
Existing PU Learning for Time Series Classification
Wei, L. and E. Keogh (2006). "Semi-supervised time series classification." ACM SIGKDD.
The idea: let the classifier teach itself with its own predictions (sketched below).
Unfortunately, without a good stopping criterion, the method often stops too early, resulting in high precision but low recall.
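A minimal sketch of this self-training loop, assuming a Euclidean nearest-neighbour base learner; the fixed iteration budget below merely stands in for the missing stopping criterion and is not Wei and Keogh's actual rule.

```python
import numpy as np

def self_train(P, U, n_iterations):
    """Self-training for PU time series: repeatedly move the unlabeled
    series closest to the current positive set into P."""
    P, U = [p for p in P], [u for u in U]
    for _ in range(n_iterations):       # no principled stopping rule
        if not U:
            break
        # nearest-positive distance for every unlabeled series
        dists = [min(np.linalg.norm(u - p) for p in P) for u in U]
        # the classifier "teaches itself": its most confident
        # prediction joins the positive set
        P.append(U.pop(int(np.argmin(dists))))
    return np.array(P), np.array(U)
```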
The Proposed Technique: LCLC (Learning from Common Local Clusters)
The proposed LCLC algorithm addresses two specific issues in PU learning for time series classification:
– Selecting independent and relevant features from the time series data using a cluster-based approach.
– Accurately extracting reliable positives and negatives from the given unlabeled data.
Local Clustering and Feature Selection
[Figure: the positive set P surrounded by the unlabeled local clusters U.]
Algorithm 1. Local clustering and feature selection
Input: one initial seed positive example s, unlabelled dataset U, number of clusters K
1. Use Wei's method to get an initial positive set P;
2. K-ULCs = partition U into K local clusters using K-means;
3. Select the K common principal features from the raw feature set using CLeVer-Cluster(P, K-ULCs);
Yoon, H., K. Yang, et al. (2005). "Feature subset selection and feature ranking for multivariate time series." IEEE Transactions on Knowledge and Data Engineering
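A Python sketch of Algorithm 1, assuming scikit-learn's KMeans; the pooled-variance feature ranking is only a crude stand-in for the CLeVer-Cluster selection of common principal features cited above.

```python
import numpy as np
from sklearn.cluster import KMeans

def local_cluster_and_select(P, U, K, n_features):
    """Sketch of Algorithm 1: partition U into K local clusters (the
    K-ULCs) and pick a common feature subset shared by P and the
    clusters. CLeVer ranks features via descriptive common principal
    components; pooled variance is a simple proxy here."""
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(U)
    ulcs = [U[labels == k] for k in range(K)]        # the K-ULCs
    pooled = np.vstack([P] + ulcs)                   # P plus all ULCs
    # keep the n_features columns with the highest pooled variance
    selected = np.argsort(pooled.var(axis=0))[::-1][:n_features]
    return ulcs, np.sort(selected)
```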
The cluster-based approach of the proposed LCLC method offers the following advantages:
– It is much more robust than instance-based methods for extracting the likely positives and negatives from U.
– The similarity between two time series can be effectively measured using a well-selected subset of the common principal features, which captures the underlying characteristics of both the positive and unlabeled clusters.
Extracting Reliable Negative Set
Algorithm 2. Extracting Reliable Negative Examples
Input: positive set P, K unlabeled local clusters ULCi
1. RN = ∅; AMBI = ∅;
2. For i = 1 to K
3.   Compute the distance d(ULCi, P) between local cluster ULCi and P;
4. Sort d(ULCi, P) (i = 1, 2, …, K) in decreasing order;
5. dMedian = the median of d(ULCi, P) (i = 1, 2, …, K);
6. For i = 1 to K
7.   If (d(ULCi, P) > dMedian)
8.     RN = RN ∪ ULCi;
9.   Else
10.    AMBI = AMBI ∪ ULCi;

[Figure: the local clusters around P split into reliable negative (RN) and ambiguous (AMBI) clusters.]
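A Python sketch of Algorithm 2, taking each cluster-to-P distance between centroids (one plausible choice; the slides do not fix the distance measure):

```python
import numpy as np

def extract_reliable_negatives(P, ulcs):
    """Sketch of Algorithm 2: clusters farther from P than the median
    cluster-to-P distance become reliable negatives (RN); the rest are
    kept as ambiguous clusters (AMBI)."""
    p_centroid = P.mean(axis=0)
    d = np.array([np.linalg.norm(ulc.mean(axis=0) - p_centroid)
                  for ulc in ulcs])
    d_median = np.median(d)
    RN   = [ulc for ulc, di in zip(ulcs, d) if di > d_median]
    AMBI = [ulc for ulc, di in zip(ulcs, d) if di <= d_median]
    return RN, AMBI
```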
Boundary Decision Using the Cluster Chaining Approach
Algorithm 3. Identifying likely positive clusters LP and likely negative clusters LN
Input: ambiguous clusters AMBI, positive cluster P, reliable negative cluster set RN
1. LP = ∅; LN = ∅;
2. While (AMBI ≠ ∅)
3.   Find the nearest AMBI cluster CAMBI,A to P and add CAMBI,A to cluster-chain_i;
4.   While CAMBI,A ∉ RN
5.     Find the nearest cluster CAMBI,B (from AMBI ∪ RN) to CAMBI,A and add CAMBI,B to cluster-chain_i;
6.     CAMBI,A = CAMBI,B;
7. Loop for all the cluster-chains
8.   breaking-link_i (Cm, Cm+1) = the link with maximal distance between adjacent clusters in the cluster-chain;
9.   LP = LP ∪ {AMBI clusters from the chain segment containing P};
10.  LN = LN ∪ {AMBI clusters from the chain segment not containing P};

[Figure: the decision boundary cuts each cluster chain at its longest link, separating P-side AMBI clusters from the RN side.]
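A Python sketch of Algorithm 3 over cluster centroids (again an assumption about how cluster distances are taken); the returned index lists refer to positions in the AMBI list.

```python
import numpy as np

def chain_and_label(p_cen, ambi_cens, rn_cens):
    """Sketch of Algorithm 3: grow a chain from P through
    nearest-centroid hops over the remaining AMBI and the RN clusters
    until it ends at an RN cluster, then break the chain at its longest
    link. AMBI clusters on the P side become likely positives (LP); the
    rest become likely negatives (LN). Assumes rn_cens is non-empty."""
    remaining = set(range(len(ambi_cens)))
    lp, ln = [], []
    while remaining:
        # step 3: the chain starts at the AMBI cluster nearest to P
        first = min(remaining,
                    key=lambda i: np.linalg.norm(p_cen - ambi_cens[i]))
        remaining.discard(first)
        chain = [('P', None, p_cen), ('AMBI', first, ambi_cens[first])]
        # steps 4-6: hop to the nearest unused cluster until RN is hit
        while chain[-1][0] != 'RN':
            cur = chain[-1][2]
            cands = [('AMBI', i, ambi_cens[i]) for i in remaining]
            cands += [('RN', j, c) for j, c in enumerate(rn_cens)]
            kind, idx, cen = min(
                cands, key=lambda t: np.linalg.norm(cur - t[2]))
            chain.append((kind, idx, cen))
            if kind == 'AMBI':
                remaining.discard(idx)
        # step 8: the breaking link is the longest hop along the chain
        gaps = [np.linalg.norm(a[2] - b[2])
                for a, b in zip(chain, chain[1:])]
        cut = int(np.argmax(gaps))
        # steps 9-10: AMBI clusters before the cut stay with P
        for pos, (kind, idx, _) in enumerate(chain):
            if kind == 'AMBI':
                (lp if pos <= cut else ln).append(idx)
    return lp, ln
```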
Boundary Decision Using the Cluster Chaining Approach
The cluster chaining approach:
– Minimizes the effect of possible noisy examples.
– Offers a robust solution when positive and negative examples are severely unbalanced in the unlabeled dataset U.
EMPIRICAL EVALUATION
Datasets:

Name           Training set           Testing set            Num of
               Positive   Negative    Positive   Negative    Features
ECG            208        602         312        904         86
Word Spotting  109        796         109        796         272
Wafer          381        3201        381        3201        152
Yoga           156        150         156        150         428
CBF            155        310         155        310         128
Wei, L. (2007). "Self Training dataset." http://alumni.cs.ucr.edu/~wli/selfTraining/.
Keogh, E. (2008). "The UCR Time Series Classification/Clustering Homepage." http://www.cs.ucr.edu/~eamonn/time_series_data/.
Experiment Setting
We randomly select just one seed instance from the positive class for the learning phase; the rest of the training data are treated as unlabeled data.
We build a 1-NN classifier using P together with LP as the positive training set, and RN together with LN as the negative training set (see the sketch below).
We repeat our experiments 30 times and report the average values of the 30 results.
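A minimal sketch of the resulting classifier, assuming Euclidean 1-NN and that LP, RN, and LN are lists of cluster arrays (with at least one negative cluster):

```python
import numpy as np

def build_1nn(P, LP, RN, LN):
    """Assemble the final 1-NN training set: positives are P plus the
    likely-positive clusters LP; negatives are the reliable-negative
    clusters RN plus the likely-negative clusters LN."""
    pos = np.vstack([P] + list(LP))
    neg = np.vstack(list(RN) + list(LN))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    def predict(x):
        # label of the single nearest training series (Euclidean)
        return int(y[np.argmin(np.linalg.norm(X - x, axis=1))])
    return predict
```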
Overall Performance
Method               ECG     Word Spotting   Wafer   Yoga    CBF
Wei's method         0.405   0.279           0.433   0.466   0.201
Ratana's method      0.840   0.637           0.080   0.626   0.309
LCLC w/o FS          0.631   0.608           0.637   0.808   0.599
LCLC w/o CC          0.781   0.520           0.320   0.699   0.586
LCLC                 0.867   0.727           0.724   0.854   0.701

All values are F-measures (w/o FS = without feature selection; w/o CC = without cluster chaining).
Wei, L. and E. Keogh (2006).
Ratanamahatana, C. and D. Wanichsan (2008).
Sensitivity to the Size of Local Clusters
We set the number of clusters to K = Size(U)/ULC_size, where ULC_size is the desired size of each unlabeled local cluster (for example, with |U| = 800 and ULC_size = 20, K = 40).
Conclusions
There are three key approaches that underlie LCLC's improved classification performance over existing methods:
– First, LCLC adopts a cluster-based method that is much more robust than instance-based PU learning methods.
– Second, we adopted a feature selection strategy that takes into account the characteristics of both the positive and unlabeled clusters.
– Finally, we devised a novel cluster chaining approach to extract the boundary positive and negative clusters.
Discussion and Questions