wRACOG: A Gibbs Sampling-Based Oversampling Technique
DESCRIPTION
This paper was presented at the International Conference on Data Mining, 2013.
TRANSCRIPT
![Page 1: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/1.jpg)
Barnan Das
School of Electrical Engineering and Computer Science
Washington State University
wRACOG: A Gibbs Sampling-Based Oversampling Technique
Barnan Das, Narayanan C. Krishnan, Diane J. Cook
![Page 2: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/2.jpg)
Imbalanced Class Distribution
![Page 3: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/3.jpg)
Automated Prompting for Older Adults
![Page 4: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/4.jpg)
Automated Prompting for Older Adults
![Page 5: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/5.jpg)
Class Distribution
• Minority class: 149 data points
• Majority class: 3831 data points
• Total number of data points: 3980
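To make the degree of imbalance concrete, the short sketch below works out the minority share and the imbalance ratio from the counts on this slide. The variable names are illustrative and are not taken from the presentation.

```python
# Class counts as given on the slide; the names are illustrative only.
minority = 149
majority = 3831
total = minority + majority

minority_share = minority / total        # fraction of minority-class points
imbalance_ratio = majority / minority    # majority points per minority point

print(f"total = {total}")
print(f"minority share = {minority_share:.2%}")
print(f"imbalance ratio = {imbalance_ratio:.1f} : 1")
```

Under these counts the minority class makes up under 4% of the data, which is the setting the rest of the talk addresses.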
![Page 6: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/6.jpg)
Solution?
Preprocessing
Sampling
• Over-sampling the minority class
• Under-sampling the majority class
Oversampling
• Spatial location of samples in Euclidean space
![Page 7: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/7.jpg)
Proposed Approach
Preprocessing technique to oversample minority class
• Approximate discrete probability distribution using Chow-Liu's algorithm
• Generate new minority class data points using Gibbs sampling
![Page 8: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/8.jpg)
Approximating Discrete Probability Distribution
Minority Class
Mutual Information Between Attributes
$I(x_i, x_j), \quad i = 1, 2, \ldots, n-1; \quad j = 2, 3, \ldots, n; \quad i < j$
Maximum-weighted Dependence Tree
Chow-Liu Dependence Tree
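The pipeline on this slide (minority class, pairwise mutual information, maximum-weighted dependence tree) can be sketched roughly as follows. This is a minimal illustration that assumes discrete attributes stored as tuples; the function names `mutual_information` and `chow_liu_tree`, the choice of Prim's algorithm, rooting the tree at attribute 0, and the toy data are all assumptions made for the example rather than details from the presentation.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information I(x_i, x_j) between two discrete columns."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_tree(data):
    """Return (parent, child) edges of a maximum-weight spanning tree over
    the attributes, grown from attribute 0 with Prim's algorithm."""
    n_attrs = len(data[0])
    # Mutual information for every attribute pair with i < j
    weight = {(i, j): mutual_information(data, i, j)
              for i, j in combinations(range(n_attrs), 2)}
    in_tree = {0}
    edges = []
    while len(in_tree) < n_attrs:
        # Pick the heaviest edge that connects the tree to a new attribute
        parent, child = max(
            ((u, v) for u in in_tree for v in range(n_attrs) if v not in in_tree),
            key=lambda e: weight[(min(e), max(e))],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges


# Toy minority-class data: attribute 1 copies attribute 0, attribute 2 is unrelated.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = chow_liu_tree(toy)
print(tree)
```

In the toy data the strongly dependent pair (attributes 0 and 1) ends up connected in the tree, which is the behaviour a maximum-weighted dependence tree is meant to capture.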
![Page 9: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/9.jpg)
Gibbs Sampling
For all attributes
Chow-Liu Dependence Tree
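As a rough illustration of resampling "for all attributes" under a dependence tree, the sketch below factorises the joint distribution as a product of P(x_i | x_parent(i)) and resamples each attribute in turn, conditioned on the current values of the others. The Laplace smoothing, the hand-written `parent` list, and all function names are assumptions for this example; the slides do not specify these details.

```python
import random
from collections import Counter


def fit_tree_model(data, parent, values, alpha=1.0):
    """Estimate P(x_i | x_parent(i)) with Laplace smoothing (alpha is an
    assumption). parent[i] is the parent of attribute i, or None for the root."""
    pair = Counter()   # (i, value_i, parent_value) -> count
    marg = Counter()   # (i, parent_value) -> count
    for row in data:
        for i, p in enumerate(parent):
            pv = None if p is None else row[p]
            pair[(i, row[i], pv)] += 1
            marg[(i, pv)] += 1

    def cond_prob(i, v, pv):
        return (pair[(i, v, pv)] + alpha) / (marg[(i, pv)] + alpha * len(values[i]))

    return cond_prob


def joint(x, parent, cond_prob):
    """Tree-factorised joint: product over attributes of P(x_i | x_parent(i))."""
    prob = 1.0
    for i, p in enumerate(parent):
        prob *= cond_prob(i, x[i], None if p is None else x[p])
    return prob


def gibbs_sweep(x, parent, values, cond_prob, rng):
    """Resample every attribute once, each conditioned on all the others."""
    x = list(x)
    for i in range(len(x)):
        # P(x_i = v | rest) is proportional to the joint with x_i set to v
        weights = []
        for v in values[i]:
            x[i] = v
            weights.append(joint(x, parent, cond_prob))
        x[i] = rng.choices(values[i], weights=weights)[0]
    return tuple(x)


minority = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
parent = [None, 0, 0]              # hypothetical dependence tree rooted at attribute 0
values = [[0, 1], [0, 1], [0, 1]]  # possible values of each attribute
cond_prob = fit_tree_model(minority, parent, values)
rng = random.Random(0)
state = minority[0]                # start the chain at an existing minority sample
chain = []
for _ in range(5):
    state = gibbs_sweep(state, parent, values, cond_prob, rng)
    chain.append(state)
print(chain)
```

Each element of `chain` is one state of a Markov chain started from a minority-class sample; which of these states are kept as new data points is a separate decision.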
![Page 10: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/10.jpg)
Gibbs Sampling
Minority Class Samples
Majority Class Samples
Markov Chains
![Page 11: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/11.jpg)
(wrapper-based) RApidly COnverging Gibbs sampler: RACOG & wRACOG
Differ in sample selection from Markov chains
RACOG:
• Based on burn-in and lag
• Stopping criterion: predefined number of iterations
• Effectiveness of new samples is not judged
wRACOG:
• Iterative training on dataset, addition of misclassified data points
• Stopping criterion: no further improvement of performance measure (TP rate)
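The two selection strategies can be contrasted with a small sketch. It is a simplified reading of the bullets above, not the authors' implementation: `step` stands for one Gibbs sweep, and `train`, `predict`, and `tp_rate` are hypothetical callbacks supplied by the caller. The exact stopping rule used by wRACOG may differ from the simple "score did not improve" test assumed here.

```python
def racog_select(initial, step, burn_in, lag, n_iterations):
    """RACOG-style selection: discard the first `burn_in` states, then keep
    every `lag`-th state until a predefined number of iterations is reached."""
    state = initial
    kept = []
    for t in range(1, n_iterations + 1):
        state = step(state)
        if t > burn_in and (t - burn_in) % lag == 0:
            kept.append(state)
    return kept


def wracog_select(chains, step, train, predict, tp_rate, minority_label,
                  max_rounds=100):
    """wRACOG-style selection (simplified): keep only generated samples the
    current model misclassifies, and stop once the TP rate stops improving."""
    added = []
    model = train(added)
    best = tp_rate(model)
    for _ in range(max_rounds):
        chains = [step(s) for s in chains]
        # Generated minority samples that the current model gets wrong
        hard = [s for s in chains if predict(model, s) != minority_label]
        candidate = train(added + hard)
        score = tp_rate(candidate)
        if score <= best:          # no further improvement: stop
            break
        added, model, best = added + hard, candidate, score
    return added


# Toy stand-ins: states are integers and the "classifier" memorises samples.
step = lambda s: s + 1
kept = racog_select(0, step, burn_in=3, lag=2, n_iterations=10)

validation = [1, 2, 3]                                  # hypothetical minority validation set
train = lambda extra: set(extra)
predict = lambda model, s: 1 if s in model else 0
tp_rate = lambda model: sum(v in model for v in validation) / len(validation)
added = wracog_select([0], step, train, predict, tp_rate, minority_label=1)
print(kept, added)
```

The RACOG variant never looks at a classifier, whereas the wrapper variant adds samples only while they keep raising the TP rate, which mirrors the distinction drawn on the slide.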
![Page 12: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/12.jpg)
Experimental Setup
Datasets
• prompting
• abalone
• car
• nursery
• letter
• connect-4
Classifiers
• C4.5 decision tree
• SVM
• k-Nearest Neighbor
• Logistic Regression
Other Methods
• SMOTE
• SMOTEBoost
• RUSBoost
![Page 13: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/13.jpg)
Results (Sensitivity)
![Page 14: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/14.jpg)
Results (G-mean)
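For readers unfamiliar with the metrics named on the results slides, the sketch below uses the standard definitions: sensitivity is the true positive rate on the minority class, and G-mean is the geometric mean of sensitivity and specificity. The confusion-matrix counts are made up for illustration and are not results from the paper.

```python
import math


def sensitivity(tp, fn):
    """True positive rate: share of minority-class points correctly detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of majority-class points correctly rejected."""
    return tn / (tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))


# Hypothetical classifier that almost always predicts the majority class.
tp, fn, tn, fp = 1, 3, 96, 0
accuracy = (tp + tn) / (tp + fn + tn + fp)
print(accuracy, sensitivity(tp, fn), g_mean(tp, fn, tn, fp))
```

With these made-up counts the accuracy is 0.97 while sensitivity is 0.25 and G-mean is 0.5, which shows why accuracy alone is a poor measure on imbalanced data.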
![Page 15: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/15.jpg)
Results (ROC)
![Page 16: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/16.jpg)
New Samples Generated
![Page 17: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/17.jpg)
Iterations of Gibbs Sampler
![Page 18: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/18.jpg)
Conclusion
• Oversampling technique to address imbalanced classes
• Takes probability distribution of minority class into account
• Performs better than other sampling methods
![Page 19: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/19.jpg)
![Page 20: wRACOG: A Gibbs Sampling-Based Oversampling Technique](https://reader036.vdocument.in/reader036/viewer/2022062418/555ce052d8b42a4f2b8b56df/html5/thumbnails/20.jpg)
Backup Slides