![Page 1: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/1.jpg)
STOR 892 Object Oriented Data Analysis
Radial Distance Weighted Discrimination
Jie Xiong
Advised by Prof. J.S. Marron
Department of Statistics and Operations Research
UNC-Chapel Hill
![Page 2: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/2.jpg)
Outline
• Preliminaries
  – Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD)
• Data objects, which motivate the development of Radial DWD
  – An important application: ‘Virus Hunting’
  – High Dimension Low Sample Size (HDLSS) characteristics
• Radial DWD optimization
• Real data and simulation study
Outline - Preliminaries - Data Object - Radial DWD - Simulation and real data
![Page 3: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/3.jpg)
Preliminaries
• Binary classification
  – Using “training data” from Class +1 and Class -1
  – Develop a “rule” for assigning new data to a Class
  – Canonical examples include disease diagnosis based on measurements
  – Think about splitting the data space for the 2 Classes using a classification boundary
    • Simplest case: a linear hyperplane
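A minimal sketch of the simplest rule, a linear hyperplane: a new point is assigned according to the sign of $x^t w + b$. The function name `classify`, the example values of `w` and `b`, and the tie-breaking choice (a point exactly on the boundary goes to Class +1) are illustrative assumptions, not taken from the slides.

```python
def classify(x, w, b):
    """Linear rule: Class +1 if x^t w + b >= 0, otherwise Class -1.

    Sending boundary points (score exactly 0) to Class +1 is an arbitrary choice.
    """
    score = sum(xj * wj for xj, wj in zip(x, w)) + b
    return 1 if score >= 0 else -1


# Hypothetical 2-d example: the boundary is the line x1 + x2 = 1.
w = [1.0, 1.0]
b = -1.0
print(classify([2.0, 0.5], w, b))  # 1
print(classify([0.1, 0.2], w, b))  # -1
```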
![Page 4: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/4.jpg)
Optimization Viewpoint
Formulate an optimization problem based on:
• Data (feature) vectors $x_1, \dots, x_n$
• Class labels $y_i = \pm 1$
• Normal vector $w$
• Location (determines intercept) $b$
• Residuals (right side) $r_i = y_i (x_i^t w + b)$
• Residuals (wrong side) $\xi_i = -r_i$
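The right-side residual $r_i = y_i (x_i^t w + b)$ is positive exactly when $x_i$ lies on the side of the hyperplane assigned to its class. A short sketch of computing it; recording the wrong-side quantity $\xi_i = -r_i$ only for points with $r_i < 0$ (and 0 otherwise) is a convention assumed here, and the function names and data are hypothetical.

```python
def residuals(X, y, w, b):
    """r_i = y_i (x_i^t w + b); r_i > 0 means x_i is on the correct side."""
    return [yi * (sum(xj * wj for xj, wj in zip(xi, w)) + b) for xi, yi in zip(X, y)]


def wrong_side(r):
    """xi_i = -r_i for points with r_i < 0, and 0 for correctly placed points."""
    return [max(0.0, -ri) for ri in r]


# Hypothetical data: two points per class in 2-d, hyperplane x1 = 0.
X = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
r = residuals(X, y, [1.0, 0.0], 0.0)
print(r)              # [2.0, 0.0, 1.0, -0.5]
print(wrong_side(r))  # [0.0, 0.0, 0.0, 0.5]
```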
![Page 5: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/5.jpg)
![Page 6: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/6.jpg)
Preliminaries
• SVM and DWD
  – Both are binary classifiers; they separate the 2 classes using a hyperplane
  – DWD is designed for High Dimension Low Sample Size (HDLSS) data: it avoids data piling and tends to generalize better
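To make the contrast concrete: for a fixed separating hyperplane with $\|w\| = 1$, the hard-margin SVM looks only at the smallest residual $\min_i r_i$ (the margin), while the DWD criterion is the sum of inverse residuals $\sum_i 1/r_i$, so every point contributes. This sketch only evaluates the two criteria for a given hyperplane; it does not solve either optimization, and it ignores the soft-margin terms both methods use for non-separable data. The function name and data are hypothetical.

```python
import math


def svm_and_dwd_criteria(X, y, w, b):
    """Evaluate both criteria for one separating hyperplane (hard-margin view).

    w is rescaled to unit length, since both criteria are compared at ||w|| = 1.
    """
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    r = [yi * (sum(a * c for a, c in zip(xi, w)) + b) for xi, yi in zip(X, y)]
    if min(r) <= 0:
        raise ValueError("hyperplane does not separate the training data")
    svm_margin = min(r)                    # SVM maximizes the smallest residual
    dwd_value = sum(1.0 / ri for ri in r)  # DWD minimizes the sum of inverse residuals
    return svm_margin, dwd_value


X = [[1.0, 0.0], [3.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
y = [1, 1, -1, -1]
margin, dwd = svm_and_dwd_criteria(X, y, [1.0, 0.0], 0.0)
print(margin, round(dwd, 4))  # 1.0 2.6667
```

Moving the outer points (at $\pm 3$) further away leaves the SVM margin unchanged but lowers the DWD value.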
![Page 7: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/7.jpg)
Toy example: $d = 50$, $N(0,1)$ entries, mean shift $2.2$ in the first coordinate, $n_+ = n_- = 20$
Optimal Direction
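The toy setting in the figure (read from the slide residue as $d = 50$, $N(0,1)$ entries, 20 points per class, a shift of 2.2) can be simulated as follows. With identity covariance, the optimal direction is the first coordinate axis $e_1$, so the angle between $e_1$ and the sample mean-difference direction shows how much noise HDLSS sampling adds; with 49 pure-noise coordinates it is typically well away from 0. Where exactly the 2.2 shift sits is an assumption (here the class means differ by 2.2 in the first coordinate), and `simulate_toy` is a hypothetical helper.

```python
import math
import random


def simulate_toy(d=50, n=20, shift=2.2, seed=0):
    """Two Gaussian classes with identity covariance in d dimensions.

    ASSUMPTION: class means differ by `shift` in the first coordinate only
    (means +shift/2 and -shift/2), one reading of the slide's toy setup.
    """
    rng = random.Random(seed)

    def draw(mu1):
        return [[rng.gauss(mu1 if j == 0 else 0.0, 1.0) for j in range(d)]
                for _ in range(n)]

    return draw(shift / 2), draw(-shift / 2)


def mean_difference(Xp, Xn):
    d = len(Xp[0])
    return [sum(x[j] for x in Xp) / len(Xp) - sum(x[j] for x in Xn) / len(Xn)
            for j in range(d)]


Xp, Xn = simulate_toy()
md = mean_difference(Xp, Xn)
# With identity covariance the optimal (Bayes) direction is e_1 = (1, 0, ..., 0).
cos_angle = md[0] / math.sqrt(sum(c * c for c in md))
print(round(math.degrees(math.acos(cos_angle)), 1))
```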
![Page 8: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/8.jpg)
Support Vector Machine Direction
![Page 9: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/9.jpg)
Distance Weighted Discrimination
![Page 10: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/10.jpg)
Data Objects
• Introduce ‘Virus Hunting’ using DNA sequence-alignment data
  – DNA sequence and alignment
  – Data vector and the normalization used
  – HDLSS data geometry: data on simplex
![Page 11: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/11.jpg)
![Page 12: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/12.jpg)
Figure labels: virus sequence; reference (HSV-1); short reads
![Page 13: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/13.jpg)
![Page 14: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/14.jpg)
Data Objects
• Data on simplex
  – Let $d$ be the dimension of the data space
  – $(x_1, \dots, x_d)$ with non-negative entries adding up to 1 lies on the unit simplex of dimension $d-1$
  – $(1/d, \dots, 1/d)$ is the center of the unit simplex
  – $(0, \dots, 1, \dots, 0)$ is one of the vertices
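A small sketch of putting a data vector on the simplex and measuring positions. The slides do not spell out their normalization in text, so dividing counts by their total is an assumption here, and the counts are hypothetical.

```python
import math


def to_simplex(counts):
    """Normalize non-negative counts so the entries add up to 1.

    ASSUMPTION: proportion normalization; the slides do not state theirs in text.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one positive count")
    return [c / total for c in counts]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d = 4
center = [1.0 / d] * d             # (1/d, ..., 1/d)
vertex = [1.0] + [0.0] * (d - 1)   # (1, 0, ..., 0)
x = to_simplex([30, 10, 0, 0])     # hypothetical read counts
print(x)                           # [0.75, 0.25, 0.0, 0.0]
# Every vertex sits at distance sqrt(1 - 1/d) from the center.
print(math.isclose(euclidean(vertex, center), math.sqrt(1 - 1 / d)))  # True
```

All $d$ vertices are equidistant from the center, at $\sqrt{1 - 1/d}$, which tends to 1 as $d$ grows.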
![Page 15: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/15.jpg)
![Page 16: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/16.jpg)
Data Objects
• Introduce ‘Virus Hunting’ using DNA sequence-alignment data
  – DNA sequence and alignment
  – Data vector and the normalization used
  – HDLSS data geometry
    • Data points on the unit simplex
    • Position and distances
![Page 17: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/17.jpg)
![Page 18: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/18.jpg)
Data Objects
• What can we say about linear classifiers?
  – When the dimension is low, the training data may not be linearly separable
  – Under HDLSS, the training data are very often linearly separable (see Ahn and Marron 2010); however, the classification of new samples can be very poor
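A quick way to see the HDLSS separability point: when $n \le d$ and the $n$ data vectors are linearly independent (true with probability one for Gaussian data), the minimum-norm solution of $x_i^t w = y_i$ exists for any labels, so every training point gets $y_i x_i^t w = 1 > 0$. The sketch uses random labels that carry no information, so perfect training separation says nothing about new samples. The helper names and the pure-Python solver are illustrative; the intercept is fixed at $b = 0$.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def min_norm_interpolator(X, y):
    """Smallest-norm w with x_i^t w = y_i for all i: w = X^t (X X^t)^{-1} y.

    Requires n <= d and linearly independent rows of X.
    """
    n, d = len(X), len(X[0])
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    alpha = solve(gram, [float(v) for v in y])
    return [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(d)]


rng = random.Random(1)
d, n = 50, 20                                # HDLSS: d > n
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [rng.choice([-1, 1]) for _ in range(n)]  # labels carry no information
w = min_norm_interpolator(X, y)
margins = [yi * sum(a * b for a, b in zip(xi, w)) for xi, yi in zip(X, y)]
print(min(margins) > 0)  # True: the random labels are perfectly separated
```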
![Page 19: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/19.jpg)
![Page 20: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/20.jpg)
![Page 21: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/21.jpg)
What can we say about this testing sample?
![Page 22: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/22.jpg)
1. In high dimensions, visualize using the distance to the center of the sphere.
2. Inside or outside the sphere?
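The inside/outside question reduces to one number per point, its distance to the center, which can be displayed even when $d$ is large. A sketch, where placing the center $O$ at the simplex center and the radius value are assumptions for illustration:

```python
import math


def distance_to_center(x, O):
    """Euclidean distance ||x - O|| from a data point to the sphere's center."""
    return math.sqrt(sum((a - o) ** 2 for a, o in zip(x, O)))


def inside_sphere(x, O, R):
    """True if x lies strictly inside the sphere with center O and radius R."""
    return distance_to_center(x, O) < R


d = 3
O = [1.0 / d] * d  # ASSUMPTION: center placed at the simplex center
R = 0.3            # hypothetical radius
print(inside_sphere([0.4, 0.3, 0.3], O, R))  # True
print(inside_sphere([1.0, 0.0, 0.0], O, R))  # False
```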
![Page 23: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/23.jpg)
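Figure: a sphere with center $O$ and radius $R$.
• Data points: $x_1, \dots, x_i, \dots, x_n$
• Class labels: $y_1, \dots, y_i, \dots, y_n$, each $\pm 1$
• $O$: center of the sphere
• $R$: radius of the sphere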
![Page 24: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/24.jpg)
The distance of a data point $x_i$ to the center is $\|x_i - O\|$.
![Page 25: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/25.jpg)
The distance of a data point $x_i$ to the sphere is:
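The slide's formula survives only in the image. Geometrically, the distance from $x_i$ to the sphere's surface is $\big|\,\|x_i - O\| - R\,\big|$; a DWD-style signed version multiplies by the class label. Which class sits inside the sphere is not stated in the text, so the sign convention below (Class +1 outside) is an assumption, and the function name and data are hypothetical. The third point is deliberately on the wrong side.

```python
import math


def sphere_residuals(X, y, O, R):
    """Signed distances y_i * (||x_i - O|| - R) to the sphere.

    ASSUMPTION: Class +1 belongs outside the sphere and Class -1 inside, so a
    positive value means x_i is on its own class's side; the absolute value is
    the geometric distance from x_i to the sphere's surface.
    """
    out = []
    for xi, yi in zip(X, y):
        dist_center = math.sqrt(sum((a - o) ** 2 for a, o in zip(xi, O)))
        out.append(yi * (dist_center - R))
    return out


X = [[2.0, 0.0], [0.0, 0.5], [0.0, 3.0]]
y = [1, -1, -1]
print(sphere_residuals(X, y, O=[0.0, 0.0], R=1.0))  # [1.0, 0.5, -2.0]
```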
![Page 26: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/26.jpg)
The objective is to minimize the sum of the inverse distances to the sphere
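As a one-dimensional illustration of this criterion, fix the center $O$ and search over the radius $R$ only, minimizing the sum of inverse signed distances to the sphere. This is not the Radial DWD algorithm, whose optimization appears only in the slide images: the real problem also chooses $O$ and must handle non-separable data. The class-side convention (Class +1 outside), the grid search, and the distances are all assumptions.

```python
def radial_dwd_criterion(dists, y, R):
    """Sum of inverse distances to the sphere, for a fixed center and radius R.

    dists[i] = ||x_i - O||. ASSUMPTION: Class +1 outside, Class -1 inside.
    Returns infinity if R does not separate the two classes.
    """
    r = [yi * (di - R) for di, yi in zip(dists, y)]
    if min(r) <= 0:
        return float("inf")
    return sum(1.0 / ri for ri in r)


# Hypothetical distances to a fixed center O: two points inside, two outside.
dists = [0.1, 0.2, 0.5, 0.6]
y = [-1, -1, 1, 1]
grid = [0.2 + 0.01 * k for k in range(1, 30)]  # candidate radii 0.21, ..., 0.49
best_R = min(grid, key=lambda R: radial_dwd_criterion(dists, y, R))
print(round(best_R, 2))  # 0.35
```

All four points enter the criterion, not only the two closest to the boundary; with these symmetric distances the minimizer lands halfway between the classes.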
![Page 27: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/27.jpg)
Radial DWD
• Radial DWD Optimization
![Page 28: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/28.jpg)
Simulation and real data
• Real ‘Virus Hunting’ data
  – HSV1
• Simulated data using the Dirichlet distribution
  – Compare Radial DWD with some alternatives: MD, SVM, DWD, LASSO, kernel SVM
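Reading MD as the mean difference (centroid) classifier, which is an assumption about the abbreviation, the simplest of the alternatives can be sketched as follows: the normal vector is the difference of the class means, and the boundary passes through their midpoint. The data are hypothetical 3-part compositions.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def md_rule(X_pos, X_neg):
    """Mean-difference direction w and intercept b (boundary through the midpoint)."""
    mp, mn = mean(X_pos), mean(X_neg)
    w = [a - c for a, c in zip(mp, mn)]
    mid = [(a + c) / 2 for a, c in zip(mp, mn)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b


def predict(x, w, b):
    return 1 if sum(a * c for a, c in zip(x, w)) + b >= 0 else -1


X_pos = [[0.6, 0.2, 0.2], [0.8, 0.1, 0.1]]
X_neg = [[0.2, 0.4, 0.4], [0.1, 0.5, 0.4]]
w, b = md_rule(X_pos, X_neg)
print(predict([0.9, 0.05, 0.05], w, b))  # 1
print(predict([0.1, 0.45, 0.45], w, b))  # -1
```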
![Page 29: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/29.jpg)
Simulation and real data
• Real ‘Virus Hunting’ data
  – HSV1 positives in training
  – HSV1 negatives in training
  – HSV1-related samples in testing (human and other species)
  – Unrelated samples in testing
• Simulated data using the Dirichlet distribution
  – Compare Radial DWD with some alternatives: MD, SVM, DWD, LASSO, kernel SVM
![Page 30: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/30.jpg)
Figure legend: HSV1 positives in training; HSV1 negatives in training; HSV1-related samples in testing (human and other species); unrelated samples in testing
![Page 31: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/31.jpg)
![Page 32: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/32.jpg)
Simulation and real data
• Real ‘Virus Hunting’ data
  – HSV1
• Simulated data using the Dirichlet distribution
  – Dirichlet distribution: a 2-dimensional simplex case using Dirichlet$(a_1, a_2, a_3)$ with $a_1 = a_2 = a_3 = a$
  – Compare Radial DWD with some alternatives: MD, SVM, DWD, LASSO, kernel SVM
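Dirichlet$(a, a, a)$ draws can be generated by normalizing independent Gamma$(a, 1)$ variables, a standard construction; Python's `random.gammavariate` supplies the Gamma draws. The sketch checks how spread out the points are around the center $(1/3, 1/3, 1/3)$: small $a$ pushes mass toward the vertices, large $a$ concentrates it near the center. The values of $a$ and the sample size are arbitrary choices, not the settings used in the slides.

```python
import random


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw via normalized independent Gamma(alpha_k, 1) variables."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]


def mean_sq_dist_to_center(a, n=2000, seed=0):
    """Monte Carlo average of ||X - (1/3, 1/3, 1/3)||^2 for X ~ Dirichlet(a, a, a).

    The exact value is 2 / (3 (3a + 1)), from Var(X_k) = 2 / (9 (3a + 1)).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_dirichlet([a, a, a], rng)
        total += sum((xk - 1.0 / 3.0) ** 2 for xk in x)
    return total / n


for a in (0.2, 1.0, 20.0):
    print(a, round(mean_sq_dist_to_center(a), 4), round(2 / (3 * (3 * a + 1)), 4))
```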
![Page 33: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/33.jpg)
The density of Dirichlet(a,a,a)
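For reference, the Dirichlet$(a, a, a)$ density on the 2-dimensional simplex is $f(x) = \frac{\Gamma(3a)}{\Gamma(a)^3}(x_1 x_2 x_3)^{a-1}$, taken with respect to area in the $(x_1, x_2)$ coordinates with $x_3 = 1 - x_1 - x_2$. For $a = 1$ it is the constant 2 (uniform on a triangle of area 1/2); for $a < 1$ it grows toward the edges and vertices; for $a > 1$ it peaks at the center. A sketch that evaluates it (the function name is hypothetical):

```python
import math


def dirichlet_aaa_density(x, a):
    """Density of Dirichlet(a, a, a) at x = (x1, x2, x3) on the open simplex.

    f(x) = Gamma(3a) / Gamma(a)^3 * (x1 * x2 * x3)^(a - 1), computed on the log
    scale with math.lgamma for numerical stability; 0 outside the open simplex.
    """
    if any(v <= 0 for v in x) or abs(sum(x) - 1.0) > 1e-9:
        return 0.0
    log_norm = math.lgamma(3 * a) - 3 * math.lgamma(a)
    return math.exp(log_norm + (a - 1) * sum(math.log(v) for v in x))


center = [1 / 3, 1 / 3, 1 / 3]
for a in (0.5, 1.0, 5.0):
    print(a, round(dirichlet_aaa_density(center, a), 3))
```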
![Page 34: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/34.jpg)
![Page 35: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/35.jpg)
![Page 36: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/36.jpg)
![Page 37: STOR 892 Object Oriented Data Analysis](https://reader036.vdocument.in/reader036/viewer/2022081513/56813ba9550346895da4da73/html5/thumbnails/37.jpg)