[Slide 1]
Effective Dimension Reduction with Prior Knowledge
Haesun Park
Division of Computational Science and Eng.
College of Computing
Georgia Institute of Technology
Atlanta, GA
Joint work with Barry Drake, Peg Howland, Hyunsoo Kim, and Cheonghee Park

DIMACS, May 2007
[Slide 2]
Dimension Reduction
• Dimension Reduction for Clustered Data: Linear Discriminant Analysis (LDA), Generalized LDA (LDA/GSVD, regularized LDA), Orthogonal Centroid Method (OCM)
• Dimension Reduction for Nonnegative Data: Nonnegative Matrix Factorization (NMF)
• Applications: Text classification, Face recognition, Fingerprint classification, Gene clustering in Microarray Analysis …
[Slide 3]
2D Representation: Utilize Cluster Structure if Known

2D representation of 150 × 1000 data with 7 clusters: LDA vs. SVD
[Slide 4]
Dimension Reduction for Clustered Data: Measure for Cluster Quality

A = [a1, …, an] : m×n, clustered data
Ni = set of items in class i, |Ni| = ni, total r classes
ci = centroid of class i, c = global centroid

Sb = ∑1≤i≤r ∑j∈Ni (ci − c)(ci − c)T
Sw = ∑1≤i≤r ∑j∈Ni (aj − ci)(aj − ci)T
St = ∑1≤i≤n (ai − c)(ai − c)T

Sw + Sb = St
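The scatter-matrix definitions above translate directly into code; a minimal NumPy sketch (the random clustered data and all variable names are illustrative, not from the talk) that also checks the identity Sw + Sb = St:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 10, 3                               # dimension, number of classes
labels = np.repeat(np.arange(r), 10)       # 10 items per class
n = labels.size
A = rng.standard_normal((m, n))            # A = [a1, ..., an], m x n

c = A.mean(axis=1, keepdims=True)          # global centroid
Sb = np.zeros((m, m))                      # between-class scatter
Sw = np.zeros((m, m))                      # within-class scatter
for i in range(r):
    Ai = A[:, labels == i]
    ci = Ai.mean(axis=1, keepdims=True)    # centroid of class i
    Sb += Ai.shape[1] * (ci - c) @ (ci - c).T
    Sw += (Ai - ci) @ (Ai - ci).T

St = (A - c) @ (A - c).T                   # total scatter
assert np.allclose(Sw + Sb, St)            # the identity Sw + Sb = St
```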
[Slide 5]
Optimal Dimension Reducing Transformation

GT : q×m maps y : m×1 to GTy : q×1, with q << m

High quality clusters have small trace(Sw) and large trace(Sb).

Want: G s.t. min trace(GT Sw G) and max trace(GT Sb G)

• max trace((GT Sw G)−1 (GT Sb G)) → LDA (Fisher ’36, Rao ’48)
• max trace(GT Sb G), GTG = I → Orthogonal Centroid (Park et al. ’03)
• max trace(GT (Sw + Sb) G), GTG = I → PCA (Pearson 1901, Hotelling ’33)
• max trace(GT A AT G), GTG = I → LSI (Deerwester et al. ’90)
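The Orthogonal Centroid solution, for instance, takes G to be an orthonormal basis of the centroid matrix (Park et al. ’03), computable by a QR decomposition; a minimal NumPy sketch (the random centroid matrix is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, r = 50, 5
C = rng.standard_normal((m, r))   # centroid matrix [c1, ..., cr] (random, illustrative)
G, _ = np.linalg.qr(C)            # reduced QR: G is m x r with orthonormal columns
# G^T maps a vector y in R^m down to q = r dimensions
y = rng.standard_normal((m, 1))
y_reduced = G.T @ y               # q x 1
```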
[Slide 6]
Classical LDA (Fisher ’36, Rao ’48)

max trace((GT Sw G)−1 (GT Sb G))

• G : leading (r − 1) eigenvectors of Sw−1Sb
• Fails when m > n (undersampled), since Sw is singular
• Sb = Hb HbT, Hb = [√n1 (c1 − c), …, √nr (cr − c)] : m×r
• Sw = Hw HwT, Hw = [a1 − c1, a2 − c1, …, an − cr] : m×n
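A minimal NumPy sketch of classical LDA in the oversampled case (n > m, so Sw is nonsingular), built from the Hb and Hw factors above; the data generation and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 5, 3                                  # oversampled: n > m, so Sw is nonsingular
labels = np.repeat(np.arange(r), 25)         # 25 items per class
# each class gets its own random mean shift so the classes are separable
A = np.hstack([rng.standard_normal((m, 25)) + 3.0 * rng.standard_normal((m, 1))
               for _ in range(r)])

c = A.mean(axis=1, keepdims=True)
Hb = np.hstack([np.sqrt((labels == i).sum())
                * (A[:, labels == i].mean(axis=1, keepdims=True) - c)
                for i in range(r)])          # m x r
Hw = np.hstack([A[:, labels == i]
                - A[:, labels == i].mean(axis=1, keepdims=True)
                for i in range(r)])          # m x n
Sb, Sw = Hb @ Hb.T, Hw @ Hw.T

# G = leading (r-1) eigenvectors of Sw^{-1} Sb
evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
G = evecs[:, np.argsort(-evals.real)[:r - 1]].real   # m x (r-1)
```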
[Slide 7]
LDA based on GSVD (LDA/GSVD) (Howland, Jeon, Park, SIMAX ’03; Howland and Park, IEEE TPAMI ’04)

Sw−1Sb x = λx  ⇔  Sb x = λ Sw x  ⇔  δ2 Hb HbT x = β2 Hw HwT x

GSVD of the pair (HbT, HwT):

UT HbT X = ( Σb 0 ),  VT HwT X = ( Σw 0 )

so that

XT Sb X = diag( I, Db, 0, 0 ),  XT Sw X = diag( 0, Dw, I, 0 )

since XT Hb HbT X = XT Sb X and XT Hw HwT X = XT Sw X.

Classical LDA is a special case of LDA/GSVD.
[Slide 8]
Generalization of LDA for Undersampled Problems
• Regularized LDA (Friedman ’89, Zhao et al. ’99, …)
• LDA/GSVD: solution G = [X1 X2] (Howland, Jeon, Park ’03)
• Solutions based on Null(Sw) and Range(Sb), … (Chen et al. ’00, Yu & Yang ’01, Park & Park ’03, …)
• Two-stage methods:
  - Face Recognition: PCA + LDA (Swets & Weng ’96, Zhao et al. ’99)
  - Information Retrieval: LSI + LDA (Torkkola ’01)
• Mathematical equivalence (Howland and Park ’03):
  PCA + LDA/GSVD = LDA/GSVD
  LSI + LDA/GSVD = LDA/GSVD
  More efficient: QRD + LDA/GSVD
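Regularized LDA is the simplest of these fixes: replacing Sw by Sw + λI makes the inverse exist even when m > n. A minimal NumPy sketch (random undersampled data; the value of λ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
m, r = 100, 3                              # undersampled: m > n, so Sw is singular
labels = np.repeat(np.arange(r), 10)       # n = 30 items total
n = labels.size
A = rng.standard_normal((m, n))

c = A.mean(axis=1, keepdims=True)
Sb = np.zeros((m, m))
Sw = np.zeros((m, m))
for i in range(r):
    Ai = A[:, labels == i]
    ci = Ai.mean(axis=1, keepdims=True)
    Sb += Ai.shape[1] * (ci - c) @ (ci - c).T
    Sw += (Ai - ci) @ (Ai - ci).T

lam = 1e-3                                 # regularization parameter (arbitrary choice)
# leading (r-1) eigenvectors of (Sw + lam*I)^{-1} Sb
evals, evecs = np.linalg.eig(np.linalg.solve(Sw + lam * np.eye(m), Sb))
G = evecs[:, np.argsort(-evals.real)[:r - 1]].real   # m x (r-1)
```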
[Slide 9]
QRD Preprocessing in Dim. Reduction (Distance Preserving Dim. Reduction)

For undersampled data A : m×n, m >> n:

A = [Q1 Q2] [R; 0] = Q1 R,  Q1 : orthonormal basis for span(A)

Dimension reduction of A by Q1T: Q1T A = R : n×n

Q1T preserves distances in the L2 norm:

|| ai ||2 = || Q1T ai ||2
|| ai − aj ||2 = || Q1T (ai − aj) ||2

and in the cosine distance: cos(ai, aj) = cos(Q1T ai, Q1T aj)

Applicable to PCA, LDA, LDA/GSVD, Isomap, LTSA, LLE, …
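The distance-preservation claims are easy to verify numerically; a minimal NumPy sketch (random tall matrix and column indices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 1000, 40                            # undersampled: m >> n
A = rng.standard_normal((m, n))

Q1, R = np.linalg.qr(A)                    # reduced QR: Q1 is m x n, R is n x n
B = Q1.T @ A                               # n x n reduced representation (equals R)

i, j = 3, 17
# L2 norms and pairwise distances are preserved
assert np.isclose(np.linalg.norm(A[:, i]), np.linalg.norm(B[:, i]))
assert np.isclose(np.linalg.norm(A[:, i] - A[:, j]),
                  np.linalg.norm(B[:, i] - B[:, j]))
# cosine similarity is preserved as well
cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
assert np.isclose(cos(A[:, i], A[:, j]), cos(B[:, i], B[:, j]))
```

The preservation holds because every column of A lies in span(Q1), so applying Q1T loses nothing about the data while shrinking the working dimension from m to n.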
[Slide 10]
Speed Up with QRD Preprocessing (computation time)

| Data | Dim | # r | LDA/GSVD | regLDA | QR + LDA/GSVD | QR + regLDA |
|---|---|---|---|---|---|---|
| Text | 5896 × 210 | 7 | 48.8 | 42.2 | 0.14 | 0.03 |
| Yale | 77760 × 165 | 15 | -- | -- | 0.96 | 0.22 |
| AT&T | 10304 × 400 | 40 | -- | -- | 0.07 | 0.02 |
| Feret | 3000 × 130 | 10 | 10.9 | 9.3 | 0.03 | |
| OptDigit | 64 × 5610 | 10 | 8.97 | 9.60 | 0.02 | |
| Isolet | 617 × 7797 | 26 | 98.1 | 99.33 | 6.70 | |
[Slide 11]
Text Classification with Dim. Reduction (Kim, Howland, Park, JMLR03)

Classification accuracy (%). Similarity measures: L2 norm and Cosine.

Medline data (1250 items, 5 clusters):

| Methods | Full | OCM | LDA/GSVD |
|---|---|---|---|
| Dim | 22095 | 5 | 4 |
| centroid (L2) | 84.8 | 84.8 | 88.9 |
| centroid (Cosine) | 88.0 | 88.0 | 83.9 |
| 15nn (L2) | 83.4 | 88.2 | 89.0 |
| 15nn (Cosine) | 82.3 | 88.3 | 83.9 |
| 30nn (L2) | 83.9 | 88.6 | 89.0 |
| 30nn (Cosine) | 83.5 | 88.4 | 83.9 |
| SVM | 88.9 | 88.7 | 87.2 |

Reuters data (9579 items, 90 clusters):

| Methods | Full | OCM |
|---|---|---|
| Dim | 11941 | 90 |
| | 78.89 | 78.00 |
| | 80.45 | 80.46 |
| | 78.65 | 85.51 |
| | 80.21 | 86.19 |
| | 87.11 | 87.03 |
[Slide 12]
Face Recognition on Yale Data (C. Park and H. Park, ICDM ’04)

Yale Face Database: 243 × 320 pixels = full dimension of 77760; 11 images/person × 15 people = 165 images. After preprocessing (avg 3×3): 8586 × 165.

Prediction accuracy in %, leave-one-out (and, in parentheses, average of 100 random splits):

| Dim. Red. Method | Dim | kNN k=1 | k=5 | k=9 |
|---|---|---|---|---|
| Full Space | 8586 | 79.4 | 76.4 | 72.1 |
| LDA/GSVD | 14 | 98.8 (90) | 98.8 | 98.8 |
| Regularized LDA | 14 | 97.6 (85) | 97.6 | 97.6 |
| Proj. to null(Sw) (Chen et al., ’00) | 14 | 97.6 (84) | 97.6 | 97.6 |
| Transf. to range(Sb) (Yu & Yang, ’01) | 14 | 89.7 (82) | 94.6 | 91.5 |
[Slide 13]
Fingerprint Classification Results on NIST Fingerprint Database 4 (C. Park and H. Park, Pattern Recognition, 2005)

4000 fingerprint images of size 512 × 512. By KDA/GSVD, dimension reduced from 105 × 105 to 4.

KDA/GSVD: nonlinear extension of LDA/GSVD based on kernel functions.

| Rejection rate (%) | 0 | 1.8 | 8.5 |
|---|---|---|---|
| KDA/GSVD | 90.7 | 91.3 | 92.8 |
| kNN & NN (Jain et al., ’99) | -- | 90.0 | 91.2 |
| SVM (Yao et al., ’03) | -- | 90.0 | 92.2 |
[Slide 14]
Nonnegativity Preserving Dim. Reduction: Nonnegative Matrix Factorization (Paatero & Tapper 94, Lee & Seung NATURE 99, Pauca et al. SIAM DM 04, Hoyer 04, Lin 05, Berry 06, Kim and Park 06, …)

Given A : m×n with A ≥ 0 and k << min(m, n), find W : m×k and H : k×n with W ≥ 0 and H ≥ 0 s.t.

min || A − WH ||F,  A ≈ WH

NMF/ANLS: two-block coordinate descent method in bound-constrained optimization. Iterate the following ANLS (Kim and Park, Bioinformatics, to appear):

• fixing W, solve min H≥0 || WH − A ||F
• fixing H, solve min W≥0 || HT WT − AT ||F

Any limit point is a stationary point (Grippo and Sciandrone 00).
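The two alternating subproblems can be sketched with SciPy's nonnegative least squares solver (problem sizes, seed, and iteration count are illustrative; this naive column-by-column/row-by-row implementation is a didactic stand-in, not the authors' optimized algorithm):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(5)
m, n, k = 40, 25, 3
A = rng.random((m, k)) @ rng.random((k, n))   # zero-residual problem: A = W* H*

W = rng.random((m, k))
H = np.zeros((k, n))
for _ in range(50):
    # fix W, solve min_{H>=0} ||W H - A||_F one column of H at a time
    for j in range(n):
        H[:, j], _ = nnls(W, A[:, j])
    # fix H, solve min_{W>=0} ||H^T W^T - A^T||_F one row of W at a time
    for i in range(m):
        W[i, :], _ = nnls(H.T, A[i, :])

rel_residual = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Because each subproblem is solved exactly, the objective is nonincreasing across iterations, which is what makes the stationarity guarantee of two-block coordinate descent apply.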
[Slide 15]
Nonnegativity Constraints? Better Approximation vs. Better Representation/Interpretation

Given A : m×n and k < min(m, n):

SVD: best approximation. min || A − WH ||F; A = UΣVT, A ≈ Uk Σk VkT
NMF: better representation/interpretation? min || A − WH ||F, W ≥ 0, H ≥ 0

Nonnegativity constraints are physically meaningful: pixels in digital images, molecule concentrations in bioinformatics, signal intensities, visualization, …

Interpretation of analysis results: nonsubtractive combinations of nonnegative basis vectors.
[Slide 16]
Performance of NMF Algorithms
The relative residuals vs. the number of iterations for NMF/ANLS, NMF/MUR, and NMF/ALS on a zero-residual artificial problem, A : 200×50
[Slide 17]
Recovery of Factors by SVD and NMF
A : 2500×28, W : 2500×3, H : 3×28, where A = W·H. Recovery of the factors W and H by SVD and NMF/ANLS.
[Slide 18]
Summary

Effective algorithms for dimension reduction and matrix decompositions that exploit prior knowledge:

• Design of new algorithms, e.g. for undersampled data
• Taking advantage of prior knowledge for physically more meaningful modeling
• Storage and efficiency issues for massive scale data
• Adaptive algorithms

Applicable to a wide range of problems (text classification, face recognition, fingerprint classification, gene class discovery in microarray data, protein secondary structure prediction, …)
Thank you!