Yeast Cell Cycles, Freq. 2 Proj.
PCA on Freq. 2 Periodic Component of Data
Source Batch Adj: Source Colors
Source Batch Adj: PC 1-3 & DWD Direction
Source Batch Adj: DWD Source Adjustment
NCI 60: Raw Data, Platform Colored
NCI 60: Fully Adjusted Data, Platform Colored
Matlab Software
Want to try similar analyses?
Matlab Available from UNC Site License
Download Software:
Google "Marron Software"
Choose
Download .zip File, & Expand to 3 Directories
Put these in Matlab Path
Matlab Basics
Matlab has Modalities:
Interpreted (Type Commands & Run Individually)
Batch (Run Script Files = Command Sets)
Matlab in Interpreted Mode:
For description of a function:
>> help [function name]
To Find Functions:
>> help [category name]
e.g.
>> help stats
Matlab has Modalities:
Interpreted (Type Commands)
Batch (Run Script Files)
For Serious Scientific Computing: Always Run Scripts
Matlab Script File: Just a List of Matlab Commands
Matlab Executes Them in Order
Why Bother (Why Not Just Type Commands)?
Reproducibility (Can Find Mistakes & Use Again Much Later)
Matlab Script Files
An Example:
Recall Brushing Analysis of Next Generation Sequencing Data
Functional Data Analysis
Simple 1st View: Curve Overlay (log scale)
Often Useful Population View: PCA Scores
Suggestion Of Clusters???
Suggestion Of Clusters: Which Are These?
Manually Brush Clusters
Manually Brush Clusters: Clear Alternate Splicing
Matlab Script Files
An Example:
Recall Brushing Analysis of Next Generation Sequencing Data
Analysis In Script File: VisualizeNextGen2011.m
(.m = Matlab Script File Suffix)
On Course Web Page
String of Text
Command to Display String to Screen
Notes About Data (Maximizes Reproducibility)
Have Index for Each Part of Analysis
So Keep Everything Done (Maximizes Reproducibility)
Note: Some Parts Show Graphics (Can Repeat)
Set Graphics to Default
Put Different Program Parts in IF-Block
Comment Out Currently Unused Commands
Read Data from Excel File
For Generic Functional Data Analysis:
Input Data Matrix
Structure, with Other Settings
Make Scores Scatterplot
Uses Careful Choice of Color Matrix
Start with PCA
Then Create Color Matrix
Black, Red, Blue
Run Script Using Filename as a Command
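For readers working outside Matlab, the recipe above (start with PCA, then build a color matrix with one black, red, or blue row per data object, then make the scores scatterplot) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the course script VisualizeNextGen2011.m: the function names, the power-iteration PCA, and the integer labels 0/1/2 standing in for manually brushed clusters are all hypothetical.

```python
import math

def center(X):
    """Return the rows of X minus the coordinate-wise mean, plus that mean."""
    n = len(X)
    mean = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, mean)] for row in X], mean

def covariance(Xc):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigen(S, iters=1000):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    d = len(S)
    v = [1.0 / (i + 1) for i in range(d)]          # arbitrary, non-symmetric start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                             # nothing left to explain
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * S[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v

def pca_scores(X, k=2):
    """Scores of each data object on the first k PC directions."""
    Xc, _ = center(X)
    S = covariance(Xc)
    directions = []
    for _ in range(k):
        lam, v = top_eigen(S)
        directions.append(v)
        # Deflate, so the next power iteration finds the next component
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(len(S))]
             for i in range(len(S))]
    return [[sum(a * b for a, b in zip(row, u)) for u in directions] for row in Xc]

# Color matrix: one RGB row (values in [0, 1]) per data object.
PALETTE = {0: [0.0, 0.0, 0.0],   # black: not brushed
           1: [1.0, 0.0, 0.0],   # red: first brushed cluster
           2: [0.0, 0.0, 1.0]}   # blue: second brushed cluster

def color_matrix(labels):
    return [PALETTE[label] for label in labels]

# Usage with made-up data: 5 data objects, each observed at 2 points.
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
scores = pca_scores(X, k=2)
colors = color_matrix([0, 0, 1, 1, 2])
```

Plotting is left out: the columns of `scores` give the scatterplot coordinates and the rows of `colors` give one color per plotted point.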
Cornea Data
Main Point: OODA Beyond FDA
Recall Interplay:
Object Space ↔ Descriptor Space
Cornea: Outer surface of the eye
Driver of Vision: Curvature of Cornea
Data Objects: Images on the unit disk
Radial Curvature as Heat Map
Special Thanks to K. L. Cohen, N. Tripoli, UNC Ophthalmology
Cornea Data:
Raw Data
Decompose Into Modes of Variation?
Reference: Locantore, et al (1999)
Visualization (generally true for images):
More challenging than for curves (since can't overlay)
Instead view sequence of images
Harder to see population structure (than for curves)
So PCA type decomposition of variation is more important
Nature of images (on the unit disk, not usual rectangle):
Color is curvature
Along radii of circle (direction with most effect on vision)
Hotter (red, yellow) for more curvature
Cooler (blue, green) for less curvature
Feature vector is coefficients of Zernike expansion
Zernike basis: ~ Fourier basis, on disk
Conveniently represented in polar coordinates
Data Representation - Zernike Basis
Pixels as features is large and wasteful
Natural to find more efficient representation
Polar Coordinate Tensor Product of:
Fourier basis (angular)
Special Jacobi (radial, to avoid singularities)
See: Schwiegerling, Greivenkamp & Miller (1995); Born & Wolf (1980)
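The Zernike basis functions can be evaluated directly. This sketch uses the standard unnormalized radial polynomial R_n^m(rho) (a special case of Jacobi polynomials) times cos(m·theta) for m ≥ 0 or sin(|m|·theta) for m < 0. The normalization, the sign convention for m, and the dictionary layout of the coefficients are assumptions; actual cornea analyses may order and scale coefficients differently. Rebuilding a surface from a coefficient vector illustrates the sense in which the coefficients serve as the descriptor.

```python
import math

def zernike_radial(n, m, rho):
    """Radial polynomial R_n^m(rho) on 0 <= rho <= 1 (zero when n - |m| is odd)."""
    m = abs(m)
    if m > n or (n - m) % 2:
        return 0.0
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coef = ((-1) ** k * math.factorial(n - k)
                / (math.factorial(k)
                   * math.factorial((n + m) // 2 - k)
                   * math.factorial((n - m) // 2 - k)))
        total += coef * rho ** (n - 2 * k)
    return total

def zernike(n, m, rho, theta):
    """Unnormalized Zernike function: cos(m*theta) for m >= 0, sin(|m|*theta) for m < 0."""
    angular = math.cos(m * theta) if m >= 0 else math.sin(-m * theta)
    return zernike_radial(n, m, rho) * angular

def surface(coeffs, rho, theta):
    """Sum of c * Z_n^m at one polar point; coeffs maps (n, m) -> c (hypothetical layout)."""
    return sum(c * zernike(n, m, rho, theta) for (n, m), c in coeffs.items())

# Usage: a made-up 3-coefficient descriptor evaluated at rho = 0.5, theta = 0.
value = surface({(0, 0): 1.0, (2, 0): 0.5, (2, 2): 0.2}, 0.5, 0.0)
```

Here R_2^0(rho) = 2·rho^2 - 1 and R_2^2(rho) = rho^2, so `value` is 1 - 0.25 + 0.05 = 0.8.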
Descriptor Space is Vector Space of Zernike Coefficients
So Perform PCA There
PCA of Cornea Data
Recall: PCA can find (often insightful) direction of greatest variability
Main problem: display of result (no overlays for images)
Solution: show movie of marching along the direction vector
PC1 Movie:
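One way to produce the frames of such a movie is to start at the mean and march along the PC1 direction in multiples of the PC1 standard deviation. This Python sketch assumes that convention (steps from -2 to +2 standard deviations) and finds the leading eigenvector by power iteration. Each frame is a descriptor vector (for the cornea data, a vector of Zernike coefficients) that would then be rendered as an image. All names are hypothetical.

```python
import math

def pc1(X, iters=1000):
    """Mean, leading eigenvalue, and PC1 direction of the rows of X (power iteration)."""
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, mean)] for row in X]
    S = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0 / (i + 1) for i in range(d)]          # arbitrary start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * S[i][j] * v[j] for i in range(d) for j in range(d))
    return mean, lam, v

def pc1_movie_frames(X, steps=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Frames mean + c * sd * v for each c in steps (sd = PC1 standard deviation)."""
    mean, lam, v = pc1(X)
    sd = math.sqrt(max(lam, 0.0))
    return [[m + c * sd * vi for m, vi in zip(mean, v)] for c in steps]

# Usage with made-up 2-d descriptors: the frames march along the x-axis.
frames = pc1_movie_frames([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
```

The middle frame is the mean; the sign of the eigenvector is arbitrary, so the movie may run in either direction.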
PC1 Summary:
Mean (1st image): mild vertical astigmatism
Known population structure called "with the rule"
Main direction: "more curved" & "less curved"
Corresponds to first optometric measure (89% of variation, in Mean Resid. SS sense)
Also: "stronger astigmatism" & "no astigmatism"
Found correlation between astigmatism and curvature
Scores (blue): Apparent Gaussian distribution
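Percentages like these can be read as the share of the total sum of squared residuals from the mean that each PC accounts for, which equals that PC's eigenvalue divided by the sum of all eigenvalues. Treating that as the meaning of "Mean Resid. SS sense" is an assumption. The sketch restricts to 2-d data only so the eigenvalues have a closed form; the function name is hypothetical.

```python
import math

def pc_ss_shares_2d(X):
    """Shares of the total sum of squared residuals from the mean taken by PC1 and PC2 (2-d only)."""
    n = len(X)
    mx = sum(p[0] for p in X) / n
    my = sum(p[1] for p in X) / n
    sxx = sum((p[0] - mx) ** 2 for p in X)
    syy = sum((p[1] - my) ** 2 for p in X)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in X)
    # Eigenvalues of the 2x2 scatter matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    total = sxx + syy          # trace = sum over i of ||X_i - mean||^2
    return lam1 / total, lam2 / total

# Made-up data: spread 2 units vertically, 1 unit horizontally.
shares = pc_ss_shares_2d([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
```

The eigenvalues here are 8 and 2 out of a total of 10, so `shares` is (0.8, 0.2): PC1 carries 80% of the variation in this sense.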
PC2 Movie:
Mean: same as above
Common centerpoint of point cloud
Are studying directions from mean
Images along direction vector: Looks terrible??? Why?
Reason made clear in Scores Plot (blue): Single outlying data object drives PC direction
A known problem with PCA
Recall: finds direction with "max variation", in sense of variance
Easily dominated by single large observation
Toy Example: Single Outlier Driving PCA
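The toy example can be reproduced numerically. In 2-d, the PC1 direction has the closed-form angle 0.5·atan2(2·s_xy, s_xx - s_yy), because the variance along angle a equals a constant plus R·cos(2a - phi) with phi = atan2(2·s_xy, s_xx - s_yy). This sketch (hypothetical data and names) shows a nearly horizontal point cloud whose PC1 turns nearly vertical once a single far-away point is added.

```python
import math

def pc1_angle(points):
    """Angle (radians, in (-pi/2, pi/2]) of the PC1 direction of 2-d points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Variance along angle a is c + R*cos(2a - phi), maximized at a = phi / 2
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

cloud = [(-2.0, 0.1), (-1.0, -0.1), (0.0, 0.05), (1.0, -0.05), (2.0, 0.0)]
clean_angle = pc1_angle(cloud)                     # close to 0: horizontal
outlier_angle = pc1_angle(cloud + [(0.0, 20.0)])   # close to +/- pi/2: vertical
```

One point out of six is enough to rotate PC1 by about 90 degrees, which is the behavior the PC2 cornea scores plot exposes.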
PC2 Affected by Outlier: How bad is this problem?
View 1: Statistician: Arrggghh!!!!
Outliers are very dangerous
Can give arbitrary and meaningless directions
View 2: Ophthalmologist: No Problem
Driven by "edge effects" (see raw data)
Artifact of light reflection data gathering (eyelid blocking, and drying effects)
Routinely "visually ignore" those anyway
Found interesting (& well known) direction: steeper superior vs. steeper inferior
Cornea Data:
Raw Data
Which one is the outlier?
Will say more later
PC3 Movie (ophthalmologist's view):
Edge Effect Outlier is present
But focusing on central region shows changing direction of astigmatism (3% of MR SS)
"With the rule" (vertical) vs. "against the rule" (horizontal)
Most astigmatism is "with the rule"
Most of the rest is "against the rule" (known folklore)
Continue with ophthalmologist's view:
PC4 movie version: Other direction of astigmatism??? Location (i.e. registration) effect???
Harder to interpret, but OK, since only 1.7% of MR SS
Substantially less than for PC2 & PC3
Ophthalmologist's View (cont.)
Overall Impressions / Conclusions:
Useful decomposition of population variation
Useful insight into population structure
Now return to Statistician's View:
How can we handle these outliers?
Even though not fatal here, can be for other examples
Recall Simple Toy Example (in 2-d):
Outliers in PCA
Deeper Toy Example:
Why is that an outlier?
Never leaves range of other data
But Euclidean distance to others very large relative to other distances
Also major difference in terms of shape
And even smoothness
Important lesson: many directions in
Much like earlier Parabolas Example
But with 1 outlier thrown in
PCA for Deeper Toy E.g. Data:
At first glance, mean and PC1 look similar to no-outlier version
PC2 clearly driven completely by outlier
PC2 scores plot (on right) gives clear outlier diagnostic
Outlier does not appear in other directions
Previous PC2 now appears as PC3
Total Power (upper right plot) now "spread farther"
Closer Look at Deeper Toy Example:
Mean influenced a little, by the outlier
Appearance of "corners" at every other coordinate
PC1 substantially influenced by the outlier
Clear "wiggles"
What can (should?) be done about outliers?
Context 1: Outliers are important aspects of the population
They need to be highlighted in the analysis
Although could separate into subpopulations
Context 2: Outliers are "bad data", of no interest
Recording errors? Other mistakes?
Then should avoid distorted view of PCA
Standard Statistical Approaches to Dealing with Outliers:
Outlier Deletion: Kick out "bad data"
Robust Statistical methods: Work with full data set, but downweight "bad data"
Reduce influence, instead of "deleting"
Example Cornea Data:
Can find PC2 outlier (by looking through data (careful!))
Problem: after removal, another point dominates PC2
Could delete that, but then another appears
After 4th step have eliminated 10% of data (n = 43)
Motivates alternate approach: Robust Statistical Methods
Recall main idea: Downweight (instead of delete) outliers
There is a large literature; good intros (from different viewpoints) are:
Huber (1981)
Hampel, et al (1986)
Staudte & Sheather (1990)
Simple robustness concept: breakdown point
How much of data "moved to ∞" will "destroy estimate"?
Usual mean has breakdown 0
Median has breakdown ½ (best possible)
Conclude:
Median much more robust than mean
Median uses all data
Median gets good breakdown from "equal vote"
Mean has breakdown 0
Single Outlier Pulls Mean Outside Range of Data
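The breakdown claims can be checked numerically with Python's standard `statistics` module. The data and the helper name are hypothetical, and "moved to ∞" is approximated by a huge finite value. With n = 5, moving one point ruins the mean, moving two leaves the median untouched, and moving three (a majority) finally breaks the median too.

```python
from statistics import mean, median

def move_to_far_away(data, k, far=1e6):
    """Replace the last k observations by a huge value (a stand-in for 'moved to infinity')."""
    return list(data[:len(data) - k]) + [far] * k

data = [1.0, 2.0, 3.0, 4.0, 5.0]
one_out = move_to_far_away(data, 1)     # mean jumps to 200002.0
two_out = move_to_far_away(data, 2)     # median still 3.0
three_out = move_to_far_away(data, 3)   # median now 1e6
```

One point out of n suffices for the mean (a fraction 1/n, which tends to 0), while the median needs just over half the data, matching breakdown ½.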
Controversy: Is median's "equal vote" scheme good or bad?
Huber: Outliers contain some information, so should only control "influence" (e.g. median)
Hampel, et al.: Outliers contain no useful information
Should be assigned weight 0 (not done by median)
Using "proper robust method" (not simply deleted)
Robustness Controversy (cont.):
Both are "right" (depending on context)
Source of major (unfortunately bitter) debate!
Application to Cornea data:
Huber's model more sensible
Already know some useful info in each data point
Thus "median type" methods are sensible
Robust PCA
What is multivariate median?
There are several! (median generalizes in different ways)
i. Coordinate-wise median
Often worst
Not rotation invariant (2-d data uniform on "L")
Can lie on convex hull of data (same example)
Thus poor notion of "center"
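Both defects of the coordinate-wise median show up on a discrete stand-in for the "L" example: nine evenly spaced points on two perpendicular arms, used here in place of the continuous uniform distribution (an assumption). The coordinate-wise median lands on the corner, which is a vertex of the convex hull. Rotating the data by 45°, taking the coordinate-wise median, and rotating back gives a different point, so the estimate is not rotation invariant. Names are hypothetical.

```python
import math
from statistics import median

def coord_median(points):
    """Coordinate-wise median of 2-d points."""
    return (median(p[0] for p in points), median(p[1] for p in points))

def rotate(p, a):
    """Rotate a 2-d point counterclockwise by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def rotated_coord_median(points, a):
    """Rotate data by a, take the coordinate-wise median, rotate the answer back."""
    return rotate(coord_median([rotate(p, a) for p in points]), -a)

# Discrete stand-in for data uniform on an "L": corner at the origin, arms of length 4.
L_SHAPE = [(float(t), 0.0) for t in range(5)] + [(0.0, float(t)) for t in range(1, 5)]

plain = coord_median(L_SHAPE)                        # the corner (0, 0)
turned = rotated_coord_median(L_SHAPE, math.pi / 4)  # about (1, 1)
```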
What is multivariate median (cont.)?
ii. Simplicial depth (a.k.a. data depth): Liu (1990)
"Paint Thickness" of d-dim simplices with corners at data
Nice idea
Good invariance properties
Slow to compute
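A brute-force 2-d sketch of simplicial depth: the depth of a point is the fraction of triangles, formed from all triples of data points, that contain it. The O(n^3) loop over triples shows why it is slow to compute. Restricting the search for the deepest point to the data points themselves, and the crude handling of degenerate (collinear) triangles, are simplifying assumptions; names are hypothetical.

```python
from itertools import combinations

def cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_closed_triangle(p, a, b, c):
    """True if p lies in triangle abc, boundary included.

    Degenerate (collinear) triangles are not handled carefully: a point on their
    supporting line but outside the segment is wrongly counted as inside."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def simplicial_depth(p, data):
    """Fraction of triangles with corners at data points that contain p: O(n^3) work."""
    triangles = list(combinations(data, 3))
    return sum(in_closed_triangle(p, *t) for t in triangles) / len(triangles)

def deepest_data_point(data):
    """Data point of maximal simplicial depth (search restricted to the data)."""
    return max(data, key=lambda p: simplicial_depth(p, data))

# Made-up data: corners of a square plus its center.
square = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
```

In this example the center lies in all 10 triangles (depth 1.0), while each corner lies only in the 6 triangles that use it as a vertex (depth 0.6).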
What is multivariate median (cont.)?
iii. Huber's M-estimate:
Given data $X_1, \ldots, X_n \in \mathbb{R}^d$, estimate "center of population" by
$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \lVert X_i - \theta \rVert_2^{p}$$
Where $\lVert \cdot \rVert_2$ is the usual Euclidean norm
Here: use only $p = 1$ (minimal impact by outliers)
Huber's M-estimate (cont.):
Case $p = 2$: can show $\hat{\theta} = \bar{X}$ (sample mean) (also called Fréchet Mean)
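M-estimate ($p = 1$, cont.):
A view of the minimizer: solution of
$$0 = \frac{\partial}{\partial \theta} \sum_{i=1}^{n} \lVert X_i - \theta \rVert_2 = -\sum_{i=1}^{n} \frac{X_i - \theta}{\lVert X_i - \theta \rVert_2}$$
A useful viewpoint is based on:
$$P_{\theta}(X_i) = \theta + \frac{X_i - \theta}{\lVert X_i - \theta \rVert_2}$$
= Projection of data onto sphere centered at $\theta$ with radius 1
And representation:
$$\frac{X_i - \theta}{\lVert X_i - \theta \rVert_2} = P_{\theta}(X_i) - \theta$$
Thus the solution of
$$0 = \sum_{i=1}^{n} \frac{X_i - \theta}{\lVert X_i - \theta \rVert_2}$$
is the solution of:
$$0 = \frac{1}{n} \sum_{i=1}^{n} \left( P_{\theta}(X_i) - \theta \right), \quad \text{i.e.} \quad \theta = \frac{1}{n} \sum_{i=1}^{n} P_{\theta}(X_i)$$
So $\hat{\theta}$ is the location where the projected data are centered
Slide sphere around until mean (of projected data) is at center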
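Rearranging the centering condition (the projected data average to the center) gives theta = sum(w_i X_i) / sum(w_i) with w_i = 1/||X_i - theta||, which suggests a fixed-point iteration. This sketch uses that classical Weiszfeld iteration as a stand-in; it is not claimed to be the algorithm credited to Gower (1974). It starts at the sample mean and crudely skips a data point if the iterate lands exactly on it (a careful implementation needs a modified step there). Names and data are hypothetical.

```python
import math

def geometric_median(points, iters=1000, tol=1e-10):
    """Weiszfeld-type iteration for argmin_theta sum_i ||X_i - theta|| (the p = 1 M-estimate)."""
    d = len(points[0])
    theta = [sum(p[j] for p in points) / len(points) for j in range(d)]  # start at sample mean
    for _ in range(iters):
        num, den = [0.0] * d, 0.0
        for p in points:
            dist = math.dist(p, theta)
            if dist < 1e-12:          # iterate sits on a data point: crude guard, skip it
                continue
            w = 1.0 / dist            # weight = 1 / distance
            den += w
            for j in range(d):
                num[j] += w * p[j]
        new = [x / den for x in num]
        if math.dist(new, theta) < tol:
            return new
        theta = new
    return theta

def centering_gap(points, theta):
    """Length of the average unit vector (X_i - theta)/||X_i - theta||; zero at the solution."""
    d = len(theta)
    avg = [0.0] * d
    for p in points:
        dist = math.dist(p, theta)
        for j in range(d):
            avg[j] += (p[j] - theta[j]) / dist / len(points)
    return math.hypot(*avg)

# Made-up data: four points around the origin plus one far outlier.
pts = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0), (50.0, 0.0)]
gm = geometric_median(pts)
```

The sample mean of `pts` is (10, 0), dragged far right by the single outlier, while `gm` ends near (1/√3, 0) ≈ (0.577, 0): at that point the five unit vectors sum to zero exactly.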
M-estimate ($p = 1$, cont.):
Additional literature:
Called "geometric median" (long before Huber) by: Haldane (1948)
Shown unique for $d > 1$ by: Milasevic and Ducharme (1987)
Useful iterative algorithm: Gower (1974) (see also Sec. 3.2 of Huber)
Cornea Data experience: works well for
M-estimate for Cornea Data:
Sample Mean vs. M-estimate
Definite improvement
But outliers still have some influence
Improvement? (will suggest one soon)
Now have robust measure of center, how about spread?
I.e. how can we do robust PCA?
Approaches to Robust PCA:
Robust Estimation of Covariance Matrix
Projection Pursuit
Spherical PCA
Robust PCA 1: Robust Estimation of Covariance Matrix
A. Component-wise Robust Covariances:
Major problem: Hard to get non-negative definiteness
Minimum Volume Ellipsoid: Rousseeuw & Leroy (2005)
Requires (in available software)
Needed for simple definition of affine invariant
Important Aside
Classical Approach to HDLSS (high dimension, low sample size) data: Don't have enough data for analysis, get more
Unworkable (and getting worse) for many modern settings:
Medical Imaging (e.g. Cornea Data)
Micro-arrays & gene expression
Chemometric spectra data
Robust PCA 2: Projection Pursuit
Idea: focus on finding direction of greatest variability
Reference: Li and Chen (1985)
Problems: Robust estimates of spread are nonlinear
Results in many local optima
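A 2-d sketch of the projection-pursuit idea: replace the variance by a robust spread and search for the direction that maximizes it. Using the median absolute deviation (MAD) as the robust spread is an assumption, and other robust scales would also work. An exhaustive grid over angles sidesteps the local-optimum problem in 2-d but does not scale to high dimension. Data, names, and grid size are hypothetical; the classical (variance-based) PC1 angle is computed for comparison.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median: a robust measure of spread."""
    values = list(values)
    m = median(values)
    return median(abs(v - m) for v in values)

def robust_pp_direction(points, n_angles=180, spread=mad):
    """Grid search over directions in [0, pi) for the one maximizing a robust spread of projections."""
    best_angle, best_spread = 0.0, -math.inf
    for k in range(n_angles):
        a = math.pi * k / n_angles
        proj = [p[0] * math.cos(a) + p[1] * math.sin(a) for p in points]
        s = spread(proj)
        if s > best_spread:
            best_angle, best_spread = a, s
    return best_angle, best_spread

def classical_pc1_angle(points):
    """Closed-form PC1 angle for 2-d data, for comparison (variance-based)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Made-up data: 11 points spread along the x-axis plus one far outlier straight up.
data = [(float(t), 0.1 if t % 2 == 0 else -0.1) for t in range(-5, 6)] + [(0.0, 30.0)]
robust_angle, robust_spread = robust_pp_direction(data)
classical_angle = classical_pc1_angle(data)
```

The robust direction stays horizontal (angle near 0), while the variance-based PC1 turns vertical toward the single outlier.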
Robust PCA 3:
Spherical PCA