Matlab Software
Want to try similar analyses?
Matlab Available from UNC Site License
Download Software: Google "Marron Software"

Cornea Data
Cornea Data: Raw Data
Decompose Into Modes of Variation?

PCA of Cornea Data
PC1 Movie:

PCA of Cornea Data
PC1 Summary:
Mean (1st image): mild vertical astigmatism
(known population structure, called "with the rule")
Main direction: more curved & less curved
Corresponds to first optometric measure
(89% of variation, in Mean Resid. SS sense)
Also: stronger astigmatism & no astigmatism
Found correlation between astigmatism and curvature
Scores (blue): Apparent Gaussian distribution

PCA of Cornea Data
PC2 Movie:
Mean: same as above
Common center point of point cloud
Are studying directions from mean
Images along direction vector:
Looks terrible???
Why?

Outliers in PCA
Motivates alternate approach: Robust Statistical Methods
Recall main idea: Downweight (instead of delete) outliers
There is a large literature. Good intros (from different viewpoints) are:
Huber (1981)
Hampel, et al (1986)
Staudte & Sheather (1990)

Robust PCA
What is multivariate median?
There are several! (median generalizes in different ways)
i. Coordinate-wise median
Often worst
Not rotation invariant (2-d data uniform on "L")
Can lie on convex hull of data (same example)
Thus poor notion of center
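
The two complaints about the coordinate-wise median can be checked numerically. The sketch below is only an illustration under assumed toy data (points spread along an "L" made of the two positive axes; the helper names are hypothetical, not from the slides). It compares "take the median, then rotate" with "rotate, then take the median".

```python
import numpy as np

def coordinatewise_median(data):
    """Median taken separately in each coordinate (rows are data points)."""
    return np.median(data, axis=0)

def rotation(theta):
    """2-d rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Assumed toy data: points evenly spread along an "L" (the two positive axes).
t = np.linspace(0.0, 1.0, 51)
arm_x = np.column_stack([t, np.zeros_like(t)])
arm_y = np.column_stack([np.zeros_like(t), t])
data = np.vstack([arm_x, arm_y])

R = rotation(np.pi / 4)
median_then_rotate = R @ coordinatewise_median(data)
rotate_then_median = coordinatewise_median(data @ R.T)

print("median, then rotate:", median_then_rotate)
print("rotate, then median:", rotate_then_median)
```

For this toy set the unrotated median sits at the corner of the "L", which is a vertex of the convex hull, and the two orders of operation disagree, so the estimate depends on the choice of axes.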

Robust PCA
What is multivariate median (cont.)?
iii. Huber's M-estimate:
Given data $X_1, \ldots, X_n \in \mathbb{R}^d$, estimate center of population by
$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \| X_i - \theta \|_2^p$
where $\| \cdot \|_2$ is the usual Euclidean norm
Here: use only $p = 1$ (minimal impact by outliers)

Robust PCA
M-estimate (cont.):
Slide sphere around until mean (of projected data) is at center
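
A minimal sketch of the "slide the sphere" picture, assuming the M-estimate is the minimizer of the summed (unsquared) Euclidean distances, often called the spatial median. The slides do not say how it is computed; the Weiszfeld-style fixed-point iteration used here is a swapped-in standard technique, and the function names and toy data are hypothetical.

```python
import numpy as np

def l1_m_estimate(data, tol=1e-10, max_iter=1000):
    """Center minimizing the sum of Euclidean distances to the rows of data."""
    center = data.mean(axis=0)              # start from the sample mean
    for _ in range(max_iter):
        dist = np.linalg.norm(data - center, axis=1)
        dist = np.maximum(dist, 1e-12)      # guard against division by zero
        weights = 1.0 / dist
        new_center = weights @ data / weights.sum()
        if np.linalg.norm(new_center - center) < tol:
            return new_center
        center = new_center
    return center

def mean_of_projections(data, center):
    """Mean of the data after projection onto the unit sphere around center."""
    diffs = data - center
    norms = np.maximum(np.linalg.norm(diffs, axis=1, keepdims=True), 1e-12)
    return (diffs / norms).mean(axis=0)

# Assumed toy data: a Gaussian cloud plus a clump of outliers.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 2))
outliers = np.full((10, 2), 50.0)
data = np.vstack([cloud, outliers])

center = l1_m_estimate(data)
print("sample mean:", data.mean(axis=0))
print("M-estimate: ", center)
print("mean of projected data:", mean_of_projections(data, center))
```

At convergence the mean of the projected data is (numerically) the zero vector, which is the stopping condition described on the slide; the outliers pull the sample mean much further than they pull this center.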

Robust PCA
M-estimate for Cornea Data:
Sample Mean, M-estimate
Definite improvement
But outliers still have some influence
Improvement? (will suggest one soon)

Robust PCA
Approaches to Robust PCA:
1. Robust Estimation of Covariance Matrix
2. Projection Pursuit
3. Spherical PCA

Robust PCA
Robust PCA 3: Spherical PCA
Idea: use projection to sphere idea from M-estimation
In particular, project data to centered sphere
"Hot Dog" of data becomes "Ice Caps"
Easily found by PCA (on projected data)
Outliers pulled in to reduce influence
Radius of sphere unimportant

Robust PCA
Robust PCA 3: Spherical PCA
Independent Derivation & Alternate Name: PCA of Spatial Signs
(think: multivariate extension of sign test)
Complete Description: Oja, H. (2010) Multivar. Nonpar. Methods

Robust PCA
Spatial Signs
Interesting Variation: Spatial Ranks
Idea: Keep Track of Depth Via Ranks of Radii
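
The spherical PCA idea can be sketched in a few lines. This is an illustration under assumptions, not the Marron software: the robust center is taken to be the spatial median (computed by a Weiszfeld-style iteration), the sphere has radius 1, and the toy "hot dog" data and all function names are hypothetical.

```python
import numpy as np

def spatial_median(data, n_iter=500):
    """Weiszfeld iteration for the point minimizing summed Euclidean distance."""
    center = np.median(data, axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(data - center, axis=1), 1e-12)
        center = (data / dist[:, None]).sum(axis=0) / (1.0 / dist).sum()
    return center

def classical_pca(data):
    """Rows of the result are the conventional PC directions."""
    centered = data - data.mean(axis=0)
    return np.linalg.svd(centered, full_matrices=False)[2]

def spherical_pca(data):
    """Project onto a unit sphere around a robust center, then do PCA."""
    center = spatial_median(data)
    diffs = data - center
    radii = np.maximum(np.linalg.norm(diffs, axis=1, keepdims=True), 1e-12)
    signs = diffs / radii       # spatial signs: direction kept, radius dropped
    centered = signs - signs.mean(axis=0)
    return center, np.linalg.svd(centered, full_matrices=False)[2]

# Assumed toy data: a "hot dog" stretched along x, plus one wild outlier in y.
rng = np.random.default_rng(0)
hot_dog = rng.normal(size=(200, 2)) * np.array([5.0, 1.0])
data = np.vstack([hot_dog, [[0.0, 1000.0]]])

pc1_classical = classical_pca(data)[0]
center, directions = spherical_pca(data)
pc1_spherical = directions[0]
print("conventional PC1:", pc1_classical)
print("spherical PC1:   ", pc1_spherical)
```

In this toy case the single outlier swings the conventional PC1 toward the y-axis, while the spherical PC1 stays close to the long axis of the bulk of the data, because every point carries the same weight once its radius is discarded.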

Robust PCA
Spherical PCA for Toy Example:
Curve Data With an Outlier
First recall Conventional PCA

Robust PCA
Spherical PCA for Toy Example:
Now do Spherical PCA
Better result?

Robust PCA
Spherical PCA for Toy Data:
Mean looks smoother
PC1 nearly flat (unaffected by outlier)
PC2 is nearly tilt (again unaffected by outlier)
PC3 finally strongly driven by outlier
OK, since all other directions about equal in variation
Energy Plot, no longer ordered (outlier drives SS, but not directions)

Robust PCA
Spherical PCA for Toy Example:
Check out Later Components

Aside On Visualization
Another Multivariate Data Visualization Tool: Parallel Coordinates
E.g. Fisher Iris Data
d = 4
Named Variables
Curves are Data Objects (4-vectors)
(thanks to Wikipedia)

Robust PCA
Useful View: Parallel Coordinates Plot
X-axis: Zernike Coefficient Number
Y-axis: Coefficient

Robust PCA
Cornea Data, Parallel Coordinates Plot:
Top Plot: Zernike Coefficients
All n = 43 very Similar
Most Action in few Low Freq. Coeffs.

Robust PCA
Cornea Data, Parallel Coordinates Plot
Middle Plot: (Zernike Coefficients - median)
Most Variation in lowest frequencies
E.g. as in Fourier compression of smooth signals
Projecting on sphere will destroy this
By magnifying high frequency behavior
Bottom Plot: discussed later

Robust PCA
Spherical PCA Problem: Magnification of High Freq. Coeffs
Solution: Elliptical Analysis
Main idea: project data onto suitable ellipse, not sphere
Which ellipse? (in general, this is the problem that PCA solves!)
Simplification: Consider ellipses parallel to coordinate axes

Robust PCA
Figure labels: Rescale Coords, Unscale Coords, Spherical PCA

Robust PCA
Elliptical Analysis (cont.):
Simple Implementation, via coordinate axis rescaling
Divide each axis by MAD
Project Data to sphere (in transformed space)
Return to original space (multiply by original MAD) for analysis
Do PCA on Projections

Robust PCA
Elliptical Estimate of center:
Do M-estimation in transformed space (then transform back)
Results for cornea data:
Sample Mean, Spherical Center, Elliptical Center
Elliptical clearly best
Nearly no edge effect
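
The four elliptical-analysis steps (divide by MAD, project to sphere, multiply by MAD, PCA) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: MAD is taken about the coordinate-wise median, every coordinate is assumed to have nonzero MAD, the center is a spatial median found by a Weiszfeld-style iteration, and the function names and toy data are hypothetical.

```python
import numpy as np

def spatial_median(data, n_iter=500):
    """Weiszfeld iteration for the point minimizing summed Euclidean distance."""
    center = np.median(data, axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(data - center, axis=1), 1e-12)
        center = (data / dist[:, None]).sum(axis=0) / (1.0 / dist).sum()
    return center

def mad(data):
    """Median absolute deviation from the median, one value per coordinate."""
    med = np.median(data, axis=0)
    return np.median(np.abs(data - med), axis=0)

def elliptical_pca(data):
    scale = mad(data)                        # assumes every MAD is nonzero
    scaled = data / scale                    # 1. divide each axis by its MAD
    center_scaled = spatial_median(scaled)   # M-estimate in transformed space
    diffs = scaled - center_scaled
    radii = np.maximum(np.linalg.norm(diffs, axis=1, keepdims=True), 1e-12)
    on_sphere = diffs / radii                # 2. project to sphere
    on_ellipse = on_sphere * scale           # 3. multiply by the original MAD
    centered = on_ellipse - on_ellipse.mean(axis=0)
    directions = np.linalg.svd(centered, full_matrices=False)[2]  # 4. PCA
    return center_scaled * scale, directions

# Assumed toy data: coordinates on very different scales, plus one outlier.
rng = np.random.default_rng(0)
bulk = rng.normal(size=(200, 2)) * np.array([10.0, 1.0])
data = np.vstack([bulk, [[0.0, 500.0]]])

center, directions = elliptical_pca(data)
pc1_elliptical = directions[0]
pc1_classical = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)[2][0]
print("conventional PC1:", pc1_classical)
print("elliptical PC1:  ", pc1_elliptical)
```

Multiplying the projected points by the original MADs restores the relative scale of the coordinates, so a coordinate with little variation is not magnified the way it would be on a plain sphere.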

Robust PCA
Elliptical PCA for cornea data:
Original PC1, Elliptical PC1
Still finds overall curvature & correlated astigmatism
Minor edge effects almost completely gone
Original PC2, Elliptical PC2
Huge edge effects dramatically reduced
Still finds steeper superior vs. inferior

Robust PCA
Elliptical PCA for Cornea Data (cont.):
Original PC3, Elliptical PC3
Edge effects greatly diminished
But some of "against the rule" astigmatism also lost
Price paid for robustness
Original PC4, Elliptical PC4
Now looks more like variation on astigmatism???

Robust PCA
Current state of the art:
Spherical & Elliptical PCA are a kludge
Put together by Robustness Amateurs
To solve this HDLSS problem
Good News: Robustness Pros are now in the game:
Maronna, et al (2006), Sec. 6.10.2

Robust PCA
Disclaimer on robust analysis of Cornea Data:
Critical parameter is radius of analysis, [...]
[...]: Shown above, Elliptical PCA very effective
[...]: Stronger edge effects, Elliptical PCA less useful
[...]: Edge effects weaker, don't need robust PCA

Big Picture View of PCA
Above View: PCA finds optimal directions in point cloud
Maximize projected variation
Minimize residual variation
(same by Pythagorean Theorem)
Notes:
Get useful insights about data
Can compute for any point cloud
But there are other views.
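
The claim that maximizing projected variation and minimizing residual variation are the same problem can be checked numerically. The sketch below uses assumed toy data and a hypothetical helper name. For any unit direction it splits the total sum of squares of the centered data into a projected part and a residual part.

```python
import numpy as np

def projected_and_residual_ss(centered, direction):
    """Sums of squares along a direction and orthogonal to it."""
    u = direction / np.linalg.norm(direction)
    scores = centered @ u
    projected_ss = float(np.sum(scores ** 2))
    residual_ss = float(np.sum((centered - np.outer(scores, u)) ** 2))
    return projected_ss, residual_ss

# Assumed toy point cloud with unequal spread in three coordinates.
rng = np.random.default_rng(1)
data = rng.normal(size=(300, 3)) * np.array([3.0, 1.0, 0.5])
centered = data - data.mean(axis=0)
total_ss = float(np.sum(centered ** 2))

pc1 = np.linalg.svd(centered, full_matrices=False)[2][0]
pc1_projected, pc1_residual = projected_and_residual_ss(centered, pc1)

for _ in range(5):
    direction = rng.normal(size=3)
    projected, residual = projected_and_residual_ss(centered, direction)
    # projected + residual always equals total_ss (Pythagorean Theorem),
    # so a larger projected part forces a smaller residual part.
    print(np.isclose(projected + residual, total_ss),
          projected <= pc1_projected, residual >= pc1_residual)
```

Because the two parts always add up to the same total, the direction with the largest projected sum of squares (PC1) is automatically the one with the smallest residual sum of squares.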

Big Picture View of PCA
Alternate Viewpoint: Gaussian Likelihood
When data are multivariate Gaussian
PCA finds major axes of elliptical contours of Probability Density
Maximum Likelihood Estimate
Mistaken idea: PCA only useful for Gaussian data

Big Picture View of PCA
Simple check for Gaussian distribution: Standardized parallel coordinate plot
Subtract coordinate-wise median
(robust version of mean)
(not good as point cloud center, but now only looking at coordinates)
Divide by MAD / MAD(N(0,1))
(put on same scale as standard deviation)
See if data stays in range -3 to +3

Big Picture View of PCA
E.g. Cornea Data:
Standardized Parallel Coordinate Plot
Shown before
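
The standardization behind this check can be sketched as below. It is an illustration with simulated data (the cornea data are not reproduced here); the function name is hypothetical, and MAD(N(0,1)) is taken to be the 0.75 quantile of the standard normal, about 0.6745.

```python
import numpy as np

MAD_STD_NORMAL = 0.6744897501960817   # MAD of N(0,1): its 0.75 quantile

def robust_standardize(data):
    """Subtract the coordinate-wise median, divide by MAD / MAD(N(0,1))."""
    med = np.median(data, axis=0)
    mad = np.median(np.abs(data - med), axis=0)   # assumed nonzero
    return (data - med) / (mad / MAD_STD_NORMAL)

# Simulated stand-ins: one Gaussian data set, one heavy-tailed (Cauchy).
rng = np.random.default_rng(2)
gaussian = rng.normal(loc=5.0, scale=2.0, size=(500, 4))
heavy_tailed = rng.standard_cauchy(size=(500, 4))

z_gauss = robust_standardize(gaussian)
z_heavy = robust_standardize(heavy_tailed)

print("Gaussian: fraction outside [-3, 3]:", np.mean(np.abs(z_gauss) > 3))
print("Heavy tailed: largest |z|:", np.abs(z_heavy).max())
```

For Gaussian data nearly everything stays within -3 to +3 after this rescaling, while heavy-tailed data produce values many "standard deviations" out, which is the pattern the slides describe for the cornea data.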

Big Picture View of PCA
Raw Cornea Data:
Data - Median
(Data - Median) / MAD

Big Picture View of PCA
Check for Gaussian distribution: Standardized Parallel Coord. Plot
E.g. Cornea data (recall image view of data)
Several data points > 20 s.d.s from the center
Distribution clearly not Gaussian
Strong kurtosis (heavy tailed)
But PCA still gave strong insights

Big Picture View of PCA
Image View of Cornea Data:
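
Correlation PCA
A related (& better known) variation of PCA:
Replace cov. matrix with correlation matrix
I.e. do eigen analysis of
$R = D^{-1/2} \hat{\Sigma} D^{-1/2}$
Where $\hat{\Sigma}$ is the sample covariance matrix and $D = \mathrm{diag}(\hat{\Sigma})$ is the diagonal matrix of its variances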
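
A minimal sketch of correlation PCA, assuming the correlation matrix is formed by rescaling the sample covariance matrix by the inverse standard deviations on both sides. The function name and the toy data (three correlated variables in wildly different units) are assumptions for illustration.

```python
import numpy as np

def correlation_pca(data):
    """Eigen analysis of the correlation matrix (columns are variables)."""
    cov = np.cov(data, rowvar=False)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(cov)))
    corr = d_inv_sqrt @ cov @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    return corr, eigvals[order], eigvecs[:, order]

# Assumed toy data: correlated variables measured in very different units.
rng = np.random.default_rng(3)
mixing = np.array([[1.0, 0.8, 0.0],
                   [0.0, 0.6, 0.5],
                   [0.0, 0.0, 1.0]])
latent = rng.normal(size=(200, 3)) @ mixing
data = latent * np.array([1.0, 1000.0, 0.01])

corr, eigvals, eigvecs = correlation_pca(data)
corr_latent, _, _ = correlation_pca(latent)
print("eigenvalues:", eigvals)
print("unit free:", np.allclose(corr, corr_latent))
```

Changing the units of any variable leaves the correlation matrix, and hence this PCA, unchanged, whereas covariance PCA on the same data would be dominated by the variable recorded on the largest numerical scale.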

Correlation PCA
Why use correlation matrix?
Makes features unit free
e.g. Height, Weight, Age, $, ...
Are directions in point cloud meaningful or useful?
Will unimportant directions dominate?

Correlation PCA
Personal Thoughts on Correlation PCA:
Can be very useful (especially with noncommensurate units)
Not always, can hide important structure
To make choice: Decide whether whitening is useful
My personal use of correlation PCA is rare
Other people use it most of the time

Transformations
Useful Method for Data Analysts
Apply to Marginal Distributions (i.e. to Individual Variables)
Idea: Put Data on Right Scale
Common Example: Data Orders of Magnitude Different
Log10 Puts Data on More Analyzable Scale

Transformations
Example: Melanoma Data Analysis
Current Collaborators:
Nancy Thomas
Marc Niethammer
David Eberhard
Qing Feng
Heather Couture

Clinical diagnosis
Background
Introduction
March 17, 2010

Image Analysis of Histology Slides
Goal
Background
Melanoma
Image: www.melanoma.ca
Benign
1 in 75 North Americans will develop a malignant melanoma in their lifetime.
Initial goal: Automatically segment nuclei.
Challenge: Dense packing of nuclei.
Ultimately: Cancer grading and patient survival.
Image: melanoma.blogsome.com

Cutting Cell Clusters
Cell Nuclei from Histology
Segmentation
With subdivision
Without subdivision
Not perfect, but improved over standard algorithm (requires validation).
Our segmentation
Commercial seg.

Feature Extraction
Features from Cell Nuclei
Extract various features based on color and morphology
Example high-level concepts:
Stain intensity
Nuclear area
Density of nuclei
Regularity of nuclear shape

Feature Extraction
Features from Cell Nuclei
(each image below shown for: Conventional Nevus, Superficial Spreading Melanoma)
Original Image
Restained Image
Masked Region: Masks generated by pathologist
Segmented Nuclei
Labeled Nuclei
Nuclear Regions: Generated by growing nuclei out from boundary. Used for various color and density features: Region Stain 2, Region Area Ratio, etc.
Delaunay Triangulation: Triangulation of nuclear centers. Used for various density features: Mean Delaunay, Max. Delaunay, etc.

Transformations
Statistical Challenge: Some Features Spread Over Several Orders of Magnitude
Few Big Values Can Dominate Analysis

Transformations
Few Large Values Can Dominate Analysis
Terminology: Skewness
Largest Positive Values: Positive Skewness
(Thanks to Wikipedia)

Transformations
Log Scale Gives All More Appropriate Weight
Note: Orders Of Magnitude
Log10 most Interpretable

Transformations
Another Feature with Large Values
Problem For This Feature: 0 Values (no log)

Transformations
Much Nicer Distribution

Transformations
Different Direction (Negative) of Skewness

Transformations
Use Log Difference Transformation
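
The effect of these transformations on skewness can be sketched as follows. The slides do not give formulas, so everything here is an assumption made for illustration: zeros are handled by adding a small positive shift before taking log10, and the "log difference" idea for negatively skewed features is read as taking the log of the distance below the maximum (negated so that the ordering of the data is preserved). The function names, shift values and simulated features are all hypothetical.

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean cubed z-score."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def log10_shifted(x, shift=0.0):
    """log10(x + shift); a shift > 0 is needed when x contains zeros."""
    return np.log10(np.asarray(x, dtype=float) + shift)

def log10_reflected(x, shift):
    """Assumed reading of 'log difference': -log10(max(x) - x + shift)."""
    x = np.asarray(x, dtype=float)
    return -np.log10(x.max() - x + shift)

# Simulated features (not the melanoma data).
rng = np.random.default_rng(3)
right_skewed = rng.lognormal(mean=0.0, sigma=2.0, size=1000)
with_zeros = np.concatenate([right_skewed, np.zeros(50)])
left_skewed = 10.0 - rng.lognormal(mean=0.0, sigma=1.0, size=1000)

print("raw:", skewness(right_skewed),
      "log10:", skewness(log10_shifted(right_skewed)))
print("zeros, shifted log finite:",
      bool(np.all(np.isfinite(log10_shifted(with_zeros, shift=0.01)))))
print("raw:", skewness(left_skewed),
      "reflected log:", skewness(log10_reflected(left_skewed, shift=0.1)))
```

The shift acts as a tuning parameter: too small and the zeros become a far-away clump on the log scale, too large and the transformation has little effect, so in practice it would be chosen by looking at the resulting marginal distribution.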

Transformations
Useful Visualization: MargDistPlot
Idea: Visually Screen Many Marginal Distributions, Simultaneously
Approach: Order, Using Some Summary
E.g. Set of Poisson(λ) Distributions With Wide Range of Parameters (λ = 0.2, ..., 20, log spaced)

Transformations
Useful Visualization: MargDistPlot
Summary Plot
Here Using Means
Show Representative Marginal Distributions
Note: Sensible Ordering

Transformations
Useful Visualization: MargDistPlot
Change Summary To Skewness
Most / Least Symmetric

Transformations
Aside on Kurtosis: 4-th Moment Summary
Nice Graphic (from mvpprograms.com)
Positive: Big at Peak And Tails
Negative: Big in Flanks
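
The ordering step behind a MargDistPlot-style display, together with the skewness and kurtosis summaries, can be sketched without any plotting. This is not the MargDistPlot routine itself: the number of variables (12), the sample size, and the function names are assumptions, and kurtosis is reported as excess kurtosis (fourth standardized moment minus 3, so a Gaussian scores 0).

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean cubed z-score."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def order_by_summary(samples, summary):
    """Indices of the variables sorted by a one-number summary, and the values."""
    values = np.array([summary(s) for s in samples])
    return np.argsort(values), values

# Poisson variables with log-spaced parameters from 0.2 to 20 (12 is assumed).
rng = np.random.default_rng(4)
lambdas = np.logspace(np.log10(0.2), np.log10(20.0), 12)
samples = [rng.poisson(lam, size=2000) for lam in lambdas]

order_mean, means = order_by_summary(samples, np.mean)
order_skew, skews = order_by_summary(samples, skewness)
order_kurt, kurts = order_by_summary(samples, excess_kurtosis)
print("order by mean:    ", order_mean)
print("order by skewness:", order_skew)

# Sign of kurtosis: heavy peak and tails (Laplace) vs. heavy flanks (uniform).
print(excess_kurtosis(rng.laplace(size=5000)),
      excess_kurtosis(rng.uniform(size=5000)))
```

A plotting front end would then show the sorted summary values and a handful of marginal distributions picked at equally spaced positions along that ordering.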

Clusters in data
Common Statistical Task: Find Clusters in Data
Interesting sub-populations?
Important structure in data?
How to do this?
PCA provides very simple approach
There is a large literature of other methods
(will study more later)

PCA to find clusters
Toy Example (2 clusters)
Dominant direction finds very distinct clusters
Skewer through meatballs (in point cloud space)
Shows up clearly in scores plot
An important use of scores plot is finding such structure

PCA to find clusters
Recall Toy Example with more clusters:
Best revealed by 2d scatterplots (4 clusters)
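
The two-cluster "skewer through meatballs" picture can be imitated with simulated data. The sketch below is an assumed toy setup (two well separated Gaussian clusters in 10 dimensions; it is not the toy example from the slides). It computes PC1 scores and checks how well a split at zero recovers the clusters, without ever using the labels in the PCA step.

```python
import numpy as np

# Assumed toy data: two Gaussian "meatballs" separated along three coordinates.
rng = np.random.default_rng(5)
d = 10
shift = np.zeros(d)
shift[:3] = 4.0
cluster_a = rng.normal(size=(50, d)) + shift
cluster_b = rng.normal(size=(50, d)) - shift
data = np.vstack([cluster_a, cluster_b])
labels = np.array([0] * 50 + [1] * 50)   # used only to check the result

# Conventional PCA: PC1 is the direction of maximal variation.
centered = data - data.mean(axis=0)
pc1 = np.linalg.svd(centered, full_matrices=False)[2][0]
pc1_scores = centered @ pc1

# Split the scores at zero; the sign of a PC direction is arbitrary,
# so agreement is measured up to swapping the two cluster names.
guess = (pc1_scores > 0).astype(int)
agreement = max(np.mean(guess == labels), np.mean(guess != labels))
print("agreement with true clusters:", agreement)
```

This works here because the separation between clusters is also the direction of largest variation; when it is not, PCA has no reason to find the clusters.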

PCA to find clusters
Caution: there are limitations
Revealed by NCI60 data
Recall Microarray data
With 8 known different cancer types
Could separate out clusters
Using specialized DWD directions
But mostly not found using PCA
Recall only finds directions of max variation
Does not use class label information

PCA to find clusters
Specific DWD directions, for NCI60 Data:
Good Separation Of Cancer Types

PCA to find clusters
PCA directions, for NCI60 Data:
PC2: Melanoma
PC1-3: Leukemia

PCA to find clusters
Off Diagonal PC1-4 & 5-8, NCI60 Data:
PC1 & 5: Renal, Leukemia (separated)
PC2: Melanoma

PCA to find clusters
Main Lesson: PCA is limited at finding clusters
PCA does not find all 8 clusters
Recall only finds directions of max variation
Does not use class label information

PCA to find clusters
A deeper example: Mass Flux Data
Data from Enrica Bellone, National Center for Atmospheric Research
Mass Flux for quantifying cloud types
How does mass change when moving into a cloud

PCA to find clusters
PCA of Mass Flux Data:

PCA to find clusters
Summary of PCA of Mass Flux Data:
Mean: Captures General Mountain Shape
PC1: Generally overall height of peak
shows up nicely in mean +- plot (2nd col)
3 apparent clusters in scores plot
Are those really there?
If so, could lead to interesting discovery
If not, could waste effort in investigation
PC2: Location of peak
again mean +- plot very useful here
PC3: Width adjustment
again see most clearly in mean +- plot
Maybe non-linear modes of variation???

PCA to find clusters
Return to Investigation of PC1 Clusters:
Can see 3 bumps in smooth histogram
Main Question: Important structure or sampling variability?
Approach: SiZer (SIgnificance of ZERo crossings of deriv.)