reduc&on - ucsd dse mas · pdf filewhy reduce? manage visual complexity 1. derive data 2....
TRANSCRIPT
Reduc&on
DataVisualiza&onAmitChourasia
Whyreduce?Managevisualcomplexity1. Derivedata2. Viewchange3. Facet4. Reduce
1. Filter:reduceitems/aGributes(beware“outofsight,outofmindno&on”)2. Aggrega&on:standinsummariza&on(precludesdetails,howandwhatshouldbe
aggregated)5. Dimensionalityreduc&on
Reducing Items and Attributes
FilterItems
Attributes
Aggregate
Items
Attributes
Figures by Eamonn Maguire
Reduce:DesignChoice–AGributeFilteringEliminatenumberofelements• Itemfiltering-Rangeselec&on Filterauthor:SeanConnery
Filter&tlelength:60-269
System FilmFinder
What:Data Table:ninevalueaGributes.
How:Encode ScaGerplot;detailviewwithtext/images.
How:Facet Mul&form,overview–detail.
How:Reduce ItemfilteringFigures by Ahlberg and Shneiderman. “FilmFinder”
ScentedwidgetsAugmentstandardwidgetswithaddi&onalcuestohelpuserdecidewhetherthereisvalueinfurtherdrilldown• Addconcisegraph• Addicons/text• Encodemoreinfointhewidget
Figures by Willett et al. “Scented Widgets: Improving Navigation Cues with Embedded Visualizations””
Reduce:DesignChoice–ItemFilteringEliminatenumberofaGributesasopposedtoitems.O]enusedwithaGributeorderingandsimilaritymeasures.
System DimensionalOrdering,Spacing,andFilteringApproach(DOFSA)
What:Data Table:manyvalueaGributes.
How:Encode Starplots.
How:Facet Smallmul&pleswithmatrixalignment.
How:Reduce AGributefiltering. Figures by Yang et al. “InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures.”
Reduce:DesignChoice–AggregateMergeelementstoformanewderivedelement.Commonaggrega&onmethods.Howevertheyarerarelysufficiente.g.Anscombe’squartet• Average• Minimum• Maximum• Count• Sum
Figures by Yang et al. “InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures.”
Reduce:DesignChoice–ItemAggrega&on(Histogram)Histogramshowsdistribu&onofitemswithinanoriginalaGribute.Choiceofbinsizeiscri&calandtricky.• Usedatacharacteris&cs• Extendthechoicetouser
20
15
10
5
0
Weight Class (lbs)Idiom Histogram
What:Data Table:onequan&ta&vevalueaGribute.
What:Derived Derivedtable:onederivedorderedkeyaGribute(bin),onederivedquan&ta&vevalueaGribute(itemcountperbin).
How:Encode Rec&linearlayout.Linemarkwithalignedposi&ontoexpressderivedvalueaGribute.Posi&on:keyaGribute. Figures by Eamonn Maguire
Reduce:DesignChoice–ItemAggrega&on(Con&nuousScaGerplots)
Con&nuousscaGerplotsuseadense,space-filling2Dmatrixalignment,whereeachpixelisgivenadifferentcolor.Sca@erplotproblems• Occlusion• Sizecoding,labelsworsens
Idiom ConCnuoussca@erplot
What:Data Table:twoquan&ta&vevalueaGributes.
What:Derived Derivedtable:twoorderedkeyaGributes(x,ypixelloca&ons),onequan&ta&veaGribute(overplotdensity).
How:Encode Densespace-filling2Dmatrixalignment,sequen&alcategoricalhue+orderedluminancecolormap.
How:Reduce Itemaggrega&on Figures by Bachthaler and Weiskopf. “Continuous Scatterplots.”
Reduce:DesignChoice–ItemAggrega&on(Boxplot)Boxplotshowsaggregatesta&s&calsummaryforonequan&ta&veaGribute.Fiveitems• Median(50%)• Quar&les(25%&75%)• Upper,Lowerfences• Outliers*
Idiom Boxplotandvaseplot
What:Data Table:manyquan&ta&vevalueaGributes.
What:Derived Fivequan&ta&veaGributesforeachoriginalaGribute,represen&ngitsdistribu&on.
Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.
How:Encode OneglyphperoriginalaGributeexpressingderivedaGributevaluesusingver&calspa&alposi&on,with1Dlistalignmentofglyphsintoseparatedwithhorizontalspa&alposi&on.
How:Reduce Itemaggrega&on.
Scale Items:unlimited.AGributes:dozens. Figures by Wickham and Stryjewski. “40 Years of Boxplots.”
Reduce:DesignChoice–ItemAggrega&on(Solarplot)Solarplotshowsaradialhistogram.Basecircleradiuscontrolsnumberofbins
Idiom Solarplot
What:Data Table:onequan&ta&veaGribute.
What:Derived Derivedtable:onederivedorderedkeyaGribute(bin),onederivedquan&ta&vevalueaGribute(itemcountperbin).Numberofbinsinterac&velycontrolled.
Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.
How:Encode Radiallayout,linemarks.Linelength:expressderivedvalueaGribute;angle:keyaGribute.
How:Reduce Itemaggrega&on.
Scale Originalitems:unlimited.Derivedbins:propor&onaltoscreenspaceallocated.Figures by Wickham and Stryjewski. “40 Years of Boxplots.”
Reduce:DesignChoice–ItemAggrega&on(HierarchicalParallelCoordinates)
Hierarchicalparallelcoordinatesusesclusteringaggrega&onforscalability.Cluster–Mean,minimumandmaximum;depthandshownwithbandofvaryingwidthandopacity.Basecircleradiuscontrolsnumberofbins
Idiom HierarchicalParallelCoordinates
What:Data Table
What:Derived Clusterhierarchyatoporiginaltableofitems.Fiveper-clusteraGributes:count,mean,min,max,depth.
Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.
How:Encode Parallelcoordinates.Colorclustersbyproximityinhierarchy.
How:Reduce Interac&veitemaggrega&ontochangelevelofdetail.
Scale Items:10,000–100,000.Clusters:onedozen. Figures by Wickham and Stryjewski. “40 Years of Boxplots.”
Idiom GeographicallyWeightedBoxplots
What:Data Geographicgeometrywithareaboundaries.Table:KeyaGribute(area),severalquan&ta&vevalueaGributes.Table:Five-numbersta&s&calsummarydistribu&onsforeachoriginalaGribute.
What:Derived
Mul&dimensionaltable:keyaGribute(area),keyaGribute(scale),quan&ta&vevalueaGributes(geographicallyweightedsta&s&calsummariesforeachareaatmul&plescales).
How:Encode
Boxplot
How:Facet Superimposedlayers:globalboxplotasgraybackground,current-scaleboxplotasgreenforeground.
How:Reduce
Spa&alaggrega&on.
Reduce:DesignChoice–ItemAggrega&on(Geowigs)GeographicallyWeightedBoxplotsSophis&catedsupportforspa&alaggrega&onusinggeographicallyweightedregressionandgeographicallyweightedsummarysta&s&cs
Figures by Dykes and Brunsdon. “Geographically Weighted Visualization: Interactive Graphics for Scale-Varying Exploratory Analysis.”
Reduce:DesignChoice–ItemAggrega&on(Spa&al)Modifiablearealunitproblem(MAUP)Issue:Changingtheboundariesoftheregionsusedtoanalyzedatacanyielddrama&callydifferentresults.
Figures by https://www.e-education.psu.edu/geog486/node/1864
Reduce:DesignChoice–AGributeAggrega&on(Dimensionalityreduc&on)
Dimensionalityreduc&on(DR)isanaggrega&onmethodwherethegoalistopreservethemeaningfulstructureofadatasetwhileusingfeweraGributestorepresenttheitems.• Derivelow-dimensionaltargetfromactualhighdimensionaldata• Assump&ons/Usewhen
– Hiddenstructureindata– Significantredundancyindata
WhyDR?• improveperformanceofdownstreamalgorithm
– Avoidcurseofdimensionality
• dataanalysisDRtasks• Dimension-orientedtasksequences
– Namesynthe&cdimensions,mapsynthe&cdimstooriginalones
• Cluster-orientedtasksequences– Verifyclusters,nameclusters,matchclustersandclasses
TobeconCnuedinGuestLecture…Slide: Tamara Munzner,
Dimensionality reduction
• attribute aggregation– derive low-dimensional target space from high-dimensional measured space – use when you can’t directly measure what you care about
• true dimensionality of dataset conjectured to be smaller than dimensionality of measurements• latent factors, hidden variables
1146
Tumor Measurement Data DR
Malignant Benign
data: 9D measured space
derived data: 2D target space
Idiom DimensionalityReducConforDocumentCollecCons
What:Data Tablewith10,000aGributes.
What:Derived TablewithtwoaGributes.
Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.
How:Encode ScaGerplot,coloredbyconjecturedclustering.
How:Reduce AGributeaggrega&on(dimensionalityreduc&on)withMDS
Scale OriginalaGributes:10,000.DerivedaGributes:two.Items:100,000.
DimensionalityReduc&onforDocumentCollec&ons
Figures by Eamonn Maguire
Reduce:DesignChoice–AGributeAggrega&on(Dimensionalityreduc&on)
BagofWords(DR)• Analyzethedifferen&aldistribu&onofwordsbetweenthedocuments• FindclustersofrelateddocumentsbasedoncommonwordusebetweenthemE.g.Ngram
Reduce:DesignChoice–AGributeAggrega&on(Dimensionalityreduc&on)
ProcessUseNaGributestosynthesize2ormoreaGributesEsCmaCngtruedimensionality• HowdoyouknowwhenyouwouldbenefitfromDR?
– Considererrorforlow-dimprojec&onvshigh-dimprojec&on
• Nosinglecorrectanswer;manymetricsproposed– Cumula&vevariancethatisnotaccountedfor– Strain:matchvaria&onsindistance(vsactualdistancevalues)– Stress:differencebetweeninterpointdistancesinhighandlowdims
DesignChoices• ScaGerplot(2synthe&caGributes)• ScaGerplotmatrix(>2synthe&caGributes)
Estimating true dimensionality
• how do you know when you would benefit from DR? – consider error for low-dim projection vs high-dim projection
• no single correct answer; many metrics proposed– cumulative variance that is not accounted for– strain: match variations in distance (vs actual distance values)– stress: difference between interpoint distances in high and low dims
14
Slide: Tamara Munzner,
Reduce:DesignChoice–AGributeAggrega&on(Lineardimensionalityreduc&on:PCA)
Principalcomponentsanalysis(PCA)• Describeloca&onofeachpointaslinearcombina&onofweightsforeachaxis• Findingaxes:firstwithmostvariance,secondwithnextmost,...
Linear dimensionality reduction
• principal components analysis (PCA)– describe location of each point as linear combination of weights for each axis– finding axes: first with most variance, second with next most, ...
17[http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png]Slide: Tamara Munzner, Figures by hhGp://en.wikipedia.org/wiki/File:GaussianScaGerPCA.png]
Reduce:DesignChoice–AGributeAggrega&on(Nonlineardimensionalityreduc&on)
NonlineardimensionalityreducConManytechniquesproposed• MDS,char&ng,isomap,LLE,T-SNE• Manyliteratures:visualiza&on,machinelearning,op&miza&on,psychology,...ProsCanhandlecurvedratherthanlinearstructureCons• Loseall&estooriginaldims/aGribs• Newdimensionscannotbeeasilyrelatedtooriginals
Slide: Tamara Munzner,
Dimensionalityreduc&on:Mul&dimensionalScaling(MDS)
MDS:familyofmethods,linearandnonlinear!
Classicalscaling:minimizestrain• Earlyformula&onequivalenttoPCA(linear)• Nystrom/spectralmethodsapproximateeigenvectors:O(N)
– LandmarkMDS[deSilva2004],PivotMDS[Brandes&Pich2006]
• limita&ons:qualityforveryhighdimensionalsparsedata
Distancescaling:minimizestress• Nonlinearop&miza&on:O(N2)
– SMACOF[deLeeuw1977]
• Force-directedplacement:O(N2)– Stochas&cForce[Chalmers1996]– limita&ons:qualityproblemsfromlocalminima
Glimmergoal:O(N)speedandhighquality Slide: Tamara Munzner,
Dimensionalityreduc&on:Mul&dimensionalScaling(MDS)
Spring-basedMDS:naive• Repeatforallpoints
–Computespringforcetoallotherpoints–Differencebetweenhighdim,lowdimdistance–movetobeGerloca&onusingcomputedforces
• Computedistancesbetweenallpoints–O(N2)itera&on,O(N3)algorithm
Slide: Tamara Munzner,
Spring-based MDS: naive
• repeat for all points– compute spring force to all other points– difference between high dim, low dim distance– move to better location using computed forces
• compute distances between all points– O(N2) iteration, O(N3) algorithm
20
Dimensionalityreduc&on:Mul&dimensionalScaling(MDS)
Fasterspringmodel:Stochas&c• Comparedistancesonlywithafewpoints
– maintainsmalllocalneighborhoodset– each&mepicksomerandoms,swapinifcloser
• Smallconstant:6locals,3randoms(typically)– O(N)itera&on,O(N2)algorithm
Slide: Tamara Munzner,
Faster spring model: Stochastic
• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if closer
• small constant: 6 locals, 3 randoms (typically)– O(N) iteration, O(N2) algorithm
21
Readings• VisualAnalysisandDesign–Chapter13,15