reduc&on - ucsd dse mas · pdf filewhy reduce? manage visual complexity 1. derive data 2....

24
Reduc&on Data Visualiza&on Amit Chourasia

Upload: dinhthuy

Post on 08-Feb-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduc&on

DataVisualiza&onAmitChourasia

Page 2: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Whyreduce?Managevisualcomplexity1.  Derivedata2.  Viewchange3.  Facet4.   Reduce

1.  Filter:reduceitems/aGributes(beware“outofsight,outofmindno&on”)2.  Aggrega&on:standinsummariza&on(precludesdetails,howandwhatshouldbe

aggregated)5.  Dimensionalityreduc&on

Page 3: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reducing Items and Attributes

FilterItems

Attributes

Aggregate

Items

Attributes

Figures by Eamonn Maguire

Page 4: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–AGributeFilteringEliminatenumberofelements•  Itemfiltering-Rangeselec&on Filterauthor:SeanConnery

Filter&tlelength:60-269

System FilmFinder

What:Data Table:ninevalueaGributes.

How:Encode ScaGerplot;detailviewwithtext/images.

How:Facet Mul&form,overview–detail.

How:Reduce ItemfilteringFigures by Ahlberg and Shneiderman. “FilmFinder”

Page 5: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

ScentedwidgetsAugmentstandardwidgetswithaddi&onalcuestohelpuserdecidewhetherthereisvalueinfurtherdrilldown•  Addconcisegraph•  Addicons/text•  Encodemoreinfointhewidget

Figures by Willett et al. “Scented Widgets: Improving Navigation Cues with Embedded Visualizations””

Page 6: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–ItemFilteringEliminatenumberofaGributesasopposedtoitems.O]enusedwithaGributeorderingandsimilaritymeasures.

System DimensionalOrdering,Spacing,andFilteringApproach(DOFSA)

What:Data Table:manyvalueaGributes.

How:Encode Starplots.

How:Facet Smallmul&pleswithmatrixalignment.

How:Reduce AGributefiltering. Figures by Yang et al. “InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures.”

Page 7: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–AggregateMergeelementstoformanewderivedelement.Commonaggrega&onmethods.Howevertheyarerarelysufficiente.g.Anscombe’squartet•  Average•  Minimum•  Maximum•  Count•  Sum

Figures by Yang et al. “InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures.”

Page 8: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–ItemAggrega&on(Histogram)Histogramshowsdistribu&onofitemswithinanoriginalaGribute.Choiceofbinsizeiscri&calandtricky.•  Usedatacharacteris&cs•  Extendthechoicetouser

20

15

10

5

0

Weight Class (lbs)Idiom Histogram

What:Data Table:onequan&ta&vevalueaGribute.

What:Derived Derivedtable:onederivedorderedkeyaGribute(bin),onederivedquan&ta&vevalueaGribute(itemcountperbin).

How:Encode Rec&linearlayout.Linemarkwithalignedposi&ontoexpressderivedvalueaGribute.Posi&on:keyaGribute. Figures by Eamonn Maguire

Page 9: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–ItemAggrega&on(Con&nuousScaGerplots)

Con&nuousscaGerplotsuseadense,space-filling2Dmatrixalignment,whereeachpixelisgivenadifferentcolor.Sca@erplotproblems•  Occlusion•  Sizecoding,labelsworsens

Idiom ConCnuoussca@erplot

What:Data Table:twoquan&ta&vevalueaGributes.

What:Derived Derivedtable:twoorderedkeyaGributes(x,ypixelloca&ons),onequan&ta&veaGribute(overplotdensity).

How:Encode Densespace-filling2Dmatrixalignment,sequen&alcategoricalhue+orderedluminancecolormap.

How:Reduce Itemaggrega&on Figures by Bachthaler and Weiskopf. “Continuous Scatterplots.”

Page 10: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–ItemAggrega&on(Boxplot)Boxplotshowsaggregatesta&s&calsummaryforonequan&ta&veaGribute.Fiveitems•  Median(50%)•  Quar&les(25%&75%)•  Upper,Lowerfences•  Outliers*

Idiom Boxplotandvaseplot

What:Data Table:manyquan&ta&vevalueaGributes.

What:Derived Fivequan&ta&veaGributesforeachoriginalaGribute,represen&ngitsdistribu&on.

Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.

How:Encode OneglyphperoriginalaGributeexpressingderivedaGributevaluesusingver&calspa&alposi&on,with1Dlistalignmentofglyphsintoseparatedwithhorizontalspa&alposi&on.

How:Reduce Itemaggrega&on.

Scale Items:unlimited.AGributes:dozens. Figures by Wickham and Stryjewski. “40 Years of Boxplots.”

Page 11: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–ItemAggrega&on(Solarplot)Solarplotshowsaradialhistogram.Basecircleradiuscontrolsnumberofbins

Idiom Solarplot

What:Data Table:onequan&ta&veaGribute.

What:Derived Derivedtable:onederivedorderedkeyaGribute(bin),onederivedquan&ta&vevalueaGribute(itemcountperbin).Numberofbinsinterac&velycontrolled.

Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.

How:Encode Radiallayout,linemarks.Linelength:expressderivedvalueaGribute;angle:keyaGribute.

How:Reduce Itemaggrega&on.

Scale Originalitems:unlimited.Derivedbins:propor&onaltoscreenspaceallocated.Figures by Wickham and Stryjewski. “40 Years of Boxplots.”

Page 12: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–ItemAggrega&on(HierarchicalParallelCoordinates)

Hierarchicalparallelcoordinatesusesclusteringaggrega&onforscalability.Cluster–Mean,minimumandmaximum;depthandshownwithbandofvaryingwidthandopacity.Basecircleradiuscontrolsnumberofbins

Idiom HierarchicalParallelCoordinates

What:Data Table

What:Derived Clusterhierarchyatoporiginaltableofitems.Fiveper-clusteraGributes:count,mean,min,max,depth.

Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.

How:Encode Parallelcoordinates.Colorclustersbyproximityinhierarchy.

How:Reduce Interac&veitemaggrega&ontochangelevelofdetail.

Scale Items:10,000–100,000.Clusters:onedozen. Figures by Wickham and Stryjewski. “40 Years of Boxplots.”

Page 13: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Idiom GeographicallyWeightedBoxplots

What:Data Geographicgeometrywithareaboundaries.Table:KeyaGribute(area),severalquan&ta&vevalueaGributes.Table:Five-numbersta&s&calsummarydistribu&onsforeachoriginalaGribute.

What:Derived

Mul&dimensionaltable:keyaGribute(area),keyaGribute(scale),quan&ta&vevalueaGributes(geographicallyweightedsta&s&calsummariesforeachareaatmul&plescales).

How:Encode

Boxplot

How:Facet Superimposedlayers:globalboxplotasgraybackground,current-scaleboxplotasgreenforeground.

How:Reduce

Spa&alaggrega&on.

Reduce:DesignChoice–ItemAggrega&on(Geowigs)GeographicallyWeightedBoxplotsSophis&catedsupportforspa&alaggrega&onusinggeographicallyweightedregressionandgeographicallyweightedsummarysta&s&cs

Figures by Dykes and Brunsdon. “Geographically Weighted Visualization: Interactive Graphics for Scale-Varying Exploratory Analysis.”

Page 14: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–ItemAggrega&on(Spa&al)Modifiablearealunitproblem(MAUP)Issue:Changingtheboundariesoftheregionsusedtoanalyzedatacanyielddrama&callydifferentresults.

Figures by https://www.e-education.psu.edu/geog486/node/1864

Page 15: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–AGributeAggrega&on(Dimensionalityreduc&on)

Dimensionalityreduc&on(DR)isanaggrega&onmethodwherethegoalistopreservethemeaningfulstructureofadatasetwhileusingfeweraGributestorepresenttheitems.•  Derivelow-dimensionaltargetfromactualhighdimensionaldata•  Assump&ons/Usewhen

–  Hiddenstructureindata–  Significantredundancyindata

WhyDR?•  improveperformanceofdownstreamalgorithm

–  Avoidcurseofdimensionality

•  dataanalysisDRtasks•  Dimension-orientedtasksequences

–  Namesynthe&cdimensions,mapsynthe&cdimstooriginalones

•  Cluster-orientedtasksequences–  Verifyclusters,nameclusters,matchclustersandclasses

TobeconCnuedinGuestLecture…Slide: Tamara Munzner,

Dimensionality reduction

• attribute aggregation– derive low-dimensional target space from high-dimensional measured space – use when you can’t directly measure what you care about

• true dimensionality of dataset conjectured to be smaller than dimensionality of measurements• latent factors, hidden variables

1146

Tumor Measurement Data DR

Malignant Benign

data: 9D measured space

derived data: 2D target space

Page 16: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Idiom DimensionalityReducConforDocumentCollecCons

What:Data Tablewith10,000aGributes.

What:Derived TablewithtwoaGributes.

Why:Tasks Characterizedistribu&on;findoutliers,extremes,averages;iden&fyskew.

How:Encode ScaGerplot,coloredbyconjecturedclustering.

How:Reduce AGributeaggrega&on(dimensionalityreduc&on)withMDS

Scale OriginalaGributes:10,000.DerivedaGributes:two.Items:100,000.

DimensionalityReduc&onforDocumentCollec&ons

Figures by Eamonn Maguire

Page 17: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–AGributeAggrega&on(Dimensionalityreduc&on)

BagofWords(DR)•  Analyzethedifferen&aldistribu&onofwordsbetweenthedocuments•  FindclustersofrelateddocumentsbasedoncommonwordusebetweenthemE.g.Ngram

Page 18: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–AGributeAggrega&on(Dimensionalityreduc&on)

ProcessUseNaGributestosynthesize2ormoreaGributesEsCmaCngtruedimensionality•  HowdoyouknowwhenyouwouldbenefitfromDR?

–  Considererrorforlow-dimprojec&onvshigh-dimprojec&on

•  Nosinglecorrectanswer;manymetricsproposed–  Cumula&vevariancethatisnotaccountedfor–  Strain:matchvaria&onsindistance(vsactualdistancevalues)–  Stress:differencebetweeninterpointdistancesinhighandlowdims

DesignChoices•  ScaGerplot(2synthe&caGributes)•  ScaGerplotmatrix(>2synthe&caGributes)

Estimating true dimensionality

• how do you know when you would benefit from DR? – consider error for low-dim projection vs high-dim projection

• no single correct answer; many metrics proposed– cumulative variance that is not accounted for– strain: match variations in distance (vs actual distance values)– stress: difference between interpoint distances in high and low dims

14

Slide: Tamara Munzner,

Page 19: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–AGributeAggrega&on(Lineardimensionalityreduc&on:PCA)

Principalcomponentsanalysis(PCA)•  Describeloca&onofeachpointaslinearcombina&onofweightsforeachaxis•  Findingaxes:firstwithmostvariance,secondwithnextmost,...

Linear dimensionality reduction

• principal components analysis (PCA)– describe location of each point as linear combination of weights for each axis– finding axes: first with most variance, second with next most, ...

17[http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png]Slide: Tamara Munzner, Figures by hhGp://en.wikipedia.org/wiki/File:GaussianScaGerPCA.png]

Page 20: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Reduce:DesignChoice–AGributeAggrega&on(Nonlineardimensionalityreduc&on)

NonlineardimensionalityreducConManytechniquesproposed•  MDS,char&ng,isomap,LLE,T-SNE•  Manyliteratures:visualiza&on,machinelearning,op&miza&on,psychology,...ProsCanhandlecurvedratherthanlinearstructureCons•  Loseall&estooriginaldims/aGribs•  Newdimensionscannotbeeasilyrelatedtooriginals

Slide: Tamara Munzner,

Page 21: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Dimensionalityreduc&on:Mul&dimensionalScaling(MDS)

MDS:familyofmethods,linearandnonlinear!

Classicalscaling:minimizestrain•  Earlyformula&onequivalenttoPCA(linear)•  Nystrom/spectralmethodsapproximateeigenvectors:O(N)

–  LandmarkMDS[deSilva2004],PivotMDS[Brandes&Pich2006]

•  limita&ons:qualityforveryhighdimensionalsparsedata

Distancescaling:minimizestress•  Nonlinearop&miza&on:O(N2)

–  SMACOF[deLeeuw1977]

•  Force-directedplacement:O(N2)–  Stochas&cForce[Chalmers1996]–  limita&ons:qualityproblemsfromlocalminima

Glimmergoal:O(N)speedandhighquality Slide: Tamara Munzner,

Page 22: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Dimensionalityreduc&on:Mul&dimensionalScaling(MDS)

Spring-basedMDS:naive•  Repeatforallpoints

–Computespringforcetoallotherpoints–Differencebetweenhighdim,lowdimdistance–movetobeGerloca&onusingcomputedforces

•  Computedistancesbetweenallpoints–O(N2)itera&on,O(N3)algorithm

Slide: Tamara Munzner,

Spring-based MDS: naive

• repeat for all points– compute spring force to all other points– difference between high dim, low dim distance– move to better location using computed forces

• compute distances between all points– O(N2) iteration, O(N3) algorithm

20

Page 23: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Dimensionalityreduc&on:Mul&dimensionalScaling(MDS)

Fasterspringmodel:Stochas&c•  Comparedistancesonlywithafewpoints

–  maintainsmalllocalneighborhoodset–  each&mepicksomerandoms,swapinifcloser

•  Smallconstant:6locals,3randoms(typically)–  O(N)itera&on,O(N2)algorithm

Slide: Tamara Munzner,

Faster spring model: Stochastic

• compare distances only with a few points– maintain small local neighborhood set– each time pick some randoms, swap in if closer

• small constant: 6 locals, 3 randoms (typically)– O(N) iteration, O(N2) algorithm

21

Page 24: Reduc&on - UCSD DSE MAS · PDF fileWhy reduce? Manage visual complexity 1. Derive data 2. View change 3. Facet 4. Reduce 1. Filter : reduce items/aributes (beware “out of sight,

Readings•  VisualAnalysisandDesign–Chapter13,15