scienfic and large data visualizaon introduc&on to...

Post on 23-Aug-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Scien&ficandLargeDataVisualiza&on

Introduc&ontoInforma&onVisualiza&on

MassimilianoCorsiniVisualCompu,ngLab,ISTI-CNR-Italy

Nextlessons–Overview•  Introduc&ontoInforma&onVisualiza&on•  Mo&va&ons•  DataTypes,GraphTypesandVisualPercep&on•  Mul&dimensionalData•  GraphDrawing•  Prac&ce

–  Visualiza&onontheWeb–  Javascript,WebGL–  D3.js

Lesson14-15–IntrotoInfoVis•  Introduc&onandmo&va&ons•  Ingredientsofeffec&vevisualiza&on•  Datatypes•  Graphtypes•  Visualpercep&on

Informa,onVisualiza,on

•  Informa,onvisualiza,onisthestudyof(interac,ve)visualrepresenta,onsofabstractdatatoreinforcehumancogni,on.[Wikipedia]

•  Theuseofcomputer-supported,interac,ve,visualrepresenta,onsofabstractdatatoamplifycogni,on.[Cardetal.1999]

•  Theuseofcomputergraphicsandinterac,ontoassisthumansinsolvingproblems.[Purchaseetal.2008]

Informa,onVisualiza,on

•  Thepurposeofinforma,onvisualiza,onistoamplifycogni,veperformance,notjusttocreateinteres,ngpictures.[Card2007]

•  Infographicsisavisualtoolforcommunica,on,fortheunderstandingandfortheanalysis.[AlbertoCairo]

Informa,onVisualiza,on

•  DifferencefromScien4ficVisualiza4on:–  Informa,onvisualiza,ontreatsalsoabstractdata(numericalandnon-numericaldata).

–  Inscien,ficvisualiza,onspa,alrepresenta,onisgiven.

•  DifferencefromVisualAnaly4cs:–  InVisualAnaly,cstheaccentisonthereasoning/interac,onloop.

InfoGraphics–Charts

From the Wall Street Journal

InfoGraphics–Diagrams

Juan Colombata & Enzo Oliva – La Voz del Interior (Argentina)

Mo,va,ons

Mo,va,ons

Which country is close to its historical maximum ?

Mo,va,ons

Easier to answer… Why ?

Ra,onale

•  Thehumanvisualsystem(HVS)isverygoodatiden,fiesandanalyzespaZerns.

•  Wecanvisualizedatatomakeeasyforourbraintoanalyzethem,forexampletodocomparisons.

Effec,veness

•  Tobeeffec,ve,datavisualiza,onshouldbetakenintoaccountseveralfactors:– Datatype– Goal(func,on)– VisualPercep,onSystem

Func,onalArt

•  Func,ondoesnotdictatebutrestrictourchoices.

•  Thisispar,cularlytrueforinfographics.

Example–Comparisons

From “The functional art” by Alberto Cairo

ClevelandandMcGill1984Lessaccurate

Lessaccurate

Adapted from “The functional art” by Alberto Cairo

KeyIngredients

•  Inthefollowingwefocuson:– Datatypes

•  Onedimension,twodimensions,N-dimensions•  Quan,ta,ve,Ordinal,Nominal

– Graphtypes– VisualPercep,on

•  Wegivesomedesignguidelines,meby,me.

Data

•  Informa,onisobtainedfromdata(!)•  Structured/Unstructured.•  Generatedbysensors,bycomputers,byhumans,etc.

VariableTypes•  AccordingtoStevens(1946):– Nominal•  Labels(e.g.apples,oranges,bananas)

– Ordinal•  Toorderingthings(e.g.ranksofmovies)

–  Interval•  Intervalscaleofmeasurements(e.g.,meofdeparture-arrival)

– Ra,o•  Measuresdefinedonara,oscale(e.g.themassofanobject)

S. S. Stevens, “On the theory of scales and measurements.”, Science, 103, pp. 677-680, 1946.

VariableTypes

•  Category– Steven’snominalclass(e.g.countrynames,typeofdisease)

•  Ordinal– Labelsexpressingdegree(e.g.cold,hot,veryhot)–  Ingeneral,encodedasintegerdata.

•  Quan,ta,ve–  Intervals,measures,etc.–  Ingeneral,real-numbereddata.

DataDimensions

•  Commondimensions:– Univariate,bivariate,trivariate– Mul,-variate(N>3dimensions)

•  Variablescanbedependentorindependent.•  EachcaseisapointinaspacewithNdimension(datapoint).

DataDimensions

•  AsetofdatacanberepresentedbyatablewithNcolumns(oneforeachvariable).

Variable1 Variable2 Variable3

Data1

Data2

Data3

Data4

DataRela,onships

•  Atablecanbeusedtorepresentarela,onshipbetweendifferentdata.

UserId Gameid

Data1 Smith 023923

Data2 James 238548

Data3 Frank 385753

Data4 Powell 357352

DataRela,onships

•  1-to-1•  1-to-many•  Many-to-many•  Rela,onshipsmayalsohaveaZributes

Whatwewant..

•  Asetofwellformedandinterconnectedtablesofdata.

Whatwehave..

•  Datadoesnotcomeintheformwewouldlike.

•  Datamayhaveinconsistencies:– Corrupteddata– Missingdata– Severaldatamaybeequivalent(e.g.textfield“R&D”,“Research”,“ResearchandDevelopment”,“r*d”)

DataProcessingPipeline

•  Datadoesnotcomeintheformwewouldlike.

•  Typicaldataprocessingpipeline(fromrawdatatocleanstructureddata):1.  Collectdata.2.  Datasimplifica,on(extractasubsetofinterest).3.  Cleanandstructurethem.

•  Anoutputofthepipelinecanbealsometadata.

DataCollec,on

•  Searchanddownloaddata.•  Parsetext.•  Convertbetweendifferentformats.– Examples:CVStoJSON,MySQLtoHTML,etc.

•  Mergeheterogeneoussources.

DataSimplifica,on

•  Filter/Selec,on– Removeunwanteddata– Removeinvaliddata(nullvalues)

•  Aggrega,on– Collapseseveraldatapointsintoasingleone– Replacesomevalueswithminimum,maximum,average,total,etc.

DataProcessingPipeline

•  Typicallyperformedautoma,callyusingscripts.

•  Human-guideddatatransforma,onsispossible(throughmacro-opera,ons)– Useappropriatetools(e.g.OpenRefine-h?p://openrefine.org)

Metadata

•  Datawhichdescribesthedata– Roleofvariables– Typeofvariables– Constraints– Dependencies

•  Collec,onandprocessingopera,onscanbealsodescribed.

HowtoPresentData?

•  Howtopresentdatagraphically?– Toallowvisualanalysis– TohighlightpaZerns– Toanswersomespecificques,ons– Etc.

Quan,ta,veValues

•  Wehavejustmen,onedtheworkbyCleveland&McGill(1984)– Posi,on– Length– Angle/slope– Area– Volume– Colorsatura,on/shading

PerceptualAccuracy

From “Data Visualization” Course by John C. Hart, for Coursera, 2015.

QUANTITATIVE ORDINAL NOMINAL

Posi,on Posi,on Posi,on

Length Density Hue

Angle Satura,on Texture

Slope Hue Connec,on

Area Texture Containment

Volume Connec,on Density

Density Containment Satura,on

Satura,on Length Shape

Hue Angle Length

Slope Angle

Area Slope

Volume Area

Volume

TablevsGraphs

•  Presenttablesdirectlyispreferredwhen:– Fewdatapoints– Precisevaluesareimportant

UnivariateData

•  Fewinteres,ngsolu,ons.•  Sta,s,caldescrip,on:– Mean,median,standarddevia,on,quar,les.

•  Warning!(somedatathatappearstobeunivariateareactuallybivariate).

Stemplots

•  Alsocalledstemandleafplot.

•  Usedtodisplayquan,ta,vedata,generallyfromsmalldatasets(50orfewerobserva,ons).

•  Easytoprint.•  Easytoread.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

Stemplots

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

BoxandWhiskerPlots•  BoxandWhiskerPlot(orBoxPlot)isaconvenientwayofvisuallydisplayingadatadistribu,onthroughtheirquar,les.

•  Advantages:–  Keyvalues(average,median,25th

percen,le,etc.)–  Ifthereareanyoutliersandwhattheir

valuesare.–  Ifthedataisskewedandinwhat

direc,on.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

BoxandWhiskerPlots

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

LineCharts/LineGraphs

•  InventedbytheScotshengineerandsta,s,cianWilliamPlayfair(1759-1823)

•  Twoquan,ta,vevariables,typically:–  Xà,meorintervals,Yany

•  LineindicatesthattherearealsointermediatevaluesFigure from Data Visualization Catalogue (http://datavizcatalogue.com)

BarCharts

•  BivariateData– Onenominalvariable(typicallyindependent)

– Onequan,ta,vevariable(typicallydependentvariable)

•  Horizontal/Ver,calbars•  Donotconfusewithhistograms.

BarCharts

3D?!Notaverygoodidea..

Histograms

•  BivariateData– Oneindependentandonedependentvariable

– Thefirstvariableisquan,zedinintervals(bins)

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

PieCharts

•  BivariateData– Oneindependentandonedependentvariable

•  Goodforaquickvisualcheck.

•  Notgoodfor:– Manyvalues.– Accuratecomparisons.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

PieCharts

3D?!Notmorereadable.

Figure by Luiz Salomão.

•  Essen,ally,apiechartwiththecenterareacutout.

•  Allowstofocusmoreonarclengthinsteadofcomparingthepropor,onbetweenslices.

•  Morespaceefficient.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

DonutCharts

SunburstCharts•  Toshowshierarchythroughaseriesofrings.Eachringcorrespondstoalevelinthehierarchy.

•  Hierarchymovingoutwardsfromthecenter.

•  Colourcanbeusedtohighlighthierarchalgroupingsorspecificcategories.Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

SunburstCharts

Produces by Space Radar app (https://github.com/zz85/space-radar)

ScaZerPlots

•  BivariateData– Twoindependentvariables

•  Goodtoiden,fyrela,onships,outliersandclusters.

Variable1

Variable2

ScaZerPlots

Variable1

Variable2

Outliers

HighDensity

Variable1

Variable2Clusters

SurfaceGraphs

•  TrivariateData– Threecon,nuousvariables

– Twoindependentandonedependent

SurfaceGraphs

•  Colormaybeassociatedtothedependentvariable.

SurfaceGraphs

•  Levelcurvescanbealsoused.

3DScaZerPlots

•  TrivariateData– Threequan,ta,vevariables

•  Sameconceptof2DScaZerPlot

ColoredScaZerPlots

•  TrivariateData•  Colorcanencodeavariableoracategory

BubbleCharts

•  TrivariateData– Threequan,ta,vevariables

•  Donotallowforaccuratecomparison.

•  Colorscanbeusedtoshowdifferentcategories.

BubbleCharts

Figure from http://gapminder.com

ChloroplethMap

•  BivariateData–  Quan,ta,vevalueoverageographicalareas/regions

•  Dataarecoloured,shadedorpaZernedindifferentways.

•  Goodforanoverview,notforaccuratecomparison.•  Smallareascanbeunderemphasized.

WordCloud

•  WordCloudsdisplayshowfrequentlywordsappearinagivenbodyoftext,bymakingthesizeofeachwordpropor4onaltoitsfrequency.

•  Arrangementandcolorcanvaryalot.•  WordCloudscanbeusedtocomparetwobodiesoftextortogiveaquickideaofrepea,ngkeywords(e.g.usedbyresearcherstosummarizethecontentoftheirpapers).

WordClouds

•  Disadvantages:– Longwordsareemphasizedovershortwords.– WordswhoseleZerscontainmanyascendersanddescendersmayreceivemoreaZen,on.

– Noaccuracycomparison,mainlyusedforaesthe,creasons.

WordClouds(example)

Produces by www.jasondavies.com/wordcloud.

WordClouds(example)

Produces by www.jasondavies.com/wordcloud.

WordClouds(example)

From www.wordclouds.com .

Summary

•  Abriefintroduc,ononInforma,onVisualiza,onhasbeengiven.

•  Informa,oncomingfromdata.Datathatshouldbecollectedandprocessedproperly.

•  Aquickpanoramicofgraphtypeshasbeengiven.

Ques&ons?

top related