Dr. Aijun Zhang · STAT3622 Data Visualization · 12 September 2016

Exploratory Data Analysis Simple Base Graphics Using Lattice Package

Exploratory Data Analysis

Dr. Aijun Zhang

STAT3622 Data Visualization

12 September 2016

StatSoft.org 1


Outline

1 Exploratory Data Analysis

2 Simple Base Graphics

3 Using Lattice Package


John Tukey

John Tukey (1915 – 2000)

Proposed “Exploratory Data Analysis”

Coined terms: Boxplot, Stem-and-Leaf plot, ANOVA (Analysis of Variance)

Coined terms “Bit” and “Software”

Co-developed the Fast Fourier Transform algorithm, Projection Pursuit, Jackknife estimation

Famous quote: “The best thing about being a statistician is that you get to play in everyone’s backyard.”


John Tukey

“The greatest value of a picture is when it forces us to notice what we never expected to see.”

John Tukey (1977)

Tables

Five-number summary

Scatter plot

Box-plot

Residual plot

Smoother

Stem-and-Leaf plot

Bag plot

Median Polish
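Several of the tools listed above are available directly in base R. The following is a minimal sketch, assuming the built-in `iris`, `VADeaths` and `cars` datasets as stand-ins (the slides do not say which data were used here); it shows a stem-and-leaf plot, a five-number summary, a median polish and a smoother.

```r
x <- iris$Sepal.Length        # any numeric vector would do

stem(x)                       # stem-and-leaf display printed to the console
fn <- fivenum(x)              # minimum, lower hinge, median, upper hinge, maximum
print(fn)

# Median polish: additive fit (overall + row + column + residual) of a two-way table
mp <- medpolish(VADeaths, trace.iter = FALSE)
print(mp$row)
print(mp$col)

# Smoother: lowess curve drawn over a scatter plot
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(lowess(cars$speed, cars$dist), col = "red", lwd = 2)
```

The median polish decomposition is exact: adding the overall, row and column effects to the residuals reproduces the original table.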


Example 1: Anscombe Dataset

           x1     y1     x2     y2     x3     y3     x4     y4
1       10.00   8.04  10.00   9.14  10.00   7.46   8.00   6.58
2        8.00   6.95   8.00   8.14   8.00   6.77   8.00   5.76
3       13.00   7.58  13.00   8.74  13.00  12.74   8.00   7.71
4        9.00   8.81   9.00   8.77   9.00   7.11   8.00   8.84
5       11.00   8.33  11.00   9.26  11.00   7.81   8.00   8.47
6       14.00   9.96  14.00   8.10  14.00   8.84   8.00   7.04
7        6.00   7.24   6.00   6.13   6.00   6.08   8.00   5.25
8        4.00   4.26   4.00   3.10   4.00   5.39  19.00  12.50
9       12.00  10.84  12.00   9.13  12.00   8.15   8.00   5.56
10       7.00   4.82   7.00   7.26   7.00   6.42   8.00   7.91
11       5.00   5.68   5.00   4.74   5.00   5.73   8.00   6.89

Mean     9.00   7.50   9.00   7.50   9.00   7.50   9.00   7.50
Sd       3.32   2.03   3.32   2.03   3.32   2.03   3.32   2.03
Cor(x,y)        0.82          0.82          0.82          0.82
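The summary rows can be reproduced, and the four pairs plotted, with base R's built-in `anscombe` data frame. This is a minimal sketch rather than the slides' own code; the names `stats` and `op`, the axis limits and the colours are illustrative choices.

```r
data(anscombe)   # built-in data frame with columns x1..x4 and y1..y4

# Mean, standard deviation and correlation for each (x, y) pair
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y), cor = cor(x, y))
})
colnames(stats) <- paste0("pair", 1:4)
print(round(stats, 2))

# 2 x 2 grid of scatter plots, each with its least-squares line
op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, xlim = c(3, 20), ylim = c(2, 14),
       main = paste("Pair", i), pch = 19, col = "steelblue")
  abline(lm(y ~ x), col = "red")
}
par(op)
```

The four panels share nearly identical summaries yet look very different, which is the point of the example: the statistics alone do not reveal the curvature in pair 2 or the single influential point in pair 4.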


Exploratory Data Analysis

EDA is a statistical approach to making sense of data using a variety of techniques (mostly graphical). It may help to:

Assess assumptions about the distributions of variables

Identify relationships between variables

Extract important variables

Suggest the use of appropriate models

Detect problems in the collected data (e.g. outliers, missing data, measurement errors)
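A few of these checks can be run numerically before any plotting. The sketch below assumes the built-in `airquality` dataset as an example, because it contains missing values; the dataset choice and the variable names `na_counts`, `ozone_out` and `cor_mat` are illustrative and not from the slides.

```r
data(airquality)

# Distribution check: five-number summary (fivenum() drops NAs by default)
print(fivenum(airquality$Ozone))

# Data problems: count missing values per column
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Data problems: values flagged by the 1.5 * IQR boxplot rule
ozone_out <- boxplot.stats(airquality$Ozone)$out
print(ozone_out)

# Relationships: correlation matrix computed on complete rows only
cor_mat <- cor(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")],
               use = "complete.obs")
print(round(cor_mat, 2))
```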


Base Statistical Graphics

Univariate: Histogram, Stem-and-Leaf, Dot, Q-Q, Density plots; Boxplot, Box-and-whisker; Bar, Pie, Polar, Waterfall charts

Bivariate: XY plot, Line, Area, Scatter, Bubble charts

Trivariate: 3D Scatter, Contour, Level/Heatmap, Surface plots


Which Chart to Use?


Indeed, experience matters!


2 Simple Base Graphics


Iris Dataset

Let’s play with the Iris data in RStudio. Refer to the R Markdown source code and HTML output files (reproducible).


Histogram

Options: title(., main, xlab, ylab), hist(., breaks, freq, col); figure layout by par(mfrow/mfcol = c(nr, nc))
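The options above can be combined as in the following minimal sketch, which assumes the built-in `iris` data frame; the bin count and colour are arbitrary choices.

```r
data(iris)

# Arrange four histograms in a 2 x 2 grid, filled row by row (mfcol fills by column)
op <- par(mfrow = c(2, 2))
for (v in names(iris)[1:4]) {
  hist(iris[[v]],
       breaks = 15,          # suggested number of bins (R may adjust it)
       freq = TRUE,          # y-axis shows counts
       col = "lightblue",
       main = "", xlab = "", ylab = "")
  # Titles and axis labels added afterwards with title()
  title(main = paste("Histogram of", v), xlab = v, ylab = "Frequency")
}
par(op)

# hist() also returns the binning, which can be inspected without plotting
h <- hist(iris$Sepal.Length, breaks = 15, plot = FALSE)
print(h$counts)
```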


Histogram with Density Plot

Options: hist(., freq = FALSE); lines(density(.), lty, lwd)
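A minimal sketch of the overlay, assuming `iris$Petal.Length` as the variable; the line type, width and colours are illustrative.

```r
data(iris)
x <- iris$Petal.Length

# freq = FALSE puts the histogram on the density scale (bar areas sum to 1),
# so that the kernel density estimate is drawn on a comparable y-axis
h <- hist(x, freq = FALSE, breaks = 20, col = "grey90",
          main = "Petal.Length: histogram with density", xlab = "Petal.Length")

# Kernel density estimate overlaid as a dashed, thicker line
d <- density(x)
lines(d, lty = 2, lwd = 2, col = "red")
```

With the default `freq = TRUE` the bars would show counts, and the density curve would sit almost flat along the bottom of the plot.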


Boxplot

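Remarks: outliers are the points $x \notin [Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$, where $\mathrm{IQR} = Q_3 - Q_1$

Options: plotting x (vector), X (matrix) and x ~ c (formula with grouping factor c)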
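The outlier rule and the three calling styles can be sketched as follows, assuming the built-in `iris` data. The fences here use `quantile()`, whereas `boxplot()` uses Tukey's hinges, so the two can differ slightly on other datasets.

```r
data(iris)
x <- iris$Sepal.Width

# Fences from the 1.5 * IQR rule
q <- quantile(x, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
outliers <- x[x < lower | x > upper]
print(outliers)

op <- par(mfrow = c(1, 3))
boxplot(x, main = "Vector x")                                 # a single vector
boxplot(as.matrix(iris[, 1:4]), main = "Matrix X", las = 2)   # one box per column
boxplot(Sepal.Width ~ Species, data = iris, main = "x ~ c")   # one box per group
par(op)
```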


Plotting Categorical Variables

Tricks: data selection/subsetting; see UCLA R-site
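The slide's figure is not preserved in this transcript, so the following is only a sketch of the idea, assuming the built-in `iris` data: subset the rows, tabulate a categorical variable, and plot the counts. The threshold, the derived `Petal.Size` variable and its cut points are hypothetical.

```r
df <- iris   # work on a copy

# Data selection/subsetting: rows with sepal length above 5
long_sepal <- df[df$Sepal.Length > 5, ]
# Equivalent: subset(df, Sepal.Length > 5)

counts <- table(long_sepal$Species)
print(counts)

# Derive a second categorical variable by binning a numeric one
df$Petal.Size <- cut(df$Petal.Length, breaks = c(0, 2, 5, 7),
                     labels = c("short", "medium", "long"))
two_way <- table(df$Petal.Size, df$Species)
print(two_way)

op <- par(mfrow = c(1, 3))
barplot(counts, main = "Sepal.Length > 5", ylab = "Count")
pie(counts, main = "Share by species")
barplot(two_way, beside = TRUE, legend.text = TRUE,
        main = "Petal size by species")
par(op)
```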


Relationship Between Variables

Tricks: mathematical annotations in plots; see plotmath.html
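Mathematical annotation uses R's plotmath syntax through `expression()` and `bquote()`. The sketch below assumes a simple linear fit on `iris`; the model, the text positions and the labels are illustrative, not the slide's original figure.

```r
data(iris)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
b  <- round(coef(fit), 2)
r2 <- round(summary(fit)$r.squared, 2)

plot(Petal.Width ~ Petal.Length, data = iris, pch = 19, col = "grey40",
     xlab = "Petal.Length (x)", ylab = "Petal.Width (y)",
     main = expression(hat(y) == hat(beta)[0] + hat(beta)[1] * x))
abline(fit, col = "red", lwd = 2)

# bquote() substitutes computed values into a plotmath expression via .()
text(1, 2.3, adj = 0,
     labels = bquote(hat(y) == .(b[[1]]) + .(b[[2]]) * x))
text(1, 2.0, adj = 0, labels = bquote(R^2 == .(r2)))
```

`expression()` is enough for fixed formulas; `bquote()` is needed when the annotation must include values computed from the data.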


Tricks: color indexing by unclass(DataX$Species); adding legend/text at different locations
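A minimal sketch of the colour-indexing trick, assuming `DataX` is a copy of the built-in `iris` data frame (the slides do not show how `DataX` was created); the palette and the text position are illustrative.

```r
DataX <- iris   # assumption: DataX in the slides holds the iris data

# unclass() strips the factor class, leaving the integer codes 1, 2, 3
idx  <- unclass(DataX$Species)
cols <- c("red", "forestgreen", "blue")

plot(DataX$Petal.Length, DataX$Petal.Width,
     col = cols[idx], pch = 19,
     xlab = "Petal.Length", ylab = "Petal.Width")

# Legend placed by keyword; text placed by data coordinates
legend("topleft", legend = levels(DataX$Species), col = cols, pch = 19, bty = "n")
text(x = 6, y = 0.5, labels = "n = 150", col = "grey30")
```

Because the integer codes follow the order of `levels(DataX$Species)`, using the same `cols` vector in `legend()` keeps the legend consistent with the points.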


Pairwise Scatter Plots

Tricks: color indexing by unclass(DataX$Species); adding legend/text at different locations
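For pairwise scatter plots the same colour index can be passed to `pairs()`. The sketch below again assumes `DataX` is a copy of `iris`; the helper `panel_cor`, which prints the correlation in the upper panels, is a hypothetical addition and not from the slides.

```r
DataX <- iris   # assumption: DataX in the slides holds the iris data
cols  <- c("red", "forestgreen", "blue")

# Hypothetical helper: show the correlation coefficient instead of a scatter plot
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))          # restore the panel's coordinate system
  par(usr = c(0, 1, 0, 1))
  text(0.5, 0.5, format(cor(x, y), digits = 2), cex = 1.5)
}

pairs(DataX[, 1:4],
      col = cols[unclass(DataX$Species)], pch = 19, cex = 0.6,
      upper.panel = panel_cor,
      main = "Iris: pairwise scatter plots")
```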



3 Using Lattice Package


Lattice Package

Sarkar (2008; Springer)

Using trellis graphs for multivariate data

Multipanel conditioning and grouping

Elegant high-level data visualization

Covering most of statistical charts

Figures and code can be found at http://lmdvr.r-forge.r-project.org/

Plot customization is not straightforward


Univariate Distributions with Conditioning and Grouping

Refer to the R Markdown files for source code and outputs (reproducible)
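The R Markdown files are not part of this transcript, so the following is only a sketch of conditioning and grouping in lattice, assuming the built-in `iris` data. Conditioning (the variable after `|`) gives one panel per level; grouping (`groups =`) superposes the levels within one panel.

```r
library(lattice)
data(iris)

# Conditioning: one histogram panel per species
p1 <- histogram(~ Sepal.Length | Species, data = iris,
                layout = c(3, 1), xlab = "Sepal.Length")

# Grouping: one density curve per species in a single panel, with a key
p2 <- densityplot(~ Sepal.Length, data = iris, groups = Species,
                  plot.points = FALSE, auto.key = list(columns = 3))

# Box-and-whisker plot of one variable split by a factor
p3 <- bwplot(Species ~ Sepal.Length, data = iris)

# Lattice functions return "trellis" objects; print() draws them
print(p1)
print(p2)
print(p3)
```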


Exploring Bivariate Relationships
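The figure for this slide is not preserved. As a hedged illustration of what lattice offers for bivariate data, the sketch below uses `xyplot()` and `splom()` on the built-in `iris` data; the variables and layout are assumptions.

```r
library(lattice)
data(iris)

# Conditioned scatter plots: points plus a regression line in each panel
p_cond <- xyplot(Petal.Width ~ Petal.Length | Species, data = iris,
                 type = c("p", "r"), layout = c(3, 1))

# Grouped scatter plot: species distinguished by colour in one panel
p_grp <- xyplot(Petal.Width ~ Petal.Length, data = iris, groups = Species,
                auto.key = list(space = "top", columns = 3))

# Scatter-plot matrix of all four measurements
p_splom <- splom(~ iris[, 1:4], groups = Species, data = iris)

print(p_cond)
print(p_grp)
print(p_splom)
```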


Trivariate Heatmap and 3D Plots
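The original figures are not preserved here. A minimal sketch of trivariate displays in lattice follows, assuming the built-in `volcano` elevation matrix and `iris` as example data; the aspect ratio and the other settings are illustrative.

```r
library(lattice)
data(volcano)   # built-in 87 x 61 matrix of elevations
data(iris)

# Heatmap (level plot): colour encodes the third variable
p_level <- levelplot(volcano, xlab = "row", ylab = "column")

# Contour plot of the same surface
p_contour <- contourplot(volcano, cuts = 10, xlab = "row", ylab = "column")

# 3D surface
p_wire <- wireframe(volcano, shade = TRUE, aspect = c(61 / 87, 0.4),
                    xlab = "row", ylab = "column", zlab = "height")

# 3D scatter plot with grouping
p_cloud <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data = iris,
                 groups = Species, auto.key = list(columns = 3))

print(p_level)
print(p_contour)
print(p_wire)
print(p_cloud)
```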
