Post on 19-Jan-2016
Part 1 – Data Presentation
Statistics and Data Analysis
[Title slide figures:
 Pie Chart of Percent vs Type (Pepperoni 21.8%, Plain 32.5%, Mushroom 16.2%, Sausage 5.8%, Pepper and Onion 7.3%, Mushroom and Onion 9.2%, Garlic 2.3%, Meatball 5.0%);
 Boxplot of Listing;
 Scatterplot of Listing vs IncomePC;
 Probability Plot of Listing, Normal - 95% CI (Mean 369687, StDev 156865, N 51, AD 0.994, P-Value 0.012);
 Histogram of Listing (Frequency vs Listing);
 Empirical CDF of Listing, Normal (Mean 369687, StDev 156865, N 51);
 Marginal Plot of Listing vs IncomePC]
Data Presentation Agenda
Data and Data Types
Representing Data: pie chart, bar chart
Summarizing Data: box plot, histogram
  Central tendency
  Spread
  Distribution (shape)
1/29
Data = A Set of Facts
A picture of some aspect of the world
Pizza Sales by Type
2/29
What do the data tell you?
How can you use the information?
What additional information would make these data more informative?
A More Complicated Set of Facts: What story do the data tell?
3/29
Data Types and Measurement
Univariate vs. Multivariate
Quantitative
  Discrete = count: Number of shootings by city by time
  Continuous = measurement: Housing prices
Qualitative
  Categorical: Shopping mall, car brand, trip mode
  Ordinal: Survey data on attitudes; “How do you feel about…?” Strongly disagree, Disagree, Neutral, Agree, Strongly agree
  Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on.
Frameworks
  Cross section
  Time series
  Longitudinal
4/29
Univariate vs. Multivariate
Univariate: Count of pizzas is the single variable.
Multivariate: Numerous Variables
5/29
Discrete Data – US Crime Statistics; Counts of Occurrences.
6/29
Continuous Data: Housing Prices and Incomes
7/29
Unordered Qualitative Data: Travel Mode by 210 Travelers*
8/29
* Note: Not computed with Minitab
Ordered Qualitative Data: German Health Satisfaction Survey; 5,831 Women. On a scale from 0 to 10, how do you feel about your health?*
HEALTH SATISFACTION, N = 5831
Response  Frequency
==================
   0          97
   1          52
   2         147
   3         287
   4         346
   5         935
   6         631
   7         924
   8        1329
   9         626
  10         457
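Because the responses are ordered, the mode and the median are meaningful summaries of this table. A minimal sketch, using only the frequencies on the slide; the helper name `ordinal_median` is an illustrative choice, not something from the course materials:

```python
# Frequency table copied from the slide: response (0-10) -> count
freq = {0: 97, 1: 52, 2: 147, 3: 287, 4: 346, 5: 935,
        6: 631, 7: 924, 8: 1329, 9: 626, 10: 457}

n = sum(freq.values())


def ordinal_median(table):
    """Return the response held by the middle observation of the ordered sample.

    For an even total this picks the upper of the two middle observations.
    """
    total = sum(table.values())
    middle = (total + 1) / 2  # position of the median observation
    running = 0
    for response in sorted(table):
        running += table[response]
        if running >= middle:
            return response


mode = max(freq, key=freq.get)   # most frequent response
median = ordinal_median(freq)    # response at the middle position

print(n, mode, median)
```

Neither summary requires treating the gap between, say, 3 and 4 as equal to the gap between 8 and 9, which is why they suit ordinal data.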
9/29
* Note: Not computed with Minitab
Problems with Ordered Survey Response Data
Safety  Count  Percent  Cum Pct
  1      17     27.87    27.87
  2      15     24.59    52.46
  3      17     27.87    80.33
  4      10     16.39    96.72
  5       2      3.28   100.00
61 Stern Students’ Ranking of Subway Safety (1994)*
Response labels: Very Unsatisfactory, Unsatisfactory, OK, Satisfactory, Very Satisfactory
Jeff Simonoff: Data Presentation and Summary, pp. 3-4
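The Percent and Cum Pct columns of the subway safety table follow directly from the counts. A short sketch that rebuilds them; the variable names are illustrative, and only the five counts come from the slide:

```python
# Counts copied from the slide: safety rating (1-5) -> number of students
counts = {1: 17, 2: 15, 3: 17, 4: 10, 5: 2}
total = sum(counts.values())

rows = []
running = 0
for rating in sorted(counts):
    running += counts[rating]
    percent = round(100 * counts[rating] / total, 2)  # share of all responses
    cum_pct = round(100 * running / total, 2)         # share at or below this rating
    rows.append((rating, counts[rating], percent, cum_pct))

for rating, count, percent, cum_pct in rows:
    print(f"{rating:>6} {count:>6} {percent:>8.2f} {cum_pct:>8.2f}")
```

Percentages and cumulative percentages rely only on the ordering of the ratings. Averaging the codes 1 to 5 would additionally assume the categories are equally spaced, which the survey does not guarantee.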
10/29
Quantitative vs. Qualitative Data
Qualitative Data:
  No units of measurement
  Arithmetic manipulation is usually meaningless. The average of Air and Bus is not Train.
Quantitative Data:
  Units of measurement make sense.
  Arithmetic computations make sense.
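The distinction shows up in which summaries can be computed at all. A small sketch with made-up values: the travel modes and prices below are hypothetical, invented for illustration, and are not the 210-traveler or housing data from the slides.

```python
from collections import Counter
from statistics import mean

# Hypothetical qualitative sample: labels only, no units
modes = ["Air", "Bus", "Train", "Car", "Car", "Train", "Car", "Air"]

# Hypothetical quantitative sample: housing prices in dollars
prices = [212000, 305000, 398000]

# Qualitative data: counting categories is meaningful, averaging them is not
mode_counts = Counter(modes)
most_common_mode, most_common_count = mode_counts.most_common(1)[0]

# Quantitative data: arithmetic such as the mean makes sense
average_price = mean(prices)

print(most_common_mode, most_common_count, average_price)
```

For the labels, the natural summaries are counts, shares, and the most frequent category; for the prices, sums and averages carry the unit (dollars) along with them.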
11/29
Cross Section Data: Housing Prices and Incomes
13/29
Time Series Data: Car Thefts
14/29
Longitudinal Data: 3 Year Survey: Satisfaction on a scale from 0 to 5.
15/29
Representing Data
In raw form
Transformed to a visual form
Summarized graphically
Summarized statistically
16/29
Housing Prices and Incomes
17/29
Housing Price Data: Visual Representation
www.trulia.com/home_prices/
18/29
Pie Chart
[Figure: Pie Chart of Percent vs Type. Pepperoni 21.8%, Plain 32.5%, Mushroom 16.2%, Sausage 5.8%, Pepper and Onion 7.3%, Mushroom and Onion 9.2%, Garlic 2.3%, Meatball 5.0%]
Pizza Pies Sold, by Type
19/29
Data Representation
[Figure, pie chart: Pie Chart of Percent vs Type]
Same data. Which is easier to understand?
20/29
[Figure, bar chart: Chart of Number vs Type (x-axis: Type, with bars for Meatball, Garlic, Mushroom and Onion, Pepper and Onion, Sausage, Mushroom, Plain, Pepperoni; y-axis: Number, 0 to 4000)]
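One way to see why a bar chart can be easier to read is to rank the categories and draw the bars as text. A rough sketch using the percentages from the pizza pie chart; the one-`#`-per-percentage-point scale is an arbitrary choice made here for illustration:

```python
# Percent of pies sold by type, copied from the pie chart
shares = {
    "Pepperoni": 21.8,
    "Plain": 32.5,
    "Mushroom": 16.2,
    "Sausage": 5.8,
    "Pepper and Onion": 7.3,
    "Mushroom and Onion": 9.2,
    "Garlic": 2.3,
    "Meatball": 5.0,
}

# Sort from largest to smallest share so the ranking is visible at a glance
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

for name, pct in ranked:
    bar = "#" * round(pct)  # roughly one character per percentage point
    print(f"{name:<20}{bar} {pct:.1f}%")

# The published percentages are rounded, so they need not sum to exactly 100
total = sum(shares.values())
print(f"Total: {total:.1f}%")
```

Bar lengths share a common baseline, so small differences (Sausage vs. Meatball, for instance) are easier to judge than the corresponding pie slices.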
A Box Plot Describes the Distribution of Values in a Set of Data
[Figure: Boxplot of Listing (y-axis: Listing, 100000 to 900000; annotation on the plot: Hawaii)]
Average House Listing Price by State
21/29
What is an outlier? Why do we believe a particular point is an outlier?
Box and Whisker Plot for House Price Listings
Making a Box Plot for Per Capita Income
Maximum = 31136
Median = 22610
Minimum = 17043
1st Quartile = 21677 (approx.)
3rd Quartile = 24933 (approx.)
Interquartile Range = IQR = 24933 - 21677 = 3256
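The five numbers above are enough to sketch the box plot and to check for outliers. The example below assumes the common 1.5 × IQR convention for the fences; the slide itself does not state which rule is used, so treat that multiplier as an assumption.

```python
# Five-number summary for per capita income, copied from the slide
minimum, q1, median, q3, maximum = 17043, 21677, 22610, 24933, 31136

iqr = q3 - q1  # interquartile range: width of the box

# Assumed convention: points beyond 1.5 * IQR from the box are flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

min_is_outlier = minimum < lower_fence
max_is_outlier = maximum > upper_fence

print(f"IQR = {iqr}")
print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Minimum flagged: {min_is_outlier}, maximum flagged: {max_is_outlier}")
```

Under this rule the maximum lies above the upper fence and would be drawn as a separate point, while the minimum stays inside the lower fence and simply ends the lower whisker.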
22/29
A Frequency Distribution
24/29
Histogram for House Price Listings
[Figure: Histogram of Listing. x-axis: Listing (200,000 to 900,000); y-axis: Frequency (0 to 14)]
HOG, pp. 16-18
A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.
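As a minimal sketch of how a histogram's bars are built, the Python snippet below sorts values into equal-width bins and prints the counts as a text histogram. The listing values, the bin settings, and the function name `frequency_distribution` are illustrative assumptions; they are not the data behind the Minitab figure.

```python
# Hypothetical listing prices in dollars (illustrative, not the slide's data).
listings = [210_000, 235_000, 250_000, 265_000, 280_000, 295_000, 310_000,
            330_000, 345_000, 360_000, 390_000, 420_000, 480_000, 560_000,
            690_000, 880_000]


def frequency_distribution(values, start, width, n_bins):
    """Count values per bin [start + k*width, start + (k+1)*width).

    Values that fall outside the overall range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


START, WIDTH, N_BINS = 200_000, 100_000, 7
counts = frequency_distribution(listings, START, WIDTH, N_BINS)

# Print one row per bin; the bar length is the frequency.
for k, c in enumerate(counts):
    lo = START + k * WIDTH
    print(f"{lo:>7,} to {lo + WIDTH:>7,} | {'#' * c} ({c})")
```

With these made-up values most observations land in the two lowest bins and a few stretch far to the right, which is the kind of skewed shape the slide points out.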
Distribution of House Price Listings
[Figure: Histogram of Listing. x-axis: Listing (200,000 to 900,000); y-axis: Frequency (0 to 14)]
[Figure: Box plot, Average House Listing Price by State. y-axis: Listing (100,000 to 900,000)]
Asymmetry (skewness) in the histogram of listing prices also shows up in the box and whisker plot. Note the long whisker at the top of the figure.
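The link between skewness and whisker length can be sketched in Python. The snippet below computes the quartiles and whisker ends for a right-skewed sample, using the common convention that whiskers reach the most extreme observations within 1.5 times the interquartile range of the box. The data and the function name `box_plot_summary` are hypothetical, and quartile rules differ slightly between software packages, so the numbers need not match what Minitab would draw.

```python
import statistics

# Hypothetical listing prices in dollars (illustrative, not the slide's data).
listings = [210_000, 235_000, 250_000, 265_000, 280_000, 295_000, 310_000,
            330_000, 345_000, 360_000, 390_000, 420_000, 480_000, 560_000,
            690_000, 880_000]


def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Whiskers stop at the most extreme observations inside the fences.
    lower_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    upper_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    outliers = [v for v in data if v < lower_whisker or v > upper_whisker]
    return {"q1": q1, "median": median, "q3": q3,
            "lower_whisker": lower_whisker, "upper_whisker": upper_whisker,
            "outliers": outliers}


summary = box_plot_summary(listings)
print("lower whisker length:", summary["q1"] - summary["lower_whisker"])
print("upper whisker length:", summary["upper_whisker"] - summary["q3"])
print("outliers:", summary["outliers"])
```

For a right-skewed sample like this one, the upper whisker comes out longer than the lower one, mirroring the long right tail of the histogram.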
More than One Group in a Histogram*
NF = 14243
NM = 13083
* Note: Not computed with Minitab
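The two groups on this slide differ in size (NF = 14243, NM = 13083), so raw bin counts are not directly comparable. One common approach, which is not necessarily how the slide's figure was produced, is to compare relative frequencies: each group's bin counts divided by that group's size. The Python sketch below uses small hypothetical groups and a hypothetical helper `relative_frequencies`.

```python
def relative_frequencies(values, start, width, n_bins):
    """Share of a group's observations in each equal-width bin.

    Values outside the overall range are not counted in any bin.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [c / len(values) for c in counts]


# Hypothetical measurements for two groups of unequal size.
group_a = [1.2, 1.8, 2.1, 2.4, 2.6, 2.9, 3.3, 3.7]
group_b = [2.2, 2.8, 3.1, 3.6]

rel_a = relative_frequencies(group_a, start=1.0, width=1.0, n_bins=3)
rel_b = relative_frequencies(group_b, start=1.0, width=1.0, n_bins=3)

# Side-by-side shares per bin put both groups on the same scale.
for k, (a, b) in enumerate(zip(rel_a, rel_b)):
    lo = 1.0 + k
    print(f"[{lo:.1f}, {lo + 1.0:.1f}): group A {a:.2f}   group B {b:.2f}")
```

Using shares rather than counts keeps the larger group from dominating the picture simply because it has more observations.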
Summary
What story does the data presentation tell?
• Data in raw form tell no story.
• Visual representation of data tells something about the data.
Data reduction and summary representation: What do we learn?
• Location
• Spread
• Shape of the distribution
What tool is most informative?
• Reduction to a small number of features
• Visual displays of data
  • Pie chart
  • Box and whisker plots
  • Histograms
  • Time series plots
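The "reduction to a small number of features" can be illustrated with a short Python sketch that reports one number each for location, spread, and shape. The data are hypothetical, the function name `describe` is an assumption, and the moment-based skewness coefficient used here is only one of several definitions in use.

```python
import statistics

# Hypothetical listing prices in dollars (illustrative, not the slide's data).
listings = [210_000, 235_000, 250_000, 265_000, 280_000, 295_000, 310_000,
            330_000, 345_000, 360_000, 390_000, 420_000, 480_000, 560_000,
            690_000, 880_000]


def describe(values):
    """Summarize location (mean, median), spread (stdev), and shape (skewness)."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Moment coefficient of skewness: third central moment over the
    # second central moment raised to the power 3/2.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5
    return {"mean": mean, "median": median, "stdev": stdev, "skewness": skewness}


summary = describe(listings)
for name, value in summary.items():
    print(f"{name:>8}: {value:,.3f}")
```

A positive skewness together with a mean above the median signals a long right tail, the same feature the histogram and box plot show visually.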
“There are lies, damned lies and statistics.” (Benjamin Disraeli)