Data Analysis, Presentation, and Statistics
Fr Clinic I
Overview
• Tables and Graphs
• Populations and Samples
• Mean, Median, and Standard Deviation
• Standard Error & 95% Confidence Interval (CI)
• Error Bars
• Comparing Means of Two Data Sets
• Linear Regression (LR)
Warning
• Statistics is a huge field, and I’ve simplified considerably here. For example:
  – Mean, Median, and Standard Deviation
    • There are alternative formulas
  – Standard Error and the 95% Confidence Interval
    • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean…)
  – Error Bars
    • Don’t go beyond the interpretations I give here!
  – Linear Regression
    • We only look at simple LR and only calculate the intercept, slope, and R². There is much more to LR!
Should I Use a Table or Graph?
• Tables
  – Presenting a large amount of different data
  – Comparing multiple characteristics
• Graphs
  – Visual presentation quickly gives information
  – Compare one or two characteristics
  – Showing trends
Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

Water        Turbidity (NTU)   True Color (Pt-Co)   Apparent Color (Pt-Co)
(1)          (2)               (3)                  (4)
Pond Water   10                13                   30
Sweetwater   4                 5                    12
Hiker        3                 8                    11
MiniWorks    2                 3                    5
Standard     5 (a)             15                   15

(a) Level at which humans can visually detect turbidity

Consistent Format, Title, Units, Big Fonts
Differentiate Headings, Number Columns
Figures
[Figure 1: Turbidity of Pond Water, Treated and Untreated. Bar chart; x axis: Filter (Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager); y axis: Turbidity (NTU), 0 to 25.]
Consistent Format, Title, Units
Good Axis Titles, Big Fonts
Graphing Suggestions
• 1, 2, 5 rule
  – Set gradations so the smallest division of the axis is an integer power of 10 times 1, 2, or 5.
• Huh?
• Set up your scale so that the smallest division is a round increment: 1, 2, or 5 times a power of 10 (e.g., 0.2, 5, or 100).
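As a rough illustration of the 1, 2, 5 rule, the sketch below picks the smallest 1-2-5 step that keeps an axis to a chosen number of divisions. The function name `nice_step` and the default of 10 divisions are assumptions made for this example, not something the slides prescribe.

```python
import math

def nice_step(axis_range, max_divisions=10):
    """Return the smallest step of the form 1, 2, or 5 times a power of 10
    that splits axis_range into at most max_divisions divisions."""
    if axis_range <= 0 or max_divisions < 1:
        raise ValueError("axis_range must be positive and max_divisions at least 1")
    raw_step = axis_range / max_divisions
    exponent = math.floor(math.log10(raw_step))
    # Candidate steps in the current decade and the next one up.
    candidates = [m * 10.0 ** e for e in (exponent, exponent + 1) for m in (1, 2, 5)]
    # The tiny tolerance keeps an exact match (e.g. 2000) from being skipped.
    return min(step for step in candidates if step >= raw_step * (1 - 1e-12))

print(nice_step(20000))  # load axis spanning 0 to 20000 pounds
print(nice_step(1.25))   # deflection axis spanning 0 to 1.25 inches
```

Under these assumptions the load axis gets 2000-pound divisions and the deflection axis gets 0.2-inch divisions.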
Graphing Suggestions
• Labels
  – All axes should be labeled
  – Include units on the label
• Points, lines, curves
  – Play around with options
  – Color can be your friend
  – Color can be your enemy
[Chart: "Trans #1". No axis titles or units; x axis from -0.2 to 1; y axis from -5000 to 20000.]
[Chart: "Deflection of Beam 1 vs. applied load". x axis: Deflection (inches), 0 to 1.25; y axis: Load (pounds), 0 to 20000.]
[Chart: "Comparison of Beam Deflections". Series: Beam 1, Beam 2, Beam 3; x axis: Deflection (inches), 0 to 1.25; y axis: Load (pounds), 0 to 25000.]
Populations and Samples
• Population
  – All of the possible outcomes of an experiment or observation
    • US population
    • Particular type of steel beam
• Sample
  – A finite number of outcomes measured or observations made
    • 1000 US citizens
    • 5 beams
• We use samples to estimate population properties
  – Mean, Variability (e.g., standard deviation), Distribution
    • Height of 1000 US citizens used to estimate the mean height of the US population
Mean and Median
• Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10
• Mean = Sum of values divided by number of samples
  = (1 + 3 + 3 + 6 + 8 + 10)/6 = 5.2 NTU
• Median = The middle number
  Rank:    1  2  3  4  5  6
  Number:  1  3  3  6  8  10
  For an even number of sample points, average the middle two
  = (3 + 6)/2 = 4.5 NTU
Excel: Mean – AVERAGE; Median – MEDIAN
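For readers working outside Excel, here is a minimal sketch of the same calculation using Python's standard `statistics` module. The variable names are chosen for this example only.

```python
import statistics

# Turbidity of treated water (NTU), from the slide.
turbidity_ntu = [1, 3, 3, 6, 8, 10]

mean_ntu = statistics.mean(turbidity_ntu)      # sum of values / number of samples
median_ntu = statistics.median(turbidity_ntu)  # averages the middle two for even n

print(f"mean   = {mean_ntu:.1f} NTU")    # mean   = 5.2 NTU
print(f"median = {median_ntu:.1f} NTU")  # median = 4.5 NTU
```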
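Variance
• Measure of variability
  – Sum of the squares of the deviations about the mean, divided by the degrees of freedom

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$

n = number of data points; x_i = individual data values; \bar{x} = sample mean
Excel: variance – VAR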
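The variance formula can be checked step by step. The sketch below reuses the six turbidity readings from the Mean and Median slide (an assumption made purely for illustration) and compares the hand calculation with `statistics.variance`, which also divides by n - 1.

```python
import statistics

values = [1, 3, 3, 6, 8, 10]  # turbidity readings (NTU), reused for illustration

n = len(values)
mean = sum(values) / n
squared_deviations = [(x - mean) ** 2 for x in values]

# Divide by the degrees of freedom (n - 1), not by n.
sample_variance = sum(squared_deviations) / (n - 1)

print(f"by hand:    {sample_variance:.4f}")
print(f"statistics: {statistics.variance(values):.4f}")
```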
Standard Deviation, s
• Square root of the variance

$$ s = \sqrt{s^2} $$

• For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
• Area under the curve is the probability of getting a value within the specified range
[Figure: Normal Distribution. x axis: Standard Deviations from Mean, -4 to 4; the area between -1.96 and 1.96 is 95%.]
Excel: standard deviation – STDEV
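The 1.96 figure can be verified numerically. This is a small sketch using `statistics.NormalDist` (Python 3.8 or later). The sample data are the turbidity readings from the Mean and Median slide, reused here as an assumption for illustration.

```python
from statistics import NormalDist, stdev

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the bell curve between -1.96 and +1.96 standard deviations.
fraction_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the z value that leaves 2.5% in each tail.
z_95 = standard_normal.inv_cdf(0.975)

# Sample standard deviation (square root of the n - 1 variance).
sample_sd = stdev([1, 3, 3, 6, 8, 10])

print(f"area within +/-1.96 sd: {fraction_within:.4f}")
print(f"z for 95%:              {z_95:.3f}")
print(f"sample s:               {sample_sd:.2f} NTU")
```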
Standard Error of Mean
• Standard deviation of the mean
  – Of a sample of size n
  – Taken from a population with standard deviation s

$$ s_{\bar{X}} = \frac{s}{\sqrt{n}} $$

  – Estimate of mean depends on sample selected
  – As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves
  – As n increases, mean estimate distribution approaches normal, regardless of population distribution
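A minimal sketch of the standard error calculation follows. It uses the three Filter 2 readings that appear in Example 1 of this deck. The helper name `standard_error` is an assumption for this example.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    return statistics.stdev(values) / math.sqrt(n)

filter_2_ntu = [3.2, 4.4, 5.0]  # readings from Example 1
se = standard_error(filter_2_ntu)
print(f"standard error = {se:.2f} NTU")  # standard error = 0.53 NTU
```

Because n sits under a square root, quadrupling the number of readings only halves the standard error.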
95% Confidence Interval (CI) for Mean
• Interval within which we are 95% confident the true mean lies

$$ \bar{X} \pm t_{95\%,\,n-1}\; s_{\bar{X}} $$

• t95%,n-1 is the t-statistic for a 95% CI if sample size = n
  – If n ≥ 30, let t95%,n-1 = 1.96 (Normal Distribution)
  – Otherwise, use Excel formula: TINV(0.05,n-1)
• n = number of data points
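The sketch below shows one way to compute the interval without Excel. To stay within the standard library it hard-codes the usual two-sided 95% t values for 1 to 28 degrees of freedom, and it falls back to 1.96 for samples of 30 or more, following the shortcut on this slide. The function names are assumptions for this example.

```python
import math
import statistics

# Two-sided 95% t values for 1..28 degrees of freedom (standard t table).
T_95_TWO_SIDED = [
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048,
]

def t_95(n):
    """t statistic for a 95% CI with sample size n (n - 1 degrees of freedom)."""
    if n < 2:
        raise ValueError("need at least two data points")
    if n >= 30:
        return 1.96  # normal approximation, as on the slide
    return T_95_TWO_SIDED[n - 2]  # index 0 holds the value for 1 degree of freedom

def confidence_interval_95(values):
    """Return (mean, half_width) so that the CI is mean +/- half_width."""
    n = len(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values), t_95(n) * std_error

mean, half_width = confidence_interval_95([2.1, 2.1, 2.2])  # Filter 1, Example 1
print(f"{mean:.2f} +/- {half_width:.2f} NTU")  # 2.13 +/- 0.14 NTU
```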
Error Bars
• Show data variability on plot of mean values
• Types of error bars include:
  – ± Standard Deviation, ± Standard Error, ± 95% CI
  – Maximum and minimum value
[Figure: bar chart with error bars. x axis: Filter Type (Filter 1, Filter 2, Filter 3); y axis: Turbidity (NTU), 0 to 10.]
Using Error Bars to compare data
• Standard Deviation
  – Demonstrates data variability, but no comparison possible
• Standard Error
  – If bars overlap, any difference in means is not statistically significant
  – If bars do not overlap, indicates nothing!
• 95% Confidence Interval
  – If bars overlap, indicates nothing!
  – If bars do not overlap, difference is statistically significant
• We’ll use 95% CI
Example 1: Turbidity Data
(Values in NTU; n and t95%,2 are unitless)

           1     2     3     Mean   St Dev   n   St Error   t95%,2   ± 95% CI
Filter 1   2.1   2.1   2.2   2.1    0.06     3   0.03       4.30     0.14
Filter 2   3.2   4.4   5.0   4.2    0.92     3   0.53       4.30     2.28
Filter 3   4.3   4.2   4.5   4.3    0.15     3   0.09       4.30     0.38

[Figure: bar chart of mean turbidity. x axis: Portable Water Filter (Filter 1, Filter 2, Filter 3); y axis: Turbidity (NTU), 0 to 7; bar labels 2.1, 4.2, 4.3.]
Create a bar chart of Name vs. Mean. Right-click on the data. Select “Format Data Series”.
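The 95% CI overlap rule can be applied to the Example 1 numbers directly. The sketch below takes the rounded means and ± 95% CI values from the table and labels each pair of filters. The names `ci_by_filter` and `bars_overlap` are assumptions for this example.

```python
from itertools import combinations

# (mean, 95% CI half-width) in NTU, copied from the Example 1 table.
ci_by_filter = {
    "Filter 1": (2.1, 0.14),
    "Filter 2": (4.2, 2.28),
    "Filter 3": (4.3, 0.38),
}

def bars_overlap(a, b):
    """True if the intervals mean +/- half_width of a and b share any value."""
    (mean_a, half_a), (mean_b, half_b) = a, b
    return (mean_a - half_a) <= (mean_b + half_b) and (mean_b - half_b) <= (mean_a + half_a)

verdicts = {}
for (name_a, a), (name_b, b) in combinations(ci_by_filter.items(), 2):
    # Overlapping 95% CI bars tell us nothing; separated bars indicate significance.
    verdicts[(name_a, name_b)] = "indicates nothing" if bars_overlap(a, b) else "significant"

for pair, verdict in verdicts.items():
    print(pair, "->", verdict)
```

Only Filter 1 and Filter 3 have bars that do not overlap, so theirs is the only difference the rule calls statistically significant. The wide Filter 2 interval overlaps both of the others.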
Example 2: Turbidity Measurements
(Values in NTU; n and t95%,2 are unitless)

Time (min)   1     2     3     Mean   St Dev   n   St Error   t95%,2   ± 95% CI
1            4.3   4.5   4.6   4.5    0.15     3   0.09       4.30     0.38
2            4.4   4.4   4.5   4.4    0.06     3   0.03       4.30     0.14
3            4.3   4.2   4.2   4.2    0.06     3   0.03       4.30     0.14

[Figure: turbidity vs. time. x axis: Time (min), 0 to 4; y axis: Turbidity (NTU), 0 to 6.]
Linear Regression
• Fit the best straight line to a data set
[Figure: scatter plot with fitted trendline. x axis: Height (m), 0 to 12; y axis: Grade Point Average, 0 to 25. Trendline: y = 1.897x + 0.8667, R² = 0.9762.]
Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².
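For reference, the slope and intercept that a trendline reports can be computed with the ordinary least-squares formulas. The sketch below is a minimal version. The data points are hypothetical (they are not the data behind the slide's plot), and `fit_line` is a name chosen for this example.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean  # the fitted line passes through the means
    return slope, intercept

# Hypothetical measurements, for illustration only.
x_data = [1, 2, 3, 4, 5]
y_data = [2.9, 5.1, 7.0, 9.2, 10.8]

slope, intercept = fit_line(x_data, y_data)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```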
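R² – Coefficient of Multiple Determination

$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} $$

ŷi = Predicted y values, from regression equation
yi = Observed y values
ȳ = Mean of the observed y values
R² = fraction of variance explained by regression (variance = standard deviation squared)
= 1 if data lie along a straight line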
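To make the definition concrete, the sketch below computes R² as one minus the ratio of residual variation to total variation. The observed values are hypothetical, and the predictions come from an assumed fitted line y = 1.99x + 1.03 (the least-squares line for these made-up points). The function name `r_squared` is also an assumption.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - sum((y - y_hat)^2) / sum((y - y_mean)^2)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    y_mean = sum(observed) / len(observed)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_total = sum((y - y_mean) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values are all identical; R^2 is undefined")
    return 1 - ss_residual / ss_total

# Hypothetical data and the assumed fitted line y = 1.99x + 1.03.
x_data = [1, 2, 3, 4, 5]
y_observed = [2.9, 5.1, 7.0, 9.2, 10.8]
y_predicted = [1.99 * x + 1.03 for x in x_data]

r2 = r_squared(y_observed, y_predicted)
print(f"R^2 = {r2:.4f}")

# Points lying exactly on the line leave no residual, so R^2 is 1.
r2_perfect = r_squared(y_predicted, y_predicted)
```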