ch3 elementary descriptivech3 elementary descriptive ...mduan/stat3411/ch3.pdf · ch3 elementary...

Ch3 Elementary Descriptive Statistics

Post on 19-Jul-2020


Page 1: Ch3 Elementary DescriptiveCh3 Elementary Descriptive ...mduan/stat3411/ch3.pdf · Ch3 Elementary DescriptiveCh3 Elementary Descriptive Statistics. Section 3.1: Elementary Graphical

Ch3 Elementary Descriptive Statistics

Page 2:

Section 3.1: Elementary Graphical Treatment of Data

Before doing ANYTHING with data:

• Understand the question.
– An approximate answer to the exact question is always better than an exact answer to an approximate question. John Tukey.

• Know how the experiment was conducted.

Page 3:

The FIRST thing to do with the data is to PLOT THE DATA

– Plot all individual points.

– If there are connections between points, e.g. points are from the same pairs (or sometimes separate blocks), show connections between related points.

Page 4:

Plotting data is an extremely important step.

• More often than not, data I get when consulting have problems like incorrect data or attributes they didn't tell me about.

• Plotting helps reveal relationships and answers.

• Plotting is a very effective way to present results.

– “A picture is worth a thousand words.”

Page 5:

Example: 8 lb. test fishing line
Question: Which type(s) of line are strongest?

Listing numerical data

Trilene XL   11.5 11.3 11.7 11.6 11.7 11.4 11.5 11.5 11.6 11.4
Trilene XT   11.6 11.8 11.7 11.7 11.5 11.6 11.6 11.8 11.5 11.7
Stren        11.1 11.1 11.2 11.0 11.1 11.3 11.2 10.9 11.0 11.1

It’s hard to see what’s happening without organizing the data.

Page 6:

A "dot" diagram

        XL     XT     Stren
11.8           **
11.7    **     ***
11.6    **     ***
11.5    ***    **
11.4    **
11.3    *             *
11.2                  **
11.1                  ****
11.0                  **
10.9                  *
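The tallies behind a dot diagram can be produced mechanically. The Python sketch below is an illustration added here, not part of the original slides: it counts how often each strength value occurs for each line type in the fishing line data and prints one row of stars per value. The variable names (`strengths`, `counts`, `levels`) are assumptions.

```python
from collections import Counter

# Strength measurements from the fishing line example.
strengths = {
    "XL": [11.5, 11.3, 11.7, 11.6, 11.7, 11.4, 11.5, 11.5, 11.6, 11.4],
    "XT": [11.6, 11.8, 11.7, 11.7, 11.5, 11.6, 11.6, 11.8, 11.5, 11.7],
    "Stren": [11.1, 11.1, 11.2, 11.0, 11.1, 11.3, 11.2, 10.9, 11.0, 11.1],
}

# One tally of value -> frequency per line type.
counts = {name: Counter(values) for name, values in strengths.items()}

# Every distinct value observed in any group, largest first.
levels = sorted({v for values in strengths.values() for v in values}, reverse=True)

print("      " + "".join(f"{name:<7}" for name in strengths))
for level in levels:
    # Counter returns 0 for values a group never took, giving an empty cell.
    stars = "".join(f"{'*' * counts[name][level]:<7}" for name in strengths)
    print(f"{level:<6.1f}{stars}".rstrip())
```

Printing the groups side by side on a shared scale is what makes the comparison visible: the Stren column sits almost entirely below the two Trilene columns.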

Page 7:

Two groups can be compared with back to back stem and leaf diagrams.
E.g. Stopping distances of bikes

Treaded tire        Smooth tire
           | 34 | 1 8 9
           | 35 | 5
         5 | 36 |
       6 4 | 37 | 5
           | 38 |
         1 | 39 | 1
       2 0 | 40 |

Or dot diagrams

 |     |     |  *  | **  |     |  *  |**     Treaded
340   350   360   370   380   390   400
 |***  |  *  |     |  *  |     |  *  |       Smooth

Page 8:

When there are associations between sets of data values, plot the data accordingly.

E.g., snowfall for Duluth and White Bear Lake, 1972‐2000

A not very good way to plot the data

[Figure: back-to-back dot diagram of snowfall totals on a scale from 20 to 130, White Bear Lake on the left and Duluth on the right.]

Page 9:

Snowfall plot

[Figure: snow_total (axis 0 to 140) versus year (axis 1972 to 1997), with one series for Duluth and one for White Bear.]

Page 10:

A study of trace metals in South Indian River

[Figure: sampling locations labeled 1 to 6.]

T = top water zinc concentration (mg/L)
B = bottom water zinc concentration (mg/L)

Site     1      2      3      4      5      6
Top      0.415  0.238  0.390  0.410  0.605  0.609
Bottom   0.430  0.266  0.567  0.531  0.707  0.716

Page 11:

• One of the first things to do when analyzing data is to PLOT the data.

[Figure: Zinc (axis 0 to 0.8) plotted separately for Top and Bottom.]

• This is not a useful way to plot the data. There is not a clear distinction between bottom water and top water zinc, even though Bottom > Top at all 6 locations.

Page 12:

A better way

[Figure: Zinc (axis 0.2 to 0.7) for Top and Bottom.]

Connect points in the same pair.
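Connecting the pairs amounts to looking at the within-site differences. The short Python sketch below is an added illustration (not from the slides) that computes Bottom minus Top for each of the six sites in the zinc table; the variable names are assumptions.

```python
# Zinc concentrations (mg/L) at sites 1 to 6, from the table in the slides.
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]

# Paired differences: one Bottom - Top value per site.
diffs = [round(b - t, 3) for t, b in zip(top, bottom)]

# Every difference is positive, which the pairing makes visible.
all_bottom_higher = all(d > 0 for d in diffs)

# Ignoring the pairing, the two groups overlap heavily:
# the smallest Bottom value is below the largest Top value.
groups_overlap = min(bottom) < max(top)

print(diffs, all_bottom_higher, groups_overlap)
```

The overlap of the two groups is why the unconnected plot shows no clear distinction, while the site-by-site differences are all in the same direction.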

Page 13:

A better way

[Figure: scatterplot of the paired zinc values, both axes 0 to 0.8, with the reference line Bottom=Top.]

Page 14:

• The following plot would imply a natural ordering of sites from 1 to 6.

[Figure: Zinc (axis 0 to 0.8) versus Site (axis 0 to 7), with separate series for Top and Bottom.]

• This would not be the best way to plot the data unless the sites 1‐6 correspond to a natural ordering such as distance downstream of a factory.

Page 15:

Section 3.2: Quantiles and Related Graphical Tools

Quantile:

Roughly speaking, for a number p between 0 and 1, the p quantile of a distribution is a number such that a fraction p of the distribution lies to the left and a fraction 1‐p of the distribution lies to the right.

Page 16:

p quantile = 100*p th percentile

Q(0.10) = 0.10 quantile = 10th percentile

Q(0.50) = 0.50 quantile = 50th percentile = median

Q(0.25) = 0.25 quantile = 25th percentile = first quartile

Q(0.75) = 0.75 quantile = 75th percentile = third quartile
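Software packages differ in how they compute sample quantiles. The Python sketch below is an added illustration that assumes one particular convention: the i-th smallest of n values is treated as the (i - 0.5)/n quantile (the same (i - 0.5)/n that appears in the spring lifetime tables later in these slides), with linear interpolation in between. The function name `quantile` and this interpolation rule are assumptions, not something the slides spell out.

```python
def quantile(data, p):
    """Q(p), assuming the i-th smallest value is the (i - 0.5)/n quantile."""
    xs = sorted(data)
    n = len(xs)
    position = n * p + 0.5          # 1-based rank, possibly fractional
    if position < 1 or position > n:
        raise ValueError("p must lie between 0.5/n and (n - 0.5)/n")
    lower = int(position)           # rank of the value at or just below Q(p)
    fraction = position - lower
    if fraction == 0:
        return xs[lower - 1]
    # Interpolate linearly between the two neighbouring ordered values.
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])


# Trilene XL strengths from the fishing line example.
xl = [11.5, 11.3, 11.7, 11.6, 11.7, 11.4, 11.5, 11.5, 11.6, 11.4]

first_quartile = quantile(xl, 0.25)
median = quantile(xl, 0.50)
third_quartile = quantile(xl, 0.75)
print(first_quartile, median, third_quartile)
```

Other conventions can give slightly different quartiles for the same data, so it is worth checking which rule a given package uses.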

Page 17:

• Boxplots are useful summaries, particularly when there are too many points for a dot plot.

• To make a boxplot, we need essentially 5 numbers.

Page 18:
Page 19:
Page 20:

Section 3.2.3 Q‐Q Plots and Comparing Distributional Shapes

• Most of the statistical tools we will use in this class assume normal distributions (a bell‐shaped distribution for the population of possible values).

• In order to know if these are the right tools for a particular job, we need to be able to assess if the data appear to have come from a normal population.

Page 21:

• With large amounts of data, one can draw a histogram of the measured values and see if it is bell‐shaped.

• A normal plot is a method for assessing normality that works well with big or small data sets. It gives a good visual check for normality.

Page 22:

Simulation: 100 observations, normal with mean=5, st dev=1

• x <- rnorm(100, mean=5, sd=1)

• qqnorm(x)

[Figure: normal plot of x (axis 2 to 8) versus Quantiles of Standard Normal (axis -2 to 2).]

Page 23:

• A normal plot is a plot of the data in a way such that data from normal populations will come out pretty much in a straight line.

• We plot the corresponding quantiles of a "standard normal" distribution versus ordered y values.

Page 24:

In other words

In order to plot the data and check for normality, we compare

• our observed data to

• what we would expect from a sample of standard normal data.

Page 25:

A standard normal distribution is a normal distribution with
• mean µ=0
• standard deviation σ=1.

Any normal population can be thought of as a rescaled standard normal population. For example, if Z is standard normal, then

• 100 + 5Z will have
• µ=100 and σ=5.

Multiplying all values by 5 multiplies the standard deviation by 5. Adding 100 to every number adds 100 to the mean.
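The effect of rescaling can be checked on any list of numbers. The Python sketch below is an added illustration; the values in `z` are made up for the example and are not a real standard normal sample.

```python
from math import isclose
from statistics import mean, stdev

# Hypothetical values, chosen only to illustrate the arithmetic.
z = [-1.5, -0.5, 0.0, 0.5, 1.5]

# Rescale every value: multiply by 5, then add 100.
y = [100 + 5 * value for value in z]

# The mean is multiplied by 5 and shifted by 100;
# the standard deviation is multiplied by 5 and not shifted.
mean_matches = isclose(mean(y), 100 + 5 * mean(z))
stdev_matches = isclose(stdev(y), 5 * stdev(z))

print(mean(z), stdev(z))
print(mean(y), stdev(y))
```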

Page 26:

• So if we plot ordered values from a normal population against corresponding quantiles of a standard normal population, we expect to get a reasonably straight line, since any normal distribution is linearly related to the standard normal distribution.
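The linear relationship can be seen directly in the quantiles. The Python sketch below is an added illustration using `statistics.NormalDist` from the standard library (Python 3.8 or later); the choice of mean 100 and standard deviation 5 simply reuses the 100 + 5Z example, and the probabilities in `probabilities` are arbitrary.

```python
from math import isclose
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# NormalDist supports rescaling by constants: this is the distribution of 100 + 5Z.
rescaled = standard * 5 + 100

probabilities = [0.05, 0.25, 0.50, 0.75, 0.95]

# Each quantile of the rescaled distribution is 100 + 5 times
# the matching standard normal quantile, i.e. the points lie on a line.
pairs = [(standard.inv_cdf(p), rescaled.inv_cdf(p)) for p in probabilities]
on_a_line = all(isclose(y, 100 + 5 * z) for z, y in pairs)

print(rescaled.mean, rescaled.stdev, on_a_line)
```

With real data the ordered observations only approximate the population quantiles, so the plotted points scatter around a line with intercept near the mean and slope near the standard deviation.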

Page 27:

With Excel, normal quantiles can be found with the NORMINV function. NORMDIST finds probabilities given a particular value. NORMINV is the inverse function, finding a value with a given probability of being less than that.

A cell is assigned, for example, the formula

• = NORMINV(C3, 0, 1)
• The 0, 1 indicates µ=0 and σ=1

o A standard normal quantile
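For readers working outside Excel, the same two operations are available in the Python standard library. The sketch below is an added illustration: `NormalDist.inv_cdf` plays the role of NORMINV and `NormalDist.cdf` plays the role of the cumulative form of NORMDIST. The probability 0.05 is just an example input.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

p = 0.05

# Quantile from a probability, like =NORMINV(0.05, 0, 1) in Excel.
z = standard_normal.inv_cdf(p)

# Probability from a value, like =NORMDIST(z, 0, 1, TRUE) in Excel.
p_back = standard_normal.cdf(z)

print(round(z, 3), round(p_back, 3))
```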

Page 28:

The textbook plots the
• standard normal quantiles on the vertical axis and
• the ordered data points on the horizontal axis.

Many software packages and other books plot the standard normal quantiles on the horizontal axis and the ordered data points on the vertical axis.

Either way, the plot should look "fairly" straight if the data are from a normal distribution.

Page 29:

Here are ordered lifetimes of springs under 2 levels of stress. (page 379)

                      Normal     950 stress   900 stress
 n    i   (i-0.5)/n   Quantile   Lifetime     Lifetime
10    1   0.05        -1.645     117          153
      2   0.15        -1.036     135          162
      3   0.25        -0.674     135          189
      4   0.35        -0.385     162          216
      5   0.45        -0.126     162          216
      6   0.55         0.126     171          216
      7   0.65         0.385     189          225
      8   0.75         0.674     189          225
      9   0.85         1.036     198          243
     10   0.95         1.645     225          306

Since n=10 for both sets the corresponding normal quantiles are the same for both sets.
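The Normal Quantile column depends only on n. The Python sketch below is an added illustration that rebuilds it from the (i - 0.5)/n values using `statistics.NormalDist`; the variable names are assumptions.

```python
from statistics import NormalDist

n = 10

# Probabilities (i - 0.5)/n for i = 1, ..., n.
probabilities = [(i - 0.5) / n for i in range(1, n + 1)]

# Standard normal quantile for each probability, rounded as in the table.
normal_quantiles = [round(NormalDist().inv_cdf(p), 3) for p in probabilities]

for i, (p, q) in enumerate(zip(probabilities, normal_quantiles), start=1):
    print(f"{i:>2}  {p:.2f}  {q:>7.3f}")
```

Because the column is symmetric about zero, only the first five values need to be computed by hand; the rest are their negatives in reverse order.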

Page 30:

To construct normal plots for these two data sets, we plot
• each ordered data set versus
• the standard normal quantiles from Excel.

[Figure: Life-length (axis 0 to 350) versus Normal Quantiles (axis -2.000 to 2.000), with one series for 950 stress and one for 900 stress.]

Since both plots are fairly straight, these data are fairly normal.
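Judging straightness is done by eye in these slides. As a rough numeric companion, one can compute the correlation between the normal quantiles and the ordered data; a value close to 1 goes with a nearly straight plot. The Python sketch below is an added illustration of that idea, not a method taken from the slides, and the helper `correlation` is written out by hand so the block has no version-specific dependencies.

```python
from math import sqrt
from statistics import NormalDist, mean


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Ordered spring lifetimes from the table in the slides.
stress_950 = [117, 135, 135, 162, 162, 171, 189, 189, 198, 225]
stress_900 = [153, 162, 189, 216, 216, 216, 225, 225, 243, 306]

n = len(stress_950)
quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

r_950 = correlation(quantiles, stress_950)
r_900 = correlation(quantiles, stress_900)
print(round(r_950, 3), round(r_900, 3))
```

The correlation is only a summary; the plot itself remains the better tool, because it shows where and how the points depart from a line.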

Page 31:

Excel File of Lifetime of Springs Data

                      Normal               Ordered      Ordered
 n    i   (i-0.5)/n   Quantile   E(Z)      900 stress   950 stress
10    1   0.05        -1.645     -1.539    153          117
      2   0.15        -1.036     -1.001    162          135
      3   0.25        -0.674     -0.656    189          135
      4   0.35        -0.385     -0.376    216          162
      5   0.45        -0.126     -0.123    216          162
      6   0.55         0.126      0.123    216          171
      7   0.65         0.385      0.376    225          189
      8   0.75         0.674      0.656    225          189
      9   0.85         1.036      1.001    243          198
     10   0.95         1.645      1.539    306          225

Page 32:

Section 3.3: Numerical Summaries

Measures of Location: The data are found spread around what value?

Median = Q(0.50) = 50th percentile.

Sample mean = arithmetic mean = average

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

The mean is more affected by unusual values than the median.
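A small numeric comparison makes the point concrete. The Python sketch below is an added illustration; the two lists are made-up values that differ only in their largest entry.

```python
from statistics import mean, median

# Hypothetical data: the second list replaces 14 with an unusual value, 140.
typical = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]

mean_typical, median_typical = mean(typical), median(typical)
mean_outlier, median_outlier = mean(with_outlier), median(with_outlier)

# The mean moves a long way toward the unusual value; the median does not move.
print(mean_typical, median_typical)
print(mean_outlier, median_outlier)
```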

Page 33:

Measures of Spread:

• R = Range = Biggest – Smallest

• The size of the range can be affected by how many values we have. Many numbers will tend to have a larger range than fewer numbers.

• IQR = Interquartile Range = Q(0.75) – Q(0.25)
Range that includes the middle half of the values.

Page 34:

• Sample variance = $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Essentially an average squared deviation from the mean.

• Sample standard deviation = $s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Page 35:

Example: X1 = 8, X2 = 9, X3 = 4

$\bar{x} = \frac{8 + 9 + 4}{3} = 7$

$s^2 = \frac{(8 - 7)^2 + (9 - 7)^2 + (4 - 7)^2}{3 - 1} = \frac{1 + 4 + 9}{2} = 7$

$s = \sqrt{7} \approx 2.65$
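The worked example can be checked in a few lines. The Python sketch below is an added illustration that computes the sample variance both from the definition and with the standard library's `statistics.variance` and `statistics.stdev`, which divide by n - 1.

```python
from statistics import mean, stdev, variance

x = [8.0, 9.0, 4.0]
n = len(x)

x_bar = mean(x)

# Sample variance from the definition: sum of squared deviations over n - 1.
s2_by_hand = sum((value - x_bar) ** 2 for value in x) / (n - 1)

# The library functions use the same n - 1 divisor.
s2 = variance(x)
s = stdev(x)

print(x_bar, s2_by_hand, s2, round(s, 2))
```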

Page 36:

Statistics and Parameters

A statistic is a numerical summary of the sample data.

$\bar{x}$ = sample mean

$s^2$ = sample variance

Page 37:

A parameter is a summary of an entire population or a theoretical distribution, for example a normal distribution.

µ = population mean

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

σ² = population variance

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Average squared deviation from the mean. Second central moment.

σ = population standard deviation, $\sigma = \sqrt{\sigma^2}$

Page 38:

• For a sample of size n, the sample variance is

$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

• Why divide by n‐1? This makes $s^2$ an unbiased estimator of $\sigma^2$. Unbiased means on the average correct.

Page 39:

Suppose we have a large population of ball bearings with diameters µ = 1 cm and σ = 0.02, σ² = 0.0004.

Sample   $\bar{x}$   $s^2$
1        0.98        0.00032
2        1.03        0.00031
3        1.01        0.00045
4        1.02        0.00052
.        .           .
.        .           .
∞        ------      --------
Mean     1.00        0.0004

If we knew µ we would find

$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$

Fact: $\sum (x_i - \bar{x})^2 = \min_m \sum (x_i - m)^2$

So $\sum (x_i - \bar{x})^2 \le \sum (x_i - \mu)^2$, and $\frac{\sum (x_i - \bar{x})^2}{n}$ would be too small for σ².

Dividing by n‐1 makes s² come out right (σ²) on average.
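The "on average" claim can be checked exactly on a tiny example. The Python sketch below is an added illustration: since the ball bearing population itself is not listed, it uses a made-up population of four values, enumerates every possible sample of size 3 drawn with replacement, and averages the two candidate estimators using exact fractions. The population values and variable names are assumptions.

```python
from fractions import Fraction
from itertools import product


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    sample_mean = Fraction(sum(sample), len(sample))
    return sum((x - sample_mean) ** 2 for x in sample)


# Hypothetical population, small enough to enumerate every sample.
population = [1, 2, 3, 4]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

n = 3
# All equally likely ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_dividing_by_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_dividing_by_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, avg_dividing_by_n_minus_1, avg_dividing_by_n)
```

Averaged over all samples, the n - 1 version equals σ² exactly, while the version that divides by n comes out smaller by the factor (n - 1)/n.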

Page 40:

Notice that s² is undefined if n=1; we can't divide by zero.

This makes sense.

If we have only one number, that number tells us nothing about potential spread in the population.

Page 41:

Plotting summary statistics over time is useful for issues such as quality control.

Read section 3.3.4 for general information.

Page 42: