chapter 1 section 1.2 describing distributions with numbers

Post on 28-Mar-2015


Chapter 1, Section 1.2

Describing Distributions with Numbers

Parameter -

Fixed value about a population

Typically unknown

Statistic -

Value calculated from a sample

Measures of Central Tendency

Mean - the arithmetic average

Use $\mu$ to represent a population mean (parameter)

Use $\bar{x}$ to represent a sample mean (statistic)

Formula: $\bar{x} = \frac{\sum x}{n}$

$\Sigma$ is the capital Greek letter sigma – it means to sum the values that follow

This is on the formula sheet, so you do not have to memorize it.

Measures of Central Tendency

Median - the middle of the data; 50th percentile

Observations must be in numerical order

Is the single middle value if n is odd

The average of the middle two values if n is even

NOTE: n denotes the sample size

Measures of Central Tendency

Mode – the observation that occurs the most often

Can be more than one mode

If all values occur only once – there is no mode

Not used as often as mean & median
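The mode rules above can be sketched in a few lines of Python. This is only an illustration: the function name `modes` and the sample lists are made up for this example and do not come from the slides.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest count, or [] when nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs only once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical lollipop counts, not taken from the slides.
print(modes([2, 3, 3, 4, 8, 8]))  # two modes: 3 and 8
print(modes([2, 3, 4, 8, 12]))    # no mode
```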

Range – the difference between the largest and smallest observations

This is only one number! If the observations run from 3 to 8, the range is 5, not "3-8"

Measures of Central Tendency

Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median.

2 3 4 8 12

The numbers are in order & n is odd – so find the

middle observation.

The median is 4 lollipops!

Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the median.

2 3 4 6 8 12

The numbers are in order & n is even – so find the middle two observations: 4 and 6.

Now, average these two values: $(4 + 6)/2 = 5$

The median is 5 lollipops!
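The two lollipop examples follow the same recipe, which might be sketched in Python as below. The function name `median` is chosen for illustration; Python's standard library also offers `statistics.median`, which gives the same results here.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)  # observations must be in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the middle two


print(median([2, 3, 4, 8, 12]))     # sample of 5 customers
print(median([2, 3, 4, 6, 8, 12]))  # sample of 6 customers
```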

Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the mean.

2 3 4 6 8 12

To find the mean number of lollipops, add the observations and divide by n.

$\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 12}{6} = 5.833$

What would happen to the median & mean if the 12 lollipops were 20?

2 3 4 6 8 20

The median is . . . 5

The mean is . . .

$\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 20}{6} = 7.17$

What happened?

What would happen to the median & mean if the 20 lollipops were 50?

2 3 4 6 8 50

The median is . . . 5

The mean is . . .

$\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 50}{6} = 12.17$

What happened?

Resistant -

Statistics that are not affected by outliers

Is the median resistant? YES

Is the mean resistant? NO
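A quick way to see resistance is to recompute both measures as the largest purchase grows from 12 to 20 to 50. The sketch below uses Python's standard `statistics` module; the helper name `center` is an assumption made for this illustration.

```python
from statistics import mean, median


def center(largest):
    """Mean (rounded to 2 places) and median when the largest purchase is `largest`."""
    data = [2, 3, 4, 6, 8, largest]
    return round(mean(data), 2), median(data)


for largest in (12, 20, 50):
    m, med = center(largest)
    print(f"largest = {largest}: mean = {m}, median = {med}")
```

The median stays at 5 in every case, while the mean is pulled upward by the extreme value.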

Look at the following data set. Find the mean.

22 23 24 25 25 26 29 30

$\bar{x} = 25.5$

Now find how each observation deviates from the mean: $x - \bar{x}$ is the deviation from the mean.

What is the sum of the deviations from the mean?

$\sum (x - \bar{x}) = 0$

Will this sum always equal zero? YES
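Why the deviations always sum to zero can be shown in one line of algebra, using only the definition $\bar{x} = \frac{\sum x}{n}$:

```latex
\sum (x - \bar{x})
  = \sum x - n\bar{x}
  = \sum x - n \cdot \frac{\sum x}{n}
  = 0
```

For the data set 22 23 24 25 25 26 29 30 the deviations are -3.5, -2.5, -1.5, -0.5, -0.5, 0.5, 3.5, 4.5, which add to 0.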

Look at the following data set. Find the mean & median.

21 23 23 24 25 25 26 26 26 27

27 27 27 28 30 30 30 31 32 32

Create a histogram with the data. (use x-scale of 2) Then find the mean and median.

Mean = 27

Median = 27

Look at the placement of the mean and median in this symmetrical distribution.

Look at the following data set. Find the mean & median.

22 29 28 22 24 25 28 21 25

23 24 23 26 36 38 62 23

Create a histogram with the data. (use x-scale of 8) Then find the mean and median.

Mean = 28.176

Median = 25

Look at the placement of the mean and median in this right-skewed distribution.

Look at the following data set. Find the mean & median.

21 46 54 47 53 60 55 55 60

56 58 58 58 58 62 63 64

Create a histogram with the data. Then find the mean and median.

Mean = 54.588

Median = 58

Look at the placement of the mean and median in this left-skewed distribution.

Recap:

In a symmetrical distribution, the mean and median are equal.

In a skewed distribution, the mean is pulled in the direction of the skewness.

In a symmetrical distribution, you should report the mean!

In a skewed distribution, the median should be reported as the measure of center!
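The recap can be checked against the three data sets from the preceding slides. The sketch below uses Python's standard `statistics` module; the variable names and the helper `compare_center` are chosen for this illustration only.

```python
from statistics import mean, median

symmetric = [21, 23, 23, 24, 25, 25, 26, 26, 26, 27,
             27, 27, 27, 28, 30, 30, 30, 31, 32, 32]
skewed_right = [22, 29, 28, 22, 24, 25, 28, 21, 25,
                23, 24, 23, 26, 36, 38, 62, 23]
skewed_left = [21, 46, 54, 47, 53, 60, 55, 55, 60,
               56, 58, 58, 58, 58, 62, 63, 64]


def compare_center(data):
    """Return (mean rounded to 3 places, median, which one is larger)."""
    m, med = float(mean(data)), float(median(data))
    if m > med:
        relation = "mean > median"
    elif m < med:
        relation = "mean < median"
    else:
        relation = "mean = median"
    return round(m, 3), med, relation


for name, data in [("symmetric", symmetric),
                   ("skewed right", skewed_right),
                   ("skewed left", skewed_left)]:
    print(name, compare_center(data))
```

In the right-skewed set the mean sits above the median, and in the left-skewed set it sits below, matching the direction of the skew.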

Quartiles

Arrange the observations in increasing order and locate the median M in the ordered list of observations.

The first quartile Q1 is the median of the 1st half of the observations.

The third quartile Q3 is the median of the 2nd half of the observations.

16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Q1 = 25, median = 34, Q3 = 41

What if there is an odd number of observations?

16 19 24 25 25 33 33 34 34

median = 25

When dividing the data in half, forget about the middle number.
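The quartile rule from these slides (median of each half, leaving out the middle value when n is odd) might be sketched as follows. The function names are illustrative. Note that software packages often use other quartile conventions, so their answers can differ slightly from this method.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle observation before taking the upper half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


even_data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
odd_data = [16, 19, 24, 25, 25, 33, 33, 34, 34]
print(quartiles(even_data))
print(quartiles(odd_data))
```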

The interquartile range (IQR)

The distance between the first and third quartiles.

IQR = Q3 – Q1

Never negative

Outlier: We call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.

Let’s look back at the same data:

16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Q1 = 25, Q3 = 41

IQR = 41 - 25 = 16

Lower cutoff: 25 - 1.5 x 16 = 1

Upper cutoff: 41 + 1.5 x 16 = 65

Since 73 is above the upper cutoff, we will call it an outlier.
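The 1.5 x IQR rule is easy to automate. The sketch below takes Q1 and Q3 as given (25 and 41 from the slide) and uses hypothetical helper names `outlier_cutoffs` and `find_outliers`.

```python
def outlier_cutoffs(q1, q3):
    """Return (lower cutoff, upper cutoff) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Observations below the lower cutoff or above the upper cutoff."""
    low, high = outlier_cutoffs(q1, q3)
    return [x for x in values if x < low or x > high]


data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print(outlier_cutoffs(25, 41))
print(find_outliers(data, 25, 41))
```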

Five-number summary

Minimum

Q1

Median

Q3

Maximum

If you plot these five numbers on a graph, we have a ………

Advantages of boxplots

ease of construction

convenient handling of outliers

construction is not subjective (unlike histograms)

used with medium or large size data sets (n > 10)

useful for comparative displays

Disadvantages of boxplots

do not retain the individual observations

should not be used with small data sets (n < 10)

How to construct

find the five-number summary: Min, Q1, Med, Q3, Max

draw box from Q1 to Q3

draw median as center line in the box

extend whiskers to min & max

Modified boxplots

display outliers

fences mark off the outliers

whiskers extend to largest (smallest) data value inside the fence

ALWAYS use modified boxplots in this class!!!

Modified Boxplot

[Figure: boxplot with the box drawn from Q1 to Q3 and fences marked at Q1 – 1.5IQR and Q3 + 1.5IQR]

Interquartile Range (IQR) – is the range (length) of the box: Q3 - Q1

Q1 – 1.5IQR and Q3 + 1.5IQR are called the fences and should not be seen.

Any observation outside the fences is an outlier! Put a dot for the outliers.

Draw the “whisker” from each quartile to the farthest observation that is still within the fence!

A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern & mid-western states in 1999.

5.9 1.3 5.0 5.9 4.5 5.6 4.1 6.3 4.8 6.9

4.5 3.5 7.2 6.4 5.5 5.3 8.0 4.4 7.2 3.2

Create a modified boxplot. Describe the distribution.

Use the calculator to create a modified boxplot.
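As a check on the calculator, the numbers needed for the modified boxplot can be computed in Python. This is a sketch under the quartile method used in these slides (median of each half); a calculator or other software may use a slightly different quartile rule. The function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def modified_boxplot_summary(values):
    """Quartiles, fences, whisker ends, and outliers for a modified boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),  # farthest values inside the fences
        "outliers": [x for x in ordered if x < lower_fence or x > upper_fence],
    }


prison = [5.9, 1.3, 5.0, 5.9, 4.5, 5.6, 4.1, 6.3, 4.8, 6.9,
          4.5, 3.5, 7.2, 6.4, 5.5, 5.3, 8.0, 4.4, 7.2, 3.2]
for key, value in modified_boxplot_summary(prison).items():
    print(key, value)
```

With this method the only value outside the fences is 1.3, so the lower whisker stops at 3.2 and the upper whisker runs to the maximum, 8.0.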

Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follows is the radon concentration in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer.

(see data on note page)

Create parallel boxplots. Compare the distributions.

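[Figure: parallel boxplots of radon concentration for the Cancer and No Cancer groups; horizontal axis Radon, labeled at 100 and 200]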

The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group. Both distributions are skewed right. The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.

Assignment 1.2

Why is the study of variability important?

Allows us to distinguish between usual & unusual values

In some situations, we want more or less variability

scores on standardized tests

time bombs

medicine

Measures of Variability

range (max-min)

interquartile range (Q3-Q1)

deviations: $x - \bar{x}$

variance: $\sigma^2$

standard deviation: $\sigma$ (lower case Greek letter sigma)

Suppose that we have these data values:

24 34 26 30 37

16 28 21 35 29

Find the mean.

Find the deviations: $x - \bar{x}$

What is the sum of the deviations from the mean?

Square the deviations: $(x - \bar{x})^2$

Find the average of the squared deviations: $\frac{\sum (x - \bar{x})^2}{n}$

The average of the deviations squared is called the variance.

Population: $\sigma^2$ (parameter)

Sample: $s^2$ (statistic)

Calculation of variance of a sample

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

The denominator n - 1 is the degrees of freedom (df).

A standard deviation is a measure of the average deviation from the mean.

Calculation of standard deviation

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
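The steps for the ten data values (mean, deviations, squared deviations, then dividing by n - 1) can be traced in Python. The function names are illustrative; the last line cross-checks against `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import stdev, variance

data = [24, 34, 26, 30, 37, 16, 28, 21, 35, 29]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    deviations = [x - xbar for x in values]
    return sum(d ** 2 for d in deviations) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))


xbar = sum(data) / len(data)
print("mean:", xbar)
print("sum of deviations:", sum(x - xbar for x in data))
print("sample variance:", round(sample_variance(data), 3))
print("sample standard deviation:", round(sample_sd(data), 3))
print("library check:", round(variance(data), 3), round(stdev(data), 3))
```

Here the mean is 28, the squared deviations add to 384, and dividing by 9 gives a sample variance of about 42.667 and a standard deviation of about 6.532.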

Degrees of Freedom (df)

n deviations contain (n - 1) independent pieces of information about variability

Which measure(s) of variability is/are resistant?

Activity (worksheet)

Linear transformation rule

When multiplying or adding a constant to a random variable, the mean and median change by both operations.

When multiplying or adding a constant to a random variable, the standard deviation changes only by multiplication.

Formulas:

$\mu_{a+bx} = a + b\mu_x$

$\sigma_{a+bx} = |b|\sigma_x$

An appliance repair shop charges a $30 service call to go to a home for a repair. It also charges $25 per hour for labor. From past history, the average length of repairs is 1 hour 15 minutes (1.25 hours) with standard deviation of 20 minutes (1/3 hour). Including the charge for the service call, what are the mean and standard deviation for the charges for labor?

$\mu = 30 + 25(1.25) = \$61.25$

$\sigma = 25\left(\frac{1}{3}\right) = \$8.33$
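The repair-shop answer can be reproduced, and the rule itself checked numerically, with a short Python sketch. The function names and the list `hours` are assumptions made for illustration; the repair times in `hours` are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev


def transformed_mean(a, b, mu_x):
    """Mean of a + b*x: both the added constant and the multiplier matter."""
    return a + b * mu_x


def transformed_sd(b, sigma_x):
    """Standard deviation of a + b*x: only the multiplier matters."""
    return abs(b) * sigma_x


# Repair-shop example: $30 service call plus $25 per hour.
print(transformed_mean(30, 25, 1.25))
print(round(transformed_sd(25, 1 / 3), 2))

# Hypothetical repair times in hours, used only to check the rule.
hours = [0.75, 1.0, 1.25, 1.5, 1.75]
charges = [30 + 25 * h for h in hours]
print(mean(charges), transformed_mean(30, 25, mean(hours)))
print(stdev(charges), transformed_sd(25, stdev(hours)))
```

Adding the $30 shifts every charge by the same amount, so it moves the mean but leaves the spread unchanged.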

Rules for Combining two variables

To find the mean for the sum (or difference), add (or subtract) the two means

To find the standard deviation of the sum (or difference), ALWAYS add the variances, then take the square root.

Formulas:

$\mu_{a+b} = \mu_a + \mu_b$

$\mu_{a-b} = \mu_a - \mu_b$

$\sigma_{a \pm b} = \sqrt{\sigma_a^2 + \sigma_b^2}$ (if the variables are independent)

Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the times for each setup phase are independent with the following means & standard deviations (in minutes). What are the mean and standard deviation for the total bicycle setup times?

Phase      Mean   SD
Unpacking   3.5   0.7
Assembly   21.8   2.4
Tuning     12.3   2.7

$\mu_T = 3.5 + 21.8 + 12.3 = 37.6$ minutes

$\sigma_T = \sqrt{0.7^2 + 2.4^2 + 2.7^2} = 3.680$ minutes
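The bicycle calculation might be sketched in Python as below, assuming (as the problem states) that the three phases are independent. The dictionary `phases` and the function names are chosen for this illustration.

```python
from math import sqrt

# (mean, standard deviation) in minutes for each setup phase
phases = {
    "Unpacking": (3.5, 0.7),
    "Assembly": (21.8, 2.4),
    "Tuning": (12.3, 2.7),
}


def total_mean(parts):
    """Means simply add."""
    return sum(m for m, _ in parts.values())


def total_sd(parts):
    """Add the variances, then take the square root (independent variables only)."""
    return sqrt(sum(sd ** 2 for _, sd in parts.values()))


print(round(total_mean(phases), 1), "minutes")
print(round(total_sd(phases), 3), "minutes")
```

Adding the standard deviations directly would give 0.7 + 2.4 + 2.7 = 5.8 minutes, which overstates the spread; the variances are what add.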

Assignment 1.2B
