Numerical Descriptive Measures Chapter 2 Borrowed from http://www2.uta.edu/infosys/amer/ courses/3321%20ppts/c3.ppt

Upload: adela-ball

Post on 11-Jan-2016


TRANSCRIPT

Page 1: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Numerical Descriptive Measures

Chapter 2

Borrowed from

http://www2.uta.edu/infosys/amer/courses/3321%20ppts/c3.ppt

Page 2: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Chapter Topics

Measures of central tendency: mean, median, mode, midrange

Quartiles

Measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation

Shape: symmetric, skewed (+/-)

Coefficient of correlation

Page 3: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Summary Measures

Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean

Quartile

Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Page 4: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Measures of Central Tendency

Central Tendency: Average (Mean), Median, Mode

Sample mean:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Population mean:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Page 5: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

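Mean (Arithmetic Mean)

Mean (arithmetic mean) of data values

Sample mean, where n is the sample size:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

Population mean, where N is the population size:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$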

Page 6: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Mean (Arithmetic Mean)

The most common measure of central tendency

Affected by extreme values (outliers)

[Figure: two dot plots, one on a 0-10 axis with Mean = 5 and one on a 0-14 axis with Mean = 6]
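A minimal sketch of how one extreme value pulls the mean. The two data sets are an assumption: the transcript keeps only the axis ticks and the resulting means, so the values below were chosen simply because they reproduce Mean = 5 and Mean = 6. The helper name `sample_mean` is illustrative.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

# Hypothetical data sets, chosen only to reproduce the slide's two means.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # largest value replaced by an extreme one

mean_a = sample_mean(without_outlier)
mean_b = sample_mean(with_outlier)
print(mean_a, mean_b)
```

Changing a single observation from 9 to 14 moves the mean by a full unit, which is what "affected by extreme values" means in practice.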

Page 7: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Median

Robust measure of central tendency

Not affected by extreme values

In an ordered array, the median is the "middle" number

If n or N is odd, the median is the middle number

If n or N is even, the median is the average of the two middle numbers

[Figure: two dot plots, one on a 0-10 axis and one on a 0-14 axis; Median = 5 in both]
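The odd/even rule can be sketched as a short function. The data sets are again an assumption (the plotted points did not survive in the transcript); they were picked so that both have a median of 5, as on the slide.

```python
def median(values):
    """Middle value of the ordered array (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

# Hypothetical data sets: the second differs only in its extreme largest value.
median_a = median([1, 3, 5, 7, 9])
median_b = median([1, 3, 5, 7, 14])
median_even = median([1, 2, 3, 4])
```

The extreme value 14 leaves the median unchanged, which is why the median is called robust.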

Page 8: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Mode

A measure of central tendency

Value that occurs most often

Not affected by extreme values

Used for either numerical or categorical data

There may be no mode

There may be several modes

[Figure: two dot plots, one on a 0-14 axis with Mode = 9 and one on a 0-6 axis with No Mode]
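A small sketch covering the three cases on this slide (one mode, no mode, several modes). The function name `modes` and all sample values are hypothetical; the convention that a data set with no repeated value has no mode follows the slide.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, or [] if no value repeats.

    Assumes a non-empty input.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]

one_mode = modes([0, 2, 5, 9, 9, 9, 11, 14])       # hypothetical numerical data
no_mode = modes([0, 1, 2, 3, 4, 5, 6])
two_modes = modes([1, 1, 2, 2, 3])
category_mode = modes(["red", "blue", "red"])       # works for categorical data too
```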

Page 9: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Quartiles

Split ordered data into 4 quarters (25% of the data in each), at Q_1, Q_2 and Q_3

Position of the i-th quartile:

$$\text{Position of } Q_i = \frac{i(n+1)}{4}$$

Q_1 and Q_3 are measures of noncentral location

Q_2 = median, a measure of central tendency

Data in ordered array: 11 12 13 16 16 17 18 21 22

$$\text{Position of } Q_1 = \frac{1(9+1)}{4} = 2.5 \qquad Q_1 = \frac{12 + 13}{2} = 12.5$$
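The position rule i(n+1)/4 can be sketched as follows. One assumption is made beyond the slide: when the position is not a whole number, the value is interpolated linearly between its two neighbours (for a position ending in .5 this is the same as averaging them, as in the worked example). The function name `quartile` is illustrative, and the sketch assumes at least three observations.

```python
def quartile(data, i):
    """i-th quartile (i = 1, 2, 3) using the position rule i(n+1)/4."""
    ordered = sorted(data)
    n = len(ordered)
    position = i * (n + 1) / 4       # 1-based position in the ordered array
    lower = int(position)            # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: interpolate linearly between the two neighbouring values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

ordered_array = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = quartile(ordered_array, 1)      # position 2.5, halfway between 12 and 13
q2 = quartile(ordered_array, 2)      # position 5, the median
```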

Page 10: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Measures of Variation

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Variance: Population Variance (σ²), Sample Variance (S²)

Standard Deviation: Population Standard Deviation (σ), Sample Standard Deviation (S)

Page 11: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Range

Measure of variation

Difference between the largest and the smallest observations:

$$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$$

Ignores the way in which data are distributed

[Figure: two dot plots on a 7-12 axis; Range = 12 - 7 = 5 for both]

Page 12: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Interquartile Range

Measure of variation

Also known as midspread: the spread in the middle 50%

Difference between the third and first quartiles

Not affected by extreme values

Data in ordered array: 11 12 13 16 16 17 17 18 21

$$\text{Interquartile Range} = Q_3 - Q_1 = 17.5 - 12.5 = 5$$
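As a quick check of the interquartile range for this ordered array, here is a sketch using Python's standard `statistics.quantiles` (Python 3.8+). Its `"exclusive"` method places the i-th quartile at position i(n+1)/4, which is the rule used earlier in this deck; the choice of that method is an assumption about how the slide's quartiles were obtained.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

# method="exclusive" places the i-th quartile at position i(n+1)/4.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```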

Page 13: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Variance

Important measure of variation

Shows variation about the mean

Sample variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Page 14: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Standard Deviation

Most important measure of variation

Shows variation about the mean

Has the same units as the original data

Sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
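A minimal sketch of the sample and population formulas for variance and standard deviation, written out term by term. The data set and function names are hypothetical; the only difference between the two versions is the divisor (n - 1 versus N).

```python
import math

def sample_variance(values):
    """S^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """sigma^2: squared deviations from the population mean, divided by N."""
    size = len(values)
    mu = sum(values) / size
    return sum((x - mu) ** 2 for x in values) / size

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data, mean 5
s2 = sample_variance(data)
sigma2 = population_variance(data)
s = math.sqrt(s2)                    # sample standard deviation
sigma = math.sqrt(sigma2)            # population standard deviation
```

Because the standard deviation is the square root of the variance, it is expressed in the same units as the data.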

Page 15: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Comparing Standard Deviations

Data A: Mean = 15.5, S = 3.338

Data B: Mean = 15.5, S = .9258

Data C: Mean = 15.5, S = 4.57

[Figure: three dot plots, each on an 11-21 axis]
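The plotted points are lost in this transcript, so the three data sets below are an assumption: they are values that reproduce the slide's means and sample standard deviations, not necessarily the ones on the original slide. The sketch shows that data sets with the same mean can differ widely in spread.

```python
import statistics

# Hypothetical data sets chosen to reproduce Mean = 15.5 and the three S values.
data_a = [11, 12, 13, 16, 16, 17, 18, 21]
data_b = [14, 15, 15, 15, 16, 16, 16, 17]   # tightly clustered around the mean
data_c = [11, 11, 11, 12, 19, 20, 20, 20]   # concentrated at the extremes

summary = {
    name: (statistics.mean(values), statistics.stdev(values))
    for name, values in (("A", data_a), ("B", data_b), ("C", data_c))
}
for name, (mean, s) in summary.items():
    print(f"Data {name}: Mean = {mean}, S = {s:.4f}")
```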

Page 16: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to mean

Is used to compare two or more sets of data measured in different units

$$CV = \left( \frac{S}{\bar{X}} \right) 100\%$$

Page 17: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Comparing Coefficient of Variation

Stock A: Average price last year = $50 Standard deviation = $5

Stock B: Average price last year = $100 Standard deviation = $5

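Coefficient of variation:

Stock A:

$$CV = \left( \frac{S}{\bar{X}} \right) 100\% = \left( \frac{\$5}{\$50} \right) 100\% = 10\%$$

Stock B:

$$CV = \left( \frac{S}{\bar{X}} \right) 100\% = \left( \frac{\$5}{\$100} \right) 100\% = 5\%$$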
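The stock comparison can be checked with a few lines. The function name `coefficient_of_variation` is illustrative; the inputs are the averages and standard deviations given for Stock A and Stock B.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative variation as a percentage: (S / X-bar) * 100."""
    return 100.0 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)     # Stock A: S = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)    # Stock B: S = $5, mean = $100
print(f"Stock A: {cv_a:.0f}%  Stock B: {cv_b:.0f}%")
```

Both stocks have the same standard deviation, but relative to its price Stock A varies twice as much as Stock B.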

Page 18: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Shape of a Distribution

Describes how data is distributed

Measures of shape: symmetric or skewed

(-) Left-Skewed: Mean < Median < Mode

Symmetric: Mean = Median = Mode

(+) Right-Skewed: Mode < Median < Mean

Page 19: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Coefficient of Correlation

Measures the strength of the linear relationship between two quantitative variables

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

Page 20: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Features of Correlation Coefficient

Unit free

Ranges between -1 and 1

The closer to -1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker any linear relationship
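A sketch of the coefficient of correlation computed directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name and the three small data sets are hypothetical, picked to show the values 1, -1 and 0.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r for paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_positive = correlation(x, [2, 4, 6, 8, 10])   # perfect positive linear relationship
r_negative = correlation(x, [10, 8, 6, 4, 2])   # perfect negative linear relationship
r_none = correlation(x, [2, 1, 0, 1, 2])        # symmetric V shape: no linear relationship
```

The last case is clearly related to x, just not linearly, so r is 0: the coefficient measures only the linear part of a relationship.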

Page 21: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Scatter Plots of Data with Various Correlation Coefficients

[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = .6 and r = 1]

Page 22: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Chapter Summary

Described measures of central tendency: mean, median, mode, midrange

Discussed quartiles

Described measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation

Illustrated shape of distribution: symmetric, skewed

Discussed correlation coefficient

Page 23: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Stem-n-leaf plot

A class took a test. The students got the following scores: 94 85 74 85 77 100 85 95 98 95

First let us draw the stem and leaf plot.

This makes it easy to see that the mode, the most common score, is 85, since there are three of those scores and all of the other scores have frequencies of either one or two.

It will also make it easier to rank the scores.
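The plot itself is missing from this transcript; a sketch of how it could be built from the ten scores is shown here. Using the tens (and hundreds) as the stem and the units digit as the leaf is the usual convention and is assumed; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

scores = [94, 85, 74, 85, 77, 100, 85, 95, 98, 95]

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

The row for stem 8 holds three 5s, which is the mode of 85; reading the rows top to bottom gives the scores in rank order.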

Page 24: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Stem-n-leaf

This will make it easier to find the median and the quartiles. There are as many scores above the median as below. Since there are ten scores, and ten is an even number, we can divide the scores into two equal groups with no scores left over. To find the median, we divide the scores up into the upper five and the lower five.
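The split into a lower five and an upper five can be sketched as follows. The median for an even count is the average of the two values on either side of the split. Taking the middle value of each half as a quartile is one common convention and is an assumption here, since the transcript does not show which quartile rule the slide used.

```python
scores = [94, 85, 74, 85, 77, 100, 85, 95, 98, 95]

ranked = sorted(scores)
half = len(ranked) // 2
lower, upper = ranked[:half], ranked[half:]    # lower five and upper five

# Even count: the median is the average of the two values next to the split.
median = (lower[-1] + upper[0]) / 2

# Assumed convention: quartiles as the middle value of each half.
q1 = lower[len(lower) // 2]
q3 = upper[len(upper) // 2]
print(lower, upper, median, q1, q3)
```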

Page 25: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Box Plot

The five-number summary is an abbreviated way to describe a sample. It is a list of the following numbers: minimum, first (lower) quartile, median, third (upper) quartile, maximum.

The five-number summary leads to a graphical representation of a distribution called the boxplot. Boxplots are ideal for comparing two nearly-continuous variables. To draw a boxplot, follow these simple steps:

The ends of the box (hinges) are at the quartiles, so that the length of the box is the interquartile range (IQR).

Page 26: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Box plots

The median is marked by a line within the box.

The two vertical lines (called whiskers) outside the box extend to the smallest and largest observations within 1.5 × IQR of the quartiles.

Observations that fall outside of 3 × IQR are called extreme outliers and are marked, for example, with an open circle. Observations between 1.5 × IQR and 3 × IQR are called mild outliers and are distinguished by a different mark, e.g., a closed circle.
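A sketch of the whisker and outlier rules, assuming the common convention of fences at 1.5 × IQR (mild) and 3 × IQR (extreme) beyond the quartiles; the multipliers are an assumption wherever they are not stated in the text. The sample data, the function name `classify_outliers` and the choice of the `"exclusive"` quartile method are all hypothetical.

```python
import statistics

def classify_outliers(data):
    """Return (whisker ends, mild outliers, extreme outliers) for a boxplot."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # assumed mild-outlier fences
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # assumed extreme-outlier fences
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    whiskers = (min(inside), max(inside))      # whiskers stop at the last point inside
    extreme = [x for x in data if x < outer[0] or x > outer[1]]
    mild = [x for x in data
            if (x < inner[0] or x > inner[1]) and outer[0] <= x <= outer[1]]
    return whiskers, mild, extreme

sample = [11, 12, 13, 16, 16, 17, 17, 18, 21, 35, 50]   # hypothetical data
whiskers, mild, extreme = classify_outliers(sample)
print(whiskers, mild, extreme)
```

For this sample Q1 = 13 and Q3 = 21, so the IQR is 8, the inner fences are 1 and 33, and the outer fences are -11 and 45.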

Page 27: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Example of boxplot

The Density of Nitrogen - A Comparison of Two Samples

Lord Rayleigh was one of the earliest scientists to study the density of nitrogen. In his studies, he noticed something peculiar: the density of nitrogen produced from chemical compounds tended to be smaller than the density of nitrogen produced from the air.

However, he was working with fairly small samples, and the question is, was he correct in his conjecture?

Lord Rayleigh's measurements, which first appeared in Proceedings, Royal Society (London, 55, 1894, pp. 340-344), are reproduced below. The units are the mass of nitrogen filling a certain flask under specified pressure and temperature.

1. Calculate the summary statistics for each set of data.

2. Construct side-by-side box plots.

Page 28: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Chemical    Atmospheric
2.30143     2.31017
2.2989      2.30986
2.29816     2.3101
2.30182     2.31001
2.29869     2.31024
2.2994      2.3101
2.29849     2.31028
2.29889     2.31163
2.30074     2.30956
2.30054

Page 29: Numerical Descriptive Measures Chapter 2 Borrowed from  321%20ppts/c3.ppt

Box Plot

The atmospheric sample has 9 observations and the chemical sample has 10. Use the following five-number summaries to draw the box plots and compare the two samples.

          Chemical    Atmospheric
Max       2.30182     2.31163
Min       2.29816     2.30956
Q1        2.29874     2.31001
Median    2.29915     2.3101
Q3        2.30069     2.31024
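The five-number summaries can be recomputed from the raw measurements. In this sketch, the quartile values listed above appear to match linear interpolation over positions 1 to n (the `"inclusive"` method of Python's `statistics.quantiles`, available from Python 3.8) rather than the i(n+1)/4 position rule; that observation comes from recomputing the numbers, not from the slides. The function name `five_number_summary` is illustrative.

```python
import statistics

chemical = [2.30143, 2.29890, 2.29816, 2.30182, 2.29869,
            2.29940, 2.29849, 2.29889, 2.30074, 2.30054]
atmospheric = [2.31017, 2.30986, 2.31010, 2.31001, 2.31024,
               2.31010, 2.31028, 2.31163, 2.30956]

def five_number_summary(data):
    """Return [min, Q1, median, Q3, max], rounded to five decimals."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return [round(v, 5) for v in (min(data), q1, median, q3, max(data))]

chemical_summary = five_number_summary(chemical)
atmospheric_summary = five_number_summary(atmospheric)
print("Chemical:   ", chemical_summary)
print("Atmospheric:", atmospheric_summary)
```

Every chemical measurement lies below every atmospheric one (the chemical maximum, 2.30182, is smaller than the atmospheric minimum, 2.30956), so the two side-by-side boxes do not overlap at all.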