04/10/23

(c) 2000, Ron S. Kenett, Ph.D. 1

Understanding Variability

Instructor: Ron S. Kenett
Email: ron@kpa.co.il

Course Website: www.kpa.co.il/biostat
Course textbook: MODERN INDUSTRIAL STATISTICS, Kenett and Zacks, Duxbury Press, 1998

Course Syllabus

• Understanding Variability
• Variability in Several Dimensions
• Basic Models of Probability
• Sampling for Estimation of Population Quantities
• Parametric Statistical Inference
• Computer Intensive Techniques
• Multiple Linear Regression
• Statistical Process Control
• Design of Experiments

Discrete Data

A set of data is said to be discrete if its values (observations) are distinct and separate, that is, they can be counted (1, 2, 3, ...). Examples include the number of kittens in a litter, the number of patients in a doctor's surgery, and the number of flaws in one metre of cloth. Categories such as gender (male, female) or blood group (O, A, B, AB) also take distinct, separate values, although they are qualitative rather than counts.

Continuous Data

A set of data is said to be continuous if the values / observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data. For example, height; weight; temperature; the amount of sugar in an orange; the time required to run a mile.

Types of Variables

Qualitative variables - attributes, categories
Examples: male/female, registered to vote/not, ethnicity, eye color

Quantitative variables
Discrete - usually take on integer values but can take on fractions when the variable allows; counts ("how many")
Continuous - can take on any value at any point along an interval; measurements ("how much")

Self Assessment Test

For each of the following, indicate whether the appropriate variable would be qualitative or quantitative. If the variable is quantitative, indicate whether it would be discrete or continuous.

Self Assessment Test

a) Whether you own an RCA Colortrak television set
Qualitative variable: two levels (yes/no); no measurement.

b) Your status as a full-time or a part-time student
Qualitative variable: two levels (full-time/part-time); no measurement.

c) Number of people who attended your school's graduation last year
Quantitative, discrete variable: a count, so only whole numbers.

Self Assessment Test

d) The price of your most recent haircut
Quantitative, discrete variable: prices are counted in whole units of currency (cents), so the values are separate and countable.

e) Sam's travel time from his dorm to the Student Union
Quantitative, continuous variable: time is measured, so it can take on any value greater than zero.

Self Assessment Test

f) The number of students on campus who belong to a social fraternity or sorority
Quantitative, discrete variable: a count, so only whole numbers.

Scales of Measurement

Nominal Scale - Labels represent various levels of a categorical variable.

Ordinal Scale - Labels represent an order that indicates either preference or ranking.

Interval Scale - Numerical labels indicate order and distance between elements. There is no absolute zero and multiples of measures are not meaningful.

Ratio Scale - Numerical labels indicate order and distance between elements. There is an absolute zero and multiples of measures are meaningful.
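A small sketch of why multiples are meaningful only on a ratio scale: convert two measurements to another unit and check whether their ratio survives. The Celsius/Fahrenheit comparison is our own illustration (temperature in those units is a common interval-scale example), and the helper names are hypothetical.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Interval-scale conversion: changes the unit AND moves the zero point."""
    return c * 9 / 5 + 32


def dollars_to_cents(d: float) -> float:
    """Ratio-scale conversion: changes the unit only; zero stays zero."""
    return d * 100


# Interval scale: 20 C looks like "twice" 10 C, but the ratio changes with the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio scale: $30 is twice $15 in any monetary unit.
ratio_dollars = 30 / 15
ratio_cents = dollars_to_cents(30) / dollars_to_cents(15)

print(ratio_celsius, ratio_fahrenheit)  # 2.0 1.36
print(ratio_dollars, ratio_cents)       # 2.0 2.0
```

The Celsius ratio of 2 becomes 1.36 in Fahrenheit because the zero point moved, while the dollar ratio is unchanged in cents.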

Self Assessment Test

Bill scored 1200 on the Scholastic Aptitude Test and entered college as a physics major. As a freshman, he changed to business because he thought it was more interesting. Because he made the dean’s list last semester, his parents gave him $30 to buy a new Casio calculator. Identify at least one piece of information in the:

Self Assessment Test

a) nominal scale of measurement

1. Bill is going to college.
2. Bill will buy a Casio calculator.
3. Bill was a physics major.
4. Bill is a business major.
5. Bill was on the dean's list.

Self Assessment Test

b) ordinal scale of measurement: Bill is a freshman.

c) interval scale of measurement: Bill earned a 1200 on the SAT.

d) ratio scale of measurement: Bill's parents gave him $30.

Histogram

A histogram is a way of summarising data that are measured on an interval scale (either discrete or continuous). It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form. It divides the range of possible values in a data set into classes or groups. For each group, a rectangle is constructed with a base length equal to the range of values in that specific group and an area proportional to the number of observations falling into that group. Because area, not height, represents frequency, a class of unequal width is drawn with height equal to its frequency divided by its width, so the rectangles need not have uniform height.
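A minimal sketch of the area rule, assuming classes are half-open intervals [lower, upper). The function name is ours, and the counts and edges are hypothetical, chosen so that one class is twice as wide as the others.

```python
def histogram_heights(counts, edges):
    """Bar heights for a histogram whose bar AREAS equal the class counts.

    counts[i] is the number of observations in [edges[i], edges[i + 1]).
    Each height is the frequency density: count / class width.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than there are counts")
    heights = []
    for count, lower, upper in zip(counts, edges, edges[1:]):
        width = upper - lower
        if width <= 0:
            raise ValueError("class edges must be strictly increasing")
        heights.append(count / width)
    return heights


# Hypothetical classes: the last one is twice as wide as the others.
counts = [4, 6, 8]
edges = [0, 10, 20, 40]
heights = histogram_heights(counts, edges)
print(heights)  # [0.4, 0.6, 0.4]
```

The widest class holds the most observations (8) yet gets the same height as the first class (0.4), because its bar is twice as wide; plotting raw counts instead would exaggerate it.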

Key Terms

Data array - An orderly presentation of data in either ascending or descending numerical order.

Frequency distribution - A table that represents the data in classes and shows the number of observations in each class.

Key Terms

Frequency distribution
• Class - the category
• Frequency - number in each class
• Class limits - boundaries for each class
• Class interval - width of each class
• Class mark - midpoint of each class

Sturges' Rule

Sturges' rule sets the approximate number of classes to begin constructing a frequency distribution:

$$k = 1 + 3.322\,\log_{10} n$$

where k = approximate number of classes to use and n = the number of observations in the data set.
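The rule is a one-line computation. A minimal sketch in Python; rounding k to the nearest whole number is our assumption about how "approximate" is applied, and the function name is ours.

```python
import math


def sturges_classes(n: int) -> float:
    """Approximate number of classes: k = 1 + 3.322 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + 3.322 * math.log10(n)


k = sturges_classes(50)
print(round(k, 3), round(k))  # 6.644 7
```

The constant 3.322 is approximately 1/log10(2), so the rule is essentially k = 1 + log2(n).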

Frequency Distributions

1. Number of classes: Choose an approximate number of classes for your data. Sturges' rule can help.

2. Estimate the class interval: Divide the range of your data by the approximate number of classes (from Step 1) to find the approximate class interval, where the range is defined as the largest data value minus the smallest data value.

3. Determine the class interval: Round the estimate (from Step 2) to a convenient value.

Frequency Distributions

4. Lower class limit: Determine the lower class limit for the first class by selecting a convenient number that is smaller than the lowest data value.

5. Class limits: Determine the other class limits by repeatedly adding the class width (from Step 3) to the prior class limit, starting with the lower class limit (from Step 4).

6. Define the classes: Use the sequence of class limits to define the classes.
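Steps 2, 5 and 6 are mechanical, while steps 1, 3 and 4 call for judgment. The sketch below therefore takes the rounded width (Step 3) and the first lower limit (Step 4) as inputs; the function names are our own, and classes are assumed to be half-open, as in "$450 up to $550".

```python
def estimate_class_width(data, k):
    """Step 2: approximate class interval = range / approximate number of classes."""
    return (max(data) - min(data)) / k


def make_classes(data, width, lower):
    """Steps 5 and 6: classes [lower, lower + width), [lower + width, ...), ...

    Classes are added until the largest data value is covered.
    """
    if width <= 0:
        raise ValueError("class width must be positive")
    if lower > min(data):
        raise ValueError("the first lower class limit must not exceed the smallest value")
    classes = []
    limit = lower
    while limit <= max(data):
        classes.append((limit, limit + width))
        limit += width
    return classes


# Only the smallest and largest values matter here, so we pass the extremes
# of the hospital-cost example: $482 and $1,221.
extremes = [482, 1221]
print(round(estimate_class_width(extremes, 7), 1))  # 105.6
for lo, hi in make_classes(extremes, width=100, lower=450):
    print(f"${lo:,} up to ${hi:,}")
```

With width 100 and lower limit 450 this yields the eight classes used in the example, from $450 up to $550 through $1,150 up to $1,250.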

Relative Frequency Distributions

1. Retain the same classes defined in the frequency distribution.

2. Sum the total number of observations across all classes of the frequency distribution.

3. Divide the frequency for each class by the total number of observations, giving the proportion of data values in each class (multiply by 100 for a percentage).

Cumulative Relative Frequency Distributions

1. List the number of observations in the lowest class.

2. Add the frequency of the lowest class to the frequency of the second class. Record that cumulative sum for the second class.

3. Continue to add the prior cumulative sum to the frequency for that class, so that the cumulative sum for the final class is the total number of observations in the data set.

Cumulative Relative Frequency Distributions

4. Divide the accumulated frequency for each class by the total number of observations, giving the proportion of all observations that occurred up to and including that class.

An alternative: accumulate the relative frequencies for each class instead of the raw frequencies. Then you don't have to divide by the total.
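The relative and cumulative relative steps reduce to a division and a running sum. A minimal Python sketch, using the class counts from the hospital-cost example in this section; the function names are ours. The last two functions implement the main procedure and the alternative, which agree up to floating-point rounding.

```python
from itertools import accumulate


def relative_frequencies(counts):
    """Each class frequency divided by the total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]


def cumulative_relative_frequencies(counts):
    """Steps 1-4: running totals of the frequencies, divided by the total."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]


def cumulative_from_relative(counts):
    """The alternative: accumulate the relative frequencies directly."""
    return list(accumulate(relative_frequencies(counts)))


# Class counts from the hospital-cost example in this section.
counts = [4, 3, 9, 9, 11, 7, 6, 1]
print([round(p, 2) for p in relative_frequencies(counts)])
print([round(p, 2) for p in cumulative_relative_frequencies(counts)])
print([round(p, 2) for p in cumulative_from_relative(counts)])
```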

Example

The average daily cost to community hospitals for patient stays during 1993 for each of the 50 U.S. states is given in the following table.

a) Arrange these into a data array.
b) Construct a stem-and-leaf display.
*) Approximately how many classes would be appropriate for these data?
c & d) Construct a frequency distribution. State the interval width and class mark.
e) Construct a histogram, a relative frequency distribution, and a cumulative relative frequency distribution.

Example – Data List

Average daily cost by state (dollars):

AL   775   HI   823   MA 1,036   NM 1,046   SD   506
AK 1,136   ID   659   MI   902   NY   784   TN   859
AZ 1,091   IL   917   MN   652   NC   763   TX 1,010
AR   678   IN   898   MS   555   ND   507   UT 1,081
CA 1,221   IA   612   MO   863   OH   940   VT   676
CO   961   KS   666   MT   482   OK   797   VA   830
CT 1,058   KY   703   NE   626   OR 1,052   WA 1,143
DE 1,024   LA   875   NV   900   PA   861   WV   701
FL   960   ME   738   NH   976   RI   885   WI   744
GA   775   MD   889   NJ   829   SC   838   WY   537

Example – Data Array

Descending order, reading down each column (dollars):

CA 1,221   TX 1,010   RI   885   NY   784   KS   666
WA 1,143   NH   976   LA   875   AL   775   ID   659
AK 1,136   CO   961   MO   863   GA   775   MN   652
AZ 1,091   FL   960   PA   861   NC   763   NE   626
UT 1,081   OH   940   TN   859   WI   744   IA   612
CT 1,058   IL   917   SC   838   ME   738   MS   555
OR 1,052   MI   902   VA   830   KY   703   WY   537
NM 1,046   NV   900   NJ   829   WV   701   ND   507
MA 1,036   IN   898   HI   823   AR   678   SD   506
DE 1,024   MD   889   OK   797   VT   676   MT   482

Example – Stem-and-Leaf Display

Stem-and-Leaf Display, N = 50 (stem unit: $100; each leaf is the last two digits, in dollars)

Count  Stem  Leaves
  1     12   21
  2     11   43, 36
  8     10   91, 81, 58, 52, 46, 36, 24, 10
  7      9   76, 61, 60, 40, 17, 02, 00
(11)     8   98, 89, 85, 75, 63, 61, 59, 38, 30, 29, 23
  9      7   97, 84, 75, 75, 63, 44, 38, 03, 01
  7      6   78, 76, 66, 59, 52, 26, 12
  4      5   55, 37, 07, 06
  1      4   82

The parentheses mark the stem that contains the median.

Range: $482 – $1,221
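A sketch that builds the data array (part a) and the stem-and-leaf display (part b) from the 50 values. The dictionary is a transcription of the data list, and the `stem_and_leaf` helper is our own; its per-stem counts should reproduce the 1, 2, 8, 7, 11, 9, 7, 4, 1 column of the display.

```python
from collections import defaultdict

# Average daily cost (dollars), 1993, transcribed from the data list.
COSTS = {
    "AL": 775, "AK": 1136, "AZ": 1091, "AR": 678, "CA": 1221,
    "CO": 961, "CT": 1058, "DE": 1024, "FL": 960, "GA": 775,
    "HI": 823, "ID": 659, "IL": 917, "IN": 898, "IA": 612,
    "KS": 666, "KY": 703, "LA": 875, "ME": 738, "MD": 889,
    "MA": 1036, "MI": 902, "MN": 652, "MS": 555, "MO": 863,
    "MT": 482, "NE": 626, "NV": 900, "NH": 976, "NJ": 829,
    "NM": 1046, "NY": 784, "NC": 763, "ND": 507, "OH": 940,
    "OK": 797, "OR": 1052, "PA": 861, "RI": 885, "SC": 838,
    "SD": 506, "TN": 859, "TX": 1010, "UT": 1081, "VT": 676,
    "VA": 830, "WA": 1143, "WV": 701, "WI": 744, "WY": 537,
}


def stem_and_leaf(values, stem_unit=100):
    """Rows of (count, stem, leaves), largest stem first.

    stem = value // stem_unit; leaf = value % stem_unit (two digits here).
    Stems with no leaves are omitted, which is harmless for this data set.
    """
    stems = defaultdict(list)
    for v in values:
        stems[v // stem_unit].append(v % stem_unit)
    rows = []
    for stem in sorted(stems, reverse=True):
        leaves = sorted(stems[stem], reverse=True)
        rows.append((len(leaves), stem, leaves))
    return rows


# a) Data array: states ordered by cost, highest first.
data_array = sorted(COSTS.items(), key=lambda item: item[1], reverse=True)
print(data_array[:3])  # [('CA', 1221), ('WA', 1143), ('AK', 1136)]

# b) Stem-and-leaf display.
for count, stem, leaves in stem_and_leaf(COSTS.values()):
    print(f"{count:>3} {stem:>3} | " + ", ".join(f"{leaf:02d}" for leaf in leaves))
```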

Example – Frequency Distribution

To approximate the number of classes we should use in creating the frequency distribution, use Sturges' rule with n = 50:

$$k = 1 + 3.322\,\log_{10} n = 1 + 3.322\,\log_{10} 50 = 1 + 3.322(1.69897) = 1 + 5.644 = 6.644 \approx 7$$

Sturges' rule suggests we use approximately 7 classes.

Example – Frequency Distribution

Step 1. Number of classes: Sturges' rule suggests approximately 7 classes.

Steps 2 & 3. The class interval: The range is $1,221 – $482 = $739, and $739/7 ≈ $106 while $739/8 ≈ $92. So, if we use 8 classes, we can make each class $100 wide.

Example – Frequency Distribution

Step 4. The lower class limit: If we start at $450, we can cover the range in 8 classes, each $100 in width. The first class is $450 up to $550.

Steps 5 & 6. Setting class limits:
$450 up to $550
$550 up to $650
$650 up to $750
$750 up to $850
$850 up to $950
$950 up to $1,050
$1,050 up to $1,150
$1,150 up to $1,250

Example – Frequency Distribution

Average daily cost       Number   Class mark
$450 – under $550           4       $500
$550 – under $650           3       $600
$650 – under $750           9       $700
$750 – under $850           9       $800
$850 – under $950          11       $900
$950 – under $1,050         7     $1,000
$1,050 – under $1,150       6     $1,100
$1,150 – under $1,250       1     $1,200

Interval width: $100
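A sketch that reproduces this table, and the relative and cumulative columns on the following slides, from the raw data. The list transcribes the data list, and `frequency_table` is a hypothetical helper that assumes half-open classes ("$450 – under $550").

```python
# Average daily cost (dollars), 1993, in the order of the data list (AL, AK, AZ, ...).
COSTS = [
    775, 1136, 1091, 678, 1221, 961, 1058, 1024, 960, 775,
    823, 659, 917, 898, 612, 666, 703, 875, 738, 889,
    1036, 902, 652, 555, 863, 482, 626, 900, 976, 829,
    1046, 784, 763, 507, 940, 797, 1052, 861, 885, 838,
    506, 859, 1010, 1081, 676, 830, 1143, 701, 744, 537,
]

LOWER, WIDTH, N_CLASSES = 450, 100, 8  # choices made in Steps 3 and 4


def frequency_table(values, lower, width, n_classes):
    """Count values in half-open classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = (v - lower) // width
        if not 0 <= i < n_classes:
            raise ValueError(f"value {v} falls outside the classes")
        counts[i] += 1
    return counts


counts = frequency_table(COSTS, LOWER, WIDTH, N_CLASSES)
total = sum(counts)
cumulative = 0
for i, count in enumerate(counts):
    lo = LOWER + i * WIDTH
    cumulative += count
    mark = lo + WIDTH // 2  # class mark = midpoint of the class
    print(f"${lo:,} - under ${lo + WIDTH:,}: {count:>2}  mark ${mark:,}"
          f"  rel {count / total:.2f}  cum {cumulative:>2}  cum rel {cumulative / total:.2f}")
```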

Example – Histogram

[Figure: histogram of the frequency distribution; class marks 500 to 1,200 on the horizontal axis, frequency 0 to 12 on the vertical axis]

Example – Relative Frequency Distribution

Average daily cost       Number   Rel. freq.
$450 – under $550           4     4/50 = .08
$550 – under $650           3     3/50 = .06
$650 – under $750           9     9/50 = .18
$750 – under $850           9     9/50 = .18
$850 – under $950          11    11/50 = .22
$950 – under $1,050         7     7/50 = .14
$1,050 – under $1,150       6     6/50 = .12
$1,150 – under $1,250       1     1/50 = .02

Example – Polygon

[Figure: relative frequency polygon; average daily cost 0 to 1,400 on the horizontal axis, relative frequency 0 to 0.25 on the vertical axis]

Example – Cumulative Frequency Distribution

Average daily cost       Number   Cum. freq.
$450 – under $550           4          4
$550 – under $650           3          7
$650 – under $750           9         16
$750 – under $850           9         25
$850 – under $950          11         36
$950 – under $1,050         7         43
$1,050 – under $1,150       6         49
$1,150 – under $1,250       1         50

Example – Cumulative Relative Frequency Distribution

Average daily cost      Cum. freq.   Cum. rel. freq.
$450 – under $550            4        4/50 = .08
$550 – under $650            7        7/50 = .14
$650 – under $750           16       16/50 = .32
$750 – under $850           25       25/50 = .50
$850 – under $950           36       36/50 = .72
$950 – under $1,050         43       43/50 = .86
$1,050 – under $1,150       49       49/50 = .98
$1,150 – under $1,250       50       50/50 = 1.00

Example – Percentage Ogive

[Figure: ogive of the cumulative distribution; average daily cost 0 to 1,200 on the horizontal axis, vertical axis 0 to 50]

Key Terms

Measures of central tendency (the center):
• Mean: µ (population), x̄ (sample)
• Weighted mean
• Median
• Mode

Key Terms

Measures of dispersion (the spread):
• Range
• Mean absolute deviation
• Variance
• Standard deviation
• Interquartile range
• Interquartile deviation
• Coefficient of variation

Key Terms

Measures of relative position:
• Quantiles: quartiles, deciles, percentiles
• Residuals
• Standardized values

The Mean

Mean: arithmetic average = (sum of all values)/(number of values)

Population:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Sample:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Problem: Calculate the average number of truck shipments from the United States to five Canadian cities for the following data, given in thousands of bags:

Montreal, 64.0; Ottawa, 15.0; Toronto, 285.0; Vancouver, 228.0; Winnipeg, 45.0

(Ans: 127.4)
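The arithmetic is short enough to check by hand (637.0/5), but as a sketch in Python with the problem's data:

```python
# Thousands of bags shipped to each city (data from the problem above).
shipments = {"Montreal": 64.0, "Ottawa": 15.0, "Toronto": 285.0,
             "Vancouver": 228.0, "Winnipeg": 45.0}

mean = sum(shipments.values()) / len(shipments)
print(mean)  # 127.4
```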

The Weighted Mean

When observations carry different weights, as with grouped data, compute the mean using

$$\mu_w = \frac{\sum w_i x_i}{\sum w_i}$$

Problem: Calculate the average profit from truck shipments, United States to Canada, for the following data, given in thousands of bags and profit per thousand bags:

City         Thousands of bags   Profit per thousand bags
Montreal           64.0                $15.00
Ottawa             15.0                $13.50
Toronto           285.0                $15.50
Vancouver         228.0                $12.00
Winnipeg           45.0                $14.00

(Ans: $14.04 per thousand bags)
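A sketch of the weighted mean with the shipment volumes as weights and the profit rates as values; `weighted_mean` is a hypothetical helper, not a library function.

```python
def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Montreal, Ottawa, Toronto, Vancouver, Winnipeg
bags = [64.0, 15.0, 285.0, 228.0, 45.0]        # weights w_i: thousands of bags
profit = [15.00, 13.50, 15.50, 12.00, 14.00]   # values x_i: $ per thousand bags

print(round(weighted_mean(profit, bags), 2))   # 14.04
print(round(sum(profit) / len(profit), 2))     # 14.0 (unweighted, for contrast)
```

Weighting by volume gives $14.04, slightly above the unweighted average of the five rates ($14.00), because the heaviest-volume city, Toronto, has a high rate.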

The Median

To find the median:

1. Put the data in an array.
2A. If the data set has an ODD number of values, the median is the middle value.
2B. If the data set has an EVEN number of values, the median is the AVERAGE of the middle two values. (Note that the median of an even set of data values is not necessarily a member of the set of values.)

The median is particularly useful if there are outliers in the data set, which otherwise tend to sway the value of an arithmetic mean.
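A minimal sketch of the steps; the even-count list is hypothetical. With the truck data from the mean example, the two large values (Toronto and Vancouver) pull the mean up to 127.4 while the median stays at 64.0.

```python
import statistics


def median(values):
    """Median following the steps above."""
    data = sorted(values)                    # 1. put the data in an array
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                           # 2A. odd count: the middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # 2B. even count: average of the middle two


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]  # truck data from the mean example
print(median(shipments))                      # 64.0
print(statistics.median(shipments))           # 64.0, the standard library agrees
print(sum(shipments) / len(shipments))        # 127.4
print(median([1, 2, 3, 10]))                  # 2.5, not a member of the data set
```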

The Mode

The mode is the most frequent value. While there is just one value for the mean and one value for the median, there may be more than one value for the mode of a data set.

The mode tends to be less frequently used than the mean or the median.
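A sketch of both uses of the mode, assuming Python 3.8 or later for `statistics.multimode`. For grouped data, the modal class is the class with the highest frequency (here, the class counts from the hospital-cost example); for raw data, `multimode` returns every most-frequent value, which shows that a data set can have more than one mode. The short list passed to `multimode` is hypothetical.

```python
from statistics import multimode

# Class mark -> frequency, from the hospital-cost frequency distribution.
class_counts = {500: 4, 600: 3, 700: 9, 800: 9, 900: 11, 1000: 7, 1100: 6, 1200: 1}

# Grouped data: the modal class is the class with the highest frequency.
modal_mark = max(class_counts, key=class_counts.get)
print(modal_mark)  # 900, i.e. the $850 - under $950 class

# Raw data (hypothetical): two values tie for most frequent, so there are two modes.
print(multimode([3, 1, 3, 2, 1]))  # [3, 1]
```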

[Figure: histogram of average daily cost; class marks 500 to 1,200 on the horizontal axis, frequency 0 to 12 on the vertical axis]

Comparing Measures of Central Tendency

If mean = median = mode, the shape of the distribution is symmetric.

If mode < median < mean, the shape of the distribution trails to the right; that is, it is positively skewed.

If mean < median < mode, the shape of the distribution trails to the left; that is, it is negatively skewed.

The Range

The range is the distance between the smallest and the largest data value in the set.

Range = largest value – smallest value

Sometimes range is reported as an interval, anchored between the smallest and largest data value, rather than the actual width of that interval.
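A sketch that reports the range both ways, as a width and as the interval between the extremes; `value_range` is a hypothetical name, and the data are the truck shipments from the mean example.

```python
def value_range(values):
    """Return the range as a width and as the interval (smallest, largest)."""
    smallest, largest = min(values), max(values)
    return largest - smallest, (smallest, largest)


# Truck shipments (thousands of bags) from the mean example.
width, interval = value_range([64.0, 15.0, 285.0, 228.0, 45.0])
print(width, interval)  # 270.0 (15.0, 285.0)
```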

Residuals

Residuals are the differences between each data value in the set and the group mean: for a population, $x_i - \mu$; for a sample, $x_i - \bar{x}$.
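Residuals about the mean always sum to zero, which is why measures of spread work with their absolute values (MAD) or their squares (variance). A quick check with the truck-shipment data from the mean example:

```python
shipments = [64.0, 15.0, 285.0, 228.0, 45.0]  # thousands of bags
mean = sum(shipments) / len(shipments)        # 127.4
residuals = [x - mean for x in shipments]

print([round(r, 1) for r in residuals])  # [-63.4, -112.4, 157.6, 100.6, -82.4]
print(abs(sum(residuals)) < 1e-9)        # True: zero up to floating-point error
```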

The MAD

The mean absolute deviation is found by summing the absolute values of all residuals and dividing by the number of values in the set:

for a population, $$\text{MAD} = \frac{\sum |x_i - \mu|}{N}$$

for a sample, $$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$$
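A sketch of the MAD for the truck shipments; the function name is ours. The divisor is the number of values in both the population and the sample formula, so one function covers both.

```python
def mean_absolute_deviation(values):
    """MAD = sum(|x_i - mean|) / number of values."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]  # thousands of bags
print(round(mean_absolute_deviation(shipments), 2))  # 103.28
```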

The Variance

Variance is one of the most frequently used measures of spread.

For a population:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N}$$

For a sample:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$$

The right side of each equation is often used as a computational shortcut.
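A sketch that computes the variance both ways and checks the results against Python's `statistics.pvariance` (population, divisor N) and `statistics.variance` (sample, divisor n - 1); the function names are ours.

```python
import statistics


def variance_definitional(values, sample=True):
    """Sum of squared residuals divided by n - 1 (sample) or N (population)."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    return ss / (n - 1 if sample else n)


def variance_shortcut(values, sample=True):
    """Computational shortcut: (sum of x_i^2 - n * mean^2) / (n - 1 or N)."""
    n = len(values)
    m = sum(values) / n
    ss = sum(x * x for x in values) - n * m * m
    return ss / (n - 1 if sample else n)


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]  # thousands of bags
print(variance_definitional(shipments, sample=False), statistics.pvariance(shipments))
print(variance_definitional(shipments), variance_shortcut(shipments), statistics.variance(shipments))
```

All three agree here (population variance 11,680.24, sample variance 14,600.3). The shortcut subtracts two large, nearly equal numbers, so with large values that lie close together it can lose precision in floating-point arithmetic; the definitional form is the safer default in code.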

The Standard Deviation

Since variance is given in squared units, we often use the standard deviation, which is the square root of the variance and is expressed in the same units as the data:

for a population, $$\sigma = \sqrt{\sigma^2}$$

for a sample, $$s = \sqrt{s^2}$$
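A sketch that takes the square roots of the two variances for the truck data; `statistics.pstdev` and `statistics.stdev` give the same values.

```python
import math
import statistics

shipments = [64.0, 15.0, 285.0, 228.0, 45.0]  # thousands of bags
mean = sum(shipments) / len(shipments)
ss = sum((x - mean) ** 2 for x in shipments)  # sum of squared residuals

sigma = math.sqrt(ss / len(shipments))        # population: sqrt(sigma^2)
s = math.sqrt(ss / (len(shipments) - 1))      # sample: sqrt(s^2)
print(round(sigma, 2), round(s, 2))
print(statistics.pstdev(shipments), statistics.stdev(shipments))
```

The results, about 108.1 and 120.8, are in thousands of bags, the same unit as the data.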

Quartiles

One of the most frequently used quantiles is the quartile.

Quartiles divide the values of a data set into four subsets of equal size, each comprising 25% of the observations.

To find the first, second, and third quartiles:
1. Arrange the N data values into an array.
2. First quartile, Q1 = data value at position (N + 1)/4.
3. Second quartile, Q2 = data value at position 2(N + 1)/4.
4. Third quartile, Q3 = data value at position 3(N + 1)/4.

When a position is not a whole number, interpolate between the two neighboring data values; for example, position 2.5 lies halfway between the 2nd and 3rd values.
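A sketch of the (N + 1) position rule with linear interpolation. The interpolation and the clamping of out-of-range positions are our assumptions; with them, the result matches Python's `statistics.quantiles`, whose default "exclusive" method uses the same (N + 1)p positions. The `quartile` name is hypothetical.

```python
import math
import statistics


def quartile(values, q):
    """Q1, Q2 or Q3 from the data value at position q * (N + 1) / 4 (1-based).

    Fractional positions are linearly interpolated; positions outside 1..N
    are clamped to the ends (an assumption for very small data sets).
    """
    if q not in (1, 2, 3):
        raise ValueError("q must be 1, 2 or 3")
    data = sorted(values)
    n = len(data)
    pos = min(max(q * (n + 1) / 4, 1), n)
    k = math.floor(pos)   # whole part: position of the lower neighbor
    frac = pos - k        # fractional part: how far toward the next value
    if k == n:
        return float(data[-1])
    return data[k - 1] + frac * (data[k] - data[k - 1])


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]  # thousands of bags
print([quartile(shipments, q) for q in (1, 2, 3)])  # [30.0, 64.0, 256.5]
print(statistics.quantiles(shipments, n=4))         # [30.0, 64.0, 256.5]
```

With N = 5 the positions are 1.5, 3 and 4.5, so Q1 lies halfway between 15 and 45, Q2 is the median, and Q3 lies halfway between 228 and 285.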

Quartiles

[Figure: cumulative frequency (%) of Ln_YarnS, horizontal axis 0.0 to 6.0, vertical axis 0 to 100, with Q1, Q2 and Q3 marked]

Standardized Values

A standardized value measures how far above or below the population mean an individual value lies, in units of standard deviation:
• "How far above or below": (data value – mean), which is the residual.
• "In units of standard deviation": divide the residual by σ.

Standardized individual value:

$$z = \frac{x - \mu}{\sigma}$$

A negative z means the data value falls below the mean.
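A sketch that standardizes the truck shipments, treating the five cities as the population (an assumption for illustration) so that µ and σ are the population mean and standard deviation.

```python
import statistics

shipments = {"Montreal": 64.0, "Ottawa": 15.0, "Toronto": 285.0,
             "Vancouver": 228.0, "Winnipeg": 45.0}
values = list(shipments.values())

mu = statistics.fmean(values)      # population mean
sigma = statistics.pstdev(values)  # population standard deviation

z = {city: (x - mu) / sigma for city, x in shipments.items()}
for city, score in z.items():
    print(f"{city:<10} {score:+.2f}")
```

Ottawa's z of about −1.04 places it a little more than one standard deviation below the mean; Toronto, about 1.46 standard deviations above, is farthest from it. When µ and σ come from the same data, the standardized values have mean 0 and standard deviation 1.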