statistics & data analysis

Statistics & Data Analysis Course Number B01.1305 Course Section 60 Meeting Time Monday 6-9:30 pm CLASS #1

Upload: trevor-curtis

Post on 03-Jan-2016


Page 1: Statistics & Data Analysis

Statistics & Data Analysis

Course Number B01.1305

Course Section 60

Meeting Time Monday 6-9:30 pm

CLASS #1

Page 2: Statistics & Data Analysis

Professor S. D. Balkin -- May 20, 2002 2

Class #1 Outline

Introduction to the instructor

Introduction to the class
• Review of syllabus
• Introduction to statistics
• Class Goals

Types of data

Graphical and numerical methods for univariate series

Minitab Tutorial

Page 3: Statistics & Data Analysis

Professor Balkin’s Info

Ph.D. in Business Administration, Penn State

Masters in Statistics, Penn State

Mathematics/Economics and Music, Lafayette College

Employment
• Pfizer Inc.
– Management Science Group; Sept. 2001 – current
• Ernst & Young
– Quantitative Economics and Statistics Group; June 1999 – August 2001

Page 4: Statistics & Data Analysis

What is Statistics?

STATISTICS: A body of principles and methods for extracting useful information from data, for assessing the reliability of that information, for measuring and managing risk, and for making decisions in the face of uncertainty.

POPULATION: set of measurements corresponding to the entire collection of units

SAMPLE: set of measurements that are collected from a population

OBJECTIVES:
• To make inferences about a population from a sample, including the extent of uncertainty
• To design the data collection process to facilitate drawing valid inferences

Page 5: Statistics & Data Analysis

Reasons for Sampling

Typically due to the prohibitive cost of contacting millions of people or performing costly experiments
• Election polls query about 2,000 voters to make inferences regarding how all voters cast their ballots

Sometimes the sampling process is destructive
• Sampling wine quality

Page 6: Statistics & Data Analysis

Statistics in Everyday Life

Monthly Unemployment Rates (BLS)

Consumer Price Index

Presidential Approval Rating

Quality and Productivity Improvement

Scientific Inquiry
• Training effectiveness
• Advertising impact

Page 7: Statistics & Data Analysis

Interesting Statistical Perspectives

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”
– (H. G. Wells)

“There are three kinds of lies -- Lies, damn lies, and statistics.”
– (Benjamin Disraeli)

“You’ve got to know when to hold ’em, know when to fold ’em.”
– (Kenny Rogers, in The Gambler)

“The average U. S. household has 2.75 people in it.”
– (U. S. Census Bureau, 1980)

“4 out of 5 dentists surveyed recommended Trident Sugarless Gum for their patients who chew gum.”
– (Advertisement for Trident)

Page 8: Statistics & Data Analysis

Semester Overview

Understanding data
• Intro to descriptive statistics, interpreting data, and graphical methods

Dealing with and quantifying uncertainty
• Random variables and probability

Using samples to make generalizations about populations
• Assessing whether a change in data is beyond random variation

Modeling relationships and predicting
• Using sample data to create models that give predictions for all values of a population

Page 9: Statistics & Data Analysis

Goals for this Class

To gain an understanding of descriptive statistics, probability, statistical inference, and regression analysis so that they may be applied to your job

To be able to identify when statistical procedures are required to facilitate your business decision making

To be able to identify both good and poor use of statistics in business

Page 10: Statistics & Data Analysis

Goals for Me

To teach you statistics and data analysis effectively

To improve my effectiveness as an instructor

Page 11: Statistics & Data Analysis

My Promise To You

I will not teach you anything in this class that is not regularly used in business and industry

If you ask, “Where is this used?” I will have a real example for you

Page 12: Statistics & Data Analysis

Types of Data

Data

Qualitative / Categorical: qualitative trait only classifiable into categories
• Cable Appointment (Made, Missed)
• Employment Status (employed, unemployed)
• Bond Ratings (1, 2, 3, or 4 stars)
• Service Quality (poor, good, excellent)

Quantitative / Continuous: characteristic measurement on a numerical scale
• Cable Appointment Waiting Time (hours)
• Employment Tenure (months)
• Bond Return (percentage)
• Cost (dollars)

Page 13: Statistics & Data Analysis

Example: Data Types

Business Horizons (1993) conducted a comprehensive survey of 800 CEOs who run the country's largest global corporations. Some of the variables measured are given below. Classify them as quantitative or qualitative.

• State of birth
• Age
• Educational Level
• Tenure with Firm
• Total Compensation
• Area of Expertise
• Gender

Page 14: Statistics & Data Analysis

How Much Data

Variables

Univariate Data: data sets with just one piece of information
• GMAT scores for students in this class
• Incomes in a zip code
• Returns for a stock over this past year
• Respondent ages from market research
– What is a typical value?
– How do the values vary?

Bivariate Data: data sets with two pieces of information
• GMAT scores and college GPA
• Incomes and age in a zip code
• Returns and volume for a stock
• MR respondent age and purchase intent
– Is there a relationship?
– How strong is the relationship?
– Is there a predictive relationship?

Multivariate Data: data sets with three or more pieces of information
• GMAT Scores, Salary, Gender, Job Tenure, Job Category, House Ownership, etc.
– Are there relationships?
– How strong are the relationships?
– Do predictive relationships exist?

Page 15: Statistics & Data Analysis

CHAPTER 2

Summarizing Data about One Variable

Page 16: Statistics & Data Analysis

Introduction

An unorganized mass of numbers is difficult to interpret

The first task in understanding data is summarizing it
• Graphically
• Numerically

Page 17: Statistics & Data Analysis

Chapter Goals

Distinguish between qualitative and quantitative variables

Learn graphic representations of univariate data

Learn numerical representations of univariate data

Investigate data acquired over time

Page 18: Statistics & Data Analysis

Distribution of Values

A distribution is essentially how many times each possible data value occurs in a set of data.

Methods for displaying distributions
• Qualitative data
– Frequency table
– Bar charts
• Quantitative data
– Histograms
– Stem-Leaf diagrams
– Boxplots

Page 19: Statistics & Data Analysis

Example: Qualitative Data

Background: A question on a market research survey asked 17 respondents the size of their households

Data: 1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,6

Frequency Table

Household Size    Number of Households
1                 3
2                 5
3                 6
4                 2
5                 0
6                 1
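The frequency table can be reproduced by simply counting how often each household size appears in the data. The following is a minimal Python sketch rather than the Minitab procedure used in class; the variable names are illustrative assumptions.

```python
from collections import Counter

# Household sizes reported by the 17 survey respondents (from the slide).
household_sizes = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 6]

counts = Counter(household_sizes)

# List every size from the smallest to the largest observed value so that
# sizes nobody reported (here, 5) still appear with a count of 0.
frequency_table = {
    size: counts.get(size, 0)
    for size in range(min(household_sizes), max(household_sizes) + 1)
}

print("Household Size    Number of Households")
for size, n_households in frequency_table.items():
    print(f"{size:>14}    {n_households:>20}")
```

Filling in the zero count for size 5 is a design choice: it keeps the gap in the data visible instead of silently dropping the category.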

Page 20: Statistics & Data Analysis

Example: Qualitative Data (cont.)

Bar chart: plot of how frequently each category occurs in the data set

[Figure: bar chart "Number of Households"; x-axis: Household Size (1 to 6); y-axis: Frequency (0 to 7)]

Page 21: Statistics & Data Analysis

Example: Quantitative Data

Background: Forbes magazine published data on the best small firms in 1993. These were firms with annual sales of more than $5 million and less than $350 million. Firms were ranked by five-year average return on investment. The data are the annual salary of the chief executive officer for the first 60 ranked firms.

Data (in thousands):

145 621 262 208 362 424 339 736 291 58 498 643 390 332 750

368 659 234 396 300 343 536 543 217 298 1103 406 254 862 204

206 250 21 298 350 800 726 370 536 291 808 543 149 350 242

198 213 296 317 482 155 802 200 282 573 388 250 396 572

Page 22: Statistics & Data Analysis

Example: Quantitative Data (cont.)

Histograms are constructed in the same way as bar charts except:
• User must create classes to count frequencies
• Bars are adjacent instead of separated with space
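To make the "create classes" step concrete, here is a rough Python sketch that counts the CEO salaries from the earlier slide in classes of equal width. The class width of 200 and the convention that each class includes its lower boundary but not its upper one are assumptions; statistical packages may place boundary values differently, so counts can differ slightly from a plotted histogram. Note that the transcript lists 59 salary values.

```python
# CEO salaries in thousands of dollars, as listed on the earlier slide.
ceo_salaries = [
    145, 621, 262, 208, 362, 424, 339, 736, 291, 58, 498, 643, 390, 332, 750,
    368, 659, 234, 396, 300, 343, 536, 543, 217, 298, 1103, 406, 254, 862, 204,
    206, 250, 21, 298, 350, 800, 726, 370, 536, 291, 808, 543, 149, 350, 242,
    198, 213, 296, 317, 482, 155, 802, 200, 282, 573, 388, 250, 396, 572,
]


def class_counts(values, width):
    """Count non-negative integers in classes [0, width), [width, 2*width), ..."""
    n_classes = max(values) // width + 1
    counts = [0] * n_classes
    for value in values:
        counts[value // width] += 1
    return counts


width = 200  # assumed class width, in thousands of dollars
counts = class_counts(ceo_salaries, width)

# Text histogram: one row per class, one '#' per firm in that class.
for i, count in enumerate(counts):
    low = i * width
    print(f"{low:>4} to {low + width - 1:>4}: {'#' * count} ({count})")
```

Changing `width` shows how the choice of classes alters the apparent shape of the distribution.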

Page 23: Statistics & Data Analysis

Example: Quantitative Data (cont.)

[Figure: "CEO Salary Histogram"; x-axis: Salary (in thousands), 0 to 1200; y-axis: Frequency, 0 to 30]

Page 24: Statistics & Data Analysis

Example: Quantitative Data (cont.)

Questions:
• What is the typical value of CEO salary?
• How much variability is there around this value?
• What is the general shape of the data?

Histogram characteristics:
• Central tendency
• Variability
• Skewness
• Modality
• Outliers

Page 25: Statistics & Data Analysis

Skewness

[Figure: three histograms titled "Symmetric Distribution", "Right Skewed Distribution", and "Left Skewed Distribution"; x-axis: Data; y-axis: Freq]

Page 26: Statistics & Data Analysis

Modality

[Figure: two histograms titled "Unimodal Distribution" and "Bimodal Distribution"; x-axis: Data; y-axis: Freq]

Page 27: Statistics & Data Analysis

Outliers

[Figure: histogram titled "Distribution with Outlier"; x-axis: Data, 28 to 36; y-axis: Freq, 0 to 35]

Page 28: Statistics & Data Analysis

Example: Stem-Leaf Diagram

Background: Telecom company wants to analyze the time to complete new service orders measured in hours

Data: 42 21 46 69 87 29 34 59 81 97 64 60 87 81 69 77 75 47 73 82 91 74 70 65 86 87 67 69 49 57 55 68 74 66 81 90 75 82 37 94

Diagram:
2 | 19
3 | 47
4 | 2679
5 | 579
6 | 045678999
7 | 0344557
8 | 111226777
9 | 0147
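A stem-leaf diagram for two-digit data splits each value into a stem (the tens digit) and a leaf (the ones digit). The Python sketch below illustrates that idea on the service-order times; the function name `stem_and_leaf` is a hypothetical helper, and the code assumes non-negative whole-number data.

```python
from collections import defaultdict

# Hours to complete each of the 40 new service orders (from the slide).
times = [
    42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77, 75, 47,
    73, 82, 91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68, 74, 66, 81, 90,
    75, 82, 37, 94,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    rows = defaultdict(list)
    for value in sorted(values):
        rows[value // 10].append(value % 10)
    return dict(rows)


diagram = stem_and_leaf(times)
for stem in sorted(diagram):
    leaves = "".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves}")
```

Because the leaves are sorted within each stem, the printed rows double as a sorted listing of the data, which is handy when locating the median by hand.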

Page 29: Statistics & Data Analysis

Measures of Central Tendency

Mode: Value or category that occurs most frequently

Median: Middle value when the data are sorted

Mean: Sum of measurements divided by the number of measurements

Page 30: Statistics & Data Analysis

Example: Mode

Background: A question on a market research survey asked 17 respondents the size of their households

Data: 1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,6

Frequency Table

Household Size    Number of Households
1                 3
2                 5
3                 6    (Mode)
4                 2
5                 0
6                 1

Page 31: Statistics & Data Analysis

Example: Median

Background: A question on a market research survey asked 17 respondents the size of their households

Data: 1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,6

Since there are n = 17 observations, the median is the (n+1)/2 = 9th observation

Observation     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Household Size  1  1  1  2  2  2  2  2  3  3  3  3  3  3  4  4  6

Median = 3 (the 9th observation)

Page 32: Statistics & Data Analysis

Example: Mean

Background: Cable company wants to know how long an installer spends at each stop. One employee performed five installations in one day and recorded how many minutes she was at each location.

Data: 45, 23, 36, 29, 52

Mean = (45+23+36+29+52) / 5 = 37 minutes
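The mode, median, and mean examples can be checked with Python's standard `statistics` module. This is only an illustrative sketch of the three definitions applied to the slide data; the variable names are assumptions.

```python
import statistics

# Household sizes from the mode and median examples.
household_sizes = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 6]

# Minutes the installer spent at each of five stops (mean example).
install_minutes = [45, 23, 36, 29, 52]

mode_size = statistics.mode(household_sizes)      # most frequent value
median_size = statistics.median(household_sizes)  # middle value when sorted
mean_minutes = statistics.mean(install_minutes)   # sum divided by count

print("Mode household size:  ", mode_size)
print("Median household size:", median_size)
print("Mean minutes per stop:", mean_minutes)
```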

Page 33: Statistics & Data Analysis

Example: Back to the CEO’s Salaries

[Figure: "CEO Salary Histogram"; x-axis: Salary (in thousands), 0 to 1200; y-axis: Frequency, 0 to 30]

Mean = 404.1695

Median = 350

WHY THE DIFFERENCE?
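One way to explore this question is to see how each measure reacts to the largest salary in the data. The sketch below recomputes the mean and median from the 59 listed salaries, then again after dropping the single largest value. Dropping that value is purely an illustration chosen here, not a step from the slides.

```python
import statistics

# CEO salaries in thousands of dollars, as listed on the earlier slide.
ceo_salaries = [
    145, 621, 262, 208, 362, 424, 339, 736, 291, 58, 498, 643, 390, 332, 750,
    368, 659, 234, 396, 300, 343, 536, 543, 217, 298, 1103, 406, 254, 862, 204,
    206, 250, 21, 298, 350, 800, 726, 370, 536, 291, 808, 543, 149, 350, 242,
    198, 213, 296, 317, 482, 155, 802, 200, 282, 573, 388, 250, 396, 572,
]

mean_all = statistics.mean(ceo_salaries)
median_all = statistics.median(ceo_salaries)

# Remove the single largest salary and recompute both measures.
without_largest = sorted(ceo_salaries)[:-1]
mean_trimmed = statistics.mean(without_largest)
median_trimmed = statistics.median(without_largest)

print(f"All salaries:    mean = {mean_all:.4f}, median = {median_all}")
print(f"Largest removed: mean = {mean_trimmed:.4f}, median = {median_trimmed}")
```

The mean moves much more than the median, which is consistent with a right-skewed distribution pulling the mean above the median.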

Page 34: Statistics & Data Analysis

Measures of Variation

A primary reason for using statistics is variability

If there were no variability, we would not need statistics

Examples:
• Worker productivity
• Stock market
• Promotional expenditures

Measures
• Standard deviation: variation around the mean
• Range: distance between smallest and largest observations

Page 35: Statistics & Data Analysis

Standard Deviation

Standard Deviation: summarizes how far away from the mean the data values typically are.

Calculation
• Find the deviations by subtracting the mean from each data value
• Square these deviations, add them up, and divide by n-1
• Take the square root of this number

Page 36: Statistics & Data Analysis

Example: Standard Deviation

Background: Your firm spends $19 Million per year on advertising, and management is wondering if that figure is appropriate. Other firms in your industry have a mean advertising expenditure of $22.3 Million per year.

Page 37: Statistics & Data Analysis

Example: Standard Deviation (cont.)

Ad $ (millions)    Deviation    Squared Deviation
 8                 -14.29       204.32
19                  -3.29        10.85
22                  -0.29         0.09
20                  -2.29         5.26
27                   4.71        22.15
37                  14.71       216.26
38                  15.71       246.67
23                   0.71         0.50
23                   0.71         0.50
12                 -10.29       105.97
11                 -11.29       127.56
32                   9.71        94.20
20                  -2.29         5.26
18                  -4.29        18.44
23                   0.71         0.50
35                  12.71       161.44
11                 -11.29       127.56

Mean = 22.29
St Dev = 9.18
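The calculation steps for the standard deviation can be traced on the industry advertising figures. The Python sketch below follows those steps one at a time (deviations, squared deviations, division by n-1, square root); the variable names are illustrative assumptions.

```python
import math

# Advertising expenditure (millions of dollars) for the 17 industry peers.
ad_spend = [8, 19, 22, 20, 27, 37, 38, 23, 23, 12, 11, 32, 20, 18, 23, 35, 11]

n = len(ad_spend)
mean = sum(ad_spend) / n

# Step 1: deviation of each value from the mean.
deviations = [x - mean for x in ad_spend]

# Step 2: square the deviations, add them up, and divide by n - 1.
squared_deviations = [d ** 2 for d in deviations]
variance = sum(squared_deviations) / (n - 1)

# Step 3: take the square root.
std_dev = math.sqrt(variance)

print(f"Mean   = {mean:.2f}")
print(f"St Dev = {std_dev:.2f}")
```

Comparing a single firm's gap from the mean with `std_dev` gives a sense of whether that gap is large or small relative to the typical spread.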

[Figure: "Industry Advertising Histogram"; x-axis: Millions of Dollars, 5 to 40; y-axis: Frequency, 0 to 4]

Page 38: Statistics & Data Analysis

Example: Standard Deviation (cont.)

Difference from peer group average is $3.3 Million

This difference is smaller than the industry standard deviation of $9.18 Million

Conclusion: Your advertising budget, while slightly below the industry average, is typical compared with your industry peers

Page 39: Statistics & Data Analysis

Empirical Rule

If the histogram for a given sample is unimodal and symmetric (mound-shaped), then the following rule-of-thumb may be applied:

Let $\bar{x}$ represent the sample mean and $s$ the sample standard deviation. Then

$\bar{x} \pm 1s$ contains approximately 68% of the measurements;

$\bar{x} \pm 2s$ contains approximately 95% of the measurements;

$\bar{x} \pm 3s$ contains approximately all of the measurements.
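To check the rule of thumb against a data set, count the share of observations that fall within one, two, or three standard deviations of the mean. The sketch below does this for the 17 industry advertising figures from the standard deviation example; `fraction_within` is a hypothetical helper name. With so few observations, and a histogram that is not clearly mound-shaped, the shares need not match 68% and 95%.

```python
import statistics

# Advertising expenditure (millions of dollars) for the 17 industry peers.
ad_spend = [8, 19, 22, 20, 27, 37, 38, 23, 23, 12, 11, 32, 20, 18, 23, 35, 11]


def fraction_within(values, k):
    """Share of values lying within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    inside = [x for x in values if k * s >= abs(x - mean)]
    return len(inside) / len(values)


for k in (1, 2, 3):
    print(f"Within {k} st. dev.: {fraction_within(ad_spend, k):.1%}")
```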

Page 40: Statistics & Data Analysis

Example: Stock Market Volatility

Description: Stock market returns are supposed to be unpredictable. Let’s see if the empirical rule holds true

Data: S&P-500 Daily returns; Jan 01, 1998 – May 17, 2002

Mean = 0.0002

St. Dev. = 0.0128

72.8% (95.3%) of the returns fall between the sample mean plus and minus one (two) st. dev.

[Figure: "S&P-500 Daily Returns Histogram"; x-axis: Daily Return, -0.06 to 0.06; y-axis: Frequency, 0 to 350]

Page 41: Statistics & Data Analysis

Inter-Quartile Range

Inter-Quartile Range (IQR) provides an alternative approach to measuring variability

Computation:
• Sort the data and find the median
• Divide the data into top and bottom halves
• Find the median of both halves. These are the 25th and 75th percentiles
• IQR = 75th percentile – 25th percentile

Outlier Measure – Any value outside the inner fences is an outlier candidate
• Lower inner fence = 25th percentile – 1.5 IQR
• Upper inner fence = 75th percentile + 1.5 IQR
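The median-of-halves computation and the inner fences can be sketched in Python using the service-order times from the stem-leaf example. The slide does not say what to do with the middle value when n is odd; excluding it from both halves is an assumption made here, and software packages often use other percentile conventions. The function names are hypothetical, and the code expects at least two values.

```python
import statistics

# Hours to complete each of the 40 new service orders (stem-leaf example).
times = [
    42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77, 75, 47,
    73, 82, 91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68, 74, 66, 81, 90,
    75, 82, 37, 94,
]


def quartiles_by_halves(values):
    """Return (25th, 75th) percentiles as the medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]  # for odd n the middle value is left out
    return statistics.median(lower), statistics.median(upper)


def outlier_candidates(values):
    """Return the values lying outside the inner fences."""
    q1, q3 = quartiles_by_halves(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if lower_fence > x or x > upper_fence]


q1, q3 = quartiles_by_halves(times)
print(f"25th percentile = {q1}, 75th percentile = {q3}, IQR = {q3 - q1}")
print("Outlier candidates:", outlier_candidates(times))
```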

Page 42: Statistics & Data Analysis

Box-Plot – S&P-500 Example

Data: S&P-500 Daily returns; Jan 01, 1998 – May 17, 2002

[Figure: "S&P-500 Daily Returns Boxplot"; y-axis: Daily Return, -0.06 to 0.04; annotated with Median, 75th percentile, 25th percentile, Upper inner fence, Lower inner fence, and Outliers]

Page 43: Statistics & Data Analysis

Minitab Tutorial

Page 44: Statistics & Data Analysis

Why Use Minitab???

Goal of course is to learn statistical concepts
• Most statistical analyses are performed using computers
• Each company may use a different statistical package

YES…Minitab is used in business!
• Typically in quality control and design of experiments

EXCEL has very limited statistical functionality and is considerably more difficult to use than Minitab

There are many stat packages (SAS, SPSS, Systat, Splus, R, Statistica, Mathematica, etc.)
• Minitab is the easiest program to use right away
• Excellent Help facilities
• Statistical glossary built-in

Page 45: Statistics & Data Analysis

Minitab Tutorial – Case Study 1

A hotel kept records over time of the reasons why guests requested room changes. The frequencies were as follows:

– Room not clean 2
– Plumbing not working 1
– Wrong type of bed 13
– Noisy location 4
– Wanted nonsmoking 18
– Didn’t like view 1
– Not properly equipped 8
– Other 6
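Outside Minitab, the same frequencies can be summarized with a few lines of Python. The sketch below is an illustrative alternative, not the tutorial's own procedure: it ranks the reasons from most to least frequent and shows each as a percentage of all requests.

```python
# Reasons guests requested a room change, with frequencies from the slide.
reasons = {
    "Room not clean": 2,
    "Plumbing not working": 1,
    "Wrong type of bed": 13,
    "Noisy location": 4,
    "Wanted nonsmoking": 18,
    "Didn't like view": 1,
    "Not properly equipped": 8,
    "Other": 6,
}

total = sum(reasons.values())

# Sort from the most to the least frequent reason.
ranked = sorted(reasons.items(), key=lambda item: item[1], reverse=True)

for reason, count in ranked:
    share = 100 * count / total
    print(f"{reason.ljust(22)} {count:>3} {share:5.1f}%")
```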

Page 46: Statistics & Data Analysis

Minitab Tutorial – Case Study 2

Exercise 2.8 in book
• Produce graphics
• Produce descriptive statistics

Page 47: Statistics & Data Analysis

Minitab Tutorial – Case Study 3

Diversification???

Data: S&P-500 and IBM daily returns from Jan 01, 1998 through May 17, 2002

Page 48: Statistics & Data Analysis

Next Time

Probability and Probability Distributions