
Practicum Module

Math11002 ‐ Business Statistics

By: Aurino Rilman Adam Djamaris

MODELLING AND SIMULATION LABORATORY MANAGEMENT PROGRAM

2010


1.1 Answer the questions below with a brief description.
1. EXPLAIN KEY DEFINITIONS AND GIVE AT LEAST 1 EXAMPLE
1.2 Use Microsoft Excel to complete the following tasks
1.3 Create a bar chart that also includes a cumulative line chart, using the data in Table 1.
1.4 Create a pie chart, and attach the Excel chart output as your answer.
1.5 The following data represent the cost of electricity during July 2006 for a random sample of 50 one-bedroom apartments in a large city
1.6 Form a frequency distribution and a percentage distribution that have class intervals with upper class limits $99, $119, and so on.
1.7 Construct a histogram and a percentage polygon
1.8 Form a cumulative percentage distribution and plot a cumulative percentage polygon
1.9 Around what amount does monthly electricity cost seem to be concentrated?
1.10 Appendix
1.10.1 Installing Excel Add-Ins for PHStat2
1.10.2 INSTALLING “DATA ANALYSIS” ON EXCEL 2007
1.10.3 Installing and Operating the Prentice Hall PHStat ON Your Home Computer
1.10.4 Configuring Excel 2007 security for PHStat2

2 NUMERICAL DESCRIPTIVE MEASURES
2.1 Central Tendency
2.1.1 The Mean
2.1.2 The Median
2.1.3 The Mode
2.1.4 Quartiles
2.1.5 The Geometric Mean
2.1.6 Other useful Excel Basic Built-In Functions
2.2 Assignment 2.1
2.3 Variation
2.3.1 The Range
2.3.2 The Interquartile Range
2.3.3 The Variance and Standard Deviation
2.3.4 The Coefficient of Variation
2.3.5 Z Scores
2.4 Shape
2.4.1 Formula
2.5 Assignment 2.2
2.6 Descriptive summary of population
2.6.1 Excel Statistical Analysis Tools
2.6.2 Install and use the Analysis ToolPak
2.7 Box-whisker plot
2.8 Assignment 2.3
2.9 Weighted mean
2.10 Assignment 2.4
2.11 Correlation coefficients
2.12 Covariance
2.13 Assignment 2.5
2.13.1 Calories and Fat relationship
2.13.2 Fuel Efficiency Calculation and Standard

3 PROBABILITY
3.1 Basic Probability
3.2 Sample spaces and events, contingency tables, simple probability and joint probability
3.2.1 Sample Space
3.2.2 Event in Sample Space
3.2.3 Simple and Joint Probability
3.3 Bayes' Theorem
3.4 Assignment 3.1
3.5 Basic Probability Rules
3.5.1 Discrete Random Variable
3.5.2 Discrete Random Variables Expected Value
3.5.3 Discrete Random Variables Dispersion
3.5.4 Covariance
3.5.5 The Sum of Two Random Variables: Measures
3.6 Binomial Distribution
3.6.1 Properties
3.6.2 The Binomial Distribution Formula
3.6.3 The shape and Characteristics
3.7 Poisson Distribution
3.7.1 Properties
3.7.2 Formula
3.7.3 Shape
3.8 Hypergeometric distribution
3.8.1 Formula
3.8.2 Example
3.9 Read Excel Companion to Chapter 5
3.10 Assignment 3.2
3.11 Assignment 3.3

4 NORMAL AND SAMPLING DISTRIBUTION
4.1 Normal Distribution and Evaluating Normality
4.1.1 Normal Probability Density Function
4.1.2 Evaluating Normality
4.2 Sampling and Sampling Distribution
4.2.1 Sample
4.2.2 Types of Samples
4.2.3 Sampling Distributions
4.2.4 SAMPLING FROM FINITE POPULATIONS
4.3 Assignment for Simple Random Sample
4.4 Assignment for Sampling Distribution
4.5 Assignment for The Sampling Distribution of the mean
4.6 Assignment for Sampling from Finite Population

5 CONFIDENCE INTERVAL ESTIMATION
5.1 Confidence intervals
5.1.1 A point estimate and a confidence interval estimate
5.1.2 Confidence Interval for μ (σ Known)
5.1.3 Confidence Interval for μ (σ Unknown)
5.2 Confidence Interval Estimate for a Single Population Proportion
5.2.1 Example for Confidence Intervals for the Population Proportion
5.3 Determining Sample Size
5.3.1 IF Population Standard Deviation (σ) Known
5.3.2 IF Population Standard Deviation (σ) Unknown
5.3.3 To Determine The Required Sample Size For The Proportion
5.4 Assignment 5

6 HYPOTHESIS TESTING AND TWO SAMPLE TEST
6.1 Hypothesis Testing
6.1.1 The Null Hypothesis, H0
6.1.2 The Alternative Hypothesis, H1
6.1.3 The Hypothesis Testing Process
6.1.4 The Test Statistic and Critical Values
6.1.5 Errors in Decision Making
6.1.6 Level of Significance, α
6.1.7 Hypothesis Testing: σ Known
6.1.8 6 Steps of Hypothesis Testing
6.1.9 Hypothesis Testing: σ Known p-Value Approach
6.1.10 Hypothesis Testing: σ Known Confidence Interval Connections
6.1.11 One Tail Tests
6.1.12 Hypothesis Testing: σ Unknown
6.1.13 Hypothesis Testing: Connection to Confidence Intervals
6.1.14 Hypothesis Testing Proportion
6.2 Assignment 6.1
6.3 Two-Sample Tests
6.3.1 Two-Sample Tests Independent Populations
6.3.2 Independent Populations Unequal Variance

7 ANOVA AND CHI SQUARE AND NON PARAMETRIC TESTS
7.1 One-Way Analysis of Variance
7.1.1 Hypotheses: One-Way ANOVA
7.1.2 Partitioning the Variation
7.1.3 Obtaining the Mean Squares
7.1.4 One-Way ANOVA Table
7.1.5 Test statistic
7.1.6 Example
7.1.7 The Tukey-Kramer Procedure
7.1.8 ANOVA Assumptions
7.2 Two-Way Analysis of Variance
7.2.1 Sources of Variation
7.2.2 Two-Way ANOVA: Features
7.2.3 Interaction
7.3 CHI SQUARE AND NON PARAMETRIC TESTS
7.3.1 One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies
7.3.2 One-Variable Chi-Square (goodness-of-fit test) with predetermined expected frequencies
7.3.3 Two-Variable Chi-Square (test of independence)
7.4 Assignment
7.4.1 Assignment 7.1
7.4.2 Assignment 7.2
7.4.3 Assignment 7.3

8 REGRESSION ANALYSIS
8.1 Simple Regression Analysis
8.2 Regression Analysis Using Excel
8.3 Regression Dialog Box
8.4 Simple Regression
8.5 Linear Correlation and Regression Analysis

9 MULTIPLE REGRESSION MODEL
9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN
9.2 INTERPRET REGRESSION STATISTICS TABLE
9.3 INTERPRET ANOVA TABLE
9.4 INTERPRET REGRESSION COEFFICIENTS TABLE
9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER
9.7.1 Using the p-value approach
9.7.2 Using the critical value approach
9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS
9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS
9.10 EXCEL LIMITATIONS
9.11 Assignment 9.1

10 TIME SERIES FORECASTING
10.1 Time series forecasting models
10.1.1 CLASSICAL MULTIPLICATIVE TIME-SERIES MODEL FOR ANNUAL DATA
10.1.2 Assignment 9.1
10.2 Moving Average and Exponential Smoothing
10.2.1 Moving Average Models
10.2.2 Exponential Smoothing Models
10.3 Assignment 10.2
10.4 Linear, exponential and quadratic trend
10.4.1 Linear Trend Model
10.4.2 Exponential Trend Model
10.4.3 Model Selection Using First, Second, and Percentage Differences
10.4.4 Assignment 10.3
10.5 The autoregressive and the least-squares models for seasonal data
10.6 Price indexes
10.6.1 Example
10.7 Aggregated and simple indexes
10.7.1 Unweighted Aggregate Price Index
10.7.2 Weighted Aggregate Price Indexes


Practicum: Math11002 Business Statistics MODULE 1

Date of Receipt:

Score:

Assistant Signature:

Submitted only on Day/Date: ____________ / ______________

Time: 12.00 – 14.00 WIB

In: ____________________

I hereby declare that I have strived to complete this module by myself.

Name/NIM: ______________________________/_______________

Signature: _______________________________________________

Rem.:

Module Description: Data Collection and Data Presentation

Objective: Students understand the sources and types of data used in business; develop tables and charts for categorical data and for numerical data, and present graphs; examine cross-tabulated data using the contingency table and the side-by-side bar chart; and use Microsoft Excel to process business data.

Output: Use separate sheets of paper to report your results (handwritten or computer printout). The report should document the working procedures and the results, and should be submitted in both softcopy and hardcopy.

Pre-Lab Read:

Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 18-30 and pages 75-93.

Set up Microsoft Excel so that the Data Analysis Add-In is ready to use. See pages 28-29.

1.1 Answer the questions below with a brief description.

1. Explain each key term and give at least one example.

1.1 Population:

1.2 Sample:

1.3 Parameter:

1.4 Statistic:

1.5 Descriptive Statistics:

1.6 Inferential Statistics:

2. Name three circumstances that require data collection.

3. Explain the difference between descriptive and inferential statistics.

4. Design a questionnaire of your own for data collection, with at least 10 questions.


5. According to The State of the News Media, 2006, the average age of viewers of "ABC World News Tonight" is 59 years. Suppose a rival network executive hypothesizes that the average age of ABC news viewers is less than 59. To test her hypothesis, she samples 500 ABC nightly news viewers and determines the age of each.

5.1 Describe the population.

5.2 Describe the variable of interest.

5.3 Describe the sample.

5.4 Describe the inference.

6. "Cola wars" is the popular term for the intense competition between Coca-Cola and Pepsi displayed in their marketing campaigns. Their campaigns have featured movie and television stars, rock videos, athletic endorsements, and claims of consumer preference based on taste tests. Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are given a blind taste test (i.e., a taste test in which the two brand names are disguised). Each consumer is asked to state a preference for brand A or brand B.

6.1 Describe the population.

6.2 Describe the variable of interest.

6.3 Describe the sample.

6.4 Describe the inference.

1.2 Use Microsoft Excel to complete the following tasks.

1.3 Create a bar chart that also includes a cumulative line chart, using the data in Table 1.

1.4 Create a pie chart, and attach the Excel chart output as your answer.

Table 1. Percentage Expended Money

What You Would Do With the Money        Percentage (%)
Buy a luxury item, vacation, or gift    20
Give it to charity                      2
Pay debt                                24
Save                                    31
Spend on essentials                     16
Other                                   7
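The cumulative line asked for in task 1.3 is built from running totals of the percentages. As a cross-check of the Excel result, the sketch below computes those totals in Python. It is an illustration only and not part of the original module: it assumes the categories are ordered from largest to smallest percentage (the usual Pareto-style ordering), and the names `spending` and `cumulative_percentages` are hypothetical.

```python
# Table 1 data: category -> percentage of respondents.
spending = {
    "Buy a luxury item, vacation, or gift": 20,
    "Give it to charity": 2,
    "Pay debt": 24,
    "Save": 31,
    "Spend on essentials": 16,
    "Other": 7,
}


def cumulative_percentages(table):
    """Return (category, percentage, cumulative percentage) rows,
    ordered from the largest percentage to the smallest (assumed ordering)."""
    ordered = sorted(table.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running_total = 0
    for category, percentage in ordered:
        running_total += percentage
        rows.append((category, percentage, running_total))
    return rows


# Print a small text table; the last cumulative value should be 100.
for category, percentage, cumulative in cumulative_percentages(spending):
    print(f"{category:<40}{percentage:>5}{cumulative:>6}")
```

The third column is what the cumulative line in the Excel chart should trace when the bars are sorted in the same order.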

1.5 The following data represent the cost of electricity during July 2006 for a random sample of 50 one-bedroom apartments in a large city.

Table 2. Utility Charge ($)

 96 171 202 178 147 102 153 197 127  82
157 185  90 116 172 111 148 213 130 165
141 149 206 175 123 128 144 168 109 167
 95 163 150 154 130 143 187 166 139 149
108 119 183 151 114 135 191 137 129 158

1.6 Form a frequency distribution and a percentage distribution that have class intervals with upper class limits $99, $119, and so on.

1.7 Construct a histogram and a percentage polygon

1.8 Form a cumulative percentage distribution and plot a cumulative percentage polygon
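The tallying behind tasks 1.6 and 1.8 can be cross-checked outside Excel. The sketch below is a minimal Python illustration, not part of the original module. It assumes a class width of $20 with the first class starting at $80, which is implied by the upper class limits $99 and $119 but not stated explicitly; the names `utility_charges` and `frequency_table` are hypothetical.

```python
# Table 2 data: July 2006 electricity cost ($) for 50 one-bedroom apartments.
utility_charges = [
    96, 171, 202, 178, 147, 102, 153, 197, 127, 82,
    157, 185, 90, 116, 172, 111, 148, 213, 130, 165,
    141, 149, 206, 175, 123, 128, 144, 168, 109, 167,
    95, 163, 150, 154, 130, 143, 187, 166, 139, 149,
    108, 119, 183, 151, 114, 135, 191, 137, 129, 158,
]


def frequency_table(data, first_lower=80, width=20):
    """Return rows of (lower, upper, count, percent, cumulative percent).

    first_lower and width are assumptions derived from the stated
    upper class limits ($99, $119, ...).
    """
    n = len(data)
    rows = []
    cumulative_count = 0
    lower = first_lower
    while lower <= max(data):
        upper = lower + width - 1
        # Count the observations that fall inside this class interval.
        count = sum(lower <= value <= upper for value in data)
        cumulative_count += count
        rows.append((lower, upper, count,
                     100 * count / n, 100 * cumulative_count / n))
        lower += width
    return rows


print("Class        Freq   Pct   Cum. pct")
for lower, upper, count, pct, cum_pct in frequency_table(utility_charges):
    print(f"${lower}-${upper:<6}{count:>5}{pct:>7.1f}{cum_pct:>9.1f}")
```

The count and percentage columns correspond to the distributions in task 1.6, and the cumulative percentage column gives the points to plot for the polygon in task 1.8.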

1.9 Around what amount does monthly electricity cost seem to be concentrated?

1.10 Appendix

1.10.1 Installing Excel Add-Ins for PHStat2

The Prentice Hall PHStat Microsoft Excel add-in enhances Microsoft Excel to better support the statistical analyses taught in an introductory statistics course. Using PHStat lessens the technical training needed to use Microsoft Excel to perform statistical analysis and allows you to generate results that would otherwise be very tedious or impossible to produce from worksheets built from scratch. PHStat requires that the “Data Analysis” add-in is installed in Excel and that the following system requirements are met:

Any Windows 95 (or later) system; Microsoft Excel 95 or Microsoft Excel 97 (or later).

32 MB of main memory; 64 MB required when running sampling distribution simulations and data-intensive regression analyses; approximately 5 MB of free hard disk space during the setup process and 3 MB of hard disk space after installation.

Preferred display settings: PHStat will run with any display settings, but for best results set the Desktop area to 800 by 600 pixels with Small Fonts. (Use the Settings tab of the Display applet of the Control Panel to change settings.)

1.10.2 INSTALLING “DATA ANALYSIS” ON EXCEL 2007

1. Open Excel and click the Office Button.

2. In the Office Button pane, click Excel Options.

3. In the Excel Options dialog box that appears, click Add-Ins in the left panel and look for Analysis ToolPak and Analysis ToolPak‐VBA under Active Application Add-ins.

4. If they do not appear, click Go. In the Add-Ins dialog box that appears, verify that Analysis ToolPak and Analysis ToolPak‐VBA are both checked in the Add-Ins available list.

5. Click OK and exit Excel to save these settings.

Click on the “Microsoft Office” button in the upper left hand corner of the EXCEL spreadsheet and click

on “EXCEL Options” in the lower right hand corner of the pull‐down menu. On the left side of the “EXCEL

Options” page click on “Add‐ins” and then the “Go” button at the bottom of the page. This should open

the “Add‐ins” section. Select “Analysis ToolPak” and “Analysis ToolPak‐VBA” and click “OK.”

1.10.3 Installing and Operating the Prentice Hall PHStat on Your Home Computer

To use the Prentice Hall PHStat Microsoft Excel add‐in, you first need to run the setup program (Setup.exe) located in the PHStat directory on this disk. The setup program will install the PHStat program files to your system and add icons on your Desktop and Start Menu for PHStat. To do this, simply insert the PHStat disk in your CD drive and follow the directions.

To operate PHStat or Excel, simply double‐click the PHStat icon. Excel 2007 users will likely have to click “Enable Macros” in the prompt that should pop up by itself.

1.10.4 Configuring Excel 2007 security for PHStat2

You must change the Trust Center settings to allow PHStat2 to function properly. Click the Office Button, and then click Excel Options in the Office menu. In the Excel Options dialog box that appears, click Trust Center and then, in the Trust Center panel, click Trust Center Settings. In the left pane of the Trust Center dialog box that appears, first click Add‐Ins and clear, if necessary, all of the check boxes that appear under the Add‐ins banner. Next, click Macro Settings in the left pane and click either Disable all macros with notification (recommended) or Enable all macros (not recommended, use only if the other choice fails to allow PHStat2 to function properly).




Date of Receipt




Module Description: NUMERICAL DESCRIPTIVE MEASURES

Objective
Measures of central tendency, variation, and shape
Population summary measures
Five‐number summary and Box‐and‐Whisker plots
Covariance and coefficient of correlation


2 NUMERICAL DESCRIPTIVE MEASURES

2.1 Central Tendency
Central tendency refers to the tendency of the individual measures in a distribution to cluster together toward some point of aggregation.

2.1.1 The Mean
The mean, or arithmetic mean, is the sum of the data values divided by the number of values included in the calculation.

2.1.1.1 Formula: The Mean
The sum of the values divided by the number of values:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Where
$\bar{X}$ = sample mean
$n$ = number of values or sample size
$X_i$ = ith value of the variable X
$\sum_{i=1}^{n} X_i$ = summation of all values in the sample

2.1.1.2 Ms Excel Built‐In Function for calculating Mean
The function is written as follows:

= AVERAGE (argument)

The argument for this function is the data contained in the selected range of cells.

Example Using Excel's AVERAGE Function:

Note: For help with this example, see the image to the right.

1. Enter the following data into cells C1 to C6: 11,12,13,14,15,16.

2. Click on cell C7 ‐ the location where the results will be displayed.

3. Type " = average( " in cell C7.

4. Drag select cells C1 to C6 with the mouse pointer.

5. Type the closing bracket " ) " after the cell range in cell C7.

6. Press the ENTER key on the keyboard.

7. The answer ‐ 13.5 ‐ should be displayed in cell C7.

8. The complete function = AVERAGE (C1 : C6) appears in the formula bar above the worksheet.
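The AVERAGE example above can be cross-checked outside Excel. The following is only a minimal sketch, assuming Python 3 with its standard `statistics` module is available; the variable names are illustrative and not part of this module.

```python
import statistics

# Same data as cells C1 to C6 in the Excel example above.
values = [11, 12, 13, 14, 15, 16]

# statistics.mean plays the role of Excel's AVERAGE here.
mean_value = statistics.mean(values)
print(mean_value)  # 13.5
```

The result should agree with the value Excel displays in cell C7.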

2.1.2 The Median
The median is the middle value in a list of numbers. Middle, in this case, refers to arithmetic size rather than the location of the numbers in the list. If there is an even number of values, the median is the average of the middle two values.

2.1.2.1 Formula: The Median
The middle value that separates the greater and lesser halves of a data set ranked from smallest to largest:

$$\text{Median} = \frac{n+1}{2}\ \text{ranked value}$$

2.1.2.2 Ms Excel Built‐In Function for calculating Median
The syntax for the MEDIAN function is:

= MEDIAN ( number1, number2, ... number255 )

Note: Up to 255 numbers can be entered into the function.



Example Using Excel's MEDIAN Function:


1. Enter the following data into cells D1 to D5: 4,12,49,24,65.

2. Click on cell E1 ‐ the location where the results will be displayed.

3. Click on the Formulas tab.

4. Choose More Functions > Statistical from the ribbon to open the function drop down list.

5. Click on MEDIAN in the list to bring up the function's dialog box.

6. Drag select cells D1 to D5 in the spreadsheet to enter the range into the dialog box, then Click OK.

7. The answer 24 should appear in cell E1 since there are two numbers larger (49 and 65) and two numbers smaller (4 and 12) than it in the list.

8. The complete function = MEDIAN (D1 : D5) appears in the formula bar above the worksheet when you click on cell E1.
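As a cross-check of the MEDIAN example, here is a small sketch in Python (assuming the standard `statistics` module; the names are illustrative only). It sorts the values and picks the middle one, which is the (n+1)/2 ranked value for an odd number of values.

```python
import statistics

# Same data as cells D1 to D5 in the Excel example above.
values = [4, 12, 49, 24, 65]

ranked = sorted(values)                    # [4, 12, 24, 49, 65]
middle_rank = (len(ranked) + 1) // 2       # (n + 1) / 2 = 3rd ranked value
median_by_rank = ranked[middle_rank - 1]   # ranks start at 1, list indexes at 0

median_value = statistics.median(values)
print(median_by_rank, median_value)  # 24 24
```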

2.1.3 The Mode
The mode is the most frequent number in a data set.

2.1.3.1 Formula: The Mode
For example, the mode of the array 1, 3, 4, 4, 4, 7, 7, 12, 17 is 4.

2.1.3.2 Ms Excel Built‐In Function for calculating Mode
The MODE function, one of Excel's statistical functions, tells you the most frequently occurring value in a list of numbers.

The syntax for the MODE function is:

= MODE ( number1, number2, ... number255 )

Note: Up to 255 numbers can be entered into the function.

Example Using Excel's MODE Function:

1. Enter the following data into cells D1 to D6: 98,135,147,135,98,135.

2. Click on cell E1 ‐ the location where the results will be displayed.

3. Click on the Formulas tab.

4. Choose More Functions > Statistical from the ribbon to open the function drop down list.

5. Click on MODE in the list to bring up the function's dialog box.

6. Drag select cells D1 to D6 in the spreadsheet to enter the range into the dialog box. Then Click OK.

7. The answer 135 should appear in cell E1 since this number appears the most (three times) in the list of data.

8. The complete function = MODE (D1 : D6) appears in the formula bar above the worksheet when you click on cell E1.
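The MODE example can also be reproduced with a short Python sketch. This assumes the standard `statistics` and `collections` modules; the variable names are illustrative only.

```python
import statistics
from collections import Counter

# Same data as cells D1 to D6 in the Excel example above.
values = [98, 135, 147, 135, 98, 135]

# Count how often each value occurs, then take the most frequent one.
counts = Counter(values)
mode_value = statistics.mode(values)

print(counts[135], mode_value)  # 3 135
```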

2.1.4 Quartiles
Quartiles are often used in sales and survey data to divide populations into groups. For example, you can use QUARTILE to find the top 25 percent of incomes in a population.

2.1.4.1 Formulas of Quartiles

First quartile (designated Q1) = lower quartile = cuts off lowest 25% of data = 25th percentile

Second quartile (designated Q2) = median = cuts data set in half = 50th percentile

Third quartile (designated Q3) = upper quartile = cuts off highest 25% of data, or lowest 75% = 75th percentile

2.1.4.2 Ms Excel Built‐In Function for calculating Quartiles
The syntax for the QUARTILE function is:

=QUARTILE(array,quart)

Array is the array or cell range of numeric values for which you want the quartile value.

Quart indicates which value to return.

If quart equals    QUARTILE returns
0                  Minimum value
1                  First quartile (25th percentile)
2                  Median value (50th percentile)
3                  Third quartile (75th percentile)
4                  Maximum value
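A quick way to see what the five `quart` values return is to compute them on a small data set. The sketch below borrows the seven values used in later examples of this module (4, 5, 8, 7, 11, 4, 3) and assumes that the "inclusive" method of Python's `statistics.quantiles` interpolates in the same way as Excel's QUARTILE; treat that correspondence as an assumption to verify against your own worksheet.

```python
import statistics

# Illustrative data (the same values used in later examples of this module).
data = [4, 5, 8, 7, 11, 4, 3]

# quart = 0 and quart = 4 are simply the minimum and maximum.
minimum, maximum = min(data), max(data)

# quart = 1, 2, 3: the three cut points that split the ranked data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(minimum, q1, q2, q3, maximum)  # 3 4.0 5.0 7.5 11
```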




2.1.5 The Geometric Mean
The geometric mean measures the rate of change of a variable over time. Excel's GEOMEAN function returns the geometric mean of an array or range of positive data. For example, you can use GEOMEAN to calculate the average growth rate given compound interest with variable rates.

2.1.5.1 Formula: The Geometric Mean

The geometric mean is the nth root of the product of n values:

$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$

Or

The geometric mean rate of return measures the average percentage return of an investment over time:

$$\bar{R}_G = [(1+R_1) \times (1+R_2) \times \cdots \times (1+R_n)]^{1/n} - 1$$

2.1.5.2 Ms Excel Built‐In Function for calculating Geometric Mean

Syntax

= GEOMEAN(number1,number2,...)

Number1, number2, ... are 1 to 255 arguments for which you want to calculate the mean. You can also use a single array or a reference to an array instead of arguments separated by commas.

Example:

1. Enter data to cells A2 through A8: 4, 5, 8, 7, 11, 4, 3

2. On B4 type formula =GEOMEAN(A2:A8). Then press ENTER.

3. The answer 5.47698697 should appear in cell B4
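The GEOMEAN example can be checked by applying the definition directly: multiply the n values and take the nth root. This is a minimal Python sketch (assuming Python 3.8 or later for `math.prod` and `statistics.geometric_mean`); the names are illustrative only.

```python
import math
import statistics

# Same data as cells A2 to A8 in the Excel example above.
data = [4, 5, 8, 7, 11, 4, 3]

# Definition: nth root of the product of the n values.
product = math.prod(data)                 # 147840
geo_mean_by_definition = product ** (1 / len(data))

# Library version for comparison.
geo_mean = statistics.geometric_mean(data)

print(round(geo_mean_by_definition, 6), round(geo_mean, 6))
```

Both values should agree with the Excel answer to several decimal places.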

2.1.6 Other useful Excel Basic Built‐In Functions:

2.1.6.1 SUM

Layout          Values           Total   Formula
Horizontal      100  200  300    600     =SUM(C4:E4)
Vertical        100  200  300    600     =SUM(C7:C9)
Single Cells    100  200  300    600

What Does It Do? This function creates a total from a list of numbers. It can be used either horizontally or vertically. The numbers can be in single cells, in ranges, or come from other functions.

Syntax =SUM(Range1,Range2,Range3... through to Range30).

2.1.6.2 COUNT

Entries To Be Counted            Count
10    20          30             3      =COUNT(C4:E4)
10    0           30             3      =COUNT(C5:E5)
10    -20         30             3      =COUNT(C6:E6)
10    1-Jan-88    30             3      =COUNT(C7:E7)
10    21:30       30             3      =COUNT(C8:E8)
10    0.758576    30             3      =COUNT(C9:E9)
10    (blank)     30             2      =COUNT(C10:E10)
10    Hello       30             2      =COUNT(C11:E11)
10    #DIV/0!     30             2      =COUNT(C12:E12)

What Does It Do? This function counts the number of numeric entries in a list. It will ignore blanks, text and errors.

Syntax =COUNT(Range1,Range2,Range3... through to Range30)

2.1.6.3 MAX

Values                                                    Maximum
120        800         100         120         250        800          =MAX(C4:G4)

Dates                                                     Maximum
1-Jan-98   25-Dec-98   31-Mar-98   27-Dec-98   4-Jul-98   27-Dec-98    =MAX(C7:G7)

What Does It Do? This function picks the highest value from a list of data.

Syntax =MAX(Range1,Range2,Range3... through to Range30)

2.1.6.4 MIN

Values                                                    Minimum
120        800         100         120         250        100          =MIN(C4:G4)

Dates                                                     Minimum
1-Jan-98   25-Dec-98   31-Mar-98   27-Dec-98   4-Jul-98   1-Jan-98     =MIN(C7:G7)

What Does It Do? This function picks the lowest value from a list of data.

Syntax =MIN(Range1,Range2,Range3... through to Range30)



2.2 Assignment 2.1:
The sample data of 38 banks for direct deposit customers who maintain a Rp. 100 (millions) balance:

26 28 40 20 21 22 25 25 18 25 15 20

18 20 25 25 22 30 30 3 15 20 29 26

28 10 2 21 22 25 25 18 25 15 20 18

20 25 25 22 30 30 30 65 20 29 23 45

1. Using the formulas above, calculate the mean, median, mode, quartiles and geometric mean of the sample data.

2. Use Ms Excel functions to calculate the mean, median, mode, quartiles and geometric mean of the sample data.

3. Compare the results and report your analysis.

2.3 Variation
Variability or variation refers to the overall separations and differences that exist among the individual measures in a distribution, while central tendency refers to their closeness and similarity. Variation measures the spread or the dispersion of values in a data set.

2.3.1 The Range
The range is equal to the largest value minus the smallest value.

2.3.1.1 Formula: The Range

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

2.3.1.2 Ms Excel Built‐In Function for calculating The Range
To calculate the range in Ms Excel we use two built‐in functions: MAX() and MIN(). See Sections 2.1.6.3 and 2.1.6.4 above.

Based on the formula of the range above, the syntax of the formula to calculate the range is:

= MAX(range)-MIN(range)

Example:

1. Enter data to cells A2 through A8: 4, 5, 8, 7, 11, 4, 3

2. On B4 type formula = MAX(A2:A8)-MIN(A2:A8). Then press ENTER.

3. The answer 8 should appear in cell B4

2.3.2 The Interquartile Range
The interquartile range is equal to the difference between the third quartile and the first quartile in a set of data.

2.3.2.1 Formula: The Interquartile Range

$$\text{Interquartile range} = Q_3 - Q_1$$

2.3.2.2 Ms Excel Built‐In Function for calculating The Interquartile Range
To calculate the interquartile range in Ms Excel we use a formula with the built‐in function QUARTILE(). See Section 2.1.4.2 above.

Based on the formula of the interquartile range above, the syntax of the formula to calculate it is:

= QUARTILE(range,3)-QUARTILE(range,1)

Example:

1. Enter data to cells A2 through A8: 4, 5, 8, 7, 11, 4, 3

2. On B4 type formula = QUARTILE(A2:A8,3)-QUARTILE(A2:A8,1). Then press ENTER.

3. The answer 3.5 should appear in cell B4
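The range and interquartile range can be cross-checked with a short Python sketch. It uses the sample data that the other examples in this module place in cells A2 to A8 (4, 5, 8, 7, 11, 4, 3), and it assumes that the "inclusive" method of `statistics.quantiles` matches the interpolation used by Excel's QUARTILE; the variable names are illustrative only.

```python
import statistics

# Sample data used in the A2:A8 examples of this module.
data = [4, 5, 8, 7, 11, 4, 3]

# Range = largest value minus smallest value (MAX - MIN in Excel).
data_range = max(data) - min(data)

# Interquartile range = Q3 - Q1.
q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range)  # 8
print(iqr)         # 3.5
```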

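2.3.3 The Variance and Standard Deviation
The sample variance is the sum of the squared differences between each value and the mean, divided by the sample size minus 1. The standard deviation is the square root of the variance.

2.3.3.1 Formula: The Variance and Standard Deviation
Variance formula:

$$S^2 = \frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \cdots + (X_n-\bar{X})^2}{n-1}$$

Or

$$S^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}$$

Standard deviation formula:

$$S = \sqrt{\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}}$$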

2.3.3.2 Ms Excel Built‐In Function for calculating Variance and Standard Deviation
To calculate the variance in Ms Excel we use the built‐in function VAR(), and for the standard deviation we use STDEV().

Syntax:

=VAR(number1,number2,...)

=STDEV(number1,number2,...)

Number1, number2, ... are 1 to 255 number arguments corresponding to a sample of a population

Example for VARIANCE:

1. Enter data to cells A2 through A8: 4, 5, 8, 7, 11, 4, 3

2. On B4 type formula = VAR(A2:A8). Then press ENTER.

3. The answer 8 should appear in cell B4

Example for STANDARD DEVIATION:

1. Enter data to cells A2 through A8: 4, 5, 8, 7, 11, 4, 3

2. On B4 type formula = STDEV(A2:A8). Then press ENTER.

3. The answer 2.828427125 should appear in cell B4
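To see where the VAR and STDEV answers come from, the sample variance can be worked out step by step. This is a minimal Python sketch, assuming the standard `math` and `statistics` modules; the names are illustrative only.

```python
import math
import statistics

# Same data as cells A2 to A8 in the Excel examples above.
data = [4, 5, 8, 7, 11, 4, 3]

n = len(data)
mean = sum(data) / n                                    # 6.0
squared_deviations = [(x - mean) ** 2 for x in data]    # 4, 1, 4, 1, 25, 4, 9
variance_by_formula = sum(squared_deviations) / (n - 1) # 48 / 6
stdev_by_formula = math.sqrt(variance_by_formula)

# Library versions (sample variance and sample standard deviation).
variance = statistics.variance(data)
stdev = statistics.stdev(data)

print(variance_by_formula, round(stdev_by_formula, 9))  # 8.0 2.828427125
```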

2.3.4 The Coefficient of Variation
The coefficient of variation is a relative measure of variation that is always expressed as a percentage.

2.3.4.1 Formula: The Coefficient of Variation
The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.

Formula:

$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$

2.3.4.2 Ms Excel Built‐In Function for calculating the Coefficient of Variation
To calculate the coefficient of variation in Ms Excel we use a formula with the built‐in function STDEV() for the standard deviation and AVERAGE() for the mean.

Syntax:

=(STDEV(number1,number2,...)/AVERAGE(number1,number2,...))*100%

Number1, number2, ... are 1 to 255 number arguments corresponding to a sample of a population

Example:

1. Enter data to cells A2 through A8: 4, 5, 8, 7, 11, 4, 3

2. On B4 type formula =(STDEV(A2:A8)/AVERAGE(A2:A8))*100%. Then press ENTER.

3. The answer 0.471404521 should appear in cell B4 (47.14% when the cell is formatted as a percentage)

2.3.5 Z Scores
The Z score of a value is the difference between the value and the mean, divided by the standard deviation. It helps to identify an extreme value or outlier, that is, a value located far away from the mean.

Formula:

$$Z = \frac{X - \bar{X}}{S}$$

2.3.5.1 Ms Excel Built‐In Function for Z Scores
To calculate a Z score in Ms Excel we use a formula with the built‐in function STDEV() for the standard deviation and AVERAGE() for the mean.

Syntax:

=(number - AVERAGE(range of numbers))/STDEV(range of numbers)

number is the single value whose Z score is wanted

range of numbers is the 1 to 255 number arguments corresponding to a sample of a population

Example:

1. Enter data to cells A2 through A8: 4, 5, 8, 7, 11, 4, 3

2. On C2 type formula =(A2-AVERAGE($A$2:$A$8))/STDEV($A$2:$A$8) and press ENTER. Then copy the formula to the other cells (C3 to C8).

3. The answer -0.707106781 should appear in cell C2.
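The coefficient of variation and the Z scores both combine the mean and the sample standard deviation, so they can be checked together. The sketch below is an illustration only, assuming Python's standard `statistics` module and the A2:A8 sample data used in this module (4, 5, 8, 7, 11, 4, 3); the names are hypothetical.

```python
import statistics

# Sample data used in the A2:A8 examples of this module.
data = [4, 5, 8, 7, 11, 4, 3]

mean = statistics.mean(data)     # 6
stdev = statistics.stdev(data)   # sample standard deviation, about 2.8284

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv_percent = stdev / mean * 100

# Z score of every value: distance from the mean in standard deviations.
z_scores = [(x - mean) / stdev for x in data]

print(round(cv_percent, 2))    # 47.14
print(round(z_scores[0], 9))   # -0.707106781 (the value for cell A2)
```

A Z score is negative for values below the mean and positive for values above it, so the Z scores of a data set always sum to zero (up to rounding).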

2.4 Shape
The shape of a data set represents the pattern of all the values, from the lowest to the highest value. A distribution is either symmetrical or skewed. In a symmetrical distribution, the values below the mean are distributed exactly as the values above the mean. A skewed distribution has an imbalance of low values or high values.

2.4.1 Formula:
Shape influences the relationship of the mean to the median in the following ways:

Mean < Median: negative or left skewed

Mean = Median: symmetric or zero skewness

Mean > Median: positive or right skewed

2.4.1.1 Ms Excel Function for calculating skewness

The SKEW function returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values.

Syntax

= SKEW(numbers)

Examples:

Example for SKEWNESS:

1. Enter data to cells A2 through A8: 10, 10, 20, 30, 40, 50, 50

2. On B2 type formula =SKEW(A3:A8), then press ENTER.

3. The answer -0.38 should appear in cell B2. This number means that the distribution of the data (A3 to A8) is negatively (left) skewed.

4. On B4 type formula =SKEW(A2:A8), then press ENTER.

5. The answer 0 should appear in cell B4. This number means that the distribution of the data (A2 to A8) is symmetric.

6. On B6 type formula =SKEW(A2:A7), then press ENTER.

7. The answer +0.38 should appear in cell B6. This number means that the distribution of the data (A2 to A7) is positively (right) skewed.
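The three SKEW results can be reproduced with a short Python sketch. It assumes the sample skewness formula n / ((n-1)(n-2)) × Σ((x − mean) / s)³, which is the adjusted formula Excel documents for SKEW; the function name `sample_skewness` is hypothetical and not an Excel or Python built-in.

```python
import statistics

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Same data as cells A2 to A8 in the Excel example above.
data = [10, 10, 20, 30, 40, 50, 50]

skew_a3_a8 = sample_skewness(data[1:])    # A3:A8, negative (left) skew
skew_a2_a8 = sample_skewness(data)        # A2:A8, symmetric
skew_a2_a7 = sample_skewness(data[:-1])   # A2:A7, positive (right) skew

print(round(skew_a3_a8, 2), round(skew_a2_a7, 2))  # -0.38 0.38
```

Dropping the lowest value leaves a longer tail on the low side, and dropping the highest value leaves a longer tail on the high side, which is why the sign flips.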

2.5 Assignment 2.2:
Using the data in Section 2.2 above, calculate the Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation, Z Scores, and Shape. Report your results.

2.6 Descriptive summary of population
A descriptive summary can be produced with the Descriptive Statistics procedure of the Analysis ToolPak add‐in.

INSTALLING “DATA ANALYSIS” ON EXCEL

2.6.1 Excel Statistical Analysis Tools
Excel has several data analysis tools included through an Analysis ToolPak add-in. These tools can quickly produce complex engineering or statistical analyses of your data. Each tool is a little different, but all require you to input what data you wish Excel to analyze.

Data Analysis… is located under the Tools menu. If the option is not there, you will need to install the Analysis ToolPak.

2.6.2 Install and use the Analysis ToolPak

1. On the Tools menu, click Add‐Ins….

2. Select the Analysis ToolPak check box.

3. On the Tools menu, click Data Analysis.

Note: If Analysis ToolPak is not listed in the Add‐Ins dialog box, click Browse… and locate the drive, folder name, and file name for the Analysis ToolPak add‐in, Analys32.xll (usually located in the Microsoft Office\Office\Library\Analysis folder), or run the Setup program if it isn't installed.

For EXCEL 2007:

1. Click on Data Tab and click on “Data Analysis” Icon on Data Tab.

2. Click on the “Microsoft Office” button in the upper left hand corner of the EXCEL

spreadsheet and click on “EXCEL Options” in the lower right hand corner of the pull-down menu. On the left side of the “EXCEL Options” page click on “Add-ins” and then the “Go” button at the bottom of the page. This should open the “Add-ins” section.

3. Select “Analysis ToolPak” and “Analysis ToolPak-VBA” and click “OK.”

For EXCEL 2003 or earlier version:

1. Click on the “Tools” tab/pull-down menu and click on “Data Analysis.”




2. If “Data Analysis” does not appear on the “Tools” pull-down menu, then click on “Add-Ins” and click on the first two boxes (“Analysis ToolPak” and “Analysis ToolPak-VBA”). Click “OK” and open “Data Analysis.”

Using ToolPak Descriptive Statistics

Open Data Analysis from the Analysis ToolPak add-in, select Descriptive Statistics from the Analysis Tools list, and click OK. In the Descriptive Statistics dialog box (shown below), enter the cell range of the data as the Input Range. Click the Columns option and Labels in first row. See Designing Effective Worksheets in Section 1.6 of Levine, et al. 2008. Statistics For Managers Using Microsoft Excel, Fifth Edition. Pearson Education, Inc. Upper Saddle River, New Jersey, 07458.

Finish by clicking New Worksheet Ply, Summary statistics, Kth Largest, and Kth Smallest, and then OK.

2.7 Box‐whisker plot
In descriptive statistics, a box plot or boxplot (also known as a box‐and‐whisker diagram or plot) is a

convenient way of graphically depicting groups of numerical data through their five‐number summaries:

the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and

largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be

considered outliers.

A boxplot, or box and whisker diagram, provides a simple graphical summary of a set of data. It shows a

measure of central location (the median), two measures of dispersion (the range and inter‐quartile

range), the skewness (from the orientation of the median relative to the quartiles) and potential outliers

(marked individually). Boxplots are especially useful when comparing two or more sets of data.

Regrettably, there is currently no boxplot facility in Microsoft Excel. For simplicity, many recent statistics

textbooks (for example, Daly et al, 1995) omit the fences used to identify possible outliers. These

simplified boxplots, displaying most of the important features, can be drawn quite easily in Excel. In the

absence of any fences (see Devore and Peck (1990) for a definition), a simple rule is that a whisker which

is longer than three times the length of the box probably indicates an outlier.

Create Box‐Whisker Plot

To create BoxPlot using Ms Excel 2007 :

1. Highlight the whole table, including figures and series labels, then select Insert from the main

menu. Under Charts select a Line chart and choose the Line with Markers option.

2. Under Chart Tools select Design > Switch Row/Column. Right‐click on a data point from the first

data series, and choose Format Data Series > Line Colour > No line to remove the connecting

lines. Repeat for the other four data series in turn.

3. Select any of the data series and under Chart Tools select Layout > Analysis > Lines > High‐Low

Lines, then Layout > Analysis > Lines > Up/Down Bars > Up/Down Bars.

4. Further customising can be carried out according to your own preferences by right‐clicking on

the relevant object and selecting the Format option on the shortcut menu.

The Result:

[Figure: box‐whisker plot for set 1, set 2 and set 3; vertical axis from 0 to 90; plotted series: Q1, Min, Median, Max, Q3]



2.8 Assignment 2.3
Replicate the Section 2.7 Box‐whisker plot procedure.

2.9 Weighted mean
Excel does not contain a built‐in function to calculate a weighted average. It is, however, easy to do using the SUMPRODUCT() function in a simple formula.

‐    A                   B        C
1    Weighted average
2
3                        Cost     Staff
4    Grade A             13000    5
5    Grade B             15000    2
6    Grade C             20000    3
7
8    Average             16000
9    Wtd Avg             15500

SumProduct() multiplies two arrays (or ranges) together and returns the sum of the product. In the

illustration it would calculate '(B4 x C4) + (B5 x C5) + (B6 x C6)'.

The formula in cell B9 is: = SUMPRODUCT(B4:B6, C4:C6) / SUM(C4:C6)

The result shows that the weighted average is less than the plain arithmetic mean. This is because it has

taken into account the larger number of staff being paid the lower salary.

‐     F                             G              H
13    Forecast incorporating risk
14
15                                  Probability    Sales
16    Good weather                  30%            10000
17    Mediocre weather              50%            8000
18    Poor weather                  19%            2000
19    Hurricane                     1%             0
20
21    Forecast                      100%           7380

The weighted average can also be used for assessing risk or determining the probability of various outcomes. If a judgement is made about the likelihood of various weather conditions for an outdoor sporting event and their effect on ticket sales, a predicted value of sales can be calculated using a formula similar to the previous example. =SUMPRODUCT(G16:G19, H16:H19) returns the value of 7,380. The probability values (G16:G19) are already expressed as percentages (total = 100% or 1.0) and so there is no need to divide by SUM(G16:G19).
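Both weighted-average examples follow the same pattern: multiply each value by its weight, add the products, and divide by the sum of the weights. The following Python sketch is an illustration only; the helper name `weighted_average` is hypothetical and simply mirrors SUMPRODUCT(...) / SUM(...).

```python
def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Staff cost example (cells B4:B6 and C4:C6).
costs = [13000, 15000, 20000]
staff = [5, 2, 3]
plain_average = sum(costs) / len(costs)          # 16000.0
wtd_avg = weighted_average(costs, staff)         # 15500.0

# Sales forecast example (cells G16:G19 and H16:H19).
probabilities = [0.30, 0.50, 0.19, 0.01]
sales = [10000, 8000, 2000, 0]
forecast = weighted_average(sales, probabilities)

print(plain_average, wtd_avg, round(forecast, 2))
```

Because the probabilities already add up to 1, dividing by their sum leaves the forecast unchanged, which is why the Excel formula can omit that step.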

2.10 Assignment 2.4

Capital Component     Cost     % of capital structure
Retained Earnings     8%       30%
Common Stocks         9%       10%
Preferred Stocks      10%      15%
Debt (Bonds)          6.67%    45%

Using the table above, calculate the weighted average cost of capital (WACC) of this company.

2.11 Correlation coefficients

2.11.1.1 Correlation Coefficients Formula

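If (X1,Y1), (X2,Y2), (X3,Y3), ..., (Xn,Yn) are the observed values, then the correlation coefficient (usually denoted as Corr(X,Y) or ρXY) of the observed sample is defined as:

$$\rho_{XY} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

Another way of visualizing the formula is:

$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{S_X S_Y}$$

where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.

Now we generalize the idea of the sample correlation coefficient to the case where the sample is not bivariate but multivariate.

Let X~1, X~2, X~3, ..., X~n be a random sample where each X~i is a k‐dimensional vector of the form X~i = (Xi1, Xi2, Xi3, ..., Xik).

Just like in the case of sample covariance, in the multivariate case we talk of the sample correlation coefficient matrix. Like the dispersion matrix, the sample correlation coefficient matrix is a square matrix of order k x k, defined as below.

$$R = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix}$$

All the diagonal entries are 1, as both mathematically and heuristically we see that the correlation coefficient of any variable with itself should be 1.

ρii = 1 for all i

Similar to the dispersion matrix, the off‐diagonal elements are the correlation coefficients of the ith and jth variables.

$$\rho_{ij} = \frac{\sum_{m=1}^{n}(X_{mi}-\bar{X}_i)(X_{mj}-\bar{X}_j)}{\sqrt{\sum_{m=1}^{n}(X_{mi}-\bar{X}_i)^2 \sum_{m=1}^{n}(X_{mj}-\bar{X}_j)^2}}$$

Or in another way:

$$\rho_{ij} = \frac{\mathrm{Cov}(X_i,X_j)}{S_i S_j}$$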

2.11.1.2 Ms Excel Function for calculating correlation

Step 1: To make this calculation select Tools/Data Analysis/Correlation… The following dialog box is displayed:

Step 2: In the input range textbox enter the range of the data (include the first row containing the variable name) or click on the data selection icon and mark the range to use.

Step 3: Notice that the “Labels in First Row” checkbox is checked.

Step 4: Click on OK and the following information will appear in a new worksheet:

         TIME1       TIME2
TIME1    1
TIME2    0.763957    1

The Pearson’s correlation for these two variables is 0.764 (rounded).



Example 2

2.11.1.3 A second way to calculate the correlation is with a function.

Step 1: In the Example worksheet, enter some labels in column I to indicate that you are calculating a correlation.

Step 2: In the J3 (or wherever you want it) cell, you will enter an Excel function that will calculate the desired correlation.

Step 3: Enter the formula

=CORREL(C2:C51,D2:D51)

Note that it is of the form, =CORREL(array1,array2)

where the first array and second array contain the paired numbers to correlate. (It is IMPORTANT that the numbers be paired correctly.)

The answer will appear in the cell. In this case, the Pearson’s correlation is 0.764 (rounded.)

2.11.1.4 Calculating the correlation using the formula
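The correlation formula of this section can be applied term by term: sum the products of the paired deviations, then divide by the square root of the product of the two sums of squared deviations. The Python sketch below is only an illustration; the function name `pearson_correlation` and the small data set are hypothetical and are not the TIME1/TIME2 data of the Excel example.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation from paired deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_correlation(x, y)   # 6 / sqrt(10 * 6)
print(round(r, 3))              # 0.775
```

The pairs must stay aligned: the first x belongs with the first y, and so on, exactly as with the two arrays passed to CORREL.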




2.12 Covariance

For a bivariate sample we have dealt with the covariance already. Let us just recall it:

Given a random sample (X1,Y1), (X2,Y2), (X3,Y3), ..., (Xn,Yn), the sample covariance Cov(X,Y) is defined as

$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}$$

2.12.1.1 Ms Excel Calculation for Covariance:

2.12.1.2 Ms Excel Function for Covariance:

To calculate covariance using an Ms Excel function we can use COVAR(array1,array2).

The COVAR function bases its calculation on the equation

$$\mathrm{Cov}(X,Y) = \frac{\sum (x-\bar{x})(y-\bar{y})}{n}$$

where $\bar{x}$ and $\bar{y}$ are the sample means AVERAGE(array1) and AVERAGE(array2), and n is the sample size. Note that this equation divides by n, whereas the sample covariance formula above divides by n − 1.
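The difference between the two covariance calculations comes down to the divisor: the sample covariance divides by n − 1, while Excel's COVAR divides by n. The Python sketch below illustrates both on a small hypothetical data set; the function name `covariance` and its `sample` flag are assumptions made for this illustration.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data; divides by n - 1 (sample) or n (as COVAR does)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sample_cov = covariance(x, y, sample=True)    # 6 / 4
covar_style = covariance(x, y, sample=False)  # 6 / 5

print(sample_cov, covar_style)  # 1.5 1.2
```

Multiplying the COVAR-style result by n / (n − 1) converts it to the sample covariance, which is useful when comparing the two techniques.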

2.13 Assignment 2.5

2.13.1 Calories and Fat relationship

Product Calories Fat

Dunkin' Donuts Iced Mocha Swirl latte (whole milk) 240 8.0

Starbucks Coffee Frappuccino blended coffee 260 3.5

Dunkin' Donuts Coffee Coolatta (cream) 350 22.0

Starbucks Iced Coffee Mocha Expresso (whole milk and whipped cream) 350 20.0

Starbucks Mocha Frappuccino blended coffee (whipped cream) 420 16.0

Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream) 510 22.0

Starbucks Chocolate Frappuccino Blended Crème (whipped cream) 530 19.0

Using the data above, calculate:

a. The covariance using both techniques above and compare. Explain!

b. Compute the coefficient of correlation using the techniques explained above.

c. Which do you think is more valuable in expressing the relationship between calories and fat: the covariance or the coefficient of correlation? Explain.

d. What are your conclusions about the relationship between Calories and Fat? Explain.

2.13.2 Fuel Efficiency Calculation and Standard

Car                         Owner    Government Standard
2005 Ford F-150             14.3     16.8
2005 Chevrolet Silverado    15.0     17.8
2002 Honda Accord LX        27.8     26.2
2002 Honda Civic            27.9     34.2
2004 Honda Civic Hybrid     48.8     47.6
2002 Ford Explorer          16.8     18.3
2005 Toyota Camry           23.7     28.5
2003 Toyota Corolla         32.8     33.1
2005 Toyota Prius           37.3     56.0

a. Compute the covariance using both techniques explained above and compare. Explain!

b. Compute the coefficient of correlation using the techniques explained above.

c. What are your conclusions about the relationship between Owner Calculation and Government Standard? Explain.



Practicum: MATH11002 Business Statistics

MODULE 3

Date of Receipt


Submitted only on Day/Date: ____________ / ______________ Time: WIB In ____________________

I, the undersigned, hereby state that I have strived to complete this module myself.
Name/NIM : ______________________________/_______________
Signature : _______________________________________________
Rem.:

Module Description: Probability

Objective
The student understands and is able to define and examine basic probability concepts
Define conditional, joint and marginal probability
Use Bayes' theorem to revise probabilities
Statistical independence
Address the probability of a discrete random variable
Define covariance and discuss its application in finance
Compute probability from the binomial, Poisson and hypergeometric distributions
Use these distributions to solve business problems using Ms Excel Regression Analysis or other statistical software

Output A report produced by the students should be in the form of working procedures and results in both softcopy and hardcopy.

3 PROBABILITY

3.1 Basic Probability

Probability: the chance that an uncertain event will occur (always between 0 and 1)

Event: Each possible type of occurrence or outcome

Simple Event: an event that can be described by a single characteristic

Sample Space: the collection of all possible events

There are three approaches to assessing the probability of an uncertain event:

1. A priori Classical Probability: the probability of an event is based on prior knowledge of the process involved.

Example: Find the probability of selecting a face card (Jack, Queen, or King) from a standard deck of 52 cards. Answer:

$$P(\text{face card}) = \frac{\text{number of face cards}}{\text{total number of cards}} = \frac{12}{52} = 0.231$$



2. Empirical Classical Probability: the probability of an event is based on observed data.

Example: Find the probability of selecting a male taking statistics from the population described in the following table:

          Taking Stats    Not Taking Stats    Total
Male      84              145                 229
Female    76              134                 210
Total     160             279                 439

$$P(\text{male taking stats}) = \frac{\text{number of males taking stats}}{\text{total number of people}} = \frac{84}{439} = 0.191$$

3. Subjective Probability: the probability of an event is determined by an individual, based on that

person’s past experience, personal opinion, and/or analysis of a particular situation.

3.2 Sample spaces and events, contingency tables, simple probability and joint probability

3.2.1 Sample Space
The Sample Space is the collection of all possible events

Ex. All 6 faces of a die

Ex. All 52 cards in a deck of cards

Ex. All possible outcomes when having a child: Boy or Girl

3.2.2 Event in Sample Space
Simple event

An outcome from a sample space with one characteristic

ex. A red card from a deck of cards

Complement of an event A (denoted A′)

All outcomes that are not part of event A

ex. All cards that are not diamonds

Joint event

Involves two or more characteristics simultaneously

ex. An ace that is also red from a deck of cards

For the empirical probability of Section 3.1:

Probability of occurrence = (number of favorable outcomes observed) / (total number of outcomes observed)

In mathematics, a probability of an event A is represented by a real number in the range from 0 to 1 and

written as P(A), p(A) or Pr(A). An impossible event has a probability of 0, and a certain event has a

probability of 1. The opposite or complement of an event A is the event [not A] (that is, the event of A

not occurring); its probability is given by P(not A) = 1 ‐ P(A). As an example, the chance of not rolling a six

on a six‐sided die is 1 ‐ (chance of rolling a six) = 1 ‐ 1/6 = 5/6.

3.2.3 Simple and Joint Probability Simple (Marginal) Probability refers to the probability of a simple event.

ex. P(King)

Joint Probability refers to the probability of an occurrence of two or more events.

ex. P(King and Spade)

If both the events A and B occur on a single performance of an experiment, this is called the intersection or joint probability of A and B, denoted as $P(A \cap B)$. If two events A and B are independent, then the joint probability is

$$P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B)$$

For example, if two coins are flipped, the chance of both being heads is $\tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$.

If either event A or event B or both events occur on a single performance of an experiment, this is called the union of the events A and B, denoted as $P(A \cup B)$. If two events are mutually exclusive, then the probability of either occurring is

$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$$

For example, the chance of rolling a 1 or 2 on a six-sided die is

$$P(1 \text{ or } 2) = P(1) + P(2) = \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{2}{6} = \tfrac{1}{3}$$

If the events are not mutually exclusive, then

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$

For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a face card (J, Q, K) (or one that is both) is $\tfrac{13}{52} + \tfrac{12}{52} - \tfrac{3}{52} = \tfrac{22}{52} = \tfrac{11}{26}$, because of the 52 cards of a deck 13 are hearts, 12 are face cards, and 3 are both: here the possibilities included in the "3 that are both" are included in each of the "13 hearts" and the "12 face cards" but should only be counted once.
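The heart-or-face-card calculation can be checked by counting outcomes directly. The sketch below is illustrative only: the names `deck`, `prob`, `is_heart` and `is_face` are assumptions made for this example and are not part of the module or of Excel.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Probability of an event = favorable outcomes / total outcomes."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in ("J", "Q", "K")

p_heart = prob(is_heart)                                # 13/52
p_face = prob(is_face)                                  # 12/52
p_both = prob(lambda c: is_heart(c) and is_face(c))     # 3/52
p_either = prob(lambda c: is_heart(c) or is_face(c))    # counted directly

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p_either == p_heart + p_face - p_both
print(p_either)  # 11/26
```

Counting the union directly and applying the addition rule give the same result, which is the point of subtracting the 3 cards that are both hearts and face cards.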




Conditional probability is the probability of some event A, given the occurrence of some other event B.

Conditional probability is written P(A|B), and is read "the probability of A, given B". It is defined by

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

If P(B) = 0 then P(A|B) is undefined.

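Summary of probabilities

Event        Probability
A            $P(A) \in [0, 1]$
not A        $P(A') = 1 - P(A)$
A or B       $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
             $= P(A) + P(B)$ if A and B are mutually exclusive
A and B      $P(A \cap B) = P(A|B)\,P(B)$
             $= P(A)\,P(B)$ if A and B are independent
A given B    $P(A|B) = P(A \cap B) / P(B)$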

3.3 Bayes' Theorem

$$P(B_i|A) = \frac{P(A|B_i)\,P(B_i)}{P(A|B_1)\,P(B_1) + P(A|B_2)\,P(B_2) + \cdots + P(A|B_k)\,P(B_k)}$$

where:

Bi = ith event of k mutually exclusive and collectively exhaustive events

A = new event that might impact P(Bi)

Bayes’ Theorem Example

A drilling company has estimated a 40% chance of striking oil for their new well. A detailed test has

been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and

20% of unsuccessful wells have had detailed tests. Given that this well has been scheduled for a

detailed test, what is the probability that the well will be successful?

Solution:

Let S = successful well and U = unsuccessful well

P(S) = .4 , P(U) = .6 (prior probabilities)

Define the detailed test event as D

Conditional probabilities: P(D|S) = .6 and P(D|U) = .2

$$P(S|D) = \frac{P(D|S)\,P(S)}{P(D|S)\,P(S) + P(D|U)\,P(U)} = \frac{(.6)(.4)}{(.6)(.4) + (.2)(.6)} = \frac{.24}{.24 + .12} = .667$$

Given the detailed test, the revised probability of a successful well has risen to .667 from the

original estimate of 0.4.

Event              Prior Prob.   Conditional Prob.   Joint Prob.      Revised Prob.
S (successful)     .4            .6                  .4*.6 = .24      .24/.36 = .667
U (unsuccessful)   .6            .2                  .6*.2 = .12      .12/.36 = .333
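The prior, conditional, joint and revised columns of the drilling example can be reproduced with a few lines of code. This is a minimal sketch; the function name `bayes_posterior` is hypothetical and chosen only for this illustration.

```python
def bayes_posterior(priors, likelihoods):
    """Return revised probabilities P(B_i | A) for mutually exclusive,
    collectively exhaustive events B_1..B_k.

    priors[i]      = P(B_i)
    likelihoods[i] = P(A | B_i)
    """
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(A and B_i)
    total = sum(joints)                                     # P(A)
    return [j / total for j in joints]

# Drilling example: S = successful, U = unsuccessful, D = detailed test
p_s_given_d, p_u_given_d = bayes_posterior([0.4, 0.6], [0.6, 0.2])
print(round(p_s_given_d, 3), round(p_u_given_d, 3))  # 0.667 0.333
```

The denominator `total` is P(D) = .36, the sum of the joint probabilities in the table.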

3.4 Assignment 3.1

Create the worksheet entries shown in the screenshot below, or use the Probability.xls workbook file from the companion CD of the textbook Statistics for Managers Using Microsoft Excel.

Enter data only in the blue cells.

Probabilities

Sample Space              Column Variable
                          B       B'      Totals
Row Variable    A         200     50      250
                A'        100     650     750
                Totals    300     700     1000



Simple Probabilities

P(A) 0.25

P(A') 0.75

P(B) 0.30

P(B') 0.70

Joint Probabilities

P(A and B) 0.20

P(A and B') 0.05

P(A' and B) 0.10

P(A' and B') 0.65

Addition Rule

P(A or B) 0.35

P(A or B') 0.90

P(A' or B) 0.95

P(A' or B') 0.80
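As a cross-check on the worksheet, the same simple, joint and addition-rule probabilities can be computed from the four cell counts. The sketch below assumes the counts from the sample-space table; the helper names `simple`, `joint` and `either` are illustrative and do not correspond to anything in Probability.xls.

```python
# Counts from the sample-space table: rows A / A', columns B / B'
counts = {("A", "B"): 200, ("A", "B'"): 50,
          ("A'", "B"): 100, ("A'", "B'"): 650}
total = sum(counts.values())

def simple(event):
    """Simple (marginal) probability of a row or column event."""
    return sum(n for key, n in counts.items() if event in key) / total

def joint(row, col):
    """Joint probability of a row event and a column event."""
    return counts[(row, col)] / total

def either(row, col):
    """Addition rule: P(row or col) = P(row) + P(col) - P(row and col)."""
    return simple(row) + simple(col) - joint(row, col)

print(simple("A"), simple("B"))          # 0.25 0.3
print(joint("A", "B"))                   # 0.2
print(round(either("A", "B"), 2))        # 0.35
print(round(either("A'", "B"), 2))       # 0.95
```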

1. A music store has been visited by 7 customers who bought some goods and 9 others who were just window shopping, all arriving at random times. Achmad (a customer) arrived at 11:30 am.

a. Give an example of a simple event

b. What is the complement of the event that a customer bought some goods?

2. Given the following contingency table:

        B     B'
A       12    48
A'      30    54

Use a calculator and MS Excel to find the probability of

a. Event A’

b. Event A and B

c. Event A’ and B

d. Event A’ and B’

3. Compare calculation results (calculator and Ms Excel)

4. A box of nine gloves contains two left‐handed gloves and seven right handed gloves.

a. if two gloves are randomly selected from the box without replacement, what is the

probability that both gloves selected will be right‐handed?

b. if two gloves are randomly selected from the box without replacement, what is the probability that there will be one right-handed glove and one left-handed glove?

c. if three gloves are selected from the box with replacement, what is the probability that

all three gloves will be left right‐handed?

d. If you were sampling with replacement, what would be the answers to (a) and (b)?




5. An advertising executive is studying the television viewing habits of married men and women during

prime‐time hours. Based on past viewing records, the executive has determined that during

prime‐time, husbands are watching television 60% of the time. When the husband is watching

television, 40% of the time the wife is also watching. When the husband is not watching

television, 30% of the time the wife is watching television. Find the probability that

a. If the wife is watching television, the husband is also watching television

b. The wife is watching television in prime time.

3.5 Basic Probability Rules

A random variable represents a possible numerical value from an uncertain event.

Discrete random variables produce outcomes that come from a counting process (e.g., the number of classes you are taking).

Continuous random variables produce outcomes that come from a measurement (e.g., your annual salary or your weight).

3.5.1 Discrete Random Variable

A probability distribution for a discrete random variable is a mutually exclusive listing of all possible numerical outcomes for that variable and a particular probability of occurrence associated with each outcome.

Number of Classes Taken Probability

2 0.2

3 0.4

4 0.24

5 0.16

Example: Toss 2 coins. Let X = number of heads.

X Value Probability

0 1/4 = .25

1 2/4 = .50

2 1/4 = .25




3.5.2 Discrete Random Variables Expected Value

Expected Value (or mean) of a discrete distribution (Weighted Average)

$$\mu = E(X) = \sum_{i=1}^{N} X_i\,P(X_i)$$

Example: Toss 2 coins, X = # of heads,

Compute expected value of X:

E(X) = (0)(.25) + (1)(.50) + (2)(.25) = 1.0

3.5.3 Discrete Random Variables Dispersion

Variance of a discrete random variable

$$\sigma^2 = \sum_{i=1}^{N} [X_i - E(X)]^2\,P(X_i)$$

Standard Deviation of a discrete random variable

$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2\,P(X_i)}$$

where:

E(X) = Expected value of the discrete random variable X

Xi = the ith outcome of X

P(Xi) = Probability of the ith occurrence of X

Example: Toss 2 coins, X = # heads, compute standard deviation (recall that E(X) = 1)

$$\sigma = \sqrt{(0-1)^2(.25) + (1-1)^2(.50) + (2-1)^2(.25)} = \sqrt{.50} = .707$$
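The expected value and standard deviation formulas translate directly into code. The sketch below represents a discrete distribution as a dictionary mapping each outcome to its probability; this representation and the function names are assumptions made for illustration.

```python
import math

def expected_value(dist):
    """E(X) = sum of X_i * P(X_i) over all outcomes."""
    return sum(x * p for x, p in dist.items())

def std_dev(dist):
    """sigma = sqrt( sum of [X_i - E(X)]^2 * P(X_i) )."""
    mu = expected_value(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return math.sqrt(variance)

# Toss 2 coins, X = number of heads
heads = {0: 0.25, 1: 0.50, 2: 0.25}
print(expected_value(heads))        # 1.0
print(round(std_dev(heads), 3))     # 0.707
```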

3.5.4 Covariance

The covariance measures the strength of the linear relationship between two numerical random variables X and Y. A positive covariance indicates a positive relationship. A negative covariance indicates a negative relationship.

Covariance formula:

$$\sigma_{XY} = \sum_{i=1}^{N} [X_i - E(X)]\,[Y_i - E(Y)]\,P(X_i Y_i)$$

where: X = discrete variable X

Xi = the ith outcome of X

Y = discrete variable Y

Yi = the ith outcome of Y

P(XiYi) = probability of occurrence of the condition affecting the ith outcome of X and the ith outcome of Y

Example:

Consider the return per $1000 for two types of investments

                                      Investment
P(XiYi)   Economic Condition     Passive Fund X   Aggressive Fund Y
0.2       Recession              - $25            - $200
0.5       Stable Economy         + $50            + $60
0.3       Expanding Economy      + $100           + $350

Investment Returns ‐ The Mean

E(X) = μX = (‐25)(.2) +(50)(.5) + (100)(.3) = 50

E(Y) = μY = (‐200)(.2) +(60)(.5) + (350)(.3) = 95

Interpretation: Fund X is averaging a $50.00 return and fund Y is averaging a $95.00

return per $1000 invested.

Investment Returns ‐ Standard Deviation

$$\sigma_X = \sqrt{(-25-50)^2(.2) + (50-50)^2(.5) + (100-50)^2(.3)} = 43.30$$

$$\sigma_Y = \sqrt{(-200-95)^2(.2) + (60-95)^2(.5) + (350-95)^2(.3)} = 193.71$$

Interpretation: Even though fund Y has a higher average return, it is subject to much more variability and its potential loss is larger.

Investment Returns – Covariance

$$\sigma_{XY} = (-25-50)(-200-95)(.2) + (50-50)(60-95)(.5) + (100-50)(350-95)(.3) = 8250$$

Interpretation: Since the covariance is large and positive, there is a positive relationship

between the two investment funds, meaning that they will likely rise and fall together.
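The means, standard deviations and covariance of the two funds can be recomputed from the return table. The sketch below stores each economic condition as a tuple (probability, return of X, return of Y); the variable names are assumptions for this example.

```python
import math

# (probability, return of fund X, return of fund Y) per economic condition
scenarios = [(0.2, -25, -200), (0.5, 50, 60), (0.3, 100, 350)]

mean_x = sum(p * x for p, x, _ in scenarios)
mean_y = sum(p * y for p, _, y in scenarios)
sd_x = math.sqrt(sum(p * (x - mean_x) ** 2 for p, x, _ in scenarios))
sd_y = math.sqrt(sum(p * (y - mean_y) ** 2 for p, _, y in scenarios))
cov_xy = sum(p * (x - mean_x) * (y - mean_y) for p, x, y in scenarios)

print(round(mean_x, 2), round(mean_y, 2))   # 50.0 95.0
print(round(sd_x, 2), round(sd_y, 2))       # 43.3 193.71
print(round(cov_xy, 2))                     # 8250.0
```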

3.5.5 The Sum of Two Random Variables: Measures

Expected Value: $E(X+Y) = E(X) + E(Y)$

Variance: $\mathrm{Var}(X+Y) = \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\sigma_{XY}$

Standard deviation: $\sigma_{X+Y} = \sqrt{\sigma^2_{X+Y}}$

Example: Portfolio Expected Return and Expected Risk

Investment portfolios usually contain several different funds (random variables)

The expected return and standard deviation of two funds together can now be calculated.

Investment Objective: Maximize return (mean) while minimizing risk (standard deviation).

Recall: Investment X: E(X) = 50, σX = 43.30

Investment Y: E(Y) = 95, σY = 193.71

σXY = 8250

Suppose 40% of the portfolio is in Investment X and 60% is in Investment Y:

$$E(P) = .4(50) + (.6)(95) = 77$$

$$\sigma_P = \sqrt{(.4)^2(43.30)^2 + (.6)^2(193.71)^2 + 2(.4)(.6)(8250)} = 133.30$$

The portfolio return is between the values for investments X and Y considered individually.
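The portfolio calculation can be sketched as a small function. The function name `portfolio` and its arguments are hypothetical; the standard deviations passed in are computed directly from the return table (variances 1875 and 37525) rather than from rounded values, so the last decimals may differ slightly from a hand calculation.

```python
import math

def portfolio(w, mean_x, mean_y, sd_x, sd_y, cov_xy):
    """Expected return and standard deviation of a two-fund portfolio
    with weight w in X and (1 - w) in Y."""
    mean_p = w * mean_x + (1 - w) * mean_y
    var_p = (w ** 2) * sd_x ** 2 + ((1 - w) ** 2) * sd_y ** 2 \
        + 2 * w * (1 - w) * cov_xy
    return mean_p, math.sqrt(var_p)

mean_p, sd_p = portfolio(0.4, 50, 95, math.sqrt(1875), math.sqrt(37525), 8250)
print(round(mean_p, 2), round(sd_p, 2))   # 77.0 133.3
```

Changing `w` shows the trade-off in the investment objective: more weight on fund Y raises the expected return and the standard deviation together.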

3.6 Binomial Distribution

3.6.1 Properties

A fixed number of observations, n

ex. 15 tosses of a coin; ten light bulbs taken from a warehouse

Two mutually exclusive and collectively exhaustive categories

ex. head or tail in each toss of a coin; defective or not defective light bulb; having a boy or girl

Generally called “success” and “failure”

Probability of success is p, probability of failure is 1 – p

Constant probability for each observation

ex. Probability of getting a tail is the same each time we toss the coin

Observations are independent

The outcome of one observation does not affect the outcome of the other

Two sampling methods

Infinite population without replacement

Finite population with replacement

The number of combinations of selecting X objects out of n objects is:

$${}_nC_X = \binom{n}{X} = \frac{n!}{X!\,(n-X)!}$$

where:

n! =n(n ‐ 1)(n ‐ 2) . . . (2)(1)

X! = X(X ‐ 1)(X ‐ 2) . . . (2)(1)

0! = 1 (by definition)




3.6.2 The Binomial Distribution Formula

$$P(X) = \frac{n!}{X!\,(n-X)!}\; p^X (1-p)^{n-X}$$

where:

P(X) = probability of X successes in n trials, with probability of success p on each trial

X = number of ‘successes’ in sample, (X = 0, 1, 2, ..., n)

n = sample size (number of trials or observations)

p = probability of “success”

Example: What is the probability of one success in five observations if the probability of success is .1? X = 1, n = 5, and p = .1

$$P(X=1) = \frac{n!}{X!\,(n-X)!}\; p^X (1-p)^{n-X} = \frac{5!}{1!\,(5-1)!}\,(.1)^1 (1-.1)^{5-1} = (5)(.1)(.9)^4 = .32805$$
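The binomial formula can be evaluated with Python's standard library, where `math.comb(n, x)` gives the number of combinations. The function name `binomial_pmf` is an assumption for this sketch.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# One success in five observations with p = .1
print(round(binomial_pmf(1, 5, 0.1), 5))  # 0.32805

# The probabilities over X = 0..n sum to 1
print(round(sum(binomial_pmf(x, 5, 0.1) for x in range(6)), 10))  # 1.0
```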

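3.6.3 The shape and Characteristics

The shape of the binomial distribution depends on the values of p and n

Mean: $\mu = E(X) = np$

Variance and Standard Deviation: $\sigma^2 = np(1-p)$ and $\sigma = \sqrt{np(1-p)}$

For n = 5 and p = .1:

$\mu = np = (5)(.1) = 0.5$

$\sigma = \sqrt{np(1-p)} = \sqrt{(5)(.1)(1-.1)} = 0.6708$

For n = 5 and p = .5:

$\mu = np = (5)(.5) = 2.5$

$\sigma = \sqrt{np(1-p)} = \sqrt{(5)(.5)(1-.5)} = 1.118$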




3.7 Poisson Distribution

An area of opportunity is a continuous unit or interval of time, volume, or such area in which more than one occurrence of an event can occur.

ex. The number of scratches in a car’s paint

ex. The number of mosquito bites on a person

ex. The number of computer crashes in a day

3.7.1 Properties

Count the number of times an event occurs in a given area of opportunity

The probability that an event occurs in one area of opportunity is the same for all areas of

opportunity

The number of events that occur in one area of opportunity is independent of the number

of events that occur in the other areas of opportunity

The probability that two or more events occur in an area of opportunity approaches zero as

the area of opportunity becomes smaller

The average number of events per unit is λ (lambda)

3.7.2 Formula

$$P(X) = \frac{e^{-\lambda}\,\lambda^{X}}{X!}$$

where:

P(X) = the probability of X events in an area of opportunity

λ = expected number of events

e = mathematical constant approximated by 2.71828…

Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a

given minute, 7 cars will enter? So, X = 7 and λ = 5

$$P(7) = \frac{e^{-\lambda}\,\lambda^{X}}{X!} = \frac{e^{-5}\,5^{7}}{7!} = 0.104$$

So, there is a 10.4% chance that 7 cars will enter the parking lot in a given minute.
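The Poisson formula is easy to evaluate in code. The sketch below uses only the standard library; the function name `poisson_pmf` is an assumption for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when the expected number of events is lam."""
    return exp(-lam) * lam ** x / factorial(x)

# Parking lot example: lambda = 5 cars per minute, X = 7
print(round(poisson_pmf(7, 5), 3))  # 0.104

# Probabilities for lambda = 0.5 and X = 0..3
print([round(poisson_pmf(x, 0.5), 4) for x in range(4)])
# [0.6065, 0.3033, 0.0758, 0.0126]
```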




3.7.3 Shape

3.8 Hypergeometric distribution

The binomial distribution is applicable when selecting from a finite population with replacement

or from an infinite population without replacement.

The hypergeometric distribution is applicable when selecting from a finite population without

replacement.

“n” trials in a sample taken from a finite population of size N

Sample taken without replacement

Outcomes of trials are dependent

Concerned with finding the probability of “X” successes in the sample where there are “A”

successes in the population

3.8.1 Formula

$$P(X) = \frac{\dbinom{A}{X}\dbinom{N-A}{n-X}}{\dbinom{N}{n}}$$

Where:

N = population size

A = number of successes in the population

N – A = number of failures in the population

n = sample size

X = number of successes in the sample

n – X = number of failures in the sample

Poisson distribution for λ = 0.5 (chart for Section 3.7.3, Shape):

X      0        1        2        3        4        5        6        7
P(X)   0.6065   0.3033   0.0758   0.0126   0.0016   0.0002   0.0000   0.0000

[Figure: bar chart of P(x) against x for λ = 0.5, with P(X = 2) = .0758 marked]

The mean of the hypergeometric distribution is:

$$\mu = E(X) = \frac{nA}{N}$$

The standard deviation is:

$$\sigma = \sqrt{\frac{nA(N-A)}{N^2}} \cdot \sqrt{\frac{N-n}{N-1}}$$

Where $\sqrt{\dfrac{N-n}{N-1}}$ is called the “Finite Population Correction Factor” from sampling without replacement from a finite population

3.8.2 Example

3 different computers are checked from 10 in the department. 4 of the 10 computers have illegal

software loaded. What is the probability that 2 of the 3 selected computers have illegal

software loaded?

So, N = 10, n = 3, A = 4, X = 2

$$P(X=2) = \frac{\dbinom{A}{X}\dbinom{N-A}{n-X}}{\dbinom{N}{n}} = \frac{\dbinom{4}{2}\dbinom{6}{1}}{\dbinom{10}{3}} = \frac{(6)(6)}{120} = 0.3$$

The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
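The computer example can be checked with `math.comb`. This is a sketch only; the function names and the keyword arguments N, n and A simply mirror the symbols used in the formula.

```python
from math import comb, sqrt

def hypergeom_pmf(x, N, n, A):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N that contains A successes."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

def hypergeom_mean_sd(N, n, A):
    """Mean nA/N and standard deviation with the finite population
    correction factor sqrt((N - n) / (N - 1))."""
    mean = n * A / N
    sd = sqrt(n * A * (N - A) / N ** 2) * sqrt((N - n) / (N - 1))
    return mean, sd

print(hypergeom_pmf(2, N=10, n=3, A=4))   # 0.3
mean, sd = hypergeom_mean_sd(N=10, n=3, A=4)
print(mean)                               # 1.2
```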

3.9 Read Excel Companion to Chapter 5

Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 211-215

3.10 Assignment 3.2

1. Problems for Section 5.1 Number 5.2 and 5.4

2. Problems for Section 5.2 Number 5.14







3.11 Assignment 3.3

1. Problem Section 6.2 – No. 6.2, No. 6.7

2. Problem Section 6.3 – No. 6.14, 6.15 and No. 6.16

3. Problem Section 6.4 – No. 6.24, 6.25 and No. 6.26

4. Problem Section 6.5 – No. 6.35 and No. 6.36





Date of Receipt




Module Description: NORMAL AND SAMPLING DISTRIBUTION

Objective

Define continuous distributions: normal, uniform and exponential

Probabilities using formulas and tables

The concept of the sampling distribution

The importance of the Central Limit Theorem

Examine when to apply different distributions


4 NORMAL AND SAMPLING DISTRIBUTION

4.1 Normal Distribution and Evaluating Normality

The normal distribution, or Gaussian distribution, is a continuous probability distribution that describes data that cluster around the mean. The normal distribution has several theoretical properties:

Bell Shaped in its appearance

Measures of central tendency (mean, median and mode) are equal

Interquartile range is equal to 1.33 standard deviations.

Infinite range

The normal distribution can be used to describe, at least approximately, any variable that tends to

cluster around the mean. For example, the heights of adult males in Indonesia are roughly normally

distributed, with a mean of about 160 cm. Most men have a height close to the mean, though a small

number of outliers have a height significantly above or below the mean. A histogram of male heights will

appear similar to a bell curve, with the correspondence becoming closer if more data are used.




Figure 4‐1 Normal Distribution

Source: http://upload.wikimedia.org/wikipedia/commons/b/bb/Normal_distribution_and_scales.gif

By the central limit theorem, the sum of a large number of independent random variables with finite variance is distributed

approximately normally. For this reason, the normal distribution is used throughout statistics, natural

science, and social science as a simple model for complex phenomena. For example, the observational

error in an experiment is usually assumed to follow a normal distribution, and the propagation of

uncertainty is computed using this assumption.

4.1.1 Normal Probability Density Function

Normal equation. The value of the random variable Y (f(X)) is:

$$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{1}{2}\left[(X-\mu)/\sigma\right]^2}$$

where X is a normal random variable, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.

4.1.1.1 Transformation Formula

The Z value is equal to the difference between X and the mean, µ, divided by the standard deviation, σ.

$$Z = \frac{X - \mu}{\sigma}$$



4.1.1.2 Probability and the Normal Curve

The normal distribution is a continuous probability distribution. This has several implications for

probability.

The total area under the normal curve is equal to 1.

The probability that a normal random variable X equals any particular value is 0.

The probability that X is greater than a equals the area under the normal curve bounded by a

and plus infinity (as indicated by the non‐shaded area in the figure below).

The probability that X is less than a equals the area under the normal curve bounded by a and

minus infinity (as indicated by the shaded area in the figure below).

The Standardized Normal Probability Density Function is given by equation:

$$f(Z) = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}Z^2}$$

Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the

following "rule".

About 68% of the area under the curve falls within 1 standard deviation of the mean.

About 95% of the area under the curve falls within 2 standard deviations of the mean.

About 99.7% of the area under the curve falls within 3 standard deviations of the mean.

Collectively, these points are known as the empirical rule or the 68‐95‐99.7 rule. Clearly, given a normal

distribution, most outcomes will be within 3 standard deviations of the mean.
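The 68-95-99.7 rule and the Z transformation can be verified numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper names `z_score` and `area_within` are assumptions made for this sketch.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Transformation formula: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standardized normal curve within k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because any normal variable can be converted with `z_score`, the same three areas apply regardless of the mean and standard deviation.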

To see how the transformation formula is applied, see pages 222-232 of Chapter 6, The Normal Distribution, in Levine, et al. 2008. Statistics for Managers Using Microsoft Excel, Fifth Edition.

4.1.2 Evaluating Normality

4.1.2.1 Compare Data Characteristics to Theoretical Properties of normal distribution

The normal distribution:

Symmetrical, so the mean and median are equal

Bell shaped, so the empirical rule applies

Interquartile range = 1.33 standard deviations

How to compare:

1. Construct charts and observe their appearance. For small or moderate data sets, construct a stem-and-leaf display or a box-and-whisker plot. For large data sets, construct the frequency distribution and plot the histogram or polygon.

2. Compute descriptive numerical measures and compare the characteristics of the data with the theoretical properties of the normal distribution. Compare the mean and median. The interquartile range should be approximately 1.33 times the standard deviation. The range should be approximately 6 times the standard deviation.

3. Evaluate how the values in the data are distributed. Determine whether approximately 2/3 of the values lie within ± 1 standard deviation of the mean. Determine whether approximately 4/5 of the values lie within ± 1.28 standard deviations of the mean. Determine whether approximately 19 out of every 20 values lie within ± 2 standard deviations of the mean.

Example:

3 Year Return

Mean 17.8

Standard Error 0.17099

Median 17.2

Mode 15.1

Standard Deviation 4.94991

Sample Variance 24.5016

Kurtosis 1.03812

Skewness 0.66073

Range 35.6

Minimum 6.7

Maximum 42.3

Sum 14916.4

Count 838

Largest(1) 42.3

Smallest(1) 6.7

Confidence Level(95.0%) 0.33562

1. The mean (17.8) is slightly higher than the median (17.2). {Normal dist.: mean = median}

2. The box-and-whisker plot is right-skewed, with a maximum outlier of 42.3. {Normal dist.: symmetrical}

3. The interquartile range of 7.0 is approx. 1.41 standard deviations (SD). {Normal dist.: 1.33 SD}

4. The range of 35.6 is equal to 7.19 SD. {Normal dist.: 6 SD}

5. 74.2% of returns are within ± 1 SD of the mean. {Normal dist.: 68.26%}

6. 83.3% of returns are within ± 1.28 SD of the mean. {Normal dist.: 80%}

Thus, based on the facts above, the three-year returns are right-skewed and not normally distributed.

4.1.2.2 Construct a normal probability plot

A normal probability plot is a graphical approach for evaluating whether data are normally distributed. It is also called a quantile-quantile plot. A normal probability plot for data from a normal distribution will be approximately linear. To compute normal probabilities and create plots, we can use PHStat as described in the Excel Companion to Chapter 6 of Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 247-249.

[Figure: Box-and-Whisker Plot of Three-Year Returns (axis: 3 Year Return, 0 to 40)]

4.2 Sampling and Sampling Distribution

Read the Excel Companion to Chapter 7 of Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 281-282

4.2.1 Sample

Selecting a sample is less time-consuming than selecting every item in the population (census).

Selecting a sample is less costly than selecting every item in the population.

An analysis of a sample is less cumbersome and more practical than an analysis of the entire

population.

Figure 4-2 The relationship between populations, samples, parameters, and statistics.

4.2.2 Types of Samples

In a nonprobability sample, items included are chosen without regard to their probability of occurrence.

o Convenience sampling, items are selected based only on the fact that they are easy,

inexpensive, or convenient to sample.

o Judgment sample, you get the opinions of pre‐selected experts in the subject

matter.

In a probability sample, items in the sample are chosen on the basis of known probabilities.

o Simple Random Sampling, every individual or item from the frame has an equal

chance of being selected. Selection may be with replacement (selected individual is

returned to frame for possible reselection) or without replacement (selected

individual isn’t returned to the frame). Samples obtained from table of random

numbers or computer random number generators.




o Systematic Sampling, Decide on sample size: n; Divide frame of N individuals into

groups of k individuals: k=N/n; Randomly select one individual from the 1st group;

Select every kth individual thereafter.

For example, suppose you were sampling n = 9 individuals from a population of N = 72. The population would be divided into 9 groups of k = 72/9 = 8 individuals each. Randomly select a member from group 1, say individual 3. Then, select every 8th individual thereafter (that is, individuals 3, 11, 19, 27, 35, 43, 51, 59, 67)

o Stratified Sampling, divide population into two or more subgroups (called strata)

according to some common characteristic. A simple random sample is selected from

each subgroup, with sample sizes proportional to strata sizes. Samples from

subgroups are combined into one. This is a common technique when sampling

population of voters, stratifying across racial or socio‐economic lines.

o Cluster Sampling, Population is divided into several “clusters,” each representative

of the population. A simple random sample of clusters is selected. All items in the

selected clusters can be used, or items can be chosen from a cluster using another

probability sampling technique. A common application of cluster sampling involves

election exit polls, where certain election districts are selected and sampled.

Comparing Sampling Methods

o Simple random sample and Systematic sample

Simple to use

May not be a good representation of the population’s underlying

characteristics

o Stratified sample

Ensures representation of individuals across the entire population

o Cluster sample

More cost effective

Less efficient (need larger sample to acquire the same level of precision)
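The systematic sampling procedure described above (choose n, compute k = N/n, pick a random start in the first group, then take every kth individual) can be sketched as follows. The function name `systematic_sample` and the optional `rng` argument are assumptions for this example, and the sketch assumes N is a multiple of n.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select n items from frame by taking every k-th item after a
    random start in the first group of k = N / n items."""
    k = len(frame) // n
    start = rng.randrange(k)          # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 73))            # individuals numbered 1..72
sample = systematic_sample(frame, 9)
print(sample)                         # e.g. [3, 11, 19, 27, 35, 43, 51, 59, 67]
```

The printed sample depends on the random start, so the list shown in the comment is only one of the eight possible results.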

4.2.3 Sampling Distributions

A sampling distribution is a distribution of all of the possible values of a statistic for a given sample size selected from a population.

For example, suppose you sample 50 students from your college and compute their mean GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs we might calculate for any given sample of 50 students.

Example:

o Suppose your population (simplified) was four people at your institution.

o Population size N=4

o Random variable, X, is age of individuals




o Values of X: 18, 20, 22, 24 (years)
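To make the four-person example concrete, the sketch below enumerates every possible sample and the resulting sample means. The sample size n = 2 and sampling with replacement are assumptions made for this illustration; the module text does not state them.

```python
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]   # ages of the N = 4 individuals
n = 2                           # assumed sample size (not given in the text)

# All possible samples of size n, drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

print(len(samples))                        # 16
print(round(pstdev(population), 3))        # 2.236 (population standard deviation)
print(round(pstdev(sample_means), 2))      # 1.58  (standard error of the mean)
```

Under these assumptions the sample means average to the population mean of 21, and their standard deviation equals the population standard deviation divided by the square root of n.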

4.2.4 SAMPLING FROM FINITE POPULATIONS

4.2.4.1 USING THE FINITE POPULATION CORRECTION FACTOR WITH THE MEAN

In the cereal-filling example in Section 7.3 on page 265, you selected a sample of 25 cereal

boxes from a filling process with μ = 368 grams. Suppose that 2,000 boxes (i.e., the population)

are filled on this particular day. Using the fpc factor, what is the probability that the sample

mean is below 365 grams?

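SOLUTION Using the fpc factor, σ = 15, n = 25, and N = 2,000, so that

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{15}{\sqrt{25}}\sqrt{\frac{2{,}000-25}{2{,}000-1}} = 3\sqrt{0.988} = 2.982$$

The probability that the sample mean is below 365 is computed as follows:

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{365 - 368}{2.982} = -1.01$$

From Table E.2, the area below 365 grams is 0.1562.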

The fpc factor has a very small effect on the standard error of the mean and the subsequent

area under the normal curve because the sample size is only 1.25% of the population size (that

is, n/N = 25/2,000 = 0.0125).
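The cereal-filling calculation can be reproduced in code. This sketch uses `statistics.NormalDist` in place of Table E.2; the function name `std_error_fpc` is hypothetical. Because the code keeps Z unrounded, the area it reports (about 0.157) differs slightly from the table lookup at Z = -1.01.

```python
from math import sqrt
from statistics import NormalDist

def std_error_fpc(sigma, n, N):
    """Standard error of the mean with the finite population correction."""
    return (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))

se = std_error_fpc(sigma=15, n=25, N=2000)
z = (365 - 368) / se
print(round(se, 3), round(z, 2))          # 2.982 -1.01

# Area under the standardized normal curve below z (unrounded z)
p_below = NormalDist().cdf(z)
print(round(p_below, 3))

# Without the correction the standard error would be 15 / sqrt(25) = 3.0
print(15 / sqrt(25))                      # 3.0
```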

4.3 Assignment for Simple Random Sample

Problem for Section 7.1 Number 7.2, 7.4, and 7.8;

Problem for Section 7.2 Number 7.10, 7.14

4.4 Assignment for Sampling Distribution

Problem for Section 7.4 Number 7.18, 7.20, and 7.24

4.5 Assignment for The Sampling Distribution of the mean

Problem for Section 7.5 Number 7.28, and 7.32




4.6 Assignment for Sampling from Finite Population

1. Given that N = 80 and n = 10 and the sample is selected without replacement, determine the fpc factor.

2. Historically, 93% of the deliveries of an overnight mail service arrive before 10:30 the following morning. If a random sample of 500 deliveries is selected without replacement from a population that consisted of 10,000 deliveries, what is the probability that the sample will have:

a. between 93% and 95% of the deliveries arriving before 10:30 the following morning?

b. more than 95% of the deliveries arriving before 10:30 the following morning?





Date of Receipt




Module Description: CONFIDENCE INTERVAL ESTIMATION

Objective

To construct and interpret confidence interval estimates for the mean and the proportion

How to determine the sample size necessary to develop a confidence interval for the mean or proportion

How to use confidence interval estimates in auditing


Pre‐Lab Read:


Education, Inc., Upper Saddle River, New Jersey., pages 322‐326.

5 CONFIDENCE INTERVAL ESTIMATION

5.1 Confidence intervals

5.1.1 A point estimate and a confidence interval estimate

5.1.1.1 Point Estimates

A point estimate is a single number. For the population mean (and population standard

deviation), a point estimate is the sample mean (and sample standard deviation). A confidence

interval provides additional information about variability.

5.1.1.2 Confidence Interval Estimates

Point Estimate ± (Critical Value) (Standard Error)

[Figure: confidence interval centered on the point estimate, showing the width of the confidence interval]



A confidence interval gives a range estimate of values:

Takes into consideration variation in sample statistics from sample to sample

Based on all the observations from 1 sample

Gives information about closeness to unknown population parameters

Confidence level: the confidence that the interval will contain the unknown population parameter, stated as a percentage (less than 100%).

Ex. 95% confidence, 99% confidence

5.1.1.3 Confidence Level

Suppose confidence level = 95%, also written (1 - α) = .95.

A relative frequency interpretation: “In the long run, 95% of all the confidence intervals that can be constructed will contain the unknown true parameter”.

A specific interval either will contain or will not contain the true parameter

5.1.2 Confidence Interval for μ (σ Known)

Assumptions:

o Population standard deviation σ is known

o Population is normally distributed

o If population is not normal, use large sample

Confidence interval estimate: $\bar{X} \pm Z \dfrac{\sigma}{\sqrt{n}}$, where Z is the standardized normal distribution critical value for a probability of α/2 in each tail.

5.1.2.1 Finding the Critical Value, Z

Consider a 95% confidence interval: 1 - α = .95, so α = .05 and α/2 = .025 in each tail. The critical values are Z = ±1.96.



Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level   Confidence Coefficient   Z value
80%                .80                      1.282
90%                .90                      1.645
95%                .95                      1.960
98%                .98                      2.326
99%                .99                      2.576
99.8%              .998                     3.090
99.9%              .999                     3.291
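Critical Z values for any confidence level can be computed rather than read from a table. The sketch below uses `NormalDist.inv_cdf` from the Python standard library; the function name `z_critical` is an assumption. Values read from printed tables are often rounded to two decimals (for example 2.58 for 99%).

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: area alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(level, round(z_critical(level), 3))
# 0.8 1.282
# 0.9 1.645
# 0.95 1.96
# 0.98 2.326
# 0.99 2.576
```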

5.1.2.2 Intervals and Level of Confidence

Example:

A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We

know from past testing that the population standard deviation is .35 ohms. Determine a 95%

and 99% confidence interval for the true mean resistance of the population.

Solution:

95% CI

$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068 = (1.9932,\ 2.4068)$$

We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms

Although the true mean may or may not be in this interval, 95% of intervals formed in this

manner will contain the true mean.

99% CI

$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 2.58\,(.35/\sqrt{11}) = 2.20 \pm .2723 = (1.9277,\ 2.4723)$$

We are 99% confident that the true mean resistance is between 1.9277 and 2.4723 ohms

Although the true mean may or may not be in this interval, 99% of intervals formed in this

manner will contain the true mean.
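Both circuit intervals follow from the same formula, so they can be generated by one function. In this sketch the function name `mean_ci_known_sigma` is hypothetical, and the critical values 1.96 and 2.58 are passed in as used in the worked solution.

```python
from math import sqrt

def mean_ci_known_sigma(xbar, sigma, n, z):
    """Confidence interval for the mean when sigma is known:
    xbar +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

low95, high95 = mean_ci_known_sigma(2.20, 0.35, 11, z=1.96)
low99, high99 = mean_ci_known_sigma(2.20, 0.35, 11, z=2.58)
print(round(low95, 4), round(high95, 4))   # 1.9932 2.4068
print(round(low99, 4), round(high99, 4))   # 1.9277 2.4723
```

The 99% interval is wider than the 95% interval: higher confidence is paid for with a less precise estimate.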

5.1.3 Confidence Interval for μ (σ Unknown)

If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S. This introduces extra uncertainty, since S varies from sample to sample, so we use the t distribution instead of the normal distribution.

Assumptions:

o Population standard deviation is unknown

o Population is normally distributed

o If population is not normal, use large sample

o Use Student’s t Distribution

Confidence Interval Estimate: $\bar{X} \pm t_{n-1} \dfrac{S}{\sqrt{n}}$, where t is the critical value of the t distribution with n-1 d.f. and an area of α/2 in each tail

The t value depends on the degrees of freedom (d.f.), the number of observations that are free to vary after the sample mean has been calculated: d.f. = n - 1

5.1.3.1 Student’s t Distribution

As n increases, t → Z.



5.1.3.2 Student’s t Table

5.1.3.3 Confidence Interval for μ (σ Unknown) Example

Example 1

A random sample of n = 25 has $\bar{X}$ = 50 and S = 8. Form a 95% confidence interval for μ.

Solution: d.f. = n – 1 = 24, so the confidence interval is

$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}} = (46.698,\ 53.302)$$

      A                                B          C
 1    Form a 95% confidence interval for Mean using Ms Excel
 2
 3    Data
 4    Sample Standard Deviation        8
 5    Sample Mean                      50
 6    Sample Size                      25
 7    Confidence Level                 95%
 8
 9    Intermediate Calculations
10    Standard Error of the Mean       1.6000     =B4/SQRT(B6)
11    Degrees of Freedom               24         =B6-1
12    t Value                          2.0639     =TINV(1-B7,B11)
13    Interval Half Width              3.3022     =B12*B10
14
15    Confidence Interval
16    Interval Lower Limit             46.6978    =B5-B13
17    Interval Upper Limit             53.3022    =B5+B13

Example 2:

Construct a 95% confidence interval estimate for the population mean force required to break the insulators:

Force Required to Break Electric Insulators (in pounds)

1870 1728 1656 1610 1634 1784 1522 1696 1592 1662

1866 1764 1734 1662 1734 1774 1550 1756 1762 1866

1820 1744 1788 1688 1810 1752 1680 1810 1652 1736

Solution:

Enter the data in the range F2:O4.

      A                                B          C
 1    Estimate for the Mean Amount of Force Required
 2
 3    Data
 4    Sample Standard Deviation        89.5508    =STDEV(F2:O4)
 5    Sample Mean                      1723.4     =AVERAGE(F2:O4)
 6    Sample Size                      30         =COUNT(F2:O4)
 7    Confidence Level                 95%
 8
 9    Intermediate Calculations
10    Standard Error of the Mean       16.3497    =B4/SQRT(B6)
11    Degrees of Freedom               29         =B6-1
12    t Value                          2.0452     =TINV(1-B7,B11)
13    Interval Half Width              33.4388    =B12*B10
14
15    Confidence Interval
16    Interval Lower Limit             1689.96    =B5-B13
17    Interval Upper Limit             1756.84    =B5+B13

We can conclude with 95% confidence that the mean breaking force required for the population of insulators is between 1689.96 and 1756.84 pounds. The validity of this confidence interval estimate depends on the assumption that the force required is normally distributed. If the sample size is large, we can slightly relax this assumption. Thus, with a sample of 30, we can use the t distribution even if the distribution is slightly left-skewed (see the normal probability plot or box-and-whisker plot), so the t distribution is appropriate for these data.

5.2 Confidence Interval Estimate for a Single Population Proportion

An interval estimate for the population proportion ( π ) can be calculated by adding an allowance for uncertainty to the sample proportion ( p ).

Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation: $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$

We will estimate this with sample data: $\sqrt{\dfrac{p(1-p)}{n}}$

Upper and lower confidence limits for the population proportion are calculated with the formula:

$p \pm Z\sqrt{\dfrac{p(1-p)}{n}}$

where :

Z is the standardized normal value for the level of confidence desired

p is the sample proportion

n is the sample size

5.2.1 Example for Confidence Intervals for the Population Proportion

A random sample of 100 people shows that 25 have opened IRA’s this year. Form a 95% confidence interval for the true proportion of the population who have opened IRA’s.

$p \pm Z\sqrt{p(1-p)/n} = 25/100 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96(0.0433) = (0.1651,\ 0.3349)$
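The same interval can be checked with a short Python sketch. This is an illustration only: `proportion_interval` is a hypothetical helper name, and the Z value is obtained from the standard library's `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (lower, upper) limits of the Z-based interval for a proportion."""
    p = successes / n                                    # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)          # Z * standard error
    return p - half_width, p + half_width

lower, upper = proportion_interval(25, 100)
print(f"({lower:.4f}, {upper:.4f})")
```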

[Figure: Normal probability plot, Force Required to Break Electrical Insulators (Force versus Z Value)]

[Figure: Force Required to Break Electrical Insulators (Force axis from 1500 to 1900)]
Solving Confidence Interval for Population Proportion using Ms Excel

Row  A                                  B         C
1    Proportion of In-Error Sales Invoices
2
3    Data
4    Sample Size                        100
5    Number of Successes                10
6    Confidence Level                   95%
7
8    Intermediate Calculations
9    Sample Proportion                  0.1       =B5/B4
10   Z Value                            -1.9600   =NORMSINV((1-B6)/2)
11   Standard Error of the Proportion   0.03      =SQRT(B9*(1-B9)/B4)
12   Interval Half Width                0.0588    =ABS(B10*B11)
13
14   Confidence Interval
15   Interval Lower Limit               0.0412    =B9-B12
16   Interval Upper Limit               0.1588    =B9+B12

5.3 Determining Sample Size

The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - α). The margin of error, also called the sampling error, is the amount of imprecision in the estimate of the population parameter: the amount added to and subtracted from the point estimate to form the confidence interval.

To determine the required sample size for the mean, you must know the desired level of confidence (1 - α), which determines the critical Z value; the acceptable sampling error (margin of error), e; and the standard deviation, σ.

The formula: $n = \dfrac{Z^2\sigma^2}{e^2}$

5.3.1 IF Population Standard Deviation (σ) Known

If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?

Solution: $n = \dfrac{Z^2\sigma^2}{e^2} = \dfrac{(1.645)^2(45)^2}{5^2} = 219.19$

The required sample size is n = 220 (always round up).
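As a cross-check, the sample size formula can be sketched in Python. The helper name `sample_size_for_mean` is hypothetical, and the Z value comes from `statistics.NormalDist` rather than the rounded table value 1.645, so the unrounded result differs slightly from 219.19 while the rounded-up answer is the same.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence):
    """Smallest n with margin of error at most e: n = Z^2 * sigma^2 / e^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / e) ** 2)

n_required = sample_size_for_mean(sigma=45, e=5, confidence=0.90)
print(n_required)
```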

Using Ms Excel:

5.3.2 IF Population Standard Deviation (σ) Unknown

If unknown, σ can be estimated when using the required sample size formula, either by using a value for σ that is expected to be at least as large as the true σ, or by selecting a pilot sample and estimating σ with the sample standard deviation, S.

5.3.3 To Determine The Required Sample Size For The Proportion

To determine the required sample size for the proportion, you must know:

o The desired level of confidence (1 - α), which determines the critical Z value

o The acceptable sampling error (margin of error), e

o The true proportion of “successes”, π

o π can be estimated with a pilot sample, if necessary (or conservatively use π = .50)

$n = \dfrac{Z^2\,\pi(1-\pi)}{e^2}$

o Example: How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence? (Assume a pilot sample yields p = .12)

o Solution: For 95% confidence, use Z = 1.96, e = .03 and p = .12, so use this to estimate π

$n = \dfrac{Z^2\,\pi(1-\pi)}{e^2} = \dfrac{(1.96)^2(.12)(1-.12)}{(.03)^2} = 450.74$, so use n = 451 samples
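The proportion case follows the same pattern. The sketch below is illustrative only; `sample_size_for_proportion` is a hypothetical name, and the Z value is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(pi_est, e, confidence):
    """n = Z^2 * pi * (1 - pi) / e^2, rounded up to the next whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * pi_est * (1 - pi_est) / e ** 2)

n_pilot = sample_size_for_proportion(pi_est=0.12, e=0.03, confidence=0.95)
n_conservative = sample_size_for_proportion(pi_est=0.50, e=0.03, confidence=0.95)
print(n_pilot, n_conservative)
```

The second call uses the conservative choice π = .50 mentioned above, which always gives the largest required sample.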

o Using Ms Excel:

5.4 Assignment 5

Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey. Chapter 8 Problems. Pages 283-319

1. Problem for Section 8.1 No. 8.2, 8.4, 8.8

2. Problem for Section 8.2 No. 8.12, 8.14, 8.18, 8.22




Date of Receipt




Module Description: HYPOTHESIS TESTING AND TWO SAMPLE TEST

Objective

o The basic principles of hypothesis testing

o How to use hypothesis testing to test a mean or proportion

o The assumptions of each hypothesis-testing procedure, how to evaluate them, and the consequences if they are violated

o Formulate a decision rule for testing a hypothesis

o Know Type I and Type II errors

o Use hypothesis testing for comparing the difference between: the means of two independent populations, the means of two related populations, the proportions of two independent populations, and the variances of two independent populations


Pre‐Lab Read:


Education, Inc., Upper Saddle River, New Jersey., Chapter 9 and Excel Companion to Chapter 9.

Pages 328‐ 367 and 369 ‐420

6 HYPOTHESIS TESTING AND TWO SAMPLE TEST

6.1 Hypothesis Testing

A hypothesis is a claim (assumption) about a population parameter:

Population mean. Example: The mean monthly cell phone bill of this city is μ = $52.

Population proportion. Example: The proportion of adults in this city with cell phones is π = .68

The null hypothesis states the (numerical) assumption to be tested.

Example: The mean number of TV sets in U.S. homes is equal to three. H0: μ = 3

6.1.1 The Null Hypothesis, H0

o Is always about a population parameter, not about a sample statistic.

o Begin with the assumption that the null hypothesis is true.

o It refers to the status quo

o Always contains the “=” , “≤” or “≥” sign

o May or may not be rejected

6.1.2 The Alternative Hypothesis, H1

o Is the opposite of the null hypothesis. Ex: The mean number of TV sets in U.S. homes is not equal to 3 ( H1: μ ≠ 3 )

o Challenges the status quo

o Never contains the “=” , “≤” or “≥” sign

o May or may not be proven

o Is generally the hypothesis that the researcher is trying to prove

6.1.3 The Hypothesis Testing Process

o Claim: The population mean age is 50.

o H0: μ = 50, H1: μ ≠ 50

o Sample the population and find sample mean.

o Suppose the sample mean age was $\bar{X}$ = 20.

o This is significantly lower than the claimed mean population age of 50.

o If the null hypothesis were true, the probability of getting such a different sample mean

would be very small, so you reject the null hypothesis .

o In other words, getting a sample mean of 20 is so unlikely if the population mean was 50, you

conclude that the population mean must not be 50.

6.1.4 The Test Statistic and Critical Values

If the sample mean is close to the assumed population mean, the null hypothesis is not rejected.

If the sample mean is far from the assumed population mean, the null hypothesis is rejected.

How far is “far enough” to reject H0?

The critical value of a test statistic creates a “line in the sand” for decision making.

6.1.5 Errors in Decision Making

6.1.5.1 Type I Error

o Reject a true null hypothesis

o Considered a serious type of error

o The probability of a Type I Error is α, called the level of significance of the test, and is set by the researcher in advance

6.1.5.2 Type II Error

o Failure to reject a false null hypothesis

o The probability of a Type II Error is β

Possible Hypothesis Test Outcomes

                     Actual Situation
Decision             H0 True                           H0 False
Do Not Reject H0     No Error, Probability 1 - α       Type II Error, Probability β
Reject H0            Type I Error, Probability α       No Error, Probability 1 - β

6.1.6 Level of Significance, α

For example, Claim: The population mean age is 50.

6.1.7 Hypothesis Testing: σ Known

For a two-tail test for the mean, σ known:

o Convert the sample statistic ( $\bar{X}$ ) to a test statistic: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

o Determine the critical Z values for a specified level of significance α from a table or by using Excel

o Decision Rule: If the test statistic falls in the rejection region, reject H0; otherwise do not reject H0

Example: Test the claim that the true mean weight of chocolate bars manufactured in a factory is 3 ounces.

Solution:

State the appropriate null and alternative hypotheses: H0: μ = 3, H1: μ ≠ 3 (this is a two-tail test)

Specify the desired level of significance: Suppose that α = .05 is chosen for this test

Choose a sample size: Suppose a sample of size n = 100 is selected

Determine the appropriate technique

o σ is known so this is a Z test

Set up the critical values

o For α = .05 the critical Z values are ±1.96

Collect the data and compute the test statistic

o Suppose the sample results are n = 100, $\bar{X}$ = 2.84 (σ = 0.8 is assumed known from past company records)

So the test statistic is: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{2.84 - 3}{0.8/\sqrt{100}} = \dfrac{-0.16}{0.08} = -2.0$

Since Z = -2.0 < -1.96, you reject the null hypothesis and conclude that there is sufficient evidence that the mean weight of chocolate bars is not equal to 3.
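The critical-value decision for this example can be sketched in Python as follows. This is an illustrative sketch, not the module's own method; `z_test_two_tail` is a hypothetical name and the critical value is computed with `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_test_two_tail(x_bar, mu0, sigma, n, alpha):
    """Return (Z statistic, critical value, reject?) for H0: mu = mu0 vs H1: mu != mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = z_test_two_tail(x_bar=2.84, mu0=3.0, sigma=0.8, n=100, alpha=0.05)
print(f"Z = {z:.2f}, critical = ±{z_crit:.2f}, reject H0: {reject}")
```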

6.1.8 6 Steps of Hypothesis Testing:

1. State the null hypothesis, H0 and state the alternative hypotheses, H1

2. Choose the level of significance, α, and the sample size n.

3. Determine the appropriate statistical technique and the test statistic to use

4. Find the critical values and determine the rejection region(s)

5. Collect data and compute the test statistic from the sample result

6. Compare the test statistic to the critical value to determine whether the test statistic falls in

the region of rejection. Make the statistical decision: Reject H0 if the test statistic falls in the

rejection region. Express the decision in the context of the problem.

See Example 9.2 and 9.3


Education, Inc., Upper Saddle River, New Jersey., pages 336 and 337

6.1.9 Hypothesis Testing: σ Known p-Value Approach

The p-value is the probability of obtaining a test statistic equal to or more extreme ( < or > ) than the observed sample value, given H0 is true. It is also called the observed level of significance: the smallest value of α for which H0 can be rejected.

Convert Sample Statistic (ex. $\bar{X}$) to Test Statistic (ex. Z statistic)

Obtain the p-value from a table or by using Excel

Compare the p-value with α: if p-value < α, reject H0; if p-value ≥ α, do not reject H0

Example:

6.1.9.1 Manual Calculation

How likely is it to see a sample mean of 2.84 (or something further from the mean, in either direction) if the true mean is μ = 3.0? Suppose the sample results are n = 100 and σ = 0.8 is assumed known, so the test statistic is Z = -2.0.

p-value = P(Z < -2.0) + P(Z > 2.0) = .0228 + .0228 = .0456

Here: p-value = .0456 and α = .05. Since .0456 < .05, you reject the null hypothesis.
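A two-tail p-value can also be computed directly from the normal distribution. The sketch below is illustrative; `two_tail_p_value` is a hypothetical name. Because it does not round the tail area to four table digits, it gives about .0455 instead of .0456, and the decision is unchanged.

```python
from statistics import NormalDist

def two_tail_p_value(z):
    """P(|Z| >= |z|) under the standardized normal distribution."""
    return 2 * NormalDist().cdf(-abs(z))

p_value = two_tail_p_value(-2.0)
alpha = 0.05
reject = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```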

6.1.9.2 Using Ms Excel:

6.1.10 Hypothesis Testing: σ Known Confidence Interval Connections

For $\bar{X}$ = 2.84, σ = 0.8 and n = 100, the 95% confidence interval is:

$2.84 - (1.96)\dfrac{0.8}{\sqrt{100}}$ to $2.84 + (1.96)\dfrac{0.8}{\sqrt{100}}$

2.6832 ≤ μ ≤ 2.9968

Since this interval does not contain the hypothesized mean (3.0), you reject the null hypothesis at α = .05

6.1.11 One Tail Tests

In many cases, the alternative hypothesis focuses on a particular direction.

H0: μ ≥ 3, H1: μ < 3. This is a lower-tail test since the alternative hypothesis is focused on the lower tail below the mean of 3

H0: μ ≤ 3, H1: μ > 3. This is an upper-tail test since the alternative hypothesis is focused on the upper tail above the mean of 3

Example:

A phone industry manager thinks that customer monthly cell phone bills have increased, and

now average more than $52 per month. The company wishes to test this claim. Past company

records indicate that the standard deviation is about $10.

Form hypothesis test:

H0: μ ≤ 52 the mean is less than or equal to $52 per month

H1: μ > 52 the mean is greater than $52 per month (i.e., sufficient evidence exists to support the manager’s claim)

Suppose that α = .10 is chosen for this test

Find the rejection region:

What is Z given α = 0.10? For an upper-tail test the critical value is Z = 1.28, so reject H0 if Z > 1.28.

Suppose a sample is taken with the following results: n = 64, $\bar{X}$ = 53.1 (σ = 10 was assumed known from past company records)

Then the test statistic is: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{53.1 - 52}{10/\sqrt{64}} = 0.88$

Do not reject H0 since Z = 0.88 ≤ 1.28

i.e.: there is not sufficient evidence that the mean bill is greater than $52

Calculate the p-value and compare to α
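For the cell phone bill example, both the critical-value comparison and the upper-tail p-value can be sketched in Python. The function name `z_test_upper_tail` is hypothetical, and the normal quantile and tail area come from `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def z_test_upper_tail(x_bar, mu0, sigma, n, alpha):
    """Return (Z, critical value, p-value, reject?) for H0: mu <= mu0 vs H1: mu > mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)   # all of alpha sits in the upper tail
    p_value = 1 - NormalDist().cdf(z)          # area above the observed Z
    return z, z_crit, p_value, z > z_crit

z, z_crit, p_value, reject = z_test_upper_tail(53.1, 52, 10, 64, 0.10)
print(f"Z = {z:.2f}, critical = {z_crit:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Since the p-value is larger than α = .10, the p-value approach reaches the same decision as the critical-value approach: do not reject H0.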

Microsoft Excel Z‐test Results

6.1.12 Hypothesis Testing: σ Unknown

If the population standard deviation is unknown, you instead use the sample standard deviation S. Because of this change, you use the t distribution instead of the Z distribution to test the null hypothesis about the mean. All other steps, concepts, and conclusions are the same.

The t test statistic with n-1 degrees of freedom is:

$t_{n-1} = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

Example: The mean cost of a hotel room in New York is said to be $168 per night. A random sample of 25 hotels resulted in $\bar{X}$ = $172.50 and S = 15.40. Test at the α = 0.05 level.

(A stem-and-leaf display and a normal probability plot indicate the data are approximately normally distributed)

H0: μ = 168

H1: μ ≠ 168

$t_{n-1} = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} = \dfrac{172.50 - 168}{15.40/\sqrt{25}} = 1.46$

With d.f. = 24 and α = 0.05, the critical values are ±2.0639. Since t = 1.46 lies between them, do not reject H0: there is not sufficient evidence that the true mean cost is different from $168
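The t statistic for the hotel example can be checked with a few lines of Python. This is only a sketch: `t_statistic` is a hypothetical name, and the critical value 2.0639 (d.f. = 24, 95%) is taken from this module's t-table examples because the standard library has no t quantile function.

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """t with n - 1 degrees of freedom: (x_bar - mu0) / (S / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

T_CRIT_24_95 = 2.0639   # assumption: table value for d.f. = 24, two-tail alpha = .05

t = t_statistic(x_bar=172.50, mu0=168, s=15.40, n=25)
reject = abs(t) > T_CRIT_24_95
print(f"t = {t:.2f}, reject H0: {reject}")
```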

6.1.13 Hypothesis Testing: Connection to Confidence Intervals

For $\bar{X}$ = 172.5, S = 15.40 and n = 25, the 95% confidence interval is:

$172.5 - (2.0639)\dfrac{15.4}{\sqrt{25}}$ to $172.5 + (2.0639)\dfrac{15.4}{\sqrt{25}}$

166.14 ≤ μ ≤ 178.86

Since this interval contains the hypothesized mean (168), you do not reject the null hypothesis at α = .05

o Recall that you assume that the sample statistic comes from a random sample from a

normal distribution.

o If the sample size is small (< 30), you should use a box‐and‐whisker plot or a normal

probability plot to assess whether the assumption of normality is valid.

o If the sample size is large, the central limit theorem applies and the sampling distribution of the mean will be approximately normal.

Microsoft Excel Results

6.1.14 Hypothesis Testing Proportion

Involves categorical variables. There are two possible outcomes, “Success” (possesses a certain characteristic) and “Failure” (does not possess that characteristic). The fraction or proportion of the population in the “success” category is denoted by π.

Sample proportion in the success category is denoted by p:

$p = \dfrac{X}{n} = \dfrac{\text{number of successes in sample}}{\text{sample size}}$

When both nπ and n(1-π) are at least 5, the distribution of p can be approximated by a normal distribution with mean and standard deviation $\mu_p = \pi$ and $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$

The sampling distribution of p is approximately normal, so the test statistic is a Z value:

$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$

Example: A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 was surveyed with 30 responses. Test at the α = .05 significance level.

Solution:

Check: nπ = (500)(.08) = 40 and n(1-π) = (500)(.92) = 460. Both are at least 5, so the normal approximation applies.
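The remaining arithmetic for this example can be carried out with the Z formula for a proportion. The following Python sketch is an illustration under that formula, not a reproduction of the original worked solution; `z_test_proportion` is a hypothetical name.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes, n, pi0, alpha):
    """Two-tail Z test of H0: pi = pi0. Returns (p, Z, critical value, reject?)."""
    p = successes / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p, z, z_crit, abs(z) > z_crit

p, z, z_crit, reject = z_test_proportion(successes=30, n=500, pi0=0.08, alpha=0.05)
print(f"p = {p:.2f}, Z = {z:.2f}, critical = ±{z_crit:.2f}, reject H0: {reject}")
```

Under these inputs Z falls between the two critical values, so the claim of an 8% response rate is not rejected at the .05 level.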

6.2 Assignment 6.1

1. Problem for Section 9.1 No. 9.1 through 9.5, 9.14, 9.18





6.3 Two-Sample Tests

Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 – μ2

The point estimate for the difference is the difference between the sample means: $\bar{X}_1 - \bar{X}_2$

Different data sources

Independent: Sample selected from one population has no effect on the sample selected

from the other population

Use the difference between 2 sample means

Use Z test, pooled variance t test, or separate‐variance t test

Independent Population Means:

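1. σ1 and σ2 known: use a Z test statistic

Assumptions: Samples are randomly and independently drawn and population distributions are normal

When σ1 and σ2 are known and both populations are normal, the test statistic is a Z-value and the standard error of $\bar{X}_1 - \bar{X}_2$ is

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ and $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$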

Two Independent Populations, Comparing Means

2. σ1 and σ2 unknown: use S to estimate the unknown σ and use a t test statistic

Assumptions: Samples are randomly and independently drawn, populations are normally distributed, and population variances are unknown but assumed equal

Forming interval estimates: The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ. The test statistic is a t value with (n1 + n2 – 2) degrees of freedom:

$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

where the pooled variance is

$S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$

6.3.1 Two-Sample Tests Independent Populations

You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE & NASDAQ? You collect the following data:

                  NYSE    NASDAQ
Number            21      25
Sample mean       3.27    2.53
Sample std dev    1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield (α = 0.05)?

H0: μ1 - μ2 = 0 i.e. (μ1 = μ2)

H1: μ1 - μ2 ≠ 0 i.e. (μ1 ≠ μ2)

α = 0.05, df = 21 + 25 - 2 = 44

Critical Values: t = ± 2.0154

The pooled variance is:

$S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \dfrac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$

The test statistic is:

$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$

Test Statistic: 2.040

Decision: Reject H0 at α = 0.05

Conclusion: There is evidence of a difference in the means
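The pooled-variance calculation for the NYSE and NASDAQ example can be sketched in Python as below. The function name `pooled_t` is hypothetical, and the critical value 2.0154 is copied from the example rather than computed.

```python
import math

def pooled_t(n1, mean1, s1, n2, mean2, s2, hypothesized_diff=0.0):
    """Return (pooled variance, t statistic, degrees of freedom)."""
    df = (n1 - 1) + (n2 - 1)
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df

T_CRIT_44_95 = 2.0154   # assumption: critical value quoted in the example (df = 44)

sp2, t, df = pooled_t(21, 3.27, 1.30, 25, 2.53, 1.16)
reject = abs(t) > T_CRIT_44_95
print(f"Sp^2 = {sp2:.4f}, t = {t:.3f}, df = {df}, reject H0: {reject}")
```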

6.3.2 Independent Populations Unequal Variance

If you cannot assume population variances are equal, the pooled-variance t test is inappropriate. Instead, use a separate-variance t test, which includes the two separate sample variances in the computation of the test statistic. The computations are complicated and are best performed using Excel.

The confidence interval for μ1 – μ2 (σ1 and σ2 known) is:

$(\bar{X}_1 - \bar{X}_2) \pm Z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$


Date of Receipt




Module Description: ANOVA and CHI SQUARE AND NON PARAMETRIC TESTS

Objective

o The basic concepts of experimental design

o How to use the one-way analysis of variance to test for the differences among the means of several groups

o How to use the two-way analysis of variance and interpret the interaction

o How and when to use the chi-square test for contingency tables

o How to use the Marascuilo procedure for determining pair-wise differences when evaluating more than two proportions

o How and when to use the McNemar test

o How and when to use nonparametric tests


Pre‐Lab Read:


Education, Inc., Upper Saddle River, New Jersey., Chapter 10 and Excel Companion to Chapter 10.

Pages 369‐ 420

7 ANOVA AND CHI SQUARE AND NON PARAMETRIC TESTS ANOVA

General ANOVA Setting

Investigator controls one or more factors of interest

o Each factor contains two or more levels

o Levels can be numerical or categorical

o Different levels produce different groups

o Think of the groups as populations

Observe effects on the dependent variable, are the groups the same?

Experimental design: the plan used to collect the data

Completely Randomized Design

Experimental units (subjects) are assigned randomly to the different levels (groups), subjects are

assumed homogeneous

Only one factor or independent variable, with two or more levels (groups)

Analyzed by one‐factor analysis of variance (one‐way ANOVA)

7.1 One-Way Analysis of Variance

Evaluate the difference among the means of three or more groups

Examples: Accident rates for 1st, 2nd, and 3rd shift or Expected mileage for five brands of tires

Assumptions:

Populations are normally distributed

Populations have equal variances

Samples are randomly and independently drawn

7.1.1 Hypotheses: One-Way ANOVA

$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$

All population means are equal, i.e., no treatment effect (no variation in means among groups)

$H_1$: Not all $\mu_j$ are equal (j = 1, 2, …, c)

At least one population mean is different, i.e., there is a treatment (groups) effect. This does not mean that all population means are different.

[Figure: All means are the same: the null hypothesis is true (no group effect)]

[Figure: At least one mean is different: the null hypothesis is NOT true (treatment effect is present)]

7.1.2 Partitioning the Variation

Total variation can be split into two parts: SST = SSA + SSW

SST = Total Variation = the aggregate dispersion of the individual data values around the overall (grand) mean of all factor levels

$SST = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{\bar{X}})^2$

$SST = (X_{11} - \bar{\bar{X}})^2 + (X_{21} - \bar{\bar{X}})^2 + \cdots + (X_{n_c c} - \bar{\bar{X}})^2$

Where: SST = Total sum of squares; c = number of groups; $n_j$ = number of values in group j; $X_{ij}$ = ith value from group j; $\bar{\bar{X}}$ = grand mean (mean of all data values)

SSA = Among-Group Variation = dispersion between the factor sample means

$SSA = \sum_{j=1}^{c} n_j(\bar{X}_j - \bar{\bar{X}})^2$

$SSA = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c(\bar{X}_c - \bar{\bar{X}})^2$

Where: SSA = Sum of squares among groups; c = number of groups; $n_j$ = sample size from group j; $\bar{X}_j$ = sample mean from group j; $\bar{\bar{X}}$ = grand mean (mean of all data values)

SSW = Within-Group Variation = dispersion that exists among the data values within the particular factor levels

$SSW = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^2$

$SSW = (X_{11} - \bar{X}_1)^2 + (X_{21} - \bar{X}_1)^2 + \cdots + (X_{n_c c} - \bar{X}_c)^2$

Where: SSW = Sum of squares within groups; c = number of groups; $n_j$ = sample size from group j; $\bar{X}_j$ = sample mean from group j; $X_{ij}$ = ith value in group j

7.1.3 Obtaining the Mean Squares

$MST = \dfrac{SST}{n - 1}$ (Mean Squares Total)

$MSA = \dfrac{SSA}{c - 1}$ (Mean Squares Among)

$MSW = \dfrac{SSW}{n - c}$ (Mean Squares Within)

7.1.4 One-Way ANOVA Table

Source of Variation   SS     df      MS                    F
Among Groups          SSA    c - 1   MSA = SSA/(c - 1)     F = MSA/MSW
Within Groups         SSW    n - c   MSW = SSW/(n - c)
Total                 SST    n - 1

c = number of groups

n = sum of the sample sizes from all groups

df = degrees of freedom

7.1.5 Test statistic

$F = \dfrac{MSA}{MSW}$

MSA is the mean squares among groups

MSW is the mean squares within groups

Degrees of freedom

df1 = c – 1 (c = number of groups)

df2 = n – c (n = sum of all sample sizes)

The F statistic is the ratio of the among variance to the within variance

The ratio must always be positive

df1 = c - 1 will typically be small

df2 = n - c will typically be large

Decision Rule: Reject H0 if F > FU, otherwise do not reject H0

7.1.6 Example

An experiment was conducted to determine whether any significant differences exist in the strength of parachutes woven from synthetic fibers from four different suppliers (Supplier 1, Supplier 2, Supplier 3, and Supplier 4)

                            Supplier 1   Supplier 2   Supplier 3   Supplier 4
                            18.5         26.3         20.6         25.4
                            24.0         25.3         25.2         19.9
                            17.2         24.0         20.8         22.6
                            19.9         21.2         24.7         17.5
                            18.0         24.5         22.9         20.4
Sample Mean                 19.52        24.26        22.84        21.16        =AVERAGE(…)
Sample Standard Deviation   2.69         1.92         2.13         2.98         =STDEV(…)

To construct the ANOVA summary table, we first compute the sample mean in each group. Then compute the grand mean by summing all 20 values and dividing by the total number of values:

$\bar{\bar{X}} = \dfrac{\sum_{j=1}^{c}\sum_{i=1}^{n_j} X_{ij}}{n} = \dfrac{438.9}{20} = 21.945$

Then compute the sums of squares:

$SSA = 5(19.52 - 21.945)^2 + 5(24.26 - 21.945)^2 + 5(22.84 - 21.945)^2 + 5(21.16 - 21.945)^2 = 63.2855$

$SSW = (18.5 - 19.52)^2 + \cdots + (18.0 - 19.52)^2 + (26.3 - 24.26)^2 + \cdots + (24.5 - 24.26)^2 + (20.6 - 22.84)^2 + \cdots + (22.9 - 22.84)^2 + (25.4 - 21.16)^2 + \cdots + (20.4 - 21.16)^2 = 97.5040$

[Figure: Tensile Strength Scatter Diagram (Tensile Strength versus Supplier)]

$SST = (18.5 - 21.945)^2 + (24.0 - 21.945)^2 + \cdots + (20.4 - 21.945)^2 = 160.7895$

$MSA = \dfrac{SSA}{c - 1} = \dfrac{63.2855}{4 - 1} = 21.0952$

$MSW = \dfrac{SSW}{n - c} = \dfrac{97.5040}{20 - 4} = 6.0940$

$F = \dfrac{MSA}{MSW} = \dfrac{21.0952}{6.0940} = 3.4616$

FU from the F distribution table, with 3 degrees of freedom in the numerator and 16 degrees of freedom in the denominator at the 0.05 level of significance, is 3.24.

Because the computed test statistic F = 3.4616 > FU = 3.24, we reject the null hypothesis and conclude that there is a significant difference in the mean tensile strength among the four suppliers.
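The sums of squares and the F statistic for the parachute data can be reproduced with a short Python sketch. It is illustrative only: `one_way_anova` is a hypothetical helper, and the critical value 3.24 is taken from the example because the standard library has no F quantile function.

```python
from statistics import mean

# Tensile strength by supplier, from the table in 7.1.6.
suppliers = [
    [18.5, 24.0, 17.2, 19.9, 18.0],   # Supplier 1
    [26.3, 25.3, 24.0, 21.2, 24.5],   # Supplier 2
    [20.6, 25.2, 20.8, 24.7, 22.9],   # Supplier 3
    [25.4, 19.9, 22.6, 17.5, 20.4],   # Supplier 4
]

def one_way_anova(groups):
    """Return (SSA, SSW, MSA, MSW, F) for a list of samples."""
    values = [x for g in groups for x in g]
    n, c = len(values), len(groups)
    grand_mean = mean(values)
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)   # among groups
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)          # within groups
    msa, msw = ssa / (c - 1), ssw / (n - c)
    return ssa, ssw, msa, msw, msa / msw

F_CRIT = 3.24   # assumption: FU quoted in the example (3 and 16 d.f., 0.05 level)

ssa, ssw, msa, msw, f_stat = one_way_anova(suppliers)
print(f"SSA = {ssa:.4f}, SSW = {ssw:.4f}, F = {f_stat:.4f}, reject H0: {f_stat > F_CRIT}")
```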

Using Ms Excel Data – Data Analysis –Anova: Single Factor :

7.1.7 The Tukey-Kramer Procedure

First compute the differences, $\bar{X}_j - \bar{X}_{j'}$ (where j ≠ j′), among all pairs of means. Then compute the critical range for the Tukey-Kramer procedure:

Critical Range $= Q_U\sqrt{\dfrac{MSW}{2}\left(\dfrac{1}{n_j} + \dfrac{1}{n_{j'}}\right)}$

where:

QU = upper-tail critical value from a Studentized range distribution having c degrees of freedom in the numerator and n - c degrees of freedom in the denominator, for the desired level of α

MSW = Mean Square Within

nj and nj’ = Sample sizes from groups j and j’

A pair of means is declared significantly different if the absolute difference between the two sample means exceeds the critical range.
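Applied to the parachute example, the procedure can be sketched in Python as follows. Treat this as an illustration under stated assumptions: the value QU = 4.05 (Studentized range, c = 4 and n - c = 16, α = 0.05) is a table value that does not appear in this module, and `tukey_kramer_range` is a hypothetical name. The means and MSW are those computed in 7.1.6.

```python
import math
from itertools import combinations

def tukey_kramer_range(q_u, msw, n_j, n_j_prime):
    """Critical range = Q_U * sqrt((MSW / 2) * (1/n_j + 1/n_j'))."""
    return q_u * math.sqrt((msw / 2) * (1 / n_j + 1 / n_j_prime))

Q_U = 4.05                     # assumption: Studentized range table value, not from this module
MSW = 6.0940                   # from the one-way ANOVA in 7.1.6
means = {1: 19.52, 2: 24.26, 3: 22.84, 4: 21.16}   # supplier sample means
n_per_group = 5

critical_range = tukey_kramer_range(Q_U, MSW, n_per_group, n_per_group)

# Keep the pairs whose absolute mean difference exceeds the critical range.
significant = [
    (j, k) for j, k in combinations(means, 2)
    if abs(means[j] - means[k]) > critical_range
]
print(f"critical range = {critical_range:.4f}, significant pairs: {significant}")
```

With these inputs only the Supplier 1 versus Supplier 2 difference (4.74) exceeds the critical range.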

7.1.8 ANOVA Assumptions

Randomness and Independence: Select random samples from the c groups (or randomly assign the levels)

Normality: The sample values from each group are from a normal population

Homogeneity of Variance: Can be tested with Levene’s Test

Levene’s Test

o Tests the assumption that the variances of each group are equal.

o First, define the null and alternative hypotheses:

$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_c^2$

$H_1$: Not all $\sigma_j^2$ are equal

o Second, compute the absolute value of the difference between each value and

the median of each group.

o Third, perform a one‐way ANOVA on these absolute differences.

For the parachute data, F = 0.2068 < 3.2389 (or the p-value = 0.8902 > 0.05), thus we do not reject H0. There is no evidence of a significant difference among the four variances. Therefore, the homogeneity of variance assumption for the ANOVA procedure is justified.
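The three steps of Levene's test described above can be sketched in Python: take absolute deviations from each group median, then run a one-way ANOVA on them. The function names are hypothetical, and the critical value 3.2389 is the one quoted above, not computed.

```python
from statistics import mean, median

suppliers = [
    [18.5, 24.0, 17.2, 19.9, 18.0],   # Supplier 1
    [26.3, 25.3, 24.0, 21.2, 24.5],   # Supplier 2
    [20.6, 25.2, 20.8, 24.7, 22.9],   # Supplier 3
    [25.4, 19.9, 22.6, 17.5, 20.4],   # Supplier 4
]

def anova_f(groups):
    """F = MSA / MSW for a list of samples."""
    values = [x for g in groups for x in g]
    n, c = len(values), len(groups)
    grand_mean = mean(values)
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssa / (c - 1)) / (ssw / (n - c))

def levene_f(groups):
    """One-way ANOVA F computed on absolute deviations from each group median."""
    abs_dev = [[abs(x - median(g)) for x in g] for g in groups]
    return anova_f(abs_dev)

F_CRIT = 3.2389   # assumption: critical value quoted in the text (3 and 16 d.f., 0.05)

f_levene = levene_f(suppliers)
print(f"Levene F = {f_levene:.4f}, reject equal variances: {f_levene > F_CRIT}")
```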

7.2 Two‐Way Analysis of Variance

Examines the effect of

Two factors of interest on the dependent variable

e.g., Percent carbonation and line speed on soft drink bottling process

Interaction between the different levels of these two factors

e.g., Does the effect of one particular carbonation level depend on which

level the line speed is set?

Assumptions

Populations are normally distributed

Populations have equal variances

Independent random samples are selected

7.2.1 Sources of Variation

SST = SSA + SSB + SSAB + SSE

Two Factors of interest: A and B

r = number of levels of factor A

c = number of levels of factor B

n′ = number of replications for each cell

n = total number of observations in all cells (n = rcn′)

Xijk = value of the kth observation of level i of factor A and level j of factor B

7.2.2 Two‐Way ANOVA: Features

Degrees of freedom always add up: n - 1 = rc(n′ - 1) + (r - 1) + (c - 1) + (r - 1)(c - 1)

Total = error + factor A + factor B + interaction

The denominator of the F Test is always the same but the numerator is different

The sums of squares always add up: SST = SSE + SSA + SSB + SSAB

Total = error + factor A + factor B + interaction

7.2.3 Interaction

7.3 CHI SQUARE AND NON PARAMETRIC TESTS

All of the inferential statistics we have covered in past lessons, are what are called parametric statistics. To use these statistics we make some assumptions about the distributions they come from, such as they are normally distributed. With parametric statistics we also deal with data for the dependent variable that is at the interval or ratio level of measurement, i.e. test scores, physical measurements.

The parametric statistics we have discussed so far in this course are:

1. the Z-score test
2. the Z-test
3. the single-sample t-test
4. the independent t-test
5. the dependent t-test
6. one-way analysis of variance (ANOVA)

We will now consider a widely used non-parametric test, chi-square, which we can use with data at the nominal level, that is, data that are classificatory. For example, suppose we know the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know whether there is a difference among the frequencies with which these three brands are selected, or whether students choose basically equally among the three brands. This is a problem for which we can use the chi-square statistic.

The chi‐square statistic is used to compare the observed frequency of some observation (such as frequency of buying different brands of computers) with an expected frequency (such as buying equal numbers of each brand of computer). The comparison of observed and expected frequencies is used to calculate the value of the chi‐square statistic, which in turn can be compared with the distribution of chi‐square to make an inference about a statistical problem.

The symbol for chi-square and the formula are as follows:

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

where

O is the observed frequency, and

E is the expected frequency.

The degrees of freedom for the one‐dimensional chi‐square statistic is:

df = C ‐ 1

where C is the number of categories or levels of the independent variable.

7.3.1 One‐Variable Chi‐Square (goodness‐of‐fit test) with equal expected frequencies

We can use the chi‐square statistic to test the distribution of measures over levels of a variable to indicate if the distribution of measures is the same for all levels. This is the first use of the one‐variable chi‐square test. This test is also referred to as the goodness‐of‐fit test.

Consider again the example of the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a significant difference among the frequencies with which these three brands of computers are selected or if the students select equally among the three brands.

The data for 100 students are recorded in the table below (the observed frequencies). We have also indicated the expected frequency for each category. Since there are 100 observations and three categories (Macintosh, IBM, and Other), the expected frequency for each category is 100/3 or 33.333. In the third column of the table we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency. The sum of the third column is the value of the chi-square statistic.

Frequency with which students select computer brand

Computer             Observed Frequency   Expected Frequency   (O-E)²/E
IBM                  47                   33.333               5.604
Macintosh            36                   33.333               0.213
Other                17                   33.333               8.003
Total (chi-square)                                             13.820

From the table we can see that:

The df = C ‐ 1 = 3 ‐ 1 = 2

We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.

We now have the information we need to complete the six step process for testing statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your research question.

Note: Our null hypothesis, for the chi‐square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.

2. Set the alpha level.

Note: As usual we will set our alpha level at .05, we have 5 chances in 100 of making a type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.

$\chi^2$ = 13.820, df = C - 1 = 2

4. Write the decision rule for rejecting the null hypothesis.

Reject H0 if $\chi^2$ >= 5.991.

Note: To write the decision rule we had to know the critical value for chi‐square, with an alpha

level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting

the tabled value for the column for the .05 level and the row for 2 df.

5. Write a summary statement based on the decision. Reject H0, p < .05

Note: Since our calculated value of $\chi^2$ (13.820) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English. There is a significant difference among the frequencies with which students purchased three different brands of computers.
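The goodness-of-fit calculation with equal expected frequencies can be sketched in a few lines of Python. The function name `chi_square` is hypothetical, and the critical value 5.991 is the table value quoted above.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [47, 36, 17]                                        # IBM, Macintosh, Other
expected = [sum(observed) / len(observed)] * len(observed)     # equal split: 100/3 each

CHI2_CRIT_DF2_05 = 5.991   # assumption: table value quoted in the text (df = 2, .05 level)

chi2 = chi_square(observed, expected)
df = len(observed) - 1
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 >= CHI2_CRIT_DF2_05}")
```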

7.3.2 One‐Variable Chi‐Square (goodness‐of‐fit test) with predetermined expected frequencies

Let's look at the problem we just solved in a way that illustrates the other use of one-variable chi-square, that is, with predetermined expected frequencies rather than with equal frequencies. We could formulate our revised problem as follows:

In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know if these frequencies of computer buying behavior are similar to or different from the national study data.

The data for 100 students is recorded in the table below (the observed frequencies). In this case the expected frequencies are those from the national study. To get the expected frequency we take the percentages from the national study times the total number of subjects in the current study.

Expected frequency for IBM = 100 X 50% = 50
Expected frequency for Macintosh = 100 X 25% = 25
Expected frequency for Other = 100 X 25% = 25

The expected frequencies are recorded in the second column of the table. As before we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency, and recorded this result in the third column of the table. The sum of the third column is the value of the chi-square statistic.




Frequency with which students select computer brand

Computer             Observed Frequency   Expected Frequency   (O-E)²/E
IBM                  47                   50                   0.18
Macintosh            36                   25                   4.84
Other                17                   25                   2.56
Total (chi-square)                                             7.58


The df = C ‐ 1 = 3 ‐ 1 = 2

We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2 we see that the critical value for chi-square is 5.991.



1. State the null hypothesis and the alternative hypothesis based on your research question.

Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.

2. Set the alpha level. Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.

χ² = 7.58

df = C ‐ 1 = 2






4. Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² >= 5.991.

Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05 and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.

5. Write a summary statement based on the decision. Reject H0, p < .05

Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English. There is a significant difference among the frequencies with which students purchased three different brands of computers and the proportions suggested by a national study.
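The goodness-of-fit calculation above can be checked with a few lines of Python. This is only a sketch, not part of the original practicum (which uses tables and Excel); the variable names are illustrative. It relies on the fact that, for df = 2 only, the upper-tail probability of chi-square is exp(-x/2), so the .05 critical value is -2 ln(.05). Other degrees of freedom need the table or a statistics library.

```python
import math

# Observed counts from the survey of 100 students.
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
# National-study proportions used to build the expected frequencies.
proportions = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}

n = sum(observed.values())
expected = {brand: n * p for brand, p in proportions.items()}

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum(
    (observed[b] - expected[b]) ** 2 / expected[b] for b in observed
)
df = len(observed) - 1

# Closed form valid for df = 2 only (assumption stated in the text above).
alpha = 0.05
critical_value = -2.0 * math.log(alpha)
p_value = math.exp(-chi_square / 2.0)

reject_h0 = chi_square >= critical_value
print(f"chi-square = {chi_square:.2f}, df = {df}")
print(f"critical value = {critical_value:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The script reproduces the chi-square value of 7.58 and the tabled critical value of 5.991, so the decision is to reject H0.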

7.3.3 Two‐Variable Chi‐Square (test of independence)

Now let us consider the case of the two‐variable chi‐square test, also known as the test of independence.

For example, we may wish to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is gender independent of size of hometown?

The data for 30 females and 6 males is in the following table.

Frequency with which males and females come from small, medium, and large cities

Small Medium Large Totals

Female 10 14 6 30

Male 4 1 1 6

Totals 14 15 7 36

The formula for chi-square is the same as before:

χ² = Σ (O - E)² / E

where O is the observed frequency, and E is the expected frequency.

The degrees of freedom for the two‐dimensional chi‐square statistic is:




df = (C ‐ 1)(R ‐ 1)

where C is the number of columns or levels of the first variable and R is the number of rows or levels of the second variable.

In the table above we have the observed frequencies (six of them). Now we must calculate the expected frequency for each of the six cells. For two‐variable chi‐square we find the expected frequencies with the formula:

Expected Frequency for a Cell = (Column Total X Row Total)/Grand Total

In the table above we can see that the Column Totals are 14 (small), 15 (medium), and 7 (large), while the Row Totals are 30 (female) and 6 (male). The grand total is 36.

Using the formula we can thus find the expected frequency for each cell.

1. The expected frequency for the small female cell is 14X30/36 = 11.667
2. The expected frequency for the medium female cell is 15X30/36 = 12.500
3. The expected frequency for the large female cell is 7X30/36 = 5.833
4. The expected frequency for the small male cell is 14X6/36 = 2.333
5. The expected frequency for the medium male cell is 15X6/36 = 2.500
6. The expected frequency for the large male cell is 7X6/36 = 1.167

We can put these expected frequencies in our table and also include the values for (O ‐ E)2/E. The sum of all these will of course be the value of chi‐square.

Observed frequencies, expected frequencies, and (O - E)²/E for males and females from small, medium, and large cities

         Small                          Medium                         Large
         Observed  Expected  (O-E)²/E   Observed  Expected  (O-E)²/E   Observed  Expected  (O-E)²/E   Totals
Female   10        11.667    0.238      14        12.500    0.180      6         5.833     0.005      30
Male     4         2.333     1.191      1         2.500     0.900      1         1.167     0.024      6
Totals   14                             15                             7                              36

χ² = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538

and df = (C ‐ 1)(R ‐ 1) = (3 ‐ 1)(2 ‐ 1) = (2)(1) = 2






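1. State the null hypothesis and the alternative hypothesis based on your research question.

2. Set the alpha level.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.

χ² = 2.538

df = (C - 1)(R - 1) = (2)(1) = 2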



4. Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² >= 5.991.

Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05 and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.

5. Write a summary statement based on the decision. Fail to reject H0

Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail to reject the null hypothesis and are unable to accept the alternative hypothesis.

6. Write a statement of results in standard English. There is not a significant difference in the frequencies with which males come from small, medium, or large towns as compared with females. The data give no evidence that hometown size depends on gender.
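The test of independence can be sketched in Python as follows. This is an illustrative check rather than part of the original practicum; the dictionary layout and variable names are assumptions. Each expected frequency is computed as (column total X row total) / grand total, and the critical value uses the closed form -2 ln(alpha), which is valid only because df = 2 here.

```python
import math

# Observed frequencies: rows are sex, columns are hometown size.
observed = {
    "Female": {"Small": 10, "Medium": 14, "Large": 6},
    "Male": {"Small": 4, "Medium": 1, "Large": 1},
}

row_totals = {r: sum(cols.values()) for r, cols in observed.items()}
col_names = list(next(iter(observed.values())))
col_totals = {c: sum(observed[r][c] for r in observed) for c in col_names}
grand_total = sum(row_totals.values())

chi_square = 0.0
for r in observed:
    for c in col_names:
        # Expected frequency for a cell = column total * row total / grand total.
        expected = col_totals[c] * row_totals[r] / grand_total
        chi_square += (observed[r][c] - expected) ** 2 / expected

df = (len(col_names) - 1) * (len(observed) - 1)

# Closed form for the .05 critical value, valid for df = 2 only.
critical_value = -2.0 * math.log(0.05)
reject_h0 = chi_square >= critical_value

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The unrounded statistic is about 2.537; the value 2.538 in the text comes from summing cell contributions that were already rounded to three decimals. The decision is the same either way.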

Chi-square is a useful non-parametric statistic for evaluating statistical hypotheses involving the frequencies with which observations fall in various categories (nominal data).

7.4 Assignment

7.4.1 Assignment 7.1

is the formula for

1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the Scheffe post hoc test.




is the formula for


is the formula for


For the following research problem ‐ You are concerned with the effect of computers on the quality of written language. You randomly place the 30 students in your English class into two groups of 15 each. The first group is asked to write their next English theme assignment using a word processing program on a computer, while the other group is asked to write their themes by hand. You ask another English teacher, to read all 30 themes and give them a 1 (poorest) to 10 (best) rating on the quality of their English usage. You want to know if there is a significant difference in the quality ratings of the two groups.

What is the proper statistical test to use with this research problem?

1. the dependent t-test
2. the independent t-test
3. the one-sample t-test
4. the one-way analysis of variance test

For the following research problem ‐ The number of hours a subject could stay awake was measured as a function of the dose level of a particular drug. Three levels of drug dosage were used. Analyze the results for the data on the dependent variable (number of hours awake) to determine if there was a significant difference among the three levels of drug dosage used.


1. the dependent t-test
2. the independent t-test
3. the one-sample t-test
4. the one-way analysis of variance test




7.4.2 Assignment 7.2.

1. An industrial psychologist is interested in evaluating four different types of training on worker productivity. Using a standard measure of productivity, the psychologist measures the productivity of a set of workers who have been trained using each one of the four procedures. Using the data below, determine whether there is a significant difference between the training methods. Larger numbers on the dependent variable indicate higher productivity.

Productivity Scores for Four Groups of Workers Trained by Different Methods

Group 1       Group 2             Group 3    Group 4
On the Job    Computer Assisted   Lecture    Videotape
67            68                  46         37
68            62                  39         46
61            59                  38         49
62            71                  47         48
60            60                  46         49
56            66                  49         53

1. H0 :
2. H1 :
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical Value for F =
8. State conditions under which you would reject H0 :

2. A school guidance counselor investigates the influence of different motivational devices on the academic achievement of students. The counselor arranges for one group of students to receive immediate feedback upon the completion of an English assignment. A second group of students receives feedback at the end of the day, while a third group receives feedback at the end of the week. Using the students' grades on a standardized English test, determine whether there is a significant difference between the groups. If necessary, perform Scheffe tests.

English Test Results for Groups of Students Receiving Various Types of Feedback

No   Group 1              Group 2              Group 3
     Immediate Feedback   Day's End Feedback   Week's End Feedback
1    49                   40                   36
2    40                   37                   32
3    41                   42                   31
4    46                   39                   39
5    42                   45                   40
6    50                   39                   39
7    53                   45                   41
8    51                   49                   38

1. H0 :
2. H1 :
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical Value for F =
8. State conditions under which you would reject H0 :


For each of the following problems, state the null hypothesis, the alternate hypothesis, the calculated value of the statistic, the critical value of the statistic, and the conditions under which you would reject the null hypothesis.

1. A sample of 100 people is classified as to their social club membership and their academic status. Is belonging to a social club independent of academic status?

Academic Classification and Social Club Membership for 100 People

Academic classification   Belong to Social Club   Do not Belong to Social Club
Freshman                  9                       16
Sophomore                 11                      14
Junior                    16                      9
Senior                    19                      6

1. H0:
2. H1:
3. χ² =
4. Critical Value for χ² =
5. State conditions under which you would reject H0

2. A consumer‐research group asked 100 men to use each of three kinds of after‐shave lotion for one month. After the trial period, each man indicated the lotion he preferred. Using the results below, determine whether there is a significant preference for any of the three after‐shave lotions.




Number of Men Preferring Each of Three After‐Shave Lotions

Lotion   Number of Men Preferring
1        42
2        36
3        22

1. H0:
2. H1:
3. χ² =
4. Critical Value for χ² =
5. State conditions under which you would reject H0

is the formula for

1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the chi-square test.


1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the chi-square test.


1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the Scheffe post hoc test.

For the following research problem ‐ You are interested in knowing whether or not the composition of a family is related to the type of vacations they like to take. Accordingly, you collect the following data from a survey of preferred vacations:

Frequencies with which families of various types prefer various vacation types

Vacation            Family Type
                    No Children   Less Than 5 Children   5-10 Children
Visit Relatives     0             15                     5
Go to Beach         5             5                      10
Urban Sightseeing   15            0                      5

1. the dependent t-test
2. the independent t-test
3. the one-way analysis of variance test
4. the chi-square test

For the following research problem ‐ Is it really true that people with graduate degrees in certain fields earn substantially less money than people with graduate degrees in certain other fields? To answer this question, you look at data collected by Yuppie University on the salaries earned by recent graduate and professional students.

Salaries for recent graduates of Yuppie University by field of study

Engineering PhD Humanities PhD Education PhD J.D. M.D.

$40,000 $22,000 $25,000 $40,000 $50,000

$28,000 $24,000 $27,000 $35,000 $43,000

$32,000 $28,000 $31,000 $33,000 $33,000

$36,000 $24,000 $24,000 $36,000 $39,000

$30,000 $27,000 $38,000 $50,000

$32,000


1. the dependent t-test
2. the independent t-test
3. the one-way analysis of variance test
4. the chi-square test


MODULE 8

Date of Receipt




Module Description: Regression Analysis

Objective The student understands and is able to use regression analysis to predict the value of a dependent variable based on an independent variable; understands the meaning of the regression coefficients; can make inferences about the slope and correlation coefficient; and can estimate mean values and predict individual values using MS Excel Regression Analysis or other statistical software.

Output A report of Simple Regression Analysis produced by the students should be in the form of working procedures and results in both softcopy and hardcopy.

8 Regression Analysis

8.1 Simple Regression Analysis

The field of econometrics uses regression analysis to create quantitative models that can be used to predict the value of a series if one knows the value of several other variables. This analysis tool performs linear regression analysis by using the "least squares" method to fit a line through a set of observations. You can analyze how a single dependent variable is affected by the values of one or more independent variables. For example, the wage per hour can be predicted if one knows the values of the variables that constitute the regression equation. This is a big leap of faith from a correlation or confidence interval estimate. In a correlation, the statistician is not presuming or implying any causality or deduction of causality. Regression analysis, on the other hand, is used so often (and probably abused) because of its supposed ability to link cause and effect. Skepticism about causal relationships is not only healthy but also important, because the real power of regression lies in a comprehensive interpretation of the results.

8.2 Regression Analysis Using Excel

Before using any analysis tool, you must arrange the data you want to analyze in columns or rows on your worksheet. This will be your input range. Once the data is set you can open the analysis tool, in this case Regression: Tools -> Data Analysis... -> Regression


8.3 Regression Dialog Box

Input Y Range - Enter the reference for the range of dependent data. The range must consist of a single column of data. You can type in the range or use the Collapse ("go out and get it") button. This will collapse your window so that you can select the data you wish to use. Once you have chosen your desired data, either press Enter or click on the Expand button.

Input X Range - Enter the reference for the range of independent data. Microsoft Excel orders independent variables from this range in ascending order from left to right. The maximum number of independent variables is 16.

Labels - Select if the first row or column of your input range or ranges contains labels. Clear if your input has no labels; Excel generates appropriate data labels for the output table.

Confidence Level - Select to include an additional level in the summary output table. In the box, enter the confidence level you want applied in addition to the default 95 percent level.

Constant is Zero - Select to force the regression line to pass through the origin.

Output Range - Enter the reference for the upper-left cell of the output table. Allow at least seven columns for the summary output table, which includes an ANOVA table, coefficients, standard error of y estimate, R² values, number of observations, and standard error of coefficients.

New Worksheet Ply - Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

New Workbook - Click to create a new workbook and paste the results in the new workbook.

Residuals - Select to include residuals in the residuals output table.

Standardized Residuals - Select to include standardized residuals in the residuals output table.

Residual Plots - Select to generate a chart for each independent variable versus the residual.

Line Fit Plots - Select to generate a chart for predicted values versus the observed values.

Normal Probability Plots - Select to generate a chart that plots normal probability.

8.4 Simple Regression

8.5 Linear Correlation and Regression Analysis

In this section the objective is to see whether there is a correlation between two variables and to find a model that predicts one variable in terms of the other. There are many possible examples, but we will use one that is popular in the world of business. Usually the independent variable is denoted by the letter x and the dependent variable by the letter y. A businessman would like to see whether there is a relationship between the number of cases of soda sold and the temperature on a hot summer day, based on information from the past. He would also like to estimate the number of cases of soda that will be sold at a ball game on a particular hot summer day. He recorded the temperature and the number of cases of soda sold on each of those days. The following table shows the recorded data from June 1 through June 13. The weatherman predicts a temperature of 94 degrees F for June 14. The businessman would like to meet all demand for cases of soda ordered by customers on June 14.

DAY Cases of Soda Temperature

1‐Jun 57 56

2‐Jun 59 58

3‐Jun 65 63

4‐Jun 67 66

5‐Jun 75 73

6‐Jun 81 78

7‐Jun 86 85

8‐Jun 88 85

9‐Jun 88 87

10‐Jun 84 84

11‐Jun 82 88

12‐Jun 80 84

13‐Jun 83 89

Now let's use Excel to find the linear correlation coefficient and the regression line equation. The linear correlation coefficient is a quantity between -1 and +1, denoted by R. The closer R is to +1, the stronger the positive (direct) correlation between the two variables; similarly, the closer R is to -1, the stronger the negative (inverse) correlation. The general form of the regression line is y = mx + b. In this formula, m is the slope of the line and b is the y-intercept. You can find these quantities from the Excel output. In this situation the variable y (the dependent variable) is the number of cases of soda and x (the independent variable) is the temperature. To obtain the Excel output, take the following steps:

Step 1. From the menus choose Tools and click on Data Analysis.

Step 2. When Data Analysis dialog box appears, click on correlation.

Step 3. When correlation dialog box appears, enter B1:C14 in the input range box. Click on Labels in first row and enter a16 in the output range box. Click on OK.

                Cases of Soda   Temperature
Cases of Soda   1
Temperature     0.966598577     1

As you see, there is a very strong positive correlation between the number of cases of soda demanded and the temperature. This means that as the temperature increases, the demand for cases of soda also increases. The linear correlation coefficient is 0.966598577, which is very close to +1.
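For readers who want to check the Excel correlation output by hand, the following Python sketch computes the Pearson correlation coefficient directly from its definition, using the June 1 to June 13 data from the table. It is an illustration only, not the Excel procedure; the function name pearson_r is an assumption made for this example.

```python
import math

# Data from the table: cases of soda sold (y) and temperature (x), June 1-13.
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]
temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]


def pearson_r(x, y):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(temperature, cases)
print(f"R = {r:.4f}")
```

The result agrees with the Excel value of about 0.9666.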

Now let's follow similar steps to find the regression equation.


Step 1. From the menus choose Tools and click on Data Analysis

Step 2. When Data Analysis dialog box appears, click on regression.

Step 3. When Regression dialog box appears, enter b1:b14 in the y‐range box and c1:c14 in the x‐range box. Click on labels.

Step 4. Enter a19 in the output range box.

Note: The regression equation in general should look like Y=m X + b. In this equation m is the slope of the regression line and b is its y‐intercept.

SUMMARY OUTPUT

Regression Statistics

Multiple R 0.966598577

R Square 0.934312809

Adjusted R Square 0.928341246

Standard Error 2.919383191

Observations 13

ANOVA

df SS MS F Significance F

Regression 1 1333.479989 1333.479989 156.4603497 7.58511E‐08

Residual 11 93.75078034 8.522798213

Total 12 1427.230769

Coefficients Standard Error t Stat P‐value Lower 95% Upper 95%

Intercept 9.17800767 5.445742836 1.685354587 0.120044801 ‐2.80799756 21.16401

Temperature 0.879202711 0.07028892 12.50841116 7.58511E‐08 0.724497763 1.033908

The relationship between the number of cases of soda and the temperature is:

Y = 0.879202711 X + 9.17800767

The number of cases of soda = 0.879202711*(Temperature) + 9.17800767. Using this expression we can approximately predict the number of cases of soda needed on June 14. The weather forecast for that day is 94 degrees, hence the number of cases of soda needed is 0.879202711*(94) + 9.17800767 = 91.82, or about 92 cases.
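The slope, intercept and June 14 prediction can also be reproduced outside Excel. The Python sketch below applies the standard least-squares formulas (slope = S_xy / S_xx, intercept = mean of y minus slope times mean of x) to the same data. It is offered as a cross-check under those textbook formulas; the variable names are illustrative.

```python
# Data from the table: temperature (x) and cases of soda sold (y), June 1-13.
temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(cases) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, cases))
s_xx = sum((x - mean_x) ** 2 for x in temperature)

slope = s_xy / s_xx                    # m in Y = mX + b
intercept = mean_y - slope * mean_x    # b in Y = mX + b

forecast_temperature = 94
predicted_cases = slope * forecast_temperature + intercept

print(f"Y = {slope:.6f} X + {intercept:.6f}")
print(f"Predicted cases at {forecast_temperature} degrees: {predicted_cases:.2f}")
```

Rounding the prediction up to 92 cases matches the conclusion drawn from the Excel output.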


Assignment 8.1 Regression Analysis:

The highway deaths per 100 million vehicle miles and highway speed limits for 10 countries, are given below:

(Death, Speed) = (3.0, 55), (3.3, 55), (3.4, 55), (3.5, 70), (4.1, 55), (4.3, 60), (4.7, 55), (4.9, 60), (5.1, 60), and (6.1, 75).

From this we can see that five countries with the same speed limit have very different positions on the safety list. For example, Britain ... with a speed limit of 70 is demonstrably safer than Japan, at 55. Can we argue that speed has little to do with safety? Use regression analysis to answer this question.



MODULE 9

Date of Receipt




Module Description: MULTIPLE REGRESSION

Objective

* How to develop a multiple regression model
* How to interpret the regression coefficients
* How to determine which independent variables are most important in predicting a dependent variable
* How to use quadratic terms in a regression model
* How to measure the correlation among independent variables

Output A report of Multiple Regression Analysis produced by the students should be in the form of working procedures and results in both softcopy and hardcopy.

9 Multiple Regression Model

Multiple Regression is an extension of simple regression. Simple regression has only one independent (explanatory) variable. Multiple Regression fits a model for one dependent (response) variable based on more than one independent (explanatory) variable.

9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADDIN

We then create a new variable in cells C2:C6, cubed household size, as a regressor. Then in cell C1 give it the heading CUBED HH SIZE. (It turns out that for these data squared HH SIZE has a coefficient of exactly 0.0, so the cube is used.)

The spreadsheet cells A1:C6 should look like:

We have regression with an intercept and the regressors HH SIZE and CUBED HH SIZE


The population regression model is: y = β1 + β2 x2 + β3 x3 + u

It is assumed that the error u is independent with constant variance (homoskedastic); see EXCEL LIMITATIONS at the bottom.

We wish to estimate the regression line: y = b1 + b2 x2 + b3 x3

We do this using the Data analysis Add-in and Regression.

The only change over one-variable regression is to include more than one column in the Input X Range. Note, however, that the regressors need to be in contiguous columns (here columns B and C). If this is not the case in the original data, then columns need to be copied to get the regressors in contiguous columns.

Hitting OK we obtain


The regression output has three components:

* Regression statistics table
* ANOVA table
* Regression coefficients table

9.2 INTERPRET REGRESSION STATISTICS TABLE

The following output is produced. Of greatest interest is R Square.

                                Explanation
Multiple R          0.895828    R = square root of R2
R Square            0.802508    R2
Adjusted R Square   0.605016    Adjusted R2, used if more than one x variable
Standard Error      0.444401    This is the sample estimate of the standard deviation of the error u
Observations        5           Number of observations used in the regression (n)

The above gives the overall goodness-of-fit measures:

R2 = 0.8025
Correlation between y and y-hat is 0.8958 (when squared gives 0.8025).
Adjusted R2 = R2 - (1-R2)*(k-1)/(n-k) = .8025 - .1975*2/2 = 0.6050.

The standard error here refers to the estimated standard deviation of the error term u. It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)). It is not to be confused with the standard error of y itself (from descriptive statistics) or with the standard errors of the regression coefficients given below.


R2 = 0.8025 means that 80.25% of the variation of yi around ybar (its mean) is explained by the regressors x2i and x3i.

9.3 INTERPRET ANOVA TABLE

An ANOVA table is given. This is often skipped.

df SS MS F Significance F

Regression 2 1.6050 0.8025 4.0635 0.1975

Residual 2 0.3950 0.1975

Total 4 2.0

The ANOVA (analysis of variance) table splits the sum of squares into its components.

Total sums of squares = Residual (or error) sum of squares + Regression (or explained) sum of squares.

Thus Σ i (yi - ybar)2 = Σ i (yi - yhati)2 + Σ i (yhati - ybar)2

where yhati is the value of yi predicted from the regression line and ybar is the sample mean of y.

For example: R2 = 1 - Residual SS / Total SS (general formula for R2) = 1 - 0.3950 / 2.0 (from data in the ANOVA table) = 0.8025 (which equals R2 given in the Regression Statistics table).

The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero. Aside: Excel computes this F as: F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [.39498/2] = 4.0635.

The column labeled Significance F has the associated P-value. Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05. Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept. Here FDIST(4.0635,2,2) = 0.1975.
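The quantities in the regression statistics and ANOVA tables are linked by simple arithmetic, which the Python sketch below reproduces from the two sums of squares. It is an illustration, not Excel's own routine. The closed form used for the upper-tail probability, P(F > f) = 1 / (1 + f), holds only for an F distribution with (2, 2) degrees of freedom, which happens to be the case here; other degrees of freedom need a table or a statistics library.

```python
import math

n = 5   # observations
k = 3   # regressors including the intercept

# Sums of squares as printed (rounded) in the ANOVA table.
regression_ss = 1.6050
residual_ss = 0.3950
total_ss = regression_ss + residual_ss

r_squared = 1 - residual_ss / total_ss
adjusted_r_squared = r_squared - (1 - r_squared) * (k - 1) / (n - k)

# Standard error of the regression: sqrt(SSE / (n - k)).
standard_error = math.sqrt(residual_ss / (n - k))

# Overall F statistic.
f_stat = (regression_ss / (k - 1)) / (residual_ss / (n - k))

# Upper-tail probability; this closed form is valid for F(2, 2) only.
significance_f = 1 / (1 + f_stat)

print(f"R2 = {r_squared:.4f}, adjusted R2 = {adjusted_r_squared:.4f}")
print(f"standard error = {standard_error:.4f}")
print(f"F = {f_stat:.4f}, Significance F = {significance_f:.4f}")
```

Small differences in the last decimals relative to the Excel output arise because the sums of squares entered here are the rounded table values.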

9.4 INTERPRET REGRESSION COEFFICIENTS TABLE

The regression output of most interest is the following table of coefficients and associated output:


Coefficient St. error t Stat P‐value Lower 95% Upper 95%

Intercept 0.89655 0.76440 1.1729 0.3616 ‐2.3924 4.1855

HH SIZE 0.33647 0.42270 0.7960 0.5095 ‐1.4823 2.1552

CUBED HH SIZE 0.00209 0.01311 0.1594 0.8880 ‐0.0543 0.0585

Let βj denote the population coefficient of the jth regressor (intercept, HH SIZE and CUBED HH SIZE).

Then

Column "Coefficient" gives the least squares estimates of βj.

Column "Standard error" gives the standard errors (i.e. the estimated standard deviation) of the least squares estimates bj of βj.

Column "t Stat" gives the computed t-statistic for H0: βj = 0 against Ha: βj ≠ 0. This is the coefficient divided by the standard error. It is compared to a t with (n-k) degrees of freedom, where here n = 5 and k = 3.

Column "P-value" gives the p-value for the test of H0: βj = 0 against Ha: βj ≠ 0. This equals Pr{|t| > t-Stat}, where t is a t-distributed random variable with n-k degrees of freedom and t-Stat is the computed value of the t-statistic given in the previous column. Note that this p-value is for a two-sided test. For a one-sided test divide this p-value by 2 (also checking the sign of the t-Stat).

Columns "Lower 95%" and "Upper 95%" values define a 95% confidence interval for βj.

A simple summary of the above output is that the fitted line is

y = 0.8966 + 0.3365*x + 0.0021*z

9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS

95% confidence interval for slope coefficient β2 is from Excel output (-1.4823, 2.1552).

Excel computes this as b2 ± t_.025(2) × se(b2) = 0.33647 ± TINV(0.05, 2) × 0.42270 = 0.33647 ± 4.303 × 0.42270 = 0.33647 ± 1.8189 = (-1.4823, 2.1552).


Other confidence intervals can be obtained.

For example, to find 99% confidence intervals: in the Regression dialog box (in the Data Analysis Add-in), check the Confidence Level box and set the level to 99%.
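The 95% and 99% intervals for the HH SIZE coefficient can be checked with the Python sketch below. It is an illustration only. The helper t_critical_df2 is a hypothetical name, and it works for exactly 2 degrees of freedom, where the t distribution has the closed-form CDF 1/2 + t / (2 sqrt(2 + t^2)); for other degrees of freedom use TINV in Excel or a statistics library.

```python
import math


def t_critical_df2(confidence):
    """Two-sided critical value of Student's t with exactly 2 df.

    Solves t / sqrt(2 + t^2) = confidence for t.
    """
    c = confidence
    return c * math.sqrt(2.0 / (1.0 - c * c))


# Estimate and standard error of the HH SIZE coefficient from the output table.
b2 = 0.33647
se_b2 = 0.42270

intervals = {}
for level in (0.95, 0.99):
    t_crit = t_critical_df2(level)
    margin = t_crit * se_b2
    intervals[level] = (b2 - margin, b2 + margin)
    print(f"{level:.0%}: t = {t_crit:.3f}, "
          f"CI = ({b2 - margin:.4f}, {b2 + margin:.4f})")
```

The 95% line reproduces the interval (-1.4823, 2.1552) from the Excel output; the 99% interval is wider because its critical value is larger.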

9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

The coefficient of HH SIZE has estimated standard error of 0.4227, t-statistic of 0.7960 and p-value of 0.5095. It is therefore statistically insignificant at significance level α = .05 as p > 0.05.

The coefficient of CUBED HH SIZE has estimated standard error of 0.0131, t-statistic of 0.1594 and p-value of 0.8880. It is therefore statistically insignificant at significance level α = .05 as p > 0.05.

There are 5 observations and 3 regressors (intercept, HH SIZE and CUBED HH SIZE) so we use t(5-3) = t(2). For example, for HH SIZE p = TDIST(0.796,2,2) = 0.5095.

9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER

Here we test whether HH SIZE has coefficient β2 = 1.0.

Example: H0: β2 = 1.0 against Ha: β2 ≠ 1.0 at significance level α = .05.

Then t = (b2 - H0 value of β2) / (standard error of b2 ) = (0.33647 - 1.0) / 0.42270 = -1.569.

9.7.1 Using the pvalue approach

p‐value = TDIST(1.569, 2, 2) = 0.257. [Here n=5 and k=3 so n‐k=2]. Do not reject the null hypothesis at level .05 since the p‐value is > 0.05.

9.7.2 Using the critical value approach

We computed t = -1.569. The critical value is t_.025(2) = TINV(0.05,2) = 4.303. [Here n=5 and k=3 so n-k=2]. So do not reject the null hypothesis at level .05 since |t| = 1.569 < 4.303.
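Both approaches to testing H0: β2 = 1.0 can be sketched in Python as below. This is an illustrative check, not the Excel procedure. It uses closed forms that hold only for a t distribution with 2 degrees of freedom: the two-sided p-value is 1 - |t| / sqrt(2 + t^2), and the two-sided critical value at confidence c is c sqrt(2 / (1 - c^2)).

```python
import math

# HH SIZE coefficient and standard error from the regression output.
b2 = 0.33647
se_b2 = 0.42270
beta2_null = 1.0   # value of the coefficient under H0
alpha = 0.05

t_stat = (b2 - beta2_null) / se_b2

# p-value approach (closed form valid for 2 degrees of freedom only).
p_value = 1.0 - abs(t_stat) / math.sqrt(2.0 + t_stat ** 2)
reject_by_p_value = p_value < alpha

# Critical value approach (same 2-df closed form, solved for t).
c = 1.0 - alpha
t_critical = c * math.sqrt(2.0 / (1.0 - c * c))
reject_by_critical_value = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}, critical value = {t_critical:.3f}")
print("Reject H0" if reject_by_p_value else "Do not reject H0")
```

The two approaches always give the same decision; here neither rejects H0 at the .05 level.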

9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS

We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.


From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975. Since the p-value is not less than 0.05 we do not reject the null hypothesis that the regression parameters are zero at significance level 0.05. Conclude that the parameters are jointly statistically insignificant at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept. Here FDIST(4.0635,2,2) = 0.1975.

9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS

Consider the case where x = 4, in which case CUBED HH SIZE = x^3 = 4^3 = 64.

yhat = b1 + b2 x2 + b3 x3 = 0.8966 + 0.3365×4 + 0.0021×64 = 2.3770

9.10 EXCEL LIMITATIONS

Excel restricts the number of regressors (only up to 16 regressors).

Excel requires that all the regressor variables be in adjoining columns. You may need to move columns to ensure this. E.g. if the regressors are in columns B and D you need to copy at least one of columns B and D so that they are adjacent to each other.

Excel standard errors, t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic). Excel does not provide alternatives, such as heteroskedastic-robust or autocorrelation-robust standard errors, t-statistics and p-values.

9.11 Assignment 9.1 DATA:

Store   Bars Sold   Price (cents)   Promotion ($)      Store   Bars Sold   Price (cents)   Promotion ($)
1       4141        59              200                18      2730        79              400
2       3842        59              200                19      2618        79              400
3       3056        59              200                20      4421        79              400
4       3519        59              200                21      4113        79              600
5       4226        59              400                22      3746        79              600
6       4630        59              400                23      3532        79              600
7       3507        59              400                24      3825        79              600
8       3754        59              400                25      1096        99              200
9       5000        59              600                26      761         99              200
10      5120        59              600                27      2088        99              200
11      4011        59              600                28      820         99              200
12      5015        59              600                29      2114        99              400
13      1916        79              200                30      1882        99              400
14      675         79              200                31      2159        99              400
15      3636        79              200                32      1602        99              400
16      3224        79              200                33      3354        99              600
17      2295        79              400                34      2927        99              600

A sample of 34 stores in a supermarket chain is selected for a test-market study of OmniPower. All the stores selected have approximately the same monthly sales volume. The two independent variables are the price of a bar (X1) and the monthly advertising (promotion) expenditure (X2).

a. Use Excel Data Analysis – Regression to estimate the regression line

b. Interpret regression statistics table

c. Construct 95% and 99% confidence intervals for the regression coefficients

d. Test Hypothesis Of Zero Slope Coefficient ("Test Of Statistical Significance")

e. Test Hypothesis On A Regression Parameter

i. Using The P‐Value Approach

ii. Using The Critical Value Approach

f. Overall Test Of Significance Of The Regression Parameters

g. Predicted Value Of Y Given Price 89 cents and Promotion 800
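The Excel regression output for task (a) can be cross-checked outside Excel. The sketch below is one possible Python check, not the required method for the assignment: it fits bars sold on price and promotion by least squares using centered sums of squares and cross-products. The names `STORES` and `fit_two_regressors` are illustrative.

```python
# (bars sold, price in cents, promotion in $) for stores 1 to 34, copied from the table above
STORES = [
    (4141, 59, 200), (3842, 59, 200), (3056, 59, 200), (3519, 59, 200),
    (4226, 59, 400), (4630, 59, 400), (3507, 59, 400), (3754, 59, 400),
    (5000, 59, 600), (5120, 59, 600), (4011, 59, 600), (5015, 59, 600),
    (1916, 79, 200), (675, 79, 200), (3636, 79, 200), (3224, 79, 200),
    (2295, 79, 400), (2730, 79, 400), (2618, 79, 400), (4421, 79, 400),
    (4113, 79, 600), (3746, 79, 600), (3532, 79, 600), (3825, 79, 600),
    (1096, 99, 200), (761, 99, 200), (2088, 99, 200), (820, 99, 200),
    (2114, 99, 400), (1882, 99, 400), (2159, 99, 400), (1602, 99, 400),
    (3354, 99, 600), (2927, 99, 600),
]

def fit_two_regressors(rows):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 from centered sums."""
    n = len(rows)
    mean_y = sum(y for y, _, _ in rows) / n
    mean_1 = sum(x1 for _, x1, _ in rows) / n
    mean_2 = sum(x2 for _, _, x2 in rows) / n
    s11 = sum((x1 - mean_1) ** 2 for _, x1, _ in rows)
    s22 = sum((x2 - mean_2) ** 2 for _, _, x2 in rows)
    s12 = sum((x1 - mean_1) * (x2 - mean_2) for _, x1, x2 in rows)
    s1y = sum((x1 - mean_1) * (y - mean_y) for y, x1, _ in rows)
    s2y = sum((x2 - mean_2) * (y - mean_y) for y, _, x2 in rows)
    det = s11 * s22 - s12 ** 2           # zero only if x1 and x2 are collinear
    b1 = (s1y * s22 - s12 * s2y) / det   # slope for price
    b2 = (s11 * s2y - s12 * s1y) / det   # slope for promotion
    b0 = mean_y - b1 * mean_1 - b2 * mean_2
    return b0, b1, b2

b0, b1, b2 = fit_two_regressors(STORES)
print(f"bars sold = {b0:.4f} + ({b1:.4f}) * price + ({b2:.4f}) * promotion")
```

The printed coefficients should agree with the Coefficients column of the Excel output; standard errors, t-statistics, and confidence intervals still have to be taken from Excel.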



MODULE 10

Date of Receipt




Module Description: TIME SERIES FORECASTING

Objective:

* Discuss the importance of forecasting

* Perform smoothing of data series

* Describe least-squares trend fitting and forecasting

* Address time series forecasting

* Address autoregressive models

* Describe the procedure for choosing appropriate models

Output: A report produced by the students should be in the form of working procedures and results, in both softcopy and hardcopy.

10 Time Series Forecasting

Time Series analysis has two main goals:

* Identifying the nature of a sequence of observations.

* Predicting future values using historical observations (also known as forecasting).

In Time Series analysis, it is assumed that the data consist of a systematic pattern and random noise that makes the pattern difficult to identify. Most time series analysis techniques use filtering to remove the noise. There are two general components of time series patterns: Trend and Seasonality. The trend is a linear or non-linear component that does not repeat within the time range. The seasonality repeats itself at systematic intervals over time. These two components are often both present in real data.

Trend Analysis

Trend analysis is a technique used to identify a trend component in time series data. In many cases data can be approximated by a linear function, but logarithmic, exponential, and polynomial functions can also be used.

Regression Analysis

Regression analysis is the study of relationships among variables. Its purpose is to predict, or estimate, the value of one variable from the known values of other variables related to it. Any method of fitting equations to data may be called regression, and these equations are useful for making predictions and for judging the strength of relationships.

Forecasting, that is, extrapolating from present values to future values, is not by itself a function of regression analysis; to predict the future, time series analysis is used. To predict values it is necessary to find a predictive function that minimizes the distances between the observed points and the predictive function itself. The least-squares method is the most common way of doing this: it chooses the function that minimizes the sum of squared deviations between the points and the estimated function.

10.1 Time series forecasting models

The basic assumption of time-series forecasting is that the factors that have influenced activities in the past and present will continue to do so in approximately the same way in the future. A trend is an overall long-term upward or downward movement in a time series. The most basic model is the classical multiplicative model, which can be applied to annual, quarterly, and monthly data.

10.1.1 CLASSICAL MULTIPLICATIVE TIME-SERIES MODEL FOR ANNUAL DATA

Yi = Ti x Ci x Ii

Where:

Ti = value of the trend component in year i

Ci = value of the cyclical component in year i

Ii = value of the irregular component in year i

CLASSICAL MULTIPLICATIVE TIME-SERIES MODEL FOR ANNUAL DATA WITH A SEASONAL COMPONENT

Yi = Ti x Si x Ci x Ii

Where:

Ti, Ci, Ii = value of the trend, cyclical, and irregular components in year i

Si = value of the seasonal component in year i

Use the Wrigley coded data below to create an Excel chart of Actual Gross Revenue.

Year Actual Revenue Year Actual Revenue

1984 591 1995 1770

1985 620 1996 1851

1986 699 1997 1954

1987 781 1998 2023

1988 891 1999 2079

1989 993 2000 2146

1990 1111 2001 2430

1991 1149 2002 2746

1992 1301 2003 3069

1993 1440 2004 3649

1994 1661 2005 4159
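A least-squares trendline for the Wrigley revenue series above can also be computed directly, which is a useful check on the Excel chart trendline. The Python sketch below regresses actual revenue on the calendar year; with the data as tabulated it reproduces the trendline reported on the module's Wrigley chart (y = 143.63x - 284695, R² = 0.9121). The function name is illustrative.

```python
years = list(range(1984, 2006))
revenue = [591, 620, 699, 781, 891, 993, 1111, 1149, 1301, 1440, 1661,
           1770, 1851, 1954, 2023, 2079, 2146, 2430, 2746, 3069, 3649, 4159]

def least_squares_line(x, y):
    """Return slope, intercept, and r-squared of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = least_squares_line(years, revenue)
print(round(slope, 2), round(intercept), round(r_squared, 4))
```

Using coded years (1984 = 0) instead of calendar years gives the same slope and R² with a different intercept.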



Year Population Workforce Year Population Workforce

1984 176,383 113,544 1995 198,584 132,304

1985 178,206 115,461 1996 200,591 133,943

1986 180,587 117,834 1997 203,133 136,297

1987 182,753 119,865 1998 205,220 137,673

1988 184,613 121,669 1999 207,753 139,368

1989 186,393 123,869 2000 212,577 142,583

1990 189,164 125,840 2001 215,092 143,734

1991 190,925 126,346 2002 217,570 144,863

1992 192,805 128,105 2003 221,168 146,510

1993 194,838 129,200 2004 223,357 146,817

1994 196,814 131,056 2005 226,082 147,956

a. Using Ms Excel, plot the time series for the US civilian noninstitutional population of people 16 years and older.

b. Compute the linear trend forecasting equation.

c. Forecast the US civilian noninstitutional population of people 16 years and older for 2006 and 2007.

d. Repeat (a) through (c) for the US civilian noninstitutional workforce of people 16 years and older.

10.2 Moving Average and Exponential Smoothing

10.2.1 Moving Average Models

Use the Add Trendline option to analyze a moving average forecasting model in Excel. You must first create a graph of the time series you want to analyze. Select the range that contains your data and make a scatter plot of the data. Once the chart is created, follow these steps:

1. Click on the chart to select it, and click on any point on the line to select the data series. When you click on the chart to select it, a new option, Chart, is added to the menu bar.

2. From the Chart menu, select Add Trendline.

[Figure: Wm. Wrigley Jr. Company Actual Revenue. Scatter plot of revenue ($ millions) against year with a linear trendline, y = 143.63x - 284695, R² = 0.9121.]

Moving averages for a chosen period of length L consist of a series of means computed over time such that each mean is calculated for a sequence of L observed values. Moving averages are represented by the symbol MA(L). For example, suppose we have 11 years of data and want to compute five-year moving averages (L = 5).

Data for the 11-year period 1996 to 2006:

4.0 5.0 7.0 6.0 8.0 9.0 5.0 2.0 3.5 5.5 6.6

MA(5) = (Y1 + Y2 + Y3 + Y4 + Y5)/5 = (4.0 + 5.0 + 7.0 + 6.0 + 8.0)/5 = 6.0

Center this first moving average on the middle value of its five-year window (the third value, 7.0). Computing the remaining MA(5) values in the same way gives:

Revenue 4.0 5.0 7.0 6.0 8.0 9.0 5.0 2.0 3.5 5.5 6.6

MA ‐ ‐ 6.0 7.0 7.0 6.0 5.5 5.0 4.5 ‐ ‐
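The five-year moving averages above can be reproduced with a short Python sketch. It computes a centered MA(L) for odd L and leaves the positions without a full window empty, matching the dashes in the table. The function name is illustrative.

```python
def centered_moving_average(values, length):
    """Centered MA(L) for odd L; positions without a full window are None."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the average can be centered")
    half = length // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / length
    return result

revenue = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.6]
ma5 = centered_moving_average(revenue, 5)
print([None if m is None else round(m, 1) for m in ma5])
```

Calling the same function with length 3 or 7 gives the three-year and seven-year versions.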

The following are the three-year and seven-year moving averages for Cabot Corporation revenues:

Year Revenue MA 3-Year MA 7-Year

1982 1588 #N/A #N/A

1983 1558 1633 #N/A

1984 1753 1573 #N/A

1985 1408 1490.3 1531.1

1986 1310 1380.7 1581.0

1987 1424 1470.3 1599.1

1988 1677 1679.3 1561.3

1989 1937 1766.3 1583.3

1990 1685 1703.3 1627.4

1991 1488 1578.3 1665.0

1992 1562 1556.3 1688.4

1993 1619 1622.7 1678.1

1994 1687 1715.7 1671.3

1995 1841 1797.7 1694.9

1996 1865 1781.0 1714.4

1997 1637 1718.3 1725.7

1998 1653 1663.0 1702.3

1999 1699 1683.3 1661.7

2000 1698 1640.0 1651.7

2001 1523 1592.7 1694.1

2002 1557 1625.0 1761.6

2003 1795 1762.0 #N/A

2004 1934 1951.3 #N/A

2005 2125 #N/A #N/A

[Figure: Moving Averages for Cabot Corporation Revenue. Line chart of revenues ($ millions) against year, showing Revenue, MA 3-Year, and MA 7-Year.]


10.2.2 Exponential Smoothing Models

The simplest way to analyze a time series using an Exponential Smoothing model in Excel is to use the data analysis tool. This tool works almost exactly like the one for Moving Average, except that you will need to input the value of the smoothing constant α (W in the formulas below) instead of the number of periods, k. Once you have entered the data range and the damping factor, 1-α, and indicated what output you want and a location, the analysis is the same as the one for the Moving Average model.

COMPUTING AN EXPONENTIALLY SMOOTHED VALUE IN TIME PERIOD i

E1 = Y1

Ei = WYi + (1-W)Ei-1, i = 2, 3, 4, …

Where

Ei = value of the exponentially smoothed series being computed in time period i

Ei-1 = value of the exponentially smoothed series already computed in time period i-1

Yi = observed value of the time series in period i

W = subjectively assigned weight or smoothing coefficient (0 < W < 1)

Year Revenue ES(W=.50) ES(W=.25)

1982 1588 1588.0 1588.0

1983 1558 1573.0 1580.5

1984 1753 1663.0 1623.6

1985 1408 1535.5 1569.7

1986 1310 1422.8 1504.8

1987 1424 1423.4 1484.6

1988 1677 1550.2 1532.7

1989 1937 1743.6 1633.8

1990 1685 1714.3 1646.6

1991 1488 1601.1 1606.9

1992 1562 1581.6 1595.7

1993 1619 1600.3 1601.5

1994 1687 1643.6 1622.9

1995 1841 1742.3 1677.4

1996 1865 1803.7 1724.3

1997 1637 1720.3 1702.5

1998 1653 1686.7 1690.1

1999 1699 1692.8 1692.3

2000 1698 1695.4 1693.8

2001 1523 1609.2 1651.1

2002 1557 1583.1 1627.5

2003 1795 1689.1 1669.4

2004 1934 1811.5 1735.6

2005 2125 1968.3 1832.9
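The smoothed columns in the table above follow from applying the recursion one year at a time: the first smoothed value equals the first observation, and each later value is W times the current observation plus (1 - W) times the previous smoothed value. A minimal Python sketch, with an illustrative function name:

```python
def exponential_smoothing(values, w):
    """Exponentially smoothed series: E1 = Y1, Ei = W*Yi + (1 - W)*E(i-1)."""
    if not 0 < w < 1:
        raise ValueError("W must be strictly between 0 and 1")
    smoothed = [float(values[0])]
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

# Cabot Corporation revenues, 1982 to 2005, from the table above
cabot = [1588, 1558, 1753, 1408, 1310, 1424, 1677, 1937, 1685, 1488, 1562, 1619,
         1687, 1841, 1865, 1637, 1653, 1699, 1698, 1523, 1557, 1795, 1934, 2125]

es50 = exponential_smoothing(cabot, 0.50)
es25 = exponential_smoothing(cabot, 0.25)
for year, y, a, b in zip(range(1982, 2006), cabot, es50, es25):
    print(year, y, round(a, 1), round(b, 1))
```

A smaller W gives more weight to past values and therefore a smoother series, which is why the W = .25 column reacts more slowly to changes than the W = .50 column.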

[Figure: Exponentially Smoothed Cabot Corporation Revenue. Line chart of revenues ($ millions) against year, showing Revenue, ES(W=.50), and ES(W=.25).]


10.3 Assignment 10.2

Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006

Deals 715 865 708 861 931 939 1031 893 735 759 1013 622

a. Plot the time series.

b. Fit a three-year moving average to the data and plot the results.

c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.

d. Repeat (c) using W = 0.25.

e. Compare the results of (c) and (d).

10.4 Linear, exponential and quadratic trend

10.4.1 Linear Trend Model

The linear trend model, Yi = β0 + β1Xi + εi, is the simplest forecasting model.

Using the Wrigley data above, we use Microsoft Excel to plot the time series of real gross revenues.

Performing a simple linear regression analysis on the adjusted time series in Microsoft Excel results in the following linear trend forecasting equation:

Ŷi = 469.9158 + 62.1068 Xi

The regression coefficients can be interpreted as follows:

The Y intercept, b0 = 469.9158, is the predicted real gross revenue for the base year (X = 0, that is, 1984).

The slope, b1 = 62.1068, is the predicted increase in real gross revenue per year.

For example, to project the trend to 2006, substitute X23 = 22 (the code for 2006) into the linear trend forecasting equation:

Ŷ23 = 469.9158 + 62.1068(22) = 1,836.265

Quadratic Trend Model

The quadratic trend model, Yi = β0 + β1Xi + β2Xi^2 + εi, is the simplest nonlinear model. The quadratic trend forecasting equation is:

Ŷi = b0 + b1Xi + b2Xi^2

where b0 = estimated Y intercept; b1 = estimated linear effect on Y; b2 = estimated quadratic effect on Y.

For example, Microsoft Excel can be used to compute the quadratic trend forecasting equation. The results for the quadratic trend model used to forecast real gross revenues at the Wm. Wrigley Jr. Company are:

Ŷi = 618.3211 + 17.5852 Xi + 2.1201 Xi^2

To compute a forecast for 2006 using the quadratic trend equation, substitute X23 = 22 (the code for 2006) into the quadratic trend forecasting equation:

Ŷ23 = 618.3211 + 17.5852(22) + 2.1201(22)^2 = 2,031.324
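Both the linear and the quadratic trend forecasts are polynomial evaluations at the coded year, so one small helper covers both. The Python sketch below uses the coefficients quoted in this section (469.9158 and 62.1068 for the linear model; 618.3211, 17.5852, and 2.1201 for the quadratic model); the function name and the coefficient ordering are illustrative choices.

```python
def trend_forecast(coefficients, x):
    """Evaluate b0 + b1*x + b2*x^2 + ... for coefficients given as [b0, b1, b2, ...]."""
    return sum(b * x ** power for power, b in enumerate(coefficients))

x_2006 = 2006 - 1984  # coded year: 1984 is year 0, so 2006 is 22

linear_2006 = trend_forecast([469.9158, 62.1068], x_2006)
quadratic_2006 = trend_forecast([618.3211, 17.5852, 2.1201], x_2006)
print(round(linear_2006, 3), round(quadratic_2006, 3))
```

The gap between the two forecasts comes entirely from the quadratic term, which grows with the square of the coded year.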


10.4.2 Exponential Trend Model

The exponential trend model equation is Yi = β0 β1^Xi εi, where (β1 - 1) × 100% is the annual compound growth rate (in %). The exponential trend forecasting equation is log(Ŷi) = b0 + b1Xi.

From the Excel results worksheet for an exponential trend model for real gross revenues at the Wm. Wrigley Jr. Company, the exponential trend equation is log(Ŷi) = 2.7647 + 0.0245Xi, where year 0 is 1984.

Compute the values for β0 and β1 by taking the antilog of the regression coefficients (b0 and b1):

β̂0 = antilog(2.7647) = 10^2.7647 = 581.701

β̂1 = antilog(0.0245) = 10^0.0245 = 1.058

Thus, the exponential trend forecasting equation is:

Ŷi = (581.701)(1.058)^Xi

To forecast real gross revenues for 2006 (X23 = 22) using the above equation:

log(Ŷ23) = 2.7647 + 0.0245(22) = 3.3037

Ŷ23 = antilog(3.3037) = 10^3.3037 = 2,012.334
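The antilog steps for the exponential trend model can be checked with a few lines of Python. The sketch uses the base-10 regression coefficients quoted above (2.7647 and 0.0245); the variable and function names are illustrative.

```python
b0, b1 = 2.7647, 0.0245  # coefficients of log10(Y) on the coded year (1984 = 0)

beta0_hat = 10 ** b0                   # antilog of the intercept
beta1_hat = 10 ** b1                   # antilog of the slope
growth_pct = (beta1_hat - 1) * 100     # annual compound growth rate in %

def exponential_forecast(x):
    """Forecast on the original scale: 10 ** (b0 + b1 * x)."""
    return 10 ** (b0 + b1 * x)

print(round(beta0_hat, 3), round(beta1_hat, 3), round(growth_pct, 1))
print(round(exponential_forecast(22), 3))  # forecast for 2006
```

Forecasting through the logarithm, as done here, avoids the small error that comes from first rounding the antilogs to 581.701 and 1.058.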

The chart of exponential trend forecasting is:


10.4.3 Model Selection Using First, Second, and Percentage Differences

In addition to visually inspecting scatter plots and comparing the adjusted r2 values, we can compute and examine first, second, and percentage differences to select the most appropriate of the models above.

Perfect Fit For Linear Trend Model: The first differences are constant; that is, the differences between consecutive values in the series are the same throughout: (Y2 - Y1) = (Y3 - Y2) = ... = (Yn - Yn-1).

Example:

Year        1997  1998  1999  2000  2001  2002  2003  2004  2005  2006
Passengers  30    33    36    39    42    45    48    51    54    57
First Diff        3     3     3     3     3     3     3     3     3

Perfect Fit For Quadratic Trend Model: The second differences are constant; that is, the differences between consecutive first differences are the same throughout: [(Y3 - Y2) - (Y2 - Y1)] = [(Y4 - Y3) - (Y3 - Y2)] = ... = [(Yn - Yn-1) - (Yn-1 - Yn-2)].

Example:

Year         1997  1998  1999  2000  2001  2002  2003  2004  2005  2006
Passengers   30    31    33.5  37.5  43    50    58.5  68.5  80    93
First Diff         1     2.5   4     5.5   7     8.5   10    11.5  13
Second Diff              1.5   1.5   1.5   1.5   1.5   1.5   1.5   1.5

Perfect Fit For Exponential Trend Model: The percentage differences between consecutive values are constant. Thus ((Y2 - Y1)/Y1) × 100% = ((Y3 - Y2)/Y2) × 100% = ... = ((Yn - Yn-1)/Yn-1) × 100%.

Example:

Year             1997  1998  1999  2000  2001  2002  2003  2004  2005  2006
Passengers       30    31.5  33.1  34.8  36.5  38.3  40.2  42.2  44.3  46.5
First Diff             1.5   1.6   1.7   1.7   1.8   1.9   2.0   2.1   2.2
Second Diff                  0.1   0.1   0.0   0.1   0.1   0.1   0.1   0.1
Percentage Diff        5%    5%    5%    5%    5%    5%    5%    5%    5%

For the real gross revenue data at the Wm. Wrigley Jr. Company, neither the first differences, the second differences, nor the percentage differences are constant across the series (see table below). Therefore, other models may be more appropriate (including those considered in Autoregressive Modeling).
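The three kinds of differences used for model selection are easy to compute for any series. The Python sketch below applies them to the three passenger examples in this section; the function names are illustrative.

```python
def first_differences(series):
    """Differences between consecutive values: Y(i) - Y(i-1)."""
    return [b - a for a, b in zip(series, series[1:])]

def second_differences(series):
    """Differences between consecutive first differences."""
    return first_differences(first_differences(series))

def percentage_differences(series):
    """Percentage change between consecutive values: (Y(i) - Y(i-1)) / Y(i-1) * 100."""
    return [(b - a) / a * 100 for a, b in zip(series, series[1:])]

linear = [30, 33, 36, 39, 42, 45, 48, 51, 54, 57]
quadratic = [30, 31, 33.5, 37.5, 43, 50, 58.5, 68.5, 80, 93]
exponential = [30, 31.5, 33.1, 34.8, 36.5, 38.3, 40.2, 42.2, 44.3, 46.5]

print(first_differences(linear))                             # constant: linear trend
print(second_differences(quadratic))                         # constant: quadratic trend
print([round(p) for p in percentage_differences(exponential)])  # roughly constant: exponential trend
```

With real data none of the three will be exactly constant, so in practice the comparison is about which set of differences varies least.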

10.4.4 Assignment 10.3

a. Plot the data of Table 10-1, Bed Bath & Beyond Inc.

b. Compute a linear trend forecasting equation and plot the results.

c. Compute a quadratic trend forecasting equation and plot the results.

d. Compute an exponential trend forecasting equation and plot the results.

e. Using the forecasting equations in (b) through (d), what are your annual forecasts of the number of stores open for 2007 and 2008?

f. How can you explain the differences in the three forecasts in (e)? Which forecast do you think you should use? Why?

10.5 The autoregressive and the least-squares models for seasonal data

Autoregressive modeling is a technique used to forecast time series with autocorrelation. A first-order autocorrelation refers to the relationship between consecutive values in a time series. A second-order autocorrelation refers to the relationship between values that are two periods apart. A pth-order autocorrelation refers to the correlation between values in a time series that are p periods apart.

First Order Autoregressive Model

The first-order autoregressive model uses the preceding value, Yi-1, as the single independent variable for predicting Yi, so it is similar in form to the simple linear regression model.
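Because a first-order autoregressive model has the same form as a simple linear regression, it can be fitted by regressing each value on the value one period earlier. The Python sketch below illustrates this with the Cabot Corporation revenues from section 10.2; the names `fit_ar1`, `a0`, and `a1` are illustrative and are not notation taken from this module.

```python
def fit_ar1(series):
    """Least-squares fit of Y(i) = a0 + a1 * Y(i-1); returns (a0, a1)."""
    lagged = series[:-1]    # Y(i-1), the independent variable
    current = series[1:]    # Y(i), the dependent variable
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1

def forecast_next(series, a0, a1):
    """One-period-ahead forecast from the last observed value."""
    return a0 + a1 * series[-1]

# Cabot Corporation revenues, 1982 to 2005
cabot = [1588, 1558, 1753, 1408, 1310, 1424, 1677, 1937, 1685, 1488, 1562, 1619,
         1687, 1841, 1865, 1637, 1653, 1699, 1698, 1523, 1557, 1795, 1934, 2125]

a0, a1 = fit_ar1(cabot)
print(round(a0, 2), round(a1, 4), round(forecast_next(cabot, a0, a1), 1))
```

In Excel the same fit is obtained by placing the lagged series in an adjacent column and running the Regression tool on the two columns.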

10.6 Price indexes

Index numbers allow relative comparisons over time

Index numbers are reported relative to a base period index

Base period index = 100 by definition

Table 10-1 Bed Bath & Beyond Inc.

Ii = (Pi / Pbase) × 100

where

Ii = index number for year i

Pi = price for year i

Pbase = price for the base year

10.6.1 Example

Airplane ticket prices from 1998 to 2006:

Prices in 1998 were 92.2% of base year prices

Prices in 2000 were 100% of base year prices (by definition, since 2000 is the base year)

Prices in 2006 were 130.2% of base year prices

10.7 Aggregated and simple indexes

An aggregate index is used to measure the rate of change from a base period for a group of items.


10.7.1 Unweighted Aggregate Price Index

Example:

Year Lease payment Fuel Repair Total Index (2003=100)

2003 260 45 40 345 100.0

2004 280 60 40 380 110.1

2005 305 55 45 405 117.4

2006 310 50 50 410 118.8
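The Total and Index columns of the table above can be reproduced by summing the item prices for each year and dividing by the base-year total. A minimal Python sketch, with illustrative names:

```python
# (lease payment, fuel, repair) for each year, from the table above
expenses = {
    2003: (260, 45, 40),
    2004: (280, 60, 40),
    2005: (305, 55, 45),
    2006: (310, 50, 50),
}

def unweighted_aggregate_index(prices_by_year, base_year):
    """Unweighted aggregate price index: (sum of prices / base-year sum) * 100."""
    base_total = sum(prices_by_year[base_year])
    return {year: sum(prices) / base_total * 100
            for year, prices in prices_by_year.items()}

index = unweighted_aggregate_index(expenses, base_year=2003)
for year in sorted(index):
    print(year, sum(expenses[year]), round(index[year], 1))
```

Because every item enters the sum with equal weight, the most expensive item (the lease payment) dominates this index.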

I2006 = (ΣP2006 / ΣP2003)(100) = (410 / 345)(100) = 118.8

Unweighted total expenses were 18.8% higher in 2006 than in 2003.

10.7.2 Weighted Aggregate Price Indexes
