Practicum Module
Math11002 ‐ Business Statistics
By: Aurino Rilman Adam Djamaris
MODELLING AND SIMULATION LABORATORY MANAGEMENT PROGRAM
2010
MANAGEMENT PROGRAM
MODELLING AND SIMULATION LABORATORY
ARD – BUSINESS STATISTICS‐Sec. 2 Page 2 of ‐131
1.1 Answer the questions below with a brief description
1. EXPLAIN KEY DEFINITIONS AND GIVE AT LEAST ONE EXAMPLE
1.2 Use Microsoft Excel to complete the following tasks
1.3 Create a bar chart that also includes a cumulative line chart, using the data in Table 1
1.4 Create a pie chart and attach the Excel chart as your answer
1.5 The following data represent the cost of electricity during July 2006 for a random sample of 50 one-bedroom apartments in a large city
1.6 Form a frequency distribution and a percentage distribution that have class intervals with upper class limits of $99, $119, and so on
1.7 Construct a histogram and a percentage polygon
1.8 Form a cumulative percentage distribution and plot a cumulative percentage polygon
1.9 Around what amount does monthly electricity cost seem to be concentrated?
1.10 Appendix
1.10.1 Installing Excel Add-Ins for PHStat2
1.10.2 INSTALLING “DATA ANALYSIS” ON EXCEL 2007
1.10.3 Installing and Operating the Prentice Hall PHStat on Your Home Computer
1.10.4 Configuring Excel 2007 security for PHStat2
2 NUMERICAL DESCRIPTIVE MEASURES
2.1 Central Tendency
2.1.1 The Mean
2.1.2 The Median
2.1.3 The Mode
2.1.4 Quartiles
2.1.5 The Geometric Mean
2.1.6 Other Useful Excel Basic Built-In Functions
2.2 Assignment 2.1
2.3 Variation
2.3.1 The Range
2.3.2 The Interquartile Range
2.3.3 The Variance and Standard Deviation
2.3.4 The Coefficient of Variation
2.3.5 Z Scores
2.4 Shape
2.4.1 Formula
2.5 Assignment 2.2
2.6 Descriptive summary of population
2.6.1 Excel Statistical Analysis Tools
2.6.2 Install and use the Analysis ToolPak
2.7 Box-whisker plot
2.8 Assignment 2.3
2.9 Weighted mean
2.10 Assignment 2.4
2.11 Correlation coefficients
2.12 Covariance
2.13 Assignment 2.5
2.13.1 Calories and Fat relationship
2.13.2 Fuel Efficiency Calculation and Standard
3 PROBABILITY
3.1 Basic Probability
3.2 Sample spaces and events, contingency tables, simple probability and joint probability
3.2.1 Sample Space
3.2.2 Event in Sample Space
3.2.3 Simple and Joint Probability
3.3 Bayes' Theorem
3.4 Assignment 3.1
3.5 Basic Probability Rules
3.5.1 Discrete Random Variable
3.5.2 Discrete Random Variables Expected Value
3.5.3 Discrete Random Variables Dispersion
3.5.4 Covariance
3.5.5 The Sum of Two Random Variables: Measures
3.6 Binomial Distribution
3.6.1 Properties
3.6.2 The Binomial Distribution Formula
3.6.3 The Shape and Characteristics
3.7 Poisson Distribution
3.7.1 Properties
3.7.2 Formula
3.7.3 Shape
3.8 Hypergeometric distribution
3.8.1 Formula
3.8.2 Example
3.9 Read Excel Companion to Chapter 5
3.10 Assignment 3.2
3.11 Assignment 3.3
4 NORMAL AND SAMPLING DISTRIBUTION
4.1 Normal Distribution and Evaluating Normality
4.1.1 Normal Probability Density Function
4.1.2 Evaluating Normality
4.2 Sampling and Sampling Distribution
4.2.1 Sample
4.2.2 Types of Samples
4.2.3 Sampling Distributions
4.2.4 SAMPLING FROM FINITE POPULATIONS
4.3 Assignment for Simple Random Sample
4.4 Assignment for Sampling Distribution
4.5 Assignment for The Sampling Distribution of the mean
4.6 Assignment for Sampling from Finite Population
5 CONFIDENCE INTERVAL ESTIMATION
5.1 Confidence intervals
5.1.1 A point estimate and a confidence interval estimate
5.1.2 Confidence Interval for μ (σ Known)
5.1.3 Confidence Interval for μ (σ Unknown)
5.2 Confidence Interval Estimate for a Single Population Proportion
5.2.1 Example for Confidence Intervals for the Population Proportion
5.3 Determining Sample Size
5.3.1 If Population Standard Deviation (σ) Known
5.3.2 If Population Standard Deviation (σ) Unknown
5.3.3 To Determine the Required Sample Size for the Proportion
5.4 Assignment 5
6 HYPOTHESIS TESTING AND TWO-SAMPLE TESTS
6.1 Hypothesis Testing
6.1.1 The Null Hypothesis, H0
6.1.2 The Alternative Hypothesis, H1
6.1.3 The Hypothesis Testing Process
6.1.4 The Test Statistic and Critical Values
6.1.5 Errors in Decision Making
6.1.6 Level of Significance, α
6.1.7 Hypothesis Testing: σ Known
6.1.8 Six Steps of Hypothesis Testing
6.1.9 Hypothesis Testing: σ Known, p-Value Approach
6.1.10 Hypothesis Testing: σ Known, Confidence Interval Connections
6.1.11 One-Tail Tests
6.1.12 Hypothesis Testing: σ Unknown
6.1.13 Hypothesis Testing: Connection to Confidence Intervals
6.1.14 Hypothesis Testing for a Proportion
6.2 Assignment 6.1
6.3 Two-Sample Tests
6.3.1 Two-Sample Tests: Independent Populations
6.3.2 Independent Populations, Unequal Variances
7 ANOVA, CHI-SQUARE AND NONPARAMETRIC TESTS
7.1 One-Way Analysis of Variance
7.1.1 Hypotheses: One-Way ANOVA
7.1.2 Partitioning the Variation
7.1.3 Obtaining the Mean Squares
7.1.4 One-Way ANOVA Table
7.1.5 Test Statistic
7.1.6 Example
7.1.7 The Tukey-Kramer Procedure
7.1.8 ANOVA Assumptions
7.2 Two-Way Analysis of Variance
7.2.1 Sources of Variation
7.2.2 Two-Way ANOVA: Features
7.2.3 Interaction
7.3 CHI-SQUARE AND NONPARAMETRIC TESTS
7.3.1 One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies
7.3.2 One-Variable Chi-Square (goodness-of-fit test) with predetermined expected frequencies
7.3.3 Two-Variable Chi-Square (test of independence)
7.4 Assignment
7.4.1 Assignment 7.1
7.4.2 Assignment 7.2
7.4.3 Assignment 7.3
8 REGRESSION ANALYSIS
8.1 Simple Regression Analysis
8.2 Regression Analysis Using Excel
8.3 Regression Dialog Box
8.4 Simple Regression
8.5 Linear Correlation and Regression Analysis
9 MULTIPLE REGRESSION MODEL
9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN
9.2 INTERPRET REGRESSION STATISTICS TABLE
9.3 INTERPRET ANOVA TABLE
9.4 INTERPRET REGRESSION COEFFICIENTS TABLE
9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER
9.7.1 Using the p-value approach
9.7.2 Using the critical value approach
9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS
9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS
9.10 EXCEL LIMITATIONS
9.11 Assignment 9.1
10 TIME SERIES FORECASTING
10.1 Time series forecasting models
10.1.1 CLASSICAL MULTIPLICATIVE TIME-SERIES MODEL FOR ANNUAL DATA
10.1.2 Assignment 9.1
10.2 Moving Average and Exponential Smoothing
10.2.1 Moving Average Models
10.2.2 Exponential Smoothing Models
10.3 Assignment 10.2
10.4 Linear, exponential and quadratic trend
10.4.1 Linear Trend Model
10.4.2 Exponential Trend Model
10.4.3 Model Selection Using First, Second, and Percentage Differences
10.4.4 Assignment 10.3
10.5 The autoregressive and the least-squares models for seasonal data
10.6 Price indexes
10.6.1 Example
10.7 Aggregated and simple indexes
10.7.1 Unweighted Aggregate Price Index
10.7.2 Weighted Aggregate Price Indexes
Practicum: Math11002 Business Statistics MODULE 1
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: 12.00 – 14.00 WIB In ____________________
I, the undersigned, hereby declare that I have completed this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________
Rem.:
Module Description: Data Collection and Data Presentation
Objective: Students understand the sources of data used in business and the types of data used in business; develop tables and charts for categorical data; develop tables and charts for numerical data and present graphs; examine cross-tabulated data using the contingency table and the side-by-side bar chart; and use Microsoft Excel to process business data.
Output: Use separate sheets of paper to report your results (handwritten or computer printout). Each student's report should present the working procedures and the results, in both softcopy and hardcopy.
Pre-Lab Reading:
Levine et al. 2008. Statistics for Managers Using Microsoft™ Excel, Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 18-30 and 75-93.
Set up Microsoft Excel with the Data Analysis add-in enabled (see pages 28-29).
1.1 Answer the questions below with a brief description.
1. Explain each key definition and give at least one example.
1.1 Population:
1.2 Sample:
1.3 Parameter:
1.4 Statistic:
1.5 Descriptive Statistics:
1.6 Inferential Statistics:
2. Name three circumstances that require data collection.
3. Explain the difference between descriptive and inferential statistics.
4. Design your own data-collection questionnaire with at least 10 questions.
5. According to The State of the News Media, 2006, the average age of viewers of "ABC World News
Tonight" is 59 years. Suppose a rival network executive hypothesizes that the average age of ABC
news viewers is less than 59. To test her hypothesis, she samples 500 ABC nightly news viewers and
determines the age of each.
5.1 Describe the population.
5.2 Describe the variable of interest.
5.3 Describe the sample.
5.4 Describe the inference.
6. Cola wars is the popular term for the intense competition between Coca‐Cola and Pepsi
displayed in their marketing campaigns. Their campaigns have featured movie and television stars,
rock videos, athletic endorsements, and claims of consumer preference based on taste tests.
Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are given a blind taste test
(i.e., a taste test in which the two brand names are disguised). Each consumer is asked to state a
preference for brand A or brand B.
6.1 Describe the population.
6.2 Describe the variable of interest.
6.3 Describe the sample.
6.4 Describe the inference.
1.2 Use Microsoft Excel to complete the following tasks.
1.3 Create a bar chart that also includes a cumulative line chart, using the data in Table 1.
1.4 Create a pie chart and attach the Excel chart as your answer.
Table 1. Percentage Expended Money
What You Would Do With the Money        Percentage (%)
Buy a luxury item, vacation, or gift    20
Give it to charity                      2
Pay debt                                24
Save                                    31
Spend on essentials                     16
Other                                   7
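The chart itself should be built in Excel as the task asks; as an optional cross-check, the minimal Python sketch below sorts the Table 1 categories from largest to smallest, accumulates their percentages (the values a cumulative line would plot), and converts each percentage into a pie-slice angle using percentage × 360 / 100. The function names `pareto_rows` and `pie_angles` are hypothetical helpers introduced only for illustration.

```python
# Table 1 percentages, copied from the module.
table1 = {
    "Buy a luxury item, vacation, or gift": 20,
    "Give it to charity": 2,
    "Pay debt": 24,
    "Save": 31,
    "Spend on essentials": 16,
    "Other": 7,
}


def pareto_rows(percentages):
    """Sort categories in descending order and attach the running (cumulative) percentage."""
    rows = []
    running = 0
    for category, pct in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
        running += pct
        rows.append((category, pct, running))
    return rows


def pie_angles(percentages):
    """Convert each percentage into the angle, in degrees, of its pie slice."""
    return {category: pct * 360 / 100 for category, pct in percentages.items()}


if __name__ == "__main__":
    for category, pct, cum in pareto_rows(table1):
        print(f"{category:<40}{pct:>5}%{cum:>7}%")
    for category, angle in pie_angles(table1).items():
        print(f"{category:<40}{angle:>7.1f} deg")
```

Sorting strictly by percentage places "Other" ahead of "Give it to charity"; if you prefer to keep "Other" as the last bar, adjust the ordering before plotting.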
1.5 The following data represent the cost of electricity during July 2006 for a random sample of 50 one-bedroom apartments in a large city.
Table 2. Utility Charge ($)
96   171  202  178  147  102  153  197  127  82
157  185  90   116  172  111  148  213  130  165
141  149  206  175  123  128  144  168  109  167
95   163  150  154  130  143  187  166  139  149
108  119  183  151  114  135  191  137  129  158
1.6 Form a frequency distribution and a percentage distribution that have class intervals with upper class limits of $99, $119, and so on.
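To check the frequency and percentage distribution you build in Excel, the sketch below counts how many of the 50 charges fall in each class, treating each class as (previous upper limit, this upper limit]. That is the same rule Excel's FREQUENCY function applies to a bins array, although Excel also returns one extra count for values above the last bin, which is omitted here. It assumes the first class starts at $80 and that classes are $20 wide, so the upper limits are $99, $119, ..., $219; `frequency_distribution` and `percentage_distribution` are hypothetical helper names.

```python
# Table 2: electricity cost ($) in July 2006 for 50 one-bedroom apartments.
utility_charges = [
    96, 171, 202, 178, 147, 102, 153, 197, 127, 82,
    157, 185, 90, 116, 172, 111, 148, 213, 130, 165,
    141, 149, 206, 175, 123, 128, 144, 168, 109, 167,
    95, 163, 150, 154, 130, 143, 187, 166, 139, 149,
    108, 119, 183, 151, 114, 135, 191, 137, 129, 158,
]

# Assumed classes: $80-$99, $100-$119, ..., $200-$219 (width $20).
FIRST_LOWER = 80
upper_limits = [99, 119, 139, 159, 179, 199, 219]


def frequency_distribution(values, limits):
    """Count values v with previous_limit < v <= limit for each class."""
    counts = []
    previous = float("-inf")
    for limit in limits:
        counts.append(sum(1 for v in values if previous < v <= limit))
        previous = limit
    return counts


def percentage_distribution(counts):
    """Express each class frequency as a percentage of all observations."""
    n = sum(counts)
    return [100 * c / n for c in counts]


if __name__ == "__main__":
    counts = frequency_distribution(utility_charges, upper_limits)
    percents = percentage_distribution(counts)
    lower = FIRST_LOWER
    for limit, f, p in zip(upper_limits, counts, percents):
        print(f"${lower}-${limit}: frequency {f:2d}, percentage {p:5.1f}%")
        lower = limit + 1  # whole-dollar data, so the next class starts one dollar higher
```

Before comparing with your Excel output, confirm that the class frequencies add up to the sample size of 50.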
1.7 Construct a histogram and a percentage polygon.
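Excel will draw the histogram and polygon; the sketch below only shows the numbers that go into them. It prints a text histogram (one `*` per apartment) and lists the (midpoint, percentage) points of the percentage polygon, padding with an empty class at each end so the polygon starts and ends on the horizontal axis. It assumes classes $80-$99, $100-$119, ..., $200-$219 and takes each midpoint halfway between the class boundaries (treating $80-$99 as "$80 to under $100", so the midpoint is $90); the helper names are illustrative only.

```python
# Table 2: electricity cost ($) in July 2006 for 50 one-bedroom apartments.
utility_charges = [
    96, 171, 202, 178, 147, 102, 153, 197, 127, 82,
    157, 185, 90, 116, 172, 111, 148, 213, 130, 165,
    141, 149, 206, 175, 123, 128, 144, 168, 109, 167,
    95, 163, 150, 154, 130, 143, 187, 166, 139, 149,
    108, 119, 183, 151, 114, 135, 191, 137, 129, 158,
]

FIRST_LOWER = 80  # assumed lower limit of the first class ($)
WIDTH = 20        # assumed class width ($)
upper_limits = [99, 119, 139, 159, 179, 199, 219]


def class_counts(values, limits):
    """Frequency of values v with previous_limit < v <= limit in each class."""
    counts, previous = [], float("-inf")
    for limit in limits:
        counts.append(sum(1 for v in values if previous < v <= limit))
        previous = limit
    return counts


def text_histogram(counts, first_lower, width):
    """One row per class, one '*' per observation."""
    rows = []
    for i, c in enumerate(counts):
        lower = first_lower + i * width
        rows.append(f"{lower:>4}-{lower + width - 1:<4}| " + "*" * c)
    return rows


def percentage_polygon_points(counts, first_lower, width):
    """(midpoint, percentage) pairs, padded with a 0% class at each end."""
    n = sum(counts)
    midpoints = [first_lower + width / 2 + i * width for i in range(len(counts))]
    points = [(midpoints[0] - width, 0.0)]
    points += [(m, 100 * c / n) for m, c in zip(midpoints, counts)]
    points.append((midpoints[-1] + width, 0.0))
    return points


if __name__ == "__main__":
    counts = class_counts(utility_charges, upper_limits)
    print("\n".join(text_histogram(counts, FIRST_LOWER, WIDTH)))
    for midpoint, pct in percentage_polygon_points(counts, FIRST_LOWER, WIDTH):
        print(f"midpoint ${midpoint:.0f}: {pct:.1f}%")
```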
1.8 Form a cumulative percentage distribution and plot a cumulative percentage polygon.
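A cumulative percentage distribution gives, for each class, the percentage of observations at or below that class's upper limit. The sketch below computes it from the raw Table 2 data and lists the points of the cumulative percentage polygon: 0% at the lower boundary of the first class, then one point at the upper boundary of each class. It assumes classes $80-$99, ..., $200-$219 and, because the charges are whole dollars, treats "at most $99" as "less than $100"; `cumulative_percentages` and `cumulative_polygon_points` are hypothetical names.

```python
# Table 2: electricity cost ($) in July 2006 for 50 one-bedroom apartments.
utility_charges = [
    96, 171, 202, 178, 147, 102, 153, 197, 127, 82,
    157, 185, 90, 116, 172, 111, 148, 213, 130, 165,
    141, 149, 206, 175, 123, 128, 144, 168, 109, 167,
    95, 163, 150, 154, 130, 143, 187, 166, 139, 149,
    108, 119, 183, 151, 114, 135, 191, 137, 129, 158,
]

FIRST_LOWER = 80  # assumed lower boundary of the first class ($)
WIDTH = 20        # assumed class width ($)
upper_limits = [99, 119, 139, 159, 179, 199, 219]


def cumulative_percentages(values, limits):
    """Percentage of observations at or below each upper class limit."""
    n = len(values)
    return [100 * sum(1 for v in values if v <= limit) / n for limit in limits]


def cumulative_polygon_points(cum_pcts, first_lower, width):
    """(boundary, cumulative %) pairs: 0% at the first lower boundary, then one
    point at the upper boundary of each class (e.g. $100 for the $80-$99 class)."""
    points = [(first_lower, 0.0)]
    points += [(first_lower + (i + 1) * width, p) for i, p in enumerate(cum_pcts)]
    return points


if __name__ == "__main__":
    cum = cumulative_percentages(utility_charges, upper_limits)
    for boundary, pct in cumulative_polygon_points(cum, FIRST_LOWER, WIDTH):
        print(f"less than ${boundary}: {pct:.1f}%")
```

The last cumulative percentage should always be 100%; if it is not, some observations fall outside the assumed classes.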
1.9 Around what amount does monthly electricity cost seem to be concentrated?
1.10 Appendix
1.10.1 Installing Excel AddIns for PHStat2
The Prentice Hall PHStat Microsoft Excel add‐in enhances Microsoft Excel to better support the
statistical analyses taught in an introductory statistics course. Using PHStat lessens the technical training
needed to use Microsoft Excel to perform statistical analysis and allows you to generate results that
would otherwise be very tedious or impossible to produce from worksheets built from scratch. PHStat
requires that “Data Analysis” (the Analysis ToolPak) be installed in Excel, and it has the following system requirements:
Any Windows 95 (or later) system; Microsoft Excel 95 or Microsoft Excel 97 (or later).
32 MB of main memory (64 MB required when running sampling distribution simulations and data-intensive regression analyses); approximately 5 MB of free hard disk space during the setup process and 3 MB of hard disk space after installation.
Preferred display settings: PHStat will run with any display settings, but for best results set the Desktop area to 800 by 600 pixels with Small Fonts. (Use the Settings tab of the Display applet of the Control Panel to change settings.)
1.10.2 INSTALLING “DATA ANALYSIS” ON EXCEL 2007
1. Open Excel and click the Office Button.
2. In the Office Button pane, click Excel Options.
3. In the Excel options dialog box that appears, click Add-Ins in the left panel and look for Analysis ToolPak and Analysis ToolPak –VBA under Active Application Add-ins.
4. If they do not appear, click Go. In the Add-Ins dialog box that appears, verify that Analysis ToolPak and Analysis ToolPak-VBA are both checked in the Add-Ins available list.
5. Click OK and exit Excel to save these settings.
In short: click the “Microsoft Office” button in the upper left-hand corner of the Excel window and click “Excel Options” in the lower right-hand corner of the pull-down menu. On the left side of the “Excel Options” page, click “Add-ins” and then the “Go” button at the bottom of the page. This should open the “Add-ins” dialog box. Select “Analysis ToolPak” and “Analysis ToolPak-VBA” and click “OK.”
1.10.3 Installing and Operating the Prentice Hall PHStat on Your Home Computer
To use the Prentice Hall PHStat Microsoft Excel add-in, you first need to run the setup program (Setup.exe) located in the PHStat directory on this disk. The setup program installs the PHStat program files on your system and adds icons for PHStat to your Desktop and Start Menu. To do this, insert the PHStat disc in your CD drive and follow the directions.
To run PHStat together with Excel, double-click the PHStat icon. Excel 2007 users will likely have to click “Enable Macros” in the security prompt that should pop up automatically.
1.10.4 Configuring Excel 2007 security for PHStat2
You must change the Trust Center settings to allow PHStat2 to properly function. Click the Office Button, and then click Excel Options in the Office menu. In the Excel Options dialog box that appears, click Trust Center and then in the Trust Center panel, click Trust Center Settings. In the left pane of the
Trust Center dialog box that appears, first click Add-Ins and, if necessary, clear all of the check boxes that appear under the Add-ins banner. Next, click Macro Settings in the left pane and click either Disable all macros with notification (recommended) or Enable all macros (not recommended; use it only if the other choice does not allow PHStat2 to function properly).
Practicum: Math11002 Business Statistics MODULE 2
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: 12.00 – 14.00 WIB In ____________________
By signing below, I declare that I have done all of this module by myself.
Name/NIM : ______________________________/_______________
Signature : _______________________________________________
Rem.:
Module Description: NUMERICAL DESCRIPTIVE MEASURES
Objective: Measures of central tendency, variation, and shape; population summary measures; the five-number summary and box-and-whisker plots; covariance and the coefficient of correlation.
Output: Use separate sheets of paper to report your results (handwritten or computer printout). The report should present the working procedures and results in both softcopy and hardcopy.
2 NUMERICAL DESCRIPTIVE MEASURES
2.1 Central Tendency
Central tendency refers to the tendency of the individual measures in a distribution to cluster together toward some point of aggregation.
2.1.1 The Mean
The mean, or arithmetic mean, is the sum of all the data values divided by the number of data values included in the calculation.
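2.1.1.1 Formula: The Mean
The mean is the sum of the values divided by the number of values:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
where $\bar{X}$ = sample mean, $n$ = number of values or sample size, $X_i$ = ith value of the variable X, and $\sum_{i=1}^{n} X_i$ = summation of all the values in the sample.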
2.1.1.2 MS Excel Built-In Function for Calculating the Mean
The function is written as follows:
=AVERAGE(argument)
The argument for this function is the data contained in the selected range of cells.
Example Using Excel's AVERAGE Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells C1 to C6: 11,12,13,14,15,16.
2. Click on cell C7 ‐ the location where the results will be displayed.
3. Type "=AVERAGE(" in cell C7.
4. Drag to select cells C1 to C6 with the mouse pointer.
5. Type the closing bracket ")" after the cell range in cell C7.
6. Press the ENTER key on the keyboard.
7. The answer 13.5 should be displayed in cell C7.
8. The complete function =AVERAGE(C1:C6) appears in the formula bar above the worksheet.
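The formula in 2.1.1.1 can also be checked by hand. The short Python sketch below applies it to the same six values; the function name `mean` is a hypothetical helper, not part of Excel.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many values there are (n)."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [11, 12, 13, 14, 15, 16]   # cells C1:C6 in the example above
print(mean(data))                 # 13.5, the same result as =AVERAGE(C1:C6)
```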
2.1.2 The Median
The MEDIAN function shows you the middle value in a list of numbers. Middle, in this case, refers to arithmetic size (rank order) rather than the location of the numbers in the list. If the list contains an even number of values, the median is the average of the two middle values.
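2.1.2.1 Formula: The Median
The median is the middle value that separates the greater and lesser halves of a ranked (ordered) data set:
$\text{Median} = \left(\frac{n+1}{2}\right)\text{th ranked value}$
When n is even, (n + 1)/2 falls halfway between two ranks, and the median is the average of those two ranked values.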
2.1.2.2 MS Excel Built-In Function for Calculating the Median
The syntax for the MEDIAN function is:
=MEDIAN(number1, number2, ..., number255)
Note: Up to 255 numbers can be entered into the function.
Example Using Excel's MEDIAN Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells D1 to D5: 4,12,49,24,65.
2. Click on cell E1 ‐ the location where the results will be displayed.
3. Click on the Formulas tab.
4. Choose More Functions > Statistical from the ribbon to open the function drop down list.
5. Click on MEDIAN in the list to bring up the function's dialog box.
6. Drag to select cells D1 to D5 in the spreadsheet to enter the range into the dialog box, then click OK.
7. The answer 24 should appear in cell E1, since there are two numbers larger (49 and 65) and two numbers smaller (4 and 12) than it in the list.
8. The complete function =MEDIAN(D1:D5) appears in the formula bar above the worksheet when you click on cell E1.
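The (n + 1)/2 ranked-value rule from 2.1.2.1 can be written out as a small Python sketch. It is an illustration under the assumption of a non-empty numeric list; `median` is a hypothetical helper name.

```python
def median(values):
    """Median via the (n + 1)/2 ranked-value rule."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median requires at least one value")
    if n % 2 == 1:
        return ranked[n // 2]                          # rank (n + 1)/2, 0-based index n // 2
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2   # average of the two middle ranks


print(median([4, 12, 49, 24, 65]))   # 24, like =MEDIAN(D1:D5)
print(median([4, 12, 49, 24]))       # 18.0, the average of 12 and 24
```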
2.1.3 The Mode
The mode is the most frequently occurring value in a data set.
2.1.3.1 Example: The Mode
For example, the mode of the array 1, 3, 4, 4, 4, 7, 7, 12, 17 is 4.
2.1.3.2 MS Excel Built-In Function for Calculating the Mode
The MODE function, one of Excel's statistical functions, returns the most frequently occurring value in a list of numbers.
The syntax for the MODE function is:
=MODE(number1, number2, ..., number255)
Note: Up to 255 numbers can be entered into the function.
Example Using Excel's MODE Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells D1 to D6: 98, 135, 147, 135, 98, 135.
2. Click on cell E1, the location where the results will be displayed.
3. Click on the Formulas tab.
4. Choose More Functions > Statistical from the ribbon to open the function drop-down list.
5. Click on MODE in the list to bring up the function's dialog box.
6. Drag to select cells D1 to D6 in the spreadsheet to enter the range into the dialog box, then click OK.
7. The answer 135 should appear in cell E1, since this number appears most often (three times) in the list of data.
8. The complete function =MODE(D1:D6) appears in the formula bar above the worksheet when you click on cell E1.
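The mode can be found by counting how often each value occurs. In the Python sketch below, `mode` is a hypothetical helper; breaking ties by first appearance is an assumption meant to resemble Excel's MODE, and raising an error when no value repeats mirrors the #N/A that MODE returns in that case.

```python
from collections import Counter


def mode(values):
    """Most frequent value. Ties go to the value that appears first in the list;
    if no value repeats, raise an error (Excel's MODE returns #N/A in that case)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        raise ValueError("no value occurs more than once")
    for value in values:          # scan in original order to break ties
        if counts[value] == highest:
            return value


print(mode([98, 135, 147, 135, 98, 135]))    # 135
print(mode([1, 3, 4, 4, 4, 7, 7, 12, 17]))   # 4
```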
2.1.4 Quartiles
Quartiles split ranked data into four equal parts. They are often used in sales and survey data to divide populations into groups. For example, you can use QUARTILE to find the top 25 percent of incomes in a population.
2.1.4.1 Formulas of Quartiles
First quartile (designated Q1) = lower quartile = cuts off lowest 25% of data = 25th percentile
Second quartile (designated Q2) = median = cuts data set in half = 50th percentile
Third quartile (designated Q3) = upper quartile = cuts off highest 25% of data, or lowest 75% = 75th percentile
2.1.4.2 MS Excel Built-In Function for Calculating Quartiles
The syntax for the QUARTILE function is:
=QUARTILE(array,quart)
Array is the array or cell range of numeric values for which you want the quartile value.
Quart indicates which value to return.
If quart equals QUARTILE returns
0 Minimum value
1 First quartile (25th percentile)
2 Median value (50th percentile)
3 Third quartile (75th percentile)
4 Maximum value
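QUARTILE interpolates between neighbouring ranked values, so a hand calculation with a different positioning rule can give slightly different answers. The Python sketch below assumes the inclusive rule that QUARTILE is generally understood to use (0-based position (n - 1) x quart / 4 in the sorted data, with linear interpolation); `quartile` is a hypothetical helper, so verify its results against Excel.

```python
def quartile(values, quart):
    """Sketch of Excel's QUARTILE(array, quart): linear interpolation at
    0-based position (n - 1) * quart / 4 in the sorted data."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3 or 4")
    ranked = sorted(values)
    position = (len(ranked) - 1) * quart / 4
    lower = int(position)                 # index of the value at or just below the position
    fraction = position - lower
    if lower + 1 < len(ranked):
        return ranked[lower] + fraction * (ranked[lower + 1] - ranked[lower])
    return ranked[lower]                  # quart = 4 (maximum)


data = [4, 5, 8, 7, 11, 4, 3]
print([quartile(data, q) for q in range(5)])   # [3.0, 4.0, 5.0, 7.5, 11]
```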
2.1.5 The Geometric Mean
The geometric mean measures the rate of change of a variable over time. Excel's GEOMEAN function returns the geometric mean of an array or range of positive data. For example, you can use GEOMEAN to calculate an average growth rate given compound interest with variable rates.
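2.1.5.1 Formula: The Geometric Mean
The geometric mean is the nth root of the product of n values:
$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Or
the geometric mean rate of return, which measures the average percentage return of an investment over time:
$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$
where $R_i$ is the rate of return in time period i.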
2.1.5.2 MS Excel Built-In Function for Calculating the Geometric Mean
Syntax:
=GEOMEAN(number1, number2, ...)
Number1, number2, ... are 1 to 255 arguments for which you
want to calculate the mean. You can also use a single
array or a reference to an array instead of arguments
separated by commas.
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In B4, type the formula =GEOMEAN(A2:A8) and press ENTER.
3. The answer 5.47698697 should appear in cell B4.
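Both formulas in 2.1.5.1 can be evaluated with a few lines of Python. The sketch below averages logarithms instead of multiplying the values directly (mathematically the same nth root, but less prone to overflow); `geometric_mean` and `geometric_mean_rate` are hypothetical helper names.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values (the quantity GEOMEAN returns)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    # exp(mean of logs) equals (X1 * X2 * ... * Xn) ** (1/n)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_mean_rate(returns):
    """Geometric mean rate of return: [(1+R1)(1+R2)...(1+Rn)] ** (1/n) - 1."""
    return geometric_mean([1 + r for r in returns]) - 1


print(round(geometric_mean([4, 5, 8, 7, 11, 4, 3]), 8))   # 5.47698697
print(geometric_mean_rate([0.10, -0.10]))                 # slightly below 0
```

Python 3.8 and later also ship `statistics.geometric_mean`, which can serve as a second check.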
2.1.6 Other Useful Basic Excel Built-In Functions
2.1.6.1 SUM
Horizontal: 100, 200, 300 in cells C4:E4; total 600 with =SUM(C4:E4)
Vertical: 100, 200, 300 in cells C7:C9; total 600 with =SUM(C7:C9)
Single cells: 100, 200 and 300 in individual cells; total 600
What does it do? This function creates a total from a list of numbers. It can be used either horizontally or vertically. The numbers can be in single cells, in ranges, or come from other functions.
Syntax =SUM(Range1,Range2,Range3... through to Range30).
2.1.6.2 COUNT
Entries To Be Counted Count
10 20 30 3 =COUNT(C4:E4)
10 0 30 3 =COUNT(C5:E5)
10 -20 30 3 =COUNT(C6:E6)
10 1-Jan-88 30 3 =COUNT(C7:E7)
10 21:30 30 3 =COUNT(C8:E8)
10 0.758576 30 3 =COUNT(C9:E9)
10 (blank) 30 2 =COUNT(C10:E10)
10 Hello 30 2 =COUNT(C11:E11)
10 #DIV/0! 30 2 =COUNT(C12:E12)
What does it do? This function counts the number of numeric entries in a list. It ignores blanks, text and errors. Dates and times are stored as numbers in Excel, so they are counted.
Syntax =COUNT(Range1,Range2,Range3... through to Range30)
2.1.6.3 MAX
Values Maximum
120 800 100 120 250 800 =MAX(C4:G4)
Dates Maximum 1-Jan-98 25-Dec-98 31-Mar-98 27-Dec-98 4-Jul-98 27-Dec-98 =MAX(C7:G7)
What does it do? This function picks the highest value from a list of data.
Syntax =MAX(Range1,Range2,Range3... through to Range30)
2.1.6.4 MIN
Values Minimum 120 800 100 120 250 100 =MIN(C4:G4)
Dates Minimum 1-Jan-98 25-Dec-98 31-Mar-98 27-Dec-98 4-Jul-98 1-Jan-98 =MIN(C7:G7)
What does it do? This function picks the lowest value from a list of data.
Syntax =MIN(Range1,Range2,Range3... through to Range30)
2.2 Assignment 2.1
The following sample data of 38 banks are for direct-deposit customers who maintain a balance of Rp 100 million:
26 28 40 20 21 22 25 25 18 25 15 20
18 20 25 25 22 30 30 3 15 20 29 26
28 10 2 21 22 25 25 18 25 15 20 18
20 25 25 22 30 30 30 65 20 29 23 45
1. Using the formulas above, calculate the mean, median, mode, quartiles and geometric mean of the sample data.
2. Use MS Excel functions to calculate the mean, median, mode, quartiles and geometric mean of the sample data.
3. Compare the results and report your analysis.
2.3 Variation
Variability, or variation, refers to the overall separations and differences that exist among the individual measures in a distribution, while central tendency refers to their closeness and similarity. Variation measures the spread, or dispersion, of the values in a data set.
2.3.1 The Range
The range is equal to the largest value minus the smallest value.
2.3.1.1 Formula: The Range
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
2.3.1.2 MS Excel Built-In Functions for Calculating the Range
To calculate the range in MS Excel, we use two built-in functions, MAX() and MIN(). See Sections 2.1.6.3 and 2.1.6.4 above.
Based on the formula for the range above, the syntax of the formula to calculate the range is:
=MAX(range)-MIN(range)
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In B4, type the formula =MAX(A2:A8)-MIN(A2:A8) and press ENTER.
3. The answer 8 (that is, 11 - 3) should appear in cell B4.
2.3.2 The Interquartile Range
The interquartile range is equal to the difference between the third quartile and the first quartile in a set of data.
2.3.2.1 Formula: The Interquartile Range
$\text{Interquartile range} = Q_3 - Q_1$
2.3.2.2 MS Excel Built-In Function for Calculating the Interquartile Range
To calculate the interquartile range in MS Excel, we use a formula with the built-in function QUARTILE(). See Section 2.1.4.2 above.
Based on the formula for the interquartile range above, the syntax of the formula is:
=QUARTILE(range,3)-QUARTILE(range,1)
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In B4, type the formula =QUARTILE(A2:A8,3)-QUARTILE(A2:A8,1) and press ENTER.
3. The answer 3.5 (Q3 = 7.5 minus Q1 = 4) should appear in cell B4.
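The range and interquartile range of the example data can be reproduced in Python. This sketch assumes Python 3.8 or later, where `statistics.quantiles(..., method="inclusive")` interpolates in the same way as Excel's QUARTILE; the variable names are illustrative only.

```python
import statistics

data = [4, 5, 8, 7, 11, 4, 3]            # cells A2:A8 in the examples above

data_range = max(data) - min(data)       # =MAX(A2:A8)-MIN(A2:A8)

# Cut points for 4 groups: Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                            # =QUARTILE(A2:A8,3)-QUARTILE(A2:A8,1)

print(data_range, q1, q3, iqr)           # 8 4.0 7.5 3.5
```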
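2.3.3 The Variance and Standard Deviation
The variance measures how widely the values spread around the mean: for a sample, it is the sum of the squared differences between each value and the mean, divided by n - 1. The standard deviation is the square root of the variance and is expressed in the same units as the data.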
2.3.3.1 Formula: The Variance and Standard Deviation
Variance formula:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Or, equivalently:
$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$
Standard deviation formula:
$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
2.3.3.2 MS Excel Built-In Functions for Calculating the Variance and Standard Deviation
To calculate the sample variance in MS Excel, we use the built-in function VAR(); for the sample standard deviation, we use STDEV().
Syntax:
=VAR(number1,number2,...)
=STDEV(number1,number2,...)
Number1, number2, ... are 1 to 255 number arguments
corresponding to a sample of a population
Example for VARIANCE:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In B4, type the formula =VAR(A2:A8) and press ENTER.
3. The answer 8 should appear in cell B4.
Example for STANDARD DEVIATION:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In B4, type the formula =STDEV(A2:A8) and press ENTER.
3. The answer 2.828427125 should appear in cell B4.
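The n - 1 formulas in 2.3.3.1 reproduce both Excel answers. The Python sketch below is a hand calculation for checking purposes; `sample_variance` and `sample_std_dev` are hypothetical helper names.

```python
import math


def sample_variance(values):
    """S^2 = sum((X_i - mean)^2) / (n - 1), the sample definition that VAR uses."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """S = square root of the sample variance (what STDEV returns)."""
    return math.sqrt(sample_variance(values))


data = [4, 5, 8, 7, 11, 4, 3]
print(sample_variance(data))   # 8.0
print(sample_std_dev(data))    # approximately 2.828427125
```

Dividing by n instead of n - 1 would give the population variance, which is smaller for the same data.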
2.3.4 The Coefficient of Variation
The coefficient of variation is a relative measure of variation that is always expressed as a percentage.
2.3.4.1 Formula: The Coefficient of Variation
The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.
Formula:
$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
2.3.4.2 MS Excel Built-In Functions for Calculating the Coefficient of Variation
To calculate the coefficient of variation in MS Excel, we use a formula with the built-in function STDEV() for the standard deviation and AVERAGE() for the mean.
Syntax:
=(STDEV(number1,number2,...)/AVERAGE(number1,number2,...))*100
Number1, number2, ... are 1 to 255 number arguments corresponding to a sample of a population
Example for the COEFFICIENT OF VARIATION:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In B4, type the formula =(STDEV(A2:A8)/AVERAGE(A2:A8))*100 and press ENTER.
3. The answer 47.14 (rounded) should appear in cell B4, that is, a coefficient of variation of about 47.14%.
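As a cross-check on the Excel result, the coefficient of variation can be computed with Python's `statistics` module, whose `stdev` uses the same n - 1 divisor as STDEV. `coefficient_of_variation` is a hypothetical helper name.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (sample standard deviation / mean) x 100, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100


data = [4, 5, 8, 7, 11, 4, 3]
print(round(coefficient_of_variation(data), 2))   # 47.14
```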
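2.3.5 Z Scores
A Z score expresses how many standard deviations a value lies above (positive Z) or below (negative Z) the mean. Z scores are used to identify extreme values, or outliers, located far away from the mean; as a common rule of thumb, a value with a Z score below -3.0 or above +3.0 is considered an extreme value.
Formula:
$Z = \frac{X - \bar{X}}{S}$
where X is the data value, $\bar{X}$ is the sample mean and S is the sample standard deviation.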
2.3.5.1 MS Excel Built-In Functions for Z Scores
To calculate a Z score in MS Excel, we use a formula with the built-in function STDEV() for the standard deviation and AVERAGE() for the mean.
Syntax:
=(number - AVERAGE(range of numbers))/STDEV(range of numbers)
number is the single value to be converted to a Z score.
range of numbers is 1 to 255 number arguments corresponding to a sample of a population.
Example for Z SCORES:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In C2, type the formula =(A2-AVERAGE($A$2:$A$8))/STDEV($A$2:$A$8) and press ENTER. Then copy the formula to cells C3 to C8.
3. The answer -0.707106781 should appear in cell C2.
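The same Z scores can be produced in Python, together with the |Z| > 3 rule of thumb for flagging extreme values. This is a sketch for checking the spreadsheet; `z_scores` is a hypothetical helper, and the 3.0 cut-off is the convention stated in 2.3.5, not a fixed law.

```python
import statistics


def z_scores(values):
    """Z = (X - mean) / S for every value, with S the sample standard deviation (STDEV)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


data = [4, 5, 8, 7, 11, 4, 3]
scores = z_scores(data)
print(round(scores[0], 9))                               # -0.707106781, as in cell C2
outliers = [x for x, z in zip(data, scores) if abs(z) > 3]
print(outliers)                                          # [] (no |Z| above 3 here)
```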
2.4 Shape
The shape of a data set represents the pattern of all the values, from the lowest to the highest value. A distribution is either symmetrical or skewed. In a symmetrical distribution, the values below the mean are distributed exactly as the values above the mean. In a skewed distribution, there is an imbalance of low values or high values.
2.4.1 Formula: Shape
Shape influences the relationship of the mean to the median in the following ways:
Mean < Median: negative or left skewed
Mean = Median: symmetric or zero skewness
Mean > Median: positive or right skewed
2.4.1.1 MS Excel Function for Calculating Skewness
The SKEW function returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values.
Syntax:
=SKEW(number1, number2, ...)
Examples for SKEWNESS:
1. Enter data into cells A2 through A8: 10, 10, 20, 30, 40, 50, 50
2. In B2, type the formula =SKEW(A3:A8) and press ENTER.
3. The answer -0.38 (rounded) should appear in cell B2. This means that the distribution of the data in A3 to A8 is negatively (left) skewed.
4. In B4, type the formula =SKEW(A2:A8) and press ENTER.
5. The answer 0 should appear in cell B4. This means that the distribution of the data in A2 to A8 is symmetric.
6. In B6, type the formula =SKEW(A2:A7) and press ENTER.
7. The answer +0.38 (rounded) should appear in cell B6. This means that the distribution of the data in A2 to A7 is positively (right) skewed.
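SKEW is not the plain average of cubed Z scores; it includes a small-sample adjustment. The Python sketch below assumes the adjusted formula n / ((n - 1)(n - 2)) x sum(((X - mean) / S)^3), with S the sample standard deviation, which reproduces the three answers above; `excel_skew` is a hypothetical helper name.

```python
import statistics


def excel_skew(values):
    """Adjusted sample skewness, as SKEW is understood to compute it:
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


print(round(excel_skew([10, 20, 30, 40, 50, 50]), 2))   # -0.38 (cells A3:A8)
print(excel_skew([10, 10, 20, 30, 40, 50, 50]))         # approximately 0 (cells A2:A8)
print(round(excel_skew([10, 10, 20, 30, 40, 50]), 2))   # 0.38 (cells A2:A7)
```

The sign agrees with the mean-versus-median rule in 2.4.1: for A3:A8 the mean (33.3) is below the median (35), so the skewness is negative.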
2.5 Assignment 2.2
Using the data in 1.2 above, calculate or compose the range, interquartile range, variance and standard deviation, coefficient of variation, Z scores and shape. Report your results.
2.6 Descriptive Summary of a Population
This section uses the Descriptive Statistics procedure of the Analysis ToolPak add-in.
INSTALLING “DATA ANALYSIS” ON EXCEL
2.6.1 Excel Statistical Analysis Tools
Excel has several data analysis tools included through the Analysis ToolPak add-in. These tools can quickly produce complex engineering or statistical analyses of your data. Each tool is a little different, but all require you to specify which data you want Excel to analyze.
Data Analysis… is located under the Tools menu. If the option is not there, you will need to install the
Analysis ToolPak.
2.6.2 Install and use the Analysis ToolPak
1. On the Tools menu, click Add‐Ins….
2. Select the Analysis ToolPak check box.
3. On the Tools menu, click Data Analysis.
Note: If Analysis ToolPak is not listed in the Add‐
Ins dialog box, click Browse… and locate the
drive, folder name, and file name for the Analysis
ToolPak add‐in, Analys32.xll — usually located in
the Microsoft Office\Office\Library\Analysis
folder — or run the Setup program if it isn't
installed.
For EXCEL 2007:
1. Click the Data tab and then click the “Data Analysis” icon on the Data tab.
2. If the icon is not there, click the “Microsoft Office” button in the upper left-hand corner of the Excel window and click “Excel Options” in the lower right-hand corner of the pull-down menu. On the left side of the “Excel Options” page, click “Add-ins” and then the “Go” button at the bottom of the page. This should open the “Add-ins” dialog box.
3. Select “Analysis ToolPak” and “Analysis ToolPak-VBA” and click “OK.”
For EXCEL 2003 or earlier version:
1. Click on the “Tools” tab/pull-down menu and click on “Data Analysis.”
2. If “Data Analysis” does not appear on the “Tools” pull-down menu, then click on “Add-Ins” and click on the first two boxes (“Analysis ToolPak” and “Analysis ToolPak-VBA”). Click “OK” and open “Data Analysis.”
Using ToolPak Descriptive Statistics
Open the Analysis ToolPak (Data Analysis), select Descriptive Statistics from the Analysis Tools list, and click OK. In the Descriptive Statistics dialog box (shown below), enter the cell range of the data as the Input Range. Click the Columns option and check Labels in first row. See Designing Effective Worksheets in Section 1.6 of Levine, et al. 2008. Statistics for Managers Using Microsoft Excel, Fifth Edition. Pearson Education, Inc. Upper Saddle River, New Jersey, 07458.
Finish by clicking New Worksheet Ply, Summary statistics, Kth Largest, and Kth Smallest, and then OK.
2.7 Box-Whisker Plot
In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker diagram or plot) is a
convenient way of graphically depicting groups of numerical data through their five‐number summaries:
the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and
largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be
considered outliers.
A boxplot, or box and whisker diagram, provides a simple graphical summary of a set of data. It shows a
measure of central location (the median), two measures of dispersion (the range and inter‐quartile
range), the skewness (from the orientation of the median relative to the quartiles) and potential outliers
(marked individually). Boxplots are especially useful when comparing two or more sets of data.
Regrettably, there is currently no boxplot facility in Microsoft Excel. For simplicity, many recent statistics
textbooks (for example, Daly et al, 1995) omit the fences used to identify possible outliers. These
simplified boxplots, displaying most of the important features, can be drawn quite easily in Excel. In the
absence of any fences (see Devore and Peck (1990) for a definition), a simple rule is that a whisker which
is longer than three times the length of the box probably indicates an outlier.
Create a Box-Whisker Plot
To create a box plot using MS Excel 2007:
1. Highlight the whole table, including figures and series labels, then select Insert from the main
menu. Under Charts select a Line chart and choose the Line with Markers option.
2. Under Chart Tools select Design > Switch Row/Column. Right‐click on a data point from the first
data series, and choose Format Data Series > Line Colour > No line to remove the connecting
lines. Repeat for the other four data series in turn.
3. Select any of the data series and under Chart Tools select Layout > Analysis > Lines > High‐Low
Lines, then Layout > Analysis > Lines > Up/Down Bars > Up/Down Bars.
4. Further customising can be carried out according to your own preferences by right‐clicking on
the relevant object and selecting the Format option on the shortcut menu.
The Result:
[Figure: box-and-whisker plots of set 1, set 2 and set 3; vertical axis from 0 to 90; series Q1, Min, Median, Max and Q3]
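The chart needs five numbers per data set, and the text above suggests flagging a whisker longer than three times the box. The Python sketch below computes both for any list of values; it assumes Python 3.8 or later (for `statistics.quantiles` with the inclusive method, which interpolates like Excel's QUARTILE), and `five_number_summary` and `long_whiskers` are hypothetical helper names.

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles computed like Excel's QUARTILE."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def long_whiskers(values):
    """Flag whiskers longer than three times the box (the simple outlier rule in the text)."""
    low, q1, _, q3, high = five_number_summary(values)
    box = q3 - q1
    return {"lower": (q1 - low) > 3 * box, "upper": (high - q3) > 3 * box}


print(five_number_summary([4, 5, 8, 7, 11, 4, 3]))   # (3, 4.0, 5.0, 7.5, 11)
print(long_whiskers([1, 2, 3, 4, 5, 100]))           # upper whisker flagged
```

The five values for each set can then be typed into the worksheet table used in step 1 of the procedure.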
2.8 Assignment 2.3
Replicate the box-whisker plot procedure of Section 2.7.
2.9 Weighted Mean
Excel does not contain a built-in function to calculate a weighted average. It is, however, easy to calculate one using the SUMPRODUCT() function in a simple formula.
Worksheet layout (row numbers on the left; columns A, B, C):
Row  A                   B        C
1    Weighted average
3                        Cost     Staff
4    Grade A             13000    5
5    Grade B             15000    2
6    Grade C             20000    3
8    Average             16000
9    Wtd Avg             15500
SUMPRODUCT() multiplies the corresponding entries of two arrays (or ranges) and returns the sum of those products. In the illustration it would calculate (B4 x C4) + (B5 x C5) + (B6 x C6).
The formula in cell B9 is: = SUMPRODUCT(B4:B6, C4:C6) / SUM(C4:C6)
The result shows that the weighted average is less than the plain arithmetic mean. This is because it has
taken into account the larger number of staff being paid the lower salary.
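The SUMPRODUCT formula translates directly into a weighted-mean function. In the Python sketch below, `weighted_mean` is a hypothetical helper and the lists mirror cells B4:B6 and C4:C6.

```python
def weighted_mean(values, weights):
    """Equivalent of =SUMPRODUCT(values, weights) / SUM(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


costs = [13000, 15000, 20000]   # B4:B6
staff = [5, 2, 3]               # C4:C6
print(weighted_mean(costs, staff))   # 15500.0, the weighted average in B9
print(sum(costs) / len(costs))       # 16000.0, the plain average in B8
```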
Worksheet layout (row numbers on the left; columns F, G, H):
Row  F                            G              H
13   Forecast incorporating risk
15                                Probability    Sales
16   Good weather                 30%            10000
17   Mediocre weather             50%            8000
18   Poor weather                 19%            2000
19   Hurricane                    1%             0
21   Forecast                     100%           7380
The weighted average can also be used for assessing risk or determining the probability of various outcomes. If a judgement is made about the likelihood of various weather conditions for an outdoor sporting event and their effect on ticket sales, a predicted value of sales can be calculated using a formula similar to the previous example. =SUMPRODUCT(G16:G19, H16:H19) returns the value of 7,380. The
probability values (G16:G19) are already expressed as percentages (total= 100% or 1.0) and so there is
no need to divide by SUM(G16:G19).
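Because the probabilities already total 100%, the forecast is just the probability-weighted sum of the sales figures. The Python sketch below makes that check explicit; `expected_value` is a hypothetical helper and the lists mirror G16:G19 and H16:H19.

```python
def expected_value(probabilities, outcomes):
    """Probability-weighted forecast, like =SUMPRODUCT(G16:G19, H16:H19)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must total 100%")
    return sum(p * x for p, x in zip(probabilities, outcomes))


probabilities = [0.30, 0.50, 0.19, 0.01]   # G16:G19
sales = [10000, 8000, 2000, 0]             # H16:H19
print(round(expected_value(probabilities, sales)))   # 7380
```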
2.10 Assignment 2.4
Capital Component      Cost     % of Capital Structure
Retained Earnings      8%       30%
Common Stocks          9%       10%
Preferred Stocks       10%      15%
Debt (Bonds)           6.67%    45%
Using the table above, calculate the weighted average cost of capital (WACC) of this company.
2.11 Correlation coefficients
2.11.1.1 Correlation Coefficients Formula
If (X1,Y1 ),(X2,Y2 ),(X3,Y3 )...,(Xn,Yn ) are the observed values then the correlation coefficient (usually denoted as Corr(X,Y) or ρXY ) of the observed sample is defined as:
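$\rho_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$
Another way of visualizing the formula is:
$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{S_X S_Y}$
where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.
Now we generalize the idea of the sample correlation coefficient to the case where the sample is not bivariate but multivariate.
Let $\tilde{X}_1, \tilde{X}_2, \tilde{X}_3, \ldots, \tilde{X}_n$ be a random sample where each $\tilde{X}_i$ is a k-dimensional vector of the form $\tilde{X}_i = (X_{i1}, X_{i2}, X_{i3}, \ldots, X_{ik})$, as in the previous topic.
Just as in the case of the sample covariance, in the multivariate case we speak of a sample correlation coefficient matrix. Like the dispersion matrix, the sample correlation coefficient matrix is a square matrix of order k x k, defined as below:
$R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & \rho_{kk} \end{pmatrix}$
All the diagonal entries are 1, since both mathematically and heuristically the correlation coefficient of any variable with itself should be 1:
$\rho_{jj} = 1$ for all j
Similar to the dispersion matrix, the off-diagonal elements are the correlation coefficients of the jth and lth variables:
$\rho_{jl} = \frac{\sum_{i=1}^{n}(X_{ij} - \bar{X}_j)(X_{il} - \bar{X}_l)}{\sqrt{\sum_{i=1}^{n}(X_{ij} - \bar{X}_j)^2}\,\sqrt{\sum_{i=1}^{n}(X_{il} - \bar{X}_l)^2}}$
Or in another way:
$\rho_{jl} = \frac{\operatorname{Cov}(X_j, X_l)}{S_j S_l}$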
2.11.1.2 MS Excel Tool for Calculating Correlation
Step 1: To make this calculation, select Tools > Data Analysis > Correlation (in Excel 2007, Data tab > Data Analysis > Correlation). The following dialog box is displayed:
Step 2: In the Input Range text box, enter the range of the data (including the first row containing the variable names), or click the data selection icon and mark the range to use.
Step 3: Make sure that the “Labels in First Row” check box is checked.
Step 4: Click OK, and the following information will appear in a new worksheet:
Row  A        B          C
1             TIME1      TIME2
2    TIME1    1
3    TIME2    0.763957   1
The Pearson’s correlation for these two variables is 0.764 (rounded.)
Example 2
2.11.1.3 A second way to calculate the correlation is with a function.
Step 1: In the Example worksheet, enter some labels in column I to indicate that you are calculating a correlation.
Step 2: In the J3 (or wherever you want it) cell, you will enter an Excel function that will calculate the desired correlation.
Step 3: Enter the formula
=CORREL(C2:C51,D2:D51)
Note that it is of the form, =CORREL(array1,array2)
where the first and second arrays contain the paired numbers to correlate. It is IMPORTANT that the numbers be paired correctly.
The answer will appear in the cell. In this case, the Pearson’s correlation is 0.764 (rounded.)
2.11.1.4 Calculating the Correlation Using the Formula
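The defining formula in 2.11.1.1 can be evaluated step by step: cross-products of deviations on top, square roots of the two sums of squared deviations below. The Python sketch below does exactly that; `pearson_r` is a hypothetical helper, and the x and y lists are made-up illustration data, not values from this module.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from the defining formula (the quantity CORREL returns)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two observations")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))   # 0.7746
```

Entering the same two columns in Excel and using =CORREL on them should give the same value.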
2.12 Covariance
For a bivariate sample we have dealt with the covariance already. Let us just recall it:
Given a random sample (X1,Y1 ),(X2,Y2 ),(X3,Y3 )...,(Xn,Yn ) the sample covariance Cov(X,Y) is defined as
$\operatorname{Cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
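2.12.1.1 MS Excel Calculation for Covariance:
2.12.1.2 MS Excel Function for Covariance:
To calculate the covariance using an MS Excel function, we can use =COVAR(array1, array2).
COVAR computes $\frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n}$, where $\bar{x}$ and $\bar{y}$ are the sample means AVERAGE(array1) and AVERAGE(array2), and n is the sample size. Note that COVAR divides by n (the population covariance), whereas the sample covariance defined above divides by n - 1; multiply the COVAR result by n/(n - 1) to obtain the sample covariance. Excel 2010 and later also provide COVARIANCE.S, which divides by n - 1.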
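Because the two techniques use different divisors, they will not agree exactly. The Python sketch below computes both from the same cross-products so the n versus n - 1 difference is visible; `covariance` is a hypothetical helper and the x and y lists are made-up illustration data.

```python
def covariance(x, y, sample=True):
    """Sum of cross-products of deviations, divided by n - 1 (sample, as in 2.12)
    or by n (population, the divisor COVAR uses)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least two observations")
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / (n - 1 if sample else n)


x = [1, 2, 3, 4, 5]   # hypothetical data, for illustration only
y = [2, 4, 5, 4, 5]
print(covariance(x, y))                # 1.5 (divide by n - 1)
print(covariance(x, y, sample=False))  # 1.2 (divide by n, like COVAR)
```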
2.13 Assignment 2.5
2.13.1 Calories and Fat relationship
Product Calories Fat
Dunkin' Donuts Iced Mocha Swirl latte (whole milk) 240 8.0
Starbucks Coffee Frappuccino blended coffee 260 3.5
Dunkin' Donuts Coffee Coolatta (cream) 350 22.0
Starbucks Iced Coffee Mocha Expresso (whole milk and whipped cream) 350 20.0
Starbucks Mocha Frappuccino blended coffee (whipped cream) 420 16.0
Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream) 510 22.0
Starbucks Chocolate Frappuccino Blended Crème (whipped cream) 530 19.0
Using the data above:
a. Calculate the covariance using both techniques above and compare the results. Explain!
b. Compute the coefficient of correlation using the techniques explained above.
c. Which do you think is more valuable in expressing the relationship between calories and fat, the covariance or the coefficient of correlation? Explain.
d. What are your conclusions about the relationship between calories and fat? Explain.
2.13.2 Fuel Efficiency Calculation and Standard
Car    Owner Calculation    Government Standard
2005 Ford F-150 14.3 16.8
2005 Chevrolet Silverado 15.0 17.8
2002 Honda Accord LX 27.8 26.2
2002 Honda Civic 27.9 34.2
2004 Honda Civic Hybrid 48.8 47.6
2002 Ford Explorer 16.8 18.3
2005 Toyota Camry 23.7 28.5
2003 Toyota Corolla 32.8 33.1
2005 Toyota Prius 37.3 56.0
a. Compute the covariance using both techniques explained above and compare the results. Explain!
b. Compute the coefficient of correlation using the techniques explained above.
c. What are your conclusions about the relationship between the owner calculation and the government standard? Explain.
Practicum: MATH11002 Business Statistics
MODULE 3
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: WIB In ____________________
By signing below, I declare that I have done all of this module by myself.
Name/NIM : ______________________________/_______________
Signature : _______________________________________________
Rem.:
Module Description: Probability
Objective: Students understand and are able to define and examine basic probability concepts; define conditional, joint and marginal probability; use Bayes' theorem to revise probabilities; apply statistical independence; address the probability of a discrete random variable; define covariance and discuss its application in finance; compute probabilities from the binomial, Poisson and hypergeometric distributions; and use these distributions to solve business problems with MS Excel, Regression Analysis or other statistical software.
Output: A report produced by the students should be in the form of working procedures and results in both softcopy and hardcopy.
3 PROBABILITY
3.1 Basic Probability
Probability: the chance that an uncertain event will occur (always between 0 and 1)
Event: Each possible type of occurrence or outcome
Simple Event: an event that can be described by a single characteristic
Sample Space: the collection of all possible events
There are three approaches to assessing the probability of an uncertain event:
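1. A priori Classical Probability: the probability of an event is based on prior knowledge of the
process involved:
$$P(\text{occurrence}) = \frac{X}{T} = \frac{\text{number of ways the event can occur}}{\text{total number of elementary outcomes}}$$
Example: Find the probability of selecting a face card (Jack, Queen, or King) from a standard
deck of 52 cards. Answer: there are 4 Jacks, 4 Queens and 4 Kings, so P(face card) = 12/52 = 3/13 ≈ 0.231.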
2. Empirical Classical Probability: the probability of an event is based on observed data:
$$\text{Probability of occurrence} = \frac{\text{number of favorable outcomes observed}}{\text{total number of outcomes observed}}$$
Example: Find the probability of selecting a male taking statistics from the population described
in the following table:
           Taking Stats   Not Taking Stats   Total
Male       84             145                229
Female     76             134                210
Total      160            279                439
P(Male and Taking Stats) = 84/439 = 0.191
3. Subjective Probability: the probability of an event is determined by an individual, based on that
person's past experience, personal opinion, and/or analysis of a particular situation.
3.2 Sample spaces and events, contingency tables, simple probability and joint probability
3.2.1 Sample Space The Sample Space is the collection of all possible events
Ex. All 6 faces of a die
Ex. All 52 cards in a deck of cards
Ex. All possible outcomes when having a child: Boy or Girl
3.2.2 Event in Sample Space Simple event
An outcome from a sample space with one characteristic
ex. A red card from a deck of cards
Complement of an event A (denoted A′)
All outcomes that are not part of event A
ex. All cards that are not diamonds
Joint event
Involves two or more characteristics simultaneously
ex. An ace that is also red from a deck of cards
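The counting behind the a priori examples can be sketched in Python by listing the 52-card deck as the sample space and dividing favorable outcomes by total outcomes. This is a minimal illustration, not part of the original module; the deck encoding and helper names are assumptions, and the empirical example is simply the observed ratio 84/439.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOR = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

# Sample space: every (rank, suit) pair, 52 equally likely outcomes.
DECK = list(product(RANKS, SUIT_COLOR))


def a_priori(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))


def is_face(card):
    return card[0] in {"J", "Q", "K"}


def is_red(card):
    return SUIT_COLOR[card[1]] == "red"


def is_diamond(card):
    return card[1] == "diamonds"


p_face = a_priori(is_face)                                 # simple event
p_red = a_priori(is_red)                                   # simple event
p_not_diamond = a_priori(lambda c: not is_diamond(c))      # complement of "diamond"
p_red_ace = a_priori(lambda c: c[0] == "A" and is_red(c))  # joint event

# Empirical probability: observed Male-and-Taking-Stats count / total observed.
p_male_stats = Fraction(84, 439)

print(len(DECK), p_face, p_red, p_not_diamond, p_red_ace, round(float(p_male_stats), 3))
```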
In mathematics, the probability of an event A is represented by a real number in the range from 0 to 1 and
written as P(A), p(A) or Pr(A). An impossible event has a probability of 0, and a certain event has a
probability of 1. The opposite or complement of an event A is the event [not A] (that is, the event of A
not occurring); its probability is given by P(not A) = 1 − P(A). As an example, the chance of not rolling a six
on a six-sided die is 1 − (chance of rolling a six) = 1 − 1/6 = 5/6.
3.2.3 Simple and Joint Probability Simple (Marginal) Probability refers to the probability of a simple event.
ex. P(King)
Joint Probability refers to the probability of an occurrence of two or more events.
ex. P(King and Spade)
If both events A and B occur on a single performance of an experiment, this is called the intersection
or joint probability of A and B, denoted as P(A ∩ B). If two events A and B are independent, then the
joint probability is
$$P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B)$$
For example, if two coins are flipped, the chance of both being heads is 1/2 × 1/2 = 1/4.
If event A or event B or both occur on a single performance of an experiment, this is called
the union of the events A and B, denoted as P(A ∪ B). If two events are mutually exclusive, then the
probability of either occurring is
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$$
For example, the chance of rolling a 1 or 2 on a six-sided die is
$$P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$$
If the events are not mutually exclusive, then
$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
For example, when drawing a single card at random from a regular deck of cards, the chance of getting a
heart or a face card (J, Q, K) (or one that is both) is 13/52 + 12/52 − 3/52 = 22/52 = 11/26, because of the 52 cards of a deck, 13
are hearts, 12 are face cards, and 3 are both: the "3 that are both" are
included in each of the "13 hearts" and the "12 face cards" but should only be counted once.
Conditional probability is the probability of some event A, given the occurrence of some other event B.
Conditional probability is written P(A|B), and is read "the probability of A, given B". It is defined by
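$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
If P(B) = 0 then P(A|B) is undefined.
Summary of probabilities
Event        Probability
A            P(A) ∈ [0, 1]
not A        P(A′) = 1 − P(A)
A or B       P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             P(A ∪ B) = P(A) + P(B)   if A and B are mutually exclusive
A and B      P(A ∩ B) = P(A|B) P(B)
             P(A ∩ B) = P(A) P(B)   if A and B are independent
A given B    P(A|B) = P(A ∩ B) / P(B)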
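The addition rule, conditional probability and independence from the summary can be checked by direct counting on a deck. This is a minimal Python sketch with an assumed (rank, suit) encoding; it is an illustration, not the module's own method.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely outcomes


def p(event):
    """Probability of an event by counting favorable cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def heart(card):
    return card[1] == "hearts"


def face(card):
    return card[0] in {"J", "Q", "K"}


def king(card):
    return card[0] == "K"


def spade(card):
    return card[1] == "spades"


def conditional(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return p(lambda c: a(c) and b(c)) / p(b)


# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_heart_or_face = p(heart) + p(face) - p(lambda c: heart(c) and face(c))
assert p_heart_or_face == p(lambda c: heart(c) or face(c))  # same result by direct counting

p_king_given_spade = conditional(king, spade)

# Independence check: P(King and Spade) equals P(King) * P(Spade)
independent = p(lambda c: king(c) and spade(c)) == p(king) * p(spade)

print(p_heart_or_face, p_king_given_spade, independent)
```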
3.3 Bayes' Theorem
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_k)\,P(B_k)}$$
where:
Bi = ith event of k mutually exclusive and collectively exhaustive events
A = new event that might impact P(Bi)
Bayes’ Theorem Example
A drilling company has estimated a 40% chance of striking oil for their new well. A detailed test has
been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and
20% of unsuccessful wells have had detailed tests. Given that this well has been scheduled for a
detailed test, what is the probability that the well will be successful?
Solution:
Let S = successful well and U = unsuccessful well
P(S) = .4 , P(U) = .6 (prior probabilities)
Define the detailed test event as D
Conditional probabilities: P(D|S) = .6 and P(D|U) = .2
$$P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D \mid S)\,P(S) + P(D \mid U)\,P(U)} = \frac{(.6)(.4)}{(.6)(.4) + (.2)(.6)} = \frac{.24}{.24 + .12} = .667$$
Given the detailed test, the revised probability of a successful well has risen to .667 from the
original estimate of 0.4.
Event              Prior Prob.   Conditional Prob.   Joint Prob.     Revised Prob.
S (successful)     .4            .6                  .4*.6 = .24     .24/.36 = .667
U (unsuccessful)   .6            .2                  .6*.2 = .12     .12/.36 = .333
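The table above follows a mechanical recipe: multiply each prior by its conditional probability, add the joint probabilities, then divide. A minimal Python sketch of that recipe (the function name `bayes` is illustrative):

```python
def bayes(priors, likelihoods):
    """Revise prior probabilities with Bayes' theorem.

    priors:      P(B_i) for mutually exclusive, collectively exhaustive events B_i
    likelihoods: P(A | B_i) for the new event A
    Returns (joint, total, revised), where revised[i] = P(B_i | A).
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}  # P(A | B_i) P(B_i)
    total = sum(joint.values())                              # P(A), the denominator
    revised = {k: joint[k] / total for k in joint}
    return joint, total, revised


# Drilling example: S = successful well, U = unsuccessful well, D = detailed test.
joint, p_d, revised = bayes(priors={"S": 0.4, "U": 0.6}, likelihoods={"S": 0.6, "U": 0.2})
for event in ("S", "U"):
    print(event, round(joint[event], 2), round(revised[event], 3))
print("P(D) =", round(p_d, 2))
```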
3.4 Assignment 3.1 Create the entries shown in the screenshot below, or use the Probability.xls workbook file from the CD companion of the Statistics
for Managers Using Microsoft Excel textbook.
Enter data only in the blue cells.
Probabilities
Sample Space              Column Variable
                          B       B'      Totals
Row Variable   A          200     50      250
               A'         100     650     750
               Totals     300     700     1000
Simple Probabilities
P(A) 0.25
P(A') 0.75
P(B) 0.30
P(B') 0.70
Joint Probabilities
P(A and B) 0.20
P(A and B') 0.05
P(A' and B) 0.10
P(A' and B') 0.65
Addition Rule
P(A or B) 0.35
P(A or B') 0.90
P(A' or B) 0.95
P(A' or B') 0.80
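The worksheet values above can be reproduced from the four cell counts. This is a minimal Python sketch (not the Probability.xls workbook itself) using exact fractions; the dictionary layout and helper names are assumptions.

```python
from fractions import Fraction

# Cell counts from the sample space above: (row, column) -> count.
COUNTS = {("A", "B"): 200, ("A", "B'"): 50, ("A'", "B"): 100, ("A'", "B'"): 650}
TOTAL = sum(COUNTS.values())


def p_row(r):
    """Simple probability of a row event, e.g. P(A)."""
    return Fraction(sum(v for (row, _), v in COUNTS.items() if row == r), TOTAL)


def p_col(c):
    """Simple probability of a column event, e.g. P(B)."""
    return Fraction(sum(v for (_, col), v in COUNTS.items() if col == c), TOTAL)


def p_joint(r, c):
    """Joint probability, e.g. P(A and B)."""
    return Fraction(COUNTS[(r, c)], TOTAL)


def p_or(r, c):
    """Addition rule: P(r or c) = P(r) + P(c) - P(r and c)."""
    return p_row(r) + p_col(c) - p_joint(r, c)


for r in ("A", "A'"):
    for c in ("B", "B'"):
        print(f"P({r} and {c}) = {float(p_joint(r, c)):.2f}   P({r} or {c}) = {float(p_or(r, c)):.2f}")
```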
1. A music store was visited, at random times, by 7 customers who bought some goods and 9 others
who were just window shopping. Achmad (a customer) arrived at 11:30 am.
a. Give an example of a simple event.
b. What is the complement of "a customer bought some goods"?
2. Given the following contingency table:
B B’
A 12 48
A’ 30 54
Use calculator and MS Excel to find the probability of
a. Event A’
b. Event A and B
c. Event A’ and B
d. Event A’ and B’
3. Compare calculation results (calculator and Ms Excel)
4. A box of nine gloves contains two left-handed gloves and seven right-handed gloves.
a. If two gloves are randomly selected from the box without replacement, what is the
probability that both gloves selected will be right-handed?
b. If two gloves are randomly selected from the box without replacement, what is the
probability that there will be one right-handed and one left-handed glove?
c. If three gloves are selected from the box with replacement, what is the probability that
all three gloves will be left-handed?
d. If you were sampling with replacement, what would be the answers to (a) and (b)?
5. An advertising executive is studying the television viewing habits of married men and women during
prime-time hours. Based on past viewing records, the executive has determined that during
prime time, husbands are watching television 60% of the time. When the husband is watching
television, 40% of the time the wife is also watching. When the husband is not watching
television, 30% of the time the wife is watching television. Find the probability that
a. the husband is also watching television, given that the wife is watching television
b. the wife is watching television in prime time.
3.5 Basic Probability Rules A random variable represents a possible numerical value from an uncertain event.
Discrete random variables produce outcomes that come from a counting process (e.g., the number
of classes you are taking).
Continuous random variables produce outcomes that come from a measurement (e.g., your
annual salary or your weight).
3.5.1 Discrete Random Variable A probability distribution for a discrete random variable is a mutually exclusive listing of all
possible numerical outcomes for that variable and a particular probability of occurrence
associated with each outcome
Number of Classes Taken Probability
2 0.2
3 0.4
4 0.24
5 0.16
Example: Experiment of tossing 2 coins. Let X = number of heads.
X Value Probability
0 1/4 = .25
1 2/4 = .50
2 1/4 = .25
3.5.2 Discrete Random Variables Expected Value Expected Value (or mean) of a discrete distribution (Weighted Average)
$$\mu = E(X) = \sum_{i=1}^{N} X_i\,P(X_i)$$
Example: Toss 2 coins, X = # of heads,
Compute expected value of X:
E(X) = (0)(.25) + (1)(.50) + (2)(.25) = 1.0
3.5.3 Discrete Random Variables Dispersion Variance of a discrete random variable
$$\sigma^2 = \sum_{i=1}^{N} [X_i - E(X)]^2\,P(X_i)$$
Standard Deviation of a discrete random variable
$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2\,P(X_i)}$$
where:
E(X) = Expected value of the discrete random variable X
Xi = the ith outcome of X
P(Xi) = Probability of the ith occurrence of X
Example: Toss 2 coins, X = # heads, compute standard deviation (recall that E(X) = 1)
$$\sigma = \sqrt{(0-1)^2(.25) + (1-1)^2(.50) + (2-1)^2(.25)} = \sqrt{.50} = .707$$
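The expected value and standard deviation formulas translate directly into weighted sums over the distribution. A minimal Python sketch, applied to the two-coin example and to the number-of-classes distribution above (the dictionary representation is an assumption):

```python
from math import sqrt


def expected_value(dist):
    """E(X) = sum of X_i * P(X_i)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """sigma^2 = sum of [X_i - E(X)]^2 * P(X_i)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


two_coins = {0: 0.25, 1: 0.50, 2: 0.25}       # X = number of heads
classes = {2: 0.2, 3: 0.4, 4: 0.24, 5: 0.16}  # X = number of classes taken

for name, dist in (("two coins", two_coins), ("classes", classes)):
    print(name, round(expected_value(dist), 2), round(sqrt(variance(dist)), 3))
```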
3.5.4 Covariance The covariance measures the strength of the linear relationship between two numerical random
variables X and Y. A positive covariance indicates a positive relationship. A negative covariance
indicates a negative relationship.
Covariance formula:
$$\sigma_{XY} = \sum_{i=1}^{N} [X_i - E(X)]\,[Y_i - E(Y)]\,P(X_i Y_i)$$
where: X = discrete variable X
Xi = the ith outcome of X
Y = discrete variable Y
Yi = the ith outcome of Y
P(XiYi) = probability of occurrence of the condition affecting the ith outcome of X and the ith outcome of Y
Example:
Consider the return per $1000 for two types of investments:
                                       Investment
P(XiYi)   Economic Condition     Passive Fund X   Aggressive Fund Y
0.2       Recession              −$25             −$200
0.5       Stable Economy         +$50             +$60
0.3       Expanding Economy      +$100            +$350
Investment Returns ‐ The Mean
E(X) = μX = (‐25)(.2) +(50)(.5) + (100)(.3) = 50
E(Y) = μY = (‐200)(.2) +(60)(.5) + (350)(.3) = 95
Interpretation: Fund X is averaging a $50.00 return and fund Y is averaging a $95.00
return per $1000 invested.
Investment Returns ‐ Standard Deviation
$$\sigma_X = \sqrt{(-25-50)^2(.2) + (50-50)^2(.5) + (100-50)^2(.3)} = 43.30$$
$$\sigma_Y = \sqrt{(-200-95)^2(.2) + (60-95)^2(.5) + (350-95)^2(.3)} = 193.71$$
Interpretation: Even though fund Y has a higher average return, it is subject to much
more variability and the probability of loss is higher.
Investment Returns – Covariance
$$\sigma_{XY} = (-25-50)(-200-95)(.2) + (50-50)(60-95)(.5) + (100-50)(350-95)(.3) = 8250$$
Interpretation: Since the covariance is large and positive, there is a positive relationship
between the two investment funds, meaning that they will likely rise and fall together.
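The mean, standard deviation and covariance of the two funds can be computed from the same three scenarios. A minimal Python sketch; the list-of-tuples layout is an assumption:

```python
from math import sqrt

# Joint distribution of returns per $1000: (P(X_i Y_i), fund X return, fund Y return)
scenarios = [
    (0.2, -25, -200),  # recession
    (0.5, 50, 60),     # stable economy
    (0.3, 100, 350),   # expanding economy
]

mu_x = sum(p * x for p, x, _ in scenarios)
mu_y = sum(p * y for p, _, y in scenarios)
sigma_x = sqrt(sum(p * (x - mu_x) ** 2 for p, x, _ in scenarios))
sigma_y = sqrt(sum(p * (y - mu_y) ** 2 for p, _, y in scenarios))
sigma_xy = sum(p * (x - mu_x) * (y - mu_y) for p, x, y in scenarios)

print(f"E(X) = {mu_x:.2f}, E(Y) = {mu_y:.2f}")
print(f"sigma_X = {sigma_x:.2f}, sigma_Y = {sigma_y:.2f}, sigma_XY = {sigma_xy:.2f}")
```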
3.5.5 The Sum of Two Random Variables: Measures
Expected Value:
$$E(X + Y) = E(X) + E(Y)$$
Variance:
$$\mathrm{Var}(X + Y) = \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$$
Standard deviation:
$$\sigma_{X+Y} = \sqrt{\sigma_{X+Y}^2}$$
Example: Portfolio Expected Return and Expected Risk
Investment portfolios usually contain several different funds (random variables)
The expected return and standard deviation of two funds together can now be calculated.
Investment Objective: Maximize return (mean) while minimizing risk (standard deviation).
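Recall: Investment X: E(X) = 50 σX = 43.30
Investment Y: E(Y) = 95 σY = 193.71
σXY = 8250
Suppose 40% of the portfolio is in Investment X and 60% is in Investment Y, so the share in X is w = .4:
$$E(P) = wE(X) + (1-w)E(Y) = (.4)(50) + (.6)(95) = 77$$
$$\sigma_P = \sqrt{w^2\sigma_X^2 + (1-w)^2\sigma_Y^2 + 2w(1-w)\sigma_{XY}} = \sqrt{(.4)^2(43.30)^2 + (.6)^2(193.71)^2 + 2(.4)(.6)(8250)} = 133.30$$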
The portfolio return is between the values for investments X and Y considered individually.
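A minimal Python sketch of the portfolio formulas, with w as the share in fund X. The loop over other weights is an illustration added here, not part of the original example.

```python
from math import sqrt


def portfolio(w, mean_x, mean_y, sd_x, sd_y, cov_xy):
    """Expected return and risk when a share w is in X and (1 - w) is in Y."""
    expected = w * mean_x + (1 - w) * mean_y
    variance = w ** 2 * sd_x ** 2 + (1 - w) ** 2 * sd_y ** 2 + 2 * w * (1 - w) * cov_xy
    return expected, sqrt(variance)


e_p, sd_p = portfolio(0.4, 50, 95, 43.30, 193.71, 8250)
print(f"E(P) = {e_p:.2f}, sigma_P = {sd_p:.2f}")

# Other weights show the return/risk trade-off between the two funds.
for w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    e, s = portfolio(w, 50, 95, 43.30, 193.71, 8250)
    print(f"w = {w:.1f}: E(P) = {e:6.2f}, sigma_P = {s:6.2f}")
```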
3.6 Binomial Distribution
3.6.1 Properties
A fixed number of observations, n
ex. 15 tosses of a coin; ten light bulbs taken from a warehouse
Two mutually exclusive and collectively exhaustive categories
ex. head or tail in each toss of a coin; defective or not defective light bulb; having a boy or girl
Generally called "success" and "failure"
Probability of success is p, probability of failure is 1 − p
Constant probability for each observation
ex. Probability of getting a tail is the same each time we toss the coin
Observations are independent
The outcome of one observation does not affect the outcome of the other
Two sampling methods
Infinite population without replacement
Finite population with replacement
The number of combinations of selecting X objects out of n objects is:
$${}_nC_X = \binom{n}{X} = \frac{n!}{X!\,(n-X)!}$$
where:
n! =n(n ‐ 1)(n ‐ 2) . . . (2)(1)
X! = X(X ‐ 1)(X ‐ 2) . . . (2)(1)
0! = 1 (by definition)
3.6.2 The Binomial Distribution Formula
$$P(X) = \frac{n!}{X!\,(n-X)!}\,p^X(1-p)^{n-X}$$
P(X) = probability of X successes in n trials, with probability of success p on each trial
X = number of "successes" in sample (X = 0, 1, 2, ..., n)
n = sample size (number of trials or observations)
p = probability of "success"
Example: What is the probability of one success in five observations if the probability of success
is .1? X = 1, n = 5, and p = .1
$$P(X = 1) = \frac{5!}{1!\,(5-1)!}(.1)^1(1-.1)^{5-1} = (5)(.1)(.9)^4 = .32805$$
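The binomial formula can be evaluated with Python's `math.comb` (Python 3.8+) for the combinations term. A minimal sketch; in Excel the same value comes from BINOMDIST(1, 5, 0.1, FALSE).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


print(round(binomial_pmf(1, 5, 0.1), 5))

# The full distribution for n = 5, p = .1; the probabilities sum to 1.
dist = [binomial_pmf(x, 5, 0.1) for x in range(6)]
print([round(v, 5) for v in dist])
```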
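3.6.3 The Shape and Characteristics The shape of the binomial distribution depends on the values of p and n
Mean:
$$\mu = E(X) = np$$
Variance and Standard Deviation:
$$\sigma^2 = np(1-p) \quad\text{and}\quad \sigma = \sqrt{np(1-p)}$$
For n = 5 and p = .1:
$$\mu = np = (5)(.1) = 0.5 \qquad \sigma = \sqrt{(5)(.1)(1-.1)} = 0.6708$$
For n = 5 and p = .5:
$$\mu = np = (5)(.5) = 2.5 \qquad \sigma = \sqrt{(5)(.5)(1-.5)} = 1.118$$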
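To see that the shortcut formulas np and √(np(1 − p)) agree with the general definitions of mean and standard deviation, the sketch below sums over the whole distribution. It is a numerical check, not a replacement for the formulas.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def moments_by_summation(n, p):
    """Mean and SD computed directly from the distribution."""
    mu = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    var = sum((x - mu) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mu, sqrt(var)


for p in (0.1, 0.5):
    mu, sd = moments_by_summation(5, p)
    print(f"p = {p}: summed mu = {mu:.4f} vs np = {5 * p:.4f}; "
          f"summed sigma = {sd:.4f} vs formula = {sqrt(5 * p * (1 - p)):.4f}")
```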
3.7 Poisson Distribution An area of opportunity is a continuous unit or interval of time, volume, or area in which
more than one occurrence of an event can occur.
ex. The number of scratches in a car’s paint
ex. The number of mosquito bites on a person
ex. The number of computer crashes in a day
3.7.1 Properties Count the number of times an event occurs in a given area of opportunity
The probability that an event occurs in one area of opportunity is the same for all areas of
opportunity
The number of events that occur in one area of opportunity is independent of the number
of events that occur in the other areas of opportunity
The probability that two or more events occur in an area of opportunity approaches zero as
the area of opportunity becomes smaller
The average number of events per unit is λ (lambda)
3.7.2 Formula
$$P(X) = \frac{e^{-\lambda}\lambda^X}{X!}$$
where:
P(X) = the probability of X events in an area of opportunity
λ = expected number of events
e = mathematical constant approximated by 2.71828…
Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a
given minute, 7 cars will enter? So, X = 7 and λ = 5
$$P(7) = \frac{e^{-\lambda}\lambda^X}{X!} = \frac{e^{-5}\,5^7}{7!} = 0.104$$
So, there is a 10.4% chance that 7 cars will enter the parking lot in a given minute.
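A minimal Python sketch of the Poisson formula; in Excel the same value comes from POISSON(7, 5, FALSE). The same function also gives the probabilities for λ = 0.5 used to illustrate the shape of the distribution.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


print(round(poisson_pmf(7, 5), 4))  # parking-lot example, X = 7, lambda = 5

# P(X) for lambda = 0.5, X = 0..7
table = {x: round(poisson_pmf(x, 0.5), 4) for x in range(8)}
print(table)
```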
3.7.3 Shape
Poisson probabilities for λ = 0.5:
X      0        1        2        3        4        5        6        7
P(X)   0.6065   0.3033   0.0758   0.0126   0.0016   0.0002   0.0000   0.0000
For example, P(X = 2) = .0758.
Figure: bar chart of P(X) against X for the Poisson distribution with λ = 0.5
3.8 Hypergeometric distribution
The binomial distribution is applicable when selecting from a finite population with replacement
or from an infinite population without replacement.
The hypergeometric distribution is applicable when selecting from a finite population without
replacement.
"n" trials in a sample taken from a finite population of size N
Sample taken without replacement
Outcomes of trials are dependent
Concerned with finding the probability of "X" successes in the sample where there are "A"
successes in the population
3.8.1 Formula
$$P(X) = \frac{\binom{A}{X}\binom{N-A}{n-X}}{\binom{N}{n}}$$
Where:
N = population size
A = number of successes in the population
N − A = number of failures in the population
n = sample size
X = number of successes in the sample
n − X = number of failures in the sample
The mean of the hypergeometric distribution is:
$$\mu = E(X) = \frac{nA}{N}$$
The standard deviation is:
$$\sigma = \sqrt{\frac{nA(N-A)}{N^2}}\cdot\sqrt{\frac{N-n}{N-1}}$$
Where $\sqrt{\frac{N-n}{N-1}}$ is called the "Finite Population Correction Factor" from sampling without
replacement from a finite population
3.8.2 Example Three different computers are checked out of the 10 in the department. 4 of the 10 computers have illegal
software loaded. What is the probability that 2 of the 3 selected computers have illegal
software loaded?
So, N = 10, n = 3, A = 4, X = 2
$$P(X = 2) = \frac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.30$$
The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
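A minimal Python sketch of the hypergeometric formula and its mean and standard deviation (with the finite population correction factor), using `math.comb`. In Excel, HYPGEOMDIST(2, 3, 4, 10) returns the same probability.

```python
from math import comb, sqrt


def hypergeom_pmf(x, N, n, A):
    """P(X = x) = C(A, x) * C(N - A, n - x) / C(N, n)."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)


def hypergeom_mean_sd(N, n, A):
    """mu = nA/N; sigma = sqrt(nA(N - A)/N^2) * sqrt((N - n)/(N - 1))."""
    fpc = sqrt((N - n) / (N - 1))  # finite population correction factor
    return n * A / N, sqrt(n * A * (N - A) / N ** 2) * fpc


N, n, A = 10, 3, 4  # computers in the department, sample size, with illegal software
print(round(hypergeom_pmf(2, N, n, A), 4))
mu, sigma = hypergeom_mean_sd(N, n, A)
print(round(mu, 4), round(sigma, 4))
```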
3.9 Read the Excel Companion to Chapter 5 of Levine, et al. 2008. Statistics for Managers Using Microsoft™ Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, pages 211‐215
3.10 Assignment 3.2
1. Problems for Section 5.1 Number 5.2 and 5.4
2. Problems for Section 5.2 Number 5.14
3. Problems for Section 5.3 Number 5.24 and 5.28
4. Problems for Section 5.4 Number 5.34 and 5.42
5. Problems for Section 5.5 Number 5.46 and 5.50
3.11 Assignment 3.3 1. Problem Section 6.2 – No. 6.2, No. 6.7
2. Problem Section 6.3 – No. 6.14, 6.15 and No. 6.16
3. Problem Section 6.4 – No. 6.24, 6.25 and No. 6.26
4. Problem Section 6.5 – No. 6.35 and No. 6.36
Practicum: Math11002 Business Statistics MODULE 4
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: 12.00 – 14.00 WIB In ____________________
I hereby declare that I have done all the work in this module by myself. Name/NIM : ______________________________/_______________ Signature : _______________________________________________ Rem.:
Module Description: NORMAL AND SAMPLING DISTRIBUTION
Objective: Define continuous distributions (normal, uniform and exponential); compute probabilities using formulas and tables; understand the concept of the sampling distribution and the importance of the Central Limit Theorem; examine when to apply different distributions.
Output: Use separate sheets of paper to report your results (handwritten or computer printout). A report produced by the students should present the working procedures and results in both softcopy and hardcopy.
4 NORMAL AND SAMPLING DISTRIBUTION
4.1 Normal Distribution and Evaluating Normality Normal distribution or Gaussian distribution is a continuous probability distribution that describes data
that cluster around the mean. The normal distribution has several theoretical properties:
Bell Shaped in its appearance
Measures of central tendency (mean, median and mode) are equal
Interquartile range is equal to 1.33 standard deviations.
Infinite range
The normal distribution can be used to describe, at least approximately, any variable that tends to
cluster around the mean. For example, the heights of adult males in Indonesia are roughly normally
distributed, with a mean of about 160 cm. Most men have a height close to the mean, though a small
number of outliers have a height significantly above or below the mean. A histogram of male heights will
appear similar to a bell curve, with the correspondence becoming closer if more data are used.
Figure 4‐1 Normal Distribution
Source: http://upload.wikimedia.org/wikipedia/commons/b/bb/Normal_distribution_and_scales.gif
By the central limit theorem, the sum of a large number of independent random variables is distributed
approximately normally. For this reason, the normal distribution is used throughout statistics, natural
science, and social science as a simple model for complex phenomena. For example, the observational
error in an experiment is usually assumed to follow a normal distribution, and the propagation of
uncertainty is computed using this assumption.
4.1.1 Normal Probability Density Function The normal probability density function f(X) is:
$$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}$$
where X is a normal random variable, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.
4.1.1.1 Transformation Formula The Z value is equal to the difference between X and the mean, μ, divided by the standard deviation, σ:
$$Z = \frac{X - \mu}{\sigma}$$
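A minimal Python sketch of the density and the transformation formula, cross-checked against `statistics.NormalDist` (Python 3.8+). The values μ = 100, σ = 15 and X = 130 are hypothetical, chosen only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """f(X) = 1 / (sqrt(2*pi) * sigma) * exp(-0.5 * ((X - mu) / sigma) ** 2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2 * pi) * sigma)


def z_value(x, mu, sigma):
    """Transformation formula: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical values for illustration only.
mu, sigma, x = 100, 15, 130
z = z_value(x, mu, sigma)
print(z, normal_pdf(x, mu, sigma))

# After standardizing, P(X < 130) equals P(Z < 2).
print(NormalDist(mu, sigma).cdf(x), NormalDist().cdf(z))
```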
4.1.1.2 Probability and the Normal Curve
The normal distribution is a continuous probability distribution. This has several implications for
probability.
The total area under the normal curve is equal to 1.
The probability that a normal random variable X equals any particular value is 0.
The probability that X is greater than a equals the area under the normal curve bounded by a
and plus infinity (as indicated by the non‐shaded area in the figure below).
The probability that X is less than a equals the area under the normal curve bounded by a and
minus infinity (as indicated by the shaded area in the figure below).
The Standardized Normal Probability Density Function is given by the equation:
$$f(Z) = \frac{1}{\sqrt{2\pi}}\,e^{-Z^2/2}$$
Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the
following "rule".
About 68% of the area under the curve falls within 1 standard deviation of the mean.
About 95% of the area under the curve falls within 2 standard deviations of the mean.
About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
Collectively, these points are known as the empirical rule or the 68‐95‐99.7 rule. Clearly, given a normal
distribution, most outcomes will be within 3 standard deviations of the mean.
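The empirical rule percentages are areas under the standardized normal curve, so they can be computed with `statistics.NormalDist`. Because the area depends only on the number of standard deviations k, the same values hold for any μ and σ. A minimal sketch:

```python
from statistics import NormalDist

Z = NormalDist()  # standardized normal: mean 0, standard deviation 1


def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```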
To see how the transformation formula is applied, see pages 222‐232 of Chapter 6, The Normal Distribution, in
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition.
4.1.2 Evaluating Normality
4.1.2.1 Compare Data Characteristics to Theoretical Properties of normal distribution The normal distribution is:
Symmetrical: mean and median are equal
Bell-shaped: the empirical rule applies
Interquartile range = 1.33 standard deviations
How to compare:
1. Construct charts and observe their appearance. For small or moderate data sets,
construct a stem-and-leaf display or a box-and-whisker plot. For large data sets, construct the
frequency distribution and plot the histogram or polygon.
2. Compute descriptive numerical measures and compare the characteristics of the data
with the theoretical properties of the normal distribution. Compare the mean and median.
The interquartile range should be approximately 1.33 times the standard deviation. The
range should be approximately 6 times the standard deviation.
3. Evaluate how the values in the data are distributed. Determine whether approximately 2/3 of the values lie
within the mean ± 1 standard deviation. Determine whether approximately 4/5 of the values lie within
the mean ± 1.28 standard deviations. Determine whether approximately 19 out of every 20
values lie within the mean ± 2 standard deviations.
Example:
3 Year Return
Mean 17.8
Standard Error 0.17099
Median 17.2
Mode 15.1
Standard Deviation 4.94991
Sample Variance 24.5016
Kurtosis 1.03812
Skewness 0.66073
Range 35.6
Minimum 6.7
Maximum 42.3
Sum 14916.4
Count 838
Largest(1) 42.3
Smallest(1) 6.7
Confidence Level(95.0%) 0.33562
1. The mean (17.8) is slightly higher than the median (17.2). {Normal Dist.: mean = median}
2. The box-and-whisker plot is right-skewed, with a maximum outlier of 42.3. {Normal Dist.: symmetrical}
3. The interquartile range of 7.0 is approx. 1.41 standard deviations (SD). {Normal Dist.: 1.33}
4. The range of 35.6 is equal to 7.19 SD. {Normal Dist.: 6 SD}
5. 74.2% of the returns are within ±1 SD of the mean. {Normal Dist.: 68.26%}
6. 83.3% of the returns are within ±1.28 SD of the mean. {Normal Dist.: 80%}
Thus, based on the facts above, the three-year returns are right-skewed and
not normally distributed.
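The comparison steps above can be bundled into one function. This is a minimal Python sketch using the `statistics` module (Python 3.8+); the sample list is hypothetical, and quartile conventions differ between software, so the IQR may not match Excel exactly.

```python
from statistics import mean, median, quantiles, stdev


def normality_checks(data):
    """Compare a sample with the theoretical properties of the normal distribution."""
    xbar, s = mean(data), stdev(data)
    q1, _, q3 = quantiles(data, n=4)  # Python's default 'exclusive' quartile method

    def share_within(k):
        return sum(1 for x in data if abs(x - xbar) <= k * s) / len(data)

    return {
        "mean - median": xbar - median(data),        # about 0 if symmetrical
        "IQR / SD": (q3 - q1) / s,                   # module's reference: 1.33
        "range / SD": (max(data) - min(data)) / s,   # reference: about 6
        "within 1 SD": share_within(1),              # reference: about 2/3
        "within 1.28 SD": share_within(1.28),        # reference: about 4/5
        "within 2 SD": share_within(2),              # reference: about 19/20
    }


# Hypothetical sample for illustration only; replace with your own data.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for name, value in normality_checks(sample).items():
    print(f"{name:>15}: {value:.3f}")
```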
Figure: Box-and-Whisker Plot of Three-Year Returns (3 Year Return axis from 0 to 40)
4.1.2.2 Construct a normal probability plot A normal probability plot is a graphical approach for evaluating whether data are normally
distributed; this approach is also called a quantile-quantile plot. A normal probability plot for data from
a normal distribution will be approximately linear. To compute normal probabilities and create
plots, we can use PHStat as described in the Excel Companion to Chapter 6 of Levine, et al. 2008.
Statistics for Managers Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper
Saddle River, New Jersey, pages 247‐249.
4.2 Sampling and Sampling Distribution Read the Excel Companion to Chapter 7 of Levine, et al. 2008. Statistics for Managers Using Microsoft™ Excel.
Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 281‐282
4.2.1 Sample Selecting a sample is less time‐consuming than selecting every item in the population (census).
Selecting a sample is less costly than selecting every item in the population.
An analysis of a sample is less cumbersome and more practical than an analysis of the entire
population.
Figure 4‐2 The relationship between populations, samples, parameters, and statistics.
4.2.2 Types of Samples In a nonprobability sample, items included are chosen without regard to their probability of
occurrence.
o Convenience sampling, items are selected based only on the fact that they are easy,
inexpensive, or convenient to sample.
o Judgment sample, you get the opinions of pre‐selected experts in the subject
matter.
In a probability sample, items in the sample are chosen on the basis of known probabilities.
o Simple Random Sampling, every individual or item from the frame has an equal
chance of being selected. Selection may be with replacement (selected individual is
returned to frame for possible reselection) or without replacement (selected
individual isn’t returned to the frame). Samples are obtained from a table of random
numbers or from a computer random number generator.
o Systematic Sampling, Decide on sample size: n; Divide frame of N individuals into
groups of k individuals: k=N/n; Randomly select one individual from the 1st group;
Select every kth individual thereafter.
For example, suppose you were sampling n = 9 individuals from a population
of N = 72. So, the population would be divided into 9 groups of k = 72/9 = 8 individuals.
Randomly select a member from group 1, say individual 3. Then, select
every 8th individual thereafter (i.e. 3, 11, 19, 27, 35, 43, 51, 59, 67)
o Stratified Sampling, divide population into two or more subgroups (called strata)
according to some common characteristic. A simple random sample is selected from
each subgroup, with sample sizes proportional to strata sizes. Samples from
subgroups are combined into one. This is a common technique when sampling
population of voters, stratifying across racial or socio‐economic lines.
o Cluster Sampling, Population is divided into several “clusters,” each representative
of the population. A simple random sample of clusters is selected. All items in the
selected clusters can be used, or items can be chosen from a cluster using another
probability sampling technique. A common application of cluster sampling involves
election exit polls, where certain election districts are selected and sampled.
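Three of the probability sampling methods above can be sketched with Python's standard `random` module. The function names, the seed and the two strata are hypothetical; cluster sampling is not shown.

```python
import random


def simple_random_sample(frame, n, with_replacement=False, rng=random):
    """Every item in the frame has an equal chance of being selected."""
    if with_replacement:
        return rng.choices(frame, k=n)  # selected items are returned to the frame
    return rng.sample(frame, n)         # selected items are not returned


def systematic_sample(N, n, start=None, rng=random):
    """Individuals numbered 1..N; k = N / n; random start in group 1, then every kth."""
    k = N // n
    if start is None:
        start = rng.randint(1, k)
    return [start + j * k for j in range(n)]


def stratified_sample(strata, n, rng=random):
    """Simple random sample from each stratum, sized in proportion to the stratum.

    Proportional sizes are rounded, so the total can differ slightly from n.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        size = round(n * len(members) / total)
        sample.extend(rng.sample(members, size))
    return sample


rng = random.Random(2010)  # fixed seed so the illustration is reproducible
print(systematic_sample(72, 9, start=3))  # the example in the text
print(simple_random_sample(list(range(1, 73)), 9, rng=rng))
strata = {"stratum 1": list(range(1, 61)), "stratum 2": list(range(61, 101))}  # hypothetical
print(sorted(stratified_sample(strata, 10, rng=rng)))
```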
Comparing Sampling Methods
o Simple random sample and Systematic sample
Simple to use
May not be a good representation of the population’s underlying
characteristics
o Stratified sample
Ensures representation of individuals across the entire population
o Cluster sample
More cost effective
Less efficient (need larger sample to acquire the same level of precision)
4.2.3 Sampling Distributions A sampling distribution is a distribution of all of the possible values of a statistic for a given
size sample selected from a population.
For example, suppose you sample 50 students from your college regarding their mean GPA.
If you obtained many different samples of 50, you would compute a different mean for each
sample. We are interested in the distribution of all potential mean GPAs we might calculate
for any given sample of 50 students.
Example:
o Suppose your population (simplified) was four people at your institution.
o Population size N = 4
o Random variable, X, is age of individuals
o Values of X: 18, 20, 22, 24 (years)
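The four-person population is small enough to list every possible sample. The example stops after listing the values, so this minimal Python sketch assumes samples of size n = 2 drawn with replacement; it builds the sampling distribution of the mean and compares it with μ and σ/√n.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

ages = [18, 20, 22, 24]  # population values of X, N = 4
n = 2                    # assumed sample size, sampling with replacement

samples = list(product(ages, repeat=n))  # all 4^2 = 16 possible samples
sample_means = [mean(s) for s in samples]

mu, sigma = mean(ages), pstdev(ages)     # population parameters
print("number of samples:", len(samples))
print("mean of the sample means:", mean(sample_means), "population mean:", mu)
print("SD of the sample means:", round(pstdev(sample_means), 4),
      "sigma / sqrt(n):", round(sigma / sqrt(n), 4))
```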
4.2.4 SAMPLING FROM FINITE POPULATIONS
4.2.4.1 USING THE FINITE POPULATION CORRECTION FACTOR WITH THE MEAN In the cereal‐filling example in Section 7.3 on page 265, you selected a sample of 25 cereal
boxes from a filling process with μ = 368 grams. Suppose that 2,000 boxes (i.e., the population)
are filled on this particular day. Using the fpc factor, what is the probability that the sample
mean is below 365 grams?
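SOLUTION Using the fpc factor, σ = 15, n = 25, and N = 2,000, so that
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{15}{\sqrt{25}}\sqrt{\frac{2000-25}{2000-1}} = 3(0.994) = 2.98$$
The probability that the sample mean is below 365 is computed as follows:
$$P(\bar X < 365) = P\left(Z < \frac{365-368}{2.98}\right) = P(Z < -1.01)$$
From Table E.2, the area below 365 grams is 0.1562.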
The fpc factor has a very small effect on the standard error of the mean and the subsequent
area under the normal curve because the sample size is only 1.25% of the population size (that
is, n/N = 25/2,000 = 0.0125).
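A minimal Python sketch of the same calculation using `statistics.NormalDist`. Because it uses the unrounded Z (about −1.006) instead of the table value Z = −1.01, its probability comes out slightly above 0.1562.

```python
from math import sqrt
from statistics import NormalDist


def fpc(N, n):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


def standard_error_mean(sigma, n, N=None):
    """sigma / sqrt(n), multiplied by the fpc when a finite population size N is given."""
    se = sigma / sqrt(n)
    return se * fpc(N, n) if N is not None else se


mu, sigma, n, N = 368, 15, 25, 2000  # cereal-filling example
se = standard_error_mean(sigma, n, N)
z = (365 - mu) / se
p_below = NormalDist().cdf(z)
print(f"fpc = {fpc(N, n):.4f}, standard error = {se:.4f}, Z = {z:.4f}, P = {p_below:.4f}")
```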
4.3 Assignment for Simple Random Sample Problem for Section 7.1 Number 7.2, 7.4, and 7.8;
Problem for Section 7.2 Number 7.10, 7.14
4.4 Assignment for Sampling Distribution Problem for Section 7.4 Number 7.18, 7.20, and 7.24
4.5 Assignment for The Sampling Distribution of the mean Problem for Section 7.5 Number 7.28, and 7.32
4.6 Assignment for Sampling from Finite Population 1. Given that N = 80 and n = 10 and the sample is selected without replacement,
determine the fpc factor. 2. Historically, 93% of the deliveries of an overnight mail service arrive before 10:30 the
following morning. If a random sample of 500 deliveries is selected without replacement from a population that consisted of 10,000 deliveries, what is the probability that the sample will have : a. between 93% and 95% of the deliveries arriving before 10:30 the following morning? b. more than 95% of the deliveries arriving before 10:30 the following morning?
Practicum: Math11002 Business Statistics MODULE 5
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: 12.00 – 14.00 WIB In ____________________
I hereby declare that I have done all the work in this module by myself. Name/NIM : ______________________________/_______________ Signature : _______________________________________________ Rem.:
Module Description: CONFIDENCE INTERVAL ESTIMATION
Objective: To construct and interpret confidence interval estimates for the mean and the proportion; to determine the sample size necessary to develop a confidence interval for the mean or proportion; and to use confidence interval estimates in auditing.
Output: Use separate sheets of paper to report your results (handwritten or computer printout). A report produced by the students should present the working procedures and results in both softcopy and hardcopy.
Pre‐Lab Read:
Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, pages 322‐326.
5 CONFIDENCE INTERVAL ESTIMATION
5.1 Confidence intervals
5.1.1 A point estimate and a confidence interval estimate
5.1.1.1 Point Estimates
A point estimate is a single number. For the population mean (and population standard deviation), the point estimate is the sample mean (and sample standard deviation). A confidence interval provides additional information about variability.
5.1.1.2 Confidence Interval Estimates
The general form of a confidence interval estimate is:
Point Estimate ± (Critical Value)(Standard Error)
[Figure: the point estimate lies at the center of the interval; the distance from the lower to the upper limit is the width of the confidence interval.]
A confidence interval gives a range estimate of values that:
o takes into consideration variation in sample statistics from sample to sample
o is based on all the observations from 1 sample
o gives information about closeness to unknown population parameters
Confidence Level: the confidence that the interval will contain the unknown population parameter. It is stated as a percentage (less than 100%), e.g., 95% confidence or 99% confidence.
5.1.1.3 Confidence Level
Suppose the confidence level = 95%, also written (1 ‐ α) = .95. A relative frequency interpretation: "In the long run, 95% of all the confidence intervals that can be constructed will contain the unknown true parameter." A specific interval either will contain or will not contain the true parameter.
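The relative frequency interpretation can be checked by simulation. This Python sketch assumes a hypothetical normal population with μ = 50 and σ = 10, draws repeated samples of n = 25, builds X̄ ± Zσ/√n each time, and reports the fraction of intervals that contain μ. The function name `coverage` and all inputs are illustrative:

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=25, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated intervals X-bar +/- Z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # two-tail critical value
    half = z * sigma / math.sqrt(n)                # interval half width
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half <= mu <= xbar + half:
            hits += 1
    return hits / reps

print(coverage())
```

Any single run gives a fraction near, but rarely exactly, 0.95; each individual interval either contains μ or it does not.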
5.1.2 Confidence Interval for μ (σ Known)
Assumptions:
o Population standard deviation σ is known
o Population is normally distributed
o If population is not normal, use large sample
Confidence interval estimate:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
where Z is the standardized normal distribution critical value for a probability of α/2 in each tail.
5.1.2.1 Finding the Critical Value, Z
Consider a 95% confidence interval: 1 ‐ α = .95, so α/2 = .025 lies in each tail and the critical values are Z = ±1.96.
Commonly used confidence levels are 90%, 95%, and 99%.
Confidence Level   Confidence Coefficient (1 ‐ α)   Z value
80%                .80                              1.282
90%                .90                              1.645
95%                .95                              1.960
98%                .98                              2.326
99%                .99                              2.576
99.8%              .998                             3.090
99.9%              .999                             3.291
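Each Z value in the table is the point with area (1 ‐ α)/2 in the upper tail. Python's standard library can reproduce them with `statistics.NormalDist().inv_cdf` (Python 3.8+); `z_critical` is a hypothetical helper name:

```python
from statistics import NormalDist

def z_critical(conf: float) -> float:
    """Two-tail critical value: the Z with area (1 - conf)/2 in the upper tail."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  Z = {z_critical(level):.3f}")
```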
5.1.2.2 Intervals and Level of Confidence
Example:
A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms. Determine a 95% and a 99% confidence interval for the true mean resistance of the population.
Solution:
95% CI
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068 = (1.9932,\ 2.4068)$$
We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms.
Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
99% CI
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 2.58\,(.35/\sqrt{11}) = 2.20 \pm .2723 = (1.9277,\ 2.4723)$$
We are 99% confident that the true mean resistance is between 1.9277 and 2.4723 ohms.
Although the true mean may or may not be in this interval, 99% of intervals formed in this manner will contain the true mean.
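A sketch of the σ‐known interval for the circuit example, using `statistics.NormalDist` for the critical value; passing `z=2.58` reproduces the rounded value used in the 99% calculation. The helper name `z_interval` is illustrative:

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95, z=None):
    """X-bar +/- Z * sigma / sqrt(n); z may be given explicitly (e.g. a rounded table value)."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

lo, hi = z_interval(2.20, 0.35, 11, conf=0.95)
print(f"95% CI: ({lo:.4f}, {hi:.4f})")   # (1.9932, 2.4068)
lo, hi = z_interval(2.20, 0.35, 11, z=2.58)
print(f"99% CI: ({lo:.4f}, {hi:.4f})")   # (1.9277, 2.4723)
```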
5.1.3 Confidence Interval for μ (σ Unknown)
If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S. This introduces extra uncertainty, since S varies from sample to sample, so we use the t distribution instead of the normal distribution.
Assumptions:
o Population standard deviation is unknown
o Population is normally distributed
o If population is not normal, use large sample
o Use Student's t distribution
Confidence interval estimate:
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
where t is the critical value of the t distribution with n ‐ 1 d.f. and an area of α/2 in each tail.
The t value depends on the degrees of freedom (d.f.), the number of observations that are free to vary after the sample mean has been calculated: d.f. = n ‐ 1.
5.1.3.1 Student's t Distribution
As n increases, the t distribution approaches the standardized normal (Z) distribution.
5.1.3.2 Student’s t Table
5.1.3.3 Confidence Interval for μ (σ Unknown): Examples
Example 1
A random sample of n = 25 has X̄ = 50 and S = 8. Form a 95% confidence interval for μ.
Solution: d.f. = n – 1 = 24, so the confidence interval is
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}} = (46.698,\ 53.302)$$
Forming a 95% confidence interval for the mean using Ms Excel (column B holds the values, column C the formulas):
Row  A                             B         C
1    Form a 95% confidence interval for Mean using Ms Excel
3    Data
4    Sample Standard Deviation     8
5    Sample Mean                   50
6    Sample Size                   25
7    Confidence Level              95%
9    Intermediate Calculations
10   Standard Error of the Mean    1.6000    =B4/SQRT(B6)
11   Degrees of Freedom            24        =B6-1
12   t Value                       2.0639    =TINV(1-B7,B11)
13   Interval Half Width           3.3022    =B12*B10
15   Confidence Interval
16   Interval Lower Limit          46.6978   =B5-B13
17   Interval Upper Limit          53.3022   =B5+B13
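Python's standard library has no t distribution, so this sketch (an assumption, not Excel's own algorithm) approximates TINV by integrating the t density with Simpson's rule and inverting the CDF by bisection. With SciPy installed, `scipy.stats.t.ppf(1 - alpha/2, df)` gives the same critical value. All helper names are hypothetical:

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the area from 0 to |x| by Simpson's rule (steps is even)."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf: float, df: int) -> float:
    """Upper critical value with (1 - conf)/2 in each tail, like TINV(1 - conf, df)."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection works because the CDF increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar: float, s: float, n: int, conf: float = 0.95):
    """Return (t, lower, upper) for X-bar +/- t * S / sqrt(n) with n - 1 d.f."""
    t = t_critical(conf, n - 1)
    half = t * s / math.sqrt(n)
    return t, xbar - half, xbar + half

t, lo, hi = t_interval(50, 8, 25)
print(f"t = {t:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```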
Example 2:
Construct a 95% confidence interval estimate for the population mean force required to break the insulator:
Force Required to Break Electric Insulators (in pounds)
1870 1728 1656 1610 1634 1784 1522 1696 1592 1662
1866 1764 1734 1662 1734 1774 1550 1756 1762 1866
1820 1744 1788 1688 1810 1752 1680 1810 1652 1736
Solution:
Enter the data in the range F2:O4.
Row  A                             B         C
1    Estimate for the Mean Amount of Force Required
3    Data
4    Sample Standard Deviation     89.5508   =STDEV(F2:O4)
5    Sample Mean                   1723.4    =AVERAGE(F2:O4)
6    Sample Size                   30        =COUNT(F2:O4)
7    Confidence Level              95%
9    Intermediate Calculations
10   Standard Error of the Mean    16.3497   =B4/SQRT(B6)
11   Degrees of Freedom            29        =B6-1
12   t Value                       2.0452    =TINV(1-B7,B11)
13   Interval Half Width           33.4388   =B12*B10
15   Confidence Interval
16   Interval Lower Limit          1689.96   =B5-B13
17   Interval Upper Limit          1756.84   =B5+B13
We can conclude with 95% confidence that the mean breaking force required for the population of insulators is between 1689.96 and 1756.84 pounds. The validity of this confidence interval estimate depends on the assumption that the force required is normally distributed. If the sample size is large, we can slightly relax this assumption. Thus, with a sample of 30, we can use the t distribution even though the distribution is slightly left‐skewed (see the normal probability plot or box‐and‐whisker plot). The t distribution is therefore appropriate for these data.
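The Example 2 worksheet can be mirrored with Python's standard `statistics` module; `statistics.stdev` is the sample (n ‐ 1) standard deviation, like Excel's STDEV. The t value 2.0452 is copied from the worksheet rather than computed:

```python
import math
import statistics

force = [
    1870, 1728, 1656, 1610, 1634, 1784, 1522, 1696, 1592, 1662,
    1866, 1764, 1734, 1662, 1734, 1774, 1550, 1756, 1762, 1866,
    1820, 1744, 1788, 1688, 1810, 1752, 1680, 1810, 1652, 1736,
]

n = len(force)                  # =COUNT(F2:O4)
xbar = statistics.mean(force)   # =AVERAGE(F2:O4)
s = statistics.stdev(force)     # =STDEV(F2:O4), sample (n - 1) version
se = s / math.sqrt(n)           # =B4/SQRT(B6)
t_crit = 2.0452                 # t value for 29 d.f. as shown in the worksheet
half = t_crit * se              # =B12*B10
print(f"n = {n}, mean = {xbar:.1f}, S = {s:.4f}")
print(f"95% CI: ({xbar - half:.2f}, {xbar + half:.2f})")
```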
5.2 Confidence Interval Estimate for a Single Population Proportion
An interval estimate for the population proportion (π) can be calculated by adding an allowance for uncertainty to the sample proportion (p).
Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation
$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
We will estimate this with sample data:
$$\sqrt{\frac{p(1-p)}{n}}$$
Upper and lower confidence limits for the population proportion are calculated with the formula:
$$p \pm Z\sqrt{\frac{p(1-p)}{n}}$$
where:
Z is the standardized normal value for the level of confidence desired
p is the sample proportion
n is the sample size
5.2.1 Example for Confidence Intervals for the Population Proportion
A random sample of 100 people shows that 25 have opened IRAs this year. Form a 95% confidence interval for the true proportion of the population who have opened IRAs.
$$p \pm Z\sqrt{p(1-p)/n} = .25 \pm 1.96\sqrt{.25(.75)/100} = .25 \pm 1.96(.0433) = (0.1651,\ 0.3349)$$
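A minimal sketch of the interval p ± Z√(p(1 ‐ p)/n), applied to the IRA example and to the in‐error invoice data (10 in‐error invoices in a sample of 100); `proportion_interval` is a hypothetical helper name:

```python
import math
from statistics import NormalDist

def proportion_interval(successes: int, n: int, conf: float = 0.95):
    """Return (p, lower, upper) for p +/- Z * sqrt(p(1 - p)/n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return p, p - half, p + half

for x, n in ((25, 100), (10, 100)):
    p, lo, hi = proportion_interval(x, n)
    print(f"p = {p:.2f}, 95% CI: ({lo:.4f}, {hi:.4f})")
```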
[Figure: Force Required to Break Electrical Insulators, shown as a normal probability plot (Force vs. Z Value) and a box‐and‐whisker plot (Force, 1500 to 1900 pounds).]
Solving the confidence interval for a population proportion using Ms Excel (proportion of in‐error sales invoices: 10 in‐error invoices in a sample of 100):
Row  A                                   B         C
1    Proportion of In‐Error Sales Invoices
3    Data
4    Sample Size                         100
5    Number of Successes                 10
6    Confidence Level                    95%
8    Intermediate Calculations
9    Sample Proportion                   0.1       =B5/B4
10   Z Value                             -1.9600   =NORMSINV((1-B6)/2)
11   Standard Error of the Proportion    0.03      =SQRT(B9*(1-B9)/B4)
12   Interval Half Width                 0.0588    =ABS(B10*B11)
14   Confidence Interval
15   Interval Lower Limit                0.0412    =B9-B12
16   Interval Upper Limit                0.1588    =B9+B12
5.3 Determining Sample Size
The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 ‐ α). The margin of error, also called sampling error, is the amount of imprecision in the estimate of the population parameter; it is the amount added to and subtracted from the point estimate to form the confidence interval.
To determine the required sample size for the mean, you must know the desired level of confidence (1 ‐ α), which determines the critical Z value; the acceptable sampling error (margin of error), e; and the standard deviation, σ.
The formula:
$$n = \frac{Z^2\sigma^2}{e^2}$$
5.3.1 If the Population Standard Deviation (σ) Is Known
If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?
Solution:
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$$
Rounding up, the required sample size is n = 220.
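The sample size formula can be sketched in Python; the result is rounded up with `math.ceil`, since rounding down would give a margin of error slightly larger than e. The helper name `sample_size_mean` is hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma: float, e: float, conf: float = 0.95, z=None) -> int:
    """n = Z^2 sigma^2 / e^2, rounded up to the next whole observation."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / e) ** 2)

print(sample_size_mean(45, 5, z=1.645))      # table Z for 90% confidence
print(sample_size_mean(45, 5, conf=0.90))    # exact Z from NormalDist
```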
Using Ms Excel:
5.3.2 If the Population Standard Deviation (σ) Is Unknown
If σ is unknown, it can be estimated for use in the required sample size formula either by using a value for σ that is expected to be at least as large as the true σ, or by selecting a pilot sample and estimating σ with the sample standard deviation, S.
5.3.3 To Determine the Required Sample Size for the Proportion
To determine the required sample size for the proportion, you must know:
o The desired level of confidence (1 ‐ α), which determines the critical Z value
o The acceptable sampling error (margin of error), e
o The true proportion of "successes", π
o π can be estimated with a pilot sample, if necessary (or conservatively use π = .50)
$$n = \frac{Z^2\,\pi(1-\pi)}{e^2}$$
o Example: How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence? (Assume a pilot sample yields p = .12.)
o Solution: For 95% confidence, use Z = 1.96, e = .03, and p = .12 as the estimate of π:
$$n = \frac{Z^2\,\pi(1-\pi)}{e^2} = \frac{(1.96)^2(.12)(1-.12)}{(.03)^2} = 450.75$$
o Rounding up, use n = 451 samples.
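The proportion version of the formula, sketched the same way; the pilot estimate p = .12 stands in for π, and π = .50 gives the conservative (largest) sample size. The helper name is hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_proportion(pi: float, e: float, conf: float = 0.95, z=None) -> int:
    """n = Z^2 pi (1 - pi) / e^2, rounded up."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z * z * pi * (1 - pi) / e ** 2)

print(sample_size_proportion(0.12, 0.03, z=1.96))   # pilot estimate of pi
print(sample_size_proportion(0.50, 0.03, z=1.96))   # conservative pi = .50
```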
o Using Ms Excel:
5.4 Assignment 5
Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, Chapter 8 Problems, pages 283‐319.
1. Problem for Section 8.1 No. 8.2, 8.4, 8.8
2. Problem for Section 8.2 No. 8.12, 8.14, 8.18, 8.22
3. Problem for Section 8.3 No. 8.24, 8.28, 8.32
4. Problem for Section 8.4 No. 8.36, 8.40, 8.46
Practicum: Math11002 Business Statistics MODULE 6
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: 12.00 – 14.00 WIB In ____________________
I hereby declare that I have strived to do all the work in this module by myself. Name/NIM : ______________________________/_______________ Signature : _______________________________________________ Rem.:
Module Description: HYPOTHESIS TESTING AND TWO SAMPLE TEST
Objective:
o The basic principles of hypothesis testing
o How to use hypothesis testing to test a mean or proportion
o The assumptions of each hypothesis‐testing procedure, how to evaluate them, and the consequences if they are violated
o How to formulate a decision rule for testing a hypothesis
o Type I and Type II errors
o How to use hypothesis testing to compare the difference between: the means of two independent populations, the means of two related populations, the proportions of two independent populations, and the variances of two independent populations
Output: Use separate papers to report your results (handwritten or computer printout). The report produced by the students should present the working procedures and results in both softcopy and hardcopy.
Pre‐Lab Read:
Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, Chapter 9 and Excel Companion to Chapter 9,
pages 328‐367 and 369‐420.
6 HYPOTHESIS TESTING AND TWO SAMPLE TEST
6.1 Hypothesis Testing
A hypothesis is a claim (assumption) about a population parameter:
o Population mean. Example: The mean monthly cell phone bill of this city is μ = $52.
o Population proportion. Example: The proportion of adults in this city with cell phones is π = .68.
The null hypothesis, H0, states the assumption (numerical) to be tested.
Example: The mean number of TV sets in U.S. homes is equal to three: H0: μ = 3
6.1.1 The Null Hypothesis, H0
o Is always about a population parameter, not about a sample statistic
o Begin with the assumption that the null hypothesis is true
o Refers to the status quo
o Always contains the "=", "≤" or "≥" sign
o May or may not be rejected
6.1.2 The Alternative Hypothesis, H1
o Is the opposite of the null hypothesis. Example: The mean number of TV sets in U.S. homes is not equal to 3 (H1: μ ≠ 3)
o Challenges the status quo
o Never contains the "=", "≤" or "≥" sign
o May or may not be proven
o Is generally the hypothesis that the researcher is trying to prove
6.1.3 The Hypothesis Testing Process
o Claim: The population mean age is 50.
o H0: μ = 50, H1: μ ≠ 50
o Sample the population and find the sample mean.
o Suppose the sample mean age was X̄ = 20.
o This is significantly lower than the claimed mean population age of 50.
o If the null hypothesis were true, the probability of getting such a different sample mean would be very small, so you reject the null hypothesis.
o In other words, getting a sample mean of 20 is so unlikely if the population mean was 50 that you conclude that the population mean must not be 50.
6.1.4 The Test Statistic and Critical Values
If the sample mean is close to the assumed population mean, the null hypothesis is not rejected.
If the sample mean is far from the assumed population mean, the null hypothesis is rejected.
How far is "far enough" to reject H0?
The critical value of a test statistic creates a "line in the sand" for decision making.
6.1.5 Errors in Decision Making
6.1.5.1 Type I Error
o Reject a true null hypothesis
o Considered a serious type of error
o The probability of a Type I error is α, called the level of significance of the test; it is set by the researcher in advance
6.1.5.2 Type II Error
o Failure to reject a false null hypothesis
o The probability of a Type II error is β
Possible Hypothesis Test Outcomes
                      Actual Situation
Decision              H0 True                          H0 False
Do Not Reject H0      No Error (Probability 1 ‐ α)     Type II Error (Probability β)
Reject H0             Type I Error (Probability α)     No Error (Probability 1 ‐ β)
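6.1.6 Level of Significance, α
For example, for the claim "The population mean age is 50" (H0: μ = 50, H1: μ ≠ 50), the test is two‐tailed: the level of significance α is split into α/2 in each tail of the sampling distribution, and these tails form the rejection region.
6.1.7 Hypothesis Testing: σ Known
For a two‐tail test for the mean, σ known:
o Convert the sample statistic (X̄) to a test statistic
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
o Determine the critical Z values for a specified level of significance α from a table or by using Excel
o Decision Rule: If the test statistic falls in the rejection region, reject H0; otherwise do not reject H0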
Example: Test the claim that the true mean weight of chocolate bars manufactured in a factory is 3 ounces.
Solution:
o State the appropriate null and alternative hypotheses: H0: μ = 3, H1: μ ≠ 3 (this is a two‐tailed test)
o Specify the desired level of significance: suppose that α = .05 is chosen for this test
o Choose a sample size: suppose a sample of size n = 100 is selected
o Determine the appropriate technique: σ is known, so this is a Z test
o Set up the critical values: for α = .05 the critical Z values are ±1.96
o Collect the data and compute the test statistic: suppose the sample results are n = 100, X̄ = 2.84 (σ = 0.8 is assumed known from past company records). So the test statistic is:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-.16}{.08} = -2.0$$
Since Z = ‐2.0 < ‐1.96, you reject the null hypothesis and conclude that there is sufficient evidence that the mean weight of chocolate bars is not equal to 3.
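The critical value approach for the chocolate bar example, as a small Python sketch; `z_test_mean` is a hypothetical helper and the critical value comes from `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tail Z test for the mean with sigma known; returns (Z, critical Z, reject?)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # alpha/2 in each tail
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = z_test_mean(2.84, 3.0, 0.8, 100)
print(f"Z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject}")
```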
6.1.8 Six Steps of Hypothesis Testing:
1. State the null hypothesis, H0, and the alternative hypothesis, H1
2. Choose the level of significance, α, and the sample size n.
3. Determine the appropriate statistical technique and the test statistic to use
4. Find the critical values and determine the rejection region(s)
5. Collect data and compute the test statistic from the sample result
6. Compare the test statistic to the critical value to determine whether the test statistic falls in
the region of rejection. Make the statistical decision: Reject H0 if the test statistic falls in the
rejection region. Express the decision in the context of the problem.
See Examples 9.2 and 9.3 in Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 336 and 337.
6.1.9 Hypothesis Testing: σ Known, p‐Value Approach
The p‐value is the probability of obtaining a test statistic equal to or more extreme (< or >) than the observed sample value, given that H0 is true. It is also called the observed level of significance: the smallest value of α for which H0 can be rejected.
o Convert the sample statistic (e.g., X̄) to a test statistic (e.g., the Z statistic)
o Obtain the p‐value from a table or by using Excel
o Compare the p‐value with α: if p‐value < α, reject H0; if p‐value ≥ α, do not reject H0
Example:
6.1.9.1 Manual Calculation
How likely is it to see a sample mean of 2.84 (or something further from the mean, in either direction) if the true mean is μ = 3.0? Suppose the sample results are n = 100 and X̄ = 2.84, with σ = 0.8 assumed known, so that Z = ‐2.0.
p‐value = P(Z < ‐2.0) + P(Z > 2.0) = .0228 + .0228 = .0456
Compare the p‐value with α: here p‐value = .0456 and α = .05. Since .0456 < .05, you reject the null hypothesis.
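The two‐tail p‐value can also be computed directly from the standard normal CDF. This sketch returns about .0455; the table‐based .0456 above comes from rounding each tail to .0228. The helper name is illustrative:

```python
from statistics import NormalDist

def two_tail_p_value(z: float) -> float:
    """P(|Z| >= |z|) under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tail_p_value(-2.0)
print(f"p-value = {p:.4f}")
print("Reject H0" if p < 0.05 else "Do not reject H0")
```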
6.1.9.2 Using Ms Excel:
6.1.10 Hypothesis Testing: σ Known, Confidence Interval Connections
For X̄ = 2.84, σ = 0.8 and n = 100, the 95% confidence interval is:
$$2.84 - (1.96)\frac{0.8}{\sqrt{100}} \le \mu \le 2.84 + (1.96)\frac{0.8}{\sqrt{100}}$$
$$2.6832 \le \mu \le 2.9968$$
Since this interval does not contain the hypothesized mean (3.0), you reject the null hypothesis at α = .05.
6.1.11 One‐Tail Tests
In many cases, the alternative hypothesis focuses on a particular direction.
H0: μ ≥ 3, H1: μ < 3
This is a lower‐tail test since the alternative hypothesis is focused on the lower tail below the mean of 3.
H0: μ ≤ 3, H1: μ > 3
This is an upper‐tail test since the alternative hypothesis is focused on the upper tail above the mean of 3.
Example:
A phone industry manager thinks that customer monthly cell phone bills have increased and now average more than $52 per month. The company wishes to test this claim. Past company records indicate that the standard deviation is about $10.
Form the hypothesis test:
H0: μ ≤ 52 (the mean is less than or equal to $52 per month)
H1: μ > 52 (the mean is greater than $52 per month, i.e., sufficient evidence exists to support the manager's claim)
Suppose that α = .10 is chosen for this test.
Find the rejection region:
What is Z given α = 0.10? For this upper‐tail test the critical value is Z = 1.28, so reject H0 if Z > 1.28.
Suppose a sample is taken with the following results: n = 64, X̄ = 53.1 (σ = 10 was assumed known from past company records).
Then the test statistic is:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88$$
Do not reject H0 since Z = 0.88 ≤ 1.28,
i.e., there is not sufficient evidence that the mean bill is greater than $52.
Calculate the p‐value and compare it to α: p‐value = P(Z > 0.88) = .1894 > α = .10, so do not reject H0.
Microsoft Excel Z‐test Results
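An upper‐tail version of the Z test, sketched for the cell phone example: all of α goes in the upper tail, and the p‐value is P(Z > z). The helper name is hypothetical:

```python
import math
from statistics import NormalDist

def upper_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Upper-tail Z test for H0: mu <= mu0 vs H1: mu > mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)   # all of alpha in the upper tail
    p_value = 1 - NormalDist().cdf(z)          # P(Z > z)
    return z, z_crit, p_value

z, z_crit, p = upper_tail_z_test(53.1, 52, 10, 64, alpha=0.10)
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p:.4f}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```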
6.1.12 Hypothesis Testing: σ Unknown
If the population standard deviation is unknown, you instead use the sample standard deviation S. Because of this change, you use the t distribution instead of the Z distribution to test the null hypothesis about the mean. All other steps, concepts, and conclusions are the same.
The t test statistic with n ‐ 1 degrees of freedom is:
$$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
Example: The mean cost of a hotel room in New York is said to be $168 per night. A random sample of 25 hotels resulted in X̄ = $172.50 and S = $15.40. Test at the α = 0.05 level. (A stem‐and‐leaf display and a normal probability plot indicate the data are approximately normally distributed.)
H0: μ = 168
H1: μ ≠ 168
$$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{172.50 - 168}{15.40/\sqrt{25}} = 1.46$$
With d.f. = 24 and α = 0.05, the critical values are t = ±2.0639. Since ‐2.0639 < 1.46 < 2.0639, do not reject H0: there is not sufficient evidence that the true mean cost is different from $168.
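The t statistic for the hotel example in Python; the critical value 2.0639 (24 d.f., two‐tail α = 0.05) is taken from the text rather than computed, and `t_statistic` is a hypothetical helper name:

```python
import math

def t_statistic(xbar, mu0, s, n):
    """t = (X-bar - mu0) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))

t = t_statistic(172.50, 168, 15.40, 25)
t_crit = 2.0639   # two-tail critical value for 24 d.f. at alpha = 0.05, from the text
print(f"t = {t:.2f}")
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")
```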
6.1.13 Hypothesis Testing: Connection to Confidence Intervals
For X̄ = 172.5, S = 15.40 and n = 25, the 95% confidence interval is:
$$172.5 - (2.0639)\frac{15.4}{\sqrt{25}} \le \mu \le 172.5 + (2.0639)\frac{15.4}{\sqrt{25}}$$
$$166.14 \le \mu \le 178.86$$
Since this interval contains the hypothesized mean (168), you do not reject the null hypothesis at α = .05.
o Recall that you assume that the sample statistic comes from a random sample from a
normal distribution.
o If the sample size is small (< 30), you should use a box‐and‐whisker plot or a normal
probability plot to assess whether the assumption of normality is valid.
o If the sample size is large, the central limit theorem applies and the sampling distribution of the mean will be approximately normal.
Microsoft Excel Results
6.1.14 Hypothesis Testing: Proportion
Involves categorical variables with two possible outcomes: "Success" (possesses a certain characteristic) and "Failure" (does not possess that characteristic). The fraction or proportion of the population in the "success" category is denoted by π.
The sample proportion in the success category is denoted by p:
$$p = \frac{X}{n} = \frac{\text{number of successes in sample}}{\text{sample size}}$$
When both nπ and n(1 ‐ π) are at least 5, p can be approximated by a normal distribution with mean and standard deviation
$$\mu_p = \pi \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
The sampling distribution of p is approximately normal, so the test statistic is a Z value:
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$$
Example: A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed, with 30 responses. Test at the α = .05 significance level.
Solution:
nπ = (500)(.08) = 40 and n(1 ‐ π) = (500)(.92) = 460; both are at least 5, so the normal approximation can be used.
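A sketch of the Z test for a proportion applied to the mailing example. The text stops before stating the hypotheses, so this assumes a two‐tail test of H0: π = .08 against H1: π ≠ .08 at α = .05; the helper name is hypothetical:

```python
import math
from statistics import NormalDist

def z_test_proportion(x: int, n: int, pi0: float, alpha: float = 0.05):
    """Return (p, Z, critical Z) for a two-tail test of H0: pi = pi0."""
    p = x / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p, z, z_crit

p, z, z_crit = z_test_proportion(30, 500, 0.08)
print(f"p = {p:.2f}, Z = {z:.3f}, critical values = +/-{z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```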
6.2 Assignment 6.1
1. Problem for Section 9.1 No. 9.1 through 9.5, 9.14, 9.18
2. Problem for Section 9.2 No. 9.20, 9.24, 9.30, 9.32
3. Problem for Section 9.3 No. 9.36, 9.44, 9.46
4. Problem for Section 9.4 No. 9.50, 9.54, 9.56, 9.62
5. Problem for Section 9.5 No. 9.68, 9.70, 9.74
6.3 Two‐Sample Tests
Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 – μ2.
The point estimate for the difference between the population means is the difference between the sample means: X̄1 – X̄2.
Different data sources
o Independent: the sample selected from one population has no effect on the sample selected from the other population
o Use the difference between 2 sample means
o Use a Z test, pooled‐variance t test, or separate‐variance t test
Independent Population Means:
1. σ1 and σ2 known: use a Z test statistic
Assumptions: samples are randomly and independently drawn, and population distributions are normal.
When σ1 and σ2 are known and both populations are normal, the test statistic is a Z value and the standard error of X̄1 – X̄2 is
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
and
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
[Figure: Two Independent Populations, Comparing Means]
2. σ1 and σ2 unknown: use S to estimate the unknown σ, and use a t test statistic
Assumptions: samples are randomly and independently drawn, populations are normally distributed, and population variances are unknown but assumed equal.
Forming interval estimates: the population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ. The test statistic is a t value with (n1 + n2 – 2) degrees of freedom:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where the pooled variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
6.3.1 Two‐Sample Tests: Independent Populations
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:
                  NYSE    NASDAQ
Number            21      25
Sample mean       3.27    2.53
Sample std dev    1.30    1.16
Assuming both populations are approximately normal with equal variances, is there a difference in average yield (α = 0.05)?
H0: μ1 ‐ μ2 = 0, i.e., (μ1 = μ2)
H1: μ1 ‐ μ2 ≠ 0, i.e., (μ1 ≠ μ2)
α = 0.05, df = 21 + 25 ‐ 2 = 44
Critical values: t = ± 2.0154
The pooled variance is:
$$S_p^2 = \frac{(21 - 1)(1.30)^2 + (25 - 1)(1.16)^2}{(21 - 1) + (25 - 1)} = 1.5021$$
The test statistic is:
$$t = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$
Decision: Since t = 2.040 > 2.0154, reject H0 at α = 0.05.
Conclusion: There is evidence of a difference in the means.
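The pooled‐variance t test for the NYSE/NASDAQ data, as a Python sketch; the critical value 2.0154 (44 d.f., two‐tail α = 0.05) is copied from the text, and `pooled_t_test` is a hypothetical helper name:

```python
import math

def pooled_t_test(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Pooled-variance t test; returns (S_p^2, t, degrees of freedom)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))
    t = ((x1 - x2) - delta0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, n1 + n2 - 2

sp2, t, df = pooled_t_test(3.27, 1.30, 21, 2.53, 1.16, 25)
t_crit = 2.0154   # critical value for 44 d.f., alpha = 0.05, two-tail (from the text)
print(f"Sp^2 = {sp2:.4f}, t = {t:.3f}, df = {df}")
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")
```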
6.3.2 Independent Populations, Unequal Variances
If you cannot assume that the population variances are equal, the pooled‐variance t test is inappropriate. Instead, use a separate‐variance t test, which includes the two separate sample variances in the computation of the test statistic. The computations are complicated and are best performed using Excel.
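For reference, the separate‐variance (Welch) statistic and its Welch‐Satterthwaite degrees of freedom can be sketched in a few lines. Reusing the NYSE/NASDAQ summary statistics here is purely an illustration, and the helper name is hypothetical; software may round the resulting d.f. differently:

```python
import math

def separate_variance_t(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = ((x1 - x2) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Illustration only: the summary statistics from the NYSE/NASDAQ example
t, df = separate_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(f"t = {t:.4f}, d.f. = {df:.2f}")
```

Unlike the pooled test, the d.f. here is usually not a whole number and is smaller than n1 + n2 ‐ 2.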
The confidence interval for μ1 – μ2 (σ1 and σ2 known) is:
$$(\bar{X}_1 - \bar{X}_2) \pm Z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Practicum: Math11002 Business Statistics MODULE 7
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: 12.00 – 14.00 WIB In ____________________
I hereby declare that I have strived to do all the work in this module by myself. Name/NIM : ______________________________/_______________ Signature : _______________________________________________ Rem.:
Module Description: ANOVA and CHI SQUARE AND NON PARAMETRIC TESTS
Objective:
o The basic concepts of experimental design
o How to use the one‐way analysis of variance to test for differences among the means of several groups
o How to use the two‐way analysis of variance and interpret the interaction
o How and when to use the chi‐square test for contingency tables
o How to use the Marascuilo procedure for determining pair‐wise differences when evaluating more than two proportions
o How and when to use the McNemar test
o How and when to use nonparametric tests
Output: Use separate papers to report your results (handwritten or computer printout). The report produced by the students should present the working procedures and results in both softcopy and hardcopy.
Pre‐Lab Read:
Levine, et al. 2008. Statistics for Managers – Using Microsoft™ Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, Chapter 10 and Excel Companion to Chapter 10,
pages 369‐420.
7 ANOVA AND CHI SQUARE AND NON PARAMETRIC TESTS
ANOVA
General ANOVA Setting
Investigator controls one or more factors of interest
o Each factor contains two or more levels
o Levels can be numerical or categorical
o Different levels produce different groups
o Think of the groups as populations
Observe effects on the dependent variable: are the groups the same?
Experimental design: the plan used to collect the data
Completely Randomized Design
Experimental units (subjects) are assigned randomly to the different levels (groups); subjects are assumed homogeneous
Only one factor or independent variable, with two or more levels (groups)
Analyzed by one‐factor analysis of variance (one‐way ANOVA)
7.1 One‐Way Analysis of Variance
Evaluate the difference among the means of three or more groups.
Examples: accident rates for 1st, 2nd, and 3rd shift, or expected mileage for five brands of tires
Assumptions:
Populations are normally distributed
Populations have equal variances
Samples are randomly and independently drawn
7.1.1 Hypotheses: One‐Way ANOVA
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$$
All population means are equal, i.e., no treatment effect (no variation in means among groups).
$$H_1: \text{Not all } \mu_j \text{ are equal}$$
At least one population mean is different, i.e., there is a treatment (group) effect. This does not mean that all population means are different.
[Figure: All means are the same: the null hypothesis is true (no group effect).]
[Figure: At least one mean is different: the null hypothesis is NOT true (treatment effect is present).]
7.1.2 Partitioning the Variation
Total variation can be split into two parts: SST = SSA + SSW.
SST = Total Variation = the aggregate dispersion of the individual data values around the overall (grand) mean of all factor levels:
$$SST = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{\bar{X}})^2 = (X_{11} - \bar{\bar{X}})^2 + (X_{12} - \bar{\bar{X}})^2 + \cdots + (X_{n_c c} - \bar{\bar{X}})^2$$
Where: SST = total sum of squares; c = number of groups; n_j = number of values in group j; X_ij = ith value from group j; X̄̄ = grand mean (mean of all data values).
SSA = Among‐Group Variation = dispersion between the factor sample means:
$$SSA = \sum_{j=1}^{c} n_j(\bar{X}_j - \bar{\bar{X}})^2 = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c(\bar{X}_c - \bar{\bar{X}})^2$$
Where: SSA = sum of squares among groups; c = number of groups; n_j = sample size from group j; X̄_j = sample mean from group j; X̄̄ = grand mean (mean of all data values).
SSW = Within‐Group Variation = dispersion that exists among the data values within the particular factor levels:
$$SSW = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^2 = (X_{11} - \bar{X}_1)^2 + (X_{12} - \bar{X}_2)^2 + \cdots + (X_{n_c c} - \bar{X}_c)^2$$
Where: SSW = sum of squares within groups; c = number of groups; n_j = sample size from group j; X̄_j = sample mean from group j; X_ij = ith value in group j.
7.1.3 Obtaining the Mean Squares
$$MST = \frac{SST}{n-1} \quad \text{(Mean Squares Total)}$$
$$MSA = \frac{SSA}{c-1} \quad \text{(Mean Squares Among)}$$
$$MSW = \frac{SSW}{n-c} \quad \text{(Mean Squares Within)}$$
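7.1.4 One‐Way ANOVA Table
Source           df       SS     MS (Variance)          F Ratio
Among Groups     c ‐ 1    SSA    MSA = SSA/(c ‐ 1)      F = MSA/MSW
Within Groups    n ‐ c    SSW    MSW = SSW/(n ‐ c)
Total            n ‐ 1    SST
where c = number of groups, n = sum of the sample sizes from all groups, and df = degrees of freedom.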
7.1.5 Test Statistic
$$F = \frac{MSA}{MSW}$$
where MSA is the mean square among groups and MSW is the mean square within groups.
Degrees of freedom:
df1 = c – 1 (c = number of groups)
df2 = n – c (n = sum of all sample sizes)
o The F statistic is the ratio of the among variance to the within variance
o The ratio must always be positive
o df1 = c ‐ 1 will typically be small
o df2 = n ‐ c will typically be large
Decision Rule: Reject H0 if F > F_U; otherwise do not reject H0.
7.1.6 Example
An experiment was conducted to determine whether any significant differences exist in the strength of parachutes woven from synthetic fibers from four different suppliers (Supplier 1, Supplier 2, Supplier 3, and Supplier 4).
                             Supplier 1   Supplier 2   Supplier 3   Supplier 4
                             18.5         26.3         20.6         25.4
                             24.0         25.3         25.2         19.9
                             17.2         24.0         20.8         22.6
                             19.9         21.2         24.7         17.5
                             18.0         24.5         22.9         20.4
Sample Mean                  19.52        24.26        22.84        21.16      =AVERAGE(…)
Sample Standard Deviation    2.69         1.92         2.13         2.98       =STDEV(…)
To construct the ANOVA summary table, we first compute the sample mean of each group. Then we compute the grand mean by summing all 20 values and dividing by the total number of values:
$$\bar{\bar{X}} = \frac{\sum_{j=1}^{c}\sum_{i=1}^{n_j} X_{ij}}{n} = \frac{438.9}{20} = 21.945$$
Then we compute the sums of squares:
$$SSA = 5(19.52 - 21.945)^2 + 5(24.26 - 21.945)^2 + 5(22.84 - 21.945)^2 + 5(21.16 - 21.945)^2 = 63.2855$$
$$SSW = (18.5 - 19.52)^2 + \cdots + (18.0 - 19.52)^2 + (26.3 - 24.26)^2 + \cdots + (24.5 - 24.26)^2 + (20.6 - 22.84)^2 + \cdots + (22.9 - 22.84)^2 + (25.4 - 21.16)^2 + \cdots + (20.4 - 21.16)^2 = 97.5040$$
[Figure: Tensile Strength Scatter Diagram (x-axis: Supplier, 1 to 4; y-axis: Tensile Strength)]
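$$SST = (18.5 - 21.945)^2 + (24.0 - 21.945)^2 + \cdots + (20.4 - 21.945)^2 = 160.7895$$
$$MSA = \frac{SSA}{c - 1} = \frac{63.2855}{4 - 1} = 21.0952$$
$$MSW = \frac{SSW}{n - c} = \frac{97.5040}{20 - 4} = 6.0940$$
$$F_{STAT} = \frac{MSA}{MSW} = \frac{21.0952}{6.0940} = 3.4616$$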
F_U from the F distribution table with 3 degrees of freedom in the numerator and 16 degrees of freedom in the denominator at the 0.05 level of significance is 3.24.
Because the computed test statistic F_STAT = 3.4616 > F_U = 3.24, we reject the null hypothesis. We conclude that there is a significant difference in the mean tensile strength among the four suppliers.
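The hand calculation above can be cross-checked with a short script. The sketch below is only an illustration in Python (the module itself works in MS Excel); the function name `one_way_anova` is hypothetical, and the critical value F_U = 3.24 is the table value quoted in the text, not something the script computes.

```python
from statistics import mean

# Tensile strength data from the parachute example (one list per supplier).
groups = [
    [18.5, 24.0, 17.2, 19.9, 18.0],  # Supplier 1
    [26.3, 25.3, 24.0, 21.2, 24.5],  # Supplier 2
    [20.6, 25.2, 20.8, 24.7, 22.9],  # Supplier 3
    [25.4, 19.9, 22.6, 17.5, 20.4],  # Supplier 4
]


def one_way_anova(groups):
    """Return the sums of squares, mean squares and F statistic of a one-way ANOVA."""
    c = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]
    # SSA: n_j * (group mean - grand mean)^2, summed over the groups
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # SSW: squared deviations of each value from its own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # SST: squared deviations of each value from the grand mean
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return {"SSA": ssa, "SSW": ssw, "SST": sst, "MSA": msa, "MSW": msw, "F": msa / msw}


result = one_way_anova(groups)
F_U = 3.24  # F(0.05; 3, 16) read from the F table, as quoted in the text

for name, value in result.items():
    print(f"{name} = {value:.4f}")
print("Reject H0" if result["F"] > F_U else "Do not reject H0")
```

The printed values should agree with the hand calculation (SSA = 63.2855, SSW = 97.5040, SST = 160.7895, F = 3.4616) up to rounding, and SST equals SSA + SSW.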
Using MS Excel: Data > Data Analysis > Anova: Single Factor:
7.1.7 The Tukey-Kramer Procedure
First compute the absolute differences between all pairs of sample means, $|\bar{X}_j - \bar{X}_{j'}|$. Then compute the critical range for the Tukey-Kramer procedure:
$$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}$$
where:
Q_U = upper-tail critical value from a Studentized range distribution having c degrees of freedom in the numerator and n - c degrees of freedom in the denominator, for the desired level of significance
MSW = Mean Square Within
n_j and n_j' = sample sizes from groups j and j'
A pair of means differs significantly if its absolute difference is greater than the critical range.
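A minimal Python sketch of the Tukey-Kramer comparisons for the parachute example follows. The function name `tukey_kramer` is hypothetical, and Q_U = 4.05 is an assumption: it is the value usually tabulated for c = 4 and n - c = 16 at the 0.05 level, but it is not given in this module, so check it against the Studentized range table in your text.

```python
from itertools import combinations
from math import sqrt

# Summary values from the parachute example.
means = {1: 19.52, 2: 24.26, 3: 22.84, 4: 21.16}  # sample mean of each supplier
sizes = {1: 5, 2: 5, 3: 5, 4: 5}                   # sample size of each supplier
MSW = 6.094                                         # mean square within from the ANOVA
Q_U = 4.05  # assumed Studentized range value for c = 4, n - c = 16, alpha = 0.05


def tukey_kramer(means, sizes, msw, q_u):
    """Compare every pair of group means against its Tukey-Kramer critical range."""
    results = []
    for j, k in combinations(sorted(means), 2):
        diff = abs(means[j] - means[k])
        critical_range = q_u * sqrt(msw / 2 * (1 / sizes[j] + 1 / sizes[k]))
        results.append((j, k, diff, critical_range, diff > critical_range))
    return results


for j, k, diff, cr, significant in tukey_kramer(means, sizes, MSW, Q_U):
    verdict = "significant" if significant else "not significant"
    print(f"Supplier {j} vs {k}: |diff| = {diff:.2f}, critical range = {cr:.4f} -> {verdict}")
```

With these inputs every pair has the same critical range (about 4.47, since all n_j = 5), and only Supplier 1 versus Supplier 2 (|19.52 - 24.26| = 4.74) exceeds it.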
7.1.8 ANOVA Assumptions
Randomness and Independence: Select random samples from the c groups (or randomly assign the levels).
Normality: The sample values from each group are from a normal population.
Homogeneity of Variance: Can be tested with Levene's Test.
Levene's Test
o Tests the assumption that the variances of each group are equal.
o First, define the null and alternative hypotheses:
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_c^2$
$H_1$: Not all $\sigma_j^2$ are equal
o Second, compute the absolute value of the difference between each value and
the median of each group.
o Third, perform a one‐way ANOVA on these absolute differences.
Applying the one-way ANOVA to the absolute differences gives F = 0.2068 < 3.2389 (p-value = 0.8902 > 0.05), so we do not reject H0. There is no evidence of a significant difference among the four variances. Therefore, the homogeneity of variance assumption for the ANOVA procedure is justified.
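The Levene procedure described above (absolute deviations from each group median, then a one-way ANOVA on those deviations) can be sketched in Python as follows. This is an illustration that assumes the median-centred version described in this module; the function name `levene_median` is hypothetical.

```python
from statistics import mean, median

# Tensile strength data from the parachute example (one list per supplier).
groups = [
    [18.5, 24.0, 17.2, 19.9, 18.0],  # Supplier 1
    [26.3, 25.3, 24.0, 21.2, 24.5],  # Supplier 2
    [20.6, 25.2, 20.8, 24.7, 22.9],  # Supplier 3
    [25.4, 19.9, 22.6, 17.5, 20.4],  # Supplier 4
]


def levene_median(groups):
    """Levene's test as described in this module: one-way ANOVA on |X_ij - median_j|."""
    # Step 2: absolute difference between each value and the median of its group
    abs_dev = []
    for g in groups:
        med = median(g)
        abs_dev.append([abs(x - med) for x in g])
    # Step 3: one-way ANOVA F statistic on these absolute differences
    c = len(abs_dev)
    n = sum(len(d) for d in abs_dev)
    grand_mean = sum(sum(d) for d in abs_dev) / n
    dev_means = [mean(d) for d in abs_dev]
    ssa = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(abs_dev, dev_means))
    ssw = sum((x - m) ** 2 for d, m in zip(abs_dev, dev_means) for x in d)
    return (ssa / (c - 1)) / (ssw / (n - c))


F_levene = levene_median(groups)
print(f"Levene F = {F_levene:.4f}")
```

For the supplier data this reproduces F of about 0.2068, the value quoted in the text, which is below the critical value 3.2389.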
7.2 Two‐Way Analysis of Variance
Examines the effect of:
o Two factors of interest on the dependent variable
  e.g., percent carbonation and line speed on a soft drink bottling process
o Interaction between the different levels of these two factors
  e.g., does the effect of one particular carbonation level depend on the level at which the line speed is set?
Assumptions
o Populations are normally distributed
o Populations have equal variances
o Independent random samples are selected
7.2.1 Sources of Variation
SST = SSA + SSB + SSAB + SSE
Two Factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n' = number of replications for each cell
n = total number of observations in all cells (n = rcn')
Xijk = value of the kth observation of level i of factor A and level j of factor B
7.2.2 Two‐Way ANOVA: Features
Degrees of freedom always add up: n - 1 = rc(n' - 1) + (r - 1) + (c - 1) + (r - 1)(c - 1)
Total = error + factor A + factor B + interaction
The denominator of every F test is the same (the mean square error, MSE = SSE / [rc(n' - 1)]), but the numerator differs for factor A, factor B, and the interaction.
The sums of squares always add up: SST = SSE + SSA + SSB + SSAB
Total = error + factor A + factor B + interaction
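The two additivity rules above can be checked numerically. The Python sketch below computes the two-way ANOVA sums of squares, degrees of freedom and F ratios for a balanced design with n' replicates per cell. The data set is hypothetical (it does not come from this module) and the function name `two_way_anova` is illustrative only.

```python
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA; data[i][j] holds the n' replicates for
    level i of factor A and level j of factor B."""
    r, c, n_rep = len(data), len(data[0]), len(data[0][0])
    values = [x for row in data for cell in row for x in cell]
    grand = mean(values)
    a_means = [mean([x for cell in row for x in cell]) for row in data]
    b_means = [mean([x for row in data for x in row[j]]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in data]

    ss = {
        "A": c * n_rep * sum((m - grand) ** 2 for m in a_means),
        "B": r * n_rep * sum((m - grand) ** 2 for m in b_means),
        "AB": n_rep * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                          for i in range(r) for j in range(c)),
        "E": sum((x - cell_means[i][j]) ** 2
                 for i in range(r) for j in range(c) for x in data[i][j]),
        "T": sum((x - grand) ** 2 for x in values),
    }
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1),
          "E": r * c * (n_rep - 1), "T": r * c * n_rep - 1}
    mse = ss["E"] / df["E"]  # common denominator of every F test
    f = {k: (ss[k] / df[k]) / mse for k in ("A", "B", "AB")}
    return ss, df, f


# Hypothetical data (not from the text): r = 2 levels of A, c = 3 levels of B, n' = 2
data = [
    [[1, 3], [4, 6], [7, 9]],
    [[2, 2], [5, 9], [6, 10]],
]
ss, df, f = two_way_anova(data)
print(ss, df, f, sep="\n")
```

The identity SST = SSA + SSB + SSAB + SSE holds for any balanced design, so changing the hypothetical numbers should not break it; only the individual F ratios change.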
7.2.3 Interaction
7.3 CHI SQUARE AND NON PARAMETRIC TESTS
All of the inferential statistics we have covered in past lessons are what are called parametric statistics. To use these statistics we make some assumptions about the distributions the data come from, such as that they are normally distributed. With parametric statistics we also deal with data for the dependent variable that is at the interval or ratio level of measurement, e.g., test scores or physical measurements.
The parametric statistics we have discussed so far in this course are:
1. the Z-score test
2. the Z-test
3. the single-sample t-test
4. the independent t-test
5. the dependent t-test
6. one-way analysis of variance (ANOVA)
We will now consider a widely used non-parametric test, chi-square, which we can use with data at the nominal level, that is, data that is classificatory. For example, we know the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh Computers, IBM Computers, or some other brand of computer. We want to know if there is a difference among the frequencies with which these three brands of computers are selected, or if students choose basically equally among the three brands. This is a problem we can use the chi-square statistic for.
The chi‐square statistic is used to compare the observed frequency of some observation (such as frequency of buying different brands of computers) with an expected frequency (such as buying equal numbers of each brand of computer). The comparison of observed and expected frequencies is used to calculate the value of the chi‐square statistic, which in turn can be compared with the distribution of chi‐square to make an inference about a statistical problem.
The symbol for chi-square and the formula are as follows:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the one‐dimensional chi‐square statistic is:
df = C ‐ 1
where C is the number of categories or levels of the independent variable.
7.3.1 One‐Variable Chi‐Square (goodness‐of‐fit test) with equal expected frequencies
We can use the chi‐square statistic to test the distribution of measures over levels of a variable to indicate if the distribution of measures is the same for all levels. This is the first use of the one‐variable chi‐square test. This test is also referred to as the goodness‐of‐fit test.
Consider again the example of the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh Computers, IBM Computers, or some other brand of computer. We want to know if there is a significant difference among the frequencies with which these three brands of computers are selected, or if the students select equally among the three brands.
The data for 100 students are recorded in the table below (the observed frequencies). We have also indicated the expected frequency for each category. Since there are 100 observations and three categories (Macintosh, IBM, and Other), the expected frequency for each category is 100/3 = 33.333. In the last column of the table we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency. The sum of the last column is the value of the chi-square statistic.
Frequency with which students select computer brand
Computer             Observed Frequency   Expected Frequency   (O - E)²/E
IBM                  47                   33.333               5.604
Macintosh            36                   33.333               0.213
Other                17                   33.333               8.003
Total (chi-square)                                             13.820
From the table we can see that:
The df = C ‐ 1 = 3 ‐ 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.
We now have the information we need to complete the six step process for testing statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research question.
Note: Our null hypothesis, for the chi‐square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level.
Note: As usual we will set our alpha level at .05, we have 5 chances in 100 of making a type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
χ² = 13.820
df = C - 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi‐square, with an alpha
level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting
the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision. Reject H0, p < .05
Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English. There is a significant difference among the frequencies with which students purchased three different brands of computers.
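The goodness-of-fit calculation above can be reproduced with a few lines of Python (an illustration only; the function name `chi_square_gof` is hypothetical). For this particular case only, with 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so the sketch can recover the tabled 5.991 critical value and a p-value without a table; for other degrees of freedom you would still need the table or a statistics library.

```python
from math import exp, log


def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
n = sum(observed.values())
expected = [n / len(observed)] * len(observed)  # equal expected frequencies, 100/3 each

chi2 = chi_square_gof(observed.values(), expected)
df = len(observed) - 1  # C - 1 = 2

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2)
critical = -2 * log(0.05)   # about 5.991
p_value = exp(-chi2 / 2)

print(f"chi-square = {chi2:.3f}, df = {df}, critical = {critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if chi2 >= critical else "Fail to reject H0")
```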
7.3.2 One‐Variable Chi‐Square (goodness‐of‐fit test) with predetermined expected frequencies
Let's look at the problem we just solved in a way that illustrates the other use of one-variable chi-square, that is, with predetermined expected frequencies rather than equal frequencies. We could formulate our revised problem as follows:
In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know if these frequencies of computer-buying behavior are similar to or different from the national study data.
The data for 100 students are recorded in the table below (the observed frequencies). In this case the expected frequencies are those from the national study. To get each expected frequency, we multiply the percentage from the national study by the total number of subjects in the current study.
Expected frequency for IBM = 100 × 50% = 50
Expected frequency for Macintosh = 100 × 25% = 25
Expected frequency for Other = 100 × 25% = 25
The expected frequencies are recorded in the Expected Frequency column of the table. As before, we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency, and recorded this result in the last column of the table. The sum of the last column is the value of the chi-square statistic.
Frequency with which students select computer brand
Computer             Observed Frequency   Expected Frequency   (O - E)²/E
IBM                  47                   50                   0.18
Macintosh            36                   25                   4.84
Other                17                   25                   2.56
Total (chi-square)                                             7.58
From the table we can see that:
The df = C ‐ 1 = 3 ‐ 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.
We now have the information we need to complete the six step process for testing statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research question.
Note: Our null hypothesis, for the chi‐square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level. Note: As usual we will set our alpha level at .05, we have 5 chances in 100 of making a type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
χ² = 7.58
df = C ‐ 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi‐square, with an alpha
level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting
the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision. Reject H0, p < .05
Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English. There is a significant difference between the frequencies with which students purchased three different brands of computers and the proportions reported in the national study.
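With predetermined expected frequencies the only change is how E is built: each national-study proportion is multiplied by the sample size. A minimal Python sketch of that calculation (illustrative only; the 5.991 critical value is the table value from the text):

```python
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
national = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}  # proportions from the national study

n = sum(observed.values())
expected = {brand: n * p for brand, p in national.items()}  # 50, 25, 25

# Chi-square: sum of (O - E)^2 / E over the three brands
chi2 = sum((observed[b] - expected[b]) ** 2 / expected[b] for b in observed)
critical = 5.991  # table value for alpha = .05 and df = 2

print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f}")
print("Reject H0" if chi2 >= critical else "Fail to reject H0")
```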
7.3.3 Two‐Variable Chi‐Square (test of independence)
Now let us consider the case of the two‐variable chi‐square test, also known as the test of independence.
For example, we may wish to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is gender independent of size of hometown?
The data for 30 females and 6 males are in the following table.
Frequency with which males and females come from small, medium, and large cities
Small Medium Large Totals
Female 10 14 6 30
Male 4 1 1 6
Totals 14 15 7 36
The formula for chi-square is the same as before:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the two‐dimensional chi‐square statistic is:
df = (C ‐ 1)(R ‐ 1)
where C is the number of columns or levels of the first variable and R is the number of rows or levels of the second variable.
In the table above we have the observed frequencies (six of them). Now we must calculate the expected frequency for each of the six cells. For two‐variable chi‐square we find the expected frequencies with the formula:
Expected Frequency for a Cell = (Column Total X Row Total)/Grand Total
In the table above we can see that the Column Totals are 14 (small), 15 (medium), and 7 (large), while the Row Totals are 30 (female) and 6 (male). The grand total is 36.
Using the formula we can thus find the expected frequency for each cell.
1. The expected frequency for the small female cell is 14 × 30/36 = 11.667
2. The expected frequency for the medium female cell is 15 × 30/36 = 12.500
3. The expected frequency for the large female cell is 7 × 30/36 = 5.833
4. The expected frequency for the small male cell is 14 × 6/36 = 2.333
5. The expected frequency for the medium male cell is 15 × 6/36 = 2.500
6. The expected frequency for the large male cell is 7 × 6/36 = 1.167
We can put these expected frequencies in our table and also include the values for (O - E)²/E. The sum of all these is the value of chi-square.
Observed frequencies, expected frequencies, and (O - E)²/E for males and females from small, medium, and large cities
          Small                           Medium                          Large                           Totals
          Observed  Expected  (O-E)²/E    Observed  Expected  (O-E)²/E    Observed  Expected  (O-E)²/E
Female    10        11.667    0.238       14        12.500    0.180       6         5.833     0.005       30
Male      4         2.333     1.191       1         2.500     0.900       1         1.167     0.024       6
Totals    14                              15                              7                               36
From the table we can see that:
χ² = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538
and df = (C ‐ 1)(R ‐ 1) = (3 ‐ 1)(2 ‐ 1) = (2)(1) = 2
We now have the information we need to complete the six step process for testing statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research question.
H0: Hometown size is independent of gender. H1: Hometown size is not independent of gender.
2. Set the alpha level. (α = .05)
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
χ² = 2.538
df = (C - 1)(R - 1) = (2)(1) = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi‐square, with an
alpha level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table
F and noting the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision. Fail to reject H0
Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail to reject the null hypothesis and are unable to accept the alternative hypothesis.
6. Write a statement of results in standard English. There is not a significant difference in the frequencies with which males come from small, medium, or large towns as compared with females. There is no evidence that hometown size depends on gender.
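The expected-frequency rule (column total × row total / grand total) and the chi-square sum can be sketched in Python for any R × C table. This is an illustration; the function name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square, df, expected table) for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency for a cell = (column total x row total) / grand total
    expected = [[ct * rt / grand_total for ct in col_totals] for rt in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df, expected


# Rows: Female, Male; columns: Small, Medium, Large hometown
observed = [[10, 14, 6],
            [4, 1, 1]]
chi2, df, expected = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
print("Reject H0" if chi2 >= 5.991 else "Fail to reject H0")
```

Without intermediate rounding the statistic comes out at about 2.537; the 2.538 in the text results from adding the rounded cell contributions. Either way it is below 5.991.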
Chi-square is a useful non-parametric statistic for evaluating statistical hypotheses involving the frequencies with which observations fall into various categories (nominal data).
7.4 Assignment
7.4.1 Assignment 7.1
is the formula for
1. the dependent t‐test. 2. the independent t‐test. 3. the one‐way analysis of variance test. 4. the Scheffe post hoc test.
is the formula for
1. the dependent t‐test. 2. the independent t‐test. 3. the one‐way analysis of variance test. 4. the Scheffe post hoc test.
is the formula for
1. the dependent t‐test. 2. the independent t‐test. 3. the one‐way analysis of variance test. 4. the Scheffe post hoc test.
For the following research problem - You are concerned with the effect of computers on the quality of written language. You randomly place the 30 students in your English class into two groups of 15 each. The first group is asked to write their next English theme assignment using a word processing program on a computer, while the other group is asked to write their themes by hand. You ask another English teacher to read all 30 themes and rate each one from 1 (poorest) to 10 (best) on the quality of its English usage. You want to know if there is a significant difference in the quality ratings of the two groups.
What is the proper statistical test to use with this research problem?
1. the dependent t‐test 2. the independent t‐test 3. the one‐sample t‐test 4. the one‐way analysis of variance test
For the following research problem ‐ The number of hours a subject could stay awake was measured as a function of the dose level of a particular drug. Three levels of drug dosage were used. Analyze the results for the data on the dependent variable (number of hours awake) to determine if there was a significant difference among the three levels of drug dosage used.
What is the proper statistical test to use with this research problem?
1. the dependent t‐test 2. the independent t‐test 3. the one‐sample t‐test 4. the one‐way analysis of variance test
7.4.2 Assignment 7.2.
1. An industrial psychologist is interested in evaluating four different types of training on worker productivity. Using a standard measure of productivity, the psychologist measures the productivity of a set of workers who have been trained using each one of the four procedures. Using the data below, determine whether there is a significant difference between the training methods. Larger numbers on the dependent variable indicate higher productivity.
Productivity Scores for Four Groups of Workers Trained by Different Methods
Group 1 (On the Job)   Group 2 (Computer Assisted)   Group 3 (Lecture)   Group 4 (Videotape)
67 68 46 37
68 62 39 46
61 59 38 49
62 71 47 48
60 60 46 49
56 66 49 53
1. H0:
2. H1:
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical Value for F =
8. State conditions under which you would reject H0:
2. A school guidance counselor investigates the influence of different motivational devices on the academic achievement of students. The counselor arranges for one group of students to receive immediate feedback upon the completion of an English assignment. A second group of students receives feedback at the end of the day, while a third group receives feedback at the end of the week. Using the students' grades on a standardized English test, determine whether there is a significant difference between the groups. If necessary, perform Scheffe tests.
English Test Results for Groups of Students Receiving Various Types of Feedback
No   Group 1 (Immediate Feedback)   Group 2 (Day's End Feedback)   Group 3 (Week's End Feedback)
1    49                             40                             36
2    40                             37                             32
3    41                             42                             31
4    46                             39                             39
5    42                             45                             40
6    50                             39                             39
7    53                             45                             41
8    51                             49                             38
1. H0:
2. H1:
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical Value for F =
8. State conditions under which you would reject H0:
7.4.3 Assignment 7.3
For each of the following problems, state the null hypothesis, the alternate hypothesis, the calculated value of the statistic, the critical value of the statistic, and the conditions under which you would reject the null hypothesis.
1. A sample of 100 people is classified by social club membership and academic status. Is belonging to a social club independent of academic status?
Academic Classification and Social Club Membership for 100 People
Academic Classification   Belong to Social Club   Do Not Belong to Social Club
Freshman                  9                       16
Sophomore                 11                      14
Junior                    16                      9
Senior                    19                      6
1. H0:
2. H1:
3. χ² =
4. Critical Value for χ² =
5. State conditions under which you would reject H0
2. A consumer‐research group asked 100 men to use each of three kinds of after‐shave lotion for one month. After the trial period, each man indicated the lotion he preferred. Using the results below, determine whether there is a significant preference for any of the three after‐shave lotions.
Number of Men Preferring Each of Three After-Shave Lotions
Lotion   Number of Men Preferring
1        42
2        36
3        22
1. H0:
2. H1:
3. χ² =
4. Critical Value for χ² =
5. State conditions under which you would reject H0
is the formula for
1. the dependent t-test. 2. the independent t-test. 3. the one-way analysis of variance test. 4. the chi-square test.
is the formula for
1. the dependent t-test. 2. the independent t-test. 3. the one-way analysis of variance test. 4. the chi-square test.
is the formula for
1. the dependent t-test. 2. the independent t-test. 3. the one-way analysis of variance test. 4. the Scheffe post hoc test.
For the following research problem ‐ You are interested in knowing whether or not the composition of a family is related to the type of vacations they like to take. Accordingly, you collect the following data from a survey of preferred vacations:
Frequencies with which families of various types prefer various vacation types
                     Family Type
Vacation             No Children   Children Less Than 5   Children 5-10
Visit Relatives      0             15                     5
Go to Beach          5             5                      10
Urban Sightseeing    15            0                      5
What is the proper statistical test to use with this research problem?
1. the dependent t‐test 2. the independent t‐test 3. the one‐way analysis of variance test 4. the chi‐square test
For the following research problem ‐ Is it really true that people with graduate degrees in certain fields earn substantially less money than people with graduate degrees in certain other fields? To answer this question, you look at data collected by Yuppie University on the salaries earned by recent graduate and professional students.
Salaries for recent graduates of Yuppie University by field of study
Engineering PhD Humanities PhD Education PhD J.D. M.D.
$40,000 $22,000 $25,000 $40,000 $50,000
$28,000 $24,000 $27,000 $35,000 $43,000
$32,000 $28,000 $31,000 $33,000 $33,000
$36,000 $24,000 $24,000 $36,000 $39,000
$30,000 $27,000 $38,000 $50,000
$32,000
What is the proper statistical test to use with this research problem?
1. the dependent t‐test 2. the independent t‐test 3. the one‐way analysis of variance test 4. the chi‐square test
Practicum: MATH11002 Business Statistics
MODULE 8
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: WIB In ____________________
I hereby declare that I have completed all the work in this module myself. Name/NIM : ______________________________/_______________ Signature : _______________________________________________ Rem.:
Module Description: Regression Analysis
Objective: Students understand and are able to use regression analysis to predict the value of a dependent variable based on an independent variable; interpret the meaning of the regression coefficients; make inferences about the slope and the correlation coefficient; and estimate mean values and predict individual values using MS Excel Regression Analysis or other statistical software.
Output: A report of simple regression analysis, produced by the students in the form of working procedures and results, in both softcopy and hardcopy.
8 Regression Analysis
8.1 Simple Regression Analysis The field of econometrics uses regression analysis to create quantitative models that can be used to predict the value of a series if one knows the values of several other variables. This analysis tool performs linear regression analysis by using the "least squares" method to fit a line through a set of observations. You can analyze how a single dependent variable is affected by the values of one or more independent variables. For example, the wage per hour can be predicted if one knows the values of the variables that constitute the regression equation. This is a big leap beyond a correlation or a confidence interval estimate. In a correlation, the statistician does not presume or imply any causality. Regression analysis, on the other hand, is used so often (and probably abused) because of its supposed ability to link cause and effect. Skepticism about causal relationships is not only healthy but important, because the real power of regression lies in a comprehensive interpretation of the results.
8.2 Regression Analysis Using Excel Before using any analysis tool, you must arrange the data you want to analyze in columns or rows on your worksheet. This will be your input range. Once the data is set, you can open the analysis tool, in this case Regression: Tools -> Data Analysis… -> Regression
8.3 Regression Dialog Box
Input Y Range – Enter the reference for the range of dependent data. The range must consist of a single column of data. You can type in the reference or use the Collapse Dialog button, which collapses the window so that you can select the data you wish to use on the worksheet. Once you have chosen your data, either press Enter or click the Expand Dialog button.
Input X Range – Enter the reference for the range of independent data. Microsoft Excel orders
independent variables from this range in ascending order from left to right. The maximum number of
independent variables is 16.
Labels – Select if the first row or column of your input range or ranges contains labels. Clear if your
input has no labels; Excel generates appropriate data labels for the output table.
Confidence Level – Select to include an additional level in the summary output table. In the box, enter
the confidence level you want applied in addition to the default 95 percent level.
Constant is Zero – Select to force the regression line to pass through the origin.
Output Range – Enter the reference for the upper-left cell of the output table. Allow at least seven columns for the summary output table, which includes an ANOVA table, coefficients, standard error of the y estimate, R² values, number of observations, and standard errors of the coefficients.
New Worksheet Ply – Click to insert a new worksheet in the current workbook and paste the results
starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.
New Workbook – Click to create a new workbook and paste the results in the new workbook.
Residuals – Select to include residuals in the residuals output table.
Standardized Residuals – Select to include standardized residuals in the residuals output table.
Residual Plots – Select to generate a chart for each independent variable versus the residual.
Line Fit Plots – Select to generate a chart for predicted values versus the observed values.
Normal Probability Plots – Select to generate a chart that plots normal probability.
8.4 Simple Regression
8.5 Linear Correlation and Regression Analysis In this section the objective is to see whether there is a correlation between two variables and to find a model that predicts one variable in terms of the other. There are many possible examples; here we use a popular one from the world of business. Usually the independent variable is denoted by the letter x and the dependent variable by the letter y. A businessman would like to see whether there is a relationship between the number of cases of soda sold and the temperature on a hot summer day, based on information from the past. He would also like to estimate the number of cases of soda that will be sold on a particular hot summer day at a ball game. He recorded the temperatures and the number of cases of soda sold on those days. The following table shows the recorded data from June 1 through June 13. The weatherman predicts a temperature of 94°F for June 14. The businessman would like to meet all demand for cases of soda ordered by customers on June 14.
DAY Cases of Soda Temperature
1‐Jun 57 56
2‐Jun 59 58
3‐Jun 65 63
4‐Jun 67 66
5‐Jun 75 73
6‐Jun 81 78
7‐Jun 86 85
8‐Jun 88 85
9‐Jun 88 87
10‐Jun 84 84
11‐Jun 82 88
12‐Jun 80 84
13‐Jun 83 89
Now let's use Excel to find the linear correlation coefficient and the regression line equation. The linear correlation coefficient is a quantity between -1 and +1, denoted here by R. The closer R is to +1, the stronger the positive (direct) correlation; similarly, the closer R is to -1, the stronger the negative (inverse) correlation between the two variables. The general form of the regression line is y = mx + b, where m is the slope of the line and b is the y-intercept. You can find these quantities in the Excel output. In this situation the dependent variable y is the number of cases of soda and the independent variable x is the temperature. To obtain the Excel output, take the following steps:
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When Data Analysis dialog box appears, click on correlation.
Step 3. When the Correlation dialog box appears, enter B1:C14 in the Input Range box. Check Labels in First Row and enter A16 in the Output Range box. Click OK.
                 Cases of Soda   Temperature
Cases of Soda    1
Temperature      0.966598577     1
As you see the correlation between the number of cases of soda demanded and the temperature is a very strong positive correlation. This means as the temperature increases the demand for cases of soda is also increasing. The linear correlation coefficient is 0.966598577 which is very close to +1.
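As a cross-check outside Excel, the Pearson correlation coefficient can be computed directly from its definition, r = Sxy / sqrt(Sxx × Syy). This Python sketch is illustrative; the function name `pearson_r` is hypothetical.

```python
from math import sqrt

# Observed data, June 1 to June 13
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]        # y: cases of soda
temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]  # x: temperature (°F)


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(temperature, cases)
print(f"r = {r:.9f}")
```

In the worksheet itself, the function =CORREL(B2:B14, C2:C14) should give the same value as the Correlation tool.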
Now let's follow similar steps, with a few differences, to find the regression equation.
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When the Data Analysis dialog box appears, click on Regression.
Step 3. When the Regression dialog box appears, enter B1:B14 in the Input Y Range box and C1:C14 in the Input X Range box. Check Labels.
Step 4. Enter A19 in the Output Range box and click OK.
Note: The regression equation in general should look like Y=m X + b. In this equation m is the slope of the regression line and b is its y‐intercept.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.966598577
R Square 0.934312809
Adjusted R Square 0.928341246
Standard Error 2.919383191
Observations 13
ANOVA
df SS MS F Significance F
Regression 1 1333.479989 1333.479989 156.4603497 7.58511E‐08
Residual 11 93.75078034 8.522798213
Total 12 1427.230769
Coefficients Standard Error t Stat P‐value Lower 95% Upper 95%
Intercept 9.17800767 5.445742836 1.685354587 0.120044801 ‐2.80799756 21.16401
Temperature 0.879202711 0.07028892 12.50841116 7.58511E‐08 0.724497763 1.033908
The relationship between the number of cases of soda and the temperature is:
Y = 0.879202711 X + 9.17800767
That is, the number of cases of soda = 0.879202711 × (Temperature) + 9.17800767. Using this expression, we can approximately predict the number of cases of soda needed on June 14. The weather forecast for that day is 94°F, so the number of cases of soda needed is approximately 0.879202711 × 94 + 9.17800767 = 91.82, or about 92 cases.
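The regression output above can likewise be reproduced with the least-squares formulas m = Sxy / Sxx and b = ybar - m·xbar. This Python sketch (names are illustrative) also recomputes R Square and the standard error from the residual and total sums of squares, using n - 2 degrees of freedom for simple regression, and then evaluates the June 14 forecast at 94°F.

```python
import math

# June 1-13 data from the table above
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]   # y
temps = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]   # x (degrees F)


def fit_line(x, y):
    """Least-squares slope m and intercept b of y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(temps, cases)

# Goodness of fit: R Square = 1 - SSE/SST, standard error = sqrt(SSE/(n - 2))
n = len(cases)
mean_cases = sum(cases) / n
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(temps, cases))
sst = sum((y - mean_cases) ** 2 for y in cases)
r_square = 1 - sse / sst
std_error = math.sqrt(sse / (n - 2))

forecast_june14 = m * 94 + b   # predicted cases for a 94 F day
print(f"Y = {m:.9f} X + {b:.8f}")
print(f"R Square = {r_square:.6f}, Standard Error = {std_error:.6f}")
print(f"June 14 forecast: {forecast_june14:.2f} cases")
```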
Assignment 8.1 Regression Analysis:
The highway deaths per 100 million vehicle miles and the highway speed limits for 10 countries are given below:
(Death, Speed) = (3.0, 55), (3.3, 55), (3.4, 55), (3.5, 70), (4.1, 55), (4.3, 60), (4.7, 55), (4.9, 60), (5.1, 60), and (6.1, 75).
From these data we can see that the five countries with the same speed limit (55) occupy very different positions on the safety list. For example, Britain ... with a speed limit of 70 is demonstrably safer than Japan, at 55. Can we argue that speed has little to do with safety? Use regression analysis to answer this question.
Practicum: MATH11002 Business Statistics
MODULE 9
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: WIB In ____________________
I hereby declare that I have completed the work in this module myself. Name/NIM : ______________________________/_______________ Signature : _______________________________________________ Rem.:
Module Description: MULTIPLE REGRESSION
Objective:
* How to develop a multiple regression model
* How to interpret the regression coefficients
* How to determine which independent variables are most important in predicting a dependent variable
* How to use quadratic terms in a regression model
* How to measure the correlation among independent variables
Output: A report of the multiple regression analysis, produced by the students in the form of working procedures and results, in both softcopy and hardcopy.
9 Multiple Regression Model
Multiple regression is an extension of simple regression. Simple regression has only one independent (explanatory) variable; multiple regression fits a model for one dependent (response) variable based on more than one independent (explanatory) variable.
9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN
We then create a new variable in cells C2:C6, cubed household size, to use as a regressor, and in cell C1 give it the heading CUBED HH SIZE. (It turns out that for these data the squared HH SIZE has a coefficient of exactly 0.0, so the cube is used instead.)
The spreadsheet cells A1:C6 should look like:
We run a regression with an intercept and the regressors HH SIZE and CUBED HH SIZE.
The population regression model is y = β1 + β2 x2 + β3 x3 + u. It is assumed that the error u is independent with constant variance (homoskedastic); see 9.10 EXCEL LIMITATIONS below.
We wish to estimate the regression line: y = b1 + b2 x2 + b3 x3
We do this using the Data analysis Add-in and Regression.
The only change over one-variable regression is to include more than one column in the Input X Range. Note, however, that the regressors need to be in contiguous columns (here columns B and C). If this is not the case in the original data, then columns need to be copied to get the regressors in contiguous columns.
Clicking OK, we obtain the regression output.
The regression output has three components:
* Regression statistics table
* ANOVA table
* Regression coefficients table
9.2 INTERPRET REGRESSION STATISTICS TABLE
The regression statistics table is shown below. Of greatest interest is R Square.
Statistic            Value      Explanation
Multiple R           0.895828   R = square root of R2
R Square             0.802508   R2
Adjusted R Square    0.605016   Adjusted R2, used if there is more than one x variable
Standard Error       0.444401   Sample estimate of the standard deviation of the error u
Observations         5          Number of observations used in the regression (n)
The above gives the overall goodness-of-fit measures: R2 = 0.8025. The correlation between y and y-hat is 0.8958 (which, when squared, gives 0.8025). Adjusted R2 = R2 - (1-R2)*(k-1)/(n-k) = 0.8025 - 0.1975*2/2 = 0.6050, where n = 5 is the number of observations and k = 3 is the number of regressors including the intercept.
The standard error here refers to the estimated standard deviation of the error term u. It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)) = sqrt(0.3950/2) = 0.4444. It is not to be confused with the standard error of y itself (from descriptive statistics) or with the standard errors of the regression coefficients given below.
R2 = 0.8025 means that 80.25% of the variation of yi around ybar (its mean) is explained by the regressors x2i and x3i.
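The regression statistics follow directly from the Residual SS (0.3950) and Total SS (2.0) reported in the ANOVA table of section 9.3. A minimal Python sketch, assuming those rounded table values (so results agree with Excel to about four decimals), is:

```python
import math

# Values read from the ANOVA table of this example (rounded)
n, k = 5, 3              # observations; regressors including the intercept
residual_ss = 0.3950     # SSE
total_ss = 2.0           # SST

r_square = 1 - residual_ss / total_ss                       # R Square
multiple_r = math.sqrt(r_square)                            # Multiple R
adj_r_square = r_square - (1 - r_square) * (k - 1) / (n - k)
std_error = math.sqrt(residual_ss / (n - k))                # standard error of the regression

print(round(r_square, 4), round(multiple_r, 4), round(adj_r_square, 4), round(std_error, 4))
```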
9.3 INTERPRET ANOVA TABLE
The ANOVA table is given below. It is often skipped.
df SS MS F Significance F
Regression 2 1.6050 0.8025 4.0635 0.1975
Residual 2 0.3950 0.1975
Total 4 2.0
The ANOVA (analysis of variance) table splits the sum of squares into its components.
Total sum of squares = Residual (or error) sum of squares + Regression (or explained) sum of squares.
Thus $\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2$,
where $\hat{y}_i$ is the value of $y_i$ predicted from the regression line and $\bar{y}$ is the sample mean of y.
For example, R2 = 1 - Residual SS / Total SS (the general formula for R2) = 1 - 0.3950 / 2.0 (from the ANOVA table) = 0.8025, which equals the R2 given in the regression statistics table.
The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero. Aside: Excel computes F as F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [0.39498/2] = 4.0635.
The column labeled Significance F gives the associated p-value. Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05. Note: in general, Significance F = FDIST(F, k-1, n-k), where k is the number of regressors including the intercept. Here FDIST(4.0635, 2, 2) = 0.1975.
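The F statistic and its p-value can be checked by hand. Because both degrees of freedom equal 2 in this example, the F(2, 2) upper-tail probability has the closed form 1/(1 + F); the sketch below relies on that special case and is not a general F-distribution routine (use FDIST in Excel, or a statistics library, for other degrees of freedom).

```python
# Values from the ANOVA table above
n, k = 5, 3
regression_ss = 1.6050
residual_ss = 0.39498   # unrounded value used in the F computation above

df_reg = k - 1          # numerator degrees of freedom
df_res = n - k          # denominator degrees of freedom
f_stat = (regression_ss / df_reg) / (residual_ss / df_res)


def f_sf_2_2(f):
    """Upper-tail probability P(F > f) for an F(2, 2) random variable.

    The closed form 1 / (1 + f) is valid ONLY for 2 and 2 degrees of freedom.
    """
    return 1.0 / (1.0 + f)


assert df_reg == 2 and df_res == 2   # the closed form above requires this
p_value = f_sf_2_2(f_stat)
print(f"F = {f_stat:.4f}, Significance F = {p_value:.4f}")
```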
9.4 INTERPRET REGRESSION COEFFICIENTS TABLE
The regression output of most interest is the following table of coefficients and associated output:
Coefficient St. error t Stat P‐value Lower 95% Upper 95%
Intercept 0.89655 0.76440 1.1729 0.3616 ‐2.3924 4.1855
HH SIZE 0.33647 0.42270 0.7960 0.5095 ‐1.4823 2.1552
CUBED HH SIZE 0.00209 0.01311 0.1594 0.8880 ‐0.0543 0.0585
Let βj denote the population coefficient of the jth regressor (intercept, HH SIZE and CUBED HH SIZE). Then:
* Column "Coefficient" gives the least squares estimates bj of βj.
* Column "Standard error" gives the standard errors (i.e., the estimated standard deviations) of the least squares estimates bj of βj.
* Column "t Stat" gives the computed t-statistic for H0: βj = 0 against Ha: βj ≠ 0. This is the coefficient divided by its standard error. It is compared to a t distribution with (n - k) degrees of freedom, where here n = 5 and k = 3.
* Column "P-value" gives the p-value for the test of H0: βj = 0 against Ha: βj ≠ 0. This equals Pr{|t| > |t Stat|}, where t is a t-distributed random variable with n - k degrees of freedom and t Stat is the computed value of the t-statistic given in the previous column. Note that this p-value is for a two-sided test. For a one-sided test, divide this p-value by 2 (also checking the sign of the t Stat).
* Columns "Lower 95%" and "Upper 95%" define a 95% confidence interval for βj.
A simple summary of the above output is that the fitted line is
ŷ = 0.8966 + 0.3365 x2 + 0.0021 x3, where x2 = HH SIZE and x3 = CUBED HH SIZE.
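Each "t Stat" is the coefficient divided by its standard error, and each "P-value" is a two-sided t(n - k) tail probability. With n - k = 2, the t distribution has the closed-form CDF 1/2 + t/(2·sqrt(t² + 2)), which the sketch below uses; it is valid only for 2 degrees of freedom (Excel's TDIST(|t|, 2, 2) returns the same quantity).

```python
import math

# (coefficient, standard error) pairs copied from the coefficients table above
table = {
    "Intercept": (0.89655, 0.76440),
    "HH SIZE": (0.33647, 0.42270),
    "CUBED HH SIZE": (0.00209, 0.01311),
}
n, k = 5, 3
df = n - k
assert df == 2   # the closed-form p-value below is only valid for 2 degrees of freedom


def t2_two_sided_p(t):
    """P(|T| > |t|) for T ~ t(2), from the t(2) CDF 1/2 + t / (2*sqrt(t^2 + 2))."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


results = {}
for name, (coef, se) in table.items():
    t_stat = coef / se                      # "t Stat" column
    results[name] = (t_stat, t2_two_sided_p(t_stat))
    print(f"{name:14s} t Stat = {t_stat:7.4f}  P-value = {results[name][1]:.4f}")
```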
9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
95% confidence interval for slope coefficient β2 is from Excel output (-1.4823, 2.1552).
Excel computes this as b2 ± t_.025(2) × se(b2) = 0.33647 ± TINV(0.05, 2) × 0.42270 = 0.33647 ± 4.303 × 0.42270 = 0.33647 ± 1.8189 = (-1.4823, 2.1552).
Other confidence intervals can be obtained. For example, to find 99% confidence intervals, check the Confidence Level box in the Regression dialog box (in the Data Analysis Add-in) and set the level to 99%.
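The 95% and 99% intervals for β2 can be computed as b2 ± t × se(b2). The sketch below obtains the t(2) critical value by inverting the closed-form t(2) tail probability, so it applies only to 2 degrees of freedom; for other cases, use TINV or a statistics library. The helper name t2_critical is illustrative.

```python
import math

b2, se_b2 = 0.33647, 0.42270   # HH SIZE row of the coefficients table
df = 5 - 3
assert df == 2   # the closed-form critical value below is only valid for df = 2


def t2_critical(alpha):
    """Two-sided critical value t with P(|T| > t) = alpha for T ~ t(2).

    Solves alpha = 1 - t / sqrt(t^2 + 2) for t; matches Excel's TINV(alpha, 2).
    """
    c = 1.0 - alpha
    return c * math.sqrt(2.0 / (1.0 - c * c))


intervals = {}
for level in (0.95, 0.99):
    t_crit = t2_critical(1.0 - level)
    half_width = t_crit * se_b2
    intervals[level] = (b2 - half_width, b2 + half_width)
    print(f"{level:.0%}: t = {t_crit:.3f}, "
          f"CI = ({intervals[level][0]:.4f}, {intervals[level][1]:.4f})")
```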
9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
The coefficient of HH SIZE has estimated standard error of 0.4227, t-statistic of 0.7960 and p-value of 0.5095. It is therefore statistically insignificant at significance level α = .05 as p > 0.05.
The coefficient of CUBED HH SIZE has estimated standard error of 0.0131, t-statistic of 0.1594 and p-value of 0.8880. It is therefore statistically insignificant at significance level α = .05 as p > 0.05.
There are 5 observations and 3 regressors (the intercept, HH SIZE and CUBED HH SIZE), so we use t(5-3) = t(2). For example, for HH SIZE p = TDIST(0.796, 2, 2) = 0.5095.
9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER
Here we test whether HH SIZE has coefficient β2 = 1.0.
Example: H0: β2 = 1.0 against Ha: β2 ≠ 1.0 at significance level α = .05.
Then t = (b2 - H0 value of β2) / (standard error of b2 ) = (0.33647 - 1.0) / 0.42270 = -1.569.
9.7.1 Using the p-value approach
p‐value = TDIST(1.569, 2, 2) = 0.257. [Here n=5 and k=3 so n‐k=2]. Do not reject the null hypothesis at level .05 since the p‐value is > 0.05.
9.7.2 Using the critical value approach
We computed t = -1.569. The critical value is t_.025(2) = TINV(0.05, 2) = 4.303. [Here n = 5 and k = 3, so n - k = 2.] So we do not reject the null hypothesis at level .05, since |t| = 1.569 < 4.303.
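Both approaches in 9.7.1 and 9.7.2 can be scripted together. This sketch uses the same df = 2 closed forms as above (written out again so the block is self-contained); it gives t ≈ -1.57, p ≈ 0.257 and a critical value of about 4.303, so neither approach rejects H0.

```python
import math

b2, se_b2 = 0.33647, 0.42270   # estimate and standard error of HH SIZE
beta2_h0 = 1.0                  # value of beta2 under H0
alpha = 0.05
df = 5 - 3
assert df == 2   # the closed-form t(2) formulas below are only valid for df = 2

t_stat = (b2 - beta2_h0) / se_b2

# p-value approach: two-sided p-value, like TDIST(|t|, 2, 2)
p_value = 1.0 - abs(t_stat) / math.sqrt(t_stat ** 2 + 2.0)

# critical value approach: t_.025(2), like TINV(0.05, 2)
c = 1.0 - alpha
t_crit = c * math.sqrt(2.0 / (1.0 - c * c))

reject_by_p_value = p_value < alpha
reject_by_critical_value = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, critical value = {t_crit:.3f}")
print("Reject H0" if reject_by_p_value else "Do not reject H0")
```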
9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS
We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
From the ANOVA table, the F-test statistic is 4.0635 with a p-value of 0.1975. Since the p-value is not less than 0.05, we do not reject the null hypothesis that the slope parameters β2 and β3 are both zero at significance level 0.05. We conclude that the parameters are jointly statistically insignificant at significance level 0.05.
Note: in general, Significance F = FDIST(F, k-1, n-k), where k is the number of regressors including the intercept. Here FDIST(4.0635, 2, 2) = 0.1975.
9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS
Consider the case where x2 = HH SIZE = 4, in which case x3 = CUBED HH SIZE = 4^3 = 64. Then
ŷ = b1 + b2 x2 + b3 x3 = 0.8966 + 0.3365×4 + 0.0021×64 = 2.3770
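The prediction can be written as a small function of household size; the function name predict is illustrative. It uses the unrounded coefficients from the coefficients table, which give about 2.376 rather than the 2.377 obtained with the four-decimal coefficients.

```python
# Estimated coefficients b1 (intercept), b2 (HH SIZE), b3 (CUBED HH SIZE)
b1, b2, b3 = 0.89655, 0.33647, 0.00209


def predict(hh_size):
    """Predicted y for household size x2, with x3 = x2 cubed."""
    x2 = hh_size
    x3 = hh_size ** 3
    return b1 + b2 * x2 + b3 * x3


y_hat = predict(4)
print(round(y_hat, 4))
```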
9.10 EXCEL LIMITATIONS
Excel restricts the number of regressors (only up to 16 regressors).
Excel requires that all the regressor variables be in adjoining columns, so you may need to move columns to ensure this. For example, if the regressors are in columns B and D, you need to copy at least one of columns B and D so that they are adjacent to each other.
Excel standard errors, t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic). Excel does not provide alternatives, such as heteroskedasticity-robust or autocorrelation-robust standard errors, t-statistics and p-values.
9.11 Assignment 9.1 DATA:
Store Bars Sold Price (cents) Promotion ($) Store Bars sold Price (cents) Promotion ($)
1 4141 59 200 18 2730 79 400
2 3842 59 200 19 2618 79 400
3 3056 59 200 20 4421 79 400
4 3519 59 200 21 4113 79 600
5 4226 59 400 22 3746 79 600
6 4630 59 400 23 3532 79 600
7 3507 59 400 24 3825 79 600
8 3754 59 400 25 1096 99 200
9 5000 59 600 26 761 99 200
10 5120 59 600 27 2088 99 200
11 4011 59 600 28 820 99 200
12 5015 59 600 29 2114 99 400
13 1916 79 200 30 1882 99 400
14 675 79 200 31 2159 99 400
15 3636 79 200 32 1602 99 400
16 3224 79 200 33 3354 99 600
17 2295 79 400 34 2927 99 600
A sample of 34 stores in a supermarket chain is selected for a test-market study of OmniPower. All the stores selected have approximately the same monthly sales volume. The dependent variable is the number of bars sold (Y), and the two independent variables are the price of a bar in cents (X1) and the monthly promotional (advertising) expenditures in dollars (X2).
a. Use Excel Data Analysis – Regression to estimate the regression line
b. Interpret regression statistics table
c. Construct 95% and 99% confidence intervals for the regression coefficients
d. Test Hypothesis Of Zero Slope Coefficient ("Test Of Statistical Significance")
e. Test Hypothesis On A Regression Parameter
i. Using The P‐Value Approach
ii. Using The Critical Value Approach
f. Overall Test Of Significance Of The Regression Parameters
g. Predicted value of Y given a price of 89 cents and a promotion expenditure of $800
Practicum: MATH11002 Business Statistics
MODULE 10
Date of Receipt
Score: Assistant Signature
Submitted only on Day/Date: ____________ / ______________ Time: WIB In ____________________
I hereby declare that I have completed the work in this module myself. Name/NIM : ______________________________/_______________ Signature : _______________________________________________ Rem.:
Module Description: TIME SERIES FORECASTING
Objective:
* Discuss the importance of forecasting
* Perform smoothing of a data series
* Describe least-squares trend fitting and forecasting
* Address time-series forecasting
* Address autoregressive models
* Describe the procedure for choosing appropriate models
Output: A report produced by the students in the form of working procedures and results, in both softcopy and hardcopy.
10 Time Series Forecasting
Time series analysis has two main goals:
* Identifying the nature of a sequence of observations.
* Predicting future values using historical observations (also known as forecasting).
In Time Series analysis, it is assumed that the data consists of a systematic pattern, and also random
noise that makes the pattern difficult to identify. Most time series analysis techniques use filtering to
remove the data noise. There are two general components of Time series patterns: Trend and
Seasonality. The trend is a linear or non‐linear component, and does not repeat within the time range.
The Seasonality repeats itself in systematic intervals over time. These two components are often both
present in real data.
Trend Analysis
Trend analysis is a technique used to identify a trend component in time series data. In many
cases data can be approximated by a linear function, but logarithmic, exponential, and
polynomial functions can also be used.
Regression Analysis
Regression analysis is the study of relationships among variables, and its purpose is to predict, or
estimate, the value of one variable from the known values of other variables related to it. Any
method of fitting equations to data may be called regression, and these equations are useful for
making predictions, and judging the strength of relationships.
Forecasting and extrapolation from present values to future values is not a function of regression analysis. To predict the future, time series analysis is used. To predict values, it is necessary to find a predictive function that minimizes the distances between each of the points and the predictive function itself. The least-squares method is the most common approach: it chooses the function that minimizes the sum of squared deviations between the points and the estimated function.
10.1 Time series forecasting models
The basic assumption of time-series forecasting is that the factors that have influenced activities in the past and present will continue to do so in approximately the same way in the future. A trend is an overall long-term upward or downward movement in a time series. The most basic model is the classical multiplicative model for annual, quarterly, and monthly data.
10.1.1 CLASSICAL MULTIPLICATIVE TIME-SERIES MODEL FOR ANNUAL DATA
Yi = Ti x Ci x Ii
where:
Ti = value of the trend component in year i
Ci = value of the cyclical component in year i
Ii = value of the irregular component in year i
CLASSICAL MULTIPLICATIVE TIME-SERIES MODEL FOR DATA WITH A SEASONAL COMPONENT
Yi = Ti x Si x Ci x Ii
where:
Ti, Ci, Ii = value of the trend, cyclical, and irregular components in time period i
Si = value of the seasonal component in time period i
Use the Wrigley coded data below to create an Excel chart of actual gross revenue ($ millions).
Year Actual Revenue Year Actual Revenue
1984 591 1995 1770
1985 620 1996 1851
1986 699 1997 1954
1987 781 1998 2023
1988 891 1999 2079
1989 993 2000 2146
1990 1111 2001 2430
1991 1149 2002 2746
1992 1301 2003 3069
1993 1440 2004 3649
1994 1661 2005 4159
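The trendline on the Wrigley revenue chart (y = 143.63x - 284695, R² = 0.9121) is an ordinary least-squares line of revenue on calendar year. A minimal Python sketch that refits it from the table above, assuming the trendline uses the actual revenue exactly as tabulated, is shown here; it also refits with coded years (X = 0 for 1984), which changes the intercept but not the slope. The trend equations in section 10.4 are fitted to real (adjusted) gross revenues, so their coefficients differ from these.

```python
years = list(range(1984, 2006))
revenue = [591, 620, 699, 781, 891, 993, 1111, 1149, 1301, 1440, 1661,
           1770, 1851, 1954, 2023, 2079, 2146, 2430, 2746, 3069, 3649, 4159]


def least_squares_line(x, y):
    """Slope and intercept minimizing the sum of squared vertical deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# X = calendar year, as on the chart's horizontal axis
slope_year, intercept_year = least_squares_line(years, revenue)

# X = coded year (0 for 1984): same slope, different intercept
codes = [year - 1984 for year in years]
slope_code, intercept_code = least_squares_line(codes, revenue)

print(f"y = {slope_year:.2f}x + ({intercept_year:.0f})")
print(f"coded: Y-hat = {intercept_code:.2f} + {slope_code:.2f} X")
```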
10.1.2 Assignment 10.1
Year Population (thousands) Workforce (thousands) Year Population (thousands) Workforce (thousands)
1984 176,383 113,544 1995 198,584 132,304
1985 178,206 115,461 1996 200,591 133,943
1986 180,587 117,834 1997 203,133 136,297
1987 182,753 119,865 1998 205,220 137,673
1988 184,613 121,669 1999 207,753 139,368
1989 186,393 123,869 2000 212,577 142,583
1990 189,164 125,840 2001 215,092 143,734
1991 190,925 126,346 2002 217,570 144,863
1992 192,805 128,105 2003 221,168 146,510
1993 194,838 129,200 2004 223,357 146,817
1994 196,814 131,056 2005 226,082 147,956
a. Plot the time series for the US civilian noninstitutional population of people 16 years and older using Microsoft Excel.
b. Compute the linear trend forecasting equation.
c. Forecast the US civilian noninstitutional population of people 16 years and older for 2006 and 2007.
d. Repeat (a) through (c) for the US civilian noninstitutional workforce of people 16 years and older.
10.2 Moving Average and Exponential Smoothing
10.2.1 Moving Average Models
Use the Add Trendline option to analyze a moving average forecasting model in Excel. You must first create a graph of the time series you want to analyze: select the range that contains your data and make a scatter plot of the data. Once the chart is created, follow these steps:
[Figure: Wm. Wrigley Jr. Company Actual Revenue. Revenue ($ millions) versus Year, with linear trendline y = 143.63x - 284695, R² = 0.9121]
1. Click on the chart to select it, and click on any point on the line to select the data series. When you click on the chart to select it, a new option, Chart, is added to the menu bar.
2. From the Chart menu, select Add Trendline, choose the Moving Average type, and set the Period to the number of periods to average.
Moving averages for a chosen period of length L consist of a series of means computed over time, such that each mean is calculated for a sequence of L observed values. Moving averages are represented by the symbol MA(L). For example, suppose we have 11 years of data and want to compute five-year moving averages (L = 5).
Data for the 11-year period 1996 to 2006:
4.0 5.0 7.0 6.0 8.0 9.0 5.0 2.0 3.5 5.5 6.6
MA(5) = (Y1 + Y2 + Y3 + Y4 + Y5)/L = (4.0 + 5.0 + 7.0 + 6.0 + 8.0)/5 = 6.0
Place this moving average at the middle of the five-year window (the third value, 7.0). Calculating the remaining MA(5) values in the same way gives:
Revenue 4.0 5.0 7.0 6.0 8.0 9.0 5.0 2.0 3.5 5.5 6.6
MA ‐ ‐ 6.0 7.0 7.0 6.0 5.5 5.0 4.5 ‐ ‐
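The MA(5) calculation above can be automated as follows. This sketch handles only odd window lengths (as in the MA(3), MA(5) and MA(7) examples of this section) and marks positions without a full window as None; the last computed value is 4.52, which the table rounds to 4.5. The function name is illustrative.

```python
def centered_moving_average(values, length):
    """Centered moving average MA(L) for an odd window length L.

    Positions where a full window is not available are returned as None
    (shown as '-' or #N/A in the tables of this section).
    """
    if length % 2 == 0:
        raise ValueError("this sketch only handles odd window lengths")
    half = length // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / length
    return result


series = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.6]
ma5 = centered_moving_average(series, 5)
print([None if v is None else round(v, 2) for v in ma5])
```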
The following table shows three-year and seven-year moving averages for Cabot Corporation revenues ($ millions):
Year Revenue MA 3-Year MA 7-Year
1982 1588 #N/A #N/A
1983 1558 1633 #N/A
1984 1753 1573 #N/A
1985 1408 1490.3 1531.1
1986 1310 1380.7 1581.0
1987 1424 1470.3 1599.1
1988 1677 1679.3 1561.3
1989 1937 1766.3 1583.3
1990 1685 1703.3 1627.4
1991 1488 1578.3 1665.0
1992 1562 1556.3 1688.4
1993 1619 1622.7 1678.1
1994 1687 1715.7 1671.3
1995 1841 1797.7 1694.9
1996 1865 1781.0 1714.4
1997 1637 1718.3 1725.7
1998 1653 1663.0 1702.3
1999 1699 1683.3 1661.7
2000 1698 1640.0 1651.7
2001 1523 1592.7 1694.1
2002 1557 1625.0 1761.6
2003 1795 1762.0 #N/A
2004 1934 1951.3 #N/A
2005 2125 #N/A #N/A
[Figure: Moving Averages for Cabot Corporation Revenue. Revenues ($ millions) versus Year, showing Revenue, MA 3-Year and MA 7-Year]
10.2.2 Exponential Smoothing Models
The simplest way to analyze a time series using an exponential smoothing model in Excel is to use the Data Analysis tool. This tool works almost exactly like the one for Moving Average, except that you enter a damping factor, 1 - α, instead of the number of periods, k; the smoothing coefficient α corresponds to W in the formula below. Once you have entered the data range and the damping factor, and indicated what output you want and its location, the analysis is the same as for the Moving Average model.
COMPUTING AN EXPONENTIALLY SMOOTHED VALUE IN TIME PERIOD i
$E_1 = Y_1$
$E_i = W Y_i + (1 - W) E_{i-1}, \quad i = 2, 3, 4, \ldots$
where Ei = value of the exponentially smoothed series being computed in time period i, Ei-1 = value of the exponentially smoothed series already computed in time period i-1, Yi = observed value of the time series in period i, and W = subjectively assigned weight or smoothing coefficient (0 < W < 1).
Year Revenue ES(W=.50) ES(W=.25)
1982 1588 1588.0 1588.0
1983 1558 1573.0 1580.5
1984 1753 1663.0 1623.6
1985 1408 1535.5 1569.7
1986 1310 1422.8 1504.8
1987 1424 1423.4 1484.6
1988 1677 1550.2 1532.7
1989 1937 1743.6 1633.8
1990 1685 1714.3 1646.6
1991 1488 1601.1 1606.9
1992 1562 1581.6 1595.7
1993 1619 1600.3 1601.5
1994 1687 1643.6 1622.9
1995 1841 1742.3 1677.4
1996 1865 1803.7 1724.3
1997 1637 1720.3 1702.5
1998 1653 1686.7 1690.1
1999 1699 1692.8 1692.3
2000 1698 1695.4 1693.8
2001 1523 1609.2 1651.1
2002 1557 1583.1 1627.5
2003 1795 1689.1 1669.4
2004 1934 1811.5 1735.6
2005 2125 1968.3 1832.9
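The recursion E1 = Y1, Ei = W·Yi + (1 - W)·Ei-1 is easy to script. The following sketch applies it to the Cabot Corporation revenues with W = 0.50 and W = 0.25; rounded to one decimal, the results should match the table above. The function name is illustrative.

```python
def exponential_smoothing(values, w):
    """Return E1..En with E1 = Y1 and Ei = W*Yi + (1 - W)*E(i-1)."""
    if not 0 < w < 1:
        raise ValueError("the smoothing coefficient W must satisfy 0 < W < 1")
    smoothed = [float(values[0])]          # E1 = Y1
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


# Cabot Corporation revenue, 1982-2005, from the table above
revenue = [1588, 1558, 1753, 1408, 1310, 1424, 1677, 1937, 1685, 1488, 1562, 1619,
           1687, 1841, 1865, 1637, 1653, 1699, 1698, 1523, 1557, 1795, 1934, 2125]

es50 = exponential_smoothing(revenue, 0.50)
es25 = exponential_smoothing(revenue, 0.25)
for year, y, e50, e25 in zip(range(1982, 2006), revenue, es50, es25):
    print(f"{year} {y} {e50:.1f} {e25:.1f}")
```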
[Figure: Exponentially Smoothed Cabot Corporation Revenue. Revenues ($ millions) versus Year, showing Revenue, ES(W=.50) and ES(W=.25)]
10.3 Assignment 10.2
Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Deals 715 865 708 861 931 939 1031 893 735 759 1013 622
a. Plot the time series.
b. Fit a three-year moving average to the data and plot the results.
c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.
d. Repeat (c) using W = 0.25.
e. Compare the results of (c) and (d).
10.4 Linear, exponential and quadratic trend
10.4.1 Linear Trend Model
The linear trend model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is the simplest forecasting model.
Using the Wrigley data above, we use Microsoft Excel to plot the time series of real gross revenues shown below:
Performing a simple linear regression analysis in Microsoft Excel on the adjusted time series results in the following linear trend forecasting equation, where Xi is the coded year (X = 0 for 1984):
$\hat{Y}_i = 469.9158 + 62.1068 X_i$
The regression coefficients can be interpreted as follows:
The Y intercept, b0 = 469.9158, is the fitted trend value of real gross revenues for the base year 1984 (X = 0).
The slope, b1 = 62.1068, means that real gross revenues are estimated to increase by 62.1068 per year (in the units of the revenue series).
For example, to project the trend for 2006, substitute X23 = 22 (the code for 2006) into the linear trend forecasting equation:
$\hat{Y}_{23} = 469.9158 + 62.1068(22) = 1{,}836.265$
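Quadratic Trend Model
The quadratic trend model, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$, is the simplest nonlinear model. The quadratic trend forecasting equation is presented below:
$\hat{Y}_i = b_0 + b_1 X_i + b_2 X_i^2$, where b0 = estimated Y intercept, b1 = estimated linear effect on Y, and b2 = estimated quadratic effect on Y.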
For example, Microsoft Excel can be used to compute the quadratic trend forecasting equation. The results for the quadratic trend model used to forecast real gross revenues at the Wm. Wrigley Jr. Company give:
$\hat{Y}_i = 618.3211 + 17.5852 X_i + 2.1201 X_i^2$
To compute a forecast for 2006 using the quadratic trend equation, substitute X23 = 22 (the code for 2006) into the quadratic trend forecasting equation:
$\hat{Y}_{23} = 618.3211 + 17.5852(22) + 2.1201(22)^2 = 2{,}031.324$
10.4.2 Exponential Trend Model
The exponential trend model is $Y_i = \beta_0 \beta_1^{X_i} \varepsilon_i$, where $(\beta_1 - 1) \times 100\%$ is the annual compound growth rate (in %). Taking logarithms, the exponential trend forecasting equation is $\log(\hat{Y}_i) = b_0 + b_1 X_i$.
Excel's regression results for an exponential trend model for real gross revenues at the Wm. Wrigley Jr. Company give $\log(\hat{Y}_i) = 2.7647 + 0.0245 X_i$, where year 0 is 1984.
Compute the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ by taking the antilogs of the regression coefficients b0 and b1:
$\hat{\beta}_0 = 10^{2.7647} = 581.701$
$\hat{\beta}_1 = 10^{0.0245} = 1.058$
Thus, the exponential trend forecasting equation is:
$\hat{Y}_i = 581.701 (1.058)^{X_i}$
which corresponds to an estimated annual compound growth rate of (1.058 - 1) × 100% = 5.8%.
To forecast real gross revenues for 2006 (X23 = 22) using the above equation:
$\log(\hat{Y}_{23}) = 2.7647 + 0.0245(22) = 3.3037$
$\hat{Y}_{23} = 10^{3.3037} = 2{,}012.334$
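The three 2006 forecasts can be computed side by side from the coefficients reported in this section (coded year X = 22). This sketch simply evaluates the published equations; it does not refit the models, because the real gross revenue series they were fitted to is not reproduced in this module.

```python
x_2006 = 22  # coded year for 2006 (X = 0 corresponds to 1984)

# Linear trend: Y-hat = b0 + b1*X
linear = 469.9158 + 62.1068 * x_2006

# Quadratic trend: Y-hat = b0 + b1*X + b2*X^2
quadratic = 618.3211 + 17.5852 * x_2006 + 2.1201 * x_2006 ** 2

# Exponential trend: log10(Y-hat) = b0 + b1*X, so Y-hat = 10 ** (b0 + b1*X)
log_forecast = 2.7647 + 0.0245 * x_2006
exponential = 10 ** log_forecast

# Antilogs of the exponential-trend coefficients and the implied growth rate
beta0_hat = 10 ** 2.7647
beta1_hat = 10 ** 0.0245
growth_rate_pct = (beta1_hat - 1) * 100

print(f"linear: {linear:,.3f}  quadratic: {quadratic:,.3f}  exponential: {exponential:,.3f}")
print(f"beta0 = {beta0_hat:.3f}, beta1 = {beta1_hat:.3f}, growth = {growth_rate_pct:.1f}% per year")
```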
The chart of exponential trend forecasting is:
10.4.3 Model Selection Using First, Second, and Percentage Differences
To select the most appropriate of the models above, in addition to visually inspecting scatter plots and comparing adjusted r² values, we can examine the first, second, and percentage differences.
Perfect fit for the linear trend model: the first differences (the differences between consecutive values in the series) are the same throughout the series.
Example:
Year 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Passengers 30 33 36 39 42 45 48 51 54 57
First Diff 3 3 3 3 3 3 3 3 3
Perfect fit for the quadratic trend model: the second differences (the differences between consecutive first differences) are the same throughout the series.
Example:
Year 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Passengers 30 31 33.5 37.5 43 50 58.5 68.5 80 93
First Diff 1 2.5 4 5.5 7 8.5 10 11.5 13
Second Diff 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
Perfect fit for the exponential trend model: the percentage differences between consecutive values are the same throughout the series. Thus
$\frac{Y_2 - Y_1}{Y_1} \times 100\% = \frac{Y_3 - Y_2}{Y_2} \times 100\% = \cdots = \frac{Y_n - Y_{n-1}}{Y_{n-1}} \times 100\%$
Example (passenger values are rounded to one decimal place, so the percentage differences are 5% only approximately):
Year 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Passengers 30 31.5 33.1 34.8 36.5 38.3 40.2 42.2 44.3 46.5
First Diff 1.5 1.6 1.7 1.7 1.8 1.9 2.0 2.1 2.2
Second Diff 0.1 0.1 0.0 0.1 0.1 0.1 0.1 0.1
Percentage Diff 5% 5% 5% 5% 5% 5% 5% 5% 5%
For the real gross revenue data at the Wm. Wrigley Jr. Company, neither the first differences, the second differences, nor the percentage differences are constant across the series (see the table below). Therefore, other models (including those considered in autoregressive modeling) may be more appropriate.
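The first, second and percentage differences used for model selection can be computed with a few helper functions; the names are illustrative. Applied to the three example series above, the linear series has constant first differences, the quadratic series has constant second differences, and the exponential series has percentage differences of about 5% (not exactly, because its values are rounded to one decimal).

```python
def first_differences(y):
    """Differences between consecutive values: Y2 - Y1, Y3 - Y2, ..."""
    return [b - a for a, b in zip(y, y[1:])]


def second_differences(y):
    """First differences of the first differences."""
    return first_differences(first_differences(y))


def percentage_differences(y):
    """(Y(i) - Y(i-1)) / Y(i-1) * 100 for consecutive values."""
    return [(b - a) / a * 100 for a, b in zip(y, y[1:])]


def is_constant(values, tol):
    """True if all values lie within tol of each other."""
    return max(values) - min(values) <= tol


linear_series = [30, 33, 36, 39, 42, 45, 48, 51, 54, 57]
quadratic_series = [30, 31, 33.5, 37.5, 43, 50, 58.5, 68.5, 80, 93]
exponential_series = [30, 31.5, 33.1, 34.8, 36.5, 38.3, 40.2, 42.2, 44.3, 46.5]

print(first_differences(linear_series))
print(second_differences(quadratic_series))
print([round(p, 1) for p in percentage_differences(exponential_series)])
```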
10.4.4 Assignment 10.3
a. Plot the data of Table 9.1 Bed Bath & Beyond Inc.
b. Compute a linear trend forecasting equation and plot the results.
c. Compute a quadratic trend forecasting equation and plot the results.
d. Compute an exponential trend forecasting equation and plot the results.
e. Using the forecasting equations in (b) through (d), what are your annual forecasts of the number of stores open for 2007 and 2008?
f. How can you explain the differences in the three forecasts in (e)? Which forecast do you think you should use? Why?
10.5 The autoregressive and the least-squares models for seasonal data
Autoregressive modeling is a technique used to forecast time series with autocorrelation. A first-order autocorrelation refers to the relationship between consecutive values in a time series. A second-order autocorrelation refers to the relationship between values that are two periods apart. A pth-order autocorrelation refers to the correlation between values in a time series that are p periods apart.
First-Order Autoregressive Model
The first-order autoregressive model, $Y_i = A_0 + A_1 Y_{i-1} + \delta_i$, is similar in form to the simple linear regression model.
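Because the first-order autoregressive model has the same form as simple linear regression, A0 and A1 can be estimated by regressing each value Yi on the previous value Yi-1. The sketch below shows this on a hypothetical, noise-free series generated with A0 = 2 and A1 = 0.5, so the estimates should recover those values; with real data the estimates would include sampling error. The function name fit_ar1 is illustrative.

```python
def fit_ar1(series):
    """Least-squares estimates (a0, a1) of Y_i = A0 + A1 * Y_(i-1) + error.

    This is simple linear regression of each value on the previous value.
    """
    lagged = series[:-1]   # Y_(i-1)
    current = series[1:]   # Y_i
    n = len(current)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical, noise-free series generated by Y_i = 2 + 0.5 * Y_(i-1)
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

a0, a1 = fit_ar1(series)
next_value = a0 + a1 * series[-1]   # one-step-ahead forecast
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}, forecast = {next_value:.4f}")
```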
10.6 Price indexes
Index numbers allow relative comparisons over time. Index numbers are reported relative to a base period index; the base period index equals 100 by definition.
Table 10‐1 Bed Bath & Beyond Inc.
The simple price index for year i is
$I_i = \frac{P_i}{P_{base}} \times 100$
where
Ii = index number for year i
Pi = price for year i
Pbase = price for the base year
10.6.1 Example
Airplane ticket prices from 1998 to 2006 (base year 2000):
Prices in 1998 were 92.2% of base year prices
Prices in 2000 were 100% of base year prices (by definition, since 2000 is the base year)
Prices in 2006 were 130.2% of base year prices
10.7 Aggregated and simple indexes
An aggregate index is used to measure the rate of change from a base period for a group of items.
10.7.1 Unweighted Aggregate Price Index
Example:
Year Lease payment Fuel Repair Total Index (2003=100)
2003 260 45 40 345 100.0
2004 280 60 40 380 110.1
2005 305 55 45 405 117.4
2006 310 50 50 410 118.8
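The index column of this table can be reproduced as follows; the sketch also shows the simple price index Ii = (Pi / Pbase) × 100 applied to a single item (the lease payment). Function names are illustrative.

```python
def simple_price_index(price, base_price):
    """Simple price index I_i = (P_i / P_base) * 100."""
    return price / base_price * 100


def unweighted_aggregate_index(prices, base_prices):
    """Unweighted aggregate index = (sum of P_i / sum of P_base) * 100."""
    return sum(prices) / sum(base_prices) * 100


# Lease payment, fuel and repair costs from the table above
costs = {
    2003: [260, 45, 40],
    2004: [280, 60, 40],
    2005: [305, 55, 45],
    2006: [310, 50, 50],
}

aggregate = {year: unweighted_aggregate_index(c, costs[2003]) for year, c in costs.items()}
lease_index_2006 = simple_price_index(costs[2006][0], costs[2003][0])

for year, value in aggregate.items():
    print(year, round(value, 1))
```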
$I_{2006} = \frac{\sum P_{2006}}{\sum P_{2003}} \times 100 = \frac{410}{345} \times 100 = 118.8$
Unweighted total expenses were 18.8% higher in 2006 than in 2003.
10.7.2 Weighted Aggregate Price Indexes