AGENDA
1. Homework
2. Quiz 3
3. Dummy Variables
4. Forecasting
• Autoregressive Model
• Time Series
I. HOMEWORK
Posted under the LESSONS tab in ANGEL
DUE
Friday 4/23/10
DUMMY VARIABLES
Initially we said that all dependent and independent variables in the regression model should be continuous variables (interval or ratio scale).
Sometimes it is necessary to include a categorical (or qualitative) variable in the regression equation, since the qualitative variable may play a significant role in the prediction or explanation of the dependent variable.
DUMMY VARIABLES (cont.)
Whether someone has a college degree or not may influence the salary the person gets.
The location of a house (rural, urban) can change its value.
The season (winter, fall, summer, spring) can be used when predicting the number of flights arriving in Hawaii.
Whether a company is public or private may play a role in explaining its level of exports.
DUMMY VARIABLES (cont.)
An easy way of incorporating a qualitative variable into a regression model is to represent it with a special type of variable called a dummy variable.
A dummy variable is a variable that indicates the presence or absence of some characteristic or attribute. The dummy variable assumes the value of 1 if the attribute is present, and 0 if the attribute is absent.
Dummy variables are also called indicator variables, categorical variables, or binary variables.
DUMMY VARIABLES
Let’s say that we want to predict the salary a customer service agent gets. We think that years of experience is one of the variables (X1).
We would also like to include whether the person is a college graduate or not. We will use a dummy variable to include this information. Therefore X2 will be
X2 = 0, if the person is not a college graduate.
X2 = 1, if the person is a college graduate.
DUMMY VARIABLES
A dummy variable can only take on 2 values (0 or 1). We call the condition in which the dummy variable is 0 the base condition.
The coefficient of the dummy variable represents the difference in the predicted value of Y between not being in the base condition and being in the base condition, with the other variables held fixed.
For continuous variable coefficient: b1 is interpreted as “the change in the predicted value of annual salary (Y) with one unit change in years of experience (X1)”.
For dummy variable coefficient: b2 is interpreted as “the change in the predicted value of the annual salary (Y) when the person is a college graduate versus when he/she is not”, NOT as the effect of a one unit change in X2.
INTERPRETING DUMMY VARIABLES
The dummy variable affects the intercept of the regression model, not the slope.
[Figure: annual salary (Y) plotted against years of experience (X1), with one regression line labeled “No college degree” and one labeled “With college degree”.]
DUMMY VARIABLE EXAMPLE
Y: annual salary
X1: years of experience
X2: 1 if the person has a college degree, 0 otherwise.
Assume that the person has 5 years of experience. What would his salary be if he is not a college graduate? What would his salary be if he is a college graduate?
ŷ = 25 + 2.5X1 + 8X2
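As a worked check of this example, the short Python sketch below evaluates the fitted equation for a person with 5 years of experience, with and without a college degree. It assumes the slide's equation reads ŷ = 25 + 2.5X1 + 8X2; the function name predict_salary and its default coefficients are illustrative only and should be changed if the equation is read differently.

```python
def predict_salary(years_experience, college_graduate, b0=25.0, b1=2.5, b2=8.0):
    """Predicted salary from y-hat = b0 + b1*X1 + b2*X2.

    The default coefficients are an assumed reading of the slide's equation.
    """
    # Dummy variable: 1 if the attribute (college degree) is present,
    # 0 for the base condition (no college degree).
    x2 = 1 if college_graduate else 0
    return b0 + b1 * years_experience + b2 * x2


no_degree = predict_salary(5, college_graduate=False)
with_degree = predict_salary(5, college_graduate=True)

print("Not a college graduate:", no_degree)
print("College graduate:      ", with_degree)
print("Difference (equals b2):", with_degree - no_degree)
```

Under that reading, the two predictions differ by exactly b2 at any level of experience: the dummy variable shifts the intercept and leaves the slope on X1 unchanged.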
TESTING A MODEL WITH DUMMY VARIABLES
The F-test for the overall model includes the dummy variables as well, and it is interpreted the same way as before.
The t-test for the significance of the dummy variable coefficient (H0: β2 = 0) is also identical to how we did it previously. It tells us whether the use of the variable is justified for the regression model.
MULTI-CATEGORY DUMMY VARIABLES
What if we want to use a categorical variable that has more than 2 levels? For example, how do we use dummy variables for a “season” variable?
We cannot assign numbers 1, 2, 3, 4… because a dummy variable can only take on values 0 and 1.
Instead we use multiple dummy variables to code the multi-category variable.
When a categorical variable has d levels, d-1 dummy variables are used to code it.
You take one level to be the base condition where all of the dummy variables are 0.
For example to code seasons, we need 4 - 1 = 3 dummy variables (X1, X2, X3).
Let’s take winter as our base case. We designate X1 to represent Spring, X2 to represent Summer, X3 to represent Fall. Only one of the dummy variables can be 1 at a time.
         X1  X2  X3
Winter    0   0   0   (base case)
Spring    1   0   0   X1=1 when spring
Summer    0   1   0   X2=1 when summer
Fall      0   0   1   X3=1 when fall
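The coding scheme for seasons can be sketched in a few lines of Python. This is only an illustration of the d-1 rule with winter as the base case; the names SEASON_DUMMIES and encode_season are made up for this sketch.

```python
# Levels that get their own dummy variable (X1, X2, X3), in that order.
# Winter is the base case, so it has no dummy of its own.
SEASON_DUMMIES = ("spring", "summer", "fall")
BASE_SEASON = "winter"


def encode_season(season):
    """Return (X1, X2, X3) for a season name, with winter coded as (0, 0, 0)."""
    season = season.lower()
    if season != BASE_SEASON and season not in SEASON_DUMMIES:
        raise ValueError(f"unknown season: {season!r}")
    return tuple(1 if season == level else 0 for level in SEASON_DUMMIES)


for name in ("Winter", "Spring", "Summer", "Fall"):
    print(name, encode_season(name))
```

Four levels are coded with 4 - 1 = 3 dummy variables, and at most one of them is 1 for any observation.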
MULTI-CATEGORY DUMMY VARIABLES
Dummy Variable Example
GPA is a function of class standing:
GPA = 2.65 + 0.5X1 + 0.67X2 - 0.18X3
Freshman is the base case
X1 = 1 if sophomore
X2 = 1 if junior
X3 = 1 if senior
How many dummy variables are needed?
What is the predicted GPA of a freshman? senior?
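One way to check the answers is to evaluate the GPA equation directly. The sketch below uses the coefficients given on the slide; the names LEVELS and predict_gpa are illustrative.

```python
# Class standing has 4 levels, so 4 - 1 = 3 dummy variables are needed.
LEVELS = ["freshman", "sophomore", "junior", "senior"]  # freshman = base case
B0 = 2.65                 # intercept: predicted GPA in the base case
B = [0.5, 0.67, -0.18]    # coefficients of X1 (sophomore), X2 (junior), X3 (senior)


def predict_gpa(standing):
    """Predicted GPA = 2.65 + 0.5*X1 + 0.67*X2 - 0.18*X3."""
    standing = standing.lower()
    if standing not in LEVELS:
        raise ValueError(f"unknown class standing: {standing!r}")
    # One dummy per non-base level; all dummies are 0 for a freshman.
    dummies = [1 if standing == level else 0 for level in LEVELS[1:]]
    return B0 + sum(b * x for b, x in zip(B, dummies))


print("Dummy variables needed:", len(LEVELS) - 1)
print("Freshman:", round(predict_gpa("freshman"), 2))
print("Senior:  ", round(predict_gpa("senior"), 2))
```

For a freshman all three dummies are 0, so the prediction is the intercept, 2.65. For a senior only X3 is 1, giving 2.65 - 0.18 = 2.47.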
FORECASTING
One of the most important applications of regression analysis is developing forecasts.
What we are trying to do is to develop forecasts of future values based on an examination of the variable in past time periods.
Typical business examples:
• Forecasting sales of a product so that you can plan your inventory levels.
• Forecasting profits and income so you can determine whether or not you will need a bank loan.
FORECASTING
Typically the best forecast that you will have for the future is based on actual results and trends of the recent past.
We will use a quantitative forecasting method based on this logic called time series analysis.
Time series data are data collected at regular intervals over a period of time.
Time series analysis is a set of quantitative methods for determining patterns in time series data.
TIME SERIES ANALYSIS
So, forecasting is the extrapolation of series values beyond the region of the estimation data.
But, in regression we cannot use a regression model outside the range for which the model is estimated. Therefore we need to make the following basic assumption:
Those factors that have influenced patterns of activity in the past and present will continue to do so in more or less the same manner in the near future.
COMPONENTS OF TIME SERIES
Time-series data are usually affected by four factors:
1. TREND: steady tendency of increase or decrease over time.
Possible Causes: changes in technology, culture, population, popularity…
Duration: many years – systematic.
Example: number of internet users is steadily increasing year after year.
2. SEASONAL VARIATION: Regular fluctuations or periodic changes that repeat year after year.
Possible Causes: weather, social and religious customs…
Duration: repeats every year (4 seasons, 12 months, or 52 weeks depending on periods being analyzed) – systematic.
Example: sales of snow blowers, suntan lotions, barbeque grills, Christmas shopping, etc.
COMPONENTS OF TIME SERIES (cont.)
3. CYCLICAL VARIATION: Repetitive fluctuations or swings of varying length and intensity in the long term.
Possible Causes: business or economic conditions.
Duration: periods longer than one year – systematic.
Example: Economic cycles of growth or contraction, inflation, recession.
4. RANDOM OR IRREGULAR VARIATION: Unpredictable random variations in the time series that the above three components fail to account for.
Possible Causes: Unforeseen events such as catastrophes, strikes, etc.
Duration: short, unrepeating – unsystematic, random.
Example: Sales of building supplies after a hurricane, loss of customers of an airline due to a strike.
TIME SERIES EXAMPLE
You are asked to prepare a forecast of sales based on previous years’ sales.
[Figure: line chart titled “Sales”; x-axis: Years (1986 to 1998); y-axis: Sales (in thousand units), 0 to 200.]
Years   Sales
1988    100.60
1989    102.90
1990    108.70
1991    128.40
1992    150.70
1993    149.60
1994    166.00
1995    161.60
1996    150.60
1997    174.00
ESTIMATING A LINEAR TREND
There is nothing specific about the “years” besides the fact that they are ranked with equal distances between observations.
Therefore, it is common practice to recode the years into simpler numbers.
To do this, we take the first year for which data are available as the “base year” by setting t=0.
Then for each consecutive period, t increases by one.
TIME SERIES EXAMPLE (cont.)
Years   Time   Sales
1988    0      100.60
1989    1      102.90
1990    2      108.70
1991    3      128.40
1992    4      150.70
1993    5      149.60
1994    6      166.00
1995    7      161.60
1996    8      150.60
1997    9      174.00
[Figure: line chart titled “Sales”; x-axis: Years (1986 to 1998); y-axis: Sales (thousand dollars), 0 to 200.]
- 1988 is the base year.
Regression results:
ŷ = 101.93 + 8.31t
R² = 0.8554
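The regression results for this example can be reproduced with ordinary least squares on the recoded data (t = 0 for 1988). The following is a minimal pure-Python sketch; the function name fit_linear_trend is illustrative, and in practice a statistics package would normally be used instead.

```python
def fit_linear_trend(y):
    """Least-squares fit of y = b0 + b1*t with t = 0, 1, 2, ... Returns (b0, b1, R^2)."""
    n = len(y)
    t = list(range(n))                      # recoded time: base period is t = 0
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    b1 = sxy / sxx                          # slope: change in y per period
    b0 = y_mean - b1 * t_mean               # intercept: predicted y at t = 0
    r_squared = sxy ** 2 / (sxx * syy)      # R^2 for simple linear regression
    return b0, b1, r_squared


# Sales for 1988 through 1997, taken from the table in this example.
sales = [100.60, 102.90, 108.70, 128.40, 150.70,
         149.60, 166.00, 161.60, 150.60, 174.00]

b0, b1, r_squared = fit_linear_trend(sales)
print(round(b0, 2), round(b1, 2), round(r_squared, 4))
```

Rounded to the precision used on the slide, the fit gives an intercept of 101.93, a slope of 8.31, and R² of 0.8554.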
INTERPRETING TIME SERIES RESULTS
ŷt = 101.93 + 8.31t
b0 = 101.93: predicted sales for t=0 (year 1988)
b1 = 8.31: change in predicted sales in one period (one year)
Forecast for sales in 1998:
Base year is 1988, so t = 1998 - 1988 = 10
Predicted Sales (1998): ŷ10 = 101.93 + 8.31(10) = 185.03
TIME SERIES EXAMPLE
A time series analysis conducted using data between years 1992 and 2002 yielded the following results:
Yt = 1850 + 35t
where Yt is the yearly profit of a computer company in thousands of dollars and t is measured in years (t = 0 for 1992).
a. What is the estimated yearly profit of this company for the year 2004?
b. How much would you expect the profit level of this company to change in 6 years?
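Both questions come down to recoding the year into t and applying the trend equation. A small Python sketch of that arithmetic follows; the helper name trend_forecast is made up for illustration.

```python
def trend_forecast(b0, b1, base_year, year):
    """Evaluate Y_t = b0 + b1*t, where t is the number of years since base_year."""
    t = year - base_year
    return b0 + b1 * t


B0, B1, BASE_YEAR = 1850, 35, 1992   # from Yt = 1850 + 35t, with t = 0 for 1992

# a. Year 2004 corresponds to t = 2004 - 1992 = 12.
profit_2004 = trend_forecast(B0, B1, BASE_YEAR, 2004)

# b. The slope is the expected change per year, so over 6 years the change is 6 * b1.
change_in_6_years = B1 * 6

print("Estimated profit for 2004 (thousands of dollars):", profit_2004)
print("Expected change over 6 years (thousands of dollars):", change_in_6_years)
```

Note that 2004 lies outside the 1992 to 2002 estimation period, so this forecast rests on the assumption that the past trend continues.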
AUTOREGRESSIVE MODELS
So far our predictions of future values were based on the relationship between time and the existing values.
Another type of model that can be used for estimating future values is the autoregressive model, in which the values are predicted based directly on past observations.
AUTOREGRESSIVE MODEL
An autoregressive model uses past observations as the independent variables (predictors) for future values.
Autoregressive models are also called lagged models.
Ŷt = b0 + b1·Y(t-k) + b2·Y(t-(k+1)) + ...
where Y(t-k) denotes the value of Y observed k periods before t.
AUTOREGRESSIVE EXAMPLE
Sales for a women's perfume brand are given for each quarter of 3 consecutive years.
Predict the sales for the first and second quarters of 2001.
Year   Quarter   Sales
1998   1         147.6
       2         251.8
       3         273.1
       4         249.1
1999   1         139.3
       2         221.2
       3         260.2
       4         259.5
2000   1         140.5
       2         245.5
       3         298.8
       4         287.0
2001   1         (to be predicted)
Ŷt = 11.76 + 1.08·Y(t-4)
For a quarter t in 2001, t-4 is the same quarter of 2000 and t-8 is the same quarter of 1999.
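The two requested forecasts can be sketched in Python as follows. The sketch assumes the fitted model on this slide reads Ŷt = 11.76 + 1.08·Y(t-4), that is, each quarter is predicted from the same quarter one year earlier; the function name ar_forecast and its default coefficients are illustrative and depend on that reading.

```python
# Quarterly sales for 1998 Q1 through 2000 Q4, from the table in this example.
sales = [147.6, 251.8, 273.1, 249.1,   # 1998
         139.3, 221.2, 260.2, 259.5,   # 1999
         140.5, 245.5, 298.8, 287.0]   # 2000


def ar_forecast(history, lag=4, b0=11.76, b1=1.08):
    """One-step forecast Y-hat_t = b0 + b1 * Y_(t-lag); coefficients are an assumed reading."""
    return b0 + b1 * history[-lag]     # history[-lag] is the value `lag` periods back


# 2001 Q1 uses 2000 Q1 (140.5) as the lagged value.
q1_2001 = ar_forecast(sales)

# 2001 Q2 uses 2000 Q2 (245.5); appending the Q1 forecast keeps the lag aligned.
q2_2001 = ar_forecast(sales + [q1_2001])

print("2001 Q1 forecast:", round(q1_2001, 2))
print("2001 Q2 forecast:", round(q2_2001, 2))
```

With a lag of 4, forecasts for up to four quarters ahead depend only on observed values, so the Q2 forecast is not affected by any error in the Q1 forecast.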
Seasonal Index
[Figure: total revenue by quarter (Q1 to Q4) for 2006, 2007, and 2008; x-axis: Quarter; y-axis: Total Revenue, 0 to 3000.]
Seasonal Index
To calculate the seasonal index you need to:
1. Calculate the Moving Averages
2. Calculate the Centered Moving Averages
3. Calculate the Ratio to Centered Moving Averages
4. Average the Ratio to Centered Moving Averages
Seasonal Index
To use the seasonal index:
Multiply the forecasted sales value by the seasonal index.
Seasonal Index
TIME     SALES (1,000s)   MOVING AVERAGE   CENTERED MOVING AVERAGE   RATIO TO CENTERED MOVING AVERAGE
2001-1   4.8
2001-2   4.1
                          5.35
2001-3   6.0                               5.475                     1.10
                          5.6
2001-4   6.5                               5.7375                    1.13
                          5.875
2002-1   5.8                               5.975                     0.97
                          6.075
2002-2   5.2                               6.1875                    0.84
                          6.3
2002-3   6.8
2002-4   7.4
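The four steps for the seasonal index can be traced on the quarterly sales in this table with a short Python sketch. The function and variable names are illustrative; the 4-period window is assumed because the data are quarterly.

```python
# Quarterly sales (1,000s) for 2001-1 through 2002-4, from the table.
sales = [4.8, 4.1, 6.0, 6.5, 5.8, 5.2, 6.8, 7.4]
WINDOW = 4  # four quarters per year


def moving_averages(values, window=WINDOW):
    """Step 1: average of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def centered_moving_averages(averages):
    """Step 2: average of each pair of adjacent moving averages."""
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]


ma = moving_averages(sales)
cma = centered_moving_averages(ma)

# Step 3: the i-th centered average lines up with sales[i + WINDOW // 2],
# so the first ratio belongs to 2001-3.
offset = WINDOW // 2
ratios = [sales[i + offset] / c for i, c in enumerate(cma)]

# Step 4: average the ratios that fall in the same quarter.
# The series starts in quarter 1, so position j is quarter (j % 4) + 1.
by_quarter = {}
for i, ratio in enumerate(ratios):
    quarter = (i + offset) % WINDOW + 1
    by_quarter.setdefault(quarter, []).append(ratio)
seasonal_index = {q: sum(r) / len(r) for q, r in sorted(by_quarter.items())}

print("Moving averages:         ", [round(v, 4) for v in ma])
print("Centered moving averages:", [round(v, 4) for v in cma])
print("Ratios:                  ", [round(v, 2) for v in ratios])
print("Seasonal index by quarter:", {q: round(v, 2) for q, v in seasonal_index.items()})
```

With only two years of data each quarter has a single ratio, so step 4 leaves the ratios unchanged; with more years, each index would be the average of several ratios for the same quarter.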