


J. Hirschberg ECON20003 Semester 1, 2014


TUTORIALS & ASSIGNMENTS

Tutorials are held in the computer labs on the 3rd floor of the Spot. Please use the online tutorial registration system to obtain your assigned tute. Tutes start in the first week of lectures; however, because not everyone will have a tute to attend, we will not take attendance. The first tutorial is an introduction to Eviews and should be attended by all students so that they can gain some familiarity with the program.

The tutes are in two parts. Part A is to be completed before attending the tute and should be handed in at the tute for credit, so remember to put your name on it. Try to do these tasks, because the tutor will assume that you have done this work before coming to the session. Note that the assignments are closely related to the tutorial tasks, and you should have no problems with the assignments if you have been keeping up with the tutorials. The material for each tute that is located inside the box is intended as a guide to performing Part A of the tutorial in Eviews. Note that Eviews has an extensive set of PDFs under the Help button, and there are a number of other sources on the Web with advice on how tasks can be performed in Eviews. Note that we use version 8, but most earlier-version results and programs are applicable to this version. You will be informed of any major changes.

Tutes 5, 8, and 12 are assignments that are due by 9:00am on the Monday of the week of the tutorial. They are to be submitted via the assignment tool in the LMS. Assignment #1 is due on March 31st, Assignment #2 is due on April 28th, and Assignment #3 is due on May 26th. Also bring a copy of your assignment to the tute for that week, where it will be reviewed. To avoid possible difficulties, please keep a copy of all assignments turned in. No late assignments are accepted, because we will go over them in tutes. I will count the best 2 out of the 3.

The percentage of total marks for tutorial participation and assignments in this subject is 25%. The assignments total 15%, with each assignment potentially worth 7.5% of your mark. Thus, if a1, a2 and a3 are the marks out of 5 for each assignment, the 15% contribution to your final mark is given by 1.5*( (a1 + a2 + a3) - min(a1, a2, a3) ). This means that if the mark for one assignment is 0, or you are happy with the marks you received on the first two, the third is optional; if you are unable to complete an assignment I will only count the ones you do. An additional 10% is awarded for tutorial participation, which includes completion of the Part A questions for those tutes where these have been assigned to be printed out and available for your tutor to check off. Your participation in the discussions at the tutorial will also go toward receiving the full tutorial participation mark.

It is strongly suggested that you use a memory stick to store your results and data so that you have an ongoing copy of the work you do in the tutes. Once you copy the original data series, all the results can be saved with the data set so you can refer to them later. However, you will need to copy the results to a Word file to keep the results for each tute and assignment for handing in.


Tutorial 1 – Preliminary Tutorial: Eviews. Week of March 3rd. Prior work on this tutorial is not required. This is an optional tutorial to familiarize you with the Eviews software. The Eviews data set bank.wf1 contains data from a lawsuit for racial discrimination that was brought against a bank in the US. The data list the following information for 474 individuals, as identified by the column titles listed below.

Name      Label
AGE       Age of employee
EDLEVEL   Educational level
ID        Employee code
JOBCAT    Employment category
MINORITY  Minority classification
SALBEG    Beginning salary
SALNOW    Current salary
GENDER    Gender of employee
GEN_MIN   Gender & Minority Status
TIME      Job seniority
WORK      Work experience

1. Provide a descriptive analysis of each of the variables by generating the measures of central tendency and the measures of dispersion, or the distributions of each variable, where appropriate.

Some of the variables in this data set are categorical in nature. For this reason they have labels assigned to them which define which group each observation is in. The usual descriptive statistics make little sense for these variables; the appropriate descriptive measure is a frequency table of the different categories, such as is generated by using One-Way Tabulation under the View button for the series. In this case we did it for the GEN_MIN variable:
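The idea behind a one-way tabulation can also be sketched outside Eviews. A minimal Python version follows; the GEN_MIN codes in the example are made-up illustrations, not the actual bank.wf1 data.

```python
from collections import Counter

def one_way_tab(values):
    """One-way tabulation: count, percent and cumulative percent per category."""
    n = len(values)
    counts = Counter(values)
    rows, cum = [], 0.0
    for v in sorted(counts):
        pct = 100.0 * counts[v] / n
        cum += pct
        rows.append((v, counts[v], round(pct, 1), round(cum, 1)))
    return rows

# Hypothetical GEN_MIN category codes for ten employees:
print(one_way_tab([1, 1, 2, 2, 2, 3, 3, 4, 4, 4]))
```

Each row mirrors one line of the Eviews tabulation output: value, frequency, percent, and cumulative percent.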


2. Create a new variable called DIFF by taking the difference between the salary paid now and the salary at which they started.

This can be done using the Genr button at the top of the workfile window to reveal a command window where you type the equation:

Using this variable, generate the descriptive statistics and draw a histogram of this variable. 3. Draw a set of side-by-side boxplots for the new variable that you have created (DIFF) based on the classification variables GENDER, JOBCAT, MINORITY and GEN_MIN. What can you conclude from these plots? To produce these, open the variable DIFF to get the spreadsheet listing of the variable, then use the View button to get the graph sub-menu.
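A boxplot summarizes a series by its quartiles. As a rough illustration of what each box drawn by Eviews represents, here is a Python sketch of the five-number summary; the salary figures are invented, not bank.wf1 values.

```python
import statistics

def five_number(xs):
    """Min, Q1, median, Q3, max -- the ingredients of a boxplot."""
    xs = sorted(xs)
    q1, q2, q3 = statistics.quantiles(xs, n=4)  # quartile cut points
    return min(xs), q1, q2, q3, max(xs)

# Hypothetical beginning and current salaries for five employees:
salbeg = [4000, 5000, 6000, 5500, 7000]
salnow = [9000, 8000, 12000, 10000, 15000]
diff = [now - beg for now, beg in zip(salnow, salbeg)]  # DIFF = SALNOW - SALBEG
print(five_number(diff))
```

Side-by-side boxplots simply repeat this summary for each category of the classifying variable.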


This is the way the menu looks by default, but you need to change three parts:

In the graph sub-menu you need to change General from Basic graph to Categorical graph, change Specific from Line & Symbol to Boxplot, and in the Within graph window enter either GENDER, JOBCAT, MINORITY or GEN_MIN to obtain the different versions of the side-by-side boxplots.


Tutorial 2. Simple Hypothesis Tests with Eviews. Week of March 10th.

A. Prior to coming to the tutorial, do the following analysis and bring it to your tute to be handed in.

A.1. Open the workfile DDF_stores_1.wf1 (as described in Lecture 2, with more detail in Assignment #2 below), which can be downloaded from the LMS. Use the Details+/- button to uncover the description of each of the variables, and copy these to a table with just the names and the descriptions. When you first open the file it will have only the names:

After using the Details +/- button we get more information.

By selecting the entire table with Ctrl+A you can copy it to a spreadsheet and remove the Type and Last Update columns to create a table of names and descriptions.

A.2. Examine a single series: the series dairy, which records the average sales in the dairy departments of the 84 stores.

A.2.i Compute descriptive statistics and a histogram using the command Descriptive Statistics & Tests/Histogram and Stats and paste the resulting table into a Word document. 1st: Select the variable dairy by pointing at the name with the left mouse button, then use the right button to open the sub-window. This will open a spreadsheet view of the series where the rows are the observations:


2nd: From the spreadsheet view, use the View button to find the Descriptive Statistics sub-menu and select Histogram and Stats.

A.2.ii Interpret the test of normality in this case. Specify the null hypothesis and make a conclusion by specifying the probability of a Type I error.

B. In the tutorial you will do the following: perform the tests of location and create a new variable from an existing variable.

B.1. Perform a test of hypothesis concerning the average dairy section expenditures in each store:

B.1.i Create a new series called log_dairy = log(dairy) (see Tute 1 for details on how to create a new variable) and plot the histogram of this variable. Remember that with some equations the results may not be allowed (such as the log of any non-positive number, or a ratio when the denominator is zero). If this happens the program will set that observation to NA, and it cannot be used in many computations. To obtain the histogram, proceed as was done for dairy in question A.2 above. Interpret the test statistic for the null hypothesis that this series is normally distributed.

B.1.ii Construct a test for the null hypothesis that the mean of the dairy expenditures is $4,400, as opposed to the hypothesis that it is not equal to $4,400, using Eviews. Find the p-value for this test and describe what it means.

B.1.iii Construct a test that the mean of the log of HH income is equal to 10.6. Now test whether the mean income was $40,134 (hint: we need to create a new variable).
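The normality statistic that Eviews reports in Histogram and Stats is the Jarque–Bera statistic, and the mean test in B.1.ii is a one-sample t test. Both can be sketched directly from their textbook formulas; this pure-Python sketch is illustrative only, with no real data.

```python
def jarque_bera(xs):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), S = skewness, K = kurtosis.
    H0: the series is normally distributed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def one_sample_t(xs, mu0):
    """t statistic for H0: population mean equals mu0."""
    n = len(xs)
    mean = sum(xs) / n
    s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return (mean - mu0) / (s / n ** 0.5)
```

In Eviews the corresponding mean test is obtained from the Descriptive Statistics & Tests menu by supplying 4400 as the hypothesized mean.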


Tutorial 3. Comparison of Independent Samples. Week of March 17th.

A. Prior to coming to the tutorial, do the following analysis and bring it to your tute to be handed in.

A.1. The data set turnpike.wf1 records the number of cars that travel on a road every day for approximately 3 years. The variable n_cars lists the number of cars recorded for each day. The variable weekend is equal to one when the day is a Saturday or Sunday and zero for the weekdays. Using these data, graphically show whether weekdays have more or less traffic than weekends over this period with side-by-side boxplots. Select the boxplot and categorical type with weekend as the Within graph variable:

A.2 Test the hypothesis that the mean number of cars is the same for the weekdays as for the weekends. To compare the means directly, use the tests for descriptive stats: request Equality Tests by Classification and use the variable weekend as the classifying variable:

Record the results of this test and bring it to your tute.
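The equality-of-means test has the familiar two-sample form. Here is a sketch of the Welch version, which does not assume equal variances (Eviews reports several variants); the daily car counts below are invented, not the turnpike.wf1 data.

```python
def welch_t(xs, ys):
    """Welch two-sample t statistic for H0: the two group means are equal."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)  # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    return (mx - my) / (vx / nx + vy / ny) ** 0.5

# Hypothetical daily car counts:
weekday = [1200, 1300, 1250, 1400]
weekend = [900, 950, 1000]
print(welch_t(weekday, weekend))
```

A large positive statistic here would point to heavier weekday traffic; the p-value comes from a t distribution with the Welch-adjusted degrees of freedom.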


B. In the tutorial, answer the following using the Training.wf1 data set described in the lecture notes. B.1 Use a set of boxplots to determine whether the computer proficiency in using spreadsheets (c_ss), data entry (c_de), and database programs (c_db) differs by occupation (occ). B.2 Using the test for medians (why not means?), specify a set of alternative hypotheses and perform three tests of whether one occupation has a greater level of computer proficiency.


Tutorial 4. Comparison of Paired Samples. Week of March 24th.

A. Prior to coming to the tutorial, do the following analysis and bring it to your tute to be handed in.

A.1. Using the pc_car.wf1 data file that was discussed in the lecture notes, determine the validity of the following propositions:

A.1.i The number of Daihatsus is the same as the number of BMWs. First create a new variable, with a particular name, for the difference between the number of Daihatsus and the number of BMWs:

Now open the new variable and perform a test of whether the mean is equal to zero.

Repeat this process for all the remaining parts of this question and interpret the results. A.1.ii The number of German car makes is the same as the number of Korean car makes. A.1.iii The number of motorcycles is the same as the number of Swedish car makes. A.1.iv The number of Holdens is the same as the number of Toyotas. Don't forget to generate a new variable for each comparison.
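Creating the difference series and testing that its mean is zero is exactly a paired t test. A sketch in Python, with invented counts (not the pc_car.wf1 data):

```python
def paired_t(xs, ys):
    """Paired t statistic: form per-observation differences, test mean 0."""
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)  # sample variance of diffs
    return mean / (var / n) ** 0.5

# Hypothetical counts of Daihatsus and BMWs in five areas:
daihatsu = [3, 5, 2, 4, 6]
bmw = [2, 4, 4, 3, 5]
print(paired_t(daihatsu, bmw))
```

The pairing matters: the test uses the variance of the differences, not the variances of the two series separately.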


B.1 Using the Training.wf1 data set described in the lecture notes: B.1.i Test whether the computer proficiency in using spreadsheets (c_ss) and the computer proficiency in using word processing (c_wp) differ among all the persons in this sample. B.1.ii Test whether the computer proficiency in using spreadsheets (c_ss) and the computer proficiency in database retrieval (c_db) differ among all the persons in this sample.


Tutorial 5 & Assignment #1: ANOVA. Due on March 31st at 9:00am. Tutes for the week of March 31st will review answers.

This assignment will be worth a maximum of 5% of your total grade; each question has an equal weight. The assignment is due by 9am on March 31st, and the tutorials in week 5 will be planned around a discussion of the answers. Use the assignment tool to submit this assignment. Make sure that you have made a photocopy of your assignment before you hand it in. All answers are to be written or pasted on no more than 4 pages (more will not be assessed). Please do not use a font smaller than 10 point for your written answers (plots may be reduced if they are legible). It is an important element of this assignment that you select the relevant computer-generated results. In a job-related task of this sort you will be asked to be concise in your description of the results that you obtain; part of this assignment is to give you practice in providing a brief summary of the statistics you compute.

In 1993 the Australian Bureau of Statistics conducted a survey of a sample of households in Australia to determine their spending habits as well as other characteristics. The data set abs_hh_exp93.wf1 is a sample of averages from that data. It consists of values for the average share of total household expenditure by type of goods or services purchased for different groups of households. These groups of households are defined by combinations of income level, country of birth of the head of the household, number of persons who spend, and the state in which they live. A list of the expenditure variables is given below.

E1       Breakfast cereals
E2       Pasta (spaghetti, noodles, etc.)
E3       Bacon
E4       Marmalades, jams & conserves
E5       Potato crisps & other savoury confectionery
E6       Ice confectionery
E7       Baked beans & canned spaghetti
E8       Jeans, men's
E9       Coffee (packaged)
E10      Toiletries & cosmetics, nec
E238     Meals in restaurants, hotels, clubs etc.
E664     Beer - consumed off licensed premises
EGAMB    All gambling
EF_HOME  Food at home

The categorical variables are defined with the frequency of each group below.

DNSPN: # spenders in the HH

Value  Label               Frequency  Percent  Valid Percent  Cumulative Percent
1.00   1 spender           122        21.0     21.0           21.0
2.00   2 spenders          166        28.5     28.5           49.5
3.00   3 spenders          144        24.7     24.7           74.2
4.00   4 spenders          105        18.0     18.0           92.3
5.00   5 or more spenders  45         7.7      7.7            100.0
Total                      582        100.0    100.0


STATE: State

Value  Label                         Frequency  Percent  Valid Percent  Cumulative Percent
1.00   New South Wales               91         15.6     15.6           15.6
2.00   Victoria                      92         15.8     15.8           31.4
3.00   Queensland                    77         13.2     13.2           44.7
4.00   South Australia               63         10.8     10.8           55.5
5.00   Western Australia             70         12.0     12.0           67.5
6.00   Tasmania                      56         9.6      9.6            77.1
7.00   Northern Territory            67         11.5     11.5           88.7
8.00   Australian Capital Territory  66         11.3     11.3           100.0
Total                                582        100.0    100.0

INCGRP: Income group

Value  Frequency  Percent  Valid Percent  Cumulative Percent
1.00   65         11.2     11.2           11.2
2.00   82         14.1     14.1           25.3
3.00   90         15.5     15.5           40.7
4.00   91         15.6     15.6           56.4
5.00   85         14.6     14.6           71.0
6.00   77         13.2     13.2           84.2
7.00   92         15.8     15.8           100.0
Total  582        100.0    100.0

NHDCOB: Country of Birth of Head of HH

Value  Label          Frequency  Percent  Valid Percent  Cumulative Percent
1.00   Australia      222        38.1     38.1           38.1
2.00   rest of world  169        29.0     29.0           67.2
3.00   Europe         191        32.8     32.8           100.0
Total                 582        100.0    100.0

I encourage you to work on this project with others, but you are to hand in your own work.

1. (1%) Using a set of graphs, establish whether the number of spenders, state, country of birth of the head of HH, or income group is most important for the determination of the variation in E7 (Baked beans & canned spaghetti). Briefly describe the important features shown in these graphs.

2. (1%) Compare the expenditure share for E3 (Bacon) and the expenditure share for E238 (Meals in restaurants, hotels, clubs etc.). Can you reject the hypothesis that the means are equal? Can you determine whether there are potential problems in using this test?

3. (1%) Determine whether the expenditure share for EGAMB (All gambling) is influenced by income level. Can you explain why there may be variation, or why not?

4. (2%) Find the factor that best explains the variation in the expenditure share for EF_HOME (Food at home). Can you give this an interpretation?
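ANOVA, the technique behind questions like 3 and 4, compares the variation between group means with the variation within groups. A sketch of the one-way F statistic, using toy numbers rather than the ABS expenditure shares:

```python
def anova_f(groups):
    """One-way ANOVA F statistic for H0: all group means are equal."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)      # between-group SS
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)   # within-group SS
    return (ssb / (k - 1)) / (ssw / (n - k))

# Toy expenditure shares for three hypothetical income groups:
print(anova_f([[0.9, 1.1, 1.0], [1.4, 1.6, 1.5], [0.5, 0.4, 0.6]]))
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) says the factor explains a meaningful share of the variation.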


Tutorial 6. Regression Analysis I. Week of April 7th. The Eviews file Stocks.wf1 contains the monthly returns for a number of stocks traded on the New York Stock Exchange.

Asset   Industry
BOISE   Building Materials
CITCRP  Bank
CONED   Electric Utility
CONOCO  Oil
CONTIL  Airline
CPI     Consumer Price Index
DATGEN  Computers
DEC     Computers
DELTA   Airline
DOW     Chemicals
DUPONT  Chemicals
GENMIL  Food
GERBER  Food
IBM     Computers
MARKET  The market as a whole
MOBIL   Oil
MOTOR   Electronics
PANAM   Airline
PSNH    Electric Utility
RKFREE  Risk Free Asset
TANDY   Electronics
TEXACO  Oil
WEYER   Building Materials

Part A. To be completed prior to the tutorial.

A.1. Using the returns for the assets that relate to the oil companies, compute the regressions that are used in the CAPM analysis. This will first require the creation of new variables in which the risk-free asset return (RKFREE) is subtracted from the returns as they are listed.

A.2 Which of the estimates of β are equal to one?

A.3 Can we make a general statement about the value of β for this industry?

Part B. To be completed in the tutorial.

B.1. Using the returns for the assets related to the airlines and computers, compute the CAPM regressions to derive the estimates of β.

B.2 Are the values equal to 1 or not?

B.3 Again, can we make a general statement about the value of β for these groups of firms as they compare with the oil firms you examined in Part A?
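The CAPM regression uses excess returns on both sides, and β is just the OLS slope of that regression. A minimal sketch, with fabricated monthly returns rather than Stocks.wf1 values:

```python
def capm_beta(asset, market, riskfree):
    """OLS slope of (asset - riskfree) on (market - riskfree): the CAPM beta."""
    y = [a - rf for a, rf in zip(asset, riskfree)]   # excess asset returns
    x = [m - rf for m, rf in zip(market, riskfree)]  # excess market returns
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Fabricated returns; this asset moves twice as much as the market:
rf = [0.01, 0.01, 0.01, 0.01]
market = [0.02, 0.03, 0.01, 0.04]
asset = [0.03, 0.05, 0.01, 0.07]
print(capm_beta(asset, market, rf))
```

In Eviews you would instead run ls on the excess-return series; the test of whether β equals one is a t test on the slope against 1, not against 0.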


Tutorial 7. Regression II. Week of April 14th. Question 1: The data set training.wf1 lists the following information for a sample of 607 individuals.

Name      Description
C_DB      Level of computing proficiency - database retrieval
C_DE      Level of computing proficiency - data entry
C_SS      Level of computing proficiency - spreadsheets
C_WP      Level of computing proficiency - word processing
DURATION  Time on the job
INCOME    Average weekly income
OCC       Occupation (9 = teachers, 12 = business)

Where the Computer proficiency variables have the following values:

Value  Level of Proficiency
0      No response for this category
1      Had basic proficiency
2      Had intermediate proficiency
3      Had advanced proficiency

A.1. Using these data, estimate a regression to determine the impact of the time on the job (DURATION) on the level of income (INCOME). Is a linear specification sufficient for this analysis, or is there a nonlinear relationship such that a polynomial will provide a better fit? A.2. Using a log transformation, estimate the elasticity of income with respect to time on the job.

B.1 To specify a model with a dummy variable that is one when a variable takes a particular value, we would specify a regression of the form:

income duration (c_wp = 2) c

In this case the model would be a linear model in duration, with a variable that is equal to one when the person reported that they "Had intermediate proficiency" in word processing programs and zero otherwise. Always remember to enclose these terms in parentheses to create a dummy variable. Specify a model as you have done in part A.1, now with a dummy variable for whether the person is a teacher or not. Interpret your results. B.2 Now add a dummy variable to indicate whether they have any proficiency at all in database programs. Use (c_db > 0) to define a dummy that is one for all values of c_db greater than 0 and 0 for all observations where c_db = 0. What is the interpretation of this result? Can we use this to quantify the value of the use of database programs?
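The parenthesized conditions that Eviews turns into dummies on the fly can be mimicked directly. A small sketch of the two kinds of dummies used above; the proficiency codes are hypothetical:

```python
# Hypothetical c_wp and c_db proficiency codes for five people:
c_wp = [0, 2, 1, 2, 3]
c_db = [0, 1, 0, 3, 2]

# Like (c_wp = 2): one when the person had intermediate proficiency
d_intermediate = [1 if v == 2 else 0 for v in c_wp]

# Like (c_db > 0): one when the person had any database proficiency at all
d_any_db = [1 if v > 0 else 0 for v in c_db]

print(d_intermediate)  # [0, 1, 0, 1, 0]
print(d_any_db)        # [0, 1, 0, 1, 1]
```

In the regression, each dummy's coefficient shifts the intercept for the group the dummy picks out, which is why its t test answers "does this group earn differently, all else equal?"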


Tutorial 8 & Assignment #2: Regression II. Due on April 28th. Tutes for the week of April 28th will review answers.

Assignment #2. This assignment is worth a maximum of 5 marks toward your total grade. The assignment is due by 9am on April 28th via the assignment tool in the LMS. Make sure that you print a copy to bring to the tutorial. The total number of pages should not exceed 5.

A research project at the University of Chicago has made available a set of detailed data for the Dominick's supermarkets located in metropolitan Chicago.1 For the purposes of this subject I have created an Eviews file entitled DDF_stores_1.wf1 that covers a very small portion of these data. The data contained in the file are average daily sales by department for a set of 84 stores for which the data are relatively complete: at least 3 years of data are used for each store. The file also contains the information obtained from the Census for the areas in which the stores are located, as well as some additional marketing information concerning the nature of the customers in each store. Again, as in Assignment 1, I encourage you to work on this project with others, but you are to hand in your own work.

1 The Dominick's database covers store level scanner data collected at Dominick's Finer Foods over a period of more than seven years. The data is the property of the Marketing group at the University of Chicago Graduate School of Business and is intended for academic use only.

Store information.

Variable  Label
store     Store Number
dairy     Dairy Sales - 1
frozen    Frozen Sales - 2
meat      Meat Sales - 3
produce   Produce Sales (fruit and veg) - 4
deli      Deli Services - 5
deliself  Deli Self Serv Sales - 6
bakery    Bakery Sales - 7
haba      Health and Beauty Aids Sales - 8
photofin  Photo Finish Sales - 9
video     Video Sales - 0
custcoun  Customer Count
num       Number of days observed
tot_sale  Total average daily sales

Demographic and Marketing

Variable  Label
name      Name of store
city      Location
zip       Zip Code
lat       Latitude
long      Longitude
age9      % aged 9 or under
age60     % aged 60 or over
educ      % University Graduates
nocar     % with no car
income    Log Median HH income
hsizeavg  Average HH size
hhsingle  % that are single HH
hhlarge   % that have 5 or more in HH
workwom   % of HH with women that work
hvalmean  Mean house value in 000s
single    % that are single
retired   % that are retired
unemp     % that are unemployed
wrkch5    % working women with children 5 and under
nwrkch5   % not working women with children 5 and under
poverty   % under poverty income level


Location of the stores by number


In doing this assignment, keep in mind that there are no single answers that are "right" or "wrong"; we are evaluating your responses on the basis of how you interpret your results and the logic of the models you construct.

1. (1.5%) Compute the share of the total sales that the deliself (Deli Self Service Sales) department sells of all products in the store by dividing by the variable called tot_sale and creating a new series. Choosing those variables that describe the neighborhoods in which the stores are located, plot the scatter plot matrix as shown below and describe how they change with each other, as well as how they change with your new dependent variable. You should choose those variables that could plausibly be the regressors that do the most to explain the variation in the sales of the department you are investigating. Note that it is not necessary to use all the variables, and it is also not necessary to keep choosing different sets. Interpret these plots: which variables appear to have the most impact on the sales you are investigating? Which variables appear to be most closely related to each other, so that you would not want to include them in the same model? (Note: you will need to resize this plot to fit in your assignment report.)

2. (1.5%) Using a regression analysis, determine which characteristics of the shoppers in the store are most important in influencing the sales share of the deliself (Deli Self Service Sales) department. Limit your models to five or fewer regressors, given the dangers of overfitting. The model should be such that, given the characteristics of a neighborhood, it could be used to predict the proportion of sales of a new store that would be in the department you are studying.

3. (2%) Write a description of the results, describing the implications of each t-test you perform. Note that not all of the variables need to have a significant impact on the dependent variable that you have used. Also, it is not necessary that the model be the "best" or have the highest R²; just that you have a justification for the inclusion of each of the regressors. If they result in t-statistics that do not allow you to reject the null hypothesis that the parameter is zero, report this as well and discuss why you feel this may be the case.
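The "closely related to each other" check behind the scatter plot matrix is a pairwise correlation between candidate regressors. A sketch of the Pearson correlation that each panel of the matrix visualizes (the series here are made-up, not DDF_stores_1.wf1 variables):

```python
def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Two exactly proportional hypothetical neighborhood variables:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Pairs of candidate regressors with correlations near plus or minus one are the ones you would avoid putting in the same model, since near-collinear regressors make the individual coefficient estimates unstable.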


Tutorial 9. Logistic Regression. Week of May 5th.

On the night of the 14th and the morning of the 15th of April 1912, the passenger ship Titanic, on its maiden voyage, sank after it hit an iceberg in the North Atlantic. Notoriously, not all its passengers survived the ordeal. Using the list of passengers and their characteristics, we are able to analyze which passengers survived and which did not. The data set titanic.wf1, which was obtained from the web site (www.encyclopedia-titanica.org), provides data on the passengers including: their age, the class of cabin in which they were traveling (1st, 2nd or 3rd (also referred to as steerage)), their gender, and whether they were traveling with other members of their family. In this analysis we use only those for which an age and a fare paid were recorded. The table below lists the variables that were created from this data set.

Name of series  Description
AGE             Age at time
BD_ID           Body id number
CABIN           Cabin Number
FAMILY          Family on board
FARE            Fare paid in pounds
FEMALE          =1 if female, =0 otherwise
NAME            Passenger name
PCLASS          Passenger Class
PORT            Port of embarkation
SURVIVED        =1 if survived, =0 otherwise
TIK_NO          Ticket Number

Part A: to be done prior to the tutorial. A.1. Using the variable survived as the dependent variable, estimate a standard regression model using the gender of the passenger and the fare they paid. (This is what would be referred to as a linear probability model.) A.2 Reestimate the model using a logistic regression.


Note that you will need to select the Method and that the estimation method is Logit. A.3 Evaluate the estimated partial derivative of the fare paid by the passenger on the probability of surviving, at the average rate of survival for all passengers, by using the formula:

∂P(Y=1)/∂X_j = β̂_j · P(Y=1) · [1 − P(Y=1)]

How do these values differ from the ones you found in A.1? (The P(Y=1) is estimated as the mean of the survived variable.)

B.1. Reestimate the model fit in question A.2 (the logistic regression), only this time with age and gender instead of the fare paid. Is this new model superior to the one estimated in A.2? B.2 Reestimate both models from questions A.2 and B.1 with an interaction between the fare paid and gender for the first model, and between age and gender for the second model. B.3 Using the estimated parameters from part B.2, estimate the partial derivative of the probability of survival with respect to age and to fare, for both men and women, evaluated at the mean survival rate.


Tutorial 10. Time Series I. Week of May 12th. A. A set of data was collected with monthly observations on the following variables that describe the Tasmanian economy, given in Tasmania.wf1. The variable names and descriptions are listed below.

Variable Name  Description
food           Retail food retailing expenditure
accom          Takings from accommodation
lott           Lotteries
m1             Dummy for January
...
m12            Dummy for December
year_          Year, not periodic
month_         Month, period 12
date_          Date, format "MMM YYYY"

Using the data series for the takings from accommodation (accom), answer the following. A.1 Determine the nature of the seasonality of these data (additive or multiplicative). First plot the series as a time series line:

Now do the seasonal plot of the data:

Page 21: Tutes_new_1

J. Hirschberg ECON20003 Semester 1, 2014

21

Then scan these plots to determine whether the seasonality is multiplicative or additive in nature, and whether you can see a trend. A.2 Estimate the seasonal factors for the series and test whether they are significantly different from the one for March, using a regression with dummies for each month. Specify the regression with all the month dummies except the one for March, and interpret the values of the parameters and what they mean for the level of accommodation in March as compared with the other months.

A.3 Using the predicted values, estimate a deseasonalized series for accom and plot the three series on one graph. After estimating the regression, use the forecast option to create the series accomf, which holds the values of accom due to the seasonal component alone. We can then subtract this from the series and add the average to obtain a deseasonalized series.


Now subtract the predicted values from the original series and add the mean of the original data (use @mean) to rescale the total, creating the new variable accom_ds:
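Putting A.3 together, the commands might look like the following (eq_seas and accomf are the assumed names of the seasonal regression and its fitted values):

```
' fitted values from the seasonal dummy regression
eq_seas.fit accomf

' remove the seasonal component and restore the original level
series accom_ds = accom - accomf + @mean(accom)
```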

A.4 By adding a trend variable to these models, determine whether they have a trend and whether or not it is linear. This can be done by respecifying the model used above with a polynomial in the trend, to see whether the linear and quadratic parameter estimates are significant.
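One way to specify the polynomial trend version of the model, using the @trend function described in Appendix A (again, the equation name is illustrative):

```
' seasonal dummies plus linear and quadratic trend terms
equation eq_trend.ls accom c @expand(@month, @drop(3)) @trend @trend^2
```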

Part B. Perform the same tasks as in part A with lott (the amount spent on the state lottery) as the variable of interest from the same data set. In what ways do these results differ from what you found looking at the accommodation data in part A? Can you suggest a reason why this might be the case?


Tutorial 11. Time Series II Week of May 19th A. The Eviews data set timeseries11.wf1 contains a number of series. The plots for these series are shown below.

[Figure: time series line plots of the six series SAU, SAZ, SAT, SAD, SAF, and SAH, each over observations 1 to 100.]

A.1 Estimate the correlograms and partial correlograms for the variables SAU, SAZ, SAT, SAF, SAH, and SAD. A.2 Propose a model for each of these six series - it is not necessary to estimate the models, though you may try some of them.
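The correlogram and partial correlogram of a series are displayed together by the correl view; for example (the choice of 36 lags is arbitrary):

```
' correlogram and partial correlogram of SAU to 36 lags
sau.correl(36)
' repeat for saz, sat, saf, sah, and sad
```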

B. Using the results in part A, estimate the appropriate model for each series and check the correlograms of the residuals for an indication of whether you have removed all the systematic variation.
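For instance, if the correlogram of SAU suggested an AR(1), a sketch of the estimate-and-check cycle would be as follows (the AR(1) specification here is purely illustrative, not the answer for any of the series):

```
' fit a candidate ARMA model for sau
equation eq_sau.ls sau c ar(1)

' correlogram of the residuals - no remaining significant spikes
' suggests the systematic variation has been captured
eq_sau.correl(24)
```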


Tutorial 12. & Assignment #3 Exam Review Week of May 26th. This assignment will be worth a maximum of 5% of your total mark; each question has an equal weight. Recall that I will use the sum of the two highest marks that you receive on your submitted assignments, so you do not need to submit this assignment to receive full marks for assignments. I do, however, urge everyone to attempt this assignment because a set of questions similar to these will be part of the final examination. The assignment is due by 9am May 26th; the tutorials from the 26th to the 30th will be planned around a discussion of the answers. The assignments are to be submitted electronically using the assignment tool available via the LMS. Make sure that you have made a copy of your assignment before you hand it in. All answers are to be written or pasted on no more than 4 pages (more will not be assessed). Provide an answer of no more than two paragraphs to each of these questions - each is worth half a mark.

1. For a group of firms we observe the following characteristics: the total revenue to total cost ratio, the ratio of net revenue to total sales, the market share of the firm, and the sales to assets ratio. We also know which of these firms export their products and which do not. Describe a method for determining if any of these firm characteristics can be used to predict whether a firm from another sample will be an exporter or not. What are the strengths and weaknesses of this method?

2. Two sets of firms are observed at the same point in time. One set has just failed and

filed for bankruptcy; the other is successful and still operating. We are given the net revenue per total asset value for the previous period when the firms in both sets were still operating. What form of analysis will determine if net revenue per total asset value is different for these two groups of firms? What test statistic would we compute and what type of test would be appropriate to determine if there is a difference between these groups?

3. In a set of data we observe the ranks of a random sample of 150 employees of a

corporation on a test of computer skills. The employees have been split into three groups. 50 undertook a day-long course in computer methods, a second group of 50 were given a booklet which they were asked to study on their own time, while the third group of 50 were not given any special training. How could we draw inferences from these observations to predict the effectiveness of the course versus the booklet versus doing nothing? Specify the null hypothesis to be tested. What test statistic or statistics would be most appropriate to use?

4. The profits of a set of milk bars are observed for the last 5 months. These stores are

classified into two groups: those that had negative profits and those that had positive profits. We are also given the net revenue per total asset value for the previous five months. What form of analysis would you use to build a model to predict whether the profits of another set of milk bars would be positive or negative? How would you evaluate this model? If you are now told the value of the profits, how would you change your answer - is this model preferable to the model you fit before?

Page 25: Tutes_new_1

J. Hirschberg ECON20003 Semester 1, 2014

25

5. Assume a person has two options for investment – one is in oil futures and the other is in coffee futures. Let us assume that they both have had similar rates of return over the past few months. Describe an analysis that would allow you to distinguish between these two on the basis of a measure of dispersion. In other words: How would you determine if they are equally risky investments? What is your null hypothesis? Which statistic would you use?

6. If we observe the monthly demand for natural gas in Victoria for the past 10 years,

how would you predict monthly demand for the next year with only this information? Describe the method that you would employ and what sort of models you would consider for these data. What diagnostic method would you apply to check the model once you had estimated the parameters? If you wanted to include the impact of the September 1998 period, when almost all gas supplies were cut, could you accommodate this event in your analysis?

7. A company that sells biscuits collects data on the sales of its biscuits and those of its

competitors using supermarket scanner data. These data record the price of the equivalent sized package and whether there was a sales campaign that week or not. Given that there are two brands of biscuits, describe a method for measuring the price response, and the response to whether there was a sales campaign or not, for each biscuit brand relative to the other brands. What are the strengths and weaknesses of this method?

8. An insurance company collects data on its medical liability policy holders such as:

age, whether trained in Australia or not, specialty in which they practice and other such information. We also have an indicator as to whether these doctors had a claim of liability against them or not. Describe a method for predicting if a particular doctor who is applying for insurance will make a liability claim in the future or not. What are the strengths and weaknesses of this method? Can we determine which characteristic is most important in predicting future liability claims? Describe how one can use this method to evaluate the model’s ability to fit the data.

9. If we plan to buy a house in a suburb of Melbourne and we have a history of house

prices in the suburb on a quarterly basis for the past 10 years how could we predict what the value of the housing prices would be in 5 to 10 years’ time?

10. A new soft drink is about to be marketed in an area where this particular brand has not

been sold before. Given that data is available for the sales of a similar product with the price of the product, the amount of advertising, the prices of related products and the income of the consumers of the product, describe the methods that you could apply to this data to predict the sales for the new product. How could you account for the time factor in the response to advertising?


Appendix A: Some Useful Eviews Commands

Eviews has an extensive library of help documents on line. The Eviews tutorials located at http://www.eviews.com/Learning/index.html are particularly useful. Eviews Illustrated is the more readable version of the documentation available on line, and a link to this is available on the LMS. The main elements of the program for this subject will be covered in the tutorials. This appendix pulls together a number of techniques that we will use and that you may find helpful later in the subject.

Dummy variables. Logic statements can be evaluated to generate a variable that is either 0 or 1 based on the evaluation. For example, if a variable SEX is equal to 1 for female and 2 for male, then the expression (SEX = 2) would have a value of 1 if the logical expression was true (if the observation was for a male) and 0 otherwise. Note that if there is more than one possible category, such as when the variable records occupation type or month of the year, the dummy would be zero for all values that do not satisfy the logical expression.

Automatic dummy variables. The @expand expression may be added in estimation to indicate the use of one or more automatically created dummy variables.

Syntax: @expand(ser1[, ser2, ser3, ...][, drop_spec])

This creates a set of dummy variables that span the unique values of the input series ser1, ser2, etc. The optional drop_spec may be used to drop one or more of the dummy variables. drop_spec may contain the keyword "@DROPFIRST" (indicating that you wish to drop the first category), "@DROPLAST" (to drop the last category), or a description of an explicit category, using the syntax @DROP(val1[, val2, val3, ...]), where each argument corresponds to a category in @EXPAND. You may use the wild card "*" to indicate all values of a corresponding category.

Example. Consider the following two variables: SEX is a numeric series which takes the values 1 and 0.
REGION is an alpha series which takes the values "North", "South", "East", and "West".


The command:

eq.ls income @expand(sex) age

regresses INCOME on two dummy variables, one for "SEX=0" and one for "SEX=1", as well as the simple regressor AGE. The @EXPAND statement in

eq.ls income @expand(sex, region) age

creates 8 dummy variables corresponding to:

sex=0, region="North"
sex=0, region="South"
sex=0, region="East"
sex=0, region="West"
sex=1, region="North"
sex=1, region="South"
sex=1, region="East"
sex=1, region="West"

The expression @expand(sex, region, @dropfirst) creates the set of dummy variables defined above, but no dummy is created for "SEX=0, REGION="North"". In the expression @expand(sex, region, @droplast), no dummy is created for "SEX=1, REGION="West"". The expression @expand(sex, region, @drop(0,"West"), @drop(1,"North")) creates a set of dummy variables from SEX and REGION pairs, but no dummy is created for "SEX=0, REGION="West"" or "SEX=1, REGION="North"". Finally, @expand(sex, region, @drop(1,*)) specifies that dummy variables for all values of REGION where "SEX=1" should be dropped.


Simple Mean @mean(var) Computes the mean of the data over the sample that has been specified.


Basic Date Functions There is a set of functions that provides information about the dates in your dated workfiles. The first two functions return the start and end date of the period of time (interval) associated with the current observation of the workfile:

@DATE: returns the start date of the period of time of the current observation of the workfile.
@ENDDATE: returns the end date of the period of time associated with the current observation of the workfile.

Each date is returned as a number using standard EViews date representation (fractional days since 1st Jan 1AD; see Dates). A period is considered to end during the last millisecond contained within the period. In a regular frequency workfile, each period will end immediately before the start of the next period. In an irregular workfile there may be gaps between the end of one period and the start of the following period due to observations that were omitted from the workfile. The @DATE and @ENDDATE functions can be combined with the EViews date manipulation functions to provide a wide variety of calendar information about a dated workfile. For example, if we had a monthly workfile containing sales data for a product, we might expect the total sales that occurred in a given month to be related to the number of business days (Mondays to Fridays) that occurred within the month. We could create a new series in the workfile containing the number of business days in each month by using:

series busdays = @datediff(@date(+1), @date, "B")

If the workfile contained irregular data, we would need to use a more complicated expression since in this case we cannot assume that the start of the next period occurs immediately after the end of the current period.
For a monthly irregular file, we could use:

series busdays = @datediff(@dateadd(@date, 1, "M"), @date, "B")

Similarly, when working with a workfile containing daily share price data, we might be interested in whether price volatility is different in the days surrounding a holiday for which the market is closed. We can use the first formula given above to determine the number of business days between adjacent observations in the workfile, then use this result to create two dummy variables that indicate whether a particular observation is before or after a holiday day:

series before_holiday = (busdays > 1)
series after_holiday = (busdays(-1) > 1)

We could then use these dummy variables as exogenous regressors in the variance equation of a GARCH estimation to estimate the impact of holidays on price volatility.


In many cases, you may wish to transform the date numbers returned by @DATE so that the information is contained in an alternate format. EViews provides workfile functions that bundle common translations of date numbers to usable information. These functions include:

@YEAR: returns the four digit year in which the current observation begins. It is equivalent to "@DATEPART(@DATE, "YYYY")".
@QUARTER: returns the quarter of the year in which the current observation begins. It is equivalent to "@DATEPART(@DATE, "Q")".
@MONTH: returns the month of the year in which the current observation begins. It is equivalent to "@DATEPART(@DATE, "MM")".
@DAY: returns the day of the month in which the current observation begins. It is equivalent to "@DATEPART(@DATE, "DD")".
@WEEKDAY: returns the day of the week in which the current observation begins, where Monday is given the number 1 and Sunday is given the number 7. It is equivalent to "@DATEPART(@DATE, "W")".
@STRDATE(fmt): returns the set of workfile row dates as strings, using the date format string fmt. See Date Formats for a discussion of date format strings.
@SEAS(season_number): returns a dummy variable based on the period within the current year in which the current observation occurs, where the year is divided up according to the workfile frequency. For example, in a quarterly file, "@SEAS(1)", "@SEAS(2)", "@SEAS(3)", and "@SEAS(4)" correspond to the set of dummy variables for the four quarters of the year. These expressions are equivalent (in the quarterly workfile) to "@QUARTER=1", "@QUARTER=2", "@QUARTER=3", and "@QUARTER=4", respectively.
@ISPERIOD(arg): returns a dummy variable for whether the observation is in the specified period, where arg is a double quoted date or period number. Note that in dated workfiles, arg is rounded down to the workfile frequency prior to computation.

Additional information on working with dates is provided in Dates.
Trend Functions One common task in time series analysis is the creation of variables that represent time trends. EViews provides two distinct functions for this purpose:

@TREND(["base_date"]): returns a time trend that increases by one for each observation of the workfile. The optional base_date may be provided to indicate the starting date for the trend.
@TRENDC(["base_date"]): returns a calendar time trend that increases based on the number of calendar periods between successive observations. The optional base_date may be provided to indicate the starting date for the trend.


The functions @TREND and @TRENDC are used to represent two different types of time trends that differ in some circumstances. In a regular frequency workfile, @TREND and @TRENDC both return a simple trend that increases by one for each observation of the workfile. In an irregular workfile, @TREND provides an observation trend as before, but @TRENDC now returns a calendar trend that increases based on the number of calendar periods between adjacent observations. For example, in a daily irregular file where a Thursday has been omitted because it was a holiday, the @TRENDC value would increase by two between the Wednesday before and the Friday after the holiday, while the @TREND will increase by only one. Both @TREND and @TRENDC functions may be used with an argument consisting of a string containing the date at which the trend has the value of zero. If this argument is omitted, the first observation in the workfile will be given the value of zero. The decision of which form of time trend is appropriate in a particular context should be based on what role the time trend is playing in the analysis. When used in estimation, a time trend is usually used to represent some sort of omitted variable. If the omitted variable is something that continues to increase independently of whether the workfile data is observed or not, then the appropriate trend would be the calendar trend. If the omitted variable is something that increases only during periods when the workfile data is observed, then the appropriate trend would be the observation trend. An example of the former sort of variable would be where the trend is used to represent population growth, which continues to increase whether or not, for example, financial markets are open on a particular day. An example of the second sort of variable might be some type of learning effect, where learning only occurs when the activity is actually undertaken. 
Note that while these two trends are provided as built in functions, other types of trends may also be generated based on the calendar data of the workfile. For example, in a file containing monthly sales data, a trend based on either the number of days in the month or the number of business days in the month might be more appropriate than a trend that increments by one for each observation. These sorts of trends can be readily generated using the @DATE and @DATEDIFF functions. For example, to generate a trend based on the number of days elapsed between the start date of the current observation and the base date of 1st Jan 2000, we can use:

series daytrend = @datediff(@date, @dateval("1/1/2000"), "d")

When used in a monthly file, this series would provide a trend that adjusts for the fact that some months contain more days than others.


Appendix B: How to cut and paste a spreadsheet file into Eviews For example, if we wanted to read in the data set referred to as SERIESA.dat, we would first load the data set into Excel:

This data is monthly but the actual dates have not been given, so we need to assume a period of 48 months. Assume the data are for Jan 1990 to Dec 1993 (for these data this is arbitrary). Thus we open the Eviews program (while keeping the Excel program open to the same page) and under the File menu we click on New and then Workfile as shown below:

This opens a dialogue window which requests information on the nature of the series - the frequency and the dates - here we use the dates as we assumed above (for other series that have specified dates these would be the first and the last date of the series).

Now we use the submenu under Quick, choosing the entry entitled Empty Group (Edit Series), which opens the Eviews equivalent of a set of empty cells in a spreadsheet program.


The new spread sheet looks like this:


Now we can paste the data from the Excel program by copying the columns we want and then pasting them. However, if we want to copy the names we need to move the view up one cell by clicking on the up button for the sheet. Once this is done you reveal the line in the sheet that has the names of the series - thus you can import the names as well, and when there are multiple series you can keep track of which is which without having to rename them once they are brought in. Thus the screen will look like:


The row entitled "obs" is where we paste the series names. Now, returning to Excel, we copy the series together with a first row that lists the names; in this example we need to insert a row just above line 8, where the data begin:

Note that it is necessary to highlight and copy the entire series or sets of series. Then return to the Eviews program and paste the columns starting in the upper left-hand corner. Thus it should look like:


Now you can close the window - it will ask if you want to delete the untitled group; if you answer yes, only the group object is deleted and the data series will not be lost.