multiple regression

Multiple Regression Topic 5

Upload: shiva-shankar

Post on 16-Sep-2015



DESCRIPTION

MR

TRANSCRIPT

  • Multiple Regression: Topic 5

  • Agenda: Background, Example with Real Data, Some Considerations, Key Terms, Summary

  • Background: Multiple Linear Regression is widely used in academia and also in MR. For our course, we can consider it the start of Multivariate Analysis. Any idea what the following are: Multivariate analysis? Multiple Linear Regression (MLR)?

  • Background: Multivariate analysis is hard to define well. Some say that any time you have more than 2 variables, it is multivariate. Some say that you need to have many combinations of variables, i.e. variates. Some say that you need to have multiple dependent variables. For all practical purposes, the following can be considered multivariate: MLR, Factor analysis

  • Background: Discriminant analysis, Cluster analysis, Conjoint analysis, Canonical correlation, Structural Equation Modeling. We shall consider just MLR, factor, discriminant and cluster analyses. Linear regression involves finding a linear relationship between an independent variable and a dependent variable.

  • Background: Different levels of an independent variable are associated with corresponding changes in the dependent variable. What is an IV? What is a DV? The IV is denoted by X, while the DV is denoted by Y. We can loosely say X causes Y. Any idea how regression works? The principle behind it? What scale is the IV in, and the DV? Assume one X, one Y.

  • Background: Normally, the IV and DV are continuous, not discrete. Meaning? In regression, a line is repeatedly fitted to the scatter-plot of X and Y. The line of best fit is the regression line. Consider the following data.

  • Background

  • Background: Let us plot the points. Drawing a line of best fit is child's play. The association is perfectly linear. In real life, we rarely find data that are so perfect. We may instead find data as follows. Here too, the line of best fit is the regression line. There is some error. But the idea is to minimise this error; how is this done?

  • Background: The method of least squares is followed. Different lines are fitted, the errors are squared, and the line with the smallest sum of squared errors is finally chosen. Sometimes, MLR is called OLS, or Ordinary Least Squares. Why should one square the errors and then add? Why not just add up?
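The least-squares idea on this slide can be written compactly for the 1-IV case. This is a standard textbook formulation rather than something shown in the deck; the symbols a (intercept), e_i (error for observation i) and the hat notation are assumptions introduced here, while b, X and Y follow the slides.

```latex
\hat{Y}_i = a + b X_i, \qquad e_i = Y_i - \hat{Y}_i, \qquad
(a, b) = \arg\min_{a,\,b} \sum_{i=1}^{n} e_i^{2}
```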

  • Background: The idea is 2-fold. We stop +ve and -ve errors from cancelling each other out. We penalise large errors. This is the 1-IV case; it is similar with n IVs. Impossible to show on the board. Now let us consider some real data and perform a regression.
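A minimal sketch of why the errors are squared before adding, assuming made-up data for one X and one Y (the numbers and the helper `sse_for` are hypothetical, not from the slides). It fits the least-squares line in closed form and compares it with a flat line through the mean of Y: both have signed errors that add up to zero, so only the squared sum tells them apart.

```python
import numpy as np

# Hypothetical data: one IV (x) and one DV (y) with a little scatter.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates for the line y = a + b*x.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()


def sse_for(intercept, slope):
    """Sum of squared errors for an arbitrary candidate line."""
    errors = y - (intercept + slope * x)
    return np.sum(errors ** 2)


# Signed errors of the fitted line cancel out to (numerically) zero ...
sum_errors_fit = np.sum(y - (a + b * x))
# ... but so do the signed errors of a useless flat line at the mean of y.
sum_errors_flat = np.sum(y - y.mean())

# The squared sums, in contrast, clearly separate the two lines.
sse_fit = sse_for(a, b)
sse_flat = sse_for(y.mean(), 0.0)

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"sum of errors: fit = {sum_errors_fit:.2e}, flat = {sum_errors_flat:.2e}")
print(f"sum of squared errors: fit = {sse_fit:.3f}, flat = {sse_flat:.3f}")
```

With n IVs the same criterion applies; only the line becomes a plane or hyperplane.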

  • Some Considerations: MLR can also handle non-metric or categorical IVs, e.g. gender influences shopping time. This is called dummy coding. Basically, dummy regression is the same as an ANOVA. Both are forms of the General Linear Model. While MLR is useful, it has certain prerequisites and limitations.
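A small sketch of dummy coding for the gender and shopping-time example, assuming invented data and an arbitrary 0/1 coding (both are hypothetical). It shows the link to ANOVA mentioned on the slide: with a single 0/1 dummy, the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means.

```python
import numpy as np

# Hypothetical shopping times in minutes; gender is dummy coded 0/1.
# Which group gets 0 and which gets 1 is an arbitrary choice.
gender = np.array([0, 0, 0, 1, 1, 1])
minutes = np.array([20.0, 25.0, 30.0, 40.0, 45.0, 50.0])

# Design matrix: a column of ones (intercept) plus the dummy variable.
X = np.column_stack([np.ones_like(minutes), gender])
coef, *_ = np.linalg.lstsq(X, minutes, rcond=None)
intercept, slope = coef

mean_group0 = minutes[gender == 0].mean()
mean_group1 = minutes[gender == 1].mean()

print(f"intercept = {intercept:.1f} (mean of group 0 = {mean_group0:.1f})")
print(f"slope = {slope:.1f} (difference in means = {mean_group1 - mean_group0:.1f})")
```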

  • Some Considerations: There should not be collinearity between the IVs. This creates unstable estimates with inflated standard errors. The first step is therefore to get the correlation matrix in Excel/SPSS. How to remove this collinearity? One should also go into MLR with sufficient research on likely relationships. Else, one may end up doing sample-specific data mining. There is no guarantee about the robustness of the results.
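The slide suggests Excel/SPSS for the correlation matrix; the same first check can be sketched in Python with simulated IVs (all data here are hypothetical). The variance inflation factor (VIF) is an additional diagnostic that is not mentioned in the slides, and the helper `vif` is written for this illustration only: it regresses one IV on the others and returns 1 / (1 - R²).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical IVs: x2 is almost a copy of x1, x3 is unrelated to both.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Step 1: correlation matrix of the IVs (columns are variables).
corr = np.corrcoef(X, rowvar=False)


def vif(X, j):
    """Variance inflation factor of column j: regress it on the other IVs."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)


print(np.round(corr, 2))
print([round(vif(X, j), 1) for j in range(X.shape[1])])
```

A very high correlation between two IVs, or a large VIF, suggests that one of the pair could be dropped or the two combined; which remedy is appropriate depends on the research question.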

  • Some Considerations: The shot-gun approach should be avoided. MR firms may not agree. There should not be heteroscedasticity in the DV. 2 marks bonus for saying this orally in the final! This can be got around by transforming the data using log, inverse, or square root. MLR cannot handle non-linear relationships. Consider the following data.

  • Some Considerations

  • Some Considerations: SPSS will give you a decent regression, but it misses the point. One has to use polynomial regression, which is beyond scope. One must take great care in ensuring all IVs are put in, else one may reach utterly erroneous conclusions, e.g. Sales on Ad, leaving out Price, SP.
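Although polynomial regression is outside the scope of the course, a brief sketch may make the point about non-linear data concrete. The data below are hypothetical (the slide's own data are not in the transcript): Y is an exact quadratic in X. A straight line still earns a respectable R², yet it misses the curvature that a degree-2 fit captures fully. The helper `r_squared` is defined here for illustration.

```python
import numpy as np

# Hypothetical curved data: y grows with the square of x.
x = np.linspace(0.0, 6.0, 25)
y = x ** 2


def r_squared(y_obs, y_hat):
    """Proportion of variation in y_obs explained by the fitted values."""
    ss_res = np.sum((y_obs - y_hat) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Straight-line fit versus degree-2 polynomial fit.
linear_coefs = np.polyfit(x, y, 1)
quadratic_coefs = np.polyfit(x, y, 2)

r2_linear = r_squared(y, np.polyval(linear_coefs, x))
r2_quadratic = r_squared(y, np.polyval(quadratic_coefs, x))

print(f"R^2 linear    = {r2_linear:.3f}")
print(f"R^2 quadratic = {r2_quadratic:.3f}")
```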

  • Some Considerations: Ideally, have some likely results in mind before going in for data collection. MR firms screw up here. We academics score big here. Why is this important? In case no working knowledge is there, use stepwise regression. It will give you the order of importance.

  • Some Considerations: In exploratory research, it is OK to use stepwise regression. Not a big fan of stepwise.

  • Key Terms (A Review): The Coefficient of Determination, R², gives the extent of variation in Y explained by X (or X1, X2 and so on). It is also called variance explained. Better would be adjusted R². b is the unstandardised weight and β is the standardised weight. The standardised weight is needed since different IVs may be in different units.
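The key terms above can be computed by hand in a short sketch. The data are simulated and the variable names (`ad`, `price`, `sales`) and the true coefficients are hypothetical, chosen only to echo the Sales, Ad and Price example. The sketch uses the usual formulas: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for k IVs, and standardised weight = b × (SD of the IV) / (SD of the DV).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Hypothetical IVs on very different scales, and a DV built from them.
ad = rng.normal(100.0, 20.0, n)
price = rng.normal(10.0, 2.0, n)
sales = 5.0 + 0.3 * ad - 2.0 * price + rng.normal(0.0, 1.0, n)

# Fit the MLR by least squares; the first column is the intercept.
X = np.column_stack([np.ones(n), ad, price])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
b = coef[1:]  # unstandardised weights for ad and price

# R squared: share of the variation in the DV explained by the IVs.
fitted = X @ coef
ss_res = np.sum((sales - fitted) ** 2)
ss_tot = np.sum((sales - sales.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Adjusted R squared penalises the number of IVs (k).
k = 2
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Standardised weights put the IVs on a common, unit-free scale.
iv_sd = np.array([ad.std(ddof=1), price.std(ddof=1)])
beta = b * iv_sd / sales.std(ddof=1)

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print(f"b    = {np.round(b, 3)}")
print(f"beta = {np.round(beta, 3)}")
```

In this simulated example the unstandardised weight for price is larger in magnitude than the one for ad, while the standardised weights rank them the other way round, which is why the unstandardised weights should not be compared across IVs measured in different units.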

  • Key Terms (A Review): The F-value and t-value must be looked at too. Any doubts? Do you want to learn how regression can handle categorical data? Interaction effects? What problems will come here? Need demo?

  • Summary: MLR is a very useful tool. It has wide applications. But one must be careful to avoid violating fundamental assumptions, mainly that of no multicollinearity. Esp. in MR.