Regression using lm (lmRegression.R): Basics, Prediction, World Bank CO2 Data
Regression using lm
lmRegression.R
• Basics
• Prediction
• World Bank CO2 Data
Simple linear regression
• Simple linear model: y = b1 + x b2 + error
y: the dependent variable
x: the independent variable
b1, b2: intercept and slope coefficients
error: random departures between the model and the response
Coefficients are estimated by least squares
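A minimal sketch of what "estimated by least squares" means in R. The data are simulated, so the sample size, the true coefficients 2 and 0.5, and the noise level are illustrative assumptions and not values from the slides. The sketch computes the closed-form least-squares slope and intercept and compares them with the coefficients that lm returns.

```r
# Simulated data: every number here is an illustrative assumption
set.seed(1)
x <- seq(1, 20, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)   # y = b1 + x b2 + error

# Closed-form least-squares estimates for a single predictor
b2 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b1 <- mean(y) - b2 * mean(x)                                     # intercept

# The same estimates from lm
fit <- lm(y ~ x)
coefLm <- coef(fit)   # named vector: "(Intercept)" and "x"

print(c(b1 = b1, b2 = b2))
print(coefLm)
```

The two printed vectors should agree up to floating-point rounding, since lm solves the same least-squares problem.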
Multiple regression
• y = b0 + x1 b1 + x2 b2 + x3 b3 + … + error
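As a hedged illustration of the multiple regression model, the sketch below fits three predictors to simulated data (the coefficients, sample size, and noise level are invented for the example). It builds the design matrix by hand, solves the normal equations, and checks the result against lm.

```r
# Simulated data: coefficients and noise level are illustrative assumptions
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x3 + rnorm(n, sd = 0.3)

# Design matrix with a leading column of ones for the intercept b0
X <- cbind(1, x1, x2, x3)

# Least squares via the normal equations: (X'X) b = X'y
bHat <- drop(solve(crossprod(X), crossprod(X, y)))

# The same model through lm
fitMulti <- lm(y ~ x1 + x2 + x3)

print(cbind(normalEq = bHat, lm = coef(fitMulti)))
```

lm itself uses a QR decomposition rather than the normal equations, which is numerically more stable; the normal equations are shown here only because they make the least-squares criterion explicit.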
Annual Boulder Temperatures
• Temperature is the dependent variable, Year is the independent variable
• Errors?? Linear??
CO2 Emissions by Country
• Independent: GDP/capita
• Dependent: CO2 emission
• Linear?? Errors??
The R lm function
• Takes a formula to describe the regression, where ~ plays the role of the equals sign (read y ~ x as "y is modeled by x")
• Works best when the data set is a data frame
• Returns a complicated list that can be used in summary, predict, print, and plot
lmFit <- lm(y ~ x1 + x2)
Or more generally using a data frame
lmFit <- lm( y ~ x1 + x2, data=dataset)
Here y, x1, and x2 are looked up as the columns dataset$y, dataset$x1, dataset$x2
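A short sketch of the workflow just described: fit with a data frame, then pass the returned object to print, summary, and predict. The data frame is simulated, and the column values and the newData points are assumptions made for illustration.

```r
# Simulated data frame: values are illustrative assumptions
set.seed(3)
dataset <- data.frame(x1 = runif(40, 0, 10), x2 = runif(40, 0, 5))
dataset$y <- 1 + 0.8 * dataset$x1 - 0.3 * dataset$x2 + rnorm(40, sd = 0.5)

# Variables in the formula are looked up as columns of the data frame
lmFit <- lm(y ~ x1 + x2, data = dataset)

print(lmFit)                     # the call and the estimated coefficients
fitSummary <- summary(lmFit)     # standard errors, t statistics, R-squared
print(fitSummary$coefficients)

# Predict at new values; newdata must use the column names from the formula
newData <- data.frame(x1 = c(2, 5), x2 = c(1, 3))
pred <- predict(lmFit, newdata = newData, se.fit = TRUE)
print(pred$fit)      # predicted values
print(pred$se.fit)   # standard errors of the predicted means

# plot(lmFit) would draw the standard diagnostic plots for the fit
```

Passing data = dataset keeps the formula short and lets predict match new values to the model by column name.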
Analysis of World Bank data set
• Best to work on a log scale; GDP has the strongest linear relationship
• Some additional pattern is left over in the residuals
• Try other variables
• Try a more complex curve
• Check the predictions using cross-validation
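The points above can be sketched as follows. The World Bank data are not reproduced here, so the example uses a simulated stand-in: the data frame wb, its columns gdp and co2, and all numbers are hypothetical. It fits a straight line on the log scale, extracts the residuals for inspection, and then tries a more complex curve by adding a quadratic term.

```r
# Simulated stand-in for the World Bank data: wb, gdp, co2 are hypothetical names
set.seed(4)
wb <- data.frame(gdp = exp(runif(80, log(500), log(50000))))
wb$co2 <- exp(-8 + 1.5 * log(wb$gdp) - 0.04 * log(wb$gdp)^2 +
              rnorm(80, sd = 0.4))

# Straight-line fit on the log scale
fitLinear <- lm(log(co2) ~ log(gdp), data = wb)

# Residuals: any leftover pattern against log(gdp) suggests a missing term
res <- residuals(fitLinear)
# plot(log(wb$gdp), res) would show the pattern graphically

# A more complex curve: add a quadratic term in log(gdp)
fitQuad <- lm(log(co2) ~ log(gdp) + I(log(gdp)^2), data = wb)

# Residual sums of squares for the two fits
print(c(linear = deviance(fitLinear), quadratic = deviance(fitQuad)))
```

A smaller residual sum of squares for the larger model is guaranteed on the fitting data, so it is not by itself evidence of better predictions; that is why the predictions are checked with cross-validation.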
Leave-one-out Cross-validation
• Robust way to check a model's predictions and the uncertainty measure
• Four steps:
1. Sequentially leave out each observation
2. Refit model with remaining data
3. Predict the omitted observation
4. Compare prediction and confidence interval to the actual observation
A check on the consistency of the statistical model, because the omitted observation is not used to make the prediction
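The four steps can be sketched in R as below. The data are simulated (sample size, coefficients, and noise level are assumptions), and the example uses interval = "prediction" in predict, which is a choice made here: for a single new observation the prediction interval also accounts for the observation's own error, whereas interval = "confidence" covers only the fitted mean.

```r
# Simulated data: all values are illustrative assumptions
set.seed(5)
n <- 30
cvData <- data.frame(x = runif(n, 0, 10))
cvData$y <- 1 + 2 * cvData$x + rnorm(n, sd = 1)

pred <- numeric(n)
lwr  <- numeric(n)
upr  <- numeric(n)

for (i in seq_len(n)) {
  # Steps 1 and 2: leave out observation i and refit on the rest
  fitI <- lm(y ~ x, data = cvData[-i, ])
  # Step 3: predict the omitted observation with a 95% interval
  p <- predict(fitI, newdata = cvData[i, , drop = FALSE],
               interval = "prediction", level = 0.95)
  pred[i] <- p[1, "fit"]
  lwr[i]  <- p[1, "lwr"]
  upr[i]  <- p[1, "upr"]
}

# Step 4: compare predictions and intervals with the actual observations
cvRmse   <- sqrt(mean((cvData$y - pred)^2))          # prediction error
coverage <- mean(cvData$y >= lwr & cvData$y <= upr)  # fraction inside interval

print(cvRmse)
print(coverage)
```

If the statistical model is consistent with the data, the coverage should be roughly the nominal level (0.95 here), although with few observations it can vary noticeably.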