
MATH 3359 Introduction to Mathematical Modeling

Project

Multiple Linear Regression
Multiple Logistic Regression

Project

Dataset: any fields you are interested in; large sample size

Methods: simple/multiple linear regression; simple/multiple logistic regression

Due on April 23rd

Outline

Multiple Linear Regression: Introduction; Make scatter plots of the data; Fit the multiple linear regression model; Prediction

Multiple Logistic Regression: Introduction; Fit the multiple logistic regression model; Exercise

Recall: Simple Linear Regression

Given a data set {yi, xi, i = 1, …, n} of n observations, where yi is the dependent variable and xi is the independent variable, the simple linear regression model is

yi = β0 + β1 xi + εi,  i = 1, …, n,

or, equivalently, E(yi) = β0 + β1 xi, where the errors εi are independent with εi ~ N(0, σ²).

Multiple Linear Regression

Given a data set {yi, xi1, …, xip, i = 1, …, n} of n observations, where yi is the dependent variable and xi1, …, xip are the independent variables, the multiple linear regression model is

yi = β0 + β1 xi1 + β2 xi2 + … + βp xip + εi,  i = 1, …, n,

where the errors εi are independent with εi ~ N(0, σ²).

Generally, we can transform the xi's before plugging them into the model, and they need not be independent of one another; an R sketch follows the list.

1. Transformations: e.g. use log(xi1) or xi1² in place of xi1.

2. Dependent case: the regressors may be functions of one another, e.g. xi2 = xi1².

3. Cross-product terms: e.g. include the product xi1 * xi2 as an extra regressor.
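For illustration only, a minimal R sketch of how such terms are written in an lm formula; the data frame ‘dat’ and columns y, x1, x2 are hypothetical, not from the slides:

# Transformation of a predictor: log(x1), plus a squared term via I()
fit1 = lm(y ~ log(x1) + I(x1^2), data = dat)

# Cross-product (interaction) term: x1:x2 adds only the product,
# while x1*x2 is shorthand for x1 + x2 + x1:x2
fit2 = lm(y ~ x1 + x2 + x1:x2, data = dat)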

Example

The data set includes the selling price at auction of 32 antique grandfather clocks. The age of each clock and the number of people who made a bid are also recorded.

Age  Bidders  Price
127  13       1235
115  12       1080
127   7        845
150   9       1522
156   6       1047
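A minimal sketch of getting these data into R; the file name ‘auction.txt’ is an assumption about where the full 32-row data set was saved, and the second block simply re-enters the five rows shown above:

# Hypothetical local copy of the data (tab-delimited, with a header row)
auction = read.table('auction.txt', header = TRUE)

# The five rows printed above, entered by hand as a quick check
auction_preview = data.frame(Age = c(127, 115, 127, 150, 156),
                             Bidders = c(13, 12, 7, 9, 6),
                             Price = c(1235, 1080, 845, 1522, 1047))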

Recall: Scatter Plots — Function ‘plot’

# Price against age
plot(auction$Age, auction$Price, main = 'Relationship between Price and Age')

# Price against number of bidders
plot(auction$Bidders, auction$Price, main = 'Relationship between Price and Number of bidders')

# Scatter plot matrix of all pairs of variables
plot(auction)

Fit Multiple Linear Regression Model — Function ‘lm’ in R

reg = lm(formula, data)
summary(reg)

In our example,

reg = lm(Price ~ Age + Bidders, data = auction)

> summary(reg)

Call:
lm(formula = Price ~ Age + Bidders, data = auction)

Residuals:
   Min     1Q Median     3Q    Max
-207.2 -117.8   16.5  102.7  213.5

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1336.7221   173.3561  -7.711 1.67e-08 ***
Age            12.7362     0.9024  14.114 1.60e-14 ***
Bidders        85.8151     8.7058   9.857 9.14e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Hence, the fitted regression equation is

Price = 12.7362 * Age + 85.8151 * Bidders - 1336.7221

Prediction — Function ‘predict’ in R

Predict the average price of a clock with Age = 150, Bidders = 10:

predict(reg, data.frame(Age = 150, Bidders = 10))

Predict the average prices of clocks with Age = 150, Bidders = 10 and with Age = 160, Bidders = 5:

predict(reg, data.frame(Age = c(150, 160), Bidders = c(10, 5)))
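A rough hand check using the rounded coefficients above; predict() uses the full-precision estimates, so its output may differ slightly in the last decimals:

# Age = 150, Bidders = 10
12.7362 * 150 + 85.8151 * 10 - 1336.7221  # about 1431.86

# Age = 160, Bidders = 5
12.7362 * 160 + 85.8151 * 5 - 1336.7221   # about 1130.15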

Exercise

1. Download the data: http://www.statsci.org/data/multiple.html (‘Mass and Physical Measurements for Male Subjects’)

2. Import the txt file into R

3. Use ‘Mass’ as the response and ‘Fore’, ‘Waist’, ‘Height’ and ‘Thigh’ as independent variables

4. Make a scatter plot of the response against each of the independent variables

5. Fit the multiple linear regression model

6. Predict ‘Mass’ with Fore = 30, Waist = 180, Height = 38 and Thigh = 58, and with Fore = 29, Waist = 179, Height = 39 and Thigh = 57
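One possible sketch of this exercise; the file name ‘physical.txt’ is an assumption about how the downloaded data were saved, while the column names come from the exercise itself:

# Hypothetical local copy of the downloaded data (header row, whitespace-delimited)
physical = read.table('physical.txt', header = TRUE)

# Scatter plot of the response against each independent variable
plot(physical$Fore, physical$Mass)
plot(physical$Waist, physical$Mass)
plot(physical$Height, physical$Mass)
plot(physical$Thigh, physical$Mass)

# Multiple linear regression
reg2 = lm(Mass ~ Fore + Waist + Height + Thigh, data = physical)
summary(reg2)

# Predictions at the two requested settings
predict(reg2, data.frame(Fore = c(30, 29), Waist = c(180, 179),
                         Height = c(38, 39), Thigh = c(58, 57)))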

Recall: Simple Logistic Regression

For a binary response with success probability p:

Odds: p / (1 - p)

Log-odds (logit): log( p / (1 - p) )

Logistic regression models the log-odds as a linear function of the independent variable:

log( p / (1 - p) ) = β0 + β1 x

Equivalently, p = exp(β0 + β1 x) / (1 + exp(β0 + β1 x)), which is not a linear function of x.
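A quick numerical illustration in R; base R’s qlogis() and plogis() compute the log-odds and its inverse:

p = 0.75
p / (1 - p)               # odds = 3
log(p / (1 - p))          # log-odds, same as qlogis(p), about 1.0986
plogis(log(p / (1 - p)))  # back from log-odds to probability: 0.75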

Multiple Logistic Regression

With independent variables x1, …, xp, the log-odds are modeled as

log( p / (1 - p) ) = β0 + β1 x1 + … + βp xp

Example (the mtcars data):

am: transmission, 0 = automatic, 1 = manual
hp: gross horsepower
wt: weight (1000 lbs)

Multiple Logistic Regression — Function ‘glm’ in R

logreg = glm(formula, family = 'binomial', data = dataset)

glm: generalized linear model
family: the assumed distribution of the response ('binomial' for logistic regression)
data: the name of the dataset

In the example,

reg = lm(am ~ hp + wt, data = mtcars)

> summary(reg)

Call:
lm(formula = am ~ hp + wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-0.6309 -0.2562 -0.1099  0.3039  0.5301

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.547430   0.211046   7.332 4.46e-08 ***
hp           0.002738   0.001192   2.297    0.029 *
wt          -0.479556   0.083523  -5.742 3.24e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
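For comparison, a minimal sketch of fitting the same model as a logistic regression with glm and the binomial family, as described above; its coefficients are on the log-odds scale and will differ from the lm estimates printed here:

# Logistic regression of transmission type on horsepower and weight
logreg = glm(am ~ hp + wt, family = 'binomial', data = mtcars)
summary(logreg)

# Odds ratios: exponentiate the coefficients
exp(coef(logreg))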

Final Model:

log( odds of manual ) = 1.547430 + 0.002738 * hp - 0.479556 * wt

For every one-unit increase in hp, the log odds of manual (versus automatic) increase by 0.002738, so the odds of manual are multiplied by exp(0.002738) = 1.002742.

For every one-unit increase in wt, the log odds of manual (versus automatic) decrease by 0.479556, so the odds of manual are divided by exp(0.479556) = 1.615357 (i.e. multiplied by about 0.62).

Exercise

1. Import the data from the web:

http://www.ats.ucla.edu/stat/data/binary.csv

2. Fit the logistic regression with admit as the response and gre, rank and gpa as independent variables.

What is the final logistic model?

Are the three independent variables significant?

glm(formula, family = 'binomial', data = ...)
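A possible sketch of this exercise, reading the CSV straight from the URL above (rank is used as given here, although it is often recoded as a factor):

# Read the admissions data from the web
binary = read.csv('http://www.ats.ucla.edu/stat/data/binary.csv')

# Logistic regression of admit on gre, rank and gpa
logreg2 = glm(admit ~ gre + rank + gpa, family = 'binomial', data = binary)

# Coefficient estimates, standard errors and p-values
summary(logreg2)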
