MATH 3359 Introduction to Mathematical Modeling Download/Import/Modify Data, Logistic Regression

Upload: scott-harmon

Post on 23-Dec-2015

Outline

Prepare data
    Download data
    Import data
    Modify data

Logistic Regression
    Introduction
    Fit logistic regression model
    Exercise

Download Data

Data.gov: http://www.data.gov/

Import Data

Web URL:

http://www.ats.ucla.edu/stat/data/binary.csv

Files: Excel files, CSV files, text files (.txt), SPSS files, Minitab files
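As a rough sketch of the import step, the example below reads a CSV file with read.csv. So that it runs offline, it first writes a few made-up rows to a temporary file; the column names admit, gre, gpa and rank are an assumption about the layout of binary.csv, and the values are purely illustrative. read.csv takes a URL in the same way as a local path.

```r
# Write a small made-up CSV to a temporary file (illustrative values only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("admit,gre,gpa,rank",
             "0,400,3.20,2",
             "1,700,3.80,1",
             "0,520,2.90,4"), csv_path)

# Import the file; header = TRUE is the default for read.csv
binary <- read.csv(csv_path)

# read.csv also accepts a URL in place of a local path, for example:
# binary <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")

str(binary)    # column names and types
head(binary)   # first rows
```

Other file types usually need an add-on package, so it is often simplest to export them to CSV first.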

Modify Data

Example: A study to investigate the effects of 3 treatments on hypertension
    Drug medication: X, Y, Z
    Physiological feedback: present, absent
    Diet: present, absent

bp: blood pressure

Modify Data: Function ‘subset’ in R

# delete rows with missing bp

subset(hypertension, !is.na(bp))

# select certain rows

subset(hypertension, bp<170)

subset(hypertension, bp<170 & drug=='X')

subset(hypertension,bp<170 | drug=='X')

# select certain columns

h1=subset(hypertension, select=c(biof,bp))
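The calls above can be tried on a small stand-in data frame. The sketch below assumes a hypertension data frame with columns drug, biof, diet and bp as in the study description; the values are made up for illustration.

```r
# Hypothetical data in the layout of the study (values are made up)
hypertension <- data.frame(
  drug = factor(c("X", "Y", "Z", "X", "Y", "Z")),
  biof = factor(c("present", "absent", "present", "absent", "present", "absent")),
  diet = factor(c("absent", "absent", "present", "present", "absent", "present")),
  bp   = c(165, 180, NA, 172, 158, 190)
)

complete <- subset(hypertension, !is.na(bp))              # drop the row with missing bp
low_bp   <- subset(hypertension, bp < 170)                # rows with bp below 170
low_or_x <- subset(hypertension, bp < 170 | drug == "X")  # either condition holds
h1       <- subset(hypertension, select = c(biof, bp))    # keep two columns

nrow(complete)
nrow(low_bp)
nrow(low_or_x)
names(h1)
```

Note that subset drops rows where the condition evaluates to NA, so the row with missing bp never appears in low_bp or low_or_x.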

Modify Data: Function ‘levels’ in R

# Rename levels of a factor

levels(hypertension$diet)

levels(hypertension$diet)=c(0,1)
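A short sketch of what the renaming does, using a made-up factor rather than the real data. After the assignment the variable is still a factor whose labels are the strings "0" and "1"; the last step shows one possible way to obtain numeric values if they are needed.

```r
# Hypothetical factor with two levels (values are made up)
diet <- factor(c("absent", "present", "present", "absent"))

levels(diet)              # levels are sorted alphabetically by default
levels(diet) <- c(0, 1)   # first level becomes "0", second becomes "1"

levels(diet)              # the new labels
is.factor(diet)           # still a factor, not a numeric vector

# One way to get numeric 0/1 values from the relabelled factor
diet_num <- as.numeric(as.character(diet))
diet_num
```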

Modify Data: Function ‘order’ in R

# sort by biof

hypertension[order(hypertension$biof), ]

# sort by drug

hypertension[order(hypertension$drug), ]

# sort by drug and diet

hypertension[order(hypertension$drug, hypertension$diet), ]

# sort by drug (ascending) and diet (descending)

hypertension[order(hypertension$drug, hypertension$diet, decreasing = c(FALSE, TRUE), method = 'radix'), ]
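Sorting on two keys in opposite directions is easiest to check on a tiny example. The sketch below uses a made-up data frame and two approaches that should agree: a per-key decreasing vector (which requires method = "radix") and negating the numeric sort key returned by xtfrm.

```r
# Hypothetical data (values are made up)
hypertension <- data.frame(
  drug = factor(c("Y", "X", "Z", "X", "Y", "Z")),
  diet = factor(c("absent", "present", "absent", "absent", "present", "present")),
  bp   = c(180, 165, 190, 172, 158, 175)
)

# drug ascending, diet descending; a per-key 'decreasing' vector needs method = "radix"
idx <- order(hypertension$drug, hypertension$diet,
             decreasing = c(FALSE, TRUE), method = "radix")
sorted <- hypertension[idx, ]
sorted

# Alternative: negate the numeric sort key of the descending column
idx2 <- order(hypertension$drug, -xtfrm(hypertension$diet))
all(idx == idx2)
```

Sorting in two separate steps does not give this result in general, because the second sort reorders all rows and discards the first ordering.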

Exercise

attach(mtcars)

Select the subset of the data with columns ‘am’ and ‘vs’

Delete rows with am==0

Sort the dataset by vs (increasing)

Sort the dataset by mpg (decreasing)

Logistic Regression

Simple and multiple linear regression models apply to data with a continuous response variable

They rely on normality assumptions

However, in many situations the response variable is binary (or ordinal)

How do we explore this type of relationship?

Example: Data

vs: engine type, 0: V-shaped engine, 1: straight engine

am: transmission, 0: automatic, 1: manual

mpg: miles per (US) gallon

What is the relationship between am and mpg?

Linear Regression

Not a good fit

There is an increasing relationship between am and mpg

But we may want a simpler model in which am is predicted as 0 or 1

We need a model for analyzing data with a binary response

About Binary Response

Odds: p / (1 - p), where p = P(Y = 1) is the probability of the event

Log-odds: log(p / (1 - p))

Logistic Regression

Logistic regression models the log-odds as a linear function of explanatory variables x1, ..., xk:

log(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk
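To make odds and log-odds concrete, here is a small numeric sketch for a binary response with success probability p. The probabilities are arbitrary example values; qlogis and plogis are the base R logit function and its inverse.

```r
p <- c(0.1, 0.5, 0.8)        # example probabilities (arbitrary values)

odds     <- p / (1 - p)      # odds in favour of the event
log_odds <- log(odds)        # negative below p = 0.5, zero at 0.5, positive above

# qlogis() is the logit function and plogis() its inverse (both in 'stats')
all.equal(log_odds, qlogis(p))
all.equal(plogis(log_odds), p)
```

Probabilities are confined to the interval from 0 to 1, while log-odds can take any real value, which is why a linear function is fitted on the log-odds scale.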

Logistic Regression: Function ‘glm’ in R

logreg = glm(formula, family = 'binomial', data = binary)

glm: generalized linear model

family: distribution of the response (binomial for a binary response)

data: name of the dataset
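For the mtcars example, the call would look roughly as follows. This is a sketch using the built-in mtcars data with am as the response and mpg as the single explanatory variable.

```r
# mtcars ships with R; am is 0/1 (transmission), mpg is miles per gallon
logreg <- glm(am ~ mpg, family = binomial, data = mtcars)

summary(logreg)   # coefficient table with z tests
coef(logreg)      # estimated intercept and slope on the log-odds scale
```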

> summary(logreg)

Call:
glm(formula = am ~ mpg, family = binomial, data = mtcars)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5701  -0.7531  -0.4245   0.5866   2.0617

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.6035     2.3514  -2.808  0.00498 **
mpg           0.3070     0.1148   2.673  0.00751 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

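Final Model:

log(p / (1 - p)) = -6.6035 + 0.3070 * mpg, where p = P(am = 1)

For every one-unit increase in mpg,

the log-odds of manual (versus automatic) increase by 0.3070,

the odds of manual (versus automatic) are multiplied by exp(0.3070) = 1.36, an increase of about 36%.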
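The interpretation of the slope can be checked numerically. The sketch below refits the model on mtcars, exponentiates the slope, and compares the odds of a manual transmission at two mpg values one unit apart; 20 and 21 are arbitrary choices.

```r
logreg <- glm(am ~ mpg, family = binomial, data = mtcars)
b <- coef(logreg)

exp(b["mpg"])   # multiplicative change in the odds per extra mpg

# Predicted probability of a manual transmission at 20 and 21 mpg
p <- predict(logreg, data.frame(mpg = c(20, 21)), type = "response")
odds <- p / (1 - p)

# The ratio of the two odds equals exp(slope)
unname(odds[2] / odds[1])
```

The odds ratio is the same for any two mpg values one unit apart, although the change in probability is not.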

Function ‘curve’ in R

plot(mtcars$mpg, mtcars$am, main='Logistic Regression', xlab='mpg', ylab='am')

curve(predict(logreg, data.frame(mpg=x), type="response"), add=TRUE)

Exercise

Fit the logistic regression of vs (as response) on mpg (as independent variable).

What is the final logistic model?

Is mpg significant?

Make the scatter plot and add the fitted logistic regression curve on it.