MATH 3359 Introduction to Mathematical Modeling
Download/Import/Modify Data, Logistic Regression
Outline
Prepare data
   Download data
   Import data
   Modify data
Logistic Regression
   Introduction
   Fit logistic regression model
   Exercise
Download Data
Data.gov: http://www.data.gov/
Import Data
Web URL: http://www.ats.ucla.edu/stat/data/binary.csv
Files: Excel files, CSV files, text files (.txt), SPSS files, Minitab files
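A minimal sketch of importing data in base R. Reading straight from the web URL needs network access, so that line is left commented out; the runnable part writes a tiny CSV with hypothetical values (the column names admit, gre, gpa, rank are chosen only for illustration) to a temporary file and reads it back, then does the same for a tab-separated .txt file. The readers for Excel, SPSS and Minitab files live in add-on packages and are only listed in comments.

```r
# Reading directly from the web needs network access (not run here):
# binary <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")

# Offline stand-in: a tiny CSV with hypothetical values in a temp file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("admit,gre,gpa,rank",
             "0,380,3.61,3",
             "1,660,3.67,3",
             "1,800,4.00,1"), csv_path)
binary <- read.csv(csv_path)            # CSV file -> data frame
str(binary)

# Text file (.txt): here tab-separated with a header row
txt_path <- tempfile(fileext = ".txt")
write.table(binary, txt_path, sep = "\t", row.names = FALSE)
binary_txt <- read.table(txt_path, header = TRUE, sep = "\t")

# Other formats need add-on packages (not run here):
# readxl::read_excel("file.xlsx")                       # Excel
# foreign::read.spss("file.sav", to.data.frame = TRUE)  # SPSS
# foreign::read.mtp("file.mtp")                         # Minitab portable worksheet
```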
Modify Data
Example: a study to investigate the effects of 3 treatments on hypertension
   Drug medication (drug): X, Y, Z
   Physiological feedback (biof): present, absent
   Diet (diet): present, absent
   bp: blood pressure
Modify Data: Function ‘subset’ in R
# Delete rows with missing bp
subset(hypertension, !is.na(bp))
# select certain rows
subset(hypertension, bp<170)
subset(hypertension, bp<170 & drug=='X')
subset(hypertension,bp<170 | drug=='X')
# select certain columns
h1=subset(hypertension, select=c(biof,bp))
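The hypertension data are not included with these notes, so the sketch below builds a hypothetical toy data frame with the same column names (drug, biof, diet, bp) and invented values, then applies the subset calls above. Note that subset() drops rows where the condition evaluates to NA, so the missing bp value never appears in the row selections.

```r
# Hypothetical toy version of the hypertension data (values invented)
hypertension <- data.frame(
  drug = factor(c("X", "Y", "Z", "X", "Y", "Z")),
  biof = factor(c("present", "absent", "present", "absent", "present", "absent")),
  diet = factor(c("absent", "present", "absent", "present", "absent", "present")),
  bp   = c(170, 186, NA, 161, 173, 158)
)

# Drop the row whose bp is missing
complete <- subset(hypertension, !is.na(bp))

# bp < 170 AND drug X: only row 4 (row 1 has bp exactly 170)
low_and_x <- subset(hypertension, bp < 170 & drug == "X")

# bp < 170 OR drug X: rows 1, 4, 6 (row 3 gives NA and is dropped)
low_or_x <- subset(hypertension, bp < 170 | drug == "X")

# Keep only the biof and bp columns
h1 <- subset(hypertension, select = c(biof, bp))
```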
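Modify Data: Function ‘levels’ in R
# Rename levels of a factor
# diet must be a factor; convert it if it was imported as character
hypertension$diet = factor(hypertension$diet)
levels(hypertension$diet)
levels(hypertension$diet)=c(0,1)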
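A small sketch on a hypothetical diet factor showing how the renaming works: assigning a vector to levels() relabels the existing levels by position, and factor() sorts levels alphabetically by default, so "absent" becomes "0" and "present" becomes "1". Base R also accepts a named list, which states the mapping explicitly instead of relying on level order.

```r
# Hypothetical diet factor; levels are sorted alphabetically by default
diet <- factor(c("present", "absent", "absent", "present"))
levels(diet)              # "absent" "present"

# Positional renaming: "absent" -> "0", "present" -> "1"
levels(diet) <- c(0, 1)

# Explicit mapping with a named list: new level = old level
diet2 <- factor(c("present", "absent"))
levels(diet2) <- list("0" = "absent", "1" = "present")
```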
Modify Data: Function ‘order’ in R
# sort by biof
hypertension[order(hypertension$biof), ]
# sort by drug
hypertension[order(hypertension$drug), ]
# sort by drug and diet
hypertension[order(hypertension$drug, hypertension$diet), ]
# sort by drug (ascending) and diet (descending)
hypertension[order(hypertension$drug, hypertension$diet,
                   decreasing = c(FALSE, TRUE), method = "radix"), ]
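A sketch checking mixed-direction sorting on a hypothetical four-row data frame. A per-key decreasing vector is only allowed with method = "radix"; an equivalent alternative negates the integer codes of diet obtained from xtfrm(). Both put each drug's "present" rows before its "absent" rows.

```r
# Hypothetical toy data (values invented)
hypertension <- data.frame(
  drug = factor(c("Y", "X", "Y", "X")),
  diet = factor(c("absent", "absent", "present", "present")),
  bp   = c(180, 165, 172, 158)
)

# Ascending drug, descending diet (per-key decreasing needs the radix method)
s1 <- hypertension[order(hypertension$drug, hypertension$diet,
                         decreasing = c(FALSE, TRUE), method = "radix"), ]

# Alternative: negate the integer codes of diet
s2 <- hypertension[order(hypertension$drug, -xtfrm(hypertension$diet)), ]

s1   # rows 4, 2, 3, 1
```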
Exercise
attach(mtcars)
Select the subset of the data with columns ‘am’ and ‘vs’
Delete rows with am==0
Sort the dataset by vs (increasing)
Sort the dataset by mpg (decreasing)
Logistic Regression
The simple and multiple linear regression models apply to data with a continuous response variable and rely on normality assumptions.
However, in many situations the response variable is binary (or ordinal).
How can we explore this type of relationship?
Example: Data
vs: engine type (0: V-shaped engine, 1: straight engine)
am: transmission (0: automatic, 1: manual)
mpg: miles (US) per gallon
What is the relationship between am and mpg?
Linear regression: not a good fit
There is an increasing relationship between am and mpg,
but am takes only the values 0 and 1, while a fitted straight line can predict values below 0 or above 1.
We need a model for analyzing data with a binary response.
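A quick sketch of why a straight line is a poor fit here: fitting ordinary least squares to the 0/1 response am on mpg (mtcars ships with base R) gives fitted values that leave the interval [0, 1] at both ends of the mpg range, so they cannot be read as probabilities.

```r
# Ordinary least-squares fit of the 0/1 response am on mpg
lin <- lm(am ~ mpg, data = mtcars)
coef(lin)

# Fitted values drop below 0 at low mpg and exceed 1 at high mpg
range(fitted(lin))
```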
About Binary Response
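Let $p = P(Y = 1)$ be the probability that the binary response equals 1.
Odds: $\text{odds} = \frac{p}{1-p}$
Log-odds: $\log\left(\frac{p}{1-p}\right)$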
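A small numeric sketch with a hypothetical probability p = 0.75: the odds are 0.75 / 0.25 = 3 and the log-odds are log(3). In base R, qlogis() computes the log-odds (logit) directly and plogis() maps log-odds back to a probability.

```r
p <- 0.75                  # hypothetical probability that Y = 1
odds <- p / (1 - p)        # 3
log_odds <- log(odds)      # log(3), about 1.0986

# qlogis() is the logit (log-odds) function; plogis() is its inverse
qlogis(p)
plogis(log_odds)           # back to 0.75
```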
Logistic Regression
Logistic regression models the log-odds as a linear function of explanatory variables
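Written out, with $p = P(Y = 1 \mid x_1, \dots, x_k)$ and coefficients $\beta_0, \dots, \beta_k$ (notation introduced here for illustration), the model and the equivalent expression for the probability itself are:

```latex
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```

The second form follows by solving the first for $p$, and it always lies strictly between 0 and 1.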
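Logistic Regression: Function ‘glm’ in R
logreg = glm(formula, family = 'binomial', data = binary)
glm: generalized linear model
formula: response ~ explanatory variables
family: error distribution and link function; 'binomial' (logit link by default) for a binary response
data: name of the dataset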
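A sketch of the glm call on the mtcars example (am regressed on mpg). The family can be given as the string "binomial" or as the family object binomial(link = "logit"), whose default link is the logit; both give the same fit, with coefficients of roughly -6.60 for the intercept and 0.307 for mpg.

```r
# Family given as a character string
fit_string <- glm(am ~ mpg, family = "binomial", data = mtcars)

# Same model, spelling out the family object and its logit link
fit_family <- glm(am ~ mpg, family = binomial(link = "logit"), data = mtcars)

coef(fit_family)
```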
> summary(logreg)

Call:
glm(formula = am ~ mpg, family = binomial, data = mtcars)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5701  -0.7531  -0.4245   0.5866   2.0617

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.6035     2.3514  -2.808  0.00498 **
mpg           0.3070     0.1148   2.673  0.00751 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Final Model:
$\log\left(\frac{p}{1-p}\right) = -6.6035 + 0.3070 \times \text{mpg}$, where $p = P(\text{am} = 1)$
For every one-unit increase in mpg,
the log-odds of a manual (versus automatic) transmission increase by 0.3070,
i.e., the odds of manual (versus automatic) are multiplied by exp(0.3070) ≈ 1.36.
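A sketch checking this interpretation numerically: fitted probabilities at two hypothetical mpg values one unit apart (20 and 21) are converted to odds, and their ratio equals exp of the mpg coefficient, about 1.36. The fitted probability of a manual transmission at mpg = 20 comes out at roughly 0.39.

```r
logreg <- glm(am ~ mpg, family = binomial, data = mtcars)
odds_ratio <- exp(coef(logreg)[["mpg"]])   # about 1.36

# Fitted probabilities of manual at two hypothetical mpg values
p <- predict(logreg, data.frame(mpg = c(20, 21)), type = "response")
odds <- p / (1 - p)

# A one-unit increase in mpg multiplies the fitted odds by exp(beta_mpg)
odds[2] / odds[1]
```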
Function ‘curve’ in R
# assumes logreg = glm(am ~ mpg, family = binomial, data = mtcars)
plot(mtcars$mpg, mtcars$am, main='Logistic Regression', xlab='mpg', ylab='am')
curve(predict(logreg, data.frame(mpg=x), type="response"), add=TRUE)
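The curve above relies on type = "response" in predict(). A short sketch of what that option does, on a hypothetical grid of mpg values: type = "link" returns fitted log-odds, and type = "response" returns those log-odds passed through the inverse logit plogis(), so every plotted value is a probability between 0 and 1.

```r
logreg <- glm(am ~ mpg, family = binomial, data = mtcars)
grid <- data.frame(mpg = seq(10, 34, by = 2))   # hypothetical mpg grid

link <- predict(logreg, grid, type = "link")        # fitted log-odds
resp <- predict(logreg, grid, type = "response")    # fitted probabilities

# The response scale is the inverse logit of the link scale
all.equal(unname(resp), unname(plogis(link)))
```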
Exercise
Fit the logistic regression of vs (as response) on mpg (as explanatory variable).
What is the final logistic model?
Is mpg significant?
Make the scatter plot and add the fitted logistic regression curve on it.