The following data set contains students’ GPAs (both high school and college) and combined SAT scores, along with whether or not they passed Math 140 at Hampden-Sydney.
calc <- read.csv("https://people.hsc.edu/faculty-staff/blins/StatsExamples/calculusData.csv")
head(calc)
## sat hs_gpa college_gpa grade pass
## 1 1260 4.0 3.971 A 1
## 2 1300 4.4 3.400 A 1
## 3 1540 4.4 3.979 A 1
## 4 1430 4.2 2.993 A 1
## 5 1110 4.0 3.364 A 1
## 6 1360 4.5 3.961 A 1
In this case, the two explanatory variables (SAT score and high school GPA) are both quantitative, but the response variable is a binary categorical variable, recorded as 1 for a passing grade and 0 for a failing grade.
We’ll start with a logistic regression model using just high school GPA as a predictor variable.
model1 <- glm(pass ~ hs_gpa, data = calc, family = "binomial")
model1
##
## Call: glm(formula = pass ~ hs_gpa, family = "binomial", data = calc)
##
## Coefficients:
## (Intercept) hs_gpa
## -5.531 1.828
##
## Degrees of Freedom: 159 Total (i.e. Null); 158 Residual
## Null Deviance: 179.9
## Residual Deviance: 161.7 AIC: 165.7
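Reading the coefficients off this output (rounded as printed), model1 says that the log-odds of passing is a linear function of high school GPA. Using the conversion \(p = \text{odds}/(\text{odds} + 1)\), the predicted probability of passing \(p\) follows from it:
\[\log \left( \text{odds of passing} \right) = -5.531 + 1.828 \cdot \text{hs\_gpa}, \qquad p = \frac{\exp(-5.531 + 1.828 \cdot \text{hs\_gpa})}{\exp(-5.531 + 1.828 \cdot \text{hs\_gpa}) + 1}.\]
The second formula is the S-shaped curve drawn over the data in the plot.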
Here is a plot of the data together with the fitted logistic regression curve.
plot(calc$hs_gpa, calc$pass, xlab = "High school GPA", ylab = "Probability of passing calculus", pch = 16)
curve(predict(model1, data.frame(hs_gpa = x), type = "response"), add = TRUE)  # overlay fitted probabilities
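One way to read the plotted curve is to find where it crosses probability 0.5. That happens exactly where the log-odds equals 0. The sketch below solves for that GPA using the rounded coefficients printed for model1 rather than the fitted object. The names b0, b1, and gpa50 are just illustrative.

```r
# Sketch: where does the fitted model1 curve cross probability 0.5?
# Coefficients are copied (rounded) from the model1 output above.
b0 <- -5.531  # intercept
b1 <- 1.828   # hs_gpa slope

# p = 0.5 exactly when log-odds = 0, so solve b0 + b1 * gpa = 0
gpa50 <- -b0 / b1
gpa50

# plogis() is R's logistic function, exp(q) / (1 + exp(q));
# evaluating it at the crossing point should give 0.5
plogis(b0 + b1 * gpa50)
```

Under this model, the crossing point is a high school GPA of about 3.03. Students above it are predicted to be more likely than not to pass.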
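You can use this logistic regression model to predict the log-odds of a student passing Math 140. By default, predict() returns predictions on the log-odds scale. Here is the prediction for a student with a high school GPA of 2.0.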
predict(model1, data.frame(hs_gpa = 2.0))
## 1
## -1.87529
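As a check, you can plug a GPA of 2.0 into the fitted equation by hand, using the rounded coefficients printed for model1. The result agrees with predict() up to rounding:
\[\text{log-odds} = -5.531 + 1.828 \times 2.0 = -1.875 \approx -1.87529.\]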
To convert the log-odds into the probability that a student passes, use the formulas \[\text{odds} = \exp(\text{log-odds}) ~~~~\text{ and } ~~~~ p = \frac{\text{odds}}{\text{odds} + 1}.\]
log.odds <- predict(model1, data.frame(hs_gpa = 2.0))
exp(log.odds)/(exp(log.odds) + 1)
## 1
## 0.1329308
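Base R also has a built-in logistic function, plogis(), which computes exp(q)/(1 + exp(q)) directly. The sketch below applies both the formula from the text and plogis() to the log-odds value printed above. It assumes nothing beyond base R and should give the same probability both ways.

```r
# Sketch: two equivalent ways to turn log-odds into a probability.
# The log-odds value is copied from the predict(model1, ...) output above.
log_odds <- -1.87529

# Formula from the text: odds = exp(log-odds), p = odds / (odds + 1)
odds <- exp(log_odds)
p_formula <- odds / (odds + 1)

# plogis() is the logistic CDF, which computes the same quantity
p_plogis <- plogis(log_odds)

c(formula = p_formula, plogis = p_plogis)
```

With the fitted model available, predict(model1, data.frame(hs_gpa = 2.0), type = "response") should return this probability in one step. The argument type = "response" asks predict() for the probability scale instead of the default log-odds scale, which is what the plotting code above relies on.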
Next, we add SAT score as a second predictor.
model2 <- glm(pass ~ hs_gpa + sat, data = calc, family = "binomial")
model2$coefficients
## (Intercept) hs_gpa sat
## -12.535680468 1.616574548 0.006763665
exp(model2$coefficients)
## (Intercept) hs_gpa sat
## 3.596029e-06 5.035811e+00 1.006787e+00
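Exponentiating a logistic regression coefficient turns it into a multiplier on the odds. It shows how the odds of passing change for a one-unit increase in that predictor while the other predictor is held fixed. The sketch below recomputes these multipliers from the model2 coefficients printed above. Because a single SAT point is a tiny change, it also shows the effect of 100 SAT points; that scaling is an illustrative choice, not something from the original analysis.

```r
# Sketch: exp(coefficient) as a multiplier on the odds of passing.
# Coefficients are copied from the model2 output above.
b_gpa <- 1.616574548
b_sat <- 0.006763665

or_gpa <- exp(b_gpa)           # +1 point of high school GPA
or_sat <- exp(b_sat)           # +1 SAT point
or_sat100 <- exp(100 * b_sat)  # +100 SAT points, same as or_sat^100

c(gpa = or_gpa, sat = or_sat, sat100 = or_sat100)
```

Under this model, one extra point of high school GPA multiplies the odds of passing by about 5.04. Another 100 SAT points roughly doubles them (a factor of about 1.97), in each case with the other predictor held fixed.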
This model has the form \[\log \left( \text{odds of passing} \right) = -12.5356805 + 1.6165745 \cdot \text{hs\_gpa} + 0.0067637 \cdot \text{sat}.\]
Once again, we can use it to predict the log-odds of passing for a single student, here one with a high school GPA of 2.0 and an SAT score of 1000.
predict(model2, data.frame(hs_gpa = 2.0, sat = 1000))
## 1
## -2.538867
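As with model1, this prediction is on the log-odds scale. The sketch below rebuilds it by hand from the model2 coefficients printed above and converts it to a probability with plogis(). It uses only base R; the vector names b and student are illustrative.

```r
# Sketch: reproduce model2's prediction by hand, then convert to a probability.
# Coefficients are copied from the model2 output above.
b <- c(intercept = -12.535680468, hs_gpa = 1.616574548, sat = 0.006763665)
student <- c(1, 2.0, 1000)  # leading 1 multiplies the intercept; GPA 2.0, SAT 1000

log_odds <- sum(b * student)  # linear predictor; should match predict() above
p <- plogis(log_odds)         # same as exp(log_odds) / (exp(log_odds) + 1)

c(log_odds = log_odds, prob = p)
```

Under model2, this student's predicted probability of passing comes out to roughly 0.073. Calling predict(model2, data.frame(hs_gpa = 2.0, sat = 1000), type = "response") should give the same value directly.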