Example: Predicting success in calculus

The following data set contains students’ high school and college GPAs and combined SAT scores, together with whether or not each student passed Math 140 at Hampden-Sydney.

calc <- read.csv("https://people.hsc.edu/faculty-staff/blins/StatsExamples/calculusData.csv")
head(calc)
##    sat hs_gpa college_gpa grade pass
## 1 1260    4.0       3.971     A    1
## 2 1300    4.4       3.400     A    1
## 3 1540    4.4       3.979     A    1
## 4 1430    4.2       2.993     A    1
## 5 1110    4.0       3.364     A    1
## 6 1360    4.5       3.961     A    1

In this case, the two explanatory variables (SAT score and high school GPA) are both quantitative, but the response variable is a binary categorical variable: it is recorded as 1 for a passing grade and 0 for a failing grade.

One-variable model

We’ll start with a logistic regression model using just high school GPA as a predictor variable.

model1 <- glm(pass ~ hs_gpa, data = calc, family = "binomial")
model1
## 
## Call:  glm(formula = pass ~ hs_gpa, family = "binomial", data = calc)
## 
## Coefficients:
## (Intercept)       hs_gpa  
##      -5.531        1.828  
## 
## Degrees of Freedom: 159 Total (i.e. Null);  158 Residual
## Null Deviance:       179.9 
## Residual Deviance: 161.7     AIC: 165.7
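Reading off the two printed coefficients, the fitted one-variable model is
\[\log \left( \text{odds of passing} \right) = -5.531 + 1.828 \cdot (\text{high school GPA}).\]
Each additional point of high school GPA therefore multiplies the estimated odds of passing by \(e^{1.828} \approx 6.22\).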

Here is a plot of the data and the logistic regression model.

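# Scatterplot of the 0/1 pass outcomes against high school GPA
plot(calc$hs_gpa, calc$pass, xlab = "High school GPA", ylab = "Probability of passing calculus", pch = 16)
# Overlay the fitted logistic curve; type = "response" returns probabilities
curve(predict(model1, data.frame(hs_gpa = x), type = "response"), add = TRUE)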
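The curve drawn above is the fitted probability function p(x) = exp(b0 + b1 x) / (1 + exp(b0 + b1 x)). The sketch below evaluates it directly. It assumes the rounded coefficients printed by model1 (-5.531 and 1.828) rather than the full-precision fit, and the names b0, b1, and p_pass are illustrative, not part of the original analysis.

```r
# Assumption: rounded coefficients copied from the model1 printout
b0 <- -5.531
b1 <- 1.828

# Fitted probability of passing as a function of high school GPA.
# plogis(x) is the inverse logit, exp(x) / (1 + exp(x)).
p_pass <- function(gpa) plogis(b0 + b1 * gpa)

gpas <- c(2.0, 2.5, 3.0, 3.5, 4.0, 4.5)
round(p_pass(gpas), 3)

# GPA at which the log-odds is 0, i.e. the predicted probability is 0.5
gpa_half <- -b0 / b1
gpa_half   # about 3.03
```

Because the slope is positive, the predicted probability rises steadily with GPA and crosses 50% where the log-odds equals zero, at a GPA of roughly 3.03 under these rounded coefficients.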

You can use this logistic regression model to predict the log-odds of a student passing Math 140. By default, predict() on a glm model returns predictions on this log-odds (link) scale.

predict(model1, data.frame(hs_gpa = 2.0))
##        1 
## -1.87529

To convert the log-odds into the probability that a student passes, apply the formulas \[\text{odds} = \exp(\text{log-odds}) \quad \text{and} \quad p = \frac{\text{odds}}{\text{odds} + 1}.\]
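For example, the predicted log-odds for a student with a high school GPA of 2.0 is -1.87529, so
\[\text{odds} = \exp(-1.87529) \approx 0.1533 \quad \text{and} \quad p = \frac{0.1533}{0.1533 + 1} \approx 0.1329,\]
which is a predicted pass probability of about 13%.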

log.odds <- predict(model1, data.frame(hs_gpa = 2.0))
exp(log.odds)/(exp(log.odds) + 1)
##         1 
## 0.1329308
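Base R's plogis() computes the same inverse-logit conversion in one step. The sketch below uses the printed log-odds value rather than the fitted model, so it runs without downloading the data; with model1 available, predict(model1, data.frame(hs_gpa = 2.0), type = "response") likewise returns the probability directly.

```r
# Log-odds printed by predict(model1, ...) for hs_gpa = 2.0
log.odds <- -1.87529

# Two-step conversion from the formulas above
p_manual <- exp(log.odds) / (exp(log.odds) + 1)

# One-step inverse logit from the stats package
p_plogis <- plogis(log.odds)

c(manual = p_manual, plogis = p_plogis)
```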

Two-variable model

model2 <- glm(pass ~ hs_gpa + sat, data = calc, family = "binomial")
model2$coefficients
##   (Intercept)        hs_gpa           sat 
## -12.535680468   1.616574548   0.006763665
exp(model2$coefficients)
##  (Intercept)       hs_gpa          sat 
## 3.596029e-06 5.035811e+00 1.006787e+00
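The exponentiated slopes are odds ratios. Holding SAT score fixed, each additional point of high school GPA multiplies the estimated odds of passing by about 5.04. Holding GPA fixed, a single SAT point changes the odds only by a factor of about 1.0068, so it is easier to read the effect over a larger SAT difference. A small sketch using the coefficients printed above (the variable names are illustrative):

```r
# Slopes copied from model2$coefficients
b_gpa <- 1.616574548
b_sat <- 0.006763665

exp(b_gpa)         # odds multiplier for +1 point of high school GPA
exp(b_sat)         # odds multiplier for +1 SAT point
exp(100 * b_sat)   # odds multiplier for +100 SAT points
```

A 100-point higher SAT score multiplies the estimated odds of passing by about 1.97, roughly doubling them.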

This model has the form \[\log \left( \text{odds of passing} \right) = -12.5356805 + 1.6165745 \text{hs_gpa} + 0.0067637 \text{sat}.\]
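The right-hand side of this formula is exactly what predict() evaluates on the log-odds scale. As a sketch, here it is computed by hand for a student with hs_gpa = 2.0 and sat = 1000, using the coefficients printed above (the names b and x are illustrative):

```r
# Coefficients copied from model2$coefficients
b <- c(intercept = -12.535680468, hs_gpa = 1.616574548, sat = 0.006763665)

# Student with high school GPA 2.0 and SAT 1000;
# the leading 1 multiplies the intercept
x <- c(1, 2.0, 1000)

log_odds <- sum(b * x)
log_odds
```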

Once again, we can use it to make predictions about a single student.

predict(model2, data.frame(hs_gpa = 2.0, sat = 1000))
##         1 
## -2.538867
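As with the one-variable model, this value is a log-odds and still needs to be converted into a probability. A minimal sketch, starting from the printed value:

```r
# Log-odds printed by predict(model2, ...) for hs_gpa = 2.0, sat = 1000
log_odds <- -2.538867

odds <- exp(log_odds)    # odds of passing
p <- odds / (odds + 1)   # probability of passing
p
plogis(log_odds)         # same result via the inverse logit
```

The estimated probability that this student passes Math 140 is about 7%.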