student.data <- read.csv("StudentData.csv")
head(student.data)
##   Height Weight Birthplace Siblings
## 1     73    185         VA        1
## 2     70    160         VA        0
## 3     70    170         VA        1
## 4     74    170         SC        2
## 5     72    150         VA        2
## 6     74    302         NC        1

The student.data data frame contains data from my Math 121 students this year.

Percent of HSC students born in Virginia

R makes it easy to generate a confidence interval for a proportion, such as the fraction of a sample that falls in one category of a categorical variable. First, compute the number of “successes” and the total sample size. In this case, we count students born in VA as “successes” and use the nrow() function to find the sample size.
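The counting step relies on standard R behavior. A comparison such as `student.data$Birthplace == 'VA'` returns a logical vector, and `sum()` counts each `TRUE` as 1 and each `FALSE` as 0. The small illustration below uses only the six birthplaces shown by `head()` above, not the full CSV file, and the names `birthplace`, `is_va`, `x_small` and `n_small` are made up for this example:

```r
# Birthplaces of the first six students, copied from the head() output
birthplace <- c("VA", "VA", "VA", "SC", "VA", "NC")

# The comparison returns one TRUE/FALSE value per student
is_va <- birthplace == "VA"
print(is_va)

# sum() counts the TRUE values ("successes"); length() plays the role
# that nrow() plays for a data frame
x_small <- sum(is_va)
n_small <- length(birthplace)
cat(x_small, "of the first", n_small, "students were born in VA\n")
```

Here `x_small` is 4 and `n_small` is 6. The same idea, applied to the whole data frame, gives the `x` and `n` passed to `prop.test()`.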

n <- nrow(student.data) # n is the sample size
x <- sum(student.data$Birthplace == 'VA') # x is the number born in VA
prop.test(x, n)
## 
##  1-sample proportions test with continuity correction
## 
## data:  x out of n, null probability 0.5
## X-squared = 0.125, df = 1, p-value = 0.7237
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.3547653 0.5926582
## sample estimates:
##         p 
## 0.4722222

In the sample, 34 out of 72 students were born in VA, which is about 47.2%.
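That percentage is just the sample proportion \(\hat{p} = x/n = 34/72 \approx 0.472\). Because `prop.test()` returns an object of class `htest`, the estimate and the interval can also be pulled out by name instead of read off the printout. A minimal sketch, plugging in the counts reported above rather than reading the CSV file:

```r
x <- 34  # students born in VA (from the output above)
n <- 72  # sample size

# Sample proportion, shown as a percentage rounded to one decimal place
p_hat <- x / n
round(100 * p_hat, 1)

# Components of the htest object returned by prop.test()
result <- prop.test(x, n)
result$estimate   # the sample proportion, same as p_hat
result$conf.int   # 95% confidence interval (lower, upper)
```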

Notice that prop.test() returns much more than a confidence interval: it also computes a chi-squared statistic (with a continuity correction, as the printout notes) and performs a hypothesis test of whether the sample proportion differs significantly from the null value, which is \(p = 0.5\) by default.
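To see where `X-squared = 0.125` and `p-value = 0.7237` come from, the statistic can be recomputed by hand as \(X^2 = \dfrac{(|x - n p_0| - c)^2}{n p_0 (1 - p_0)}\), where \(p_0\) is the null value and \(c\) is the continuity correction. The sketch below assumes \(c = 0.5\) capped at \(|x - n p_0|\), which is our reading of how R applies the correction. With \(x = 34\), \(n = 72\) and \(p_0 = 0.5\), this gives \((2 - 0.5)^2 / 18 = 0.125\). The last two lines use the real `p` and `correct` arguments of `prop.test()` to change the null value or drop the correction.

```r
x  <- 34    # successes (born in VA), from the output above
n  <- 72    # sample size
p0 <- 0.5   # default null value used by prop.test()

# Continuity-corrected chi-squared statistic for one proportion;
# the 0.5 correction is capped at |x - n*p0|
yates <- min(0.5, abs(x - n * p0))
x_sq  <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))

# Two-sided p-value: upper tail of a chi-squared distribution with 1 df
p_val <- pchisq(x_sq, df = 1, lower.tail = FALSE)
c(statistic = x_sq, p.value = p_val)

# The same two numbers as reported by prop.test()
check <- prop.test(x, n)
c(statistic = unname(check$statistic), p.value = check$p.value)

# A different null value, or no continuity correction
prop.test(x, n, p = 0.4)
prop.test(x, n, correct = FALSE)
```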

Percent of HSC students who are only children

x <- sum(student.data$Siblings == 0) # Now x is the number in the sample with no siblings
prop.test(x, n)
## 
##  1-sample proportions test with continuity correction
## 
## data:  x out of n, null probability 0.5
## X-squared = 55.125, df = 1, p-value = 1.131e-13
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.01794643 0.14347233
## sample estimates:
##          p 
## 0.05555556
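The interval for only children, from about 1.8% to 14.3%, is not centered on \(\hat{p} \approx 5.6\%\). As far as we can tell from R's documentation and source, for a single proportion `prop.test()` reports a Wilson score interval with the same continuity correction, not the textbook \(\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}\) interval. The sketch below uses a hypothetical helper, `wilson_cc()`, to recompute that interval and compare it with `prop.test()`. It takes \(x = 4\), since \(4/72 \approx 0.0556\) matches the estimate above. Treat it as an illustration of the formula, not a replacement for `prop.test()`.

```r
# Hypothetical helper: continuity-corrected Wilson score interval for one
# proportion, written to mirror the interval prop.test() prints
wilson_cc <- function(x, n, p = 0.5, conf.level = 0.95) {
  p_hat <- x / n
  z     <- qnorm((1 + conf.level) / 2)   # two-sided critical value
  z22n  <- z^2 / (2 * n)
  yates <- min(0.5, abs(x - n * p))      # correction, capped as in the test

  # Upper limit: shift p_hat up by the correction, then apply Wilson's formula
  p_c <- p_hat + yates / n
  upper <- if (p_c >= 1) {
    1
  } else {
    (p_c + z22n + z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))) / (1 + 2 * z22n)
  }

  # Lower limit: shift p_hat down by the correction, then apply Wilson's formula
  p_c <- p_hat - yates / n
  lower <- if (p_c <= 0) {
    0
  } else {
    (p_c + z22n - z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))) / (1 + 2 * z22n)
  }

  c(lower = lower, upper = upper)
}

x <- 4    # only children: 4/72 matches the estimate 0.05555556 above
n <- 72

wilson_cc(x, n)
prop.test(x, n)$conf.int
```

The skew comes from the Wilson formula pulling the interval toward 0.5, which matters most when \(\hat{p}\) is close to 0 or 1, as it is here.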