student.data <- read.csv("StudentData.csv")
head(student.data)
## Height Weight Birthplace Siblings
## 1 73 185 VA 1
## 2 70 160 VA 0
## 3 70 170 VA 1
## 4 74 170 SC 2
## 5 72 150 VA 2
## 6 74 302 NC 1
The student.data data frame contains data from my Math
121 students this year.
R makes it easy to generate a confidence interval for the proportion of
a sample that falls in one category of a categorical variable. First,
compute the number of “successes” and the total sample size, then pass
both to the prop.test() function. In this case, we count students born
in VA as “successes” and use the nrow() function to find the sample size.
n <- nrow(student.data) # n is the sample size
x <- sum(student.data$Birthplace == 'VA') # x is the number born in VA
prop.test(x, n)
##
## 1-sample proportions test with continuity correction
##
## data: x out of n, null probability 0.5
## X-squared = 0.125, df = 1, p-value = 0.7237
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.3547653 0.5926582
## sample estimates:
## p
## 0.4722222
In the sample, 34 out of 72 students were born in VA, which is about 47.2%.
Notice that we get a lot more information than just a confidence
interval: the prop.test() function also calculates a chi-squared
statistic and runs a hypothesis test to see whether our sample
proportion is significantly different from the null value, which is \(p = 0.5\) by default.
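If only the interval is needed, the object returned by prop.test() can be stored and its components pulled out by name. The sketch below hard-codes the counts reported above (34 of 72) so that it runs without StudentData.csv. The names result and result99 and the 99% level are illustrative choices, not part of the original analysis.

```r
# Counts taken from the sample above: 34 of 72 students born in VA
x <- 34
n <- 72

result <- prop.test(x, n)   # same call as above, stored instead of printed
result$conf.int             # just the 95% confidence interval
result$estimate             # the sample proportion x / n

# Illustrative variation: a 99% interval instead of the default 95%
result99 <- prop.test(x, n, conf.level = 0.99)
result99$conf.int
```

The 99% interval is wider than the 95% one, since a higher confidence level requires a larger margin of error.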
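The same steps work for any other category, for example the proportion of
students with no siblings.
x <- sum(student.data$Siblings == 0) # Now x is the number in the sample with no siblings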
prop.test(x, n)
##
## 1-sample proportions test with continuity correction
##
## data: x out of n, null probability 0.5
## X-squared = 55.125, df = 1, p-value = 1.131e-13
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.01794643 0.14347233
## sample estimates:
## p
## 0.05555556
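The X-squared values in both outputs can be checked by hand. As a sketch, for \(x\) successes out of \(n\), a null value \(p_0\), and a continuity correction of 0.5, the one-sample statistic is

\[
X^2 = \frac{\left(|x - n p_0| - 0.5\right)^2}{n \, p_0 (1 - p_0)}
\]

With \(n = 72\) and \(p_0 = 0.5\), the expected count is \(n p_0 = 36\) and the denominator is \(18\). For the VA example, \(x = 34\):

\[
X^2 = \frac{\left(|34 - 36| - 0.5\right)^2}{18} = \frac{2.25}{18} = 0.125
\]

For the no-siblings example, \(x = 4\) (inferred from the reported estimate, since \(0.05555556 \times 72 = 4\)):

\[
X^2 = \frac{\left(|4 - 36| - 0.5\right)^2}{18} = \frac{992.25}{18} = 55.125
\]

Both values match the output above.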