Statistical Methods Notes

Math 222 - Spring 2026


Week 1 Notes

Day Section Topic
Mon, Jan 12 Working with R and Rstudio
Wed, Jan 14 1.3 Sampling principles and strategies
Fri, Jan 16 1.4 Experiments

Mon, Jan 12

Today we went over the course syllabus and talked about making R-markdown files in Rstudio. We started the following lab in class; I recommend finishing the second half on your own. I also recommend installing Rstudio on your own laptop (it’s free).

Wed, Jan 14

Today we reviewed populations and samples. We started with a famous example of a bad sample.

Then we reviewed population parameters, sample statistics, and sampling frames. The difference between a sample statistic and a population parameter is called the sample error.
There are two sources of sample error:

  1. Bias. This can be caused by a non-representative sample (sample bias) or by measurement errors, non-response, or biased questions (non-sample bias). The most reliable way to avoid sample bias is to take a simple random sample (SRS) from the whole population.

  2. Random error. This is non-systematic error. It tends to get smaller with larger samples.

To summarize: $$\text{Statistics} = \text{Parameters} + \underbrace{\text{Bias} + \text{Random error}}_{\text{Sample error}}.$$
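As a quick illustration of point 2 above (a sketch, not something we did in class), the following R code draws 1000 simple random samples of several different sizes from a made-up population that is 30% “yes” and shows that the spread of the sample proportions shrinks as the sample size grows:

population <- rep(c(1, 0), times = c(30000, 70000))  # hypothetical population: 30% yes (1), 70% no (0)
for (n in c(25, 100, 400, 1600)) {
  props <- replicate(1000, mean(sample(population, size = n)))  # 1000 SRS proportions at sample size n
  cat("n =", n, " sd of sample proportions =", round(sd(props), 3), "\n")
}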

We finished with this workshop.

Fri, Jan 16

If you find an association between an explanatory variable and a response variable in an observational study, then you can’t say for sure that the explanatory variable is the cause. We say that correlation is not causation because there might be lurking variables that are confounders, that is, variables associated with both the explanatory and response variables, so you can’t tell which one is the true cause.

It turns out that randomized experiments can establish cause and effect because random assignment to treatment groups tends to balance out lurking variables across the groups. We also talked about blocking and double-blind experiments.

We finished by simulating the results of the polio vaccine trials to see if they might just be a random fluke. We wrote this R code in class:

results <- c()                 # vector to collect the simulated proportions
trials <- 1000
for (x in 1:trials) {
  # randomly assign each of the 244 polio cases to one of the two groups (0 or 1)
  simulated.result <- sample(c(0, 1), size = 244, replace = TRUE)
  # proportion of the cases that landed in group 1
  percent <- sum(simulated.result) / 244
  results <- c(results, percent)
}
hist(results)
# how often does chance alone give a proportion below the observed 0.336?
sum(results < 0.336) / trials
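The loop above can also be written without one. This is just an equivalent sketch of the same simulation (same 244 cases, same 0.336 cutoff), using rbinom to draw all of the simulated counts at once:

simulated.counts <- rbinom(1000, size = 244, prob = 0.5)  # cases landing in group 1, one count per simulated trial
simulated.props <- simulated.counts / 244
hist(simulated.props)
mean(simulated.props < 0.336)  # estimated chance of a proportion this small by luck alone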

Week 2 Notes

Day Section Topic
Mon, Jan 19 Martin Luther King Day - no class
Wed, Jan 21 2.1 Examining numerical data
Fri, Jan 23 3.2 Conditional probability

Wed, Jan 21

Today we did a lab about using R to visualize data.

You should be able to open this file in your browser, then press CTRL-A and CTRL-C to select and copy it, so that you can paste it into Rstudio as an R-markdown document.

We had a little trouble with R-markdown on the lab computers.

Fri, Jan 23

Last time we talked about how to visualize data with R. Here are two quick summaries of how to make plots in R:
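As a quick sketch of the same ideas (not taken from class), here are a few basic base-R plotting commands using the built-in mtcars data set:

hist(mtcars$mpg)                    # histogram of one numerical variable
boxplot(mpg ~ cyl, data = mtcars)   # side-by-side boxplots of mpg grouped by cyl
plot(mtcars$wt, mtcars$mpg)         # scatterplot of two numerical variables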

After that, we started talking about probability. We reviewed some of the basic rules.

Probability Rules

  1. Addition Rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$.
  2. Multiplication Rule: $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$.

The notation $P(B \mid A)$ means “the probability of $B$ given that $A$ happened”. Two events $A$ and $B$ are independent if the probability of $A$ does not depend on whether or not $B$ happens. We did the following examples.

  1. If you shuffle a deck of 52 playing cards and then draw two, what is the probability that the second card is an ace if the first card is?
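The answer is 3/51 ≈ 0.059, since once the first ace is drawn, 3 of the remaining 51 cards are aces. Here is a quick simulation that checks this (a sketch, not code we wrote in class):

deck <- rep(c("ace", "other"), times = c(4, 48))      # 4 aces, 48 other cards
draws <- replicate(100000, sample(deck, size = 2))    # each column is one two-card draw
first.is.ace <- draws[1, ] == "ace"
mean(draws[2, first.is.ace] == "ace")                 # should come out close to 3/51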

We also talked about tree diagrams (see subsection 3.2.7 from the book) and how to use them to compute probabilities.

  1. Based on a study of women in the United States and Germany, there is a 0.8% chance that a woman in her forties has breast cancer. Mammograms are 90% accurate at detecting breast cancer if someone has it. They are also 93% accurate at not detecting cancer in people who don’t have it. If a woman in her forties tests positive for cancer on a mammogram screening, what is the probability that she actually has breast cancer? (See the R sketch after these examples.)

  2. 5% of men are color blind, but only 0.25% of women are. Find $P(\text{male} \mid \text{color-blind})$.
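For example 1, the tree diagram has two branches that end in a positive test: has cancer and tests positive (0.008 × 0.90), or does not have cancer but tests positive anyway (0.992 × 0.07). Dividing the first branch by the sum of the two gives roughly 9%. Here is that arithmetic in R (a sketch, not code from class):

prior <- 0.008   # P(cancer) for a woman in her forties
sens  <- 0.90    # P(positive test | cancer)
spec  <- 0.93    # P(negative test | no cancer)
p.positive.and.cancer  <- prior * sens
p.positive.and.healthy <- (1 - prior) * (1 - spec)
p.positive.and.cancer / (p.positive.and.cancer + p.positive.and.healthy)  # about 0.094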


Week 3 Notes

Day Section Topic
Mon, Jan 26 Class canceled (snow)
Wed, Jan 28 3.4 Random variables
Fri, Jan 30 4.1 Normal distribution

Week 4 Notes

Day Section Topic
Mon, Feb 2 4.3 Binomial distribution
Wed, Feb 4 5.1 Point estimates and error
Fri, Feb 6 5.2 Confidence intervals for a proportion

Week 5 Notes

Day Section Topic
Mon, Feb 9 5.3 Hypothesis tests for a proportion
Wed, Feb 11 Review
Fri, Feb 13 Midterm 1

Week 6 Notes

Day Section Topic
Mon, Feb 16 6.2 Difference in two proportions
Wed, Feb 18 6.3 Chi-squared goodness of fit test
Fri, Feb 20 6.4 Chi-squared test for association

Week 7 Notes

Day Section Topic
Mon, Feb 23 7.1 One-sample means with t-distribution
Wed, Feb 25 7.2 Paired data
Fri, Feb 27 7.3 Difference of two means

Week 8 Notes

Day Section Topic
Mon, Mar 2 7.4 Power calculations
Wed, Mar 4 7.5 Comparing many means with ANOVA
Fri, Mar 6 7.5 ANOVA - cont’d

Week 9 Notes

Day Section Topic
Mon, Mar 16 7.5 ANOVA - cont’d
Wed, Mar 18 Review
Fri, Mar 20 Midterm 2

Week 10 Notes

Day Section Topic
Mon, Mar 23 8.2 Least squares regression
Wed, Mar 25 9.1 Introduction to multiple regression
Fri, Mar 27 9.2 Model selection

Week 11 Notes

Day Section Topic
Mon, Mar 30 9.3 Checking model conditions
Wed, Apr 1 9.3 Checking model conditions - cont’d
Fri, Apr 3 9.5 Introduction to logistic regression

Week 12 Notes

Day Section Topic
Mon, Apr 6 9.5 Logistic regression - cont’d
Wed, Apr 8 Hypothesis testing with randomization
Fri, Apr 10 Confidence intervals with bootstrapping

Week 13 Notes

Day Section Topic
Mon, Apr 13 Bootstrapping continued
Wed, Apr 15 Review
Fri, Apr 17 Midterm 3

Week 14 Notes

Day Section Topic
Mon, Apr 20 Introduction to Bayesian methods
Wed, Apr 22 Credible intervals for proportions
Fri, Apr 24 Bayesian inference
Mon, Apr 27 Last day, recap & review