| Day | Section | Topic |
|---|---|---|
| Mon, Jan 12 | Working with R and Rstudio | |
| Wed, Jan 14 | 1.3 | Sampling principles and strategies |
| Fri, Jan 16 | 1.4 | Experiments |
Today we went over the course syllabus and talked about making R-markdown files in Rstudio. We started the following lab in class, I recommend finishing the second half on your own. I also recommend installing Rstudio on your own laptop (it’s free).
Today we reviewed populations and samples. We started with a famous example of a bad sample.
Then we reviewed population parameters, sample
statistics, and sampling frames. The
difference between a sample statistic and a population parameter is
called the sampling error.
There are two sources of sampling error:
Bias. This is systematic error. It can be caused by a non-representative sample (sampling bias) or by measurement errors, non-response, or biased questions (non-sampling bias). The most reliable way to avoid sampling bias is a simple random sample (SRS) from the whole population.
Random error. This is non-systematic error. It tends to get smaller with larger samples.
To summarize:
We finished with this workshop.
If you find an association between an explanatory variable and a response variable in an observational study, then you can’t say for sure that the explanatory variable is the cause. We say that correlation is not causation because there might be lurking variables that are confounders, that is, they are associated with both the explanatory and response variables, so you can’t tell which is the true cause.
It turns out that randomized experiments can establish cause and effect because random assignment to treatment groups controls for all lurking variables. We also talked about blocking and double-blind experiments.
Example: 1954 polio vaccine trials
Workshop: Experiments
We finished by simulating the results of the polio vaccine trials to see if they might just be a random fluke. We wrote this R code in class:
```r
results <- c()
trials <- 1000
for (x in 1:trials) {
  simulated.result <- sample(c(0, 1), size = 244, replace = TRUE)
  percent <- sum(simulated.result) / 244
  results <- c(results, percent)
}
hist(results)
sum(results < 0.336) / trials
```

| Day | Section | Topic |
|---|---|---|
| Mon, Jan 19 | Martin Luther King day - no class | |
| Wed, Jan 21 | 2.1 | Examining numerical data |
| Fri, Jan 23 | 3.2 | Conditional probability |
Today we did a lab about using R to visualize data.
You should be able to open this file in your browser, then hit CTRL-A and CTRL-C to select it and copy it so that you can paste it into Rstudio as an R-markdown document.
We had a little trouble with R-markdown on the lab computers.
Last time we talked about how to visualize data with R. Here are two quick summaries of how to make plots in R:
After that, we started talking about probability. We reviewed some of the basic rules.
The notation P(B | A) means “the probability of B given that A happened”. Two events A and B are independent if the probability of B does not depend on whether or not A happens. We did the following examples.
We also talked about tree diagrams (see subsection 3.2.7 from the book) and how to use them to compute probabilities.
Based on a study of women in the United States and Germany, there is a 0.8% chance that a woman in her forties has breast cancer. Mammograms are 90% accurate at detecting breast cancer if someone has it. They are also 93% accurate at not detecting cancer in people who don’t have it. If a woman in her forties tests positive for cancer on a mammogram screening, what is the probability that she actually has breast cancer?
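The tree-diagram/Bayes computation for the mammogram example can be sketched in R using the numbers above:

```r
# Numbers from the mammogram example above
prior <- 0.008  # P(cancer) for a woman in her forties
sens  <- 0.90   # P(test positive | cancer)
spec  <- 0.93   # P(test negative | no cancer)

# Total probability of a positive test (both branches of the tree)
p_positive <- prior * sens + (1 - prior) * (1 - spec)

# Bayes' rule: P(cancer | positive test)
prior * sens / p_positive  # roughly 0.09, surprisingly low
```

The answer is only about 9%, because true positives are swamped by the false positives from the much larger cancer-free group.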
5% of men are color blind, but only 0.25% of women are. Find P(man | color blind).
| Day | Section | Topic |
|---|---|---|
| Mon, Jan 26 | Class canceled (snow) | |
| Wed, Jan 28 | 4.1 | Normal distribution |
| Fri, Jan 30 | 3.4 | Random variables |
Class was canceled today because I had a doctor’s appointment. But I
recommended that everyone watch the following video and then complete a
workshop about the R functions pnorm, qnorm,
and rnorm.
Today we talked about random variables and probability distributions. We talked about some example probability distributions:
Flip a coin until you get a tail. Let X represent the number of flips needed. (geometric distribution)
About 1 meteorite bigger than 1000 kg hits the Earth every year. The time T until the next such meteorite hits the Earth has probability density function f(t) = e^(−t) for t ≥ 0. (exponential distribution)
We talked about the difference between continuous and discrete probability distributions. Then we introduced expected value.
If X is a discrete random variable, then the expected value of X is E[X] = Σ x · P(X = x), where the sum runs over all possible values x. If X is a continuous random variable with probability density function f(x), then the expected value of X is E[X] = ∫ x f(x) dx.
We did the following example.
We finished by talking about what we mean when we say something is “expected”.
If you repeat a random experiment many times, then the average outcome tends to get close to the expected value.
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 2 | 3.4 | Random variables - con’d |
| Wed, Feb 4 | 4.3 | Binomial distribution |
| Fri, Feb 6 | 5.1 | Point estimates and error |
For a random variable X with expected value μ, the variance of X is Var(X) = E[(X − μ)²]. The standard deviation of X (denoted σ) is the square root of the variance.
We did these examples in class.
Here is an extra example from Khan Academy that we did not do in class.
Suppose a random variable X has the following probability model.

| x | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| P(X = x) | 0.1 | 0.15 | 0.4 | 0.25 | 0.1 |
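A quick sketch in R of how to compute the expected value and standard deviation for a probability model like this one:

```r
# Probability model from the table above
x <- c(0, 1, 2, 3, 4)
p <- c(0.1, 0.15, 0.4, 0.25, 0.1)

mu <- sum(x * p)               # expected value E[X]
sigma2 <- sum((x - mu)^2 * p)  # variance Var(X)
c(mean = mu, var = sigma2, sd = sqrt(sigma2))
```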
Expected value is linear, which means that for any two random variables X and Y and any constant c, these two properties hold: E[X + Y] = E[X] + E[Y] and E[cX] = c E[X].
Variance is not linear. Instead it has these properties: Var(cX) = c² Var(X), and Var(X + Y) = Var(X) + Var(Y) when X and Y are independent.
A single six-sided die has expected value 3.5 and standard deviation √(35/12) ≈ 1.71. What are the mean and standard deviation if you roll two dice and add them?
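One way to check the answer: means add, and (for independent rolls) variances add, so the standard deviation of the sum scales by √2.

```r
# One fair six-sided die
faces <- 1:6
mu <- mean(faces)                    # 3.5
sigma <- sqrt(mean((faces - mu)^2))  # sqrt(35/12), about 1.71

# Rolling two independent dice and adding them:
mu_sum <- 2 * mu              # means add: 7
sigma_sum <- sqrt(2) * sigma  # variances add, so sd is sqrt(2) times bigger
c(mu_sum, sigma_sum)
```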
Binomial distribution. If X is the total number of successes in n independent trials, each with probability p of a success, then X has a binomial distribution, denoted Binomial(n, p) for short. This distribution has mean μ = np and standard deviation σ = √(np(1 − p)).
We used this binomial distribution plotting tool to compare the distributions if you make these two bets 100 times. In one case we get something that looks roughly like a bell curve, in the other case we get something that is definitely skewed to the right.
We also looked at the pbinom(x, n, p) function in R.

Sometimes the assumption that the trials are independent is not justified.
The correct probability distribution to model the example above is called the hypergeometric distribution. As long as the population is much larger than the sample, we typically do not need to worry about the trials not being independent.
We finished by discussing the normal approximation of a binomial distribution. When n is large enough so that both np ≥ 10 and n(1 − p) ≥ 10, then X is approximately normal with mean np and standard deviation √(np(1 − p)).
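As a quick sanity check (with made-up numbers, not data from class), we can compare an exact binomial probability with its normal approximation in R:

```r
# Hypothetical example: X ~ Binomial(n = 100, p = 0.3)
n <- 100
p <- 0.3
mu <- n * p                     # 30
sigma <- sqrt(n * p * (1 - p))  # about 4.58

exact  <- pbinom(35, n, p)                    # exact P(X <= 35)
approx <- pnorm(35.5, mean = mu, sd = sigma)  # normal approx (with continuity correction)
c(exact, approx)
```

The two numbers agree to about two decimal places, which is typical when np and n(1 − p) are both comfortably large.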
Suppose that X1, X2, …, Xn are independent random variables that all have the same probability distribution. If n is large, then the total X1 + X2 + ⋯ + Xn has an approximately normal distribution.
If each Xi has mean μ and standard deviation σ, then what are the mean and the standard deviation of the total?
In Dungeons and Dragons, you calculate the damage from a fireball spell by rolling 8 six-sided dice and adding up the results. The total has an approximately normal distribution. What are the mean and standard deviation of this distribution? (Recall that the mean and standard deviation of a single six-sided die are 3.5 and ≈ 1.71.)
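A quick simulation (a sketch, not the exact code from class) confirms both the normal shape and the mean and standard deviation:

```r
# Simulate 10,000 fireball damage rolls (8d6)
set.seed(1)  # for reproducibility
damage <- replicate(10000, sum(sample(1:6, 8, replace = TRUE)))

mean(damage)  # should be close to 8 * 3.5 = 28
sd(damage)    # should be close to sqrt(8) * 1.71, about 4.83
hist(damage)  # roughly bell-shaped
```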
We looked at a graph of the distribution from the previous example to see that it is indeed approximately normal.
When you use a normal approximation to estimate discrete probabilities, it is recommended to use a continuity correction (see Section 4.3.3). To estimate P(X ≤ x), calculate P(X ≤ x + 0.5) using the normal approximation (and likewise, to estimate P(X ≥ x), compute P(X ≥ x − 0.5) using the normal approximation).
An important special case of the central limit theorem is the normal approximation of the binomial distribution, which has mean np and standard deviation √(np(1 − p)).
We also used the pbinom(x, n, p) function.

We finished by talking about the difference between the distribution of the total versus the distribution of the proportion of patients who are O-negative. The standard deviation of the sample proportion is √(p(1 − p)/n).
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 9 | 5.2 | Confidence intervals for a proportion |
| Wed, Feb 11 | Review | |
| Fri, Feb 13 | Midterm 1 |
Today we talked about confidence intervals for a proportion.
Sampling Distribution for a Sample Proportion. In a SRS of size n from a large population, the sample proportion p̂ is random, so it has a probability distribution with the following features: its mean is the population proportion p, its standard deviation is √(p(1 − p)/n), and its shape is approximately normal when n is large.
In practice, we usually don’t know the population proportion p. Instead we can use the sample proportion p̂ to calculate the standard error of p̂: SE = √(p̂(1 − p̂)/n).
If the sample size is large enough, then there is a 95% chance that p̂ will be within about two standard deviations of p. So if we know p̂ and we assume that the standard error is close to the standard deviation for p̂, then we can make a confidence interval for the location of the parameter p.
Confidence Interval for a Proportion. The interval is p̂ ± z* √(p̂(1 − p̂)/n). This works well if the sample size is very large.
You can use the R command qnorm((1 - C) / 2) to find the
critical z-value (z*) when you want a specific confidence level
C. (Note that qnorm returns the negative critical value here, so
take its absolute value.)
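For example, with made-up data (58 successes in a sample of 100), the confidence interval formula looks like this in R:

```r
# Hypothetical data: 58 successes out of n = 100
phat <- 58 / 100
n <- 100
SE <- sqrt(phat * (1 - phat) / n)  # standard error of phat

zstar <- abs(qnorm((1 - 0.95) / 2))  # critical z-value for 95% confidence
phat + c(-1, 1) * zstar * SE         # the 95% confidence interval
```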
After that, we talked about the prop.test() function in
R which can make a confidence interval (among other things).
Notice that the prop.test() confidence interval is not
the same as what we got using the formula above. Instead of using the
formula above, R uses something called a Wilson
score confidence interval with continuity correction. The idea is to
solve for the two endpoints p0 where
p̂ = p0 ± z* √(p0(1 − p0)/n).
If you add in the continuity correction, this pretty much guarantees
that there is at least a 95% chance (or whatever other confidence level
you want) that the interval contains the true population parameter. The
Wilson method confidence intervals are fairly trustworthy even with
relatively small samples and small numbers of successes/failures.
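For comparison (again with hypothetical counts of 58 successes in 100 trials), you can pull just the Wilson interval out of prop.test():

```r
# Hypothetical data: 58 successes out of 100 trials
result <- prop.test(58, 100, conf.level = 0.95)
result$conf.int  # Wilson score interval with continuity correction
```

It will be close to, but not exactly the same as, the z-interval formula above.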
Today we went over the midterm 1 review problems (the solutions are also available now). We also did some additional practice problems including these.
If you draw a random card from a deck of 52 playing cards, what is the probability that you draw an ace or a heart?
Suppose you need knee surgery. There is an 11% chance that the surgery fails. There is a 4% chance of getting an infection. And there is a 3% chance of both infection and the surgery failing. What is the probability that the surgery succeeds without infection?
In the Wimbledon tennis tournament, serving players are more likely to win a point. A server has two chances to serve the ball. There is a 59% chance that the first serve is in, and if it is, then the server has a 73% chance of winning the point. If the first serve is out, then they have an 86% chance of getting the second serve in, and in that case they have a 59% chance of winning the point. But if the second serve is out, then the server automatically loses the point. What is the probability that the server wins the point?
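The tree-diagram computation for the serve example can be sketched in R:

```r
# Probabilities from the serve example above
p_first_in   <- 0.59  # first serve lands in
p_win_first  <- 0.73  # win the point, given first serve in
p_second_in  <- 0.86  # second serve in, given first serve out
p_win_second <- 0.59  # win the point, given second serve in

# Add up the two winning branches of the tree
p_win <- p_first_in * p_win_first +
  (1 - p_first_in) * p_second_in * p_win_second
p_win  # probability the server wins the point, about 0.64
```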
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 16 | 5.3 | Hypothesis tests for a proportion |
| Wed, Feb 18 | 6.2 | Difference in two proportions |
| Fri, Feb 20 | 6.2 | Difference in two proportions - con’d |
Today we talked about hypothesis testing, specifically testing hypotheses about a population proportion. We looked at three examples.
In the helper versus hinderer study, 14 out of 16 infants chose the helper toy. Could this be a random fluke? To find out we can do a hypothesis test for proportions.
Can we use the prop.test() function in this situation?

When you do a hypothesis test, typically you choose a significance level α in advance, and then you calculate a p-value. A p-value is the probability of getting a result at least as extreme as the sample statistic, if the null hypothesis is true. If the p-value is below the significance level, then you should reject the null hypothesis. The following things are all equivalent:
Conversely, if the results are not statistically significant, then we don’t reject the null, and we should be aware that the results might be a random fluke. Be careful: a common misunderstanding is to think that the p-value is P(null hypothesis is true | data). The p-value does not directly tell you the probability that the null hypothesis is true; it only indirectly suggests that the null might not be true.
In another study, researchers presented 100 college students with the images of two men (see the link above) and asked them to guess which was named Tim and which was named Bob. It turned out that 67 students guessed that Tim was the man with the goatee.
If someone gets 10 out of 25 guesses about what Zener card someone is looking at, is this strong evidence that they are psychic? Do a hypothesis test to find out.
The null hypothesis in the last example is that the person is not
psychic, so they only have a 1 out of 5 chance of guessing right. Here
is how you test this using the prop.test() function in
R.
```r
prop.test(10, 25, p = 0.2, alternative = "greater")
```

We talked about how to compare two proportions using confidence
intervals and hypothesis testing. We started by talking about how the
prop.test() function in R can accept a vector of successes
and another vector of totals for more than one group. We used this to
analyze the following study.
A 2002 study looked at whether nicotine lozenges could help smokers who want to quit. The subjects were randomly assigned to two treatment groups. One group got a nicotine lozenge to take when they had cravings, while the other group got a placebo lozenge. Of the 459 subjects who got the nicotine lozenge, 82 successfully abstained from smoking, while only 44 out of the 458 subjects in the placebo group did.
We created an R-markdown document to answer these questions in class.
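The core of that analysis can be sketched like this, using the counts from the study above:

```r
# Nicotine lozenge study: successes and group sizes from above
quit <- c(82, 44)   # nicotine group, placebo group
n <- c(459, 458)

quit / n            # sample proportions: about 17.9% vs 9.6%
res <- prop.test(quit, n)  # two-sample test and confidence interval
res
```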
After we did that example, I let everyone work on a similar example on their own:
| | Rural | Urban/Suburban |
|---|---|---|
| Passed | 30 | 52 |
| Failed | 25 | 13 |
| Total | 55 | 65 |
Use R to visualize the results and carry out a hypothesis test to see if background makes a significant difference in student pass rates.
We started with this example that we did not have time for last time.
| | Male | Female |
|---|---|---|
| Passed | 60 | 23 |
| Failed | 29 | 11 |
| Total | 89 | 34 |
After that we talked briefly about the theory behind the two-sample test for proportions.
Theorem. If X and Y are independent random variables that each have a normal distribution, then X + Y also has a normal distribution (and so does X − Y).
If we take two simple random samples from two populations, the two sample proportions p̂1 and p̂2 are each approximately normally distributed.
Two-Sample Hypothesis Test for Proportions. The test statistic is
z = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2)),
where p̂ is the pooled proportion: p̂ = (x1 + x2)/(n1 + n2), with x1 and x2 the numbers of successes in the two samples.
Works best if both samples have at least 5 successes & 5 failures.
Two-sample Confidence Interval for Proportions. The interval is
(p̂1 − p̂2) ± z* √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2).
Works best if both samples contain at least 10 successes and 10 failures.
We also talked about one-sided confidence intervals,
which you get automatically in R when you set the
alternative option to either "greater" or
"less".
We finished by introducing the chi-squared statistic χ² = Σ (Oij − Eij)² / Eij, where Eij is the expected count in row i, column j (assuming there is no association), and Oij is the observed count in row i, column j.
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 23 | 6.4 | Chi-squared test for association |
| Wed, Feb 25 | 6.3 | Chi-squared goodness of fit test |
| Fri, Feb 27 | 7.1 | One-sample means with t-distribution |
You can use the chi-squared test for association to see if there is a significant association between two categorical variables. We did this example using R.
We talked about the difference between long tables (also known as tidy tables) where each row represents one individual and each column represents a variable, versus two-way tables (also known as contingency tables) where the rows and columns represent categories for two categorical variables and the numbers in the table are the counts.
You can easily convert a long table stored as a data frame in R to a
two-way table using the table() function. You can transpose
a two-way table (swap the rows & columns) using the function
t().
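For instance, with a tiny made-up data frame (the names here are hypothetical, just for illustration):

```r
# Hypothetical long (tidy) data: one row per student
grades <- data.frame(
  background = c("Rural", "Rural", "Urban", "Urban", "Rural", "Urban"),
  result = c("Passed", "Failed", "Passed", "Passed", "Passed", "Failed")
)

two_way <- table(grades$background, grades$result)  # convert to a two-way table
two_way
t(two_way)  # transpose: swap the rows and columns
```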
We also talked about mosaic plots as an alternative to stacked bar graphs for showing the relationship between two categorical variables.
We did this example:
Suppose that a random sample of 100 people in a city are asked if they think the fire department is doing a satisfactory job. Shortly after the survey, there is a large fire in the city. If the same 100 people are asked their opinions again, you might get results like this:
| | Satisfactory | Unsatisfactory |
|---|---|---|
| Before | 80 | 20 |
| After | 72 | 28 |
For this table, χ² ≈ 1.75 with a p-value of 18.5%. Why should you not trust this p-value?
The right way to look at this data is to include each person once. Each individual person has their before opinion and their after opinion recorded, so we could make a two-way table for those two variables:
| | Satisfactory Before | Unsatisfactory Before |
|---|---|---|
| Satisfactory After | 70 | 2 |
| Unsatisfactory After | 10 | 18 |
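Here is a sketch (illustrative, not code from class) of how you could enter this paired table in R and run a chi-squared test on it:

```r
# The paired two-way table from above: each of the 100 people counted once
opinions <- matrix(c(70, 10, 2, 18), nrow = 2,
                   dimnames = list(After = c("Satisfactory", "Unsatisfactory"),
                                   Before = c("Satisfactory", "Unsatisfactory")))

chisq.test(opinions)  # tests for association between before and after opinions
```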
We ran out of time at the end, but I gave the following handout as extra practice to think about the chi-squared test for association.
Today we introduced the chi-squared goodness of fit test. It is a lot like the chi-squared test for association, except instead of having two categorical variables, you just have one and you are testing to see whether the proportions in each category from the sample match some model for what the population should be.
We started with this question:
We tested the hypotheses:
We started by trying to find a z-value using z = (x̄ − μ)/(σ/√n), but since we do not know the correct standard deviation σ for the population of all HSC students, we need to switch to using t-values: t = (x̄ − μ)/(s/√n).
```r
students <- read.csv("https://bclins.github.io/spring26/math222/Examples/StudentData.csv")
t.test(students$Height, mu = 70)
```

The t-distribution was discovered by William Sealy Gosset while he worked for the Guinness brewing company.
Scientists studying the Earth’s atmosphere found amber resin that
formed 95 to 75 million years ago when dinosaurs lived. They measured
the percent of nitrogen trapped in air bubbles in the resin and found
the following results:
c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51). Is
this strong evidence that nitrogen levels back then were significantly
different than they are now? Currently nitrogen is 78.1% of the Earth’s
atmosphere.
```r
nitrogen <- c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
t.test(nitrogen, mu = 78.1)
```

If you have a small sample (n < 30), then you should be careful about trusting the t-distribution methods unless you are sure that the population really has a normal distribution.
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 2 | 7.2 | Paired data |
| Wed, Mar 4 | 7.3 | Difference of two means |
| Fri, Mar 6 | 7.4 | Power calculations |
We started by talking about using quantile-quantile plots to check normality.
We talked about how to tell the difference between left-skewed and right-skewed distributions on a qqplot. You can also use a qqplot to tell if a distribution has tails that are too fat to be normal.
After that, we introduced prediction intervals. A 95% t-distribution confidence interval is supposed to contain the population mean, but it does not contain 95% of the individuals, nor does it have a 95% chance to contain a future observation. But you can make an interval that contains 95% of future observations by using a prediction interval.
Prediction Interval for a Quantitative Variable. The interval is x̄ ± t* √(s² + s²/n), where t* is the critical t-value with n − 1 degrees of freedom.
Caution: Unlike confidence intervals, these are not robust if the population is not normal, even if the sample size is large!
We used R to find a 95% prediction interval for next year’s rainfall here in Farmville.
```r
rain <- read.csv('http://people.hsc.edu/faculty-staff/blins/StatsExamples/rainfall.csv')
xbar <- mean(rain$total)
s <- sd(rain$total)
N <- 81
tstar <- qt(0.975, df = 80)
upper <- xbar + tstar * sqrt(s^2 + s^2 / N)
lower <- xbar - tstar * sqrt(s^2 + s^2 / N)
```

We introduced the qt() function, which is similar to the qnorm() function, except it is for the t-distribution.
Then we talked about using the t-test with paired data. We started with this data set which shows the size in cubic centimeters of the left hippocampus region of the brain (measured using MRI) of pairs of twins. Each pair of twins had one who was diagnosed with schizophrenia and one who was unaffected by schizophrenia. So we want to know if the size of the hippocampus is significantly different in twins with schizophrenia.
```r
brain <- read.csv('https://www.rossmanchance.com/iscam2/data/hippocampus.txt', sep = "\t")
```

Notice the optional argument sep = "\t" which we had to use since the data file was stored as tab-separated values, not comma-separated values. Since the twins come in matched pairs, we test the differences:
```r
t.test(brain$unaffected - brain$affected)
```

Today we worked on the following examples in class:
For two-sample t-tests, we use Welch’s t-test, which is a very robust method. It uses the fact that if you sample from two populations with equal means, then the two-sample t-value t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2) will approximately follow a t-distribution (under very mild normality & independence assumptions). The formula for the degrees of freedom is a bit complicated, but R will calculate it for you automatically.
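A minimal sketch with simulated data (t.test() performs Welch’s test by default):

```r
set.seed(42)  # simulated data, just for illustration
group1 <- rnorm(30, mean = 10, sd = 2)
group2 <- rnorm(35, mean = 11, sd = 3)

result <- t.test(group1, group2)  # Welch's test: var.equal = FALSE by default
result$parameter                  # the (usually non-integer) Welch degrees of freedom
```

Notice that the degrees of freedom land somewhere between min(n1, n2) − 1 and n1 + n2 − 2.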
Today we talked about statistical power, significance levels, and Type I versus II errors. Traditionally when people do a hypothesis test, they have a significance level in mind. If the results have a p-value below the significance level, then the researchers can feel justified rejecting the null hypothesis. But there are two potential problems with this type of significance test.
| | H₀ is true | Hₐ is true |
|---|---|---|
| p-value below α | Type I error (false positive) | Reject H₀ |
| p-value above α | Don’t reject H₀ | Type II error (false negative) |
If the null hypothesis is true, then the probability of a Type I error is α. In order to talk about the probability of a Type II error, we need to make some extra assumptions about the situation, including picking a specific value for the parameter of interest.
Definition. The power of a statistical study is the probability of correctly rejecting the null hypothesis if a specific alternative hypothesis is true.
If you are going to the trouble to design an experiment or observational study, you should probably do a quick power calculation before you start, otherwise you might just be wasting your time. We did these examples:
A 1998 study looked at the herbal supplement Garcinia Cambogia to see if it can help people lose weight. Here is the abstract from the study:
A total of 135 subjects were randomized to either active hydroxycitric acid [The active ingredient in G. Cambogia] (n = 66) or placebo (n = 69); 42 (64%) in the active hydroxycitric acid group and 42 (61%) in the placebo group completed 12 weeks of treatment. Patients in both groups lost a significant amount of weight during the 12-week treatment period; however, between-group weight loss differences were not statistically significant (mean [SD], 3.2 [3.3] kg vs 4.1 [3.9] kg; P = 0.14).
If we wanted to perform a follow-up study to see if G. Cambogia can increase weight loss by at least 1 kg (over a placebo) and if we assume that the standard deviation in weight loss for each group will be around 4 kg, then how large should our groups be in order to get a power of at least 80%? What if we want 90% power?
In the previous example, we were doing a two-sample hypothesis test for means. In that case, the null model says that the difference in sample means should have a normal distribution with mean 0 and standard deviation √(σ1²/n1 + σ2²/n2).
We picked an effect size that we would like to detect to be 1 kg. And we talked about why it is safer to round standard deviations up when you are picking plausible values for σ1 and σ2 in a power calculation, so we picked σ1 = σ2 = 4 kg. Then we used the following code to find the power:
```r
n <- 100  # We assumed both treatment groups would be the same size
sigma_1 <- 4
sigma_2 <- 4
sigma <- sqrt(sigma_1^2 / n + sigma_2^2 / n)
threshold <- qnorm(0.95, mean = 0, sd = sigma)
power <- 1 - pnorm(threshold, mean = 1, sd = sigma)
power  # The power with n = 100 is only about 54.9%
```

By testing different sample sizes, you can find an n large enough to get a power of 80% or higher. We didn’t have time for the following example, but it is good practice if you want a power calculation for a 1-sample hypothesis test. For a 1-sample test for means, the null and alternative models will both be normal distributions with standard deviation σ/√n.
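To answer the 80% power question, one approach is to wrap the same calculation in a function and try a range of group sizes (a sketch; it keeps the one-sided 5% significance level from the code above):

```r
# Power of the two-sample test for a given group size n
power_for <- function(n, effect = 1, sigma1 = 4, sigma2 = 4) {
  sigma <- sqrt(sigma1^2 / n + sigma2^2 / n)
  threshold <- qnorm(0.95, mean = 0, sd = sigma)  # one-sided 5% cutoff
  1 - pnorm(threshold, mean = effect, sd = sigma)
}

ns <- seq(100, 400, by = 10)
powers <- sapply(ns, power_for)
min(ns[powers >= 0.80])  # smallest group size tested with at least 80% power
```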
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 16 | 7.5 | Comparing many means with ANOVA |
| Wed, Mar 18 | Review | |
| Fri, Mar 20 | Midterm 2 |
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 23 | 7.5 | ANOVA - con’d |
| Wed, Mar 25 | 8.2 | Least squares regression |
| Fri, Mar 27 | 9.1 | Introduction to multiple regression |
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 30 | 9.2 | Model selection |
| Wed, Apr 1 | 9.3 | Checking model conditions |
| Fri, Apr 3 | 9.3 | Checking model conditions - con’d |
| Day | Section | Topic |
|---|---|---|
| Mon, Apr 6 | 9.5 | Introduction to logistic regression |
| Wed, Apr 8 | 9.5 | Logistic regression - con’d |
| Fri, Apr 10 | Hypothesis testing with randomization |
| Day | Section | Topic |
|---|---|---|
| Mon, Apr 13 | Confidence intervals with bootstrapping | |
| Wed, Apr 15 | Review | |
| Fri, Apr 17 | Midterm 3 |
| Day | Section | Topic |
|---|---|---|
| Mon, Apr 20 | Introduction to Bayesian methods | |
| Wed, Apr 22 | Credible intervals for proportions | |
| Fri, Apr 24 | Bayesian inference | |
| Mon, Apr 27 | Last day, recap & review |