Statistics Notes

Math 121 - Spring 2026


Week 1 Notes

Day          Section  Topic
-----------  -------  ---------------------------------------
Mon, Jan 12  1.2      Data tables, variables, and individuals
Wed, Jan 14  2.1.3    Histograms & skew
Fri, Jan 16  2.1.5    Boxplots

Mon, Jan 12

Today we covered data tables, individuals, and variables. We also talked about the difference between categorical and quantitative variables.

  1. We looked at a case of a nurse who was accused of killing patients at the hospital where she worked for 18 months. One piece of evidence against her was that 40 patients died during the shifts when she worked, but only 34 died during shifts when she wasn’t working. If this evidence came from a data table, what would be the most natural individuals (rows) & variables (columns) for that table?
  1. In the data table in the example above, who or what are the individuals? What are the variables and which are quantitative and which are categorical?

  2. If we want to compare states to see which are safer, why is it better to compare the rates instead of the total fatalities?

  3. What is wrong with this student’s answer to the previous question?

Rates are better because they are more precise and easier to understand.

I like this incorrect answer because it is a perfect example of bullshit. This student doesn’t know the answer, so they are trying to write something that sounds good and earns partial credit. Try to avoid writing bullshit. If you catch yourself writing B.S. on one of my quizzes or tests, then you can be sure that you are missing a really simple idea, and you should see if you can figure out what it is.

Wed, Jan 14

We talked briefly about making bar charts for categorical data.

  1. Exercise 2.21

Then we introduced stem & leaf plots (stemplots) and histograms for quantitative data. We started by making a stemplot and a histogram for the weights of the students in the class. We also talked about how to tell if data is skewed left or skewed right.

  1. Can you think of a distribution that is skewed left?

  2. Why isn’t this bar graph from the book a histogram?

Then we did this workshop:

We finished by reviewing the mean and the median.

Median versus Average

The median of $N$ numbers is located at position $\dfrac{N+1}{2}$ once the numbers are sorted from smallest to largest.

The median is not affected by skew, but the average is pulled in the direction of the skew. So the average will be bigger than the median when the data is skewed right, and smaller when the data is skewed left.
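Both points can be checked with a few lines of Python. This is only a minimal sketch: the data set below is made up for illustration (it is not from class), and the variable names are arbitrary.

```python
from statistics import mean, median

# Hypothetical right-skewed data (made up for illustration, not from class).
data = [1, 2, 2, 3, 3, 4, 20]

xs = sorted(data)
n = len(xs)

# Position of the median in the sorted list, counting from 1.
position = (n + 1) / 2

if n % 2 == 1:
    # Odd count: the position is a whole number, so take that value.
    med = xs[int(position) - 1]
else:
    # Even count: the position ends in .5, so average the two
    # values on either side of it.
    med = (xs[n // 2 - 1] + xs[n // 2]) / 2

avg = mean(xs)

print("median position:", position)
print("median:", med)
print("average:", avg)
```

Here the long right tail (the value 20) pulls the average up to 5, while the median stays at 3, the 4th value in the sorted list.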

Fri, Jan 16

We introduced the five number summary and box-and-whisker plots (boxplots). We also talked about the interquartile range (IQR) and how to use the $1.5 \times \text{IQR}$ rule to determine whether a data value is an outlier.

We started with this simple example:

  1. An 8-man crew team actually includes 9 men: the 8 rowers and one coxswain. Suppose the weights (in pounds) of the 9 men on a team are as follows:

     120  180  185  200  210  210  215  215  215

    Find the 5-number summary and draw a box-and-whisker plot for this data. Is the coxswain who weighs 120 lbs. an outlier?
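One way to check your work on this example is with a short Python sketch. It assumes one particular quartile convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. Software and other textbooks sometimes use slightly different conventions, so quartiles can differ a little. The function name `five_number_summary` is just a name chosen for this sketch.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Weights (in pounds) of the 9 men on the crew team.
weights = [120, 180, 185, 200, 210, 210, 215, 215, 215]

low, q1, med, q3, high = five_number_summary(weights)
iqr = q3 - q1

# 1.5 x IQR rule: anything beyond these fences is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in weights if w < lower_fence or w > upper_fence]

print("five-number summary:", low, q1, med, q3, high)
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

With this convention the lower fence is 133.75 pounds, so the 120 lb. coxswain is flagged as an outlier.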


Week 2 Notes

Day          Section  Topic
-----------  -------  -------------------------------------
Mon, Jan 19           Martin Luther King Jr. Day - no class
Wed, Jan 21  2.1.4    Standard deviation
Fri, Jan 23  4.1      Normal distribution

Wed, Jan 21

Today we talked about robust statistics such as the median and IQR that are not affected by outliers and skew. We also introduced the standard deviation. We did this one example of a standard deviation calculation by hand, but you won’t ever have to do that again in this class.

  1. 11 students just completed a nursing program. Here is the number of years it took each student to complete the program. Find the standard deviation of these numbers.

     3  3  3  3  4  4  4  4  5  5  6

From now on we will just use software to find standard deviation. In a spreadsheet (Excel or Google Sheets) you can use the =STDEV() function.
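For reference, the by-hand calculation for the nursing program data can be sketched in Python as follows. This is a minimal sketch, not the required method for the class. It uses the sample standard deviation (dividing by n - 1), which is what =STDEV() computes in a spreadsheet and what Python's `statistics.stdev` computes.

```python
from math import sqrt
from statistics import stdev

# Years each of the 11 students took to complete the nursing program.
years = [3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]

n = len(years)
xbar = sum(years) / n                             # step 1: the sample mean
squared_devs = [(x - xbar) ** 2 for x in years]   # step 2: squared deviations
variance = sum(squared_devs) / (n - 1)            # step 3: divide by n - 1
s = sqrt(variance)                                # step 4: take the square root

print("mean:", xbar)
print("sum of squared deviations:", sum(squared_devs))
print("standard deviation by hand:", s)
print("standard deviation from statistics.stdev:", stdev(years))
```

The mean is 4, the squared deviations add up to 10, and dividing by n - 1 = 10 gives a variance of 1, so the standard deviation is 1 year.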

  1. Which of the following data sets has the largest standard deviation?

    1. 1000, 998, 1005
    2. 8, 10, 15, 20, 22, 27
    3. 30, 60, 90

We finished by looking at some examples of histograms that have a shape that looks roughly like a bell. This is a very common pattern in nature that is called the normal distribution.

The normal distribution is a mathematical model for data with a histogram that is shaped like a bell. The model has the following features:

  1. It is symmetric (the left and right tails are the same size).
  2. The mean ($\mu$) is the same as the median.
  3. It has two inflection points (the two steepest points on the curve).
  4. The distance from the mean to either inflection point is the standard deviation ($\sigma$).
  5. The two numbers $\mu$ and $\sigma$ completely describe the model.

The normal distribution is a theoretical model that doesn’t have to perfectly match the data to be useful. We use Greek letters $\mu$ and $\sigma$ for the theoretical mean and standard deviation of the normal distribution to distinguish them from the sample mean $\bar{x}$ and standard deviation $s$ of our data, which probably won’t follow the theoretical model perfectly.

Fri, Jan 23

We talked about z-values and the 68-95-99.7 rule.

We also did these exercises before the workshop.

  1. In 2020, Farmville got 61 inches of rain total (making 2020 the second wettest year on record). How many standard deviations is this above average?

  2. The average high temperature in Anchorage, AK in January is 21 degrees Fahrenheit, with standard deviation 10. The average high temperature in Honolulu, HI in January is 80°F with σ = 8°F. In which city would it be more unusual to have a high temperature of 57°F in January?
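The z-value comparison in exercise 2 can be sketched in Python. This is a minimal sketch: the helper name `z_value` is made up for this example, and the last part assumes the data follow a normal model exactly, which real temperatures only do approximately.

```python
from statistics import NormalDist

def z_value(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Exercise 2: a high temperature of 57 degrees F in January.
z_anchorage = z_value(57, mu=21, sigma=10)
z_honolulu = z_value(57, mu=80, sigma=8)

print("Anchorage z-value:", z_anchorage)
print("Honolulu z-value:", z_honolulu)

# Check the 68-95-99.7 rule against the standard normal model:
# the share of the distribution within k standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.3f}")
```

The z-value farther from zero marks the more unusual temperature, regardless of whether it is above or below the mean.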


Week 3 Notes

Day          Section   Topic
-----------  --------  --------------------------------
Mon, Jan 26  4.1.5     68-95-99.7 rule
Wed, Jan 28  4.1.4     Normal distribution computations
Fri, Jan 30  2.1, 8.1  Scatterplots and correlation

Week 4 Notes

Day          Section  Topic
-----------  -------  -------------------------------------
Mon, Feb 2   8.2      Least squares regression introduction
Wed, Feb 4   8.2      Least squares regression practice
Fri, Feb 6   1.3      Sampling: populations and samples

Week 5 Notes

Day          Section  Topic
-----------  -------  ------------------------
Mon, Feb 9   1.3      Bias versus random error
Wed, Feb 11           Review
Fri, Feb 13           Midterm 1

Week 6 Notes

Day          Section  Topic
-----------  -------  ---------------------------------
Mon, Feb 16  1.4      Randomized controlled experiments
Wed, Feb 18  3.1      Defining probability
Fri, Feb 20  3.1      Multiplication and addition rules

Week 7 Notes

Day          Section  Topic
-----------  -------  ----------------------------------
Mon, Feb 23  3.4      Weighted averages & expected value
Wed, Feb 25  3.4      Random variables
Fri, Feb 27  7.1      Sampling distributions

Week 8 Notes

Day          Section  Topic
-----------  -------  ------------------------------------------------
Mon, Mar 2   5.1      Sampling distributions for proportions
Wed, Mar 4   5.2      Confidence intervals for a proportion
Fri, Mar 6   5.2      Confidence intervals for a proportion - cont’d

Week 9 Notes

Day          Section  Topic
-----------  -------  -----------------------------------
Mon, Mar 16  5.3      Hypothesis testing for a proportion
Wed, Mar 18           Review
Fri, Mar 20           Midterm 2

Week 10 Notes

Day          Section  Topic
-----------  -------  --------------------------------------------------
Mon, Mar 23  6.1      Inference for a single proportion
Wed, Mar 25  5.3.3    Decision errors
Fri, Mar 27  6.2      Difference of two proportions (hypothesis tests)

Week 11 Notes

Day          Section  Topic
-----------  -------  ------------------------------------------------------
Mon, Mar 30  6.2.3    Difference of two proportions (confidence intervals)
Wed, Apr 1   7.1      Introducing the t-distribution
Fri, Apr 3   7.1.4    One sample t-confidence intervals

Week 12 Notes

Day          Section  Topic
-----------  -------  ----------------------------------
Mon, Apr 6   7.2      Paired data
Wed, Apr 8   7.3      Difference of two means
Fri, Apr 10  7.3      Difference of two means - cont’d

Week 13 Notes

Day          Section  Topic
-----------  -------  -----------------
Mon, Apr 13  7.4      Statistical power
Wed, Apr 15           Review
Fri, Apr 17           Midterm 3

Week 14 Notes

Day          Section  Topic
-----------  -------  ------------------------------------
Mon, Apr 20  6.3      Chi-squared statistic
Wed, Apr 22  6.4      Testing association with chi-squared
Fri, Apr 24           Choosing the right technique
Mon, Apr 27           Last day, recap & review