| Day | Section | Topic |
|---|---|---|
| Mon, Jan 12 | | Working with R and RStudio |
| Wed, Jan 14 | 1.3 | Sampling principles and strategies |
| Fri, Jan 16 | 1.4 | Experiments |
Today we went over the course syllabus and talked about making R-markdown files in RStudio. We started the following lab in class; I recommend finishing the second half on your own. I also recommend installing RStudio on your own laptop (it’s free).
Today we reviewed populations and samples. We started with a famous example of a bad sample.
Then we reviewed population parameters, sample statistics, and sampling frames. The difference between a sample statistic and a population parameter is called the sample error.
There are two sources of sample error:
Bias. This can be caused by a non-representative sample (sample bias) or by measurement errors, non-response, or biased questions (non-sample bias). The most reliable way to avoid sample bias is to take a simple random sample (SRS) from the whole population.
Random error. This is non-systematic error. It tends to get smaller with larger samples.
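A quick simulation can illustrate the claim about random error. The sketch below is not from class: it assumes a hypothetical population in which 30% of individuals have some trait, draws simple random samples of two different sizes, and compares how much the sample proportions vary. The names `p.true` and `sample.proportions`, and the sample sizes, are made up for illustration.

```r
set.seed(1)  # arbitrary seed so the results are reproducible

p.true <- 0.3  # hypothetical population parameter (an assumption)

# Draw many samples of size n and return the sample proportion from each one
sample.proportions <- function(n, reps = 2000) {
  replicate(reps, mean(rbinom(n, size = 1, prob = p.true)))
}

small <- sample.proportions(25)
large <- sample.proportions(400)

# Typical size of the random error (standard deviation of the sample proportions)
sd(small)
sd(large)
```

The standard deviation for the larger samples should come out to roughly a quarter of the one for the smaller samples, since random error shrinks roughly like 1/sqrt(n). Bias does not shrink this way when the sample gets larger.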
To summarize:
We finished with this workshop.
If you find an association between an explanatory variable and a response variable in an observational study, then you can’t say for sure that the explanatory variable is the cause. We say that correlation is not causation because there might be lurking variables that are confounders. A confounder is associated with both the explanatory and response variables, so you can’t tell what the true cause is.
Randomized experiments can provide strong evidence of cause and effect because random assignment to treatment groups tends to balance all lurking variables across the groups. We also talked about blocking and double-blind experiments.
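As a rough illustration of why random assignment helps, the sketch below (not from class) invents a lurking variable, `age`, for 200 hypothetical subjects, assigns them to treatment and control groups at random, and then compares the average age in the two groups. All of the names and numbers are assumptions made up for the example.

```r
set.seed(2)  # arbitrary seed so the results are reproducible

n <- 200
age <- round(runif(n, min = 20, max = 70))  # hypothetical lurking variable

# Randomly assign exactly half of the subjects to each group
group <- sample(rep(c("treatment", "control"), each = n / 2))

# Average age in each group; random assignment should make these similar
tapply(age, group, mean)
```

The two averages will rarely match exactly, but with random assignment they differ only by chance, and the difference tends to shrink as the groups get larger.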
Example: 1954 polio vaccine trials
Workshop: Experiments
We finished by simulating the results of the polio vaccine trials to see if they might just be a random fluke. We wrote this R code in class:
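# Simulate 1000 trials of 244 fair coin flips each
results <- c()
trials <- 1000
for (x in 1:trials) {
  # Each flip is 0 or 1 with equal probability
  simulated.result <- sample(c(0, 1), size = 244, replace = TRUE)
  # Fraction of 1s in this trial
  percent <- sum(simulated.result) / 244
  results <- c(results, percent)
}
hist(results)
# Proportion of simulated trials with a fraction below 0.336
sum(results < 0.336) / trials
| Day | Section | Topic |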
|---|---|---|
| Mon, Jan 19 | | Martin Luther King Jr. Day - no class |
| Wed, Jan 21 | 2.1 | Examining numerical data |
| Fri, Jan 23 | 3.2 | Conditional probability |
Today we did a lab about using R to visualize data.
You should be able to open this file in your browser, press CTRL-A and CTRL-C to select and copy it, and then paste it into RStudio as an R-markdown document.
We had a little trouble with R-markdown on the lab computers.
Last time we talked about how to visualize data with R. Here are two quick summaries of how to make plots in R:
After that, we started talking about probability. We reviewed some of the basic rules.
The notation $P(B \mid A)$ means “the probability of B given that A happened”. Two events $A$ and $B$ are independent if the probability of $B$ does not depend on whether or not $A$ happens. We did the following examples.
We also talked about tree diagrams (see subsection 3.2.7 from the book) and how to use them to compute probabilities.
Based on a study of women in the United States and Germany, there is a 0.8% chance that a woman in her forties has breast cancer. Mammograms are 90% accurate at detecting breast cancer if someone has it. They are also 93% accurate at not detecting cancer in people who don’t have it. If a woman in her forties tests positive for cancer on a mammogram screening, what is the probability that she actually has breast cancer?
5% of men are color blind, but only 0.25% of women are. Find .
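The mammogram question can be worked out by following the branches of a tree diagram. The sketch below uses only the numbers given in that problem; the variable names are chosen just for illustration.

```r
p.cancer <- 0.008            # P(cancer)
p.pos.given.cancer <- 0.90   # P(positive | cancer)
p.neg.given.healthy <- 0.93  # P(negative | no cancer)

# The two branches of the tree that end in a positive test
true.positive  <- p.cancer * p.pos.given.cancer
false.positive <- (1 - p.cancer) * (1 - p.neg.given.healthy)

# P(cancer | positive) = true positives / all positives
p.cancer.given.pos <- true.positive / (true.positive + false.positive)
p.cancer.given.pos
```

The result is roughly 0.094, so even after a positive mammogram the chance of cancer is under 10%. False positives among the many women without cancer outnumber the true positives.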
| Day | Section | Topic |
|---|---|---|
| Mon, Jan 26 | | Class canceled (snow) |
| Wed, Jan 28 | 3.4 | Random variables |
| Fri, Jan 30 | 4.1 | Normal distribution |
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 2 | 4.3 | Binomial distribution |
| Wed, Feb 4 | 5.1 | Point estimates and error |
| Fri, Feb 6 | 5.2 | Confidence intervals for a proportion |
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 9 | 5.3 | Hypothesis tests for a proportion |
| Wed, Feb 11 | | Review |
| Fri, Feb 13 | | Midterm 1 |
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 16 | 6.2 | Difference in two proportions |
| Wed, Feb 18 | 6.3 | Chi-squared goodness of fit test |
| Fri, Feb 20 | 6.4 | Chi-squared test for association |
| Day | Section | Topic |
|---|---|---|
| Mon, Feb 23 | 7.1 | One-sample means with t-distribution |
| Wed, Feb 25 | 7.2 | Paired data |
| Fri, Feb 27 | 7.3 | Difference of two means |
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 2 | 7.4 | Power calculations |
| Wed, Mar 4 | 7.5 | Comparing many means with ANOVA |
| Fri, Mar 6 | 7.5 | ANOVA - cont’d |
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 16 | 7.5 | ANOVA - cont’d |
| Wed, Mar 18 | | Review |
| Fri, Mar 20 | | Midterm 2 |
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 23 | 8.2 | Least squares regression |
| Wed, Mar 25 | 9.1 | Introduction to multiple regression |
| Fri, Mar 27 | 9.2 | Model selection |
| Day | Section | Topic |
|---|---|---|
| Mon, Mar 30 | 9.3 | Checking model conditions |
| Wed, Apr 1 | 9.3 | Checking model conditions - cont’d |
| Fri, Apr 3 | 9.5 | Introduction to logistic regression |
| Day | Section | Topic |
|---|---|---|
| Mon, Apr 6 | 9.5 | Logistic regression - cont’d |
| Wed, Apr 8 | | Hypothesis testing with randomization |
| Fri, Apr 10 | | Confidence intervals with bootstrapping |
| Day | Section | Topic |
|---|---|---|
| Mon, Apr 13 | | Bootstrapping continued |
| Wed, Apr 15 | | Review |
| Fri, Apr 17 | | Midterm 3 |
| Day | Section | Topic |
|---|---|---|
| Mon, Apr 20 | | Introduction to Bayesian methods |
| Wed, Apr 22 | | Credible intervals for proportions |
| Fri, Apr 24 | | Bayesian inference |
| Mon, Apr 27 | | Last day, recap & review |