Pairwise Comparisons

Example: Dandruff Shampoo

A study looked at the following treatments for dandruff:

1% pyrithione zinc shampoo (PyrI),
1% pyrithione zinc shampoo with instructions to shampoo two times (PyrII),
2% ketoconazole shampoo (Keto),
a placebo shampoo (Placebo).

Subjects were randomized to each group, and initially each group had 112 volunteers, except the placebo group which was only assigned 28. Each volunteer was examined for dandruff flakes before and after six weeks of treatment. Dandruff flaking was measured on a scale from 0 to 80. Initially, there were no significant differences between the groups. During the clinical trial, 3 dropped out from the PyrII group and 6 from the Keto group. No patients dropped out of the other two groups.

We saw last time that all three dandruff treatments were significantly more effective than the placebo. But how do the three treatment groups compare against each other?

dandruff <- read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring18/math222/data/dandruff.txt")
treatment <- subset(dandruff, dandruff$treat != "Placebo")
boxplot(flaking ~ treat, data = treatment)

counts <- aggregate(flaking ~ treat, data = dandruff, FUN = length)
means <- aggregate(flaking ~ treat, data = dandruff, FUN = mean)
stdevs <- aggregate(flaking ~ treat, data = dandruff, FUN = sd)
groups <- data.frame(treat = counts$treat, n = counts$flaking, mean = means$flaking, sd = stdevs$flaking)
groups

##     treat   n     mean        sd
## 1    Keto 106 16.02830 0.9305149
## 2 Placebo  28 29.39286 1.5948827
## 3    PyrI 112 17.39286 1.1418110
## 4   PyrII 109 17.20183 1.3524999

From the graph & table above, it looks like ketoconazole shampoo might be more effective than the other two options.

results <- aov(flaking ~ treat, data = treatment)
summary(results)

##              Df Sum Sq Mean Sq F value Pr(>F)    
## treat         2  117.6   58.81   43.99 <2e-16 ***
## Residuals   324  433.2    1.34                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA F-test confirms that there are significant differences between the averages of the three groups.

Bonferroni Correction

If we want to compare Keto vs. PyrI, and Keto vs. PyrII, and PyrI vs. PyrII at the \(\alpha = 0.05\) significance level, then the Bonferroni correction suggests we should aim for a significance level of \(\alpha* = 0.05 / 3 = 1.667\%\) on each test, and we should make a confidence intervals at the \(1 - \alpha^* = 98.333\%\) confidence level. Here are the statistics for each group:

Here is the t-value and corresponding p-value to compare Keto versus PyrI.

mse <- anova(results)["Residuals", "Mean Sq"]
dfe <- anova(results)["Residuals", "Df"]
meanA = groups$mean[1] # Keto
meanB = groups$mean[3] # PyrI
nA = groups$n[1]
nB = groups$n[3]
t = (meanA - meanB) / sqrt(mse * (1 / nA + 1 / nB))
p = 2 * (1 - pt(abs(t), dfe))

A t-test to compare Keto vs. PyrI has t-value -8.7088059 with corresponding two-sided p-value 0. That is statistically significant, even at the \(\alpha^* = 1.6667\%\) level.

alpha = 0.05 / 3
tstar = qt(1 - alpha / 2, dfe)
upper = meanA - meanB + tstar * sqrt(mse * (1 / nA + 1 / nB))
lower = meanA - meanB - tstar * sqrt(mse * (1 / nA + 1 / nB))

Using the Bonferroni correction, we can be 95% confidence that using ketoconazole shampoo will produce between -1.7416183 and -0.9874922 fewer flakes than pyrithione 1% shampoo.

Tukey’s Honest Significant Differences

Calculating pairwise differences in R is tedious to do by programming the details yourself. A convenient option is the following function:

TukeyHSD(results, conf.level = 0.95)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = flaking ~ treat, data = treatment)
## 
## $treat
##                  diff        lwr       upr     p adj
## PyrI-Keto   1.3645553  0.9956305 1.7334800 0.0000000
## PyrII-Keto  1.1735330  0.8021478 1.5449181 0.0000000
## PyrII-PyrI -0.1910223 -0.5573294 0.1752848 0.4375893