Chapter 3

Post-Hoc Tests and Effect Size


3.1 Why Post-Hoc Tests Are Needed

When a one-way ANOVA rejects H0, we know that at least one group mean differs from the others — but we do not know which pairs of means are significantly different. Post-hoc (Latin for "after this") tests perform pairwise comparisons while controlling the familywise error rate, the probability of making at least one Type I error across all comparisons.

Why Not Just Run Multiple t-Tests?

With k groups there are k(k−1)/2 possible pairwise comparisons. Running a separate t-test for each one inflates the overall Type I error rate. For example, with 4 groups there are 6 comparisons; at α = 0.05 each, and treating the comparisons as independent, the familywise error rate climbs to roughly 1 − (0.95)^6 ≈ 0.26, far above the intended 5%.
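
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name familywise_error_rate is hypothetical, and the formula assumes the comparisons are independent, which pairwise t-tests on shared groups are not, so the result is a rough guide rather than an exact rate.

```python
from math import comb


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Approximate P(at least one Type I error) over all pairwise tests.

    Assumes independent comparisons (a simplification).
    """
    m = comb(k, 2)  # number of pairwise comparisons, k(k-1)/2
    return 1 - (1 - alpha) ** m


# Show how quickly the rate grows as groups are added.
for k in (3, 4, 5, 6):
    print(f"k = {k}: {comb(k, 2)} comparisons, "
          f"familywise rate ~ {familywise_error_rate(k):.3f}")
```

For k = 4 this reproduces the 6 comparisons and the rate of about 0.26 quoted above.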

Common Post-Hoc Tests

  • Tukey HSD (Honestly Significant Difference) — Best when group sizes are equal. Uses the Studentized Range distribution.
  • Bonferroni Correction — Conservative approach. Divides α by the number of comparisons. Works with unequal group sizes.
  • Scheffé Test — Most conservative. Controls error for all possible contrasts, not just pairwise. Use when exploring complex contrasts.
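
As a small illustration of the Bonferroni idea in the list above, the sketch below divides α by the number of pairwise comparisons. The function name bonferroni_alpha is hypothetical, and the familywise figure it prints again relies on the independence approximation.

```python
from math import comb


def bonferroni_alpha(k: int, alpha: float = 0.05) -> float:
    """Per-comparison significance level for all pairwise tests among k groups."""
    return alpha / comb(k, 2)


k = 4
alpha_per_test = bonferroni_alpha(k)  # 0.05 / 6

# Under the independence approximation, the familywise rate
# stays just below the intended 0.05.
fwer = 1 - (1 - alpha_per_test) ** comb(k, 2)

print(f"per-comparison alpha = {alpha_per_test:.5f}")
print(f"approximate familywise rate = {fwer:.4f}")
```

Each of the 6 comparisons would then be tested at roughly α = 0.0083 instead of 0.05, which is why the method is described as conservative.
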
🏪 NorthStar Enterprises

In Chapter 2, NorthStar's ANOVA rejected H0 for employee satisfaction across 4 divisions (F(3,36) = 5.77, p = 0.0025). HR now needs to determine which specific pairs of divisions differ. With 4 groups, there are 4(3)/2 = 6 pairwise comparisons to evaluate.

3.2 Tukey HSD

The Tukey HSD test compares every pair of group means. A pair is declared significantly different if the absolute difference between their means exceeds a critical threshold called the Honestly Significant Difference.

Tukey HSD Threshold
HSD = q · (MSW / n)^(1/2)
📊 Excel: =ABS(mean1-mean2) then compare to HSD threshold
where q is the critical value from the Studentized Range table (based on k groups and dfW), MSW is the within-group mean square from ANOVA, and n is the common sample size per group.

Decision Rule

For each pair of groups (i, j): if |x̄_i − x̄_j| > HSD, the difference is statistically significant. Otherwise, we have insufficient evidence that those two means differ.

✎ Worked Example: NorthStar Pairwise Comparisons

Step 1. From Chapter 2: k = 4, n = 10 per group, MSW = 54.78, dfW = 36.
Group means: Retail = 72, Manufacturing = 65, Logistics = 68, Corp Services = 78

Step 2. Find q from the Studentized Range table for k = 4, df = 36, α = 0.05:
q ≈ 3.809

Step 3. Compute the HSD threshold:
HSD = 3.809 × (54.78 / 10)^(1/2) = 3.809 × 2.340 = 8.91

Step 4. Evaluate all 6 pairwise differences:
|Retail − Manuf| = |72 − 65| = 7 < 8.91 → Not significant
|Retail − Logistics| = |72 − 68| = 4 < 8.91 → Not significant
|Retail − Corp Svc| = |72 − 78| = 6 < 8.91 → Not significant
|Manuf − Logistics| = |65 − 68| = 3 < 8.91 → Not significant
|Manuf − Corp Svc| = |65 − 78| = 13 > 8.91 → Significant
|Logistics − Corp Svc| = |68 − 78| = 10 > 8.91 → Significant

Step 5. Result: Corporate Services differs significantly from both Manufacturing and Logistics. No other pairs are significantly different at α = 0.05.
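
The worked example can also be checked with a short Python sketch. It is a minimal illustration under stated assumptions: the function names are hypothetical, group sizes are taken to be equal, and q = 3.809 is entered by hand from the Studentized Range table rather than computed.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd_threshold(q: float, msw: float, n: int) -> float:
    """HSD = q * sqrt(MSW / n), valid for equal group sizes n."""
    return q * sqrt(msw / n)


def pairwise_decisions(means: dict, hsd: float) -> list:
    """Return (group_i, group_j, |difference|, significant?) for every pair."""
    results = []
    for a, b in combinations(means, 2):
        diff = abs(means[a] - means[b])
        results.append((a, b, diff, diff > hsd))
    return results


# Values from the worked example; q comes from the table, not from code.
means = {"Retail": 72, "Manufacturing": 65, "Logistics": 68, "Corp Services": 78}
hsd = tukey_hsd_threshold(q=3.809, msw=54.78, n=10)
decisions = pairwise_decisions(means, hsd)

for a, b, diff, significant in decisions:
    label = "Significant" if significant else "Not significant"
    print(f"|{a} - {b}| = {diff} -> {label}")
```

Without intermediate rounding the threshold comes out at about 8.915; the 8.91 in the worked example results from rounding the square root to 2.340 first. The conclusions for all six pairs are the same either way.
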
✓ Check Your Understanding
Tukey HSD shows that divisions A and C do not differ significantly. This means:

  • The original ANOVA was wrong
  • Divisions A and C have the same true population mean
  • There is insufficient evidence that A and C differ
  • The post-hoc test overrides the ANOVA result

3.3 Effect Size with Eta-Squared

A statistically significant ANOVA result tells us that group means differ, but not how much. Effect size quantifies the magnitude of the difference. The most common effect size for one-way ANOVA is eta-squared (η²), which represents the proportion of total variance explained by the grouping variable.

Eta-Squared
η² = SSB / SST
📊 Excel: =SSB/SST (from ANOVA output)
where SSB is the between-group sum of squares and SST is the total sum of squares from the ANOVA table.

Interpreting Eta-Squared

Cohen's guidelines for interpreting η²:

  • Small effect: η² ≈ 0.01 — The grouping variable explains about 1% of variance.
  • Medium effect: η² ≈ 0.06 — About 6% of variance explained.
  • Large effect: η² ≈ 0.14 — About 14% or more of variance explained.
🏪 NorthStar Enterprises

From Chapter 2, NorthStar's ANOVA produced SSB = 947.50 and SST = 2919.50. The effect size is:

η² = 947.50 / 2919.50 = 0.325

This is a large effect — division membership explains about 32.5% of the variance in employee satisfaction. This is not only statistically significant but also practically meaningful. HR should invest in understanding why Corporate Services scores so much higher.
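
The calculation and its interpretation can be sketched in Python as follows. The function names are hypothetical, and treating Cohen's benchmarks (0.01, 0.06, 0.14) as hard cutoffs is a simplifying assumption made here for illustration; the guidelines are approximate reference points.

```python
def eta_squared(ssb: float, sst: float) -> float:
    """Proportion of total variance explained by the grouping variable."""
    return ssb / sst


def cohen_label(eta2: float) -> str:
    """Map eta-squared to a rough size label.

    Using 0.01 / 0.06 / 0.14 as cutoffs is an assumption of this sketch.
    """
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"


# NorthStar values from the Chapter 2 ANOVA table.
eta2 = eta_squared(ssb=947.50, sst=2919.50)
print(f"eta-squared = {eta2:.3f} ({cohen_label(eta2)} effect)")
```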

💡 Key Takeaway

Always report effect size alongside p-values. A small p-value tells you that differences this large would be unlikely if H0 were true; η² tells you whether the effect is large enough to matter in practice. With large samples, even trivial differences can be "significant." Effect size provides the practical context that p-values alone cannot.

Chapter Summary

This chapter covered two essential follow-ups to a significant ANOVA: post-hoc pairwise comparisons and effect size measurement.

💡 Chapter 3 Summary

Post-Hoc Tests: After rejecting H0 in ANOVA, use Tukey HSD (equal n), Bonferroni (conservative), or Scheffé (most conservative) to identify which specific pairs of means differ.

Tukey HSD: Compare each |x̄_i − x̄_j| to HSD = q · (MSW/n)^(1/2). Pairs exceeding the threshold are significantly different.

Eta-Squared: η² = SSB/SST measures the proportion of variance explained. Always report alongside p-values.

📋 Chapter 3 — Formula Reference
Measure          Formula                    Excel Function
Tukey HSD        HSD = q · (MSW/n)^(1/2)    =ABS(mean1-mean2)
η²               SSB / SST                  =SSB/SST
Bonferroni α     α / [k(k−1)/2]             =alpha/COMBIN(k,2)
Number of pairs  k(k−1)/2                   =COMBIN(k,2)