Chapter 3: Post-Hoc Tests and Effect Size

3.1 Why Post-Hoc Tests Are Needed

When a one-way ANOVA rejects H₀, we know that at least one group mean differs from the others — but we do not know which pairs of means are significantly different. Post-hoc (Latin for "after this") tests perform pairwise comparisons while controlling the familywise error rate, the probability of making at least one Type I error across all comparisons.

Why Not Just Run Multiple t-Tests?

With k groups there are k(k−1)/2 possible pairwise comparisons, and running a separate t-test for each one inflates the overall Type I error rate. For example, with 4 groups there are 6 comparisons; if each is tested at α = 0.05 and the tests are treated as independent, the familywise error rate climbs to 1 − (0.95)⁶ ≈ 0.26, far above the intended 5%.
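The arithmetic above generalizes to any number of groups. The short sketch below counts the pairwise comparisons and computes the familywise error rate, under the same simplifying assumption that the comparisons behave like independent tests (real pairwise tests share data, so this is an approximation). The function name is illustrative only.

```python
from math import comb


def familywise_error_rate(k: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of pairwise comparisons, FWER) for k groups.

    Assumes each of the m comparisons is an independent test at level alpha.
    """
    m = comb(k, 2)                  # k(k-1)/2 pairwise comparisons
    fwer = 1 - (1 - alpha) ** m     # P(at least one Type I error)
    return m, fwer


for k in (3, 4, 5, 6):
    m, fwer = familywise_error_rate(k)
    print(f"k={k}: {m} comparisons, FWER = {fwer:.2f}")
```

For k = 4 this reproduces the 6 comparisons and the rate of about 0.26 quoted above; the rate keeps climbing quickly as groups are added.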

Common Post-Hoc Tests

Tukey HSD (Honestly Significant Difference) — Best when group sizes are equal. Uses the Studentized Range distribution.
Bonferroni Correction — Conservative approach. Divides α by the number of comparisons. Works with unequal group sizes.
Scheffé Test — Most conservative. Controls error for all possible contrasts, not just pairwise. Use when exploring complex contrasts.
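The Bonferroni correction is simple enough to compute by hand. As a minimal sketch (function names are hypothetical), the per-comparison level is α divided by the number of comparisons; equivalently, each raw p-value can be multiplied by that number and capped at 1 before being compared with the original α.

```python
from math import comb


def bonferroni_alpha(k: int, alpha: float = 0.05) -> float:
    """Per-comparison significance level for all k(k-1)/2 pairwise tests."""
    m = comb(k, 2)
    return alpha / m


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# NorthStar: 4 divisions -> 6 comparisons, each tested at 0.05 / 6
print(round(bonferroni_alpha(4), 4))
```

With 4 groups, each comparison is tested at roughly 0.0083 instead of 0.05, which is why Bonferroni is described as conservative.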

🏪 NorthStar Enterprises

In Chapter 2, NorthStar's ANOVA rejected H₀ for employee satisfaction across 4 divisions (F(3,36) = 5.77, p = 0.0025). HR now needs to determine which specific pairs of divisions differ. With 4 groups, there are 4(3)/2 = 6 pairwise comparisons to evaluate.

3.2 Tukey HSD

The Tukey HSD test compares every pair of group means. A pair is declared significantly different if the absolute difference between their means exceeds a critical threshold called the Honestly Significant Difference.

Tukey HSD Threshold

$$\text{HSD} = q_{\alpha}(k,\, df_W)\,\sqrt{\frac{MSW}{n}}$$

📊 Excel: =q*SQRT(MSW/n) for the threshold, then =ABS(mean1-mean2) for each pair and compare

where q is the critical value from the Studentized Range table (based on k groups and df_W), MSW is the within-group mean square from ANOVA, and n is the common sample size per group.
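The threshold itself is a single line of arithmetic once q has been looked up. The sketch below takes q as an input read from a Studentized Range table, as the text does; the function name is illustrative.

```python
import math


def tukey_hsd_threshold(q: float, msw: float, n: int) -> float:
    """HSD = q * sqrt(MSW / n), valid when every group has the same size n."""
    return q * math.sqrt(msw / n)


# NorthStar values from Chapter 2; q read from a Studentized Range table
hsd = tukey_hsd_threshold(q=3.809, msw=54.78, n=10)
print(f"HSD = {hsd:.3f}")
```

This gives about 8.915, matching the 8.91 used in the worked example up to rounding of the intermediate square root. If SciPy 1.7 or later is installed, `scipy.stats.studentized_range.ppf(0.95, 4, 36)` should return q directly instead of a table lookup.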

Decision Rule

For each pair of groups (i, j): if |x̄_i − x̄_j| > HSD, the difference is statistically significant. Otherwise, we have insufficient evidence that those two means differ.

✎ Worked Example: NorthStar Pairwise Comparisons

From Chapter 2: k = 4, n = 10 per group, MSW = 54.78, df_W = 36.
Group means: Retail = 72, Manufacturing = 65, Logistics = 68, Corp Services = 78

Find q from Studentized Range table for k = 4, df = 36, α = 0.05:
q ≈ 3.809

Compute the HSD threshold:
HSD = 3.809 × (54.78 / 10)^(1/2) = 3.809 × 2.340 = 8.91

Evaluate all 6 pairwise differences:
|Retail − Manuf| = |72 − 65| = 7 < 8.91 → Not significant
|Retail − Logistics| = |72 − 68| = 4 < 8.91 → Not significant
|Retail − Corp Svc| = |72 − 78| = 6 < 8.91 → Not significant
|Manuf − Logistics| = |65 − 68| = 3 < 8.91 → Not significant
|Manuf − Corp Svc| = |65 − 78| = 13 > 8.91 → Significant
|Logistics − Corp Svc| = |68 − 78| = 10 > 8.91 → Significant

Result: Corporate Services differs significantly from both Manufacturing and Logistics. No other pairs are significantly different at α = 0.05.
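The six comparisons above follow one mechanical rule, so they are easy to automate. This is a minimal sketch for the equal-n case using the NorthStar summary values; the names `means` and `tukey_pairs` are hypothetical, and q is again taken from the table rather than computed.

```python
import math
from itertools import combinations

# NorthStar division means from Chapter 2 (n = 10 per group)
means = {"Retail": 72, "Manufacturing": 65, "Logistics": 68, "Corp Services": 78}
MSW, N, Q = 54.78, 10, 3.809   # Q read from the Studentized Range table

hsd = Q * math.sqrt(MSW / N)


def tukey_pairs(means: dict[str, float], hsd: float):
    """Yield (group_a, group_b, |difference|, significant?) for every pair."""
    for a, b in combinations(means, 2):
        diff = abs(means[a] - means[b])
        yield a, b, diff, diff > hsd


for a, b, diff, sig in tukey_pairs(means, hsd):
    verdict = "Significant" if sig else "Not significant"
    print(f"|{a} - {b}| = {diff} -> {verdict}")
```

Only the Manufacturing/Corp Services and Logistics/Corp Services pairs exceed the threshold, as in the hand calculation. With raw scores rather than summary means, SciPy's `scipy.stats.tukey_hsd` (SciPy 1.8+) runs the full procedure, including the critical value.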

✓ Check Your Understanding

Tukey HSD shows that divisions A and C do not differ significantly. This means:

The original ANOVA was wrong

Divisions A and C have the same true population mean

There is insufficient evidence that A and C differ

The post-hoc test overrides the ANOVA result

3.3 Effect Size with Eta-Squared

A statistically significant ANOVA result tells us that group means differ, but not how much. Effect size quantifies the magnitude of the difference. The most common effect size for one-way ANOVA is eta-squared (η²), which represents the proportion of total variance explained by the grouping variable.

Eta-Squared

$$\eta^2 = \frac{SS_B}{SS_T}$$

📊 Excel: =SSB/SST (from ANOVA output)

where SS_B is the between-group sum of squares and SS_T is the total sum of squares from the ANOVA table.

Interpreting Eta-Squared

Cohen's guidelines for interpreting η²:

Small effect: η² ≈ 0.01 — The grouping variable explains about 1% of variance.
Medium effect: η² ≈ 0.06 — About 6% of variance explained.
Large effect: η² ≈ 0.14 — About 14% or more of variance explained.
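Cohen's values are benchmarks rather than sharp boundaries. If a rough label is wanted in a report, one common convention treats each benchmark as the lower edge of its category, as in this sketch; the cut-offs and the "negligible" label are assumptions of the sketch, not part of Cohen's guidelines.

```python
def interpret_eta_squared(eta2: float) -> str:
    """Label eta^2 using Cohen's benchmarks as lower cut-offs.

    Treating 0.01 / 0.06 / 0.14 as category boundaries is a convention
    assumed here for illustration.
    """
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"


print(interpret_eta_squared(0.325))
```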

🏪 NorthStar Enterprises

From Chapter 2, NorthStar's ANOVA produced SSB = 947.50 and SST = 2919.50. The effect size is:

η² = 947.50 / 2919.50 = 0.325
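Because the groups are equal in size, SS_B can be rebuilt from the four division means alone, which makes a useful consistency check on the Chapter 2 figures. This sketch recomputes SS_B, MSW, F, and η² from the means, n, and the reported SS_T.

```python
# NorthStar summary figures from Chapter 2
means = [72, 65, 68, 78]   # Retail, Manufacturing, Logistics, Corp Services
n = 10                     # employees per division
SST = 2919.50              # total sum of squares from the ANOVA table

k = len(means)
grand_mean = sum(means) / k                      # equal n, so mean of the means
SSB = n * sum((m - grand_mean) ** 2 for m in means)
SSW = SST - SSB
df_b, df_w = k - 1, k * n - k
F = (SSB / df_b) / (SSW / df_w)
eta2 = SSB / SST

print(f"SSB = {SSB:.2f}, MSW = {SSW / df_w:.2f}, F = {F:.2f}, eta^2 = {eta2:.3f}")
```

It prints SSB = 947.50, MSW = 54.78, F = 5.77, eta^2 = 0.325, agreeing with the ANOVA table and the effect size above.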

This is a large effect: division membership explains about 32.5% of the variance in employee satisfaction, so the result is not only statistically significant but also practically meaningful. Combined with the Tukey results, HR should investigate why Corporate Services scores higher than Manufacturing and Logistics.

💡 Key Takeaway

Always report effect size alongside p-values. A small p-value tells you the observed differences would be unlikely if H₀ were true; η² tells you whether the effect is large enough to matter in practice. With large samples, even trivial differences can be "significant," so effect size provides the practical context that p-values alone cannot.

Chapter Summary

This chapter covered two essential follow-ups to a significant ANOVA: post-hoc pairwise comparisons and effect size measurement.

💡 Chapter 3 Summary

Post-Hoc Tests: After rejecting H₀ in ANOVA, use Tukey HSD (equal n), Bonferroni (conservative), or Scheffé (most conservative) to identify which specific pairs of means differ.

Tukey HSD: Compare each |x̄_i − x̄_j| to HSD = q · (MSW/n)^(1/2). Pairs exceeding the threshold are significantly different.

Eta-Squared: η² = SSB/SST measures the proportion of variance explained. Always report alongside p-values.

📋 Chapter 3 — Formula Reference