Chapter 1

Two-Sample Tests and ANOVA Overview

📖 ~50 min read 📈 1 interactive chart ✍️ 2 practice questions 🎯 1 linked game

1.1 Review of Two-Sample Tests

In STATS200, you learned to compare the means of two groups using two-sample t-tests. That framework works well when the research question involves exactly two populations — for example, does a new training program improve employee productivity compared to the old one? But what happens when the question involves more than two groups?

NorthStar Enterprises operates four major divisions: Retail, Manufacturing, Logistics, and Corporate Services. Management wants to know whether average employee satisfaction scores differ across all four divisions. A two-sample t-test can only compare two groups at a time, so comparing all four would require running multiple tests.

The Problem with Multiple t-Tests

With 4 divisions, you would need to run C(4,2) = 6 separate t-tests (Retail vs. Manufacturing, Retail vs. Logistics, etc.). Each test uses an alpha level of 0.05, meaning a 5% chance of a Type I error on each individual test. But across all 6 tests, the probability of making at least one Type I error grows dramatically. This inflated risk is called the familywise error rate.

Familywise Error Rate
📊 Excel: =1-(1-alpha)^COMBIN(k,2)
where α is the per-test significance level and k is the number of groups being compared. The formula treats the pairwise tests as independent, so in practice it is an approximation, but the key point holds regardless: the error rate inflates rapidly as k grows.
✎ Worked Example: Familywise Error with 4 Groups
Step 1. Number of pairwise comparisons with k = 4 groups:
C(4, 2) = 4! / (2! × 2!) = 6
Step 2. Familywise error rate at α = 0.05:
1 − (1 − 0.05)^6 = 1 − 0.95^6 = 1 − 0.7351 = 0.2649
Step 3. Result: there is approximately a 26.5% chance of at least one false positive, far above the intended 5% threshold.
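The arithmetic above can be reproduced in a few lines. A minimal Python sketch (the function name is my own, not from any library):

```python
# Familywise error rate across all C(k, 2) pairwise t-tests,
# treating the tests as independent (an approximation in practice).
from math import comb

def familywise_error(k, alpha=0.05):
    """Probability of at least one Type I error across all pairwise comparisons."""
    m = comb(k, 2)                    # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

print(comb(4, 2))                     # 6 pairwise comparisons for 4 groups
print(round(familywise_error(4), 4))  # 0.2649, matching the worked example
```

Trying k = 5 reproduces the roughly 40% rate quoted later in the chapter summary.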
🏪 NorthStar Enterprises

If NorthStar's HR team runs 6 separate t-tests to compare satisfaction across all 4 divisions, there is roughly a 1-in-4 chance they will flag a difference that does not actually exist. This could lead to costly interventions in divisions that are performing perfectly fine. The solution is a single test that evaluates all groups simultaneously — ANOVA.

✓ Check Your Understanding
A researcher wants to compare mean test scores across 5 training methods using α = 0.05. If they run all possible pairwise t-tests, what is the approximate familywise error rate?
5%
25%
40%
50%

1.2 Introduction to ANOVA

Analysis of Variance (ANOVA) is a statistical method that tests the equality of three or more population means simultaneously. Instead of conducting multiple pairwise tests, ANOVA performs a single hypothesis test that controls the Type I error rate at the desired alpha level.

The Hypotheses

ANOVA sets up two competing hypotheses:

  • H0: All population means are equal — μ1 = μ2 = μ3 = ... = μk
  • Ha: At least one population mean differs from the others

Notice that Ha does not specify which means differ; it says only that the means are not all equal. If ANOVA rejects H0, follow-up (post-hoc) tests are needed to identify which specific groups differ. We cover those in Chapter 3.

The Core Idea: Between vs. Within Variation

ANOVA works by decomposing the total variation in the data into two components:

  • Between-group variation (SSB): How much the group means differ from the overall grand mean. If the groups truly have different means, this will be large.
  • Within-group variation (SSW): How much individual observations vary within each group. This represents natural randomness.

ANOVA compares these two sources of variation. If between-group variation is large relative to within-group variation, there is evidence that the group means are not all equal.
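This decomposition can be verified numerically. A minimal sketch using small made-up scores (not NorthStar data), showing that the between-group and within-group sums of squares add up to the total:

```python
# Decompose total variation (SST) into between-group (SSB) and within-group (SSW).
# The scores below are illustrative, not NorthStar's actual data.
groups = {
    "A": [70, 74, 72],
    "B": [64, 66, 65],
    "C": [77, 79, 78],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# SSB: squared distance of each group mean from the grand mean, weighted by group size
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
# SSW: squared distance of each observation from its own group mean
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)
# SST: squared distance of each observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_scores)

print(round(ssb, 1), round(ssw, 1), round(sst, 1))  # SSB + SSW equals SST
```

In this toy data SSB (254.0) dwarfs SSW (12.0), which is exactly the pattern ANOVA reads as evidence that the group means are not all equal.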

🏪 NorthStar Enterprises

NorthStar's four divisions reported the following mean employee satisfaction scores (on a 1–100 scale): Retail: 72, Manufacturing: 65, Logistics: 68, Corporate Services: 78. The overall grand mean across all employees is 70.75. ANOVA asks: are these differences in division means larger than what we would expect from random variation alone?
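Because the design is balanced (Section 1.3 notes 25 employees were sampled per division), the grand mean is simply the unweighted average of the four division means; a quick check:

```python
# Grand mean of a balanced design = unweighted average of the group means.
division_means = {"Retail": 72, "Manufacturing": 65, "Logistics": 68, "Corporate Services": 78}

grand_mean = sum(division_means.values()) / len(division_means)
print(grand_mean)  # 70.75, matching the value reported above
```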

📈 Interactive Chart: Employee Satisfaction Scores by Division
🎮 Practice Game: ANOVA or Not? Decide whether ANOVA is the right test for each scenario.
✓ Check Your Understanding
ANOVA is the appropriate test when:
Comparing two means only
The dependent variable is categorical
Comparing 3 or more means on a continuous variable
Only when samples are very large

1.3 ANOVA Assumptions

Like all parametric tests, ANOVA relies on certain assumptions about the data. Violating these assumptions can lead to incorrect conclusions, so it is important to check them before interpreting results.

1. Independence

Observations within and between groups must be independent. In NorthStar's case, each employee's satisfaction score should not influence another's. This assumption is typically satisfied through proper random sampling or experimental design.

2. Normality

The dependent variable should be approximately normally distributed within each group. ANOVA is moderately robust to violations of normality, especially with larger sample sizes (thanks to the Central Limit Theorem). Histograms or normal probability plots can be used to check this assumption.
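Alongside visual checks, a formal Shapiro-Wilk test can be run on each group. A sketch with illustrative scores, assuming SciPy is available:

```python
# Shapiro-Wilk normality check for one group's scores (illustrative data).
from scipy import stats

retail_scores = [71, 74, 69, 73, 70, 75, 72, 68, 74, 71]
stat, p = stats.shapiro(retail_scores)

# A large p-value means no evidence against normality for this group.
print(f"W = {stat:.3f}, p = {p:.3f}")
```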

3. Homogeneity of Variance

The population variances should be approximately equal across all groups. Of the three assumptions, this is the one ANOVA results are most sensitive to when group sizes are unequal. Levene's test provides a formal check of this assumption: a non-significant result (p > 0.05) suggests the variances are sufficiently equal.
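Levene's test is available in SciPy; a sketch with illustrative scores (not NorthStar data):

```python
# Levene's test for homogeneity of variance across three groups (illustrative data).
from scipy import stats

retail        = [71, 74, 69, 73, 70, 72]
manufacturing = [64, 66, 63, 67, 65, 66]
logistics     = [69, 67, 70, 66, 68, 67]

stat, p = stats.levene(retail, manufacturing, logistics)
print(f"Levene statistic = {stat:.3f}, p = {p:.3f}")

# A p-value above 0.05 gives no evidence of unequal variances,
# so the homogeneity assumption would be considered reasonable.
```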

Robustness with Equal Sample Sizes

When all groups have the same sample size (balanced design), ANOVA is quite robust to moderate violations of normality and homogeneity of variance. This is one reason researchers often strive for equal group sizes in their studies. NorthStar's HR team sampled 25 employees from each division, creating a balanced design that provides extra protection against assumption violations.

💡 Key Takeaway

ANOVA solves the multiple-comparisons problem by testing all group means in a single test, keeping the Type I error rate at the intended alpha level. It works by comparing between-group variation to within-group variation. Before interpreting ANOVA results, always verify the three assumptions: independence, normality, and homogeneity of variance. Balanced designs (equal n per group) provide extra robustness.

Chapter Summary

This chapter laid the groundwork for ANOVA by reviewing two-sample tests and explaining why they fail when comparing more than two groups. Here is what you should take away:

💡 Chapter 1 Summary

Two-Sample Test Limitation: Running multiple pairwise t-tests inflates the familywise error rate well beyond the nominal alpha level. With just 5 groups, the error rate approaches 40%.

ANOVA: Analysis of Variance tests H0: all population means are equal versus Ha: at least one differs. It compares between-group variation to within-group variation in a single F-test.

Assumptions: Independence, normality, and homogeneity of variance. Balanced designs provide robustness against moderate violations.

📋 Chapter 1 — Formula Reference
Concept           | Formula                    | Excel Function
Familywise Error  | 1 − (1 − α)^C(k,2)         | =1-(1-alpha)^COMBIN(k,2)
Number of Pairs   | C(k, 2) = k! / (2!(k−2)!)  | =COMBIN(k,2)
ANOVA H0          | μ1 = μ2 = ... = μk         | (none)
Up Next
Chapter 2: One-Way ANOVA