In STATS200, you learned to compare the means of two groups using two-sample t-tests. That framework works well when the research question involves exactly two populations — for example, does a new training program improve employee productivity compared to the old one? But what happens when the question involves more than two groups?
NorthStar Enterprises operates four major divisions: Retail, Manufacturing, Logistics, and Corporate Services. Management wants to know whether average employee satisfaction scores differ across all four divisions. A two-sample t-test can only compare two groups at a time, so comparing all four would require running multiple tests.
With 4 divisions, you would need to run C(4,2) = 6 separate t-tests (Retail vs. Manufacturing, Retail vs. Logistics, etc.). Each test uses an alpha level of 0.05, meaning a 5% chance of a Type I error on each individual test. But across all 6 tests, the probability of making at least one Type I error grows dramatically. This inflated risk is called the familywise error rate.
Familywise error rate = 1 − (1 − α)^C(k, 2), where α is the per-test significance level and k is the number of groups being compared.
If NorthStar's HR team runs 6 separate t-tests to compare satisfaction across all 4 divisions, there is roughly a 1-in-4 chance they will flag a difference that does not actually exist. This could lead to costly interventions in divisions that are performing perfectly fine. The solution is a single test that evaluates all groups simultaneously — ANOVA.
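The familywise error rate formula is easy to verify with a short calculation. The sketch below (the function name is ours) reproduces the numbers quoted in this chapter:

```python
from math import comb

def familywise_error_rate(k, alpha=0.05):
    """Chance of at least one Type I error across all C(k, 2) pairwise t-tests."""
    n_tests = comb(k, 2)              # number of pairwise comparisons among k groups
    return 1 - (1 - alpha) ** n_tests

# Four divisions -> 6 tests -> roughly a 1-in-4 familywise error rate
print(round(familywise_error_rate(4), 3))  # 0.265

# Five groups -> 10 tests -> the error rate approaches 40%
print(round(familywise_error_rate(5), 3))  # 0.401
```

Note how quickly the rate grows: each additional group adds several more pairwise tests, and each test adds another chance of a false positive.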
Analysis of Variance (ANOVA) is a statistical method that tests the equality of three or more population means simultaneously. Instead of conducting multiple pairwise tests, ANOVA performs a single hypothesis test that controls the Type I error rate at the desired alpha level.
ANOVA sets up two competing hypotheses:
H0: μ1 = μ2 = ... = μk (all population means are equal)
Ha: at least one population mean differs from the others
Notice that Ha does not specify which means differ — only that the pattern of means is not all identical. If ANOVA rejects H0, follow-up (post-hoc) tests are needed to identify which specific groups differ. We cover those in Chapter 3.
ANOVA works by decomposing the total variation in the data into two components:
Between-group variation: how far each group mean falls from the grand mean, reflecting any real differences among the groups.
Within-group variation: how much individual observations scatter around their own group mean, reflecting ordinary random variation.
ANOVA compares these two sources of variation. If between-group variation is large relative to within-group variation, there is evidence that the group means are not all equal.
NorthStar's four divisions reported the following mean employee satisfaction scores (on a 1–100 scale): Retail: 72, Manufacturing: 65, Logistics: 68, Corporate Services: 78. The overall grand mean across all employees is 70.75. ANOVA asks: are these differences in division means larger than what we would expect from random variation alone?
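The arithmetic behind these numbers is straightforward to check. A minimal sketch, using the division means above and the balanced sample of 25 employees per division described later in the chapter:

```python
# Division means on the 1-100 satisfaction scale (balanced design, n = 25 per division)
division_means = {"Retail": 72, "Manufacturing": 65, "Logistics": 68, "Corporate Services": 78}
n_per_group = 25

# With equal group sizes, the grand mean is the simple average of the division means
grand_mean = sum(division_means.values()) / len(division_means)
print(grand_mean)  # 70.75

# Between-group sum of squares: how far each division mean sits from the grand mean
ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in division_means.values())
print(ss_between)  # 2368.75
```

ANOVA will compare this between-group quantity against the within-group variation, which requires the individual employee scores rather than just the division means.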
Like all parametric tests, ANOVA relies on certain assumptions about the data. Violating these assumptions can lead to incorrect conclusions, so it is important to check them before interpreting results.
Observations within and between groups must be independent. In NorthStar's case, each employee's satisfaction score should not influence another's. This assumption is typically satisfied through proper random sampling or experimental design.
The dependent variable should be approximately normally distributed within each group. ANOVA is moderately robust to violations of normality, especially with larger sample sizes (thanks to the Central Limit Theorem). Histograms or normal probability plots can be used to check this assumption.
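Alongside graphical checks, a formal test such as Shapiro-Wilk is often used as a complement (this test is our suggestion, not one the chapter names). A sketch on simulated scores; the data are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical satisfaction scores for one division; real data would come from the HR survey
retail_scores = rng.normal(loc=72, scale=8, size=25)

stat, p = stats.shapiro(retail_scores)
# A p-value above 0.05 gives no evidence against normality for this group
print(f"W = {stat:.3f}, p = {p:.3f}")
```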
The population variances should be approximately equal across all groups. This is the most critical assumption for ANOVA. Levene's test provides a formal statistical test for this assumption — a non-significant result (p > 0.05) suggests the variances are sufficiently equal.
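Levene's test is available in scipy.stats. A sketch on simulated division scores (illustrative data, not NorthStar's actual survey results):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulate 25 scores per division around the reported means, with a common spread
groups = [rng.normal(loc=m, scale=8, size=25) for m in (72, 65, 68, 78)]

stat, p = stats.levene(*groups)
# p > 0.05: no evidence the variances differ, so the assumption looks reasonable
print(f"Levene W = {stat:.3f}, p = {p:.3f}")
```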
When all groups have the same sample size (balanced design), ANOVA is quite robust to moderate violations of normality and homogeneity of variance. This is one reason researchers often strive for equal group sizes in their studies. NorthStar's HR team sampled 25 employees from each division, creating a balanced design that provides extra protection against assumption violations.
ANOVA solves the multiple-comparisons problem by testing all group means in a single test, keeping the Type I error rate at the intended alpha level. It works by comparing between-group variation to within-group variation. Before interpreting ANOVA results, always verify the three assumptions: independence, normality, and homogeneity of variance. Balanced designs (equal n per group) provide extra robustness.
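Putting the pieces together, a one-way ANOVA can be run in a single call with scipy's f_oneway. The data below are simulated around the reported division means for illustration; the chapter does not provide the individual scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated satisfaction scores centered on the observed division means, n = 25 each
retail = rng.normal(72, 8, 25)
manufacturing = rng.normal(65, 8, 25)
logistics = rng.normal(68, 8, 25)
corporate = rng.normal(78, 8, 25)

# Single F-test of H0: all four division means are equal
f_stat, p = stats.f_oneway(retail, manufacturing, logistics, corporate)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
```

A small p-value would lead NorthStar to reject H0, but it would not say which divisions differ; that is the job of the post-hoc tests covered in Chapter 3.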
This chapter laid the groundwork for ANOVA by reviewing two-sample tests and explaining why they fail when comparing more than two groups. Here is what you should take away:
Two-Sample Test Limitation: Running multiple pairwise t-tests inflates the familywise error rate well beyond the nominal alpha level. With just 5 groups, the error rate approaches 40%.
ANOVA: Analysis of Variance tests H0: all population means are equal versus Ha: at least one differs. It compares between-group variation to within-group variation in a single F-test.
Assumptions: Independence, normality, and homogeneity of variance. Balanced designs provide robustness against moderate violations.