Chapter 2

One-Way ANOVA

2.1 The ANOVA Table

The one-way ANOVA procedure organizes all its calculations into a structured ANOVA table. This table decomposes the total variation in the data into between-group and within-group components, then uses their ratio to test whether the group means differ.

Total Sum of Squares (SST)

SST measures the total variation in the dataset: how far the individual observations deviate from the grand mean ($\bar{x}_{\text{grand}}$). It is the starting point, because the between-group and within-group sums of squares are the two pieces that SST splits into.

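Total Sum of Squares
$$SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_{\text{grand}}\right)^2$$
📊 Excel: =DEVSQ(all_data)
where $x_{ij}$ is observation $i$ in group $j$, $k$ is the number of groups, $n_j$ is the sample size in group $j$, and $\bar{x}_{\text{grand}}$ is the grand mean of all observations.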

Between-Group Sum of Squares (SSB)

SSB measures how much the group means differ from the grand mean. If all group means were identical, SSB would be zero. An SSB that is large relative to the within-group variation suggests the groups have genuinely different population means.

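Between-Group Sum of Squares
$$SSB = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}_{\text{grand}}\right)^2$$
📊 Excel: Use Data Analysis Toolpak > ANOVA: Single Factor
where $n_j$ is the sample size in group $j$, $\bar{x}_j$ is group $j$'s mean, and $\bar{x}_{\text{grand}}$ is the overall grand mean.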

Within-Group Sum of Squares (SSW)

SSW (also called SSE, Sum of Squares Error) measures variation within each group. It captures the natural randomness — individual differences that exist even among members of the same group. SSW = SST − SSB.

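Within-Group Sum of Squares
$$SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2$$
📊 Excel: SSW = SST − SSB (from Toolpak output)
where the inner sum adds up the squared deviations of each observation from its own group mean $\bar{x}_j$.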
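
The three sums of squares can be checked numerically. The following is a minimal Python sketch, assuming a small made-up dataset (the group labels and scores are hypothetical and are not the NorthStar data). It computes each quantity directly from its definition and prints the values so the identity SST = SSB + SSW can be verified.

```python
from statistics import mean

# Hypothetical scores for three small groups (illustrative only).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [1.0, 2.0, 3.0],
}


def sums_of_squares(groups):
    """Return (SST, SSB, SSW) computed directly from the definitions."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)

    # SST: every observation against the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    # SSB: each group mean against the grand mean, weighted by group size.
    ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

    # SSW: every observation against its own group mean.
    ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return sst, ssb, ssw


sst, ssb, ssw = sums_of_squares(groups)
print(f"SST = {sst:.2f}, SSB = {ssb:.2f}, SSW = {ssw:.2f}")
print(f"SSB + SSW = {ssb + ssw:.2f}")
```

Computing SSW from its definition rather than as SST − SSB gives an independent check on the decomposition.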

Mean Squares and the F-Statistic

To make SSB and SSW comparable, we divide each by its degrees of freedom to obtain a mean square: MSB for between groups and MSW for within groups. The ratio of MSB to MSW gives the F-statistic.

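F-Statistic
$$F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(N-k)}$$
📊 Excel: use Data Analysis Toolpak > ANOVA: Single Factor. (=F.TEST(range1, range2) is not suitable here: it returns the p-value of a two-sample comparison of variances, not the ANOVA F-statistic.)
where k is the number of groups and N is the total number of observations. MSB = SSB/(k−1), MSW = SSW/(N−k).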
✎ Worked Example: ANOVA for 4 Divisions (n=10 each)
Step 1: NorthStar sampled n = 10 employees from each of k = 4 divisions, so N = 40 in total.
Group mean satisfaction scores: Retail = 72, Manufacturing = 65, Logistics = 68, Corp Services = 78
Grand mean: (72 + 65 + 68 + 78)/4 = 70.75 (a simple average of the group means is valid here because every group has the same n)
Step 2: Compute SSB (between groups):
SSB = 10[(72−70.75)² + (65−70.75)² + (68−70.75)² + (78−70.75)²]
SSB = 10[1.5625 + 33.0625 + 7.5625 + 52.5625] = 10(94.75) = 947.50
Step 3: Assume SSW = 1,972.00 (from individual deviations within each group).
SST = SSB + SSW = 947.50 + 1,972.00 = 2,919.50
Step 4: Compute mean squares and F:
MSB = 947.50 / (4−1) = 947.50 / 3 = 315.83
MSW = 1,972.00 / (40−4) = 1,972.00 / 36 = 54.78
F = 315.83 / 54.78 = 5.77
Step 5: Result: F = 5.77 with df1 = 3 and df2 = 36. We will evaluate this in Section 2.2.
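
The arithmetic in this worked example can be reproduced from the summary figures alone. The sketch below is one possible Python layout, not the book's own tool: the function name `anova_table` and its dictionary keys are illustrative choices, and SSW is taken as given (1,972.00) exactly as in Step 3. The grand mean is weighted by group size, which reduces to the simple average when all groups have the same n.

```python
def anova_table(group_means, group_sizes, ssw):
    """Build one-way ANOVA table entries from group summaries and a given SSW."""
    k = len(group_means)
    n_total = sum(group_sizes)

    # Grand mean weighted by group size (a simple average when sizes are equal).
    grand_mean = sum(n * m for n, m in zip(group_sizes, group_means)) / n_total

    # Between-group sum of squares from the group means.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(group_sizes, group_means))

    df_between, df_within = k - 1, n_total - k
    msb, msw = ssb / df_between, ssw / df_within
    return {
        "grand_mean": grand_mean,
        "SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
        "df_between": df_between, "df_within": df_within,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


# Retail, Manufacturing, Logistics, Corp Services; n = 10 each; SSW as assumed in Step 3.
t = anova_table([72, 65, 68, 78], [10, 10, 10, 10], ssw=1972.0)

print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{t['SSB']:>10.2f}{t['df_between']:>5}{t['MSB']:>10.2f}{t['F']:>8.2f}")
print(f"{'Within':<10}{t['SSW']:>10.2f}{t['df_within']:>5}{t['MSW']:>10.2f}")
print(f"{'Total':<10}{t['SST']:>10.2f}{t['df_between'] + t['df_within']:>5}")
```

The printed rows follow the usual Source / SS / df / MS / F layout of an ANOVA table, with the total degrees of freedom equal to N − 1.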
✓ Check Your Understanding
An ANOVA compares 4 groups with n = 10 observations each. What are the degrees of freedom for the between-group and within-group components?
  • df1 = 3 and df2 = 36
  • df1 = 4 and df2 = 40
  • df1 = 3 and df2 = 40
  • df1 = 4 and df2 = 36

2.2 The F-Test and Decision

When H0 is true (all population means are equal), the F-statistic follows an F-distribution with df1 = k − 1 and df2 = N − k. This distribution is right-skewed and takes only non-negative values. Unlike the t-test, which can be one- or two-tailed, the ANOVA F-test is always right-tailed: we reject H0 only when the F-statistic is unusually large, indicating that between-group variation greatly exceeds within-group variation.

Decision Rule

Compare the computed F to the critical value from the F-distribution table (or compute a p-value):

  • If F > Fcritical (or p-value < α), reject H0. Conclude at least one mean differs.
  • If F ≤ Fcritical (or p-value ≥ α), fail to reject H0. Insufficient evidence that the means differ.

[Figure: F-distribution (df1 = 3, df2 = 36) with rejection region]

✎ Worked Example: Making the Decision
Step 1: From the previous section: F = 5.77, df1 = 3, df2 = 36, α = 0.05.
Step 2: F-critical for α = 0.05 with df1 = 3, df2 = 36:
Fcritical = 2.866 (Excel: =F.INV.RT(0.05, 3, 36))
Step 3: Compare: F = 5.77 > Fcritical = 2.866.
p-value = F.DIST.RT(5.77, 3, 36) ≈ 0.0025
Step 4: Decision: Reject H0. There is statistically significant evidence that at least one division's mean satisfaction differs from the others (F(3,36) = 5.77, p = 0.0025).
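
For readers working outside Excel, the critical value and p-value in this example can be approximated with the Python standard library alone. The sketch below is an illustration under stated assumptions, not a reference implementation: the helper names (`regularized_beta`, `f_right_tail`, `f_critical`) are made up for this example, the right-tail area is obtained from the regularized incomplete beta function via a continued fraction, and the critical value is found by simple bisection. If SciPy is installed, `scipy.stats.f.sf` and `scipy.stats.f.ppf` are the usual way to get the same two numbers.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each iteration applies the even-numbered and then the odd-numbered term.
        for numer in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + numer * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + numer / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last update changed h by a negligible factor
            break
    return h


def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f_stat, df1, df2):
    """Right-tail area P(F >= f_stat) for an F(df1, df2) distribution."""
    if f_stat <= 0:
        return 1.0
    return regularized_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


def f_critical(alpha, df1, df2, upper=1e6):
    """Value whose right-tail area equals alpha, located by bisection."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_right_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = 0.05
p_value = f_right_tail(5.77, 3, 36)
f_crit = f_critical(alpha, 3, 36)
print(f"F critical = {f_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing F to the critical value and comparing the p-value to α are two views of the same right-tail area, so they always lead to the same decision.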
🏪 NorthStar Enterprises

The ANOVA result tells NorthStar's HR team that employee satisfaction is not equal across all four divisions. However, it does not reveal which divisions differ. Corporate Services (mean 78) appears highest and Manufacturing (mean 65) lowest, but post-hoc tests (Chapter 3) are needed to confirm which specific pairwise differences are statistically significant.

→ Use the Tukey Q Calculator to find the critical value for your post-hoc comparisons.

✓ Check Your Understanding
In an ANOVA, the computed F-statistic is 1.2 and the critical value at α = 0.05 is Fcritical = 2.87. What is the correct decision?
  • Reject H0: at least one mean differs
  • Fail to reject H0: insufficient evidence that the means differ
  • Accept H0: all means are equal
  • Inconclusive: more data needed
💡 Key Takeaway

The F-statistic is the ratio of between-group mean square to within-group mean square. A large F means the group means are more spread out than you would expect from random variation alone. The F-test is always right-tailed: only large F values lead to rejection. Always report F, degrees of freedom, and the p-value together.

Chapter Summary

This chapter walked through the mechanics of one-way ANOVA, from constructing the ANOVA table to making a decision with the F-test.

💡 Chapter 2 Summary

ANOVA Table: Decomposes total variation (SST) into between-group (SSB) and within-group (SSW) components. Mean squares are obtained by dividing by degrees of freedom.

F-Statistic: F = MSB / MSW. Large values indicate group means differ more than expected by chance.

Decision: Compare F to the critical value or use the p-value. Reject H0 when F exceeds the critical value (equivalently, when the p-value is below α).

📋 Chapter 2 — Formula Reference
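Measure | Formula | Excel Function
SST | $\sum_j \sum_i (x_{ij} - \bar{x}_{\text{grand}})^2$ | =DEVSQ(all_data)
SSB | $\sum_j n_j (\bar{x}_j - \bar{x}_{\text{grand}})^2$ | Data Analysis Toolpak
SSW | $\sum_j \sum_i (x_{ij} - \bar{x}_j)^2$ | SST − SSB
MSB | $SSB/(k-1)$ | Data Analysis Toolpak
MSW | $SSW/(N-k)$ | Data Analysis Toolpak
F-Statistic | $MSB/MSW$ | Data Analysis Toolpak (not =F.TEST(), which compares two variances)
p-value | $P(F_{k-1,\,N-k} \ge F)$ | =F.DIST.RT(F,df1,df2)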