Chapter 7: Chi-Square Tests

7.1 Chi-Square Goodness of Fit

All the tests we have covered so far — Z-tests, t-tests, ANOVA — deal with continuous, numerical data. But many business questions involve categorical data: defect types, customer segments, satisfaction levels. The chi-square test is designed for exactly these situations.

The goodness-of-fit test asks whether the observed frequency distribution of a single categorical variable matches an expected (hypothesized) distribution. For example, if defects should be equally distributed across four categories, do the actual counts support that assumption?

Hypotheses

H₀: The population follows the hypothesized distribution, so observed and expected frequencies differ only by chance
H_a: The population does not follow the hypothesized distribution

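Chi-Square Test Statistic

χ² = Σ (O_i − E_i)² / E_i

📊 Excel: =SUMPRODUCT((actual_range - expected_range)^2 / expected_range) returns the test statistic; =CHISQ.TEST(actual_range, expected_range) returns the p-value, not the statistic

where O_i is the observed count for category i, E_i is the expected count, and the sum is over all k categories. For goodness of fit, df = k − 1.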

🏭 GreatLakes Manufacturing

GreatLakes tracks defects in four categories: dimensional, surface finish, material, and assembly. Management expects defects to be equally distributed across all four categories (25% each). In a recent audit of 160 defects, the observed counts were: 45 dimensional, 32 surface finish, 58 material, and 25 assembly. Does this distribution differ significantly from the expected equal split?

✎ Worked Example: Chi-Square Goodness of Fit

Setup: 160 total defects across 4 categories. Expected: 40 per category (160 / 4). Test at α = 0.05.
H₀: Defects are equally distributed
H_a: Defects are not equally distributed

Organize the observed and expected counts:

Category	Observed (O)	Expected (E)	(O − E)² / E
Dimensional	45	40	0.625
Surface Finish	32	40	1.600
Material	58	40	8.100
Assembly	25	40	5.625

Sum the contributions to get the test statistic:
χ² = 0.625 + 1.600 + 8.100 + 5.625 = 15.95

Degrees of freedom: df = k − 1 = 4 − 1 = 3. The critical value at α = 0.05 with df = 3 is 7.815.
χ² = 15.95 > 7.815 ⇒ Reject H₀

Result: At the 5% significance level, there is strong evidence that defects are not equally distributed across the four categories. Material defects appear disproportionately high, while assembly defects are lower than expected. Management should investigate the root cause of material defects.
Excel: =CHISQ.TEST(O_range, E_range) returns p ≈ 0.0012
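
The table and test statistic above can also be reproduced outside Excel. The following is a minimal Python sketch using only the standard library. The function name chi_square_gof is illustrative and not part of the course materials, and the p-value line uses a closed form that holds only for df = 3, so it is not a general-purpose routine.

```python
import math

def chi_square_gof(observed, expected):
    """Return the per-category contributions and the chi-square statistic."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions)

# GreatLakes defect audit: dimensional, surface finish, material, assembly
observed = [45, 32, 58, 25]
expected = [sum(observed) / len(observed)] * len(observed)  # 40 per category

contributions, chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1

# Right-tail p-value. This closed form is valid only when df = 3.
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

print(contributions)         # per-category (O - E)^2 / E
print(round(chi2, 2), df)    # test statistic and degrees of freedom
print(round(p_value, 4))     # compare with alpha = 0.05
```

Because the p-value falls well below 0.05, the sketch leads to the same decision as the critical-value comparison: reject H₀.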

✓ Check Your Understanding

A chi-square goodness-of-fit test has 5 categories. What are the degrees of freedom?

A) 5

B) 4

C) n − 1

D) 10

7.2 Chi-Square Test of Independence

While the goodness-of-fit test examines one categorical variable, the test of independence examines the relationship between two categorical variables. The data are organized in a contingency table (also called a cross-tabulation), where rows represent one variable and columns represent the other.

Hypotheses

H₀: The two variables are independent (no association)
H_a: The two variables are not independent (there is an association)

Computing Expected Frequencies

Under the assumption of independence, the expected count for each cell is:

Expected Frequency for Independence Test

E_ij = (row i total × column j total) / grand total

📊 Excel: =(row_total * col_total) / grand_total

The degrees of freedom for a contingency table are df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.

🏭 GreatLakes Manufacturing

GreatLakes surveys 200 employees about job satisfaction across three shifts. Management wants to know whether satisfaction level is related to shift assignment, or whether the distributions are similar across shifts.

✎ Worked Example: Chi-Square Test of Independence

Setup: Survey of 200 employees. Rows: satisfaction (Satisfied, Neutral, Dissatisfied). Columns: shift (Day, Evening, Night). Test at α = 0.05.
H₀: Satisfaction and shift are independent
H_a: Satisfaction and shift are not independent

Observed contingency table:

	Day	Evening	Night	Row Total
Satisfied	50	30	10	90
Neutral	20	25	15	60
Dissatisfied	10	15	25	50
Col Total	80	70	50	200

Compute expected frequencies. For example, E(Satisfied, Day) = (90 × 80) / 200 = 36.0:

	Day	Evening	Night
Satisfied	36.0	31.5	22.5
Neutral	24.0	21.0	15.0
Dissatisfied	20.0	17.5	12.5
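
The expected table above follows mechanically from the row and column totals. As a rough sketch in Python (standard library only), with expected_counts as a hypothetical helper name rather than anything defined by the course:

```python
def expected_counts(observed):
    """Expected cell counts under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: Satisfied, Neutral, Dissatisfied; columns: Day, Evening, Night
observed = [
    [50, 30, 10],
    [20, 25, 15],
    [10, 15, 25],
]

expected = expected_counts(observed)
for row in expected:
    print(row)

# Degrees of freedom for the contingency table: (r - 1)(c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(df)
```

Each row of the printed result should match the corresponding row of the expected table.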

Compute each cell’s contribution (O − E)² / E and sum them:
χ² = (50−36)²/36 + (30−31.5)²/31.5 + (10−22.5)²/22.5
+ (20−24)²/24 + (25−21)²/21 + (15−15)²/15
+ (10−20)²/20 + (15−17.5)²/17.5 + (25−12.5)²/12.5
= 5.444 + 0.071 + 6.944 + 0.667 + 0.762 + 0
+ 5.000 + 0.357 + 12.500 = 31.745

Degrees of freedom: df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4. The critical value at α = 0.05 with df = 4 is 9.488.
χ² = 31.745 > 9.488 ⇒ Reject H₀

Result: At the 5% significance level, there is strong evidence that employee satisfaction and shift are not independent. Day-shift workers show higher satisfaction, while night-shift workers show higher dissatisfaction. GreatLakes should investigate working conditions on the night shift.
Excel: =CHISQ.TEST(observed_range, expected_range) returns p < 0.001
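
For readers who want to check the whole calculation in one pass, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the course's method: the p-value line uses a closed form that is valid only for df = 4. Note that summing unrounded contributions gives about 31.746, while summing the three-decimal values shown above gives 31.745; the difference is rounding only.

```python
import math

# Rows: Satisfied, Neutral, Dissatisfied; columns: Day, Evening, Night
observed = [
    [50, 30, 10],
    [20, 25, 15],
    [10, 15, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all nine cells, with E = row total * column total / n
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Right-tail p-value. This closed form is valid only when df = 4.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(round(chi2, 3), df)   # test statistic and degrees of freedom
print(p_value < 0.001)      # is the p-value below 0.001?
```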

✓ Check Your Understanding

A contingency table has 3 rows and 4 columns. What are the degrees of freedom for a chi-square test of independence?

A) 12

B) 6

C) 7

D) 11

7.3 Assumptions and Limitations

The chi-square test is widely applicable, but it does have important requirements:

Random sampling: The data must come from a random or representative sample.
Independence of observations: Each observation contributes to only one cell in the table.
Minimum expected frequency: All expected cell counts should be at least 5. When expected counts fall below 5, the chi-square approximation becomes unreliable. In such cases, consider combining categories or using Fisher’s exact test.
Use counts, not percentages: The chi-square formula requires raw frequency counts. Never plug in percentages or proportions — the test statistic will be incorrect.

When Chi-Square Fails

If your contingency table is 2×2 and any expected count is less than 5, use Fisher’s exact test instead. For larger tables with some small expected counts, you can often combine similar categories to increase expected frequencies above the threshold of 5.
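
The minimum-expected-frequency check is easy to automate before running the test. The sketch below is one possible way to do it in Python; the helper names expected_counts and small_expected_cells are hypothetical, and the 2×2 table is made-up data used only to show what a flagged cell looks like.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def small_expected_cells(observed, threshold=5):
    """Return (row index, column index, expected count) for every cell
    whose expected count is below the threshold."""
    expected = expected_counts(observed)
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < threshold]

# GreatLakes satisfaction-by-shift table from Section 7.2
greatlakes = [[50, 30, 10], [20, 25, 15], [10, 15, 25]]
print(small_expected_cells(greatlakes))    # empty list: every expected count is at least 5

# Hypothetical 2x2 table (illustration only, not course data)
hypothetical = [[3, 7], [6, 4]]
print(small_expected_cells(hypothetical))  # first-column cells have expected count 4.5
```

An empty list means the chi-square approximation is reasonable on this criterion; any flagged cell is a signal to combine categories or, for a 2×2 table, to switch to Fisher's exact test.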

💡 Key Takeaway

Chi-square tests work with counts, not percentages. Always verify that all expected cell frequencies are at least 5 before interpreting the results. The chi-square test tells you whether an association exists, but not how strong it is. To gauge strength, compute Cramér’s V; to see which cells drive the association, examine the standardized residuals.
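
As a brief illustration of these follow-up measures, the sketch below computes Cramér’s V and the Pearson residuals for the satisfaction-by-shift table from Section 7.2. It assumes the usual definition V = √(χ² / (n × min(r − 1, c − 1))) and uses Pearson residuals (O − E) / √E, which are one common form of standardized residual; other textbooks apply a further adjustment. Variable names are illustrative.

```python
import math

# Rows: Satisfied, Neutral, Dissatisfied; columns: Day, Evening, Night
observed = [
    [50, 30, 10],
    [20, 25, 15],
    [10, 15, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Cramér's V: strength of association on a 0-to-1 scale
k = min(len(row_totals), len(col_totals)) - 1
cramers_v = math.sqrt(chi2 / (n * k))

# Pearson residuals: sign and size show which cells depart most from independence
pearson_residuals = [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
                     for obs_row, exp_row in zip(observed, expected)]

print(round(cramers_v, 3))
for row in pearson_residuals:
    print([round(x, 2) for x in row])
```

The largest positive residual belongs to the Dissatisfied/Night cell and the largest negative one to Satisfied/Night, which agrees with the interpretation given in the worked example.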

Chapter Summary

In this final chapter of STATS200, we covered the chi-square family of tests for categorical data. Here is what you should take away:

💡 Chapter 7 Summary

Goodness of Fit: Tests whether the observed distribution of a single categorical variable matches an expected distribution. Uses df = k − 1.

Test of Independence: Tests whether two categorical variables are related using a contingency table. Expected counts are computed as (row total × column total) / grand total. Uses df = (r − 1)(c − 1).

Assumptions: Random sample, independent observations, all expected counts ≥ 5. Always use raw counts, never percentages.

Excel: Use =CHISQ.TEST(actual_range, expected_range) to get the p-value directly. Use =CHISQ.INV.RT(alpha, df) to find the critical value.

📋 Chapter 7 — Formula Reference

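Measure	Formula	Excel Function
Chi-Square Statistic	χ² = Σ (O − E)² / E	`=SUMPRODUCT((O-E)^2/E)`
Expected Frequency	E = (row total × column total) / grand total	`=(row*col)/total`
GoF Degrees of Freedom	df = k − 1	`=categories-1`
Independence df	df = (r − 1)(c − 1)	`=(rows-1)*(cols-1)`
Critical Value	χ² value with right-tail area α	`=CHISQ.INV.RT(alpha,df)`
P-Value	P(χ² ≥ test statistic)	`=CHISQ.DIST.RT(stat,df)` or `=CHISQ.TEST(O,E)`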
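
The Critical Value and P-Value rows can be cross-checked without Excel. The following Python sketch is one way to do so with the standard library alone. The function names chi2_right_tail and chi2_critical are hypothetical; the first plays the role of CHISQ.DIST.RT using a finite-series formula that applies to whole-number degrees of freedom, and the second plays the role of CHISQ.INV.RT by bisection. Treat it as a teaching aid rather than a replacement for a statistics library.

```python
import math

def chi2_right_tail(x, df):
    """P(chi-square > x) for integer df >= 1, using finite-series closed forms."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of x^(r - 1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total

def chi2_critical(alpha, df, upper=1000.0):
    """Find x with right-tail area alpha by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid   # tail area at or below alpha: move left
    return (lo + hi) / 2

print(round(chi2_critical(0.05, 3), 3))      # compare with the critical value used in Section 7.1
print(round(chi2_critical(0.05, 4), 3))      # compare with the critical value used in Section 7.2
print(round(chi2_right_tail(15.95, 3), 4))   # goodness-of-fit p-value
```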

🎓 Course Complete!

Congratulations — you have completed all seven chapters of STATS200: Business Statistics. You now have a solid foundation in descriptive statistics, hypothesis testing, confidence intervals, regression, ANOVA, and chi-square tests. Return to the course overview to review any chapter.
