Chapter 1

Hypothesis Testing with Z-Tests

1.1 The Logic of Hypothesis Testing

In descriptive statistics, we summarize data. In inferential statistics, we use sample data to draw conclusions about a larger population. The most structured way to do this is through hypothesis testing — a formal procedure for deciding whether sample evidence supports or contradicts a claim about a population parameter.

Every hypothesis test begins with two competing statements:

  • Null hypothesis (H0): A statement of “no effect” or “no difference.” It represents the status quo — the claim we assume to be true unless strong evidence contradicts it.
  • Alternative hypothesis (Ha): The claim we are trying to find evidence for. It contradicts the null hypothesis.

The burden of proof lies with Ha. Just as a defendant is presumed innocent until proven guilty, H0 is presumed true until the data provide convincing evidence against it. We never “prove” H0 — we either reject it or fail to reject it.

Type I and Type II Errors

Because we make decisions based on samples (not the entire population), we can make mistakes:

  • Type I error (α): Rejecting H0 when it is actually true. This is a “false positive.” The probability of a Type I error is the significance level α, typically set at 0.05.
  • Type II error (β): Failing to reject H0 when it is actually false. This is a “false negative.”

For a fixed sample size, lowering α reduces the chance of a Type I error but increases the chance of a Type II error. Analysts must balance these risks based on the business context.
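The trade-off between the two error types can be made concrete with a short calculation. The sketch below is illustrative only: it assumes a two-tailed Z-test, and the true mean of 10.05 mm, σ = 0.20 mm, and n = 50 are hypothetical values chosen for the demonstration. The function name type_ii_error is likewise an assumption, not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """Probability of failing to reject H0 in a two-tailed Z-test
    when the true population mean is mu_true (not mu0)."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean sits from the hypothesized mean
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Beta is the chance that Z lands between the two critical values
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Hypothetical scenario: the process has really drifted to 10.05 mm
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=10.00, mu_true=10.05, sigma=0.20, n=50)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

With the sample size held fixed, each smaller α in the loop produces a larger β, which is the trade-off described above.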

🏭 GreatLakes Manufacturing

GreatLakes Manufacturing is a mid-size auto parts manufacturer in Wisconsin. Their flagship product is a precision-machined shaft with a specification diameter of 10.00 mm. Quality control regularly samples finished shafts to ensure the production process is on target.

An engineer suspects the process has shifted. She sets up a hypothesis test:

H0: μ = 10.00 mm (the process is on target)
Ha: μ ≠ 10.00 mm (the process has shifted)

This is a two-tailed test because the engineer is concerned about a shift in either direction — the diameter could be too large or too small.

💡 Key Takeaway

We never “prove” the null hypothesis. We either reject it (finding sufficient evidence against it) or fail to reject it (insufficient evidence to overturn it). The absence of evidence is not evidence of absence.

✓ Check Your Understanding

A Type I error is:

  • Failing to reject a false H0
  • Rejecting a true H0
  • Accepting H0
  • Using the wrong test

1.2 The Z-Test for a Population Mean

The Z-test is used to test a hypothesis about a population mean when two conditions are met: (1) the population standard deviation σ is known, and (2) the sample size is n ≥ 30 (or the population is normally distributed). Under these conditions, the sampling distribution of the sample mean x̄ is approximately normal, and we can use the standard normal (Z) distribution to evaluate our test statistic.

The Z-Test Statistic

The test statistic measures how many standard errors the sample mean is away from the hypothesized population mean. A large absolute value of Z indicates that the sample mean is far from what we would expect if H0 were true.

Z-Test Statistic

Z = (x̄ − μ0) / (σ / √n)

📊 Excel: =(xbar-mu0)/(sigma/SQRT(n))

where x̄ is the sample mean, μ0 is the hypothesized population mean, σ is the known population standard deviation, and n is the sample size.

You can also use Excel's STANDARDIZE function: =STANDARDIZE(x̄, μ0, σ/SQRT(n)), which computes the same result.
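For readers working outside Excel, the same statistic can be sketched in Python. The function name z_statistic and the input values are hypothetical, chosen only so the arithmetic is easy to check by hand (the standard error is 8 / √64 = 1).

```python
from math import sqrt


def z_statistic(xbar, mu0, sigma, n):
    """Number of standard errors between the sample mean and mu0."""
    standard_error = sigma / sqrt(n)
    return (xbar - mu0) / standard_error


# Hypothetical inputs: SE = 8 / sqrt(64) = 1, so Z = (52 - 50) / 1
z = z_statistic(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
print(z)  # 2.0
```

Like STANDARDIZE, the function subtracts the hypothesized mean and then divides by the standard error.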

✎ Worked Example: Two-Tailed Z-Test
Step 1. Setup: GreatLakes samples n = 50 shafts. The sample mean is x̄ = 10.08 mm. Historical data shows σ = 0.20 mm. Test at α = 0.05 (two-tailed).
H0: μ = 10.00   Ha: μ ≠ 10.00

Step 2. Compute the standard error:
SE = σ / √n = 0.20 / √50 = 0.20 / 7.071 = 0.02828

Step 3. Compute the Z statistic:
Z = (10.08 − 10.00) / 0.02828 = 0.08 / 0.02828 = 2.83

Step 4. Compare to critical values. For α = 0.05 two-tailed, critical Z = ±1.96.
|Z| = 2.83 > 1.96 ⇒ Reject H0

Step 5. Result: At the 5% significance level, there is sufficient evidence that the mean shaft diameter has shifted from 10.00 mm. GreatLakes should investigate the production process.
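The worked example can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The variable names are illustrative, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

# Values from the GreatLakes worked example
xbar, mu0, sigma, n, alpha = 10.08, 10.00, 0.20, 50, 0.05

standard_error = sigma / sqrt(n)
z = (xbar - mu0) / standard_error

# Two-tailed test: alpha is split evenly between the two tails
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit

print(f"SE = {standard_error:.5f}, Z = {z:.2f}, critical Z = ±{z_crit:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```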

[Figure: Standard Normal Distribution, Two-Tailed Test (α = 0.05)]

✓ Check Your Understanding

For the worked example above (Z = 2.83, two-tailed), the p-value is approximately:

  • 0.05
  • 0.023
  • 0.0046
  • 0.10

1.3 One-Tailed vs Two-Tailed Tests

A two-tailed test checks for a difference in either direction (Ha: μ ≠ μ0). A one-tailed test checks for a difference in a specific direction:

  • Right-tailed: Ha: μ > μ0 (the parameter is greater than claimed)
  • Left-tailed: Ha: μ < μ0 (the parameter is less than claimed)

The choice between one-tailed and two-tailed depends on the research question and should be made before looking at the data. If you only care about one direction of departure, a one-tailed test is more powerful in that direction because all of α is concentrated in one tail.

Effect on Critical Values

For α = 0.05:

  • Two-tailed: Critical values are ±1.96 (splitting α/2 = 0.025 in each tail)
  • One-tailed: Critical value is 1.645 (right-tail) or −1.645 (left-tail), placing all of α in one tail
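These critical values come from the inverse of the standard normal cumulative distribution. A small sketch follows. The helper name critical_z and its tails argument are assumptions made for illustration.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Positive critical value for a Z-test with 1 or 2 tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two-tailed tests put alpha/2 in each tail; one-tailed tests put all of alpha in one
    return NormalDist().inv_cdf(1 - alpha / tails)


print(round(critical_z(0.05, tails=2), 3))  # 1.96
print(round(critical_z(0.05, tails=1), 3))  # 1.645
```

For a left-tailed test, use the negative of the one-tailed value.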

Worked Example: One-Tailed Z-Test

Using the same GreatLakes data (Z = 2.83), suppose the engineer specifically tests whether the diameter has increased:

  • H0: μ = 10.00   Ha: μ > 10.00 (right-tailed)
  • Critical value at α = 0.05: Z = 1.645
  • Since 2.83 > 1.645, reject H0
  • The one-tailed p-value is approximately 0.0023 (half the two-tailed p-value)
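The right-tailed decision and the relationship between the two p-values can be verified with the following sketch. It starts from the rounded statistic Z = 2.83 used in the example, and the variable names are illustrative.

```python
from statistics import NormalDist

z, alpha = 2.83, 0.05
std_normal = NormalDist()

# Right-tailed test: all of alpha sits in the upper tail
z_crit_right = std_normal.inv_cdf(1 - alpha)
p_right = 1 - std_normal.cdf(z)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(f"critical Z = {z_crit_right:.3f}, reject H0: {z > z_crit_right}")
print(f"one-tailed p = {p_right:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

The halving holds because the observed Z falls in the tail named by Ha. If Z had been negative, the right-tailed p-value would be greater than 0.5, not half the two-tailed value.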
✓ Check Your Understanding

A manager wants to test whether a new supplier has a lower defect rate than the current 5%. The alternative hypothesis Ha is:

  • p ≠ 0.05
  • p > 0.05
  • p < 0.05
  • p = 0.05

1.4 The P-Value Approach

The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one calculated, assuming H0 is true. It quantifies the strength of evidence against H0:

  • If p-value ≤ α, reject H0
  • If p-value > α, fail to reject H0

Common Misinterpretations

The p-value is frequently misunderstood. It is not:

  • The probability that H0 is true
  • The probability that Ha is true
  • A measure of the size of the effect

A small p-value means the observed data would be unlikely if H0 were true. It does not tell us the probability that any hypothesis is true or false — it tells us about the data, given an assumption.
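A simulation can illustrate what "given an assumption" means. In the sketch below, H0 is true by construction: every sample is drawn from a normal population with mean 10.00. The population values reuse the GreatLakes numbers. The seed and the number of trials are arbitrary choices, and the normal population is an assumption of the simulation.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # arbitrary seed so the run is repeatable

mu0, sigma, n, alpha = 10.00, 0.20, 50, 0.05
std_normal = NormalDist()
trials = 5000
rejections = 0

for _ in range(trials):
    # Draw a sample from a population where H0 really is true
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    if p_value <= alpha:
        rejections += 1

rejection_rate = rejections / trials
print(f"Share of samples with p-value <= {alpha}: {rejection_rate:.3f}")
```

The share should land close to α. Small p-values still occur about 5% of the time when H0 is true, so a single small p-value describes how unusual the data are under H0, not the probability that H0 is true.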

Calculating P-Values in Excel

For a two-tailed Z-test:

  • =2*(1-NORM.S.DIST(ABS(Z),TRUE))

For a one-tailed test (right tail):

  • =1-NORM.S.DIST(Z,TRUE)
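The same calculations can be sketched in Python, where NormalDist().cdf plays the role of NORM.S.DIST(Z,TRUE). The function name z_test_p_value and its tail labels are assumptions for illustration. The left-tail case is not listed above and is included for completeness.

```python
from statistics import NormalDist


def z_test_p_value(z, tail="two"):
    """P-value for a Z statistic; tail is 'two', 'right', or 'left'."""
    std_normal = NormalDist()
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")


# Z = 2.83 from the GreatLakes worked example
print(f"two-tailed:   {z_test_p_value(2.83, 'two'):.4f}")
print(f"right-tailed: {z_test_p_value(2.83, 'right'):.4f}")
```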
✓ Check Your Understanding

A p-value of 0.03 with α = 0.05 means:

  • There is a 3% chance H0 is true
  • Reject H0 at the 5% significance level
  • The effect is large
  • Ha is proven

Chapter Summary

In this chapter, we established the logical framework of hypothesis testing and applied it using the Z-test for a population mean. Here is what you should take away:

💡 Chapter 1 Summary

Hypothesis Testing Logic: H0 represents the status quo; Ha is the claim we seek evidence for. We never prove H0 — we reject it or fail to reject it based on sample evidence.

Type I and II Errors: Type I (α) is rejecting a true H0; Type II (β) is failing to reject a false H0. The significance level α controls the Type I error rate.

Z-Test: Used when σ is known and either n ≥ 30 or the population is normally distributed. The test statistic measures how many standard errors the sample mean is from the hypothesized mean.

One vs Two-Tailed: Use two-tailed when testing for any difference; use one-tailed when the direction of departure is specified in advance.

P-Value: The probability of observing a test statistic at least as extreme as ours, assuming H0 is true. Reject H0 when p-value ≤ α.

📋 Chapter 1 — Formula Reference
Measure                            Formula                    Excel Function
Z-Test Statistic                   Z = (x̄ − μ0) / (σ / √n)    =(xbar-mu0)/(sigma/SQRT(n))
Standard Error                     SE = σ / √n                =sigma/SQRT(n)
P-Value (two-tailed)               2 × P(Z ≥ |z|)             =2*(1-NORM.S.DIST(ABS(Z),TRUE))
P-Value (one-tailed, right tail)   P(Z ≥ z)                   =1-NORM.S.DIST(Z,TRUE)
Critical Z (two-tailed)            ±z_{α/2}                   =NORM.S.INV(1-alpha/2)