Chapter 1: Hypothesis Testing with Z-Tests

1.1 The Logic of Hypothesis Testing

In descriptive statistics, we summarize data. In inferential statistics, we use sample data to draw conclusions about a larger population. The most structured way to do this is through hypothesis testing — a formal procedure for deciding whether sample evidence supports or contradicts a claim about a population parameter.

Every hypothesis test begins with two competing statements:

Null hypothesis (H₀): A statement of “no effect” or “no difference.” It represents the status quo — the claim we assume to be true unless strong evidence contradicts it.
Alternative hypothesis (H_a): The claim we are trying to find evidence for. It contradicts the null hypothesis.

The burden of proof lies with H_a. Just as a defendant is presumed innocent until proven guilty, H₀ is presumed true until the data provide convincing evidence against it. We never “prove” H₀ — we either reject it or fail to reject it.

Type I and Type II Errors

Because we make decisions based on samples (not the entire population), we can make mistakes:

Type I error (α): Rejecting H₀ when it is actually true. This is a “false positive.” The probability of a Type I error is the significance level α, typically set at 0.05.
Type II error (β): Failing to reject H₀ when it is actually false. This is a “false negative.”

For a fixed sample size, lowering α reduces the chance of a Type I error but increases the chance of a Type II error. Analysts must balance these risks based on the business context.

🏭 GreatLakes Manufacturing

GreatLakes Manufacturing is a mid-size auto parts manufacturer in Wisconsin. Their flagship product is a precision-machined shaft with a specification diameter of 10.00 mm. Quality control regularly samples finished shafts to ensure the production process is on target.

An engineer suspects the process has shifted. She sets up a hypothesis test:

H₀: μ = 10.00 mm (the process is on target)
H_a: μ ≠ 10.00 mm (the process has shifted)

This is a two-tailed test because the engineer is concerned about a shift in either direction — the diameter could be too large or too small.

💡 Key Takeaway

We never “prove” the null hypothesis. We either reject it (finding sufficient evidence against it) or fail to reject it (insufficient evidence to overturn it). The absence of evidence is not evidence of absence.

✓ Check Your Understanding

A Type I error is:

Failing to reject a false H₀

Rejecting a true H₀

Accepting H₀

Using the wrong test

1.2 The Z-Test for a Population Mean

The Z-test is used to test a hypothesis about a population mean when two conditions are met: (1) the population standard deviation σ is known, and (2) the sample size is large (n ≥ 30 is a common rule of thumb) or the population itself is normally distributed. Under these conditions, the sampling distribution of the sample mean x̄ is approximately normal, and we can use the standard normal (Z) distribution to evaluate the test statistic.

The Z-Test Statistic

The test statistic measures how many standard errors the sample mean is away from the hypothesized population mean. A large absolute value of Z indicates that the sample mean is far from what we would expect if H₀ were true.

Z-Test Statistic

Z = (x̄ − μ₀) / (σ / √n)

📊 Excel: =(xbar-mu0)/(sigma/SQRT(n))

where x̄ is the sample mean, μ₀ is the hypothesized population mean, σ is the known population standard deviation, and n is the sample size.

You can also use Excel's STANDARDIZE function: =STANDARDIZE(xbar, mu0, sigma/SQRT(n)), which computes the same result.
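
For readers working outside Excel, the same calculation can be sketched in Python. The function name `z_statistic` and the input values below are made up for illustration and are not part of the chapter's data.

```python
from math import sqrt


def z_statistic(xbar, mu0, sigma, n):
    """Number of standard errors between the sample mean xbar and the hypothesized mean mu0."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / sqrt(n)
    return (xbar - mu0) / standard_error


# Made-up inputs: the standard error is 10 / sqrt(25) = 2, so Z = (52 - 50) / 2.
z_demo = z_statistic(xbar=52, mu0=50, sigma=10, n=25)
print(z_demo)  # 1.0
```

A negative result simply means the sample mean falls below the hypothesized mean.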

✎ Worked Example: Two-Tailed Z-Test

Setup: GreatLakes samples n = 50 shafts. The sample mean is x̄ = 10.08 mm. Historical data show σ = 0.20 mm. Test at α = 0.05 (two-tailed).
H₀: μ = 10.00 H_a: μ ≠ 10.00

Compute the standard error:
SE = σ / √n = 0.20 / √50 = 0.20 / 7.071 = 0.02828

Compute the Z statistic:
Z = (10.08 − 10.00) / 0.02828 = 0.08 / 0.02828 = 2.83

Compare to critical values. For α = 0.05 two-tailed, critical Z = ±1.96.
|Z| = 2.83 > 1.96 ⇒ Reject H₀

Result: At the 5% significance level, there is sufficient evidence that the mean shaft diameter has shifted from 10.00 mm. GreatLakes should investigate the production process.
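
The steps of this worked example can be checked with a short Python sketch. It uses only the figures stated in the setup; the variable names are assumptions, and `NormalDist` from the standard library stands in for a Z table.

```python
from math import sqrt
from statistics import NormalDist

# Values from the worked example setup.
n, xbar, mu0, sigma, alpha = 50, 10.08, 10.00, 0.20, 0.05

standard_error = sigma / sqrt(n)
z = (xbar - mu0) / standard_error
# Two-tailed test: alpha / 2 in each tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_critical

print(f"SE = {standard_error:.5f}")
print(f"Z = {z:.2f}")
print(f"critical Z = +/-{z_critical:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The printed values should match the hand calculation to the precision shown in the example.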

Figure: Standard Normal Distribution, Two-Tailed Test (α = 0.05)

✓ Check Your Understanding

For the worked example above (Z = 2.83, two-tailed), the p-value is approximately:

0.05

0.023

0.0046

0.10

1.3 One-Tailed vs Two-Tailed Tests

A two-tailed test checks for a difference in either direction (H_a: μ ≠ μ₀). A one-tailed test checks for a difference in a specific direction:

Right-tailed: H_a: μ > μ₀ (the parameter is greater than claimed)
Left-tailed: H_a: μ < μ₀ (the parameter is less than claimed)

The choice between one-tailed and two-tailed depends on the research question and should be made before looking at the data. If only one direction of departure matters, a one-tailed test is more powerful in that direction because all of α is concentrated in one tail.

Effect on Critical Values

For α = 0.05:

Two-tailed: Critical values are ±1.96 (splitting α/2 = 0.025 in each tail)
One-tailed: Critical value is 1.645 (right-tail) or −1.645 (left-tail), placing all of α in one tail
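
These critical values come from the inverse of the standard normal cumulative distribution. A minimal Python sketch follows; the helper name `critical_values` and its `tail` labels are assumptions made for illustration.

```python
from statistics import NormalDist


def critical_values(alpha, tail="two"):
    """Return the critical Z value(s) for a test at significance level alpha."""
    std_normal = NormalDist()
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha / 2 in each tail
        return (-z, z)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)  # all of alpha in the upper tail
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)  # all of alpha in the lower tail
    raise ValueError("tail must be 'two', 'right', or 'left'")


for tail in ("two", "right", "left"):
    rounded = [round(z, 3) for z in critical_values(0.05, tail)]
    print(f"{tail:>5}-tailed: {rounded}")
```

Changing `alpha` shows how the cutoffs move outward as the significance level becomes stricter.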

Worked Example: One-Tailed Z-Test

Using the same GreatLakes data (Z = 2.83), suppose the engineer specifically tests whether the diameter has increased:

H₀: μ = 10.00 H_a: μ > 10.00 (right-tailed)
Critical value at α = 0.05: Z = 1.645
Since 2.83 > 1.645, reject H₀
The one-tailed p-value is approximately 0.0023 (half the two-tailed p-value)
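
The right-tailed decision and the "half the two-tailed p-value" relationship can be verified with a brief sketch. It takes Z = 2.83 as given in the text; the variable names are assumptions.

```python
from statistics import NormalDist

z = 2.83      # test statistic from the GreatLakes example
alpha = 0.05
std_normal = NormalDist()

z_critical = std_normal.inv_cdf(1 - alpha)        # all of alpha in the right tail
p_right = 1 - std_normal.cdf(z)                   # area to the right of z
p_two = 2 * (1 - std_normal.cdf(abs(z)))          # area in both tails

print(f"critical Z = {z_critical:.3f}")
print(f"one-tailed p = {p_right:.4f}")
print(f"two-tailed p = {p_two:.4f}")
print("Reject H0" if z > z_critical else "Fail to reject H0")
```

The two-tailed figure computed this way may differ from a table-based value in the last digit, because table lookups round the tail area before doubling it.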

✓ Check Your Understanding

A manager wants to test whether a new supplier has a lower defect rate than the current 5%. The alternative hypothesis H_a is:

p ≠ 0.05

p > 0.05

p < 0.05

p = 0.05

1.4 The P-Value Approach

The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one calculated, assuming H₀ is true. It quantifies the strength of evidence against H₀:

If p-value ≤ α, reject H₀
If p-value > α, fail to reject H₀
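
One way to see what this decision rule implies is to simulate many samples from a process where H₀ really is true and count how often the rule rejects. The sketch below is an illustration under stated assumptions: it reuses μ₀ = 10.00, σ = 0.20, and n = 50 from the GreatLakes example, draws normally distributed measurements, and the function name, trial count, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(alpha, mu0, sigma, n, trials, seed=0):
    """Share of simulated samples, drawn with H0 true, for which p-value <= alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # The process is on target: every measurement has true mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value <= alpha:
            rejections += 1
    return rejections / trials


rate = false_positive_rate(alpha=0.05, mu0=10.00, sigma=0.20, n=50, trials=5000)
print(f"Share of true-H0 samples rejected: {rate:.3f}")
```

The share should land near α, which is the sense in which the significance level controls the Type I error rate.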

Common Misinterpretations

The p-value is frequently misunderstood. It is not:

The probability that H₀ is true
The probability that H_a is true
A measure of the size of the effect

A small p-value means the observed data would be unlikely if H₀ were true. It does not tell us the probability that any hypothesis is true or false — it tells us about the data, given an assumption.

Calculating P-Values in Excel

For a two-tailed Z-test:

=2*(1-NORM.S.DIST(ABS(Z),TRUE))

For a one-tailed test (right tail):

=1-NORM.S.DIST(Z,TRUE)
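
The same calculations can be sketched in Python, where `NormalDist().cdf` from the standard library plays the role of NORM.S.DIST(Z,TRUE). The function name `z_test_p_value` and its `tail` labels are assumptions, and the left-tail branch is included only for completeness.

```python
from statistics import NormalDist


def z_test_p_value(z, tail="two"):
    """P-value for a Z statistic under the standard normal distribution."""
    cdf = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # mirrors =2*(1-NORM.S.DIST(ABS(Z),TRUE))
    if tail == "right":
        return 1 - cdf(z)             # mirrors =1-NORM.S.DIST(Z,TRUE)
    if tail == "left":
        return cdf(z)                 # area to the left of z
    raise ValueError("tail must be 'two', 'right', or 'left'")


print(f"{z_test_p_value(1.96, 'two'):.3f}")    # 0.050
print(f"{z_test_p_value(1.645, 'right'):.3f}")  # 0.050
```

Both calls return roughly 0.05, matching the critical values quoted in Section 1.3.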

✓ Check Your Understanding

A p-value of 0.03 with α = 0.05 means:

There is a 3% chance H₀ is true

Reject H₀ at the 5% significance level

The effect is large

H_a is proven

Chapter Summary

In this chapter, we established the logical framework of hypothesis testing and applied it using the Z-test for a population mean. Here is what you should take away:

💡 Chapter 1 Summary

Hypothesis Testing Logic: H₀ represents the status quo; H_a is the claim we seek evidence for. We never prove H₀ — we reject it or fail to reject it based on sample evidence.

Type I and II Errors: Type I (α) is rejecting a true H₀; Type II (β) is failing to reject a false H₀. The significance level α controls the Type I error rate.

Z-Test: Used when σ is known and either n ≥ 30 or the population is normally distributed. The test statistic measures how many standard errors the sample mean is from the hypothesized mean.

One vs Two-Tailed: Use two-tailed when testing for any difference; use one-tailed when the direction of departure is specified in advance.

P-Value: The probability of observing a test statistic at least as extreme as ours if H₀ is true. Reject H₀ when p-value ≤ α.

📋 Chapter 1 — Formula Reference

Measure	Formula	Excel Function
Z-Test Statistic	Z = (x̄ − μ₀) / (σ / √n)	`=(xbar-mu0)/(sigma/SQRT(n))`
Standard Error	SE = σ / √n	`=sigma/SQRT(n)`
P-Value (two-tailed)	2 × P(Z ≥ |z|)	`=2*(1-NORM.S.DIST(ABS(Z),TRUE))`
P-Value (one-tailed, right tail)	P(Z ≥ z)	`=1-NORM.S.DIST(Z,TRUE)`
Critical Z (two-tailed)	±z_{α/2}	`=NORM.S.INV(1-alpha/2)`
