In descriptive statistics, we summarize data. In inferential statistics, we use sample data to draw conclusions about a larger population. The most structured way to do this is through hypothesis testing — a formal procedure for deciding whether sample evidence supports or contradicts a claim about a population parameter.
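Every hypothesis test begins with two competing statements:
Null hypothesis (H0): the status quo, typically a statement of “no change” or “no difference” about the population parameter.
Alternative hypothesis (Ha): the claim we are seeking evidence for.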
The burden of proof lies with Ha. Just as a defendant is presumed innocent until proven guilty, H0 is presumed true until the data provide convincing evidence against it. We never “prove” H0 — we either reject it or fail to reject it.
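Because we make decisions based on samples rather than the entire population, we can make mistakes:
Type I error: rejecting H0 when it is actually true. Its probability is α, the significance level.
Type II error: failing to reject H0 when it is actually false. Its probability is denoted β.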
For a fixed sample size, lowering α reduces the chance of a Type I error but increases the chance of a Type II error. Analysts must balance these risks according to the business context.
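The trade-off between α and β can be made concrete with a short calculation. The Python sketch below assumes a two-tailed Z-test and a hypothetical true shift in the mean. It computes β as the probability that the test statistic still falls inside the non-rejection region. The function name `type_ii_error` and the scenario numbers (shift 0.02 mm, σ = 0.05 mm, n = 30) are illustrative assumptions, not GreatLakes data.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def type_ii_error(alpha: float, shift: float, sigma: float, n: int) -> float:
    """Hypothetical helper: beta for a two-tailed Z-test.

    shift is an assumed true difference mu - mu0. When the mean has shifted,
    Z follows N(delta, 1) with delta = shift / (sigma / sqrt(n)), so beta is
    the probability that Z still lands in (-z_crit, z_crit).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = shift / (sigma / n ** 0.5)  # true shift measured in standard errors
    return STD_NORMAL.cdf(z_crit - delta) - STD_NORMAL.cdf(-z_crit - delta)

# Hypothetical scenario: true mean 0.02 mm off target, sigma = 0.05 mm, n = 30
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, shift=0.02, sigma=0.05, n=30)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

With n held fixed, each smaller α widens the non-rejection region, and β grows as a result. Increasing n is the usual way to reduce both errors at once.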
GreatLakes Manufacturing is a mid-size auto parts manufacturer in Wisconsin. Their flagship product is a precision-machined shaft with a specification diameter of 10.00 mm. Quality control regularly samples finished shafts to ensure the production process is on target.
An engineer suspects the process has shifted. She sets up a hypothesis test:
H0: μ = 10.00 mm (the process is on target)
Ha: μ ≠ 10.00 mm (the process has shifted)
This is a two-tailed test because the engineer is concerned about a shift in either direction — the diameter could be too large or too small.
We never “prove” the null hypothesis. We either reject it (finding sufficient evidence against it) or fail to reject it (insufficient evidence to overturn it). The absence of evidence is not evidence of absence.
The Z-test is used to test a hypothesis about a population mean when two conditions are met: (1) the population standard deviation σ is known, and (2) the sample size is n ≥ 30 (or the population is normally distributed). Under these conditions, the sampling distribution of the sample mean x̄ is approximately normal, and we can use the standard normal (Z) distribution to calculate our test statistic.
The test statistic measures how many standard errors the sample mean is away from the hypothesized population mean. A large absolute value of Z indicates that the sample mean is far from what we would expect if H0 were true.
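$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
where x̄ is the sample mean, μ0 is the hypothesized population mean, σ is the known population standard deviation, and n is the sample size. In Excel: =(xbar-mu0)/(sigma/SQRT(n)), with each name replaced by the corresponding cell reference.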
You can also use Excel's STANDARDIZE function: =STANDARDIZE(x̄, μ0, σ/SQRT(n)), which computes the same result.
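For readers working outside Excel, here is a minimal Python sketch of the same arithmetic. It computes Z directly and also through a hand-written equivalent of STANDARDIZE. The helper names and the inputs (x̄ = 10.03 mm, μ0 = 10.00 mm, σ = 0.08 mm, n = 36) are hypothetical and are not the GreatLakes sample.

```python
from math import sqrt

def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Number of standard errors between the sample mean and mu0."""
    standard_error = sigma / sqrt(n)
    return (x_bar - mu0) / standard_error

def standardize(x: float, mean: float, standard_dev: float) -> float:
    """Same arithmetic as Excel's STANDARDIZE(x, mean, standard_dev)."""
    return (x - mean) / standard_dev

# Hypothetical inputs: the standard error is 0.08 / 6 mm, so Z = 0.03 / (0.08 / 6)
z = z_statistic(10.03, 10.00, 0.08, 36)
same_z = standardize(10.03, 10.00, 0.08 / sqrt(36))
print(round(z, 2), round(same_z, 2))  # 2.25 2.25
```

Passing σ/√n as the third argument to STANDARDIZE is what turns a generic z-score into the Z-test statistic.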
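A two-tailed test checks for a difference in either direction (Ha: μ ≠ μ0). A one-tailed test checks for a difference in a specific direction:
Right-tailed: Ha: μ > μ0 (H0 is rejected for large positive values of Z)
Left-tailed: Ha: μ < μ0 (H0 is rejected for large negative values of Z)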
The choice between one-tailed and two-tailed depends on the research question. If you only care about departures in one direction, and that direction is chosen before looking at the data, a one-tailed test is more powerful for detecting a departure in that direction because all of α is concentrated in one tail.
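For α = 0.05, the critical values are:
Two-tailed: reject H0 if |Z| > 1.96
Right-tailed: reject H0 if Z > 1.645
Left-tailed: reject H0 if Z < −1.645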
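The critical values for α = 0.05 come from the inverse of the standard normal CDF. This short sketch uses Python's `statistics.NormalDist` to show where they come from. A two-tailed test splits α between both tails, and a one-tailed test places all of α in one tail.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal distribution

two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
right_tailed = std_normal.inv_cdf(1 - alpha)    # all of alpha in the upper tail
left_tailed = std_normal.inv_cdf(alpha)         # all of alpha in the lower tail

print(f"two-tailed:   reject H0 if |Z| > {two_tailed:.3f}")
print(f"right-tailed: reject H0 if Z > {right_tailed:.3f}")
print(f"left-tailed:  reject H0 if Z < {left_tailed:.3f}")
```

In Excel, the same values come from =NORM.S.INV(1-α/2) and =NORM.S.INV(1-α).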
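Using the same GreatLakes data (Z = 2.83), suppose the engineer specifically tests whether the diameter has increased:
H0: μ ≤ 10.00 mm
Ha: μ > 10.00 mm (the diameter has increased)
This is a right-tailed test. Because Z = 2.83 exceeds the critical value of 1.645, we reject H0 at α = 0.05. The sample provides convincing evidence that the mean diameter is above 10.00 mm.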
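The right-tailed decision for the GreatLakes shafts (Ha: μ > 10.00 mm) can be checked with a few lines of Python. This sketch takes only the reported statistic Z = 2.83 from the text and assumes α = 0.05. It compares Z with the critical value, computes the one-tailed p-value, and computes the two-tailed p-value for contrast.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z = 2.83  # GreatLakes test statistic reported in the text

# Right-tailed test (Ha: mu > 10.00 mm): reject only for large positive Z
critical_value = std_normal.inv_cdf(1 - alpha)
p_value_right = 1 - std_normal.cdf(z)

# For comparison, the two-tailed p-value for the same Z
p_value_two = 2 * (1 - std_normal.cdf(abs(z)))

print(f"critical value = {critical_value:.3f}, reject H0: {z > critical_value}")
print(f"one-tailed p = {p_value_right:.4f}, two-tailed p = {p_value_two:.4f}")
```

Both approaches agree. Z lies beyond the critical value, and the one-tailed p-value (about 0.0023) is well below α. The one-tailed p-value is exactly half of the two-tailed one, which is the sense in which the one-tailed test is more powerful in its chosen direction.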
The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one calculated, assuming H0 is true. It quantifies the strength of evidence against H0: the smaller the p-value, the stronger the evidence.
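One way to internalize “assuming H0 is true” is to simulate it. Under H0 and the Z-test conditions, the test statistic is standard normal. The p-value is therefore the fraction of standard normal draws that are at least as extreme as the observed Z. The Monte Carlo sketch below uses the GreatLakes Z = 2.83 from the text. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

def simulated_two_tailed_p(z_observed: float, trials: int = 200_000, seed: int = 1) -> float:
    """Estimate P(|Z| >= |z_observed|) by drawing Z values from N(0, 1).

    Drawing from N(0, 1) is what "assuming H0 is true" means here: under H0
    (and the Z-test conditions) the test statistic is standard normal.
    """
    rng = random.Random(seed)
    extreme = sum(1 for _ in range(trials) if abs(rng.gauss(0.0, 1.0)) >= abs(z_observed))
    return extreme / trials

z = 2.83
exact = 2 * (1 - NormalDist().cdf(abs(z)))
estimate = simulated_two_tailed_p(z)
print(f"exact p = {exact:.4f}, simulated p = {estimate:.4f}")
```

The simulated fraction approaches the exact two-tailed p-value as the number of trials grows. This is a sketch for intuition; in practice you would use the exact formula.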
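The p-value is frequently misunderstood. It is not:
the probability that H0 is true,
the probability that Ha is true, or
a measure of how large or practically important an effect is.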
A small p-value means the observed data would be unlikely if H0 were true. It does not tell us the probability that any hypothesis is true or false — it tells us about the data, given an assumption.
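The Excel p-value formulas below have direct Python counterparts in `statistics.NormalDist.cdf`. The following sketch wraps them in a hypothetical `z_test_p_value` helper with a `tail` argument, plus a `decide` helper that applies the rule “reject H0 when p-value ≤ α.” The left-tail case is an addition that follows the same pattern; the two- and right-tailed cases mirror the Excel formulas.

```python
from statistics import NormalDist

def z_test_p_value(z: float, tail: str = "two") -> float:
    """p-value for a Z statistic; tail is "two", "right", or "left".

    two:   same as =2*(1-NORM.S.DIST(ABS(Z),TRUE))
    right: same as =1-NORM.S.DIST(Z,TRUE)
    left:  same as =NORM.S.DIST(Z,TRUE)
    """
    cdf = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    if tail == "right":
        return 1 - cdf(z)
    if tail == "left":
        return cdf(z)
    raise ValueError('tail must be "two", "right", or "left"')

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(z_test_p_value(2.83, "two")))   # reject H0
print(decide(z_test_p_value(1.00, "two")))   # fail to reject H0
```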
For a two-tailed Z-test:
=2*(1-NORM.S.DIST(ABS(Z),TRUE))
For a one-tailed test (right tail):
=1-NORM.S.DIST(Z,TRUE)
In this chapter, we established the logical framework of hypothesis testing and applied it using the Z-test for a population mean. Here is what you should take away:
Hypothesis Testing Logic: H0 represents the status quo; Ha is the claim we seek evidence for. We never prove H0 — we reject it or fail to reject it based on sample evidence.
Type I and II Errors: Type I (α) is rejecting a true H0; Type II (β) is failing to reject a false H0. The significance level α controls the Type I error rate.
Z-Test: Used when σ is known and either n ≥ 30 or the population is normally distributed. The test statistic measures how many standard errors the sample mean is from the hypothesized mean.
One vs Two-Tailed: Use two-tailed when testing for any difference; use one-tailed when the direction of departure is specified in advance.
P-Value: The probability of observing a test statistic as extreme as (or more extreme than) ours if H0 is true. Reject H0 when p-value ≤ α.