Chapter 4

The Normal Distribution & Z-Scores

4.1 The Normal Distribution

The normal distribution is the most important probability distribution in statistics. Also called the bell curve or Gaussian distribution, it describes a pattern where most observations cluster around a central value and the frequency of observations decreases symmetrically as you move away from the center.

The normal distribution is fully determined by just two parameters: the mean (μ), which sets the center of the curve, and the standard deviation (σ), which controls how wide or narrow the curve is. A small standard deviation produces a tall, narrow bell; a large standard deviation produces a short, wide bell. Regardless of the specific values, the shape is always perfectly symmetric.
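The effect of σ on the height and width of the bell can be checked numerically. The sketch below is only an illustration: it uses Python's standard-library `statistics.NormalDist`, and the two σ values (0.5 and 2.0) are arbitrary choices, not figures from this chapter.

```python
from statistics import NormalDist

# Two normal curves with the same mean but different spreads (illustrative values).
narrow = NormalDist(mu=0.0, sigma=0.5)
wide = NormalDist(mu=0.0, sigma=2.0)

# The density at the mean is the height of the peak.
peak_narrow = narrow.pdf(0.0)
peak_wide = wide.pdf(0.0)

# Symmetry: equal distances on either side of the mean have equal density.
density_left = wide.pdf(-1.5)
density_right = wide.pdf(1.5)

print(f"peak height with sigma=0.5: {peak_narrow:.4f}")
print(f"peak height with sigma=2.0: {peak_wide:.4f}")
print(f"density at -1.5 and +1.5:   {density_left:.4f} and {density_right:.4f}")
```

The smaller σ gives the taller peak, and the two densities on either side of the mean match, which is the symmetry described above.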

Properties of the Normal Distribution

  • Symmetric: The left and right sides of the curve are mirror images around the mean.
  • Unimodal: There is a single peak, located at the mean. The mean, median, and mode are all equal.
  • Asymptotic: The tails of the curve approach but never touch the horizontal axis, extending infinitely in both directions.
  • Total area = 1: The entire area under the curve represents 100% of the probability.

The Empirical Rule (68-95-99.7)

One of the most practical features of the normal distribution is the empirical rule, which tells us how data is distributed around the mean:

  • 68% of data falls within 1 standard deviation of the mean (μ ± 1σ)
  • 95% of data falls within 2 standard deviations of the mean (μ ± 2σ)
  • 99.7% of data falls within 3 standard deviations of the mean (μ ± 3σ)

This rule provides a quick way to assess whether individual data points are typical or unusual. A value more than 2 standard deviations from the mean occurs less than 5% of the time — making it relatively rare.
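The 68-95-99.7 figures are rounded. As a rough check, the sketch below computes the more precise coverage for 1, 2, and 3 standard deviations using Python's standard-library `statistics.NormalDist`; the helper name `coverage_within` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def coverage_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4f}")
```

The results are approximately 0.6827, 0.9545, and 0.9973, which is where the rounded percentages in the rule come from.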

🏪 LakeFront Retail Co.

LakeFront Retail Co. has analyzed years of shipping data and found that delivery times are approximately normally distributed with a mean of 3.2 days and a standard deviation of 0.6 days.

Using the empirical rule: 68% of deliveries arrive between 2.6 and 3.8 days, 95% arrive between 2.0 and 4.4 days, and 99.7% arrive between 1.4 and 5.0 days. A delivery taking more than 5 days would be extremely unusual and worth investigating.
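The LakeFront intervals above are simply μ ± kσ for k = 1, 2, 3. A minimal Python sketch of that arithmetic follows; the function and variable names are illustrative, and the last lines assume delivery times are exactly normal in order to estimate how rare a delivery of more than 5 days would be.

```python
from statistics import NormalDist

MEAN_DAYS = 3.2
SD_DAYS = 0.6


def empirical_interval(mean: float, sd: float, k: int) -> tuple[float, float]:
    """Return the interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


intervals = {k: empirical_interval(MEAN_DAYS, SD_DAYS, k) for k in (1, 2, 3)}
for k, (low, high) in intervals.items():
    print(f"mean ± {k} sd: {low:.1f} to {high:.1f} days")

# Share of deliveries slower than the upper 3-sd bound (5.0 days),
# under the assumption of an exactly normal distribution.
p_over_5 = 1.0 - NormalDist(MEAN_DAYS, SD_DAYS).cdf(5.0)
print(f"P(delivery > 5 days) is about {p_over_5:.4f}")
```

Under that assumption, only a little over one delivery in a thousand would take longer than 5 days.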

✓ Check Your Understanding
In a normal distribution, approximately what percentage of data falls within 2 standard deviations of the mean?
A) 68%
B) 95%
C) 99.7%
D) 50%

4.2 Z-Scores

A raw data value by itself does not tell us much about how unusual it is. Is a daily sales figure of $4,800 impressive? It depends on the context — the average level and the typical amount of variation. A Z-score solves this problem by converting any raw value into a standardized measure that tells us exactly how many standard deviations it is from the mean.

Z-scores allow us to compare values across different scales. For example, you can compare a store's sales performance (measured in dollars) to its customer satisfaction score (measured on a 1-10 scale) by converting both to Z-scores. The Z-score strips away the original units and expresses everything in terms of standard deviations.
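To make the cross-scale comparison concrete, here is a small Python sketch. All of the numbers in it are hypothetical and invented for the illustration; only the idea of dividing the distance from the mean by the standard deviation comes from the text.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical figures for one store, invented for this illustration.
sales_z = z_score(4500, mean=4100, sd=320)        # measured in dollars
satisfaction_z = z_score(8.1, mean=7.4, sd=0.5)   # measured on a 1-10 scale

print(f"sales Z-score:        {sales_z:.2f}")
print(f"satisfaction Z-score: {satisfaction_z:.2f}")

stronger = "satisfaction" if satisfaction_z > sales_z else "sales"
print(f"Relative to its own distribution, {stronger} is the stronger result.")
```

With these made-up inputs the satisfaction score sits slightly further above its mean (Z = 1.40) than the sales figure does (Z = 1.25), even though the raw numbers are on completely different scales.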

Interpreting Z-Scores

  • Z = 0: The value equals the mean.
  • Z > 0: The value is above the mean.
  • Z < 0: The value is below the mean.
  • |Z| > 2: The value is unusually far from the mean (outside roughly the central 95% of the data).
  • |Z| > 3: The value is extremely rare (outside roughly the central 99.7% of the data).
Z-Score (Standardized Value)
Z = (x − μ) / σ
📊 Excel: =STANDARDIZE(x, mean, std_dev)
where Z is the standardized score, x is the raw value, μ is the population mean, and σ is the population standard deviation.
🏪 LakeFront Retail Co.

LakeFront Store A has daily sales with a mean of $4,100 and standard deviation of $320. Yesterday, Store A had sales of $4,800.

The Z-score is: Z = (4,800 − 4,100) / 320 = 700 / 320 = 2.19.

This means yesterday's sales were 2.19 standard deviations above the mean, a very strong day. By the empirical rule, only about 5% of values lie more than 2 standard deviations from the mean in either direction, so only about 2.5% lie that far above it. This was an exceptionally good performance.
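The Store A calculation can be reproduced in a few lines of Python. This is a sketch rather than a prescribed method: the variable names are illustrative, and the upper-tail probability assumes daily sales are approximately normal, which the example implies but does not state outright.

```python
from statistics import NormalDist

mean_sales = 4100.0
sd_sales = 320.0
yesterday = 4800.0

# Z = (x - mean) / standard deviation
z = (yesterday - mean_sales) / sd_sales

# Share of days expected to beat yesterday's sales, assuming normality.
upper_tail = 1.0 - NormalDist().cdf(z)

print(f"Z-score: {z:.2f}")
print(f"P(sales > ${yesterday:,.0f}) is about {upper_tail:.4f}")
```

Under the normality assumption, the upper-tail probability comes out at a bit under 1.5%, which is consistent with calling this an exceptional day.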

✓ Check Your Understanding
A Z-score of −1.5 means the value is:
A) 1.5 above the mean
B) 1.5 standard deviations below the mean
C) In the bottom 1.5% of data
D) Negative and therefore invalid

4.3 Finding Probabilities with Z-Scores

The real power of Z-scores comes from connecting them to probabilities. Once you convert a value to a Z-score, you can use the standard normal table (Z-table) or Excel functions to find the probability of observing a value at or below that point. This probability corresponds to the area under the curve to the left of the Z-score.

Three Types of Probability Questions

  • Left tail (less than): What proportion of values fall below a given point? Use the Z-table directly or =NORM.S.DIST(z, TRUE).
  • Right tail (greater than): What proportion of values fall above a given point? Subtract the left-tail probability from 1: =1-NORM.S.DIST(z, TRUE).
  • Between two values: What proportion falls between two points? Find the left-tail probability for each and subtract the smaller from the larger: =NORM.S.DIST(z2, TRUE)-NORM.S.DIST(z1, TRUE), where z1 < z2.

Excel also offers =NORM.DIST(x, mean, std_dev, TRUE) which lets you skip the Z-score conversion entirely and calculate probabilities directly from raw values.
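The three question types map onto three small functions. The Python sketch below uses `statistics.NormalDist().cdf` in the role that =NORM.S.DIST(z, TRUE) plays in Excel; the function names are made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def left_tail(z: float) -> float:
    """P(Z <= z): area to the left of z."""
    return STD_NORMAL.cdf(z)


def right_tail(z: float) -> float:
    """P(Z > z): one minus the left-tail area."""
    return 1.0 - STD_NORMAL.cdf(z)


def between(z_low: float, z_high: float) -> float:
    """P(z_low < Z < z_high): difference of the two left-tail areas."""
    return STD_NORMAL.cdf(z_high) - STD_NORMAL.cdf(z_low)


print(f"left tail at z=1:   {left_tail(1.0):.4f}")
print(f"right tail at z=1:  {right_tail(1.0):.4f}")
print(f"between -1 and +1:  {between(-1.0, 1.0):.4f}")
```

The left and right tails at the same z always add up to 1, and the "between" result for ±1 matches the 68% figure from the empirical rule.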

Normal Probability (Left Tail)
📊 Excel: =NORM.S.DIST(z, TRUE)
Returns P(Z ≤ z), the area under the standard normal curve to the left of z.
Normal Probability (Direct from Raw Value)
📊 Excel: =NORM.DIST(x, mean, std_dev, TRUE)
Returns P(X ≤ x) directly without needing to compute the Z-score first.
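The two routes give the same answer: standardizing first and using the standard normal curve, or working directly with the raw value. A quick Python check is sketched below, using the LakeFront delivery parameters and an arbitrary illustrative value of x = 3.5 days.

```python
from statistics import NormalDist

mu, sigma = 3.2, 0.6   # LakeFront delivery times (days)
x = 3.5                # illustrative raw value, chosen for this example

# Route 1: convert to a Z-score, then use the standard normal curve.
z = (x - mu) / sigma
via_z = NormalDist().cdf(z)

# Route 2: use the normal distribution with the original mean and sd.
direct = NormalDist(mu, sigma).cdf(x)

print(f"via Z-score: {via_z:.4f}")
print(f"direct:      {direct:.4f}")
```

Both lines print the same probability (about 0.6915 for Z = 0.5), which is why the direct formula can be treated as a shortcut rather than a different method.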
✎ Example 1: Proportion of Deliveries Under 4 Days
1. LakeFront delivery times: μ = 3.2 days, σ = 0.6 days. Find P(X < 4). Convert to a Z-score:
Z = (4 − 3.2) / 0.6 = 0.8 / 0.6 = 1.333
2. Look up Z = 1.33 in the Z-table (which gives about 0.9082), or use Excel with the more precise Z:
=NORM.S.DIST(1.333, TRUE) = 0.9088
3. Result: Approximately 90.9% of LakeFront deliveries arrive in under 4 days.
✎ Example 2: Proportion Taking More Than 4.5 Days
1. Find P(X > 4.5). First convert to a Z-score:
Z = (4.5 − 3.2) / 0.6 = 1.3 / 0.6 = 2.167
2. Right-tail probability:
=1-NORM.S.DIST(2.167, TRUE) = 1 − 0.9849 = 0.0151
3. Result: Only about 1.5% of deliveries take more than 4.5 days.
✎ Example 3: Proportion Between 2.5 and 4 Days
1. Find P(2.5 < X < 4). Convert both values:
Z1 = (2.5 − 3.2) / 0.6 = −1.167
Z2 = (4 − 3.2) / 0.6 = 1.333
2. Subtract left-tail areas:
=NORM.S.DIST(1.333, TRUE)-NORM.S.DIST(-1.167, TRUE)
= 0.9088 − 0.1217 = 0.7871
3. Result: About 78.7% of deliveries arrive between 2.5 and 4 days.
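The three worked examples can be cross-checked in Python. The sketch below works directly from the raw delivery times (the equivalent of NORM.DIST), so the final digit may differ slightly from results computed with a rounded Z-score; the variable names are illustrative.

```python
from statistics import NormalDist

# LakeFront delivery times: mean 3.2 days, standard deviation 0.6 days.
delivery = NormalDist(mu=3.2, sigma=0.6)

# Example 1: P(X < 4), a left-tail probability.
p_under_4 = delivery.cdf(4.0)

# Example 2: P(X > 4.5), a right-tail probability.
p_over_4_5 = 1.0 - delivery.cdf(4.5)

# Example 3: P(2.5 < X < 4), the difference of two left-tail probabilities.
p_between = delivery.cdf(4.0) - delivery.cdf(2.5)

print(f"Example 1: {p_under_4:.4f}")
print(f"Example 2: {p_over_4_5:.4f}")
print(f"Example 3: {p_between:.4f}")
```

The printed values agree with the 90.9%, 1.5%, and 78.7% results above.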
✓ Check Your Understanding
LakeFront wants to flag deliveries in the top 5% for being too slow. Using μ = 3.2 days and σ = 0.6 days, what delivery time is the cutoff? (Hint: Z = 1.645 for the 95th percentile)
A) 3.8 days
B) 4.0 days
C) 4.19 days
D) 4.5 days
💡 Key Takeaway

Z-scores are the bridge between raw data and probability. By standardizing any value — regardless of its original scale — you can determine exactly how unusual it is and calculate the probability of observing values above, below, or between any two points. In business, this translates directly to setting thresholds, identifying outliers, and making data-driven decisions about what counts as “normal” versus “exceptional.”
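Setting a threshold runs the calculation in reverse: start from a probability and recover the value. The Python sketch below uses `NormalDist.inv_cdf` for this (Excel's counterpart is =NORM.INV(probability, mean, std_dev)). The 1% share and the function name are assumptions chosen purely for illustration.

```python
from statistics import NormalDist

# LakeFront delivery times: mean 3.2 days, standard deviation 0.6 days.
delivery = NormalDist(mu=3.2, sigma=0.6)


def cutoff_for_top_share(dist: NormalDist, top_share: float) -> float:
    """Value that only `top_share` of observations are expected to exceed."""
    return dist.inv_cdf(1.0 - top_share)


# Illustrative choice: flag the slowest 1% of deliveries.
cutoff = cutoff_for_top_share(delivery, 0.01)
z_cutoff = (cutoff - delivery.mean) / delivery.stdev

print(f"Z-score at the cutoff: {z_cutoff:.3f}")
print(f"Cutoff delivery time:  {cutoff:.2f} days")
```

The same function works for any share: pass a different `top_share` to move the threshold, and the Z-score at the cutoff follows from the usual formula.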

4.4 Chapter Summary

In this chapter, we explored the normal distribution and learned how Z-scores let us standardize any data value and connect it to probabilities. Here is what you should take away:

💡 Chapter 4 Summary

Normal Distribution: The bell curve is defined by its mean (μ) and standard deviation (σ). The empirical rule (68-95-99.7) gives a quick estimate of how data is spread around the center.

Z-Scores: Standardizing a value tells you how many standard deviations it is from the mean, enabling comparisons across different scales and identifying outliers.

Probabilities: By converting to Z-scores, you can find the probability (area under the curve) of any range of values using tables or Excel functions like NORM.S.DIST and NORM.DIST.

The Big Picture: The normal distribution underpins most of the inferential statistics we will study next — including confidence intervals, hypothesis tests, and sampling theory. Mastering Z-scores now pays dividends in every chapter that follows.

📋 Chapter 4 — Formula Reference
Concept | Formula | Excel Function
Z-Score | Z = (x − μ) / σ | =STANDARDIZE(x, mean, std_dev)
Left-Tail | P(Z ≤ z) | =NORM.S.DIST(z, TRUE)
Right-Tail | P(Z > z) | =1-NORM.S.DIST(z, TRUE)
Direct Normal | P(X ≤ x) | =NORM.DIST(x, mean, std_dev, TRUE)
Empirical Rule | 68-95-99.7 rule (μ ± 1σ, μ ± 2σ, μ ± 3σ) | n/a