Chapter 3

Two-Sample T-Tests


3.1 Independent vs Paired Samples

In Chapter 2, we tested whether a single sample mean differed from a hypothesized value. Now we extend hypothesis testing to compare two groups. The first decision you must make is whether the two samples are independent or paired.

Independent Samples

Two samples are independent when the observations in one group have no natural connection to the observations in the other. Each data point in Group A is unrelated to any specific data point in Group B. Examples include comparing output from two different machines, test scores from two different classrooms, or defect rates from two separate suppliers.

Paired Samples

Two samples are paired when each observation in one group is naturally linked to a specific observation in the other. The most common pairing is a before-and-after design, where the same subjects are measured twice. Other examples include matched pairs of similar units or left-versus-right measurements on the same item.

The distinction matters because paired designs remove between-subject variability, making it easier to detect a true difference. Using the wrong test can lead to incorrect conclusions: an independent test on paired data wastes statistical power, and a paired test on independent data treats arbitrary row pairings as meaningful, violating the assumption that each difference comes from a genuinely linked pair.

🏭 GreatLakes Manufacturing

Independent example: GreatLakes wants to compare the output (units per hour) of Machine A versus Machine B. The machines run independently with different operators, so measurements from Machine A are unrelated to those from Machine B. This calls for an independent two-sample t-test.

Paired example: GreatLakes wants to evaluate whether scheduled maintenance improves a machine’s output. They measure the same machine’s output before and after maintenance. Because both measurements come from the same machine, the data are naturally paired. This calls for a paired t-test.
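To make the effect of pairing concrete, here is a minimal Python sketch that computes the standard error of the difference two ways on the same before/after readings. The readings are invented for illustration and are not GreatLakes data; the variable names are likewise assumptions.

```python
import math
import statistics

# Hypothetical units-per-hour readings for five machines, measured before and
# after maintenance. These numbers are invented purely for illustration.
before = [50.0, 62.0, 71.0, 55.0, 80.0]
after = [53.0, 64.0, 75.0, 57.0, 84.0]
n = len(before)

# Independent-samples view: each group's variance includes the large
# machine-to-machine differences, so the standard error is large.
se_independent = math.sqrt(
    statistics.variance(before) / n + statistics.variance(after) / n
)

# Paired view: subtracting within each machine cancels that machine's
# baseline level, leaving only the variation in the change itself.
diffs = [b - a for b, a in zip(before, after)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)

print(f"SE treating the groups as independent: {se_independent:.3f}")
print(f"SE using paired differences:           {se_paired:.3f}")
```

With data like these, where machines differ a lot from each other but respond similarly to maintenance, the paired standard error comes out much smaller, which is why the paired test detects the change more easily.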

✓ Check Your Understanding
A company measures the productivity of 20 employees before and after a new training program. Which test is most appropriate?
A) Independent two-sample t-test
B) Paired t-test
C) One-sample t-test
D) Z-test

3.2 Independent Two-Sample T-Test

The independent two-sample t-test compares the means of two unrelated groups. This chapter uses Welch’s version of the test. Unlike the pooled t-test, Welch’s t-test does not assume equal variances, making it the safer default choice.

Hypotheses

  • H0: μ1 = μ2 (the two population means are equal)
  • Ha: μ1 ≠ μ2 (the two population means differ)
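Independent Two-Sample T-Test Statistic (Welch’s)
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
📊 Excel: =T.TEST(array1, array2, tails, 3)
where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the sample sizes. Type 3 in T.TEST specifies Welch’s (unequal variances). Note that T.TEST returns the p-value, not the t statistic itself.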
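As a rough sketch of how this statistic could be computed from raw data outside Excel, the Python snippet below uses only the standard library. The function name `welch_t` and the sample readings are assumptions made for illustration.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t, standard_error) for Welch's two-sample t statistic."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se, se


# Hypothetical units-per-hour readings, invented for illustration only.
machine_a = [101.0, 98.0, 103.0, 100.0, 99.0, 102.0]
machine_b = [97.0, 95.0, 99.0, 96.0, 98.0]

t_stat, se = welch_t(machine_a, machine_b)
print(f"SE = {se:.4f}, t = {t_stat:.3f}")
```

The two groups do not need the same size, because each variance is divided by its own n before the terms are added.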

Welch-Satterthwaite Degrees of Freedom

Because the two groups may have different variances and sample sizes, the degrees of freedom for Welch’s t-test are approximated using the Welch-Satterthwaite equation. The result is typically not a whole number and is rounded down in practice. When both groups have the same sample size, df falls somewhere between n−1 and 2(n−1), depending on how different the variances are.

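Welch-Satterthwaite Degrees of Freedom
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
📊 Excel: =T.TEST() handles df internally
The numerator is the square of the sum of the two variance-over-n terms. The denominator sums the square of each variance-over-n term divided by its respective (n−1).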
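The bounds described above (between n−1 and 2(n−1) for equal sample sizes) can be checked numerically. The following is a minimal sketch; the function name `welch_df` and the standard deviations passed to it are illustrative assumptions.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom from sample SDs and sizes."""
    v1 = s1 ** 2 / n1  # variance-over-n term for group 1
    v2 = s2 ** 2 / n2  # variance-over-n term for group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


n = 20

# Equal standard deviations: df reaches the upper bound 2(n - 1) = 38
# (up to floating-point rounding).
df_equal = welch_df(1.0, n, 1.0, n)

# Extremely unequal standard deviations: df approaches the lower bound
# n - 1 = 19 because one group dominates the standard error.
df_unequal = welch_df(1.0, n, 100.0, n)

print(f"equal SDs:   df = {df_equal:.2f}")
print(f"unequal SDs: df = {df_unequal:.2f}")
```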
✎ Worked Example: Independent Two-Sample T-Test
Step 1. Setup: GreatLakes compares shaft diameters from two machines. Machine A: n1 = 20, x̄1 = 10.05 mm, s1 = 0.15 mm. Machine B: n2 = 20, x̄2 = 9.95 mm, s2 = 0.22 mm. Test at α = 0.05 (two-tailed).
H0: μ1 = μ2   Ha: μ1 ≠ μ2
Step 2. Compute the standard error of the difference:
SE = √(s1²/n1 + s2²/n2) = √(0.15²/20 + 0.22²/20)
= √(0.0225/20 + 0.0484/20) = √(0.001125 + 0.00242)
= √0.003545 = 0.05954
Step 3. Compute the t statistic:
t = (10.05 − 9.95) / 0.05954 = 0.10 / 0.05954 = 1.680
Step 4. Compute approximate degrees of freedom using Welch-Satterthwaite:
df = (0.003545)² / [(0.001125)²/19 + (0.00242)²/19]
= 0.00001257 / [0.00000006661 + 0.0000003082]
= 0.00001257 / 0.0000003748 = 33.5 ≈ 33
Step 5. Find the critical value. With df = 33 and α = 0.05 (two-tailed), the critical t ≈ ±2.035.
|t| = 1.680 < 2.035 ⇒ Fail to reject H0
Step 6. Result: At the 5% significance level, there is insufficient evidence that the mean shaft diameters from Machine A and Machine B differ. This does not prove the machines are identical; it means the observed 0.10 mm difference is within what sampling variation alone could produce.
Excel: =T.TEST(A_data, B_data, 2, 3) returns p ≈ 0.102
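The arithmetic in this worked example can be reproduced from the summary statistics alone. The sketch below uses only Python's standard library; the critical value 2.035 is taken from the example rather than computed, since the standard library has no t distribution.

```python
import math

# Summary statistics from the worked example (shaft diameters in mm).
n1, mean1, s1 = 20, 10.05, 0.15  # Machine A
n2, mean2, s2 = 20, 9.95, 0.22   # Machine B
t_crit = 2.035                   # two-tailed critical value quoted in the text

v1 = s1 ** 2 / n1  # variance-over-n term for Machine A
v2 = s2 ** 2 / n2  # variance-over-n term for Machine B

se = math.sqrt(v1 + v2)
t_stat = (mean1 - mean2) / se
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

reject_h0 = abs(t_stat) > t_crit

print(f"SE = {se:.5f}")
print(f"t  = {t_stat:.3f}")
print(f"df = {df:.1f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```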
✓ Check Your Understanding
When using Welch’s t-test with both groups having n = 25 observations, what are the degrees of freedom?
A) 50
B) 48
C) Between 24 and 48, depending on the variance ratio
D) 24

3.3 Paired T-Test

When data are naturally paired, we reduce the problem to a one-sample t-test on the differences. For each pair, compute the difference dᵢ = beforeᵢ − afterᵢ. Then test whether the mean difference d̄ is significantly different from zero.

Hypotheses

  • H0: μd = 0 (no mean difference)
  • Ha: μd ≠ 0 (there is a mean difference)
Paired T-Test Statistic
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
📊 Excel: =T.TEST(before, after, tails, 1)
where d̄ is the mean of the paired differences, sd is the standard deviation of the differences, n is the number of pairs, and df = n − 1. Type 1 in T.TEST specifies a paired test.
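The reduction to a one-sample test on the differences might look like the following in Python. This is a sketch under stated assumptions: the function name `paired_t` and the before/after readings are hypothetical, chosen only to show the mechanics.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired t-test using differences before - after."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical readings for four units, invented for illustration only.
before = [48.0, 52.0, 55.0, 60.0]
after = [50.0, 55.0, 56.0, 64.0]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

The length check matters because a paired test is only defined when every "before" value has exactly one matching "after" value.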
✎ Worked Example: Paired T-Test
Step 1. Setup: GreatLakes tests a new lubricant on 12 machines. Output (units/hr) is measured before and after applying the lubricant. Test at α = 0.05 (two-tailed) whether the lubricant changes output.
H0: μd = 0   Ha: μd ≠ 0
Step 2. Calculate the differences (Before − After) for each machine:
d: −3, −5, −2, −4, −1, −6, −3, −2, −4, −5, −3, −2
(Negative values indicate output increased after the lubricant.)
Step 3. Compute the mean difference:
d̄ = (−3 + −5 + ... + −2) / 12 = −40 / 12 = −3.333
Step 4. Compute the standard deviation of the differences:
sd = 1.497
(Use =STDEV.S(differences) in Excel.)
Step 5. Compute the standard error and t statistic:
SE = sd / √n = 1.497 / √12 = 1.497 / 3.464 = 0.4323
t = d̄ / SE = −3.333 / 0.4323 = −7.711
Step 6. Find the critical value. With df = 11 and α = 0.05 (two-tailed), the critical t = ±2.201.
|t| = 7.711 > 2.201 ⇒ Reject H0
Step 7. Result: At the 5% significance level, there is strong evidence that the new lubricant changes machine output. The negative mean difference indicates output increased by about 3.3 units per hour on average.
Excel: =T.TEST(before, after, 2, 1) returns p < 0.001
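The same calculation can be checked in Python from the twelve differences listed in the example. This sketch uses only the standard library and takes the critical value 2.201 from the text rather than computing it.

```python
import math
import statistics

# Differences (Before - After) from the worked example, in units per hour.
diffs = [-3, -5, -2, -4, -1, -6, -3, -2, -4, -5, -3, -2]
t_crit = 2.201  # two-tailed critical value for df = 11, quoted in the text

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # matches Excel's STDEV.S (n - 1 denominator)
se = sd_d / math.sqrt(n)
t_stat = mean_d / se

print(f"mean difference = {mean_d:.3f}")
print(f"sd of differences = {sd_d:.3f}")
print(f"SE = {se:.4f}, t = {t_stat:.3f}, df = {n - 1}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
```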
✓ Check Your Understanding
What is the main advantage of a paired design over an independent design?
A) It always produces a larger sample size
B) It removes between-subject variability
C) It requires simpler calculations
D) It requires no assumptions about the data
💡 Key Takeaway

Always choose the test that matches your study design. Paired tests are more powerful when pairing is appropriate because they control for individual differences. Independent tests are required when the two groups have no natural connection. When in doubt about equal variances for independent samples, use Welch’s t-test — it is the safer default.

Chapter Summary

In this chapter, we extended hypothesis testing to two-group comparisons. Here is what you should take away:

💡 Chapter 3 Summary

Independent vs Paired: Use an independent t-test when the two groups are unrelated. Use a paired t-test when each observation in one group is naturally linked to an observation in the other (e.g., before/after on the same subject).

Welch’s T-Test: The default choice for independent samples. Does not assume equal variances. Uses the Welch-Satterthwaite approximation for degrees of freedom.

Paired T-Test: Reduces to a one-sample t-test on the differences. More powerful than independent tests when pairing is valid because it removes between-subject variability.

Excel: Use =T.TEST(array1, array2, tails, type) where type 1 = paired, type 3 = Welch’s independent.

📋 Chapter 3 — Formula Reference
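| Measure | Formula | Excel Function |
|---|---|---|
| Independent t Statistic | t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) | =T.TEST(a1,a2,tails,3) (returns the p-value) |
| Welch-Satterthwaite df | df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1−1) + (s2²/n2)²/(n2−1)] | Handled by T.TEST |
| Paired t Statistic | t = d̄ / (sd / √n) | =T.TEST(b,a,tails,1) (returns the p-value) |
| Mean Difference | d̄ = Σdᵢ / n | =AVERAGE(diffs) |
| SD of Differences | sd = √[Σ(dᵢ − d̄)² / (n − 1)] | =STDEV.S(diffs) |
| Paired df | df = n − 1 | =COUNT(diffs)-1 |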