Chapter 5: Correlation Analysis

5.1 Measuring Linear Relationships

In business, we often suspect that two variables move together. Does machine age predict maintenance cost? Does advertising spending correlate with sales? The Pearson correlation coefficient (r) quantifies the strength and direction of the linear relationship between two quantitative variables.

Properties of r

Range: r always falls between −1 and +1.
Direction: A positive r means the two variables tend to increase together; a negative r means one tends to decrease as the other increases.
Strength: Values near ±1 indicate a strong linear relationship; values near 0 indicate little or no linear association.
Unitless: Correlation has no units. Changing the measurement units of either variable (for example, years to months, or dollars to thousands of dollars) leaves r unchanged.
Not causation: Correlation measures association, not cause and effect.
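The properties above can be checked numerically. The following is a minimal sketch that assumes a small hypothetical dataset (it is not the GreatLakes sample) and uses a hypothetical helper `pearson_r` implementing the standard Pearson formula. It shows three things: r lands in [−1, +1], r is unchanged when the units change, and r changes sign when one variable's direction is reversed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data, NOT the GreatLakes measurements.
age_years = [1, 3, 4, 6, 8, 9]
cost_thousands = [2.1, 2.9, 3.8, 4.0, 5.6, 5.9]

r = pearson_r(age_years, cost_thousands)

# Same observations expressed in months and in dollars: r should not change.
r_units = pearson_r([12 * a for a in age_years],
                    [1000 * c for c in cost_thousands])

# Reversing the direction of one variable flips the sign of r.
r_flipped = pearson_r(age_years, [-c for c in cost_thousands])

print(f"r = {r:.4f}, in other units = {r_units:.4f}, flipped = {r_flipped:.4f}")
```

Rescaling by a positive constant multiplies the numerator and the denominator of r by the same factor, so the factors cancel. Negating one variable changes the sign of only the numerator.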

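Pearson Correlation Coefficient

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

📊 Excel: =CORREL(range1, range2)

where x_i and y_i are paired observations, x̄ and ȳ are their respective means, and n is the number of pairs.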
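The formula is easiest to follow as a hand calculation. You compute each pair's deviations from the means, then sum the cross-products and the squared deviations, and finally divide. Here is a sketch of those steps on five hypothetical pairs (illustrative values only, not the GreatLakes data). The names `s_xy`, `s_xx`, and `s_yy` are labels chosen for this example.

```python
import math

# Hypothetical paired observations (illustrative only, not the GreatLakes data).
x = [2, 4, 5, 7, 9]            # e.g. machine age in years
y = [3.0, 3.5, 5.0, 6.0, 7.5]  # e.g. maintenance cost in $000s

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

print(" x_i   y_i      dx      dy    dx*dy")
s_xy = s_xx = s_yy = 0.0
for xi, yi in zip(x, y):
    dx, dy = xi - x_bar, yi - y_bar  # deviations from the means
    s_xy += dx * dy                  # numerator: sum of cross-products
    s_xx += dx * dx                  # sum of squared x-deviations
    s_yy += dy * dy                  # sum of squared y-deviations
    print(f"{xi:4} {yi:5} {dx:7.2f} {dy:7.2f} {dx * dy:8.3f}")

# Denominator: square root of the product of the two sums of squares.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"S_xy = {s_xy:.3f}, S_xx = {s_xx:.3f}, S_yy = {s_yy:.3f}, r = {r:.4f}")
```

On real data, the result should agree with =CORREL applied to the same two ranges, up to rounding.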

🏪 GreatLakes Manufacturing

GreatLakes Manufacturing collected data on machine age (years) and annual maintenance cost ($000s) for n = 20 machines on their factory floor. Management suspects that older machines require more maintenance spending. The scatter plot below visualizes this relationship.

[Figure: Machine Age vs. Annual Maintenance Cost (n = 20)]

✓ Check Your Understanding

A correlation of r = −0.85 between two variables means:

A) No relationship

B) Strong positive linear relationship

C) Strong negative linear relationship

D) Proof of causation


5.2 Testing Significance of Correlation

A sample correlation of r = 0.73 looks impressive, but could it have arisen by chance from a population where the true correlation is zero? To answer this, we perform a hypothesis test for the population correlation ρ (rho).

Hypotheses

H₀: ρ = 0 (no linear relationship in the population)
H_a: ρ ≠ 0 (a linear relationship exists)
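You can make the question behind H₀ concrete by simulating it. This minimal Monte Carlo sketch assumes the two variables are independent standard normals, so ρ = 0 holds exactly. It draws many samples of size n and counts how often |r| comes out at least as large as an observed value. That fraction approximates the two-tailed p-value. The helper names, trial count, and seed are illustrative choices, not part of the text.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def share_at_least(r_obs, n, trials=10_000, seed=1):
    """Fraction of size-n samples drawn under rho = 0 with |r| >= |r_obs|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # x and y are independent draws, so the population correlation is 0 (H0 true).
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pearson_r(x, y)) >= abs(r_obs):
            hits += 1
    return hits / trials

rare = share_at_least(0.73, 20)    # how often chance alone gives |r| >= 0.73
common = share_at_least(0.30, 20)  # how often chance alone gives |r| >= 0.30
print(f"|r| >= 0.73: {rare:.4f}   |r| >= 0.30: {common:.4f}")
```

With n = 20, a sample correlation near 0.30 turns up fairly often even when ρ = 0. One as large as 0.73 almost never does. The t-test below answers the same question analytically.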

Under H₀, the test statistic follows a t-distribution with df = n − 2:

t-Test for Correlation

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

📊 Excel: =T.DIST.2T(abs_t, df) for the p-value

where r is the sample correlation, n is the sample size, and df = n − 2.
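The test statistic and its two-tailed p-value can also be computed in plain Python with no statistics library. In this sketch, the two-tailed p-value is found by integrating the Student t density with Simpson's rule. That numerical approach is an assumption of the sketch, not how Excel works internally, but the result should agree closely with what =T.DIST.2T(abs_t, df) returns. The function names are hypothetical.

```python
import math

def correlation_t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def t_density(x: float, df: int) -> float:
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|): by symmetry, 1 - 2 * (area under the density from 0 to |t|)."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    if steps % 2:          # Simpson's rule needs an even number of intervals
        steps += 1
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_density(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# The p-value at the tabled critical value for df = 18 should be close to 0.05.
p_at_critical = two_tailed_p(2.101, 18)
print(f"p at t = 2.101, df = 18: {p_at_critical:.4f}")
```

The density is normalized with `math.lgamma` instead of `math.gamma` so the constant stays numerically stable for large df.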

✎ Worked Example: Testing r = 0.73 with n = 20

Given: r = 0.73, n = 20, so df = 20 − 2 = 18.

Compute the test statistic:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
$$t = \frac{0.73\sqrt{18}}{\sqrt{1-0.5329}}$$
$$t = \frac{0.73 \times 4.243}{\sqrt{0.4671}}$$
$$t = \frac{3.097}{0.683} \approx 4.53$$

With df = 18 and a two-tailed test, the critical t-value at α = 0.05 is approximately 2.101. Since 4.53 > 2.101, we reject H₀.

Result: There is statistically significant evidence of a linear relationship between machine age and maintenance cost (t = 4.53, p < 0.001).
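The worked example's arithmetic and decision rule can be retraced in a few lines. This is a sketch in which the critical value 2.101 is hard-coded from the text rather than computed. The intermediate values are printed so they can be compared with each step above.

```python
import math

# Values from the worked example
r, n = 0.73, 20
df = n - 2
t_critical = 2.101  # two-tailed critical value for df = 18, alpha = 0.05 (from the text)

r_squared = r ** 2                      # 0.5329
numerator = r * math.sqrt(df)           # 0.73 * 4.243
denominator = math.sqrt(1 - r_squared)  # square root of 0.4671
t = numerator / denominator

print(f"r^2 = {r_squared:.4f}, numerator = {numerator:.3f}, "
      f"denominator = {denominator:.3f}, t = {t:.2f}")
# Two-tailed decision: reject H0 when |t| exceeds the critical value.
print("Reject H0" if abs(t) > t_critical else "Fail to reject H0")
```

This prints t = 4.53 and "Reject H0", matching the hand calculation.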

✓ Check Your Understanding

For r = 0.40 and n = 10, the test statistic t is approximately:

A) 1.23

B) 2.31

C) 0.40

D) 4.00

5.3 Chapter Summary

This chapter introduced the Pearson correlation coefficient as a measure of linear association and the t-test for evaluating its statistical significance.

💡 Chapter 5 Summary

Correlation (r): Measures the strength and direction of a linear relationship between two variables, ranging from −1 to +1. Always unitless.

Significance Test: Use the t-test with df = n − 2 to determine whether the sample correlation is significantly different from zero.

Correlation is not causation: A strong correlation indicates association but does not prove that one variable causes changes in the other.

📋 Chapter 5 — Formula Reference

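Measure	Formula	Excel Function
Pearson r	$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$	`=CORREL(range1, range2)`
t-Test for r	$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$	`=T.DIST.2T(abs_t, df)` (p-value)
Degrees of Freedom	$df = n - 2$	`=COUNT(range)-2`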

Up Next

Chapter 6: Simple Linear Regression

