In business, we often suspect that two variables move together. Does machine age predict maintenance cost? Does advertising spending correlate with sales? The Pearson correlation coefficient (r) quantifies the strength and direction of the linear relationship between two quantitative variables.
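The sample correlation coefficient is computed as

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Excel: =CORREL(range1, range2)

where $x_i$ and $y_i$ are paired observations, and $\bar{x}$ and $\bar{y}$ are their respective means.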
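For readers working outside Excel, the same calculation can be sketched in Python. This is a minimal illustration that assumes two equal-length lists of paired values. The function name `pearson_r` and the sample numbers are hypothetical and are not the GreatLakes data.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations (numerator of r)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations (under the square root in the denominator)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up illustrative values (NOT the GreatLakes data)
age = [1, 2, 3, 4, 5]    # hypothetical machine ages in years
cost = [2, 4, 5, 4, 5]   # hypothetical maintenance costs in $000s
r = pearson_r(age, cost)
print(round(r, 4))
```

Applied to the same two columns, this should agree with Excel's CORREL up to rounding.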
GreatLakes Manufacturing collected data on machine age (years) and annual maintenance cost ($000s) for n = 20 machines on its factory floor. Management suspects that older machines require more maintenance spending. The scatter plot below visualizes this relationship.
A sample correlation of r = 0.73 looks impressive, but could it have arisen by chance from a population where the true correlation is zero? To answer this, we perform a hypothesis test for the population correlation ρ (rho).
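We test the null hypothesis H0: ρ = 0 against the alternative Ha: ρ ≠ 0. Under the null hypothesis, the test statistic follows a t-distribution with df = n − 2:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

Excel: =T.DIST.2T(abs_t, df) for the p-value

where r is the sample correlation, n is the sample size, and df = n − 2.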
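The test can also be sketched in Python using only the standard library. The function names here are hypothetical, and the two-tailed p-value is approximated by numerically integrating the t density with Simpson's rule. That is an assumption made to keep the example self-contained, not how Excel's T.DIST.2T is implemented.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); undefined when |r| = 1."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)

# Values from the text: r = 0.73 with n = 20 machines
r, n = 0.73, 20
df = n - 2
t = correlation_t_statistic(r, n)   # about 4.53
p = two_tailed_p(t, df)
print(f"t = {t:.2f}, df = {df}, p = {p:.5f}")
```

With r = 0.73 and n = 20, the p-value falls well below 0.05, so the null hypothesis of zero population correlation would be rejected at the 5% level.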
This chapter introduced the Pearson correlation coefficient as a measure of linear association and the t-test for evaluating its statistical significance.
Correlation (r): Measures the strength and direction of a linear relationship between two variables, ranging from −1 to +1. Always unitless.
Significance Test: Use the t-test with df = n − 2 to determine whether the sample correlation is significantly different from zero, that is, whether the data provide evidence that the population correlation ρ is not zero.
Correlation is not causation: A strong correlation indicates association but does not prove that one variable causes changes in the other.