Chapter 6

Logistic Regression Introduction

📖 ~45 min read ✍️ 1 practice question

6.1 When the Outcome is Binary

Not every business question involves predicting a continuous number like revenue or cost. Many of the most important decisions revolve around binary outcomes: Will a customer churn or stay? Will a loan default or not? Will a prospect convert or bounce? Linear regression is ill-suited for these problems because it can produce predicted probabilities outside the 0-to-1 range, and its assumptions about normally distributed errors do not hold for binary data.

Logistic regression solves this by modeling the probability that the outcome equals 1 (the event of interest) using a mathematical function that naturally stays between 0 and 1. Instead of fitting a straight line through the data, logistic regression fits an S-shaped curve called the logistic function (or sigmoid).

Logistic Function (Predicted Probability)
P(y=1) = e^(b0 + b1·x) / (1 + e^(b0 + b1·x))
📊 Excel: =EXP(b0+b1*x)/(1+EXP(b0+b1*x))
where P(y=1) is the probability of the event occurring, b0 is the intercept, and b1 is the coefficient for predictor x.
Logit (Log Odds)
logit(p) = ln(p / (1 − p))
📊 Excel: =LN(p/(1-p))
The logit transforms a probability p into log odds. This transformation is the link function that makes logistic regression a linear model on the log-odds scale.

The logit function is the inverse of the logistic function. While the logistic function maps any real number to a probability between 0 and 1, the logit maps a probability back to the real line. This duality is what makes logistic regression work: the model is linear in the log-odds, but non-linear in the probability space.
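
To see this duality concretely, take p = 0.80. The logit is ln(0.80 / 0.20) = ln(4) ≈ 1.386, and running 1.386 back through the logistic function recovers the original probability. A quick check in Excel (the values here are illustrative, not from a fitted model):

📊 Excel: =LN(0.8/(1-0.8)) returns ≈ 1.386 (probability to log odds)
📊 Excel: =EXP(1.386)/(1+EXP(1.386)) returns ≈ 0.800 (log odds back to probability)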

In Excel, logistic regression coefficients are typically estimated with the Solver add-in, by choosing b0 and b1 to maximize the log-likelihood of the observed data. The Analysis ToolPak's Regression tool fits only least-squares (linear) models, so it cannot estimate a logistic regression directly.
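
A minimal Solver setup can be sketched as follows; the cell layout is an assumption for illustration. Suppose the 0/1 outcomes are in A2:A101, the predictor values in B2:B101, and trial coefficients b0 and b1 in cells F1 and F2 (seeded with 0).

📊 Excel (predicted probability, entered in C2 and copied down): =EXP($F$1+$F$2*B2)/(1+EXP($F$1+$F$2*B2))
📊 Excel (row's log-likelihood contribution, entered in D2 and copied down): =A2*LN(C2)+(1-A2)*LN(1-C2)
📊 Excel (total log-likelihood, in F3): =SUM(D2:D101)

Run Solver with the objective F3 set to Max, changing cells F1:F2, and the GRG Nonlinear solving method. The values Solver converges to are the maximum-likelihood estimates of b0 and b1.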

🏪 NorthStar Enterprises

NorthStar Enterprises wants to predict customer churn — whether a customer will leave within the next quarter. The outcome is binary: churned (1) or retained (0). Using logistic regression, the team models churn probability based on purchase frequency, months since last order, customer satisfaction score, and contract type.

A linear regression would predict values like -0.15 or 1.30 for some customers — impossible probabilities. Logistic regression constrains every prediction to a valid probability between 0 and 1.

6.2 Interpreting Logistic Regression

Interpreting logistic regression coefficients requires thinking on the odds scale rather than the probability scale. Each coefficient represents the change in the log odds of the outcome for a one-unit increase in the predictor. To make this more intuitive, we convert to the odds ratio by exponentiating the coefficient.

Odds Ratio
OR = e^b
📊 Excel: =EXP(b)
The odds ratio tells you how the odds of the outcome change for each one-unit increase in the predictor. OR > 1 means the odds increase; OR < 1 means the odds decrease.

Interpreting Odds Ratios

  • OR = 1: No effect — the predictor does not change the odds of the outcome.
  • OR > 1: Each unit increase in the predictor increases the odds of the event. For example, OR = 2.0 means the odds double.
  • OR < 1: Each unit increase in the predictor decreases the odds. For example, OR = 0.5 means the odds are cut in half.

A useful shortcut: to express the percentage change in odds, compute (OR − 1) × 100. An OR of 1.35 means a 35% increase in odds; an OR of 0.71 means a 29% decrease in odds.
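
In Excel, assuming for illustration that the raw coefficient b sits in cell B2, the percentage change in odds is a one-line formula:

📊 Excel: =(EXP(B2)-1)*100

For NorthStar's purchase-frequency coefficient of −0.34 (discussed next), this returns approximately −29: a 29% drop in the odds of churn per additional monthly purchase.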

🏪 NorthStar Enterprises

In NorthStar's churn model, the coefficient for purchase frequency (monthly purchases) is b = −0.34. The odds ratio is:

OR = e^(−0.34) ≈ 0.71

This means each additional purchase per month reduces the odds of churn by 29%. Customers who buy more frequently are substantially less likely to leave. This finding supports NorthStar's loyalty program, which incentivizes repeat purchases.

✓ Check Your Understanding
A logistic regression model yields an odds ratio of 2.5 for a predictor variable. What does this mean?
  • The probability of the outcome doubles
  • The odds of the outcome are 2.5 times higher for each one-unit increase in the predictor
  • The predictor explains 2.5% of the variance
  • The result is significant at the 2.5% level
💡 Key Takeaway

Odds ratios are the natural currency of logistic regression interpretation. An OR of 2.5 does not mean the probability is 2.5 times higher — it means the odds are 2.5 times higher. The distinction between odds and probability matters: odds can exceed 1, but probability cannot.

6.3 Model Fit and Classification

Unlike linear regression, where R-squared measures model fit, logistic regression has no single equivalent statistic; in practice, performance is assessed with classification-based metrics. After the model produces predicted probabilities, we apply a threshold (typically 0.5) to convert probabilities into predicted classes: at or above the threshold is predicted as 1, below as 0. Comparing these predictions to actual outcomes produces the confusion matrix.
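
In Excel, assuming for illustration that predicted probabilities are in column A, actual outcomes in column B, and the threshold in cell $E$1 (so it can be adjusted later without editing formulas), the predicted class for row 2 is a single IF, copied down column C:

📊 Excel: =IF(A2>=$E$1,1,0)

Each cell of the confusion matrix is then one COUNTIFS over the two class columns, e.g., =COUNTIFS(B2:B801,1,C2:C801,1) counts true positives (ranges illustrative).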

The Confusion Matrix

                    Predicted: Stay    Predicted: Churn
Actual: Stay        TN = 430           FP = 70
Actual: Churn       FN = 63            TP = 237

Key Metrics

  • Accuracy = (TP + TN) / Total — overall proportion of correct predictions.
  • Sensitivity (Recall) = TP / (TP + FN) — proportion of actual positives correctly identified. Critical when missing a positive case is costly.
  • Specificity = TN / (TN + FP) — proportion of actual negatives correctly identified.
  • ROC Curve and AUC: The Receiver Operating Characteristic curve plots sensitivity vs. (1 − specificity) at all thresholds. The Area Under the Curve (AUC) ranges from 0.5 (no discrimination) to 1.0 (perfect discrimination).
Accuracy
📊 Excel: =(TP+TN)/(TP+TN+FP+FN)
where TP = true positives, TN = true negatives, FP = false positives, FN = false negatives.
Sensitivity (Recall)
📊 Excel: =TP/(TP+FN)
Sensitivity measures how well the model identifies positive cases (e.g., churned customers).
Specificity
📊 Excel: =TN/(TN+FP)
Specificity measures how well the model identifies negative cases (e.g., retained customers).
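
Plugging the counts from the confusion matrix above into these formulas (total = 430 + 70 + 63 + 237 = 800):

Accuracy = (237 + 430) / 800 = 667 / 800 ≈ 0.83
Sensitivity = 237 / (237 + 63) = 237 / 300 = 0.79
Specificity = 430 / (430 + 70) = 430 / 500 = 0.86

These are the test-set figures NorthStar reports below.
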
🏪 NorthStar Enterprises

NorthStar's churn model achieved the following on the test data:

  • Accuracy: 83% — the model correctly classifies 83% of customers.
  • Sensitivity: 79% — it catches 79% of customers who actually churn.
  • Specificity: 86% — it correctly identifies 86% of customers who stay.

The slightly lower sensitivity means some churning customers slip through. Since the cost of losing a customer (lifetime value) far exceeds the cost of a retention offer, NorthStar decided to lower the classification threshold from 0.5 to 0.4 — catching more true churners at the expense of some false alarms.

💡 Key Takeaway

The cost of false negatives versus false positives should drive your choice of classification threshold. In churn prediction, missing a churner (false negative) is typically far more expensive than sending a retention offer to a loyal customer (false positive). Adjust the threshold to align with business priorities, not just to maximize overall accuracy.

6.4 Chapter Summary

Logistic regression extends regression analysis to binary outcomes, opening the door to classification problems that pervade business decision-making.

💡 Chapter 6 Summary

Binary Outcomes: Logistic regression models the probability of a binary event using the logistic (sigmoid) function, keeping predictions between 0 and 1.

Interpretation: Coefficients represent changes in log odds. Convert to odds ratios (e^b) for intuitive interpretation. OR > 1 increases odds; OR < 1 decreases odds.

Classification: The confusion matrix, accuracy, sensitivity, specificity, and AUC measure model performance. Adjust the classification threshold based on the relative cost of false positives vs. false negatives.

📋 Chapter 6 — Formula Reference
Measure              Formula                                     Excel Function
Logistic Function    P(y=1) = e^(b0+b1·x) / (1 + e^(b0+b1·x))    =EXP(b0+b1*x)/(1+EXP(b0+b1*x))
Logit                ln(p / (1 − p))                             =LN(p/(1-p))
Odds Ratio           e^b                                         =EXP(b)
Accuracy             (TP + TN) / (TP + TN + FP + FN)             =(TP+TN)/(TP+TN+FP+FN)
Sensitivity          TP / (TP + FN)                              =TP/(TP+FN)
Specificity          TN / (TN + FP)                              =TN/(TN+FP)
Up Next
Chapter 7: Forecasting and Time Series