Chapter 2

Frequency Distributions & Data Visualization


1. Frequency Distributions and Class Intervals

Raw data is rarely useful in its original form. When LakeFront Retail Co. collects 200 individual transaction values in a single day, staring at a column of 200 numbers reveals almost nothing. A frequency distribution solves this problem by organizing the data into groups (called classes or bins) and counting how many observations fall into each group.

A well-constructed frequency distribution answers the question: how are the data values spread across the range? It tells us where most transactions cluster, whether the values are evenly spread or bunched in certain regions, and whether there are gaps or unusual concentrations.

Building a Frequency Distribution

The process involves three key steps:

  1. Determine the range — the difference between the largest and smallest values in the dataset.
  2. Choose the number of classes — too few classes hide patterns; too many create noise. A common guideline is 5 to 20 classes, depending on the dataset size.
  3. Calculate the class width — divide the range by the number of classes and round up to a convenient number.
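The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method. It needs only the smallest value, the largest value, and n. The $5 minimum, $180 maximum, and n = 200 are the LakeFront figures used later in this section. The function names are hypothetical. Sturges' rule, listed in the chapter's formula reference, appears as one way to choose k.

```python
import math

def data_range(min_value, max_value):
    """Step 1: the range is the largest value minus the smallest value."""
    return max_value - min_value

def sturges_classes(n):
    """Sturges' rule, one guideline for step 2: k = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(min_value, max_value, k):
    """Step 3: range divided by the number of classes, rounded up to a whole number."""
    return math.ceil(data_range(min_value, max_value) / k)

# LakeFront's Saturday data: 200 transactions from $5 to $180
print(data_range(5, 180))      # 175
print(sturges_classes(200))    # 9
print(class_width(5, 180, 8))  # 22 (175 / 8 = 21.875, rounded up)
```

For positive numbers, `math.ceil` does the same job as Excel's `ROUNDUP(...,0)`. For n = 200, Sturges' rule suggests 9 classes. That is close to the 8 used in this section, and both fall inside the 5 to 20 guideline.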

Relative and Cumulative Frequency

Once you have raw counts, two additional columns unlock deeper insight:

  • Relative frequency — the proportion of total observations in each class. Calculated as frequency divided by total n. These proportions sum to 1.0 (or 100%).
  • Cumulative frequency — a running total of frequencies from the lowest class upward. The cumulative frequency of the last class always equals n. It tells you how many observations fall in a given class or in any class below it.
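The sketch below builds all three columns: frequency, relative frequency, and cumulative frequency. It assumes each class is a half-open interval [lower, upper), which matches the COUNTIFS formulas in the chapter's formula reference. One exception: a value sitting exactly on the top edge of the last class is still counted, so no observation is dropped. The 12 values are made-up numbers for illustration, not LakeFront data, and `frequency_table` is a hypothetical helper.

```python
def frequency_table(values, start, width, k):
    """Return (lower, upper, freq, rel_freq, cum_freq) rows for k equal-width classes.

    Classes are half-open intervals [lower, upper); a value equal to the
    upper limit of the last class is counted in that last class.
    """
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(k):
        lower = start + i * width
        upper = lower + width
        is_last = i == k - 1
        freq = sum(1 for v in values
                   if lower <= v < upper or (is_last and v == upper))
        cumulative += freq                       # running total from the lowest class up
        rows.append((lower, upper, freq, freq / n, cumulative))
    return rows

# Hypothetical transaction values in dollars (illustration only)
sample = [5, 12, 18, 25, 31, 33, 40, 47, 52, 68, 90, 140]
for lower, upper, f, rel, cum in frequency_table(sample, start=5, width=45, k=3):
    print(f"${lower}-${upper}: f={f}, rel={rel:.3f}, cum={cum}")
```

Two checks follow from the definitions above: the relative frequencies sum to 1, and the last cumulative frequency equals n (12 here).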
🏪 LakeFront Retail Co.

LakeFront Retail Co. recorded 200 transaction values across all stores on a busy Saturday. Transaction values range from $5 to $180, capturing everything from small impulse buys to large electronics purchases. Management wants to understand the distribution of transaction sizes to optimize pricing tiers, promotional targeting, and inventory planning.

The table below groups these 200 transactions into classes. Try different numbers of classes to see how the frequency distribution changes as the classes become wider or narrower.

Class Width = Range ÷ k, rounded up
📊 Excel: =ROUNDUP((MAX(range)-MIN(range))/k,0)
where range is the cell range holding the data (so MAX(range)-MIN(range) is the difference between the largest and smallest values), and k is the desired number of classes (k = 8 in the table below).
LakeFront Retail Co. — Transaction Frequency Distribution (n = 200)
Class Interval Frequency Relative Freq. Cumulative Freq.
✓ Check Your Understanding
LakeFront has transaction values ranging from $5 to $185 and wants 8 classes. What is the class width?
$20
$22.50
$25
$18

2. Histograms

A histogram is the visual counterpart of a frequency distribution. It represents each class as a bar whose height equals the frequency (or relative frequency) of that class. Unlike a bar chart — where bars represent distinct categories and have gaps between them — histogram bars are adjacent with no gaps, because the classes represent continuous ranges of a numeric variable.

Histogram vs. Bar Chart

This distinction matters. A bar chart compares categories (stores, products, regions), and the order of the bars can be rearranged without changing the meaning. A histogram shows the distribution of a continuous variable, and the bars must follow the natural order of the numeric scale. Swapping bars in a histogram would destroy its meaning.

Reading Histogram Shape

The shape of a histogram reveals critical information about the data:

  • Symmetric: The data form a rough mirror image around the center. The mean and median are approximately equal.
  • Right-skewed (positively skewed): A long tail extends to the right. The mean is pulled higher than the median. Common in income data and transaction values.
  • Left-skewed (negatively skewed): A long tail extends to the left. The mean is pulled lower than the median.
  • Bimodal: Two distinct peaks suggest the data may come from two different groups.
Figure: LakeFront Transaction Values histogram (class width $22)
✎ Worked Example: Building a Histogram Step by Step
  1. Start with the frequency distribution table from Section 1 (using 8 classes).
  2. Draw the horizontal axis (x-axis) labeled with transaction values, and the vertical axis (y-axis) labeled with frequency.
  3. For each class, draw a bar from the lower boundary to the upper boundary with a height equal to the frequency. Bars must touch, with no gaps between them.
  4. Interpret: The tallest bars show where most transactions cluster. If the right side has a longer tail, the distribution is right-skewed, meaning a few large transactions pull the average above the typical value.
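If no charting tool is at hand, a text histogram follows the same steps. In this minimal sketch, each class becomes one row of `#` characters whose length equals its frequency. The bars run sideways, and the rows keep the natural numeric order of the classes. The eight frequencies are hypothetical placeholders chosen to total 200, not the actual LakeFront counts, and `text_histogram` is a hypothetical helper.

```python
def text_histogram(lower_start, width, freqs, symbol="#"):
    """Build one text row per class, in ascending order of the numeric scale.

    Bar length equals the class frequency. Consecutive classes share a
    boundary, so the rows cover adjacent intervals with no gaps.
    """
    lines = []
    for i, f in enumerate(freqs):
        lower = lower_start + i * width
        upper = lower + width
        lines.append(f"{lower:>4}-{upper:<4}| {symbol * f} ({f})")
    return lines

# Hypothetical frequencies for 8 classes of width $22 starting at $5 (total 200)
freqs = [30, 48, 41, 29, 20, 15, 10, 7]
for line in text_histogram(5, 22, freqs):
    print(line)
```

With these made-up counts, the bars shrink steadily toward the high-value classes. That long right tail is the right-skewed pattern described in step 4.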
📊 How to Create a Histogram in Excel
  1. Enter your raw data in a single column (e.g., all 200 transaction values in column A).
  2. Select the data range: click cell A1 and press Ctrl+Shift+Down Arrow to select the whole column of data, or manually highlight the range A1:A200.
  3. Go to the Insert tab → click Insert Statistic Chart (the icon with vertical bars and a curve) → select Histogram.
  4. Adjust bin width: Right-click the horizontal axis → Format Axis → under Axis Options (Bins), choose Bin width and type your desired width (e.g., 22), or choose Number of bins and enter 8.
  5. Format the chart: Click the chart title to rename it (e.g., "Transaction Value Distribution"). Right-click bars to change fill color. Use the Chart Design tab to pick a style.
  6. Alternative (Analysis ToolPak): Go to the Data tab → Data Analysis → Histogram. If Data Analysis does not appear, enable the Analysis ToolPak under File → Options → Add-ins. Enter your data range and an optional bin range (a column of upper class boundaries you define). Check Chart Output → click OK. This method gives you more control over exact bin edges.
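The bin range in the Analysis ToolPak follows a different convention from the half-open [lower, upper) classes in the chapter's COUNTIFS formulas. Excel's FREQUENCY function and the ToolPak Histogram put a value in a bin when the value is greater than the previous boundary and less than or equal to that bin's own boundary. Values above the last boundary go into an extra "More" bin. The sketch below mimics that rule under this assumption. The boundaries and values are hypothetical, and `toolpak_style_counts` is a hypothetical helper.

```python
from bisect import bisect_left

def toolpak_style_counts(values, upper_bounds):
    """Count values into (previous bound, this bound] bins, plus a final "More" bin.

    upper_bounds must be sorted ascending. bisect_left returns the index of the
    first bound that is >= the value, i.e. the bin that value belongs to.
    """
    counts = [0] * (len(upper_bounds) + 1)  # last slot is the "More" bin
    for v in values:
        counts[bisect_left(upper_bounds, v)] += 1
    return counts

bins = [27, 49, 71]                 # hypothetical upper class boundaries
data = [5, 27, 28, 49, 50, 71, 72]  # hypothetical values, some exactly on a boundary
print(toolpak_style_counts(data, bins))
```

A value of exactly 27 lands in the first bin under this rule. With COUNTIFS and [lower, upper) classes, the same value would belong to the next class. So if your data include values that fall exactly on class boundaries, check which convention a tool uses before comparing counts across tools.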
✓ Check Your Understanding
A histogram with a long tail to the right is described as:
Symmetric
Skewed left
Skewed right
Bimodal
🎮 Practice: Distribution Guesser. Test your ability to identify distribution shapes from histograms.

3. Other Business Charts

Histograms are powerful for showing distributions, but they are only one tool in the analyst's toolkit. Different business questions call for different chart types. Choosing the right visualization is itself a critical analytical skill — the wrong chart can obscure patterns or even mislead the audience.

Bar Chart — Comparing Categories

Bar charts display values for distinct categories side by side. Unlike histograms, bars have gaps between them because the categories are discrete. Use bar charts when you want to compare named groups — stores, product lines, departments, or time periods.

Pie Chart — Showing Parts of a Whole

Pie charts show how a total is divided among categories. Each slice represents a proportion. Pie charts work best with a small number of categories (3–6). With too many slices, they become difficult to read. Use them when the emphasis is on the relative share of each category.

Line Chart — Showing Trends Over Time

Line charts connect data points across time to reveal trends, cycles, and seasonal patterns. They are the go-to choice for any time-series data. The x-axis represents time, and the y-axis represents the measured variable.

Scatter Plot — Showing Relationships

Scatter plots place two continuous variables on the x and y axes, with each data point represented as a dot. They are essential for exploring whether two variables are related — for example, whether stores with more square footage tend to have higher revenue.

Box Plot — Summarizing Distributions

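Box plots (also called box-and-whisker plots) display the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The box spans from Q1 to Q3 (the interquartile range, IQR = Q3 − Q1), with a line at the median. In the most common convention, the whiskers extend to the most extreme data values that still lie within 1.5 × IQR of the box, that is, no lower than Q1 − 1.5 × IQR and no higher than Q3 + 1.5 × IQR. Any points beyond those limits are plotted individually as potential outliers. When there are no such points, the whiskers reach the minimum and maximum. Box plots are especially useful for comparing distributions across groups.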
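The quantities behind a box plot can be computed directly. This minimal sketch uses Python's standard `statistics.quantiles` with `method="inclusive"`. Quartile conventions differ between tools, including Excel, so Q1 and Q3 may not match other software exactly. The ten values are hypothetical, and `box_plot_summary` is a hypothetical helper.

```python
import statistics

def box_plot_summary(values):
    """Five-number summary plus 1.5 x IQR outlier fences and whisker ends.

    Quartiles come from statistics.quantiles(method="inclusive"); other
    tools may use a different quartile convention.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        # whiskers stop at the most extreme values inside the fences
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical transaction values; 180 sits far above the rest
print(box_plot_summary([12, 18, 22, 25, 27, 30, 34, 38, 45, 180]))
```

Here Q1 = 22.75 and Q3 = 37.0, so IQR = 14.25 and the upper fence is 58.375. The $180 value is flagged as an outlier, and the upper whisker stops at $45 instead of reaching the maximum.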

📊 How to Create a Bar Chart in Excel
  1. Set up your data in two columns: category names in column A (e.g., store names) and values in column B (e.g., revenue). Include headers in row 1.
  2. Select both columns including headers (e.g., A1:B13).
  3. Go to Insert tab → click Insert Column or Bar Chart → choose Clustered Bar (horizontal) or Clustered Column (vertical).
  4. Sort for impact: Before creating the chart, sort your data from largest to smallest (Data tab → Sort). A sorted bar chart makes comparisons much easier to read.
  5. Add data labels: Click on any bar → click the + icon (Chart Elements) → check Data Labels. This places the exact value on each bar so readers don’t have to estimate from the axis.
📊 How to Create a Pie Chart in Excel
  1. Set up your data in two columns: category names in column A and values (counts or percentages) in column B. Keep it to 3–6 categories for readability.
  2. Select both columns including headers.
  3. Go to Insert tab → click Insert Pie or Doughnut Chart → choose Pie (or Doughnut for a ring variation).
  4. Add percentage labels: Click on any slice → click + (Chart Elements) → Data Labels → More Options → check Percentage and uncheck Value. This shows each slice as a percentage of the total.
  5. Explode a slice for emphasis: Click the pie once to select it, click the slice you want to highlight a second time to select it alone, then drag it slightly outward. This visually separates it from the rest of the pie.
📊 How to Create a Line Chart in Excel
  1. Set up your data with time periods in column A (e.g., Jan, Feb, Mar…) and values in column B (e.g., monthly sales). For multiple series, add more columns (C, D, etc.).
  2. Select the entire data range including headers.
  3. Go to Insert tab → click Insert Line or Area Chart → choose Line with Markers (markers make individual data points visible and easier to read).
  4. Add a trendline: Right-click any data point on the line → Add Trendline → choose Linear to show the overall direction. Check Display Equation on chart if you want to see the trend formula.
  5. Customize the axis: Right-click the y-axis → Format Axis → set the Minimum to a value just below your lowest data point (rather than 0) to make changes more visible. Use this with care: a baseline above zero magnifies small changes, so make the axis scale clear to readers.
📊 How to Create a Scatter Plot in Excel
  1. Set up your data in two columns: the independent variable (x) in column A (e.g., store size in sq ft) and the dependent variable (y) in column B (e.g., revenue). Include headers.
  2. Select both columns including headers.
  3. Go to Insert tab → click Insert Scatter (X, Y) or Bubble Chart → choose Scatter (dots only, no lines connecting them).
  4. Add a trendline: Right-click any data point → Add Trendline → choose Linear. Check Display R-squared value on chart to see how well the line fits the data. An R² close to 1 means a strong linear relationship.
  5. Label axes: Click the + icon (Chart Elements) → check Axis Titles. Always label both axes with the variable name and units (e.g., "Store Size (sq ft)" and "Weekly Revenue ($K)").
📊 How to Create a Box Plot in Excel
  1. Set up your data with each group in a separate column. For example: column A = "West" store revenues, column B = "North" store revenues, column C = "Central", etc. Include group names as headers.
  2. Select all columns of data including headers.
  3. Go to Insert tab → click Insert Statistic Chart (same icon group as Histogram) → choose Box and Whisker.
  4. Customize the series: Right-click any box → Format Data Series. You can switch the quartile calculation between inclusive median and exclusive median, and toggle inner points, mean markers, and the mean line.
  5. Show outlier points: In Format Data Series, check Show outlier points. Any values beyond 1.5×IQR from Q1 or Q3 will appear as individual dots outside the whiskers. These are potential outliers worth investigating.
  6. Note: Box and Whisker charts require Excel 2016 or later (or Microsoft 365). In older versions, you can build a box plot manually using stacked bar charts with hidden segments, but the built-in chart is much easier.
Figures: Bar Chart: Revenue by Store; Pie Chart: Transactions by Category; Line Chart: Monthly Sales Trend; Scatter Plot: Store Size vs Revenue; Box Plot: Transaction Values by Store Region
💡 Key Takeaway

Bar charts compare categories. Histograms show distributions of continuous data. Line charts reveal trends over time. Scatter plots expose relationships between two variables. Box plots summarize distributions and highlight outliers. Choosing the right chart type is as important as calculating the right statistic.

🎮 Practice: Which Chart? Game. Match datasets to the best visualization type.

4. Describing Distribution Shape

Before computing any statistics, the very first thing a good analyst does is visualize the data. The shape of a distribution tells you which summary statistics are appropriate, whether the data needs transformation, and whether there are surprising features like gaps or multiple peaks.

Common Distribution Shapes

There are five fundamental shapes you will encounter repeatedly in business data:

  • Symmetric: The histogram is roughly a mirror image around its center. The mean and median are approximately equal. Example: heights of adults, standardized test scores.
  • Skewed Right (Positive Skew): The right tail is longer. Most values cluster on the left with a few extreme high values pulling the mean above the median. Example: income, home prices, transaction amounts.
  • Skewed Left (Negative Skew): The left tail is longer. Most values cluster on the right with a few extreme low values pulling the mean below the median. Example: age at retirement, exam scores with a hard ceiling.
  • Bimodal: Two distinct peaks. This often indicates the data combines two separate groups. Example: arrival times at a restaurant (lunch and dinner peaks).
  • Uniform: All values are approximately equally likely. The histogram looks flat. Example: rolling a fair die many times, random number generation.

Why Shape Matters

The shape of the distribution directly influences your choice of statistics:

  • For symmetric distributions, the mean and standard deviation are effective summaries.
  • For skewed distributions, the median and IQR are more representative — the mean is pulled toward the tail and may not reflect a typical value.
  • For bimodal distributions, neither the mean nor the median may be meaningful. You should investigate whether the data should be split into subgroups.
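A rough numeric check can sit alongside the histogram, though it cannot replace looking at one. This sketch computes the adjusted Fisher-Pearson sample skewness, the formula behind Excel's SKEW function. It then picks mean and SD or median and IQR using a cutoff of ±0.5. That cutoff is an illustrative assumption, not a standard. The check cannot detect bimodality, which still requires the histogram. The data and helper names are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def suggest_summary(values, cutoff=0.5):
    """Suggest summary statistics from skewness; the cutoff is an assumption."""
    skew = sample_skewness(values)
    if abs(skew) < cutoff:
        return ("roughly symmetric: report mean and SD",
                statistics.fmean(values), statistics.stdev(values))
    q1, median, q3 = statistics.quantiles(values, n=4)
    direction = "right" if skew > 0 else "left"
    return f"skewed {direction}: report median and IQR", median, q3 - q1

# Hypothetical transaction amounts ($): mostly small, two large purchases
amounts = [10, 12, 14, 15, 16, 18, 20, 60, 75]
label, center, spread = suggest_summary(amounts)
print(label, center, spread)
```

For these values the mean (about $26.67) sits well above the median ($16). This is the pull toward the tail described above, which is why the sketch reports the median and IQR instead.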
Figure: example histograms of six shapes (Symmetric; Skewed Right; Skewed Left; Bimodal; Uniform; Bell-Shaped (Normal))
✓ Check Your Understanding
LakeFront's daily sales histogram shows most days clustered around $4,000 with a few days above $8,000. This distribution is:
Symmetric
Skewed left
Skewed right
Bimodal
💡 Key Takeaway

Always visualize your data before calculating statistics. The shape of the distribution tells you which measures of center and spread are most appropriate. Symmetric data works well with mean and standard deviation. Skewed data is better summarized by median and IQR. Bimodal data may need to be split into subgroups before summarizing.

5. Chapter Summary

In this chapter, we learned how to organize raw data into frequency distributions and bring those distributions to life through visualization. Here is what you should take away:

💡 Chapter 2 Summary

Frequency Distributions: Organize raw data into classes to reveal the overall pattern. Class width equals the range divided by the desired number of classes, rounded up to a convenient number. Relative frequency shows proportions; cumulative frequency shows running totals.

Histograms: The visual version of a frequency distribution. Bars are adjacent (no gaps) because the variable is continuous. The shape of the histogram — symmetric, skewed, bimodal, or uniform — is the most important feature to identify.

Chart Selection: Bar charts for categories, histograms for distributions, line charts for trends, scatter plots for relationships, box plots for distribution summaries with outliers. The right chart makes the data story clear; the wrong chart obscures it.

Distribution Shape: Shape determines which statistics to use. Symmetric data works with mean and standard deviation. Skewed data is better served by median and IQR. Always look at the data before computing.

📋 Chapter 2 — Formula Reference
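Measure | Formula | Excel Function
Range | Max − Min | =MAX(range)-MIN(range)
Number of Classes (Sturges) | k = 1 + 3.322 × log₁₀(n), rounded up | =ROUNDUP(1+3.322*LOG10(COUNT(range)),0)
Class Width | (Max − Min) ÷ k, rounded up | =ROUNDUP((MAX(range)-MIN(range))/k,0)
Frequency | count of values x with lower ≤ x < upper | =COUNTIFS(range,">="&lower,range,"<"&upper)
Relative Frequency | frequency ÷ n | =COUNTIFS(range,">="&lower,range,"<"&upper)/COUNT(range)
Cumulative Frequency | running total of frequencies through the current class | =COUNTIF(range,"<"&upper)
For the last class, use "<=" instead of "<" so that a maximum value falling exactly on the upper limit is still counted.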
Up Next
Chapter 3: Probability Fundamentals