Comparing Data Sets

Using measures of center and variability to compare two numerical data distributions.

Key Questions

  1. When is the median a better measure of center than the mean?
  2. How does the overlap of two data sets affect our ability to say they are significantly different?
  3. Why does the range or interquartile range matter when comparing two groups?

Common Core State Standards

CCSS.Math.Content.7.SP.B.3, CCSS.Math.Content.7.SP.B.4
Grade: 7th Grade
Subject: Mathematics
Unit: Probability and Statistics
Period: Weeks 28-36

About This Topic

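Comparing two data distributions is one of the central applications of statistical reasoning in 7th grade. Rather than analyzing a single data set in isolation, students use measures of center and variability together to draw conclusions about how two groups differ or overlap. This work maps directly to two standards. CCSS 7.SP.B.3 asks students to informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, expressing the difference between their centers as a multiple of a measure of variability. CCSS 7.SP.B.4 asks students to use measures of center and variability from random samples to draw informal comparative inferences about two populations.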

The key insight is that two groups can have similar centers but very different variability, or vice versa. A difference in means only tells part of the story. When data sets overlap heavily, even a noticeable mean difference may not represent a meaningful distinction between groups. Students develop language for describing these comparisons: 'Group A's median is about 5 points higher than Group B's, and there is minimal overlap between the two distributions,' versus 'While the means differ, the ranges overlap substantially.'

Active learning is especially powerful for this topic because comparison requires interpretation, not just calculation. Structured discussions and peer argumentation push students to move beyond simply reporting numbers toward making evidence-based claims about differences between groups.

Learning Objectives

  • Calculate the mean, median, and interquartile range for two different numerical data sets.
  • Compare the measures of center (mean and median) and measures of variability (range and IQR) for two data sets using precise language.
  • Evaluate the degree of overlap between two data distributions and explain how it impacts conclusions about their differences.
  • Construct a written or verbal argument justifying whether two data sets represent significantly different groups, using calculated statistics and visual representations as evidence.

Before You Start

Calculating Measures of Center

Why: Students need to be able to accurately calculate the mean and median before they can compare them across data sets.

Calculating Measures of Variability

Why: Students must understand how to find the range and interquartile range to compare the spread of data distributions.

Data Visualization (Dot Plots, Box Plots)

Why: Students benefit from visual representations of data to understand overlap and variability intuitively before formal calculations.

Key Vocabulary

Mean: The average of a data set, calculated by summing all values and dividing by the number of values. It can be pulled strongly by extreme values.
Median: The middle value of a data set when the values are ordered from least to greatest (the average of the two middle values when there is an even number of values). It is resistant to extreme values, which makes it a good measure of center for skewed data.
Range: The difference between the maximum and minimum values in a data set. It provides a simple measure of the spread of the data.
Interquartile Range (IQR): The difference between the third quartile (75th percentile) and the first quartile (25th percentile) of a data set. It measures the spread of the middle 50% of the data and is less affected by outliers than the range.
Overlap: The extent to which the values of one data set fall within the same interval of values as another data set. Heavy overlap suggests the groups may not be substantially different.
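
The four summary statistics defined above can be computed with a short Python sketch. It uses the standard library's `statistics.mean` and `statistics.median`; the quartile helper assumes the "median of halves" convention common in middle-school courses (when the count is odd, the middle value is left out of both halves). Other textbooks and software use different quartile conventions, so their IQR values can differ slightly. The function names `quartiles` and `summarize` and the quiz scores are hypothetical, invented for illustration.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention (assumed):
    when the count is odd, the middle value belongs to neither half."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                      # values below the median
    upper = data[half + len(data) % 2:]      # values above the median
    return median(lower), median(upper)

def summarize(values):
    """Mean, median, range, and IQR for one numerical data set."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

# Hypothetical quiz scores, invented for illustration only.
scores = [4, 6, 7, 7, 8, 9, 10, 12, 18]
print(summarize(scores))
# {'mean': 9, 'median': 8, 'range': 14, 'iqr': 4.5}
```

Calling `summarize` once per group gives the side-by-side numbers students need before they compare centers and spreads.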

Real-World Connections

Sports analysts compare statistics like batting averages or points per game between two teams or players to determine performance differences. They consider not just the average but also how consistent each player's performance is over a season.

Environmental scientists compare temperature readings or pollution levels from two different geographic locations to assess environmental impact or climate change. They look at both average conditions and the variability to understand if one location is consistently more extreme than the other.

Market researchers compare customer satisfaction scores for two different product versions. They analyze the average scores and the spread of responses to decide which product is performing better overall and for most customers.

Watch Out for These Misconceptions

Common Misconception: If one group has a higher mean, that group is definitively better or different.

What to Teach Instead

A higher mean matters more when the two distributions have low overlap. When distributions overlap heavily, the mean difference may be within the natural variability of the data and may not represent a meaningful distinction. Students need to consider both center and spread together.

Common Misconception: Variability doesn't matter if you're comparing centers.

What to Teach Instead

Variability provides essential context for interpreting a difference in means or medians. A 10-point difference in means carries very different weight when each group's IQR is about 5 points than when it is about 40 points. Ignoring variability leads to overconfident comparisons.
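
CCSS 7.SP.B.3 has students express the difference between two centers as a multiple of a measure of variability. For the two scenarios in the paragraph above this is a single division; the Python sketch below uses the IQR as the spread measure to match that paragraph (the standard's own example uses mean absolute deviation). The helper name `centers_apart_in_iqrs` is hypothetical.

```python
def centers_apart_in_iqrs(center_difference, iqr):
    """Express a difference in centers as a multiple of the IQR."""
    return center_difference / iqr

# Same 10-point difference in means, with IQRs of about 5 and about 40 points.
for iqr in (5, 40):
    ratio = centers_apart_in_iqrs(10, iqr)
    print(f"IQR of {iqr}: centers are {ratio:.2f} IQRs apart")
# IQR of 5: centers are 2.00 IQRs apart
# IQR of 40: centers are 0.25 IQRs apart
```

The same 10-point gap is two full IQRs in the first case but only a quarter of an IQR in the second, which is why the spread has to be read alongside the centers.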

Assessment Ideas

Exit Ticket

Provide students with two small data sets (e.g., test scores for two different classes). Ask them to calculate the mean, median, and range for each set. Then, have them write one sentence comparing the centers and one sentence comparing the spreads.

Discussion Prompt

Present students with two box plots showing the heights of two different plant species. Ask: 'Based on these box plots, can we confidently say that one plant species is taller than the other? Explain your reasoning, referring to the median, IQR, and any overlap you observe.'

Quick Check

Show students two dot plots of student performance on a task. Ask them to identify which data set has a larger median and which has a larger range. Then, ask them to describe the overlap between the two data sets in their own words.

Frequently Asked Questions

How do you compare two data sets using measures of center and variability?
Calculate the mean or median for each group to see which tends to be higher, then compare their ranges or IQRs to understand how spread out each group is. Also note how much the two distributions overlap: heavy overlap suggests the groups are more similar than the difference in centers implies.
When is the median a better measure than the mean for comparing groups?
The median is better when one or both distributions contain outliers or are skewed, because the mean gets pulled toward extreme values. Comparing medians gives a more accurate sense of what's typical for each group when the data isn't symmetric.
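
To see the mean get pulled toward an extreme value while the median barely moves, here is a small Python sketch using the standard library's `statistics` module. The reading-minutes data are hypothetical, invented for illustration.

```python
from statistics import mean, median

# Hypothetical daily reading minutes for one group, invented for illustration.
typical = [20, 25, 25, 30, 30, 35]
with_outlier = typical + [185]   # add one extreme value

print(mean(typical), median(typical))            # 27.5 27.5
print(mean(with_outlier), median(with_outlier))  # 50 30
```

Adding one extreme value raises the mean from 27.5 to 50 minutes but moves the median only from 27.5 to 30, so the median still describes a typical day for this group.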
What does it mean when two data distributions overlap?
Overlap means that many individual values from both groups fall in the same range. A lot of overlap suggests the two groups are quite similar, even if their centers differ slightly. Less overlap means the groups are more meaningfully different from each other.
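
One simple, informal way to put a number on overlap is to count what share of each group's values fall inside the other group's range. This is a classroom heuristic assumed for illustration, not a standard statistic named in the text; the function name `share_within_range` and the plant heights are hypothetical.

```python
def share_within_range(values, other):
    """Fraction of `values` lying between min(other) and max(other), inclusive.
    An informal overlap heuristic, assumed here for illustration."""
    low, high = min(other), max(other)
    inside = [v for v in values if low <= v <= high]
    return len(inside) / len(values)

# Hypothetical plant heights in cm, invented for illustration.
species_a = [12, 14, 15, 15, 16, 18, 19, 20]
species_b = [17, 19, 21, 21, 22, 23, 24, 25]

print(share_within_range(species_a, species_b))  # 0.375
print(share_within_range(species_b, species_a))  # 0.25
```

Because the range depends only on the two most extreme values, a single outlier can inflate this share; swapping in the other group's Q1 and Q3 as the bounds gives a version that compares against the middle half instead.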
Why is active learning effective for teaching data set comparisons?
Comparing data distributions requires judgment and argumentation, not just arithmetic. When students debate which group 'really' performed better using the same data, they encounter the complexity of statistical interpretation: the kind of reasoning that is hard to develop just by watching a teacher solve problems.