Comparing Data Sets
Using measures of center and variability to compare two numerical data distributions.
Key Questions
- When is the median a better measure of center than the mean?
- How does the overlap of two data sets affect our ability to say they are meaningfully different?
- Why does the range or interquartile range matter when comparing two groups?
About This Topic
Comparing two data distributions is one of the central applications of statistical reasoning in 7th grade. Rather than analyzing a single data set in isolation, students use measures of center and variability together to draw conclusions about how two groups differ or overlap. This is directly tied to CCSS 7.SP.B.3, which asks students to informally assess the degree of visual overlap between two distributions, and 7.SP.B.4, which asks them to use measures of center and variability to draw informal comparative inferences about two populations.
The key insight is that two groups can have similar centers but very different variability, or vice versa. A difference in means only tells part of the story. When data sets overlap heavily, even a noticeable mean difference may not represent a meaningful distinction between groups. Students develop language for describing these comparisons: 'Group A's median is about 5 points higher than Group B's, and there is minimal overlap between the two distributions,' versus 'While the means differ, the ranges overlap substantially.'
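To make this concrete, here is a minimal Python sketch with two made-up groups of scores. The data and the names `group_a` and `group_b` are illustrative assumptions, not results from a real class. Both groups share the same mean and median, yet their ranges differ by a factor of five.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
group_a = [70, 72, 74, 75, 76, 78, 80]
group_b = [50, 60, 70, 75, 80, 90, 100]

for name, scores in (("Group A", group_a), ("Group B", group_b)):
    spread = max(scores) - min(scores)  # range = maximum - minimum
    print(f"{name}: mean={mean(scores)}, median={median(scores)}, range={spread}")
```

Both groups are centered at 75, but Group A's range is 10 while Group B's is 50, so reporting the center alone would hide the main difference between them.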
Active learning is especially powerful for this topic because comparison requires interpretation, not just calculation. Structured discussions and peer argumentation push students to move beyond simply reporting numbers toward making evidence-based claims about differences between groups.
Learning Objectives
- Calculate the mean, median, and interquartile range for two different numerical data sets.
- Compare the measures of center (mean and median) and measures of variability (range and IQR) for two data sets using precise language.
- Evaluate the degree of overlap between two data distributions and explain how it impacts conclusions about their differences.
- Construct a written or verbal argument justifying whether two data sets represent meaningfully different groups, using calculated statistics and visual representations as evidence.
Before You Start
Why: Students need to be able to accurately calculate the mean and median before they can compare them across data sets.
Why: Students must understand how to find the range and interquartile range to compare the spread of data distributions.
Why: Students benefit from visual representations of data to understand overlap and variability intuitively before formal calculations.
Key Vocabulary
| Term | Definition |
| --- | --- |
| Mean | The average of a data set, calculated by summing all values and dividing by the number of values. It can be sensitive to extreme values. |
| Median | The middle value in a data set when the values are ordered from least to greatest (the average of the two middle values when the count is even). It is resistant to extreme values, which makes it a good measure of center for skewed data. |
| Range | The difference between the maximum and minimum values in a data set. It provides a simple measure of the spread of the data. |
| Interquartile Range (IQR) | The difference between the third quartile (75th percentile) and the first quartile (25th percentile) of a data set. It measures the spread of the middle 50% of the data and is less affected by outliers than the range. |
| Overlap | The extent to which two data sets cover the same span of values. Substantial overlap suggests the groups may not be meaningfully different. |
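The four numerical measures defined above can be sketched in a few lines of Python. This is one possible approach, not a prescribed method: the helper names `quartiles` and `summarize` and the sample scores are hypothetical, and the quartiles use the median-of-halves convention (leaving out the middle value when the count is odd) that many middle school texts follow. Software packages often use other quartile conventions and may report slightly different IQRs.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. With an odd count, the middle value
    is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(upper)

def summarize(values):
    """Collect the measures of center and variability for one data set."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

# Hypothetical test scores, invented for illustration only.
scores = [62, 70, 74, 75, 78, 80, 85, 96]
print(summarize(scores))
```

For these eight scores the mean is 77.5, the median is 76.5, the range is 34, and the IQR is 82.5 - 72 = 10.5. Calling `summarize` on each of two groups gives the numbers students need for a side-by-side comparison.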
Active Learning Ideas
Structured Academic Controversy: Which Group Performed Better?
Provide pairs of dot plots or box plots comparing two groups (e.g., Class A vs. Class B test scores). Groups are assigned a position (Class A scored better / Class B scored better) and must support their claim using center and variability measures. After arguing their position, groups switch sides and argue the opposite view, then reach a consensus.
Think-Pair-Share: Does Overlap Matter?
Present two pairs of dot plots: one pair with clearly separated distributions and one pair with significant overlap, both with the same difference in means. Students individually describe what the overlap tells them, compare with a partner, then share with the class why overlap changes the interpretation.
Data Analysis Station Rotation
Set up four stations, each with a different real-world comparison (e.g., heights of plants in two conditions, scores from two classes, speeds in two trials). Groups rotate every 8 minutes, recording center and variability measures and writing one comparison sentence at each station. A final debrief connects all four comparisons.
Real-World Connections
Sports analysts compare statistics like batting averages or points per game between two teams or players to determine performance differences. They consider not just the average but also how consistent each player's performance is over a season.
Environmental scientists compare temperature readings or pollution levels from two different geographic locations to assess environmental impact or climate change. They look at both average conditions and the variability to understand if one location is consistently more extreme than the other.
Market researchers compare customer satisfaction scores for two different product versions. They analyze the average scores and the spread of responses to decide which product is performing better overall and for most customers.
Watch Out for These Misconceptions
Common Misconception: If one group has a higher mean, that group is definitively better or different.
What to Teach Instead
A higher mean matters more when the two distributions have low overlap. When distributions overlap heavily, the mean difference may be within the natural variability of the data and may not represent a meaningful distinction. Students need to consider both center and spread together.
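One informal way to put a number on overlap is sketched below in Python. The functions `shared_interval` and `share_in_overlap` are hypothetical helpers written for this illustration, not a standard statistic, and the data sets are made up. The sketch finds the span of values covered by both groups and the fraction of all data points that fall inside it.

```python
def shared_interval(a, b):
    """Return (low, high) of the values spanned by both groups, or None if the ranges do not meet."""
    low = max(min(a), min(b))
    high = min(max(a), max(b))
    return (low, high) if low <= high else None

def share_in_overlap(a, b):
    """Fraction of all data points that lie inside the shared interval."""
    interval = shared_interval(a, b)
    if interval is None:
        return 0.0
    low, high = interval
    combined = list(a) + list(b)
    inside = sum(low <= x <= high for x in combined)
    return inside / len(combined)

# Two hypothetical pairs of groups, each pair with means of 70 and 80.
tight_a, tight_b = [68, 69, 70, 71, 72], [78, 79, 80, 81, 82]
wide_a, wide_b = [50, 60, 70, 80, 90], [60, 70, 80, 90, 100]

print(share_in_overlap(tight_a, tight_b))  # 0.0
print(share_in_overlap(wide_a, wide_b))    # 0.8
```

Both pairs have the same 10-point difference in means, but in the first pair no values are shared, while in the second pair 8 of the 10 data points sit in the shared span from 60 to 90. The same mean difference supports a much stronger claim in the first case.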
Common Misconception: Variability doesn't matter if you're comparing centers.
What to Teach Instead
Variability provides essential context for interpreting mean or median differences. A 10-point difference in means carries very different weight when the IQRs are 5 than when they are 40. Ignoring variability leads to overconfident comparisons.
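A simple way to express this is to write the difference in centers as a multiple of the spread. The Python sketch below assumes both groups have roughly the same IQR. The function name `gap_in_spreads` and the centers of 80 and 70 are hypothetical, chosen only to produce the 10-point difference discussed above.

```python
def gap_in_spreads(center_a, center_b, spread):
    """Express the difference between two centers as a multiple of a shared measure of spread."""
    return abs(center_a - center_b) / spread

# Hypothetical centers 10 points apart, paired with the two IQRs from the text.
print(gap_in_spreads(80, 70, 5))   # 2.0
print(gap_in_spreads(80, 70, 40))  # 0.25
```

With an IQR of 5, the centers are two full IQRs apart, so the middle halves of the two groups do not overlap. With an IQR of 40, the same 10 points is only a quarter of an IQR, and the groups overlap heavily.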
Assessment Ideas
Provide students with two small data sets (e.g., test scores for two different classes). Ask them to calculate the mean, median, and range for each set. Then, have them write one sentence comparing the centers and one sentence comparing the spreads.
Present students with two box plots showing the heights of two different plant species. Ask: 'Based on these box plots, can we confidently say that one plant species is taller than the other? Explain your reasoning, referring to the median, IQR, and any overlap you observe.'
Show students two dot plots of student performance on a task. Ask them to identify which data set has a larger median and which has a larger range. Then, ask them to describe the overlap between the two data sets in their own words.
Frequently Asked Questions
How do you compare two data sets using measures of center and variability?
When is median a better measure than mean for comparing groups?
What does it mean when two data distributions overlap?
Why is active learning effective for teaching data set comparisons?