Comparing Data Sets using Box Plots and Histograms
Using visual displays and summary statistics to compare two or more data sets.
About This Topic
Comparing data sets is about looking at the 'big picture' of different groups to see how they differ. Students use box plots (box-and-whisker plots) and histograms to compare the spread, center, and skewness of data. They learn to use the median and interquartile range (IQR) to make robust comparisons, especially when data is skewed or contains outliers.
In the Year 10 Australian Curriculum, this is where statistics becomes a tool for argument. Students don't just calculate numbers; they use them to answer questions like 'Is the girls' team more consistent than the boys' team?' or 'Has the new teaching method improved results?'. This topic comes alive when students can compare real-world data sets that matter to them. Students grasp this concept faster through structured discussion and peer explanation where they must 'defend' a population based on its box plot features.
Key Questions
- Explain how visual displays can be used to argue that two populations are significantly different.
- Compare the central tendency and spread of two data sets based on their box plots.
- Critique the effectiveness of different graphical displays for comparing data sets.
Learning Objectives
- Compare the distribution, center, and spread of two or more data sets using summary statistics and visual displays.
- Critique the suitability of box plots and histograms for representing and comparing different types of data distributions.
- Analyze visual displays of data to identify potential outliers and assess the symmetry or skewness of data sets.
- Formulate arguments about differences between populations based on statistical evidence from comparative displays.
Before You Start
Why: Students need to be able to calculate measures like the median, mean, and range, and understand what they represent before comparing data sets.
Why: Students must have prior experience creating these visual displays to effectively use them for comparison.
Why: A foundational understanding of concepts like spread, center, and skewness is necessary to interpret and compare data sets.
Key Vocabulary
| Term | Definition |
| --- | --- |
| Box Plot | A visual representation of the distribution of data through quartiles. It shows the minimum, first quartile (Q1), median, third quartile (Q3), and maximum values. |
| Histogram | A graphical display in which the data is divided into bins (intervals) and the frequency of data points falling into each bin is represented by a bar. |
| Interquartile Range (IQR) | The difference between the third quartile (Q3) and the first quartile (Q1) of a data set, representing the spread of the middle 50% of the data. |
| Median | The middle value in a data set when the data is ordered from least to greatest. It is a measure of central tendency. |
| Outlier | A data point that is significantly different from the other data points in a data set. Box plots often use fences, commonly placed 1.5 × IQR below Q1 and above Q3, to flag potential outliers. |
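The vocabulary above can be tied together with a short worked example. The sketch below assumes the "median of each half" method for quartiles that is common in school textbooks (other quartile conventions give slightly different values) and the usual 1.5 × IQR rule for fences. The scores and the helper names `five_number_summary` and `outlier_fences` are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of values, the middle value is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return min(data), median(lower), median(data), median(upper), max(data)

def outlier_fences(q1, q3):
    """Return the lower and upper fences, 1.5 x IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Illustrative (made-up) test scores
scores = [12, 15, 15, 17, 18, 20, 21, 22, 24, 45]

low, q1, med, q3, high = five_number_summary(scores)
lower_fence, upper_fence = outlier_fences(q1, q3)
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print("Five-number summary:", low, q1, med, q3, high)
print("IQR:", q3 - q1)
print("Fences:", lower_fence, upper_fence)
print("Potential outliers:", outliers)
```

For this data the IQR is 7, so the fences sit at 4.5 and 32.5, and only the score of 45 is flagged as a potential outlier.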
Watch Out for These Misconceptions
Common Misconception: Thinking that a 'longer' box in a box plot means there is more data in that section.
What to Teach Instead
Every section of a box plot (each whisker and each half of the box) contains about 25% of the data. A longer section just means the data is more 'spread out' in that range. Using physical 'human box plots' where students stand in quartiles helps them see that the number of people is the same even if they are standing further apart.
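The 'human box plot' idea can also be checked numerically. This is a minimal sketch with made-up heights, chosen so the class size divides evenly by four. It splits the ordered data into four equal groups and reports how far each group stretches, which approximates (but is not identical to) the length of each box plot section.

```python
# Illustrative (made-up) heights in cm; 12 values so the data splits evenly
heights = [150, 152, 153, 155, 156, 158, 160, 163, 167, 172, 180, 195]

data = sorted(heights)
quarter = len(data) // 4

# Split the ordered data into four equal-sized groups, like students
# standing in their quartile groups along a number line.
sections = [data[i * quarter:(i + 1) * quarter] for i in range(4)]

counts = [len(section) for section in sections]
widths = [section[-1] - section[0] for section in sections]

for number, (count, width) in enumerate(zip(counts, widths), start=1):
    print(f"Quarter {number}: {count} students, spanning {width} cm")
```

Each quarter holds three students, yet the top quarter spans 23 cm while the bottom quarter spans only 3 cm: same amount of data, very different spread.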
Common Misconception: Always using the mean to compare groups.
What to Teach Instead
Students often default to the mean because it's familiar. Peer-led 'mean vs. median' challenges, using data like 'income' where one billionaire can skew the mean, help students see why the median is often a 'more honest' measure of the centre.
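The billionaire effect is easy to demonstrate with a few lines of code. The incomes below are invented for illustration only; the point is how differently the mean and median react when one extreme value joins the group.

```python
from statistics import mean, median

# Illustrative (made-up) annual incomes in dollars
incomes = [42_000, 48_000, 51_000, 55_000, 60_000]
with_billionaire = incomes + [1_000_000_000]

print("Without the billionaire:")
print("  mean  :", mean(incomes))
print("  median:", median(incomes))

print("With the billionaire:")
print("  mean  :", round(mean(with_billionaire)))
print("  median:", median(with_billionaire))
```

The median moves from 51,000 to 53,000, while the mean jumps from 51,200 to well over 100 million, a value that describes nobody in the group.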
Active Learning Ideas
Inquiry Circle: The Reaction Time Challenge
Students use an online tool to measure their reaction times (e.g., dominant vs. non-dominant hand). In groups, they create back-to-back box plots of the results and write a 'statistical report' comparing the median and spread of the two groups.
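Groups who want to check the numbers behind their 'statistical report' could use something like the sketch below. The reaction times are made up, the helper name `median_and_iqr` is hypothetical, and quartiles are found with the median-of-halves method, which is only one of several conventions.

```python
from statistics import median

def median_and_iqr(values):
    """Return (median, IQR) using the median-of-halves method for quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of values, the middle value is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(data), median(upper) - median(lower)

# Illustrative (made-up) reaction times in milliseconds
dominant = [210, 225, 230, 238, 245, 250, 262, 270]
non_dominant = [240, 255, 268, 275, 290, 310, 335, 360]

for label, times in (("Dominant hand", dominant),
                     ("Non-dominant hand", non_dominant)):
    centre, spread = median_and_iqr(times)
    print(f"{label}: median {centre} ms, IQR {spread} ms")
```

With this data the dominant hand has both the lower median (faster) and the smaller IQR (more consistent), which gives students two separate pieces of evidence to cite in their comparison.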
Gallery Walk: Skewness and Stories
The teacher posts four different histograms (e.g., house prices, heights, dice rolls). Groups must match each histogram to a 'story' or data source and explain their reasoning based on the shape (symmetric, left-skewed, right-skewed) to the rest of the class.
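To prepare a histogram for the gallery walk, or to let students see how bin counts produce a shape, a text-only version is enough. The sketch below uses invented house prices and a hypothetical helper `bin_counts`; the bin start, width, and number of bins are arbitrary choices for this example.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts

# Illustrative (made-up) house prices in thousands of dollars
prices = [420, 450, 460, 480, 495, 510, 520, 540, 560,
          610, 650, 720, 890, 1250]

start, width, n_bins = 400, 200, 5
counts = bin_counts(prices, start, width, n_bins)

# Print one row of '#' symbols per bin to show the shape of the distribution.
for k, count in enumerate(counts):
    low = start + k * width
    print(f"{low:>5}-{low + width:<5} | {'#' * count}")
```

Most values pile up in the lowest bin with a long tail stretching towards the high prices, the typical shape of a right-skewed (positively skewed) distribution.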
Think-Pair-Share: The Outlier Debate
Students are given a data set with one extreme outlier. They individually calculate the mean and median, then pair up to discuss which 'average' is a fairer representation of the group. They must agree on a recommendation for a 'news report' based on their findings.
Real-World Connections
- Sports analysts compare player statistics, such as batting averages or points per game, using box plots to determine if one player or team is more consistent or performs better than another over a season.
- Environmental scientists use histograms and box plots to compare air quality measurements or water pollution levels between different cities or industrial sites to identify areas of concern and inform policy.
- Economists analyze income distributions or housing prices in different regions using these graphical tools to understand economic disparities and trends.
Assessment Ideas
Present students with two box plots comparing the heights of Year 10 students from two different schools. Ask: 'Based on these box plots, which school appears to have taller students overall? Justify your answer using the median and IQR. What are the limitations of comparing only these two statistics?'
Provide students with a set of data for two different groups (e.g., test scores). Ask them to construct both a histogram and a box plot for each data set. Then, have them write two sentences comparing the central tendency and spread of the two groups, referencing their graphs.
Students work in pairs to compare two real-world data sets (e.g., daily temperatures in two cities). Each student creates a comparative visual display (box plot or histogram). They then swap displays and use a checklist to evaluate their partner's work: 'Is the display clear and correctly labeled? Does it effectively compare the data? Are summary statistics mentioned in the comparison?'