Mathematics · Year 10 · Statistical Investigations and Data Analysis · Term 4

Comparing Data Sets using Box Plots and Histograms

Using visual displays and summary statistics to compare two or more data sets.

ACARA Content Descriptions: AC9M10ST02

About This Topic

Comparing data sets is about looking at the 'big picture' of different groups to see how they differ. Students use box plots (box-and-whisker plots) and histograms to compare the spread, center, and skewness of data. They learn to use the median and interquartile range (IQR) to make robust comparisons, especially when data is skewed or contains outliers.

In the Year 10 Australian Curriculum, this is where statistics becomes a tool for argument. Students don't just calculate numbers; they use them to answer questions like 'Is the girls' team more consistent than the boys' team?' or 'Has the new teaching method improved results?'. This topic comes alive when students can compare real-world data sets that matter to them. Students grasp this concept faster through structured discussion and peer explanation where they must 'defend' a population based on its box plot features.

Key Questions

  1. Explain how visual displays can be used to argue that two populations are significantly different.
  2. Compare the central tendency and spread of two data sets based on their box plots.
  3. Critique the effectiveness of different graphical displays for comparing data sets.

Learning Objectives

  • Compare the distribution, center, and spread of two or more data sets using summary statistics and visual displays.
  • Critique the suitability of box plots and histograms for representing and comparing different types of data distributions.
  • Analyze visual displays of data to identify potential outliers and assess the symmetry or skewness of data sets.
  • Formulate arguments about differences between populations based on statistical evidence from comparative displays.

Before You Start

Calculating and Interpreting Summary Statistics

Why: Students need to be able to calculate measures like the median, mean, and range, and understand what they represent before comparing data sets.

Constructing Histograms and Box Plots

Why: Students must have prior experience creating these visual displays to effectively use them for comparison.

Understanding Data Distribution

Why: A foundational understanding of concepts like spread, center, and skewness is necessary to interpret and compare data sets.

Key Vocabulary

Box Plot: A visual representation of the distribution of data through quartiles. It shows the minimum, first quartile (Q1), median, third quartile (Q3), and maximum values.
Histogram: A graphical display of data where the data is divided into bins (intervals), and the frequency of data points falling into each bin is represented by a bar.
Interquartile Range (IQR): The difference between the third quartile (Q3) and the first quartile (Q1) of a data set, representing the spread of the middle 50% of the data.
Median: The middle value in a data set when the data is ordered from least to greatest. It is a measure of central tendency.
Outlier: A data point that is significantly different from other data points in a data set. Box plots often use fences to identify potential outliers.
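
For teachers who want to check answers quickly, the vocabulary above can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed method. It assumes the common 1.5 × IQR rule for the fences, which this page does not specify, and one particular quartile convention (`method="inclusive"` in Python's `statistics.quantiles`). Textbooks and calculators may use a different convention and give slightly different values for Q1 and Q3. The function names and the test scores are made up for illustration.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # Assumption: the "inclusive" quartile convention. Other conventions
    # can give slightly different Q1 and Q3 for the same data.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


def potential_outliers(values):
    """Flag values outside the 1.5 x IQR fences (an assumed, common convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical test scores for one class.
scores = [52, 58, 61, 64, 66, 70, 73, 75, 79, 98]

print(five_number_summary(scores))
print(potential_outliers(scores))
```

With these scores, the only value outside the fences is 98, so it would be drawn as a separate point rather than as the end of the upper whisker.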

Watch Out for These Misconceptions

Common Misconception: Thinking that a 'longer' box in a box plot means there is more data in that section.

What to Teach Instead

Every section of a box plot (each whisker and each half of the box) contains about 25% of the data. A longer section just means the data is more 'spread out' in that range. Using physical 'human box plots' where students stand in quartiles helps them see that the number of people is the same even if they are standing further apart.
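
The same point can be checked numerically. The sketch below splits a small data set at its quartiles and reports, for each section of the box plot, how many values it holds and how wide it is. The reaction times are made up, and the quartiles use the default convention of Python's `statistics.quantiles`, which is an assumption; with tied values or a data set whose size is not a multiple of four, the counts will be only approximately equal.

```python
from statistics import quantiles

# Hypothetical data: 12 distinct reaction times in milliseconds.
times = [210, 215, 222, 230, 248, 251, 260, 275, 301, 340, 390, 460]

q1, q2, q3 = quantiles(times, n=4)  # default "exclusive" convention
edges = [min(times), q1, q2, q3, max(times)]
names = ["lower whisker", "lower half of box", "upper half of box", "upper whisker"]

counts = {}
widths = {}
for i, name in enumerate(names):
    low, high = edges[i], edges[i + 1]
    is_last = i == len(names) - 1
    # Each section covers [low, high); the last one also includes the maximum.
    members = [t for t in times if low <= t < high or (is_last and t == high)]
    counts[name] = len(members)
    widths[name] = high - low

for name in names:
    print(f"{name:<18} values: {counts[name]}   width: {widths[name]} ms")
```

Each section holds 3 of the 12 values, yet the upper whisker is 129.75 ms wide while the lower whisker is only 14 ms wide. The longer section has the same amount of data, just more spread out.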

Common Misconception: Always using the mean to compare groups.

What to Teach Instead

Students often default to the mean because it's familiar. Peer-led 'mean vs. median' challenges, using data like 'income' where one billionaire can skew the mean, help students see why the median is often a 'more honest' measure of the center.
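
A quick calculation makes the 'billionaire' effect concrete. The incomes below are made up for illustration; the sketch only uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars.
incomes = [42_000, 48_000, 51_000, 55_000, 60_000, 63_000, 70_000]
with_billionaire = incomes + [1_000_000_000]

print("Without billionaire: mean", round(mean(incomes)), "median", median(incomes))
print("With billionaire:    mean", round(mean(with_billionaire)),
      "median", median(with_billionaire))
```

Adding one extreme income moves the median only from 55,000 to 57,500, while the mean jumps to more than 125 million, a figure that describes nobody in the group.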

Active Learning Ideas


Real-World Connections

  • Sports analysts compare player statistics, such as batting averages or points per game, using box plots to determine if one player or team is more consistent or performs better than another over a season.
  • Environmental scientists use histograms and box plots to compare air quality measurements or water pollution levels between different cities or industrial sites to identify areas of concern and inform policy.
  • Economists analyze income distributions or housing prices in different regions using these graphical tools to understand economic disparities and trends.

Assessment Ideas

Discussion Prompt

Present students with two box plots comparing the heights of Year 10 students from two different schools. Ask: 'Based on these box plots, which school appears to have taller students overall? Justify your answer using the median and IQR. What are the limitations of comparing only these two statistics?'

Quick Check

Provide students with a set of data for two different groups (e.g., test scores). Ask them to construct both a histogram and a box plot for each data set. Then, have them write two sentences comparing the central tendency and spread of the two groups, referencing their graphs.

Peer Assessment

Students work in pairs to compare two real-world data sets (e.g., daily temperatures in two cities). Each student creates a comparative visual display (box plot or histogram). They then swap displays and use a checklist to evaluate their partner's work: 'Is the display clear and correctly labeled? Does it effectively compare the data? Are summary statistics mentioned in the comparison?'

Frequently Asked Questions

What is the 'interquartile range' (IQR) and why does it matter?
The IQR is the distance between the 25th and 75th percentiles, so it measures the width of the 'middle 50%' of the data. It's a great measure of consistency. A small IQR means the data is very bunched up around the middle, while a large IQR means there is a lot of variety. It's much more reliable than the 'range' because it depends only on the middle half of the data, so a single extreme value has little or no effect on it.
How can active learning help students compare data sets?
Active learning, like the 'Reaction Time Challenge', gives students 'ownership' of the data. When they are comparing their own results, they are more likely to notice subtle differences in the box plots. Discussing these differences in groups forces them to use statistical language (like 'skew' and 'spread') to justify their observations.
How do you tell if a data set is 'skewed'?
Look at the 'tail' of the data. If the tail stretches out to the right (higher numbers), it's 'right-skewed' (or positively skewed). If it stretches to the left (lower numbers), it's 'left-skewed' (or negatively skewed). In a box plot, you can see this if one whisker is much longer than the other or if the median isn't in the middle of the box.
Why are box plots better than histograms for comparing groups?
Box plots are much 'cleaner' for side-by-side comparisons. You can stack three or four box plots one above another on the same scale and immediately see which group has the highest median or the most spread. Histograms are better for seeing the detailed 'shape' of a single group, but they get messy when you try to overlay them.
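
As a rough illustration of the two views, the sketch below compares two groups by median and IQR (what parallel box plots show) and then counts values per bin (what a histogram shows). The team data, the function names, the bin width of 5, and the "inclusive" quartile convention are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import median, quantiles


def summarise(values):
    """Return the median and IQR, the two statistics compared on box plots."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # assumed convention
    return {"median": median(values), "iqr": q3 - q1}


def bin_counts(values, bin_width):
    """Count values per histogram bin; each bin is [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical points-per-game data for two teams.
team_a = [18, 20, 21, 22, 22, 23, 24, 25, 26]
team_b = [8, 12, 17, 21, 24, 27, 31, 36, 40]

for name, data in [("Team A", team_a), ("Team B", team_b)]:
    print(name, summarise(data))
    for start, count in bin_counts(data, 5).items():
        # A text histogram: one '#' per game in the bin.
        print(f"  {start:>2} to under {start + 5:>2}: {'#' * count}")
```

Here Team B has the slightly higher median (24 against 22), but Team A's IQR of 3 is far smaller than Team B's IQR of 14, so Team A is the more consistent team. The bin counts show the same story in more detail: Team A's scores pile up in one bin, while Team B's are spread thinly across many.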
