Mathematics · Year 10 · Statistical Investigations and Data Analysis · Term 4

Box Plots and Five-Number Summary

Constructing and interpreting box plots from a five-number summary to visualize data distribution.

ACARA Content Descriptions: AC9M10ST02

About This Topic

Bivariate data involves examining the relationship between two different numerical variables, such as height and shoe size, or study hours and exam scores. Students learn to create scatter plots, identify the 'direction' (positive or negative) and 'strength' of a correlation, and draw a line of best fit. A major focus in Year 10 is the critical understanding that correlation does not equal causation: just because two things move together doesn't mean one causes the other.

This topic is a key part of statistical literacy in the Australian Curriculum, teaching students to be skeptical consumers of data. It introduces the concept of interpolation (predicting within the data range) and extrapolation (predicting outside it), along with the risks involved. This topic comes alive when students can collect their own bivariate data and use collaborative tools to find trends, fostering a sense of discovery and scientific inquiry.
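
The difference between interpolation and extrapolation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the study-hours data, and the slope and intercept of the line of best fit are all made up for the example, and exam scores are assumed to be percentages.

```python
def classify_prediction(x, observed_xs):
    """Label a prediction at x by whether x lies inside the observed data range."""
    if min(observed_xs) <= x <= max(observed_xs):
        return "interpolation"
    return "extrapolation"


def predict(x, slope, intercept):
    """Read a predicted value off a straight line of best fit."""
    return slope * x + intercept


study_hours = [1, 2, 3, 4, 5]   # hypothetical observed x-values
slope, intercept = 5.5, 46.5    # hypothetical line of best fit (score = 5.5 * hours + 46.5)

for hours in (2.5, 12):
    kind = classify_prediction(hours, study_hours)
    print(hours, kind, predict(hours, slope, intercept))
```

With these made-up numbers, the prediction for 12 hours comes out above 100, which is impossible for a percentage score. That is the kind of risk extrapolation carries: the line is only supported by the data it was fitted to.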

Key Questions

  1. Explain how a box plot visually represents the five-number summary.
  2. Analyze how to identify outliers using the interquartile range.
  3. Design a box plot for a given data set and interpret its skewness.

Learning Objectives

  • Calculate the five-number summary (minimum, first quartile, median, third quartile, maximum) for a given data set.
  • Construct a box plot accurately from a calculated five-number summary.
  • Analyze a box plot to identify the range, interquartile range, and potential outliers.
  • Compare the distribution and skewness of two or more data sets represented by box plots.
  • Explain the relationship between the visual elements of a box plot and the underlying data distribution.
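
The first objective can be sketched in Python as follows. This is one possible approach, assuming quartiles are found as the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when the count is odd. Other quartile conventions exist and can give slightly different answers. The function name and the heights are invented for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the middle value when the number of data points is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value for odd n so it belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


heights_cm = [150, 152, 155, 158, 160, 161, 163, 165, 168, 170, 175]  # made-up sample
print(five_number_summary(heights_cm))  # (150, 155, 161, 168, 175)
```

Each of the five values maps to one feature of the box plot: the whisker ends sit at the minimum and maximum, the box edges at Q1 and Q3, and the line inside the box at the median.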

Before You Start

Measures of Central Tendency

Why: Students need to be able to calculate the mean, median, and mode, because finding the median is the first step in building the five-number summary.

Measures of Spread

Why: Students must be familiar with the range and how to calculate it, as this is a fundamental component of data distribution and is extended by the IQR.

Data Representation (e.g., Stem-and-Leaf Plots, Histograms)

Why: Prior experience with other graphical data displays helps students understand the purpose and interpretation of visual data summaries like box plots.

Key Vocabulary

Five-Number Summary: A set of five key values that describe the distribution of a data set: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.
Box Plot: A graphical representation of the five-number summary, showing the median, quartiles, and range of a data set. It visually displays the spread and central tendency of the data.
Interquartile Range (IQR): The difference between the third quartile (Q3) and the first quartile (Q1) (IQR = Q3 - Q1). It represents the spread of the middle 50% of the data.
Outlier: A data point that is significantly different from other data points in a data set. In box plots, outliers are often identified using a rule based on the IQR.
Skewness: A measure of the asymmetry of a probability distribution. In a box plot, skewness can be inferred by the position of the median within the box and the lengths of the whiskers.
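
The vocabulary above does not say which IQR-based rule to use for outliers. A widely used convention, assumed here, flags any value more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that convention; the function names and test scores are invented, and Q1 and Q3 are taken as the medians of the lower and upper halves of the data.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """List the values that fall outside the fences."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]


scores = [42, 55, 58, 60, 61, 63, 65, 66, 70, 98]  # made-up test scores
q1, q3 = 58, 66  # medians of the lower half [42..61] and the upper half [63..98]

print(iqr_fences(q1, q3))             # (46.0, 78.0)
print(find_outliers(scores, q1, q3))  # [42, 98]
```

Here IQR = 66 - 58 = 8, so the fences sit 12 units beyond each quartile. Both 42 and 98 fall outside them and would be drawn as separate points rather than as part of the whiskers.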

Watch Out for These Misconceptions

Common Misconception: Believing that a 'negative correlation' means there is no relationship.

What to Teach Instead

Students often associate 'negative' with 'bad' or 'non-existent'. Using examples like 'the more you exercise, the lower your resting heart rate' helps them see that a negative correlation is still a real, predictable relationship, one in which the variables move in opposite directions. Peer-led 'example hunting' is helpful here.

Common Misconception: Thinking the line of best fit must touch the first and last data points.

What to Teach Instead

Students often try to 'connect the dots'. Using a clear plastic ruler on a scatter plot and having students 'balance' the number of points above and below the line helps them understand that the line represents the *trend*, not the individual points. Collaborative 'line balancing' is a great fix.
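
The 'balancing' idea can also be checked numerically. The sketch below swaps the ruler for the least-squares method, which is a different technique from fitting by eye but shares the balancing property: the vertical gaps above and below the line cancel out. The function name and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


study_hours = [1, 2, 3, 4, 5]        # made-up data
exam_scores = [52, 58, 61, 70, 74]

slope, intercept = least_squares_line(study_hours, exam_scores)

# Residual = actual score minus the score the line predicts.
residuals = [y - (slope * x + intercept) for x, y in zip(study_hours, exam_scores)]
print(round(slope, 2), round(intercept, 2))
print([round(r, 2) for r in residuals])
```

For this data the fitted line misses both the first and the last point by a small amount, yet the residuals sum to zero (up to rounding). The line follows the overall trend rather than any individual point.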

Real-World Connections

  • Financial analysts use box plots to visualize the distribution of stock prices or company earnings over a period, quickly identifying typical ranges, extreme values, and potential market volatility.
  • Sports statisticians employ box plots to compare player performance across different metrics, such as comparing the distribution of points scored per game for two different basketball players or teams.
  • Medical researchers use box plots to display the distribution of patient recovery times or drug efficacy measurements, helping to understand treatment variability and identify unusual responses.

Assessment Ideas

Quick Check

Provide students with a data set (e.g., heights of students in class). Ask them to calculate the five-number summary and then draw a box plot. Check their calculations and the accuracy of their plot construction.

Exit Ticket

Present students with two box plots comparing test scores from two different classes. Ask them to write two sentences comparing the central tendency and spread of the scores, and one sentence about which class performed more consistently.
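
A comparison like this can be rehearsed numerically before students write their sentences. The sketch below assumes each class is already described by its five-number summary and treats the smaller IQR as the mark of the more consistent class. Both summaries and the function name are invented for illustration.

```python
def spread_measures(summary):
    """Turn a five-number summary into the values used to compare box plots."""
    minimum, q1, median, q3, maximum = summary
    return {"median": median, "range": maximum - minimum, "iqr": q3 - q1}


# Hypothetical five-number summaries: (min, Q1, median, Q3, max)
class_a = (45, 58, 66, 74, 90)
class_b = (55, 62, 65, 69, 78)

a = spread_measures(class_a)
b = spread_measures(class_b)

# Smaller IQR means the middle 50% of scores are packed more tightly.
more_consistent = "Class A" if a["iqr"] < b["iqr"] else "Class B"

print(a)  # {'median': 66, 'range': 45, 'iqr': 16}
print(b)  # {'median': 65, 'range': 23, 'iqr': 7}
print(more_consistent)  # Class B
```

In this made-up case the medians are almost the same, so the classes have a similar typical score, but Class B has both the smaller range and the smaller IQR, so its scores are more consistent.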

Discussion Prompt

Pose the question: 'How can a box plot help us identify unusual data points that might warrant further investigation?' Facilitate a class discussion where students explain the concept of outliers and how the IQR is used to detect them.

Frequently Asked Questions

What is the difference between correlation and causation?
Correlation means two things tend to change together. Causation means one thing *makes* the other happen. For example, carrying an umbrella is correlated with rain, but carrying the umbrella doesn't *cause* the rain to fall. Identifying this difference is a vital life skill for evaluating news and advertisements.
How can active learning help students understand bivariate data?
Active learning, like collecting and plotting their own data, makes the 'dots' on the graph represent real people or objects. This connection makes the 'trend' more obvious. When students debate 'spurious correlations', they are engaging in higher-order thinking that moves them from just drawing lines to actually interpreting what the data means.
What is 'extrapolation' and why is it risky?
Extrapolation is making a prediction outside the range of the data you've collected. It's risky because you are assuming the trend will continue forever. For example, if a baby about 50 cm long grows 10 cm in a year, extrapolating that rate would predict they'll be about 2.5 metres tall by age 20, which obviously isn't true!
How do I know if a correlation is 'strong' or 'weak'?
A correlation is 'strong' if the points on the scatter plot are very close to the line of best fit, forming a clear 'path'. It is 'weak' if the points are more spread out like a cloud, but you can still see a general upward or downward direction.
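
The visual description of strength can be backed by a number. One common measure, introduced here as an addition rather than something the answers above rely on, is Pearson's correlation coefficient r. It ranges from -1 to 1: values near either end indicate points close to a straight line, and the sign gives the direction. The sketch below computes it by hand on made-up data; the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5, 6]
tight_scores = [50, 55, 61, 64, 70, 76]   # points hug a straight line
loose_scores = [50, 70, 52, 75, 58, 80]   # same upward trend, more scatter

print(round(pearson_r(hours, tight_scores), 3))
print(round(pearson_r(hours, loose_scores), 3))
print(pearson_r([1, 2, 3], [3, 2, 1]))  # perfectly negative relationship
```

The tightly clustered data gives an r close to 1, while the scattered data gives a noticeably smaller positive value. The last line shows a value of -1 for a perfectly negative relationship, so a negative r can be just as strong as a positive one.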
