Box Plots and Five-Number Summary
Constructing and interpreting box plots from a five-number summary to visualize data distribution.
About This Topic
Bivariate data involves examining the relationship between two different numerical variables, such as height and shoe size, or study hours and exam scores. Students learn to create scatter plots, identify the 'direction' (positive or negative) and 'strength' of a correlation, and draw a line of best fit. A major focus in Year 10 is the critical understanding that correlation does not equal causation: just because two things move together doesn't mean one causes the other.
This topic is a key part of statistical literacy in the Australian Curriculum, teaching students to be skeptical consumers of data. It introduces the concept of interpolation (predicting within the data range) and extrapolation (predicting outside it), along with the risks involved. This topic comes alive when students can collect their own bivariate data and use collaborative tools to find trends, fostering a sense of discovery and scientific inquiry.
Key Questions
- Explain how a box plot visually represents the five-number summary.
- Analyze how to identify outliers using the interquartile range.
- Design a box plot for a given data set and interpret its skewness.
Learning Objectives
- Calculate the five-number summary (minimum, first quartile, median, third quartile, maximum) for a given data set.
- Construct a box plot accurately from a calculated five-number summary.
- Analyze a box plot to identify the range, interquartile range, and potential outliers.
- Compare the distribution and skewness of two or more data sets represented by box plots.
- Explain the relationship between the visual elements of a box plot and the underlying data distribution.
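The first objective can be checked with a short script. The sketch below assumes the "median of halves" convention for quartiles (split the sorted data at the median, leaving the middle value out when the count is odd). The source does not name a convention, and other conventions can give slightly different quartiles. The function name `five_number_summary` and the sample scores are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set: nine test scores out of 20 (an assumption for illustration).
scores = [12, 7, 15, 9, 18, 11, 14, 10, 16]
print(five_number_summary(scores))  # (7, 9.5, 12, 15.5, 18)
```

Students can compare the script's output with their hand calculations. A mismatch in Q1 or Q3 often points to a different quartile convention rather than an arithmetic error.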
Before You Start
Why: Students need to understand how to calculate the mean, median, and mode to properly find the median and understand its role in the five-number summary.
Why: Students must be familiar with the range and how to calculate it, as this is a fundamental component of data distribution and is extended by the IQR.
Why: Prior experience with other graphical data displays helps students understand the purpose and interpretation of visual data summaries like box plots.
Key Vocabulary
| Term | Definition |
| --- | --- |
| Five-Number Summary | A set of five key values that describe the distribution of a data set: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. |
| Box Plot | A graphical representation of the five-number summary, showing the median, quartiles, and range of a data set. It visually displays the spread and central tendency of the data. |
| Interquartile Range (IQR) | The difference between the third quartile (Q3) and the first quartile (Q1) (IQR = Q3 - Q1). It represents the spread of the middle 50% of the data. |
| Outlier | A data point that is significantly different from other data points in a data set. In box plots, outliers are often identified using a rule based on the IQR. |
| Skewness | A measure of the asymmetry of a data distribution. In a box plot, skewness can be inferred from the position of the median within the box and the relative lengths of the whiskers. |
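Skewness is usually judged by eye from a box plot. One way to put a number on where the median sits within the box is the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2 × median) / IQR. This measure is not part of the original material, it ignores the whiskers, and the `tolerance` cut-off of 0.1 is an arbitrary assumption chosen for illustration.

```python
def quartile_skew(summary):
    """Quartile (Bowley) skewness from (minimum, Q1, median, Q3, maximum)."""
    _, q1, med, q3, _ = summary
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the coefficient is undefined")
    # Positive when the median sits nearer Q1, negative when nearer Q3.
    return (q3 + q1 - 2 * med) / iqr

def describe_skew(summary, tolerance=0.1):
    """Turn the coefficient into a rough verbal description (hypothetical cut-off)."""
    s = quartile_skew(summary)
    if s > tolerance:
        return "positively skewed (median nearer Q1)"
    if s < -tolerance:
        return "negatively skewed (median nearer Q3)"
    return "roughly symmetric"

# Hypothetical five-number summaries.
print(describe_skew((0, 10, 20, 30, 40)))   # roughly symmetric
print(describe_skew((2, 4, 5, 10, 20)))     # positively skewed (median nearer Q1)
```

A fuller judgement would also compare the whisker lengths, as the definition above suggests.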
Watch Out for These Misconceptions
Common Misconception: Believing that a 'negative correlation' means there is no relationship.
What to Teach Instead
Students often associate 'negative' with 'bad' or 'non-existent'. Using examples like 'the more you exercise, the lower your resting heart rate' helps them see that a negative correlation is still a genuine, predictable relationship in which the variables move in opposite directions. Peer-led 'example hunting' is helpful here.
Common Misconception: Thinking the line of best fit must touch the first and last data points.
What to Teach Instead
Students often try to 'connect the dots'. Using a clear plastic ruler on a scatter plot and having students 'balance' the number of points above and below the line helps them understand that the line represents the *trend*, not the individual points. Collaborative 'line balancing' is a great fix.
Active Learning Ideas
Inquiry Circle: The Great Class Data Collection
Students work in groups to collect two pieces of data from their peers (e.g., arm span vs. height). They plot this on a shared digital scatter plot and use a 'string' or digital tool to find the line of best fit, discussing whether their data shows a strong or weak relationship.
Formal Debate: Spurious Correlations
The teacher provides 'crazy' correlations (e.g., ice cream sales vs. shark attacks). Students must work in pairs to identify the 'hidden variable' (e.g., summer heat) and debate why these two things are correlated but not causal.
Gallery Walk: Prediction Posters
Groups are given a scatter plot with a line of best fit. They must create a poster that uses the line to make one 'safe' prediction (interpolation) and one 'risky' prediction (extrapolation), explaining the dangers of the latter. The class rotates to critique the 'riskiness' of the predictions.
Real-World Connections
- Financial analysts use box plots to visualize the distribution of stock prices or company earnings over a period, quickly identifying typical ranges, extreme values, and potential market volatility.
- Sports statisticians employ box plots to compare player performance across different metrics, such as comparing the distribution of points scored per game for two different basketball players or teams.
- Medical researchers use box plots to display the distribution of patient recovery times or drug efficacy measurements, helping to understand treatment variability and identify unusual responses.
Assessment Ideas
Provide students with a data set (e.g., heights of students in class). Ask them to calculate the five-number summary and then draw a box plot. Check their calculations and the accuracy of their plot construction.
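When checking plot construction, a rough text rendering can show where each element should sit on the scale. The sketch below is a minimal illustration, assuming the five-number summary has already been calculated. The function `text_box_plot` and its symbols are hypothetical. It draws the whiskers out to the minimum and maximum and does not mark outliers separately.

```python
def text_box_plot(summary, width=41):
    """Render a five-number summary as a one-line text box plot.

    '|' marks the minimum and maximum, '[' and ']' mark Q1 and Q3,
    'M' marks the median, '-' draws whiskers and '=' fills the box.
    """
    lo, q1, med, q3, hi = summary
    if hi == lo:
        raise ValueError("all values are equal; nothing to scale")
    scale = (width - 1) / (hi - lo)

    def col(value):
        # Map a data value to a character column on the scale.
        return round((value - lo) * scale)

    row = ["-"] * width                      # whiskers span the full range
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="                         # box covers Q1 to Q3
    row[0], row[-1] = "|", "|"               # minimum and maximum
    row[col(q1)], row[col(q3)] = "[", "]"    # quartiles
    row[col(med)] = "M"                      # median
    return "".join(row)

# Hypothetical summary: (minimum, Q1, median, Q3, maximum)
print(text_box_plot((0, 10, 20, 30, 40)))
```

With a symmetric summary like the one above, the box sits in the centre of the line with the median in the middle and whiskers of equal length.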
Present students with two box plots comparing test scores from two different classes. Ask them to write two sentences comparing the central tendency and spread of the scores, and one sentence about which class performed more consistently.
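A comparison like this rests on two numbers per class: the median (central tendency) and the IQR (spread). The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, which interpolates between data points and may give quartiles slightly different from a hand method. The class scores and the helper names `centre_and_spread` and `more_consistent` are hypothetical.

```python
from statistics import median, quantiles

def centre_and_spread(values):
    """Return (median, IQR) for a data set."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1

def more_consistent(groups):
    """Return the name of the group with the smallest IQR."""
    return min(groups, key=lambda name: centre_and_spread(groups[name])[1])

# Hypothetical test scores for two classes (assumed data for illustration).
class_a = [62, 65, 66, 68, 70, 71, 73, 75]
class_b = [45, 52, 60, 67, 71, 80, 88, 95]

print(centre_and_spread(class_a))  # (69.0, 5.75)
print(centre_and_spread(class_b))  # (69.0, 24.0)
print(more_consistent({"Class A": class_a, "Class B": class_b}))  # Class A
```

Both classes share the same median here, so the difference students should notice is the spread: the narrower box belongs to the more consistent class.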
Pose the question: 'How can a box plot help us identify unusual data points that might warrant further investigation?' Facilitate a class discussion where students explain the concept of outliers and how the IQR is used to detect them.
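To ground the discussion, the sketch below applies the widely used 1.5 × IQR rule: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as potential outliers. The source refers only to "a rule based on the IQR", so the multiplier of 1.5 is an assumption here, and the data set and function names are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3, k=1.5):
    """List the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]

# Hypothetical data; the quartiles were found by hand with the
# median-of-halves method (lower half 7, 9, 10, 11; upper half 14, 15, 16, 40).
data = [7, 9, 10, 11, 12, 14, 15, 16, 40]
q1, q3 = 9.5, 15.5

print(outlier_fences(q1, q3))       # (0.5, 24.5)
print(find_outliers(data, q1, q3))  # [40]
```

A flagged value is a prompt for investigation (a recording error, or a genuinely unusual case), not an automatic reason to delete it.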