Interpreting Scatter Plots and Association
Interpreting scatter plots to look for patterns, clusters, and outliers in data sets.
About This Topic
A line of best fit (or trend line) is a straight line that best represents the data on a scatter plot. In the Ontario Grade 8 curriculum, students learn to informally fit these lines by eye, ensuring that the line follows the general direction of the dots with roughly equal numbers of points above and below it. This is a crucial step in moving from data visualization to data prediction.
Students use the equation of the line of best fit to make interpolations (predictions within the data range) and extrapolations (predictions outside the range). They also interpret the slope and y-intercept of the line in the context of the data. For example, in a graph of money earned (vertical axis) against hours worked (horizontal axis), the slope represents the hourly wage. This connects data analysis directly back to students' work with linear relations.
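As a rough illustration of the hours-and-earnings example, the sketch below fits a line and labels each prediction as an interpolation or an extrapolation. The data values are made up for illustration, and the line is computed with the least-squares formula rather than fitted by eye, so treat it as a way to check a hand-drawn line, not as the method the curriculum expects. The function names are hypothetical.

```python
# Hypothetical data: hours worked (x) and money earned in dollars (y).
hours = [2, 4, 5, 7, 8, 10]
earned = [35, 66, 80, 115, 126, 160]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read the predicted y-value off the line at a given x."""
    return slope * x + intercept


def kind_of_prediction(xs, x):
    """Inside the observed x-range is interpolation; outside is extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


slope, intercept = least_squares_line(hours, earned)
print(f"slope: about {slope:.2f} dollars per hour (the hourly wage)")
print(f"y-intercept: about {intercept:.2f} dollars at 0 hours")

for x in (6, 15):
    label = kind_of_prediction(hours, x)
    print(f"{x} hours ({label}): about {predict(slope, intercept, x):.2f} dollars")
```

The 15-hour prediction lies well outside the observed range, which makes it a natural prompt for discussing how far a trend line can be trusted.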
This topic comes alive when students can engage in collaborative investigations. By using real-world data sets, like the growth of a plant over time or the cooling of a cup of cocoa, students see how a simple line can help them predict the future and understand the underlying rate of change.
Key Questions
- Analyze what the strength and direction of a correlation tell us about the relationship between two variables.
- Explain how outliers influence our interpretation of a data set.
- Differentiate between positive, negative, and no association in scatter plots.
Learning Objectives
- Analyze scatter plots to identify patterns, clusters, and outliers in bivariate data sets.
- Explain the meaning of positive, negative, and no association between two variables as represented on a scatter plot.
- Evaluate how the presence of outliers can influence the perceived relationship between two variables in a scatter plot.
- Compare the strength and direction of association between different pairs of variables presented in scatter plots.
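One way to check a visual judgment of strength and direction is Pearson's correlation coefficient, r. Grade 8 students describe association informally, so the sketch below is a teacher-side aid under stated assumptions: the three data sets are invented, and the 0.3 and 0.7 cut-offs are illustrative choices, not a standard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r, weak=0.3, strong=0.7):
    """Turn r into a short description. The cut-offs are assumptions."""
    if abs(r) < weak:
        return "no clear association"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong else "moderate"
    return f"{strength} {direction} association"


# Hypothetical data sets sharing the same x-values.
x = [1, 2, 3, 4, 5, 6]
datasets = {
    "rising": [52, 60, 63, 71, 78, 85],
    "falling": [90, 82, 75, 66, 60, 50],
    "scattered": [70, 55, 80, 58, 76, 62],
}

for name, y in datasets.items():
    r = pearson_r(x, y)
    print(f"{name}: r = {r:.2f}, {describe(r)}")
```

The sign of r gives the direction and its size gives the strength, which mirrors the two features students are asked to compare by eye.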
Before You Start
Why: Students need to be able to plot and read points on a coordinate plane before interpreting scatter plots.
Why: Understanding basic data sets and how to organize them is foundational to interpreting patterns within them.
Key Vocabulary
| Term | Definition |
| --- | --- |
| Scatter Plot | A graph that displays the relationship between two quantitative variables by plotting individual data points. |
| Association | The relationship between two variables. This can be positive, negative, or show no clear pattern. |
| Outlier | A data point that is significantly different from other data points in the set, potentially affecting the interpretation of the overall trend. |
| Cluster | A group of data points that are close together on a scatter plot, suggesting a concentration of values for the two variables. |
Watch Out for These Misconceptions
Common Misconception: Students often think the line of best fit must pass through the origin (0,0).
What to Teach Instead
Show data sets where the starting value isn't zero (like 'age vs. height'). Peer discussion about why a baby isn't 0 cm tall at birth helps them see that the y-intercept must match the data, not just the corner of the graph.
Common Misconception: Students may try to 'connect the dots' like a dot-to-dot puzzle.
What to Teach Instead
Use the 'spaghetti' method to show that the line represents the *trend*, not every single point. Collaborative work where students compare their straight lines to a 'connected' line helps them see which one is more useful for making general predictions.
Active Learning Ideas
Inquiry Circle: The Spaghetti Fit
Groups are given scatter plots and pieces of uncooked spaghetti. They must place the spaghetti to represent the 'best fit' for the data, then use two points on their spaghetti line to calculate the slope and write the equation, comparing their line's 'fit' with other groups.
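The two-point calculation in this activity can be sketched as follows. The plant-growth data and the points each group reads off its spaghetti are invented for illustration, and the average vertical distance is only one simple, assumed way to compare how well two lines fit.

```python
def line_through(p1, p2):
    """Slope and y-intercept of the line through two points on the spaghetti."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def mean_vertical_distance(points, slope, intercept):
    """Average vertical gap between the data points and the line."""
    gaps = [abs(y - (slope * x + intercept)) for x, y in points]
    return sum(gaps) / len(gaps)


# Hypothetical plant data: (day, height in cm).
data = [(1, 2.0), (3, 3.1), (5, 4.2), (7, 4.8), (9, 6.1)]

# Two points that each hypothetical group reads off its own spaghetti line.
group_lines = {
    "Group A": line_through((1, 2), (9, 6)),
    "Group B": line_through((1, 1), (9, 7)),
}

for group, (m, b) in group_lines.items():
    gap = mean_vertical_distance(data, m, b)
    print(f"{group}: y = {m}x + {b}, average gap = {gap:.2f} cm")
```

A smaller average gap suggests a closer fit, which gives groups something concrete to argue about when they compare their lines.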
Think-Pair-Share: Predicting the Future
Give students a scatter plot of Canadian Olympic medal counts over time with a line of best fit. Students use the line to predict the count for the next Olympics (extrapolation). They pair up to discuss how reliable they think that prediction is and what factors might change it.
Peer Teaching: Slope in the Real World
Pairs are given different trend lines (e.g., 'fuel used vs. distance' or 'height vs. age'). They must explain to another pair what the slope and intercept mean in that specific context (e.g., 'the slope is the fuel used per kilometre, in litres').
Real-World Connections
- Environmental scientists use scatter plots to examine the relationship between air pollution levels and respiratory illness rates in urban areas, helping to identify potential health risks.
- Economists analyze scatter plots to investigate correlations between factors like education level and average income across different regions, informing policy decisions.
- Sports analysts use scatter plots to compare player statistics, such as points scored versus assists, to understand player performance and team dynamics.
Assessment Ideas
Provide students with 2-3 different scatter plots. Ask them to write one sentence describing the association (positive, negative, or none) for each plot and identify any obvious outliers.
Give students a scatter plot showing student study hours versus test scores. Ask them to explain in their own words what the pattern on the plot suggests about the relationship between these two variables and to identify one outlier if present.
Present a scatter plot with a clear outlier. Ask students: 'How does this single data point affect our understanding of the overall relationship between the two variables? What might this outlier represent in the real world?'
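To prepare for the outlier discussion, a teacher might compare a measure of association with and without the unusual point. The sketch below uses invented study-hours data in which the last student is treated as the outlier, and it uses Pearson's r as the measure, which is an assumption made here and goes beyond what students are expected to calculate.

```python
from math import sqrt


def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: (study hours, test score).
typical = [(1, 55), (2, 60), (3, 66), (4, 70), (5, 76), (6, 81)]
outlier = (7, 40)  # studied the longest but scored the lowest

r_without = pearson_r(typical)
r_with = pearson_r(typical + [outlier])

print(f"without the outlier: r = {r_without:.2f}")
print(f"with the outlier:    r = {r_with:.2f}")
```

With so few points, one unusual value can hide an otherwise clear positive association, which is why it helps to ask what the outlier represents before deciding how to treat it.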
Frequently Asked Questions
How do you draw a line of best fit?
What is the difference between interpolation and extrapolation?
How can active learning help students understand lines of best fit?
What does the slope of a trend line tell us?