Mathematics · 9th Grade · Statistical Reasoning and Data · Weeks 10-18

Lines of Best Fit and Regression

Using scatter plots, residuals, and the correlation coefficient to judge the fit, strength, and direction of linear relationships.

Common Core State Standards: CCSS.Math.Content.HSS.ID.B.6, CCSS.Math.Content.HSS.ID.C.7

About This Topic

Interpreting residuals is the final step in validating a linear model. A residual is the difference between an observed value and the value predicted by the line of best fit (observed minus predicted). In 9th grade, students learn to create residual plots to judge whether a linear model is appropriate for the data. This Common Core expectation moves students beyond drawing a line toward checking whether the line can be trusted.
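The residual calculation described above can be sketched in a few lines of Python. The data set (hours studied and test scores) and the function name `predict` are made up for illustration; the line y = 6x + 46 is the least-squares line for these five hypothetical points.

```python
# Hypothetical data for illustration: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]


def predict(x):
    """Predicted score from the assumed regression line y = 6x + 46."""
    return 6 * x + 46


# Residual = observed value - predicted value.
residuals = [y - predict(x) for x, y in zip(hours, scores)]

for x, y, res in zip(hours, scores, residuals):
    print(f"x={x}  observed={y}  predicted={predict(x)}  residual={res}")

print(residuals)  # [0, 2, -3, 0, 1]
```

The residuals are small, mix positive and negative signs, and show no obvious trend, which is what a reasonable linear fit tends to look like.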

If a residual plot shows a random scatter of points around zero, a linear model is appropriate for the data. However, if the residuals show a clear pattern (like a U-shape), a non-linear model (such as a quadratic) would likely fit better. This topic comes alive when students use collaborative investigations to 'audit' their own models, using residuals to judge whether their predictions are trustworthy or whether they need a different mathematical approach.

Key Questions

  1. Justify why correlation does not necessarily imply a cause-and-effect relationship.
  2. Explain how residuals help us determine if a linear model is appropriate for a data set.
  3. Analyze what the r-value can tell us about the reliability of our predictions.

Learning Objectives

  • Analyze residual plots to evaluate the appropriateness of a linear model for a given data set.
  • Calculate residuals for a set of data points using a given linear regression equation.
  • Explain the meaning of the correlation coefficient (r-value) in terms of the strength and direction of a linear relationship.
  • Critique the reliability of predictions made by a linear model based on the residual plot and r-value.
  • Compare and contrast correlation with causation, providing examples where a strong correlation does not imply a cause-and-effect link.

Before You Start

Creating and Interpreting Scatter Plots

Why: Students need to be able to visualize the relationship between two variables before they can draw a line of best fit or analyze residuals.

Linear Equations and Graphing

Why: Understanding the equation of a line (y = mx + b) is fundamental to calculating predicted values and understanding the line of best fit.

Basic Data Analysis and Interpretation

Why: Students should have prior experience calculating basic statistics like mean and understanding how to interpret data tables.

Key Vocabulary

Scatter Plot: A graph that displays the relationship between two quantitative variables by plotting individual data points.
Line of Best Fit (Regression Line): A straight line that best represents the trend in a scatter plot. The least-squares regression line minimizes the sum of the squared vertical distances between the line and the data points.
Residual: The difference between an observed value in a data set and the value predicted by the line of best fit for that observation (observed minus predicted).
Residual Plot: A scatter plot where the x-axis represents the independent variable (or predicted values) and the y-axis represents the residuals.
Correlation Coefficient (r-value): A statistical measure that indicates the strength and direction of a linear relationship between two variables, ranging from -1 to +1.
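For teachers who want to show where the r-value comes from, here is a minimal sketch that computes it by hand. The function name `correlation` and both data sets are hypothetical, chosen only for illustration; graphing calculators and spreadsheets report the same quantity.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied vs. test score (strong positive trend).
r_positive = correlation([1, 2, 3, 4, 5], [52, 60, 61, 70, 77])

# Hypothetical data lying exactly on a falling line (perfect negative trend).
r_negative = correlation([1, 2, 3], [6, 4, 2])

print(round(r_positive, 2))  # 0.98
print(round(r_negative, 2))  # -1.0
```

The sign of r gives the direction of the relationship, and how close r is to -1 or +1 gives the strength of the linear trend.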

Watch Out for These Misconceptions

Common Misconception: Students often think a 'pattern' in a residual plot is a good thing because patterns are usually good in math.

What to Teach Instead

Use the 'Model Audit' activity. Peer discussion helps students realize that a pattern in the 'error' (residuals) means the model is consistently missing something, which is a sign that the model is wrong.

Common Misconception: Students often believe that a high r-value means they don't need to check the residuals.

What to Teach Instead

Show a data set that is slightly curved but still has a high r-value. Collaborative analysis of the residual plot will reveal the curve that the r-value missed, showing that the residual plot, not the r-value alone, is the 'final word' on model fit.
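One way to build such a data set is sketched below. It uses hypothetical points on the curve y = x² (an assumption made purely for illustration), fits the least-squares line, and then looks at the signs of the residuals.

```python
from math import sqrt

# Hypothetical curved data: y = x^2 for x = 1..10 (not linear).
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line and correlation coefficient.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)

# Residual = observed - predicted; record only the sign of each one.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
signs = "".join("+" if res > 0 else "-" for res in residuals)

print(round(r, 2))  # 0.97
print(signs)        # ++------++
```

The r-value of about 0.97 looks excellent, yet the residuals are positive at both ends and negative in the middle. That U-shape is the pattern students should learn to spot.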

Real-World Connections

  • Economists use regression analysis to model the relationship between advertising spending and product sales, helping companies determine optimal marketing budgets.
  • Environmental scientists analyze data on pollution levels and respiratory illnesses to understand potential correlations, informing public health policies and regulations.
  • Sports analysts employ regression to predict player performance based on historical statistics, aiding in team strategy and player evaluations.

Assessment Ideas

Exit Ticket

Provide students with a scatter plot, a line of best fit, and a residual plot. Ask them to write one sentence explaining whether the linear model is appropriate based on the residual plot and to identify the r-value if provided, stating what it indicates about the data.

Quick Check

Present students with two scenarios: Scenario A shows a strong positive correlation between hours studied and test scores, with a random scatter of residuals. Scenario B shows a moderate positive correlation, but the residuals form a clear U-shape. Ask students to explain which scenario's linear model is more reliable and why, referencing the residual plot.

Discussion Prompt

Pose the question: 'If ice cream sales and drowning incidents are highly correlated, does eating ice cream cause people to drown?' Guide students to discuss correlation versus causation, using the concept of a lurking variable (such as hot weather, which increases both ice cream sales and swimming) to support their arguments.

Frequently Asked Questions

What is a residual plot?
A residual plot is a graph that shows the residuals (errors) on the vertical axis and the independent variable (x) on the horizontal axis. It helps you see if the errors are random or if there is a systematic problem with your model.
How can active learning help students understand residuals?
Active learning strategies like 'Predicting with Error' turn residuals into something concrete, such as the gap between the predicted and actual distance a doll falls or the amount by which a budget was off. When students see the 'residual' as a physical mistake they made in a simulation, they understand why we want those mistakes to be small and random. This makes the abstract process of 'analyzing error' feel like a necessary part of problem-solving.
What does a 'U-shaped' residual plot mean?
A U-shaped (or curved) pattern in a residual plot indicates that the relationship between the variables is probably not linear. It suggests that a non-linear model, such as a quadratic or exponential, may be a better fit for the data.
Can a residual be negative?
Yes! A negative residual means the actual observed value was lower than what the model predicted. A positive residual means the actual value was higher than the prediction.
