Computer Science · 9th Grade · Data Intelligence and Visualization · Weeks 28-36

Correlation vs. Causation

Students will analyze why correlation does not necessarily imply a causal relationship.

Standards: CSTA 3A-DA-12

About This Topic

The distinction between correlation and causation is one of the most practically important ideas in data literacy. When two variables move together, people naturally assume that one causes the other. But correlation only measures how variables co-vary; it says nothing about whether one produces a change in the other. Students who grasp this distinction become much more careful consumers of statistics in news articles, political claims, and scientific reporting.

In 9th grade, this topic connects to CSTA 3A-DA-12, which asks students to create computational models that represent the relationships among different elements of data collected from a phenomenon or process. Judging whether a modeled relationship is causal or merely correlational is central to that work. Students should practice both finding spurious correlations and identifying the confounding variables that explain why two unrelated things appear to move together. Classic examples from education research, health statistics, and social science give students concrete cases to analyze.

Active learning strategies that involve argument and debate are especially effective here because the temptation to infer causation is strong. When students have to defend or attack a causal claim in front of peers, they sharpen their reasoning about what evidence would actually prove causation.

Key Questions

Explain why correlation does not necessarily imply a causal relationship.
Differentiate between correlation and causation using real-world examples.
Critique claims of causation based solely on correlational data.

Learning Objectives

Analyze datasets to identify instances where correlation exists but causation is unlikely.
Explain the role of confounding variables in creating spurious correlations.
Critique media headlines and advertisements that incorrectly infer causation from correlation.
Differentiate between correlation and causation using at least two distinct real-world examples.
Evaluate the strength of evidence required to establish a causal relationship between two variables.

Before You Start

Introduction to Data Analysis

Why: Students need a basic understanding of how to interpret data tables and graphs to identify patterns and relationships between variables.

Basic Statistical Measures

Why: Familiarity with concepts like averages and trends helps students understand how variables co-vary, which is the basis of correlation.

Key Vocabulary

Correlation	A statistical measure that describes the extent to which two variables change together. It indicates a relationship but not necessarily a cause-and-effect link.
Causation	The relationship between cause and effect, where one event is the direct result of another event.
Spurious Correlation	A relationship between two variables that appears to be causal but is actually due to coincidence or a third, unobserved variable.
Confounding Variable	A third variable, often unmeasured, that influences both the presumed cause and the presumed effect, creating a misleading association between them.
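To make "change together" concrete, here is a minimal Python sketch of the Pearson correlation coefficient r, the most common correlation measure. The function name `pearson_r` and the study-hours numbers are hypothetical, chosen only for illustration. Notice that r comes out the same whichever variable is listed first, which is one way to see that a correlation cannot tell you which variable (if either) is the cause.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, which ranges from -1 to 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Each product is positive when x and y sit on the same side of their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable never changes")
    return cov / (spread_x * spread_y)


# Hypothetical numbers, used only to show the calculation.
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]

print(round(pearson_r(hours, scores), 3))  # 0.991
print(round(pearson_r(scores, hours), 3))  # 0.991: swapping the order changes nothing
```

A value near 1 says only that the points lie close to a rising straight line; nothing in the formula refers to time order or mechanism.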

Watch Out for These Misconceptions

Common Misconception: A strong correlation means one variable definitely causes the other.

What to Teach Instead

A strong correlation only means two variables tend to move together. Causation requires evidence of a mechanism and is typically established through controlled experiments, not observation alone. Debate and critique activities in which students must argue both sides help them recognize how weak correlation-only arguments are.

Common Misconception: If there is no correlation, there is no relationship.

What to Teach Instead

Some relationships are non-linear: two variables might be strongly related in a curved or cyclic pattern while showing near-zero linear correlation. Students who only look at correlation coefficients without graphing data can completely miss these patterns. Scatter plot analysis activities help build this visual awareness.
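The following sketch shows this misconception in miniature. It assumes plain Python and a hypothetical `pearson_r` helper like the one a class might write; the data are invented. Here y is completely determined by x (y is x squared), yet the linear correlation is exactly zero because the pattern is a U shape. A crude text plot reveals what the coefficient hides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r: measures LINEAR co-variation only."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable never changes")
    return cov / (spread_x * spread_y)


# y depends entirely on x, but the relationship is curved, not straight.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

print(pearson_r(xs, ys))  # 0.0

# Sideways text plot: bar length is y, so the U shape is easy to see.
for x, y in zip(xs, ys):
    print(f"{x:>3} {'*' * y}")
```

The left half of the U slopes down and the right half slopes up, so the two halves cancel in the coefficient even though the relationship is perfect.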

Active Learning Ideas


Formal Debate: Does Ice Cream Cause Drowning?

Present the classic ice cream and drowning correlation (both rise in summer). Half the class argues it is causal; the other half argues against. After three minutes of debate, introduce the confounding variable (summer heat) and discuss what evidence would have been needed to establish causation.

25 min·Whole Class

Gallery Walk: Spurious Correlations

Post five or six real spurious correlation charts around the room. Groups rotate and, for each chart, write: (1) the conclusion someone might mistakenly draw, (2) why correlation does not prove causation here, and (3) a plausible confounding variable, or a note that the match may be pure coincidence. Groups share their best find with the class.

35 min·Small Groups

Think-Pair-Share: News Headline Analysis

Provide four recent news headlines that imply causation from correlational data. Students individually identify the implied causal claim and whether the evidence supports it. Partners compare their analysis and discuss what a rigorous study would need to do differently.

20 min·Pairs

Real-World Connections

Medical researchers must distinguish between a symptom that correlates with a disease and the actual cause of the disease to develop effective treatments. For example, a high fever correlates with many illnesses, but it is a symptom, not the cause itself.
Marketing professionals sometimes present data showing a correlation between their product's sales and a positive trend, like increased happiness, without proving their product caused the happiness. Consumers need to recognize this distinction.
In public policy, policymakers might observe a correlation between increased ice cream sales and higher crime rates during summer months. They need to identify the confounding variable, such as warmer weather, which influences both.

Assessment Ideas

Exit Ticket

Provide students with three scenarios: one showing clear causation, one showing correlation without causation, and one with a potential confounding variable. Ask students to label each scenario and write one sentence explaining their reasoning for the correlation/causation distinction.

Discussion Prompt

Present students with a headline like 'Study Shows Coffee Drinkers Live Longer.' Ask them: 'What is the correlation being presented here? What are two possible confounding variables that could explain this correlation? What kind of study design would be needed to suggest causation?'

Quick Check

Show students two graphs: Graph A displays a strong correlation between two unrelated variables (e.g., the falling number of pirates and rising global temperatures). Graph B displays a correlation with a plausible causal mechanism (e.g., hours studied and test scores). Ask students to identify which graph most likely represents correlation only and which might reflect causation, and to briefly justify their answers, including what further evidence would be needed to confirm causation.

Frequently Asked Questions

What is the difference between correlation and causation?

Correlation describes a statistical relationship in which two variables tend to move together. Causation means that a change in one variable directly produces a change in the other. Establishing causation typically requires controlled experiments, or carefully designed observational studies supported by a plausible mechanism. Correlation alone, no matter how strong, cannot prove that one thing causes another.
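A small simulation can show how strong correlations arise between series that have no causal link at all. It assumes, for illustration, that trending quantities behave like random walks (each value is the previous one plus a random step); the helper names, seed, and trial counts are arbitrary choices, not a standard method. Every pair of series is generated independently, so any correlation between them is coincidence.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, which ranges from -1 to 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def white_noise(rng, steps):
    """A series with no trend: every value is an independent random draw."""
    return [rng.gauss(0, 1) for _ in range(steps)]


def random_walk(rng, steps):
    """A series that drifts: each value is the previous one plus a random step."""
    value, series = 0.0, []
    for _ in range(steps):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def mean_abs_r(make_series, trials=200, steps=100, seed=42):
    """Average |r| over many pairs of series generated completely independently."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = make_series(rng, steps)
        b = make_series(rng, steps)  # b never looks at a: no causal link at all
        total += abs(pearson_r(a, b))
    return total / trials


noise_r = mean_abs_r(white_noise)
walk_r = mean_abs_r(random_walk)
print(f"independent noise:        average |r| = {noise_r:.2f}")
print(f"independent random walks: average |r| = {walk_r:.2f}")
```

Because both random walks drift, they often trend in the same or opposite direction by chance, so their average |r| typically comes out several times larger than for the trendless noise. Spurious correlations between two time series often have exactly this shape.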

What is a confounding variable?

A confounding variable is a third factor that influences both variables being studied, creating the appearance of a relationship between them. In the ice cream and drowning example, summer heat is the confounder: it independently increases both ice cream sales and swimming activity. Identifying confounders is the first step in questioning a causal claim.
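The ice cream example can be simulated to show what "controlling for" a confounder means. In this sketch every number is made up: temperature drives both ice cream sales and a hypothetical drowning-risk score, and neither of those two affects the other. Comparing only days that share the same temperature (stratifying) removes the confounder's influence.

```python
import math
import random
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, which ranges from -1 to 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rng = random.Random(7)
days = []
for _ in range(2000):
    temp = rng.randint(10, 35)                    # hypothetical daily high in °C
    ice_cream = 20 * temp + rng.gauss(0, 40)      # heat raises sales
    drowning_risk = 0.5 * temp + rng.gauss(0, 3)  # heat raises swimming, and risk
    days.append((temp, ice_cream, drowning_risk))

# Across all days, temperature links the two variables.
overall = pearson_r([d[1] for d in days], [d[2] for d in days])

# Group days by temperature so the confounder is held constant within each group.
by_temp = defaultdict(list)
for temp, ice, risk in days:
    by_temp[temp].append((ice, risk))

within = [
    pearson_r([ice for ice, _ in group], [risk for _, risk in group])
    for group in by_temp.values()
    if len(group) >= 10
]
avg_within = sum(within) / len(within)

print(f"all days:              r = {overall:.2f}")
print(f"same-temperature days: average r = {avg_within:.2f}")
```

Across all days the two variables are strongly correlated; among days with the same temperature the average correlation falls to about zero, because the only connection between them ran through temperature.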

How do researchers establish causation if not through correlation?

The gold standard is a randomized controlled experiment, in which subjects are randomly assigned to a treatment group or a control group. Randomization balances both measured and unmeasured factors across the groups on average, so a difference in outcomes larger than chance variation can be attributed to the treatment. When experiments are unethical or impractical, researchers turn to techniques such as natural experiments or propensity score matching.
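This sketch contrasts an observational study with a randomized one under assumed conditions: a hypothetical vitamin has no effect at all, and hidden baseline health makes healthier people more likely to choose it. The function names, probabilities, and effect sizes are all invented for illustration.

```python
import random


def average(values):
    return sum(values) / len(values)


def run_study(randomized, people=5000, seed=1):
    """Return (mean outcome of vitamin takers) minus (mean outcome of non-takers)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(people):
        health = rng.gauss(0, 1)  # hidden confounder: baseline health
        if randomized:
            takes_vitamin = rng.random() < 0.5  # coin flip ignores health
        else:
            # People choose for themselves, and healthier people opt in more often.
            takes_vitamin = rng.random() < (0.8 if health > 0 else 0.2)
        # The vitamin does NOT appear here: in this model it has zero effect.
        outcome = 10 * health + rng.gauss(0, 5)
        (treated if takes_vitamin else control).append(outcome)
    return average(treated) - average(control)


observational_diff = run_study(randomized=False)
randomized_diff = run_study(randomized=True)
print(f"observational difference: {observational_diff:.2f}")
print(f"randomized difference:    {randomized_diff:.2f}")
```

In the observational version the vitamin takers look clearly healthier, purely because of who chose to take it; with coin-flip assignment the gap shrinks to near zero. The same trap sits behind a headline like "Study Shows Coffee Drinkers Live Longer."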

How does active learning help students understand correlation vs. causation?

Debate activities force students to articulate why a correlation does not prove causation rather than just passively accepting the fact. When students must construct and defend arguments, they internalize the reasoning at a deeper level. Analyzing real spurious correlations adds a memorable hook that makes the concept stick long after the lesson.

More in Data Intelligence and Visualization

Data Collection Methods and Bias

Students will explore techniques for gathering data and analyze how bias in data collection can lead to inaccurate conclusions.

Ethical Data Scraping and Privacy

Students will discuss the ethical considerations of scraping data from public websites and privacy implications.

Data Cleaning and Preprocessing

Students will learn the necessity of cleaning data to ensure accuracy and handle missing or corrupted data.

Identifying Trends in Data

Students will use computational tools to identify patterns and trends within datasets.

Evaluating Data-Driven Conclusions

Students will learn to critically evaluate conclusions drawn from data, considering limitations and potential biases.

Ethical Implications of Algorithmic Predictions

Students will discuss the dangers of over-relying on algorithmic predictions for social issues.