Correlation vs. Causation
Students will analyze why correlation does not necessarily imply a causal relationship.
About This Topic
The distinction between correlation and causation is one of the most practically important in data literacy. When two variables move together, people naturally assume one causes the other. But correlation only measures how variables co-vary; it says nothing about whether one produces the other. Students who can spot this distinction become much more careful consumers of statistics in news articles, political claims, and scientific reporting.
In 9th grade, this topic connects to CSTA 3A-DA-12, which asks students to analyze the characteristics of data to effectively use it to solve problems. Students should practice both finding spurious correlations and identifying the confounding variables that explain why two unrelated things appear to move together. Classic examples from education research, health statistics, and social science give students concrete cases to analyze.
Active learning strategies that involve argument and debate are especially effective here because the temptation to infer causation is strong. When students have to defend or attack a causal claim in front of peers, they sharpen their reasoning about what evidence would actually prove causation.
Key Questions
- Explain why correlation does not necessarily imply a causal relationship.
- Differentiate between correlation and causation using real-world examples.
- Critique claims of causation based solely on correlational data.
Learning Objectives
- Analyze datasets to identify instances where correlation exists but causation is unlikely.
- Explain the role of confounding variables in creating spurious correlations.
- Critique media headlines and advertisements that incorrectly infer causation from correlation.
- Differentiate between correlation and causation using at least two distinct real-world examples.
- Evaluate the strength of evidence required to establish a causal relationship between two variables.
Before You Start
Why: Students need a basic understanding of how to interpret data tables and graphs to identify patterns and relationships between variables.
Why: Familiarity with concepts like averages and trends helps students understand how variables co-vary, which is the basis of correlation.
Key Vocabulary
| Term | Definition |
| --- | --- |
| Correlation | A statistical measure that describes the extent to which two variables change together. It indicates a relationship but not necessarily a cause-and-effect link. |
| Causation | The relationship between cause and effect, where one event is the direct result of another event. |
| Spurious Correlation | A relationship between two variables that appears to be causal but is actually due to coincidence or a third, unobserved variable. |
| Confounding Variable | A third variable, often unmeasured, that influences both the presumed cause and the presumed effect, creating a misleading association between them. |
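To make the "statistical measure" in the correlation definition concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to quantify how two variables change together. The function name `pearson_r` and the six-student dataset are invented for illustration; they are not part of the lesson materials.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do x and y tend to sit above/below their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical classroom data: hours studied and quiz score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 78, 85]

r = pearson_r(hours, scores)
print(f"r = {r:.2f}")  # a value close to +1 means the two lists rise together
```

A value near +1 or -1 says the points lie close to a straight line; it does not say which variable, if either, is driving the other.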
Watch Out for These Misconceptions
Common Misconception: A strong correlation means one variable definitely causes the other.
What to Teach Instead
A strong correlation only means two variables tend to move together. Causation requires evidence of a mechanism and is typically established through controlled experiments, not observation alone. Debate and critique activities where students must argue both sides help them feel the weakness of correlation-only arguments.
Common Misconception: If there is no correlation, there is no relationship.
What to Teach Instead
Some relationships are non-linear: two variables might be strongly related in a curved or cyclic pattern while showing near-zero linear correlation. Students who only look at correlation coefficients without graphing data can completely miss these patterns. Scatter plot analysis activities help build this visual awareness.
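The point about non-linear relationships can be demonstrated with a short sketch. Here `y` is completely determined by `x` (it is simply `x` squared), yet the linear correlation coefficient is zero because the curve is symmetric. The helper `pearson_r` and the data are illustrative assumptions, not part of the lesson materials.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# x runs from -5 to 5; y depends on x perfectly, but along a U-shaped curve.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")  # zero, up to floating-point rounding
```

Plotting `xs` against `ys` shows the U shape immediately, which is why graphing the data before trusting a coefficient is worth the time.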
Active Learning Ideas
Formal Debate: Does Ice Cream Cause Drowning?
Present the classic ice cream and drowning correlation (both rise in summer). Half the class argues it is causal; the other half argues against. After three minutes of debate, introduce the confounding variable (summer heat) and discuss what evidence would have been needed to establish causation.
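For classes with some programming experience, the ice cream and drowning example can be explored with simulated data. The sketch below is built on invented numbers: temperature drives both variables, and ice cream has no effect on drowning by construction. It then removes the part of each variable that a straight-line fit on temperature explains and correlates what is left. The names `pearson_r`, `residuals`, `ice_cream`, and `drowning_rate`, and all coefficients, are assumptions for illustration; subtracting a fitted line is just one simple way to "control for" a confounder.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(xs, ys):
    """Return what is left of ys after subtracting a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the class sees the same numbers each run

# Simulated days: temperature (the confounder) influences both variables.
temperature = [rng.uniform(0, 35) for _ in range(500)]
ice_cream = [20 + 5 * t + rng.gauss(0, 15) for t in temperature]
# A made-up daily risk score, not real drowning counts.
drowning_rate = [1 + 0.2 * t + rng.gauss(0, 1.5) for t in temperature]

r_raw = pearson_r(ice_cream, drowning_rate)
r_adjusted = pearson_r(residuals(temperature, ice_cream),
                       residuals(temperature, drowning_rate))

print(f"ice cream vs. drowning:           r = {r_raw:.2f}")
print(f"after accounting for temperature: r = {r_adjusted:.2f}")
```

The first correlation is strong even though neither variable affects the other in the simulation; the second is close to zero because the shared influence of temperature has been removed.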
Gallery Walk: Spurious Correlations
Post five or six real spurious correlation charts around the room. Groups rotate and for each chart write: (1) the apparent conclusion someone might draw, (2) why correlation does not prove causation here, and (3) a plausible confounding variable. Groups share their best find with the class.
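Many spurious correlation charts pair two quantities that both simply trend upward over time. The sketch below, using invented series rather than real data, shows how two independent upward trends produce a very high correlation, and how the correlation largely disappears when the step-to-step changes are compared instead of the raw levels. The names `series_a`, `series_b`, and `changes`, and the trend slopes, are assumptions for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def changes(series):
    """Return the difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed for repeatable classroom output

# Two unrelated quantities that both happen to grow over 300 time steps.
series_a = [2 * t + rng.gauss(0, 5) for t in range(300)]
series_b = [3 * t + rng.gauss(0, 5) for t in range(300)]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(changes(series_a), changes(series_b))

print(f"raw levels:           r = {r_levels:.2f}")
print(f"step-to-step changes: r = {r_changes:.2f}")
```

Here the hidden third variable is time itself: anything that grows steadily will correlate with anything else that grows steadily.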
Think-Pair-Share: News Headline Analysis
Provide four recent news headlines that imply causation from correlational data. Students individually identify the implied causal claim and whether the evidence supports it. Partners compare their analysis and discuss what a rigorous study would need to do differently.
Real-World Connections
- Medical researchers must distinguish between a symptom that correlates with a disease and the actual cause of the disease to develop effective treatments. For example, a high fever correlates with many illnesses, but it is a symptom, not the cause itself.
- Marketing professionals sometimes present data showing a correlation between their product's sales and a positive trend, like increased happiness, without proving their product caused the happiness. Consumers need to recognize this distinction.
- Policymakers might observe a correlation between increased ice cream sales and higher crime rates during summer months. Before acting on it, they need to identify the confounding variable, such as warmer weather, that influences both.
Assessment Ideas
Provide students with three scenarios: one showing clear causation, one showing correlation without causation, and one with a potential confounding variable. Ask students to label each scenario and write one sentence explaining their reasoning for the correlation/causation distinction.
Present students with a headline like 'Study Shows Coffee Drinkers Live Longer.' Ask them: 'What is the correlation being presented here? What are two possible confounding variables that could explain this correlation? What kind of study design would be needed to suggest causation?'
Show students two graphs: Graph A displays a strong correlation between two unrelated variables (e.g., the declining number of pirates and rising global temperatures). Graph B displays a relationship with a plausible causal mechanism (e.g., hours studied and test scores). Ask students to identify which graph represents correlation only and which might represent causation, and to briefly justify their answers.
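The study-design question in the coffee headline prompt can be illustrated with a small simulation. In the sketch below, every number is invented and coffee has no effect on lifespan by construction. A hidden `health` score makes people both more likely to choose coffee and likely to live longer, so an observational comparison shows a gap; when coffee is assigned by coin flip instead, the gap shrinks toward zero. All names and parameters are assumptions for illustration and say nothing about real coffee research.

```python
import random

def mean(values):
    return sum(values) / len(values)

def lifespan_gap(drinks_coffee, lifespans):
    """Average lifespan of coffee drinkers minus that of non-drinkers."""
    drinkers = [life for d, life in zip(drinks_coffee, lifespans) if d]
    others = [life for d, life in zip(drinks_coffee, lifespans) if not d]
    return mean(drinkers) - mean(others)

rng = random.Random(42)  # fixed seed for repeatable output
n = 4000

# Hidden confounder: a general health score for each simulated person.
health = [rng.gauss(0, 1) for _ in range(n)]
# Lifespan depends on health and chance only; coffee plays no role here.
lifespan = [78 + 3 * h + rng.gauss(0, 5) for h in health]

# Observational study: healthier people are more likely to choose coffee.
chose_coffee = [h + rng.gauss(0, 1) > 0 for h in health]
# Randomized experiment: a coin flip decides who drinks coffee.
assigned_coffee = [rng.random() < 0.5 for _ in range(n)]

observational_gap = lifespan_gap(chose_coffee, lifespan)
randomized_gap = lifespan_gap(assigned_coffee, lifespan)

print(f"observational gap: {observational_gap:.1f} years")
print(f"randomized gap:    {randomized_gap:.1f} years")
```

Random assignment works because the coin flip cannot be influenced by health, so the two groups end up with similar health on average and the confounder no longer separates them.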
Frequently Asked Questions
What is the difference between correlation and causation?
What is a confounding variable?
How do researchers establish causation if not through correlation?
How does active learning help students understand correlation vs. causation?