Data Cleaning and Preprocessing

Active learning works for data cleaning and preprocessing because students need to experience firsthand how messy data can obscure or reveal stories. When they see a poorly designed visualization fail to communicate, they understand why cleaning and preprocessing matter.

Common Core State Standards · CSTA: 3A-DA-11

20–45 min · Pairs → Whole Class · 3 activities

Activity 01

Gallery Walk · 35 min · Individual

Gallery Walk: The Good, The Bad, and The Misleading

Post various charts and infographics around the room. Students use a checklist to identify 'design wins' and 'design sins,' such as truncated Y-axes or confusing color schemes.
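If students ask why a truncated Y-axis counts as a "design sin," a short calculation can make the distortion concrete. The sketch below is a hypothetical illustration, not part of the activity materials: the function name `drawn_height_ratio` and the sample values are invented. It compares how much taller one bar looks than another when the axis starts at zero and when it starts just below the smaller value.

```python
def drawn_height_ratio(smaller, larger, axis_start=0.0):
    """Return how many times taller the larger bar is drawn than the smaller one.

    axis_start is the value at which the Y-axis begins (0 for an untruncated axis).
    """
    if axis_start >= smaller:
        raise ValueError("axis_start must be below both values")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical values: two bars representing 50 and 55.
honest = drawn_height_ratio(50, 55)                    # axis starts at 0
truncated = drawn_height_ratio(50, 55, axis_start=48)  # axis starts at 48

print(f"Axis from 0:  larger bar drawn {honest:.1f}x as tall")
print(f"Axis from 48: larger bar drawn {truncated:.1f}x as tall")
```

The underlying data differ by 10%, yet with the axis starting at 48 the larger bar is drawn several times taller, which is the kind of gap between data and impression students can add to their checklist.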

Explain how to handle missing or corrupted data in a large dataset.

Facilitation Tip: For the Gallery Walk, circulate with a checklist of common design flaws so you can redirect groups to specific issues like inconsistent scales or misleading axes.

What to look for: Present students with a small table containing 5-7 rows of sample data with clear errors (e.g., a missing age, a text value in a numerical column, a duplicate entry). Ask them to list the specific errors they find and suggest one method for correcting each.
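For classes that are ready to check their answers computationally, the three error types named above can be detected with a few lines of Python. This is a minimal sketch under stated assumptions: the sample rows, the column names, and the helper names `is_number` and `find_errors` are all invented for illustration, and every value is stored as text, as it would be when read from a CSV file.

```python
# Hypothetical sample table; names and values are invented for illustration.
rows = [
    {"name": "Ava", "age": "15", "score": "88"},
    {"name": "Ben", "age": "",   "score": "92"},      # missing age
    {"name": "Cy",  "age": "16", "score": "ninety"},  # text in a numeric column
    {"name": "Dee", "age": "17", "score": "75"},
    {"name": "Ava", "age": "15", "score": "88"},      # duplicate of the first row
]


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_errors(rows, numeric_columns=("age", "score")):
    """Return (row_index, description) pairs for the errors this sketch can detect."""
    errors = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            # An exact copy of an earlier row; skip the other checks for it.
            errors.append((i, "duplicate row"))
            continue
        seen.add(key)
        for col in numeric_columns:
            value = row.get(col, "").strip()
            if value == "":
                errors.append((i, f"missing {col}"))
            elif not is_number(value):
                errors.append((i, f"non-numeric {col}: {value!r}"))
    return errors


for index, problem in find_errors(rows):
    print(f"row {index}: {problem}")
```

Detection is only half of the task. Students still need to decide what to do with each flagged row, which is the judgment the assessment is looking for.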

Understand · Apply · Analyze · Create · Relationship Skills · Social Awareness

Activity 02

Inquiry Circle · 45 min · Small Groups

Inquiry Circle: Data Makeover

Give groups a poorly designed chart and the raw data it represents. They must work together to create a new, more accurate and persuasive visualization for a specific target audience.

Differentiate between various data cleaning techniques (e.g., imputation, outlier removal).

Facilitation Tip: During the Data Makeover, assign each group one messy dataset and one clear question it should answer, so their makeover has a measurable goal.

What to look for: Pose the scenario: 'Imagine you are cleaning a dataset of student test scores, and you find that 10% of the scores are missing. What are at least two different approaches you could take to handle these missing scores, and what are the pros and cons of each approach?' Facilitate a class discussion on their responses.
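To ground the discussion, it can help to show what two common approaches actually do to the numbers. The sketch below compares dropping the missing scores with filling them in using the mean of the observed scores (mean imputation). The score list is hypothetical, with one of ten values missing, and the function names `drop_missing` and `impute_mean` are invented for this example.

```python
import statistics

# Hypothetical test scores; None marks a missing value (1 of 10, i.e. 10%).
scores = [72, 85, None, 90, 64, 78, 88, 95, 70, 78]


def drop_missing(values):
    """Approach 1: discard the missing entries."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Approach 2: replace each missing entry with the mean of the observed ones."""
    fill = statistics.mean(drop_missing(values))
    return [fill if v is None else v for v in values]


dropped = drop_missing(scores)
imputed = impute_mean(scores)

for label, data in (("dropped", dropped), ("mean-imputed", imputed)):
    print(
        f"{label}: n={len(data)}, "
        f"mean={statistics.mean(data):.2f}, "
        f"stdev={statistics.stdev(data):.2f}"
    )
```

Dropping keeps only real measurements but shrinks the dataset, and it can bias results if the missing scores are not missing at random. Mean imputation keeps every student in the dataset and leaves the mean unchanged, but it makes the scores look less spread out than they really are. Students can weigh both trade-offs when they list pros and cons.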

Analyze · Evaluate · Create · Self-Management · Self-Awareness

Activity 03

Think-Pair-Share · 20 min · Pairs

Think-Pair-Share: Color and Perception

Show two versions of the same map: one using a 'stoplight' (red/green) scale and one using a blue/orange scale. Students discuss which is better for accessibility and how color changes the 'mood' of the data.

Construct a plan for cleaning a given messy dataset.

Facilitation Tip: In the Color and Perception activity, provide identical data to both partners but different color palettes so they can directly compare how perception changes with design choices.

What to look for: Provide students with a brief description of a messy dataset (e.g., 'customer purchase history with some missing product IDs and inconsistent date formats'). Ask them to write down three specific cleaning steps they would perform on this data and the order in which they would perform them.
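One possible three-step plan for the example dataset can be sketched in Python to show why order matters. Everything here is an assumption made for illustration: the records, the field names, the list of accepted date formats, and the function names `to_iso` and `clean`. Other orderings and steps can be equally valid if students justify them.

```python
from datetime import datetime

# Hypothetical purchase records; field names and date formats are assumptions.
purchases = [
    {"customer": "C1", "product_id": "P10", "date": "2024-03-05"},
    {"customer": "C1", "product_id": "P10", "date": "03/05/2024"},  # same purchase, other format
    {"customer": "C2", "product_id": "",    "date": "07.03.2024"},  # missing product ID
    {"customer": "C3", "product_id": "P22", "date": "2024-03-09"},
]

# Assumed formats. Ambiguous dates such as 03/05/2024 are read as month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_iso(text):
    """Convert a date string to YYYY-MM-DD, or return None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Return (cleaned_records, records_needing_review)."""
    # Step 1: standardize dates so equivalent records become comparable.
    standardized = [{**r, "date": to_iso(r["date"])} for r in records]

    # Step 2: set aside records with a missing product ID or unreadable date.
    complete, needs_review = [], []
    for r in standardized:
        if r["product_id"].strip() and r["date"]:
            complete.append(r)
        else:
            needs_review.append(r)

    # Step 3: remove duplicates, which are only visible after step 1.
    seen, unique = set(), []
    for r in complete:
        key = (r["customer"], r["product_id"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique, needs_review


cleaned, review = clean(purchases)
print("cleaned:", cleaned)
print("needs review:", review)
```

In this sketch the first two records look different until their dates are standardized, so removing duplicates before fixing dates would miss them. Incomplete records are set aside for review instead of being deleted, which leaves room for students to discuss whether the missing IDs could be recovered.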

Understand · Apply · Analyze · Self-Awareness · Relationship Skills

A few notes on teaching this unit

Start by teaching students the mantra: 'The data is not the visualization, and the visualization is not the truth.' Avoid overwhelming them with rules; instead, let them discover principles through critique and revision. Research shows that students learn data ethics best when they create misleading visualizations themselves, then reflect on why clarity matters.

Successful learning looks like students recognizing design flaws in visualizations, suggesting clear corrections, and explaining why their changes improve clarity. They should also articulate how human choices in color, scale, and chart type influence interpretation.

Watch Out for These Misconceptions

During Gallery Walk: Watch for students assuming all charts are objective. Redirect them by asking, 'What story does this chart tell, and who benefits from that story?'
Direct students to compare two charts of the same data side by side. Ask them to identify which chart aligns with the data and which alters the narrative, then explain their reasoning using specific design choices.
During Data Makeover: Watch for students prioritizing aesthetics over clarity. Redirect them by asking, 'Does this change make the data easier or harder to understand?'
Have students present their cleaned visualization to the class and justify each design choice. Peers should vote on whether the chart answers its intended question, forcing students to defend their clarity-focused decisions.
