Data Cleaning and Preprocessing: Activities & Teaching Strategies

Active learning works for data cleaning and preprocessing because students must engage directly with messy datasets to understand why clean data matters. When students manipulate real-world data, they experience firsthand how poor data quality distorts insights, making abstract concepts like outliers and missing values tangible and memorable.

Year 9 · Technologies · 3 activities · 20 min to 50 min

Learning Objectives

  1. Identify types of data errors, including missing values, inconsistencies, and outliers, within a given dataset.
  2. Compare and contrast different data cleaning techniques, such as imputation, outlier removal, and data standardization, for suitability to specific error types.
  3. Construct a step-by-step plan to preprocess a provided dataset, justifying the chosen cleaning methods.
  4. Evaluate the impact of 'dirty data' on the accuracy and reliability of analytical results using a case study.
  5. Demonstrate the application of at least two data cleaning techniques using a spreadsheet or data analysis tool.

40 min·Small Groups

Gallery Walk: The Good, The Bad, and The Misleading

Display various data visualizations around the room, including some that are intentionally misleading (e.g., truncated y-axes). Students move in groups to identify what each graph is trying to say and any 'tricks' used to distort the data.

Explain the impact of 'dirty data' on the accuracy of analytical results.

Facilitation Tip: During the Gallery Walk, circulate and ask students to explain why they grouped charts as 'good,' 'bad,' or 'misleading' to uncover their reasoning processes.

Setup: Wall space or tables arranged around room perimeter

Materials: Large paper/poster boards, Markers, Sticky notes for feedback

Understand · Apply · Analyze · Create · Relationship Skills · Social Awareness

50 min·Small Groups

Inquiry Circle: Data Makeover

Give groups a boring table of raw data and a poorly chosen chart. They must work together to create three different visualizations of that same data, explaining which one is most effective for a specific target audience, such as a local council.

Differentiate between various data cleaning techniques and their appropriate uses.

Facilitation Tip: For Data Makeover, provide a rubric with clear criteria for clean data and effective visualizations to guide student revisions.

Setup: Groups at tables with access to source materials

Materials: Source material collection, Inquiry cycle worksheet, Question generation protocol, Findings presentation template

Analyze · Evaluate · Create · Self-Management · Self-Awareness

20 min·Pairs

Think-Pair-Share: Outlier Detective

Show a scatter plot with several clear outliers. In pairs, students discuss what might have caused these outliers (data error vs. interesting phenomenon) and whether they should be included or removed from the final analysis.

Construct a plan to preprocess a given dataset for analysis.

Facilitation Tip: In Outlier Detective, give students a limited time to analyze outliers before discussing how context determines whether an outlier is meaningful or erroneous.

Setup: Standard classroom seating; students turn to a neighbor

Materials: Discussion prompt (projected or printed), Optional: recording sheet for pairs

Understand · Apply · Analyze · Self-Awareness · Relationship Skills
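
For classes that use Python rather than a spreadsheet, the Outlier Detective discussion can be backed by a small sketch that flags suspicious points without deleting them. It assumes the interquartile range (IQR) rule with the conventional multiplier of 1.5, which is one common choice and not the only one. The step counts and the function name `flag_outliers` are invented for illustration.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily step counts (assumed data, not from a real class).
# 250000 looks like a typing error; 21000 could be a genuine fun-run day.
steps = [7200, 8100, 6900, 7500, 8000, 7700, 21000, 7300, 250000, 7900]

suspects = flag_outliers(steps)
print("Values to investigate before deciding:", suspects)
```

The function only flags values. Whether each one is kept or removed remains a judgement that students make from the context of the data.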

Teaching This Topic

Experienced teachers approach this topic by focusing on real-world data rather than textbook examples, as students engage more when they see the relevance of their work. Avoid starting with theory; instead, let students encounter data problems organically through activities, then guide them to discover solutions collaboratively. Research shows that when students experience the frustration of working with 'dirty' data, they develop a deeper appreciation for the importance of preprocessing steps like handling missing values or correcting errors.

What to Expect

Successful learning looks like students confidently identifying data issues, justifying their cleaning choices, and selecting appropriate visualizations to communicate findings clearly. They should also articulate why certain chart types or cleaning methods are more effective than others for specific datasets.

Watch Out for These Misconceptions

Common Misconception: During the Gallery Walk, watch for students who assume colorful or visually complex charts are inherently better, even when they obscure the data.

What to Teach Instead

Use the Gallery Walk as a chance to redirect students to the rubric, asking them to compare charts based on clarity, not aesthetics. For example, if a student praises a 3D pie chart, ask them to explain how the extra dimension affects their ability to read the data.

Common Misconception: During Data Makeover, watch for students who believe all outliers must be removed to make the data 'correct.'

What to Teach Instead

During the activity, challenge students to research their outlier before deleting it. Ask them to consider whether the outlier represents a true error or an important trend, using the dataset’s context to guide their decision.

Assessment Ideas

Quick Check

After the Gallery Walk, provide students with a small, intentionally flawed dataset (e.g., a table of student heights with missing entries and unrealistic values). Ask them to identify at least three specific data quality issues and propose one method to address each.
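
If a data analysis tool is available, a teacher could prepare the flawed table for this check along the following lines. This is a minimal sketch: the names, heights, and the plausible range of 100 to 220 cm are all assumptions made for illustration, and `audit_heights` is a hypothetical helper.

```python
# Hypothetical class data: heights in centimetres; None marks a blank cell.
heights = {
    "Ava": 158,
    "Ben": None,
    "Chloe": 1.62,   # probably entered in metres (inconsistent units)
    "Dev": 165,
    "Eli": 1650,     # probably an extra digit
    "Fay": None,
}

def audit_heights(table, low=100, high=220):
    """Sort names into 'missing' and 'implausible' using an assumed range in cm."""
    issues = {"missing": [], "implausible": []}
    for name, value in table.items():
        if value is None:
            issues["missing"].append(name)
        elif not low <= value <= high:
            issues["implausible"].append(name)
    return issues

report = audit_heights(heights)
print(report)
```

The audit only locates the problems. Students still need to propose a fix for each one, such as converting 1.62 m to 162 cm or checking the original measurement for the 1650 entry.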

Discussion Prompt

During Data Makeover, present students with two versions of an analysis report: one based on raw, 'dirty' data and another based on cleaned data. Facilitate a class discussion using these questions: 'What differences do you observe in the conclusions drawn from each report?' and 'How did the data cleaning process influence the final results?'

Exit Ticket

After Outlier Detective, ask students to define 'outlier' in their own words and describe one scenario where an outlier might be intentionally kept rather than removed. Also, ask them to list one common method for handling missing data.

Extensions & Scaffolding

  • Challenge: Ask students to find their own flawed dataset online, clean it, and present a before-and-after comparison with an explanation of their choices.
  • Scaffolding: Provide a partially cleaned dataset for students who struggle with the initial steps, so they can focus on identifying remaining issues.
  • Deeper exploration: Have students research how data cleaning is used in a specific career field (e.g., healthcare, finance) and present their findings to the class.

Key Vocabulary

Dirty Data: Refers to data that contains errors, inaccuracies, or inconsistencies, making it unreliable for analysis.
Missing Values: Data points that are absent in a dataset. These can be handled through imputation or removal.
Outliers: Data points that significantly differ from other observations in a dataset. They may indicate errors or unusual events.
Data Imputation: The process of replacing missing data values with substituted values, such as the mean, median, or mode of the dataset.
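Data Standardization: The process of rescaling data so that values measured on different scales can be compared fairly. Two common approaches are min-max scaling, which maps values to a range such as 0 to 1, and z-score standardization, which gives the data a mean of 0 and a standard deviation of 1.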
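
The imputation and standardization terms above can be illustrated with a short Python sketch for classes working beyond a spreadsheet. It assumes median imputation and uses only the standard library. The scores and the function names are invented for illustration, and both scaling functions assume the values are not all identical.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = statistics.median(known)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values to the range 0 to 1 (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1 (assumes spread > 0)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]

# Hypothetical quiz scores with two missing entries.
scores = [10, None, 20, 30, None, 40]

filled = impute_median(scores)
scaled = min_max_scale(filled)
standardized = z_scores(filled)
print(filled, scaled, standardized, sep="\n")
```

Comparing `scaled` and `standardized` side by side gives students a concrete way to see that the two approaches change the numbers but not the order of the data points.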
