Computer Science · Grade 9

Active learning ideas

Data Cleaning and Preprocessing

Active learning works for data cleaning and preprocessing because students need to experience the frustration of messy data to understand why cleaning matters. Working with real, imperfect datasets helps them see the direct impact of their decisions on analysis quality.

Ontario Curriculum Expectations: CS.HS.DA.4, CS.HS.S.2
15–45 min · Pairs → Whole Class · 3 activities

Activity 01

Gallery Walk · 45 min · Whole Class

Gallery Walk: Data Storytelling

Groups create large-scale visualizations of a local issue (e.g., cafeteria waste or local transit times). They display their charts around the room, and other students use sticky notes to write down one 'story' or 'trend' they see in the data.

Explain why data cleaning is a crucial step before data analysis.

Facilitation Tip: During the Gallery Walk, circulate and ask students to explain why they chose specific visualizations for their datasets rather than telling them if they are correct.

What to look for: Provide students with a small, messy dataset (e.g., a list of student heights with some missing values and inconsistent units). Ask them to list two specific problems they observe and propose one method to address each problem.
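If the class moves from paper to code, the height example can be cleaned along the following lines. This is a minimal Python sketch rather than a prescribed method: the sample entries, the `to_cm` helper, and the rule that a bare number means centimetres are all assumptions invented for illustration.

```python
import re

# Hypothetical raw entries, invented for illustration only.
raw_heights = ["165 cm", "1.70 m", "", "5 ft 6 in", "172", None, "n/a"]


def to_cm(entry):
    """Return a height in centimetres, or None if the entry cannot be read."""
    if entry is None:
        return None
    text = entry.strip().lower()

    # Feet and inches, e.g. "5 ft 6 in".
    match = re.fullmatch(r"(\d+)\s*ft\s*(\d+)\s*in", text)
    if match:
        feet, inches = int(match.group(1)), int(match.group(2))
        return round((feet * 12 + inches) * 2.54, 1)

    # A number with an optional metric unit, e.g. "165 cm", "1.70 m", "172".
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(cm|m)?", text)
    if match:
        value = float(match.group(1))
        if match.group(2) == "m":
            return round(value * 100, 1)
        # Assumption: a bare number was meant as centimetres.
        return round(value, 1)

    # Blank or unrecognised text is treated as missing, not guessed.
    return None


cleaned = [to_cm(h) for h in raw_heights]
missing = sum(1 for h in cleaned if h is None)
print(cleaned)
print("missing entries:", missing)
```

Three of the seven entries end up as `None`. Deciding what to do with them (drop them, ask the student again, or leave them flagged) is the kind of choice students should be able to justify.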

Understand · Apply · Analyze · Create · Relationship Skills · Social Awareness

Activity 02

Inquiry Circle · 30 min · Small Groups

Inquiry Circle: The Bias Hunt

Provide groups with three different graphs of the same dataset, each using a different scale or chart type. Students must figure out which graph is the most 'honest' and which ones might mislead the viewer.

Analyze common types of data errors and inconsistencies.

Facilitation Tip: For The Bias Hunt, provide printed survey questions so students can physically mark language that might lead respondents toward certain answers.

What to look for: Present students with a scenario: "A survey collected responses about favorite colors, but some entries are 'blue', 'Blue', and 'blu'. What type of data error is this, and how would you standardize it?" Gauge understanding of inconsistency and standardization.
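One way students might standardize the color entries in code is sketched below. It is an illustration under stated assumptions: only 'blue', 'Blue', and 'blu' come from the scenario, while the other responses, the `CORRECTIONS` table, and the `standardize` helper are hypothetical.

```python
from collections import Counter

# 'blue', 'Blue', 'blu' come from the scenario; the rest are invented examples.
responses = ["blue", "Blue", "blu", " RED", "red ", "Green", "grene"]

# Hypothetical correction table, written by hand after inspecting the data.
CORRECTIONS = {"blu": "blue", "grene": "green"}


def standardize(value):
    """Trim whitespace, lowercase, then apply any known spelling correction."""
    key = value.strip().lower()
    return CORRECTIONS.get(key, key)


cleaned = [standardize(r) for r in responses]
counts = Counter(cleaned)
print(counts)
```

Case and whitespace can be fixed by rule, but misspellings need a human decision about what the respondent meant. Keeping the correction table explicit makes that decision visible and open to discussion.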

Analyze · Evaluate · Create · Self-Management · Self-Awareness

Activity 03

Think-Pair-Share · 15 min · Pairs

Think-Pair-Share: Ethical Collection

Students are given a scenario where a new app wants to collect their location data. They discuss with a partner: What is the benefit to the user? What is the risk? Is the collection ethical?

Design a strategy to address missing or erroneous data in a given dataset.

Facilitation Tip: In Think-Pair-Share, assign roles: one student explains ethical collection principles, the other identifies potential violations in a given scenario.

What to look for: Facilitate a class discussion using the prompt: "Imagine you are cleaning data for a survey on student opinions about school lunches. One question asks for a rating from 1 to 5, but some students wrote 'good' or 'great'. What are the implications of these non-numeric responses for your analysis, and what are your options for handling them?"
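To make the options in this prompt concrete, the sketch below compares two of them: discarding non-numeric answers, and mapping words to numbers. The sample responses, the `WORD_TO_SCORE` mapping, and both helper functions are assumptions made up for illustration; choosing that 'good' means 4 is itself a judgment call students should question.

```python
# Hypothetical survey responses, invented for illustration.
raw_ratings = ["4", "good", "5", "great", "2", "", "7"]

# Assumption: one possible (debatable) mapping from words to scores.
WORD_TO_SCORE = {"good": 4, "great": 5}


def parse_rating(entry, map_words=False):
    """Return a rating from 1 to 5, or None if the entry is unusable."""
    text = entry.strip().lower()
    if text.isdigit():
        score = int(text)
        # Numbers outside the 1-5 scale are treated as errors.
        return score if 1 <= score <= 5 else None
    if map_words:
        return WORD_TO_SCORE.get(text)
    return None


def mean_of_valid(values):
    """Average the values that are not None."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)


strict = [parse_rating(r) for r in raw_ratings]
mapped = [parse_rating(r, map_words=True) for r in raw_ratings]

print("discard words:", strict, round(mean_of_valid(strict), 2))
print("map words:    ", mapped, round(mean_of_valid(mapped), 2))
```

With this sample, discarding the word answers gives an average of about 3.67 from three responses, while mapping them gives 4.0 from five. The same raw data produces different results depending on the cleaning choice.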

Understand · Apply · Analyze · Self-Awareness · Relationship Skills

A few notes on teaching this unit

Teachers should model data cleaning with think-alouds, showing how they decide to standardize categories or handle missing values. Avoid the trap of treating data cleaning as a mechanical task. Emphasize that every decision reflects assumptions about what counts as valid data. Research shows students grasp these concepts better when they work with datasets they care about, so incorporate student-generated data when possible.

Successful learning looks like students confidently identifying data issues, justifying their cleaning choices, and explaining how those choices affect the stories their visualizations tell. They should connect technical steps to ethical and practical implications.


Watch Out for These Misconceptions

  • During the Gallery Walk, watch for students assuming their visualizations are correct because they look polished.

    Have peers ask presenters to explain how each choice of chart type connects to the data's structure and purpose. Use a simple rubric during the walk to guide their feedback.

  • During The Bias Hunt, students may think bias comes only from obviously loaded wording, such as 'Do you agree that this is the best plan?'

    Provide examples of subtle bias, such as leading scales or double-barreled questions, and have students rewrite these questions to remove bias, then discuss why their versions are better.


Methods used in this brief