Technologies · Year 9

Active learning ideas

Data Cleaning and Preprocessing

Active learning works for data cleaning and preprocessing because students must engage directly with messy datasets to understand why clean data matters. When students manipulate real-world data, they experience firsthand how poor data quality distorts insights, making abstract concepts like outliers and missing values tangible and memorable.

ACARA Content Descriptions: AC9DT10P01
20–50 min · Pairs → Whole Class · 3 activities

Activity 01

Gallery Walk · 40 min · Small Groups

Gallery Walk: The Good, The Bad, and The Misleading

Display various data visualizations around the room, including some that are intentionally misleading (e.g., truncated y-axes). Students move in groups to identify what each graph is trying to say and any 'tricks' used to distort the data.
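The truncated-axis trick can be made concrete with a little arithmetic, sketched here in Python. The values are hypothetical. A bar is drawn from the bottom of the axis up to its value, so raising the axis minimum shrinks every bar by the same amount. That inflates the apparent difference between bars.

```python
def drawn_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn bar heights of b to a when the y-axis starts at axis_min."""
    # Each bar is drawn from axis_min up to its value, so its visible height is value - axis_min.
    return (b - axis_min) / (a - axis_min)

# Hypothetical values: club members in two different years.
members_2023, members_2024 = 50, 55

honest = drawn_ratio(members_2023, members_2024)         # y-axis starts at 0
truncated = drawn_ratio(members_2023, members_2024, 48)  # y-axis starts at 48

print(f"Actual ratio:        {members_2024 / members_2023:.2f}")
print(f"Full-axis bars:      {honest:.2f}")
print(f"Truncated-axis bars: {truncated:.2f}")
```

With the axis starting at 48, a 10% increase is drawn as a bar 3.5 times taller. Students can try other axis minimums to see how far the distortion can be pushed.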

Explain the impact of 'dirty data' on the accuracy of analytical results.

Facilitation Tip: During the Gallery Walk, circulate and ask students to explain why they grouped charts as 'good,' 'bad,' or 'misleading' to uncover their reasoning processes.

What to look for: Provide students with a small, intentionally flawed dataset (e.g., a table of student heights with some missing entries and unrealistic values). Ask them to identify at least three specific data quality issues and propose one method to address each.
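A teacher preparing the flawed heights table could check it with a short script like the one below. Several details are hypothetical choices made for this sketch:

- the names and values;
- the plausible range of 120 to 210 cm;
- the rule that a value under 3 was probably entered in meters.

```python
# Hypothetical class dataset: heights in cm, None marks a missing entry.
heights = {
    "Ava": 158, "Ben": None, "Chloe": 1.62, "Dev": 171,
    "Ella": 405, "Finn": 149, "Grace": None,
}

# Assumed plausible range for Year 9 students; a teacher might choose other limits.
MIN_CM, MAX_CM = 120, 210

def find_issues(data):
    """Return (name, problem) pairs for every entry that needs attention."""
    issues = []
    for name, h in data.items():
        if h is None:
            issues.append((name, "missing value"))
        elif h < 3:  # assumption: such a small number was probably typed in meters
            issues.append((name, "likely recorded in meters, not cm"))
        elif not MIN_CM <= h <= MAX_CM:
            issues.append((name, "implausible value"))
    return issues

for name, problem in find_issues(heights):
    print(f"{name}: {problem}")
```

Students might propose one matching fix for each issue:

- remove or impute the missing entries;
- multiply the meter value by 100;
- check the 405 against the original record before correcting or removing it.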

Understand · Apply · Analyze · Create · Relationship Skills · Social Awareness

Activity 02

Inquiry Circle · 50 min · Small Groups

Inquiry Circle: Data Makeover

Give groups a boring table of raw data and a poorly chosen chart. They must work together to create three different visualizations of that same data, explaining which one is most effective for a specific target audience, such as a local council.

Differentiate between various data cleaning techniques and their appropriate uses.

Facilitation Tip: For Data Makeover, provide a rubric with clear criteria for clean data and effective visualizations to guide student revisions.

What to look for: Present students with two versions of an analysis report: one based on raw, 'dirty' data and another based on cleaned data. Facilitate a class discussion using these questions: 'What differences do you observe in the conclusions drawn from each report?' and 'How did the data cleaning process influence the final results?'
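The gap between a 'dirty' report and a cleaned one can be shown in a few lines of Python. This is a sketch using a hypothetical screen-time survey. The example assumes two problems: 999 was used as a "no answer" code, and one response was entered twice.

```python
from statistics import mean

# Hypothetical survey: hours of screen time per day, keyed by student ID.
# 999 was used as a "no answer" code, and student s05's response was entered twice.
raw = [
    ("s01", 3.0), ("s02", 4.5), ("s03", 999), ("s04", 2.0),
    ("s05", 5.0), ("s05", 5.0), ("s06", 3.5),
]

def clean(rows, sentinel=999):
    """Drop "no answer" codes and keep only the first response from each student."""
    seen, cleaned = set(), []
    for student_id, hours in rows:
        if student_id in seen or hours == sentinel:
            continue  # skip duplicate entries and "no answer" codes
        seen.add(student_id)
        cleaned.append(hours)
    return cleaned

raw_hours = [h for _, h in raw]
clean_hours = clean(raw)

print(f"Raw mean:     {mean(raw_hours):.1f} h")
print(f"Cleaned mean: {mean(clean_hours):.1f} h")
```

The raw report would claim an average of 146 hours of screen time per day, which is impossible. The cleaned data gives 3.6 hours. A gap that large is a useful opening for the class discussion.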

Analyze · Evaluate · Create · Self-Management · Self-Awareness

Activity 03

Think-Pair-Share · 20 min · Pairs

Think-Pair-Share: Outlier Detective

Show a scatter plot with several clear outliers. In pairs, students discuss what might have caused these outliers (data error vs. interesting phenomenon) and whether they should be included or removed from the final analysis.
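Once pairs have discussed the plot, students can try one widely used rule for flagging outliers: the 1.5 × IQR rule, also called Tukey's fences. The Python sketch below applies it to a single variable for simplicity. The scores are hypothetical, and the `inclusive` quantile method is one choice among several.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical test scores out of 100 (the y-values from a scatter plot).
scores = [62, 65, 68, 70, 71, 73, 75, 78, 12, 150]
print(flag_outliers(scores))  # → [12, 150]
```

The rule only flags candidates; it cannot say why a value is unusual. A score of 150 on a test out of 100 must be a data error. A score of 12 could be a real result, for example from a student who was unwell, and may deserve to stay in the analysis.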

Construct a plan to preprocess a given dataset for analysis.
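A preprocessing plan can be expressed as an ordered series of small steps. The Python sketch below shows one possible plan, not a prescribed one; the survey rows and function names are hypothetical. The order of the steps matters here. Text is normalized before duplicates are removed, so the two `s02` entries that differ only in spacing and capitalization are recognized as the same response.

```python
# Hypothetical raw survey rows: (student_id, sport, age_as_text)
raw_rows = [
    ("s01", " Soccer", "14"),
    ("s02", "soccer ", "15"),
    ("s02", "Soccer", "15"),         # same response entered twice, with different spacing/case
    ("s03", "NETBALL", ""),          # missing age
    ("s04", "Cricket", "fourteen"),  # age typed as a word, not a number
]

def normalize_text(row):
    """Step 1: trim spaces and use one consistent letter case."""
    sid, sport, age = row
    return sid, sport.strip().lower(), age.strip()

def convert_age(row):
    """Step 2: turn age text into a number; unusable entries become None (missing)."""
    sid, sport, age = row
    return sid, sport, (int(age) if age.isdigit() else None)

def remove_duplicates(rows):
    """Step 3: keep only the first copy of each identical row."""
    seen, kept = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept

def preprocess(rows):
    rows = [normalize_text(r) for r in rows]
    rows = [convert_age(r) for r in rows]
    return remove_duplicates(rows)

for row in preprocess(raw_rows):
    print(row)
```

Rows left with `None` still need a missing-data decision (deletion or imputation) before the analysis begins.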

Facilitation Tip: In Outlier Detective, set a short time limit for students to analyze the outliers, then discuss how context determines whether an outlier is meaningful or erroneous.

What to look for: On an exit ticket, ask students to define 'outlier' in their own words and describe one scenario where an outlier might be intentionally kept rather than removed. Also, ask them to list one common method for handling missing data.
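The common options for handling missing data can be compared side by side in a short Python sketch using hypothetical travel-time data:

- **deletion** drops incomplete responses;
- **mean imputation** fills each gap with the average of the known values;
- **median imputation** fills each gap with their middle value.

```python
from statistics import mean, median

# Hypothetical survey: minutes taken to travel to school; None = not answered.
minutes = [12, None, 8, 15, None, 10, 45]
known = [m for m in minutes if m is not None]

# Option 1: deletion, so only complete responses are analyzed.
deleted = known

# Options 2 and 3: imputation, replacing each gap with a summary of the known values.
def impute(values, fill):
    return [fill if v is None else v for v in values]

mean_filled = impute(minutes, mean(known))
median_filled = impute(minutes, median(known))

print("Deletion:     ", deleted)
print("Mean-filled:  ", mean_filled)
print("Median-filled:", median_filled)
```

The median (12) is less affected by the single 45-minute trip than the mean (18). That is one reason median imputation is often preferred when data are skewed.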

Understand · Apply · Analyze · Self-Awareness · Relationship Skills

A few notes on teaching this unit

Experienced teachers approach this topic by focusing on real-world data rather than textbook examples, as students engage more when they see the relevance of their work. Avoid starting with theory. Instead, let students encounter data problems organically through activities, then guide them to discover solutions collaboratively. Students who experience the frustration of working with 'dirty' data firsthand tend to develop a deeper appreciation for preprocessing steps such as handling missing values and correcting errors.

Successful learning looks like students confidently identifying data issues, justifying their cleaning choices, and selecting appropriate visualizations to communicate findings clearly. They should also articulate why certain chart types or cleaning methods are more effective than others for specific datasets.


Watch Out for These Misconceptions

  • During the Gallery Walk, watch for students who assume colorful or visually complex charts are inherently better, even when they obscure the data.

    Use the Gallery Walk as a chance to redirect students to the rubric, asking them to compare charts based on clarity, not aesthetics. For example, if a student praises a 3D pie chart, ask them to explain how the extra dimension affects their ability to read the data.

  • During Data Makeover, watch for students who believe all outliers must be removed to make the data 'correct.'

    During the activity, challenge students to research their outlier before deleting it. Ask them to consider whether the outlier represents a true error or an important trend, using the dataset’s context to guide their decision.


Methods used in this brief