Technologies · Year 10

Active learning ideas

Data Cleaning and Preprocessing

Active learning works for data cleaning because students need to wrestle with real messy data to see how decisions affect outcomes. Year 10 students remember techniques better when they debate trade-offs between deletion and imputation, plot outliers to test their assumptions, and build pipelines they can explain. This hands-on approach builds both technical skill and critical judgment they will use in later data science tasks.

ACARA Content Descriptions: AC9DT10P02
30–50 min · Pairs → Whole Class · 4 activities

Activity 01

Pairs Challenge: Missing Data Strategy

Provide pairs with a sales-record dataset in which about 20% of the values are missing. Students discuss and apply two strategies, such as deletion and imputation, then compare how each one changes the summary statistics. Pairs share one key insight with the class.
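
For teachers who want a quick demonstration before pairs start, the comparison can be sketched in a few lines of Python. This is only an illustrative sketch: the `sales` figures are made up, and the helper names `delete_missing`, `impute_mean`, and `summarise` are hypothetical, not part of any set resource.

```python
from statistics import mean, median, stdev

# Hypothetical daily sales figures; None marks a missing entry (2 of 10, i.e. 20%).
sales = [120, 135, None, 150, 110, None, 145, 160, 125, 155]

def delete_missing(values):
    """Deletion: drop every missing entry."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Mean imputation: replace each missing entry with the mean of the observed values."""
    fill = mean(delete_missing(values))
    return [fill if v is None else v for v in values]

def summarise(values):
    """Collect the summary statistics pairs are asked to compare."""
    return {
        "n": len(values),
        "mean": round(mean(values), 2),
        "median": median(values),
        "stdev": round(stdev(values), 2),
    }

deleted = summarise(delete_missing(sales))
imputed = summarise(impute_mean(sales))
print("deletion  :", deleted)
print("imputation:", imputed)
```

With this data, mean imputation keeps all ten rows and leaves the mean unchanged, but it shrinks the standard deviation, because the filled-in values sit exactly at the centre. That trade-off is a useful prompt for the class share-out.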

Design a strategy to handle missing data in a large dataset.

Facilitation Tip: During the Pairs Challenge, circulate and ask each pair to explain why they prefer one strategy over the other before they touch any data.

What to look for: Provide students with a small, messy dataset (e.g., a table of student test scores with missing entries and a few extreme values). Ask them to identify one missing value and one outlier, and then write a sentence explaining how they would address each.

Apply · Analyze · Evaluate · Create · Relationship Skills · Decision-Making · Self-Management

Activity 02

Collaborative Problem-Solving · 45 min · Small Groups

Small Groups: Outlier Detection Lab

Groups receive a housing price dataset with planted outliers. They use box plots and z-scores to identify anomalies, decide removal or retention, and recalculate averages. Groups present their choices and rationale.
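
The two detection methods can be compared with a short Python sketch. The prices, the function names, and the thresholds (`k=1.5` for the box-plot rule, `2.5` for z-scores) are assumptions chosen for illustration. Quartiles come from `statistics.quantiles` with its default method, so they may differ slightly from a hand-drawn box plot.

```python
from statistics import mean, stdev, quantiles

# Hypothetical house prices in thousands of dollars; 2500 is the planted outlier.
prices = [420, 450, 480, 510, 530, 550, 580, 610, 640, 2500]

def iqr_outliers(values, k=1.5):
    """Box-plot rule: flag values more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def z_score_outliers(values, threshold=2.5):
    """Z-score rule: flag values more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

flagged_iqr = iqr_outliers(prices)
flagged_z = z_score_outliers(prices)
kept = [p for p in prices if p not in flagged_iqr]

print("IQR rule flags    :", flagged_iqr)
print("z-score rule flags:", flagged_z)
print("mean with outlier   :", mean(prices))
print("mean without outlier:", mean(kept))
```

One design note worth raising with groups: in a sample of only ten values, no point can reach a z-score of 3 when the sample standard deviation is used (the ceiling is roughly 2.85), which is why this sketch uses 2.5. The outlier inflates the very standard deviation it is measured against.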

Evaluate the impact of data outliers on statistical analysis.

Facilitation Tip: In the Outlier Detection Lab, require students to sketch a quick box plot by hand first, then compare it to the digital version to spot discrepancies.

What to look for: Pose the question: 'Imagine you are cleaning a dataset of customer feedback for a new product. What are two potential problems you might encounter, and how would you decide whether to remove an outlier or try to correct it?' Facilitate a class discussion on their proposed solutions and reasoning.

Apply · Analyze · Evaluate · Create · Relationship Skills · Decision-Making · Self-Management

Activity 03

Collaborative Problem-Solving · 40 min · Whole Class

Whole Class: Inconsistency Cleanup Relay

Project a large dataset with format errors like mixed date styles. Teams take turns correcting one row or column, passing control after each fix. Class votes on the cleanest final version.
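
After the relay, the class can check its manual fixes against a small script. The sketch below assumes four date styles and a day-first reading of `05/03/2024`. Both are illustrative assumptions, and `KNOWN_FORMATS` and `to_iso` are hypothetical names. It also assumes English month names.

```python
from datetime import datetime

# Formats assumed to appear in the projected dataset (illustrative, not exhaustive).
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d-%b-%y"]

def to_iso(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None  # leave unparseable entries for a human to review

rows = ["2024-03-05", "05/03/2024", " 5 March 2024", "05-Mar-24", "March 5th"]
cleaned = [to_iso(r) for r in rows]

for raw, iso in zip(rows, cleaned):
    print(f"{raw!r:>16} -> {iso}")
```

Returning `None` rather than guessing keeps ambiguous entries visible, which mirrors the relay's point that someone has to decide what each inconsistent value means.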

Justify the importance of data cleaning before any data analysis.

Facilitation Tip: During the Inconsistency Cleanup Relay, give each group a unique typo so they experience how real-world data entry errors vary from dataset to dataset.

What to look for: On an index card, have students define 'data imputation' in their own words and provide one example of when it would be necessary. Then, ask them to list one reason why cleaning data is crucial before analysis.

Apply · Analyze · Evaluate · Create · Relationship Skills · Decision-Making · Self-Management

Activity 04

Collaborative Problem-Solving · 50 min · Individual

Individual: Preprocessing Pipeline

Students select a public dataset, document steps to clean missing values and outliers, then generate a cleaned version. They reflect on changes in a one-page report for peer review.
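
One possible shape for such a pipeline, for students working in Python, pairs every step with its written reason so the documentation and the cleaning stay together. Everything here is a hypothetical sketch: the `scores` data, the function names, and the choice of median imputation followed by a box-plot (IQR) filter are assumptions, not a prescribed method.

```python
from statistics import median, quantiles

def fill_missing_with_median(values):
    """Replace each missing entry (None) with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def drop_iqr_outliers(values, k=1.5):
    """Keep only values inside the box-plot fences (k * IQR beyond the quartiles)."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]

# Each step travels with its one-sentence justification.
PIPELINE = [
    (fill_missing_with_median, "The median is not dragged by the extreme score, so use it to fill gaps."),
    (drop_iqr_outliers, "A score far outside the fences is treated here as an entry error."),
]

def run_pipeline(values, steps):
    """Apply each step in order and record what changed and why."""
    log = []
    for step, reason in steps:
        before = len(values)
        values = step(values)
        log.append(f"{step.__name__}: {before} -> {len(values)} rows. Why: {reason}")
    return values, log

# Hypothetical test scores out of 100; None is missing and 840 is a likely typo.
scores = [72, 85, None, 64, 91, 78, None, 840, 69, 88]
cleaned, log = run_pipeline(scores, PIPELINE)

print(cleaned)
for line in log:
    print(line)
```

The printed log can be pasted straight into the one-page report, giving peer reviewers the order of steps, the row counts, and the reasoning in one place.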

Design a strategy to handle missing data in a large dataset.

Facilitation Tip: In the Preprocessing Pipeline, insist students write a one-sentence justification for every transformation before they run the code or calculation.

What to look for: Provide students with a small, messy dataset (e.g., a table of student test scores with missing entries and a few extreme values). Ask them to identify one missing value and one outlier, and then write a sentence explaining how they would address each.

Apply · Analyze · Evaluate · Create · Relationship Skills · Decision-Making · Self-Management

A few notes on teaching this unit

Teachers should avoid presenting cleaning as a mechanical checklist. Instead, frame each technique as a strategic move with consequences. Research shows students grasp outliers better when they plot real data and see how a single point can pull a mean or bend a trend line. Encourage students to document their reasoning in margin notes so they can revisit and revise decisions later.
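
To show how one point can pull a mean or bend a trend line, a teacher could run a sketch like the following. The data are invented so that the clean points lie exactly on a straight line, and `slope` is a hypothetical helper that computes the ordinary least-squares gradient.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares gradient of the trend line through the points (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

xs = [1, 2, 3, 4, 5, 6]
ys_clean = [10, 12, 14, 16, 18, 20]   # lies exactly on y = 2x + 8
ys_outlier = ys_clean[:-1] + [60]      # the last point replaced by one extreme value

print("mean  clean / with outlier:", mean(ys_clean), "/", round(mean(ys_outlier), 2))
print("slope clean / with outlier:", slope(xs, ys_clean), "/", round(slope(xs, ys_outlier), 2))
```

Changing a single value out of six lifts the mean by several units and more than triples the gradient, which gives students a concrete consequence to weigh when they decide whether to keep a point.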

By the end of these activities, students will confidently choose appropriate cleaning methods, justify their choices with evidence, and evaluate how each step changes summary statistics and visual trends. They will move from asking 'How do I clean this?' to 'Why is this cleaning decision better than the alternatives?'


Watch Out for These Misconceptions

  • During the Pairs Challenge: watch for students defaulting to deletion without considering the impact on dataset size.

    Prompt pairs to calculate how many rows they would lose and what summary statistics would shift before they choose a method. Ask them to sketch two histograms side-by-side to visualize the difference.

  • During the Outlier Detection Lab: watch for students labeling any extreme value as an error without context.

    Have students read the dataset’s metadata aloud before they plot, forcing them to ask whether the extreme reflects a rare event or a data entry mistake. Require them to write a one-sentence justification for removing or keeping each outlier.

  • During the Inconsistency Cleanup Relay: watch for students fixing typos without checking if the error affects downstream analysis.

    After each typo fix, ask students to recalculate the mean and standard deviation to see if the change matters. Use a quick peer check so they compare their revised statistics with another group.
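
The recalculation step can be sketched as a small before-and-after check. The height data and the misplaced-decimal typo are hypothetical, as is the helper name `stats_shift`.

```python
from statistics import mean, stdev

def stats_shift(before, after):
    """Report how far the mean and standard deviation move after a fix."""
    return {
        "mean_change": round(mean(after) - mean(before), 2),
        "stdev_change": round(stdev(after) - stdev(before), 2),
    }

# Hypothetical heights in cm; 1680.0 is a misplaced decimal point for 168.0.
heights_raw = [162.0, 175.0, 1680.0, 171.0, 158.0]
heights_fixed = [168.0 if h == 1680.0 else h for h in heights_raw]

shift = stats_shift(heights_raw, heights_fixed)
print(shift)
```

A large shift, as with this typo, tells students the fix matters for any downstream analysis. A shift of zero or near zero tells them the fix was cosmetic. Groups can swap their `shift` results as the quick peer check.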


Methods used in this brief