Data Collection and Cleaning: Activities & Teaching Strategies

Active learning works here because data collection and cleaning are hands-on skills. Students need to touch messy data to see why cleaning matters, and moving through stations or pair work keeps the abstract concrete. This builds the judgment they’ll need when designing their own research.

Year 8 · Technologies · 4 activities · 20 min to 50 min

Learning Objectives

  1. Classify data sources as either primary or secondary, justifying the choice based on a given research question.
  2. Identify common data errors, including duplicates, missing values, and outliers, within a provided dataset.
  3. Evaluate the impact of data cleaning on the accuracy of simple statistical measures, such as the mean or median.
  4. Design a step-by-step plan for collecting and cleaning data to answer a specific, teacher-provided research question.
  5. Critique a data collection and cleaning plan for potential ethical considerations or inefficiencies.
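
For teachers who want to demonstrate objective 3 live, here is a minimal Python sketch. The lunch-time figures, the planted typo, and the 0 to 120 minute plausibility range are all invented for illustration and are not part of the activity materials.

```python
from statistics import mean, median

# Hypothetical survey answers: minutes spent eating lunch.
# 300 is a planted typo for 30; none of these values are real data.
raw = [25, 30, 20, 300, 35, 30, 25]


def clean(values, low=0, high=120):
    """Keep only values inside a plausible range (the limits are assumptions)."""
    return [v for v in values if low <= v <= high]


def summary(values):
    """Return the two measures named in the objective."""
    return {"mean": round(mean(values), 1), "median": median(values)}


cleaned = clean(raw)
before = summary(raw)
after = summary(cleaned)

print("before cleaning:", before)
print("after cleaning: ", after)
```

In this example the mean drops from 66.4 to 27.5 once the typo is removed, while the median moves only from 30 to 27.5, which gives students a concrete reason to discuss which measure is more sensitive to a single bad value.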

45 min·Small Groups

Stations Rotation: Source Hunt

Prepare stations with survey forms, online articles, sensor apps, and databases. Groups visit each station for 7 minutes, collect sample data, note pros and cons, then share plans for a class question such as 'What affects lunch choices?' Rotate twice for depth.

Differentiate between primary and secondary data sources.

Facilitation Tip: During Source Hunt, run a visible timer for each 7-minute station visit so groups must move quickly, forcing them to evaluate sources under time pressure.

Setup: Tables/desks arranged in 4-6 distinct stations around room

Materials: Station instruction cards, Different materials per station, Rotation timer

Remember · Understand · Apply · Analyze · Self-Management · Relationship Skills

30 min·Pairs

Pairs: Spreadsheet Scrub

Provide messy datasets with errors in Google Sheets. Pairs identify issues using filters and formulas, remove duplicates, fill gaps using a rule they can state and defend, then graph the data before and after cleaning. Discuss how the changes affect the trends.
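
The same scrub can be sketched outside a spreadsheet for teachers who want to show the logic step by step. This is a rough Python illustration, not the activity's required method: the survey rows, the `scrub` function name, and the choice to set missing values aside instead of filling them are all assumptions made for the example.

```python
# Hypothetical rows from a class lunch survey: (student_id, lunch_choice, spend).
# None marks a missing answer. All values are made up for illustration.
rows = [
    ("s01", "sandwich", 4.50),
    ("s02", "sushi", 6.00),
    ("s02", "sushi", 6.00),      # exact duplicate of the row above
    ("s03", "Sandwich ", 4.50),  # inconsistent capitalisation and spacing
    ("s04", "pie", None),        # missing spend
]


def scrub(rows):
    """Normalise text, drop exact duplicates, and set aside rows with gaps."""
    seen = set()
    cleaned, flagged = [], []
    for student_id, choice, spend in rows:
        record = (student_id, choice.strip().lower(), spend)
        if record in seen:
            continue  # duplicate: keep only the first copy
        seen.add(record)
        if spend is None:
            flagged.append(record)  # a person decides how to handle the gap
        else:
            cleaned.append(record)
    return cleaned, flagged


cleaned, flagged = scrub(rows)
print(len(rows), "rows in,", len(cleaned), "clean,", len(flagged), "flagged")
```

Setting incomplete rows aside keeps the decision about each gap with the students, which matches the expectation that pairs defend every edit.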

Justify the importance of data cleaning before analysis.

Facilitation Tip: Pairs working on Spreadsheet Scrub should swap cleaned sheets to compare corrections before finalizing, ensuring they defend each edit.

Setup: Flexible workspace with access to materials and technology

Materials: Project brief with driving question, Planning template and timeline, Rubric with milestones, Presentation materials

Apply · Analyze · Evaluate · Create · Self-Management · Relationship Skills · Decision-Making

50 min·Whole Class

Whole Class: Data Plan Pitch

Pose a research topic such as 'School waste patterns'. Students brainstorm sources and cleaning steps on shared boards, vote on the best plans, then test one by collecting initial data.

Construct a plan for collecting and cleaning data for a specific research question.

Facilitation Tip: In Data Plan Pitch, require students to hold up their planned data source when they explain why it’s suitable, making their reasoning visible.

Setup: Flexible workspace with access to materials and technology

Materials: Project brief with driving question, Planning template and timeline, Rubric with milestones, Presentation materials

Apply · Analyze · Evaluate · Create · Self-Management · Relationship Skills · Decision-Making

20 min·Individual

Individual: Error Detective

Give printed datasets with planted errors. Students circle problems, propose fixes, and justify choices in a log, preparing for group cleaning.

Justify the importance of data cleaning before analysis.

Facilitation Tip: For Error Detective, give students red pens to mark errors directly on the printout so corrections are visible and discussable.

Setup: Flexible workspace with access to materials and technology

Materials: Project brief with driving question, Planning template and timeline, Rubric with milestones, Presentation materials

Apply · Analyze · Evaluate · Create · Self-Management · Relationship Skills · Decision-Making

Teaching This Topic

Teach this topic by letting students experience the frustration of dirty data first. Start with quick, low-stakes messes they can spot immediately, then layer in subtler issues like duplicates or outdated figures. Model your own thinking aloud when cleaning a sample dataset so students see the internal dialogue behind each decision. Avoid over-teaching the rules upfront; let students discover the need for cleaning through their own trials.

What to Expect

Successful students will justify their cleaning choices with evidence, spot errors without prompting, and plan steps that prevent bias. They’ll discuss trade-offs between data sources and explain how clean data leads to reliable results in their own projects.

Watch Out for These Misconceptions

Common Misconception: During Source Hunt, watch for students who assume any government website or published chart is flawless.

What to Teach Instead

Have students record one potential error for each source they examine, then share findings in a whole-class debrief to highlight how even trusted sources need scrutiny.

Common Misconception: During Data Plan Pitch, listen for students who default to primary data without weighing its limitations.

What to Teach Instead

Require them to present one advantage and one drawback of their chosen source, then challenge the class to suggest alternatives or improvements.

Common Misconception: During Spreadsheet Scrub, watch for students who alter values to match their expectations instead of restoring accuracy.

What to Teach Instead

Display before-and-after graphs side by side during the activity’s wrap-up to show how honest cleaning reveals true trends without distortion.

Assessment Ideas

Exit Ticket

After Source Hunt, ask students to write a sentence classifying each source they reviewed as primary or secondary and to name one potential error they spotted during their hunt.

Quick Check

During Spreadsheet Scrub, circulate and ask pairs to point out the two most serious errors in their sheet and explain how they would correct them before submitting.

Discussion Prompt

After Data Plan Pitch, facilitate a quick discussion where students vote on the most convincing plan and justify their choice, revealing their understanding of data source trade-offs.

Extensions & Scaffolding

  • Challenge: Ask early finishers to design a mini-survey with intentional flaws, then swap with a peer to clean and justify fixes.
  • Scaffolding: Provide a color-coded checklist for Spreadsheet Scrub with categories like "duplicates," "outliers," and "missing values" to guide struggling pairs.
  • Deeper exploration: Have students research a dataset’s metadata to understand how collection methods influence cleaning choices, then present findings to the class.

Key Vocabulary

Primary Data: Information collected directly by the researcher for the specific purpose of their study, such as through surveys or experiments.
Secondary Data: Information that has already been collected by someone else for a different purpose, such as from existing reports or databases.
Data Cleaning: The process of detecting and correcting (or removing) corrupt, inaccurate, or irrelevant records from a dataset to improve data quality.
Outlier: A data point that differs significantly from other observations, potentially indicating variability or measurement error.
Duplicate Record: An entry in a dataset that is identical or nearly identical to another entry, which can skew analysis if not handled.
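
To make "differs significantly" concrete, one common convention is to flag values that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below assumes that rule; it is not the only possible definition, and it is not taken from this resource. The homework figures and the `flag_outliers` name are invented for illustration.

```python
from statistics import quantiles

# Hypothetical answers: minutes of homework per night. Not real data.
homework_minutes = [12, 14, 15, 15, 16, 17, 18, 95]


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (a convention, assumed here)."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


flagged = flag_outliers(homework_minutes)
print("values to investigate:", flagged)
```

The function only flags values for investigation. As the definition above notes, an outlier may reflect genuine variability, so students still need to decide whether 95 is an error or a real answer.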
