Technologies · Year 8 · Data Intelligence · Term 2

Data Collection and Cleaning

Students will learn methods for collecting data from various sources and techniques for cleaning and preparing data for analysis.

ACARA Content Descriptions: AC9TDI8P01

About This Topic

Data collection gathers information from primary sources, such as student surveys or sensor readings, and secondary sources, such as government databases or research articles. Cleaning follows: students spot errors, duplicates, outliers, and gaps, then correct or handle them so the analysis is accurate. To meet AC9TDI8P01, Year 8 students justify why cleaning is needed to avoid misleading results and plan the collection and cleaning steps for a research question.

In the Data Intelligence unit, students differentiate sources by reliability and relevance, for example, using primary data for local school habits and secondary for national trends. They construct plans outlining tools, sample sizes, and cleaning protocols, building skills for ethical data use and computational thinking.

Active learning excels with this topic. Students collect real data, face authentic issues like typos from surveys, and collaborate on spreadsheets to clean it. Hands-on trials show cleaning's impact on graphs and conclusions, fostering critical evaluation and persistence as they iterate plans.

Key Questions

  1. Justify the importance of data cleaning before analysis.
  2. Differentiate between primary and secondary data sources.
  3. Construct a plan for collecting and cleaning data for a specific research question.

Learning Objectives

  • Classify data sources as either primary or secondary, justifying the choice based on a given research question.
  • Identify common data errors, including duplicates, missing values, and outliers, within a provided dataset.
  • Evaluate the impact of data cleaning on the accuracy of simple statistical measures, such as the mean or median.
  • Design a step-by-step plan for collecting and cleaning data to answer a specific, teacher-provided research question.
  • Critique a data collection and cleaning plan for potential ethical considerations or inefficiencies.
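The third objective, evaluating how cleaning affects the mean and median, can be demonstrated with a short sketch. The dataset, the student IDs, and the 0 to 24 hour plausibility range below are all invented for illustration; the approach (drop repeated IDs, drop implausible values) is one possible cleaning rule, not a prescribed method.

```python
from statistics import mean, median

# Hypothetical survey: (student ID, hours of sleep reported).
raw = [
    ("S01", 8.0),
    ("S02", 7.5),
    ("S03", 9.0),
    ("S03", 9.0),   # same student submitted twice
    ("S04", 80.0),  # assumed typo for 8.0
    ("S05", 6.5),
]

def clean(records, low=0.0, high=24.0):
    """Keep the first entry per student ID and drop implausible values."""
    seen = set()
    kept = []
    for student_id, hours in records:
        if student_id in seen:
            continue  # duplicate submission
        seen.add(student_id)
        if low <= hours <= high:
            kept.append((student_id, hours))
    return kept

def summarise(records):
    """Return (mean, median) of the reported hours."""
    values = [hours for _, hours in records]
    return mean(values), median(values)

raw_mean, raw_median = summarise(raw)
clean_mean, clean_median = summarise(clean(raw))

print(f"Before cleaning: mean={raw_mean}, median={raw_median}")
print(f"After cleaning:  mean={clean_mean}, median={clean_median}")
```

In this made-up data the mean falls from 20.0 to 7.75 after cleaning, while the median moves only from 8.5 to 7.75, which gives students a concrete case for discussing why the mean is more sensitive to a single bad value. The sketch removes the suspect entry rather than correcting it; whether to correct or remove is a decision students should justify in their plan.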

Before You Start

Introduction to Data and Information

Why: Students need a foundational understanding of what data is and how it can represent real-world information before learning to collect and clean it.

Basic Spreadsheet Skills

Why: Familiarity with using spreadsheet software is essential for practical data collection and cleaning activities.

Key Vocabulary

Primary Data: Information collected directly by the researcher for the specific purpose of their study, such as through surveys or experiments.
Secondary Data: Information that has already been collected by someone else for a different purpose, such as from existing reports or databases.
Data Cleaning: The process of detecting and correcting (or removing) corrupt, inaccurate, or irrelevant records from a dataset to improve data quality.
Outlier: A data point that differs significantly from other observations, potentially indicating variability or measurement error.
Duplicate Record: An entry in a dataset that is identical or nearly identical to another entry, which can skew analysis if not handled.
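To make the outlier definition concrete, here is a minimal sketch that flags values far from the rest of a dataset. It assumes the common 1.5 × IQR convention (values more than 1.5 interquartile ranges beyond the quartiles), which is one of several possible rules and is not specified by this topic; the travel-time data and the function name `find_outliers` are hypothetical.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    The k=1.5 threshold is a common convention, assumed here.
    """
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical data: minutes students take to travel to school.
travel_minutes = [12, 14, 15, 15, 16, 17, 18, 45]
print(find_outliers(travel_minutes))  # prints [45]
```

A flagged value is a prompt for investigation, not automatic deletion: as the definition notes, an outlier may reflect genuine variability (a student who lives far away) as well as a measurement or entry error.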

Watch Out for These Misconceptions

Common Misconception: All data from trusted sources is clean and ready to use.

What to Teach Instead

Sources often have unintentional errors like typos or outdated info. Active data hunts reveal these, and group cleaning sessions let students compare fixes, building judgment on data quality.

Common Misconception: Primary data is always better than secondary data.

What to Teach Instead

Primary data suits specific contexts but takes time to collect; secondary data offers breadth but needs verification. Source comparison activities help students weigh these trade-offs through debate, clarifying the choices they make in their plans.

Common Misconception: Cleaning data means changing it to fit desired results.

What to Teach Instead

Cleaning restores accuracy without bias. Hands-on graphing before and after shows honest trends, and peer reviews during activities reinforce ethical standards.

Real-World Connections

  • Market researchers at companies like Nielsen use primary data from focus groups and surveys, alongside secondary data from sales figures, to understand consumer behaviour and inform product development.
  • Epidemiologists at the World Health Organization (WHO) collect primary data from patient interviews and medical tests, and analyze secondary data from global health databases to track disease outbreaks and develop public health strategies.
  • Financial analysts at investment firms meticulously clean secondary data from stock markets and company reports, as errors can lead to significant miscalculations in predicting company performance and market trends.

Assessment Ideas

Exit Ticket

Provide students with a short list of data sources (e.g., a student survey, a published census report, sensor readings from a weather station). Ask them to write one sentence for each, classifying it as primary or secondary data and briefly explaining why.

Quick Check

Present students with a small table of sample data containing obvious errors (e.g., a typo in a name, a nonsensical age, a duplicate entry). Ask them to identify at least two specific errors and suggest how they would correct or handle each one.
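For classes extending the Quick Check into code, the three error types it mentions (a typo in a name, a nonsensical age, a duplicate entry) can be detected with a few simple rules. This is a sketch under stated assumptions: the sample rows, the 10 to 18 age range, and the function name `find_issues` are all hypothetical, and the name check is deliberately crude.

```python
# Hypothetical sample table with three deliberate errors.
rows = [
    {"name": "Ava", "age": 13, "sport": "netball"},
    {"name": "Li4m", "age": 14, "sport": "soccer"},    # typo: digit in name
    {"name": "Noah", "age": 130, "sport": "cricket"},  # nonsensical age
    {"name": "Ava", "age": 13, "sport": "netball"},    # duplicate of row 1
]

def find_issues(rows, min_age=10, max_age=18):
    """Return (row number, description) pairs for each problem found."""
    issues = []
    first_seen = {}  # maps a row's contents to the row number where it first appeared
    for number, row in enumerate(rows, start=1):
        key = (row["name"], row["age"], row["sport"])
        if key in first_seen:
            issues.append((number, f"duplicate of row {first_seen[key]}"))
        else:
            first_seen[key] = number
        if not row["name"].isalpha():
            issues.append((number, "name contains non-letter characters"))
        if not min_age <= row["age"] <= max_age:
            issues.append((number, "age outside plausible range"))
    return issues

for number, problem in find_issues(rows):
    print(f"Row {number}: {problem}")
```

The checks only flag problems; deciding how to handle each one (correct, remove, or ask the respondent) is left to the student. Note that `str.isalpha` would also flag legitimate names containing spaces or hyphens, which is itself a useful discussion point about the limits of automated checks.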

Discussion Prompt

Pose the question: 'Imagine you are collecting data about the most popular sports at your school. What are two potential problems you might encounter when collecting this data, and how would you clean your data to fix these problems?' Facilitate a brief class discussion on their responses.

Frequently Asked Questions

What are primary and secondary data sources for Year 8?
Primary sources come from direct collection, like class surveys or experiments, offering fresh, targeted data. Secondary sources are pre-existing, such as websites or reports, providing context and scale. Students differentiate by planning activities: match sources to questions, evaluate biases, and combine for robust datasets, as per AC9TDI8P01.
Why justify data cleaning before analysis?
Uncleaned data can produce misleading patterns, such as averages skewed by duplicates. Cleaning supports fair insights, which matters when decisions depend on the results. Students justify cleaning through demonstrations: they graph messy and cleaned versions of the same data, spot the differences, and link them to real impacts such as policy errors, strengthening their analytical arguments.
How to plan data collection and cleaning?
Start with a clear question, choose sources, and define tools and sample sizes. List the cleaning steps: check formats, handle missing values, and validate entries. Class pitches refine the plans, and a small-scale test lets students adjust them, ensuring feasibility and ethics for projects like environmental monitoring.
How can active learning help with data collection and cleaning?
Active methods make abstract steps tangible: students survey peers for primary data, import secondary sets, then clean collaboratively in tools like Sheets. Encounters with real errors build problem-solving; sharing cleaned visuals reveals cleaning's value, boosting engagement and retention over lectures.