Computer Science · 9th Grade · Data Intelligence and Visualization · Weeks 28-36

Data Collection Methods and Bias

Students will explore techniques for gathering data and analyze how bias in data collection can lead to inaccurate conclusions.

Standards: CSTA 3A-DA-11

About This Topic

Data collection and cleaning are the first steps in any meaningful data analysis. For 9th graders, this topic emphasizes that data is rarely 'perfect' when first gathered. This aligns with CSTA standards for collecting and refining data sets. Students learn to identify missing values, outliers, and formatting errors that could skew their results.
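The three problem types named above (missing values, outliers, and formatting errors) can be shown with a short Python sketch. The sleep-survey data and the function names are made up for illustration, and the 1.5 × IQR rule is only one common rule of thumb for flagging outliers, not the only valid choice.

```python
import statistics

# Hypothetical survey responses: hours of sleep reported by students.
# None marks a skipped question, "eight" is a formatting error,
# and 80 is probably a typo for 8.
raw_hours = [7, 8, None, 6.5, "eight", 80, 7.5, 9, 6, 8]

def find_problems(values):
    """Sort raw entries into missing, badly formatted, and usable numbers."""
    missing, bad_format, numbers = [], [], []
    for position, value in enumerate(values):
        if value is None:
            missing.append(position)
        elif isinstance(value, (int, float)):
            numbers.append(value)
        else:
            bad_format.append(position)
    return missing, bad_format, numbers

def find_outliers(numbers):
    """Flag values more than 1.5 * IQR beyond the first or third quartile."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [n for n in numbers if n < low or n > high]

missing, bad_format, numbers = find_problems(raw_hours)
outliers = find_outliers(numbers)
print("Missing at positions:", missing)
print("Bad format at positions:", bad_format)
print("Possible outliers:", outliers)
```

The code only flags suspicious entries. Deciding whether 80 is a typo or a real answer is still a human judgment, which makes it a useful discussion point for students.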

This topic also introduces the critical concept of bias. Students explore how the way data is collected (who is asked, what questions are used, and where the data comes from) can lead to unfair or inaccurate conclusions. This connection to ethics and social impact is a key part of the high school curriculum. Students grasp this concept faster through collaborative investigations where they 'clean' a messy dataset and discover how much the results change.

Key Questions

  1. Analyze how bias in data collection can lead to inaccurate or harmful conclusions.
  2. Compare different data collection methods and their potential sources of bias.
  3. Design a data collection strategy that minimizes bias for a specific research question.

Learning Objectives

  • Compare potential biases in at least two different data collection methods, such as surveys versus observational studies.
  • Analyze how specific sampling techniques can introduce bias into a dataset, leading to skewed results.
  • Design a data collection plan for a given research question that actively mitigates at least two common sources of bias.
  • Explain the ethical implications of collecting biased data in real-world scenarios, citing potential harms.
  • Critique a provided dataset for potential biases and suggest methods for correction or further investigation.

Before You Start

Introduction to Data and Variables

Why: Students need a foundational understanding of what data is and how it is represented to grasp the concepts of collecting and analyzing it.

Basic Survey Design Principles

Why: Familiarity with constructing simple questions is helpful before analyzing how poorly designed questions can introduce bias.

Key Vocabulary

Sampling Bias: Systematic error introduced into a sample when individuals or groups are not represented in the same proportion as they are in the population. This can lead to inaccurate generalizations.
Selection Bias: Bias introduced when the sample selected is not representative of the target population. This can occur if certain individuals are more likely to be included or excluded from the study.
Measurement Bias: Bias that occurs when the method of measurement or the instrument used consistently produces inaccurate results. This can happen with poorly worded survey questions or faulty equipment.
Confirmation Bias: The tendency to search for, interpret, favor, and recall information in a way that confirms one's pre-existing beliefs or hypotheses. In data collection, this can influence question design or data interpretation.
Convenience Sampling: A method of data collection where participants are selected based on their easy availability and proximity. This method often leads to biased samples because it does not represent the broader population.
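
To make sampling bias and convenience sampling concrete, here is a minimal Python sketch. It assumes a made-up school with 200 students in each of four grades, and it assumes the "convenient" students are all 9th graders from the surveyor's own hallway. The names and numbers are illustrative only.

```python
import random

random.seed(1)  # fixed seed so the example is repeatable

# Hypothetical school: 200 students in each grade, 9 through 12.
population = [grade for grade in (9, 10, 11, 12) for _ in range(200)]

def share_of_ninth_graders(sample):
    """Fraction of a sample that is in 9th grade."""
    return sum(1 for grade in sample if grade == 9) / len(sample)

# Convenience sample: the first 100 students you happen to meet,
# assumed here to be the 9th graders in your own hallway.
convenience_sample = population[:100]

# Simple random sample: every student has the same chance of being picked.
random_sample = random.sample(population, 100)

print("Population:", share_of_ninth_graders(population))
print("Convenience sample:", share_of_ninth_graders(convenience_sample))
print("Random sample:", share_of_ninth_graders(random_sample))
```

In this setup 9th graders are 25% of the school but 100% of the convenience sample, which matches the definition of sampling bias above. The random sample should land much closer to 25%, though it will rarely match exactly.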

Watch Out for These Misconceptions

Common Misconception: Computers automatically fix errors in data.

What to Teach Instead

Computers will process whatever data they are given, even if it is wrong ('Garbage In, Garbage Out'). Hands-on cleaning activities show students that human judgment is needed to set the rules for what is 'valid' data.

Common Misconception: More data always means better results.

What to Teach Instead

A large amount of biased or 'dirty' data is less useful than a smaller amount of high-quality data. Comparing results from 'raw' vs. 'cleaned' datasets helps students see the value of quality over quantity.
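
The quality-over-quantity point can be simulated. The sketch below is a made-up scenario, not real data: it assumes 10,000 students, 30% of them heavy screen users, and a large survey that reaches only the heavy users through a gaming app.

```python
import random
import statistics

random.seed(7)  # fixed seed so the example is repeatable

# Hypothetical population: daily screen time in hours for 10,000 students.
# Assumed mix: 30% heavy users (about 6 hours), 70% light users (about 2 hours).
heavy = [random.gauss(6, 1) for _ in range(3000)]
light = [random.gauss(2, 1) for _ in range(7000)]
population = heavy + light
true_mean = statistics.mean(population)

# Large but biased sample: 2,000 responses collected through a gaming app,
# so only heavy users answer.
big_biased = random.sample(heavy, 2000)

# Small but random sample: 100 students drawn from the whole population.
small_random = random.sample(population, 100)

biased_error = abs(statistics.mean(big_biased) - true_mean)
random_error = abs(statistics.mean(small_random) - true_mean)

print("True average:", round(true_mean, 2))
print("Error of 2,000 biased responses:", round(biased_error, 2))
print("Error of 100 random responses:", round(random_error, 2))
```

Under these assumptions the 2,000 biased responses miss the true average by far more than the 100 random ones. Collecting even more responses through the same app would not help, because the extra data carries the same bias.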

Real-World Connections

  • Market researchers for companies like Nielsen use various data collection methods, including surveys and observational studies, to understand consumer behavior. Biased data can lead to misinformed product development or marketing campaigns, which can be very costly.
  • Political pollsters collect data to predict election outcomes. If their sampling methods over- or underrepresent certain demographics, the poll results can be significantly inaccurate, influencing public perception and campaign strategies.
  • Healthcare providers collect patient data to identify disease trends and evaluate treatment effectiveness. Biased data collection, perhaps by only surveying patients who visit a specific clinic, could lead to a misunderstanding of a disease's prevalence or impact across diverse populations.

Assessment Ideas

Quick Check

Present students with two hypothetical scenarios for collecting data on smartphone usage: Scenario A uses online pop-up surveys, and Scenario B uses randomly selected phone call surveys. Ask students to write one sentence identifying a potential bias in each scenario and one sentence explaining why Scenario B might be less biased.

Discussion Prompt

Pose the question: 'Imagine you are designing a survey to understand student opinions on school lunch quality. What are three specific steps you would take during the design and distribution process to minimize bias?' Facilitate a class discussion, encouraging students to share and critique each other's strategies.

Exit Ticket

Provide students with a short, fictional news report about a study. Ask them to identify one potential source of bias mentioned or implied in the data collection method described and write one sentence explaining how that bias might have affected the study's conclusions.

Frequently Asked Questions

What does it mean to 'clean' data?
Cleaning data means identifying and fixing errors, such as removing duplicate entries, correcting typos, handling missing information, and making sure all data is in the same format (like making sure all dates look the same).
How can data collection be biased?
Bias happens when the data collected doesn't accurately represent the whole group. For example, if you only survey people at a library about their favorite hobby, your data will be biased toward reading.
What is 'Garbage In, Garbage Out'?
This is a famous phrase in computer science meaning that if you start with bad or 'garbage' data, your results and conclusions will also be bad, no matter how good your code or analysis is.
How can active learning help students understand data cleaning?
Active learning turns a tedious task into a puzzle. When students work together to 'detect' errors in a dataset, they develop a critical eye for detail. This collaborative process surfaces different perspectives on what should be considered an 'outlier' versus a 'valid' piece of data, leading to deeper discussions about accuracy.
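
The cleaning steps described in the first answer above (removing duplicates, handling missing information, and putting dates in one format) can be sketched in Python. The sign-up sheet is invented, the function names are hypothetical, and the code assumes that a date like 03/05/2024 means month/day/year, which is itself a decision a person has to make.

```python
from datetime import datetime

# Hypothetical sign-up sheet: (name, date) pairs typed by different people.
raw_rows = [
    ("Ana", "2024-03-05"),
    ("ana ", "03/05/2024"),    # same person and day, typed differently
    ("Ben", "March 7, 2024"),
    ("Cleo", ""),              # missing date
]

# Date styles we agree to accept (assumption: slashes mean month/day/year).
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"]

def clean_date(text):
    """Return the date as YYYY-MM-DD, or None if it is missing or unreadable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_rows(rows):
    """Tidy names, standardize dates, and drop rows that become duplicates."""
    seen, cleaned = set(), []
    for name, date_text in rows:
        row = (name.strip().title(), clean_date(date_text))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned

cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```

The two Ana rows only turn out to be duplicates after the name and date are standardized, which shows why formatting fixes usually come before duplicate removal. Cleo's missing date is kept as None rather than guessed.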