
Ethical Data Scraping and Privacy: Activities & Teaching Strategies

Active learning helps students confront the real-world tensions between data utility and privacy head-on. When students scrape data or build models, they quickly see how choices affect individuals, which builds lasting ethical awareness. Role-playing and structured discussion make abstract privacy risks tangible and memorable.

9th Grade · Computer Science · 3 activities · 25 min to 40 min

Learning Objectives

  1. Critique the ethical considerations and potential harms of scraping data from public websites.
  2. Evaluate the importance of data privacy principles when collecting and using personal information.
  3. Justify the legal and societal implications of unauthorized data collection.
  4. Predict potential negative consequences of data breaches resulting from unethical scraping practices.


40 min·Small Groups

Simulation Game: The Mystery Predictor

Give students a 'training' dataset (e.g., shoe size vs. reading level in elementary students). They build a simple 'model' to predict one from the other, then test it against a 'hidden' dataset to see if their prediction holds up.
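For classes that want to try the predictor in code, the activity can be sketched in a few lines of Python. This is a minimal illustration, not part of the printed materials: the data values, the `fit_line` and `mean_abs_error` helpers, and the choice of a least-squares line as the 'model' are all assumptions made for the example.

```python
from statistics import mean

# Invented illustration data: (shoe size, reading level) pairs.
training = [(1, 1.0), (2, 1.5), (3, 2.4), (4, 3.1), (5, 3.9), (6, 4.6)]
hidden = [(2, 1.8), (4, 2.9), (7, 5.6)]


def fit_line(pairs):
    """Fit reading_level = slope * shoe_size + intercept by least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept


def mean_abs_error(model, pairs):
    """Average distance between the model's predictions and the real values."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in pairs)


model = fit_line(training)
train_error = mean_abs_error(model, training)
hidden_error = mean_abs_error(model, hidden)

print(f"Error on training data: {train_error:.2f}")
print(f"Error on hidden data:   {hidden_error:.2f}")
```

Groups can compare the two printed errors and discuss why a model usually looks better on the data it was built from than on data it has never seen.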


Critique the ethical considerations of scraping data from public websites.

Facilitation Tip: During The Mystery Predictor, circulate and listen for students to name the specific third variable that explains a spurious correlation (for example, summer heat driving up both ice cream sales and drownings).

Setup: Flexible space for group stations

Materials: Role cards with goals/resources, Game currency or tokens, Round tracker

Apply · Analyze · Evaluate · Create · Social Awareness · Decision-Making
30 min·Small Groups

Formal Debate: Correlation vs. Causation

Present several 'spurious correlations' (e.g., ice cream sales and shark attacks). Groups must argue whether there is a causal link, a hidden third variable, or if it is just a coincidence.
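A short simulation can give debaters evidence for the hidden-third-variable position. The sketch below is hypothetical throughout: the numbers are invented, and the assumption that temperature alone drives both ice cream sales and shark attacks is built into the simulated data on purpose. It measures the correlation across all simulated days, then again on days with nearly the same temperature.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so every group sees the same numbers

# Assumption: temperature is the hidden third variable behind both measures.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [20 * t + rng.gauss(0, 60) for t in temperature]
shark_attacks = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

overall = pearson(ice_cream, shark_attacks)

# Hold the third variable roughly constant: keep only days between 22 and 24 degrees.
band = [(i, s) for t, i, s in zip(temperature, ice_cream, shark_attacks) if 22 <= t <= 24]
within_band = pearson([i for i, _ in band], [s for _, s in band])

print(f"Correlation across all days:          {overall:.2f}")
print(f"Correlation on similar-temperature days: {within_band:.2f}")
```

If the link mostly disappears once temperature is held steady, that supports the third-variable argument over a direct causal one.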


Justify the importance of data privacy in the context of data collection.

Facilitation Tip: For the Correlation vs. Causation debate, assign roles explicitly—affirmative, negative, and moderator—to keep the discussion focused on evidence rather than opinion.

Setup: Two teams facing each other, audience seating for the rest

Materials: Debate proposition card, Research brief for each side, Judging rubric for audience, Timer

Analyze · Evaluate · Create · Self-Management · Decision-Making
25 min·Pairs

Think-Pair-Share: Model Ethics

Students read a short case study about an algorithm used to predict which students might drop out of school. They discuss the benefits and the potential dangers of relying on such a model.


Predict the potential negative impacts of unauthorized data collection.

Facilitation Tip: In Model Ethics think-pair-share, prompt pairs to swap written responses so they compare justifications before sharing with the whole class.

Setup: Standard classroom seating; students turn to a neighbor

Materials: Discussion prompt (projected or printed), Optional: recording sheet for pairs

Understand · Apply · Analyze · Self-Awareness · Relationship Skills

Teaching This Topic

Teachers should frame ethics as a design constraint, not an add-on. Start with familiar tools students already use, then layer in privacy concepts. Research shows that students grasp abstract rules better when they see immediate consequences, so activities should surface real dilemmas early. Avoid long lectures; instead, let students experience the tension and then reflect together.

What to Expect

Students will articulate why correlation does not imply causation, identify ethical pitfalls in data collection, and justify privacy safeguards with concrete examples. Success looks like clear explanations, respectful debate, and thoughtful written justifications tied to the activities.


Watch Out for These Misconceptions

Common Misconception: During The Mystery Predictor, watch for students who assume that because two variables appear linked, one must cause the other.

What to Teach Instead

Use the activity’s spurious correlation examples (like shoe size and reading ability) and ask teams to brainstorm a third hidden factor that could explain the pattern.

Common Misconception: During Simulation Game: The Mystery Predictor, watch for students who believe that a model that fits the training data perfectly will also work on new data.

What to Teach Instead

Have teams test their predictor on a separate dataset they haven’t seen before and discuss why performance often drops when conditions change.
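One way to make this misconception visible is a model that simply memorizes its training data. The sketch below is an illustration under stated assumptions: the data is randomly generated (a made-up rule of 2x + 1 plus noise), and `memorizer` is a hypothetical nearest-match lookup, not a technique the activity prescribes.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed for repeatable classroom results


def make_data(n):
    """Invented data: y follows 2x + 1 plus random noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + 1 + rng.gauss(0, 2)))
    return data


training = make_data(30)
unseen = make_data(30)


def memorizer(x):
    """Return the y value of the training point whose x is closest."""
    nearest = min(training, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def mean_abs_error(predict, pairs):
    """Average distance between predictions and the real values."""
    return mean(abs(predict(x) - y) for x, y in pairs)


train_error = mean_abs_error(memorizer, training)
unseen_error = mean_abs_error(memorizer, unseen)

print(f"Error on training data: {train_error:.2f}")
print(f"Error on unseen data:   {unseen_error:.2f}")
```

The memorizer reproduces every training answer exactly, so its training error is zero, yet it still misses on points it has not seen. That gap is the discussion starter.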

Assessment Ideas

Discussion Prompt

After Simulation Game: The Mystery Predictor, pose this scenario: 'A student wants to build a website that aggregates job postings from various company career pages. What ethical questions should they consider before they start scraping these sites? What are the potential privacy risks for job applicants?' Use student responses to assess their ability to identify PII and respect data ownership.
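To ground the scenario, students can see one check a responsible scraper might run before requesting a page: reading the site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URLs, the `ClassroomJobBot` name, and the `may_fetch` helper are all hypothetical, and the rules are supplied inline so nothing is actually fetched. Note that robots.txt is a voluntary convention: passing this check does not by itself mean that scraping is permitted by a site's Terms of Service or is ethical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /applicants/
Allow: /careers/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, user_agent="ClassroomJobBot"):
    """Report whether the parsed robots.txt rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("https://example.com/careers/openings"))
print(may_fetch("https://example.com/applicants/resumes"))
```

Ask students why a site might open its job listings to crawlers while closing off applicant pages, and what questions remain even when a page is allowed.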

Quick Check

During Formal Debate: Correlation vs. Causation, distribute the two hypothetical scenarios. Collect responses and review them to confirm students can distinguish between public non-personal data and sensitive personal data, explaining privacy risks in one sentence.

Exit Ticket

After Think-Pair-Share: Model Ethics, collect written responses that define PII in their own words and list two examples. Check for accurate definitions and clear reasoning about why protecting PII matters when collecting data.

Extensions & Scaffolding

  • Challenge: Have students design a data-scraping tool that maximizes transparency by documenting every data source and purpose.
  • Scaffolding: Provide sentence stems for the Model Ethics think-pair-share (e.g., 'One ethical risk is... because...').
  • Deeper: Invite a guest speaker from a local nonprofit to explain how they balance data needs with privacy obligations.
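For the Challenge extension, students may need a starting point for what 'documenting every data source and purpose' could look like in code. The sketch below is one possible shape, not a required design: `SourceRecord`, `TransparencyLog`, and every field name are hypothetical, and no scraping is performed.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class SourceRecord:
    """One documented data source: where it came from and why it was collected."""
    url: str
    purpose: str
    fields_collected: list
    contains_pii: bool
    recorded_at: str


class TransparencyLog:
    """Keeps a record for every source and refuses entries with no purpose."""

    def __init__(self):
        self.records = []

    def record(self, url, purpose, fields_collected, contains_pii):
        if not purpose.strip():
            raise ValueError("Every data source needs a stated purpose")
        entry = SourceRecord(
            url=url,
            purpose=purpose,
            fields_collected=list(fields_collected),
            contains_pii=contains_pii,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self.records.append(entry)
        return entry

    def to_json(self):
        """Export the full log so it can be published alongside the data."""
        return json.dumps([asdict(r) for r in self.records], indent=2)


log = TransparencyLog()
log.record(
    "https://example.com/careers",
    "Aggregate public job titles for a class project",
    ["title", "location"],
    contains_pii=False,
)
print(log.to_json())
```

Requiring a purpose before anything is logged turns the ethical question ('why do we need this data?') into a step the tool cannot skip.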

Key Vocabulary

Data Scraping: The automated process of extracting large amounts of data from websites. This can be done for various purposes, both legitimate and unethical.
Personally Identifiable Information (PII): Any data that could potentially identify a specific individual. This includes names, addresses, email addresses, social security numbers, and more.
Data Privacy: The practice of protecting sensitive personal data from unauthorized access, use, disclosure, alteration, or destruction.
Terms of Service (ToS): A legal agreement between a service provider and a user that outlines the rules and restrictions for using a website or service.
Ethical Hacking: The practice of using hacking skills to identify vulnerabilities in systems with permission, often to improve security. This is distinct from malicious hacking or unauthorized scraping.
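To connect the PII definition to practice, the sketch below masks two easily recognized kinds of PII in a piece of collected text. It is a deliberately limited illustration: the `redact` helper and both patterns are assumptions made for this example, and simple pattern matching will miss names, street addresses, and many other identifiers.

```python
import re

# Simplified patterns for illustration only; real PII detection needs far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text):
    """Replace email addresses and US-style social security numbers with labels."""
    text = EMAIL.sub("[EMAIL]", text)
    text = US_SSN.sub("[SSN]", text)
    return text


sample = "Contact jane.doe@example.com, SSN 123-45-6789"
print(redact(sample))
```

Students can test the helper on their own made-up examples and list the kinds of PII it fails to catch.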
