Computer Science · 9th Grade · The Impact of Artificial Intelligence · Weeks 28-36

The Role of Training Data Quality

Students will analyze the role of training data quality in the success of an AI model.

Common Core State Standards · CSTA: 3A-AP-13 · CSTA: 3A-IC-24

About This Topic

Automation and the future of work is a topic that directly affects students' career paths. In 9th grade, students predict how AI and robotics will transform various industries, from manufacturing to healthcare. This aligns with CSTA standards regarding the impact of computing on the economy and workforce. Students learn to distinguish between tasks that are easy to automate (repetitive, data-heavy) and those that are difficult to automate (creative, empathetic, or dependent on physical dexterity).

This topic also addresses the economic and social implications of widespread automation, such as the need for lifelong learning and potential changes to the work week. By focusing on 'human-centric' skills, students can better prepare for a changing job market. Students grasp these concepts faster through simulations where they 're-skill' for a future economy.

Key Questions

  1. Analyze the role of training data quality in the success of an AI model.
  2. Critique the potential biases introduced by poor quality or unrepresentative training data.
  3. Design strategies for improving the quality and diversity of training datasets.

Learning Objectives

  • Analyze how the quality and quantity of training data impact the performance and fairness of an AI model.
  • Critique specific examples of AI bias resulting from unrepresentative or inaccurate training datasets.
  • Design a plan to identify and mitigate bias in a given AI training dataset.
  • Explain the ethical considerations related to data collection and its use in AI model training.

Before You Start

Introduction to Artificial Intelligence Concepts

Why: Students need a basic understanding of what AI models are and how they learn before analyzing the role of training data.

Data Representation and Types

Why: Understanding how data is structured and categorized is foundational to discussing data quality and bias.

Key Vocabulary

Training Data: The dataset used to teach an AI model patterns and relationships. The model learns from this data to make predictions or decisions.
Data Bias: Systematic errors or prejudices in a dataset that can lead an AI model to produce unfair or discriminatory outcomes.
Representative Data: A dataset that accurately reflects the diversity and characteristics of the real-world population or phenomenon the AI model is intended to serve.
Data Cleaning: The process of detecting and correcting or removing corrupt, inaccurate, or irrelevant records from a dataset used for AI training.
Algorithmic Fairness: The principle that AI systems should not create or perpetuate unjust discrimination against individuals or groups, often achieved through careful data management.
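
Two of the terms above, data cleaning and representative data, can be made concrete with a short classroom sketch in Python. Everything here is an assumption for illustration: the records are made up, and the names RAW_RECORDS, clean_records, and group_shares are hypothetical helpers, not part of any library or of this lesson's materials.

```python
from collections import Counter

# Hypothetical toy records: (age_group, region, label). All values are made up.
RAW_RECORDS = [
    ("18-30", "urban", "approved"),
    ("18-30", "urban", "approved"),  # exact duplicate of the row above
    ("31-50", "urban", "denied"),
    ("31-50", None, "approved"),     # missing region
    ("51+", "rural", "denied"),
    ("18-30", "urban", "denied"),
]


def clean_records(records):
    """Drop rows with missing fields and exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for row in records:
        if any(field is None for field in row):
            continue  # incomplete record
        if row in seen:
            continue  # duplicate record
        seen.add(row)
        cleaned.append(row)
    return cleaned


def group_shares(records, column):
    """Return the fraction of records that fall under each value of one column."""
    counts = Counter(row[column] for row in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


cleaned = clean_records(RAW_RECORDS)
print(len(RAW_RECORDS), "->", len(cleaned), "records after cleaning")
print(group_shares(cleaned, column=1))  # share of each region in the cleaned data
```

In this toy dataset, three of the four cleaned records come from urban areas. Students can compare shares like these with the population the model is meant to serve and decide whether the data counts as representative.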

Watch Out for These Misconceptions

Common Misconception: Automation will eliminate all jobs.

What to Teach Instead

Automation usually changes jobs rather than eliminating them, often creating new types of work. The 'Automation Wave' simulation helps students see how roles evolve and new needs emerge.

Common Misconception: Only 'blue-collar' manual labor jobs are at risk.

What to Teach Instead

Many 'white-collar' jobs involving data analysis or routine writing are also being automated. Researching AI in law and medicine helps students see the broad reach of technology.

Real-World Connections

  • Facial recognition systems have shown lower accuracy rates for individuals with darker skin tones due to underrepresentation in training datasets, impacting law enforcement and security applications.
  • Hiring algorithms trained on historical company data may perpetuate gender or racial biases if past hiring practices were discriminatory, leading to fewer opportunities for underrepresented groups.
  • Medical diagnostic AI trained on data primarily from one demographic may misdiagnose conditions in patients from other groups, which could affect healthcare outcomes wherever such tools are deployed, including at large institutions like the Mayo Clinic or Johns Hopkins.
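
The pattern in these examples, a model that looks accurate overall but performs worse for an underrepresented group, can be shown with a minimal Python sketch. The evaluation results below are invented for illustration, and RESULTS and accuracy_by_group are hypothetical names rather than anything from a real system.

```python
from collections import defaultdict

# Hypothetical evaluation results: (group, true_label, predicted_label).
# group_a supplies 8 of the 10 examples; group_b supplies only 2.
RESULTS = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0),
]


def accuracy_by_group(results):
    """Return the fraction of correct predictions for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in results:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Overall accuracy treats every example the same, regardless of group.
overall = sum(truth == predicted for _, truth, predicted in RESULTS) / len(RESULTS)
per_group = accuracy_by_group(RESULTS)

print("overall:", overall)      # 0.9
print("per group:", per_group)  # group_a: 1.0, group_b: 0.5
```

The overall score of 0.9 hides the fact that the model is right only half the time for group_b, which is why breaking results down by group is a useful first check for this kind of bias.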

Assessment Ideas

Quick Check

Present students with two short descriptions of AI training datasets for a loan application model. One dataset is described as 'diverse and up-to-date,' the other as 'older and primarily from urban areas.' Ask students to write one sentence explaining which dataset is likely to produce a fairer model and why.

Discussion Prompt

Facilitate a class discussion using the prompt: 'Imagine you are building an AI to recommend books. What potential biases could exist in your training data, and what specific steps would you take to ensure your data is representative of a wide range of readers?'

Exit Ticket

Provide students with a scenario where an AI chatbot exhibits biased language. Ask them to identify one possible cause related to training data quality and suggest one method to improve the chatbot's responses.

Frequently Asked Questions

Which jobs are most likely to be automated?
Jobs that involve repetitive tasks, predictable environments, and processing large amounts of data are the most likely to be automated, such as data entry, some manufacturing roles, and basic bookkeeping.
What skills should I learn for the future?
Focus on 'human' skills like critical thinking, complex problem-solving, empathy, creativity, and the ability to work alongside AI tools. Being 'tech-literate' is more important than just knowing how to code.
What is 're-skilling'?
Re-skilling is the process of learning new skills so you can do a different job. As technology changes the workforce, many people will need to re-skill several times throughout their careers.
How can active learning help students understand the future of work?
Active learning strategies like the 'Automation Wave' simulation put students in the driver's seat of their own career planning. By physically navigating a changing 'job market' in the classroom, they move from feeling anxious about the future to feeling proactive about the skills they need to develop.