Computer Science · 9th Grade · The Impact of Artificial Intelligence · Weeks 28-36

The Role of Training Data Quality

Students will analyze the role of training data quality in the success of an AI model.

Common Core State Standards · CSTA: 3A-AP-13 · CSTA: 3A-IC-24

About This Topic

Automation and the future of work is a topic that directly affects students' career paths. In 9th grade, students predict how AI and robotics will transform various industries, from manufacturing to healthcare. This aligns with CSTA standards regarding the impact of computing on the economy and workforce. Students learn to distinguish between tasks that are easy to automate (repetitive or data-heavy) and those that are difficult to automate (tasks requiring creativity, empathy, or physical dexterity).

This topic also addresses the economic and social implications of widespread automation, such as the need for lifelong learning and potential changes to the work week. By focusing on 'human-centric' skills, students can better prepare for a changing job market. Students grasp these concepts faster through simulations where they 're-skill' for a future economy.

Key Questions

Analyze the role of training data quality in the success of an AI model.
Critique the potential biases introduced by poor quality or unrepresentative training data.
Design strategies for improving the quality and diversity of training datasets.

Learning Objectives

Analyze how the quality and quantity of training data impact the performance and fairness of an AI model.
Critique specific examples of AI bias resulting from unrepresentative or inaccurate training datasets.
Design a plan to identify and mitigate bias in a given AI training dataset.
Explain the ethical considerations related to data collection and its use in AI model training.

Before You Start

Introduction to Artificial Intelligence Concepts

Why: Students need a basic understanding of what AI models are and how they learn before analyzing the role of training data.

Data Representation and Types

Why: Understanding how data is structured and categorized is foundational to discussing data quality and bias.

Key Vocabulary

Training Data	The dataset used to teach an AI model patterns and relationships. The model learns from this data to make predictions or decisions.
Data Bias	Systematic errors or prejudices in a dataset that can lead an AI model to produce unfair or discriminatory outcomes.
Representative Data	A dataset that accurately reflects the diversity and characteristics of the real-world population or phenomenon the AI model is intended to serve.
Data Cleaning	The process of detecting and correcting or removing corrupt, inaccurate, or irrelevant records from a dataset used for AI training.
Algorithmic Fairness	The principle that AI systems should not create or perpetuate unjust discrimination against individuals or groups, often achieved through careful data management.
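The Data Cleaning entry above can be made concrete with a small classroom sketch. The record fields (`age`, `label`), the valid age range, and the function name `clean_records` are all hypothetical choices made for illustration; real cleaning pipelines involve many more checks.

```python
# A minimal sketch of data cleaning on a toy dataset.
# Field names, the age range, and the sample records are assumptions for illustration.

def clean_records(records, required=("age", "label"), age_range=(0, 120)):
    """Return records that are complete, plausible, and not duplicated."""
    seen = set()
    cleaned = []
    for rec in records:
        # Drop records with a missing required field.
        if any(rec.get(field) is None for field in required):
            continue
        # Drop records whose age falls outside the plausible range.
        if not (age_range[0] <= rec["age"] <= age_range[1]):
            continue
        # Drop exact duplicates (same field values already kept).
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"age": 34, "label": "approved"},
    {"age": 34, "label": "approved"},   # duplicate record
    {"age": None, "label": "denied"},   # missing value
    {"age": 430, "label": "denied"},    # implausible value, likely a typo
    {"age": 51, "label": "denied"},
]

print(clean_records(raw))
```

Students can add their own "dirty" records to `raw` and predict which ones the function will keep before running it.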

Watch Out for These Misconceptions

Common Misconception: Automation will eliminate all jobs.

What to Teach Instead

Automation usually changes jobs rather than eliminating them, often creating new types of work. The 'Automation Wave' simulation helps students see how roles evolve and new needs emerge.

Common Misconception: Only 'blue-collar' manual labor jobs are at risk.

What to Teach Instead

Many 'white-collar' jobs involving data analysis or routine writing are also being automated. Researching AI in law and medicine helps students see the broad reach of technology.

Active Learning Ideas

Simulation Game: The Automation Wave

Assign students different 'jobs' (truck driver, surgeon, artist). Introduce 'AI breakthroughs' one by one. Students must decide if their job is automated, assisted, or unchanged, and then 're-skill' by finding a new role.

45 min·Whole Class

Inquiry Circle: Industry 4.0

Groups research how a specific industry (like farming or fashion) has changed due to technology over the last 50 years and predict what it will look like in 2050.

40 min·Small Groups

Think-Pair-Share: The Un-automatable

Students brainstorm a list of skills they think a robot will *never* be able to master. They pair up to challenge each other's lists and narrow them down to the top three 'human-only' skills.

20 min·Pairs

Real-World Connections

Facial recognition systems have shown lower accuracy rates for individuals with darker skin tones due to underrepresentation in training datasets, impacting law enforcement and security applications.
Hiring algorithms trained on historical company data may perpetuate gender or racial biases if past hiring practices were discriminatory, leading to fewer opportunities for underrepresented groups.
Medical diagnostic AI trained on data drawn primarily from one demographic group may misdiagnose conditions in patients from other groups, which could affect healthcare outcomes at any hospital that adopts it, including large institutions like the Mayo Clinic or Johns Hopkins.
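One way to make these examples tangible is to measure a model's accuracy separately for each group instead of reporting a single overall number. The sketch below uses made-up results and hypothetical group names (`group_a`, `group_b`); it is not drawn from any real system.

```python
# A minimal sketch: compare accuracy per group to reveal a gap
# that a single overall accuracy number would hide.
# All data here is invented for illustration.
from collections import defaultdict


def accuracy_by_group(results):
    """results: list of (group, predicted, actual) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in results:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


results = [
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "match"),
    ("group_a", "match", "match"),
    ("group_b", "match", "no_match"),   # wrong prediction
    ("group_b", "no_match", "no_match"),
]

print(accuracy_by_group(results))
```

In this toy data the model is right 5 times out of 6 overall, yet every error falls on `group_b`, the group with fewer examples. That pattern is what students should look for when critiquing a dataset.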

Assessment Ideas

Quick Check

Present students with two short descriptions of AI training datasets for a loan application model. One dataset is described as 'diverse and up-to-date,' the other as 'older and primarily from urban areas.' Ask students to write one sentence explaining which dataset is likely to produce a fairer model and why.
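For classes comfortable with a little Python, the Quick Check can be extended with a sketch that compares each group's share of a dataset against its share of the population the model will serve. The sample list, the population shares, and the function name `representation_gaps` are invented for illustration and do not reflect real statistics.

```python
# A minimal sketch: how far is each group's share of the dataset
# from its share of the target population?
# The numbers below are made up for classroom use.
from collections import Counter


def representation_gaps(samples, population_share):
    """Return dataset share minus population share for each group."""
    counts = Counter(samples)
    n = len(samples)
    return {
        group: round(counts[group] / n - share, 2)
        for group, share in population_share.items()
    }


# Hypothetical loan dataset: 8 urban applicants, 2 rural applicants.
samples = ["urban"] * 8 + ["rural"] * 2
# Hypothetical target population: half urban, half rural.
population_share = {"urban": 0.5, "rural": 0.5}

gaps = representation_gaps(samples, population_share)
print(gaps)
```

A positive gap means the group is overrepresented in the dataset and a negative gap means it is underrepresented. Students can then discuss which gaps are large enough to worry about.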

Discussion Prompt

Facilitate a class discussion using the prompt: 'Imagine you are building an AI to recommend books. What potential biases could exist in your training data, and what specific steps would you take to ensure your data is representative of a wide range of readers?'

Exit Ticket

Provide students with a scenario where an AI chatbot exhibits biased language. Ask them to identify one possible cause related to training data quality and suggest one method to improve the chatbot's responses.

Frequently Asked Questions

Which jobs are most likely to be automated?

Jobs that involve repetitive tasks, predictable environments, and processing large amounts of data are the most likely to be automated, such as data entry, some manufacturing roles, and basic bookkeeping.

What skills should I learn for the future?

Focus on 'human' skills like critical thinking, complex problem-solving, empathy, creativity, and the ability to work alongside AI tools. Being 'tech-literate' is more important than just knowing how to code.

What is 're-skilling'?

Re-skilling is the process of learning new skills so you can do a different job. As technology changes the workforce, many people will need to re-skill several times throughout their careers.

How can active learning help students understand the future of work?

Active learning strategies like the 'Automation Wave' simulation put students in the driver's seat of their own career planning. By physically navigating a changing 'job market' in the classroom, they move from feeling anxious about the future to feeling proactive about the skills they need to develop.

More in The Impact of Artificial Intelligence

Machine Learning vs. Traditional Programming

Students will understand how machine learning differs from traditional rule-based programming.

Supervised and Unsupervised Learning

Students will understand how computers learn from examples through supervised and unsupervised learning.

AI Creativity and Mimicry

Students will discuss whether a computer can truly be creative or if it is just mimicking patterns.

Sources of Algorithmic Bias

Students will analyze how human prejudices can be encoded into software and the resulting social impact.

Ethical Decision-Making in AI

Students will discuss ethical dilemmas faced by AI systems and the importance of human oversight.

Identifying Bias in AI Outputs

Students will learn to identify and analyze instances of bias in the outputs of AI systems.
