Computer Science · 11th Grade · Artificial Intelligence and Ethics · Weeks 19-27

AI Applications: Image and Speech Recognition

Exploring how AI is used in practical applications like recognizing images and understanding speech.

Standards: CSTA 3B-AP-09

About This Topic

Image recognition and speech recognition are two of the most visible applications of machine learning across US society. Both systems work by converting raw sensory input into numerical representations and applying neural networks trained on large labeled datasets to classify those inputs. For 11th-grade Computer Science students, this topic connects the neural network mechanics studied earlier in the unit to technologies already part of daily life: Face ID, voice assistants, live captioning, and medical imaging tools.
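That conversion from raw input to numbers can be made concrete in a few lines. The tiny 3x3 grayscale "image" below is invented for illustration; real systems work the same way at vastly larger scale.

```python
# A tiny 3x3 grayscale "image": each value is a pixel intensity (0 = black, 255 = white).
image = [
    [  0, 128, 255],
    [ 64, 128, 192],
    [255, 128,   0],
]

# Flatten the 2D grid into the 1D numeric vector a model actually consumes,
# scaling intensities into the 0.0-1.0 range training typically expects.
features = [pixel / 255 for pixel_row in image for pixel in pixel_row]

print(len(features))  # 9 values, one per pixel
print(features[0])    # 0.0 (top-left black pixel)
```

Audio gets the same treatment: a waveform is sampled into a sequence of numbers before any network ever sees it.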

What makes these applications analytically rich is the documented gap between aggregate accuracy and subgroup reliability. Studies such as the MIT Media Lab's Gender Shades project and NIST's Face Recognition Vendor Test (FRVT) document consistently higher facial recognition error rates for darker-skinned individuals, and speech recognition research shows similar gaps for speakers of non-mainstream English dialects. Connecting those performance gaps to dataset composition makes fairness arguments concrete and technically grounded rather than abstract.

Active learning is particularly effective here because the ethical and policy questions involve competing values, not settled answers. Deliberation formats, case analysis with real disparity data, and structured role-play build the analytical skills CSTA 3B-AP-09 targets and prepare students to evaluate societal impacts of computing with evidence rather than opinion.

Key Questions

  1. How does AI enable computers to 'see' and 'hear' in applications like facial recognition and voice assistants?
  2. What are the societal impacts and ethical considerations of widespread image and speech recognition technologies?
  3. What future advancements and challenges can we expect in AI-powered perception?

Learning Objectives

  • Explain the underlying computational principles that enable AI systems to process and interpret visual and auditory data.
  • Analyze the performance disparities in image and speech recognition systems across different demographic groups, citing specific research findings.
  • Critique the ethical implications of widespread AI-powered image and speech recognition, considering privacy, bias, and societal equity.
  • Design a conceptual framework for a new AI application that utilizes image or speech recognition, outlining its potential benefits and risks.
  • Evaluate the potential societal impacts of future advancements in AI perception technologies.

Before You Start

Introduction to Machine Learning Concepts

Why: Students need a foundational understanding of how machine learning models learn from data before exploring specific applications like image and speech recognition.

Data Representation and Preprocessing

Why: Understanding how raw data is converted into numerical formats is crucial for grasping how AI processes sensory input like images and sound.

Key Vocabulary

Feature Extraction: The process of identifying and isolating specific, relevant characteristics from raw data, such as edges and textures in an image or phonemes in speech.
Neural Network: A computational model inspired by the structure of the human brain, used to recognize patterns in data through layers of interconnected nodes.
Training Data: Large, labeled datasets used to teach AI models to recognize patterns and make predictions; the quality and diversity of the data significantly affect performance.
Bias in AI: Systematic and repeatable errors in an AI system that create unfair outcomes, often stemming from skewed training data or flawed algorithms.
Algorithmic Fairness: The principle of ensuring that AI systems do not create or perpetuate unjust discrimination against individuals or groups.

Watch Out for These Misconceptions

Common Misconception: AI recognition systems can understand what they perceive the way humans do.

What to Teach Instead

These systems detect statistical patterns in numerical arrays without any semantic understanding. A speech recognition model does not know what a word means; it learned which audio patterns map to which text strings during training. Having students trace a specific misclassification back to the input data makes this distinction concrete and helps it stick in ways that definitional explanations do not.

Common Misconception: A high overall accuracy rate means a recognition system is fair and reliable for all users.

What to Teach Instead

Aggregate accuracy masks sharp subgroup performance gaps. A system averaging 97% overall may perform at 78% for darker-skinned women, as documented in Gender Shades research. Students who work with disaggregated accuracy tables during structured case analysis develop a more critical standard for reading performance claims and become less susceptible to headline statistics without demographic breakdowns.
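Disaggregation is simple to demonstrate. The sketch below uses invented per-group numbers that loosely echo the pattern described above; only the shape of the result matters, not the figures.

```python
# Hypothetical recognition results tagged by demographic group. Group A dominates
# the evaluation set (900 samples vs. 100), so its accuracy dominates the average.
# All numbers are invented for illustration.
results = (
    [("group_a", True)] * 873 + [("group_a", False)] * 27  # 97% correct, 900 samples
    + [("group_b", True)] * 78 + [("group_b", False)] * 22  # 78% correct, 100 samples
)

def accuracy(rows):
    return sum(correct for _, correct in rows) / len(rows)

overall = accuracy(results)
by_group = {
    g: accuracy([r for r in results if r[0] == g])
    for g in ("group_a", "group_b")
}

print(round(overall, 3))  # the headline number looks strong
print(by_group)           # the breakdown reveals a 19-point gap
```

Reading only the first number, the system looks excellent; the second line is the table students should learn to ask for.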

Common Misconception: Removing demographic labels from training data is sufficient to eliminate bias in recognition systems.

What to Teach Instead

Bias is encoded in which faces or voices dominate the dataset, not in explicit labels. A dataset heavy with lighter-skinned faces produces biased recognition even without race labels, because pixel-level features correlate with demographics. Students who have built a full data pipeline diagram in a role-play or case study can explain why label removal at the annotation layer does not address imbalance at the collection layer.
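The proxy effect can be simulated in a few lines. The sketch below is a deliberately crude face/no-face "detector" on synthetic data: the model never sees a group label, yet the threshold it fits on an imbalanced dataset serves the majority group and fails the minority group. Every number is invented for illustration.

```python
# Synthetic training data: (image_brightness, contains_face). The group label is
# never given to the model, but brightness acts as a proxy feature correlated
# with group: the two groups' faces photograph at different brightness levels.
majority = [(0.70, True)] * 450 + [(0.30, False)] * 450  # 90% of the samples
minority = [(0.25, True)] * 50 + [(0.10, False)] * 50    # 10% of the samples
data = majority + minority

def errors(threshold, rows):
    # Predict "face" whenever brightness >= threshold; count the mistakes.
    return sum((b >= threshold) != is_face for b, is_face in rows)

# "Training": choose the threshold that minimizes error on the full dataset.
candidates = [i / 20 for i in range(21)]
best = min(candidates, key=lambda t: errors(t, data))

majority_acc = 1 - errors(best, majority) / len(majority)
minority_acc = 1 - errors(best, minority) / len(minority)
print(best, majority_acc, minority_acc)
```

The fitted threshold lands where the majority group's faces and non-faces separate cleanly, leaving half of the minority group's samples misclassified: the imbalance did the damage at the collection layer, with no demographic label anywhere in the pipeline.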

Active Learning Ideas


Gallery Walk: Recognition in the Wild

Post four stations: a radiology AI success case, a documented facial recognition false-positive incident, a voice assistant accuracy comparison across English accents, and a real-time captioning failure example. Groups rotate and annotate what data conditions produced each outcome and what safeguard was or was not in place. The class reconvenes to map shared patterns and build a collective framework for evaluating recognition system deployments.

40 min·Small Groups

Think-Pair-Share: What Does the Model Actually Learn?

Show students three images: two that a low-quality recognition system treats as the same person (a false match) and one it misclassifies. Students individually write a hypothesis about which pattern the model likely latched onto, then compare reasoning with a partner. The class builds a shared diagram tracing what each neural network layer is likely detecting, anchoring abstract architecture concepts to observable failure modes.

25 min·Pairs

Structured Academic Controversy: Facial Recognition in Schools

Assign pairs to argue for or against deploying facial recognition in their school district, using a brief that includes NIST accuracy statistics, student privacy law citations, and a documented incident of false identification in a school setting. After presenting arguments, pairs switch sides and then collaborate to write a consensus statement with specific accuracy thresholds or conditions. The forced perspective switch requires engaging the strongest version of the opposing argument.

50 min·Pairs

Role-Play: City Council Hearing on AI Surveillance

Students take roles as city council members, police department representatives, civil liberties advocates, and affected community members to debate a proposed facial recognition ordinance. Each group prepares a two-minute statement and fields questions from other roles. The class votes on a final ordinance text that must specify required accuracy thresholds, audit provisions, and appeal processes, grounding policy decisions in the technical concepts from the unit.

45 min·Small Groups

Real-World Connections

  • Law enforcement agencies use facial recognition software, such as Clearview AI, to identify suspects from surveillance footage, raising concerns about privacy and potential misidentification.
  • Companies like Apple and Google deploy voice assistants (Siri, Google Assistant) that rely on sophisticated speech recognition to understand and respond to user commands, impacting how people interact with technology.
  • Medical professionals utilize AI-powered image analysis tools to detect anomalies in X-rays and MRIs, aiding in earlier and more accurate diagnoses for conditions like cancer.

Assessment Ideas

Discussion Prompt

Present students with a news article detailing a real-world case of bias in facial recognition (e.g., higher error rates for women or people of color). Ask: 'What specific aspect of the AI system's training or design might have led to this disparity? How could this bias be addressed in future development?'

Quick Check

Provide students with two short audio clips: one of a standard English speaker and one of a speaker with a distinct regional accent. Ask them to predict which clip a typical voice assistant might struggle with more and explain why, referencing concepts like training data diversity.

Exit Ticket

Ask students to write down one specific application of image recognition and one of speech recognition they encounter daily. For each, they should briefly describe the AI's function and one potential ethical concern associated with its use.

Frequently Asked Questions

How does image recognition actually work inside a neural network?
The network applies successive layers of filters to an input image. Early layers detect simple features like edges and gradients; later layers combine those into shapes and then higher-order patterns like facial geometry or object outlines. Training adjusts filter weights by comparing predictions against labeled examples until the model reliably generalizes. The same principle applies to speech: audio waveforms pass through layers that detect phonemes before assembling words.
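The early-layer filtering described above can be illustrated with a single hand-written edge filter. A real network learns thousands of such filters during training rather than having them written by hand, and the 4x4 image below is invented for the sketch.

```python
# A tiny grayscale image containing a vertical edge: dark (0) on the left,
# bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-written 2x2 "vertical edge" filter: negative weights on the left,
# positive on the right, so dark-to-bright transitions produce large outputs.
kernel = [
    [-1, 1],
    [-1, 1],
]

def convolve(img, k):
    # Slide the filter across every 2x2 patch of the image and record
    # the weighted sum at each position (a single convolutional layer).
    out = []
    for r in range(len(img) - 1):
        row = []
        for c in range(len(img[0]) - 1):
            row.append(sum(img[r + dr][c + dc] * k[dr][dc]
                           for dr in range(2) for dc in range(2)))
        out.append(row)
    return out

response = convolve(image, kernel)
print(response)  # the strongest responses mark where the edge sits
```

The middle column lights up exactly where the dark-to-bright transition is; deeper layers combine many such response maps into shapes and object parts.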
Why do voice assistants make more errors with certain accents or dialects?
Speech recognition models learn from training corpora, and most large datasets over-represent mainstream American English or standard British English. Accents with fewer training examples produce higher word error rates because the model encounters phoneme patterns it saw too infrequently during training to recognize reliably. The performance gap comes from dataset composition, not from any property of the accent itself.
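The standard way to quantify that gap is word error rate (WER): the word-level edit distance between the reference transcript and the system's output, divided by the reference length. The sketch below implements WER with a plain Levenshtein table; the transcripts are invented examples.

```python
def word_error_rate(reference, hypothesis):
    # Levenshtein edit distance over words (substitutions, insertions,
    # deletions), normalized by the length of the reference transcript.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

# Invented transcripts: the same spoken sentence as recognized for two speakers.
reference = "turn on the kitchen lights"
print(word_error_rate(reference, "turn on the kitchen lights"))  # 0.0
print(word_error_rate(reference, "turn on the chicken light"))   # 0.4 (2 of 5 words wrong)
```

Published ASR studies report exactly this metric disaggregated by dialect, which is what makes the dataset-composition argument testable.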
What are the main legal and ethical concerns about facial recognition in the US?
Primary concerns include mass surveillance without meaningful consent, accuracy gaps that concentrate false identifications among marginalized communities, law enforcement reliance on unaudited systems, and chilling effects on public assembly. Multiple US cities have enacted government facial recognition bans, and federal legislative proposals cite NIST demographic accuracy testing as the technical basis for proposed restrictions.
What active learning approaches work best for teaching image and speech recognition in high school?
Structured deliberation formats work better than lecture here because the ethical questions involve genuine value tradeoffs rather than technical facts with clear right answers. Structured academic controversy, requiring students to argue both sides before reaching a consensus, builds evidence-evaluation skills tied directly to CSTA 3B-AP-09. Gallery walks with real disparity data give students practice connecting technical understanding to policy decisions in a tangible way.