
Training Data and Model Evaluation: Activities & Teaching Strategies

Active learning works for this topic because students need to see how data quality and evaluation choices shape real model behavior. Watching a model fail from overfitting, or seeing a high-accuracy metric mislead without context, sticks better than an abstract lecture.

11th Grade · Computer Science · 4 activities · 20–45 min each

Learning Objectives

  1. Explain the critical role of training data quality in the development of reliable machine learning models.
  2. Analyze and compare common metrics such as accuracy, precision, and recall for evaluating AI model performance.
  3. Critique the consequences of overfitting and underfitting on a model's ability to generalize to new data.
  4. Design a simple experiment to demonstrate the impact of data quantity on model performance.


20 min·Pairs

Think-Pair-Share: Accuracy Isn't Everything

Present a scenario: a disease affects 1% of the population, and a diagnostic AI claims 99% accuracy by always predicting 'healthy.' Ask partners to explain why this is misleading and what metric would be better. After sharing, introduce precision and recall as tools for understanding model behavior on imbalanced datasets.
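To make the scenario concrete for students (or to check the numbers yourself), here is a minimal sketch in plain Python. The population size is a hypothetical choice; only the 1% prevalence comes from the scenario:

```python
# Hypothetical population: 10,000 people, 1% have the disease.
# A model that always predicts "healthy" is 99% accurate but useless.
total = 10_000
sick = 100                      # 1% prevalence
healthy = total - sick

# "Always healthy" classifier: it never makes a positive prediction.
true_pos = 0                    # sick patients correctly flagged
false_pos = 0                   # healthy patients wrongly flagged
false_neg = sick                # every sick patient is missed
true_neg = healthy

accuracy = (true_pos + true_neg) / total
recall = true_pos / (true_pos + false_neg)
precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0

print(f"accuracy:  {accuracy:.2%}")   # 99.00%
print(f"recall:    {recall:.2%}")     # 0.00% -- misses every sick patient
print(f"precision: {precision:.2%}")
```

Recall exposes the failure that accuracy hides: the model identifies none of the actual positive cases.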

Prepare & details

Analyze and compare common metrics such as accuracy, precision, and recall for evaluating AI model performance.

Facilitation Tip: During Think-Pair-Share: Accuracy Isn't Everything, assign one student in each pair to argue for accuracy and the other to critique it using the provided scenario cards.

Setup: Standard classroom seating; students turn to a neighbor

Materials: Discussion prompt (projected or printed), Optional: recording sheet for pairs

Understand · Apply · Analyze · Self-Awareness · Relationship Skills
45 min·Pairs

Overfitting Experiment

Students train a simple model (using a provided notebook) on progressively smaller subsets of training data while testing on the same fixed test set. They plot training vs. test accuracy as sample size decreases and observe overfitting emerge. Pairs write a paragraph describing what they observed and predicting what would happen with even less data.
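If the provided notebook is unavailable, the experiment can be sketched with NumPy alone. This stand-in uses a fixed-complexity polynomial model on synthetic data; the model, sample sizes, and underlying function are illustrative, not the actual classroom materials:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a simple underlying function."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)
    return x, y

# Fixed test set, progressively smaller training sets -- mirrors the activity.
x_test, y_test = make_data(200)
degree = 9  # fixed, fairly flexible model

results = {}
for n_train in (100, 30, 12):
    x_tr, y_tr = make_data(n_train)
    coeffs = np.polyfit(x_tr, y_tr, degree)       # fit a degree-9 polynomial
    mse_train = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[n_train] = (mse_train, mse_test)
    print(f"n={n_train:3d}  train MSE={mse_train:.3f}  test MSE={mse_test:.3f}")
```

As the training set shrinks, training error stays low while test error climbs: the gap between the two curves is the overfitting students should observe and describe.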

Prepare & details

Critique the consequences of overfitting and underfitting on a model's ability to generalize, and demonstrate the impact of data quantity on model performance.

Facilitation Tip: When running the Overfitting Experiment, give each group two different model sizes and let them present their validation curves side by side on the same chart.

Setup: Pairs at computers with the provided notebook

Materials: Provided training notebook, dataset with a fixed test split, plotting tool or graph paper for accuracy curves

Analyze · Evaluate · Create · Decision-Making · Self-Management
30 min·Small Groups

Gallery Walk: Critique the AI Claim

Post five printed AI headlines or marketing claims ('Our model achieves 98% accuracy!', 'AI outperforms doctors in diagnosis'). Student groups annotate each with questions they'd need answered before accepting the claim: What is the test set? Is accuracy the right metric? What population was tested? How were edge cases handled? Class discusses which claims hold up to scrutiny.

Prepare & details

Analyze and compare common metrics used to evaluate AI model performance, and critique claims that rest on a single metric.

Facilitation Tip: During the Gallery Walk: Critique the AI Claim, post each claim at a station with a single evaluation metric; students must write why that metric alone is insufficient.

Setup: Wall space or tables arranged around room perimeter

Materials: Large paper/poster boards, Markers, Sticky notes for feedback

Understand · Apply · Analyze · Create · Relationship Skills · Social Awareness
35 min·Small Groups

Feature Engineering Challenge

Give teams a raw dataset (e.g., raw text strings, timestamps) and ask them to engineer three new features they think would help a model predict a given outcome. Teams present their features and justify why they might be predictive. Class votes on which features they think would most improve the model, then test predictions using a provided script.
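A possible starting point for teams, using only the Python standard library. The records, the prediction target (say, spam detection), and the specific features are all hypothetical examples of what teams might engineer from raw text and timestamps:

```python
from datetime import datetime

# Hypothetical raw records: a timestamp string and a free-text field.
raw = [
    {"timestamp": "2024-03-01 02:14:00", "text": "WIN A FREE PRIZE NOW!!!"},
    {"timestamp": "2024-03-01 14:05:00", "text": "Lunch tomorrow?"},
]

def engineer(record):
    """Turn one raw record into numeric features a model can use."""
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    text = record["text"]
    return {
        "hour_of_day": ts.hour,                    # time-based feature
        "is_night": int(ts.hour < 6),              # sent overnight?
        "exclamation_count": text.count("!"),      # shouting signal
        "caps_ratio": sum(c.isupper() for c in text) / max(len(text), 1),
    }

for r in raw:
    print(engineer(r))
```

The point for teams to justify: each engineered feature encodes a hypothesis about the outcome (e.g., "spam is sent at odd hours and shouts"), which the provided script can then confirm or refute.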

Prepare & details

Explain the critical role of training data in machine learning model development.

Facilitation Tip: In the Feature Engineering Challenge, provide a dataset with 10 raw features and require teams to submit their feature-generation code before they can add or drop any features.

Setup: Groups at computers with the raw dataset

Materials: Raw dataset (text strings, timestamps), feature-idea worksheet, provided prediction script

Analyze · Evaluate · Create · Decision-Making · Self-Management

Teaching This Topic

Start with concrete examples before theory. Use the same dataset across activities so students see how data choices ripple into evaluation and model behavior. Avoid jumping straight to code; focus first on the reasoning behind each step. Students tend to grasp overfitting faster when they watch a model's validation curve shift as they change its complexity.

What to Expect

Students will explain why accuracy can mislead, recognize overfitting in models, critique claims about AI systems, and justify their own feature choices. They will use evaluation metrics to make decisions, not just report numbers.

These activities are a starting point. A full mission is the experience.

  • Complete facilitation script with teacher dialogue
  • Printable student materials, ready for class
  • Differentiation strategies for every learner
Generate a Mission

Watch Out for These Misconceptions

Common Misconception: During Think-Pair-Share: Accuracy Isn't Everything, watch for students who insist accuracy is always the best metric. Redirect them to the imbalanced dataset card and ask them to calculate precision and recall from the confusion matrix provided.

What to Teach Instead

Use the confusion matrix on the card to recalculate precision and recall. Ask students to compare the two metrics to the reported accuracy and explain why a model with 95% accuracy might miss half the positive cases.

Common Misconception: During the Overfitting Experiment, watch for students who blame overfitting on the model being 'too smart'. Redirect them to the validation curve on their shared screen and ask what happens to training and validation error as model size increases.

What to Teach Instead

Ask students to point out where the validation error starts rising while the training error keeps falling. Then ask them to explain what the model is memorizing instead of learning.

Common Misconception: During the Feature Engineering Challenge, watch for students who add every feature hoping for better results. Redirect them to the performance plot on the whiteboard and ask which features actually improved the score.

What to Teach Instead

Have teams present their feature list and the code that generated it. Ask the class to vote on which features were truly informative and which introduced noise.

Assessment Ideas

Exit Ticket

After Think-Pair-Share: Accuracy Isn't Everything, collect each pair’s written justification for which evaluation metric matters most in their scenario and one concrete example of a model that would mislead if judged by accuracy alone.

Quick Check

During the Gallery Walk: Critique the AI Claim, ask students to write a short note at each station explaining why the single metric shown is insufficient and what additional metrics or data they would need.

Discussion Prompt

After the Overfitting Experiment, facilitate a whole-class discussion using the prompt: 'If we only had 100 training examples, how would that limit the largest model size we could use without overfitting? Use the validation curves from your experiment to support your answer.'

Extensions & Scaffolding

  • Challenge: Ask students to design a new evaluation metric for a scenario where false positives are 100 times more costly than false negatives.
  • Scaffolding: Provide a partially completed confusion matrix template and ask students to fill in the missing values before calculating precision and recall.
  • Deeper exploration: Have students research and present one regularization technique (L1, L2, dropout) and explain how it changes the training curve in the Overfitting Experiment.
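For the challenge extension, one candidate cost-weighted metric is sketched below. Only the 100:1 cost ratio comes from the prompt; the function name and the two models' counts are invented for illustration:

```python
def weighted_cost(true_pos, false_pos, false_neg, true_neg,
                  fp_cost=100, fn_cost=1):
    """Total cost when a false positive is 100x costlier than a false negative.
    Lower is better; students can argue for their own normalization."""
    return fp_cost * false_pos + fn_cost * false_neg

# Two hypothetical models evaluated on the same test set:
# Model A: cautious -- few false alarms, more misses.
# Model B: aggressive -- catches more positives but raises false alarms.
cost_a = weighted_cost(true_pos=40, false_pos=2, false_neg=60, true_neg=898)
cost_b = weighted_cost(true_pos=90, false_pos=30, false_neg=10, true_neg=870)

print(cost_a)  # 100*2 + 1*60 = 260
print(cost_b)  # 100*30 + 1*10 = 3010
```

Under this cost model, the "better recall" model loses badly, which is exactly the kind of trade-off the challenge asks students to reason about.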

Key Vocabulary

Training Data: The dataset used to teach a machine learning model patterns and relationships. Its quality directly impacts the model's effectiveness.
Feature Engineering: The process of selecting, transforming, and creating features from raw data to improve model performance and accuracy.
Accuracy: A metric that measures the proportion of correct predictions made by a model out of the total number of predictions.
Precision: A metric that measures the proportion of true positive predictions among all positive predictions made by the model. It answers, 'Of all the times the model predicted X, how often was it correct?'
Recall: A metric that measures the proportion of true positive predictions among all actual positive instances. It answers, 'Of all the actual X cases, how many did the model correctly identify?'
Overfitting: A phenomenon where a machine learning model learns the training data too well, including its noise and outliers, leading to poor performance on unseen data.

Ready to teach Training Data and Model Evaluation?

Generate a full mission with everything you need

Generate a Mission