Training Data and Model Evaluation: Activities & Teaching Strategies
Active learning works for this topic because students need to see how data quality and evaluation choices shape real model behavior. Watching a model fail due to overfitting or trusting a high-accuracy metric without context sticks better than abstract lectures.
Learning Objectives
1. Explain the critical role of training data quality in the development of reliable machine learning models.
2. Analyze and compare common metrics such as accuracy, precision, and recall for evaluating AI model performance.
3. Critique the consequences of overfitting and underfitting on a model's ability to generalize to new data.
4. Design a simple experiment to demonstrate the impact of data quantity on model performance.
Think-Pair-Share: Accuracy Isn't Everything
Present a scenario: a disease affects 1% of the population, and a diagnostic AI claims 99% accuracy by always predicting 'healthy.' Ask partners to explain why this is misleading and what metric would be better. After sharing, introduce precision and recall as tools for understanding model behavior on imbalanced datasets.
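The scenario's arithmetic can be shown directly. This sketch uses the numbers from the prompt (1% prevalence, a model that always predicts 'healthy'); note that precision is undefined for this model, since it never makes a positive prediction.

```python
# The scenario from the prompt: 1,000 patients, 1% (10) actually sick,
# and a "model" that always predicts 'healthy'.
n_patients = 1000
n_sick = 10

true_positives = 0          # sick patients flagged sick: none, model never says "sick"
false_negatives = n_sick    # every sick patient is missed
true_negatives = n_patients - n_sick
false_positives = 0

accuracy = (true_positives + true_negatives) / n_patients
# Recall: of all actually sick patients, how many did we catch?
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.2%}")  # 99.00% -- looks impressive
print(f"recall   = {recall:.2%}")    # 0.00% -- catches no sick patients at all
```

Seeing a 99% accuracy paired with 0% recall makes the gap between the two metrics concrete before students ever hear the formal definitions.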
Prepare & details
Analyze and compare common metrics such as accuracy, precision, and recall for evaluating AI model performance.
Facilitation Tip: During Think-Pair-Share: Accuracy Isn't Everything, assign one student in each pair to argue for accuracy and the other to critique it using the provided scenario cards.
Setup: Standard classroom seating; students turn to a neighbor
Materials: Discussion prompt (projected or printed), Optional: recording sheet for pairs
Overfitting Experiment
Students train a simple model (using a provided notebook) on progressively smaller subsets of training data while testing on the same fixed test set. They plot training vs. test accuracy as sample size decreases and observe overfitting emerge. Pairs write a paragraph describing what they observed and predicting what would happen with even less data.
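A minimal version of the experiment can be sketched as follows, assuming scikit-learn is available (the dataset and model here are illustrative stand-ins, not the actual provided notebook). An unconstrained decision tree is free to memorize its training set, so the train/test gap widens as the training subset shrinks.

```python
# Sketch of the experiment: shrink the training set, keep the test set fixed.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=500, random_state=0)

# Train on progressively smaller subsets; the test set never changes.
for n in (1500, 500, 100, 30):
    model = DecisionTreeClassifier(random_state=0)  # unconstrained: free to memorize
    model.fit(X_train_full[:n], y_train_full[:n])
    train_acc = model.score(X_train_full[:n], y_train_full[:n])
    test_acc = model.score(X_test, y_test)
    print(f"n={n:5d}  train={train_acc:.2f}  test={test_acc:.2f}")
```

Training accuracy stays at 1.00 throughout, while test accuracy degrades as n shrinks; students can plot these two columns to see the gap emerge.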
Prepare & details
Critique the consequences of overfitting and underfitting on a model's ability to generalize to new data.
Facilitation Tip: When running the Overfitting Experiment, give each group two different model sizes and let them present their validation curves side by side on the same chart.
Setup: Pairs at computers with the provided notebook open
Materials: Provided notebook, fixed test set, plotting template for train vs. test accuracy
Gallery Walk: Critique the AI Claim
Post five printed AI headlines or marketing claims ('Our model achieves 98% accuracy!', 'AI outperforms doctors in diagnosis'). Student groups annotate each with questions they'd need answered before accepting the claim: What is the test set? Is accuracy the right metric? What population was tested? How were edge cases handled? Class discusses which claims hold up to scrutiny.
Prepare & details
Critique the potential pitfalls of overfitting and underfitting in model training.
Facilitation Tip: During the Gallery Walk: Critique the AI Claim, post each claim at a station with a single evaluation metric; students must write why that metric alone is insufficient.
Setup: Wall space or tables arranged around room perimeter
Materials: Large paper/poster boards, Markers, Sticky notes for feedback
Feature Engineering Challenge
Give teams a raw dataset (e.g., raw text strings, timestamps) and ask them to engineer three new features they think would help a model predict a given outcome. Teams present their features and justify why they might be predictive. The class votes on which features they think would most improve the model, then tests the predictions using a provided script.
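A short example helps teams see what "engineering a feature" means in practice. The records and feature names below are hypothetical, not from the activity's dataset; the point is turning raw strings into numbers a model can use.

```python
from datetime import datetime

# Hypothetical raw records: a timestamp string and a free-text comment.
raw = [
    {"timestamp": "2024-03-15T22:41:00", "comment": "URGENT!!! please respond"},
    {"timestamp": "2024-03-16T09:05:00", "comment": "thanks, all good"},
]

def engineer_features(record):
    """Turn raw strings into numeric features a model can consume."""
    ts = datetime.fromisoformat(record["timestamp"])
    text = record["comment"]
    return {
        "hour_of_day": ts.hour,                # time-of-day patterns
        "is_weekend": ts.weekday() >= 5,       # Saturday/Sunday flag
        "comment_length": len(text),           # verbosity
        "exclamation_count": text.count("!"),  # possible urgency signal
    }

for record in raw:
    print(engineer_features(record))
```

Teams can defend each derived feature the same way: name the raw field it came from and the pattern it is meant to capture.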
Prepare & details
Explain the critical role of training data in machine learning model development.
Facilitation Tip: In the Feature Engineering Challenge, provide a dataset with 10 raw features and require teams to submit their code before they can add or drop any.
Setup: Teams at computers with the raw dataset loaded
Materials: Raw dataset files, provided prediction script, feature justification worksheet
Teaching This Topic
Start with concrete examples before theory. Use the same dataset across activities so students see how data choices ripple into evaluation and model behavior. Avoid jumping straight to code; focus first on the reasoning behind each step. Research shows students grasp overfitting better when they see a model’s validation curve move as they change complexity.
What to Expect
Students will explain why accuracy can mislead, recognize overfitting in models, critique claims about AI systems, and justify their own feature choices. They will use evaluation metrics to make decisions, not just report numbers.
Watch Out for These Misconceptions
Common Misconception: During Think-Pair-Share: Accuracy Isn't Everything, watch for students who insist accuracy is always the best metric. Redirect them to the imbalanced dataset card and ask them to calculate precision and recall from the confusion matrix provided.
What to Teach Instead
Use the confusion matrix on the card to recalculate precision and recall. Ask students to compare the two metrics to the reported accuracy and explain why a model with 95% accuracy might miss half the positive cases.
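A worked confusion matrix makes the comparison concrete. The counts below are chosen for illustration, not taken from the activity's card: out of 1,000 cases, 100 are truly positive and the model finds only half of them, yet its accuracy is still 95%.

```python
# Illustrative confusion matrix: 1,000 cases, 100 of them truly positive.
tp, fn = 50, 50    # the model finds only half the real positives
tn, fp = 900, 0    # ...but never raises a false alarm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy  = {accuracy:.0%}")   # 95%
print(f"precision = {precision:.0%}")  # 100%
print(f"recall    = {recall:.0%}")     # 50% -- half the positives slip through
```

Students can plug in the numbers from their own card and explain which of the three metrics exposes the missed cases.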
Common Misconception: During the Overfitting Experiment, watch for students who blame overfitting on the model being 'too smart'. Redirect them to the validation curve on their shared screen and ask what happens to training and validation error as model size increases.
What to Teach Instead
Ask students to point out where the validation error starts rising while the training error keeps falling. Then ask them to explain what the model is memorizing instead of learning.
Common Misconception: During the Feature Engineering Challenge, watch for students who add every feature hoping for better results. Redirect them to the performance plot on the whiteboard and ask which features actually improved the score.
What to Teach Instead
Have teams present their feature list and the code that generated it. Ask the class to vote on which features were truly informative and which introduced noise.
Assessment Ideas
After Think-Pair-Share: Accuracy Isn't Everything, collect each pair’s written justification for which evaluation metric matters most in their scenario and one concrete example of a model that would mislead if judged by accuracy alone.
During the Gallery Walk: Critique the AI Claim, ask students to write a short note at each station explaining why the single metric shown is insufficient and what additional metrics or data they would need.
After the Overfitting Experiment, facilitate a whole-class discussion using the prompt: 'If we only had 100 training examples, how would that limit the largest model size we could use without overfitting? Use the validation curves from your experiment to support your answer.'
Extensions & Scaffolding
- Challenge: Ask students to design a new evaluation metric for a scenario where false positives are 100 times more costly than false negatives.
- Scaffolding: Provide a partially completed confusion matrix template and ask students to fill in the missing values before calculating precision and recall.
- Deeper exploration: Have students research and present one regularization technique (L1, L2, dropout) and explain how it changes the training curve in the Overfitting Experiment.
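For the challenge extension, one possible shape for a cost-weighted metric is sketched below. The 100:1 weighting comes from the scenario; the function name and the two example confusion matrices are illustrative, and designing the actual metric is left to the students.

```python
# One way to encode "a false positive costs 100x a false negative":
# a total-cost metric, where lower is better.
def weighted_cost(tp, fp, tn, fn, fp_cost=100, fn_cost=1):
    """Total misclassification cost under asymmetric error weights."""
    return fp * fp_cost + fn * fn_cost

# Two models with identical accuracy (980/1000) but very different costs:
model_a = dict(tp=90, fp=10, tn=890, fn=10)   # aggressive: more false alarms
model_b = dict(tp=81, fp=1, tn=899, fn=19)    # cautious: more misses
print(weighted_cost(**model_a))  # 10*100 + 10 = 1010
print(weighted_cost(**model_b))  # 1*100 + 19 = 119
```

The pair of models shows why the challenge matters: accuracy cannot distinguish them, but the cost-weighted metric strongly prefers the cautious one.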
Key Vocabulary
| Term | Definition |
| --- | --- |
| Training Data | The dataset used to teach a machine learning model patterns and relationships. Its quality directly impacts the model's effectiveness. |
| Feature Engineering | The process of selecting, transforming, and creating features from raw data to improve model performance and accuracy. |
| Accuracy | A metric that measures the proportion of correct predictions made by a model out of the total number of predictions. |
| Precision | A metric that measures the proportion of true positive predictions among all positive predictions made by the model. It answers, 'Of all the times the model predicted X, how often was it correct?' |
| Recall | A metric that measures the proportion of true positive predictions among all actual positive instances. It answers, 'Of all the actual X cases, how many did the model correctly identify?' |
| Overfitting | A phenomenon where a machine learning model learns the training data too well, including its noise and outliers, leading to poor performance on unseen data. |