Computer Science · 12th Grade · Data Science and Intelligent Systems · Weeks 19-27

Fundamentals of Machine Learning: Supervised Learning

Students are introduced to supervised learning, exploring concepts like regression and classification and how models learn from labeled data.

Common Core State Standards · CSTA: 3B-AP-09 · CSTA: 3B-DA-06

About This Topic

Supervised learning underlies many of the machine learning systems deployed today. Students in US 12th-grade CS learn that in supervised learning, a model is trained on a labeled dataset (pairs of input features and correct outputs) to learn a mapping function that generalizes to new, unseen inputs. The term 'supervised' reflects that the training process is guided by known correct answers.

Two major task types fall under supervised learning: classification, where the output is a category (spam or not spam, tumor type, digit label), and regression, where the output is a continuous value (house price, temperature forecast, credit score). Both tasks follow the same pipeline: collect labeled data, choose a model architecture, train by minimizing a loss function, evaluate on held-out test data, and iterate. Students also learn why splitting data into training and test sets is essential: using the same data for both produces inflated performance estimates that do not predict real-world behavior.
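
For classes working in Python, the pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed lab: the data is synthetic (y is roughly 3x + 2 plus bounded noise), and the helpers make_data, train_test_split, fit_line, and mean_squared_error are hypothetical names written in plain Python rather than scikit-learn calls, so that every step stays visible.

```python
import random

def make_data(n=40, seed=0):
    """Assumption: synthetic labeled data where y is roughly 3*x + 2 plus noise."""
    rng = random.Random(seed)
    return [(float(x), 3.0 * x + 2.0 + rng.uniform(-1.0, 1.0)) for x in range(n)]

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(examples):
    """Training: the slope and intercept that minimize mean squared error."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, examples):
    """Loss and evaluation metric: average squared gap between prediction and label."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in examples) / len(examples)

data = make_data()                      # 1. collect labeled data
train, test = train_test_split(data)    # 2. hold out a test set
model = fit_line(train)                 # 3. train on the training set only
print("slope, intercept:", model)
print("train MSE:", mean_squared_error(model, train))
print("test MSE:", mean_squared_error(model, test))   # 4. evaluate on unseen data
```

The fitted slope should land close to 3 because that is how the synthetic data was generated. The number to report is the test MSE, since the model never saw those examples during training.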

Active learning approaches are productive here because students can experience the training feedback loop directly using tools like Teachable Machine or scikit-learn, building intuition for what 'learning from data' actually means rather than treating it as a black box.

Key Questions

How does a machine learning model differ from a traditional rule-based program?
Differentiate between classification and regression tasks in supervised learning.
Explain the process of training and evaluating a supervised learning model.

Learning Objectives

Compare and contrast classification and regression tasks within supervised machine learning.
Explain the fundamental process of training a supervised learning model using labeled data.
Evaluate the performance of a trained supervised learning model using appropriate metrics.
Design a simple supervised learning experiment to predict a categorical or numerical outcome.

Before You Start

Introduction to Programming Concepts

Why: Students need basic programming skills to implement and experiment with machine learning models.

Data Representation and Manipulation

Why: Understanding how to structure and process data is essential before applying machine learning algorithms.

Key Vocabulary

Labeled Data	A dataset where each data point is paired with a correct output or 'label', used to train supervised learning models.
Classification	A supervised learning task that predicts a discrete category or class label, such as 'spam' or 'not spam'.
Regression	A supervised learning task that predicts a continuous numerical value, such as a house price or temperature.
Training Set	The portion of labeled data used to teach the machine learning model by adjusting its parameters.
Test Set	A separate portion of labeled data, unseen during training, used to evaluate the model's generalization ability.

Watch Out for These Misconceptions

Common Misconception: More training data always produces a better model.

What to Teach Instead

More data usually helps, but only when it is accurate and relevant to the task. A large dataset with systematic labeling errors or missing important features can train a confident but wrong model. Having students deliberately corrupt a portion of their training labels and observe the effect makes this concrete.
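
One way to run the label-corruption experiment in plain Python is sketched below. Everything here is an assumption made for illustration: the one-feature "low"/"high" dataset is made up, corrupt_labels and predict_1nn are hypothetical helpers, and a 1-nearest-neighbor classifier is used only because it makes the effect of bad labels easy to see.

```python
import random

def true_label(x):
    return "high" if x >= 10 else "low"

def flip(label):
    return "low" if label == "high" else "high"

def corrupt_labels(examples, fraction, seed=0):
    """Return a copy of the examples with a fraction of the labels flipped."""
    rng = random.Random(seed)
    n_flip = int(len(examples) * fraction)
    flip_at = set(rng.sample(range(len(examples)), n_flip))
    return [(x, flip(y) if i in flip_at else y) for i, (x, y) in enumerate(examples)]

def predict_1nn(train, x):
    """Predict the label of the closest training example."""
    nearest = min(train, key=lambda example: abs(example[0] - x))
    return nearest[1]

def accuracy(train, test):
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

# Made-up data: training points at 0..19, test points just beside each of them.
train = [(float(x), true_label(x)) for x in range(20)]
test = [(x + 0.25, true_label(x + 0.25)) for x in range(20)]

for fraction in (0.0, 0.25, 0.5):
    noisy = corrupt_labels(train, fraction)
    print(f"{fraction:.0%} labels flipped -> test accuracy {accuracy(noisy, test):.2f}")
# 0% -> 1.00, 25% -> 0.75, 50% -> 0.50
```

In this toy setup each test point sits next to exactly one training point, so accuracy falls in step with the share of flipped labels. Real classifiers are usually less brittle than 1-nearest-neighbor, but the direction of the effect is the same.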

Common Misconception: High accuracy on the training set means the model is good.

What to Teach Instead

A model can memorize training examples without learning generalizable patterns, a problem called overfitting. Evaluating on a separate test set is essential. Students who train on the full dataset and then 'test' on the same data regularly see near-100% accuracy, and experiencing the drop when they apply their model to new examples is a lesson that sticks.
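
Memorization can be demonstrated with a deliberately bad "model" that is nothing more than a lookup table. The sketch below is illustrative only: the even/odd task and the Memorizer class are hypothetical, chosen so that the gap between training and test accuracy is easy to predict.

```python
from collections import Counter

def label(n):
    return "even" if n % 2 == 0 else "odd"

class Memorizer:
    """A lookup table of training examples; it learns no pattern at all."""

    def fit(self, examples):
        self.table = dict(examples)
        # Fallback for inputs never seen in training: the most common training label.
        self.default = Counter(y for _, y in examples).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

def accuracy(model, examples):
    return sum(model.predict(x) == y for x, y in examples) / len(examples)

train = [(n, label(n)) for n in range(30)]       # numbers 0..29
test = [(n, label(n)) for n in range(30, 40)]    # numbers 30..39, never seen in training

model = Memorizer().fit(train)
print("accuracy on training data:", accuracy(model, train))  # 1.0
print("accuracy on unseen data:  ", accuracy(model, test))   # 0.5
```

The table answers every training example perfectly and then falls back to a single guess on new numbers, which is no better than a coin flip here. A model that had picked up the actual pattern (the remainder after dividing by 2) would score the same on both sets.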

Common Misconception: Supervised learning models understand the meaning of the data they process.

What to Teach Instead

Models learn statistical associations between input features and outputs. They do not understand context, causation, or meaning. A spam classifier that achieves 98% accuracy has no idea what spam is; it has only found patterns that correlate with the label. This distinction matters enormously when discussing model failures and AI ethics.

Active Learning Ideas


Hands-On Lab: Train Your First Classifier

Students use Google's Teachable Machine or a simple scikit-learn notebook to train an image or text classifier on a dataset they collect themselves. They deliberately include mislabeled examples and observe how this degrades accuracy. The lab closes with each pair reporting their accuracy and one insight about what made their training data better or worse.

45 min · Pairs

Think-Pair-Share: Classification or Regression?

Present eight real-world prediction problems and ask pairs to categorize each as classification or regression and justify the choice. Include ambiguous cases like predicting customer satisfaction (score 1-10 versus positive/negative). Whole-class discussion reveals that the distinction sometimes depends on how you frame the business problem, not just the data.

15 min · Pairs

Socratic Seminar: What Does 'Learning' Mean?

Open with the question: 'Is a model that scores 99% accuracy on training data but 60% on new data actually learning?' Students draw on their lab experience to discuss generalization, memorization, and the purpose of the train/test split. The teacher facilitates without providing answers, letting student reasoning drive the conversation toward overfitting.

25 min · Whole Class

Gallery Walk: Algorithm Comparison

Post four posters around the room (linear regression, decision trees, k-nearest neighbors, and naive Bayes), each with a brief description, a sample use case, and a blank section labeled 'when this would struggle.' Groups rotate, add sticky notes to the struggle section, then rotate again to critique and extend each other's entries.

20 min · Small Groups

Real-World Connections

Financial analysts use classification models to predict loan default risk, helping banks decide whether to approve applications for individuals in cities like New York or Chicago.
Medical researchers employ regression models to forecast patient recovery times based on various health indicators, aiding treatment planning in hospitals worldwide.
E-commerce platforms like Amazon utilize classification algorithms to categorize products and recommend items to customers based on their past purchases and browsing history.

Assessment Ideas

Exit Ticket

Provide students with two scenarios: one describing predicting house prices and another describing identifying images of cats or dogs. Ask them to identify which scenario is a classification task and which is a regression task, and to briefly explain why.

Quick Check

Present students with a small, pre-labeled dataset (e.g., fruit type and color). Ask them to verbally explain how they would use this data to train a model to identify new fruits, focusing on the role of the labels.

Discussion Prompt

Pose the question: 'Why is it crucial to evaluate a supervised learning model on data it has not seen during training?' Facilitate a discussion where students explain the concept of overfitting and the importance of the test set for assessing real-world performance.

Frequently Asked Questions

What is supervised learning and how is it different from regular programming?

In traditional programming, a developer writes explicit rules: if these conditions are met, produce this output. In supervised learning, a model infers rules from examples: the developer provides many input-output pairs and the algorithm finds patterns that generalize. No one writes the rules explicitly; the model discovers them from data.
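
The contrast can be made concrete with a toy spam filter. In the sketch below, which is an illustration rather than a realistic filter, the single feature (number of exclamation marks), the ten labeled examples, and the functions rule_based, learn_threshold, and learned are all made up for the example.

```python
def rule_based(exclamation_marks):
    """Traditional program: a developer picks the rule and the cutoff by hand."""
    return "spam" if exclamation_marks >= 6 else "ham"

def learn_threshold(examples):
    """Supervised learning: choose the cutoff that makes the fewest training errors."""
    def errors(cutoff):
        return sum(("spam" if x >= cutoff else "ham") != y for x, y in examples)
    candidates = sorted({x for x, _ in examples})
    return min(candidates, key=errors)

# Made-up labeled examples: (number of exclamation marks, label).
examples = [(0, "ham"), (1, "ham"), (2, "ham"), (1, "ham"), (3, "ham"),
            (5, "spam"), (6, "spam"), (4, "spam"), (8, "spam"), (7, "spam")]

cutoff = learn_threshold(examples)

def learned(exclamation_marks):
    """Same form of rule as rule_based, but the cutoff came from the data."""
    return "spam" if exclamation_marks >= cutoff else "ham"

print("learned cutoff:", cutoff)         # 4
print(rule_based(5), "vs", learned(5))   # ham vs spam
```

Both functions apply the same kind of rule. The difference is where the number comes from: the developer guessed 6, while the learning step chose 4 because that cutoff fits the labeled examples.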

What is the difference between classification and regression in machine learning?

Classification predicts which category an input belongs to, for example, determining whether an email is spam or not spam. Regression predicts a continuous numerical value, for example, estimating a home's sale price. Both are supervised tasks, but they use different loss functions and evaluation metrics to measure how well the model performs.
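
As a small illustration of why the two tasks are scored differently, the sketch below computes accuracy for a classifier's predictions and mean absolute error for a regressor's. The prediction lists are invented for the example, and these two metrics are only one common choice for each task.

```python
def accuracy(true_labels, predicted_labels):
    """Classification metric: fraction of predictions that match the label exactly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

def mean_absolute_error(true_values, predicted_values):
    """Regression metric: average size of the miss, in the target's own units."""
    misses = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return misses / len(true_values)

# Classification: each prediction is either right or wrong.
print(accuracy(["spam", "ham", "spam", "ham"],
               ["spam", "ham", "ham", "ham"]))            # 0.75

# Regression: a prediction can be close or far off (made-up prices, in $1000s).
print(mean_absolute_error([300, 250, 400, 200],
                          [310, 240, 380, 200]))          # 10.0
```

Accuracy would be a poor fit for the price predictions, since almost no estimate matches the true price exactly. Mean absolute error still distinguishes a near miss from a large one.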

Why do you need separate training and test datasets in supervised learning?

A model evaluated on its own training data reports inflated performance because it has already seen those examples. The test set contains examples the model has never encountered, giving an honest estimate of how it will perform on real-world inputs. Without this separation, you cannot tell whether the model has learned generalizable patterns or simply memorized the training data.

How does active learning help students understand supervised machine learning?

Hands-on training labs, especially when students collect and label their own data, make the supervised learning pipeline tangible. When students deliberately corrupt labels or test their model on new examples, they experience overfitting and generalization failure firsthand. These direct experiences build intuitions that are much harder to develop through lecture or reading about the concepts abstractly.
