Computer Science · 12th Grade · Data Science and Intelligent Systems · Weeks 19-27

Fundamentals of Machine Learning: Unsupervised Learning

Students explore unsupervised learning techniques like clustering and dimensionality reduction to find hidden structures in unlabeled data.

Common Core State Standards · CSTA: 3B-AP-09 · CSTA: 3B-DA-06

About This Topic

Unsupervised learning addresses a fundamentally different challenge: finding structure in data when no labels exist. Rather than predicting a known output, the algorithm explores the data to discover patterns, groupings, or lower-dimensional representations on its own. This mirrors many real-world situations where labeling is expensive, impossible, or not yet defined: customer segmentation, anomaly detection, and exploratory data analysis all fall into this category.

Clustering algorithms, particularly k-means, group data points so that members of the same cluster are more similar to each other than to members of other clusters. Students learn how the algorithm iteratively assigns points to centroids and updates those centroids, and why the choice of k (number of clusters) and the definition of 'similarity' dramatically affect results. Dimensionality reduction techniques like PCA compress high-dimensional data into fewer dimensions while preserving as much variance as possible, useful both for visualization and for improving downstream algorithm performance.
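The assign-and-update loop described above can be sketched in a few lines of NumPy. This is an illustrative implementation to make the two steps visible, not the version students would use in practice (scikit-learn's `KMeans` handles initialization and convergence far more carefully):

```python
import numpy as np

def kmeans(points, k, n_iters=10, seed=0):
    """Minimal k-means: alternate an assignment step and an update step."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members
        centroids = np.array([points[labels == i].mean(axis=0)
                              for i in range(k)])
    return labels, centroids

# Two well-separated synthetic blobs around (0, 0) and (10, 10)
pts = np.vstack([np.random.default_rng(1).normal(0, 1, (20, 2)),
                 np.random.default_rng(2).normal(10, 1, (20, 2))])
labels, cents = kmeans(pts, k=2)
```

Changing the distance function inside the assignment step, or the value of k, changes the clustering, which is exactly the sensitivity the paragraph above describes.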

Active learning fits naturally because students can cluster physical objects or unlabeled data points by hand before seeing the algorithm, revealing both the human intuition behind the approach and the arbitrary choices that must be made explicit in formal algorithms.

Key Questions

  1. Explain how unsupervised learning can discover patterns without explicit labels.
  2. Compare the applications of clustering and dimensionality reduction in data analysis.
  3. Analyze the challenges of evaluating the performance of unsupervised learning models.

Learning Objectives

  • Classify data points into distinct groups based on inherent similarities using clustering algorithms.
  • Compare the effectiveness of k-means and hierarchical clustering for different dataset structures.
  • Analyze the trade-offs between information loss and dimensionality reduction using techniques like PCA.
  • Evaluate the suitability of unsupervised learning methods for anomaly detection in financial transaction data.
  • Design a process to visualize high-dimensional data by applying dimensionality reduction techniques.

Before You Start

Introduction to Data Science and Supervised Learning

Why: Students need a foundational understanding of data, data types, and the concept of learning from data, including supervised approaches, to grasp the distinctions of unsupervised learning.

Data Visualization

Why: The ability to interpret and create visualizations is crucial for understanding the output of dimensionality reduction techniques and for exploring potential clusters.

Basic Statistical Concepts (Mean, Variance)

Why: Understanding fundamental statistical measures is necessary for comprehending how algorithms like k-means calculate centroids and how PCA works with variance.

Key Vocabulary

Clustering: An unsupervised learning technique that groups data points into clusters based on their similarity, without prior knowledge of group labels.
Centroid: The center of a cluster, typically calculated as the mean of all data points within that cluster, used in algorithms like k-means.
Dimensionality Reduction: A process that reduces the number of random variables under consideration by obtaining a set of principal variables, simplifying data while retaining essential information.
Principal Component Analysis (PCA): A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
Unlabeled Data: Data that does not have predefined categories or tags, requiring algorithms to discover patterns or structures independently.

Watch Out for These Misconceptions

Common Misconception: Unsupervised learning discovers the 'true' structure in data objectively.

What to Teach Instead

Unsupervised algorithms impose structure based on mathematical assumptions: how similarity is defined, how many clusters are specified, which dimensions are retained. Different assumptions produce different results from the same data. Having students run the same clustering with different values of k or different distance metrics makes the subjectivity visible.

Common Misconception: Clustering is only useful when you have no idea what's in the data.

What to Teach Instead

Clustering is also used to validate hypotheses (do known customer types appear as distinct clusters?), to compress data for downstream processing, and to detect anomalies. Students who apply clustering to a dataset where they know the ground-truth labels, then compare clusters to labels, see both the strengths and limits of the approach.
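The compare-clusters-to-labels exercise can be scored with scikit-learn's adjusted Rand index, which measures agreement between a clustering and ground-truth labels (1.0 is perfect agreement, values near 0 are no better than chance). The iris dataset below is just a convenient stand-in for any labeled dataset students might use:

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)  # y: the ground-truth species labels
# Cluster without ever showing the algorithm the labels
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# Then compare the discovered clusters against the labels
ari = adjusted_rand_score(y, labels)
```

An ARI well above zero but below 1.0 is the usual outcome: the clusters track the real species imperfectly, which is precisely the strengths-and-limits discussion this exercise is meant to provoke.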

Common Misconception: Dimensionality reduction always loses important information.

What to Teach Instead

PCA retains the directions of maximum variance, so the strongest patterns of variation in the data are often preserved even after aggressive compression. Noise and redundant features are discarded, which can actually improve downstream model performance. Visualizing before-and-after representations helps students see what is kept, not just what is lost.
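The claim that aggressive compression can keep most of the variance is easy to demonstrate with scikit-learn's `explained_variance_ratio_`. The digits dataset here is one possible example, chosen because its 64 pixel features are highly redundant:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # 64 features per image
pca = PCA(n_components=10).fit(X)          # keep only 10 components
# Fraction of total variance retained by those 10 components
retained = pca.explained_variance_ratio_.sum()
```

Even after dropping more than 80% of the dimensions, well over half the variance survives, which students can confirm for their own datasets.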

Active Learning Ideas


Simulation Activity: Human K-Means Clustering

Tape a large coordinate grid on the floor. Give each student a card with (x, y) values and have them stand at their position. The teacher randomly assigns two students as initial centroids. Students assign themselves to the nearest centroid by walking toward it, then recompute centroids as a group average. Repeat for two more rounds. Students observe convergence and discuss whether the result is globally optimal.

25 min·Whole Class

Collaborative Problem-Solving: Clustering Unlabeled Data

Students run k-means on a dataset of their choice (customer purchase data, penguin measurements, or movie ratings) using Python and scikit-learn. They experiment with different values of k, visualize the results, and write a paragraph interpreting what each cluster might represent. The ambiguity of interpreting unlabeled clusters is a key learning moment.

40 min·Pairs
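A starter sketch for the experiment, using a synthetic dataset as a placeholder for whatever data the pair chooses. It sweeps several values of k and records the inertia (within-cluster sum of squares), which always shrinks as k grows, one reason picking k is a judgment call rather than a computation:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic stand-in for the students' chosen dataset
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit k-means for several k and record inertia for each
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in (2, 3, 4, 6, 8)}
```

Plotting `inertias` against k gives the familiar elbow curve; students can then argue about where the elbow is, and whether the elbow even matches the interpretation they wrote for each cluster.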

Think-Pair-Share: Is This Clustering Useful?

Present two clustering results for the same dataset, one with two clusters, one with eight. Pairs discuss which is more useful for a specific business decision (e.g., designing a marketing campaign). There is no single right answer; the discussion surfaces the fact that 'good' clustering depends on the question being asked, not just on a mathematical metric.

15 min·Pairs
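Pairs can also check their intuition against a quantitative metric such as the silhouette score, which rewards tight, well-separated clusters. The synthetic two-blob dataset below is an assumption for illustration; the point of the activity still stands, since a metric can favor one k without answering the business question:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two clearly separated groups, so k=2 should score well
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10]],
                  cluster_std=1.0, random_state=7)

scores = {}
for k in (2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher is tighter/better-separated
```

On this data the silhouette strongly prefers k=2, yet a marketing team might still want eight segments for eight campaign variants: the metric informs the choice but does not make it.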

Gallery Walk: Dimensionality Reduction Visualization

Post printouts showing the same dataset in 3D and as a 2D PCA projection, alongside visualizations of t-SNE and UMAP. Students annotate each with what information appears preserved and what appears lost. The walk helps students understand dimensionality reduction as a compression decision with trade-offs rather than as a magical reveal of hidden truth.

18 min·Small Groups

Real-World Connections

  • Marketing professionals use clustering algorithms to segment customer bases for targeted advertising campaigns, identifying distinct groups of consumers with similar purchasing behaviors for companies like Amazon.
  • Cybersecurity analysts employ anomaly detection, a form of unsupervised learning, to identify unusual network traffic patterns that may indicate a security breach for organizations such as Google or Microsoft.
  • Genomic researchers use dimensionality reduction to visualize and analyze complex gene expression data, helping to identify patterns related to diseases or biological processes in studies at institutions like the Broad Institute.

Assessment Ideas

Quick Check

Present students with a scatter plot of unlabeled data points. Ask them to visually identify 2-3 potential clusters and explain the criteria they used for grouping. Then, ask them to hypothesize what a centroid for one of their clusters might represent.

Discussion Prompt

Pose the question: 'Imagine you are given a dataset of customer reviews for a new product, but the reviews are not categorized by sentiment (positive, negative, neutral). How could you use unsupervised learning to gain insights into customer feedback, and what are the potential challenges in interpreting the results?'

Exit Ticket

Provide students with a brief description of a scenario (e.g., identifying fraudulent transactions, grouping similar news articles). Ask them to identify whether clustering or dimensionality reduction would be more appropriate and to explain why in one to two sentences.

Frequently Asked Questions

What is unsupervised learning and how is it different from supervised learning?
Unsupervised learning finds patterns in data without using labeled examples or predefined correct answers. The algorithm explores structure on its own, grouping similar items, reducing redundancy, or detecting outliers. Supervised learning, by contrast, trains on labeled input-output pairs to predict known outcomes. Unsupervised learning is used when labels are unavailable, expensive, or not yet defined.
How does the k-means clustering algorithm work?
K-means starts by randomly placing k centroids in the data space. Each data point is assigned to the nearest centroid, forming k clusters. The centroid of each cluster is then recalculated as the average of its members. This assignment-update cycle repeats until cluster assignments stop changing. The result depends on the initial centroid placement, so the algorithm is often run multiple times.
What is dimensionality reduction and why is it useful in machine learning?
Dimensionality reduction compresses a dataset with many features into fewer dimensions while preserving as much useful structure as possible. It reduces computational cost, can remove noise and redundancy, and enables visualization of high-dimensional data in 2D or 3D. Techniques like PCA find the directions of greatest variance and project data onto those axes.
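The visualization use case is a two-line operation in scikit-learn. The wine dataset here is one illustrative choice of high-dimensional data:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

X, _ = load_wine(return_X_y=True)          # 13 chemical measurements per wine
# Project onto the two directions of greatest variance
X2 = PCA(n_components=2).fit_transform(X)
# X2 now has one (x, y) pair per sample, ready for a 2D scatter plot
```

Passing the two columns of `X2` to any plotting library gives the 2D view of a 13-dimensional dataset discussed above.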
How does active learning help students understand unsupervised learning concepts?
Physical clustering simulations, where students group themselves based on shared characteristics, make the algorithm's iterative logic concrete and reveal how much human judgment goes into defining similarity. When students then run the algorithm on real data and struggle to interpret ambiguous clusters, they directly experience the epistemological challenge that makes unsupervised learning distinct from classification.