Computer Science · 12th Grade · Data Science and Intelligent Systems · Weeks 19-27

Introduction to Data Science Workflow

Students learn the end-to-end process of data science, from data acquisition and cleaning to analysis and communication of results.

TL;DR: Active learning works for this topic because students need to experience firsthand how messy, human-centered decisions shape every stage of the data science workflow. When students clean data, debate categories, or interpret visualizations, they confront the real challenges of turning raw information into meaningful insight.

Standards: CSTA 3B-DA-05; CCSS.ELA-LITERACY.RST.11-12.7

About This Topic

Big Data and pattern recognition involve the processing of datasets so massive that traditional methods fail. In 12th grade, students learn to use statistical tools and computational models to find hidden trends in fields like medicine, economics, and social media. This topic covers the 'Four Vs' of Big Data: Volume, Velocity, Variety, and Veracity. Students move from simple data entry to analyzing real-world datasets, such as climate records or public health statistics, to make evidence-based predictions.

A major focus of this unit is identifying bias. Students examine how historical data can reinforce existing inequalities if not handled carefully. This aligns with CSTA standards for using data to highlight relationships and for evaluating how bias in data collection affects the results of a model. This topic particularly benefits from hands-on, student-centered approaches where students can debate the ethics of data usage and collaboratively visualize complex information.

Key Questions

Explain the iterative nature of the data science workflow and its key stages.
Analyze the importance of data cleaning and preprocessing in ensuring reliable insights.
Design a basic data science project plan for a given real-world problem.

Learning Objectives

Describe the sequential stages of the data science workflow, including data acquisition, cleaning, analysis, and communication.
Evaluate the impact of data quality issues, such as missing values and outliers, on the reliability of analytical results.
Design a project plan for a data science initiative, identifying key steps, potential challenges, and necessary resources for a given scenario.
Critique the ethical implications of data collection and usage in a specific real-world context.
Synthesize findings from a data analysis into a clear and concise report or presentation suitable for a non-technical audience.
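
To make the second objective concrete, the following is a minimal sketch, using only Python's standard library, of how one missing value and one outlier change a summary statistic. The step counts and the 100,000 cutoff are invented for illustration and are not drawn from any real dataset.

```python
from statistics import mean, median

# Hypothetical daily step counts; None marks a missing reading,
# and 250000 is a likely sensor or typing error.
raw_steps = [8200, 7900, None, 8100, 250000, 8000, 7800]

# Step 1: drop missing values so the statistics functions can run.
present = [s for s in raw_steps if s is not None]

# Step 2: compare summaries with and without the suspicious value.
with_outlier = {"mean": mean(present), "median": median(present)}
plausible = [s for s in present if s < 100_000]  # cutoff is an assumption
without_outlier = {"mean": mean(plausible), "median": median(plausible)}

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

Students can compare how far the mean moves against how little the median moves, then discuss whether removing the value is justified or whether it might be a real measurement.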

Before You Start

Introduction to Programming Concepts

Why: Students need foundational programming skills to manipulate data and implement algorithms used in data science.

Basic Statistical Concepts

Why: Understanding concepts like mean, median, mode, and basic probability is essential for data analysis and interpretation.

Data Representation and Visualization

Why: Students should be familiar with different ways to represent data (tables, charts) to effectively explore and communicate findings.

Key Vocabulary

Data Acquisition	The process of gathering raw data from various sources, such as databases, APIs, or surveys, for analysis.
Data Cleaning	The process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in datasets to improve data quality.
Exploratory Data Analysis (EDA)	An approach to analyzing datasets to summarize their main characteristics, often with visual methods, to uncover patterns and identify anomalies.
Feature Engineering	The process of using domain knowledge to create new input variables (features) from existing raw data to improve the performance of machine learning models.
Model Deployment	The process of making a trained machine learning model available for use in a production environment to make predictions on new data.

Watch Out for These Misconceptions

Common Misconception: More data always leads to more accurate results.

What to Teach Instead

Explain that if the data is low quality or biased, collecting more of it only makes a wrong answer look more certain. Use a peer discussion about 'garbage in, garbage out' to show that the quality of data collection matters more than sheer volume.
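
One way to demonstrate this point is a small simulation. The sketch below assumes a made-up population of students whose sleep hours differ by group, and a hypothetical survey that only reaches one group. All numbers are invented; the point is that the larger biased sample settles more firmly on the wrong answer.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: hours of sleep for 10,000 students.
# Half are athletes with early practice (about 6 h), half are not (about 8 h).
athletes = [random.gauss(6.0, 1.0) for _ in range(5000)]
others = [random.gauss(8.0, 1.0) for _ in range(5000)]
population = athletes + others
true_mean = sum(population) / len(population)

def biased_survey(n):
    """Survey n students, but only at morning practice (athletes only)."""
    sample = random.sample(athletes, n)
    return sum(sample) / n

small = biased_survey(20)
large = biased_survey(2000)

print(f"true mean:             {true_mean:.2f}")
print(f"biased survey, n=20:   {small:.2f}")
print(f"biased survey, n=2000: {large:.2f}")
```

Students can rerun the survey with different sample sizes and observe that the estimate stabilizes near the athletes' average rather than the population's.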

Common Misconception: Data is neutral and objective.

What to Teach Instead

Clarify that data is collected by humans who make choices about what to measure and how to categorize it. A hands-on activity where students try to categorize 'ambiguous' items will show how human judgment is baked into every dataset.

Active Learning Ideas


Inquiry Circle

Bias in the Data

Provide groups with a dataset used for a fictional 'college admissions AI' that contains historical biases (e.g., favoring certain zip codes). Students must find the patterns that lead to unfair outcomes and propose a way to 'clean' or adjust the data to ensure equity.
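
As a starting point for the pattern hunt, groups could compute outcome rates per group. The sketch below is one possible first pass, not a complete fairness audit; the records, zip codes, and the function name `admit_rate_by_group` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical historical admissions records: (zip_code, admitted).
records = [
    ("10001", True), ("10001", True), ("10001", True), ("10001", False),
    ("20002", True), ("20002", False), ("20002", False), ("20002", False),
]

def admit_rate_by_group(rows):
    """Return the fraction of admitted applicants for each zip code."""
    totals = defaultdict(int)
    admits = defaultdict(int)
    for zip_code, admitted in rows:
        totals[zip_code] += 1
        if admitted:
            admits[zip_code] += 1
    return {z: admits[z] / totals[z] for z in totals}

rates = admit_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())

print(rates)
print(f"gap between groups: {gap:.2f}")
```

A gap on its own does not prove the process was unfair, but it tells students where to look and gives them a number to track when they test a proposed adjustment.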

50 min · Small Groups

Gallery Walk

Data Visualizations

Students take a raw dataset and create a visualization (chart, map, or infographic) that tells a specific story. They display their work around the room, and peers use a 'See-Think-Wonder' protocol to evaluate what the data is saying and what might be missing.

45 min · Individual

Think-Pair-Share

Correlation vs. Causation

Present students with 'spurious correlations' (e.g., ice cream sales and shark attacks). Students work in pairs to explain why the two are correlated even though neither causes the other, and then share their own examples of how Big Data might lead to false conclusions if not interpreted correctly.
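
Pairs who want to see the effect in numbers could try a sketch like the one below. It assumes invented monthly values in which both series are generated from temperature, a hidden third variable, and computes the Pearson correlation by hand. The formulas used to generate the data are illustrative assumptions, not real measurements.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up monthly values. Both series are built from temperature,
# the hidden third variable; neither is computed from the other.
temperature_c = [5, 8, 12, 17, 22, 27, 30, 29, 24, 17, 10, 6]
ice_cream_sales = [20 * t + 100 for t in temperature_c]
shark_attacks = [t // 5 for t in temperature_c]

r = pearson(ice_cream_sales, shark_attacks)
print(f"correlation between sales and attacks: {r:.2f}")
```

The correlation comes out strongly positive even though the code never links sales to attacks directly, which gives pairs a concrete case for naming the confounding variable.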

20 min · Pairs

Real-World Connections

Data scientists at Netflix analyze viewing habits and user ratings to recommend movies and shows, influencing content creation and platform development.
Epidemiologists use data science workflows to track disease outbreaks, clean public health records, and analyze transmission patterns to inform public health interventions, as seen during global health crises.
Financial analysts at investment firms employ data science techniques to clean historical market data, identify trading patterns, and build predictive models for stock market forecasting.

Assessment Ideas

Quick Check

Present students with a short, messy dataset (e.g., a CSV file with inconsistent formatting and missing entries). Ask them to identify at least three specific cleaning steps needed and explain why each step is important for accurate analysis.
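
For teachers who want a reference answer, here is a minimal sketch of three such cleaning steps using Python's built-in `csv` module. The sample data and the choice to record missing ages as `None` are assumptions made for illustration; other choices, such as dropping the row, are equally defensible and worth discussing.

```python
import csv
import io

# A hypothetical messy CSV: stray spaces, inconsistent capitalization,
# a missing value, and a duplicate row.
messy_csv = """name,city,age
 Ana ,boston,17
Ben,BOSTON,
Cara,Chicago,18
 Ana ,boston,17
"""

def clean(text):
    """Apply three cleaning steps and return a list of row dicts."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Step 1: trim whitespace and standardize capitalization.
        name = row["name"].strip()
        city = row["city"].strip().title()
        # Step 2: mark missing ages as None instead of guessing a value.
        age = int(row["age"]) if row["age"].strip() else None
        # Step 3: skip exact duplicates of a row already kept.
        key = (name, city, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned

rows = clean(messy_csv)
for row in rows:
    print(row)
```

Each step maps to a reason students should be able to state: inconsistent spellings split one category into several, missing values break or distort calculations, and duplicates count the same person twice.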

Discussion Prompt

Pose the scenario: 'A city wants to use data from traffic cameras to optimize traffic light timing.' Ask students to discuss: What types of data would be acquired? What are potential ethical concerns regarding privacy? How would they communicate their findings to city officials?

Exit Ticket

On an index card, have students list the four main stages of the data science workflow in order. For each stage, ask them to write one sentence describing a key activity or challenge associated with it.

Frequently Asked Questions

How can active learning help students understand Big Data?

Big Data can feel overwhelmingly abstract. Active learning strategies, like 'data physicalization' (using physical objects to represent data points) or collaborative 'bias hunting' in real datasets, make the concepts tangible. These activities allow students to move from being passive consumers of information to critical analysts who understand how data is shaped and manipulated.

What are the 'Four Vs' of Big Data?

They are Volume (the amount of data), Velocity (the speed at which it's generated), Variety (the different types of data), and Veracity (the accuracy and trustworthiness of the data).

Why is bias such a big concern in data science?

Because computers learn from the data we give them. If that data reflects past human prejudices, the computer will learn to repeat those prejudices, often making them look like 'objective' facts.

What skills do students need for data science?

Students need a mix of math (statistics), programming (to process the data), and critical thinking (to interpret what the results actually mean in a real-world context).

Edited by Adriana Perusin, Editor-in-Chief, Flip Education