
Introduction to Data Science Workflow: Activities & Teaching Strategies

Active learning works for this topic because students need to experience firsthand how messy, human-centered decisions shape every stage of the data science workflow. When students clean data, debate categories, or interpret visualizations, they confront the real challenges of turning raw information into meaningful insight.

12th Grade · Computer Science · 3 activities · 20 to 50 min

Learning Objectives

  1. Describe the sequential stages of the data science workflow, including data acquisition, cleaning, analysis, and communication.
  2. Evaluate the impact of data quality issues, such as missing values and outliers, on the reliability of analytical results.
  3. Design a project plan for a data science initiative, identifying key steps, potential challenges, and necessary resources for a given scenario.
  4. Critique the ethical implications of data collection and usage in a specific real-world context.
  5. Synthesize findings from a data analysis into a clear and concise report or presentation suitable for a non-technical audience.
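
For teachers who want to demonstrate the second objective in code, the following is a minimal Python sketch of how missing values and a single outlier can distort a summary statistic. The commute times and the 120-minute cutoff are invented for illustration and are not part of the lesson materials.

```python
from statistics import mean, median

# Hypothetical commute times in minutes (invented data).
# None marks a missing response; 480 is a likely data-entry error.
raw_times = [22, 25, None, 19, 480, 24, 21, None, 23]

# Cleaning step 1: drop missing values so the arithmetic is defined.
observed = [t for t in raw_times if t is not None]

# Cleaning step 2: keep only values inside a plausible range.
# The 120-minute cutoff is a human judgment call, not a fixed rule.
plausible = [t for t in observed if t <= 120]

print("mean with outlier:   ", round(mean(observed), 1))
print("mean without outlier:", round(mean(plausible), 1))
print("median with outlier: ", median(observed))
```

Students can compare the two means with the median and discuss who decides what counts as "plausible", which connects the data quality objective to the ethics objective.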


50 min·Small Groups

Inquiry Circle: Bias in the Data

Provide groups with a dataset used for a fictional 'college admissions AI' that contains historical biases (e.g., favoring certain zip codes). Students must find the patterns that lead to unfair outcomes and propose a way to 'clean' or adjust the data to ensure equity.
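
If groups are comfortable with a little code, one way to surface the pattern described above is to compare admission rates across zip codes. This is a rough sketch only: the records, the zip codes, and the `admit_rates` helper are all hypothetical, and a real dataset would need far more care.

```python
from collections import defaultdict

# Hypothetical historical records: (zip_code, admitted). Invented data.
records = [
    ("10001", True), ("10001", True), ("10001", True), ("10001", False),
    ("20002", True), ("20002", False), ("20002", False), ("20002", False),
]

def admit_rates(rows):
    """Return the share of admitted applicants for each zip code."""
    totals = defaultdict(int)
    admits = defaultdict(int)
    for zip_code, admitted in rows:
        totals[zip_code] += 1
        admits[zip_code] += int(admitted)
    return {z: admits[z] / totals[z] for z in totals}

rates = admit_rates(records)
# The gap between the best- and worst-treated groups is one simple,
# imperfect signal that the historical data may encode unfair outcomes.
gap = max(rates.values()) - min(rates.values())

for zip_code, rate in sorted(rates.items()):
    print(f"{zip_code}: {rate:.0%} admitted")
print(f"gap between groups: {gap:.0%}")
```

A large gap does not prove bias on its own, so ask groups what other information they would need before proposing an adjustment.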


Explain the iterative nature of the data science workflow and its key stages.

Facilitation Tip: During Inquiry Circle: Bias in the Data, circulate and listen for groups that conflate 'common' with 'correct' when identifying bias in datasets, then ask them to justify their claims with examples from the data.

Setup: Groups at tables with access to source materials

Materials: Source material collection, Inquiry cycle worksheet, Question generation protocol, Findings presentation template

Analyze · Evaluate · Create · Self-Management · Self-Awareness

45 min·Individual

Gallery Walk: Data Visualizations

Students take a raw dataset and create a visualization (chart, map, or infographic) that tells a specific story. They display their work around the room, and peers use a 'See-Think-Wonder' protocol to evaluate what the data is saying and what might be missing.


Analyze the importance of data cleaning and preprocessing in ensuring reliable insights.

Facilitation Tip: For the Gallery Walk: Data Visualizations, post guiding questions at each station to push students beyond 'it looks pretty' to 'what pattern does this reveal and why'.

Setup: Wall space or tables arranged around room perimeter

Materials: Large paper/poster boards, Markers, Sticky notes for feedback

Understand · Apply · Analyze · Create · Relationship Skills · Social Awareness

20 min·Pairs

Think-Pair-Share: Correlation vs. Causation

Present students with 'spurious correlations' (e.g., ice cream sales and shark attacks). Students work in pairs to explain why the two variables are correlated even though neither causes the other, and then share their own examples of how Big Data might lead to false conclusions if not interpreted correctly.
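
To make the ice cream and shark attack example concrete, a teacher could project a small computation like the sketch below. All monthly figures are invented, and `pearson` is a helper written here for illustration; the point is only that two variables driven by a shared third factor (temperature, in this made-up data) can show a strong correlation with each other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented monthly figures, January through December.
temperature   = [5, 8, 12, 18, 24, 29, 31, 30, 25, 17, 10, 6]
ice_cream     = [20, 24, 35, 50, 70, 88, 95, 92, 74, 48, 30, 22]
shark_attacks = [0, 0, 1, 2, 4, 6, 7, 6, 4, 2, 1, 0]

# Both series track temperature, so they also track each other.
print("ice cream vs. shark attacks:", round(pearson(ice_cream, shark_attacks), 2))
print("temperature vs. ice cream:  ", round(pearson(temperature, ice_cream), 2))
print("temperature vs. sharks:     ", round(pearson(temperature, shark_attacks), 2))
```

Pairs can then discuss why a high coefficient in the first line says nothing about whether ice cream causes shark attacks.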


Design a basic data science project plan for a given real-world problem.

Facilitation Tip: In Think-Pair-Share: Correlation vs. Causation, deliberately pair students with opposing initial interpretations so they must reconcile differences using dataset evidence.

Setup: Standard classroom seating; students turn to a neighbor

Materials: Discussion prompt (projected or printed), Optional: recording sheet for pairs

Understand · Apply · Analyze · Self-Awareness · Relationship Skills

Teaching This Topic

Approach this topic by treating data science as a human practice, not just a technical skill. Teach students to question every step, from data collection to final claims, by modeling your own skepticism during demonstrations. Avoid rushing to tools before students understand what those tools are actually doing to the data. Research shows that students grasp the Four Vs of Big Data (volume, velocity, variety, and veracity) better when they grapple with concrete consequences of each V, like velocity overwhelming analysis or poor veracity making predictions unreliable.

What to Expect

Successful learning looks like students recognizing that data is not neutral, questioning the stories charts tell, and justifying their reasoning with evidence from datasets. By the end of these activities, students should articulate why workflow steps matter and how to avoid common pitfalls like confusing correlation with causation.


Watch Out for These Misconceptions

Common Misconception: During Inquiry Circle: Bias in the Data, watch for students who assume larger datasets automatically correct for bias because they include more examples.

What to Teach Instead

Use the dataset’s metadata and collection context to guide students into noticing how even large datasets can encode bias if the original sampling excluded certain groups or measured irrelevant variables.

Common Misconception: During Inquiry Circle: Bias in the Data, watch for students who believe data is neutral if it comes from 'official' sources like government records.

What to Teach Instead

Have students trace a single variable’s journey from collection to publication, highlighting the human choices in defining categories, setting thresholds, and omitting outliers.

Assessment Ideas

Quick Check

After Inquiry Circle: Bias in the Data, present a short, messy CSV and ask students to identify at least three cleaning steps and explain why each step matters for reducing bias.
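
As a reference for what students might come up with, here is one possible set of three cleaning steps, sketched with Python's standard `csv` module. The CSV contents, the column names, and the `clean` function are hypothetical; other answers are equally valid.

```python
import csv
import io

# Hypothetical messy CSV (invented rows): stray spaces, inconsistent
# case, a missing GPA, a duplicate applicant, and a malformed zip code.
MESSY_CSV = """name,zip_code,gpa
 Ana ,10001,3.8
Ben,20002,
ana,10001,3.8
Cy,20002,3.1
Dee,1001,2.9
"""

def clean(text):
    """Apply three simple cleaning steps and return the kept rows."""
    kept, seen = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        # Step 1: normalize whitespace and capitalization.
        name = row["name"].strip().title()
        zip_code = row["zip_code"].strip()
        gpa = row["gpa"].strip()
        # Step 2: drop rows with a missing GPA or a malformed zip code.
        if not gpa or len(zip_code) != 5:
            continue
        # Step 3: remove duplicate applicants.
        key = (name, zip_code)
        if key in seen:
            continue
        seen.add(key)
        kept.append({"name": name, "zip_code": zip_code, "gpa": float(gpa)})
    return kept

cleaned = clean(MESSY_CSV)
for row in cleaned:
    print(row)
```

Note that step 2 silently discards applicants. Asking who gets dropped, and whether they cluster in one group, is a useful follow-up on how cleaning itself can introduce bias.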

Discussion Prompt

During Gallery Walk: Data Visualizations, ask pairs to draft a short memo summarizing one visualization’s key insight and one limitation, then share with the class.

Exit Ticket

After Think-Pair-Share: Correlation vs. Causation, collect index cards where students list one real-world example where correlation does not imply causation and explain why.

Extensions & Scaffolding

  • Challenge early finishers to design a visualization that intentionally hides a key trend, then have peers detect the manipulation.
  • For students who struggle, provide pre-categorized datasets that still contain obvious errors so they can focus on cleaning steps without cognitive overload.
  • Deeper exploration: Invite students to find and analyze a real-world dataset misused in media, then present their findings in a mock policy brief.

Key Vocabulary

Data Acquisition: The process of gathering raw data from various sources, such as databases, APIs, or surveys, for analysis.
Data Cleaning: The process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in datasets to improve data quality.
Exploratory Data Analysis (EDA): An approach to analyzing datasets to summarize their main characteristics, often with visual methods, to uncover patterns and identify anomalies.
Feature Engineering: The process of using domain knowledge to create new input variables (features) from existing raw data to improve the performance of machine learning models.
Model Deployment: The process of making a trained machine learning model available for use in a production environment to make predictions on new data.
