Introduction to Data Science Workflow: Activities & Teaching Strategies
Active learning works for this topic because students need to experience firsthand how messy, human-centered decisions shape every stage of the data science workflow. When students clean data, debate categories, or interpret visualizations, they confront the real challenges of turning raw information into meaningful insight.
Learning Objectives
- Describe the sequential stages of the data science workflow, including data acquisition, cleaning, analysis, and communication.
- Evaluate the impact of data quality issues, such as missing values and outliers, on the reliability of analytical results.
- Design a project plan for a data science initiative, identifying key steps, potential challenges, and necessary resources for a given scenario.
- Critique the ethical implications of data collection and usage in a specific real-world context.
- Synthesize findings from a data analysis into a clear and concise report or presentation suitable for a non-technical audience.
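The objective on data quality can be made concrete with a tiny worked example. The sketch below uses made-up survey numbers (the values, the 168-hour plausibility rule, and all variable names are assumptions for illustration) to show how one outlier and a couple of missing responses change a summary statistic.

```python
from statistics import mean, median

# Hypothetical weekly study hours reported by ten students.
# None marks a missing response; 400 is a likely data-entry error.
reported_hours = [6, 8, 7, None, 9, 5, 400, 7, None, 8]

# Step 1: drop missing values so they cannot break the arithmetic.
present = [h for h in reported_hours if h is not None]

# Step 2: flag implausible values with a simple, assumed rule:
# a week has 168 hours, so anything above that cannot be real.
MAX_PLAUSIBLE_HOURS = 168
plausible = [h for h in present if h <= MAX_PLAUSIBLE_HOURS]

print("mean with outlier:   ", mean(present))               # 56.25
print("median with outlier: ", median(present))             # 7.5
print("mean after cleaning: ", round(mean(plausible), 2))   # 7.14
print("responses kept:      ", len(plausible), "of", len(reported_hours))
```

Students can discuss why the median barely moves while the mean jumps, and whether dropping three of ten responses is itself a decision that affects reliability.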
Collaborative Investigation: Bias in the Data
Provide groups with a dataset used for a fictional 'college admissions AI' that contains historical biases (e.g., favoring certain zip codes). Students must find the patterns that lead to unfair outcomes and propose a way to 'clean' or adjust the data to ensure equity.
Explain the iterative nature of the data science workflow and its key stages.
Facilitation Tip: During Collaborative Investigation: Bias in the Data, circulate and listen for groups that conflate 'common' with 'correct' when identifying bias in datasets, then ask them to justify their claims with data examples.
Setup: Groups at tables with access to source materials
Materials: Source material collection, Inquiry cycle worksheet, Question generation protocol, Findings presentation template
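For groups ready to check the bias investigation with code, one possible starting point is to compare admission rates across zip codes. This is a minimal sketch with invented records; the zip codes, the `admission_rates` helper, and the disparity ratio are illustrative assumptions, not a standard fairness test.

```python
from collections import defaultdict

# Hypothetical historical admissions records: (zip_code, admitted).
records = [
    ("10001", True), ("10001", True), ("10001", True), ("10001", False),
    ("20002", True), ("20002", False), ("20002", False), ("20002", False),
]

def admission_rates(rows):
    """Return the share of admitted applicants for each zip code."""
    totals = defaultdict(int)
    admits = defaultdict(int)
    for zip_code, admitted in rows:
        totals[zip_code] += 1
        if admitted:
            admits[zip_code] += 1
    return {z: admits[z] / totals[z] for z in totals}

rates = admission_rates(records)
for zip_code, rate in sorted(rates.items()):
    print(zip_code, f"{rate:.0%}")

# Ratio of the lowest to the highest group rate; values far below 1.0
# suggest the historical data treats groups very differently.
disparity = min(rates.values()) / max(rates.values())
print("disparity ratio:", round(disparity, 2))  # 0.33
```

A low ratio does not prove unfairness on its own. It gives groups a number to argue about when they propose how to adjust the data.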
Gallery Walk: Data Visualizations
Students take a raw dataset and create a visualization (chart, map, or infographic) that tells a specific story. They display their work around the room, and peers use a 'See-Think-Wonder' protocol to evaluate what the data is saying and what might be missing.
Analyze the importance of data cleaning and preprocessing in ensuring reliable insights.
Facilitation Tip: For the Gallery Walk: Data Visualizations, post guiding questions at each station to push students beyond 'it looks pretty' to 'what pattern does this reveal and why'.
Setup: Wall space or tables arranged around room perimeter
Materials: Large paper/poster boards, Markers, Sticky notes for feedback
Think-Pair-Share: Correlation vs. Causation
Present students with 'spurious correlations' (e.g., ice cream sales and shark attacks). In pairs, students explain why the two variables are correlated even though neither causes the other, then share their own examples of how big data can lead to false conclusions when it is misinterpreted.
Design a basic data science project plan for a given real-world problem.
Facilitation Tip: In Think-Pair-Share: Correlation vs. Causation, deliberately pair students with opposing initial interpretations so they must reconcile differences using dataset evidence.
Setup: Standard classroom seating; students turn to a neighbor
Materials: Discussion prompt (projected or printed), Optional: recording sheet for pairs
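The ice cream and shark attack example can be reproduced with a few lines of code. In this sketch all figures are made up, and both series are deliberately generated from temperature (the assumed hidden variable), so students can see a strong correlation appear between two things that do not cause each other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up monthly figures: temperature drives both of the other columns.
temperature_c = [5, 8, 12, 17, 22, 27, 30, 29, 24, 17, 10, 6]
ice_cream_sales = [20 + 3 * t for t in temperature_c]  # rises with heat
shark_attacks = [t // 5 for t in temperature_c]        # more swimmers when hot

r = pearson(ice_cream_sales, shark_attacks)
print("ice cream vs. shark attacks:", round(r, 2))
```

The coefficient comes out close to 1 even though the code never links ice cream to sharks. Asking pairs to find the line where the real cause enters is a quick check for understanding.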
Teaching This Topic
Approach this topic by treating data science as a human practice, not just a technical skill. Teach students to question every step, from data collection to final claims, by modeling your own skepticism during demonstrations. Avoid rushing to tools before students understand what those tools are actually doing to the data. Research shows that students grasp the Four Vs of big data (volume, velocity, variety, and veracity) better when they grapple with concrete consequences of each V, like velocity overwhelming analysis or veracity problems making predictions unreliable.
What to Expect
Successful learning looks like students recognizing that data is not neutral, questioning the stories charts tell, and justifying their reasoning with evidence from datasets. By the end of these activities, students should articulate why workflow steps matter and how to avoid common pitfalls like confusing correlation with causation.
Watch Out for These Misconceptions
Common Misconception: During Collaborative Investigation: Bias in the Data, watch for students who assume larger datasets automatically correct for bias because they include more examples.
What to Teach Instead
Use the dataset’s metadata and collection context to guide students into noticing how even large datasets can encode bias if the original sampling excluded certain groups or measured irrelevant variables.
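A short simulation can back up this point. The sketch below assumes an invented population with two groups and compares a large sample that never reached one group against a small random sample. Group sizes, score distributions, and sample sizes are all illustrative assumptions.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 70% from group A (average score 80),
# 30% from group B (average score 60).
population = (
    [("A", random.gauss(80, 5)) for _ in range(70_000)]
    + [("B", random.gauss(60, 5)) for _ in range(30_000)]
)
true_mean = sum(score for _, score in population) / len(population)

# Large but biased sample: the collection method never reached group B.
biased_sample = [score for group, score in population if group == "A"][:50_000]

# Small but random sample drawn from the whole population.
random_sample = [score for _, score in random.sample(population, 500)]

biased_mean = sum(biased_sample) / len(biased_sample)
random_mean = sum(random_sample) / len(random_sample)

print("true mean:         ", round(true_mean, 1))
print("biased (n=50,000): ", round(biased_mean, 1))
print("random (n=500):    ", round(random_mean, 1))
```

Under these assumptions the 500-row random sample lands much closer to the true mean than the 50,000-row biased one, which gives students evidence that size does not repair a flawed sampling method.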
Common Misconception: During Collaborative Investigation: Bias in the Data, watch for students who believe data is neutral if it comes from 'official' sources like government records.
What to Teach Instead
Have students trace a single variable’s journey from collection to publication, highlighting the human choices in defining categories, setting thresholds, and omitting outliers.
Assessment Ideas
After Collaborative Investigation: Bias in the Data, present a short, messy CSV and ask students to identify at least three cleaning steps and explain why each step matters for reducing bias.
During Gallery Walk: Data Visualizations, ask pairs to draft a short memo summarizing one visualization’s key insight and one limitation, then share with the class.
After Think-Pair-Share: Correlation vs. Causation, collect index cards where students list one real-world example where correlation does not imply causation and explain why.
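For the messy CSV assessment, it may help to have a reference of what cleaning steps look like in code. The following is a sketch only: the CSV contents, the column names, the 0.0 to 4.0 GPA scale, and the `clean_rows` helper are all assumptions made for illustration.

```python
import csv
import io

# A short, deliberately messy CSV (made-up values).
messy_csv = """name,zip_code,gpa
 Ana ,10001,3.6
ben,10001,
Ana,10001,3.6
Chen,20002,36
"""

def clean_rows(text):
    """Apply four illustrative cleaning steps and return the kept rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Step 1: trim whitespace and standardize capitalization.
        name = row["name"].strip().title()
        zip_code = row["zip_code"].strip()
        # Step 2: skip rows with a missing GPA rather than guessing a value.
        if not row["gpa"].strip():
            continue
        gpa = float(row["gpa"])
        # Step 3: skip values outside the assumed 0.0-4.0 scale.
        if not 0.0 <= gpa <= 4.0:
            continue
        # Step 4: drop exact repeats of a person already seen.
        key = (name, zip_code)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "zip_code": zip_code, "gpa": gpa})
    return cleaned

rows = clean_rows(messy_csv)
print(rows)  # [{'name': 'Ana', 'zip_code': '10001', 'gpa': 3.6}]
```

Only one of four rows survives, and the only applicant from zip code 20002 is dropped because of a probable typo. That outcome is a useful prompt for asking students whether removing a row or repairing it does more to reduce bias.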
Extensions & Scaffolding
- Challenge early finishers to design a visualization that intentionally hides a key trend, then have peers detect the manipulation.
- For students who struggle, provide pre-categorized datasets that still contain obvious errors so they can focus on cleaning steps without cognitive overload.
- Deeper exploration: Invite students to find and analyze a real-world dataset misused in media, then present their findings in a mock policy brief.
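For the early-finisher challenge above, axis range is one of the simplest ways to hide or exaggerate a trend. This sketch uses made-up test averages and a hypothetical `visible_change` helper to estimate how much of a chart's height a change occupies under different axis choices.

```python
def visible_change(start_value, end_value, axis_min, axis_max):
    """Fraction of the chart's height that the change occupies."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return abs(end_value - start_value) / (axis_max - axis_min)

# Made-up test averages for two school years.
last_year, this_year = 70.0, 73.0

tight = visible_change(last_year, this_year, 68, 74)   # exaggerates the change
wide = visible_change(last_year, this_year, 0, 1000)   # hides the change

print(f"axis 68-74:  change fills {tight:.0%} of the chart")   # 50%
print(f"axis 0-1000: change fills {wide:.1%} of the chart")    # 0.3%
```

The data is identical in both cases. Peers looking for the manipulation can be directed to read the axis labels before the shape of the chart.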
Key Vocabulary
| Term | Definition |
| --- | --- |
| Data Acquisition | The process of gathering raw data from various sources, such as databases, APIs, or surveys, for analysis. |
| Data Cleaning | The process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in datasets to improve data quality. |
| Exploratory Data Analysis (EDA) | An approach to analyzing datasets to summarize their main characteristics, often with visual methods, to uncover patterns and identify anomalies. |
| Feature Engineering | The process of using domain knowledge to create new input variables (features) from existing raw data to improve the performance of machine learning models. |
| Model Deployment | The process of making a trained machine learning model available for use in a production environment to make predictions on new data. |