Week 4: ETL Pipeline
In the final week, we bring together many different skills: cloud compute, storage, networking, and databases. Students will build a complete ETL (Extract, Transform, Load) pipeline using two different approaches: low-code and code-first.
Focus
- Understand how ETL pipelines work in cloud-based production systems
- Build and run pipelines using both graphical and code-based tools
- Compare the flexibility, transparency, and scalability of each approach
Hands-On Activities
- Extract data from a public REST API using the `requests` library
- Transform and clean data
- Load the processed data into a database
- Compare two approaches to orchestrating the workflow:
  - Low-code: Azure Data Factory
  - Code-first: Python with Prefect
- Pull and analyze the stored data using Power BI
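The extract, transform, and load steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference solution: the endpoint `API_URL`, the record fields (`id`, `name`, `price`), and the `items` table are all hypothetical, and an in-memory SQLite database stands in for the cloud database used in class.

```python
import sqlite3
from typing import Any

# Hypothetical endpoint, used only for illustration.
API_URL = "https://api.example.com/items"


def extract(url: str = API_URL) -> list[dict[str, Any]]:
    """Fetch a JSON array of records from a REST endpoint."""
    # Imported here so transform/load can be run without the
    # third-party dependency (pip install requests).
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def transform(records: list[dict[str, Any]]) -> list[tuple[int, str, float]]:
    """Drop incomplete rows, normalize types, and remove duplicate ids."""
    seen_ids: set[int] = set()
    rows: list[tuple[int, str, float]] = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        if rec.get("id") is None or not name:
            continue  # skip incomplete records
        item_id = int(rec["id"])
        if item_id in seen_ids:
            continue  # keep only the first occurrence of each id
        seen_ids.add(item_id)
        # A missing price is stored as 0.0 (an assumption of this sketch).
        rows.append((item_id, name, float(rec.get("price") or 0.0)))
    return rows


def load(rows: list[tuple[int, str, float]], conn: sqlite3.Connection) -> int:
    """Write rows to the items table and return the table's row count."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
        )
        # INSERT OR REPLACE keeps the load idempotent across reruns.
        conn.executemany(
            "INSERT OR REPLACE INTO items (id, name, price) VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

With a real endpoint, the three steps chain as `load(transform(extract()), sqlite3.connect("etl.db"))`. Keeping each step a separate function makes it straightforward to hand them to an orchestrator later.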
Learning Outcomes
By the end of this week, students will be able to:
- Explain the purpose and structure of an ETL pipeline
- Extract external data from an API using Python
- Load data into a relational cloud-native database
- Analyze the final dataset to generate insights
- Compare tradeoffs between:
  - Azure Data Factory (low-code, scalable, abstracted)
  - Prefect (flexible, Python-native, transparent)
- Analyze data stored in the database using Power BI
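To make the "Python-native, transparent" description of Prefect more concrete, the sketch below wraps three plain functions as Prefect tasks and composes them into a flow. It assumes Prefect 2 or later is installed (`pip install prefect`); the sample rows, the function names, and the flow name `week4-etl` are hypothetical. The `prefect` import is deferred to `build_flow` so the plain functions remain usable on their own.

```python
def extract() -> list[dict]:
    """Stand-in for an API call; returns hard-coded, hypothetical rows."""
    return [
        {"id": 1, "value": "10"},
        {"id": 2, "value": None},
        {"id": 3, "value": "7"},
    ]


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows with a missing value and cast the rest to int."""
    return [
        {"id": r["id"], "value": int(r["value"])}
        for r in rows
        if r["value"] is not None
    ]


def load(rows: list[dict]) -> int:
    """Stand-in for a database write; returns the number of rows handled."""
    return len(rows)


def build_flow():
    """Wrap the functions above as Prefect tasks and return a flow."""
    from prefect import flow, task  # requires Prefect to be installed

    # Retries are declared per task; the extract step is the one most
    # likely to hit transient network failures.
    extract_task = task(retries=2, retry_delay_seconds=5)(extract)
    transform_task = task(transform)
    load_task = task(load)

    @flow(name="week4-etl", log_prints=True)
    def etl_flow() -> int:
        rows = extract_task()
        clean = transform_task(rows)
        loaded = load_task(clean)
        print(f"Loaded {loaded} rows")
        return loaded

    return etl_flow
```

Calling `build_flow()()` runs the flow in the current process, and each task run is logged individually. The orchestration logic is ordinary Python that can be read, versioned, and tested like any other code, whereas in Azure Data Factory the equivalent steps are configured as activities in a visual designer.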
Resources
Instructor Notes
Teamwork is central this week — students will collaborate on designing and deploying a real pipeline.
Consider:
- Assigning different groups to different tools (ADF vs Prefect)
- Reviewing how to monitor and troubleshoot pipelines in both environments
- Prompting class discussion on the long-term maintainability of low-code vs code-first solutions