Week 4: ETL Pipeline

In the final week, we bring many different skills together: cloud compute, storage, networking, and databases. Students will build a complete ETL (Extract, Transform, Load) pipeline using two different approaches — low-code and code-first.

Focus

Understand how ETL pipelines work in cloud-based production systems
Build and run pipelines using both graphical and code-based tools
Compare the flexibility, transparency, and scalability of each approach

Hands-On Activities

Extract data from a public REST API using the requests library
Transform and clean data
Load the processed data into a database
Compare two approaches to orchestrating the workflow:
- Low-code: Azure Data Factory
- Code-first: Python with Prefect
Pull and analyze the stored data using Power BI
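
The extract, transform, and load activities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference solution: the field names (name, value), the measurements table, and the sample records are hypothetical, and an in-memory SQLite database stands in for the cloud database so the example runs offline.

```python
import sqlite3


def extract(url, timeout=10):
    """Fetch a JSON list of records from a REST endpoint."""
    import requests  # third-party; imported here so the demo below runs offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()


def transform(records):
    """Drop incomplete rows, normalize names, and cast values to float."""
    cleaned = []
    for record in records:
        name = (record.get("name") or "").strip()
        value = record.get("value")
        if not name or value is None:
            continue  # skip rows missing a required field
        try:
            cleaned.append((name.lower(), float(value)))
        except (TypeError, ValueError):
            continue  # skip rows whose value is not numeric
    return cleaned


def load(rows, conn):
    """Insert cleaned rows and return the table's total row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS measurements (name TEXT, value REAL)")
    conn.executemany("INSERT INTO measurements (name, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]


# Offline demo: hand-written records stand in for an API response (assumed shape).
sample = [
    {"name": " Alpha ", "value": "1.5"},
    {"name": "beta", "value": None},
    {"name": "", "value": 3},
    {"name": "Gamma", "value": "n/a"},
    {"name": "Delta", "value": 4},
]
conn = sqlite3.connect(":memory:")
loaded = load(transform(sample), conn)
print(loaded)
```

Against a real API, the demo lines would be replaced by something like load(transform(extract(url)), conn), with conn pointing at the class database rather than SQLite.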

Learning Outcomes

By the end of this week, students will be able to:

Explain the purpose and structure of an ETL pipeline
Extract external data from an API using Python
Load data into a relational cloud-native database
Analyze the final dataset to generate insights
Compare tradeoffs between:
- Azure Data Factory (low-code, scalable, abstracted)
- Prefect (flexible, Python-native, transparent)
Analyze data stored in the database using Power BI

Resources

Instructor Notes

Teamwork is central this week — students will collaborate on designing and deploying a real pipeline.

Consider:
- Assigning different groups to different tools (ADF vs Prefect)
- Reviewing how to monitor and troubleshoot pipelines in both environments
- Prompting class discussion on the long-term maintainability of low-code vs code-first solutions