This post is about module 4 in the Cloud Skills challenge.
Previous posts in the series:
Microsoft Learn Module
Use Data Factory Pipelines in Microsoft Fabric
- This module is about data pipelines, which will be familiar to anyone who has used Azure Data Factory, as the concepts are the same and it uses the same architecture. Pipelines are a common tool for automating ETL processes (Extract, Transform and Load).
- The exercises in this module walk through creating a connection, setting up a Copy Data activity to pull data into a Lakehouse, and then using concepts from earlier modules via Python commands in a Notebook to create tables in the Lakehouse and load the data from the copied files into those tables (a rough sketch of that notebook step follows below).
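As an illustration of that last step, here is a minimal PySpark sketch of the kind of notebook cell involved. The file path and table name are assumptions for illustration only, not taken from the module, and `spark` is the SparkSession that Fabric notebooks provide automatically.

```python
# Minimal sketch (assumed names): read the CSV that the Copy Data activity
# landed in the Lakehouse "Files" area and save it as a Delta table.
# `spark` is the SparkSession pre-created in a Microsoft Fabric notebook.

df = (spark.read
      .option("header", "true")        # first row contains column names
      .option("inferSchema", "true")   # let Spark guess column types
      .csv("Files/new_data/sales.csv"))  # assumed landing path

# Save as a managed Delta table in the Lakehouse so it can be queried later.
df.write.format("delta").mode("overwrite").saveAsTable("sales")
```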
Learn Together links (recordings from wave 1)
This was Day 4, the first session of week 2 of the learning series. One of the two recordings is linked below (the one I happened to watch).
Key Takeaways
The key takeaways are as follows, in no particular order:
- It is not the same product as Azure Data Factory, but it is conceptually similar.
- Copying data is one of the most common use cases for a pipeline, but if a lot of transformation is needed, a Dataflow activity may be a better choice than the Copy Data activity.
- The primary options for data transformation activities are Copy Data, Dataflow, Notebook, or Stored Procedure.
- The control flow activities, which define the rules, variables, looping, and conditions for what the pipeline should do, appear to be virtually unlimited. Plenty of templates already exist that should let users get started quickly without having to build everything out from a blank slate (see the illustrative sketch after this list).
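Pipeline control flow itself is configured in the Fabric pipeline designer rather than written as code, but as a rough, notebook-style sketch, the loop below mirrors the kind of pattern a ForEach plus an If condition automates. The file paths and table name are assumptions for illustration, and `spark` is again the Fabric-provided SparkSession.

```python
# Illustrative only: real ForEach / If / variable logic is built in the
# pipeline designer. This sketch just mirrors that pattern in notebook code.
# File paths and table name are assumptions.

source_files = [
    "Files/new_data/sales_2022.csv",
    "Files/new_data/sales_2023.csv",
]

for path in source_files:                              # ForEach over sources
    df = spark.read.option("header", "true").csv(path)
    if df.count() > 0:                                 # If condition: skip empty files
        df.write.format("delta").mode("append").saveAsTable("sales")
```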