Course Overview:
In this course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure, using Azure services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, and Azure Databricks. The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage.
Course Objectives:
- Identify common data engineering tasks
- Describe common data engineering concepts
- Identify Azure services for data engineering
- Describe the key features and benefits of Azure Data Lake Storage Gen2
- Enable Azure Data Lake Storage Gen2 in an Azure Storage account
- Compare Azure Data Lake Storage Gen2 and Azure Blob storage
- Describe where Azure Data Lake Storage Gen2 fits in the stages of analytical processing
- Describe how Azure Data Lake Storage Gen2 is used in common analytical workloads
- Identify the business problems that Azure Synapse Analytics addresses.
- Describe core capabilities of Azure Synapse Analytics.
- Determine when to use Azure Synapse Analytics.
- Identify capabilities and use cases for serverless SQL pools in Azure Synapse Analytics
- Query CSV, JSON, and Parquet files using a serverless SQL pool
- Create external database objects in a serverless SQL pool
- Use a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement to transform data.
- Encapsulate a CETAS statement in a stored procedure.
- Include a data transformation stored procedure in a pipeline.
- Understand lake database concepts and components
- Describe database templates in Azure Synapse Analytics
- Create a lake database
- Identify core features and capabilities of Apache Spark.
- Configure a Spark pool in Azure Synapse Analytics.
- Run code to load, analyze, and visualize data in a Spark notebook.
- Use Apache Spark to modify and save dataframes
- Partition data files for improved performance and scalability.
- Transform data with SQL
- Describe core features and capabilities of Delta Lake.
- Create and use Delta Lake tables in a Synapse Analytics Spark pool.
- Create Spark catalog tables for Delta Lake data.
- Use Delta Lake tables for streaming data.
- Query Delta Lake tables from a Synapse Analytics SQL pool.
- Design a schema for a relational data warehouse.
- Create fact, dimension, and staging tables.
- Use SQL to load data into data warehouse tables.
- Use SQL to query relational data warehouse tables.
- Load staging tables in a data warehouse
- Load dimension tables in a data warehouse
- Load time dimensions in a data warehouse
- Load slowly changing dimensions in a data warehouse
- Load fact tables in a data warehouse
- Perform post-load optimizations in a data warehouse
- Describe core concepts for Azure Synapse Analytics pipelines.
- Create a pipeline in Azure Synapse Studio.
- Implement a data flow activity in a pipeline.
- Initiate and monitor pipeline runs.
- Describe notebook and pipeline integration.
- Use a Synapse notebook activity in a pipeline.
- Use parameters with a notebook activity.
- Describe Hybrid Transactional / Analytical Processing (HTAP) patterns.
- Identify Azure Synapse Link services for HTAP.
- Configure an Azure Cosmos DB Account to use Azure Synapse Link.
- Create an analytical store enabled container.
- Create a linked service for Azure Cosmos DB.
- Analyze linked data using Spark.
- Analyze linked data using Synapse SQL.
- Understand key concepts and capabilities of Azure Synapse Link for SQL.
- Configure Azure Synapse Link for Azure SQL Database.
- Configure Azure Synapse Link for Microsoft SQL Server.
- Understand data streams.
- Understand event processing.
- Understand window functions.
- Get started with Azure Stream Analytics.
- Describe common stream ingestion scenarios for Azure Synapse Analytics.
- Configure inputs and outputs for an Azure Stream Analytics job.
- Define a query to ingest real-time data into Azure Synapse Analytics.
- Run a job to ingest real-time data, and consume that data in Azure Synapse Analytics.
- Configure a Stream Analytics output for Power BI.
- Use a Stream Analytics query to write data to Power BI.
- Create a real-time data visualization in Power BI.
- Evaluate whether Microsoft Purview is appropriate for your data discovery and governance needs.
- Describe how the features of Microsoft Purview work to provide data discovery and governance.
- Catalog Azure Synapse Analytics database assets in Microsoft Purview.
- Configure Microsoft Purview integration in Azure Synapse Analytics.
- Search the Microsoft Purview catalog from Synapse Studio.
- Track data lineage in Azure Synapse Analytics pipeline activities.
- Provision an Azure Databricks workspace.
- Identify core workloads and personas for Azure Databricks.
- Describe key concepts of an Azure Databricks solution.
- Describe key elements of the Apache Spark architecture.
- Create and configure a Spark cluster.
- Describe use cases for Spark.
- Use Spark to process and analyze data stored in files.
- Use Spark to visualize data.
- Describe how Azure Databricks notebooks can be run in a pipeline.
- Create an Azure Data Factory linked service for Azure Databricks.
- Use a Notebook activity in a pipeline.
- Pass parameters to a notebook.
Who Should Attend?
The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytical solutions using data platform technologies that exist on Microsoft Azure. The secondary audience for this course includes data analysts and data scientists who work with analytical solutions built on Microsoft Azure.
Course Prerequisites:
There are no prerequisites for this course.
Course Content:
Module 1: Introduction to data engineering on Azure
Introduction
What is data engineering
Important data engineering concepts
Data engineering in Microsoft Azure
Knowledge check
Summary
Module 2: Introduction to Azure Data Lake Storage Gen2
Introduction
Understand Azure Data Lake Storage Gen2
Enable Azure Data Lake Storage Gen2 in Azure Storage
Compare Azure Data Lake Storage Gen2 to Azure Blob storage
Understand the stages for processing big data
Use Azure Data Lake Storage Gen2 in data analytics workloads
Knowledge check
Summary
Module 3: Introduction to Azure Synapse Analytics
Introduction
What is Azure Synapse Analytics
How Azure Synapse Analytics works
When to use Azure Synapse Analytics
Exercise – Explore Azure Synapse Analytics
Knowledge check
Summary
Module 4: Use Azure Synapse serverless SQL pool to query files in a data lake
Introduction
Understand Azure Synapse serverless SQL pool capabilities and use cases
Query files using a serverless SQL pool
Create external database objects
Exercise – Query files using a serverless SQL pool
Knowledge check
Summary
Module 5: Use Azure Synapse serverless SQL pools to transform data in a data lake
Introduction
Transform data files with the CREATE EXTERNAL TABLE AS SELECT statement
Encapsulate data transformations in a stored procedure
Include a data transformation stored procedure in a pipeline
Exercise – Transform files using a serverless SQL pool
Knowledge check
Summary
Module 6: Create a lake database in Azure Synapse Analytics
Introduction
Understand lake database concepts
Explore database templates
Create a lake database
Use a lake database
Exercise – Analyze data in a lake database
Knowledge check
Summary
Module 7: Analyze data with Apache Spark in Azure Synapse Analytics
Introduction
Get to know Apache Spark
Use Spark in Azure Synapse Analytics
Analyze data with Spark
Visualize data with Spark
Exercise – Analyze data with Spark
Knowledge check
Summary
Module 8: Transform data with Spark in Azure Synapse Analytics
Introduction
Modify and save dataframes
Partition data files
Transform data with SQL
Exercise – Transform data with Spark in Azure Synapse Analytics
Knowledge check
Summary
Module 9: Use Delta Lake in Azure Synapse Analytics
Introduction
Understand Delta Lake
Create Delta Lake tables
Create catalog tables
Use Delta Lake with streaming data
Use Delta Lake in a SQL pool
Exercise – Use Delta Lake in Azure Synapse Analytics
Knowledge check
Summary
Module 10: Analyze data in a relational data warehouse
Introduction
Design a data warehouse schema
Create data warehouse tables
Load data warehouse tables
Query a data warehouse
Exercise – Explore a data warehouse
Knowledge check
Summary
Module 11: Load data into a relational data warehouse
Introduction
Load staging tables
Load dimension tables
Load time dimension tables
Load slowly changing dimensions
Load fact tables
Perform post-load optimization
Exercise – Load data into a relational data warehouse
Knowledge check
Summary
Module 12: Build a data pipeline in Azure Synapse Analytics
Introduction
Understand pipelines in Azure Synapse Analytics
Create a pipeline in Azure Synapse Studio
Define data flows
Run a pipeline
Exercise – Build a data pipeline in Azure Synapse Analytics
Knowledge check
Summary
Module 13: Use Spark Notebooks in an Azure Synapse Pipeline
Introduction
Understand Synapse Notebooks and Pipelines
Use a Synapse notebook activity in a pipeline
Use parameters in a notebook
Exercise – Use an Apache Spark notebook in a pipeline
Knowledge check
Summary
Module 14: Plan hybrid transactional and analytical processing using Azure Synapse Analytics
Introduction
Understand hybrid transactional and analytical processing patterns
Describe Azure Synapse Link
Knowledge check
Summary
Module 15: Implement Azure Synapse Link with Azure Cosmos DB
Introduction
Enable Cosmos DB account to use Azure Synapse Link
Create an analytical store enabled container
Create a linked service for Cosmos DB
Query Cosmos DB data with Spark
Query Cosmos DB with Synapse SQL
Exercise – Implement Azure Synapse Link for Cosmos DB
Knowledge check
Summary
Module 16: Implement Azure Synapse Link for SQL
Introduction
What is Azure Synapse Link for SQL?
Configure Azure Synapse Link for Azure SQL Database
Configure Azure Synapse Link for SQL Server 2022
Exercise – Implement Azure Synapse Link for SQL
Knowledge check
Summary
Module 17: Get started with Azure Stream Analytics
Introduction
Understand data streams
Understand event processing
Understand window functions
Exercise – Get started with Azure Stream Analytics
Knowledge check
Summary
Module 18: Ingest streaming data using Azure Stream Analytics and Azure Synapse Analytics
Introduction
Stream ingestion scenarios
Configure inputs and outputs
Define a query to select, filter, and aggregate data
Run a job to ingest data
Exercise – Ingest streaming data into Azure Synapse Analytics
Knowledge check
Summary
Module 19: Visualize real-time data with Azure Stream Analytics and Power BI
Introduction
Use a Power BI output in Azure Stream Analytics
Create a query for real-time visualization
Create real-time data visualizations in Power BI
Exercise – Create a real-time data visualization
Knowledge check
Summary
Module 20: Introduction to Microsoft Purview
Introduction
What is Microsoft Purview?
How Microsoft Purview works
When to use Microsoft Purview
Knowledge check
Summary
Module 21: Integrate Microsoft Purview and Azure Synapse Analytics
Introduction
Catalog Azure Synapse Analytics data assets in Microsoft Purview
Connect Microsoft Purview to an Azure Synapse Analytics workspace
Search a Purview catalog in Synapse Studio
Track data lineage in pipelines
Exercise – Integrate Azure Synapse Analytics and Microsoft Purview
Knowledge check
Summary
Module 22: Explore Azure Databricks
Introduction
Get started with Azure Databricks
Identify Azure Databricks workloads
Understand key concepts
Exercise – Explore Azure Databricks
Knowledge check
Summary
Module 23: Use Apache Spark in Azure Databricks
Introduction
Get to know Spark
Create a Spark cluster
Use Spark in notebooks
Use Spark to work with data files
Visualize data
Exercise – Use Spark in Azure Databricks
Knowledge check
Summary
Module 24: Run Azure Databricks Notebooks with Azure Data Factory
Introduction
Understand Azure Databricks notebooks and pipelines
Create a linked service for Azure Databricks
Use a Notebook activity in a pipeline
Use parameters in a notebook
Exercise – Run an Azure Databricks Notebook with Azure Data Factory
Knowledge check
Summary