This article explains how to perform CI/CD for Azure Data Factory using GitHub Actions.
The most common scenario, when there is an Azure subscription, is to use Azure DevOps both as the repository and for the CI/CD pipelines, but some organizations have a multicloud strategy in mind and use other tools, such as GitHub, instead.
Here you will find the steps required to deploy between environments using GitHub Actions and ADF.
The engineer works on and develops their pipelines in the Azure Data Factory DEV environment. Following best practices, they work using branches; the exact approach depends on the project, but usually there is one branch per feature.
The engineer saves their work on this branch and, as soon as they have a final version, opens a pull request to publish the changes to the main branch.
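As a rough local sketch of this branch-per-feature flow, the Git side looks like the script below. Everything here is illustrative: the repository, branch and file names are invented, and in practice ADF creates these commits for you through its UI.

```shell
set -e
# Illustrative only: ADF normally commits pipeline JSON for you through its UI.
git init -q demo-adf && cd demo-adf
git config user.email demo@example.com && git config user.name demo
git checkout -q -B main
echo '{"name": "pipeline1"}' > pipeline1.json
git add . && git commit -qm "initial pipeline"
git checkout -q -b feature/ingest-sales        # one branch per feature
echo '{"name": "pipeline1", "activities": []}' > pipeline1.json
git commit -qam "update ingest pipeline"
git checkout -q main
git merge -q feature/ingest-sales              # stands in for the pull request to main
```

After the merge, main contains the feature branch's version of the pipeline JSON, which is exactly the state ADF's Publish button later turns into ARM templates.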
When the changes on the main branch are ready to be deployed to the next environments, the user clicks the Publish button in the ADF portal. This action takes all the configurations and developments from the main branch and publishes them to the “adf_publish” branch.
The “adf_publish” branch is an internal branch that ADF uses to publish the changes.
At this point we can set a trigger on it to deploy those changes between environments automatically, and we can additionally require an approval between environments.
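For example, a workflow can listen for pushes to adf_publish and run its deployment job against a GitHub environment, so that any required reviewers configured on that environment gate the deployment. This is only a minimal sketch of the idea; the complete workflow is covered later in this article:

```yaml
on:
  push:
    branches:
      - adf_publish   # runs whenever ADF publishes new ARM templates

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: prd       # approvals configured on this environment block the job until approved
    steps:
      - uses: actions/checkout@v3
```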
Below is a simplified diagram of the CI/CD flow between the environments that we will cover in this article.
For more information about CI/CD best practices recommended by Microsoft you can visit:
Continuous integration and delivery - Azure Data Factory | Microsoft Learn
For more complex code cycle:
Progressive experimentation with feature flags - Azure DevOps | Microsoft Learn
Using a hotfix production environment - Azure Data Factory | Microsoft Learn
After understanding the principles of how Azure Data Factory works with the repository, let’s move on to the steps required to perform the setup.
These are the steps that will be covered in the next sections.
Let’s go over each step in detail.
Go to GitHub and create a new repository for ADF; we named it Azure-Data-Factory.
After the repository is created, create the required environments in GitHub. Go to Settings > Environments; at least test and prod will be required, as these are the environments we will deploy to.
Go to the ADF portal, Settings > Source control > Git configuration. You must perform this change only in the development ADF, not in the next environments, because in the next environments we will propagate the changes through GitHub Actions.
After saving the changes, Azure Data Factory will create a new branch named “adf_publish” in the repository that we created.
At this point we will have the main branch, which contains the resources created in ADF in JSON format, and the adf_publish branch, which will contain the ARM template code to deploy these resources through the next environments.
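To make this concrete, the parameters file that ADF generates on adf_publish follows the standard ARM deployment-parameters shape. Below is a trimmed, hypothetical example; the factory name and Key Vault URL are invented, and the actual parameter names depend on your linked services:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": {
      "value": "adf-projectname-dev"
    },
    "AzureKeyVault_properties_typeProperties_baseUrl": {
      "value": "https://kv-projectname-dev.vault.azure.net/"
    }
  }
}
```

These are the values we will later override per environment from the GitHub Actions workflow.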
The service principal is required to perform the deployments to Azure Data Factory; for that reason it requires the Contributor role on the resource group.
Using the Azure CLI (which can be run from PowerShell), we will create a new service principal for each environment. For example, we have the “tst” resource group and the “prd” resource group, each with its own ADF.
Use the code below, changing the service principal name, subscription and resource group to point to the ones that you want to use.
```shell
az ad sp create-for-rbac --name tst-projectname-sp --role contributor --scopes /subscriptions/xxxxxx/resourceGroups/rg-projectname-tst --sdk-auth
```
This will generate an output like:
```json
{
  "clientId": "***",
  "clientSecret": "*******",
  "subscriptionId": "********",
  "tenantId": "****",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}
```
You need to keep this output safe; the suggestion here is to create a secret in your Azure Key Vault to store these credentials.
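As a sketch, with placeholder values standing in for the real credentials, you can keep the JSON in a file, push it to Key Vault, and sanity-check that individual fields are readable. The vault name and secret name below are hypothetical, and the `az keyvault secret set` call requires an authenticated Azure CLI session, so it is shown commented out:

```shell
# Placeholder credentials standing in for the real `az ad sp create-for-rbac` output.
cat > creds.json <<'EOF'
{
  "clientId": "00000000-0000-0000-0000-000000000000",
  "clientSecret": "placeholder",
  "subscriptionId": "11111111-1111-1111-1111-111111111111",
  "tenantId": "22222222-2222-2222-2222-222222222222"
}
EOF

# Store the whole JSON document as one Key Vault secret (hypothetical vault/secret names):
# az keyvault secret set --vault-name kv-projectname --name tst-sp-credentials --file creds.json

# AZURE_CREDENTIALS in GitHub must contain the ENTIRE JSON document; sanity-check one field:
python3 -c "import json; print(json.load(open('creds.json'))['clientId'])"
```

The important detail is that the Azure/login action expects the whole JSON object, not just the client secret.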
These secrets and variables will be used by the YAML file to perform the deployment across the environments.
Go to GitHub: Azure Data Factory repository > Settings > Security > Secrets and variables > Actions
Create the following entries for now:
GitHub environment secrets and variables (defined per environment: tst, prd):

Name | Type | Content
---|---|---
ENVIRONMENT | Variable | Defines the environment (tst, prd)
AZURE_RG | Variable | Resource group name
ADF_FACTORYNAME | Variable | Data Factory name
AZURE_CREDENTIALS | Secret | Credentials that we got in step 3
GitHub repository variables:

Name | Content
---|---
AZURE_SUBSCRIPTION | Subscription ID
ARM_TEMPLATE_FILE | ARMTemplateForFactory.json
ARM_TEMPLATE_PARAMETERS_FILE | ARMTemplateParametersForFactory.json
It should look like the next image, plus the Secrets tab for the Azure credentials.
We need to create the workflow file. To do this, go to the ADF Git repository, on the adf_publish branch, and create the following folders and finally the YAML file:
.github/workflows/ADF-CD.yml
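From a local clone of the adf_publish branch, the folder and file can be created like this (note the leading dot in .github, which GitHub Actions requires to pick the workflow up):

```shell
# Create the workflow directory and an empty workflow file on the adf_publish branch
mkdir -p .github/workflows
touch .github/workflows/ADF-CD.yml
```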
This is the YAML file that will contain all the steps to be performed during the deployment.
The next code is the basic code required to perform a deployment from the adf_publish branch directly to another ADF.
You can find it here.
With this code you have the minimum YAML file required to do CI/CD with GitHub Actions and ADF.
You can commit the changes if you wish, but let’s go further. In the next steps we will cover how to:
Add stages
This creates multiple stages, one for each environment, and executes all the defined jobs per environment. With this we can use the same YAML code to deploy between the environments and be sure that we perform the same code deployment in each one; additionally, it keeps the code simpler than duplicating the steps.
We can define as many stages as we want; in this case we defined test and production.
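The stages are driven by a job matrix combined with GitHub environments. A minimal fragment of the approach is shown below; the stage names are illustrative and must match the GitHub environment names you created earlier:

```yaml
strategy:
  matrix:
    stage: ['tst', 'prd']     # one matrix entry per environment stage
  fail-fast: true             # stop the run if an earlier environment fails
  max-parallel: 1             # deploy environments one at a time, in order
environment:
  name: ${{ matrix.stage }}   # resolves that environment's secrets, variables and approvals
```

Because `max-parallel` is 1, production only starts after test has finished (and after any approval on the production environment).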
At the end of this article, when we execute the workflow, it will appear in GitHub like this:
Additionally, in this case we added an approval, and the workflow is waiting for it.
Usage of secrets and variables
As we saw in step 5, we can define secrets and variables. At this point we will set up the parameters required by the parameters file, which will be replaced in each environment. These are, for example, the Key Vault URL, the ADLS URL, etc.
Depending on the linked services that you created in your ADF, you will need some extra variables to be defined. You can find the required parameters in your repo under adf_publish/repo/adf_name/ARMTemplateForFactory.json as soon as you perform your first publish.
Here in this example, we defined an ADLS storage, a Blob storage and a Key Vault.
Register all of them in the environment variables.
Disable and enable triggers
Microsoft recommends disabling and then re-enabling the triggers when moving the code between environments, as we can see here: Azure/data-factory-deploy-action: GitHub Action for side-effect free deployment of Azure Data Factory resources
To perform this operation, we will use the following script.
Pre-Post Deployment Script
This script is provided by Microsoft to disable and enable the triggers; the original can be found here: data-factory-deploy-action/PrePostDeploymentScript.ps1 at main · Azure/data-factory-deploy-action (github.com)
However, to make it work from the GitHub Action we need to make some adjustments: even though GitHub also uses YAML, the way parameter values are passed to the script differs slightly from the pipelines the script was originally written for.
This is the change that was made: for some reason the Boolean values passed from the YAML call were not being parsed properly, so the workaround we found was to use a String type instead of a Boolean.
You can find the final code here.
Disable step
This is the code required to run the PowerShell command that executes PrePostDeploymentScript.ps1 with -predeployment $true, stopping the triggers before the deployment.
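From the complete workflow at the end of this article, the pre-deployment step looks like this:

```yaml
# Disable triggers before deploying
- name: Run Pre-deployment script
  shell: pwsh
  run: pwsh -command "./PrePostDeploymentScript.ps1 -armTemplate ${{ vars.ARM_TEMPLATE_FILE }} -ResourceGroupName ${{ vars.AZURE_RG }} -DataFactoryName ${{ vars.ADF_FACTORYNAME }} -predeployment $true -deleteDeployment $false"
```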
Enable step
This is the code required to run the PowerShell command that executes PrePostDeploymentScript.ps1 with -predeployment $false, re-enabling the triggers once the deployment has finished.
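Again from the complete workflow at the end of this article, the post-deployment step looks like this; -deleteDeployment $true additionally cleans up the deployment history entries:

```yaml
# Re-enable triggers after deploying
- name: Run Post-deployment script
  shell: pwsh
  run: pwsh -command "./PrePostDeploymentScript.ps1 -armTemplate ${{ vars.ARM_TEMPLATE_FILE }} -ResourceGroupName ${{ vars.AZURE_RG }} -DataFactoryName ${{ vars.ADF_FACTORYNAME }} -predeployment $false -deleteDeployment $true"
```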
Those were all the steps required to understand and be able to set up your YAML file. That being said, you can find the complete YAML code here.
```yaml
name: ADF Deployment

on:
  push:
    branches:
      - adf_publish
  workflow_dispatch:
    inputs:
      skipAzModuleInstallation:
        description: 'Parameters which skip the Az module installation'
        required: false
        default: 'false'

jobs:
  deploy:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        stage: ['stg', 'prd']
      fail-fast: true
      max-parallel: 1
    environment:
      name: ${{ matrix.stage }}
    steps:
      - uses: actions/checkout@v3

      - name: Install Az PowerShell module
        run: if('${{ inputs.skipAzModuleInstallation }}' -ne 'true') { Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force }
        shell: pwsh

      - name: Azure Login
        uses: Azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
          enable-AzPSSession: true

      # Disable triggers
      - name: Run Pre-deployment script
        shell: pwsh
        run: pwsh -command "./PrePostDeploymentScript.ps1 -armTemplate ${{ vars.ARM_TEMPLATE_FILE }} -ResourceGroupName ${{ vars.AZURE_RG }} -DataFactoryName ${{ vars.ADF_FACTORYNAME }} -predeployment $true -deleteDeployment $false"

      - name: Deploy ADF ARM Templates ${{ matrix.stage }}
        uses: Azure/arm-deploy@v1
        with:
          resourceGroupName: ${{ vars.AZURE_RG }}
          template: ${{ vars.ARM_TEMPLATE_FILE }}
          parameters: >
            ${{ vars.ARM_TEMPLATE_PARAMETERS_FILE }}
            factoryName=${{ vars.ADF_FACTORYNAME }}
            ADLS_properties_typeProperties_url=${{ vars.ADF_ADLS_PROPERTIES_TYPEPROPERTIES_URL }}
            AzureBlob_properties_typeProperties_serviceEndpoint=${{ vars.ADF_AZUREBLOB_PROPERTIES_TYPEPROPERTIES_SERVICEENDPOINT }}
            AzureKeyVault_properties_typeProperties_baseUrl=${{ vars.ADF_AZUREKEYVAULT_PROPERTIES_TYPEPROPERTIES_BASEURL }}

      # Enable triggers
      - name: Run Post-deployment script
        shell: pwsh
        run: pwsh -command "./PrePostDeploymentScript.ps1 -armTemplate ${{ vars.ARM_TEMPLATE_FILE }} -ResourceGroupName ${{ vars.AZURE_RG }} -DataFactoryName ${{ vars.ADF_FACTORYNAME }} -predeployment $false -deleteDeployment $true"
```
I hope this article was useful for you. Thanks for reading!