Azure Data Factory CI/CD with GitHub Actions

This article explains how to set up CI/CD for Azure Data Factory using GitHub Actions.

The most common scenario when there is an Azure subscription is to use Azure DevOps both as the repository and for the CI/CD pipelines, but some organizations have a multi-cloud strategy in mind and use other tools, such as GitHub, instead.

Here you will find the steps required to deploy between environments using GitHub Actions and ADF.

How CI/CD works with ADF

Engineers develop their pipelines in the Azure Data Factory DEV environment. Following best practices, they work with branches; the exact approach depends on the project, but usually there is one branch per feature.

Engineers save their work on a feature branch and, as soon as they have a final version, open a pull request to merge the changes into the main branch.

When the changes on the main branch are ready to be deployed to the next environments, the user clicks the Publish button in the ADF portal. This action takes all the configurations and developments from the main branch and publishes them to the “adf_publish” branch.

The “adf_publish” branch is an internal branch that ADF uses to publish the changes.

At this point we can set a trigger on it to deploy those changes between environments automatically, and additionally we can require an approval between environments.
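As a sketch, a workflow can be triggered on pushes to adf_publish and target a GitHub environment, which is where required approvals are configured. The job and environment names below are illustrative:

```yaml
# Illustrative fragment: run on pushes to adf_publish and deploy into a
# GitHub environment; required reviewers are attached to that environment
# in the repository Settings, not in the workflow file itself.
on:
  push:
    branches:
      - adf_publish

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: test   # approvals configured on this environment apply here
    steps:
      - uses: actions/checkout@v3
```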

Below is a simplified diagram of the CI/CD flow between the environments that we will cover in this article.

For more information about the CI/CD best practices recommended by Microsoft, you can visit:

Continuous integration and delivery - Azure Data Factory | Microsoft Learn

For a more complex code cycle:

Progressive experimentation with feature flags - Azure DevOps | Microsoft Learn

Using a hotfix production environment - Azure Data Factory | Microsoft Learn

Hands on - CI/CD Setup

Now that we understand how Azure Data Factory works with the repository, let’s move on to the steps required to perform the setup.

Steps summary:

These are the steps that will be covered in the next sections.

  1. GitHub repository creation
  2. GitHub environments creation
  3. Repository setup in ADF
  4. Azure Service principal creation
  5. GitHub secrets and variables creation
  6. YAML file creation
  7. Minimal YAML creation
  8. Full YAML

Let’s go in detail over each step.

1- Create the GitHub repository

Go to GitHub and create a new repository for ADF; we named it Azure-Data-Factory.

 

2- Create the environments in GitHub:

After the repository is created, let’s create the required environments in GitHub. Go to Settings>Environments; at least test and prod will be required, as those are the environments that we will deploy to.

3- Setup the new repository in Azure Data Factory:

Go to the ADF portal, Settings>Source control>Git configuration. You must perform this change only in the development ADF, not in the next environments, because in the next environments we will propagate the changes through GitHub Actions.

After saving the changes, Azure Data Factory will create a new branch named “adf_publish” in the repository that we created.

At this point we will have the main branch, which contains the resources created in ADF in JSON format, and the adf_publish branch, which contains the ARM template code used to deploy these resources to the next environments.
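Assuming the factory is named adf-projectname-dev (an illustrative name), the two branches contain roughly the following layout:

```
main (authoring format, one JSON file per resource)
├── dataset/
├── linkedService/
├── pipeline/
└── trigger/

adf_publish (deployment format, generated by the Publish button)
└── adf-projectname-dev/
    ├── ARMTemplateForFactory.json
    └── ARMTemplateParametersForFactory.json
```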

4- Create the service principal in Azure portal:

The service principal is required to perform the deployments to Azure Data Factory; for that reason it requires the Contributor role on the resource group.

Using the Azure CLI (from PowerShell or any shell), we will create a new service principal for each environment. For example, we have the “tst” resource group and the “prd” resource group, each with its own ADF.

Use the code below, changing the service principal name, subscription, and resource group to point to the ones that you want to use.

az ad sp create-for-rbac --name tst-projectname-sp --role contributor --scopes /subscriptions/xxxxxx/resourceGroups/rg-projectname-tst --sdk-auth

This will generate an output like:

{
  "clientId": "***",
  "clientSecret": "*******",
  "subscriptionId": "********",
  "tenantId": "****",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

Keep this output safe; the suggestion here is to create a secret in your Azure Key Vault to store these credentials.
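For example, a small sketch like the following can pull individual fields out of that JSON and push the whole blob into Key Vault. The vault and secret names are illustrative, jq is assumed to be available, and the az call requires an authenticated session, so it is left commented:

```shell
# Placeholder credentials with the same shape as the --sdk-auth output
creds='{"clientId":"11111111-1111-1111-1111-111111111111","clientSecret":"placeholder-secret","subscriptionId":"22222222-2222-2222-2222-222222222222","tenantId":"33333333-3333-3333-3333-333333333333"}'

# Extract a single field with jq to verify the JSON before storing it
client_id=$(echo "$creds" | jq -r '.clientId')
echo "$client_id"

# Store the whole JSON blob as one Key Vault secret (requires az login;
# the vault and secret names below are hypothetical):
# az keyvault secret set --vault-name kv-projectname-tst \
#   --name adf-sp-credentials-tst --value "$creds"
```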

5- Create the following secrets and variables in GitHub:

These secrets and variables will be used by the YAML file to perform the deployment across the environments.

Go to GitHub, Azure Data Factory repository>Settings>Security>Secrets and variables>Actions

Create the following entries for now:

GitHub environment secrets and variables:

 
| Name | Type | Content |
| --- | --- | --- |
| ENVIRONMENT | Variable | Defines the environment (tst, prd) |
| AZURE_RG | Variable | Resource group name |
| ADF_FACTORYNAME | Variable | Data Factory name |
| AZURE_CREDENTIALS | Secret | Credentials that we got in step 4 |

GitHub Repository variables:

 
| Name | Content |
| --- | --- |
| AZURE_SUBSCRIPTION | Subscription ID |
| ARM_TEMPLATE_FILE | ARMTemplateForFactory.json |
| ARM_TEMPLATE_PARAMETERS_FILE | ARMTemplateParametersForFactory.json |
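If you prefer scripting this over clicking through the UI, the GitHub CLI can create the same entries. This requires an authenticated gh session inside the repository, and the values below are placeholders, so treat it as a sketch:

```shell
# Environment-scoped variables and secret (repeat for each environment: tst, prd)
gh variable set ENVIRONMENT --env tst --body "tst"
gh variable set AZURE_RG --env tst --body "rg-projectname-tst"
gh variable set ADF_FACTORYNAME --env tst --body "adf-projectname-tst"
gh secret set AZURE_CREDENTIALS --env tst < creds-tst.json

# Repository-scoped variables shared by all environments
gh variable set AZURE_SUBSCRIPTION --body "xxxxxx"
gh variable set ARM_TEMPLATE_FILE --body "ARMTemplateForFactory.json"
gh variable set ARM_TEMPLATE_PARAMETERS_FILE --body "ARMTemplateParametersForFactory.json"
```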

It should look like the next image, plus the Secrets tab for the Azure credentials.

 

6- YAML file creation

We need to create the workflow file. To do this, go to the ADF Git repository, on the adf_publish branch, and create the following folders and finally the YAML file:

.github/workflows/ADF-CD.yml

This is the YAML file that will contain all the steps to be performed during the deployment.

The next code is the basic code required to perform a deployment from the adf_publish branch directly to another ADF.

You can find it here.

With this code you have the minimum YAML file required to do CI/CD with GitHub Actions and ADF.
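For reference, a minimal workflow along these lines could look like the fragment below. The variable and secret names match the ones defined in step 5; everything else is illustrative:

```yaml
name: ADF Deployment (minimal)
on:
  push:
    branches:
      - adf_publish

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Log in with the service principal credentials created in step 4
      - name: Azure Login
        uses: Azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      # Deploy the ARM template generated by the Publish button
      - name: Deploy ADF ARM template
        uses: Azure/arm-deploy@v1
        with:
          resourceGroupName: ${{ vars.AZURE_RG }}
          template: ${{ vars.ARM_TEMPLATE_FILE }}
          parameters: ${{ vars.ARM_TEMPLATE_PARAMETERS_FILE }} factoryName=${{ vars.ADF_FACTORYNAME }}
```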

You can commit the changes if you wish, but let’s go further. In the next steps we will cover how to:

  • Add multiple stages for multiple environments
  • Use variables instead of hardcoding the values and make the code re-usable
  • Disable/Enable the triggers during deployment


7- Full YAML definition

Add stages

This creates one stage per environment and executes all the defined jobs for each of them. With this we can use the same YAML code to deploy between the environments and be sure that we perform the same code deployment in each one; additionally, it keeps the code simpler than duplicating the steps.

We can define as many stages as we want; in this case we defined test and production.
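The stage list maps one matrix value to each GitHub environment, so adding an environment is just one more entry in the list. A sketch of the relevant fragment (names are illustrative):

```yaml
strategy:
  matrix:
    stage: ['tst', 'prd']   # add e.g. 'uat' here to introduce another environment
  fail-fast: true           # stop remaining stages if one deployment fails
  max-parallel: 1           # deploy environments sequentially, in list order
environment:
  name: ${{ matrix.stage }} # must match the environment names created in step 2
```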

 

At the end of this article, when we execute the workflow, it will show in GitHub like this:

Additionally, in this case we added an approval, and the workflow is waiting for it.

Usage of secrets and variables

As we saw in step five, we can define secrets and variables. At this point we will set up the parameters required by the parameters file, which will be replaced in each environment. Those are, for example, the Key Vault URL, the ADLS URL, etc.

Depending on the linked services that you created in your ADF, you will need to define some extra variables. You can find the required parameters in your repo under adf_publish/repo/adf_name/ARMTemplateForFactory.json as soon as you perform your first publish.

In this example, we defined an ADLS storage, a Blob storage, and a Key Vault.

Register all of them in the environment variables.

Disable and enable triggers

Microsoft recommends disabling and then re-enabling the triggers when passing the code between environments, as we can see here: Azure/data-factory-deploy-action: GitHub Action for side-effect free deployment of Azure Data Factory resources

  • This prevents active triggers from executing during the deployment.
  • It also avoids the unavailability of some resources during the deployment.

To perform this operation, we will use the following script.

Pre-Post Deployment Script

This script is provided by Microsoft to disable and enable the triggers; the original one can be found here: data-factory-deploy-action/PrePostDeploymentScript.ps1 at main · Azure/data-factory-deploy-action (github.com)

However, to make it work from the GitHub Action we needed to make a small adjustment: the Boolean parameters passed to the script from the workflow call were not being parsed properly, so the workaround that we found was to use a String type instead of a Boolean.

You can find the final code here

Disable step

This is the code required to run the PowerShell command that executes PrePostDeploymentScript.ps1 with -predeployment $true, disabling the triggers before the deployment.

Enable step

This is the code required to run the PowerShell command that executes PrePostDeploymentScript.ps1 with -predeployment $false, re-enabling the triggers after the deployment.

Those were all the steps required to understand and set up your YAML file. That being said, you can find the complete YAML code here:

name: ADF Deployment
on:
  push:
    branches:
      - adf_publish
  workflow_dispatch:
    inputs:
      skipAzModuleInstallation:
        description: 'Skip the Az PowerShell module installation'
        required: false
        default: 'false'
        
jobs:
  deploy:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        stage: ['tst', 'prd']
      fail-fast: true
      max-parallel: 1
    environment:
      name: ${{ matrix.stage }}
      
    steps:
    - uses: actions/checkout@v3

    - name: Install Az PowerShell module
      run: if('${{ inputs.skipAzModuleInstallation }}' -ne 'true') { Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force }
      shell: pwsh
      
    - name: Azure Login
      uses: Azure/login@v1
      with:
        creds: ${{ secrets.AZURE_CREDENTIALS }}
        enable-AzPSSession: true 

    #Disable Triggers
    - name: Run Pre-deployment script
      shell: pwsh
      run: pwsh -command "./PrePostDeploymentScript.ps1 -armTemplate ${{ vars.ARM_TEMPLATE_FILE }}  -ResourceGroupName ${{ vars.AZURE_RG }} -DataFactoryName ${{ vars.ADF_FACTORYNAME }} -predeployment $true -deleteDeployment $false"
      
      
    - name: Deploy ADF ARM Templates ${{ matrix.stage }}
      uses: Azure/arm-deploy@v1
      with:
        resourceGroupName: ${{ vars.AZURE_RG }}
        template: ${{ vars.ARM_TEMPLATE_FILE }}
        parameters: 
          ${{ vars.ARM_TEMPLATE_PARAMETERS_FILE }}
          factoryName=${{ vars.ADF_FACTORYNAME }} 
          ADLS_properties_typeProperties_url=${{ vars.ADF_ADLS_PROPERTIES_TYPEPROPERTIES_URL }} 
          AzureBlob_properties_typeProperties_serviceEndpoint=${{ vars.ADF_AZUREBLOB_PROPERTIES_TYPEPROPERTIES_SERVICEENDPOINT }}
          AzureKeyVault_properties_typeProperties_baseUrl=${{ vars.ADF_AZUREKEYVAULT_PROPERTIES_TYPEPROPERTIES_BASEURL }}
    
    #Enable Triggers
    - name: Run Post-deployment script
      shell: pwsh
      run: pwsh -command "./PrePostDeploymentScript.ps1 -armTemplate ${{ vars.ARM_TEMPLATE_FILE }} -ResourceGroupName ${{ vars.AZURE_RG }} -DataFactoryName ${{ vars.ADF_FACTORYNAME }} -predeployment $false -deleteDeployment $true"

I hope this article was useful for you. Thanks for reading!

 

 
