Archief op mei 2019

Using logic apps and Azure data factory to improve your ETL processes.

Logic Apps are great when it comes to performing recurring tasks. There are a bunch of connectors available and it looks like your own creativity is the limit of what you can do with logic apps. Of course when it comes to complex, long running tasks it might be better to look at a different solution.

Another great product is Azure data factory, especially after version 2 was released and a visual GUI made it so much easier to build data driven pipelines.

The combination of these two tools is great and I use them a lot together. Logic apps can be triggered from a http request and in Azure data factory you have this web task available by default. So to run a logic app from Azure data factory you simply add this url to a web task and your good to go.

The example above fires a web request to the url that is automatically created after a logic app is created from a HTTP trigger .

Depending on what the logic app does you need to deal with authentication. For now we leave it this way and start by creating a simple logic app.

New logic app

Go to logic Apps in the Azure Portal, Click Add and specify a name. In this case I will name it “MyLogicApp”.
After submitting your Logic App should be in the list, if not hit refresh. Open your logic app and in the overview page click on ” View in Logic Apps designer “.

This is where the magic starts. Scroll down and find the “Blank Logic App”

Search for “Http” and select the following

From the “Add new parameter” dropdown select the Method and select Get.
Hit Save and look for the url that is now available in the ” HTTP POST URL ” field.

The logic App should look like this now.

If you run this logic app you can see that it succeeds. Of course, there is nothing that can fail 🙂

New Azure data factory pipeline

Go to data factories in the Azure portal. Create a data factory if you have none otherwise open the data factory by selecting it and choose for “Author & Monitor”. This brings you in the development GUI for Azure data factory

Under Factory resources hit the “+” and select New Pipeline

From the activities menu drag a Web task (under General) onto the canvas and give it a name. Select the Web task and go to settings. Paste the logic app url into the URL field and select the Method Get.


This method depends on the method we have selected while creating the logic app. There are some different methods but the most used are GET, POST, PUT and Delete. Some reference to these methodes https://www.w3schools.com/tags/ref_httpmethods.asp

Run the pipeline by selecting Debug.

This should result in green, succeeded messages. This is the first step in running web requests from Azure data factory.

A practical example

In a modern data platform architecture it might be interesting to do some up- and downscaling before starting a long running ETL process. In this example we are going to check the current configuration for a Azure SQL database. This might be the first step, after we found out the current service level objective we can create another logic app to up- or downscale this database.

Create a Azure SQL database if there is none available. This can be done under SQL Databases in the Azure portal. If there is no SQL Server this should be created as well but can be done from the database creation wizard.

Navigate back to logic apps and create a new Logic App or go back to the designer interface from the previous Logic App. Start again with a HTTP Request and hit the “+ New step” button. Choose HTTP.

By selecting HTTP you this Logic App is going to do a HTTP request itself to get the state of a SQL Server database.

Fill in the field Method, URl and Authentication.


Change the url parts between []
https://management.azure.com/subscriptions/[]/resourceGroups/[]/providers/Microsoft.Sql/servers/[]/databases/[]?api-version=2017-10-01-preview

Full reference on the HTTP Requests can be found @ https://docs.microsoft.com/en-us/rest/api/sql/databases

It’s time to save and run this Logic App. After running you will notice that it failes to get the database status. We did specify Managed Identity as our authentication but we did not give this identity access to our SQL server. Before we go into this part let me quickly explain two types of authentication I normally use.

Authentication

From the Authentication dropdown a couple of options were available. I am not familair with all options but prefer to use Active directory OAuth or managed identity. With Oauth you need to create a app registration from within Active directory that can be referred to as a service principle with a username (unique identifier) and a password. Together with the tenant-id this account can be used to authenticate on the Azure components you give it permission to.

The managed Identity is not fully available on all Azure components at the moment but is quickly gaining terrain. For each Azure service there is an automatically created identity in Azure active directory that can be used between Azure services without using usernames and password in your code. Reference: https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/services-support-managed-identities

In our Logic App we selected the Managed Identity but did not yet give it permission. Time to do so. First we need to set up this identity for this Logic App because for Logic Apps it is not automatically created in Azure AD. So from our logic app navigate to Identity and toggle the status to on.

Hit Save and navigate to your Azure SQL Server. Navigate to Access control, click Add, select “Contributor” as role and now search for your Logic App. Select it and hit save.

Now go back to your logic app and rerun! This can be done directly from Run trigger in the Overview tab. It succeeds!

At this point we have created a Logic App that fires a HTTP request to our SQL Server database to return it’s current state based on a managed identity authentication. Back to Azure data factory to tie both together and do something with this result.

Combining Logic App and Azure data factory

Navigate back to your data factory. Create a new pipeline and put a web task on the canvas. Specify the Logic App url, Get as it’s method and run the pipeline. As you can see it succeeds but the response is blank.
You can see this response from the output window and selecting the outgoing arrow under Actions. The actual response is:
{ “Response”: “” }

This is because we did not specify an output in our Logic App. This is easy to set-up. Navigate back to your Logic App and open the designer. Add a new task by searching Response.

After selecting the status code field a popup appears. Select the status code from there and do the same twith the body field.

Go back to you Azure data factory pipeline and re-run the pipeline. Check the output, it should look like below.

{
    "sku": {
        "name": "Basic",
        "tier": "Basic",
        "capacity": 5
    },
    "kind": "v12.0,user",
    "properties": {
        "collation": "SQL_Latin1_General_CP1_CI_AS",
        "maxSizeBytes": 2147483648,
        "status": "Online",
        "databaseId": "",
        "creationDate": "",
        "currentServiceObjectiveName": "Basic",
        "requestedServiceObjectiveName": "Basic",
        "defaultSecondaryLocation": "northeurope",
        "catalogCollation": "SQL_Latin1_General_CP1_CI_AS",
        "zoneRedundant": false,
        "earliestRestoreDate": "",
        "readScale": "Disabled",
        "readReplicaCount": 0,
        "currentSku": {
            "name": "Basic",
            "tier": "Basic",
            "capacity": 5
        }
    },
    "location": "westeurope",
    "id": "",
    "name": "",
    "type": "Microsoft.Sql/servers/databases"
}

Now add an If task to the canvas and drag the green line from the web task to the If task. Select the If task, go to settings and click the “Add dynamic content” text. Specify this expression: @equals(‘Basic’, coalesce(activity(‘Get database state’).output.sku.tier, ‘null’))
Be aware that the “Get database state” is the name of the web component.

It should look like this now

Now go to Activities and click the button under true. Add a Wait task to the canvas and now run this pipeline. In the output window you can see that the wait task was triggered so now we could add another web request to scale this database to standard or premium.

Like I sad before with enough creativity great processed can be created with Logic Apps and Azure Data factory.

But why Logic Apps?

Everything we’ve done so far with Logic Apps can also be accomplished with only Azure data factory. For example we could have used the web request to the database in our Logic App directly in Azure data factory.

The url is exactly the same as in the Logic App, method is also Get and MSI is selected with the default Azure managed identity url. So why use Logic Apps at all?

If you run processes or tasks that are not initiated from Azure data factory or you have multiple data factories running it is a good idea to put these actions into a Logic App. With this you create a higher level of flexibility and if anything needs to be changed you only have to change the Logic App.

If you only run processes from Azure data factory you could just create a couple pipelines with each a specific task that can be referred from another pipeline by using the “Execute Pipeline” task.

One of my next blogs will be full of examples with web-requests that will improve your ETL process. Start thinking about

  • Up- downscale database / datawarehouse
  • Pause / resume datawarehouse
  • Process Azure analysis services
  • Start / Stop integration runtime

Conclusion

There are a lot of possibilities with web-request that can run from within Azure data factory alone or in combination with Logic Apps to get a more dynamic approach to your ETL proccesses. If you can dynamically scale your whole architecture only when needed this could deliver great performance improvement and lowering your cost when everything is scaled down or paused after execution.