Parsing Azure Firewall logs in Microsoft Sentinel
… and how to parse other large datasets with KQL
I’ve created parsers for Azure Firewall logs to use with Microsoft Sentinel. Follow the steps of the Deploy to Azure automated deployment below to get started.
Introduction
One of the great things about Microsoft Sentinel is obviously the sheer number of data connectors available, with which you can ingest logs from virtually every log source imaginable. But sooner or later you’ll probably end up in a scenario where your logs are ingested as single lines of strings, and you need to parse them into separate columns.
This is especially noticeable with syslog sources where the source doesn’t use the Common Event Format (CEF):
"HTTPS request from 0.0.0.0:46182 to dev.azure.com:443. Action: Allow. Policy: azfwpolicy-01. Rule Collection Group: DefaultApplicationRuleCollectionGroup. Rule Collection: enablement-genericwebcategory-prd-allow-200. Rule: Allowed Business-Use Webcategories. Web Category: ComputersAndTechnology"
The example above shows what an unparsed syslog message might look like. You obviously want to separate all the details into individual columns to make filtering possible.
While you can filter on text strings with operators like contains and has, this is very resource intensive. Especially when working with larger datasets, where the source outputs a lot of log entries (e.g. a firewall appliance), this quickly becomes unworkable since the queries will take too long and will probably time out.
The example of an unparsed log entry shown above was not from a syslog source, however. It unfortunately came from Azure Firewall… The reason I call this unfortunate is that, since this is also a Microsoft product, one might expect it to deliver neatly structured logs to Microsoft Sentinel “out of the box”.
In this article I’ll not only show you how to properly parse your Azure Firewall log entries, I’ll also guide you through the whole process of how my parsers came about. That way you can apply the same approach to other log sources that need parsing as well.
Kusto to the rescue!
The Kusto Query Language (KQL) we’re using in Microsoft Sentinel provides a plethora of tabular operators to interact with our data, including options to parse entries:
- parse evaluates a string expression and parses its value into one or more columns. Columns of rows that couldn’t be parsed are filled with nulls.
- parse-where also evaluates and parses a string expression, but only outputs the rows for which parsing was successful.
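To make the difference concrete, here’s a tiny, made-up example (the datatable and column names are purely illustrative):

datatable(msg:string) [
    "Protocol: TCP Source: 10.0.0.1",
    "this row does not match the pattern"
]
| parse msg with * "Protocol: " protocol " Source: " sourceIp
// parse keeps both rows; protocol and sourceIp stay empty for the second one.
// Swap in parse-where and only the first, successfully parsed row is returned.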
If the to-be-parsed string contains a value you want to keep right at the start, you need to use these operators a bit differently than when it starts with an expression you want to match on.
For example, this next string starts with the expression “Protocol: ” which we want to match right away. What comes after it is our first value, which should be put into a column named “protocol”:
"Protocol: TCP Source: 10.0.0.1 Destination: 172.0.0.1"
This can be parsed with:
| parse string_ with *
    "Protocol: " protocol:string
    " Source: " sourceIp:string
    " Destination: " destinationIp:string
But the next example immediately starts with a value we want to keep.
"TCP source 10.0.0.1 with destination 172.0.0.1"
And should be parsed with:
| parse string_
    with protocol:string
    " source " sourceIp:string
    " with destination " destinationIp:string
Note that the wildcard ‘*’ is missing in this parser, because the string starts with a value we want to capture rather than with a delimiter to match on.
Advanced parsing
Where things get a bit more complicated though, is when you encounter log sources where not all entries follow the exact same pattern.
You won’t be able to write one parser to rule them all, because, as we’ve seen above, a parse operator will try to parse every row you provide. Depending on whether you use parse or parse-where, either some column values will stay empty, or the rows that couldn’t be parsed won’t be output at all.
So how can we parse a complete dataset which requires multiple parse operators, each with their own logic?
Simply put:
- We first need to determine how many different variations/patterns there are inside the dataset.
- Secondly we need to separate the dataset into multiple smaller sets.
- Each set will be parsed with its own parsing pattern.
- Lastly, all datasets should be combined again to form one output.
You might have already guessed that this approach is also required when parsing Azure Firewall logs. But before we dig into actual Azure Firewall logs, let’s take a closer look at how we split, parse and join data with a simpler example.
The four rows below contain information about several cities, but note that not all information is present for every city. If we want to successfully parse all rows of this dataset, we need the same approach as outlined above.
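(The values below are made up purely to illustrate the shape of such a dataset.)

datatable(string_:string) [
    "Country: Netherlands City: Amsterdam Population: 872680 Attraction: Rijksmuseum",
    "Country: USA State: California City: Los Angeles Population: 3898747",
    "Country: France City: Paris Attraction: Eiffel Tower Local dish: Croissant",
    "Country: Italy City: Rome Population: 2872800 Local dish: Carbonara"
]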
This data contains up to six different values per city, but none of the rows is complete. If we just try to parse this with a single parse operator, we’ll notice that our output is incomplete because none of the rows perfectly matches the parse logic.
| parse string_ with *
    "Country: " country:string
    " State: " state:string
    " City: " city:string
    " Population: " population:int
    " Attraction: " attraction:string
    " Local dish: " localDish:string
So, let’s try that again but with some parseception™ this time:
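Here’s a sketch of that approach, using the made-up city sample from above:

let data = datatable(string_:string) [
    "Country: Netherlands City: Amsterdam Population: 872680 Attraction: Rijksmuseum",
    "Country: USA State: California City: Los Angeles Population: 3898747",
    "Country: France City: Paris Attraction: Eiffel Tower Local dish: Croissant",
    "Country: Italy City: Rome Population: 2872800 Local dish: Carbonara"
];
// Each let statement uses parse-where with its own pattern, so it only returns
// the row(s) matching that specific flavor.
let set1 = data
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Attraction: " attraction;
let set2 = data
| parse-where string_ with * "Country: " country " State: " state " City: " city " Population: " population:int;
let set3 = data
| parse-where string_ with * "Country: " country " City: " city " Attraction: " attraction " Local dish: " localDish;
let set4 = data
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Local dish: " localDish;
// Glue the four subsets back together into a single result set
union set1, set2, set3, set4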
By using multiple let statements with parse-where we’re parsing the rows and only outputting data when the parser is successful. And since these four different rows need four different parser logics, we know that each section will only output one row.
By using a union operator we can glue these multiple tables back together, and the combined output contains all four rows, fully parsed into the six columns.
Performance optimization
Before digging into Azure Firewall logs, there’s one last thing I need to address.
In the example above we’ve divided the dataset into four smaller sets, applied parsers and combined the output. And even though every let statement is throwing away rows which couldn’t be parsed, it is still going through the entire dataset. In this case that means we’re going through the same data four times in a row!
In this particular example that might not be much of an issue, but when handling larger datasets the performance hit becomes very large, very quickly.
The solution is to use string operators within every let statement to filter the data first, before parsing. The most efficient way to do this is by using the has operator and its many variations like hasprefix, hassuffix, has_any and has_all.
Always use has over contains since the latter is much slower! This is because has searches for keywords that are indexed. Therefore you should also make sure that your keyword(s) contain at least four characters. Otherwise has resorts to a full-text search of the entire string, and it will perform just as slowly as contains.
Our previous example, but now “optimized” by leveraging filters in each parse section:
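A minimal sketch of that optimization, again using the made-up city sample, where each section filters with has/has_all before parsing:

let data = datatable(string_:string) [
    "Country: Netherlands City: Amsterdam Population: 872680 Attraction: Rijksmuseum",
    "Country: USA State: California City: Los Angeles Population: 3898747",
    "Country: France City: Paris Attraction: Eiffel Tower Local dish: Croissant",
    "Country: Italy City: Rome Population: 2872800 Local dish: Carbonara"
];
// Every section now only parses the rows that can actually match its pattern.
let set1 = data
| where string_ has_all ("Population", "Attraction")
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Attraction: " attraction;
let set2 = data
| where string_ has "State"
| parse-where string_ with * "Country: " country " State: " state " City: " city " Population: " population:int;
let set3 = data
| where string_ has_all ("Attraction", "dish")
| parse-where string_ with * "Country: " country " City: " city " Attraction: " attraction " Local dish: " localDish;
let set4 = data
| where string_ has_all ("Population", "dish")
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Local dish: " localDish;
union set1, set2, set3, set4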
Azure Firewall
Finally we get to the meat of the matter!
Logs are captured by enabling diagnostic settings on the Azure Firewall instances. This makes sure they forward their logs to the Microsoft Sentinel-enabled workspace.
You can use Azure Policy to enforce these settings on all your firewall instances for you. The dedicated Azure Firewall data connector in Microsoft Sentinel will guide you through the steps on how to achieve this.
You’ll find that there are three categories of logs which can be enabled for Azure Firewall:
- Application rule logs
- DNS logs
- Network rule logs
Once logs are coming in, you will find them in the AzureDiagnostics table:
AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS"
And each of the three categories can be told apart by the Category column:
| where Category == "AzureFirewallApplicationRule"
| where Category == "AzureFirewallDnsProxy"
| where Category == "AzureFirewallNetworkRule"
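For example, to take a quick look at raw network rule entries (in the AzureDiagnostics table the raw log line ends up in the msg_s column):

AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS"
| where Category == "AzureFirewallNetworkRule"
| project TimeGenerated, msg_s
| take 10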
Luckily for us, the people at Microsoft already did a fantastic job of creating a parser for AzureFirewallDnsProxy data as part of the ASIM parsers collection, so we only need to create parsers for the other two categories.
At the bottom of this page you’ll find an automated deployment which you can use to deploy parsers for all three categories (including the one from Microsoft) directly into your environment.
Application rule logs
My research led me to the conclusion that logs in this category come in six (!!) different parsing “flavors”, each with its own pattern. The application rule parser in the deployment at the bottom of this page handles all of them.
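Purely as an illustration of the approach (and not the full parser), a trimmed-down sketch for the flavor shown in the introduction, an allowed web category entry, could look something like this. The column names are my own choice here:

let AzFwApplicationRule = AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS" and Category == "AzureFirewallApplicationRule";
// Flavor: request matched an application rule with a web category
let WebCategorySet = AzFwApplicationRule
| where msg_s has "Web Category"
| parse-where msg_s with protocol " request from " source_ip ":" port_source:int " to " fqdn ":" port_destination:int ". Action: " action ". Policy: " policy ". Rule Collection Group: " rule_collection_group ". Rule Collection: " rule_collection ". Rule: " rule ". Web Category: " web_category;
// The other flavors each get their own filter + parse-where section and are
// glued back together with union, exactly as in the city example above.
WebCategorySet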
Network rule logs
Log entries in this category come in five different “flavors”, and the network rule parser in the deployment below covers each of them.
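Again purely as an illustration (not the full parser), a sketch for the plain TCP/UDP flavor might look like this; the column names are my own:

let AzFwNetworkRule = AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS" and Category == "AzureFirewallNetworkRule";
// Flavor: plain TCP/UDP entries, e.g. "TCP request from 10.0.0.1:49670 to 10.0.0.5:3389. Action: Allow"
let TcpUdpSet = AzFwNetworkRule
| where msg_s has_any ("TCP request", "UDP request")
| parse-where msg_s with protocol " request from " source_ip ":" port_source:int " to " destination_ip ":" port_destination:int ". Action: " action;
// Depending on the exact flavor, the action may carry a trailing period that you’d want to trim off.
// The remaining flavors (ICMP, entries with rule details, etc.) each get their own
// filter + parse-where section and are combined with union, as shown earlier.
TcpUdpSet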
Using these parsers
You can either opt for a manual “installation” or an automated deployment.
Automated deployment
You can deploy the parsers to your existing workspace with the button below:
Additionally you can save queries as functions. See details below.
Manual installation
- Load up the query interface of Log Analytics / Microsoft Sentinel and paste the code of a parser in the UI.
- Click on Save --> Save as function and give it a name.
- Repeat the process for each of the three parsers.
Once a function is saved, it might take a minute or two before it can be used. Once active, you should be able to run a function by entering the name you gave it in step 2.
For example:
AzureFirewallNetworkRules
| where protocol == "TCP" and
    action == "Allow" and
    port_destination == 3389
Conclusion
That’s a wrap! I hope my Azure Firewall parsers prove useful to others as well, and that this article gave you a clearer understanding of how you can parse data with KQL.
If you have any questions, never hesitate to reach out to me!
— Koos