Parsing Azure Firewall logs in Microsoft Sentinel
… and how to parse other large datasets with KQL
I’ve created parsers for Azure Firewall logs to use with Microsoft Sentinel. Follow the steps of the Deploy to Azure automated deployment below to get started.
Introduction
One of the great things about Microsoft Sentinel is obviously the sheer number of data connectors available, with which you can ingest logs from virtually every log source imaginable. But sooner or later you’ll probably end up in a scenario where your logs are ingested as single lines of strings, and you need to parse them into separate columns.
This is especially noticeable with syslog sources where the source doesn’t use the Common Event Format (CEF):
"HTTPS request from 0.0.0.0:46182 to dev.azure.com:443. Action: Allow. Policy: azfwpolicy-01. Rule Collection Group: DefaultApplicationRuleCollectionGroup. Rule Collection: enablement-genericwebcategory-prd-allow-200. Rule: Allowed Business-Use Webcategories. Web Category: ComputersAndTechnology"
The example above shows what an unparsed syslog message might look like. You obviously want to separate all the details into individual columns to make filtering possible.
While you can filter on text strings with operators like contains and has, this is very resource intensive. Especially when working with larger datasets, where the source outputs a lot of log entries (e.g. a firewall appliance), this quickly becomes unworkable since the queries will take too long and will probably time out.
The example of an unparsed log entry shown above was not from a syslog source, however. It unfortunately came from Azure Firewall… The reason I call this unfortunate is that, since this is also a Microsoft product, one might expect it to deliver neatly structured logs to Microsoft Sentinel “out of the box”.
In this article I’ll not only show you how to properly parse your Azure Firewall log entries, I’ll also guide you through the whole process of how my parsers came about. That way you can apply the same approach to other log sources that need parsing as well.
Kusto to the rescue!
The Kusto Query Language (KQL) we’re using in Microsoft Sentinel provides a plethora of tabular operators to interact with our data, including options to parse entries:
- parse evaluates a string expression and parses its value into one or more columns. Columns of rows that couldn’t be parsed are filled with nulls.
- parse-where also evaluates and parses a string expression, but only outputs the rows for which parsing was successful.
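To make the difference concrete, here’s a tiny, made-up example (the datatable and column names are purely illustrative):

datatable(msg:string) [
    "Protocol: TCP Source: 10.0.0.1",
    "this row does not match the pattern"
]
| parse msg with * "Protocol: " protocol " Source: " sourceIp
// parse keeps both rows; protocol and sourceIp stay empty for the second one.
// Swap in parse-where and only the first, successfully parsed row is returned.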
If the to-be-parsed string contains a value you want to keep right at the start, you need to use these operators a bit differently than when it starts with an expression you want to match on.
For example, this next string starts with the expression “Protocol: ” which we want to match right away. What comes after it is our first value, which should be put into a column named “protocol”:
"Protocol: TCP Source: 10.0.0.1 Destination: 172.0.0.1"
This can be parsed with:
| parse string_ with *
    "Protocol: " protocol:string
    " Source: " sourceIp:string
    " Destination: " destinationIp:string
But the next example immediately starts with a value we want to keep.
"TCP source 10.0.0.1 with destination 172.0.0.1"
And should be parsed with:
| parse string_
    with protocol:string
    " source " sourceIp:string
    " with destination " destinationIp:string
Note that the wildcard ‘*’ is missing in this parser, because the string starts with a value we want to capture rather than with a delimiter to match on.
Advanced parsing
Where things get a bit more complicated though, is when you encounter log sources where not all entries follow the exact same pattern.
You won’t be able to write one parser to rule them all, because, as we’ve seen above, a parse operator will try to parse every row you provide. Depending on whether you use parse or parse-where, either some column values will stay empty, or the rows that couldn’t be parsed won’t be output at all.
So how can we parse a complete dataset which requires multiple parse operators, each with their own logic?
Simply put:
- We first need to determine how many different variations/patterns there are inside the dataset.
- Secondly we need to separate the dataset into multiple smaller sets.
- Each set will be parsed with its own parsing pattern.
- Lastly, all datasets should be combined again to form one output.
You might have already guessed that this approach is also required when parsing Azure Firewall logs. But before we dig into actual Azure Firewall logs, let’s take a closer look at how we split, parse and join data with a simpler example.
The four rows below contain information about several cities, but note that not all information is present for every city. If we want to successfully parse all rows of this dataset, we need the same approach as outlined above.
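(The values below are made up purely to illustrate the shape of such a dataset.)

datatable(string_:string) [
    "Country: Netherlands City: Amsterdam Population: 872680 Attraction: Rijksmuseum",
    "Country: USA State: California City: Los Angeles Population: 3898747",
    "Country: France City: Paris Attraction: Eiffel Tower Local dish: Croissant",
    "Country: Italy City: Rome Population: 2872800 Local dish: Carbonara"
]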
This data contains up to six different values per city, but none of the rows is complete. If we just try to parse this with a single parse operator, we’ll notice that our output is incomplete because none of the rows perfectly matches the parse logic.
| parse string_ with *
    "Country: " country:string
    " State: " state:string
    " City: " city:string
    " Population: " population:int
    " Attraction: " attraction:string
    " Local dish: " localDish:string
So, let’s try that again but with some parseception™ this time:
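Here’s a sketch of that approach, using the made-up city sample from above:

let data = datatable(string_:string) [
    "Country: Netherlands City: Amsterdam Population: 872680 Attraction: Rijksmuseum",
    "Country: USA State: California City: Los Angeles Population: 3898747",
    "Country: France City: Paris Attraction: Eiffel Tower Local dish: Croissant",
    "Country: Italy City: Rome Population: 2872800 Local dish: Carbonara"
];
// Each let statement uses parse-where with its own pattern, so it only returns
// the row(s) matching that specific flavor.
let set1 = data
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Attraction: " attraction;
let set2 = data
| parse-where string_ with * "Country: " country " State: " state " City: " city " Population: " population:int;
let set3 = data
| parse-where string_ with * "Country: " country " City: " city " Attraction: " attraction " Local dish: " localDish;
let set4 = data
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Local dish: " localDish;
// Glue the four subsets back together into a single result set
union set1, set2, set3, set4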
By using multiple let statements with parse-where we’re parsing the rows and only outputting data when the parser is successful. And since these four different rows need four different parser logics, we know that each section will only output one row.
By using a union operator we can glue these multiple tables back together, and the combined output contains all four rows, fully parsed into the six columns.
Performance optimization
Before digging into Azure Firewall logs, there’s one last thing I need to address.
In the example above we’ve divided the dataset into four smaller sets, applied parsers and combined the output. And even though every let statement is throwing away rows which couldn’t be parsed, it is still going through the entire dataset. In this case that means we’re going through the same data four times in a row!
In this particular example that might not be much of an issue, but when handling larger datasets the performance hit becomes very large, very quickly.
The solution is to use string operators within every let statement to filter the data first, before parsing. The most efficient way to do this is by using the has operator and its many variations like hasprefix, hassuffix, has_any and has_all.
Always use has over contains since the latter is much slower! This is because has searches for keywords that are indexed. Therefore you should also make sure that your keyword(s) contain at least four characters. Otherwise has resorts to a full-text search of the entire string, and it will perform just as slowly as contains.
Our previous example, but now “optimized” by leveraging filters in each parse section:
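A minimal sketch of that optimization, again using the made-up city sample, where each section filters with has/has_all before parsing:

let data = datatable(string_:string) [
    "Country: Netherlands City: Amsterdam Population: 872680 Attraction: Rijksmuseum",
    "Country: USA State: California City: Los Angeles Population: 3898747",
    "Country: France City: Paris Attraction: Eiffel Tower Local dish: Croissant",
    "Country: Italy City: Rome Population: 2872800 Local dish: Carbonara"
];
// Every section now only parses the rows that can actually match its pattern.
let set1 = data
| where string_ has_all ("Population", "Attraction")
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Attraction: " attraction;
let set2 = data
| where string_ has "State"
| parse-where string_ with * "Country: " country " State: " state " City: " city " Population: " population:int;
let set3 = data
| where string_ has_all ("Attraction", "dish")
| parse-where string_ with * "Country: " country " City: " city " Attraction: " attraction " Local dish: " localDish;
let set4 = data
| where string_ has_all ("Population", "dish")
| parse-where string_ with * "Country: " country " City: " city " Population: " population:int " Local dish: " localDish;
union set1, set2, set3, set4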
Azure Firewall
Finally we get to the meat of the matter!
Logs are captured by enabling diagnostic settings on the Azure Firewall instances. This makes sure they forward their logs to the Microsoft Sentinel-enabled workspace.
You can use Azure Policy to enforce these settings on all your firewall instances for you. The dedicated Azure Firewall data connector in Microsoft Sentinel will guide you through the steps on how to achieve this.
You’ll find that there are three categories of logs which can be enabled for Azure Firewall:
- Application rule logs
- DNS logs
- Network rule logs
Once logs are coming in, you will find them in the AzureDiagnostics table:
AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS"
And each of the three categories can be told apart by the Category column:
| where Category == "AzureFirewallApplicationRule"
| where Category == "AzureFirewallDnsProxy"
| where Category == "AzureFirewallNetworkRule"
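For example, to take a quick look at raw network rule entries (in the AzureDiagnostics table the raw log line ends up in the msg_s column):

AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS"
| where Category == "AzureFirewallNetworkRule"
| project TimeGenerated, msg_s
| take 10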
Luckily for us, the people at Microsoft already did a fantastic job of creating a parser for AzureFirewallDnsProxy data as part of the ASIM parsers collection, so we only need to create parsers for the other two categories.
At the bottom of this page you’ll find an automated deployment which you can use to deploy parsers for all three categories (including the one from Microsoft) directly into your environment.
Application rule logs
My research led me to the conclusion that logs in this category come in six (!!) different parsing “flavors”, each with its own pattern. The application rule parser in the deployment at the bottom of this page handles all of them.
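Purely as an illustration of the approach (and not the full parser), a trimmed-down sketch for the flavor shown in the introduction, an allowed web category entry, could look something like this. The column names are my own choice here:

let AzFwApplicationRule = AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS" and Category == "AzureFirewallApplicationRule";
// Flavor: request matched an application rule with a web category
let WebCategorySet = AzFwApplicationRule
| where msg_s has "Web Category"
| parse-where msg_s with protocol " request from " source_ip ":" port_source:int " to " fqdn ":" port_destination:int ". Action: " action ". Policy: " policy ". Rule Collection Group: " rule_collection_group ". Rule Collection: " rule_collection ". Rule: " rule ". Web Category: " web_category;
// The other flavors each get their own filter + parse-where section and are
// glued back together with union, exactly as in the city example above.
WebCategorySet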
Network rule logs
Log entries in this category come in five different “flavors”, and the network rule parser in the deployment below covers each of them.
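Again purely as an illustration (not the full parser), a sketch for the plain TCP/UDP flavor might look like this; the column names are my own:

let AzFwNetworkRule = AzureDiagnostics
| where ResourceType == "AZUREFIREWALLS" and Category == "AzureFirewallNetworkRule";
// Flavor: plain TCP/UDP entries, e.g. "TCP request from 10.0.0.1:49670 to 10.0.0.5:3389. Action: Allow"
let TcpUdpSet = AzFwNetworkRule
| where msg_s has_any ("TCP request", "UDP request")
| parse-where msg_s with protocol " request from " source_ip ":" port_source:int " to " destination_ip ":" port_destination:int ". Action: " action;
// Depending on the exact flavor, the action may carry a trailing period that you’d want to trim off.
// The remaining flavors (ICMP, entries with rule details, etc.) each get their own
// filter + parse-where section and are combined with union, as shown earlier.
TcpUdpSet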
Using these parsers
You can either opt for a manual “installation” or an automated deployment.
Automated deployment
You can deploy the parsers to your existing workspace with the button below:
Additionally you can save queries as functions. See details below.
Manual installation
- Load up the query interface of Log Analytics / Microsoft Sentinel and paste the code of a parser in the UI.
- Click on Save --> Save as function and give it a name.
- Repeat the process for each of the three parsers.
Once a function is saved, it might take a minute or two before it can be used. Once active, you should be able to run a function by entering the name you gave it in step 2.
For example:
AzureFirewallNetworkRules
| where protocol == "TCP" and
    action == "Allow" and
    port_destination == 3389
Conclusion
That’s a wrap! I hope my Azure Firewall parsers prove useful to others as well, and that this article gave you a clearer understanding of how you can parse data with KQL.
If you have any questions, never hesitate to reach out to me!
— Koos