Ingest your logs into Azure Data Explorer with Logstash
Is this a better alternative to Sentinel Basic logs?
Introduction
While creating solutions based on Azure Data Explorer recently, I've discovered that this resource can be quite a valuable companion to use alongside Microsoft Sentinel. This is because it delivers unlimited storage at a much lower price, while still providing us with the powerful Kusto Query Language (KQL) to dig through our massive datasets.
This makes it a perfect solution for logs you don't necessarily want to trigger security incidents from in real time, but that are still useful from a security perspective.
A couple of examples would be:
- Incident triage — i.e. a security analyst can run investigation queries on these datasets (even straight from Sentinel! More on this below)
- Automatic enrichment of security incidents — i.e. entity lookups in your chattier logs, like for example network logs.
- Pro-active threat hunting
- Forensics — some logs might be beneficial for tracking down an adversary's steps in case of an emergency.
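To give an idea of that first example: Log Analytics (and therefore Sentinel) supports cross-service queries towards ADX with the adx() function, so an analyst never has to leave the Sentinel query pane. A minimal sketch, assuming a cluster called clustername in West Europe, a database called logstash-archive and the Cisco_RO_CL table that's used as an example later in this article:

// Run from the Sentinel / Log Analytics query pane, not from ADX itself
adx('https://clustername.westeurope.kusto.windows.net/logstash-archive').Cisco_RO_CL
| where ls_timestamp > ago(7d)
| take 100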
I've written about using Azure Data Explorer for your Microsoft 365 Defender logs for similar reasons, and also about how to ingest Sentinel Basic logs with Logstash. This article will be combining these two solutions because, dare I say it, it might be better than Sentinel Basic logs. 😳
The caveats of Sentinel Basic logs
It's already been one and a half years since Microsoft announced new tiers for Sentinel. And since then I've helped quite a few customers onboard their chatty logs into the Basic logs tier. It seemed to make sense: store high-volume logs (like network traffic logs) in Sentinel at a lower cost, and retrieve them from your archive once you really need them.
But in practice this turned out to be a little more cumbersome than it maybe should've been. These are a couple of things customers were running into:
- Logs are stored in a Basic logs table for only 8 days. There's no way to get a proper idea of what logs you're exactly ingesting over a longer period of time: no sense of sizing, or knowledge about ingest growth and other trends.
- This also has to do with the fact that KQL capabilities are heavily crippled on Basic logs tables. You can use where, extend and project, and that's basically it. Even a simple count is not allowed.
- Data beyond 8 days can be archived. Archived data can be searched and restored if necessary. Once data is restored, so are full KQL capabilities. But the search functionality is quite basic: no KQL, just a wildcard string search.
- And once your search job is completed, you can restore the results in a restore job. But these restore jobs will cost you. Microsoft will charge a minimum of 2 TB in size and 12 hours in timespan of the actual logs. So you're facing a minimum charge of ~ € 250,- per restore. If you need 30 days' worth of data, it's even 30 times this amount!
Especially the lack of proper KQL functionality makes Basic logs unsuitable for any of the four examples I mentioned in the introduction.
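To make that concrete: a query like the one below runs without issues on a full-KQL table in ADX, but would be rejected on a Basic logs table because summarize (and even a plain count) isn't available there. The table and column names are just the illustrative ones used later in this article:

// Daily event volume over the past month; needs summarize, which Basic logs don't allow
Cisco_RO_CL
| where ls_timestamp > ago(30d)
| summarize events = count() by bin(ls_timestamp, 1d)
| order by ls_timestamp asc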
Ingest your Basic logs into ADX with Logstash
If you're already using Basic logs, then you're already using DCR-based log ingestion via Data Collection Rules. And you're probably also using Logstash in doing so. If not, then check out this article I posted last year, because I think it has some benefits over the standard logging solution with the Azure Monitor Agent.
For outputting to Azure Data Explorer there's a kusto output plugin available!
Preparing Azure Data Explorer
Before we can start sending logs from Logstash we need to perform some preparations on the ADX side first:
- Create a database or use an existing one. Database retention and cache levels determine your table-level retention and Kusto query performance accordingly (see the sketch right after this list).
- Assign Database Ingestor permissions to your app registration. Pro tip: you can also do this with an ADX command and integrate it into your deployment script(s) with: .add database ['database'] ingestors ('aadapp=<applicationid>;yourdomain.com')
- Create a table and table mapping with a proper schema based on the log source. More on this below…
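To stick with the scripted approach from the pro tip above, the retention and cache levels of the database can also be set with management commands. A minimal sketch, assuming a database called logstash-archive and example values of one year of retention with the most recent 31 days on hot cache:

// keep data for one year in total
.alter-merge database ['logstash-archive'] policy retention softdelete = 365d

// keep the most recent 31 days on the hot (SSD) cache for fast queries
.alter database ['logstash-archive'] policy caching hot = 31d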
Creating the table and table mapping
The table in your ADX database should have a proper schema and a mapping based on the original source. Constructing the ADX commands to create these two items can be a little bit intimidating because mistakes are easily made. And you also need to factor in that some column names are not allowed in ADX. The same goes for Log Analytics.
That's why Microsoft's own output plugin transforms these columns and replaces illegal characters (like @) with ls_.
We need to do the same here. That's why I've created a PowerShell script to help you construct these lengthy commands based on a sample file.
- Use your existing microsoft-sentinel-logstash-output-plugin to create a sample log file with the sample_file_path parameter (a minimal sketch of this is shown right after this list).
- Transfer/download the sample file to your local machine and run the PowerShell script as shown below.
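Regarding that first step: below is a minimal sketch of what a temporary output block for creating a sample file could look like. The create_sample_file switch and the file path are examples; double-check them against the plugin's documentation and use the plugin name as it's registered in your own installation:

output {
  microsoft-sentinel-log-analytics-logstash-output-plugin {
    # write a sample of the incoming events to disk instead of sending them onwards
    create_sample_file => true
    sample_file_path => "/tmp/samples"
  }
}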
Let's say your sample file looks like this:
Run ./Get-AdxCommand.ps1 -sampleFileDirectory "<path to your sample files folder>" and the results should look something like this:
Notice in this example that the table Cisco_RO_CL has a column named ls_timestamp. But since the original source will still be outputting @timestamp, the PowerShell script added a "transformation" in the mapping:
"column": "ls_timestamp", "path": "$[\'@timestamp\']"
The only thing left to do is copy/paste the purple lines into ADX and run them (SHIFT + RETURN) to create the two items.
Create/update your Logstash 'pipeline'
First we need to install the kusto plugin:
sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-kusto
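If you want to verify the installation succeeded, listing the installed plugins should now show it:

sudo /usr/share/logstash/bin/logstash-plugin list | grep kusto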
Lastly, we can update the Logstash configuration file to use the kusto plugin:
output {
  kusto {
    # temporary location where the plugin stages events before they're ingested
    path => "/tmp/kusto/%{+YYYY-MM-dd-HH-mm-ss}.txt"
    # the ingest endpoint of your cluster (note the ingest- prefix), not the regular query URI
    ingest_url => "https://ingest-clustername.westeurope.kusto.windows.net"
    app_id => "<APPID>"
    app_key => "<SECRET>"
    app_tenant => "<TENANT>"
    database => "logstash-archive"
    table => "Cisco_RO_CL"
    json_mapping => "ciscorocl_mapping"
  }
}
Just leave the path parameter as it is, unless you really need to change it. This makes sure the output plugin has a location to temporarily store the logs in transit.
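After restarting Logstash it can take a few minutes before the first events show up, since the plugin stages files on the path above and ADX ingests them in batches. A quick sanity check, and a place to look if nothing arrives, could be the following (run as two separate statements in ADX):

// did any events land in the table yet?
Cisco_RO_CL
| count

// if not, check the cluster for recent ingestion errors
.show ingestion failures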
Conclusion
Great to see that Logstash has the ability to output directly to ADX! Whether you want to replace Basic logs with this solution or not is up to you. But I think this can be a valuable addition to your logging strategy either way.
Please note that ADX is a service that requires proper monitoring! For more details please check out this article.
If you have any follow-up questions don’t hesitate to reach out to me. Also follow me here on Medium or keep an eye on my Twitter and LinkedIn feeds to get notified about new articles here on Medium.
I still wouldn’t call myself an expert on PowerShell. So if you have feedback on any of my approaches, please let me know! Also never hesitate to fork my repository and submit a pull request. They always make me smile because I learn from them and it will help out others using these tools. 👌🏻
— Koos