Ingest your logs into Azure Data Explorer with Logstash

Koos Goossens
6 min read · Aug 11, 2023


Is this a better alternative to Sentinel Basic logs?

It should be no surprise that Elastic Logstash also pairs nicely with Azure Data Explorer

Introduction

While creating solutions based on Azure Data Explorer recently, I've discovered that this resource can be quite a valuable companion to Microsoft Sentinel. This is because it delivers unlimited storage at a much lower price while still providing the powerful Kusto Query Language (KQL) to dig through our massive datasets.

This makes it a perfect solution for logs you don't necessarily want to trigger security incidents on in real time, but which are still useful from a security perspective.

A couple of examples would be:

  • Incident triage — i.e. a security analyst can run investigation queries on these datasets (even straight from Sentinel! More on this below)
  • Automatic enrichment of security incidents — i.e. entity lookups in your more chatty logs, such as network logs (see the query sketch after this list)
  • Pro-active threat hunting
  • Forensics — some logs might be beneficial for tracking down an adversary's steps in case of an emergency.
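To give an idea of what such an enrichment lookup could look like, here's a minimal KQL sketch against a hypothetical network log table in ADX; the table and column names are just placeholders:

// Look up an incident's IP entity in a chatty network log table stored in ADX
NetworkLogs
| where Timestamp > ago(30d)
| where SourceIP == "203.0.113.42"
| summarize Events = count(), FirstSeen = min(Timestamp), LastSeen = max(Timestamp)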

I've written about using Azure Data Explorer for your Microsoft 365 Defender logs for similar reasons and also about how to ingest Sentinel Basic logs with Logstash. This article will be combining these two solutions because, dare I say it, it might be better than Sentinel Basic logs. 😳

The caveats of Sentinel Basic logs

It's already been one and a half years since Microsoft announced new tiers for Sentinel. And since then I've helped quite a few customers onboard their chatty logs into the Basic logs tier. It seemed to make sense: store high-volume logs (like network traffic logs) in Sentinel at a lower cost, and retrieve them from your archive once you really need them.

But in practice this turned out to be a little more cumbersome than it should've been. Here are a couple of things customers were encountering:

  1. Logs are stored in a Basic logs table for only 8 days. There's no way to get a proper idea of which logs you're exactly ingesting over a longer period of time. No sense of sizing, or knowledge about ingest growth and other trends.
  2. This also has to do with the fact that KQL capabilities are heavily crippled on Basic log tables. You can use where, extend and project, and that's basically it. Even a simple count is not allowed (see the sketch after this list).
  3. Data beyond 8 days can be archived. Archived data can be searched and restored if necessary. Once data is restored, so are full KQL capabilities. But the search functionality is quite basic: no KQL, just a wildcard string search.
  4. And once your search job is completed, you can restore the results in a restore job. But these restore jobs will cost you. Microsoft will charge a minimum of 2 TB in size and 12 hours in timespan of the actual logs. So you're facing a minimum charge of ~ € 250,- per restore. If you need 30 days' worth of data, it's even 30 times this amount!
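To illustrate point 2, here's a rough KQL sketch; the Basic logs table MyFirewall_CL and its columns are hypothetical:

// On an Analytics-tier table, full KQL is available:
CommonSecurityLog
| where TimeGenerated > ago(1d)
| summarize count() by DeviceVendor

// On a Basic logs table you're limited to simple operators like where, extend and project:
MyFirewall_CL
| where TimeGenerated > ago(1d)
| project TimeGenerated, SourceIP_s, DestinationIP_s
// ...appending '| summarize count()' here would simply be rejected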

Especially the lack of proper KQL functionality makes it unsuitable for any of the four examples I mentioned in the introduction.

Yes, technically you can park your logs there. But once they're there, they're kinda stuck

Ingest your Basic logs into ADX with Logstash

If you're already using Basic logs, then you're already using DCR-based log ingestion via Data Collection Rules. And you're probably also using Logstash to do so. If not, check out this article I posted last year, because I think it has some benefits over the standard logging solution with the Azure Monitoring Agent.

For outputting to Azure Data Explorer there's a kusto output plug-in available!

Preparing Azure Data Explorer

Before we can start sending logs from Logstash we need to perform some preparations on the ADX side first:

  1. Create a database or use an existing one
    The database's retention and cache policies determine your table-level retention and Kusto query performance accordingly.
  2. Assign Database Ingestor permissions to your app registration
    Pro tip: you can also do this with an ADX command and integrate it into your deployment script(s): .add database ['database'] ingestors ('aadapp=<applicationid>;yourdomain.com') (see the command sketch after the screenshots below)
  3. Create a table and table mapping with a proper schema based on the log source. More on this below…
Databases overview
Database permissions
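For reference, here's a rough sketch of what steps 1 and 2 can look like as ADX commands. The database name and the retention/cache values are just examples; run each command separately:

// Grant the app registration ingest rights on the database (same command as the pro tip above)
.add database ['logstash-archive'] ingestors ('aadapp=<applicationid>;yourdomain.com')

// Optionally tune database-level retention and cache (example values only)
.alter-merge database ['logstash-archive'] policy retention softdelete = 365d

.alter database ['logstash-archive'] policy caching hot = 31d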

Creating the table and table mapping

The table in your ADX database should have a proper schema and a mapping based on the original source. Constructing the ADX commands to create these two items can be a bit intimidating because mistakes are easily made. You also need to factor in that some column names are not allowed in ADX; the same goes for Log Analytics.

That's why Microsoft's own output plugin transforms these columns and replaces illegal characters (like @) with ls_.

We need to do the same here. That's why I've created a PowerShell script to help you construct these lengthy commands based on a sample file.

  1. Use your existing microsoft-sentinel-logstash-output-plugin to create a sample log file with the sample_file_path parameter.
  2. Transfer/download the sample file to your local machine and run the following PowerShell script:

Let's say your sample file looks like this:

Note the special characters in columns: version and timestamp

Run: ./Get-AdxCommand.ps1 -sampleFileDirectory "<path to your sample files folder>" and the results should look something like this:

Example of processing a sample file coming from a Cisco appliance

Notice in this example that the table Cisco_RO_CL has a column named ls_timestamp. But since the original source will still be outputting @timestamp, the PowerShell script added a "transformation" in the mapping:

"column": " ls_timestamp", "path": "$[\'@timestamp\']"

The only thing left to do is copy/paste the purple lines into ADX and run them (SHIFT + RETURN) to create the two items.
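For reference, the two generated commands look roughly like this. This is a simplified sketch based on the example above; your actual column list will depend on the sample file, and message and host are assumed columns here:

.create table Cisco_RO_CL (
    ls_timestamp: datetime,
    ls_version: string,
    message: string,
    host: string
)

.create table Cisco_RO_CL ingestion json mapping 'ciscorocl_mapping' '[{"column":"ls_timestamp","path":"$[\'@timestamp\']"},{"column":"ls_version","path":"$[\'@version\']"},{"column":"message","path":"$.message"},{"column":"host","path":"$.host"}]'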

Create/update your Logstash 'pipeline'

First we need to install the kusto plugin:

sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-kusto

Then we can update the Logstash configuration file to use the kusto plugin:

output {
  kusto {
    path         => "/tmp/kusto/%{+YYYY-MM-dd-HH-mm-ss}.txt"
    ingest_url   => "https://ingest-clustername.westeurope.kusto.windows.net"
    app_id       => "<APPID>"
    app_key      => "<SECRET>"
    app_tenant   => "<TENANT>"
    database     => "logstash-archive"
    table        => "Cisco_RO_CL"
    json_mapping => "ciscorocl_mapping"
  }
}

Just leave the path parameter as-is unless you really need to change it. It makes sure the output plugin has a location to temporarily store the logs in transit.
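Once Logstash is running, a quick sanity check in ADX can confirm whether data is arriving (run the query and the command separately):

Cisco_RO_CL
| summarize Rows = count(), Latest = max(ls_timestamp)

// And if nothing shows up, this command can tell you why ingestion failed:
.show ingestion failures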

Provide Sentinel users access to ADX

One neat trick is to provide seamless access to ADX from within the Sentinel interface. This can be done by deploying workspace functions in your Sentinel workspace that point towards the ADX cluster's tables.

Check out more details here and here.

Querying ADX from Sentinel is great! But unfortunately you cannot use identical table names as function names
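As a minimal sketch: a workspace function saved under a name like Cisco_RO (it can't be named identically to the table, as noted above) could simply wrap a cross-service adx() query. Note that the query endpoint doesn't have the ingest- prefix used in the Logstash output:

// Body of a hypothetical Sentinel workspace function named 'Cisco_RO'
adx('https://clustername.westeurope.kusto.windows.net/logstash-archive').Cisco_RO_CL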

Conclusion

Great to see that Logstash has the ability to output directly to ADX! Whether you want to replace Basic logs with this solution is up to you. But I think it can be a valuable addition to your logging strategy either way.

Please note that ADX is a service that requires proper monitoring! For more information, please check out the details here.

If you have any follow-up questions don’t hesitate to reach out to me. Also follow me here on Medium or keep an eye on my Twitter and LinkedIn feeds to get notified about new articles here on Medium.

I still wouldn’t call myself an expert on PowerShell. So if you have feedback on any of my approaches, please let me know! Also never hesitate to fork my repository and submit a pull request. They always make me smile because I learn from them and it will help out others using these tools. 👌🏻


— Koos

