Use Sentinel Basic and Archive logs to extend retention of 365 Defender logs past the 30-day limit! [part 1/2]
Update
I recently wrote an article about storing Microsoft 365 Defender data in Azure Data Explorer (ADX). Because of the versatility of ADX, you might want to reconsider using Sentinel Basic logs. I’ve built a fully automated solution to set up the ADX environment, called it ‘ArchiveR’ 🤖, and here you can read all about it.
Introduction
I regularly get asked questions about data retention in Microsoft Sentinel. Some customers want longer data retention than was previously possible within Log Analytics, for compliance reasons. Others have table-specific needs for storing data. And lately these questions and decisions come up when discussing Microsoft 365 Defender data as well.
Although 365 Defender offers some data retention options, with a maximum of 180 days these are far more limited than what we’re used to in Sentinel. And for your raw logs, used for Advanced Hunting and more advanced triage and investigations, you’re even more limited: these are only stored for 30 days, regardless of which setting you choose for global data retention.
Microsoft recently announced two new tiers for storing data in Sentinel in public preview: Basic and Archive logs. These new tiers accommodate more elaborate retention requirements and give you more control over ingestion costs.
So how do the three log tiers differ from each other? What new capabilities and limits do they bring? And most importantly (for this article at least): can we ingest raw data from Microsoft 365 Defender into the new Archive tier? And can we leverage the benefits of Basic logs in between to minimize costs?
These are the questions I’ll be answering in this article. But since we have a lot of ground to cover, I’d like to make this my first multi-part article and split it into two distinct sections:
- Part 1 | Introduction to new log tiers [📍 you are here]
  – Basic and Archive log tier details
  – New custom log ingestion method with Data Collection Endpoints
- Part 2 | Archiving Microsoft 365 Defender logs
  – Downsides and limitations of the integrated M365 data connector
  – Use Logic App to ingest 365 Defender data as custom logs into Basic table
So if you’re already familiar with the concepts described in this first part, you may want to dive straight into the next one.
New log tiers
Azure Log Analytics (and thus also Sentinel) has received two new log tiers: Basic and Archive. The already existing way of ingesting logs into your workspace is now called Analytics logs.
Both Analytics logs and Basic logs can be combined for different log streams and act as the storage solution for your log ingestion.
Once logs are stored in either one of these tiers they might be moved into Archive logs later, depending on the retention requirements set on both the table and the workspace.
Data stored in Archive can be retrieved with Search and/or Restore jobs, after which the data is pulled back into an Analytics logs table.
Each of these three log tier types comes with its own “quirks and features” of course, so let’s dig into the specifics of each tier.
Basic logs
This tier is designed to provide an extremely cost-effective way to store logs that are high in volume but low in priority. You might be required to store these “chatty” logs to meet compliance requirements, or they might add value during investigations. NetFlow or other network logs are a good example of data you might want to store in a Basic logs table.
To summarize:
➕ Reduced ingestion price
➕ Can be archived the same way as Analytics logs
➖ Data retention of only 8 days
➖ Data is not available for analytic rules or log alerts
➖ Reduced KQL functions *
➖ Queries are charged separately
* No join(), union(), or aggregations (i.e. summarize())
All tables in Log Analytics use the Analytics tier by default, and you can configure particular tables to use Basic logs instead. There are some limitations to which tables support Basic logs, though. At the time of writing only the following tables are supported:
- All custom log tables created as Data Collection Rule (DCR)-based tables (more on this later!)
- Tables used by Container Insights, like ContainerLog and ContainerLogV2
- Tables used by Application Insights, like AppTraces
Please note that existing custom log tables, and custom logs ingested with either the Microsoft Monitoring Agent or the Log Analytics ingestion APIs (e.g. via Logstash), are not supported!
This feature is still in public preview. Check Microsoft’s documentation regarding Basic logs for the latest updates.
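Switching a supported table to the Basic tier can be done in the Azure portal, but also programmatically through the Log Analytics Tables API. Below is a minimal Python sketch of how such a call could look; the subscription, resource group, workspace and table names are placeholders, and it assumes your identity has permissions to manage the workspace.

```python
# Sketch: switch a supported table to the Basic log tier via the Tables API.
# Subscription, resource group, workspace and table names are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "rg-sentinel"
WORKSPACE = "la-sentinel"
TABLE = "ContainerLog"  # must be a table type that supports Basic logs

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.OperationalInsights"
    f"/workspaces/{WORKSPACE}/tables/{TABLE}?api-version=2021-12-01-preview"
)

# Set the table plan to 'Basic' (use 'Analytics' to switch back).
response = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"properties": {"plan": "Basic"}},
)
response.raise_for_status()
print(response.json())
```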
Archive logs
More and more customers have been asking for options to retain their data in Log Analytics past the 730-day limit. Now Microsoft delivers, with the option to extend data retention into an Archive tier for up to seven years.
Tables in your Log Analytics workspace have always had the ability to deviate from the global workspace retention setting to shorten or even extend the lifetime of your data. But because you could only configure this via an API call, it was considered a little “secret”. Now, with Basic and Archive logs, Microsoft has also updated the UI to incorporate table-level retention, with the added ability to configure totalRetentionInDays.
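For reference, a minimal sketch of what that table-level retention call could look like against the Tables API, with placeholder resource names and example values (30 days of interactive retention, roughly seven years in total):

```python
# Sketch: set interactive retention to 30 days and total (archive) retention to ~7 years
# for a single table. Resource names and values are placeholders.
import requests
from azure.identity import DefaultAzureCredential

url = (
    "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-sentinel/providers/Microsoft.OperationalInsights"
    "/workspaces/la-sentinel/tables/SecurityEvent?api-version=2021-12-01-preview"
)
body = {"properties": {"retentionInDays": 30, "totalRetentionInDays": 2555}}

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
resp = requests.patch(url, headers={"Authorization": f"Bearer {token}"}, json=body)
resp.raise_for_status()
```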
Getting data back from Archive logs is done via Search and Restore jobs.
Search jobs
➕ Parallel processing jobs in the background
➕ Results are published in a new table with the _SRCH suffix
➕ Will not impact workspace performance
➕ Queries will never time out
➖ Search jobs are charged separately
➖ Only one table at a time
➖ No KQL!
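One way to start a search job programmatically is by creating a results table with the _SRCH suffix through the same Tables API. A hedged sketch, with placeholder resource names, time range and query:

```python
# Sketch: start a search job against the archived SecurityEvent table.
# Resource names, table name, time range and query are placeholders.
import requests
from azure.identity import DefaultAzureCredential

url = (
    "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-sentinel/providers/Microsoft.OperationalInsights"
    "/workspaces/la-sentinel/tables/SecurityEvent_SuspectLogons_SRCH"
    "?api-version=2021-12-01-preview"
)
body = {
    "properties": {
        "searchResults": {
            "query": "SecurityEvent | where EventID == 4625",  # simple filtering only
            "limit": 1000,
            "startSearchTime": "2022-01-01T00:00:00Z",
            "endSearchTime": "2022-01-31T00:00:00Z",
        }
    }
}

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
resp = requests.put(url, headers={"Authorization": f"Bearer {token}"}, json=body)
resp.raise_for_status()  # results land in the SecurityEvent_SuspectLogons_SRCH table
```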
Restore jobs
➕ No impact on workspace performance
➕ Uses elastic compute in background for additional load
➕ Restores to hot cache, into a new table with the _RST suffix
➕ Full set of KQL functions is supported
➖ It might become expensive when restoring larger datasets
➖ Minimum of 2 TB is charged for smaller restores
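Similarly, a restore can be started by creating a table with the _RST suffix via the Tables API. A minimal sketch, again with placeholder names and an example time range:

```python
# Sketch: restore a time range of the archived SecurityEvent table into a _RST table.
# Resource names, table name and time range are placeholders.
import requests
from azure.identity import DefaultAzureCredential

url = (
    "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-sentinel/providers/Microsoft.OperationalInsights"
    "/workspaces/la-sentinel/tables/SecurityEvent_Incident42_RST"
    "?api-version=2021-12-01-preview"
)
body = {
    "properties": {
        "restoredLogs": {
            "sourceTable": "SecurityEvent",
            "startRestoreTime": "2022-01-10T00:00:00Z",
            "endRestoreTime": "2022-01-12T00:00:00Z",
        }
    }
}

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
resp = requests.put(url, headers={"Authorization": f"Bearer {token}"}, json=body)
resp.raise_for_status()  # query SecurityEvent_Incident42_RST with full KQL once the restore completes
```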
So, with Search jobs you can determine whether the archived data contains what you’re looking for. Following up with a Restore job, the data is brought back and ready for more extensive searching and filtering.
This feature is still in public preview. Check Microsoft’s documentation regarding data retention and archive policies for the latest updates.
Ingest DCR-based custom logs
As mentioned above, changing log tables to the Basic tier comes with certain limitations. For custom logs you’ll need to use the new custom logs API, which leverages another new Azure resource: the Data Collection Endpoint.
A Data Collection Endpoint (DCE) creates an external URI for you to POST your custom logs to. Attached to the DCE is a Data Collection Rule (DCR) which determines what the destination of these logs should be. The results land in a custom table (with the _CL suffix) inside the workspace, seemingly similar to the ones you may have seen before.
In the past, custom log tables were automatically created once you pushed logs to the Log Analytics API. But this new approach is quite different and also brings quite a few benefits:
- Column names are no longer auto-generated and suffixed to represent their data type (i.e. _b for boolean, _s for string and _d for double) upon first ingestion.
- With transformation rules you can parse and filter data on-the-fly while it is being ingested. This is huge! (See the sketch after this list.)
- Custom logs can be sent to multiple different destinations.
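To give an idea of where such a transformation lives: inside the Data Collection Rule, each data flow can carry a transformKql expression that is applied to the incoming records (referenced as source). The fragment below is a hedged sketch of that part of a DCR, expressed as a Python dict for illustration; the stream name, destination name and columns are hypothetical.

```python
# Sketch of the dataFlows section of a DCR, shown as a Python dict for illustration.
# Stream name, destination name and columns are hypothetical examples.
data_flows = [
    {
        "streams": ["Custom-MyAppLogs_CL"],      # incoming custom stream
        "destinations": ["la-sentinel"],         # Log Analytics destination defined elsewhere in the DCR
        # Applied at ingestion time: drop noisy rows and derive TimeGenerated on the fly.
        "transformKql": (
            "source"
            " | where Level != 'DEBUG'"
            " | extend TimeGenerated = todatetime(Timestamp)"
        ),
        "outputStream": "Custom-MyAppLogs_CL",   # table the records end up in
    }
]
```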
Getting started
To help you get acquainted with this new approach, Microsoft has prepared a tutorial which guides you through the process of preparing and setting up custom log ingestion, as well as uploading some Apache sample logs.
To quickly summarize, you’ll need to setup the following:
- Create a Data Collection Endpoint and take note of the URI displayed in the overview blade.
- Create a DCR-based custom logs table within the Log Analytics interface. During this step you’ll also create a Data Collection Rule, assign a Data Collection Endpoint as well as create your transformation rule.
- You need to note down the immutableId of the Data Collection Rule created in step #2. Within the Data Collection Rule the destination workspace is assigned, as well as the stream name, which is based on the name of the custom table including the _CL suffix.
- The role Monitoring Metrics Publisher needs to be assigned to the identity responsible for performing the POST to the API. Microsoft uses an app registration in their tutorial, but in part #2 of this article we’ll be using a system-assigned managed identity from an Azure Logic App.
With this in place you’ll be able to construct the complete URI:
{Data Collection Endpoint URI}/dataCollectionRules/{DCR Immutable ID}/streams/{Stream Name}?api-version=2021-11-01-preview
For example:
https://custom-logs-1a2b3.westeurope-1.ingest.monitor.azure.com/dataCollectionRules/dcr-01/streams/Custom-table_name_CL?api-version=2021-11-01-preview
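To make that concrete, here is a minimal sketch of such a POST in Python. The DCE URI is taken from the example above; the DCR immutable ID, stream name and record layout are placeholders, and the identity used (here DefaultAzureCredential) is assumed to hold the Monitoring Metrics Publisher role.

```python
# Sketch: push custom log records to a Data Collection Endpoint.
# DCR immutable ID, stream name and record fields are placeholders.
import requests
from azure.identity import DefaultAzureCredential

DCE_URI = "https://custom-logs-1a2b3.westeurope-1.ingest.monitor.azure.com"
DCR_IMMUTABLE_ID = "dcr-00000000000000000000000000000000"
STREAM_NAME = "Custom-table_name_CL"

url = (
    f"{DCE_URI}/dataCollectionRules/{DCR_IMMUTABLE_ID}"
    f"/streams/{STREAM_NAME}?api-version=2021-11-01-preview"
)

# Token audience for the logs ingestion endpoint.
token = DefaultAzureCredential().get_token("https://monitor.azure.com//.default").token

# Body is a JSON array of records; column names must match the stream declaration in the DCR.
records = [
    {"TimeGenerated": "2022-05-01T12:00:00Z", "RawData": "example event"},
]

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=records,
)
resp.raise_for_status()
```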
Archive Microsoft 365 Defender events
I guess this is the entire reason why you’ve clicked on the link and started reading this article. And yes, now that I’ve laid the groundwork, it’s time to dig into the nitty-gritty details of how you can leverage Archive and Basic logs for your 365 Defender events.
Why?
Microsoft currently provides an excellent data connector for Sentinel to natively ingest 365 Defender events. But most customers who’ve enabled it have probably also faced its (only) major downside: it generates a LOT of data FAST!
If you have special use cases where incident data coming from one product needs to be correlated with incidents from another, then the native connector is the way to go, because analytics rules cannot make use of Basic logs tables.
But as I’ve mentioned in the introduction, I hear from more and more customers who need to extend the 30-day limit for querying data in the Advanced Hunting UI. By ingesting the logs into Sentinel we’re able to query the data and extend its lifetime to up to seven years with the new Archive logs. But depending on the size of your environment, you’ll be facing high ingest costs for doing so as well.
This is because the data is still ingested into “regular” Analytics logs tables. And even if you bring the retention for these tables down to 30 days (which is also the lowest amount available), this might still generate quite some data volume. I’ve seen cases where customers generate over 50 GB on a daily basis. That’s 1,5 TB per month just to “temporarily” store the data before it goes into Archive logs!
Basic logs to the rescue!
To work around this, you want the temporary storage to leverage the benefits of Basic logs before the data goes into Archive:
- Store data at a much lower rate, at almost 1/5th of the price you’d be charged for Analytics logs (€ 1,10 / GB instead of € 5,32 / GB for West Europe)
- Basic logs have a shorter retention of only 8 days
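Using the 50 GB/day example from earlier and the West Europe list prices mentioned above, a quick back-of-the-envelope comparison of that “temporary” ingestion cost looks like this (prices change over time, so treat these numbers as illustrative only):

```python
# Back-of-the-envelope ingestion cost comparison, using the figures from this article.
DAILY_VOLUME_GB = 50          # example daily M365 Defender volume
ANALYTICS_PRICE = 5.32        # € per GB, West Europe (Analytics logs)
BASIC_PRICE = 1.10            # € per GB, West Europe (Basic logs)

analytics_month = DAILY_VOLUME_GB * 30 * ANALYTICS_PRICE   # ~€ 7,980 per month
basic_month = DAILY_VOLUME_GB * 30 * BASIC_PRICE           # ~€ 1,650 per month

print(f"Analytics logs: € {analytics_month:,.2f} / month")
print(f"Basic logs:     € {basic_month:,.2f} / month")
```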
Since we cannot convert existing tables to the Basic tier, we need to ingest the Microsoft 365 Defender logs into Sentinel as custom logs. And for the Basic tier to be available we also need to leverage a Data Collection Endpoint as our ingestion method.
But how?!
For this example I’m going to use a combination of an Azure Storage Account and Azure Logic App to store the data into Sentinel.
- By using the streaming API feature of Microsoft 365 Defender we can stream all logs to blob containers on an Azure Storage Account. We only need to store the data there for a couple of days, so the added costs within this setup are quite low.
- An Azure Logic app will query the blob containers on a daily basis to collect all logs from the previous day.
- Next, it’ll construct a proper API call to push all of the log entries to a Data Collection Endpoint with the relevant stream name.
- Because of a limit of 10 streams per Data Collection Rule (more on this later in part #2 of this article), we need multiple DCRs; one for every table, to be exact.
- Custom logs flow into Sentinel/Log Analytics, where the tables can be configured to use Basic logs. Apart from the _CL table name suffix, nothing in your KQL queries needs to change: all column names stay exactly the same thanks to this new custom logs ingestion method.
- After 8 days the data is moved to Archive logs, where it can be retained for up to seven years. In case of a major security incident, you can search your logs and restore parts of them for further forensic investigation if needed.
To be continued…
In the next part of this article I’ll walk you through the intricate details of the Logic App and explain all challenges that I had to overcome to get this solution to work.
Part #2 is now also available to read!
— Koos