Use Sentinel Basic and Archive logs

Koos Goossens
May 16, 2022 · 10 min read


to extend retention of 365 Defender logs past the 30-day limit! [part 1/2]

Update

I recently wrote an article about storing Microsoft 365 Defender data in Azure Data Explorer (ADX). Because of the versatility of ADX, you might want to reconsider using Sentinel Basic logs. I built a fully automated solution to set up the ADX environment, called it ‘ArchiveR’ 🤖, and here you can read all about it.

Introduction

I regularly get asked questions about data retention in Microsoft Sentinel. Some customers want longer data retention than was previously possible within Log Analytics to meet compliance requirements. Others have table-specific needs for storing data. And lately these questions come up when discussing Microsoft 365 Defender data as well.

Although 365 Defender offers some data retention options, with a maximum of 180 days these are far more limited than what we’re used to in Sentinel. And for your raw logs, used for Advanced Hunting or more advanced triage and investigations, you’re even more limited: these are only stored for 30 days, regardless of which setting you choose for global data retention.

Unfortunately ‘raw’ data for Advanced Hunting and investigations is only available for 30 days

Recently Microsoft announced new tiers for storing data in Sentinel in public preview: the Basic and Archive log tiers. These new tiers accommodate more elaborate retention requirements as well as greater control over ingestion costs.

So how do the three log tiers differ from each other? What new capabilities and limits do they bring? And most importantly (for this article at least): can we ingest raw data from Microsoft 365 Defender into the new Archive tier? And can we leverage the benefits of Basic logs in between, to minimize costs?

These are the questions I’ll be answering in this article. But since we have a lot of ground to cover, I’d like to make this my first multi-part article and split it into two distinct sections:

  • Part 1 | Introduction to new log tiers [📍you are here ]
    – Basic and Archive log tier details
    – New custom log ingestion method with Data Collection Endpoints
  • Part 2 | Archiving Microsoft 365 Defender logs
    – Downsides and limitations of the integrated M365 data connector
    – Use Logic App to ingest 365 Defender data as custom logs into Basic table

So if you’re already familiar with the concepts described in this first part, you may want to dive straight into the next one.

I hope you’re young enough not to remember this early 2000s meme. ;-)

New log tiers

Azure Log Analytics (and thus also Sentinel) has received two new log tiers: Basic and Archive. The already existing way of ingesting logs into your workspace is now called Analytics logs.

Both Analytics logs and Basic logs can be combined for different log streams and act as the storage solution for your log ingestion.

Once logs are stored in either of these tiers, they can be moved into Archive logs later, depending on the retention requirements set on both the table and the workspace.

Data stored in Archive can be retrieved with Search jobs and/or Restore jobs, after which the data is pulled back into an Analytics logs table:

Diagram showing the three different log tier types available

Each of these three log tier types comes with its own “quirks and features” of course, so let’s dig into the specifics of each tier.

Basic logs

This tier is designed to provide an extremely cost-effective way to store logs that are high in volume but low in priority. You might be required to store these “chatty” logs to meet compliance requirements, or they might add value during investigations. NetFlow or other network-related logs are a good example of data you might want to store in a Basic logs table.

To summarize:

➕ Reduced ingestion price
➕ Can be archived the same way as Analytics logs
➖ Data retention of only 8 days
➖ Data is not available for analytic rules or log alerts
➖ Reduced KQL functions *
➖ Queries are charged separately

* No join(), union(), or aggregations (e.g. summarize())

All tables in Log Analytics are of the Analytics tier type by default. You can configure particular tables to use Basic logs, but there are some limitations on which tables support this. At the time of writing only the following tables are supported:

  • All custom log tables created as Data Collection Rule (DCR)-based. (more on this later!)
  • Tables used by Container Insights like ContainerLog and ContainerLogV2.
  • Tables used by Application Insights like AppTraces.

Please note that existing custom log tables, and custom logs ingested with either the Microsoft Monitoring Agent or the Log Analytics ingestion APIs (e.g. Logstash), are not supported!

This feature is still in public preview. Check Microsoft’s documentation regarding Basic logs for the latest updates.
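To give you an idea of what that looks like in practice, here’s a minimal sketch of switching a supported table to Basic logs through the Tables management API (the same change can also be made in the portal). The subscription, resource group, workspace and table names below are placeholders, and the api-version is the preview version available at the time of writing.

```python
# Minimal sketch: switch a DCR-based custom table to the Basic log plan via the
# Log Analytics Tables API. All resource names below are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
WORKSPACE = "<workspace-name>"
TABLE = "table_name_CL"          # a DCR-based custom table that supports Basic logs

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.OperationalInsights"
    f"/workspaces/{WORKSPACE}/tables/{TABLE}"
    "?api-version=2021-12-01-preview"        # preview api-version at the time of writing
)

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

# Only the 'plan' property is changed here; setting it to 'Analytics' switches it back.
resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"properties": {"plan": "Basic"}},
)
resp.raise_for_status()
```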

Archive logs

More and more customers were asking for options to retain their data in Log Analytics past the 730-day limit. Now Microsoft delivers, with the option to extend data retention into an Archive tier for up to seven years.

Tables in your Log Analytics workspace have always had the ability to deviate from the global workspace retention to shorten or even extend the lifetime of your data. But because you could only configure this via an API call, it was considered a little “secret”. Now, with Basic and Archive logs, Microsoft has also updated the UI to incorporate table-level retention, with the added ability to configure totalRetentionInDays.

The new interface showing table-level retention, with an example where the workspace retention is set to 90 days and this particular table was configured to only retain data for 60 days. After 60 days, data is moved into Archive logs and kept there for another 1,400 days (4 years in total).
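If you’d rather script this than click through the UI, the same Tables API exposes both retention knobs. Here’s a minimal sketch reproducing the example above (60 days of interactive retention, 1,460 days in total, so 1,400 days of Archive); resource names are placeholders and the api-version is the preview one from the time of writing.

```python
# Minimal sketch: set table-level retention plus total (archive) retention.
#   retentionInDays      = interactive retention (Analytics/Basic tier)
#   totalRetentionInDays = interactive + archive retention combined
import requests
from azure.identity import DefaultAzureCredential

# Placeholders: fill in your own subscription, resource group, workspace and table.
url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.OperationalInsights"
    "/workspaces/<workspace-name>/tables/<table-name>"
    "?api-version=2021-12-01-preview"
)

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

# 60 days interactive + 1,400 days archive = 1,460 days (4 years) in total
resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"properties": {"retentionInDays": 60, "totalRetentionInDays": 1460}},
)
resp.raise_for_status()
```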

Getting data back from Archive logs is done via Search and Restore jobs.

Search jobs

➕ Parallel processing jobs in the background
➕ Results are published in a new table with _SRCH suffix
➕ Will not impact workspace performance
➕ Queries will never time out
➖ Search jobs are charged separately
➖ Only one table at a time
➖ No KQL!

Restore jobs

➕ No impact on workspace performance
➕ Uses elastic compute in background for additional load
➕ Restores to hot cache, new table with _RST suffix
➕ Full set of KQL functions is supported
➖ It might become expensive when restoring larger datasets
➖ Minimum of 2 TB is charged for smaller restores

So, with Search jobs you can determine whether the archived data contains what you’re looking for. By following up with a Restore job, the data is brought back and ready for more extensive searching and filtering.
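Both types of jobs can also be triggered via the API instead of the UI. Below is a rough sketch of starting a search job by creating a new table whose name ends in _SRCH, based on the preview Tables API as documented at the time of writing; the resource names, query and time range are just examples. A restore job works the same way, using a _RST table name and a restoredLogs payload (with sourceTable, startRestoreTime and endRestoreTime) instead of searchResults.

```python
# Rough sketch: start a search job against archived data by creating a *_SRCH table.
# Resource names, query and time range are examples only.
import requests
from azure.identity import DefaultAzureCredential

workspace_path = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.OperationalInsights"
    "/workspaces/<workspace-name>"
)

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

search_job = {
    "properties": {
        "searchResults": {
            # Deliberately simple: search jobs only accept a very limited set of
            # operations (hence the 'No KQL!' limitation listed above).
            "query": "DeviceEvents_CL | where ActionType == 'AntivirusDetection'",
            "limit": 1000,
            "startSearchTime": "2022-01-01T00:00:00Z",
            "endSearchTime": "2022-01-31T00:00:00Z",
        }
    }
}

# Results are published in a new table; its name must end with _SRCH.
resp = requests.put(
    f"{workspace_path}/tables/DeviceEvents_AV_SRCH?api-version=2021-12-01-preview",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=search_job,
)
resp.raise_for_status()
```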

The new search interface lets you initiate search jobs and collect their results. It also provides the ability to follow up with a restore job.

This feature is still in public preview. Check Microsoft’s documentation regarding data retention and archive policies for the latest updates.

Ingest DCR-based custom logs

As mentioned above, changing log tables to the Basic tier comes with certain limitations. For custom logs you’ll need to use the new custom logs API, which leverages another new Azure resource: the Data Collection Endpoint.

A Data Collection Endpoint (DCE) creates an external URI for you to POST your custom logs to. Attached to the DCE is a Data Collection Rule (DCR) which determines what the destination of these logs should be. The results land in a custom table (with _CL suffix) inside the workspace, seemingly similar to the ones you may have seen before.

General overview of ingesting DCR-based custom logs with a DCE

In the past, custom log tables were automatically created once you pushed logs to the Log Analytics API. This new approach is quite different and also brings quite a few benefits:

  • Column names are no longer auto-generated and suffixed to represent their data type (e.g. _b for boolean, _s for string and _d for double) upon first ingestion.
  • With transformation rules you can parse and filter data on-the-fly while ingesting it. This is huge! (See the fragment below.)
  • Custom logs can be sent to multiple different destinations.
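To give you a feeling for where such a transformation lives, here’s a trimmed, hypothetical fragment of a DCR-based custom logs Data Collection Rule, written out as a Python dict for readability. The stream name, columns, workspace reference and KQL are placeholders; the transformKql runs against every incoming batch (referred to as “source”) before the data is written to the destination table.

```python
# Hypothetical, trimmed fragment of a DCR for DCR-based custom logs, written out
# as a Python dict. The transformKql is applied to every incoming batch ("source")
# before the data lands in the destination table.
dcr_fragment = {
    "properties": {
        "dataCollectionEndpointId": "<resource id of the Data Collection Endpoint>",
        "streamDeclarations": {
            "Custom-table_name_CL": {            # schema of the incoming stream
                "columns": [
                    {"name": "TimeGenerated", "type": "datetime"},
                    {"name": "RawData", "type": "string"},
                ]
            }
        },
        "destinations": {
            "logAnalytics": [
                {"workspaceResourceId": "<workspace resource id>", "name": "myWorkspace"}
            ]
        },
        "dataFlows": [
            {
                "streams": ["Custom-table_name_CL"],
                "destinations": ["myWorkspace"],
                # Parse and filter on-the-fly; drop whatever you don't want to pay for
                "transformKql": "source | where RawData !has 'heartbeat'",
                "outputStream": "Custom-table_name_CL",
            }
        ],
    }
}
```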

Getting started

To help you get acquainted with this new approach, Microsoft has prepared a tutorial which guides you through the process of preparing and setting up custom log ingestion, as well as uploading some Apache sample logs.

To quickly summarize, you’ll need to set up the following:

  1. Create a Data Collection Endpoint and take note of the URI displayed in the overview blade.
  2. Create a DCR-based custom logs table within the Log Analytics interface. During this step you’ll also create a Data Collection Rule, assign a Data Collection Endpoint, and create your transformation rule.
  3. You need to note down the immutableId of the Data Collection Rule created in step #2. Within the Data Collection Rule the destination workspace is assigned, as well as the stream name based on the name of the custom table, including the _CL suffix.
  4. The role Monitoring Metrics Publisher needs to be assigned to the identity responsible for performing the POST method against the API. Microsoft uses an app registration in their tutorial, but in part #2 of this article we’ll be using a system-assigned managed identity from an Azure Logic App.
Without proper permissions you will not pass!

With this in place you’ll be able to construct the complete URI:

{Data Collection Endpoint URI}/dataCollectionRules/{DCR Immutable ID}/streams/{Stream Name}?api-version=2021-11-01-preview

For example:

https://custom-logs-1a2b3.westeurope-1.ingest.monitor.azure.com/dataCollectionRules/dcr-01/streams/Custom-table_name_CL?api-version=2021-11-01-preview
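To illustrate what an actual call against this URI could look like, here’s a minimal Python sketch using the example values above (which are of course placeholders). The token scope is the one used in Microsoft’s tutorial, and the body is simply a JSON array of records matching the stream declaration. In part #2 an Azure Logic App will perform this same POST using its managed identity.

```python
# Minimal sketch: POST a batch of custom log records to a Data Collection Endpoint.
# The URI parts below are the (hypothetical) example values from above.
import requests
from azure.identity import DefaultAzureCredential

DCE_URI = "https://custom-logs-1a2b3.westeurope-1.ingest.monitor.azure.com"
DCR_IMMUTABLE_ID = "dcr-01"              # immutableId of the Data Collection Rule
STREAM_NAME = "Custom-table_name_CL"     # stream name, including the _CL suffix

url = (
    f"{DCE_URI}/dataCollectionRules/{DCR_IMMUTABLE_ID}"
    f"/streams/{STREAM_NAME}?api-version=2021-11-01-preview"
)

# The identity used here needs the Monitoring Metrics Publisher role (step #4).
token = DefaultAzureCredential().get_token("https://monitor.azure.com//.default").token

# The body is a JSON array of records matching the stream declaration of the DCR.
records = [
    {"TimeGenerated": "2022-05-16T08:00:00Z", "RawData": "example event 1"},
    {"TimeGenerated": "2022-05-16T08:00:05Z", "RawData": "example event 2"},
]

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=records,
)
resp.raise_for_status()    # a 204 response means the batch was accepted
```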

Archive Microsoft 365 Defender events

I guess this is the entire reason why you’ve clicked on the link and started reading this article. And yes, now that I’ve laid the groundwork, it’s time to dig into the nitty-gritty details of how you can leverage Archive and Basic logs for your 365 Defender events.

Why?

Microsoft currently provides an excellent Data Connector for Sentinel to natively ingest 365 Defender events. But most customers who’ve enabled it have probably also faced the (only) major downside: it generates a LOT of data FAST!

Screenshot showing all of the individual products now supported by the 365 Defender Data Connector.

If you have special use cases where incident data coming from one product needs to be correlated with incidents from another, then this is the way to go, because analytics rules cannot make use of Basic logs tables.

But as I mentioned in the introduction, I hear more and more customers needing to extend the 30-day limit for querying data in the Advanced Hunting UI. By ingesting the logs into Sentinel we’re able to query the data and extend its lifetime to up to seven years with the new Archive logs. But depending on the size of your environment, you’ll be facing high ingestion costs for doing so.

This is because the data will still be ingested into “regular” Analytics logs tables. And even if you bring the retention for these tables down to 30 days (which is also the lowest amount available), this might still generate quite some data volume. I’ve seen cases where customers generate over 50 GB on a daily basis. That’s 1,5 TB per month just to “temporarily” store the data before it goes into Archive logs!

Basic logs to the rescue!

To work around this, you want the temporary storage to leverage the benefits of Basic logs before the data goes into Archive:

  • Store data at a much lower rate, almost 1/5th of the price you’ll be charged for Analytics logs (€ 1,10/GB instead of € 5,32/GB for West Europe)
  • Basic logs have a shorter retention of only 8 days

Since we cannot convert existing tables to the Basic tier, we need to ingest the Microsoft 365 Defender logs into Sentinel as custom logs. And for the Basic tier to be available we also need to leverage a Data Collection Endpoint as our ingestion method.

But how?!

For this example I’m going to use a combination of an Azure Storage Account and an Azure Logic App to get the data into Sentinel.

  1. By using the streaming API feature of Microsoft 365 Defender we can stream all logs to blob containers on an Azure Storage Account. We only need to store the data there for a couple of days, so the added cost of this setup is quite low.
  2. An Azure Logic App queries the blob containers on a daily basis to collect all logs from the previous day.
  3. Next, it’ll construct a proper API call to push all of the log entries to a Data Collection Endpoint with the relevant stream name. (A rough sketch of steps 2 and 3 follows below.)
  4. Because of a limit of 10 streams per Data Collection Rule (more on this later in part #2 of this article), we need multiple DCRs; one for every table, to be exact.
  5. Custom logs flow into Sentinel/Log Analytics, where the tables can be configured to use Basic logs. Apart from the _CL table name suffix, nothing within your KQL queries needs to be changed. All column names stay exactly the same thanks to this new custom log ingestion method.
  6. After 8 days the data is moved to Archive logs, where it can be retained for up to seven years. In case of a major security incident, you can search your logs and restore parts of them for further forensic investigation if needed.
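To make this a bit more tangible, below is a rough, conceptual Python sketch of steps 2 and 3. Keep in mind that the actual solution in part #2 is built as a Logic App; the container names, blob path layout and JSON shape of the streamed events are assumptions here, so treat this purely as an illustration of the flow.

```python
# Conceptual sketch of steps 2 and 3 only; the actual solution in part #2 is a Logic App.
# Container names, blob path layout and the JSON shape of the streamed events are
# assumptions - adjust them to what the streaming API writes in your environment.
import json
from datetime import datetime, timedelta, timezone

import requests
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

STORAGE_URL = "https://<storage-account>.blob.core.windows.net"
DCE_URI = "https://custom-logs-1a2b3.westeurope-1.ingest.monitor.azure.com"

# Hypothetical mapping: one container per Defender table, each with its own DCR + stream
TABLES = {
    "insights-logs-advancedhunting-deviceevents": ("<dcr-immutable-id-1>", "Custom-DeviceEvents_CL"),
    "insights-logs-advancedhunting-devicelogonevents": ("<dcr-immutable-id-2>", "Custom-DeviceLogonEvents_CL"),
}

credential = DefaultAzureCredential()
blob_service = BlobServiceClient(STORAGE_URL, credential=credential)
token = credential.get_token("https://monitor.azure.com//.default").token
yesterday = datetime.now(timezone.utc) - timedelta(days=1)
date_part = f"y={yesterday:%Y}/m={yesterday:%m}/d={yesterday:%d}"   # assumed path layout

for container, (dcr_id, stream) in TABLES.items():
    container_client = blob_service.get_container_client(container)
    for blob in container_client.list_blobs():
        if date_part not in blob.name:          # only yesterday's blobs
            continue
        lines = container_client.download_blob(blob.name).readall().splitlines()
        # Assumption: each line is a JSON object carrying the event in 'properties'
        records = [json.loads(line)["properties"] for line in lines if line.strip()]
        if not records:
            continue
        url = f"{DCE_URI}/dataCollectionRules/{dcr_id}/streams/{stream}?api-version=2021-11-01-preview"
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
            json=records,   # in practice, batch these to stay under the payload size limit
        )
        resp.raise_for_status()
```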

To be continued…

In the next part of this article I’ll walk you through the intricate details of the Logic App and explain all challenges that I had to overcome to get this solution to work.

Part #2 is now also available to read!

— Koos
