
2021年07月22日 Rick Jury

Optimize the value of CloudTrail logs with the infrequent tier

A common scenario in log analytics is that many log events are high value for real-time analytics, while other events are low value for analytics yet account for a very large percentage of overall log volume.

Often these same low value logs are used only for occasional ad-hoc investigations, or need to be retained purely for audit purposes.

Sumo Logic data tiering provides a comprehensive solution for all types of data that an organization has, low touch, high touch and everything in between, at an economical price. Data Tiers provide tier-based pricing based on your planned usage of the data you ingest.

  • The continuous data tier provides the best Sumo Logic features with a fixed ingest cost per GB in credits.

  • Where users don't require dashboards or real time alerts, the infrequent data tier can ingest logs at a vastly reduced credits cost per GB, with an additional small scan charge per search run. If implemented correctly, credits savings per GB can exceed 90% vs the continuous data tier.

This article shows how to work through analysis and the implementation of data tiering for AWS Cloudtrail logs - but the same analysis methodology and tiering design can be used for any log type.

Routing low value Cloudtrail events to the infrequent tier

AWS CloudTrail is a critical log source for security analytics in AWS environments, and Sumo Logic has many out-of-the-box CloudTrail apps for security and operations use cases.

Cloudtrail logging can also be very verbose, and it's quite common that some AWS workloads can generate such large volumes of Cloudtrail events that the log analytics budget can be under intense cost pressure!

By identifying and routing low value logs to the infrequent data tier Sumo Logic customers can match credits consumption to the analytics value of the Cloudtrail log based on categories such as AWS service, event or workload.

Users can set up CloudTrail ingestion such that:

  • Events with real time analytics value are in 'continuous' tier and available for Cloudtrail apps, dashboards and alerting;

  • Low value, large volume CloudTrail events are stored in the infrequent tier, where they are available for ad-hoc search if required;

  • Overall solution cost is greatly reduced;

  • All events are online for immediate search with no archive retrieval delays;

  • All events are stored in the same Sumo Logic highly secure, scalable cloud platform.

In the next sections we will see how to:

  • Check for and identify high volume / low value Cloudtrail logs

  • Create an infrequent tier routing solution in Sumo Logic using a custom 'tier' field, field extraction rules and an infrequent partition.

Walkthrough

1. Analysis: Find What Makes the Volume

To make data tiering worthwhile, we need to identify one or more segments of the logs that account for more than 10% of the total volume.

First we need to understand which CloudTrail events generate the most volume. The Sumo Logic Data Volume index is valuable for comparing volume across source categories, but we need a different approach to find the volume drivers inside a single category.

The key is to analyze the breakdown of CloudTrail events using properties of each event, and find the largest categories by the built-in _size field.

Here is an example CloudTrail event shown in the Sumo Logic UI; some of the most commonly used fields are highlighted in the green boxes.

[Image: CloudTrail Infrequent Tier Walkthrough 1]

To identify the largest contributors among CloudTrail events, we can break down volume by fields like eventname or eventsource and look for large contributors to overall size.

Here is an example Sumo Logic query that analyzes CloudTrail log volume by a custom dimension, eventname, using the _size field.

_sourcecategory=*cloudtrail*
| json field=_raw "eventSource"
| json field=_raw "eventName"
| sum(_size) as bytes by eventname
| sort by bytes
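If you export CloudTrail events out of Sumo Logic (or read them straight from a trail's log files), the same breakdown can be reproduced locally. This Python sketch is an assumption-laden illustration, not part of the Sumo Logic product: it assumes one JSON event per line and uses the serialized length as a stand-in for the _size field, and the helper name is hypothetical.

```python
import json
from collections import Counter

def bytes_by_event_name(lines):
    """Sum event bytes per eventName, mirroring `sum(_size) as bytes by eventname`."""
    totals = Counter()
    for line in lines:
        event = json.loads(line)
        # Sumo Logic's _size is the raw message size; here we approximate it
        # with the length of the serialized event.
        totals[event.get("eventName", "unknown")] += len(line)
    return totals

# Toy sample: two padded Decrypt events and one small AssumeRole event.
sample = [
    json.dumps({"eventName": "Decrypt", "eventSource": "kms.amazonaws.com",
                "requestParameters": {"blob": "a" * 300}}),
    json.dumps({"eventName": "Decrypt", "eventSource": "kms.amazonaws.com",
                "requestParameters": {"blob": "a" * 300}}),
    json.dumps({"eventName": "AssumeRole", "eventSource": "sts.amazonaws.com"}),
]
totals = bytes_by_event_name(sample)
for name, size in totals.most_common():
    print(name, size)
```

Sorting the totals descending, as `sort by bytes` does in the query, immediately surfaces the dominant event name.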

Use the pie chart visualisation to see which are the top contributors. In the example below, Decrypt events generate over 35% of traffic in this account; this makes them a top candidate for routing to the infrequent tier to reduce credit consumption.

[Image: CloudTrail Infrequent Tier Walkthrough 2]

Further Filtering

Once you have determined the high level volume segments, it's important to understand which AWS workload is creating the verbose event logging. To do this, update your search to filter for a single large category, such as eventname="Decrypt", then repeat the bytes breakdown by other key fields to find which workloads create the events.

This example query generates a breakdown filtered to only Decrypt events, while providing context using other fields. Use the table aggregation view to view the results of this query.

_sourceCategory=*cloudtrail* decrypt
| json field=_raw "eventName"
| where eventname="Decrypt"
| json field=_raw "eventSource"
| json field=_raw "userIdentity.arn" as arn nodrop
| sum(_size) as bytes by eventsource, eventname, arn
| sort by bytes
| total bytes as total_bytes
| 100 * (bytes/total_bytes) as percent
| fields -total_bytes, bytes
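The filtered breakdown with percent-of-total can be mirrored locally under the same assumptions as before (one JSON event per line, serialized length as a proxy for _size; both helper names are hypothetical):

```python
import json
from collections import defaultdict

def decrypt_bytes_by_arn(lines):
    """Sum bytes per userIdentity.arn for Decrypt events only; like `nodrop`,
    events with no arn are kept (grouped under an empty key)."""
    totals = defaultdict(int)
    for line in lines:
        event = json.loads(line)
        if event.get("eventName") != "Decrypt":
            continue
        arn = event.get("userIdentity", {}).get("arn", "")
        totals[arn] += len(line)
    return totals

def percent_of_total(totals):
    """Mirror `total bytes as total_bytes | 100 * (bytes/total_bytes) as percent`."""
    grand = sum(totals.values())
    return {key: 100.0 * size / grand for key, size in totals.items()}

sample = [
    json.dumps({"eventName": "Decrypt"}),  # no user context
    json.dumps({"eventName": "Decrypt",
                "userIdentity": {"arn": "arn:aws:iam::111:role/vendor"}}),
    json.dumps({"eventName": "GetObject",
                "userIdentity": {"arn": "arn:aws:iam::111:role/etl"}}),
]
percents = percent_of_total(decrypt_bytes_by_arn(sample))
print(percents)
```

The percentages always sum to 100, so the output answers the same question as the table view: which ARNs account for what share of Decrypt volume.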

In this example account most Decrypt events have no user arn context, but events from a trusted third party role make up a significant portion.

[Image: CloudTrail Infrequent Tier Walkthrough 3]

Common Analytics Dimensions

The table below shows some suggested fields to break down your logs by, with example workload scenarios:

Field/Dimension: eventname
About: Name of the API call
Example scenarios creating verbose logging:

  • A lambda function behind an API gateway is called over 1 million times each day, and each call triggers AssumeRole and GetParameter.

  • A data warehousing batch process reads millions of s3 bucket objects each day, triggering a Decrypt event for each.

Field/Dimension: userIdentity.arn or userName
About: User context for assumed roles or user credentials
Example scenarios creating verbose logging:

  • Third party polling from trusted external vendor roles polls Cloudwatch metrics and configuration information, which overall contributes 30% of events.

Field/Dimension: eventsource
About: AWS API source for the event
Example scenarios creating verbose logging:

  • A data migration project to the cloud is underway. As Snowball devices are uploaded, massive volumes of events are generated for the bulk data import.

2. Validate the Use Case

Once the workload sources are understood, they can be discussed with the security and devops/SRE teams to establish whether they are suitable for the infrequent tier.

A good candidate for the infrequent tier is one that:

  • Comprises a significant share of the log volume (ideally 10% or more);

  • Is used for ad-hoc search only (i.e. no dashboards or real time alerts);

  • Can be isolated with an if statement that scopes a query to those logs.

The Search Audit index is a great way to validate what queries are being run against log sources, and what query type they are: dashboard, interactive search and so on. By analysing queries against CloudTrail logs over, say, a 30 day period, it's possible to validate:

  • Who searches the logs;

  • Whether they search all logs or just a subset of them;

  • Whether any dashboards and alerts that are used target all logs or just a subset.

Here is an example dashboard you can import into your Sumo Logic account that enables easier analytics of the search audit data. It can be filtered by query, type or user and shows what meta data strings are used as well as complete queries and types. You can use the searches in these panels as a start for custom analysis.

[Images: CloudTrail Infrequent Tier Walkthrough 4 and 4.5]


Continuing with our example scenario, administrators use the Search Audit index to analyse queries against CloudTrail logs. They find that the security team are the key users of CloudTrail logs, and that no queries appear to target Decrypt events directly.

After discussing with the security team, it's agreed that Decrypt events need to be kept for audit purposes, but that they are only occasionally searched in response to security investigations, so dashboards and alerting are not required.

3. Create A Routing Plan

Before configuring routing to the infrequent tier, you should have a clear plan for what data to route.

In our example scenario the team decided on the following routing plan.

  • Events with eventname = Decrypt could be routed to the infrequent tier, except where they are errors;

  • An error is defined as any event containing an errorCode key;

  • This should reduce continuous-tier ingest credits use by about 35%, while all events are still retained in Sumo Logic for audit purposes.
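The routing plan boils down to a single decision per event. Here is a minimal Python sketch of that rule (the function name is hypothetical; in Sumo Logic itself the rule is implemented with a field extraction rule, as the walkthrough goes on to show):

```python
def route_tier(event):
    """Decide the tier for a CloudTrail event under the example routing plan:
    Decrypt events with no errorCode key go to the infrequent tier;
    everything else, including Decrypt errors, stays continuous."""
    if event.get("eventName") == "Decrypt" and "errorCode" not in event:
        return "infrequent"
    return "continuous"

print(route_tier({"eventName": "Decrypt"}))                               # infrequent
print(route_tier({"eventName": "Decrypt", "errorCode": "AccessDenied"}))  # continuous
print(route_tier({"eventName": "AssumeRole"}))                            # continuous
```

Keeping Decrypt errors in the continuous tier preserves their value for real time security alerting while the bulk of successful Decrypt events moves to cheaper storage.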

4. Implement The Infrequent Tier

Let's walk through the steps the team takes to set up data tiering for CloudTrail.

Define A Tier Field

Open the Manage Data / Logs / Fields UI and add a field called tier if it's not already present. In the following steps, Sumo Logic will be configured to set the value of this field for each event, with the tier field used to route logs to partitions.

The team should agree on known values for this field, for example:

Value: continuous (default)
Routing strategy: continuous tier

Value: infrequent
Routing strategy: infrequent tier

Create Partitions

For CloudTrail, implement TWO partitions: one for continuous data and one for infrequent.

Partition name: cloudtrail
Type: continuous
Routing: _sourcecategory=cloudtrail not tier=infrequent
Contains: Most CloudTrail event types, but not Decrypt (unless the event contains an error code)
Use case: Mission critical, frequent searches and alerts

Partition name: cloudtrail_infreq
Type: infrequent
Routing: _sourcecategory=cloudtrail tier=infrequent
Contains: Only Decrypt events with no errorCode
Use case: Audit trail or ad hoc searches when required, using the search UI or API

[Image: CloudTrail Infrequent Tier Walkthrough 5]

Create A Field Extraction Rule

You can set a field value in Sumo Logic at the collector or source level, but a flexible way to set the tier field value per event is to use a field extraction rule (FER). FERs can accommodate very complex routing logic by using parsing and logical operators such as if.

Here's what the team creates for Cloudtrail logs, including one or more if statements to set the tier field value.

[Image: CloudTrail Infrequent Tier Walkthrough 6]

Source:

_sourcecategory=*cloudtrail*

Parse Expression:

parse "\"eventSource\":\"*\"" as eventsource
| parse "\"sourceIPAddress\":\"*\"" as sourceipaddress
| parse "\"eventName\":\"*\"" as eventname
| parse "\"awsRegion\":\"*\"" as awsRegion
| parse "\"userName\":\"*\"" as userName nodrop
| json field=_raw "userIdentity.arn" as arn nodrop
| json field=_raw "recipientaccountid" as recipientaccountid nodrop
| json field=_raw "errorCode" as errorCode nodrop
// tier routing if statements
| "continuous" as tier
| if (eventname = "Decrypt" and isempty(errorcode), "infrequent", tier) as tier

As new data streams in, it will be split into the two partitions based on the tier value. Most CloudTrail events will have 'continuous' as the tier field value, except for the exceptions set by the if statements.

5. Searching Infrequent Data

To search the infrequent data, users can search using the _index name:

_index=cloudtrail_infreq

Or, to search the whole infrequent tier, just add the _datatier modifier to any existing search:

_datatier=infrequent _sourcecategory=*cloudtrail*

Administrators can use a query such as the one below to validate that data is routing to the correct index, using the _view field and the _datatier modifier:

_sourcecategory=*cloudtrail* _datatier=infrequent // or _datatier=continuous
| _view as index
| count by _sourcecategory, index

No More Tears With Tiers

In summary, we learned that Sumo Logic data tiering solves the dilemma that not all logs are of equal analytics value (or size), enabling Sumo Logic customers to work with all log types at an economical price.

We learned how to use the _size field and parse out log-specific fields to do a detailed analysis of the contributors to overall log size.

Following the process outlined above, you can:

  • Understand what types of events within a set of logs are driving volume size;

  • Use the search audit index to investigate actual usage;

  • Design and implement data tiers to get better value from your Sumo Logic log analytics.

Thanks for taking the time to read this article about data tiers and good luck on your data tiering journey!

Rick Jury, Customer Success Engineer