blog に戻る

2024年07月11日 Anton Ovrutsky

Building the foundations: A defender’s guide to AWS Bedrock

Building the foundations: A defender’s guide to AWS Bedrock


Opinions regarding artificial intelligence (AI) range from fears of Skynet taking over to hope regarding medical advancements enabled by AI models. Regardless of where you sit on this spectrum of anxiety and hype, it is evident that the AI epoch is upon us.

While influencers and tech leadership pontificate about the impact of AI on both broader society as well as our cyber assets, developers are busy training data sets, tweaking models and generally injecting AI into their respective products and services.

For those of us charged with defending enterprises, the addition of AI workflows generates new challenges with additional services and systems to secure. Indeed, in the AI race, it is sometimes convenient to forget about the security of the systems on which the AI magic is developed.

Read on to learn about securing AWS Bedrock, a popular service for the creation and generation of AI workflows and models.

What is AWS Bedrock?

Within the cloud computing models, AWS Bedrock falls into the Software-as-a-Service (Saas) category.

A perhaps overly-simplistic way to think about AWS Bedrock is something along the lines of an AI development platform as a service.

According to official AWS Documentation, Bedrock is defined as:

“[...] a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. “

AWS Bedrock, in other words, manages the infrastructure necessary to create AI workflows and applications for its users. This allows AI developers to not worry about provisioning compute resources and to focus on AI-enabled application development.

Like any other cloud computing service, AWS Bedrock specifically and AI development generally are aspects that require a thoughtful security plan to be put in place to protect model data, intellectual property and generally anything that would affect the confidentiality, integrity or availability of AI workflows and development efforts.

Risks to AI workflows and AWS Bedrock

The introduction of AI development workflows into enterprises has added a layer of challenge and complexity for us defenders.

In addition to our existing mandate to secure cloud and on-premises workloads, we now also have to combine this general enterprise security orientation with an understanding of AI-specific risk avenues.

Practically speaking, this means that defenders need to take into account techniques found in the MITRE ATT&CK Cloud Matrix in addition to other frameworks such as the MITRE ATLAS Matrix as well as the OWASP Top 10 LLM Applications & Generative AI and OWASP Machine Learning Top 10 list.

Some examples of AI-specific risks can include:

  • Prompt Injection

  • Poisoning training data sets

  • LLM data leakage

  • Model denial of service

  • Model theft

AWS Bedrock telemetry tour

AWS Bedrock telemetry generally falls into a few categories.

Bedrock CloudTrail management events cover Bedrock control plane operations.

An example of a Bedrock management event is as follows:

Example of a Bedrock management event

We can see that this event looks very similar in structure to typical CloudTrail telemetry and includes the API call being used (ListFoundationModels) as well as information regarding the IP address, User Agent, AWS Region used and other request IDs that we will be coming back to in later sections of this blog. In this case, we can see that the event has an “eventCategory” of “Management”.

In addition to Management events, you can also configure Bedrock to log “Data” events. These events are higher in volume and cover data plane operations such as invocation telemetry.

We can take a look at an example of a Bedrock Data event:

Example of a Bedrock management event 2

In addition to management and data events, Bedrock also makes model invocation telemetry available. This provides us with deep visibility into prompts and model responses as well:

Example of a Bedrock management event 3

Finally, a fourth strand of telemetry that we will be looking at is good old-fashioned endpoint telemetry. This is necessary to cover AWS CLI usage that targets Bedrock operations and functionality.

For additional information, check out the excellent AWS documentation around monitoring AWS Bedrock.

In sum, we have four different telemetry strands to weave and work from in the context of Bedrock:

  • Management events

  • Data events

  • Model invocation telemetry

  • Endpoint telemetry

We will be utilizing all four of these lines of telemetry when looking at our rules and queries.

AWS Bedrock rules and hunts

Let’s look at AWS Bedrock Cloud SIEM rules as well as various Log Analytics Platform queries that can be executed to proactively detect threats in a Bedrock environment.

Let’s begin by looking at enumeration and reconnaissance.

Enumeration and reconnaissance

When choosing to build AI applications of workloads on Bedrock, users can use “Base models” that are included with Bedrock or can customize and/or import models.

From a threat actor point of view, enumeration and reconnaissance are critical steps when targeting a victim user's network or, in this case, an AI development environment.

We can map these aspects roughly to the AML.TA0002 MITRE Atlas category and to LLM 10 - Model Theft within the OWASP LLM/AI top 10

To begin, we can detect the enumeration of Bedrock Foundation Models by operationalizing the “ListFoundationModels” CloudTrail event.

We can choose to alert on every single occurrence of this event; however, this approach can be problematic as it is generated when simply browsing the various models within the AWS Console. This event may also occur as part of normal systems operations.

To make this type of alert more actionable, we can utilize Cloud SIEM’s UEBA rules, particularly the first seen rule type.

Here, rather than alerting on every occurrence of this event and flooding our analysts, we can baseline this activity for a set period of time and alert when a user is enumerating Bedrock models for the first time since the established baseline period.

The rule logic for this type of alert is relatively straightforward:

Enumeration and reconnaissance 1

The resulting Cloud SIEM signal will look something like:

Enumeration and reconnaissance

Now, when a user performs model enumeration for the first time since the established baseline period, we will get a Cloud SIEM signal raised for this activity.

We can further expand on this type of alerting logic by utilizing match lists and a combination of severities. For example, we can set the severity lower for a shorter baseline and clone the rule and set the severity higher for a longer baseline period.

In addition to specifically looking at first seen enumeration, we can also generally look at when an AWS user is making Bedrock API calls not seen since the baseline period. This type of rule logic would look like:

Enumeration and reconnaissance 3

Some ideas around enhancing this type of rule would be to have two versions of the rule, one with a configured match list of authorized Bedrock users with a lower severity and another more generic version without a match list constraint with a higher severity level. This dynamic will help analysts make decisions regarding malicious or benign activity with additional context.

So far, we have covered the temporal aspect of Bedrock threat detection - that is, looking at first seen activity. We can also cover Bedrock enumeration through a volumetric lens.

When detecting malicious or unauthorized enumeration/discovery, we can look at a larger than usual amount of enumeration-related API calls being made by a user.

Rather than counting the number of API calls being made, baselining this activity, and then setting thresholds accordingly, we can let Outlier rules do all that work for us.

One example here is looking for an outlier in the number of times a user has enumerated AWS Bedrock Foundation Models.

This type of rule expression will look like:

Enumeration and reconnaissance 4

And looking at the resulting signal:

Enumeration and reconnaissance 5

We can see from the above view that an analyst investigating this activity has a lot of context to work with, including a historical view of the baseline period as well as a specific view of when exactly the threshold was breached. The analyst also does not have to navigate away from this screen to investigate the activity at the telemetry level, as all the records are presented at the bottom of the view.

Initial access

After enumeration and discovery, the next logical step in a threat actor kill chain is initial access.

This is a very broad category that encompasses many vectors, so let’s take a look at a few of these.

Consider the following scenario: a threat actor locates a leaked AWS key for your environment, performs the necessary reconnaissance and enumeration within the environment utilizing the stolen key and discovers that the Bedrock service is in use.

As a next step, this nefarious actor attempts to invoke an AWS Bedrock model using the stolen credentials and gets an access denied error. In this fictitious example, the stolen credentials are valid for the AWS environment generally but do not have the necessary rights assigned to perform AWS Bedrock activity.

Although the activity failed, it would behoove us to alert and take action on it, as there may be other sensitive systems that this particular access key has access to beyond Bedrock.

As such, we can operationalize the following match rule:

Initial access 1

Here, we are making the decision to alert on every occurrence of this event. However, we can also create a “First Seen Model Invocation Denied” type rule and assign varying levels of severity to the match version and “first seen” version of the rule. This type of flexibility allows the customization of threat detection vectors depending on the behavior of particular environments.

In addition to the above, we can also create alerts that look for new AWS Bedrock Agents being created or a first seen role creating an AWS Bedrock agent:

Initial access 2

This type of operational alert is effective for those companies that have a particular AWS IAM Role assigned to developers who utilize and build with the Bedrock service. If a threat actor were to escalate privileges or assign a role to perform Bedrock operations, then this type of alerting logic would detect this type of activity.

Going off the rails

One key feature of AWS Bedrock is Guardrails. These are designed to implement some safeguards and controls around AI applications and prompts.

Depending on how the Guardrail is configured, it may protect against such as risks as LLM06 - Sensitive Information Disclosure as well as LLM05 - Model Denial of Service

We can look at a user or administrator removing a Bedrock Guardrail with the following alerting logic:

Going off the rails 1

In this instance, we are deciding to alert on every occurrence of this event, as Guardrails should only be removed by authorized users, with some kind of ticket or change request associated with the activity.

In addition to the removal of Guardrails, another threat vector for Bedrock is changes to model invocation logging configurations. This creates an interesting dynamic, as defenders charged with monitoring AWS Bedrock must keep in mind AI-specific risks such as those found in the OWASP LLM Top 10 as well as MITRE ATLAS, in addition to more “traditional” threat vectors found in MITRE ATT&CK. In this instance, the model invocation logging configuration change can be mapped to T1562.008 - Impair Defenses: Disable or Modify Cloud Logs.

Indeed, the interplay and interconnectedness of various frameworks converging on AI development infrastructure is a dynamic worthy of further research by the community.

Practically, we can alert on model invocation logging configuration changes using the following Cloud SIEM match rule logic:

Going off the rails 2

This type of activity should be fairly rare in most environments, so we can go ahead and alert on every occurrence, with a slightly higher severity assigned so that a Cloud SIEM Insight can be created for other malicious events related to this entity and activity.

Hunting in AWS Bedrock telemetry

Model invocation telemetry is fairly verbose relative to other types of telemetry like AWS Bedrock Management Events. This dynamic lends itself well to proactive threat-hunting efforts which we will cover in this section.

LLM01 - Prompt injection - Direct prompt injection

We can look for attempts at direct prompt injections by utilizing model invocation telemetry, specifically looking at the input that the user interacting with the model or Agent utilized.

Let’s take a look at an example query to get us started:

_collector="AWS Bedrock Model Invocation Logs"
| %"input.inputBodyJson.prompt" as prompt
| %"output.outputBodyJson.completion" as completion
| parse regex field=prompt "<input>(?<user_prompt>.*)<\/input>" nodrop
| parse regex field=completion "<thinking>(?<model_output>\s.*\s)<\/thinking>" nodrop
| values(user_prompt) as user_prompts,values(model_output) as model_output by requestId

This query utilizes Bedrock model invocation logs and parses out both the user prompts as well as the responses that the model returns to the user and presents them sorted by their respective requestId.

Looking at our results, we see the following:

LLM01 - Prompt injection - Direct prompt injection 1

We can see that the results highlighted in red can potentially be classified as attempts at direct prompt injection, as the user is trying to get information regarding the underlying system prompts.

From here, we can use a little bit of regular expressions to create a list of keywords that we want to flag as sensitive, these can be user inputs that contain phrases like “training data” or “model” or something else related to your specific industry or workload:

_collector="AWS Bedrock Model Invocation Logs"
| %"input.inputBodyJson.prompt" as prompt
| %"output.outputBodyJson.completion" as completion
| parse regex field=prompt "<input>(?<user_prompt>.*)<\/input>" nodrop
| parse regex field=completion "<thinking>(?<model_output>\s.*\s)<\/thinking>" nodrop
| if(user_prompt matches /(training data|model|prompts)/,1,0) as sensitive_user_prompt
| where sensitive_user_prompt = "1"
| values(user_prompt) as user_prompts,values(model_output) as model_output by requestId

Looking at the results below, we can see some user prompts that could potentially be suspicious, the model’s output/response also gives us some clues that this activity may be abnormal or outwardly malicious:

LLM01 - Prompt injection - Direct prompt injection 2

LLM03: Training data poisoning

Features of AWS Bedrock allow AI developers to fine tune and customize existing models.

This is an event that can obviously occur during normal operations and is therefore infeasible to alert on every occurrence of it.

However, this telemetry makes for a very rich hunting experience. Let’s take a look at an example:

_sourceCategory=cloudtrail
| where eventName = "CreateModelCustomizationJob"
| %"requestParameters.trainingDataConfig.s3Uri" as S3Location
| %"requestParameters.customModelName" as ModelName
| %"userIdentity.arn" as arn
| values(S3Location) as S3Location,values(ModelName) as ModelName by arn

The results here will tell you that a customization job for a model was created, as well as where the training data are for the aforementioned customization job:

LLM03 Training data poisoning

From here, if we were to deem this event suspicious, we can perform deeper investigation and take a look at the training data itself:

LLM03 Training data poisoning 2

To take this dynamic a bit further, we can also hunt for a user who is both creating a model customization job and putting an object into an S3 bucket.

This type of dynamic will flag on users who are both tweaking models and changing the underlying model training datasets. Once again, this series of events may occur during normal operations, so consider this search experimental or as something you can build upon & refine:

_sourceCategory=cloudtrail or _sourceCategory=cloudtraildataevents
| where (eventname = "CreateModelCustomizationJob" OR eventname = "PutObject")
| %"requestParameters.trainingDataConfig.s3Uri" as S3Location
| %"requestParameters.bucketName" as bucketName
| parse regex field = S3Location "s3\:\/\/(?<bucketName>.*)\/" nodrop
| transaction on %"userIdentity.principalId",bucketName
with "*PutObject*" as PutObject,
with "*CreateModelCustomizationJob*" as CreateModelCustomizationJob
results by transactions
| where PutObject >= 1 AND CreateModelCustomizationJob >= 1

Looking at the results, we can see that a user has created a model customization job as well as placed an object in an S3 bucket within the same day. This activity is not necessarily suspicious or malicious without context, but being able to investigate and hunt for multiple stages of operations is a valuable tool to have in your toolset.

LLM03 Training data poisoning 3

Model denial of service

Having the telemetry exposed by Bedrock model invocation logs lets us write queries that look at attempts at performing model denial of service. Let’s take a look at two examples of this dynamic.

In the first example, we will be measuring the length of the users’ input/prompt:

_collector="AWS Bedrock Model Invocation Logs"
| %"input.inputBodyJson.prompt" as prompt
| %"output.outputBodyJson.completion" as completion
| parse regex field=prompt "<input>(?<user_prompt>.*)<\/input>" nodrop
| parse regex field=completion "<thinking>(?<model_output>\s.*\s)<\/thinking>" nodrop
| length(user_prompt) as user_prompt_length
| values(user_prompt) as user_prompts,values(user_prompt_length) as prompt_length, values(model_output) as model_output by requestId

From here, we can baseline activity and then add a qualifier to our query to show results only when a user prompt is longer than the baselined amount:

_collector="AWS Bedrock Model Invocation Logs"
| %"input.inputBodyJson.prompt" as prompt
| %"output.outputBodyJson.completion" as completion
| parse regex field=prompt "<input>(?<user_prompt>.*)<\/input>" nodrop
| parse regex field=completion "<thinking>(?<model_output>\s.*\s)<\/thinking>" nodrop
| length(user_prompt) as user_prompt_length
| where user_prompt_length > 1000
| values(user_prompt) as user_prompts,values(user_prompt_length) as prompt_length, values(model_output) as model_output by requestId

Looking at our results, we see a malicious-looking user input:

Model denial of service

The model output will also give us some clues as to the category that the users’ prompt belongs to. We can see in this case, this prompt received a classification of malicious or harmful. It is important to call out that model output parameters can be configured within the Agent configuration and may vary depending on environments.

Another approach we can take is to sum up the input tokens used by a particular account ID so that we can flag on any deviations and anomalies:

_collector="AWS Bedrock Model Invocation Logs"
| %"input.inputTokenCount" as inputTokenCount
| timeslice 4h
| sum(inputTokenCount) as inputTokenSum by accountId,_timeslice

This relatively simple query will sum up the total number of input tokens utilized by various AWS account IDs - it will then time slice these data into 4-hour chunks and display the results sorted by account ID and timeslice.

Looking at the results, we see a fairly steep jump in the sum of the input tokens utilized by the model in the top row:

Model denial of service 2

An important call out here is the need to baseline activity and figure out what’s normal in your particular environment. However, having all the different strands of AWS Bedrock telemetry available will make this task much more accessible.

Let’s move onto the final strand of telemetry covered at the start of this blog: endpoint logs. Here, we will look at how to profile AWS CLI usage related to AWS Bedrock.

AWS CLI profiling

It’s time for the fourth and final strand of telemetry that we will be exploring: endpoint telemetry.

One of the powerful features of Cloud SIEM is its built-in and constantly refreshed and updated log mappings, which serve to normalize disparate strands of telemetry.


For example, if we had an EDR device logging command line values as “cmd_line” and native Windows events logging command line values as “CommandLine” then Cloud SIEM would normalize these records so that both fields would be “commandLine” - this means that you don’t have to worry about missing key events when performing searches.

We can go ahead and utilize this power by searching the normalized endpoint index - in our example, we will be looking at Jamf telemetry:

_index=sec_record_endpoint
| where commandLine matches /(aws\,bedrock)/
| replace(commandLine, /,/," ") as commandLine
| parse regex field = commandline "(?<commandLine_clean>[^\/]*$)" nodrop
| values(commandLine_clean) as AWSBedrockCommandLines by device_hostname,user_username

In this query, we are using the normalized endpoint index to find command lines that match on AWS Bedrock functionality. Because Jamf logs each command line parameter as a separate field within a JSON array, we need to perform some cleanup operations to display a human-readable command line.

We then display all the AWS Bedrock-associated command lines sorted by the device and user that issued or executed them:

AWS CLI profiling

With this command line piece of the puzzle, we now have visibility and insight into all facets of AWS Bedrock operations: AWS CouldTrail data and management events, model invocation telemetry as well as endpoint telemetry.

In some cases, command line telemetry can be overwhelming to look at for analysts. We can begin to solve this issue by “abstracting” away the actual command line values into something that is easier to digest. For example, we can abstract the command line value of aws bedrock list-custom-models to simply a “list” action.

This technique will help the analyst group command lines and will aid in profiling AWS Bedrock CLI commands. Let’s take a look at a concrete example.

Looking at the query that we ran above, we get the following command line values returned:

AWS CLI profiling 2

From here, we want to abstract and group these command lines:

AWS CLI profiling 3

In query form, this abstraction will look like:

_index=sec_record_endpoint
| where commandLine matches /(aws\,bedrock)/
| replace(commandLine, /,/," ") as commandLine
| parse regex field = commandline "(?<commandLine_clean>[^\/]*$)" nodrop

| if(commandLine_clean matches /aws bedrock delete/,"delete",
if(commandLine_clean matches /aws bedrock get/,"get",
if(commandLine_clean matches /aws bedrock list/,"list",""))) as bedrock_cmdline_action

From here, we can use our newly generated abstractions to perform a count of AWS Bedrock command line actions a particular user and host have executed:

_index=sec_record_endpoint
| where commandLine matches /(aws\,bedrock)/
| replace(commandLine, /,/," ") as commandLine
| parse regex field = commandline "(?<commandLine_clean>[^\/]*$)" nodrop

| if(commandLine_clean matches /aws bedrock delete/,"delete",
if(commandLine_clean matches /aws bedrock get/,"get",
if(commandLine_clean matches /aws bedrock list/,"list",""))) as bedrock_cmdline_action
| count(bedrock_cmdline_action) as actions by user_username,device_hostname,bedrock_cmdline_action

This query generates the following table:

AWS CLI profiling 2

Now, instead of an analyst looking at various complex command lines and trying to reverse engineer their functionality, they are presented in a relatively clean list that clearly shows what types of AWS Bedrock command lines a user has executed.

As a final step, we can decide to exclude the “list” and “get” actions and focus on the “delete” action:

_index=sec_record_endpoint
| where commandLine matches /(aws\,bedrock)/
| replace(commandLine, /,/," ") as commandLine
| parse regex field = commandline "(?<commandLine_clean>[^\/]*$)" nodrop

| if(commandLine_clean matches /aws bedrock delete/,"delete",
if(commandLine_clean matches /aws bedrock get/,"get",
if(commandLine_clean matches /aws bedrock list/,"list",""))) as bedrock_cmdline_action
| bedrock_cmdline_action as actions
| where actions = "delete"
| values(commandLine_clean) as AWSBedrockCommandLines by device_hostname,user_username

Now, we only get results returned when the command line in question has an associated action of “delete”:

AWS CLI profiling 5

This type of profiling may take a little bit of detection engineering elbow grease to get set up, but once it is in place, it can save many hours of pouring over various command line values that, as we all know, seem to blur together after a certain period of time.

Final thoughts

Much has been written and spoken about AI and every day it is becoming apparent that this technology is not only here to stay, but will have a material impact on both the social and professional aspects of our lives. For those in charge of defending enterprises and networks, the protection of AI workloads will predictably become a priority.

This blog has focused on AWS Bedrock and its relevant telemetry streams: CloudTrail management and data events, model invocation telemetry and endpoint telemetry. All four streams need to be understood, analyzed and formed into efficacious threat detection rules and proactive hunts to gain visibility into and protect the AWS Bedrock platform.

Learn more about how AI will impact cybersecurity.

Complete visibility for DevSecOps

Reduce downtime and move from reactive to proactive monitoring.

部門

Sumo Logic cloud-native SaaS analytics

Build, run, and secure modern applications and cloud infrastructures.

Start free trial

Anton Ovrutsky

Senior Threat Research Engineer

Anton Ovrutsky leverages his 10+ years of expertise and experience as a BSides Toronto speaker, C3X volunteer, and an OSCE, OSCP, CISSP, CSSP and KCNA certificate holder in his role at Sumo Logic's Threat Labs. He enjoys the defensive aspects of cybersecurity and loves logs and queries. When not diving into the details of security, he enjoys listening to music and cycling.

More posts by Anton Ovrutsky.

これを読んだ人も楽しんでいます