Agents and Security - Friend or Foe?
Agents have given rise to a transformative shift in how we view and build technology for the modern world. As with the overarching hype cycles of emerging technology categories, the tools and subcategories within them go through their own cycles as part of the larger curve, and it’s clear we’re right at the peak of curiosity and experimentation for AI Agents.
You’ve probably heard about how AI agents are transforming workflows in verticals such as customer support, finance, legal, and software development. They’re also starting to make a splash in security. Within security, everyone’s talking about the AI-driven SOC (Security Operations Center), but there is far more scope for where agents could be applied.
First, let’s break down what we mean by an AI agent, or agentic workflow. Definitions vary, but the most widely agreed upon is software that autonomously performs specific tasks or makes decisions on behalf of a user or system. This software exhibits System 2 thinking (analytical problem solving) rather than System 1 thinking (automatic information retrieval).
Bear in mind that AI Agents are conceptually different from chatbots like ChatGPT, which simply regurgitate answers based on learned data (System 1). We’ll also cover Copilots, which are peripheral information synthesizers with the ability to offer contextual suggestions and insights. Some copilots are based on System 1 thinking, and some utilize agentic workflows to achieve deeper System 2 thinking and contextual decision making. We’ll come back to these later.
Use Cases in Security
Think about what’s covered by SOC automation:
The work of Tier 1 (and maybe Tier 2) SOC analysts
Triaging & investigating security alerts (using logs/alerts from a SIEM or other data integrations)
Remediating simple vulnerabilities
The Security Operations Center (SOC) is a largely human-labor-based function within companies, usually too short-staffed to assess the volume of threats and alerts that come through, and plagued by high employee turnover. For this reason, automating the low-level, repetitive work within the SOC is both a pressing and obvious application of agents in security. Companies like Prophet Security and Radiant Security have jumped in at an opportune time to tackle this problem.
But agents don’t have to be limited to circumstances where human labor is the bottleneck; they can also add contextual reasoning to software tasks that would benefit from more nuance and explanation. These tasks can be broken down into 1) those that require reasoning to create hypotheses for predictive capabilities, and 2) those that require generating solutions based on insights or evidence.
Using the definition above, we can think of many use cases in security applications where agents fit naturally.
Note: These use-cases can be initiated in SaaS security applications, or within the security teams of enterprises themselves!
Compliance (GRC)
Each organization is different and has to adhere to compliance protocols and standards specific to its industry and the type of data it handles. This information is usually audited manually and synthesized into security questionnaires, a process that is both error-prone and inefficient. AI agents that can correctly contextualize and interpret compliance rules can recognize data associated with compliance risks, automatically complete risk assessments, and provide continuous feedback on how to improve regulatory adherence.
This is a proven use case: companies like Norm.ai & Simbian have developed GRC- and regulation-specific agents that have cut the time spent filling out security questionnaires from hours to minutes! These tasks mostly rely on summarization and synthesis and are generative in nature, so LLMs are likely to perform well here.
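To make that concrete, here is a minimal sketch of one step of such a workflow: answering a single questionnaire item by retrieving internal policy evidence and asking a model for a grounded draft. The vector_store and llm interfaces, and the document fields, are assumptions for illustration, not any vendor’s actual API.

```python
# Hypothetical sketch: answering a security-questionnaire item with retrieved policy evidence.
# `vector_store`, `llm`, and the document schema are placeholders, not a specific vendor's API.

def answer_questionnaire_item(question: str, vector_store, llm) -> dict:
    # 1. Retrieve the internal policies / audit evidence most relevant to the question.
    evidence = vector_store.search(query=question, top_k=5)

    # 2. Ask the model to draft an answer grounded only in the retrieved evidence.
    prompt = (
        "You are completing a compliance questionnaire. Answer the question using only "
        "the evidence below. Cite the document IDs you relied on, and say 'insufficient "
        "evidence' if the documents do not support an answer.\n\n"
        f"Question: {question}\n\nEvidence:\n"
        + "\n".join(f"[{doc.id}] {doc.text}" for doc in evidence)
    )
    draft = llm.complete(prompt)

    # 3. Return the draft plus its sources so a human reviewer can approve or edit it.
    return {"question": question, "draft_answer": draft, "sources": [doc.id for doc in evidence]}
```

The key design point is that the model only drafts; the evidence trail and the final sign-off stay with a human, which is what makes the hours-to-minutes speedup acceptable in an audit setting.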
Classification
One of the most variable problems in security is classification. This can be classification of data, categorization of alerts, or determining the sensitivity of certain information. These classification tasks are highly specific to the business or to the customer’s data, and require heavy customization. Naive rules, regexes, and decision trees produce high rates of false positives, or worse, false negatives. (Some could argue that false positives are the harmful ones, since they waste time needed for real investigation, but imagine missing a critical vulnerability or important alert due to incorrect classification! It’s the ever-fearsome possibility that you don’t know what you don’t know.)
Most of these classification tasks could be automated by a single LLM call (fine-tuned or not) with specific reasoning rules. But for non-deterministic, multi-step processes that combine reasoning rules, RAG (pulling in similar docs/alerts to compare their classifications and relevance), and tool calling to verify certain requirements, agents could be of great help.
Here, companies will need to build their own internal agents, perhaps using highly customizable and extensible agent frameworks like LangGraph. Only highly specific and complex classification tasks will actually require agents to solve them; because tasks that can’t be addressed with a single LLM or classical machine learning are rare, this is a slightly weaker use case.
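As a deliberately simplified illustration, here is roughly what such a multi-step classifier could look like wired up as a graph. The node bodies are placeholders I have assumed for illustration, not anyone’s production logic, and the graph wiring follows LangGraph’s commonly documented StateGraph interface, which may drift between versions.

```python
# Minimal sketch of a multi-step classification agent using LangGraph's StateGraph API.
# Node bodies are placeholders for real RAG, tool, and LLM calls.
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph


class ClassifyState(TypedDict):
    alert: str
    similar_cases: List[str]
    verified_facts: List[str]
    label: str


def retrieve_similar(state: ClassifyState) -> dict:
    # RAG step: in practice, query a vector store for previously classified alerts/docs.
    return {"similar_cases": ["<similar historical alert would be retrieved here>"]}


def verify_with_tools(state: ClassifyState) -> dict:
    # Tool-calling step: in practice, hit IAM / data-catalog / config APIs to confirm facts.
    return {"verified_facts": ["<verified requirement would be recorded here>"]}


def classify(state: ClassifyState) -> dict:
    # Final LLM call reasons over the alert plus the retrieved and verified context.
    return {"label": "needs-human-review"}  # placeholder for the model's decision


graph = StateGraph(ClassifyState)
graph.add_node("retrieve", retrieve_similar)
graph.add_node("verify", verify_with_tools)
graph.add_node("classify", classify)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "verify")
graph.add_edge("verify", "classify")
graph.add_edge("classify", END)

app = graph.compile()
result = app.invoke({"alert": "Unusual volume of S3 reads from a service role",
                     "similar_cases": [], "verified_facts": [], "label": ""})
print(result["label"])
```

Even this skeleton shows the trade-off: three steps, three places to customize, and three places to get wrong, which is why a single well-prompted LLM call should remain the default for the simpler classification tasks.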
Specific Vulnerability Remediation
Security vulnerabilities also fall into the “highly business-specific” category: each alert is tied to business context that needs to be assessed for importance and relevance.
For example, if you receive an alert in your SIEM which says “Role X is highly overprovisioned”, here are some of the questions you would immediately ask to get the context you need:
What kinds of permissions does Role X have? Does it have access to sensitive business information? How can we quantify ($$) this risk?
How many users have access to role X? Who are they? What departments do they work in?
Do all of the above users absolutely need to have access to Role X based on their work functions?
When did each of these users last assume Role X?
Are all of these users full-time employees? Should some be deprovisioned?
Based on the above, can we pare the list down to the users who genuinely need Role X, to uphold least privilege?
As you can see, the process for assessing some risks may be common across companies, but the process of acquiring that information, and the follow-up questions needed, are company-specific.
Now let’s take this one step further and have the agent suggest a fix for the vulnerability, based on its gathered context and the assessed risk, perhaps with a final approval requirement from a human. We’ve just automated the nitty-gritty, repetitive parts of a security engineer’s day, freeing them up for higher-level and more stimulating work!
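Here is a rough sketch of how an agent’s investigation of that alert might be structured. The iam, logs, and hr connectors and their methods are hypothetical stand-ins for whatever integrations a given company has; the point is the shape of the workflow: gather context, reason over it, propose a fix, and defer the final call to a human.

```python
# Illustrative sketch only: the connector objects (iam, logs, hr) and their methods are
# hypothetical stand-ins for a company's IAM, audit-log, and HR/directory integrations.
from datetime import datetime, timedelta, timezone


def investigate_overprovisioned_role(role_id: str, iam, logs, hr) -> dict:
    """Gather the business context needed to judge an 'over-provisioned role' alert."""
    permissions = iam.get_role_permissions(role_id)   # what can the role touch?
    users = iam.get_role_members(role_id)             # who holds it?
    stale_cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    findings = []
    for user in users:
        last_used = logs.last_assumed(role_id, user)   # when did they last assume it?
        profile = hr.lookup(user)                      # department, employment status
        findings.append({
            "user": user,
            "department": profile["department"],
            "full_time": profile["full_time"],
            "stale": last_used is None or last_used < stale_cutoff,
        })

    # Propose the least-privilege fix, but leave the final call to a human reviewer.
    to_remove = [f["user"] for f in findings if f["stale"] or not f["full_time"]]
    return {
        "role": role_id,
        "sensitive_permissions": [p for p in permissions if p.get("sensitive")],
        "proposed_removals": to_remove,
        "requires_human_approval": True,
    }
```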
This use case excites me the most because of the potential for tangible manual-task automation and precise information extraction. It is, however, more complex due to the combination of predictive and generative tasks, mostly rooted in the intuition needed to decide on a follow-up question or action. There are a lot of interesting initiatives in this space, with Wiz recently announcing remediation 2.0 … and keep an eye on Bedrock Security for more!
Prioritization of Alerts
As an extension of the above two use-cases, by pulling in relevant context for alerts, we can better understand their actual risk level, and even quantify that risk relative to other alerts. Especially in security products with a reputation for producing a jumble of disjointed alerts, or worse, alerts mixed in with benign warnings (AKA the purveyors of alert fatigue), a contextual, reasoning-based way to prioritize them would save the time otherwise lost figuring out where to start improving security posture.
At first glance, this might seem like a use case ripe for false positives. And maybe with a single LLM call, that would be the case. The right way to think about this use case is from multiple perspectives – such as exposure (data privacy & attack paths), compliance (regulation & quantified cost of breach), and misconfiguration (backups, encryption, vulnerabilities) – the same way you would analyze an alert. This can now be automated using multiple LLMs fine-tuned on each perspective, and combining assessments to form a multifaceted ranking of the alert’s priority. The good thing is there’s no shortage of alerts, which means there’s abundant training data to improve these models in a feedback loop, and increase their confidence.
This is mostly a predictive use case, based on using the additional data in an alert’s context to formulate a sort of risk score, normalized across all alerts. It’s an interesting use case for a SIEM to go after, as a SIEM is a hub with access to many layers of tangential context that can be useful when prioritizing. It’s also a great starting point for tackling automated remediation, which IMO provides more tangible value.
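A stripped-down sketch of that combination step might look like the following, where each “judge” stands in for a model (or fine-tuned LLM) scoring one perspective. The perspective weights and interfaces are illustrative assumptions, not a recommended scheme.

```python
# Illustrative only: each judge is a callable (e.g. a fine-tuned model) returning a score in [0, 1].
PERSPECTIVES = {
    "exposure": 0.4,          # data privacy & attack paths
    "compliance": 0.3,        # regulation & estimated cost of breach
    "misconfiguration": 0.3,  # backups, encryption, vulnerabilities
}


def prioritize(alerts, judges):
    """Rank alerts by a weighted, normalized combination of per-perspective risk scores."""
    if not alerts:
        return []
    scored = []
    for alert in alerts:
        # Combine the per-perspective assessments into one weighted risk score.
        risk = sum(weight * judges[name](alert) for name, weight in PERSPECTIVES.items())
        scored.append({"alert": alert, "risk": risk})

    # Normalize against the riskiest alert in the batch so scores are comparable across alerts.
    top = max(item["risk"] for item in scored) or 1.0
    for item in scored:
        item["risk"] = round(item["risk"] / top, 3)
    return sorted(scored, key=lambda item: item["risk"], reverse=True)
```

The abundance of alerts mentioned above matters here: every triaged alert becomes labeled feedback that can be fed back into the per-perspective judges.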
Attack Pen-testing
Everyone needs to stress-test their applications, whether for scale or for vulnerability to attacks. Pen-testing, in this context, involves autonomously analyzing the security posture of an application by finding system vulnerabilities or security lapses and assessing their risk. Along the same lines, red-teaming simulates a real-world attack, in which an agent finds and exploits vulnerabilities in a system, making its way toward sensitive business information, or “crown jewels”. This is meant to be a real-life test of robustness against attacks, which are themselves increasingly likely to be orchestrated with cutting-edge AI approaches.
In perhaps the only security use case where we actually want to use AI adversarially (but in a controlled way), agents no doubt excel. Companies like Horizon3.ai have an entire playbook of agentic pentesting frameworks at their disposal, and since the technique is mostly generative and somewhat brute-force based (try attacks until they work), it’s a solid solution.
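The core loop is conceptually simple, something like the sketch below, where techniques is a playbook of candidate actions and attempt runs only inside a sandboxed, authorized scope. Everything here is a hypothetical abstraction rather than any vendor’s framework.

```python
# Abstract sketch of the "try techniques until one works" loop; the `target` and `technique`
# objects are hypothetical, and a real harness would run strictly in a sandboxed, authorized scope.

def red_team_run(target, techniques, max_attempts: int = 50) -> list:
    findings = []
    attempts = 0
    frontier = list(techniques)             # candidate techniques, e.g. drawn from a playbook
    while frontier and attempts < max_attempts:
        technique = frontier.pop(0)
        attempts += 1
        result = technique.attempt(target)  # simulated / sandboxed execution only
        if result.succeeded:
            findings.append(result)
            # A successful foothold often unlocks follow-on techniques (lateral movement, etc.).
            frontier.extend(result.follow_on_techniques)
    return findings
```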
Threat Analysis & Detection
Many companies ingest their various logs (access/write/deletion logs, alerts, warnings) into a separate datastore or external SIEM, like Datadog or Sumo Logic. After these logs are aggregated, they need to be analyzed and evaluated for potential security threats, whether in access patterns or against well-known attack frameworks like MITRE ATT&CK (which catalogs attacks that can unfold over long periods of time and be extremely hard to detect). Agents that understand these attack frameworks and can detect unusual activity by asking the right business-specific questions, drawing on prior context and information, are invaluable for decreasing dwell time and MTTD (Mean Time To Detection).
Again, the questions to ask are specific to the situation and may require access to various data sources depending on the company’s architecture (e.g. Why did this role grant expand to include user X, and what is user X’s business function? Or: who deleted data X, what is their business function, are they the data owner, and what did data X contain?).
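As a sketch (with hypothetical iam, catalog, and llm interfaces standing in for a company’s real integrations), an agent’s handling of a single suspicious event might look like this: gather the business context those questions point at, then ask a model to map the event to known tactics and decide whether to escalate.

```python
# Sketch of turning a suspicious event into business-specific context; the connectors
# (iam, catalog, llm) and the event schema are assumptions, not a particular SIEM's API.

def investigate_event(event: dict, iam, catalog, llm) -> dict:
    context = {}
    if event["type"] == "role_grant_expanded":
        context["grantee_function"] = iam.business_function(event["grantee"])
        context["granted_scopes"] = event["new_scopes"]
    elif event["type"] == "data_deleted":
        context["actor_function"] = iam.business_function(event["actor"])
        context["actor_is_owner"] = catalog.owner_of(event["dataset"]) == event["actor"]
        context["data_classification"] = catalog.classification(event["dataset"])

    # Ask the model to map the event plus context onto known tactics (e.g. MITRE ATT&CK)
    # and decide whether escalation is warranted.
    verdict = llm.assess(event=event, context=context)
    return {"event": event, "context": context, "verdict": verdict}
```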
What’s interesting about this use case is that it synthesizes information observed over various time periods and reasons over existing detection frameworks to find similarities (predictive). While there is a high chance of false positives when first implementing it (due to lack of data), we’re much more likely to detect slow, subtly orchestrated, expanding attacks that rule-based systems would almost always miss!
Security Copilots
Perhaps a synthesis of some of the above use-cases (predictive & generative), a security copilot can aggregate information from various fragmented but data-rich sources (database/SIEM, Cloud access patterns & configurations, data catalog, IAM tools, DSPM, etc) in order to provide actionable insights to improve the security posture of the organization.
This copilot can have agentic threat-detection, compliance-assessment, and/or vulnerability-remediation capabilities, and can proactively prioritize and suggest actions as issues are identified. It could also have a chat interface for reasoning through and answering user queries. Coming back to our comparison of chatbots and copilots, this copilot would be an example of System 2 thinking, since it utilizes agentic workflows. Microsoft Copilot for Security is an impressive development of this use case, though it is specific to Microsoft products (Azure, Entra, Purview, etc.).
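A back-of-the-napkin sketch of the aggregation pattern might look like this, where the source registry, connectors, and llm methods are all illustrative assumptions rather than any product’s API.

```python
# Sketch of a copilot that routes a user question to relevant data sources before answering;
# the source registry and `llm` / connector interfaces are illustrative placeholders.

SOURCES = {
    "siem": "alerts, detections, raw logs",
    "cspm": "cloud configurations and misconfigurations",
    "iam": "identities, roles, access patterns",
    "data_catalog": "datasets, owners, sensitivity labels",
}


def copilot_answer(question: str, connectors: dict, llm) -> str:
    # 1. Let the model pick which sources are relevant to the question.
    chosen = llm.select_sources(question=question, available=SOURCES)

    # 2. Pull fresh context from only those sources.
    context = {name: connectors[name].query(question) for name in chosen}

    # 3. Reason over the combined context to produce an actionable, cited answer.
    return llm.answer(question=question, context=context)
```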
I expect almost every security “pseudo-platform” to implement its own version of a copilot or natural-language interface. What differentiates offerings here is the variety and complexity of the underlying data; that is what decides whether a reasoning agent provides real value.
Prerequisites for using Agents
The one quality shared by security applications that can implement agentic workflows is access to data. This data is collected by the application itself, or exists within tangential security-related applications (SIEM, CSPM, IAM, etc.), ready to be queried. Without access to some or all of the context that would enrich a human’s ability to derive a solution, an LLM would almost certainly fail, no matter the training data.
And if a solution can be derived by looking at the problem itself, or with minimal reasoning (or a single LLM call), we can argue that agents shouldn’t be used at all. For a problem to be “suitable” for agentic workflows, the following must be true:
The problem manifests in highly variable scenarios, with differentiated diagnosis procedures.
The solution to the problem can be one of many, or a combination of different steps.
There exist multiple sources/tools from which additional context can be pulled in order to reason out the next steps in a workflow.
In other words, a problem for which a straightforward decision tree exists is not a use case for AI Agents. But here’s a straightforward decision tree for when you should be using Agents:
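Rendered (loosely) in code, the three criteria above condense to something like the check below; the predicate names are placeholders for judgment calls only the problem owner can make.

```python
# Loose rendering of the criteria above; the predicates are placeholders for human judgment.
def should_use_agents(problem) -> str:
    if problem.has_straightforward_decision_tree():
        return "No: rules, a decision tree, or a single LLM call will do."
    if not problem.has_multiple_context_sources():
        return "No: there is no extra context for an agent to fetch and reason over."
    if problem.scenarios_are_highly_variable() and problem.solution_is_multi_step():
        return "Yes: variable scenarios, multi-step solutions, and rich context justify an agent."
    return "Maybe: start simpler (RAG or a specialized LLM) and upgrade only if that falls short."
```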
Moat for using Agents
Building on access to data, I anticipate that the companies with the biggest “moat” for building agentic workflows will be the owners of proprietary data or rich insights gathered by their own security platforms. Solutions that simply plug and play into existing data sources will be commoditized, unless they can bring a new technology or approach with heavy customization.
The Future
There’s an old saying that goes “The sharpest blade is useless in unskilled hands.” Along the same lines, when creating agents for various applications, it’s important to first understand the nuance and iteration required to implement and successfully productionize such a powerful technology. Part 2 of this deep dive will focus on the tools and frameworks available to increase determinism and accuracy in agentic workflows while preserving the efficiency needed to meet strict SLAs.
On the other hand, we must remember that we don’t need to hit a nail with a battering ram. There may be reasoning tasks where a simple decision tree with RAG may suffice, or where a specialized LLM can be trained to provide answers to moderately complex tasks. Thinking back to the days of the blockchain hype cycle, every company was slapping a blockchain onto their storage use cases (I wrote about this here) so they could claim they were using the latest and greatest technology. We will definitely witness something similar for AI Agents.
At the end of the day, it matters that the problem at hand is being solved in the best possible way, regardless of what technology is being used. Using this as our guiding principle, we can begin to cautiously and intentionally imagine a world where manual and repetitive security tasks are automated, and alert fatigue is nothing but a forgotten nightmare.