
Trust-Boundary Drift in Multi-Agent Workflows

2026-05-20 · 6 min read

The New Perimeter is Non-Deterministic

As engineers, we are trained to think of security in terms of deterministic parameters. A user is authenticated, an IP is whitelisted, an input is sanitized against a known regular expression. We draw neat, solid lines around our infrastructure, calling them trust boundaries.

But the rise of autonomous LLM agents, orchestrated via frameworks like LangChain, AutoGPT, or our own custom systems, has introduced a new attack vector: non-deterministic trust-boundary drift.

When an agent synthesizes its own SQL queries, dynamically maps API payloads, or executes raw shell commands based on natural-language instructions, our traditional boundaries dissolve. The perimeter shifts with every generated token.


How Drift Occurs in Autonomous Workflows

Consider a standard multi-agent coding assistant framework. It consists of a Planner Agent, a Coder Agent, and an Executor Agent. The Executor Agent operates in a Docker container, equipped with tool access to execute commands.

[User Prompt]
       │
       ▼
┌──────────────┐
│ Planner Agent│ ◄─── (High Trust: Direct User Input)
└──────┬───────┘
       │ Synthesizes sub-tasks
       ▼
┌──────────────┐
│  Coder Agent │ ◄─── (Medium Trust: Generates Python code)
└──────┬───────┘
       │ Generates code block
       ▼
┌──────────────┐
│Executor Agent│ ◄─── (Low Trust: Runs shell code)
└──────────────┘

The trust boundary here appears clean: the Executor runs in an isolated Docker container, so in theory it can do no harm.

But what happens when the Executor needs to write results back to a shared host mount, or read database credentials to verify its output? In practice, developers often grant "temporary" elevated permissions so that tools work smoothly during debugging. Over time, as more tools are integrated, these temporary permissions accumulate.

This is Trust-Boundary Drift.

Because the agent is autonomous, a prompt injection planted inside a target repository can hijack the Coder Agent, steer it into generating malicious Python code, and have the Executor escape its Docker sandbox through an overly permissive host mount. The system has drifted from a high-trust user assistant into a remote code execution pipeline for the attacker.
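One way to make this kind of drift visible is to audit the Executor's container configuration on every deploy. The sketch below is only an illustration: `ContainerSpec`, `Mount`, `audit_drift`, and the marker lists are hypothetical names invented for this post, not part of Docker or any agent framework, and the checks are deliberately simplistic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mount:
    source: str              # path on the host (hypothetical model)
    target: str              # path inside the container
    read_only: bool = True


@dataclass
class ContainerSpec:
    name: str
    privileged: bool = False
    mounts: list[Mount] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)


# Illustrative markers only; a real audit would use an organization-specific list.
SENSITIVE_HOST_PATHS = ("/var/run/docker.sock", "/etc", "/root", "/home")
SECRET_ENV_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY")


def _is_sensitive(path: str) -> bool:
    """True if the host path is the root or falls under a sensitive prefix."""
    if path == "/":
        return True
    return any(path == p or path.startswith(p + "/") for p in SENSITIVE_HOST_PATHS)


def audit_drift(spec: ContainerSpec) -> list[str]:
    """Return human-readable findings for permissions that widen the sandbox."""
    findings = []
    if spec.privileged:
        findings.append("container runs in privileged mode")
    for mount in spec.mounts:
        if not mount.read_only:
            findings.append(f"writable host mount: {mount.source} -> {mount.target}")
        if _is_sensitive(mount.source):
            findings.append(f"sensitive host path mounted: {mount.source}")
    for name in spec.env:
        if any(marker in name.upper() for marker in SECRET_ENV_MARKERS):
            findings.append(f"credential-like environment variable: {name}")
    return findings
```

Failing the build whenever `audit_drift` returns a non-empty list turns each "temporary" permission into an explicit, reviewed decision rather than a silent accumulation.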


Auditing the Drift: Core Exploits

In our audits of modern enterprise agent systems, the following drift vectors come up consistently:

1. Dynamic Tool Introspection

Agents are often allowed to dynamically query what tools are available. An attacker can inject instructions that trick the agent into utilizing sensitive internal tools (like an email dispatcher or database wiper) that were never intended for the user's current privilege tier.
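A common countermeasure is to filter tool discovery by the caller's privilege tier and to re-check that tier at invocation time, so an injected instruction cannot reach a tool the user could never have called. The following is a minimal sketch under assumptions: the `Tier` levels, the tool names, and the `visible_tools` and `invoke` helpers are all hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    ANONYMOUS = 0
    USER = 1
    ADMIN = 2


# Hypothetical registry: tool name -> minimum tier allowed to see and call it.
TOOL_MIN_TIER = {
    "search_docs": Tier.ANONYMOUS,
    "read_ticket": Tier.USER,
    "send_email": Tier.ADMIN,
    "wipe_database": Tier.ADMIN,
}


def visible_tools(tier: Tier) -> list:
    """Tools the agent may discover when acting for a caller at this tier."""
    return sorted(name for name, min_tier in TOOL_MIN_TIER.items() if tier >= min_tier)


def invoke(tool: str, tier: Tier) -> str:
    """Re-check the tier at call time; never rely on the listing alone."""
    min_tier = TOOL_MIN_TIER.get(tool)
    if min_tier is None or tier < min_tier:
        raise PermissionError(f"tool {tool!r} is not available at tier {tier.name}")
    return f"dispatched {tool}"
```

The second check matters because the agent can name a tool it was never shown, for example one mentioned in injected text. The authorization decision has to live in the dispatcher, not in the prompt.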

2. Context Window Pollution

If an agent reads untrusted text (e.g., a scraped website or a PDF document) and appends it directly to its active context, instructions embedded in that document can override the system prompt. This is known as indirect prompt injection, and it is the most common driver of trust-boundary drift.

Important

If your autonomous agent reads emails, Slack messages, or target files, you must assume the context window is compromised. Never allow the agent to execute write actions or dispatch API calls without human-in-the-loop validation.
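One way to enforce this rule in code is to mark the context as tainted the moment untrusted text enters it, and to route every write action from a tainted context through a human approval callback. This is a rough sketch, not a complete defense: `AgentContext`, `authorize`, and the action names in `WRITE_ACTIONS` are hypothetical, and the `approve` callback stands in for whatever review UI or ticketing step a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names for actions with side effects outside the agent.
WRITE_ACTIONS = {"send_email", "write_file", "call_api"}


@dataclass
class AgentContext:
    messages: list = field(default_factory=list)
    tainted: bool = False

    def add(self, text: str, trusted: bool) -> None:
        """Append text; any untrusted source permanently taints the context."""
        self.messages.append(text)
        if not trusted:
            self.tainted = True


def authorize(action: str, ctx: AgentContext, approve: Callable[[str], bool]) -> bool:
    """Allow read-only actions; gate write actions on a human once tainted."""
    if action not in WRITE_ACTIONS:
        return True
    if not ctx.tainted:
        return True
    return approve(action)


# Usage: the lambda stands in for a real human review step.
ctx = AgentContext()
ctx.add("Summarize my inbox.", trusted=True)
ctx.add("<email body fetched from the mail server>", trusted=False)
allowed = authorize("send_email", ctx, approve=lambda action: False)
```

Note that the taint flag never resets within a session. Once untrusted text is in the window, there is no reliable way to prove its instructions have stopped influencing the model.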


Engineering Robust Isolation

How do we mitigate a boundary that won't stand still?

  1. Deterministic Sandboxes: Isolate runtimes at the hypervisor or kernel level (for example with gVisor's user-space kernel or eBPF-based socket filters) rather than relying on a plain Docker container alone.
  2. Dynamic Tokens: Tools should require single-use, cryptographically signed tokens generated by a parent orchestrator. The agent never receives raw credentials.
  3. Strict Unidirectional Flow: A low-trust agent should never be able to synthesize actions that alter the configuration of a high-trust planner agent.
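The dynamic-token idea can be sketched with nothing more than the Python standard library. The example below is an assumption-laden illustration, not a production design: the `Orchestrator` class, its `issue` and `redeem` methods, and the pipe-delimited token layout are invented for this post, and tool names are assumed not to contain the `|` character.

```python
import hashlib
import hmac
import secrets
import time


class Orchestrator:
    """Hypothetical parent orchestrator that mints single-use tool tokens."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # never leaves the orchestrator
        self._used = set()                   # nonces that were already redeemed

    def _sign(self, payload: str) -> str:
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, tool: str, ttl_seconds: int = 30) -> str:
        """Mint a token bound to one tool, one nonce, and a short lifetime."""
        nonce = secrets.token_hex(16)
        expires = int(time.time()) + ttl_seconds
        payload = f"{tool}|{nonce}|{expires}"
        return f"{payload}|{self._sign(payload)}"

    def redeem(self, token: str, tool: str) -> bool:
        """Accept a token once, only for its tool, and only before expiry."""
        try:
            token_tool, nonce, expires, signature = token.split("|")
        except ValueError:
            return False
        payload = f"{token_tool}|{nonce}|{expires}"
        # Constant-time comparison on bytes; reject anything we did not sign.
        if not hmac.compare_digest(signature.encode(), self._sign(payload).encode()):
            return False
        if token_tool != tool or int(expires) < time.time() or nonce in self._used:
            return False
        self._used.add(nonce)
        return True
```

The agent only ever holds the opaque token string. A hijacked agent can replay or redirect it, but the replay fails the nonce check and the redirect fails the tool binding, while the signing key and the real credentials stay with the orchestrator.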

By treating agent orchestration as a highly exposed network boundary and building defenses that are strictly deterministic, we can harness the power of autonomous agents without letting our perimeters drift away.

