REPO-SYNCED SPEC DOC

Execution Safety Loop Specification

Informative Document: This is an informative specification that provides detailed guidance for the execution safety loop pattern defined normatively in Section 8.6 of the core specification. In case of conflict, the norm

Support documentDraft1.0.0-draft2026-04-15

Source: spec/docs/security/execution-safety-loop.md

Open Source Markdown Browse Full Spec Reference

On this page

Jump to a section

Use the outline to move through longer pages without losing your place.

Abstract
1. Introduction
2. The Safety Dongle Deployment Model
3. The Execution Safety Loop
4. Minimal Operation Surface
5. Integration with Gatekeeper
6. Integration with Danger Zone
7. Deployment Examples
8. Opting Out
9. Implementation Requirements
Related Specifications

Version: 1.0.0-draft Status: Draft Last Updated: 2026-04-15

Informative Document: This is an informative specification that provides detailed guidance for the execution safety loop pattern defined normatively in Section 8.6 of the core specification. In case of conflict, the normative specification takes precedence.

Abstract

This document specifies the execution safety loop — an opt-in monitoring pattern that enables continuous safety evaluation of an LLM's intended actions across all connected MCP servers. It also defines the safety dongle deployment model, where a standalone MCP-AQL server functions as a universal safety layer without implementing any other MCP-AQL features.

1. Introduction

1.1 Purpose

The execution safety loop addresses a fundamental challenge in LLM tool-calling environments: how do you enforce safety policies on an LLM's actions when those actions target multiple independent MCP servers?

Individual MCP servers can protect their own operations, but no single server has visibility into what the LLM is doing across all connected servers. The execution safety loop solves this by providing a central checkpoint that the LLM reports to before taking any action — regardless of which server handles the actual operation.

1.2 Status

The execution safety loop is an optional feature defined normatively in Section 8.6 of the core specification. This document provides informative guidance on deployment models, integration patterns, and implementation strategies.

1.3 Core Concept

The execution safety loop is a protocol-level monitoring pattern:

The LLM plans an action (any tool call, file operation, shell command, etc.)
The LLM reports its intent to a safety-enabled MCP-AQL server via record_execution_step with nextActionHint
The server evaluates the intent against its safety pipeline (Gatekeeper policies, Autonomy Evaluator, Danger Zone rules)
The server returns an AutonomyDirective — go, pause, or hard stop
The LLM respects the directive before proceeding

This is distinct from the full agent lifecycle that a specific adapter (like DollhouseMCP) may implement. The safety loop is purely about monitoring and evaluating — it does not manage agent state, personas, elements, or any domain-specific functionality.

1.4 Opt-In Design

The execution safety loop is entirely optional at every level:

Adapters choose whether to support it
Clients choose whether to enable it
Users can disable it at any time

No adapter is required to implement the safety loop, and no client is required to use it. When disabled, the adapter operates normally without any safety enforcement overhead. See Section 8: Opting Out for details.

2. The Safety Dongle Deployment Model

2.1 What Is a Safety Dongle?

A safety dongle is a standalone MCP-AQL server whose sole purpose is safety enforcement. It has no domain functionality — it does not manage files, query databases, or interact with external services. It exists only to evaluate the LLM's intended actions and return go/no-go directives.

The name "dongle" reflects the deployment model: you plug it in alongside your existing MCP servers, and it acts as a universal firewall for all tool calls in the session.

2.2 Architecture

┌────────────────────────────────────────────────────────┐
│  MCP Client (Claude, GPT, etc.)                        │
│                                                        │
│  LLM decides to take an action                         │
│       │                                                │
│       ▼                                                │
│  1. Report intent to safety dongle                     │
│     record_execution_step {                            │
│       nextActionHint: "calling write_file on server X" │
│     }                                                  │
│       │                                                │
│       ▼                                                │
│  ┌──────────────────────────────────────────┐          │
│  │  Safety Dongle (MCP-AQL Server)          │          │
│  │                                          │          │
│  │  Gatekeeper → Autonomy Evaluator →       │          │
│  │  Danger Zone Enforcer                    │          │
│  │                                          │          │
│  │  Returns: AutonomyDirective              │          │
│  │  { continue: true/false, factors: [...] }│          │
│  └──────────────────────────────────────────┘          │
│       │                                                │
│       ▼                                                │
│  2. If continue: execute the action on target server   │
│     If !continue: pause, escalate, or abort            │
│                                                        │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐               │
│  │ MCP     │  │ MCP     │  │ MCP     │               │
│  │ Server A│  │ Server B│  │ Server C│               │
│  │ (files) │  │ (db)    │  │ (slack) │               │
│  └─────────┘  └─────────┘  └─────────┘               │
└────────────────────────────────────────────────────────┘

2.3 What the Dongle Does NOT Do

A safety dongle in its minimal configuration:

Does not manage elements (personas, skills, templates, agents, memories)
Does not store or query domain data
Does not interact with external services beyond safety evaluation
Does not implement the full CRUDE pattern for domain operations
Does not require introspection of other servers' capabilities

It is a purpose-built safety layer with a minimal operation surface.

2.4 Relationship to Full MCP-AQL Adapters

The safety dongle and a full MCP-AQL adapter are both valid configurations of the same protocol. They differ only in scope:

Aspect	Safety Dongle	Full MCP-AQL Adapter
Purpose	Safety enforcement only	Domain operations + safety
Operations	~7 (safety loop + introspection)	Dozens (full CRUDE surface)
Elements	None	Personas, skills, agents, etc.
State	Execution state + policies only	Full domain state
Endpoints used	CREATE, EXECUTE, READ (subset)	All five CRUDE endpoints

A full MCP-AQL adapter MAY embed the safety loop directly — it does not need a separate dongle. The dongle model is for environments where safety enforcement is decoupled from the adapters performing the actual work.

3. The Execution Safety Loop

3.1 Loop Protocol

The execution safety loop follows a simple protocol:

┌─────────────────────────────────────┐
│                                     │
│  ┌──────────┐                       │
│  │  Plan    │  LLM determines       │
│  │  action  │  next action          │
│  └────┬─────┘                       │
│       │                             │
│       ▼                             │
│  ┌──────────┐                       │
│  │  Report  │  record_execution_step│
│  │  intent  │  { nextActionHint,    │
│  │          │    outcome? }         │
│  └────┬─────┘                       │
│       │                             │
│       ▼                             │
│  ┌──────────┐                       │
│  │ Evaluate │  Gatekeeper +         │
│  │          │  Autonomy Evaluator + │
│  │          │  Danger Zone          │
│  └────┬─────┘                       │
│       │                             │
│       ▼                             │
│  ┌──────────┐    ┌──────────┐       │
│  │ continue │───▶│  Act     │       │
│  │ = true   │    │  (tool   │       │
│  └──────────┘    │   call)  │       │
│                  └────┬─────┘       │
│       ┌───────────────┘             │
│       │                             │
│       ▼                             │
│  ┌──────────┐    ┌──────────┐       │
│  │ continue │───▶│  Pause / │       │
│  │ = false  │    │  Escalate│       │
│  └──────────┘    └──────────┘       │
│                                     │
└─────────────────────────────────────┘

Lifecycle:

Start: execute_agent — initiates the safety loop (requires explicit approval)
Loop: record_execution_step — report each intended action, receive directive
End: complete_execution (success) or abort_execution (abnormal termination)

3.2 Monitoring Scope

The execution safety loop monitors all intended actions, not only MCP-AQL operations on the same adapter. This includes:

Tool calls to any MCP server in the client session
Tool calls to non-MCP-AQL MCP servers
Built-in client capabilities (file I/O, shell access, web requests)
Any operation that produces effects beyond the LLM's reasoning context

This broad scope is what makes the safety dongle deployment model possible: the dongle does not need to understand or proxy the actual operations — it only needs to evaluate the LLM's description of what it intends to do.

3.3 The `record_execution_step` Parameters

The record_execution_step operation accepts several parameters. The full parameter table is defined normatively in Section 8.6.3 of the core specification. The two most important fields are:

nextActionHint (MUST) — describes the intended next action for safety evaluation
outcome (SHOULD) — reports the result of the previous step ("success", "failure", or "skipped"), which drives Stage 2 of the Autonomy Evaluator pipeline (Section 8.7.2)

The nextActionHint is a human-readable string describing the intended action with sufficient detail for safety evaluation.

Good nextActionHint examples:

// Specific tool call with target
"calling write_file on project/config.json to update database URL"

// Shell command with detail
"executing shell command: npm install express"

// Cross-server operation
"sending message to #general channel via Slack MCP server"

// Destructive operation
"deleting all records from staging_users table via database MCP server"

// File system operation
"reading /etc/passwd to check user configuration"

Poor nextActionHint examples:

// Too vague — cannot evaluate risk
"doing something"

// No target information
"writing a file"

// Missing context for risk assessment
"running a command"

The quality of safety evaluation depends directly on the quality of the nextActionHint. LLM system prompts SHOULD instruct the model to provide specific, detailed action descriptions.

3.4 The `AutonomyDirective` Response

Every record_execution_step call returns an AutonomyDirective indicating whether the LLM may proceed. See Section 8.7 of the core specification for the full type definition.

Key fields:

Field	Type	Description
`continue`	boolean	Whether the LLM may proceed with the reported action
`factors`	string[]	Human-readable explanations of the evaluation decision
`stopped`	boolean	Whether the agent has been hard-blocked (Danger Zone)
`reason`	string	Why the agent was paused or stopped
`stepsRemaining`	number	Steps remaining before mandatory pause
`nextStepRisk`	SafetyTier	Safety tier assigned to the reported action
`notifications`	AgentNotification[]	Gatekeeper blocks, danger alerts, and other events

Decision tree:

If stopped === true → Hard stop. Agent MUST cease all actions until unblocked via out-of-band verification.
If continue === false → Pause. Agent MUST NOT proceed. Report the pause to the user or upstream system.
If continue === true → Proceed. Agent MAY execute the reported action.

4. Minimal Operation Surface

4.1 Required Operations

A safety dongle implements a minimal set of operations:

Operation	Endpoint	Purpose
`execute_agent`	EXECUTE	Start the execution safety loop
`record_execution_step`	CREATE	Report intended action, receive `AutonomyDirective`
`complete_execution`	EXECUTE	Signal normal completion of the safety loop
`abort_execution`	EXECUTE	Signal abnormal termination
`confirm_operation`	EXECUTE	Confirm Gatekeeper blocks or Autonomy Evaluator `confirm` tier pauses
`verify_challenge`	CREATE	Submit out-of-band verification code (`verify` tier and Danger Zone)
`introspect`	READ	Discover available operations and capabilities

This is 7 operations across 3 endpoints — a fraction of a full MCP-AQL adapter's surface.

4.2 Endpoint Classification

The endpoint placement of these operations is critical to the permission architecture:

Endpoint	Default Permission	Operations	Rationale
READ	`AUTO_APPROVE`	`introspect`	Zero friction for discovery
CREATE	`CONFIRM_SESSION`	`record_execution_step`, `verify_challenge`	Approve once, then frictionless for the session
EXECUTE	`CONFIRM_SINGLE_USE`	`execute_agent`, `complete_execution`, `abort_execution`, `confirm_operation`	Each invocation requires explicit approval

The key design decision: record_execution_step is on CREATE (not EXECUTE) because it runs on every loop cycle. If it were on EXECUTE, the client would need to either:

Auto-approve all EXECUTE operations — which also auto-approves confirm_operation, collapsing the Gatekeeper permission architecture
Manually approve every loop cycle — making the safety loop unusable in practice

By placing it on CREATE with CONFIRM_SESSION semantics, the user approves it once at the start of the session, and subsequent calls are frictionless. Meanwhile, confirm_operation stays on EXECUTE with CONFIRM_SINGLE_USE, preserving the permission model.

4.3 Permission Architecture

The CRUDE endpoint classification directly determines default friction levels:

READ (AUTO_APPROVE)
  └── introspect           → Zero friction

CREATE (CONFIRM_SESSION)
  ├── record_execution_step → Approve once, then frictionless
  └── verify_challenge      → Approve once, then frictionless

EXECUTE (CONFIRM_SINGLE_USE)
  ├── execute_agent         → Explicit approval each time
  ├── complete_execution    → Explicit approval each time
  ├── abort_execution       → Explicit approval each time
  └── confirm_operation     → AUTO_APPROVE override (avoids "confirm to confirm" loop)

Implementation Note: confirm_operation has an AUTO_APPROVE override despite being on the EXECUTE endpoint. This prevents an infinite confirmation loop where confirming an operation itself requires confirmation.

Security Note: Because confirm_operation is auto-approved, the LLM could theoretically call it to approve its own blocked operations (Gatekeeper blocks or Autonomy Evaluator confirm tier pauses) without human involvement. Implementations MUST prevent self-approval — the entity confirming an operation MUST NOT be the same agent that triggered the block. Server-side controls (caller identity validation, confirmation tokens bound to a human channel, or out-of-band confirmation delivery) are required. See Section 8.7.3 for the normative requirement.

5. Integration with Gatekeeper

5.1 How the Dongle Evaluates Actions

The safety dongle uses the same Gatekeeper architecture as a full MCP-AQL adapter, but scoped to the execution safety loop:

Layer 1 — Route Validation: Verifies the operation is called via the correct CRUDE endpoint
Layer 2 — Safety Tier: Evaluates the endpoint's permission characteristics
Layer 3 — Policy Evaluation: Checks nextActionHint against configured policies
Layer 4 — Confirmation: Issues confirmation tokens for operations requiring acknowledgment

For the safety dongle, Layer 3 is the most important: it evaluates the LLM's stated intent against configurable patterns.

5.2 Policy Model

Policies control which actions are allowed, denied, or require confirmation. Policies are matched against the nextActionHint string using glob patterns:

// Example policy configuration
{
  deny: [
    "drop_*",           // Block any drop operations
    "delete_all*",      // Block bulk deletions
    "*_production*",    // Block anything targeting production
    "rm -rf*"           // Block recursive force deletions
  ],
  requiresApproval: [
    "delete_*",         // Require approval for deletions
    "*force*",          // Require approval for force operations
    "deploy_*",         // Require approval for deployments
    "git push*"         // Require approval for git pushes
  ],
  autoApprove: [
    "read_*",           // Auto-approve reads
    "list_*",           // Auto-approve listings
    "get_*",            // Auto-approve getters
    "search_*"          // Auto-approve searches
  ]
}

Resolution order: deny > requiresApproval > autoApprove > default (evaluate via Autonomy Evaluator)

5.3 Notification System

When the Gatekeeper blocks an operation or the Autonomy Evaluator pauses execution, notifications are included in the AutonomyDirective response:

{
  continue: false,
  reason: "Action requires approval: delete_user",
  factors: ["Pattern match: delete_* requires approval"],
  notifications: [
    {
      type: "permission_pending",
      message: "Operation 'delete_user' requires confirmation",
      metadata: {
        operation: "delete_user",
        level: "CONFIRM_SINGLE_USE"
      },
      timestamp: "2026-02-25T14:30:00Z"
    }
  ]
}

Notification types:

Type	Trigger	Action Required
`permission_pending`	Gatekeeper blocked an operation	Call `confirm_operation` to approve
`autonomy_pause`	Autonomy Evaluator returned `continue: false`	For `confirm` tier: call `confirm_operation`; for `verify` tier: out-of-band `verify_challenge` (Section 6)
`danger_zone`	Hard block triggered (`danger_zone` tier or `deny` pattern)	Out-of-band `verify_challenge` required (Section 6)

The notification system is pull-based: notifications are included in record_execution_step responses. There is no push channel — the LLM must call record_execution_step to discover pending events. This aligns with the MCP protocol's request-response model.

6. Integration with Danger Zone

6.1 Danger Level Escalation During Execution

When the Autonomy Evaluator assigns a high risk score to a nextActionHint, the action escalates through safety tiers defined in the Danger Levels specification:

Risk Score	Tier	Behavior
0-30	`advisory`	Log and continue
31-60	`confirm`	Pause for human review (`continue: false`)
61-85	`verify`	Pause + out-of-band verification via `verify_challenge` (Section 8.8)
86-100	`danger_zone`	Hard stop + out-of-band verification via `verify_challenge` (Section 8.8)

6.2 Out-of-Band Verification

Both the verify and danger_zone tiers trigger out-of-band verification, but with different enforcement severity.

Danger Zone hard stop (danger_zone tier or deny pattern match):

The AutonomyDirective returns stopped: true
A verification challenge is generated with a cryptographically random code
The code is displayed through a channel inaccessible to the AI (OS dialog, hardware token, SMS, etc.)
The agent is blocked at the agent level until a human provides the correct code via verify_challenge
The block persists across server restarts

Verify tier pause (verify tier):

The AutonomyDirective returns continue: false (without stopped: true)
A verification challenge is generated and displayed out-of-band (same as above)
The agent is paused — subsequent record_execution_step calls return continue: false until verified
Unlike Danger Zone blocks, verify pauses do not persist across restarts and do not prevent new executions

See Section 8.8 of the core specification for the full out-of-band verification protocol.

Security Principle: The verification code MUST NOT appear in any MCP response, tool result, log accessible to the AI, or _meta field. The AI can only see the challenge ID — never the code itself. This ensures that even a prompt-injected LLM cannot self-verify.

7. Deployment Examples

7.1 Minimal Safety Dongle

A standalone MCP-AQL server with no domain functionality:

// MCP client configuration
{
  mcpServers: {
    // Safety dongle — evaluates all actions
    "safety": {
      command: "mcpaql-safety-dongle",
      args: ["--policy", "./safety-policies.json"]
    },
    // Domain servers — do the actual work
    "filesystem": { command: "mcp-filesystem", args: ["/workspace"] },
    "database": { command: "mcp-postgres", args: ["--connection", "..."] },
    "slack": { command: "mcp-slack", args: ["--token", "..."] }
  }
}

The LLM's system prompt includes:

Before taking any action (tool call, file operation, shell command),
you MUST report your intent to the safety server via
record_execution_step with a detailed nextActionHint.
Only proceed if the AutonomyDirective returns continue: true.

7.2 Safety Dongle + Multiple MCP Servers

In a multi-server environment, the safety dongle monitors actions across all servers:

┌─────────────────────────────────────────────────┐
│  MCP Client                                     │
│                                                 │
│  ┌─────────────┐                                │
│  │ Safety      │ ◄── report ALL intended actions│
│  │ Dongle      │ ──► go/no-go directives        │
│  │ (MCP-AQL)   │                                │
│  └─────────────┘                                │
│                                                 │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐           │
│  │FS    │ │DB    │ │Slack │ │Git   │           │
│  │Server│ │Server│ │Server│ │Server│           │
│  └──────┘ └──────┘ └──────┘ └──────┘           │
└─────────────────────────────────────────────────┘

The domain servers do not need to know about the safety dongle. They are standard MCP servers — the safety evaluation happens between the LLM and the dongle, before the LLM calls any tool on any server.

7.3 Full MCP-AQL Adapter with Embedded Safety

A full MCP-AQL adapter can embed the execution safety loop directly — no separate dongle needed:

// Single adapter with full CRUDE surface + embedded safety
{
  mcpServers: {
    "dollhouse": {
      command: "dollhouse-mcp",
      args: ["--safety-mode", "enforcing"]
    }
  }
}

In this model, the adapter handles both domain operations (elements, personas, agents) and safety enforcement (Gatekeeper, Autonomy Evaluator, Danger Zone) in the same server. The execution safety loop is just one feature among many.

8. Opting Out

8.1 When to Disable

The execution safety loop may be unnecessary or undesirable in several scenarios:

Trusted environments: Development machines, sandboxed containers, or testing environments where safety enforcement adds friction without benefit
Performance-sensitive workloads: The loop adds latency to every action (one round-trip per tool call)
Simple automation: Scripts or pipelines where the action sequence is predetermined and does not need LLM-level safety evaluation
User preference: Some users may prefer to operate without safety constraints

8.2 How to Disable

Adapters indicate their safety loop status via introspection:

// Safety loop actively enforcing
{ capabilities: { execution_safety_loop: "enforcing" } }

// Safety loop in monitoring mode (advisory only)
{ capabilities: { execution_safety_loop: "monitoring" } }

// Safety loop logging only (no evaluation)
{ capabilities: { execution_safety_loop: "logging" } }

// Safety loop supported but currently disabled
{ capabilities: { execution_safety_loop: "disabled" } }

// Adapter does not support the safety loop (field omitted entirely)

Disabling is typically done via adapter configuration:

// Adapter startup configuration
{ safetyLoop: { mode: "disabled" } }

// Or via environment variable
// MCPAQL_SAFETY_LOOP=disabled

8.3 Partial Modes

Between full enforcement and fully disabled, adapters MAY support intermediate modes:

Monitoring mode ("monitoring"):

Actions are evaluated by the full safety pipeline
AutonomyDirective is returned with accurate factors and risk assessment
continue is always true — the directive is advisory, never blocking
stopped MUST NOT be set to true
Useful for: observing what the safety loop would block without actually blocking, gradual rollout, policy tuning

Logging mode ("logging"):

Actions are recorded for audit purposes
No evaluation occurs — the safety pipeline is not invoked
Useful for: compliance auditing, post-incident analysis, establishing baseline action patterns

9. Implementation Requirements

This section provides a compliance checklist for adapters implementing the execution safety loop. Every requirement listed here corresponds to a normative MUST, SHOULD, or MAY in Sections 8.6–8.8 of the core specification. See Section 9.4 for the complete mapping between normative sections and compliance bullets.

9.1 MUST Requirements

Adapters that support the execution safety loop:

Core Operations & Introspection:

MUST document whether they support the execution safety loop via introspection (Section 8.6.1)
MUST clearly indicate via introspection whether safety enforcement is active or disabled (Section 8.6.7)
MUST implement execute_agent, record_execution_step, complete_execution, and abort_execution (Section 8.6.2)
MUST place record_execution_step on the CREATE endpoint (Section 8.5.4)
MUST place execute_agent, complete_execution, and abort_execution on the EXECUTE endpoint (Section 8.5.4)

AutonomyDirective Contract:

MUST return an AutonomyDirective on every record_execution_step call (Section 8.7.1)
MUST include continue and factors in every AutonomyDirective (Section 8.7.1)
MUST NOT allow agents to proceed after a stop directive (stopped: true) (Section 8.6.4)
MUST NOT set stopped: true when operating in monitoring mode (Section 8.6.4)
MUST return stopped: true in the AutonomyDirective for danger_zone tier and deny pattern triggers (Section 8.8.1)

Step Limit:

MUST implement the Step Limit stage with a configurable maxAutonomousSteps threshold (Section 8.7.5)

Notifications & Self-Approval:

MUST include a danger_zone notification broadcast to all executing agents when returning stopped: true (Section 8.8.1)
MUST include a danger_zone notification with metadata.verificationId for danger_zone tier and deny pattern challenges (Section 8.8.2)
MUST include an autonomy_pause notification with metadata.verificationId when issuing a verify tier challenge (Section 8.8.2)
MUST prevent self-approval — the entity confirming an operation MUST NOT be the same agent that triggered the block (Section 8.7.3)

Out-of-Band Verification (when the adapter implements verify or danger_zone tier evaluation, or deny pattern matching):

MUST require out-of-band human verification when a verify or danger_zone safety tier is assigned, or when a deny pattern matches (Section 8.8)
MUST generate verification codes with at least 128 bits of entropy from a cryptographically secure random source; codes MUST be cryptographically unpredictable (Section 8.8.2)
MUST display verification codes through a channel inaccessible to the AI agent — codes MUST NOT appear in any MCP response, tool result, _meta field, log, error message, or diagnostic output accessible to the AI, and MUST NOT be derivable from any information available to the AI (e.g., timestamps, sequential IDs) (Sections 8.8.2, 8.8.4)
MUST treat expired verification challenges as failed (Section 8.8.2)

Blocking Semantics (when the adapter implements stopped: true behavior):

MUST reject all subsequent execution operations for a hard-blocked agent until unblocked via verify_challenge or admin override (Section 8.8.3)
MUST NOT allow an agent to bypass a hard block by starting a new execution — blocks apply at the agent level, not the execution level (Section 8.8.3)
MUST persist hard-blocked agent state across server restarts (file-based or database storage) (Section 8.8.3)

9.2 SHOULD Requirements

Adapters that support the execution safety loop:

Evaluation Pipeline:

SHOULD evaluate actions through the multi-stage pipeline defined in Section 8.7.2
SHOULD return continue: false with an appropriate reason when the step limit is exceeded (Section 8.7.2, Stage 1)
SHOULD document the default step limit via introspection (Section 8.7.2, Stage 1)
SHOULD evaluate the outcome field in record_execution_step calls and return continue: false on reported failures (Section 8.7.2, Stage 2)
SHOULD support configurable policy patterns (deny, requiresApproval, autoApprove) (Section 8.7.2, Stage 3)
SHOULD NOT allow agents to proceed after continue: false without human intervention (Section 8.6.4)

Operations:

SHOULD implement confirm_operation for Gatekeeper blocks and confirm tier pauses (Section 8.7.3)
SHOULD implement verify_challenge for verify tier pauses and Danger Zone unblocking (Section 8.8)

Configuration & Introspection:

SHOULD expose the execution_safety_loop capability value in the introspection response (Section 8.6.1)
SHOULD make evaluation pipeline elements configurable per agent or per adapter (Section 8.7.4)
SHOULD document default configuration and supported options via introspection (Section 8.7.4)

Notifications:

SHOULD include notifications in AutonomyDirective responses for non-hard-block events (note: verify tier autonomy_pause with metadata.verificationId and danger_zone notifications are MUST — see Section 9.1) (Section 8.7.3)

Verify Tier Behavior:

SHOULD return continue: false with the pending challenge ID for subsequent record_execution_step calls while a verify challenge is pending (Section 8.8.3)
SHOULD re-evaluate actions normally when a verify challenge expires, generating a new challenge if the same action is reported again (Section 8.8.3)
SHOULD expire verification challenges after a configurable timeout (default: 5 minutes) (Section 8.8.2)

Rate Limiting:

SHOULD rate-limit failed verification attempts (no more than 10 per 60-second window per agent) (Section 8.8.5)
SHOULD reject subsequent verification attempts after the rate limit is exceeded (Section 8.8.5)
SHOULD persist rate limit state across server restarts (Section 8.8.5)
SHOULD trigger security audit events on failed verification attempts (Section 8.8.5)

Monitoring & Audit:

SHOULD support the "monitoring" partial mode for gradual rollout (Section 8.6.7)

9.3 MAY Requirements

Adapters that support the execution safety loop:

MAY support the "logging" partial mode (Section 8.6.7)
MAY implement Safety Tier evaluation (Stage 4) using pattern matching, LLM-provided risk assessments, or adapter-specific heuristics for risk scoring (Section 8.7.2, Stage 4)
MAY support configurable risk tolerance thresholds (conservative, moderate, aggressive) (Section 8.7.2, Stage 5)
MAY apply progressive lockout (e.g., doubling the window duration) after repeated rate-limit violations (Section 8.8.5)
MAY use OS dialogs, hardware tokens, SMS/email, or other display channels for presenting verification codes to operators (Section 8.8.6)

9.4 Normative Cross-Reference

The following table maps each normative section to its corresponding Section 9 requirements, providing an audit trail for compliance verification.

Normative Section	Key Requirements	Section 9 Coverage
8.6.1 Opt-In Activation	MUST document support; SHOULD expose capability value	9.1 (introspection); 9.2 (capability value)
8.6.2 Enforcement Boundary	MUST report and evaluate all actions	9.1 (operations, AutonomyDirective on every call)
8.6.3 Action Reporting	MUST/SHOULD/MAY parameter requirements	9.1 (operations — parameter validation is part of implementing `record_execution_step`)
8.6.4 Continuous Enforcement	MUST NOT proceed after stop; SHOULD NOT after pause; MUST NOT `stopped` in monitoring	9.1 (stop directive, monitoring mode); 9.2 (SHOULD NOT after pause)
8.6.5 Non-Bypass Property	Agent MUST NOT act outside loop	Agent-side protocol obligation (not adapter-specific)
8.6.6 Scope of Monitoring	Monitors all actions (informative)	Described in Sections 3.2, 2.2 of this document
8.6.7 Disabling	MAY disable; MUST indicate via introspection	9.1 (introspection); 9.2 (monitoring); 9.3 (logging)
8.7.1 AutonomyDirective	MUST include `continue` + `factors`	9.1 (AutonomyDirective fields)
8.7.2 Pipeline Stages 1–5	MUST step limit; SHOULD stages 2–3; MAY stages 4–5	9.1 (step limit MUST); 9.2 (stage 1 documentation SHOULDs, stages 2–3 SHOULDs); 9.3 (stages 4–5)
8.7.3 Notifications	MUST notification contents; MUST prevent self-approval	9.1 (notifications, self-approval); 9.2 (non-hard-block notifications)
8.7.4 Configurable Elements	SHOULD configurable; SHOULD document	9.2 (configuration, introspection)
8.7.5 Minimum Viable	MUST Step Limit; SHOULD Previous Outcome + Pattern Matching; MAY Safety Tier + Risk Tolerance	9.1 (step limit); 9.2 (outcome, patterns); 9.3 (tier, tolerance)
8.8 Introduction	MUST require OOB for `verify`/`danger_zone`	9.1 (OOB verification)
8.8.1 Trigger Conditions	MUST `stopped: true` + `danger_zone` notification	9.1 (`stopped: true`, notification broadcast)
8.8.2 Challenge Protocol	MUST entropy, display, notification, expiration	9.1 (code generation, display, challenge ID, expiration); 9.2 (timeout)
8.8.3 Blocking Semantics	MUST reject, persist, no bypass; SHOULD verify behavior	9.1 (blocking); 9.2 (verify tier)
8.8.4 Channel Separation	MUST NOT expose code (5 specific prohibitions)	9.1 (display/channel separation — consolidated)
8.8.5 Rate Limiting	SHOULD rate-limit, persist, audit; MAY progressive lockout	9.2 (rate limiting); 9.3 (lockout)
8.8.6 Implementation Flexibility	MAY any display channel	9.3 (display channels)

Core Specification — Section 8.6: Execution Safety Loop — Normative requirements
Core Specification — Section 8.7: Autonomy Evaluation — AutonomyDirective contract and evaluation pipeline
Core Specification — Section 8.8: Out-of-Band Verification — Out-of-band verification protocol (verify tier and Danger Zone)
Gatekeeper Specification — Multi-layer access control architecture
Danger Levels Specification — Risk classification and trust-to-danger gating
Confirmation Tokens Specification — Token protocol for operation confirmation

Execution Safety Loop Specification

Jump to a section

Abstract

1. Introduction

1.1 Purpose

1.2 Status

1.3 Core Concept

1.4 Opt-In Design

2. The Safety Dongle Deployment Model

2.1 What Is a Safety Dongle?

2.2 Architecture

2.3 What the Dongle Does NOT Do

2.4 Relationship to Full MCP-AQL Adapters

3. The Execution Safety Loop

3.1 Loop Protocol

3.2 Monitoring Scope

3.3 The record_execution_step Parameters

3.4 The AutonomyDirective Response

4. Minimal Operation Surface

4.1 Required Operations

4.2 Endpoint Classification

4.3 Permission Architecture

5. Integration with Gatekeeper

5.1 How the Dongle Evaluates Actions

5.2 Policy Model

5.3 Notification System

6. Integration with Danger Zone

6.1 Danger Level Escalation During Execution

6.2 Out-of-Band Verification

7. Deployment Examples

7.1 Minimal Safety Dongle

7.2 Safety Dongle + Multiple MCP Servers

7.3 Full MCP-AQL Adapter with Embedded Safety

8. Opting Out

8.1 When to Disable

8.2 How to Disable

8.3 Partial Modes

9. Implementation Requirements

9.1 MUST Requirements

9.2 SHOULD Requirements

9.3 MAY Requirements

9.4 Normative Cross-Reference

Related Specifications

3.3 The `record_execution_step` Parameters

3.4 The `AutonomyDirective` Response