
Execution Safety Loop Specification


Support document | Draft | 1.0.0-draft | 2026-04-15

Source: spec/docs/security/execution-safety-loop.md

Version: 1.0.0-draft
Status: Draft
Last Updated: 2026-04-15

Informative Document: This is an informative specification that provides detailed guidance for the execution safety loop pattern defined normatively in Section 8.6 of the core specification. In case of conflict, the normative specification takes precedence.

Abstract

This document specifies the execution safety loop — an opt-in monitoring pattern that enables continuous safety evaluation of an LLM's intended actions across all connected MCP servers. It also defines the safety dongle deployment model, where a standalone MCP-AQL server functions as a universal safety layer without implementing any other MCP-AQL features.


1. Introduction

1.1 Purpose

The execution safety loop addresses a fundamental challenge in LLM tool-calling environments: how do you enforce safety policies on an LLM's actions when those actions target multiple independent MCP servers?

Individual MCP servers can protect their own operations, but no single server has visibility into what the LLM is doing across all connected servers. The execution safety loop addresses this by providing a central checkpoint to which the LLM reports before taking any action, regardless of which server handles the actual operation.

1.2 Status

The execution safety loop is an optional feature defined normatively in Section 8.6 of the core specification. This document provides informative guidance on deployment models, integration patterns, and implementation strategies.

1.3 Core Concept

The execution safety loop is a protocol-level monitoring pattern:

  1. The LLM plans an action (any tool call, file operation, shell command, etc.)
  2. The LLM reports its intent to a safety-enabled MCP-AQL server via record_execution_step with nextActionHint
  3. The server evaluates the intent against its safety pipeline (Gatekeeper policies, Autonomy Evaluator, Danger Zone rules)
  4. The server returns an AutonomyDirective — go, pause, or hard stop
  5. The LLM respects the directive before proceeding

This is distinct from the full agent lifecycle that a specific adapter (like DollhouseMCP) may implement. The safety loop is purely about monitoring and evaluating — it does not manage agent state, personas, elements, or any domain-specific functionality.

1.4 Opt-In Design

The execution safety loop is entirely optional at every level:

  • Adapters choose whether to support it
  • Clients choose whether to enable it
  • Users can disable it at any time

No adapter is required to implement the safety loop, and no client is required to use it. When disabled, the adapter operates normally without any safety enforcement overhead. See Section 8: Opting Out for details.


2. The Safety Dongle Deployment Model

2.1 What Is a Safety Dongle?

A safety dongle is a standalone MCP-AQL server whose sole purpose is safety enforcement. It has no domain functionality — it does not manage files, query databases, or interact with external services. It exists only to evaluate the LLM's intended actions and return go/no-go directives.

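The name "dongle" reflects the deployment model: you plug it in alongside your existing MCP servers, and it acts as a universal safety checkpoint for all tool calls in the session. Unlike a network firewall, it does not intercept or proxy traffic; it relies on the LLM reporting its intent before acting (see Section 3.2).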

2.2 Architecture

┌────────────────────────────────────────────────────────┐
│  MCP Client (Claude, GPT, etc.)                        │
│                                                        │
│  LLM decides to take an action                         │
│       │                                                │
│       ▼                                                │
│  1. Report intent to safety dongle                     │
│     record_execution_step {                            │
│       nextActionHint: "calling write_file on server X" │
│     }                                                  │
│       │                                                │
│       ▼                                                │
│  ┌──────────────────────────────────────────┐          │
│  │  Safety Dongle (MCP-AQL Server)          │          │
│  │                                          │          │
│  │  Gatekeeper → Autonomy Evaluator →       │          │
│  │  Danger Zone Enforcer                    │          │
│  │                                          │          │
│  │  Returns: AutonomyDirective              │          │
│  │  { continue: true/false, factors: [...] }│          │
│  └──────────────────────────────────────────┘          │
│       │                                                │
│       ▼                                                │
│  2. If continue: execute the action on target server   │
│     If !continue: pause, escalate, or abort            │
│                                                        │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐                 │
│  │ MCP     │  │ MCP     │  │ MCP     │                 │
│  │ Server A│  │ Server B│  │ Server C│                 │
│  │ (files) │  │ (db)    │  │ (slack) │                 │
│  └─────────┘  └─────────┘  └─────────┘                 │
└────────────────────────────────────────────────────────┘

2.3 What the Dongle Does NOT Do

A safety dongle in its minimal configuration:

  • Does not manage elements (personas, skills, templates, agents, memories)
  • Does not store or query domain data
  • Does not interact with external services beyond safety evaluation
  • Does not implement the full CRUDE pattern for domain operations
  • Does not require introspection of other servers' capabilities

It is a purpose-built safety layer with a minimal operation surface.

2.4 Relationship to Full MCP-AQL Adapters

The safety dongle and a full MCP-AQL adapter are both valid configurations of the same protocol. They differ only in scope:

Aspect | Safety Dongle | Full MCP-AQL Adapter
Purpose | Safety enforcement only | Domain operations + safety
Operations | ~7 (safety loop + introspection) | Dozens (full CRUDE surface)
Elements | None | Personas, skills, agents, etc.
State | Execution state + policies only | Full domain state
Endpoints used | CREATE, EXECUTE, READ (subset) | All five CRUDE endpoints

A full MCP-AQL adapter MAY embed the safety loop directly — it does not need a separate dongle. The dongle model is for environments where safety enforcement is decoupled from the adapters performing the actual work.


3. The Execution Safety Loop

3.1 Loop Protocol

The execution safety loop follows a simple protocol:

┌─────────────────────────────────────┐
│                                     │
│  ┌──────────┐                       │
│  │  Plan    │  LLM determines       │
│  │  action  │  next action          │
│  └────┬─────┘                       │
│       │                             │
│       ▼                             │
│  ┌──────────┐                       │
│  │  Report  │  record_execution_step│
│  │  intent  │  { nextActionHint,    │
│  │          │    outcome? }         │
│  └────┬─────┘                       │
│       │                             │
│       ▼                             │
│  ┌──────────┐                       │
│  │ Evaluate │  Gatekeeper +         │
│  │          │  Autonomy Evaluator + │
│  │          │  Danger Zone          │
│  └────┬─────┘                       │
│       │                             │
│       ├─────────────┐               │
│       ▼             ▼               │
│  ┌──────────┐  ┌──────────┐         │
│  │ continue │  │ continue │         │
│  │ = true   │  │ = false  │         │
│  └────┬─────┘  └────┬─────┘         │
│       ▼             ▼               │
│  ┌──────────┐  ┌──────────┐         │
│  │  Act     │  │  Pause / │         │
│  │  (tool   │  │  Escalate│         │
│  │   call)  │  └──────────┘         │
│  └────┬─────┘                       │
│       │                             │
│       ▼                             │
│  (loop back to Plan)                │
│                                     │
└─────────────────────────────────────┘

Lifecycle:

  1. Start: execute_agent — initiates the safety loop (requires explicit approval)
  2. Loop: record_execution_step — report each intended action, receive directive
  3. End: complete_execution (success) or abort_execution (abnormal termination)
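The lifecycle can be sketched from the client side as a single loop. This is a minimal illustration under stated assumptions, not a normative API: the `SafetyClient` interface, its camelCase method names, the `executionId` handle, and the `PlannedAction` shape are hypothetical stand-ins for however a client actually invokes the four operations. The outcome of each step is reported on the following record_execution_step call, matching the `outcome` field described in Section 3.3.

```typescript
// Hypothetical client-side view of the lifecycle operations. Method names
// and parameter shapes are illustrative assumptions, not the normative API.
type StepOutcome = "success" | "failure" | "skipped";

interface Directive {
  continue: boolean;
  factors: string[];
  stopped?: boolean;
  reason?: string;
}

interface SafetyClient {
  executeAgent(goal: string): Promise<string>; // returns a hypothetical executionId
  recordExecutionStep(
    executionId: string,
    step: { nextActionHint: string; outcome?: StepOutcome },
  ): Promise<Directive>;
  completeExecution(executionId: string): Promise<void>;
  abortExecution(executionId: string, reason: string): Promise<void>;
}

interface PlannedAction {
  hint: string; // sent as nextActionHint
  run: () => Promise<boolean>; // performs the action; true on success
}

// Start (execute_agent), loop (record_execution_step before every action),
// end (complete_execution, or abort_execution if an action throws).
async function runSafetyLoop(
  client: SafetyClient,
  goal: string,
  actions: PlannedAction[],
): Promise<"completed" | "paused" | "stopped"> {
  const executionId = await client.executeAgent(goal);
  let previous: StepOutcome | undefined; // outcome of the prior step, if any
  for (const action of actions) {
    const directive = await client.recordExecutionStep(executionId, {
      nextActionHint: action.hint,
      outcome: previous,
    });
    if (directive.stopped === true) return "stopped"; // hard stop: do nothing else
    if (!directive.continue) return "paused"; // wait for a human
    try {
      previous = (await action.run()) ? "success" : "failure";
    } catch (err) {
      await client.abortExecution(executionId, String(err));
      throw err;
    }
  }
  await client.completeExecution(executionId);
  return "completed";
}
```

On a hard stop the sketch returns without calling abort_execution, because execution operations from a hard-blocked agent are rejected (Section 9.1); on a pause it leaves the execution open so a human can confirm or verify.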

3.2 Monitoring Scope

The execution safety loop monitors all intended actions, not only MCP-AQL operations on the same adapter. This includes:

  • Tool calls to any MCP server in the client session
  • Tool calls to non-MCP-AQL MCP servers
  • Built-in client capabilities (file I/O, shell access, web requests)
  • Any operation that produces effects beyond the LLM's reasoning context

This broad scope is what makes the safety dongle deployment model possible: the dongle does not need to understand or proxy the actual operations — it only needs to evaluate the LLM's description of what it intends to do.

3.3 The record_execution_step Parameters

The record_execution_step operation accepts several parameters. The full parameter table is defined normatively in Section 8.6.3 of the core specification. The two most important fields are:

  • nextActionHint (MUST) — describes the intended next action for safety evaluation
  • outcome (SHOULD) — reports the result of the previous step ("success", "failure", or "skipped"), which drives Stage 2 of the Autonomy Evaluator pipeline (Section 8.7.2)

The nextActionHint is a human-readable string describing the intended action with sufficient detail for safety evaluation.

Good nextActionHint examples:

// Specific tool call with target
"calling write_file on project/config.json to update database URL"

// Shell command with detail
"executing shell command: npm install express"

// Cross-server operation
"sending message to #general channel via Slack MCP server"

// Destructive operation
"deleting all records from staging_users table via database MCP server"

// File system operation
"reading /etc/passwd to check user configuration"

Poor nextActionHint examples:

// Too vague — cannot evaluate risk
"doing something"

// No target information
"writing a file"

// Missing context for risk assessment
"running a command"

The quality of safety evaluation depends directly on the quality of the nextActionHint. LLM system prompts SHOULD instruct the model to provide specific, detailed action descriptions.

3.4 The AutonomyDirective Response

Every record_execution_step call returns an AutonomyDirective indicating whether the LLM may proceed. See Section 8.7 of the core specification for the full type definition.

Key fields:

Field | Type | Description
continue | boolean | Whether the LLM may proceed with the reported action
factors | string[] | Human-readable explanations of the evaluation decision
stopped | boolean | Whether the agent has been hard-blocked (Danger Zone)
reason | string | Why the agent was paused or stopped
stepsRemaining | number | Steps remaining before mandatory pause
nextStepRisk | SafetyTier | Safety tier assigned to the reported action
notifications | AgentNotification[] | Gatekeeper blocks, danger alerts, and other events

Decision tree:

  1. If stopped === true: Hard stop. Agent MUST cease all actions until unblocked via out-of-band verification.
  2. If continue === false: Pause. Agent MUST NOT proceed. Report the pause to the user or upstream system.
  3. If continue === true: Proceed. Agent MAY execute the reported action.
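The key fields and the decision tree can be expressed as a TypeScript type plus one function. This is a sketch, not the normative definition in Section 8.7 of the core specification: treating every field other than `continue` and `factors` as optional, the `SafetyTier` union (taken from the tier names in Section 6), the reduced `AgentNotification` shape (taken from the Section 5.3 example), and the helper name `decide` are all assumptions.

```typescript
// Tier names as used in Section 6 of this document (assumed union).
type SafetyTier = "advisory" | "confirm" | "verify" | "danger_zone";

// Reduced to the fields shown in the Section 5.3 example (assumption).
interface AgentNotification {
  type: string; // e.g. "permission_pending", "autonomy_pause", "danger_zone"
  message: string;
  metadata?: Record<string, unknown>;
  timestamp: string;
}

interface AutonomyDirective {
  continue: boolean; // always present
  factors: string[]; // always present
  stopped?: boolean;
  reason?: string;
  stepsRemaining?: number;
  nextStepRisk?: SafetyTier;
  notifications?: AgentNotification[];
}

type Decision = "hard_stop" | "pause" | "proceed";

// Hypothetical helper. Order matters: stopped is checked first, so a
// directive with stopped: true is never treated as a pause or a proceed.
function decide(d: AutonomyDirective): Decision {
  if (d.stopped === true) return "hard_stop";
  if (d.continue === false) return "pause";
  return "proceed";
}
```

Checking `stopped` before `continue` means a malformed directive carrying both `stopped: true` and `continue: true` still results in a hard stop, which is the safer reading.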

4. Minimal Operation Surface

4.1 Required Operations

A safety dongle implements a minimal set of operations:

Operation | Endpoint | Purpose
execute_agent | EXECUTE | Start the execution safety loop
record_execution_step | CREATE | Report intended action, receive AutonomyDirective
complete_execution | EXECUTE | Signal normal completion of the safety loop
abort_execution | EXECUTE | Signal abnormal termination
confirm_operation | EXECUTE | Confirm Gatekeeper blocks or Autonomy Evaluator confirm tier pauses
verify_challenge | CREATE | Submit out-of-band verification code (verify tier and Danger Zone)
introspect | READ | Discover available operations and capabilities

That is seven operations across three endpoints, a small fraction of a full MCP-AQL adapter's surface.

4.2 Endpoint Classification

The endpoint placement of these operations is critical to the permission architecture:

Endpoint | Default Permission | Operations | Rationale
READ | AUTO_APPROVE | introspect | Zero friction for discovery
CREATE | CONFIRM_SESSION | record_execution_step, verify_challenge | Approve once, then frictionless for the session
EXECUTE | CONFIRM_SINGLE_USE | execute_agent, complete_execution, abort_execution, confirm_operation | Each invocation requires explicit approval (except confirm_operation, which carries an AUTO_APPROVE override; see Section 4.3)

The key design decision: record_execution_step is on CREATE (not EXECUTE) because it runs on every loop cycle. If it were on EXECUTE, the client would need to either:

  • Auto-approve all EXECUTE operations — which also auto-approves confirm_operation, collapsing the Gatekeeper permission architecture
  • Manually approve every loop cycle — making the safety loop unusable in practice

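With record_execution_step on CREATE under CONFIRM_SESSION semantics, the user approves it once at the start of the session and subsequent calls are frictionless. Meanwhile, confirm_operation stays on EXECUTE, whose default is CONFIRM_SINGLE_USE; its AUTO_APPROVE override (Section 4.3) is a narrow per-operation exception rather than a blanket grant for the endpoint, which preserves the permission model.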

4.3 Permission Architecture

The CRUDE endpoint classification directly determines default friction levels:

READ (AUTO_APPROVE)
  └── introspect           → Zero friction

CREATE (CONFIRM_SESSION)
  ├── record_execution_step → Approve once, then frictionless
  └── verify_challenge      → Approve once, then frictionless

EXECUTE (CONFIRM_SINGLE_USE)
  ├── execute_agent         → Explicit approval each time
  ├── complete_execution    → Explicit approval each time
  ├── abort_execution       → Explicit approval each time
  └── confirm_operation     → AUTO_APPROVE override (avoids "confirm to confirm" loop)

Implementation Note: confirm_operation has an AUTO_APPROVE override despite being on the EXECUTE endpoint. This prevents an infinite confirmation loop where confirming an operation itself requires confirmation.
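The endpoint placement, default permissions, and the single override above can be captured in lookup tables with a resolver that checks the override first. The table and function names are hypothetical; `isValidRoute` corresponds to the Layer 1 route check described in Section 5.1, and the `Endpoint` type is deliberately limited to the three endpoints a safety dongle uses.

```typescript
// Subset of CRUDE endpoints used by a safety dongle.
type Endpoint = "READ" | "CREATE" | "EXECUTE";
type Permission = "AUTO_APPROVE" | "CONFIRM_SESSION" | "CONFIRM_SINGLE_USE";
type Operation =
  | "execute_agent" | "record_execution_step" | "complete_execution"
  | "abort_execution" | "confirm_operation" | "verify_challenge" | "introspect";

// Hypothetical registry mirroring the operation table in Section 4.1.
const OPERATION_ENDPOINT: Record<Operation, Endpoint> = {
  introspect: "READ",
  record_execution_step: "CREATE",
  verify_challenge: "CREATE",
  execute_agent: "EXECUTE",
  complete_execution: "EXECUTE",
  abort_execution: "EXECUTE",
  confirm_operation: "EXECUTE",
};

const ENDPOINT_DEFAULT: Record<Endpoint, Permission> = {
  READ: "AUTO_APPROVE",
  CREATE: "CONFIRM_SESSION",
  EXECUTE: "CONFIRM_SINGLE_USE",
};

// Narrow per-operation exception that avoids a "confirm to confirm" loop.
const PERMISSION_OVERRIDES: Partial<Record<Operation, Permission>> = {
  confirm_operation: "AUTO_APPROVE",
};

function permissionFor(op: Operation): Permission {
  return PERMISSION_OVERRIDES[op] ?? ENDPOINT_DEFAULT[OPERATION_ENDPOINT[op]];
}

// Layer 1 route validation: an operation must arrive on its assigned endpoint.
function isValidRoute(op: Operation, calledVia: Endpoint): boolean {
  return OPERATION_ENDPOINT[op] === calledVia;
}
```

Keeping the override in its own table makes the exception easy to audit, but it does nothing about self-approval; that still requires the server-side controls described in the Security Note.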

Security Note: Because confirm_operation is auto-approved, the LLM could theoretically call it to approve its own blocked operations (Gatekeeper blocks or Autonomy Evaluator confirm tier pauses) without human involvement. Implementations MUST prevent self-approval — the entity confirming an operation MUST NOT be the same agent that triggered the block. Server-side controls (caller identity validation, confirmation tokens bound to a human channel, or out-of-band confirmation delivery) are required. See Section 8.7.3 for the normative requirement.
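One of the server-side controls listed above, a confirmation token bound to a human channel combined with a caller identity check, can be sketched as follows. Everything here is an assumption for illustration: the `PendingConfirmation` record, the `deliverToHuman` callback standing in for a channel outside the MCP session, and the premise that the server can authenticate `callerId` independently of anything the agent asserts. The point is that `confirm_operation` succeeds only when the presented token matches one delivered out of band and the caller is not the agent that triggered the block.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

interface PendingConfirmation {
  operationId: string;
  blockedAgentId: string; // agent whose action triggered the block
  token: Buffer; // never returned in any MCP response
}

const pending = new Map<string, PendingConfirmation>();

// Called when the Gatekeeper blocks an operation. The token goes only to the
// human channel (hypothetical deliverToHuman); the agent sees only operationId.
function openConfirmation(
  operationId: string,
  blockedAgentId: string,
  deliverToHuman: (token: string) => void,
): void {
  const token = randomBytes(16);
  pending.set(operationId, { operationId, blockedAgentId, token });
  deliverToHuman(token.toString("hex"));
}

// Hypothetical confirm_operation handler. callerId is assumed to be
// authenticated by the server, not taken from the agent's own claims.
function confirmOperation(operationId: string, callerId: string, presentedToken: string): boolean {
  const entry = pending.get(operationId);
  if (!entry) return false;
  if (callerId === entry.blockedAgentId) return false; // no self-approval
  const presented = Buffer.from(presentedToken, "hex");
  if (presented.length !== entry.token.length) return false;
  if (!timingSafeEqual(presented, entry.token)) return false;
  pending.delete(operationId); // tokens are single use
  return true;
}
```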


5. Integration with Gatekeeper

5.1 How the Dongle Evaluates Actions

The safety dongle uses the same Gatekeeper architecture as a full MCP-AQL adapter, but scoped to the execution safety loop:

  1. Layer 1 — Route Validation: Verifies the operation is called via the correct CRUDE endpoint
  2. Layer 2 — Safety Tier: Evaluates the endpoint's permission characteristics
  3. Layer 3 — Policy Evaluation: Checks nextActionHint against configured policies
  4. Layer 4 — Confirmation: Issues confirmation tokens for operations requiring acknowledgment

For the safety dongle, Layer 3 is the most important: it evaluates the LLM's stated intent against configurable patterns.

5.2 Policy Model

Policies control which actions are allowed, denied, or require confirmation. Policies are matched against the nextActionHint string using glob patterns:

// Example policy configuration
{
  deny: [
    "drop_*",           // Block any drop operations
    "delete_all*",      // Block bulk deletions
    "*_production*",    // Block anything targeting production
    "rm -rf*"           // Block recursive force deletions
  ],
  requiresApproval: [
    "delete_*",         // Require approval for deletions
    "*force*",          // Require approval for force operations
    "deploy_*",         // Require approval for deployments
    "git push*"         // Require approval for git pushes
  ],
  autoApprove: [
    "read_*",           // Auto-approve reads
    "list_*",           // Auto-approve listings
    "get_*",            // Auto-approve getters
    "search_*"          // Auto-approve searches
  ]
}

Resolution order: deny > requiresApproval > autoApprove > default (evaluate via Autonomy Evaluator)
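The resolution order can be sketched as a small matcher. This assumes that `*` matches any run of characters (including none), that a pattern must match the entire nextActionHint string, and that matching is case-sensitive; the document does not pin these details down, so treat them as assumptions. `globToRegExp` and `resolvePolicy` are hypothetical names.

```typescript
interface PolicyConfig {
  deny: string[];
  requiresApproval: string[];
  autoApprove: string[];
}

type PolicyOutcome = "deny" | "requiresApproval" | "autoApprove" | "evaluate";

// Translate a glob with '*' wildcards into an anchored regular expression.
// Every other character is escaped and matched literally.
function globToRegExp(glob: string): RegExp {
  const body = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`);
}

function matchesAny(hint: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(hint));
}

// deny > requiresApproval > autoApprove > default (Autonomy Evaluator).
function resolvePolicy(hint: string, policy: PolicyConfig): PolicyOutcome {
  if (matchesAny(hint, policy.deny)) return "deny";
  if (matchesAny(hint, policy.requiresApproval)) return "requiresApproval";
  if (matchesAny(hint, policy.autoApprove)) return "autoApprove";
  return "evaluate";
}
```

Under the whole-string assumption, a prose hint such as "deleting all records from staging_users table via database MCP server" matches none of the example patterns and falls through to the Autonomy Evaluator, so pattern sets intended for free-text hints generally need leading and trailing `*` wildcards.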

5.3 Notification System

When the Gatekeeper blocks an operation or the Autonomy Evaluator pauses execution, notifications are included in the AutonomyDirective response:

{
  continue: false,
  reason: "Action requires approval: delete_user",
  factors: ["Pattern match: delete_* requires approval"],
  notifications: [
    {
      type: "permission_pending",
      message: "Operation 'delete_user' requires confirmation",
      metadata: {
        operation: "delete_user",
        level: "CONFIRM_SINGLE_USE"
      },
      timestamp: "2026-02-25T14:30:00Z"
    }
  ]
}

Notification types:

Type | Trigger | Action Required
permission_pending | Gatekeeper blocked an operation | Call confirm_operation to approve
autonomy_pause | Autonomy Evaluator returned continue: false | For confirm tier: call confirm_operation; for verify tier: out-of-band verify_challenge (Section 6)
danger_zone | Hard block triggered (danger_zone tier or deny pattern) | Out-of-band verify_challenge required (Section 6)

The notification system is pull-based: notifications are included in record_execution_step responses. There is no push channel — the LLM must call record_execution_step to discover pending events. This aligns with the MCP protocol's request-response model.
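Because notifications arrive only inside record_execution_step responses, a client typically inspects them on every response. The sketch below maps each notification type to the follow-up from the table above. It distinguishes a verify tier pause from a confirm tier pause by the presence of `metadata.verificationId`, which Section 9.1 requires for verify tier challenges; the `FollowUp` shape and `nextStepFor` are hypothetical.

```typescript
interface AgentNotification {
  type: string;
  message: string;
  metadata?: { verificationId?: string; [key: string]: unknown };
  timestamp: string;
}

// Hypothetical follow-up actions. The human, not the agent, supplies any
// verification code, so the agent can only wait and reference the ID.
type FollowUp =
  | { kind: "confirm_operation" }
  | { kind: "await_human_verification"; verificationId?: string }
  | { kind: "none" };

function nextStepFor(n: AgentNotification): FollowUp {
  const verificationId = n.metadata?.verificationId;
  switch (n.type) {
    case "permission_pending":
      return { kind: "confirm_operation" };
    case "autonomy_pause":
      // Verify tier pauses carry a verificationId; confirm tier pauses do not.
      return verificationId
        ? { kind: "await_human_verification", verificationId }
        : { kind: "confirm_operation" };
    case "danger_zone":
      return { kind: "await_human_verification", verificationId };
    default:
      return { kind: "none" };
  }
}
```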


6. Integration with Danger Zone

6.1 Danger Level Escalation During Execution

The Autonomy Evaluator assigns each nextActionHint a risk score, and that score places the action in one of the safety tiers defined in the Danger Levels specification; higher scores escalate to stricter tiers:

Risk Score | Tier | Behavior
0-30 | advisory | Log and continue
31-60 | confirm | Pause for human review (continue: false)
61-85 | verify | Pause + out-of-band verification via verify_challenge (Section 8.8)
86-100 | danger_zone | Hard stop + out-of-band verification via verify_challenge (Section 8.8)
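The score bands translate into a simple lookup. This sketch assumes scores are numbers between 0 and 100 inclusive and treats each band's upper bound as inclusive, so a fractional score such as 30.5 falls into the next tier; `tierForScore` and `directiveFlags` are hypothetical names. How the score itself is produced is left to the Autonomy Evaluator.

```typescript
// Tier names from the table above.
type SafetyTier = "advisory" | "confirm" | "verify" | "danger_zone";

// Hypothetical helper: band upper bounds are treated as inclusive.
function tierForScore(score: number): SafetyTier {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`risk score out of range: ${score}`);
  }
  if (score <= 30) return "advisory";
  if (score <= 60) return "confirm";
  if (score <= 85) return "verify";
  return "danger_zone";
}

// Directive flags implied by each tier's behavior (Sections 6.1 and 6.2).
// Issuing the out-of-band challenge for verify and danger_zone is not shown.
function directiveFlags(tier: SafetyTier): { continue: boolean; stopped: boolean } {
  if (tier === "advisory") return { continue: true, stopped: false };
  if (tier === "danger_zone") return { continue: false, stopped: true };
  return { continue: false, stopped: false }; // confirm and verify both pause
}
```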

6.2 Out-of-Band Verification

Both the verify and danger_zone tiers trigger out-of-band verification, but with different enforcement severity.

Danger Zone hard stop (danger_zone tier or deny pattern match):

  1. The AutonomyDirective returns stopped: true
  2. A verification challenge is generated with a cryptographically random code
  3. The code is displayed through a channel inaccessible to the AI (OS dialog, hardware token, SMS, etc.)
  4. The agent is blocked at the agent level until a human provides the correct code via verify_challenge
  5. The block persists across server restarts

Verify tier pause (verify tier):

  1. The AutonomyDirective returns continue: false (without stopped: true)
  2. A verification challenge is generated and displayed out-of-band (same as above)
  3. The agent is paused — subsequent record_execution_step calls return continue: false until verified
  4. Unlike Danger Zone blocks, verify pauses do not persist across restarts and do not prevent new executions

See Section 8.8 of the core specification for the full out-of-band verification protocol.

Security Principle: The verification code MUST NOT appear in any MCP response, tool result, log accessible to the AI, or _meta field. The AI can only see the challenge ID — never the code itself. This ensures that even a prompt-injected LLM cannot self-verify.
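The challenge side of this protocol might look like the following sketch, written against Node's `node:crypto` module. It assumes 16 random bytes (128 bits, the entropy floor in Section 9.1) rendered as hex, a 5-minute default expiry (the SHOULD default in Section 9.2), a random UUID as the challenge ID so the ID reveals nothing about the code, and a hypothetical `displayOutOfBand` callback standing in for the OS dialog, hardware token, or SMS channel. Only the challenge ID is ever returned toward the agent.

```typescript
import { randomBytes, randomUUID, timingSafeEqual } from "node:crypto";

interface Challenge {
  challengeId: string; // safe to show the agent
  code: Buffer; // leaves the server only via displayOutOfBand
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 5 * 60 * 1000;
const challenges = new Map<string, Challenge>();

function issueChallenge(
  displayOutOfBand: (code: string) => void, // hypothetical human-only channel
  now: number = Date.now(),
  ttlMs: number = DEFAULT_TTL_MS,
): string {
  const challenge: Challenge = {
    challengeId: randomUUID(),
    code: randomBytes(16), // 128 bits from a CSPRNG
    expiresAt: now + ttlMs,
  };
  challenges.set(challenge.challengeId, challenge);
  displayOutOfBand(challenge.code.toString("hex"));
  return challenge.challengeId; // the only value returned toward the agent
}

type VerifyResult = "verified" | "invalid" | "expired" | "unknown";

function verifyChallenge(challengeId: string, submitted: string, now: number = Date.now()): VerifyResult {
  const c = challenges.get(challengeId);
  if (!c) return "unknown";
  if (now > c.expiresAt) {
    challenges.delete(challengeId); // expired challenges count as failed
    return "expired";
  }
  const candidate = Buffer.from(submitted, "hex");
  if (candidate.length !== c.code.length || !timingSafeEqual(candidate, c.code)) {
    return "invalid";
  }
  challenges.delete(challengeId); // single use
  return "verified";
}
```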


7. Deployment Examples

7.1 Minimal Safety Dongle

A standalone MCP-AQL server with no domain functionality:

// MCP client configuration
{
  mcpServers: {
    // Safety dongle — evaluates all actions
    "safety": {
      command: "mcpaql-safety-dongle",
      args: ["--policy", "./safety-policies.json"]
    },
    // Domain servers — do the actual work
    "filesystem": { command: "mcp-filesystem", args: ["/workspace"] },
    "database": { command: "mcp-postgres", args: ["--connection", "..."] },
    "slack": { command: "mcp-slack", args: ["--token", "..."] }
  }
}

The LLM's system prompt includes:

Before taking any action (tool call, file operation, shell command),
you MUST report your intent to the safety server via
record_execution_step with a detailed nextActionHint.
Only proceed if the AutonomyDirective returns continue: true.

7.2 Safety Dongle + Multiple MCP Servers

In a multi-server environment, the safety dongle monitors actions across all servers:

┌─────────────────────────────────────────────────┐
│  MCP Client                                     │
│                                                 │
│  ┌─────────────┐                                │
│  │ Safety      │ ◄── report ALL intended actions│
│  │ Dongle      │ ──► go/no-go directives        │
│  │ (MCP-AQL)   │                                │
│  └─────────────┘                                │
│                                                 │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐            │
│  │FS    │ │DB    │ │Slack │ │Git   │            │
│  │Server│ │Server│ │Server│ │Server│            │
│  └──────┘ └──────┘ └──────┘ └──────┘            │
└─────────────────────────────────────────────────┘

The domain servers do not need to know about the safety dongle. They are standard MCP servers — the safety evaluation happens between the LLM and the dongle, before the LLM calls any tool on any server.

7.3 Full MCP-AQL Adapter with Embedded Safety

A full MCP-AQL adapter can embed the execution safety loop directly — no separate dongle needed:

// Single adapter with full CRUDE surface + embedded safety
{
  mcpServers: {
    "dollhouse": {
      command: "dollhouse-mcp",
      args: ["--safety-mode", "enforcing"]
    }
  }
}

In this model, the adapter handles both domain operations (elements, personas, agents) and safety enforcement (Gatekeeper, Autonomy Evaluator, Danger Zone) in the same server. The execution safety loop is just one feature among many.


8. Opting Out

8.1 When to Disable

The execution safety loop may be unnecessary or undesirable in several scenarios:

  • Trusted environments: Development machines, sandboxed containers, or testing environments where safety enforcement adds friction without benefit
  • Performance-sensitive workloads: The loop adds latency to every action (one round-trip per tool call)
  • Simple automation: Scripts or pipelines where the action sequence is predetermined and does not need LLM-level safety evaluation
  • User preference: Some users may prefer to operate without safety constraints

8.2 How to Disable

Adapters indicate their safety loop status via introspection:

// Safety loop actively enforcing
{ capabilities: { execution_safety_loop: "enforcing" } }

// Safety loop in monitoring mode (advisory only)
{ capabilities: { execution_safety_loop: "monitoring" } }

// Safety loop logging only (no evaluation)
{ capabilities: { execution_safety_loop: "logging" } }

// Safety loop supported but currently disabled
{ capabilities: { execution_safety_loop: "disabled" } }

// Adapter does not support the safety loop (field omitted entirely)

Disabling is typically done via adapter configuration:

// Adapter startup configuration
{ safetyLoop: { mode: "disabled" } }

// Or via environment variable
// MCPAQL_SAFETY_LOOP=disabled
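A sketch of how an adapter might resolve its mode at startup and report it through introspection, assuming the configuration key and environment variable shown above. The precedence (environment variable over configuration file), ignoring unrecognized values, the caller-supplied fallback, and the function names are all assumptions rather than requirements.

```typescript
type SafetyLoopMode = "enforcing" | "monitoring" | "logging" | "disabled";

const MODES: readonly SafetyLoopMode[] = ["enforcing", "monitoring", "logging", "disabled"];

function isMode(value: unknown): value is SafetyLoopMode {
  return typeof value === "string" && (MODES as readonly string[]).includes(value);
}

// Assumed precedence: environment variable, then config file, then fallback.
// Unrecognized values are ignored here; a stricter adapter might refuse to start.
function resolveMode(
  config: { safetyLoop?: { mode?: string } },
  env: Record<string, string | undefined>,
  fallback: SafetyLoopMode,
): SafetyLoopMode {
  const fromEnv = env["MCPAQL_SAFETY_LOOP"];
  if (isMode(fromEnv)) return fromEnv;
  const fromConfig = config.safetyLoop?.mode;
  if (isMode(fromConfig)) return fromConfig;
  return fallback;
}

// Introspection fragment: an adapter without safety loop support omits the field.
function capabilities(
  supportsLoop: boolean,
  mode: SafetyLoopMode,
): { execution_safety_loop?: SafetyLoopMode } {
  return supportsLoop ? { execution_safety_loop: mode } : {};
}
```

In Node, `process.env` can be passed directly as the `env` argument.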

8.3 Partial Modes

Between full enforcement and fully disabled, adapters MAY support intermediate modes:

Monitoring mode ("monitoring"):

  • Actions are evaluated by the full safety pipeline
  • AutonomyDirective is returned with accurate factors and risk assessment
  • continue is always true — the directive is advisory, never blocking
  • stopped MUST NOT be set to true
  • Useful for: observing what the safety loop would block without actually blocking, gradual rollout, policy tuning

Logging mode ("logging"):

  • Actions are recorded for audit purposes
  • No evaluation occurs — the safety pipeline is not invoked
  • Useful for: compliance auditing, post-incident analysis, establishing baseline action patterns
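The difference between the active modes can be captured as one post-processing step around the pipeline. In this sketch, `evaluate` stands in for the full safety pipeline and `audit` for whatever audit sink the adapter uses; both are hypothetical. Monitoring keeps the factors and risk but forces `continue: true` and never sets `stopped`; logging records the hint and skips evaluation. What a logging-mode directive should contain is not specified in this document, so returning `continue: true` with a single explanatory factor is an assumption.

```typescript
interface Directive {
  continue: boolean;
  factors: string[];
  stopped?: boolean;
  reason?: string;
  nextStepRisk?: string;
}

type ActiveMode = "enforcing" | "monitoring" | "logging";

function handleStep(
  mode: ActiveMode,
  nextActionHint: string,
  evaluate: (hint: string) => Directive, // hypothetical full pipeline
  audit: (entry: { hint: string; mode: ActiveMode; directive?: Directive }) => void,
): Directive {
  if (mode === "logging") {
    // Record only; the safety pipeline is not invoked.
    audit({ hint: nextActionHint, mode });
    return { continue: true, factors: ["logging mode: not evaluated"] };
  }
  const evaluated = evaluate(nextActionHint);
  audit({ hint: nextActionHint, mode, directive: evaluated });
  if (mode === "monitoring") {
    // Advisory only: keep the assessment, never block, never hard-stop.
    const wouldBlock = !evaluated.continue || evaluated.stopped === true;
    return {
      ...evaluated,
      continue: true,
      stopped: false,
      factors: wouldBlock
        ? [...evaluated.factors, "monitoring mode: enforcement would have paused or stopped this action"]
        : evaluated.factors,
    };
  }
  return evaluated; // enforcing: the pipeline result is authoritative
}
```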

9. Implementation Requirements

This section provides a compliance checklist for adapters implementing the execution safety loop. Every requirement listed here corresponds to a normative MUST, SHOULD, or MAY in Section 8.5.4 or Sections 8.6–8.8 of the core specification. See Section 9.4 for the mapping between normative sections and compliance bullets.

9.1 MUST Requirements

Adapters that support the execution safety loop:

Core Operations & Introspection:

  • MUST document whether they support the execution safety loop via introspection (Section 8.6.1)
  • MUST clearly indicate via introspection whether safety enforcement is active or disabled (Section 8.6.7)
  • MUST implement execute_agent, record_execution_step, complete_execution, and abort_execution (Section 8.6.2)
  • MUST place record_execution_step on the CREATE endpoint (Section 8.5.4)
  • MUST place execute_agent, complete_execution, and abort_execution on the EXECUTE endpoint (Section 8.5.4)

AutonomyDirective Contract:

  • MUST return an AutonomyDirective on every record_execution_step call (Section 8.7.1)
  • MUST include continue and factors in every AutonomyDirective (Section 8.7.1)
  • MUST NOT allow agents to proceed after a stop directive (stopped: true) (Section 8.6.4)
  • MUST NOT set stopped: true when operating in monitoring mode (Section 8.6.4)
  • MUST return stopped: true in the AutonomyDirective for danger_zone tier and deny pattern triggers (Section 8.8.1)

Step Limit:

  • MUST implement the Step Limit stage with a configurable maxAutonomousSteps threshold (Section 8.7.5)

Notifications & Self-Approval:

  • MUST include a danger_zone notification broadcast to all executing agents when returning stopped: true (Section 8.8.1)
  • MUST include a danger_zone notification with metadata.verificationId for danger_zone tier and deny pattern challenges (Section 8.8.2)
  • MUST include an autonomy_pause notification with metadata.verificationId when issuing a verify tier challenge (Section 8.8.2)
  • MUST prevent self-approval — the entity confirming an operation MUST NOT be the same agent that triggered the block (Section 8.7.3)

Out-of-Band Verification (when the adapter implements verify or danger_zone tier evaluation, or deny pattern matching):

  • MUST require out-of-band human verification when a verify or danger_zone safety tier is assigned, or when a deny pattern matches (Section 8.8)
  • MUST generate verification codes with at least 128 bits of entropy from a cryptographically secure random source; codes MUST be cryptographically unpredictable (Section 8.8.2)
  • MUST display verification codes through a channel inaccessible to the AI agent — codes MUST NOT appear in any MCP response, tool result, _meta field, log, error message, or diagnostic output accessible to the AI, and MUST NOT be derivable from any information available to the AI (e.g., timestamps, sequential IDs) (Sections 8.8.2, 8.8.4)
  • MUST treat expired verification challenges as failed (Section 8.8.2)

Blocking Semantics (when the adapter implements stopped: true behavior):

  • MUST reject all subsequent execution operations for a hard-blocked agent until unblocked via verify_challenge or admin override (Section 8.8.3)
  • MUST NOT allow an agent to bypass a hard block by starting a new execution — blocks apply at the agent level, not the execution level (Section 8.8.3)
  • MUST persist hard-blocked agent state across server restarts (file-based or database storage) (Section 8.8.3)
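The three blocking rules (reject everything from a blocked agent, key blocks by agent rather than by execution, survive restarts) can be sketched as a small registry. The `BlockStorage` interface is a hypothetical seam for the file-based or database storage the requirement mentions; `jsonFileStorage` shows one file-backed option using `node:fs`, and any backend with the same two methods, including an in-memory one for tests, can be substituted.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

interface BlockStorage {
  load(): string[]; // blocked agent IDs
  save(agentIds: string[]): void;
}

// File-backed storage; a database table would serve equally well.
function jsonFileStorage(path: string): BlockStorage {
  return {
    load: () => (existsSync(path) ? (JSON.parse(readFileSync(path, "utf8")) as string[]) : []),
    save: (ids) => writeFileSync(path, JSON.stringify(ids)),
  };
}

class HardBlockRegistry {
  private readonly storage: BlockStorage;
  private readonly blocked: Set<string>;

  constructor(storage: BlockStorage) {
    this.storage = storage;
    this.blocked = new Set(storage.load()); // restore state after a restart
  }

  block(agentId: string): void {
    this.blocked.add(agentId);
    this.storage.save(Array.from(this.blocked)); // persist immediately
  }

  // Call only after a successful verify_challenge or an admin override.
  unblock(agentId: string): void {
    this.blocked.delete(agentId);
    this.storage.save(Array.from(this.blocked));
  }

  // Keyed by agent, so starting a new execution does not escape the block.
  assertAllowed(agentId: string, operation: string): void {
    if (this.blocked.has(agentId)) {
      throw new Error(`agent ${agentId} is hard-blocked; ${operation} rejected`);
    }
  }
}
```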

9.2 SHOULD Requirements

Adapters that support the execution safety loop:

Evaluation Pipeline:

  • SHOULD evaluate actions through the multi-stage pipeline defined in Section 8.7.2
  • SHOULD return continue: false with an appropriate reason when the step limit is exceeded (Section 8.7.2, Stage 1)
  • SHOULD document the default step limit via introspection (Section 8.7.2, Stage 1)
  • SHOULD evaluate the outcome field in record_execution_step calls and return continue: false on reported failures (Section 8.7.2, Stage 2)
  • SHOULD support configurable policy patterns (deny, requiresApproval, autoApprove) (Section 8.7.2, Stage 3)
  • SHOULD NOT allow agents to proceed after continue: false without human intervention (Section 8.6.4)

Operations:

  • SHOULD implement confirm_operation for Gatekeeper blocks and confirm tier pauses (Section 8.7.3)
  • SHOULD implement verify_challenge for verify tier pauses and Danger Zone unblocking (Section 8.8)

Configuration & Introspection:

  • SHOULD expose the execution_safety_loop capability value in the introspection response (Section 8.6.1)
  • SHOULD make evaluation pipeline elements configurable per agent or per adapter (Section 8.7.4)
  • SHOULD document default configuration and supported options via introspection (Section 8.7.4)

Notifications:

  • SHOULD include notifications in AutonomyDirective responses for non-hard-block events (Section 8.7.3); verify tier autonomy_pause notifications with metadata.verificationId and danger_zone notifications are MUST requirements (see Section 9.1)

Verify Tier Behavior:

  • SHOULD return continue: false with the pending challenge ID for subsequent record_execution_step calls while a verify challenge is pending (Section 8.8.3)
  • SHOULD re-evaluate actions normally when a verify challenge expires, generating a new challenge if the same action is reported again (Section 8.8.3)
  • SHOULD expire verification challenges after a configurable timeout (default: 5 minutes) (Section 8.8.2)

Rate Limiting:

  • SHOULD rate-limit failed verification attempts (no more than 10 per 60-second window per agent) (Section 8.8.5)
  • SHOULD reject subsequent verification attempts after the rate limit is exceeded (Section 8.8.5)
  • SHOULD persist rate limit state across server restarts (Section 8.8.5)
  • SHOULD trigger security audit events on failed verification attempts (Section 8.8.5)
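The rate-limiting guidance amounts to a per-agent window over failed verification attempts. This sketch keeps timestamps in memory for clarity and leaves out persistence across restarts, audit events, and progressive lockout. `FailedAttemptLimiter` is a hypothetical name, and using a sliding window (rather than fixed 60-second buckets) is a design choice the document does not mandate.

```typescript
// Hypothetical per-agent sliding-window limiter for failed verifications.
class FailedAttemptLimiter {
  private readonly failures = new Map<string, number[]>();
  private readonly maxFailures: number;
  private readonly windowMs: number;

  constructor(maxFailures = 10, windowMs = 60_000) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
  }

  // Drop timestamps that have slid out of the window and return the rest.
  private recent(agentId: string, now: number): number[] {
    const kept = (this.failures.get(agentId) ?? []).filter((t) => now - t < this.windowMs);
    this.failures.set(agentId, kept);
    return kept;
  }

  // Check before comparing a submitted code; reject the attempt if limited.
  isLimited(agentId: string, now: number = Date.now()): boolean {
    return this.recent(agentId, now).length >= this.maxFailures;
  }

  recordFailure(agentId: string, now: number = Date.now()): void {
    this.recent(agentId, now).push(now);
  }
}
```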

Monitoring & Audit:

  • SHOULD support the "monitoring" partial mode for gradual rollout (Section 8.6.7)

9.3 MAY Requirements

Adapters that support the execution safety loop:

  • MAY support the "logging" partial mode (Section 8.6.7)
  • MAY implement Safety Tier evaluation (Stage 4) using pattern matching, LLM-provided risk assessments, or adapter-specific heuristics for risk scoring (Section 8.7.2, Stage 4)
  • MAY support configurable risk tolerance thresholds (conservative, moderate, aggressive) (Section 8.7.2, Stage 5)
  • MAY apply progressive lockout (e.g., doubling the window duration) after repeated rate-limit violations (Section 8.8.5)
  • MAY use OS dialogs, hardware tokens, SMS/email, or other display channels for presenting verification codes to operators (Section 8.8.6)

9.4 Normative Cross-Reference

The following table maps each normative section to its corresponding Section 9 requirements, providing an audit trail for compliance verification.

Normative Section | Key Requirements | Section 9 Coverage
8.6.1 Opt-In Activation | MUST document support; SHOULD expose capability value | 9.1 (introspection); 9.2 (capability value)
8.6.2 Enforcement Boundary | MUST report and evaluate all actions | 9.1 (operations, AutonomyDirective on every call)
8.6.3 Action Reporting | MUST/SHOULD/MAY parameter requirements | 9.1 (operations; parameter validation is part of implementing record_execution_step)
8.6.4 Continuous Enforcement | MUST NOT proceed after stop; SHOULD NOT after pause; MUST NOT stopped in monitoring | 9.1 (stop directive, monitoring mode); 9.2 (SHOULD NOT after pause)
8.6.5 Non-Bypass Property | Agent MUST NOT act outside loop | Agent-side protocol obligation (not adapter-specific)
8.6.6 Scope of Monitoring | Monitors all actions (informative) | Described in Sections 2.2 and 3.2 of this document
8.6.7 Disabling | MAY disable; MUST indicate via introspection | 9.1 (introspection); 9.2 (monitoring); 9.3 (logging)
8.7.1 AutonomyDirective | MUST include continue + factors | 9.1 (AutonomyDirective fields)
8.7.2 Pipeline Stages 1–5 | MUST step limit; SHOULD stages 2–3; MAY stages 4–5 | 9.1 (step limit MUST); 9.2 (stage 1 documentation SHOULDs, stages 2–3 SHOULDs); 9.3 (stages 4–5)
8.7.3 Notifications | MUST notification contents; MUST prevent self-approval | 9.1 (notifications, self-approval); 9.2 (non-hard-block notifications)
8.7.4 Configurable Elements | SHOULD configurable; SHOULD document | 9.2 (configuration, introspection)
8.7.5 Minimum Viable | MUST Step Limit; SHOULD Previous Outcome + Pattern Matching; MAY Safety Tier + Risk Tolerance | 9.1 (step limit); 9.2 (outcome, patterns); 9.3 (tier, tolerance)
8.8 Introduction | MUST require OOB for verify/danger_zone | 9.1 (OOB verification)
8.8.1 Trigger Conditions | MUST stopped: true + danger_zone notification | 9.1 (stopped: true, notification broadcast)
8.8.2 Challenge Protocol | MUST entropy, display, notification, expiration | 9.1 (code generation, display, challenge ID, expiration); 9.2 (timeout)
8.8.3 Blocking Semantics | MUST reject, persist, no bypass; SHOULD verify behavior | 9.1 (blocking); 9.2 (verify tier)
8.8.4 Channel Separation | MUST NOT expose code (5 specific prohibitions) | 9.1 (display/channel separation, consolidated)
8.8.5 Rate Limiting | SHOULD rate-limit, persist, audit; MAY progressive lockout | 9.2 (rate limiting); 9.3 (lockout)
8.8.6 Implementation Flexibility | MAY any display channel | 9.3 (display channels)