Execution Safety Loop Specification
Version: 1.0.0-draft
Status: Draft
Last Updated: 2026-04-15
Informative Document: This is an informative specification that provides detailed guidance for the execution safety loop pattern defined normatively in Section 8.6 of the core specification. In case of conflict, the normative specification takes precedence.
Abstract
This document specifies the execution safety loop — an opt-in monitoring pattern that enables continuous safety evaluation of an LLM's intended actions across all connected MCP servers. It also defines the safety dongle deployment model, where a standalone MCP-AQL server functions as a universal safety layer without implementing any other MCP-AQL features.
1. Introduction
1.1 Purpose
The execution safety loop addresses a fundamental challenge in LLM tool-calling environments: how do you enforce safety policies on an LLM's actions when those actions target multiple independent MCP servers?
Individual MCP servers can protect their own operations, but no single server has visibility into what the LLM is doing across all connected servers. The execution safety loop solves this by providing a central checkpoint that the LLM reports to before taking any action — regardless of which server handles the actual operation.
1.2 Status
The execution safety loop is an optional feature defined normatively in Section 8.6 of the core specification. This document provides informative guidance on deployment models, integration patterns, and implementation strategies.
1.3 Core Concept
The execution safety loop is a protocol-level monitoring pattern:
- The LLM plans an action (any tool call, file operation, shell command, etc.)
- The LLM reports its intent to a safety-enabled MCP-AQL server via record_execution_step with nextActionHint
- The server evaluates the intent against its safety pipeline (Gatekeeper policies, Autonomy Evaluator, Danger Zone rules)
- The server returns an AutonomyDirective: go, pause, or hard stop
- The LLM respects the directive before proceeding
This is distinct from the full agent lifecycle that a specific adapter (like DollhouseMCP) may implement. The safety loop is purely about monitoring and evaluating — it does not manage agent state, personas, elements, or any domain-specific functionality.
1.4 Opt-In Design
The execution safety loop is entirely optional at every level:
- Adapters choose whether to support it
- Clients choose whether to enable it
- Users can disable it at any time
No adapter is required to implement the safety loop, and no client is required to use it. When disabled, the adapter operates normally without any safety enforcement overhead. See Section 8: Opting Out for details.
2. The Safety Dongle Deployment Model
2.1 What Is a Safety Dongle?
A safety dongle is a standalone MCP-AQL server whose sole purpose is safety enforcement. It has no domain functionality — it does not manage files, query databases, or interact with external services. It exists only to evaluate the LLM's intended actions and return go/no-go directives.
The name "dongle" reflects the deployment model: you plug it in alongside your existing MCP servers, and it acts as a universal firewall for all tool calls in the session.
2.2 Architecture
┌────────────────────────────────────────────────────────┐
│ MCP Client (Claude, GPT, etc.) │
│ │
│ LLM decides to take an action │
│ │ │
│ ▼ │
│ 1. Report intent to safety dongle │
│ record_execution_step { │
│ nextActionHint: "calling write_file on server X" │
│ } │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Safety Dongle (MCP-AQL Server) │ │
│ │ │ │
│ │ Gatekeeper → Autonomy Evaluator → │ │
│ │ Danger Zone Enforcer │ │
│ │ │ │
│ │ Returns: AutonomyDirective │ │
│ │ { continue: true/false, factors: [...] }│ │
│ └──────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 2. If continue: execute the action on target server │
│ If !continue: pause, escalate, or abort │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ MCP │ │ MCP │ │ MCP │ │
│ │ Server A│ │ Server B│ │ Server C│ │
│ │ (files) │ │ (db) │ │ (slack) │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└────────────────────────────────────────────────────────┘
2.3 What the Dongle Does NOT Do
A safety dongle in its minimal configuration:
- Does not manage elements (personas, skills, templates, agents, memories)
- Does not store or query domain data
- Does not interact with external services beyond safety evaluation
- Does not implement the full CRUDE pattern for domain operations
- Does not require introspection of other servers' capabilities
It is a purpose-built safety layer with a minimal operation surface.
2.4 Relationship to Full MCP-AQL Adapters
The safety dongle and a full MCP-AQL adapter are both valid configurations of the same protocol. They differ only in scope:
| Aspect | Safety Dongle | Full MCP-AQL Adapter |
|---|---|---|
| Purpose | Safety enforcement only | Domain operations + safety |
| Operations | ~7 (safety loop + introspection) | Dozens (full CRUDE surface) |
| Elements | None | Personas, skills, agents, etc. |
| State | Execution state + policies only | Full domain state |
| Endpoints used | CREATE, EXECUTE, READ (subset) | All five CRUDE endpoints |
A full MCP-AQL adapter MAY embed the safety loop directly — it does not need a separate dongle. The dongle model is for environments where safety enforcement is decoupled from the adapters performing the actual work.
3. The Execution Safety Loop
3.1 Loop Protocol
The execution safety loop follows a simple protocol:
┌─────────────────────────────────────┐
│ │
│ ┌──────────┐ │
│ │ Plan │ LLM determines │
│ │ action │ next action │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Report │ record_execution_step│
│ │ intent │ { nextActionHint, │
│ │ │ outcome? } │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Evaluate │ Gatekeeper + │
│ │ │ Autonomy Evaluator + │
│ │ │ Danger Zone │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ continue │───▶│ Act │ │
│ │ = true │ │ (tool │ │
│ └──────────┘ │ call) │ │
│ └────┬─────┘ │
│ ┌───────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ continue │───▶│ Pause / │ │
│ │ = false │ │ Escalate│ │
│ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────┘
Lifecycle:
- Start: execute_agent initiates the safety loop (requires explicit approval)
- Loop: record_execution_step reports each intended action and receives a directive
- End: complete_execution (success) or abort_execution (abnormal termination)
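The lifecycle above can be sketched as a small client-side driver. This is a minimal illustration, not the normative protocol: SafetyClient, PlannedAction, and runSafetyLoop are hypothetical names, and the choice to call abort_execution when the final action fails is an assumption made for the example.

```typescript
// Field names follow the AutonomyDirective described in this document;
// SafetyClient, PlannedAction, and runSafetyLoop are hypothetical.
interface AutonomyDirective {
  continue: boolean;
  factors: string[];
  stopped?: boolean;
  reason?: string;
}

type Outcome = "success" | "failure" | "skipped";

interface SafetyClient {
  executeAgent(): Promise<void>;
  recordExecutionStep(step: { nextActionHint: string; outcome?: Outcome }): Promise<AutonomyDirective>;
  completeExecution(): Promise<void>;
  abortExecution(reason: string): Promise<void>;
}

interface PlannedAction {
  hint: string;              // sent as nextActionHint
  run: () => Promise<void>;  // the real tool call on the target server
}

type LoopResult = "completed" | "aborted" | "paused" | "stopped";

async function runSafetyLoop(client: SafetyClient, actions: PlannedAction[]): Promise<LoopResult> {
  await client.executeAgent(); // Start
  let outcome: Outcome | undefined;
  for (const action of actions) {
    // Loop: report intent (plus the previous step's outcome) before acting.
    const step = outcome === undefined
      ? { nextActionHint: action.hint }
      : { nextActionHint: action.hint, outcome };
    const directive = await client.recordExecutionStep(step);
    if (directive.stopped === true) return "stopped"; // hard stop: cease all actions
    if (!directive.continue) return "paused";         // caller reports the pause upstream
    try {
      await action.run();
      outcome = "success";
    } catch {
      outcome = "failure";
    }
  }
  // End: normal completion, or (assumption) abnormal termination if the last action failed.
  if (outcome === "failure") {
    await client.abortExecution("last action failed");
    return "aborted";
  }
  await client.completeExecution();
  return "completed";
}
```

The driver never runs an action unless the directive for that action returned continue: true, and it leaves a paused execution open so that a human can resolve it.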
3.2 Monitoring Scope
The execution safety loop monitors all intended actions, not only MCP-AQL operations on the same adapter. This includes:
- Tool calls to any MCP server in the client session
- Tool calls to non-MCP-AQL MCP servers
- Built-in client capabilities (file I/O, shell access, web requests)
- Any operation that produces effects beyond the LLM's reasoning context
This broad scope is what makes the safety dongle deployment model possible: the dongle does not need to understand or proxy the actual operations — it only needs to evaluate the LLM's description of what it intends to do.
3.3 The record_execution_step Parameters
The record_execution_step operation accepts several parameters. The full parameter table is defined normatively in Section 8.6.3 of the core specification. The two most important fields are:
- nextActionHint (MUST): describes the intended next action for safety evaluation
- outcome (SHOULD): reports the result of the previous step ("success", "failure", or "skipped"), which drives Stage 2 of the Autonomy Evaluator pipeline (Section 8.7.2)
The nextActionHint is a human-readable string describing the intended action with sufficient detail for safety evaluation.
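One way a client might keep hints specific is to assemble them from structured parts. The sketch below is only an illustration: the IntendedAction shape and buildNextActionHint helper are hypothetical, and the specification itself only requires a human-readable string.

```typescript
// Hypothetical structure for composing a nextActionHint.
interface IntendedAction {
  verb: string;      // e.g. "calling write_file"
  target: string;    // e.g. "project/config.json"
  server?: string;   // e.g. "Slack MCP server"
  purpose?: string;  // e.g. "update database URL"
}

function buildNextActionHint(a: IntendedAction): string {
  const verb = a.verb.trim();
  const target = a.target.trim();
  // Refuse to build a hint that lacks the detail needed for risk evaluation.
  if (verb === "" || target === "") {
    throw new Error("nextActionHint needs both an action and a target");
  }
  let hint = `${verb} on ${target}`;
  if (a.server) hint += ` via ${a.server}`;
  if (a.purpose) hint += ` to ${a.purpose}`;
  return hint;
}
```

Rejecting an empty action or target at construction time keeps vague hints from ever reaching the safety server.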
Good nextActionHint examples:
// Specific tool call with target
"calling write_file on project/config.json to update database URL"
// Shell command with detail
"executing shell command: npm install express"
// Cross-server operation
"sending message to #general channel via Slack MCP server"
// Destructive operation
"deleting all records from staging_users table via database MCP server"
// File system operation
"reading /etc/passwd to check user configuration"
Poor nextActionHint examples:
// Too vague — cannot evaluate risk
"doing something"
// No target information
"writing a file"
// Missing context for risk assessment
"running a command"
The quality of safety evaluation depends directly on the quality of the nextActionHint. LLM system prompts SHOULD instruct the model to provide specific, detailed action descriptions.
3.4 The AutonomyDirective Response
Every record_execution_step call returns an AutonomyDirective indicating whether the LLM may proceed. See Section 8.7 of the core specification for the full type definition.
Key fields:
| Field | Type | Description |
|---|---|---|
| continue | boolean | Whether the LLM may proceed with the reported action |
| factors | string[] | Human-readable explanations of the evaluation decision |
| stopped | boolean | Whether the agent has been hard-blocked (Danger Zone) |
| reason | string | Why the agent was paused or stopped |
| stepsRemaining | number | Steps remaining before mandatory pause |
| nextStepRisk | SafetyTier | Safety tier assigned to the reported action |
| notifications | AgentNotification[] | Gatekeeper blocks, danger alerts, and other events |
Decision tree:
- If stopped === true → Hard stop. Agent MUST cease all actions until unblocked via out-of-band verification.
- If continue === false → Pause. Agent MUST NOT proceed. Report the pause to the user or upstream system.
- If continue === true → Proceed. Agent MAY execute the reported action.
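The decision tree translates directly into a small function. This is a sketch: the Decision type and decide function are hypothetical names, and only the fields needed for the decision are modeled.

```typescript
// Subset of the AutonomyDirective fields used by the decision tree.
interface AutonomyDirective {
  continue: boolean;
  factors: string[];
  stopped?: boolean;
  reason?: string;
}

type Decision = "hard_stop" | "pause" | "proceed";

function decide(d: AutonomyDirective): Decision {
  // Order matters: stopped is checked first, so it wins even if continue is true.
  if (d.stopped === true) return "hard_stop";
  if (d.continue === false) return "pause";
  return "proceed";
}
```

Checking stopped before continue means a malformed directive that sets both stopped: true and continue: true is still treated as a hard stop.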
4. Minimal Operation Surface
4.1 Required Operations
A safety dongle implements a minimal set of operations:
| Operation | Endpoint | Purpose |
|---|---|---|
| execute_agent | EXECUTE | Start the execution safety loop |
| record_execution_step | CREATE | Report intended action, receive AutonomyDirective |
| complete_execution | EXECUTE | Signal normal completion of the safety loop |
| abort_execution | EXECUTE | Signal abnormal termination |
| confirm_operation | EXECUTE | Confirm Gatekeeper blocks or Autonomy Evaluator confirm tier pauses |
| verify_challenge | CREATE | Submit out-of-band verification code (verify tier and Danger Zone) |
| introspect | READ | Discover available operations and capabilities |
This is 7 operations across 3 endpoints — a fraction of a full MCP-AQL adapter's surface.
4.2 Endpoint Classification
The endpoint placement of these operations is critical to the permission architecture:
| Endpoint | Default Permission | Operations | Rationale |
|---|---|---|---|
| READ | AUTO_APPROVE | introspect | Zero friction for discovery |
| CREATE | CONFIRM_SESSION | record_execution_step, verify_challenge | Approve once, then frictionless for the session |
| EXECUTE | CONFIRM_SINGLE_USE | execute_agent, complete_execution, abort_execution, confirm_operation | Each invocation requires explicit approval |
The key design decision: record_execution_step is on CREATE (not EXECUTE) because it runs on every loop cycle. If it were on EXECUTE, the client would need to either:
- Auto-approve all EXECUTE operations, which also auto-approves confirm_operation, collapsing the Gatekeeper permission architecture
- Manually approve every loop cycle, making the safety loop unusable in practice
By placing it on CREATE with CONFIRM_SESSION semantics, the user approves it once at the start of the session, and subsequent calls are frictionless. Meanwhile, confirm_operation stays on EXECUTE with CONFIRM_SINGLE_USE, preserving the permission model.
4.3 Permission Architecture
The CRUDE endpoint classification directly determines default friction levels:
READ (AUTO_APPROVE)
└── introspect → Zero friction
CREATE (CONFIRM_SESSION)
├── record_execution_step → Approve once, then frictionless
└── verify_challenge → Approve once, then frictionless
EXECUTE (CONFIRM_SINGLE_USE)
├── execute_agent → Explicit approval each time
├── complete_execution → Explicit approval each time
├── abort_execution → Explicit approval each time
└── confirm_operation → AUTO_APPROVE override (avoids "confirm to confirm" loop)
Implementation Note: confirm_operation has an AUTO_APPROVE override despite being on the EXECUTE endpoint. This prevents an infinite confirmation loop where confirming an operation itself requires confirmation.
Security Note: Because confirm_operation is auto-approved, the LLM could theoretically call it to approve its own blocked operations (Gatekeeper blocks or Autonomy Evaluator confirm tier pauses) without human involvement. Implementations MUST prevent self-approval: the entity confirming an operation MUST NOT be the same agent that triggered the block. Server-side controls (caller identity validation, confirmation tokens bound to a human channel, or out-of-band confirmation delivery) are required. See Section 8.7.3 for the normative requirement.
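The permission tree in this section can be modeled as an endpoint default plus a per-operation override. The sketch below uses hypothetical names (permissionFor, needsUserPrompt) and assumes that session approvals are tracked per operation name, which this document does not specify. It covers client-side friction only and does not implement the self-approval protection required above.

```typescript
type Endpoint = "READ" | "CREATE" | "EXECUTE";
type Permission = "AUTO_APPROVE" | "CONFIRM_SESSION" | "CONFIRM_SINGLE_USE";

// Default friction level per CRUDE endpoint used by the dongle.
const ENDPOINT_DEFAULT: Record<Endpoint, Permission> = {
  READ: "AUTO_APPROVE",
  CREATE: "CONFIRM_SESSION",
  EXECUTE: "CONFIRM_SINGLE_USE",
};

const OPERATION_ENDPOINT = new Map<string, Endpoint>([
  ["introspect", "READ"],
  ["record_execution_step", "CREATE"],
  ["verify_challenge", "CREATE"],
  ["execute_agent", "EXECUTE"],
  ["complete_execution", "EXECUTE"],
  ["abort_execution", "EXECUTE"],
  ["confirm_operation", "EXECUTE"],
]);

// Per-operation override: avoids the "confirm to confirm" loop.
const OVERRIDE = new Map<string, Permission>([["confirm_operation", "AUTO_APPROVE"]]);

function permissionFor(operation: string): Permission {
  const endpoint = OPERATION_ENDPOINT.get(operation);
  if (endpoint === undefined) throw new Error(`unknown operation: ${operation}`);
  return OVERRIDE.get(operation) ?? ENDPOINT_DEFAULT[endpoint];
}

// Assumption: session approvals are remembered per operation name.
function needsUserPrompt(operation: string, approvedThisSession: Set<string>): boolean {
  switch (permissionFor(operation)) {
    case "AUTO_APPROVE":
      return false;
    case "CONFIRM_SESSION":
      return !approvedThisSession.has(operation);
    case "CONFIRM_SINGLE_USE":
      return true;
  }
}
```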
5. Integration with Gatekeeper
5.1 How the Dongle Evaluates Actions
The safety dongle uses the same Gatekeeper architecture as a full MCP-AQL adapter, but scoped to the execution safety loop:
- Layer 1 (Route Validation): Verifies the operation is called via the correct CRUDE endpoint
- Layer 2 (Safety Tier): Evaluates the endpoint's permission characteristics
- Layer 3 (Policy Evaluation): Checks nextActionHint against configured policies
- Layer 4 (Confirmation): Issues confirmation tokens for operations requiring acknowledgment
For the safety dongle, Layer 3 is the most important: it evaluates the LLM's stated intent against configurable patterns.
5.2 Policy Model
Policies control which actions are allowed, denied, or require confirmation. Policies are matched against the nextActionHint string using glob patterns:
// Example policy configuration
{
deny: [
"drop_*", // Block any drop operations
"delete_all*", // Block bulk deletions
"*_production*", // Block anything targeting production
"rm -rf*" // Block recursive force deletions
],
requiresApproval: [
"delete_*", // Require approval for deletions
"*force*", // Require approval for force operations
"deploy_*", // Require approval for deployments
"git push*" // Require approval for git pushes
],
autoApprove: [
"read_*", // Auto-approve reads
"list_*", // Auto-approve listings
"get_*", // Auto-approve getters
"search_*" // Auto-approve searches
]
}
Resolution order: deny > requiresApproval > autoApprove > default (evaluate via Autonomy Evaluator)
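The resolution order can be sketched as follows. The glob semantics here are an assumption, since this document does not define them: `*` matches any run of characters and the pattern must match the entire string being evaluated. The names globToRegExp and resolvePolicy are hypothetical.

```typescript
type PolicyDecision = "deny" | "requiresApproval" | "autoApprove" | "default";

interface Policy {
  deny: string[];
  requiresApproval: string[];
  autoApprove: string[];
}

// Assumed glob semantics: "*" matches any characters; the match is anchored at both ends.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape everything except "*"
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function resolvePolicy(policy: Policy, hint: string): PolicyDecision {
  const matches = (patterns: string[]): boolean =>
    patterns.some((p) => globToRegExp(p).test(hint));
  // Strictest list first: deny > requiresApproval > autoApprove > default.
  if (matches(policy.deny)) return "deny";
  if (matches(policy.requiresApproval)) return "requiresApproval";
  if (matches(policy.autoApprove)) return "autoApprove";
  return "default"; // hand off to the Autonomy Evaluator
}
```

With the example policy above, a string such as read_production_db matches both read_* and *_production*, and the deny list wins because it is consulted first.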
5.3 Notification System
When the Gatekeeper blocks an operation or the Autonomy Evaluator pauses execution, notifications are included in the AutonomyDirective response:
{
continue: false,
reason: "Action requires approval: delete_user",
factors: ["Pattern match: delete_* requires approval"],
notifications: [
{
type: "permission_pending",
message: "Operation 'delete_user' requires confirmation",
metadata: {
operation: "delete_user",
level: "CONFIRM_SINGLE_USE"
},
timestamp: "2026-02-25T14:30:00Z"
}
]
}
Notification types:
| Type | Trigger | Action Required |
|---|---|---|
| permission_pending | Gatekeeper blocked an operation | Call confirm_operation to approve |
| autonomy_pause | Autonomy Evaluator returned continue: false | For confirm tier: call confirm_operation; for verify tier: out-of-band verify_challenge (Section 6) |
| danger_zone | Hard block triggered (danger_zone tier or deny pattern) | Out-of-band verify_challenge required (Section 6) |
The notification system is pull-based: notifications are included in record_execution_step responses. There is no push channel — the LLM must call record_execution_step to discover pending events. This aligns with the MCP protocol's request-response model.
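A client might map each notification to the follow-up operation from the table above. This sketch uses hypothetical names (requiredAction, pendingActions) and makes one assumption: a verify tier autonomy_pause is recognized by the presence of metadata.verificationId, while a confirm tier pause carries none.

```typescript
type NotificationType = "permission_pending" | "autonomy_pause" | "danger_zone";

interface AgentNotification {
  type: NotificationType;
  message: string;
  metadata?: { verificationId?: string; operation?: string; level?: string };
  timestamp: string;
}

type RequiredAction = "confirm_operation" | "verify_challenge";

function requiredAction(n: AgentNotification): RequiredAction {
  switch (n.type) {
    case "permission_pending":
      return "confirm_operation";
    case "danger_zone":
      return "verify_challenge";
    case "autonomy_pause":
      // Assumption: only verify tier pauses include a verificationId.
      return n.metadata?.verificationId ? "verify_challenge" : "confirm_operation";
  }
}

// Notifications arrive only inside record_execution_step responses (pull-based),
// so the client inspects them after every call.
function pendingActions(notifications: AgentNotification[]): RequiredAction[] {
  return notifications.map(requiredAction);
}
```

Both follow-up operations need a human in the loop, so a client would surface these to the user instead of calling them on its own behalf.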
6. Integration with Danger Zone
6.1 Danger Level Escalation During Execution
When the Autonomy Evaluator assigns a high risk score to a nextActionHint, the action escalates through safety tiers defined in the Danger Levels specification:
| Risk Score | Tier | Behavior |
|---|---|---|
| 0-30 | advisory | Log and continue |
| 31-60 | confirm | Pause for human review (continue: false) |
| 61-85 | verify | Pause + out-of-band verification via verify_challenge (Section 8.8) |
| 86-100 | danger_zone | Hard stop + out-of-band verification via verify_challenge (Section 8.8) |
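The table maps onto a simple threshold function. In this sketch the names tierForScore and flagsForTier are hypothetical, and two points are assumptions: fractional scores fall into the next band up (30.5 is treated as confirm), and a danger_zone directive sets continue: false alongside stopped: true.

```typescript
type SafetyTier = "advisory" | "confirm" | "verify" | "danger_zone";

function tierForScore(score: number): SafetyTier {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`risk score out of range: ${score}`);
  }
  if (score <= 30) return "advisory";
  if (score <= 60) return "confirm";
  if (score <= 85) return "verify";
  return "danger_zone";
}

// Directive flags implied by each tier's behavior column.
function flagsForTier(tier: SafetyTier): { continue: boolean; stopped: boolean } {
  switch (tier) {
    case "advisory":
      return { continue: true, stopped: false };  // log and continue
    case "confirm":
    case "verify":
      return { continue: false, stopped: false }; // pause
    case "danger_zone":
      return { continue: false, stopped: true };  // hard stop
  }
}
```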
6.2 Out-of-Band Verification
Both the verify and danger_zone tiers trigger out-of-band verification, but with different enforcement severity.
Danger Zone hard stop (danger_zone tier or deny pattern match):
- The AutonomyDirective returns stopped: true
- A verification challenge is generated with a cryptographically random code
- The code is displayed through a channel inaccessible to the AI (OS dialog, hardware token, SMS, etc.)
- The agent is blocked at the agent level until a human provides the correct code via verify_challenge
- The block persists across server restarts
Verify tier pause (verify tier):
- The AutonomyDirective returns continue: false (without stopped: true)
- A verification challenge is generated and displayed out-of-band (same as above)
- The agent is paused: subsequent record_execution_step calls return continue: false until verified
- Unlike Danger Zone blocks, verify pauses do not persist across restarts and do not prevent new executions
See Section 8.8 of the core specification for the full out-of-band verification protocol.
Security Principle: The verification code MUST NOT appear in any MCP response, tool result, log accessible to the AI, or _meta field. The AI can only see the challenge ID, never the code itself. This ensures that even a prompt-injected LLM cannot self-verify.
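A server-side sketch of this separation, assuming a Node.js runtime and its built-in crypto module. The Challenge shape and the createChallenge, publicView, and verifyCode helpers are hypothetical; the 128-bit code length and 5-minute default timeout follow the requirements listed in Section 9, and delivery of the code to an out-of-band channel is left out.

```typescript
import { randomBytes, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical server-side record; the code field never leaves the server via MCP.
interface Challenge {
  id: string;
  code: string;
  expiresAt: number; // epoch milliseconds
}

function createChallenge(now: number, ttlMs: number = 5 * 60 * 1000): Challenge {
  return {
    id: randomUUID(),                        // random, so it reveals nothing about the code
    code: randomBytes(16).toString("hex"),   // 16 bytes = 128 bits of entropy
    expiresAt: now + ttlMs,
  };
}

// The only part of a challenge that may appear in an MCP response.
function publicView(c: Challenge): { verificationId: string } {
  return { verificationId: c.id };
}

function verifyCode(c: Challenge, submitted: string, now: number): boolean {
  if (now >= c.expiresAt) return false; // expired challenges are treated as failed
  const expected = Buffer.from(c.code, "utf8");
  const actual = Buffer.from(submitted, "utf8");
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Passing the clock in as a parameter keeps the expiry logic easy to test. A real implementation would also hand the code to an OS dialog, hardware token, or similar channel.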
7. Deployment Examples
7.1 Minimal Safety Dongle
A standalone MCP-AQL server with no domain functionality:
// MCP client configuration
{
mcpServers: {
// Safety dongle — evaluates all actions
"safety": {
command: "mcpaql-safety-dongle",
args: ["--policy", "./safety-policies.json"]
},
// Domain servers — do the actual work
"filesystem": { command: "mcp-filesystem", args: ["/workspace"] },
"database": { command: "mcp-postgres", args: ["--connection", "..."] },
"slack": { command: "mcp-slack", args: ["--token", "..."] }
}
}
The LLM's system prompt includes:
Before taking any action (tool call, file operation, shell command),
you MUST report your intent to the safety server via
record_execution_step with a detailed nextActionHint.
Only proceed if the AutonomyDirective returns continue: true.
7.2 Safety Dongle + Multiple MCP Servers
In a multi-server environment, the safety dongle monitors actions across all servers:
┌─────────────────────────────────────────────────┐
│ MCP Client │
│ │
│ ┌─────────────┐ │
│ │ Safety │ ◄── report ALL intended actions│
│ │ Dongle │ ──► go/no-go directives │
│ │ (MCP-AQL) │ │
│ └─────────────┘ │
│ │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │FS │ │DB │ │Slack │ │Git │ │
│ │Server│ │Server│ │Server│ │Server│ │
│ └──────┘ └──────┘ └──────┘ └──────┘ │
└─────────────────────────────────────────────────┘
The domain servers do not need to know about the safety dongle. They are standard MCP servers — the safety evaluation happens between the LLM and the dongle, before the LLM calls any tool on any server.
7.3 Full MCP-AQL Adapter with Embedded Safety
A full MCP-AQL adapter can embed the execution safety loop directly — no separate dongle needed:
// Single adapter with full CRUDE surface + embedded safety
{
mcpServers: {
"dollhouse": {
command: "dollhouse-mcp",
args: ["--safety-mode", "enforcing"]
}
}
}
In this model, the adapter handles both domain operations (elements, personas, agents) and safety enforcement (Gatekeeper, Autonomy Evaluator, Danger Zone) in the same server. The execution safety loop is just one feature among many.
8. Opting Out
8.1 When to Disable
The execution safety loop may be unnecessary or undesirable in several scenarios:
- Trusted environments: Development machines, sandboxed containers, or testing environments where safety enforcement adds friction without benefit
- Performance-sensitive workloads: The loop adds latency to every action (one round-trip per tool call)
- Simple automation: Scripts or pipelines where the action sequence is predetermined and does not need LLM-level safety evaluation
- User preference: Some users may prefer to operate without safety constraints
8.2 How to Disable
Adapters indicate their safety loop status via introspection:
// Safety loop actively enforcing
{ capabilities: { execution_safety_loop: "enforcing" } }
// Safety loop in monitoring mode (advisory only)
{ capabilities: { execution_safety_loop: "monitoring" } }
// Safety loop logging only (no evaluation)
{ capabilities: { execution_safety_loop: "logging" } }
// Safety loop supported but currently disabled
{ capabilities: { execution_safety_loop: "disabled" } }
// Adapter does not support the safety loop (field omitted entirely)
Disabling is typically done via adapter configuration:
// Adapter startup configuration
{ safetyLoop: { mode: "disabled" } }
// Or via environment variable
// MCPAQL_SAFETY_LOOP=disabled
8.3 Partial Modes
Between full enforcement and fully disabled, adapters MAY support intermediate modes:
Monitoring mode ("monitoring"):
- Actions are evaluated by the full safety pipeline
- AutonomyDirective is returned with accurate factors and risk assessment
- continue is always true: the directive is advisory, never blocking
- stopped MUST NOT be set to true
- Useful for: observing what the safety loop would block without actually blocking, gradual rollout, policy tuning
Logging mode ("logging"):
- Actions are recorded for audit purposes
- No evaluation occurs — the safety pipeline is not invoked
- Useful for: compliance auditing, post-incident analysis, establishing baseline action patterns
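The difference between the modes can be sketched as a wrapper around the evaluation pipeline. The applyMode function is a hypothetical name, and the directive returned in logging mode (continue: true with an explanatory factor) is an assumption, since this document does not say what a logging-mode response contains.

```typescript
type Mode = "enforcing" | "monitoring" | "logging";

interface AutonomyDirective {
  continue: boolean;
  factors: string[];
  stopped?: boolean;
  reason?: string;
}

function applyMode(
  mode: Mode,
  hint: string,
  evaluate: (hint: string) => AutonomyDirective, // the full safety pipeline
  auditLog: string[],
): AutonomyDirective {
  switch (mode) {
    case "enforcing":
      return evaluate(hint);
    case "monitoring": {
      // Evaluate fully, but make the result advisory: never block, never hard-stop.
      const directive = evaluate(hint);
      return { ...directive, continue: true, stopped: false };
    }
    case "logging":
      // Record only; the pipeline is not invoked.
      auditLog.push(hint);
      return { continue: true, factors: ["logging mode: action recorded, not evaluated"] };
  }
}
```

In monitoring mode the factors and reason from the real evaluation are preserved, which is what makes the mode useful for policy tuning.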
9. Implementation Requirements
This section provides a compliance checklist for adapters implementing the execution safety loop. Every requirement listed here corresponds to a normative MUST, SHOULD, or MAY in Sections 8.6–8.8 of the core specification. See Section 9.4 for the complete mapping between normative sections and compliance bullets.
9.1 MUST Requirements
Adapters that support the execution safety loop:
Core Operations & Introspection:
- MUST document whether they support the execution safety loop via introspection (Section 8.6.1)
- MUST clearly indicate via introspection whether safety enforcement is active or disabled (Section 8.6.7)
- MUST implement execute_agent, record_execution_step, complete_execution, and abort_execution (Section 8.6.2)
- MUST place record_execution_step on the CREATE endpoint (Section 8.5.4)
- MUST place execute_agent, complete_execution, and abort_execution on the EXECUTE endpoint (Section 8.5.4)
AutonomyDirective Contract:
- MUST return an AutonomyDirective on every record_execution_step call (Section 8.7.1)
- MUST include continue and factors in every AutonomyDirective (Section 8.7.1)
- MUST NOT allow agents to proceed after a stop directive (stopped: true) (Section 8.6.4)
- MUST NOT set stopped: true when operating in monitoring mode (Section 8.6.4)
- MUST return stopped: true in the AutonomyDirective for danger_zone tier and deny pattern triggers (Section 8.8.1)
Step Limit:
- MUST implement the Step Limit stage with a configurable maxAutonomousSteps threshold (Section 8.7.5)
Notifications & Self-Approval:
- MUST include a danger_zone notification broadcast to all executing agents when returning stopped: true (Section 8.8.1)
- MUST include a danger_zone notification with metadata.verificationId for danger_zone tier and deny pattern challenges (Section 8.8.2)
- MUST include an autonomy_pause notification with metadata.verificationId when issuing a verify tier challenge (Section 8.8.2)
- MUST prevent self-approval: the entity confirming an operation MUST NOT be the same agent that triggered the block (Section 8.7.3)
Out-of-Band Verification (when the adapter implements verify or danger_zone tier evaluation, or deny pattern matching):
- MUST require out-of-band human verification when a verify or danger_zone safety tier is assigned, or when a deny pattern matches (Section 8.8)
- MUST generate verification codes with at least 128 bits of entropy from a cryptographically secure random source; codes MUST be cryptographically unpredictable (Section 8.8.2)
- MUST display verification codes through a channel inaccessible to the AI agent: codes MUST NOT appear in any MCP response, tool result, _meta field, log, error message, or diagnostic output accessible to the AI, and MUST NOT be derivable from any information available to the AI (e.g., timestamps, sequential IDs) (Sections 8.8.2, 8.8.4)
- MUST treat expired verification challenges as failed (Section 8.8.2)
Blocking Semantics (when the adapter implements stopped: true behavior):
- MUST reject all subsequent execution operations for a hard-blocked agent until unblocked via verify_challenge or admin override (Section 8.8.3)
- MUST NOT allow an agent to bypass a hard block by starting a new execution; blocks apply at the agent level, not the execution level (Section 8.8.3)
- MUST persist hard-blocked agent state across server restarts (file-based or database storage) (Section 8.8.3)
9.2 SHOULD Requirements
Adapters that support the execution safety loop:
Evaluation Pipeline:
- SHOULD evaluate actions through the multi-stage pipeline defined in Section 8.7.2
- SHOULD return continue: false with an appropriate reason when the step limit is exceeded (Section 8.7.2, Stage 1)
- SHOULD document the default step limit via introspection (Section 8.7.2, Stage 1)
- SHOULD evaluate the outcome field in record_execution_step calls and return continue: false on reported failures (Section 8.7.2, Stage 2)
- SHOULD support configurable policy patterns (deny, requiresApproval, autoApprove) (Section 8.7.2, Stage 3)
- SHOULD NOT allow agents to proceed after continue: false without human intervention (Section 8.6.4)
Operations:
- SHOULD implement confirm_operation for Gatekeeper blocks and confirm tier pauses (Section 8.7.3)
- SHOULD implement verify_challenge for verify tier pauses and Danger Zone unblocking (Section 8.8)
Configuration & Introspection:
- SHOULD expose the execution_safety_loop capability value in the introspection response (Section 8.6.1)
- SHOULD make evaluation pipeline elements configurable per agent or per adapter (Section 8.7.4)
- SHOULD document default configuration and supported options via introspection (Section 8.7.4)
Notifications:
- SHOULD include notifications in AutonomyDirective responses for non-hard-block events (note: verify tier autonomy_pause with metadata.verificationId and danger_zone notifications are MUST; see Section 9.1) (Section 8.7.3)
Verify Tier Behavior:
- SHOULD return continue: false with the pending challenge ID for subsequent record_execution_step calls while a verify challenge is pending (Section 8.8.3)
- SHOULD re-evaluate actions normally when a verify challenge expires, generating a new challenge if the same action is reported again (Section 8.8.3)
- SHOULD expire verification challenges after a configurable timeout (default: 5 minutes) (Section 8.8.2)
Rate Limiting:
- SHOULD rate-limit failed verification attempts (no more than 10 per 60-second window per agent) (Section 8.8.5)
- SHOULD reject subsequent verification attempts after the rate limit is exceeded (Section 8.8.5)
- SHOULD persist rate limit state across server restarts (Section 8.8.5)
- SHOULD trigger security audit events on failed verification attempts (Section 8.8.5)
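The rate-limiting bullets could be met with a sliding-window counter per agent. This is a minimal in-memory sketch with a hypothetical class name; it uses the 10-per-60-seconds figures from the list above and does not implement the persistence or audit events that the other bullets call for.

```typescript
class VerificationRateLimiter {
  private readonly failures = new Map<string, number[]>(); // agentId -> failure timestamps (ms)
  private readonly maxFailures: number;
  private readonly windowMs: number;

  constructor(maxFailures: number = 10, windowMs: number = 60 * 1000) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
  }

  // True if the agent may submit another verification attempt at time `now`.
  allow(agentId: string, now: number): boolean {
    return this.recent(agentId, now).length < this.maxFailures;
  }

  // Call after a failed verify_challenge attempt.
  recordFailure(agentId: string, now: number): void {
    const recent = this.recent(agentId, now);
    recent.push(now);
    this.failures.set(agentId, recent); // also drops timestamps outside the window
  }

  private recent(agentId: string, now: number): number[] {
    const cutoff = now - this.windowMs;
    return (this.failures.get(agentId) ?? []).filter((t) => t > cutoff);
  }
}
```

Taking the current time as an argument keeps the limiter deterministic in tests; a server would pass Date.now().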
Monitoring & Audit:
- SHOULD support the "monitoring" partial mode for gradual rollout (Section 8.6.7)
9.3 MAY Requirements
Adapters that support the execution safety loop:
- MAY support the "logging" partial mode (Section 8.6.7)
- MAY implement Safety Tier evaluation (Stage 4) using pattern matching, LLM-provided risk assessments, or adapter-specific heuristics for risk scoring (Section 8.7.2, Stage 4)
- MAY support configurable risk tolerance thresholds (conservative, moderate, aggressive) (Section 8.7.2, Stage 5)
- MAY apply progressive lockout (e.g., doubling the window duration) after repeated rate-limit violations (Section 8.8.5)
- MAY use OS dialogs, hardware tokens, SMS/email, or other display channels for presenting verification codes to operators (Section 8.8.6)
9.4 Normative Cross-Reference
The following table maps each normative section to its corresponding Section 9 requirements, providing an audit trail for compliance verification.
| Normative Section | Key Requirements | Section 9 Coverage |
|---|---|---|
| 8.6.1 Opt-In Activation | MUST document support; SHOULD expose capability value | 9.1 (introspection); 9.2 (capability value) |
| 8.6.2 Enforcement Boundary | MUST report and evaluate all actions | 9.1 (operations, AutonomyDirective on every call) |
| 8.6.3 Action Reporting | MUST/SHOULD/MAY parameter requirements | 9.1 (operations — parameter validation is part of implementing record_execution_step) |
| 8.6.4 Continuous Enforcement | MUST NOT proceed after stop; SHOULD NOT after pause; MUST NOT stopped in monitoring | 9.1 (stop directive, monitoring mode); 9.2 (SHOULD NOT after pause) |
| 8.6.5 Non-Bypass Property | Agent MUST NOT act outside loop | Agent-side protocol obligation (not adapter-specific) |
| 8.6.6 Scope of Monitoring | Monitors all actions (informative) | Described in Sections 3.2, 2.2 of this document |
| 8.6.7 Disabling | MAY disable; MUST indicate via introspection | 9.1 (introspection); 9.2 (monitoring); 9.3 (logging) |
| 8.7.1 AutonomyDirective | MUST include continue + factors | 9.1 (AutonomyDirective fields) |
| 8.7.2 Pipeline Stages 1–5 | MUST step limit; SHOULD stages 2–3; MAY stages 4–5 | 9.1 (step limit MUST); 9.2 (stage 1 documentation SHOULDs, stages 2–3 SHOULDs); 9.3 (stages 4–5) |
| 8.7.3 Notifications | MUST notification contents; MUST prevent self-approval | 9.1 (notifications, self-approval); 9.2 (non-hard-block notifications) |
| 8.7.4 Configurable Elements | SHOULD configurable; SHOULD document | 9.2 (configuration, introspection) |
| 8.7.5 Minimum Viable | MUST Step Limit; SHOULD Previous Outcome + Pattern Matching; MAY Safety Tier + Risk Tolerance | 9.1 (step limit); 9.2 (outcome, patterns); 9.3 (tier, tolerance) |
| 8.8 Introduction | MUST require OOB for verify/danger_zone | 9.1 (OOB verification) |
| 8.8.1 Trigger Conditions | MUST stopped: true + danger_zone notification | 9.1 (stopped: true, notification broadcast) |
| 8.8.2 Challenge Protocol | MUST entropy, display, notification, expiration | 9.1 (code generation, display, challenge ID, expiration); 9.2 (timeout) |
| 8.8.3 Blocking Semantics | MUST reject, persist, no bypass; SHOULD verify behavior | 9.1 (blocking); 9.2 (verify tier) |
| 8.8.4 Channel Separation | MUST NOT expose code (5 specific prohibitions) | 9.1 (display/channel separation — consolidated) |
| 8.8.5 Rate Limiting | SHOULD rate-limit, persist, audit; MAY progressive lockout | 9.2 (rate limiting); 9.3 (lockout) |
| 8.8.6 Implementation Flexibility | MAY any display channel | 9.3 (display channels) |
Related Specifications
- Core Specification, Section 8.6 (Execution Safety Loop): Normative requirements
- Core Specification, Section 8.7 (Autonomy Evaluation): AutonomyDirective contract and evaluation pipeline
- Core Specification, Section 8.8 (Out-of-Band Verification): Out-of-band verification protocol (verify tier and Danger Zone)
- Gatekeeper Specification: Multi-layer access control architecture
- Danger Levels Specification: Risk classification and trust-to-danger gating
- Confirmation Tokens Specification: Token protocol for operation confirmation