SECURITY MODEL

Trust And Safety Controls

MCP-AQL security is designed around operation risk, trust-level mediation, and explicit confirmation for dangerous actions. This page explains how the draft protocol handles approval paths, danger tiers, and safe execution behavior.

On this page

  1. Trust-Level Mediation
  2. Danger Classification
  3. Confirmation token flow
  4. Execution Safety Loop
  5. Open Security Hardening Tracks
  6. Source documents behind this summary

Trust-Level Mediation

What it means in practice

Implementations classify requester trust and apply policy gates before operation dispatch. The exact trust taxonomy is adapter-defined, but the enforcement pattern is consistent: low-trust contexts default toward safe reads, while higher-trust contexts can unlock mutations or execution flows.

  • Trust-level checks happen before execution
  • Dangerous operations require stronger policy context
  • Policies should be introspectable where feasible

Typical mediation outcome

A validated user might be allowed to call list_files and update_profile, but blocked from bulk_delete until stronger review, confirmation, or out-of-band verification is available.
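The gating pattern can be sketched in Python as a policy check that runs before any handler is invoked. This is a minimal sketch under stated assumptions: the three trust levels, the `MIN_TRUST` policy table, and the `gate`, `dispatch`, and `describe_policy` helpers are hypothetical, because the real trust taxonomy is adapter-defined. Operations without a policy entry fail closed to the strictest requirement, and `describe_policy` shows one way to keep the policy introspectable.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical taxonomy; real adapters define their own levels.
    LOW = 0
    VALIDATED = 1
    ELEVATED = 2


# Hypothetical policy table: minimum trust required per operation.
MIN_TRUST = {
    "list_files": Trust.LOW,
    "update_profile": Trust.VALIDATED,
    "bulk_delete": Trust.ELEVATED,
}


class PolicyDenied(Exception):
    pass


def describe_policy() -> dict:
    """Expose the policy so agents and UIs can inspect it."""
    return {op: level.name for op, level in MIN_TRUST.items()}


def gate(operation: str, trust: Trust) -> None:
    # Unknown operations fall back to the strictest requirement (fail closed).
    required = MIN_TRUST.get(operation, Trust.ELEVATED)
    if trust < required:
        raise PolicyDenied(
            f"{operation} requires {required.name}, caller has {trust.name}"
        )


def dispatch(operation: str, trust: Trust, handlers: dict) -> object:
    gate(operation, trust)  # the policy check happens before the handler runs
    return handlers[operation]()
```

With these assumptions, a `VALIDATED` caller can run `list_files` and `update_profile`, while `bulk_delete` raises `PolicyDenied` before its handler is ever called.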

Danger Classification

Operation risk labels

Operations can be tagged with a danger class to support UI and agent policy decisions. Ongoing draft-specification work tracks stronger requirements for keeping danger metadata consistent with the runtime surfaces that expose it.

Why the labels matter

Danger labels are the bridge between semantics and policy. They help an agent distinguish advisory actions from ones that must pause for approval or trigger hard-stop verification.

Danger level   Example operation   Likely safety tier   Expected behavior
safe           list_files          advisory             No intervention beyond normal logging
destructive    delete_account      confirm              Pause and require an approval path
dangerous      bulk_delete         verify               Mandatory pause plus stronger verification
forbidden      drop_database       danger_zone          Hard stop with out-of-band verification
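One way an adapter might turn these labels into runtime behavior is a plain lookup from danger level to safety tier. The sketch below assumes the tier assignments listed in the table; the `Tier` enum, the `required_action` function, and its action strings are hypothetical. Treating an unknown label as the strictest tier is a design assumption, chosen so that missing metadata cannot silently downgrade an operation.

```python
from enum import Enum


class Tier(Enum):
    ADVISORY = "advisory"
    CONFIRM = "confirm"
    VERIFY = "verify"
    DANGER_ZONE = "danger_zone"


# Assumed mapping, following the danger-level table.
TIER_FOR_DANGER = {
    "safe": Tier.ADVISORY,
    "destructive": Tier.CONFIRM,
    "dangerous": Tier.VERIFY,
    "forbidden": Tier.DANGER_ZONE,
}


def required_action(danger_level: str) -> str:
    # Unlabeled or unknown danger levels are treated as the most severe tier.
    tier = TIER_FOR_DANGER.get(danger_level, Tier.DANGER_ZONE)
    if tier is Tier.ADVISORY:
        return "proceed"             # normal logging only
    if tier is Tier.CONFIRM:
        return "pause_for_approval"  # require an approval path
    if tier is Tier.VERIFY:
        return "pause_and_verify"    # mandatory pause plus stronger checks
    return "hard_stop"               # out-of-band verification required
```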

Confirmation token flow

Request and required confirmation response

{
  "operation": "delete_repo",
  "params": {
    "owner": "acme",
    "repo": "widgets"
  }
}

{
  "success": false,
  "error": {
    "code": "CONFIRMATION_REQUIRED",
    "message": "This operation requires confirmation",
    "details": {
      "danger_level": "destructive",
      "confirmation_message": "Delete repository 'acme/widgets'?",
      "confirmation_token": "conf_abc123xyz"
    }
  }
}
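A client only needs the machine-readable error to build a confirmed retry. This minimal Python sketch assumes the response shape shown in the example; `build_confirmed_retry` and the `ask_user` callback are hypothetical names, not part of any published client API. It returns no retry when confirmation was not requested or when the user declines.

```python
import copy
from typing import Callable, Optional


def build_confirmed_retry(
    request: dict, response: dict, ask_user: Callable[[str], bool]
) -> Optional[dict]:
    """Return the confirmed retry request, or None if there is nothing to retry."""
    error = response.get("error") or {}
    # Only a CONFIRMATION_REQUIRED failure triggers the confirmation path.
    if response.get("success") or error.get("code") != "CONFIRMATION_REQUIRED":
        return None
    details = error["details"]
    # Show the server-provided message; the user may decline.
    if not ask_user(details["confirmation_message"]):
        return None
    retry = copy.deepcopy(request)  # leave the original request untouched
    retry["params"]["_confirmation"] = details["confirmation_token"]
    return retry
```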

Confirmed retry

The client retries the same operation with the issued token. That keeps the first denial machine-readable and makes the second request an explicit act of confirmation.

{
  "operation": "delete_repo",
  "params": {
    "owner": "acme",
    "repo": "widgets",
    "_confirmation": "conf_abc123xyz"
  }
}
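On the server side, a token is most useful when it is bound to the exact request it confirms and can be redeemed only once. The sketch below is one possible design, not a mechanism the protocol requires: `ConfirmationStore`, `handle`, the SHA-256 fingerprint over the operation and params, and the five-minute expiry are all assumptions for illustration.

```python
import hashlib
import json
import secrets
import time


class ConfirmationStore:
    """Hypothetical store of pending, single-use confirmation tokens."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._pending = {}  # token -> (request fingerprint, expiry time)

    @staticmethod
    def _fingerprint(operation: str, params: dict) -> str:
        # Bind the token to the exact operation and params, excluding the token itself.
        clean = {k: v for k, v in params.items() if k != "_confirmation"}
        payload = json.dumps({"op": operation, "params": clean}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def issue(self, operation: str, params: dict) -> str:
        token = "conf_" + secrets.token_urlsafe(12)
        expiry = time.monotonic() + self.ttl
        self._pending[token] = (self._fingerprint(operation, params), expiry)
        return token

    def redeem(self, operation: str, params: dict) -> bool:
        # pop() removes the token, so each one can be used at most once.
        entry = self._pending.pop(params.get("_confirmation"), None)
        if entry is None:
            return False
        fingerprint, expiry = entry
        return time.monotonic() < expiry and fingerprint == self._fingerprint(operation, params)


def handle(store: ConfirmationStore, operation: str, params: dict) -> dict:
    if store.redeem(operation, params):
        return {"success": True}  # the confirmed operation would run here
    token = store.issue(operation, params)
    return {
        "success": False,
        "error": {
            "code": "CONFIRMATION_REQUIRED",
            "message": "This operation requires confirmation",
            "details": {"confirmation_token": token},
        },
    }
```

Because the fingerprint covers the params, a token issued for `acme/widgets` cannot confirm a delete of a different repository, and replaying a redeemed token simply yields a fresh confirmation request.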

Execution Safety Loop

What it means for agents

The execution safety loop means actions are evaluated before they happen, not merely logged afterward. An agent reports intent, receives an autonomy decision, and only proceeds when the current step is allowed.

Lifecycle shape

execute_agent -> record_execution_step -> continue_execution
                 \-> pause/confirm -> complete_execution or abort_execution
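The lifecycle can be read as a small state machine in which the autonomy decision is recorded before a step runs. In this minimal sketch the `ExecutionSession` class, its state names, and `IllegalTransition` are hypothetical. Allowing `complete_execution` from the running state as well as after a pause is an assumption, since the diagram only shows completion on the pause branch.

```python
class IllegalTransition(Exception):
    pass


class ExecutionSession:
    """Hypothetical model of the execute/record/continue lifecycle."""

    def __init__(self) -> None:
        # execute_agent: a new session starts in the running state.
        self.state = "running"

    def _require(self, *states: str) -> None:
        if self.state not in states:
            raise IllegalTransition(f"not allowed from state {self.state!r}")

    def record_execution_step(self, allowed: bool) -> str:
        # The agent reports intent before acting and receives an autonomy decision.
        self._require("running")
        self.state = "cleared" if allowed else "paused"
        return self.state

    def continue_execution(self) -> None:
        # Only a step that was cleared may proceed.
        self._require("cleared")
        self.state = "running"

    def complete_execution(self) -> None:
        self._require("running", "paused")
        self.state = "completed"

    def abort_execution(self) -> None:
        self._require("running", "paused", "cleared")
        self.state = "aborted"
```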

Session-bound controls guard against reuse of stale confirmation tokens, drift away from the declared operation, and hidden escalation during long-running execution.
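Session binding can be sketched by scoping each token to the session and step that issued it. The `Session` dataclass and its `declare` and `authorize` methods below are hypothetical; the sketch only illustrates rejecting stale tokens and operation drift, plus tokens presented to a session that never issued them.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    step: int = 0
    # token -> (step it was issued for, operation it authorizes)
    tokens: dict = field(default_factory=dict)

    def declare(self, operation: str) -> str:
        """The agent declares its next operation; a token is issued for this step only."""
        self.step += 1
        token = secrets.token_hex(8)
        self.tokens[token] = (self.step, operation)
        return token

    def authorize(self, token: str, operation: str) -> bool:
        # Tokens live in this session only and are consumed on first use.
        entry = self.tokens.pop(token, None)
        if entry is None:
            return False
        issued_step, declared_op = entry
        if issued_step != self.step:
            return False  # stale: a later step has been declared since
        return declared_op == operation  # reject drift from the declared operation
```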

Open Security Hardening Tracks

The public launch should explicitly label hardening work that is still in progress. Current priorities include batch safeguards, structured error alignment, and deeper conformance validation for policy-sensitive operations.

  • Batch/resource exhaustion protections
  • Structured error surface consistency
  • Conformance coverage for security and safety behaviors

Source documents behind this summary