SECURITY MODEL

Trust And Safety Controls

MCP-AQL security is designed around operation risk, trust-level mediation, and explicit confirmation for dangerous actions. This page explains how the draft protocol handles approval paths, danger tiers, and safe execution behavior.


On this page

Trust-Level Mediation
Danger Classification
Confirmation Token Flow
Execution Safety Loop
Open Security Hardening Tracks
Source documents behind this summary

Go Deeper In The Full Spec

The full safety model is split across several dedicated spec documents hosted here on the website.

Related Summary Pages

Security decisions sit next to routing, error handling, and launch-readiness claims.

Trust-Level Mediation

What it means in practice

Implementations classify requester trust and apply policy gates before operation dispatch. The exact trust taxonomy is adapter-defined, but the enforcement pattern is consistent: low-trust contexts default toward safe reads, while higher-trust contexts can unlock mutations or execution flows.

Trust-level checks happen before execution
Dangerous operations require stronger policy context
Policies should be introspectable where feasible

Typical mediation outcome

A validated user might be allowed to call list_files and update_profile, but blocked from bulk_delete until stronger review, confirmation, or out-of-band verification is available.
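
The mediation outcome above can be sketched as a minimum-trust lookup that runs before dispatch. This is an illustration only: the trust taxonomy is adapter-defined, so the `Trust` levels, the `MIN_TRUST` table, and the `mediate` function are hypothetical names, and denying unknown operations is an assumed default.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical trust levels; real adapters define their own taxonomy.
    ANONYMOUS = 0
    VALIDATED = 1
    ELEVATED = 2


# Hypothetical policy table: minimum trust required to dispatch each operation.
MIN_TRUST = {
    "list_files": Trust.ANONYMOUS,
    "update_profile": Trust.VALIDATED,
    "bulk_delete": Trust.ELEVATED,
}


def mediate(operation: str, trust: Trust) -> bool:
    """Return True if the operation may be dispatched at this trust level."""
    required = MIN_TRUST.get(operation)
    if required is None:
        # Assumed default: operations without a policy entry are denied.
        return False
    return trust >= required
```

Because the check is a plain table lookup, the same table can be exposed through introspection so clients can see why a call was blocked.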

Danger Classification

Operation risk labels

Operations can be tagged with a danger class to support UI and agent policy decisions. Draft specification work is tracking stricter requirements for keeping danger metadata aligned with what runtime surfaces actually expose.

Why the labels matter

Danger labels are the bridge between semantics and policy. They help an agent distinguish advisory actions from ones that must pause for approval or trigger hard-stop verification.

Danger level	Example operation	Likely safety tier	Expected behavior
`safe`	`list_files`	`advisory`	No intervention beyond normal logging
`destructive`	`delete_account`	`confirm`	Pause and require an approval path
`dangerous`	`bulk_delete`	`verify`	Mandatory pause plus stronger verification
`forbidden`	`drop_database`	`danger_zone`	Hard stop with out-of-band verification
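
An agent or UI could consume the table above as a simple lookup. The sketch below copies the four rows as written; the table calls these "likely" tiers, so the mapping is illustrative rather than normative. The names `SAFETY_TIER`, `tier_for`, and `must_pause` are hypothetical, and treating an unknown label as the strictest tier is an assumption.

```python
# Danger level -> likely safety tier, copied from the table above.
SAFETY_TIER = {
    "safe": "advisory",
    "destructive": "confirm",
    "dangerous": "verify",
    "forbidden": "danger_zone",
}


def tier_for(danger_level: str) -> str:
    """Look up the safety tier for a danger label."""
    # Assumed fail-closed default: unrecognized labels get the strictest tier.
    return SAFETY_TIER.get(danger_level, "danger_zone")


def must_pause(danger_level: str) -> bool:
    """Only the advisory tier proceeds without intervention."""
    return tier_for(danger_level) != "advisory"
```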

Confirmation Token Flow

Request and required confirmation response

{
  "operation": "delete_repo",
  "params": {
    "owner": "acme",
    "repo": "widgets"
  }
}

{
  "success": false,
  "error": {
    "code": "CONFIRMATION_REQUIRED",
    "message": "This operation requires confirmation",
    "details": {
      "danger_level": "destructive",
      "confirmation_message": "Delete repository 'acme/widgets'?",
      "confirmation_token": "conf_abc123xyz"
    }
  }
}
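
A client might recognize this response and pull out what it needs to prompt the user. The following is a minimal sketch that relies only on the field names in the response above; `ConfirmationPrompt` and `parse_confirmation` are hypothetical helpers, not part of the protocol.

```python
from typing import NamedTuple, Optional


class ConfirmationPrompt(NamedTuple):
    danger_level: str
    message: str
    token: str


def parse_confirmation(response: dict) -> Optional[ConfirmationPrompt]:
    """Return the prompt if the response asks for confirmation, else None."""
    error = response.get("error") or {}
    if response.get("success") or error.get("code") != "CONFIRMATION_REQUIRED":
        return None
    details = error["details"]
    return ConfirmationPrompt(
        danger_level=details["danger_level"],
        message=details["confirmation_message"],
        token=details["confirmation_token"],
    )
```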

Confirmed retry

The client retries the same operation with the issued token. That keeps the first denial machine-readable and makes the second request an explicit act of confirmation.

{
  "operation": "delete_repo",
  "params": {
    "owner": "acme",
    "repo": "widgets",
    "_confirmation": "conf_abc123xyz"
  }
}
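
On the server side, the two-step flow could be handled roughly as follows. This is a sketch under stated assumptions, not the specified behavior: the in-memory `_pending` store, the five-minute lifetime, single-use tokens, binding a token to the exact operation and parameters, and the bare `{"success": True}` reply are all choices made here for illustration. The handler also assumes it is only reached for operations already classified as `destructive`.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 300  # assumed lifetime; the source does not specify one

# token -> (operation, params, expiry timestamp); in-memory for illustration
_pending = {}


def handle(request, now=None):
    """Issue a confirmation token on first contact, accept it on retry."""
    now = time.time() if now is None else now
    params = dict(request["params"])
    token = params.pop("_confirmation", None)

    if token is not None:
        entry = _pending.pop(token, None)  # pop makes the token single-use
        if entry is not None:
            operation, confirmed_params, expires_at = entry
            # The retry must match the request that was originally denied.
            if (operation == request["operation"]
                    and confirmed_params == params
                    and now <= expires_at):
                return {"success": True}

    # No token, or a token that failed a check: issue a fresh one.
    new_token = "conf_" + secrets.token_urlsafe(16)
    _pending[new_token] = (request["operation"], params, now + TOKEN_TTL_SECONDS)
    return {
        "success": False,
        "error": {
            "code": "CONFIRMATION_REQUIRED",
            "message": "This operation requires confirmation",
            "details": {
                "danger_level": "destructive",
                "confirmation_message": f"Confirm {request['operation']}?",
                "confirmation_token": new_token,
            },
        },
    }
```

Binding the token to the original parameters means a token issued for `acme/widgets` cannot be replayed against a different repository.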

Execution Safety Loop

What it means for agents

The execution safety loop means actions are evaluated before they happen, not merely logged afterward. An agent reports intent, receives an autonomy decision, and only proceeds when the current step is allowed.

Lifecycle shape

execute_agent -> record_execution_step -> continue_execution
                 \-> pause/confirm -> complete_execution or abort_execution

Session-bound controls guard against reuse of stale confirmation tokens, unexpected operation drift, and hidden escalation during long-running execution.
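
The lifecycle can be modeled as a small state machine in which each step is evaluated before it runs. The sketch below borrows the operation names from the diagram as method names, but the `ExecutionSession` class, its state strings, and the `"continue"` / `"pause"` / `"abort"` decision values are assumptions made for illustration. Aborting when a step falls outside the operations declared at session start is one assumed way to catch operation drift.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ExecutionSession:
    allowed: Set[str]             # operations declared when the session started
    needs_confirmation: Set[str]  # subset that must pause for approval
    state: str = "running"        # running | paused | completed | aborted
    pending: Optional[str] = None
    approved_steps: List[str] = field(default_factory=list)

    def record_execution_step(self, operation: str) -> str:
        """Report intent for the next step and receive an autonomy decision."""
        if self.state != "running":
            raise RuntimeError(f"cannot record a step while {self.state}")
        if operation not in self.allowed:
            self.state = "aborted"  # operation drift ends the session
            return "abort"
        if operation in self.needs_confirmation:
            self.state, self.pending = "paused", operation
            return "pause"
        self.approved_steps.append(operation)
        return "continue"

    def confirm(self, approved: bool) -> str:
        """Resolve a paused step; a refusal aborts the session."""
        if self.state != "paused" or self.pending is None:
            raise RuntimeError("nothing is waiting for confirmation")
        operation, self.pending = self.pending, None
        if not approved:
            self.state = "aborted"
            return "abort"
        self.approved_steps.append(operation)
        self.state = "running"
        return "continue"

    def complete_execution(self) -> None:
        if self.state != "running":
            raise RuntimeError(f"cannot complete while {self.state}")
        self.state = "completed"
```

A step is added to `approved_steps` only after it has been allowed, which mirrors the point that actions are evaluated before they happen rather than logged afterward.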

Open Security Hardening Tracks

Any public launch should explicitly label hardening work that is still in progress. Current priority categories include batch safeguards, structured error alignment, and deeper conformance validation for policy-sensitive operations.

Batch/resource exhaustion protections
Structured error surface consistency
Conformance coverage for security and safety behaviors