Trust And Safety Controls
MCP-AQL security is designed around operation risk, trust-level mediation, and explicit confirmation for dangerous actions. This page explains how the draft protocol handles approval paths, danger tiers, and safe execution behavior.
Trust-Level Mediation
What it means in practice
Implementations classify requester trust and apply policy gates before operation dispatch. The exact trust taxonomy is adapter-defined, but the enforcement pattern is consistent: low-trust contexts default toward safe reads, while higher-trust contexts can unlock mutations or execution flows.
- Trust-level checks happen before execution
- Dangerous operations require stronger policy context
- Policies should be introspectable where feasible
Typical mediation outcome
A validated user might be allowed to call list_files and update_profile, but blocked from bulk_delete until stronger review, confirmation, or out-of-band verification is available.
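The mediation outcome above can be sketched as a small policy gate that runs before dispatch. This is a minimal illustration, not the protocol's defined behavior: the `Trust` levels, the `MIN_TRUST` table, and the `gate` and `dispatch` helpers are all hypothetical names, since the trust taxonomy is adapter-defined.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical taxonomy; real adapters define their own levels.
    ANONYMOUS = 0
    VALIDATED = 1
    VERIFIED = 2


# Hypothetical policy table: minimum trust required to dispatch each operation.
MIN_TRUST = {
    "list_files": Trust.ANONYMOUS,
    "update_profile": Trust.VALIDATED,
    "bulk_delete": Trust.VERIFIED,
}


def gate(operation: str, trust: Trust) -> bool:
    """Return True if the operation may be dispatched at this trust level."""
    required = MIN_TRUST.get(operation)
    if required is None:
        return False  # unknown operations are denied by default (an assumption)
    return trust >= required


def dispatch(operation: str, trust: Trust, handlers: dict):
    """Check the gate first, then call the handler; never the other way round."""
    if not gate(operation, trust):
        raise PermissionError(f"{operation} is not allowed at trust level {trust.name}")
    return handlers[operation]()
```

With this table, a `Trust.VALIDATED` requester passes the gate for list_files and update_profile but is refused bulk_delete. Denying unknown operations by default is a design choice of the sketch, not something the draft mandates.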
Danger Classification
Operation risk labels
Operations can be tagged with a danger class so that UIs and agents can base policy decisions on it. Draft specification work is tracking stronger requirements for keeping danger metadata aligned with runtime surfaces.
Why the labels matter
Danger labels are the bridge between semantics and policy. They help an agent distinguish advisory actions from ones that must pause for approval or trigger hard-stop verification.
| Danger level | Example operation | Likely safety tier | Expected behavior |
|---|---|---|---|
| safe | list_files | advisory | No intervention beyond normal logging |
| destructive | delete_account | confirm | Pause and require an approval path |
| dangerous | bulk_delete | verify | Mandatory pause plus stronger verification |
| forbidden | drop_database | danger_zone | Hard stop with out-of-band verification |
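One way an agent might consume these labels is a lookup from danger level to safety tier. The sketch below mirrors the table rows. The `SafetyTier` fields, the `tier_for` helper, and the fallback to the strictest tier for unknown labels are assumptions made for illustration, and the table itself only describes the likely tier for each level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyTier:
    name: str
    pauses: bool                 # must the agent stop and wait?
    stronger_verification: bool  # is plain confirmation not enough?
    out_of_band: bool            # must verification happen outside the session?


# Mapping taken from the table; field names are hypothetical.
TIER_BY_DANGER = {
    "safe": SafetyTier("advisory", False, False, False),
    "destructive": SafetyTier("confirm", True, False, False),
    "dangerous": SafetyTier("verify", True, True, False),
    "forbidden": SafetyTier("danger_zone", True, True, True),
}


def tier_for(danger_level: str) -> SafetyTier:
    """Resolve a danger label, failing closed on labels the agent does not know."""
    return TIER_BY_DANGER.get(danger_level, TIER_BY_DANGER["forbidden"])
```

Failing closed means a label the agent has never seen is treated like the strictest row rather than silently allowed.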
Confirmation token flow
Request and required confirmation response
```json
{
  "operation": "delete_repo",
  "params": {
    "owner": "acme",
    "repo": "widgets"
  }
}
```

```json
{
  "success": false,
  "error": {
    "code": "CONFIRMATION_REQUIRED",
    "message": "This operation requires confirmation",
    "details": {
      "danger_level": "destructive",
      "confirmation_message": "Delete repository 'acme/widgets'?",
      "confirmation_token": "conf_abc123xyz"
    }
  }
}
```
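On the server side, producing this response might look like the following sketch. It assumes an in-memory `PENDING` store and a hypothetical `require_confirmation` helper. The `conf_` prefix only imitates the example token, and nothing here is prescribed by the draft.

```python
import secrets

# Hypothetical in-memory store: token -> (operation, params) awaiting confirmation.
PENDING = {}


def require_confirmation(operation, params, danger_level, message):
    """Refuse the call for now and hand back a token the client can retry with."""
    token = "conf_" + secrets.token_urlsafe(16)  # unguessable, single-purpose token
    PENDING[token] = (operation, dict(params))   # remember exactly what was asked
    return {
        "success": False,
        "error": {
            "code": "CONFIRMATION_REQUIRED",
            "message": "This operation requires confirmation",
            "details": {
                "danger_level": danger_level,
                "confirmation_message": message,
                "confirmation_token": token,
            },
        },
    }
```

Storing the original operation and parameters alongside the token lets a later retry be compared against what was actually requested.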
Confirmed retry
The client retries the same operation and passes the issued token as the _confirmation parameter. The first denial stays machine-readable, and the second request becomes an explicit act of confirmation.
```json
{
  "operation": "delete_repo",
  "params": {
    "owner": "acme",
    "repo": "widgets",
    "_confirmation": "conf_abc123xyz"
  }
}
```
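A client could wrap the two-step flow in a small helper like the one below. This is a sketch under stated assumptions: `send` stands in for whatever transport delivers a request and returns the parsed response, and `approve` is a hypothetical callback that asks a human (or a policy) whether to proceed. Only the `CONFIRMATION_REQUIRED` code, the `details` fields, and the `_confirmation` parameter come from the examples above.

```python
import copy


def call_with_confirmation(send, request, approve):
    """Send a request; if confirmation is required and approved, retry with the token."""
    response = send(request)
    error = response.get("error") or {}
    if response.get("success") or error.get("code") != "CONFIRMATION_REQUIRED":
        return response  # succeeded, or failed for an unrelated reason

    details = error.get("details", {})
    if not approve(details.get("confirmation_message", "")):
        return response  # declined: surface the original denial unchanged

    # Retry the same operation, adding only the issued token.
    retry = copy.deepcopy(request)
    retry.setdefault("params", {})["_confirmation"] = details["confirmation_token"]
    return send(retry)
```

The retry is built from a deep copy, so the caller's original request is never mutated and the token cannot leak into unrelated calls.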
Execution Safety Loop
What it means for agents
The execution safety loop means actions are evaluated before they happen, not merely logged afterward. An agent reports intent, receives an autonomy decision, and only proceeds when the current step is allowed.
Lifecycle shape
```
execute_agent -> record_execution_step -> continue_execution
                                       \-> pause/confirm -> complete_execution or abort_execution
```
Session-bound controls prevent stale confirmation tokens, unexpected operation drift, and hidden escalation during long-running execution.
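The loop and its session-bound controls can be illustrated with a toy session object. Everything here is an assumption for the sake of the sketch: the `ExecutionSession` class, the `decision` values, the token lifetime, and the single-use rule. Only the record_execution_step name comes from the lifecycle above.

```python
import secrets
import time


class ExecutionSession:
    """Hypothetical session that evaluates each step before it runs."""

    def __init__(self, allowed_ops, token_ttl=60.0):
        self.session_id = secrets.token_hex(8)
        self.allowed_ops = set(allowed_ops)  # operations that need no pause
        self.token_ttl = token_ttl           # seconds a confirmation stays valid
        self._pending = {}                   # token -> (operation, issued_at)

    def record_execution_step(self, operation, now=None):
        """Report intent and receive an autonomy decision for this step."""
        now = time.monotonic() if now is None else now
        if operation in self.allowed_ops:
            return {"decision": "continue"}
        token = secrets.token_urlsafe(16)
        self._pending[token] = (operation, now)
        return {"decision": "pause", "confirmation_token": token}

    def confirm(self, token, operation, now=None):
        """Accept a token only once, only here, only for the paused operation."""
        now = time.monotonic() if now is None else now
        issued = self._pending.pop(token, None)  # pop makes the token single-use
        if issued is None:
            return False  # unknown token, or one issued by a different session
        issued_op, issued_at = issued
        if issued_op != operation:
            return False  # operation drift: confirmed one thing, attempting another
        return now - issued_at <= self.token_ttl  # reject stale tokens
```

Because tokens live inside the session that issued them, a token replayed in another session, reused after confirmation, or presented for a different operation is rejected.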
Open Security Hardening Tracks
A public launch should explicitly label hardening work that is still in progress. Current priority categories include batch safeguards, structured error alignment, and deeper conformance validation for policy-sensitive operations.
- Batch/resource exhaustion protections
- Structured error surface consistency
- Conformance coverage for security and safety behaviors