REPO-SYNCED SPEC DOC

Dangerous Operation Classification Specification

This document defines a standard classification system for dangerous operations in MCP-AQL adapters. Danger levels enable automatic lockdown of high-risk actions, consistent confirmation requirements, and trust-based per

Support documentDraft1.0.0-draft2026-04-15

Source: spec/docs/adapter/danger-levels.md

Open Source Markdown Browse Full Spec Reference

On this page

Jump to a section

Use the outline to move through longer pages without losing your place.

Abstract
1. Overview
2. Danger Level Enum
3. Operation Schema
4. Trust-to-Danger Gating
5. Automatic Lockdown
6. Standard Dangerous Patterns
7. Implementation Requirements
8. Danger Zone Enforcement During Execution
9. Future Extensions
References

Version: 1.0.0-draft Status: Draft Last Updated: 2026-04-15

Abstract

1. Overview

1.1 Purpose

MCP-AQL adapters expose operations that can have significant consequences:

Data modification - Creating, updating, or deleting resources
Irreversible actions - Operations that cannot be undone
Bulk operations - Actions affecting multiple resources
System-level changes - Configuration or access modifications

Danger levels provide a standardized framework for:

Risk classification - Categorize operations by potential harm
Confirmation requirements - Require acknowledgment for risky actions
Trust-based gating - Restrict operations based on adapter trust level
Consistent UX - Uniform handling across different adapters

1.2 Design Principles

Safe by default - Operations without classification treated as potentially dangerous
Explicit classification - Adapter authors declare danger levels
Progressive escalation - Higher danger requires stronger safeguards
Auditable decisions - All dangerous operation attempts logged

1.3 Inspiration

This specification is inspired by:

Claude Code's dangerous git operation handling (force push, hard reset)
Database permission systems (SELECT vs DELETE vs DROP)
Unix file permissions (read, write, execute escalation)

1.4 Relationship to Trust Levels

Danger levels work in conjunction with trust levels (see Trust Levels):

Trust levels describe adapter reliability (who vouches for it)
Danger levels describe operation risk (what harm could occur)
Gating matrix combines both to determine if operation is permitted

2. Danger Level Enum

2.1 Danger Level Values

Level	Value	Name	Behavior
0	`safe`	Safe	No restrictions
1	`reversible`	Reversible	Standard confirmation
2	`destructive`	Destructive	Enhanced confirmation
3	`dangerous`	Dangerous	Explicit unlock required
4	`forbidden`	Forbidden	Blocked unless admin override

2.2 Level Descriptions

2.2.1 safe (Level 0)

Operations that cannot cause harm and are always permitted.

Characteristics:

Read-only operations
No state modification
No external effects
Fully reversible (N/A - no changes made)

Examples:

List resources
Get resource details
Search operations
Introspection queries

2.2.2 reversible (Level 1)

Operations that modify state but whose effects can typically be undone.

Characteristics:

Creates new resources
Modifies existing data
Changes are typically reversible
Affects single resources

Examples:

Create a new record
Update existing fields
Add permissions
Enable features

2.2.3 destructive (Level 2)

Operations that remove or significantly alter data, requiring enhanced confirmation.

Characteristics:

Deletes resources
Overwrites data
May affect dependent resources
Partially reversible (with effort)

Examples:

Delete a single resource
Overwrite file contents
Remove permissions
Archive/disable resources

2.2.4 dangerous (Level 3)

Operations that can cause significant harm and require explicit unlock.

Characteristics:

Bulk modifications
Bypasses safety checks
Affects multiple resources
Difficult or impossible to reverse

Examples:

Force push (overwrites history)
Bulk delete operations
Override safety validations
Clear audit logs

2.2.5 forbidden (Level 4)

Operations that should never be performed without administrator override.

Characteristics:

Catastrophic potential
System-wide impact
Cannot be undone
Production environment risks

Examples:

Drop database/table
Delete all records
Reset to factory state
Production data mutations
Truncate operations

3. Operation Schema

3.1 Danger Metadata in Operations

Operations declare danger level in their schema:

operations:
  delete:
    - name: delete_repo
      maps_to: "DELETE /repos/{owner}/{repo}"
      description: "Permanently delete a repository"
      danger:
        level: destructive
        reasons:
          - "Permanently removes repository and all contents"
          - "Cannot be recovered after grace period"
          - "Affects forks and dependent projects"
        confirmation_message: "Delete repository '{owner}/{repo}'? This cannot be undone."
        cooldown_seconds: 5

3.2 Danger Block Schema

interface DangerMetadata {
  /**
   * Danger level (0-4)
   * Default: inferred from CRUDE category
   */
  level: 'safe' | 'reversible' | 'destructive' | 'dangerous' | 'forbidden';

  /**
   * Human-readable explanations of why this is dangerous
   */
  reasons?: string[];

  /**
   * Custom confirmation message template
   * Supports {param} interpolation
   */
  confirmation_message?: string;

  /**
   * Minimum seconds between confirmation and execution
   * Gives user time to reconsider
   */
  cooldown_seconds?: number;

  /**
   * Requires re-authentication before execution
   */
  requires_reauth?: boolean;

  /**
   * Additional context for audit logging
   */
  audit_context?: string[];
}

3.3 Default Danger Levels by CRUDE Category

When danger.level is not specified, defaults are inferred:

CRUDE Category	Default Danger Level	Rationale
Read	`safe` (0)	No state modification
Create	`reversible` (1)	Adds data, reversible by delete
Update	`reversible` (1)	Modifies data, often reversible
Delete	`destructive` (2)	Removes data
Execute	`reversible` (1)	Depends on operation

Adapter authors SHOULD override defaults when operations are more dangerous than the category default suggests.

3.4 Introspection Response

Danger metadata is available via introspection:

{
  operation: "introspect",
  params: { query: "operations", name: "force_delete" }
}

// Response
{
  success: true,
  data: {
    name: "force_delete",
    endpoint: "DELETE",
    danger: {
      level: "dangerous",
      reasons: [
        "Bypasses soft-delete protection",
        "Cannot be undone",
        "Affects dependent resources"
      ],
      confirmation_message: "This will permanently delete {resource} and all dependent data."
    }
  }
}

4. Trust-to-Danger Gating

This section defines the canonical gating matrix that combines adapter trust levels with operation danger levels. For trust level definitions and promotion rules, see Trust Levels Specification.

4.1 Gating Matrix

The combination of adapter trust level and operation danger level determines behavior:

Danger Level	untested	generated	validated	community_reviewed	certified
safe (0)	introspect_only	allow	allow	allow	allow
reversible (1)	deny	deny	allow	allow	allow
destructive (2)	deny	deny	confirm	allow	allow
dangerous (3)	deny	deny	deny	confirm	allow
forbidden (4)	deny	deny	deny	deny	confirm

4.2 Behavior Definitions

Behavior	Description
`allow`	Operation executes without additional gates
`confirm`	Operation requires explicit user confirmation
`deny`	Operation blocked with error response
`introspect_only`	Only introspection operations permitted; all other operations (including safe ones) are blocked

4.3 Confirmation Flow

When confirmation is required, the adapter issues a confirmation token that the client must include in a retry request. See Confirmation Token Specification for token generation, validation, and lifecycle requirements.

Step 1: Initial Request

{
  operation: "delete_repo",
  params: { owner: "acme", repo: "widgets" }
}

Step 2: Confirmation Required Response

{
  "success": false,
  "error": {
    "code": "CONFIRMATION_REQUIRED",
    "message": "This operation requires confirmation",
    "details": {
      "operation": "delete_repo",
      "danger_level": "destructive",
      "reasons": [
        "Permanently removes repository and all contents",
        "Cannot be recovered after grace period"
      ],
      "confirmation_message": "Delete repository 'acme/widgets'? This cannot be undone.",
      "confirmation_token": "conf_abc123xyz",
      "expires_at": "2026-01-28T12:05:00Z"
    }
  }
}

Step 3: Confirmed Request

{
  operation: "delete_repo",
  params: {
    owner: "acme",
    repo: "widgets",
    _confirmation: "conf_abc123xyz"
  }
}

4.4 Denial Response

When an operation is denied due to trust/danger mismatch:

{
  "success": false,
  "error": {
    "code": "PERMISSION_DANGER_LEVEL_DENIED",
    "message": "Operation 'bulk_delete' (danger: dangerous) denied for adapter trust level 'validated'",
    "details": {
      "operation": "bulk_delete",
      "danger_level": "dangerous",
      "adapter_trust": "validated",
      "minimum_trust_required": "community_reviewed",
      "reasons": [
        "Affects multiple resources",
        "Cannot be undone"
      ]
    }
  }
}

Note: The PERMISSION_DANGER_LEVEL_DENIED error code is introduced by this specification and should be added to the Error Codes Specification as a future extension under the PERMISSION_ category.

5. Automatic Lockdown

5.1 Pattern-Based Classification

Systems SHOULD automatically classify operations based on naming patterns:

# Gatekeeper configuration
danger_patterns:
  dangerous:
    - "force_*"           # Force operations
    - "*_permanently"     # Permanent actions
    - "bulk_delete*"      # Bulk deletion
    - "override_*"        # Safety overrides
    - "bypass_*"          # Validation bypass

  forbidden:
    - "drop_*"            # Drop database/table
    - "delete_all*"       # Delete all records
    - "truncate_*"        # Truncate operations
    - "reset_*"           # Reset to factory
    - "destroy_*"         # Destroy operations

Pattern precedence: When an operation matches multiple patterns at different danger levels, the highest danger level applies. For example, force_delete_all matches both force_* (dangerous, level 3) and delete_all* (forbidden, level 4), so it defaults to forbidden.

5.2 Lockdown Behavior

When an operation matches a dangerous pattern:

Classification - Operation tagged with inferred danger level
Warning - LLM/user warned of danger classification
Confirmation - Appropriate confirmation level applied
Audit - Operation attempt logged regardless of outcome

5.3 Override Mechanism

Explicit danger declarations in the adapter schema override pattern matching:

operations:
  execute:
    - name: force_sync
      maps_to: "POST /sync?force=true"
      description: "Force synchronization (safe, just skips cache)"
      danger:
        level: safe  # Overrides pattern match for "force_*"
        reasons:
          - "Force flag only bypasses cache, no data risk"

6. Standard Dangerous Patterns

6.1 Level 3 (Dangerous) Patterns

Operations matching these patterns SHOULD default to dangerous:

Pattern	Description	Examples
`force_*`	Force operations bypassing safety	`force_push`, `force_delete`
`*_permanently`	Permanent/irreversible actions	`remove_permanently`
`bulk_delete*`	Mass deletion operations	`bulk_delete_users`
`override_*`	Safety override operations	`override_validation`
`bypass_*`	Validation bypass	`bypass_approval`
`*_without_backup`	Operations skipping backup	`delete_without_backup`
`purge_*`	Purge/clean operations	`purge_logs`, `purge_cache`

6.2 Level 4 (Forbidden) Patterns

Operations matching these patterns SHOULD default to forbidden:

Pattern	Description	Examples
`drop_*`	Drop database/table/collection	`drop_table`, `drop_database`
`delete_all*`	Delete all records	`delete_all_users`
`truncate_*`	Truncate operations	`truncate_table`
`reset_*`	Reset to empty/factory state	`reset_database`
`destroy_*`	Complete destruction	`destroy_environment`
`wipe_*`	Wipe operations	`wipe_data`
`*_production`	Production mutations	`deploy_production`

6.3 Safe Patterns

Operations matching these patterns are confirmed safe:

Pattern	Description	Examples
`get_*`	Retrieve single resource	`get_user`, `get_repo`
`list_*`	List resources	`list_repos`, `list_users`
`search_*`	Search operations	`search_issues`
`count_*`	Count operations	`count_records`
`check_*`	Validation checks	`check_status`
`introspect*`	Introspection	`introspect`

7. Implementation Requirements

7.1 MUST Requirements

Implementations supporting danger levels MUST:

Default unlabeled operations to at least reversible (not safe)
Enforce confirmation for destructive and higher operations
Log all dangerous operation attempts (level 2+)
Return appropriate error codes for denied operations
Include danger information in introspection responses

7.2 SHOULD Requirements

Implementations supporting danger levels SHOULD:

Implement pattern-based automatic classification
Support configurable gating policies
Provide confirmation token mechanism
Allow adapter-level danger overrides
Display danger reasons to users before confirmation

7.3 MAY Requirements

Implementations supporting danger levels MAY:

Implement cooldown periods between confirmation and execution
Require re-authentication for dangerous and forbidden operations
Support custom confirmation UX per danger level
Integrate with external approval workflows

7.4 Audit Requirements

All operations at danger level 2 (destructive) or higher MUST be logged:

interface DangerAuditEntry {
  timestamp: string;
  adapter_name: string;
  operation: string;
  danger_level: number;
  adapter_trust: string;
  outcome: 'allowed' | 'confirmed' | 'denied';
  confirmation_token?: string;
  user_id?: string;
  parameters?: Record<string, unknown>;  // Redacted as appropriate
}

8. Danger Zone Enforcement During Execution

When the Execution Safety Loop is active, danger levels integrate with the Autonomy Evaluator to provide continuous risk assessment during agent execution.

8.1 Safety Tier Mapping

The Autonomy Evaluator maps danger levels to safety tiers using a numeric risk score (0-100):

Danger Level	Risk Score Range	Safety Tier	Enforcement
safe (0)	0-15	`advisory`	No intervention
reversible (1)	16-40	`advisory` or `confirm`	Depends on risk tolerance
destructive (2)	41-60	`confirm`	Requires approval
dangerous (3)	61-85	`verify`	Mandatory pause + approval
forbidden (4)	86-100	`danger_zone`	Hard stop + out-of-band verification

Note: The exact risk score assigned depends on additional factors evaluated by the Autonomy Evaluator pipeline (step history, action patterns, risk tolerance configuration). The danger level provides the baseline, not the final score.

8.2 Out-of-Band Verification by Danger Level

Operations at danger level dangerous (level 3) and forbidden (level 4) both trigger out-of-band verification during execution, but with different enforcement severity:

Forbidden (level 4) — Hard block (danger_zone tier):

The Autonomy Evaluator assigns danger_zone safety tier
The AutonomyDirective returns stopped: true
A danger_zone notification is broadcast to all executing agents
The agent is blocked at the agent level (not just the current execution)
A verification challenge is generated with a code displayed through an AI-inaccessible channel
Only successful out-of-band verification or admin override can unblock the agent
The block persists across server restarts

Dangerous (level 3) — Pause (verify tier):

The Autonomy Evaluator assigns verify safety tier
The AutonomyDirective returns continue: false (without stopped: true)
An autonomy_pause notification is sent to the executing agent (not broadcast)
A verification challenge is generated and displayed through an AI-inaccessible channel
The agent is paused until verification succeeds or the challenge expires
The pause does not persist across server restarts and does not prevent new executions

See Section 8.8 (Out-of-Band Verification) of the core specification for the full challenge-response protocol.

8.3 Danger Escalation During Execution

An operation's effective danger level MAY increase during execution based on context:

Repetition escalation: The same operation performed repeatedly within an execution session MAY escalate (e.g., a single delete_record is destructive, but 50 sequential deletions MAY escalate to dangerous)
Pattern escalation: The Autonomy Evaluator's pattern matching (Section 8.7.2, Stage 3) MAY assign a higher safety tier than the operation's declared danger level warrants
Cumulative risk: Adapters MAY track cumulative risk within an execution and escalate when a threshold is exceeded

Danger level escalation during execution does not modify the operation's declared danger level — it affects only the safety tier assigned by the Autonomy Evaluator for that specific evaluation.

9. Future Extensions

9.1 Conditional Danger Levels

Danger level based on parameter values:

operations:
  delete:
    - name: delete_records
      maps_to: "DELETE /records"
      danger:
        default_level: reversible
        conditions:
          - when: "params.count > 100"
            level: dangerous
            reason: "Bulk delete of more than 100 records"
          - when: "params.permanent == true"
            level: destructive
            reason: "Permanent deletion requested"

9.2 Approval Workflows

Integration with external approval systems:

danger:
  level: dangerous
  approval:
    required: true
    approvers:
      - role: "admin"
      - team: "security"
    timeout_hours: 24

9.3 Danger Escalation

Operations that become more dangerous over time:

danger:
  level: reversible
  escalation:
    - after_count: 10
      level: destructive
      message: "You've performed this operation 10 times this hour"
    - after_count: 50
      level: dangerous
      message: "Unusual activity detected"

9.4 Danger Score Aggregation

Combining multiple risk factors into a composite score:

danger:
  score_factors:
    - base_level: reversible
    - production_env: +1
    - bulk_operation: +1
    - no_backup: +1
  computed_level: dangerous  # Sum exceeds threshold

References

Adapter Element Type Specification
Trust Levels Specification
Rate Limiting Specification
Confirmation Token Specification
Security Model: Gatekeeper
Execution Safety Loop Specification
Error Codes Specification
Claude Code dangerous git operation handling
GitHub Issue: #49