
Rate Limiting and Quota Management Specification

This document defines the rate limiting and quota management system for MCP-AQL adapters. Rate limits enable systems to respect API constraints, prevent runaway costs, and provide graceful degradation under load.

Support document | Draft | 1.0.0-draft | 2026-01-28

Source: spec/docs/adapter/rate-limiting.md


Version: 1.0.0-draft
Status: Draft
Last Updated: 2026-01-28

Abstract

1. Overview

1.1 Purpose

When adapters wrap external APIs, several concerns arise:

APIs have rate limits that must be respected to avoid blocking
Users want budgets to control costs and usage
Systems need protection against runaway API calls (especially in autonomous agent loops)
Graceful degradation is preferable to hard failures

Rate limiting in MCP-AQL provides:

API limit awareness - Extract and respect target API constraints
User-configurable quotas - Budget controls at multiple thresholds
Progressive enforcement - Warn, pause, and hard stop behaviors
Cost tracking - Monitor usage for paid APIs

1.2 Design Principles

Respect upstream limits - Never exceed API provider constraints
User control - Configurable quotas override defaults
Progressive response - Warn before blocking
Transparent status - Quota state always queryable

1.3 Scope

Included:

Rate limit schema for adapters
Quota configuration
Enforcement behaviors
Error codes for rate limiting
Introspection of quota status

Deferred:

Cross-adapter quota aggregation
Token-based rate limiting for LLM APIs
Distributed rate limiting coordination

2. Rate Limits Schema

2.1 Schema Structure

The rate_limits block appears in the adapter front matter:

rate_limits:
  # API-defined limits (from target API documentation or headers)
  api_limits:
    - scope: global
      limit: 5000
      window: hour
    - scope: endpoint
      endpoint: "POST /search"
      limit: 30
      window: minute

  # User-configurable quotas
  quotas:
    enabled: true
    limits:
      - metric: requests_per_hour
        warn: 4000
        pause: 4800
        hard_stop: 5000
      - metric: cost_per_day
        warn: 5.00
        pause: 10.00
        hard_stop: 50.00
        currency: USD

  # Cost estimation for paid APIs
  cost:
    model: per_call
    currency: USD
    pricing:
      - endpoint: "*"
        cost_per_call: 0.001
      - endpoint: "POST /premium/*"
        cost_per_call: 0.01

2.2 api_limits Block

Declares rate limits imposed by the target API.

interface ApiLimit {
  /**
   * Scope of the limit
   * - global: Applies to all operations
   * - endpoint: Applies to specific endpoint
   * - category: Applies to CRUDE category
   */
  scope: 'global' | 'endpoint' | 'category';

  /**
   * Endpoint pattern (when scope is 'endpoint')
   * Supports wildcards: "GET /users/*"
   */
  endpoint?: string;

  /**
   * CRUDE category (when scope is 'category')
   */
  category?: 'create' | 'read' | 'update' | 'delete' | 'execute';

  /**
   * Maximum number of requests in the window
   */
  limit: number;

  /**
   * Time window for the limit
   */
  window: 'second' | 'minute' | 'hour' | 'day';

  /**
   * Header name containing remaining requests (for runtime tracking)
   */
  remaining_header?: string;

  /**
   * Header name containing reset timestamp
   */
  reset_header?: string;
}
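
The following is a minimal sketch of how an adapter might select the limits that apply to a single call. The `OutgoingCall` shape and the helper names `matchesPattern` and `applicableLimits` are hypothetical, and the wildcard semantics (a `*` matches any run of characters) are an assumption, since the specification only states that wildcards are supported.

```typescript
// Mirrors the ApiLimit interface above (header fields omitted for brevity).
type Category = 'create' | 'read' | 'update' | 'delete' | 'execute';

interface ApiLimit {
  scope: 'global' | 'endpoint' | 'category';
  endpoint?: string;
  category?: Category;
  limit: number;
  window: 'second' | 'minute' | 'hour' | 'day';
}

// Hypothetical description of an outgoing call; not defined by this specification.
interface OutgoingCall {
  endpoint: string; // e.g. "POST /search"
  category: Category;
}

// Assumption: "*" matches any run of characters, including "/".
function matchesPattern(pattern: string, endpoint: string): boolean {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`).test(endpoint);
}

// Returns every declared limit that applies to the call.
function applicableLimits(limits: ApiLimit[], call: OutgoingCall): ApiLimit[] {
  return limits.filter((l) => {
    if (l.scope === 'global') return true;
    if (l.scope === 'endpoint') {
      return l.endpoint !== undefined && matchesPattern(l.endpoint, call.endpoint);
    }
    return l.category === call.category;
  });
}
```

A call is only safe to send when every limit returned by `applicableLimits` still has capacity, so an enforcement layer would check all of them rather than just the first match.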

2.3 Common API Limit Patterns

# GitHub API pattern
api_limits:
  - scope: global
    limit: 5000
    window: hour
    remaining_header: "X-RateLimit-Remaining"
    reset_header: "X-RateLimit-Reset"

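# OpenAI pattern (tokens per minute)
# Note: `metric` is not a field of ApiLimit in section 2.2; token-based
# limits are deferred in this version (see sections 1.3 and 8.3).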
api_limits:
  - scope: global
    limit: 90000
    window: minute
    metric: tokens_per_minute

# Stripe pattern (different limits by CRUDE category)
api_limits:
  - scope: category
    category: read
    limit: 100
    window: second
  - scope: category
    category: create
    limit: 25
    window: second

3. Quota Management

3.1 Quotas Block

Quotas are user-configurable limits that can be stricter than the API limits.

interface QuotaConfig {
  /**
   * Whether quota enforcement is enabled
   */
  enabled: boolean;

  /**
   * Individual quota limits
   */
  limits: QuotaLimit[];

  /**
   * How to persist quota tracking
   */
  persistence?: 'memory' | 'file' | 'database';

  /**
   * When to reset counters
   */
  reset_schedule?: string;  // Cron expression
}

interface QuotaLimit {
  /**
   * What is being measured
   */
  metric: QuotaMetric;

  /**
   * Threshold for warning notification
   */
  warn: number;

  /**
   * Threshold for pause (require confirmation to continue)
   */
  pause: number;

  /**
   * Threshold for hard stop (block all requests)
   */
  hard_stop?: number;

  /**
   * Currency for cost metrics
   */
  currency?: string;
}

type QuotaMetric =
  | 'requests_per_minute'
  | 'requests_per_hour'
  | 'requests_per_day'
  | 'tokens_per_minute'
  | 'tokens_per_hour'
  | 'tokens_per_day'
  | 'cost_per_hour'
  | 'cost_per_day'
  | 'cost_per_month';

3.2 Threshold Behaviors

| Threshold | Behavior | User Experience |
| --- | --- | --- |
| `warn` | Log warning, continue | Notification displayed |
| `pause` | Require confirmation | "Continue?" prompt |
| `hard_stop` | Block all requests | Error response |
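
The three thresholds can be evaluated with a simple cascade, sketched below. The function name `evaluateQuota` is hypothetical, the status names follow section 6.3, and the sketch assumes thresholds are inclusive, meaning usage equal to a threshold triggers it. That matches the hard stop example in section 4.3, but the specification does not state it as a rule.

```typescript
// Subset of QuotaLimit needed for threshold evaluation.
interface QuotaLimit {
  metric: string;
  warn: number;
  pause: number;
  hard_stop?: number;
}

type QuotaStatus = 'ok' | 'warn' | 'paused' | 'exhausted';

// Assumption: thresholds are inclusive. The most severe threshold is checked first.
function evaluateQuota(limit: QuotaLimit, current: number): QuotaStatus {
  if (limit.hard_stop !== undefined && current >= limit.hard_stop) return 'exhausted';
  if (current >= limit.pause) return 'paused';
  if (current >= limit.warn) return 'warn';
  return 'ok';
}
```

When `hard_stop` is omitted, as in the permissive development configuration in section 3.3, usage can never reach `exhausted` and tops out at `paused`.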

3.3 Example Quota Configurations

Conservative (cost-conscious):

quotas:
  enabled: true
  limits:
    - metric: cost_per_day
      warn: 1.00
      pause: 5.00
      hard_stop: 10.00
      currency: USD

Development (permissive):

quotas:
  enabled: true
  limits:
    - metric: requests_per_hour
      warn: 1000
      pause: 5000
      # No hard_stop - never fully block

Production (strict):

quotas:
  enabled: true
  limits:
    - metric: requests_per_minute
      warn: 50
      pause: 80
      hard_stop: 100
    - metric: cost_per_day
      warn: 100.00
      pause: 200.00
      hard_stop: 500.00
      currency: USD
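
A configuration loader could sanity-check each quota entry before enabling enforcement. The sketch below is one possible validator, not a normative rule set. The name `validateQuotaLimit` is hypothetical, and both the ordering check (`warn` ≤ `pause` ≤ `hard_stop`) and the requirement that cost metrics carry a `currency` are assumptions inferred from the examples above.

```typescript
interface QuotaLimit {
  metric: string;
  warn: number;
  pause: number;
  hard_stop?: number;
  currency?: string;
}

// Returns a list of human-readable problems; an empty list means the entry passed.
function validateQuotaLimit(limit: QuotaLimit): string[] {
  const problems: string[] = [];
  if (limit.warn < 0 || limit.pause < 0) {
    problems.push(`${limit.metric}: thresholds must not be negative`);
  }
  // Assumption: thresholds escalate, so warn <= pause <= hard_stop.
  if (limit.warn > limit.pause) {
    problems.push(`${limit.metric}: warn (${limit.warn}) exceeds pause (${limit.pause})`);
  }
  if (limit.hard_stop !== undefined && limit.pause > limit.hard_stop) {
    problems.push(`${limit.metric}: pause (${limit.pause}) exceeds hard_stop (${limit.hard_stop})`);
  }
  // Assumption: cost metrics need a currency to be meaningful.
  if (limit.metric.indexOf('cost_') === 0 && !limit.currency) {
    problems.push(`${limit.metric}: cost metrics should declare a currency`);
  }
  return problems;
}
```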

4. Enforcement Behavior

4.1 Enforcement Flow

Request → Check API Limits → Check Quotas → Execute or Block
              │                    │
              ▼                    ▼
         API blocked?        Quota exceeded?
              │                    │
              ▼                    ▼
         Wait/Retry         Warn/Pause/Stop

4.2 API Limit Enforcement

To stay within API rate limits, implementations use the following mechanisms:

Check remaining - Read from response headers if available
Predict limit - Track request counts if no headers
Pre-flight block - Block request before sending if limit would be exceeded
Retry-After - Respect Retry-After header from API responses
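
For the last item, note that HTTP allows `Retry-After` to carry either a number of seconds or an HTTP date. The sketch below handles both forms. The function name `parseRetryAfter` is hypothetical, and it relies on `Date.parse` accepting the HTTP date format that `Date.prototype.toUTCString` produces.

```typescript
// Converts a Retry-After header value into a delay in whole seconds.
// Returns undefined when the value is neither delay-seconds nor a parseable date.
function parseRetryAfter(headerValue: string, now: Date): number | undefined {
  const trimmed = headerValue.trim();

  // Form 1: delay-seconds, e.g. "120"
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10);

  // Form 2: HTTP-date, e.g. "Wed, 28 Jan 2026 13:00:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (isNaN(dateMs)) return undefined;

  // A date in the past means the caller may retry immediately.
  return Math.max(0, Math.ceil((dateMs - now.getTime()) / 1000));
}
```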

Pre-flight block response:

{
  "success": false,
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "API rate limit would be exceeded",
    "details": {
      "limit": 5000,
      "remaining": 0,
      "window": "hour",
      "resets_at": "2026-01-28T13:00:00Z",
      "retry_after_seconds": 1847
    }
  }
}
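
A response of this shape could be produced by a pre-flight check along the following lines. This is a sketch under stated assumptions: it uses fixed windows aligned to UTC clock boundaries, which the specification does not mandate, and the name `preflightCheck` is hypothetical. Here `used` is the number of requests already counted in the current window.

```typescript
type LimitWindow = 'second' | 'minute' | 'hour' | 'day';

const WINDOW_MS: Record<LimitWindow, number> = {
  second: 1000,
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
};

interface PreflightBlock {
  success: false;
  error: {
    code: 'RATE_LIMIT_EXCEEDED';
    message: string;
    details: {
      limit: number;
      remaining: number;
      window: LimitWindow;
      resets_at: string;
      retry_after_seconds: number;
    };
  };
}

// Returns undefined when the request may proceed, or a block response otherwise.
// Assumption: fixed windows aligned to UTC boundaries (e.g. the top of the hour).
function preflightCheck(limit: number, window: LimitWindow, used: number, now: Date): PreflightBlock | undefined {
  if (used < limit) return undefined;
  const size = WINDOW_MS[window];
  const resetMs = (Math.floor(now.getTime() / size) + 1) * size;
  return {
    success: false,
    error: {
      code: 'RATE_LIMIT_EXCEEDED',
      message: 'API rate limit would be exceeded',
      details: {
        limit,
        remaining: 0,
        window,
        // Drop the millisecond component to match the timestamps used in this document.
        resets_at: new Date(resetMs).toISOString().replace(/\.\d{3}Z$/, 'Z'),
        retry_after_seconds: Math.ceil((resetMs - now.getTime()) / 1000),
      },
    },
  };
}
```

With `now` set to 2026-01-28T12:29:13Z and an exhausted hourly limit of 5000, this sketch yields the `resets_at` and `retry_after_seconds` values shown in the example response above.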

4.3 Quota Enforcement

Warnings are included in successful responses via the warnings array. See Warnings Specification for the complete schema and client handling requirements.

Warning (warn threshold):

{
  "success": true,
  "data": { ... },
  "warnings": [
    {
      "code": "RATE_LIMIT_QUOTA_WARNING",
      "message": "Approaching quota limit",
      "details": {
        "metric": "requests_per_hour",
        "current": 4100,
        "warn_threshold": 4000,
        "pause_threshold": 4800
      }
    }
  ]
}

Pause (pause threshold):

{
  "success": false,
  "error": {
    "code": "RATE_LIMIT_QUOTA_PAUSE",
    "message": "Quota pause threshold reached",
    "details": {
      "metric": "requests_per_hour",
      "current": 4850,
      "pause_threshold": 4800,
      "hard_stop_threshold": 5000,
      "confirmation_token": "quota_continue_abc123",
      "expires_at": "2026-01-28T12:05:00Z"
    }
  }
}

Hard stop (hard_stop threshold):

{
  "success": false,
  "error": {
    "code": "RATE_LIMIT_QUOTA_EXHAUSTED",
    "message": "Quota exhausted",
    "details": {
      "metric": "requests_per_hour",
      "current": 5000,
      "hard_stop_threshold": 5000,
      "resets_at": "2026-01-28T13:00:00Z"
    }
  }
}

4.4 Continuing After Pause

To continue after the pause threshold is reached, include the confirmation token from the RATE_LIMIT_QUOTA_PAUSE response in the next request as the _quota_continue parameter. See Confirmation Token Specification for token validation and lifecycle details.

{
  operation: "get_user",
  params: {
    user_id: "alice",
    _quota_continue: "quota_continue_abc123"
  }
}
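
On the adapter side, issuing and redeeming these tokens might look like the in-memory sketch below. The Confirmation Token Specification is authoritative for validation and lifecycle; the class name `PauseConfirmations`, single-use redemption, and binding a token to one metric are all assumptions made for illustration. The ID generator is injected so that a real deployment can supply a cryptographically secure source.

```typescript
interface PendingConfirmation {
  metric: string;
  expiresAtMs: number;
}

class PauseConfirmations {
  private readonly pending = new Map<string, PendingConfirmation>();
  private readonly generateId: () => string;
  private readonly ttlMs: number;

  // generateId should be cryptographically secure in a real deployment.
  constructor(generateId: () => string, ttlMs: number) {
    this.generateId = generateId;
    this.ttlMs = ttlMs;
  }

  // Produces the token fields carried in a RATE_LIMIT_QUOTA_PAUSE response.
  issue(metric: string, now: Date): { confirmation_token: string; expires_at: string } {
    const token = `quota_continue_${this.generateId()}`;
    const expiresAtMs = now.getTime() + this.ttlMs;
    this.pending.set(token, { metric, expiresAtMs });
    return { confirmation_token: token, expires_at: new Date(expiresAtMs).toISOString() };
  }

  // Assumption: tokens are single-use, so any redemption attempt consumes the token.
  redeem(token: string, metric: string, now: Date): boolean {
    const entry = this.pending.get(token);
    if (entry === undefined) return false;
    this.pending.delete(token);
    return entry.metric === metric && now.getTime() <= entry.expiresAtMs;
  }
}
```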

5. Error Codes

5.1 Rate Limit Error Codes

| Code | Description | Recovery |
| --- | --- | --- |
| `RATE_LIMIT_EXCEEDED` | Target API rate limit reached | Wait for reset |
| `RATE_LIMIT_QUOTA_PAUSE` | User quota pause threshold | Confirm to continue |
| `RATE_LIMIT_QUOTA_EXHAUSTED` | User quota hard stop | Wait for reset |
| `RATE_LIMIT_QUOTA_WARNING` | Approaching limit (in warnings) | Consider slowing |

5.2 Error Response Schema

interface RateLimitError {
  code: string;
  message: string;
  details: {
    /** What metric triggered the limit */
    metric?: string;
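    /** Current usage count/amount */
    current?: number;
    /** Configured thresholds for the metric, as returned in the quota examples in section 4.3 */
    warn_threshold?: number;
    pause_threshold?: number;
    hard_stop_threshold?: number;
    /** The limit that was exceeded */
    limit?: number;
    /** Requests remaining in the current window */
    remaining?: number;
    /** Time window of the limit */
    window?: string;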
    /** When the limit resets */
    resets_at?: string;
    /** Seconds until retry is allowed */
    retry_after_seconds?: number;
    /** Token to continue after pause */
    confirmation_token?: string;
    /** When confirmation expires */
    expires_at?: string;
  };
}
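
A client can map these error codes to the recovery actions listed in section 5.1. The sketch below is one way to do so. The `Recovery` type and the `planRecovery` function are hypothetical names, and falling back to `resets_at` when `retry_after_seconds` is absent is an assumption rather than a requirement of this specification.

```typescript
// Subset of RateLimitError needed to plan a recovery.
interface RateLimitError {
  code: string;
  message: string;
  details: {
    resets_at?: string;
    retry_after_seconds?: number;
    confirmation_token?: string;
  };
}

// Hypothetical client-side decision type.
type Recovery =
  | { action: 'wait'; seconds: number }
  | { action: 'confirm'; token: string }
  | { action: 'fail' };

function planRecovery(error: RateLimitError, now: Date): Recovery {
  const d = error.details;
  switch (error.code) {
    case 'RATE_LIMIT_EXCEEDED':
    case 'RATE_LIMIT_QUOTA_EXHAUSTED': {
      // Prefer the explicit delay; otherwise derive it from the reset time.
      if (d.retry_after_seconds !== undefined) {
        return { action: 'wait', seconds: d.retry_after_seconds };
      }
      if (d.resets_at !== undefined) {
        const ms = Date.parse(d.resets_at) - now.getTime();
        if (!isNaN(ms)) return { action: 'wait', seconds: Math.max(0, Math.ceil(ms / 1000)) };
      }
      return { action: 'fail' };
    }
    case 'RATE_LIMIT_QUOTA_PAUSE':
      // Continuing requires user confirmation plus the token from the response.
      return d.confirmation_token !== undefined
        ? { action: 'confirm', token: d.confirmation_token }
        : { action: 'fail' };
    default:
      return { action: 'fail' };
  }
}
```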

6. Introspection

6.1 Quota Status Query

{
  operation: "introspect",
  params: {
    query: "quota_status"
  }
}

6.2 Quota Status Response

{
  "success": true,
  "data": {
    "adapter": "github-api",
    "api_limits": {
      "global": {
        "limit": 5000,
        "remaining": 3500,
        "window": "hour",
        "resets_at": "2026-01-28T13:00:00Z"
      }
    },
    "quotas": [
      {
        "metric": "requests_per_hour",
        "current": 1500,
        "warn": 4000,
        "pause": 4800,
        "hard_stop": 5000,
        "status": "ok"
      },
      {
        "metric": "cost_per_day",
        "current": 5.50,
        "warn": 5.00,
        "pause": 10.00,
        "hard_stop": 50.00,
        "status": "warn",
        "currency": "USD"
      }
    ],
    "next_reset": "2026-01-28T13:00:00Z"
  }
}

6.3 Status Values

| Status | Meaning |
| --- | --- |
| `ok` | Below warn threshold |
| `warn` | At or above warn, below pause |
| `paused` | At or above pause threshold, confirmation required |
| `exhausted` | At or above hard stop, requests blocked |
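
A client that wants a single overall signal from a quota status response can pick the most severe entry. The sketch below assumes the statuses are ordered `ok` < `warn` < `paused` < `exhausted`, which follows the progression in the table but is not stated as a rule. The name `mostSevere` is hypothetical.

```typescript
type QuotaStatus = 'ok' | 'warn' | 'paused' | 'exhausted';

// Subset of a quota entry from the quota status response.
interface QuotaEntry {
  metric: string;
  status: QuotaStatus;
}

// Assumed severity ordering, from least to most restrictive.
const SEVERITY: Record<QuotaStatus, number> = { ok: 0, warn: 1, paused: 2, exhausted: 3 };

// Returns the entry with the most severe status, or undefined for an empty list.
// Ties keep the first entry encountered.
function mostSevere(quotas: QuotaEntry[]): QuotaEntry | undefined {
  let worst: QuotaEntry | undefined;
  for (const q of quotas) {
    if (worst === undefined || SEVERITY[q.status] > SEVERITY[worst.status]) worst = q;
  }
  return worst;
}
```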

7. Implementation Requirements

7.1 MUST Requirements

Implementations supporting rate limiting MUST:

Respect api_limits and not exceed target API constraints
Return proper error codes when limits are reached
Include retry_after_seconds when blocking requests
Support introspection of current quota status

7.2 SHOULD Requirements

Implementations supporting rate limiting SHOULD:

Track API response headers for remaining/reset info
Implement the three-tier quota system (warn/pause/hard_stop)
Persist quota counters across sessions
Provide warnings in successful responses when approaching limits

7.3 MAY Requirements

Implementations supporting rate limiting MAY:

Support cost estimation and tracking
Implement automatic request queuing and retry
Support per-operation rate limits
Integrate with external rate limit services

7.4 Cost Tracking

For adapters with cost estimation:

cost:
  model: per_call    # per_call | per_token | per_byte | tiered
  currency: USD
  pricing:
    - endpoint: "*"
      cost_per_call: 0.001
    - endpoint: "POST /completions"
      cost_per_call: 0.002
      cost_per_token:
        input: 0.00001
        output: 0.00003
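
The sketch below estimates the cost of one call from a pricing list like the one above. The specification does not say which rule wins when several patterns match or how per-call and per-token prices combine, so two assumptions are made here: the longest matching pattern wins, and the per-call and per-token components are added together. Only a trailing `*` wildcard is handled, and the names `estimateCost`, `matchesEndpoint`, and `TokenUsage` are hypothetical.

```typescript
interface PricingRule {
  endpoint: string;
  cost_per_call?: number;
  cost_per_token?: { input: number; output: number };
}

// Hypothetical per-call usage figures supplied by the adapter.
interface TokenUsage {
  inputTokens?: number;
  outputTokens?: number;
}

// Simplification: only a trailing "*" is treated as a wildcard.
function matchesEndpoint(pattern: string, endpoint: string): boolean {
  if (pattern.charAt(pattern.length - 1) === '*') {
    return endpoint.indexOf(pattern.slice(0, -1)) === 0;
  }
  return pattern === endpoint;
}

function estimateCost(pricing: PricingRule[], endpoint: string, usage: TokenUsage = {}): number {
  // Assumption: the longest (most specific) matching pattern wins.
  let best: PricingRule | undefined;
  for (const rule of pricing) {
    if (!matchesEndpoint(rule.endpoint, endpoint)) continue;
    if (best === undefined || rule.endpoint.length > best.endpoint.length) best = rule;
  }
  if (best === undefined) return 0;

  // Assumption: per-call and per-token components are summed.
  const perCall = best.cost_per_call ?? 0;
  const perToken = best.cost_per_token
    ? (usage.inputTokens ?? 0) * best.cost_per_token.input +
      (usage.outputTokens ?? 0) * best.cost_per_token.output
    : 0;
  return perCall + perToken;
}
```

Under these assumptions, a `POST /completions` call with 100 input tokens and 50 output tokens costs 0.002 + 100 × 0.00001 + 50 × 0.00003 = 0.0045 USD, while any other endpoint falls back to the `*` rule at 0.001 USD.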

8. Future Extensions

8.1 Cross-Adapter Aggregation

For APIs with shared rate limits across endpoints:

rate_limits:
  shared_pools:
    - name: "github_core"
      endpoints:
        - "GET /repos/*"
        - "GET /users/*"
        - "GET /orgs/*"
      limit: 5000
      window: hour

8.2 Intelligent Request Scheduling

Automatic request queuing and batching:

rate_limits:
  scheduling:
    enabled: true
    max_queue_size: 100
    batch_similar_requests: true
    priority_by_danger_level: true

8.3 Token-Based Rate Limiting

For LLM APIs with token-based limits:

rate_limits:
  api_limits:
    - scope: global
      metric: tokens
      limit: 90000
      window: minute
      count_method: tiktoken  # Token counting method

8.4 Budget Alerts

External notification integration:

quotas:
  alerts:
    - trigger: warn
      action: webhook
      url: "https://alerts.example.com/budget"
    - trigger: pause
      action: email
      to: "admin@example.com"

8.5 Distributed Rate Limiting

For multi-instance deployments:

rate_limits:
  coordination:
    backend: redis
    connection: "${REDIS_URL}"
    key_prefix: "mcpaql:ratelimit:"