Rate Limiting and Quota Management Specification
Version: 1.0.0-draft
Status: Draft
Last Updated: 2026-01-28
Abstract
This document defines the rate limiting and quota management system for MCP-AQL adapters. Rate limits enable systems to respect API constraints, prevent runaway costs, and provide graceful degradation under load.
1. Overview
1.1 Purpose
When adapters wrap external APIs:
- APIs have rate limits that must be respected to avoid blocking
- Users want budgets to control costs and usage
- Systems need protection against runaway API calls (especially in autonomous agent loops)
- Graceful degradation is preferable to hard failures
Rate limiting in MCP-AQL provides:
- API limit awareness - Extract and respect target API constraints
- User-configurable quotas - Budget controls at multiple thresholds
- Progressive enforcement - Warn, pause, and hard stop behaviors
- Cost tracking - Monitor usage for paid APIs
1.2 Design Principles
- Respect upstream limits - Never exceed API provider constraints
- User control - Configurable quotas override defaults
- Progressive response - Warn before blocking
- Transparent status - Quota state always queryable
1.3 Scope
Included:
- Rate limit schema for adapters
- Quota configuration
- Enforcement behaviors
- Error codes for rate limiting
- Introspection of quota status
Deferred:
- Cross-adapter quota aggregation
- Token-based rate limiting for LLM APIs
- Distributed rate limiting coordination
2. Rate Limits Schema
2.1 Schema Structure
The rate_limits block in adapter front matter:
rate_limits:
  # API-defined limits (from target API documentation or headers)
  api_limits:
    - scope: global
      limit: 5000
      window: hour
    - scope: endpoint
      endpoint: "POST /search"
      limit: 30
      window: minute
  # User-configurable quotas
  quotas:
    enabled: true
    limits:
      - metric: requests_per_hour
        warn: 4000
        pause: 4800
        hard_stop: 5000
      - metric: cost_per_day
        warn: 5.00
        pause: 10.00
        hard_stop: 50.00
        currency: USD
  # Cost estimation for paid APIs
  cost:
    model: per_call
    currency: USD
    pricing:
      - endpoint: "*"
        cost_per_call: 0.001
      - endpoint: "POST /premium/*"
        cost_per_call: 0.01
2.2 api_limits Block
Declares rate limits imposed by the target API.
interface ApiLimit {
/**
* Scope of the limit
* - global: Applies to all operations
* - endpoint: Applies to specific endpoint
* - category: Applies to CRUDE category
*/
scope: 'global' | 'endpoint' | 'category';
/**
* Endpoint pattern (when scope is 'endpoint')
* Supports wildcards: "GET /users/*"
*/
endpoint?: string;
/**
* CRUDE category (when scope is 'category')
*/
category?: 'create' | 'read' | 'update' | 'delete' | 'execute';
/**
* Maximum number of requests in the window
*/
limit: number;
/**
* Time window for the limit
*/
window: 'second' | 'minute' | 'hour' | 'day';
/**
* Header name containing remaining requests (for runtime tracking)
*/
remaining_header?: string;
/**
* Header name containing reset timestamp
*/
reset_header?: string;
}
2.3 Common API Limit Patterns
# GitHub API pattern
api_limits:
  - scope: global
    limit: 5000
    window: hour
    remaining_header: "X-RateLimit-Remaining"
    reset_header: "X-RateLimit-Reset"
# OpenAI pattern (tokens per minute)
api_limits:
  - scope: global
    limit: 90000
    window: minute
    metric: tokens_per_minute  # not a field of ApiLimit in 2.2; token-based limits are deferred (see 8.3)
# Stripe pattern (different limits by CRUDE category)
api_limits:
  - scope: category
    category: read
    limit: 100
    window: second
  - scope: category
    category: create
    limit: 25
    window: second
3. Quota Management
3.1 Quotas Block
User-configurable limits that can be stricter than API limits.
interface QuotaConfig {
/**
* Whether quota enforcement is enabled
*/
enabled: boolean;
/**
* Individual quota limits
*/
limits: QuotaLimit[];
/**
* How to persist quota tracking
*/
persistence?: 'memory' | 'file' | 'database';
/**
* When to reset counters
*/
reset_schedule?: string; // Cron expression
}
interface QuotaLimit {
/**
* What is being measured
*/
metric: QuotaMetric;
/**
* Threshold for warning notification
*/
warn: number;
/**
* Threshold for pause (require confirmation to continue)
*/
pause: number;
/**
* Threshold for hard stop (block all requests)
*/
hard_stop?: number;
/**
* Currency for cost metrics
*/
currency?: string;
}
type QuotaMetric =
| 'requests_per_minute'
| 'requests_per_hour'
| 'requests_per_day'
| 'tokens_per_minute'
| 'tokens_per_hour'
| 'tokens_per_day'
| 'cost_per_hour'
| 'cost_per_day'
| 'cost_per_month';
3.2 Threshold Behaviors
| Threshold | Behavior | User Experience |
|---|---|---|
| warn | Log warning, continue | Notification displayed |
| pause | Require confirmation | "Continue?" prompt |
| hard_stop | Block all requests | Error response |
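The table above can be read as a simple ordered comparison. The following is a minimal sketch, not part of the specification: the names ThresholdBehavior, QuotaThresholds, and classifyUsage are hypothetical, and it assumes a threshold counts as reached when usage is greater than or equal to its value.

```typescript
// Hypothetical helper for illustration; not defined by this specification.
type ThresholdBehavior = "continue" | "warn" | "pause" | "hard_stop";

interface QuotaThresholds {
  warn: number;
  pause: number;
  hard_stop?: number; // optional, mirroring QuotaLimit in 3.1
}

// Returns the strictest behavior whose threshold has been reached.
// Assumption: "reached" means current >= threshold.
function classifyUsage(current: number, t: QuotaThresholds): ThresholdBehavior {
  if (t.hard_stop !== undefined && current >= t.hard_stop) return "hard_stop";
  if (current >= t.pause) return "pause";
  if (current >= t.warn) return "warn";
  return "continue";
}

const hourly: QuotaThresholds = { warn: 4000, pause: 4800, hard_stop: 5000 };
console.log(classifyUsage(4100, hourly)); // "warn"
console.log(classifyUsage(4850, hourly)); // "pause"
console.log(classifyUsage(5000, hourly)); // "hard_stop"
```

Checking the strictest threshold first means a configuration without hard_stop can never report anything stricter than pause.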
3.3 Example Quota Configurations
Conservative (cost-conscious):
quotas:
  enabled: true
  limits:
    - metric: cost_per_day
      warn: 1.00
      pause: 5.00
      hard_stop: 10.00
      currency: USD
Development (permissive):
quotas:
  enabled: true
  limits:
    - metric: requests_per_hour
      warn: 1000
      pause: 5000
      # No hard_stop - never fully block
Production (strict):
quotas:
  enabled: true
  limits:
    - metric: requests_per_minute
      warn: 50
      pause: 80
      hard_stop: 100
    - metric: cost_per_day
      warn: 100.00
      pause: 200.00
      hard_stop: 500.00
      currency: USD
4. Enforcement Behavior
4.1 Enforcement Flow
Request → Check API Limits → Check Quotas → Execute or Block
                 │                │
                 ▼                ▼
           API blocked?    Quota exceeded?
                 │                │
                 ▼                ▼
            Wait/Retry     Warn/Pause/Stop
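One way to read the flow above is as an ordered series of checks. The sketch below is illustrative only: the types ApiWindow, QuotaCounter, and Decision and the function decide are hypothetical, the confirmed flag stands in for a validated confirmation token, and the choice to evaluate hard stops before pauses is an assumption rather than something the flow prescribes.

```typescript
// Hypothetical shapes for illustration; only the error codes come from this specification.
interface ApiWindow { limit: number; remaining: number }
interface QuotaCounter {
  metric: string;
  current: number;
  warn: number;
  pause: number;
  hard_stop?: number;
}

type BlockCode =
  | "RATE_LIMIT_EXCEEDED"
  | "RATE_LIMIT_QUOTA_PAUSE"
  | "RATE_LIMIT_QUOTA_EXHAUSTED";

type Decision =
  | { action: "execute"; warnings: string[] }
  | { action: "block"; code: BlockCode; metric?: string };

function decide(api: ApiWindow, quotas: QuotaCounter[], confirmed = false): Decision {
  // Step 1: the upstream API limit is checked first.
  if (api.remaining <= 0) {
    return { action: "block", code: "RATE_LIMIT_EXCEEDED" };
  }
  // Step 2a: a hard stop blocks even when the caller has confirmed a pause.
  for (const q of quotas) {
    if (q.hard_stop !== undefined && q.current >= q.hard_stop) {
      return { action: "block", code: "RATE_LIMIT_QUOTA_EXHAUSTED", metric: q.metric };
    }
  }
  // Step 2b: a pause blocks unless the caller has confirmed.
  if (!confirmed) {
    for (const q of quotas) {
      if (q.current >= q.pause) {
        return { action: "block", code: "RATE_LIMIT_QUOTA_PAUSE", metric: q.metric };
      }
    }
  }
  // Step 3: execute, reporting every metric at or above its warn threshold.
  const warnings = quotas.filter((q) => q.current >= q.warn).map((q) => q.metric);
  return { action: "execute", warnings };
}

const quota: QuotaCounter = {
  metric: "requests_per_hour", current: 4850, warn: 4000, pause: 4800, hard_stop: 5000,
};
console.log(decide({ limit: 5000, remaining: 150 }, [quota]).action);       // "block"
console.log(decide({ limit: 5000, remaining: 150 }, [quota], true).action); // "execute"
```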
4.2 API Limit Enforcement
When an API rate limit is reached:
- Check remaining - Read from response headers if available
- Predict limit - Track request counts if no headers
- Pre-flight block - Block request before sending if limit would be exceeded
- Retry-After - Respect the Retry-After header from API responses
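The steps above could be sketched roughly as follows. This is a hedged illustration, not a reference implementation: TrackedLimit, updateFromHeaders, preflight, and parseRetryAfter are hypothetical names, header keys are assumed to be lower-cased by the HTTP client, and the reset header is assumed to carry epoch seconds (as GitHub's X-RateLimit-Reset does; other APIs may use a different format).

```typescript
// Hypothetical tracking state for one API limit.
interface TrackedLimit { limit: number; remaining: number; resetsAtMs: number }

// Refreshes tracked state from response headers, keeping old values when a header is absent.
function updateFromHeaders(
  state: TrackedLimit,
  headers: Record<string, string | undefined>, // assumption: keys are lower-cased
  remainingHeader: string,
  resetHeader: string,
): TrackedLimit {
  const remaining = Number(headers[remainingHeader.toLowerCase()]);
  const resetSec = Number(headers[resetHeader.toLowerCase()]); // assumption: epoch seconds
  return {
    limit: state.limit,
    remaining: Number.isFinite(remaining) ? remaining : state.remaining,
    resetsAtMs: Number.isFinite(resetSec) ? resetSec * 1000 : state.resetsAtMs,
  };
}

// Pre-flight check: returns error details if the request should be blocked, else null.
function preflight(state: TrackedLimit, nowMs: number) {
  if (nowMs >= state.resetsAtMs || state.remaining > 0) return null;
  return {
    limit: state.limit,
    remaining: 0,
    resets_at: new Date(state.resetsAtMs).toISOString(),
    retry_after_seconds: Math.ceil((state.resetsAtMs - nowMs) / 1000),
  };
}

// Retry-After may be either a delay in seconds or an HTTP-date.
function parseRetryAfter(value: string, nowMs: number): number | undefined {
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

const now = Date.parse("2026-01-28T12:29:13Z");
const resetEpochSeconds = Date.parse("2026-01-28T13:00:00Z") / 1000;
const tracked = updateFromHeaders(
  { limit: 5000, remaining: 5000, resetsAtMs: 0 },
  { "x-ratelimit-remaining": "0", "x-ratelimit-reset": String(resetEpochSeconds) },
  "X-RateLimit-Remaining",
  "X-RateLimit-Reset",
);
console.log(preflight(tracked, now)?.retry_after_seconds); // 1847
console.log(parseRetryAfter("120", now));                  // 120
```

When the API sends no such headers, the same TrackedLimit state could instead be maintained by decrementing remaining on every request and resetting it at each window boundary.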
Pre-flight block response:
{
"success": false,
"error": {
"code": "RATE_LIMIT_EXCEEDED",
"message": "API rate limit would be exceeded",
"details": {
"limit": 5000,
"remaining": 0,
"window": "hour",
"resets_at": "2026-01-28T13:00:00Z",
"retry_after_seconds": 1847
}
}
}
4.3 Quota Enforcement
Warnings are included in successful responses via the warnings array. See Warnings Specification for the complete schema and client handling requirements.
Warning (warn threshold):
{
"success": true,
"data": { ... },
"warnings": [
{
"code": "RATE_LIMIT_QUOTA_WARNING",
"message": "Approaching quota limit",
"details": {
"metric": "requests_per_hour",
"current": 4100,
"warn_threshold": 4000,
"pause_threshold": 4800
}
}
]
}
Pause (pause threshold):
{
"success": false,
"error": {
"code": "RATE_LIMIT_QUOTA_PAUSE",
"message": "Quota pause threshold reached",
"details": {
"metric": "requests_per_hour",
"current": 4850,
"pause_threshold": 4800,
"hard_stop_threshold": 5000,
"confirmation_token": "quota_continue_abc123",
"expires_at": "2026-01-28T12:05:00Z"
}
}
}
Hard stop (hard_stop threshold):
{
"success": false,
"error": {
"code": "RATE_LIMIT_QUOTA_EXHAUSTED",
"message": "Quota exhausted",
"details": {
"metric": "requests_per_hour",
"current": 5000,
"hard_stop_threshold": 5000,
"resets_at": "2026-01-28T13:00:00Z"
}
}
}
4.4 Continuing After Pause
To continue after the pause threshold has been reached, resend the request with the confirmation token from the RATE_LIMIT_QUOTA_PAUSE response in the _quota_continue parameter. See Confirmation Token Specification for token validation and lifecycle details.
{
operation: "get_user",
params: {
user_id: "alice",
_quota_continue: "quota_continue_abc123"
}
}
5. Error Codes
5.1 Rate Limit Error Codes
| Code | Description | Recovery |
|---|---|---|
| RATE_LIMIT_EXCEEDED | Target API rate limit reached | Wait for reset |
| RATE_LIMIT_QUOTA_PAUSE | User quota pause threshold | Confirm to continue |
| RATE_LIMIT_QUOTA_EXHAUSTED | User quota hard stop | Wait for reset |
| RATE_LIMIT_QUOTA_WARNING | Approaching limit (in warnings) | Consider slowing |
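A client might map the Recovery column above onto concrete actions along these lines. This is a sketch under stated assumptions: RateLimitErrorResponse, Recovery, and planRecovery are hypothetical names, and the caller is expected to obtain user confirmation before resending a paused request.

```typescript
// Hypothetical client-side types; field names follow the response examples in Section 4.
interface RateLimitErrorResponse {
  success: false;
  error: {
    code: string;
    details?: {
      retry_after_seconds?: number;
      resets_at?: string;
      confirmation_token?: string;
    };
  };
}

type Recovery =
  | { kind: "wait"; seconds?: number; until?: string }
  | { kind: "confirm"; params: Record<string, unknown> }
  | { kind: "fail" };

// Chooses a recovery action for a blocked request.
function planRecovery(
  res: RateLimitErrorResponse,
  originalParams: Record<string, unknown>,
): Recovery {
  const details = res.error.details;
  switch (res.error.code) {
    case "RATE_LIMIT_EXCEEDED":
    case "RATE_LIMIT_QUOTA_EXHAUSTED":
      // Both codes recover by waiting for the window to reset.
      return { kind: "wait", seconds: details?.retry_after_seconds, until: details?.resets_at };
    case "RATE_LIMIT_QUOTA_PAUSE": {
      const token = details?.confirmation_token;
      if (!token) return { kind: "fail" };
      // After the user confirms, resend the same params plus the token (see 4.4).
      return { kind: "confirm", params: { ...originalParams, _quota_continue: token } };
    }
    default:
      return { kind: "fail" };
  }
}

const pausedResponse: RateLimitErrorResponse = {
  success: false,
  error: {
    code: "RATE_LIMIT_QUOTA_PAUSE",
    details: { confirmation_token: "quota_continue_abc123" },
  },
};
console.log(planRecovery(pausedResponse, { user_id: "alice" }).kind); // "confirm"
```

RATE_LIMIT_QUOTA_WARNING is not handled here because it arrives in the warnings array of a successful response rather than as an error.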
5.2 Error Response Schema
interface RateLimitError {
code: string;
message: string;
details: {
/** What metric triggered the limit */
metric?: string;
/** Current usage count/amount */
current?: number;
/** The limit that was exceeded */
limit?: number;
/** Time window of the limit */
window?: string;
/** When the limit resets */
resets_at?: string;
/** Seconds until retry is allowed */
retry_after_seconds?: number;
/** Token to continue after pause */
confirmation_token?: string;
/** When confirmation expires */
expires_at?: string;
};
}
6. Introspection
6.1 Quota Status Query
{
operation: "introspect",
params: {
query: "quota_status"
}
}
6.2 Quota Status Response
{
"success": true,
"data": {
"adapter": "github-api",
"api_limits": {
"global": {
"limit": 5000,
"remaining": 3500,
"window": "hour",
"resets_at": "2026-01-28T13:00:00Z"
}
},
"quotas": [
{
"metric": "requests_per_hour",
"current": 1500,
"warn": 4000,
"pause": 4800,
"hard_stop": 5000,
"status": "ok"
},
{
"metric": "cost_per_day",
"current": 5.50,
"warn": 5.00,
"pause": 10.00,
"hard_stop": 50.00,
"status": "warn",
"currency": "USD"
}
],
"next_reset": "2026-01-28T13:00:00Z"
}
}
6.3 Status Values
| Status | Meaning |
|---|---|
| ok | Below warn threshold |
| warn | Above warn, below pause |
| paused | At pause threshold, confirmation required |
| exhausted | At hard stop, requests blocked |
7. Implementation Requirements
7.1 MUST Requirements
Implementations supporting rate limiting MUST:
- Respect api_limits and not exceed target API constraints
- Return proper error codes when limits are reached
- Include retry_after_seconds when blocking requests
- Support introspection of current quota status
7.2 SHOULD Requirements
Implementations supporting rate limiting SHOULD:
- Track API response headers for remaining/reset info
- Implement the three-tier quota system (warn/pause/hard_stop)
- Persist quota counters across sessions
- Provide warnings in successful responses when approaching limits
7.3 MAY Requirements
Implementations supporting rate limiting MAY:
- Support cost estimation and tracking
- Implement automatic request queuing and retry
- Support per-operation rate limits
- Integrate with external rate limit services
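Cost estimation, the first optional capability listed above, can be illustrated against the per_call pricing entries from Section 2.1. The sketch below is an assumption-laden example: PricingRule, patternToRegExp, and costPerCall are hypothetical names, the wildcard is treated as matching any run of characters, and the longest matching pattern is taken to win because this specification does not define precedence between overlapping patterns.

```typescript
// Hypothetical type mirroring one entry of the pricing list.
interface PricingRule { endpoint: string; cost_per_call: number }

// Converts a pattern such as "POST /premium/*" into a RegExp.
// Assumption: "*" matches any run of characters; everything else is literal.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

// Returns the per-call cost for a request written as "METHOD /path".
// Assumption: the longest matching pattern is the most specific and wins.
function costPerCall(rules: PricingRule[], request: string): number {
  let best: PricingRule | undefined;
  for (const rule of rules) {
    const matches = patternToRegExp(rule.endpoint).test(request);
    if (matches && (!best || rule.endpoint.length > best.endpoint.length)) {
      best = rule;
    }
  }
  return best ? best.cost_per_call : 0;
}

const pricing: PricingRule[] = [
  { endpoint: "*", cost_per_call: 0.001 },
  { endpoint: "POST /premium/*", cost_per_call: 0.01 },
];
console.log(costPerCall(pricing, "GET /users/alice"));     // 0.001
console.log(costPerCall(pricing, "POST /premium/search")); // 0.01
```

Summing these per-call values over a day would give the running total that a cost_per_day quota compares against its thresholds.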
7.4 Cost Tracking
For adapters with cost estimation:
cost:
  model: per_call  # per_call | per_token | per_byte | tiered
  currency: USD
  pricing:
    - endpoint: "*"
      cost_per_call: 0.001
    - endpoint: "POST /completions"
      cost_per_call: 0.002
      cost_per_token:
        input: 0.00001
        output: 0.00003
8. Future Extensions
8.1 Cross-Adapter Aggregation
For APIs with shared rate limits across endpoints:
rate_limits:
  shared_pools:
    - name: "github_core"
      endpoints:
        - "GET /repos/*"
        - "GET /users/*"
        - "GET /orgs/*"
      limit: 5000
      window: hour
8.2 Intelligent Request Scheduling
Automatic request queuing and batching:
rate_limits:
  scheduling:
    enabled: true
    max_queue_size: 100
    batch_similar_requests: true
    priority_by_danger_level: true
8.3 Token-Based Rate Limiting
For LLM APIs with token-based limits:
rate_limits:
  api_limits:
    - scope: global
      metric: tokens
      limit: 90000
      window: minute
      count_method: tiktoken  # Token counting method
8.4 Budget Alerts
External notification integration:
quotas:
  alerts:
    - trigger: warn
      action: webhook
      url: "https://alerts.example.com/budget"
    - trigger: pause
      action: email
      to: "admin@example.com"
8.5 Distributed Rate Limiting
For multi-instance deployments:
rate_limits:
  coordination:
    backend: redis
    connection: "${REDIS_URL}"
    key_prefix: "mcpaql:ratelimit:"