MCP-AQL Conformance Testing Specification
Document Status: This document is an informative conformance framework specification aligned to the normative protocol requirements in docs/versions/v1.0.0-draft.md. Implementation Status: This repository now includes a
Source: spec/docs/conformance-testing.md
On this page
Jump to a section
Use the outline to move through longer pages without losing your place.
Version: 1.0.0-draft Status: Draft Last Updated: 2026-04-15
Document Status: This document is an informative conformance framework specification aligned to the normative protocol requirements in
docs/versions/v1.0.0-draft.md.Implementation Status: This repository now includes a fixture-driven prototype runner in
scripts/run-conformance-tests.mjsplus reference evidence bundles undertests/conformance/. A future packagedmcpaql-conformancetool may add live adapter execution on top of this baseline.
Abstract
This document specifies the conformance testing requirements for MCP-AQL implementations. It defines conformance levels, test categories, pass/fail criteria, and evaluation methodologies including LLM-based semantic evaluation.
1. Introduction
1.1 Purpose
MCP-AQL conformance testing ensures implementations meet the protocol specification and provide consistent, discoverable APIs for LLM interaction. This specification defines:
- What implementations MUST test
- How to evaluate test results
- How to report conformance levels
The repository implementation currently validates evidence bundles instead of probing live adapters over the network. Each bundle captures introspection responses, accepted parameter sets, representative success and failure results, and optional semantic-discoverability examples. This keeps the spec repo reviewable and deterministic while a future external runner grows into direct adapter execution.
1.2 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1.3 Test Result Classifications
| Result | Meaning |
|---|---|
| PASS | Test criteria fully satisfied |
| FAIL | Test criteria not met, conformance blocked |
| WARN | Test criteria partially met, conformance not blocked |
| SKIP | Test not applicable to this implementation |
2. Conformance Levels
2.1 Level 1: Basic Conformance
Level 1 conformance establishes the minimum viable MCP-AQL implementation.
Requirements:
| Requirement | Description |
|---|---|
| Introspect (operations) | introspect operation with query: "operations" implemented |
| Introspect (types) | introspect operation with query: "types" implemented |
| Endpoint routing | Operations routed to the correct documented semantic endpoint family in semantic endpoint mode |
| Response format | Discriminated union responses ({ success, data } or { success, error }) |
| Error handling | Structured error responses with code and message |
| Parameter naming | snake_case parameter naming convention |
Test Categories Required:
- Introspection Fidelity (MUST PASS)
- Parameter Handling (MUST PASS)
- Error Quality (MUST PASS)
- Round-Trip Integrity (MUST PASS)
2.2 Level 2: Full Conformance
Level 2 conformance indicates a complete MCP-AQL implementation with all optional features.
Requirements:
| Requirement | Description |
|---|---|
| Level 1 | All Level 1 requirements |
| Endpoint modes | crude semantic endpoint mode and single mode supported |
| Field selection | fields parameter on READ operations, with preset-name support documented where implemented |
| Batch operations | Multi-operation batching with individual results |
| Cross-cutting params | Consistent documentation for collection-query controls such as query, filter, pagination, field selection, and sorting |
Test Categories Required:
- All Level 1 test categories
- Level 2 Features (SHOULD PASS)
- Constraint Documentation (SHOULD PASS)
- Semantic Evaluation (SHOULD PASS)
2.3 Conformance Certification
Implementations MAY claim conformance levels as follows:
MCP-AQL Level 1 Conformant
MCP-AQL Level 2 Conformant
Claims MUST include the specification version tested against.
3. Test Categories
3.1 Introspection Fidelity Tests (MUST PASS)
Verifies that introspection accurately describes the implementation.
TEST: Introspection Parameter Accuracy
FOR EACH operation in introspection response:
1. Extract parameter names and types
2. Construct a valid request using ONLY introspection guidance
3. Verify the request succeeds OR fails with a documented error
PASS: All documented parameters work as described
FAIL: Following introspection exactly produces unexpected errors
TEST: Introspection Completeness
FOR EACH operation in implementation:
1. Get documented parameters from introspection
2. Attempt operation with each documented parameter
3. Attempt operation with known cross-cutting parameters
4. Compare accepted vs documented parameters
PASS: All accepted parameters appear in introspection
FAIL: Operation accepts parameters not in introspection
WARN: Introspection documents parameters not accepted
3.2 Parameter Handling Tests (MUST PASS)
Verifies consistent parameter behavior.
TEST: Required Parameter Enforcement
FOR EACH operation with required parameters:
1. Omit each required parameter in turn
2. Verify operation fails with clear error
PASS: Missing required params produce "missing parameter" errors
FAIL: Operation succeeds without required parameter
FAIL: Error message does not identify the missing parameter
TEST: Unknown Parameter Handling
FOR EACH operation:
1. Send valid request with one additional unknown parameter
2. Verify server either:
a) Accepts request (param ignored with optional warning), OR
b) Rejects with clear "unknown parameter" error
PASS: Behavior is explicit (warning OR error)
FAIL: Unknown parameters silently ignored with no indication
TEST: Optional Parameter Defaults
FOR EACH optional parameter with documented default:
1. Send request without the parameter
2. Verify response reflects documented default behavior
PASS: Defaults applied as documented
FAIL: Behavior differs from documented default
3.3 Error Quality Tests (MUST PASS)
Verifies error messages are user-appropriate.
TEST: No Implementation Leakage
FOR EACH error condition:
1. Trigger the error
2. Verify error message does NOT contain:
- Programming language artifacts (TypeError, #<Object>, .js:, .ts:)
- Stack traces (at Function, at Module)
- Internal paths (/src/, /node_modules/)
PASS: Error messages are implementation-agnostic
FAIL: Raw implementation errors exposed
TEST: Actionable Error Messages
FOR EACH validation error:
1. Trigger the error
2. Verify error message includes:
- What went wrong
- Which parameter/field is affected
- Expected type or format (for type errors)
PASS: Error messages enable self-correction
WARN: Error messages lack actionable guidance
Recommended Error Format:
Missing required parameter '{paramName}'. Expected: {type} ({description})
3.4 Round-Trip Integrity Tests (MUST PASS)
Verifies data consistency through create-read cycles.
TEST: Create-Read Consistency
FOR EACH element type:
1. Create element with all documented optional fields
2. Read element back via get operation
3. Compare all fields
PASS: All submitted fields present and unchanged
FAIL: Data silently dropped during create
FAIL: Field values modified unexpectedly
TEST: Update Preservation
FOR EACH element type:
1. Create element with initial values
2. Update subset of fields
3. Read element back
4. Verify non-updated fields unchanged
PASS: Unmodified fields preserved
FAIL: Update operation affects non-targeted fields
3.5 Constraint Documentation Tests (SHOULD PASS)
Verifies element-specific constraints are discoverable.
TEST: Read-Only Field Protection
FOR EACH element type with read-only fields:
1. Attempt to update a read-only field
2. Verify operation fails with constraint error
3. Verify introspection documents the constraint
PASS: Constraint enforced AND documented
WARN: Constraint enforced but not in introspection
FAIL: Read-only field can be modified
TEST: Append-Only Semantics
FOR EACH append-only data type (e.g., Memory entries):
1. Verify content cannot be modified via update
2. Verify error message explains append-only behavior
3. Verify introspection documents the constraint
PASS: Append-only enforced AND documented
WARN: Enforced but not documented
4. Test Requirements
4.1 MUST PASS Requirements
The following test categories MUST pass for Level 1 conformance:
| Category | Tests | Rationale |
|---|---|---|
| Introspection Fidelity | 4 | LLMs must trust introspection |
| Parameter Handling | 4 | Consistent behavior across operations |
| Error Quality | 3 | Usable error messages |
| Round-Trip Integrity | 2 | Data consistency guarantee |
Total MUST PASS tests: 13
4.2 SHOULD PASS Requirements
The following test categories SHOULD pass for Level 2 conformance:
| Category | Tests | Rationale |
|---|---|---|
| Constraint Documentation | 2 | Discoverable constraints |
| Level 2 Features | 3 | Endpoint modes, field selection, and batch operations |
| Semantic Evaluation | Per implementation | LLM discoverability |
Failure in SHOULD PASS tests: Produces WARN, does not block conformance
5. Evaluation Methodology
5.1 Two-Tier Evaluation Approach
Conformance tests SHOULD use a two-tier evaluation approach:
Tier 1: Structural Validation (Fast, Deterministic)
- Pattern matching for expected response elements
- JSON schema validation for response structure
- Presence checks for required fields
- Regex validation for error message patterns
Characteristics:
- Fast execution
- Deterministic results
- Catches obvious failures
Tier 2: Semantic Validation (Comprehensive, AI-Assisted)
- LLM evaluation of response correctness
- Semantic similarity scoring
- Intent classification for operation selection
- Natural language understanding of guidance
Characteristics:
- Comprehensive coverage
- Handles natural language variation
- Catches subtle usability issues
5.2 Semantic Evaluation Requirements
Tests for LLM discoverability SHOULD use semantic evaluation:
TEST: API Discoverability
1. Prompt LLM with discovery task
2. Capture LLM response and tool calls
3. Tier 1: Check for expected patterns (fast fail)
4. Tier 2: If Tier 1 inconclusive, evaluate semantic correctness
5. Report both structural and semantic results
PASS: Tier 1 passes AND Tier 2 confirms understanding
WARN: Tier 1 fails but Tier 2 passes (potential pattern update needed)
FAIL: Both tiers fail OR Tier 1 passes but Tier 2 fails
5.3 Test Categories Requiring Semantic Evaluation
| Category | Example Prompt | Why Semantic Matters |
|---|---|---|
| Introspection Discovery | "List available operations" | LLM may describe operations differently |
| Field Selection Discovery | "Does search support field selection?" | Affirmative answer may use varied phrasing |
| Error Understanding | "Handle this error gracefully" | Recovery strategies may vary |
| Operation Selection | "Create an element" | Correct tool choice matters more than exact call format |
5.4 Evaluation Workflow
graph TD
A[Run Test] --> B{Tier 1 Pass?}
B -->|Yes| C{Tier 2 Pass?}
B -->|No| D{Tier 2 Pass?}
C -->|Yes| E[PASS]
C -->|No| F[FAIL]
D -->|Yes| G[WARN: Update patterns]
D -->|No| F
6. Reporting
6.1 Conformance Report Format
Implementations SHOULD generate conformance reports:
{
"implementation": "example-adapter",
"version": "1.0.0",
"specVersion": "1.0.0-draft",
"requestedLevel": 1,
"conformanceLevel": 1,
"summary": {
"total": 11,
"passed": 9,
"warned": 2,
"failed": 0,
"skipped": 0
},
"categories": [
{
"name": "Introspection Fidelity",
"required": true,
"result": "PASS",
"tests": [
{ "name": "Parameter Accuracy", "result": "PASS" },
{ "name": "Completeness", "result": "PASS" }
]
}
]
}6.2 Badge Format
Conformant implementations MAY display badges:

6.3 Certification Registry
A future certification registry MAY track conformant implementations.
7. Command-Line Interface
7.1 CLI Invocation
Conformance test runners SHOULD provide a command-line interface:
mcpaql-conformance <command> [options]The current repository prototype is invoked as:
node scripts/run-conformance-tests.mjs <command> [options]7.2 Commands
| Command | Description | Example |
|---|---|---|
test |
Run conformance tests against a fixture evidence bundle | node scripts/run-conformance-tests.mjs test tests/conformance/evidence/reference-level2.json --level 2 |
verify-fixtures |
Verify all repository reference fixtures against their expected exit codes | node scripts/run-conformance-tests.mjs verify-fixtures |
report |
Generate formatted output from a JSON results file | node scripts/run-conformance-tests.mjs report ./results.json --format markdown |
version |
Print tool version | node scripts/run-conformance-tests.mjs version |
Note: The report command takes a JSON results file (produced by test --format json --output results.json) as input and generates human-readable or markdown output.
7.3 Test Command Options
| Option | Description | Default |
|---|---|---|
--level, -l |
Conformance level to test (1 or 2) | 1 |
--output, -o |
Output file for results | stdout |
--format, -f |
Output format (json, text, markdown) |
text |
--tier |
Evaluation tier (1, 2, both) |
both |
--category, -c |
Run specific test category only | All |
--tier 2 and --tier both currently produce the same semantic-evaluation behavior. The runner reserves the distinction for future live-adapter or split-tier expansion.
7.4 Exit Codes
| Code | Meaning | Description |
|---|---|---|
0 |
All tests passed | Conformance achieved at requested level |
1 |
Tests failed | One or more MUST PASS tests failed |
2 |
Tests warned | All MUST PASS passed, but SHOULD PASS tests warned |
3 |
Configuration error | Invalid fixture/report path, malformed JSON, unknown command, or invalid report input |
7.5 Example Usage
Run Level 1 conformance tests:
node scripts/run-conformance-tests.mjs test \
tests/conformance/evidence/reference-level1.json \
--level 1Run Level 2 tests with JSON output:
node scripts/run-conformance-tests.mjs test \
tests/conformance/evidence/reference-level2.json \
--level 2 \
--format json \
--output results.jsonRun specific test category:
node scripts/run-conformance-tests.mjs test \
tests/conformance/evidence/reference-level2.json \
--level 2 \
--category "Introspection Fidelity"Tier 2 semantic evaluation:
node scripts/run-conformance-tests.mjs test \
tests/conformance/evidence/reference-level2.json \
--level 2 \
--tier bothGenerate markdown report from results:
node scripts/run-conformance-tests.mjs report ./results.json --format markdown > CONFORMANCE.mdVerify the repository reference fixtures:
npm run test:conformance7.6 Integration with Generator
The adapter generator (see Adapter Generator Specification) SHOULD invoke conformance tests as part of the generation workflow:
# Generate adapter and run conformance tests
mcpaql-generate --schema adapter.yaml --target typescript --output ./adapter
# Test generated fixture evidence
node scripts/run-conformance-tests.mjs test ./adapter/conformance/reference.json --level 1References
- MCP-AQL Specification v1.0.0-draft
- Introspection Specification
- Operations Specification
- GitHub Issue #10 - Conformance test suite
- GitHub Issue #56 - LLM semantic evaluation