chore: preserve one more schema layer during large tool compaction (#27084)

## Summary

Some customer MCP tools expose large input schemas that exceed Codex's
compact schema budget even after description stripping. Today, the final
compaction pass collapses complex schemas starting at depth 2, which can
erase important shallow call structure such as small `anyOf` branches,
required fields, and help-mode entry points. In one reported case, this
degraded a tool schema into `query: any | any`, leaving the model
without enough structure to discover the required help call.

This change raises the deep-schema collapse boundary from depth 2 to
depth 3. That preserves one additional layer of the tool contract while
still collapsing deeper expensive subtrees to `{}` when a schema remains
over budget.
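
The collapse boundary can be illustrated with a minimal sketch. It is not the crate's implementation: the `Json` enum, the helper constructors, `is_complex`, and `collapse_deep` are hypothetical names, and the rule that only schemas carrying child schemas are collapsed is an assumption inferred from the test expectations in this change (scalars at the boundary survive). The root schema is treated as depth 0.

```rust
use std::collections::BTreeMap;

/// Minimal JSON stand-in for this sketch (hypothetical; not the crate's `JsonSchema`).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Build an object from key/value pairs.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Build a `{ "type": <name> }` scalar schema.
fn scalar(name: &str) -> Json {
    obj(vec![("type", Json::Str(name.to_string()))])
}

/// Build a `{ "type": "object", "properties": { ... } }` schema.
fn object_schema(props: Vec<(&str, Json)>) -> Json {
    obj(vec![
        ("type", Json::Str("object".to_string())),
        ("properties", obj(props)),
    ])
}

/// Assumption: a schema counts as "complex" if it carries child schemas.
fn is_complex(fields: &BTreeMap<String, Json>) -> bool {
    ["properties", "items", "additionalProperties", "anyOf"]
        .iter()
        .any(|key| fields.contains_key(*key))
}

/// Collapse complex schemas at `depth >= max_depth` to `{}`; the root is depth 0.
fn collapse_deep(schema: &Json, depth: usize, max_depth: usize) -> Json {
    let Json::Object(fields) = schema else {
        return schema.clone();
    };
    if depth >= max_depth && is_complex(fields) {
        return Json::Object(BTreeMap::new());
    }
    let mut out = BTreeMap::new();
    for (key, value) in fields {
        let child = match (key.as_str(), value) {
            // Each property value is a child schema one level deeper.
            ("properties", Json::Object(props)) => Json::Object(
                props
                    .iter()
                    .map(|(name, s)| (name.clone(), collapse_deep(s, depth + 1, max_depth)))
                    .collect(),
            ),
            // Each union branch is a child schema one level deeper.
            ("anyOf", Json::Array(branches)) => Json::Array(
                branches
                    .iter()
                    .map(|s| collapse_deep(s, depth + 1, max_depth))
                    .collect(),
            ),
            // Array items and map values are single child schemas.
            ("items" | "additionalProperties", s) => collapse_deep(s, depth + 1, max_depth),
            // Keywords such as "type" are copied unchanged.
            _ => value.clone(),
        };
        out.insert(key.clone(), child);
    }
    Json::Object(out)
}

/// Example: `query` is a union whose object branch nests two more levels.
fn example_schema() -> Json {
    let mode = object_schema(vec![("leaf", scalar("string"))]);
    let branch = object_schema(vec![("mode", mode)]);
    let query = obj(vec![("anyOf", Json::Array(vec![branch, scalar("string")]))]);
    object_schema(vec![("query", query)])
}
```

In this sketch, `collapse_deep(&example_schema(), 0, 2)` reduces the object branch of `query` to `{}`, while a boundary of `3` keeps that branch's `mode` property name and collapses only the value of `mode`.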

## What Changed

- Increased `MAX_COMPACT_TOOL_SCHEMA_DEPTH` from `2` to `3`.
- Updated the schema compaction traversal test to assert the new
collapse boundary.
- The resulting compacted shape keeps useful shallow structure, for
example:
  - top-level argument names
  - shallow `anyOf` branches
  - required object fields
  - nested property names one level deeper than before

## Validation

- Ran `just test -p codex-tools`: 81 tests passed.
- Ran a golden schema corpus comparison over 214 discovered tool input
schemas under `golden_schemas/*/mcp_tools/*/input_schema.json`.
  - Depth 2 and depth 3 had identical percentile token counts across the
    corpus.
  - Both ended with `0 / 214` schemas over 1k tokens.
  - Both ended with `0 / 214` schemas over the 4,000-byte compact JSON
    budget.
- Only one golden schema changed, increasing from 49 to 56 tokens, so
this does not appear to introduce a meaningful corpus-wide regression.
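
The byte budget mentioned above can be sketched as a simple length check on compact JSON. This is a hedged illustration, not the crate's code: `compact_json_len` and `is_over_budget` are hypothetical helpers, the only normalization applied here is dropping whitespace outside string literals, and treating exactly 4,000 bytes as within budget is an assumption. Only the constant name and value come from the change itself.

```rust
/// Byte budget used as a cheap local proxy for the 1k-token schema budget.
const MAX_COMPACT_TOOL_SCHEMA_BYTES: usize = 4_000;

/// Count the bytes a JSON text would occupy with insignificant whitespace
/// removed. Whitespace inside string literals is preserved and counted.
fn compact_json_len(json: &str) -> usize {
    let mut len = 0;
    let mut in_string = false;
    let mut escaped = false;
    for byte in json.bytes() {
        if in_string {
            len += 1;
            if escaped {
                // The byte after a backslash never terminates the string.
                escaped = false;
            } else if byte == b'\\' {
                escaped = true;
            } else if byte == b'"' {
                in_string = false;
            }
        } else if byte == b'"' {
            in_string = true;
            len += 1;
        } else if !matches!(byte, b' ' | b'\t' | b'\n' | b'\r') {
            len += 1;
        }
    }
    len
}

/// Assumption: a schema is over budget only when it exceeds the limit.
fn is_over_budget(json: &str) -> bool {
    compact_json_len(json) > MAX_COMPACT_TOOL_SCHEMA_BYTES
}
```

A byte count avoids running a tokenizer locally, at the cost of being only an approximation of the token count.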

Corpus percentile results:

| Percentile | Depth 2 | Depth 3 |
|---|---:|---:|
| p0 | 9 | 9 |
| p10 | 31 | 31 |
| p25 | 54 | 54 |
| p50 | 81 | 81 |
| p75 | 143 | 143 |
| p90 | 290 | 290 |
| p95 | 431 | 431 |
| p99 | 600 | 600 |
| max | 832 | 832 |
Celia Chen
2026-06-08 16:07:56 -07:00
committed by GitHub
parent 0473a5cc52
commit 6042e5810e
2 changed files with 49 additions and 9 deletions
+1 -1
@@ -220,7 +220,7 @@ fn deserialize_tool_input_schema(input_schema: JsonValue) -> Result<JsonSchema,
// Use compact normalized JSON bytes as a cheap local proxy for the 1k-token
// schema budget.
const MAX_COMPACT_TOOL_SCHEMA_BYTES: usize = 4_000;
-const MAX_COMPACT_TOOL_SCHEMA_DEPTH: usize = 2;
+const MAX_COMPACT_TOOL_SCHEMA_DEPTH: usize = 3;
/// Shrink unusually large tool schemas while preserving the top-level argument
/// surface. Compaction is best-effort rather than a hard cap: it runs only
+48 -8
@@ -1300,7 +1300,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
"complex": {
"type": "object",
"properties": {
-"leaf": { "type": "string" }
+"nested": {
+"type": "object",
+"properties": {
+"leaf": { "type": "string" }
+}
+}
}
},
"scalar": {
@@ -1313,7 +1318,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
"items": {
"type": "object",
"properties": {
-"leaf": { "type": "string" }
+"nested": {
+"type": "object",
+"properties": {
+"leaf": { "type": "string" }
+}
+}
}
}
},
@@ -1322,7 +1332,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
"additionalProperties": {
"type": "object",
"properties": {
-"leaf": { "type": "string" }
+"nested": {
+"type": "object",
+"properties": {
+"leaf": { "type": "string" }
+}
+}
}
}
},
@@ -1331,7 +1346,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
{
"type": "object",
"properties": {
-"leaf": { "type": "string" }
+"nested": {
+"type": "object",
+"properties": {
+"leaf": { "type": "string" }
+}
+}
}
},
{ "type": "string" }
@@ -1350,7 +1370,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
"object_parent": {
"type": "object",
"properties": {
-"complex": {},
+"complex": {
+"type": "object",
+"properties": {
+"nested": {}
+}
+},
"scalar": {
"type": "string"
}
@@ -1358,15 +1383,30 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
},
"array_parent": {
"type": "array",
-"items": {}
+"items": {
+"type": "object",
+"properties": {
+"nested": {}
+}
+}
},
"map_parent": {
"type": "object",
-"additionalProperties": {}
+"additionalProperties": {
+"type": "object",
+"properties": {
+"nested": {}
+}
+}
},
"union_parent": {
"anyOf": [
-{},
+{
+"type": "object",
+"properties": {
+"nested": {}
+}
+},
{ "type": "string" }
]
}