Mirror of https://github.com/pchuan98/codex.git, synced 2026-07-01 00:31:56 +08:00
chore: preserve one more schema layer during large tool compaction (#27084)
## Summary
Some customer MCP tools expose large input schemas that exceed Codex's
compact schema budget even after description stripping. Today, the final
compaction pass collapses complex schemas starting at depth 2, which can
erase important shallow call structure such as small `anyOf` branches,
required fields, and help-mode entry points. In one reported case, this
degraded a tool schema into `query: any | any`, leaving the model
without enough structure to discover the required help call.
This change raises the deep-schema collapse boundary from depth 2 to
depth 3. That preserves one additional layer of the tool contract while
still collapsing deeper expensive subtrees to `{}` when a schema remains
over budget.
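
The effect of moving the boundary can be illustrated with a minimal sketch. The `Schema` enum, `collapse`, and `sample` below are hypothetical stand-ins, not Codex's actual `JsonSchema` type or compaction code. The depth counting (root at depth 0, each `properties` hop adding one) is an assumption chosen to be consistent with the behavior described above.

```rust
/// Hypothetical, simplified schema node (not the real Codex `JsonSchema`).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    /// A scalar such as `{ "type": "string" }`.
    Scalar(&'static str),
    /// An object with named properties.
    Object(Vec<(&'static str, Schema)>),
    /// A collapsed subtree, i.e. `{}`.
    Any,
}

/// Replace every object node found at `depth >= max_depth` with `Schema::Any`.
/// Scalars are cheap, so they are kept at any depth.
fn collapse(schema: &Schema, depth: usize, max_depth: usize) -> Schema {
    match schema {
        Schema::Object(_) if depth >= max_depth => Schema::Any,
        Schema::Object(props) => Schema::Object(
            props
                .iter()
                .map(|(name, child)| (*name, collapse(child, depth + 1, max_depth)))
                .collect(),
        ),
        other => other.clone(),
    }
}

/// Illustrative fixture: root -> object_parent -> complex -> nested -> leaf.
fn sample() -> Schema {
    Schema::Object(vec![(
        "object_parent",
        Schema::Object(vec![
            (
                "complex",
                Schema::Object(vec![(
                    "nested",
                    Schema::Object(vec![("leaf", Schema::Scalar("string"))]),
                )]),
            ),
            ("scalar", Schema::Scalar("string")),
        ]),
    )])
}
```

Under these assumptions, `collapse(&sample(), 0, 2)` reduces `complex` to `{}`, while `collapse(&sample(), 0, 3)` keeps `complex` with its `nested` property name and collapses only `nested` itself.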
## What Changed
- Increased `MAX_COMPACT_TOOL_SCHEMA_DEPTH` from `2` to `3`.
- Updated the schema compaction traversal test to assert the new
  collapse boundary.
- The resulting compacted shape keeps useful shallow structure, for
  example:
  - top-level argument names
  - shallow `anyOf` branches
  - required object fields
  - nested property names one level deeper than before
## Validation
- Ran `just test -p codex-tools`: 81 tests passed.
- Ran a golden schema corpus comparison over 214 discovered tool input
  schemas under `golden_schemas/*/mcp_tools/*/input_schema.json`.
  - Depth 2 and depth 3 had identical percentile token counts across the
    corpus.
  - Both ended with `0 / 214` schemas over 1k tokens.
  - Both ended with `0 / 214` schemas over the 4,000-byte compact JSON
    budget.
  - Only one golden schema changed, increasing from 49 to 56 tokens, so
    this does not appear to introduce a meaningful corpus-wide regression.
Corpus percentile results:
| Percentile | Depth 2 | Depth 3 |
|---|---:|---:|
| p0 | 9 | 9 |
| p10 | 31 | 31 |
| p25 | 54 | 54 |
| p50 | 81 | 81 |
| p75 | 143 | 143 |
| p90 | 290 | 290 |
| p95 | 431 | 431 |
| p99 | 600 | 600 |
| max | 832 | 832 |
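
A corpus check of this kind could be sketched as follows. Only the `MAX_COMPACT_TOOL_SCHEMA_BYTES` value comes from the diff. The helper names, the strict `>` comparison, and the nearest-rank percentile method are assumptions made for illustration, since the commit does not say how the percentiles were computed.

```rust
/// Byte budget for compact normalized schema JSON, as quoted in the diff.
const MAX_COMPACT_TOOL_SCHEMA_BYTES: usize = 4_000;

/// True when a compact JSON string exceeds the byte budget.
/// (Assumption: the comparison is strict.)
fn over_budget(compact_json: &str) -> bool {
    compact_json.len() > MAX_COMPACT_TOOL_SCHEMA_BYTES
}

/// Count how many compacted schemas are still over the byte budget.
fn count_over_budget(schemas: &[&str]) -> usize {
    schemas.iter().filter(|s| over_budget(s)).count()
}

/// Nearest-rank percentile over token counts sorted in ascending order.
/// Returns `None` for an empty slice or a percentile above 100.
fn percentile(sorted: &[usize], p: usize) -> Option<usize> {
    if sorted.is_empty() || p > 100 {
        return None;
    }
    // rank = ceil(p / 100 * n), clamped so that p0 maps to the minimum.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}
```

Running both depth settings through the same helpers and comparing the resulting percentile rows is one way to produce a table like the one above.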
Committed by GitHub
Parent: 0473a5cc52
Commit: 6042e5810e
@@ -220,7 +220,7 @@ fn deserialize_tool_input_schema(input_schema: JsonValue) -> Result<JsonSchema,
 // Use compact normalized JSON bytes as a cheap local proxy for the 1k-token
 // schema budget.
 const MAX_COMPACT_TOOL_SCHEMA_BYTES: usize = 4_000;
-const MAX_COMPACT_TOOL_SCHEMA_DEPTH: usize = 2;
+const MAX_COMPACT_TOOL_SCHEMA_DEPTH: usize = 3;
 
 /// Shrink unusually large tool schemas while preserving the top-level argument
 /// surface. Compaction is best-effort rather than a hard cap: it runs only
@@ -1300,7 +1300,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
 "complex": {
     "type": "object",
     "properties": {
-        "leaf": { "type": "string" }
+        "nested": {
+            "type": "object",
+            "properties": {
+                "leaf": { "type": "string" }
+            }
+        }
     }
 },
 "scalar": {
@@ -1313,7 +1318,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
     "items": {
         "type": "object",
         "properties": {
-            "leaf": { "type": "string" }
+            "nested": {
+                "type": "object",
+                "properties": {
+                    "leaf": { "type": "string" }
+                }
+            }
         }
     }
 },
@@ -1322,7 +1332,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
     "additionalProperties": {
         "type": "object",
         "properties": {
-            "leaf": { "type": "string" }
+            "nested": {
+                "type": "object",
+                "properties": {
+                    "leaf": { "type": "string" }
+                }
+            }
         }
     }
 },
@@ -1331,7 +1346,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
 {
     "type": "object",
     "properties": {
-        "leaf": { "type": "string" }
+        "nested": {
+            "type": "object",
+            "properties": {
+                "leaf": { "type": "string" }
+            }
+        }
     }
 },
 { "type": "string" }
@@ -1350,7 +1370,12 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
 "object_parent": {
     "type": "object",
     "properties": {
-        "complex": {},
+        "complex": {
+            "type": "object",
+            "properties": {
+                "nested": {}
+            }
+        },
         "scalar": {
             "type": "string"
         }
@@ -1358,15 +1383,30 @@ fn collapse_deep_schema_objects_traverses_schema_children() {
 },
 "array_parent": {
     "type": "array",
-    "items": {}
+    "items": {
+        "type": "object",
+        "properties": {
+            "nested": {}
+        }
+    }
 },
 "map_parent": {
     "type": "object",
-    "additionalProperties": {}
+    "additionalProperties": {
+        "type": "object",
+        "properties": {
+            "nested": {}
+        }
+    }
 },
 "union_parent": {
     "anyOf": [
-        {},
+        {
+            "type": "object",
+            "properties": {
+                "nested": {}
+            }
+        },
         { "type": "string" }
     ]
 }
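
The updated test exercises collapse through `properties`, `items`, `additionalProperties`, and `anyOf`. A minimal sketch of such a traversal follows. The `Schema` enum, `collapse`, and `complex` are hypothetical simplifications rather than the Codex implementation, and the rule that each child keyword adds one level of depth is an assumption consistent with the expected output in the diff.

```rust
/// Hypothetical, simplified schema node (not the real Codex `JsonSchema`).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Scalar(&'static str),                // e.g. { "type": "string" }
    Object(Vec<(&'static str, Schema)>), // children under "properties"
    Array(Box<Schema>),                  // child under "items"
    Map(Box<Schema>),                    // child under "additionalProperties"
    AnyOf(Vec<Schema>),                  // branches under "anyOf"
    Any,                                 // collapsed subtree: {}
}

/// Collapse every non-scalar node at `depth >= max_depth` to `Schema::Any`,
/// recursing through each kind of schema child otherwise.
fn collapse(schema: &Schema, depth: usize, max_depth: usize) -> Schema {
    let is_complex = !matches!(schema, Schema::Scalar(_) | Schema::Any);
    if is_complex && depth >= max_depth {
        return Schema::Any;
    }
    let next = depth + 1;
    match schema {
        Schema::Object(props) => Schema::Object(
            props
                .iter()
                .map(|(name, child)| (*name, collapse(child, next, max_depth)))
                .collect(),
        ),
        Schema::Array(items) => Schema::Array(Box::new(collapse(items, next, max_depth))),
        Schema::Map(values) => Schema::Map(Box::new(collapse(values, next, max_depth))),
        Schema::AnyOf(branches) => Schema::AnyOf(
            branches
                .iter()
                .map(|branch| collapse(branch, next, max_depth))
                .collect(),
        ),
        leaf => leaf.clone(),
    }
}

/// Fixture mirroring the test input: an object whose `nested` property is
/// itself an object holding a string `leaf`.
fn complex() -> Schema {
    Schema::Object(vec![(
        "nested",
        Schema::Object(vec![("leaf", Schema::Scalar("string"))]),
    )])
}
```

With a root object whose properties wrap `complex()` in `Array`, `Map`, and `AnyOf`, a `max_depth` of 2 collapses each wrapped child to `{}`, whereas 3 keeps the child object and collapses only its `nested` property.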