Commit Graph

35 Commits

  • [rollout_trace] Add debug trace reduction command (#18880)
    ## Summary
    
    Adds the debug CLI entry point for reducing recorded rollout traces.
    This gives developers a direct way to inspect whether the emitted trace
    stream reduces into the expected conversation/runtime model.
    
    ## Stack
    
    This is PR 5/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR is intentionally last: it depends on the trace crate, core
    recorder, runtime/tool events, and session/agent edge data all existing.
    The command should remain a debug/developer tool and avoid adding new
    runtime behavior.
    
    The useful review question is whether the CLI exposes the reducer in the
    smallest practical way for local inspection without turning the debug
    command into a supported user-facing workflow.
  • [rollout_trace] Trace tool and code-mode boundaries (#18878)
    ## Summary
    
    Extends rollout tracing across tool dispatch and code-mode runtime
    boundaries. This records canonical tool-call lifecycle events and links
    code-mode execution/wait operations back to the model-visible calls that
    caused them.
    
    ## Stack
    
    This is PR 3/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR is about attribution. Reviewers should focus on whether direct
    tool calls, code-mode-originated tool calls, waits, outputs, and
    cancellation boundaries are recorded with enough source information for
    deterministic reduction without coupling the reducer to live runtime
    internals.
    
    The stack remains valid after this layer: tool and code-mode traces
    reduce through the existing crate model, while the broader session and
    multi-agent relationships are added in the next PR.
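
    To make the attribution question concrete, here is a minimal sketch of a
    deterministic reduction over tool-call events. Every name in it
    (`TraceEvent`, `CallSource`, `ToolCallRecord`, `reduce`) is a hypothetical
    stand-in, not the rollout trace crate's API. It only illustrates why each
    event has to carry its own source information.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical: where a tool call originated.
    #[derive(Debug, Clone, PartialEq)]
    enum CallSource {
        /// Emitted directly by the model.
        Model,
        /// Issued from a code-mode cell that a model-visible call started.
        CodeMode { parent_call_id: String },
    }

    /// Hypothetical events; the real event schema is not shown in this PR text.
    #[derive(Debug, Clone)]
    enum TraceEvent {
        ToolCallStarted { call_id: String, source: CallSource },
        ToolCallFinished { call_id: String, output: String },
        ToolCallCancelled { call_id: String },
    }

    #[derive(Debug, Clone, PartialEq)]
    enum CallState {
        Running,
        Finished(String),
        Cancelled,
    }

    #[derive(Debug, Clone, PartialEq)]
    struct ToolCallRecord {
        source: CallSource,
        state: CallState,
    }

    /// Folds events in order, using only data carried by the events themselves.
    fn reduce(events: &[TraceEvent]) -> BTreeMap<String, ToolCallRecord> {
        let mut calls = BTreeMap::new();
        for event in events {
            match event {
                TraceEvent::ToolCallStarted { call_id, source } => {
                    let record = ToolCallRecord {
                        source: source.clone(),
                        state: CallState::Running,
                    };
                    calls.insert(call_id.clone(), record);
                }
                TraceEvent::ToolCallFinished { call_id, output } => {
                    if let Some(record) = calls.get_mut(call_id) {
                        record.state = CallState::Finished(output.clone());
                    }
                }
                TraceEvent::ToolCallCancelled { call_id } => {
                    if let Some(record) = calls.get_mut(call_id) {
                        record.state = CallState::Cancelled;
                    }
                }
            }
        }
        calls
    }
    ```

    Because `reduce` never consults live runtime state, replaying the same
    event list always yields the same records.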
  • chore: document intentional await-holding cases (#18423)
    ## Why
    
    This PR prepares the stack to enable Clippy await-holding lints that
    were left disabled in #18178. The mechanical lock-scope cleanup is
    handled separately; this PR is the documentation/configuration layer for
    the remaining await-across-guard sites.
    
    Without explicit annotations, reviewers and future maintainers cannot
    tell whether an await-holding warning is a real concurrency smell or an
    intentional serialization boundary.
    
    ## What changed
    
    - Configures `clippy.toml` so `await_holding_invalid_type` also covers
    `tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
    - Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
    ...)]` annotations for intentional async guard lifetimes.
    - Documents the main categories of intentional cases: active-turn state
    transitions that must remain atomic, session-owned MCP manager accesses,
    remote-control websocket serialization, JS REPL kernel/process
    serialization, OAuth persistence, external bearer token refresh
    serialization, and tests that intentionally serialize shared global or
    session-owned state.
    - For external bearer token refresh, documents the existing
    serialization boundary: holding `cached_token` across the provider
    command prevents concurrent cache misses from starting duplicate refresh
    commands, and the current behavior is small enough that an explicit
    expectation is easier to maintain than adding another synchronization
    primitive.
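
    A rough sketch of what such an annotated serialization boundary can look
    like. To stay free of external crates it swaps in `std::sync::Mutex` and
    the `clippy::await_holding_lock` lint instead of a `tokio` guard with
    `await_holding_invalid_type`. `TokenCache`, `run_provider_command`, and
    `block_on` are hypothetical names for illustration only.

    ```rust
    use std::future::Future;
    use std::pin::pin;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::{Arc, Mutex};
    use std::task::{Context, Poll, Wake, Waker};

    /// Hypothetical cache; the real `cached_token` type is not shown in this PR.
    struct TokenCache {
        cached_token: Mutex<Option<String>>,
        refresh_count: AtomicUsize,
    }

    impl TokenCache {
        /// Stand-in for the external provider command.
        async fn run_provider_command(&self) -> String {
            self.refresh_count.fetch_add(1, Ordering::SeqCst);
            "token-1".to_string()
        }

        #[expect(
            clippy::await_holding_lock,
            reason = "guard is held across the refresh so concurrent cache misses cannot start duplicate refresh commands"
        )]
        async fn token(&self) -> String {
            let mut guard = self.cached_token.lock().unwrap();
            if let Some(token) = &*guard {
                return token.clone();
            }
            // The guard is intentionally still alive across this await.
            let token = self.run_provider_command().await;
            *guard = Some(token.clone());
            token
        }
    }

    /// Waker that does nothing; enough for the busy-polling helper below.
    struct NoopWake;

    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }

    /// Tiny single-threaded executor used only to drive this sketch.
    fn block_on<F: Future>(fut: F) -> F::Output {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let mut fut = pin!(fut);
        loop {
            if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
                return out;
            }
            std::thread::yield_now();
        }
    }
    ```

    The `reason` string is what lets a later reader tell an intentional
    boundary from an accidental one. Calling `block_on(cache.token())` twice
    runs the provider stand-in only once.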
    
    ## Verification
    
    - `cargo clippy -p codex-login --all-targets`
    - `cargo clippy -p codex-connectors --all-targets`
    - `cargo clippy -p codex-core --all-targets`
    - The follow-up PR #18698 enables `await_holding_invalid_type` and
    `await_holding_lock` as workspace `deny` lints, so any undocumented
    remaining offender will fail Clippy.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
    * #18698
    * __->__ #18423
  • Update image outputs to default to high detail (#18386)
    Do not assume the default `detail`.
  • Move codex module under session (#18249)
    ## Summary
    - rename the core codex module root to session/mod.rs without using
    #[path]
    - move the codex module directory and tests under core/src/session
    - remove session/mod.rs reexports so call sites use explicit child
    module paths
    
    ## Testing
    - cargo test -p codex-core --lib
    - cargo check -p codex-core --tests
    - just fmt
    - just fix -p codex-core
    - git diff --check
  • [mcp] Add dummy tools for previously called but currently missing tools. (#17853)
    - [x] Add dummy tools for previously called but currently missing tools.
    Currently supporting MCP tools only.
  • Support original-detail metadata on MCP image outputs (#17714)
    ## Summary
    - honor `_meta["codex/imageDetail"] == "original"` on MCP image content
    and map it to `detail: "original"` where supported
    - strip that detail back out when the active model does not support
    original-detail image inputs
    - update code-mode `image(...)` to accept individual MCP image blocks
    - teach `js_repl` / `codex.emitImage(...)` to preserve the same hint
    from raw MCP image outputs
    - document the new `_meta` contract and add generic RMCP-backed coverage
    across protocol, core, code-mode, and js_repl paths
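
    The honor-then-strip rule in the first two bullets can be sketched as
    follows. This is an assumption-laden illustration: `resolve_image_detail`
    is a hypothetical helper, and a plain string map stands in for the JSON
    `_meta` object.

    ```rust
    use std::collections::HashMap;

    /// Metadata key named in the summary above.
    const IMAGE_DETAIL_META_KEY: &str = "codex/imageDetail";

    /// Returns `Some("original")` only when the MCP image block asks for it
    /// and the active model supports original-detail inputs. Otherwise the
    /// hint is dropped and the caller falls back to its usual detail handling.
    fn resolve_image_detail(
        meta: &HashMap<String, String>,
        model_supports_original: bool,
    ) -> Option<&'static str> {
        let requested =
            meta.get(IMAGE_DETAIL_META_KEY).map(String::as_str) == Some("original");
        if requested && model_supports_original {
            Some("original")
        } else {
            None
        }
    }
    ```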
  • register all mcp tools with namespace (#17404)
    stacked on #17402.
    
    MCP tools returned by `tool_search` (deferred tools) get registered in
    our `ToolRegistry` with a different format than directly available
    tools. this leads to two different ways of accessing MCP tools from our
    tool catalog, only one of which works for each. fix this by registering
    all MCP tools with the namespace format, since this info is already
    available.
    
    also, direct MCP tools are registered to responsesapi without a
    namespace, while deferred MCP tools have a namespace. this means we can
    receive MCP `FunctionCall`s in both formats from namespaces. fix this by
    always registering MCP tools with namespace, regardless of deferral
    status.
    
    make code mode track `ToolName` provenance of tools so it can map the
    literal JS function name string to the correct `ToolName` for
    invocation, rather than supporting both in core.
    
    this lets us unify to a single canonical `ToolName` representation for
    each MCP tool and force everywhere to use that one, without supporting
    fallbacks.
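
    a minimal sketch of the provenance idea, assuming a `ToolName` with an
    optional namespace. the field layout, `CodeModeTools`, and the example
    names in the usage note are hypothetical; the real types are not shown in
    this description.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical stand-in for the canonical `ToolName`.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    struct ToolName {
        namespace: Option<String>,
        name: String,
    }

    /// Hypothetical code-mode side table: literal JS function name -> `ToolName`.
    #[derive(Default)]
    struct CodeModeTools {
        by_js_name: HashMap<String, ToolName>,
    }

    impl CodeModeTools {
        /// Records which `ToolName` a JS-visible function was generated from.
        fn expose(&mut self, js_name: &str, tool: ToolName) {
            self.by_js_name.insert(js_name.to_string(), tool);
        }

        /// Resolves by recorded provenance instead of re-parsing the JS string.
        fn resolve(&self, js_name: &str) -> Option<&ToolName> {
            self.by_js_name.get(js_name)
        }
    }
    ```

    with this shape, an unknown JS name resolves to `None` rather than
    falling back to a second lookup format.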
  • Add supports_parallel_tool_calls flag to included mcps (#17667)
    ## Why
    
    For more advanced MCP usage, we want the model to be able to emit
    parallel MCP tool calls and have Codex execute eligible ones
    concurrently, instead of forcing all MCP calls through the serial block.
    
    The main design choice was where to thread the config. I made this
    server-level because parallel safety depends on the MCP server
    implementation. Codex reads the flag from `mcp_servers`, threads the
    opted-in server names into `ToolRouter`, and checks the parsed
    `ToolPayload::Mcp { server, .. }` at execution time. That avoids relying
    on model-visible tool names, which can be incomplete in
    deferred/search-tool paths or ambiguous for similarly named
    servers/tools.
    
    ## What was added
    
    Added `supports_parallel_tool_calls` for MCP servers.
    
    Before:
    
    ```toml
    [mcp_servers.docs]
    command = "docs-server"
    ```
    
    After:
    
    ```toml
    [mcp_servers.docs]
    command = "docs-server"
    supports_parallel_tool_calls = true
    ```
    
    MCP calls remain serial by default. Only tools from opted-in servers are
    eligible to run in parallel. Docs also now warn to enable this only when
    the server’s tools are safe to run concurrently, especially around
    shared state or read/write races.
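
    The execution-time check described in the Why section can be sketched
    roughly like this. Only the `ToolPayload::Mcp { server, .. }` shape comes
    from this PR; the other variant, `ParallelMcpServers`, and the method name
    are simplified, hypothetical stand-ins.

    ```rust
    use std::collections::HashSet;

    /// Simplified stand-in for the parsed tool payload.
    enum ToolPayload {
        Mcp { server: String, tool: String },
        Function { name: String },
    }

    /// Hypothetical holder for servers with `supports_parallel_tool_calls = true`.
    struct ParallelMcpServers {
        opted_in: HashSet<String>,
    }

    impl ParallelMcpServers {
        fn new<I: IntoIterator<Item = String>>(servers: I) -> Self {
            Self { opted_in: servers.into_iter().collect() }
        }

        /// MCP calls stay serial unless the exact server name opted in.
        /// Non-MCP payloads return `false` because this helper only decides
        /// MCP eligibility.
        fn mcp_call_may_run_in_parallel(&self, payload: &ToolPayload) -> bool {
            match payload {
                ToolPayload::Mcp { server, .. } => self.opted_in.contains(server),
                ToolPayload::Function { .. } => false,
            }
        }
    }
    ```

    Matching on the parsed server name, not on the model-visible tool name,
    is what keeps similarly named servers from being confused with each other.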
    
    ## Testing
    
    Tested with a local stdio MCP server exposing real delay tools. The
    model/Responses side was mocked only to deterministically emit two MCP
    calls in the same turn.
    
    Each test called `query_with_delay` and `query_with_delay_2` with `{
    "seconds": 25 }`.
    
    | Build/config | Observed | Wall time |
    | --- | --- | --- |
    | main with flag enabled | serial | `58.79s` |
    | PR with flag enabled | parallel | `31.73s` |
    | PR without flag | serial | `56.70s` |
    
    PR with flag enabled showed both tools start before either completed;
    main and PR-without-flag completed the first delay before starting the
    second.
    
    Also added an integration test.
    
    Additional checks:
    
    - `cargo test -p codex-tools` passed
    - `cargo test -p codex-core
    mcp_parallel_support_uses_exact_payload_server` passed
    - `git diff --check` passed
  • chore: refactor name and namespace to single type (#17402)
    avoid passing them both around, unify on a type. this now also keys
    `ToolRegistry`.
    
    tests pass
  • [mcp] Expand tool search to custom MCPs. (#16944)
    - [x] Expand tool search to custom MCPs.
    - [x] Rename several variables/fields to be more generic.
    
    Updated tool & server name lifecycles:
    
    **Raw Identity**
    
    ToolInfo.server_name is raw MCP server name.
    ToolInfo.tool.name is raw MCP tool name.
    MCP calls route back to raw via parse_tool_name() returning
    (tool.server_name, tool.tool.name).
    mcpServerStatus/list now groups by raw server and keys tools by
    Tool.name: mod.rs:599
    App-server just forwards that grouped raw snapshot:
    codex_message_processor.rs:5245
    
    **Callable Names**
    
    On list-tools, we create provisional callable_namespace / callable_name:
    mcp_connection_manager.rs:1556
    For non-app MCP, provisional callable name starts as raw tool name.
    For codex-apps, provisional callable name is sanitized and strips
    connector name/id prefix; namespace includes connector name.
    Then qualify_tools() sanitizes callable namespace + name to ASCII alnum
    / _ only: mcp_tool_names.rs:128
    Note: this is stricter than Responses API. Hyphen is currently replaced
    with _ for code-mode compatibility.
    
    **Collision Handling**
    
    We do initially collapse example-server and example_server to the same
    base.
    Then qualify_tools() detects distinct raw namespace identities behind
    the same sanitized namespace and appends a hash to the callable
    namespace: mcp_tool_names.rs:137
    Same idea for tool-name collisions: hash suffix goes on callable tool
    name.
    Final list_all_tools() map key is callable_namespace + callable_name:
    mcp_connection_manager.rs:769
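
    A minimal sketch of the sanitize-then-disambiguate flow for namespaces.
    It is not the code in `mcp_tool_names.rs`: the function names, the use of
    `DefaultHasher`, the `_` separator, and the choice to suffix every
    colliding entry are all assumptions made for illustration.

    ```rust
    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::{Hash, Hasher};

    /// Replaces every character outside ASCII alphanumerics and `_` with `_`.
    fn sanitize(raw: &str) -> String {
        raw.chars()
            .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
            .collect()
    }

    /// Short hex suffix derived from the raw name (hash choice is an assumption).
    fn hash_suffix(raw: &str) -> String {
        let mut hasher = DefaultHasher::new();
        raw.hash(&mut hasher);
        format!("{:08x}", hasher.finish() as u32)
    }

    /// Maps each raw namespace to a callable namespace, appending a hash when
    /// distinct raw names collapse to the same sanitized base.
    fn qualify_namespaces(raw_namespaces: &[&str]) -> HashMap<String, String> {
        let mut by_sanitized: HashMap<String, Vec<&str>> = HashMap::new();
        for raw in raw_namespaces {
            let bucket = by_sanitized.entry(sanitize(raw)).or_default();
            if !bucket.contains(raw) {
                bucket.push(*raw);
            }
        }

        let mut callable = HashMap::new();
        for (sanitized, raws) in by_sanitized {
            let collided = raws.len() > 1;
            for raw in raws {
                let name = if collided {
                    format!("{sanitized}_{}", hash_suffix(raw))
                } else {
                    sanitized.clone()
                };
                callable.insert(raw.to_string(), name);
            }
        }
        callable
    }
    ```

    Given `example-server`, `example_server`, and `docs`, the first two end
    up with different suffixed names while `docs` is left untouched.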
    
    **Direct Model Tools**
    
    Direct MCP tool declarations use the full qualified sanitized key as the
    Responses function name.
    The raw rmcp Tool is converted but renamed for model exposure.
    
    **Tool Search / Deferred**
    
    Tool search result namespace = final ToolInfo.callable_namespace:
    tool_search.rs:85
    Tool search result nested name = final ToolInfo.callable_name:
    tool_search.rs:86
    Deferred tool handler is registered as "{namespace}:{name}":
    tool_registry_plan.rs:248
    When a function call comes back, core recombines namespace + name, looks
    up the full qualified key, and gets the raw server/tool for MCP
    execution: codex.rs:4353
    
    **Separate Legacy Snapshot**
    
    collect_mcp_snapshot_from_manager_with_detail() still returns a map
    keyed by qualified callable name.
    mcpServerStatus/list no longer uses that; it uses
    McpServerStatusSnapshot, which is raw-inventory shaped.
  • core: cut codex-core compile time 63% with native async ToolHandler (#16630)
    ## Why
    
    `ToolHandler` was still paying a large compile-time tax from
    `#[async_trait]` on every concrete handler impl, even though the only
    object-safe boundary the registry actually stores is the internal
    `AnyToolHandler` adapter.
    
    This PR removes that macro-generated async wrapper layer from concrete
    `ToolHandler` impls while keeping the existing object-safe shim in
    `AnyToolHandler`. In practice, that gets essentially the same
    compile-time win as the larger type-erasure refactor in #16627, but with
    a much smaller diff and without changing the public shape of
    `ToolHandler<Output = T>`.
    
    That tradeoff matters here because this is a broad `codex-core` hotspot
    and reviewers should be able to judge the compile-time impact from hard
    numbers, not vibes.
    
    ## Headline result
    
    On a clean `codex-core` package rebuild (`cargo clean -p codex-core`
    before each command), rustc `total` dropped from **187.15s to 68.98s**
    versus the shared `0bd31dc382bd` baseline: **-63.1%**.
    
    The biggest hot passes dropped by roughly **71-72%**:
    
    | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta |
    |---|---:|---:|---:|
    | `total` | 187.15s | 68.98s | **-63.1%** |
    | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** |
    | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** |
    | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** |
    | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** |
    
    Important caveat: `-Z time-passes` timings are nested, so
    `generate_crate_metadata` and `monomorphization_collector_graph_walk`
    are mostly overlapping, not additive.
    
    ## Why this PR over #16627
    
    #16627 already proved that the `ToolHandler` stack was the right
    hotspot, but it got there by making `ToolHandler` object-safe and
    changing every handler to return `BoxFuture<Result<AnyToolResult, _>>`
    directly.
    
    This PR keeps the lower-churn shape:
    
    - `ToolHandler` remains generic over `type Output`.
    - Concrete handlers use native RPITIT futures with explicit `Send`
    bounds.
    - `AnyToolHandler` remains the only object-safe adapter and still does
    the boxing at the registry boundary, as before.
    - The implementation diff is only **33 files, +28/-77**.
    
    The measurements are at least comparable, and in this run this PR is
    slightly faster than #16627 on the pass-level total:
    
    | Metric | #16627 | This PR | Delta |
    |---|---:|---:|---:|
    | `total` | 79.90s | 68.98s | **-13.7%** |
    | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** |
    | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** |
    | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% |
    
    ## Profile data
    
    ### Crate-level timings
    
    `cargo +nightly build -p codex-core --lib -Z unstable-options
    --timings=json` after `cargo clean -p codex-core`.
    
    Baseline data below is reused from the shared parent `0bd31dc382bd`
    profile because this PR and #16627 are both one commit on top of that
    same parent.
    
    | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta |
    |---|---:|---:|---:|---:|---:|---:|
    | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** |
    | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a |
    
    ### Pass-level timings
    
    `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
    time-passes-format=json` after `cargo clean -p codex-core`.
    
    | Pass | Baseline | This PR | Delta |
    |---|---:|---:|---:|
    | `total` | 187.150662083s | 68.978770375s | **-63.1%** |
    | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** |
    | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** |
    | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** |
    | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% |
    | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% |
    | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% |
    | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a |
    
    ### Self-profile query summary
    
    `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
    self-profile-events=default,query-keys,args,llvm,artifact-sizes` after
    `cargo clean -p codex-core`, summarized with `measureme summarize -p
    0.5`.
    
    | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits |
    |---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
    | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 |
    | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 |
    | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 |
    | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 |
    | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 |
    | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 |
    | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 |
    | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 |
    
    | Summary | Baseline | This PR |
    |---|---:|---:|
    | `evaluate_obligation` % of total CPU | 70.821% | 38.880% |
    | self-profile total CPU time | 255.042999997s | 120.661175956s |
    | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s |
    
    ### Artifact sizes
    
    From the same `measureme summarize` output:
    
    | Artifact | Baseline | This PR | Delta |
    |---|---:|---:|---:|
    | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 |
    | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 |
    | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 |
    | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 |
    | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 |
    | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 |
    | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 |
    | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 |
    
    ### Baseline hotspots before this change
    
    These are the top normalized obligation buckets from the shared baseline
    profile:
    
    | Obligation bucket | Samples | Duration |
    |---|---:|---:|
    | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s |
    | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s |
    | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s |
    | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s |
    | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s |
    | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s |
    | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s |
    | `outlives:tasks::regular::RegularTask` | 726 | 3.57s |
    
    Top `items_of_instance` entries before this change were mostly concrete
    async handler/task impls:
    
    | Instance | Duration |
    |---|---:|
    | `tasks::regular::{impl#2}::run` | 3.79s |
    | `tools::handlers::mcp::{impl#0}::handle` | 3.27s |
    | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s |
    | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s |
    | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s |
    | `tasks::review::{impl#4}::run` | 2.82s |
    | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s |
    | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s |
    | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s |
    | `tasks::compact::{impl#4}::run` | 2.45s |
    
    ## What changed
    
    Relevant pre-change registry shape:
    [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/0bd31dc382bd1c33dc2bb6b97069c76aa10ba14b/codex-rs/core/src/tools/registry.rs#L38-L219)
    
    Current registry shape in this PR:
    [`codex-rs/core/src/tools/registry.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry.rs#L38-L203)
    
    - `ToolHandler::{is_mutating, handle}` now return native `impl Future +
    Send` futures instead of using `#[async_trait]`.
    - `AnyToolHandler` remains the object-safe adapter and boxes those
    futures at the registry boundary with explicit lifetimes.
    - Concrete handlers and the registry test handler drop `#[async_trait]`
    but otherwise keep their async method bodies intact.
    - Representative examples:
    [`codex-rs/core/src/tools/handlers/shell.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/shell.rs#L223-L379),
    [`codex-rs/core/src/tools/handlers/unified_exec.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/handlers/unified_exec.rs),
    [`codex-rs/core/src/tools/registry_tests.rs`](https://github.com/openai/codex/blob/41f7ac0adeac81d667541853d6546267d6083613/codex-rs/core/src/tools/registry_tests.rs)
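
    For readers who want the shape without opening the links, here is a
    heavily simplified sketch of the pattern. It is not the registry code:
    the trait signatures, `EchoHandler`, and the busy-polling `block_on`
    helper are assumptions, and only `handle` is modeled.

    ```rust
    use std::future::Future;
    use std::pin::{pin, Pin};
    use std::sync::Arc;
    use std::task::{Context, Poll, Wake, Waker};

    /// Simplified stand-in: generic over `Output`, no `#[async_trait]`.
    trait ToolHandler: Send + Sync {
        type Output: ToString;

        /// Native return-position `impl Future` with an explicit `Send` bound.
        fn handle(&self, input: String)
            -> impl Future<Output = Result<Self::Output, String>> + Send;
    }

    /// Object-safe adapter: the only place where the future gets boxed.
    trait AnyToolHandler: Send + Sync {
        fn handle_any<'a>(
            &'a self,
            input: String,
        ) -> Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>>;
    }

    impl<T: ToolHandler> AnyToolHandler for T {
        fn handle_any<'a>(
            &'a self,
            input: String,
        ) -> Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>> {
            Box::pin(async move {
                let output = self.handle(input).await?;
                Ok(output.to_string())
            })
        }
    }

    /// Hypothetical concrete handler; its async body needs no macro.
    struct EchoHandler;

    impl ToolHandler for EchoHandler {
        type Output = String;

        async fn handle(&self, input: String) -> Result<String, String> {
            Ok(format!("echo: {input}"))
        }
    }

    /// Waker that does nothing; enough for the busy-polling helper below.
    struct NoopWake;

    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }

    /// Tiny executor used only to drive this sketch.
    fn block_on<F: Future>(fut: F) -> F::Output {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let mut fut = pin!(fut);
        loop {
            if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
                return out;
            }
            std::thread::yield_now();
        }
    }
    ```

    A registry can then hold `Box<dyn AnyToolHandler>` values and call
    `handle_any`, so boxing happens once per dispatch at the boundary while
    each concrete handler keeps a plain `async fn`.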
    
    ## Tradeoff
    
    This is intentionally less invasive than #16627: it does **not** move
    result boxing into every concrete handler and does **not** change
    `ToolHandler` into an object-safe trait.
    
    Instead, it keeps the existing registry-level type-erasure boundary and
    only removes the macro-generated async wrapper layer from concrete
    impls. So the runtime boxing story stays basically the same as before,
    while the compile-time savings are still large.
    
    ## Verification
    
    Existing verification for this branch still applies:
    
    - Ran `cargo test -p codex-core`; this change compiled and the suite
    reached the known unrelated `config::tests::*guardian*` failures, with
    no local diff under `codex-rs/core/src/config/`.
    
    Profiling commands used for the tables above:
    
    - `cargo clean -p codex-core`
    - `cargo +nightly build -p codex-core --lib -Z unstable-options
    --timings=json`
    - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
    time-passes-format=json`
    - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
    self-profile-events=default,query-keys,args,llvm,artifact-sizes`
    - `measureme summarize -p 0.5`
  • Extract code-mode nested tool collection into codex-tools (#16509)
    ## Why
    This is another small step in the `codex-core` -> `codex-tools`
    migration described in `AGENTS.md`.
    
    `core/src/tools/spec.rs` and `core/src/tools/code_mode/mod.rs` were both
    hand-rolling the same pure transformation: convert visible `ToolSpec`s
    into code-mode nested tool definitions, then sort and deduplicate by
    tool name. That logic does not depend on core runtime state or handlers,
    so keeping it in `codex-core` makes `spec.rs` harder to peel out later
    than it needs to be.
    
    ## What Changed
    - Add `collect_code_mode_tool_definitions()` to
    `codex-rs/tools/src/code_mode.rs`.
    - Reuse that helper from `codex-rs/core/src/tools/spec.rs` when
    assembling the `exec` tool description.
    - Reuse the same helper from `codex-rs/core/src/tools/code_mode/mod.rs`
    when exposing nested tool metadata to the code-mode runtime.
    
    This is intended to be a straight refactor with no behavior change and
    no new test surface.
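
    The pure transformation described in the Why section (convert, sort,
    deduplicate by name) can be sketched as below. The struct fields and the
    function name are trimmed-down, hypothetical stand-ins and do not mirror
    the real `ToolSpec` or `collect_code_mode_tool_definitions()` signature.

    ```rust
    /// Hypothetical, reduced view of a visible tool spec.
    #[derive(Debug, Clone)]
    struct ToolSpec {
        name: String,
        description: String,
    }

    /// Hypothetical nested tool definition handed to the code-mode runtime.
    #[derive(Debug, Clone, PartialEq)]
    struct ToolDefinition {
        name: String,
        description: String,
    }

    /// Converts specs, sorts by tool name, and keeps one definition per name.
    fn collect_definitions_sketch(specs: &[ToolSpec]) -> Vec<ToolDefinition> {
        let mut definitions: Vec<ToolDefinition> = specs
            .iter()
            .map(|spec| ToolDefinition {
                name: spec.name.clone(),
                description: spec.description.clone(),
            })
            .collect();
        // `sort_by` is stable, so among duplicates the earliest spec stays first.
        definitions.sort_by(|a, b| a.name.cmp(&b.name));
        // `dedup_by` drops the later of two adjacent entries with equal names.
        definitions.dedup_by(|later, earlier| later.name == earlier.name);
        definitions
    }
    ```

    Nothing here touches session state or handlers, which is why a helper of
    this kind can live outside `codex-core`.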
    
    ## Verification
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core tools::spec::tests`
    - `cargo test -p codex-core code_mode_only_`
  • Remove client_common tool re-exports (#16482)
    ## Why
    
    `codex-rs/core/src/client_common.rs` still had a `tools` re-export
    module that forwarded `codex_tools` types back into `codex-core`. After
    the earlier extraction work in #16379, #16471, #16477, and #16481, that
    extra layer no longer adds value.
    
    Removing it keeps dependencies explicit: the `codex-core` modules that
    actually use `ToolSpec` and related types now depend on `codex_tools`
    directly instead of reaching through `client_common`.
    
    ## What Changed
    
    - removed the `client_common::tools` re-export module from
    `core/src/client_common.rs`
    - updated the remaining `codex-core` consumers to import `codex_tools`
    directly
    - adjusted the affected test code to reference
    `codex_tools::ResponsesApiTool` directly as well
    
    This is a mechanical cleanup only. It does not change tool behavior or
    runtime logic.
    
    ## Testing
    
    - `cargo test -p codex-core client_common::tests`
    - `cargo test -p codex-core tools::router::tests`
    - `cargo test -p codex-core tools::context::tests`
    - `cargo test -p codex-core tools::spec::tests`
  • codex-tools: extract code mode tool spec adapters (#16132)
    ## Why
    
    The longer-term `codex-tools` migration is to move pure tool-definition
    and tool-spec plumbing out of `codex-core` while leaving session- and
    runtime-coupled orchestration behind.
    
    The remaining code-mode adapter layer in
    `core/src/tools/code_mode_description.rs` was a good next extraction
    seam because it only transformed `ToolSpec` values for code mode and
    already delegated the low-level description rendering to
    `codex-code-mode`.
    
    ## What Changed
    
    - added `codex-rs/tools/src/code_mode.rs` with
    `augment_tool_spec_for_code_mode()` and
    `tool_spec_to_code_mode_tool_definition()`
    - added focused unit coverage in `codex-rs/tools/src/code_mode_tests.rs`
    - rewired `core/src/tools/spec.rs` and `core/src/tools/code_mode/mod.rs`
    to use the extracted adapters from `codex-tools`
    - removed the old `core/src/tools/code_mode_description.rs` shim and its
    test file from `codex-core`
    - added the `codex-code-mode` dependency to `codex-tools`, updated
    `Cargo.lock`, and refreshed the `codex-tools` README to reflect the
    expanded boundary
    
    ## Test Plan
    
    - `cargo test -p codex-tools`
    - `CARGO_TARGET_DIR=/tmp/codex-core-code-mode-adapters cargo test -p
    codex-core --lib tools::spec::`
    - `CARGO_TARGET_DIR=/tmp/codex-core-code-mode-adapters cargo test -p
    codex-core --lib tools::code_mode::`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just argument-comment-lint`
    
    ## References
    
    - #15923
    - #15928
    - #15944
    - #15953
    - #16031
    - #16047
    - #16129
  • Move string truncation helpers into codex-utils-string (#15572)
    - move the shared byte-based middle truncation logic from `core` into
    `codex-utils-string`
    - keep token-specific truncation in `codex-core` so rollout can reuse
    the shared helper in the next stacked PR
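
    For illustration, byte-based middle truncation can be sketched as below.
    The function name, the `...` marker, and the even head/tail split are
    assumptions; the actual helper in `codex-utils-string` may differ.

    ```rust
    /// Keeps at most `max_bytes` bytes of `s` (not counting the marker),
    /// dropping the middle and never splitting a UTF-8 character.
    fn truncate_middle_bytes(s: &str, max_bytes: usize) -> String {
        if s.len() <= max_bytes {
            return s.to_string();
        }
        let head_budget = max_bytes / 2;
        let tail_budget = max_bytes - head_budget;

        // Shrink the head until it ends on a character boundary.
        let mut head_end = head_budget;
        while !s.is_char_boundary(head_end) {
            head_end -= 1;
        }
        // Shrink the tail until it starts on a character boundary.
        let mut tail_start = s.len() - tail_budget;
        while !s.is_char_boundary(tail_start) {
            tail_start += 1;
        }
        format!("{}...{}", &s[..head_end], &s[tail_start..])
    }
    ```

    Rounding both cut points toward the removed middle means multi-byte text
    can come out slightly shorter than the budget, but never invalid.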
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Code mode on v8 (#15276)
    Moves Code Mode to a new crate with no dependencies on codex. This
    crate encodes the code mode semantics that we want for lifetime,
    mounting, and tool calling.
    
    The model-facing surface is mostly unchanged. `exec` still runs raw
    JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
    are still available through `tools.*`, and helpers like `text`, `image`,
    `store`, `load`, `notify`, `yield_control`, and `exit` still exist.
    
    The major change is underneath that surface:
    
    - Old code mode was an external Node runtime.
    - New code mode is an in-process V8 runtime embedded directly in Rust.
    - Old code mode managed cells inside a long-lived Node runner process.
    - New code mode manages cells in Rust, with one V8 runtime thread per
    active `exec`.
    - Old code mode used JSON protocol messages over child stdin/stdout plus
    Node worker-thread messages.
    - New code mode uses Rust channels and direct V8 callbacks/events.
    
    This PR also fixes the two migration regressions that fell out of that
    substrate change:
    
    - `wait { terminate: true }` now waits for the V8 runtime to actually
    stop before reporting termination.
    - synchronous top-level `exit()` now succeeds again instead of surfacing
    as a script error.
    
    ---
    
    - `core/src/tools/code_mode/*` is now mostly an adapter layer for the
    public `exec` / `wait` tools.
    - `code-mode/src/service.rs` owns cell sessions and async control flow
    in Rust.
    - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
    JavaScript execution.
    - each `exec` spawns a dedicated runtime thread plus a Rust
    session-control task.
    - helper globals are installed directly into the V8 context instead of
    being injected through a source prelude.
    - helper modules like `tools.js` and `@openai/code_mode` are synthesized
    through V8 module resolution callbacks in Rust.
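
    A toy model of the thread-per-`exec` shape and of the `terminate` fix,
    using only the standard library. There is no V8 here: the blocking
    `recv` stands in for the runtime, and `Control`, `Cell`, `spawn_cell`,
    and `terminate` are hypothetical names, not the `code-mode` crate's API.

    ```rust
    use std::sync::mpsc;
    use std::thread;

    /// Hypothetical control message sent from the session side to the runtime.
    enum Control {
        Terminate,
    }

    /// Hypothetical handle to one running cell.
    struct Cell {
        control: mpsc::Sender<Control>,
        runtime: thread::JoinHandle<&'static str>,
    }

    /// Starts one dedicated thread per `exec`.
    fn spawn_cell() -> Cell {
        let (control, inbox) = mpsc::channel();
        let runtime = thread::spawn(move || {
            // Block until asked to stop, or until the sender is dropped.
            match inbox.recv() {
                Ok(Control::Terminate) => "terminated",
                Err(_) => "detached",
            }
        });
        Cell { control, runtime }
    }

    /// Mirrors `wait { terminate: true }`: report only after the runtime
    /// thread has actually stopped.
    fn terminate(cell: Cell) -> &'static str {
        let _ = cell.control.send(Control::Terminate);
        cell.runtime.join().expect("runtime thread panicked")
    }
    ```

    Joining the thread before returning is the part that corresponds to
    waiting for the runtime to stop before reporting termination.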
    
    ---
    
    Also added a benchmark for showing the speed of init and use of a code
    mode env:
    ```
    $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
    Finished `bench` profile [optimized] target(s) in 0.18s
         Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
    exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
    scenario       tools samples    warmups      iters      mean/exec       p95/exec       rssΔ p50       rssΔ max
    cold_exec          0      30          0          1         1.13ms         1.20ms        8.05MiB        8.06MiB
    warm_exec          0      30          1         25       473.43us       512.49us      912.00KiB        1.33MiB
    cold_exec         32      30          0          1         1.03ms         1.15ms        8.08MiB        8.11MiB
    warm_exec         32      30          1         25       509.73us       545.76us      960.00KiB        1.30MiB
    cold_exec        128      30          0          1         1.14ms         1.19ms        8.30MiB        8.34MiB
    warm_exec        128      30          1         25       575.08us       591.03us      736.00KiB      864.00KiB
    memory uses a fresh-process max RSS delta for each scenario
    ```
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Return image URL from view_image tool (#15072)
    Clean up image semantics in code mode.
    
    `view_image` now returns `{image_url: string, details?: string}`.
    
    `image()` now accepts either a string parameter or `{image_url: string,
    details?: string}`.
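    
    A minimal Rust sketch of the "string or object" argument shape described
    above. The type and function names (`ImageRef`, `ImageArg`,
    `normalize_image_arg`) are hypothetical and are not taken from the codex-rs
    sources; the sketch only illustrates how both accepted forms could be
    normalized into one shape.
    
    ```rust
    // Hypothetical stand-in types; names are illustrative, not the codex-rs definitions.
    #[derive(Debug, Clone, PartialEq)]
    struct ImageRef {
        image_url: String,
        details: Option<String>,
    }
    
    // The two argument forms the commit says `image()` accepts.
    #[derive(Debug, Clone, PartialEq)]
    enum ImageArg {
        Url(String),
        Object(ImageRef),
    }
    
    // Normalize either form into the object shape.
    fn normalize_image_arg(arg: ImageArg) -> ImageRef {
        match arg {
            ImageArg::Url(image_url) => ImageRef { image_url, details: None },
            ImageArg::Object(image) => image,
        }
    }
    
    fn main() {
        let from_string = normalize_image_arg(ImageArg::Url("https://example.com/a.png".to_string()));
        let from_object = normalize_image_arg(ImageArg::Object(ImageRef {
            image_url: "https://example.com/b.png".to_string(),
            details: Some("high".to_string()),
        }));
        println!("{from_string:?}");
        println!("{from_object:?}");
    }
    ```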
  • Propagate tool errors to code mode (#15075)
    Clean up the error flow to push the `FunctionCallError` all the way up to
    the dispatcher and allow code mode to surface it as an exception.
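    
    A rough sketch of the propagation pattern, under stated assumptions:
    `ToolError` is a hypothetical stand-in for the real `FunctionCallError`, and
    `run_tool`, `dispatch`, and `ScriptOutcome` are invented for illustration.
    The point is only that the typed error travels up with `?` and is converted
    at the code-mode boundary, rather than being flattened into output text
    earlier.
    
    ```rust
    use std::fmt;
    
    // Hypothetical stand-in for the real error type; the variant is illustrative.
    #[derive(Debug, PartialEq)]
    enum ToolError {
        UnknownTool(String),
    }
    
    impl fmt::Display for ToolError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                ToolError::UnknownTool(name) => write!(f, "unknown tool: {name}"),
            }
        }
    }
    
    // A toy tool handler that can fail.
    fn run_tool(name: &str) -> Result<String, ToolError> {
        match name {
            "echo" => Ok("ok".to_string()),
            other => Err(ToolError::UnknownTool(other.to_string())),
        }
    }
    
    // The dispatcher forwards the typed error with `?` instead of swallowing it.
    fn dispatch(name: &str) -> Result<String, ToolError> {
        let output = run_tool(name)?;
        Ok(output)
    }
    
    // At the code-mode boundary the error becomes a rejection the script can catch.
    #[derive(Debug, PartialEq)]
    enum ScriptOutcome {
        Resolve(String),
        Reject(String),
    }
    
    fn to_script_outcome(result: Result<String, ToolError>) -> ScriptOutcome {
        match result {
            Ok(value) => ScriptOutcome::Resolve(value),
            Err(err) => ScriptOutcome::Reject(err.to_string()),
        }
    }
    
    fn main() {
        println!("{:?}", to_script_outcome(dispatch("echo")));
        println!("{:?}", to_script_outcome(dispatch("missing")));
    }
    ```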
  • Add notify to code-mode (#14842)
    Allows the model to send an out-of-band notification.
    
    The notification is injected as another tool call output for the same
    call_id.
  • Rename exec_wait tool to wait (#14983)
    Summary
    - document that code mode only exposes `exec` and the renamed `wait`
    tool
    - update code mode tool spec and descriptions to match the new tool name
    - rename tests and helper references from `exec_wait` to `wait`
    
    Testing
    - Not run (not requested)
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
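    
    For illustration, the `/*param*/` convention on positional literals might
    look like the following. The `retry` function and its parameters are
    hypothetical and exist only to show the comment style; they are not taken
    from `codex-rs`.
    
    ```rust
    // Hypothetical function, used only to illustrate the comment style.
    fn retry(label: &str, max_attempts: u32, backoff_ms: u64, jitter: bool) -> String {
        format!("{label}: attempts={max_attempts} backoff={backoff_ms}ms jitter={jitter}")
    }
    
    fn main() {
        // Opaque positional literals carry a comment naming the exact parameter.
        // The string literal is left uncommented, matching the policy above.
        let summary = retry("fetch", /*max_attempts*/ 3, /*backoff_ms*/ 250, /*jitter*/ true);
        println!("{summary}");
    }
    ```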
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • Add exit helper to code mode scripts (#14851)
    - **Summary**
      - expose `exit` through the code mode bridge and module so scripts can
        stop mid-flight
      - surface the helper in the description documentation
      - add a regression test ensuring `exit()` terminates execution cleanly
    - **Testing**
      - Not run (not requested)
  • Add code_mode_only feature (#14617)
    Summary
    - add the code_mode_only feature flag/config schema and wire its
    dependency on code_mode
    - update code mode tool descriptions to list nested tools with detailed
    headers
    - restrict available tools for prompt and exec descriptions when
    code_mode_only is enabled and test the behavior
    
    Testing
    - Not run (not requested)
  • code_mode: Move exec params from runtime declarations to @pragma (#14511)
    This change moves `code_mode` exec session settings out of the runtime API
    and into an optional first-line pragma. Instead of calling runtime
    helpers like `set_yield_time()` or `set_max_output_tokens_per_exec_call()`,
    the model can write
    `// @exec: {"yield_time_ms": ..., "max_output_tokens": ...}`
    at the top of the freeform exec source. Rust now parses that pragma before
    building the source, validates it, and passes the values directly in the
    exec start message to the code-mode broker, which applies them at session
    start without any worker-runtime mutation path. The `@openai/code_mode`
    module no longer exposes those setter functions, the docs and grammar were
    updated to describe the pragma form, and the existing `code_mode` tests
    were converted to use pragma-based configuration instead.
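    
    A minimal sketch of the first step, separating an optional first-line pragma
    from the rest of the script. The function name `split_exec_pragma` is
    hypothetical, and removing the pragma line from the returned source is an
    assumption of this sketch. JSON decoding and validation of the payload are
    deliberately left out.
    
    ```rust
    /// Splits an optional first-line `// @exec: {...}` pragma from the script.
    /// Returns the raw payload text (if present) and the remaining source.
    /// Hypothetical helper; the payload is not decoded or validated here.
    fn split_exec_pragma(source: &str) -> (Option<&str>, &str) {
        const PREFIX: &str = "// @exec:";
        // Only the first line is considered a candidate pragma.
        let (first_line, rest) = match source.split_once('\n') {
            Some((first, rest)) => (first, rest),
            None => (source, ""),
        };
        match first_line.trim().strip_prefix(PREFIX) {
            Some(payload) => (Some(payload.trim()), rest),
            // No pragma: hand back the source untouched.
            None => (None, source),
        }
    }
    
    fn main() {
        let script = "// @exec: {\"yield_time_ms\": 1000, \"max_output_tokens\": 2000}\nconst x = 1 + 1;\n";
        let (pragma, body) = split_exec_pragma(script);
        println!("pragma = {pragma:?}");
        println!("body = {body:?}");
    }
    ```
    
    A real implementation would hand the payload to a JSON parser and reject
    unknown or out-of-range fields before starting the session.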
  • Expose code-mode tools through globals (#14517)
    Summary
    - make all code-mode tools accessible as globals so callers only need
    `tools.<name>`
    - rename text/image helpers and key globals (store, load, ALL_TOOLS,
    etc.) to reflect the new shared namespace
    - update the JS bridge, runners, descriptions, router, and tests to
    follow the new API
    
    Testing
    - Not run (not requested)
  • Rename exec session IDs to cell IDs (#14510)
    - Update the code-mode executor, wait handler, and protocol plumbing to
    use cell IDs instead of session IDs for node communication
    - Switch tool metadata, wait description, and suite tests to refer to
    cell IDs so user-visible messages match the new terminology
    
    **Testing**
    - Not run (not requested)
  • Fix MCP tool calling (#14491)
    Properly escape MCP tool names and make tools available only via
    imports.
  • Reuse tool runtime for code mode worker (#14496)
    ## Summary
    - create the turn-scoped `ToolCallRuntime` before starting the code mode
    worker so the worker reuses the same runtime and router
    - thread the shared runtime through the code mode service/worker path
    and use it for nested tool calls
    - model aborted tool calls as a concrete `ToolOutput` so aborted
    responses still produce valid tool output shapes
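    
    The sharing pattern in the first two bullets can be sketched roughly as
    follows. `ToolRuntime` and `Worker` here are simplified, hypothetical
    stand-ins (not the real `ToolCallRuntime` or worker types), and the use of
    `Arc` is an assumption about how the runtime is shared.
    
    ```rust
    use std::sync::Arc;
    
    // Hypothetical stand-in for the turn-scoped runtime.
    struct ToolRuntime {
        turn_id: u32,
    }
    
    // Hypothetical worker that holds a handle to the shared runtime.
    struct Worker {
        runtime: Arc<ToolRuntime>,
    }
    
    impl Worker {
        // The runtime is created first and handed to the worker at start.
        fn start(runtime: Arc<ToolRuntime>) -> Self {
            Worker { runtime }
        }
    
        // Nested tool calls go through the same runtime as the outer turn.
        fn nested_call(&self, tool: &str) -> String {
            format!("turn {} -> {tool}", self.runtime.turn_id)
        }
    }
    
    fn main() {
        let runtime = Arc::new(ToolRuntime { turn_id: 7 });
        let worker = Worker::start(Arc::clone(&runtime));
        println!("{}", worker.nested_call("shell"));
        // Both handles point at the same runtime instance.
        assert!(Arc::ptr_eq(&runtime, &worker.runtime));
    }
    ```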
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-core` (still running locally)
  • Add default code-mode yield timeout (#14484)
    Summary
    - expose the default yield timeout through code mode runtime so the
    handler, wait tool, and protocol share the same 10s value that matches
    unified exec
    - document the timeout change in the tool descriptions and propagate the
    value all the way into the runner metadata
    - adjust Cargo.lock to keep the dependency tree in sync with the added
    code mode tool dependency
    
    Testing
    - Not run (not requested)
  • Cleanup code_mode tool descriptions (#14480)
    Move to separate files and clarify a bit.
  • Move code mode tool files under tools/code_mode and split functionality (#14476)
    - **Summary**
      - migrate the code mode handler, service, worker, process, runner, and
        bridge assets into the `tools/code_mode` module tree
      - split execution, protocol, and handler logic into dedicated files and
        relocate the tool definition into `code_mode/spec.rs`
      - update core references and tests to stitch the new organization
        together
    - **Testing**
      - Not run (not requested)