Commit Graph

141 Commits

  • Add remote env CI matrix and integration test (#14869)
    `CODEX_TEST_REMOTE_ENV` makes `test_codex` start the executor
    "remotely" (inside a Docker container), turning any integration test
    into a remote test.
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat(core, tracing): create turn spans over websockets (#14632)
    ## Description
    
    Dependent on:
    - [responsesapi] https://github.com/openai/openai/pull/760991 
    - [codex-backend] https://github.com/openai/openai/pull/760985
    
    `codex app-server -> codex-backend -> responsesapi` now reuses a
    persistent websocket connection across many turns. This PR updates
    tracing when using websockets so that each `response.create` websocket
    request propagates the current tracing context, so we can get a holistic
    end-to-end trace for each turn.
    
    Tracing is propagated via special keys (`ws_request_header_traceparent`,
    `ws_request_header_tracestate`) set in the `client_metadata` param in
    Responses API.
    
    Currently, tracing over websockets is partly broken because we only set
    the tracing context at ws connection time, so it is detached from any
    individual `turn/start` request.
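
    As a rough illustration of the propagation described above, the sketch
    below formats a W3C `traceparent` value and stores it under the
    `ws_request_header_traceparent` key of a metadata map. The `TraceContext`
    struct, the `client_metadata_for` helper, and the sample IDs are
    hypothetical; the real client presumably derives these values from the
    active span.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical stand-in for the identifiers of the current span.
    struct TraceContext {
        trace_id: u128,
        span_id: u64,
        sampled: bool,
    }

    impl TraceContext {
        /// Format as a W3C `traceparent` value: version-traceid-parentid-flags.
        fn traceparent(&self) -> String {
            format!(
                "00-{:032x}-{:016x}-{:02x}",
                self.trace_id,
                self.span_id,
                u8::from(self.sampled)
            )
        }
    }

    /// Build per-request metadata (hypothetical helper, not the real API).
    fn client_metadata_for(
        ctx: &TraceContext,
        tracestate: Option<&str>,
    ) -> BTreeMap<String, String> {
        let mut meta = BTreeMap::new();
        meta.insert(
            "ws_request_header_traceparent".to_string(),
            ctx.traceparent(),
        );
        if let Some(state) = tracestate {
            meta.insert(
                "ws_request_header_tracestate".to_string(),
                state.to_string(),
            );
        }
        meta
    }

    fn main() {
        let ctx = TraceContext {
            trace_id: 0x4bf92f3577b34da6a3ce929d0e0e4736,
            span_id: 0x00f067aa0ba902b7,
            sampled: true,
        };
        // Each `response.create` request would carry a fresh map like this.
        for (key, value) in client_metadata_for(&ctx, None) {
            println!("{key} = {value}");
        }
    }
    ```

    Building the map per request, rather than once per connection, is what
    lets a long-lived websocket carry a different trace context for each turn.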
  • Prefer websockets when providers support them (#13592)
    Remove all flags and model settings.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Stabilize Windows cmd-based shell test harnesses (#14958)
    ## What is flaky
    The Windows shell-driven integration tests in `codex-rs/core` were
    intermittently unstable, especially:
    
    - `apply_patch_cli_can_use_shell_command_output_as_patch_input`
    - `websocket_test_codex_shell_chain`
    - `websocket_v2_test_codex_shell_chain`
    
    ## Why it was flaky
    These tests were exercising real shell-tool flows through whichever
    shell Codex selected on Windows, and the `apply_patch` test also nested
    a PowerShell read inside `cmd /c`.
    
    There were multiple independent sources of nondeterminism in that setup:
    
    - The test harness depended on the model-selected Windows shell instead
    of pinning the shell it actually meant to exercise.
    - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
    that could leave the read command wrapped as a literal string instead of
    executing it.
    - Even after getting the quoting right, PowerShell could emit CLIXML
    progress records like module-initialization output onto stdout.
    - The `apply_patch` test was building a patch directly from shell
    stdout, so any quoting artifact or progress noise corrupted the patch
    input.
    
    So the failures were driven by shell startup and output-shape variance,
    not by the `apply_patch` or websocket logic themselves.
    
    ## How this PR fixes it
    - Add a test-only `user_shell_override` path so Windows integration
    tests can pin `cmd.exe` explicitly.
    - Use that override in the websocket shell-chain tests and in the
    `apply_patch` harness.
    - Change the nested Windows file read in
    `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
    PowerShell `-EncodedCommand` script.
    - Run that nested PowerShell process with `-NonInteractive`, set
    `$ProgressPreference = 'SilentlyContinue'`, and read the file with
    `[System.IO.File]::ReadAllText(...)`.
    
    ## Why this fix fixes the flakiness
    The outer harness now runs under a deterministic shell, and the inner
    PowerShell read no longer depends on fragile `cmd` quoting or on
    progress output staying quiet by accident. The shell tool returns only
    the file contents, so patch construction and websocket assertions depend
    on stable test inputs instead of on runner-specific shell behavior.
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
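
    For readers unfamiliar with the convention, here is a minimal sketch of
    what an annotated call site might look like. The `render_banner` function
    is hypothetical and exists only for illustration; the treatment of the
    string literal follows the "leave string and char literals uncommented"
    rule stated above.

    ```rust
    /// Hypothetical API whose positional parameters are opaque at the call site.
    fn render_banner(title: &str, width: usize, uppercase: bool) -> String {
        let text = if uppercase {
            title.to_uppercase()
        } else {
            title.to_string()
        };
        // Center the text within `width` columns.
        format!("{text:^width$}")
    }

    fn main() {
        // The numeric and boolean literals carry an exact `/*param*/` comment;
        // the string literal is left uncommented.
        let banner = render_banner("codex", /*width*/ 11, /*uppercase*/ true);
        println!("[{banner}]");
    }
    ```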
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • Reuse guardian session across approvals (#14668)
    ## Summary
    - reuse a guardian subagent session across approvals so reviews keep a
    stable prompt cache key and avoid one-shot startup overhead
    - clear the guardian child history before each review so prior guardian
    decisions do not leak into later approvals
    - include the `smart_approvals` -> `guardian_approval` feature flag
    rename in the same PR to minimize release latency on a very tight
    timeline
    - add regression coverage for prompt-cache-key reuse without
    prior-review prompt bleed
    
    ## Request
    - Bug/enhancement request: internal guardian prompt-cache and latency
    improvement request
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [apps] Improve search tool fallback. (#14732)
    - [x] Bypass tool search and put tool specs directly into model
    context when either (a) tool search is not available for the model or
    (b) there are too few tools for search to be worthwhile.
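
    The fallback rule can be sketched as a small decision function. The
    names and the threshold value below are hypothetical; the PR description
    does not state the actual cutoff.

    ```rust
    /// Hypothetical cutoff; the real threshold is not given in the PR text.
    const DIRECT_EXPOSURE_MAX_TOOLS: usize = 20;

    #[derive(Debug, PartialEq)]
    enum ToolExposure {
        /// Put every tool spec straight into the model context.
        Direct,
        /// Expose a search tool and let the model discover tools on demand.
        Search,
    }

    fn choose_tool_exposure(model_supports_tool_search: bool, tool_count: usize) -> ToolExposure {
        if !model_supports_tool_search || tool_count <= DIRECT_EXPOSURE_MAX_TOOLS {
            ToolExposure::Direct
        } else {
            ToolExposure::Search
        }
    }

    fn main() {
        println!("{:?}", choose_tool_exposure(true, 5));
        println!("{:?}", choose_tool_exposure(true, 500));
        println!("{:?}", choose_tool_exposure(false, 500));
    }
    ```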
  • [apps] Add tool call meta. (#14647)
    - [x] Add resource_uri and other things to _meta to shortcut resource
    lookup and speed things up.
  • move plugin/skill instructions into dev msg and reorder (#14609)
    Move the general `Apps`, `Skills` and `Plugins` instructions blocks out
    of `user_instructions` and into the developer message, with new `Apps ->
    Skills -> Plugins` order for better clarity.
    
    Also wrap those sections in stable XML-style instruction tags (like
    other sections) and update prompt-layout tests/snapshots. This makes the
    tests less brittle in snapshot output (we can parse the sections), and
    it consolidates the capability instructions in one place.
    
    #### Tests
    Updated snapshots, added tests.
    
    `<AGENTS_MD>` disappearing in snapshots is expected: before this change,
    the wrapped user-instructions message was kept alive by `Skills`
    content. Now that `Skills` and `Plugins` are in the developer message,
    that wrapper only appears when there is real
    project-doc/user-instructions content.
    
    ---------
    
    Co-authored-by: Charley Cunningham <ccunningham@openai.com>
  • Add openai_base_url config override for built-in provider (#12031)
    We regularly get bug reports from users who mistakenly have the
    `OPENAI_BASE_URL` environment variable set. This PR deprecates this
    environment variable in favor of a top-level config key
    `openai_base_url` that is used for the same purpose. By making it a
    config key, it will be more visible to users. It will also participate
    in all of the infrastructure we've added for layered and managed
    configs.
    
    Summary
    - introduce the `openai_base_url` top-level config key, update
    schema/tests, and route the built-in openai provider through it
    - fall back to the deprecated `OPENAI_BASE_URL` env var, with a
    deprecation warning, when no `openai_base_url` config key is present
    - update CLI, SDK, and TUI code to prefer the new config path (with a
    deprecated env-var fallback) and document the SDK behavior change
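
    The precedence described above can be sketched as follows. This is an
    illustration under assumptions, not the actual implementation: the
    `ResolvedBaseUrl` type, the `resolve_openai_base_url` function, the
    warning text, and the default URL constant are all hypothetical.

    ```rust
    /// Assumed default endpoint for the built-in provider.
    const DEFAULT_OPENAI_BASE_URL: &str = "https://api.openai.com/v1";

    #[derive(Debug, PartialEq)]
    struct ResolvedBaseUrl {
        url: String,
        /// Set only when the deprecated env var supplied the value.
        deprecation_warning: Option<String>,
    }

    /// Config key wins; the env var is a deprecated fallback; else the default.
    fn resolve_openai_base_url(
        config_value: Option<&str>,
        env_value: Option<&str>,
    ) -> ResolvedBaseUrl {
        match (config_value, env_value) {
            (Some(url), _) => ResolvedBaseUrl {
                url: url.to_string(),
                deprecation_warning: None,
            },
            (None, Some(url)) => ResolvedBaseUrl {
                url: url.to_string(),
                deprecation_warning: Some(
                    "OPENAI_BASE_URL is deprecated; set `openai_base_url` in config instead"
                        .to_string(),
                ),
            },
            (None, None) => ResolvedBaseUrl {
                url: DEFAULT_OPENAI_BASE_URL.to_string(),
                deprecation_warning: None,
            },
        }
    }

    fn main() {
        let env_value = std::env::var("OPENAI_BASE_URL").ok();
        let resolved = resolve_openai_base_url(None, env_value.as_deref());
        if let Some(warning) = &resolved.deprecation_warning {
            eprintln!("warning: {warning}");
        }
        println!("base url: {}", resolved.url);
    }
    ```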
  • [apps] Add tool_suggest tool. (#14287)
    - [x] Add tool_suggest tool.
    - [x] Move chatgpt/src/connectors.rs and core/src/connectors.rs into a
    dedicated mod so that we have all the logic and global cache in one
    place.
    - [x] Update TUI app link view to support rendering the installation
    view for mcp elicitation.
    
    ---------
    
    Co-authored-by: Shaqayeq <shaqayeq@openai.com>
    Co-authored-by: Eric Traut <etraut@openai.com>
    Co-authored-by: pakrym-oai <pakrym@openai.com>
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
    Co-authored-by: guinness-oai <guinness@openai.com>
    Co-authored-by: Eugene Brevdo <ebrevdo@users.noreply.github.com>
    Co-authored-by: Charlie Guo <cguo@openai.com>
    Co-authored-by: Fouad Matin <fouad@openai.com>
    Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>
    Co-authored-by: xl-openai <xl@openai.com>
    Co-authored-by: alexsong-oai <alexsong@openai.com>
    Co-authored-by: Owen Lin <owenlin0@gmail.com>
    Co-authored-by: sdcoffey <stevendcoffey@gmail.com>
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Won Park <won@openai.com>
    Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
    Co-authored-by: celia-oai <celia@openai.com>
    Co-authored-by: gabec-openai <gabec@openai.com>
    Co-authored-by: joeytrasatti-openai <joey.trasatti@openai.com>
    Co-authored-by: Leo Shimonaka <leoshimo@openai.com>
    Co-authored-by: Rasmus Rygaard <rasmus@openai.com>
    Co-authored-by: maja-openai <163171781+maja-openai@users.noreply.github.com>
    Co-authored-by: pash-openai <pash@openai.com>
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • feat(app-server): propagate traces across tasks and core ops (#14387)
    ## Summary
    
    This PR keeps app-server RPC request trace context alive for the full
    lifetime of the work that request kicks off (e.g. for `thread/start`,
    this is `app-server rpc handler -> tokio background task -> core op
    submissions`). Previously we lost trace lineage once the request handler
    returned or handed work off to background tasks.
    
    This approach is especially relevant for `thread/start` and other RPC
    handlers that run in a non-blocking way. In the near future we'll most
    likely want to make all app-server handlers run in a non-blocking way by
    default, and only queue operations that must operate in order (e.g.
    thread RPCs per thread?), so we want to make sure tracing in app-server
    just generally works.
    
    Depends on https://github.com/openai/codex/pull/14300
    
    **Before**
    <img width="155" height="207" alt="image"
    src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
    />
    
    
    **After**
    <img width="299" height="337" alt="image"
    src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
    />
    
    ## What changed
    
    - Keep request-scoped trace context around until we send the final
    response or error, or the connection closes.
    - Thread that trace context through detached `thread/start` work so
    background startup stays attached to the originating request.
    - Pass request trace context through to downstream core operations,
    including:
      - thread creation
      - resume/fork flows
      - turn submission
      - review
      - interrupt
      - realtime conversation operations
    - Add tracing tests that verify:
      - remote W3C trace context is preserved for `thread/start`
      - remote W3C trace context is preserved for `turn/start`
      - downstream core spans stay under the originating request span
      - request-scoped tracing state is cleaned up correctly
    - Clean up shutdown behavior so detached background tasks and spawned
    threads are drained before process exit.
  • feat: search_tool migrate to bring you own tool of Responses API (#14274)
    ## Why
    
    To support the new bring-your-own search tool in the Responses API
    (https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search),
    we are migrating our bm25 search tool to the official way of executing
    search on the client and communicating additional tools to the model.
    
    ## What
    - replace the legacy `search_tool_bm25` flow with client-executed
    `tool_search`
    - add protocol, SSE, history, and normalization support for
    `tool_search_call` and `tool_search_output`
    - return namespaced Codex Apps search results and wire namespaced
    follow-up tool calls back into MCP dispatch
  • Stabilize websocket response.failed error delivery (#14017)
    ## What changed
    - Drop failed websocket connections immediately after a terminal stream
    error instead of awaiting a graceful close handshake before forwarding
    the error to the caller.
    - Keep the success path and the closed-connection guard behavior
    unchanged.
    
    ## Why this fixes the flake
    - The failing integration test waits for the second websocket stream to
    surface the model error before issuing a follow-up request.
    - On slower runners, the old error path awaited
    `ws_stream.close().await` before sending the error downstream. If that
    close handshake stalled, the test kept waiting for an error that had
    already happened server-side and nextest timed it out.
    - Dropping the failed websocket immediately makes the terminal error
    observable right away and marks the session closed so the next request
    reconnects cleanly instead of depending on a best-effort close
    handshake.
    
    ## Code or test?
    - This is a production logic fix in `codex-api`. The existing websocket
    integration test already exercises the regression path.
  • feat: support disabling bundled system skills (#13792)
    Support disabling bundled system skills via config:
    
    [skills.bundled]
    enabled = false
  • fix(ci) fix guardian ci (#13911)
    ## Summary
    #13910 was merged with some unused imports, let's fix this
    
    ## Testing
    - [x] Let's make sure CI is green
    
    ---------
    
    Co-authored-by: Charles Cunningham <ccunningham@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • image-gen-core (#13290)
    Core tool-calling for image-gen; handles the request and response logic
    for images using the Responses API.
  • core: box wrapper futures to reduce stack pressure (#13429)
    Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This
    uses the same general fix pattern as
    [#12421](https://github.com/openai/codex/pull/12421), but in the
    `codex-core` compact/resume/fork path.
    
    ## Why
    
    `compact_resume_after_second_compaction_preserves_history` started
    overflowing the stack on Windows CI after `#13388`.
    
    The important part is that this was not a compaction-recursion bug. The
    test exercises a path with several thin `async fn` wrappers around much
    larger thread-spawn, resume, and fork futures. When one `async fn`
    awaits another inline, the outer future stores the callee future as part
    of its own state machine. In a long wrapper chain, that means a caller
    can accidentally inline a lot more state than the source code suggests.
    
    That is exactly what was happening here:
    
    - `ThreadManager` convenience methods such as `start_thread`,
    `resume_thread_from_rollout`, and `fork_thread` were inlining the larger
    spawn/resume futures beneath them.
    - `core_test_support::test_codex` added another wrapper layer on top of
    those same paths.
    - `compact_resume_fork` adds a few more helpers, and this particular
    test drives the resume/fork path multiple times.
    
    On Windows, that was enough to push both the libtest thread and Tokio
    worker threads over the edge. The previous 8 MiB test-thread workaround
    proved the failure was stack-related, but it did not address the
    underlying future size.
    
    ## How This Was Debugged
    
    The useful debugging pattern here was to turn the CI-only failure into a
    local low-stack repro.
    
    1. First, remove the explicit large-stack harness so the test runs on
    the normal `#[tokio::test]` path.
    2. Build the test binary normally.
    3. Re-run the already-built `tests/all` binary directly with
    progressively smaller `RUST_MIN_STACK` values.
    
    Running the built binary directly matters: it keeps the reduced stack
    size focused on the test process instead of also applying it to `cargo`
    and `rustc`.
    
    That made it possible to answer two questions quickly:
    
    - Does the failure still reproduce without the workaround? Yes.
    - Does boxing the wrapper futures actually buy back stack headroom? Also
    yes.
    
    After this change, the built test binary passes with
    `RUST_MIN_STACK=917504` and still overflows at `786432`, which is enough
    evidence to justify removing the explicit 8 MiB override while keeping a
    deterministic low-stack repro for future debugging.
    
    If we hit a similar issue again, the first places to inspect are thin
    `async fn` wrappers that mostly forward into a much larger async
    implementation.
    
    ## `Box::pin()` Primer
    
    `async fn` compiles into a state machine. If a wrapper does this:
    
    ```rust
    async fn wrapper() {
        inner().await;
    }
    ```
    
    then `wrapper()` stores the full `inner()` future inline as part of its
    own state.
    
    If the wrapper instead does this:
    
    ```rust
    async fn wrapper() {
        Box::pin(inner()).await;
    }
    ```
    
    then the child future lives on the heap, and the outer future only
    stores a pinned pointer to it. That usually trades one allocation for a
    substantially smaller outer future, which is exactly the tradeoff we
    want when the problem is stack pressure rather than raw CPU time.
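
    To see the effect concretely, the following sketch compares the sizes of
    an inline wrapper future and a boxed wrapper future using
    `std::mem::size_of_val`. The function names and the 4096-byte buffer are
    illustrative only and do not come from the Codex code; exact sizes depend
    on the compiler version, so the program prints them rather than assuming
    specific values.

    ```rust
    use std::mem::size_of_val;

    /// Stand-in for a large async implementation: the buffer is used after
    /// an await point, so it has to live inside the future's state machine.
    async fn inner() {
        let buf = [0u8; 4096];
        std::future::ready(()).await;
        std::hint::black_box(&buf);
    }

    /// Thin wrapper that stores the whole `inner()` future inline.
    async fn wrapper_inline() {
        inner().await;
    }

    /// Thin wrapper that keeps only a pinned heap pointer to `inner()`.
    async fn wrapper_boxed() {
        Box::pin(inner()).await;
    }

    /// Returns the sizes of (inner, inline wrapper, boxed wrapper) futures.
    /// The futures are only constructed here, never polled.
    fn future_sizes() -> (usize, usize, usize) {
        (
            size_of_val(&inner()),
            size_of_val(&wrapper_inline()),
            size_of_val(&wrapper_boxed()),
        )
    }

    fn main() {
        let (inner_size, inline_size, boxed_size) = future_sizes();
        println!("inner future:          {inner_size} bytes");
        println!("inline wrapper future: {inline_size} bytes");
        println!("boxed wrapper future:  {boxed_size} bytes");
    }
    ```

    Measuring with `size_of_val` in a scratch test like this can be a quick
    way to find which wrapper layers are worth boxing before changing
    production code.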
    
    Useful references:
    
    - [`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin)
    - [Async book: Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html)
    
    ## What Changed
    
    - Boxed the wrapper futures in `core/src/thread_manager.rs` around
    `start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the
    corresponding `ThreadManagerState` spawn helpers so callers no longer
    inline the full spawn/resume state machine through multiple layers.
    - Boxed the matching test-only wrapper futures in
    `core/tests/common/test_codex.rs` and
    `core/tests/suite/compact_resume_fork.rs`, which sit directly on top of
    the same path.
    - Restored `compact_resume_after_second_compaction_preserves_history` in
    `core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]`
    and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing.
    - Simplified a tiny helper in `compact_resume_fork` by making
    `fetch_conversation_path()` synchronous, which removes one more
    unnecessary future layer from the test path.
    
    ## Verification
    
    - `cargo test -p codex-core --test all
    suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
    -- --exact --nocapture`
    - `cargo test -p codex-core --test all suite::compact_resume_fork --
    --nocapture`
    - Re-ran the built `codex-core` `tests/all` binary directly with reduced
    stack sizes:
      - `RUST_MIN_STACK=917504` passes
      - `RUST_MIN_STACK=786432` still overflows
    - `cargo test -p codex-core`
    - Still fails locally in unrelated existing integration areas that
    expect the `codex` / `test_stdio_server` binaries or hit the existing
    `search_tool` wiremock mismatches.
  • config: enforce enterprise feature requirements (#13388)
    ## Why
    
    Enterprises can already constrain approvals, sandboxing, and web search
    through `requirements.toml` and MDM, but feature flags were still only
    configurable as managed defaults. That meant an enterprise could suggest
    feature values, but it could not actually pin them.
    
    This change closes that gap and makes enterprise feature requirements
    behave like the other constrained settings. The effective feature set
    now stays consistent with enterprise requirements during config load,
    when config writes are validated, and when runtime code mutates feature
    flags later in the session.
    
    It also tightens the runtime API for managed features. `ManagedFeatures`
    now follows the same constraint-oriented shape as `Constrained<T>`
    instead of exposing panic-prone mutation helpers, and production code
    can no longer construct it through an unconstrained `From<Features>`
    path.
    
    The PR also hardens the `compact_resume_fork` integration coverage on
    Windows. After the feature-management changes,
    `compact_resume_after_second_compaction_preserves_history` was
    overflowing the libtest/Tokio thread stacks on Windows, so the test now
    uses an explicit larger-stack harness as a pragmatic mitigation. That
    may not be the ideal root-cause fix, and it merits a parallel
    investigation into whether part of the async future chain should be
    boxed to reduce stack pressure instead.
    
    ## What Changed
    
    Enterprises can now pin feature values in `requirements.toml` with the
    requirements-side `features` table:
    
    ```toml
    [features]
    personality = true
    unified_exec = false
    ```
    
    Only canonical feature keys are allowed in the requirements `features`
    table; omitted keys remain unconstrained.
    
    - Added a requirements-side pinned feature map to
    `ConfigRequirementsToml`, threaded it through source-preserving
    requirements merge and normalization in `codex-config`, and made the
    TOML surface use `[features]` (while still accepting legacy
    `[feature_requirements]` for compatibility).
    - Exposed `featureRequirements` from `configRequirements/read`,
    regenerated the JSON/TypeScript schema artifacts, and updated the
    app-server README.
    - Wrapped the effective feature set in `ManagedFeatures`, backed by
    `ConstrainedWithSource<Features>`, and changed its API to mirror
    `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
    and result-returning `enable` / `disable` / `set_enabled` helpers.
    - Removed the legacy-usage and bulk-map passthroughs from
    `ManagedFeatures`; callers that need those behaviors now mutate a plain
    `Features` value and reapply it through `set(...)`, so the constrained
    wrapper remains the enforcement boundary.
    - Removed the production loophole for constructing unconstrained
    `ManagedFeatures`. Non-test code now creates it through the configured
    feature-loading path, and `impl From<Features> for ManagedFeatures` is
    restricted to `#[cfg(test)]`.
    - Rejected legacy feature aliases in enterprise feature requirements,
    and return a load error when a pinned combination cannot survive
    dependency normalization.
    - Validated config writes against enterprise feature requirements before
    persisting changes, including explicit conflicting writes and
    profile-specific feature states that normalize into invalid
    combinations.
    - Updated runtime and TUI feature-toggle paths to use the constrained
    setter API and to persist or apply the effective post-constraint value
    rather than the requested value.
    - Updated the `core_test_support` Bazel target to include the bundled
    core model-catalog fixtures in its runtime data, so helper code that
    resolves `core/models.json` through runfiles works in remote Bazel test
    environments.
    - Renamed the core config test coverage to emphasize that effective
    feature values are normalized at runtime, while conflicting persisted
    config writes are rejected.
    - Ran `compact_resume_after_second_compaction_preserves_history` inside
    an explicit 8 MiB test thread and Tokio runtime worker stack, following
    the existing larger-stack integration-test pattern, to keep the Windows
    `compact_resume_fork` test slice from aborting while a parallel
    investigation continues into whether some of the underlying async
    futures should be boxed.
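
    The constraint-oriented setter shape described in this list can be
    approximated with a much smaller model. Everything below is a
    hypothetical simplification: `PinnedFeatures`, its fields, the
    `String` error type, and the `some_unpinned_feature` key are
    illustrative stand-ins and do not reproduce the real `ManagedFeatures`
    or `Constrained<T>` definitions.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical, simplified model of a feature set with pinned values.
    struct PinnedFeatures {
        /// Values pinned by the requirements-side `[features]` table.
        pinned: BTreeMap<String, bool>,
        /// Effective values after constraints are applied.
        effective: BTreeMap<String, bool>,
    }

    impl PinnedFeatures {
        fn new(pinned: BTreeMap<String, bool>) -> Self {
            // Pinned values seed the effective set; other keys start unset.
            let effective = pinned.clone();
            Self { pinned, effective }
        }

        /// Check a change without applying it.
        fn can_set(&self, key: &str, enabled: bool) -> Result<(), String> {
            match self.pinned.get(key) {
                Some(&required) if required != enabled => Err(format!(
                    "feature `{key}` is pinned to {required} by requirements"
                )),
                _ => Ok(()),
            }
        }

        /// Apply a change, returning an error instead of panicking on conflict.
        fn set_enabled(&mut self, key: &str, enabled: bool) -> Result<(), String> {
            self.can_set(key, enabled)?;
            self.effective.insert(key.to_string(), enabled);
            Ok(())
        }
    }

    fn main() {
        let mut pinned = BTreeMap::new();
        pinned.insert("unified_exec".to_string(), false);
        let mut features = PinnedFeatures::new(pinned);

        // Conflicts with the pinned value, so it is rejected.
        println!("{:?}", features.set_enabled("unified_exec", true));
        // Omitted keys remain unconstrained.
        println!("{:?}", features.set_enabled("some_unpinned_feature", true));
    }
    ```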
    
    ## Verification
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core feature_requirements_ -- --nocapture`
    - `cargo test -p codex-core
    load_requirements_toml_produces_expected_constraints -- --nocapture`
    - `cargo test -p codex-core
    compact_resume_after_second_compaction_preserves_history -- --nocapture`
    - `cargo test -p codex-core compact_resume_fork -- --nocapture`
    - Re-ran the built `codex-core` `tests/all` binary with
    `RUST_MIN_STACK=262144` for
    `compact_resume_after_second_compaction_preserves_history` to confirm
    the explicit-stack harness fixes the deterministic low-stack repro.
    - `cargo test -p codex-core`
    - This still fails locally in unrelated integration areas that expect
    the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
    wiremock mismatches.
    
    ## Docs
    
    `developers.openai.com/codex` should document the requirements-side
    `[features]` table for enterprise and MDM-managed configuration,
    including that it only accepts canonical feature keys and that
    conflicting config writes are rejected.
  • feat: load plugin apps (#13401)
    load plugin-apps from `.app.json`.
    
    make apps runtime-mentionable iff `codex_apps` MCP actually exposes
    tools for that `connector_id`.
    
    if the app isn't available, it's filtered out of runtime connector set,
    so no tools are added and no app-mentions resolve.
    
    right now we don't have a clean cli-side error for an app not being
    installed. can look at this after.
    
    ### Tests
    Added tests, tested locally that using a plugin that bundles an app
    picks up the app.
  • add fast mode toggle (#13212)
    - add a local Fast mode setting in codex-core (similar to how model id
    is currently stored on disk locally)
    - send `service_tier=priority` on requests when Fast is enabled
    - add `/fast` in the TUI and persist it locally
    - gate the change behind a feature flag
  • Update realtime websocket API (#13265)
    - migrate the realtime websocket transport to the new session and
    handoff flow
    - make the realtime model configurable in config.toml and use API-key
    auth for the websocket
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Record realtime close marker on replacement (#13058)
    ## Summary
    - record a realtime close developer message when a new realtime session
    replaces an active one
    - assert the replacement marker through the mocked responses request
    path
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Charles Cunningham <ccunningham@openai.com>
  • core: adopt host_executable() rules in zsh-fork (#13046)
    ## Why
    
    [#12964](https://github.com/openai/codex/pull/12964) added
    `host_executable()` support to `codex-execpolicy`, but the zsh-fork
    interception path in `unix_escalation.rs` was still evaluating commands
    with the default exact-token matcher.
    
    That meant an intercepted absolute executable such as `/usr/bin/git
    status` could still miss basename rules like `prefix_rule(pattern =
    ["git", "status"])`, even when the policy also defined a matching
    `host_executable(name = "git", ...)` entry.
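
    The mismatch can be illustrated with a toy matcher. This is a sketch
    under assumptions and not the `codex-execpolicy` implementation: the
    `matches_prefix_rule` function is hypothetical, and it assumes, purely
    for illustration, that a host executable entry maps a basename to a list
    of allowed absolute Unix-style paths.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical matcher: does `argv` match the prefix `pattern`?
    /// `host_executables` maps a basename to allowed absolute paths (assumed shape).
    fn matches_prefix_rule(
        pattern: &[&str],
        argv: &[&str],
        host_executables: &HashMap<&str, Vec<&str>>,
        resolve_host_executables: bool,
    ) -> bool {
        let Some((&program, args)) = argv.split_first() else {
            return false;
        };
        let Some((&rule_program, rule_args)) = pattern.split_first() else {
            return false;
        };

        let program_matches = if program == rule_program {
            // Exact-token match, the default behavior.
            true
        } else if resolve_host_executables && program.starts_with('/') {
            // Absolute path: compare its basename and require an allowed mapping.
            let basename = program.rsplit('/').next().unwrap_or(program);
            basename == rule_program
                && host_executables
                    .get(rule_program)
                    .is_some_and(|paths| paths.contains(&program))
        } else {
            false
        };

        program_matches && args.starts_with(rule_args)
    }

    fn main() {
        let mut hosts = HashMap::new();
        hosts.insert("git", vec!["/usr/bin/git"]);
        let rule = ["git", "status"];
        let argv = ["/usr/bin/git", "status"];

        println!("exact-token only: {}", matches_prefix_rule(&rule, &argv, &hosts, false));
        println!("with resolution:  {}", matches_prefix_rule(&rule, &argv, &hosts, true));
    }
    ```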
    
    This PR adopts the new matching behavior in the zsh-fork runtime only.
    That keeps the rollout intentionally narrow: zsh-fork already requires
    explicit user opt-in, so it is a safer first caller to exercise the new
    `host_executable()` scheme before expanding it to other execpolicy call
    sites.
    
    It also brings zsh-fork back in line with the current `prefix_rule()`
    execution model. Until prefix rules can carry their own permission
    profiles, a matched `prefix_rule()` is expected to rerun the intercepted
    command unsandboxed on `allow`, or after the user accepts `prompt`,
    instead of merely continuing inside the inherited shell sandbox.
    
    ## What Changed
    
    - added `evaluate_intercepted_exec_policy()` in
    `core/src/tools/runtimes/shell/unix_escalation.rs` to centralize
    execpolicy evaluation for intercepted commands
    - switched intercepted direct execs in the zsh-fork path to
    `check_multiple_with_options(...)` with `MatchOptions {
    resolve_host_executables: true }`
    - added `commands_for_intercepted_exec_policy()` so zsh-fork policy
    evaluation works from intercepted `(program, argv)` data instead of
    reconstructing a synthetic command before matching
    - left shell-wrapper parsing intentionally disabled by default behind
    `ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so
    path-sensitive matching relies on later direct exec interception rather
    than shell-script parsing
    - made matched `prefix_rule()` decisions rerun intercepted commands with
    `EscalationExecution::Unsandboxed`, while unmatched-command fallback
    keeps the existing sandbox-preserving behavior
    - extracted the zsh-fork test harness into
    `core/tests/common/zsh_fork.rs` so both the skill-focused and
    approval-focused integration suites can exercise the same runtime setup
    - limited this change to the intercepted zsh-fork path rather than
    changing every execpolicy caller at once
    - added runtime coverage in
    `core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and
    disallowed `host_executable()` mappings and the wrapper-parsing modes
    - added integration coverage in `core/tests/suite/approvals.rs` to
    verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns
    under zsh-fork outside a restrictive `WorkspaceWrite` sandbox
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13046).
    * #13065
    * __->__ #13046
  • Support multimodal custom tool outputs (#12948)
    ## Summary
    
    This changes `custom_tool_call_output` to use the same output payload
    shape as `function_call_output`, so freeform tools can return either
    plain text or structured content items.
    
    The main goal is to let `js_repl` return image content from nested
    `view_image` calls in its own `custom_tool_call_output`, instead of
    relying on a separate injected message.
    
    ## What changed
    
    - Changed `custom_tool_call_output.output` from `string` to
    `FunctionCallOutputPayload`
    - Updated freeform tool plumbing to preserve structured output bodies
    - Updated `js_repl` to aggregate nested tool content items and attach
    them to the outer `js_repl` result
    - Removed the old `js_repl` special case that injected `view_image`
    results as a separate pending user image message
    - Updated normalization/history/truncation paths to handle multimodal
    `custom_tool_call_output`
    - Regenerated app-server protocol schema artifacts
    
    ## Behavior
    
    Direct `view_image` calls still return a `function_call_output` with
    image content.
    
    When `view_image` is called inside `js_repl`, the outer `js_repl`
    `custom_tool_call_output` now carries:
    - an `input_text` item if the JS produced text output
    - one or more `input_image` items from nested tool results
    
    So the nested image result now stays inside the `js_repl` tool output
    instead of being injected as a separate message.
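
    The aggregation described above can be sketched as follows. This is a
    minimal illustration, not the actual plumbing: `ContentItem` and
    `aggregate_js_repl_output` are hypothetical names, and the real
    `FunctionCallOutputPayload` type may be shaped differently.

    ```rust
    /// Hypothetical stand-in for a structured tool-output content item.
    #[derive(Debug, Clone, PartialEq)]
    enum ContentItem {
        InputText { text: String },
        InputImage { image_url: String },
    }

    /// Builds the outer `js_repl` output: an `input_text` item if the JS
    /// produced text, followed by the image items from nested tool calls.
    fn aggregate_js_repl_output(js_text: &str, nested: Vec<ContentItem>) -> Vec<ContentItem> {
        let mut items = Vec::new();
        if !js_text.is_empty() {
            items.push(ContentItem::InputText { text: js_text.to_string() });
        }
        // Keep only image items from nested results (assumption for this sketch).
        items.extend(
            nested
                .into_iter()
                .filter(|item| matches!(item, ContentItem::InputImage { .. })),
        );
        items
    }
    ```

    With this shape, a `js_repl` call that prints nothing but views one image
    yields a single `input_image` item rather than a separate message.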
    
    ## Compatibility
    
    This is intended to be backward-compatible for resumed conversations.
    
    Older histories that stored `custom_tool_call_output.output` as a plain
    string still deserialize correctly, and older histories that used the
    previous injected-image-message flow also continue to resume.
    
    Added regression coverage for resuming a pre-change rollout containing:
    - string-valued `custom_tool_call_output`
    - legacy injected image message history
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/12948
  • Make realtime audio test deterministic (#12959)
    ## Summary
    - add a websocket test-server request waiter so tests can synchronize on
    recorded client messages
    - use that waiter in the realtime delegation test instead of a fixed
    audio timeout
    - add temporary timing logs in the test and websocket mock to inspect
    where the flake stalls
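
    A request waiter of this kind can be sketched with a mutex and a condition
    variable. This is only an illustration of the idea; `RequestLog`,
    `record`, and `wait_for` are hypothetical names, and the actual test
    server is presumably async rather than thread-based.

    ```rust
    use std::sync::{Arc, Condvar, Mutex};
    use std::time::{Duration, Instant};

    /// Hypothetical recorder shared between a mock server and the test body.
    #[derive(Clone, Default)]
    struct RequestLog {
        inner: Arc<(Mutex<Vec<String>>, Condvar)>,
    }

    impl RequestLog {
        /// Called by the mock server whenever a client message arrives.
        fn record(&self, msg: &str) {
            let (lock, cvar) = &*self.inner;
            lock.lock().unwrap().push(msg.to_string());
            cvar.notify_all();
        }

        /// Blocks until at least `count` messages were recorded or `timeout`
        /// elapses. Returns a snapshot on success and `None` on timeout.
        fn wait_for(&self, count: usize, timeout: Duration) -> Option<Vec<String>> {
            let (lock, cvar) = &*self.inner;
            let deadline = Instant::now() + timeout;
            let mut guard = lock.lock().unwrap();
            while guard.len() < count {
                // `None` here means the deadline has already passed.
                let remaining = deadline.checked_duration_since(Instant::now())?;
                let (next, _) = cvar.wait_timeout(guard, remaining).unwrap();
                guard = next;
            }
            Some(guard.clone())
        }
    }
    ```

    The test then waits on the recorded message itself, so it returns as soon
    as the message is recorded and only pays the timeout on failure.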
  • Allow clients not to send summary as an option (#12950)
    Summary is a required parameter on UserTurn. Ideally we'd like the core
    to decide the appropriate summary level.
    
    Make the summary optional and don't send it when not needed.
  • core: bundle settings diff updates into one dev/user envelope (#12417)
    ## Summary
    - bundle contextual prompt injection into at most one developer message
    plus one contextual user message in both:
      - per-turn settings updates
      - initial context insertion
    - preserve `<model_switch>` across compaction by rebuilding it through
    canonical initial-context injection, instead of relying on
    strip/reattach hacks
    - centralize contextual user fragment detection in one shared definition
    table and reuse it for parsing/compaction logic
    - keep `AGENTS.md` in its natural serialized format:
      - `# AGENTS.md instructions for {dirname}`
      - `<INSTRUCTIONS>...</INSTRUCTIONS>`
    - simplify related tests/helpers and accept the expected snapshot/layout
    updates from bundled multi-part messages
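
    The bundling rule in the first bullet can be sketched as below. `Role`,
    `Message`, and `bundle` are hypothetical simplifications; the real paths
    return `Vec<ResponseItem>`.

    ```rust
    /// Hypothetical roles for contextual fragments.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Role {
        Developer,
        User,
    }

    /// Hypothetical multi-part message.
    #[derive(Debug, PartialEq)]
    struct Message {
        role: Role,
        parts: Vec<String>,
    }

    /// Groups fragments into at most one developer message followed by at
    /// most one contextual user message, keeping order within each role.
    fn bundle(fragments: Vec<(Role, String)>) -> Vec<Message> {
        let mut dev = Vec::new();
        let mut user = Vec::new();
        for (role, text) in fragments {
            match role {
                Role::Developer => dev.push(text),
                Role::User => user.push(text),
            }
        }
        let mut out = Vec::new();
        if !dev.is_empty() {
            out.push(Message { role: Role::Developer, parts: dev });
        }
        if !user.is_empty() {
            out.push(Message { role: Role::User, parts: user });
        }
        out
    }
    ```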
    
    ## Why
    The goal is to converge toward a simpler, more intentional prompt shape
    where contextual updates are consistently represented as one developer
    envelope plus one contextual user envelope, while keeping parsing and
    compaction behavior aligned with that representation.
    
    ## Notable details
    - the temporary `SettingsUpdateEnvelope` wrapper was removed; these
    paths now return `Vec<ResponseItem>` directly
    - local/remote compaction no longer rely on model-switch strip/restore
    helpers
    - contextual user detection is now driven by shared fragment definitions
    instead of ad hoc matcher assembly
    - AGENTS/user instructions are still the same logical context; only the
    synthetic `<user_instructions>` wrapper was replaced by the natural
    AGENTS text format
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-app-server
    codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
    -- --exact`
    - `cargo test -p codex-core
    compact::tests::collect_user_messages_filters_session_prefix_entries
    --lib -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_developer_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_user_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::resume_includes_initial_messages_and_sends_prior_items'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::review::review_input_isolated_from_parent_history' -- --exact`
    - `cargo test -p codex-exec --test all
    'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
    --exact`
    - `cargo test -p core_test_support
    context_snapshot::tests::full_text_mode_preserves_unredacted_text --
    --exact`
    
    ## Notes
    - I also ran several targeted `compact`, `compact_remote`,
    `prompt_caching`, `model_visible_layout`, and `event_mapping` tests
    while iterating on prompt-shape changes.
    - I have not claimed a clean full-workspace `cargo test` from this
    environment because local sandbox/resource conditions have previously
    produced unrelated failures in large workspace runs.
  • Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
    ## Summary
    - Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
    auto-export, and store results in SQLite.
    - Simplify workflow: remove run/resume/get-status/export tools; spawn is
    deterministic and completes in one call.
    - Improve exec UX: stable, single-line progress bar with ETA; suppress
    sub-agent chatter in exec.
    
    ## Why
    Enables map-reduce style workflows over arbitrarily large repos using
    the existing Codex orchestrator. This addresses review feedback about
    overly complex job controls and non-deterministic monitoring.
    
    ## Demo (progress bar)
    ```
    ./codex-rs/target/debug/codex exec \
      --enable collab \
      --enable sqlite \
      --full-auto \
      --progress-cursor \
      -c agents.max_threads=16 \
      -C /Users/daveaitel/code/codex \
      - <<'PROMPT'
    Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
    path = item-01..item-30, area = test.
    
    Then call spawn_agents_on_csv with:
    - csv_path: /tmp/agent_job_progress_demo.csv
    - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
    - output_csv_path: /tmp/agent_job_progress_demo_out.csv
    PROMPT
    ```
    
    ## Review feedback addressed
    - Auto-start jobs on spawn; removed run/resume/status/export tools.
    - Auto-export on success.
    - More descriptive tool spec + clearer prompts.
    - Avoid deadlocks on spawn failure; pending/running handled safely.
    - Progress bar no longer scrolls; stable single-line redraw.
    
    ## Tests
    - `cd codex-rs && cargo test -p codex-exec`
    - `cd codex-rs && cargo build -p codex-cli`
  • Send warmup request (#11258)
    Send a request with `generate: false` but a full set of tools and
    instructions to pre-warm inference.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518)
    ## Why
    
    The zsh integration tests were still brittle in two ways:
    
    - they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
    they often did not exercise the patched zsh fork that `shell-tool-mcp`
    ships
    - once the tests consistently used the vendored zsh fork, they exposed
    real Linux-specific zsh-fork issues in CI
    
    In particular, the Linux failures were not just test noise:
    
    - the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
    `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
    could receive malformed arguments
    - the
    `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
    test uses the zsh exec bridge (which talks to the parent over a Unix
    socket), but Linux restricted sandbox seccomp denies `connect(2)`,
    causing timeouts on `ubuntu-24.04` x86/arm
    
    This PR makes the zsh tests consistently run against the intended
    vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
    signal is meaningful.
    
    ## What Changed
    
    - Added a single shared test-only DotSlash file for the patched zsh fork
    at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
    `bash` test resource).
    - Updated both app-server and exec-server zsh tests to use that shared
    DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
    dependency).
    - Updated the app-server zsh-fork test helper to resolve the shared
    DotSlash zsh and avoid silently falling back to host zsh.
    - Kept the app-server zsh-fork tests configured via `config.toml`, using
    a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
    to `-c`) for the subcommand-decline test.
    - Hardened the app-server subcommand-decline zsh-fork test for CI
    variability:
      - tolerate an extra `/responses` POST with a no-op mock response
      - tolerate non-target approval ordering while remaining strict on the
    two `/usr/bin/true` approvals and decline behavior
      - use `DangerFullAccess` on Linux for this one test because it validates
    zsh approval flow, not Linux sandbox socket restrictions
    - Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
    `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
    arg0 dispatch continues to work.
    - Moved `maybe_run_zsh_exec_wrapper_mode()` under
    `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
    handling coexists correctly with arg0-dispatched helper modes.
    - Consolidated duplicated `dotslash -- fetch` resolution logic into
    shared test support (`core/tests/common/lib.rs`).
    - Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
    use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
    differences by:
      - resolving an absolute `git` path
      - running `git init --quiet .`
      - asserting success / `.git` creation instead of relying on banner text
    
    ## Verification
    
    - `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
    - `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
    - `bazel test //codex-rs/exec-server:exec-server-all-test
    --test_output=streamed --test_arg=--nocapture
    --test_arg=accept_elicitation_for_prompt_rule_with_zsh`
    - CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
    x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
    aarch64-unknown-linux-gnu` passed in [run
    22291424358](https://github.com/openai/codex/actions/runs/22291424358)
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
    `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
    parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • Fix compaction context reinjection and model baselines (#12252)
    ## Summary
    - move regular-turn context diff/full-context persistence into
    `run_turn` so pre-turn compaction runs before incoming context updates
    are recorded
    - after successful pre-turn compaction, rely on a cleared
    `reference_context_item` to trigger full context reinjection on the
    follow-up regular turn (manual `/compact` keeps replacement history
    summary-only and also clears the baseline)
    - preserve `<model_switch>` when full context is reinjected, and inject
    it *before* the rest of the full-context items
    - scope `reference_context_item` and `previous_model` to regular user
    turns only so standalone tasks (`/compact`, shell, review, undo) cannot
    suppress future reinjection or `<model_switch>` behavior
    - make context-diff persistence + `reference_context_item` updates
    explicit in the regular-turn path, with clearer docs/comments around the
    invariant
    - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
    (only regular turns persist `TurnContextItem` now)
    - simplify resume/fork previous-model/reference-baseline hydration by
    looking up the last surviving turn context from rollout lifecycle
    events, including rollback and compaction-crossing handling
    - remove the legacy fallback that guessed from bare `TurnContext`
    rollouts without lifecycle events
    - update compaction/remote-compaction/model-visible snapshots and
    compact test assertions (including remote compaction mock response
    shape)
    
    ## Why
    We were persisting incoming context items before spawning the regular
    turn task, which let pre-turn compaction requests accidentally include
    incoming context diffs without the new user message. Fixing that exposed
    follow-on baseline issues around `/compact`, resume/fork, and standalone
    tasks that could cause duplicate context injection or suppress
    `<model_switch>` instructions.
    
    This PR re-centers the invariants around regular turns:
    - regular turns persist model-visible context diffs/full reinjection and
    update the `reference_context_item`
    - standalone tasks do not advance those regular-turn baselines
    - compaction clears the baseline when replacement history may have
    stripped the referenced context diffs
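
    These three invariants can be sketched as a small state machine. This is a
    simplified model under stated assumptions: `ContextItem`, `Injection`, and
    the method names are hypothetical, and only `reference_context_item`
    comes from the description above.

    ```rust
    /// Hypothetical, simplified context snapshot.
    #[derive(Debug, Clone, PartialEq)]
    struct ContextItem {
        cwd: String,
        model: String,
    }

    /// What a regular turn injects into model-visible history.
    #[derive(Debug, PartialEq)]
    enum Injection {
        Full(ContextItem),
        Diff(Vec<String>),
        Nothing,
    }

    struct Session {
        reference_context_item: Option<ContextItem>,
    }

    impl Session {
        /// Regular turn: inject full context or a diff, then advance the baseline.
        fn start_regular_turn(&mut self, current: ContextItem) -> Injection {
            let injection = match &self.reference_context_item {
                None => Injection::Full(current.clone()),
                Some(prev) => {
                    let mut changed = Vec::new();
                    if prev.cwd != current.cwd {
                        changed.push(format!("cwd={}", current.cwd));
                    }
                    if prev.model != current.model {
                        changed.push(format!("model={}", current.model));
                    }
                    if changed.is_empty() { Injection::Nothing } else { Injection::Diff(changed) }
                }
            };
            self.reference_context_item = Some(current);
            injection
        }

        /// Compaction clears the baseline, forcing full reinjection next turn.
        fn on_compaction(&mut self) {
            self.reference_context_item = None;
        }

        /// Standalone tasks deliberately leave the baseline untouched.
        fn start_standalone_task(&self) {}
    }
    ```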
    
    ## Follow-ups (TODOs left in code)
    - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
    comprehensively
    - `TODO(ccunningham)`: include pending incoming context items in
    pre-turn compaction threshold estimation
    - `TODO(ccunningham)`: inject updated personality spec alongside
    `<model_switch>` so some model-switch paths can avoid forced full
    reinjection
    - `TODO(ccunningham)`: review task turn lifecycle
    (`TurnStarted`/`TurnComplete`) behavior and emit task-start context
    diffs for task types that should have them (excluding `/compact`)
    
    ## Validation
    - `just fmt`
    - CI should cover the updated compaction/resume/model-visible snapshot
    expectations and rollout-hydration behavior
    - I did **not** rerun the full local test suite after the latest
    resume-lookup / rollout-persistence simplifications
  • Unify remote compaction snapshot mocks around default endpoint behavior (#12050)
    ## Summary
    - standardize remote compaction test mocking around one default behavior
    in shared helpers
    - make default remote compact mocks mirror production shape: keep
    `message/user` + `message/developer`, drop assistant/tool artifacts,
    then append a summary user message
    - switch non-special `compact_remote` tests to the shared default mock
    instead of ad-hoc JSON payloads
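
    The default mock shape described above amounts to a filter plus an append.
    A minimal sketch, assuming a hypothetical `Item` type and helper name
    rather than the real shared test helpers:

    ```rust
    /// Hypothetical, simplified history item.
    #[derive(Debug, Clone, PartialEq)]
    enum Item {
        Message { role: String, text: String },
        ToolCall { name: String },
        ToolOutput { output: String },
    }

    /// Keeps user and developer messages, drops assistant/tool artifacts,
    /// then appends the summary as a trailing user message.
    fn default_compacted_history(input: &[Item], summary: &str) -> Vec<Item> {
        let mut out: Vec<Item> = input
            .iter()
            .filter(|item| {
                matches!(item, Item::Message { role, .. } if role == "user" || role == "developer")
            })
            .cloned()
            .collect();
        out.push(Item::Message {
            role: "user".to_string(),
            text: summary.to_string(),
        });
        out
    }
    ```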
    
    ## Special-case tests that still use explicit mocks
    - remote compaction error payload / HTTP failure behavior
    - summary-only compact output behavior
    - manual `/compact` with no prior user messages
    - stale developer-instruction injection coverage
    
    ## Why
    This removes inconsistent manual remote compaction fixtures and gives us
    one source of truth for normal remote compact behavior, while preserving
    explicit mocks only where tests intentionally cover non-default
    behavior.
  • bazel: fix snapshot parity for tests/*.rs rust_test targets (#11893)
    ## Summary
    - make `rust_test` targets generated from `tests/*.rs` use Cargo-style
    crate names (file stem) so snapshot names match Cargo (`all__...`
    instead of Bazel-derived names)
    - split lib vs `tests/*.rs` test env wiring in `codex_rust_crate` to
    keep existing lib snapshot behavior while applying Bazel
    runfiles-compatible workspace root for `tests/*.rs`
    - compute the `tests/*.rs` snapshot workspace root from package depth so
    `insta` resolves committed snapshots under Bazel `--noenable_runfiles`
    
    ## Validation
    - `bazelisk test //codex-rs/core:core-all-test
    --test_arg=suite::compact:: --cache_test_results=no`
    - `bazelisk test //codex-rs/core:core-all-test
    --test_arg=suite::compact_remote:: --cache_test_results=no`
  • feat: persist and restore codex app's tools after search (#11780)
    ### What changed
    1. Removed per-turn MCP selection reset in `core/src/tasks/mod.rs`.
    2. Added `SessionState::set_mcp_tool_selection(Vec<String>)` in
    `core/src/state/session.rs` for authoritative restore behavior (deduped,
    order-preserving, empty clears).
    3. Added rollout parsing in `core/src/codex.rs` to recover
    `active_selected_tools` from prior `search_tool_bm25` outputs:
       - tracks matching `call_id`s
       - parses function output text JSON
       - extracts `active_selected_tools`
       - latest valid payload wins
       - malformed/non-matching payloads are ignored
    4. Applied restore logic to resumed and forked startup paths in
    `core/src/codex.rs`.
    5. Updated instruction text to session/thread scope in
    `core/templates/search_tool/tool_description.md`.
    6. Expanded tests in `core/tests/suite/search_tool.rs`, plus unit
    coverage in:
       - `core/src/codex.rs`
       - `core/src/state/session.rs`
    
    ### Behavior after change
    1. Search activates matched tools.
    2. Additional searches union into active selection.
    3. Selection survives new turns in the same thread.
    4. Resume/fork restores selection from rollout history.
    5. Separate threads do not inherit selection unless forked.
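
    The restore and union semantics can be sketched as follows. Only
    `set_mcp_tool_selection(Vec<String>)` is named above; the struct field and
    the `merge_mcp_tool_selection` helper are hypothetical.

    ```rust
    use std::collections::HashSet;

    #[derive(Default)]
    struct SessionState {
        active_selected_tools: Vec<String>,
    }

    impl SessionState {
        /// Authoritative restore: replaces the selection, dropping duplicates
        /// while preserving first-seen order. An empty input clears it.
        fn set_mcp_tool_selection(&mut self, tools: Vec<String>) {
            let mut seen = HashSet::new();
            self.active_selected_tools = tools
                .into_iter()
                .filter(|tool| seen.insert(tool.clone()))
                .collect();
        }

        /// Hypothetical helper: unions a later search into the selection.
        fn merge_mcp_tool_selection(&mut self, tools: Vec<String>) {
            let mut seen: HashSet<String> =
                self.active_selected_tools.iter().cloned().collect();
            for tool in tools {
                if seen.insert(tool.clone()) {
                    self.active_selected_tools.push(tool);
                }
            }
        }
    }
    ```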
  • core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487)
    This PR keeps compaction context-layout test coverage separate from
    runtime compaction behavior changes, so runtime logic review can stay
    focused.
    
    ## Included
    - Adds reusable context snapshot helpers in
    `core/tests/common/context_snapshot.rs` for rendering model-visible
    request/history shapes.
    - Standardizes helper naming for readability:
      - `format_request_input_snapshot`
      - `format_response_items_snapshot`
      - `format_labeled_requests_snapshot`
      - `format_labeled_items_snapshot`
    - Expands snapshot coverage for both local and remote compaction flows:
      - pre-turn auto-compaction
      - pre-turn failure/context-window-exceeded paths
      - mid-turn continuation compaction
      - manual `/compact` with and without prior user turns
    - Captures both sides where relevant:
      - compaction request shape
      - post-compaction history layout shape
    - Adds/uses shared request-inspection helpers so assertions target
    structured request content instead of ad-hoc JSON string parsing.
    - Aligns snapshots/assertions to current behavior and leaves explicit
    `TODO(ccunningham)` notes where behavior is known and intentionally
    deferred.
    
    ## Not Included
    - No runtime compaction logic changes.
    - No model-visible context/state behavior changes.
  • Do not attempt to append after response.completed (#11402)
    Completed responses are fully done, and a new response must be created.
  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
  • Remove deterministic_process_ids feature to avoid duplicate codex-core builds (#11393)
    ## Why
    
    `codex-core` enabled `deterministic_process_ids` through a self
    dev-dependency. That forced a second feature-resolved build of the same
    crate, which increased compile time and test latency.
    
    ## What Changed
    
    - Removed the `deterministic_process_ids` feature from
    `codex-rs/core/Cargo.toml`.
    - Removed the self dev-dependency on `codex-core` that enabled that
    feature.
    - Removed the Bazel `deterministic_process_ids` crate feature for
    `codex-core`.
    - Added a test-only `AtomicBool` override in unified exec process-id
    allocation.
    - Added a test-support setter for that override and re-exported it from
    `codex-core`.
    - Enabled deterministic process IDs in integration tests via
    `core_test_support` ctor.
    
    ## Behavior
    
    - Production behavior remains random process IDs.
    - Unit tests remain deterministic via `cfg(test)`.
    - Integration tests remain deterministic via explicit test-support
    initialization.
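
    A runtime override of this kind might look like the sketch below. The
    function names, the starting ID, and the use of `RandomState` as a stand-in
    for a real random source are all assumptions made for illustration.

    ```rust
    use std::collections::hash_map::RandomState;
    use std::hash::BuildHasher;
    use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

    /// Runtime override that replaces the former crate feature.
    static DETERMINISTIC_IDS: AtomicBool = AtomicBool::new(false);
    /// Counter used when deterministic IDs are enabled (start value is arbitrary).
    static NEXT_ID: AtomicU32 = AtomicU32::new(1000);

    /// Hypothetical test-support setter, called once by integration tests.
    pub fn set_deterministic_process_ids(enabled: bool) {
        DETERMINISTIC_IDS.store(enabled, Ordering::SeqCst);
    }

    /// Allocates a process ID: sequential under `cfg(test)` or when the
    /// override is set, pseudo-random otherwise.
    pub fn allocate_process_id() -> u32 {
        if cfg!(test) || DETERMINISTIC_IDS.load(Ordering::SeqCst) {
            NEXT_ID.fetch_add(1, Ordering::SeqCst)
        } else {
            // Stand-in for a real RNG: `RandomState` is randomly seeded.
            (RandomState::new().hash_one(0u8) % 100_000) as u32
        }
    }
    ```

    Because the switch is a runtime flag rather than a Cargo feature, every
    consumer can link the same build of the crate.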
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-core unified_exec::`
    - `cargo test -p codex-core --test all unified_exec -- --test-threads=1`
    - `cargo tree -p codex-core -e features` (verified the removed feature
    path)
  • feat: enable permessage-deflate for websockets (#10966)
    Note: unfortunately, the tokio-tungstenite / tungstenite upgrade triggers
    linker errors, because rama-tls-boring and openssl define the same
    symbols:
    ```
    error: linking with `/Users/apanasenko/Library/Caches/cargo-zigbuild/0.20.1/zigcc-x86_64-unknown-linux-musl-ff6a.sh` failed: exit status: 1
      |
      = note:  "/Users/apanasenko/Library/Caches/cargo-zigbuild/0.20.1/zigcc-x86_64-unknown-linux-musl-ff6a.sh" "-m64" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/rcrt1.o" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crti.o" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtbeginS.o" "<1 object files omitted>" "-Wl,--as-needed" "-Wl,-Bstatic" "/var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/{liblzma_sys-662a82316f96ec30,libbzip2_sys-bf78a2d58d5cbce6,liblibsqlite3_sys-6c004987fd67a36a,libtree_sitter_bash-220b99a97d331ab7,libtree_sitter-858f0a1dbfea58bd,libzstd_sys-6eb237deec748c5b,libring-2a87376483bf916f,libopenssl_sys-7c189e68b37fe2bb,liblibz_sys-4344eef4345520b1,librama_boring_sys-0414e98115015ee0}.rlib" "-lc++" "-lc++abi" "-lunwind" "-lc" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/libcompiler_builtins-*.rlib" "-L" "/var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/raw-dylibs" "-Wl,-Bdynamic" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-nostartfiles" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/libz-sys-ff5ea50d88c28ffb/out/lib" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/ring-bdec3dddc19f5a5e/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/openssl-sys-96e0870de3ca22bc/out/openssl-build/install/lib" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/zstd-sys-0cc37a5da1481740/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/tree-sitter-72d2418073317c0f/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/tree-sitter-bash-bfd293a9f333ce6a/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/libsqlite3-sys-b78b2cfb81a330fc/out" "-L" 
"/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/bzip2-sys-69a145cc859ef275/out/lib" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/lzma-sys-07e92d0b6baa6fd4/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/crypto/" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/ssl/" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build" "-L" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained" "-L" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib" "-o" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/deps/codex_network_proxy-d08268b863517761" "-Wl,--gc-sections" "-static-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-Wl,--strip-all" "-nodefaultlibs" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtendS.o" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtn.o"
      = note: some arguments are omitted. use `--verbose` to show all linker arguments
      = note: warning: ignoring deprecated linker optimization setting '1'
              warning: unable to open library directory '/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/crypto/': FileNotFound
              ld.lld: error: duplicate symbol: SSL_export_keying_material
              >>> defined at ssl_lib.c:3816 (ssl/ssl_lib.c:3816)
              >>>            libssl-lib-ssl_lib.o:(SSL_export_keying_material) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/libopenssl_sys-7c189e68b37fe2bb.rlib
              >>> defined at t1_enc.cc:205 (/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/boringssl/ssl/t1_enc.cc:205)
              >>>            t1_enc.cc.o:(.text.SSL_export_keying_material+0x0) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/librama_boring_sys-0414e98115015ee0.rlib
    
              ld.lld: error: duplicate symbol: d2i_ASN1_TIME
              >>> defined at a_time.c:27 (crypto/asn1/a_time.c:27)
              >>>            libcrypto-lib-a_time.o:(d2i_ASN1_TIME) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/libopenssl_sys-7c189e68b37fe2bb.rlib
              >>> defined at a_time.cc:34 (/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/boringssl/crypto/asn1/a_time.cc:34)
              >>>            a_time.cc.o:(.text.d2i_ASN1_TIME+0x0) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/librama_boring_sys-0414e98115015ee0.rlib
    ``` 
    
    That forced me to migrate away from rama-tls-boring to rama-tls-rustls
    and pin `ring` for rustls.
  • core: preconnect Responses websocket for first turn (#10698)
    ## Problem
    The first user turn can pay websocket handshake latency even when a
    session has already started. We want to reduce that initial delay while
    preserving turn semantics and avoiding any prompt send during startup.
    
    Reviewer feedback also called out duplicated connect/setup paths and
    unnecessary preconnect state complexity.
    
    ## Mental model
    `ModelClient` owns session-scoped transport state. During session
    startup, it can opportunistically warm one websocket handshake slot. A
    turn-scoped `ModelClientSession` adopts that slot once if available,
    restores captured sticky turn-state, and otherwise opens a websocket
    through the same shared connect path.
    
    If startup preconnect is still in flight, first turn setup awaits that
    task and treats it as the first connection attempt for the turn.
    
    Preconnect is handshake-only. The first `response.create` is still sent
    only when a turn starts.
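
    The mental model above can be sketched with threads standing in for async
    tasks. This is a simplified illustration: `Socket`, `handshake`, and
    `socket_for_turn` are hypothetical, and the real implementation also
    restores sticky turn-state and applies connect timeouts.

    ```rust
    use std::thread::{self, JoinHandle};

    /// Hypothetical stand-in for a websocket; `warmed` marks a preconnected one.
    #[derive(Debug)]
    struct Socket {
        warmed: bool,
    }

    /// Session-level preconnect lifecycle holding at most one warmed socket.
    #[allow(dead_code)]
    enum Preconnect {
        Idle,
        InFlight(JoinHandle<Socket>),
        Ready(Socket),
    }

    /// Stand-in for the shared connect path (handshake only, no prompt sent).
    fn handshake(warmed: bool) -> Socket {
        Socket { warmed }
    }

    struct ModelClient {
        preconnect: Preconnect,
    }

    impl ModelClient {
        /// Session startup: opportunistically start warming one socket.
        fn pre_establish_connection(&mut self) {
            self.preconnect = Preconnect::InFlight(thread::spawn(|| handshake(true)));
        }

        /// Turn setup: adopt the warmed slot once, wait for an in-flight
        /// preconnect, or open a fresh socket through the same connect path.
        fn socket_for_turn(&mut self) -> Socket {
            match std::mem::replace(&mut self.preconnect, Preconnect::Idle) {
                Preconnect::Ready(socket) => socket,
                Preconnect::InFlight(task) => task.join().unwrap_or_else(|_| handshake(false)),
                Preconnect::Idle => handshake(false),
            }
        }

        /// Fallback activation clears any warmed preconnect state.
        fn activate_fallback(&mut self) {
            self.preconnect = Preconnect::Idle;
        }
    }
    ```

    Taking the slot with `mem::replace` is what makes adoption one-shot: the
    second turn finds `Idle` and connects normally.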
    
    ## Non-goals
    This change does not make preconnect required for correctness and does
    not change prompt/turn payload semantics. It also does not expand
    fallback behavior beyond clearing preconnect state when fallback
    activates.
    
    ## Tradeoffs
    The implementation prioritizes simpler ownership and shared connection
    code over header-match gating for reuse. The single-slot cache keeps
    lifecycle straightforward but only benefits the immediate next turn.
    
    Awaiting in-flight preconnect has the same app-level connect-timeout
    semantics as existing websocket connect behavior (no new timeout class
    introduced by this PR).
    
    ## Architecture
    `core/src/client.rs`:
    - Added session-level preconnect lifecycle state (`Idle` / `InFlight` /
    `Ready`) carrying one warmed websocket plus optional captured
    turn-state.
    - Added `pre_establish_connection()` startup warmup and `preconnect()`
    handshake-only setup.
    - Deduped auth/provider resolution into `current_client_setup()` and
    websocket handshake wiring into `connect_websocket()` /
    `build_websocket_headers()`.
    - Updated turn websocket path to adopt preconnect first, await in-flight
    preconnect when present, then create a new websocket only when needed.
    - Ensured fallback activation clears warmed preconnect state.
    - Added documentation for lifecycle, ownership, sticky-routing
    invariants, and timeout semantics.
    
    `core/src/codex.rs`:
    - Session startup invokes `model_client.pre_establish_connection(...)`.
    - Turn metadata resolution uses the shared timeout helper.
    
    `core/src/turn_metadata.rs`:
    - Centralized shared timeout helper used by both turn-time metadata
    resolution and startup preconnect metadata building.
    
    `core/tests/common/responses.rs` + websocket test suites:
    - Added deterministic handshake waiting helper (`wait_for_handshakes`)
    with bounded polling.
    - Added startup preconnect and in-flight preconnect reuse coverage.
    - Fallback expectations now assert exactly two websocket attempts in
    covered scenarios (startup preconnect + turn attempt before fallback
    sticks).
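    
    For readers unfamiliar with bounded polling, the idea behind a helper
    like `wait_for_handshakes` can be sketched as follows. This is not the
    helper from `core/tests/common/responses.rs`: the name
    `wait_for_count`, its parameters, and its blocking (non-async) shape
    are assumptions made to keep the example self-contained.
    
    ```rust
    use std::thread::sleep;
    use std::time::{Duration, Instant};
    
    /// Hypothetical helper: polls `observed` until it reports at least
    /// `expected`, or until `timeout` elapses.
    /// Returns `Ok(count)` on success and `Err(last_count)` on timeout.
    pub fn wait_for_count(
        expected: usize,
        timeout: Duration,
        interval: Duration,
        mut observed: impl FnMut() -> usize,
    ) -> Result<usize, usize> {
        let deadline = Instant::now() + timeout;
        loop {
            let seen = observed();
            if seen >= expected {
                return Ok(seen);
            }
            // Give up once the deadline has passed instead of spinning forever.
            if Instant::now() >= deadline {
                return Err(seen);
            }
            sleep(interval);
        }
    }
    ```
    
    Returning the last observed count on timeout lets a test report how
    many handshakes actually arrived rather than only that the wait failed.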
    
    ## Observability
    Preconnect remains best-effort and non-fatal. Existing
    websocket/fallback telemetry remains in place, and debug logs now make
    preconnect-await behavior and preconnect failures easier to reason
    about.
    
    ## Tests
    Validated with:
    1. `just fmt`
    2. `cargo test -p codex-core websocket_preconnect -- --nocapture`
    3. `cargo test -p codex-core websocket_fallback -- --nocapture`
    4. `cargo test -p codex-core websocket_first_turn_waits_for_inflight_preconnect -- --nocapture`
  • add none personality option (#10688)
    - add none personality enum value and empty placeholder behavior
    - add docs/schema updates and e2e coverage
  • Update tests to stop using sse_completed fixture (#10638)
    Summary:
    - replace the `sse_completed` fixture and related JSON template with
    direct `responses::ev_completed` payload builders
    - cascade the new SSE helpers through all affected core tests for
    consistency and clarity
    - remove legacy fixtures that were no longer needed once the helpers are
    in place
    
    Testing:
    - Not run (not requested)
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters, introduced in #10356, between `rmcp` types
    and the serializable types that make up our API (app server and MCP).
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```diff
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • Remove WebSocket wire format (#10179)
    I'd like `WireApi` to go away (when chat is removed), and WebSockets is
    still the Responses API, just over a different transport.
  • default enable compression, update test helpers (#10102)
    set the `enable_request_compression` flag to enabled by default.
    
    update integration test helpers to decompress `zstd` if the flag is set.