Commit Graph

906 Commits

  • Code mode on v8 (#15276)
    Moves Code Mode to a new crate with no dependencies on codex. This
    crate encodes the code mode semantics that we want for lifetime,
    mounting, and tool calling.
    
    The model-facing surface is mostly unchanged. `exec` still runs raw
    JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
    are still available through `tools.*`, and helpers like `text`, `image`,
    `store`, `load`, `notify`, `yield_control`, and `exit` still exist.
    
    The major change is underneath that surface:
    
    - Old code mode was an external Node runtime.
    - New code mode is an in-process V8 runtime embedded directly in Rust.
    - Old code mode managed cells inside a long-lived Node runner process.
    - New code mode manages cells in Rust, with one V8 runtime thread per
    active `exec`.
    - Old code mode used JSON protocol messages over child stdin/stdout plus
    Node worker-thread messages.
    - New code mode uses Rust channels and direct V8 callbacks/events.
    
    This PR also fixes the two migration regressions that fell out of that
    substrate change:
    
    - `wait { terminate: true }` now waits for the V8 runtime to actually
    stop before reporting termination.
    - synchronous top-level `exit()` now succeeds again instead of surfacing
    as a script error.
    
    ---
    
    - `core/src/tools/code_mode/*` is now mostly an adapter layer for the
    public `exec` / `wait` tools.
    - `code-mode/src/service.rs` owns cell sessions and async control flow
    in Rust.
    - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
    JavaScript execution.
    - each `exec` spawns a dedicated runtime thread plus a Rust
    session-control task.
    - helper globals are installed directly into the V8 context instead of
    being injected through a source prelude.
    - helper modules like `tools.js` and `@openai/code_mode` are synthesized
    through V8 module resolution callbacks in Rust.
    
    ---
    
    Also added a benchmark for showing the speed of init and use of a code
    mode env:
    ```
    $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
    Finished `bench` profile [optimized] target(s) in 0.18s
         Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
    exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
    scenario       tools samples    warmups      iters      mean/exec       p95/exec       rssΔ p50       rssΔ max
    cold_exec          0      30          0          1         1.13ms         1.20ms        8.05MiB        8.06MiB
    warm_exec          0      30          1         25       473.43us       512.49us      912.00KiB        1.33MiB
    cold_exec         32      30          0          1         1.03ms         1.15ms        8.08MiB        8.11MiB
    warm_exec         32      30          1         25       509.73us       545.76us      960.00KiB        1.30MiB
    cold_exec        128      30          0          1         1.14ms         1.19ms        8.30MiB        8.34MiB
    warm_exec        128      30          1         25       575.08us       591.03us      736.00KiB      864.00KiB
    memory uses a fresh-process max RSS delta for each scenario
    ```
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore(core) Remove Feature::PowershellUtf8 (#15128)
    ## Summary
    This feature has been enabled for PowerShell for a while now, so let's
    get rid of the logic.
    
    ## Testing
    - [x] Unit tests
  • feat: change multi-agent to use path-like system instead of uuids (#15313)
    This PR adds a URI-based system to reference agents within a tree. This
    comes from a sync between research and engineering.
    
    The main agent (the one manually spawned by a user) is always called
    `/root`. Any sub-agent it spawns gets a path such as `/root/agent_1`,
    where `agent_1` is chosen by the model.
    
    Any agent can contact any other agent using its path.
    
    Paths can be either absolute or relative to the calling agent.
    
    Resume is not yet supported with this new path system.
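
A minimal sketch of how absolute and relative agent paths could be resolved. The helper name `resolve_agent_path` is hypothetical, and the relative-path semantics (treating the caller as a directory, with `..` for the parent) are an assumption, not something this PR states.

```python
import posixpath

ROOT = "/root"


def _in_tree(path: str) -> bool:
    """True if `path` is `/root` or lives underneath it."""
    return path == ROOT or path.startswith(ROOT + "/")


def resolve_agent_path(caller: str, target: str) -> str:
    """Resolve `target` against the calling agent's absolute path.

    Assumption: relative targets are joined onto the caller's own path.
    """
    if not _in_tree(caller):
        raise ValueError(f"caller must be {ROOT} or below it: {caller!r}")
    # Absolute targets are used as-is; relative ones are joined to the caller.
    joined = target if target.startswith("/") else posixpath.join(caller, target)
    resolved = posixpath.normpath(joined)
    if not _in_tree(resolved):
        raise ValueError(f"path escapes the agent tree: {target!r}")
    return resolved
```

Under these assumptions, `resolve_agent_path("/root/agent_1", "../agent_2")` yields the sibling `/root/agent_2`, and any target that normalizes to something outside `/root` is rejected.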
  • Add remote test skill (#15324)
    Teach codex to run remote tests.
  • Add remote env CI matrix and integration test (#14869)
    `CODEX_TEST_REMOTE_ENV` will make `test_codex` start the executor
    "remotely" (inside a docker container), turning any integration test
    into a remote test.
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Move auth code into login crate (#15150)
    - Move the auth implementation and token data into codex-login.
    - Keep codex-core re-exporting that surface from codex-login for
    existing callers.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • changed save directory to codex_home (#15222)
    Change the default image generation save directory to
    codex_home/imagegen/thread_id/
  • Plumb MCP turn metadata through _meta (#15190)
    ## Summary
    
    Some background. We're looking to instrument GA turns end to end. Right
    now a big gap is grouping mcp tool calls with their codex sessions. We
    send session id and turn id headers to the responses call but not the
    mcp/wham calls.
    
    Ideally we could pass the args as headers like with responses, but given
    the setup of the rmcp client, we can't send as headers without either
    changing the rmcp package upstream to allow per-request headers or
    introducing a mutex, which breaks concurrency. An earlier attempt made the
    assumption that we had 1 client per thread, which allowed us to set
    headers at the start of a turn. @pakrym mentioned that this assumption
    might break in the near future.
    
    So the solution now is to package the turn metadata/session id into the
    _meta field in the post body and pull it out in codex-backend.
    
    - send turn metadata to MCP servers via `tools/call` `_meta` instead of
    assuming per-thread request headers on shared clients
    - preserve the existing `_codex_apps` metadata while adding
    `x-codex-turn-metadata` for all MCP tool calls
    - extend tests to cover both custom MCP servers and the codex apps
    search flow
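
A rough sketch of what a `tools/call` request carrying the turn metadata in `_meta` might look like. The helper name `build_tools_call` and the metadata fields (`session_id`, `turn_id`) are illustrative assumptions; only the `_meta`, `_codex_apps`, and `x-codex-turn-metadata` keys come from the description above.

```python
def build_tools_call(request_id, tool_name, arguments, turn_metadata, existing_meta=None):
    """Build a JSON-RPC `tools/call` request with turn metadata under `_meta`.

    Any metadata already present (for example `_codex_apps`) is preserved.
    """
    meta = dict(existing_meta or {})
    # Assumption: the metadata is attached as a nested object under this key.
    meta["x-codex-turn-metadata"] = turn_metadata
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments, "_meta": meta},
    }


request = build_tools_call(
    request_id=1,
    tool_name="search",
    arguments={"query": "cats"},
    turn_metadata={"session_id": "sess-1", "turn_id": "turn-1"},  # hypothetical fields
    existing_meta={"_codex_apps": {"connector": "example"}},  # hypothetical value
)
```

Because the metadata travels in the request body, nothing here depends on per-client or per-thread header state.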
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [hooks] use a user message > developer message for prompt continuation (#14867)
    ## Summary
    
    Persist Stop-hook continuation prompts as `user` messages instead of
    hidden `developer` messages, and add the requested integration tests.
    
    This is a followup to @pakrym 's comment in
    https://github.com/openai/codex/pull/14532 to make sure stop-block
    continuation prompts match training for turn loops
    
    - Stop continuation now writes `<hook_prompt hook_run_id="...">stop
    hook's user prompt</hook_prompt>`
    - Introduces quick-xml dependency, though we already indirectly depended
    on it anyway via syntect
    - This PR only has about 500 lines of actual logic changes, the rest is
    tests/schema
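
For illustration, a small sketch of rendering and parsing such a `hook_prompt` element with proper XML escaping. The PR itself uses quick-xml in Rust; this Python version and the helper name `render_hook_prompt` are stand-ins to show the shape only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def render_hook_prompt(hook_run_id: str, prompt: str) -> str:
    """Wrap a stop hook's prompt in a well-formed <hook_prompt> element."""
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape handles &, <, and > in the element text.
    return f"<hook_prompt hook_run_id={quoteattr(hook_run_id)}>{escape(prompt)}</hook_prompt>"


rendered = render_hook_prompt("run-1", "cook <the> stonpet & serve")
parsed = ET.fromstring(rendered)
```

Parsing the rendered string back recovers both the run id attribute and the original prompt text, which is the property an escaping-aware writer is meant to guarantee.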
    
    ## Testing
    
    Example run (with a sessionstart hook and 3 stop hooks) - this shows
    context added by session start, then two stop hooks sending their own
    additional prompts in a new turn. The model responds with a single
    message addressing both. Then when that turn ends, the hooks detect that
    they just ran using `stop_hook_active` and decide not to infinite loop
    
    test files for this (unzip, move codex -> .codex):
    [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)
    
    ```
    › cats
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
      cat facts, cat breeds, cat names, or build something cat-themed in this repo.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: cook the stonpet
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: eat the cooked stonpet
    
    • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
      rested until the hyperspace juices settle.
    
      Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
      smoky, bright, and totally out of this dimension.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    ```
  • [hooks] turn_id extension for Stop & UserPromptSubmit (#15118)
    ## Description
    
    Adding an extension to the spec that exposes the turn_id to hook
    scripts. This is a codex-specific mechanic that allows connecting the
    hook runs with particular turns
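
A minimal sketch of a Stop hook that reads the turn id from its stdin payload. The payload key `turn_id` and the helper names are assumptions for illustration; `systemMessage` is the output field used by the sample hooks elsewhere in this log.

```python
import io
import json


def build_stop_hook_output(payload: dict) -> dict:
    """Produce hook output that mentions which turn the hook ran for."""
    turn_id = payload.get("turn_id")  # assumed field name
    label = turn_id if isinstance(turn_id, str) and turn_id else "(unknown turn)"
    return {"systemMessage": f"Stop hook reviewed the reply for turn: {label}"}


def run_hook(stdin, stdout) -> int:
    """Read the JSON payload from stdin and write the JSON result to stdout."""
    payload = json.load(stdin)
    json.dump(build_stop_hook_output(payload), stdout)
    return 0


# Simulate one invocation with in-memory streams instead of real stdin/stdout.
fake_stdin = io.StringIO(json.dumps({"turn_id": "019d036d-c7fa-72d2-b6fd-78878bfe34e4"}))
fake_stdout = io.StringIO()
exit_code = run_hook(fake_stdin, fake_stdout)
```

In a real hook script, `run_hook(sys.stdin, sys.stdout)` would be called from the entry point.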
    
    ## Testing
    
    hooks config / sample hooks to use. Extract this, rename codex ->
    .codex, and place this into a repo or your home folder. It includes:
    config.toml that enables hooks, hooks.json, and sample python hooks:
    
    
    [codex.zip](https://github.com/user-attachments/files/26102671/codex.zip)
    
    example run (note the turn_ids change between turns):
    
    ```
    › hello
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Running UserPromptSubmit hook: lighting the observatory lanterns
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: hello for turn: 019d036d-c7fa-72d2-b6fd-
    78878bfe34e4
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' near the end.
    
    • Aloha! Grateful to be here and ready to build with you. Show me what you want to tackle in wizard-
      tower, and we’ll surf the next wave together. observatory lanterns lit
    
    • Running Stop hook: back to shore
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (170 chars) for turn: 019d036d-c7fa-
    72d2-b6fd-78878bfe34e4
    
    
    › what's a stonpet?
    
    
    • Running UserPromptSubmit hook: lighting the observatory lanterns
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: what's a stonpet? for turn: 019d036e-3164-
    72c3-a170-98925564c4fc
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' near the end.
    
    • A stonpet isn’t a standard real-world word, brah. In our shared context here, a wimboltine stonpet
      is an exotic cuisine from hyperspace, so “stonpet” sounds like the dish or food itself.
    
      If you want, we can totally invent the lore for it next. observatory lanterns lit
    
    • Running Stop hook: back to shore
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (271 chars) for turn: 019d036e-3164-
    72c3-a170-98925564c4fc
    ```
  • feat(core, tracing): create turn spans over websockets (#14632)
    ## Description
    
    Dependent on:
    - [responsesapi] https://github.com/openai/openai/pull/760991 
    - [codex-backend] https://github.com/openai/openai/pull/760985
    
    `codex app-server -> codex-backend -> responsesapi` now reuses a
    persistent websocket connection across many turns. This PR updates
    tracing when using websockets so that each `response.create` websocket
    request propagates the current tracing context, so we can get a holistic
    end-to-end trace for each turn.
    
    Tracing is propagated via special keys (`ws_request_header_traceparent`,
    `ws_request_header_tracestate`) set in the `client_metadata` param in
    Responses API.
    
    Before this change, tracing on websockets was a bit broken because we
    only set the tracing context at ws connection time, so it was detached
    from a `turn/start` request.
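
A sketch of attaching the current trace context to `client_metadata` for each request. The helper name `with_trace_context` is hypothetical; the two key names come from the description above, and the `traceparent` shape follows the W3C Trace Context format.

```python
import re

# version-traceid-parentid-flags, all lowercase hex (W3C Trace Context).
TRACEPARENT_RE = re.compile(r"^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def with_trace_context(client_metadata, traceparent, tracestate=None):
    """Return a copy of `client_metadata` carrying the per-request trace headers."""
    if not TRACEPARENT_RE.match(traceparent):
        raise ValueError(f"malformed traceparent: {traceparent!r}")
    merged = dict(client_metadata or {})
    merged["ws_request_header_traceparent"] = traceparent
    if tracestate:
        merged["ws_request_header_tracestate"] = tracestate
    return merged


metadata = with_trace_context(
    {"existing": "value"},  # hypothetical pre-existing metadata
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
)
```

Building the metadata per request, rather than once at connection time, is what lets each turn on a long-lived socket carry its own span context.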
  • Align SQLite feedback logs with feedback formatter (#13494)
    ## Summary
    - store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
    exports keep span prefixes and structured event fields
    - render SQLite feedback exports with timestamps and level prefixes to
    match the old in-memory feedback formatter, while preserving existing
    trailing newlines
    - count `feedback_log_body` in the SQLite retention budget so structured
    or span-prefixed rows still prune correctly
    - bound `/feedback` row loading in SQL with the retention estimate, then
    apply exact whole-line truncation in Rust so uploads stay capped without
    splitting lines
    
    ## Details
    - add a `feedback_log_body` column to `logs` and backfill it from
    `message` for existing rows
    - capture span names plus formatted span and event fields at write time,
    since SQLite does not retain enough structure to reconstruct the old
    formatter later
    - keep SQLite feedback queries scoped to the requested thread plus
    same-process threadless rows
    - restore a SQL-side cumulative `estimated_bytes` cap for feedback
    export queries so over-retained partitions do not load every matching
    row before truncation
    - add focused formatting coverage for exported feedback lines and parity
    coverage against `tracing_subscriber`
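
The whole-line truncation step can be sketched roughly as follows. The real implementation is in Rust; this Python version, the name `truncate_whole_lines`, and the choice to keep the most recent lines are assumptions for illustration.

```python
def truncate_whole_lines(lines: list[str], max_bytes: int) -> list[str]:
    """Keep the newest lines whose combined UTF-8 size fits in `max_bytes`.

    Lines are never split: a line is either kept whole or dropped.
    """
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent lines survive (assumption).
    for line in reversed(lines):
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(line)
        used += size
    kept.reverse()  # restore chronological order
    return kept
```

A SQL-side estimate can bound how many rows are loaded, while a pass like this enforces the exact byte cap afterwards.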
    
    ## Testing
    - cargo test -p codex-state
    - just fix -p codex-state
    - just fmt
    
    codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add final message prefix to realtime handoff output (#15077)
    - prefix realtime handoff output with the agent final message label for
    both realtime v1 and v2
    - update realtime websocket and core expectations to match
  • Return image URL from view_image tool (#15072)
    Clean up image semantics in code mode.
    
    `view_image` now returns `{image_url:string, details?: string}`.
    
    `image()` now accepts either a string parameter or `{image_url:string,
    details?: string}`.
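
A language-neutral sketch (written in Python rather than the JavaScript that code mode runs) of accepting both argument shapes. The helper name `normalize_image_arg` is hypothetical.

```python
def normalize_image_arg(arg):
    """Accept a URL string or an {image_url, details?} mapping; return the mapping form."""
    if isinstance(arg, str):
        return {"image_url": arg}
    if isinstance(arg, dict) and isinstance(arg.get("image_url"), str):
        normalized = {"image_url": arg["image_url"]}
        # `details` is optional and only forwarded when it is a string.
        if isinstance(arg.get("details"), str):
            normalized["details"] = arg["details"]
        return normalized
    raise TypeError("expected a string or an object with a string image_url")
```

With both shapes normalized to one form, the output of `view_image` can be passed to `image()` without unpacking it first.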
  • Propagate tool errors to code mode (#15075)
    Clean up the error flow to push the FunctionCallError all the way up to
    the dispatcher and allow code mode to surface it as an exception.
  • Add notify to code-mode (#14842)
    Allows the model to send an out-of-band notification.
    
    The notification is injected as another tool call output for the same
    call_id.
  • fix(subagents) share execpolicy by default (#13702)
    ## Summary
    If a subagent requests approval, and the user persists that approval to
    the execpolicy, it should (by default) propagate. We'll need to rethink
    this a bit in light of coming Permissions changes, though I think this
    is closer to the end state that we'd want, which is that execpolicy
    changes to one permissions profile should be synced across threads.
    
    ## Testing
    - [x] Added integration test
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
    - this allows blocking the user's prompts from executing, and also
    prevents them from entering history
    - handles the edge case where you can both prevent the user's prompt AND
    add any number of additionalContexts
    - refactors some old code into common.rs where hooks overlap in
    functionality
    - moves additionalContext out of user messages; it is now added via
    developer messages instead
    - handles queued messages correctly
    
    Sample hook for testing - if you write "[block-user-submit]" this hook
    will stop the thread:
    
    example run
    ```
    › sup
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: sup
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' exactly once near the end.
    
    • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
      lanterns lit
    
    
    › and [block-user-submit]
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (stopped)
      warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
      stop: Wizard Tower demo block: remove [block-user-submit] to continue.
    ```
    
    .codex/config.toml
    ```
    [features]
    codex_hooks = true
    ```
    
    .codex/hooks.json
    ```
    {
      "hooks": {
        "UserPromptSubmit": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
                "timeoutSec": 10,
                "statusMessage": "reading the observatory notes"
              }
            ]
          }
        ]
      }
    }
    ```
    
    .codex/hooks/user_prompt_submit_demo.py
    ```
    #!/usr/bin/env python3
    
    import json
    import sys
    from pathlib import Path
    
    
    def prompt_from_payload(payload: dict) -> str:
        prompt = payload.get("prompt")
        if isinstance(prompt, str) and prompt.strip():
            return prompt.strip()
    
        event = payload.get("event")
        if isinstance(event, dict):
            user_prompt = event.get("user_prompt")
            if isinstance(user_prompt, str):
                return user_prompt.strip()
    
        return ""
    
    
    def main() -> int:
        payload = json.load(sys.stdin)
        prompt = prompt_from_payload(payload)
        cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
    
        if "[block-user-submit]" in prompt:
            print(
                json.dumps(
                    {
                        "systemMessage": (
                            f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                        ),
                        "decision": "block",
                        "reason": (
                            "Wizard Tower demo block: remove [block-user-submit] to continue."
                        ),
                    }
                )
            )
            return 0
    
        prompt_preview = prompt or "(empty prompt)"
        if len(prompt_preview) > 80:
            prompt_preview = f"{prompt_preview[:77]}..."
    
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                    ),
                    "hookSpecificOutput": {
                        "hookEventName": "UserPromptSubmit",
                        "additionalContext": (
                            "Wizard Tower UserPromptSubmit demo fired. "
                            "For this reply only, include the exact phrase "
                            "'observatory lanterns lit' exactly once near the end."
                        ),
                    },
                }
            )
        )
        return 0
    
    
    if __name__ == "__main__":
        raise SystemExit(main())
    ```
  • Handle realtime conversation end in the TUI (#14903)
    - close live realtime sessions on errors, ctrl-c, and active meter
    removal
    - centralize TUI realtime cleanup and avoid duplicate follow-up close
    info
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
  • Prefer websockets when providers support them (#13592)
    Remove all flags and model settings.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Unify realtime shutdown in core (#14902)
    - route realtime startup, input, and transport failures through a single
    shutdown path
    - emit one realtime error/closed lifecycle while clearing session state
    once
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
  • Gate realtime audio interruption logic to v2 (#14984)
    - thread the realtime version into conversation start and app-server
    notifications
    - keep playback-aware mic gating and playback interruption behavior on
    v2 only, leaving v1 on the legacy path
  • Rename exec_wait tool to wait (#14983)
    Summary
    - document that code mode only exposes `exec` and the renamed `wait`
    tool
    - update code mode tool spec and descriptions to match the new tool name
    - rename tests and helper references from `exec_wait` to `wait`
    
    Testing
    - Not run (not requested)
  • Stabilize approval matrix write-file command (#14968)
    ## What is flaky
    The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in
    CI even though the approval logic is unchanged, because the test
    delegates the file write and readback to shell parsing instead of
    deterministic file I/O.
    
    ## Why it was flaky
    The test generated a command shaped like `printf ... > file && cat
    file`. That means the scenario depended on shell quoting, redirection,
    newline handling, and encoding behavior in addition to the approval
    system it was actually trying to validate. If the shell interpreted the
    payload differently, the test would report an approval failure even
    though the product logic was fine.
    
    That also made failures hard to diagnose, because the test did not log
    the exact generated command or the parsed result payload.
    
    ## How this PR fixes it
    This PR replaces the shell-redirection path with a deterministic
    `python3 -c` script that writes the file with `Path.write_text(...,
    encoding='utf-8')` and then reads it back with the same UTF-8 path. It
    also logs the generated command and the resulting exit code/stdout for
    the approval scenario so any future failure is directly attributable.
    
    ## Why this fix fixes the flakiness
    The scenario no longer depends on shell parsing and redirection
    semantics. The file contents are produced and read through explicit
    UTF-8 file I/O, so the approval test is measuring approval behavior
    instead of shell behavior. The added diagnostics mean a future failure
    will show the exact command/result pair instead of looking like a
    generic intermittent mismatch.
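
A sketch of the deterministic write-and-read-back approach described above. The helper name `write_and_read_back` and the exact script text are illustrative, not the test's actual code; it uses the current interpreter instead of a literal `python3` so it runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# One-line script: write argv[2] to argv[1] as UTF-8, then echo the file back.
SCRIPT = (
    "import sys; from pathlib import Path; "
    "p = Path(sys.argv[1]); "
    "p.write_text(sys.argv[2], encoding='utf-8'); "
    "sys.stdout.write(p.read_text(encoding='utf-8'))"
)


def write_and_read_back(path: Path, contents: str) -> subprocess.CompletedProcess:
    """Run the script without a shell, so no quoting or redirection is involved."""
    argv = [sys.executable, "-c", SCRIPT, str(path), contents]
    result = subprocess.run(argv, capture_output=True, text=True, encoding="utf-8")
    # Log the exact command and result so a failure is directly attributable.
    print(f"command={argv!r} exit={result.returncode} stdout={result.stdout!r}")
    return result


with tempfile.TemporaryDirectory() as tmp:
    demo = write_and_read_back(Path(tmp) / "out.txt", "hello approval\n")
```

Passing the payload as an argument vector keeps it out of shell parsing entirely, which is the property the fix relies on.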
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Stabilize Windows cmd-based shell test harnesses (#14958)
    ## What is flaky
    The Windows shell-driven integration tests in `codex-rs/core` were
    intermittently unstable, especially:
    
    - `apply_patch_cli_can_use_shell_command_output_as_patch_input`
    - `websocket_test_codex_shell_chain`
    - `websocket_v2_test_codex_shell_chain`
    
    ## Why it was flaky
    These tests were exercising real shell-tool flows through whichever
    shell Codex selected on Windows, and the `apply_patch` test also nested
    a PowerShell read inside `cmd /c`.
    
    There were multiple independent sources of nondeterminism in that setup:
    
    - The test harness depended on the model-selected Windows shell instead
    of pinning the shell it actually meant to exercise.
    - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
    that could leave the read command wrapped as a literal string instead of
    executing it.
    - Even after getting the quoting right, PowerShell could emit CLIXML
    progress records like module-initialization output onto stdout.
    - The `apply_patch` test was building a patch directly from shell
    stdout, so any quoting artifact or progress noise corrupted the patch
    input.
    
    So the failures were driven by shell startup and output-shape variance,
    not by the `apply_patch` or websocket logic themselves.
    
    ## How this PR fixes it
    - Add a test-only `user_shell_override` path so Windows integration
    tests can pin `cmd.exe` explicitly.
    - Use that override in the websocket shell-chain tests and in the
    `apply_patch` harness.
    - Change the nested Windows file read in
    `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
    PowerShell `-EncodedCommand` script.
    - Run that nested PowerShell process with `-NonInteractive`, set
    `$ProgressPreference = 'SilentlyContinue'`, and read the file with
    `[System.IO.File]::ReadAllText(...)`.
    
    ## Why this fix fixes the flakiness
    The outer harness now runs under a deterministic shell, and the inner
    PowerShell read no longer depends on fragile `cmd` quoting or on
    progress output staying quiet by accident. The shell tool returns only
    the file contents, so patch construction and websocket assertions depend
    on stable test inputs instead of on runner-specific shell behavior.
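
A sketch of building the nested PowerShell invocation. PowerShell's `-EncodedCommand` takes the script as Base64 over its UTF-16LE bytes, which avoids `cmd` quoting altogether; the helper names and the explicit UTF-8 `ReadAllText` overload are assumptions for illustration, not the harness's exact code.

```python
import base64


def encode_powershell_command(script: str) -> str:
    """Encode a script for `-EncodedCommand` (Base64 of the UTF-16LE text)."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def build_read_file_argv(path: str) -> list[str]:
    """Build an argv that prints a file's contents with progress output silenced."""
    escaped = path.replace("'", "''")  # escape quotes for a single-quoted PS string
    script = (
        "$ProgressPreference = 'SilentlyContinue'; "
        f"[Console]::Out.Write([System.IO.File]::ReadAllText('{escaped}', "
        "[System.Text.Encoding]::UTF8))"
    )
    return [
        "powershell.exe",
        "-NonInteractive",
        "-EncodedCommand",
        encode_powershell_command(script),
    ]


argv = build_read_file_argv(r"C:\tmp\it's.txt")
decoded = base64.b64decode(argv[-1]).decode("utf-16-le")
```

The encoded payload contains only Base64 characters, so it passes through `cmd /c` without any characters that `cmd` would reinterpret.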
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • fix(core): prevent hanging turn/start due to websocket warming issues (#14838)
    ## Description
    
    This PR fixes a bad first-turn failure mode in app-server when the
    startup websocket prewarm hangs. Before this change, `initialize ->
    thread/start -> turn/start` could sit behind the prewarm for up to five
    minutes, so the client would not see `turn/started`, and even
    `turn/interrupt` would block because the turn had not actually started
    yet.
    
    Now, we:
    - set a (configurable) timeout of 15s for websocket startup time,
    exposed as `websocket_connect_timeout_ms` in config.toml
    - `turn/started` is sent immediately on `turn/start` even if the
    websocket is still connecting
    - `turn/interrupt` can be used to cancel a turn that is still waiting on
    the websocket warmup
    - the turn task will wait for the full 15s websocket warming timeout
    before falling back
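
A simplified sketch of the bounded warmup wait. The real logic is Rust; this asyncio version and the name `connect_or_fallback` are illustrative, and what "falling back" means is left abstract because the description does not specify it.

```python
import asyncio


async def connect_or_fallback(connect, timeout_ms: int = 15_000):
    """Wait for the websocket warmup up to `timeout_ms`, then fall back.

    Cancelling the surrounding task (for example on an interrupt) also
    cancels the pending connect, so the turn never blocks behind the warmup.
    """
    try:
        conn = await asyncio.wait_for(connect(), timeout=timeout_ms / 1000)
        return ("websocket", conn)
    except asyncio.TimeoutError:
        return ("fallback", None)


async def slow_connect():
    await asyncio.sleep(1)  # stands in for a hung prewarm
    return "conn"


async def fast_connect():
    return "conn"


slow_result = asyncio.run(connect_or_fallback(slow_connect, timeout_ms=10))
fast_result = asyncio.run(connect_or_fallback(fast_connect, timeout_ms=1000))
```

The point of the sketch is that the wait is bounded and cancellable, so lifecycle events such as `turn/started` do not have to wait on the connection.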
    
    ## Why
    
    The old behavior made app-server feel stuck at exactly the moment the
    client expects turn lifecycle events to start flowing. That was
    especially painful for external clients, because from their point of
    view the server had accepted the request but then went silent for
    minutes.
    
    ## Configuring the websocket startup timeout
    You can set it in config.toml like this:
    ```
    [model_providers.openai]
    supports_websockets = true
    websocket_connect_timeout_ms = 15000
    ```
  • [stack 2/4] Align main realtime v2 wire and runtime flow (#14830)
    ## Stack Position
    2/4. Built on top of #14828.
    
    ## Base
    - #14828
    
    ## Unblocks
    - #14829
    - #14827
    
    ## Scope
    - Port the realtime v2 wire parsing, session, app-server, and
    conversation runtime behavior onto the split websocket-method base.
    - Branch runtime behavior directly on the current realtime session kind
    instead of parser-derived flow flags.
    - Keep regression coverage in the existing e2e suites.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • Add exit helper to code mode scripts (#14851)
    - **Summary**
      - expose `exit` through the code mode bridge and module so scripts can
        stop mid-flight
      - surface the helper in the description documentation
      - add a regression test ensuring `exit()` terminates execution cleanly
    - **Testing**
      - Not run (not requested)
  • Reuse guardian session across approvals (#14668)
    ## Summary
    - reuse a guardian subagent session across approvals so reviews keep a
    stable prompt cache key and avoid one-shot startup overhead
    - clear the guardian child history before each review so prior guardian
    decisions do not leak into later approvals
    - include the `smart_approvals` -> `guardian_approval` feature flag
    rename in the same PR to minimize release latency on a very tight
    timeline
    - add regression coverage for prompt-cache-key reuse without
    prior-review prompt bleed
    
    ## Request
    - Bug/enhancement request: internal guardian prompt-cache and latency
    improvement request
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Preserve background terminals on interrupt and rename cleanup command to /stop (#14602)
    ### Motivation
    - Interrupting a running turn (Ctrl+C / Esc) currently also terminates
    long‑running background shells, which is surprising for workflows like
    local dev servers or file watchers.
    - The existing cleanup command name was confusing; callers expect an
    explicit command to stop background terminals rather than a UI clear
    action.
    - Make background‑shell termination explicit and surface a clearer
    command name while preserving backward compatibility.
    
    ### Description
    - Renamed the background‑terminal cleanup slash command from `Clean`
    (`/clean`) to `Stop` (`/stop`), kept `clean` as an alias in the
    command parsing/visibility layer, and updated the user descriptions and
    command popup wiring accordingly.
    - Updated the unified‑exec footer text and snapshots to point to `/stop`
    (and trimmed corresponding snapshot output to match the new label).
    - Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt)
    no longer closes or clears tracked unified exec / background terminal
    processes in the TUI or core cleanup path; background shells are now
    preserved after an interrupt.
    - Updated protocol/docs to clarify that `turn/interrupt` (or
    `Op::Interrupt`) interrupts the active turn but does not terminate
    background terminals, and that `thread/backgroundTerminals/clean` is the
    explicit API to stop those shells.
    - Updated unit/integration tests and insta snapshots in the TUI and core
    unified‑exec suites to reflect the new semantics and command name.
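
    The new semantics can be sketched in Python as a toy model, assuming a
    session that tracks one active turn and a set of background shells.
    `Session`, `interrupt`, and `stop_background_terminals` are hypothetical
    names and do not correspond to the real Rust types.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of a thread with one active turn and background shells."""

    turn_active: bool = False
    background_terminals: set[str] = field(default_factory=set)

    def interrupt(self) -> None:
        # Mirrors the described interrupt semantics: only the active turn stops.
        self.turn_active = False

    def stop_background_terminals(self) -> list[str]:
        # Mirrors the explicit /stop path: shells end only when asked.
        stopped = sorted(self.background_terminals)
        self.background_terminals.clear()
        return stopped


session = Session(turn_active=True, background_terminals={"dev-server", "watcher"})
session.interrupt()
after_interrupt = set(session.background_terminals)
stopped = session.stop_background_terminals()
```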
    
    ### Testing
    - Ran formatting with `just fmt` in `codex-rs` (succeeded). 
    - Ran `cargo test -p codex-protocol` (succeeded). 
    - Attempted `cargo test -p codex-tui` but the build could not complete
    in this environment due to a native build dependency that requires
    `libcap` development headers (the `codex-linux-sandbox` vendored build
    step); install `libcap-dev` / make `libcap.pc` available in
    `PKG_CONFIG_PATH` to run the TUI test suite locally.
    - Updated and accepted the affected `insta` snapshots for the TUI
    changes so visual diffs reflect the new `/stop` wording and preserved
    interrupt behavior.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)
  • [apps] Improve search tool fallback. (#14732)
    - [x] Bypass tool search and put tool specs directly into the model
    context when either (a) tool search is not available for the model or
    (b) there are too few tools for search to be worthwhile.
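
    A rough Python sketch of that decision. The function name and the
    `small_toolset_limit` threshold are assumptions for illustration; the
    commit does not state the actual cutoff.

```python
def should_bypass_tool_search(
    model_supports_search: bool,
    tool_count: int,
    small_toolset_limit: int = 10,  # hypothetical cutoff, not from the commit
) -> bool:
    """Return True when tool specs should go straight into the model context."""
    if not model_supports_search:
        # Case (a): the model cannot use tool search at all.
        return True
    # Case (b): so few tools that searching adds nothing.
    return tool_count <= small_toolset_limit
```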
  • [apps] Add tool call meta. (#14647)
    - [x] Add resource_uri and other things to _meta to shortcut resource
    lookup and speed things up.
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool-calling
    runtime (for example, from js_repl/code_mode).
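
    As an illustration only, the split between "visible to the model" and
    "callable at runtime" might look like this in Python. `DynamicTool` and
    both helpers are hypothetical, and the default of `True` is an assumption
    based on the flag being described as optional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicTool:
    name: str
    # Python spelling of exposeToContext; defaulting to True is an assumption.
    expose_to_context: bool = True


def model_visible(tools: list[DynamicTool]) -> list[str]:
    # Only tools that opt in are described to the model.
    return [tool.name for tool in tools if tool.expose_to_context]


def runtime_callable(tools: list[DynamicTool]) -> list[str]:
    # Every registered tool stays callable from the tool runtime.
    return [tool.name for tool in tools]


tools = [DynamicTool("search_docs"), DynamicTool("internal_fetch", expose_to_context=False)]
```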
  • move plugin/skill instructions into dev msg and reorder (#14609)
    Move the general `Apps`, `Skills` and `Plugins` instructions blocks out
    of `user_instructions` and into the developer message, with new `Apps ->
    Skills -> Plugins` order for better clarity.
    
    Also wrap those sections in stable XML-style instruction tags (like
    other sections) and update prompt-layout tests/snapshots. This makes the
    tests less brittle in snapshot output (we can parse the sections), and
    it consolidates the capability instructions in one place.
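
    A small Python sketch of why tagged sections make tests less brittle:
    sections are rendered in a fixed order and can be parsed back out by
    tag. The `<apps_instructions>`-style tag names are placeholders; the
    real tag names are not given in this message.

```python
import re

# Order described above: Apps -> Skills -> Plugins.
SECTION_ORDER = ("apps", "skills", "plugins")


def render_developer_sections(sections: dict[str, str]) -> str:
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name)
        if body:
            # Hypothetical tag naming, used only for this illustration.
            parts.append(f"<{name}_instructions>\n{body}\n</{name}_instructions>")
    return "\n".join(parts)


def parse_sections(message: str) -> dict[str, str]:
    pattern = re.compile(r"<(\w+)_instructions>\n(.*?)\n</\1_instructions>", re.DOTALL)
    return {match.group(1): match.group(2) for match in pattern.finditer(message)}


message = render_developer_sections(
    {"plugins": "Plugin notes", "apps": "App notes", "skills": "Skill notes"}
)
parsed = parse_sections(message)
```

    A test can then assert on `parsed["skills"]` instead of on the whole
    rendered message.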
    
    #### Tests
    Updated snapshots, added tests.
    
    `<AGENTS_MD>` disappearing in snapshots is expected: before this change,
    the wrapped user-instructions message was kept alive by `Skills`
    content. Now that `Skills` and `Plugins` are in the developer message,
    that wrapper only appears when there is real
    project-doc/user-instructions content.
    
    ---------
    
    Co-authored-by: Charley Cunningham <ccunningham@openai.com>
  • Add openai_base_url config override for built-in provider (#12031)
    We regularly get bug reports from users who mistakenly have the
    `OPENAI_BASE_URL` environment variable set. This PR deprecates this
    environment variable in favor of a top-level config key
    `openai_base_url` that is used for the same purpose. By making it a
    config key, it will be more visible to users. It will also participate
    in all of the infrastructure we've added for layered and managed
    configs.
    
    Summary
    - introduce the `openai_base_url` top-level config key, update
    schema/tests, and route the built-in openai provider through it
    - fall back to the deprecated `OPENAI_BASE_URL` env var, with a
    deprecation warning, when no `openai_base_url` config key is present
    - update CLI, SDK, and TUI code to prefer the new config path (with a
    deprecated env-var fallback) and document the SDK behavior change
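
    The precedence described in the summary can be sketched in Python as
    follows. This is an illustration, not the Rust implementation:
    `resolve_openai_base_url` is a hypothetical helper, and the default URL
    is an assumption about what the built-in provider uses.

```python
import os
import warnings

# Assumed default for the built-in provider; not stated in this commit.
DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"


def resolve_openai_base_url(config: dict, env=None) -> str:
    env = os.environ if env is None else env
    configured = config.get("openai_base_url")
    if configured:
        # The config key always wins over the environment.
        return configured
    legacy = env.get("OPENAI_BASE_URL")
    if legacy:
        # Deprecated fallback: still honored, but the user is warned.
        warnings.warn(
            "OPENAI_BASE_URL is deprecated; set openai_base_url in config instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy
    return DEFAULT_OPENAI_BASE_URL
```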
  • [hooks] stop continuation & stop_hook_active mechanics (#14532)
    Stop hooks now receive `stop_hook_active`, which lets a stop hook loop
    forever if it chooses to. The initial hooks PR implemented a simpler
    mechanic in which stop-blocking could happen only once in a row.
    
    - support stop hook adding a continuation prompt to add a further task
    - if multiple stop-blocks happen that have continuation prompts, they
    are concatenated
    
    example run:
    ```
    › hey :)
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Aloha :) Happy to jam with you. What are we building today?
    
    • Running Stop hook: updating the guards
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: cook the stonpet
    
    • Aloha, here’s the hyperspace move for cooking a wimboltine stonpet:
    
      1. Sear the stonpet in a hot pan with moon-oil until the edges shimmer.
      2. Add star-lime, black salt, and a little fermented nebula paste.
      3. Lower the heat and let it braise for 8 cosmic minutes with a splash of comet broth.
      4. Finish with sky herbs and serve over warm asteroid rice.
    
      The vibe: crispy outside, tender center, deep interdimensional savor.
    
    • Running Stop hook: updating the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    ```
    
    .codex/config.toml
    ```
    [features]
    codex_hooks = true
    ```
    
    .codex/hooks.json
    ```
    {
      "hooks": {
        "SessionStart": [
          {
            "matcher": "startup|resume",
            "hooks": [
              {
                "type": "command",
                "command": "/usr/bin/python3 .codex/hooks/session_start_demo.py",
                "timeoutSec": 10,
                "statusMessage": "lighting the observatory"
              }
            ]
          }
        ],
        "Stop": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "/usr/bin/python3 .codex/hooks/stop_demo_block.py",
                "timeoutSec": 10,
                "statusMessage": "updating the guards"
              }
            ]
          }
        ]
      }
    }
    ```
    
    .codex/hooks/session_start_demo.py
    ```
    #!/usr/bin/env python3
    
    import json
    import sys
    from pathlib import Path
    
    
    def main() -> int:
        payload = json.load(sys.stdin)
        cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
        source = payload.get("source", "startup")
        source_label = "resume" if source == "resume" else "startup"
        source_prefix = (
            "Resume context:"
            if source == "resume"
            else "Startup context:"
        )
    
        output = {
            "systemMessage": (
                f"Hi, I'm a session start hook for {cwd} ({source_label})."
            ),
            "hookSpecificOutput": {
                "hookEventName": "SessionStart",
                "additionalContext": (
                    f"{source_prefix} A wimboltine stonpet is an exotic cuisine from hyperspace"
                ),
            },
        }
        print(json.dumps(output))
        return 0
    
    
    if __name__ == "__main__":
        raise SystemExit(main())
    ```
    
    .codex/hooks/stop_demo_block.py
    ```
    #!/usr/bin/env python3
    
    import json
    import sys
    
    
    def main() -> int:
        payload = json.load(sys.stdin)
        stop_hook_active = payload.get("stop_hook_active", False)
        last_assistant_message = payload.get("last_assistant_message") or ""
        char_count = len(last_assistant_message.strip())
    
        if stop_hook_active:
            system_message = (
                "Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop."
            )
            print(json.dumps({"systemMessage": system_message}))
        else:
            system_message = "Wizard Tower Stop hook continuing conversation"
            print(json.dumps({"systemMessage": system_message, "decision": "block", "reason": "cook the stonpet"}))
    
        return 0
    
    
    if __name__ == "__main__":
        raise SystemExit(main())
    ```
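
    To illustrate the concatenation rule mentioned at the top of this
    message, here is a hedged Python sketch that merges several Stop hook
    outputs into one continuation prompt. It assumes the `reason` field of
    a blocking output is the continuation prompt, as the example run
    suggests. `continuation_prompt` and the blank-line separator are
    assumptions.

```python
import json
from typing import Optional


def continuation_prompt(stop_hook_outputs: list[str], separator: str = "\n\n") -> Optional[str]:
    """Join the `reason` of every blocking Stop hook output into one prompt."""
    reasons = []
    for raw in stop_hook_outputs:
        output = json.loads(raw)
        # Only outputs that block the stop and carry a reason contribute.
        if output.get("decision") == "block" and output.get("reason"):
            reasons.append(output["reason"])
    # The separator is an assumption; the commit only says "concatenated".
    return separator.join(reasons) if reasons else None


outputs = [
    json.dumps({"decision": "block", "reason": "cook the stonpet"}),
    json.dumps({"systemMessage": "stayed calm"}),
    json.dumps({"decision": "block", "reason": "plate it nicely"}),
]
prompt = continuation_prompt(outputs)
```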
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
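
    A Python sketch of the config-load rules above, applied to a single
    config scope represented as a dict. `migrate_guardian_alias` is a
    hypothetical helper, and carrying the alias value over to
    `smart_approvals` is an assumption about how the alias is resolved.

```python
def migrate_guardian_alias(scope: dict) -> dict:
    """Apply the deprecated `guardian_approval` alias within one config scope."""
    migrated = dict(scope)
    if "guardian_approval" in migrated:
        enabled = migrated.pop("guardian_approval")
        # Assumption: the alias value carries over to the current flag name.
        migrated.setdefault("smart_approvals", enabled)
        if enabled is True:
            # Backfill only when this scope has no reviewer configured yet.
            migrated.setdefault("approvals_reviewer", "guardian_subagent")
    # Plain `smart_approvals = true` is left alone: it is only the rollout gate.
    return migrated
```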
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
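
    For client authors, the payload shape above could be modeled roughly
    like this in Python. Field names follow the shapes listed here; the
    field types (for example `riskScore` as a float and `action` as a
    plain dict) and the `parse_notification` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_STATUSES = {"inProgress", "approved", "denied", "aborted"}


@dataclass
class Review:
    status: str
    riskScore: Optional[float] = None  # type is assumed
    riskLevel: Optional[str] = None
    rationale: Optional[str] = None

    def __post_init__(self) -> None:
        if self.status not in REVIEW_STATUSES:
            raise ValueError(f"unknown review status: {self.status!r}")


@dataclass
class AutoApprovalReviewNotification:
    threadId: str
    turnId: str
    targetItemId: str
    review: Review
    action: Optional[dict] = None  # guardian action summary, when available


def parse_notification(params: dict) -> AutoApprovalReviewNotification:
    return AutoApprovalReviewNotification(
        threadId=params["threadId"],
        turnId=params["turnId"],
        targetItemId=params["targetItemId"],
        review=Review(**params["review"]),
        action=params.get("action"),
    )


started = parse_notification(
    {"threadId": "t1", "turnId": "u1", "targetItemId": "i1", "review": {"status": "inProgress"}}
)
```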
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
      - pending guardian review state in the live status footer, including
        parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Fix wait_agent expectations in core tests (#14637)
    ## Summary
    - update stale core tool-spec expectations from `wait` to `wait_agent`
    - update the prompt-caching tool-name assertion to match the renamed
    tool
    - fix the Bazel regressions introduced after #14631 renamed the
    multi-agent wait tool
    
    ## Testing
    - `cargo test -p codex-core tools::spec::tests`
    - `cargo test -p codex-core suite::prompt_caching::prompt_tools_are_consistent_across_requests`
    
    Co-authored-by: Codex <noreply@openai.com>
  • Normalize MCP tool names to code-mode safe form (#14605)
    Code mode doesn't allow `-` in names and it's better if function names
    and code-mode names are the same.
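
    One way such a normalization could work, sketched in Python. The exact
    rule used by the commit is not stated; mapping every character outside
    `[0-9A-Za-z_]` to `_` is an assumption, and `normalize_tool_name` is a
    hypothetical name.

```python
import re


def normalize_tool_name(name: str) -> str:
    # Replace `-` (and, by assumption, any other non-identifier character)
    # so the same name works as a function name and as a code-mode name.
    return re.sub(r"[^0-9A-Za-z_]", "_", name)
```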
  • Stabilize multi-agent feature flag (#14622)
    - make multi_agent stable and enabled by default
    - update feature and tool-spec coverage to match the new default
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add code_mode_only feature (#14617)
    Summary
    - add the code_mode_only feature flag/config schema and wire its
    dependency on code_mode
    - update code mode tool descriptions to list nested tools with detailed
    headers
    - restrict available tools for prompt and exec descriptions when
    code_mode_only is enabled and test the behavior
    
    Testing
    - Not run (not requested)
  • chore: clarify plugin + app copy in model instructions (#14541)
    - clarify app mentions are in user messages
    - clarify what it means for tools to be provided via `codex_apps` MCP
    - add plugin descriptions (with basic sanitization) to top-level `##
    Plugins` section alongside the corresponding plugin names
    - explain that skills from plugins are prefixed with `plugin_name:` in
    top-level `## Plugins` section
    
    Changes that more logically organize the `Apps`, `Skills`, and `Plugins`
    instructions will land in a separate PR, as that shuffles dev + user
    instructions in ways that change tests broadly.
    
    ### Tests
    confirmed in local rollout, some new tests.
  • sending imagegencall response back to responseapi (#14558)
    Send the `ResponseItem::ImageGenerationCall` back as is, because the
    API now supports it.
  • Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
    ## Summary
    - launch Windows sandboxed children on a private desktop instead of
    `Winsta0\Default`
    - make private desktop the default while keeping
    `windows.sandbox_private_desktop=false` as the escape hatch
    - centralize process launch through the shared
    `create_process_as_user(...)` path
    - scope the private desktop ACL to the launching logon SID
    
    ## Why
    Today sandboxed Windows commands run on the visible shared desktop. That
    leaves an avoidable same-desktop attack surface for window interaction,
    spoofing, and related UI/input issues. This change moves sandboxed
    commands onto a dedicated per-launch desktop by default so the sandbox
    no longer shares `Winsta0\Default` with the user session.
    
    The implementation stays conservative on security: there is no silent
    fallback to `Winsta0\Default`.
    
    If private-desktop setup fails on a machine, users can still opt out
    explicitly with `windows.sandbox_private_desktop=false`.
    
    ## Validation
    - `cargo build -p codex-cli`
    - elevated-path `codex exec` desktop-name probe returned
    `CodexSandboxDesktop-*`
    - elevated-path `codex exec` smoke sweep for shell commands, nested
    `pwsh`, jobs, and hidden `notepad` launch
    - unelevated-path full private-desktop compatibility sweep via `codex
    exec` with `-c windows.sandbox=unelevated`