Commit Graph

123 Commits

  • feat: change multi-agent to use path-like system instead of uuids (#15313)
    This PR add an URI-based system to reference agents within a tree. This
    comes from a sync between research and engineering.
    
    The main agent (the one manually spawned by a user) is always called
    `/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
    where `agent_1` is chosen by the model.
    
    Any agent can contact any agents using the path.
    
    Paths can be used either in absolute or relative to the calling agents
    
    Resume is not supported for now on this new path
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat: support product-scoped plugins. (#15041)
    1. Added SessionSource::Custom(String) and --session-source.
      2. Enforced plugin and skill products by session_source.
      3. Applied the same filtering to curated background refresh.
  • Add thread/shellCommand to app server API surface (#14988)
    This PR adds a new `thread/shellCommand` app server API so clients can
    implement `!` shell commands. These commands are executed within the
    sandbox, and the command text and output are visible to the model.
    
    The internal implementation mirrors the current TUI `!` behavior.
    - persist shell command execution as `CommandExecution` thread items,
    including source and formatted output metadata
    - bridge live and replayed app-server command execution events back into
    the existing `tui_app_server` exec rendering path
    
    This PR also wires `tui_app_server` to submit `!` commands through the
    new API.
  • Revert "fix: harden plugin feature gating" (#15102)
    Reverts openai/codex#15020
    
    I messed up the commit in my PR and accidentally merged changes that
    were still under review.
  • fix: harden plugin feature gating (#15020)
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
  • Prefer websockets when providers support them (#13592)
    Remove all flags and model settings.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Fix fuzzy search notification buffering in app-server tests (#14955)
    ## What is flaky
    `codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently
    loses the expected `fuzzyFileSearch/sessionUpdated` and
    `fuzzyFileSearch/sessionCompleted` notifications when multiple
    fuzzy-search sessions are active and CI delivers notifications out of
    order.
    
    ## Why it was flaky
    The wait helpers were keyed only by JSON-RPC method name.
    
    - `wait_for_session_updated` consumed the next
    `fuzzyFileSearch/sessionUpdated` notification even when it belonged to a
    different search session.
    - `wait_for_session_completed` did the same for
    `fuzzyFileSearch/sessionCompleted`.
    - Once an unmatched notification was read, it was dropped permanently
    instead of buffered.
    - That meant a valid completion for the target search could arrive
    slightly early, be consumed by the wrong waiter, and disappear before
    the test started waiting for it.
    
    The result depended on notification ordering and runner scheduling
    instead of on the actual product behavior.
    
    ## How this PR fixes it
    - Add a buffered notification reader in
    `codex-rs/app-server/tests/common/mcp_process.rs`.
    - Match fuzzy-search notifications on the identifying payload fields
    instead of matching only on method name.
    - Preserve unmatched notifications in the in-process queue so later
    waiters can still consume them.
    - Include pending notification methods in timeout failures to make
    future diagnosis concrete.
    
    ## Why this fix fixes the flakiness
    The test now behaves like a real consumer of an out-of-order event
    stream: notifications for other sessions stay buffered until the correct
    waiter asks for them. Reordering no longer loses the target event, so
    the test result is determined by whether the server emitted the right
    notifications, not by which one happened to be read first.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • Add openai_base_url config override for built-in provider (#12031)
    We regularly get bug reports from users who mistakenly have the
    `OPENAI_BASE_URL` environment variable set. This PR deprecates this
    environment variable in favor of a top-level config key
    `openai_base_url` that is used for the same purpose. By making it a
    config key, it will be more visible to users. It will also participate
    in all of the infrastructure we've added for layered and managed
    configs.
    
    Summary
    - introduce the `openai_base_url` top-level config key, update
    schema/tests, and route the built-in openai provider through it while
    - fall back to deprecated `OPENAI_BASE_URL` env var but warn user of
    deprecation when no `openai_base_url` config key is present
    - update CLI, SDK, and TUI code to prefer the new config path (with a
    deprecated env-var fallback) and document the SDK behavior change
  • app-server: add v2 filesystem APIs (#14245)
    Add a protocol-level filesystem surface to the v2 app-server so Codex
    clients can read and write files, inspect directories, and subscribe to
    path changes without relying on host-specific helpers.
    
    High-level changes:
    - define the new v2 fs/readFile, fs/writeFile, fs/createDirectory,
    fs/getMetadata, fs/readDirectory, fs/remove, fs/copy RPCs
    - implement the app-server handlers, including absolute-path validation,
    base64 file payloads, recursive copy/remove semantics
    - document the API, regenerate protocol schemas/types, and add
    end-to-end tests for filesystem operations, copy edge cases
    
    Testing plan:
    - validate protocol serialization and generated schema output for the
    new fs request, response, and notification types
    - run app-server integration coverage for file and directory CRUD paths,
    metadata/readDirectory responses, copy failure modes, and absolute-path
    validation
  • Add plugin usage telemetry (#14531)
    adding metrics including: 
    * plugin used
    * plugin installed/uninstalled
    * plugin enabled/disabled
  • feat: add plugin/read. (#14445)
    return more information for a specific plugin.
  • chore(app-server): stop emitting codex/event/ notifications (#14392)
    ## Description
    
    This PR stops emitting legacy `codex/event/*` notifications from the
    public app-server transports.
    
    It's been a long time coming! app-server was still producing a raw
    notification stream from core, alongside the typed app-server
    notifications and server requests, for compatibility reasons. Now,
    external clients should no longer be depending on those legacy
    notifications, so this change removes them from the stdio and websocket
    contract and updates the surrounding docs, examples, and tests to match.
    
    ### Caveat
    I left the "in-process" version of app-server alone for now, since
    `codex exec` was recently based on top of app-server via this in-process
    form here: https://github.com/openai/codex/pull/14005
    
    Seems like `codex exec` still consumes some legacy notifications
    internally, so this branch only removes `codex/event/*` from app-server
    over stdio and websockets.
    
    ## Follow-up
    
    Once `codex exec` is fully migrated off `codex/event/*` notifications,
    we'll be able to stop emitting them entirely entirely instead of just
    filtering it at the external transport boundary.
  • chore: plugin/uninstall endpoint (#14111)
    add `plugin/uninstall` app-server endpoint to fully rm plugin from
    plugins cache dir and rm entry from user config file.
    
    plugin-enablement is session-scoped, so uninstalls are only picked up in
    new sessions (like installs).
    
    added tests.
  • Add request permissions tool (#13092)
    Adds a built-in `request_permissions` tool and wires it through the
    Codex core, protocol, and app-server layers so a running turn can ask
    the client for additional permissions instead of relying on a static
    session policy.
    
    The new flow emits a `RequestPermissions` event from core, tracks the
    pending request by call ID, forwards it through app-server v2 as an
    `item/permissions/requestApproval` request, and resumes the tool call
    once the client returns an approved subset of the requested permission
    profile.
  • app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
    * Add an ability to stream stdin, stdout, and stderr
    * Streaming of stdout and stderr has a configurable cap for total amount
    of transmitted bytes (with an ability to disable it)
    * Add support for overriding environment variables
    * Add an ability to terminate running applications (using
    `command/exec/terminate`)
    * Add TTY/PTY support, with an ability to resize the terminal (using
    `command/exec/resize`)
  • check app auth in plugin/install (#13685)
    #### What
    on `plugin/install`, check if installed apps are already authed on
    chatgpt, and return list of all apps that are not. clients can use this
    list to trigger auth workflows as needed.
    
    checks are best effort based on `codex_apps` loading, much like
    `app/list`.
    
    #### Tests
    Added integration tests, tested locally.
  • support plugin/list. (#13540)
    Introduce a plugin/list which reads from local marketplace.json.
    Also update the signature for plugin/install.
  • chore: add web_search_tool_type for image support (#13538)
    add `web_search_tool_type` on model_info that can be populated from
    backend. will be used to filter which models can use `web_search` with
    images and which cant.
    
    added small unit test.
  • Add under-development original-resolution view_image support (#13050)
    ## Summary
    
    Add original-resolution support for `view_image` behind the
    under-development `view_image_original_resolution` feature flag.
    
    When the flag is enabled and the target model is `gpt-5.3-codex` or
    newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
    `detail: "original"` to the Responses API instead of using the legacy
    resize/compress path.
    
    ## What changed
    
    - Added `view_image_original_resolution` as an under-development feature
    flag.
    - Added `ImageDetail` to the protocol models and support for serializing
    `detail: "original"` on tool-returned images.
    - Added `PromptImageMode::Original` to `codex-utils-image`.
      - Preserves original PNG/JPEG/WebP bytes.
      - Keeps legacy behavior for the resize path.
    - Updated `view_image` to:
    - use the shared `local_image_content_items_with_label_number(...)`
    helper in both code paths
      - select original-resolution mode only when:
        - the feature flag is enabled, and
        - the model slug parses as `gpt-5.3-codex` or newer
    - Kept local user image attachments on the existing resize path; this
    change is specific to `view_image`.
    - Updated history/image accounting so only `detail: "original"` images
    use the docs-based GPT-5 image cost calculation; legacy images still use
    the old fixed estimate.
    - Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
    at 85% quality unless lossless is required, while still allowing other
    formats when explicitly requested.
    - Updated tests and helper code that construct
    `FunctionCallOutputContentItem::InputImage` to carry the new `detail`
    field.
    
    ## Behavior
    
    ### Feature off
    - `view_image` keeps the existing resize/re-encode behavior.
    - History estimation keeps the existing fixed-cost heuristic.
    
    ### Feature on + `gpt-5.3-codex+`
    - `view_image` sends original-resolution images with `detail:
    "original"`.
    - PNG/JPEG/WebP source bytes are preserved when possible.
    - History estimation uses the GPT-5 docs-based image-cost calculation
    for those `detail: "original"` images.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/13050
    -  `2` https://github.com/openai/codex/pull/13331
    -  `3` https://github.com/openai/codex/pull/13049
  • Add thread metadata update endpoint to app server (#13280)
    ## Summary
    - add the v2 `thread/metadata/update` API, including
    protocol/schema/TypeScript exports and app-server docs
    - patch stored thread `gitInfo` in sqlite without resuming the thread,
    with validation plus support for explicit `null` clears
    - repair missing sqlite thread rows from rollout data before patching,
    and make those repairs safe by inserting only when absent and updating
    only git columns so newer metadata is not clobbered
    - keep sqlite authoritative for mutable thread git metadata by
    preserving existing sqlite git fields during reconcile/backfill and only
    using rollout `SessionMeta` git fields to fill gaps
    - add regression coverage for the endpoint, repair paths, concurrent
    sqlite writes, clearing git fields, and rollout/backfill reconciliation
    - fix the login server shutdown race so cancelling before the waiter
    starts still terminates `block_until_done()` correctly
    
    ## Testing
    - `cargo test -p codex-state
    apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-state
    update_thread_git_info_preserves_newer_non_git_metadata`
    - `cargo test -p codex-core
    backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-app-server thread_metadata_update`
    - `cargo test`
    - currently fails in existing `codex-core` grep-files tests with
    `unsupported call: grep_files`:
        - `suite::grep_files::grep_files_tool_collects_matches`
        - `suite::grep_files::grep_files_tool_reports_empty_results`
  • chore(app-server): delete v1 RPC methods and notifications (#13375)
    ## Summary
    This removes the old app-server v1 methods and notifications we no
    longer need, while keeping the small set the main codex app client still
    depends on for now.
    
    The remaining legacy surface is:
    - `initialize`
    - `getConversationSummary`
    - `getAuthStatus`
    - `gitDiffToRemote`
    - `fuzzyFileSearch`
    - `fuzzyFileSearch/sessionStart`
    - `fuzzyFileSearch/sessionUpdate`
    - `fuzzyFileSearch/sessionStop`
    
    And the raw `codex/event/*` notifications emitted from core. These
    notifications will be removed in a followup PR.
    
    ## What changed
    - removed deprecated v1 request variants from the protocol and
    app-server dispatcher
    - removed deprecated typed notifications: `authStatusChange`,
    `loginChatGptComplete`, and `sessionConfigured`
    - updated the app-server test client to use v2 flows instead of deleted
    v1 flows
    - deleted legacy-only app-server test suites and added focused coverage
    for `getConversationSummary`
    - regenerated app-server schema fixtures and updated the MCP interface
    docs to match the remaining compatibility surface
    
    ## Testing
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server`
  • feat(app-server): add tracing to all app-server APIs (#13285)
    ### Overview
    This PR adds the first piece of tracing for app-server JSON-RPC
    requests.
    
    There are two main changes:
    - JSON-RPC requests can now take an optional W3C trace context at the
    top level via a `trace` field (`traceparent` / `tracestate`).
    - app-server now creates a dedicated request span for every inbound
    JSON-RPC request in `MessageProcessor`, and uses the request-level trace
    context as the parent when present.
    
    For compatibility with existing flows, app-server still falls back to
    the TRACEPARENT env var when there is no request-level traceparent.
    
    This PR is intentionally scoped to the app-server boundary. In a
    followup, we'll actually propagate trace context through the async
    handoff into core execution spans like run_turn, which will make
    app-server traces much more useful.
    
    ### Spans
    A few details on the app-server span shape:
    - each inbound request gets its own server span
    - span/resource names are based on the JSON-RPC method (`initialize`,
    `thread/start`, `turn/start`, etc.)
    - spans record transport (stdio vs websocket), request id, connection
    id, and client name/version when available
    - `initialize` stores client metadata in session state so later requests
    on the same connection can reuse it
  • feat: polluted memories (#13008)
    Add a feature flag to disable memory creation for "polluted"
  • Add model availability NUX metadata (#12972)
    - replace show_nux with structured availability_nux model metadata
    - expose availability NUX data through the app-server model API
    - update shared fixtures and tests for the new field
  • Use model catalog default for reasoning summary fallback (#12873)
    ## Summary
    - make `Config.model_reasoning_summary` optional so unset means use
    model default
    - resolve the optional config value to a concrete summary when building
    `TurnContext`
    - add protocol support for `default_reasoning_summary` in model metadata
    
    ## Validation
    - `cargo test -p codex-core --lib client::tests -- --nocapture`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server): thread/unsubscribe API (#10954)
    Adds a new v2 app-server API for a client to be able to unsubscribe to a
    thread:
    - New RPC method: `thread/unsubscribe`
    - New server notification: `thread/closed`
    
    Today clients can start/resume/archive threads, but there wasn’t a way
    to explicitly unload a live thread from memory without archiving it.
    With `thread/unsubscribe`, a client can indicate it is no longer
    actively working with a live Thread. If this is the only client
    subscribed to that given thread, the thread will be automatically closed
    by app-server, at which point the server will send `thread/closed` and
    `thread/status/changed` with `status: notLoaded` notifications.
    
    This gives clients a way to prevent long-running app-server processes
    from accumulating too many thread (and related) objects in memory.
    
    Closed threads will also be removed from `thread/loaded/list`.
  • Add app-server v2 thread realtime API (#12715)
    Add experimental `thread/realtime/*` v2 requests and notifications, then
    route app-server realtime events through that thread-scoped surface with
    integration coverage.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
    ## Summary
    - Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
    auto-export, and store results in SQLite.
    - Simplify workflow: remove run/resume/get-status/export tools; spawn is
    deterministic and completes in one call.
    - Improve exec UX: stable, single-line progress bar with ETA; suppress
    sub-agent chatter in exec.
    
    ## Why
    Enables map-reduce style workflows over arbitrarily large repos using
    the existing Codex orchestrator. This addresses review feedback about
    overly complex job controls and non-deterministic monitoring.
    
    ## Demo (progress bar)
    ```
    ./codex-rs/target/debug/codex exec \
      --enable collab \
      --enable sqlite \
      --full-auto \
      --progress-cursor \
      -c agents.max_threads=16 \
      -C /Users/daveaitel/code/codex \
      - <<'PROMPT'
    Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
    path = item-01..item-30, area = test.
    
    Then call spawn_agents_on_csv with:
    - csv_path: /tmp/agent_job_progress_demo.csv
    - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
    - output_csv_path: /tmp/agent_job_progress_demo_out.csv
    PROMPT
    ```
    
    ## Review feedback addressed
    - Auto-start jobs on spawn; removed run/resume/status/export tools.
    - Auto-export on success.
    - More descriptive tool spec + clearer prompts.
    - Avoid deadlocks on spawn failure; pending/running handled safely.
    - Progress bar no longer scrolls; stable single-line redraw.
    
    ## Tests
    - `cd codex-rs && cargo test -p codex-exec`
    - `cd codex-rs && cargo build -p codex-cli`
  • chore: rm hardcoded PRESETS list (#12650)
    rm `PRESETS` list harcoded in `model_presets` as we now have bundled
    `models.json` with equivalent info.
    
    update logic to rely on bundled models instead, update tests.
  • Add field to Thread object for the latest rename set for a given thread (#12301)
    Exposes through the app server updated names set for a thread. This
    enables other surfaces to use the core as the source of truth for thread
    naming. `threadName` is gathered using the helper functions used to
    interact with `session_index.jsonl`, and is hydrated in:
    - `thread/list`
    - `thread/read`
    - `thread/resume`
    - `thread/unarchive`
    - `thread/rollback`
    
    We don't do this for `thread/start` and `thread/fork`.
  • feat: add nick name to sub-agents (#12320)
    Adding random nick name to sub-agents. Used for UX
    
    At the same time, also storing and wiring the role of the sub-agent
  • tests: centralize in-flight turn cleanup helper (#12271)
    ## Why
    
    Several tests intentionally exercise behavior while a turn is still
    active. The cleanup sequence for those tests (`turn/interrupt` + waiting
    for `codex/event/turn_aborted`) was duplicated across files, which made
    the rationale easy to lose and the pattern easy to apply inconsistently.
    
    This change centralizes that cleanup in one place with a single
    explanatory doc comment.
    
    ## What Changed
    
    ### Added shared helper
    
    In `codex-rs/app-server/tests/common/mcp_process.rs`:
    
    - Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`.
    - Added a doc comment explaining why explicit interrupt + terminal wait
    is required for tests that intentionally leave a turn in-flight.
    
    ### Migrated call sites
    
    Replaced duplicated interrupt/aborted blocks with the helper in:
    
    - `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
      - `thread_resume_rejects_history_when_thread_is_running`
      - `thread_resume_rejects_mismatched_path_when_thread_is_running`
    - `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs`
      - `turn_start_shell_zsh_fork_executes_command_v2`
    -
    `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
    - `codex-rs/app-server/tests/suite/v2/turn_steer.rs`
      - `turn_steer_returns_active_turn_id`
    
    ### Existing cleanup retained
    
    In `codex-rs/app-server/tests/suite/v2/turn_start.rs`:
    
    - `turn_start_accepts_local_image_input` continues to explicitly wait
    for `turn/completed` so the turn lifecycle is fully drained before test
    exit.
    
    ## Verification
    
    - `cargo test -p codex-app-server`
  • app-server tests: reduce intermittent nextest LEAK via graceful child shutdown (#12266)
    ## Why
    `cargo nextest` was intermittently reporting `LEAK` for
    `codex-app-server` tests even when assertions passed. This adds noise
    and flakiness to local/CI signals.
    
    Sample output used as the basis of this investigation:
    
    ```text
    LEAK [   7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1
    LEAK [   7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model
    LEAK [   7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests
    LEAK [   8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2
    LEAK [   8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item
    LEAK [   8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2
    LEAK [   6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2
    LEAK [   6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2
    ```
    
    ## How I Reproduced
    I focused on the suspect tests and ran them under `nextest` stress mode
    with leak reporting enabled.
    
    ```bash
    cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) | test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) | test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) | test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) | test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) | test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) | test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)'
    ```
    
    This reproduced intermittent `LEAK` statuses while tests still passed.
    
    ## What Changed
    In `codex-rs/app-server/tests/common/mcp_process.rs`:
    
    - Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown
    can explicitly close stdin.
    - In `Drop`, close stdin first to trigger EOF-based graceful shutdown.
    - Wait briefly for graceful exit.
    - If still running, fall back to `start_kill()` and the existing bounded
    `try_wait()` loop.
    - Updated send-path handling to bail if stdin is already closed.
    
    ## Why This Is the Right Fix
    The leak signal was caused by child-process teardown timing, not
    test-logic assertion failure. The helper previously relied mostly on
    force-kill timing in `Drop`; that can race with nextest leak detection.
    
    Closing stdin first gives `codex-app-server` a deterministic, graceful
    shutdown path before force-kill. Keeping the force-kill fallback
    preserves robustness if graceful shutdown does not complete in time.
    
    ## Verification
    - `cargo test -p codex-app-server`
    - Re-ran the stress repro above after this change: no `LEAK` statuses
    observed.
    - Additional high-signal stress run also showed no leaks:
    
    ```bash
    cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)'
    ```
  • Stabilize app-server detached review and running-resume tests (#12203)
    ## Summary
    - stabilize
    `thread_resume_rejoins_running_thread_even_with_override_mismatch` by
    using a valid delayed second SSE response instead of an intentionally
    truncated stream
    - set `RUST_MIN_STACK=4194304` for spawned app-server test processes in
    `McpProcess` to avoid stack-sensitive CI overflows in detached review
    tests
    
    ## Why
    - the thread-resume assertion could race with a mocked stream-disconnect
    error and intermittently observe `systemError`
    - detached review startup is stack-sensitive in some CI environments;
    pinning a larger stack in the test harness removes that flake without
    changing product behavior
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-app-server --test all
    suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
    - `cargo test -p codex-app-server --test all
    suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
  • app-server support for Windows sandbox setup. (#12025)
    app-server support for initiating Windows sandbox setup.
    server responds quickly to setup request and makes a future RPC call
    back to client when the setup finishes.
    
    The TUI implementation is unaffected but in a future PR I'll update the
    TUI to use the shared setup helper
    (`windows_sandbox.run_windows_sandbox_setup`)
  • chore: rm remote models fflag (#11699)
    rm `remote_models` feature flag.
    
    We see issues like #11527 when a user has `remote_models` disabled, as
    we always use the default fallback `ModelInfo`. This causes issues with
    model performance.
    
    Builds on #11690, which helps by warning the user when they are using
    the default fallback. This PR will make that happen much less frequently
    as an accidental consequence of disabling `remote_models`.
  • fix: show user warning when using default fallback metadata (#11690)
    ### What
    It's currently unclear when the harness falls back to the default,
    generic `ModelInfo`. This happens when the `remote_models` feature is
    disabled or the model is truly unknown, and can lead to bad performance
    and issues in the harness.
    
    Add a user-facing warning when this happens so they are aware when their
    setup is broken.
    
    ### Tests
    Added tests, tested locally.
  • Reapply "Add app-server transport layer with websocket support" (#11370)
    Reapply "Add app-server transport layer with websocket support" with
    additional fixes from https://github.com/openai/codex/pull/11313/changes
    to avoid deadlocking.
    
    This reverts commit 47356ff83c.
    
    ## Summary
    
    To avoid deadlocking when queues are full, we maintain separate tokio
    tasks dedicated to incoming vs outgoing event handling
    - split the app-server main loop into two tasks in
    `run_main_with_transport`
       - inbound handling (`transport_event_rx`)
       - outbound handling (`outgoing_rx` + `thread_created_rx`)
    - separate incoming and outgoing websocket tasks
    
    ## Validation
    
    Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
    requests
    
    <img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
    src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
    />
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
  • Prefer websocket transport when model opts in (#11386)
    Summary
    - add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
    in all fixtures and constructors
    - wire the new flag into websocket selection so models that opt in
    always use websocket transport even when the feature gate is off
    
    Testing
    - Not run (not requested)
  • feat: opt-out of events in the app-server (#11319)
    Add `optOutNotificationMethods` in the app-server to opt-out events
    based on exact method matching
  • fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
    …ount_id and chatgpt_plan_type
    
    ### Summary
    Following up on external auth mode which was introduced here:
    https://github.com/openai/codex/pull/10012
    
    Turns out some clients have a differently shaped ID token and don't have
    a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
    So, let's replace `id_token` param with `chatgpt_account_id` and
    `chatgpt_plan_type` (optional) when initializing the external ChatGPT
    auth mode (`account/login/start` with `chatgptAuthTokens`).
    
    The client was able to test end-to-end with a Codex build from this
    branch and verified it worked!
  • test: deflake nextest child-process leak in MCP harnesses (#11263)
    ## Summary
    - add deterministic child-process cleanup to both test `McpProcess`
    helpers
    - keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
    polling in `Drop`
    - document the failure mode and why this avoids nondeterministic `LEAK`
    flakes
    
    ## Why
    `cargo nextest` leak detection can intermittently report `LEAK` when a
    spawned server outlives test teardown, making CI flaky.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-app-server`
    - `cargo test -p codex-mcp-server`
    
    
    ## Failing CI Reference
    - Original failing job:
    https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
  • feat(app-server): turn/steer API (#10821)
    This PR adds a dedicated `turn/steer` API for appending user input to an
    in-flight turn.
    
    ## Motivation
    Currently, steering in the app is implemented by just calling
    `turn/start` while a turn is running. This has some really weird quirks:
    - Client gets back a new `turn.id`, even though streamed
    events/approvals remained tied to the original active turn ID.
    - All the various turn-level override params on `turn/start` do not
    apply to the "steer", and would only apply to the next real turn.
    - There can also be a race condition where the client thinks the turn is
    active but the server has already completed it, so there might be bugs
    if the client has baked in some client-specific behavior thinking it's a
    steer when in fact the server kicked off a new turn. This is
    particularly possible when running a client against a remote app-server.
    
    Having a dedicated `turn/steer` API eliminates all those quirks.
    
    `turn/steer` behavior:
    - Requires an active turn on threadId. Returns a JSON-RPC error if there
    is no active turn.
    - If expectedTurnId is provided, it must match the active turn (more
    useful when connecting to a remote app-server).
    - Does not emit `turn/started`.
    - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
    `outputSchema` to accurately reflect that these are not applied when
    steering.