474 Commits

  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    curent Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, especially explicit writable roots with the option to make some
    directories read-only and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
    - Added internal retry without /proc when /proc mount is not permitted
    by the host/container.
  • Handle exec shutdown on Interrupt (fixes immortal codex exec with websockets) (#10519)
    ### Motivation
    - Ensure `codex exec` exits when a running turn is interrupted (e.g.,
    Ctrl-C) so the CLI is not "immortal" when websockets/streaming are used.
    
    ### Description
    - Return `CodexStatus::InitiateShutdown` when handling
    `EventMsg::TurnAborted` in
    `exec/src/event_processor_with_human_output.rs` so human-output exec
    mode shuts down after an interrupt.
    - Treat `protocol::EventMsg::TurnAborted` as
    `CodexStatus::InitiateShutdown` in
    `exec/src/event_processor_with_jsonl_output.rs` so JSONL output mode
    behaves the same.
    - Applied formatting with `just fmt`.
    
    ### Testing
    - Ran `just fmt` successfully.
    - Ran `cargo test -p codex-exec`; many unit tests ran and the test
    command completed, but the full test run in this environment produced
    `35 passed, 11 failed` where the failures are due to Landlock sandbox
    panics and 403 responses in the test harness (environmental/integration
    issues) and are not caused by the interrupt/shutdown changes.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_698165cec4e083258d17702bd29014c1)
  • feat: add APIs to list and download public remote skills (#10448)
    Add API to list / download from remote public skills
  • Cleanup collaboration mode variants (#10404)
    ## Summary
    
    This PR simplifies collaboration modes to the visible set `default |
    plan`, while preserving backward compatibility for older partners that
    may still send legacy mode
    names.
    
    Specifically:
    - Renames the old Code behavior to **Default**.
    - Keeps **Plan** as-is.
    - Removes **Custom** mode behavior (fallbacks now resolve to Default).
    - Keeps `PairProgramming` and `Execute` internally for compatibility
    plumbing, while removing them from schema/API and UI visibility.
    - Adds legacy input aliasing so older clients can still send old mode
    names.
    
    ## What Changed
    
    1. Mode enum and compatibility
    - `ModeKind` now uses `Plan` + `Default` as active/public modes.
    - `ModeKind::Default` deserialization accepts legacy values:
      - `code`
      - `pair_programming`
      - `execute`
      - `custom`
    - `PairProgramming` and `Execute` variants remain in code but are hidden
    from protocol/schema generation.
    - `Custom` variant is removed; previous custom fallbacks now map to
    `Default`.
    
    2. Collaboration presets and templates
    - Built-in presets now return only:
      - `Plan`
      - `Default`
    - Template rename:
      - `core/templates/collaboration_mode/code.md` -> `default.md`
    - `execute.md` and `pair_programming.md` remain on disk but are not
    surfaced in visible preset lists.
    
    3. TUI updates
    - Updated user-facing naming and prompts from “Code” to “Default”.
    - Updated mode-cycle and indicator behavior to reflect only visible
    `Plan` and `Default`.
    - Updated corresponding tests and snapshots.
    
    4. request_user_input behavior
    - `request_user_input` remains allowed only in `Plan` mode.
    - Rejection messaging now consistently treats non-plan modes as
    `Default`.
    
    5. Schemas
    - Regenerated config and app-server schemas.
    - Public schema types now advertise mode values as:
      - `plan`
      - `default`
    
    ## Backward Compatibility Notes
    
    - Incoming legacy mode names (`code`, `pair_programming`, `execute`,
    `custom`) are accepted and coerced to `default`.
    - Outgoing/public schema surfaces intentionally expose only `plan |
    default`.
    - This allows tolerant ingestion of older partner payloads while
    standardizing new integrations on the reduced mode set.
    
    ## Codex author
    `codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • chore(config) Rename config setting to personality (#10314)
    ## Summary
    Let's make the setting name consistent with the SlashCommand!
    
    ## Testing
    - [x] Updated tests
  • add missing fields to WebSearchAction and update app-server types (#10276)
    - add `WebSearchAction` to app-server v2 types
    - add `queries` to `WebSearchAction::Search` type
    
    Updated tests.
  • Add enforce_residency to requirements (#10263)
    Add `enforce_residency` to requirements.toml and thread it through to a
    header on `default_client`.
  • Wire up cloud reqs in exec, app-server (#10241)
    We're fetching cloud requirements in TUI in
    https://github.com/openai/codex/pull/10167.
    
    This adds the same fetching in exec and app-server binaries also.
  • Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786)
    ## Summary
    - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
    in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
    tags from normal assistant output.
    - Persist plan items and rebuild them on resume so proposed plans show
    in thread history.
    - Wire plan items/deltas through app-server protocol v2 and render a
    dedicated proposed-plan view in the TUI, including the “Implement this
    plan?” prompt only when a plan item is present.
    
    ## Changes
    
    ### Core (`codex-rs/core`)
    - Added a generic, line-based tag parser that buffers each line until it
    can disprove a tag prefix; implements auto-close on `finish()` for
    unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
    - Refactored proposed plan parsing to wrap the generic parser.
    `codex-rs/core/src/proposed_plan_parser.rs`
    - In plan mode, stream assistant deltas as:
      - **Normal text** → `AgentMessageContentDelta`
      - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion  
      (`codex-rs/core/src/codex.rs`)
    - Final plan item content is derived from the completed assistant
    message (authoritative), not necessarily the concatenated deltas.
    - Strips `<proposed_plan>` blocks from assistant text in plan mode so
    tags don’t appear in normal messages.
    (`codex-rs/core/src/stream_events_utils.rs`)
    - Persist `ItemCompleted` events only for plan items for rollout replay.
    (`codex-rs/core/src/rollout/policy.rs`)
    - Guard `update_plan` tool in Plan Mode with a clear error message.
    (`codex-rs/core/src/tools/handlers/plan.rs`)
    - Updated Plan Mode prompt to:  
      - keep `<proposed_plan>` out of non-final reasoning/preambles  
      - require exact tag formatting  
      - allow only one `<proposed_plan>` block per turn  
      (`codex-rs/core/templates/collaboration_mode/plan.md`)
    
    ### Protocol / App-server protocol
    - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
    (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
    - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
    EXPERIMENTAL markers and note that deltas may not match the final plan
    item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
    - Added plan delta route in app-server protocol common mapping.
    (`codex-rs/app-server-protocol/src/protocol/common.rs`)
    - Rebuild plan items from persisted `ItemCompleted` events on resume.
    (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)
    
    ### App-server
    - Forward plan deltas to v2 clients and map core plan items to v2 plan
    items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
    `codex-rs/app-server/src/codex_message_processor.rs`)
    - Added v2 plan item tests.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ### TUI
    - Added a dedicated proposed plan history cell with special background
    and padding, and moved “• Proposed Plan” outside the highlighted block.
    (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
    - Only show “Implement this plan?” when a plan item exists.
    (`codex-rs/tui/src/chatwidget.rs`,
    `codex-rs/tui/src/chatwidget/tests.rs`)
    
    <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
    src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
    />
    
    ### Docs / Misc
    - Updated protocol docs to mention plan deltas.
    (`codex-rs/docs/protocol_v1.md`)
    - Minor plumbing updates in exec/debug clients to tolerate plan deltas.
    (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)
    
    ## Tests
    - Added core integration tests:
      - Plan mode strips plan from agent messages.
      - Missing `</proposed_plan>` closes at end-of-message.  
      (`codex-rs/core/tests/suite/items.rs`)
    - Added unit tests for generic tag parser (prefix buffering, non-tag
    lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
    - Existing app-server plan item tests in v2.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ## Notes / Behavior
    - Plan output no longer appears in standard assistant text in Plan Mode;
    it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
    - The final plan item content is authoritative and may diverge from
    streamed deltas (documented as experimental).
    - Reasoning summaries are not filtered; prompt instructs the model not
    to include `<proposed_plan>` outside the final plan message.
    
    ## Codex Author
    `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
  • Conversation naming (#8991)
    Session renaming:
    - `/rename my_session`
    - `/rename` without arg and passing an argument in `customViewPrompt`
    - AppExitInfo shows resume hint using the session name if set instead of
    uuid, defaults to uuid if not set
    - Names are stored in `CODEX_HOME/sessions.jsonl`
    
    Session resuming:
    - codex resume <name> lookup for `CODEX_HOME/sessions.jsonl` first entry
    matching the name and resumes the session
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • [bazel] Improve runfiles handling (#10098)
    we can't use runfiles directory on Windows due to path lengths, so swap
    to manifest strategy. Parsing the manifest is a bit complex and the
    format is changing in Bazel upstream, so pull in the official Rust
    library (via a small hack to make it importable...) and cleanup all the
    associated logic to work cleanly in both bazel and cargo without extra
    confusion
  • Fix resume --last with --json option (#9475)
    Fix resume --last prompt parsing by dropping the clap conflict on the
    codex resume subcommand so a positional prompt is accepted when --last
    is set. This aligns interactive resume behavior with exec-mode logic and
    avoids the “--last cannot be used with SESSION_ID” error.
    
    This addresses #6717
  • fix: handle all web_search actions and in progress invocations (#9960)
    ### Summary
    - Parse all `web_search` tool actions (`search`, `find_in_page`,
    `open_page`).
    - Previously we only parsed + displayed `search`, which made the TUI
    appear to pause when the other actions were being used.
    - Show in progress `web_search` calls as `Searching the web`
      - Previously we only showed completed tool calls
    
    <img width="308" height="149" alt="image"
    src="https://github.com/user-attachments/assets/90a4e8ff-b06a-48ff-a282-b57b31121845"
    />
    
    ### Tests
    Added + updated tests, tested locally
    
    ### Follow ups
    Update VSCode extension to display these as well
  • Fix flakey resume test (#9789)
    Sessions' `updated_at` times are truncated to seconds, with the UUID
    session ID used to break ties. If the two test sessions are created in
    the same second, AND the session B UUID < session A UUID, the test
    fails.
    
    Fix this by mutating the session mtimes, from which we derive the
    updated_at time, to ensure session B is updated_at later than session A.
  • feat: dynamic tools injection (#9539)
    ## Summary
    Add dynamic tool injection to thread startup in API v2, wire dynamic
    tool calls through the app server to clients, and plumb responses back
    into the model tool pipeline.
    
    ### Flow (high level)
    - Thread start injects `dynamic_tools` into the model tool list for that
    thread (validation is done here).
    - When the model emits a tool call for one of those names, core raises a
    `DynamicToolCallRequest` event.
    - The app server forwards it to the client as `item/tool/call`, waits
    for the client’s response, then submits a `DynamicToolResponse` back to
    core.
    - Core turns that into a `function_call_output` in the next model
    request so the model can continue.
    
    ### What changed
    - Added dynamic tool specs to v2 thread start params and protocol types;
    introduced `item/tool/call` (request/response) for dynamic tool
    execution.
    - Core now registers dynamic tool specs at request time and routes those
    calls via a new dynamic tool handler.
    - App server validates tool names/schemas, forwards dynamic tool call
    requests to clients, and publishes tool outputs back into the session.
    - Integration tests
  • feat: ephemeral threads (#9765)
    Add ephemeral threads capabilities. Only exposed through the
    `app-server` v2
    
    The idea is to disable the rollout recorder for those threads.
  • Another round of improvements for config error messages (#9746)
    In a [recent PR](https://github.com/openai/codex/pull/9182), I made some
    improvements to config error messages so errors didn't leave app server
    clients in a dead state. This is a follow-on PR to make these error
    messages more readable and actionable for both TUI and GUI users. For
    example, see #9668 where the user was understandably confused about the
    source of the problem and how to fix it.
    
    The improved error message:
    1. Clearly identifies the config file where the error was found (which
    is more important now that we support layered configs)
    2. Provides a line and column number of the error
    3. Displays the line where the error occurred and underlines it
    
    For example, if my `config.toml` includes the following:
    ```toml
    [features]
    collaboration_modes = "true"
    ```
    
    Here's the current CLI error message:
    ```
    Error loading config.toml: invalid type: string "true", expected a boolean in `features`
    ```
    
    And here's the improved message:
    ```
    Error loading config.toml:
    /Users/etraut/.codex/config.toml:43:23: invalid type: string "true", expected a boolean
       |
    43 | collaboration_modes = "true"
       |                       ^^^^^^
    ```
    
    The bulk of the new logic is contained within a new module
    `config_loader/diagnostics.rs` that is responsible for calculating the
    text range for a given toml path (which is more involved than I would
    have expected).
    
    In addition, this PR adds the file name and text range to the
    `ConfigWarningNotification` app server struct. This allows GUI clients
    to present the user with a better error message and an optional link to
    open the errant config file. This was a suggestion from @.bolinfest when
    he reviewed my previous PR.
  • fix(exec): skip git repo check when --yolo flag is used (#9590)
    ## Summary
    
    Fixes #7522
    
    The `--yolo` (`--dangerously-bypass-approvals-and-sandbox`) flag is
    documented to skip all confirmation prompts and execute commands without
    sandboxing, intended solely for running in environments that are
    externally sandboxed. However, it was not bypassing the trusted
    directory (git repo) check, requiring users to also specify
    `--skip-git-repo-check`.
    
    This change makes `--yolo` also skip the git repo check, matching the
    documented behavior and user expectations.
    
    ## Changes
    
    - Modified `codex-rs/exec/src/lib.rs` to check for
    `dangerously_bypass_approvals_and_sandbox` flag in addition to
    `skip_git_repo_check` when determining whether to skip the git repo
    check
    
    ## Testing
    
    - Verified the code compiles with `cargo check -p codex-exec`
    - Ran existing tests with `cargo test -p codex-exec` (34 passed, 8
    integration tests failed due to unrelated API connectivity issues)
    
    ---
    🤖 Generated with [Claude Code](https://claude.ai/code)
    
    Co-authored-by: Claude <noreply@anthropic.com>
  • feat(app-server) Expose personality (#9674)
    ### Motivation
    Exposes a per-thread / per-turn `personality` override in the v2
    app-server API so clients can influence model communication style at
    thread/turn start. Ensures the override is passed into the session
    configuration resolution so it becomes effective for subsequent turns
    and headless runners.
    
    ### Testing
    - [x] Add an integration-style test
    `turn_start_accepts_personality_override_v2` in
    `codex-rs/app-server/tests/suite/v2/turn_start.rs` that verifies a
    `/personality` override results in a developer update message containing
    `<personality_spec>` in the outbound model request.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_6971d646b1c08322a689a54d2649f3fe)
  • feat(core) update Personality on turn (#9644)
    ## Summary
    Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
    is explicitly unused by the clients so far, to simplify PRs - app-server
    and tui implementations will be follow-ups.
    
    ## Testing
    - [x] added integration tests
  • Persist text elements through TUI input and history (#9393)
    Continuation of breaking up this PR
    https://github.com/openai/codex/pull/9116
    
    ## Summary
    - Thread user text element ranges through TUI/TUI2 input, submission,
    queueing, and history so placeholders survive resume/edit flows.
    - Preserve local image attachments alongside text elements and rehydrate
    placeholders when restoring drafts.
    - Keep model-facing content shapes clean by attaching UI metadata only
    to user input/events (no API content changes).
    
    ## Key Changes
    - TUI/TUI2 composer now captures text element ranges, trims them with
    text edits, and restores them when submission is suppressed.
    - User history cells render styled spans for text elements and keep
    local image paths for future rehydration.
    - Initial chat widget bootstraps accept empty `initial_text_elements` to
    keep initialization uniform.
    - Protocol/core helpers updated to tolerate the new InputText field
    shape without changing payloads sent to the API.
  • Feat: request user input tool (#9472)
    ### Summary
    * Add `requestUserInput` tool that the model can use for gather
    feedback/asking question mid turn.
    
    
    ### Tool input schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput input",
      "type": "object",
      "additionalProperties": false,
      "required": ["questions"],
      "properties": {
        "questions": {
          "type": "array",
          "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.",
          "minItems": 1,
          "maxItems": 3,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": ["id", "header", "question"],
            "properties": {
              "id": {
                "type": "string",
                "description": "Stable identifier for mapping answers (snake_case)."
              },
              "header": {
                "type": "string",
                "description": "Short header label shown in the UI (12 or fewer chars)."
              },
              "question": {
                "type": "string",
                "description": "Single-sentence prompt shown to the user."
              },
              "options": {
                "type": "array",
                "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.",
                "minItems": 2,
                "maxItems": 3,
                "items": {
                  "type": "object",
                  "additionalProperties": false,
                  "required": ["value", "label", "description"],
                  "properties": {
                    "value": {
                      "type": "string",
                      "description": "Machine-readable value (snake_case)."
                    },
                    "label": {
                      "type": "string",
                      "description": "User-facing label (1-5 words)."
                    },
                    "description": {
                      "type": "string",
                      "description": "One short sentence explaining impact/tradeoff if selected."
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```
    
    ### Tool output schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput output",
      "type": "object",
      "additionalProperties": false,
      "required": ["answers"],
      "properties": {
        "answers": {
          "type": "object",
          "description": "Map of question id to user answer.",
          "additionalProperties": {
            "type": "object",
            "additionalProperties": false,
            "required": ["selected"],
            "properties": {
              "selected": {
                "type": "array",
                "items": { "type": "string" }
              },
              "other": {
                "type": ["string", "null"]
              }
            }
          }
        }
      }
    }
    ```
  • feat: show forked from session id in /status (#9330)
    Summary:
    - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
    for forked sessions.
    - Surface forked_from in /status for tui + tui2 and add snapshots.
  • feat(app-server, core): return threads by created_at or updated_at (#9247)
    Add support for returning threads by either `created_at` OR `updated_at`
    descending. Previously core always returned threads ordered by
    `created_at`.
    
    This PR:
    - updates core to be able to list threads by `updated_at` OR
    `created_at` descending based on what the caller wants
    - also update `thread/list` in app-server to expose this (default to
    `created_at` if not specified)
    
    All existing codepaths (app-server, TUI) still default to `created_at`,
    so no behavior change is expected with this PR.
    
    **Implementation**
    To sort by `updated_at` is a bit nontrivial (whereas `created_at` is
    easy due to the way we structure the folders and filenames on disk,
    which are all based on `created_at`).
    
    The most naive way to do this without introducing a cache file or sqlite
    DB (which we have to implement/maintain) is to scan files in reverse
    `created_at` order on disk, and look at the file's mtime (last modified
    timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
    (currently set to 10,000). Then, we can return the most recent N
    threads.
    
    Based on some quick and dirty benchmarking on my machine with ~1000
    rollout files, calling `thread/list` with limit 50, the `updated_at`
    path is slower as expected due to all the I/O:
    - updated-at: average 103.10 ms
    - created-at: average 41.10 ms
    
    Those absolute numbers aren't a big deal IMO, but we can certainly
    optimize this in a followup if needed by introducing more state stored
    on disk.
    
    **Caveat**
    There's also a limitation in that any files older than `MAX_SCAN_FILES`
    will be excluded, which means if a user continues a REALLY old thread,
    it's possible to not be included. In practice that should not be too big
    of an issue.
    
    If a user makes...
    - 1000 rollouts/day → threads older than 10 days won't show up
    - 100 rollouts/day → ~100 days
    
    If this becomes a problem for some reason, even more motivation to
    implement an updated_at cache.
  • Made codex exec resume --last consistent with codex resume --last (#9352)
    PR #9245 made `codex resume --last` honor cwd, but I forgot to make the
    same change for `codex exec resume --last`. This PR fixes the
    inconsistency.
    
    This addresses #8700
  • fix(exec): improve stdin prompt decoding (#9151)
    Fixes #8733.
    
    - Read prompt from stdin as raw bytes and decode more helpfully.
    - Strip UTF-8 BOM; decode UTF-16LE/UTF-16BE when a BOM is present.
    - For other non-UTF8 input, fail with an actionable message (offset +
    iconv hint).
    
    Tests: `cargo test -p codex-exec`.
  • Add text element metadata to types (#9235)
    Initial type tweaking PR to make the diff of
    https://github.com/openai/codex/pull/9116 smaller
    
    This should not change any behavior, just adds some fields to types
  • feat: emit events around collab tools (#9095)
    Emit the following events around the collab tools. On the `app-server`
    this will be under `item/started` and `item/completed`
    ```
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentSpawnBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
        /// beginning.
        pub prompt: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentSpawnEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the newly spawned agent, if it was created.
        pub new_thread_id: Option<ThreadId>,
        /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
        /// beginning.
        pub prompt: String,
        /// Last known status of the new agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentInteractionBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
        /// leaking at the beginning.
        pub prompt: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentInteractionEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
        /// leaking at the beginning.
        pub prompt: String,
        /// Last known status of the receiver agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabWaitingBeginEvent {
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// ID of the waiting call.
        pub call_id: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabWaitingEndEvent {
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// ID of the waiting call.
        pub call_id: String,
        /// Last known status of the receiver agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabCloseBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabCloseEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Last known status of the receiver agent reported to the sender agent before
        /// the close.
        pub status: AgentStatus,
    }
    ```
  • Improve handling of config and rules errors for app server clients (#9182)
    When an invalid config.toml key or value is detected, the CLI currently
    just quits. This leaves the VSCE in a dead state.
    
    This PR changes the behavior to not quit and bubble up the config error
    to users to make it actionable. It also surfaces errors related to
    "rules" parsing.
    
    This allows us to surface these errors to users in the VSCE, like this:
    
    <img width="342" height="129" alt="Screenshot 2026-01-13 at 4 29 22 PM"
    src="https://github.com/user-attachments/assets/a79ffbe7-7604-400c-a304-c5165b6eebc4"
    />
    
    <img width="346" height="244" alt="Screenshot 2026-01-13 at 4 45 06 PM"
    src="https://github.com/user-attachments/assets/de874f7c-16a2-4a95-8c6d-15f10482e67b"
    />
  • clean models manager (#9168)
    Have only the following Methods:
    - `list_models`: getting current available models
    - `try_list_models`: sync version no refresh for tui use
    - `get_default_model`: get the default model (should be tightened to
    core and received on session configuration)
    - `get_model_info`: get `ModelInfo` for a specific model (should be
    tightened to core but used in tests)
    - `refresh_if_new_etag`: trigger refresh on different etags
    
    Also move the cache to its own struct
  • ollama: default to Responses API for built-ins (#8798)
    This is an alternate PR to solving the same problem as
    <https://github.com/openai/codex/pull/8227>.
    
    In this PR, when Ollama is used via `--oss` (or via `model_provider =
    "ollama"`), we default it to use the Responses format. At runtime, we do
    an Ollama version check, and if the version is older than when Responses
    support was added to Ollama, we print out a warning.
    
    Because there's no way of configuring the wire api for a built-in
    provider, we temporarily add a new `oss_provider`/`model_provider`
    called `"ollama-chat"` that will force the chat format.
    
    Once the `"chat"` format is fully removed (see
    <https://github.com/openai/codex/discussions/7782>), `ollama-chat` can
    be removed as well
    
    ---------
    
    Co-authored-by: Eric Traut <etraut@openai.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep-going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To setup remote execution:
    - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cutover to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • fix(app-server): set originator header from initialize JSON-RPC request (#8873)
    **Motivation**
    The `originator` header is important for codex-backend’s Responses API
    proxy because it identifies the real end client (codex cli, codex vscode
    extension, codex exec, future IDEs) and is used to categorize requests
    by client for our enterprise compliance API.
    
    Today the `originator` header is set by either:
    - the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var (our VSCode extension
    does this)
    - calling `set_default_originator()` which sets a global immutable
    singleton (`codex exec` does this)
    
    For `codex app-server`, we want the `initialize` JSON-RPC request to set
    that header because it is a natural place to do so. Example:
    ```json
    {
      "method": "initialize",
      "id": 0,
      "params": {
        "clientInfo": {
          "name": "codex_vscode",
          "title": "Codex VS Code Extension",
          "version": "0.1.0"
        }
      }
    }
    ```
    and when app-server receives that request, it can call
    `set_default_originator()`. This is a much more natural interface than
    asking third party developers to set an env var.
    
    One hiccup is that `originator()` reads the global singleton and locks
    in the value, preventing a later `set_default_originator()` call from
    setting it. This would be fine but is brittle, since any codepath that
    calls `originator()` before app-server can process an `initialize`
    JSON-RPC call would prevent app-server from setting it. This was
    actually the case with OTEL initialization which runs on boot, but I
    also saw this behavior in certain tests.
    
    Instead, what we now do is:
    - [unchanged] If `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var is set,
    `originator()` would return that value and `set_default_originator()`
    with some other value does NOT override it.
    - [new] If no env var is set, `originator()` would return the default
    value which is `codex_cli_rs` UNTIL `set_default_originator()` is called
    once, in which case it is set to the new value and becomes immutable.
    Later calls to `set_default_originator()` returns
    `SetOriginatorError::AlreadyInitialized`.
    
    **Other notes**
    - I updated `codex_core::otel_init::build_provider` to accepts a service
    name override, and app-server sends a hardcoded `codex_app_server`
    service name to distinguish it from `codex_cli_rs` used by default (e.g.
    TUI).
    
    **Next steps**
    - Update VSCE to set the proper value for `clientInfo.name` on
    `initialize` and drop the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var.
    - Delete support for `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` in codex-rs.
  • Immutable CodexAuth (#8857)
    Historically we started with a CodexAuth that knew how to refresh it's
    own tokens and then added AuthManager that did a different kind of
    refresh (re-reading from disk).
    
    I don't think it makes sense for both `CodexAuth` and `AuthManager` to
    be mutable and contain behaviors.
    
    Move all refresh logic into `AuthManager` and keep `CodexAuth` as a data
    object.
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat(app-server): thread/rollback API (#8454)
    Add `thread/rollback` to app-server to support IDEs undo-ing the last N
    turns of a thread.
    
    For context, an IDE partner will be supporting an "undo" capability
    where the IDE (the app-server client) will be responsible for reverting
    the local changes made during the last turn. To support this well, we
    also need a way to drop the last turn (or more generally, the last N
    turns) from the agent's context. This is what `thread/rollback` does.
    
    **Core idea**: A Thread rollback is represented as a persisted event
    message (EventMsg::ThreadRollback) in the rollout JSONL file, not by
    rewriting history. On resume, both the model's context (core replay) and
    the UI turn list (app-server v2's thread history builder) apply these
    markers so the pruned history is consistent across live conversations
    and `thread/resume`.
    
    Implementation notes:
    - Rollback only affects agent context and appends to the rollout file;
    clients are responsible for reverting files on disk.
    - If a thread rollback is currently in progress, subsequent
    `thread/rollback` calls are rejected.
    - Because we use `CodexConversation::submit` and codex core tracks
    active turns, returning an error on concurrent rollbacks is communicated
    via an `EventMsg::Error` with a new variant
    `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and
    sends the BAD_REQUEST RPC response.
    
    Tests cover thread rollbacks in both core and app-server, including when
    `num_turns` > existing turns (which clears all turns).
    
    **Note**: this explicitly does **not** behave like `/undo` which we just
    removed from the CLI, which does the opposite of what `thread/rollback`
    does. `/undo` reverts local changes via ghost commits/snapshots and does
    not modify the agent's context / conversation history.
  • Allow global exec flags after resume and fix CI codex build/timeout (#8440)
    **Motivation**
    - Bring `codex exec resume` to parity with top‑level flags so global
    options (git check bypass, json, model, sandbox toggles) work after the
    subcommand, including when outside a git repo.
    
    **Description**
    - Exec CLI: mark `--skip-git-repo-check`, `--json`, `--model`,
    `--full-auto`, and `--dangerously-bypass-approvals-and-sandbox` as
    global so they’re accepted after `resume`.
    - Tests: add `exec_resume_accepts_global_flags_after_subcommand` to
    verify those flags work when passed after `resume`.
    
    **Testing**
    - `just fmt`
    - `cargo test -p codex-exec` (pass; ran with elevated perms to allow
    network/port binds)
    - Manual: exercised `codex exec resume` with global flags after the
    subcommand to confirm behavior.
  • [chore] add additional_details to StreamErrorEvent + wire through (#8307)
    ### What
    
    Builds on #8293.
    
    Add `additional_details`, which contains the upstream error message, to
    relevant structures used to pass along retryable `StreamError`s.
    
    Uses the new TUI status indicator's `details` field (shows under the
    status header) to display the `additional_details` error to the user on
    retryable `Reconnecting...` errors. This adds clarity for users for
    retryable errors.
    
    Will make corresponding change to VSCode extension to show
    `additional_details` as expandable from the `Reconnecting...` cell.
    
    Examples:
    <img width="1012" height="326" alt="image"
    src="https://github.com/user-attachments/assets/f35e7e6a-8f5e-4a2f-a764-358101776996"
    />
    
    <img width="1526" height="358" alt="image"
    src="https://github.com/user-attachments/assets/0029cbc0-f062-4233-8650-cc216c7808f0"
    />