Commit Graph

82 Commits

  • Reapply "Add app-server transport layer with websocket support" (#11370)
    Reapply "Add app-server transport layer with websocket support" with
    additional fixes from https://github.com/openai/codex/pull/11313/changes
    to avoid deadlocking.
    
    This reverts commit 47356ff83c.
    
    ## Summary
    
    To avoid deadlocking when queues are full, we maintain separate tokio
    tasks dedicated to incoming vs outgoing event handling
    - split the app-server main loop into two tasks in
    `run_main_with_transport`
       - inbound handling (`transport_event_rx`)
       - outbound handling (`outgoing_rx` + `thread_created_rx`)
    - separate incoming and outgoing websocket tasks
    
    ## Validation
    
    Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
    requests
    
    <img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
    src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
    />
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
  • Prefer websocket transport when model opts in (#11386)
    Summary
    - add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
    in all fixtures and constructors
    - wire the new flag into websocket selection so models that opt in
    always use websocket transport even when the feature gate is off
    
    Testing
    - Not run (not requested)
  • feat: opt-out of events in the app-server (#11319)
    Add `optOutNotificationMethods` in the app-server to opt-out events
    based on exact method matching
  • fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
    …ount_id and chatgpt_plan_type
    
    ### Summary
    Following up on external auth mode which was introduced here:
    https://github.com/openai/codex/pull/10012
    
    Turns out some clients have a differently shaped ID token and don't have
    a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
    So, let's replace `id_token` param with `chatgpt_account_id` and
    `chatgpt_plan_type` (optional) when initializing the external ChatGPT
    auth mode (`account/login/start` with `chatgptAuthTokens`).
    
    The client was able to test end-to-end with a Codex build from this
    branch and verified it worked!
  • test: deflake nextest child-process leak in MCP harnesses (#11263)
    ## Summary
    - add deterministic child-process cleanup to both test `McpProcess`
    helpers
    - keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
    polling in `Drop`
    - document the failure mode and why this avoids nondeterministic `LEAK`
    flakes
    
    ## Why
    `cargo nextest` leak detection can intermittently report `LEAK` when a
    spawned server outlives test teardown, making CI flaky.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-app-server`
    - `cargo test -p codex-mcp-server`
    
    
    ## Failing CI Reference
    - Original failing job:
    https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
  • feat(app-server): turn/steer API (#10821)
    This PR adds a dedicated `turn/steer` API for appending user input to an
    in-flight turn.
    
    ## Motivation
    Currently, steering in the app is implemented by just calling
    `turn/start` while a turn is running. This has some really weird quirks:
    - Client gets back a new `turn.id`, even though streamed
    events/approvals remained tied to the original active turn ID.
    - All the various turn-level override params on `turn/start` do not
    apply to the "steer", and would only apply to the next real turn.
    - There can also be a race condition where the client thinks the turn is
    active but the server has already completed it, so there might be bugs
    if the client has baked in some client-specific behavior thinking it's a
    steer when in fact the server kicked off a new turn. This is
    particularly possible when running a client against a remote app-server.
    
    Having a dedicated `turn/steer` API eliminates all those quirks.
    
    `turn/steer` behavior:
    - Requires an active turn on threadId. Returns a JSON-RPC error if there
    is no active turn.
    - If expectedTurnId is provided, it must match the active turn (more
    useful when connecting to a remote app-server).
    - Does not emit `turn/started`.
    - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
    `outputSchema` to accurately reflect that these are not applied when
    steering.
  • [app-server] Add a method to list experimental features. (#10721)
    - [x] Add a method to list experimental features.
  • Add thread/compact v2 (#10445)
    - add `thread/compact` as a trigger-only v2 RPC that submits
    `Op::Compact` and returns `{}` immediately.
    - add v2 compaction e2e coverage for success and invalid/unknown thread
    ids, and update protocol schemas/docs.
  • [Codex][CLI] Gate image inputs by model modalities (#10271)
    ###### Summary
    
    - Add input_modalities to model metadata so clients can determine
    supported input types.
    - Gate image paste/attach in TUI when the selected model does not
    support images.
    - Block submits that include images for unsupported models and show a
    clear warning.
    - Propagate modality metadata through app-server protocol/model-list
    responses.
      - Update related tests/fixtures.
    
      ###### Rationale
    
      - Models support different input modalities.
    - Clients need an explicit capability signal to prevent unsupported
    requests.
    - Backward-compatible defaults preserve existing behavior when modality
    metadata is absent.
    
      ###### Scope
    
      - codex-rs/protocol, codex-rs/core, codex-rs/tui
      - codex-rs/app-server-protocol, codex-rs/app-server
      - Generated app-server types / schema fixtures
    
      ###### Trade-offs
    
    - Default behavior assumes text + image when field is absent for
    compatibility.
      - Server-side validation remains the source of truth.
    
      ###### Follow-up
    
    - Non-TUI clients should consume input_modalities to disable unsupported
    attachments.
    - Model catalogs should explicitly set input_modalities for text-only
    models.
    
      ###### Testing
    
      - cargo fmt --all
      - cargo test -p codex-tui
      - env -u GITHUB_APP_KEY cargo test -p codex-core --lib
      - just write-app-server-schema
    - cargo run -p codex-cli --bin codex -- app-server generate-ts --out
    app-server-types
      - test against local backend
      
    <img width="695" height="199" alt="image"
    src="https://github.com/user-attachments/assets/d22dd04f-5eba-4db9-a7c5-a2506f60ec44"
    />
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • feat: experimental flags (#10231)
    ## Problem being solved
    - We need a single, reliable way to mark app-server API surface as
    experimental so that:
      1. the runtime can reject experimental usage unless the client opts in
    2. generated TS/JSON schemas can exclude experimental methods/fields for
    stable clients.
    
    Right now that’s easy to drift or miss when done ad-hoc.
    
    ## How to declare experimental methods and fields
    - **Experimental method**: add `#[experimental("method/name")]` to the
    `ClientRequest` variant in `client_request_definitions!`.
    - **Experimental field**: on the params struct, derive `ExperimentalApi`
    and annotate the field with `#[experimental("method/name.field")]` + set
    `inspect_params: true` for the method variant so
    `ClientRequest::experimental_reason()` inspects params for experimental
    fields.
    
    ## How the macro solves it
    - The new derive macro lives in
    `codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
    `#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
    attributes.
    - **Structs**:
    - Generates `ExperimentalApi::experimental_reason(&self)` that checks
    only annotated fields.
      - The “presence” check is type-aware:
        - `Option<T>`: `is_some_and(...)` recursively checks inner.
        - `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
        - `bool`: must be `true`.
        - Other types: considered present (returns `true`).
    - Registers each experimental field in an `inventory` with `(type_name,
    serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
    that type. Field names are converted from `snake_case` to `camelCase`
    for schema/TS filtering.
    - **Enums**:
    - Generates an exhaustive `match` returning `Some(reason)` for annotated
    variants and `None` otherwise (no wildcard arm).
    - **Wiring**:
    - Runtime gating uses `ExperimentalApi::experimental_reason()` in
    `codex-rs/app-server/src/message_processor.rs` to reject requests unless
    `InitializeParams.capabilities.experimental_api == true`.
    - Schema/TS export filters use the inventory list and
    `EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
    strip experimental methods/fields when `experimental_api` is false.
  • chore: rename ChatGpt -> Chatgpt in type names (#10244)
    When using ChatGPT in names of types, we should be consistent, so this
    renames some types with `ChatGpt` in the name to `Chatgpt`. From
    https://rust-lang.github.io/api-guidelines/naming.html:
    
    > In `UpperCamelCase`, acronyms and contractions of compound words count
    as one word: use `Uuid` rather than `UUID`, `Usize` rather than `USize`
    or `Stdin` rather than `StdIn`. In `snake_case`, acronyms and
    contractions are lower-cased: `is_xid_start`.
    
    This PR updates existing uses of `ChatGpt` and changes them to
    `Chatgpt`. Though in all cases where it could affect the wire format, I
    visually inspected that we don't change anything there. That said, this
    _will_ change the codegen because it will affect the spelling of type
    names.
    
    For example, this renames `AuthMode::ChatGPT` to `AuthMode::Chatgpt` in
    `app-server-protocol`, but the wire format is still `"chatgpt"`.
    
    This PR also updates a number of types in `codex-rs/core/src/auth.rs`.
  • Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786)
    ## Summary
    - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
    in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
    tags from normal assistant output.
    - Persist plan items and rebuild them on resume so proposed plans show
    in thread history.
    - Wire plan items/deltas through app-server protocol v2 and render a
    dedicated proposed-plan view in the TUI, including the “Implement this
    plan?” prompt only when a plan item is present.
    
    ## Changes
    
    ### Core (`codex-rs/core`)
    - Added a generic, line-based tag parser that buffers each line until it
    can disprove a tag prefix; implements auto-close on `finish()` for
    unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
    - Refactored proposed plan parsing to wrap the generic parser.
    `codex-rs/core/src/proposed_plan_parser.rs`
    - In plan mode, stream assistant deltas as:
      - **Normal text** → `AgentMessageContentDelta`
      - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion  
      (`codex-rs/core/src/codex.rs`)
    - Final plan item content is derived from the completed assistant
    message (authoritative), not necessarily the concatenated deltas.
    - Strips `<proposed_plan>` blocks from assistant text in plan mode so
    tags don’t appear in normal messages.
    (`codex-rs/core/src/stream_events_utils.rs`)
    - Persist `ItemCompleted` events only for plan items for rollout replay.
    (`codex-rs/core/src/rollout/policy.rs`)
    - Guard `update_plan` tool in Plan Mode with a clear error message.
    (`codex-rs/core/src/tools/handlers/plan.rs`)
    - Updated Plan Mode prompt to:  
      - keep `<proposed_plan>` out of non-final reasoning/preambles  
      - require exact tag formatting  
      - allow only one `<proposed_plan>` block per turn  
      (`codex-rs/core/templates/collaboration_mode/plan.md`)
    
    ### Protocol / App-server protocol
    - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
    (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
    - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
    EXPERIMENTAL markers and note that deltas may not match the final plan
    item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
    - Added plan delta route in app-server protocol common mapping.
    (`codex-rs/app-server-protocol/src/protocol/common.rs`)
    - Rebuild plan items from persisted `ItemCompleted` events on resume.
    (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)
    
    ### App-server
    - Forward plan deltas to v2 clients and map core plan items to v2 plan
    items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
    `codex-rs/app-server/src/codex_message_processor.rs`)
    - Added v2 plan item tests.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ### TUI
    - Added a dedicated proposed plan history cell with special background
    and padding, and moved “• Proposed Plan” outside the highlighted block.
    (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
    - Only show “Implement this plan?” when a plan item exists.
    (`codex-rs/tui/src/chatwidget.rs`,
    `codex-rs/tui/src/chatwidget/tests.rs`)
    
    <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
    src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
    />
    
    ### Docs / Misc
    - Updated protocol docs to mention plan deltas.
    (`codex-rs/docs/protocol_v1.md`)
    - Minor plumbing updates in exec/debug clients to tolerate plan deltas.
    (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)
    
    ## Tests
    - Added core integration tests:
      - Plan mode strips plan from agent messages.
      - Missing `</proposed_plan>` closes at end-of-message.  
      (`codex-rs/core/tests/suite/items.rs`)
    - Added unit tests for generic tag parser (prefix buffering, non-tag
    lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
    - Existing app-server plan item tests in v2.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ## Notes / Behavior
    - Plan output no longer appears in standard assistant text in Plan Mode;
    it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
    - The final plan item content is authoritative and may diverge from
    streamed deltas (documented as experimental).
    - Reasoning summaries are not filtered; prompt instructs the model not
    to include `<proposed_plan>` outside the final plan message.
    
    ## Codex Author
    `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
  • Chore: plan mode do not include free form question and always include isOther (#10210)
    We should never ask a freeform question when planning and we should
    always include isOther as an escape hatch.
  • chore(personality) new schema with fallbacks (#10147)
    ## Summary
    Let's dial in this api contract in a bit more with more robust fallback
    behavior when model_instructions_template is false.
    
    Switches to a more explicit template / variables structure, with more
    fallbacks.
    
    ## Testing
    - [x] Adding unit tests
    - [x] Tested locally
  • [feat] persist dynamic tools in session rollout file (#10130)
    Add dynamic tools to rollout file for persistence & read from rollout on
    resume. Ran a real example and spotted the following in the rollout
    file:
    ```
    {"timestamp":"2026-01-29T01:27:57.468Z","type":"session_meta","payload":{"id":"019c075d-3f0b-77e3-894e-c1c159b04b1e","timestamp":"2026-01-29T01:27:57.451Z","...."dynamic_tools":[{"name":"demo_tool","description":"Demo dynamic tool","inputSchema":{"additionalProperties":false,"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}}],"git":{"commit_hash":"ebc573f15c01b8af158e060cfedd401f043e9dfa","branch":"dev/cc/dynamic-tools","repository_url":"https://github.com/openai/codex.git"}}}
    ```
  • feat(app-server): support external auth mode (#10012)
    This enables a new use case where `codex app-server` is embedded into a
    parent application that will directly own the user's ChatGPT auth
    lifecycle, which means it owns the user’s auth tokens and refreshes it
    when necessary. The parent application would just want a way to pass in
    the auth tokens for codex to use directly.
    
    The idea is that we are introducing a new "auth mode" currently only
    exposed via app server: **`chatgptAuthTokens`** which consist of the
    `id_token` (stores account metadata) and `access_token` (the bearer
    token used directly for backend API calls). These auth tokens are only
    stored in-memory. This new mode is in addition to the existing `apiKey`
    and `chatgpt` auth modes.
    
    This PR reuses the shape of our existing app-server account APIs as much
    as possible:
    - Update `account/login/start` with a new `chatgptAuthTokens` variant,
    which will allow the client to pass in the tokens and have codex
    app-server use them directly. Upon success, the server emits
    `account/login/completed` and `account/updated` notifications.
    - A new server->client request called
    `account/chatgptAuthTokens/refresh` which the server can use whenever
    the access token previously passed in has expired and it needs a new one
    from the parent application.
    
    I leveraged the core 401 retry loop which typically triggers auth token
    refreshes automatically, but made it pluggable:
    - **chatgpt** mode refreshes internally, as usual.
    - **chatgptAuthTokens** mode calls the client via
    `account/chatgptAuthTokens/refresh`, the client responds with updated
    tokens, codex updates its in-memory auth, then retries. This RPC has a
    10s timeout and handles JSON-RPC errors from the client.
    
    Also some additional things:
    - chatgpt logins are blocked while external auth is active (have to log
    out first. typically clients will pick one OR the other, not support
    both)
    - `account/logout` clears external auth in memory
    - Ensures that if `forced_chatgpt_workspace_id` is set via the user's
    config, we respect it in both:
    - `account/login/start` with `chatgptAuthTokens` (returns a JSON-RPC
    error back to the client)
    - `account/chatgptAuthTokens/refresh` (fails the turn, and on next
    request app-server will send another `account/chatgptAuthTokens/refresh`
    request to the client).
  • Add app-server compaction item notifications tests (#10123)
    - add v2 tests covering local + remote auto-compaction item
    started/completed notifications
  • Add thread/unarchive to restore archived rollouts (#9843)
    ## Summary
    - Adds a new `thread/unarchive` RPC to move archived thread rollouts
    back into the active `sessions/` tree.
    
    ## What changed
    - **Protocol**
      - Adds `thread/unarchive` request/response types and wiring.
    - **Server**
      - Implements `thread_unarchive` in the app server.
      - Validates the archived rollout path and thread ID.
    - Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout
    filename timestamp.
    - **Core**
    - Adds `find_archived_thread_path_by_id_str` helper for archived
    rollouts.
    - **Docs**
      - Documents the new RPC and usage example.
    - **Tests**
      - Adds an end-to-end server test that:
        1) starts a thread,
        2) archives it,
        3) unarchives it,
        4) asserts the file is restored to `sessions/`.
    
    ## How to use
    ```json
    { "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } }
    ```
    
    ## Author Codex Session
    
    `codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this
    kind of session UUID forkable by anyone with the right
    `session_object_storage_url` line in their config, but for now just
    pasting it here for my reference)
  • Feat: add isOther to question returned by request user input tool (#9890)
    ### Summary
    Add `isOther` to question object from request_user_input tool input and
    remove `other` option from the tool prompt to better handle tool input.
  • [connectors] Support connectors part 1 - App server & MCP (#9667)
    In order to make Codex work with connectors, we add a built-in gateway
    MCP that acts as a transparent proxy between the client and the
    connectors. The gateway MCP collects actions that are accessible to the
    user and sends them down to the user, when a connector action is chosen
    to be called, the client invokes the action through the gateway MCP as
    well.
    
     - [x] Add the system built-in gateway MCP to list and run connectors.
     - [x] Add the app server methods and protocol
  • feat(core) ModelInfo.model_instructions_template (#9597)
    ## Summary
    #9555 is the start of a rename, so I'm starting to standardize here.
    Sets up `model_instructions` templating with a strongly-typed object for
    injecting a personality block into the model instructions.
    
    ## Testing
    - [x] Added tests
    - [x] Ran locally
  • Add layered config.toml support to app server (#9510)
    This PR adds support for chained (layered) config.toml file merging for
    clients that use the app server interface. This feature already exists
    for the TUI, but it does not work for GUI clients.
    
    It does the following:
    * Changes code paths for new thread, resume thread, and fork thread to
    use the effective config based on the cwd.
    * Updates the `config/read` API to accept an optional `cwd` parameter.
    If specified, the API returns the effective config based on that cwd
    path. Also optionally includes all layers including project config
    files. If cwd is not specified, the API falls back on its older behavior
    where it considers only the global (non-project) config files when
    computing the effective config.
    
    The changes in codex_message_processor.rs look deceptively large. They
    mostly just involve moving existing blocks of code to a later point in
    some functions so it can use the cwd to calculate the config.
    
    This PR builds upon #9509 and should be reviewed and merged after that
    PR.
    
    Tested:
    * Verified change with (dependent, as-yet-uncommitted) changes to IDE
    Extension and confirmed correct behavior
    
    The full fix requires additional changes in the IDE Extension code base,
    but they depend on this PR.
  • fix(core) Preserve base_instructions in SessionMeta (#9427)
    ## Summary
    This PR consolidates base_instructions onto SessionMeta /
    SessionConfiguration, so we ensure `base_instructions` is set once per
    session and should be (mostly) immutable, unless:
    - overridden by config on resume / fork
    - sub-agent tasks, like review or collab
    
    
    In a future PR, we should convert all references to `base_instructions`
    to consistently used the typed struct, so it's less likely that we put
    other strings there. See #9423. However, this PR is already quite
    complex, so I'm deferring that to a follow-up.
    
    ## Testing
    - [x] Added a resume test to assert that instructions are preserved. In
    particular, `resume_switches_models_preserves_base_instructions` fails
    against main.
    
    Existing test coverage thats assert base instructions are preserved
    across multiple requests in a session:
    - Manual compact keeps baseline instructions:
    core/tests/suite/compact.rs:199
    - Auto-compact keeps baseline instructions:
    core/tests/suite/compact.rs:1142
    - Prompt caching reuses the same instructions across two requests:
    core/tests/suite/prompt_caching.rs:150 and
    core/tests/suite/prompt_caching.rs:157
    - Prompt caching with explicit expected string across two requests:
    core/tests/suite/prompt_caching.rs:213 and
    core/tests/suite/prompt_caching.rs:222
    - Resume with model switch keeps original instructions:
    core/tests/suite/resume.rs:136
    - Compact/resume/fork uses request 0 instructions for later expected
    payloads: core/tests/suite/compact_resume_fork.rs:215
  • Feat: request user input tool (#9472)
    ### Summary
    * Add `requestUserInput` tool that the model can use for gather
    feedback/asking question mid turn.
    
    
    ### Tool input schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput input",
      "type": "object",
      "additionalProperties": false,
      "required": ["questions"],
      "properties": {
        "questions": {
          "type": "array",
          "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.",
          "minItems": 1,
          "maxItems": 3,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": ["id", "header", "question"],
            "properties": {
              "id": {
                "type": "string",
                "description": "Stable identifier for mapping answers (snake_case)."
              },
              "header": {
                "type": "string",
                "description": "Short header label shown in the UI (12 or fewer chars)."
              },
              "question": {
                "type": "string",
                "description": "Single-sentence prompt shown to the user."
              },
              "options": {
                "type": "array",
                "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.",
                "minItems": 2,
                "maxItems": 3,
                "items": {
                  "type": "object",
                  "additionalProperties": false,
                  "required": ["value", "label", "description"],
                  "properties": {
                    "value": {
                      "type": "string",
                      "description": "Machine-readable value (snake_case)."
                    },
                    "label": {
                      "type": "string",
                      "description": "User-facing label (1-5 words)."
                    },
                    "description": {
                      "type": "string",
                      "description": "One short sentence explaining impact/tradeoff if selected."
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```
    
    ### Tool output schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput output",
      "type": "object",
      "additionalProperties": false,
      "required": ["answers"],
      "properties": {
        "answers": {
          "type": "object",
          "description": "Map of question id to user answer.",
          "additionalProperties": {
            "type": "object",
            "additionalProperties": false,
            "required": ["selected"],
            "properties": {
              "selected": {
                "type": "array",
                "items": { "type": "string" }
              },
              "other": {
                "type": ["string", "null"]
              }
            }
          }
        }
      }
    }
    ```
  • chore(instructions) Remove unread SessionMeta.instructions field (#9423)
    ### Description
    - Remove the now-unused `instructions` field from the session metadata
    to simplify SessionMeta and stop propagating transient instruction text
    through the rollout recorder API. This was only saving
    user_instructions, and was never being read.
    - Stop passing user instructions into the rollout writer at session
    creation so the rollout header only contains canonical session metadata.
    
    ### Testing
    
    - Ran `just fmt` which completed successfully.
    - Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
    -p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
    codex-tui2` which completed (Clippy fixes applied) as part of
    verification.
    - Ran `cargo test -p codex-protocol` which passed (28 tests).
    - Ran `cargo test -p codex-core` which showed failures in a small set of
    tests (not caused by the protocol type change directly):
    `default_client::tests::test_create_client_sets_default_headers`,
    several `models_manager::manager::tests::refresh_available_models_*`,
    and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
    tests failed in this CI run).
    - Ran `cargo test -p codex-app-server` which reported several failing
    integration tests (including
    `suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
    `suite::output_schema::send_user_turn_*`, and
    `suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
    - `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
    attempted but aborted due to disk space exhaustion (`No space left on
    device`).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
  • Expose collaboration presets (#9421)
    Expose collaboration presets for clients
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • feat: show forked from session id in /status (#9330)
    Summary:
    - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
    for forked sessions.
    - Surface forked_from in /status for tui + tui2 and add snapshots.
  • feat(app-server, core): return threads by created_at or updated_at (#9247)
    Add support for returning threads by either `created_at` OR `updated_at`
    descending. Previously core always returned threads ordered by
    `created_at`.
    
    This PR:
    - updates core to be able to list threads by `updated_at` OR
    `created_at` descending based on what the caller wants
    - also update `thread/list` in app-server to expose this (default to
    `created_at` if not specified)
    
    All existing codepaths (app-server, TUI) still default to `created_at`,
    so no behavior change is expected with this PR.
    
    **Implementation**
    To sort by `updated_at` is a bit nontrivial (whereas `created_at` is
    easy due to the way we structure the folders and filenames on disk,
    which are all based on `created_at`).
    
    The most naive way to do this without introducing a cache file or sqlite
    DB (which we have to implement/maintain) is to scan files in reverse
    `created_at` order on disk, and look at the file's mtime (last modified
    timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
    (currently set to 10,000). Then, we can return the most recent N
    threads.
    
    Based on some quick and dirty benchmarking on my machine with ~1000
    rollout files, calling `thread/list` with limit 50, the `updated_at`
    path is slower as expected due to all the I/O:
    - updated-at: average 103.10 ms
    - created-at: average 41.10 ms
    
    Those absolute numbers aren't a big deal IMO, but we can certainly
    optimize this in a followup if needed by introducing more state stored
    on disk.
    
    **Caveat**
    There's also a limitation in that any files older than `MAX_SCAN_FILES`
    will be excluded, which means if a user continues a REALLY old thread,
    it's possible to not be included. In practice that should not be too big
    of an issue.
    
    If a user makes...
    - 1000 rollouts/day → threads older than 10 days won't show up
    - 100 rollouts/day → ~100 days
    
    If this becomes a problem for some reason, even more motivation to
    implement an updated_at cache.
  • Add text element metadata to protocol, app server, and core (#9331)
    The second part of breaking up PR
    https://github.com/openai/codex/pull/9116
    
    Summary:
    
    - Add `TextElement` / `ByteRange` to protocol user inputs and user
    message events with defaults.
    - Thread `text_elements` through app-server v1/v2 request handling and
    history rebuild.
    - Preserve UI metadata only in user input/events (not `ContentItem`)
    while keeping local image attachments in user events for rehydration.
    
    Details:
    
    - Protocol: `UserInput::Text` carries `text_elements`;
    `UserMessageEvent` carries `text_elements` + `local_images`.
    Serialization includes empty vectors for backward compatibility.
    - app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in
    camelCase with conversions; v2 uses its own camelCase wrapper.
    - app-server: v1/v2 input mapping includes `text_elements`; thread
    history rebuilds include them.
    - Core: user event emission preserves UI metadata while model history
    stays clean; history replay round-trips the metadata.
  • Add migration_markdown in model_info (#9219)
    Next step would be to clean Model Upgrade in model presets
    
    ---------
    
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep-going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To setup remote execution:
    - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cutover to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • fix(app-server): set originator header from initialize JSON-RPC request (#8873)
    **Motivation**
    The `originator` header is important for codex-backend’s Responses API
    proxy because it identifies the real end client (codex cli, codex vscode
    extension, codex exec, future IDEs) and is used to categorize requests
    by client for our enterprise compliance API.
    
    Today the `originator` header is set by either:
    - the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var (our VSCode extension
    does this)
    - calling `set_default_originator()` which sets a global immutable
    singleton (`codex exec` does this)
    
    For `codex app-server`, we want the `initialize` JSON-RPC request to set
    that header because it is a natural place to do so. Example:
    ```json
    {
      "method": "initialize",
      "id": 0,
      "params": {
        "clientInfo": {
          "name": "codex_vscode",
          "title": "Codex VS Code Extension",
          "version": "0.1.0"
        }
      }
    }
    ```
    and when app-server receives that request, it can call
    `set_default_originator()`. This is a much more natural interface than
    asking third party developers to set an env var.
    
    One hiccup is that `originator()` reads the global singleton and locks
    in the value, preventing a later `set_default_originator()` call from
    setting it. This would be fine but is brittle, since any codepath that
    calls `originator()` before app-server can process an `initialize`
    JSON-RPC call would prevent app-server from setting it. This was
    actually the case with OTEL initialization which runs on boot, but I
    also saw this behavior in certain tests.
    
    Instead, what we now do is:
    - [unchanged] If `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var is set,
    `originator()` would return that value and `set_default_originator()`
    with some other value does NOT override it.
    - [new] If no env var is set, `originator()` would return the default
    value which is `codex_cli_rs` UNTIL `set_default_originator()` is called
    once, in which case it is set to the new value and becomes immutable.
    Later calls to `set_default_originator()` returns
    `SetOriginatorError::AlreadyInitialized`.
    
    **Other notes**
    - I updated `codex_core::otel_init::build_provider` to accepts a service
    name override, and app-server sends a hardcoded `codex_app_server`
    service name to distinguish it from `codex_cli_rs` used by default (e.g.
    TUI).
    
    **Next steps**
    - Update VSCE to set the proper value for `clientInfo.name` on
    `initialize` and drop the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var.
    - Delete support for `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` in codex-rs.
  • [chore] move app server tests from chat completion to responses (#8939)
    We are deprecating chat completions. Move all app server tests from chat
    completion to responses.
  • feat: fork conversation/thread (#8866)
    ## Summary
    - add thread/conversation fork endpoints to the protocol (v1 + v2)
    - implement fork handling in app-server using thread manager and config
    overrides
    - add fork coverage in app-server tests and document `thread/fork` usage
  • [fix] app server flaky send_messages test (#8874)
    Fix flakiness of CI test:
    https://github.com/openai/codex/actions/runs/20350530276/job/58473691434?pr=8282
    
    This PR does two things:
    1. move the flakiness test to use responses API instead of chat
    completion API
    2. make mcp_process agnostic to the order of
    responses/notifications/requests that come in, by buffering messages not
    read
  • [fix] app server flaky thread/resume tests (#8870)
    Fix flakiness of CI tests:
    https://github.com/openai/codex/actions/runs/20350530276/job/58473691443?pr=8282
    
    This PR does two things:
    1. test with responses API instead of chat completions API in
    thread_resume tests;
    2. have a new responses API fixture that mocks out arbitrary numbers of
    responses API calls (including no calls) and have the same repeated
    response.
    
    Tested by CI
  • Fix app-server write_models_cache to treat models with less priority number as higher priority. (#8844)
    Rank models with p0 higher than p1. This shouldn't result in any
    behavioral changes. Just reordering.
  • Merge Modelfamily into modelinfo (#8763)
    - Merge ModelFamily into ModelInfo
    - Remove logic for adding instructions to apply patch
    - Add compaction limit and visible context window to `ModelInfo`
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat(app-server): thread/rollback API (#8454)
    Add `thread/rollback` to app-server to support IDEs undo-ing the last N
    turns of a thread.
    
    For context, an IDE partner will be supporting an "undo" capability
    where the IDE (the app-server client) will be responsible for reverting
    the local changes made during the last turn. To support this well, we
    also need a way to drop the last turn (or more generally, the last N
    turns) from the agent's context. This is what `thread/rollback` does.
    
    **Core idea**: A Thread rollback is represented as a persisted event
    message (EventMsg::ThreadRollback) in the rollout JSONL file, not by
    rewriting history. On resume, both the model's context (core replay) and
    the UI turn list (app-server v2's thread history builder) apply these
    markers so the pruned history is consistent across live conversations
    and `thread/resume`.
    
    Implementation notes:
    - Rollback only affects agent context and appends to the rollout file;
    clients are responsible for reverting files on disk.
    - If a thread rollback is currently in progress, subsequent
    `thread/rollback` calls are rejected.
    - Because we use `CodexConversation::submit` and codex core tracks
    active turns, returning an error on concurrent rollbacks is communicated
    via an `EventMsg::Error` with a new variant
    `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and
    sends the BAD_REQUEST RPC response.
    
    Tests cover thread rollbacks in both core and app-server, including when
    `num_turns` > existing turns (which clears all turns).
    
    **Note**: this explicitly does **not** behave like `/undo` which we just
    removed from the CLI, which does the opposite of what `thread/rollback`
    does. `/undo` reverts local changes via ghost commits/snapshots and does
    not modify the agent's context / conversation history.
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496