Commit Graph

148 Commits

  • feat(core) RequestRule (#9489)
    ## Summary
    Instead of trying to derive the prefix_rule for a command mechanically,
    let's let the model decide for us.
    
    ## Testing
    - [x] tested locally
  • fix(app-server, core): defer initial context write to rollout file until first turn (#9950)
    ### Overview
    Currently calling `thread/resume` will always bump the thread's
    `updated_at` timestamp. This PR makes it the `updated_at` timestamp
    changes only if a turn is triggered.
    
    ### Additonal context
    What we typically do on resuming a thread is **always** writing “initial
    context” to the rollout file immediately. This initial context includes:
    - Developer instructions derived from sandbox/approval policy + cwd
    - Optional developer instructions (if provided)
    - Optional collaboration-mode instructions
    - Optional user instructions (if provided)
    - Environment context (cwd, shell, etc.)
    
    This PR defers writing the “initial context” to the rollout file until
    the first `turn/start`, so we don't inadvertently bump the thread's
    `updated_at` timestamp until a turn is actually triggered.
    
    This works even though both `thread/resume` and `turn/start` accept
    overrides (such as `model`, `cwd`, etc.) because the initial context is
    seeded from the effective `TurnContext` in memory, computed at
    `turn/start` time, after both sets of overrides have been applied.
    
    **NOTE**: This is a very short-lived solution until we introduce sqlite.
    Then we can remove this.
  • Add thread/unarchive to restore archived rollouts (#9843)
    ## Summary
    - Adds a new `thread/unarchive` RPC to move archived thread rollouts
    back into the active `sessions/` tree.
    
    ## What changed
    - **Protocol**
      - Adds `thread/unarchive` request/response types and wiring.
    - **Server**
      - Implements `thread_unarchive` in the app server.
      - Validates the archived rollout path and thread ID.
    - Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout
    filename timestamp.
    - **Core**
    - Adds `find_archived_thread_path_by_id_str` helper for archived
    rollouts.
    - **Docs**
      - Documents the new RPC and usage example.
    - **Tests**
      - Adds an end-to-end server test that:
        1) starts a thread,
        2) archives it,
        3) unarchives it,
        4) asserts the file is restored to `sessions/`.
    
    ## How to use
    ```json
    { "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } }
    ```
    
    ## Author Codex Session
    
    `codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this
    kind of session UUID forkable by anyone with the right
    `session_object_storage_url` line in their config, but for now just
    pasting it here for my reference)
  • Feat: add isOther to question returned by request user input tool (#9890)
    ### Summary
    Add `isOther` to question object from request_user_input tool input and
    remove `other` option from the tool prompt to better handle tool input.
  • Fix flakey conversation flow test (#9784)
    I've seen this test fail with:
    
    ```
     - Mock #1.
            	Expected range of matching incoming requests: == 2
            	Number of matched incoming requests: 1
    ```
    
    This is because we pop the wrong task_complete events and then the test
    exits. I think this is because the MCP events are now buffered after
    https://github.com/openai/codex/pull/8874.
    
    So:
    1. clear the buffer before we do any user message sending
    2. additionally listen for task start before task complete
    3. use the ID from task start to find the correct task complete event.
  • feat: dynamic tools injection (#9539)
    ## Summary
    Add dynamic tool injection to thread startup in API v2, wire dynamic
    tool calls through the app server to clients, and plumb responses back
    into the model tool pipeline.
    
    ### Flow (high level)
    - Thread start injects `dynamic_tools` into the model tool list for that
    thread (validation is done here).
    - When the model emits a tool call for one of those names, core raises a
    `DynamicToolCallRequest` event.
    - The app server forwards it to the client as `item/tool/call`, waits
    for the client’s response, then submits a `DynamicToolResponse` back to
    core.
    - Core turns that into a `function_call_output` in the next model
    request so the model can continue.
    
    ### What changed
    - Added dynamic tool specs to v2 thread start params and protocol types;
    introduced `item/tool/call` (request/response) for dynamic tool
    execution.
    - Core now registers dynamic tool specs at request time and routes those
    calls via a new dynamic tool handler.
    - App server validates tool names/schemas, forwards dynamic tool call
    requests to clients, and publishes tool outputs back into the session.
    - Integration tests
  • chore(core) move model_instructions_template config (#9871)
    ## Summary
    Move `model_instructions_template` config to the experimental slug while
    we iterate on this feature
    
    ## Testing
    - [x] Tested locally, unit tests still pass
  • feat(tui) /personality (#9718)
    ## Summary
    Adds /personality selector in the TUI, which leverages the new core
    interface in #9644
    
    Notes:
    - We are doing some of our own state management for model_info loading
    here, but not sure if that's ideal. open to opinions on simpler
    approach, but would like to avoid blocking on a larger refactor
    - Right now, the `/personality` selector just hides when the model
    doesn't support it. we can update this behavior down the line
    
    ## Testing
    - [x] Tested locally
    - [x] Added snapshot tests
  • Use collaboration mode masks without mutating base settings (#9806)
    Keep an unmasked base collaboration mode and apply the active mask on
    demand. Simplify the TUI mask helpers and update tests/docs to match the
    mask contract.
  • feat: ephemeral threads (#9765)
    Add ephemeral threads capabilities. Only exposed through the
    `app-server` v2
    
    The idea is to disable the rollout recorder for those threads.
  • change collaboration mode to struct (#9793)
    Shouldn't cause behavioral change
  • feat(app-server) Expose personality (#9674)
    ### Motivation
    Exposes a per-thread / per-turn `personality` override in the v2
    app-server API so clients can influence model communication style at
    thread/turn start. Ensures the override is passed into the session
    configuration resolution so it becomes effective for subsequent turns
    and headless runners.
    
    ### Testing
    - [x] Add an integration-style test
    `turn_start_accepts_personality_override_v2` in
    `codex-rs/app-server/tests/suite/v2/turn_start.rs` that verifies a
    `/personality` override results in a developer update message containing
    `<personality_spec>` in the outbound model request.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_6971d646b1c08322a689a54d2649f3fe)
  • [connectors] Support connectors part 1 - App server & MCP (#9667)
    In order to make Codex work with connectors, we add a built-in gateway
    MCP that acts as a transparent proxy between the client and the
    connectors. The gateway MCP collects actions that are accessible to the
    user and sends them down to the user, when a connector action is chosen
    to be called, the client invokes the action through the gateway MCP as
    well.
    
     - [x] Add the system built-in gateway MCP to list and run connectors.
     - [x] Add the app server methods and protocol
  • Support end_turn flag (#9698)
    Experimental flag that signals the end of the turn.
  • feat(core) ModelInfo.model_instructions_template (#9597)
    ## Summary
    #9555 is the start of a rename, so I'm starting to standardize here.
    Sets up `model_instructions` templating with a strongly-typed object for
    injecting a personality block into the model instructions.
    
    ## Testing
    - [x] Added tests
    - [x] Ran locally
  • Add layered config.toml support to app server (#9510)
    This PR adds support for chained (layered) config.toml file merging for
    clients that use the app server interface. This feature already exists
    for the TUI, but it does not work for GUI clients.
    
    It does the following:
    * Changes code paths for new thread, resume thread, and fork thread to
    use the effective config based on the cwd.
    * Updates the `config/read` API to accept an optional `cwd` parameter.
    If specified, the API returns the effective config based on that cwd
    path. Also optionally includes all layers including project config
    files. If cwd is not specified, the API falls back on its older behavior
    where it considers only the global (non-project) config files when
    computing the effective config.
    
    The changes in codex_message_processor.rs look deceptively large. They
    mostly just involve moving existing blocks of code to a later point in
    some functions so it can use the cwd to calculate the config.
    
    This PR builds upon #9509 and should be reviewed and merged after that
    PR.
    
    Tested:
    * Verified change with (dependent, as-yet-uncommitted) changes to IDE
    Extension and confirmed correct behavior
    
    The full fix requires additional changes in the IDE Extension code base,
    but they depend on this PR.
  • Chore: update plan mode output in prompt (#9592)
    ### Summary
    * Update plan prompt output
    * Update requestUserInput response to be a single key value pair
    `answer: String`.
  • Reject ask user question tool in Execute and Custom (#9560)
    ## Summary
    - Keep `request_user_input` in the tool list but reject it at runtime in
    Execute/Custom modes with a clear model-facing error.
    - Add a session accessor for current collaboration mode and enforce the
    gate in the request_user_input handler.
    - Update core/app-server tests to use Plan mode for success and add
    Execute/Custom rejection coverage.
  • Show session header before configuration (#9568)
    We were skipping if we know the model. We shouldn't
  • Add total (non-partial) TextElement placeholder accessors (#9545)
    ## Summary
    - Make `TextElement` placeholders private and add a text-backed accessor
    to avoid assuming `Some`.
    - Since they are optional in the protocol, we want to make sure any
    accessors properly handle the None case (getting the placeholder using
    the byte range in the text)
    - Preserve placeholders during protocol/app-server conversions using the
    accessor fallback.
    - Update TUI composer/remap logic and tests to use the new
    constructor/accessor.
  • fix(core) Preserve base_instructions in SessionMeta (#9427)
    ## Summary
    This PR consolidates base_instructions onto SessionMeta /
    SessionConfiguration, so we ensure `base_instructions` is set once per
    session and should be (mostly) immutable, unless:
    - overridden by config on resume / fork
    - sub-agent tasks, like review or collab
    
    
    In a future PR, we should convert all references to `base_instructions`
    to consistently used the typed struct, so it's less likely that we put
    other strings there. See #9423. However, this PR is already quite
    complex, so I'm deferring that to a follow-up.
    
    ## Testing
    - [x] Added a resume test to assert that instructions are preserved. In
    particular, `resume_switches_models_preserves_base_instructions` fails
    against main.
    
    Existing test coverage thats assert base instructions are preserved
    across multiple requests in a session:
    - Manual compact keeps baseline instructions:
    core/tests/suite/compact.rs:199
    - Auto-compact keeps baseline instructions:
    core/tests/suite/compact.rs:1142
    - Prompt caching reuses the same instructions across two requests:
    core/tests/suite/prompt_caching.rs:150 and
    core/tests/suite/prompt_caching.rs:157
    - Prompt caching with explicit expected string across two requests:
    core/tests/suite/prompt_caching.rs:213 and
    core/tests/suite/prompt_caching.rs:222
    - Resume with model switch keeps original instructions:
    core/tests/suite/resume.rs:136
    - Compact/resume/fork uses request 0 instructions for later expected
    payloads: core/tests/suite/compact_resume_fork.rs:215
  • Feat: request user input tool (#9472)
    ### Summary
    * Add `requestUserInput` tool that the model can use for gather
    feedback/asking question mid turn.
    
    
    ### Tool input schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput input",
      "type": "object",
      "additionalProperties": false,
      "required": ["questions"],
      "properties": {
        "questions": {
          "type": "array",
          "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.",
          "minItems": 1,
          "maxItems": 3,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": ["id", "header", "question"],
            "properties": {
              "id": {
                "type": "string",
                "description": "Stable identifier for mapping answers (snake_case)."
              },
              "header": {
                "type": "string",
                "description": "Short header label shown in the UI (12 or fewer chars)."
              },
              "question": {
                "type": "string",
                "description": "Single-sentence prompt shown to the user."
              },
              "options": {
                "type": "array",
                "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.",
                "minItems": 2,
                "maxItems": 3,
                "items": {
                  "type": "object",
                  "additionalProperties": false,
                  "required": ["value", "label", "description"],
                  "properties": {
                    "value": {
                      "type": "string",
                      "description": "Machine-readable value (snake_case)."
                    },
                    "label": {
                      "type": "string",
                      "description": "User-facing label (1-5 words)."
                    },
                    "description": {
                      "type": "string",
                      "description": "One short sentence explaining impact/tradeoff if selected."
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```
    
    ### Tool output schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput output",
      "type": "object",
      "additionalProperties": false,
      "required": ["answers"],
      "properties": {
        "answers": {
          "type": "object",
          "description": "Map of question id to user answer.",
          "additionalProperties": {
            "type": "object",
            "additionalProperties": false,
            "required": ["selected"],
            "properties": {
              "selected": {
                "type": "array",
                "items": { "type": "string" }
              },
              "other": {
                "type": ["string", "null"]
              }
            }
          }
        }
      }
    }
    ```
  • Add collaboration modes test prompts (#9443)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • chore(instructions) Remove unread SessionMeta.instructions field (#9423)
    ### Description
    - Remove the now-unused `instructions` field from the session metadata
    to simplify SessionMeta and stop propagating transient instruction text
    through the rollout recorder API. This was only saving
    user_instructions, and was never being read.
    - Stop passing user instructions into the rollout writer at session
    creation so the rollout header only contains canonical session metadata.
    
    ### Testing
    
    - Ran `just fmt` which completed successfully.
    - Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
    -p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
    codex-tui2` which completed (Clippy fixes applied) as part of
    verification.
    - Ran `cargo test -p codex-protocol` which passed (28 tests).
    - Ran `cargo test -p codex-core` which showed failures in a small set of
    tests (not caused by the protocol type change directly):
    `default_client::tests::test_create_client_sets_default_headers`,
    several `models_manager::manager::tests::refresh_available_models_*`,
    and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
    tests failed in this CI run).
    - Ran `cargo test -p codex-app-server` which reported several failing
    integration tests (including
    `suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
    `suite::output_schema::send_user_turn_*`, and
    `suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
    - `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
    attempted but aborted due to disk space exhaustion (`No space left on
    device`).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
  • Expose collaboration presets (#9421)
    Expose collaboration presets for clients
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • feat: show forked from session id in /status (#9330)
    Summary:
    - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
    for forked sessions.
    - Surface forked_from in /status for tui + tui2 and add snapshots.
  • feat(app-server, core): return threads by created_at or updated_at (#9247)
    Add support for returning threads by either `created_at` OR `updated_at`
    descending. Previously core always returned threads ordered by
    `created_at`.
    
    This PR:
    - updates core to be able to list threads by `updated_at` OR
    `created_at` descending based on what the caller wants
    - also update `thread/list` in app-server to expose this (default to
    `created_at` if not specified)
    
    All existing codepaths (app-server, TUI) still default to `created_at`,
    so no behavior change is expected with this PR.
    
    **Implementation**
    To sort by `updated_at` is a bit nontrivial (whereas `created_at` is
    easy due to the way we structure the folders and filenames on disk,
    which are all based on `created_at`).
    
    The most naive way to do this without introducing a cache file or sqlite
    DB (which we have to implement/maintain) is to scan files in reverse
    `created_at` order on disk, and look at the file's mtime (last modified
    timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
    (currently set to 10,000). Then, we can return the most recent N
    threads.
    
    Based on some quick and dirty benchmarking on my machine with ~1000
    rollout files, calling `thread/list` with limit 50, the `updated_at`
    path is slower as expected due to all the I/O:
    - updated-at: average 103.10 ms
    - created-at: average 41.10 ms
    
    Those absolute numbers aren't a big deal IMO, but we can certainly
    optimize this in a followup if needed by introducing more state stored
    on disk.
    
    **Caveat**
    There's also a limitation in that any files older than `MAX_SCAN_FILES`
    will be excluded, which means if a user continues a REALLY old thread,
    it's possible to not be included. In practice that should not be too big
    of an issue.
    
    If a user makes...
    - 1000 rollouts/day → threads older than 10 days won't show up
    - 100 rollouts/day → ~100 days
    
    If this becomes a problem for some reason, even more motivation to
    implement an updated_at cache.
  • Add text element metadata to protocol, app server, and core (#9331)
    The second part of breaking up PR
    https://github.com/openai/codex/pull/9116
    
    Summary:
    
    - Add `TextElement` / `ByteRange` to protocol user inputs and user
    message events with defaults.
    - Thread `text_elements` through app-server v1/v2 request handling and
    history rebuild.
    - Preserve UI metadata only in user input/events (not `ContentItem`)
    while keeping local image attachments in user events for rehydration.
    
    Details:
    
    - Protocol: `UserInput::Text` carries `text_elements`;
    `UserMessageEvent` carries `text_elements` + `local_images`.
    Serialization includes empty vectors for backward compatibility.
    - app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in
    camelCase with conversions; v2 uses its own camelCase wrapper.
    - app-server: v1/v2 input mapping includes `text_elements`; thread
    history rebuilds include them.
    - Core: user event emission preserves UI metadata while model history
    stays clean; history replay round-trips the metadata.
  • Add migration_markdown in model_info (#9219)
    Next step would be to clean Model Upgrade in model presets
    
    ---------
    
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
  • Add text element metadata to types (#9235)
    Initial type tweaking PR to make the diff of
    https://github.com/openai/codex/pull/9116 smaller
    
    This should not change any behavior, just adds some fields to types
  • feat(app-server): add an --analytics-default-enabled flag (#9118)
    Add a new `codex app-server --analytics-default-enabled` CLI flag that
    controls whether analytics are enabled by default.
    
    Analytics are disabled by default for app-server. Users have to
    explicitly opt in
    via the `analytics` section in the config.toml file.
    
    However, for first-party use cases like the VSCode IDE extension, we
    default analytics
    to be enabled by default by setting this flag. Users can still opt out
    by setting this
    in their config.toml:
    
    ```toml
    [analytics]
    enabled = false
    ```
    
    See https://developers.openai.com/codex/config-advanced/#metrics for
    more details.
  • Assemble sandbox/approval/network prompts dynamically (#8961)
    - Add a single builder for developer permissions messaging that accepts
    SandboxPolicy and approval policy. This builder now drives the developer
    “permissions” message that’s injected at session start and any time
    sandbox/approval settings change.
    - Trim EnvironmentContext to only include cwd, writable roots, and
    shell; removed sandbox/approval/network duplication and adjusted XML
    serialization and tests accordingly.
    
    Follow-up: adding a config value to replace the developer permissions
    message for custom sandboxes.
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep-going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To setup remote execution:
    - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cutover to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • fix(app-server): set originator header from initialize JSON-RPC request (#8873)
    **Motivation**
    The `originator` header is important for codex-backend’s Responses API
    proxy because it identifies the real end client (codex cli, codex vscode
    extension, codex exec, future IDEs) and is used to categorize requests
    by client for our enterprise compliance API.
    
    Today the `originator` header is set by either:
    - the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var (our VSCode extension
    does this)
    - calling `set_default_originator()` which sets a global immutable
    singleton (`codex exec` does this)
    
    For `codex app-server`, we want the `initialize` JSON-RPC request to set
    that header because it is a natural place to do so. Example:
    ```json
    {
      "method": "initialize",
      "id": 0,
      "params": {
        "clientInfo": {
          "name": "codex_vscode",
          "title": "Codex VS Code Extension",
          "version": "0.1.0"
        }
      }
    }
    ```
    and when app-server receives that request, it can call
    `set_default_originator()`. This is a much more natural interface than
    asking third party developers to set an env var.
    
    One hiccup is that `originator()` reads the global singleton and locks
    in the value, preventing a later `set_default_originator()` call from
    setting it. This would be fine but is brittle, since any codepath that
    calls `originator()` before app-server can process an `initialize`
    JSON-RPC call would prevent app-server from setting it. This was
    actually the case with OTEL initialization which runs on boot, but I
    also saw this behavior in certain tests.
    
    Instead, what we now do is:
    - [unchanged] If `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var is set,
    `originator()` would return that value and `set_default_originator()`
    with some other value does NOT override it.
    - [new] If no env var is set, `originator()` would return the default
    value which is `codex_cli_rs` UNTIL `set_default_originator()` is called
    once, in which case it is set to the new value and becomes immutable.
    Later calls to `set_default_originator()` returns
    `SetOriginatorError::AlreadyInitialized`.
    
    **Other notes**
    - I updated `codex_core::otel_init::build_provider` to accepts a service
    name override, and app-server sends a hardcoded `codex_app_server`
    service name to distinguish it from `codex_cli_rs` used by default (e.g.
    TUI).
    
    **Next steps**
    - Update VSCE to set the proper value for `clientInfo.name` on
    `initialize` and drop the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var.
    - Delete support for `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` in codex-rs.
  • [chore] move app server tests from chat completion to responses (#8939)
    We are deprecating chat completions. Move all app server tests from chat
    completion to responses.
  • feat: fork conversation/thread (#8866)
    ## Summary
    - add thread/conversation fork endpoints to the protocol (v1 + v2)
    - implement fork handling in app-server using thread manager and config
    overrides
    - add fork coverage in app-server tests and document `thread/fork` usage
  • [fix] app server flaky send_messages test (#8874)
    Fix flakiness of CI test:
    https://github.com/openai/codex/actions/runs/20350530276/job/58473691434?pr=8282
    
    This PR does two things:
    1. move the flakiness test to use responses API instead of chat
    completion API
    2. make mcp_process agnostic to the order of
    responses/notifications/requests that come in, by buffering messages not
    read
  • [fix] app server flaky thread/resume tests (#8870)
    Fix flakiness of CI tests:
    https://github.com/openai/codex/actions/runs/20350530276/job/58473691443?pr=8282
    
    This PR does two things:
    1. test with responses API instead of chat completions API in
    thread_resume tests;
    2. have a new responses API fixture that mocks out arbitrary numbers of
    responses API calls (including no calls) and have the same repeated
    response.
    
    Tested by CI
  • fix: implement 'Allow this session' for apply_patch approvals (#8451)
    **Summary**
    This PR makes “ApprovalDecision::AcceptForSession / don’t ask again this
    session” actually work for `apply_patch` approvals by caching approvals
    based on absolute file paths in codex-core, properly wiring it through
    app-server v2, and exposing the choice in both TUI and TUI2.
    - This brings `apply_patch` calls to be at feature-parity with general
    shell commands, which also have a "Yes, and don't ask again" option.
    - This also fixes VSCE's "Allow this session" button to actually work.
    
    While we're at it, also split the app-server v2 protocol's
    `ApprovalDecision` enum so execpolicy amendments are only available for
    command execution approvals.
    
    **Key changes**
    - Core: per-session patch approval allowlist keyed by absolute file
    paths
    - Handles multi-file patches and renames/moves by recording both source
    and destination paths for `Update { move_path: Some(...) }`.
    - Extend the `Approvable` trait and `ApplyPatchRuntime` to work with
    multiple keys, because an `apply_patch` tool call can modify multiple
    files. For a request to be auto-approved, we will need to check that all
    file paths have been approved previously.
    - App-server v2: honor AcceptForSession for file changes
    - File-change approval responses now map AcceptForSession to
    ReviewDecision::ApprovedForSession (no longer downgraded to plain
    Approved).
    - Replace `ApprovalDecision` with two enums:
    `CommandExecutionApprovalDecision` and `FileChangeApprovalDecision`
    - TUI / TUI2: expose “don’t ask again for these files this session”
    - Patch approval overlays now include a third option (“Yes, and don’t
    ask again for these files this session (s)”).
        - Snapshot updates for the approval modal.
    
    **Tests added/updated**
    - Core:
    - Integration test that proves ApprovedForSession on a patch skips the
    next patch prompt for the same file
    - App-server:
    - v2 integration test verifying
    FileChangeApprovalDecision::AcceptForSession works properly
    
    **User-visible behavior**
    - When the user approves a patch “for session”, future patches touching
    only those previously approved file(s) will no longer prompt gain during
    that session (both via app-server v2 and TUI/TUI2).
    
    **Manual testing**
    Tested both TUI and TUI2 - see screenshots below.
    
    TUI:
    <img width="1082" height="355" alt="image"
    src="https://github.com/user-attachments/assets/adcf45ad-d428-498d-92fc-1a0a420878d9"
    />
    
    
    TUI2:
    <img width="1089" height="438" alt="image"
    src="https://github.com/user-attachments/assets/dd768b1a-2f5f-4bd6-98fd-e52c1d3abd9e"
    />
  • Fix app-server write_models_cache to treat models with less priority number as higher priority. (#8844)
    Rank models with p0 higher than p1. This shouldn't result in any
    behavioral changes. Just reordering.
  • Merge Modelfamily into modelinfo (#8763)
    - Merge ModelFamily into ModelInfo
    - Remove logic for adding instructions to apply patch
    - Add compaction limit and visible context window to `ModelInfo`