Commit Graph

76 Commits

  • fix: handle all web_search actions and in progress invocations (#9960)
    ### Summary
    - Parse all `web_search` tool actions (`search`, `find_in_page`,
    `open_page`).
    - Previously we only parsed + displayed `search`, which made the TUI
    appear to pause when the other actions were being used.
    - Show in-progress `web_search` calls as `Searching the web`
      - Previously we only showed completed tool calls
    
    <img width="308" height="149" alt="image"
    src="https://github.com/user-attachments/assets/90a4e8ff-b06a-48ff-a282-b57b31121845"
    />
    
    ### Tests
    Added + updated tests, tested locally
    
    ### Follow ups
    Update VSCode extension to display these as well
  • Fix flakey resume test (#9789)
    Sessions' `updated_at` times are truncated to seconds, with the UUID
    session ID used to break ties. If the two test sessions are created in
    the same second, AND the session B UUID < session A UUID, the test
    fails.
    
    Fix this by mutating the session mtimes, from which we derive the
    updated_at time, to ensure session B is updated_at later than session A.
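    
    The tie-break behind the flakiness can be reproduced in a small sketch.
    The `Session` struct and `most_recent` function are hypothetical
    stand-ins rather than the real Codex types, and the sketch assumes
    sessions are ordered by the truncated `updated_at` first and by ID
    second.
    
    ```rust
    /// Hypothetical stand-in for a stored session; not the real Codex type.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct Session {
        /// `updated_at`, truncated to whole seconds.
        updated_at_secs: u64,
        /// Session ID; UUIDs are compared as plain strings in this sketch.
        id: String,
    }
    
    /// Returns the ID of the most recently updated session.
    /// Sessions updated in the same second are ordered by ID (assumed tie-break).
    fn most_recent(sessions: &[Session]) -> Option<&str> {
        sessions
            .iter()
            .max_by(|a, b| (a.updated_at_secs, &a.id).cmp(&(b.updated_at_secs, &b.id)))
            .map(|s| s.id.as_str())
    }
    ```
    
    With equal timestamps the session with the larger ID wins regardless of
    creation order, which is why bumping session B's mtime makes the result
    deterministic.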
  • feat: ephemeral threads (#9765)
    Add ephemeral thread capabilities. These are only exposed through the
    `app-server` v2 API.
    
    The idea is to disable the rollout recorder for those threads.
  • feat: show forked from session id in /status (#9330)
    Summary:
    - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
    for forked sessions.
    - Surface forked_from in /status for tui + tui2 and add snapshots.
  • Made codex exec resume --last consistent with codex resume --last (#9352)
    PR #9245 made `codex resume --last` honor cwd, but I forgot to make the
    same change for `codex exec resume --last`. This PR fixes the
    inconsistency.
    
    This addresses #8700
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
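    
    A minimal sketch of that lookup, assuming the path is built by joining
    `RUNFILES_DIR`, `BAZEL_PACKAGE`, and the relative resource path. The
    names `resolve_resource` and `find_resource_sketch!` are hypothetical;
    the real macro may differ in signature and in how it lays out the
    runfiles path.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Pure helper (hypothetical): picks the base directory for a test resource.
    /// Under Bazel, both the runtime runfiles dir and the compile-time package are known.
    fn resolve_resource(
        runfiles_dir: Option<&Path>,
        bazel_package: Option<&str>,
        cargo_manifest_dir: &str,
        relative: &str,
    ) -> PathBuf {
        match (runfiles_dir, bazel_package) {
            (Some(runfiles), Some(package)) => runfiles.join(package).join(relative),
            // Otherwise fall back to the Cargo layout.
            _ => Path::new(cargo_manifest_dir).join(relative),
        }
    }
    
    /// A macro rather than a function, so that `option_env!` is expanded in the
    /// crate that *uses* the macro, not in the crate that defines the helper.
    macro_rules! find_resource_sketch {
        ($relative:expr) => {
            resolve_resource(
                std::env::var_os("RUNFILES_DIR")
                    .as_deref()
                    .map(std::path::Path::new),
                option_env!("BAZEL_PACKAGE"),
                option_env!("CARGO_MANIFEST_DIR").unwrap_or("."),
                $relative,
            )
        };
    }
    ```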
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
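    
    To illustrate the difference between normalizing and canonicalizing,
    here is a std-only sketch of lexical normalization. It is a stand-in
    for illustration, not the `path-absolutize` API that the PR actually
    uses, and `normalize_lexically` is a hypothetical name. Unlike
    `std::fs::canonicalize`, it never touches the filesystem, so it works
    even when `common/` does not exist on disk.
    
    ```rust
    use std::path::{Component, Path, PathBuf};
    
    /// Resolves `.` and `..` purely by inspecting the path's components.
    /// Symlinks are not followed and nothing is required to exist on disk.
    fn normalize_lexically(path: &Path) -> PathBuf {
        let mut out = PathBuf::new();
        for component in path.components() {
            match component {
                Component::CurDir => {}
                Component::ParentDir => {
                    let last_is_normal =
                        matches!(out.components().next_back(), Some(Component::Normal(_)));
                    let at_root = matches!(
                        out.components().next_back(),
                        Some(Component::RootDir | Component::Prefix(_))
                    );
                    if last_is_normal {
                        // `dir/..` cancels out.
                        out.pop();
                    } else if !at_root {
                        // Nothing to cancel in a relative path: keep the `..`.
                        out.push("..");
                    }
                    // At the root, `..` is dropped because there is no parent.
                }
                other => out.push(other.as_os_str()),
            }
        }
        out
    }
    ```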
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • Allow global exec flags after resume and fix CI codex build/timeout (#8440)
    **Motivation**
    - Bring `codex exec resume` to parity with top‑level flags so global
    options (git check bypass, json, model, sandbox toggles) work after the
    subcommand, including when outside a git repo.
    
    **Description**
    - Exec CLI: mark `--skip-git-repo-check`, `--json`, `--model`,
    `--full-auto`, and `--dangerously-bypass-approvals-and-sandbox` as
    global so they’re accepted after `resume`.
    - Tests: add `exec_resume_accepts_global_flags_after_subcommand` to
    verify those flags work when passed after `resume`.
    
    **Testing**
    - `just fmt`
    - `cargo test -p codex-exec` (pass; ran with elevated perms to allow
    network/port binds)
    - Manual: exercised `codex exec resume` with global flags after the
    subcommand to confirm behavior.
  • [chore] add additional_details to StreamErrorEvent + wire through (#8307)
    ### What
    
    Builds on #8293.
    
    Add `additional_details`, which contains the upstream error message, to
    relevant structures used to pass along retryable `StreamError`s.
    
    Uses the new TUI status indicator's `details` field (shows under the
    status header) to display the `additional_details` error to the user on
    retryable `Reconnecting...` errors. This adds clarity for users for
    retryable errors.
    
    Will make a corresponding change to the VSCode extension to show
    `additional_details` as expandable from the `Reconnecting...` cell.
    
    Examples:
    <img width="1012" height="326" alt="image"
    src="https://github.com/user-attachments/assets/f35e7e6a-8f5e-4a2f-a764-358101776996"
    />
    
    <img width="1526" height="358" alt="image"
    src="https://github.com/user-attachments/assets/0029cbc0-f062-4233-8650-cc216c7808f0"
    />
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • feat: change ConfigLayerName into a disjoint union rather than a simple enum (#8095)
    This attempts to tighten up the types related to "config layers."
    Currently, `ConfigLayerEntry` is defined as follows:
    
    
    https://github.com/openai/codex/blob/bef36f4ae765f471d7cd69372fcf1b92c8f0367a/codex-rs/core/src/config_loader/state.rs#L19-L25
    
    but the `source` field is a bit of a lie, as:
    
    - for `ConfigLayerName::Mdm`, it is
    `"com.openai.codex/config_toml_base64"`
    - for `ConfigLayerName::SessionFlags`, it is `"--config"`
    - for `ConfigLayerName::User`, it is `"config.toml"` (just the file
    name, not the path to the `config.toml` on disk that was read)
    - for `ConfigLayerName::System`, it seems like it is usually
    `/etc/codex/managed_config.toml` in practice, though on Windows, it is
    `%CODEX_HOME%/managed_config.toml`:
    
    
    https://github.com/openai/codex/blob/bef36f4ae765f471d7cd69372fcf1b92c8f0367a/codex-rs/core/src/config_loader/layer_io.rs#L84-L101
    
    All that is to say, in three out of the four `ConfigLayerName` variants,
    `source` is a `PathBuf` that is not an absolute path (or even a true path).
    
    This PR tries to uplevel things by eliminating `source` from
    `ConfigLayerEntry` and turning `ConfigLayerName` into a disjoint union
    named `ConfigLayerSource` that has the appropriate metadata for each
    variant, favoring the use of `AbsolutePathBuf` where appropriate:
    
    ```rust
    pub enum ConfigLayerSource {
        /// Managed preferences layer delivered by MDM (macOS only).
        #[serde(rename_all = "camelCase")]
        #[ts(rename_all = "camelCase")]
        Mdm { domain: String, key: String },
        /// Managed config layer from a file (usually `managed_config.toml`).
        #[serde(rename_all = "camelCase")]
        #[ts(rename_all = "camelCase")]
        System { file: AbsolutePathBuf },
        /// Session-layer overrides supplied via `-c`/`--config`.
        SessionFlags,
        /// User config layer from a file (usually `config.toml`).
        #[serde(rename_all = "camelCase")]
        #[ts(rename_all = "camelCase")]
        User { file: AbsolutePathBuf },
    }
    ```
  • Reimplement skills loading using SkillsManager + skills/list op. (#7914)
    Refactor the way we load and manage skills:
    1. Move skill discovery/caching into SkillsManager and reuse it across
    sessions.
    2. Add the skills/list API (Op::ListSkills/SkillsListResponse) to fetch
    skills for one or more cwds. Also update app-server for VSCE/App.
    3. Trigger skills/list during session startup so UIs preload skills and
    handle errors immediately.
  • fix: introduce AbsolutePathBuf as part of sandbox config (#7856)
    Changes the `writable_roots` field of the `WorkspaceWrite` variant of
    the `SandboxPolicy` enum from `Vec<PathBuf>` to `Vec<AbsolutePathBuf>`.
    This is helpful because now callers can be sure the value is an absolute
    path rather than a relative one. (Though when using an absolute path in
    a Seatbelt config policy, we still have to _canonicalize_ it first.)
    
    Because `writable_roots` can be read from a config file, it is important
    that we are able to resolve relative paths properly using the parent
    folder of the config file as the base path.
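    
    A rough sketch of that resolution step, assuming relative entries are
    joined onto the config file's parent directory. `AbsPath` and
    `resolve_writable_roots` are hypothetical names used for illustration
    and are not the real `AbsolutePathBuf` API.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical newtype whose constructor guarantees an absolute path.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct AbsPath(PathBuf);
    
    impl AbsPath {
        /// Resolves `path` against `base`. Returns `None` if `base` is not absolute.
        /// `Path::join` keeps `path` unchanged when it is already absolute.
        pub fn resolve_against(base: &Path, path: &Path) -> Option<Self> {
            if !base.is_absolute() {
                return None;
            }
            Some(Self(base.join(path)))
        }
    
        pub fn as_path(&self) -> &Path {
            &self.0
        }
    }
    
    /// Resolves every `writable_roots` entry relative to the config file's folder.
    pub fn resolve_writable_roots(config_file: &Path, roots: &[&str]) -> Option<Vec<AbsPath>> {
        let base = config_file.parent()?;
        roots
            .iter()
            .map(|root| AbsPath::resolve_against(base, Path::new(root)))
            .collect()
    }
    ```
    
    Note that this sketch neither normalizes `..` components nor
    canonicalizes the result.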
  • Inject SKILL.md when it's explicitly mentioned. (#7763)
    1. Skills load once in core at session start; the cached outcome is
    reused across core and surfaced to TUI via SessionConfigured.
    2. TUI detects explicit skill selections, and core injects the matching
    SKILL.md content into the turn when a selected skill is present.
  • seatbelt: allow openpty() (#7507)
    This allows `openpty(3)` to run in the default sandbox. Also permit
    reading `kern.argmax`, which is the maximum number of arguments to
    exec().
  • [app-server] update doc with codex error info (#6941)
    Document new codex error info. Also fixed the name from
    `codex_error_code` to `codex_error_info`.
  • [app-server & core] introduce new codex error code and v2 app-server error events (#6938)
    This PR does two things:
    1. Populate a new `codex_error_code` field in error events sent from
    core to the client.
    2. The old v1 core events `codex/event/stream_error` and
    `codex/event/error` now both become `error`. We also show the codex
    error code when a completed turn has an error status.
    
    New events in the app-server test:
    ```
    < {
    <   "method": "codex/event/stream_error",
    <   "params": {
    <     "conversationId": "019aa34c-0c14-70e0-9706-98520a760d67",
    <     "id": "0",
    <     "msg": {
    <       "codex_error_code": {
    <         "response_stream_disconnected": {
    <           "http_status_code": 401
    <         }
    <       },
    <       "message": "Reconnecting... 2/5",
    <       "type": "stream_error"
    <     }
    <   }
    < }
    
    < {
    <   "method": "error",
    <   "params": {
    <     "error": {
    <       "codexErrorCode": {
    <         "responseStreamDisconnected": {
    <           "httpStatusCode": 401
    <         }
    <       },
    <       "message": "Reconnecting... 2/5"
    <     }
    <   }
    < }
    
    < {
    <   "method": "turn/completed",
    <   "params": {
    <     "turn": {
    <       "error": {
    <         "codexErrorCode": {
    <           "responseTooManyFailedAttempts": {
    <             "httpStatusCode": 401
    <           }
    <         },
    <         "message": "exceeded retry limit, last status: 401 Unauthorized, request id: 9a1b495a1a97ed3e-SJC"
    <       },
    <       "id": "0",
    <       "items": [],
    <       "status": "failed"
    <     }
    <   }
    < }
    ```
  • codex-exec: allow resume --last to read prompt #6717 (#6719)
    ### Description
    
    - `codex exec --json resume --last "<prompt>"` bailed out because clap
    treated the prompt as `SESSION_ID`. I removed the `conflicts_with` flag
    and now reinterpret that positional as a prompt when `--last` is set, so
    the flow keeps working in JSON mode.
    (codex-rs/exec/src/cli.rs:84-104, codex-rs/exec/src/lib.rs:75-130)
    - Added a regression test that exercises `resume --last` in JSON mode to
    ensure the prompt is accepted and the rollout file is updated.
    (codex-rs/exec/tests/suite/resume.rs:126-178)
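    
    The reinterpretation can be sketched without clap. `ResumeArgs` and
    `resolve_resume_args` are hypothetical names, and the exact precedence
    when both positionals are present is an assumption.
    
    ```rust
    /// Hypothetical view of the parsed positionals for `resume`.
    #[derive(Debug, Default)]
    struct ResumeArgs {
        last: bool,
        /// First positional; the parser fills this slot before `prompt`.
        session_id: Option<String>,
        /// Second positional.
        prompt: Option<String>,
    }
    
    /// Returns `(session_id, prompt)` after accounting for `--last`.
    fn resolve_resume_args(args: ResumeArgs) -> (Option<String>, Option<String>) {
        if args.last {
            // With `--last` no session ID is needed, so a lone positional that
            // landed in the SESSION_ID slot is really the prompt.
            (None, args.prompt.or(args.session_id))
        } else {
            (args.session_id, args.prompt)
        }
    }
    ```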
    
    ### Testing
    
      - just fmt
      - cargo test -p codex-exec
      - just fix -p codex-exec
      - cargo test -p codex-exec
    
    #6717
    
    Signed-off-by: Dmitri Khokhlov <dkhokhlov@cribl.io>
  • [app-server] feat: v2 apply_patch approval flow (#6760)
    This PR adds the API V2 version of the apply_patch approval flow, which
    centers around `ThreadItem::FileChange`.
    
    This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only)
    and related events (`item/started`, `item/completed` for
    `ThreadItem::FileChange`, which are emitted in both V1 and V2) through
    the app-server
    protocol. The new approval RPC is only sent when the user initiates a
    turn with the new `turn/start` API so we don't break backwards
    compatibility with VSCE.
    
    Similar to https://github.com/openai/codex/pull/6758, the approach I
    took was to make as few changes to the Codex core as possible,
    leveraging existing `EventMsg` core events, and translating those in
    app-server. I did have to add a few additional fields to
    `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those
    were fairly lightweight.
    
    However, the `EventMsg`s emitted by core are the following:
    ```
    1) Auto-approved (no request for approval)

    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    
    2) Approved by user
    - EventMsg::ApplyPatchApprovalRequest
    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    
    3) Declined by user
    - EventMsg::ApplyPatchApprovalRequest
    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    ```
    
    For a request triggering an approval, this would result in:
    ```
    item/fileChange/requestApproval
    item/started
    item/completed
    ```
    
    which is different from the `ThreadItem::CommandExecution` flow
    introduced in https://github.com/openai/codex/pull/6758, which does the
    below and is preferable:
    ```
    item/started
    item/commandExecution/requestApproval
    item/completed
    ```
    
    To fix this, we leverage `TurnSummaryStore` on codex_message_processor
    to store a little bit of state, allowing us to fire `item/started` and
    `item/fileChange/requestApproval` whenever we receive the underlying
    `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the
    `EventMsg::PatchApplyBegin` later.
    
    This is much less invasive than modifying the order of EventMsg within
    core (I tried).
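    
    A simplified sketch of that bookkeeping, assuming all that needs to be
    remembered is which items have already been announced.
    `FileChangeTranslator` and `CoreEvent` are hypothetical stand-ins, not
    the real `TurnSummaryStore` or `EventMsg` types, and the notifications
    are reduced to their method names.
    
    ```rust
    use std::collections::HashSet;
    
    /// Hypothetical, trimmed-down view of the relevant core events.
    enum CoreEvent {
        ApplyPatchApprovalRequest { call_id: String },
        PatchApplyBegin { call_id: String },
        PatchApplyEnd { call_id: String },
    }
    
    /// Remembers which items already had `item/started` emitted.
    #[derive(Default)]
    struct FileChangeTranslator {
        started: HashSet<String>,
    }
    
    impl FileChangeTranslator {
        /// Maps one core event to the app-server methods that should be sent.
        fn translate(&mut self, event: &CoreEvent) -> Vec<&'static str> {
            match event {
                CoreEvent::ApplyPatchApprovalRequest { call_id } => {
                    self.started.insert(call_id.clone());
                    vec!["item/started", "item/fileChange/requestApproval"]
                }
                CoreEvent::PatchApplyBegin { call_id } => {
                    // `insert` returns false if the item was already announced.
                    if self.started.insert(call_id.clone()) {
                        vec!["item/started"]
                    } else {
                        Vec::new()
                    }
                }
                CoreEvent::PatchApplyEnd { call_id } => {
                    self.started.remove(call_id);
                    vec!["item/completed"]
                }
            }
        }
    }
    ```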
    
    The resulting payloads:
    ```
    {
      "method": "item/started",
      "params": {
        "item": {
          "changes": [
            {
              "diff": "Hello from Codex!\n",
              "kind": "add",
              "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
            }
          ],
          "id": "call_Nxnwj7B3YXigfV6Mwh03d686",
          "status": "inProgress",
          "type": "fileChange"
        }
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "method": "item/fileChange/requestApproval",
      "params": {
        "grantRoot": null,
        "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686",
        "reason": null,
        "threadId": "019a9e11-8295-7883-a283-779e06502c6f",
        "turnId": "1"
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "result": {
        "decision": "accept"
      }
    }
    ```
    
    ```
    {
      "method": "item/completed",
      "params": {
        "item": {
          "changes": [
            {
              "diff": "Hello from Codex!\n",
              "kind": "add",
              "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
            }
          ],
          "id": "call_Nxnwj7B3YXigfV6Mwh03d686",
          "status": "completed",
          "type": "fileChange"
        }
      }
    }
    ```
  • Revert "[core] add optional status_code to error events (#6865)" (#6955)
    This reverts commit c2ec477d93.
    
  • [core] add optional status_code to error events (#6865)
    We want to surface error status codes to clients more reliably. Add an
    optional `status_code` to error events (thread error, error, stream
    error) so that app-server can later expose the status code to clients.
    
    In the event log:
    ```
    < {
    <   "method": "codex/event/stream_error",
    <   "params": {
    <     "conversationId": "019a9a32-f576-7292-9711-8e57e8063536",
    <     "id": "0",
    <     "msg": {
    <       "message": "Reconnecting... 5/5",
    <       "status_code": 401,
    <       "type": "stream_error"
    <     }
    <   }
    < }
    < {
    <   "method": "codex/event/error",
    <   "params": {
    <     "conversationId": "019a9a32-f576-7292-9711-8e57e8063536",
    <     "id": "0",
    <     "msg": {
    <       "message": "exceeded retry limit, last status: 401 Unauthorized, request id: 9a0cb03a485067f7-SJC",
    <       "status_code": 401,
    <       "type": "error"
    <     }
    <   }
    < }
    ```
  • fix: add more fields to ThreadStartResponse and ThreadResumeResponse (#6847)
    This adds the following fields to `ThreadStartResponse` and
    `ThreadResumeResponse`:
    
    ```rust
        pub model: String,
        pub model_provider: String,
        pub cwd: PathBuf,
        pub approval_policy: AskForApproval,
        pub sandbox: SandboxPolicy,
        pub reasoning_effort: Option<ReasoningEffort>,
    ```
    
    This is important because these fields are optional in
    `ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be
    able to determine what values were ultimately used to start/resume the
    conversation. (Though note that any of these could be changed later
    between turns in the conversation.)
    
    Though to get this information reliably, it must be read from the
    internal `SessionConfiguredEvent` that is created in response to the
    start of a conversation. Because `SessionConfiguredEvent` (as defined in
    `codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a
    number of them had to be added as part of this PR.
    
    Because `SessionConfiguredEvent` is referenced in many tests, test
    instances of `SessionConfiguredEvent` had to be updated, as well, which
    is why this PR touches so many files.
  • Update defaults to gpt-5.1 (#6652)
    ## Summary
    - update documentation, example configs, and automation defaults to
    reference gpt-5.1 / gpt-5.1-codex
    - bump the CLI and core configuration defaults, model presets, and error
    messaging to the new models while keeping the model-family/tool coverage
    for legacy slugs
    - refresh tests, fixtures, and TUI snapshots so they expect the upgraded
    defaults
    
    ## Testing
    - `cargo test -p codex-core
    config::tests::test_precedence_fixture_with_gpt5_profile`
    
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)
  • [app-server] feat: add v2 command execution approval flow (#6758)
    This PR adds the API V2 version of the command‑execution approval flow
    for the shell tool.
    
    This PR wires the new RPC (`item/commandExecution/requestApproval`, V2
    only) and related events (`item/started`, `item/completed`, and
    `item/commandExecution/delta`, which are emitted in both V1 and V2)
    through the app-server
    protocol. The new approval RPC is only sent when the user initiates a
    turn with the new `turn/start` API so we don't break backwards
    compatibility with VSCE.
    
    The approach I took was to make as few changes to the Codex core as
    possible, leveraging existing `EventMsg` core events, and translating
    those in app-server. I did have to add additional fields to
    `EventMsg::ExecCommandEndEvent` to capture the command's input so that
    app-server can statelessly transform these events to a
    `ThreadItem::CommandExecution` item for the `item/completed` event.
    
    Once we stabilize the API and it's complete enough for our partners, we
    can work on migrating the core to be aware of command execution items as
    a first-class concept.
    
    **Note**: We'll need followup work to make sure these APIs work for the
    unified exec tool, but will wait til that's stable and landed before
    doing a pass on app-server.
    
    Example payloads below:
    ```
    {
      "method": "item/started",
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": null,
          "exitCode": null,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "inProgress",
          "type": "commandExecution"
        }
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "method": "item/commandExecution/requestApproval",
      "params": {
        "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
        "parsedCmd": [
          {
            "cmd": "touch /tmp/should-trigger-approval",
            "type": "unknown"
          }
        ],
        "reason": "Need to create file in /tmp which is outside workspace sandbox",
        "risk": null,
        "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
        "turnId": "1"
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "result": {
        "acceptSettings": {
          "forSession": false
        },
        "decision": "accept"
      }
    }
    ```
    
    ```
    {
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": 224,
          "exitCode": 0,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "completed",
          "type": "commandExecution"
        }
      }
    }
    ```
  • feat: better UI for unified_exec (#6515)
    <img width="376" height="132" alt="Screenshot 2025-11-12 at 17 36 22"
    src="https://github.com/user-attachments/assets/ce693f0d-5ca0-462e-b170-c20811dcc8d5"
    />
  • feat: Add support for --add-dir to exec and TypeScript SDK (#6565)
    ## Summary
    
    Adds support for specifying additional directories in the TypeScript SDK
    through a new `additionalDirectories` option in `ThreadOptions`.
    
    ## Changes
    
    - Added `additionalDirectories` parameter to `ThreadOptions` interface
    - Updated `CodexExec` to accept and pass through additional directories
    via the `--config` flag for `sandbox_workspace_write.writable_roots`
    - Added comprehensive test coverage for the new functionality
    
    ## Test plan
    
    - Added test case that verifies `additionalDirectories` is correctly
    passed as repeated flags
    - Existing tests continue to pass
    
    ---------
    
    Co-authored-by: Claude <noreply@anthropic.com>
  • Fix warning message phrasing (#6446)
    Small fix for sentence phrasing in the warning message
    
    Co-authored-by: AndrewNikolin <877163+AndrewNikolin@users.noreply.github.com>
  • Add warning on compact (#6052)
    This PR introduces the ability for `core` to send `warnings` just as it
    can send `errors`. It also sends a warning on compaction.
    
    <img width="811" height="187" alt="image"
    src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a"
    />
  • [exec] Add MCP tool arguments and results (#5899)
    Extends mcp_tool_call item to include arguments and results.
  • feature: Add "!cmd" user shell execution (#2471)
    This change lets users run local shell commands directly from the TUI by
    prefixing their input with `!` (e.g. `!ls`). Output is truncated to keep
    the exec cell usable, and Ctrl-C cleanly interrupts long-running
    commands (e.g. `!sleep 10000`).
    
    **Summary of changes**
    
    - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask
    (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs.
    - Reuse the existing tool router: the task constructs a ToolCall for the
    local_shell tool and relies on ShellHandler, so no manual MCP tool
    lookup is required.
    - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the
    TUI can show command metadata, live output, and exit status.
    
    **End-to-end flow**
    
    **TUI handling**
    
    1. ChatWidget::submit_user_message (TUI) intercepts messages starting
    with !.
    2. Non-empty commands dispatch Op::RunUserShellCommand { command };
    empty commands surface a help hint.
    3. No UserInput items are created, so nothing is enqueued for the model.
    
    **Core submission loop**
    
    4. The submission loop routes the op to handlers::run_user_shell_command
    (core/src/codex.rs).
    5. A fresh TurnContext is created and Session::spawn_user_shell_command
    enqueues UserShellCommandTask.
    
    **Task execution**
    
    6. UserShellCommandTask::run emits TaskStartedEvent, formats the
    command, and prepares a ToolCall targeting local_shell.
    7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler.
    
    **Shell tool runtime**
    
    8. ShellHandler::run_exec_like launches the process via the unified exec
    runtime, honoring sandbox and shell policies, and emits
    ExecCommandBegin/End.
    9. Stdout/stderr are captured for the UI, but the task does not turn the
    resulting ToolOutput into a model response.
    
    **Completion**
    
    10. After ExecCommandEnd, the task finishes without an assistant
    message; the session marks it complete and the exec cell displays the
    final output.
    
    **Conversation context**
    
    - The command and its output never enter the conversation history or the
    model prompt; the flow is local-only.
    - Only exec/task events are emitted for UI rendering.
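    
    The TUI-side interception in steps 1 to 3 can be sketched as a small
    classifier. `Submission` and `classify_input` are hypothetical names,
    and trimming whitespace around the command is an assumption; the real
    `ChatWidget` logic may differ.
    
    ```rust
    /// Hypothetical result of inspecting one line of user input.
    #[derive(Debug, PartialEq, Eq)]
    enum Submission<'a> {
        /// Corresponds to dispatching `Op::RunUserShellCommand { command }`.
        RunUserShellCommand { command: &'a str },
        /// A bare `!` with no command: show a help hint instead.
        ShowShellHelp,
        /// Anything else is a normal message for the model.
        UserMessage(&'a str),
    }
    
    fn classify_input(input: &str) -> Submission<'_> {
        match input.strip_prefix('!') {
            Some(rest) if rest.trim().is_empty() => Submission::ShowShellHelp,
            Some(rest) => Submission::RunUserShellCommand { command: rest.trim() },
            None => Submission::UserMessage(input),
        }
    }
    ```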
    
    **Demo video**
    
    
    https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
  • feat(tui): clarify Windows auto mode requirements (#5568)
    ## Summary
    - Coerce Windows `workspace-write` configs back to read-only, surface
    the forced downgrade in the approvals popup, and funnel users toward
    WSL or Full Access.
    - Add WSL installation instructions to the Auto preset on Windows while
    keeping the preset available for other platforms.
    - Skip the trust-on-first-run prompt on native Windows so new folders
    remain read-only without additional confirmation.
    - Expose a structured sandbox policy resolution from config to flag
    Windows downgrades and adjust tests (core, exec, TUI) to reflect the
    new behavior; provide a Windows-only approvals snapshot.
    
    ## Testing
    - cargo fmt
    - cargo test -p codex-core
    config::tests::add_dir_override_extends_workspace_writable_roots
    - cargo test -p codex-exec
    suite::resume::exec_resume_preserves_cli_configuration_overrides
    - cargo test -p codex-tui
    chatwidget::tests::approvals_selection_popup_snapshot
    - cargo test -p codex-tui
    approvals_popup_includes_wsl_note_for_auto_mode
    - cargo test -p codex-tui windows_skips_trust_prompt
    - just fix -p codex-core
    - just fix -p codex-tui
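    
    The "structured sandbox policy resolution" from the summary could look
    roughly like the following. All names here (`SandboxMode`,
    `SandboxResolution`, `resolve_sandbox`) are hypothetical, and the
    sketch assumes only `workspace-write` is downgraded on native Windows.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum SandboxMode {
        ReadOnly,
        WorkspaceWrite,
        DangerFullAccess,
    }
    
    /// The effective mode plus a flag the UI can use to explain a downgrade.
    #[derive(Debug, PartialEq, Eq)]
    struct SandboxResolution {
        mode: SandboxMode,
        forced_downgrade: bool,
    }
    
    /// Coerces `WorkspaceWrite` to `ReadOnly` on native Windows and reports it.
    fn resolve_sandbox(requested: SandboxMode, is_native_windows: bool) -> SandboxResolution {
        if is_native_windows && requested == SandboxMode::WorkspaceWrite {
            SandboxResolution { mode: SandboxMode::ReadOnly, forced_downgrade: true }
        } else {
            SandboxResolution { mode: requested, forced_downgrade: false }
        }
    }
    ```
    
    A caller could pass `cfg!(windows)` for `is_native_windows`; keeping it
    a parameter lets the behavior be tested on any platform.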
  • chore: drop approve all (#5503)
    Not needed anymore
  • Set codex SDK TypeScript originator (#4894)
    ## Summary
    - ensure the TypeScript SDK sets CODEX_INTERNAL_ORIGINATOR_OVERRIDE to
    codex_sdk_ts when spawning the Codex CLI
    - extend the responses proxy test helper to capture request headers for
    assertions
    - add coverage that verifies Codex threads launched from the TypeScript
    SDK send the codex_sdk_ts originator header
    
    ## Testing
    - Not Run (not requested)
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e561b125248320a487f129093d16e7
  • Simplify request body assertions (#4845)
    We'll have a lot more tests like these
  • Use response helpers when mounting SSE test responses (#4783)
    ## Summary
    - replace manual wiremock SSE mounts in the compact suite with the
    shared response helpers
    - simplify the exec auth_env integration test by using the
    mount_sse_once_match helper
    - rely on mount_sse_sequence plus server request collection to replace
    the bespoke SeqResponder utility in tests
    
    ## Testing
    - just fmt
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e2e238f2a88320a337f0b9e4098093
  • Add helper for response created SSE events in tests (#4758)
    ## Summary
    - add a reusable `ev_response_created` helper that builds
    `response.created` SSE events for integration tests
    - update the exec and core integration suites to use the new helper
    instead of repeating manual JSON literals
    - keep the streaming fixtures consistent by relying on the shared helper
    in every touched test
    
    ## Testing
    - `just fmt`
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61
  • feat: codex exec writes only the final message to stdout (#4644)
    This updates `codex exec` so that, by default, most of the agent's
    activity is written to stderr so that only the final agent message is
    written to stdout. This makes it easier to pipe `codex exec` into
    another tool without extra filtering.
    
    I introduced `#![deny(clippy::print_stdout)]` to help enforce this
    change and renamed the `ts_println!()` macro to `ts_msg()` because (1)
    it no longer calls `println!()` and (2) `ts_eprintln!()` seemed too
    long a name.
    
    While here, this also adds `-o` as an alias for `--output-last-message`.
    
    Fixes https://github.com/openai/codex/issues/1670
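    
    The stdout/stderr split described above can be illustrated with a
    minimal Python sketch. `run_agent` and its messages are hypothetical
    stand-ins, not part of `codex exec`; the sketch only shows that
    progress goes to stderr while the final message alone reaches stdout.
    
    ```python
    import io
    import sys
    
    
    def run_agent(steps, final_message, out=sys.stdout, err=sys.stderr):
        """Hypothetical stand-in for an agent run: progress to err, result to out."""
        for step in steps:
            # Activity logs go to stderr so they never pollute a pipe.
            print(f"[progress] {step}", file=err)
        # Only the final message is written to stdout.
        print(final_message, file=out)
    
    
    # Capture both streams in memory to show what a downstream pipe would see.
    out, err = io.StringIO(), io.StringIO()
    run_agent(["reading files", "running tests"], "All tests pass.", out=out, err=err)
    piped = out.getvalue()   # what the next tool in a pipeline receives
    logged = err.getvalue()  # what the user still sees on the terminal
    ```
    
    Under this pattern, a tool reading from the pipe gets the final
    message and nothing else, so no filtering step is needed.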
  • chore: refactor tool handling (#4510)
    # Tool System Refactor
    
    - Centralizes tool definitions and execution in `core/src/tools/*`:
    specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
    registry/dispatch (`registry.rs`), and shared context (`context.rs`).
    One registry now builds the model-visible tool list and binds handlers.
    - Router converts model responses to tool calls; Registry dispatches
    with consistent telemetry via `codex-rs/otel` and unified error
    handling. Function, Local Shell, MCP, and experimental `unified_exec`
    all flow through this path; legacy shell aliases still work.
    - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
    make adding tools predictable and testable.
    
    Example: `read_file`
    - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
    registered by `build_specs`).
    - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
    1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
    - E2E test: `core/tests/suite/read_file.rs` validates the tool returns
    the requested lines.
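    
    The registry idea and the `read_file` behaviour listed above can be
    sketched roughly as follows. This is a Python illustration, not the
    Rust code in `core/src/tools/*`: `ToolRegistry`, its methods, the
    spec dictionary, and the default `limit` are all hypothetical, the
    exact prefix format `L<n>: ` is an assumption, and safe truncation is
    omitted.
    
    ```python
    import os
    import tempfile
    
    
    class ToolRegistry:
        """Hypothetical registry: one place holds each tool's spec and handler."""
    
        def __init__(self):
            self._tools = {}
    
        def register(self, name, spec, handler):
            self._tools[name] = (spec, handler)
    
        def specs(self):
            # The model-visible tool list comes from the same entries that
            # dispatch uses, so spec and handler cannot drift apart.
            return [spec for spec, _ in self._tools.values()]
    
        def dispatch(self, name, arguments):
            if name not in self._tools:
                # Every unknown tool fails in the same shape.
                return {"ok": False, "error": f"unknown tool: {name}"}
            _, handler = self._tools[name]
            try:
                return {"ok": True, "output": handler(**arguments)}
            except (OSError, TypeError, ValueError) as exc:
                return {"ok": False, "error": str(exc)}
    
    
    def read_file(file_path, offset=1, limit=2000):
        """Assumed behaviour: absolute path, 1-indexed offset, 'L#: ' prefixes."""
        if not os.path.isabs(file_path):
            raise ValueError("file_path must be absolute")
        if offset < 1 or limit < 1:
            raise ValueError("offset and limit must be at least 1")
        with open(file_path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        selected = lines[offset - 1 : offset - 1 + limit]
        return "\n".join(f"L{offset + i}: {line}" for i, line in enumerate(selected))
    
    
    registry = ToolRegistry()
    registry.register(
        "read_file",
        {"name": "read_file", "parameters": ["file_path", "offset", "limit"]},
        read_file,
    )
    
    # Exercise the handler against a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("alpha\nbeta\ngamma\ndelta\n")
    result = registry.dispatch(
        "read_file", {"file_path": tmp.name, "offset": 2, "limit": 2}
    )
    os.unlink(tmp.name)
    missing = registry.dispatch("no_such_tool", {})
    ```
    
    Here `result["output"]` holds lines 2 and 3 with their `L2: ` and
    `L3: ` prefixes, and `missing` is an error value rather than an
    exception, which mirrors the "unified error handling" goal.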
    
    ## Next steps:
    - Decompose `handle_container_exec_with_params` 
    - Add parallel tool calls
  • Support CODEX_API_KEY for codex exec (#4615)
    Allows setting the API key per invocation of `codex exec`.
  • fix: remove mcp-types from app server protocol (#4537)
    We continue the separation between `codex app-server` and `codex
    mcp-server`.
    
    In particular, we introduce a new crate, `codex-app-server-protocol`,
    and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
    `codex-rs/app-server-protocol/src/protocol.rs`.
    
    Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
    into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
    because it is referenced in a ton of places, we have to touch a lot of
    files as part of this PR.
    
    We also decide to get away from proper JSON-RPC 2.0 semantics, so we
    also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
    is basically the same `JSONRPCMessage` type defined in `mcp-types`
    except with all of the `"jsonrpc": "2.0"` removed.
    
    Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
    considerably simpler, as we can lean more heavily on serde to serialize
    directly into the wire format that we use now.
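    
    The difference in wire format can be sketched in Python. The helper
    names and the `exampleMethod` request are hypothetical and are not
    taken from `jsonrpc_lite.rs`; the sketch assumes the "lite" message
    is a JSON-RPC 2.0 request with only the version field dropped.
    
    ```python
    import json
    
    
    def to_jsonrpc2(method, params, request_id):
        """Standard JSON-RPC 2.0 request: carries the constant version field."""
        return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    
    
    def to_lite(method, params, request_id):
        """Assumed 'lite' shape: the same request minus the version field."""
        message = to_jsonrpc2(method, params, request_id)
        del message["jsonrpc"]
        return message
    
    
    # "exampleMethod" is a placeholder, not a real app-server method name.
    strict = json.dumps(to_jsonrpc2("exampleMethod", {"x": 1}, 7))
    lite = json.dumps(to_lite("exampleMethod", {"x": 1}, 7))
    ```
    
    With no constant field to inject or validate, the message type can
    map directly onto the serialized form.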
  • Remove legacy codex exec --json format (#4525)
    `codex exec --json` now maps to the behavior of `codex exec
    --experimental-json` with new event and item shapes.
    
    Thread events:
    - thread.started
    - turn.started
    - turn.completed
    - turn.failed
    - item.started
    - item.updated
    - item.completed
    
    Item types: 
    - assistant_message
    - reasoning
    - command_execution
    - file_change
    - mcp_tool_call
    - web_search
    - todo_list
    - error
    
    Sample output:
    
    <details>
    `codex exec "list my assigned github issues" --json | jq`
    
    ```
    {
      "type": "thread.started",
      "thread_id": "01999ce5-f229-7661-8570-53312bd47ea3"
    }
    {
      "type": "turn.started"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_0",
        "item_type": "reasoning",
        "text": "**Planning to list assigned GitHub issues**"
      }
    }
    {
      "type": "item.started",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "in_progress"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_2",
        "item_type": "reasoning",
        "text": "**Organizing final message structure**"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_3",
        "item_type": "assistant_message",
        "text": "**Assigned Issues**\n- openai/codex#3267 – “stream error: stream disconnected before completion…” (bug) – last update 2025-09-08\n- openai/codex#3257 – “You've hit your usage limit. Try again in 4 days 20 hours 9 minutes.” – last update 2025-09-23\n- openai/codex#3054 – “reqwest SSL panic (library has no ciphers)” (bug) – last update 2025-09-03\n- openai/codex#3051 – “thread 'main' panicked at linux-sandbox/src/linux_run_main.rs:53:5:” (bug) – last update 2025-09-10\n- openai/codex#3004 – “Auto-compact when approaching context limit” (enhancement) – last update 2025-09-26\n- openai/codex#2916 – “Feature request: Add OpenAI service tier support for cost optimization” – last update 2025-09-12\n- openai/codex#1581 – “stream error: stream disconnected before completion: stream closed before response.complete; retrying...” (bug) – last update 2025-09-17"
      }
    }
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 34785,
        "cached_input_tokens": 12544,
        "output_tokens": 560
      }
    }
    ```
    
    </details>
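    
    A consumer of this stream might reduce it to the fields it cares
    about. The Python sketch below is illustrative only: it assumes the
    raw output (without `jq`) carries one JSON object per line, and
    `summarize` is a hypothetical helper. Field names are taken from the
    sample output above.
    
    ```python
    import json
    
    # Events abbreviated from the sample output, one JSON object per line.
    SAMPLE = "\n".join([
        '{"type": "thread.started", "thread_id": "01999ce5-f229-7661-8570-53312bd47ea3"}',
        '{"type": "item.completed", "item": {"id": "item_0", "item_type": "reasoning", "text": "**Planning to list assigned GitHub issues**"}}',
        '{"type": "item.completed", "item": {"id": "item_3", "item_type": "assistant_message", "text": "**Assigned Issues**"}}',
        '{"type": "turn.completed", "usage": {"input_tokens": 34785, "cached_input_tokens": 12544, "output_tokens": 560}}',
    ])
    
    
    def summarize(lines):
        """Return the thread id, last assistant message, and usage from an event stream."""
        summary = {"thread_id": None, "final_message": None, "usage": None}
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between events
            event = json.loads(line)
            kind = event.get("type")
            if kind == "thread.started":
                summary["thread_id"] = event.get("thread_id")
            elif kind == "item.completed":
                item = event.get("item", {})
                # Later assistant messages overwrite earlier ones.
                if item.get("item_type") == "assistant_message":
                    summary["final_message"] = item.get("text")
            elif kind == "turn.completed":
                summary["usage"] = event.get("usage")
        return summary
    
    
    summary = summarize(SAMPLE.splitlines())
    ```
    
    Because `summarize` accepts any iterable of lines, the same function
    could be fed `sys.stdin` when reading from a pipe.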
  • Wire up web search item (#4511)
    Add handling for web search events.