71 Commits

  • [codex] Enable remote plugins by default (#30297)
    ## Summary
    
    - enable the remote plugin feature by default
    - promote the remote plugin feature from under development to stable
    - preserve the existing `features.remote_plugin` override for explicitly
    disabling it
    - keep legacy disabled-path coverage explicit in TUI and app-server
    tests
    
    ## Impact
    
    Remote plugin functionality is enabled by default for configurations
    that do not set the feature flag. The existing Codex backend
    authentication gate still applies.
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-features`
    - `just test -p codex-tui
    plugins_popup_remote_section_fallback_states_snapshot`
    - targeted `codex-app-server` plugin-list and skills-list tests
    - `git diff --check`
    
    The full TUI and app-server suites were also exercised locally. All
    remote-plugin-related coverage passed; unrelated local
    sandbox/test-binary failures remain outside this change.
  • feat(app-server): list descendant threads by ancestor (#29591)
    ## Why
    
    `thread/list` can filter direct children with `parentThreadId`, but
    clients cannot request an entire spawned subtree. Discovering every
    descendant requires repeated client-side requests and gives up the
    database's existing filtering and pagination path.
    
    ## What changed
    
    Experimental clients can use `ancestorThreadId` to return strict
    descendants at any depth while `parentThreadId` retains its direct-child
    meaning. The filters are mutually exclusive, the ancestor is excluded,
    and every result preserves its immediate `parentThreadId` so callers can
    reconstruct the tree.
    
    ## How it works
    
    - **Explicit relationship:** Internal list parameters distinguish direct
    children from transitive descendants without changing the meaning of
    `parentThreadId`.
    - **Existing graph:** Persisted parent-child spawn edges remain the
    source of truth, so descendant lookup needs no schema migration or
    ancestry cache.
    - **Indexed traversal:** A recursive SQLite query starts from the
    parent-edge index, walks each generation, and applies thread filters,
    sorting, and cursor pagination in the same database request.
    - **Reconstructable results:** The response stays flat and normally
    ordered while carrying each descendant's immediate parent.
    
    ## Verification
    
    Ran 550 tests across the protocol, state, rollout, and thread-store
    crates, then reran the four focused state, store, and app-server
    descendant-listing tests after the final diff reduction. Scoped Clippy
    and formatting checks passed. Stable and experimental schema generation
    was checked; the stable fixtures remain unchanged while the experimental
    schema includes the new field.
  • Separate local and remote plugin analytics IDs (#29495)
    ## Why
    
    Plugin analytics overloaded `plugin_id`: most events used the Codex
    `<plugin>@<marketplace>` identity, while remote install events used the
    backend plugin ID. That makes the same field change meaning across event
    types and complicates downstream identity resolution.
    
    This change makes the contract unambiguous:
    
    - `plugin_id`: the local Codex `<plugin>@<marketplace>` identity, when
    resolved
    - `remote_plugin_id`: the backend plugin identity, when available
    
    For a remote install failure that happens before plugin details resolve,
    `plugin_id` is `null` and `remote_plugin_id` remains populated.
    
    ## What changed
    
    All six plugin analytics events use the same identity contract:
    
    - `codex_plugin_installed`
    - `codex_plugin_install_failed`
    - `codex_plugin_uninstalled`
    - `codex_plugin_enabled`
    - `codex_plugin_disabled`
    - `codex_plugin_used`
    
    Remote identity is resolved from the current installed-plugin snapshot
    first, with persisted install metadata as fallback. The telemetry
    metadata type keeps local identity optional for failures that occur
    before remote details are available.
    
    The app-server test client's manual analytics smokes now find remote
    mutation events through `remote_plugin_id` and validate that `plugin_id`
    remains local.
    
    ## Remote uninstall
    
    Resolve and capture telemetry metadata before removing the local plugin
    cache, then emit `codex_plugin_uninstalled` after the backend confirms
    success. The event is also emitted when backend uninstall succeeds but
    local cache cleanup reports `CacheRemove`.
    
    If a concurrent remote-cache refresh removes the local bundle before
    telemetry capture, the already-fetched remote plugin detail supplies
    fallback capability metadata.
    
    ## Validation
    
    - `just test -p codex-analytics` — 82 passed
    - `just test -p codex-core-plugins` — 271 passed
    - `just test -p codex-app-server-test-client` — 5 passed
    - `just test -p codex-plugin` — 3 passed
    - `just test -p codex-app-server plugin_install` — 37 passed
    - `just test -p codex-app-server plugin_uninstall` — 10 passed
    
    The production app-server install/uninstall flow was also exercised
    against `plugins~Plugin_f1b845ac33888191ac156169c58733c2`
    (`build-ios-apps@openai-curated-remote`), and the plugin's original
    uninstalled state was restored.
  • [codex] handle request_user_input in app-server test client (#29476)
    ## Why
    
    `codex-app-server-test-client` previously treated
    `item/tool/requestUserInput` as an unsupported server request and
    terminated the connection. That made it impossible to use the client for
    end-to-end testing of interactive turns: an operator could observe the
    request, but could not answer it and confirm that the same turn resumed.
    
    ## What changed
    
    - Handle `ToolRequestUserInput` server requests in the test client's
    central request dispatcher.
    - Render numbered terminal choices, accept exact option labels, support
    free-form `Other` and text-only questions, and collect multiple answers.
    - Send a protocol-native `ToolRequestUserInputResponse` and continue
    streaming the active turn.
    - Fail clearly when interactive input is requested without a terminal.
    - Document the interactive behavior and add focused tests for option
    selection, free-form answers, multiple questions, and invalid-selection
    retries.
    
    ## Testing
    
    - `just test -p codex-app-server-test-client`
    - `just bazel-lock-check`
    - Manually exercised the app-server flow, selected `TUI`, observed
    `serverRequest/resolved`, and verified that the same turn completed with
    the selected answer.
  • Support openai/form extended form elicitations (#27500)
    # Summary
    Allow App Server clients to opt into `openai/form` MCP elicitations.
  • unified-exec: retain PathUri in command events (#28780)
    ## Why
    
    App-server must report command events containing foreign-platform paths
    without changing existing client or rollout path-string formats.
    
    ## What changed
    
    - retain `PathUri` through exec command begin/end events
    - convert cwd values to `LegacyAppPathString` at the app-server
    compatibility boundary
    - drop command actions with foreign paths and log them
    - serialize rollout-trace cwd values using their inferred native path
    representation
    - restore Wine coverage for retained Windows cwd values and successful
    completion
  • Scope command approvals by execution environment (#28738)
    ## Why
    
    Command approval cache keys included the command and working directory,
    but not the execution environment. An approval for `/workspace` locally
    could therefore be reused for the same command and path on an executor.
    
    ## What changed
    
    - Include the selected environment ID in shell and unified-exec approval
    cache keys.
    - Carry that ID through the normal command approval request so clients
    can show which environment is being approved.
    - Expose the environment through app-server as a required nullable
    `environmentId` and show it in the inline TUI approval prompt.
    - Keep older recorded approval events compatible when the environment is
    absent.
    
    For example, `echo ok` in local `/workspace` and `echo ok` in executor
    `/workspace` now produce different approval keys and separate prompts.
    
    ## Scope
    
    This PR does not change network approvals, Guardian review actions, MCP
    elicitation, full-screen TUI rendering, or environment-ID validation.
    Remote `shell_command` execution itself remains in #28722; this PR only
    makes its approval key environment-aware.
  • [codex-app-server-test-client] Plugin Install/Uninstall Analytics Smoke Test (#27100)
    ## This PR
    
    The original [combined remote plugin analytics PR
    #26281](https://github.com/openai/codex/pull/26281) mixed reusable
    analytics test infrastructure, two manual smoke workflows, a metadata
    refactor, and the final identity behavior. This PR adds the
    account-mutating validation workflow separately so its cleanup and
    recovery guarantees can be reviewed without the final analytics behavior
    change.
    
    - Add a manually invoked remote plugin install/uninstall smoke workflow.
    - Require explicit account-mutation confirmation and an initially
    uninstalled plugin.
    - Validate the current `codex_plugin_installed` contract, where
    `plugin_id` is the backend ID.
    - Restore and verify the original uninstalled state, with a dedicated
    recovery command.
    
    This baseline intentionally does not require `codex_plugin_uninstalled`,
    because production does not emit that event yet. The final PR will
    update this smoke to require local `plugin_id`, `remote_plugin_id`, and
    uninstall emission. Review this PR as the net diff against #27099.
    
    ## Testing
    
    - `just test -p codex-app-server-test-client` (3 focused
    capture/validation tests passed)
    - The live workflow was previously exercised on the green combined
    reference branch, and the original uninstalled account state was
    restored.
    - CI is green across the required platform matrix.
    
    ## Split Overview
    
    ```text
    main
    ├── #27093  Debug analytics capture
    │   └── #27099  Non-mutating plugin smoke
    │       └── #27100  Remote install/uninstall smoke  ← you are here
    └── #27102  Plugin telemetry metadata refactor
    
    After #27093, #27099, #27100, and #27102 merge:
    └── Final PR: add remote_plugin_id to plugin analytics
    ```
    
    Review order and dependencies:
    
    1. [#27093 Add debug-only analytics event
    capture](https://github.com/openai/codex/pull/27093) (based on `main`)
    2. [#27099 Add a plugin analytics smoke
    workflow](https://github.com/openai/codex/pull/27099) (stacked on
    #27093)
    3. [#27100 Add a remote plugin analytics mutation smoke
    workflow](https://github.com/openai/codex/pull/27100) **(this PR,
    stacked on #27099)**
    4. [#27102 Centralize plugin telemetry metadata
    construction](https://github.com/openai/codex/pull/27102) (independent,
    based on `main`)
    5. Final remote-ID behavior PR (created after PRs 1-4 merge)
    
    The original [#26281](https://github.com/openai/codex/pull/26281)
    remains open as the green aggregate reference until the final PR is
    published.
  • [codex-app-server-test-client & codex-app-server] Plugin Usage Analytics Smoke Test (#27099)
    ## This PR
    
    The original [combined remote plugin analytics PR
    #26281](https://github.com/openai/codex/pull/26281) mixed reusable
    analytics test infrastructure, two manual smoke workflows, a metadata
    refactor, and the final identity behavior. This PR establishes a
    non-mutating end-to-end plugin smoke workflow before any analytics
    identity semantics change.
    
    - Add `plugin-analytics-smoke` to the existing app-server test client.
    - Exercise plugin disable, enable, and use through production app-server
    RPC paths.
    - Isolate config writes in a temporary file and use a loopback Responses
    API server.
    - Capture analytics without sending them to the production analytics
    backend.
    - Validate the current local `plugin_id`, names, capability metadata,
    thread, turn, and model fields.
    
    This is intentionally a baseline smoke workflow. It does not assert
    `remote_plugin_id`; the final PR will update it when that field exists.
    Review this PR as the net diff against #27093.
    
    ## Testing
    
    - The test-client target compiles successfully.
    - The combined reference branch exercised the manual smoke against the
    live remote plugin service.
    - CI is green across the required platform matrix.
    
    ## Split Overview
    
    ```text
    main
    ├── #27093  Debug analytics capture
    │   └── #27099  Non-mutating plugin smoke           ← you are here
    │       └── #27100  Remote install/uninstall smoke
    └── #27102  Plugin telemetry metadata refactor
    
    After #27093, #27099, #27100, and #27102 merge:
    └── Final PR: add remote_plugin_id to plugin analytics
    ```
    
    Review order and dependencies:
    
    1. [#27093 Add debug-only analytics event
    capture](https://github.com/openai/codex/pull/27093) (based on `main`)
    2. [#27099 Add a plugin analytics smoke
    workflow](https://github.com/openai/codex/pull/27099) **(this PR,
    stacked on #27093)**
    3. [#27100 Add a remote plugin analytics mutation smoke
    workflow](https://github.com/openai/codex/pull/27100) (stacked on this
    PR)
    4. [#27102 Centralize plugin telemetry metadata
    construction](https://github.com/openai/codex/pull/27102) (independent,
    based on `main`)
    5. Final remote-ID behavior PR (created after PRs 1-4 merge)
    
    The original [#26281](https://github.com/openai/codex/pull/26281)
    remains open as the green aggregate reference until the final PR is
    published.
  • Expose explicit dynamic tool namespaces in thread start (#27371)
    Stacked on #27365.
    
    ## Stack note
    
    [#27365](https://github.com/openai/codex/pull/27365) kept `thread/start`
    unchanged and converted its input in `thread_processor`. This PR updates
    `thread/start` to accept explicit functions and namespaces directly.
    
    Legacy per-tool arrays are still accepted and converted while reading
    the request. As a result, `thread_processor` can validate and pass the
    tools through directly, which is why some code added in #27365 is
    removed here.
    
    ## Why
    
    `thread/start.dynamicTools` still repeats namespace data on each
    function even though core now stores explicit namespace groups. The
    request API should use the same shape so each namespace has one
    description and one member list.
    
    ## What changed
    
    - Accept top-level functions and explicit namespace objects in
    `dynamicTools`.
    - Continue accepting fully legacy flat arrays, including
    `exposeToContext`.
    - Reject arrays that mix legacy and canonical entries.
    - Reuse the protocol types directly and remove the temporary app-server
    adapter.
    - Update validation, docs, the test client, and generated schemas.
    
    ## Test plan
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    dynamic_tool_call_round_trip_sends_text_content_items_to_model`
    - `just test -p codex-app-server
    thread_start_normalizes_legacy_dynamic_tools_into_model_request`
    - `just test -p codex-app-server
    thread_start_rejects_mixed_dynamic_tool_formats`
    - `just test -p codex-app-server
    thread_start_rejects_hidden_dynamic_tools_without_namespace`
  • feat(app-server): filter threads by parent (#26662)
    ## Why
    
    Clients that display or coordinate spawned subagents need an
    authoritative snapshot of a thread's immediate spawned children when
    they connect to app-server or recover after missing live events.
    `thread/list` cannot query by parent, so clients must otherwise scan
    unrelated threads or reconstruct relationships from rollout history and
    transient events.
    
    The direct spawn relationship already exists in persisted
    `thread_spawn_edges` state. Review and Guardian threads do not
    participate in that lifecycle and are intentionally outside this
    filter's scope.
    
    ## What changed
    
    This adds an experimental `parentThreadId` filter to `thread/list`.
    Parent-filtered requests return direct spawned children from persisted
    state while preserving the existing response shape, explicit filters,
    sorting, and timestamp-only cursor behavior. The lookup does not read
    rollout transcripts or recursively return descendants.
    
    Supersedes #25112 with the narrower `thread/list` filter approach.
    
    ## How it works
    
    1. An experimental client passes a valid thread ID as `parentThreadId`.
    2. App-server routes the list through the existing thread-store and
    state-database boundaries.
    3. SQLite selects threads whose IDs have a direct persisted spawn edge
    from that parent.
    4. Omitted provider and source filters include all values; explicit
    filters keep ordinary `thread/list` semantics.
    5. Grandchildren, Review threads, and Guardian threads are excluded.
    
    ## Verification
    
    State (144 tests), rollout (69 tests), and focused app-server
    thread-list (31 tests) suites passed. Scoped Clippy checks and
    repository formatting also passed. Coverage includes direct spawned
    children, omitted grandchildren, pagination, malformed IDs, mixed source
    kinds, explicit filters, and operation without rollout files.
  • [codex] Add user input client ids (#24653)
    ## Summary
    
    Adds an optional `clientId` field to app-server v2 `UserInput` and
    carries it through the core `UserInput` model so clients can correlate
    echoed user input items without relying on payload equality.
    
    ## Details
    
    - Adds `client_id: Option<String>` to core `UserInput` variants.
    - Exposes the v2 app-server field as `clientId` on the wire and in
    generated TypeScript.
    - Preserves the id when converting between app-server v2 and core
    protocol types.
    - Regenerates app-server schema fixtures.
    
    ## Validation
    
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-protocol`
    - `git diff --check`
  • [codex] request desktop attestation from app (#20619)
    ## Summary
    
    TL;DR: teaches `codex-rs` / app-server to request a desktop-provided
    attestation token and attach it as `x-oai-attestation` on the scoped
    ChatGPT Codex request paths.
    
    ![DeviceCheck attestation
    interface](https://raw.githubusercontent.com/openai/codex/dev/jm/devicecheck-diagram-assets/pr-assets/devicecheck-attestation-interface.png)
    
    ## Details
    
    This PR teaches the Codex app-server runtime how to request and attach
    an attestation token. It does not generate DeviceCheck tokens directly;
    instead, it relies on the connected desktop app to advertise that it can
    generate attestation and then asks that app for a fresh header value
    when needed.
    
    The flow is:
    
    1. The Codex desktop app connects to app-server.
    2. During `initialize`, the app can advertise that it supports
    `requestAttestation`.
    3. Before app-server calls selected ChatGPT Codex endpoints, it sends
    the internal server request `attestation/generate` to the app.
    4. app-server receives a pre-encoded header value back.
    5. app-server forwards that value as `x-oai-attestation` on the scoped
    outbound requests.
    
    The code in this repo is mostly protocol and runtime plumbing: it adds
    the app-server request/response shape, introduces an attestation
    provider in core, wires that provider into Responses / compaction /
    realtime setup paths, and covers the intended scoping with tests. The
    signed macOS DeviceCheck generation remains owned by the desktop app PR.
    
    ## Related PR
    
    - Codex desktop app implementation:
    https://github.com/openai/openai/pull/878649
    
    ## Validation
    
    <details>
    <summary>Tests run</summary>
    
    ```sh
    cargo test -p codex-app-server-protocol
    cargo test -p codex-core attestation --lib
    cargo test -p codex-app-server --lib attestation
    ```
    
    Also ran:
    
    ```sh
    just fix -p codex-core
    just fix -p codex-app-server
    just fix -p codex-app-server-protocol
    just fmt
    just write-app-server-schema
    ```
    
    </details>
    
    <details>
    <summary>E2E DeviceCheck validation</summary>
    
    First validated the signed desktop app boundary directly: launched a
    packaged signed `Codex.app`, sent `attestation/generate`, decoded the
    returned `v1.` attestation header, and validated the extracted
    DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using
    bundle ID `com.openai.codex`. Apple returned `status_code: 200` and
    `is_ok: true`.
    
    Then ran the fuller app + app-server flow. The packaged `Codex.app`
    launched a current-branch app-server via `CODEX_CLI_PATH`, and a local
    MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server
    requested `attestation/generate` from the real Electron app process, and
    the intercepted `/backend-api/codex/responses` traffic included
    `x-oai-attestation` on both routes:
    
    ```text
    GET  /backend-api/codex/responses  Upgrade: websocket  x-oai-attestation: present
    POST /backend-api/codex/responses  Upgrade: none       x-oai-attestation: present
    ```
    
    The captured header decoded to a DeviceCheck token that also validated
    with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`,
    team `2DC432GLL2`).
    
    </details>
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex-analytics] plumb protocol-native review timing (#21434)
    ## Why
    
    We want terminal tool review analytics, but the reducer should not stamp
    review timing from its own wall clock.
    
    This PR plumbs review timing through the real protocol and app-server
    seams so downstream analytics can consume the emitter's timestamps
    directly. Guardian reviews keep their enriched `started_at` /
    `completed_at` analytics fields by deriving those legacy second-based
    values from the same protocol-native millisecond lifecycle timestamps,
    rather than sampling a separate analytics clock.
    
    ## What changed
    
    - add `started_at_ms` to user approval request payloads
    - add `started_at_ms` / `completed_at_ms` to guardian review
    notifications
    - preserve Guardian review `started_at` / `completed_at` enrichment from
    the protocol-native timing source
    - stamp typed `ServerResponse` analytics facts with app-server-observed
    `completed_at_ms`
    - thread the new timing fields through core, protocol, app-server, TUI,
    and analytics fixtures
    
    ## Verification
    
    - `cargo test -p codex-app-server outgoing_message --manifest-path
    codex-rs/Cargo.toml`
    - `cargo test -p codex-app-server-protocol guardian --manifest-path
    codex-rs/Cargo.toml`
    - `cargo test -p codex-tui guardian --manifest-path codex-rs/Cargo.toml`
    - `cargo test -p codex-analytics analytics_client_tests --manifest-path
    codex-rs/Cargo.toml`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21434).
    * #18748
    * __->__ #21434
    * #18747
    * #17090
    * #17089
    * #20514
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` has entails both running standard Rust tests and doctests.
    It turns out that the doctest discovery is fairly slow, and it's a cost
    you pay even for crates that don't include any doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    >4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, with >2x speedup, before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • Update Codex login success page UX (#20136)
    ## Summary
    
    update the local login success page to match the Codex desktop auth UX
    use theme-aware colors and an inline 20px Codex mark
    keep the actual localhost success page aligned with the browser auth UX
    PR
    
    ## Tests
    
    <img width="1728" height="1117" alt="Screenshot 2026-04-29 at 12 00
    34 PM"
    src="https://github.com/user-attachments/assets/76a40c3f-07c3-452c-97da-e7c43717cd2c"
    />
  • permissions: remove legacy read-only access modes (#19449)
    ## Why
    
    `ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
    `FullAccess` meant the historical read-only/workspace-write modes could
    read the full filesystem, while `Restricted` tried to carry partial
    readable roots. The partial-read model now belongs in
    `FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
    `SandboxPolicy` makes every legacy projection reintroduce lossy
    read-root bookkeeping and creates unnecessary noise in the rest of the
    permissions migration.
    
    This PR makes the legacy policy model narrower and explicit:
    `SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
    the old full-read sandbox modes only. Split readable roots, deny-read
    globs, and platform-default/minimal read behavior stay in the runtime
    permissions model.
    
    ## What changed
    
    - Removes `ReadOnlyAccess` from
    `codex_protocol::protocol::SandboxPolicy`, including the generated
    `access` and `readOnlyAccess` API fields.
    - Updates legacy policy/profile conversions so restricted filesystem
    reads are represented only by `FileSystemSandboxPolicy` /
    `PermissionProfile` entries.
    - Keeps app-server v2 compatible with legacy `fullAccess` read-access
    payloads by accepting and ignoring that no-op shape, while rejecting
    legacy `restricted` read-access payloads instead of silently widening
    them to full-read legacy policies.
    - Carries Windows sandbox platform-default read behavior with an
    explicit override flag instead of depending on
    `ReadOnlyAccess::Restricted`.
    - Refreshes generated app-server schema/types and updates tests/docs for
    the simplified legacy policy shape.
    
    ## Verification
    
    - `cargo check -p codex-app-server-protocol --tests`
    - `cargo check -p codex-windows-sandbox --tests`
    - `cargo test -p codex-app-server-protocol sandbox_policy_`
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
    * #19395
    * #19394
    * #19393
    * #19392
    * #19391
    * __->__ #19449
  • Support multiple cwd filters for thread list (#18502)
    ## Summary
    
    - Teach app-server `thread/list` to accept either a single `cwd` or an
    array of cwd filters, returning threads whose recorded session cwd
    matches any requested path
    - Add `useStateDbOnly` as an explicit opt-in fast path for callers that
    want to answer `thread/list` from SQLite without scanning JSONL rollout
    files
    - Preserve backwards compatibility: by default, `thread/list` still
    scans JSONL rollouts and repairs SQLite state
    - Wire the new cwd array and SQLite-only options through app-server,
    local/remote thread-store, rollout listing, generated TypeScript/schema
    fixtures, proto output, and docs
    
    ## Test Plan
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-rollout`
    - `cargo test -p codex-thread-store`
    - `cargo test -p codex-app-server thread_list`
    - `just fmt`
    - `just fix -p codex-app-server-protocol -p codex-rollout -p
    codex-thread-store -p codex-app-server`
    - `cargo build -p codex-cli --bin codex`
  • Add sorting/backwardsCursor to thread/list and new thread/turns/list api (#17305)
    To improve performance of UI loads from the app, add two main
    improvements:
    1. The `thread/list` api now gets a `sortDirection` request field and a
    `backwardsCursor` to the response, which lets you paginate forwards and
    backwards from a window. This lets you fetch the first few items to
    display immediately while you paginate to fill in history, then can
    paginate "backwards" on future loads to catch up with any changes since
    the last UI load without a full reload of the entire data set.
    2. Added a new `thread/turns/list` api which also has sortDirection and
    backwardsCursor for the same behavior as `thread/list`, allowing you the
    same small-fetch for immediate display followed by background fill-in
    and resync catchup.
  • Add ChatGPT device-code login to app server (#15525)
    ## Problem
    
    App-server clients could only initiate ChatGPT login through the browser
    callback flow, even though the shared login crate already supports
    device-code auth. That left VS Code, Codex App, and other app-server
    clients without a first-class way to use the existing device-code
    backend when browser redirects are brittle or when the client UX wants
    to own the login ceremony.
    
    ## Mental model
    
    This change adds a second ChatGPT login start path to app-server:
    clients can now call `account/login/start` with `type:
    "chatgptDeviceCode"`. App-server immediately returns a `loginId` plus
    the device-code UX payload (`verificationUrl` and `userCode`), then
    completes the login asynchronously in the background using the existing
    `codex_login` polling flow. Successful device-code login still resolves
    to ordinary `chatgpt` auth, and completion continues to flow through the
    existing `account/login/completed` and `account/updated` notifications.
    
    ## Non-goals
    
    This does not introduce a new auth mode, a new account shape, or a
    device-code eligibility discovery API. It also does not add automatic
    fallback to browser login in core; clients remain responsible for
    choosing when to request device code and whether to retry with a
    different UX if the backend/admin policy rejects it.
    
    ## Tradeoffs
    
    We intentionally keep `login_chatgpt_common` as a local validation
    helper instead of turning it into a capability probe. Device-code
    eligibility is checked by actually calling `request_device_code`, which
    means policy-disabled cases surface as an immediate request error rather
    than an async completion event. We also keep the active-login state
    machine minimal: browser and device-code logins share the same public
    cancel contract, but device-code cancellation is implemented with a
    local cancel token rather than a larger cross-crate refactor.
    
    ## Architecture
    
    The protocol grows a new `chatgptDeviceCode` request/response variant in
    app-server v2. On the server side, the new handler reuses the existing
    ChatGPT login precondition checks, calls `request_device_code`, returns
    the device-code payload, and then spawns a background task that waits on
    either cancellation or `complete_device_code_login`. On success, it
    reuses the existing auth reload and cloud-requirements refresh path
    before emitting `account/login/completed` success and `account/updated`.
    On failure or cancellation, it emits only `account/login/completed`
    failure. The existing `account/login/cancel { loginId }` contract
    remains unchanged and now works for both browser and device-code
    attempts.
    
    
    ## Tests
    
    Added protocol serialization coverage for the new request/response
    variant, plus app-server tests for device-code success, failure, cancel,
    and start-time rejection behavior. Existing browser ChatGPT login
    coverage remains in place to show that the callback-based flow is
    unchanged.
  • chore: remove skill metadata from command approval payloads (#15906)
    ## Why
    
    This is effectively a follow-up to
    [#15812](https://github.com/openai/codex/pull/15812). That change
    removed the special skill-script exec path, but `skill_metadata` was
    still being threaded through command-approval payloads even though the
    approval flow no longer uses it to render prompts or resolve decisions.
    
    Keeping it around added extra protocol, schema, and client surface area
    without changing behavior.
    
    Removing it keeps the command-approval contract smaller and avoids
    carrying a dead field through app-server, TUI, and MCP boundaries.
    
    ## What changed
    
    - removed `ExecApprovalRequestSkillMetadata` and the corresponding
    `skillMetadata` field from core approval events and the v2 app-server
    protocol
    - removed the generated JSON and TypeScript schema output for that field
    - updated app-server, MCP server, TUI, and TUI app-server approval
    plumbing to stop forwarding the field
    - cleaned up tests that previously constructed or asserted
    `skillMetadata`
    
    ## Testing
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-app-server-test-client`
    - `cargo test -p codex-mcp-server`
    - `just argument-comment-lint`
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • chore(app-server): stop emitting codex/event/ notifications (#14392)
    ## Description
    
    This PR stops emitting legacy `codex/event/*` notifications from the
    public app-server transports.
    
    It's been a long time coming! app-server was still producing a raw
    notification stream from core, alongside the typed app-server
    notifications and server requests, for compatibility reasons. Now,
    external clients should no longer be depending on those legacy
    notifications, so this change removes them from the stdio and websocket
    contract and updates the surrounding docs, examples, and tests to match.
    
    ### Caveat
    I left the "in-process" version of app-server alone for now, since
    `codex exec` was recently based on top of app-server via this in-process
    form here: https://github.com/openai/codex/pull/14005
    
    Seems like `codex exec` still consumes some legacy notifications
    internally, so this branch only removes `codex/event/*` from app-server
    over stdio and websockets.
    
    ## Follow-up
    
    Once `codex exec` is fully migrated off `codex/event/*` notifications,
    we'll be able to stop emitting them entirely entirely instead of just
    filtering it at the external transport boundary.
  • fix(otel): make HTTP trace export survive app-server runtimes (#14300)
    ## Summary
    
    This PR fixes OTLP HTTP trace export in runtimes where the previous
    exporter setup was unreliable, especially around app-server usage. It
    also removes the old `codex_otel::otel_provider` compatibility shim and
    switches remaining call sites over to the crate-root
    `codex_otel::OtelProvider` export.
    
    ## What changed
    
    - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
    - Add an async HTTP client path for trace export when we are already
    inside a multi-thread Tokio runtime.
    - Make provider shutdown flush traces before tearing down the tracer
    provider.
    - Add loopback coverage that verifies traces are actually sent to
    `/v1/traces`:
      - outside Tokio
      - inside a multi-thread Tokio runtime
      - inside a current-thread Tokio runtime
    - Remove the `codex_otel::otel_provider` shim and update remaining
    imports.
    
    ## Why
    
    I hit cases where spans were being created correctly but never made it
    to the collector. The issue turned out to be in exporter/runtime
    behavior rather than the span plumbing itself. This PR narrows that gap
    and gives us regression coverage for the actual export path.
  • Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296)
    ### Purpose
    While trying to build out CLI-Tools for the agent to use under skills we
    have found that those tools sometimes need to invoke a user elicitation.
    These elicitations are handled out of band of the codex app-server but
    need to indicate to the exec manager that the command running is not
    going to progress on the usual timeout horizon.
    
    ### Example
    Model calls universal exec:
    `$ download-credit-card-history --start-date 2026-01-19 --end-date
    2026-02-19 > credit_history.jsonl`
    
    download-cred-card-history might hit a hosted/preauthenticated service
    to fetch data. That service might decide that the request requires an
    end user approval the access to the personal data. It should be able to
    signal to the running thread that the command in question is blocked on
    user elicitation. In that case we want the exec to continue, but the
    timeout to not expire on the tool call, essentially freezing time until
    the user approves or rejects the command at which point the tool would
    signal the app-server to decrement the outstanding elicitation count.
    Now timeouts would proceed as normal.
    
    ### What's Added
    
    - New v2 RPC methods:
        - thread/increment_elicitation
        - thread/decrement_elicitation
    - Protocol updates in:
        - codex-rs/app-server-protocol/src/protocol/common.rs
        - codex-rs/app-server-protocol/src/protocol/v2.rs
    - App-server handlers wired in:
        - codex-rs/app-server/src/codex_message_processor.rs
    
    ### Behavior
    
    - Counter starts at 0 per thread.
    - increment atomically increases the counter.
    - decrement atomically decreases the counter; decrement at 0 returns
    invalid request.
    - Transition rules:
    - 0 -> 1: broadcast pause state, pausing all active stopwatches
    immediately.
        - \>0 -> >0: remain paused.
        - 1 -> 0: broadcast unpause state, resuming stopwatches.
    - Core thread/session logic:
        - codex-rs/core/src/codex_thread.rs
        - codex-rs/core/src/codex.rs
        - codex-rs/core/src/mcp_connection_manager.rs
    
    ### Exec-server stopwatch integration
    
    - Added centralized stopwatch tracking/controller:
        - codex-rs/exec-server/src/posix/stopwatch_controller.rs
    - Hooked pause/unpause broadcast handling + stopwatch registration:
        - codex-rs/exec-server/src/posix/mcp.rs
        - codex-rs/exec-server/src/posix/stopwatch.rs
        - codex-rs/exec-server/src/posix.rs
  • app-server: include experimental skill metadata in exec approval requests (#13929)
    ## Summary
    
    This change surfaces skill metadata on command approval requests so
    app-server clients can tell when an approval came from a skill script
    and identify the originating `SKILL.md`.
    
    - add `skill_metadata` to exec approval events in the shared protocol
    - thread skill metadata through core shell escalation and delegated
    approval handling for skill-triggered approvals
    - expose the field in app-server v2 as experimental `skillMetadata`
    - regenerate the JSON/TypeScript schemas and cover the new field in
    protocol, transport, core, and TUI tests
    
    ## Why
    
    Skill-triggered approvals already carry skill context inside core, but
    app-server clients could not see which skill caused the prompt. Sending
    the skill metadata with the approval request makes it possible for
    clients to present better approval UX and connect the prompt back to the
    relevant skill definition.
    
    
    ## example event in app-server-v2
    verified that we see this event when experimental api is on:
    ```
    < {
    <   "id": 11,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "additionalPermissions": {
    <       "fileSystem": null,
    <       "macos": {
    <         "accessibility": false,
    <         "automations": {
    <           "bundle_ids": [
    <             "com.apple.Notes"
    <           ]
    <         },
    <         "calendar": false,
    <         "preferences": "read_only"
    <       },
    <       "network": null
    <     },
    <     "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563",
    <     "availableDecisions": [
    <       "accept",
    <       "acceptForSession",
    <       "cancel"
    <     ],
    <     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <     "commandActions": [
    <       {
    <         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes",
    <     "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy",
    <     "skillMetadata": {
    <       "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md"
    <     },
    <     "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69",
    <     "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f"
    <   }
    < }`
    ```
    
    & verified that this is the event when experimental api is off:
    ```
    < {
    <   "id": 13,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0",
    <     "availableDecisions": [
    <       "accept",
    <       "acceptForSession",
    <       "cancel"
    <     ],
    <     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <     "commandActions": [
    <       {
    <         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Users/celia/code/codex/codex-rs",
    <     "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt",
    <     "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4",
    <     "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a"
    <   }
    < }
    ```
  • app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
    * Add an ability to stream stdin, stdout, and stderr
    * Streaming of stdout and stderr has a configurable cap for total amount
    of transmitted bytes (with an ability to disable it)
    * Add support for overriding environment variables
    * Add an ability to terminate running applications (using
    `command/exec/terminate`)
    * Add TTY/PTY support, with an ability to resize the terminal (using
    `command/exec/resize`)
  • feat(app-server-test-client): OTEL setup for tracing (#13493)
    ### Overview
    This PR:
    - Updates `app-server-test-client` to load OTEL settings from
    `$CODEX_HOME/config.toml` and initializes its own OTEL provider.
    - Add real client root spans to app-server test client traces.
    
    This updates `codex-app-server-test-client` so its Datadog traces
    reflect the full client-driven flow instead of a set of server spans
    stitched together under a synthetic parent.
    
    Before this change, the test client generated a fake `traceparent` once
    and reused it for every JSON-RPC request. That kept the requests in one
    trace, but there was no real client span at the top, so Datadog ended up
    showing the sequence in a slightly misleading way, where all RPCs were
    anchored under `initialize`.
    
    Now the test client:
    - loads OTEL settings from the normal Codex config path, including
    `$CODEX_HOME/config.toml` and existing --config overrides
    - initializes tracing the same way other Codex binaries do when trace
    export is enabled
    - creates a real client root span for each scripted command
    - creates per-request client spans for JSON-RPC methods like
    `initialize`, `thread/start`, and `turn/start`
    - injects W3C trace context from the current client span into
    request.trace instead of reusing a fabricated carrier
    
    This gives us a cleaner trace shape in Datadog:
    - one trace URL for the whole scripted flow
    - a visible client root span
    - proper client/server parent-child relationships for each app-server
    request
  • Feat: Preserve network access on read-only sandbox policies (#13409)
    ## Summary
    
    `PermissionProfile.network` could not be preserved when additional or
    compiled permissions resolved to
    `SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
    field. This change makes read-only + network
    enabled representable directly and threads that through the protocol,
    app-server v2 mirror, and permission-
      merging logic.
    
    ## What changed
    
    - Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
    protocol and app-server v2 protocol.
    - Kept backward compatibility by defaulting the new field to false, so
    legacy read-only payloads still
        deserialize unchanged.
    - Updated `has_full_network_access()` and sandbox summaries to respect
    read-only network access.
      - Preserved PermissionProfile.network when:
          - compiling skill permission profiles into sandbox policies
          - normalizing additional permissions
          - merging additional permissions into existing sandbox policies
    - Updated the approval overlay to show network in the rendered
    permission rule when requested.
      - Regenerated app-server schema fixtures for the new v2 wire shape.
  • chore(app-server): delete v1 RPC methods and notifications (#13375)
    ## Summary
    This removes the old app-server v1 methods and notifications we no
    longer need, while keeping the small set the main codex app client still
    depends on for now.
    
    The remaining legacy surface is:
    - `initialize`
    - `getConversationSummary`
    - `getAuthStatus`
    - `gitDiffToRemote`
    - `fuzzyFileSearch`
    - `fuzzyFileSearch/sessionStart`
    - `fuzzyFileSearch/sessionUpdate`
    - `fuzzyFileSearch/sessionStop`
    
    And the raw `codex/event/*` notifications emitted from core. These
    notifications will be removed in a followup PR.
    
    ## What changed
    - removed deprecated v1 request variants from the protocol and
    app-server dispatcher
    - removed deprecated typed notifications: `authStatusChange`,
    `loginChatGptComplete`, and `sessionConfigured`
    - updated the app-server test client to use v2 flows instead of deleted
    v1 flows
    - deleted legacy-only app-server test suites and added focused coverage
    for `getConversationSummary`
    - regenerated app-server schema fixtures and updated the MCP interface
    docs to match the remaining compatibility surface
    
    ## Testing
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server`
  • app-server: Add an ability to watch events in the test client (#13080)
    Add a `watch` subcommand to `codex-app-server-test-client` binary to
    help in manual testing of events flow.
  • feat: include available decisions in command approval requests (#12758)
    Command-approval clients currently infer which choices to show from
    side-channel fields like `networkApprovalContext`,
    `proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
    the request shape harder to evolve, and it forces each client to
    replicate the server's heuristics instead of receiving the exact
    decision list for the prompt.
    
    This PR introduces a mapping between `CommandExecutionApprovalDecision`
    and `codex_protocol::protocol::ReviewDecision`:
    
    ```rust
    impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
        fn from(value: CoreReviewDecision) -> Self {
            match value {
                CoreReviewDecision::Approved => Self::Accept,
                CoreReviewDecision::ApprovedExecpolicyAmendment {
                    proposed_execpolicy_amendment,
                } => Self::AcceptWithExecpolicyAmendment {
                    execpolicy_amendment: proposed_execpolicy_amendment.into(),
                },
                CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
                CoreReviewDecision::NetworkPolicyAmendment {
                    network_policy_amendment,
                } => Self::ApplyNetworkPolicyAmendment {
                    network_policy_amendment: network_policy_amendment.into(),
                },
                CoreReviewDecision::Abort => Self::Cancel,
                CoreReviewDecision::Denied => Self::Decline,
            }
        }
    }
    ```
    
    And updates `CommandExecutionRequestApprovalParams` to have a new field:
    
    ```rust
    available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
    ```
    
    when, if specified, should make it easier for clients to display an
    appropriate list of options in the UI.
    
    This makes it possible for `CoreShellActionProvider::prompt()` in
    `unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
    adding support for `ApprovedForSession` when approving a skill script,
    which was previously missing in the TUI.
    
    Note this results in a significant change to `exec_options()` in
    `approval_overlay.rs`, as the displayed options are now derived from
    `available_decisions: &[ReviewDecision]`.
    
    ## What Changed
    
    - Add `available_decisions` to
    [`ExecApprovalRequestEvent`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/protocol/src/approvals.rs#L111-L175),
    including helpers to derive the legacy default choices when older
    senders omit the field.
    - Map `codex_protocol::protocol::ReviewDecision` to app-server
    `CommandExecutionApprovalDecision` and expose the ordered list as
    experimental `availableDecisions` in
    [`CommandExecutionRequestApprovalParams`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/app-server-protocol/src/protocol/v2.rs#L3798-L3807).
    - Thread optional `available_decisions` through the core approval path
    so Unix shell escalation can explicitly request `ApprovedForSession` for
    session-scoped approvals instead of relying on client heuristics.
    [`unix_escalation.rs`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs#L194-L214)
    - Update the TUI approval overlay to build its buttons from the ordered
    decision list, while preserving the legacy fallback when
    `available_decisions` is missing.
    - Update the app-server README, test client output, and generated schema
    artifacts to document and surface the new field.
    
    ## Testing
    
    - Add `approval_overlay.rs` coverage for explicit decision lists,
    including the generic `ApprovedForSession` path and network approval
    options.
    - Update `chatwidget/tests.rs` and app-server protocol tests to populate
    the new optional field and keep older event shapes working.
    
    ## Developers Docs
    
    - If we document `item/commandExecution/requestApproval` on
    [developers.openai.com/codex](https://developers.openai.com/codex), add
    experimental `availableDecisions` as the preferred source of approval
    choices and note that older servers may omit it.
  • Revert "Add skill approval event/response (#12633)" (#12811)
    This reverts commit https://github.com/openai/codex/pull/12633. We no
    longer need this PR, because we favor sending normal exec command
    approval server request with `additional_permissions` of skill
    permissions instead
  • feat: add search term to thread list (#12578)
    Add `searchTerm` to `thread/list` that will search for a match in the
    titles (the condition being `searchTerm` $$\in$$ `title`)
  • feat(ui): add network approval persistence plumbing (#12358)
    ## Summary
    - add TUI approval options for persistent network host rules
    - add app-server v2 approval payload plumbing for network approval
    context + proposed network policy amendments
    - add app-server handling to translate `applyNetworkPolicyAmendment`
    decisions back into core review decisions
    - update docs/test client output and generated app-server schemas/types
  • feat: add experimental additionalPermissions to v2 command execution approval requests (#12737)
    This adds additionalPermissions to the app-server v2
    item/commandExecution/requestApproval payload as an experimental field.
    
    The field is now exposed on CommandExecutionRequestApprovalParams and is
    populated from the existing core approval event when a command requests
    additional sandbox permissions.
    
    This PR also contains changes to make server requests to support
    experiment API.
    
    A real app server test client test:
    
    sample payload with experimental flag off:
    ```
     {
    <   "id": 0,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "command": "/bin/zsh -lc 'mkdir -p ~/some/test && touch ~/some/test/file'",
    <     "commandActions": [
    <       {
    <         "command": "mkdir -p '~/some/test'",
    <         "type": "unknown"
    <       },
    <       {
    <         "command": "touch '~/some/test/file'",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Users/celia/code/codex/codex-rs",
    <     "itemId": "call_QLp0LWkQ1XkU6VW9T2vUZFWB",
    <     "proposedExecpolicyAmendment": [
    <       "mkdir",
    <       "-p",
    <       "~/some/test"
    <     ],
    <     "reason": "Do you want to allow creating ~/some/test/file outside the workspace?",
    <     "threadId": "019c9309-e209-7d82-a01b-dcf9556a354d",
    <     "turnId": "019c9309-e27a-7f33-834f-6011e795c2d6"
    <   }
    < }
    ```
    with experimental flag on: 
    ```
    < {
    <   "id": 0,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "additionalPermissions": {
    <       "fileSystem": null,
    <       "macos": null,
    <       "network": true
    <     },
    <     "command": "/bin/zsh -lc 'install -D /dev/null ~/some/test/file'",
    <     "commandActions": [
    <       {
    <         "command": "install -D /dev/null '~/some/test/file'",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Users/celia/code/codex/codex-rs",
    <     "itemId": "call_K3U4b3dRbj3eMCqslmncbGsq",
    <     "proposedExecpolicyAmendment": [
    <       "install",
    <       "-D"
    <     ],
    <     "reason": "Do you want to allow creating the file at ~/some/test/file outside the workspace sandbox?",
    <     "threadId": "019c9303-3a8e-76e1-81bf-d67ac446d892",
    <     "turnId": "019c9303-3af1-7143-88a1-73132f771234"
    <   }
    < }
    ```
  • Add skill approval event/response (#12633)
    Set the stage for skill-level permission approval in addition to
    command-level.
    
    Behind a feature flag.
  • Refactor network approvals to host/protocol/port scope (#12140)
    ## Summary
    Simplify network approvals by removing per-attempt proxy correlation and
    moving to session-level approval dedupe keyed by (host, protocol, port).
    Instead of encoding attempt IDs into proxy credentials/URLs, we now
    treat approvals as a destination policy decision.
    
    - Concurrent calls to the same destination share one approval prompt.
    - Different destinations (or same host on different ports) get separate
    prompts.
    - Allow once approves the current queued request group only.
    - Allow for session caches that (host, protocol, port) and auto-allows
    future matching requests.
    - Never policy continues to deny without prompting.
    
    Example:
    - 3 calls: 
      - a.com (line 443)
      - b.com (line 443)
      - a.com (line 443)
    => 2 prompts total (a, b), second a waits on the first decision.
    - a.com:80 is treated separately from a.com line 443
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core tools::network_approval::tests`
    - `cargo test -p codex-core` (unit tests pass; existing
    integration-suite failures remain in this environment)
  • feat(core): zsh exec bridge (#12052)
    zsh fork PR stack:
    - https://github.com/openai/codex/pull/12051 
    - https://github.com/openai/codex/pull/12052 👈 
    
    ### Summary
    This PR introduces a feature-gated native shell runtime path that routes
    shell execution through a patched zsh exec bridge, removing MCP-specific
    behavior from the shell hot path while preserving existing
    CommandExecution lifecycle semantics.
    
    When shell_zsh_fork is enabled, shell commands run via patched zsh with
    per-`execve` interception through EXEC_WRAPPER. Core receives wrapper
    IPC requests over a Unix socket, applies existing approval policy, and
    returns allow/deny before the subcommand executes.
    
    ### What’s included
    **1) New zsh exec bridge runtime in core**
    - Wrapper-mode entrypoint (maybe_run_zsh_exec_wrapper_mode) for
    EXEC_WRAPPER invocations.
    - Per-execution Unix-socket IPC handling for wrapper requests/responses.
    - Approval callback integration using existing core approval
    orchestration.
    - Streaming stdout/stderr deltas to existing command output event
    pipeline.
    - Error handling for malformed IPC, denial/abort, and execution
    failures.
    
    **2) Session lifecycle integration**
    SessionServices now owns a `ZshExecBridge`.
    Session startup initializes bridge state; shutdown tears it down
    cleanly.
    
    **3) Shell runtime routing (feature-gated)**
    When `shell_zsh_fork` is enabled:
    - Build execution env/spec as usual.
    - Add wrapper socket env wiring.
    - Execute via `zsh_exec_bridge.execute_shell_request(...)` instead of
    the regular shell path.
    - Non-zsh-fork behavior remains unchanged.
    
    **4) Config + feature wiring**
    - Added `Feature::ShellZshFork` (under development).
    - Added config support for `zsh_path` (optional absolute path to patched
    zsh):
    - `Config`, `ConfigToml`, `ConfigProfile`, overrides, and schema.
    - Session startup validates that `zsh_path` exists/usable when zsh-fork
    is enabled.
    - Added startup test for missing `zsh_path` failure mode.
    
    **5) Seatbelt/sandbox updates for wrapper IPC**
    - Extended seatbelt policy generation to optionally allow outbound
    connection to explicitly permitted Unix sockets.
    - Wired sandboxing path to pass wrapper socket path through to seatbelt
    policy generation.
    - Added/updated seatbelt tests for explicit socket allow rule and
    argument emission.
    
    **6) Runtime entrypoint hooks**
    - This allows the same binary to act as the zsh wrapper subprocess when
    invoked via `EXEC_WRAPPER`.
    
    **7) Tool selection behavior**
    - ToolsConfig now prefers ShellCommand type when shell_zsh_fork is
    enabled.
    - Added test coverage for precedence with unified-exec enabled.
  • feat(core): plumb distinct approval ids for command approvals (#12051)
    zsh fork PR stack:
    - https://github.com/openai/codex/pull/12051 👈 
    - https://github.com/openai/codex/pull/12052
    
    With upcoming support for a fork of zsh that allows us to intercept
    `execve` and run execpolicy checks for each subcommand as part of a
    `CommandExecution`, it will be possible for there to be multiple
    approval requests for a shell command like `/path/to/zsh -lc 'git status
    && rg \"TODO\" src && make test'`.
    
    To support that, this PR introduces a new `approval_id` field across
    core, protocol, and app-server so that we can associate approvals
    properly for subcommands.
  • codex-rs: fix thread resume rejoin semantics (#11756)
    ## Summary
    - always rejoin an in-memory running thread on `thread/resume`, even
    when overrides are present
    - reject `thread/resume` when `history` is provided for a running thread
    - reject `thread/resume` when `path` mismatches the running thread
    rollout path
    - warn (but do not fail) on override mismatches for running threads
    - add more `thread_resume` integration tests and fixes; including
    restart-based resume-with-overrides coverage
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-app-server --test all thread_resume`
    - manual test with app-server-test-client
    https://github.com/openai/codex/pull/11755
    - manual test both stdio and websocket in app
  • app-server-test-client websocket client and thread tools (#11755)
    - add websocket endpoint mode with default ws://127.0.0.1:4222 while
    keeping stdio codex-bin path compatibility
    - add thread-resume (follow stream) and thread-list commands for manual
    thread lifecycle testing
    - quickstart docs
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • fix: remove errant Cargo.lock files (#11526)
    These leaked into the repo:
    
    - #4905 `codex-rs/windows-sandbox-rs/Cargo.lock`
    - #5391 `codex-rs/app-server-test-client/Cargo.lock`
    
    Note that these affect cache keys such as:
    
    
    https://github.com/openai/codex/blob/9722567a80b4bac81b74f679828747031fd95fa0/.github/workflows/rust-release.yml#L154
    
    so it seems best to remove them.
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn id is constructed in-memory;
    2. on resuming threads, turn_id might not be unique;
    3. client cannot no the boundary of a turn from rollout files easily.
    
    This PR does three things:
    1. persist `task_started` and `task_complete` events;
    1. persist `turn_id` in rollout turn events;
    5. generate turn_id as unique uuids instead of incrementing it in
    memory.
    
    This helps us resolve the issue of clients wanting to have unique turn
    ids for resuming a thread, and knowing the boundry of each turn in
    rollout files.
    
    example debug logs
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    backward compatibility:
    if you try to resume an old session without task_started and
    task_complete event populated, the following happens:
    - If you resume and do nothing: those reconstructed historical IDs can
    differ next time you resume.
    - If you resume and send a new turn: the new turn gets a fresh UUID from
    live submission flow and is persisted, so that new turn’s ID is stable
    on later resumes.
    I think this behavior is fine, because we only care about deterministic
    turn id once a turn is triggered.
  • feat: opt-out of events in the app-server (#11319)
    Add `optOutNotificationMethods` in the app-server to opt-out events
    based on exact method matching
  • chore: add codex debug app-server tooling (#10367)
    codex debug app-server <user message> forwards the message through
    codex-app-server-test-client’s send_message_v2 library entry point,
    using std::env::current_exe() to resolve the codex binary.
    
    for how it looks like, see:
    
    ```
    celia@com-92114 codex-rs % cargo build -p codex-cli && target/debug/codex debug app-server --help                       
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.34s
    Tooling: helps debug the app server
    
    Usage: codex debug app-server [OPTIONS] <COMMAND>
    
    Commands:
      send-message-v2  
      help             Print this message or the help of the given subcommand(s)
    ````
    and
    ```
    celia@com-92114 codex-rs % cargo build -p codex-cli && target/debug/codex debug app-server send-message-v2 "hello world"
       Compiling codex-cli v0.0.0 (/Users/celia/code/codex/codex-rs/cli)
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.38s
    > {
    >   "method": "initialize",
    >   "id": "f8ba9f60-3a49-4ea9-81d6-4ab6853e3954",
    >   "params": {
    >     "clientInfo": {
    >       "name": "codex-toy-app-server",
    >       "title": "Codex Toy App Server",
    >       "version": "0.0.0"
    >     },
    >     "capabilities": {
    >       "experimentalApi": true
    >     }
    >   }
    > }
    < {
    <   "id": "f8ba9f60-3a49-4ea9-81d6-4ab6853e3954",
    <   "result": {
    <     "userAgent": "codex-toy-app-server/0.0.0 (Mac OS 26.2.0; arm64) vscode/2.4.27 (codex-toy-app-server; 0.0.0)"
    <   }
    < }
    < initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.2.0; arm64) vscode/2.4.27 (codex-toy-app-server; 0.0.0)" }
    > {
    >   "method": "thread/start",
    >   "id": "203f1630-beee-4e60-b17b-9eff16b1638b",
    >   "params": {
    >     "model": null,
    >     "modelProvider": null,
    >     "cwd": null,
    >     "approvalPolicy": null,
    >     "sandbox": null,
    >     "config": null,
    >     "baseInstructions": null,
    >     "developerInstructions": null,
    >     "personality": null,
    >     "ephemeral": null,
    >     "dynamicTools": null,
    >     "mockExperimentalField": null,
    >     "experimentalRawEvents": false
    >   }
    > }
    ...
    ```
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • [feat] persist thread_dynamic_tools in db (#10252)
    Persist thread_dynamic_tools in sqlite and read first from it. Fall back
    to rollout files if it's not found. Persist dynamic tools to both sqlite
    and rollout files.
    
    Saw that new sessions get populated to db correctly & old sessions get
    backfilled correctly at startup:
    ```
    celia@com-92114 codex-rs % sqlite3 ~/.codex/state.sqlite \      "select thread_id, position,name,description,input_schema from thread_dynamic_tools;"
    019c0cad-ec0d-74b2-a787-e8b33a349117|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
    ....
    019c10ca-aa4b-7620-ae40-c0919fbd7ea7|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
    ```