Commit Graph

7552 Commits

  • code-mode: move cell state into library actor (#28599)
    A code-mode cell is a single JavaScript execution that can produce
    output, call tools, wait for asynchronous work, resume, or be
    terminated. This PR extracts the existing per-cell run loop into a
    dedicated actor that owns the cell’s lifecycle state. It is primarily an
    ownership change rather than a new lifecycle contract: existing behavior
    now has one clear implementation boundary.
    
    ### Architecture
    The session service remains responsible for session-wide concerns:
    allocating cell IDs, storing shared values, creating cells, and routing
    requests to them.
    
    Once a cell is created, its execution state belongs to its actor.
    Callers interact with the actor through a handle. The actor receives two
    kinds of input: runtime events and control requests.
    
    A single event loop serializes these inputs and applies the lifecycle
    rules. It tracks the current observer—the caller waiting for an
    update—along with accumulated output, outstanding callbacks, runtime
    state, yield deadlines, and termination progress. Observation,
    termination, completion, and cleanup therefore have one consistent
    owner.
    
    When the runtime has no immediately runnable work and is waiting only on
    timers or tool results, the actor can return accumulated output and
    information about outstanding tool calls while keeping the cell
    available to resume. On completion or termination, it performs the
    appropriate callback cleanup before publishing the final result and
    removing the cell from the session.
    
    A small host interface connects the actor to session-owned facilities
    such as tool dispatch, notifications, stored values, and final cell
    removal, keeping those responsibilities outside the actor itself.
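    
    The ownership model above can be pictured with a minimal sketch. It
    assumes a plain `std::sync::mpsc` channel and a thread per cell; the
    `Input`, `CellHandle`, and `spawn_cell` names are hypothetical and are
    not taken from the PR, which may well use an async runtime instead.
    
    ```rust
    use std::sync::mpsc::{channel, Sender};
    use std::thread;
    
    // Hypothetical inputs: one runtime event and two control requests.
    enum Input {
        Output(String),            // runtime event: the cell produced output
        Observe(Sender<String>),   // control: return accumulated output
        Terminate(Sender<String>), // control: stop and return what is left
    }
    
    // Callers only ever hold this handle, never the state itself.
    struct CellHandle {
        tx: Sender<Input>,
    }
    
    fn spawn_cell() -> CellHandle {
        let (tx, rx) = channel::<Input>();
        thread::spawn(move || {
            // State owned by the actor thread alone (single writer).
            let mut output = String::new();
            // One loop serializes runtime events and control requests.
            for input in rx {
                match input {
                    Input::Output(chunk) => output.push_str(&chunk),
                    Input::Observe(reply) => {
                        // Hand over accumulated output; the cell stays alive.
                        let _ = reply.send(std::mem::take(&mut output));
                    }
                    Input::Terminate(reply) => {
                        let _ = reply.send(std::mem::take(&mut output));
                        break; // callback cleanup and removal would go here
                    }
                }
            }
        });
        CellHandle { tx }
    }
    
    fn main() {
        let cell = spawn_cell();
        cell.tx.send(Input::Output("hello ".into())).unwrap();
        cell.tx.send(Input::Output("world".into())).unwrap();
    
        let (reply_tx, reply_rx) = channel();
        cell.tx.send(Input::Observe(reply_tx)).unwrap();
        println!("observed: {}", reply_rx.recv().unwrap());
    
        let (done_tx, done_rx) = channel();
        cell.tx.send(Input::Terminate(done_tx)).unwrap();
        println!("final: {:?}", done_rx.recv().unwrap());
    }
    ```
    
    Because every input goes through one queue, observation and
    termination cannot race on the accumulated output in this sketch.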
    
    ### Why
    Previously, cell lifecycle state and coordination lived alongside
    session management. The actor boundary makes each cell a self-contained
    state machine with a single writer, while the service becomes a registry
    and adapter around it.
    
    This makes lifecycle behavior easier to reason about and test in
    isolation. It also establishes a clean boundary for later changes to
    where cells run or how they communicate, without recreating their
    lifecycle rules.
  • [codex] Support object-valued plugin MCP manifests (#28580)
    ## Summary
    This fixes plugin manifest parsing for MCP servers declared as an object
    directly in `plugin.json`.
    
    Before this change, Codex modeled `mcpServers` as only a string path,
    for example:
    
    ```json
    {
      "name": "counter-sample",
      "version": "1.1.1",
      "mcpServers": "./.mcp.json"
    }
    ```
    
    Some migrated plugins instead provide the server map directly in the
    manifest:
    
    ```json
    {
      "name": "counter-sample",
      "version": "1.1.1",
      "description": "Plugin that declares MCP servers in the manifest",
      "mcpServers": {
        "counter": {
          "type": "http",
          "url": "https://sample.example/counter/mcp"
        }
      }
    }
    ```
    
    That object form previously failed during install/load with an error
    like:
    
    ```text
    failed to parse plugin manifest: invalid type: map, expected a string
    ```
    
    ## What changed
    - Add a manifest representation for `mcpServers` as either
    `Path(Resource)` or `Object(map)`.
    - Parse `plugin.json` `mcpServers` as either a string path or an object.
    - Route object-valued MCP server maps through the existing plugin MCP
    config parser instead of adding a second parser.
    - Apply existing per-plugin MCP server policy to object-valued MCP
    servers the same way as file-backed MCP servers.
    - Include object-valued MCP server names in plugin telemetry/capability
    metadata.
    - Support object-valued MCP config for executor plugins without
    requiring a `.mcp.json` filesystem read.
    - Update the bundled plugin-creator validator and `plugin-json-spec.md`
    so generated-plugin validation accepts the same object-valued shape.
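    
    As a rough illustration of the two-shape representation, the sketch
    below models the value as an enum and shows how a caller might decide
    whether a filesystem read is needed. The `ServerConfig`, `Source`,
    `source_of`, and `inline_server_names` names are hypothetical, and
    JSON parsing is left out to keep the example dependency-free.
    
    ```rust
    use std::collections::BTreeMap;
    
    // Hypothetical stand-in for one parsed MCP server entry.
    #[derive(Debug, Clone, PartialEq)]
    struct ServerConfig {
        kind: String,
        url: String,
    }
    
    // Either a path to a companion file or an inline server map.
    #[derive(Debug, Clone, PartialEq)]
    enum McpServers {
        Path(String),
        Object(BTreeMap<String, ServerConfig>),
    }
    
    // Where the server map has to be read from.
    #[derive(Debug, PartialEq)]
    enum Source<'a> {
        File(&'a str),
        Inline(&'a BTreeMap<String, ServerConfig>),
    }
    
    fn source_of(servers: &McpServers) -> Source<'_> {
        match servers {
            McpServers::Path(path) => Source::File(path.as_str()),
            McpServers::Object(map) => Source::Inline(map),
        }
    }
    
    // Names are known without filesystem access only for the inline form.
    fn inline_server_names(servers: &McpServers) -> Option<Vec<&str>> {
        match servers {
            McpServers::Path(_) => None,
            McpServers::Object(map) => Some(map.keys().map(String::as_str).collect()),
        }
    }
    
    fn main() {
        let mut map = BTreeMap::new();
        map.insert(
            "counter".to_string(),
            ServerConfig {
                kind: "http".to_string(),
                url: "https://sample.example/counter/mcp".to_string(),
            },
        );
        let inline = McpServers::Object(map);
        let by_path = McpServers::Path("./.mcp.json".to_string());
    
        println!("{:?}", inline_server_names(&inline));
        println!("{:?}", source_of(&by_path));
    }
    ```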
    
    ## Compatibility
    Existing plugin manifests that use `"mcpServers": "./.mcp.json"`
    continue to work. Plugins can now also use the object shape shown above.
    
    ## Tests
    Added coverage for the new manifest attribute shape at the install,
    normal load, telemetry, and executor-provider layers:
    
    - `install_accepts_manifest_mcp_server_objects`
    - `load_plugins_loads_manifest_mcp_server_objects`
    - `plugin_telemetry_metadata_uses_manifest_mcp_server_objects`
    - `reads_manifest_object_config_without_executor_file_system_access`
    
    Also smoke-tested the plugin-creator validator against both supported
    forms:
    
    - `mcpServers` as a direct object in `plugin.json`
    - `mcpServers` as `"./.mcp.json"` with a companion `.mcp.json`
    
    ## Validation
    - `just test -p codex-plugin`
    - `just test -p codex-core-plugins`
    - `just test -p codex-mcp-extension`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just fmt`
    - `git diff --check`
    - Focused rename/object-form rerun: `just test -p codex-core-plugins
    manager::tests::load_plugins_loads_manifest_mcp_server_objects
    manager::tests::plugin_telemetry_metadata_uses_manifest_mcp_server_objects
    store::tests::install_accepts_manifest_mcp_server_objects`
    - Focused executor rerun: `just test -p codex-mcp-extension
    executor_plugin::provider::tests::reads_manifest_object_config_without_executor_file_system_access`
    - `python3
    codex-rs/skills/src/assets/samples/plugin-creator/scripts/validate_plugin.py
    /private/tmp/codex-validator-object`
    - `python3
    codex-rs/skills/src/assets/samples/plugin-creator/scripts/validate_plugin.py
    /private/tmp/codex-validator-path`
  • thread-store: fix response fixture compilation (#28642)
    ## Why
    
    A `codex-thread-store` test fixture still constructs
    `ResponseItem::FunctionCallOutput` without its required `metadata`
    field, preventing the crate's test targets from compiling on `main`.
    
    ## What changed
    
    - Set the fixture's response-item metadata to `None`.
    
    ## Testing
    
    - `cargo check -p codex-thread-store --tests`
  • [codex] core: restore absolute turn context cwd (#28629)
    ## Why
    
    #28152 jumped the gun on moving the rollout format to store URIs, and
    would likely break compat with some features that don't go through the
    same types as the core logic.
    
    ## What
    
    Make `TurnContextItem.cwd` an `AbsolutePathBuf` again and remove the
    test added for `PathUri` serialization in rollouts. Also drop a bunch
    of error paths that are no longer needed.
  • [codex] Gate remote plugin catalog by auth (#28625)
    ## Summary
    
    - Treat the remote global plugin catalog as active only when
    `remote_plugin` is enabled and the current auth uses the Codex backend.
    - Skip the local OpenAI curated marketplace for remote-enabled ChatGPT
    users while preserving configured marketplaces.
    - Keep the local curated marketplace for API-key users, unauthenticated
    fallback, and ChatGPT users with `remote_plugin` disabled.
    - Apply the same effective-remote gate to the remote
    installed-marketplace cache.
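    
    The gate described above reduces to a small truth table. The sketch
    below is one way to write it down; the `Auth` enum and both function
    names are hypothetical, and it assumes that "uses the Codex backend"
    corresponds to ChatGPT auth.
    
    ```rust
    // Hypothetical auth modes, for illustration only.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum Auth {
        ChatGpt,
        ApiKey,
        Unauthenticated,
    }
    
    // Remote catalog is active only with the feature on AND backend auth.
    fn remote_catalog_active(remote_plugin_enabled: bool, auth: Auth) -> bool {
        remote_plugin_enabled && auth == Auth::ChatGpt
    }
    
    // The local curated marketplace is skipped exactly when remote is active.
    fn include_local_curated(remote_plugin_enabled: bool, auth: Auth) -> bool {
        !remote_catalog_active(remote_plugin_enabled, auth)
    }
    
    fn main() {
        for auth in [Auth::ChatGpt, Auth::ApiKey, Auth::Unauthenticated] {
            for enabled in [false, true] {
                println!(
                    "{:?} remote_plugin={} local_curated={}",
                    auth,
                    enabled,
                    include_local_curated(enabled, auth)
                );
            }
        }
    }
    ```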
    
    ## Root cause
    
    The tool-suggestion discovery path unconditionally included the local
    OpenAI curated marketplace. For remote-enabled ChatGPT users, that made
    remote discovery additive: Codex parsed every local curated
    `plugin.json` before also loading the remote catalog.
    
    ## Validation
    
    - `just fmt`
    - `cargo build -p codex-cli --bin codex`
    - Targeted auth/feature matrix tests pass, including API-key auth with
    `remote_plugin` enabled.
    - Manual CLI validation confirmed:
      - ChatGPT + remote off includes local curated.
      - ChatGPT + remote on excludes local curated.
      - API-key auth keeps local curated when remote is enabled.
    - `just test -p codex-core-plugins`: 235 passed; one unrelated existing
    marketplace test failed because it loaded the developer's home
    marketplace configuration.
  • Revert "Tell codex about PathUri serde compat. (#28595)" (#28627)
    This reverts commit bd2a786326, which
    didn't capture all the nuance we need for this migration.
  • Add thread recencyAt for sidebar ordering (#27910)
    ## Summary
    
    Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
    sort key for product recency ordering while preserving the existing
    meaning of `updatedAt` as the latest persisted thread mutation.
    
    This is the server-side alternative to #27697. Rather than narrowing
    `updatedAt`, clients can sort the sidebar by `recency_at` and continue
    treating `updatedAt` as mutation time.
    
    Paired Codex Apps PR:
    [openai/openai#1024599](https://github.com/openai/openai/pull/1024599)
    
    ## Contract
    
    - `recencyAt` initializes when a thread is created.
    - A turn start advances `recencyAt` monotonically.
    - Commentary, agent output, tool results, token/accounting updates, turn
    completion, archive, unarchive, resume, and generic metadata writes do
    not advance it.
    - `updatedAt` retains its existing behavior and continues to advance for
    persisted thread mutations.
    - Current servers populate `recencyAt`; the response field is optional
    in generated TypeScript so clients connected to older servers can fall
    back to `updatedAt`.
    - Filesystem-only fallback uses existing updated/mtime ordering when
    SQLite is unavailable.
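    
    A minimal in-memory sketch of this contract follows. The `Thread`
    struct and its method names are hypothetical, timestamps are plain
    integers, and persistence is ignored; it only illustrates which
    events move which timestamp and how a client could fall back.
    
    ```rust
    // Hypothetical thread record; `None` models an older server's response.
    #[derive(Debug, Clone, PartialEq)]
    struct Thread {
        updated_at: i64,
        recency_at: Option<i64>,
    }
    
    impl Thread {
        // Creation initializes both timestamps.
        fn new(now: i64) -> Self {
            Thread { updated_at: now, recency_at: Some(now) }
        }
    
        // Turn start: recency only ever moves forward.
        fn on_turn_start(&mut self, now: i64) {
            let current = self.recency_at.unwrap_or(self.updated_at);
            self.recency_at = Some(current.max(now));
            self.updated_at = now;
        }
    
        // Any other persisted mutation: only `updated_at` moves.
        fn on_other_mutation(&mut self, now: i64) {
            self.updated_at = now;
        }
    
        // Client-side sort key with the fallback for older servers.
        fn sidebar_key(&self) -> i64 {
            self.recency_at.unwrap_or(self.updated_at)
        }
    }
    
    fn main() {
        let mut thread = Thread::new(100);
        thread.on_other_mutation(150); // e.g. a tool result was persisted
        println!("after mutation: {}", thread.sidebar_key());
        thread.on_turn_start(200);
        println!("after turn start: {}", thread.sidebar_key());
    }
    ```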
    
    ## Persistence and compatibility
    
    Migration 0038 adds second- and millisecond-precision recency columns,
    backfills them from the existing updated timestamp, creates list
    indexes, and includes an insert trigger so older binaries writing to a
    migrated database seed recency without causing later mutations to
    advance it.
    
    Generic metadata upserts preserve existing recency values. Turn-start
    updates use a dedicated monotonic touch, and process-local allocation
    keeps millisecond cursor values unique. State DB list, search, read,
    filtered-list repair, rollout fallback propagation, and app-server
    conversions all carry the new field.
    
    ## API
    
    `Thread` responses include:
    
    ```ts
    recencyAt?: number
    ```
    
    `thread/list` and `thread/search` accept:
    
    ```json
    { "sortKey": "recency_at" }
    ```
    
    Generated TypeScript and JSON schemas are included.
    
    ## Validation
    
    - `just test -p codex-state` — 146 passed
    - `just test -p codex-rollout` — 69 passed
    - `just test -p codex-thread-store` — 81 passed
    - `just test -p codex-app-server-protocol` — 231 passed
    - Focused app-server list ordering, response mapping, archive/unarchive,
    and resume lifecycle tests passed
    - Scoped `just fix` for state, rollout, thread-store,
    app-server-protocol, and app-server
    - `just fmt`
    - `git diff --check`
    - Independent correctness, simplicity, elegance, security, and
    test-quality reviews; actionable ordering, lifecycle, query-projection,
    and timestamp-uniqueness findings were addressed
  • PAC 1 - Add system proxy feature config surface (#26706)
    ## Summary
    
    Introduces the default-off `respect_system_proxy` feature flag used to
    gate first-class system PAC/proxy support for Codex-owned native
    clients.
    
    With the feature disabled or absent, behavior remains unchanged. This PR
    establishes the configuration and managed-requirement surface; proxy
    discovery and request routing are implemented by follow-up PRs.
    
    ## Configuration
    
    User configuration uses the standard boolean feature form:
    
    ```toml
    [features]
    respect_system_proxy = true
    ```
    
    Managed feature requirements use the corresponding boolean key. The
    effective runtime configuration is exposed as a boolean and defaults to
    `false`.
    
    ## Implementation
    
    - Registers `respect_system_proxy` as an under-development, default-off
    feature.
    - Resolves user configuration and managed feature requirements into
    `Config.respect_system_proxy`.
    - Provides bootstrap resolution for startup paths that must evaluate the
    feature before full configuration loading completes.
    - Uses the standard feature CLI and config-editing behavior.
    - Excludes `features.respect_system_proxy` from project-local
    configuration.
    - Updates the generated configuration schema.
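    
    One way to picture the resolution is the sketch below. The `Layers`
    struct and function name are hypothetical, and the precedence shown
    (a managed requirement pins the value, otherwise the user setting
    applies) is an assumption rather than something this PR text states.
    
    ```rust
    // Hypothetical view of the layers that may mention the feature.
    struct Layers {
        user: Option<bool>,
        project_local: Option<bool>, // parsed, but deliberately ignored
        managed_requirement: Option<bool>,
    }
    
    fn resolve_respect_system_proxy(layers: &Layers) -> bool {
        // Project-local configuration is excluded and never participates.
        let _ignored = layers.project_local;
        match layers.managed_requirement {
            // Assumption: a managed requirement pins the effective value.
            Some(required) => required,
            // Otherwise use the user's setting; absent means default-off.
            None => layers.user.unwrap_or(false),
        }
    }
    
    fn main() {
        let repo_only = Layers {
            user: None,
            project_local: Some(true),
            managed_requirement: None,
        };
        println!("repo-only: {}", resolve_respect_system_proxy(&repo_only));
    
        let user_on = Layers {
            user: Some(true),
            project_local: None,
            managed_requirement: None,
        };
        println!("user on: {}", resolve_respect_system_proxy(&user_on));
    }
    ```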
    
    ## End-user behavior
    
    - No networking behavior changes when the feature is absent or disabled.
    - Enabling the feature makes the boolean available to the native
    proxy-routing implementation in follow-up PRs.
    - Repository-local configuration cannot enable the feature.
    
    ## Test coverage
    
    Covers scalar configuration and CLI override resolution, managed
    requirement constraints, bootstrap resolution, and project-local
    filtering.
  • [codex] [4/4] Simplify recommended plugin install schema (#28403)
    ## Summary
    - Simplify recommendation-context `request_plugin_install` arguments to
    `plugin_id` and `suggest_reason`.
    - Derive plugin type and install action from the matched candidate while
    preserving Codex-owned elicitation metadata.
    - Keep the legacy list-backed schema unchanged and accept resumed calls
    that still use `tool_id`.
    
    ## Stack
    - #28399
    - #28400
    - #27704
    - This PR
    
    ## Validation
    - `just test -p codex-tools -p codex-core request_plugin_install` (25
    passed)
    - `just fix -p codex-tools -p codex-core`
    - `just fmt`
    - `git diff --check`
  • core: render remote environment cwd natively (#28152)
    ## Why
    
    Model-visible `<environment_context>` should match the environment of
    the executor, not of the app server.
    
    Stacked on #28146.
    
    ## What
    
    - Keep selected environment cwd values as `PathUri` while building
    environment context.
    - Render cwd text using the path convention represented by the URI, with
    the canonical URI as a fallback.
    - Preserve compatibility with legacy `TurnContextItem.cwd` values when
    reconstructing and diffing context.
    - Extend the Wine-backed remote Windows test to assert that the model
    sees `powershell` and `C:\windows`.
  • [codex] [3/4] Activate endpoint plugin recommendations (#27704)
    Summary
    - Await endpoint recommendation selection while constructing each
    authenticated turn, removing the first-turn cache race.
    - Snapshot and filter endpoint candidates once per turn, then use that
    same set for the bounded contextual user fragment, tool exposure, and
    exact install validation.
    - Keep recommendation selection ephemeral: do not persist
    recommendation state in or gate resumed threads on prior context.
    - Hide the legacy list tool in endpoint mode and preserve legacy
    discovery unchanged when the endpoint is disabled or unavailable.
    - Keep remote plugin and connector app identities out of model-visible
    context and attach them only to Codex-owned elicitation metadata.
    
    Stack
    - 3/4, based on #28400.
    - Endpoint client and cache: #28399.
    - Generalized suggestion presentation: #28400.
    - Install-schema follow-up: #28403.
    
    Validation
    -
    -
    -
    -
    - Full : 2,649 passed and 88 environment-dependent tests failed because
    this sandbox cannot write , nest Seatbelt, or locate auxiliary test
    binaries.
  • [codex] [2/4] Generalize plugin suggestion presentation (#28400)
    Summary
    - Add list-backed and developer-context presentations for plugin
    suggestion candidates.
    - Let tool planning, install validation, and request-tool copy follow
    the selected presentation.
    - Keep every production caller on the existing list-backed presentation,
    preserving the current list tool, request schema, connector behavior,
    and model-visible copy.
    - Leave developer-context presentation latent until the final PR in the
    stack.
    
    Stack
    - 2/4, based on #28399.
    - Follow-up: #27704 activates endpoint recommendations.
    
    Validation
    - `just test -p codex-core request_plugin_install`
    - `just test -p codex-core spec_plan`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • [codex] [1/4] Add recommended plugin endpoint cache (#28399)
    Summary
    - Add authenticated parsing for `/ps/plugins/suggested?scope=GLOBAL`,
    including remote plugin and connector app identities.
    - Validate, deduplicate, sort, and cap endpoint candidates before
    caching them by backend and account identity.
    - Deduplicate concurrent cache misses and warm recommendations from the
    existing remote-installed-plugin refresh path used at startup and after
    account changes.
    - Keep endpoint results model-invisible in this PR; failures and
    responses without `enabled: true` resolve to legacy mode.
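    
    The validate, deduplicate, sort, and cap step might look like the
    sketch below. The `normalize_candidates` name, the `MAX_CANDIDATES`
    value, and the "non-blank ID" validation rule are all hypothetical;
    the PR text does not state the real limit or validation criteria.
    
    ```rust
    use std::collections::BTreeSet;
    
    // Hypothetical cap, chosen only to keep the example small.
    const MAX_CANDIDATES: usize = 3;
    
    fn normalize_candidates(raw: &[&str]) -> Vec<String> {
        let unique: BTreeSet<String> = raw
            .iter()
            .map(|id| id.trim())
            .filter(|id| !id.is_empty()) // validate: drop blank IDs
            .map(str::to_string)
            .collect(); // BTreeSet deduplicates and iterates in sorted order
        unique.into_iter().take(MAX_CANDIDATES).collect()
    }
    
    fn main() {
        let raw = ["b", "a", "", "b", " c ", "d"];
        println!("{:?}", normalize_candidates(&raw));
    }
    ```
    
    Normalizing before caching means every later reader of the cache sees
    the same bounded, ordered set.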
    
    Stack
    - 1/4. Follow-up: #28400 generalizes plugin suggestion presentation
    without activating endpoint recommendations.
    - Final activation: #27704.
    
    Validation
    - `just test -p codex-core-plugins recommended_plugins`
    - `just fix -p codex-core-plugins`
    - `just fmt`
    - `git diff --check`
  • Tell codex about PathUri serde compat. (#28595)
    This addresses another wrinkle I keep having to re-prompt codex about
    when migrating to cross-OS paths.
  • app-server: preserve target-native environment cwd (#28146)
    ## Why
    
    app-server may run on a different OS from the selected exec-server
    environment. Parsing that environment’s cwd with the Codex host’s path
    rules prevents thread startup.
    
    ## What
    
    Carry environment cwd values as `LegacyAppPathString` at the app-server
    boundary and `PathUri` internally. Existing tool-call schemas and
    relative-path behavior stay host-native; remaining local-only consumers
    convert explicitly and leave follow-up TODOs.
    
    The Wine integration test verifies app-server can start a thread and
    complete an ordinary turn with a Windows environment cwd from Linux.
    
    ## Validation
    
    - `bazel test //codex-rs/core/tests/remote_env_windows:smoke-test
    --test_output=errors`
    - focused app-server environment-selection and protocol schema tests
    - scoped Clippy for `codex-core` and `codex-app-server-protocol`
  • Record invariants for path migration. (#28589)
    ## Why
    
    Help Codex understand how to execute the migration to support cross-OS
    paths.
    
    ## What
    
    Expand the path-types skill with our goals and constraints.
  • Clarify model-generated and legacy app path types (#28577)
    ## Why
    
    `ApiPathString` kind of implies that it can be used anywhere we pull a
    path out of JSON, but it's not really appropriate for tool arguments
    when the model might generate relative paths.
    
    Prefer `String` for model-generated paths. For now, each feature
    handles the conversion itself; we can define a shared abstraction
    later if it makes sense.
    
    ## What
    
    Rename `ApiPathString` to `AppLegacyPathString` to clarify its role.
    
    Expand the `path-types` skill to tell the model to leave tool args as
    bare strings.
  • [codex] test exec relative additional permissions (#28587)
    ## Why
    
    Review caught some would-be regressions in changes to unified_exec that
    weren't surfaced in CI.
    
    ## What
    
    Add coverage for requesting permissions through `unified_exec` when
    there are additional permissions. Previously this flow was only tested
    against `shell_command`.
  • code-mode: extend test coverage to lock in cell lifecycle (#28468)
    This PR establishes the intended behavior as an executable contract
    before a refactor of the cell runtime begins. It also fixes cases where
    a second observer or termination request could replace an existing
    response channel and leave the original caller unresolved.
    
    ### Behavior codified
    - A cell can yield output and subsequently resume to completion.
    - A caller can run a cell until it has no immediately runnable work,
    receive its accumulated output and outstanding tool-call IDs, and then
    resume the same cell when the awaited work is available.
    - Each cell admits one active observer:
       - a second observer receives an explicit busy error
       - the existing observer remains registered and is not displaced
    - A natural result (conclusion of the JS module) that has already
    reached the cell controller wins over a later termination request.
    - Otherwise, termination preempts execution and resolves both:
      - the active observer, if present
      - the caller requesting termination
    - Repeated termination requests are rejected while termination is
    already in progress.
    - Terminal responses are sent only after outstanding callback work has
    been handled:
      - natural completion drains notifications and cancels outstanding
      tool calls
      - termination cancels and drains both notification and tool callbacks
    - Cell removal and the `cell_closed` notification happen after callback
    cleanup.
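    
    The observer and termination rules above can be read as a small state
    machine. The sketch below is illustrative only: `Cell`, `CellError`,
    and `Outcome` are hypothetical names, callers are modeled as integer
    IDs, and callback draining is omitted.
    
    ```rust
    #[derive(Debug, PartialEq)]
    enum CellError {
        Busy,               // a second observer tried to attach
        AlreadyTerminating, // termination was requested twice
    }
    
    #[derive(Debug, PartialEq)]
    enum Outcome {
        Completed,  // a natural result had already arrived and wins
        Terminated, // termination preempted execution
    }
    
    #[derive(Default)]
    struct Cell {
        observer: Option<u32>, // hypothetical caller ID of the active observer
        natural_result: bool,  // a natural result already reached the controller
        terminating: bool,
    }
    
    impl Cell {
        fn observe(&mut self, caller: u32) -> Result<(), CellError> {
            if self.observer.is_some() {
                // The existing observer stays registered and is not displaced.
                return Err(CellError::Busy);
            }
            self.observer = Some(caller);
            Ok(())
        }
    
        fn terminate(&mut self) -> Result<Outcome, CellError> {
            if self.terminating {
                return Err(CellError::AlreadyTerminating);
            }
            if self.natural_result {
                return Ok(Outcome::Completed);
            }
            self.terminating = true;
            Ok(Outcome::Terminated)
        }
    }
    
    fn main() {
        let mut cell = Cell::default();
        println!("{:?}", cell.observe(1));
        println!("{:?}", cell.observe(2));
        println!("{:?}", cell.terminate());
        println!("{:?}", cell.terminate());
    }
    ```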
  • [codex] re-enable absolute workdir integration test (#28581)
    ## Why
    
    In #28146 I missed the invariant that an absolute `exec_command` workdir
    must override the environment cwd. The existing integration test would
    have caught that regression, but it was ignored as flaky.
    
    ## What
    
    Re-enable `unified_exec_respects_workdir_override`.
    
    ## Validation
    
    `just test -p codex-core unified_exec_respects_workdir_override`
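    
    The invariant can be illustrated with host-native paths. This is a
    sketch under assumptions: `resolve_workdir` is a hypothetical helper,
    the example paths assume a Unix host, and the real code deals with
    cross-OS environment paths that `std::path` does not model.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    fn resolve_workdir(env_cwd: &Path, workdir: Option<&str>) -> PathBuf {
        match workdir {
            // `Path::join` replaces the base when the argument is absolute
            // and appends to it when the argument is relative.
            Some(dir) => env_cwd.join(dir),
            None => env_cwd.to_path_buf(),
        }
    }
    
    fn main() {
        let env_cwd = Path::new("/env/cwd");
        println!("{}", resolve_workdir(env_cwd, Some("/tmp/x")).display());
        println!("{}", resolve_workdir(env_cwd, Some("sub")).display());
        println!("{}", resolve_workdir(env_cwd, None).display());
    }
    ```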
  • [codex-app-server-test-client] Plugin Install/Uninstall Analytics Smoke Test (#27100)
    ## This PR
    
    The original [combined remote plugin analytics PR
    #26281](https://github.com/openai/codex/pull/26281) mixed reusable
    analytics test infrastructure, two manual smoke workflows, a metadata
    refactor, and the final identity behavior. This PR adds the
    account-mutating validation workflow separately so its cleanup and
    recovery guarantees can be reviewed without the final analytics behavior
    change.
    
    - Add a manually invoked remote plugin install/uninstall smoke workflow.
    - Require explicit account-mutation confirmation and an initially
    uninstalled plugin.
    - Validate the current `codex_plugin_installed` contract, where
    `plugin_id` is the backend ID.
    - Restore and verify the original uninstalled state, with a dedicated
    recovery command.
    
    This baseline intentionally does not require `codex_plugin_uninstalled`,
    because production does not emit that event yet. The final PR will
    update this smoke to require local `plugin_id`, `remote_plugin_id`, and
    uninstall emission. Review this PR as the net diff against #27099.
    
    ## Testing
    
    - `just test -p codex-app-server-test-client` (3 focused
    capture/validation tests passed)
    - The live workflow was previously exercised on the green combined
    reference branch, and the original uninstalled account state was
    restored.
    - CI is green across the required platform matrix.
    
    ## Split Overview
    
    ```text
    main
    ├── #27093  Debug analytics capture
    │   └── #27099  Non-mutating plugin smoke
    │       └── #27100  Remote install/uninstall smoke  ← you are here
    └── #27102  Plugin telemetry metadata refactor
    
    After #27093, #27099, #27100, and #27102 merge:
    └── Final PR: add remote_plugin_id to plugin analytics
    ```
    
    Review order and dependencies:
    
    1. [#27093 Add debug-only analytics event
    capture](https://github.com/openai/codex/pull/27093) (based on `main`)
    2. [#27099 Add a plugin analytics smoke
    workflow](https://github.com/openai/codex/pull/27099) (stacked on
    #27093)
    3. [#27100 Add a remote plugin analytics mutation smoke
    workflow](https://github.com/openai/codex/pull/27100) **(this PR,
    stacked on #27099)**
    4. [#27102 Centralize plugin telemetry metadata
    construction](https://github.com/openai/codex/pull/27102) (independent,
    based on `main`)
    5. Final remote-ID behavior PR (created after PRs 1-4 merge)
    
    The original [#26281](https://github.com/openai/codex/pull/26281)
    remains open as the green aggregate reference until the final PR is
    published.
  • [codex] Route MCP file uploads through environment filesystem (#27923)
    ## Why
    
    Codex Apps tools can mark arguments with `openai/fileParams`, but the
    execution path resolved and opened those files directly on the host.
    That bypassed the selected turn environment and prevented annotated file
    arguments from working with remote environments.
    
    ## What changed
    
    - resolve annotated file arguments against the primary turn environment
    - read file metadata and contents through that environment's sandboxed
    `ExecutorFileSystem`
    - reject files over the 512 MiB limit from metadata before reading or
    transferring them
    - retain the buffered upload-size check as defense in depth
    - make the OpenAI upload API accept a filename and buffered contents
    instead of owning local filesystem access
    - describe the model-visible argument as a path in the primary
    environment
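    
    The two size checks can be sketched as follows. The constant and
    function names are hypothetical, and the sketch assumes "over the
    limit" means strictly greater than 512 MiB.
    
    ```rust
    const MAX_UPLOAD_BYTES: u64 = 512 * 1024 * 1024; // 512 MiB
    
    #[derive(Debug, PartialEq)]
    struct TooLarge {
        size: u64,
    }
    
    // First check: metadata only, before any bytes are read or transferred.
    fn check_metadata_size(size: u64) -> Result<(), TooLarge> {
        if size > MAX_UPLOAD_BYTES {
            Err(TooLarge { size })
        } else {
            Ok(())
        }
    }
    
    // Second check (defense in depth): the buffered contents themselves.
    fn check_buffered(contents: &[u8]) -> Result<(), TooLarge> {
        check_metadata_size(contents.len() as u64)
    }
    
    fn main() {
        println!("{:?}", check_metadata_size(1024));
        println!("{:?}", check_metadata_size(MAX_UPLOAD_BYTES + 1));
        println!("{:?}", check_buffered(b"small file"));
    }
    ```
    
    Rejecting from metadata first avoids pulling an oversized file across
    a remote environment boundary only to discard it.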
    
    This builds on #27927, which added `size` to internal filesystem
    metadata.
    
    ## Testing
    
    - `just test -p codex-api upload_openai_file_returns_canonical_uri`
    - `just test -p codex-mcp
    tool_with_model_visible_input_schema_masks_file_params`
    - `just test -p codex-core mcp_openai_file`
    - `just test -p codex-core
    codex_apps_file_params_upload_environment_files_before_mcp_tool_call`
  • ci: run code-mode unit tests on all bazel targets (#28562)
    ## Why
    
    V8 should be stable under Bazel, so the `codex-code-mode` unit tests
    should run across the Bazel platform matrix. If these tests prove
    unstable, we should fix the tests rather than exclude them from CI.
    
    ## What changed
    
    - Remove the explicit `//codex-rs/code-mode:code-mode-unit-tests`
    exclusion from the macOS and Linux Bazel test jobs.
    - Remove the same exclusion from the native Windows post-merge job.
    - Keep the existing Windows gnullvm shard coverage.
    
    ## Bazel test coverage
    
    The target contains 26 unit tests. A fresh uncached local Bazel
    execution ran all 26 with 0 failures, 0 ignored tests, and 0 filtered
    tests.
    
    PR Bazel CI selected the target on every enabled platform and reported a
    cached pass:
    
    | Platform | Passing CI job |
    | --- | --- |
    | macOS aarch64 | [Bazel test passed](https://github.com/openai/codex/actions/runs/27636617545/job/81725447804) |
    | macOS x86_64 | [Bazel test passed in 2.2s](https://github.com/openai/codex/actions/runs/27636617545/job/81725448008) |
    | Linux GNU | [Bazel test passed in 0.4s](https://github.com/openai/codex/actions/runs/27636617545/job/81725447898) |
    | Linux musl | [Bazel test passed in 0.4s](https://github.com/openai/codex/actions/runs/27636617545/job/81725448117) |
    | Windows gnullvm | [Bazel test passed in shard 4/4 in 1.6s](https://github.com/openai/codex/actions/runs/27636617545/job/81725448166) |
  • feat(tui): add rate-limit reset redemption to /usage (#28154)
    ## Why
    
    Codex users can earn personal rate-limit reset credits, but the CLI does
    not currently provide a way to view or redeem them. The `/usage` command
    restored in #27925 is intended to be the entry point for usage-related
    actions, so reset redemption belongs there rather than in a separate
    dashed slash command.
    
    Depends on #28143 for the app-server and backend-client reset-credit
    APIs.
    
    ## What changed
    
    - Turn bare `/usage` into a menu with entries for token activity and
    earned rate-limit resets while preserving `/usage daily`, `/usage
    weekly`, and `/usage cumulative`.
    - Add loading, empty, confirmation, success, retry, and error states
    with a caller-generated UUID idempotency key reused across retries of
    the same logical reset.
    - Show an availability hint only for backend-classified rate-limit
    errors with credits available.
    - Hide the reset entry for workspace accounts.
    
    ## Validation
    
    - `just test -p codex-tui chatwidget::tests::usage` — 19 passed.
    - `just fix -p codex-tui` — passed.
    - `just fmt` — passed.
    - `cargo insta pending-snapshots` from `codex-rs/tui` — no pending
    snapshots.
    
    ## Examples
    <img width="1168" height="304" alt="image"
    src="https://github.com/user-attachments/assets/caa4c1e3-e996-494d-ae17-50b521f5dce8"
    />
    <img width="908" height="260" alt="image"
    src="https://github.com/user-attachments/assets/e38a726b-77cc-4bd0-9ea8-9f3ad21c5768"
    />
    
    
    ### Reset flow
    <img width="1509" height="312" alt="image"
    src="https://github.com/user-attachments/assets/d987013c-78a5-48a2-ad8d-c61ad267a327"
    />
    <img width="585" height="190" alt="image"
    src="https://github.com/user-attachments/assets/de32be19-79b9-4a3e-8574-6f1c208c98ae"
    />
    <img width="600" height="210" alt="image"
    src="https://github.com/user-attachments/assets/88a165cf-796d-4fdc-a7bc-ea89917573da"
    />
    
    <img width="512" height="193" alt="image"
    src="https://github.com/user-attachments/assets/d2353998-5aa8-442e-a5f8-3a8a5b832753"
    />
  • Add incremental thread history changes
    Add ThreadHistoryBuilder APIs for collecting incremental thread item and turn changes while applying rollout items.
    
    Batch handling coalesces repeated changes so callers can get the latest incremental thread item changes for a set of rollout items without rebuilding full history.
  • [codex] Warn clearly when code mode output is truncated (#28467)
    ## Summary
    
    - make `formatted_truncate_text` prepend `Warning: truncated output
    (original token count: N)` above the existing `Total output lines`
    header
    - update direct formatter, unified-exec, user-shell, and code-mode
    expectations
    - add core unit coverage that runs in Bazel without requiring the
    skipped V8-backed code-mode integration suite
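    
    A rough sketch of the resulting layout is shown below. The
    `with_truncation_warning` helper is hypothetical (it is not
    `formatted_truncate_text` itself), and the exact text that follows
    `Total output lines` in the sample is assumed for illustration.
    
    ```rust
    // Prepend the warning line above an already-formatted summary.
    fn with_truncation_warning(summary: &str, original_token_count: usize) -> String {
        format!(
            "Warning: truncated output (original token count: {})\n{}",
            original_token_count, summary
        )
    }
    
    fn main() {
        // Assumed shape of the existing header, for illustration only.
        let summary = "Total output lines: 400\n\nline 1\nline 2";
        println!("{}", with_truncation_warning(summary, 12000));
    }
    ```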
    
    ## Validation
    
    - `cargo test -p codex-utils-output-truncation -- --nocapture` (17
    passed)
    - `cargo test -p codex-core --lib
    truncated_text_output_starts_with_warning -- --nocapture`
    - `cargo test -p codex-core --test all
    clamps_model_requested_max_output_tokens_to_policy -- --nocapture` (2
    passed)
    - `cargo test -p codex-core --test all
    unified_exec_formats_large_output_summary -- --nocapture`
    - `cargo test -p codex-core --test all
    user_shell_command_output_is_truncated_in_history -- --nocapture`
    - Bazel CI exercises the shared formatter and downstream integration
    expectations
  • fix(tui): highlight C++ module files (#28554)
    ## Why
    
    Codex syntax-highlights diffs for conventional C++ extensions such as
    `.cpp` and `.cxx`, but C++ module interface files using `.cppm`, `.ixx`,
    or `.cxxm` fall back to plain diff coloring. The bundled syntax set
    already includes C++, but it does not resolve those module extensions by
    itself.
    
    Closes #28223.
    
    ## What changed
    
    - map `.cppm`, `.ixx`, and `.cxxm` to the existing `cpp` syntax in
    `render/highlight.rs`
    - extend alias-resolution coverage for all three module extensions
    - verify `.cpp`, `.cppm`, `.ixx`, and `.cxxm` diffs produce
    syntax-highlighted RGB spans while unknown extensions retain the plain
    fallback
    - snapshot the syntax-colored token segmentation for the supported C++
    module extensions
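    
    The alias mapping described above can be sketched as a small lookup
    from file extension to syntax name. This is an illustration only: the
    function names are hypothetical and do not reflect the actual contents
    of `render/highlight.rs`.
    
    ```rust
    use std::path::Path;
    
    /// Hypothetical alias table: C++ module interface extensions resolve to
    /// the same `cpp` syntax as the conventional extensions.
    fn syntax_alias(extension: &str) -> Option<&'static str> {
        match extension {
            "cpp" | "cxx" => Some("cpp"),
            "cppm" | "ixx" | "cxxm" => Some("cpp"),
            // Unknown extensions keep the plain diff fallback.
            _ => None,
        }
    }
    
    /// Resolve a syntax name from a path's extension, if it has one.
    fn syntax_for_path(path: &str) -> Option<&'static str> {
        let extension = Path::new(path).extension()?.to_str()?;
        syntax_alias(extension)
    }
    ```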
    
    ## How to Test
    
    1. Ask Codex to create or modify a C++ module interface file using
    `.cppm`, `.ixx`, or `.cxxm`.
    2. Confirm C++ tokens in the rendered diff receive syntax colors instead
    of only the red/green diff treatment.
    3. Modify an equivalent `.cpp` file and confirm its existing
    highlighting remains unchanged.
    4. Modify a file with an unknown extension and confirm it still uses the
    plain diff fallback.
    
    Targeted tests:
    
    - `just test -p codex-tui -E
    'test(find_syntax_resolves_languages_and_aliases) |
    test(cpp_module_extensions_use_cpp_highlighting) |
    test(unknown_extension_falls_back_without_syntax_highlighting)'`
  • [codex-app-server-test-client & codex-app-server] Plugin Usage Analytics Smoke Test (#27099)
    ## This PR
    
    The original [combined remote plugin analytics PR
    #26281](https://github.com/openai/codex/pull/26281) mixed reusable
    analytics test infrastructure, two manual smoke workflows, a metadata
    refactor, and the final identity behavior. This PR establishes a
    non-mutating end-to-end plugin smoke workflow before any analytics
    identity semantics change.
    
    - Add `plugin-analytics-smoke` to the existing app-server test client.
    - Exercise plugin disable, enable, and use through production app-server
    RPC paths.
    - Isolate config writes in a temporary file and use a loopback Responses
    API server.
    - Capture analytics without sending them to the production analytics
    backend.
    - Validate the current local `plugin_id`, names, capability metadata,
    thread, turn, and model fields.
    
    This is intentionally a baseline smoke workflow. It does not assert
    `remote_plugin_id`; the final PR will update it when that field exists.
    Review this PR as the net diff against #27093.
    
    ## Testing
    
    - The test-client target compiles successfully.
    - The combined reference branch exercised the manual smoke against the
    live remote plugin service.
    - CI is green across the required platform matrix.
    
    ## Split Overview
    
    ```text
    main
    ├── #27093  Debug analytics capture
    │   └── #27099  Non-mutating plugin smoke           ← you are here
    │       └── #27100  Remote install/uninstall smoke
    └── #27102  Plugin telemetry metadata refactor
    
    After #27093, #27099, #27100, and #27102 merge:
    └── Final PR: add remote_plugin_id to plugin analytics
    ```
    
    Review order and dependencies:
    
    1. [#27093 Add debug-only analytics event
    capture](https://github.com/openai/codex/pull/27093) (based on `main`)
    2. [#27099 Add a plugin analytics smoke
    workflow](https://github.com/openai/codex/pull/27099) **(this PR,
    stacked on #27093)**
    3. [#27100 Add a remote plugin analytics mutation smoke
    workflow](https://github.com/openai/codex/pull/27100) (stacked on this
    PR)
    4. [#27102 Centralize plugin telemetry metadata
    construction](https://github.com/openai/codex/pull/27102) (independent,
    based on `main`)
    5. Final remote-ID behavior PR (created after PRs 1-4 merge)
    
    The original [#26281](https://github.com/openai/codex/pull/26281)
    remains open as the green aggregate reference until the final PR is
    published.
  • chore: side prompt (#28553)
    Fix side bug with prompt
  • [codex] exec-server: stream files in chunks (#28354)
    ## Why
    
    `fs/readFile` buffers the entire file in one response, which makes large
    remote reads expensive and prevents callers from applying backpressure.
    We need an opt-in streaming path with bounded block sizes while
    preserving the existing single-call API for small and sandboxed reads.
    
    ## What changed
    
    - Add `ExecServerClient::stream`, returning a named `FileReadStream`
    that implements `futures::Stream` and yields immutable 1 MiB byte
    blocks.
    - Add internal `fs/open`, `fs/readBlock`, and `fs/close` RPCs.
    `fs/readBlock` accepts an explicit offset and length.
    - Keep unsandboxed files open between block reads, cap open handles per
    connection, and clean them up on EOF, error, stream drop, explicit
    close, or connection shutdown.
    - Reject platform-sandboxed streaming opens instead of turning the
    one-shot sandbox helper into a persistent server. Existing `fs/readFile`
    behavior is unchanged.
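    
    To make the handle lifecycle concrete, here is a minimal in-memory
    sketch of a capped handle table with offset-and-length block reads.
    It is not the exec-server implementation: `HandleTable` and its
    methods are hypothetical names, a `Cursor` stands in for an open file,
    and the 128-handle cap is taken from the testing notes below.
    
    ```rust
    use std::collections::HashMap;
    use std::io::{Cursor, Read, Seek, SeekFrom};
    
    /// Per-connection cap on open handles (128, per the testing notes).
    const MAX_OPEN_HANDLES: usize = 128;
    
    /// Hypothetical handle table; a `Cursor` stands in for an open file.
    struct HandleTable {
        next_id: u64,
        handles: HashMap<u64, Cursor<Vec<u8>>>,
    }
    
    impl HandleTable {
        fn new() -> Self {
            Self { next_id: 0, handles: HashMap::new() }
        }
    
        /// Open a handle, refusing once the cap is reached.
        fn open(&mut self, contents: Vec<u8>) -> Result<u64, String> {
            if self.handles.len() >= MAX_OPEN_HANDLES {
                return Err("too many open handles".to_string());
            }
            let id = self.next_id;
            self.next_id += 1;
            self.handles.insert(id, Cursor::new(contents));
            Ok(id)
        }
    
        /// Read up to `len` bytes starting at `offset`; empty means EOF.
        fn read_block(&mut self, id: u64, offset: u64, len: u64) -> Result<Vec<u8>, String> {
            let file = self.handles.get_mut(&id).ok_or_else(|| "unknown handle".to_string())?;
            file.seek(SeekFrom::Start(offset)).map_err(|e| e.to_string())?;
            let mut block = Vec::new();
            file.take(len).read_to_end(&mut block).map_err(|e| e.to_string())?;
            Ok(block)
        }
    
        /// Close a handle, releasing its slot; returns whether it existed.
        fn close(&mut self, id: u64) -> bool {
            self.handles.remove(&id).is_some()
        }
    }
    ```
    
    Because every read carries an explicit offset, non-sequential reads
    need no extra state, and closing a handle frees capacity for the next
    open.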
    
    ## Testing
    
    - `just test -p codex-exec-server`
    - Integration coverage for 1 MiB chunking, exact block-boundary EOF,
    sandbox rejection, and continued reads from the opened file after path
    replacement.
    - Handle-manager coverage for non-sequential offsets, variable block
    lengths, the 128-handle limit, and capacity release after close.
  • fix(tui): restore TUI after suspend (#28342)
    ## Why
    
    On Linux, suspending Codex with `Ctrl+Z` and returning with `fg` can
    leave the composer misaligned or inject terminal response bytes such as
    focus reports into the prompt. Shell job-control output moves the cursor
    while Codex is suspended, and terminal input polling can race with the
    responses used to restore the inline viewport.
    
    Fixes #26564.
    
    ## What changed
    
    - preserve and restore keyboard reporting without disturbing the parent
    terminal stack
    - pause terminal event polling while Codex is suspended and flush
    buffered input before resuming it
    - force crossterm's cached raw-mode state back in sync after the shell
    completes its `fg` handoff
    - probe the actual post-`fg` cursor position with the tolerant
    terminal-response parser, then realign the inline viewport before
    redrawing
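    
    For readers unfamiliar with the cursor probe: terminals answer a
    cursor position request with `ESC [ row ; col R`, where both numbers
    are 1-based. The sketch below converts such a reply to zero-based
    coordinates. It is a strict, simplified stand-in for the tolerant
    parser mentioned above, and the function name is hypothetical.
    
    ```rust
    /// Parse a cursor position report (`ESC [ row ; col R`) into zero-based
    /// `(row, col)`. Returns `None` for anything that is not a well-formed
    /// report; the real parser is described as more tolerant than this.
    fn parse_cursor_position(response: &str) -> Option<(u16, u16)> {
        let body = response.strip_prefix("\x1b[")?.strip_suffix('R')?;
        let (row, col) = body.split_once(';')?;
        let row: u16 = row.parse().ok()?;
        let col: u16 = col.parse().ok()?;
        // Reports are 1-based, so a zero in either field is invalid.
        Some((row.checked_sub(1)?, col.checked_sub(1)?))
    }
    ```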
    
    ## How to Test
    
    1. On Linux, start the development TUI with `just c`.
    2. Type text into the composer without submitting it.
    3. Press `Ctrl+Z`, run any harmless shell command, then run `fg`.
    4. Confirm the composer redraws below the shell output, the draft text
    is preserved, and no raw escape sequences appear.
    5. Repeat the suspend/resume cycle and confirm normal typing still
    works.
    
    Targeted tests:
    
    - `cargo test -p codex-tui --lib parses_cursor_position_as_zero_based -j
    1`
    - `cargo test -p codex-tui --lib tui::event_stream::tests -j 1`
  • path-uri: clarify invalid host path errors (#28473)
    ## Why
    
    Ensure a consistent string format when exposing path conversion errors
    to the model.
    
    ## What
    
    - Render `PathUriParseError::InvalidFileUriPath` as `'$PATH' is invalid
    on '$OS'`.
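    
    A minimal sketch of that rendering, assuming the variant carries the
    offending path and the target OS name as strings (the field names
    `path` and `os` are illustrative, not taken from the crate):
    
    ```rust
    use std::fmt;
    
    #[derive(Debug)]
    enum PathUriParseError {
        /// Hypothetical fields; the real variant may carry different data.
        InvalidFileUriPath { path: String, os: String },
    }
    
    impl fmt::Display for PathUriParseError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Self::InvalidFileUriPath { path, os } => {
                    write!(f, "'{path}' is invalid on '{os}'")
                }
            }
        }
    }
    ```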
  • perf(config): defer remote sandbox hostname lookup (#28542)
    ## Why
    
    [#18763](https://github.com/openai/codex/pull/18763) added canonical
    hostname resolution for `remote_sandbox_config`. Requirements
    composition currently performs that synchronous DNS lookup on every
    fresh process, even when none of the loaded requirements layers contains
    `[[remote_sandbox_config]]`. On hosts with slow local DNS resolution,
    this can add several seconds to Codex startup.
    
    ## What
    
    - defer hostname resolution until a parsed requirements layer actually
    contains `remote_sandbox_config`
    - cache the resolver result once per requirements composition,
    preserving the existing single-lookup behavior across multiple layers
    - keep the existing FQDN resolution and per-layer requirements
    precedence unchanged
    - cover both the ordinary no-lookup path and the multi-layer
    single-lookup path
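    
    The deferral and caching described above can be sketched with
    `std::cell::OnceCell`: the resolver runs only when some layer actually
    has remote sandbox entries, and at most once per composition. The
    types `LazyHostname` and `Layer` are hypothetical simplifications, not
    the `codex-config` types.
    
    ```rust
    use std::cell::OnceCell;
    
    /// Hypothetical lazy resolver: `resolve` runs on first use only.
    struct LazyHostname<F: Fn() -> String> {
        resolve: F,
        cached: OnceCell<String>,
    }
    
    impl<F: Fn() -> String> LazyHostname<F> {
        fn new(resolve: F) -> Self {
            Self { resolve, cached: OnceCell::new() }
        }
    
        fn get(&self) -> &str {
            self.cached.get_or_init(|| (self.resolve)())
        }
    }
    
    /// Simplified requirements layer: only the hosts named by its
    /// `[[remote_sandbox_config]]` entries.
    struct Layer {
        remote_sandbox_hosts: Vec<String>,
    }
    
    /// Count layers with an entry for this host. Layers without entries
    /// never touch the resolver, so no lookup happens for them.
    fn layers_matching_host<F: Fn() -> String>(
        layers: &[Layer],
        hostname: &LazyHostname<F>,
    ) -> usize {
        layers
            .iter()
            .filter(|layer| {
                layer
                    .remote_sandbox_hosts
                    .iter()
                    .any(|host| host.as_str() == hostname.get())
            })
            .count()
    }
    ```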
    
    ## How to Test
    
    On a host where local canonical-name resolution is slow:
    
    1. Start Codex without `[[remote_sandbox_config]]` in any managed
    requirements layer and confirm startup no longer waits for hostname
    resolution.
    2. Add a matching `[[remote_sandbox_config]]` entry and confirm its
    `allowed_sandbox_modes` still overrides the layer's top-level value.
    3. Add remote sandbox entries to multiple requirements layers and
    confirm precedence remains unchanged while the hostname is resolved only
    once.
    
    Targeted tests:
    
    - `just test -p codex-config hostname_resolver`
    - `just test -p codex-config` (181 passed)
  • core: surface terminal subagent errors to parent agents (#28375)
    ## Why
    
    When a subagent exhausts its retries, it emits an `Error`, but the
    generic task lifecycle then emits `TurnComplete(None)`. That completion
    used to overwrite the subagent's `Errored` status with
    `Completed(None)`, so the parent received an empty completion
    notification.
    
    This made a failed child look indistinguishable from a child that
    completed without an answer. In unattended or long-running multi-agent
    work, the root could silently continue without knowing that delegated
    work failed or how to restart it.
    
    ## Behavior
    
    Before, a terminal stream failure was reduced to an empty completion:
    
    ```text
    <subagent_notification>
    {"agent_path":"/root/worker","status":{"completed":null}}
    </subagent_notification>
    ```
    
    Now the parent receives the actual terminal error, bounded to 1,000
    tokens, together with an actionable recovery hint:
    
    ```text
    <subagent_notification>
    {
      "agent_path": "/root/worker",
      "status": {
        "errored": "stream disconnected before completion: stream closed before response.completed"
      },
      "next_action": "This agent's turn failed. If you still need this agent, use `followup_task` to give it another task."
    }
    </subagent_notification>
    ```
    
    The notification remains queue-only: it does not wake the root or replay
    the failed request. The root sees it at the next sampling boundary and
    can use `followup_task` to start a new turn for that agent.
    
    ## What changed
    
    - Added terminal-error precedence to the [agent status
    reducer](https://github.com/openai/codex/blob/e95fcfe2bb6a02f1a75650afa20048859f556511/codex-rs/core/src/agent/status.rs#L23-L34),
    so a closing `TurnComplete` cannot erase an immediately preceding
    `Errored` status.
    - Made MultiAgentV2 completion forwarding use the retained session
    status instead of re-deriving `Completed(None)` from the final event.
    - Extended the [subagent notification
    fragment](https://github.com/openai/codex/blob/e95fcfe2bb6a02f1a75650afa20048859f556511/codex-rs/core/src/context/subagent_notification.rs#L6-L60)
    with a `next_action` for terminal errors and a hard cap on model-visible
    error text.
    - Kept successful completions and interrupted turns unchanged.
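    
    The precedence rule in the first bullet can be modelled as a small
    reducer. This is a simplified sketch with hypothetical types; the
    linked `status.rs` is the authoritative version and handles more
    states than shown here.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum AgentStatus {
        Running,
        Completed(Option<String>),
        Errored(String),
    }
    
    enum Event {
        Error(String),
        TurnComplete(Option<String>),
    }
    
    /// Fold one event into the status. A closing `TurnComplete` must not
    /// erase an `Errored` status that immediately precedes it.
    fn reduce(status: AgentStatus, event: Event) -> AgentStatus {
        match (status, event) {
            (AgentStatus::Errored(message), Event::TurnComplete(_)) => {
                AgentStatus::Errored(message)
            }
            (_, Event::Error(message)) => AgentStatus::Errored(message),
            (_, Event::TurnComplete(last_message)) => AgentStatus::Completed(last_message),
        }
    }
    ```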
    
    ## Verification
    
    - Added a status-reducer test proving that `Errored` survives the
    trailing `TurnComplete`.
    - Added an integration test that exhausts a subagent's stream retries
    and verifies the exact `agent_message` delivered to the parent,
    including the error and `followup_task` guidance.
    - Re-ran the existing successful-completion and interrupted-turn
    notification tests.
  • [codex] Clarify plugin load and runtime capability stages (#28472)
    ## Summary
    
    Plugin loading and auth projection both previously produced
    `PluginLoadOutcome`. That made an unfiltered load result look like
    runtime-ready capabilities and generated capability summaries before
    auth routing had run.
    
    This change keeps loaded plugin records in the cache, applies the
    current auth policy in `PluginsManager`, and only then builds
    `PluginLoadOutcome` and its summaries. Auth changes still reuse the
    cached disk load and re-resolve apps and MCP servers without reloading
    plugins.
    
    The updated tests cover cached auth changes and verify that capability
    summaries match the effective app/MCP surface.
    
    ## Testing
    
    - `just test -p codex-core-plugins`
    - `just test -p codex-plugin`
    - `just fix -p codex-core-plugins`
  • [tests] Keep Apps out of generic core test harness (#28508)
    ## Summary
    
    - disable the stable Apps feature in the generic `test_codex()`
    integration-test harness
    - keep Apps-specific tests explicit: their builders re-enable Apps and
    point it at a local mock server
    
    ## Why
    
    Generic tests that use dummy ChatGPT auth were also enabling the
    host-owned `codex_apps` MCP server. That made unrelated tests contact
    `chatgpt.com` and wait for MCP startup, causing the Bazel timeouts
    observed on #28368.
    
    The generic harness should be hermetic and should not start an external
    service that the test did not request. This is test-only; production
    Apps behavior is unchanged. The broader optional-MCP startup behavior is
    being handled separately in #28407.
    
    ## Testing
    
    - `just test -p codex-core -E
    'test(pre_sampling_compact_runs_when_comp_hash_changes) |
    test(model_switch_to_smaller_model_updates_token_context_window) |
    test(codex_apps_file_params_upload_local_paths_before_mcp_tool_call)'`
    - `just fix -p codex-core`
    - `just fmt`
  • feat: render typed envelopes for multi-agent v2 messages (#28368)
    ## Why
    
    Multi-agent v2 messages need a consistent, model-visible envelope that
    identifies what kind of interaction occurred, who sent it, and which
    agent it targets. Previously, encrypted deliveries exposed only
    `encrypted_content`, while child completion used the legacy
    `<subagent_notification>` shape. That meant the client could not
    consistently present `NEW_TASK`, `MESSAGE`, and `FINAL_ANSWER` using the
    same format.
    
    This change adds the routing envelope as plaintext while keeping task
    and message payloads encrypted. No new Responses API field is required:
    an encrypted delivery is represented as an `input_text` header
    immediately followed by its existing `encrypted_content` item.
    
    Every envelope now follows this shape:
    
    ```text
    Message Type: <NEW_TASK | MESSAGE | FINAL_ANSWER>
    Task name: <recipient agent path>
    Sender: <author agent path>
    Payload:
    <message payload>
    ```
    
    ## Message types
    
    ### `NEW_TASK`
    
    `NEW_TASK` is used when the recipient should begin a new turn, including
    an initial `spawn_agent` task and a later `followup_task`.
    
    For a root agent spawning `/root/worker`, the request contains a
    plaintext envelope followed by the encrypted task:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root",
      "recipient": "/root/worker",
      "content": [
        {
          "type": "input_text",
          "text": "Message Type: NEW_TASK\nTask name: /root/worker\nSender: /root\nPayload:\n"
        },
        {
          "type": "encrypted_content",
          "encrypted_content": "<encrypted task payload>"
        }
      ]
    }
    ```
    
    Conceptually, the model receives:
    
    ```text
    Message Type: NEW_TASK
    Task name: /root/worker
    Sender: /root
    Payload:
    Review the authentication changes and report any regressions.
    ```
    
    ### `MESSAGE`
    
    `MESSAGE` is used for a queued `send_message` delivery. It communicates
    with an existing agent without starting a new turn.
    
    For `/root/worker` reporting progress to the root agent, the request
    contains:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root/worker",
      "recipient": "/root",
      "content": [
        {
          "type": "input_text",
          "text": "Message Type: MESSAGE\nTask name: /root\nSender: /root/worker\nPayload:\n"
        },
        {
          "type": "encrypted_content",
          "encrypted_content": "<encrypted message payload>"
        }
      ]
    }
    ```
    
    Conceptually, the model receives:
    
    ```text
    Message Type: MESSAGE
    Task name: /root
    Sender: /root/worker
    Payload:
    The protocol tests pass; I am checking the resume path now.
    ```
    
    ### `FINAL_ANSWER`
    
    `FINAL_ANSWER` is emitted when a child agent reaches a terminal state
    and reports its result to its parent. Completion payloads are already
    available locally, so the complete envelope is represented as plaintext
    rather than as a plaintext header plus encrypted content.
    
    For `/root/worker` completing work for the root agent, the request
    contains:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root/worker",
      "recipient": "/root",
      "content": [
        {
          "type": "input_text",
          "text": "Message Type: FINAL_ANSWER\nTask name: /root\nSender: /root/worker\nPayload:\nNo regressions found."
        }
      ]
    }
    ```
    
    The model-visible form is:
    
    ```text
    Message Type: FINAL_ANSWER
    Task name: /root
    Sender: /root/worker
    Payload:
    No regressions found.
    ```
    
    Errored, shut down, and missing agents also use `FINAL_ANSWER`, with a
    terminal-status description in the payload.
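    
    The envelope layout above can be reproduced with plain string
    formatting. The sketch below is illustrative only: `MessageType`,
    `envelope_header`, and `plaintext_envelope` are hypothetical names,
    not the API of `InterAgentCommunication`.
    
    ```rust
    #[derive(Clone, Copy)]
    enum MessageType {
        NewTask,
        Message,
        FinalAnswer,
    }
    
    impl MessageType {
        fn as_str(self) -> &'static str {
            match self {
                MessageType::NewTask => "NEW_TASK",
                MessageType::Message => "MESSAGE",
                MessageType::FinalAnswer => "FINAL_ANSWER",
            }
        }
    }
    
    /// Plaintext header that precedes an encrypted payload item.
    fn envelope_header(kind: MessageType, recipient: &str, sender: &str) -> String {
        format!(
            "Message Type: {}\nTask name: {recipient}\nSender: {sender}\nPayload:\n",
            kind.as_str()
        )
    }
    
    /// Complete plaintext envelope, as used for `FINAL_ANSWER`.
    fn plaintext_envelope(kind: MessageType, recipient: &str, sender: &str, payload: &str) -> String {
        format!("{}{payload}", envelope_header(kind, recipient, sender))
    }
    ```
    
    For `NEW_TASK` and `MESSAGE`, only the header would be sent as
    `input_text`, with the payload following as a separate
    `encrypted_content` item.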
    
    ## What changed
    
    - Render `NEW_TASK` or `MESSAGE` in
    `InterAgentCommunication::to_model_input_item`, based on whether the
    encrypted delivery starts a turn.
    - Replace the multi-agent v2 `<subagent_notification>` completion
    payload with a model-visible `FINAL_ANSWER` envelope.
    - Document `Task name`, `Sender`, and `Payload` consistently in the
    multi-agent developer instructions.
    - Prevent local-only history projections from treating an encrypted
    message's plaintext header as the complete assistant message.
    - Preserve rollout-trace interaction edges when an agent message
    contains both plaintext and encrypted content.
    
    Legacy multi-agent behavior remains unchanged.
    
    ## Verification
    
    - `just test -p codex-protocol`
    - `just test -p codex-rollout-trace`
    - `just test -p codex-web-search-extension`
    - `just test -p codex-core
    encrypted_multi_agent_v2_spawn_sends_agent_message_to_child`
    - `just test -p codex-core
    plaintext_multi_agent_v2_completion_sends_agent_message`
    - `just test -p codex-core
    multi_agent_v2_followup_task_completion_notifies_parent_on_every_turn`
    - `just test -p codex-core
    multi_agent_v2_completion_queues_message_for_direct_parent`
  • [codex] Compress cold active rollouts (#28338)
    ## Why
    
    The local rollout compression worker currently scans only
    `archived_sessions`, so cold unarchived thread history remains expanded
    indefinitely.
    
    ## What changed
    
    - Scan `sessions` after `archived_sessions` within the existing worker
    runtime budget.
    - Update rollout compression coverage to require both cold active and
    archived rollouts to be compressed while fresh active rollouts remain
    plain.
    
    The worker remains behind the disabled-by-default
    `local_thread_store_compression` feature, and the existing seven-day
    cold-file threshold is unchanged.
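    
    As a rough sketch of the cold-file check, a rollout could be treated
    as cold once its modification time is at least seven days old.
    Whether the real worker uses modification time, and whether the
    comparison is inclusive, are assumptions here; `is_cold` is a
    hypothetical name.
    
    ```rust
    use std::time::{Duration, SystemTime};
    
    /// Seven-day cold-file threshold mentioned above.
    const COLD_THRESHOLD: Duration = Duration::from_secs(7 * 24 * 60 * 60);
    
    /// Assumed rule: a file is cold when its age is at least the threshold.
    /// A modification time in the future is treated as not cold.
    fn is_cold(modified: SystemTime, now: SystemTime) -> bool {
        now.duration_since(modified)
            .map(|age| age >= COLD_THRESHOLD)
            .unwrap_or(false)
    }
    ```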
    
    ## Validation
    
    - `just test -p codex-rollout` (69 passed)
    - `just fmt`
    - `git diff --check`
  • [codex] expose Bedrock credential source in account/read (#27751)
    ## Why
    
    `account/read` currently reports only `type: "amazonBedrock"`, so
    clients cannot distinguish a Codex-managed Bedrock API key from
    credentials supplied by AWS. The app UI needs that distinction to render
    the appropriate account state without duplicating provider-auth logic.
    
    Credential-source selection belongs to the Bedrock model provider
    because it already owns the precedence between managed Bedrock auth and
    the external AWS credential path. This builds on #27443 and #27689.
    
    ## What changed
    
    - Added `AmazonBedrockCredentialSource` with `codexManaged` and
    `awsManaged` values.
    - Included the selected credential source in
    `ProviderAccount::AmazonBedrock` and the app-server `Account` response.
    - Made `AmazonBedrockModelProvider::account_state()` classify the source
    from its managed-auth state.
    - Regenerated the app-server JSON and TypeScript schemas.
    - Updated app-server account documentation and downstream TUI matches.
    
    `codexManaged` means the provider found a managed Bedrock API key.
    `awsManaged` identifies the provider's external AWS credential path; it
    does not assert that the AWS credential chain has been validated.
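    
    A minimal sketch of the classification, assuming the provider's
    managed-auth state can be reduced to "a managed API key is present or
    not". The `wire_value` and `classify` helpers are hypothetical; the
    real type is serialized through the generated app-server schema.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AmazonBedrockCredentialSource {
        CodexManaged,
        AwsManaged,
    }
    
    impl AmazonBedrockCredentialSource {
        /// Wire values as listed in the description above.
        fn wire_value(self) -> &'static str {
            match self {
                Self::CodexManaged => "codexManaged",
                Self::AwsManaged => "awsManaged",
            }
        }
    }
    
    /// A managed key takes precedence; otherwise fall back to the external
    /// AWS credential path (which is not validated here).
    fn classify(managed_api_key: Option<&str>) -> AmazonBedrockCredentialSource {
        match managed_api_key {
            Some(_) => AmazonBedrockCredentialSource::CodexManaged,
            None => AmazonBedrockCredentialSource::AwsManaged,
        }
    }
    ```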
    
    ## Testing
    
    - Added model-provider coverage for Codex-managed precedence and
    AWS-managed fallback.
    - Added app-server protocol serialization coverage for both wire values.
    - Added app-server integration coverage for both `account/read`
    responses.
    - `just test -p codex-protocol -p codex-model-provider -p
    codex-app-server-protocol` (497 tests passed).
    
    After rebasing onto #27711, the `codex-app-server` test target compiled
    past the image-generation `PathUri` migration. Local linking was then
    interrupted by disk exhaustion (`No space left on device`).
  • [codex] Record external agent import results (#28396)
    ## Summary
    - restore `externalAgentConfig/import/progress` notifications while
    keeping `externalAgentConfig/import/completed` as the must-deliver event
    - persist completed external-agent config imports in state DB by
    `importId`, including concrete success/failure details for config,
    AGENTS.md, skills, plugins, MCP servers, subagents, hooks, commands, and
    sessions
    - add `externalAgentConfig/import/readHistories` so clients can recover
    persisted import results after missing the live completion notification
    - include `errorType` on import failures in protocol
    responses/notifications and persisted DB JSON so future code can
    classify failures without another wire/storage shape change
    
    ## Validation
    - `git diff --check`
    - `just test -p codex-state external_agent_config_imports`
    - `just test -p codex-app-server-protocol`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-sqlite-read-details
    just test -p codex-app-server
    external_agent_config_import_sends_completion_notification_for_sync_only_import`
    
    Also ran earlier broader checks before publishing:
    - `just test -p codex-state`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-external-agent-test-sqlite
    just test -p codex-app-server external_agent_config`
    - `just test -p codex-external-agent-migration`
  • [codex] Use local environment for user shell commands (#28163)
    ## Why
    
    User shell commands still read the legacy turn cwd and session shell
    even though execution context is now owned by selected turn
    environments. App-server also defines `thread/shellCommand` as a
    local-host escape hatch, so it must use an available local environment
    even when a remote environment is primary.
    
    ## What changed
    
    - Add `ResolvedTurnEnvironments::local()` to find the selected local
    environment.
    - Resolve the user shell command cwd and shell from that local
    `TurnEnvironment`.
    - Emit the standard `shell is unavailable in this session` error when no
    selected local environment or resolved local shell is available.
    - Add an integration test covering `/shell` without a local environment.
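    
    The lookup and error path can be sketched as follows. The struct
    fields and the `user_shell_context` helper are hypothetical
    simplifications; only `ResolvedTurnEnvironments::local()` and the
    error string come from the description above.
    
    ```rust
    struct TurnEnvironment {
        is_local: bool,
        cwd: String,
        shell: Option<String>,
    }
    
    struct ResolvedTurnEnvironments {
        selected: Vec<TurnEnvironment>,
    }
    
    impl ResolvedTurnEnvironments {
        /// First selected environment that is local, even if a remote
        /// environment is primary.
        fn local(&self) -> Option<&TurnEnvironment> {
            self.selected.iter().find(|env| env.is_local)
        }
    }
    
    const SHELL_UNAVAILABLE: &str = "shell is unavailable in this session";
    
    /// Resolve `(cwd, shell)` for a user shell command, or the standard error
    /// when there is no local environment or no resolved local shell.
    fn user_shell_context(envs: &ResolvedTurnEnvironments) -> Result<(&str, &str), &'static str> {
        let env = envs.local().ok_or(SHELL_UNAVAILABLE)?;
        let shell = env.shell.as_deref().ok_or(SHELL_UNAVAILABLE)?;
        Ok((env.cwd.as_str(), shell))
    }
    ```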
    
    ## Test plan
    
    - `just test -p codex-core
    user_shell_command_without_local_environment_emits_error`
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
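    
    To illustrate the kind of rewrite involved, the pair below shows a
    verbose panic-based unwrap next to its `expect` equivalent. The
    functions are made-up examples, not code from the PR.
    
    ```rust
    // At each integration-test crate root, the change described above adds:
    //     #![allow(clippy::expect_used)]
    
    /// Before: manual panic-based unwrap used to sidestep the lint.
    fn parse_port_verbose(raw: &str) -> u16 {
        raw.parse::<u16>()
            .unwrap_or_else(|err| panic!("port should parse: {err}"))
    }
    
    /// After: the same intent expressed with `expect`.
    fn parse_port(raw: &str) -> u16 {
        raw.parse::<u16>().expect("port should parse")
    }
    ```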
  • [codex] Add interruptible sleep tool (#28429)
    ## Why
    
    Models sometimes need to pause briefly while waiting for external work,
    but using a shell command for that delay ties the wait to a process and
    does not naturally resume when new turn input arrives.
    
    ## What changed
    
    - add a built-in `sleep` tool behind the under-development `sleep_tool`
    feature
    - accept a bounded `duration_ms` argument, matching the millisecond
    convention used by unified exec
    - end the sleep early when either steered user input or mailbox input
    arrives
    - include elapsed wall-clock time in completed and interrupted outputs
    - emit a dedicated core `SleepItem` through `item/started` and
    `item/completed`
    - expose the sleep item as app-server v2 `ThreadItem::Sleep` and retain
    it in reconstructed thread history
    - regenerate the configuration schema for the new feature flag
    - regenerate app-server JSON and TypeScript schema fixtures
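    
    The core idea of an interruptible, bounded sleep can be sketched with
    a standard-library channel: wait for new input with a timeout and
    report how long the wait actually lasted. This is a synchronous
    illustration under stated assumptions, not the tool's implementation;
    `MAX_SLEEP_MS`, `SleepOutcome`, and `interruptible_sleep` are
    hypothetical, and the real bound is not given above.
    
    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::{Duration, Instant};
    
    /// Hypothetical upper bound on `duration_ms`.
    const MAX_SLEEP_MS: u64 = 60_000;
    
    #[derive(Debug, PartialEq)]
    enum SleepOutcome {
        Completed,
        Interrupted,
    }
    
    /// Sleep for up to `duration_ms`, ending early if any new input arrives.
    /// Returns the outcome together with the elapsed wall-clock time.
    fn interruptible_sleep(duration_ms: u64, new_input: &Receiver<()>) -> (SleepOutcome, Duration) {
        let started = Instant::now();
        let timeout = Duration::from_millis(duration_ms.min(MAX_SLEEP_MS));
        let outcome = match new_input.recv_timeout(timeout) {
            Ok(()) => SleepOutcome::Interrupted,
            Err(RecvTimeoutError::Timeout) => SleepOutcome::Completed,
            Err(RecvTimeoutError::Disconnected) => {
                // No sender remains, so nothing can interrupt: finish the wait.
                std::thread::sleep(timeout.saturating_sub(started.elapsed()));
                SleepOutcome::Completed
            }
        };
        (outcome, started.elapsed())
    }
    ```
    
    Unlike a shell `sleep`, the wait here is not tied to a child process,
    and input that is already queued ends it immediately.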
    
    ## Test plan
    
    - `just test -p codex-core sleep_tool_follows_feature_gate`
    - `just test -p codex-core any_new_input_interrupts_sleep`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    sleep_emits_started_and_completed_items`
  • [codex] Bind shell snapshots to retained thread environments (#28421)
    ## Why
    
    Shell snapshots are currently session-scoped even though shell and cwd
    are properties of a selected turn environment. That makes snapshot
    refresh depend on separate session-cwd plumbing, prevents retained
    environments from retaining their snapshot work, and can make snapshot
    construction use a different shell than command execution.
    
    This follows #27955 by making the retained thread-environment service
    own environment snapshot lifecycles. Session configuration remains the
    requested selection state, while `ThreadEnvironments` remains the source
    of successfully resolved environments.
    
    ## What changed
    
    - Configure the shell-snapshot builder before initial environment
    resolution.
    - Start each local environment snapshot task when its `TurnEnvironment`
    is built and retain that shared task while environment ID and cwd still
    match.
    - Inherit retained environment snapshots into spawned child threads.
    - Carry the selected `TurnEnvironment` through shell runtimes so
    snapshot construction and command execution use the same
    environment-specific shell and cwd.
    - Load project instructions and warm plugins/skills after initial
    environment resolution.
    - Continue decoding invalid UTF-8 instruction files lossily without
    emitting a startup warning.
    - Keep requested selections in `SessionConfiguration`; failed or
    duplicate resolutions only affect the resolved environment snapshot.
    
    ## Validation
    
    - `cargo check -p codex-core --tests`
    - `just test -p codex-home instructions` (6 passed)
    - Focused environment, instruction, shell-snapshot, and user-shell tests
    (84 passed)
    - Focused shell-snapshot, user-shell, and unified-exec tests (126
    passed; two event-timing tests passed on retry)
  • Use ApiPathString in app-server filesystem permission paths (#28367)
    ## Why
    
    Clients running an app-server on one OS and an exec-server on another OS
    need to be able to pass sandbox config to app-server that refers to
    resources on the executor's foreign OS.
    
    ## What
    
    `AbsolutePathBuf` can't represent these paths and we don't want users to
    be exposed to `PathUri` yet, so this moves the public app-server API to
    be expressed in terms of `ApiPathString`.
    
    Stacked on #28165.
    
    - change app-server v2 filesystem permission paths, including legacy
    read/write roots, to `ApiPathString`
    - localize API paths through `PathUri` when converting into the current
    native core permission types
    - make path-bearing permission conversions fallible and surface
    localization failures instead of silently treating malformed grants as
    ordinary denials
    - propagate conversion failures through app-server and TUI approval
    handling
    - regenerate the app-server JSON and TypeScript schemas
    - leave migration TODOs on native-path conversions so they can be
    removed once core permission paths use `PathUri`
  • [codex] Make plugin details capability aware (#27958)
    ## Summary
    
    Makes plugin details/read flows capability-aware so auth-filtered plugin
    surfaces report the same usable app/MCP/skill shape as the marketplace
    and install flows.
    
    ## Validation
    
    Not run; this change was rebased onto the current plugin auth stack and
    pushed as a draft PR.
    
    **Manual test**
    1. Set up a local marketplace with a plugin that has both app and MCP
    declarations:
    
    ```
    // .app.json
    {
      "apps": {
        "linear": {
          "id": "some_id"
        }
      }
    }
    
    ```
    
    ```
    // .mcp.json
    {
      "mcpServers": {
        "linear": {
          "type": "http",
          "url": "https://mcp.linear.app/mcp",
          "oauth_resource": "https://mcp.linear.app/mcp"
        },
        "linear2": {
          "type": "http",
          "url": "https://mcp.linear2.app/mcp",
          "oauth_resource": "https://mcp.linear2.app/mcp"
        }
      }
    }
    ```
    
    2a. **Log in with an API key** and observe the plugin details page, which
    shows no apps. (Note: we don't show "app not available due to API key
    login" because there is no way to differentiate between "no apps" and
    "app exists without a substitute MCP" without significantly more code
    changes. I've separated this into a follow-up if we want that behaviour.)
    <img width="1170" height="279" alt="Screenshot 2026-06-15 at 23 45 40"
    src="https://github.com/user-attachments/assets/d36cb160-fbec-461e-9643-9c761dbae7bb"
    />
    <img width="975" height="640" alt="Screenshot 2026-06-15 at 18 40 30"
    src="https://github.com/user-attachments/assets/90ec0bc8-7506-4b90-bbd3-070720de799e"
    />
    
    
    2b. **Log in with chat** and observe the intended conflict-resolution logic.
    <img width="1165" height="224" alt="Screenshot 2026-06-15 at 17 17 30"
    src="https://github.com/user-attachments/assets/80adfbf2-7dac-4f08-8b76-8eeeab6c95e7"
    />
    <img width="968" height="567" alt="Screenshot 2026-06-15 at 18 38 59"
    src="https://github.com/user-attachments/assets/9ea92c5e-535b-4aa4-8ad0-ee513b57bc3c"
    />
  • [codex] Load API curated marketplace by auth (#28383)
    ## Summary
    - choose the local OpenAI curated marketplace manifest based on auth:
    Codex backend auth gets the existing marketplace, direct provider auth
    gets `api_marketplace.json`
    - include Bedrock API key auth in the direct-provider API marketplace
    path
    - safely skip the API marketplace when `api_marketplace.json` is absent
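    
    The selection rule in the summary can be sketched as a pure function.
    The `AuthKind` enum and `curated_manifest` helper are hypothetical,
    and `marketplace.json` as the name of the existing manifest is taken
    from the manual-testing notes below rather than from the code.
    
    ```rust
    #[derive(Clone, Copy)]
    enum AuthKind {
        CodexBackend,
        ApiKey,
        BedrockApiKey,
    }
    
    /// Pick the curated marketplace manifest for the current auth, or `None`
    /// when the API marketplace applies but its manifest file is absent.
    fn curated_manifest(auth: AuthKind, api_manifest_exists: bool) -> Option<&'static str> {
        match auth {
            AuthKind::CodexBackend => Some("marketplace.json"),
            AuthKind::ApiKey | AuthKind::BedrockApiKey if api_manifest_exists => {
                Some("api_marketplace.json")
            }
            AuthKind::ApiKey | AuthKind::BedrockApiKey => None,
        }
    }
    ```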
    
    ## Validation
    - `just fmt`
    - `git diff --check origin/main...HEAD`
    - CI should run the full validation
    
    ## Manual Testing
    
    ### - New API marketplace not available for API key sign-in
    1. Nothing from the API marketplace is displayed, and nothing breaks.
    <img width="1161" height="289" alt="Screenshot 2026-06-15 at 21 37 43"
    src="https://github.com/user-attachments/assets/a5f16642-8a20-4ac1-a0de-1274a4c7b5b2"
    />
    
    ### - New api marketplace for API key sign in
    1. Set up `api_marketplace.json`
    ```
    {
      "name": "openai-curated",
      "interface": {
        "displayName": "Codex official"
      },
      "plugins": [
        {
          "name": "linear",
          "source": {
            "source": "local",
            "path": "./plugins/linear"
          },
          "policy": {
            "installation": "AVAILABLE",
            "authentication": "ON_INSTALL"
          },
          "category": "Productivity"
        }
      ]
    }
    ```
    
    2. Log in with API key, observe that only the defined plugin from
    api_marketplace.json is available from "Codex Official" (outside of
    local testing marketplaces)
    <img width="1167" height="446" alt="Screenshot 2026-06-15 at 21 16 53"
    src="https://github.com/user-attachments/assets/7cf61477-d826-4ef6-bc05-0a23ac1c0259"
    />
    
    Also checked functionality in the Codex app.
    
    ### - SiWC users 
    Still uses 'default' marketplace.json and renders all plugins
    <img width="1171" height="502" alt="Screenshot 2026-06-15 at 21 40 25"
    src="https://github.com/user-attachments/assets/d212ea9b-0aa5-470b-8ea4-450efe65bb2b"
    />
    
    Also checked functionality in the Codex app.
    
    
    ## Notes
    - `just test -p codex-core-plugins` was started locally before splitting
    branches, but I stopped relying on local tests per follow-up and left
    final validation to PR CI.
  • exec-server: default remote transport to Noise (#26245)
    ## Why
    
    The transport in
    [openai/codex#26242](https://github.com/openai/codex/pull/26242) needs
    to be used by every remote orchestrator-to-executor connection before
    JSON-RPC traffic starts.
    
    ## Changes
    
    - Generates one executor Noise identity when remote exec-server starts
    and registers its public key.
    - Creates a harness identity for each physical remote environment
    connection.
    - Fetches a fresh registry bundle before connecting and validates the
    authenticated harness key before completing the executor handshake.
    - Multiplexes encrypted logical streams over the existing executor
    WebSocket.
    - Adds bounded stream, handshake-failure, and reassembly state.
    - Adds safe lifecycle diagnostics without logging keys, authorizations,
    plaintext, or ciphertext.
    - Covers reconnects, replay rejection, validation failure, framing
    limits, and encrypted JSON-RPC tool traffic.
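
    As an illustration of what "bounded stream and reassembly state" can
    mean, here is a minimal sketch of a per-stream reassembler with hard
    limits. It is not the PR's implementation: the type names, the fragment
    layout (`stream_id`, payload, final flag), and both limits are
    assumptions, and encryption is left out entirely.

    ```rust
    use std::collections::HashMap;

    /// Illustrative limits; the real values are not stated in the PR description.
    const MAX_STREAMS: usize = 4;
    const MAX_MESSAGE_BYTES: usize = 16;

    #[derive(Debug, PartialEq, Eq)]
    pub enum ReassemblyError {
        TooManyStreams,
        MessageTooLarge,
    }

    /// Collects fragments per logical stream and yields a whole message
    /// when the final fragment arrives.
    #[derive(Default)]
    pub struct Reassembler {
        partial: HashMap<u32, Vec<u8>>,
    }

    impl Reassembler {
        pub fn push(
            &mut self,
            stream_id: u32,
            fragment: &[u8],
            is_final: bool,
        ) -> Result<Option<Vec<u8>>, ReassemblyError> {
            // Refuse to track a new stream once the table is full.
            if !self.partial.contains_key(&stream_id) && self.partial.len() >= MAX_STREAMS {
                return Err(ReassemblyError::TooManyStreams);
            }
            let buf = self.partial.entry(stream_id).or_default();
            if buf.len() + fragment.len() > MAX_MESSAGE_BYTES {
                // Drop the partial state so an oversized message cannot pin memory.
                self.partial.remove(&stream_id);
                return Err(ReassemblyError::MessageTooLarge);
            }
            buf.extend_from_slice(fragment);
            if is_final {
                // Hand the completed message to the caller and free the slot.
                return Ok(self.partial.remove(&stream_id));
            }
            Ok(None)
        }
    }
    ```

    Both limits fail closed: a peer that opens too many streams or sends an
    oversized message gets an error, and memory use stays capped at roughly
    `MAX_STREAMS * MAX_MESSAGE_BYTES`.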
    
    ## Stack
    
    1. [openai/codex#26242](https://github.com/openai/codex/pull/26242):
    Noise channel and relay transport
    2. **[openai/codex#26245](https://github.com/openai/codex/pull/26245)**:
    remote registration and runtime activation
    
    ## Verification
    
    - `just test -p codex-exec-server`
    - `just fix -p codex-exec-server`
    - `just bazel-lock-check`
    - `cargo shear`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Run core integration tests against a Wine-backed Windows executor (#28401)
    ## Why
    
    We want to exercise a Linux app-server against a Windows exec-server
    without having to repeat every test case. This approach has some
    precedent in the remote Docker test setup.
    
    ## What
    
    Run the shared `codex-core` integration suite against Windows
    exec-server behavior from Linux. This makes cross-OS path and shell
    regressions visible while keeping unsupported cases owned by individual
    tests.
    
    - Add `local`, `docker`, and `wine-exec` test environment selection with
    legacy Docker compatibility.
    - Extend `codex_rust_crate` to generate a sharded Wine-exec variant
    using a cross-built Windows server and pinned Bazel Wine/PowerShell
    runtimes.
    - Teach remote-aware helpers about Windows paths and track temporary
    incompatibilities with source-local `skip_if_wine_exec!` calls and
    follow-up reasons.
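
    The environment selection can be pictured with a small sketch. Only the
    three names `local`, `docker`, and `wine-exec` come from the description
    above; the `TestEnvironment` type and `should_skip` helper are
    hypothetical, and the legacy Docker compatibility path and the real
    `skip_if_wine_exec!` macro are not modelled.

    ```rust
    use std::str::FromStr;

    /// Hypothetical representation of the selectable test environments.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum TestEnvironment {
        Local,
        Docker,
        WineExec,
    }

    impl FromStr for TestEnvironment {
        type Err = String;

        fn from_str(s: &str) -> Result<Self, Self::Err> {
            match s {
                "local" => Ok(Self::Local),
                "docker" => Ok(Self::Docker),
                "wine-exec" => Ok(Self::WineExec),
                other => Err(format!("unknown test environment: {other}")),
            }
        }
    }

    /// Hypothetical stand-in for a per-test skip decision: a test that is
    /// known to be incompatible is skipped only under the Wine executor.
    pub fn should_skip(env: TestEnvironment, unsupported_on_wine: bool) -> bool {
        env == TestEnvironment::WineExec && unsupported_on_wine
    }
    ```

    Keeping the skip decision next to each test, instead of in a central
    deny list, matches the stated goal of leaving unsupported cases owned by
    individual tests.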
  • Preserve hook trust bypass in codex exec threads (#26434)
    Addresses #26383 and #26452
    
    ## Summary
    
    `codex exec --dangerously-bypass-hook-trust` printed the bypass warning,
    but valid untrusted hooks still did not run.
    
    Exec applied the flag to its initial config, then lost it when
    app-server reloaded config for the new or resumed thread.
    
    ## Fix
    
    Forward `bypass_hook_trust: true` through the existing thread request
    config override for both start and resume.
    
    The override is omitted when the flag is not enabled, preserving normal
    trust behavior.
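
    A minimal sketch of the forwarding rule, assuming the overrides are a
    simple key/value map: the `ConfigOverrides` alias and
    `thread_config_overrides` function are hypothetical, and only the
    `bypass_hook_trust` key comes from the description above.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical shape for the config overrides carried on a thread
    /// start or resume request.
    pub type ConfigOverrides = BTreeMap<String, bool>;

    /// Builds the overrides for a thread request. The key is present only
    /// when the bypass flag is enabled, so the default trust behavior is
    /// untouched otherwise.
    pub fn thread_config_overrides(bypass_hook_trust: bool) -> ConfigOverrides {
        let mut overrides = ConfigOverrides::new();
        if bypass_hook_trust {
            overrides.insert("bypass_hook_trust".to_string(), true);
        }
        overrides
    }
    ```

    Calling the same builder for both start and resume is what keeps the
    flag from being lost when config is reloaded for the thread.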
    
    ## Testing
    
    Added:
    
    - A test confirming start and resume preserve the override.
    - An end-to-end exec test confirming a `SessionStart` hook runs and
    creates a marker file.