Commit Graph

7488 Commits

  • feat(core): add metadata field to ResponseItem (#28355)
    ## Description
    
    This PR adds an optional `metadata` field to `ResponseItem` for
    Responses API calls. It is mechanical plumbing only: no values are
    populated or sent yet. Even so, adding a new field to `ResponseItem`
    already has a large blast radius.
    
    This change is backwards compatible because `metadata` is optional and
    omitted when absent, so existing response items and rollout history
    without it still deserialize and requests that do not set it keep the
    same wire shape. For provider compatibility, we strip out `metadata`
    before non-OpenAI Responses requests so Azure and AWS Bedrock never see
    this field.
    
    My follow-up PR will use it to start storing and
    passing along `turn_id`: https://github.com/openai/codex/pull/28360
    
    ## What changed
    
    - Added `ResponseItemMetadata` with optional `turn_id`, plus optional
    `metadata` on Responses API item variants and inter-agent communication.
    - Preserved item metadata through response-item rewrites such as
    truncation, missing tool-output synthesis, compaction history
    rebuilding, visible-history conversion, rollout/resume, and generated
    app-server schemas/types.
    - Stripped item metadata from non-OpenAI Responses requests while
    preserving it for OpenAI-shaped requests.
    - Updated the mechanical fixture/test construction churn required by the
    new optional field.
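    
    The provider-compatibility rule above can be sketched as follows. This
    is a minimal illustration, not the code from the PR: the struct layouts,
    the `text` field, and the `strip_metadata_for_provider` helper are
    hypothetical stand-ins, and only the names `ResponseItem`,
    `ResponseItemMetadata`, `metadata`, and `turn_id` come from the
    description.
    
    ```rust
    // Hypothetical, simplified stand-ins for the real types.
    #[derive(Clone, Debug, PartialEq)]
    struct ResponseItemMetadata {
        turn_id: Option<String>,
    }
    
    #[derive(Clone, Debug, PartialEq)]
    struct ResponseItem {
        text: String,
        // Optional, so items recorded before this change still fit the type.
        metadata: Option<ResponseItemMetadata>,
    }
    
    /// Clears item metadata unless the request is OpenAI-shaped.
    /// (Assumed helper name and signature, for illustration only.)
    fn strip_metadata_for_provider(items: &mut [ResponseItem], openai_shaped: bool) {
        if openai_shaped {
            return;
        }
        for item in items.iter_mut() {
            item.metadata = None;
        }
    }
    ```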
  • feat(app-server): expose rate-limit reset credits (#28143)
    ## Why
    
    Codex users can earn personal rate-limit reset credits, but app-server
    clients do not currently have an API for reading or redeeming them. This
    adds the backend and protocol foundation used by the `/usage` TUI flow
    in #28154.
    
    ## What changed
    
    - Extend `account/rateLimits/read` with a nullable
    `rateLimitResetCredits` summary sourced from the existing usage
    response.
    - Add backend-client and app-server support for consuming a reset with a
    caller-generated idempotency key. A UUID is recommended, and clients
    reuse the same key when retrying the same logical reset.
    - Return only the consume `outcome`; clients refetch
    `account/rateLimits/read` for updated window state.
    - Document the response field and each consume outcome, and regenerate
    the JSON and TypeScript schema fixtures.
    - Clarify in `AGENTS.md` that new app-server string enum values use
    camelCase on the wire.
    - Update the existing TUI response fixture for the expanded protocol
    shape.
    - Add coverage for authentication, response mapping, backend failures,
    consume outcomes, and request timeout behavior.
    
    ## Validation
    
    - `just test -p codex-app-server-protocol` — 231 passed.
    - `just test -p codex-backend-client` — 14 passed.
    - Focused `codex-app-server` reset-credit tests — 5 passed.
    - Focused `codex-tui` protocol response fixture test — passed.
    - `just fix -p codex-backend-client -p codex-app-server-protocol -p
    codex-app-server` — passed.
    - `just fmt` — passed.
  • core: cache the tool search handler per session (#27258)
    ## Why
    
    Tool router construction rebuilds the deferred-tool BM25 index during
    session initialization and before each sampling continuation, even when
    the searchable tool metadata is unchanged. Local profiling measured
    `append_tool_search_executor` at roughly 113 ms per continuation, making
    repeated index construction the largest measured router-building cost.
    
    ## What changed
    
    - Add a session-scoped `ToolSearchHandlerCache` so continuations and
    user turns can reuse the existing handler.
    - Key reuse on the complete ordered `Vec<ToolSearchInfo>`, rebuilding
    when searchable text, loadable tool specs, source metadata, or ordering
    changes.
    - Build handlers outside the cache lock and recheck before publishing
    them, avoiding holding the mutex during index construction.
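    
    The build-outside-the-lock pattern described above might look roughly
    like this. It is a sketch under assumptions: the field layouts, the
    `get_or_build` method, and the placeholder `ToolSearchHandler` (which
    stands in for the BM25 index) are hypothetical; only
    `ToolSearchHandlerCache` and `ToolSearchInfo` are names from the PR.
    
    ```rust
    use std::sync::{Arc, Mutex};
    
    // Hypothetical stand-in for the real searchable tool metadata.
    #[derive(Clone, Debug, PartialEq, Eq)]
    struct ToolSearchInfo {
        name: String,
        searchable_text: String,
    }
    
    // Placeholder for the handler that owns the search index.
    struct ToolSearchHandler {
        indexed_tools: usize,
    }
    
    #[derive(Default)]
    struct ToolSearchHandlerCache {
        // The key is the complete ordered list, so any change rebuilds.
        entry: Mutex<Option<(Vec<ToolSearchInfo>, Arc<ToolSearchHandler>)>>,
    }
    
    impl ToolSearchHandlerCache {
        fn get_or_build(&self, infos: &[ToolSearchInfo]) -> Arc<ToolSearchHandler> {
            {
                // Fast path: reuse the handler when the ordered key matches.
                let guard = self.entry.lock().unwrap();
                if let Some((key, handler)) = guard.as_ref() {
                    if key.as_slice() == infos {
                        return Arc::clone(handler);
                    }
                }
            } // The lock is released before the expensive build.
    
            let built = Arc::new(ToolSearchHandler { indexed_tools: infos.len() });
    
            // Recheck before publishing: another caller may have built the
            // same handler while this one was working without the lock.
            let mut guard = self.entry.lock().unwrap();
            if let Some((key, handler)) = guard.as_ref() {
                if key.as_slice() == infos {
                    return Arc::clone(handler);
                }
            }
            *guard = Some((infos.to_vec(), Arc::clone(&built)));
            built
        }
    }
    ```
    
    Comparing the whole ordered `Vec` costs time linear in the number of
    tools, which is small next to rebuilding the index.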
    
    ## Verification
    
    - `cache_reuses_identical_search_infos_and_rebuilds_changed_inputs`
    covers exact cache reuse and invalidation when the ordered search
    metadata changes.
    - Local rollout profiling showed the initial router build populating the
    cache and unchanged later continuations reusing it:
      - uncached: 118 ms median across 14 spans from 3 rollouts
      - cached: 4 ms median across 12 spans from 3 rollouts
  • Add hidden Windows sandbox wrapper entrypoint (#28358)
    ## Why
    
    This is the second PR in the Windows fs-helper sandbox stack. The
    fs-helper path needs a Windows sandbox launcher that has the same
    argv-shaped contract as macOS `sandbox-exec` and `codex-linux-sandbox`,
    but this PR only introduces that hidden launcher. It does not route
    fs-helper through it yet.
    
    The hidden launcher still needs to be policy-complete before later
    direct-spawn callers use it. In particular, it has to carry the same
    Windows sandbox policy details that the existing spawn paths already
    understand: proxy enforcement, read/write root overrides, and
    deny-read/deny-write overrides.
    
    ## What Changed
    
    - Added the hidden `codex.exe --run-as-windows-sandbox` arg1 dispatch
    path.
    - Added `windows-sandbox-rs/src/wrapper.rs`, which parses the wrapper
    argv, launches the requested command through the shared Windows sandbox
    session runner from PR1, and forwards stdio.
    - Added `create_windows_sandbox_command_args_for_permission_profile()`
    so later direct-spawn callers can build the wrapper argv consistently.
    - Made the wrapper argv round-trip the full Windows sandbox policy
    surface it needs later: workspace roots, environment, permission
    profile, sandbox level, private desktop, proxy enforcement, read/write
    root overrides, and deny-read/deny-write overrides.
    - Carried `proxy_enforced` through the shared Windows session request so
    proxy-managed executions continue to use the offline/elevated sandbox
    identity.
    - Added wrapper argument round-trip coverage for the full policy fields.
    
    ## Verification
    
    - `just test -p codex-windows-sandbox windows_wrapper_args_round_trip`
    - `just test -p codex-arg0`
    - `just test -p codex-core exec::tests::windows_`
    - `just fix -p codex-windows-sandbox -p codex-core -p codex-cli`
    
    Local note: the full `just fmt` command still fails on this workstation
    in non-Rust formatter setup (`uv` cache access denied and missing
    `dotslash`/buildifier), but the Rust formatter phase completed.
  • Add Windows unified exec yield floor (#27086)
    ## Why
    
    The Windows `unified_exec` experiment regressed at the turn level in a
    way that points to premature backgrounding / extra command cycles rather
    than individual responses getting heavier:
    
    - `codex_local_tool_calls_per_turn` was up about 20.7%.
    - `codex_local_blended_tokens_per_turn` was up about 4.1%, and
    `codex_local_output_tokens_per_turn` was up about 4.0%.
    - `codex_local_response_latency_per_turn` was up about 8.3%.
    - The primary activity metrics also moved down: `codex_turns` about
    -6.6%, `codex_dau` about -1.0%, and `codex_local_hourly_active_users`
    about -3.0%.
    
    At the same time, the per-response metrics moved in the other direction:
    blended tokens per response, output tokens per response, and latency per
    response were all lower in test. That suggests the bad turn-level shape
    is largely about extra tool/model cycles, not each response being slower
    or more expensive on its own.
    
    Local Windows benchmarking showed the likely mechanism: shell-wrapped
    commands pay a large PowerShell startup/teardown tax before the actual
    command has much time to run. In the benchmark, the PowerShell wrapper
    added roughly 0.7-1.0s versus direct exec:
    
    - Windows PowerShell: about 740ms p50 / 800ms p90 overhead versus direct
    exec.
    - PowerShell 7 (`pwsh`): about 930ms p50 / 980ms p90 overhead versus
    direct exec.
    
    The model commonly asks for a 1s initial yield. On Windows, that can
    spend nearly the whole window waiting on PowerShell machinery, so
    otherwise-short commands are more likely to return as background
    sessions and require follow-up polling/tool calls.
    
    This is intentionally a temporary unlock. It gives Windows closer to the
    same useful post-shell command window as other platforms while we work
    on reducing the PowerShell tax directly, for example with persistent
    PowerShell workers or conservative direct-exec paths for commands that
    do not need shell semantics.
    
    ## What changed
    
    - Adds a Windows-only 2s floor to `unified_exec`'s initial
    `yield_time_ms` clamp.
    - Keeps larger model-requested waits unchanged, including the existing
    10s default.
    - Keeps the existing 30s max clamp.
    - Leaves non-Windows behavior unchanged.
    - Adds platform-gated tests for both the Windows floor and the
    non-Windows clamp behavior.
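    
    As an illustration of the clamp described above, here is a minimal
    sketch. The function name, the `is_windows` parameter (used instead of a
    platform `cfg` so both branches can be exercised anywhere), and
    `base_min_ms` are assumptions; the description does not state the
    pre-existing lower bound, so it is left as a parameter. Treating 10s as
    the value used when no wait is requested is also an assumption.
    
    ```rust
    // Values stated in the PR description.
    const WINDOWS_MIN_YIELD_MS: u64 = 2_000;
    const MAX_YIELD_MS: u64 = 30_000;
    const DEFAULT_YIELD_MS: u64 = 10_000;
    
    /// Clamps the initial yield time.
    /// `base_min_ms` is whatever lower bound already applies on all
    /// platforms (hypothetical parameter; its value is not in the PR).
    fn clamp_initial_yield_ms(requested: Option<u64>, base_min_ms: u64, is_windows: bool) -> u64 {
        let floor = if is_windows {
            base_min_ms.max(WINDOWS_MIN_YIELD_MS)
        } else {
            base_min_ms
        };
        // Raise to the floor first, then cap at the 30s maximum.
        requested
            .unwrap_or(DEFAULT_YIELD_MS)
            .max(floor)
            .min(MAX_YIELD_MS)
    }
    ```
    
    With a 1s request, the Windows branch returns 2s while the non-Windows
    branch leaves the request at 1s (given a `base_min_ms` below 1s).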
    
    ## Verification
    
    - `just test -p codex-core unified_exec`
  • recover stale Windows sandbox credentials (#27944)
    ## Why
    
    The elevated Windows sandbox persists dedicated sandbox account
    credentials so later commands can launch without reprovisioning. If
    those persisted credentials drift from the actual Windows account
    password, `CreateProcessWithLogonW` fails with `ERROR_LOGON_FAILURE` and
    Codex currently surfaces that as a hard runner launch failure.
    
    This change makes that failure self-healing. When Windows specifically
    rejects the sandbox login, Codex now treats the persisted sandbox
    credentials as stale, regenerates them through the existing setup path,
    and retries the runner launch once.
    
    ## What Changed
    
    - Preserve `CreateProcessWithLogonW` failures as a typed runner logon
    error so callers can distinguish `ERROR_LOGON_FAILURE` from unrelated
    launch failures.
    - Add a sandbox credential refresh helper that deletes the persisted
    `sandbox_users.json` record and reuses `require_logon_sandbox_creds()`
    to reprovision credentials through the established setup flow.
    - Retry elevated runner startup after stale-credential failures in both
    the legacy elevated capture path and unified exec elevated backend.
    - Add focused tests for stale logon failure detection and persisted
    sandbox user file removal.
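    
    The retry-once flow above can be sketched as below. The `LaunchError`
    enum, the closures, and `launch_with_refresh` are hypothetical names for
    illustration, not the PR's actual types. The one external fact used is
    that `ERROR_LOGON_FAILURE` is Win32 error code 1326.
    
    ```rust
    // Win32 error code for "the user name or password is incorrect".
    const ERROR_LOGON_FAILURE: u32 = 1326;
    
    // Hypothetical typed error: logon failures keep their Win32 code so
    // callers can tell them apart from unrelated launch failures.
    #[derive(Debug, PartialEq)]
    enum LaunchError {
        Logon { code: u32 },
        Other(String),
    }
    
    fn is_stale_credentials(err: &LaunchError) -> bool {
        matches!(err, LaunchError::Logon { code } if *code == ERROR_LOGON_FAILURE)
    }
    
    /// Launches once; on a stale-credential failure, refreshes the
    /// credentials and retries exactly one more time.
    fn launch_with_refresh<T>(
        mut launch: impl FnMut() -> Result<T, LaunchError>,
        mut refresh_credentials: impl FnMut() -> Result<(), LaunchError>,
    ) -> Result<T, LaunchError> {
        match launch() {
            Err(err) if is_stale_credentials(&err) => {
                refresh_credentials()?;
                launch()
            }
            other => other,
        }
    }
    ```
    
    A second logon failure after the refresh is returned to the caller, so
    the sketch cannot loop.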
    
    ## Validation
    
    - `git diff --check`
    - `cargo test -p codex-windows-sandbox`
  • [codex] Add external agent import result accounting (#28008)
    ## Why
    
    External-agent imports can complete synchronously or continue in the
    background for plugins/sessions. Clients need a stable import id to
    correlate the immediate response with the eventual completion
    notification, and the completion payload needs enough accounting to show
    which artifact types succeeded or failed without hiding partial
    failures.
    
    ## What Changed
    
    - `externalAgentConfig/import` now returns an `importId`;
    `externalAgentConfig/import/completed` includes the same `importId` plus
    type-level `itemResults`.
    - Completed `itemResults` report `successCount`, `errorCount`,
    `successes`, and `rawErrors` for each migrated item type.
    - Added protocol/schema/TypeScript types for import successes, raw
    errors, and type-level results. No progress notification is included in
    the final PR.
    - `ExternalAgentConfigService::import` now returns an outcome object
    with synchronous item results and pending plugin imports.
    - Plugin import outcomes track succeeded/failed marketplaces, plugin
    ids, and raw errors. Plugin failures can be reported in completed
    accounting while later migration items continue.
    - Non-plugin synchronous import failures still fail the request, so
    invalid config/skills-style failures are not reported as a successful
    import response.
    - Session imports now return item results. Successful imports include
    the source session path and imported thread id; prepare, persist,
    ledger, and source-validation failures become raw errors in completion
    accounting where the import can continue.
    - The request processor generates the `importId`, aggregates synchronous
    results with background plugin/session results, and sends a single
    completed notification when all selected work is done.
    - App-server docs and generated schema fixtures were updated for the new
    response/completed payload shapes.
    
    ## Validation
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server-client event_requires_delivery`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-sync-error
    just test -p codex-app-server
    external_agent_config_import_returns_error_for_failed_sync_import`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-external-agent
    just test -p codex-app-server external_agent_config`
    
    Note: local sandbox validation used `CODEX_SQLITE_HOME` because the
    default sqlite state path is read-only in this environment.
  • [mcp] Increase default tool timeout to 300 seconds (#28234)
    Summary
    - Increase the default MCP tool-call timeout from 120 to 300 seconds.
    
    Validation
    - `just test -p codex-mcp`
    - `just fmt`
  • Add request user input auto-resolution timer (#28235)
    ## Summary
    - Add TUI auto-resolution handling for `request_user_input` prompts when
    `autoResolutionMs` is present.
    - Use a 60s hidden grace period followed by a 60s visible countdown,
    then submit an empty answer response if the user does not interact.
    - Snooze auto-resolution on key or paste interaction and add
    snapshot/test coverage for the countdown UI.
    
    ## Notes
    - The TUI currently treats `autoResolutionMs` as an enable signal and
    intentionally does not use the provided duration value for the countdown
    policy.
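    
    The timing policy described above can be modeled as a small phase
    function. This is a sketch only: the enum, the function, and the
    `elapsed` parameter are hypothetical, and snoozing is not modeled
    (`elapsed` is assumed to be the time since the timer last started).
    
    ```rust
    use std::time::Duration;
    
    // Durations stated in the PR summary.
    const HIDDEN_GRACE: Duration = Duration::from_secs(60);
    const VISIBLE_COUNTDOWN: Duration = Duration::from_secs(60);
    
    #[derive(Debug, PartialEq)]
    enum AutoResolution {
        Disabled,
        HiddenGrace,
        Countdown { remaining: Duration },
        SubmitEmptyAnswer,
    }
    
    /// `auto_resolution_ms` only enables the timer; its value is ignored,
    /// matching the note above.
    fn auto_resolution_phase(auto_resolution_ms: Option<u64>, elapsed: Duration) -> AutoResolution {
        if auto_resolution_ms.is_none() {
            return AutoResolution::Disabled;
        }
        if elapsed < HIDDEN_GRACE {
            return AutoResolution::HiddenGrace;
        }
        let into_countdown = elapsed - HIDDEN_GRACE;
        if into_countdown < VISIBLE_COUNTDOWN {
            AutoResolution::Countdown {
                remaining: VISIBLE_COUNTDOWN - into_countdown,
            }
        } else {
            AutoResolution::SubmitEmptyAnswer
        }
    }
    ```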
    
    ### Auto resolution
    
    
    https://github.com/user-attachments/assets/5323152f-2ece-4aba-b75d-c32aa776f544
    
    
    ### Snooze after interaction
    
    
    https://github.com/user-attachments/assets/100d54c4-3a41-4c6c-9c07-cd28075a0d62
  • [codex] add path-types skill (#28347)
    ## Why
    
    Codex contributors and agents need repository-scoped guidance for
    choosing compatible Rust types for operating system paths during the
    ongoing URI migration. Keeping the guidance in the repository makes the
    app-server and exec-server rules available consistently without relying
    on a personal skill installation.
    
    ## What
    
    - Add the `path-types` skill at `.codex/skills/path-types/SKILL.md`.
    - Document the intended uses of `ApiPathString`, `PathUri`,
    `AbsolutePathBuf`, and `PathBuf` across protocol, internal, and shared
    dependency boundaries.
    - Keep migrations of existing types limited to explicit requests and
    proportional edits.
    
    ## Validation
    
    - Validated the skill structure with skill-creator's
    `quick_validate.py`.
  • Use aws-lc-rs for rustls crypto provider (#27706)
    ## Why
    
    Some enterprise TLS proxies issue certificate chains signed with
    `ecdsa_secp521r1_sha512` / `ECDSA_NISTP521_SHA512`. Custom CA
    configuration such as `SSL_CERT_FILE` can add the right trust root, but
    it cannot make `rustls`'s `ring` verifier support a certificate
    signature algorithm it does not advertise.
    
    That can still break TLS after the CA bundle is configured, including on
    Rust websocket paths that call the shared
    `ensure_rustls_crypto_provider()` helper, such as the Responses
    websocket connector and remote app-server client:
    
    - [`codex-api/src/endpoint/responses_websocket.rs`](https://github.com/openai/codex/blob/eddc5c75ed527a8348bfcaa85692e53189600833/codex-rs/codex-api/src/endpoint/responses_websocket.rs#L441)
    - [`app-server-client/src/remote.rs`](https://github.com/openai/codex/blob/eddc5c75ed527a8348bfcaa85692e53189600833/codex-rs/app-server-client/src/remote.rs#L718)
    
    The `aws-lc-rs` `rustls` provider supports this P-521/SHA-512
    certificate signature scheme, so use it as Codex's process-wide `rustls`
    provider.
    
    ## What Changed
    
    - Switch the workspace `rustls` feature from `ring` to `aws_lc_rs`.
    - Update `codex-utils-rustls-provider` to install
    `rustls::crypto::aws_lc_rs::default_provider()`.
    - Add an assertion and integration test that the installed provider
    supports `ECDSA_NISTP521_SHA512`.
    
    ## Verification
    
    ```shell
    just fmt
    just test -p codex-utils-rustls-provider
    just bazel-lock-update
    just bazel-lock-check
    ```
  • Extract shared Windows sandbox session runner (#28357)
    ## Why
    
    This is the first PR in a stack for the Windows fs-helper sandbox fix.
    Before changing fs-helper behavior, this pulls the reusable Windows
    sandbox session launch pieces out of the debug CLI path so later PRs can
    call the same backend selection and stdio forwarding logic.
    
    Keeping this as a pure refactor makes the later security fix easier to
    review: `codex sandbox windows` should continue to launch the same
    elevated or restricted-token backend, just through shared APIs in
    `windows-sandbox-rs` instead of code local to
    `cli/src/debug_sandbox.rs`.
    
    ## What Changed
    
    - Added `WindowsSandboxSessionRequest` and
    `spawn_windows_sandbox_session_for_level()` in `windows-sandbox-rs` to
    share the elevated-vs-legacy session launch decision.
    - Moved the Windows sandbox stdio forwarding helpers from
    `cli/src/debug_sandbox.rs` into
    `windows-sandbox-rs/src/stdio_bridge.rs`.
    - Updated `codex sandbox windows` to call the shared session launcher
    and stdio bridge.
    - Added unit coverage for the moved stdio forwarding helpers.
    
    ## Verification
    
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just test -p codex-windows-sandbox stdio_bridge::tests`
    - `just fix -p codex-windows-sandbox -p codex-sandboxing -p
    codex-exec-server -p codex-arg0 -p codex-core -p codex-file-system`
    - The new `stdio_bridge` tests also passed as part of `just test -p
    codex-windows-sandbox` on the stack tip. That full local run still fails
    in pre-existing legacy session integration tests with
    `CreateRestrictedToken failed: 87` on this workstation.
  • skills: cache orchestrator resources per thread (#28336)
    ## Why
    
    Hosted orchestrator skills are read through the remote MCP resource
    server. Within one thread, the same catalog or skill resource can be
    requested multiple times by prompt injection and the `skills.list` /
    `skills.read` tools. Re-fetching adds latency and can make those
    surfaces observe different remote contents during the same thread.
    
    This is a follow-up to #28333: orchestrator skills remain limited to
    threads without a local executor, and those threads now get a stable
    per-thread view of the remote skill data they use.
    
    ## What changed
    
    - Reuse the existing per-thread orchestrator catalog snapshot for
    `skills.list` and `skills.read` availability checks.
    - Cache successful orchestrator resource reads by authority, package,
    and resource so prompt injection and tool calls share the same contents.
    - Keep the cache memory-only and bounded to 100 resources and 8 MiB per
    thread.
    - Leave host and executor skill reads unchanged, and do not cache failed
    remote reads.
    
    ## Verification
    
    - Extended the app-server MCP resource integration test to read the same
    hosted skill resource twice and verify that the remote server receives
    one read.
    - The same test verifies that catalog discovery and the selected skill's
    main prompt are each fetched only once per thread.
  • core: let steer interrupt wait_agent (#28341)
    ## Why
    
    `wait_agent` can block for a long timeout while waiting for sub-agent
    mailbox activity. Although same-turn user steer is accepted during that
    tool call, the input remains pending until the wait returns, so an
    explicit request to change direction can appear unresponsive.
    
    ## What changed
    
    - Notify active `wait_agent` calls when user input is steered into the
    current turn.
    - Check for already-pending steer input when subscribing so input that
    races with tool startup is not missed.
    - Distinguish mailbox activity, steered input, and timeout outcomes,
    returning `Wait interrupted by new input.` for the steer path.
    - Update the `wait_agent` tool description to document the early-return
    behavior.
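    
    The three outcomes above can be sketched with a blocking channel. This
    is a synchronous stand-in under assumptions, not the PR's
    implementation: the enums, the `steer_already_pending` flag, and the use
    of `std::sync::mpsc` are all hypothetical. Only the returned message
    text comes from the description.
    
    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::Duration;
    
    const INTERRUPTED_MESSAGE: &str = "Wait interrupted by new input.";
    
    // Hypothetical notifications delivered to an active wait.
    enum WaitEvent {
        MailboxActivity,
        SteeredInput,
    }
    
    #[derive(Debug, PartialEq)]
    enum WaitOutcome {
        MailboxActivity,
        Interrupted(&'static str),
        TimedOut,
    }
    
    fn wait_agent(
        steer_already_pending: bool,
        events: &Receiver<WaitEvent>,
        timeout: Duration,
    ) -> WaitOutcome {
        // Input that raced with tool startup is checked at subscribe time
        // so it is not missed.
        if steer_already_pending {
            return WaitOutcome::Interrupted(INTERRUPTED_MESSAGE);
        }
        match events.recv_timeout(timeout) {
            Ok(WaitEvent::MailboxActivity) => WaitOutcome::MailboxActivity,
            Ok(WaitEvent::SteeredInput) => WaitOutcome::Interrupted(INTERRUPTED_MESSAGE),
            // Simplification: a dropped sender is treated like a timeout.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                WaitOutcome::TimedOut
            }
        }
    }
    ```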
    
    ## Testing
    
    - `just test -p codex-core input_queue_`
    - `just test -p codex-core wait_agent`
    
    The coverage includes steer notification before and after subscription,
    plus an end-to-end test that verifies the interrupted wait result and
    steered user input are both included exactly once in the follow-up model
    request.
  • Support staging OAuth client ID overrides (#28257)
    ## Summary
    
    - allow app-server ChatGPT login to use a configured OAuth client ID
    - reuse the same client ID for refresh and revoke requests
    - cover staging login, refresh, and revoke request payloads
    
    ## Tests
    
    - `just test -p codex-login`
    - `just test -p codex-app-server
    login_account_chatgpt_uses_debug_oauth_overrides`
    - `just test -p codex-login
    logout_with_revoke_revokes_refresh_token_then_removes_auth`
    - `just fix -p codex-login`
    - `just fix -p codex-app-server`
    - `just fmt`
  • bound prompt image cache retention (#28294)
    ## Why
    
    The prompt image cache was bounded to 32 entries, but not by the size of
    those entries. A set of large encoded images could therefore retain
    substantially more memory than intended. Cache hits also cloned the full
    encoded payload.
    
    ## What changed
    
    - cap the cache at 64 MiB of encoded image data while preserving its
    existing 32-entry limit
    - skip caching an image that exceeds the entire byte budget
    - evict least-recently-used entries until the cache is back within its
    byte budget
    - share cached encoded bytes with `Arc<[u8]>` so cache hits do not
    deep-clone image payloads
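    
    A cache bounded by both entry count and bytes could be sketched as
    below. The `ImageCache` type, its `String` keys, and the linear-scan
    `VecDeque` are assumptions chosen for brevity, not the PR's data
    structure. The limits in the description (32 entries, 64 MiB) would be
    passed to `new`.
    
    ```rust
    use std::collections::VecDeque;
    use std::sync::Arc;
    
    // Hypothetical cache: front = least recently used, back = most recent.
    struct ImageCache {
        entries: VecDeque<(String, Arc<[u8]>)>,
        total_bytes: usize,
        max_entries: usize,
        max_bytes: usize,
    }
    
    impl ImageCache {
        fn new(max_entries: usize, max_bytes: usize) -> Self {
            Self { entries: VecDeque::new(), total_bytes: 0, max_entries, max_bytes }
        }
    
        /// A hit bumps the reference count instead of copying the payload.
        fn get(&mut self, key: &str) -> Option<Arc<[u8]>> {
            let pos = self.entries.iter().position(|(k, _)| k.as_str() == key)?;
            let entry = self.entries.remove(pos)?;
            let bytes = Arc::clone(&entry.1);
            self.entries.push_back(entry); // Mark as most recently used.
            Some(bytes)
        }
    
        fn insert(&mut self, key: String, bytes: Arc<[u8]>) {
            // An image larger than the whole byte budget is never cached.
            if bytes.len() > self.max_bytes {
                return;
            }
            if let Some(pos) = self.entries.iter().position(|(k, _)| k == &key) {
                if let Some((_, old)) = self.entries.remove(pos) {
                    self.total_bytes -= old.len();
                }
            }
            self.total_bytes += bytes.len();
            self.entries.push_back((key, bytes));
            // Evict least-recently-used entries until both limits hold.
            while self.entries.len() > self.max_entries || self.total_bytes > self.max_bytes {
                match self.entries.pop_front() {
                    Some((_, evicted)) => self.total_bytes -= evicted.len(),
                    None => break,
                }
            }
        }
    }
    ```
    
    For example, `ImageCache::new(32, 64 * 1024 * 1024)` mirrors the limits
    named above.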
    
    ## Validation
    
    - `just test -p codex-utils-image`
  • TUI Plugin Sharing 2 - add remote plugin section plumbing (#26702)
    This adds the background plumbing for remote-backed plugin catalog
    sections while leaving the fuller directory presentation to the next PR.
    The TUI can fetch section-specific remote marketplace results, keep
    local plugin data available, and carry section errors forward for later
    rendering.
    
    - Fetches explicit remote marketplace kinds for curated, workspace, and
    shared-with-me sections.
    - Gates shared-with-me loading on the plugin sharing feature flag.
    - Adds section-level error state and user-actionable error copy.
    - Merges remote marketplace results into the cached plugin list without
    discarding local results.
  • guardian: isolate review context from skills and memories (#28285)
    ## Why
    
    Guardian reviews embed the parent session transcript as untrusted
    evidence. Skill or plugin mentions in that transcript must not be
    interpreted as requests to inject more instructions into the Guardian
    request, and memory context adds unrelated model-visible context to an
    approval decision.
    
    Keeping those sources out of the nested review session makes the request
    smaller and preserves the trust boundary around the transcript being
    assessed.
    
    ## What changed
    
    - Skip skill and plugin discovery when building turns for Guardian
    reviewer sessions.
    - Disable memory context and dedicated memory tools in the derived
    Guardian configuration.
    - Extend the Guardian request-layout coverage to verify that a `$skill`
    mention remains visible only as transcript evidence while neither the
    skill body nor memory context is injected.
    - Expand the Guardian configuration test to cover the disabled memory
    settings.
    
    ## Testing
    
    - Updated the Guardian review request snapshot and assertions for skill
    and memory isolation.
    - Extended the Guardian session configuration test to cover memories.
  • [codex] preserve explicit environment cwd (#27995)
    ## Why
    
    `TurnEnvironmentSelections::new` rewrote the primary environment's
    explicit `cwd` to the legacy fallback cwd. For a remote-first selection,
    this could replace the remote working directory with a local fallback
    path and make the legacy cwd overlay authoritative over
    environment-owned state.
    
    ## What changed
    
    - Preserve every explicit environment cwd when constructing turn
    environment selections.
    - Keep `cwd`-only app-server updates compatible by rebuilding the
    default environment selections at the requested cwd.
    - Cover both explicit primary cwd preservation and cwd-only updates
    reaching the model-visible execution environment.
    
    ## Testing
    
    - `just test -p codex-core
    session_update_settings_does_not_rewrite_sticky_environment_cwds`
    - `just test -p codex-core
    environment_settings_preserve_explicit_primary_cwd`
    - `just test -p codex-app-server
    thread_settings_update_cwd_retargets_default_environment`
  • reuse encoded Responses request bodies (#28327)
    ## Why
    
    Responses HTTP requests were converted from `ResponsesApiRequest` into a
    full `serde_json::Value`. `EndpointSession` then deep-cloned that value
    for each retry, and the transport serialized and compressed it again
    before every send.
    
    Large histories make those copies expensive. Retry attempts should reuse
    the same immutable request bytes.
    
    ## What
    
    - Serialize standard Responses requests directly into a ref-counted
    `EncodedJsonBody`.
    - Preserve the Azure path that attaches item IDs before encoding.
    - Prepare JSON, compression, and derived content headers once before the
    retry loop.
    - Clone the prepared request per attempt so body clones only bump the
    `Bytes` reference count.
    - Keep auth inside the retry loop. Signing auth sees the exact final
    headers and body bytes that the transport sends.
    - Preserve request-body TRACE output. With TRACE plus compression,
    retain the original JSON bytes for logging; normal requests keep only
    the final wire bytes.
    - Leave non-Responses endpoint bodies on the existing `Value` path.
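    
    The prepare-once, clone-per-attempt idea above can be sketched with the
    standard library. Everything here is illustrative: `PreparedRequest`,
    `prepare`, and `send_with_retries` are hypothetical names, the headers
    are examples, and `Arc<[u8]>` stands in for the ref-counted `Bytes`
    body mentioned in the description.
    
    ```rust
    use std::sync::Arc;
    
    // Hypothetical prepared request: cloning it copies the small header
    // list but only bumps the reference count of the body.
    #[derive(Clone)]
    struct PreparedRequest {
        headers: Vec<(String, String)>,
        body: Arc<[u8]>,
    }
    
    /// Encodes the body and derives content headers once, before retrying.
    fn prepare(json: &str) -> PreparedRequest {
        let body: Arc<[u8]> = Arc::from(json.as_bytes());
        let headers = vec![
            ("content-type".to_string(), "application/json".to_string()),
            ("content-length".to_string(), body.len().to_string()),
        ];
        PreparedRequest { headers, body }
    }
    
    fn send_with_retries<E>(
        prepared: &PreparedRequest,
        max_attempts: usize,
        mut authorize: impl FnMut(&mut PreparedRequest),
        mut send: impl FnMut(PreparedRequest) -> Result<(), E>,
    ) -> Result<(), E> {
        let mut attempt_no = 1;
        loop {
            let mut attempt = prepared.clone();
            // Auth stays inside the loop and sees the final headers and body.
            authorize(&mut attempt);
            match send(attempt) {
                Err(_) if attempt_no < max_attempts => attempt_no += 1,
                result => return result,
            }
        }
    }
    ```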
    
    ## Performance
    
    A temporary release-mode measurement used a 10 MiB JSON body and 10
    retry preparations:
    
    - old `Value` clone + serialize path: 30 ms total
    - prepared shared-byte path: less than 1 ms total
    
    That is about 3 ms avoided per retry for this payload on the test
    machine. Each retry also stops allocating another request-sized JSON
    tree and serialized buffer. Without TRACE, compressed requests retain
    only the final compressed wire bytes.
    
    ## Validation
    
    - `just test -p codex-client` — 28 passed
    - `just test -p codex-api` — 125 passed
    - `just fix -p codex-client`
    - `just fix -p codex-api`
  • [codex] Cover OTLP HTTP log and trace event export (#27059)
    ## Why
    
    The generic OTLP HTTP paths for log events and trace events need
    end-to-end coverage before exec-server relies on them.
    
    ## What changed
    
    - Adds loopback coverage for exporting `codex_otel.log_only` events to
    `/v1/logs`.
    - Verifies `codex_otel.trace_safe` events are present in the exported
    trace payload.
    
    This is a test-only PR. It does not change OTEL runtime behavior or
    metric APIs.
    
    ## Related work
    
    - #26091: counter descriptions
    - #27057: gauge instruments
    - #27058: second-based duration histograms
    
    This PR is independent and can land directly on `main`.
    
    ## Validation
    
    - `just test -p codex-otel`
    - `just fix -p codex-otel`
    - `just fmt`
  • [codex] remove stale PathExt import (#28344)
    ## Why
    
    `main` fails dev-profile Cargo and Bazel Clippy builds because
    `core/src/tools/runtimes/mod_tests.rs` imports `PathExt` after its last
    use was removed. With warnings denied, that stale import prevents
    `codex-core` test targets from compiling across platforms.
    
    ## What changed
    
    Remove the unused `PathExt` import. Remaining `.abs()` calls in the
    module operate on `PathBuf` and continue to use `PathBufExt`.
    
    ## Validation
    
    - `just fmt`
    - Focused `codex-core` test compile attempted; blocked locally by disk
    exhaustion before compilation completed. The CI failure itself is the
    unused-import diagnostic this change removes.
  • avoid cloning websocket request history (#28313)
    ## Why
    
    WebSocket continuations only send the new part of a request. Checking
    whether a request could be continued was cloning the full previous
    request, the current request, and their input history.
    
    For long conversations or large tool lists, that meant copying several
    request-sized values on every continuation.
    
    ## What changed
    
    - compare the request settings by reference
    - check the previous input and server response as borrowed prefixes
    - allocate only the new input items that will be sent
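    
    The borrowed-prefix check above can be illustrated with slices. This is
    a minimal sketch, assuming items are comparable with `PartialEq`; the
    function name and parameters are hypothetical, and the settings
    comparison is left out.
    
    ```rust
    /// Returns the new tail of `current` when it starts with the previous
    /// input followed by the server's response items, or `None` when the
    /// histories diverge. Nothing is cloned during the check.
    fn incremental_input<'a, T: PartialEq>(
        previous_input: &[T],
        server_output: &[T],
        current: &'a [T],
    ) -> Option<&'a [T]> {
        current
            .strip_prefix(previous_input)?
            .strip_prefix(server_output)
    }
    ```
    
    Only the returned tail needs to be copied into the outgoing request,
    for example with `tail.to_vec()`.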
    
    The reuse rules stay the same, including ignoring `client_metadata` for
    this check.
    
    The comparison is still `O(n)`, but it removes several `O(n)`
    allocations and copies. Temporary memory no longer grows by multiple
    full request sizes for each continuation.
    
    ## Performance
    
    Local rollout traces show continuation checks on turns around 260k input
    tokens. Before this change the reuse gate cloned the previous request,
    the current request, and the previous input history before deciding
    whether it could continue incrementally. After this change it borrows
    those structures and allocates only the incremental tail. For large
    continuations with a small delta, that removes roughly three
    request-sized copies from the hot path and reduces temporary memory from
    multiple full request sizes to just the new tail.
    
    ## Validation
    
    - `just test -p codex-core
    responses_websocket_v2_creates_with_previous_response_id_on_prefix`
    - `just test -p codex-core
    responses_websocket_v2_creates_without_previous_response_id_when_non_input_fields_change`
  • serialize websocket requests directly (#28323)
    ## Why
    
    Responses WebSocket requests were encoded in two steps: first into a
    full `serde_json::Value`, then again into the JSON string sent over the
    socket.
    
    That walks the full request twice and keeps an extra JSON tree alive.
    These requests can contain the complete conversation history and tool
    schemas, so the extra work grows with the request size.
    
    ## What changed
    
    - serialize `ResponsesWsRequest` directly to the wire string
    - pass that string through the existing WebSocket stream and send path
    - keep the existing error mapping, tracing, send timeout, and telemetry
    behavior
    - compare the new wire JSON with the previous `to_value` payload in a
    focused test
    
    ## Performance
    
    I measured both paths in an optimized temporary test using a
    6,324,180-byte request: 4 MiB of history plus 256 tools with 8 KiB
    descriptions. Each path ran 100 times.
    
    - previous `to_value` + `to_string`: 209 ms total, 2.09 ms per request
    - direct `to_string`: 174 ms total, 1.74 ms per request
    - difference: about 17% faster, or 0.35 ms per request
    
    The direct path also removes one full temporary `serde_json::Value`
    tree. For this mostly string-backed payload, that avoids roughly one
    payload-sized copy plus the JSON node overhead. The exact memory saving
    depends on the request shape.
    
    The temporary benchmark was removed before committing.
    
    ## Validation
    
    - `just test -p codex-api` — 125 passed
    - `just fix -p codex-api`
  • avoid cloning sampling request input (#28306)
    ## Why
    
    Every model request cloned the full prepared input just to keep it for
    the legacy after-agent hook. That copy gets more expensive as the
    conversation grows.
    
    ## What
    
    Move the prepared input into the sampling loop and return it with the
    result. If the request retries, keep the first input so the hook still
    sees the same data as before.
    
    This removes one `O(n)` clone per sampling request, where `n` is the
    size of the prepared input. It saves `O(n)` copy work and `O(n)`
    temporary memory.
    
    No behavior change is intended.
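
    A minimal sketch of the ownership pattern, assuming the prepared input
    is a `Vec<String>`. `Outcome` and `run_sampling` are hypothetical names
    used only for illustration, and the retry path that keeps the first
    input is not modeled.

    ```rust
    /// Hypothetical result of one sampling request.
    #[derive(Debug, PartialEq)]
    pub enum Outcome {
        Completed,
    }

    /// Takes ownership of the prepared input and hands it back with the
    /// result, so the caller can pass it to a later hook without cloning it.
    pub fn run_sampling(input: Vec<String>) -> (Outcome, Vec<String>) {
        // The request would be built from a borrow of `input` here.
        let _request_len: usize = input.iter().map(String::len).sum();
        (Outcome::Completed, input)
    }
    ```

    Moving a `Vec` transfers its heap buffer, so the returned input is the
    same allocation the caller passed in.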
    
    ## Performance
    
    Local rollout traces show turns reaching roughly 260k input tokens. On
    turns of that size, this removes the only unconditional full
    prepared-input clone on the happy path. That avoids one request-sized
    allocation/copy per sampling attempt for large conversations, and the
    savings scale linearly with request size.
    
    ## Testing
    
    - `just test -p codex-core continue_after_stream_error`
    - `just fix -p codex-core`
  • linearize history output normalization (#28309)
    ## Why
    
    When we prepare the conversation history, every tool call needs a
    matching output.
    
    Before this change, we scanned the full history again for every call. In
    a tool-heavy conversation, that makes the work `O(items x calls)`, or
    `O(n^2)` in the worst case.
    
    ## What
    
    Scan the history once and collect the IDs of existing outputs. Then each
    call can check its ID with an expected `O(1)` lookup.
    
    The full normalization step is now expected `O(n)`. The output order and
    missing-output behavior stay the same.
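
    A minimal sketch of the two-pass approach, assuming calls and outputs
    are matched by a string ID. `Item` and `calls_missing_output` are
    hypothetical names, not the real history types.

    ```rust
    use std::collections::HashSet;

    /// Hypothetical, simplified history item.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Item {
        Call { id: String },
        Output { id: String },
        Message(String),
    }

    /// Returns the IDs of calls that have no matching output, in history order.
    pub fn calls_missing_output(history: &[Item]) -> Vec<&str> {
        // Pass 1: collect every output ID once.
        let outputs: HashSet<&str> = history
            .iter()
            .filter_map(|item| match item {
                Item::Output { id } => Some(id.as_str()),
                _ => None,
            })
            .collect();

        // Pass 2: each call does one expected O(1) set lookup.
        history
            .iter()
            .filter_map(|item| match item {
                Item::Call { id } if !outputs.contains(id.as_str()) => Some(id.as_str()),
                _ => None,
            })
            .collect()
    }
    ```

    The second pass walks the history in order, which is how this sketch
    keeps the output order stable.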
    
    ## Performance
    
    Based on local rollout traces, one tool-heavy session reached roughly
    17,050 transcript items with about 4,292 tool-call items. On a history
    of that shape, the old `calls x items` scan does about 73.2 million
    membership checks, while the new pass does about 21.3 thousand set
    inserts/lookups. That is roughly 3.4k times less membership work in this
    normalization step.
    
    ## Validation
    
    - `just test -p codex-core normalize_` (19 passed)
  • Expose explicit dynamic tool namespaces in thread start (#27371)
    Stacked on #27365.
    
    ## Stack note
    
    [#27365](https://github.com/openai/codex/pull/27365) kept `thread/start`
    unchanged and converted its input in `thread_processor`. This PR updates
    `thread/start` to accept explicit functions and namespaces directly.
    
    Legacy per-tool arrays are still accepted and converted while reading
    the request. As a result, `thread_processor` can validate and pass the
    tools through directly, which is why some code added in #27365 is
    removed here.
    
    ## Why
    
    `thread/start.dynamicTools` still repeats namespace data on each
    function even though core now stores explicit namespace groups. The
    request API should use the same shape so each namespace has one
    description and one member list.
    
    ## What changed
    
    - Accept top-level functions and explicit namespace objects in
    `dynamicTools`.
    - Continue accepting fully legacy flat arrays, including
    `exposeToContext`.
    - Reject arrays that mix legacy and canonical entries.
    - Reuse the protocol types directly and remove the temporary app-server
    adapter.
    - Update validation, docs, the test client, and generated schemas.
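
    A minimal sketch of the mixed-format rejection, assuming each entry can
    be classified on its own. `Entry`, `Format`, `classify`, and the error
    text are hypothetical and do not mirror the real protocol types.

    ```rust
    /// Hypothetical, simplified `dynamicTools` entry.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Entry {
        /// Legacy flat tool that repeats its namespace on the tool itself.
        Legacy { namespace: Option<String>, name: String },
        /// Canonical top-level function.
        Function { name: String },
        /// Canonical namespace with one description and one member list.
        Namespace { name: String, description: String, members: Vec<String> },
    }

    #[derive(Debug, PartialEq)]
    pub enum Format {
        Empty,
        Legacy,
        Canonical,
    }

    /// Classifies an array as all-legacy or all-canonical and rejects mixtures.
    pub fn classify(entries: &[Entry]) -> Result<Format, String> {
        let legacy = entries
            .iter()
            .filter(|e| matches!(e, Entry::Legacy { .. }))
            .count();
        match (legacy, entries.len()) {
            (_, 0) => Ok(Format::Empty),
            (0, _) => Ok(Format::Canonical),
            (l, n) if l == n => Ok(Format::Legacy),
            _ => Err("dynamicTools mixes legacy and canonical entries".to_string()),
        }
    }
    ```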
    
    ## Test plan
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    dynamic_tool_call_round_trip_sends_text_content_items_to_model`
    - `just test -p codex-app-server
    thread_start_normalizes_legacy_dynamic_tools_into_model_request`
    - `just test -p codex-app-server
    thread_start_rejects_mixed_dynamic_tool_formats`
    - `just test -p codex-app-server
    thread_start_rejects_hidden_dynamic_tools_without_namespace`
  • [codex] simplify memory read metrics (#28164)
    ## Why
    
    Memory read telemetry currently reconstructs the executable shell
    command after a tool call finishes. That duplicates shell, login-policy,
    and cwd resolution owned by the tool handlers, and can diverge from the
    environment-specific command that unified exec actually ran.
    
    ## What changed
    
    - Expose the existing restricted shell-script parser directly for raw
    script text.
    - Parse `shell_command` and `exec_command` input into plain command argv
    before classifying memory reads.
    - Preserve all-or-nothing safe-command validation for multi-command
    scripts.
    - Remove cwd resolution, shell selection, and the unnecessary async
    boundary from memory read metric emission.
    
    ## Testing
    
    - `just test -p codex-shell-command`
    - `cargo check -p codex-core`
  • chore: restore exec-server relay keepalives (#28286)
    ## Why
    
    The ws pump refactor removed the relay keepalive timers that had been
    added to keep idle rendezvous connections alive. An idle relay could
    therefore be closed by the rendezvous service or a load balancer,
    disconnecting executor-backed MCP processes.
    
    ## What
    
    - restore periodic WebSocket ping frames on both rendezvous relay
    endpoints
    - keep missed-tick behavior bounded with `MissedTickBehavior::Skip`
    - cover the harness and remote-environment pumps with focused
    traffic-after-keepalive tests
  • Remove terminal resize reflow flag gates (#27794)
    ## Why
    
    `terminal_resize_reflow` is now stable and should behave as always on.
    Keeping the disabled runtime paths around made the feature look
    configurable even though the rollout is complete, and old config could
    still suggest there was a supported off mode.
    
    ## What Changed
    
    - Marked `terminal_resize_reflow` as `Stage::Removed` while keeping it
    default-enabled for compatibility.
    - Ignored `[features].terminal_resize_reflow` config entries so stale
    `false` settings no longer affect the effective feature set.
    - Removed TUI branches that depended on the flag being disabled, so
    draw, replay buffering, stream finalization, and resize scheduling all
    assume resize reflow is active.
    - Simplified resize smoke coverage to exercise the always-on behavior
    only.
    
    ## Verification
    
    - `just test -p codex-features`
    - `just test -p codex-tui resize_reflow`
    - `just test -p codex-tui initial_replay_buffer
    thread_switch_replay_buffer`
  • [codex] simplify shell snapshot ownership (#27756)
    ## Why
    
    Shell snapshot lifecycle state was split between `Shell` and
    `SessionServices`: `Shell` carried the receiver while session code
    exposed and forwarded the raw sender. That coupled shell identity to
    mutable snapshot state and made refresh, inheritance, and file lifetime
    harder to reason about.
    
    ## What changed
    
    - make each `Arc<ShellSnapshot>` represent one cwd-specific snapshot
    generation
    - store the active generation in `SessionServices` with `ArcSwapOption`
    - have construction start the background build and expose only a
    cwd-validated snapshot path
    - use `ShellSnapshotFile` ownership to delete snapshot files
    automatically
    - pass snapshot paths explicitly to shell runtimes instead of storing
    snapshot state on `Shell`
    - preserve inherited and in-flight generations by pinning their `Arc`
    while they are in use
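
    A minimal sketch of ownership-based file cleanup, assuming the file is
    deleted when the last `Arc` to its owner is dropped. `SnapshotFile` here
    is a hypothetical stand-in and is not the real `ShellSnapshotFile`.

    ```rust
    use std::fs;
    use std::path::{Path, PathBuf};
    use std::sync::Arc;

    /// Hypothetical owner of one snapshot file; deletes the file when dropped.
    #[derive(Debug)]
    pub struct SnapshotFile {
        path: PathBuf,
    }

    impl SnapshotFile {
        /// Writes the file and returns a shared owner for it.
        pub fn create(path: PathBuf, contents: &str) -> std::io::Result<Arc<Self>> {
            fs::write(&path, contents)?;
            Ok(Arc::new(Self { path }))
        }

        pub fn path(&self) -> &Path {
            &self.path
        }
    }

    impl Drop for SnapshotFile {
        fn drop(&mut self) {
            // Best-effort cleanup: ignore errors such as the file being gone.
            let _ = fs::remove_file(&self.path);
        }
    }
    ```

    A caller that clones the `Arc` pins the generation: the file stays on
    disk until that clone is dropped as well.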
    
    ## Test plan
    
    - `cargo check -p codex-core --lib`
    - `just test -p codex-core 'shell_snapshot::tests'`
    - `just test -p codex-core
    shell_command_snapshot_still_intercepts_apply_patch`
    - `just test -p codex-core
    shell_snapshot_deleted_after_shutdown_with_skills`
  • skills: hide orchestrator skills with a local executor (#28333)
    ## Why
    
    App-server threads without a local executor need orchestrator-owned
    skills from the hosted `codex_apps` MCP server. Threads with the local
    executor already discover installed skills from the local filesystem.
    
    After the orchestrator skill provider was enabled for every app-server
    thread, local-executor threads also received the hosted skill catalog
    and the `skills.list` and `skills.read` tools. This changed the existing
    local behavior and could expose a second hosted copy of a skill that was
    already installed locally.
    
    ## What changed
    
    - Expose the thread's selected execution environments to extensions at
    thread startup.
    - Enable orchestrator skills only when the reserved local environment is
    not selected.
    - Apply that decision consistently to hosted skill catalog discovery,
    explicit skill injection, and the `skills.list` and `skills.read` tools.
    
    ## Verification
    
    - The existing no-executor app-server test continues to verify hosted
    skill discovery, invocation, and child-resource reads.
    - A new app-server test verifies that local-executor threads do not
    receive hosted skill context or `skills.*` tools.
  • Represent dynamic tools with explicit namespaces internally (#27365)
    Follow-up to #27356.
    
    ## Stack note
    
    This PR changes Codex's internal dynamic-tool shape while leaving
    `thread/start` unchanged. App-server therefore converts the existing
    per-tool input into explicit functions and namespaces before passing it
    to core.
    
    [#27371](https://github.com/openai/codex/pull/27371) updates
    `thread/start` to use the same explicit shape and removes this temporary
    conversion.
    
    ## Why
    
    Dynamic tools repeat namespace metadata on every function. Core should
    keep one explicit namespace with its member tools so descriptions and
    membership stay consistent across sessions and runtime planning.
    
    ## What changed
    
    - Represent dynamic tools as top-level functions or explicit namespaces
    in protocol and session state.
    - Read old flat rollout metadata and write the canonical hierarchy.
    - Flatten namespace members only when registering callable tools.
    - Keep `thread/start.dynamicTools` flat for now and normalize it at the
    app-server boundary.
    
    New builds can read old rollout metadata. Older builds cannot read newly
    written hierarchical metadata.
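
    A minimal sketch of the two directions, assuming a flat tool carries an
    optional namespace name. `FlatTool`, `DynamicTool`, `to_hierarchy`, and
    `callable_tools` are hypothetical, and the output ordering (functions
    first, then namespaces sorted by name) is a choice made for this sketch.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical legacy entry: every tool repeats its namespace.
    #[derive(Debug, Clone, PartialEq)]
    pub struct FlatTool {
        pub namespace: Option<String>,
        pub name: String,
    }

    /// Hypothetical canonical entry.
    #[derive(Debug, Clone, PartialEq)]
    pub enum DynamicTool {
        Function { name: String },
        Namespace { name: String, members: Vec<String> },
    }

    /// Groups flat tools into top-level functions followed by namespaces.
    pub fn to_hierarchy(flat: &[FlatTool]) -> Vec<DynamicTool> {
        let mut tools = Vec::new();
        let mut namespaces: BTreeMap<&str, Vec<String>> = BTreeMap::new();
        for tool in flat {
            match tool.namespace.as_deref() {
                Some(ns) => namespaces.entry(ns).or_default().push(tool.name.clone()),
                None => tools.push(DynamicTool::Function { name: tool.name.clone() }),
            }
        }
        tools.extend(namespaces.into_iter().map(|(name, members)| DynamicTool::Namespace {
            name: name.to_string(),
            members,
        }));
        tools
    }

    /// Flattens the hierarchy back into callable `(namespace, name)` pairs.
    pub fn callable_tools(tools: &[DynamicTool]) -> Vec<(Option<&str>, &str)> {
        let mut out = Vec::new();
        for tool in tools {
            match tool {
                DynamicTool::Function { name } => out.push((None, name.as_str())),
                DynamicTool::Namespace { name, members } => {
                    out.extend(members.iter().map(|m| (Some(name.as_str()), m.as_str())));
                }
            }
        }
        out
    }
    ```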
    
    ## Test plan
    
    - `just test -p codex-app-server
    thread_start_normalizes_legacy_dynamic_tools_into_model_request`
    - `just test -p codex-protocol
    session_meta_normalizes_legacy_dynamic_tools`
    - `just test -p codex-core
    resume_restores_dynamic_tools_from_rollout_with_sqlite_enabled`
    - `just test -p codex-core
    tool_search_returns_deferred_dynamic_tool_and_routes_follow_up_call`
    - `just test -p codex-core code_mode_can_call_hidden_dynamic_tools`
    - `just test -p codex-tools`
  • [codex] Cap feedback upload subtrees (#28332)
    ## Summary
    - cap feedback log uploads to at most eight threads before SQLite log
    aggregation and rollout attachment resolution
    - keep the root session included while bounding descendant fanout during
    `/feedback` uploads
    
    ## Why
    Very large sessions can accumulate large spawned-thread subtrees.
    Feedback uploads currently walk the entire subtree and then read each
    resolved rollout into memory, which can blow up when one session has
    hundreds of descendants.
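
    A minimal sketch of a bounded walk, assuming the spawn relationships
    form a tree and that descendants are visited breadth-first. The
    traversal order, `capped_subtree`, and the `u32` thread IDs are
    assumptions; only the cap of eight and the always-included root come
    from the description above.

    ```rust
    use std::collections::{HashMap, VecDeque};

    /// Upper bound on threads per upload, taken from the description above.
    pub const MAX_FEEDBACK_THREADS: usize = 8;

    /// Collects the root plus descendants breadth-first, stopping at the cap.
    /// `children` maps a thread ID to its directly spawned threads.
    pub fn capped_subtree(root: u32, children: &HashMap<u32, Vec<u32>>) -> Vec<u32> {
        let mut selected = vec![root]; // the root is always included
        let mut queue = VecDeque::from([root]);
        while let Some(parent) = queue.pop_front() {
            for &child in children.get(&parent).into_iter().flatten() {
                if selected.len() == MAX_FEEDBACK_THREADS {
                    return selected;
                }
                selected.push(child);
                queue.push_back(child);
            }
        }
        selected
    }
    ```

    Because the walk stops as soon as the cap is reached, the work is
    bounded before any rollout would be read into memory.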
    
    ## Validation
    - ran `just fmt`
    - did not run tests or Clippy per request; CI will cover validation
  • Activate selected executor plugin MCPs in app-server (#27893)
    ## Why
    
    #27870 teaches the MCP extension how to discover stdio MCP servers
    declared by a selected executor plugin, but app-server does not yet
    install that contributor or initialize its per-thread state. As a
    result, `thread/start.selectedCapabilityRoots` can select the plugin
    while its MCP servers remain inactive.
    
    This PR closes that app-server wiring gap:
    
    ```text
    thread/start(selectedCapabilityRoots)
        -> initialize the thread's selected-plugin MCP snapshot
        -> read the selected plugin's .mcp.json through its environment
        -> start declared stdio servers in that environment
        -> expose their tools only on the selected thread
    ```
    
    ## What changed
    
    - Install the selected-executor-plugin MCP contributor in app-server
    using the existing shared `EnvironmentManager`.
    - Initialize its frozen thread snapshot when `thread/start` includes
    selected capability roots.
    - Document that selected plugin stdio MCPs are activated in their owning
    environment.
    - Add an app-server E2E covering the complete selection-to-tool-call
    path.
    
    The E2E verifies that:
    
    - the selected MCP process receives an executor-only environment value,
    proving the tool runs through the selected environment;
    - the MCP tool is advertised to the model and can be called;
    - a normal MCP config reload does not discard the thread's frozen
    selected-plugin registration;
    - another thread without the selected root does not see the MCP server.
    
    ## Scope
    
    - Existing sessions without `selectedCapabilityRoots` are unchanged.
    - Only stdio MCP declarations are activated. HTTP declarations remain
    inactive.
    - This does not change selected-root persistence across resume/fork or
    add hosted-plugin behavior.
    
    ## Verification
    
    - Focused app-server E2E:
    `selected_executor_plugin_exposes_its_stdio_mcp_only_to_that_thread`
    
    ## Stack
    
    Stacked on #27870.
  • [codex] Skip plugin MCP OAuth for matching app routes (#27461)
    ## Context
    
    This is PR5 in the plugin auth-routing stack. Earlier PRs make plugin
    surface projection auth-aware, narrow App/MCP conflicts by App
    declaration name, and keep connector listings auth-aware. This PR
    applies the same name-based App/MCP conflict rule to plugin MCP
    loading, so install-time MCP OAuth and plugin detail metadata both
    reflect the MCPs available for the current auth route.
    
    ## Stack
    
    - PR1: #27652 seed plugin manager auth at construction.
    - PR2: #27459 route plugin surfaces by auth mode.
    - PR3: #27607 dedupe plugin MCP servers by App declaration name.
    - PR4: #27602 preserve plugin Apps in connector listings.
    - PR5: #27461 skip install-time plugin MCP OAuth for matching App
    routes.
    
    ## Summary
    
    - Make `load_plugin_mcp_servers` auth-aware and let it load App
    declarations before filtering same-name MCP servers for Codex-backend
    auth.
    - Use that filtered MCP list for both install-time MCP OAuth and
    marketplace plugin detail metadata.
    - Preserve API-key/direct auth behavior so plugin MCP servers remain
    visible and can still start OAuth.
    
    ## Validation
    
    ```bash
    cargo fmt --all
    cargo test -p codex-core-plugins read_plugin_for_config_filters_mcp_servers_for_codex_backend_auth
    cargo check -p codex-core-plugins -p codex-app-server
    git diff --check
    git diff --cached --check
    ```
  • [codex] Preserve plugin apps in connector listings (#27602)
    ## Context
    
    This is PR4 in the plugin auth-routing stack. The earlier PRs make
    plugin surface projection auth-aware and narrow App/MCP conflicts by App
    declaration name. This PR keeps connector listing paths aligned with
    that projected plugin App set.
    
    This means ChatGPT/SIWC users will still see plugin-provided Apps in
    connector listing surfaces like the Apps/connector picker, while API-key
    users will not see Apps they cannot use.
    
    ## Stack
    
    - PR1: #27652 seed plugin manager auth at construction.
    - PR2: #27459 route plugin surfaces by auth mode.
    - PR3: #27607 dedupe plugin MCP servers by App declaration name.
    - PR4: #27602 preserve plugin Apps in connector listings.
    - PR5: #27461 skip install-time plugin MCP OAuth for matching App
    routes.
    
    ## Summary
    
    - Have app-server compute effective plugin Apps from the existing
    PluginsManager and pass them into connector listing.
    - Keep plugin Apps visible in Apps/connector listing for ChatGPT/SIWC
    users.
    - Keep API-key-style auth from surfacing plugin Apps in connector
    listings.
    
    ## Validation
    
    ```bash
    cargo test -p codex-chatgpt connectors::tests
    cargo test -p codex-app-server list_apps_includes_plugin_apps_for_chatgpt_auth
    git diff --check
    ```
  • [codex] update multi-agent v2 prompts (#28283)
    ## Summary
    
    - align the default multi-agent v2 root and subagent hints with the
    evaluated prompt guidance for direct collaboration-tool calls, parallel
    delegation, and shared workspaces
    - keep the current `interrupt_agent` tool name and existing
    concurrency-hint placement, with the explicit no-spawn instruction last
    - document the context tradeoff between `fork_turns="none"` and
    `fork_turns="all"` in the v2 `spawn_agent` description
    - extend the focused prompt and tool-surface tests
    
    ## Why
    
    The evaluated multi-agent prompt includes operational guidance that is
    missing from the current Codex defaults. This applies that guidance to
    the current tool surface without restoring stale `close_agent` or
    duplicated concurrency wording.
    
    ## User impact
    
    Multi-agent v2 receives clearer instructions about when and how to
    parallelize work, how agent workspaces interact, and how `fork_turns`
    affects subagent context. The existing default opt-out behavior remains
    in place.
    
    ## Testing
    
    - `just fmt`
    - `just test -p codex-core
    multi_agent_v2_default_usage_hints_use_configured_thread_cap`
    - `just test -p codex-core
    multi_agent_feature_selects_one_agent_tool_family`
  • Discover stdio MCP servers from selected executor plugins (#27870)
    ## Why
    
    **In short:** this PR discovers MCP registrations by reading a selected
    plugin's `.mcp.json` on its executor. #27884 then resolves those
    registrations in the shared catalog.
    
    `thread/start.selectedCapabilityRoots` can select a plugin root owned by
    an executor, and Codex can resolve that package through the executor
    filesystem. MCP declarations inside the selected plugin are still
    ignored.
    
    This PR adds the source-specific discovery layer on top of the
    selected-plugin catalog boundary in #27884:
    
    ```text
    selected capability root
            |
            v
    resolve the plugin through its executor filesystem
            |
            v
    read and normalize its MCP config through the same filesystem
            |
            v
    contribute stdio registrations bound to that environment ID
    ```
    
    The existing MCP launcher and connection manager remain unchanged. MCP
    config parsing is shared with local plugins through #27863.
    
    ## What changed
    
    - Added an executor plugin MCP provider in the MCP extension.
    - Retained only the exact filesystem capability used for package
    resolution and reused it for the selected plugin's MCP config, with no
    host-filesystem fallback or unrelated process/HTTP authority.
    - Read either the manifest-declared MCP config or the default
    `.mcp.json`; a missing default file means the plugin has no MCP servers.
    - Accepted stdio servers only for this first vertical. Executor-owned
    HTTP declarations are skipped with a warning until their placement
    semantics are defined.
    - Normalized stdio registrations with the owning environment's stable
    logical ID and plugin-root working directory.
    - Resolved environment-variable names on the owning executor and
    rejected explicit local forwarding for non-local plugins.
    - Froze discovered declarations once per active thread runtime, then
    applied current managed plugin and MCP requirements when contributing
    them.
    - Carried the selected root ID, display name, and selection order into
    the catalog contribution defined by #27884.
    
    ## Behavior and scope
    
    There is intentionally no production behavior change yet. This PR
    provides the executor provider and contribution boundary, but app-server
    does not install it in this change. Existing local plugin MCP loading is
    unchanged, and no MCP process is launched by this PR alone.
    
    ## Assumptions
    
    - The selected root ID is the plugin policy identity; the manifest
    display name is presentation metadata.
    - An environment ID is a stable logical authority. Reconnection or
    replacement under the same ID does not change ownership.
    - Selected plugin packages and their manifests are trusted inputs.
    - The selected package and MCP discovery snapshot remain frozen for the
    active thread runtime.
    
    ## Follow-up
    
    The next PR installs this contributor in app-server and adds an
    end-to-end test proving that a selected plugin MCP tool launches on its
    owning executor, can be called by the model, survives an explicit MCP
    refresh, and is invisible when its root was not selected.
    
    Resume, fork, environment removal or ID changes, dynamic catalog reload,
    and executor-owned HTTP MCP placement remain separate lifecycle
    decisions.
    
    ## Verification
    
    Focused tests cover executor-only filesystem reads, missing and
    malformed config, stdio filtering and normalization, managed
    requirements, package attribution, and selection order. CI owns
    execution of the test suite.
  • Add selected-plugin precedence and attribution to the MCP catalog (#27884)
    ## Why
    
    **In short:** this PR resolves already-discovered MCP registrations. It
    does not read selected plugins or discover their MCP servers.
    
    The resolved MCP catalog currently builds config and auto-discovered
    plugin registrations before runtime contributors are applied. A
    thread-selected plugin needs a distinct precedence tier in that same
    initial resolution pass: otherwise a disabled lower-precedence winner
    can leave stale name-level state behind, and the winning MCP tools
    cannot be attributed to the selected package reliably.
    
    This PR adds that catalog boundary before executor discovery is
    connected.
    
    ## What changed
    
    - Added an explicit selected-plugin registration tier between
    auto-discovered plugins and explicit config.
    - Collected selected-plugin contributions before the initial catalog
    build, while leaving compatibility and generic extension overlays in
    their existing runtime phase.
    - Retained the winning plugin ID and display name directly on
    plugin-owned catalog registrations.
    - Derived MCP tool provenance from the winning catalog entry instead of
    joining against local-only plugin summaries.
    - Retained the winning selected server's tool approval policy in the
    running connection manager, so a selected registration cannot inherit
    approval behavior from a losing local plugin.
    - Kept remembered approval session-scoped for selected plugins until
    there is an authority-aware persistence contract; Codex will not write
    approval back to an unrelated local plugin.
    - Preserved existing name-level disabled vetoes for discovered plugins
    and config, while keeping a selected package's own disabled registration
    scoped to that registration.
    - Preserved deterministic selection order and existing config,
    compatibility, and extension precedence.
    
    The resulting order is:
    
    ```text
    auto-discovered plugin
      < selected plugin
      < explicit config
      < compatibility registration
      < extension overlay
    ```
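
    A minimal sketch of tier-based resolution for the order above, assuming
    the winner for a server name is simply the registration in the highest
    tier. `Tier`, `Registration`, and `resolve` are hypothetical names;
    disabled vetoes, approval policy, and selection-order tie-breaking are
    not modeled, and ties here keep the earliest registration.

    ```rust
    use std::collections::HashMap;

    /// Tiers in ascending precedence; the derived `Ord` follows declaration order.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub enum Tier {
        AutoDiscoveredPlugin,
        SelectedPlugin,
        ExplicitConfig,
        CompatibilityRegistration,
        ExtensionOverlay,
    }

    /// Hypothetical registration: a server name, its tier, and its owner.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Registration {
        pub server: String,
        pub tier: Tier,
        pub owner: String,
    }

    /// Keeps the highest-tier registration for each server name.
    pub fn resolve(registrations: &[Registration]) -> HashMap<&str, &Registration> {
        let mut winners: HashMap<&str, &Registration> = HashMap::new();
        for reg in registrations {
            let slot = winners.entry(reg.server.as_str()).or_insert(reg);
            if reg.tier > slot.tier {
                *slot = reg;
            }
        }
        winners
    }
    ```

    Keeping the whole winning registration, rather than only its name, is
    what lets provenance be read from the winner directly.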
    
    ## Behavior and scope
    
    This is a catalog and provenance change only. No production host
    contributes selected-plugin MCP registrations yet, so existing local MCP
    behavior remains unchanged.
    
    The stacked follow-up, #27870, installs the executor plugin provider
    that produces these registrations. App-server activation remains a
    separate final step.
    
    ## Verification
    
    Focused tests cover precedence, deterministic selected-plugin conflicts,
    disabled-veto behavior across catalog phases, managed requirements
    before selected-plugin resolution, winning-server approval policy, and
    attribution when local and selected packages share an ID or server name.
    CI owns execution of the test suite.
  • feat(app-server): filter threads by parent (#26662)
    ## Why
    
    Clients that display or coordinate spawned subagents need an
    authoritative snapshot of a thread's immediate spawned children when
    they connect to app-server or recover after missing live events.
    `thread/list` cannot query by parent, so clients must otherwise scan
    unrelated threads or reconstruct relationships from rollout history and
    transient events.
    
    The direct spawn relationship already exists in persisted
    `thread_spawn_edges` state. Review and Guardian threads do not
    participate in that lifecycle and are intentionally outside this
    filter's scope.
    
    ## What changed
    
    This adds an experimental `parentThreadId` filter to `thread/list`.
    Parent-filtered requests return direct spawned children from persisted
    state while preserving the existing response shape, explicit filters,
    sorting, and timestamp-only cursor behavior. The lookup does not read
    rollout transcripts or recursively return descendants.
    
    Supersedes #25112 with the narrower `thread/list` filter approach.
    
    ## How it works
    
    1. An experimental client passes a valid thread ID as `parentThreadId`.
    2. App-server routes the list through the existing thread-store and
    state-database boundaries.
    3. SQLite selects threads whose IDs have a direct persisted spawn edge
    from that parent.
    4. Omitted provider and source filters include all values; explicit
    filters keep ordinary `thread/list` semantics.
    5. Grandchildren, Review threads, and Guardian threads are excluded.
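
    A minimal in-memory sketch of the direct-child lookup in step 3,
    assuming spawn edges are `(parent, child)` pairs. The real lookup runs
    in SQLite; `SpawnEdge` and `direct_children` are hypothetical names, and
    the provider and source filters are omitted.

    ```rust
    /// Hypothetical persisted spawn edge: `parent` directly spawned `child`.
    #[derive(Debug, Clone, PartialEq)]
    pub struct SpawnEdge {
        pub parent: String,
        pub child: String,
    }

    /// Returns only the direct children of `parent`; it never follows edges
    /// transitively, so grandchildren are excluded.
    pub fn direct_children<'a>(edges: &'a [SpawnEdge], parent: &str) -> Vec<&'a str> {
        edges
            .iter()
            .filter(|edge| edge.parent == parent)
            .map(|edge| edge.child.as_str())
            .collect()
    }
    ```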
    
    ## Verification
    
    State (144 tests), rollout (69 tests), and focused app-server
    thread-list (31 tests) suites passed. Scoped Clippy checks and
    repository formatting also passed. Coverage includes direct spawned
    children, omitted grandchildren, pagination, malformed IDs, mixed source
    kinds, explicit filters, and operation without rollout files.
  • [codex] exec-server honors remote environment cwd and shell (#28122)
    ## Why
    
    The next slice needed to make progress on the `remote_env_windows` test
    is support for passing a Windows cwd to the remote environment and using
    that environment's native shell. This lets the test run a real Windows
    process instead of only recording an early path or shell mismatch.
    
    ## What
    
    - change `TurnEnvironmentSelection.cwd` from `AbsolutePathBuf` to
    `PathUri`
    - convert local cwd values to URIs when constructing selections
    - preserve a remote primary cwd instead of replacing it with the local
    legacy fallback
    - prefer the selected environment's discovered shell for unified exec,
    falling back to the session shell when unavailable
    - convert back to a host-native absolute path at current native-only
    consumer boundaries
    - reject or deny unsupported foreign cwd values at the existing
    request-permissions boundary, with TODOs for its future migration
    - extend the hermetic Wine test to execute Windows PowerShell in
    `C:\windows` and verify successful process completion
    - record the current app-server rejection against the same Wine-backed
    remote Windows fixture when its cwd is supplied as a native Windows path
  • path-uri: render native paths across platforms (#27819)
    ## Why
    
    We're moving to `PathUri` in more places to support cross-OS
    app-server/exec-server, but we don't want to expose the URI encoding to
    users of app-server's public APIs yet.
    
    We'll need to translate at the app-server API boundary between `PathUri`
    values and client-visible "regular" paths that suit the OS of the
    environment the paths belong to, which means using that environment's
    path personality to do the conversion.
    
    `PathUri` doesn't yet attempt to encode environment ID, so for now we'll
    sniff the most likely path convention for a given path.
    
    ## What
    
    - Add `PathConvention` and `NativePathString` with host-independent
    POSIX, Windows drive, and UNC rendering.
    - Cover cross-host rendering, encoding, Unicode, and invalid components.
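
    A minimal sketch of convention sniffing, assuming the decision is made
    from the shape of an absolute path string alone. `Convention`, `sniff`,
    and the specific heuristics are hypothetical and are not the real
    `PathConvention` logic.

    ```rust
    /// Hypothetical mirror of the conventions named above.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Convention {
        Posix,
        WindowsDrive,
        Unc,
    }

    /// Guesses the convention from the shape of an absolute path string.
    /// Returns `None` for strings that match none of the three shapes.
    pub fn sniff(path: &str) -> Option<Convention> {
        let bytes = path.as_bytes();
        if path.starts_with(r"\\") {
            Some(Convention::Unc)
        } else if bytes.len() >= 3
            && bytes[0].is_ascii_alphabetic()
            && bytes[1] == b':'
            && (bytes[2] == b'\\' || bytes[2] == b'/')
        {
            Some(Convention::WindowsDrive)
        } else if path.starts_with('/') {
            Some(Convention::Posix)
        } else {
            None
        }
    }
    ```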
  • bazel: add PowerShell to Wine test harness (#28120)
    ## Why
    
    Cross-OS tests in the Wine environment will be much more faithful if we
    can also test PowerShell integration.
    
    ## What
    
    Add an x86_64 PowerShell binary to the Bazel Wine environment and
    include smoke tests.
  • build: run buildifier from just fmt (#28125)
    ## Intent
    
    Keep Bazel and Starlark files consistently formatted without requiring
    contributors to install or version buildifier themselves.
    
    ## Implementation
    
    - Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier
    v8.5.1.
    - Run buildifier from the shared `just fmt` and `just fmt-check` driver,
    with Windows-safe explicit DotSlash invocation.
    - Provision DotSlash in formatting CI and contributor devcontainers, and
    document the source-build prerequisite.
    - Apply the initial mechanical buildifier formatting baseline.
  • [codex] Pin bundled SQLite to fixed WAL-reset version (#27992)
    ## Summary
    
    Prevent dependency refreshes from silently downgrading Codex's bundled
    SQLite to a release affected by the WAL-reset corruption bug.
    
    SQLx 0.9 accepts a broad `libsqlite3-sys` range. An unrelated lock
    refresh therefore moved Codex from `libsqlite3-sys 0.37.0` back to
    `0.35.0`, changing the bundled SQLite runtime from 3.51.3 to 3.50.2.
    SQLite documents the affected versions and fix in [The WAL Reset
    Bug](https://www.sqlite.org/wal.html#the_wal_reset_bug) and the [SQLite
    3.51.3 changelog](https://www.sqlite.org/changes.html#version_3_51_3).
  • [codex] Dedupe plugin MCPs by app declaration name (#27607)
    ## Context
    
    This is the next step in the plugin auth-routing stack. The earlier PRs
    make `PluginsManager` auth-aware and move the broad App/MCP surface
    decision into that layer. This PR narrows the ChatGPT/SIWC behavior so
    we only hide a plugin MCP server when it conflicts with an App
    declaration of the same name.
    
    In product terms: if a plugin exposes both an App route and MCP route
    for `foo`, ChatGPT/SIWC sessions should use the App route for `foo`. If
    the same plugin also exposes a separate MCP server like `foo2`, that MCP
    server should remain available.
    
    ```json
    // .app.json
    {
      "apps": {
        "foo": {
          "id": "connector_abc"
        }
      }
    }
    ```
    
    ```json
    // .mcp.json
    {
      "mcpServers": {
        "foo": {
          "url": "https://mcp.foo.com/mcp"
        },
        "foo2": {
          "url": "https://mcp.foo2.com/mcp"
        }
      }
    }
    ```
    
    ## Stack
    
    - PR1: #27652 seed plugin manager auth at construction.
    - PR2: #27459 route plugin surfaces by auth mode.
    - PR3: #27607 dedupe plugin MCP servers by App declaration name.
    - PR4: #27602 preserve plugin Apps in connector listings.
    - PR5: #27461 skip install-time plugin MCP OAuth for matching App
    routes.
    
    ## Summary
    
    - Preserve App declaration names in loaded plugin metadata.
    - Keep public effective App outputs as deduped connector IDs for
    existing callers.
    - For ChatGPT/SIWC, suppress only plugin MCP servers whose names match
    declared App names.
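
    The name-based rule can be sketched roughly as follows. This is an
    illustration only, not the `PluginsManager` code: `AuthMode`,
    `visible_mcp_servers`, and the map shapes are hypothetical, and the
    sketch assumes that non-ChatGPT sessions keep every MCP server.

    ```rust
    use std::collections::{BTreeMap, BTreeSet};

    /// Hypothetical auth modes; the real enum in Codex may look different.
    #[derive(Clone, Copy, PartialEq, Eq)]
    enum AuthMode {
        ChatGpt,
        ApiKey,
    }

    /// Returns the plugin MCP servers that remain visible for `auth`.
    /// `app_names` holds App declaration names (the keys under `apps`);
    /// `mcp_servers` maps MCP server name to URL (the `mcpServers` entries).
    fn visible_mcp_servers(
        auth: AuthMode,
        app_names: &BTreeSet<String>,
        mcp_servers: &BTreeMap<String, String>,
    ) -> BTreeMap<String, String> {
        mcp_servers
            .iter()
            // Assumption: only ChatGPT-style sessions hide name conflicts.
            .filter(|(name, _)| auth != AuthMode::ChatGpt || !app_names.contains(*name))
            .map(|(name, url)| (name.clone(), url.clone()))
            .collect()
    }

    fn main() {
        let app_names: BTreeSet<String> = ["foo".to_string()].into_iter().collect();
        let mcp_servers: BTreeMap<String, String> = [
            ("foo".to_string(), "https://mcp.foo.com/mcp".to_string()),
            ("foo2".to_string(), "https://mcp.foo2.com/mcp".to_string()),
        ]
        .into_iter()
        .collect();

        // `foo` conflicts with the App declaration, so only `foo2` remains.
        let chatgpt = visible_mcp_servers(AuthMode::ChatGpt, &app_names, &mcp_servers);
        println!("{:?}", chatgpt.keys().collect::<Vec<_>>());

        // Under the assumption above, other auth modes keep both servers.
        let api_key = visible_mcp_servers(AuthMode::ApiKey, &app_names, &mcp_servers);
        println!("{:?}", api_key.keys().collect::<Vec<_>>());
    }
    ```

    Matching on the declaration name rather than hiding the plugin's whole
    MCP surface is what lets `foo2` stay available in the example above.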
    
    ## Validation
    
    ```bash
    cargo fmt --all
    cargo test -p codex-core-plugins plugin_auth_projection
    cargo test -p codex-core-plugins effective_apps
    cargo test -p codex-core-plugins read_plugin_for_config_installed_git_source_reads_from_cache_without_cloning
    cargo test -p codex-core explicit_plugin_mentions_use_apps_for_chatgpt_dual_surface_plugins
    cargo test -p codex-core explicit_plugin_mentions_keep_non_conflicting_mcp_for_chatgpt_auth
    cargo test -p codex-app-server --test all plugin_install_filters_disallowed_apps_needing_auth
    git diff --check
    ```
    
    ---------
    
    Co-authored-by: Xin Lin <xl@openai.com>
  • [codex] Carry exec-server cwd as PathUri (#28032)
    ## Why
    
    This is the second-to-last place in the exec-server protocol that needs
    to migrate to URIs to support cross-OS operation.
    
    ## What
    
    - Change `ExecParams.cwd` to `PathUri`.
    - Keep the cwd URI-shaped through core and rmcp producers, converting it
    to `AbsolutePathBuf` only in `LocalProcess::start_process`.
    - Reject non-native cwd URIs before launch and update the affected
    protocol documentation and call sites.
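
    A rough sketch of the shape described above, not the actual Codex
    types: `PathUri` here is a hypothetical newtype over a `file://` URI
    string, and `to_native_cwd` and `CwdError` are illustrative names. The
    host OS is passed as a flag so both outcomes can be exercised on any
    machine. Percent-decoding and UNC paths are deliberately ignored.

    ```rust
    use std::path::PathBuf;

    /// Hypothetical stand-in for the protocol's URI-shaped path type.
    #[derive(Debug, PartialEq, Eq)]
    struct PathUri(String);

    #[derive(Debug, PartialEq, Eq)]
    enum CwdError {
        NotFileUri,
        NonNative,
    }

    /// True for URI paths of the form `/C:/...` (a Windows drive letter).
    fn has_drive_letter(path: &str) -> bool {
        let b = path.as_bytes();
        b.len() >= 3 && b[0] == b'/' && b[1].is_ascii_alphabetic() && b[2] == b':'
    }

    /// Converts the URI to a native path only at the launch boundary,
    /// rejecting URIs whose shape does not match the host OS.
    fn to_native_cwd(uri: &PathUri, host_is_windows: bool) -> Result<PathBuf, CwdError> {
        let path = uri.0.strip_prefix("file://").ok_or(CwdError::NotFileUri)?;
        // Only the empty-authority form `file:///...` is accepted in this sketch.
        if !path.starts_with('/') {
            return Err(CwdError::NotFileUri);
        }
        match (host_is_windows, has_drive_letter(path)) {
            (true, true) => Ok(PathBuf::from(path[1..].replace('/', "\\"))),
            (false, false) => Ok(PathBuf::from(path)),
            _ => Err(CwdError::NonNative),
        }
    }

    fn main() {
        let posix = PathUri("file:///home/dev/project".to_string());
        let windows = PathUri("file:///C:/work/project".to_string());
        // On a non-Windows host the first converts and the second is rejected.
        println!("{:?}", to_native_cwd(&posix, false));
        println!("{:?}", to_native_cwd(&windows, false));
    }
    ```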
  • [codex] package Windows ARM64 on x64 (#28001)
    The first release after parallelizing Windows packaging moved the
    critical path to the ARM64 packaging job:
    
    https://github.com/openai/codex/actions/runs/27451157324
    
    The x64 job started immediately and finished in 5m29s. The ARM64
    job waited 76s for its runner and then took 5m56s, holding the
    release for 1m43s after x64 had finished.
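
    The 1m43s figure follows from the timings above if both jobs are
    assumed to have been queued at the same moment. A small check, with
    `release_hold` as an illustrative helper name:

    ```rust
    use std::time::Duration;

    /// How long the slower job holds the release after the faster one
    /// finishes, assuming both jobs were queued at the same instant.
    fn release_hold(fast_run: Duration, slow_queue: Duration, slow_run: Duration) -> Duration {
        (slow_queue + slow_run).saturating_sub(fast_run)
    }

    fn main() {
        let x64_run = Duration::from_secs(5 * 60 + 29); // 5m29s
        let arm64_queue = Duration::from_secs(76); // wait for a runner
        let arm64_run = Duration::from_secs(5 * 60 + 56); // 5m56s

        let hold = release_hold(x64_run, arm64_queue, arm64_run);
        println!("{}m{}s", hold.as_secs() / 60, hold.as_secs() % 60); // prints "1m43s"
    }
    ```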
    
    Packaging only downloads, signs, archives, and compresses already-built
    binaries; it does not execute target code. Run both packaging
    jobs on x64 runners, keeping ARM64 hardware for compilation.
  • [codex] Send turn state through compact requests (#28002)
    ## Context
    
    Inline compaction is part of the active logical turn. Compact requests
    and the sampling requests around them should use the same turn state,
    including when compaction is the first request to establish it.
    
    ## Change
    
    Pass the turn-scoped `OnceLock` directly to inline v1 compaction so
    `/responses/compact` includes an established value in the existing HTTP
    header. Capture `x-codex-turn-state` from the compact response into that
    same lock, allowing pre-turn compact to establish the value that
    subsequent sampling reuses.
    
    V2 compact already uses the normal Responses HTTP/WebSocket path and
    continues to share the same `OnceLock` without separate plumbing. The
    first returned value wins for the logical turn.
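
    The "first returned value wins" behavior is what `std::sync::OnceLock`
    provides out of the box. A minimal sketch, in which the `TurnState`
    wrapper and its method names are hypothetical and the HTTP layer is
    left out entirely:

    ```rust
    use std::sync::OnceLock;

    /// Hypothetical holder for the turn-scoped state shared by compact
    /// and sampling requests.
    struct TurnState {
        value: OnceLock<String>,
    }

    impl TurnState {
        fn new() -> Self {
            Self { value: OnceLock::new() }
        }

        /// Value to attach to the next request, if one is established.
        fn outgoing(&self) -> Option<&str> {
            self.value.get().map(String::as_str)
        }

        /// Records a value returned by a response. Returns `true` only
        /// when this call established the state; later values are ignored.
        fn capture(&self, returned: &str) -> bool {
            self.value.set(returned.to_string()).is_ok()
        }
    }

    fn main() {
        let turn = TurnState::new();

        // Pre-turn compact: nothing to send yet, but its response sets the state.
        assert!(turn.outgoing().is_none());
        assert!(turn.capture("state-from-compact"));

        // The following sampling request reuses it; a second value does not replace it.
        assert!(!turn.capture("state-from-sampling"));
        assert_eq!(turn.outgoing(), Some("state-from-compact"));
    }
    ```

    Because `OnceLock::set` fails once a value is present, every request
    path that shares the lock observes the same value for the rest of the
    turn without extra coordination.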
    
    ## Test plan
    
    Integration coverage verifies that:
    
    - pre-turn v1 compact can establish state for the first sampling request
    - inline v1 compact receives established state over HTTP
    - inline v2 compact reuses established state over HTTP
    - inline v2 compact reuses established state over WebSocket
    
    CI validates the full change.