Commit Graph

7193 Commits

  • protocol: remove submission-side serde from Op (#26674)
    ## Why
    
    Submission-side `Op` payloads are now an internal handoff inside the
    Rust codebase, so keeping a stable serde contract there adds complexity
    without a real wire consumer.
    
    ## What changed
    
    - remove serde/schema annotations from `Submission`, `Op`, and
    submission-only payload types like thread settings overrides, additional
    context, realtime conversation params, `TurnEnvironmentSelection`, and
    `RequestUserInputResponse`
    - delete the `Op` serialization tests and the now-unused double-option
    prompt serde helper
    - keep event/API-facing serialization where it is still required, and
    serialize the `request_user_input` tool output from its wire payload
    instead of the core response struct
    - update `protocol_v1.md` to call out that events remain the serialized
    transport surface while submission payloads are implementation details
    
    ## Testing
    
    - `just test -p codex-protocol`
    - `cargo check -p codex-core -p codex-app-server -p codex-thread-store`
    - `just test -p codex-core request_user_input`
  • [2 of 2] Finish moving goal runtime to extension (#26548)
    ## Stack
    
    1. [#26547](https://github.com/openai/codex/pull/26547) - [1 of 2] Align
    goal extension with core behavior
    2. [#26548](https://github.com/openai/codex/pull/26548) - [2 of 2] Move
    goal runtime to extension
    
    ## Why
    
    This PR completes the switch of the goal behavior to the
    extension-backed runtime and removes the old core goal implementation.
    
    ## What Changed
    
    - Installs the goal extension for app-server `ThreadManager` sessions.
    - Routes app-server thread goal `get`, `set`, and `clear` through
    `GoalService`.
    - Uses thread-idle lifecycle emission after goal resume and snapshot
    ordering so the extension can decide whether to continue the goal.
    - Forwards extension goal updates through a FIFO async app-server
    notification path so backpressure neither drops nor reorders them.
    - Keeps review turns from enabling goal runtime behavior.
    - Plans extension tools before dynamic tools so built-in goal tool names
    keep their old precedence when goals are enabled.
    - Removes the old core goal runtime, core goal tool handlers, and core
    goal tool specs.
    - Updates tests that were coupled to the core-owned goal runtime while
    leaving the legacy `<goal_context>` compatibility path in core for old
    threads.
    - Removes the stale cargo-shear ignore now that `codex-goal-extension`
    is used by the workspace.
    - Keeps realtime event matching exhaustive after removing the old
    goal-specific realtime text path.
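    
    The FIFO forwarding bullet above can be illustrated with a minimal
    sketch. It uses a bounded `std::sync::mpsc::sync_channel` as a
    stand-in for the async notification path; `GoalUpdate` and
    `forward_updates` are hypothetical names, not types from this PR.
    
    ```rust
    use std::sync::mpsc::sync_channel;
    use std::thread;
    
    /// Hypothetical stand-in for a goal update forwarded to a client.
    #[derive(Debug, PartialEq)]
    struct GoalUpdate(u32);
    
    /// Sends `count` updates through a bounded FIFO channel and returns them
    /// in the order the consumer observed them.
    fn forward_updates(count: u32, capacity: usize) -> Vec<GoalUpdate> {
        let (tx, rx) = sync_channel::<GoalUpdate>(capacity);
        let producer = thread::spawn(move || {
            for i in 0..count {
                // `send` blocks while the queue is full, so backpressure
                // slows the producer instead of dropping an update.
                tx.send(GoalUpdate(i)).expect("receiver is alive");
            }
            // `tx` is dropped here, which ends the receiver's iterator.
        });
        let received: Vec<GoalUpdate> = rx.iter().collect();
        producer.join().expect("producer thread panicked");
        received
    }
    ```
    
    With a capacity of 1 the producer stalls on nearly every send, yet the
    consumer still sees every update in submission order.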
    
    
    ## Validation
    
    - Ran manual `/goal` runs in TUI. Validated time accounting matched
    wall-clock time and goal lifecycle state transitions.
  • [codex] Bound WSL local curated discovery (#26669)
    ## Context
    The installed-app suggestion expansion added in #24996 reads plugin
    details for trusted file-backed marketplace candidates because the list
    response does not include app ids. On Windows-backed WSL mounts, the
    local `openai-curated` checkout lives under `$CODEX_HOME/.tmp/plugins`,
    and those per-plugin detail reads can be very slow.
    
    Remote curated already has cached app ids, so it does not need the same
    local filesystem traversal.
    
    ## Summary
    - Keep only the WSL Windows-backed local `openai-curated` checkout on
    the legacy fallback/configured discovery path.
    - Preserve installed-app expansion for non-WSL file-backed marketplaces
    and remote curated.
    - Add focused tests for the WSL local curated path predicate.
    
    ## Test
    - `just test -p codex-core-plugins discoverable`
    - `just test -p codex-core plugins::discoverable::tests`
  • Add JSON output for plugin subcommands (#26631)
    ## Summary
    - Follow-up to #25330 and #26417
    - Add `--json` output for `codex plugin add` and `codex plugin remove`
    - Add `--json` output for `codex plugin marketplace
    add/list/upgrade/remove`
    - Keep existing human-readable output unchanged
    - Keep existing error handling/stderr behavior unchanged; `--json`
    changes successful stdout output only
    - Align marketplace add/remove JSON field names with the existing
    app-server protocol shape
    - Add CLI coverage for plugin and marketplace JSON outputs
    
    ## Validation
    - `just fmt`
    - `just fix -p codex-cli`
    - `just test -p codex-cli`
  • Speed up TUI startup by reusing plugin discovery (#26469)
    ## Summary
    
    TUI startup loads related plugin data from `hooks/list`, session MCP
    initialization, and plugin skill warmup. These paths repeated filesystem
    discovery and emitted the same plugin warnings, while `hooks/list` and
    account/model bootstrap ran serially.
    
    This change:
    
    - Reuses one immutable plugin load outcome across startup consumers.
    - Keys the cache only on plugin-relevant configuration.
    - Single-flights concurrent plugin loads and prevents invalidated loads
    from repopulating the cache.
    - Runs hook discovery and account/model bootstrap concurrently.
    - Preserves configuration-migration ordering, hook review behavior, and
    accurate startup telemetry.
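    
    One way to keep an invalidated load from repopulating a cache is a
    generation counter. The sketch below shows only that part (it does not
    implement single-flighting), and `PluginCache` with its methods is a
    hypothetical shape rather than the type used in this change.
    
    ```rust
    use std::sync::Mutex;
    
    /// Hypothetical cache that rejects results from loads that started
    /// before the most recent invalidation.
    struct PluginCache<T> {
        // (generation, cached value)
        inner: Mutex<(u64, Option<T>)>,
    }
    
    impl<T: Clone> PluginCache<T> {
        fn new() -> Self {
            Self { inner: Mutex::new((0, None)) }
        }
    
        /// Returns the generation a load should remember when it starts.
        fn begin_load(&self) -> u64 {
            self.inner.lock().unwrap().0
        }
    
        /// Stores `value` only if no invalidation happened since `started_at`.
        fn finish_load(&self, started_at: u64, value: T) -> bool {
            let mut guard = self.inner.lock().unwrap();
            if guard.0 != started_at {
                return false; // stale load: leave the cache untouched
            }
            guard.1 = Some(value);
            true
        }
    
        /// Clears the cache and bumps the generation.
        fn invalidate(&self) {
            let mut guard = self.inner.lock().unwrap();
            guard.0 += 1;
            guard.1 = None;
        }
    
        fn get(&self) -> Option<T> {
            self.inner.lock().unwrap().1.clone()
        }
    }
    ```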
    
    In 10 alternating release-build launches in the Ruff repository with the
    existing `~/.codex` configuration, median time to the first editable
    composer decreased from 833ms to 504ms. The branch was faster in 9 of 10
    pairs, with a paired median improvement of 312ms.
  • Use state DB first for resume --last (#26462)
    ## Summary
    
    `codex resume --last` currently lists sessions by updated time using
    scan-and-repair. Updated-time filesystem listing must stat every rollout
    before applying the cwd, provider, and source filters, so startup scales
    with the entire local session history.
    
    This change queries the state DB first for the latest matching session.
    For local workspaces, we only accept the indexed result when its rollout
    path still exists; otherwise we retry with scan-and-repair. The same
    lookup path is shared by `fork --last`.
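    
    The lookup order described above can be sketched as follows. The names
    `IndexedSession` and `latest_session` are hypothetical, and falling
    back when the state DB returns no row at all is an assumption of this
    sketch, not something the description states.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical indexed row: only the rollout path from the state DB.
    struct IndexedSession {
        rollout_path: PathBuf,
    }
    
    /// Accepts the indexed result when its rollout file still exists;
    /// otherwise retries with the slower scan-and-repair listing.
    fn latest_session(
        indexed: Option<IndexedSession>,
        path_exists: impl Fn(&Path) -> bool,
        scan_and_repair: impl FnOnce() -> Option<PathBuf>,
    ) -> Option<PathBuf> {
        match indexed {
            Some(row) if path_exists(&row.rollout_path) => Some(row.rollout_path),
            // Assumption: a missing row is treated like a missing file.
            _ => scan_and_repair(),
        }
    }
    ```
    
    The scan is passed as a closure so it only runs when the fast path is
    rejected.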
    
    I benchmarked the same `thread/list` request used by `resume --last` in
    my local `ruff` checkout against a Codex home with 2,599 active rollouts
    totaling 3.7 GiB, including 90 Ruff threads.
    
    Across five fresh release app-server processes with warm filesystem
    caches, the state-DB-only lookup had median latency of 0.37-0.44 ms,
    while scan-and-repair had median latency of 139-162 ms. First-request
    latency was 0.7-1.7 ms versus 142-185 ms.
    
    So this **removes roughly 140-160 ms from the `resume --last` lookup**
    on this machine, and makes that lookup over 300x faster.
    
    The tradeoff is that this does leave two correctness gaps:
    
    - If a newer matching rollout is missing from SQLite but an older
    matching row exists, the fast path resumes the older thread and never
    falls back to the filesystem scan.
    - If an existing row has stale filter or ordering metadata, the fast
    path can select a different thread from scan-and-repair. The rollout
    tests already demonstrate this for stale cwd metadata: state-DB-only
    returns the stale match, while scan-and-repair removes and repairs it.
    
    So you could end up seeing the "wrong" result in cases like...
    
    1. A crash or SQLite error occurs between Codex writing the conversation
    file and updating SQLite, leaving the newer file unindexed.
    
    2. An older Codex version, restore, or manual copy adds a conversation
    file after SQLite’s one-time backfill completed.
    
    These seem pretty rare though (and sessions can always be recovered via
    other mechanisms -- `--last` is just a convenience feature), and I think
    the tradeoffs are good here?
  • Make runtime workspace roots absolute in app-server API (#26552)
    Stacked on #26532.
    
    ## Why
    
    #26532 moves cwd normalization to the app-server/core boundary.
    `runtimeWorkspaceRoots` still accepted raw paths in v2 requests and in
    `ConfigOverrides`, which left core responsible for interpreting those
    roots later. This makes runtime workspace roots follow the same
    absolute-path boundary as cwd.
    
    ## What
    
    - Change v2 `runtimeWorkspaceRoots` request fields for `thread/start`,
    `thread/resume`, `thread/fork`, and `turn/start` to `AbsolutePathBuf`.
    - Deduplicate already-absolute runtime roots in app-server handlers and
    pass them through `ConfigOverrides.workspace_roots` as
    `AbsolutePathBuf`.
    - Update TUI and exec client request builders to pass absolute runtime
    roots directly.
    - Update app-server docs, schema fixtures, and focused tests for
    absolute runtime roots.
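    
    A rough sketch of order-preserving deduplication for runtime roots
    follows. It uses plain `PathBuf` with a runtime `is_absolute` check as
    a stand-in for `AbsolutePathBuf`, whose API is not shown here;
    `dedupe_absolute_roots` is a hypothetical helper.
    
    ```rust
    use std::collections::HashSet;
    use std::path::PathBuf;
    
    /// Rejects relative roots and drops duplicates while keeping the first
    /// occurrence of each root in its original position.
    fn dedupe_absolute_roots(roots: Vec<PathBuf>) -> Result<Vec<PathBuf>, String> {
        let mut seen = HashSet::new();
        let mut out = Vec::new();
        for root in roots {
            if !root.is_absolute() {
                return Err(format!(
                    "runtime workspace root is not absolute: {}",
                    root.display()
                ));
            }
            // `insert` returns false when the root was already seen.
            if seen.insert(root.clone()) {
                out.push(root);
            }
        }
        Ok(out)
    }
    ```
    
    With a typed absolute path the rejection branch becomes unnecessary,
    because callers cannot construct a relative value in the first place.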
    
    ## Testing
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server runtime_workspace_roots`
    - `just test -p codex-core
    session_permission_profile_rebinds_runtime_workspace_roots`
    - `just test -p codex-tui app_server_session`
    - `just test -p codex-exec`
  • [codex] Add turn profiling analytics (#26484)
    ## Summary
    
    Add flat profiling fields to `codex_turn_event` so analytics can explain
    where turn wall-clock time is spent without changing tool execution
    behavior.
    
    The profile reports:
    - time before the first sampling request
    - sampling time across all attempts and follow-ups
    - overhead between sampling requests
    - time blocked in the post-sampling tool drain
    - time after the final sampling request
    - sampling request and retry counts
    
    ## Implementation
    
    - Extend the existing turn timing state with constant-memory phase
    accounting and one RAII phase guard.
    - Observe sampling and the existing post-sampling drain only at turn
    orchestration boundaries.
    - Keep tool runtime, tool futures, response item handling, and turn
    lifecycle values unchanged.
    - Add the profiling fields directly to the existing analytics turn event
    without changing app-server protocol or rollout persistence.
    - Use the existing turn `status` to distinguish completed, failed, and
    interrupted profiles.
    
    Exact sampling/tool overlap is intentionally omitted because measuring
    tool completion accurately would require hooks in the tool execution
    path.
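    
    For readers unfamiliar with the RAII guard pattern mentioned above,
    here is a minimal sketch. `TurnProfile` and `SamplingGuard` are
    hypothetical names with only two of the reported fields; the real
    timing state is not shown in this description.
    
    ```rust
    use std::cell::RefCell;
    use std::time::{Duration, Instant};
    
    /// Hypothetical constant-memory profile: totals only, no per-event log.
    #[derive(Default)]
    struct TurnProfile {
        sampling: Duration,
        sampling_requests: u32,
    }
    
    /// Adds the elapsed time to the profile when it goes out of scope.
    struct SamplingGuard<'a> {
        profile: &'a RefCell<TurnProfile>,
        started: Instant,
    }
    
    impl<'a> SamplingGuard<'a> {
        fn start(profile: &'a RefCell<TurnProfile>) -> Self {
            profile.borrow_mut().sampling_requests += 1;
            Self { profile, started: Instant::now() }
        }
    }
    
    impl Drop for SamplingGuard<'_> {
        fn drop(&mut self) {
            // Runs on every exit path, including early returns and `?`.
            self.profile.borrow_mut().sampling += self.started.elapsed();
        }
    }
    ```
    
    Because the accounting happens in `Drop`, a failed or interrupted
    sampling request is still charged to the sampling phase.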
    
    ## Validation
    
    - Add app-server end-to-end coverage for a single-sampling turn with no
    blocking tool work.
    - Add app-server end-to-end coverage for `request_user_input` blocking
    followed by a second sampling request.
    - CI is running on the PR; tests were not executed locally per
    repository guidance.
  • [codex] Respect Windows sandbox backend in exec policy (#26307)
    ## Why
    
    Windows managed filesystem permissions can now be backed by a real
    Windows sandbox. `exec-policy` was still treating the managed read-only
    policy shape as if there were never a sandbox backend, so benign
    unmatched commands such as PowerShell directory listings could be
    rejected with `blocked by policy` even when `windows.sandbox` was
    enabled.
    
    The inverse case still needs to stay conservative: when the Windows
    sandbox backend is disabled, managed filesystem restrictions are only
    configuration intent, not an enforced filesystem boundary. That applies
    to writable-root restricted profiles too, not just read-only profiles.
    
    ## What Changed
    
    - Thread the effective `WindowsSandboxLevel` into exec-policy approval
    decisions for shell, unified exec, and intercepted shell exec paths.
    - Treat managed restricted filesystem profiles as lacking sandbox
    protection only on Windows when `WindowsSandboxLevel::Disabled`.
    - Exclude full-disk-write profiles from that no-backend path because
    they do not rely on filesystem sandbox enforcement.
    - Remove the cwd-sensitive read-only heuristic and the now-stale cwd
    plumbing from exec-policy approval contexts.
    - Add Windows coverage for both enabled-sandbox and disabled-backend
    behavior, including a writable-root managed profile.
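    
    The decision rule in the bullets above can be written as a small
    predicate. The enums below are simplified, hypothetical stand-ins
    (the real `WindowsSandboxLevel` likely has more variants), so treat
    this as a sketch of the rule rather than the actual code.
    
    ```rust
    /// Hypothetical two-state view of the Windows sandbox backend.
    #[derive(Clone, Copy, PartialEq)]
    enum BackendLevel {
        Disabled,
        Enabled,
    }
    
    /// Hypothetical managed filesystem profile shapes.
    #[derive(Clone, Copy, PartialEq)]
    enum FilesystemProfile {
        ReadOnly,
        WritableRoots,
        FullDiskWrite,
    }
    
    /// True when a managed restricted profile is only configuration intent,
    /// i.e. nothing actually enforces the filesystem boundary.
    fn lacks_sandbox_protection(
        is_windows: bool,
        backend: BackendLevel,
        profile: FilesystemProfile,
    ) -> bool {
        is_windows
            && backend == BackendLevel::Disabled
            // Full-disk-write profiles do not rely on filesystem enforcement.
            && profile != FilesystemProfile::FullDiskWrite
    }
    ```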
    
    ## Validation
    
    - Added/updated `exec_policy` coverage for managed filesystem
    restrictions, full-disk-write exclusion, enabled Windows sandbox
    behavior, and disabled-backend read-only/writable-root behavior.
    - `just test -p codex-core exec_policy` — 100 passed, 10 leaky
    - Empirical local `codex exec` probe with `--sandbox read-only -c
    'windows.sandbox="unelevated"'`: PowerShell directory listing completed
    successfully.
    - Disabled-backend control with Windows sandbox cleared: the same
    command was rejected with `blocked by policy`.
  • fix(tui): restore cancelled prompt cursor at end (#26457)
    ## Why
    
    Pressing `Esc` on a turn that produced no visible output restores the
    submitted prompt so the user can keep editing it. That restore path
    preserved the prompt content, images, and mention bindings, but left the
    composer cursor at the start of the restored text. The next edit
    therefore inserted at the beginning instead of continuing from the end
    of the prompt.
    
    ## What Changed
    
    - Move the cursor to the end after
    `BottomPane::set_composer_text_with_mention_bindings` rehydrates a
    restored draft.
    - Add test-only cursor accessors so restore tests can assert the
    composer state directly.
    - Extend the queued restore regression to assert the restored composer
    cursor is positioned at `text.len()`.
    
    ## How to Test
    
    Manual reviewer flow:
    
    1. Start Codex in the TUI.
    2. Submit a prompt that will take long enough to interrupt.
    3. Press `Esc` before any visible assistant output appears.
    4. Confirm the prompt is restored into the composer and the cursor is at
    the end, so typing appends to the prompt.
    5. Repeat with a prompt that includes an attached image or resolved
    mention and confirm the restored content remains intact.
    
    Targeted tests:
    
    - `just test -p codex-tui
    chatwidget::tests::composer_submission::queued_restore_with_remote_images_keeps_local_placeholder_mapping`
    
    Lint note:
    
    - `just argument-comment-lint` is blocked locally by the existing Bazel
    `compiler-rt` empty glob failure before analyzing touched code. The
    touched Rust diff was manually inspected and adds no new opaque
    positional literal callsites.
  • fix(tui): Windows composer background (#26181)
    ## Why
    
    On Windows, the TUI could not shade the composer against the terminal
    background because `terminal_palette::default_colors()` always fell back
    to `None`. That preserved safety, but it also meant terminals that do
    support OSC 10/11 default color replies had no path to report their real
    background color.
    
    This keeps the existing fallback behavior for unsupported terminals
    while allowing capable Windows terminals to report their default
    foreground/background colors during startup.
    
    | Before | After |
    |---|---|
    | <img width="1235" height="658" alt="win-before"
    src="https://github.com/user-attachments/assets/ff756589-fcb3-43de-8f2a-ebc0369b30dd"
    /> | <img width="1235" height="658" alt="win-after"
    src="https://github.com/user-attachments/assets/9563ff20-4be5-4608-9414-a2afb647e745"
    /> |
    
    ## What Changed
    
    - Moved the OSC 10/11 default color parser in
    `tui/src/terminal_probe.rs` out of the Unix-only implementation so it
    can be reused by Windows.
    - Added a Windows-only bounded OSC 10/11 probe using raw console handles
    and the existing `windows-sys` dependency.
    - Added Windows palette caching in `tui/src/terminal_palette.rs` so
    startup probe results, including `None`, are reused instead of probing
    again later.
    - Wired the Windows color probe into TUI startup after the existing
    non-Unix crossterm cursor and keyboard checks.
    - Added parser coverage for malformed, partial, and noisy OSC color
    replies.
    
    If the probe fails, times out, receives only one color, or receives
    malformed data, the cache stores `None` and the composer keeps the
    current behavior.
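    
    As an illustration of what parsing an OSC 10/11 reply involves, the
    sketch below handles only the color payload in the common
    `rgb:RRRR/GGGG/BBBB` form, after the escape prefix and terminator have
    been stripped. `parse_osc_rgb` and `scale_component` are hypothetical
    helpers, not the parser in `tui/src/terminal_probe.rs`.
    
    ```rust
    /// Parses a payload such as `rgb:1e1e/1e1e/1e1e` into 8-bit RGB.
    /// Returns `None` for malformed or partial payloads.
    fn parse_osc_rgb(payload: &str) -> Option<(u8, u8, u8)> {
        let rest = payload.strip_prefix("rgb:")?;
        let mut parts = rest.split('/');
        let r = scale_component(parts.next()?)?;
        let g = scale_component(parts.next()?)?;
        let b = scale_component(parts.next()?)?;
        if parts.next().is_some() {
            return None; // more than three components
        }
        Some((r, g, b))
    }
    
    /// Scales a 1 to 4 digit hex component to the 0..=255 range.
    fn scale_component(hex: &str) -> Option<u8> {
        if hex.is_empty() || hex.len() > 4 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        let value = u32::from_str_radix(hex, 16).ok()?;
        let max = (1u32 << (4 * hex.len() as u32)) - 1;
        Some((value * 255 / max) as u8)
    }
    ```
    
    Returning `None` on any irregularity matches the fallback behavior
    described above: a bad reply simply leaves the composer unshaded.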
    
    ## How to Test
    
    1. On Windows, start Codex in a terminal that supports OSC 10/11 default
    color replies.
    2. Open the TUI composer.
    3. Confirm the composer/status area is painted using the terminal's
    reported default background, instead of leaving the background unshaded.
    4. Start Codex in a terminal that does not answer OSC 10/11, or
    otherwise blocks terminal color replies.
    5. Confirm startup still succeeds and the composer uses the existing
    fallback behavior.
    
    Targeted tests:
    
    - `CARGO_TARGET_DIR=/private/tmp/codex-windows-osc-default-colors-target
    just test -p codex-tui terminal_probe`
    
    Additional local verification:
    
    - `CARGO_TARGET_DIR=/private/tmp/codex-windows-osc-default-colors-target
    just test -p codex-tui` was run; 2774 tests passed, and two unrelated
    Guardian feature-flag tests failed reproducibly when isolated.
    - `just argument-comment-lint` was attempted but blocked by the local
    Bazel/LLVM `include/sanitizer/*.h` empty glob issue. Touched Rust
    literal callsites were inspected manually.
    - `cargo check -p codex-tui --target x86_64-pc-windows-msvc` was
    attempted after installing the target, but local macOS cross-checking is
    blocked by missing Windows C SDK headers in native dependencies
    (`ring`/`aws-lc-sys`).
    
    ---------
    
    Co-authored-by: Kevin Bond <kbond@openai.com>
  • [1 of 2] Align goal extension with core behavior (#26547)
    ## Stack
    
    1. [#26547](https://github.com/openai/codex/pull/26547) - [1 of 2] Align
    goal extension with core behavior
    2. [#26548](https://github.com/openai/codex/pull/26548) - [2 of 2] Move
    goal runtime to extension
    
    ## Why
    
    The goal runtime is moving out of `codex-core` and into
    `codex-goal-extension`. This first PR brings the extension back in line
    with the current core behavior before the follow-up PR switches
    app-server sessions over to the extension, so that review can focus on
    ownership and wiring rather than hidden behavior drift.
    
    ## What Changed
    
    - Updates the extension `create_goal` and `update_goal` tool
    schemas/descriptions to match the current core wording for explicit
    token budgets, blocked-goal audits, resumed blocked goals, and
    system-owned budget/usage-limit transitions.
    - Marks `codex-goal-extension` as the live `/goal` extension crate
    rather than an unwired sketch.
    - Looks up the live thread before reading goal state for idle
    continuation, so continuation setup exits early when no live thread can
    accept the automatic turn.
  • Clean up Rust release workflow (#26335)
    ## Why
    PR #26252 moved macOS release signing into the tag-triggered
    `rust-release` workflow through the protected `codesigning` environment
    and Azure Key Vault. That leaves the old manual unsigned-build /
    signed-promotion handoff as dead compatibility scaffolding: it makes the
    release DAG harder to reason about and keeps paths around that the
    current release process no longer intends to operate.
    
    ## What changed
    - Remove the manual `workflow_dispatch` inputs and validation for
    `build_unsigned`, `promote_signed`, and the deprecated `sign_macos`
    flag.
    - Drop the `stage-signed-macos` job and the promotion-specific artifact
    download, re-upload, pruning, and cleanup logic.
    - Make tag-pushed releases always follow the signed release path: build,
    sign, package, finalize, publish, and then run downstream release jobs
    from `release` success.
    - Remove stale `SIGN_MACOS` / `sign_macos` conditions and outputs,
    including downstream gates for npm, DotSlash, WinGet, dev website
    deploy, and `latest-alpha-cli` branch updates.
    
    ## Verification
    - `ruby -e 'require "yaml"; YAML.load_file(ARGV.fetch(0)); puts "yaml
    ok"' .github/workflows/rust-release.yml`
    - `git diff --check`
    - `rg -n
    "workflow_dispatch|inputs\\.|release_mode|build_unsigned|SIGN_MACOS|outputs\\.sign_macos|sign_macos\\b"
    .github/workflows/rust-release.yml` returned no matches
  • feat(app-server): add remote control pairing status RPC (#26450)
    ## What
    
    Exposes the pairing status transport as experimental app-server v2 RPC
    `remoteControl/pairing/status`.
    
    - Adds request/response protocol types for exactly one lookup key:
    `pairingCode` or `manualPairingCode`, returning `{ claimed }`.
    - Registers the RPC with `global_shared_read("remote-control-pairing")`.
    - Wires the method through `MessageProcessor` and
    `RemoteControlRequestProcessor`.
    - Validates missing/conflicting pairing-code params as invalid requests.
    - Documents the RPC in `app-server/README.md`.
    - Adds processor, protocol export, and JSON-RPC integration coverage for
    both code paths.
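    
    The "exactly one lookup key" rule can be sketched as a match over two
    optional fields. `PairingLookup`, `validate_lookup`, and the error
    strings are hypothetical; only the two parameter names come from the
    description above.
    
    ```rust
    /// Hypothetical validated form of the request: exactly one key is set.
    #[derive(Debug, PartialEq)]
    enum PairingLookup {
        PairingCode(String),
        ManualPairingCode(String),
    }
    
    /// Rejects requests where both keys or neither key is present.
    fn validate_lookup(
        pairing_code: Option<String>,
        manual_pairing_code: Option<String>,
    ) -> Result<PairingLookup, &'static str> {
        match (pairing_code, manual_pairing_code) {
            (Some(code), None) => Ok(PairingLookup::PairingCode(code)),
            (None, Some(code)) => Ok(PairingLookup::ManualPairingCode(code)),
            (Some(_), Some(_)) => Err("provide only one of pairingCode or manualPairingCode"),
            (None, None) => Err("one of pairingCode or manualPairingCode is required"),
        }
    }
    ```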
    
    ## Why
    
    This is the app-server surface the desktop app can poll while the
    QR/manual pairing modal is active.
    
    Depends on https://github.com/openai/codex/pull/26449
    Related backend change: https://github.com/openai/openai/pull/990244
    
    ## Verification
    
    - `cargo test --manifest-path app-server-protocol/Cargo.toml
    remote_control`
    - `cargo test --manifest-path app-server/Cargo.toml remote_control`
    - `cargo fmt --all --check`
    - `git diff --check`
  • fix(tui): avoid doubled blank rows while streaming (#26636)
    ## Summary
    
    During assistant-message streaming, blank markdown lines in the
    transient active tail were prefixed with two spaces. Ratatui measured
    those whitespace-only lines as two viewport rows, so list- and
    table-heavy answers showed doubled vertical gaps while streaming and
    then visibly compacted when finalized into scrollback.
    
    - keep whitespace-only `StreamingAgentTailCell` lines structurally empty
    while preserving nonblank message prefixes
    - clear impossible hyperlink metadata when normalizing a blank tail line
    - add an inline snapshot and height regression proving one blank
    markdown line occupies one viewport row
    
    Related to #26618, but fixes a separate live-tail row-height issue
    rather than stale committed markdown content.
    
    ## How to Test
    
    Recommended before/after reproduction:
    
    1. Start the latest Codex build without this change.
    2. Submit this exact prompt:
    
    > Send 20 different lists: bullets vs numbered, simple vs complex with
    paragraphs in between items, etc. Intertwine them with some tables and
    some paragraphs.
    
    3. While the answer streams, observe duplicated vertical gaps around
    list items and paragraphs. When the answer finishes, observe the spacing
    compact.
    4. Start this branch with `just c` and submit the same prompt.
    5. Confirm each intended blank markdown line occupies one terminal row
    throughout streaming and that the spacing does not compact or jump when
    the answer finishes.
    6. As a focused regression, verify the sections after the first table,
    especially loose lists with paragraphs between items; those blank rows
    should remain stable throughout streaming.
    
    Targeted tests:
    
    - `just test -p codex-tui
    streaming_agent_tail_blank_line_uses_one_viewport_row`
    - `just test -p codex-tui history_cell::tests`
    
    ## Test Notes
    
    - Verified the exact prompt above in a real tmux TUI using latest Codex
    and this branch as the before/after comparison.
    - The full `just test -p codex-tui` run completed 2,782 of 2,784 tests
    successfully. Two unrelated guardian feature-flag tests fail
    reproducibly in isolation because the expected `OverrideTurnContext`
    message is absent.
    - `just argument-comment-lint` is blocked locally by the existing Bazel
    `compiler-rt` missing-header glob error; the touched Rust diff was
    inspected manually for opaque positional literals.
  • Make turn diff tracker multi-env aware (#26433)
    ## Why
    
    Turn diffs were tracked as one flat set of absolute paths. In
    multi-environment turns, local and remote environments can report the
    same path while representing different filesystems, so a single path key
    can collapse distinct changes or attribute them to the wrong
    environment.
    
    The environment name is **NOT** included in the generated unified diff.
    This can come later.
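    
    To make the collision concrete, here is a minimal sketch of keying
    changes by environment and path instead of by path alone. The
    `TurnDiffTracker` shape below is hypothetical and far simpler than a
    real tracker.
    
    ```rust
    use std::collections::BTreeMap;
    use std::path::PathBuf;
    
    /// Hypothetical tracker keyed by (environment name, absolute path).
    #[derive(Default)]
    struct TurnDiffTracker {
        changes: BTreeMap<(String, PathBuf), String>,
    }
    
    impl TurnDiffTracker {
        /// Records the latest change summary for a path in one environment.
        fn record(&mut self, env: &str, path: &str, summary: &str) {
            self.changes
                .insert((env.to_string(), PathBuf::from(path)), summary.to_string());
        }
    
        fn len(&self) -> usize {
            self.changes.len()
        }
    }
    ```
    
    With a path-only key, a local and a remote edit to the same path would
    overwrite each other; with the composite key both entries survive.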
  • feat(remote-control): add pairing status transport (#26449)
    ## What
    
    Adds transport support for checking remote-control pairing status
    against the backend.
    
    - Adds the normalized `server/pair/status` backend URL.
    - Adds backend request/response structs for exactly one lookup key:
    `pairing_code` or `manual_pairing_code`, returning `{ claimed }`.
    - Adds `RemoteControlEnrollment::pairing_status` and
    `RemoteControlHandle::pairing_status`.
    - Preserves auth refresh/retry behavior and backend error mapping.
    - Adds transport coverage for pending, claimed, manual-code payloads,
    token refresh, mapped backend errors, malformed responses, and URL
    normalization.
    
    ## Why
    
    Desktop needs a host-authenticated way to poll whether a QR or manual
    pairing code has been claimed.
    
    Related backend change: https://github.com/openai/openai/pull/990244
    
    ## Verification
    
    - `cargo test --manifest-path app-server-transport/Cargo.toml
    remote_control::tests::pairing_tests`
    - `cargo fmt --all --check`
    - `git diff --check`
  • [codex] Add /usr/bin/bash shell fallback (#26538)
    ## Why
    
    Some Linux environments expose `bash` at `/usr/bin/bash` instead of
    `/bin/bash`. The shell detection fallback list should cover both
    standard locations once PATH/user-shell probing fails.
    
    Stacked on #26480.
    
    ## What changed
    
    - Add `/usr/bin/bash` to the bash fallback path list in
    `codex-shell-command`.
    - Extend shell type detection coverage for `/usr/bin/bash`.
    - Add AGENTS.md testing guidance to avoid tests for statically defined
    values and negative tests for removed logic.
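    
    A fallback list like this is typically probed in order until one path
    exists. The sketch below assumes that ordering (`/bin/bash` first) and
    uses hypothetical names; the existence check is injected so the logic
    can be exercised without touching the filesystem.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Assumed probe order; the actual list in `codex-shell-command` may differ.
    const BASH_FALLBACKS: [&str; 2] = ["/bin/bash", "/usr/bin/bash"];
    
    /// Returns the first candidate for which `exists` reports true.
    fn first_existing(candidates: &[&str], exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
        candidates
            .iter()
            .map(|candidate| Path::new(candidate))
            .find(|path| exists(path))
            .map(Path::to_path_buf)
    }
    ```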
    
    ## Verification
    
    - `just test -p codex-shell-command`
  • [codex] Allow socketpair in proxy-routed Linux sandbox (#26625)
    ## Summary
    
    - allow `socketpair(AF_UNIX, ...)` in the proxy-routed Linux seccomp
    mode
    - continue denying `socket(AF_UNIX, ...)` so user commands cannot create
    pathname or abstract Unix sockets
    - extend the managed-proxy integration test to verify both behaviors
    
    ## Root cause
    
    `NetworkSeccompMode::ProxyRouted` treated anonymous Unix socket pairs
    like externally addressable Unix sockets and returned `EPERM`. This
    breaks tools that use socket pairs for local child-process IPC even
    though a socket pair cannot connect outside the sandbox or bypass the
    routed proxy.
    
    `dangerously_allow_all_unix_sockets` controls Unix-socket requests
    forwarded by the managed network proxy; it does not currently configure
    the Linux seccomp filter. Socket pairs should not require that dangerous
    setting because they are unnamed, process-local IPC.
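    
    For context on why socket pairs are harmless here, the sketch below
    uses the standard library's `UnixStream::pair`, which creates an
    unnamed pair of connected sockets with no filesystem or abstract
    address. The helper name is hypothetical, and the example is Unix-only.
    
    ```rust
    use std::io::{Read, Write};
    use std::os::unix::net::UnixStream;
    
    /// Sends `message` through an anonymous socket pair and reads it back.
    fn roundtrip_over_socketpair(message: &[u8]) -> std::io::Result<Vec<u8>> {
        // Neither end has a name, so nothing outside this process (or its
        // children that inherit the descriptors) can connect to it.
        let (mut parent, mut child) = UnixStream::pair()?;
        parent.write_all(message)?;
        drop(parent); // close the writer so the reader sees end-of-stream
        let mut received = Vec::new();
        child.read_to_end(&mut received)?;
        Ok(received)
    }
    ```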
    
    Related but independent: #26553 fixes host proxy bridge socket path
    length handling.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Require absolute cwd in thread settings (#26532)
    ## Why
    
    Thread settings cwd overrides are expected to be resolved before they
    enter core. Keeping this boundary as a plain `PathBuf` made it easy for
    core/session code to keep fallback normalization and relative-path
    resolution logic in places that should only receive an already-resolved
    cwd.
    
    This is intentionally the absolute-cwd-only slice: it does not change
    environment selection stickiness or cwd-to-default-environment fallback
    behavior.
    
    ## What changed
    
    - Changes `ThreadSettingsOverrides.cwd`,
    `CodexThreadSettingsOverrides.cwd`, and `SessionSettingsUpdate.cwd` to
    use `AbsolutePathBuf`.
    - Removes core-side cwd normalization/resolution from session settings
    updates.
    - Updates affected core/app-server test helpers and callsites to pass
    existing absolute cwd values or use `abs()` helpers.
    
    ## Validation
    
    Opening as draft so CI can start while local validation continues.
  • feat: reload v2 agents on delivery (#26623)
    ## Summary
    
    This is the first small step toward making multi-agent v2 agents durable
    logical agents whose `ThreadManager` residency is only an implementation
    detail.
    
    This PR adds a narrow v2 reload-on-delivery hook:
    
    - If a known v2 agent target is already loaded, delivery is unchanged.
    - If the target is still registered but missing from `ThreadManager`,
    delivery reloads that exact v2 thread from durable rollout history
    before submitting the message.
    - If the target is unknown, closed, missing from storage, or not a v2
    thread, delivery still fails as not found.
    
    The reload is wired only into existing-agent delivery paths: v2
    `send_message` / `followup_task`, and legacy `send_input` when its
    target is a known v2 agent.
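    
    The three delivery cases listed above reduce to a small decision
    function. `Target`, `Delivery`, and `plan_delivery` are hypothetical
    names used only to spell out the rule.
    
    ```rust
    /// Hypothetical outcome of resolving a delivery target.
    #[derive(Debug, PartialEq)]
    enum Delivery {
        DeliverToLoaded,
        ReloadThenDeliver,
        NotFound,
    }
    
    /// Hypothetical facts known about a target agent at delivery time.
    struct Target {
        registered: bool,
        closed: bool,
        is_v2: bool,
        loaded: bool,
        in_storage: bool,
    }
    
    /// Decides whether to deliver directly, reload first, or fail.
    fn plan_delivery(target: &Target) -> Delivery {
        if !target.registered || target.closed || !target.is_v2 {
            return Delivery::NotFound;
        }
        if target.loaded {
            return Delivery::DeliverToLoaded; // unchanged behavior
        }
        if target.in_storage {
            Delivery::ReloadThenDeliver
        } else {
            Delivery::NotFound
        }
    }
    ```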
    
    ## Stack
    
    1. **Reload on delivery**: load known unloaded v2 agents before
    `followup_task`, `send_message`, or `send_input` delivery. This PR.
    2. **Residency LRU**: unload idle resident v2 agents from
    `ThreadManager` without making them closed or unreachable.
    3. **Execution concurrency**: count active non-root turns, not logical
    agents or resident idle threads.
    4. **Close semantics**: make v2 close interrupt-only and leave durable
    agent identity intact.
    5. **Resume cleanup**: remove user-facing v2 resume semantics;
    addressing an unloaded durable agent reloads it implicitly.
    
    ## Validation
    
    - Ran `just fmt`.
    - Left broader tests and clippy to CI.
  • Render code comment directives in TUI replay (#26554)
    ## Summary
    
    Resumed Codex App or VS Code review sessions can contain
    `::code-comment` directives that the TUI previously displayed verbatim
    because only rich clients interpret them.
    
    This change rewrites valid line-start directives into readable Markdown
    during assistant-message parsing, using the session working directory
    for relative file paths. The fallback is applied consistently to live
    messages, replayed transcripts, and resume previews while preserving
    malformed directives and existing `::git-*` parsing.
    
    ## Before
    
    The TUI exposed the raw client directive:
    
    ```text
    ::code-comment{title="Fix body= parsing" body="Keep role=\"tab\", ::git-stage{cwd=/tmp}, file=, and \n literal." file="/repo/src/app.ts" start=10 end=12 priority="P2"}
    ```
    
    ## After
    
    The same directive is rendered as readable review feedback:
    
    ```text
    - [P2] Fix body= parsing — src/app.ts:10-12
      Keep role="tab", ::git-stage{cwd=/tmp}, file=, and \n literal.
    ```
    
    Fixes #25658
  • Fix /goal usage text for control commands (#26551)
    ## Why
    
    The TUI's `/goal` usage text only advertised the objective form even
    though `/goal clear`, `/goal edit`, `/goal pause`, and `/goal resume`
    are implemented. This made the lifecycle controls difficult to discover
    and allowed the duplicated help text to drift from actual behavior.
    
    Fixes #25530.
    
    ## What changed
    
    - Show the complete `/goal [<objective>|clear|edit|pause|resume]` syntax
    in usage messages.
    - Share one usage string across slash-command dispatch and goal-related
    app messages.
    - Add inline snapshot coverage for the control-command usage path.
  • Open Windows app workspaces via deep link (#26500)
    ## Summary
    
    Fixes #26423.
    
    On Windows, `codex app PATH` detected Codex Desktop and launched the app
    shell target, then only printed a manual instruction to open the
    workspace. The Desktop app already supports
    `codex://threads/new?path=...`, so the CLI can open the requested
    workspace directly.
    
    This updates the Windows launcher to normalize the workspace path,
    encode it into a `codex://threads/new` deep link, and open that URL when
    Codex Desktop is installed. The installer fallback still opens the
    Windows installer and prints the workspace path for after installation.
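    
    A minimal sketch of the encoding step, assuming the workspace path has
    already been normalized. `percent_encode_query_value` and
    `workspace_deep_link` are hypothetical names, and the launcher's actual
    encoding rules may differ; only the `codex://threads/new?path=` shape
    comes from the description above.
    
    ```rust
    /// Percent-encodes a string for use as a URL query value.
    /// RFC 3986 unreserved characters pass through; every other byte
    /// becomes `%XX`.
    fn percent_encode_query_value(input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        for byte in input.bytes() {
            match byte {
                b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                    out.push(byte as char);
                }
                _ => out.push_str(&format!("%{byte:02X}")),
            }
        }
        out
    }
    
    /// Hypothetical helper: builds the deep link for a workspace path.
    fn workspace_deep_link(workspace: &str) -> String {
        format!(
            "codex://threads/new?path={}",
            percent_encode_query_value(workspace)
        )
    }
    ```
    
    Encoding every non-unreserved byte is deliberately conservative: drive
    colons, backslashes, and spaces in Windows paths all survive the trip
    through the URL handler.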
  • Surface TUI config write error causes (#26537)
    ## Summary
    
    TUI config writes currently wrap app-server failures with local context
    like `config/batchWrite failed in TUI`, but several user-visible paths
    only render the outer error. That hides the actionable app-server
    message, such as validation constraints or read-only `CODEX_HOME`
    failures, leaving users with a dead-end diagnostic.
    
    This change adds a small formatter next to the TUI config write helpers
    that renders the error source chain, then uses it for model persistence,
    feature persistence, project trust, status line writes, hook trust, and
    hook enablement.
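    
    The source-chain rendering can be sketched with the standard
    `Error::source` walk. `format_error_chain` and `ContextError` are
    hypothetical names for illustration, and the `": "` separator is an
    assumption; the real formatter may lay the chain out differently.
    
    ```rust
    use std::error::Error;
    use std::fmt;
    
    /// Renders an error followed by each of its sources, joined by ": ".
    fn format_error_chain(err: &dyn Error) -> String {
        let mut rendered = err.to_string();
        let mut source = err.source();
        while let Some(cause) = source {
            rendered.push_str(": ");
            rendered.push_str(&cause.to_string());
            source = cause.source();
        }
        rendered
    }
    
    /// Hypothetical wrapper standing in for the TUI-side context error.
    #[derive(Debug)]
    struct ContextError {
        context: &'static str,
        source: Box<dyn Error + 'static>,
    }
    
    impl fmt::Display for ContextError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // Display prints only the outer context, like the old behavior.
            f.write_str(self.context)
        }
    }
    
    impl Error for ContextError {
        fn source(&self) -> Option<&(dyn Error + 'static)> {
            Some(self.source.as_ref())
        }
    }
    ```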
    
    Fixes #26077
  • [codex] Fix long proxy socket paths (#26553)
    ## Summary
    
    - avoid generating host proxy bridge Unix socket paths that exceed
    Linux's `sockaddr_un.sun_path` limit
    - fall back from a long `$CODEX_HOME/tmp` path to the system temp
    directory, then `/tmp`
    - add focused unit coverage for short and overlong parent paths
    
    ## Root cause
    
    With a sufficiently long `CODEX_HOME`, the generated
    `proxy-route-*.sock` path exceeds Linux's 107-byte pathname limit. The
    host bridge child exits before writing its readiness byte, so the parent
    reports the indirect error `failed to prepare host proxy routing bridge:
    failed to fill whole buffer`.
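    
    The fallback order can be sketched as a first-fit search over candidate
    parent directories. `candidate_parents` and `pick_socket_path` are
    hypothetical names, not the functions in `codex-linux-sandbox`; the
    107-byte limit and the directory order come from the description above.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Longest pathname that fits in Linux's `sockaddr_un.sun_path`.
    const MAX_SOCKET_PATH_BYTES: usize = 107;
    
    /// Candidate parents in fallback order: `$CODEX_HOME/tmp`, the system
    /// temp directory, then `/tmp`.
    fn candidate_parents(codex_home: &Path) -> Vec<PathBuf> {
        vec![
            codex_home.join("tmp"),
            std::env::temp_dir(),
            PathBuf::from("/tmp"),
        ]
    }
    
    /// Returns the first socket path that fits within the limit, if any.
    fn pick_socket_path(parents: &[PathBuf], file_name: &str) -> Option<PathBuf> {
        parents
            .iter()
            .map(|dir| dir.join(file_name))
            .find(|path| path.as_os_str().len() <= MAX_SOCKET_PATH_BYTES)
    }
    ```
    
    Checking the length before binding lets the parent report a direct error
    instead of waiting on a readiness byte that never arrives.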
    
    ## Validation
    
    - reproduced the original error with a long `CODEX_HOME` using
    `codex-cli 0.138.0-alpha.4`
    - `cargo clippy -p codex-linux-sandbox --all-targets`
    - `just fix -p codex-linux-sandbox`
    - `just fmt`
    
    The Linux-only unit test could not execute locally: the arm64 Docker
    build was repeatedly OOM-killed by `rustc` while compiling an unrelated
    `codex-app-server-protocol` dependency, before reaching the test.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server): expose account token usage [1 of 2] (#25344)
    ## Why
    
    Token activity is useful account-level context, but terminal clients
    need a supported app-server path to fetch it without reaching into
    ChatGPT backend details directly. The API should also live under the
    broader account usage umbrella so future usage surfaces can be added
    without proliferating user-facing concepts.
    
    ## What Changed
    
    - Add `codex-backend-client` support for the ChatGPT profile token-usage
    payload.
    - Add the v2 `account/usage/read` app-server RPC.
    - Map lifetime usage, peak daily usage, streak, longest task duration,
    and daily buckets into app-server protocol types.
    - Gate the request on Codex-backend auth, which supports ChatGPT auth
    tokens and AgentIdentity.
    - Regenerate the app-server JSON and TypeScript schema fixtures.
    
    ## Token Count Source
    
    `account/usage/read` returns the token-usage aggregate supplied by the
    ChatGPT profile backend. App-server maps that backend-owned aggregate
    into protocol fields; it does not recompute cached-token treatment,
    usage multipliers, or raw input/output totals locally.
    
    ## Stack
    
    1. feat(app-server): expose account token usage [1 of 2] (this PR)
    2. [#25345](https://github.com/openai/codex/pull/25345) feat(tui): add
    token activity command [2 of 2]
    
    ## How to Test
    
    1. Start an app-server client from this branch while authenticated with
    ChatGPT or AgentIdentity.
    2. Call `account/usage/read`.
    3. Confirm the response includes `summary` and `dailyUsageBuckets`.
    4. Also verify a session without Codex-backend auth receives the
    existing auth error path.
    
    Targeted tests:
    - `just test -p codex-backend-client -p codex-app-server-protocol -p
    codex-app-server`
    - `just write-app-server-schema`
  • refactor: split agent control modules (#26610)
    ## Summary
    
    Mechanically splits `AgentControl` into focused modules so later agent
    runtime changes are easier to review. The shared lookup, messaging, and
    completion logic remains in `control.rs`, while spawn-specific code and
    V1 legacy close/resume behavior move into dedicated files.
    
    ## Changes
    
    - Extract spawn-agent code into `agent/control/spawn.rs`.
    - Extract V1-only legacy close/resume behavior into
    `agent/control/legacy.rs`.
    - Keep shared control-plane behavior in `agent/control.rs`.
    - Preserve existing behavior; this PR is intended to be mechanical.
    
    ## Stack
    
    1. This PR - Mechanical `AgentControl` split: extracts spawn and V1
    legacy code without behavior changes.
    2. #26614 - Execution slot accounting: separates logical agents from
    active execution slots.
    3. #26611 - Residency and reload runtime: adds resident-agent LRU,
    eviction/reload, durable lookup, and V2 delivery through reload.
    4. #26612 - V2 tool semantics: narrows `close_agent` to interrupt-only
    and updates V2 tool coverage.
  • [codex] Keep v1 spawn metadata visible (#26599)
    ## Summary
    - keep the legacy v1 `spawn_agent` role and model selectors visible
    - add regression coverage for the default v1 tool plan
    
    ## Why
    `hide_spawn_agent_metadata` is a multi-agent v2 setting, but the v1
    planning branch also consumed it. After the default changed to `true`,
    v1 stopped advertising `agent_type`, `model`, `reasoning_effort`, and
    `service_tier`, preventing configured agents from being selected.
    
    This keeps the hidden-metadata default for v2 while opting v1 out of
    that behavior.
    
    Fixes #26363.
    
    ## Validation
    Not run locally, per request; CI will validate the change.
  • [codex] Forward turn moderation metadata through app-server (#25710)
    ## Why
    First-party backends can supply turn-scoped moderation metadata that
    app-server clients need for client-side presentation. Exposing this as
    an experimental typed notification lets opted-in clients consume it
    without interpreting raw Responses API events.
    
    ## What changed
    - forward `response.metadata.openai_chatgpt_moderation_metadata` from
    Responses API SSE and WebSocket streams as turn-scoped moderation
    metadata
    - emit the experimental app-server v2 `turn/moderationMetadata`
    notification with `{ threadId, turnId, metadata }`
    - add app-server integration coverage for the typed moderation metadata
    notification
    
    ## Testing
    - `just test -p codex-core
    build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
    - `just test -p codex-core` (fails locally: 46 failures and 1 timeout,
    primarily missing `test_stdio_server` and shell snapshot timeouts)
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    turn_moderation_metadata_emits_typed_notification_v2`
    - `just test -p codex-app-server` (fails locally: 792 passed, 10 failed,
    and 5 timed out; failures are in existing environment-sensitive tests,
    primarily because nested macOS `sandbox-exec` is not permitted)
    - `just write-app-server-schema --experimental --schema-root
    /tmp/codex-app-server-schema-experimental`
  • nit: doc (#26566)
    Matching CBv9
  • Encrypt multi-agent v2 message payloads (#26210)
    ## Why
    
    Multi-agent v2 currently routes agent instructions through normal tool
    arguments and inter-agent context. That means the parent model can emit
    plaintext task text, Codex can persist it in history/rollouts, and the
    recipient can receive it as ordinary assistant-message JSON.
    
    This changes the v2 path so agent instructions stay encrypted between
    model calls: Responses encrypts the `message` argument returned by the
    model, Codex forwards only that ciphertext, and Responses decrypts it
    internally for the recipient model.
    
    ## What changed
    
    - Mark the v2 `message` parameter as encrypted for `spawn_agent`,
    `send_message`, and `followup_task`.
    - Treat multi-agent v2 tool `message` values as ciphertext
    unconditionally.
    - Store v2 inter-agent task text in
    `InterAgentCommunication.encrypted_content` with empty plaintext
    `content`.
    - Convert encrypted inter-agent communications into the Responses
    `agent_message` input item before sending the child request.
    - Preserve `agent_message` items across history, rollout, compaction,
    telemetry, and app-server schema paths.
    - Leave multi-agent v1 unchanged.
    
    ## Message shape
    
    The model still calls the v2 tools with a `message` argument, but that
    value is now ciphertext:
    
    ```json
    {
      "name": "spawn_agent",
      "arguments": {
        "task_name": "worker",
        "message": "<ciphertext>"
      }
    }
    ```
    
    Codex stores the task as encrypted inter-agent communication:
    
    ```json
    {
      "author": "/root",
      "recipient": "/root/worker",
      "content": "",
      "encrypted_content": "<ciphertext>",
      "trigger_turn": true
    }
    ```
    
    When Codex builds the recipient request, it forwards the ciphertext
    using the new Responses input item:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root",
      "recipient": "/root/worker",
      "content": [
        {
          "type": "encrypted_content",
          "encrypted_content": "<ciphertext>"
        }
      ]
    }
    ```
    
    Responses decrypts that item internally for the recipient model.
    
    ## Context impact
    
    - Parent context no longer carries plaintext v2 agent task instructions
    from these tool arguments.
    - Codex rollout/history stores ciphertext for v2 agent instructions.
    - Recipient requests receive an `agent_message` item instead of
    assistant commentary JSON for encrypted task delivery.
    - Completion/status notifications remain plaintext because they are
    Codex-generated status messages, not encrypted model tool arguments.
    
    ## Validation
    
    - `just test -p codex-tools`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-rollout-trace`
    - `just test -p codex-otel`
    - `just write-app-server-schema`
  • [codex] Add environment shell info (#26480)
    ## Why
    
    Shell detection needs to be available through the `Environment`
    abstraction so callers can ask the selected local or remote environment
    for shell metadata without adding a separate HTTP endpoint or parallel
    info-source path. This keeps shell metadata shaped like the existing
    environment-owned filesystem capability and lets remote environments
    answer through exec-server JSON-RPC.
    
    ## What changed
    
    - Added `environment/info` to the exec-server protocol/client/server and
    exposed `Environment::info()`.
    - Added local and remote environment info providers on `Environment`,
    following the existing capability-provider pattern used for filesystem
    access.
    - Moved the shared shell detection logic into `codex-shell-command` and
    kept core shell APIs as wrappers around that implementation.
    - Returned shell metadata as `EnvironmentInfo { shell: ShellInfo }`
    using the existing shell detection path.
    - Added a remote environment test that calls `Environment::info()`
    through an exec-server-backed environment.
    
    ## Validation
    
    - `git diff --check`
    - `just test -p codex-shell-command`
    - `just test -p codex-core -E 'test(/shell::tests::/)'`
    - `just test -p codex-exec-server environment`
  • feat(remote-control): allow pairing while disabled (#26215)
    ## Why
    
    `remoteControl/pairing/start` creates authorization for future
    remote-control connections, so it should not require the live websocket
    to already be enabled. Requiring enable first made pairing depend on
    presence instead of the persisted server enrollment that pairing
    actually uses.
    
    Pairing also needs to recover when that persisted server row is stale.
    If `/server/pair` returns `404`, making the first pairing attempt fail
    forces a manual retry even though the client can clear the stale row and
    create a replacement enrollment immediately.
    
    ## What Changed
    
    - Allow `remoteControl/pairing/start` to reuse or create the persisted
    remote-control server enrollment while remote control is disabled.
    - Keep the selected in-memory enrollment across disable and share it
    with websocket connect so a later enable uses the same selected server.
    - Thread the app-server client name through pairing so stdio persistence
    keeps using the websocket-owned enrollment key.
    - Recover pairing server-token auth failures through the existing
    refresh/auth-recovery path.
    - Recover stale pairing enrollment on `/server/pair` `404` by clearing
    the stale selected enrollment, re-enrolling once, and retrying pairing
    once.
    - Add focused disabled-pairing and stale-pairing recovery coverage.
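    
    The stale-enrollment recovery above can be sketched as a single
    re-enroll plus a single retry. `PairingBackend`, `PairError`, and
    `start_pairing` are hypothetical names for illustration; the real
    implementation is async and also handles auth recovery, which this
    sketch leaves out.
    
    ```rust
    #[derive(Debug, PartialEq)]
    enum PairError {
        /// `/server/pair` returned 404: the selected enrollment is stale.
        NotFound,
        /// Any other failure is surfaced unchanged.
        #[allow(dead_code)]
        Other(String),
    }
    
    /// Hypothetical abstraction over the remote-control server calls.
    trait PairingBackend {
        fn enroll(&mut self) -> Result<String, PairError>;
        fn pair(&mut self, enrollment_id: &str) -> Result<String, PairError>;
    }
    
    fn start_pairing<B: PairingBackend>(
        backend: &mut B,
        selected: &mut Option<String>,
    ) -> Result<String, PairError> {
        // Reuse the selected enrollment, or create one if none exists.
        let enrollment = match selected.clone() {
            Some(id) => id,
            None => {
                let id = backend.enroll()?;
                *selected = Some(id.clone());
                id
            }
        };
        match backend.pair(&enrollment) {
            Err(PairError::NotFound) => {
                // Clear the stale row, re-enroll once, and retry once.
                *selected = None;
                let fresh = backend.enroll()?;
                *selected = Some(fresh.clone());
                backend.pair(&fresh)
            }
            other => other,
        }
    }
    ```
    
    Because the retry is not looped, a second `404` is returned to the
    caller rather than triggering another enrollment.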
    
    ## Verification
    
    - `remote_control_pairing_start_returns_pairing_artifacts_while_disabled`
    exercises pairing before enable.
    - `remote_control_handle_reenrolls_after_stale_pairing_enrollment`
    exercises stale `/server/pair` `404` recovery without a manual retry.
    
    Related: N/A
  • core: derive exec policy filesystem policy from profile (#26499)
    ## Why
    
    `PermissionProfile` already owns the runtime filesystem sandbox policy
    through `file_system_sandbox_policy()`. Keeping a separate
    `FileSystemSandboxPolicy` on exec-policy fallback contexts made it
    possible for callers and tests to construct split states that the
    production permission model should not rely on.
    
    ## What changed
    
    - Removed `file_system_sandbox_policy` from `UnmatchedCommandContext`,
    `ExecApprovalRequest`, and the intercepted Unix exec-policy context.
    - Derived filesystem sandbox policy inside unmatched-command decision
    logic from `PermissionProfile::file_system_sandbox_policy()`.
    - Simplified shell/unified-exec callers and tests that were only
    plumbing the duplicate policy through.
    
    ## Testing
    
    Local tests not run per request; relying on remote CI.
  • [codex] Keep Bazel startup options stable across commands (#26256)
    ## Why
    
    `just bazel-clippy` ran target discovery with
    `--noexperimental_remote_repo_contents_cache`, then ran the build with
    the workspace default `--experimental_remote_repo_contents_cache`. Bazel
    therefore killed and restarted its server on each transition, slowing
    repeated commands and discarding the in-memory analysis cache. An audit
    found the same class of startup-option variation in several CI command
    sequences.
    
    ## What changed
    
    - Keep local lint target-discovery queries on the workspace-default
    Bazel server, while making CI target discovery explicitly use the CI
    startup options.
    - Normalize GitHub Actions launches through the BuildBuddy wrapper to
    share `BAZEL_OUTPUT_USER_ROOT` and
    `--noexperimental_remote_repo_contents_cache`.
    - Route the CI lockfile check and Windows test-shard query through the
    same startup configuration.
    - Document the startup-option invariant and add wrapper regression
    coverage.
    
    ## Validation
    
    - Confirmed consecutive local clippy target-discovery runs retained the
    same Bazel server PID.
  • fix(rmcp): refresh expired OAuth tokens before startup (#26482)
    ## Why
    
    Codex persists OAuth expiry as an absolute `expires_at`, then
    reconstructs RMCP’s relative `expires_in` when credentials are loaded.
    For an already-expired token, Codex reconstructed `expires_in` as
    missing.
    
    [RMCP 0.15 treated a missing `expires_in` as zero when a refresh token
    was
    present](https://github.com/modelcontextprotocol/rust-sdk/blob/9cfc905a9ef17c8bba6748dc0a9bdd2452681733/crates/rmcp/src/transport/auth.rs#L704-L723),
    so this still triggered a refresh. [RMCP 1.7 treats missing expiry
    information as unknown and uses the access token
    as-is](https://github.com/modelcontextprotocol/rust-sdk/blob/3529c3675ff64db805bd947ca6ece6090809e43d/crates/rmcp/src/transport/auth.rs#L1233-L1265),
    causing the stale token to be sent during `initialize`.
    
    ## What changed
    
    - Represent a known-expired persisted token as `expires_in = 0`,
    preserving `None` for genuinely unknown expiry.
    - Add Streamable HTTP coverage requiring the token to refresh before the
    startup handshake.
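    
    A minimal sketch of the reconstruction rule, assuming timestamps are
    milliseconds since the Unix epoch (the persisted format is not stated
    here). `expires_in_from_expires_at` is a hypothetical name.
    
    ```rust
    use std::time::Duration;
    
    /// Rebuilds a relative `expires_in` from a persisted absolute expiry.
    fn expires_in_from_expires_at(
        expires_at_ms: Option<u64>,
        now_ms: u64,
    ) -> Option<Duration> {
        // Genuinely unknown expiry stays `None`.
        let expires_at_ms = expires_at_ms?;
        // A known-expired token saturates to zero instead of becoming `None`,
        // so the client sees it as expired and refreshes before use.
        Some(Duration::from_millis(expires_at_ms.saturating_sub(now_ms)))
    }
    ```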
    
    ## Validation
    
    - The new regression test fails on RMCP 1.7 before the fix and passes
    afterward.
    - The same scenario passes on the commit immediately before the RMCP 1.7
    update, using RMCP 0.15.
    - `just test -p codex-rmcp-client` (63 passed).
  • [codex] Add use_responses_lite 'override' logic (#26487)
    ## Summary
    
    - add a defaulted `ModelInfo.use_responses_lite` catalog field
    - support serializing `reasoning.context` while preserving the existing
    effort and summary path
    - the field has not been turned on for any models yet
    
    I've added an override that disables parallel tool calls when
    `use_responses_lite` is on, and I've forced persistent reasoning in that
    case as well. Ideally all of the responses_lite plumbing would be
    centralized, but keeping it inline is best for now because it keeps the
    plumbing and diffs small.
    
    ## Testing
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    responses_lite_sets_all_turns_context_and_disables_parallel_tool_calls`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    configured_reasoning_summary_is_sent`
    - `cargo check -p codex-core --tests`
    - `RUST_MIN_STACK=8388608 cargo clippy -p codex-core --tests` (passes
    with pre-existing warnings in `codex-code-mode` and
    `codex-core-plugins`)
  • [codex] Emit sandbox outcome telemetry event (#25955)
    ## Summary
    
    Adds a dedicated `codex.sandbox_outcome` telemetry event so we can query
    sandbox edge outcomes without threading sandbox metadata through
    tool-result output types.
    
    This is meant to make sandbox failures and approved escalation retries
    visible in OTEL while keeping the existing `codex.tool_result` event
    shape focused on tool completion data.
    
    ## What changed
    
    - Adds `SessionTelemetry::sandbox_outcome(...)`, which emits
    `codex.sandbox_outcome` as both a log and trace event.
    - Records the tool name, call id, sandbox outcome, initial attempt
    duration, and escalated attempt duration when a retry runs.
    - Emits `denied` when the sandbox blocks execution and no retry is run.
    - Emits `timed_out` and `signal` when those sandbox errors surface from
    tool execution.
    - Emits `escalated` when the initial sandboxed attempt fails and the
    approved unsandboxed retry succeeds.
    - Adds OTEL coverage for the new event payload, including timing fields.
    
    ## Validation
    
    - `RUST_MIN_STACK=8388608 just test -p codex-core
    sandbox_outcome_event_records_outcome
    handle_sandbox_error_user_approves_retry_records_tool_decision`
    - `just test -p codex-otel
    otel_export_routing_policy_routes_tool_result_log_and_trace_events
    runtime_metrics_summary_collects_tool_api_and_streaming_metrics`
    - `just fix -p codex-core`
    - `just fix -p codex-otel`
  • ci: test windows cross build (#25000)
    We cross-build when using Bazel for Windows. This causes a couple of
    hiccups: V8's `mksnapshot` step expects to snapshot on the host arch,
    which did not match during the cross-build. The result was segfault
    failures when starting up code-mode from a cross-built artifact.
    
    This change cross-builds the library, then runs and links a snapshot on
    the host machine/arch, which is Windows. That gives us a functional
    snapshot and library that can start code-mode on Windows.
    
    This fixes the build and then fixes two test regressions we had.
  • Pull plugin service less frequently (#26431)
    # Summary
    Reduce download traffic to `github.com/openai/plugins` while continuing
    to check for updates on every Codex startup.
    
    # Root cause
    The startup sync replaced the local repository with a fresh shallow
    clone whenever the remote revision changed. At Codex's global scale,
    repeatedly downloading the repository created excessive GitHub traffic.
    
    # Changes
    - Run `git ls-remote` on each startup to read the remote HEAD SHA.
    - Skip all repository downloads when the local and remote SHAs match.
    - Update existing checkouts with an exact-SHA shallow `git fetch`,
    followed by reset and clean.
    - Bootstrap new installations with `git init` plus the same shallow
    fetch, rather than cloning.
    - Keep the existing file lock so concurrent Codex processes serialize
    updates and do not duplicate fetches.
    - Preserve the existing GitHub HTTP and export archive fallback
    behavior.
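    
    The startup decision in the list above can be sketched as a pure
    function over the two SHAs. `SyncPlan` and `plan_startup_sync` are
    hypothetical names; the Git invocations, file lock, and HTTP fallback
    are omitted.
    
    ```rust
    #[derive(Debug, PartialEq)]
    enum SyncPlan {
        /// Local and remote SHAs match: download nothing.
        UpToDate,
        /// Existing checkout: exact-SHA shallow fetch, then reset and clean.
        FetchExisting { sha: String },
        /// No checkout yet: `git init`, then the same shallow fetch.
        InitThenFetch { sha: String },
    }
    
    /// `local_head` is `None` when no local checkout exists yet.
    fn plan_startup_sync(local_head: Option<&str>, remote_head: &str) -> SyncPlan {
        match local_head {
            Some(local) if local == remote_head => SyncPlan::UpToDate,
            Some(_) => SyncPlan::FetchExisting {
                sha: remote_head.to_string(),
            },
            None => SyncPlan::InitThenFetch {
                sha: remote_head.to_string(),
            },
        }
    }
    ```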
    
    # Impact
    Each startup makes one lightweight remote HEAD check. Repository objects
    are downloaded only when the revision changes, and existing Git objects
    are reused during updates.
    
    # Validation
    - `just test -p codex-core-plugins startup_sync` (15 tests passed)
    - `just test -p codex-core-plugins` (201 tests passed)
    - `just clippy -p codex-core-plugins` (passes with one pre-existing
    `large_enum_variant` warning)
    - Production app-server smoke test against GitHub:
      - Fresh home: `ls-remote`, `git init`, one exact-SHA shallow fetch
      - Unchanged restart: `ls-remote` and local `rev-parse` only; no fetch
        or clone
    - Bench smoke passed
  • Improve Windows sandbox setup refresh diagnostics (#26471)
    ## Why
    
    Users have been seeing opaque Windows sandbox setup refresh failures
    such as `windows sandbox: spawn setup refresh`, including reports in
    #24391 and #21208. The setup refresh path already runs the Windows
    sandbox setup helper, but it was not using the same structured
    `setup_error.json` reporting path that elevated setup uses. As a result,
    when the helper exited non-zero, Codex only surfaced a generic refresh
    status instead of the helper's `SetupFailure` code and message.
    
    ## What changed
    
    - Clear stale `setup_error.json` before non-elevated setup refresh
    launches the helper.
    - When the refresh helper exits non-zero, read the helper-written report
    through the existing `report_helper_failure` path.
    - Keep a parent-side launch diagnostic for cases where the helper never
    starts, including the helper path, cwd, sandbox log path, and spawn
    error.
    - Clear the setup error report after a successful refresh.
    - Add regression coverage for report consumption and stale-report
    avoidance.
    
    ## Verification
    
    - `cargo test -p codex-windows-sandbox setup::tests::`
  • [codex] Expose unavailable app templates in plugin detail (#26317)
    ## Summary
    - Adds `unavailable_app_templates` to the app-server protocol and
    generated schemas/types.
    - Parses plugin-service `release.unavailable_app_templates` in the
    remote plugin client.
    - Maps remote unavailable templates into app-server `PluginDetail`.
    - Defaults local plugins to an empty unavailable app template list.
    
    ## Validation
    - `just write-app-server-schema`
    - `cargo +1.95.0 fmt --manifest-path codex-rs/Cargo.toml --all --check`
    - `cargo +1.95.0 test --manifest-path codex-rs/Cargo.toml -p
    codex-app-server-protocol schema_fixtures`
    - `cargo +1.95.0 check --manifest-path codex-rs/Cargo.toml -p
    codex-app-server-protocol -p codex-core-plugins -p codex-app-server`
    - `git diff --check`
    
    Note: default `cargo check` uses rustc 1.89 locally and failed because
    dependencies require newer Rust, so validation was rerun with installed
    Rust 1.95.
  • Add skill for pushing CI configuration changes (#26473)
    ## Why
    
    Codex agents that modify GitHub Actions configuration need clear
    guidance when repository push protections require temporary approval.
    Without it, an agent may pursue an unavailable exemption or stop before
    checking whether the user already has access.
    
    ## What
    
    Add a `pushing-ci-changes` skill that explains the restriction, directs
    agents to attempt the push first, and tells them how to involve the user
    when approval is required.
    
    ## Validation
    
    Not run; this change only adds skill documentation.
  • fix(app-server): expose remote MCP servers in plugin read (#26453)
    ## Why
    
    Remote plugin detail responses include MCP server metadata under
    `release.mcp_servers`, but Codex did not deserialize or propagate that
    field. As a result, `plugin/read` always returned an empty `mcpServers`
    list for remote plugins, so the plugin details pane omitted the MCP
    Servers section even when the remote plugin declares one.
    
    This affects uninstalled plugins as well: the remote detail API is the
    source of truth and returns MCP server keys without requiring a local
    plugin bundle.
    
    ## What changed
    
    - Deserialize MCP server entries from remote plugin detail responses.
    - Normalize their keys into a sorted, deduplicated list on
    `RemotePluginDetail`.
    - Return those keys from app-server `plugin/read` instead of hardcoding
    an empty list.
    - Add regression coverage proving an uninstalled remote plugin returns
    its MCP server names.
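    
    The key normalization can be sketched with a `BTreeSet`, which sorts
    and deduplicates in one pass. `normalize_mcp_server_keys` is a
    hypothetical name, and the server names in the usage are invented for
    illustration.
    
    ```rust
    use std::collections::BTreeSet;
    
    /// Collects server keys into a sorted list without duplicates.
    fn normalize_mcp_server_keys<I, S>(keys: I) -> Vec<String>
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        keys.into_iter()
            .map(Into::into)
            .collect::<BTreeSet<String>>()
            .into_iter()
            .collect()
    }
    ```
    
    For example, `normalize_mcp_server_keys(["search", "docs", "search"])`
    yields `["docs", "search"]`.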
    
    ## Test plan
    
    - `just test -p codex-core-plugins`
    - `just test -p codex-app-server plugin_read`
  • [codex] Preserve logical paths during AGENTS.md discovery (#26465)
    ## Intent
    
    Follow up on #26205 by avoiding unnecessary filesystem canonicalization
    during `AGENTS.md` discovery. The configured working directory is
    already absolute, and canonicalization incorrectly switches symlinked
    workspaces from their logical parent hierarchy to the target's
    hierarchy.
    
    ## User-facing behavior
    
    For a symlinked working directory such as:
    
    ```text
    test-root/
    |-- logical-repo/
    |   |-- AGENTS.md              ("logical parent doc")
    |   `-- workspace ------------> physical-repo/workspace/
    `-- physical-repo/
        |-- AGENTS.md              ("physical parent doc")
        `-- workspace/
            `-- AGENTS.md          ("workspace doc")
    ```
    
    Before this change, Codex canonicalized `logical-repo/workspace` to
    `physical-repo/workspace` before discovery. It therefore loaded
    `physical-repo/AGENTS.md` and `physical-repo/workspace/AGENTS.md`,
    ignoring the instructions from the repository through which the user
    entered the workspace.
    
    After this change, ancestor discovery walks the configured logical path,
    so Codex loads `logical-repo/AGENTS.md`. Opening
    `logical-repo/workspace/AGENTS.md` still follows the symlink through the
    host filesystem, so the workspace document is also loaded.
    `physical-repo/AGENTS.md` is not loaded.
    
    ## Implementation
    
    Use the logical absolute working directory when discovering project
    instructions and reporting instruction sources. Filesystem reads still
    follow the working-directory symlink, so an `AGENTS.md` in the target
    workspace continues to load while ancestor discovery uses the symlink's
    parents.
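    
    A minimal sketch of lexical ancestor discovery, assuming candidates are
    ordered from the outermost directory down to the working directory.
    `candidate_agents_paths` is a hypothetical name, and the sketch walks
    all the way to the filesystem root, which the real discovery may not do.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Lists candidate `AGENTS.md` paths for `cwd` and its ancestors.
    /// The walk is purely lexical: nothing is canonicalized, so a symlinked
    /// working directory keeps its logical parents.
    fn candidate_agents_paths(cwd: &Path) -> Vec<PathBuf> {
        let mut candidates: Vec<PathBuf> = cwd
            .ancestors()
            .map(|dir| dir.join("AGENTS.md"))
            .collect();
        // `ancestors` starts at `cwd`; reverse so outer documents come first.
        candidates.reverse();
        candidates
    }
    ```
    
    Reading `logical-repo/workspace/AGENTS.md` later still follows the
    symlink at the filesystem level, which is why the workspace document
    loads without any path rewriting.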
    
    ## Validation
    
    Added integration coverage proving that discovery loads the logical
    parent's instructions and the target workspace's instructions, but not
    the target parent's instructions.
  • Use Winget release environment secret (#26466)
    ## Why
    `WINGET_PUBLISH_PAT` now lives as a GitHub environment secret under
    `mainline-release-winget`. The WinGet release job needs to enter that
    environment so `secrets.WINGET_PUBLISH_PAT` resolves during
    stable/mainline Rust releases.
    
    ## What Changed
    - Attach the `winget` job in `.github/workflows/rust-release.yml` to the
    `mainline-release-winget` environment.
    - Set `deployment: false` so the job can read environment secrets
    without creating GitHub deployment records.
    
    ## Operational Note
    The `mainline-release-winget` environment must allow `rust-v*.*.*` tag
    refs before this can run on release tags. The live environment currently
    has a custom policy named `rust-v*.*.*` with type `branch`; add the
    corresponding `tag` policy before relying on this path for a release.
    
    ## Validation
    - `git diff --check origin/main...HEAD --
    .github/workflows/rust-release.yml`
    - `ruby -e 'require "yaml"; ARGV.each { |f| YAML.load_file(f); puts
    "yaml ok: #{f}" }' .github/workflows/rust-release.yml`
  • [codex] Use model-advertised reasoning effort order (#26446)
    ## Summary
    - preserve the model catalog order for app-server
    `supportedReasoningEfforts` and document that client contract
    - render TUI reasoning choices in the advertised order
    - step reasoning shortcuts by adjacent list position instead of deriving
    order from known effort names
    - anchor unsupported configured values to the advertised default, or the
    first option when needed
    - remove canonical effort ordering helpers and the unused upgrade effort
    mapping
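    
    The position-based stepping and anchoring rules can be sketched as two
    small helpers. `anchor_index` and `step_index` are hypothetical names,
    and clamping at the ends of the list (rather than wrapping) is an
    assumption.
    
    ```rust
    /// Index to treat as current: the configured value if advertised,
    /// else the advertised default, else the first option.
    fn anchor_index(
        advertised: &[&str],
        configured: &str,
        default: Option<&str>,
    ) -> Option<usize> {
        if advertised.is_empty() {
            return None;
        }
        let position = |value: &str| advertised.iter().position(|effort| *effort == value);
        if let Some(index) = position(configured) {
            return Some(index);
        }
        if let Some(index) = default.and_then(|value| position(value)) {
            return Some(index);
        }
        Some(0)
    }
    
    /// Moves one slot in the advertised order, clamped to the list bounds.
    fn step_index(len: usize, index: usize, forward: bool) -> usize {
        if forward {
            (index + 1).min(len.saturating_sub(1))
        } else {
            index.saturating_sub(1)
        }
    }
    ```
    
    Neither helper inspects effort names, so a catalog that advertises an
    unusual order is honored as-is.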
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
    
    Stacked on #26444.
  • [codex] Support model-defined reasoning efforts (#26444)
    ## Summary
    - accept non-empty model-defined reasoning effort values while
    preserving built-in effort behavior
    - propagate the non-Copy effort type through core, app-server, TUI,
    telemetry, and persistence call sites
    - preserve string wire encoding and expose an open-string schema for
    clients
    - update model selection and shortcut behavior for model-advertised
    effort values
    
    ## Root cause
    `ReasoningEffort` gained a string-backed custom variant, so it could no
    longer implement `Copy` or rely on derived closed-enum serialization.
    Existing consumers still moved effort values from shared references and
    assumed a fixed built-in value set.
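    
    A minimal sketch of why the type stops being `Copy`. The built-in
    variants and the `parse`/`as_str` helpers below are assumptions for
    illustration, not the actual `ReasoningEffort` definition.
    
    ```rust
    // `Clone` but not `Copy`: the `Custom` variant owns a `String`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum ReasoningEffort {
        Low,
        Medium,
        High,
        /// Any other non-empty, model-defined value.
        Custom(String),
    }
    
    impl ReasoningEffort {
        /// Accepts built-in names and any non-empty custom value.
        fn parse(value: &str) -> Option<Self> {
            match value {
                "" => None,
                "low" => Some(Self::Low),
                "medium" => Some(Self::Medium),
                "high" => Some(Self::High),
                other => Some(Self::Custom(other.to_string())),
            }
        }
    
        /// Wire encoding stays a plain string for every variant.
        fn as_str(&self) -> &str {
            match self {
                Self::Low => "low",
                Self::Medium => "medium",
                Self::High => "high",
                Self::Custom(value) => value.as_str(),
            }
        }
    }
    ```
    
    Call sites that previously copied the value out of a shared reference
    now need an explicit `.clone()` or a borrow.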
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
  • Cleanup experimentalFeature/enablement/set (#26312)
    ## Why
    
    `experimentalFeature/enablement/set` still allowed several keys that no
    longer need to be managed through this API. Keeping those keys also
    preserved corresponding special-case logic, including refreshing the
    apps list when the `apps` key was enabled.
    
    The endpoint also rejected an entire request when any key was invalid or
    unsupported. That makes clients brittle when they send a mix of current
    and stale keys, even when the valid entries can still be applied safely.
    
    ## What changed
    
    - remove the feature keys that no longer need to be supported by
    `experimentalFeature/enablement/set`
    - remove the corresponding apps-list refresh path and its auth/config
    plumbing
    - ignore and warn on invalid or unsupported keys while still applying
    valid keys from the same request
    - update the app-server documentation and integration coverage for the
    reduced key set and partial-acceptance behavior
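    
    The partial-acceptance behavior can be sketched as a filter that
    collects warnings instead of failing the request. `apply_enablement`
    and the warning text are hypothetical; the real endpoint's key set and
    response shape are not shown here.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Applies supported keys and returns a warning for each one ignored.
    fn apply_enablement(
        supported: &[&str],
        requested: &[(&str, bool)],
    ) -> (BTreeMap<String, bool>, Vec<String>) {
        let mut applied = BTreeMap::new();
        let mut warnings = Vec::new();
        for (key, enabled) in requested {
            if supported.contains(key) {
                applied.insert(key.to_string(), *enabled);
            } else {
                // Stale or invalid keys no longer reject the whole request.
                warnings.push(format!("ignoring unsupported feature key `{key}`"));
            }
        }
        (applied, warnings)
    }
    ```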
    
    ## Test plan
    
    - `just test -p codex-app-server experimental_feature_enablement_set` (6
    passed)
    - `just test -p codex-app-server` exercised the changed tests
    successfully; unrelated sandbox-dependent and watcher/timing tests
    failed locally