Commit Graph

7585 Commits

  • Expose selected namespaces as direct model tools (#28825)
    ## Why
    
    Some tools, such as history and notes, must remain top-level when MCP
    deferral is enabled while staying unavailable through code-mode `exec`.
    
    ## What changed
    
    - Added `features.code_mode.direct_only_tool_namespaces`.
    - Classified matching MCP tools as `DirectModelOnly`.
    - Kept those tools top-level in `code_mode_only`.
    - Excluded them from `tool_search` deferral and the nested `exec`
    surface.
    - Updated the generated config schema.
    
    ## Validation
    
    - `code_mode_only_exposes_direct_model_only_mcp_namespaces`
    - `load_config_resolves_code_mode_config`
  • Refresh signed exec-server URLs on reconnect (#28374)
    ## Summary
    
    - add a provider API that supplies a fresh signed WebSocket URL for each
    remote exec-server connection
    - refresh the signed URL after disconnects and retry once when a
    handshake returns `401 Unauthorized`
    - allow `EnvironmentManager` consumers to register remote environments
    backed by the URL provider
    
    ## Tests
    
    - `just test -p codex-exec-server -E
    'test(remote_websocket_client_refreshes_url_after_unauthorized_handshake)
    | test(remote_websocket_client_refreshes_url_after_disconnect)'` — 2
    passed
    - `cargo check -p codex-core-api` — passed
    - `just fix -p codex-exec-server` — passed
    - `just fix -p codex-core-api` — no test targets; no-op
    - `just fmt` — passed
    - `just test -p codex-exec-server` — 187 passed; 32 unrelated macOS
    sandbox tests could not invoke nested `sandbox-exec` (`Operation not
    permitted`)
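
    A minimal sketch of the refresh-and-retry-once behavior described in the
    summary. The `Handshake` enum, the `connect_with_refresh` function, and
    both closures are hypothetical stand-ins, not the `codex-exec-server`
    API; other handshake failures are not modeled here.

    ```rust
    /// Outcome of a single WebSocket handshake attempt (hypothetical type).
    #[derive(Debug, PartialEq)]
    enum Handshake {
        Connected,
        Unauthorized,
    }

    /// Connects with a freshly signed URL and retries once if the handshake
    /// is rejected as unauthorized. Returns the URL that finally connected.
    fn connect_with_refresh(
        mut fresh_url: impl FnMut() -> String,
        mut handshake: impl FnMut(&str) -> Handshake,
    ) -> Result<String, &'static str> {
        for _attempt in 0..2 {
            // Ask the provider for a new URL on every attempt; a stale
            // signed URL is never reused.
            let url = fresh_url();
            if handshake(&url) == Handshake::Connected {
                return Ok(url);
            }
        }
        Err("handshake unauthorized after one retry")
    }
    ```

    The provider is called once per attempt, so a rejected signature is
    never presented twice.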
  • [codex] Support assistant realtime append text (#28836)
    ## Why
    
    Frontend realtime voice continuity needs to replay a tiny
    previous-session overlap as actual conversation items, including
    assistant text. The app-server `thread/realtime/appendText` API already
    carries a role through to the Rust realtime websocket layer, but the
    shared role enum only accepted `user` and `developer`.
    
    ## What Changed
    
    - Added `assistant` to `ConversationTextRole` and regenerated the
    app-server schema/type fixtures.
    - Added `output_text` as a realtime conversation content type.
    - Updated realtime websocket item creation so assistant appendText emits
    `content: [{ type: "output_text", text }]`, while user and developer
    continue to emit `input_text`.
    - Updated app-server docs and tests to cover assistant appendText
    alongside the existing developer role behavior.
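
    The role-to-content-type rule above can be sketched as follows. The enum
    mirrors the `ConversationTextRole` name from this PR, but its definition
    and the `content_type` helper are assumptions made for illustration.

    ```rust
    /// Roles accepted by appendText after this change (illustrative copy).
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ConversationTextRole {
        User,
        Developer,
        Assistant,
    }

    /// Picks the realtime content type emitted for an appended text item.
    fn content_type(role: ConversationTextRole) -> &'static str {
        match role {
            // Assistant text is replayed as model output.
            ConversationTextRole::Assistant => "output_text",
            // User and developer text continue to be sent as input.
            ConversationTextRole::User | ConversationTextRole::Developer => "input_text",
        }
    }
    ```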
    
    ## Validation
    
    - `just write-app-server-schema`
    - `just fmt` (first sandboxed attempt failed because `uv` could not
    access `~/.cache/uv`; reran with filesystem access and passed)
    - `just test -p codex-api` passed: 126/126
    - `just test -p codex-app-server-protocol` passed: 239/239, including
    generated JSON/TypeScript fixture checks
    - `just test -p codex-app-server` was started locally but stopped per
    request after unrelated local sandbox/Seatbelt failures (`sandbox-exec:
    sandbox_apply: Operation not permitted`) and one missing local `codex`
    binary failure; CI should be faster and more authoritative for the full
    suite.
  • [codex] control automatic realtime handoff delivery (#27986)
    ## What
    
    Built on the realtime speech-control plumbing merged in #27917.
    
    - Add optional `codexResponseHandoffPrefix` to `thread/realtime/start`.
    - Apply that prefix only to automatic V1 commentary sent through
    `conversation.handoff.append`; final answers remain unprefixed.
    - Add opt-in `clientManagedHandoffs`. When true, core suppresses
    automatic response handoffs and completion output so delivery is
    controlled by explicit client append APIs.
    - Preserve existing automatic behavior by default.
    `codexResponsesAsItems: true` continues to select item routing when
    client-managed mode is disabled.
    
    ## Why
    
    Voice clients need two delivery policies: automatic background context
    with silent commentary instructions and fully client-owned handoffs.
    Phase-aware prefixing keeps routine commentary silent without
    suppressing the final answer, while client-managed mode lets an app
    decide exactly which updates to append.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-app-server-protocol
    serialize_thread_realtime_start`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-core --test all
    conversation_handoff_persists_across_item_done_until_turn_complete`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-app-server --test all
    webrtc_v1_client_managed_handoffs_disable_automatic_output`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-app-server --test all
    webrtc_v1_final_automatic_handoff_omits_silent_prefix`
    - `cargo build -p codex-cli --bin codex`
    - Local Codex Apps compatibility check: 43 focused webview tests passed,
    and a live voice session routed through the source-built app-server.
    
    The explicit `RUST_MIN_STACK` avoids a macOS Tokio test-worker stack
    overflow seen with the default test environment.
  • [codex] Use unique IDs for realtime-routed turns (#28826)
    ## Why
    
    A durable realtime voice orchestrator can reconnect and resume through
    multiple fresh `Session` instances. Realtime handoffs were using the
    Session-local `auto-compact-N` counter as their turn identity, but that
    counter restarts at zero for every resumed Session. The durable thread
    could therefore accumulate duplicate turn IDs, violating the uniqueness
    assumptions made by app-server and web clients. In Codex Apps, a new
    delegated response stream could be attached to an older turn with the
    same ID, placing live output higher in history and putting turn-scoped
    actions at risk.
    
    Persisted rollout and reconstructed model-context order were already
    correct because raw response items remain append-only and chronological.
    This change restores unique identity for reconstructed and live turn
    surfaces.
    
    ## What changed
    
    - Generate a UUIDv7 specifically for each realtime-routed delegation.
    - Leave the existing `auto-compact-N` identity path unchanged for actual
    internal auto-compaction turns.
    - Extend the inbound realtime handoff integration test to require a UUID
    turn ID from `turn/started`.
    
    ## Verification
    
    - `just test -p codex-core inbound_handoff_request_starts_turn`
    - `just fix -p codex-core`
    - `just fmt`
  • fix(install): support older awk checksum parsing (#28784)
    ## Why
    
    The standalone installer validates package checksums with an awk
    interval expression. Older mawk releases do not support that expression,
    so they reject valid 64-character digests and report that the release
    manifest is missing an entry. This affects both x64 and ARM64 systems on
    common Debian-derived environments.
    
    Fixes #24219.
    
    ## What Changed
    
    Replace the awk interval expression with an explicit length check plus
    rejection of non-hexadecimal characters. This preserves the existing
    SHA-256 validation and lowercase normalization while working with older
    awk implementations.
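
    The installer itself is a shell script that calls awk; the Rust sketch
    below only restates the predicate's logic (exact length, no
    non-hexadecimal characters, lowercase normalization). The function name
    `normalize_sha256` is hypothetical.

    ```rust
    /// Accepts a 64-character hexadecimal digest and returns it lowercased.
    /// Anything else is rejected, without relying on an interval expression
    /// such as `[0-9a-f]{64}`.
    fn normalize_sha256(digest: &str) -> Option<String> {
        // Explicit length check instead of a regex repetition count.
        let has_sha256_length = digest.len() == 64;
        // Reject the digest if any character is not a hexadecimal digit.
        let is_hex = digest.bytes().all(|b| b.is_ascii_hexdigit());
        if has_sha256_length && is_hex {
            Some(digest.to_ascii_lowercase())
        } else {
            None
        }
    }
    ```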
    
    ## How to Test
    
    1. Build and run the checksum predicate with mawk 1.3.4 20121129.
    2. Confirm the old interval predicate rejects a valid 64-character
    digest.
    3. Confirm the updated predicate accepts that digest.
    4. Put the old mawk binary first on PATH as awk and run
    scripts/install/install.sh with an isolated HOME, CODEX_HOME, and
    CODEX_INSTALL_DIR.
    5. Confirm Codex installs successfully and the installed binary reports
    version 0.140.0.
    6. Verify the predicate rejects wrong-length digests, non-hexadecimal
    digests, and entries for another asset while accepting uppercase
    hexadecimal digests.
  • [codex] Add optional IDs to response items (#28812)
    ## Why
    
    `ResponseItem` variants do not have a consistent internal ID shape: some
    variants carry required IDs, some carry optional IDs, and some cannot
    represent an ID at all. The existing fields also use inconsistent serde,
    TypeScript, and JSON-schema annotations. A single enum-level access path
    is needed before history recording can assign and retain IDs.
    
    This PR establishes that internal model only. It intentionally does not
    generate or serialize IDs; allocation and wire persistence are isolated
    in the stacked follow-up.
    
    ## What changed
    
    - Give every concrete `ResponseItem` variant an `Option<String>` ID
    field.
    - Apply the same internal-only annotations to every ID field:
    `#[serde(default, skip_serializing)]`, `#[ts(skip)]`, and
    `#[schemars(skip)]`.
    - Add `ResponseItem::id()` and `ResponseItem::set_id()` as the shared
    accessors.
    - Preserve IDs when history items are rewritten for truncation.
    - Adapt consumers that previously assumed reasoning and image-generation
    IDs were required.
    - Regenerate app-server schemas so the hidden fields are represented
    consistently.
    
    The serde catch-all `ResponseItem::Other` remains ID-less because it
    must remain a unit variant.
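
    A reduced sketch of the enum-level access path, without the serde,
    `ts`, or `schemars` annotations. The two data-carrying variants and
    their fields are placeholders, and the accessor signatures are
    assumptions; only `id()`, `set_id()`, and the ID-less `Other` variant
    come from the description above.

    ```rust
    /// Illustrative subset of a response-item enum with optional IDs.
    #[allow(dead_code)]
    #[derive(Debug)]
    enum ResponseItem {
        Message { id: Option<String>, text: String },
        Reasoning { id: Option<String>, summary: String },
        /// Catch-all unit variant; it cannot carry an ID.
        Other,
    }

    impl ResponseItem {
        /// Shared read accessor across all variants.
        fn id(&self) -> Option<&str> {
            match self {
                ResponseItem::Message { id, .. } | ResponseItem::Reasoning { id, .. } => {
                    id.as_deref()
                }
                ResponseItem::Other => None,
            }
        }

        /// Shared write accessor; a no-op for the unit variant.
        fn set_id(&mut self, new_id: Option<String>) {
            match self {
                ResponseItem::Message { id, .. } | ResponseItem::Reasoning { id, .. } => {
                    *id = new_id;
                }
                ResponseItem::Other => {}
            }
        }
    }
    ```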
    
    ## Test plan
    
    - `cargo check --tests -p codex-core -p codex-api -p codex-rollout-trace
    -p codex-image-generation-extension`
    - `just test -p codex-protocol`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-api -p codex-rollout-trace -p
    codex-image-generation-extension`
    - `just test -p codex-core event_mapping`
  • feat(exec-server): add Noise rendezvous environment (#28774)
    ## Why
    
    Codex can run a remote exec server through the Noise relay, but the
    normal environment-manager path could not establish an
    environment-registry-backed harness connection. Signed rendezvous URLs
    and harness authorizations are short-lived, so reconnects must fetch a
    fresh bundle instead of retaining stale connection credentials. A
    stalled registry request must also fail within the regular remote
    connection deadline, without exposing these credentials in debug logs.
    
    Issue: N/A (internal environment-service integration).
    
    ## What Changed
    
    - Add environment-manager configuration for a registry-backed Noise
      rendezvous environment.
    - Request a fresh bundle from
      `/cloud/environment/{environment_id}/connect` for every physical
      harness connection, using the existing 10-second remote connection
      timeout.
    - Share the Environment Registry register, connect, and validate wire
      payloads through `codex-exec-server` and `codex-core-api`.
    - Redact the signed rendezvous URL and harness authorization from the
      public connect response's `Debug` output.
    - Add focused coverage for registry bundle retrieval, stalled requests,
      and credential redaction.
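
    Redacting credentials from `Debug` output is usually done with a
    hand-written `fmt::Debug` implementation. The sketch below shows that
    pattern; the struct name and its fields are hypothetical and are not the
    actual connect response type.

    ```rust
    use std::fmt;

    /// Hypothetical connect response holding short-lived credentials.
    #[allow(dead_code)]
    struct ConnectResponse {
        environment_id: String,
        rendezvous_url: String,
        harness_authorization: String,
    }

    impl fmt::Debug for ConnectResponse {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            f.debug_struct("ConnectResponse")
                .field("environment_id", &self.environment_id)
                // Never print the signed URL or the authorization value.
                .field("rendezvous_url", &"<redacted>")
                .field("harness_authorization", &"<redacted>")
                .finish()
        }
    }
    ```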
  • path-uri: decouple native path parsing (#28778)
    ## Why
    
    `PathUri::join` should not depend on the app-server compatibility
    wrapper `LegacyAppPathString` to parse native paths. Native path parsing
    belongs to the URI abstraction that it constructs.
    
    ## What
    
    Move platform-independent native path parsing into the root `PathUri`
    module. `PathUri::join` and `LegacyAppPathString` now share the
    crate-private `PathUri::from_absolute_native_path` constructor.
  • [codex] trace tools build latency (#28782)
    Add more tracing spans around tool building.
  • bazel: refresh expired macOS SDK pin (#28791)
    ## Why
    
    macOS Bazel jobs fail before target analysis because the pinned Apple
    CDN object now returns HTTP 403.
    
    ## What
    
    Bump the pin to Apple's currently live macOS 26.5 Command Line Tools
    package, including its checksum and SDK extraction path.
    
    ## Validation
    
    - Built `@macos_sdk//sysroot` from a fresh Bazel output root.
    - Regenerated and checked `MODULE.bazel.lock`; it remains unchanged.
  • fix(plugins): support root local marketplace plugins (#28771)
    ## Summary
    - allow local marketplace `source.path: "."` and `source.path: "./"` to
    resolve to the marketplace root
    - keep `""` invalid, and keep rejecting non-root paths that lack a `./`
    prefix as well as non-normal or traversal paths
    - add focused regression coverage for repo-root plugin layouts and
    rejected local paths
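
    The path rules above can be sketched with `std::path::Component`. The
    function name `resolve_local_source` and its `Option` return type are
    assumptions; the actual `codex-core-plugins` validation may differ in
    detail.

    ```rust
    use std::path::{Component, Path, PathBuf};

    /// Resolves a local marketplace `source.path` against the marketplace
    /// root, returning `None` for rejected paths.
    fn resolve_local_source(root: &Path, source_path: &str) -> Option<PathBuf> {
        // "." and "./" both mean the marketplace root itself.
        if source_path == "." || source_path == "./" {
            return Some(root.to_path_buf());
        }
        // Everything else must start with "./"; this also rejects "".
        let rest = source_path.strip_prefix("./")?;
        let mut resolved = root.to_path_buf();
        for component in Path::new(rest).components() {
            match component {
                Component::Normal(part) => resolved.push(part),
                // Reject "..", absolute roots, and other non-normal parts.
                _ => return None,
            }
        }
        Some(resolved)
    }
    ```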
    
    ## Tests
    - `RUSTUP_TOOLCHAIN=stable just fmt`
    - `RUSTUP_TOOLCHAIN=stable just test -p codex-core-plugins`
    - `RUSTUP_TOOLCHAIN=stable just fix -p codex-core-plugins`
    
    Note: plain pinned-toolchain `just fmt` was blocked locally by a rustup
    `clippy` component conflict, so validation used the working stable 1.95
    toolchain fallback.
  • exec-server: expose environment registry payloads (#28651)
    ## Why
    
    Services that proxy the exec-server environment registry endpoints need
    to deserialize and forward the same Noise registration and harness-key
    validation payloads. Those wire models currently live as private,
    serialize-only structs in `exec-server`, which forces consumers to
    duplicate the contract.
    
    ## What changed
    
    - Add owned serde models for registration and harness-key validation
    requests and responses.
    - Use those models in the existing exec-server registry client.
    - Re-export the models from `codex-exec-server` and `codex-core-api`.
    - Keep the harness authorization request free of a derived `Debug`
    implementation so it is not accidentally logged.
    
    ## Testing
    
    - Focused exec-server registration and harness-key validation tests: 2
    passed.
    - `cargo check -p codex-core-api`
    
    The full `codex-exec-server` suite compiled and ran 254 tests: 222
    passed, while 32 existing filesystem sandbox tests could not run under
    the nested macOS sandbox (`sandbox_apply: Operation not permitted`).
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] Track plugin install and import telemetry failures (#28731)
    ## Summary
    - Track plugin install failures through the unified
    `codex_plugin_install_failed` event for local installs, remote install
    preflight failures, bundle failures, and remote catalog/backend
    failures.
    - Send classified `error_type` values in plugin install failure
    analytics instead of raw error strings.
    - Stop sending raw external-agent import errors in analytics while
    preserving raw failure details in app-facing import
    notifications/history.
    - Keep raw plugin/migration diagnostics in `tracing::warn!` logs.
    - Keep remote failure plugin names as the existing local placeholder
    (`unknown`) and remove the extra telemetry plugin-name override.
    - Change `ExternalAgentConfigImportParams.source` from a generated enum
    to `string | null`, with legacy `claudeCode` / `claudeCowork` inputs
    normalized to existing analytics values.
    
    ## Testing
  • unified-exec: preserve PathUri through exec-server (#28681)
    ## Why
    
    It should be possible for app-server to handle "foreign" OS paths in
    unified_exec working directories, so that, for example, a Linux
    app-server can run processes on a Windows exec-server.
    
    ## What
    
    Convert the core unified_exec cwd values to use `PathUri`.
    
    Adds fallible path conversion in several places to keep the scope of
    this change small. Errors from converting `PathUri` to an
    `AbsolutePathBuf` are suppressed only when the turn is configured with
    no sandboxing at all, which lets testing without sandboxing make
    progress.
    
    Future changes to apply_patch and sandboxing will clean up these error
    paths.
    
    A tool's cwd is resolved from joining a model-provided workdir to the
    environment's cwd. When using `AbsolutePathBuf::join()`, an
    absolute-path workdir would overwrite the environment's cwd and we would
    resolve permissions/sandboxing against the model-provided path. This
    change extends `PathUri::join()` to also treat an absolute rhs as an
    override of the base/lhs.
    
    This also removes some coverage from the remove_env_windows tests until
    a follow-up converts foreign paths in command exec events correctly.
    
    ## Breaking Changes
    
    When using `AbsolutePathBuf::join()` for workdir resolution, we ended up
    resolving tilde-prefixed paths against the app-server's `$HOME`, e.g.
    `~/foo/bar` becomes `/home/anp/foo/bar`. It's difficult to do this with
    `PathUri` joining, so after offline discussion this PR no longer
    implements it.
    
    A quick check of some power users' rollouts suggests that models don't
    actually generate home-prefixed absolute working directories for their
    spawns, so this shouldn't have any real blast radius.
  • [codex] Use compact OpenAI docs search queries (#28389)
    ## Summary
    
    Updates the bundled OpenAI Docs skill to use compact, title-like search
    queries. This performs better in Codex.
    
    ## Validation
    
    - OpenAI Docs skill validation passed
    - `git diff --check`
  • Extract TUI plugin catalog rendering (#28768)
    This mechanically extracts the existing TUI plugin catalog and detail
    popup rendering from `chatwidget/plugins.rs` into a new
    `chatwidget/plugin_catalog.rs` module. `plugins.rs` now keeps the
    stateful plugin workflow and orchestration, while `plugin_catalog.rs`
    owns the presentation-heavy catalog/detail popup construction and its
    pure helpers. The goal is to keep `plugins.rs` focused before later
    plugin sharing work adds more catalog behavior.
    
    - Moves existing catalog/detail popup builders and related pure helpers
    into `plugin_catalog.rs`
    - Leaves plugin fetch/state/key handling in `plugins.rs`
    - Adds only minimal sibling-module visibility/import wiring
    - Intentionally makes no product behavior or UI changes beyond the code
    move
  • [codex] Restore thread recency with compatible migration history (#28671)
    ## Summary
    
    - Revert #28655, restoring the thread `recencyAt` behavior introduced by
    #27910.
    - Move `threads_recency_at` to migration 0039 so it no longer collides
    with `external_agent_config_imports` at version 0038.
    - Repair databases that already applied the recency migration as version
    38 by moving the matching migration-history row to version 39 before
    SQLx validation. The current version-38 migration can then apply
    normally.
    
    ## Validation
    
    - `just test -p codex-state
    migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
    - `just test -p codex-state -p codex-rollout -p codex-thread-store -p
    codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
    could not open the machine's existing read-only incident database at
    `~/.codex/sqlite/state_5.sqlite`.
    - `just fix -p codex-state`
    - `just fmt`
    - Verified that state migration versions are unique.
  • feat: add run task identity primitives (#19047)
    ## Stack
    
    This is PR 1 of the simplified HAI single-run-task stack:
    
    - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
    assertion and task-registration primitives, including the shared
    run-task helper used by existing Agent Identity JWT auth.
    - [#19049](https://github.com/openai/codex/pull/19049)
    Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
    Agent Identity runtime auth and its single run task.
    - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
    provider auth that uses one backend-owned task id for first-party
    inference and compaction requests.
    
    [#19054](https://github.com/openai/codex/pull/19054) collapsed out of
    the active stack because the simplified design no longer needs a
    separate background/control-plane task helper.
    
    ## Summary
    
    The simplified POC shape is one backend-owned task per Agent Identity
    run. This PR makes the first layer match that final shape directly
    instead of introducing task targets, caller-owned external task refs, or
    intermediate wrappers that later PRs would need to undo.
    
    What changed:
    
    - keeps the `AgentAssertion` wire payload as `agent_runtime_id`,
    `task_id`, `timestamp`, and `signature`
    - exposes `register_agent_task` as the single task-registration helper
    for both existing Agent Identity JWT auth and the ChatGPT-registration
    path added later in the stack
    - makes task registration send only the signed registration timestamp;
    the backend owns the returned opaque task id
    - removes the unused target/task-kind/external-task-ref surfaces from
    `codex-agent-identity`
    - keeps Agent Identity JWT JWKS lookup separate from agent/task
    registration URL derivation
    - updates Agent Identity JWT auth to register one run task during auth
    construction and share that task across cloned auth handles
    
    This PR intentionally does not enable ChatGPT-derived Agent Identity.
    That opt-in and config gate are added in the next PR.
    
    ## Testing
    
    - `just test -p codex-agent-identity`
  • Scope command approvals by execution environment (#28738)
    ## Why
    
    Command approval cache keys included the command and working directory,
    but not the execution environment. An approval for `/workspace` locally
    could therefore be reused for the same command and path on an executor.
    
    ## What changed
    
    - Include the selected environment ID in shell and unified-exec approval
    cache keys.
    - Carry that ID through the normal command approval request so clients
    can show which environment is being approved.
    - Expose the environment through app-server as a required nullable
    `environmentId` and show it in the inline TUI approval prompt.
    - Keep older recorded approval events compatible when the environment is
    absent.
    
    For example, `echo ok` in local `/workspace` and `echo ok` in executor
    `/workspace` now produce different approval keys and separate prompts.
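
    A minimal sketch of an environment-aware cache key. The `ApprovalKey`
    struct, the `key` helper, and the environment IDs `local` and
    `executor-1` are hypothetical; they only illustrate why adding the
    environment to the key separates the two approvals.

    ```rust
    use std::collections::HashSet;

    /// Hypothetical approval cache key: command, cwd, and environment.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    struct ApprovalKey {
        command: Vec<String>,
        cwd: String,
        environment_id: Option<String>,
    }

    /// Convenience constructor for the sketch.
    fn key(command: &[&str], cwd: &str, environment_id: Option<&str>) -> ApprovalKey {
        ApprovalKey {
            command: command.iter().map(|part| part.to_string()).collect(),
            cwd: cwd.to_string(),
            environment_id: environment_id.map(str::to_string),
        }
    }

    /// Returns true when this exact command, cwd, and environment was
    /// approved earlier.
    fn is_approved(cache: &HashSet<ApprovalKey>, candidate: &ApprovalKey) -> bool {
        cache.contains(candidate)
    }
    ```

    With the environment in the key, approving `echo ok` locally leaves the
    executor lookup as a cache miss, so a separate prompt is shown.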
    
    ## Scope
    
    This PR does not change network approvals, Guardian review actions, MCP
    elicitation, full-screen TUI rendering, or environment-ID validation.
    Remote `shell_command` execution itself remains in #28722; this PR only
    makes its approval key environment-aware.
  • Tell codex to avoid changing rollout format. (#28632)
    Just adds a requirement to the path-types skill to nudge Codex away from
    touching rollout types while migrating paths.
  • [codex] Repair invalid skill frontmatter scalars (#28628)
    ## Why
    
    The community marketplace audit found many skill frontmatter parse
    failures where values were intended as prose, but were not valid YAML.
    Common examples include unquoted scalar values with `: `, such as
    `description: Build for AWS: ECS` or `argument-hint: <duration: e.g.
    7d>`, and flow-looking values such as `tags: [next,@supabase/ssr]`.
    
    `serde_yaml` does not expose a permissive mode for this. The parser
    fails before unknown frontmatter fields can be ignored, so a
    compatibility repair has to happen before retrying YAML parsing.
    
    ## What changed
    
    Skill frontmatter loading still uses `serde_yaml` as the primary parser.
    If that parse fails, the loader performs a line-oriented repair of
    scalar frontmatter field values, then retries parsing.
    
    The fallback now:
    
    - applies to any frontmatter mapping field, not just `description` /
    `short-description`
    - quotes unquoted scalar values that contain a YAML colon separator such
    as `: `
    - quotes invalid flow-looking scalar values that start with `[`, `{`,
    `@`, or backtick
    - preserves already quoted values
    - skips `|` / `>` block scalar bodies so multiline descriptions are not
    rewritten
    - returns the original YAML error if the repaired frontmatter still
    cannot parse
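
    A simplified sketch of the line-oriented repair, not the loader's
    implementation. It assumes top-level `key: value` lines only, leaves
    every indented line alone (which covers block scalar bodies but also
    skips nested mappings), and quotes any value that starts with a
    flow-looking character without first checking whether it is valid YAML.
    The function names are hypothetical.

    ```rust
    /// Returns true when a plain scalar would likely trip the YAML parser.
    fn needs_quotes(value: &str) -> bool {
        value.contains(": ") || matches!(value.chars().next(), Some('[' | '{' | '@' | '`'))
    }

    /// Repairs one frontmatter line, leaving anything else untouched.
    fn repair_line(line: &str) -> String {
        // Indented lines (block scalar bodies, nested keys) are not rewritten.
        if line.starts_with(' ') {
            return line.to_string();
        }
        match line.split_once(": ") {
            Some((key, raw)) => {
                let value = raw.trim();
                let quoted = value.starts_with('\'') || value.starts_with('"');
                let block = value.starts_with('|') || value.starts_with('>');
                if quoted || block || !needs_quotes(value) {
                    line.to_string()
                } else {
                    // Single-quoted YAML escapes an apostrophe by doubling it.
                    format!("{key}: '{}'", value.replace('\'', "''"))
                }
            }
            None => line.to_string(),
        }
    }
    ```

    In the flow described above, such a repair would run only after the
    primary `serde_yaml` parse has failed, and the original error would be
    returned if the retry fails too.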
    
    ## Examples
    
    This previously failed because the second `: ` was parsed as YAML
    structure:
    
    ```yaml
    description: AWS deployment patterns: ECS Fargate, Lambda, and S3
    ```
    
    The fallback now parses it as if it had been written explicitly as:
    
    ```yaml
    description: 'AWS deployment patterns: ECS Fargate, Lambda, and S3'
    ```
    
    The same repair now applies to ignored frontmatter fields that still
    need to be valid YAML for the parser to get through the document:
    
    ```yaml
    argument-hint: <duration: e.g. 7d, 2w>
    tags: [next,@supabase/ssr]
    ```
    
    Valid YAML multiline descriptions continue to work through normal
    parsing without repair:
    
    ```yaml
    description: |-
      Build for AWS: ECS
      and Lambda
    ```
    
    ## Validation
    
    - Added loader coverage for unquoted `description` values containing `:
    `.
    - Added loader coverage for unquoted `metadata.short-description` values
    containing `: ` and an apostrophe.
    - Added loader coverage for unrecognized frontmatter fields that need
    quoting, including `argument-hint` and `tags`.
    - Added block-scalar coverage to ensure multiline description bodies are
    preserved while other fields are repaired.
    - `just test -p codex-core-skills` (106 passed)
    - `just fix -p codex-core-skills`
  • Run fs helper through Windows sandbox wrapper (#28359)
    ## Why
    
    This is the final PR in the Windows fs-helper sandbox stack and contains
    the actual bug fix.
    
    The exec-server filesystem helper is a direct-spawn path: it asks
    `SandboxManager` for a `SandboxExecRequest`, then launches the returned
    argv itself. That works on macOS and Linux because the transformed argv
    is already a self-contained sandbox wrapper. On Windows, the transformed
    request carried `WindowsRestrictedToken` metadata, but the direct-spawn
    fs-helper runner still launched the helper argv directly.
    
    That means Windows filesystem built-ins backed by the fs-helper could
    run with the parent Codex process permissions instead of the configured
    Windows sandbox. This PR makes the direct-spawn transform produce a
    self-contained Windows wrapper argv before fs-helper launches it.
    
    ## What Changed
    
    - Added `SandboxManager::transform_for_direct_spawn()` for callers that
    launch the returned argv themselves.
    - Wrapped Windows restricted-token direct-spawn requests with `codex.exe
    --run-as-windows-sandbox` and then marked the outer request as
    unsandboxed, matching the macOS/Linux wrapper argv shape.
    - Updated `exec-server/src/fs_sandbox.rs` to use the direct-spawn
    transform for fs-helper launches.
    - Materialized the inner `codex.exe --codex-run-as-fs-helper` executable
    into `.sandbox-bin` so the sandboxed user can run it.
    - Carried runtime workspace roots through `FileSystemSandboxContext` as
    `PathUri` values so `:workspace_roots` policies resolve correctly
    without sending native client paths over exec-server JSON.
    - Preserved wrapper setup identity environment needed by Windows sandbox
    setup without changing the serialized inner helper environment.
    
    ## Verification
    
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just test -p codex-sandboxing transform_for_direct_spawn_windows`
    - `just test -p codex-exec-server fs_sandbox::tests`
    - `just fix -p codex-windows-sandbox -p codex-sandboxing -p
    codex-exec-server -p codex-core -p codex-file-system`
    
    Local note: `just fmt` completed Rust formatting, but this workstation
    still fails the non-Rust formatter phases because uv cannot open its
    cache and the local buildifier/dotslash path is missing.
  • [ez][codex-rs] Support apps._default.default_tools_approval_mode (#27965)
    [from codex]
    
    ## Summary
    
    - add `default_tools_approval_mode` to `[apps._default]` and expose it
    through app-server v2 `config/read`
    - apply it after managed, per-tool, and per-app approval settings,
    before the built-in `auto` fallback
    - document the precedence, regenerate config/app-server schemas, and add
    unit plus end-to-end approval coverage
    
    ## Configuration
    
    ```toml
    [apps._default]
    default_tools_approval_mode = "prompt"
    ```
    
    The effective precedence is managed requirements, tool-specific
    `approval_mode`, app-specific `default_tools_approval_mode`,
    `apps._default.default_tools_approval_mode`, then `auto`.
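
    The precedence chain reads naturally as a sequence of `Option`
    fallbacks. This is an illustrative sketch: the `ApprovalSources` struct
    and `effective_mode` function are hypothetical, and only the `auto` and
    `prompt` modes that appear in this description are modeled.

    ```rust
    /// Approval modes named in this description (not necessarily all of them).
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ApprovalMode {
        Auto,
        Prompt,
    }

    /// Each source may or may not specify a mode.
    #[derive(Default)]
    struct ApprovalSources {
        managed: Option<ApprovalMode>,
        tool: Option<ApprovalMode>,
        app: Option<ApprovalMode>,
        apps_default: Option<ApprovalMode>,
    }

    /// Resolves the mode: managed, then tool, then app, then
    /// `apps._default`, then the built-in `auto` fallback.
    fn effective_mode(sources: &ApprovalSources) -> ApprovalMode {
        sources
            .managed
            .or(sources.tool)
            .or(sources.app)
            .or(sources.apps_default)
            .unwrap_or(ApprovalMode::Auto)
    }
    ```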
    
    ## Test plan
    
    - `just write-config-schema`
    - `just write-app-server-schema`
    - `just write-app-server-schema --experimental`
    - `just test -p codex-core app_tool_policy`
    - `just test -p codex-core mcp_turn_metadata`
    - `just test -p codex-config`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server config_read_includes_apps`
    - `just fix -p codex-config -p codex-core -p codex-app-server-protocol
    -p codex-app-server`
    - `just fmt`
  • Replace SkillsManager with SkillsService (#28705)
    ## Why
    
    Host skill discovery was still exposed as a manager even though it is a
    process-owned service shared by sessions, the app-server catalog, and
    file-watcher invalidation. The skills extension also consumed an ad hoc
    loaded-skills wrapper instead of a named immutable snapshot.
    
    ## What changed
    
    - replace `SkillsManager` with concrete `SkillsService`
    - make the service cache and return immutable `HostSkillsSnapshot`
    values
    - migrate the skills extension host provider to the snapshot boundary
    - migrate app-server catalog, watcher, and invalidation paths to the
    service
    
    This keeps the service limited to host discovery, caching, roots, and
    invalidation. Catalog rendering and invocation remain extension
    responsibilities for the next stacked change.
  • app-server: keep the model cache warm (#28699)
    ## Why
    
    The app server is long-lived, but its shared model cache otherwise
    refreshes only when a caller needs it. Once the five-minute cache
    expires, starting a thread or calling `model/list` can wait for
    `/models` on the request path.
    
    Refresh the cache in the background before it expires so foreground
    callers normally use fresh local state.
    
    ## What changed
    
    - Start an app-server worker that refreshes models immediately and then
    every three minutes using the existing models-manager API.
    - Hold only a weak reference to the models manager between refreshes, so
    the worker does not extend its lifetime.
    - Stop scheduling refreshes when the app-server lifecycle handle is shut
    down or dropped. A refresh already in progress is allowed to finish.
    - Adjust affected app-server test fixtures to distinguish the background
    `/models` probe from the connection they are testing.
    
    The existing models-manager cache, refresh strategies, auth handling,
    ETag behavior, and concurrency semantics are unchanged.
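
    The weak-reference rule can be sketched with `std::sync::Weak`. The
    `ModelsManager` stand-in and the `refresh_tick` function are
    hypothetical, and the timer and shutdown handling are left out; the
    sketch only shows why the worker cannot keep the manager alive.

    ```rust
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::{Arc, Weak};

    /// Stand-in for the models manager; it only counts refreshes.
    struct ModelsManager {
        refreshes: AtomicUsize,
    }

    impl ModelsManager {
        fn refresh(&self) {
            self.refreshes.fetch_add(1, Ordering::SeqCst);
        }
    }

    /// One worker tick. The strong reference exists only for the duration
    /// of the refresh; `false` tells the worker to stop scheduling.
    fn refresh_tick(manager: &Weak<ModelsManager>) -> bool {
        match manager.upgrade() {
            Some(strong) => {
                strong.refresh();
                true
            }
            None => false,
        }
    }
    ```

    A worker loop would call `refresh_tick` once immediately and then on
    each interval, exiting the first time it returns `false`.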
    
    ## Testing
    
    -
    `models_refresh_worker::tests::refreshes_immediately_periodically_and_stops_when_dropped`
    -
    `suite::v2::remote_control::listen_off_honors_persisted_remote_control_enable`
    -
    `suite::v2::attestation::attestation_generate_round_trip_adds_header_to_responses_websocket_handshake`
  • Add join key for MAv2 inter-agent messages (#28561)
    ## Summary
    This keeps inter-agent communication on the existing raw response item
    path and adds a join key for MAv2 tool calls.
    
    MAv2 `spawn_agent`, `send_message`, and `followup_task` now stamp the
    originating tool call id into `ResponseItemMetadata.source_call_id` on
    the raw `ResponseItem::AgentMessage`. App-server clients can join that
    raw item back to the existing tool/activity event by call id, while
    using the raw agent message's existing sender, receiver, and content
    fields.
    
    No new app-server `ThreadItem` or notification type is added.
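    
    A client-side join on the call id might look like the sketch below. The
    `ToolEvent` and `AgentMessage` structs are trimmed-down, hypothetical
    views of the real protocol types; only the `source_call_id` field name
    comes from this change.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical, trimmed-down view of a tool/activity event.
    #[derive(Debug, Clone, PartialEq)]
    pub struct ToolEvent {
        pub call_id: String,
        pub tool: String,
    }
    
    /// Hypothetical, trimmed-down view of a raw agent message.
    #[derive(Debug, Clone, PartialEq)]
    pub struct AgentMessage {
        pub source_call_id: Option<String>,
        pub sender: String,
        pub receiver: String,
        pub content: String,
    }
    
    /// Pair each agent message with the tool event that produced it.
    /// Messages without a `source_call_id`, or with an unknown one, are skipped.
    pub fn join_by_call_id<'a>(
        events: &'a [ToolEvent],
        messages: &'a [AgentMessage],
    ) -> Vec<(&'a ToolEvent, &'a AgentMessage)> {
        let by_id: HashMap<&str, &ToolEvent> =
            events.iter().map(|e| (e.call_id.as_str(), e)).collect();
        messages
            .iter()
            .filter_map(|m| {
                let id = m.source_call_id.as_deref()?;
                by_id.get(id).map(|e| (*e, m))
            })
            .collect()
    }
    ```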
    
    ## Tests
    - `just fmt`
    - `just write-app-server-schema`
    - `just test -p codex-protocol`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-core
    multi_agent_v2_spawn_returns_path_and_send_message_accepts_relative_path`
    - `just test -p codex-core
    multi_agent_v2_followup_task_completion_notifies_parent_on_every_turn`
    - `just fix -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-core`
  • Back off registry retries during exec recovery (#28546)
    ## Why
    
    PR #28512 retries a failed session recovery every 100 ms. Every Noise
    recovery attempt first asks the environment registry for a fresh
    connection bundle, even when the eventual failure comes from the
    WebSocket or initialize handshake. During an outage, that could make
    each disconnected client call the registry about 250 times during the
    25-second recovery window.
    
    ## What changes
    
    All retryable Noise recovery failures now use a separate backoff
    schedule:
    
    ```text
    base:    500 ms -> 1 s -> 2 s -> 4 s -> 5 s maximum
    actual:  500-750 ms, 1-1.5 s, 2-3 s, 4-6 s, 5-7.5 s
    ```
    
    The extra 0-50% is deterministic per-session jitter so disconnected
    clients do not retry together. Direct WebSocket recovery keeps the
    existing 100 ms retry because it does not re-enter the registry.
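    
    For reference, a fixed 100 ms retry over a 25-second window gives
    25,000 / 100 = 250 attempts, which is where the figure above comes from.
    The schedule can be sketched as follows. The function names are
    hypothetical, and deriving the jitter from an FNV-1a hash of the session
    id is an assumption made for illustration; the description only says the
    jitter is deterministic per session.
    
    ```rust
    /// Base delay for a zero-based retry attempt: 500 ms doubling up to 5 s.
    pub fn base_delay_ms(attempt: u32) -> u64 {
        const FIRST_MS: u64 = 500;
        const MAX_MS: u64 = 5_000;
        // Cap the shift so large attempt counts cannot overflow.
        FIRST_MS.saturating_mul(1u64 << attempt.min(16)).min(MAX_MS)
    }
    
    /// FNV-1a hash, used here only as a stable, dependency-free way to derive
    /// jitter from a session identifier (an assumption, not the real scheme).
    fn fnv1a(bytes: &[u8]) -> u64 {
        let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
        for byte in bytes {
            hash ^= u64::from(*byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        hash
    }
    
    /// Base delay plus 0-50% jitter that is fixed for a given session.
    pub fn jittered_delay_ms(attempt: u32, session_id: &str) -> u64 {
        let base = base_delay_ms(attempt);
        let jitter_range = base / 2; // at most +50%
        base + fnv1a(session_id.as_bytes()) % (jitter_range + 1)
    }
    ```
    
    Two sessions with different ids get different offsets inside the same
    window, while one session always sees the same delay for a given attempt.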
  • Resume exec-server sessions after disconnect (#28512)
    Supersedes #28288 (closed).
    
    ## Why
    
    A short WebSocket interruption currently ends every client-side process
    handle, even though exec-server keeps the server session and its
    processes alive for a short time.
    
    This is especially visible for executor-backed stdio MCP servers: a
    temporary connection loss becomes a permanent `Transport closed` error.
    The server already has the information needed to resume the session, but
    the client opens a fresh session instead of using it.
    
    This change reconnects below the process and MCP layers. Existing
    process handles stay valid, missed output is recovered, and the same
    server-side processes continue running.
    
    ## State machine
    
    One logical `ExecServerClient` stays alive while its underlying RPC
    connection changes generations.
    
    ```text
                             transport closes
           +------------------------------------------------+
           |                                                v
    +-------------+                                  +-------------+
    |  Connected  |                                  | Recovering  |
    +-------------+                                  +-------------+
           ^                                                |
           | session resumed, processes caught up           | retryable error
           +------------------------------------------------+ loops until deadline
                                                            |
                                                            | deadline or permanent error
                                                            v
                                                      +-------------+
                                                      |   Failed    |
                                                      +-------------+
    ```
    
    ### `Connected`
    
    - New RPC calls use the current connection.
    - Process notifications are published in sequence order.
    - A disconnect only starts recovery if it came from the current
    connection generation. Late events from older generations cannot replace
    the active connection.
    
    ### `Recovering`
    
    - New calls wait instead of choosing a half-connected RPC client.
    - Existing process handles, wake subscriptions, and event subscriptions
    stay open.
    - Streaming HTTP response bodies fail immediately because their byte
    streams cannot be resumed safely.
    - Recovery first waits for process starts that were already in flight. A
    start whose result became ambiguous is cleaned up after reconnection
    instead of being silently adopted.
    - The client reconnects with the learned `session_id`. The server may
    briefly report that the old connection is still attached, so that error
    is retried until the detach finishes.
    - The notification consumer starts before the resume handshake
    completes. This prevents a busy process from filling the notification
    queue and blocking the initialize response.
    - Before installing the new connection, the client catches up every
    recoverable process with `process/read`.
    
    ### `Failed`
    
    - Recovery stops after 25 seconds or after a permanent error.
    - Waiting calls are released with one stable disconnect error.
    - Existing process sessions receive a terminal failure instead of
    waiting forever.
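    
    The three states and the generation check can be condensed into a small
    transition function. This is a sketch under assumptions: the type and
    variant names are hypothetical, and bumping the generation by one on
    resume is an illustrative choice rather than something the description
    states.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ClientState {
        Connected { generation: u64 },
        Recovering { generation: u64 },
        Failed,
    }
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Event {
        /// Transport closed for the connection with this generation.
        TransportClosed { generation: u64 },
        /// Session resumed and every recoverable process caught up.
        Resumed,
        /// A retryable error occurred; keep trying until the deadline.
        RetryableError,
        /// The recovery deadline passed or a permanent error occurred.
        DeadlineOrPermanentError,
    }
    
    pub fn step(state: ClientState, event: Event) -> ClientState {
        use ClientState::*;
        match (state, event) {
            // Only a disconnect from the current generation starts recovery.
            (Connected { generation }, Event::TransportClosed { generation: closed })
                if closed == generation =>
            {
                Recovering { generation }
            }
            // A successful resume installs the next connection generation.
            (Recovering { generation }, Event::Resumed) => Connected {
                generation: generation + 1,
            },
            (Recovering { .. }, Event::DeadlineOrPermanentError) => Failed,
            // Retryable errors, late events from older generations, and
            // anything after failure leave the state unchanged.
            (other, _) => other,
        }
    }
    ```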
    
    ## Recovering process events
    
    Output, exit, and close events share one sequence. During normal
    operation, the client buffers early events until every lower sequence
    has been published.
    
    After reconnection, the client reads each process starting after its
    last published sequence:
    
    1. Retained output chunks are inserted by sequence number.
    2. Exit and close state are reconstructed in their sequence positions.
    3. Events already received as live notifications are ignored as
    duplicates.
    4. Newly contiguous events are published in order.
    5. If the server no longer retains enough output to fill a sequence gap,
    only that process is terminated and failed. The recovered connection
    remains usable for other processes.
    
    The server reports its full next event sequence for unbounded reads,
    including exit and close events. Closed processes remain readable for
    the same 30-second window used to retain detached sessions.
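    
    The ordering rules above amount to a reorder buffer keyed by sequence
    number. The sketch below is illustrative only: `Reorderer`, `accept`, and
    `has_unfillable_gap` are hypothetical names, events are plain strings,
    and the first sequence number is left to the caller because the
    description does not state it.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Publishes process events strictly in sequence order.
    pub struct Reorderer {
        next_seq: u64,
        pending: BTreeMap<u64, String>,
    }
    
    impl Reorderer {
        pub fn new(first_seq: u64) -> Self {
            Self { next_seq: first_seq, pending: BTreeMap::new() }
        }
    
        /// Accept one event from a live notification or a catch-up read.
        /// Returns the events that became publishable, in order.
        pub fn accept(&mut self, seq: u64, event: String) -> Vec<String> {
            // Already published or already buffered: ignore the duplicate.
            if seq < self.next_seq || self.pending.contains_key(&seq) {
                return Vec::new();
            }
            self.pending.insert(seq, event);
            let mut ready = Vec::new();
            while let Some(event) = self.pending.remove(&self.next_seq) {
                ready.push(event);
                self.next_seq += 1;
            }
            ready
        }
    
        /// True when the oldest sequence the server still retains is past the
        /// next one needed and that event is not buffered locally, so the gap
        /// can never be filled.
        pub fn has_unfillable_gap(&self, oldest_retained_seq: u64) -> bool {
            oldest_retained_seq > self.next_seq
                && !self.pending.contains_key(&self.next_seq)
        }
    }
    ```
    
    In this sketch, a process whose buffer reports an unfillable gap would be
    the one that is terminated and failed, while other processes keep their
    own buffers and continue.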
    
    ## Other details
    
    - Detached server sessions are retained for 30 seconds, leaving margin
    around the client's 25-second recovery deadline.
    - Session attach and detach update the active notification sender under
    the same attachment lock, so an old connection cannot clear a newly
    attached sender.
    - A dedicated error code distinguishes the temporary "session is still
    attached" race from permanent initialization errors.
    - Process starts are identity-checked on both client and server. Cleanup
    from an older start cannot remove a newer process that reused the same
    ID.
    - Mutating requests that were already in flight when the transport
    closed are not replayed, because the client cannot know whether the
    server applied them. Requests started after recovery is known wait for
    the replacement connection.
    - This assumes the client and server are upgraded together, so both
    sides are either on the version before this PR or on the version after
    it.
    
    ## User impact
    
    Long-running commands and stdio MCP servers can survive a temporary
    exec-server WebSocket interruption without changing process IDs or
    losing output produced during the outage.
  • [codex] Persist built-in image results reported as generating (#28656)
    ## Why
    
    #27920 stopped persisting image-generation items unless their status was
    `completed`, preventing failed standalone extension items with empty
    results from being saved. Built-in image generation can instead emit a
    terminal `response.output_item.done` containing a complete base64 PNG
    while the item status remains `generating`. In that case, app-server
    emits no `savedPath`, so Codex Apps can render the inline image but
    cannot expose a file artifact.
    
    ## What changed
    
    - Persist image-generation items whenever `result` contains image data.
    Failed terminal items still have empty results and remain unpersisted.
    - Update the existing built-in image-generation integration test to
    cover a terminal `generating` item and verify both `saved_path` and the
    written PNG bytes.
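    
    The new rule reduces to a predicate on the result rather than the status.
    A minimal sketch, assuming a hypothetical `ImageItem` with a string
    status and an optional base64 result (the real item type is not shown in
    this description):
    
    ```rust
    /// Hypothetical, trimmed-down image-generation item.
    pub struct ImageItem {
        pub status: String,
        pub result: Option<String>,
    }
    
    /// Persist whenever the result carries image data, whatever the status.
    pub fn should_persist(item: &ImageItem) -> bool {
        item.result.as_deref().is_some_and(|data| !data.is_empty())
    }
    ```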
    
    ## Validation
    
    - Confirmed with a raw built-in websocket trace: the image progressed
    through `in_progress`, `generating`, and `partial_image`, then emitted
    one `response.output_item.done` with `status: "generating"` and a
    complete PNG result.
    - `just test -p codex-core builtin_image_generation_call_persisted` is
    currently blocked before test execution by a pre-existing compile error
    in `thread-store/src/thread_metadata_sync.rs:171`.
  • core: remove redundant TurnContext and Prompt fields (#28638)
    ## Why
    
    `TurnContext` had accumulated dead fields and cached projections of
    values already owned by its per-turn `Config` or `ModelInfo`. Keeping
    both copies made ownership unclear and allowed artificial split-brain
    states, such as a compatibility hash differing from the model metadata
    it came from.
    
    `Prompt` similarly carried a write-only personality after personality
    selection had already been materialized into its base instructions.
    
    This makes the canonical owner explicit: configuration-backed values
    come from `config`, model-derived values come from `model_info`, and
    prompts contain only data consumed by request construction.
    
    ## What changed
    
    - Remove the unused `ghost_snapshot`, `codex_self_exe`, and
    `thread_source` fields.
    - Remove duplicate `comp_hash`, `truncation_policy`, `features`,
    `shell_environment_policy`, `codex_linux_sandbox_exe`, `compact_prompt`,
    and `tool_mode` fields.
    - Read those values directly from `TurnContext::config` or
    `TurnContext::model_info` at their consumers.
    - Remove the write-only `Prompt::personality` field and its constructor
    assignments.
    - Preserve review-turn inheritance of the parent turn's shell policy,
    Linux sandbox executable, and compact prompt through the review config.
    
    ## Testing
    
    - `cargo check -p codex-core --tests`
  • Revert thread recencyAt for sidebar ordering (#28655)
    ## Why
    
    Revert #27910 to remove the newly introduced thread `recencyAt`
    persistence and API behavior from `main`.
    
    ## What changed
    
    This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
    including the state migration, thread-store propagation, app-server API
    surface, generated schemas, and related tests.
    
    ## Validation
    
    Not run before opening; relying on CI for the initial fast signal.
  • [codex] Test code-mode variable truncation (#28471)
    ## Summary
    
    Code mode has two separate truncation points: the nested tool result
    returned to JavaScript and the code-mode output later recorded for the
    model. These tests now verify those behaviors independently.
    
    - Report whether `result.output` was truncated before printing it.
    - Verify omitted or sufficiently large nested limits produce `Variable
    truncated: False`, while allowing the printed value to be truncated
    downstream.
    - Verify an explicit nested limit produces `Variable truncated: True`
    when the command output exceeds it.
    - Use a token-policy model fixture so downstream truncation is visible
    as `…N tokens truncated…`.
    - Align the explicit nested-truncation expectation with the warning
    header.
    
    This PR changes test coverage only; runtime truncation behavior is
    unchanged.
    
    ## Validation
    
    - `env -u CODEX_SANDBOX_NETWORK_DISABLED RUST_MIN_STACK=8388608 cargo
    test -p codex-core --test all code_mode_exec -- --nocapture` (8 passed)
  • code-mode: move cell state into library actor (#28599)
    A code-mode cell is a single JavaScript execution that can produce
    output, call tools, wait for asynchronous work, resume, or be
    terminated. This PR extracts the existing per-cell run loop into a
    dedicated actor that owns the cell’s lifecycle state. It is primarily an
    ownership change rather than a new lifecycle contract: existing behavior
    now has one clear implementation boundary.
    
    ### Architecture
    The session service remains responsible for session-wide concerns:
    allocating cell IDs, storing shared values, creating cells, and routing
    requests to them.
    
    Once a cell is created, its execution state belongs to its actor.
    Callers interact with the actor through a handle. The actor receives two
    kinds of input: runtime events and control requests.
    
    A single event loop serializes these inputs and applies the lifecycle
    rules. It tracks the current observer—the caller waiting for an
    update—along with accumulated output, outstanding callbacks, runtime
    state, yield deadlines, and termination progress. Observation,
    termination, completion, and cleanup therefore have one consistent
    owner.
    
    When the runtime has no immediately runnable work and is waiting only on
    timers or tool results, the actor can return accumulated output and
    information about outstanding tool calls while keeping the cell
    available to resume. On completion or termination, it performs the
    appropriate callback cleanup before publishing the final result and
    removing the cell from the session.
    
    A small host interface connects the actor to session-owned facilities
    such as tool dispatch, notifications, stored values, and final cell
    removal, keeping those responsibilities outside the actor itself.
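    
    The single-writer idea can be illustrated with a channel and one thread.
    This is a deliberately small sketch, not the code-mode actor: `CellInput`,
    `CellHandle`, and `spawn_cell` are hypothetical, and callbacks, yield
    deadlines, and the host interface are left out.
    
    ```rust
    use std::sync::mpsc::{self, Sender};
    use std::thread::{self, JoinHandle};
    
    /// Runtime events and control requests share one queue.
    pub enum CellInput {
        /// Runtime event: the cell produced output.
        Output(String),
        /// Control request: an observer wants the output accumulated so far.
        Observe(Sender<String>),
        /// Control request: stop the cell.
        Terminate,
    }
    
    /// Handle used by callers; the actor thread is the only writer of state.
    pub struct CellHandle {
        pub inputs: Sender<CellInput>,
        pub join: JoinHandle<String>,
    }
    
    pub fn spawn_cell() -> CellHandle {
        let (inputs, queue) = mpsc::channel::<CellInput>();
        let join = thread::spawn(move || {
            let mut accumulated = String::new();
            // A single loop serializes every input.
            for input in queue {
                match input {
                    CellInput::Output(chunk) => accumulated.push_str(&chunk),
                    CellInput::Observe(reply) => {
                        // Hand over the accumulated output and reset it.
                        let _ = reply.send(std::mem::take(&mut accumulated));
                    }
                    CellInput::Terminate => break,
                }
            }
            // Whatever is left is returned as the final result.
            accumulated
        });
        CellHandle { inputs, join }
    }
    ```
    
    Because every input goes through the same loop, observation and
    termination can never race on the accumulated output.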
    
    ### Why
    Previously, cell lifecycle state and coordination lived alongside
    session management. The actor boundary makes each cell a self-contained
    state machine with a single writer, while the service becomes a registry
    and adapter around it.
    
    This makes lifecycle behavior easier to reason about and test in
    isolation. It also establishes a clean boundary for later changing where
    cells run or how they communicate without recreating their lifecycle
    rules.
  • [codex] Support object-valued plugin MCP manifests (#28580)
    ## Summary
    This fixes plugin manifest parsing for MCP servers declared as an object
    directly in `plugin.json`.
    
    Before this change, Codex modeled `mcpServers` as only a string path,
    for example:
    
    ```json
    {
      "name": "counter-sample",
      "version": "1.1.1",
      "mcpServers": "./.mcp.json"
    }
    ```
    
    Some migrated plugins instead provide the server map directly in the
    manifest:
    
    ```json
    {
      "name": "counter-sample",
      "version": "1.1.1",
      "description": "Plugin that declares MCP servers in the manifest",
      "mcpServers": {
        "counter": {
          "type": "http",
          "url": "https://sample.example/counter/mcp"
        }
      }
    }
    ```
    
    That object form previously failed during install/load with an error
    like:
    
    ```text
    failed to parse plugin manifest: invalid type: map, expected a string
    ```
    
    ## What changed
    - Add a manifest representation for `mcpServers` as either
    `Path(Resource)` or `Object(map)`.
    - Parse `plugin.json` `mcpServers` as either a string path or an object.
    - Route object-valued MCP server maps through the existing plugin MCP
    config parser instead of adding a second parser.
    - Apply existing per-plugin MCP server policy to object-valued MCP
    servers the same way as file-backed MCP servers.
    - Include object-valued MCP server names in plugin telemetry/capability
    metadata.
    - Support object-valued MCP config for executor plugins without
    requiring a `.mcp.json` filesystem read.
    - Update the bundled plugin-creator validator and `plugin-json-spec.md`
    so generated-plugin validation accepts the same object-valued shape.
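    
    The either-or representation can be sketched as an enum. To stay free of
    dependencies, this example uses a tiny hypothetical `Json` type in place
    of a real parser; `parse_mcp_servers` and `inline_server_names` are
    illustrative names, not the functions added by this change.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Minimal JSON value, standing in for whatever parser the loader uses.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Json {
        String(String),
        Object(BTreeMap<String, Json>),
        Other,
    }
    
    /// The two accepted shapes of `mcpServers`.
    #[derive(Debug, Clone, PartialEq)]
    pub enum McpServers {
        /// A path to a companion file such as `./.mcp.json`.
        Path(String),
        /// A server map declared inline, keyed by server name.
        Object(BTreeMap<String, Json>),
    }
    
    pub fn parse_mcp_servers(value: &Json) -> Result<McpServers, String> {
        match value {
            Json::String(path) => Ok(McpServers::Path(path.clone())),
            Json::Object(map) => Ok(McpServers::Object(map.clone())),
            Json::Other => Err("invalid type: expected a string or a map".to_string()),
        }
    }
    
    /// Server names for the inline form; the path form needs a file read first.
    pub fn inline_server_names(servers: &McpServers) -> Vec<&str> {
        match servers {
            McpServers::Path(_) => Vec::new(),
            McpServers::Object(map) => map.keys().map(String::as_str).collect(),
        }
    }
    ```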
    
    ## Compatibility
    Existing plugin manifests that use `"mcpServers": "./.mcp.json"`
    continue to work. Plugins can now also use the object shape shown above.
    
    ## Tests
    Added coverage for the new manifest attribute shape at the install,
    normal load, telemetry, and executor-provider layers:
    
    - `install_accepts_manifest_mcp_server_objects`
    - `load_plugins_loads_manifest_mcp_server_objects`
    - `plugin_telemetry_metadata_uses_manifest_mcp_server_objects`
    - `reads_manifest_object_config_without_executor_file_system_access`
    
    Also smoke-tested the plugin-creator validator against both supported
    forms:
    
    - `mcpServers` as a direct object in `plugin.json`
    - `mcpServers` as `"./.mcp.json"` with a companion `.mcp.json`
    
    ## Validation
    - `just test -p codex-plugin`
    - `just test -p codex-core-plugins`
    - `just test -p codex-mcp-extension`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just fmt`
    - `git diff --check`
    - Focused rename/object-form rerun: `just test -p codex-core-plugins
    manager::tests::load_plugins_loads_manifest_mcp_server_objects
    manager::tests::plugin_telemetry_metadata_uses_manifest_mcp_server_objects
    store::tests::install_accepts_manifest_mcp_server_objects`
    - Focused executor rerun: `just test -p codex-mcp-extension
    executor_plugin::provider::tests::reads_manifest_object_config_without_executor_file_system_access`
    - `python3
    codex-rs/skills/src/assets/samples/plugin-creator/scripts/validate_plugin.py
    /private/tmp/codex-validator-object`
    - `python3
    codex-rs/skills/src/assets/samples/plugin-creator/scripts/validate_plugin.py
    /private/tmp/codex-validator-path`
  • thread-store: fix response fixture compilation (#28642)
    ## Why
    
    A `codex-thread-store` test fixture still constructs
    `ResponseItem::FunctionCallOutput` without its required `metadata`
    field, preventing the crate's test targets from compiling on `main`.
    
    ## What changed
    
    - Set the fixture's response-item metadata to `None`.
    
    ## Testing
    
    - `cargo check -p codex-thread-store --tests`
  • [codex] core: restore absolute turn context cwd (#28629)
    ## Why
    
    #28152 jumped the gun on moving the rollout format to store URIs, and
    would likely break compat with some features that don't go through the
    same types as the core logic.
    
    ## What
    
    Make `TurnContextItem.cwd` an `AbsolutePathBuf` again and remove the
    test added for `PathUri` serialization in rollouts. This also drops a
    bunch of error paths that are no longer needed.
  • [codex] Gate remote plugin catalog by auth (#28625)
    ## Summary
    
    - Treat the remote global plugin catalog as active only when
    `remote_plugin` is enabled and the current auth uses the Codex backend.
    - Skip the local OpenAI curated marketplace for remote-enabled ChatGPT
    users while preserving configured marketplaces.
    - Keep the local curated marketplace for API-key users, unauthenticated
    fallback, and ChatGPT users with `remote_plugin` disabled.
    - Apply the same effective-remote gate to the remote
    installed-marketplace cache.
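    
    The gate described in these bullets can be written as a two-input
    predicate. This is a sketch with hypothetical names; in particular,
    treating ChatGPT auth as the only auth that uses the Codex backend is an
    assumption drawn from the cases listed here.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Auth {
        ChatGpt,
        ApiKey,
        Unauthenticated,
    }
    
    /// The remote catalog is active only with the feature on and
    /// Codex-backend auth (assumed here to mean ChatGPT auth).
    pub fn remote_catalog_active(remote_plugin_enabled: bool, auth: Auth) -> bool {
        remote_plugin_enabled && auth == Auth::ChatGpt
    }
    
    /// The local curated marketplace is kept whenever the remote catalog
    /// is not effectively active.
    pub fn include_local_curated(remote_plugin_enabled: bool, auth: Auth) -> bool {
        !remote_catalog_active(remote_plugin_enabled, auth)
    }
    ```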
    
    ## Root cause
    
    The tool-suggestion discovery path unconditionally included the local
    OpenAI curated marketplace. For remote-enabled ChatGPT users, that made
    remote discovery additive: Codex parsed every local curated
    `plugin.json` before also loading the remote catalog.
    
    ## Validation
    
    - `just fmt`
    - `cargo build -p codex-cli --bin codex`
    - Targeted auth/feature matrix tests pass, including API-key auth with
    `remote_plugin` enabled.
    - Manual CLI validation confirmed:
      - ChatGPT + remote off includes local curated.
      - ChatGPT + remote on excludes local curated.
      - API-key auth keeps local curated when remote is enabled.
    - `just test -p codex-core-plugins`: 235 passed; one unrelated existing
    marketplace test failed because it loaded the developer's home
    marketplace configuration.
  • Revert "Tell codex about PathUri serde compat. (#28595)" (#28627)
    This reverts commit bd2a786326, which
    didn't capture all the nuance we need for this migration.
  • Add thread recencyAt for sidebar ordering (#27910)
    ## Summary
    
    Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
    sort key for product recency ordering while preserving the existing
    meaning of `updatedAt` as the latest persisted thread mutation.
    
    This is the server-side alternative to #27697. Rather than narrowing
    `updatedAt`, clients can sort the sidebar by `recency_at` and continue
    treating `updatedAt` as mutation time.
    
    Paired Codex Apps PR:
    [openai/openai#1024599](https://github.com/openai/openai/pull/1024599)
    
    ## Contract
    
    - `recencyAt` initializes when a thread is created.
    - A turn start advances `recencyAt` monotonically.
    - Commentary, agent output, tool results, token/accounting updates, turn
    completion, archive, unarchive, resume, and generic metadata writes do
    not advance it.
    - `updatedAt` retains its existing behavior and continues to advance for
    persisted thread mutations.
    - Current servers populate `recencyAt`; the response field is optional
    in generated TypeScript so clients connected to older servers can fall
    back to `updatedAt`.
    - Filesystem-only fallback uses existing updated/mtime ordering when
    SQLite is unavailable.
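    
    The contract can be summarized in a few lines. The sketch below uses a
    hypothetical `Thread` struct with integer timestamps; it leaves out
    whatever happens to `updated_at` at turn start and only shows the
    monotonic touch and the client-side fallback.
    
    ```rust
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Thread {
        pub updated_at: i64,
        /// `None` models a response from an older server.
        pub recency_at: Option<i64>,
    }
    
    impl Thread {
        /// Turn start: advance recency, never moving it backwards.
        pub fn on_turn_start(&mut self, now: i64) {
            let current = self.recency_at.unwrap_or(i64::MIN);
            self.recency_at = Some(current.max(now));
        }
    
        /// Any other persisted mutation: only `updated_at` changes.
        pub fn on_other_mutation(&mut self, now: i64) {
            self.updated_at = now;
        }
    
        /// Sort key for the sidebar, falling back to `updated_at`.
        pub fn sidebar_key(&self) -> i64 {
            self.recency_at.unwrap_or(self.updated_at)
        }
    }
    
    /// Most recent first.
    pub fn sort_for_sidebar(threads: &mut [Thread]) {
        threads.sort_by_key(|t| std::cmp::Reverse(t.sidebar_key()));
    }
    ```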
    
    ## Persistence and compatibility
    
    Migration 0038 adds second- and millisecond-precision recency columns,
    backfills them from the existing updated timestamp, creates list
    indexes, and includes an insert trigger so older binaries writing to a
    migrated database seed recency without causing later mutations to
    advance it.
    
    Generic metadata upserts preserve existing recency values. Turn-start
    updates use a dedicated monotonic touch, and process-local allocation
    keeps millisecond cursor values unique. State DB list, search, read,
    filtered-list repair, rollout fallback propagation, and app-server
    conversions all carry the new field.
    
    ## API
    
    `Thread` responses include:
    
    ```ts
    recencyAt?: number
    ```
    
    `thread/list` and `thread/search` accept:
    
    ```json
    { "sortKey": "recency_at" }
    ```
    
    Generated TypeScript and JSON schemas are included.
    
    ## Validation
    
    - `just test -p codex-state` — 146 passed
    - `just test -p codex-rollout` — 69 passed
    - `just test -p codex-thread-store` — 81 passed
    - `just test -p codex-app-server-protocol` — 231 passed
    - Focused app-server list ordering, response mapping, archive/unarchive,
    and resume lifecycle tests passed
    - Scoped `just fix` for state, rollout, thread-store,
    app-server-protocol, and app-server
    - `just fmt`
    - `git diff --check`
    - Independent correctness, simplicity, elegance, security, and
    test-quality reviews; actionable ordering, lifecycle, query-projection,
    and timestamp-uniqueness findings were addressed
  • PAC 1 - Add system proxy feature config surface (#26706)
    ## Summary
    
    Introduces the default-off `respect_system_proxy` feature flag used to
    gate first-class system PAC/proxy support for Codex-owned native
    clients.
    
    With the feature disabled or absent, behavior remains unchanged. This PR
    establishes the configuration and managed-requirement surface; proxy
    discovery and request routing are implemented by follow-up PRs.
    
    ## Configuration
    
    User configuration uses the standard boolean feature form:
    
    ```toml
    [features]
    respect_system_proxy = true
    ```
    
    Managed feature requirements use the corresponding boolean key. The
    effective runtime configuration is exposed as a boolean and defaults to
    `false`.
    
    ## Implementation
    
    - Registers `respect_system_proxy` as an under-development, default-off
    feature.
    - Resolves user configuration and managed feature requirements into
    `Config.respect_system_proxy`.
    - Provides bootstrap resolution for startup paths that must evaluate the
    feature before full configuration loading completes.
    - Uses the standard feature CLI and config-editing behavior.
    - Excludes `features.respect_system_proxy` from project-local
    configuration.
    - Updates the generated configuration schema.
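    
    One way to picture the resolution is a fold over configuration layers.
    This is a sketch under stated assumptions, not the actual resolver:
    `Layer` and `resolve_respect_system_proxy` are hypothetical, later layers
    are assumed to win over earlier ones, and a managed requirement is
    assumed to override the configured value when present.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Layer {
        User,
        ProjectLocal,
    }
    
    /// Resolve the effective boolean. Project-local values are ignored,
    /// and the result defaults to `false`.
    pub fn resolve_respect_system_proxy(
        layers: &[(Layer, Option<bool>)],
        managed_requirement: Option<bool>,
    ) -> bool {
        let from_config = layers
            .iter()
            .filter(|(layer, _)| *layer != Layer::ProjectLocal)
            .filter_map(|(_, value)| *value)
            .last();
        // Assumption: a managed requirement, when set, takes precedence.
        managed_requirement.or(from_config).unwrap_or(false)
    }
    ```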
    
    ## End-user behavior
    
    - No networking behavior changes when the feature is absent or disabled.
    - Enabling the feature makes the boolean available to the native
    proxy-routing implementation in follow-up PRs.
    - Repository-local configuration cannot enable the feature.
    
    ## Test coverage
    
    Covers scalar configuration and CLI override resolution, managed
    requirement constraints, bootstrap resolution, and project-local
    filtering.
  • [codex] [4/4] Simplify recommended plugin install schema (#28403)
    ## Summary
    - Simplify recommendation-context `request_plugin_install` arguments to
    `plugin_id` and `suggest_reason`.
    - Derive plugin type and install action from the matched candidate while
    preserving Codex-owned elicitation metadata.
    - Keep the legacy list-backed schema unchanged and accept resumed calls
    that still use `tool_id`.
    
    ## Stack
    - #28399
    - #28400
    - #27704
    - This PR
    
    ## Validation
    - `just test -p codex-tools -p codex-core request_plugin_install` (25
    passed)
    - `just fix -p codex-tools -p codex-core`
    - `just fmt`
    - `git diff --check`
  • core: render remote environment cwd natively (#28152)
    ## Why
    
    Model-visible `<environment_context>` should match the environment of
    the executor, not of the app server.
    
    Stacked on #28146.
    
    ## What
    
    - Keep selected environment cwd values as `PathUri` while building
    environment context.
    - Render cwd text using the path convention represented by the URI, with
    the canonical URI as a fallback.
    - Preserve compatibility with legacy `TurnContextItem.cwd` values when
    reconstructing and diffing context.
    - Extend the Wine-backed remote Windows test to assert that the model
    sees `powershell` and `C:\windows`.
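    
    Rendering by path convention with a URI fallback might look like the
    sketch below. It assumes cwd values arrive as `file://` URIs and that a
    leading `/C:` marks a Windows drive path; `render_cwd` is a hypothetical
    helper, percent-decoding is omitted, and the real `PathUri` type may
    work differently.
    
    ```rust
    /// Render a cwd URI as text in the convention the URI represents.
    pub fn render_cwd(uri: &str) -> String {
        let Some(path) = uri.strip_prefix("file://") else {
            // Not a file URI: fall back to the URI itself.
            return uri.to_string();
        };
        let bytes = path.as_bytes();
        // "/C:/..." is treated as a Windows drive path.
        let is_windows = bytes.len() >= 3
            && bytes[0] == b'/'
            && bytes[1].is_ascii_alphabetic()
            && bytes[2] == b':';
        if is_windows {
            // Drop the leading slash and switch to backslashes.
            path[1..].replace('/', "\\")
        } else if path.starts_with('/') {
            path.to_string()
        } else {
            // Unrecognized layout (for example a host component): keep the URI.
            uri.to_string()
        }
    }
    ```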
  • [codex] [3/4] Activate endpoint plugin recommendations (#27704)
    Summary
    - Await endpoint recommendation selection while constructing each
    authenticated turn, removing the first-turn cache race.
    - Snapshot and filter endpoint candidates once per turn, then use that
    same set for the bounded contextual user fragment, tool exposure, and
    exact install validation.
    - Keep recommendation selection ephemeral: do not persist
    recommendation state in or gate resumed threads on prior context.
    - Hide the legacy list tool in endpoint mode and preserve legacy
    discovery unchanged when the endpoint is disabled or unavailable.
    - Keep remote plugin and connector app identities out of model-visible
    context and attach them only to Codex-owned elicitation metadata.
    
    Stack
    - 3/4, based on #28400.
    - Endpoint client and cache: #28399.
    - Generalized suggestion presentation: #28400.
    - Install-schema follow-up: #28403.
    
    Validation
    - Full : 2,649 passed and 88 environment-dependent tests failed because
    this sandbox cannot write , nest Seatbelt, or locate auxiliary test
    binaries.
  • [codex] [2/4] Generalize plugin suggestion presentation (#28400)
    Summary
    - Add list-backed and developer-context presentations for plugin
    suggestion candidates.
    - Let tool planning, install validation, and request-tool copy follow
    the selected presentation.
    - Keep every production caller on the existing list-backed presentation,
    preserving the current list tool, request schema, connector behavior,
    and model-visible copy.
    - Leave developer-context presentation latent until the final PR in the
    stack.
    
    Stack
    - 2/3, based on #28399.
    - Follow-up: #27704 activates endpoint recommendations.
    
    Validation
    - `just test -p codex-core request_plugin_install`
    - `just test -p codex-core spec_plan`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • [codex] [1/4] Add recommended plugin endpoint cache (#28399)
    Summary
    - Add authenticated parsing for `/ps/plugins/suggested?scope=GLOBAL`,
    including remote plugin and connector app identities.
    - Validate, deduplicate, sort, and cap endpoint candidates before
    caching them by backend and account identity.
    - Deduplicate concurrent cache misses and warm recommendations from the
    existing remote-installed-plugin refresh path used at startup and after
    account changes.
    - Keep endpoint results model-invisible in this PR; failures and
    responses without `enabled: true` resolve to legacy mode.
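    
    The validate, deduplicate, sort, and cap steps can be sketched as one
    pass over the raw candidates. This is illustrative only: candidates are
    reduced to plain id strings, the validation rule (non-empty after
    trimming) and lexicographic ordering are assumptions, and
    `prepare_candidates` is a hypothetical name.
    
    ```rust
    use std::collections::BTreeSet;
    
    /// Validate, deduplicate, sort, and cap candidate ids before caching.
    pub fn prepare_candidates(raw: Vec<String>, cap: usize) -> Vec<String> {
        // A BTreeSet both removes duplicates and keeps ids sorted.
        let unique: BTreeSet<String> = raw
            .into_iter()
            .map(|id| id.trim().to_string())
            .filter(|id| !id.is_empty()) // stand-in for real validation
            .collect();
        unique.into_iter().take(cap).collect()
    }
    ```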
    
    Stack
    - 1/3. Follow-up: #28400 generalizes plugin suggestion presentation
    without activating endpoint recommendations.
    - Final activation: #27704.
    
    Validation
    - `just test -p codex-core-plugins recommended_plugins`
    - `just fix -p codex-core-plugins`
    - `just fmt`
    - `git diff --check`
  • Tell codex about PathUri serde compat. (#28595)
    This addresses another wrinkle I keep having to re-prompt codex about
    when migrating to cross-OS paths.
  • app-server: preserve target-native environment cwd (#28146)
    ## Why
    
    app-server may run on a different OS from the selected exec-server
    environment. Parsing that environment’s cwd with the Codex host’s path
    rules prevents thread startup.
    
    ## What
    
    Carry environment cwd values as `LegacyAppPathString` at the app-server
    boundary and `PathUri` internally. Existing tool-call schemas and
    relative-path behavior stay host-native; remaining local-only consumers
    convert explicitly and leave follow-up TODOs.
    
    The Wine integration test verifies app-server can start a thread and
    complete an ordinary turn with a Windows environment cwd from Linux.
    
    ## Validation
    
    - `bazel test //codex-rs/core/tests/remote_env_windows:smoke-test
    --test_output=errors`
    - focused app-server environment-selection and protocol schema tests
    - scoped Clippy for `codex-core` and `codex-app-server-protocol`
  • Record invariants for path migration. (#28589)
    ## Why
    
    Help Codex understand how to execute the migration to support cross-OS
    paths.
    
    ## What
    
    Expand the path-types skill with our goals and constraints.
  • Clarify model-generated and legacy app path types (#28577)
    ## Why
    
    `ApiPathString` kind of implies that it can be used anywhere we pull a
    path out of JSON, but it's not really appropriate for tool arguments
    when the model might generate relative paths.
    
    Prefer `String` for model-generated paths and we can handle the
    conversion per feature for now and define a shared abstraction later if
    it makes sense.
    
    ## What
    
    Rename `ApiPathString` to `AppLegacyPathString` to clarify its role.
    
    Expand the `path-types` skill to tell the model to leave tool args as
    bare strings.