1446 Commits

  • [codex] Use model metadata for skills usage instructions (#29740)
    ## Summary
    
    - add a false-by-default `include_skills_usage_instructions` model
    metadata field
    - enable the field for the bundled `gpt-5.5` model metadata
    - consume the metadata in both core and extension skill rendering
    - remove hardcoded legacy-model matching and its marker plumbing
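    
    A minimal sketch of the metadata-driven check. It assumes a simplified `ModelMetadata` struct and a hypothetical `render_skills_section` helper; neither is the real Codex type. The flag defaults to `false` through `Default`, and rendering consults only the flag rather than matching model names:
    
    ```rust
    /// Hypothetical subset of bundled model metadata.
    #[derive(Debug, Clone, Default)]
    pub struct ModelMetadata {
        pub slug: String,
        /// `false` via `Default`, so a model must opt in explicitly.
        pub include_skills_usage_instructions: bool,
    }
    
    /// Placeholder text; the real instructions are not reproduced here.
    const SKILLS_USAGE_INSTRUCTIONS: &str = "\n(skills usage instructions)\n";
    
    /// Renders a skills section and appends usage instructions only when the
    /// model metadata asks for them, instead of matching on model names.
    pub fn render_skills_section(model: &ModelMetadata, skill_names: &[&str]) -> String {
        let mut out = String::from("## Skills\n");
        for name in skill_names {
            out.push_str("- ");
            out.push_str(name);
            out.push('\n');
        }
        if model.include_skills_usage_instructions {
            out.push_str(SKILLS_USAGE_INSTRUCTIONS);
        }
        out
    }
    ```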
  • [codex] Enable remote plugins by default (#30297)
    ## Summary
    
    - enable the remote plugin feature by default
    - promote the remote plugin feature from under development to stable
    - preserve the existing `features.remote_plugin` override for explicitly
    disabling it
    - keep legacy disabled-path coverage explicit in TUI and app-server
    tests
    
    ## Impact
    
    Remote plugin functionality is enabled by default for configurations
    that do not set the feature flag. The existing Codex backend
    authentication gate still applies.
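    
    A sketch of the resulting resolution. It assumes the `features.remote_plugin` value surfaces as an `Option<bool>`, where unset is `None`, and that the backend authentication gate is a plain boolean. `remote_plugins_enabled` is a hypothetical name:
    
    ```rust
    /// Stable feature: on when unset, but an explicit `false` still disables it.
    /// The existing Codex backend authentication gate applies either way.
    pub fn remote_plugins_enabled(configured: Option<bool>, has_codex_backend_auth: bool) -> bool {
        let feature_on = configured.unwrap_or(true);
        feature_on && has_codex_backend_auth
    }
    ```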
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-features`
    - `just test -p codex-tui
    plugins_popup_remote_section_fallback_states_snapshot`
    - targeted `codex-app-server` plugin-list and skills-list tests
    - `git diff --check`
    
    The full TUI and app-server suites were also exercised locally. All
    remote-plugin-related coverage passed; unrelated local
    sandbox/test-binary failures remain outside this change.
  • [app-server] increase currentTime/read timeout (#30384)
    ## Summary
    
    Increase the external currentTime/read request timeout from 5 seconds to
    10 seconds.
    
    ## Validation
    
    - just fmt
    - Focused app-server test build was stopped to defer validation to CI.
  • [app-server] expose environment info RPC (#30291)
    ## Why
    
    App-server clients that configure named execution environments need to
    discover an environment's shell and working directory before selecting
    it for a thread or turn. Because the environment can run on a different
    operating system than app-server, its working directory is represented
    as a canonical `file:` URI rather than a host-local path string. The
    probe also needs a bounded response time: an exec-server that completes
    initialization but never answers `environment/info` must not hold the
    environment serialization queue indefinitely.
    
    ## What changed
    
    - Add an experimental `environment/info` app-server RPC for named
    environments.
    - Route the probe through the managed environment connection and return
    target-native shell metadata plus the default working directory as a
    `PathUri`.
    - Return connection and protocol failures as JSON-RPC errors.
    - Bound the exec-server probe response to 30 seconds and remove
    timed-out calls from the pending-request table so later environment
    mutations can proceed.
    - Cover successful responses, omitted working directories, unknown
    environments, connection failures, and pending-call cleanup.
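    
    A single-threaded sketch of the bounded probe and pending-table cleanup. It uses `std::sync::mpsc` in place of the real exec-server transport. `PendingRequests` and `wait_bounded` are hypothetical names; in a real client the transport's reader would call `complete` from another task, with the table behind a lock:
    
    ```rust
    use std::collections::HashMap;
    use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
    use std::time::Duration;
    
    /// Hypothetical pending-request table keyed by JSON-RPC request id.
    #[derive(Default)]
    pub struct PendingRequests {
        next_id: u64,
        pending: HashMap<u64, Sender<String>>,
    }
    
    impl PendingRequests {
        /// Registers an outbound request; the response arrives on the receiver.
        pub fn register(&mut self) -> (u64, Receiver<String>) {
            let id = self.next_id;
            self.next_id += 1;
            let (tx, rx) = mpsc::channel();
            self.pending.insert(id, tx);
            (id, rx)
        }
    
        /// Delivers a response if the request is still pending.
        pub fn complete(&mut self, id: u64, response: String) -> bool {
            match self.pending.remove(&id) {
                Some(tx) => tx.send(response).is_ok(),
                None => false,
            }
        }
    
        pub fn pending_count(&self) -> usize {
            self.pending.len()
        }
    }
    
    /// Waits at most `limit`. On timeout the entry is removed, so a hung
    /// `environment/info` call does not stay in the table.
    pub fn wait_bounded(
        table: &mut PendingRequests,
        id: u64,
        rx: &Receiver<String>,
        limit: Duration,
    ) -> Result<String, String> {
        match rx.recv_timeout(limit) {
            Ok(response) => Ok(response),
            Err(RecvTimeoutError::Timeout) => {
                table.pending.remove(&id);
                Err(format!(
                    "timed out waiting for exec-server `environment/info` response after {}s",
                    limit.as_secs()
                ))
            }
            Err(RecvTimeoutError::Disconnected) => Err("exec-server connection closed".to_string()),
        }
    }
    ```
    
    With the limit described here, a caller would pass `Duration::from_secs(30)`.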
    
    ## Protocol examples
    
    Request:
    
    ```json
    {
      "id": 42,
      "method": "environment/info",
      "params": {
        "environmentId": "remote-a"
      }
    }
    ```
    
    Successful response:
    
    ```json
    {
      "id": 42,
      "result": {
        "shell": {
          "name": "zsh",
          "path": "/bin/zsh"
        },
        "cwd": "file:///workspace"
      }
    }
    ```
    
    If the exec-server initializes but does not answer the probe within 30
    seconds:
    
    ```json
    {
      "id": 42,
      "error": {
        "code": -32603,
        "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
      }
    }
    ```
    
    ## Testing
    
    - App-server integration coverage for successful info (including omitted
    `cwd`), unknown environments, and connection failures.
    - Exec-server RPC coverage verifying a timed-out call is removed from
    the pending-request table.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • app-server: structure and test JSON shutdown logs (#30314)
    ## Why
    
    `LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the
    behavior was only covered indirectly. We should verify the actual JSONL
    written by both user-facing entry points: `codex app-server` and the
    standalone `codex-app-server` binary.
    
    The existing processor shutdown message also always said the channel
    closed, even though the processor can exit for several different
    reasons. Structured fields make that event more accurate and useful to
    log consumers.
    
    ## What changed
    
    - Record the processor `exit_reason`, remaining connection count, and
    forced-shutdown state as structured tracing fields.
    - Add a shared process-test helper that enables JSON logging, validates
    every stderr line as JSON, and verifies the top-level timestamp is RFC
    3339.
    - Cover both `codex app-server` and `codex-app-server`, asserting the
    stable `level`, `fields`, and `target` payload.
    
    ## Test plan
    
    - `just test -p codex-app-server
    standalone_app_server_emits_json_info_events`
    - `just test -p codex-cli app_server_emits_json_info_events`
  • [codex] Support npm marketplace plugin sources (#29375)
    ## Why
    
    Marketplace source deserialization treated `{"source":"npm", ...}` as
    unsupported. The loader logged and skipped the entry, so npm-backed
    plugins never appeared in `plugin list --available` and `plugin add`
    returned "plugin not found".
    
    Codex plugins are installed from a plugin root, not from an npm
    dependency tree. For npm-backed marketplace entries, Codex should fetch
    the published package contents without running package scripts or
    installing unrelated dependencies.
    
    ## What changed
    
    - Add `npm` marketplace plugin sources with `package`, optional semver
    `version` or version range, and optional HTTPS `registry`.
    - Reject unsafe npm source fields before materialization, including
    invalid package names, non-semver version selectors, plaintext or
    credential-bearing registry URLs, and registry query/fragment data.
    - Materialize npm plugins with `npm pack --ignore-scripts`, then unpack
    the resulting tarball through the existing hardened plugin bundle
    extractor.
    - Enforce npm archive and extracted-size limits, require the standard
    npm `package/` archive root, and verify the extracted `package.json`
    name matches the requested package before installing.
    - Keep plugin listings, install-source descriptions, CLI JSON/human
    output, app-server v2 `PluginSource`, TUI source summaries, regenerated
    schema fixtures, and app-server documentation in sync.
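    
    A sketch of two of the checks listed above, using plain string handling instead of a URL parser. The first validates the registry URL: HTTPS only, no userinfo, no query or fragment. The second maps tarball entries onto the plugin root through the `package/` prefix. Both function names are hypothetical. Package-name, semver, and size-limit checks are omitted:
    
    ```rust
    /// Accepts only `https://` registries without credentials, query, or fragment.
    pub fn validate_registry(url: &str) -> Result<(), String> {
        let rest = url
            .strip_prefix("https://")
            .ok_or_else(|| format!("registry must use https: {url}"))?;
        if rest.contains('?') || rest.contains('#') {
            return Err("registry must not include a query or fragment".to_string());
        }
        // The authority ends at the first '/'; an '@' inside it means userinfo.
        let authority = rest.split('/').next().unwrap_or("");
        if authority.is_empty() {
            return Err("registry is missing a host".to_string());
        }
        if authority.contains('@') {
            return Err("registry must not embed credentials".to_string());
        }
        Ok(())
    }
    
    /// Maps a tarball entry onto the plugin root, requiring the standard
    /// `package/` archive root; anything outside it is rejected.
    pub fn strip_package_root(entry: &str) -> Option<&str> {
        entry.strip_prefix("package/").filter(|rest| !rest.is_empty())
    }
    ```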
    
    ## Impact
    
    Marketplaces can distribute Codex plugins from public or configured
    private HTTPS npm registries using the same install flow as existing
    materialized plugin sources. `npm` must be available on `PATH` when an
    npm-backed plugin is installed.
    
    Fixes #27831
    
    ## Validation
    
    - `just write-app-server-schema`
    - `just test -p codex-core-plugins -p codex-app-server-protocol -p
    codex-app-server -p codex-cli`
      - npm/schema/core-plugin coverage passed in the run.
    - The full focused command finished with `1739 passed`, `11 failed`, and
    `6 timed out`; the failures were unrelated local app-server environment
    failures from `sandbox-exec: sandbox_apply: Operation not permitted`
    plus one missing `test_stdio_server` helper binary.
    - Installed an npm-published Codex plugin package through a throwaway
    local marketplace and throwaway `CODEX_HOME` to exercise the real npm
    materialization path end to end.
  • feat(app-server): add optional turn_id to thread/fork (#30277)
    ## Description
    
    This adds stable optional `turnId` support to `thread/fork`. When
    supplied, the fork copies persisted history through that terminal turn,
    inclusive, and drops later turns from the new thread.
    
    Omitting or passing `null` preserves the existing full-history fork
    behavior, including the interruption marker when the stored source
    history ends mid-turn.
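    
    A sketch of the truncation rule. It assumes turns are an ordered slice with string IDs. `Turn` and `fork_history` are hypothetical, and the interruption marker for mid-turn histories is not modeled. Returning an error for an unknown `turnId` is also an assumption:
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    pub struct Turn {
        pub id: String,
        pub items: Vec<String>,
    }
    
    /// `None` copies the full history; `Some(id)` copies through that turn, inclusive.
    pub fn fork_history(turns: &[Turn], turn_id: Option<&str>) -> Result<Vec<Turn>, String> {
        match turn_id {
            None => Ok(turns.to_vec()),
            Some(id) => {
                let idx = turns
                    .iter()
                    .position(|t| t.id == id)
                    .ok_or_else(|| format!("turn `{id}` not found"))?;
                // `..=idx` keeps the named turn and drops everything after it.
                Ok(turns[..=idx].to_vec())
            }
        }
    }
    ```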
    
    ## Why
    
    We're deprecating `thread/rollback`, and this lets certain UX flows
    replace it with `thread/fork` plus `turnId`.
  • [codex] allow AGENTS.md and skills to authorize delegation (#30274)
    Updates the MAv2 prompt to mention AGENTS.md and skills more
    explicitly.
    
    This should mirror https://github.com/openai/codex/pull/27919.
  • [codex] Add managed new-thread model settings (#29683)
    ## Why
    
    Admins need persistent defaults for the model, reasoning effort, and
    service tier shown when the Desktop App creates a new thread. These are
    initialization defaults rather than runtime constraints: the App should
    use them to initialize its draft while still allowing a user to make an
    explicit selection.
    
    The app-server therefore needs to expose the managed values before
    thread creation without changing `thread/start` behavior for other
    clients.
    
    ## What changed
    
    - Parse `model`, `model_reasoning_effort`, and `service_tier` from
    `[models.new_thread]` in `requirements.toml`.
    - Compose the `models` requirements through the existing
    requirements-layer precedence rules.
    - Expose the resolved values through `configRequirements/read` as
    `requirements.models.newThread`.
    - Add the corresponding app-server protocol types and regenerate the
    JSON and TypeScript schema fixtures.
    - Document the new `configRequirements/read` fields in the app-server
    README.
    
    ## Scope
    
    This PR is data plumbing only. It does not apply these values during
    `thread/start` and does not change thread creation for existing
    app-server clients, resumed or forked sessions, internal or subagent
    sessions, `codex exec`, or the TUI. A companion Desktop App change owns
    draft initialization, sends the effective settings for ordinary and
    prewarmed starts, and preserves explicit user changes.
    
    ## Validation
    
    - Requirements deserialization coverage for `[models.new_thread]`
    - Requirements-layer precedence coverage
    - App-server API mapping coverage
    - `configRequirements/read` integration coverage
    - Regenerated app-server JSON and TypeScript schema fixtures
  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
    This will be stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N+ is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
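    
    A sketch of the fail-closed rule. It assumes the stored value is a string and that a missing field, as in older `SessionMeta`, means `legacy`. `ThreadOp`, `mode_from_meta`, and `check_supported` are hypothetical; only the two mode names come from this PR:
    
    ```rust
    use std::str::FromStr;
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub enum ThreadHistoryMode {
        #[default]
        Legacy,
        Paginated,
    }
    
    impl FromStr for ThreadHistoryMode {
        type Err = String;
    
        fn from_str(s: &str) -> Result<Self, Self::Err> {
            match s {
                "legacy" => Ok(Self::Legacy),
                "paginated" => Ok(Self::Paginated),
                // Fail closed: an unknown or future value is never read as legacy.
                other => Err(format!("unsupported historyMode `{other}`")),
            }
        }
    }
    
    /// An absent field means legacy; a present but unknown value is an error.
    pub fn mode_from_meta(raw: Option<&str>) -> Result<ThreadHistoryMode, String> {
        match raw {
            None => Ok(ThreadHistoryMode::default()),
            Some(value) => value.parse(),
        }
    }
    
    /// Hypothetical operation kinds for the gate below.
    #[derive(Debug, Clone, Copy)]
    pub enum ThreadOp {
        List,          // thread/list
        ReadMetadata,  // thread/read(includeTurns=false)
        ReadWithTurns, // thread/read(includeTurns=true)
        Resume,        // thread/resume
    }
    
    /// Metadata-only paths stay open for paginated threads; full-history paths fail closed.
    pub fn check_supported(mode: ThreadHistoryMode, op: ThreadOp) -> Result<(), String> {
        match (mode, op) {
            (ThreadHistoryMode::Legacy, _) => Ok(()),
            (ThreadHistoryMode::Paginated, ThreadOp::List | ThreadOp::ReadMetadata) => Ok(()),
            (ThreadHistoryMode::Paginated, ThreadOp::ReadWithTurns | ThreadOp::Resume) => {
                Err(format!("{op:?} is not supported for paginated threads"))
            }
        }
    }
    ```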
  • Test selected capabilities across unavailable resume (#30215)
    ## Why
    
    The selected-capability integration test already covers initial
    attachment and cold resume, but it resumes while the selected executor
    is still reachable.
    
    That leaves an important World State transition untested: a thread
    remembers its selected capability root, resumes while that environment
    is unavailable, and later sees the same stable environment return.
    
    ## What this tests
    
    This extends the existing end-to-end scenario:
    
    ```text
    selected executor available
            ↓
    app-server stops and the executor goes away
            ↓
    thread resumes with the executor unavailable
            ↓
    skills, selected MCP tools, and connector attribution are absent
            ↓
    the same environment ID is attached again
            ↓
    skills, MCP tools, and connector attribution return
    ```
    
    The test also checks that the unavailable snapshot explicitly tells the
    model that no selected-environment skills are currently available. After
    reattachment, it invokes the selected skill again and verifies that a
    new executor-owned MCP process starts.
    
    ## Scope
    
    This is test-only. It keeps the existing assumption that an environment
    ID refers to stable capability contents. It does not add package-file
    invalidation or live transport reconnect behavior.
  • Test selected capabilities across availability and resume (#30157)
    ## Why
    
    This stack crosses World State, executor skills, selected plugin
    metadata, MCP processes, connectors, dynamic environments, and resume.
    This PR adds two end-to-end scenarios that validate those pieces
    together.
    
    Both tests enable `deferred_executor`, so they exercise the real
    delayed-environment path.
    
    ## Scenario 1: availability across turns and resume
    
    ```text
    1. Start a thread with one selected plugin root bound to E1.
    2. E1 is unavailable.
       - executor skill is absent
       - selected MCP is absent
       - connector has no selected-plugin attribution
    3. Start E1 and register the same stable environment ID.
    4. Start a new turn.
       - the executor skill appears through World State
       - its body beats a colliding host skill
       - the selected MCP tool is advertised and executes inside E1
       - the connector is attributed to the selected plugin
    5. Start another turn without changing E1.
       - the MCP PID stays the same, proving runtime reuse
    6. Restart app-server and resume the thread.
       - durable selected-root intent is restored
       - skills, MCP, and connector attribution are restored
       - a new MCP PID proves ephemeral process state was rebuilt
    ```
    
    ## Scenario 2: availability changes inside one turn
    
    ```text
    1. Start a turn while E1 is unavailable.
    2. The first model sample sees no executor skill, MCP, or selected connector.
    3. The turn pauses on request_user_input.
    4. Start E1 and register it while that same turn is still active.
    5. Continue the turn.
    6. The very next model sample sees:
       - the executor skill catalog
       - the selected MCP tool
       - selected-plugin connector attribution
    7. The model calls the MCP, and its output proves execution happened inside E1.
    ```
    
    This second scenario specifically protects the aeon-style behavior:
    capability state is captured again for every sampling step, not only at
    the next user turn.
    
    ## Scope
    
    These are integration tests only. They do not add a combinatorial matrix
    for unsupported plugin-file mutation, environment generations, transport
    disconnects, or delayed `required = true` executor MCPs.
  • Expose MCP app identity in app context (#29934)
    ## Why
    
    MCP tool-call events need to expose trusted app identity and action
    metadata directly so v2 clients do not have to infer it from tool names
    or resource URIs.
    
    ## What changed
    
    - Add optional `appName`, `templateId`, and `actionName` fields to MCP
    tool-call `appContext`.
    - Populate `appName` and `templateId` from trusted Codex Apps metadata,
    and derive `actionName` from the trusted app resource metadata.
    - Preserve all three fields through core events, legacy protocol events,
    persisted thread history, resume redaction, and app-server v2 responses.
    - Document the public `appContext` fields in
    `codex-rs/app-server/README.md`.
    - Regenerate app-server JSON and TypeScript schemas and add coverage for
    serialization, persistence, redaction, and metadata propagation.
    
    ## Validation
    
    - `just test -p codex-app-server-protocol mcp_tool_call`
    - `just test -p codex-core
    mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
    mcp_tool_call_item_includes_app_identity`
    - `just write-app-server-schema`
    
    ---------
    
    Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>
  • Keep MCP elicitation routable across runtime refreshes (#30127)
    ## Why
    
    An MCP tool call can still be waiting for an elicitation response when
    an environment update replaces the thread's MCP runtime.
    
    Before this change:
    
    ```text
    runtime A starts a tool call and asks the user
    environment becomes ready, so runtime B is published
    client answers the prompt through runtime B
    runtime B cannot find runtime A's pending responder
    ```
    
    The response is lost and the original tool call stays blocked.
    
    ## What changed
    
    All MCP runtimes for one thread now share a small elicitation router:
    
    ```text
    runtime A ---\
                   shared router: response token -> exact pending responder
    runtime B ---/
    ```
    
    When Codex surfaces an MCP elicitation, it assigns a unique opaque
    response token. The router records which pending request owns that
    token. A replacement runtime reuses the same router, so the latest
    runtime can deliver a response to a request started by the previous
    runtime.
    
    The Codex-owned token also prevents two runtime connections that reuse
    the same MCP server request ID from receiving each other's responses.
    
    This does not retain or search old MCP managers. Only the pending
    responder map is shared.
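    
    A sketch of the shared router, using `std` channels as stand-ins for pending responders. `ElicitationRouter`, `McpRuntime`, and the `elicit-N` token format are hypothetical. The point it illustrates is that tokens are minted by the shared router, not taken from the MCP server's request ID:
    
    ```rust
    use std::collections::HashMap;
    use std::sync::atomic::{AtomicU64, Ordering};
    use std::sync::mpsc::{self, Receiver, Sender};
    use std::sync::{Arc, Mutex};
    
    /// Hypothetical thread-scoped router: token -> exact pending responder.
    #[derive(Default)]
    pub struct ElicitationRouter {
        next_token: AtomicU64,
        pending: Mutex<HashMap<String, Sender<String>>>,
    }
    
    impl ElicitationRouter {
        /// Called when any runtime surfaces an elicitation. Because the token is
        /// minted here, two runtimes reusing one MCP request id get distinct tokens.
        pub fn register(&self) -> (String, Receiver<String>) {
            let n = self.next_token.fetch_add(1, Ordering::Relaxed);
            let token = format!("elicit-{n}");
            let (tx, rx) = mpsc::channel();
            self.pending.lock().unwrap().insert(token.clone(), tx);
            (token, rx)
        }
    
        /// Routes a client answer to whichever request owns the token.
        pub fn respond(&self, token: &str, answer: String) -> bool {
            let responder = self.pending.lock().unwrap().remove(token);
            match responder {
                Some(tx) => tx.send(answer).is_ok(),
                None => false,
            }
        }
    }
    
    /// Hypothetical runtime handle. A replacement runtime is built with a clone
    /// of the same `Arc<ElicitationRouter>`, not a fresh router.
    pub struct McpRuntime {
        pub name: &'static str,
        pub router: Arc<ElicitationRouter>,
    }
    ```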
    
    ## Covered scenario
    
    The integration test exercises the complete failure mode:
    
    1. A thread starts while its selected environment is still unavailable.
    2. A configured MCP server starts a tool call and asks the client for
    input.
    3. The environment becomes ready, causing Codex to publish a replacement
    MCP runtime.
    4. The client answers the original prompt after the replacement.
    5. The original tool call receives that answer and completes.
    
    A focused routing test also creates two runtimes with the same server
    request ID and verifies that each response reaches the exact request
    that emitted its token.
    
    ## Scope
    
    This PR changes only elicitation response routing across MCP runtime
    replacement. It does not change when runtimes are rebuilt, which
    environments contribute MCP configuration, or how environment
    availability is detected.
  • [codex] Attribute app-server analytics by thread originator (#29935)
    ## Why
    
    Desktop Work threads and regular Codex threads can share the same
    app-server connection. App-server analytics currently copy
    `product_client_id` from connection metadata for every thread-scoped
    event, so Work thread activity is attributed to the Desktop connection
    instead of the thread's resolved originator. This prevents analytics
    from distinguishing the two products on a shared connection.
    
    ## What changed
    
    - Publish the resolved originator after a thread is materialized,
    covering new, resumed, forked, and subagent threads.
    - Store that originator in the analytics reducer's existing per-thread
    state.
    - Override only `app_server_client.product_client_id` for thread, turn,
    tool, review, goal, guardian, and compaction events while preserving the
    connection's client name, version, and transport metadata.
    - Fall back to the connection-wide product client ID when a thread has
    no originator override.
    - Preserve persisted originators in thread initialization analytics for
    resume and fork flows.
    
    ## Validation
    
    - `just test -p codex-analytics
    thread_originator_overrides_shared_connection_across_thread_events
    subagent_events_keep_thread_originator_with_explicit_turn_connection`
    - `just test -p codex-app-server
    turn_start_tracks_thread_originator_in_analytics
    thread_start_tracks_thread_initialized_analytics
    thread_fork_tracks_thread_initialized_analytics
    thread_resume_tracks_thread_initialized_analytics`
    - `just test -p codex-core thread_manager`
  • Project selected plugin runtime by environment availability (#30093)
    ## Why
    
    Selected plugin metadata is stable, but MCP processes are live runtime
    state. They need different lifetimes:
    
    - the MCP extension caches manifest, MCP, and connector declarations for
    each stable selected root;
    - each model step projects that cached metadata through the roots that
    resolved as ready for that exact step;
    - the MCP manager is rebuilt only when that availability projection
    changes.
    
    This matches executor skills: both features consume the same resolved
    step roots instead of inferring readiness from the turn's selected
    environments.
    
    ## Behavior
    
    ```text
    E1 not ready for this step
      -> no E1 MCP servers or connectors
      -> cached plugin metadata stays in ext/mcp
    
    E1 becomes ready
      -> reuse cached metadata
      -> publish one MCP runtime containing E1 capabilities
    
    same ready roots on the next step
      -> reuse the exact runtime; no rediscovery and no MCP restart
    
    resume
      -> create new extension thread state and a new MCP runtime
    ```
    
    All model-facing consumers use the same step snapshot:
    
    ```text
    resolved selected roots
            |
            v
    extension MCP/connector projection
            |
            v
    { MCP config, connector snapshot, MCP manager }
            |
            +-> advertise model tools
            +-> build app/connector tools
            +-> execute MCP calls
    ```
    
    ## Cache contract
    
    The existing MCP extension owns a cache keyed by the full
    `SelectedCapabilityRoot`:
    
    ```rust
    let state = thread_store.get_or_init(SelectedExecutorPluginMcpState::default);
    ```
    
    The cache lives with extension thread state. Environment availability
    filters projection but does not invalidate metadata. Resume creates new
    thread state. There is no file watcher or executor generation because
    contents behind a stable environment/root are assumed stable.
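    
    A sketch of the cache and projection split. Roots are plain strings and the runtime is reduced to a list of server names. `SelectedRootCache` and `runtime_for_step` are hypothetical and do not mirror `SelectedExecutorPluginMcpState`; connector projection is omitted:
    
    ```rust
    use std::collections::{BTreeMap, BTreeSet};
    use std::sync::Arc;
    
    /// Hypothetical cached declarations for one selected root.
    #[derive(Debug, Clone)]
    pub struct RootMetadata {
        pub mcp_servers: Vec<String>,
    }
    
    /// Hypothetical live runtime, reduced to the servers it would start.
    #[derive(Debug)]
    pub struct McpRuntime {
        pub servers: Vec<String>,
    }
    
    /// Per-thread state: metadata cached per root, plus the runtime built
    /// for the last ready-root set.
    #[derive(Default)]
    pub struct SelectedRootCache {
        metadata: BTreeMap<String, RootMetadata>,
        current: Option<(BTreeSet<String>, Arc<McpRuntime>)>,
    }
    
    impl SelectedRootCache {
        pub fn cache_metadata(&mut self, root: &str, meta: RootMetadata) {
            self.metadata.insert(root.to_string(), meta);
        }
    
        /// Projects cached metadata through this step's ready roots. Metadata is
        /// never invalidated here; the runtime is rebuilt only when the ready
        /// set differs from the previous step's.
        pub fn runtime_for_step(&mut self, ready_roots: &BTreeSet<String>) -> Arc<McpRuntime> {
            if let Some((roots, runtime)) = &self.current {
                if roots == ready_roots {
                    return Arc::clone(runtime);
                }
            }
            let servers = ready_roots
                .iter()
                .filter_map(|root| self.metadata.get(root))
                .flat_map(|meta| meta.mcp_servers.iter().cloned())
                .collect();
            let runtime = Arc::new(McpRuntime { servers });
            self.current = Some((ready_roots.clone(), Arc::clone(&runtime)));
            runtime
        }
    }
    ```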
    
    ## What changes
    
    - Keeps executor plugin discovery and cached metadata in `ext/mcp`.
    - Caches MCP and connector declarations together per selected root.
    - Uses the step's already-resolved capability roots, including lazy
    environments that are not turn environments.
    - Reuses the current MCP runtime when the ready-root projection is
    unchanged.
    - Uses the same step MCP manager and connector snapshot for
    model-visible tools and execution.
    - Resolves direct thread-scoped MCP requests from the current
    selected-root projection.
    
    ## Deliberately out of scope
    
    - `app/list` remains based on the latest global host-plugin state; this
    PR does not make its response or notifications thread-specific.
    - `required = true` startup semantics do not apply to delayed executor
    MCP activation.
    - No filesystem/content invalidation.
    - No transport-disconnect watcher.
    - No executor generations or environment replacement semantics.
    - No client sharing across complete manager replacements.
    
    ## Stack
    
    1. Extension-owned World State sections.
    2. Project executor skills through World State.
    3. Pin one MCP runtime to each model step.
    4. **This PR:** project selected MCP and connector state from
    extension-owned metadata.
    5. Integration coverage for selected capability availability and resume.
    
    ## Verification
    
    -
    `selected_plugin_servers_use_managed_requirements_for_the_selected_root_id`
    - The stacked integration PR covers unavailable to ready activation,
    unchanged-runtime reuse, skills, MCP tools, connector attribution, and
    cold resume.
  • Pin MCP runtimes to model steps (#30101)
    ## Why
    
    An MCP refresh can replace the session's current manager while a model
    step is still running. The step must execute calls through the same
    manager whose tools it advertised.
    
    ## Boundary
    
    ```text
    current session MCP runtime
              |
              | capture once for this model step
              v
    StepContext.mcp
      - exact MCP config
      - exact connection manager
      - exact runtime environment context
    ```
    
    ```rust
    pub struct McpRuntimeSnapshot {
        config: Arc<McpConfig>,
        manager: Arc<McpConnectionManager>,
        runtime_context: McpRuntimeContext,
    }
    ```
    
    ## Example
    
    ```text
    step A captures runtime A and advertises A's tools
    refresh publishes runtime B
    step A tool call -> runtime A
    next step        -> runtime B
    ```
    
    Capturing the snapshot is only an `Arc` clone. It does not restart MCPs
    or make an RPC.
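    
    A sketch of capture and publish with `RwLock<Arc<_>>`. It uses a hypothetical `RuntimeStub` in place of `McpRuntimeSnapshot` and a hypothetical `Session` holder. Capture clones the `Arc`, and publish swaps the slot. A step that captured earlier keeps its runtime alive:
    
    ```rust
    use std::sync::{Arc, RwLock};
    
    /// Stand-in for config + connection manager + runtime context.
    #[derive(Debug)]
    pub struct RuntimeStub {
        pub label: &'static str,
    }
    
    /// Session-wide slot holding the current runtime.
    pub struct Session {
        current: RwLock<Arc<RuntimeStub>>,
    }
    
    impl Session {
        pub fn new(initial: RuntimeStub) -> Self {
            Self { current: RwLock::new(Arc::new(initial)) }
        }
    
        /// Captured once per model step: only an `Arc` clone, no restart or RPC.
        pub fn capture(&self) -> Arc<RuntimeStub> {
            Arc::clone(&*self.current.read().unwrap())
        }
    
        /// Publishes a replacement atomically under the write lock.
        pub fn publish(&self, next: RuntimeStub) {
            *self.current.write().unwrap() = Arc::new(next);
        }
    }
    
    /// Hypothetical step context carrying the pinned runtime.
    pub struct StepContext {
        pub mcp: Arc<RuntimeStub>,
    }
    ```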
    
    ## What changes
    
    - Captures one MCP runtime in `StepContext`.
    - Uses it for tool planning, tool calls, resources, approvals, connector
    attribution, and elicitation.
    - Publishes replacement runtimes atomically.
    - Lets an old runtime live only while an in-flight step or request still
    holds its `Arc`.
    
    Most of this diff is mechanical routing from the session-global manager
    to `step_context.mcp`; it does not introduce selected-plugin discovery
    yet.
    
    ## What does not change
    
    - No plugin or extension migration.
    - No new MCP cache policy.
    - No environment file watching.
    - No client sharing between separate managers.
    
    ## Stack
    
    1. Extension-owned World State sections.
    2. Project executor skills through World State.
    3. **This PR:** pin one MCP runtime to each model step.
    4. Project selected MCP/app/connector metadata by environment
    availability.
    5. One end-to-end integration scenario.
  • [codex] Surface MCP reauthentication-required startup failures (#29877)
    ## Summary
    
    - distinguish expired, non-refreshable stored MCP OAuth credentials from
    first-time missing credentials
    - carry a typed `failureReason: "reauthenticationRequired"` on the
    existing `mcpServer/startupStatus/updated` notification only when user
    action is required
    - keep the public MCP auth-status API unchanged and regenerate the
    app-server protocol schemas and documentation
    
    ## Why
    
    An MCP server with an expired access token and no usable refresh token
    currently fails startup without giving clients a reliable, typed
    recovery signal.
    
    The existing startup-status notification is the natural place to carry
    this state. Its nullable `failureReason` keeps the recovery reason
    attached to the failed startup transition without adding a one-off
    notification. Internally, Codex distinguishes first-time login from
    reauthentication and emits the reason only when the startup error itself
    requires authentication.
    
    ## User impact
    
    App clients can prompt an existing user to reconnect an MCP server when
    automatic recovery is impossible by handling a failed
    `mcpServer/startupStatus/updated` notification whose `failureReason` is
    `reauthenticationRequired`. Starting, ready, cancelled, unrelated
    failures, and first-time setup carry no reauthentication reason.
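    
    A sketch of the classification. It assumes a simplified credential enum and a caller that already knows whether the startup error was an authentication failure. Type and function names are hypothetical; only the `reauthenticationRequired` wire value comes from this PR:
    
    ```rust
    /// Hypothetical view of stored MCP OAuth credentials at startup.
    #[derive(Debug)]
    pub enum StoredCredentials {
        /// Nothing stored yet: first-time setup.
        Missing,
        /// Stored token has expired; `refreshable` is false without a usable refresh token.
        Expired { refreshable: bool },
    }
    
    #[derive(Debug, PartialEq)]
    pub enum StartupFailureReason {
        ReauthenticationRequired,
    }
    
    impl StartupFailureReason {
        /// Value carried in the notification's `failureReason` field.
        pub fn as_wire(&self) -> &'static str {
            match self {
                Self::ReauthenticationRequired => "reauthenticationRequired",
            }
        }
    }
    
    /// Returns a reason only when the startup error itself was an auth failure
    /// and automatic recovery is impossible.
    pub fn failure_reason(creds: &StoredCredentials, failed_on_auth: bool) -> Option<StartupFailureReason> {
        match creds {
            StoredCredentials::Expired { refreshable: false } if failed_on_auth => {
                Some(StartupFailureReason::ReauthenticationRequired)
            }
            // First-time setup, refreshable tokens, and unrelated failures carry no reason.
            _ => None,
        }
    }
    ```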
    
    ## Companion app PR
    
    - openai/openai#1069582
    
    ## Validation
    
    - `just test -p codex-app-server-protocol` — 248 passed; schema fixture
    tests passed
    - `cargo check -p codex-app-server -p codex-tui`
    - `just test -p codex-rmcp-client -p codex-mcp` — 184 passed, 2 skipped
    - `just test -p codex-protocol -p codex-app-server-protocol -p
    codex-mcp` — 579 passed
    - `just write-app-server-schema`
    - `just fmt`
  • fix(app-server): suppress TUI rollback warning (#30124)
    ## Why
    
    The TUI uses `thread/rollback` internally for user-facing flows such as
    prompt cancellation/backtracking. After `thread/rollback` was marked
    deprecated, those internal calls started surfacing `deprecationNotice`
    messages in the TUI, even though the user did not explicitly call the
    deprecated app-server API.
    
    The endpoint should remain deprecated for external app-server clients,
    but the built-in `codex-tui` client should not show this
    implementation-detail warning during normal interaction.
    
    ## What changed
    
    - Pass the initialized app-server client name into the `thread/rollback`
    request processor.
    - Suppress the `thread/rollback` deprecation notice only for
    `codex-tui`.
    - Preserve the existing `deprecationNotice` behavior for non-TUI
    clients.
    - Add regression coverage for the `codex-tui` suppression path.
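    
    A sketch of the client-name check. It assumes the name recorded at initialization is available as an `Option<&str>`. The function name is hypothetical; the notice text is the one quoted in the test steps:
    
    ```rust
    /// Client name the built-in TUI reports at initialization.
    const TUI_CLIENT_NAME: &str = "codex-tui";
    const ROLLBACK_DEPRECATION: &str = "thread/rollback is deprecated and will be removed soon";
    
    /// Returns the notice for a `thread/rollback` response, or `None` for the
    /// built-in TUI, which calls the endpoint internally.
    pub fn rollback_deprecation_notice(client_name: Option<&str>) -> Option<&'static str> {
        match client_name {
            Some(TUI_CLIENT_NAME) => None,
            _ => Some(ROLLBACK_DEPRECATION),
        }
    }
    ```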
    
    ## How to Test
    
    1. Start Codex TUI from this branch.
    2. Type text into the composer and press `Esc` to cancel/backtrack.
    3. Confirm the TUI restores/cancels the prompt without showing
    `thread/rollback is deprecated and will be removed soon`.
    4. Also verify an external app-server client that calls
    `thread/rollback` still receives `deprecationNotice`.
    
    Targeted tests:
    
    - `just test -p codex-app-server thread_rollback`
    - `just argument-comment-lint`
  • feat(core, mcp): cache codex_apps tools in memory (#29003)
    ## Description
    
    This makes Codex Apps tool reads use a shared in-memory snapshot instead
    of rereading the disk cache every time `list_all_tools()` runs. Disk
    still seeds the cache on startup and gets updated after successful
    fetches, but it is no longer the live read path.
    
    The core change is that `McpManager` now owns a process-scoped
    `CodexAppsToolsCache`. Codex threads in the same app-server process now
    share this Codex Apps in-memory tools snapshot. The snapshot is keyed by
    the Codex home plus the Codex Apps identity: the active Codex auth
    user/workspace and the effective Codex Apps MCP source config.
    
    There's already code to hard-refresh the cache, so we respect it in this
    PR.
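    A minimal sketch of a process-scoped snapshot keyed as described above (Codex home plus auth user/workspace plus effective source config). `ToolsCache`, `CacheKey`, and `Tool` are hypothetical stand-ins, not the actual `CodexAppsToolsCache` API. Disk seeding, the fetch itself, and disk write-back are left to the caller.
    
    ```rust
    use std::collections::HashMap;
    use std::path::PathBuf;
    use std::sync::{Arc, Mutex};
    
    /// Hypothetical stand-in for a cached MCP tool definition.
    #[derive(Clone, Debug, PartialEq)]
    struct Tool {
        name: String,
    }
    
    /// Snapshot key: Codex home plus the Codex Apps identity.
    #[derive(Clone, Debug, PartialEq, Eq, Hash)]
    struct CacheKey {
        codex_home: PathBuf,
        user_id: String,
        workspace_id: String,
        source_config: String, // e.g. a fingerprint of the effective MCP source config
    }
    
    /// Process-scoped cache shared by every thread in one app-server process.
    #[derive(Default)]
    struct ToolsCache {
        snapshots: Mutex<HashMap<CacheKey, Arc<Vec<Tool>>>>,
    }
    
    impl ToolsCache {
        /// Seed a snapshot, for example from the on-disk cache at startup.
        fn seed(&self, key: CacheKey, tools: Vec<Tool>) {
            self.snapshots.lock().unwrap().insert(key, Arc::new(tools));
        }
    
        /// Hot read path: clones an `Arc`; no disk I/O or JSON deserialization.
        fn current_tools(&self, key: &CacheKey) -> Option<Arc<Vec<Tool>>> {
            self.snapshots.lock().unwrap().get(key).cloned()
        }
    
        /// After a successful fetch (including a hard refresh), replace the
        /// in-memory snapshot; the caller would also update the disk cache.
        fn replace_after_fetch(&self, key: CacheKey, tools: Vec<Tool>) {
            self.seed(key, tools);
        }
    }
    ```
    
    Handing out an `Arc` clone keeps repeated reads cheap; the mutex is held only long enough to look up the key.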
    
    ## Local benchmark
    
    I ran a local steady-state microbenchmark of the exact repeated Codex
    Apps cached-tools read this PR removes, using the same real local cache
    payload in both trees: `3,678,138` bytes and `381` tools. The cache file
    was already warm in the OS page cache, so this measures same-process
    reread/deserialization work rather than cold-disk latency or full turn
    latency. Each run is 25 iterations (mimicking a turn that makes 25
    inference calls).
    
    | Version | Run 1 | Run 2 | Avg |
    |---|---:|---:|---:|
    | `origin/main` disk read + JSON deserialize + `filter_tools` | `50.755 ms` | `52.894 ms` | `51.825 ms` |
    | This branch in-memory `current_tools` + `filter_tools` | `0.740 ms` | `0.778 ms` | `0.759 ms` |
    
    That removes about `51 ms` from each repeated Codex Apps cached-tools
    read on this machine, roughly `68x` faster for that subpath. It is
    useful evidence for the hot path this PR changes, but not a claim that
    every production turn gets `51 ms` faster; end-to-end impact also
    depends on the rest of `list_all_tools()` and tool-payload construction.
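    A quick arithmetic check of the summary figures, using only the averages from the table (the helper names are illustrative):
    
    ```rust
    /// Mean of the two benchmark runs, in milliseconds.
    fn mean(runs: [f64; 2]) -> f64 {
        (runs[0] + runs[1]) / 2.0
    }
    
    /// Returns (milliseconds saved per read, speedup factor).
    fn savings(before_ms: f64, after_ms: f64) -> (f64, f64) {
        (before_ms - after_ms, before_ms / after_ms)
    }
    ```
    
    `savings(mean([50.755, 52.894]), mean([0.740, 0.778]))` gives roughly `(51.07, 68.3)`, matching the ~51 ms and ~68x figures.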
    
    These numbers are from my M2 Max MacBook; on a slower disk the repeated
    reads would cost much more (and we did see them blow up turn runtime on
    a slow disk).
  • [codex] poll external clock during sleep (#30113)
    ## Summary
    
    - make the external app-server time provider establish sleep deadlines
    using `currentTime/read`
    - poll the external clock once per second and complete `clock.sleep`
    when the deadline is reached
    - keep the system-clock timer and existing steer/agent-message
    interruption behavior unchanged
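    The polling behavior can be sketched as follows, using a blocking model for simplicity. `ExternalClock` is a hypothetical stand-in for the `currentTime/read` round trip, and the wait function is injected so the loop can be exercised without real one-second sleeps. Interruption by steer or agent messages is not modeled.
    
    ```rust
    use std::time::Duration;
    
    /// Hypothetical stand-in for an external clock read via `currentTime/read`.
    trait ExternalClock {
        /// Current simulated time, as a duration since some fixed epoch.
        fn current_time(&mut self) -> Duration;
    }
    
    /// Sleeps until the external clock reaches `start + duration`, re-reading
    /// the clock after each `poll_interval` wait. Returns the number of waits.
    fn sleep_on_external_clock<C: ExternalClock>(
        clock: &mut C,
        duration: Duration,
        poll_interval: Duration,
        mut wait: impl FnMut(Duration),
    ) -> u32 {
        // The deadline comes from the external clock, not the wall clock.
        let deadline = clock.current_time() + duration;
        let mut polls = 0;
        while clock.current_time() < deadline {
            wait(poll_interval); // e.g. std::thread::sleep in a real loop
            polls += 1;
        }
        polls
    }
    ```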
    
    ## Why
    
    This lets training control `clock.sleep` through its existing external
    simulated clock without adding separate sleep/wake protocol methods.
    
    ## Testing
    
    - `just fmt`
    - `just test -p codex-app-server
    external_sleep_polls_current_time_and_emits_items`
  • feat: add provider-aware model fallback to thread start (#29942)
    ## Why
    
    Helper threads such as task title generation can request a model ID that
    is valid for the default OpenAI provider but unavailable from the active
    provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
    provider static catalog exposes Bedrock model IDs such as
    `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
    404s and can surface a misleading turn error even when the main turn
    succeeds.
    
    Clients need an explicit way to ask app-server to resolve an unavailable
    helper model to the active provider default. That fallback must remain
    limited to providers with an authoritative static catalog so custom or
    dynamically discovered model IDs are not rewritten based on an
    incomplete catalog.
    
    Fixes #28741.
    
    ## What changed
    
    - Add the experimental `allowProviderModelFallback` option to
    `thread/start`, defaulting to `false` to preserve existing behavior.
    - Thread the option through thread creation and model selection.
    - When enabled for a static model manager, preserve requested models
    present in the catalog and replace unavailable models with the provider
    default.
    - Continue preserving explicit model IDs for dynamic model managers
    without fetching a catalog solely to validate them.
    - Document the new `thread/start` behavior in the app-server API
    overview.
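    The resolution rule can be sketched as below. `ModelCatalog` and `resolve_model` are hypothetical names; the point is that only an authoritative static catalog, combined with the opt-in flag, may rewrite a requested model ID.
    
    ```rust
    /// Hypothetical model-manager kinds.
    enum ModelCatalog<'a> {
        /// Authoritative static catalog: known model IDs plus the provider default.
        Static { models: &'a [&'a str], default_model: &'a str },
        /// Dynamically discovered models; no authoritative list is available.
        Dynamic,
    }
    
    /// Resolves the model for `thread/start`.
    fn resolve_model(requested: &str, allow_fallback: bool, catalog: &ModelCatalog) -> String {
        match catalog {
            // Only a static catalog is trusted enough to replace an unknown ID.
            ModelCatalog::Static { models, default_model }
                if allow_fallback && !models.contains(&requested) =>
            {
                default_model.to_string()
            }
            // Fallback disabled, model present, or dynamic catalog: keep as requested.
            _ => requested.to_string(),
        }
    }
    ```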
    
    ## Test
    Temporary test-client harness:
    ```
    ThreadStartParams {
        model: Some("gpt-5.4-mini".to_string()),
        allow_provider_model_fallback: true,
        ..Default::default()
    }
    ```
    Command:
    ```
    CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
    CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
    ./target/debug/codex-app-server-test-client \
      --codex-bin ./target/debug/codex \
      -c 'model_provider="amazon-bedrock"' \
      send-message-v2 --experimental-api ignored
    ```
    Relevant output:
    ```
    > "method": "thread/start",
    > "params": {
    >   "model": "gpt-5.4-mini",
    >   "modelProvider": null,
    >   "allowProviderModelFallback": true,
    >   ...
    > }
    
    < "result": {
    <   "model": "openai.gpt-5.5",
    <   "modelProvider": "amazon-bedrock",
    <   ...
    < }
    ```
  • Persist selected capability roots and resolve availability per model step (#29856)
    ## Why
    
    `selectedCapabilityRoots` is durable thread intent: “use this capability
    root from environment `worker`.”
    
    The important product assumption is:
    
    > One environment ID always names the same logical executor and stable
    contents.
    
    `worker` does not silently change from executor A to an unrelated
    executor B. The process-local connection handle for `worker` can still
    be replaced while Codex is running, though, for example when
    `environment/add` registers a fresh handle for the same logical
    environment.
    
    The thread should persist only the stable selection. Each model step
    should pair that selection with the exact ready handle captured for that
    step.
    
    ## The boundary
    
    ```text
    persisted thread intent
      plugin@1 -> environment "worker"
                    |
                    | capture the current step
                    v
    model-step view
      unavailable, or
      plugin@1 + worker's exact captured ready handle
    ```
    
    The environment ID is the stable identity and cache key. The
    `Arc<Environment>` is only a process-local handle retained so consumers
    of one model step use the same captured environment. It is never
    persisted and it does not imply different environment contents.
    
    ## What changes
    
    ### Persist the stable selection
    
    Selected roots are written into `SessionMeta` and restored with the
    thread. Forked subagents inherit the same selections, including
    bounded-history forks.
    
    Only stable data is persisted: root ID, environment ID, and root path.
    
    ### Capture readiness together with the exact handle
    
    The environment snapshot records:
    
    ```text
    environment_id -> Some(Arc<Environment>) // ready in this step
    environment_id -> None                   // still starting in this step
    ```
    
    This prevents readiness and execution from coming from different
    registry snapshots.
    
    For example:
    
    ```text
    step snapshot: worker -> handle A, ready
    environment/add: worker -> fresh handle B for the same logical environment
    current step: plugin@1 still uses captured handle A
    ```
    
    Without carrying handle A in the snapshot, the resolver could combine “A
    was ready” with handle B and treat B as ready before it had finished
    starting.
    
    This does not change cache invalidation. Stable capability metadata
    remains identified by environment ID and capability root. Replacing a
    process-local handle under the same stable environment ID does not
    invalidate or rediscover that metadata.
    
    ### Resolve availability per model step
    
    - A ready captured environment produces resolved roots using its
    captured handle.
    - A starting, missing, or failed environment is omitted from that step.
    - A selected lazy environment that is outside the turn's captured
    environment set is asked to start, and a later step can observe it as
    ready.
    - No capability files are scanned here.
    
    Transient transport disconnects remain the remote client's reconnect
    concern. This PR models initial attachment/readiness; it does not add
    live socket-connectivity state.
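    The per-step resolution can be sketched with a map from environment ID to an optional captured handle. All type and function names here are hypothetical. The sketch only shows that resolution reads the captured snapshot, so a handle registered later cannot leak into the current step.
    
    ```rust
    use std::collections::HashMap;
    use std::sync::Arc;
    
    /// Hypothetical process-local environment handle.
    #[derive(Debug)]
    struct Environment {
        label: &'static str,
    }
    
    /// Stable, persisted selection: root ID plus environment ID.
    struct SelectedRoot {
        root_id: &'static str,
        environment_id: &'static str,
    }
    
    /// Per-step capture: `Some(handle)` = ready in this step, `None` = still starting.
    type StepSnapshot = HashMap<&'static str, Option<Arc<Environment>>>;
    
    /// Resolves selections for one model step using only the captured snapshot.
    /// Starting or missing environments are omitted; nothing is scanned here.
    fn resolve_step(
        selections: &[SelectedRoot],
        snapshot: &StepSnapshot,
    ) -> Vec<(&'static str, Arc<Environment>)> {
        selections
            .iter()
            .filter_map(|sel| {
                // Only `Some(Some(handle))` counts as ready.
                let handle = snapshot.get(sel.environment_id)?.as_ref()?;
                Some((sel.root_id, Arc::clone(handle)))
            })
            .collect()
    }
    ```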
    
    ## Example
    
    ```text
    thread selection: plugin@1 -> environment "worker"
    
    step 1: worker is starting -> plugin@1 unavailable
    step 2: worker is ready    -> plugin@1 resolves through worker's captured handle
    step 3: fresh local handle -> current step remains pinned; a later step captures its own view
    ```
    
    Temporary unavailability does not discard the durable selection. Later
    PRs can retain stable metadata caches while projecting only currently
    available capabilities into model-visible World State.
    
    ## Compatibility
    
    The app-server request shape does not change. Older rollouts without
    `selected_capability_roots` deserialize to an empty list.
    
    ## Stack
    
    1. **This PR:** persist stable selected roots and resolve them through
    an exact model-step handle.
    2. #29960: cache stable skill metadata and project available skills into
    World State.
    3. #29946: cache stable plugin declarations and manage the separate live
    MCP runtime.
  • chore(app-server): mark thread/rollback as deprecated (#29928)
    We will drop support for this in the near future due to the complexity
    it introduces.
  • Test executor-routed MCP OAuth token exchange (#29656)
    ## Why
    
    #28529 proves OAuth discovery uses the selected executor, but its
    end-to-end test stops before the callback and token exchange.
    
    ## What changed
    
    - add an executor-only mock token endpoint
    - complete the OAuth callback using the authorization URL's `state` and
    `redirect_uri`
    - assert the PKCE token exchange reaches the executor-only endpoint
    - assert the completion notification reports the selected thread and
    succeeds
    
    Depends on #28529.
  • Support OAuth for HTTP MCP servers from selected executor plugins (#28529)
    ## Why
    
    #28522 routes selected-plugin HTTP MCP traffic through the owning
    executor, but OAuth bootstrap and refresh still used host-local clients.
    Executor-only servers therefore cannot complete discovery or login
    through the same network boundary as the MCP connection.
    
    ## What changed
    
    - adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
    contract
    - let RMCP own discovery, dynamic registration, PKCE, token exchange,
    and refresh
    - route auth status, persisted-token startup, and app-server login
    through the server runtime while preserving the existing local discovery
    path
    - add optional `threadId` to `mcpServer/oauth/login` and echo it in the
    completion notification
    - implement RMCP's redirect policy and 1 MiB OAuth response limit over
    executor HTTP
    - cover selected-thread OAuth discovery and login through an
    executor-only route
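    A 1 MiB response cap can be enforced over any byte stream by reading one byte past the limit. A minimal sketch using only `std::io`; `read_limited` and its error message are hypothetical, not the RMCP or `codex_exec_server` API.
    
    ```rust
    use std::io::{self, Read};
    
    /// 1 MiB, the OAuth response size limit described above.
    const MAX_OAUTH_RESPONSE_BYTES: u64 = 1024 * 1024;
    
    /// Reads a response body, failing if it exceeds `limit` bytes.
    fn read_limited<R: Read>(body: R, limit: u64) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        // Read at most one byte more than allowed so an oversize body is detectable.
        body.take(limit + 1).read_to_end(&mut buf)?;
        if buf.len() as u64 > limit {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "OAuth response exceeds size limit",
            ));
        }
        Ok(buf)
    }
    ```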
    
    Depends on #28522.
  • Support HTTP MCP servers from selected executor plugins (#28522)
    ## Why
    
    Selected executor plugins can declare both stdio and Streamable HTTP MCP
    servers, but only stdio registrations were retained. That silently drops
    part of the plugin's tool surface and prevents HTTP traffic from using
    the owning executor's network.
    
    ## What changed
    
    - retain selected-plugin Streamable HTTP MCP declarations alongside
    stdio declarations
    - route their HTTP clients through the owning executor environment
    - preserve local auth-header environment references while rejecting them
    for executor-hosted declarations
    - cover thread isolation, refresh, and an executor-only HTTP route end
    to end
  • [codex] route sleep through time providers (#29973)
    ## Summary
    
    - add a cancellable sleep operation to `TimeProvider`
    - route `clock.sleep` through the configured provider
    - extend the supported sleep duration to 12 hours
    - complete the sleep turn item before propagating provider failures
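    A cancellable sleep with the 12-hour cap might look like the following blocking sketch. The trait shape, the channel-based cancellation, and `SystemTimeProvider` are assumptions for illustration; the real `TimeProvider` may be async and its signature may differ.
    
    ```rust
    use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
    use std::time::Duration;
    
    /// Longest accepted sleep, per the description above.
    const MAX_SLEEP: Duration = Duration::from_secs(12 * 60 * 60);
    
    #[derive(Debug, PartialEq)]
    enum SleepOutcome {
        Completed,
        Cancelled,
    }
    
    /// Hypothetical provider trait with a cancellable sleep.
    trait TimeProvider {
        fn sleep(&self, duration: Duration, cancel: &Receiver<()>) -> Result<SleepOutcome, String>;
    }
    
    /// Wall-clock provider.
    struct SystemTimeProvider;
    
    impl TimeProvider for SystemTimeProvider {
        fn sleep(&self, duration: Duration, cancel: &Receiver<()>) -> Result<SleepOutcome, String> {
            if duration > MAX_SLEEP {
                return Err(format!("sleep duration {duration:?} exceeds 12 hours"));
            }
            // Wake early on a cancellation message or a dropped sender.
            match cancel.recv_timeout(duration) {
                Err(RecvTimeoutError::Timeout) => Ok(SleepOutcome::Completed),
                Ok(()) | Err(RecvTimeoutError::Disconnected) => Ok(SleepOutcome::Cancelled),
            }
        }
    }
    ```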
    
    ## Why
    
    This isolates the core clock abstraction needed by external clock
    integrations. Existing system and app-server behavior remains wall-clock
    based in this PR; the stacked follow-up supplies app-server sleeps from
    an external clock.
  • [codex] Add Ultra reasoning effort (#29899)
    ## Why
    
    Ultra should be one user-facing reasoning selection for work that
    benefits from both maximum reasoning and proactive multi-agent
    delegation. Without it, clients must coordinate maximum reasoning with
    the experimental `multiAgentMode` setting, even though the inference
    backend still expects its existing `max` effort value.
    
    This change makes reasoning effort the source of truth: clients select
    `ultra`, core derives proactive multi-agent behavior when the turn is
    eligible for multi-agent V2, and inference requests continue to use the
    backend-compatible `max` value.
    
    ## What changed
    
    - Add `ultra` as a first-class reasoning effort and preserve
    model-catalog ordering when exposing it to clients.
    - Convert `ultra` to `max` at the inference request boundary, including
    Responses HTTP/WebSocket requests, startup prewarm, compaction, and
    memory summarization.
    - Derive effective multi-agent mode per turn from effective reasoning
    effort:
      - eligible multi-agent V2 + `ultra` → `proactive`
      - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
      - V1 or otherwise ineligible sessions → no multi-agent mode instruction
    - Keep the derived effective mode in turn context history so successive
    turns can emit a developer-message update only when the effective mode
    changes.
    - Remove selected multi-agent mode from core session configuration, turn
    construction, thread settings, resume/fork restoration, and subagent
    spawn plumbing. Subagents inherit reasoning effort and derive their own
    effective mode.
    - Retain the experimental app-server `multiAgentMode` fields for wire
    compatibility while marking them deprecated. Request values are accepted
    but ignored; compatibility response fields report `explicitRequestOnly`.
    - Display Ultra in the TUI using the order supplied by `model/list`.
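    The effort-to-request mapping and the derived mode can be sketched as plain functions. The enum subsets, the function names, and the `medium`/`high` wire strings are assumptions for illustration. The `ultra` → `max` conversion and the mode rules come from the list above.
    
    ```rust
    /// Illustrative subset of reasoning efforts; other variants are omitted.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum ReasoningEffort {
        Medium,
        High,
        Ultra,
    }
    
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum MultiAgentMode {
        Proactive,
        ExplicitRequestOnly,
    }
    
    /// Value sent to the inference backend: `ultra` is converted to `max`.
    fn request_effort(effort: ReasoningEffort) -> &'static str {
        match effort {
            ReasoningEffort::Medium => "medium",
            ReasoningEffort::High => "high",
            ReasoningEffort::Ultra => "max",
        }
    }
    
    /// Effective mode derived per turn; `None` means no multi-agent instruction.
    fn effective_multi_agent_mode(effort: ReasoningEffort, v2_eligible: bool) -> Option<MultiAgentMode> {
        if !v2_eligible {
            return None; // V1 or otherwise ineligible session
        }
        Some(match effort {
            ReasoningEffort::Ultra => MultiAgentMode::Proactive,
            _ => MultiAgentMode::ExplicitRequestOnly,
        })
    }
    
    /// A developer-message update is emitted only when the effective mode changes.
    fn needs_mode_update(previous: Option<MultiAgentMode>, current: Option<MultiAgentMode>) -> bool {
        previous != current
    }
    ```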
    
    ## Validation
    
    - `just test -p codex-core ultra_reasoning_uses_max_for_requests`
    - `just test -p codex-tui model_reasoning_selection_popup`
  • [codex] Populate remote plugin local versions (#29956)
    ## What
    
    - Carry installed remote release versions through remote plugin
    summaries as `localVersion`.
    - Keep the app-server mapping a pure adapter by populating that value in
    the remote catalog layer.
    
    ## Why
    
    Remote plugin summaries always returned `localVersion: null` even after
    their versioned bundles had been installed locally. Consumers such as
    scheduled-task template discovery use `localVersion` to resolve a
    plugin's materialized root, so templates from remote curated plugins
    were silently skipped.
  • [codex] nest sleep config under current time reminder (#29910)
    ## Summary
    
    - move sleep tool enablement from top-level `[features].sleep_tool` to
    `[features.current_time_reminder].sleep_tool`
    - remove the standalone `Feature::SleepTool` flag and gate `clock.sleep`
    from resolved current-time configuration
    - update config schema, config-lock materialization, and existing sleep
    coverage
    
    Stacked on #29907.
  • [codex] namespace sleep under clock (#29907)
    ## Summary
    
    - expose the interruptible sleep tool as `clock.sleep` instead of
    top-level `sleep`
    - keep `clock.curr_time` and `clock.sleep` in the same model-visible
    namespace when both features are enabled
    - update existing core and app-server integration coverage to issue
    namespaced sleep calls
    
    ## Why
    
    Sleep is a clock operation. Grouping it with `clock.curr_time` gives the
    model a more coherent tool surface without changing the sleep feature
    gate or runtime behavior.
    
    ## Validation
    
    - `just test -p codex-core sleep_tool_follows_feature_gate`
    - `just test -p codex-core any_new_input_interrupts_sleep`
    - `just test -p codex-app-server
    sleep_emits_started_and_completed_items`
  • Add a connector declaration snapshot (#29851)
    ## Why
    
    Connector declarations currently enter Codex through broad plugin
    capability summaries, then MCP setup, turn tooling, and `app/list` each
    reconstruct the same information. That makes executor-selected
    connectors difficult to add without coupling connector behavior to the
    host plugin loader.
    
    This PR introduces a small connector-owned value that later stack layers
    can populate before thread startup.
    
    ## What changed
    
    - Move the pure app-declaration parser into `codex-connectors`,
    preserving declaration order and category cleanup while leaving
    host-side validation and deduplication unchanged.
    - Add an immutable `ConnectorSnapshot` with ordered connector IDs and
    plugin display-name provenance.
    - Adapt the existing local-plugin capability summaries into that
    snapshot at current consumer boundaries.
    - Use the snapshot for MCP tool provenance, turn connector inventory,
    and `app/list`.
    - Keep the crate API narrow: no test-only snapshot accessors are
    exposed.
    
    The externally visible behavior is unchanged. Connector tools still come
    from the orchestrator-owned `/ps/mcp` server, and local plugin
    enablement remains owned by the existing plugin loader.
    
    ## Stack scope
    
    This is the foundation only. It does not read selected executor packages
    or change thread startup. #29852 adds the executor-backed declaration
    reader, and #29856 composes selected declarations into a thread
    snapshot.
  • [apps] Thread structured icon assets through app list (#29889)
    ## Summary
    
    - Add `iconAssets` and `iconDarkAssets` to the app-list protocol.
    - Preserve structured icons through directory merging and the connector,
    app-server, and TUI boundaries.
    - Keep legacy logo URLs unchanged as compatibility fallbacks.
    - Update generated protocol schemas and TypeScript types.
  • [codex] Inject agent graph store into ThreadManager (#29736)
    Pick up the AgentGraphStore migration.
    
    - Inject an explicit optional agent graph store into `ThreadManager`
    - Move all calls to spawn, close, recursive resume, and
    subtree/archive/delete/feedback traversal through it
    - Keep using `LocalAgentGraphStore` when SQLite is available
    
    This required some changes to the interface to deal with futures:
    
    - The interface now matches `ThreadStore`'s object-safe pattern by
    returning a boxed `AgentGraphStoreFuture` directly, allowing
    `ThreadManager` to hold `Arc<dyn AgentGraphStore>`
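    The object-safe pattern described above can be sketched as follows. `AgentGraphStoreFuture` is named in the description, but this alias definition, the trimmed `children` method, and `InMemoryStore` are assumptions. The small `block_on` exists only to drive the demo without an async runtime.
    
    ```rust
    use std::future::Future;
    use std::pin::{pin, Pin};
    use std::sync::Arc;
    use std::task::{Context, Poll, Wake, Waker};
    
    /// Boxed future alias in the style described above (definition assumed).
    type AgentGraphStoreFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
    
    /// Hypothetical, trimmed-down store trait. Returning a boxed future instead
    /// of using `async fn` keeps the trait usable as `Arc<dyn AgentGraphStore>`.
    trait AgentGraphStore: Send + Sync {
        fn children(&self, thread_id: u64) -> AgentGraphStoreFuture<'_, Vec<u64>>;
    }
    
    struct InMemoryStore {
        edges: Vec<(u64, u64)>, // (parent, child)
    }
    
    impl AgentGraphStore for InMemoryStore {
        fn children(&self, thread_id: u64) -> AgentGraphStoreFuture<'_, Vec<u64>> {
            Box::pin(async move {
                self.edges
                    .iter()
                    .filter(|(parent, _)| *parent == thread_id)
                    .map(|(_, child)| *child)
                    .collect()
            })
        }
    }
    
    /// Minimal executor for the demo: polls until ready. Adequate only for
    /// futures that never actually wait, like the in-memory store above.
    fn block_on<F: Future>(fut: F) -> F::Output {
        struct NoopWake;
        impl Wake for NoopWake {
            fn wake(self: Arc<Self>) {}
        }
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let mut fut = pin!(fut);
        loop {
            if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
                return out;
            }
        }
    }
    ```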
    
    *Slight behavior change!* Unfiltered subtree enumeration now performs a
    single all-status breadth-first traversal, so a closed grandchild
    beneath an open edge is included; the previous Open-then-Closed
    traversals could not cross mixed-status paths and silently omitted it.
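    The behavior change is easiest to see on a two-edge graph. This sketch uses a hypothetical edge list with a status per spawn edge and compares one all-status breadth-first traversal against separate per-status traversals from the root.
    
    ```rust
    use std::collections::{HashSet, VecDeque};
    
    #[derive(Clone, Copy, PartialEq)]
    enum EdgeStatus {
        Open,
        Closed,
    }
    
    /// Hypothetical spawn edges: (parent, child, status).
    type Edges = [(u32, u32, EdgeStatus)];
    
    /// Breadth-first descendants of `root`, optionally following only one status.
    fn descendants(edges: &Edges, root: u32, only: Option<EdgeStatus>) -> Vec<u32> {
        let mut seen = HashSet::from([root]);
        let mut queue = VecDeque::from([root]);
        let mut out = Vec::new();
        while let Some(node) = queue.pop_front() {
            for &(parent, child, status) in edges {
                let status_ok = only.map_or(true, |s| s == status);
                if parent == node && status_ok && seen.insert(child) {
                    out.push(child);
                    queue.push_back(child);
                }
            }
        }
        out
    }
    
    /// Previous approach: one traversal per status, results concatenated.
    fn descendants_per_status(edges: &Edges, root: u32) -> Vec<u32> {
        let mut out = descendants(edges, root, Some(EdgeStatus::Open));
        out.extend(descendants(edges, root, Some(EdgeStatus::Closed)));
        out
    }
    ```
    
    With `1 -(open)-> 2 -(closed)-> 3`, the all-status traversal returns `[2, 3]`, while the per-status traversals return only `[2]`: the open pass stops at the closed edge, and the closed pass finds no closed edge leaving the root.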
  • feat(app-server): list descendant threads by ancestor (#29591)
    ## Why
    
    `thread/list` can filter direct children with `parentThreadId`, but
    clients cannot request an entire spawned subtree. Discovering every
    descendant requires repeated client-side requests and gives up the
    database's existing filtering and pagination path.
    
    ## What changed
    
    Experimental clients can use `ancestorThreadId` to return strict
    descendants at any depth while `parentThreadId` retains its direct-child
    meaning. The filters are mutually exclusive, the ancestor is excluded,
    and every result preserves its immediate `parentThreadId` so callers can
    reconstruct the tree.
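    The mutual exclusion between the two filters can be sketched as a conversion into an explicit internal relation. `ThreadRelation`, `thread_relation`, and the error text are hypothetical.
    
    ```rust
    /// Hypothetical internal relation, resolved from the two wire fields.
    #[derive(Debug, PartialEq)]
    enum ThreadRelation {
        /// `parentThreadId`: direct children only.
        DirectChildrenOf(String),
        /// `ancestorThreadId`: strict descendants at any depth (ancestor excluded).
        DescendantsOf(String),
    }
    
    fn thread_relation(
        parent_thread_id: Option<String>,
        ancestor_thread_id: Option<String>,
    ) -> Result<Option<ThreadRelation>, &'static str> {
        match (parent_thread_id, ancestor_thread_id) {
            (Some(_), Some(_)) => Err("parentThreadId and ancestorThreadId are mutually exclusive"),
            (Some(parent), None) => Ok(Some(ThreadRelation::DirectChildrenOf(parent))),
            (None, Some(ancestor)) => Ok(Some(ThreadRelation::DescendantsOf(ancestor))),
            (None, None) => Ok(None),
        }
    }
    ```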
    
    ## How it works
    
    - **Explicit relationship:** Internal list parameters distinguish direct
    children from transitive descendants without changing the meaning of
    `parentThreadId`.
    - **Existing graph:** Persisted parent-child spawn edges remain the
    source of truth, so descendant lookup needs no schema migration or
    ancestry cache.
    - **Indexed traversal:** A recursive SQLite query starts from the
    parent-edge index, walks each generation, and applies thread filters,
    sorting, and cursor pagination in the same database request.
    - **Reconstructable results:** The response stays flat and normally
    ordered while carrying each descendant's immediate parent.
    
    ## Verification
    
    Ran 550 tests across the protocol, state, rollout, and thread-store
    crates, then reran the four focused state, store, and app-server
    descendant-listing tests after the final diff reduction. Scoped Clippy
    and formatting checks passed. Stable and experimental schema generation
    was checked; the stable fixtures remain unchanged while the experimental
    schema includes the new field.
  • [codex] show external import result counts (#29567)
    ## What changed
    
    - Show per-type import counts in the `/import` review UI and started
    message.
    - Render completion results as a multi-line summary with total
    imported/failed counts and one row per import type.
    - Add snapshot coverage for the updated review and completion output.
    
    <img width="537" height="322" alt="Screenshot 2026-06-23 at 9 41 20 PM"
    src="https://github.com/user-attachments/assets/166542eb-2097-4b2b-8130-8f6fd8c680ce"
    />
    
    
    ## Why
    
    The TUI previously only reported that Claude Code import started or
    finished. Users could not see how many items of each type were selected
    or how many actually imported versus failed.
  • test: use automatic environments in app-server integration tests (#29789)
    ## Why
    
    Topology-neutral app-server integration tests should exercise automatic
    environment selection so the same setup covers local and remote
    executors.
    
    ## What
    
    Migrate eligible tests to `TestAppServer::new_with_auto_env()` and
    `send_thread_start_request_with_auto_env()`. Leave explicit-topology
    tests unchanged, and skip the request-permissions case on Windows with a
    TODO for cross-platform tool routing.
    
    ## Validation
    
    - `just test -p codex-app-server`
    - `bazel test //codex-rs/app-server:app-server-all-wine-exec-test
    --test_output=errors`
    
    Stacked on #29788.
  • test: run app-server integration tests under Wine (#29788)
    ## Why
    
    I made a mistake when carving #29746 out of my local changes, so the
    test was missing from the build graph. Oops!
    
    ## What
    
    Enable the app-server Wine exec test target. Remove the `manual` tag
    from generated Wine-exec test variants so wildcard Bazel test
    invocations select them. Refactor the smoke test to ensure it passes
    with current Windows support.
  • connectors: own app metadata types (#29723)
    ## Why
    
    Connector metadata is consumed by connector discovery, ChatGPT
    integration, core, and TUI code. Treating app-server's wire DTO as the
    shared domain model reverses the intended dependency direction.
    
    ## What changed
    
    - Added connector-owned app branding, review, screenshot, metadata, and
    info types.
    - Added explicit conversions in app-server and TUI while preserving
    app-server's wire payloads.
    - Removed production app-server-protocol dependencies from connectors
    and ChatGPT connector code.
    
    ## Stack
    
    This is PR 4 of 6, stacked on [PR
    #29722](https://github.com/openai/codex/pull/29722). Review only the
    delta from `codex/split-config-layer-types`. Next: [PR
    #29724](https://github.com/openai/codex/pull/29724).
    
    ## Validation
    
    - Connector and tools coverage passed.
    - App-server app-list coverage passed: 13 tests.
  • config: own layer provenance types (#29722)
    ## Why
    
    Config layer provenance describes how effective configuration was
    assembled, so it belongs with the config loader rather than in
    app-server's serialized API types.
    
    ## What changed
    
    - Moved `ConfigLayerSource`, `ConfigLayerMetadata`, and `ConfigLayer`
    ownership into `codex-config`.
    - Kept app-server's wire payloads unchanged and added explicit
    conversions at the app boundary.
    - Removed lower-level app-server-protocol dependencies from config
    consumers.
    
    ## Stack
    
    This is PR 3 of 6, stacked on [PR
    #29721](https://github.com/openai/codex/pull/29721). Review only the
    delta from `codex/split-auth-domain-types`. Next: [PR
    #29723](https://github.com/openai/codex/pull/29723).
    
    ## Validation
    
    - `codex-config` coverage passed.
    - App-server config-manager and config RPC coverage passed.
  • [plugins] Enforce marketplace source admission requirements (#29753)
    ## Why
    
    Managed marketplace source requirements only become effective when every
    local marketplace mutation path applies the same admission decision.
    This change centralizes that decision so CLI, app-server, and
    external-agent migration flows cannot add, install from, or refresh a
    disallowed source.
    
    ## What changed
    
    - Match exact normalized Git repository URLs with an optional exact
    `ref`.
    - Match Git hosts with managed regular expressions.
    - Match local marketplaces by exact absolute path.
    - Preserve the expected path/name boundary for managed OpenAI
    marketplaces.
    - Enforce source admission during marketplace add, plugin install, and
    configured Git marketplace upgrade.
    - Continue upgrading independent marketplaces when one source is
    rejected and return a per-marketplace error.
    - Load the effective requirements stack at CLI, app-server, and
    external-agent migration entry points.
    
    This PR does not filter already configured marketplaces at runtime; that
    remains in draft follow-up #29691.
    
    ## Stack
    
    This is PR 2 of 3 and is based on #29690, which introduces the
    requirements data shape and merge behavior.
    
    ## Test plan
    
    - Source matcher coverage for Git URL/ref, host-pattern, local-path, and
    managed marketplace cases.
    - Marketplace add and plugin install coverage for allowed and rejected
    sources.
    - Marketplace upgrade coverage for rejection and per-marketplace
    continuation.
  • auth: move domain mode below app wire types (#29721)
    ## Why
    
    Authentication mode is a domain concept used by login, model selection,
    telemetry, and transports. Keeping the canonical type in app-server
    protocol forces those lower-level crates to depend on an unrelated wire
    API.
    
    ## What changed
    
    - Added canonical `codex_protocol::auth::AuthMode` domain values.
    - Kept the app-server wire DTO unchanged and added an explicit app-side
    conversion.
    - Removed production app-server-protocol dependencies from login,
    model-provider-info, models-manager, and otel call paths.
    
    ## Stack
    
    This is PR 2 of 6, stacked on [PR
    #29714](https://github.com/openai/codex/pull/29714). Review only the
    delta from `codex/split-json-rpc-protocols`. Next: [PR
    #29722](https://github.com/openai/codex/pull/29722).
    
    ## Validation
    
    - Auth and login coverage passed in the focused protocol/domain test
    run.
    - App-server account and auth conversion coverage passed.
  • [codex] Ignore local curated plugins when remote catalog is active (#29765)
    ## Summary
    
    - suppress configured `openai-curated` plugins when the remote plugin
    feature is enabled and auth uses the Codex backend
    - preserve `openai-api-curated` and non-Codex-backend behavior while
    including remote catalog activation in the plugin load cache key
    - add core plugin coverage and an app-server integration test for
    runtime feature enablement
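    The suppression condition can be sketched as a predicate plus a cache key. The names are hypothetical; the rule itself (skip `openai-curated` only when remote plugins are enabled and auth uses the Codex backend) comes from the summary above.
    
    ```rust
    /// Hypothetical predicate: should a configured local marketplace's plugins load?
    fn loads_local_marketplace(
        marketplace: &str,
        remote_plugins_enabled: bool,
        uses_codex_backend_auth: bool,
    ) -> bool {
        // Only `openai-curated` is replaced, and only when the remote catalog
        // is actually active for this auth mode.
        let remote_catalog_active = remote_plugins_enabled && uses_codex_backend_auth;
        !(marketplace == "openai-curated" && remote_catalog_active)
    }
    
    /// Including remote catalog activation in the plugin load cache key means a
    /// runtime feature toggle cannot reuse a load computed before the toggle.
    #[derive(Debug, PartialEq, Eq, Hash)]
    struct PluginLoadCacheKey {
        config_fingerprint: u64,
        remote_catalog_active: bool,
    }
    ```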
    
    ## Why
    
    The Codex app enables remote plugins through process-local runtime
    feature enablement, which can happen after app-server startup tasks have
    already observed legacy local plugin state. The existing conflict logic
    only preferred a remote plugin when the same plugin was already
    installed remotely, so a configured legacy-only plugin could continue
    exposing skills and other capabilities from `openai-curated`.
    
    ## Impact
    
    When the remote catalog is active, legacy `openai-curated` plugins no
    longer contribute skills, MCP servers, apps, or hooks. Remote installed
    plugins continue to load normally, and `openai-api-curated` remains
    unaffected. This does not change remote fetch, bundle sync, or uninstall
    behavior.
    
    ## Validation
    
    - `just test -p codex-core-plugins
    remote_global_catalog_ignores_local_curated_plugins
    remote_plugin_feature_keeps_local_curated_without_codex_backend`
    - `just test -p codex-app-server
    runtime_remote_plugin_enablement_excludes_local_curated_plugin_skills`
    - `just fmt`
    - `git diff --check`
  • Let image generation extension hosts control output persistence (#29711)
    ## Why
    
    Some extension hosts need generated images returned without writing them
    to the local filesystem or giving the model a local path.
    
    ## What changed
    
    **tl;dr**: persistence of extension-generated images now happens entirely
    inside the image generation extension rather than in core.
    
    - Let hosts provide an optional image save root when installing the
    extension.
    - Save images and return path hints only when a save root is configured.
    - Return image data without saving or adding a path hint when no save
    root is configured.
    - Preserve the extension-provided `saved_path` instead of persisting
    extension images again in core.
    - Leave built-in image generation unchanged.
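    The save-root behavior can be sketched with `std::fs`. `ImageOutput` and `finish_image` are hypothetical names, and writing under the root with a caller-chosen file name is an assumption about how paths are formed.
    
    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Hypothetical result returned for one generated image.
    struct ImageOutput {
        data: Vec<u8>,
        /// Path hint for the model; present only when a save root is configured.
        saved_path: Option<PathBuf>,
    }
    
    fn finish_image(save_root: Option<&Path>, file_name: &str, data: Vec<u8>) -> io::Result<ImageOutput> {
        let saved_path = match save_root {
            Some(root) => {
                fs::create_dir_all(root)?;
                let path = root.join(file_name);
                fs::write(&path, &data)?;
                Some(path)
            }
            // No save root: return the bytes only, with no path hint.
            None => None,
        };
        Ok(ImageOutput { data, saved_path })
    }
    ```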
    
    ## Validation
    
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-app-server
    standalone_image_generation_returns_saved_path_hint_to_model`
    - `just test -p codex-core
    extension_tool_uses_granted_turn_permissions_without_local_persistence`
    - `just test -p codex-core tools::handlers::extension_tools::tests`
    - Tested in the Codex CLI with `save_root` set to `CODEX_HOME` and to `None`.
    - Tested in the Codex app with both settings as well.
  • test: add app-server auto environment helper (#29746)
    ## Why
    
    This starts moving app-server tests toward running against remote and
    foreign-OS executors by default. That needs a point of indirection
    similar to core integration tests' `build_with_auto_env`, but with the
    flexibility to let tests control environment registration when they
    need to.
    
    ## What
    
    This adds:
    
    - `TestAppServer::new_with_auto_env()` for constructing an app server
    with a default environment defined by the test runner (e.g. bazel)
    - `TestAppServer::auto_env_params()` for tests to easily acquire turn
    env params tailored to the automatic environment
    - `TestAppServer::send_thread_start_request_with_auto_env()` to make it
    easy for tests to start a thread using the automatic environment
    
    The above methods all fail if the test calling them has set up an
    environment where the automatic environment configuration conflicts with
    test-created state.
    
    ## Validation
    
    Adds a couple of basic smoke tests to the app-server test suite.
    Follow-ups will migrate more tests to use it.
  • Support thread-level originator overrides (#29477)
    ## Why
    
    Work (TPP) threads can be launched from the Desktop app, but if they all
    keep the Desktop app's default originator, downstream attribution cannot
    distinguish local Work launches from cloud-backed Work launches.
    `thread/start.serviceName` already carries that launch signal, while
    `SessionMeta.originator` is the durable thread-level value that survives
    resume and fork.
    
    This change converts the Desktop Work service names into an effective
    originator at thread creation time, persists that originator with the
    thread, and keeps using it for later model requests and memory writes.
    
    ## What changed
    
    - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
    per-thread originators, while preserving
    `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
    - Persist the effective originator in `SessionMeta.originator`, read it
    back on resume/fork, and inherit the parent originator for subagent
    spawns when there is no persisted session metadata.
    - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
    back to the live parent originator when the forked history no longer
    includes `SessionMeta`.
    - Thread the per-thread originator through Responses headers,
    websocket/compaction request paths, thread-store creation, rollout
    metadata, and memory stage-one telemetry.
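
    The precedence and inheritance rules above can be sketched as two small
    functions. The function names and the placeholder originator strings are
    hypothetical, because the description does not state the actual originator
    values. The internal override is modeled as a plain parameter rather than
    however `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` is actually read.

    ```rust
    /// Placeholder originator values (hypothetical; the real strings are not given here).
    const WORK_LOCAL_ORIGINATOR: &str = "placeholder_work_local";
    const WORK_CLOUD_ORIGINATOR: &str = "placeholder_work_cloud";

    /// Hypothetical: choose the originator to persist at thread creation.
    /// Precedence: internal override > Work service-name remapping > default.
    pub fn effective_originator(
        internal_override: Option<&str>,
        service_name: Option<&str>,
        default_originator: &str,
    ) -> String {
        if let Some(value) = internal_override {
            return value.to_string();
        }
        match service_name {
            Some("CODEX_WORK_LOCAL") => WORK_LOCAL_ORIGINATOR.to_string(),
            Some("CODEX_WORK_CLOUD") => WORK_CLOUD_ORIGINATOR.to_string(),
            _ => default_originator.to_string(),
        }
    }

    /// Hypothetical: originator for a resumed, forked, or spawned thread.
    /// Prefer the persisted `SessionMeta.originator`. If the history has none
    /// (e.g. a truncated last-N-turns fork), fall back to the live parent's value.
    pub fn inherited_originator(persisted: Option<&str>, live_parent: &str) -> String {
        persisted.unwrap_or(live_parent).to_string()
    }
    ```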
    
    ## Verification
    
    - `just test -p codex-core
    agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
    agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
    thread_manager::tests::originator_override_precedes_service_name_remapping`
    - `just test -p codex-core
    agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
    - `just test -p codex-memories-write`
    - `just fix -p codex-core -p codex-memories-write`
    - `git diff --check`
  • [codex] rename rollout budget error to session budget error (#29744)
    ## Summary
    
    - rename the rollout-budget exhaustion error from
    `RolloutBudgetExceeded` to `SessionBudgetExceeded`
    - expose the matching app-server v2 wire value as
    `sessionBudgetExceeded`
    - regenerate JSON/TypeScript schema fixtures and update the app-server
    docs and focused tests
    
    This is a naming-only follow-up to #29715 based on [Pavel's review
    suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
    Runtime behavior is unchanged.
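
    For a client, the practical effect is that budget exhaustion can be
    matched on the stable `sessionBudgetExceeded` value rather than inferred
    from the turn status. Below is a hypothetical Rust sketch of such a
    client-side check. It assumes the turn status and the `codexErrorInfo` wire
    value have already been extracted as strings; the real payload shape is not
    shown here.

    ```rust
    /// Hypothetical client-side view of how a turn ended.
    #[derive(Debug, PartialEq, Eq)]
    pub enum TurnOutcome {
        SessionBudgetExceeded,
        Interrupted,
        Other,
    }

    /// Hypothetical classifier over an already-parsed turn status and optional
    /// `codexErrorInfo` wire value.
    pub fn classify_turn(status: &str, codex_error_info: Option<&str>) -> TurnOutcome {
        // Check the stable error code first so budget exhaustion is never
        // mistaken for a generic interrupt.
        if codex_error_info == Some("sessionBudgetExceeded") {
            return TurnOutcome::SessionBudgetExceeded;
        }
        if status == "interrupted" {
            TurnOutcome::Interrupted
        } else {
            TurnOutcome::Other
        }
    }
    ```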
    
    ## Tests
    
    - `just test -p codex-core rollout_budget`
    - `just test -p codex-app-server-protocol`
    - `just fmt`
    - `just write-app-server-schema`
  • [codex] surface rollout budget exhaustion (#29715)
    ## Summary
    - surface shared rollout-budget exhaustion as
    `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
    - map it through the existing `CodexErrorInfo` and app-server v2
    `codexErrorInfo` path
    - keep local compaction from retrying after the shared rollout budget is
    exhausted
    
    This gives app-server clients a stable `rolloutBudgetExceeded` error
    they can classify without guessing from `status="interrupted"`.
    
    ## Tests
    - `just test -p codex-core rollout_budget`
  • Make selected plugin roots URI-native (#28918)
    ## Why
    
    Selected capability roots belong to the executor filesystem, not the
    app-server host. Converting their path strings into the host's native
    `Path` breaks whenever the two machines use different path conventions,
    such as a Windows executor behind a Unix app-server.
    
    This PR establishes `PathUri` as the selected-plugin boundary so the
    executor remains authoritative for its paths.
    
    ## What changed
    
    - Require `selectedCapabilityRoots[].location.path` to be a canonical
    `file:` URI and deserialize it directly as `PathUri`; native path
    strings are rejected.
    - Update the app-server schema, generated TypeScript, examples, and
    request coverage for the URI contract.
    - Keep selected roots, resolved plugin locations, manifest paths, and
    manifest resources as `PathUri`.
    - Inspect and read plugin roots and manifests only through the selected
    environment's `ExecutorFileSystem`.
    - Parse executor manifests with the shared URI-native parser from #29620
    instead of projecting them onto the host filesystem.
    - Enforce resource containment lexically and preserve the root URI's
    POSIX or Windows path convention.
    - Cover foreign Windows plugin roots and URI-native manifest resources.
    
    ```text
    thread/start
      selectedCapabilityRoots[].location.path = "file:///C:/plugins/demo"
                                  | PathUri
                                  v
                        ExecutorFileSystem
                                  |
                                  +--> plugin.json
                                  +--> manifest resources
    ```
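
    Lexical containment means resolving a manifest resource against the root
    URI purely by string segments, without asking any filesystem. It then
    rejects any result that climbs above the root. The function below is a
    hypothetical sketch of that idea, not the `PathUri` API from #29614.

    Its simplifying assumptions are:

    - Only local `file:///` roots are handled.
    - Percent-encoding and Windows drive-letter case-insensitivity are ignored.
    - Both `/` and `\` count as separators in resource paths.

    Keeping the root's first segment (such as `C:`) untouched is what
    preserves a Windows root's convention.

    ```rust
    /// Hypothetical: resolve `relative` under `root_uri` lexically, returning
    /// `None` if the result would escape the root.
    pub fn resolve_contained(root_uri: &str, relative: &str) -> Option<String> {
        // Only local `file:///...` roots are handled; UNC-style hosts are out of scope.
        let root_path = root_uri.strip_prefix("file://")?;
        if !root_path.starts_with('/') {
            return None;
        }
        // Absolute resource paths are rejected outright.
        if relative.starts_with('/') || relative.starts_with('\\') {
            return None;
        }
        let root: Vec<&str> = root_path.split('/').filter(|s| !s.is_empty()).collect();
        let mut resolved = root.clone();
        // Treat both separators as boundaries so `..\` cannot slip past the
        // check on a Windows executor.
        for segment in relative.split(|c: char| c == '/' || c == '\\') {
            match segment {
                "" | "." => {}
                ".." => {
                    // Popping at or above the root's depth means the path escaped.
                    if resolved.len() <= root.len() {
                        return None;
                    }
                    resolved.pop();
                }
                other => resolved.push(other),
            }
        }
        Some(format!("file:///{}", resolved.join("/")))
    }
    ```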
    
    This PR stops at the shared selected-plugin representation. The next two
    PRs remove the remaining host-path projections in the skill and MCP
    consumers.
    
    ## Stack
    
    1. #29614 — add lexical `PathUri` containment.
    2. #29620 — share URI-native manifest path resolution.
    3. **This PR** — keep selected plugin roots and resources URI-native.
    4. #29626 — load executor skills without host path conversion.
    5. #29628 — resolve executor MCP working directories without host path
    conversion.