3916 Commits

  • [codex] Treat max as a first-class reasoning effort (#30467)
    ## Why
    
    The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an
    opaque custom effort. That made the reasoning picker render it as
    lowercase `max` while known efforts use productized labels.
    
    Making `max` a known effort aligns catalog data, parsing, and UI
    presentation without changing the `max` wire value or persisted
    representation.
    
    ## What changed
    
    - Add first-class `ReasoningEffort::Max` parsing and serialization.
    - Use the typed effort in the Bedrock catalog and render it as `Max` in
    the TUI.
    - Preserve forward-compatible custom-effort coverage with a genuinely
    unknown `future` value.
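
    The parsing rule above can be sketched as follows. This is a minimal
    illustration, not the actual Codex type: the `Low`, `Medium`, and `High`
    variants, the `Custom` variant name, and the method names are
    assumptions; only `Max`, its `max` wire value, and the `Max` label come
    from the description.

    ```rust
    /// Hypothetical stand-in for the reasoning-effort type.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum ReasoningEffort {
        Low,    // assumed known effort
        Medium, // assumed known effort
        High,   // assumed known effort
        Max,
        /// Forward-compatible fallback for values this build does not know.
        Custom(String),
    }

    impl ReasoningEffort {
        /// Parse a wire value; unknown values are kept verbatim.
        pub fn parse(wire: &str) -> Self {
            match wire {
                "low" => Self::Low,
                "medium" => Self::Medium,
                "high" => Self::High,
                "max" => Self::Max,
                other => Self::Custom(other.to_string()),
            }
        }

        /// The wire value is unchanged by making `max` a known effort.
        pub fn wire_value(&self) -> &str {
            match self {
                Self::Low => "low",
                Self::Medium => "medium",
                Self::High => "high",
                Self::Max => "max",
                Self::Custom(raw) => raw.as_str(),
            }
        }

        /// Known efforts get a productized label; custom ones render raw.
        pub fn label(&self) -> String {
            match self {
                Self::Low => "Low".to_string(),
                Self::Medium => "Medium".to_string(),
                Self::High => "High".to_string(),
                Self::Max => "Max".to_string(),
                Self::Custom(raw) => raw.clone(),
            }
        }
    }
    ```

    Under these assumptions, `max` parses to the typed variant and renders
    as `Max`, while a value such as `future` still round-trips as a custom
    effort.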
    
    ### Before
    <img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM"
    src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364"
    />
    
    ### After
    <img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM"
    src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17"
    />
  • [codex] Restore v1 delegation guidance (#30511)
    ## Summary
    
    - restore the v1 clarification that requests for depth, research, or
    investigation do not authorize subagent spawning
    - restore guidance for keeping critical-path, urgent, tightly coupled,
    or difficult work local
    - update the focused v1 tool-search and spawn-description coverage
    
    ## Why
    
    PR #27919 simplified the v1 `spawn_agent` prompt by removing its
    delegation decision guidance. That left the authorization rule intact,
    but removed the instructions that constrained what should be delegated
    after spawning was authorized.
    
    Restore those guardrails while preserving later support for explicit
    delegation authorization from applicable AGENTS.md and skill
    instructions. Multi-agent v2 prompts are unchanged.
    
    ## User impact
    
    Models using the v1 multi-agent tool surface receive clearer guidance to
    delegate independent side work while keeping blocking work on the main
    rollout.
    
    ## Validation
    
    - `just fmt`
    - `git diff --check`
    - tests not run locally per repository guidance; CI will validate the
    focused coverage
  • [codex] Use model metadata for skills usage instructions (#29740)
    ## Summary
    
    - add a false-by-default `include_skills_usage_instructions` model
    metadata field
    - enable the field for the bundled `gpt-5.5` model metadata
    - consume the metadata in both core and extension skill rendering
    - remove hardcoded legacy-model matching and its marker plumbing
  • [codex] Enable remote plugins by default (#30297)
    ## Summary
    
    - enable the remote plugin feature by default
    - promote the remote plugin feature from under development to stable
    - preserve the existing `features.remote_plugin` override for explicitly
    disabling it
    - keep legacy disabled-path coverage explicit in TUI and app-server
    tests
    
    ## Impact
    
    Remote plugin functionality is enabled by default for configurations
    that do not set the feature flag. The existing Codex backend
    authentication gate still applies.
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-features`
    - `just test -p codex-tui
    plugins_popup_remote_section_fallback_states_snapshot`
    - targeted `codex-app-server` plugin-list and skills-list tests
    - `git diff --check`
    
    The full TUI and app-server suites were also exercised locally. All
    remote-plugin-related coverage passed; unrelated local
    sandbox/test-binary failures remain outside this change.
  • core: stabilize synthesized call output IDs (#30327)
    ## Why
    
    Response item IDs represent stable conversation identity.
    `ContextManager::for_prompt` repairs an unmatched call by synthesizing
    an `"aborted"` output in the disposable prompt projection, but that
    output previously had no ID. Assigning a fresh ID on every prompt build
    would make retries and resumes change otherwise identical model context
    and reduce prompt-cache reuse.
    
    The concrete bug is that these normalization-created outputs bypass the
    regular item-ID allocation path. Even with item IDs enabled, a prompt
    could therefore contain an identified call paired with a synthetic
    output whose `id` was missing. This change closes that gap by deriving
    the output ID from the source call's item ID. For legacy calls that have
    no item ID, the output remains ID-less because there is no stable source
    identity to derive from.
    
    The originating call already has a stable item ID under the item-ID
    model introduced in #28814. A prompt-only output can therefore derive
    stable identity from that call without mutating canonical history or
    persisted rollouts. This addresses the failure exposed by #30311 while
    keeping normalization read-only outside its detached prompt snapshot.
    
    UUIDv5 is intentional here because it is the standard namespaced,
    deterministic UUID construction. Using the output kind and source call
    ID as the name produces the same UUID on every projection while keeping
    output kinds in separate name domains. UUIDv7 would introduce randomness
    and time, so keeping it stable would require persisting the synthetic
    repair. UUIDv5 uses SHA-1 internally, but this is only an identity
    mapping—not an authenticity or security boundary.
    
    ## What changed
    
    - Derive a deterministic UUIDv5 ID for each synthesized call output from
    the source call item ID.
    - Use the Responses API prefix appropriate for function, custom-tool,
    tool-search, and local-shell outputs.
    - Preserve the existing insertion position immediately after the
    unmatched call.
    - Keep synthesized outputs prompt-only; no rollout, task-lifecycle,
    compaction, or raw-response behavior changes.
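
    The derivation can be illustrated with a dependency-free sketch. It
    deliberately swaps the SHA-1-based UUIDv5 construction for a 64-bit
    FNV-1a hash so that it runs on the standard library alone; it shows the
    same properties (same name in, same ID out; output kinds in separate
    name domains; no ID for legacy calls) but is not the real ID format.
    The `OutputKind` type, the domain strings, and the function names are
    all hypothetical.

    ```rust
    /// Hypothetical output kinds, mirroring the four families listed above.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum OutputKind {
        Function,
        CustomTool,
        ToolSearch,
        LocalShell,
    }

    impl OutputKind {
        /// Made-up domain string; it doubles as an illustrative prefix.
        fn domain(self) -> &'static str {
            match self {
                Self::Function => "function_output",
                Self::CustomTool => "custom_tool_output",
                Self::ToolSearch => "tool_search_output",
                Self::LocalShell => "local_shell_output",
            }
        }
    }

    /// 64-bit FNV-1a, used here only as a stand-in for UUIDv5's SHA-1.
    fn fnv1a64(bytes: &[u8]) -> u64 {
        let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
        for byte in bytes {
            hash ^= u64::from(*byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        hash
    }

    /// Derive a stable ID from the output kind and the source call ID.
    /// Legacy calls without an item ID yield `None`: there is no stable
    /// source identity to derive from.
    pub fn synthetic_output_id(kind: OutputKind, source_call_id: Option<&str>) -> Option<String> {
        let call_id = source_call_id?;
        let name = format!("{}:{}", kind.domain(), call_id);
        Some(format!("{}_{:016x}", kind.domain(), fnv1a64(name.as_bytes())))
    }
    ```

    Because the ID is a pure function of its inputs, rebuilding the prompt
    projection on a retry or resume yields the same value without
    persisting the synthetic repair.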
    
    ## Testing
    
    - `just test -p codex-core
    for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history`
    - `just test -p codex-core
    synthetic_call_output_id_is_stable_across_resumes`
    - `just test -p codex-core normalize_adds_missing_output`
    - `just test -p codex-core response_item_ids`
  • Preserve namespaces on custom tool calls (#30302)
    ## Summary
    
    - Preserve the optional namespace on custom tool calls during response
    deserialization and app-server replay.
    - Use the namespaced tool identifier for streaming argument handling and
    tool dispatch.
    - Regenerate app-server protocol schemas.
    - Add regression tests covering namespace serialization and routing.
    
    ## Testing
    
    - Ran affected protocol and app-server test suites.
    - Ran the full core test suite; two load-sensitive timing tests passed
    when rerun individually.
    - Ran Clippy and formatting checks.
    - Verified with a local end-to-end app-server replay that the namespace
    is preserved through the complete request/response flow.
  • core: overlap diff root discovery with world state (#30286)
    ## Why
    
    Remote diff-root discovery is independent of world-state construction,
    but it ran afterward and added filesystem metadata latency before the
    first model request. Overlap the independent work so thread-cold turns
    do not pay those waits serially.
    
    ## What
    
    - Run `record_context_updates_and_set_reference_context_item` and
    `turn_diff_display_roots` with `tokio::join!`.
    - Reuse the same resolved display roots when constructing
    `TurnDiffTracker`; no cache or behavior lifecycle changes are
    introduced.
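
    The shape of the change can be sketched without an async runtime. The
    example below uses `std::thread::scope` as a stand-in for
    `tokio::join!`, and both worker functions are hypothetical placeholders
    rather than the real Codex functions.

    ```rust
    use std::thread;

    /// Placeholder for the world-state / context-update step.
    fn record_context_updates() -> String {
        "world-state".to_string()
    }

    /// Placeholder for remote diff-root discovery.
    fn discover_display_roots() -> Vec<String> {
        vec!["/workspace".to_string()]
    }

    /// Run both independent steps concurrently and return both results,
    /// so the caller can reuse the resolved roots afterwards.
    pub fn prepare_turn() -> (String, Vec<String>) {
        thread::scope(|scope| {
            // Start root discovery first so it overlaps with the other step.
            let roots = scope.spawn(discover_display_roots);
            let world_state = record_context_updates();
            let roots = roots.join().expect("root discovery panicked");
            (world_state, roots)
        })
    }
    ```

    With the two steps overlapped, the wall-clock cost approaches the
    slower of the two rather than their sum.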
    
    ## Validation
    
    In a synthetic executor-skill benchmark with artificial network delay,
    thread-cold model-request p50 improved from about 1.79 s to about
    1.58 s.
  • [codex] consume pushed exec-server process events (#30273)
    ## Summary
    
    - complete unified-exec processes from the ordered event stream instead
    of issuing a final zero-wait `process/read`
    - add optional executor sandbox-denial state to `process/exited`
    - retain `process/read` as a retained-output and compatibility fallback
    for receiver lag, sequence gaps, and legacy servers
    - recover sandbox-denial state across transport reconnection
    - cover the real `TestCodex` remote-exec path without adding a public
    test-only event constructor
    
    ## Why
    
    A successful one-shot tool call currently receives its output and
    terminal notifications, then pays another wide-area `process/read` round
    trip before returning. Staging traces showed that remote response wait
    accounted for more than 99.8% of RPC time; local serialization,
    queueing, and deserialization were below 0.6 ms.
    
    ## Measured impact
    
    A direct staging A/B used the same build and route and changed only
    completion mode. Each arm ran three times with 30 one-shot
    `/usr/bin/true` calls per run. The table reports the median of the three
    per-run percentiles.
    
    | Metric | Final `process/read` | Pushed events | Change |
    | --- | ---: | ---: | ---: |
    | End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
    | End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
    | Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
    | Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |
    
    TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
    successful, complete, in-order event path issued zero final
    `process/read` calls.
    
    ## Compatibility and recovery
    
    - new servers send `sandboxDenied` on `process/exited`
    - legacy servers omit it, which triggers one compatibility
    `process/read`
    - broadcast lag or a sequence gap triggers a retained-output read
    - recovery remains bounded by the server's existing 1 MiB
    retained-output window
    - complete, in-order event streams issue no completion read
    - sandbox denial is attached to the exit event before consumers can
    observe process completion
    - server-first and client-first rollouts remain wire-compatible;
    server-first realizes the latency win immediately
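
    The fallback rules in this list reduce to a small decision function. The
    sketch below is an assumption-laden illustration: the type and field
    names are invented, and checking stream gaps before the legacy case is
    a guess about ordering.

    ```rust
    /// Which follow-up read, if any, is needed to complete a process.
    #[derive(Debug, PartialEq, Eq)]
    pub enum CompletionRead {
        /// Complete, in-order stream from a new server: no read at all.
        NotNeeded,
        /// Legacy server omitted `sandboxDenied`: one compatibility read.
        Compatibility,
        /// Receiver lag or a sequence gap: recover from retained output.
        RetainedOutput,
    }

    /// Hypothetical summary of what the client saw by the time of exit.
    pub struct ExitObservation {
        /// `None` models a legacy server that omitted `sandboxDenied`.
        pub sandbox_denied: Option<bool>,
        /// True when the receiver lagged or observed a sequence gap.
        pub stream_incomplete: bool,
    }

    pub fn completion_read(exit: &ExitObservation) -> CompletionRead {
        if exit.stream_incomplete {
            return CompletionRead::RetainedOutput;
        }
        match exit.sandbox_denied {
            Some(_) => CompletionRead::NotNeeded,
            None => CompletionRead::Compatibility,
        }
    }
    ```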
    
    ## Integration coverage
    
    The `TestCodex` suite exercises four distinct remote-exec contracts:
    
    - complete pushed output/exit/close with zero reads
    - direct pushed sandbox denial with zero reads
    - legacy missing denial metadata with exactly one compatibility read
    - count-bounded replay eviction recovered from retained output without
    duplication
    
    ## Validation
    
    - `just test -p codex-core
    exec_command_consumes_pushed_remote_process_events`: 4 passed
    - `just test -p codex-core unified_exec::process_tests::`: 4 passed
    - `just test -p codex-exec-server`: 294 passed, 2 skipped
    - `just test -p codex-exec-server-protocol`: 5 passed
    - `just test -p codex-rmcp-client`: 89 passed, 2 skipped
    - focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
    - scoped `just fix` passed for core and exec-server
    - `just fmt` passed
    
    The complete workspace suite was not rerun; focused Cargo and Bazel
    coverage passed for the changed behavior.
  • feat(protocol): define missing rollout turn items (#30282)
    ## Description
    
    This PR adds canonical core `TurnItem` shapes for command execution,
    dynamic tool calls, collab agent tool calls, and sub-agent activity, to
    be stored in the rollout file soon.
    
    It also teaches app-server protocol / `ThreadHistoryBuilder` how to
    render those items, and adds the small legacy fanout helpers needed for
    existing event-based consumers. No core producer or rollout persistence
    behavior changes here; that will be done in a follow-up.
    
    ## Making ThreadHistoryBuilder stateless
    
    This is the first PR in a stack to make `ThreadHistoryBuilder` stateless
    enough that we can materialize app-server `ThreadItem`s from only a
    given slice of `RolloutItem` history, without ever needing to replay the
    whole thread from the beginning.
    
    The persisted legacy `RolloutItem::EventMsg` records are mostly shaped
    like live UI events, not like materialized `ThreadItem`s. They work if
    we replay the full rollout in order, but they often do not contain
    enough stable identity or complete item state to project an arbitrary
    suffix on its own.
    
    A few examples:
    
    - `UserMessageEvent` and `AgentMessageEvent` have content, but
    historically do not carry the persisted app-server item ID that should
    become the SQLite primary key.
    - `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are
    fragments. `ThreadHistoryBuilder` currently merges them into the last
    reasoning item, which means a slice starting in the middle of reasoning
    cannot know whether to append to an earlier item or create a new one.
    - `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and
    similar legacy events can often render a final-looking item, but they
    usually rely on prior replay state to know which turn owns the item.
    - Begin/end legacy events are partial views of one logical item. The
    builder correlates them by `call_id` and mutates prior state to
    synthesize the final `ThreadItem`.
    
    That is the problem this direction fixes. A persisted canonical
    lifecycle record looks much closer to the read model we actually want
    later:
    
    ```rust
    ItemCompletedEvent {
        turn_id,
        item: TurnItem { id, ...full snapshot... },
        completed_at_ms,
    }
    ```
    
    Once rollout has explicit `turn_id`, stable `item.id`, and a canonical
    completed item snapshot, the future SQLite projector can reduce only the
    new rollout suffix and upsert the affected `thread_items` rows. It no
    longer needs to synthesize `item-N`, infer item ownership from the
    active turn, or replay earlier events just to reconstruct the current
    item snapshot.
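
    A suffix-only projector of this kind could look roughly like the
    following. This is a sketch under stated assumptions: an in-memory
    `HashMap` stands in for the future SQLite `thread_items` table, the
    snapshot is a plain string, and every type and function name here is
    hypothetical.

    ```rust
    use std::collections::HashMap;

    /// Simplified canonical lifecycle record.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct ItemCompleted {
        pub turn_id: String,
        pub item_id: String,
        pub snapshot: String,
        pub completed_at_ms: u64,
    }

    /// Simplified row, keyed externally by the stable item ID.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct ThreadItemRow {
        pub turn_id: String,
        pub snapshot: String,
        pub completed_at_ms: u64,
    }

    /// Reduce only the new rollout suffix and upsert the affected rows.
    /// No earlier events are consulted: each record carries its own turn
    /// ownership, identity, and full snapshot.
    pub fn apply_suffix(rows: &mut HashMap<String, ThreadItemRow>, suffix: &[ItemCompleted]) {
        for event in suffix {
            rows.insert(
                event.item_id.clone(),
                ThreadItemRow {
                    turn_id: event.turn_id.clone(),
                    snapshot: event.snapshot.clone(),
                    completed_at_ms: event.completed_at_ms,
                },
            );
        }
    }
    ```

    Because each upsert is keyed by `item.id`, reducing the same suffix
    twice leaves the rows unchanged.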
    
    ## What changed
    
    - Added core `TurnItem` variants and item structs for command execution,
    dynamic tool calls, collab agent tool calls, and sub-agent activity.
    - Added conversions from those canonical items back into the legacy
    event shapes where current consumers still need them.
    - Added app-server v2 `ThreadItem` conversion for the new core item
    variants.
    - Taught `ThreadHistoryBuilder` and rollout persistence metrics to
    recognize the new item variants.
    
    ## Follow-up
    
    The next PR https://github.com/openai/codex/pull/30283 switches the live
    core producers for these item families onto canonical `ItemStarted` /
    `ItemCompleted` events.
  • Close thread persistence when submission channel closes (#30173)
    ### Summary
    
    Release live thread persistence when a session ends because its
    submission channel closes. This prevents a later same-process resume
    from failing with `thread ... already has a live local writer`.
    
    ### Details
    
    The issue is in the `codex-core` session teardown path used by Codex
    hosts, rather than in Managed Agents API or exec-server itself.
    
    Explicit shutdown already closes the `LiveThread`, which releases the
    process-scoped writer held by `LocalThreadStore`. The
    submission-channel-close fallback ran runtime and extension teardown but
    skipped that persistence shutdown, leaving the thread ID registered as
    having a live writer.
    
    This change:
    
    - closes the `LiveThread` on the channel-close fallback path;
    - preserves the existing teardown order used by explicit shutdowns;
    - extends the lifecycle regression test to assert that the thread store
    receives `shutdown_thread`.
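
    The failure mode can be reproduced with a toy writer registry. This is
    a minimal sketch, assuming a process-scoped set of thread IDs with live
    writers; `WriterRegistry` and its methods are hypothetical and are not
    the `LocalThreadStore` API.

    ```rust
    use std::collections::HashSet;

    /// Toy process-scoped registry of threads that have a live writer.
    #[derive(Default)]
    pub struct WriterRegistry {
        live: HashSet<String>,
    }

    impl WriterRegistry {
        /// Register a writer; fails if one is already live for the thread.
        pub fn open(&mut self, thread_id: &str) -> Result<(), String> {
            if !self.live.insert(thread_id.to_string()) {
                return Err(format!("thread {} already has a live local writer", thread_id));
            }
            Ok(())
        }

        /// Release the writer so a later same-process resume can succeed.
        pub fn shutdown_thread(&mut self, thread_id: &str) {
            self.live.remove(thread_id);
        }
    }
    ```

    If a teardown path skips `shutdown_thread`, the second `open` for the
    same thread ID fails; releasing the writer on every teardown path
    removes that failure.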
    
    Context: [original
    report](https://openai.slack.com/archives/C0B4NBHQGTV/p1782136364948039),
    [recent occurrence
    1](https://openai.slack.com/archives/C0B4NBHQGTV/p1782434817895839?thread_ts=1782136364.948039&cid=C0B4NBHQGTV),
    [recent occurrence
    2](https://openai.slack.com/archives/C0B4NBHQGTV/p1782335107474429?thread_ts=1782136364.948039&cid=C0B4NBHQGTV)
    
    ### Testing
    
    - `just test -p codex-core
    submission_loop_channel_close_runs_full_thread_teardown`
    - `just test -p codex-core --lib` (1,989 passed; 3 skipped)
    - `just fix -p codex-core`
    - `just fmt`
    - Native code review: no findings
    
    I also attempted `just test -p codex-core`. The new regression passed;
    79 unrelated integration tests failed in the local harness, primarily
    because helper binaries such as `test_stdio_server` were unavailable,
    plus local proxy/shell timing failures.
  • feat(app-server): add optional turn_id to thread/fork (#30277)
    ## Description
    
    This adds stable optional `turnId` support to `thread/fork`. When
    supplied, the fork copies persisted history through that terminal turn,
    inclusive, and drops later turns from the new thread.
    
    Omitting or passing `null` preserves the existing full-history fork
    behavior, including the interruption marker when the stored source
    history ends mid-turn.
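
    The truncation rule can be sketched as a pure function over turns. This
    is an illustration only: the `Turn` type and `fork_turns` are
    hypothetical, returning `None` for an unknown turn ID is an assumption,
    and the sketch ignores both the terminal-turn check and the
    interruption marker.

    ```rust
    /// Simplified persisted turn.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Turn {
        pub id: String,
    }

    /// Copy history for a fork. With no `turn_id`, copy everything; with
    /// one, copy through that turn (inclusive) and drop later turns.
    pub fn fork_turns(source: &[Turn], turn_id: Option<&str>) -> Option<Vec<Turn>> {
        match turn_id {
            None => Some(source.to_vec()),
            Some(id) => {
                let index = source.iter().position(|turn| turn.id == id)?;
                Some(source[..=index].to_vec())
            }
        }
    }
    ```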
    
    ## Why
    
    We're deprecating `thread/rollback`, and this will help certain UX use
    cases work around it by using `thread/fork` with `turnId` instead.
  • [codex] allow AGENTS.md and skills to authorize delegation (#30274)
    Update the multi-agent v2 (MAv2) prompt to mention AGENTS.md and skills
    more explicitly.
    
    This should mimic https://github.com/openai/codex/pull/27919.
  • [codex] Add managed new-thread model settings (#29683)
    ## Why
    
    Admins need persistent defaults for the model, reasoning effort, and
    service tier shown when the Desktop App creates a new thread. These are
    initialization defaults rather than runtime constraints: the App should
    use them to initialize its draft while still allowing a user to make an
    explicit selection.
    
    The app-server therefore needs to expose the managed values before
    thread creation without changing `thread/start` behavior for other
    clients.
    
    ## What changed
    
    - Parse `model`, `model_reasoning_effort`, and `service_tier` from
    `[models.new_thread]` in `requirements.toml`.
    - Compose the `models` requirements through the existing
    requirements-layer precedence rules.
    - Expose the resolved values through `configRequirements/read` as
    `requirements.models.newThread`.
    - Add the corresponding app-server protocol types and regenerate the
    JSON and TypeScript schema fixtures.
    - Document the new `configRequirements/read` fields in the app-server
    README.
    
    ## Scope
    
    This PR is data plumbing only. It does not apply these values during
    `thread/start` and does not change thread creation for existing
    app-server clients, resumed or forked sessions, internal or subagent
    sessions, `codex exec`, or the TUI. A companion Desktop App change owns
    draft initialization, sends the effective settings for ordinary and
    prewarmed starts, and preserves explicit user changes.
    
    ## Validation
    
    - Requirements deserialization coverage for `[models.new_thread]`
    - Requirements-layer precedence coverage
    - App-server API mapping coverage
    - `configRequirements/read` integration coverage
    - Regenerated app-server JSON and TypeScript schema fixtures
  • fix main (#30276)
    Fixes a break on `main` introduced by a merge race around
    `thread.history_mode`.
  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
    This will be stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N or newer is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
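
    The fail-closed rules can be sketched as two small checks. The
    `Operation` enum, the function names, and the error strings below are
    hypothetical; the mapping of operations to allowed or rejected follows
    the list above.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ThreadHistoryMode {
        Legacy,
        Paginated,
    }

    /// Hypothetical operations a client may attempt on a stored thread.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Operation {
        List,
        ReadMetadata,  // thread/read(includeTurns=false)
        ReadWithTurns, // thread/read(includeTurns=true)
        Resume,
    }

    /// A missing value defaults to legacy; an unknown value is an error
    /// rather than being silently treated as legacy.
    pub fn parse_history_mode(raw: Option<&str>) -> Result<ThreadHistoryMode, String> {
        match raw {
            None | Some("legacy") => Ok(ThreadHistoryMode::Legacy),
            Some("paginated") => Ok(ThreadHistoryMode::Paginated),
            Some(other) => Err(format!("unsupported historyMode: {}", other)),
        }
    }

    /// Metadata-only paths stay available; full-history paths fail closed.
    pub fn check_operation(mode: ThreadHistoryMode, op: Operation) -> Result<(), String> {
        match (mode, op) {
            (ThreadHistoryMode::Paginated, Operation::ReadWithTurns | Operation::Resume) => {
                Err("paginated threads are not supported for this operation".to_string())
            }
            _ => Ok(()),
        }
    }
    ```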
  • Reuse MCP runtimes when selected availability changes nothing (#30148)
    ## Why
    
    MCP runtime reuse was keyed by every ready selected-capability
    environment, even when an environment contributed no MCP servers or
    connectors.
    
    For example:
    
    1. a global stdio MCP is running;
    2. a selected remote environment contains only a skill;
    3. that environment becomes ready;
    4. the MCP and connector projection stays exactly the same;
    5. Codex nevertheless rebuilds the MCP manager and restarts the global
    stdio process.
    
    That restart can interrupt active calls and discard process-local state
    even though nothing about MCP changed.
    
    ## What changes
    
    When selected-environment availability changes, Codex now resolves the
    candidate MCP and connector projection before deciding whether to
    replace the runtime:
    
    - if the winning MCP servers or their ownership change, rebuild as
    before;
    - if the selected connector snapshot changes, rebuild as before;
    - if an enabled MCP is explicitly bound to an environment whose
    availability changed, rebuild as before;
    - otherwise, keep the exact live manager and processes, and update only
    the availability input remembered by the snapshot.
    
    ```text
    ready selected environments:  [] -> [skills-env]
    resolved MCP servers:          {global_probe} -> {global_probe}
    resolved connectors:           {} -> {}
    result:                         reuse manager; keep the same process
    ```
    
    The comparison uses the resolved winning servers and their sources, so
    plugin/config ownership remains part of the runtime identity.
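
    The reuse decision can be sketched as a comparison of resolved
    projections. Everything here is a simplified assumption: `Projection`,
    `Decision`, and `decide` are hypothetical names, and servers,
    connectors, and environments are reduced to plain strings.

    ```rust
    use std::collections::{BTreeMap, BTreeSet};

    /// Simplified resolved MCP and connector projection.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Projection {
        /// Winning server name mapped to the source that owns it.
        pub servers: BTreeMap<String, String>,
        /// Selected connector snapshot.
        pub connectors: BTreeSet<String>,
        /// Environments that enabled MCP servers are explicitly bound to.
        pub bound_environments: BTreeSet<String>,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum Decision {
        Reuse,
        Rebuild,
    }

    /// Rebuild only when something MCP-relevant actually changed.
    pub fn decide(
        current: &Projection,
        candidate: &Projection,
        changed_environments: &BTreeSet<String>,
    ) -> Decision {
        let servers_changed = current.servers != candidate.servers;
        let connectors_changed = current.connectors != candidate.connectors;
        let bound_environment_changed = candidate
            .bound_environments
            .iter()
            .any(|env| changed_environments.contains(env));
        if servers_changed || connectors_changed || bound_environment_changed {
            Decision::Rebuild
        } else {
            Decision::Reuse
        }
    }
    ```

    In the `skills-env` example above, the servers and connectors compare
    equal and no enabled MCP is bound to the changed environment, so the
    sketch returns `Reuse`.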
    
    ## Existing stack coverage
    
    The integration PR directly below this one already covers both rebuild
    boundaries: a selected MCP becomes callable and a selected connector
    tool becomes model-visible when their environment becomes available. It
    also verifies that an unchanged selected MCP runtime keeps its process.
    
    This PR does not add another remote-attachment integration scenario for
    the no-change optimization. `environment/add` returns before readiness,
    and app-server does not currently expose a deterministic readiness
    signal for an environment that contributes only skills. Keeping a
    fixed-delay test would add flake risk; adding a new readiness API would
    be outside this fix.
    
    ## Scope and assumptions
    
    - This does not change skill discovery, World State rendering, or plugin
    metadata caching.
    - This does not add file watching or hot reload behavior.
    - This does not change disconnect/reconnect handling.
    - Selected environment IDs and their capability contents retain the
    stack's existing stability assumption.
    - Delayed `required = true` executor MCP behavior remains out of scope.
  • [codex] wire process-owned code mode host into core (#30142)
    ## Summary
    
    - add the `code_mode_host` feature flag and select
    `ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
    - initialize code-mode sessions lazily so a missing host reports a tool
    error without failing thread startup
    - resolve `codex-code-mode-host` beside the running Codex binary by
    default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
    - add unit and end-to-end coverage for host resolution and graceful
    missing-host behavior
    
    ## Why
    
    This wires the process-owned session client from #30112 into the core
    service behind an opt-in rollout gate. Packaged Codex installations can
    place the helper in the same `bin` directory as the main executable
    without relying on `PATH`, while development and custom installations
    can continue to override the helper path.
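
    The resolution order can be sketched as a pure function. This is a
    minimal illustration with hypothetical names; it takes the override and
    the current executable path as arguments rather than reading the
    environment, treats an empty override as unset (an assumption), and
    ignores platform details such as an `.exe` suffix.

    ```rust
    use std::path::{Path, PathBuf};

    pub const HOST_BINARY_NAME: &str = "codex-code-mode-host";

    /// Prefer the explicit override; otherwise look beside the running
    /// binary. Returns `None` if the executable path has no parent.
    pub fn resolve_host_path(override_path: Option<&str>, current_exe: &Path) -> Option<PathBuf> {
        if let Some(path) = override_path.filter(|path| !path.is_empty()) {
            return Some(PathBuf::from(path));
        }
        current_exe.parent().map(|dir| dir.join(HOST_BINARY_NAME))
    }
    ```

    A caller would typically pass the value of `CODEX_CODE_MODE_HOST_PATH`
    and the result of `std::env::current_exe()`. Returning `None` instead
    of failing fits the lazy initialization described above, where a
    missing host surfaces as a tool error rather than a startup failure.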
    
    ## Stack
    
    - Depends on #30112
    - Base branch: `cconger/process-owned-session-runtime-4-client`
    
    ## Validation
    
    - Build `codex` and `codex-code-mode-host`.
    - Run
    `CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host"
    ./target/debug/codex --enable code_mode_host`.
  • Retry failed Codex Apps MCP startup (#29920)
    ## Problem
    
    The built-in Codex Apps MCP client shares a future for the full startup
    operation: connect, complete `initialize`, fetch the initial tools, and
    return a usable client. Sharing deduplicates startup work, but it also
    memoizes terminal errors.
    
    After a transient connection, handshake, or initial `tools/list`
    failure, later tool builds observe the same failed future. The thread
    cannot reconnect after the backend recovers and continues serving its
    startup-time cached tool snapshot, which may be empty or stale.
    
    ## Fix
    
    When Apps MCP startup ends in an error, Codex starts bounded recovery
    without putting startup latency on tool-router construction:
    
    1. The current tool build immediately continues with the cached startup
    snapshot.
    2. After the initial failure is reported, Codex starts one fresh full
    startup attempt in the background.
    3. Concurrent tool builds share that in-flight attempt and also continue
    with cached tools.
    4. On success, the recovered client becomes active, refreshes the Apps
    tools cache, emits a `Ready` startup status, and is reused by later
    operations.
    5. On failure, the cache remains unchanged and later tool builds may
    start another background attempt after exponential cooldown: 1s, 2s, 4s,
    8s, 16s, then 30s maximum.
    
    Each recreated startup performs a fresh MCP `initialize` and uncached
    `tools/list`. The MCP client retains its existing bounded retries for
    retryable `initialize` and `tools/list` failures.
    
    This avoids adding the Apps startup timeout to every request during a
    sustained outage.
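
    The cooldown schedule from step 5 can be written as a small function.
    This is a sketch: the function name and the convention of counting
    consecutive failed background attempts are assumptions; the durations
    match the sequence listed above.

    ```rust
    use std::time::Duration;

    /// Cooldown before the next background attempt, given how many
    /// consecutive attempts have failed: 1s, 2s, 4s, 8s, 16s, then 30s.
    pub fn cooldown_after(failed_attempts: u32) -> Duration {
        if failed_attempts == 0 {
            return Duration::ZERO;
        }
        // Cap the exponent so the shift cannot overflow.
        let exponent = (failed_attempts - 1).min(5);
        Duration::from_secs((1u64 << exponent).min(30))
    }
    ```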
    
    ## Scope
    
    This is limited to the built-in Codex Apps MCP client:
    
    - no reconnects for user-configured MCP servers;
    - no cache deletion; and
    - no proactive refresh for a healthy client with stale tools.
    
    ## Tests
    
    Coverage verifies:
    
    - tool builds return cached tools without waiting for a blocked
    reconnect;
    - concurrent tool builds start only one background reconnect;
    - failed reconnects preserve cached tools and respect exponential
    cooldown;
    - a recovered client is retained and reused; and
    - a long-lived thread exposes recovered app tools on a later follow-up.
    
    Validation:
    
    - `just test -p codex-mcp` — 95 passed
    - `just test -p codex-core
    later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures
    --no-capture` — passed
    - `just fix -p codex-mcp`
    - `just fmt`
  • [codex] fix terminal rollout event durability (#30144)
    Currently, session code does not flush the thread store after appending
    the `TurnComplete` / `TurnAborted` events.
    
    This isn't a problem in practice for local storage because
    `append_items` itself effectively blocks, but thread stores that buffer
    in `append_items` and commit only on flush effectively never persist
    these events.
    
    The fix adds explicit rollout flushes at the terminal emitters after
    normal completion and interruption.
    
    Added test cases that assert the number of flushes when completing or
    aborting turns. These are admittedly a little brittle and I'm open to
    better ideas on how to add automated testing.
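
    The durability gap is easy to see with a toy buffering store. This is a
    sketch with hypothetical types, not the thread-store trait: appended
    items sit in a pending buffer until `flush` commits them.

    ```rust
    /// Toy store that buffers appends and commits only on flush.
    #[derive(Default)]
    pub struct BufferingStore {
        pending: Vec<String>,
        committed: Vec<String>,
        flushes: usize,
    }

    impl BufferingStore {
        pub fn append_items(&mut self, items: &[&str]) {
            self.pending.extend(items.iter().map(|item| item.to_string()));
        }

        pub fn flush(&mut self) {
            self.committed.append(&mut self.pending);
            self.flushes += 1;
        }

        pub fn committed(&self) -> &[String] {
            &self.committed
        }

        pub fn flush_count(&self) -> usize {
            self.flushes
        }
    }

    /// Terminal emitter with the explicit flush the fix adds.
    pub fn emit_terminal_event(store: &mut BufferingStore, event: &str) {
        store.append_items(&[event]);
        store.flush();
    }
    ```

    Without the flush, a terminal event appended last would stay in
    `pending` indefinitely; counting flushes, as the added tests do, is one
    way to assert that the emitter commits it.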
  • [codex] allow CCA image generation and web search extensions (#29909)
    ## Summary
    
    - allow the standalone image-generation and web-search extensions for
    the actor-authorized provider shape used by CCA
    - preserve builtin `image_generation` and `web_search` for older models
    and existing flows
    - keep ordinary non-OpenAI providers excluded from both extensions
    - remove only the image extension local managed-AuthManager requirement
    that CCA cannot satisfy
    - share actor-authorization detection through `ModelProviderInfo`
    - keep Core tests focused on routing behavior and cover header-shape
    edge cases in `model-provider-info`
    - add a Responses Lite regression that verifies both
    `image_gen.imagegen` and `web.run`
    
    ## Why
    
    CCA uses a provider named `local` with `requires_openai_auth: false` and
    a non-empty `x-openai-actor-authorization` header. Core accepts that
    provider shape, but both extension provider-name gates rejected it;
    image generation additionally required a Codex-managed login.
    
    The standalone paths must coexist with existing builtin tools. New
    Responses Lite models can receive `image_gen.imagegen` and `web.run`,
    while older models continue using builtin tools.
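
    The provider shape described above can be detected with a check along
    these lines. This is a sketch, not the `ModelProviderInfo` API: the
    struct, its fields, and the method name are hypothetical, and matching
    the header name case-insensitively and trimming the value are
    assumptions about the edge cases.

    ```rust
    use std::collections::HashMap;

    pub const ACTOR_AUTHORIZATION_HEADER: &str = "x-openai-actor-authorization";

    /// Simplified provider description.
    #[derive(Debug, Default)]
    pub struct ProviderInfo {
        pub name: String,
        pub requires_openai_auth: bool,
        pub http_headers: HashMap<String, String>,
    }

    impl ProviderInfo {
        /// True for providers that skip OpenAI auth but carry a non-empty
        /// actor-authorization header.
        pub fn is_actor_authorized(&self) -> bool {
            !self.requires_openai_auth
                && self.http_headers.iter().any(|(name, value)| {
                    name.eq_ignore_ascii_case(ACTOR_AUTHORIZATION_HEADER)
                        && !value.trim().is_empty()
                })
        }
    }
    ```

    Sharing one predicate like this lets both extension gates accept the
    same provider shape instead of each matching on the provider name.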
    
    ## Impact
    
    This enables both standalone extensions for CCA once installed
    downstream, without removing or changing builtin-tool compatibility for
    older models.
    
    ## Validation
    
    - `just test -p codex-core
    responses_lite_exposes_standalone_tools_for_actor_authorized_provider`
    - `just test -p codex-core
    responses_lite_uses_standalone_web_search_and_image_generation`
    - `just test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-web-search-extension`
    - `just test -p codex-model-provider-info`
    - `just fmt`
    - `git diff --check`
  • Expose MCP app identity in app context (#29934)
    ## Why
    
    MCP tool-call events need to expose trusted app identity and action
    metadata directly so v2 clients do not have to infer it from tool names
    or resource URIs.
    
    ## What changed
    
    - Add optional `appName`, `templateId`, and `actionName` fields to MCP
    tool-call `appContext`.
    - Populate `appName` and `templateId` from trusted Codex Apps metadata,
    and derive `actionName` from the trusted app resource metadata.
    - Preserve all three fields through core events, legacy protocol events,
    persisted thread history, resume redaction, and app-server v2 responses.
    - Document the public `appContext` fields in
    `codex-rs/app-server/README.md`.
    - Regenerate app-server JSON and TypeScript schemas and add coverage for
    serialization, persistence, redaction, and metadata propagation.
    
    ## Validation
    
    - `just test -p codex-app-server-protocol mcp_tool_call`
    - `just test -p codex-core
    mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
    mcp_tool_call_item_includes_app_identity`
    - `just write-app-server-schema`
    
    ---------
    
    Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>
  • Keep MCP elicitation routable across runtime refreshes (#30127)
    ## Why
    
    An MCP tool call can still be waiting for an elicitation response when
    an environment update replaces the thread's MCP runtime.
    
    Before this change:
    
    ```text
    runtime A starts a tool call and asks the user
    environment becomes ready, so runtime B is published
    client answers the prompt through runtime B
    runtime B cannot find runtime A's pending responder
    ```
    
    The response is lost and the original tool call stays blocked.
    
    ## What changed
    
    All MCP runtimes for one thread now share a small elicitation router:
    
    ```text
    runtime A ---\
                   shared router: response token -> exact pending responder
    runtime B ---/
    ```
    
    When Codex surfaces an MCP elicitation, it assigns a unique opaque
    response token. The router records which pending request owns that
    token. A replacement runtime reuses the same router, so the latest
    runtime can deliver a response to a request started by the previous
    runtime.
    
    The Codex-owned token also prevents two runtime connections that reuse
    the same MCP server request ID from receiving each other's responses.
    
    This does not retain or search old MCP managers. Only the pending
    responder map is shared.
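    
    A minimal sketch of the shared-router idea, not the Codex implementation.
    The names `ElicitationRouter`, `register`, and `respond` are hypothetical,
    and a plain `u64` counter stands in for the opaque response token:
    
    ```rust
    use std::collections::HashMap;
    use std::sync::mpsc::{channel, Receiver, Sender};
    use std::sync::{Arc, Mutex};
    
    #[derive(Default)]
    struct Inner {
        next_token: u64,
        // response token -> sender that wakes the exact pending request
        pending: HashMap<u64, Sender<String>>,
    }
    
    /// Hypothetical router shared by every MCP runtime of one thread.
    #[derive(Clone, Default)]
    pub struct ElicitationRouter {
        inner: Arc<Mutex<Inner>>,
    }
    
    impl ElicitationRouter {
        /// Registers a pending request and returns its token plus the
        /// receiver that the blocked tool call waits on.
        pub fn register(&self) -> (u64, Receiver<String>) {
            let (tx, rx) = channel();
            let mut inner = self.inner.lock().unwrap();
            let token = inner.next_token;
            inner.next_token += 1;
            inner.pending.insert(token, tx);
            (token, rx)
        }
    
        /// Delivers a response by token; returns false for unknown tokens.
        pub fn respond(&self, token: u64, answer: &str) -> bool {
            let responder = self.inner.lock().unwrap().pending.remove(&token);
            match responder {
                Some(tx) => tx.send(answer.to_string()).is_ok(),
                None => false,
            }
        }
    }
    ```
    
    Cloning the router models a replacement runtime reusing it: a response
    sent through the clone still reaches the receiver created through the
    original handle.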
    
    ## Covered scenario
    
    The integration test exercises the complete failure mode:
    
    1. A thread starts while its selected environment is still unavailable.
    2. A configured MCP server starts a tool call and asks the client for
    input.
    3. The environment becomes ready, causing Codex to publish a replacement
    MCP runtime.
    4. The client answers the original prompt after the replacement.
    5. The original tool call receives that answer and completes.
    
    A focused routing test also creates two runtimes with the same server
    request ID and verifies that each response reaches the exact request
    that emitted its token.
    
    ## Scope
    
    This PR changes only elicitation response routing across MCP runtime
    replacement. It does not change when runtimes are rebuilt, which
    environments contribute MCP configuration, or how environment
    availability is detected.
  • Reinject missing World State fragments on resume (#30152)
    ## Why
    
    World State restores its structured snapshot on resume so unchanged
    sections do not have to be rendered again. That is safe only when the
    model-visible fragment represented by the snapshot is still present in
    retained history.
    
    For selected executor skills, the failing selected-capability scenario
    exposed this state:
    
    ```text
    persisted World State: selected skill catalog is known
    retained model history: selected skill catalog message is missing
    next diff: unchanged, so emit nothing
    ```
    
    The model resumes without being told about the selected skill catalog.
    
    ## What changed
    
    World State contributions may now optionally describe the concrete
    model-visible fragment that must remain in retained history.
    
    When a persisted snapshot is present:
    
    ```text
    matching retained fragment exists -> trust snapshot, emit nothing
    matching retained fragment missing -> treat section as absent, render current state once
    ```
    
    The skills extension uses this for non-empty selected-environment
    catalogs by matching its exact rendered catalog body. Empty or hidden
    catalogs do not require a fragment.
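    
    The decision above can be sketched as a small pure function. This is an
    illustration only: `ResumeAction` and `resume_action` are hypothetical
    names, and retained history is modeled as a slice of message bodies:
    
    ```rust
    /// What to do with one persisted World State section on resume.
    #[derive(Debug, PartialEq)]
    pub enum ResumeAction {
        /// The snapshot is trusted and nothing is emitted.
        TrustSnapshot,
        /// The section is treated as absent and rendered once.
        RenderAgain,
    }
    
    /// `required_fragment` is `None` when the section needs no retained
    /// fragment (for example an empty or hidden catalog).
    pub fn resume_action(
        required_fragment: Option<&str>,
        retained_history: &[&str],
    ) -> ResumeAction {
        match required_fragment {
            None => ResumeAction::TrustSnapshot,
            Some(body) if retained_history.iter().any(|msg| *msg == body) => {
                ResumeAction::TrustSnapshot
            }
            Some(_) => ResumeAction::RenderAgain,
        }
    }
    ```
    
    Matching on the exact body mirrors the skills extension's rule; a real
    matcher may compare structured fragments instead of strings.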
    
    ## Scope
    
    This does not clear or rebuild the whole World State baseline. It does
    not change skill discovery, cache invalidation, environment
    availability, or MCP runtime behavior. It only keeps a persisted section
    snapshot and its retained model context consistent across resume/history
    reconstruction.
    
    ## Coverage
    
    A focused World State regression test verifies both sides:
    
    - a missing retained fragment is rendered again
    - a matching retained fragment avoids duplicate injection
  • Project selected plugin runtime by environment availability (#30093)
    ## Why
    
    Selected plugin metadata is stable, but MCP processes are live runtime
    state. They need different lifetimes:
    
    - the MCP extension caches manifest, MCP, and connector declarations for
    each stable selected root;
    - each model step projects that cached metadata through the roots that
    resolved as ready for that exact step;
    - the MCP manager is rebuilt only when that availability projection
    changes.
    
    This matches executor skills: both features consume the same resolved
    step roots instead of inferring readiness from the turn's selected
    environments.
    
    ## Behavior
    
    ```text
    E1 not ready for this step
      -> no E1 MCP servers or connectors
      -> cached plugin metadata stays in ext/mcp
    
    E1 becomes ready
      -> reuse cached metadata
      -> publish one MCP runtime containing E1 capabilities
    
    same ready roots on the next step
      -> reuse the exact runtime; no rediscovery and no MCP restart
    
    resume
      -> create new extension thread state and a new MCP runtime
    ```
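    
    A rough sketch of the "rebuild only when the projection changes" rule.
    `RuntimeProjection` and `project` are hypothetical names, roots are
    modeled as plain strings, and the sketch assumes the first step always
    publishes a runtime:
    
    ```rust
    use std::collections::BTreeSet;
    
    /// Hypothetical holder that republishes the MCP runtime only when the
    /// set of ready selected roots changes.
    #[derive(Default)]
    pub struct RuntimeProjection {
        ready_roots: Option<BTreeSet<String>>,
        pub rebuilds: usize,
    }
    
    impl RuntimeProjection {
        /// Returns true when this step had to publish a new runtime.
        pub fn project(&mut self, ready: &[&str]) -> bool {
            let next: BTreeSet<String> = ready.iter().map(|r| r.to_string()).collect();
            if self.ready_roots.as_ref() == Some(&next) {
                return false; // same ready roots: reuse the exact runtime
            }
            self.ready_roots = Some(next);
            self.rebuilds += 1;
            true
        }
    }
    ```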
    
    All model-facing consumers use the same step snapshot:
    
    ```text
    resolved selected roots
            |
            v
    extension MCP/connector projection
            |
            v
    { MCP config, connector snapshot, MCP manager }
            |
            +-> advertise model tools
            +-> build app/connector tools
            +-> execute MCP calls
    ```
    
    ## Cache contract
    
    The existing MCP extension owns a cache keyed by the full
    `SelectedCapabilityRoot`:
    
    ```rust
    let state = thread_store.get_or_init(SelectedExecutorPluginMcpState::default);
    ```
    
    The cache lives with extension thread state. Environment availability
    filters projection but does not invalidate metadata. Resume creates new
    thread state. There is no file watcher or executor generation because
    contents behind a stable environment/root are assumed stable.
    
    ## What changes
    
    - Keeps executor plugin discovery and cached metadata in `ext/mcp`.
    - Caches MCP and connector declarations together per selected root.
    - Uses the step's already-resolved capability roots, including lazy
    environments that are not turn environments.
    - Reuses the current MCP runtime when the ready-root projection is
    unchanged.
    - Uses the same step MCP manager and connector snapshot for
    model-visible tools and execution.
    - Resolves direct thread-scoped MCP requests from the current
    selected-root projection.
    
    ## Deliberately out of scope
    
    - `app/list` remains based on the latest global host-plugin state; this
    PR does not make its response or notifications thread-specific.
    - `required = true` startup semantics do not apply to delayed executor
    MCP activation.
    - No filesystem/content invalidation.
    - No transport-disconnect watcher.
    - No executor generations or environment replacement semantics.
    - No client sharing across complete manager replacements.
    
    ## Stack
    
    1. Extension-owned World State sections.
    2. Project executor skills through World State.
    3. Pin one MCP runtime to each model step.
    4. **This PR:** project selected MCP and connector state from
    extension-owned metadata.
    5. Integration coverage for selected capability availability and resume.
    
    ## Verification
    
    - `selected_plugin_servers_use_managed_requirements_for_the_selected_root_id`
    - The stacked integration PR covers unavailable-to-ready activation,
    unchanged-runtime reuse, skills, MCP tools, connector attribution, and
    cold resume.
  • Pin MCP runtimes to model steps (#30101)
    ## Why
    
    An MCP refresh can replace the session's current manager while a model
    step is still running. The step must execute calls through the same
    manager whose tools it advertised.
    
    ## Boundary
    
    ```text
    current session MCP runtime
              |
              | capture once for this model step
              v
    StepContext.mcp
      - exact MCP config
      - exact connection manager
      - exact runtime environment context
    ```
    
    ```rust
    pub struct McpRuntimeSnapshot {
        config: Arc<McpConfig>,
        manager: Arc<McpConnectionManager>,
        runtime_context: McpRuntimeContext,
    }
    ```
    
    ## Example
    
    ```text
    step A captures runtime A and advertises A's tools
    refresh publishes runtime B
    step A tool call -> runtime A
    next step        -> runtime B
    ```
    
    Capturing the snapshot is only an `Arc` clone. It does not restart MCPs
    or make an RPC.
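    
    A self-contained sketch of the capture-and-publish pattern, assuming a
    hypothetical `Session` that stores the current runtime behind an
    `RwLock<Arc<..>>`. The `Runtime` type is a stand-in, not
    `McpRuntimeSnapshot`:
    
    ```rust
    use std::sync::{Arc, RwLock};
    
    /// Hypothetical stand-in for a live MCP runtime.
    pub struct Runtime {
        pub name: &'static str,
    }
    
    /// Session-level slot holding the current runtime.
    pub struct Session {
        current: RwLock<Arc<Runtime>>,
    }
    
    impl Session {
        pub fn new(runtime: Runtime) -> Self {
            Session { current: RwLock::new(Arc::new(runtime)) }
        }
    
        /// Capturing a step snapshot only clones the `Arc`.
        pub fn capture(&self) -> Arc<Runtime> {
            self.current.read().unwrap().clone()
        }
    
        /// Publishing swaps the slot; in-flight steps keep their own `Arc`.
        pub fn publish(&self, runtime: Runtime) {
            *self.current.write().unwrap() = Arc::new(runtime);
        }
    }
    ```
    
    A step that captured runtime A before `publish` keeps using A, and the
    next `capture` returns B. A is dropped once the last holder releases it.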
    
    ## What changes
    
    - Captures one MCP runtime in `StepContext`.
    - Uses it for tool planning, tool calls, resources, approvals, connector
    attribution, and elicitation.
    - Publishes replacement runtimes atomically.
    - Lets an old runtime live only while an in-flight step or request still
    holds its `Arc`.
    
    Most of this diff is mechanical routing from the session-global manager
    to `step_context.mcp`; it does not introduce selected-plugin discovery
    yet.
    
    ## What does not change
    
    - No plugin or extension migration.
    - No new MCP cache policy.
    - No environment file watching.
    - No client sharing between separate managers.
    
    ## Stack
    
    1. Extension-owned World State sections.
    2. Project executor skills through World State.
    3. **This PR:** pin one MCP runtime to each model step.
    4. Project selected MCP/app/connector metadata by environment
    availability.
    5. One end-to-end integration scenario.
  • Project executor skills through World State (#30088)
    ## Why
    
    A selected executor environment can be unavailable in one model step and
    ready in the next. The model should see its skills only while that
    environment is ready, without rescanning stable files on every sample.
    
    The product assumption is simple:
    
    - an environment ID names one stable logical environment;
    - the selected root contents do not change during the thread.
    
    ## Behavior
    
    ```text
    E1 unavailable -> do not show E1 skills
    E1 ready       -> discover once, cache, show through World State
    E1 unavailable -> hide skills, keep cache
    E1 ready again -> reuse cache, show skills again
    resume         -> create a new thread cache and discover again
    ```
    
    The cache key is the full `SelectedCapabilityRoot`. Availability does
    not invalidate it; dropping the extension's thread state does.
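    
    The behavior table can be sketched with a per-thread map. Everything
    here is illustrative: `SkillCache` is a hypothetical type, roots are
    plain strings rather than `SelectedCapabilityRoot`, and discovery is
    faked with a single generated skill name:
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical per-thread cache: selected root -> discovered skills.
    #[derive(Default)]
    pub struct SkillCache {
        by_root: HashMap<String, Vec<String>>,
        pub discoveries: usize,
    }
    
    impl SkillCache {
        /// Returns the skills visible for this step. Roots that are not
        /// ready contribute nothing, but their cached entries are kept.
        pub fn visible(&mut self, selected: &[&str], ready: &[&str]) -> Vec<String> {
            let mut out = Vec::new();
            for root in selected {
                if !ready.contains(root) {
                    continue;
                }
                if !self.by_root.contains_key(*root) {
                    self.discoveries += 1;
                    // Stand-in for the one-time executor filesystem scan.
                    let skills = vec![format!("{}:demo", root)];
                    self.by_root.insert(root.to_string(), skills);
                }
                out.extend(self.by_root[*root].iter().cloned());
            }
            out
        }
    }
    ```
    
    Dropping the `SkillCache` value models resume: the next thread starts
    with an empty cache and discovers again.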
    
    The step supplies the ready selected roots directly. They do not have to
    be turn environments:
    
    ```text
    turn environment: laptop
    selected root:    worker:/plugins/lint-fix
    
    worker ready -> lint-fix skills are visible
    ```
    
    ## What changes
    
    - Keeps executor skill catalogs in the existing skills extension.
    - Passes the roots resolved as ready for the step into World State
    contributors.
    - Loads each ready selected root at most once per thread.
    - Contributes the executor catalog as the `skills` World State section.
    - Uses the exact step catalog for explicit skill selection and body
    reads.
    - Leaves host and orchestrator skill behavior where it already lives.
    
    Taking a step snapshot itself does not add an RPC. Executor filesystem
    calls happen only on the first discovery of a stable root for that
    thread.
    
    ## What does not change
    
    - No filesystem watcher or content-based invalidation.
    - No retry/generation framework.
    - No skill runtime migration into core.
    - No general rewrite of the skills extension.
    
    ## Stack
    
    1. Extension-owned World State sections.
    2. **This PR:** project cached executor skills through World State.
    3. Pin one MCP runtime to each model step.
    4. Project selected MCP/app/connector metadata by environment
    availability.
    5. One end-to-end integration scenario.
  • Recognize Work web and mobile thread originators (#29988)
    ## Summary
    
    - recognize `codex_work_web` and `codex_work_mobile` as supported
    `thread/start.serviceName` values
    - use the recognized value as the thread-scoped originator, with the
    same persistence and request propagation added for `codex_work_desktop`
    - cover precedence over persisted and inherited originators
    
    This is the Codex consumer for the service names introduced by
    [openai/openai#1073178](https://github.com/openai/openai/pull/1073178).
    
    ## Rollout / Compatibility
    
    The producer is ChatGPT's app-server integration in
    openai/openai#1073178. This PR is the Codex app-server consumer that
    converts those service names into the outgoing per-thread `originator`.
    
    Until this change is deployed, the new service names are ignored and
    Codex continues using its fallback originator. Deploy this mapper and
    the matching codex-backend compatibility change in
    [openai/openai#1073594](https://github.com/openai/openai/pull/1073594)
    while the existing Flora egress overwrite remains in place. Remove that
    overwrite in
    [openai/openai#1073197](https://github.com/openai/openai/pull/1073197)
    only after both consumers are deployed.
    
    ## Validation
    
    - `just test -p codex-core
    effective_originator_prefers_thread_scoped_sources_before_env_originator`
    - `just fix -p codex-core`
    - `just fmt`
  • Let extensions contribute World State sections (#30100)
    ## Why
    
    #29856 already owns the durable thread intent and exact environment
    binding. This PR adds only the small missing extension boundary: an
    extension can contribute one named World State section, while core still
    owns persistence, diffing, and model-visible fragment types.
    
    This lets skills stay in the skills extension instead of moving their
    runtime into core.
    
    ## Shape
    
    ```text
    extension-owned state
            |
            | contribute section id + JSON snapshot + renderer
            v
    core World State
            |
            | compare with the previous snapshot
            v
    no message, or one incremental model-visible update
    ```
    
    The extension API is deliberately small:
    
    ```rust
    fn contribute_world_state(...) -> Vec<WorldStateSectionContribution>
    ```
    
    Core adapts the rendered result to `ContextualUserFragment`, records the
    snapshot, and keeps the existing compaction/resume behavior.
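    
    As a rough illustration of the compare step, snapshots can be modeled
    as a map from section id to serialized JSON. `changed_sections` is a
    hypothetical helper, not part of the extension API:
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Returns the ids of sections whose snapshot differs from the previous
    /// one. An empty result means no model-visible message is needed.
    pub fn changed_sections(
        previous: &BTreeMap<String, String>,
        current: &BTreeMap<String, String>,
    ) -> Vec<String> {
        current
            .iter()
            .filter(|(id, snapshot)| previous.get(*id) != Some(*snapshot))
            .map(|(id, _)| id.clone())
            .collect()
    }
    ```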
    
    ## What changes
    
    - Adds extension-owned World State section contributions.
    - Calls those contributors from the existing per-step World State
    builder.
    - Restores durable selected capability roots into extension thread state
    on resume.
    - Keeps the actual model-context fragment and rollout machinery in core.
    
    ## What does not change
    
    - No skill or MCP implementation moves out of its extension.
    - No new file watcher, generation, or RPC.
    - No generic migration of existing World State sections.
    - No change to the stable environment-ID assumption from #29856.
    
    ## Example
    
    ```text
    step 1 snapshot: skills = []
    step 2 snapshot: skills = [executor-demo:deploy]
    
    core asks the skills extension to render only that change.
    ```
    
    ## Stack
    
    1. **This PR:** let extensions contribute World State sections.
    2. Project executor skills through the skills extension.
    3. Pin one MCP runtime to each model step.
    4. Project selected MCP/app/connector metadata by environment
    availability.
    5. One end-to-end integration scenario.
  • [codex] Add managed MCP server matchers (#29648)
    ## Summary
    
    This PR extends the existing managed `mcp_servers` identity requirement
    so that one name-qualified rule can use either:
    
    - the released exact command or URL identity;
    - an exact stdio executable with an exact-length, ordered argument
    matcher list; or
    - a direct MCP URL matcher.
    
    Matcher-based rules stay under the released `identity` key and use the
    same `McpServerRequirement` abstraction and `mcp_servers.<server_name>`
    namespace.
    
    ## Behavior
    
    Policy activation and name qualification are unchanged:
    
    - If `mcp_servers` is absent, ordinary configured MCP servers remain
    unrestricted.
    - If `mcp_servers` is present, a server needs a matching same-name
    requirement.
    - `mcp_servers = {}` continues to deny every configured MCP server.
    - Existing exact identity requirements keep their released semantics.
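    
    The activation rules above reduce to a three-way check. The sketch
    below is a simplification: `server_allowed` is a hypothetical function,
    and each requirement is reduced to an exact command string:
    
    ```rust
    use std::collections::HashMap;
    
    /// `requirements` is `None` when `mcp_servers` is absent from the
    /// managed requirements, and an empty map for `mcp_servers = {}`.
    pub fn server_allowed(
        requirements: Option<&HashMap<String, String>>,
        name: &str,
        command: &str,
    ) -> bool {
        match requirements {
            // No policy: ordinary configured servers stay unrestricted.
            None => true,
            // Policy present: a same-name rule must exist and match.
            Some(rules) => rules
                .get(name)
                .map_or(false, |expected| expected.as_str() == command),
        }
    }
    ```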
    
    Plugin-bundled MCP servers use the same requirement shapes under
    `plugins.<plugin_name>.mcp_servers.<server_name>`. Top-level non-empty
    rules continue to govern only ordinary configured servers; plugin rules
    remain explicitly plugin-scoped. The existing globally empty
    `mcp_servers = {}` plugin kill switch is preserved.
    
    Requirements layers continue to use the existing regular TOML merge
    behavior. Atomic replacement of named MCP requirements is intentionally
    out of scope here and is tracked independently in #30118.
    
    ## Requirement contract
    
    The released exact identity contract remains valid:
    
    ```toml
    [mcp_servers.docs.identity]
    command = "codex-mcp"
    
    [mcp_servers.remote.identity]
    url = "https://example.com/mcp"
    ```
    
    Command identities continue to check only `command`; they do not inspect
    arguments, `cwd`, `env`, or `env_vars`.
    
    A command matcher uses an exact executable plus an exact-length, ordered
    argument list. Each argument position supports `exact`, `prefix`, or
    full-value `regex` matching:
    
    ```toml
    [mcp_servers.internal_mcp_proxy.identity]
    command = { executable = "company-cli", args = [
      { match = "exact", value = "mcp" },
      { match = "exact", value = "proxy" },
      { match = "exact", value = "--server" },
      { match = "regex", expression = '^https://[A-Za-z0-9-]+\.mcp\.internal\.example\.com(?::443)?(?:/.*)?$' },
    ] }
    ```
    
    Direct streamable HTTP MCP definitions can use the same value matcher
    types through `identity.url`:
    
    ```toml
    [mcp_servers.internal_http.identity]
    url = {
      match = "regex",
      expression = '^https://[A-Za-z0-9-]+\.mcp\.internal\.example\.com(?:/.*)?$',
    }
    ```
    
    Plugin-bundled MCP matchers use the same contract inside the
    plugin-qualified allowlist:
    
    ```toml
    [plugins."sample@test".mcp_servers.internal_mcp_proxy.identity]
    command = { executable = "company-cli", args = [
      { match = "exact", value = "mcp" },
      { match = "exact", value = "proxy" },
    ] }
    ```
    
    Regexes are validated while managed requirements are loaded, and regex
    matching must cover the complete value. Command matchers constrain only
    the executable and arguments.
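    
    A sketch of the exact-length, ordered argument check, under stated
    assumptions: `ArgMatcher` and `command_matches` are hypothetical names,
    and only `exact` and `prefix` are modeled because the Rust standard
    library has no regex engine:
    
    ```rust
    /// Hypothetical mirror of two of the value matchers described above.
    pub enum ArgMatcher {
        Exact(&'static str),
        Prefix(&'static str),
    }
    
    impl ArgMatcher {
        fn matches(&self, value: &str) -> bool {
            match self {
                ArgMatcher::Exact(expected) => value == *expected,
                ArgMatcher::Prefix(prefix) => value.starts_with(*prefix),
            }
        }
    }
    
    /// A command matches only if the executable is identical and every
    /// argument position matches, with no missing or extra arguments.
    pub fn command_matches(
        executable: &str,
        matchers: &[ArgMatcher],
        command: &str,
        args: &[&str],
    ) -> bool {
        command == executable
            && args.len() == matchers.len()
            && matchers.iter().zip(args).all(|(m, arg)| m.matches(arg))
    }
    ```
    
    The length comparison is what makes the list exact-length: a configured
    server with one extra trailing argument does not match.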
    
    ## Why
    
    Enterprise administrators need to allow MCP servers by executable and
    positional-argument shape, including fixed arguments plus constrained
    values such as internal MCP URLs passed to a proxy.
    
    ## Validation
    
    - `just fmt`
    - `git diff --check`
    - `just test -p codex-config` (198 passed)
    - `just test -p codex-core mcp_servers_by_matchers --lib` (2 passed)
  • feat(core, mcp): cache codex_apps tools in memory (#29003)
    ## Description
    
    This makes Codex Apps tool reads use a shared in-memory snapshot instead
    of rereading the disk cache every time `list_all_tools()` runs. Disk
    still seeds the cache on startup and gets updated after successful
    fetches, but it is no longer the live read path.
    
    The core change is that `McpManager` now owns a process-scoped
    `CodexAppsToolsCache`. Codex threads in the same app-server process now
    share this Codex Apps in-memory tools snapshot. The snapshot is keyed by
    the Codex home plus the Codex Apps identity: the active Codex auth
    user/workspace and the effective Codex Apps MCP source config.
    
    Code to hard-refresh the cache already exists, and this PR continues to
    respect it.
    
    ## Local benchmark
    
    I ran a local steady-state microbenchmark of the exact repeated Codex
    Apps cached-tools read this PR removes, using the same real local cache
    payload in both trees: `3,678,138` bytes and `381` tools. The cache file
    was already warm in the OS page cache, so this measures same-process
    reread/deserialization work rather than cold-disk latency or full turn
    latency. Each run is 25 iterations (mimicking a turn that makes 25
    inference calls).
    
    | Version | Run 1 | Run 2 | Avg |
    |---|---:|---:|---:|
    | `origin/main` disk read + JSON deserialize + `filter_tools` | `50.755 ms` | `52.894 ms` | `51.825 ms` |
    | This branch in-memory `current_tools` + `filter_tools` | `0.740 ms` | `0.778 ms` | `0.759 ms` |
    
    That removes about `51 ms` from each repeated Codex Apps cached-tools
    read on this machine, roughly `68x` faster for that subpath. It is
    useful evidence for the hot path this PR changes, but not a claim that
    every production turn gets `51 ms` faster; end-to-end impact also
    depends on the rest of `list_all_tools()` and tool-payload construction.
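    
    The averages and the speedup follow directly from the two runs per
    tree. A small check of that arithmetic (`mean` and `speedup` are helper
    names introduced here for illustration):
    
    ```rust
    /// Arithmetic mean of the measured runs, in milliseconds.
    pub fn mean(runs_ms: &[f64]) -> f64 {
        runs_ms.iter().sum::<f64>() / runs_ms.len() as f64
    }
    
    /// Ratio of the old average to the new average.
    pub fn speedup(before_ms: &[f64], after_ms: &[f64]) -> f64 {
        mean(before_ms) / mean(after_ms)
    }
    ```
    
    With the runs reported above, `mean(&[50.755, 52.894])` is `51.8245`
    (shown rounded as `51.825 ms`), `mean(&[0.740, 0.778])` is `0.759`, and
    the ratio is about `68.3`.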
    
    This was measured on my M2 Max MacBook; with a slower disk the
    `origin/main` numbers would be much worse (and we did see turn runtime
    blow up on a slow disk).
  • [codex] impl delivery_mode: current time reminders on response boundaries (#30033)
    ## Summary
    - track user-like input and tool-output boundaries in current-time
    reminder state
    - gate reminder injection when `delivery_mode` is
    `after_user_or_tool_output`
    - preserve interval debounce and forced reminders after context-window
    changes
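    
    A hedged sketch of the gate, not the actual reminder state machine.
    `DeliveryMode`, `LastItem`, and `should_inject` are hypothetical names.
    The PR does not say whether a forced reminder also bypasses the
    boundary gate; this sketch assumes it bypasses only the interval
    debounce:
    
    ```rust
    #[derive(Clone, Copy, PartialEq)]
    pub enum DeliveryMode {
        AnyInference,
        AfterUserOrToolOutput,
    }
    
    /// Kind of the most recent history item before the next inference.
    #[derive(Clone, Copy, PartialEq)]
    pub enum LastItem {
        User,
        ToolOutput,
        Assistant,
    }
    
    pub fn should_inject(
        mode: DeliveryMode,
        last: LastItem,
        interval_elapsed: bool,
        forced: bool,
    ) -> bool {
        let boundary_ok = match mode {
            DeliveryMode::AnyInference => true,
            DeliveryMode::AfterUserOrToolOutput => {
                matches!(last, LastItem::User | LastItem::ToolOutput)
            }
        };
        // Assumption: `forced` skips the debounce but not the boundary gate.
        boundary_ok && (forced || interval_elapsed)
    }
    ```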
    
    ## Why
    Training can request reminders only after user or tool-output items
    while keeping the existing canonical pre-inference history-injection
    path.
    
    ## Validation
    - `just test -p codex-core
    current_time_reminders_can_follow_only_user_or_tool_outputs`
    - `just test -p codex-core
    current_time_reminders_follow_time_interval_and_persist_in_history`
    - `just test -p codex-core
    current_time_reminder_is_refreshed_after_compaction`
    - `just fix -p codex-core`
  • [codex] add current time reminder delivery mode config (#30031)
    ```toml
    delivery_mode = "any_inference" # default
    delivery_mode = "after_user_or_tool_output" # new mode
    ```
    
    ## Validation
    - `just test -p codex-core load_config_resolves_current_time_reminder`
    - `just test -p codex-core
    lock_contains_prompts_and_materializes_features`
  • core: expose permission profile to shell tools (#29941)
    ## tl;dr
    
    Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name
    of the current permission profile when invoking a shell tool.
    
    ## Why
    
    Shell tool owners may need to launch nested commands under the same
    named permission profile, including through `codex sandbox -P PROFILE
    --include-managed-config`. Until now, child processes could observe
    sandbox and network metadata but could not identify the active named
    permission profile.
    
    The `--include-managed-config` flag is essential when a helper
    reconstructs the sandbox from a profile name: it ensures the nested
    sandbox also loads managed enterprise requirements. Without it, using
    the inherited profile could unintentionally create a sandbox that does
    not enforce the organization's managed restrictions.
    
    The new environment value is intentionally informational and **must not
    be treated as trusted input**. Any process in the ancestry can overwrite
    an environment variable, so a consumer that passes this value to `codex
    sandbox -P` must first validate it against the profiles that helper is
    authorized to use.
    
    ## Example Use Case
    
    Suppose an organization provides a trusted `remote-bash` wrapper that
    lets Codex run a command on an approved build host. The local shell
    command uses the named `:workspace` permission profile:
    
    ```toml
    default_permissions = ":workspace"
    ```
    
    The command exposed to the model is a small zsh wrapper. It deliberately
    delegates with `exec`, preserving the original arguments and process
    environment:
    
    ```zsh
    #!/usr/bin/env zsh
    exec /opt/codex-tools/remote_bash.py "$@"
    ```
    
    The model invokes the public wrapper, not its Python implementation:
    
    ```sh
    /opt/codex-tools/remote-bash \
      --host builder.example.com \
      -- printf '%s' 'hello world'
    ```
    
    Only the inner implementation is authorized to escape the local sandbox:
    
    ```starlark
    prefix_rule(
        pattern=["/opt/codex-tools/remote_bash.py"],
        decision="allow",
    )
    ```
    
    With zsh-fork, execution begins with `remote-bash` inside the
    `:workspace` sandbox. When the wrapper calls `exec`, the exact prefix
    rule matches `remote_bash.py`, so that inner script is restarted
    unsandboxed. The escalated process inherits:
    
    ```text
    CODEX_PERMISSION_PROFILE=:workspace
    ```
    
    Inheritance does not make the value trustworthy. `remote_bash.py`
    independently allowlists both the remote host and the permission profile
    before using either value. In particular, a forged value such as
    `:danger-full-access` is rejected before it can reach `codex sandbox
    -P`:
    
    ```python
    import argparse
    import os
    import shlex
    import sys
    
    ALLOWED_HOSTS = {"builder.example.com"}
    ALLOWED_PROFILES = {":workspace"}
    
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", required=True)
    separator = sys.argv.index("--")
    args = parser.parse_args(sys.argv[1:separator])
    command = sys.argv[separator + 1:]
    
    if args.host not in ALLOWED_HOSTS:
        parser.error("host is not allowlisted")
    if not command:
        parser.error("the remote command must not be empty")
    
    profile = os.environ.get("CODEX_PERMISSION_PROFILE")
    if not profile:
        raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
    if profile not in ALLOWED_PROFILES:
        raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")
    
    remote_command = shlex.join(command)
    sandbox_command = shlex.join([
        "codex", "sandbox", "-P", profile,
        "--include-managed-config", "--",
        "bash", "-lc", remote_command,
    ])
    print(shlex.join(["ssh", args.host, sandbox_command]))
    ```
    
    This builds each command layer as an argument vector and uses
    `shlex.join()` at the boundary, rather than interpolating untrusted
    shell text. After validation and parsing, the nested command has this
    structure:
    
    ```text
    ssh argv:
      ["ssh", "builder.example.com", SANDBOX_COMMAND]
    
    SANDBOX_COMMAND argv:
      ["codex", "sandbox", "-P", ":workspace",
       "--include-managed-config", "--",
       "bash", "-lc", "printf %s 'hello world'"]
    
    bash -lc payload argv:
      ["printf", "%s", "hello world"]
    ```
    
    A production implementation could execute that SSH command. The
    integration fixture prints it and parses the result back into arguments,
    verifying the complete flow:
    
    ```text
    model invokes outer wrapper
      -> zsh-fork starts wrapper under :workspace
      -> wrapper execs allowlisted Python script
      -> prefix rule restarts Python script unsandboxed
      -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
      -> Python script verifies :workspace is allowlisted
      -> remote command runs codex sandbox -P :workspace
         with --include-managed-config
      -> nested sandbox honors managed enterprise requirements
    ```
    
    This gives the trusted helper access to resources outside the local
    sandbox—such as SSH credentials—while ensuring that it can select only
    an explicitly authorized profile and that work on the remote host
    remains subject to the organization's managed requirements.
    
    ## What changed
    
    - Inject `CODEX_PERMISSION_PROFILE` after shell environment policy
    evaluation so the active profile wins over inherited or configured stale
    values.
    - Apply the variable to both `shell_command` and unified `exec_command`,
    including local, zsh-fork, and remote exec-server paths.
    - Remove stale values when the session has no active named profile.
    - Preserve the current profile value when loading a shell snapshot so a
    parent snapshot cannot restore an older profile.
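    
    The override and removal rules can be sketched over a plain
    environment map. `apply_profile` is a hypothetical helper, and the map
    is assumed to have already passed shell environment policy evaluation:
    
    ```rust
    use std::collections::HashMap;
    
    pub const PROFILE_VAR: &str = "CODEX_PERMISSION_PROFILE";
    
    /// Runs after policy evaluation so the active profile wins over any
    /// inherited or configured value, and stale values are removed when the
    /// session has no active named profile.
    pub fn apply_profile(env: &mut HashMap<String, String>, active: Option<&str>) {
        match active {
            Some(name) => {
                env.insert(PROFILE_VAR.to_string(), name.to_string());
            }
            None => {
                env.remove(PROFILE_VAR);
            }
        }
    }
    ```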
    
    ## Testing
    
    - Added classic-shell integration coverage proving an exact prefix rule
    can run a `require_escalated` script outside the `:workspace` sandbox
    while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
    - Added zsh-fork integration coverage in which the model invokes an
    outer zsh wrapper, an inner allowlisted `remote_bash.py` runs
    unsandboxed, and its printed SSH command reconstructs the inherited
    `:workspace` sandbox with `--include-managed-config` while preserving
    every argument after `--`.
    - The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and
    validates it against `ALLOWED_PROFILES` before constructing the nested
    command.
    - Assert that the reconstructed sandbox command includes
    `--include-managed-config` so nested use of the inherited profile cannot
    bypass managed enterprise requirements.
    - Added coverage for overriding and removing stale profile values.
    - Verified `shell_command` receives the selected active profile.
    - Added shell snapshot coverage using `printenv
    CODEX_PERMISSION_PROFILE`.
  • [codex] current time reminder interval to be set to 0 (#30029)
    A zero interval lets callers request a reminder at every
    otherwise-eligible inference boundary.
    
    ## Validation
    - `just test -p codex-core load_config_resolves_current_time_reminder`
  • feat: add provider-aware model fallback to thread start (#29942)
    ## Why
    
    Helper threads such as task title generation can request a model ID that
    is valid for the default OpenAI provider but unavailable from the active
    provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
    provider static catalog exposes Bedrock model IDs such as
    `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
    404s and can surface a misleading turn error even when the main turn
    succeeds.
    
    Clients need an explicit way to ask app-server to resolve an unavailable
    helper model to the active provider default. That fallback must remain
    limited to providers with an authoritative static catalog so custom or
    dynamically discovered model IDs are not rewritten based on an
    incomplete catalog.
    
    Fixes #28741.
    
    ## What changed
    
    - Add the experimental `allowProviderModelFallback` option to
    `thread/start`, defaulting to `false` to preserve existing behavior.
    - Thread the option through thread creation and model selection.
    - When enabled for a static model manager, preserve requested models
    present in the catalog and replace unavailable models with the provider
    default.
    - Continue preserving explicit model IDs for dynamic model managers
    without fetching a catalog solely to validate them.
    - Document the new `thread/start` behavior in the app-server API
    overview.
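
    The selection rule in the list above can be sketched as follows. `ModelCatalog` and `resolve_model` are hypothetical names for illustration only; the real model managers are more involved.

    ```rust
    /// Hypothetical view of what the active provider knows about its models.
    enum ModelCatalog<'a> {
        /// Authoritative static list plus the provider default.
        Static { models: &'a [&'a str], default_model: &'a str },
        /// Dynamically discovered models: the catalog may be incomplete,
        /// so explicit model IDs are never rewritten.
        Dynamic,
    }

    /// Returns the model ID to use for a new thread.
    fn resolve_model(requested: &str, catalog: &ModelCatalog<'_>, allow_fallback: bool) -> String {
        match catalog {
            // Fallback applies only when requested, and only for static catalogs
            // that do not contain the requested model.
            ModelCatalog::Static { models, default_model }
                if allow_fallback && !models.contains(&requested) =>
            {
                default_model.to_string()
            }
            // Otherwise the requested ID is preserved as-is.
            _ => requested.to_string(),
        }
    }
    ```

    With the Bedrock IDs quoted above, a request for `gpt-5.4-mini` resolves to the provider default only when the option is enabled; with the option left at `false` the request is passed through unchanged.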
    
    ## Test
    Temporary test-client harness:
    ```
    ThreadStartParams {
        model: Some("gpt-5.4-mini".to_string()),
        allow_provider_model_fallback: true,
        ..Default::default()
    }
    ```
    Command:
    ```
    CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
    CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
    ./target/debug/codex-app-server-test-client \
      --codex-bin ./target/debug/codex \
      -c 'model_provider="amazon-bedrock"' \
      send-message-v2 --experimental-api ignored
    ```
    Relevant output:
    ```
    > "method": "thread/start",
    > "params": {
    >   "model": "gpt-5.4-mini",
    >   "modelProvider": null,
    >   "allowProviderModelFallback": true,
    >   ...
    > }
    
    < "result": {
    <   "model": "openai.gpt-5.5",
    <   "modelProvider": "amazon-bedrock",
    <   ...
    < }
    ```
  • Persist selected capability roots and resolve availability per model step (#29856)
    ## Why
    
    `selectedCapabilityRoots` is durable thread intent: “use this capability
    root from environment `worker`.”
    
    The important product assumption is:
    
    > One environment ID always names the same logical executor and stable
    contents.
    
    `worker` does not silently change from executor A to an unrelated
    executor B. The process-local connection handle for `worker` can still
    be replaced while Codex is running, though, for example when
    `environment/add` registers a fresh handle for the same logical
    environment.
    
    The thread should persist only the stable selection. Each model step
    should pair that selection with the exact ready handle captured for that
    step.
    
    ## The boundary
    
    ```text
    persisted thread intent
      plugin@1 -> environment "worker"
                    |
                    | capture the current step
                    v
    model-step view
      unavailable, or
      plugin@1 + worker's exact captured ready handle
    ```
    
    The environment ID is the stable identity and cache key. The
    `Arc<Environment>` is only a process-local handle retained so consumers
    of one model step use the same captured environment. It is never
    persisted and it does not imply different environment contents.
    
    ## What changes
    
    ### Persist the stable selection
    
    Selected roots are written into `SessionMeta` and restored with the
    thread. Forked subagents inherit the same selections, including
    bounded-history forks.
    
    Only stable data is persisted: root ID, environment ID, and root path.
    
    ### Capture readiness together with the exact handle
    
    The environment snapshot records:
    
    ```rust
    environment_id -> Some(Arc<Environment>) // ready in this step
    environment_id -> None                   // still starting in this step
    ```
    
    This prevents readiness and execution from coming from different
    registry snapshots.
    
    For example:
    
    ```text
    step snapshot: worker -> handle A, ready
    environment/add: worker -> fresh handle B for the same logical environment
    current step: plugin@1 still uses captured handle A
    ```
    
    Without carrying handle A in the snapshot, the resolver could combine “A
    was ready” with handle B and treat B as ready before it had finished
    starting.
    
    This does not change cache invalidation. Stable capability metadata
    remains identified by environment ID and capability root. Replacing a
    process-local handle under the same stable environment ID does not
    invalidate or rediscover that metadata.
    
    ### Resolve availability per model step
    
    - A ready captured environment produces resolved roots using its
    captured handle.
    - A starting, missing, or failed environment is omitted from that step.
    - A selected lazy environment that is outside the turn's captured
    environment set is asked to start, and a later step can observe it as
    ready.
    - No capability files are scanned here.
    
    Transient transport disconnects remain the remote client's reconnect
    concern. This PR models initial attachment/readiness; it does not add
    live socket-connectivity state.
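
    A minimal sketch of per-step resolution, assuming a simplified `Environment` stand-in and hypothetical `SelectedRoot` and `resolve` names. It illustrates only the rule that readiness and the handle come from the same snapshot entry.

    ```rust
    use std::collections::HashMap;
    use std::sync::Arc;

    /// Stand-in for the process-local environment handle.
    struct Environment {
        label: &'static str,
    }

    /// Step snapshot: `Some(handle)` means ready in this step,
    /// `None` means still starting in this step.
    type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;

    /// The durable part of a selection (root path omitted for brevity).
    struct SelectedRoot {
        root_id: String,
        environment_id: String,
    }

    /// Resolves selections against one captured snapshot. Starting or missing
    /// environments are omitted from this step; the selection itself is kept.
    fn resolve<'a>(
        selected: &'a [SelectedRoot],
        snapshot: &StepSnapshot,
    ) -> Vec<(&'a str, Arc<Environment>)> {
        selected
            .iter()
            .filter_map(|root| {
                let handle = snapshot.get(&root.environment_id)?.as_ref()?;
                Some((root.root_id.as_str(), Arc::clone(handle)))
            })
            .collect()
    }
    ```

    Because the resolver reads only the snapshot, a fresh handle registered later for the same environment ID cannot leak into the current step.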
    
    ## Example
    
    ```text
    thread selection: plugin@1 -> environment "worker"
    
    step 1: worker is starting -> plugin@1 unavailable
    step 2: worker is ready    -> plugin@1 resolves through worker's captured handle
    step 3: fresh local handle -> current step remains pinned; a later step captures its own view
    ```
    
    Temporary unavailability does not discard the durable selection. Later
    PRs can retain stable metadata caches while projecting only currently
    available capabilities into model-visible World State.
    
    ## Compatibility
    
    The app-server request shape does not change. Older rollouts without
    `selected_capability_roots` deserialize to an empty list.
    
    ## Stack
    
    1. **This PR:** persist stable selected roots and resolve them through
    an exact model-step handle.
    2. #29960: cache stable skill metadata and project available skills into
    World State.
    3. #29946: cache stable plugin declarations and manage the separate live
    MCP runtime.
  • Support OAuth for HTTP MCP servers from selected executor plugins (#28529)
    ## Why
    
    #28522 routes selected-plugin HTTP MCP traffic through the owning
    executor, but OAuth bootstrap and refresh still used host-local clients.
    Executor-only servers therefore cannot complete discovery or login
    through the same network boundary as the MCP connection.
    
    ## What changed
    
    - adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
    contract
    - let RMCP own discovery, dynamic registration, PKCE, token exchange,
    and refresh
    - route auth status, persisted-token startup, and app-server login
    through the server runtime while preserving the existing local discovery
    path
    - add optional `threadId` to `mcpServer/oauth/login` and echo it in the
    completion notification
    - implement RMCP's redirect policy and 1 MiB OAuth response limit over
    executor HTTP
    - cover selected-thread OAuth discovery and login through an
    executor-only route
    
    Depends on #28522.
  • core: reconcile legacy WorldState sections (#29997)
    ## Why
    
    Older rollouts can retain model-visible context for a WorldState section
    without having a persisted snapshot for that section. Treating the
    missing snapshot as definitely absent can duplicate old context or fail
    to tell the model that it was replaced or removed.
    
    This provides a generic migration path for sections moving into
    WorldState, beginning with AGENTS.md.
    
    Builds on #29810.
    
    ## What changed
    
    - distinguish section state that is absent, known from a persisted
    snapshot, or unknown because matching legacy context remains in history
    - let WorldState sections identify their own legacy fragments while
    `ContextManager` owns history reconciliation and baseline persistence
    - make AGENTS.md emit one conservative replacement or removal update for
    legacy history, then deduplicate from the newly persisted baseline
    - preserve existing environment rendering when persisted section data is
    missing or malformed
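
    The three-way distinction can be sketched as a small decision function. `SectionBaseline`, `Update`, and `next_update` are hypothetical names, and sections are reduced to plain strings; the actual reconciliation lives in `ContextManager`.

    ```rust
    /// What is known about a section before the next update is computed.
    #[derive(Debug, PartialEq)]
    enum SectionBaseline {
        /// No snapshot and no legacy context: the model has never seen it.
        Absent,
        /// A persisted snapshot records exactly what the model saw.
        Known(String),
        /// Legacy context remains in history, but no snapshot describes it.
        Unknown,
    }

    #[derive(Debug, PartialEq)]
    enum Update {
        Unchanged,
        Initial(String),
        Replacement(String),
        Removal,
    }

    fn next_update(baseline: &SectionBaseline, current: Option<&str>) -> Update {
        match (baseline, current) {
            (SectionBaseline::Absent, None) => Update::Unchanged,
            (SectionBaseline::Absent, Some(text)) => Update::Initial(text.to_string()),
            (SectionBaseline::Known(old), Some(text)) if old.as_str() == text => Update::Unchanged,
            // Unknown legacy content is conservatively treated as different.
            (SectionBaseline::Known(_), Some(text)) | (SectionBaseline::Unknown, Some(text)) => {
                Update::Replacement(text.to_string())
            }
            (SectionBaseline::Known(_), None) | (SectionBaseline::Unknown, None) => Update::Removal,
        }
    }
    ```

    After the one conservative update is emitted, the newly persisted baseline is `Known` (or `Absent` after a removal), so later steps deduplicate normally.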
    
    ## Testing
    
    - `just test -p codex-core world_state`
    - `just test -p codex-core
    cold_resume_invalidates_deleted_legacy_agents_md_once -- --exact`
  • core: make AGENTS.md react to environment changes (#29810)
    ## Why
    
    With deferred executors, a turn can begin before a remote environment
    attaches. AGENTS.md discovery previously ran only during session setup,
    so instructions from a later environment never reached the model or the
    session instruction sources.
    
    WorldState persistence has now landed, so this uses the durable
    model-visible baseline directly instead of carrying a temporary
    resume/fork compatibility path.
    
    ## What
    
    - Add an `AgentsMdManager` in `SessionServices` to own host
    instructions, loaded state, and refresh caching.
    - When `DeferredExecutor` is enabled, refresh AGENTS.md when attached
    environment selections change and freeze the result in the corresponding
    `StepContext`.
    - Represent AGENTS.md as a persisted WorldState section for every
    session, with bounded initial, replacement, and removal updates.
    - Remove duplicate AGENTS.md state and rendering from
    `SessionConfiguration` and `TurnContext`.
    - Build initial context, per-request updates, and compaction context
    from the same step-scoped value.
    - On resume and fork, compare current instructions with the restored
    WorldState baseline and inject a replacement exactly once when they
    differ.
    
    Builds on #29833, #29835, and #29837.
    
    ## Tests
    
    - Covers a remote environment becoming ready mid-turn, with AGENTS.md
    appearing on the next request exactly once and updating canonical
    instruction sources.
    - Covers full, unchanged, replaced, and removed AGENTS.md WorldState
    rendering.
    - Covers changed instructions across cold resume and fork without
    duplicate reinjection.
    - Covers remote-v2 compaction retaining creation-time instructions in
    the live session and cold resume appending one replacement when the
    source changed.
    - Ran focused `codex-core` AGENTS.md, WorldState, and context-update
    test suites.
  • feat: use run agent task auth for inference (#19051)
    ## Stack
    
    This is PR 3 of the simplified HAI single-run-task stack:
    
    - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
    assertion and task-registration primitives, including the shared
    run-task helper used by existing Agent Identity JWT auth.
    - [#19049](https://github.com/openai/codex/pull/19049)
    Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
    Agent Identity runtime auth and its single run task.
    - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
    provider auth that uses one backend-owned task id for first-party
    inference and compaction requests.
    
    [#19054](https://github.com/openai/codex/pull/19054) collapsed out of
    the active stack because the simplified design no longer needs a
    separate background/control-plane task helper.
    
    ## Summary
    
    This PR moves Agent Identity usage into provider auth resolution. That
    keeps `AgentAssertion` auth tied to first-party OpenAI provider requests
    instead of applying a late session-wide override that could affect
    local, custom, Bedrock, API-key, or external-bearer providers.
    
    What changed:
    
    - adds a small `ProviderAuthScope` struct carrying the run auth policy
    and session source needed by provider-scoped auth resolution
    - lets `Session` opt the existing `ModelClient` into `ChatGptAuth`
    policy when `use_agent_identity` is enabled, without adding a second
    model-client constructor
    - resolves Agent Identity only for first-party OpenAI provider auth
    paths
    - uses the persisted run task id from the `AgentIdentityAuth` record to
    build `AgentAssertion` auth for Responses requests
    - routes shared request setup through scoped provider auth so unary
    compact requests use the same run-task assertion path as inference turns
    - keeps local/custom/Bedrock/env-key/external-bearer provider auth
    unchanged
    - lets missing run-task state surface through the existing model-request
    error path instead of silently falling back to bearer auth
    
    This PR intentionally does not create thread-scoped, target-scoped, or
    background-scoped task identities. The run task is the only task Codex
    registers in this POC shape.
    
    ## Testing
    
    - `just test -p codex-model-provider`
    - `just test -p codex-core client::tests::provider_auth_scope_uses`
    - `just test -p codex-core remote_compact_uses_agent_identity_assertion`
  • [codex] route sleep through time providers (#29973)
    ## Summary
    
    - add a cancellable sleep operation to `TimeProvider`
    - route `clock.sleep` through the configured provider
    - extend the supported sleep duration to 12 hours
    - complete the sleep turn item before propagating provider failures
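
    A cancellable sleep behind a provider trait might look like the sketch below. It is a blocking, standard-library-only illustration; the trait shape, `CancelToken`, `WallClock`, and the error type are assumptions, not the signatures used in Codex.

    ```rust
    use std::sync::{Condvar, Mutex};
    use std::time::Duration;

    /// Upper bound taken from the summary above.
    const MAX_SLEEP: Duration = Duration::from_secs(12 * 60 * 60);

    /// Hypothetical cancellation flag shared by the sleeper and its canceller.
    #[derive(Default)]
    struct CancelToken {
        cancelled: Mutex<bool>,
        changed: Condvar,
    }

    impl CancelToken {
        fn cancel(&self) {
            *self.cancelled.lock().unwrap() = true;
            self.changed.notify_all();
        }
    }

    #[derive(Debug, PartialEq)]
    enum SleepOutcome {
        Completed,
        Cancelled,
    }

    #[derive(Debug, PartialEq)]
    enum SleepError {
        DurationTooLong,
    }

    /// Hypothetical provider trait: callers sleep through it instead of the OS clock.
    trait TimeProvider {
        fn sleep(&self, duration: Duration, cancel: &CancelToken) -> Result<SleepOutcome, SleepError>;
    }

    /// Wall-clock provider: waits until the timeout elapses or the token is cancelled.
    struct WallClock;

    impl TimeProvider for WallClock {
        fn sleep(&self, duration: Duration, cancel: &CancelToken) -> Result<SleepOutcome, SleepError> {
            if duration > MAX_SLEEP {
                return Err(SleepError::DurationTooLong);
            }
            let guard = cancel.cancelled.lock().unwrap();
            let (guard, _timed_out) = cancel
                .changed
                .wait_timeout_while(guard, duration, |cancelled| !*cancelled)
                .unwrap();
            Ok(if *guard { SleepOutcome::Cancelled } else { SleepOutcome::Completed })
        }
    }
    ```

    An external clock integration would supply a different `TimeProvider` implementation while callers keep the same call site.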
    
    ## Why
    
    This isolates the core clock abstraction needed by external clock
    integrations. Existing system and app-server behavior remains wall-clock
    based in this PR; the stacked follow-up supplies app-server sleeps from
    an external clock.
  • core: raise token budget message limits (#29970)
    ## Why
    
    Token-budget reminder and guidance messages can require more than 1,000
    bytes to provide useful model-facing instructions. At the same time,
    these strings are injected into model-visible context, so their size
    must remain tightly bounded in response to the P0 context-growth
    concern. A 2,000-byte runtime cap provides additional room without
    allowing the substantially larger context growth of a 4 KiB limit.
    
    ## What changed
    
    - raises the runtime byte limits for token-budget reminder templates and
    guidance messages from 1,000 to 2,000
    - raises the corresponding JSON Schema `maxLength` values to 2,000
    - regenerates `codex-rs/core/config.schema.json`
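
    A byte-based cap of this kind can be sketched as below. The function and constant names are hypothetical; only the 2,000-byte figure comes from this change.

    ```rust
    /// Runtime cap from this change, measured in bytes.
    const MAX_TOKEN_BUDGET_MESSAGE_BYTES: usize = 2_000;

    /// Hypothetical validator for a reminder template or guidance message.
    fn validate_token_budget_message(message: &str) -> Result<(), String> {
        // `str::len` counts UTF-8 bytes, so multibyte text reaches the cap
        // sooner than its character count suggests.
        let len = message.len();
        if len > MAX_TOKEN_BUDGET_MESSAGE_BYTES {
            Err(format!(
                "message is {} bytes; the limit is {}",
                len, MAX_TOKEN_BUDGET_MESSAGE_BYTES
            ))
        } else {
            Ok(())
        }
    }
    ```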
    
    ## Testing
    
    - `just test -p codex-features`
    - `just test -p codex-core load_config_resolves_token_budget_config
    load_config_rejects_invalid_token_budget_reminder_template`
    
    The full `codex-core` test run completed 2,858 tests successfully and
    encountered seven unrelated environment-sensitive failures involving
    Seatbelt/network environment assertions, MCP capability setup, and abort
    timing.
  • Report MCP error codes with server attribution (#29969)
    ## Why
    
    MCP error-code telemetry special-cased Codex Apps: its reported error
    codes were retained, while codes from every other MCP server were
    replaced with `unknown`. Error reporting should behave consistently for
    every MCP server. The server name already identifies where an error came
    from, so telemetry does not need a separate Codex Apps classification.
    
    This follows up on [#28976](https://github.com/openai/codex/pull/28976),
    which introduced MCP error-code telemetry.
    
    ## What changed
    
    - Add the MCP server name to call, duration, and error metrics.
    - Retain bounded, sanitized tool error codes from every MCP server.
    - Remove `McpErrorCodeSource` and the Codex Apps ownership lookup from
    telemetry collection.
    - Use the same metric-tagging path for blocked, rejected, and executed
    MCP calls.
    
    ## Test plan
    
    - Verify the complete metric tag set includes the sanitized MCP server
    name.
    - Verify error codes from ordinary MCP servers are retained, bounded,
    and sanitized.
    - Preserve coverage for request failures, tool-result failures, nested
    auth failures, and span attributes.
  • [3/3] core: replay persisted world state (#29837)
    ## Why
    
    Persisting `WorldState` snapshots and patches is only useful if resume
    and fork restore that exact comparison baseline. Rebuilding it from
    `TurnContextItem` loses section state and can either repeat or suppress
    model-visible updates.
    
    This is the third PR in the WorldState persistence stack, built on
    #29835.
    
    ## What
    
    - Replay full WorldState snapshots and RFC 7386 patches through the
    existing rollout reconstruction segments.
    - Discard state from rolled-back turns and treat compaction as a
    baseline reset.
    - Hydrate `ContextManager` from the reconstructed snapshot on resume and
    fork.
    - Remove the synthetic `TurnContextItem` to WorldState conversion path.
    - Leave legacy or malformed rollouts without a baseline so the next
    update safely emits a full snapshot.
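
    The replay rules can be sketched with a deliberately simplified model: sections are flat strings, and a patch entry of `None` stands in for an RFC 7386 null. `RolloutItem` and `replay` here are hypothetical and omit rollback handling.

    ```rust
    use std::collections::BTreeMap;

    type Snapshot = BTreeMap<String, String>;

    enum RolloutItem {
        /// Full WorldState snapshot: replaces the baseline.
        Snapshot(Snapshot),
        /// Simplified patch: `Some(value)` sets a section, `None` removes it.
        Patch(BTreeMap<String, Option<String>>),
        /// Compaction resets the baseline until the next full snapshot.
        Compaction,
    }

    /// Reconstructs the comparison baseline; `None` means the next update
    /// should emit a full snapshot.
    fn replay(items: &[RolloutItem]) -> Option<Snapshot> {
        let mut baseline: Option<Snapshot> = None;
        for item in items {
            match item {
                RolloutItem::Snapshot(full) => baseline = Some(full.clone()),
                RolloutItem::Compaction => baseline = None,
                RolloutItem::Patch(patch) => {
                    // A patch with no baseline has nothing to apply to.
                    if let Some(state) = baseline.as_mut() {
                        for (section, value) in patch {
                            match value {
                                Some(value) => {
                                    state.insert(section.clone(), value.clone());
                                }
                                None => {
                                    state.remove(section);
                                }
                            }
                        }
                    }
                }
            }
        }
        baseline
    }
    ```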
    
    ## Testing
    
    - `just test -p codex-core world_state`
    - `just test -p codex-core rollout_reconstruction_tests`
    - `just fix -p codex-core`
    - `just test -p codex-core` *(the changed tests passed; the full run
    also hit unrelated existing/test-environment failures, primarily a
    missing `test_stdio_server` binary)*
  • [codex] Add Ultra reasoning effort (#29899)
    ## Why
    
    Ultra should be one user-facing reasoning selection for work that
    benefits from both maximum reasoning and proactive multi-agent
    delegation. Without it, clients must coordinate maximum reasoning with
    the experimental `multiAgentMode` setting, even though the inference
    backend still expects its existing `max` effort value.
    
    This change makes reasoning effort the source of truth: clients select
    `ultra`, core derives proactive multi-agent behavior when the turn is
    eligible for multi-agent V2, and inference requests continue to use the
    backend-compatible `max` value.
    
    ## What changed
    
    - Add `ultra` as a first-class reasoning effort and preserve
    model-catalog ordering when exposing it to clients.
    - Convert `ultra` to `max` at the inference request boundary, including
    Responses HTTP/WebSocket requests, startup prewarm, compaction, and
    memory summarization.
    - Derive effective multi-agent mode per turn from effective reasoning
    effort:
      - eligible multi-agent V2 + `ultra` → `proactive`
      - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
      - V1 or otherwise ineligible sessions → no multi-agent mode instruction
    - Keep the derived effective mode in turn context history so successive
    turns can emit a developer-message update only when the effective mode
    changes.
    - Remove selected multi-agent mode from core session configuration, turn
    construction, thread settings, resume/fork restoration, and subagent
    spawn plumbing. Subagents inherit reasoning effort and derive their own
    effective mode.
    - Retain the experimental app-server `multiAgentMode` fields for wire
    compatibility while marking them deprecated. Request values are accepted
    but ignored; compatibility response fields report `explicitRequestOnly`.
    - Display Ultra in the TUI using the order supplied by `model/list`.
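
    The two derivations above, wire value and effective multi-agent mode, can be sketched as follows. The enum is trimmed to three efforts for brevity, and the function names are hypothetical.

    ```rust
    /// Trimmed for the example; other efforts are elided.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum ReasoningEffort {
        High,
        Max,
        Ultra,
    }

    impl ReasoningEffort {
        /// Value sent at the inference request boundary.
        fn wire_value(self) -> &'static str {
            match self {
                ReasoningEffort::High => "high",
                // `ultra` is a client-facing selection; the backend still sees `max`.
                ReasoningEffort::Max | ReasoningEffort::Ultra => "max",
            }
        }
    }

    #[derive(Debug, PartialEq)]
    enum MultiAgentMode {
        Proactive,
        ExplicitRequestOnly,
    }

    /// Derives the per-turn mode; `None` means no mode instruction is emitted.
    fn effective_mode(eligible_for_v2: bool, effort: ReasoningEffort) -> Option<MultiAgentMode> {
        match (eligible_for_v2, effort) {
            (false, _) => None,
            (true, ReasoningEffort::Ultra) => Some(MultiAgentMode::Proactive),
            (true, _) => Some(MultiAgentMode::ExplicitRequestOnly),
        }
    }
    ```

    Keeping the mode a pure function of eligibility and effort is what allows subagents to inherit only the effort and derive their own mode.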
    
    ## Validation
    
    - `just test -p codex-core ultra_reasoning_uses_max_for_requests`
    - `just test -p codex-tui model_reasoning_selection_popup`
  • [2/3] core: persist world state in rollouts (#29835)
    ## Why
    
    `WorldState` currently remembers its model-visible diff baseline only in
    memory. That leaves no durable source for restoring the exact baseline
    after resume, fork, rollback, or compaction.
    
    This is the second PR in the WorldState persistence stack, built on
    #29833 and following #29249. It records durable state transitions; the
    next PR will replay them during rollout reconstruction.
    
    ## What
    
    - Add a `world_state` rollout item containing either a full snapshot or
    an RFC 7386 JSON Merge Patch.
    - Persist a full snapshot after initial context and after compaction
    establishes a new context window.
    - Persist non-empty patches when later sampling steps or turns advance
    the WorldState baseline.
    - Write model-visible history before its matching WorldState record, so
    an interrupted write can only cause a safe repeated update on replay.
    - Preserve WorldState records for full-history forks while excluding
    them from thread previews, metadata, and app-server history
    materialization.
    
    Older binaries read rollout lines independently, so they skip the
    unknown `world_state` records while retaining the rest of the thread.
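
    For reference, the RFC 7386 merge algorithm that such patches rely on is small. The sketch below uses a minimal hypothetical `Json` type (strings, objects, and null only) rather than the JSON library Codex actually uses.

    ```rust
    use std::collections::BTreeMap;

    /// Minimal JSON value, sufficient to illustrate the algorithm.
    #[derive(Clone, Debug, PartialEq)]
    enum Json {
        Null,
        Str(String),
        Object(BTreeMap<String, Json>),
    }

    /// Convenience constructor for object values.
    fn obj(pairs: &[(&str, Json)]) -> Json {
        Json::Object(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
    }

    /// RFC 7386 MergePatch: object patches merge field by field, a null field
    /// removes the key, and any non-object patch replaces the target outright.
    fn merge_patch(target: Json, patch: Json) -> Json {
        match patch {
            Json::Object(patch_fields) => {
                // A non-object target is treated as an empty object.
                let mut fields = match target {
                    Json::Object(fields) => fields,
                    _ => BTreeMap::new(),
                };
                for (name, value) in patch_fields {
                    if value == Json::Null {
                        fields.remove(&name);
                    } else {
                        let existing = fields.remove(&name).unwrap_or(Json::Null);
                        fields.insert(name, merge_patch(existing, value));
                    }
                }
                Json::Object(fields)
            }
            other => other,
        }
    }
    ```

    Because null means removal in a merge patch, a snapshot format that feeds into it cannot use null object fields to carry meaning.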
    
    ## Testing
    
    - `just test -p codex-core
    snapshot_merge_patch_changes_and_removes_nested_values`
    - `just test -p codex-core
    world_state_baseline_deduplicates_until_history_is_replaced`
    - `just test -p codex-core
    deferred_executor_compaction_preserves_then_updates_environment_once`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-state`
    - `just test -p codex-thread-store`
    - `just test -p codex-app-server-protocol`
  • Represent MCP authentication with an enum (#29924)
    ## Why
    
    MCP authentication has distinct OAuth and ChatGPT-session flows.
    Representing that choice as `use_chatgpt_auth` makes one flow implicit
    and allows the configuration model to express the distinction only
    through a boolean.
    
    ChatGPT credential forwarding also needs a first-party trust boundary. A
    configurable `chatgpt_base_url` controls routing, but must not grant an
    MCP server permission to receive session credentials.
    
    This change builds on #29733, where the boolean was introduced.
    
    ## What changed
    
    - Replace `use_chatgpt_auth` with an `auth` field backed by the
    exhaustive `McpServerAuth` enum.
    - Support `auth = "oauth"` and `auth = "chatgpt"`, with OAuth remaining
    the default.
    - Trust only the origin derived from the existing hardcoded
    `CHATGPT_CODEX_BASE_URL` when granting ChatGPT auth to an MCP server.
    - Keep configured bearer tokens and authorization headers ahead of the
    selected authentication flow.
    - Update config writers, schema output, fixtures, and integration-test
    setup to use the enum.
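
    A sketch of the enum and the precedence rule follows. The parsing shown here stands in for the real config deserialization, `EffectiveAuth` and `select_auth` are hypothetical, and falling back to OAuth for an untrusted origin is an assumption: the description states only that such a server receives no ChatGPT credentials.

    ```rust
    use std::str::FromStr;

    #[derive(Clone, Copy, Debug, Default, PartialEq)]
    enum McpServerAuth {
        /// OAuth remains the default when `auth` is omitted.
        #[default]
        OAuth,
        ChatGpt,
    }

    impl FromStr for McpServerAuth {
        type Err = String;

        fn from_str(value: &str) -> Result<Self, Self::Err> {
            match value {
                "oauth" => Ok(Self::OAuth),
                "chatgpt" => Ok(Self::ChatGpt),
                other => Err(format!("unknown MCP auth mode: {other}")),
            }
        }
    }

    #[derive(Debug, PartialEq)]
    enum EffectiveAuth {
        ConfiguredAuthorization,
        ChatGptSession,
        OAuth,
    }

    /// Configured bearer tokens or headers win; ChatGPT session auth is
    /// granted only to a first-party origin.
    fn select_auth(
        has_configured_authorization: bool,
        auth: McpServerAuth,
        origin_is_first_party: bool,
    ) -> EffectiveAuth {
        if has_configured_authorization {
            return EffectiveAuth::ConfiguredAuthorization;
        }
        match auth {
            McpServerAuth::ChatGpt if origin_is_first_party => EffectiveAuth::ChatGptSession,
            // Assumption: an untrusted origin falls back to the default flow.
            _ => EffectiveAuth::OAuth,
        }
    }
    ```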
    
    ## Verification
    
    Integration coverage exercises the complete streamable HTTP startup path
    in two independent configurations:
    
    - A directly constructed MCP configuration verifies that matching an
    overridden `chatgpt_base_url` does not grant ChatGPT auth.
    - A persisted `config.toml` containing an attacker-controlled
    `chatgpt_base_url` and `auth = "chatgpt"` verifies the same boundary
    through normal config parsing.
    
    Both tests complete MCP initialization and tool listing and assert that
    the full captured request sequence contains no authorization headers.
    Separate integration coverage verifies that configured authorization
    takes precedence over ChatGPT auth.
  • [1/3] core: make world state snapshots serializable (#29833)
    ## Why
    
    `WorldState` currently keeps its diff baseline as live Rust objects
    keyed by process-local `TypeId`. That baseline cannot be written to a
    rollout or restored after resume, so Codex reconstructs an approximation
    from `TurnContextItem`.
    
    This is the first change in the WorldState persistence stack. It gives
    every section a stable persisted identity and a compact serializable
    comparison snapshot without changing rollout behavior yet.
    
    ## What changed
    
    - Require each `WorldStateSection` to define a stable ID and
    serializable snapshot type.
    - Reject duplicate section IDs when constructing `WorldState`.
    - Persist a dedicated environment comparison snapshot using
    model-visible strings instead of runtime path types.
    - Store only `WorldStateSnapshot` in `ContextManager`, removing the
    parallel live-object baseline.
    - Render diffs by restoring each section's typed snapshot; invalid
    snapshots fall back to a full section render.
    - Omit null object fields for future RFC 7386 patches while preserving
    null values inside arrays.
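
    Stable IDs and duplicate rejection can be sketched as follows. `WorldStateSection` and `WorldState` are named in this change, but the trait methods, the string snapshot type, and `TextSection` are simplifications invented for the example.

    ```rust
    use std::collections::BTreeMap;

    /// Simplified section contract: a stable ID plus a serializable snapshot
    /// (a plain string here).
    trait WorldStateSection {
        fn id(&self) -> &'static str;
        fn snapshot(&self) -> String;
    }

    /// Hypothetical section used only for illustration.
    struct TextSection {
        id: &'static str,
        text: String,
    }

    impl WorldStateSection for TextSection {
        fn id(&self) -> &'static str {
            self.id
        }
        fn snapshot(&self) -> String {
            self.text.clone()
        }
    }

    struct WorldState {
        sections: BTreeMap<&'static str, Box<dyn WorldStateSection>>,
    }

    impl WorldState {
        /// Keys sections by stable ID and rejects duplicates up front.
        fn new(sections: Vec<Box<dyn WorldStateSection>>) -> Result<Self, String> {
            let mut by_id = BTreeMap::new();
            for section in sections {
                let id = section.id();
                if by_id.insert(id, section).is_some() {
                    return Err(format!("duplicate WorldState section id: {id}"));
                }
            }
            Ok(Self { sections: by_id })
        }

        /// Snapshot keyed by stable ID, independent of process-local type identity.
        fn snapshot(&self) -> BTreeMap<&'static str, String> {
            self.sections
                .iter()
                .map(|(id, section)| (*id, section.snapshot()))
                .collect()
        }
    }
    ```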
    
    Follow-up PRs will record full snapshots and merge patches, then restore
    the baseline during resume, fork, and rollback.
    
    ## Test plan
    
    - WorldState snapshot tests cover stable IDs, duplicate rejection, null
    omission, and array preservation.
    - Environment tests cover persistence-safe snapshot values and existing
    diff rendering.
    - ContextManager baseline deduplication and session context-update
    persistence tests.
    
    Related: #29249
  • Allow ChatGPT-hosted MCP servers to use session auth (#29733)
    ## Why
    
    ChatGPT session authentication was inferred from the reserved Codex Apps
    server name. That couples credential routing to Codex Apps-specific
    behavior and prevents other MCP endpoints hosted by ChatGPT from
    explicitly using the current session.
    
    The opt-in also needs a clear security boundary: an arbitrary MCP
    configuration must not be able to redirect ChatGPT credentials to
    another origin.
    
    ## What changed
    
    - Add `use_chatgpt_auth` to HTTP MCP server configuration, defaulting to
    `false`.
    - Honor the setting only when the parsed server URL has the same HTTP(S)
    origin as the configured `chatgpt_base_url`; otherwise remove the
    capability before startup.
    - Resolve bearer tokens and static or environment-backed authorization
    headers before selecting authentication, with configured authorization
    taking precedence over ChatGPT session auth.
    - Enable the setting for the built-in Codex Apps and hosted plugin
    runtime endpoints while keeping Codex Apps caching and tool
    normalization scoped to the reserved server.
    - Persist the setting through MCP config rewrite paths and expose it in
    the generated config schema.
    - Load the current login state for `codex mcp list` so reported auth
    status matches runtime behavior.
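
    The same-origin gate can be illustrated with the sketch below. It uses a deliberately strict hand-written origin parser so the example stays standard-library-only; the real code presumably relies on a proper URL parser, and `Origin`, `origin_of`, `may_use_chatgpt_auth`, and the hosts in any usage are hypothetical.

    ```rust
    #[derive(Debug, PartialEq, Eq)]
    struct Origin {
        scheme: String,
        host: String,
        port: u16,
    }

    /// Simplified origin extraction for http(s) URLs. URLs with userinfo or
    /// IPv6 literals are rejected rather than parsed.
    fn origin_of(url: &str) -> Option<Origin> {
        let (scheme, rest) = url.split_once("://")?;
        let scheme = scheme.to_ascii_lowercase();
        let default_port: u16 = match scheme.as_str() {
            "http" => 80,
            "https" => 443,
            _ => return None,
        };
        let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
        if authority.contains(|c: char| matches!(c, '@' | '[')) {
            return None;
        }
        let (host, port) = match authority.rsplit_once(':') {
            Some((host, port)) => (host, port.parse::<u16>().ok()?),
            None => (authority, default_port),
        };
        if host.is_empty() {
            return None;
        }
        Some(Origin { scheme, host: host.to_ascii_lowercase(), port })
    }

    /// The opt-in is honored only when both origins parse and match exactly.
    fn may_use_chatgpt_auth(use_chatgpt_auth: bool, server_url: &str, chatgpt_base_url: &str) -> bool {
        use_chatgpt_auth
            && match (origin_of(server_url), origin_of(chatgpt_base_url)) {
                (Some(server), Some(base)) => server == base,
                _ => false,
            }
    }
    ```

    Comparing scheme, host, and port together, and failing closed when either URL does not parse, keeps look-alike hosts and userinfo tricks from matching.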
    
    ## Verification
    
    Core integration coverage exercises the complete streamable HTTP MCP
    startup path and verifies that:
    
    - a same-origin opted-in server receives the current ChatGPT access
    token;
    - an explicitly configured authorization header takes precedence;
    - a different-origin server completes MCP initialization and tool
    listing without receiving any ChatGPT authorization header.
  • core: add configurable <context_window_guidance> message (#29936)
    ## Why
    
    This PR adds a configurable `<context_window_guidance>` developer
    section immediately after `<context_window>`. Harness integrations need
    this section to give the model deployment-specific instructions for
    preparing for context-window transitions.
    
    ## What changed
    
    - Add an optional `features.token_budget.guidance_message` config with a
    1,000-byte runtime cap and generated schema support.
    - Render configured guidance as a developer `ContextualUserFragment`
    wrapped in `<context_window_guidance>` immediately after
    `<context_window>`.
    - Omit the section when guidance is unset, empty, or whitespace-only.
    - Preserve the resolved value in config locks and classify persisted
    guidance as contextual developer content.
    - Add integration coverage for rendered content and ordering.
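
    The omission rule can be sketched as below. The function name is hypothetical, and the exact whitespace inside the wrapper is an assumption; only the tag name and the unset, empty, or whitespace-only rule come from the description above.

    ```rust
    /// Renders the guidance section, or `None` when there is nothing to show.
    fn render_context_window_guidance(guidance: Option<&str>) -> Option<String> {
        let text = guidance?;
        if text.trim().is_empty() {
            // Empty and whitespace-only guidance is treated like unset guidance.
            return None;
        }
        Some(format!(
            "<context_window_guidance>\n{text}\n</context_window_guidance>"
        ))
    }
    ```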