1521 Commits

  • [codex] Treat max as a first-class reasoning effort (#30467)
    ## Why
    
    The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an
    opaque custom effort. That made the reasoning picker render it as
    lowercase `max` while known efforts use productized labels.
    
    Making `max` a known effort aligns catalog data, parsing, and UI
    presentation without changing the `max` wire value or persisted
    representation.
    
    ## What changed
    
    - Add first-class `ReasoningEffort::Max` parsing and serialization.
    - Use the typed effort in the Bedrock catalog and render it as `Max` in
    the TUI.
    - Preserve forward-compatible custom-effort coverage with a genuinely
    unknown `future` value.
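
    The parsing rule above can be sketched as follows. This is an
    illustrative Python model, not the Rust implementation: the
    `KnownEffort` and `CustomEffort` names, the helper functions, and the
    `high` member are assumptions. Only `max`, its `Max` label, and the
    unknown `future` value come from this description.

    ```python
    from dataclasses import dataclass
    from enum import Enum
    from typing import Union


    class KnownEffort(Enum):
        # `high` is an illustrative placeholder; the real enum may differ.
        HIGH = "high"
        MAX = "max"


    @dataclass(frozen=True)
    class CustomEffort:
        # Forward-compatible bucket for efforts this build does not know.
        value: str


    Effort = Union[KnownEffort, CustomEffort]


    def parse_effort(raw: str) -> Effort:
        """Map a wire string to a known effort, else keep it as custom."""
        try:
            return KnownEffort(raw)
        except ValueError:
            return CustomEffort(raw)


    def wire_value(effort: Effort) -> str:
        """Serialize back to the unchanged wire string."""
        return effort.value


    def label(effort: Effort) -> str:
        """Known efforts get a capitalized label; custom ones render as-is."""
        if isinstance(effort, KnownEffort):
            return effort.value.capitalize()
        return effort.value
    ```

    Under these assumptions, `max` round-trips to the same wire value while
    rendering as `Max`, and `future` still falls through to the custom path.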
    
    ### Before
    <img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM"
    src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364"
    />
    
    ### After
    <img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM"
    src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17"
    />
  • [codex] Restore v1 delegation guidance (#30511)
    ## Summary
    
    - restore the v1 clarification that requests for depth, research, or
    investigation do not authorize subagent spawning
    - restore guidance for keeping critical-path, urgent, tightly coupled,
    or difficult work local
    - update the focused v1 tool-search and spawn-description coverage
    
    ## Why
    
    PR #27919 simplified the v1 `spawn_agent` prompt by removing its
    delegation decision guidance. That left the authorization rule intact,
    but removed the instructions that constrained what should be delegated
    after spawning was authorized.
    
    Restore those guardrails while preserving later support for explicit
    delegation authorization from applicable AGENTS.md and skill
    instructions. Multi-agent v2 prompts are unchanged.
    
    ## User impact
    
    Models using the v1 multi-agent tool surface receive clearer guidance to
    delegate independent side work while keeping blocking work on the main
    rollout.
    
    ## Validation
    
    - `just fmt`
    - `git diff --check`
    - tests not run locally per repository guidance; CI will validate the
    focused coverage
  • [codex] Use model metadata for skills usage instructions (#29740)
    ## Summary
    
    - add a false-by-default `include_skills_usage_instructions` model
    metadata field
    - enable the field for the bundled `gpt-5.5` model metadata
    - consume the metadata in both core and extension skill rendering
    - remove hardcoded legacy-model matching and its marker plumbing
  • core: stabilize synthesized call output IDs (#30327)
    ## Why
    
    Response item IDs represent stable conversation identity.
    `ContextManager::for_prompt` repairs an unmatched call by synthesizing
    an `"aborted"` output in the disposable prompt projection, but that
    output previously had no ID. Assigning a fresh ID on every prompt build
    would make retries and resumes change otherwise identical model context
    and reduce prompt-cache reuse.
    
    The concrete bug is that these normalization-created outputs bypass the
    regular item-ID allocation path. Even with item IDs enabled, a prompt
    could therefore contain an identified call paired with a synthetic
    output whose `id` was missing. This change closes that gap by deriving
    the output ID from the source call's item ID. For legacy calls that have
    no item ID, the output remains ID-less because there is no stable source
    identity to derive from.
    
    The originating call already has a stable item ID under the item-ID
    model introduced in #28814. A prompt-only output can therefore derive
    stable identity from that call without mutating canonical history or
    persisted rollouts. This addresses the failure exposed by #30311 while
    keeping normalization read-only outside its detached prompt snapshot.
    
    UUIDv5 is intentional here because it is the standard namespaced,
    deterministic UUID construction. Using the output kind and source call
    ID as the name produces the same UUID on every projection while keeping
    output kinds in separate name domains. UUIDv7 would introduce randomness
    and time, so keeping it stable would require persisting the synthetic
    repair. UUIDv5 uses SHA-1 internally, but this is only an identity
    mapping—not an authenticity or security boundary.
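
    A minimal Python sketch of the derivation, using the standard library's
    `uuid.uuid5`. The namespace UUID, the `kind:call_id` name format, and
    the function name are assumptions made for illustration; the actual
    namespace, name encoding, and Responses API prefixes are not shown here.

    ```python
    import uuid
    from typing import Optional

    # Hypothetical namespace: any fixed UUID works as long as it never changes.
    SYNTHETIC_OUTPUT_NAMESPACE = uuid.uuid5(
        uuid.NAMESPACE_URL, "https://example.invalid/synthetic-call-output"
    )


    def synthetic_output_id(output_kind: str, source_call_id: Optional[str]) -> Optional[str]:
        """Derive a stable ID for a synthesized output from its source call."""
        if source_call_id is None:
            # Legacy call without an item ID: no stable identity to derive from.
            return None
        # Including the kind keeps output kinds in separate name domains.
        name = f"{output_kind}:{source_call_id}"
        return str(uuid.uuid5(SYNTHETIC_OUTPUT_NAMESPACE, name))
    ```

    Because the result depends only on the namespace and the name, rebuilding
    the prompt for the same unmatched call yields the same ID without
    persisting anything.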
    
    ## What changed
    
    - Derive a deterministic UUIDv5 ID for each synthesized call output from
    the source call item ID.
    - Use the Responses API prefix appropriate for function, custom-tool,
    tool-search, and local-shell outputs.
    - Preserve the existing insertion position immediately after the
    unmatched call.
    - Keep synthesized outputs prompt-only; no rollout, task-lifecycle,
    compaction, or raw-response behavior changes.
    
    ## Testing
    
    - `just test -p codex-core
    for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history`
    - `just test -p codex-core
    synthetic_call_output_id_is_stable_across_resumes`
    - `just test -p codex-core normalize_adds_missing_output`
    - `just test -p codex-core response_item_ids`
  • Preserve namespaces on custom tool calls (#30302)
    ## Summary
    
    - Preserve the optional namespace on custom tool calls during response
    deserialization and app-server replay.
    - Use the namespaced tool identifier for streaming argument handling and
    tool dispatch.
    - Regenerate app-server protocol schemas.
    - Add regression tests covering namespace serialization and routing.
    
    ## Testing
    
    - Ran affected protocol and app-server test suites.
    - Ran the full core test suite; two load-sensitive timing tests passed
    when rerun individually.
    - Ran Clippy and formatting checks.
    - Verified with a local end-to-end app-server replay that the namespace
    is preserved through the complete request/response flow.
  • [codex] consume pushed exec-server process events (#30273)
    ## Summary
    
    - complete unified-exec processes from the ordered event stream instead
    of issuing a final zero-wait `process/read`
    - add optional executor sandbox-denial state to `process/exited`
    - retain `process/read` as a retained-output and compatibility fallback
    for receiver lag, sequence gaps, and legacy servers
    - recover sandbox-denial state across transport reconnection
    - cover the real `TestCodex` remote-exec path without adding a public
    test-only event constructor
    
    ## Why
    
    A successful one-shot tool call currently receives its output and
    terminal notifications, then pays another wide-area `process/read` round
    trip before returning. Staging traces showed that remote response wait
    accounted for more than 99.8% of RPC time; local serialization,
    queueing, and deserialization were below 0.6 ms.
    
    ## Measured impact
    
    A direct staging A/B used the same build and route and changed only
    completion mode. Each arm ran three times with 30 one-shot
    `/usr/bin/true` calls per run. The table reports the median of the three
    per-run percentiles.
    
    | Metric | Final `process/read` | Pushed events | Change |
    | --- | ---: | ---: | ---: |
    | End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
    | End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
    | Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
    | Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |
    
    TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
    successful, complete, in-order event path issued zero final
    `process/read` calls.
    
    ## Compatibility and recovery
    
    - new servers send `sandboxDenied` on `process/exited`
    - legacy servers omit it, which triggers one compatibility
    `process/read`
    - broadcast lag or a sequence gap triggers a retained-output read
    - recovery remains bounded by the server's existing 1 MiB
    retained-output window
    - complete, in-order event streams issue no completion read
    - sandbox denial is attached to the exit event before consumers can
    observe process completion
    - server-first and client-first rollouts remain wire-compatible;
    server-first realizes the latency win immediately
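
    The recovery rules above amount to a small decision function. The
    sketch below is a simplified Python model, not the client code: the
    `Event` shape, the assumption that sequence numbers start at 1, and the
    reason strings are all hypothetical.

    ```python
    from dataclasses import dataclass
    from typing import List, Optional


    @dataclass
    class Event:
        seq: int                               # assumed to start at 1
        kind: str                              # "output", "exited", or "closed"
        sandbox_denied: Optional[bool] = None  # None models a legacy server


    def completion_read_reason(events: List[Event], lagged: bool = False) -> Optional[str]:
        """Return None when pushed events suffice, else why a read is needed."""
        if lagged:
            return "receiver lag"
        for expected, event in enumerate(events, start=1):
            if event.seq != expected:
                return "sequence gap"
        exited = next((e for e in events if e.kind == "exited"), None)
        if exited is None:
            raise ValueError("call this only after the exit event has arrived")
        if exited.sandbox_denied is None:
            # Legacy servers omit the field: one compatibility read.
            return "legacy server"
        return None
    ```

    Only the last branch returns `None`, which corresponds to the complete,
    in-order path that issues no completion read.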
    
    ## Integration coverage
    
    The `TestCodex` suite exercises four distinct remote-exec contracts:
    
    - complete pushed output/exit/close with zero reads
    - direct pushed sandbox denial with zero reads
    - legacy missing denial metadata with exactly one compatibility read
    - count-bounded replay eviction recovered from retained output without
    duplication
    
    ## Validation
    
    - `just test -p codex-core
    exec_command_consumes_pushed_remote_process_events`: 4 passed
    - `just test -p codex-core unified_exec::process_tests::`: 4 passed
    - `just test -p codex-exec-server`: 294 passed, 2 skipped
    - `just test -p codex-exec-server-protocol`: 5 passed
    - `just test -p codex-rmcp-client`: 89 passed, 2 skipped
    - focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
    - scoped `just fix` passed for core and exec-server
    - `just fmt` passed
    
    The complete workspace suite was not rerun; focused Cargo and Bazel
    coverage passed for the changed behavior.
  • [codex] allow AGENTS.md and skills to authorize delegation (#30274)
    Update the multi-agent v2 (MAv2) prompt to reference AGENTS.md and
    skills more explicitly.
    
    This should mirror https://github.com/openai/codex/pull/27919.
  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` field to
    `Thread`. It is stored in `SessionMeta` in the JSONL rollout file and as
    a new column in the SQLite `thread_metadata` table, and it is exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```text
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N or later is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
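
    The fail-closed rule can be modeled as a single gate. This Python
    sketch is illustrative only: the function and exception names are
    hypothetical, and it assumes an unknown future mode gets the same
    metadata-only treatment as `paginated`.

    ```python
    class UnsupportedHistoryMode(Exception):
        """Stands in for the unsupported JSON-RPC error."""


    def check_thread_access(history_mode: str, method: str, include_turns: bool = False) -> None:
        """Raise unless this binary may serve `method` for the thread."""
        if history_mode == "legacy":
            return
        # Anything else ("paginated" or an unknown mode) is never treated
        # as legacy: only metadata-only discovery paths are allowed.
        metadata_only = method == "thread/list" or (
            method == "thread/read" and not include_turns
        )
        if metadata_only:
            return
        raise UnsupportedHistoryMode(
            f"{method} is not supported for historyMode={history_mode!r}"
        )
    ```

    With this gate, `thread/list` still surfaces a paginated thread, while
    `thread/resume` and `thread/read(includeTurns=true)` raise instead of
    falling back to legacy behavior.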
  • [codex] wire process-owned code mode host into core (#30142)
    ## Summary
    
    - add the `code_mode_host` feature flag and select
    `ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
    - initialize code-mode sessions lazily so a missing host reports a tool
    error without failing thread startup
    - resolve `codex-code-mode-host` beside the running Codex binary by
    default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
    - add unit and end-to-end coverage for host resolution and graceful
    missing-host behavior
    
    ## Why
    
    This wires the process-owned session client from #30112 into the core
    service behind an opt-in rollout gate. Packaged Codex installations can
    place the helper in the same `bin` directory as the main executable
    without relying on `PATH`, while development and custom installations
    can continue to override the helper path.
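
    A small Python sketch of that resolution order. The function name is
    hypothetical, and treating an empty override as unset is an assumption;
    the binary name and the `CODEX_CODE_MODE_HOST_PATH` variable come from
    this description.

    ```python
    from pathlib import Path
    from typing import Mapping

    HOST_BINARY = "codex-code-mode-host"
    OVERRIDE_VAR = "CODEX_CODE_MODE_HOST_PATH"


    def resolve_host_path(current_exe: Path, env: Mapping[str, str]) -> Path:
        """Prefer the explicit override, else look beside the running binary."""
        override = env.get(OVERRIDE_VAR)
        if override:
            return Path(override)
        # No PATH lookup: the helper is expected next to the main executable.
        return current_exe.parent / HOST_BINARY
    ```

    Resolution only computes a path. Whether the helper exists would be
    checked later, when a code-mode session is first initialized.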
    
    ## Stack
    
    - Depends on #30112
    - Base branch: `cconger/process-owned-session-runtime-4-client`
    
    ## Validation
    
    Build `codex` and `codex-code-mode-host`, then run:
    `CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host"
    ./target/debug/codex --enable code_mode_host`
  • Retry failed Codex Apps MCP startup (#29920)
    ## Problem
    
    The built-in Codex Apps MCP client shares a future for the full startup
    operation: connect, complete `initialize`, fetch the initial tools, and
    return a usable client. Sharing deduplicates startup work, but it also
    memoizes terminal errors.
    
    After a transient connection, handshake, or initial `tools/list`
    failure, later tool builds observe the same failed future. The thread
    cannot reconnect after the backend recovers and continues serving its
    startup-time cached tool snapshot, which may be empty or stale.
    
    ## Fix
    
    When Apps MCP startup ends in an error, Codex starts bounded recovery
    without putting startup latency on tool-router construction:
    
    1. The current tool build immediately continues with the cached startup
    snapshot.
    2. After the initial failure is reported, Codex starts one fresh full
    startup attempt in the background.
    3. Concurrent tool builds share that in-flight attempt and also continue
    with cached tools.
    4. On success, the recovered client becomes active, refreshes the Apps
    tools cache, emits a `Ready` startup status, and is reused by later
    operations.
    5. On failure, the cache remains unchanged and later tool builds may
    start another background attempt after exponential cooldown: 1s, 2s, 4s,
    8s, 16s, then 30s maximum.
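
    The cooldown schedule in step 5 can be written as a capped power of
    two. This is a Python sketch with hypothetical function and parameter
    names; it assumes the first background attempt (step 2) runs without
    any cooldown.

    ```python
    MAX_COOLDOWN_SECONDS = 30


    def cooldown_seconds(failed_recovery_attempts: int) -> int:
        """Cooldown before the next background attempt: 1, 2, 4, 8, 16, 30."""
        if failed_recovery_attempts < 1:
            return 0
        return min(2 ** (failed_recovery_attempts - 1), MAX_COOLDOWN_SECONDS)


    def may_start_recovery(
        now: float,
        last_failure_at: float,
        failed_recovery_attempts: int,
        attempt_in_flight: bool,
    ) -> bool:
        """Allow at most one in-flight attempt, and only after the cooldown."""
        if attempt_in_flight:
            return False
        return now - last_failure_at >= cooldown_seconds(failed_recovery_attempts)
    ```

    A tool build that gets `False` here simply continues with the cached
    tools, so the cooldown never adds latency to the request itself.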
    
    Each recreated startup performs a fresh MCP `initialize` and uncached
    `tools/list`. The MCP client retains its existing bounded retries for
    retryable `initialize` and `tools/list` failures.
    
    This avoids adding the Apps startup timeout to every request during a
    sustained outage.
    
    ## Scope
    
    This is limited to the built-in Codex Apps MCP client:
    
    - no reconnects for user-configured MCP servers;
    - no cache deletion; and
    - no proactive refresh for a healthy client with stale tools.
    
    ## Tests
    
    Coverage verifies:
    
    - tool builds return cached tools without waiting for a blocked
    reconnect;
    - concurrent tool builds start only one background reconnect;
    - failed reconnects preserve cached tools and respect exponential
    cooldown;
    - a recovered client is retained and reused; and
    - a long-lived thread exposes recovered app tools on a later follow-up.
    
    Validation:
    
    - `just test -p codex-mcp` — 95 passed
    - `just test -p codex-core
    later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures
    --no-capture` — passed
    - `just fix -p codex-mcp`
    - `just fmt`
  • [codex] allow CCA image generation and web search extensions (#29909)
    ## Summary
    
    - allow the standalone image-generation and web-search extensions for
    the actor-authorized provider shape used by CCA
    - preserve builtin `image_generation` and `web_search` for older models
    and existing flows
    - keep ordinary non-OpenAI providers excluded from both extensions
    - remove only the image extension local managed-AuthManager requirement
    that CCA cannot satisfy
    - share actor-authorization detection through `ModelProviderInfo`
    - keep Core tests focused on routing behavior and cover header-shape
    edge cases in `model-provider-info`
    - add a Responses Lite regression that verifies both
    `image_gen.imagegen` and `web.run`
    
    ## Why
    
    CCA uses a provider named `local` with `requires_openai_auth: false` and
    a non-empty `x-openai-actor-authorization` header. Core accepts that
    provider shape, but both extension provider-name gates rejected it;
    image generation additionally required a Codex-managed login.
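
    The provider shape described above can be checked along these lines.
    This Python sketch is an assumption-heavy model: the `ProviderInfo`
    fields, the case-insensitive header match, and the `openai` name check
    are hypothetical stand-ins for the real `ModelProviderInfo` logic.

    ```python
    from dataclasses import dataclass, field
    from typing import Dict

    ACTOR_AUTH_HEADER = "x-openai-actor-authorization"


    @dataclass
    class ProviderInfo:
        name: str
        requires_openai_auth: bool
        http_headers: Dict[str, str] = field(default_factory=dict)


    def is_actor_authorized(provider: ProviderInfo) -> bool:
        """True when the provider carries a non-empty actor-authorization header."""
        return any(
            key.lower() == ACTOR_AUTH_HEADER and bool(value.strip())
            for key, value in provider.http_headers.items()
        )


    def allows_standalone_extensions(provider: ProviderInfo) -> bool:
        """Gate shared by the image-generation and web-search extensions."""
        # Assumed name-based check for the default OpenAI provider.
        return provider.name == "openai" or is_actor_authorized(provider)
    ```

    In this model the CCA provider passes because of its header, while an
    ordinary non-OpenAI provider without that header stays excluded.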
    
    The standalone paths must coexist with existing builtin tools. New
    Responses Lite models can receive `image_gen.imagegen` and `web.run`,
    while older models continue using builtin tools.
    
    ## Impact
    
    This enables both standalone extensions for CCA once installed
    downstream, without removing or changing builtin-tool compatibility for
    older models.
    
    ## Validation
    
    - `just test -p codex-core
    responses_lite_exposes_standalone_tools_for_actor_authorized_provider`
    - `just test -p codex-core
    responses_lite_uses_standalone_web_search_and_image_generation`
    - `just test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-web-search-extension`
    - `just test -p codex-model-provider-info`
    - `just fmt`
    - `git diff --check`
  • Expose MCP app identity in app context (#29934)
    ## Why
    
    MCP tool-call events need to expose trusted app identity and action
    metadata directly so v2 clients do not have to infer it from tool names
    or resource URIs.
    
    ## What changed
    
    - Add optional `appName`, `templateId`, and `actionName` fields to MCP
    tool-call `appContext`.
    - Populate `appName` and `templateId` from trusted Codex Apps metadata,
    and derive `actionName` from the trusted app resource metadata.
    - Preserve all three fields through core events, legacy protocol events,
    persisted thread history, resume redaction, and app-server v2 responses.
    - Document the public `appContext` fields in
    `codex-rs/app-server/README.md`.
    - Regenerate app-server JSON and TypeScript schemas and add coverage for
    serialization, persistence, redaction, and metadata propagation.
    
    ## Validation
    
    - `just test -p codex-app-server-protocol mcp_tool_call`
    - `just test -p codex-core
    mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
    mcp_tool_call_item_includes_app_identity`
    - `just write-app-server-schema`
    
    ---------
    
    Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>
  • Pin MCP runtimes to model steps (#30101)
    ## Why
    
    An MCP refresh can replace the session's current manager while a model
    step is still running. The step must execute calls through the same
    manager whose tools it advertised.
    
    ## Boundary
    
    ```text
    current session MCP runtime
              |
              | capture once for this model step
              v
    StepContext.mcp
      - exact MCP config
      - exact connection manager
      - exact runtime environment context
    ```
    
    ```rust
    pub struct McpRuntimeSnapshot {
        config: Arc<McpConfig>,
        manager: Arc<McpConnectionManager>,
        runtime_context: McpRuntimeContext,
    }
    ```
    
    ## Example
    
    ```text
    step A captures runtime A and advertises A's tools
    refresh publishes runtime B
    step A tool call -> runtime A
    next step        -> runtime B
    ```
    
    Capturing the snapshot is only an `Arc` clone. It does not restart MCPs
    or make an RPC.
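
    The same capture-once behavior, modeled in Python for illustration. The
    classes below are simplified stand-ins (an object reference plays the
    role of the `Arc`), and `publish` and `capture` are hypothetical method
    names.

    ```python
    import threading
    from dataclasses import dataclass
    from typing import Tuple


    @dataclass(frozen=True)
    class McpRuntime:
        """Stand-in for the config, connection manager, and runtime context."""
        name: str
        tools: Tuple[str, ...]


    class Session:
        def __init__(self, runtime: McpRuntime) -> None:
            self._lock = threading.Lock()
            self._current = runtime

        def publish(self, runtime: McpRuntime) -> None:
            # Swap the whole runtime at once so readers never see a mix.
            with self._lock:
                self._current = runtime

        def capture(self) -> McpRuntime:
            # Cheap: returns a reference, with no restart and no RPC.
            with self._lock:
                return self._current


    class StepContext:
        def __init__(self, session: Session) -> None:
            # Captured once; every tool call in this step uses this runtime.
            self.mcp = session.capture()
    ```

    A step created before a refresh keeps its original runtime, and the old
    runtime is released once the last step holding it finishes.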
    
    ## What changes
    
    - Captures one MCP runtime in `StepContext`.
    - Uses it for tool planning, tool calls, resources, approvals, connector
    attribution, and elicitation.
    - Publishes replacement runtimes atomically.
    - Lets an old runtime live only while an in-flight step or request still
    holds its `Arc`.
    
    Most of this diff is mechanical routing from the session-global manager
    to `step_context.mcp`; it does not introduce selected-plugin discovery
    yet.
    
    ## What does not change
    
    - No plugin or extension migration.
    - No new MCP cache policy.
    - No environment file watching.
    - No client sharing between separate managers.
    
    ## Stack
    
    1. Extension-owned World State sections.
    2. Project executor skills through World State.
    3. **This PR:** pin one MCP runtime to each model step.
    4. Project selected MCP/app/connector metadata by environment
    availability.
    5. One end-to-end integration scenario.
  • [codex] impl delivery_mode: current time reminders on response boundaries (#30033)
    ## Summary
    - track user-like input and tool-output boundaries in current-time
    reminder state
    - gate reminder injection when `delivery_mode` is
    `after_user_or_tool_output`
    - preserve interval debounce and forced reminders after context-window
    changes
    
    ## Why
    Training can request reminders only after user or tool-output items
    while keeping the existing canonical pre-inference history-injection
    path.
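
    One way to picture the gate, as a hedged Python sketch. The function
    name, the `last_item_kind` values, and the choice to apply the boundary
    check before forced reminders are assumptions; only the
    `after_user_or_tool_output` mode name comes from this change.

    ```python
    BOUNDARY_KINDS = {"user", "tool_output"}  # hypothetical item kinds


    def should_inject_reminder(
        *,
        delivery_mode: str,
        last_item_kind: str,
        seconds_since_last: float,
        interval_seconds: float,
        forced: bool = False,
    ) -> bool:
        """Decide whether to inject a current-time reminder before inference."""
        if delivery_mode == "after_user_or_tool_output" and last_item_kind not in BOUNDARY_KINDS:
            # Not at a user-like or tool-output boundary: skip this time.
            return False
        if forced:
            # For example, after a context-window change (assumed behavior).
            return True
        # Interval debounce is unchanged by the delivery mode.
        return seconds_since_last >= interval_seconds
    ```

    The delivery mode only narrows where a reminder may appear. The
    interval still decides how often one is due.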
    
    ## Validation
    - `just test -p codex-core
    current_time_reminders_can_follow_only_user_or_tool_outputs`
    - `just test -p codex-core
    current_time_reminders_follow_time_interval_and_persist_in_history`
    - `just test -p codex-core
    current_time_reminder_is_refreshed_after_compaction`
    - `just fix -p codex-core`
  • core: expose permission profile to shell tools (#29941)
    ## tl;dr
    
    Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name
    of the current permission profile when invoking a shell tool.
    
    ## Why
    
    Shell tool owners may need to launch nested commands under the same
    named permission profile, including through `codex sandbox -P PROFILE
    --include-managed-config`. Until now, child processes could observe
    sandbox and network metadata but could not identify the active named
    permission profile.
    
    The `--include-managed-config` flag is essential when a helper
    reconstructs the sandbox from a profile name: it ensures the nested
    sandbox also loads managed enterprise requirements. Without it, using
    the inherited profile could unintentionally create a sandbox that does
    not enforce the organization's managed restrictions.
    
    The new environment value is intentionally informational and **must not
    be treated as trusted input**. Any process in the ancestry can overwrite
    an environment variable, so a consumer that passes this value to `codex
    sandbox -P` must first validate it against the profiles that helper is
    authorized to use.
    
    ## Example Use Case
    
    Suppose an organization provides a trusted `remote-bash` wrapper that
    lets Codex run a command on an approved build host. The local shell
    command uses the named `:workspace` permission profile:
    
    ```toml
    default_permissions = ":workspace"
    ```
    
    The command exposed to the model is a small zsh wrapper. It deliberately
    delegates with `exec`, preserving the original arguments and process
    environment:
    
    ```zsh
    #!/usr/bin/env zsh
    exec /opt/codex-tools/remote_bash.py "$@"
    ```
    
    The model invokes the public wrapper, not its Python implementation:
    
    ```sh
    /opt/codex-tools/remote-bash \
      --host builder.example.com \
      -- printf '%s' 'hello world'
    ```
    
    Only the inner implementation is authorized to escape the local sandbox:
    
    ```starlark
    prefix_rule(
        pattern=["/opt/codex-tools/remote_bash.py"],
        decision="allow",
    )
    ```
    
    With zsh-fork, execution begins with `remote-bash` inside the
    `:workspace` sandbox. When the wrapper calls `exec`, the exact prefix
    rule matches `remote_bash.py`, so that inner script is restarted
    unsandboxed. The escalated process inherits:
    
    ```text
    CODEX_PERMISSION_PROFILE=:workspace
    ```
    
    Inheritance does not make the value trustworthy. `remote_bash.py`
    independently allowlists both the remote host and the permission profile
    before using either value. In particular, a forged value such as
    `:danger-full-access` is rejected before it can reach `codex sandbox
    -P`:
    
    ```python
    import argparse
    import os
    import shlex
    import sys
    
    ALLOWED_HOSTS = {"builder.example.com"}
    ALLOWED_PROFILES = {":workspace"}
    
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", required=True)
    separator = sys.argv.index("--")
    args = parser.parse_args(sys.argv[1:separator])
    command = sys.argv[separator + 1:]
    
    if args.host not in ALLOWED_HOSTS:
        parser.error("host is not allowlisted")
    if not command:
        parser.error("the remote command must not be empty")
    
    profile = os.environ.get("CODEX_PERMISSION_PROFILE")
    if not profile:
        raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
    if profile not in ALLOWED_PROFILES:
        raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")
    
    remote_command = shlex.join(command)
    sandbox_command = shlex.join([
        "codex", "sandbox", "-P", profile,
        "--include-managed-config", "--",
        "bash", "-lc", remote_command,
    ])
    print(shlex.join(["ssh", args.host, sandbox_command]))
    ```
    
    This builds each command layer as an argument vector and uses
    `shlex.join()` at the boundary, rather than interpolating untrusted
    shell text. After validation and parsing, the nested command has this
    structure:
    
    ```text
    ssh argv:
      ["ssh", "builder.example.com", SANDBOX_COMMAND]
    
    SANDBOX_COMMAND argv:
      ["codex", "sandbox", "-P", ":workspace",
       "--include-managed-config", "--",
       "bash", "-lc", "printf %s 'hello world'"]
    
    bash -lc payload argv:
      ["printf", "%s", "hello world"]
    ```
    
    A production implementation could execute that SSH command. The
    integration fixture prints it and parses the result back into arguments,
    verifying the complete flow:
    
    ```text
    model invokes outer wrapper
      -> zsh-fork starts wrapper under :workspace
      -> wrapper execs allowlisted Python script
      -> prefix rule restarts Python script unsandboxed
      -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
      -> Python script verifies :workspace is allowlisted
      -> remote command runs codex sandbox -P :workspace
         with --include-managed-config
      -> nested sandbox honors managed enterprise requirements
    ```
    
    This gives the trusted helper access to resources outside the local
    sandbox—such as SSH credentials—while ensuring that it can select only
    an explicitly authorized profile and that work on the remote host
    remains subject to the organization's managed requirements.
    
    ## What changed
    
    - Inject `CODEX_PERMISSION_PROFILE` after shell environment policy
    evaluation so the active profile wins over inherited or configured stale
    values.
    - Apply the variable to both `shell_command` and unified `exec_command`,
    including local, zsh-fork, and remote exec-server paths.
    - Remove stale values when the session has no active named profile.
    - Preserve the current profile value when loading a shell snapshot so a
    parent snapshot cannot restore an older profile.
    
    ## Testing
    
    - Added classic-shell integration coverage proving an exact prefix rule
    can run a `require_escalated` script outside the `:workspace` sandbox
    while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
    - Added zsh-fork integration coverage in which the model invokes an
    outer zsh wrapper, an inner allowlisted `remote_bash.py` runs
    unsandboxed, and its printed SSH command reconstructs the inherited
    `:workspace` sandbox with `--include-managed-config` while preserving
    every argument after `--`.
    - The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and
    validates it against `ALLOWED_PROFILES` before constructing the nested
    command.
    - Assert that the reconstructed sandbox command includes
    `--include-managed-config` so nested use of the inherited profile cannot
    bypass managed enterprise requirements.
    - Added coverage for overriding and removing stale profile values.
    - Verified `shell_command` receives the selected active profile.
    - Added shell snapshot coverage using `printenv
    CODEX_PERMISSION_PROFILE`.
  • [codex] current time reminder interval to be set to 0 (#30029)
    A zero interval lets callers request a reminder at every
    otherwise-eligible inference boundary.
    
    ## Validation
    - `just test -p codex-core load_config_resolves_current_time_reminder`
  • feat: add provider-aware model fallback to thread start (#29942)
    ## Why
    
    Helper threads such as task title generation can request a model ID that
    is valid for the default OpenAI provider but unavailable from the active
    provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
    provider static catalog exposes Bedrock model IDs such as
    `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
    404s and can surface a misleading turn error even when the main turn
    succeeds.
    
    Clients need an explicit way to ask app-server to resolve an unavailable
    helper model to the active provider default. That fallback must remain
    limited to providers with an authoritative static catalog so custom or
    dynamically discovered model IDs are not rewritten based on an
    incomplete catalog.
    
    Fixes #28741.
    
    ## What changed
    
    - Add the experimental `allowProviderModelFallback` option to
    `thread/start`, defaulting to `false` to preserve existing behavior.
    - Thread the option through thread creation and model selection.
    - When enabled for a static model manager, preserve requested models
    present in the catalog and replace unavailable models with the provider
    default.
    - Continue preserving explicit model IDs for dynamic model managers
    without fetching a catalog solely to validate them.
    - Document the new `thread/start` behavior in the app-server API
    overview.
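
    The selection rule can be summarized in a few lines. This Python sketch
    uses hypothetical names and models a dynamic model manager as
    `static_catalog=None`; the model IDs are the ones quoted in this
    description.

    ```python
    from typing import Optional, Sequence


    def resolve_model(
        requested: str,
        *,
        allow_fallback: bool,
        static_catalog: Optional[Sequence[str]],
        provider_default: str,
    ) -> str:
        """Pick the model for a new thread under the fallback option."""
        if not allow_fallback or static_catalog is None:
            # Option off, or no authoritative catalog: keep the explicit ID.
            return requested
        if requested in static_catalog:
            return requested
        return provider_default
    ```

    Under these assumptions, a Bedrock request for `gpt-5.4-mini` resolves
    to the provider default, while the same request with the option left at
    `false` is passed through unchanged.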
    
    ## Test
    Temporary test-client harness:
    ```rust
    ThreadStartParams {
        model: Some("gpt-5.4-mini".to_string()),
        allow_provider_model_fallback: true,
        ..Default::default()
    }
    ```
    Command:
    ```shell
    CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
    CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
    ./target/debug/codex-app-server-test-client \
      --codex-bin ./target/debug/codex \
      -c 'model_provider="amazon-bedrock"' \
      send-message-v2 --experimental-api ignored
    ```
    Relevant output:
    ```text
    > "method": "thread/start",
    > "params": {
    >   "model": "gpt-5.4-mini",
    >   "modelProvider": null,
    >   "allowProviderModelFallback": true,
    >   ...
    > }
    
    < "result": {
    <   "model": "openai.gpt-5.5",
    <   "modelProvider": "amazon-bedrock",
    <   ...
    < }
    ```
  • Persist selected capability roots and resolve availability per model step (#29856)
    ## Why
    
    `selectedCapabilityRoots` is durable thread intent: “use this capability
    root from environment `worker`.”
    
    The important product assumption is:
    
    > One environment ID always names the same logical executor and stable
    contents.
    
    `worker` does not silently change from executor A to an unrelated
    executor B. The process-local connection handle for `worker` can still
    be replaced while Codex is running, though, for example when
    `environment/add` registers a fresh handle for the same logical
    environment.
    
    The thread should persist only the stable selection. Each model step
    should pair that selection with the exact ready handle captured for that
    step.
    
    ## The boundary
    
    ```text
    persisted thread intent
      plugin@1 -> environment "worker"
                    |
                    | capture the current step
                    v
    model-step view
      unavailable, or
      plugin@1 + worker's exact captured ready handle
    ```
    
    The environment ID is the stable identity and cache key. The
    `Arc<Environment>` is only a process-local handle retained so consumers
    of one model step use the same captured environment. It is never
    persisted and it does not imply different environment contents.
    
    ## What changes
    
    ### Persist the stable selection
    
    Selected roots are written into `SessionMeta` and restored with the
    thread. Forked subagents inherit the same selections, including
    bounded-history forks.
    
    Only stable data is persisted: root ID, environment ID, and root path.
    
    ### Capture readiness together with the exact handle
    
    The environment snapshot records:
    
    ```rust
    environment_id -> Some(Arc<Environment>) // ready in this step
    environment_id -> None                   // still starting in this step
    ```
    
    This prevents readiness and execution from coming from different
    registry snapshots.
    
    For example:
    
    ```text
    step snapshot: worker -> handle A, ready
    environment/add: worker -> fresh handle B for the same logical environment
    current step: plugin@1 still uses captured handle A
    ```
    
    Without carrying handle A in the snapshot, the resolver could combine “A
    was ready” with handle B and treat B as ready before it had finished
    starting.
    
    This does not change cache invalidation. Stable capability metadata
    remains identified by environment ID and capability root. Replacing a
    process-local handle under the same stable environment ID does not
    invalidate or rediscover that metadata.
    
    ### Resolve availability per model step
    
    - A ready captured environment produces resolved roots using its
    captured handle.
    - A starting, missing, or failed environment is omitted from that step.
    - A selected lazy environment that is outside the turn's captured
    environment set is asked to start, and a later step can observe it as
    ready.
    - No capability files are scanned here.
    
    Transient transport disconnects remain the remote client's reconnect
    concern. This PR models initial attachment/readiness; it does not add
    live socket-connectivity state.
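
    A minimal sketch of the per-step resolution described above, under simplifying assumptions: `Environment`, `SelectedRoot`, `StepSnapshot`, `ResolvedRoot`, and `resolve_for_step` are hypothetical names, and lazy-start requests and failure states are left out.

    ```rust
    use std::collections::HashMap;
    use std::sync::Arc;

    /// Hypothetical process-local environment handle.
    pub struct Environment {
        pub label: &'static str,
    }

    /// Durable selection persisted with the thread: stable data only.
    pub struct SelectedRoot {
        pub root_id: String,
        pub environment_id: String,
        pub root_path: String,
    }

    /// Per-step snapshot: `Some(handle)` means ready, `None` means still starting.
    pub type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;

    /// A root resolved for one model step, pinned to the captured handle.
    pub struct ResolvedRoot {
        pub root_id: String,
        pub environment: Arc<Environment>,
    }

    /// Resolves selections against one snapshot.
    /// Starting or missing environments are omitted from the step.
    pub fn resolve_for_step(selected: &[SelectedRoot], snapshot: &StepSnapshot) -> Vec<ResolvedRoot> {
        selected
            .iter()
            .filter_map(|root| {
                // Readiness and the handle come from the same snapshot entry.
                let handle = snapshot.get(&root.environment_id)?.as_ref()?;
                Some(ResolvedRoot {
                    root_id: root.root_id.clone(),
                    environment: Arc::clone(handle),
                })
            })
            .collect()
    }
    ```

    Because the resolved root holds a clone of the captured `Arc`, a later registry replacement for the same environment ID does not change what the current step uses.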
    
    ## Example
    
    ```text
    thread selection: plugin@1 -> environment "worker"
    
    step 1: worker is starting -> plugin@1 unavailable
    step 2: worker is ready    -> plugin@1 resolves through worker's captured handle
    step 3: fresh local handle -> current step remains pinned; a later step captures its own view
    ```
    
    Temporary unavailability does not discard the durable selection. Later
    PRs can retain stable metadata caches while projecting only currently
    available capabilities into model-visible World State.
    
    ## Compatibility
    
    The app-server request shape does not change. Older rollouts without
    `selected_capability_roots` deserialize to an empty list.
    
    ## Stack
    
    1. **This PR:** persist stable selected roots and resolve them through
    an exact model-step handle.
    2. #29960: cache stable skill metadata and project available skills into
    World State.
    3. #29946: cache stable plugin declarations and manage the separate live
    MCP runtime.
  • Support OAuth for HTTP MCP servers from selected executor plugins (#28529)
    ## Why
    
    #28522 routes selected-plugin HTTP MCP traffic through the owning
    executor, but OAuth bootstrap and refresh still used host-local clients.
    Executor-only servers therefore cannot complete discovery or login
    through the same network boundary as the MCP connection.
    
    ## What changed
    
    - adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
    contract
    - let RMCP own discovery, dynamic registration, PKCE, token exchange,
    and refresh
    - route auth status, persisted-token startup, and app-server login
    through the server runtime while preserving the existing local discovery
    path
    - add optional `threadId` to `mcpServer/oauth/login` and echo it in the
    completion notification
    - implement RMCP's redirect policy and 1 MiB OAuth response limit over
    executor HTTP
    - cover selected-thread OAuth discovery and login through an
    executor-only route
    
    Depends on #28522.
  • core: reconcile legacy WorldState sections (#29997)
    ## Why
    
    Older rollouts can retain model-visible context for a WorldState section
    without having a persisted snapshot for that section. Treating the
    missing snapshot as definitely absent can duplicate old context or fail
    to tell the model that it was replaced or removed.
    
    This provides a generic migration path for sections moving into
    WorldState, beginning with AGENTS.md.
    
    Builds on #29810.
    
    ## What changed
    
    - distinguish section state that is absent, known from a persisted
    snapshot, or unknown because matching legacy context remains in history
    - let WorldState sections identify their own legacy fragments while
    `ContextManager` owns history reconciliation and baseline persistence
    - make AGENTS.md emit one conservative replacement or removal update for
    legacy history, then deduplicate from the newly persisted baseline
    - preserve existing environment rendering when persisted section data is
    missing or malformed
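
    The three-way section state can be pictured roughly as below. This is a sketch under assumptions, not the `ContextManager` code: `SectionBaseline`, `SectionUpdate`, `plan_update`, and `next_baseline` are hypothetical names chosen for illustration.

    ```rust
    /// What is known about a section's model-visible baseline.
    #[derive(Debug, PartialEq, Eq)]
    pub enum SectionBaseline {
        /// Nothing for this section has been shown to the model.
        Absent,
        /// A persisted snapshot records exactly what the model saw.
        Known(String),
        /// No snapshot, but matching legacy context remains in history.
        UnknownLegacy,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum SectionUpdate {
        None,
        Initial(String),
        Replace(String),
        Remove,
    }

    /// Chooses the update to emit for the current section content.
    pub fn plan_update(baseline: &SectionBaseline, current: Option<&str>) -> SectionUpdate {
        match (baseline, current) {
            (SectionBaseline::Absent, None) => SectionUpdate::None,
            (SectionBaseline::Absent, Some(text)) => SectionUpdate::Initial(text.to_string()),
            // Known and unchanged: deduplicate.
            (SectionBaseline::Known(old), Some(text)) if old.as_str() == text => SectionUpdate::None,
            // Changed, or unknown legacy content: emit one conservative replacement.
            (SectionBaseline::Known(_), Some(text)) | (SectionBaseline::UnknownLegacy, Some(text)) => {
                SectionUpdate::Replace(text.to_string())
            }
            (SectionBaseline::Known(_), None) | (SectionBaseline::UnknownLegacy, None) => {
                SectionUpdate::Remove
            }
        }
    }

    /// Baseline to persist after the update has been emitted.
    pub fn next_baseline(current: Option<&str>) -> SectionBaseline {
        match current {
            Some(text) => SectionBaseline::Known(text.to_string()),
            None => SectionBaseline::Absent,
        }
    }
    ```

    After the first conservative update, the persisted baseline is `Known` or `Absent`, so a second pass with the same content plans no further update.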
    
    ## Testing
    
    - `just test -p codex-core world_state`
    - `just test -p codex-core
    cold_resume_invalidates_deleted_legacy_agents_md_once -- --exact`
  • core: make AGENTS.md react to environment changes (#29810)
    ## Why
    
    With deferred executors, a turn can begin before a remote environment
    attaches. AGENTS.md discovery previously ran only during session setup,
    so instructions from a later environment never reached the model or the
    session instruction sources.
    
    WorldState persistence has now landed, so this uses the durable
    model-visible baseline directly instead of carrying a temporary
    resume/fork compatibility path.
    
    ## What
    
    - Add an `AgentsMdManager` in `SessionServices` to own host
    instructions, loaded state, and refresh caching.
    - When `DeferredExecutor` is enabled, refresh AGENTS.md when attached
    environment selections change and freeze the result in the corresponding
    `StepContext`.
    - Represent AGENTS.md as a persisted WorldState section for every
    session, with bounded initial, replacement, and removal updates.
    - Remove duplicate AGENTS.md state and rendering from
    `SessionConfiguration` and `TurnContext`.
    - Build initial context, per-request updates, and compaction context
    from the same step-scoped value.
    - On resume and fork, compare current instructions with the restored
    WorldState baseline and inject a replacement exactly once when they
    differ.
    
    Builds on #29833, #29835, and #29837.
    
    ## Tests
    
    - Covers a remote environment becoming ready mid-turn, with AGENTS.md
    appearing on the next request exactly once and updating canonical
    instruction sources.
    - Covers full, unchanged, replaced, and removed AGENTS.md WorldState
    rendering.
    - Covers changed instructions across cold resume and fork without
    duplicate reinjection.
    - Covers remote-v2 compaction retaining creation-time instructions in
    the live session and cold resume appending one replacement when the
    source changed.
    - Ran focused `codex-core` AGENTS.md, WorldState, and context-update
    test suites.
  • feat: use run agent task auth for inference (#19051)
    ## Stack
    
    This is PR 3 of the simplified HAI single-run-task stack:
    
    - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
    assertion and task-registration primitives, including the shared
    run-task helper used by existing Agent Identity JWT auth.
    - [#19049](https://github.com/openai/codex/pull/19049)
    Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
    Agent Identity runtime auth and its single run task.
    - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
    provider auth that uses one backend-owned task id for first-party
    inference and compaction requests.
    
    [#19054](https://github.com/openai/codex/pull/19054) collapsed out of
    the active stack because the simplified design no longer needs a
    separate background/control-plane task helper.
    
    ## Summary
    
    This PR moves Agent Identity usage into provider auth resolution. That
    keeps `AgentAssertion` auth tied to first-party OpenAI provider requests
    instead of applying a late session-wide override that could affect
    local, custom, Bedrock, API-key, or external-bearer providers.
    
    What changed:
    
    - adds a small `ProviderAuthScope` struct carrying the run auth policy
    and session source needed by provider-scoped auth resolution
    - lets `Session` opt the existing `ModelClient` into `ChatGptAuth`
    policy when `use_agent_identity` is enabled, without adding a second
    model-client constructor
    - resolves Agent Identity only for first-party OpenAI provider auth
    paths
    - uses the persisted run task id from the `AgentIdentityAuth` record to
    build `AgentAssertion` auth for Responses requests
    - routes shared request setup through scoped provider auth so unary
    compact requests use the same run-task assertion path as inference turns
    - keeps local/custom/Bedrock/env-key/external-bearer provider auth
    unchanged
    - lets missing run-task state surface through the existing model-request
    error path instead of silently falling back to bearer auth
    
    This PR intentionally does not create thread-scoped, target-scoped, or
    background-scoped task identities. The run task is the only task Codex
    registers in this POC shape.
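
    The scoping rules in this summary can be sketched as a single decision function. The names `ProviderKind`, `ResolvedAuth`, and `resolve_provider_auth` are hypothetical, and the real `ProviderAuthScope` carries more than the two inputs shown here.

    ```rust
    /// Hypothetical provider classification for this sketch.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ProviderKind {
        FirstPartyOpenAi,
        /// Local, custom, Bedrock, env-key, or external-bearer providers.
        Other,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum ResolvedAuth {
        /// Assertion built from the persisted run task ID.
        AgentAssertion { run_task_id: String },
        /// Whatever the provider already used; untouched by Agent Identity.
        ExistingProviderAuth,
    }

    /// Resolves auth for one provider request.
    pub fn resolve_provider_auth(
        provider: ProviderKind,
        use_agent_identity: bool,
        persisted_run_task_id: Option<&str>,
    ) -> Result<ResolvedAuth, String> {
        // Agent Identity applies only to first-party OpenAI provider auth paths.
        if provider != ProviderKind::FirstPartyOpenAi || !use_agent_identity {
            return Ok(ResolvedAuth::ExistingProviderAuth);
        }
        match persisted_run_task_id {
            Some(id) => Ok(ResolvedAuth::AgentAssertion { run_task_id: id.to_string() }),
            // Missing run-task state is an error; there is no silent bearer fallback.
            None => Err("missing run-task state for Agent Identity auth".to_string()),
        }
    }
    ```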
    
    ## Testing
    
    - `just test -p codex-model-provider`
    - `just test -p codex-core client::tests::provider_auth_scope_uses`
    - `just test -p codex-core remote_compact_uses_agent_identity_assertion`
  • [codex] route sleep through time providers (#29973)
    ## Summary
    
    - add a cancellable sleep operation to `TimeProvider`
    - route `clock.sleep` through the configured provider
    - extend the supported sleep duration to 12 hours
    - complete the sleep turn item before propagating provider failures
    
    ## Why
    
    This isolates the core clock abstraction needed by external clock
    integrations. Existing system and app-server behavior remains wall-clock
    based in this PR; the stacked follow-up supplies app-server sleeps from
    an external clock.
  • [codex] Add Ultra reasoning effort (#29899)
    ## Why
    
    Ultra should be one user-facing reasoning selection for work that
    benefits from both maximum reasoning and proactive multi-agent
    delegation. Without it, clients must coordinate maximum reasoning with
    the experimental `multiAgentMode` setting, even though the inference
    backend still expects its existing `max` effort value.
    
    This change makes reasoning effort the source of truth: clients select
    `ultra`, core derives proactive multi-agent behavior when the turn is
    eligible for multi-agent V2, and inference requests continue to use the
    backend-compatible `max` value.
    
    ## What changed
    
    - Add `ultra` as a first-class reasoning effort and preserve
    model-catalog ordering when exposing it to clients.
    - Convert `ultra` to `max` at the inference request boundary, including
    Responses HTTP/WebSocket requests, startup prewarm, compaction, and
    memory summarization.
    - Derive effective multi-agent mode per turn from effective reasoning
    effort:
      - eligible multi-agent V2 + `ultra` → `proactive`
      - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
      - V1 or otherwise ineligible sessions → no multi-agent mode instruction
    - Keep the derived effective mode in turn context history so successive
    turns can emit a developer-message update only when the effective mode
    changes.
    - Remove selected multi-agent mode from core session configuration, turn
    construction, thread settings, resume/fork restoration, and subagent
    spawn plumbing. Subagents inherit reasoning effort and derive their own
    effective mode.
    - Retain the experimental app-server `multiAgentMode` fields for wire
    compatibility while marking them deprecated. Request values are accepted
    but ignored; compatibility response fields report `explicitRequestOnly`.
    - Display Ultra in the TUI using the order supplied by `model/list`.
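
    The two derivations above (wire value and effective mode) can be sketched as below. The enum shows only a subset of efforts, and the function names `wire_effort` and `effective_multi_agent_mode` are hypothetical; wire strings other than `max` are assumed for illustration.

    ```rust
    /// Subset of reasoning efforts used in this sketch.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ReasoningEffort {
        Medium,
        High,
        Max,
        Ultra,
    }

    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum MultiAgentMode {
        Proactive,
        ExplicitRequestOnly,
    }

    /// Value sent at the inference request boundary: `ultra` becomes `max`.
    pub fn wire_effort(effort: ReasoningEffort) -> &'static str {
        match effort {
            ReasoningEffort::Medium => "medium",
            ReasoningEffort::High => "high",
            ReasoningEffort::Max | ReasoningEffort::Ultra => "max",
        }
    }

    /// Effective multi-agent mode for one turn, derived from reasoning effort.
    /// `None` means no multi-agent mode instruction is emitted.
    pub fn effective_multi_agent_mode(
        eligible_for_v2: bool,
        effort: ReasoningEffort,
    ) -> Option<MultiAgentMode> {
        if !eligible_for_v2 {
            return None;
        }
        Some(if effort == ReasoningEffort::Ultra {
            MultiAgentMode::Proactive
        } else {
            MultiAgentMode::ExplicitRequestOnly
        })
    }
    ```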
    
    ## Validation
    
    - `just test -p codex-core ultra_reasoning_uses_max_for_requests`
    - `just test -p codex-tui model_reasoning_selection_popup`
  • [2/3] core: persist world state in rollouts (#29835)
    ## Why
    
    `WorldState` currently remembers its model-visible diff baseline only in
    memory. That leaves no durable source for restoring the exact baseline
    after resume, fork, rollback, or compaction.
    
    This is the second PR in the WorldState persistence stack, built on
    #29833 and following #29249. It records durable state transitions; the
    next PR will replay them during rollout reconstruction.
    
    ## What
    
    - Add a `world_state` rollout item containing either a full snapshot or
    an RFC 7386 JSON Merge Patch.
    - Persist a full snapshot after initial context and after compaction
    establishes a new context window.
    - Persist non-empty patches when later sampling steps or turns advance
    the WorldState baseline.
    - Write model-visible history before its matching WorldState record, so
    an interrupted write can only cause a safe repeated update on replay.
    - Preserve WorldState records for full-history forks while excluding
    them from thread previews, metadata, and app-server history
    materialization.
    
    Older binaries read rollout lines independently, so they skip the
    unknown `world_state` records while retaining the rest of the thread.
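
    For readers unfamiliar with RFC 7386, the patch semantics a replay step would need look roughly like this. It is a standalone sketch over a tiny hypothetical `Json` type (no arrays or numbers), not the rollout code, which presumably works on real JSON values.

    ```rust
    use std::collections::BTreeMap;

    /// Minimal JSON model for illustration only.
    #[derive(Clone, Debug, PartialEq)]
    pub enum Json {
        Null,
        Text(String),
        Object(BTreeMap<String, Json>),
    }

    /// Applies an RFC 7386 merge patch to `target` and returns the result.
    pub fn merge_patch(target: Json, patch: &Json) -> Json {
        // A non-object patch replaces the target wholesale.
        let Json::Object(patch_fields) = patch else {
            return patch.clone();
        };
        // A non-object target is treated as an empty object.
        let mut fields = match target {
            Json::Object(fields) => fields,
            _ => BTreeMap::new(),
        };
        for (key, value) in patch_fields {
            if *value == Json::Null {
                // `null` in a patch removes the member.
                fields.remove(key);
            } else {
                // Other values merge recursively into the existing member.
                let current = fields.remove(key).unwrap_or(Json::Null);
                fields.insert(key.clone(), merge_patch(current, value));
            }
        }
        Json::Object(fields)
    }

    /// Convenience constructor for object literals in examples.
    pub fn object(entries: Vec<(&str, Json)>) -> Json {
        Json::Object(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
    }
    ```

    Replaying a rollout would then amount to starting from the latest full snapshot and folding each later patch through `merge_patch` in order.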
    
    ## Testing
    
    - `just test -p codex-core
    snapshot_merge_patch_changes_and_removes_nested_values`
    - `just test -p codex-core
    world_state_baseline_deduplicates_until_history_is_replaced`
    - `just test -p codex-core
    deferred_executor_compaction_preserves_then_updates_environment_once`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-state`
    - `just test -p codex-thread-store`
    - `just test -p codex-app-server-protocol`
  • Represent MCP authentication with an enum (#29924)
    ## Why
    
    MCP authentication has distinct OAuth and ChatGPT-session flows.
    Representing that choice as `use_chatgpt_auth` makes one flow implicit
    and allows the configuration model to express the distinction only
    through a boolean.
    
    ChatGPT credential forwarding also needs a first-party trust boundary. A
    configurable `chatgpt_base_url` controls routing, but must not grant an
    MCP server permission to receive session credentials.
    
    This change builds on #29733, where the boolean was introduced.
    
    ## What changed
    
    - Replace `use_chatgpt_auth` with an `auth` field backed by the
    exhaustive `McpServerAuth` enum.
    - Support `auth = "oauth"` and `auth = "chatgpt"`, with OAuth remaining
    the default.
    - Trust only the origin derived from the existing hardcoded
    `CHATGPT_CODEX_BASE_URL` when granting ChatGPT auth to an MCP server.
    - Keep configured bearer tokens and authorization headers ahead of the
    selected authentication flow.
    - Update config writers, schema output, fixtures, and integration-test
    setup to use the enum.
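
    A rough sketch of the resulting decision order. `TRUSTED_CHATGPT_ORIGIN` is a placeholder value (the real origin derives from `CHATGPT_CODEX_BASE_URL`), and `parse_auth`, `AuthDecision`, and `select_auth` are hypothetical names. Modelling an untrusted `chatgpt` request as a distinct "denied" outcome is also an assumption; the PR only states that no authorization headers are sent in that case.

    ```rust
    /// Authentication flow selected in MCP server configuration.
    #[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
    pub enum McpServerAuth {
        /// OAuth remains the default.
        #[default]
        OAuth,
        ChatGpt,
    }

    /// Parses the `auth` config value; unknown strings are rejected.
    pub fn parse_auth(value: &str) -> Option<McpServerAuth> {
        match value {
            "oauth" => Some(McpServerAuth::OAuth),
            "chatgpt" => Some(McpServerAuth::ChatGpt),
            _ => None,
        }
    }

    /// Placeholder for the origin derived from the hardcoded base URL.
    pub const TRUSTED_CHATGPT_ORIGIN: &str = "https://chatgpt.example";

    #[derive(Debug, PartialEq, Eq)]
    pub enum AuthDecision {
        ConfiguredAuthorization,
        OAuth,
        ChatGptSession,
        /// ChatGPT auth requested for an untrusted origin: no session credentials.
        ChatGptDenied,
    }

    /// Configured bearer tokens and headers win; ChatGPT auth needs the trusted origin.
    pub fn select_auth(
        has_configured_authorization: bool,
        auth: McpServerAuth,
        server_origin: &str,
    ) -> AuthDecision {
        if has_configured_authorization {
            return AuthDecision::ConfiguredAuthorization;
        }
        match auth {
            McpServerAuth::OAuth => AuthDecision::OAuth,
            McpServerAuth::ChatGpt if server_origin == TRUSTED_CHATGPT_ORIGIN => {
                AuthDecision::ChatGptSession
            }
            McpServerAuth::ChatGpt => AuthDecision::ChatGptDenied,
        }
    }
    ```

    Note that `select_auth` never consults a configurable `chatgpt_base_url`, which is the point of the trust boundary described above.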
    
    ## Verification
    
    Integration coverage exercises the complete streamable HTTP startup path
    in two independent configurations:
    
    - A directly constructed MCP configuration verifies that matching an
    overridden `chatgpt_base_url` does not grant ChatGPT auth.
    - A persisted `config.toml` containing an attacker-controlled
    `chatgpt_base_url` and `auth = "chatgpt"` verifies the same boundary
    through normal config parsing.
    
    Both tests complete MCP initialization and tool listing and assert that
    the full captured request sequence contains no authorization headers.
    Separate integration coverage verifies that configured authorization
    takes precedence over ChatGPT auth.
  • Allow ChatGPT-hosted MCP servers to use session auth (#29733)
    ## Why
    
    ChatGPT session authentication was inferred from the reserved Codex Apps
    server name. That couples credential routing to Codex Apps-specific
    behavior and prevents other MCP endpoints hosted by ChatGPT from
    explicitly using the current session.
    
    The opt-in also needs a clear security boundary: an arbitrary MCP
    configuration must not be able to redirect ChatGPT credentials to
    another origin.
    
    ## What changed
    
    - Add `use_chatgpt_auth` to HTTP MCP server configuration, defaulting to
    `false`.
    - Honor the setting only when the parsed server URL has the same HTTP(S)
    origin as the configured `chatgpt_base_url`; otherwise remove the
    capability before startup.
    - Resolve bearer tokens and static or environment-backed authorization
    headers before selecting authentication, with configured authorization
    taking precedence over ChatGPT session auth.
    - Enable the setting for the built-in Codex Apps and hosted plugin
    runtime endpoints while keeping Codex Apps caching and tool
    normalization scoped to the reserved server.
    - Persist the setting through MCP config rewrite paths and expose it in
    the generated config schema.
    - Load the current login state for `codex mcp list` so reported auth
    status matches runtime behavior.
    
    ## Verification
    
    Core integration coverage exercises the complete streamable HTTP MCP
    startup path and verifies that:
    
    - a same-origin opted-in server receives the current ChatGPT access
    token;
    - an explicitly configured authorization header takes precedence;
    - a different-origin server completes MCP initialization and tool
    listing without receiving any ChatGPT authorization header.
  • core: add configurable <context_window_guidance> message (#29936)
    ## Why
    
    This PR adds a configurable `<context_window_guidance>` developer
    section immediately after `<context_window>`. Harness integrations need
    this section to give the model deployment-specific instructions for
    preparing for context-window transitions.
    
    ## What changed
    
    - Add an optional `features.token_budget.guidance_message` config with a
    1,000-byte runtime cap and generated schema support.
    - Render configured guidance as a developer `ContextualUserFragment`
    wrapped in `<context_window_guidance>` immediately after
    `<context_window>`.
    - Omit the section when guidance is unset, empty, or whitespace-only.
    - Preserve the resolved value in config locks and classify persisted
    guidance as contextual developer content.
    - Add integration coverage for rendered content and ordering.
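
    The omit-or-render rule might look like the sketch below. Two details are assumptions that this description does not confirm: that over-long guidance is truncated at a character boundary (rather than rejected), and the exact newline layout inside the tag. `render_guidance` and `MAX_GUIDANCE_BYTES` are hypothetical names.

    ```rust
    /// Runtime cap on the guidance message, in bytes.
    pub const MAX_GUIDANCE_BYTES: usize = 1_000;

    /// Returns the rendered section, or `None` when guidance is unset,
    /// empty, or whitespace-only.
    pub fn render_guidance(message: Option<&str>) -> Option<String> {
        let text = message?;
        if text.trim().is_empty() {
            return None;
        }
        // Assumption: truncate to the cap without splitting a UTF-8 character.
        let mut end = text.len().min(MAX_GUIDANCE_BYTES);
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        Some(format!(
            "<context_window_guidance>\n{}\n</context_window_guidance>",
            &text[..end]
        ))
    }
    ```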
  • [codex] nest sleep config under current time reminder (#29910)
    ## Summary
    
    - move sleep tool enablement from top-level `[features].sleep_tool` to
    `[features.current_time_reminder].sleep_tool`
    - remove the standalone `Feature::SleepTool` flag and gate `clock.sleep`
    from resolved current-time configuration
    - update config schema, config-lock materialization, and existing sleep
    coverage
    
    Stacked on #29907.
  • [codex] namespace sleep under clock (#29907)
    ## Summary
    
    - expose the interruptible sleep tool as `clock.sleep` instead of
    top-level `sleep`
    - keep `clock.curr_time` and `clock.sleep` in the same model-visible
    namespace when both features are enabled
    - update existing core and app-server integration coverage to issue
    namespaced sleep calls
    
    ## Why
    
    Sleep is a clock operation. Grouping it with `clock.curr_time` gives the
    model a more coherent tool surface without changing the sleep feature
    gate or runtime behavior.
    
    ## Validation
    
    - `just test -p codex-core sleep_tool_follows_feature_gate`
    - `just test -p codex-core any_new_input_interrupts_sleep`
    - `just test -p codex-app-server
    sleep_emits_started_and_completed_items`
  • [plugins] Track plugin install requests by ID (#29684)
    Summary
    - Emit `codex_plugin_install_requested` when a validated plugin install
    request is made, before the user accepts or declines the elicitation.
    - Record the exact model-visible plugin ID, remote plugin ID, required
    connector IDs, stable suggestion ID, and `endpoint_recommendation` vs
    `legacy_discovery` source.
    - Keep `suggest_reason` out of telemetry and leave connector-only
    install requests unchanged.
    
    Rollout
    - Backend/schema dependency:
    https://github.com/openai/openai/pull/1065270
    - Land the backend PR before this producer starts sending the event.
    
    Validation
    - `just test -p codex-analytics` (83 passed)
    - `just test -p codex-core request_plugin_install` (17 passed)
    - `just fix -p codex-analytics`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • [codex] Inject agent graph store into ThreadManager (#29736)
    Pick up the AgentGraphStore migration.
    
    - Inject an explicit optional agent graph store into `ThreadManager`
    - Move all calls to spawn, close, recursive resume, and
    subtree/archive/delete/feedback traversal through it
    - Keep using `LocalAgentGraphStore` when SQLite is available
    
    This required some changes to the interface to deal with futures:
    
    - The interface now matches `ThreadStore`'s object-safe pattern by
    returning a boxed `AgentGraphStoreFuture` directly, allowing
    `ThreadManager` to hold `Arc<dyn AgentGraphStore>`
    
    *Slight behavior change!* Unfiltered subtree enumeration now performs a
    single all-status breadth-first traversal, so a closed grandchild
    beneath an open edge is included; the previous Open-then-Closed
    traversals could not cross mixed-status paths and silently omitted it.
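
    The behavior change is easiest to see on a three-node graph. The sketch below uses a hypothetical in-memory `AgentGraph` and `descendants` function rather than the real `AgentGraphStore` interface, and models the old behavior as two status-filtered traversals.

    ```rust
    use std::collections::{HashMap, VecDeque};

    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum EdgeStatus {
        Open,
        Closed,
    }

    /// Parent thread ID -> (child thread ID, edge status).
    pub type AgentGraph = HashMap<&'static str, Vec<(&'static str, EdgeStatus)>>;

    /// Breadth-first descendants of `root`, following only edges that match
    /// `filter`; `None` follows edges of every status.
    pub fn descendants(
        graph: &AgentGraph,
        root: &'static str,
        filter: Option<EdgeStatus>,
    ) -> Vec<&'static str> {
        let mut found = Vec::new();
        let mut queue = VecDeque::from([root]);
        while let Some(parent) = queue.pop_front() {
            for &(child, status) in graph.get(parent).into_iter().flatten() {
                let follow = filter.map_or(true, |wanted| wanted == status);
                // Skip already-seen nodes so cycles cannot loop forever.
                if follow && !found.contains(&child) {
                    found.push(child);
                    queue.push_back(child);
                }
            }
        }
        found
    }
    ```

    For `root -Open-> child -Closed-> grandchild`, the unfiltered traversal returns both descendants, whereas an Open-only pass stops at `child` and a Closed-only pass finds nothing from `root`, so their union misses `grandchild`.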
  • [codex] Remove auto-compaction opt-out (#29815)
    ## Summary
    
    - remove the default-on `auto_compaction` feature flag and generated
    config schema entries
    - restore unconditional pre-turn, model-switch/hash, and mid-turn
    automatic compaction
    - expose `new_context` whenever token-budget tooling is enabled
    - remove the disabled-auto-compaction integration coverage introduced by
    #28260
    
    ## Motivation
    
    Roll back the internal auto-compaction escape hatch added in #28260.
    Automatic compaction should no longer be suppressible with `--disable
    auto_compaction`; existing manual `/compact` behavior remains unchanged.
    
    ## Testing
    
    - `just write-config-schema`
    - `just test -p codex-features` — 53 passed
    - `just test -p codex-core 'suite::compact::'` — 36 passed
    - `just test -p codex-core
    suite::token_budget::new_context_tool_starts_new_window_before_follow_up`
    — 1 passed
    - `just fix -p codex-core -p codex-features`
    - `just fmt`
    - `just test -p codex-core` — 2,778 passed, 59 failed, 16 skipped;
    failures were outside the changed compaction paths and were dominated by
    missing first-party test binaries and shell-snapshot timeouts
  • test: use automatic environments in app-server integration tests (#29789)
    ## Why
    
    Topology-neutral app-server integration tests should exercise automatic
    environment selection so the same setup covers local and remote
    executors.
    
    ## What
    
    Migrate eligible tests to `TestAppServer::new_with_auto_env()` and
    `send_thread_start_request_with_auto_env()`. Leave explicit-topology
    tests unchanged, and skip the request-permissions case on Windows with a
    TODO for cross-platform tool routing.
    
    ## Validation
    
    - `just test -p codex-app-server`
    - `bazel test //codex-rs/app-server:app-server-all-wine-exec-test
    --test_output=errors`
    
    Stacked on #29788.
  • test: run app-server integration tests under Wine (#29788)
    ## Why
    
    I made a mistake when carving #29746 out of my local changes, and the
    test was missing from the build graph. Oops!
    
    ## What
    
    Enable the app-server Wine exec test target. Remove the `manual` tag
    from generated Wine-exec test variants so wildcard Bazel test
    invocations select them. Refactor the smoke test to ensure it passes
    with current Windows support.
  • Let image generation extension hosts control output persistence (#29711)
    ## Why
    
    Some extension hosts need generated images returned without writing them
    to the local filesystem or giving the model a local path.
    
    ## What changed
    
    **tl;dr**: all persistence of extension-generated images now happens in
    the image generation extension
    
    - Let hosts provide an optional image save root when installing the
    extension.
    - Save images and return path hints only when a save root is configured.
    - Return image data without saving or adding a path hint when no save
    root is configured.
    - Preserve the extension-provided `saved_path` instead of persisting
    extension images again in core.
    - Leave built-in image generation unchanged.
    
    ## Validation
    
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-app-server
    standalone_image_generation_returns_saved_path_hint_to_model`
    - `just test -p codex-core
    extension_tool_uses_granted_turn_permissions_without_local_persistence`
    - `just test -p codex-core tools::handlers::extension_tools::tests`
    - Tested in the Codex CLI with `save_root` set to `CODEX_HOME` and to `None`.
    - Tested in the Codex app with both settings as well.
  • test: add app-server auto environment helper (#29746)
    ## Why
    
    This starts moving app-server tests toward running against remote and
    foreign-OS executors by default. Doing so needs a point of indirection
    similar to core integration tests' `build_with_auto_env`, while still
    letting tests control environment registration when they need to.
    
    ## What
    
    This adds:
    
    - `TestAppServer::new_with_auto_env()` for constructing an app server
    with a default environment defined by the test runner (e.g. bazel)
    - `TestAppServer::auto_env_params()` for tests to easily acquire turn
    env params tailored to the automatic environment
    - `TestAppServer::send_thread_start_request_with_auto_env()` to make it
    easy for tests to start a thread using the automatic environment
    
    The above methods all fail if the test calling them has set up an
    environment where the automatic environment configuration conflicts with
    test-created state.
    
    ## Validation
    
    Adds a couple of basic smoke tests to the app-server test suite.
    Follow-ups will migrate more tests to use it.
  • core: add wait_for_environment for starting environments (#29745)
    ## Why
    
    With `DeferredExecutor`, a sampling request can begin while an
    environment is still starting. The model can see that pending state, but
    needs a way to wait for the environment within the same turn before
    continuing.
    
    Environment startup is owned by Core, so the wait tool should use the
    same request-frozen `StepContext` that advertised the starting
    environment. This keeps tool registration and execution tied to the
    exact startup operation the model saw, even if live thread state later
    changes.
    
    Supersedes #29735.
    
    ## What
    
    - register `wait_for_environment` when the current `StepContext`
    contains starting environments
    - wait on the selected `StartingTurnEnvironment` shared resolution and
    return a bounded ready or failed result
    - rebuild the next request normally, removing the wait tool and exposing
    ready environment tools, or reporting the environment as unavailable
    after failure
    
    ## Testing
    
    - `just test -p codex-core deferred_executor_`
    - verifies the wait tool is replaced by environment-backed tools after
    startup
    - verifies startup failure removes both the wait tool and unavailable
    environment tools while notifying the model
  • Support thread-level originator overrides (#29477)
    ## Why
    
    Work (TPP) threads can be launched from the Desktop app, but if they all
    keep the Desktop app's default originator then downstream attribution
    cannot distinguish local Work launches from cloud-backed Work launches.
    `thread/start.serviceName` already carries that launch signal, while
    `SessionMeta.originator` is the durable thread-level value that survives
    resume and fork.
    
    This change converts the Desktop Work service names into an effective
    originator at thread creation time, persists that originator with the
    thread, and keeps using it for later model requests and memory writes.
    
    ## What changed
    
    - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
    per-thread originators, while preserving
    `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
    - Persist the effective originator in `SessionMeta.originator`, read it
    back on resume/fork, and inherit the parent originator for subagent
    spawns when there is no persisted session metadata.
    - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
    back to the live parent originator when the forked history no longer
    includes `SessionMeta`.
    - Thread the per-thread originator through Responses headers,
    websocket/compaction request paths, thread-store creation, rollout
    metadata, and memory stage-one telemetry.
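
    The precedence described above can be sketched as follows. This is a
    minimal illustration, not the code from this PR: the function names, the
    originator string values, and passing the override as a parameter are all
    assumptions made for the example.

    ```rust
    /// Hypothetical originator labels; the real values are not shown in this description.
    const WORK_LOCAL_ORIGINATOR: &str = "codex_work_local";
    const WORK_CLOUD_ORIGINATOR: &str = "codex_work_cloud";

    /// Picks the originator for a new thread.
    /// Precedence: explicit override, then service-name remapping, then the default.
    fn effective_originator(
        internal_override: Option<&str>,
        service_name: Option<&str>,
        default_originator: &str,
    ) -> String {
        if let Some(value) = internal_override {
            return value.to_string();
        }
        match service_name {
            Some("CODEX_WORK_LOCAL") => WORK_LOCAL_ORIGINATOR.to_string(),
            Some("CODEX_WORK_CLOUD") => WORK_CLOUD_ORIGINATOR.to_string(),
            _ => default_originator.to_string(),
        }
    }

    /// On resume or fork, prefer the persisted originator; when the forked
    /// history no longer carries session metadata, fall back to the live parent.
    fn originator_for_fork(persisted: Option<&str>, live_parent: &str) -> String {
        persisted.unwrap_or(live_parent).to_string()
    }
    ```

    Resolving the value once at thread creation and persisting it means later
    requests never need to re-derive it from the service name.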
    
    ## Verification
    
    - `just test -p codex-core
    agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
    agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
    thread_manager::tests::originator_override_precedes_service_name_remapping`
    - `just test -p codex-core
    agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
    - `just test -p codex-memories-write`
    - `just fix -p codex-core -p codex-memories-write`
    - `git diff --check`
  • core: reset context for token budget compaction (#29743)
    ## Why
    
    When `Feature::TokenBudget` is enabled, compaction should behave like
    `new_context`: start a fresh context window with the standard injected
    context, without asking the server to summarize old history and without
    carrying prior user or assistant messages into the next model request.
    
    This is still a compaction operation from the client lifecycle
    perspective. Manual `/compact` and auto-compaction should keep the same
    observable side effects that clients and hooks expect, including compact
    hooks and `TurnItem::ContextCompaction`.
    
    ## What changed
    
    - Added `compact_token_budget` to run token-budget manual and inline
    auto-compaction through a shared compaction lifecycle.
    - Split pending `new_context` requests from forced context-window
    startup: `take_new_context_window_request()` consumes pending requests,
    and `start_new_context_window()` installs a fresh context window.
    - Routed token-budget manual `/compact` and inline auto-compaction to
    install a fresh context window locally instead of calling server/local
    summarization.
    - Preserved compact lifecycle side effects for token-budget compaction
    by running pre/post compact hooks and emitting `ContextCompaction` item
    start/completion events.
    - Updated token-budget tests to assert fresh window IDs, absence of
    server-side compaction calls, dropped prior transcript messages/tool
    output after reset, and compact hook/item lifecycle behavior.
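
    A rough sketch of the split between consuming a pending request and
    installing a fresh window. Only the two method names come from this PR;
    the struct, its fields, and the signatures are assumptions for
    illustration.

    ```rust
    /// Hypothetical stand-in for the session state that tracks context windows.
    #[derive(Debug, Default)]
    struct ContextWindows {
        pending_new_context: bool,
        window_number: u64,
        history: Vec<String>,
    }

    impl ContextWindows {
        /// Consumes a pending `new_context` request, returning whether one was queued.
        fn take_new_context_window_request(&mut self) -> bool {
            std::mem::take(&mut self.pending_new_context)
        }

        /// Installs a fresh window: prior transcript is dropped and only the
        /// standard injected context is carried into the next request.
        fn start_new_context_window(&mut self, injected_context: &[&str]) {
            self.window_number += 1;
            self.history = injected_context.iter().map(|s| s.to_string()).collect();
        }
    }
    ```

    Under this split, token-budget compaction can call
    `start_new_context_window` directly without a pending request having been
    queued.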
    
    ## Testing
    
    - `just test -p codex-core
    token_budget_context_uses_new_window_after_compaction`
    - `just test -p codex-core token_budget_compaction_runs_compact_hooks`
    - `just test -p codex-core
    token_budget_mid_turn_auto_compaction_resets_before_active_follow_up`
    
    ---------
    
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • [codex] rename rollout budget error to session budget error (#29744)
    ## Summary
    
    - rename the rollout-budget exhaustion error from
    `RolloutBudgetExceeded` to `SessionBudgetExceeded`
    - expose the matching app-server v2 wire value as
    `sessionBudgetExceeded`
    - regenerate JSON/TypeScript schema fixtures and update the app-server
    docs and focused tests
    
    This is a naming-only follow-up to #29715 based on [Pavel's review
    suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
    Runtime behavior is unchanged.
    
    ## Tests
    
    - `just test -p codex-core rollout_budget`
    - `just test -p codex-app-server-protocol`
    - `just fmt`
    - `just write-app-server-schema`
  • fix: scope context remaining to body window (#29665)
    ## Why
    
    With `model_auto_compact_token_limit_scope = "body_after_prefix"`, the
    persistent prefix should not count against the active body window.
    `get_context_remaining` and the token-budget reminder should report the
    same usable body-after-prefix window that auto-compaction uses, rather
    than the total token count since the session began.
    
    This is stacked on #29664 so the mechanical move from `turn.rs` is
    isolated from the behavior fix.
    
    ## What
    
    - Extends `ContextWindowTokenStatus` with `context_remaining_tokens`.
    - Updates `get_context_remaining` to use the shared context-window
    accounting.
    - Adds integration coverage for body-after-prefix reminder timing and
    `get_context_remaining` output.
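
    The intended arithmetic can be sketched as below. The enum, function, and
    parameter names are hypothetical; the actual accounting lives in the
    shared `ContextWindowTokenStatus` path and may differ in detail.

    ```rust
    /// Hypothetical mirror of the `model_auto_compact_token_limit_scope` setting.
    #[derive(Clone, Copy)]
    enum LimitScope {
        Total,
        BodyAfterPrefix,
    }

    /// Tokens still usable before the limit is reached.
    /// With `BodyAfterPrefix`, the persistent prefix does not count as used.
    fn context_remaining_tokens(
        limit: u64,
        total_tokens: u64,
        prefix_tokens: u64,
        scope: LimitScope,
    ) -> u64 {
        let used = match scope {
            LimitScope::Total => total_tokens,
            LimitScope::BodyAfterPrefix => total_tokens.saturating_sub(prefix_tokens),
        };
        limit.saturating_sub(used)
    }
    ```

    For example, with a limit of 1000, 700 total tokens, and a 500-token
    prefix, the body-after-prefix scope reports 800 remaining, while counting
    the total reports 300.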
    
    ## Testing
    
    - `just test -p codex-core body_after_prefix_window`
    - `just test -p codex-core auto_compact_body_after_prefix`
    - `just fix -p codex-core`
  • [codex] surface rollout budget exhaustion (#29715)
    ## Summary
    - surface shared rollout-budget exhaustion as
    `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
    - map it through the existing `CodexErrorInfo` and app-server v2
    `codexErrorInfo` path
    - keep local compaction from retrying after the shared rollout budget is
    exhausted
    
    This gives app-server clients a stable `rolloutBudgetExceeded` error
    they can classify without guessing from `status="interrupted"`.
    
    ## Tests
    - `just test -p codex-core rollout_budget`
  • core: persist initial context window metadata (#29519)
    ## Why
    
    PR #29494 made context-window IDs visible to the model by wrapping the
    token-budget window payload in `<context_window>`, but rollout JSONL
    consumers still could not see the initial window identity by tailing the
    session file. Compacted rollout items carry window IDs only after
    compaction has happened, so a session with no compaction had no durable
    JSONL record for window 0.
    
    This change gives tailing consumers a stable initial-window record at
    session creation time.
    
    ## What Changed
    
    - Added `session_meta.context_window.window_id` for the initial
    context-window identity.
    - `CreateThreadParams` now requires `initial_window_id: String`, so
    thread-store callers cannot accidentally create new threads without
    window-0 metadata.
    - Live thread creation derives the persisted initial window ID from the
    same `AutoCompactWindowIds` used to initialize `SessionState`, keeping
    runtime state and JSONL metadata aligned.
    - Rollout reconstruction uses `session_meta.context_window.window_id` as
    the initial-window fallback and derives `window_number = 0`,
    `first_window_id = window_id`, and `previous_window_id = None`
    internally.
    - Fork reconstruction intentionally uses the same rollout reconstruction
    path; consumers that need to distinguish copied initial-window metadata
    can use the rollout `thread_id`.
    - Legacy compactions without `window_number` still use compaction-count
    fallback accounting instead of being reset to window 0 by the
    initial-window fallback.
    - Compacted rollout metadata still takes precedence once compaction
    records exist, preserving the richer chain fields there.
    
    ## JSONL Shape
    
    Real rollout JSONL is one object per line. This example is expanded for
    readability, but shows the new initial `session_meta.context_window`
    record followed by the existing compacted rollout item shape that also
    carries window IDs:
    
    ```jsonl
    {
      "timestamp": "2026-06-22T12:00:00.000Z",
      "type": "session_meta",
      "payload": {
        "session_id": "<THREAD_ID>",
        "id": "<THREAD_ID>",
        "timestamp": "2026-06-22T12:00:00.000Z",
        "cwd": "/repo",
        "originator": "codex",
        "cli_version": "0.0.0",
        "source": "cli",
        "model_provider": "<MODEL_PROVIDER>",
        "context_window": {
          "window_id": "<INITIAL_WINDOW_ID>"
        }
      }
    }
    ...
    {
      "timestamp": "2026-06-22T12:34:56.000Z",
      "type": "compacted",
      "payload": {
        "message": "<COMPACTION_SUMMARY>",
        "replacement_history": [
          "..."
        ],
        "window_number": 1,
        "first_window_id": "<INITIAL_WINDOW_ID>",
        "previous_window_id": "<INITIAL_WINDOW_ID>",
        "window_id": "<NEXT_WINDOW_ID>"
      }
    }
    ```
    
    The nested `context_window` object is intentional: it gives rollout
    consumers a stable namespace for context-window metadata while only
    writing the non-derivable initial `window_id`. For the initial window,
    `window_number`, `first_window_id`, and `previous_window_id` are derived
    internally instead of being written to the rollout.
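
    A minimal sketch of that derivation, assuming a hypothetical
    `WindowChain` struct and helper names (the real reconstruction types are
    not shown here):

    ```rust
    /// Hypothetical in-memory view of the window chain fields.
    #[derive(Debug, PartialEq)]
    struct WindowChain {
        window_number: u64,
        first_window_id: String,
        previous_window_id: Option<String>,
        window_id: String,
    }

    /// Derives the full chain for window 0 from the single persisted `window_id`.
    fn initial_window_chain(window_id: &str) -> WindowChain {
        WindowChain {
            window_number: 0,
            first_window_id: window_id.to_string(),
            previous_window_id: None,
            window_id: window_id.to_string(),
        }
    }

    /// Compacted rollout metadata takes precedence when present; otherwise
    /// fall back to the initial window recorded in `session_meta`.
    fn current_window(initial_window_id: &str, latest_compacted: Option<WindowChain>) -> WindowChain {
        latest_compacted.unwrap_or_else(|| initial_window_chain(initial_window_id))
    }
    ```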
    
    ## Verification
    
    - `just test -p codex-protocol`
    - `just test -p codex-rollout
    recorder_materializes_on_flush_with_pending_items`
    - `just test -p codex-core reconstruct_history`
    - `just test -p codex-core
    record_initial_history_reconstructs_forked_transcript`
    - `just test -p codex-thread-store`
    - `just test -p codex-state`
    - `just test -p codex-app-server
    thread_read_returns_summary_without_turns`
    - `just test -p codex-rollout persistence_metrics`
  • core tests: rename automatic environment builder (#29728)
    ## Why
    
    Use a clearer name for what happens when this helper sets up a test
    environment.
    
    ## What
    
    - Rename the builder and its harness wrapper to use `auto_env` instead
    of `remote_env` because the helper will set up a local environment if
    configured by the build system.
  • test: branch on target OS instead of runner flavor (#29712)
    ## Why
    
    Core tests should branch on the executor's operating system, not on
    runner details such as Docker or Wine. This keeps platform behavior
    stable as new test backends are added and reserves Wine-specific skips
    for actual runner debt.
    
    ## What
    
    - Add `TestTargetOs` and target/host-aware skip helpers while keeping
    `TestEnvironment` internal.
    - Replace topology enum access with remote predicates and a narrow
    Docker accessor.
    - Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
    document the skip taxonomy.
    
    ## Validation
    
    - `just test -p core_test_support`
    - `just test -p codex-core
    remote_test_env_can_connect_and_use_filesystem`
    - `bazel test //codex-rs/core:core-all-wine-exec-test
    --test_output=errors` reached test execution; unrelated existing
    view-image, path, and timing failures remain.
    - `just test -p codex-core` and `just test` reached broad test
    execution; this checkout has unrelated helper, sandbox, and timing
    failures.
  • core: use current step environments for tools (#29547)
    ## Why
    
    With deferred executors, an environment can become ready between two
    sampling requests in the same turn. The model-visible environment
    update, advertised tools, and eventual tool execution must all describe
    the same request-time view.
    
    Otherwise, a request built while only environment B is ready can
    advertise a tool without an `environment_id`; if higher-priority
    environment A becomes ready before execution, that call could silently
    run in A instead.
    
    This PR is stacked on #29527.
    
    ## Design
    
    `run_turn` captures one `Arc<StepContext>` at each sampling-request
    boundary. That step owns the request's `TurnContext` and environment
    snapshot.
    
    - World-state environment updates and tool planning borrow that same
    step.
    - `ToolCallRuntime` retains the `Arc` while asynchronous tool calls
    execute.
    - `ToolInvocation` carries the step to handlers; its temporary `turn`
    compatibility field is derived from the same object.
    - `ToolRouter` does not retain `StepContext`; it only uses it while
    constructing the request's tool set.
    - With `DeferredExecutor` disabled, step capture keeps using the
    environments frozen at turn start.
    
    Put simply, every sampling request gets one consistent picture of its
    environments, from what the model sees through where its tool calls run.
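
    The snapshot idea can be illustrated with a small sketch. The types below
    are simplified stand-ins with assumed fields and method names, not the
    real `StepContext` or turn types.

    ```rust
    use std::sync::{Arc, Mutex};

    /// Simplified stand-in: the environments visible to one sampling request,
    /// in priority order.
    struct StepContext {
        ready_environments: Vec<String>,
    }

    /// Simplified stand-in for a turn whose environments can become ready over time.
    struct Turn {
        live_environments: Mutex<Vec<String>>,
    }

    impl Turn {
        /// Captures one immutable snapshot at a sampling-request boundary.
        fn capture_step(&self) -> Arc<StepContext> {
            let snapshot = self.live_environments.lock().unwrap().clone();
            Arc::new(StepContext { ready_environments: snapshot })
        }
    }

    /// Resolves a tool call that carries no `environment_id` against the
    /// snapshot, never against the live turn state.
    fn default_environment(step: &StepContext) -> Option<&str> {
        step.ready_environments.first().map(String::as_str)
    }
    ```

    Because a tool call holds the `Arc` it was planned with, a
    higher-priority environment that becomes ready later only affects the
    next captured step.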
    
    ## What changed
    
    - Build environment-dependent tool specs from the current request's
    `StepContext`.
    - Use that same step for unified exec, legacy shell, `apply_patch`,
    `view_image`, and `request_permissions` execution.
    - Hide environment-backed tools, including `request_permissions`, while
    no environment is attached.
    - Resolve legacy shell paths and metadata from the selected step
    environment instead of the stale turn-start environment.
    - Capture explicit steps at non-turn-loop boundaries such as compaction,
    prompt debug, and startup prewarm.
    - Reconcile prompt-debug history from the same step used to build its
    tools.
    
    ## Follow-up
    
    - Bind yielded code-mode cells to the tool runtime that created them, so
    nested calls made after yielding continue to use the originating
    request's `StepContext`.
    
    ## Test plan
    
    - `just test -p codex-core
    deferred_executor_updates_context_and_tools_after_startup`
    - `just test -p codex-core
    environment_count_controls_environment_backed_tools`
    - `just test -p codex-core
    build_prompt_input_includes_context_and_user_message`
  • core: resolve view_image paths in selected environment (#29526)
    ## Why
    
    `view_image` needs to support remote executors running a foreign OS.
    
    ## What
    
    - resolve image paths against the selected environment as `PathUri` and
    read them through that environment's filesystem
    - keep app-server's public path field wire-compatible as
    `LegacyAppPathString`, with purpose-specific UI rendering
    - cover relative and absolute target-native paths in the core
    integration test and run the full `view_image` suite under wine-exec
    without skips
  • chore(core) rm AskForApproval::OnFailure (#28418)
    ## Summary
    Deletes the `OnFailure` variant of the `AskForApproval` enum. This option
    has been deprecated since #11631.
    
    ## Testing
    - [x] Tests pass
  • core: use turn-owned world state for inline compaction (#29527)
    ## Why
    
    Follow-up to #29249 and its [compaction review
    thread](https://github.com/openai/codex/pull/29249#discussion_r3455055101).
    
    During a turn, environment readiness can change between sampling
    requests. Inline compaction must render the same model-visible
    `WorldState` used by the request it follows. Rebuilding that state
    during compaction can observe a newer environment, make replacement
    history disagree with what the model saw, and suppress the next
    environment update.
    
    ## What changed
    
    - Make `run_turn` own the current `Arc<WorldState>` and replace it only
    between sampling requests.
    - Build each state from an explicitly chosen environment snapshot, diff
    deferred-executor steps against the turn-owned state, and retain the
    latest state in `ContextManager` only for cross-turn and resume
    tracking.
    - Pass the exact turn-owned state into inline compaction and explicit
    new-context-window replacement.
    - Carry that state with
    `InitialContextInjection::BeforeLastUserMessage`, so replacement context
    and its stored baseline cannot come from different snapshots.
    - Remove obsolete state-recapture helpers and ambiguous
    `TurnContext`-only `WorldState` builders.
    - Add an integration test that moves an environment from starting to
    ready during a paused turn, triggers compaction, and verifies the next
    request receives the readiness update exactly once.
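
    Why the baseline matters can be shown with a toy diff. The `WorldState`
    shape and the `environment_update` helper here are assumptions for
    illustration only.

    ```rust
    /// Toy stand-in for the model-visible state: just the ready environments.
    #[derive(Clone, Debug, PartialEq)]
    struct WorldState {
        ready: Vec<String>,
    }

    /// Returns the environments that became ready since `baseline`, if any.
    fn environment_update(baseline: &WorldState, next: &WorldState) -> Option<Vec<String>> {
        let newly_ready: Vec<String> = next
            .ready
            .iter()
            .filter(|env| !baseline.ready.contains(env))
            .cloned()
            .collect();
        if newly_ready.is_empty() {
            None
        } else {
            Some(newly_ready)
        }
    }
    ```

    If compaction re-captured the state and stored the newer snapshot as the
    baseline, the diff for the next request would be empty and the readiness
    update would be lost. Diffing against the turn-owned state keeps it.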
    
    ## Test plan
    
    - `just test -p codex-core
    deferred_executor_compaction_preserves_then_updates_environment_once`
    - `just test -p codex-core process_compacted_history`
    - `just test -p codex-core mid_turn_continuation_compaction`
    - `just test -p codex-core build_initial_context`
    - `just test -p codex-core
    ignores_session_prefix_messages_when_truncating`