Commit Graph

630 Commits

  • [codex] add roles to realtime append text (#27936)
    ## Summary
    
    Add an explicit `user` or `developer` role to
    `thread/realtime/appendText` and propagate it through the realtime input
    queue into `conversation.item.create`. Older JSON clients that omit the
    field continue to default to `user`.
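
    The defaulting rule can be sketched as follows. This is a minimal
    illustration, not the app-server code: `AppendTextRole` and
    `resolve_role` are hypothetical names, and rejecting unknown role
    strings is an assumption that the summary does not confirm.

    ```rust
    /// Role attached to appended realtime text. The type and function names
    /// are hypothetical; only the `user` and `developer` values come from
    /// the summary above.
    #[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
    enum AppendTextRole {
        #[default]
        User,
        Developer,
    }

    /// Resolve the optional `role` field of a request. `None` models an
    /// older client that omitted the field.
    fn resolve_role(role: Option<&str>) -> Result<AppendTextRole, String> {
        match role {
            None => Ok(AppendTextRole::default()),
            Some("user") => Ok(AppendTextRole::User),
            Some("developer") => Ok(AppendTextRole::Developer),
            // Assumption: unknown values are rejected rather than coerced.
            Some(other) => Err(format!("unknown role: {other}")),
        }
    }
    ```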
    
    This lets app-provided context such as memory retain developer authority
    without bypassing app-server through a renderer-owned data channel. The
    app-server schemas, API documentation, and focused protocol and
    websocket coverage are updated with the new contract.
    
    The Codex Apps consumer is tracked in
    [openai/openai#1025261](https://github.com/openai/openai/pull/1025261).
  • feat: use encrypted local secrets for MCP OAuth (#27541)
    ## Summary
    
    - store MCP OAuth credentials in the configured auth credential backend
    - support encrypted-local OAuth storage, including legacy keyring
    migration
    - propagate the credential backend through MCP refresh, session, CLI,
    and app-server paths
    
    ## Stack
    
    1. #27504 — config and feature flag
    2. #27535 — auth-specific secret namespaces
    3. #27539 — encrypted CLI auth storage
    4. this PR — encrypted MCP OAuth storage
    
    This is a parallel review stack; the original #17931 remains unchanged.
    
    ## Tests
    
    - `just test -p codex-rmcp-client` (the transport round-trip test passed
    after building the required `codex` binary and retrying)
    - `just test -p codex-mcp`
    - `just test -p codex-app-server
    refresh_config_uses_latest_auth_keyring_backend`
    - `just test -p codex-core
    refresh_mcp_servers_is_deferred_until_next_turn`
    - `just test -p codex-cli mcp`
    - `just fix -p codex-rmcp-client -p codex-mcp -p codex-core -p codex-cli
    -p codex-app-server -p codex-protocol`
    - `just bazel-lock-check`
  • Support plaintext agent messages (#27830)
    ## Why
    
    Multi-agent v2 `send_message` deliveries already reach the receiving
    model as typed `agent_message` items with encrypted content.
    Child-completion notifications are generated by Codex itself, so their
    content is plaintext and previously fell back to a serialized JSON
    envelope inside an assistant message.
    
    With plaintext `input_text` supported for `agent_message`, both delivery
    paths can use the same model-visible type while preserving explicit
    author and recipient metadata.
    
    ## What changed
    
    - add plaintext `input_text` support to `AgentMessageInputContent` and
    regenerate the affected app-server schemas
    - preserve `InterAgentCommunication` as structured mailbox input instead
    of converting it to assistant text
    - record delivered communications as typed `agent_message` history items
    - persist a dedicated rollout item so local delivery metadata such as
    `trigger_turn` remains available without leaking into the Responses
    request
    - reconstruct typed agent messages on resume and preserve fork-turn
    truncation behavior
    - remove request-time assistant-content parsing
    - preserve plaintext and encrypted inter-agent deliveries in stage-one
    memory inputs
    - normalize and link plaintext and encrypted agent messages in rollout
    traces without treating inbound messages as child results
    - cover the real MultiAgent V2 child-completion path end to end with
    deterministic mailbox synchronization
    
    ## Verification
    
    - `just test -p codex-core
    plaintext_multi_agent_v2_completion_sends_agent_message`
    - `just test -p codex-core input_queue_drains_mailbox_in_delivery_order
    record_initial_history_reconstructs_typed_inter_agent_message
    fork_turn_positions_use_inter_agent_delivery_metadata`
    - `just test -p codex-memories-write
    serializes_inter_agent_communications_for_memory`
    - `just test -p codex-rollout-trace
    agent_messages_preserve_routing_and_content
    sub_agent_started_activity_creates_spawn_edge`
    - `just test -p codex-rollout-trace
    agent_result_edge_falls_back_to_child_thread_without_result_message`
    - `just test -p codex-protocol -p codex-rollout -p
    codex-app-server-protocol`
  • realtime: add AVAS architecture override (#27720)
    ## Summary
    
    Adds a `RealtimeConversationArchitecture` option for realtime
    conversation startup, with `realtimeapi` as the default and `avas` as an
    opt-in architecture.
    
    The AVAS path is limited to realtime v1 conversational WebRTC starts,
    and WebRTC call creation appends `intent=quicksilver&architecture=avas`
    to `/v1/realtime/calls`. The existing sideband websocket still joins by
    `call_id`.
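
    A rough sketch of the call-creation path selection, assuming the
    default architecture leaves the path unchanged (the summary only
    describes the AVAS case). The variant names and `realtime_calls_path`
    are hypothetical.

    ```rust
    /// Variant names are assumptions; the summary only gives the string
    /// values `realtimeapi` (default) and `avas` (opt-in).
    #[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
    enum RealtimeConversationArchitecture {
        #[default]
        RealtimeApi,
        Avas,
    }

    /// Build the path used for WebRTC call creation.
    fn realtime_calls_path(architecture: RealtimeConversationArchitecture) -> String {
        let base = "/v1/realtime/calls";
        match architecture {
            // Assumption: the default architecture adds no query parameters.
            RealtimeConversationArchitecture::RealtimeApi => base.to_string(),
            RealtimeConversationArchitecture::Avas => {
                format!("{base}?intent=quicksilver&architecture=avas")
            }
        }
    }
    ```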
    
    This also exposes the per-session architecture override through
    app-server v2 `thread/realtime/start` params and updates the config
    schema for `[realtime].architecture`.
    
    ## Validation
    
    - `just fmt`
    - `just write-config-schema`
    - `just test -p codex-api sends_avas_session_call_query_params`
    - `just test -p codex-core -E
    'test(~conversation_webrtc_start_uses_avas_architecture_query)'`
    - `just test -p codex-core -E 'test(realtime_loads_from_config_toml)'`
    - `just test -p codex-app-server-protocol -E
    'test(~serialize_thread_realtime_start) |
    test(generated_ts_optional_nullable_fields_only_in_params)'`
    - `just test -p codex-app-server -E
    'test(realtime_webrtc_start_emits_sdp_notification)'`
  • Make MCP server contributions thread-scoped (#27670)
    ## Why
    
    `selectedCapabilityRoots` belongs to one thread, but MCP contributors
    previously received only the global Codex config. That left no clean way
    for a selected executor capability to contribute MCP servers to its own
    thread.
    
    ## What this PR does
    
    - Gives MCP contributors a small context containing the config and, for
    a running thread, its frozen host-seeded inputs.
    - Uses the same thread inputs during startup, status queries, refreshes,
    and skill dependency checks.
    - Keeps threadless MCP operations and the existing hosted Apps behavior
    unchanged.
    - Adds coverage showing that two threads resolve independent
    registrations and that later lifecycle mutations do not change the
    frozen MCP inputs.
    
    This PR does not discover plugin manifests, add MCP servers, or launch
    anything new. It only establishes the thread-scoped registration
    boundary.
    
    ## Follow-ups
    
    - Resolve selected executor plugin roots through their owning
    environment filesystem.
    - Convert their stdio MCP declarations into environment-bound
    registrations and add an executor MCP end-to-end test.
    
    ## Verification
    
    - `just fmt`
    - `cargo check --tests -p codex-protocol -p codex-extension-api -p
    codex-mcp-extension -p codex-core -p codex-app-server`
    
    Tests and Clippy were not run.
  • Add request_user_input auto-resolution window contract (#27256)
    ## Why
    
    `request_user_input` is moving beyond its original plan-mode-only
    workflow, and future default/goal-mode usage needs a way for the model
    to ask helpful but non-blocking questions without forcing the turn to
    wait forever. This PR adds an explicit `autoResolutionMs` contract so a
    later client/runtime change can auto-resolve unanswered prompts after a
    bounded window while leaving truly blocking questions unchanged.
    
    This is contract plumbing only; it does not implement the client-side
    timer or auto-selection behavior, and the model-facing description
    treats the field as reserved unless the current runtime explicitly
    supports auto-resolution.
    
    ## What Changed
    
    - Added optional `autoResolutionMs` to the model-facing
    `request_user_input` args and core `RequestUserInputEvent`.
    - Added model-facing schema text for `autoResolutionMs` while marking it
    reserved for runtimes that explicitly support auto-resolution.
    - Bounded `autoResolutionMs` to `60_000..=240_000` ms during argument
    normalization by clamping out-of-range model-provided values.
    - Propagated the field through app-server v2
    `ToolRequestUserInputParams`, app-server request forwarding, generated
    TypeScript, and JSON schema fixtures.
    - Updated app-server, core, protocol, and TUI call sites/tests so
    omitted values preserve existing `None`/`null` behavior and coverage
    verifies a `Some(60_000)` round trip.
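
    The clamping step can be illustrated roughly as below. The function
    name and the `u64` field type are assumptions; only the
    `60_000..=240_000` range and the preserved `None` behavior come from
    the list above.

    ```rust
    /// Bounds taken from the PR description, in milliseconds.
    const AUTO_RESOLUTION_MIN_MS: u64 = 60_000;
    const AUTO_RESOLUTION_MAX_MS: u64 = 240_000;

    /// Hypothetical normalization helper: out-of-range model-provided
    /// values are clamped into the window, and an omitted value stays
    /// `None`.
    fn normalize_auto_resolution_ms(value: Option<u64>) -> Option<u64> {
        value.map(|ms| ms.clamp(AUTO_RESOLUTION_MIN_MS, AUTO_RESOLUTION_MAX_MS))
    }
    ```

    Under this sketch a request for 5 ms would be raised to 60_000 ms and
    a request for 300_000 ms lowered to 240_000 ms.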
    
    ## Verification
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-core request_user_input`
    - `just test -p codex-app-server request_user_input_round_trip`
    - `just test -p codex-tui request_user_input`
    - `just test -p codex-protocol`
  • [codex] Compact when comp_hash changes (#27520)
    ## Summary
    - snapshot `comp_hash` into `TurnContext` when the turn is created and
    use that snapshot as the downstream source of truth
    - persist the turn hash in rollout context and recover it into
    previous-turn settings during resume and fork replay
    - compact existing history with the previous model only when both
    adjacent turns provide hashes and the values differ
    - record `comp_hash_changed` as the compaction reason
    - cover ordinary transitions, resume, and missing-hash compatibility
    with end-to-end tests
    
    ## Why
    History produced under one compaction-compatible model configuration may
    not be safe to carry directly into another. Compacting at the turn
    boundary converts that history before context updates and the new user
    message are added. Persisting the turn snapshot in `TurnContextItem`
    makes the same protection work after resuming a rollout.
    
    A missing hash is not treated as evidence of incompatibility. `None →
    Some`, `Some → None`, and `None → None` do not trigger compaction; only
    `Some(previous) → Some(current)` with unequal values does.
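
    The trigger rule above amounts to a small predicate. A minimal sketch,
    assuming the hash is compared as an opaque string; `comp_hash_changed`
    is a hypothetical helper name, not necessarily the function in the PR.

    ```rust
    /// Returns true only when both adjacent turns carry a hash and the
    /// hashes differ. A missing hash on either side never triggers
    /// compaction.
    fn comp_hash_changed(previous: Option<&str>, current: Option<&str>) -> bool {
        match (previous, current) {
            (Some(prev), Some(cur)) => prev != cur,
            // None -> Some, Some -> None, and None -> None are all treated
            // as compatible.
            _ => false,
        }
    }
    ```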
    
    ## Stack
    - depends on #27532
    - #27532 is based directly on `main`
    
    ## Testing
    - `just test -p codex-core pre_sampling_compact_` — 6 passed
    - `just test -p codex-core
    turn_context_item_uses_turn_context_comp_hash_snapshot` — passed
    - `just fix -p codex-core -p codex-protocol -p codex-analytics -p
    codex-models-manager`
  • [codex] Add comp_hash to model metadata (#27532)
    ## Summary
    - add optional `comp_hash` metadata to `ModelInfo`
    - update `ModelInfo` fixtures for the shared schema change
    - keep older model responses compatible by defaulting the field to
    `None`
    
    ## Why
    The models endpoint needs an opaque identifier for compaction-compatible
    model configurations. This PR only exposes that value in model metadata;
    it does not add it to turn context or change runtime behavior.
    
    Follow-up #27520 carries the value through turn context and rollouts,
    then uses it to trigger compaction.
    
    ## Stack
    - based directly on `main`
    - replaces #27519, which was accidentally merged into the wrong base
    branch
    - functionality follow-up: #27520
    
    ## Testing
    - `just test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `just fix -p codex-core -p codex-protocol -p codex-analytics -p
    codex-models-manager`
  • core: resize all history images behind a feature flag (#27247)
    ## Summary
    
    Adds complete client-side image preparation behind the default-off
    `resize_all_images` feature flag.
    
    When enabled, local image producers defer decoding and resizing. Images
    are prepared centrally before insertion into conversation history,
    covering user input, `view_image`, and structured tool-output images.
    
    ## Behavior
    
    - Processes base64 `data:` images in messages and function/custom tool
    outputs.
    - Leaves non-data URLs, including HTTP(S) URLs, unchanged.
    - Applies image-detail budgets:
      - `high` and omitted: 2048px maximum dimension and 2.5K 32px patches.
      - `original`: 6000px maximum dimension and 10K 32px patches.
      - `auto`: uses the same 2048px / 2.5K-patch budget as high.
      - `low`: unsupported and replaced with an actionable placeholder.
    - Preserves original image bytes when no resize or format conversion is
    needed.
    - Enforces the shared 1 GiB encoded and decoded data-URL sanity limits.
    - Replaces only an image that fails preparation, preserving sibling
    content and tool-output metadata.
    - Uses bounded placeholders distinguishing generic processing failures,
    oversized images, and unsupported `low` detail.
    - Prepares resumed and forked history before installing it as live
    history without modifying persisted rollouts.
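
    The image-detail budgets listed above can be summarized as a lookup.
    This is only a sketch: the type and function names are hypothetical,
    and reading "2.5K" and "10K" as 2_500 and 10_000 patches is an
    assumption.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ImageDetail {
        High,
        Original,
        Auto,
        Low,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    struct ImageBudget {
        max_dimension_px: u32,
        max_patches_32px: u32,
    }

    /// Map a detail level to its budget. `None` for the argument models an
    /// omitted detail; `None` as the result models the unsupported `low`
    /// case, which the PR replaces with a placeholder.
    fn budget_for(detail: Option<ImageDetail>) -> Option<ImageBudget> {
        match detail {
            None | Some(ImageDetail::High) | Some(ImageDetail::Auto) => Some(ImageBudget {
                max_dimension_px: 2048,
                max_patches_32px: 2_500, // assumed reading of "2.5K"
            }),
            Some(ImageDetail::Original) => Some(ImageBudget {
                max_dimension_px: 6000,
                max_patches_32px: 10_000, // assumed reading of "10K"
            }),
            Some(ImageDetail::Low) => None,
        }
    }
    ```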
    
    ## Flag-Off Behavior
    
    When `resize_all_images` is disabled:
    
    - Existing local user-input and `view_image` processing remains
    unchanged.
    - Existing decoding and error behavior remains unchanged.
    - Arbitrary tool-output images are not processed.
    - HTTP(S) image URLs continue to be forwarded unchanged.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    -  `1` https://github.com/openai/codex/pull/27245
    - 👉 `2` https://github.com/openai/codex/pull/27247
    -  `3` https://github.com/openai/codex/pull/27246
    -  `4` https://github.com/openai/codex/pull/27266
  • image: add shared data URL preparation utilities (#27245)
    ## Summary
    
    Add shared image-processing primitives needed for centralized image
    preparation in a follow-up PR.
    
    - Add `load_data_url_for_prompt` for decoding and preparing base64 image
    data URLs.
    - Add configurable maximum-dimension and 32px patch-budget resizing.
    - Enforce a 1 GiB sanity limit on both encoded and decoded data-URL
    representations.
    - Preserve original PNG, JPEG, and WebP bytes when resizing is
    unnecessary.
    - Preserve the existing GIF-to-PNG behavior.
    - Move image utility tests into the existing sidecar test module.
    
    ## Behavior
    
    This PR is intended to be runtime behavior-preserving.
    
    Existing production callers continue using
    `PromptImageMode::ResizeToFit` and `PromptImageMode::Original` with
    their existing semantics. The new data-URL entrypoint and configurable
    resize mode have no production callers in this PR; they are used by the
    next PR in the stack.
    
    This PR does not change user-input handling, `view_image`, history
    insertion, request construction, HTTP image URL forwarding, or
    app-server behavior.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/27245
    -  `2` https://github.com/openai/codex/pull/27247
    -  `3` https://github.com/openai/codex/pull/27246
    -  `4` https://github.com/openai/codex/pull/27266
  • [codex] Store compact window id in rollout (#27264)
    ## Why
    
    Compaction window identity is part of session history, not model-client
    transport state. Persisting it with the compacted rollout item lets
    resumed threads continue from the reconstructed window without keeping
    mutable window state on `ModelClient`.
    
    ## What changed
    
    - Added `window_id` to `CompactedItem` and stamp it when
    `replace_compacted_history` installs compacted history.
    - Moved auto-compact window id ownership into `AutoCompactWindow` /
    `SessionState`; `ModelClient` now receives the request window id from
    callers instead of storing it.
    - Returned `window_id` from rollout reconstruction for resume.
    Reconstruction uses the newest surviving compacted item's stored
    `window_id` when present, and falls back to the legacy compacted-item
    count when it is absent.
    - Kept fork startup at the fresh default window id and updated direct
    model-client tests to pass explicit test window ids.
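
    The resume fallback described above might look roughly like this. The
    struct is a simplified stand-in for `CompactedItem`, the `u64` id type
    is an assumption, and `reconstruct_window_id` is a hypothetical name.

    ```rust
    /// Simplified stand-in: older rollouts have no stored `window_id`.
    struct CompactedItem {
        window_id: Option<u64>,
    }

    /// `surviving` is ordered oldest to newest. Prefer the newest item's
    /// stored id; otherwise fall back to the legacy compacted-item count.
    fn reconstruct_window_id(surviving: &[CompactedItem]) -> u64 {
        match surviving.last().and_then(|item| item.window_id) {
            Some(id) => id,
            None => surviving.len() as u64,
        }
    }
    ```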
    
    ## Validation
    
    - `cargo check -p codex-core --tests`
  • Add per-session realtime model and version overrides (#24999)
    ## Why
    
    Clients need to select a realtime session configuration for an
    individual start without rewriting persisted configuration or restarting
    the app-server process.
    
    ## What Changed
    
    - Add optional `model` and `version` fields to `thread/realtime/start`
    - Forward those optional values through the realtime start operation and
    apply them only for that session
    - Preserve existing configured/default behavior when the new fields are
    omitted
    - Update generated protocol schema and app-server documentation
    
    ## Validation
    
    - Added/updated protocol serialization coverage for the new optional
    request fields
    - Added focused core coverage for a session override taking precedence
    over configured realtime selection
    - Added focused app-server coverage that a request override reaches the
    realtime WebSocket handshake
  • [codex-analytics] add extensible feature thread sources (#27063)
    ## Why
    - `ThreadSource` currently defines a closed set of core-owned values
    - Product features also create threads for background or scheduled work
    - Adding every product-specific value to the core enum would require
    repeated `codex-rs` protocol changes
    - Feature-backed values let product callers provide precise attribution
    while preserving the existing core classifications
    
    ## What Changed
    - Adds `ThreadSource::Feature(String)` for app-owned thread source
    values
    - Represents all app-server v2 thread sources as scalar strings, so a
    feature source is supplied as `"automation"`
    - Persists and emits the feature's plain string label, so `"automation"`
    produces `thread_source="automation"` in analytics
    - Keeps `user`, `subagent`, and `memory_consolidation` as explicit
    core-owned values and regenerates the app-server schemas and TypeScript
    bindings
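
    A sketch of the scalar-label mapping, assuming unknown labels are
    parsed as feature sources. The variant list is reduced to the values
    named above, and `label` and `from_label` are hypothetical helpers
    rather than the generated serialization code.

    ```rust
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum ThreadSource {
        User,
        Subagent,
        MemoryConsolidation,
        /// App-owned value, carried as its plain string label.
        Feature(String),
    }

    impl ThreadSource {
        /// Scalar string emitted on the wire and in analytics.
        fn label(&self) -> &str {
            match self {
                ThreadSource::User => "user",
                ThreadSource::Subagent => "subagent",
                ThreadSource::MemoryConsolidation => "memory_consolidation",
                ThreadSource::Feature(label) => label.as_str(),
            }
        }

        /// Assumption: any label that is not core-owned is a feature source.
        fn from_label(label: &str) -> Self {
            match label {
                "user" => ThreadSource::User,
                "subagent" => ThreadSource::Subagent,
                "memory_consolidation" => ThreadSource::MemoryConsolidation,
                other => ThreadSource::Feature(other.to_string()),
            }
        }
    }
    ```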
    
    ## Verification
    - `just write-app-server-schema`
    - `cargo check --workspace`
    - `just test -p codex-protocol
    feature_thread_source_serializes_as_its_app_owned_label`
    - `just test -p codex-app-server-protocol
    thread_sources_round_trip_as_scalar_labels`
    - `cargo test -p codex-analytics
    thread_initialized_event_serializes_expected_shape`
    - `just fmt`
  • Load selected executor skills through extensions (#27184)
    ## Why
    
    CCA is moving toward a split runtime where the orchestrator may not have
    a filesystem, while executors can expose preinstalled plugins and
    skills. A thread therefore needs to select capabilities without asking
    app-server or core to interpret executor-owned paths through the
    orchestrator's filesystem.
    
    The longer-term model is broader than executor skills:
    
    - A plugin is a bundle of skills, MCP servers, connectors/apps, and
    hooks.
    - A plugin root can be local, executor-owned, or hosted by a backend.
    - Components inside one plugin can use different access and execution
    mechanisms. A skill may be read from a filesystem or through backend
    tools; an HTTP MCP server can run without an executor; a stdio MCP
    server or hook needs an execution environment.
    - Core should carry generic extension initialization data. The extension
    that owns a component should discover it, expose it to the model, and
    invoke it through the appropriate runtime.
    
    This PR establishes that architecture through one complete vertical:
    selecting a root on an executor, discovering the skills beneath it,
    exposing those skills to the model, and reading an explicitly invoked
    `SKILL.md` through the same executor.
    
    ## Contract
    
    `thread/start` gains an experimental `selectedCapabilityRoots` field:
    
    ```json
    {
      "selectedCapabilityRoots": [
        {
          "id": "deploy-plugin@1",
          "location": {
            "type": "environment",
            "environmentId": "workspace",
            "path": "/opt/codex/plugins/deploy"
          }
        }
      ]
    }
    ```
    
    The root is intentionally not classified as a "plugin" or "skill" in the
    API. It can point at a standalone skill, a directory containing several
    skills, or a plugin containing skills and other components. This PR only
    teaches the skills extension how to consume it; later extensions can
    resolve MCP, connector, and hook components from the same selection.
    
    The platform-supplied `id` is stable selection identity. The location
    says which runtime owns the root and gives that runtime an opaque path.
    App-server does not inspect or canonicalize the path.
    
    ## What changed
    
    ### Generic thread extension initialization
    
    App-server converts selected roots into `ExtensionDataInit`. Core
    carries that generic initialization value until the final thread ID is
    known, then creates thread-scoped `ExtensionData` before lifecycle
    contributors run.
    
    This keeps `Session` and core independent of the capability-selection
    contract. The initialization value is consumed during construction; it
    is not retained as another long-lived `Session` field.
    
    ### Executor-backed skills
    
    The skills extension now owns an `ExecutorSkillProvider` that:
    
    - resolves the selected environment through `EnvironmentManager`
    - discovers, canonicalizes, and reads skills through that environment's
    `ExecutorFileSystem`
    - contributes the bounded selected-skill catalog as stable developer
    context
    - reads an explicitly invoked skill body through the authority that
    listed it
    - warns when an environment or root is unavailable
    - never falls back to the orchestrator filesystem for an executor-owned
    root
    
    Skill catalog and instruction fragments have hard byte bounds, which
    also keep them below the 10K-token per-item context limit. If a
    selected executor skill has the same name as a legacy local skill, the
    executor selection owns that invocation and the local body is not
    injected a second time.
    
    Existing local and bundled skill loading remains in place. Omitting
    `selectedCapabilityRoots` therefore preserves current local-only
    behavior.
    
    ## Current semantics
    
    - Only environment-owned locations are represented in this first
    contract.
    - Roots are resolved by the destination extension, not by app-server or
    core.
    - An unavailable executor or invalid root produces a warning and no
    capabilities from that root; it does not trigger a local-filesystem
    fallback.
    - Selection applies to a newly started active thread.
    - MCP servers, connectors, and hooks beneath a selected plugin root are
    not activated yet.
    - Selection is not yet persisted or inherited across resume, fork, or
    subagent creation. Existing local capabilities continue to behave as
    they do today in those flows.
    
    ## Planned vertical follow-ups
    
    1. **Hosted HTTP MCP:** add an extension-backed HTTP MCP source that
    works without an executor, then replace the special-purpose MCP plugins
    loader with that implementation.
    2. **Executor MCP:** register and execute stdio MCP servers through the
    environment that owns the selected plugin root.
    3. **Backend skills:** add a hosted skill source whose catalog and
    bodies are accessed through extension tools rather than a filesystem.
    4. **Connectors and hooks:** activate those components through their
    owning extensions, using the same selected-root boundary and
    component-specific runtime.
    5. **Durable selection:** define the desired-selection lifecycle,
    persist it, and make resume, fork, and subagent inheritance explicit
    rather than accidental.
    6. **Local convergence:** incrementally route existing local plugin,
    skill, and MCP loading through the same extension model while preserving
    current local behavior.
    
    Each follow-up remains reviewable as an end-to-end capability. The
    platform selects roots, generic thread extension data carries the
    selection, and the owning extension resolves and operates its component.
    
    ## Verification
    
    Coverage added for:
    
    - app-server end-to-end discovery and explicit invocation of a skill
    inside an executor-selected plugin root
    - exclusive invocation when a selected executor skill collides with a
    local skill name
    - executor filesystem authority for discovery, canonicalization, and
    reads
    - thread extension initialization before lifecycle contributors run
    - stable executor catalog context, explicit invocation, context
    rebuilding, hidden skills, and preserved host/remote catalog behavior
    
    Targeted protocol, core-skills, skills-extension, core lifecycle, and
    app-server executor-skill tests were run during development.
  • multi-agent: add path-based v2 activity tracking (#27007)
    ## Why
    
    Multi-agent v2 identifies agents by canonical paths, but its tool
    handlers still emitted the larger legacy collaboration begin/end events
    built around nickname and role metadata. App-server, rollout-trace,
    analytics, and TUI consumers therefore lacked one compact path-based
    completion signal that behaved consistently across live events and
    replay.
    
    The TUI also needs a bounded `/agent` status surface for v2 agents. It
    should use recent local activity for previews, refresh liveness without
    loading full histories, and keep the legacy picker available when no
    path-backed v2 agent is known.
    
    ## What changed
    
    - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
    `interrupt_agent` legacy lifecycle emissions with a success-only
    `SubAgentActivity` event. The event records the tool call ID, occurrence
    time, affected thread, canonical agent path, and `started`,
    `interacted`, or `interrupted` kind.
    - Expose the activity as a completion-only app-server v2
    `subAgentActivity` thread item in live notifications and reconstructed
    history, regenerate the protocol schemas, and count it in sub-agent tool
    analytics.
    - Track canonical paths from live activity and loaded-thread metadata in
    the TUI, and render the activity in live and replayed transcripts.
    - Make `/agent` list running path-backed agents with summaries from
    bounded local event buffers. Each summary is capped at 240 graphemes,
    the scan is capped at six recent items, only the last three wrapped
    lines are shown, and command output is omitted. Liveness falls back to
    metadata-only `thread/read` when local turn state is unavailable.
    - Persist the activity as a terminal rollout-trace runtime payload and
    reduce it to the corresponding spawn, send, follow-up, or close
    interaction edge. `interrupt_agent` is classified as a close-edge
    operation.
    - Preserve the legacy picker when no path-backed v2 agent is known.
    
    ## Compatibility
    
    App-server v2 clients that consumed `collabAgentToolCall` begin/end
    pairs for these tools must handle the new completion-only
    `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.
    
    ## Screenshot
    
    <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
    src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
    />
    
    ## Testing
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-rollout-trace`
    - Added focused coverage for activity analytics, terminal trace
    serialization, spawn-edge reduction, `interrupt_agent` classification,
    TUI status rendering without aggregated command output, and clearing
    stale running state after a completed turn.
  • Pair thread environment settings (#26687)
    ## Why
    
    Thread cwd and environment selections are a single logical setting in
    core: updating one without the other can silently desynchronize the
    next-turn execution context. This change makes that relationship
    explicit in the internal thread settings flow while preserving the
    existing app-server public API shape.
    
    ## What changed
    
    - Moved the cwd/environment pair through internal
    `ThreadSettingsOverrides.environment_settings` instead of a top-level
    internal `cwd` field.
    - Kept `thread/settings/update` public params unchanged, with app-server
    translating top-level `cwd` into the paired internal settings shape.
    - Moved `Op::UserInput` environment overrides into thread settings so
    user turns and settings updates use the same core path.
    - Updated core, app-server, MCP, memories, sample, and test callsites to
    construct the paired settings shape.
    
    ## Verification
    
    - `git diff --check`
    - Local test run starting after PR creation.
  • fix: preserve auto review across config and delegation (#26230)
    ## Why
    
    Auto Review should remain the effective approval reviewer when settings
    cross runtime boundaries. A config or app-server round trip must not
    change the reviewer identity, and delegated work must not silently fall
    back to user review.
    
    This requires both a stable canonical serialized value and propagation
    of the effective setting. `auto_review` is the canonical value across
    protocol and app-server output, while `guardian_subagent` remains
    accepted as backward-compatible input.
    
    ## What changed
    
    - serialize `ApprovalsReviewer::AutoReview` consistently as
    `auto_review` across core protocol and app-server v2
    - continue accepting `guardian_subagent` when reading existing config or
    client requests
    - carry the active turn's approval reviewer into spawned agents
    - update config/debug expectations and add delegated-task regression
    coverage
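
    The canonical-output, legacy-input behavior can be sketched as below.
    The enum is reduced to two variants, the `user` label is an
    assumption, and `parse_reviewer` and `serialize_reviewer` are
    hypothetical names.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ApprovalsReviewer {
        User,
        AutoReview,
    }

    /// Accept both the canonical value and the legacy alias on input.
    fn parse_reviewer(value: &str) -> Option<ApprovalsReviewer> {
        match value {
            "user" => Some(ApprovalsReviewer::User), // assumed label
            "auto_review" | "guardian_subagent" => Some(ApprovalsReviewer::AutoReview),
            _ => None,
        }
    }

    /// Always emit the canonical value on output.
    fn serialize_reviewer(reviewer: ApprovalsReviewer) -> &'static str {
        match reviewer {
            ApprovalsReviewer::User => "user",
            ApprovalsReviewer::AutoReview => "auto_review",
        }
    }
    ```

    In this sketch a legacy `guardian_subagent` input comes back out as
    `auto_review` after a round trip, so the reviewer identity is kept
    while the serialized form is normalized.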
    
    ## Scope
    
    This does not change Guardian policy or remove compatibility with
    existing `guardian_subagent` inputs. It preserves the selected reviewer
    across serialization, config reloads, app-server settings, and delegated
    task setup.
    
    Related Guardian changes are split independently:
    
    - #26231 adds denials and soft denials
    - #26334 retries transient reviewer failures
    - #26333 reuses narrowly scoped low-risk approvals
    - #26232 adds TUI denial recovery
    
    ## Validation
    
    - `just test -p codex-app-server-protocol` (224 passed)
    - regression coverage for delegated task reviewer propagation
    - serialization coverage for canonical `auto_review` output and legacy
    `guardian_subagent` input
    
    ---------
    
    Co-authored-by: saud-oai <saud@openai.com>
  • protocol: remove submission-side serde from Op (#26674)
    ## Why
    
    Submission-side `Op` payloads are now an internal handoff inside the
    Rust codebase, so keeping a stable serde contract there adds complexity
    without a real wire consumer.
    
    ## What changed
    
    - remove serde/schema annotations from `Submission`, `Op`, and
    submission-only payload types like thread settings overrides, additional
    context, realtime conversation params, `TurnEnvironmentSelection`, and
    `RequestUserInputResponse`
    - delete the `Op` serialization tests and the now-unused double-option
    prompt serde helper
    - keep event/API-facing serialization where it is still required, and
    serialize the `request_user_input` tool output from its wire payload
    instead of the core response struct
    - update `protocol_v1.md` to call out that events remain the serialized
    transport surface while submission payloads are implementation details
    
    ## Testing
    
    - `just test -p codex-protocol`
    - `cargo check -p codex-core -p codex-app-server -p codex-thread-store`
    - `just test -p codex-core request_user_input`
  • Require absolute cwd in thread settings (#26532)
    ## Why
    
    Thread settings cwd overrides are expected to be resolved before they
    enter core. Keeping this boundary as a plain `PathBuf` made it easy for
    core/session code to keep fallback normalization and relative-path
    resolution logic in places that should only receive an already-resolved
    cwd.
    
    This is intentionally the absolute-cwd-only slice: it does not change
    environment selection stickiness or cwd-to-default-environment fallback
    behavior.
    
    ## What changed
    
    - Changes `ThreadSettingsOverrides.cwd`,
    `CodexThreadSettingsOverrides.cwd`, and `SessionSettingsUpdate.cwd` to
    use `AbsolutePathBuf`.
    - Removes core-side cwd normalization/resolution from session settings
    updates.
    - Updates affected core/app-server test helpers and callsites to pass
    existing absolute cwd values or use `abs()` helpers.
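    
    As an illustration of the boundary described above, here is a minimal
    sketch of an absolute-only path type. `AbsPath` and `SettingsOverrides`
    are hypothetical stand-ins written for this note, not the real
    `AbsolutePathBuf` or `ThreadSettingsOverrides` definitions.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical stand-in for an absolute-only path type.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct AbsPath(PathBuf);
    
    impl AbsPath {
        /// Rejects relative paths at the boundary instead of resolving them later.
        pub fn new(path: impl Into<PathBuf>) -> Result<Self, String> {
            let path = path.into();
            if path.is_absolute() {
                Ok(Self(path))
            } else {
                Err(format!("cwd must be absolute: {}", path.display()))
            }
        }
    
        pub fn as_path(&self) -> &Path {
            &self.0
        }
    }
    
    /// Hypothetical settings override that can only carry a resolved cwd.
    #[derive(Debug, Default)]
    pub struct SettingsOverrides {
        pub cwd: Option<AbsPath>,
    }
    ```
    
    With a type like this, code that receives `SettingsOverrides` has no
    relative path left to normalize, which is the property the change
    relies on.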
    
    ## Validation
    
    Opening as draft so CI can start while local validation continues.
  • [codex] Forward turn moderation metadata through app-server (#25710)
    ## Why
    First-party backends can supply turn-scoped moderation metadata that
    app-server clients need for client-side presentation. Exposing this as
    an experimental typed notification lets opted-in clients consume it
    without interpreting raw Responses API events.
    
    ## What changed
    - forward `response.metadata.openai_chatgpt_moderation_metadata` from
    Responses API SSE and WebSocket streams as turn-scoped moderation
    metadata
    - emit the experimental app-server v2 `turn/moderationMetadata`
    notification with `{ threadId, turnId, metadata }`
    - add app-server integration coverage for the typed moderation metadata
    notification
    
    ## Testing
    - `just test -p codex-core
    build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
    - `just test -p codex-core` (fails locally: 46 failures and 1 timeout,
    primarily missing `test_stdio_server` and shell snapshot timeouts)
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    turn_moderation_metadata_emits_typed_notification_v2`
    - `just test -p codex-app-server` (fails locally: 792 passed, 10 failed,
    and 5 timed out; failures are in existing environment-sensitive tests,
    primarily because nested macOS `sandbox-exec` is not permitted)
    - `just write-app-server-schema --experimental --schema-root
    /tmp/codex-app-server-schema-experimental`
  • Encrypt multi-agent v2 message payloads (#26210)
    ## Why
    
    Multi-agent v2 currently routes agent instructions through normal tool
    arguments and inter-agent context. That means the parent model can emit
    plaintext task text, Codex can persist it in history/rollouts, and the
    recipient can receive it as ordinary assistant-message JSON.
    
    This changes the v2 path so agent instructions stay encrypted between
    model calls: Responses encrypts the `message` argument returned by the
    model, Codex forwards only that ciphertext, and Responses decrypts it
    internally for the recipient model.
    
    ## What changed
    
    - Mark the v2 `message` parameter as encrypted for `spawn_agent`,
    `send_message`, and `followup_task`.
    - Treat multi-agent v2 tool `message` values as ciphertext
    unconditionally.
    - Store v2 inter-agent task text in
    `InterAgentCommunication.encrypted_content` with empty plaintext
    `content`.
    - Convert encrypted inter-agent communications into the Responses
    `agent_message` input item before sending the child request.
    - Preserve `agent_message` items across history, rollout, compaction,
    telemetry, and app-server schema paths.
    - Leave multi-agent v1 unchanged.
    
    ## Message shape
    
    The model still calls the v2 tools with a `message` argument, but that
    value is now ciphertext:
    
    ```json
    {
      "name": "spawn_agent",
      "arguments": {
        "task_name": "worker",
        "message": "<ciphertext>"
      }
    }
    ```
    
    Codex stores the task as encrypted inter-agent communication:
    
    ```json
    {
      "author": "/root",
      "recipient": "/root/worker",
      "content": "",
      "encrypted_content": "<ciphertext>",
      "trigger_turn": true
    }
    ```
    
    When Codex builds the recipient request, it forwards the ciphertext
    using the new Responses input item:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root",
      "recipient": "/root/worker",
      "content": [
        {
          "type": "encrypted_content",
          "encrypted_content": "<ciphertext>"
        }
      ]
    }
    ```
    
    Responses decrypts that item internally for the recipient model.
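    
    The conversion step can be sketched roughly as follows. The struct
    shapes mirror the JSON above, but the Rust type definitions and the
    `to_agent_message` helper are assumptions made for illustration; in
    particular, modeling `encrypted_content` as an `Option` is a guess.
    
    ```rust
    /// Hypothetical mirror of the stored inter-agent record shown above.
    #[derive(Debug, Clone)]
    pub struct InterAgentCommunication {
        pub author: String,
        pub recipient: String,
        pub content: String,
        pub encrypted_content: Option<String>,
    }
    
    /// Hypothetical mirror of the `agent_message` input item.
    #[derive(Debug, PartialEq)]
    pub struct AgentMessage {
        pub author: String,
        pub recipient: String,
        pub encrypted_content: String,
    }
    
    /// Returns an `agent_message` only when ciphertext is present; records
    /// without ciphertext are left for the existing plaintext path.
    pub fn to_agent_message(comm: &InterAgentCommunication) -> Option<AgentMessage> {
        let ciphertext = comm.encrypted_content.as_ref()?;
        Some(AgentMessage {
            author: comm.author.clone(),
            recipient: comm.recipient.clone(),
            encrypted_content: ciphertext.clone(),
        })
    }
    ```
    
    The helper never inspects the ciphertext, which matches the intent that
    Codex only forwards it.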
    
    ## Context impact
    
    - Parent context no longer carries plaintext v2 agent task instructions
    from these tool arguments.
    - Codex rollout/history stores ciphertext for v2 agent instructions.
    - Recipient requests receive an `agent_message` item instead of
    assistant commentary JSON for encrypted task delivery.
    - Plaintext completion/status notifications are still plaintext because
    they are Codex-generated status messages, not encrypted model tool
    arguments.
    
    ## Validation
    
    - `just test -p codex-tools`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-rollout-trace`
    - `just test -p codex-otel`
    - `just write-app-server-schema`
  • [codex] Add use_responses_lite 'override' logic (#26487)
    ## Summary
    
    - add a defaulted `ModelInfo.use_responses_lite` catalog field
    - support serializing `reasoning.context` while preserving the existing
    effort and summary path
    - leave the field disabled for every model for now
    
    I've added an override that disables parallel tool calls when
    `use_responses_lite` is on, and I've forced persistent reasoning in
    that mode. It would be ideal to centralize all the `responses_lite`
    plumbing, but I think this is best for now to keep the plumbing and
    diffs small.
    
    ## Testing
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    responses_lite_sets_all_turns_context_and_disables_parallel_tool_calls`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    configured_reasoning_summary_is_sent`
    - `cargo check -p codex-core --tests`
    - `RUST_MIN_STACK=8388608 cargo clippy -p codex-core --tests` (passes
    with pre-existing warnings in `codex-code-mode` and
    `codex-core-plugins`)
  • [codex] Use model-advertised reasoning effort order (#26446)
    ## Summary
    - preserve the model catalog order for app-server
    `supportedReasoningEfforts` and document that client contract
    - render TUI reasoning choices in the advertised order
    - step reasoning shortcuts by adjacent list position instead of deriving
    order from known effort names
    - anchor unsupported configured values to the advertised default, or the
    first option when needed
    - remove canonical effort ordering helpers and the unused upgrade effort
    mapping
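    
    A minimal sketch of position-based stepping, assuming efforts are plain
    strings. The `step_effort` function is hypothetical, and stepping from
    the anchor (rather than returning the anchor itself) is an assumption
    about how unsupported values are handled.
    
    ```rust
    /// Steps through `advertised` by adjacent position. An unsupported `current`
    /// value is anchored to `default`, or to the first option when the default
    /// is not advertised either. Steps clamp at both ends of the list.
    pub fn step_effort<'a>(
        advertised: &'a [String],
        default: Option<&str>,
        current: &str,
        delta: isize,
    ) -> Option<&'a str> {
        if advertised.is_empty() {
            return None;
        }
        let position = |name: &str| advertised.iter().position(|e| e == name);
        let anchor = position(current)
            .or_else(|| default.and_then(position))
            .unwrap_or(0);
        let last = advertised.len() as isize - 1;
        let next = (anchor as isize + delta).clamp(0, last) as usize;
        Some(advertised[next].as_str())
    }
    ```
    
    Because the function only looks at list positions, a catalog that
    advertises efforts in an unusual order is stepped in that order.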
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
    
    Stacked on #26444.
  • [codex] Support model-defined reasoning efforts (#26444)
    ## Summary
    - accept non-empty model-defined reasoning effort values while
    preserving built-in effort behavior
    - propagate the non-Copy effort type through core, app-server, TUI,
    telemetry, and persistence call sites
    - preserve string wire encoding and expose an open-string schema for
    clients
    - update model selection and shortcut behavior for model-advertised
    effort values
    
    ## Root cause
    `ReasoningEffort` gained a string-backed custom variant, so it could no
    longer implement `Copy` or rely on derived closed-enum serialization.
    Existing consumers still moved effort values from shared references and
    assumed a fixed built-in value set.
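    
    The root cause can be illustrated with a small sketch. `Effort` is a
    hypothetical type, and the built-in names `low`, `medium`, and `high`
    are assumed for the example rather than taken from the real
    `ReasoningEffort` definition.
    
    ```rust
    /// Hypothetical open-string effort type: built-ins plus a custom variant.
    /// Holding a `String` means the enum can be `Clone` but not `Copy`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Effort {
        Low,
        Medium,
        High,
        Custom(String),
    }
    
    impl Effort {
        /// Parses a wire string; empty values are rejected.
        pub fn parse(raw: &str) -> Option<Self> {
            match raw {
                "" => None,
                "low" => Some(Self::Low),
                "medium" => Some(Self::Medium),
                "high" => Some(Self::High),
                other => Some(Self::Custom(other.to_string())),
            }
        }
    
        /// Returns the string wire encoding.
        pub fn as_str(&self) -> &str {
            match self {
                Self::Low => "low",
                Self::Medium => "medium",
                Self::High => "high",
                Self::Custom(value) => value.as_str(),
            }
        }
    }
    ```
    
    Callers that previously copied the value out of a shared reference now
    need an explicit `clone()` or a borrow.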
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
  • Expose local image paths to models (#25944)
    ## Why
    
    Local image attachments include image bytes, but the adjacent
    model-visible label omits the source path. Exposing the path lets
    model-selected workflows refer back to the intended local image
    explicitly.
    
    ## What changed
    
    - Include an escaped `path` attribute in model-visible local image
    opening tags.
    - Reuse the path-aware marker generator in rollout coverage.
    - Update protocol, replay, and rollout coverage for the new request
    shape.
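    
    A rough sketch of emitting an escaped `path` attribute. The tag name
    `local_image` and both helper functions are placeholders invented for
    this example; the real marker generator may differ.
    
    ```rust
    /// Escapes the characters that are unsafe inside a double-quoted attribute.
    pub fn escape_attr(value: &str) -> String {
        let mut out = String::with_capacity(value.len());
        for ch in value.chars() {
            match ch {
                '&' => out.push_str("&amp;"),
                '<' => out.push_str("&lt;"),
                '>' => out.push_str("&gt;"),
                '"' => out.push_str("&quot;"),
                other => out.push(other),
            }
        }
        out
    }
    
    /// Builds an opening tag; `local_image` is a placeholder tag name.
    pub fn local_image_open_tag(path: &str) -> String {
        format!("<local_image path=\"{}\">", escape_attr(path))
    }
    ```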
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-protocol`
    - `just test -p codex-core skips_local_image_label_text`
    - `just test -p codex-core
    copy_paste_local_image_persists_rollout_request_shape`
    - `git diff --check`
  • Propagate permission approval environment id (#25862)
    ## Stack
    
    1. #25850 - Key request-permission grants by environment: stores and
    applies sticky permission grants per environment id.
    2. #25858 - Add `environmentId` to `request_permissions`: lets the model
    target a selected environment and resolves relative permission paths
    against it.
    3. This PR (#25862) - Propagate permission approval environment id:
    carries the selected environment id through approval events, app-server
    requests, TUI prompts, and delegate forwarding.
    4. #25867 - Add remote request permissions integration coverage:
    verifies the selected remote environment across request, approval, grant
    reuse, and exec.
    
    This PR is stacked on #25858, and #25867 is stacked on this PR.
    
    ## Why
    
    PR2 lets the model bind a `request_permissions` call to a selected
    environment, but the approval event and client-facing request still
    needed to carry that binding. For CCA, the user-facing prompt and
    delegated approval path should know which environment the grant applies
    to instead of relying on cwd alone.
    
    ## What Changed
    
    - Added optional `environmentId` to `RequestPermissionsEvent`.
    - Emit the selected environment id from core permission approval events.
    - Preserve the environment id through delegate forwarding, including
    cwd-based delegated requests.
    - Added `environmentId` to app-server permission approval params,
    generated schema/TypeScript artifacts, and README examples.
    - Preserve and display the environment id in TUI permission approval
    prompts.
    - Updated focused core, app-server protocol, and TUI conversion
    coverage.
    
    ## Testing
    
    Not run locally per instruction. Performed read-only `git diff --check`.
  • Add environmentId to request_permissions (#25858)
    ## Stack
    
    1. #25850 - Key request-permission grants by environment: stores and
    applies sticky permission grants per environment id.
    2. This PR (#25858) - Add `environmentId` to `request_permissions`: lets
    the model target a selected environment and resolves relative permission
    paths against it.
    3. #25862 - Propagate permission approval environment id: carries the
    selected environment id through approval events, app-server requests,
    TUI prompts, and delegate forwarding.
    4. #25867 - Add remote request permissions integration coverage:
    verifies the selected remote environment across request, approval, grant
    reuse, and exec.
    
    This PR is stacked on #25850; #25862 and #25867 are stacked on this PR.
    
    ## Why
    
    PR1 made request-permission grants internally environment-keyed, but the
    model-facing `request_permissions` tool could still only target the
    primary environment. For CCA and multi-environment turns, the tool needs
    an explicit way to bind a permission request to a selected attached
    environment before resolving relative paths.
    
    ## What Changed
    
    - Added optional `environmentId` to `RequestPermissionsArgs`, with
    `environment_id` accepted as an alias.
    - Exposed `environmentId` in the `request_permissions` tool schema and
    description.
    - Resolve the selected environment before parsing filesystem permission
    paths, so relative paths bind to the selected environment cwd.
    - Route validated tool calls through
    `request_permissions_for_environment` directly instead of duplicating
    environment lookup in `Session::request_permissions`.
    - Reject unknown environment ids with a model-facing error.
    - Updated focused request-permissions and Guardian call sites for the
    new optional field.
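    
    The resolution order described above might look like the following
    sketch. The `resolve_permission_path` function and the map of
    environment ids to cwd values are hypothetical; the real lookup goes
    through session state that is not shown here.
    
    ```rust
    use std::collections::HashMap;
    use std::path::{Path, PathBuf};
    
    /// Resolves `raw` against the cwd of the selected environment. When
    /// `environment_id` is omitted, the primary environment is used. Unknown
    /// ids produce an error string that could be surfaced to the model.
    pub fn resolve_permission_path(
        environments: &HashMap<String, PathBuf>,
        primary_id: &str,
        environment_id: Option<&str>,
        raw: &str,
    ) -> Result<PathBuf, String> {
        let id = environment_id.unwrap_or(primary_id);
        let cwd = environments
            .get(id)
            .ok_or_else(|| format!("unknown environment id: {id}"))?;
        let path = Path::new(raw);
        Ok(if path.is_absolute() {
            path.to_path_buf()
        } else {
            cwd.join(path)
        })
    }
    ```
    
    Selecting the environment first means a relative path such as `src`
    binds to the selected environment's cwd rather than the primary one.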
    
    ## Testing
    
    Not run locally per instruction.
  • core: derive built-in permission profiles from raw policies (#25739)
    ## Why
    
    Permission profiles that extend a built-in profile should behave like
    other TOML inheritance: parent entries provide defaults, and child keys
    override matching fields before the profile is compiled.
    
    That was not true for `:workspace`. Previously, a profile with `extends
    = ":workspace"` seeded the compiled runtime
    `PermissionProfile::workspace_write()` policy and then appended child
    filesystem entries. A child override such as `":tmpdir" = "read"`
    therefore left the inherited `":tmpdir" = "write"` entry in the final
    policy. Since same-target `write` wins over `read` during runtime
    resolution, the child override was ineffective.
    
    This also needs a clear source of truth for the built-in profiles. The
    protocol-level sandbox policy constructors now define the raw built-in
    filesystem entries, and both `PermissionProfile` presets and
    config-profile inheritance derive from those same values.
    
    ## What Changed
    
    - Add a canonical `FileSystemSandboxPolicy::read_only()` constructor
    while keeping the read-only and workspace-write raw filesystem entries
    explicit and independent.
    - Derive `PermissionProfile::read_only()` from
    `FileSystemSandboxPolicy::read_only()`;
    `PermissionProfile::workspace_write()` continues to derive from
    `FileSystemSandboxPolicy::workspace_write()`.
    - Build extensible `:read-only` and `:workspace` parent profiles by
    projecting those canonical sandbox policies into
    `PermissionProfileToml`, then merge user overrides at the TOML layer
    before compilation.
    - Add config parsing support for `:slash_tmp` so the built-in
    `:workspace` parent can be expressed in the same TOML-shaped filesystem
    table as user profiles.
    - Document that `PermissionsToml::resolve_profile()` returns an
    already-merged `PermissionProfileToml`, and return that profile directly
    after removing the resolved-profile wrapper.
    - Extend the config test for `extends = ":workspace"` to assert that
    inherited `":slash_tmp" = "write"` is preserved and that a child
    `":tmpdir" = "read"` entry replaces the inherited `write` entry.
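    
    The difference between appending and merging can be shown with a small
    sketch. `Access`, `FsTable`, and `merge_profiles` are hypothetical names
    for this example and do not reflect the real `PermissionProfileToml`
    layout.
    
    ```rust
    use std::collections::BTreeMap;
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Access {
        Read,
        Write,
        Deny,
    }
    
    /// Hypothetical raw filesystem table: target -> access, before compilation.
    pub type FsTable = BTreeMap<String, Access>;
    
    /// Parent entries provide defaults; a child key replaces the matching
    /// parent key instead of being appended next to it.
    pub fn merge_profiles(parent: &FsTable, child: &FsTable) -> FsTable {
        let mut merged = parent.clone();
        for (target, access) in child {
            merged.insert(target.clone(), *access);
        }
        merged
    }
    ```
    
    Because the merge happens on keyed entries, a child `":tmpdir" = "read"`
    leaves exactly one `:tmpdir` entry, so the same-target `write` entry is
    no longer present to win at runtime.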
    
    ## Verification
    
    - `just test -p codex-config`
    - `just test -p codex-protocol`
    - `just test -p codex-core
    permissions_profiles_resolve_extends_parent_first_with_child_overrides`
    - `just test -p codex-core
    default_permissions_profile_can_extend_builtin_workspace`
    - `just test -p codex-core`
      - Result: 2596 passed, 4 failed, 1 timed out.
    - The failures were existing sandbox/environment-sensitive tests
    unrelated to this permissions change:
      - `suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`
      - `suite::user_shell_cmd::user_shell_command_history_is_persisted_and_shared_with_model`
      - `suite::abort_tasks::interrupt_persists_turn_aborted_marker_in_next_request`
      - `suite::abort_tasks::interrupt_tool_records_history_entries`
      - `thread_manager::tests::start_thread_uses_all_default_environments_from_codex_home`
  • Add multi-agent runtime metadata types (#25720)
    Stack split from #25708. Original PR intentionally left open. This first
    PR adds the multi-agent runtime metadata types and catalog plumbing used
    by the rest of the stack.
  • feat: show enterprise monthly credit limits in status (#24812)
    ## Summary
    
    Enterprise users can have an effective monthly credit limit, but Codex
    `/status` currently drops that metadata from the account-usage response.
    
    This change adds the optional `spend_control.individual_limit`
    projection to the existing rate-limit snapshot flow. The backend client
    reads the monthly limit, app-server exposes it as `individualLimit`, and
    the TUI renders a `Monthly credit limit` row through the existing
    progress-bar renderer.
    
    When the backend does not return an effective monthly limit, existing
    rate-limit behavior is unchanged.
    
    ## Existing backend state
    
    The account-usage backend already returns the effective monthly limit
    and current usage together:
    
    ```json
    {
      "spend_control": {
        "reached": false,
        "individual_limit": {
          "limit": "25000",
          "used": "8000",
          "remaining": "17000",
          "used_percent": 32,
          "remaining_percent": 68,
          "reset_after_seconds": 86400,
          "reset_at": 1778137680
        }
      }
    }
    ```
    
    Before this change, Codex projected rolling `primary` and `secondary`
    windows plus `credits`. It ignored `spend_control.individual_limit`, so
    app-server clients and `/status` could not render the monthly cap.
    
    The updated flow is:
    
    ```text
    account usage backend
      -> backend-client reads spend_control.individual_limit
      -> existing rate-limit snapshot carries optional individual_limit
      -> app-server exposes optional individualLimit
      -> TUI renders Monthly credit limit
    ```
    
    ## App-server contract
    
    `account/rateLimits/read` and sparse `account/rateLimits/updated`
    notifications now include an additive nullable
    `rateLimits.individualLimit` field:
    
    ```json
    {
      "individualLimit": {
        "limit": "25000",
        "used": "8000",
        "remainingPercent": 68,
        "resetsAt": 1778137680
      }
    }
    ```
    
    In an `account/rateLimits/read` response, `null` means no monthly limit
    is available. `account/rateLimits/updated` remains a sparse rolling
    notification: clients merge available values into their most recent
    `account/rateLimits/read` snapshot or refetch. Nullable account metadata
    in a rolling notification does not clear a previously observed value.
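    
    A client-side cache following these merge rules could be sketched as
    below. The struct and method names are hypothetical, and the numeric
    field types are assumptions based on the JSON example.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    pub struct IndividualLimit {
        pub limit: String,
        pub used: String,
        pub remaining_percent: u8,
        pub resets_at: u64,
    }
    
    #[derive(Debug, Clone, Default, PartialEq)]
    pub struct RateLimitCache {
        pub individual_limit: Option<IndividualLimit>,
    }
    
    impl RateLimitCache {
        /// A full read is authoritative: `None` clears a previously seen limit.
        pub fn apply_read(&mut self, individual_limit: Option<IndividualLimit>) {
            self.individual_limit = individual_limit;
        }
    
        /// A sparse update only overwrites when it carries a value.
        pub fn apply_update(&mut self, individual_limit: Option<IndividualLimit>) {
            if individual_limit.is_some() {
                self.individual_limit = individual_limit;
            }
        }
    }
    ```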
    
    ## Design decisions
    
    - Extend the existing rate-limit snapshot instead of introducing a
    separate request or wire-level update protocol.
    - Keep the Codex projection narrow: `/status` needs the effective limit,
    current usage, remaining percentage, and reset timestamp.
    - Render the monthly row through the existing progress-bar renderer,
    with one optional detail line for `8,000 of 25,000 credits used`.
    - Keep the backend response optional so existing accounts and older
    usage states preserve their current behavior.
    - Preserve cached monthly metadata when sparse rolling notifications
    omit it. Live account-usage reads remain authoritative and can clear a
    removed limit.
    
    ## Visual evidence
    
    ```text
     Monthly credit limit:   [██████████████░░░░░░] 68% left (resets 07:08 on 7 May)
                             8,000 of 25,000 credits used
    ```
    
    Snapshot:
    `codex-rs/tui/src/status/snapshots/codex_tui__status__tests__status_snapshot_includes_enterprise_monthly_credit_limit.snap`
    
    ## Testing
    
    Tests: generated app-server schema verification, protocol tests,
    backend-client tests, app-server integration coverage, TUI snapshot
    coverage, formatting, and workspace lint cleanup.
  • [codex-rs] auto-review model override (#23767)
    ## Why
    
    Guardian auto-review normally uses the provider-preferred review model
    when one is available. Some parent models need model-catalog metadata to
    select a different review model while keeping older `/models` payloads
    compatible when that metadata is absent.
    
    ## What changed
    
    - Added optional `ModelInfo::auto_review_model_override` metadata to the
    public model payload as a review-model slug.
    - Updated Guardian review model selection to prefer the catalog override
    when present, while preserving the existing provider preferred-model
    path and parent-model fallback when it is omitted.
    - Added focused Guardian coverage for override and no-override model
    selection.
    - Added an `auto_review` core integration suite test that loads override
    metadata from a remote model catalog path and asserts the strict
    auto-review `/responses` request uses the catalog-selected review model.
    - Updated existing `ModelInfo` fixtures and local catalog constructors
    for the new optional field.
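    
    The selection order can be summarized in a short sketch. The
    `select_review_model` function is hypothetical and takes plain slugs;
    the real selection reads from `ModelInfo` and provider configuration.
    
    ```rust
    /// Picks the review model: catalog override first, then the provider's
    /// preferred review model, then the parent model itself.
    pub fn select_review_model<'a>(
        catalog_override: Option<&'a str>,
        provider_preferred: Option<&'a str>,
        parent_model: &'a str,
    ) -> &'a str {
        catalog_override
            .or(provider_preferred)
            .unwrap_or(parent_model)
    }
    ```
    
    When the override is absent, the function reduces to the earlier
    two-step behavior, which is what keeps older `/models` payloads
    compatible.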
    
    ## Validation
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `cargo test -p codex-core guardian_review_uses_`
    - `cargo test -p codex-core
    remote_model_override_uses_catalog_model_for_strict_auto_review --test
    all`
    - `just fix -p codex-protocol`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • store and expose parent_thread_id on Threads (#25113)
    ## Why
    
    The review discussion at
    https://github.com/openai/codex/pull/24161#discussion_r3325692763
    revealed a subagent data modeling issue: we overloaded
    `forked_from_id` to also mean `parent_thread_id`. That's incorrect,
    since guardian and review subagents can be subagents without forking
    the main thread's history.
    
    The solution here is to explicitly store a new `parent_thread_id` on
    `SessionMeta`, alongside `forked_from_id` which already exists. While
    we're at it, also expose it in the app-server protocol on the `Thread`
    object.
    
    A thread->subagent relationship and a fork of thread history are
    orthogonal concepts.
    
    ## What Changed
    
    - Added top-level `parent_thread_id` persistence on `SessionMeta` and
    runtime/session plumbing through `SessionConfiguredEvent`,
    `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
    `TurnContext`, and `ModelClient`.
    - Made turn metadata, request headers, analytics, and subagent-start
    events read the separate runtime/top-level parent field instead of
    deriving general parent lineage from `SessionSource` or
    `forked_from_thread_id`.
    - Passed parent lineage separately at delegated subagent, review,
    guardian, agent-job, and multi-agent spawn construction sites;
    copied-history fork lineage remains derived only from `InitialHistory`.
    - Persisted and exposed parent lineage through rollout/thread-store
    projections and app-server v2 `Thread.parentThreadId`.
    - Updated app-server README text and regenerated app-server schema
    fixtures for the additive `parentThreadId` response field.
  • Add cloud-managed config layer support (#24620)
    ## Summary
    
    PR 3 of 5 in the cloud-managed config client stack.
    
    Adds enterprise-managed cloud config as a first-class config layer
    source. The layer metadata is preserved through config loading,
    diagnostics, debug output, hook attribution, and app-server protocol
    surfaces.
    
    ## Details
    
    - Enterprise-managed config becomes a normal config layer source with
    backend-supplied `id` and display `name` attached for provenance.
    - These layers are designed to behave like non-file managed config: they
    can surface syntax/type diagnostics by layer name even though there is
    no physical config file.
    - Relative path settings are resolved from a stored config base so
    cloud-delivered config remains consistent with existing MDM-delivered
    config semantics.
    - Hook attribution distinguishes config-delivered hooks from
    requirements-delivered hooks via `HookSource::CloudManagedConfig`.
    - This remains pull-based and snapshot-oriented; the PR adds layer
    identity/diagnostics, not dynamic reload behavior.
    
    ## Validation
    
    Validated through the targeted stack checks after rebasing onto current
    `main`:
    
    - Rust crate tests for
    config/hooks/cloud-config/backend-client/app-server-protocol
    - Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
    tests
    - Python generated-file contract test
    - `cargo shear --deny-warnings`
    - Targeted `argument-comment-lint` for config/hooks
  • Add subagent lineage metadata for responsesapi (#24161)
    ## Why
    
    We recently added `forked_from_thread_id` which lets us trace where a
    thread's _context_ comes from, but we also want to understand subagent
    lineage (e.g. which parent thread spawned this subagent? what kind of
    subagent is it?) which is orthogonal.
    
    This PR adds `parent_thread_id` and `subagent_kind` to the
    `x-codex-turn-metadata` header sent to the Responses API.
    
    ## What changed
    
    - Adds `parent_thread_id` and `subagent_kind` to core-owned
    `x-codex-turn-metadata`.
    - Restores persisted `SessionSource` and `ThreadSource` from resumed
    session metadata so cold-resumed subagent threads keep their lineage on
    later Responses API requests.
    - Centralizes parent-thread extraction on `SessionSource` /
    `SubAgentSource` and reuses it in the Responses client, analytics, agent
    control, and state parsing paths.
    - Extends reserved-key, git-enrichment, thread-spawn, and app-server v2
    metadata coverage for the new lineage fields.
    
    ## Verification
    
    - Not run locally per request.
    - Added focused coverage in `core/src/turn_metadata_tests.rs` and
    `app-server/tests/suite/v2/client_metadata.rs`.
  • [codex] Add model tool mode selector (#25031)
    ## Why
    Some models need to select their code-execution behavior through model
    catalog metadata. Models without that metadata must continue to follow
    the existing `CodeMode` and `CodeModeOnly` feature flags, including when
    a newer server sends an enum value this client does not recognize.
    
    ## What changed
    - add optional `ModelInfo.tool_mode` metadata with `direct`,
    `code_mode`, and `code_mode_only`
    - treat omitted and unknown wire values as `None`
    - resolve `None` from the existing feature flags
    - carry the resolved `ToolMode` directly on `TurnContext`, outside
    `Config`
    - use the resolved value for turn creation, model switches, review
    turns, tool planning, and code execution
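    
    A minimal sketch of the parse-then-resolve flow. The function names are
    hypothetical, and the precedence between the two feature flags in the
    fallback branch is an assumption made for illustration.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ToolMode {
        Direct,
        CodeMode,
        CodeModeOnly,
    }
    
    /// Omitted and unrecognized wire values both become `None`.
    pub fn parse_tool_mode(wire: Option<&str>) -> Option<ToolMode> {
        match wire? {
            "direct" => Some(ToolMode::Direct),
            "code_mode" => Some(ToolMode::CodeMode),
            "code_mode_only" => Some(ToolMode::CodeModeOnly),
            _ => None,
        }
    }
    
    /// Explicit metadata wins; otherwise fall back to the feature flags.
    /// The flag precedence below is an assumption for illustration.
    pub fn resolve_tool_mode(
        metadata: Option<ToolMode>,
        code_mode: bool,
        code_mode_only: bool,
    ) -> ToolMode {
        metadata.unwrap_or(if code_mode_only {
            ToolMode::CodeModeOnly
        } else if code_mode {
            ToolMode::CodeMode
        } else {
            ToolMode::Direct
        })
    }
    ```
    
    Mapping unknown values to `None` rather than an error lets an older
    client keep following its flags when a newer server sends a mode it
    does not recognize.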
    
    ## Coverage
    - add protocol coverage for omitted, known, and unknown enum values
    - add focused coverage for flag fallback and explicit metadata
    overriding feature flags
    - add core integration coverage that fetches remote model metadata
    through `/v1/models` and verifies the outbound `/responses` tools for
    explicit `direct` and `code_mode_only` selectors
    
    ## Stack
    - followed by #25032
  • Surface filesystem permission profiles in prompt context (#23924)
    ## Summary
    Some permission profiles can encode filesystem reads that should remain
    unavailable to the agent. Before this change, the model-visible context
    and automatic approval review prompt summarized the effective
    permissions as a legacy sandbox mode, which can omit permission-profile
    filesystem entries from escalation decisions.
    
    For example, a profile can grant workspace access while denying a
    private subtree across every workspace root:
    
    ```toml
    default_permissions = "restricted-workspace"
    
    [permissions.restricted-workspace.workspace_roots]
    "/Users/alice/project" = true
    "/Users/alice/other-project" = true
    
    [permissions.restricted-workspace.filesystem]
    ":minimal" = "read"
    
    [permissions.restricted-workspace.filesystem.":workspace_roots"]
    "." = "write"
    "private" = "deny"
    "private/**" = "deny"
    ```
    
    The context window now describes the workspace roots and effective
    filesystem side of the `PermissionProfile` directly, with deny entries
    marked as non-escalatable:
    
    ```xml
    <environment_context>
      <cwd>/Users/alice/project</cwd>
      <shell>zsh</shell>
      <filesystem><workspace_roots><root>/Users/alice/project</root><root>/Users/alice/other-project</root></workspace_roots><permission_profile type="managed"><file_system type="restricted"><entry access="read"><special>:minimal</special></entry><entry access="write"><path>/Users/alice/project</path></entry><entry access="write"><path>/Users/alice/other-project</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/project/private</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/other-project/private</path></entry><entry access="deny" escalatable="false"><glob>/Users/alice/project/private/**</glob></entry><entry access="deny" escalatable="false"><glob>/Users/alice/other-project/private/**</glob></entry></file_system></permission_profile></filesystem>
    </environment_context>
    ```
    
    Managed requirements can impose the same kind of deny-read restriction:
    
    ```toml
    [permissions.filesystem]
    deny_read = [
      "/Users/alice/project/private",
      "/Users/alice/project/private/**",
    ]
    ```
    
    The automatic approval review prompt also receives the parent turn's
    denied-read context, so review decisions can account for the active
    permission profile.
    
    ## What Changed
    - Render the effective filesystem profile in `<environment_context>`,
    including profile type, filesystem entries, workspace roots, and
    non-escalatable deny entries.
    - Persist effective `workspace_roots` in `TurnContextItem` so
    resumed/replayed context does not have to bind `:workspace_roots`
    through legacy `cwd` fallback.
    - Add explicit permission instructions that denied reads are policy
    restrictions, not escalation targets.
    - Pass the parent turn's denied-read context into automatic approval
    reviews.
    - Add targeted coverage for prompt rendering, workspace-root
    materialization, replay context, and review prompt context.
    - Keep the prompt-context test expectations platform-aware so the same
    filesystem rendering assertions pass on Unix and Windows paths.
    
    ## Testing
    - `just test -p codex-core
    context::environment_context::tests::serialize_environment_context_with_full_filesystem_profile`
    - `just test -p codex-core
    context::environment_context::tests::turn_context_item_filesystem_uses_workspace_roots_instead_of_cwd`
    - `just test -p codex-core
    context::permissions_instructions::permissions_instructions_tests::builds_permissions_from_profile_with_denied_reads`
    - `just fix -p codex-core`
    
    I also attempted `just test -p codex-core`; the changed prompt-context
    tests passed, but the full local run did not complete cleanly in this
    sandboxed macOS environment due to unrelated user-shell `CODEX_SANDBOX*`
    expectations and integration-test timeouts.
  • [codex] Add user input client ids (#24653)
    ## Summary
    
    Adds an optional `clientId` field to app-server v2 `UserInput` and
    carries it through the core `UserInput` model so clients can correlate
    echoed user input items without relying on payload equality.
    
    ## Details
    
    - Adds `client_id: Option<String>` to core `UserInput` variants.
    - Exposes the v2 app-server field as `clientId` on the wire and in
    generated TypeScript.
    - Preserves the id when converting between app-server v2 and core
    protocol types.
    - Regenerates app-server schema fixtures.
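    
    As a rough illustration of why an id helps, the sketch below matches an
    echoed input back to a local draft by `client_id` rather than by payload
    text. `PendingInputs`, `track`, and `resolve_echo` are hypothetical names
    for this example, and the simplified `UserInput` struct is not the actual
    core type.
    
    ```rust
    use std::collections::HashMap;
    
    /// Simplified stand-in for a user input item (assumption, not the real type).
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct UserInput {
        text: String,
        client_id: Option<String>,
    }
    
    /// Hypothetical client-side tracker: client id -> local draft reference.
    #[derive(Debug, Default)]
    struct PendingInputs {
        by_client_id: HashMap<String, String>,
    }
    
    impl PendingInputs {
        /// Remember which local draft was sent under which client id.
        fn track(&mut self, client_id: &str, local_ref: &str) {
            self.by_client_id
                .insert(client_id.to_string(), local_ref.to_string());
        }
    
        /// Match an echoed item to its draft by id; the text is never compared.
        fn resolve_echo(&mut self, echoed: &UserInput) -> Option<String> {
            let id = echoed.client_id.as_deref()?;
            self.by_client_id.remove(id)
        }
    }
    ```
    
    Two drafts with identical text remain distinguishable because each echo
    carries the id it was submitted with; an echo without an id resolves to
    nothing.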
    
    ## Validation
    
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-protocol`
    - `git diff --check`
  • Expose MCP server info as part of server status (#24698)
    # Summary
    
    Expose MCP server info via App Server (when available) so apps can
    render a richer MCP experience.
  • Restore legacy image detail values (#24644)
    ## Why
    
    Older persisted rollouts can contain `input_image.detail` values of
    `auto` or `low` from before `ImageDetail` was narrowed to
    `high`/`original`. Current deserialization rejects those values, which
    can make resume skip later compacted checkpoints and reconstruct an
    oversized raw suffix before the next compaction attempt.
    
    Confirmed Sentry reports fixed by this compatibility path:
    
    - [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
    - [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
    - [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
    - [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)
    
    ## Background
    
    [openai/codex#20693](https://github.com/openai/codex/pull/20693) added
    image-detail plumbing for app-server `UserInput` so input images could
    explicitly request `detail: original`. The Slack discussion behind that
    PR was about ScreenSpot / bridge evals where user input images were
    resized, while tool output images already had MCP/code-mode ways to
    request image detail.
    
    In review, the intended new API surface was narrowed to `high` and
    `original`: default to `high`, allow `original` when callers need
    unchanged image handling, and avoid encouraging new `auto` or `low`
    usage. That policy still makes sense for newly emitted values.
    
    The missing compatibility piece is persisted history. Older rollouts can
    already contain `auto` and `low`, and resume reconstructs typed history
    by deserializing those rollout records. Rejecting old values at that
    boundary causes valid compacted checkpoints to be skipped. This PR
    restores `auto` and `low` as real variants so old records deserialize
    and round-trip without being rewritten as `high`, while product paths
    can continue to default to `high` and avoid emitting `auto` for new
    behavior.
    
    ## What changed
    
    - Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
    protocol values.
    - Preserved `auto`/`low` through rollout deserialization, MCP image
    metadata, code-mode image output, and schema/type generation.
    - Kept local image byte handling conservative: only `original` switches
    to original-resolution loading; `auto`/`low`/`high` continue through the
    resize-to-fit path while retaining their detail value.
    - Added regression coverage for enum round-tripping and code-mode `low`
    detail handling.
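    
    The compatibility behavior described above can be sketched with a
    std-only enum. This is a simplified stand-in: the real type presumably
    uses derive-based serialization, and `parse`, `as_str`, and
    `loads_original_resolution` are hypothetical helper names.
    
    ```rust
    /// All four detail values are real variants, so old records round-trip.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ImageDetail {
        Auto,
        Low,
        High,
        Original,
    }
    
    impl ImageDetail {
        /// Parse a persisted wire value; unknown strings are rejected.
        fn parse(raw: &str) -> Option<Self> {
            match raw {
                "auto" => Some(Self::Auto),
                "low" => Some(Self::Low),
                "high" => Some(Self::High),
                "original" => Some(Self::Original),
                _ => None,
            }
        }
    
        /// Emit the same wire value that was parsed (no rewrite to `high`).
        fn as_str(self) -> &'static str {
            match self {
                Self::Auto => "auto",
                Self::Low => "low",
                Self::High => "high",
                Self::Original => "original",
            }
        }
    
        /// Only `original` switches to original-resolution loading; every
        /// other value stays on the resize-to-fit path.
        fn loads_original_resolution(self) -> bool {
            matches!(self, Self::Original)
        }
    }
    ```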
    
    ## Testing
    
    - `just write-app-server-schema`
    - `just test -p codex-protocol`
    - `just test -p codex-tools`
    - `just test -p codex-code-mode`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-core
    suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
    - `just test -p codex-core
    suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
    - Loaded broken rollouts on local fixed builds, and started/completed
    new turns.
    
    I also attempted `just test -p codex-core`; the local broad run did not
    finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
    out. The failures were broad timeout/deadline failures across unrelated
    areas; targeted changed-path core tests above passed.
  • [codex] remove plain image wrapper spans (#24652)
    ## Why
    
    Remote image submissions currently wrap native `input_image` spans in
    literal `<image>` and `</image>` text spans. Those extra prompt tokens
    add structure without providing label or routing information.
    
    ## What Changed
    
    - Serialize `UserInput::Image` directly as an `input_image` content
    span.
    - Preserve named local-image framing and legacy wrapper parsing for
    labeled attachments and existing histories.
    - Update existing request-shape expectations for drag-and-drop images,
    model switching, and compaction.
    
    ## Validation
    
    - `just test -p codex-protocol`
    - Focused `codex-core` run covering
    `drag_drop_image_persists_rollout_request_shape`,
    `model_change_from_image_to_text_strips_prior_image_content`, and
    `snapshot_request_shape_pre_turn_compaction_including_incoming_user_message`
    
    ## Notes
    
    - A broader `just test -p codex-core` run was attempted; the affected
    tests passed, while the overall run failed in unrelated CLI, MCP, and
    tooling tests plus a `thread_manager` timeout.
  • Add experimental turn additional context (#24154)
    ## Summary
    
    Adds experimental `additionalContext` support to `turn/start` and
    `turn/steer` so clients can provide ephemeral external context, such as
    browser or automation state, without turning that plumbing into a
    visible user prompt or triggering user-prompt lifecycle behavior.
    
    ## API Shape
    
    The parameter shape is:
    
    ```ts
    additionalContext?: Record<string, {
      value: string
      kind: "untrusted" | "application"
    }> | null
    ```
    
    Example:
    
    ```json
    {
      "additionalContext": {
        "browser_info": {
          "value": "Active tab is CI failures.",
          "kind": "untrusted"
        },
        "automation_info": {
          "value": "CI rerun is in progress.",
          "kind": "application"
        }
      }
    }
    ```
    
    The keys are opaque and caller-defined.
    
    ## Context Injection
    
    When provided, accepted entries are inserted into model context as
    hidden contextual message items, not as visible thread user-message
    items.
    
    `kind: "untrusted"` entries are inserted with role `user`:
    
    ```text
    <external_${key}>${value}</external_${key}>
    ```
    
    `kind: "application"` entries are inserted with role `developer`:
    
    ```text
    <${key}>${value}</${key}>
    ```
    
    Values are not escaped. Each value is truncated to approximately 1k
    tokens before wrapping.
    
    For `turn/start`, accepted additional context is inserted before normal
    user input. For `turn/steer`, additional context is merged only when the
    steer includes non-empty user input; context-only steers are still
    rejected as empty input.
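    
    The role routing and wrapping rules in this section can be sketched as a
    small function. `ContextKind` and `render_entry` are hypothetical names
    for illustration, and truncation is deliberately left out.
    
    ```rust
    /// Mirrors the `kind` field of an additional-context entry.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ContextKind {
        Untrusted,
        Application,
    }
    
    /// Returns the (role, wrapped text) pair for one entry.
    /// The value is inserted as-is because values are not escaped.
    fn render_entry(key: &str, value: &str, kind: ContextKind) -> (&'static str, String) {
        match kind {
            // Untrusted context goes in with role `user` and an `external_` prefix.
            ContextKind::Untrusted => (
                "user",
                format!("<external_{key}>{value}</external_{key}>"),
            ),
            // Application context goes in with role `developer`, unprefixed.
            ContextKind::Application => ("developer", format!("<{key}>{value}</{key}>")),
        }
    }
    ```
    
    With the example payload above, `browser_info` would render as a `user`
    item wrapped in `<external_browser_info>` and `automation_info` as a
    `developer` item wrapped in `<automation_info>`.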
    
    ## Dedupe Strategy
    
    `AdditionalContextStore` lives on session state and stores the latest
    complete additional-context map.
    
    Each `turn/start` or non-empty `turn/steer` treats its
    `additionalContext` as the current complete set of values. Entries are
    injected only when the key is new or the exact entry for that key
    changed, including `value` or `kind`. After merging, the store is
    replaced with the provided map, so omitted keys are removed from the
    retained set and can be injected again later if reintroduced.
    
    Omitting `additionalContext`, passing `null`, or passing an empty object
    resets the store to empty and injects nothing.
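    
    A minimal sketch of the replace-and-diff behavior described above,
    assuming a plain map keyed by context name. `Store`, `Entry`, and `merge`
    are illustrative names, not the actual `AdditionalContextStore` API.
    
    ```rust
    use std::collections::BTreeMap;
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct Entry {
        value: String,
        kind: String, // "untrusted" or "application"
    }
    
    /// Holds the latest complete additional-context map.
    #[derive(Debug, Default)]
    struct Store {
        latest: BTreeMap<String, Entry>,
    }
    
    impl Store {
        /// Returns the keys to inject, then replaces the retained map.
        /// `None` (omitted or null) and an empty map both reset the store.
        fn merge(&mut self, incoming: Option<BTreeMap<String, Entry>>) -> Vec<String> {
            let incoming = incoming.unwrap_or_default();
            // Inject only keys that are new or whose value/kind changed.
            let to_inject: Vec<String> = incoming
                .iter()
                .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
                .map(|(key, _)| key.clone())
                .collect();
            // Replacing (not unioning) drops omitted keys from the retained set.
            self.latest = incoming;
            to_inject
        }
    }
    ```
    
    Because the store is replaced rather than merged, a key that disappears
    for one turn and later returns with the same value is injected again.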
    
    ## What Changed
    
    - Threads experimental v2 `additionalContext` through app-server into
    core turn start and steer handling.
    - Adds separate contextual fragment types for untrusted user-role
    context and application developer-role context.
    - Uses pending response input items so additional context can be
    combined with normal user input without treating it as prompt text.
    - Adds integration coverage for start/steer flow, role routing,
    dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
    behavior, empty context-only steer rejection, external-fragment marker
    matching, and truncation.
  • standalone websearch extension (#23823)
    ## Summary
    
    Add the extension-backed standalone `web.run` tool so Codex can call the
    standalone search endpoint through the `codex-api` search client and
    return its encrypted output to Responses.
    
    - gate the new tool behind `standalone_web_search`
    - install the extension in the app-server thread registry and hide
    hosted `web_search` when standalone search is enabled for OpenAI
    providers so the two paths stay mutually exclusive
    - build search context from persisted history using a small tail
    heuristic: previous user message, assistant text between the last two
    user turns capped at about 1k tokens, and current user message
    
    ## Test Plan
    
    - `cargo test -p codex-web-search-extension`
    - `cargo test -p codex-api`
    - `cargo test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
  • Display workspace usage limit error copy from response header (#24114)
    ## Why
    
    `openai/openai#947613` adds `X-Codex-Rate-Limit-Reached-Type` for Codex
    workspace credit-depletion and spend-cap responses. The CLI currently
    reads the adjacent promo header but otherwise renders generic
    usage-limit copy, so those responses do not explain the
    workspace-specific action the user needs to take.
    
    Backend dependency: https://github.com/openai/openai/pull/947613
    
    ## What Changed
    
    - Parse `X-Codex-Rate-Limit-Reached-Type` in the usage-limit error
    handling path alongside `x-codex-promo-message`.
    - Keep the header value parsing with the shared `RateLimitReachedType`
    enum.
    - Carry the parsed type on `UsageLimitReachedError` and render
    client-owned copy for the four workspace owner/member credit and
    spend-cap values.
    - Preserve existing promo and plan-based text for absent, generic, or
    unknown header values.
    - Keep the existing TUI workspace-owner nudge state path unchanged; the
    response header only selects the displayed error string.
    - Add focused display coverage for all specific type values and the
    generic fallback case.
    
    ## Test Plan
    
    - Added `usage_limit_reached_error_formats_rate_limit_reached_types`
    coverage.
    - Not run manually, per request; CI runs validation on the pushed
    commit.
  • Add trace_id to TurnStartedEvent (#23980)
    ## Why
    [Recent PR](https://github.com/openai/codex/pull/22709) removed
    `trace_id` from `TurnContextItem`.
    
    ## What changed
    - Add `trace_id` to `TurnStartedEvent` so rollout consumers can correlate
    turns with telemetry traces.
    - Note that the branch name is out of date because I originally re-added
    it to `TurnContextItem`, but we decided to move it to `TurnStartedEvent`.
    
    ## Verification
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core --lib
    regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
    - `cargo test -p codex-core --test all
    emits_warning_when_resumed_model_differs`
    - `cargo test -p codex-rollout`
    - `cargo test -p codex-state`
  • cli: rename profile v2 flag to --profile (#23883)
    ## Why
    
    Profile v2 is taking over the user-facing profile selection path, so the
    CLI no longer needs to expose the transitional `--profile-v2` spelling.
    This switches the public args surface to `--profile` before the
    remaining legacy profile plumbing is removed separately.
    
    ## What
    
    - Rebind `--profile` and `-p` to the v2 profile name argument that
    selects `$CODEX_HOME/<name>.config.toml`.
    - Stop parsing the legacy shared CLI profile argument while keeping its
    implementation path in place for follow-up cleanup.
    - Update CLI validation, profile-name parse errors, and the
    legacy-profile collision message/tests to refer to `--profile`.
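    
    For orientation, the name-to-path mapping can be sketched as follows.
    `profile_config_path` is a hypothetical helper, and the name validation
    shown here is an assumption; the actual parse rules are not described in
    this PR.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Maps a profile name to `<codex_home>/<name>.config.toml`.
    /// Validation is assumed: non-empty, ASCII alphanumerics, `-`, and `_` only.
    fn profile_config_path(codex_home: &Path, name: &str) -> Option<PathBuf> {
        let valid = !name.is_empty()
            && name
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
        // Rejecting separators and dots keeps the result inside `codex_home`.
        valid.then(|| codex_home.join(format!("{name}.config.toml")))
    }
    ```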
    
    ## Testing
    
    - `cargo test -p codex-cli -p codex-config -p codex-protocol -p
    codex-utils-cli`
  • [codex] Add plugin id to MCP tool call items (#23737)
    Add owning plugin id to MCP tool call items so we can better filter them
    at plugin level.
    
    ## Summary
    - add optional `plugin_id` to MCP tool-call items and legacy begin/end
    events
    - propagate plugin metadata into emitted core items and app-server v2
    `ThreadItem::McpToolCall`
    - preserve plugin ids through app-server replay/redaction paths and
    regenerate v2 schema fixtures
    
    ## Testing
    - `just write-app-server-schema`
    - `just fmt`
    - `just fix -p codex-core`
    - `cargo test -p codex-protocol -p codex-app-server-protocol`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib`
    - `cargo check -p codex-tui --tests`
    - `cargo check -p codex-app-server --tests`
    - `git diff --check`
    
    ## Notes
    - `just fix -p codex-core` completed with two non-fatal
    `too_many_arguments` warnings on the touched MCP notification helpers.
    - A broader `cargo test -p codex-core` run passed core unit tests, then
    hit shell/sandbox/snapshot failures in the integration target.
    - A broader app-server downstream run hit the existing
    `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack
    overflow; `cargo test -p codex-exec` also hit the existing sandbox
    expectation mismatch in
    `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
  • Honor client-resolved service tier defaults (#23537)
    ## Why
    
    Model catalog responses can now advertise a nullable
    `default_service_tier` for each model. Codex needs to preserve three
    distinct states all the way from config/app-server inputs to inference:
    
    - no explicit service tier, so the client may apply the current model
    catalog default when FastMode is enabled
    - explicit `default`, meaning the user intentionally wants standard
    routing
    - explicit catalog tier ids such as `priority`, `flex`, or future tiers
    
    Keeping those states distinct prevents the UI from showing one tier
    while core sends another, especially after model switches or app-server
    `thread/start` / `turn/start` updates.
    
    ## What Changed
    
    - Plumbed `default_service_tier` through model catalog protocol types,
    app-server model responses, generated schemas, model cache fixtures, and
    provider/model-manager conversions.
    - Added the request-only `default` service tier sentinel and normalized
    legacy config spelling so `fast` in `config.toml` still materializes as
    the runtime/request id `priority`.
    - Moved catalog default resolution to the TUI/client side, including
    recomputing the effective service tier when model/FastMode-dependent
    surfaces change.
    - Updated app-server thread lifecycle config construction so
    `serviceTier: null` preserves explicit standard-routing intent by
    mapping to `default` instead of internal `None`.
    - Kept core responsible for validating explicit tiers against the
    current model and stripping `default` before `/v1/responses`, without
    applying catalog defaults itself.
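    
    The three states and the split between client-side resolution and core
    request shaping can be sketched as below. All names here
    (`ServiceTierChoice`, `from_config`, `resolve_on_client`,
    `tier_for_request`) are illustrative assumptions, and validation of
    explicit tiers against the current model is omitted.
    
    ```rust
    /// The three states that must stay distinct end to end.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum ServiceTierChoice {
        /// No explicit tier; the client may apply the catalog default.
        Unset,
        /// Explicit `default`: the user wants standard routing.
        ExplicitDefault,
        /// Explicit catalog tier id such as `priority` or `flex`.
        Tier(String),
    }
    
    /// Normalizes config spelling; legacy `fast` becomes `priority`.
    fn from_config(raw: Option<&str>) -> ServiceTierChoice {
        match raw {
            None => ServiceTierChoice::Unset,
            Some("default") => ServiceTierChoice::ExplicitDefault,
            Some("fast") => ServiceTierChoice::Tier("priority".to_string()),
            Some(other) => ServiceTierChoice::Tier(other.to_string()),
        }
    }
    
    /// Client side: apply the catalog default only when nothing is explicit
    /// and FastMode is enabled.
    fn resolve_on_client(
        choice: &ServiceTierChoice,
        fast_mode: bool,
        catalog_default: Option<&str>,
    ) -> ServiceTierChoice {
        match (choice, catalog_default) {
            (ServiceTierChoice::Unset, Some(tier)) if fast_mode => {
                ServiceTierChoice::Tier(tier.to_string())
            }
            _ => choice.clone(),
        }
    }
    
    /// Core side: `default` is a request-only sentinel and is stripped.
    fn tier_for_request(choice: &ServiceTierChoice) -> Option<&str> {
        match choice {
            ServiceTierChoice::Tier(id) => Some(id.as_str()),
            ServiceTierChoice::Unset | ServiceTierChoice::ExplicitDefault => None,
        }
    }
    ```
    
    Keeping `ExplicitDefault` separate from `Unset` is what stops a catalog
    default from overriding a user who deliberately chose standard routing.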
    
    ## Validation
    
    - `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
    - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
    - `cargo test -p codex-tui service_tier`
    - `cargo test -p codex-protocol service_tier_for_request`
    - `cargo test -p codex-core get_service_tier`
    - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
    service_tier`
  • Add SubagentStop hook (#22873)
    # What
    
    <img width="1792" height="1024" alt="image"
    src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e"
    />
    
    `SubagentStop` runs when a thread-spawned subagent turn is about to
    finish. Thread-spawned subagents use `SubagentStop` instead of the
    normal root-agent `Stop` hook.
    
    Configured handlers match on `agent_type`. Hook input includes the
    normal stop fields plus:
    
    - `agent_id`: the child thread id.
    - `agent_type`: the resolved subagent type.
    - `agent_transcript_path`: the child subagent transcript path.
    - `transcript_path`: the parent thread transcript path.
    - `last_assistant_message`: the final assistant message from the child
    turn, when available.
    - `stop_hook_active`: `true` when the child is already continuing
    because an earlier stop-like hook blocked completion.
    
    `SubagentStop` shares the same completion-control semantics as `Stop`,
    scoped to the child turn:
    
    - No decision allows the child turn to finish.
    - `decision: "block"` with a non-empty `reason` records that reason as
    hook feedback and continues the child with that prompt.
    - `continue: false` stops the child turn. If `stopReason` is present,
    Codex surfaces it as the stop reason.
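    
    The three outcomes listed above can be sketched as a small decision
    function. `HookOutput`, `Outcome`, and `evaluate` are hypothetical names;
    checking `continue: false` before `decision`, and treating a `block` with
    an empty reason as no decision, are assumptions of this sketch rather
    than documented behavior.
    
    ```rust
    /// Simplified hook output (`continue` is a Rust keyword, hence the rename).
    #[derive(Debug, Default)]
    struct HookOutput {
        decision: Option<String>,
        reason: Option<String>,
        continue_turn: Option<bool>,
        stop_reason: Option<String>,
    }
    
    #[derive(Debug, PartialEq, Eq)]
    enum Outcome {
        /// No decision: the child turn finishes.
        Finish,
        /// `block` with a reason: continue the child with that prompt.
        ContinueWithFeedback(String),
        /// `continue: false`: stop the child turn, with an optional reason.
        Stop(Option<String>),
    }
    
    fn evaluate(output: &HookOutput) -> Outcome {
        // Assumed precedence: an explicit stop wins over a block decision.
        if output.continue_turn == Some(false) {
            return Outcome::Stop(output.stop_reason.clone());
        }
        match (output.decision.as_deref(), output.reason.as_deref()) {
            (Some("block"), Some(reason)) if !reason.trim().is_empty() => {
                Outcome::ContinueWithFeedback(reason.to_string())
            }
            _ => Outcome::Finish,
        }
    }
    ```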
    
    # Lifecycle Scope
    
    Only thread-spawned subagents run `SubagentStop`.
    
    Internal/system subagents such as Review, Compact, MemoryConsolidation,
    and Other do not run normal `Stop` hooks and do not run `SubagentStop`.
    This avoids exposing synthetic matcher labels for internal
    implementation paths.
    
    # Stack
    
    1. #22782: add `SubagentStart`.
    2. This PR: add `SubagentStop`.
    3. #22882: add subagent identity to normal hook inputs.
  • feat(permissions): resolve permission profile inheritance (#22270)
    ## Stack
    
    This is the foundation PR for the permission-profile inheritance stack.
    
    - This PR adds config-level `extends` resolution and merge semantics.
    - Follow-up: #23705 applies resolved profiles at runtime and updates the
    active-profile protocol surfaces.
    
    ## Why
    
    Permission profiles are starting to carry enough policy that
    copy-pasting near-identical definitions becomes hard to review and easy
    to drift. Before the runtime can consume inherited profiles, the config
    layer needs one explicit resolver that can merge parent chains and
    reject unsafe or invalid inheritance shapes.
    
    ## What changed
    
    - Add `extends` to permission-profile TOML and resolve parent chains in
    inheritance order.
    - Merge inherited profile TOML with the existing config merge behavior
    while preserving the permission-specific normalization needed for
    network domain keys.
    - Keep parent descriptions out of resolved child profiles and record
    inherited profile names separately for downstream consumers.
    - Reject undefined parents, unsupported built-in parents, and
    inheritance cycles with targeted errors.
    - Cover resolver behavior with TOML fixture tests and refresh the
    generated config schema.
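    
    As a rough model of the resolver, the sketch below walks an `extends`
    chain, rejects undefined parents and cycles, and merges settings with the
    child overriding its ancestors. The types are simplified stand-ins with
    hypothetical names, the override direction is an assumption, and the
    built-in-parent check and network-domain normalization are omitted.
    
    ```rust
    use std::collections::BTreeMap;
    
    #[derive(Debug, Clone, Default, PartialEq, Eq)]
    struct Profile {
        extends: Option<String>,
        description: Option<String>,
        settings: BTreeMap<String, String>,
    }
    
    #[derive(Debug, PartialEq, Eq)]
    enum ResolveError {
        UnknownProfile(String),
        UndefinedParent(String),
        Cycle(String),
    }
    
    #[derive(Debug, PartialEq, Eq)]
    struct Resolved {
        description: Option<String>, // the child's own description only
        inherited: Vec<String>,      // parent names, nearest first
        settings: BTreeMap<String, String>,
    }
    
    fn resolve(profiles: &BTreeMap<String, Profile>, name: &str) -> Result<Resolved, ResolveError> {
        let root = profiles
            .get(name)
            .ok_or_else(|| ResolveError::UnknownProfile(name.to_string()))?;
        // Walk from the child up through its ancestors.
        let mut chain: Vec<(&str, &Profile)> = vec![(name, root)];
        let mut current = root;
        while let Some(parent_name) = current.extends.as_deref() {
            if chain.iter().any(|(seen, _)| *seen == parent_name) {
                return Err(ResolveError::Cycle(parent_name.to_string()));
            }
            let parent = profiles
                .get(parent_name)
                .ok_or_else(|| ResolveError::UndefinedParent(parent_name.to_string()))?;
            chain.push((parent_name, parent));
            current = parent;
        }
        // Merge the oldest ancestor first so later (closer) profiles override it.
        let mut settings = BTreeMap::new();
        for (_, profile) in chain.iter().rev() {
            settings.extend(profile.settings.clone());
        }
        Ok(Resolved {
            description: root.description.clone(),
            inherited: chain.iter().skip(1).map(|(n, _)| n.to_string()).collect(),
            settings,
        })
    }
    ```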
    
    ## Validation
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core permissions_profiles_`
  • Add timeout for remote compaction requests (#23451)
    ## Why
    
    Remote compaction currently sends a unary `POST /responses/compact` and
    waits for the full response before replacing history or emitting the
    completed `ContextCompaction` item. Unlike normal `/responses` streaming
    requests, this unary compact request had no timeout boundary. If the
    backend accepts the request and then stalls before returning a body, the
    existing request retry policy never sees a transport error, so the
    compact turn can remain stuck after the started item with no completion
    or actionable error.
    
    That matches the reported hang shape in issues such as #18363, where
    logs show `responses/compact` was posted but no corresponding compact
    completion followed. A bounded request timeout gives the existing retry
    policy a concrete timeout error to retry instead of letting the user sit
    indefinitely on automatic context compaction.
    
    ## What
    
    - Add a request timeout to legacy `/responses/compact` calls.
    - Size that timeout from the provider stream idle timeout with a
    conservative multiplier, so the default compact attempt gets 20 minutes
    rather than the 5-minute stream idle window.
    - Map API transport timeouts to a request timeout error instead of the
    child-process timeout message.
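    
    The sizing rule can be sketched as a one-line derivation. The constant
    name and the multiplier value of 4 are assumptions inferred from the
    5-minute and 20-minute figures above, and `compact_request_timeout` is a
    hypothetical helper.
    
    ```rust
    use std::time::Duration;
    
    /// Assumed multiplier: 4 turns a 5-minute idle window into 20 minutes.
    const COMPACT_TIMEOUT_MULTIPLIER: u32 = 4;
    
    /// Derives the unary compact request timeout from the stream idle timeout.
    fn compact_request_timeout(stream_idle_timeout: Duration) -> Duration {
        // Saturate instead of panicking on very large configured timeouts.
        stream_idle_timeout.saturating_mul(COMPACT_TIMEOUT_MULTIPLIER)
    }
    ```
    
    Deriving the value from the idle timeout keeps a provider that configures
    a longer stream window from being cut off early on compaction.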
    
    ## Testing
    
    - Not run (per request; CI will cover).