78 Commits

  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
    This will be stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N+ is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
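
    A minimal sketch of the fail-closed rule, for illustration only. `ThreadOp` and `check_op` are hypothetical names, and errors are plain strings here rather than the JSON-RPC errors app-server actually returns. Unknown values are rejected at parse time instead of defaulting to `legacy`, and paginated threads admit only the metadata-only paths listed above.

    ```rust
    /// History storage mode persisted with a thread (mirrors `ThreadHistoryMode`).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ThreadHistoryMode {
        Legacy,
        Paginated,
    }

    impl ThreadHistoryMode {
        /// Parse the persisted value. Unknown or future values are an error,
        /// never silently treated as `Legacy`.
        pub fn parse(raw: &str) -> Result<Self, String> {
            match raw {
                "legacy" => Ok(Self::Legacy),
                "paginated" => Ok(Self::Paginated),
                other => Err(format!("unsupported historyMode: {other}")),
            }
        }
    }

    /// Hypothetical classification of app-server operations on a stored thread.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ThreadOp {
        List,                         // thread/list
        Read { include_turns: bool }, // thread/read
        Resume,                       // thread/resume
    }

    /// Allow metadata-only paths for paginated threads and reject anything
    /// that would need the legacy full history.
    pub fn check_op(mode: ThreadHistoryMode, op: ThreadOp) -> Result<(), String> {
        match (mode, op) {
            (ThreadHistoryMode::Legacy, _) => Ok(()),
            (ThreadHistoryMode::Paginated, ThreadOp::List)
            | (ThreadHistoryMode::Paginated, ThreadOp::Read { include_turns: false }) => Ok(()),
            (ThreadHistoryMode::Paginated, op) => {
                Err(format!("{op:?} is not supported for paginated threads"))
            }
        }
    }
    ```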
  • feat: add provider-aware model fallback to thread start (#29942)
    ## Why
    
    Helper threads such as task title generation can request a model ID that
    is valid for the default OpenAI provider but unavailable from the active
    provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
    provider's static catalog exposes Bedrock model IDs such as
    `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
    404s and can surface a misleading turn error even when the main turn
    succeeds.
    
    Clients need an explicit way to ask app-server to resolve an unavailable
    helper model to the active provider default. That fallback must remain
    limited to providers with an authoritative static catalog so custom or
    dynamically discovered model IDs are not rewritten based on an
    incomplete catalog.
    
    Fixes #28741.
    
    ## What changed
    
    - Add the experimental `allowProviderModelFallback` option to
    `thread/start`, defaulting to `false` to preserve existing behavior.
    - Thread the option through thread creation and model selection.
    - When enabled for a static model manager, preserve requested models
    present in the catalog and replace unavailable models with the provider
    default.
    - Continue preserving explicit model IDs for dynamic model managers
    without fetching a catalog solely to validate them.
    - Document the new `thread/start` behavior in the app-server API
    overview.
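
    The resolution rule can be sketched as follows, assuming a hypothetical `ModelCatalog` type that separates an authoritative static catalog from a dynamically discovered one. The Bedrock IDs in the usage check come from the example above; treating `openai.gpt-5.5` as the provider default is an assumption based on the test output.

    ```rust
    /// How the active provider's models are known (hypothetical type).
    pub enum ModelCatalog<'a> {
        /// Authoritative static catalog plus the provider's default model.
        Static { models: &'a [&'a str], default_model: &'a str },
        /// Models discovered at runtime; not trusted for validation.
        Dynamic,
    }

    /// Resolve the model for `thread/start`.
    pub fn resolve_model(requested: &str, catalog: &ModelCatalog<'_>, allow_fallback: bool) -> String {
        match catalog {
            ModelCatalog::Static { models, default_model } if allow_fallback => {
                if models.iter().any(|m| *m == requested) {
                    // Requested model exists in the catalog: keep it.
                    requested.to_string()
                } else {
                    // Unavailable for this provider: use the provider default.
                    default_model.to_string()
                }
            }
            // Fallback disabled, or a dynamic catalog: never rewrite the requested ID.
            _ => requested.to_string(),
        }
    }
    ```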
    
    ## Test
    Temporary test-client harness:
    ```
    ThreadStartParams {
        model: Some("gpt-5.4-mini".to_string()),
        allow_provider_model_fallback: true,
        ..Default::default()
    }
    ```
    Command:
    ```
    CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
    CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
    ./target/debug/codex-app-server-test-client \
      --codex-bin ./target/debug/codex \
      -c 'model_provider="amazon-bedrock"' \
      send-message-v2 --experimental-api ignored
    ```
    Relevant output:
    ```
    > "method": "thread/start",
    > "params": {
    >   "model": "gpt-5.4-mini",
    >   "modelProvider": null,
    >   "allowProviderModelFallback": true,
    >   ...
    > }
    
    < "result": {
    <   "model": "openai.gpt-5.5",
    <   "modelProvider": "amazon-bedrock",
    <   ...
    < }
    ```
  • feat: use run agent task auth for inference (#19051)
    ## Stack
    
    This is PR 3 of the simplified HAI single-run-task stack:
    
    - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
    assertion and task-registration primitives, including the shared
    run-task helper used by existing Agent Identity JWT auth.
    - [#19049](https://github.com/openai/codex/pull/19049)
    Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
    Agent Identity runtime auth and its single run task.
    - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
    provider auth that uses one backend-owned task id for first-party
    inference and compaction requests.
    
    [#19054](https://github.com/openai/codex/pull/19054) collapsed out of
    the active stack because the simplified design no longer needs a
    separate background/control-plane task helper.
    
    ## Summary
    
    This PR moves Agent Identity usage into provider auth resolution. That
    keeps `AgentAssertion` auth tied to first-party OpenAI provider requests
    instead of applying a late session-wide override that could affect
    local, custom, Bedrock, API-key, or external-bearer providers.
    
    What changed:
    
    - adds a small `ProviderAuthScope` struct carrying the run auth policy
    and session source needed by provider-scoped auth resolution
    - lets `Session` opt the existing `ModelClient` into `ChatGptAuth`
    policy when `use_agent_identity` is enabled, without adding a second
    model-client constructor
    - resolves Agent Identity only for first-party OpenAI provider auth
    paths
    - uses the persisted run task id from the `AgentIdentityAuth` record to
    build `AgentAssertion` auth for Responses requests
    - routes shared request setup through scoped provider auth so unary
    compact requests use the same run-task assertion path as inference turns
    - keeps local/custom/Bedrock/env-key/external-bearer provider auth
    unchanged
    - lets missing run-task state surface through the existing model-request
    error path instead of silently falling back to bearer auth
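
    A simplified sketch of the scoping rule, for illustration. `ProviderKind`, `Auth`, and `resolve_auth` are hypothetical, and this `ProviderAuthScope` stand-in carries only a boolean opt-in rather than the real run auth policy and session source. Only the first-party OpenAI provider gets an `AgentAssertion`, and a missing run task surfaces as an error.

    ```rust
    /// Hypothetical provider kinds.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ProviderKind {
        OpenAi,
        Bedrock,
        Custom,
    }

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Auth {
        /// Assertion built from the persisted run task id.
        AgentAssertion { run_task_id: String },
        /// The provider's existing auth, left untouched.
        Unchanged,
    }

    /// Simplified stand-in for `ProviderAuthScope`.
    pub struct ProviderAuthScope {
        pub use_agent_identity: bool,
    }

    pub fn resolve_auth(
        provider: ProviderKind,
        scope: &ProviderAuthScope,
        persisted_run_task_id: Option<&str>,
    ) -> Result<Auth, String> {
        if provider != ProviderKind::OpenAi || !scope.use_agent_identity {
            // Non-first-party providers and non-opted-in sessions keep their auth.
            return Ok(Auth::Unchanged);
        }
        match persisted_run_task_id {
            Some(id) => Ok(Auth::AgentAssertion { run_task_id: id.to_string() }),
            // No silent fallback to bearer auth when run-task state is missing.
            None => Err("agent identity run task is missing".to_string()),
        }
    }
    ```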
    
    This PR intentionally does not create thread-scoped, target-scoped, or
    background-scoped task identities. The run task is the only task Codex
    registers in this POC shape.
    
    ## Testing
    
    - `just test -p codex-model-provider`
    - `just test -p codex-core client::tests::provider_auth_scope_uses`
    - `just test -p codex-core remote_compact_uses_agent_identity_assertion`
  • [codex] Add Ultra reasoning effort (#29899)
    ## Why
    
    Ultra should be one user-facing reasoning selection for work that
    benefits from both maximum reasoning and proactive multi-agent
    delegation. Without it, clients must coordinate maximum reasoning with
    the experimental `multiAgentMode` setting, even though the inference
    backend still expects its existing `max` effort value.
    
    This change makes reasoning effort the source of truth: clients select
    `ultra`, core derives proactive multi-agent behavior when the turn is
    eligible for multi-agent V2, and inference requests continue to use the
    backend-compatible `max` value.
    
    ## What changed
    
    - Add `ultra` as a first-class reasoning effort and preserve
    model-catalog ordering when exposing it to clients.
    - Convert `ultra` to `max` at the inference request boundary, including
    Responses HTTP/WebSocket requests, startup prewarm, compaction, and
    memory summarization.
    - Derive effective multi-agent mode per turn from effective reasoning
    effort:
      - eligible multi-agent V2 + `ultra` → `proactive`
      - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
      - V1 or otherwise ineligible sessions → no multi-agent mode instruction
    - Keep the derived effective mode in turn context history so successive
    turns can emit a developer-message update only when the effective mode
    changes.
    - Remove selected multi-agent mode from core session configuration, turn
    construction, thread settings, resume/fork restoration, and subagent
    spawn plumbing. Subagents inherit reasoning effort and derive their own
    effective mode.
    - Retain the experimental app-server `multiAgentMode` fields for wire
    compatibility while marking them deprecated. Request values are accepted
    but ignored; compatibility response fields report `explicitRequestOnly`.
    - Display Ultra in the TUI using the order supplied by `model/list`.
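
    The two derivations above can be sketched as pure functions. This is an illustrative model, not the `codex-core` code: the effort enum is a hypothetical subset, and the function names are made up. `ultra` maps to `max` on the wire, and the effective multi-agent mode depends only on effort and V2 eligibility.

    ```rust
    /// Illustrative subset of effort levels (not the full real enum).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ReasoningEffort {
        Low,
        Medium,
        High,
        Max,
        Ultra,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum MultiAgentMode {
        Proactive,
        ExplicitRequestOnly,
    }

    /// Value sent at the inference request boundary: `ultra` becomes `max`.
    pub fn wire_effort(effort: ReasoningEffort) -> &'static str {
        match effort {
            ReasoningEffort::Low => "low",
            ReasoningEffort::Medium => "medium",
            ReasoningEffort::High => "high",
            ReasoningEffort::Max | ReasoningEffort::Ultra => "max",
        }
    }

    /// Effective mode for a turn; `None` means no multi-agent mode instruction
    /// (V1 or otherwise ineligible sessions).
    pub fn effective_mode(effort: ReasoningEffort, eligible_v2: bool) -> Option<MultiAgentMode> {
        if !eligible_v2 {
            return None;
        }
        Some(if effort == ReasoningEffort::Ultra {
            MultiAgentMode::Proactive
        } else {
            MultiAgentMode::ExplicitRequestOnly
        })
    }

    /// Emit a developer-message update only when the effective mode changes.
    pub fn needs_mode_update(previous: Option<MultiAgentMode>, current: Option<MultiAgentMode>) -> bool {
        previous != current
    }
    ```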
    
    ## Validation
    
    - `just test -p codex-core ultra_reasoning_uses_max_for_requests`
    - `just test -p codex-tui model_reasoning_selection_popup`
  • [2/3] core: persist world state in rollouts (#29835)
    ## Why
    
    `WorldState` currently remembers its model-visible diff baseline only in
    memory. That leaves no durable source for restoring the exact baseline
    after resume, fork, rollback, or compaction.
    
    This is the second PR in the WorldState persistence stack, built on
    #29833 and following #29249. It records durable state transitions; the
    next PR will replay them during rollout reconstruction.
    
    ## What
    
    - Add a `world_state` rollout item containing either a full snapshot or
    an RFC 7386 JSON Merge Patch.
    - Persist a full snapshot after initial context and after compaction
    establishes a new context window.
    - Persist non-empty patches when later sampling steps or turns advance
    the WorldState baseline.
    - Write model-visible history before its matching WorldState record, so
    an interrupted write can only cause a safe repeated update on replay.
    - Preserve WorldState records for full-history forks while excluding
    them from thread previews, metadata, and app-server history
    materialization.
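
    For readers unfamiliar with RFC 7386, the sketch below shows both directions: producing a patch between two snapshots and applying it on replay. It uses a small hand-rolled `Json` enum instead of `serde_json::Value` so it compiles without dependencies; `merge_patch` and `diff` are illustrative names, not the functions in `codex-core`.

    ```rust
    use std::collections::BTreeMap;

    /// Minimal JSON model; a stand-in for `serde_json::Value`.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Json {
        Null,
        Bool(bool),
        Num(f64),
        Str(String),
        Arr(Vec<Json>),
        Obj(BTreeMap<String, Json>),
    }

    /// Apply an RFC 7386 merge patch to `target` in place.
    pub fn merge_patch(target: &mut Json, patch: &Json) {
        match patch {
            Json::Obj(patch_map) => {
                // An object patch applied to a non-object target starts from `{}`.
                if !matches!(*target, Json::Obj(_)) {
                    *target = Json::Obj(BTreeMap::new());
                }
                if let Json::Obj(target_map) = target {
                    for (key, value) in patch_map {
                        if matches!(value, Json::Null) {
                            // `null` deletes the member.
                            target_map.remove(key);
                        } else {
                            let slot = target_map.entry(key.clone()).or_insert(Json::Null);
                            merge_patch(slot, value);
                        }
                    }
                }
            }
            // Any non-object patch (including arrays) replaces the target wholesale.
            _ => *target = patch.clone(),
        }
    }

    /// Build a merge patch that turns `old` into `new`, or `None` if nothing changed.
    pub fn diff(old: &Json, new: &Json) -> Option<Json> {
        if old == new {
            return None;
        }
        match (old, new) {
            (Json::Obj(old_map), Json::Obj(new_map)) => {
                let mut patch = BTreeMap::new();
                // Members that disappeared become `null` (remove).
                for key in old_map.keys().filter(|k| !new_map.contains_key(*k)) {
                    patch.insert(key.clone(), Json::Null);
                }
                // Added or changed members recurse into nested objects.
                for (key, new_value) in new_map {
                    let change = match old_map.get(key) {
                        Some(old_value) => diff(old_value, new_value),
                        None => Some(new_value.clone()),
                    };
                    if let Some(change) = change {
                        patch.insert(key.clone(), change);
                    }
                }
                Some(Json::Obj(patch))
            }
            _ => Some(new.clone()),
        }
    }
    ```

    Because `null` means "remove" in a merge patch, a state that legitimately contains `null` values cannot round-trip through `diff`, and arrays are always replaced wholesale.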
    
    Older binaries read rollout lines independently, so they skip the
    unknown `world_state` records while retaining the rest of the thread.
    
    ## Testing
    
    - `just test -p codex-core
    snapshot_merge_patch_changes_and_removes_nested_values`
    - `just test -p codex-core
    world_state_baseline_deduplicates_until_history_is_replaced`
    - `just test -p codex-core
    deferred_executor_compaction_preserves_then_updates_environment_once`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-state`
    - `just test -p codex-thread-store`
    - `just test -p codex-app-server-protocol`
  • Persist agent messages as response items (#29829)
    ## Why
    
    Inter-agent messages are recorded in live history as
    `ResponseItem::AgentMessage`, but rollouts stored
    `InterAgentCommunication` and rebuilt the response item during resume.
    This made the rollout differ from the actual Responses history.
    
    ## What changed
    
    - store the prepared `agent_message` response item directly
    - keep `trigger_turn` in a small local metadata record for fork
    truncation
    - keep reading older `inter_agent_communication` rollout items
  • Support thread-level originator overrides (#29477)
    ## Why
    
    Work (TPP) threads can be launched from the Desktop app, but if they all
    keep the Desktop app's default originator, downstream attribution
    cannot distinguish local Work launches from cloud-backed Work launches.
    `thread/start.serviceName` already carries that launch signal, while
    `SessionMeta.originator` is the durable thread-level value that survives
    resume and fork.
    
    This change converts the Desktop Work service names into an effective
    originator at thread creation time, persists that originator with the
    thread, and keeps using it for later model requests and memory writes.
    
    ## What changed
    
    - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
    per-thread originators, while preserving
    `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
    - Persist the effective originator in `SessionMeta.originator`, read it
    back on resume/fork, and inherit the parent originator for subagent
    spawns when there is no persisted session metadata.
    - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
    back to the live parent originator when the forked history no longer
    includes `SessionMeta`.
    - Thread the per-thread originator through Responses headers,
    websocket/compaction request paths, thread-store creation, rollout
    metadata, and memory stage-one telemetry.
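
    The precedence and fallback rules can be sketched as below. This is an assumption-laden illustration: the originator strings for the Work service names are hypothetical (the PR does not list them), and the override is passed in as a parameter instead of being read from `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` so the logic stays testable.

    ```rust
    /// Hypothetical originator values; the real strings are not given here.
    pub const WORK_LOCAL_ORIGINATOR: &str = "codex_work_local";
    pub const WORK_CLOUD_ORIGINATOR: &str = "codex_work_cloud";

    /// Effective originator at thread creation:
    /// internal override > Work service-name mapping > default.
    pub fn effective_originator(
        internal_override: Option<&str>,
        service_name: Option<&str>,
        default_originator: &str,
    ) -> String {
        if let Some(forced) = internal_override {
            return forced.to_string();
        }
        match service_name {
            Some("CODEX_WORK_LOCAL") => WORK_LOCAL_ORIGINATOR.to_string(),
            Some("CODEX_WORK_CLOUD") => WORK_CLOUD_ORIGINATOR.to_string(),
            _ => default_originator.to_string(),
        }
    }

    /// On resume, fork, or spawn: prefer the persisted `SessionMeta.originator`,
    /// else fall back to the live parent's originator (e.g. truncated forks).
    pub fn restored_originator(persisted: Option<&str>, live_parent: &str) -> String {
        persisted.unwrap_or(live_parent).to_string()
    }
    ```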
    
    ## Verification
    
    - `just test -p codex-core
    agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
    agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
    thread_manager::tests::originator_override_precedes_service_name_remapping`
    - `just test -p codex-core
    agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
    - `just test -p codex-memories-write`
    - `just fix -p codex-core -p codex-memories-write`
    - `git diff --check`
  • core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
    ## Description
    This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
    here: https://github.com/openai/codex/pull/28355) to
    `ResponseItem.internal_chat_message_metadata_passthrough`, which is the
    blessed path and has strongly-typed keys.
    
    For now we have to drop this MAv2 usage of `metadata`:
    https://github.com/openai/codex/pull/28561 until we figure out where
    that should live.
  • Expose thread-level multi-agent mode (#28792)
    ## Why
    
    Once multi-agent mode can be selected per turn, clients also need to
    choose the initial selection when creating a thread and observe that
    selection through lifecycle and settings APIs.
    
    The selected value is intentionally distinct from the effective
    model-visible value: the absence of a client selection is represented as
    `null`, even though an eligible multi-agent v2 turn derives
    `explicitRequestOnly` as its effective default.
    
    ## What changed
    
    - Add the optional experimental `thread/start.multiAgentMode` parameter
    and pass it through thread creation.
    - Preserve an omitted initial value as an unset selection rather than
    eagerly storing `explicitRequestOnly`.
    - Apply an explicit `thread/start` selection to the first turn through
    the session configuration established at thread creation.
    - Restore the latest persisted effective mode as the selected baseline
    on cold resume when rollout history contains one.
    - Inherit the optional selected mode from a loaded parent when creating
    related runtime threads.
    - Return the current selected `multiAgentMode` from `thread/start`,
    `thread/resume`, `thread/fork`, and thread settings, using `null` when
    no mode is selected.
    - Keep lifecycle reporting independent from model capability and feature
    eligibility; core turn construction remains responsible for calculating
    and persisting the effective mode.
    
    ## Not covered
    
    - Clearing an existing loaded-session selection back to unset through
    `turn/start`; omitted or `null` currently retains the session's
    selection.
    - A TUI control, slash command, or `config.toml` preference.
    
    ## Verification
    
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`
    
    The focused app-server coverage verifies explicit `thread/start`
    initialization, first-turn prompting, nullable reporting for an omitted
    selection, and retention of selections that are not currently
    runtime-eligible.
    
    ## Stack
    
    Stacked on #28685. This PR contains only the thread initialization and
    lifecycle/settings API layer.
  • [codex] Assign response item IDs when recording history (#28814)
    ## Why
    
    Client-created response items enter history without IDs, so their
    identity is lost across rollout persistence and resume. IDs should be
    assigned once at the history-recording boundary, while IDs returned by
    the server must remain unchanged.
    
    The Responses API validates item IDs using type-specific prefixes.
    Locally generated IDs therefore use the matching prefix plus a
    hyphenated UUIDv7, keeping them valid while distinguishable from
    server-generated IDs. Because this changes persisted history and
    provider request shapes, the behavior is opt-in behind the
    under-development `item_ids` feature. Compaction triggers remain request
    controls whose API shape does not accept an ID.
    
    ## What changed
    
    - Register the disabled-by-default `item_ids` feature and expose it in
    `config.schema.json`.
    - Make supported optional `ResponseItem` IDs serializable and expose
    them in the generated app-server schemas.
    - When `item_ids` is enabled, assign an ID during conversation-history
    preparation if an item has no ID.
    - Generate type-prefixed, hyphenated UUIDv7 IDs using the Responses API
    item conventions.
    - Preserve existing server IDs without rewriting them.
    - Persist assigned IDs in rollouts and include them in subsequent
    Responses requests.
    - Remove the unsupported ID field from `CompactionTrigger` and document
    why it has no ID.
    - Add integration coverage for enabled ID persistence, preservation of
    server IDs, and omission of generated IDs while the feature is disabled.
    
    `prepare_conversation_items_for_history` is the single response-item ID
    allocation boundary.
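
    A dependency-free sketch of the allocation step, assuming the caller supplies the timestamp and random bytes. The UUIDv7 bit layout follows RFC 9562; the `ItemKind` enum and its prefixes are hypothetical placeholders, not the actual Responses API prefix table.

    ```rust
    /// Hypothetical item kinds and ID prefixes, for illustration only.
    #[derive(Debug, Clone, Copy)]
    pub enum ItemKind {
        Message,
        FunctionCall,
        Reasoning,
    }

    fn id_prefix(kind: ItemKind) -> &'static str {
        match kind {
            ItemKind::Message => "msg_",
            ItemKind::FunctionCall => "fc_",
            ItemKind::Reasoning => "rs_",
        }
    }

    /// Format a UUIDv7 (RFC 9562): 48-bit big-endian Unix-ms timestamp,
    /// version nibble 7, variant bits 10, and 74 bits taken from `random`.
    pub fn uuid_v7(unix_ms: u64, random: [u8; 10]) -> String {
        let mut bytes = [0u8; 16];
        bytes[..6].copy_from_slice(&unix_ms.to_be_bytes()[2..]);
        bytes[6..].copy_from_slice(&random);
        bytes[6] = 0x70 | (bytes[6] & 0x0F); // version 7
        bytes[8] = 0x80 | (bytes[8] & 0x3F); // variant 0b10
        let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
        format!("{}-{}-{}-{}-{}", &hex[..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..])
    }

    /// Assign an ID only when the item has none, so server IDs are never rewritten.
    pub fn ensure_id(id: &mut Option<String>, kind: ItemKind, unix_ms: u64, random: [u8; 10]) {
        if id.is_none() {
            *id = Some(format!("{}{}", id_prefix(kind), uuid_v7(unix_ms, random)));
        }
    }
    ```

    The usage check reproduces the RFC 9562 UUIDv7 test vector, which is why the timestamp and random bytes are fixed.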
    
    ## Test plan
    
    - `just test -p codex-features`
    - `just test -p codex-core
    response_item_ids_persist_across_resume_and_preserve_server_ids`
    - `just test -p codex-core
    non_openai_responses_requests_omit_item_turn_metadata`
    - `just test -p codex-core
    resize_all_images_prepares_failures_before_history_insertion`
    - `just test -p codex-protocol`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-api azure_default_store_attaches_ids_and_headers`
  • Support openai/form extended form elicitations (#27500)
    # Summary
    Allow App Server clients to opt into `openai/form` MCP elicitations.
  • [codex] Add optional IDs to response items (#28812)
    ## Why
    
    `ResponseItem` variants do not have a consistent internal ID shape: some
    variants carry required IDs, some carry optional IDs, and some cannot
    represent an ID at all. The existing fields also use inconsistent serde,
    TypeScript, and JSON-schema annotations. A single enum-level access path
    is needed before history recording can assign and retain IDs.
    
    This PR establishes that internal model only. It intentionally does not
    generate or serialize IDs; allocation and wire persistence are isolated
    in the stacked follow-up.
    
    ## What changed
    
    - Give every concrete `ResponseItem` variant an `Option<String>` ID
    field.
    - Apply the same internal-only annotations to every ID field:
    `#[serde(default, skip_serializing)]`, `#[ts(skip)]`, and
    `#[schemars(skip)]`.
    - Add `ResponseItem::id()` and `ResponseItem::set_id()` as the shared
    accessors.
    - Preserve IDs when history items are rewritten for truncation.
    - Adapt consumers that previously assumed reasoning and image-generation
    IDs were required.
    - Regenerate app-server schemas so the hidden fields are represented
    consistently.
    
    The serde catch-all `ResponseItem::Other` remains ID-less because it
    must remain a unit variant.
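
    The accessor shape can be sketched on a deliberately tiny stand-in for `ResponseItem`. Only two ID-carrying variants and `Other` are modeled; the serde, ts-rs, and schemars annotations are omitted to keep the sketch dependency-free, and the `bool` return of `set_id` is an assumed signature, not the real one.

    ```rust
    /// Simplified stand-in for `ResponseItem` (illustrative subset of variants).
    #[derive(Debug, Clone, PartialEq)]
    pub enum ResponseItem {
        Message { id: Option<String>, text: String },
        Reasoning { id: Option<String>, summary: Vec<String> },
        /// Serde catch-all: a unit variant, so it cannot carry an ID.
        Other,
    }

    impl ResponseItem {
        /// Shared read accessor across all variants.
        pub fn id(&self) -> Option<&str> {
            match self {
                ResponseItem::Message { id, .. } | ResponseItem::Reasoning { id, .. } => id.as_deref(),
                ResponseItem::Other => None,
            }
        }

        /// Shared write accessor; returns `false` when the variant has no ID slot.
        pub fn set_id(&mut self, new_id: String) -> bool {
            match self {
                ResponseItem::Message { id, .. } | ResponseItem::Reasoning { id, .. } => {
                    *id = Some(new_id);
                    true
                }
                ResponseItem::Other => false,
            }
        }
    }
    ```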
    
    ## Test plan
    
    - `cargo check --tests -p codex-core -p codex-api -p codex-rollout-trace
    -p codex-image-generation-extension`
    - `just test -p codex-protocol`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-api -p codex-rollout-trace -p
    codex-image-generation-extension`
    - `just test -p codex-core event_mapping`
  • feat(core): add metadata field to ResponseItem (#28355)
    ## Description
    
    This PR adds an optional `metadata` field to `ResponseItem` for
    Responses API calls. Only mechanical plumbing, no actual values
    populated and sent yet. Turns out just adding a new field to
    `ResponseItem` has quite a large blast radius already.
    
    This change is backwards compatible because `metadata` is optional and
    omitted when absent, so existing response items and rollout history
    without it still deserialize and requests that do not set it keep the
    same wire shape. For provider compatibility, we strip out `metadata`
    before non-OpenAI Responses requests so Azure and AWS Bedrock never see
    this field.
    
    My followup PR here will actually make use of it to start storing and
    passing along `turn_id`: https://github.com/openai/codex/pull/28360
    
    ## What changed
    
    - Added `ResponseItemMetadata` with optional `turn_id`, plus optional
    `metadata` on Responses API item variants and inter-agent communication.
    - Preserved item metadata through response-item rewrites such as
    truncation, missing tool-output synthesis, compaction history
    rebuilding, visible-history conversion, rollout/resume, and generated
    app-server schemas/types.
    - Strip item metadata from non-OpenAI Responses requests while
    preserving it for OpenAI-shaped requests.
    - Updated the mechanical fixture/test construction churn required by the
    new optional field.
  • [codex] Add external agent import result accounting (#28008)
    ## Why
    
    External-agent imports can complete synchronously or continue in the
    background for plugins/sessions. Clients need a stable import id to
    correlate the immediate response with the eventual completion
    notification, and the completion payload needs enough accounting to show
    which artifact types succeeded or failed without hiding partial
    failures.
    
    ## What Changed
    
    - `externalAgentConfig/import` now returns an `importId`;
    `externalAgentConfig/import/completed` includes the same `importId` plus
    type-level `itemResults`.
    - Completed `itemResults` report `successCount`, `errorCount`,
    `successes`, and `rawErrors` for each migrated item type.
    - Added protocol/schema/TypeScript types for import successes, raw
    errors, and type-level results. No progress notification is included in
    the final PR.
    - `ExternalAgentConfigService::import` now returns an outcome object
    with synchronous item results and pending plugin imports.
    - Plugin import outcomes track succeeded/failed marketplaces, plugin
    ids, and raw errors. Plugin failures can be reported in completed
    accounting while later migration items continue.
    - Non-plugin synchronous import failures still fail the request, so
    invalid config/skills-style failures are not reported as a successful
    import response.
    - Session imports now return item results. Successful imports include
    the source session path and imported thread id; prepare, persist,
    ledger, and source-validation failures become raw errors in completion
    accounting where the import can continue.
    - The request processor generates the `importId`, aggregates synchronous
    results with background plugin/session results, and sends a single
    completed notification when all selected work is done.
    - App-server docs and generated schema fixtures were updated for the new
    response/completed payload shapes.
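
    The per-type accounting can be illustrated with a small aggregation step. This sketch covers only folding outcomes into results, not `importId` generation or notification delivery. `ItemResult`, `Outcome`, and `aggregate` are hypothetical Rust shapes loosely mirroring the `successCount`, `errorCount`, `successes`, and `rawErrors` fields.

    ```rust
    use std::collections::BTreeMap;

    /// Per-item-type accounting (hypothetical shape).
    #[derive(Debug, Default, Clone, PartialEq)]
    pub struct ItemResult {
        pub success_count: usize,
        pub error_count: usize,
        pub successes: Vec<String>,
        pub raw_errors: Vec<String>,
    }

    /// One outcome from a synchronous or background import step.
    pub enum Outcome {
        Success { item_type: String, detail: String },
        Error { item_type: String, raw_error: String },
    }

    /// Fold all outcomes into per-type results for the single completed notification.
    pub fn aggregate(outcomes: impl IntoIterator<Item = Outcome>) -> BTreeMap<String, ItemResult> {
        let mut results: BTreeMap<String, ItemResult> = BTreeMap::new();
        for outcome in outcomes {
            match outcome {
                Outcome::Success { item_type, detail } => {
                    let entry = results.entry(item_type).or_default();
                    entry.success_count += 1;
                    entry.successes.push(detail);
                }
                Outcome::Error { item_type, raw_error } => {
                    // Partial failures are recorded, not hidden.
                    let entry = results.entry(item_type).or_default();
                    entry.error_count += 1;
                    entry.raw_errors.push(raw_error);
                }
            }
        }
        results
    }
    ```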
    
    ## Validation
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server-client event_requires_delivery`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-sync-error
    just test -p codex-app-server
    external_agent_config_import_returns_error_for_failed_sync_import`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-external-agent
    just test -p codex-app-server external_agent_config`
    
    Note: local sandbox validation used `CODEX_SQLITE_HOME` because the
    default sqlite state path is read-only in this environment.
  • [codex] simplify memory read metrics (#28164)
    ## Why
    
    Memory read telemetry currently reconstructs the executable shell
    command after a tool call finishes. That duplicates shell, login-policy,
    and cwd resolution owned by the tool handlers, and can diverge from the
    environment-specific command that unified exec actually ran.
    
    ## What changed
    
    - Expose the existing restricted shell-script parser directly for raw
    script text.
    - Parse `shell_command` and `exec_command` input into plain command argv
    before classifying memory reads.
    - Preserve all-or-nothing safe-command validation for multi-command
    scripts.
    - Remove cwd resolution, shell selection, and the unnecessary async
    boundary from memory read metric emission.
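
    The all-or-nothing rule can be sketched on already-parsed argv lists. The shell-script parsing itself is out of scope, and the allowlist and function name are hypothetical: a script counts as a memory read only if every command in it is on the list.

    ```rust
    /// Hypothetical allowlist of read-only programs.
    const SAFE_READ_COMMANDS: &[&str] = &["cat", "head", "tail", "rg", "ls"];

    /// Classify a parsed script (one argv per command). Returns the commands only
    /// if every one is a safe read; a single unsafe command rejects the whole script.
    pub fn classify_memory_reads(commands: &[Vec<String>]) -> Option<&[Vec<String>]> {
        let all_safe = !commands.is_empty()
            && commands.iter().all(|argv| {
                argv.first()
                    .is_some_and(|program| SAFE_READ_COMMANDS.iter().any(|safe| *safe == program.as_str()))
            });
        all_safe.then_some(commands)
    }
    ```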
    
    ## Testing
    
    - `just test -p codex-shell-command`
    - `cargo check -p codex-core`
  • build: run buildifier from just fmt (#28125)
    ## Intent
    
    Keep Bazel and Starlark files consistently formatted without requiring
    contributors to install or version buildifier themselves.
    
    ## Implementation
    
    - Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier
    v8.5.1.
    - Run buildifier from the shared `just fmt` and `just fmt-check` driver,
    with Windows-safe explicit DotSlash invocation.
    - Provision DotSlash in formatting CI and contributor devcontainers, and
    document the source-build prerequisite.
    - Apply the initial mechanical buildifier formatting baseline.
  • Support plaintext agent messages (#27830)
    ## Why
    
    Multi-agent v2 `send_message` deliveries already reach the receiving
    model as typed `agent_message` items with encrypted content.
    Child-completion notifications are generated by Codex itself, so their
    content is plaintext and previously fell back to a serialized JSON
    envelope inside an assistant message.
    
    With plaintext `input_text` supported for `agent_message`, both delivery
    paths can use the same model-visible type while preserving explicit
    author and recipient metadata.
    
    ## What changed
    
    - add plaintext `input_text` support to `AgentMessageInputContent` and
    regenerate the affected app-server schemas
    - preserve `InterAgentCommunication` as structured mailbox input instead
    of converting it to assistant text
    - record delivered communications as typed `agent_message` history items
    - persist a dedicated rollout item so local delivery metadata such as
    `trigger_turn` remains available without leaking into the Responses
    request
    - reconstruct typed agent messages on resume and preserve fork-turn
    truncation behavior
    - remove request-time assistant-content parsing
    - preserve plaintext and encrypted inter-agent deliveries in stage-one
    memory inputs
    - normalize and link plaintext and encrypted agent messages in rollout
    traces without treating inbound messages as child results
    - cover the real MultiAgent V2 child-completion path end to end with
    deterministic mailbox synchronization
    
    ## Verification
    
    - `just test -p codex-core
    plaintext_multi_agent_v2_completion_sends_agent_message`
    - `just test -p codex-core input_queue_drains_mailbox_in_delivery_order
    record_initial_history_reconstructs_typed_inter_agent_message
    fork_turn_positions_use_inter_agent_delivery_metadata`
    - `just test -p codex-memories-write
    serializes_inter_agent_communications_for_memory`
    - `just test -p codex-rollout-trace
    agent_messages_preserve_routing_and_content
    sub_agent_started_activity_creates_spawn_edge`
    - `just test -p codex-rollout-trace
    agent_result_edge_falls_back_to_child_thread_without_result_message`
    - `just test -p codex-protocol -p codex-rollout -p
    codex-app-server-protocol`
  • [codex] Load AGENTS.md from all bound environments (#27696)
    ## Why
    
    We already have the machinery to support multiple environments on a
    single thread, but we only show the model the contents of `AGENTS.md`
    files in the primary environment.
    
    We should show the model all of the relevant project instructions when
    we know there's more than one environment.
    
    ## Known Gaps
    
    As discussed in the RFC, this implementation:
    
    1. doesn't handle environments being added/removed to/from the thread
    after its creation
    2. doesn't enforce an aggregate context budget across environments,
    and instead applies the configured project maximum independently to each
    environment
    
    ## Implementation
    
    - Discover project instructions in environment order with an independent
    byte budget per environment and preserve source provenance/order.
    - Keep the legacy fragment byte-for-byte when exactly one environment
    contributes project instructions; use environment-labeled sections when
    two or more environments contribute.
    - Freeze the complete rendered fragment in `LoadedAgentsMd`, insert it
    directly into requests, and recognize both layouts in contextual and
    memory filtering.
    - Add exact rendering, independent-budget, source-order,
    creation-snapshot, and consumer coverage without changing app-server
    schemas.
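
    The rendering rule can be approximated as follows. This is a simplified sketch: the `## Environment:` label format and the `EnvInstructions` type are hypothetical, and the real legacy fragment has its own wrapper text that is not reproduced here. Each environment is truncated against its own byte budget, and labels appear only when two or more environments contribute.

    ```rust
    /// Project instructions discovered in one environment (hypothetical shape).
    pub struct EnvInstructions {
        pub env_label: String,
        pub text: String,
    }

    /// Truncate to at most `max_bytes` without splitting a UTF-8 character.
    fn truncate_to_budget(text: &str, max_bytes: usize) -> &str {
        if text.len() <= max_bytes {
            return text;
        }
        let mut end = max_bytes;
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }

    /// Independent budget per environment; unlabeled output for a single
    /// contributor, labeled sections (in environment order) for two or more.
    pub fn render(envs: &[EnvInstructions], max_bytes_per_env: usize) -> Option<String> {
        let contributing: Vec<(&str, &str)> = envs
            .iter()
            .map(|e| (e.env_label.as_str(), truncate_to_budget(&e.text, max_bytes_per_env)))
            .filter(|(_, text)| !text.is_empty())
            .collect();
        match contributing.as_slice() {
            [] => None,
            [(_, text)] => Some(text.to_string()),
            many => Some(
                many.iter()
                    .map(|(label, text)| format!("## Environment: {label}\n{text}"))
                    .collect::<Vec<_>>()
                    .join("\n\n"),
            ),
        }
    }
    ```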
  • [codex] Remove async_trait from first-party code (#27475)
    ## Why
    
    First-party async traits should expose their `Send` contracts explicitly
    without requiring `async_trait`. This completes the migration pattern
    established in #27303 and #27304.
    
    ## What changed
    
    - Replaced the remaining first-party `async_trait` traits with native
    return-position `impl Future + Send` where statically dispatched and
    explicit boxed `Send` futures where object safety is required.
    - Kept implementations behavior-preserving, outlining existing async
    bodies into inherent methods where that keeps the diff reviewable.
    - Removed all direct first-party `async-trait` dependencies and the
    workspace dependency declaration.
    - Added a cargo-deny policy that permits `async-trait` only through the
    remaining transitive wrapper crates.
    - Updated `rand` from 0.8.5 to 0.8.6 to resolve RUSTSEC-2026-0097 and
    keep the full cargo-deny check passing.
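    
    The two replacement patterns can be sketched using only std. `CountSource`, `NameSource`, `Fixed`, and `poll_ready` are hypothetical names. `poll_ready` is a tiny no-op-waker driver that works only for futures that complete on their first poll. It stands in for a real executor in this illustration.
    
    ```rust
    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};
    
    /// Statically dispatched trait: return-position `impl Future + Send`
    /// states the `Send` contract without the `async_trait` macro.
    pub trait CountSource {
        fn count(&self) -> impl Future<Output = u32> + Send;
    }
    
    /// Object-safe trait: an explicitly boxed `Send` future keeps `dyn NameSource` usable.
    pub trait NameSource: Send + Sync {
        fn name(&self) -> Pin<Box<dyn Future<Output = String> + Send + '_>>;
    }
    
    /// Hypothetical implementor used only for this illustration.
    pub struct Fixed(pub u32);
    
    impl CountSource for Fixed {
        fn count(&self) -> impl Future<Output = u32> + Send {
            let n = self.0;
            async move { n }
        }
    }
    
    impl NameSource for Fixed {
        fn name(&self) -> Pin<Box<dyn Future<Output = String> + Send + '_>> {
            Box::pin(async move { format!("fixed-{}", self.0) })
        }
    }
    
    /// Compile-time check that a value is `Send`.
    pub fn assert_send<T: Send>(_: &T) {}
    
    fn noop_raw_waker() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker {
            noop_raw_waker()
        }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    
    /// Polls a future once with a no-op waker; returns `None` if it is not ready.
    pub fn poll_ready<F: Future>(fut: F) -> Option<F::Output> {
        // SAFETY: every vtable function ignores its data pointer and does nothing.
        let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
        let mut cx = Context::from_waker(&waker);
        let mut fut = std::pin::pin!(fut);
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => Some(value),
            Poll::Pending => None,
        }
    }
    ```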
    
    ## Validation
    
    - `just test -p codex-exec-server`: 216 passed, 2 skipped.
    - `just test -p codex-model-provider`: 39 passed.
    - `just test -p codex-core` and `just test`: changed tests passed;
    remaining failures are environment-sensitive suites unrelated to this
    migration.
    - `cargo deny check`
    - `just fix`
    - `just fmt`
    - `cargo shear`
    - `just bazel-lock-check`
  • core: Consolidate Responses API Codex metadata (#27122)
    ## What
    Introduce a `CodexResponsesMetadata` struct that defines all the core
    metadata we send to Responses API. Example fields are `thread_id`,
    `turn_id`, `window_id`, etc.
    
    Going forward, `client_metadata["x-codex-turn-metadata"]` will be the
    canonical way Codex sends metadata to Responses API across both HTTP and
    websocket transports.
    
    For now, we continue to emit the existing top-level HTTP headers and
    top-level `client_metadata` fields from the same
    `CodexResponsesMetadata` struct for compatibility reasons.
    
    Also, app-server clients who specify additional
    `responsesapi_client_metadata` via `turn/start` and `turn/steer` will
    have those fields merged into
    `client_metadata["x-codex-turn-metadata"]`, but cannot override the
    reserved fields that core uses (i.e. the fields in
    `CodexResponsesMetadata`).
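    
    Here is a rough sketch of the merge rule. It models the metadata blob as a flat string map rather than JSON. `RESERVED_KEYS` and `merge_turn_metadata` are hypothetical names. One behavior is an assumption of this sketch: a reserved key is dropped even when core left it unset.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Hypothetical reserved keys standing in for the fields of `CodexResponsesMetadata`.
    const RESERVED_KEYS: &[&str] = &["session_id", "thread_id", "turn_id", "window_id"];
    
    /// Merge client-supplied extras into the Codex turn metadata blob.
    /// Core-owned values always win; extras can only add new, non-reserved keys.
    pub fn merge_turn_metadata(
        core: &BTreeMap<String, String>,
        client_extras: &BTreeMap<String, String>,
    ) -> BTreeMap<String, String> {
        let mut merged = core.clone();
        for (key, value) in client_extras {
            if RESERVED_KEYS.contains(&key.as_str()) {
                continue; // reserved: never overridable (assumption: even when unset by core)
            }
            merged.entry(key.clone()).or_insert_with(|| value.clone());
        }
        merged
    }
    ```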
    
    ## Why
    
    Responses API request instrumentation is the source of truth for
    downstream Codex analytics that join requests by Codex IDs such as
    session, thread, turn, and context window. Before this change, those
    values were assembled through several request-specific paths: HTTP
    request bodies, websocket handshake headers, websocket `response.create`
    payloads, compaction requests, and the rich `x-codex-turn-metadata`
    envelope all had their own wiring.
    
    That made metadata propagation prone to drift across API-key/direct
    Responses API requests, ChatGPT-auth/proxied requests, websocket
    requests, and compaction requests. It also made additions like
    `window_id` error-prone because a field could be added to one transport
    projection but missed in another.
    
    ## What changed
    
    - Added `CodexResponsesMetadata` as the core-owned snapshot for Codex
    metadata sent to the Responses API.
    - Render `client_metadata["x-codex-turn-metadata"]`, flat
    `client_metadata` projections, and direct compatibility headers from
    that same snapshot.
    - Include the known Codex-owned fields in the turn metadata blob,
    including installation/session/thread/turn/window IDs, request kind,
    lineage, sandbox/workspace metadata, timing, and compaction details.
    - Treat app-server `responsesapi_client_metadata` as enrichment for the
    Codex turn metadata blob while preventing those extras from overriding
    Codex-owned fields.
    - Use the same metadata path for normal turns, websocket prewarm, local
    compaction, remote v1 compaction, and remote v2 compaction.
    - Keep websocket connection-only preconnect metadata separate so
    handshakes carry compatibility identity headers without inventing a fake
    turn metadata blob.
    
    ## Verification
    
    - `cargo check -p codex-core`
    - `just fix -p codex-core`
  • [codex] Store compact window id in rollout (#27264)
    ## Why
    
    Compaction window identity is part of session history, not model-client
    transport state. Persisting it with the compacted rollout item lets
    resumed threads continue from the reconstructed window without keeping
    mutable window state on `ModelClient`.
    
    ## What changed
    
    - Added `window_id` to `CompactedItem` and stamped it when
    `replace_compacted_history` installs compacted history.
    - Moved auto-compact window id ownership into `AutoCompactWindow` /
    `SessionState`; `ModelClient` now receives the request window id from
    callers instead of storing it.
    - Returned `window_id` from rollout reconstruction for resume.
    Reconstruction uses the newest surviving compacted item's stored
    `window_id` when present, and falls back to the legacy compacted-item
    count when it is absent.
    - Kept fork startup at the fresh default window id and updated direct
    model-client tests to pass explicit test window ids.
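    
    A sketch of the resume-time selection follows. It makes two assumptions: window ids use the `<thread_id>:<window_generation>` format described in #24368, and the legacy fallback advances the generation once per compacted item. `CompactedItem` is trimmed to one field, and `reconstruct_window_id` is a hypothetical name.
    
    ```rust
    /// Hypothetical rollout item: only the field this sketch needs.
    pub struct CompactedItem {
        /// `None` for items written before `window_id` was persisted.
        pub window_id: Option<String>,
    }
    
    /// Use the newest surviving compacted item's stored id when present;
    /// otherwise derive it from the legacy compacted-item count.
    pub fn reconstruct_window_id(thread_id: &str, compacted: &[CompactedItem]) -> String {
        match compacted.last().and_then(|item| item.window_id.clone()) {
            Some(stored) => stored,
            // Assumed legacy fallback: one window generation per compaction.
            None => format!("{thread_id}:{}", compacted.len()),
        }
    }
    ```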
    
    ## Validation
    
    - `cargo check -p codex-core --tests`
  • feat: use provider defaults for memory models (#27129)
    ## Why
    
    Memory startup used hardcoded OpenAI model slugs for extraction and
    consolidation. That works for the default OpenAI-compatible path, but
    provider-specific backends can require different model identifiers. In
    particular, Amazon Bedrock should use its Bedrock model ID for these
    background memory requests instead of the OpenAI `gpt-5.4-mini` /
    `gpt-5.4` slugs.
    
    ## What Changed
    
    - Added provider-owned preferred memory model methods alongside
    `approval_review_preferred_model`.
    - Updated memory extraction and consolidation to resolve their default
    model through the active `ModelProvider`.
    - Added Amazon Bedrock overrides so both memory stages use
    `openai.gpt-5.4` through Bedrock’s provider-specific model ID.
    - Kept explicit `memories.extract_model` and
    `memories.consolidation_model` config overrides taking precedence.
    - Added startup coverage for default OpenAI and Bedrock memory model
    selection.
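    
    Here is a sketch of the resolution order: explicit config first, then the provider default. The trait and method names below are hypothetical. The mapping of `gpt-5.4-mini` to extraction and `gpt-5.4` to consolidation is inferred from the order of the paragraph above, not confirmed.
    
    ```rust
    /// Hypothetical slice of the provider trait; method names are illustrative.
    pub trait ModelProvider {
        fn memory_extract_preferred_model(&self) -> &'static str {
            "gpt-5.4-mini"
        }
        fn memory_consolidation_preferred_model(&self) -> &'static str {
            "gpt-5.4"
        }
    }
    
    pub struct OpenAi;
    impl ModelProvider for OpenAi {}
    
    pub struct Bedrock;
    impl ModelProvider for Bedrock {
        // Both memory stages use the Bedrock-specific model id.
        fn memory_extract_preferred_model(&self) -> &'static str {
            "openai.gpt-5.4"
        }
        fn memory_consolidation_preferred_model(&self) -> &'static str {
            "openai.gpt-5.4"
        }
    }
    
    /// An explicit `memories.extract_model` value wins; otherwise ask the provider.
    pub fn resolve_extract_model(configured: Option<&str>, provider: &dyn ModelProvider) -> String {
        configured
            .map(str::to_owned)
            .unwrap_or_else(|| provider.memory_extract_preferred_model().to_owned())
    }
    ```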
    
    Closes #26288
  • Load selected executor skills through extensions (#27184)
    ## Why
    
    CCA is moving toward a split runtime where the orchestrator may not have
    a filesystem, while executors can expose preinstalled plugins and
    skills. A thread therefore needs to select capabilities without asking
    app-server or core to interpret executor-owned paths through the
    orchestrator's filesystem.
    
    The longer-term model is broader than executor skills:
    
    - A plugin is a bundle of skills, MCP servers, connectors/apps, and
    hooks.
    - A plugin root can be local, executor-owned, or hosted by a backend.
    - Components inside one plugin can use different access and execution
    mechanisms. A skill may be read from a filesystem or through backend
    tools; an HTTP MCP server can run without an executor; a stdio MCP
    server or hook needs an execution environment.
    - Core should carry generic extension initialization data. The extension
    that owns a component should discover it, expose it to the model, and
    invoke it through the appropriate runtime.
    
    This PR establishes that architecture through one complete vertical:
    selecting a root on an executor, discovering the skills beneath it,
    exposing those skills to the model, and reading an explicitly invoked
    `SKILL.md` through the same executor.
    
    ## Contract
    
    `thread/start` gains an experimental `selectedCapabilityRoots` field:
    
    ```json
    {
      "selectedCapabilityRoots": [
        {
          "id": "deploy-plugin@1",
          "location": {
            "type": "environment",
            "environmentId": "workspace",
            "path": "/opt/codex/plugins/deploy"
          }
        }
      ]
    }
    ```
    
    The root is intentionally not classified as a "plugin" or "skill" in the
    API. It can point at a standalone skill, a directory containing several
    skills, or a plugin containing skills and other components. This PR only
    teaches the skills extension how to consume it; later extensions can
    resolve MCP, connector, and hook components from the same selection.
    
    The platform-supplied `id` is stable selection identity. The location
    says which runtime owns the root and gives that runtime an opaque path.
    App-server does not inspect or canonicalize the path.
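    
    The contract above can be modeled with Rust types. This is a sketch, not the protocol crate's definitions. `SelectedCapabilityRoot`, `CapabilityRootLocation`, and `resolve_root` are hypothetical, and a set of available environment ids stands in for `EnvironmentManager`. The sketch illustrates two stated rules. First, the path is passed through untouched. Second, an unavailable environment produces a warning instead of a local fallback.
    
    ```rust
    use std::collections::HashSet;
    
    /// Mirrors one `selectedCapabilityRoots` entry (hypothetical Rust names).
    pub struct SelectedCapabilityRoot {
        /// Platform-supplied, stable selection identity.
        pub id: String,
        pub location: CapabilityRootLocation,
    }
    
    /// Only environment-owned locations exist in this first contract.
    pub enum CapabilityRootLocation {
        Environment { environment_id: String, path: String },
    }
    
    /// Resolve a root for the owning extension. The path is returned untouched
    /// (never canonicalized here). An unknown environment yields a warning and
    /// no capabilities, with no fallback to the orchestrator filesystem.
    pub fn resolve_root<'a>(
        root: &'a SelectedCapabilityRoot,
        available_environments: &HashSet<String>,
    ) -> Result<(&'a str, &'a str), String> {
        match &root.location {
            CapabilityRootLocation::Environment { environment_id, path } => {
                if available_environments.contains(environment_id) {
                    Ok((environment_id.as_str(), path.as_str()))
                } else {
                    Err(format!(
                        "skipping capability root {}: environment {environment_id} is unavailable",
                        root.id
                    ))
                }
            }
        }
    }
    ```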
    
    ## What changed
    
    ### Generic thread extension initialization
    
    App-server converts selected roots into `ExtensionDataInit`. Core
    carries that generic initialization value until the final thread ID is
    known, then creates thread-scoped `ExtensionData` before lifecycle
    contributors run.
    
    This keeps `Session` and core independent of the capability-selection
    contract. The initialization value is consumed during construction; it
    is not retained as another long-lived `Session` field.
    
    ### Executor-backed skills
    
    The skills extension now owns an `ExecutorSkillProvider` that:
    
    - resolves the selected environment through `EnvironmentManager`
    - discovers, canonicalizes, and reads skills through that environment's
    `ExecutorFileSystem`
    - contributes the bounded selected-skill catalog as stable developer
    context
    - reads an explicitly invoked skill body through the authority that
    listed it
    - warns when an environment or root is unavailable
    - never falls back to the orchestrator filesystem for an executor-owned
    root
    
    Skill catalog and instruction fragments have hard byte bounds, which
    also keep them below the 10K-token per-item context limit.
    selected executor skill has the same name as a legacy local skill, the
    executor selection owns that invocation and the local body is not
    injected a second time.
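    
    Here is a sketch of the collision and byte-bound rules. `SkillSource`, `source_for_invocation`, and `bound_fragment` are hypothetical names. Skill discovery is reduced to a map from skill name to environment. Cutting at a UTF-8 character boundary is an assumption about how the bound avoids splitting characters.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical: where an invoked skill's body should be read from.
    #[derive(Debug, PartialEq)]
    pub enum SkillSource {
        Executor { environment_id: String },
        Local,
    }
    
    /// A selected executor skill owns its name, so a same-named local
    /// skill is never chosen (and never injected a second time).
    pub fn source_for_invocation(
        name: &str,
        executor_skills: &HashMap<String, String>, // skill name -> environment id
        local_skills: &[&str],
    ) -> Option<SkillSource> {
        if let Some(env) = executor_skills.get(name) {
            return Some(SkillSource::Executor { environment_id: env.clone() });
        }
        local_skills.contains(&name).then_some(SkillSource::Local)
    }
    
    /// Enforce a hard byte bound on a fragment without splitting a UTF-8 character.
    pub fn bound_fragment(text: &str, max_bytes: usize) -> &str {
        let mut end = text.len().min(max_bytes);
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }
    ```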
    
    Existing local and bundled skill loading remains in place. Omitting
    `selectedCapabilityRoots` therefore preserves current local-only
    behavior.
    
    ## Current semantics
    
    - Only environment-owned locations are represented in this first
    contract.
    - Roots are resolved by the destination extension, not by app-server or
    core.
    - An unavailable executor or invalid root produces a warning and no
    capabilities from that root; it does not trigger a local-filesystem
    fallback.
    - Selection applies to a newly started active thread.
    - MCP servers, connectors, and hooks beneath a selected plugin root are
    not activated yet.
    - Selection is not yet persisted or inherited across resume, fork, or
    subagent creation. Existing local capabilities continue to behave as
    they do today in those flows.
    
    ## Planned vertical follow-ups
    
    1. **Hosted HTTP MCP:** add an extension-backed HTTP MCP source that
    works without an executor, then replace the special-purpose MCP plugins
    loader with that implementation.
    2. **Executor MCP:** register and execute stdio MCP servers through the
    environment that owns the selected plugin root.
    3. **Backend skills:** add a hosted skill source whose catalog and
    bodies are accessed through extension tools rather than a filesystem.
    4. **Connectors and hooks:** activate those components through their
    owning extensions, using the same selected-root boundary and
    component-specific runtime.
    5. **Durable selection:** define the desired-selection lifecycle,
    persist it, and make resume, fork, and subagent inheritance explicit
    rather than accidental.
    6. **Local convergence:** incrementally route existing local plugin,
    skill, and MCP loading through the same extension model while preserving
    current local behavior.
    
    Each follow-up remains reviewable as an end-to-end capability. The
    platform selects roots, generic thread extension data carries the
    selection, and the owning extension resolves and operates its component.
    
    ## Verification
    
    Coverage added for:
    
    - app-server end-to-end discovery and explicit invocation of a skill
    inside an executor-selected plugin root
    - exclusive invocation when a selected executor skill collides with a
    local skill name
    - executor filesystem authority for discovery, canonicalization, and
    reads
    - thread extension initialization before lifecycle contributors run
    - stable executor catalog context, explicit invocation, context
    rebuilding, hidden skills, and preserved host/remote catalog behavior
    
    Targeted protocol, core-skills, skills-extension, core lifecycle, and
    app-server executor-skill tests were run during development.
  • Pair thread environment settings (#26687)
    ## Why
    
    Thread cwd and environment selections are a single logical setting in
    core: updating one without the other can silently desynchronize the
    next-turn execution context. This change makes that relationship
    explicit in the internal thread settings flow while preserving the
    existing app-server public API shape.
    
    ## What changed
    
    - Moved the cwd/environment pair through internal
    `ThreadSettingsOverrides.environment_settings` instead of a top-level
    internal `cwd` field.
    - Kept `thread/settings/update` public params unchanged, with app-server
    translating top-level `cwd` into the paired internal settings shape.
    - Moved `Op::UserInput` environment overrides into thread settings so
    user turns and settings updates use the same core path.
    - Updated core, app-server, MCP, memories, sample, and test callsites to
    construct the paired settings shape.
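    
    Here is a sketch of the paired shape and the app-server translation. The struct fields and `translate_public_cwd` are hypothetical. When only `cwd` is supplied, the sketch reuses the thread's current environment. That is an assumption about how the translation fills the pair.
    
    ```rust
    use std::path::PathBuf;
    
    /// Hypothetical paired setting: cwd and environment always travel together.
    #[derive(Clone, Debug, PartialEq)]
    pub struct EnvironmentSettings {
        pub cwd: PathBuf,
        pub environment_id: String,
    }
    
    #[derive(Debug, Default, PartialEq)]
    pub struct ThreadSettingsOverrides {
        pub environment_settings: Option<EnvironmentSettings>,
    }
    
    /// App-server side: the public params keep a top-level `cwd`, which is
    /// translated into the paired internal shape (assumed: current environment kept).
    pub fn translate_public_cwd(cwd: Option<PathBuf>, current: &EnvironmentSettings) -> ThreadSettingsOverrides {
        ThreadSettingsOverrides {
            environment_settings: cwd.map(|cwd| EnvironmentSettings {
                cwd,
                environment_id: current.environment_id.clone(),
            }),
        }
    }
    ```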
    
    ## Verification
    
    - `git diff --check`
    - Local test run starting after PR creation.
  • [codex] Support model-defined reasoning efforts (#26444)
    ## Summary
    - accept non-empty model-defined reasoning effort values while
    preserving built-in effort behavior
    - propagate the non-Copy effort type through core, app-server, TUI,
    telemetry, and persistence call sites
    - preserve string wire encoding and expose an open-string schema for
    clients
    - update model selection and shortcut behavior for model-advertised
    effort values
    
    ## Root cause
    `ReasoningEffort` gained a string-backed custom variant, so it could no
    longer implement `Copy` or rely on derived closed-enum serialization.
    Existing consumers still moved effort values from shared references and
    assumed a fixed built-in value set.
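    
    The sketch below shows why the type lost `Copy`: a string-backed variant owns heap data. The variant set is a hypothetical subset of the built-in efforts. `parse` is an illustrative helper, not the real deserializer. The wire form stays a plain string through `Display`.
    
    ```rust
    use std::fmt;
    
    /// Hypothetical shape: some built-in efforts plus an open, model-defined value.
    /// The `Custom(String)` payload is why this enum cannot derive `Copy`.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub enum ReasoningEffort {
        Low,
        Medium,
        High,
        Custom(String),
    }
    
    impl ReasoningEffort {
        /// Parse the string wire form; empty values are rejected.
        pub fn parse(s: &str) -> Option<Self> {
            match s {
                "" => None,
                "low" => Some(Self::Low),
                "medium" => Some(Self::Medium),
                "high" => Some(Self::High),
                other => Some(Self::Custom(other.to_string())),
            }
        }
    }
    
    impl fmt::Display for ReasoningEffort {
        /// Built-in and custom values share the same plain-string encoding.
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            let s = match self {
                Self::Low => "low",
                Self::Medium => "medium",
                Self::High => "high",
                Self::Custom(value) => value.as_str(),
            };
            f.write_str(s)
        }
    }
    ```
    
    Call sites that previously copied an effort out of a shared reference now call `.clone()` instead.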
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
  • feat: show enterprise monthly credit limits in status (#24812)
    ## Summary
    
    Enterprise users can have an effective monthly credit limit, but Codex
    `/status` currently drops that metadata from the account-usage response.
    
    This change adds the optional `spend_control.individual_limit`
    projection to the existing rate-limit snapshot flow. The backend client
    reads the monthly limit, app-server exposes it as `individualLimit`, and
    the TUI renders a `Monthly credit limit` row through the existing
    progress-bar renderer.
    
    When the backend does not return an effective monthly limit, existing
    rate-limit behavior is unchanged.
    
    ## Existing backend state
    
    The account-usage backend already returns the effective monthly limit
    and current usage together:
    
    ```json
    {
      "spend_control": {
        "reached": false,
        "individual_limit": {
          "limit": "25000",
          "used": "8000",
          "remaining": "17000",
          "used_percent": 32,
          "remaining_percent": 68,
          "reset_after_seconds": 86400,
          "reset_at": 1778137680
        }
      }
    }
    ```
    
    Before this change, Codex projected rolling `primary` and `secondary`
    windows plus `credits`. It ignored `spend_control.individual_limit`, so
    app-server clients and `/status` could not render the monthly cap.
    
    The updated flow is:
    
    ```text
    account usage backend
      -> backend-client reads spend_control.individual_limit
      -> existing rate-limit snapshot carries optional individual_limit
      -> app-server exposes optional individualLimit
      -> TUI renders Monthly credit limit
    ```
    
    ## App-server contract
    
    `account/rateLimits/read` and sparse `account/rateLimits/updated`
    notifications now include an additive nullable
    `rateLimits.individualLimit` field:
    
    ```json
    {
      "individualLimit": {
        "limit": "25000",
        "used": "8000",
        "remainingPercent": 68,
        "resetsAt": 1778137680
      }
    }
    ```
    
    In an `account/rateLimits/read` response, `null` means no monthly limit
    is available. `account/rateLimits/updated` remains a sparse rolling
    notification: clients merge available values into their most recent
    `account/rateLimits/read` snapshot or refetch. Nullable account metadata
    in a rolling notification does not clear a previously observed value.
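    
    Here is a client-side sketch of these merge semantics. `IndividualLimit` mirrors the JSON fields shown above. `RateLimitView`, `apply_read`, and `apply_update` are hypothetical client names.
    
    ```rust
    /// Mirrors the `individualLimit` payload shown above.
    #[derive(Clone, Debug, PartialEq)]
    pub struct IndividualLimit {
        pub limit: String,
        pub used: String,
        pub remaining_percent: u8,
        pub resets_at: i64,
    }
    
    #[derive(Debug, Default)]
    pub struct RateLimitView {
        pub individual_limit: Option<IndividualLimit>,
    }
    
    impl RateLimitView {
        /// `account/rateLimits/read` is authoritative: `None` clears the cached limit.
        pub fn apply_read(&mut self, individual_limit: Option<IndividualLimit>) {
            self.individual_limit = individual_limit;
        }
    
        /// `account/rateLimits/updated` is sparse: `None` means "not included",
        /// so a previously observed limit is kept.
        pub fn apply_update(&mut self, individual_limit: Option<IndividualLimit>) {
            if let Some(limit) = individual_limit {
                self.individual_limit = Some(limit);
            }
        }
    }
    ```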
    
    ## Design decisions
    
    - Extend the existing rate-limit snapshot instead of introducing a
    separate request or wire-level update protocol.
    - Keep the Codex projection narrow: `/status` needs the effective limit,
    current usage, remaining percentage, and reset timestamp.
    - Render the monthly row through the existing progress-bar renderer,
    with one optional detail line for `8,000 of 25,000 credits used`.
    - Keep the backend response optional so existing accounts and older
    usage states preserve their current behavior.
    - Preserve cached monthly metadata when sparse rolling notifications
    omit it. Live account-usage reads remain authoritative and can clear a
    removed limit.
    
    ## Visual evidence
    
    ```text
     Monthly credit limit:   [██████████████░░░░░░] 68% left (resets 07:08 on 7 May)
                             8,000 of 25,000 credits used
    ```
    
    Snapshot:
    `codex-rs/tui/src/status/snapshots/codex_tui__status__tests__status_snapshot_includes_enterprise_monthly_credit_limit.snap`
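    
    A sketch of how the row above could be produced from the string-typed `limit` and `used` fields. The helper names are hypothetical. The sketch rounds the remaining percentage to the nearest cell. That rounding is an assumption, though it reproduces the bar above: 14 of 20 cells filled for 68%.
    
    ```rust
    /// Group an integer with thousands separators, e.g. 25000 -> "25,000".
    pub fn with_commas(n: u64) -> String {
        let digits = n.to_string();
        let mut out = String::new();
        for (i, ch) in digits.chars().enumerate() {
            if i > 0 && (digits.len() - i) % 3 == 0 {
                out.push(',');
            }
            out.push(ch);
        }
        out
    }
    
    /// Render a remaining-percentage bar of `width` cells, rounding to the nearest cell.
    pub fn remaining_bar(remaining_percent: u32, width: u32) -> String {
        let filled = ((remaining_percent.min(100) * width + 50) / 100) as usize;
        let empty = width as usize - filled;
        format!("[{}{}]", "█".repeat(filled), "░".repeat(empty))
    }
    
    /// Detail line from the string-typed `used` / `limit` fields; `None` if either is not numeric.
    pub fn credits_detail(used: &str, limit: &str) -> Option<String> {
        let used: u64 = used.parse().ok()?;
        let limit: u64 = limit.parse().ok()?;
        Some(format!("{} of {} credits used", with_commas(used), with_commas(limit)))
    }
    ```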
    
    ## Testing
    
    Tests: generated app-server schema verification, protocol tests,
    backend-client tests, app-server integration coverage, TUI snapshot
    coverage, formatting, and workspace lint cleanup.
  • app-server: remove experimental persist_extended_history bool flag (#25712)
    ## Summary
    
    Remove the dead experimental `persistExtendedHistory` app-server flag
    and collapse rollout persistence to the single policy app-server already
    used.
    
    ## What Changed
    
    - Removed `persistExtendedHistory` from v2 thread start/resume/fork
    params and deleted its deprecation notice path.
    - Removed the persistence-mode enums and plumbing through core, rollout,
    and thread-store.
    - Made rollout filtering mode-free, keeping the existing limited
    persisted-history behavior.
    
    ## Test Plan
    
    - `just write-app-server-schema`
    - `cargo nextest run --no-fail-fast -p codex-app-server-protocol
    schema_fixtures`
    - `cargo nextest run --no-fail-fast -p codex-app-server
    thread_shell_command_history_responses_exclude_persisted_command_executions`
    - `cargo nextest run --no-fail-fast -p codex-rollout -p
    codex-thread-store`
    - final `rg` for removed flag/type names
  • store and expose parent_thread_id on Threads (#25113)
    ## Why
    
    Review discussion on
    https://github.com/openai/codex/pull/24161#discussion_r3325692763
    revealed a subagent data modeling issue: we overloaded `forked_from_id`
    to also mean `parent_thread_id`. That's incorrect because guardian and
    review subagents can be subagents without forking the main thread's
    history.
    
    The solution here is to explicitly store a new `parent_thread_id` on
    `SessionMeta`, alongside `forked_from_id` which already exists. While
    we're at it, also expose it in the app-server protocol on the `Thread`
    object.
    
    A thread->subagent relationship and a fork of thread history are
    orthogonal concepts.
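    
    Here is a sketch of the two orthogonal fields, using hypothetical constructors. The sketch treats a user-initiated fork as having no parent thread. That is an assumption made for illustration.
    
    ```rust
    /// Hypothetical lineage fields on a thread's session metadata.
    #[derive(Debug, Default, Clone, PartialEq)]
    pub struct Lineage {
        /// Thread that spawned this one as a subagent, if any.
        pub parent_thread_id: Option<String>,
        /// Thread whose history was copied into this one, if any.
        pub forked_from_id: Option<String>,
    }
    
    impl Lineage {
        /// A guardian or review subagent: has a parent but starts with fresh history.
        pub fn subagent(parent: &str) -> Self {
            Self { parent_thread_id: Some(parent.to_string()), forked_from_id: None }
        }
    
        /// A user-initiated fork (assumed parentless): copies history only.
        pub fn fork(source: &str) -> Self {
            Self { parent_thread_id: None, forked_from_id: Some(source.to_string()) }
        }
    
        /// A subagent that also copies history sets both fields independently.
        pub fn forked_subagent(parent: &str, source: &str) -> Self {
            Self {
                parent_thread_id: Some(parent.to_string()),
                forked_from_id: Some(source.to_string()),
            }
        }
    }
    ```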
    
    ## What Changed
    
    - Added top-level `parent_thread_id` persistence on `SessionMeta` and
    runtime/session plumbing through `SessionConfiguredEvent`,
    `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
    `TurnContext`, and `ModelClient`.
    - Made turn metadata, request headers, analytics, and subagent-start
    events read the separate runtime/top-level parent field instead of
    deriving general parent lineage from `SessionSource` or
    `forked_from_thread_id`.
    - Passed parent lineage separately at delegated subagent, review,
    guardian, agent-job, and multi-agent spawn construction sites;
    copied-history fork lineage remains derived only from `InitialHistory`.
    - Persisted and exposed parent lineage through rollout/thread-store
    projections and app-server v2 `Thread.parentThreadId`.
    - Updated app-server README text and regenerated app-server schema
    fixtures for the additive `parentThreadId` response field.
  • Move memories root setup out of core config (#24758)
    ## Why
    
    Config loading should not create or write-authorize the memories root
    just because memory support exists. Memory startup is the code path that
    actually materializes that tree.
    
    ## What
    
    - Stop creating the memories root during Config load and remove it from
    legacy workspace-write projections.
    - Grant the memories root read access only when the memories feature and
    use_memories are enabled.
    - Create the memories root inside memories startup before seeding
    extension instructions.
    - Update config and startup tests around the ownership boundary.
    
    ## Tests
    
    - `just fmt`
    - `just fix -p codex-core`
    - `just fix -p codex-memories-write`
    - `just test -p codex-core` with the filters
    `memory_tool_makes_memories_root_readable_without_creating_or_widening_writes`,
    `workspace_write_includes_configured_writable_root_once_without_memories_root`,
    `permission_profile_override_keeps_memories_root_out_of_legacy_projection`,
    `permissions_profiles_allow_direct_write_roots_outside_workspace_root`, and
    `default_permissions_profile_populates_runtime_sandbox_policy`
    - `just test -p codex-memories-write memories_startup_creates_memory_root`
    
    Note: a broader `just test -p codex-core` run is not clean in this
    sandbox; it hit a missing `test_stdio_server` plus seatbelt, realtime,
    and environment-sensitive failures. The changed config tests above pass.
  • [codex] add compaction metadata to turn headers (#24368)
    ## Summary
    - Add `request_kind` values for foreground turn, startup prewarm,
    compaction, and detached memory model requests.
    - Attach compaction dispatch metadata to local Responses, legacy
    `/v1/responses/compact`, and remote v2 compact requests.
    - Add the existing logical context-window identifier as `window_id` on
    turn-owned model request metadata.
    - Keep identity fields optional for detached memory requests, while
    still emitting `request_kind="memory"` in non-git/no-sandbox workspaces.
    
    ## Root Cause
    `x-codex-turn-metadata` has more than one producer. Foreground turns and
    compaction requests own a real turn and should carry that turn identity.
    Detached memory stage-one requests do not own a foreground turn, so
    absent identity fields are valid rather than missing data. Startup
    websocket prewarm is also a model request, but it has `generate=false`
    and must not be counted as a foreground turn.
    
    `thread_source` or session source identifies where a thread came from
    (for example review, guardian, or another subagent). `request_kind`
    identifies what the current outbound model request is doing (`turn`,
    `prewarm`, `compaction`, or `memory`). A review or guardian thread can
    issue either a normal turn request or a compaction request, so source
    cannot replace request kind.
    
    ## Behavior / Impact
    - Ordinary foreground requests send `request_kind="turn"`, their real
    identity fields, and `window_id="<thread_id>:<window_generation>"`.
    - Startup websocket warmup requests send `request_kind="prewarm"` so
    they are not counted as foreground turns.
    - Compaction requests send `request_kind="compaction"`, their real
    owning turn identity, the existing `window_id`, and
    `compaction.{trigger,reason,implementation,phase,strategy}`.
    - Detached memory stage-one requests send `request_kind="memory"`
    without `session_id`, `thread_id`, `turn_id`, or `window_id`; when no
    workspace metadata exists, the kind-only header is still emitted.
    - `session_id`, `thread_id`, `turn_id`, and `window_id` remain optional
    in the header schema because detached memory requests do not own a
    foreground turn or context window.
    - `window_id` is not a new ID system: it is copied from the already-sent
    `x-codex-window-id` / WS client metadata value at model-request dispatch
    time.
    - Existing `x-codex-window-id` HTTP/WS emission, value format,
    generation advancement, resume behavior, and fork reset behavior are
    unchanged.
    - `request_kind`, `window_id`, and upstream turn-owned identity fields
    remain schema-owned; input `responsesapi_client_metadata` cannot replace
    their canonical values.
    - No table, DAG, export, app-server API, or MCP `_meta` schema changes
    are included.
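    
    A sketch of how the header identity fields relate to `request_kind`. It models the header as ordered key/value pairs instead of JSON. `RequestKind::as_str` returns the wire values listed above. `TurnIdentity` and `header_fields` are hypothetical names. `session_id` and the compaction sub-object are omitted for brevity.
    
    ```rust
    /// The four request kinds listed above.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum RequestKind {
        Turn,
        Prewarm,
        Compaction,
        Memory,
    }
    
    impl RequestKind {
        pub fn as_str(self) -> &'static str {
            match self {
                RequestKind::Turn => "turn",
                RequestKind::Prewarm => "prewarm",
                RequestKind::Compaction => "compaction",
                RequestKind::Memory => "memory",
            }
        }
    }
    
    /// Hypothetical identity subset; every field is optional because detached
    /// memory requests own no foreground turn or context window.
    #[derive(Debug, Default)]
    pub struct TurnIdentity {
        pub thread_id: Option<String>,
        pub turn_id: Option<String>,
        pub window_generation: Option<u64>,
    }
    
    /// Build header fields in a fixed order; absent identities are omitted,
    /// so a memory request still emits a kind-only header.
    pub fn header_fields(kind: RequestKind, id: &TurnIdentity) -> Vec<(&'static str, String)> {
        let mut fields = vec![("request_kind", kind.as_str().to_string())];
        if let Some(thread_id) = &id.thread_id {
            fields.push(("thread_id", thread_id.clone()));
            if let Some(generation) = id.window_generation {
                // Same `<thread_id>:<window_generation>` value already sent as `x-codex-window-id`.
                fields.push(("window_id", format!("{thread_id}:{generation}")));
            }
        }
        if let Some(turn_id) = &id.turn_id {
            fields.push(("turn_id", turn_id.clone()));
        }
        fields
    }
    ```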
    
    A compaction attempt stopped by a pre-compact hook issues no model
    request and therefore has no request header; its outcome remains in
    analytics events. Status, error, duration, and token deltas also remain
    analytics fields rather than request-header fields.
    
    Future detached-memory attribution using a real initiating turn ID as
    `trigger_turn_id` is intentionally not part of this PR.
    
    ## Sync With Main
    - Final pushed head `716342e79` is rebased onto `origin/main@0d37db4b2`.
    - The metadata conflict came from upstream `#24160`, which added
    `forked_from_thread_id` on the same `turn_metadata` surface. Resolution
    preserves that field and its protection from client metadata override
    alongside this PR's request-kind, compaction, and window-id fields.
    - While resolving the overlapping commits, I removed an accidental
    recursive model-request overlay and a duplicate detached-memory header
    builder before completing the rebase.
    
    ## Latency / User Experience Boundary
    - Foreground turns perform no new filesystem, git, or network work. New
    fields are inserted into metadata already serialized for outgoing
    requests.
    - Compaction issues the same model/HTTP requests with the same prompt,
    model, service tier, and sampling settings; only metadata bytes change.
    - Startup prewarm already sent metadata; it is now correctly classified
    as `prewarm`.
    - Non-git detached memory now sends a small kind-only metadata header
    rather than no header.
    - This client diff adds no user-visible latency mechanism beyond
    negligible serialization and header bytes on already-existing requests.
    
    ## Validation
    On conflict-resolved head `1d35c2cfb` based on `origin/main@487521733`:
    - `just fmt` (passed)
    - `just fix -p codex-core` (passed)
    - `git diff --check origin/main...HEAD` (passed)
    - `just test -p codex-core -E 'test(turn_metadata) |
    test(websocket_first_turn_uses_startup_prewarm_and_create) |
    test(responses_stream_includes_turn_metadata_header_for_git_workspace_e2e)
    |
    test(responses_websocket_forwards_turn_metadata_on_initial_and_incremental_create)
    | test(remote_compact_v2_retries_failures_with_stream_retry_budget) |
    test(window_id_advances_after_compact_persists_on_resume_and_resets_on_fork)'`
    (`23 passed`; `bench-smoke` passed)
    - `just test -p codex-app-server -E
    'test(turn_start_forwards_client_metadata_to_responses_request_v2) |
    test(turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2)
    | test(auto_compaction_remote_emits_started_and_completed_items)'` (`3
    passed`; `bench-smoke` passed)
    - `just test -p codex-memories-write` (`29 passed`; `bench-smoke`
    passed)
  • Uprev Rust toolchain pins to 1.95.0 (#24684)
    ## Summary
    - Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
    Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
    environment config.
    - Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
    match the new version.
    - Leave purpose-specific toolchains unchanged, including the
    `argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
    build pin.
    - Includes fixes for new lints from `just fix` and a few codex-authored
    fixes for lints without a suggestion.
  • Add experimental turn additional context (#24154)
    ## Summary
    
    Adds experimental `additionalContext` support to `turn/start` and
    `turn/steer` so clients can provide ephemeral external context, such as
    browser or automation state, without turning that plumbing into a
    visible user prompt or triggering user-prompt lifecycle behavior.
    
    ## API Shape
    
    The parameter shape is:
    
    ```ts
    additionalContext?: Record<string, {
      value: string
      kind: "untrusted" | "application"
    }> | null
    ```
    
    Example:
    
    ```json
    {
      "additionalContext": {
        "browser_info": {
          "value": "Active tab is CI failures.",
          "kind": "untrusted"
        },
        "automation_info": {
          "value": "CI rerun is in progress.",
          "kind": "application"
        }
      }
    }
    ```
    
    The keys are opaque and caller-defined.
    
    ## Context Injection
    
    When provided, accepted entries are inserted into model context as
    hidden contextual message items, not as visible thread user-message
    items.
    
    `kind: "untrusted"` entries are inserted with role `user`:
    
    ```text
    <external_${key}>${value}</external_${key}>
    ```
    
    `kind: "application"` entries are inserted with role `developer`:
    
    ```text
    <${key}>${value}</${key}>
    ```
    
    Values are not escaped. Each value is truncated to 1k approximate tokens
    before wrapping.
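    
    Here is a sketch of the wrapping rules. `wrap_entry`, `ContextKind`, and `Role` are hypothetical names. The real token approximation is not described here. For that reason, the sketch models the 1k-approximate-token truncation as a byte cap supplied by the caller.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ContextKind {
        Untrusted,
        Application,
    }
    
    #[derive(Debug, PartialEq)]
    pub enum Role {
        User,
        Developer,
    }
    
    /// Wrap one entry. Values are not escaped; truncation happens before wrapping
    /// and is modeled as a byte cap (assumption) cut at a character boundary.
    pub fn wrap_entry(key: &str, value: &str, kind: ContextKind, max_bytes: usize) -> (Role, String) {
        let mut end = value.len().min(max_bytes);
        while !value.is_char_boundary(end) {
            end -= 1;
        }
        let value = &value[..end];
        match kind {
            ContextKind::Untrusted => (Role::User, format!("<external_{key}>{value}</external_{key}>")),
            ContextKind::Application => (Role::Developer, format!("<{key}>{value}</{key}>")),
        }
    }
    ```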
    
    For `turn/start`, accepted additional context is inserted before normal
    user input. For `turn/steer`, additional context is merged only when the
    steer includes non-empty user input; context-only steers still reject as
    empty input.
    
    ## Dedupe Strategy
    
    `AdditionalContextStore` lives on session state and stores the latest
    complete additional-context map.
    
    Each `turn/start` or non-empty `turn/steer` treats its
    `additionalContext` as the current complete set of values. Entries are
    injected only when the key is new or the exact entry for that key
    changed, including `value` or `kind`. After merging, the store is
    replaced with the provided map, so omitted keys are removed from the
    retained set and can be injected again later if reintroduced.
    
    Omitting `additionalContext`, passing `null`, or passing an empty object
    resets the store to empty and injects nothing.
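    
    Here is a sketch of the dedupe rule. `AdditionalContextStore` here is a hypothetical stand-in for the real type. Each entry is modeled as a `(value, kind)` pair, and the method name `merge` is illustrative.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// One additional-context entry as `(value, kind)`; `kind` is kept as a string here.
    pub type Entry = (String, String);
    
    /// Hypothetical stand-in for `AdditionalContextStore`.
    #[derive(Default)]
    pub struct AdditionalContextStore {
        latest: BTreeMap<String, Entry>,
    }
    
    impl AdditionalContextStore {
        /// Returns the keys to inject for this turn, then replaces the retained map
        /// so omitted keys are forgotten. `None` (omitted or null) acts as empty.
        pub fn merge(&mut self, incoming: Option<BTreeMap<String, Entry>>) -> Vec<String> {
            let incoming = incoming.unwrap_or_default();
            // Inject only keys that are new or whose value/kind changed.
            let to_inject: Vec<String> = incoming
                .iter()
                .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
                .map(|(key, _)| key.clone())
                .collect();
            self.latest = incoming;
            to_inject
        }
    }
    ```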
    
    ## What Changed
    
    - Threads experimental v2 `additionalContext` through app-server into
    core turn start and steer handling.
    - Adds separate contextual fragment types for untrusted user-role
    context and application developer-role context.
    - Uses pending response input items so additional context can be
    combined with normal user input without treating it as prompt text.
    - Adds integration coverage for start/steer flow, role routing,
    dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
    behavior, empty context-only steer rejection, external-fragment marker
    matching, and truncation.
  • Move memory state to a dedicated SQLite DB (#24591)
    ## Summary
    
    Generated memory rows and their stage-one/stage-two job state currently
    live in `state_5.sqlite` alongside thread metadata. That makes memory
    cleanup and regeneration share the main state schema even though those
    rows are memory-pipeline data and can be rebuilt independently from the
    durable thread records.
    
    This PR moves the memory-owned tables into a dedicated
    `memories_1.sqlite` runtime database while keeping thread metadata in
    `state_5.sqlite`.
    
    ## Changes
    
    - Adds a separate memories DB runtime, migrator, path helpers, telemetry
    kind, and Bazel compile data for `state/memory_migrations`.
    - Introduces `MemoryStore` behind `StateRuntime::memories()` and moves
    memory table/job operations onto that store.
    - Drops the old memory tables from the state DB and recreates their
    schema in `state/memory_migrations/0001_memories.sql`.
    - Updates memory startup, citation usage tracking, rollout pollution
    handling, `debug clear-memories`, and app-server `memory/reset` to
    operate through the memories DB.
    - Preserves cross-DB behavior by hydrating thread metadata from the
    state DB when selecting visible memory outputs and checking stage-one
    staleness.
    
    ## Verification
    
    - Added/updated `codex-state` tests for deleted-thread memory visibility
    and already-polluted phase-two enqueue behavior.
    - Updated `debug clear-memories`, app-server `memory/reset`, and
    memories startup tests to seed and assert memory rows through
    `memories_1.sqlite`.
  • chore: move memory prompt builder into extension (#24558)
    ## Why
    
    The memories extension now owns the read-path developer instructions it
    injects at thread start. Keeping that prompt builder and template in
    `codex-memories-read` left the extension depending on a helper crate for
    extension-specific prompt assembly, and kept async template/truncation
    dependencies in the read crate after the remaining read surface no
    longer needed them.
    
    ## What changed
    
    - Moved `prompts.rs`, its tests, and `templates/memories/read_path.md`
    from `memories/read` into `ext/memories`.
    - Wired `MemoryExtension` to call the local prompt builder and added the
    moved templates to `ext/memories/BUILD.bazel` compile data.
    - Removed the now-unused prompt export and prompt-related dependencies
    from `codex-memories-read`.
    
    ## Testing
    
    - Not run locally.
  • chore: drop orphaned codex memories MCP crate (#24555)
    ## Why
    
    The memory read-tool surface had two implementations: the app-server
    extension path under `ext/memories`, and an unused `codex-memories-mcp`
    workspace crate under `memories/mcp`. The MCP crate no longer has
    reverse dependents, so keeping it around preserves duplicate backend,
    schema, and tool code that is not part of the live app-server memory
    path.
    
    Dropping the orphaned crate makes the remaining memory crate split
    clearer: `memories/read` owns read-path prompt/citation helpers,
    `memories/write` owns the write pipeline, and `ext/memories` owns the
    app-server extension integration.
    
    ## What changed
    
    - Removed the `memories/mcp` crate and its Bazel/Cargo metadata.
    - Removed `memories/mcp` from the Rust workspace and lockfile.
    - Updated `memories/README.md` so it only lists the remaining reusable
    memory crates.
    
    ## Verification
    
    - `cargo metadata --format-version 1 --no-deps` succeeds.
  • [5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
    **Stack position:** [5 of 7]
    
    ## Summary
    
    This PR adds `Op::ThreadSettings`, a queued settings-only update
    mechanism for changing stored thread settings without starting a new
    turn. It also removes the legacy `Op::OverrideTurnContext` in the same
    layer, so reviewers can see the replacement and deletion together.
    
    ## Changes
    
    - Add `Op::ThreadSettings` for settings-only queued updates.
    - Emit `ThreadSettingsApplied` with the effective thread settings
    snapshot after core applies an update.
    - Route settings-only updates through the same submission queue as user
    input.
    - Migrate remaining `OverrideTurnContext` tests and callers to the
    queued `Op::ThreadSettings` path.
    - Delete `Op::OverrideTurnContext` from the core protocol and submission
    loop.
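    
    The queued model can be sketched in Rust. Only `Op::UserInput`, `Op::ThreadSettings`, and `ThreadSettingsApplied` come from this PR. All other type, field, and event names are hypothetical, and the real settings payload carries more fields.
    
    The point of the sketch is ordering. Settings-only updates share the submission queue with user input, so an update queued between two inputs takes effect exactly between their turns.
    
    ```rust
    use std::collections::VecDeque;
    
    /// Hypothetical subset of overridable settings; `None` leaves a field as is.
    #[derive(Clone, Debug, Default, PartialEq)]
    pub struct ThreadSettingsOverrides {
        pub model: Option<String>,
        pub cwd: Option<String>,
    }
    
    /// Hypothetical effective settings stored on the thread.
    #[derive(Clone, Debug, PartialEq)]
    pub struct EffectiveThreadSettings {
        pub model: String,
        pub cwd: String,
    }
    
    impl EffectiveThreadSettings {
        /// Applies only the fields the caller actually set.
        fn apply(&mut self, overrides: &ThreadSettingsOverrides) {
            if let Some(model) = &overrides.model {
                self.model = model.clone();
            }
            if let Some(cwd) = &overrides.cwd {
                self.cwd = cwd.clone();
            }
        }
    }
    
    /// End state after PR5: one input-bearing op and one settings-only op.
    pub enum Op {
        UserInput { text: String, settings: ThreadSettingsOverrides },
        ThreadSettings(ThreadSettingsOverrides),
    }
    
    #[derive(Debug, PartialEq)]
    pub enum Event {
        TurnStarted { text: String, model: String },
        ThreadSettingsApplied(EffectiveThreadSettings),
    }
    
    /// Drains a single submission queue strictly in order.
    pub fn run_queue(settings: &mut EffectiveThreadSettings, queue: &mut VecDeque<Op>) -> Vec<Event> {
        let mut events = Vec::new();
        while let Some(op) = queue.pop_front() {
            match op {
                Op::UserInput { text, settings: overrides } => {
                    // Stored defaults update together with the turn.
                    settings.apply(&overrides);
                    events.push(Event::TurnStarted { text, model: settings.model.clone() });
                }
                Op::ThreadSettings(overrides) => {
                    // Settings-only update: emit a snapshot and start no turn.
                    settings.apply(&overrides);
                    events.push(Event::ThreadSettingsApplied(settings.clone()));
                }
            }
        }
        events
    }
    ```
    
    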
    
    This stack addresses #20656 and #22090.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • [1 of 7] Add thread settings to UserInput (#23080)
    **Stack position:** [1 of 7]
    
    ## Summary
    
    The first three PRs in this stack are a cleanup pass before the actual
    thread settings API work.
    
    Today, core has several overlapping "user input" ops: `UserInput`,
    `UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how
    much next-turn state they carry, which makes the later queued thread
    settings update harder to reason about and review.
    
    This PR starts that cleanup by adding the shared
    `ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry
    it. Existing variants remain in place here, so this layer is mostly a
    behavior-preserving API shape change plus mechanical constructor
    updates.
    
    ## End State After PR3
    
    By the end of PR3, `Op::UserInput` is the only "user input" core op. It
    can carry optional thread settings overrides for callers that need to
    update stored defaults with a turn, while callers without updates use
    empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are
    deleted.
    
    ## End State After PR5
    
    By the end of PR5, core will have only two ops for this area:
    
    - `Op::UserInput` for user-input-bearing submissions.
    - `Op::ThreadSettings` for settings-only updates.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080) (this PR)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • Densify and version memory summaries (#23148)
    ## Why
    
    `memory_summary.md` is injected into every session, so its value depends
    on staying compact, navigational, and easy to regenerate when the
    expected shape changes. The previous consolidation prompt encouraged a
    broad actionable inventory and allowed older summary structures to be
    patched in place, which makes it easier for stale or overly verbose
    summaries to keep accumulating.
    
    This change makes the summary format explicitly versioned and biases
    Phase 2 memory consolidation toward denser prompt-loaded context.
    
    ## What changed
    
    - Require `memory_summary.md` to begin with an exact `v1` header.
    - Teach consolidation to regenerate `memory_summary.md` from scratch
    when the header is missing or incompatible, while still allowing
    incremental updates to `MEMORY.md`.
    - Tighten the `memory_summary.md` instructions so it acts as a compact
    routing/index layer instead of a second handbook.
    - Lower `MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT` from
    `5_000` to `2_500` so the runtime prompt budget matches the denser
    summary target.
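    
    The header rule is enforced by the consolidation prompt rather than by code. Its decision logic can still be sketched in Rust. The literal header string below is a placeholder assumption, since the PR only requires an exact `v1` header. Revising in place when the header matches is also inferred rather than stated.
    
    ```rust
    /// Placeholder; the actual `v1` header text is defined by the prompt template.
    pub const SUMMARY_HEADER_V1: &str = "# Memory Summary v1";
    
    #[derive(Debug, PartialEq)]
    pub enum SummaryAction {
        /// Exact header present: the existing summary may be revised.
        Revise,
        /// Header missing or incompatible: rebuild the file from scratch.
        Regenerate,
    }
    
    pub fn summary_action(existing: Option<&str>) -> SummaryAction {
        // Only the first line counts; a header further down does not qualify.
        match existing.and_then(|text| text.lines().next()) {
            Some(first) if first.trim_end() == SUMMARY_HEADER_V1 => SummaryAction::Revise,
            _ => SummaryAction::Regenerate,
        }
    }
    ```
    
    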
    
    ## Verification
    
    Not run; this is a prompt/template update plus a prompt budget constant
    change.
  • chore: drop built-in MCPs (#22173)
    Drops the built-in MCPs, which were never used.
  • [codex] request desktop attestation from app (#20619)
    ## Summary
    
    TL;DR: teaches `codex-rs` / app-server to request a desktop-provided
    attestation token and attach it as `x-oai-attestation` on the scoped
    ChatGPT Codex request paths.
    
    ![DeviceCheck attestation
    interface](https://raw.githubusercontent.com/openai/codex/dev/jm/devicecheck-diagram-assets/pr-assets/devicecheck-attestation-interface.png)
    
    ## Details
    
    This PR teaches the Codex app-server runtime how to request and attach
    an attestation token. It does not generate DeviceCheck tokens directly;
    instead, it relies on the connected desktop app to advertise that it can
    generate attestation and then asks that app for a fresh header value
    when needed.
    
    The flow is:
    
    1. The Codex desktop app connects to app-server.
    2. During `initialize`, the app can advertise that it supports
    `requestAttestation`.
    3. Before app-server calls selected ChatGPT Codex endpoints, it sends
    the internal server request `attestation/generate` to the app.
    4. app-server receives a pre-encoded header value back.
    5. app-server forwards that value as `x-oai-attestation` on the scoped
    outbound requests.
    
    The code in this repo is mostly protocol and runtime plumbing: it adds
    the app-server request/response shape, introduces an attestation
    provider in core, wires that provider into Responses / compaction /
    realtime setup paths, and covers the intended scoping with tests. The
    signed macOS DeviceCheck generation remains owned by the desktop app PR.
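    
    The app-server side of this flow can be sketched in Rust. Only the `x-oai-attestation` header name comes from the PR. `AttestationProvider`, `outbound_headers`, and the `is_scoped` path predicate are hypothetical names. Omitting the header when the app returns no value is an assumption, not documented behavior.
    
    ```rust
    /// Header attached to scoped outbound requests.
    pub const ATTESTATION_HEADER: &str = "x-oai-attestation";
    
    /// Hypothetical stand-in for the core attestation provider. In app-server it
    /// would send `attestation/generate` to the connected app and await a
    /// pre-encoded header value.
    pub trait AttestationProvider {
        fn generate(&self) -> Option<String>;
    }
    
    /// Hypothetical scoping rule; the real code scopes specific Responses,
    /// compaction, and realtime setup paths.
    fn is_scoped(path: &str) -> bool {
        path.starts_with("/backend-api/codex/")
    }
    
    /// `provider` is `None` when the app did not advertise `requestAttestation`
    /// during `initialize`.
    pub fn outbound_headers(path: &str, provider: Option<&dyn AttestationProvider>) -> Vec<(String, String)> {
        let mut headers = Vec::new();
        if !is_scoped(path) {
            return headers;
        }
        if let Some(provider) = provider {
            // Ask for a fresh value per request; assumption: omit the header if none comes back.
            if let Some(value) = provider.generate() {
                headers.push((ATTESTATION_HEADER.to_string(), value));
            }
        }
        headers
    }
    ```
    
    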
    
    ## Related PR
    
    - Codex desktop app implementation:
    https://github.com/openai/openai/pull/878649
    
    ## Validation
    
    <details>
    <summary>Tests run</summary>
    
    ```sh
    cargo test -p codex-app-server-protocol
    cargo test -p codex-core attestation --lib
    cargo test -p codex-app-server --lib attestation
    ```
    
    Also ran:
    
    ```sh
    just fix -p codex-core
    just fix -p codex-app-server
    just fix -p codex-app-server-protocol
    just fmt
    just write-app-server-schema
    ```
    
    </details>
    
    <details>
    <summary>E2E DeviceCheck validation</summary>
    
    First validated the signed desktop app boundary directly: launched a
    packaged signed `Codex.app`, sent `attestation/generate`, decoded the
    returned `v1.` attestation header, and validated the extracted
    DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using
    bundle ID `com.openai.codex`. Apple returned `status_code: 200` and
    `is_ok: true`.
    
    Then ran the fuller app + app-server flow. The packaged `Codex.app`
    launched a current-branch app-server via `CODEX_CLI_PATH`, and a local
    MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server
    requested `attestation/generate` from the real Electron app process, and
    the intercepted `/backend-api/codex/responses` traffic included
    `x-oai-attestation` on both routes:
    
    ```text
    GET  /backend-api/codex/responses  Upgrade: websocket  x-oai-attestation: present
    POST /backend-api/codex/responses  Upgrade: none       x-oai-attestation: present
    ```
    
    The captured header decoded to a DeviceCheck token that also validated
    with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`,
    team `2DC432GLL2`).
    
    </details>
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • nit: comment (#21763)
    Because of an async discussion
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` runs both standard Rust tests and doctests. Doctest
    discovery turns out to be fairly slow, and you pay that cost even for
    crates that contain no doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    more than 4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, the speedup is more than 2x. Before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: make built-in MCPs first-class runtime servers (#21356)
    ## DISCLAIMER
    This is experimental; no production service should rely on it.
    
    ## Why
    
    Built-in MCPs are product-owned runtime capabilities, but they were
    previously flattened into the same config-backed stdio path as
    user-configured servers. That made them depend on a hidden `codex
    builtin-mcp` re-exec path, exposed them through config-oriented CLI
    flows, and erased distinctions the runtime needs to preserve—most
    notably whether an MCP call should count as external context for
    memory-mode pollution.
    
    ## What changed
    
    - Model product-owned built-ins separately from config-backed MCP
    servers via `BuiltinMcpServer` and `EffectiveMcpServer`.
    - Launch built-ins in process through a reusable async transport instead
    of the hidden `builtin-mcp` stdio subcommand.
    - Keep config-oriented CLI operations such as `codex mcp
    list/get/login/logout` scoped to configured servers, while merging
    built-ins only into the effective runtime server set.
    - Retain server metadata after launch so parallel-tool support and
    context classification come from the live server set; built-in
    `memories` is now classified as local Codex state rather than external
    context.
    
    ## Test plan
    
    - `cargo test -p codex-mcp`
    - `cargo test -p codex-core --test suite
    builtin_memories_mcp_call_does_not_mark_thread_memory_mode_polluted_when_configured`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • 2- Use string service tiers in session protocol (#20971)
    ## Summary
    - change service tier fields in the session, op, and app-server
    protocols from the closed enum to string tier ids (a breaking change)
    - send the service tier string directly through model requests, prewarm,
    compaction, memories, and TUI/app-server turn starts
    - regenerate app-server protocol JSON/TypeScript schemas, removing the
    standalone ServiceTier TS enum
    
    ## Verification
    - just fmt
    - cargo check -p codex-core -p codex-app-server -p codex-tui
    - just write-app-server-schema
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: add session_id (#20437)
    ## Summary
    
    Related to
    https://openai.slack.com/archives/C095U48JNL9/p1777537279707449
    TLDR:
    We update the meaning of session ids and thread ids:
    * `thread_id` is unchanged
    * `session_id` becomes an id shared by every thread under a root
    thread (i.e., every sub-agent shares the same session id)
    
    This PR introduces an explicit `SessionId` and threads it through the
    protocol/client boundary so `session_id` and `thread_id` can diverge
    when they need to, while preserving compatibility for older serialized
    `session_configured` events.
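    
    The new id semantics can be sketched in Rust. This minimal sketch assumes plain string newtypes. It also assumes that a root thread seeds its session id from its own thread id; the PR only says the two ids can now diverge. The fallback for older `session_configured` events without a session id is likewise an assumption.
    
    ```rust
    /// Hypothetical newtypes; the real ids may wrap UUIDs.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct ThreadId(pub String);
    
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct SessionId(pub String);
    
    pub struct Thread {
        pub thread_id: ThreadId,
        pub session_id: SessionId,
    }
    
    impl Thread {
        /// Root thread: assumed to seed the session id from its own thread id.
        pub fn root(thread_id: ThreadId) -> Self {
            let session_id = SessionId(thread_id.0.clone());
            Thread { thread_id, session_id }
        }
    
        /// Sub-agents get their own thread id but inherit the session id,
        /// so every thread under one root shares it.
        pub fn spawn_subagent(&self, thread_id: ThreadId) -> Self {
            Thread { thread_id, session_id: self.session_id.clone() }
        }
    }
    
    /// Assumed compatibility rule: an older event without a session id
    /// falls back to the thread id.
    pub fn session_id_from_event(thread_id: &ThreadId, session_id: Option<SessionId>) -> SessionId {
        session_id.unwrap_or_else(|| SessionId(thread_id.0.clone()))
    }
    ```
    
    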
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex-analytics] rework thread_source for thread analytics (#20949)
    ## Summary
    - make `thread_source` an explicit optional thread-level field on
    `thread/start`, `thread/fork`, and returned thread payloads
    - persist `thread_source` in rollout/session metadata so resumed live
    threads retain the original value
    - replace the old best-effort `session_source` -> `thread_source`
    mapping with an explicit caller-supplied analytics classification
    
    ## Why
    Before this change, analytics `thread_source` was populated by a
    best-effort mapping from `session_source`. `session_source` describes
    the runtime/client surface, not the actual thread-level origin, so that
    projection was not accurate enough to distinguish cases such as `user`,
    `subagent`, `memory_consolidation`, and future thread origins reliably.
    
    Making `thread_source` explicit keeps one thread-level analytics field
    while letting callers provide the real classification directly instead
    of recovering it indirectly from `session_source`.
    
    ## Impact
    For new analytics events, `thread_source` now reflects the explicit
    thread-level classification supplied by the caller rather than an
    inferred value derived from `session_source`. Existing protocol fields
    remain optional; callers that omit `threadSource` now produce `null`
    instead of a best-effort inferred value.
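    
    In Rust terms, the analytics value now comes only from the caller. The `ThreadSource` variants below mirror the example origins named above. The enum, function, and parameter names are hypothetical.
    
    ```rust
    /// Example origins named in the PR; the real enum may have more variants.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ThreadSource {
        User,
        Subagent,
        MemoryConsolidation,
    }
    
    impl ThreadSource {
        pub fn as_str(self) -> &'static str {
            match self {
                ThreadSource::User => "user",
                ThreadSource::Subagent => "subagent",
                ThreadSource::MemoryConsolidation => "memory_consolidation",
            }
        }
    }
    
    /// `_session_source` is deliberately ignored: it describes the client
    /// surface, not the thread's origin. `None` is reported as `null`.
    pub fn analytics_thread_source(_session_source: &str, explicit: Option<ThreadSource>) -> Option<&'static str> {
        explicit.map(ThreadSource::as_str)
    }
    ```
    
    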
    
    ## Validation
    - `just write-app-server-schema`
    - `cargo test -p codex-analytics -p codex-core -p
    codex-app-server-protocol --no-run`
    - `cargo test -p codex-app-server-protocol
    generated_ts_optional_nullable_fields_only_in_params`
    - `cargo test -p codex-analytics
    thread_initialized_event_serializes_expected_shape`
    - `cargo test -p codex-core
    resume_stopped_thread_from_rollout_preserves_thread_source`
  • feat: add normalized matching to memory search (#21205)
    ## Why
    
    Memory search currently treats separators literally, so callers need to
    know whether a stored term uses spaces, hyphens, or no separators at
    all. That makes recall brittle for terms such as `MultiAgentV2` vs.
    `multi agent v2` and `cold-resume` vs. `cold resume`.
    
    ## What changed
    
    - Add an opt-in `normalized` mode to memory search that removes
    non-alphanumeric separators after any requested case folding.
    - Thread the new flag through the MCP `search` tool into the local
    backend while keeping existing literal matching as the default.
    - Reject queries that normalize to an empty string, and add regression
    coverage for both normalized matching and that validation path.
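    
    The `normalized` mode can be sketched in Rust. This minimal sketch assumes that Unicode `char::is_alphanumeric` defines what is *not* a separator. It also assumes results are 1-based line numbers. The function names and result shape are hypothetical, not the backend's.
    
    ```rust
    /// Folds case first (when requested), then drops every non-alphanumeric char.
    pub fn normalize(text: &str, case_insensitive: bool) -> String {
        let folded = if case_insensitive { text.to_lowercase() } else { text.to_string() };
        folded.chars().filter(|c| c.is_alphanumeric()).collect()
    }
    
    /// Returns 1-based numbers of lines whose normalized text contains the
    /// normalized query; rejects queries that normalize to an empty string.
    pub fn search_normalized(contents: &str, query: &str, case_insensitive: bool) -> Result<Vec<usize>, String> {
        let needle = normalize(query, case_insensitive);
        if needle.is_empty() {
            return Err(format!("query {query:?} is empty after normalization"));
        }
        Ok(contents
            .lines()
            .enumerate()
            .filter(|(_, line)| normalize(line, case_insensitive).contains(needle.as_str()))
            .map(|(i, _)| i + 1)
            .collect())
    }
    ```
    
    With this rule, `multi agent v2` and `MultiAgentV2` both reduce to `multiagentv2` under case folding, so either spelling finds the other.
    
    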
    
    ## Testing
    
    - `cargo test -p codex-memories-mcp`
  • feat: support windowed multi-query memory search (#21204)
    ## Why
    
    Memory search currently supports either independent substring matches or
    matches that require every query to appear on the same line. That is too
    restrictive for memory files where related terms often land on nearby
    lines in the same note or bullet block.
    
    ## What changed
    
    - Replace the old `all` match mode with explicit tagged modes:
    `all_on_same_line` and `all_within_lines { line_count }`.
    - Add windowed matching in `codex-rs/memories/mcp/src/local.rs` so
    callers can require every query to appear within a bounded line range
    while returning only the minimal qualifying windows.
    - Reject invalid zero-width windows and update the MCP tool description
    plus argument parsing to expose the new mode.
    - Add coverage for same-line matching, windowed matching, and invalid
    `line_count` input.
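    
    The semantics of `all_within_lines { line_count }` can be sketched in Rust. This sketch assumes case-sensitive substring matching and 0-based inclusive `(start, end)` windows. The function name and return shape are hypothetical, not the code in `local.rs`.
    
    ```rust
    /// Every query must occur within `line_count` consecutive lines. Returns
    /// only minimal windows: those that contain no other qualifying window.
    pub fn all_within_lines(
        lines: &[&str],
        queries: &[&str],
        line_count: usize,
    ) -> Result<Vec<(usize, usize)>, String> {
        if line_count == 0 {
            return Err("line_count must be at least 1".to_string());
        }
        // For each start line, grow the window until every query has been seen.
        let mut candidates = Vec::new();
        for start in 0..lines.len() {
            let mut seen = vec![false; queries.len()];
            let last = (start + line_count).min(lines.len());
            for end in start..last {
                for (i, q) in queries.iter().enumerate() {
                    if lines[end].contains(*q) {
                        seen[i] = true;
                    }
                }
                if seen.iter().all(|&s| s) {
                    candidates.push((start, end));
                    break;
                }
            }
        }
        // Drop any window that strictly contains another candidate.
        let minimal: Vec<(usize, usize)> = candidates
            .iter()
            .copied()
            .filter(|&(s, e)| {
                !candidates
                    .iter()
                    .any(|&(s2, e2)| (s2, e2) != (s, e) && s <= s2 && e2 <= e)
            })
            .collect();
        Ok(minimal)
    }
    ```
    
    Each start line keeps only its shortest qualifying end, and the final filter removes windows that enclose a tighter match. For example, lines `["alpha", "alpha beta"]` yield only the same-line window `(1, 1)`.
    
    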
    
    ## Verification
    
    - Added targeted coverage in `codex-rs/memories/mcp/src/local_tests.rs`
    for `search_supports_all_within_lines_match_mode` and
    `search_rejects_zero_line_window`.
    - Added server-side parsing coverage in
    `codex-rs/memories/mcp/src/server.rs` for
    `search_args_accept_windowed_all_match_mode`.