342 Commits

  • [codex] Inject agent graph store into ThreadManager (#29736)
    Pick up the AgentGraphStore migration.
    
    - Inject an explicit optional agent graph store into `ThreadManager`
    - Move all calls to spawn, close, recursive resume, and
    subtree/archive/delete/feedback traversal through it
    - Keep using `LocalAgentGraphStore` when SQLite is available
    
    This required some changes to the interface to deal with futures:
    
    - The interface now matches `ThreadStore`'s object-safe pattern by
    returning a boxed `AgentGraphStoreFuture` directly, allowing
    `ThreadManager` to hold `Arc<dyn AgentGraphStore>`
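    
    As a rough illustration of the object-safe pattern described above, the
    sketch below shows a trait method that returns a boxed future so the
    trait can sit behind `Arc<dyn ...>`. The names `GraphStore`,
    `StoreFuture`, `InMemoryStore`, and `block_on` are hypothetical stand-ins
    and are not the real `AgentGraphStore` interface.
    
    ```rust
    use std::collections::HashMap;
    use std::future::Future;
    use std::pin::Pin;
    use std::sync::Arc;
    use std::task::{Context, Poll, Wake, Waker};
    
    // Hypothetical alias; the real `AgentGraphStoreFuture` may differ.
    type StoreFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
    
    // Returning a boxed future instead of using `async fn` keeps the trait
    // object-safe, so it can be held as `Arc<dyn GraphStore>`.
    trait GraphStore: Send + Sync {
        fn children<'a>(&'a self, parent: &'a str) -> StoreFuture<'a, Vec<String>>;
    }
    
    struct InMemoryStore {
        edges: HashMap<String, Vec<String>>,
    }
    
    impl GraphStore for InMemoryStore {
        fn children<'a>(&'a self, parent: &'a str) -> StoreFuture<'a, Vec<String>> {
            Box::pin(async move { self.edges.get(parent).cloned().unwrap_or_default() })
        }
    }
    
    struct NoopWake;
    
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    
    // Minimal executor, adequate only for futures that never truly suspend.
    fn block_on<T>(mut fut: StoreFuture<'_, T>) -> T {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        loop {
            if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
                return value;
            }
        }
    }
    
    fn demo_children() -> Vec<String> {
        let mut edges = HashMap::new();
        edges.insert("root".to_string(), vec!["child".to_string()]);
        // The caller only sees the trait object, as a manager type would.
        let store: Arc<dyn GraphStore> = Arc::new(InMemoryStore { edges });
        let children = block_on(store.children("root"));
        children
    }
    
    fn main() {
        println!("{:?}", demo_children());
    }
    ```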
    
    *Slight behavior change!* Unfiltered subtree enumeration now performs a
    single all-status breadth-first traversal, so a closed grandchild
    beneath an open edge is included; the previous Open-then-Closed
    traversals could not cross mixed-status paths and silently omitted it.
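    
    The behavior change can be sketched with a toy graph. This is a minimal
    model that assumes a tree of status-tagged edges; `Status`, `Edges`,
    `subtree`, and `old_subtree` are illustrative names, not the store's
    actual types.
    
    ```rust
    use std::collections::{HashMap, VecDeque};
    
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    enum Status {
        Open,
        Closed,
    }
    
    // parent -> [(child, status of the edge leading to that child)]
    type Edges = HashMap<&'static str, Vec<(&'static str, Status)>>;
    
    // Breadth-first traversal. `filter: None` follows edges of every status.
    // Assumes the graph is a tree, so no visited set is kept.
    fn subtree(edges: &Edges, root: &str, filter: Option<Status>) -> Vec<&'static str> {
        let mut found = Vec::new();
        let mut queue: VecDeque<&str> = VecDeque::from([root]);
        while let Some(node) = queue.pop_front() {
            for &(child, status) in edges.get(node).into_iter().flatten() {
                if filter.map_or(true, |wanted| wanted == status) {
                    found.push(child);
                    queue.push_back(child);
                }
            }
        }
        found
    }
    
    // Model of the previous approach: one Open-only pass, then one Closed-only pass.
    fn old_subtree(edges: &Edges, root: &str) -> Vec<&'static str> {
        let mut all = subtree(edges, root, Some(Status::Open));
        all.extend(subtree(edges, root, Some(Status::Closed)));
        all
    }
    
    // root --open--> child --closed--> grandchild
    fn sample() -> Edges {
        HashMap::from([
            ("root", vec![("child", Status::Open)]),
            ("child", vec![("grandchild", Status::Closed)]),
        ])
    }
    
    fn main() {
        let edges = sample();
        println!("old: {:?}", old_subtree(&edges, "root"));
        println!("new: {:?}", subtree(&edges, "root", None));
    }
    ```
    
    In this model the two filtered passes never reach `grandchild`, because
    neither pass may cross both an open and a closed edge, while the single
    unfiltered pass does.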
  • chore(core) rm AskForApproval::OnFailure (#28418)
    ## Summary
    Deletes the OnFailure variant of the `AskForApproval` enum. This option
    has been deprecated since #11631.
    
    ## Testing
    - [x] Tests pass
  • Propagate safety buffering events to app-server clients (#29371)
    Responses API safety buffering metadata currently stops at the transport
    boundary, so app-server clients cannot render the in-progress safety
    review state.
    
    This change:
    - decodes and deduplicates `safety_buffering` metadata from Responses
    API SSE and WebSocket events without suppressing the original response
    event
    - emits a typed core event containing the requested model plus backend
    use cases and reasons
    - forwards that event as `turn/safetyBuffering/updated` through
    app-server v2 and updates generated protocol schemas
    - keeps the side-channel event out of persisted rollouts and turn timing
    
    This supports the Codex Apps buffering UX and depends on the Responses
    API backend work in https://github.com/openai/openai/pull/1044569 and
    https://github.com/openai/openai/pull/1044571.
    
    Validation:
    - focused `codex-core` safety-buffering integration test passes
    - `cargo check -p codex-core -p codex-app-server -p
    codex-app-server-protocol`
    - `just fix -p codex-api -p codex-protocol -p codex-core -p
    codex-app-server-protocol -p codex-app-server -p codex-rollout -p
    codex-rollout-trace -p codex-otel`
    - `just fmt`
    - broad package test run: 4,430/4,492 passed; 62 unrelated
    local-environment/concurrency failures involved unavailable test
    binaries, MCP subprocess setup, and app-server timeouts
  • current time reminders impl for system clock (varlatency 2/n) (#28824)
    Stacked on #28822.
    
    ## Summary
    
    - add a host-injectable current-time provider with a built-in system
    implementation
    - record UTC developer reminders in history immediately before due model
    requests
    - keep cadence state per session and force a refresh after compaction
    
    This does NOT include the app server client <-> server clock logic. This
    PR is only for the reminder message & system clock that will be used in
    prod.
    
    ## Testing
    
    - `just test -p codex-core varlatency_`
    - `just clippy -p codex-core -p codex-app-server -p codex-mcp-server -p
    codex-thread-manager-sample`
    - `just fmt`
  • Scope command approvals by execution environment (#28738)
    ## Why
    
    Command approval cache keys included the command and working directory,
    but not the execution environment. An approval for `/workspace` locally
    could therefore be reused for the same command and path on an executor.
    
    ## What changed
    
    - Include the selected environment ID in shell and unified-exec approval
    cache keys.
    - Carry that ID through the normal command approval request so clients
    can show which environment is being approved.
    - Expose the environment through app-server as a required nullable
    `environmentId` and show it in the inline TUI approval prompt.
    - Keep older recorded approval events compatible when the environment is
    absent.
    
    For example, `echo ok` in local `/workspace` and `echo ok` in executor
    `/workspace` now produce different approval keys and separate prompts.
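    
    A minimal sketch of the idea, assuming the cache is a set of hashable
    keys. `ApprovalKey`, `key`, and `needs_prompt` are hypothetical names
    chosen for illustration, and the environment ID `"executor-1"` is made up.
    
    ```rust
    use std::collections::HashSet;
    
    #[derive(Clone, PartialEq, Eq, Hash, Debug)]
    struct ApprovalKey {
        command: Vec<String>,
        cwd: String,
        // `None` stands in for older events recorded without an environment.
        environment_id: Option<String>,
    }
    
    fn key(command: &[&str], cwd: &str, environment_id: Option<&str>) -> ApprovalKey {
        ApprovalKey {
            command: command.iter().map(|part| part.to_string()).collect(),
            cwd: cwd.to_string(),
            environment_id: environment_id.map(str::to_string),
        }
    }
    
    // Returns true when the key was not cached yet, i.e. a prompt is needed.
    fn needs_prompt(cache: &mut HashSet<ApprovalKey>, key: ApprovalKey) -> bool {
        cache.insert(key)
    }
    
    fn main() {
        let mut cache = HashSet::new();
        let local = key(&["echo", "ok"], "/workspace", Some("local"));
        let executor = key(&["echo", "ok"], "/workspace", Some("executor-1"));
        println!("local first: {}", needs_prompt(&mut cache, local.clone()));
        println!("local again: {}", needs_prompt(&mut cache, local));
        println!("executor: {}", needs_prompt(&mut cache, executor));
    }
    ```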
    
    ## Scope
    
    This PR does not change network approvals, Guardian review actions, MCP
    elicitation, full-screen TUI rendering, or environment-ID validation.
    Remote `shell_command` execution itself remains in #28722; this PR only
    makes its approval key environment-aware.
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
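    
    The kind of rewrite involved might look like the sketch below. The
    helper names are invented for illustration. In a real integration-test
    crate the allowance would be the inner attribute
    `#![allow(clippy::expect_used)]` at the crate root; an outer attribute on
    a module is used here only to keep the example a single runnable file.
    
    ```rust
    #[allow(clippy::expect_used)]
    mod helpers {
        // Before: manual panic-based unwrap.
        pub fn parse_port_verbose(raw: &str) -> u16 {
            raw.parse::<u16>()
                .unwrap_or_else(|err| panic!("port should parse: {err}"))
        }
    
        // After: the same behavior on success, expressed with `expect`.
        pub fn parse_port(raw: &str) -> u16 {
            raw.parse::<u16>().expect("port should parse")
        }
    
        // `expect_err` is the counterpart for results that must fail.
        pub fn parse_failure(raw: &str) -> String {
            raw.parse::<u16>()
                .expect_err("input should not parse as a port")
                .to_string()
        }
    }
    
    fn main() {
        println!("{}", helpers::parse_port("8080"));
        println!("{}", helpers::parse_port_verbose("8080"));
        println!("{}", helpers::parse_failure("not-a-port"));
    }
    ```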
  • [codex] Load user instructions through an injected provider (#27101)
    ## Why
    
    We want to remove implicit use of `$CODEX_HOME` from `codex-core` and
    make embedders responsible for supplying user-level instructions. This
    also ensures user instructions load when no primary environment is
    selected.
    
    ## What changed
    
    Stacked on #27415, which makes `codex exec` surface thread-scoped
    runtime warnings.
    
    - Added `UserInstructionsProvider` to `codex-extension-api`, with
    absolute source attribution and recoverable loading warnings.
    - Added `codex-home` with the filesystem-backed provider for
    `AGENTS.override.md` and `AGENTS.md`, preserving precedence, fallback,
    trimming, lossy UTF-8 handling, and the existing uncapped global
    instruction size.
    - Removed global instruction loading from `Config` and required
    `ThreadManager` callers to inject a provider.
    - Load provider instructions once for each fresh root runtime, including
    runtimes without a primary environment. Running sessions retain their
    snapshot, while child agents inherit the parent snapshot without
    invoking the provider.
    - Keep provider instructions separate while loading project `AGENTS.md`,
    then assemble the model-visible instructions with the existing ordering,
    source attribution, warning, and turn-context behavior.
    - Wired the Codex home provider through the CLI, app server, MCP server,
    core facade, and thread-manager sample.
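    
    To make the provider idea concrete, here is a small sketch with an
    in-memory stand-in for the home directory. The trait shape, the
    `LoadedInstructions` type, `FakeHome`, and the root path are all
    assumptions for illustration, and the fallback rule shown (use
    `AGENTS.md` when the override file is missing or blank after trimming)
    is a guess at the preserved behavior rather than a confirmed detail.
    
    ```rust
    use std::collections::HashMap;
    
    // Hypothetical result type: instruction text plus its absolute source path.
    #[derive(Debug, PartialEq)]
    struct LoadedInstructions {
        text: String,
        source: String,
    }
    
    // Hypothetical trait shape; the real `UserInstructionsProvider` may differ.
    trait UserInstructionsProvider {
        fn load(&self) -> Option<LoadedInstructions>;
    }
    
    // In-memory stand-in for a home directory: file name -> raw bytes.
    struct FakeHome {
        root: &'static str,
        files: HashMap<&'static str, Vec<u8>>,
    }
    
    impl UserInstructionsProvider for FakeHome {
        fn load(&self) -> Option<LoadedInstructions> {
            // Assumed precedence: the override file first, then the base file.
            for name in ["AGENTS.override.md", "AGENTS.md"] {
                let Some(bytes) = self.files.get(name) else { continue };
                // Lossy decoding keeps loading recoverable on invalid UTF-8.
                let text = String::from_utf8_lossy(bytes).trim().to_string();
                if !text.is_empty() {
                    let source = format!("{}/{}", self.root, name);
                    return Some(LoadedInstructions { text, source });
                }
            }
            None
        }
    }
    
    fn home(files: &[(&'static str, &str)]) -> FakeHome {
        FakeHome {
            root: "/home/user/.codex",
            files: files
                .iter()
                .map(|(name, body)| (*name, body.as_bytes().to_vec()))
                .collect(),
        }
    }
    
    fn main() {
        let provider = home(&[("AGENTS.override.md", "  be terse \n"), ("AGENTS.md", "base")]);
        println!("{:?}", provider.load());
    }
    ```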
    
    ## Validation
    
    - `just test -p codex-home -p codex-extension-api`
    - `just test -p codex-core agents_md`
    - `just test -p codex-core guardian`
    - `just test -p codex-app-server
    thread_start_without_selected_environment_includes_only_global_instruction_source`
    - `just test -p codex-exec warning`
    - `just bazel-lock-check`
  • multi-agent: add path-based v2 activity tracking (#27007)
    ## Why
    
    Multi-agent v2 identifies agents by canonical paths, but its tool
    handlers still emitted the larger legacy collaboration begin/end events
    built around nickname and role metadata. App-server, rollout-trace,
    analytics, and TUI consumers therefore lacked one compact path-based
    completion signal that behaved consistently across live events and
    replay.
    
    The TUI also needs a bounded `/agent` status surface for v2 agents. It
    should use recent local activity for previews, refresh liveness without
    loading full histories, and keep the legacy picker available when no
    path-backed v2 agent is known.
    
    ## What changed
    
    - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
    `interrupt_agent` legacy lifecycle emissions with a success-only
    `SubAgentActivity` event. The event records the tool call ID, occurrence
    time, affected thread, canonical agent path, and `started`,
    `interacted`, or `interrupted` kind.
    - Expose the activity as a completion-only app-server v2
    `subAgentActivity` thread item in live notifications and reconstructed
    history, regenerate the protocol schemas, and count it in sub-agent tool
    analytics.
    - Track canonical paths from live activity and loaded-thread metadata in
    the TUI, and render the activity in live and replayed transcripts.
    - Make `/agent` list running path-backed agents with summaries from
    bounded local event buffers. Each summary is capped at 240 graphemes,
    the scan is capped at six recent items, only the last three wrapped
    lines are shown, and command output is omitted. Liveness falls back to
    metadata-only `thread/read` when local turn state is unavailable.
    - Persist the activity as a terminal rollout-trace runtime payload and
    reduce it to the corresponding spawn, send, follow-up, or close
    interaction edge. `interrupt_agent` is classified as a close-edge
    operation.
    - Preserve the legacy picker when no path-backed v2 agent is known.
    
    ## Compatibility
    
    App-server v2 clients that consumed `collabAgentToolCall` begin/end
    pairs for these tools must handle the new completion-only
    `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.
    
    ## Screenshot
    
    <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
    src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
    />
    
    ## Testing
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-rollout-trace`
    - Added focused coverage for activity analytics, terminal trace
    serialization, spawn-edge reduction, `interrupt_agent` classification,
    TUI status rendering without aggregated command output, and clearing
    stale running state after a completed turn.
  • Pair thread environment settings (#26687)
    ## Why
    
    Thread cwd and environment selections are a single logical setting in
    core: updating one without the other can silently desynchronize the
    next-turn execution context. This change makes that relationship
    explicit in the internal thread settings flow while preserving the
    existing app-server public API shape.
    
    ## What changed
    
    - Moved the cwd/environment pair through internal
    `ThreadSettingsOverrides.environment_settings` instead of a top-level
    internal `cwd` field.
    - Kept `thread/settings/update` public params unchanged, with app-server
    translating top-level `cwd` into the paired internal settings shape.
    - Moved `Op::UserInput` environment overrides into thread settings so
    user turns and settings updates use the same core path.
    - Updated core, app-server, MCP, memories, sample, and test callsites to
    construct the paired settings shape.
    
    ## Verification
    
    - `git diff --check`
    - Local test run starting after PR creation.
  • [codex] Forward turn moderation metadata through app-server (#25710)
    ## Why
    First-party backends can supply turn-scoped moderation metadata that
    app-server clients need for client-side presentation. Exposing this as
    an experimental typed notification lets opted-in clients consume it
    without interpreting raw Responses API events.
    
    ## What changed
    - forward `response.metadata.openai_chatgpt_moderation_metadata` from
    Responses API SSE and WebSocket streams as turn-scoped moderation
    metadata
    - emit the experimental app-server v2 `turn/moderationMetadata`
    notification with `{ threadId, turnId, metadata }`
    - add app-server integration coverage for the typed moderation metadata
    notification
    
    ## Testing
    - `just test -p codex-core
    build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
    - `just test -p codex-core` (fails locally: 46 failures and 1 timeout,
    primarily missing `test_stdio_server` and shell snapshot timeouts)
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    turn_moderation_metadata_emits_typed_notification_v2`
    - `just test -p codex-app-server` (fails locally: 792 passed, 10 failed,
    and 5 timed out; failures are in existing environment-sensitive tests,
    primarily because nested macOS `sandbox-exec` is not permitted)
    - `just write-app-server-schema --experimental --schema-root
    /tmp/codex-app-server-schema-experimental`
  • store and expose parent_thread_id on Threads (#25113)
    ## Why
    
    The review discussion at
    https://github.com/openai/codex/pull/24161#discussion_r3325692763
    revealed a subagent data modeling issue: we overloaded
    `forked_from_id` to also mean `parent_thread_id`. That is incorrect,
    since guardian and review subagents can be subagents without forking
    the main thread's history.
    
    The solution here is to explicitly store a new `parent_thread_id` on
    `SessionMeta`, alongside `forked_from_id` which already exists. While
    we're at it, also expose it in the app-server protocol on the `Thread`
    object.
    
    A thread->subagent relationship and a fork of thread history are
    orthogonal concepts.
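    
    The orthogonality can be pictured with two independent optional fields.
    This is only a sketch: the struct below is a pared-down, hypothetical
    version of `SessionMeta`, and the helper methods are illustrative.
    
    ```rust
    #[derive(Debug, Clone, Default, PartialEq)]
    struct SessionMeta {
        id: String,
        // Set when this thread started from a copy of another thread's history.
        forked_from_id: Option<String>,
        // Set when this thread was spawned as a subagent of another thread.
        parent_thread_id: Option<String>,
    }
    
    impl SessionMeta {
        fn is_subagent(&self) -> bool {
            self.parent_thread_id.is_some()
        }
    
        fn copied_history(&self) -> bool {
            self.forked_from_id.is_some()
        }
    }
    
    // A guardian-style subagent: has a parent, but did not fork any history.
    fn guardian() -> SessionMeta {
        SessionMeta {
            id: "guardian-1".to_string(),
            forked_from_id: None,
            parent_thread_id: Some("main".to_string()),
        }
    }
    
    // A user-initiated fork: copies history, but is not a subagent.
    fn user_fork() -> SessionMeta {
        SessionMeta {
            id: "fork-1".to_string(),
            forked_from_id: Some("main".to_string()),
            parent_thread_id: None,
        }
    }
    
    fn main() {
        for meta in [guardian(), user_fork()] {
            println!(
                "{}: subagent={} copied_history={}",
                meta.id,
                meta.is_subagent(),
                meta.copied_history()
            );
        }
    }
    ```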
    
    ## What Changed
    
    - Added top-level `parent_thread_id` persistence on `SessionMeta` and
    runtime/session plumbing through `SessionConfiguredEvent`,
    `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
    `TurnContext`, and `ModelClient`.
    - Made turn metadata, request headers, analytics, and subagent-start
    events read the separate runtime/top-level parent field instead of
    deriving general parent lineage from `SessionSource` or
    `forked_from_thread_id`.
    - Passed parent lineage separately at delegated subagent, review,
    guardian, agent-job, and multi-agent spawn construction sites;
    copied-history fork lineage remains derived only from `InitialHistory`.
    - Persisted and exposed parent lineage through rollout/thread-store
    projections and app-server v2 `Thread.parentThreadId`.
    - Updated app-server README text and regenerated app-server schema
    fixtures for the additive `parentThreadId` response field.
  • [codex] Add user input client ids (#24653)
    ## Summary
    
    Adds an optional `clientId` field to app-server v2 `UserInput` and
    carries it through the core `UserInput` model so clients can correlate
    echoed user input items without relying on payload equality.
    
    ## Details
    
    - Adds `client_id: Option<String>` to core `UserInput` variants.
    - Exposes the v2 app-server field as `clientId` on the wire and in
    generated TypeScript.
    - Preserves the id when converting between app-server v2 and core
    protocol types.
    - Regenerates app-server schema fixtures.
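    
    One way a client might use the field is sketched below: it keeps a map
    from the ids it generated to its local drafts and resolves echoes by id.
    The `UserInput` struct here is a simplified stand-in for the real
    variants, and `correlate` is a hypothetical client-side helper.
    
    ```rust
    use std::collections::HashMap;
    
    #[derive(Debug, Clone, PartialEq)]
    struct UserInput {
        text: String,
        client_id: Option<String>,
    }
    
    // Maps an echoed input back to the index of the local draft that sent it.
    // Inputs without an id (or with an unknown id) are left uncorrelated.
    fn correlate(pending: &mut HashMap<String, usize>, echoed: &UserInput) -> Option<usize> {
        echoed.client_id.as_ref().and_then(|id| pending.remove(id))
    }
    
    fn echo(text: &str, client_id: Option<&str>) -> UserInput {
        UserInput {
            text: text.to_string(),
            client_id: client_id.map(str::to_string),
        }
    }
    
    fn main() {
        // Two drafts with identical text: payload equality could not tell them apart.
        let mut pending = HashMap::from([("c-1".to_string(), 0), ("c-2".to_string(), 1)]);
        println!("{:?}", correlate(&mut pending, &echo("retry", Some("c-2"))));
        println!("{:?}", correlate(&mut pending, &echo("retry", Some("c-1"))));
        println!("{:?}", correlate(&mut pending, &echo("retry", None)));
    }
    ```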
    
    ## Validation
    
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-protocol`
    - `git diff --check`
  • Update rmcp to 1.7.0 (#24763)
    This will make it easier to upgrade once the new draft spec is supported.
    
    Also updates reqwest where needed for compatibility but doesn't update
    it everywhere since this is already a large diff.
    
    The new version of rmcp handles certain kinds of authentication failures
    differently; this patch includes support for identifying the failing
    scope in a WWW-Authenticate header.
  • Add experimental turn additional context (#24154)
    ## Summary
    
    Adds experimental `additionalContext` support to `turn/start` and
    `turn/steer` so clients can provide ephemeral external context, such as
    browser or automation state, without turning that plumbing into a
    visible user prompt or triggering user-prompt lifecycle behavior.
    
    ## API Shape
    
    The parameter shape is:
    
    ```ts
    additionalContext?: Record<string, {
      value: string
      kind: "untrusted" | "application"
    }> | null
    ```
    
    Example:
    
    ```json
    {
      "additionalContext": {
        "browser_info": {
          "value": "Active tab is CI failures.",
          "kind": "untrusted"
        },
        "automation_info": {
          "value": "CI rerun is in progress.",
          "kind": "application"
        }
      }
    }
    ```
    
    The keys are opaque and caller-defined.
    
    ## Context Injection
    
    When provided, accepted entries are inserted into model context as
    hidden contextual message items, not as visible thread user-message
    items.
    
    `kind: "untrusted"` entries are inserted with role `user`:
    
    ```text
    <external_${key}>${value}</external_${key}>
    ```
    
    `kind: "application"` entries are inserted with role `developer`:
    
    ```text
    <${key}>${value}</${key}>
    ```
    
    Values are not escaped. Each value is truncated to approximately 1k
    tokens before wrapping.
    
    For `turn/start`, accepted additional context is inserted before normal
    user input. For `turn/steer`, additional context is merged only when the
    steer includes non-empty user input; context-only steers are still
    rejected as empty input.
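    
    The role routing and wrapping rules above can be sketched as a pure
    function. `Kind`, `Role`, and `render` are illustrative names only, and
    the token-based truncation step is left out because it depends on the
    tokenizer.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum Kind {
        Untrusted,
        Application,
    }
    
    #[derive(Debug, PartialEq, Eq)]
    enum Role {
        User,
        Developer,
    }
    
    // Wraps one entry and picks its role. Values are inserted unescaped,
    // matching the description above; truncation is not modeled here.
    fn render(key: &str, value: &str, kind: Kind) -> (Role, String) {
        match kind {
            Kind::Untrusted => (
                Role::User,
                format!("<external_{key}>{value}</external_{key}>"),
            ),
            Kind::Application => (Role::Developer, format!("<{key}>{value}</{key}>")),
        }
    }
    
    fn main() {
        println!("{:?}", render("browser_info", "Active tab is CI failures.", Kind::Untrusted));
        println!("{:?}", render("automation_info", "CI rerun is in progress.", Kind::Application));
    }
    ```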
    
    ## Dedupe Strategy
    
    `AdditionalContextStore` lives on session state and stores the latest
    complete additional-context map.
    
    Each `turn/start` or non-empty `turn/steer` treats its
    `additionalContext` as the current complete set of values. Entries are
    injected only when the key is new or the exact entry for that key
    changed, including `value` or `kind`. After merging, the store is
    replaced with the provided map, so omitted keys are removed from the
    retained set and can be injected again later if reintroduced.
    
    Omitting `additionalContext`, passing `null`, or passing an empty object
    resets the store to empty and injects nothing.
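    
    A compact model of this strategy is sketched below. Only the store name
    comes from the description; the `Entry` type, the `merge` method, and
    the `ctx` helper are assumptions made for the example.
    
    ```rust
    use std::collections::BTreeMap;
    
    #[derive(Debug, Clone, PartialEq)]
    struct Entry {
        value: String,
        kind: &'static str,
    }
    
    #[derive(Default)]
    struct AdditionalContextStore {
        latest: BTreeMap<String, Entry>,
    }
    
    impl AdditionalContextStore {
        // Returns the keys to inject for this submission, then replaces the
        // retained map with the provided one. `None` models an omitted or
        // null `additionalContext`.
        fn merge(&mut self, provided: Option<BTreeMap<String, Entry>>) -> Vec<String> {
            let provided = provided.unwrap_or_default();
            let inject: Vec<String> = provided
                .iter()
                .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
                .map(|(key, _)| key.clone())
                .collect();
            self.latest = provided;
            inject
        }
    }
    
    fn ctx(entries: &[(&str, &str, &'static str)]) -> Option<BTreeMap<String, Entry>> {
        Some(
            entries
                .iter()
                .map(|(key, value, kind)| {
                    (key.to_string(), Entry { value: value.to_string(), kind })
                })
                .collect(),
        )
    }
    
    fn main() {
        let mut store = AdditionalContextStore::default();
        println!("{:?}", store.merge(ctx(&[("browser_info", "tab A", "untrusted")])));
        println!("{:?}", store.merge(ctx(&[("browser_info", "tab A", "untrusted")])));
        println!("{:?}", store.merge(None));
        println!("{:?}", store.merge(ctx(&[("browser_info", "tab A", "untrusted")])));
    }
    ```
    
    Replacing the whole map, rather than merging into it, is what lets a
    key be injected again after it was dropped for a turn.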
    
    ## What Changed
    
    - Threads experimental v2 `additionalContext` through app-server into
    core turn start and steer handling.
    - Adds separate contextual fragment types for untrusted user-role
    context and application developer-role context.
    - Uses pending response input items so additional context can be
    combined with normal user input without treating it as prompt text.
    - Adds integration coverage for start/steer flow, role routing,
    dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
    behavior, empty context-only steer rejection, external-fragment marker
    matching, and truncation.
  • fix: reject legacy profile selectors (#24059)
    ## Why
    
    `--profile` now selects `<name>.config.toml`, so the legacy `profile`
    selector should not be reintroduced through config write or MCP tool
    paths. A matching legacy selector in base user config also needs the
    same migration guard as a matching legacy `[profiles.<name>]` table so
    profile loading fails with one clear migration error instead of mixing
    the old and new profile models.
    
    ## What
    
    - reject non-null app-server config writes to the top-level legacy
    `profile` selector
    - make `--profile <name>` reject base user config that still selects the
    same legacy `profile = "<name>"` value, alongside the existing matching
    legacy profile-table guard
    - reject removed MCP `codex` tool fields such as `profile` by denying
    unknown tool-call parameters and exposing that restriction in the
    generated schema
    - add regression coverage for the app-server write paths, config loader
    guard, and MCP tool input/schema behavior
    
    ## Verification
    
    - targeted regression tests cover the new app-server, config loader, and
    MCP rejection paths
  • config: remove legacy profile v1 resolution (#24051)
    ## Why
    
    [#23883](https://github.com/openai/codex/pull/23883) moved user-facing
    `--profile` selection onto profile v2, and
    [#23886](https://github.com/openai/codex/pull/23886) removed the old CLI
    `config_profile` override path. Core still had a second legacy path:
    `profile = "..."` could select `[profiles.*]` values while runtime
    config was built. Keeping that resolver alive preserves the old
    precedence model and profile-carrying surfaces even though profile
    selection now points at `$CODEX_HOME/<name>.config.toml`.
    
    ## What
    
    - Reject legacy top-level `profile = "..."` config while loading runtime
    config, with an error that points callers at `--profile <name>` and
    `<name>.config.toml` in the [core load
    path](https://github.com/openai/codex/blob/3d923366eca10a29143623124c6c6e538f058269/codex-rs/core/src/config/mod.rs#L2524-L2531).
    - Remove the remaining profile-v1 merge points from runtime config
    resolution, including features, permissions, model/provider selection,
    web search, Windows sandbox settings, TUI settings, role reloads, and
    OSS provider lookup.
    - Drop the leftover profile override surface from
    [`ConfigOverrides`](https://github.com/openai/codex/blob/3d923366eca10a29143623124c6c6e538f058269/codex-rs/core/src/config/mod.rs#L2118-L2148)
    and from the MCP server `codex` tool schema.
    - Prune profile-precedence tests that only exercised the removed
    resolver and replace them with rejection coverage for the legacy
    selector.
    
    ## Testing
    
    - Not run in this metadata pass.
    - Added
    [`legacy_profile_selection_is_rejected`](https://github.com/openai/codex/blob/3d923366eca10a29143623124c6c6e538f058269/codex-rs/core/src/config/config_tests.rs#L7942-L7965)
    coverage for the new runtime guard.
  • Make local environment optional in EnvironmentManager (#23369)
    ## Summary
    - make `EnvironmentManager` local environment/runtime paths optional
    - simplify constructor surface around snapshot materialization
    - rename local env accessors to `require_local_environment` /
    `try_local_environment`
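    
    The accessor pair suggests a familiar split between a fallible lookup
    and a required one. The sketch below is a guess at that shape: only the
    two method names come from the summary, while the struct fields, return
    types, and error text are assumptions.
    
    ```rust
    #[derive(Debug, PartialEq)]
    struct LocalEnvironment {
        cwd: String,
    }
    
    struct EnvironmentManager {
        // Optional: some hosts run without any local environment.
        local: Option<LocalEnvironment>,
    }
    
    impl EnvironmentManager {
        // For callers that can proceed without a local environment.
        fn try_local_environment(&self) -> Option<&LocalEnvironment> {
            self.local.as_ref()
        }
    
        // For callers that cannot; the error type here is a placeholder.
        fn require_local_environment(&self) -> Result<&LocalEnvironment, String> {
            self.local
                .as_ref()
                .ok_or_else(|| "no local environment is configured".to_string())
        }
    }
    
    fn main() {
        let remote_only = EnvironmentManager { local: None };
        println!("{:?}", remote_only.try_local_environment());
        println!("{:?}", remote_only.require_local_environment());
    
        let with_local = EnvironmentManager {
            local: Some(LocalEnvironment { cwd: "/workspace".to_string() }),
        };
        println!("{:?}", with_local.require_local_environment());
    }
    ```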
    
    ## Validation
    - devbox Bazel build for touched crate surfaces
    - `//codex-rs/exec-server:exec-server-unit-tests`
    - `//codex-rs/app-server-client:app-server-client-unit-tests`
    - filtered touched `//codex-rs/core:core-unit-tests` cases
  • [5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
    **Stack position:** [5 of 7]
    
    ## Summary
    
    This PR adds `Op::ThreadSettings`, a queued settings-only update
    mechanism for changing stored thread settings without starting a new
    turn. It also removes the legacy `Op::OverrideTurnContext` in the same
    layer, so reviewers can see the replacement and deletion together.
    
    ## Changes
    
    - Add `Op::ThreadSettings` for settings-only queued updates.
    - Emit `ThreadSettingsApplied` with the effective thread settings
    snapshot after core applies an update.
    - Route settings-only updates through the same submission queue as user
    input.
    - Migrate remaining `OverrideTurnContext` tests and callers to the
    queued `Op::ThreadSettings` path.
    - Delete `Op::OverrideTurnContext` from the core protocol and submission
    loop.
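    
    The queueing behavior can be modeled roughly as below. This is a toy
    submission loop under several assumptions: the `Settings` and
    `Overrides` fields, the `TurnStarted` event, and the `run` function are
    invented for the example, and only `Op::UserInput`, `Op::ThreadSettings`,
    and `ThreadSettingsApplied` are names taken from the description.
    
    ```rust
    use std::collections::VecDeque;
    
    #[derive(Debug, Clone, Default, PartialEq)]
    struct Settings {
        model: String,
        cwd: String,
    }
    
    #[derive(Debug, Clone, Default)]
    struct Overrides {
        model: Option<String>,
        cwd: Option<String>,
    }
    
    enum Op {
        UserInput { text: String, settings: Overrides },
        ThreadSettings(Overrides),
    }
    
    #[derive(Debug, PartialEq)]
    enum Event {
        ThreadSettingsApplied(Settings),
        // Hypothetical event standing in for "a turn began with these settings".
        TurnStarted { text: String, settings: Settings },
    }
    
    fn apply(current: &mut Settings, overrides: Overrides) {
        if let Some(model) = overrides.model {
            current.model = model;
        }
        if let Some(cwd) = overrides.cwd {
            current.cwd = cwd;
        }
    }
    
    // Both ops drain from one queue, so a settings-only update is ordered
    // relative to user input instead of racing it.
    fn run(mut queue: VecDeque<Op>, mut current: Settings) -> Vec<Event> {
        let mut events = Vec::new();
        while let Some(op) = queue.pop_front() {
            match op {
                Op::ThreadSettings(overrides) => {
                    apply(&mut current, overrides);
                    events.push(Event::ThreadSettingsApplied(current.clone()));
                }
                Op::UserInput { text, settings } => {
                    apply(&mut current, settings);
                    events.push(Event::TurnStarted { text, settings: current.clone() });
                }
            }
        }
        events
    }
    
    fn main() {
        let queue = VecDeque::from([
            Op::ThreadSettings(Overrides { model: Some("m2".to_string()), cwd: None }),
            Op::UserInput { text: "hello".to_string(), settings: Overrides::default() },
        ]);
        let start = Settings { model: "m1".to_string(), cwd: "/workspace".to_string() };
        println!("{:?}", run(queue, start));
    }
    ```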
    
    This stack addresses #20656 and #22090.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • [1 of 7] Add thread settings to UserInput (#23080)
    **Stack position:** [1 of 7]
    
    ## Summary
    
    The first three PRs in this stack are a cleanup pass before the actual
    thread settings API work.
    
    Today, core has several overlapping "user input" ops: `UserInput`,
    `UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how
    much next-turn state they carry, which makes the later queued thread
    settings update harder to reason about and review.
    
    This PR starts that cleanup by adding the shared
    `ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry
    it. Existing variants remain in place here, so this layer is mostly a
    behavior-preserving API shape change plus mechanical constructor
    updates.
    
    ## End State After PR3
    
    By the end of PR3, `Op::UserInput` is the only "user input" core op. It
    can carry optional thread settings overrides for callers that need to
    update stored defaults with a turn, while callers without updates use
    empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are
    deleted.
    
    ## End State After PR5
    
    By the end of PR5, core will have only two ops for this area:
    
    - `Op::UserInput` for user-input-bearing submissions.
    - `Op::ThreadSettings` for settings-only updates.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080) (this PR)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • config: add strict config parsing (#20559)
    ## Why
    
    Codex intentionally ignores unknown `config.toml` fields by default so
    older and newer config files keep working across versions. That leniency
    also makes typo detection hard because misspelled or misplaced keys
    disappear silently.
    
    This change adds an opt-in strict config mode so users and tooling can
    fail fast on unrecognized config fields without changing the default
    permissive behavior.
    
    This feature is possible because `serde_ignored` exposes the exact
    signal Codex needs: it lets Codex run ordinary Serde deserialization
    while recording fields Serde would otherwise ignore. That avoids
    requiring `#[serde(deny_unknown_fields)]` across every config type and
    keeps strict validation opt-in around the existing config model.
    
    ## What Changed
    
    ### Added strict config validation
    
    - Added `serde_ignored`-based validation for `ConfigToml` in
    `codex-rs/config/src/strict_config.rs`.
    - Combined `serde_ignored` with `serde_path_to_error` so strict mode
    preserves typed config error paths while also collecting fields Serde
    would otherwise ignore.
    - Added strict-mode validation for unknown `[features]` keys, including
    keys that would otherwise be accepted by `FeaturesToml`'s flattened
    boolean map.
    - Kept typed config errors ahead of ignored-field reporting, so
    malformed known fields are reported before unknown-field diagnostics.
    - Added source-range diagnostics for top-level and nested unknown config
    fields, including non-file managed preference source names.
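    
    The ordering rule (typed errors first, unknown fields only in strict
    mode) can be shown with a deliberately simplified, standard-library-only
    model. It does not use `serde_ignored` or parse real TOML: fields arrive
    as pre-split key/raw-value pairs, the known-field list is a two-entry
    stand-in, and the "expected a string" check is an invented typed rule.
    
    ```rust
    // Stand-in for the real config schema.
    const KNOWN_FIELDS: [&str; 2] = ["model", "approval_policy"];
    
    fn validate(fields: &[(&str, &str)], strict: bool) -> Result<(), String> {
        // Typed errors on known fields are reported before anything else.
        for &(key, raw) in fields {
            let quoted = raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"');
            if KNOWN_FIELDS.contains(&key) && !quoted {
                return Err(format!("invalid type for `{key}`: expected a string"));
            }
        }
        // Unknown fields are tolerated unless strict mode is on.
        if strict {
            for &(key, _) in fields {
                if !KNOWN_FIELDS.contains(&key) {
                    return Err(format!("unknown configuration field `{key}`"));
                }
            }
        }
        Ok(())
    }
    
    fn main() {
        let typo = [("model", "\"gpt-5\""), ("approval_polic", "\"on-request\"")];
        println!("permissive: {:?}", validate(&typo, false));
        println!("strict:     {:?}", validate(&typo, true));
    
        let both = [("approval_polic", "\"on-request\""), ("model", "5")];
        println!("typed first: {:?}", validate(&both, true));
    }
    ```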
    
    ### Kept parsing single-pass per source
    
    - Reworked file and managed-config loading so strict validation reuses
    the already parsed `TomlValue` for that source.
    - For actual config files and managed config strings, the loader now
    reads once, parses once, and validates that same parsed value instead of
    deserializing multiple times.
    - Validated `-c` / `--config` override layers with the same
    base-directory context used for normal relative-path resolution, so
    unknown override keys are still reported when another override contains
    a relative path.
    
    ### Scoped `--strict-config` to config-heavy entry points
    
    - Added support for `--strict-config` on the main config-loading entry
    points where it is most useful:
      - `codex`
      - `codex resume`
      - `codex fork`
      - `codex exec`
      - `codex review`
      - `codex mcp-server`
      - `codex app-server` when running the server itself
      - the standalone `codex-app-server` binary
      - the standalone `codex-exec` binary
    - Commands outside that set now reject `--strict-config` early with
    targeted errors instead of accepting it everywhere through shared CLI
    plumbing.
    - `codex app-server` subcommands such as `proxy`, `daemon`, and
    `generate-*` are intentionally excluded from the first rollout.
    - When app-server strict mode sees invalid config, app-server exits with
    the config error instead of logging a warning and continuing with
    defaults.
    - Introduced a dedicated `ReviewCommand` wrapper in `codex-rs/cli`
    instead of extending shared `ReviewArgs`, so `--strict-config` stays on
    the outer config-loading command surface and does not become part of the
    reusable review payload used by `codex exec review`.
    
    ### Coverage
    
    - Added tests for top-level and nested unknown config fields, unknown
    `[features]` keys, typed-error precedence, source-location reporting,
    and non-file managed preference source names.
    - Added CLI coverage showing invalid `--enable`, invalid `--disable`,
    and unknown `-c` overrides still error when `--strict-config` is
    present, including compound-looking feature names such as
    `multi_agent_v2.subagent_usage_hint_text`.
    - Added integration coverage showing both `codex app-server
    --strict-config` and standalone `codex-app-server --strict-config` exit
    with an error for unknown config fields instead of starting with
    fallback defaults.
    - Added coverage showing unsupported command surfaces reject
    `--strict-config` with explicit errors.
    
    ## Example Usage
    
    Run Codex with strict config validation enabled:
    
    ```shell
    codex --strict-config
    ```
    
    Strict config mode is also available on the supported config-heavy
    subcommands:
    
    ```shell
    codex --strict-config exec "explain this repository"
    codex review --strict-config --uncommitted
    codex mcp-server --strict-config
    codex app-server --strict-config --listen off
    codex-app-server --strict-config --listen off
    ```
    
    For example, if `~/.codex/config.toml` contains a typo in a key name:
    
    ```toml
    model = "gpt-5"
    approval_polic = "on-request"
    ```
    
    then `codex --strict-config` reports the misspelled key instead of
    silently ignoring it. The path is shortened to `~` here for readability:
    
    ```text
    $ codex --strict-config
    Error loading config.toml:
    ~/.codex/config.toml:2:1: unknown configuration field `approval_polic`
      |
    2 | approval_polic = "on-request"
      | ^^^^^^^^^^^^^^
    ```
    
    Without `--strict-config`, Codex keeps the existing permissive behavior
    and ignores the unknown key.
    
    Strict config mode also validates ad-hoc `-c` / `--config` overrides:
    
    ```text
    $ codex --strict-config -c foo=bar
    Error: unknown configuration field `foo` in -c/--config override
    
    $ codex --strict-config -c features.foo=true
    Error: unknown configuration field `features.foo` in -c/--config override
    ```
    
    Invalid feature toggles are rejected too, including values that look
    like nested config paths:
    
    ```text
    $ codex --strict-config --enable does_not_exist
    Error: Unknown feature flag: does_not_exist
    
    $ codex --strict-config --disable does_not_exist
    Error: Unknown feature flag: does_not_exist
    
    $ codex --strict-config --enable multi_agent_v2.subagent_usage_hint_text
    Error: Unknown feature flag: multi_agent_v2.subagent_usage_hint_text
    ```
    
    Unsupported commands reject the flag explicitly:
    
    ```text
    $ codex --strict-config cloud list
    Error: `--strict-config` is not supported for `codex cloud`
    ```
    
    ## Verification
    
    The `codex-cli` `strict_config` tests cover invalid `--enable`, invalid
    `--disable`, the compound `multi_agent_v2.subagent_usage_hint_text`
    case, unknown `-c` overrides, app-server strict startup failure through
    `codex app-server`, and rejection for unsupported commands such as
    `codex cloud`, `codex mcp`, `codex remote-control`, and `codex
    app-server proxy`.
    
    The config and config-loader tests cover unknown top-level fields,
    unknown nested fields, unknown `[features]` keys, source-location
    reporting, non-file managed config sources, and `-c` validation for keys
    such as `features.foo`.
    
    The app-server test suite covers standalone `codex-app-server
    --strict-config` startup failure for an unknown config field.
    
    ## Documentation
    
    The Codex CLI docs on developers.openai.com/codex should mention
    `--strict-config` as an opt-in validation mode for supported
    config-heavy entry points once this ships.
  • Support multi-environment apply_patch selection (#21617)
    ## Summary
    - add multi-environment apply_patch routing for both freeform and
    function-call tool flows
    - parse and reconcile the optional environment selector in the main
    apply_patch parser, then verify against the selected environment in the
    handler
    - carry environment_id through runtime and approval surfaces so
    remote-targeted patches stay explicit end to end
    
    ## Testing
    - just fmt
    - remote exec-server e2e: `cargo test -p codex-core --test all
    apply_patch_multi_environment_uses_remote_executor -- --nocapture` on
    dev via `scripts/test-remote-env.sh`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add process-scoped SQLite telemetry (#22154)
    ## Summary
    - add SQLite init, backfill-gate, and fallback telemetry without
    introducing a cross-cutting state-db access wrapper
    - install one process-scoped telemetry sink after OTEL startup and let
    low-level state/rollout paths emit through it directly
    - add process-start metrics for the process owners that initialize
    SQLite
    
    ---------
    
    Co-authored-by: Owen Lin <owen@openai.com>
  • extension: wire extension registries into sessions (#21737)
    ## Why
    
    [#21736](https://github.com/openai/codex/pull/21736) introduces the
    typed extension API, but the runtime does not yet carry a registry
    through thread/session startup or give contributors host-owned stores to
    read from. This PR wires that host-side path so later feature migrations
    can move product-specific behavior behind typed contributions without
    adding another bespoke seam directly to `codex-core`.
    
    ## What changed
    
    - Thread `ExtensionRegistry<Config>` through `ThreadManager`,
    `CodexSpawnArgs`, `Session`, and sub-agent spawn paths.
    - Wire `ThreadStartContributor` and `ContextContributor`.
    - Expose the small supporting surface needed by non-core callers that
    construct threads directly, including `empty_extension_registry()`
    through `codex-core-api`.
    
    This PR lands the host plumbing only: the app-server registry is still
    empty, and concrete feature migrations are intended to follow
    separately.
  • Reapply "Move skills watcher to app-server" (#21652)
    ## Why
    
    PR #21460 reverted the earlier move of skills change watching from
    `codex-core` into app-server. This reapplies that boundary change so
    app-server owns client-facing `skills/changed` notifications and core no
    longer carries the watcher.
    
    ## What
    
    - Restore the app-server `SkillsWatcher` and register it from thread
    listener setup.
    - Remove the core-owned skills watcher and its core live-reload
    integration surface.
    - Restore app-server coverage for `skills/changed` notifications after a
    watched skill file changes.
    
    ## Validation
    
    - `cargo test -p codex-app-server --test all
    suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change
    -- --exact --nocapture`
    - `cargo test -p codex-core --lib --no-run`
  • Enable --deny-warnings for cargo shear (#21616)
    ## Summary
    
    In https://github.com/openai/codex/pull/21584, we disabled doctests for
    crates that lack any doctests. We can enforce that property via `cargo
    shear --deny-warnings`: crates that lack doctests will be flagged if
    doctests are enabled, and crates with doctests will be flagged if
    doctests are disabled.
    
    A few additional notes:
    
    - By adding `--deny-warnings`, `cargo shear` also flagged a number of
    modules that were not reachable at all. Some of those have been removed.
    - This PR removes a usage of `windows_modules!` (since `cargo shear` and
    `rustfmt` couldn't see through it) in favor of plain
    `#[cfg(target_os = "windows")]` attributes. As a consequence, many of
    these files exhibit churn in this PR, since they weren't being
    formatted by `rustfmt` at all on main.
    - Again, to make the code more analyzable, this PR also removes some
    usages of `#[path = "cwd_junction.rs"]` in favor of a more standard
    module structure. The bin sidecar structure is still retained, but,
    e.g., `windows-sandbox-rs/src/bin/command_runner.rs` was moved to
    `windows-sandbox-rs/src/bin/command_runner/main.rs`, and so on.
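
    As a rough illustration of gating platform code without a custom macro, a module can carry a plain `#[cfg(...)]` attribute so that tooling sees an ordinary `mod` item. The module and function names below (`platform`, `sandbox_backend`) are hypothetical and not taken from the Codex source.

    ```rust
    // Hypothetical sketch: exactly one of these two modules is compiled,
    // depending on the target operating system.
    #[cfg(target_os = "windows")]
    mod platform {
        pub fn sandbox_backend() -> &'static str {
            "windows"
        }
    }

    #[cfg(not(target_os = "windows"))]
    mod platform {
        pub fn sandbox_backend() -> &'static str {
            "unsupported"
        }
    }

    /// Returns the backend name selected at compile time.
    pub fn sandbox_backend() -> &'static str {
        platform::sandbox_backend()
    }
    ```

    Because both variants are regular module declarations, they are formatted and analyzed like any other module.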
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] request desktop attestation from app (#20619)
    ## Summary
    
    TL;DR: teaches `codex-rs` / app-server to request a desktop-provided
    attestation token and attach it as `x-oai-attestation` on the scoped
    ChatGPT Codex request paths.
    
    ![DeviceCheck attestation
    interface](https://raw.githubusercontent.com/openai/codex/dev/jm/devicecheck-diagram-assets/pr-assets/devicecheck-attestation-interface.png)
    
    ## Details
    
    This PR teaches the Codex app-server runtime how to request and attach
    an attestation token. It does not generate DeviceCheck tokens directly;
    instead, it relies on the connected desktop app to advertise that it can
    generate attestation and then asks that app for a fresh header value
    when needed.
    
    The flow is:
    
    1. The Codex desktop app connects to app-server.
    2. During `initialize`, the app can advertise that it supports
    `requestAttestation`.
    3. Before app-server calls selected ChatGPT Codex endpoints, it sends
    the internal server request `attestation/generate` to the app.
    4. app-server receives a pre-encoded header value back.
    5. app-server forwards that value as `x-oai-attestation` on the scoped
    outbound requests.
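
    The flow above can be sketched roughly as follows. This is a simplified model, not the code in this PR: the `AttestationProvider` trait shape, `FixedProvider`, and the path-prefix scoping rule in `is_scoped_path` are all assumptions made for illustration.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical stand-in for the connected desktop app.
    pub trait AttestationProvider {
        /// Returns a pre-encoded header value, or `None` when the client
        /// did not advertise attestation support.
        fn generate(&self) -> Option<String>;
    }

    /// Test double that always returns the same answer.
    pub struct FixedProvider(pub Option<String>);

    impl AttestationProvider for FixedProvider {
        fn generate(&self) -> Option<String> {
            self.0.clone()
        }
    }

    /// Assumption: scoping is modeled here as a simple path-prefix check.
    pub fn is_scoped_path(path: &str) -> bool {
        path.starts_with("/backend-api/codex/")
    }

    /// Builds outbound headers, attaching the attestation value only on
    /// scoped paths and only when the provider returns one.
    pub fn build_headers(
        path: &str,
        provider: &dyn AttestationProvider,
    ) -> HashMap<String, String> {
        let mut headers = HashMap::new();
        if is_scoped_path(path) {
            if let Some(value) = provider.generate() {
                headers.insert("x-oai-attestation".to_string(), value);
            }
        }
        headers
    }
    ```

    In this sketch the provider is consulted only for scoped paths, so unscoped requests never trigger a round trip to the app.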
    
    The code in this repo is mostly protocol and runtime plumbing: it adds
    the app-server request/response shape, introduces an attestation
    provider in core, wires that provider into Responses / compaction /
    realtime setup paths, and covers the intended scoping with tests. The
    signed macOS DeviceCheck generation remains owned by the desktop app PR.
    
    ## Related PR
    
    - Codex desktop app implementation:
    https://github.com/openai/openai/pull/878649
    
    ## Validation
    
    <details>
    <summary>Tests run</summary>
    
    ```sh
    cargo test -p codex-app-server-protocol
    cargo test -p codex-core attestation --lib
    cargo test -p codex-app-server --lib attestation
    ```
    
    Also ran:
    
    ```sh
    just fix -p codex-core
    just fix -p codex-app-server
    just fix -p codex-app-server-protocol
    just fmt
    just write-app-server-schema
    ```
    
    </details>
    
    <details>
    <summary>E2E DeviceCheck validation</summary>
    
    First validated the signed desktop app boundary directly: launched a
    packaged signed `Codex.app`, sent `attestation/generate`, decoded the
    returned `v1.` attestation header, and validated the extracted
    DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using
    bundle ID `com.openai.codex`. Apple returned `status_code: 200` and
    `is_ok: true`.
    
    Then ran the fuller app + app-server flow. The packaged `Codex.app`
    launched a current-branch app-server via `CODEX_CLI_PATH`, and a local
    MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server
    requested `attestation/generate` from the real Electron app process, and
    the intercepted `/backend-api/codex/responses` traffic included
    `x-oai-attestation` on both routes:
    
    ```text
    GET  /backend-api/codex/responses  Upgrade: websocket  x-oai-attestation: present
    POST /backend-api/codex/responses  Upgrade: none       x-oai-attestation: present
    ```
    
    The captured header decoded to a DeviceCheck token that also validated
    with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`,
    team `2DC432GLL2`).
    
    </details>
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Load configured environments from CODEX_HOME (#20667)
    ## Why
    
    The earlier PRs add stdio transport support and the config-backed
    environment provider, but the feature remains inert until normal Codex
    entrypoints construct `EnvironmentManager` with enough context to
    discover `CODEX_HOME/environments.toml`. This final stack PR activates
    the provider while preserving the legacy `CODEX_EXEC_SERVER_URL`
    fallback when no environments file exists.
    
    **Stack position:** this is PR 5 of 5. It is the product wiring PR that
    activates the configured environment provider added in PR 4.
    
    ## What Changed
    
    - Thread `codex_home` into `EnvironmentManagerArgs`.
    - Change `EnvironmentManager::new(...)` to load the provider from
    `CODEX_HOME`.
    - Preserve legacy behavior by falling back to
    `DefaultEnvironmentProvider::from_env()` when `environments.toml` is
    absent.
    - Make `environments.toml`-backed managers start new threads with all
    configured environments, default first, while keeping the legacy env-var
    path single-default.
    - Update the app-server, TUI, exec, MCP server, connector, prompt-debug,
    and thread-manager-sample callsites to pass `codex_home` and handle
    provider-loading errors.
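
    A minimal sketch of the selection and startup rules listed above, assuming a simplified model: `ProviderSource`, `select_provider`, and `startup_environments` are hypothetical names, and the real `EnvironmentManager` also parses the file and reports loading errors.

    ```rust
    use std::path::Path;

    /// Hypothetical marker for which provider backs the manager.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum ProviderSource {
        EnvironmentsToml,
        LegacyEnvVar,
    }

    /// Uses the TOML-backed provider only when the file exists; otherwise
    /// falls back to the legacy env-var path.
    pub fn select_provider(codex_home: &Path) -> ProviderSource {
        if codex_home.join("environments.toml").is_file() {
            ProviderSource::EnvironmentsToml
        } else {
            ProviderSource::LegacyEnvVar
        }
    }

    /// Returns the environment ids a new thread starts with: all configured
    /// environments (default first) for the TOML provider, and only the
    /// default for the legacy provider.
    pub fn startup_environments(
        source: ProviderSource,
        default_id: &str,
        configured: &[&str],
    ) -> Vec<String> {
        match source {
            ProviderSource::LegacyEnvVar => vec![default_id.to_string()],
            ProviderSource::EnvironmentsToml => {
                let mut ids = vec![default_id.to_string()];
                ids.extend(
                    configured
                        .iter()
                        .filter(|id| **id != default_id)
                        .map(|id| id.to_string()),
                );
                ids
            }
        }
    }
    ```

    Tying the multi-environment behavior to the provider source, rather than to the number of environments, keeps the legacy path single-default even though it can address both `local` and `remote`.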
    
    ## Self-Review Notes
    
    - The multi-environment startup path is intentionally tied to the
    `environments.toml` provider. Using `>1` configured environment as the
    only signal would also expand the legacy `CODEX_EXEC_SERVER_URL`
    provider because it keeps `local` addressable alongside `remote`.
    - The startup environment list is still derived inside
    `EnvironmentManager`; the provider only says whether its snapshot should
    start new threads with all configured environments.
    - The thread-manager sample was updated to pass the current
    `ThreadManager::new(...)` installation id argument so the stack compiles
    under Bazel.
    
    ## Stack
    
    - 1. https://github.com/openai/codex/pull/20663 - Add stdio exec-server
    listener
    - 2. https://github.com/openai/codex/pull/20664 - Add stdio exec-server
    client transport
    - 3. https://github.com/openai/codex/pull/20665 - Make environment
    providers own default selection
    - 4. https://github.com/openai/codex/pull/20666 - Add CODEX_HOME
    environments TOML provider
    - **5. This PR:** https://github.com/openai/codex/pull/20667 - Load
    configured environments from CODEX_HOME
    
    Split from original draft: https://github.com/openai/codex/pull/20508
    
    ## Validation
    
    - `just fmt`
    - `git diff --check`
    - `bazel build --config=remote --strategy=remote
    --remote_download_toplevel
    //codex-rs/thread-manager-sample:codex-thread-manager-sample`
    - `bazel test --config=remote --strategy=remote
    --remote_download_toplevel
    //codex-rs/exec-server:exec-server-unit-tests`
    - `bazel test --config=remote --strategy=remote
    --remote_download_toplevel --test_sharding_strategy=disabled
    --test_arg=default_thread_environment_selections_use_manager_default_id
    //codex-rs/core:core-unit-tests`
    - `bazel test --config=remote --strategy=remote
    --remote_download_toplevel --test_sharding_strategy=disabled
    --test_arg=start_thread_uses_all_default_environments_from_codex_home
    //codex-rs/core:core-unit-tests`
    
    ## Documentation
    
    This activates `CODEX_HOME/environments.toml`; user-facing documentation
    should be added before this stack is treated as a documented public
    workflow.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex-analytics] plumb protocol-native review timing (#21434)
    ## Why
    
    We want terminal tool review analytics, but the reducer should not stamp
    review timing from its own wall clock.
    
    This PR plumbs review timing through the real protocol and app-server
    seams so downstream analytics can consume the emitter's timestamps
    directly. Guardian reviews keep their enriched `started_at` /
    `completed_at` analytics fields by deriving those legacy second-based
    values from the same protocol-native millisecond lifecycle timestamps,
    rather than sampling a separate analytics clock.
    
    ## What changed
    
    - add `started_at_ms` to user approval request payloads
    - add `started_at_ms` / `completed_at_ms` to guardian review
    notifications
    - preserve Guardian review `started_at` / `completed_at` enrichment from
    the protocol-native timing source
    - stamp typed `ServerResponse` analytics facts with app-server-observed
    `completed_at_ms`
    - thread the new timing fields through core, protocol, app-server, TUI,
    and analytics fixtures
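
    The derivation of the legacy second-based fields can be sketched as below. The struct names and the `i64` field types are assumptions for illustration; only the field names come from this description.

    ```rust
    /// Protocol-native lifecycle timestamps in milliseconds (assumed i64).
    pub struct ReviewTiming {
        pub started_at_ms: i64,
        pub completed_at_ms: i64,
    }

    /// Legacy analytics fields in whole seconds.
    pub struct LegacyTiming {
        pub started_at: i64,
        pub completed_at: i64,
    }

    /// Derives the legacy values from the same millisecond source instead
    /// of sampling a separate clock. Sub-second precision is truncated.
    pub fn legacy_seconds(timing: &ReviewTiming) -> LegacyTiming {
        LegacyTiming {
            started_at: timing.started_at_ms.div_euclid(1000),
            completed_at: timing.completed_at_ms.div_euclid(1000),
        }
    }
    ```

    Because both representations come from one source, the second-based values cannot drift from the millisecond ones.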
    
    ## Verification
    
    - `cargo test -p codex-app-server outgoing_message --manifest-path
    codex-rs/Cargo.toml`
    - `cargo test -p codex-app-server-protocol guardian --manifest-path
    codex-rs/Cargo.toml`
    - `cargo test -p codex-tui guardian --manifest-path codex-rs/Cargo.toml`
    - `cargo test -p codex-analytics analytics_client_tests --manifest-path
    codex-rs/Cargo.toml`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21434).
    * #18748
    * __->__ #21434
    * #18747
    * #17090
    * #17089
    * #20514
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` entails running both standard Rust tests and doctests.
    It turns out that the doctest discovery is fairly slow, and it's a cost
    you pay even for crates that don't include any doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    more than 4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, the speedup is more than 2x. Before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • Revert state DB injection and agent graph store (#21481)
    ## Why
    
    Reverts #20689 to restore the previous optional state DB plumbing. The
    conflict resolution keeps the newer installation ID and session/thread
    identity changes that landed after #20689, while removing the mandatory
    state DB and agent graph store dependency from ThreadManager
    construction.
    
    ## What changed
    
    - Restored `Option<StateDbHandle>` through app-server, MCP server,
    prompt debug, and test entry points.
    - Removed the `codex-core` dependency on `codex-agent-graph-store` and
    reverted descendant lookup back to the existing state DB path when
    available.
    - Kept newer `installation_id` forwarding by passing it beside the
    optional DB handle.
    - Kept local thread-name updates working when the optional state DB
    handle is absent.
    
    ## Validation
    
    - `git diff --check`
    - `cargo test -p codex-thread-store`
    - `cargo test -p codex-state -p codex-rollout -p
    codex-app-server-protocol`
    - Attempted `env CARGO_INCREMENTAL=0 cargo test -p codex-core -p
    codex-app-server -p codex-app-server-client -p codex-mcp-server -p
    codex-thread-manager-sample -p codex-tui`; blocked locally by a rustc
    ICE while compiling `v8 v146.4.0` with `rustc 1.93.0 (254b59607
    2026-01-19)` on `aarch64-apple-darwin`.
  • Move skills watcher to app-server (#21287)
    ## Why
    
    Skills update notifications are app-server API behavior, but the watcher
    lived in `codex-core` and surfaced through
    `EventMsg::SkillsUpdateAvailable`. Moving the watcher out keeps core
    focused on thread execution and lets app-server own both cache
    invalidation and the `skills/changed` notification.
    
    ## What changed
    
    - Added an app-server-owned skills watcher that watches local skill
    roots, clears the shared skills cache, and emits `skills/changed`
    directly.
    - Registers skill watches from the common app-server thread listener
    attach path, including direct starts, resumes, and app-server-observed
    child or forked threads.
    - Stores the `WatchRegistration` on `ThreadState`, so listener
    replacement, thread teardown, idle unload, and app-server shutdown
    deregister by dropping the RAII guard.
    - Removed `EventMsg::SkillsUpdateAvailable`, the core watcher, and the
    old core live-reload test.
    - Extended the app-server skills change test to verify a cached skills
    list is refreshed after a filesystem change without forcing reload.
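
    The RAII deregistration pattern mentioned above can be sketched like this. It is a simplified model, not the app-server code: `Watcher` and its set of roots are hypothetical, and registering the same root twice is not reference-counted here.

    ```rust
    use std::collections::HashSet;
    use std::sync::{Arc, Mutex};

    /// Hypothetical watcher that tracks which roots are being watched.
    #[derive(Clone, Default)]
    pub struct Watcher {
        roots: Arc<Mutex<HashSet<String>>>,
    }

    /// Guard returned by `register`; dropping it removes the watch.
    pub struct WatchRegistration {
        roots: Arc<Mutex<HashSet<String>>>,
        root: String,
    }

    impl Watcher {
        pub fn register(&self, root: &str) -> WatchRegistration {
            self.roots.lock().unwrap().insert(root.to_string());
            WatchRegistration {
                roots: Arc::clone(&self.roots),
                root: root.to_string(),
            }
        }

        pub fn watched_count(&self) -> usize {
            self.roots.lock().unwrap().len()
        }
    }

    impl Drop for WatchRegistration {
        fn drop(&mut self) {
            // Deregister on drop; a poisoned lock is ignored during teardown.
            if let Ok(mut roots) = self.roots.lock() {
                roots.remove(&self.root);
            }
        }
    }
    ```

    Storing the guard on per-thread state means every teardown path deregisters the watch simply by dropping that state, with no explicit cleanup call.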
    
    ## Validation
    
    - `cargo check -p codex-core -p codex-app-server -p codex-mcp-server -p
    codex-rollout -p codex-rollout-trace`
    - `cargo test -p codex-app-server
    skills_changed_notification_is_emitted_after_skill_change`
  • Remove core MCP list tools op (#21281)
    ## Why
    
    The core `Op::ListMcpTools` request path is no longer needed. Keeping it
    around left a dead request/response surface alongside the app-server MCP
    inventory APIs that own current server status listing.
    
    ## What Changed
    
    - Removed `Op::ListMcpTools`, `EventMsg::McpListToolsResponse`, and the
    core handler that built the MCP snapshot response.
    - Removed the now-unused `codex-mcp` snapshot wrapper/export and passive
    event handling arms in rollout and MCP-server consumers.
    - Updated tests that used the old op as a synchronization hook to wait
    on existing startup/skills events, and deleted the plugin test that only
    exercised the removed listing op.
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-mcp`
    - `cargo test -p codex-rollout -p codex-rollout-trace -p
    codex-mcp-server`
    - `cargo test -p codex-core --test all
    pending_input::queued_inter_agent_mail`
    - `cargo test -p codex-core --test all
    rmcp_client::stdio_mcp_tool_call_includes_sandbox_state_meta`
    - `cargo test -p codex-core --test all
    rmcp_client::stdio_image_responses`
    - `just fix -p codex-core -p codex-protocol -p codex-mcp -p
    codex-rollout -p codex-rollout-trace -p codex-mcp-server`
  • Move message history out of core (#21278)
    ## Why
    
    Message history was implemented inside `codex-core` and surfaced through
    core protocol ops and `SessionConfiguredEvent` fields even though the
    current consumer is TUI-local prompt recall. That made core own UI
    history persistence and exposed `history_log_id` / `history_entry_count`
    through surfaces that app-server and other clients do not need.
    
    This change moves message history persistence out of core and keeps the
    recall plumbing local to the TUI.
    
    ## What changed
    
    - Added a new `codex-message-history` crate for appending, looking up,
    trimming, and reading metadata from `history.jsonl`.
    - Removed core protocol history ops/events: `AddToHistory`,
    `GetHistoryEntryRequest`, and `GetHistoryEntryResponse`.
    - Removed `history_log_id` and `history_entry_count` from
    `SessionConfiguredEvent` and updated exec/MCP/test fixtures accordingly.
    - Updated the TUI to dispatch local app events for message-history
    append/lookup and keep its persistent-history metadata in TUI session
    state.
    
    ## Validation
    
    - `cargo test -p codex-message-history -p codex-protocol`
    - `cargo test -p codex-exec event_processor_with_json_output`
    - `cargo test -p codex-mcp-server outgoing_message`
    - `cargo test -p codex-tui`
    - `just fix -p codex-message-history -p codex-protocol -p codex-core -p
    codex-tui -p codex-exec -p codex-mcp-server`
  • Move installation ID resolution out of core startup (#21182)
    ## Summary
    
    - resolve or inject the installation ID before core startup and pass it
    through `ThreadManager`, `CodexSpawnArgs`, and `Session` as a plain
    `String`
    - keep child sessions on the parent installation ID instead of
    rediscovering it inside core
    - propagate installation ID startup failures in `mcp-server` instead of
    panicking
    
    ## Why
    
    Core was still touching the filesystem on the session startup path to
    discover `installation_id`. This moves that work to the outer host
    boundary so core no longer depends on `codex_home` reads during session
    construction.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: add session_id (#20437)
    ## Summary
    
    Related to
    https://openai.slack.com/archives/C095U48JNL9/p1777537279707449
    TLDR:
    We update the meaning of session ids and thread ids:
    * thread_id stays as it is now
    * session_id becomes an ID shared by every thread under a /root
    thread (i.e. every sub-agent shares the same session ID)
    
    This PR introduces an explicit `SessionId` and threads it through the
    protocol/client boundary so `session_id` and `thread_id` can diverge
    when they need to, while preserving compatibility for older serialized
    `session_configured` events.
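
    The intended relationship between the two ids can be sketched as follows. The types here are simplified stand-ins: the string-backed ids, `ThreadIdentity`, and `spawn_child` are assumptions for illustration, not the protocol types added by this PR.

    ```rust
    /// Simplified string-backed ids (the real types may differ).
    #[derive(Clone, Debug, PartialEq)]
    pub struct SessionId(pub String);

    #[derive(Clone, Debug, PartialEq)]
    pub struct ThreadId(pub String);

    /// Hypothetical pairing of a thread with the session it belongs to.
    #[derive(Clone, Debug)]
    pub struct ThreadIdentity {
        pub thread_id: ThreadId,
        pub session_id: SessionId,
    }

    impl ThreadIdentity {
        /// A root thread is given its own session id.
        pub fn root(thread_id: &str, session_id: &str) -> Self {
            Self {
                thread_id: ThreadId(thread_id.to_string()),
                session_id: SessionId(session_id.to_string()),
            }
        }

        /// A sub-agent gets a fresh thread id but inherits the session id.
        pub fn spawn_child(&self, child_thread_id: &str) -> Self {
            Self {
                thread_id: ThreadId(child_thread_id.to_string()),
                session_id: self.session_id.clone(),
            }
        }
    }
    ```

    Grandchildren spawned from a child inherit the same session id in this model, so a whole sub-agent tree can be grouped by one value.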
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex-analytics] rework thread_source for thread analytics (#20949)
    ## Summary
    - make `thread_source` an explicit optional thread-level field on
    `thread/start`, `thread/fork`, and returned thread payloads
    - persist `thread_source` in rollout/session metadata so resumed live
    threads retain the original value
    - replace the old best-effort `session_source` -> `thread_source`
    mapping with an explicit caller-supplied analytics classification
    
    ## Why
    Before this change, analytics `thread_source` was populated by a
    best-effort mapping from `session_source`. `session_source` describes
    the runtime/client surface, not the actual thread-level origin, so that
    projection was not accurate enough to distinguish cases such as `user`,
    `subagent`, `memory_consolidation`, and future thread origins reliably.
    
    Making `thread_source` explicit keeps one thread-level analytics field
    while letting callers provide the real classification directly instead
    of recovering it indirectly from `session_source`.
    
    ## Impact
    For new analytics events, `thread_source` now reflects the explicit
    thread-level classification supplied by the caller rather than an
    inferred value derived from `session_source`. Existing protocol fields
    remain optional; callers that omit `threadSource` now produce `null`
    instead of a best-effort inferred value.
    
    ## Validation
    - `just write-app-server-schema`
    - `cargo test -p codex-analytics -p codex-core -p
    codex-app-server-protocol --no-run`
    - `cargo test -p codex-app-server-protocol
    generated_ts_optional_nullable_fields_only_in_params`
    - `cargo test -p codex-analytics
    thread_initialized_event_serializes_expected_shape`
    - `cargo test -p codex-core
    resume_stopped_thread_from_rollout_preserves_thread_source`
  • [codex] Remove legacy ListSkills op (#21282)
    ## Why
    
    `skills/list` is already exposed through app-server v2 and covered by
    the app-server test suite. Keeping the separate core `Op::ListSkills`
    path leaves a duplicate legacy protocol surface that no longer needs to
    be maintained.
    
    ## What Changed
    
    - Removed `Op::ListSkills` and `EventMsg::ListSkillsResponse` from the
    core protocol.
    - Deleted the corresponding core session handler and stale core
    integration tests.
    - Removed rollout/MCP ignore branches and protocol v1 docs references
    for the deleted event/op.
    - Left app-server `skills/list` and its existing coverage intact.
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core --test all suite::skills`
    - `cargo check -p codex-mcp-server -p codex-rollout -p
    codex-rollout-trace`
    - `just fix -p codex-core`
  • [codex] Move thread naming to app server (#21260)
    ## Why
    
    Thread names are app-server metadata now, backed by the thread store and
    sqlite state database. Keeping a core `SetThreadName` op plus a rollout
    `thread_name_updated` event made rename persistence live in the wrong
    layer and required historical replay support for an event that new
    app-server flows should not write.
    
    ## What changed
    
    - Removed `Op::SetThreadName` and `EventMsg::ThreadNameUpdated` from the
    core protocol and deleted the core handler path that appended rename
    events to rollouts.
    - Updated app-server `thread/name/set` so both loaded and unloaded
    threads write through thread-store metadata and app-server emits
    `thread/name/updated` notifications.
    - Updated local thread-store name metadata updates to write sqlite title
    metadata and the legacy thread-name index without appending rollout
    events.
    - Removed state extraction and rollout handling for the deleted
    thread-name event.
    
    ## Validation
    
    - `cargo test -p codex-app-server thread_name_updated_broadcasts`
    - `cargo test -p codex-app-server
    thread_name_set_is_reflected_in_read_list_and_resume`
    - `cargo test -p codex-thread-store
    update_thread_metadata_sets_name_on_active_rollout_and_indexes_name`
    - `cargo test -p codex-state`
    - `cargo check -p codex-mcp-server -p codex-rollout-trace`
    - `just fix -p codex-app-server -p codex-thread-store -p codex-state -p
    codex-mcp-server -p codex-rollout-trace`
    
    ## Docs
    
    No external documentation update is expected for this internal ownership
    change.
  • Inject state DB, agent graph store (#20689)
    ## Why
    
    We want the agent graph store to be passed down the stack as a real
    dependency, the same way we already treat the thread store.
    
    This will let us inject the agent graph store as a real dependency and
    support implementations other than the local SQLite-backed one. Right
    now most code instantiates a state DB and an agent graph store
    just-in-time. Ideally, we would not depend on the state DB directly but
    only read through the higher-level interfaces.
    
    This change makes the dependency boundaries explicit and moves state DB
    initialization to process bootstrap instead of hiding it inside local
    store implementations.
    
    ## What changed
    
    - `ThreadManager` now requires a `StateDbHandle` and an
    `AgentGraphStore` at construction time instead of treating them as
    optional internals.
    - The local store constructors no longer lazily initialize SQLite.
    Callers now initialize the state DB once per process and use that shared
    handle to build:
      - `LocalThreadStore`
      - `LocalAgentGraphStore`
    - App bootstraps (`app-server`, `mcp-server`, `prompt_debug`, and the
    thread-manager sample) now initialize the state DB up front and inject
    the resulting handle down the stack.
    - `app-server` now consistently uses its process-scoped state DB handle
    instead of reopening SQLite or trying to recover it from loaded threads.
    - Device-key storage now reuses the shared state DB handle instead of
    maintaining its own lazy opener.
    - The thread archive / descendant traversal paths now use the injected
    `AgentGraphStore` instead of reaching through local
    thread-store-specific state.
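
    A minimal sketch of the injection pattern described above, assuming a heavily simplified synchronous interface: the single `children` method, `InMemoryGraphStore`, and `descendants` are hypothetical, and the traversal assumes the graph has no cycles.

    ```rust
    use std::collections::{HashMap, VecDeque};
    use std::sync::Arc;

    /// Simplified store interface; the real trait has a different shape.
    pub trait AgentGraphStore: Send + Sync {
        fn children(&self, parent: &str) -> Vec<String>;
    }

    /// In-memory implementation, standing in for a non-SQLite backend.
    #[derive(Default)]
    pub struct InMemoryGraphStore {
        edges: HashMap<String, Vec<String>>,
    }

    impl InMemoryGraphStore {
        pub fn with_edge(mut self, parent: &str, child: &str) -> Self {
            self.edges
                .entry(parent.to_string())
                .or_default()
                .push(child.to_string());
            self
        }
    }

    impl AgentGraphStore for InMemoryGraphStore {
        fn children(&self, parent: &str) -> Vec<String> {
            self.edges.get(parent).cloned().unwrap_or_default()
        }
    }

    /// The manager receives the store at construction instead of opening one.
    pub struct ThreadManager {
        graph: Arc<dyn AgentGraphStore>,
    }

    impl ThreadManager {
        pub fn new(graph: Arc<dyn AgentGraphStore>) -> Self {
            Self { graph }
        }

        /// Breadth-first descendant traversal through the injected store.
        pub fn descendants(&self, root: &str) -> Vec<String> {
            let mut found = Vec::new();
            let mut queue = VecDeque::from([root.to_string()]);
            while let Some(current) = queue.pop_front() {
                for child in self.graph.children(&current) {
                    found.push(child.clone());
                    queue.push_back(child);
                }
            }
            found
        }
    }
    ```

    Since the manager only sees the trait object, tests and alternative backends can supply their own store without touching traversal code.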
    
    ## Verification
    
    - `cargo check -p codex-core -p codex-thread-store -p codex-app-server
    -p codex-mcp-server -p codex-thread-manager-sample --tests`
    - `cargo test -p codex-thread-store`
    - `cargo test -p codex-core
    thread_manager_accepts_separate_agent_graph_store_and_thread_store --
    --nocapture`
    - `cargo test -p codex-app-server
    thread_archive_archives_spawned_descendants -- --nocapture`
  • state: pass state db handles through consumers (#20561)
    ## Why
    
    SQLite state was still being opened from consumer paths, including lazy
    `OnceCell`-backed thread-store call sites. That let one process
    construct multiple state DB connections for the same Codex home, which
    makes SQLite lock contention and `database is locked` failures much
    easier to hit.
    
    State DB lifetime should be chosen by main-like entrypoints and tests,
    then passed through explicitly. Consumers should use the supplied
    `Option<StateDbHandle>` or `StateDbHandle` and keep their existing
    filesystem fallback or error behavior when no handle is available.
    
    The startup path also needs to keep the rollout crate in charge of
    SQLite state initialization. Opening `codex_state::StateRuntime`
    directly bypasses rollout metadata backfill, so entrypoints should
    initialize through `codex_rollout::state_db` and receive a handle only
    after required rollout backfills have completed.
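
    A minimal sketch of the ownership pattern described above: a main-like entrypoint creates one handle and every consumer receives it explicitly. `StateDb`, `init_state_db`, `SessionNameLookup`, and `filesystem_fallback` are hypothetical stand-ins for illustration, not the real `codex_state` / `codex_rollout` types.

    ```rust
    use std::collections::HashMap;
    use std::sync::Arc;

    /// Hypothetical stand-in for the SQLite-backed state runtime.
    pub struct StateDb {
        session_names: HashMap<String, String>,
    }

    /// Hypothetical alias: one shared, reference-counted handle per process.
    pub type StateDbHandle = Arc<StateDb>;

    /// Called once by a main-like entrypoint (or a test), never by consumers.
    pub fn init_state_db(rows: &[(&str, &str)]) -> StateDbHandle {
        let session_names = rows
            .iter()
            .map(|(id, name)| (id.to_string(), name.to_string()))
            .collect();
        Arc::new(StateDb { session_names })
    }

    /// A consumer that is handed the state DB instead of opening its own.
    pub struct SessionNameLookup {
        state_db: Option<StateDbHandle>,
    }

    impl SessionNameLookup {
        pub fn new(state_db: Option<StateDbHandle>) -> Self {
            Self { state_db }
        }

        /// Uses the supplied handle when present; otherwise (or on a miss)
        /// keeps the pre-existing filesystem behavior.
        pub fn name_for(&self, thread_id: &str) -> String {
            match &self.state_db {
                Some(db) => db
                    .session_names
                    .get(thread_id)
                    .cloned()
                    .unwrap_or_else(|| filesystem_fallback(thread_id)),
                None => filesystem_fallback(thread_id),
            }
        }
    }

    /// Placeholder for the filesystem path a real consumer would fall back to.
    fn filesystem_fallback(thread_id: &str) -> String {
        format!("fs:{thread_id}")
    }
    ```

    Because every consumer clones the same `Arc`, the process holds a single state DB instance no matter how many consumers exist, and passing `None` selects filesystem-only behavior without any hidden lazy open.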
    
    ## What Changed
    
    - Initialize the state DB in main-like entrypoints for CLI, TUI,
    app-server, exec, MCP server, and the thread-manager sample.
    - Pass `Option<StateDbHandle>` through `ThreadManager`,
    `LocalThreadStore`, app-server processors, TUI app wiring, rollout
    listing/recording, personality migration, shell snapshot cleanup,
    session-name lookup, and memory/device-key consumers.
    - Remove the lazy local state DB wrapper from the thread store so
    non-test consumers use only the supplied handle or their existing
    fallback path.
    - Make `codex_rollout::state_db::init` the local state startup path: it
    opens/migrates SQLite, runs rollout metadata backfill when needed, waits
    for concurrent backfill workers up to a bounded timeout, verifies
    completion, and then returns the initialized handle.
    - Keep optional/non-owning SQLite helpers, such as remote TUI local
    reads, as open-only paths that do not run startup backfill.
    - Switch app-server startup from direct
    `codex_state::StateRuntime::init` to the rollout state initializer so
    app-server cannot skip rollout backfill.
    - Collapse split rollout lookup/list APIs so callers use the normal
    methods with an optional state handle instead of `_with_state_db`
    variants.
    - Restore `getConversationSummary(ThreadId)` to delegate through
    `ThreadStore::read_thread` instead of a LocalThreadStore-specific
    rollout path special case.
    - Keep DB-backed rollout path lookup keyed on the DB row and file
    existence, without imposing the filesystem filename convention on
    existing DB rows.
    - Verify readable DB-backed rollout paths against `session_meta.id`
    before returning them, so a stale SQLite row that points at another
    thread's JSONL falls back to filesystem search and read-repairs the DB
    row.
    - Keep `debug prompt-input` filesystem-only so a one-off debug command
    does not initialize or backfill SQLite state just to print prompt input.
    - Keep goal-session test Codex homes alive only in the goal-specific
    helper, rather than leaking tempdirs from the shared session test
    helper.
    - Update tests and call sites to pass explicit state handles where DB
    behavior is expected and explicit `None` where filesystem-only behavior
    is intended.
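
    The "wait for concurrent backfill workers up to a bounded timeout" step in the list above can be sketched as a deadline-bounded poll. This is an illustration only: `wait_for_backfill`, `InitError`, and `spawn_backfill_worker` are hypothetical names, and the real initializer may coordinate through a lease rather than a shared flag.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::thread;
    use std::time::{Duration, Instant};

    #[derive(Debug, PartialEq, Eq)]
    pub enum InitError {
        BackfillTimedOut,
    }

    /// Polls `is_complete` until it reports true or `timeout` elapses.
    /// Returning an error instead of looping forever is what keeps startup
    /// from hanging on a stuck backfill.
    pub fn wait_for_backfill(
        is_complete: impl Fn() -> bool,
        timeout: Duration,
        poll_interval: Duration,
    ) -> Result<(), InitError> {
        let deadline = Instant::now() + timeout;
        loop {
            if is_complete() {
                return Ok(());
            }
            if Instant::now() >= deadline {
                return Err(InitError::BackfillTimedOut);
            }
            thread::sleep(poll_interval);
        }
    }

    /// Simulates another worker that finishes the backfill after `delay`.
    pub fn spawn_backfill_worker(done: Arc<AtomicBool>, delay: Duration) -> thread::JoinHandle<()> {
        thread::spawn(move || {
            thread::sleep(delay);
            done.store(true, Ordering::SeqCst);
        })
    }
    ```

    A caller would hand out the initialized handle only after `wait_for_backfill` returns `Ok(())`, which mirrors the "verifies completion, and then returns the initialized handle" ordering.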
    
    ## Validation
    
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p
    codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p
    codex-tui -p codex-exec -p codex-cli --tests`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-rollout state_db_`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-rollout find_thread_path`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-rollout find_thread_path -- --nocapture`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-rollout try_init_ -- --nocapture`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-rollout`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p
    codex-rollout --lib -- -D warnings`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-thread-store
    read_thread_falls_back_when_sqlite_path_points_to_another_thread --
    --nocapture`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-thread-store`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
    shell_snapshot`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
    --test all personality_migration`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
    --test all rollout_list_find`
    - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
    CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
    --test all rollout_list_find::find_prefers_sqlite_path_by_id --
    --nocapture`
    - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
    CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
    --test all rollout_list_find -- --nocapture`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
    interrupt_accounts_active_goal_before_pausing`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-app-server get_auth_status -- --test-threads=1`
    - `CODEX_SKIP_VENDORED_BWRAP=1
    CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
    codex-app-server --lib`
    - `CODEX_SKIP_VENDORED_BWRAP=1
    CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout
    -p codex-app-server --tests`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout
    -p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p
    codex-exec -p codex-cli`
    - `CODEX_SKIP_VENDORED_BWRAP=1
    CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p
    codex-app-server`
    - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p
    codex-rollout`
    - `CODEX_SKIP_VENDORED_BWRAP=1
    CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core`
    - `just argument-comment-lint -p codex-core`
    - `just argument-comment-lint -p codex-rollout`
    
    Focused coverage added in `codex-rollout`:
    
    - `recorder::tests::state_db_init_backfills_before_returning` verifies
    the rollout metadata row exists before startup init returns.
    - `state_db::tests::try_init_waits_for_concurrent_startup_backfill`
    verifies startup waits for another worker to finish backfill instead of
    disabling the handle for the process.
    -
    `state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill`
    verifies startup does not hang indefinitely on a stuck backfill lease.
    -
    `tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename`
    verifies DB-backed lookup accepts valid existing rollout paths even when
    the filename does not include the thread UUID.
    -
    `tests::find_thread_path_falls_back_when_db_path_points_to_another_thread`
    verifies DB-backed lookup ignores a stale row whose existing path
    belongs to another thread and read-repairs the row after filesystem
    fallback.
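
    The lookup behavior these two `find_thread_path` tests cover can be modeled in memory. The sketch below is an assumption-laden illustration, not the `codex-rollout` implementation: `Lookup`, `db_rows`, and `files` are hypothetical, with `files` standing in for rollout files on disk and the `session_meta.id` each one contains.

    ```rust
    use std::collections::HashMap;

    pub struct Lookup {
        /// thread id -> rollout path recorded in the DB row
        pub db_rows: HashMap<String, String>,
        /// rollout path -> `session_meta.id` stored inside that file
        pub files: HashMap<String, String>,
    }

    impl Lookup {
        pub fn find_thread_path(&mut self, thread_id: &str) -> Option<String> {
            if let Some(path) = self.db_rows.get(thread_id) {
                // Accept the DB path only if the file exists and its
                // session metadata names this thread. The filename itself
                // is never inspected.
                if self.files.get(path).map(String::as_str) == Some(thread_id) {
                    return Some(path.clone());
                }
            }
            // Stale or missing row: fall back to scanning the "filesystem".
            let found = self
                .files
                .iter()
                .find(|(_, id)| id.as_str() == thread_id)
                .map(|(path, _)| path.clone())?;
            // Read-repair: rewrite the DB row so the next lookup is direct.
            self.db_rows.insert(thread_id.to_string(), found.clone());
            Some(found)
        }
    }
    ```

    With a row for one thread pointing at another thread's file, the first call returns the path found by the scan and overwrites the row; a path whose filename lacks the thread UUID is still accepted as long as the metadata id matches.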
    
    Focused coverage updated in `codex-core`:
    
    - `rollout_list_find::find_prefers_sqlite_path_by_id` now uses a
    DB-preferred rollout file with matching `session_meta.id`, so it still
    verifies that valid SQLite paths win without depending on stale/empty
    rollout contents.
    
    `cargo test -p codex-app-server thread_list_respects_search_term_filter
    -- --test-threads=1 --nocapture` was attempted locally but timed out
    waiting for the app-server test harness `initialize` response before
    reaching the changed thread-list code path.
    
    `bazel test //codex-rs/thread-store:thread-store-unit-tests
    --test_output=errors` was attempted locally after the thread-store fix,
    but this container failed before target analysis while fetching `v8+`
    through BuildBuddy/direct GitHub. The equivalent local crate coverage,
    including `cargo test -p codex-thread-store`, passes.
    
    A plain local `cargo check -p codex-rollout -p codex-app-server --tests`
    also requires system `libcap.pc` for `codex-linux-sandbox`; the
    follow-up app-server check above used `CODEX_SKIP_VENDORED_BWRAP=1` in
    this container.
  • [codex] Emit image view as core item (#20512)
    ## Why
    
    Image-view results should be represented as a core-produced turn item
    instead of being reconstructed by app-server. At the same time, existing
    rollout/history paths still understand the legacy `ViewImageToolCall`
    event, so this keeps that event as compatibility output generated from
    the new item lifecycle.
    
    ## What changed
    
    - Added `TurnItem::ImageView` to `codex-protocol`.
    - Emitted image-view item start/completion directly from the core
    `view_image` handler.
    - Kept `ViewImageToolCall` as a legacy event, now generated from
    completed `TurnItem::ImageView` items.
    - Kept `thread_history.rs` on the legacy `ViewImageToolCall` replay
    path, with `ImageView` item lifecycle events ignored there.
    - Updated app-server protocol conversion, rollout persistence, and
    affected exhaustive event matches for the new item plus legacy fan-out
    shape.
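
    A rough sketch of the fan-out described above, where the completed item is the source of truth and the legacy event is derived from it. The enum shapes and field names (`call_id`, `path`) and both function names are simplified assumptions, not the actual `codex-protocol` definitions.

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    pub enum TurnItem {
        ImageView { call_id: String, path: String },
        AgentMessage { text: String },
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum Event {
        ItemStarted(TurnItem),
        ItemCompleted(TurnItem),
        /// Legacy event kept for existing rollout/history consumers.
        ViewImageToolCall { call_id: String, path: String },
    }

    /// Emits the item completion and, for image views only, the legacy
    /// compatibility event derived from that same item.
    pub fn events_for_completed_item(item: TurnItem) -> Vec<Event> {
        let mut events = vec![Event::ItemCompleted(item.clone())];
        if let TurnItem::ImageView { call_id, path } = item {
            events.push(Event::ViewImageToolCall { call_id, path });
        }
        events
    }

    /// History replay stays on the legacy event and ignores the new item
    /// lifecycle events, so each viewed image is counted once.
    pub fn replay_image_paths(events: &[Event]) -> Vec<String> {
        events
            .iter()
            .filter_map(|event| match event {
                Event::ViewImageToolCall { path, .. } => Some(path.clone()),
                _ => None,
            })
            .collect()
    }
    ```

    Ignoring the item lifecycle events during replay is what prevents the same image from appearing twice in reconstructed history while both shapes are emitted.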
    
    ## Verification
    
    - `cargo test -p codex-protocol -p codex-app-server-protocol -p
    codex-rollout -p codex-rollout-trace -p codex-mcp-server -p
    codex-app-server --lib`
    - `cargo test -p codex-core --test all
    view_image_tool_attaches_local_image`
    - `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
    -p codex-app-server -p codex-rollout -p codex-rollout-trace -p
    codex-mcp-server`
    - `git diff --check`
  • Make thread store process-scoped (#19474)
    - Build one app-server process ThreadStore from startup config and share
    it with ThreadManager and CodexMessageProcessor.
    - Remove per-thread/fork store reconstruction so effective thread config
    cannot switch the persistence backend.
    - Add params to ThreadStore create/resume for specifying thread
    metadata, since otherwise the metadata from store creation would be used
    (incorrectly).
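
    As an illustration of the process-scoped design, the sketch below builds one store, shares it between two components, and passes metadata per thread at creation time. The trait shape, `InMemoryThreadStore`, `ThreadMetadata`, and `MessageProcessor` are hypothetical simplifications; the real `ThreadStore` API is async and richer.

    ```rust
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};

    #[derive(Debug, Clone, PartialEq)]
    pub struct ThreadMetadata {
        pub cwd: String,
        pub model: String,
    }

    pub trait ThreadStore: Send + Sync {
        /// Metadata is supplied per call, not captured when the store is built.
        fn create_thread(&self, id: &str, metadata: ThreadMetadata);
        fn metadata(&self, id: &str) -> Option<ThreadMetadata>;
    }

    #[derive(Default)]
    pub struct InMemoryThreadStore {
        threads: Mutex<HashMap<String, ThreadMetadata>>,
    }

    impl ThreadStore for InMemoryThreadStore {
        fn create_thread(&self, id: &str, metadata: ThreadMetadata) {
            self.threads.lock().unwrap().insert(id.to_string(), metadata);
        }

        fn metadata(&self, id: &str) -> Option<ThreadMetadata> {
            self.threads.lock().unwrap().get(id).cloned()
        }
    }

    pub struct ThreadManager {
        store: Arc<dyn ThreadStore>,
    }

    impl ThreadManager {
        pub fn new(store: Arc<dyn ThreadStore>) -> Self {
            Self { store }
        }

        pub fn start_thread(&self, id: &str, metadata: ThreadMetadata) {
            self.store.create_thread(id, metadata);
        }
    }

    pub struct MessageProcessor {
        store: Arc<dyn ThreadStore>,
    }

    impl MessageProcessor {
        pub fn new(store: Arc<dyn ThreadStore>) -> Self {
            Self { store }
        }

        pub fn thread_model(&self, id: &str) -> Option<String> {
            self.store.metadata(id).map(|metadata| metadata.model)
        }
    }
    ```

    Since neither component can construct a store of its own, a thread's effective config has no path to swap the persistence backend; it can only influence the metadata passed to `create_thread`.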
  • [codex] Remove unused event messages (#20511)
    ## Why
    
    Several legacy `EventMsg` variants were still emitted or mapped even
    though clients either ignored them or had moved to item/lifecycle
    events. `Op::Undo` had also degraded to an unavailable shim, so this
    removes that dead task path instead of preserving a command that cannot
    do useful work.
    
    `McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
    intentionally kept because useful consumers still depend on them: MCP
    startup completion drives readiness behavior, and the begin events let
    app-server/core consumers surface in-progress web-search and
    image-generation items before the final payload arrives.
    
    ## What Changed
    
    - Removed weak legacy event variants and payloads from `codex-protocol`,
    including legacy agent deltas, background events, and undo lifecycle
    events.
    - Kept/restored `EventMsg::McpStartupComplete`,
    `EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
    serializer and emission coverage.
    - Updated core, rollout, MCP server, app-server thread history,
    review/delegate filtering, and tests to rely on the useful replacement
    events that remain.
    - Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
    slash-command comments.
    - Stopped agent job/background progress and compaction retry notices
    from emitting `BackgroundEvent` payloads.
    
    ## Verification
    
    - `cargo check -p codex-protocol -p codex-app-server-protocol -p
    codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
    - `cargo test -p codex-protocol -p codex-app-server-protocol -p
    codex-rollout -p codex-rollout-trace -p codex-mcp-server`
    - `cargo test -p codex-core --test all suite::items`
    - `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
    -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
    - Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
    core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
    startup TUI tests.
  • Reduce the surface of collaboration modes (#20149)
    Collaboration modes reached somewhat invasively into both ThreadManager
    construction and ModelProvider; this change reduces that surface.
  • Add ThreadManager sample crate (#20141)
    Summary:
    - Add codex-thread-manager-sample, a one-shot binary that starts a
    ThreadManager thread, submits a prompt, and prints the final assistant
    output.
    - Pass ThreadStore into ThreadManager::new and expose
    thread_store_from_config for existing callsites.
    - Build the sample Config directly with only --model and prompt inputs.
    
    Verification:
    - just fmt
    - cargo check -p codex-thread-manager-sample -p codex-app-server -p
    codex-mcp-server
    - git diff --check
    
    Tests: Not run per request.
  • Add environment provider snapshot (#20058)
    ## Summary
    - Change `EnvironmentProvider` to return concrete `Environment`
    instances instead of `EnvironmentConfigurations`.
    - Make `DefaultEnvironmentProvider` provide the provider-visible `local`
    environment plus optional `remote` environment from
    `CODEX_EXEC_SERVER_URL`.
    - Keep `EnvironmentManager` as the concrete cache while exposing its own
    explicit local environment for `local_environment()` fallback paths.
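
    A small sketch of what "return concrete `Environment` instances" could look like for the default provider. Only the `local` / `remote` names and `CODEX_EXEC_SERVER_URL` come from the summary above; the `Environment` enum shape, the `environments` method, and the constructor split are assumptions made for illustration.

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    pub enum Environment {
        Local,
        Remote { url: String },
    }

    pub trait EnvironmentProvider {
        /// Returns ready-to-use environments keyed by their visible name.
        fn environments(&self) -> Vec<(String, Environment)>;
    }

    pub struct DefaultEnvironmentProvider {
        exec_server_url: Option<String>,
    }

    impl DefaultEnvironmentProvider {
        /// Explicit constructor so tests do not depend on process env.
        pub fn new(exec_server_url: Option<String>) -> Self {
            Self { exec_server_url }
        }

        /// Reads the optional remote endpoint from `CODEX_EXEC_SERVER_URL`.
        pub fn from_env() -> Self {
            Self::new(std::env::var("CODEX_EXEC_SERVER_URL").ok())
        }
    }

    impl EnvironmentProvider for DefaultEnvironmentProvider {
        fn environments(&self) -> Vec<(String, Environment)> {
            // `local` is always provided; `remote` only when a URL is set.
            let mut envs = vec![("local".to_string(), Environment::Local)];
            if let Some(url) = &self.exec_server_url {
                envs.push(("remote".to_string(), Environment::Remote { url: url.clone() }));
            }
            envs
        }
    }
    ```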
    
    ## Validation
    - `just fmt`
    - `git diff --check`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • permissions: make SessionConfigured profile-only (#19774)
    ## Why
    
    `SessionConfiguredEvent` is the internal event that tells clients what
    permissions are active for a session. Emitting both `sandbox_policy` and
    `permission_profile` leaves two possible authorities and forces every
    consumer to decide which one to honor. At this point in the migration,
    the profile is expressive enough to represent managed, disabled, and
    external sandbox enforcement, so the internal event can be profile-only.
    
    The wire compatibility concern is older serialized events or rollout
    data that only contain `sandbox_policy`; those still need to
    deserialize.
    
    ## What Changed
    
    - Removes `sandbox_policy` from `SessionConfiguredEvent` and makes
    `permission_profile` required.
    - Adds custom deserialization so old payloads with only `sandbox_policy`
    are upgraded to a cwd-anchored `PermissionProfile`.
    - Updates core event emission and TUI session handling to sync
    permissions from the profile directly.
    - Updates app-server response construction to derive the legacy
    `sandbox` response field from the active thread snapshot instead of from
    `SessionConfiguredEvent`.
    - Updates yolo-mode display logic to treat both
    `PermissionProfile::Disabled` and managed unrestricted filesystem plus
    enabled network as full-access, while still preserving the distinction
    between no sandbox and external sandboxing.
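
    The legacy-payload upgrade can be pictured as a conversion from a permissive wire shape into the profile-only event. Everything below is a simplified assumption: the two-variant `LegacySandboxPolicy`, the `Managed { writable_roots }` profile, and the mapping between them are illustrative and do not reproduce the real `codex-protocol` types. With serde, one way to attach such a conversion is the `try_from` container attribute, though the PR does not say which mechanism it uses.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical, reduced version of the legacy policy enum.
    #[derive(Debug, Clone, PartialEq)]
    pub enum LegacySandboxPolicy {
        ReadOnly,
        WorkspaceWrite,
    }

    /// Hypothetical, reduced version of the permission profile.
    #[derive(Debug, Clone, PartialEq)]
    pub enum PermissionProfile {
        Disabled,
        Managed { writable_roots: Vec<PathBuf> },
    }

    /// Wire shape: accepts both new payloads and old ones.
    pub struct WireSessionConfigured {
        pub cwd: PathBuf,
        pub permission_profile: Option<PermissionProfile>,
        pub sandbox_policy: Option<LegacySandboxPolicy>,
    }

    /// In-memory shape: the profile is required, the policy is gone.
    #[derive(Debug, PartialEq)]
    pub struct SessionConfigured {
        pub cwd: PathBuf,
        pub permission_profile: PermissionProfile,
    }

    /// Illustrative mapping that anchors workspace writes at `cwd`.
    fn upgrade(policy: LegacySandboxPolicy, cwd: &Path) -> PermissionProfile {
        match policy {
            LegacySandboxPolicy::ReadOnly => PermissionProfile::Managed { writable_roots: vec![] },
            LegacySandboxPolicy::WorkspaceWrite => PermissionProfile::Managed {
                writable_roots: vec![cwd.to_path_buf()],
            },
        }
    }

    impl TryFrom<WireSessionConfigured> for SessionConfigured {
        type Error = String;

        fn try_from(wire: WireSessionConfigured) -> Result<Self, Self::Error> {
            let permission_profile = match (wire.permission_profile, wire.sandbox_policy) {
                // New payloads: the profile is the single authority.
                (Some(profile), _) => profile,
                // Old payloads: upgrade the policy relative to the cwd.
                (None, Some(policy)) => upgrade(policy, &wire.cwd),
                (None, None) => {
                    return Err("missing permission_profile and sandbox_policy".to_string());
                }
            };
            Ok(Self { cwd: wire.cwd, permission_profile })
        }
    }
    ```

    Keeping the optional fields confined to the wire struct means code that consumes `SessionConfigured` never has to choose between two authorities.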
    
    ## Verification
    
    - `cargo test -p codex-protocol session_configured_event --lib`
    - `cargo test -p codex-protocol serialize_event --lib`
    - `cargo test -p codex-exec session_configured --lib`
    - `cargo test -p codex-app-server
    thread_response_permission_profile_preserves_enforcement --lib`
    - `cargo test -p codex-core
    session_configured_reports_permission_profile_for_external_sandbox
    --lib`
    - `cargo test -p codex-tui session_configured --lib`
    - `cargo test -p codex-tui
    yolo_mode_includes_managed_full_access_profiles --lib`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19774).
    * #19900
    * #19899
    * #19776
    * #19775
    * __->__ #19774
  • refactor: make auth loading async (#19762)
    ## Summary
    
    Auth loading used to expose synchronous construction helpers in several
    places even though some auth sources now need async work. This PR makes
    the auth-loading surface async and updates the callers to await it.
    
    This is intentionally only plumbing. It does not change how
    AgentIdentity tokens are decoded, how task runtime ids are allocated, or
    how JWT signatures are verified.
    
    ## Stack
    
    1. **This PR:** [refactor: make auth loading
    async](https://github.com/openai/codex/pull/19762)
    2. [refactor: load AgentIdentity runtime
    eagerly](https://github.com/openai/codex/pull/19763)
    3. [feat: verify AgentIdentity JWTs with
    JWKS](https://github.com/openai/codex/pull/19764)
    
    ## Important call sites
    
    | Area | Change |
    | --- | --- |
    | `codex-login` auth loading | `CodexAuth` and `AuthManager`
    construction paths now await auth loading. |
    | app-server startup | Auth manager construction is awaited during
    initialization. |
    | CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now
    await the same behavior. |
    | cloud requirements storage loader | The loader becomes async so it can
    share the same auth construction path. |
    | auth tests | Tests that load auth now run in async contexts. |
    
    ## Testing
    
    Tests: targeted Rust auth test compilation, formatter, scoped Clippy
    fix, and Bazel lock check.