109 Commits

  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` field to `Thread`.
    This will be stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N+ is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
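    
    The fail-closed rule above can be sketched roughly as follows. This is an
    illustration only: `HistoryMode`, `ThreadOp`, `GateError`,
    `parse_history_mode`, and `check_op` are hypothetical names, not the types
    added by this PR, and the sketch models a binary that does not yet support
    the paginated contract.
    
    ```rust
    // Hypothetical stand-in for the persisted history mode.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum HistoryMode {
        Legacy,
        Paginated,
    }
    
    // Hypothetical operation labels; comments name the app-server paths they model.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ThreadOp {
        List,         // thread/list
        ReadMetadata, // thread/read(includeTurns=false)
        ReadTurns,    // thread/read(includeTurns=true)
        Resume,       // thread/resume
    }
    
    #[derive(Debug, PartialEq, Eq)]
    enum GateError {
        UnknownHistoryMode(String),
        Unsupported { op: ThreadOp, mode: HistoryMode },
    }
    
    /// Parse a persisted mode string. Unknown or future values are an error
    /// and are never silently treated as `Legacy`.
    fn parse_history_mode(raw: &str) -> Result<HistoryMode, GateError> {
        match raw {
            "legacy" => Ok(HistoryMode::Legacy),
            "paginated" => Ok(HistoryMode::Paginated),
            other => Err(GateError::UnknownHistoryMode(other.to_string())),
        }
    }
    
    /// Allow metadata-only discovery for paginated threads and reject
    /// operations that would need the full legacy history.
    fn check_op(mode: HistoryMode, op: ThreadOp) -> Result<(), GateError> {
        match (mode, op) {
            (HistoryMode::Legacy, _) => Ok(()),
            (HistoryMode::Paginated, ThreadOp::List | ThreadOp::ReadMetadata) => Ok(()),
            (HistoryMode::Paginated, _) => Err(GateError::Unsupported { op, mode }),
        }
    }
    ```
    
    With this shape, `check_op(HistoryMode::Paginated, ThreadOp::Resume)`
    returns an error that a caller could map to the unsupported JSON-RPC
    error, while listing the same thread still succeeds.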
  • [codex] Attribute app-server analytics by thread originator (#29935)
    ## Why
    
    Desktop Work threads and regular Codex threads can share the same
    app-server connection. App-server analytics currently copy
    `product_client_id` from connection metadata for every thread-scoped
    event, so Work thread activity is attributed to the Desktop connection
    instead of the thread's resolved originator. This prevents analytics
    from distinguishing the two products on a shared connection.
    
    ## What changed
    
    - Publish the resolved originator after a thread is materialized,
    covering new, resumed, forked, and subagent threads.
    - Store that originator in the analytics reducer's existing per-thread
    state.
    - Override only `app_server_client.product_client_id` for thread, turn,
    tool, review, goal, guardian, and compaction events while preserving the
    connection's client name, version, and transport metadata.
    - Fall back to the connection-wide product client ID when a thread has
    no originator override.
    - Preserve persisted originators in thread initialization analytics for
    resume and fork flows.
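    
    A minimal sketch of the override-with-fallback lookup described above.
    `AnalyticsState` and its methods are hypothetical and only illustrate the
    idea; the real reducer state and field names may differ.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical reducer state: one connection-wide ID plus per-thread
    /// originators recorded after a thread is materialized.
    #[derive(Default)]
    struct AnalyticsState {
        connection_product_client_id: String,
        thread_originators: HashMap<String, String>,
    }
    
    impl AnalyticsState {
        /// Record the resolved originator for a thread.
        fn record_originator(&mut self, thread_id: &str, originator: &str) {
            self.thread_originators
                .insert(thread_id.to_string(), originator.to_string());
        }
    
        /// The thread's originator wins; otherwise fall back to the
        /// connection-wide product client ID.
        fn product_client_id(&self, thread_id: &str) -> &str {
            self.thread_originators
                .get(thread_id)
                .map(String::as_str)
                .unwrap_or(self.connection_product_client_id.as_str())
        }
    }
    ```
    
    Only the product client ID is resolved per thread here; the connection's
    client name, version, and transport metadata would be left untouched.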
    
    ## Validation
    
    - `just test -p codex-analytics
    thread_originator_overrides_shared_connection_across_thread_events
    subagent_events_keep_thread_originator_with_explicit_turn_connection`
    - `just test -p codex-app-server
    turn_start_tracks_thread_originator_in_analytics
    thread_start_tracks_thread_initialized_analytics
    thread_fork_tracks_thread_initialized_analytics
    thread_resume_tracks_thread_initialized_analytics`
    - `just test -p codex-core thread_manager`
  • [plugins] Track plugin install requests by ID (#29684)
    Summary
    - Emit `codex_plugin_install_requested` when a validated plugin install
    request is made, before the user accepts or declines the elicitation.
    - Record the exact model-visible plugin ID, remote plugin ID, required
    connector IDs, stable suggestion ID, and `endpoint_recommendation` vs
    `legacy_discovery` source.
    - Keep `suggest_reason` out of telemetry and leave connector-only
    install requests unchanged.
    
    Rollout
    - Backend/schema dependency:
    https://github.com/openai/openai/pull/1065270
    - Land the backend PR before this producer starts sending the event.
    
    Validation
    - `just test -p codex-analytics` (83 passed)
    - `just test -p codex-core request_plugin_install` (17 passed)
    - `just fix -p codex-analytics`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • Support thread-level originator overrides (#29477)
    ## Why
    
    Work (TPP) threads can be launched from the Desktop app, but if they all
    keep the Desktop app's default originator then downstream attribution
    cannot distinguish local Work launches from cloud-backed Work launches.
    `thread/start.serviceName` already carries that launch signal, while
    `SessionMeta.originator` is the durable thread-level value that survives
    resume and fork.
    
    This change converts the Desktop Work service names into an effective
    originator at thread creation time, persists that originator with the
    thread, and keeps using it for later model requests and memory writes.
    
    ## What changed
    
    - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
    per-thread originators, while preserving
    `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
    - Persist the effective originator in `SessionMeta.originator`, read it
    back on resume/fork, and inherit the parent originator for subagent
    spawns when there is no persisted session metadata.
    - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
    back to the live parent originator when the forked history no longer
    includes `SessionMeta`.
    - Thread the per-thread originator through Responses headers,
    websocket/compaction request paths, thread-store creation, rollout
    metadata, and memory stage-one telemetry.
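    
    The precedence at thread creation might look like the sketch below.
    `effective_originator` is a hypothetical helper, and the originator
    strings returned for the Work service names are placeholders, since the
    real values are not stated in this description.
    
    ```rust
    /// Resolve the effective originator for a new thread.
    /// Precedence: internal override, then service-name mapping, then default.
    fn effective_originator(
        internal_override: Option<&str>, // CODEX_INTERNAL_ORIGINATOR_OVERRIDE, if set
        service_name: Option<&str>,      // thread/start.serviceName
        default_originator: &str,        // the client's default originator
    ) -> String {
        if let Some(originator) = internal_override {
            return originator.to_string();
        }
        match service_name {
            // Placeholder values: the real per-thread originators are not shown here.
            Some("CODEX_WORK_LOCAL") => "work_local_placeholder".to_string(),
            Some("CODEX_WORK_CLOUD") => "work_cloud_placeholder".to_string(),
            _ => default_originator.to_string(),
        }
    }
    ```
    
    On resume or fork, the persisted `SessionMeta.originator` would be read
    back instead of re-running this mapping.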
    
    ## Verification
    
    - `just test -p codex-core
    agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
    agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
    thread_manager::tests::originator_override_precedes_service_name_remapping`
    - `just test -p codex-core
    agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
    - `just test -p codex-memories-write`
    - `just fix -p codex-core -p codex-memories-write`
    - `git diff --check`
  • [codex] rename rollout budget error to session budget error (#29744)
    ## Summary
    
    - rename the rollout-budget exhaustion error from
    `RolloutBudgetExceeded` to `SessionBudgetExceeded`
    - expose the matching app-server v2 wire value as
    `sessionBudgetExceeded`
    - regenerate JSON/TypeScript schema fixtures and update the app-server
    docs and focused tests
    
    This is a naming-only follow-up to #29715 based on [Pavel's review
    suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
    Runtime behavior is unchanged.
    
    ## Tests
    
    - `just test -p codex-core rollout_budget`
    - `just test -p codex-app-server-protocol`
    - `just fmt`
    - `just write-app-server-schema`
  • [codex] surface rollout budget exhaustion (#29715)
    ## Summary
    - surface shared rollout-budget exhaustion as
    `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
    - map it through the existing `CodexErrorInfo` and app-server v2
    `codexErrorInfo` path
    - keep local compaction from retrying after the shared rollout budget is
    exhausted
    
    This gives app-server clients a stable `rolloutBudgetExceeded` error
    they can classify without guessing from `status="interrupted"`.
    
    ## Tests
    - `just test -p codex-core rollout_budget`
  • Separate local and remote plugin analytics IDs (#29495)
    ## Why
    
    Plugin analytics overloaded `plugin_id`: most events used the Codex
    `<plugin>@<marketplace>` identity, while remote install events used the
    backend plugin ID. That makes the same field change meaning across event
    types and complicates downstream identity resolution.
    
    This change makes the contract unambiguous:
    
    - `plugin_id`: the local Codex `<plugin>@<marketplace>` identity, when
    resolved
    - `remote_plugin_id`: the backend plugin identity, when available
    
    For a remote install failure that happens before plugin details resolve,
    `plugin_id` is `null` and `remote_plugin_id` remains populated.
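    
    As a rough illustration of that contract, the two fields can be modeled
    as independent optional values. `PluginIdentity` and `plugin_identity`
    are hypothetical names used only for this sketch.
    
    ```rust
    /// Hypothetical shape of the identity fields on a plugin analytics event.
    #[derive(Debug, PartialEq, Eq)]
    struct PluginIdentity {
        /// Local Codex `<plugin>@<marketplace>` identity, when resolved.
        plugin_id: Option<String>,
        /// Backend plugin identity, when available.
        remote_plugin_id: Option<String>,
    }
    
    /// Build the identity from whatever is known at emit time.
    fn plugin_identity(
        local: Option<(&str, &str)>, // (plugin, marketplace), if resolved
        remote_plugin_id: Option<&str>,
    ) -> PluginIdentity {
        PluginIdentity {
            plugin_id: local.map(|(plugin, marketplace)| format!("{plugin}@{marketplace}")),
            remote_plugin_id: remote_plugin_id.map(str::to_owned),
        }
    }
    ```
    
    A remote install that fails before details resolve corresponds to
    `plugin_identity(None, Some(..))`, which serializes `plugin_id` as `null`
    while keeping `remote_plugin_id`.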
    
    ## What changed
    
    All six plugin analytics events use the same identity contract:
    
    - `codex_plugin_installed`
    - `codex_plugin_install_failed`
    - `codex_plugin_uninstalled`
    - `codex_plugin_enabled`
    - `codex_plugin_disabled`
    - `codex_plugin_used`
    
    Remote identity is resolved from the current installed-plugin snapshot
    first, with persisted install metadata as fallback. The telemetry
    metadata type keeps local identity optional for failures that occur
    before remote details are available.
    
    The app-server test client's manual analytics smokes now find remote
    mutation events through `remote_plugin_id` and validate that `plugin_id`
    remains local.
    
    ## Remote uninstall
    
    Resolve and capture telemetry metadata before removing the local plugin
    cache, then emit `codex_plugin_uninstalled` after the backend confirms
    success. The event is also emitted when backend uninstall succeeds but
    local cache cleanup reports `CacheRemove`.
    
    If a concurrent remote-cache refresh removes the local bundle before
    telemetry capture, the already-fetched remote plugin detail supplies
    fallback capability metadata.
    
    ## Validation
    
    - `just test -p codex-analytics` — 82 passed
    - `just test -p codex-core-plugins` — 271 passed
    - `just test -p codex-app-server-test-client` — 5 passed
    - `just test -p codex-plugin` — 3 passed
    - `just test -p codex-app-server plugin_install` — 37 passed
    - `just test -p codex-app-server plugin_uninstall` — 10 passed
    
    The production app-server install/uninstall flow was also exercised
    against `plugins~Plugin_f1b845ac33888191ac156169c58733c2`
    (`build-ios-apps@openai-curated-remote`), and the plugin's original
    uninstalled state was restored.
  • core: add extra metadata field to Thread struct (#29675)
    # Summary
    
    Adds a `Thread.extras` field that can hold arbitrary metadata specific
    to a given thread.
  • chore(core) rm AskForApproval::OnFailure (#28418)
    ## Summary
    Deletes the `OnFailure` variant of the `AskForApproval` enum. This option
    has been deprecated since #11631.
    
    ## Testing
    - [x] Tests pass
  • [codex] Centralize Plugin Analytics Metadata (#27102)
    This PR moves construction of `PluginTelemetryMetadata` from loader and
    model helpers into `PluginsManager`, which already owns installed plugin
    state and will eventually perform remote identity enrichment. The
    metadata type remains in `codex-plugin`, and serialized analytics events
    remain unchanged.
    
    ## Before
    
    ```mermaid
    flowchart LR
        subgraph Events["Analytics event paths"]
            direction TB
            Lifecycle["Local install / uninstall"]
            Config["Enable / disable"]
            Remote["Remote install"]
            Used["Plugin used"]
        end
    
        subgraph Construction["Metadata construction"]
            direction TB
            Loader["Loader telemetry helpers"]
            Summary["PluginCapabilitySummary::telemetry_metadata"]
            Override["Caller adds remote_plugin_id"]
        end
    
        Metadata["PluginTelemetryMetadata"]
    
        Lifecycle --> Loader
        Config --> Loader
        Remote --> Loader
        Loader -->|"local events"| Metadata
        Loader -->|"remote install"| Override
        Override --> Metadata
        Used --> Summary
        Summary --> Metadata
    ```
    
    Telemetry metadata was constructed through loader helpers, a
    capability-summary method, and a remote-install call-site override.
    
    ## After
    
    ```mermaid
    flowchart LR
        subgraph Events["Analytics event paths"]
            direction TB
            Lifecycle["Local install / uninstall"]
            Config["Enable / disable"]
            Remote["Remote install"]
            Used["Plugin used"]
        end
    
        Manager["PluginsManager — single construction owner"]
        Metadata["PluginTelemetryMetadata"]
    
        Lifecycle --> Manager
        Config --> Manager
        Remote -->|"authoritative remote ID"| Manager
        Used -->|"capability summary"| Manager
        Manager --> Metadata
    ```
    
    Every analytics path delegates metadata construction to
    `PluginsManager`. Remote install still supplies its authoritative
    backend ID explicitly.
    
    ## What Changes
    
    - Make loader code return a focused plugin capability summary instead of
    constructing analytics metadata.
    - Centralize immutable plugin telemetry metadata construction in
    `PluginsManager`.
    - Route local install/uninstall, remote install, enable/disable, and
    plugin-used emitters through the manager.
    - Preserve the current serialized analytics contract exactly.
    
    Normal metadata still has no remote override. Remote install continues
    to provide its authoritative backend ID explicitly, so the existing
    serializer continues reporting that ID through `plugin_id`.
    Snapshot-based enrichment is intentionally deferred to the final PR.
    
    ## Testing
    
    - `just test -p codex-core-plugins` (238 tests passed)
    - `just test -p codex-plugin` (3 tests passed)
    - Scoped Clippy/compile checks passed for `codex-plugin`,
    `codex-core-plugins`, `codex-app-server`, and `codex-core`.
    
    ## Split Overview
    
    ```text
    main
    ├── #27093  Debug analytics capture                 (merged)
    ├── #27099  Non-mutating plugin smoke               (merged)
    ├── #27100  Remote install/uninstall smoke          (merged)
    └── #27102  Plugin telemetry metadata refactor      ← you are here
        └── #27669  Persist remote plugin identity
    
    After #27102 and #27669 merge:
    └── Final PR: add explicit local and remote IDs to plugin analytics
    ```
    
    Review order and dependencies:
    
    1. [#27093 Add debug-only analytics event
    capture](https://github.com/openai/codex/pull/27093) (merged)
    2. [#27099 Add a plugin analytics smoke
    workflow](https://github.com/openai/codex/pull/27099) (merged)
    3. [#27100 Add a remote plugin analytics mutation smoke
    workflow](https://github.com/openai/codex/pull/27100) (merged)
    4. This metadata refactor, independent and based on `main`
    5. [#27669 Persist remote plugin
    identity](https://github.com/openai/codex/pull/27669), stacked on this
    PR
    6. Final remote-ID behavior PR, created after the prerequisites merge
    
    The original [#26281](https://github.com/openai/codex/pull/26281)
    remains open as the aggregate reference until the final replacement PR
    is published.
  • Expose thread-level multi-agent mode (#28792)
    ## Why
    
    Once multi-agent mode can be selected per turn, clients also need to
    choose the initial selection when creating a thread and observe that
    selection through lifecycle and settings APIs.
    
    The selected value is intentionally distinct from the effective
    model-visible value: no client selection is represented as `null`, even
    though an eligible multi-agent v2 turn derives `explicitRequestOnly` as
    its effective default.
    
    ## What changed
    
    - Add the optional experimental `thread/start.multiAgentMode` parameter
    and pass it through thread creation.
    - Preserve an omitted initial value as an unset selection rather than
    eagerly storing `explicitRequestOnly`.
    - Apply an explicit `thread/start` selection to the first turn through
    the session configuration established at thread creation.
    - Restore the latest persisted effective mode as the selected baseline
    on cold resume when rollout history contains one.
    - Inherit the optional selected mode from a loaded parent when creating
    related runtime threads.
    - Return the current selected `multiAgentMode` from `thread/start`,
    `thread/resume`, `thread/fork`, and thread settings, using `null` when
    no mode is selected.
    - Keep lifecycle reporting independent from model capability and feature
    eligibility; core turn construction remains responsible for calculating
    and persisting the effective mode.
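    
    The split between the selected and the effective value could be sketched
    as below. All names are hypothetical; `Other` stands in for modes not
    named in this description, and returning `None` for an ineligible turn is
    an assumption rather than documented behavior.
    
    ```rust
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum MultiAgentMode {
        ExplicitRequestOnly,
        Other(String), // stand-in for modes not named here
    }
    
    /// What lifecycle and settings APIs report: the client's selection,
    /// or `None` (`null` on the wire) when nothing was selected.
    fn reported_mode(selected: &Option<MultiAgentMode>) -> Option<MultiAgentMode> {
        selected.clone()
    }
    
    /// What turn construction derives. Assumption: an ineligible turn has
    /// no effective mode; an eligible turn defaults to `ExplicitRequestOnly`.
    fn effective_mode(selected: &Option<MultiAgentMode>, eligible: bool) -> Option<MultiAgentMode> {
        if !eligible {
            return None;
        }
        Some(selected.clone().unwrap_or(MultiAgentMode::ExplicitRequestOnly))
    }
    ```
    
    The point of keeping the two functions separate is that reporting never
    depends on eligibility, so an unset selection stays `None` even when the
    effective default would be `ExplicitRequestOnly`.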
    
    ## Not covered
    
    - Clearing an existing loaded-session selection back to unset through
    `turn/start`; omitted or `null` currently retains the session's
    selection.
    - A TUI control, slash command, or `config.toml` preference.
    
    ## Verification
    
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`
    
    The focused app-server coverage verifies explicit `thread/start`
    initialization, first-turn prompting, nullable reporting for an omitted
    selection, and retention of selections that are not currently
    runtime-eligible.
    
    ## Stack
    
    Stacked on #28685. This PR contains only the thread initialization and
    lifecycle/settings API layer.
  • Emit Trusted MCP App Identity on Tool-Call Items (#27132)
    ## Summary
    
    - Add optional `appContext` to app-server MCP tool-call items with
    trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata.
    - Preserve that context across tool-call events, persisted history,
    reconnects, and thread resume.
    - Keep the deprecated top-level `mcpAppResourceUri` temporarily for
    client migration.
    
    The consumer contract is `{ appContext: { connectorId, linkId,
    mcpAppResourceUri }, tool }`.
    
    ## Validation
    
    - Full GitHub Actions suite passes, including CLA, Bazel tests, clippy,
    release builds, and argument-comment lint.
    
    ---------
    
    Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>
  • Support openai/form extended form elicitations (#27500)
    # Summary
    Allow App Server clients to opt into `openai/form` MCP elicitations.
  • unified-exec: retain PathUri in command events (#28780)
    ## Why
    
    App-server must report command events containing foreign-platform paths
    without changing existing client or rollout path-string formats.
    
    ## What changed
    
    - retain `PathUri` through exec command begin/end events
    - convert cwd values to `LegacyAppPathString` at the app-server
    compatibility boundary
    - drop command actions with foreign paths and log them
    - serialize rollout-trace cwd values using their inferred native path
    representation
    - restore Wine coverage for retained Windows cwd values and successful
    completion
  • [codex] Track plugin install and import telemetry failures (#28731)
    ## Summary
    - Track plugin install failures through the unified
    `codex_plugin_install_failed` event for local installs, remote install
    preflight failures, bundle failures, and remote catalog/backend
    failures.
    - Send classified `error_type` values in plugin install failure
    analytics instead of raw error strings.
    - Stop sending raw external-agent import errors in analytics while
    preserving raw failure details in app-facing import
    notifications/history.
    - Keep raw plugin/migration diagnostics in `tracing::warn!` logs.
    - Keep remote failure plugin names as the existing local placeholder
    (`unknown`) and remove the extra telemetry plugin-name override.
    - Change `ExternalAgentConfigImportParams.source` from a generated enum
    to `string | null`, with legacy `claudeCode` / `claudeCowork` inputs
    normalized to existing analytics values.
    
    ## Testing
  • [codex] Restore thread recency with compatible migration history (#28671)
    ## Summary
    
    - Revert #28655, restoring the thread `recencyAt` behavior introduced by
    #27910.
    - Move `threads_recency_at` to migration 0039 so it no longer collides
    with `external_agent_config_imports` at version 0038.
    - Repair databases that already applied the recency migration as version
    38 by moving the matching migration-history row to version 39 before
    SQLx validation. The current version-38 migration can then apply
    normally.
    
    ## Validation
    
    - `just test -p codex-state
    migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
    - `just test -p codex-state -p codex-rollout -p codex-thread-store -p
    codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
    could not open the machine's existing read-only incident database at
    `~/.codex/sqlite/state_5.sqlite`.
    - `just fix -p codex-state`
    - `just fmt`
    - Verified that state migration versions are unique.
  • Scope command approvals by execution environment (#28738)
    ## Why
    
    Command approval cache keys included the command and working directory,
    but not the execution environment. An approval for `/workspace` locally
    could therefore be reused for the same command and path on an executor.
    
    ## What changed
    
    - Include the selected environment ID in shell and unified-exec approval
    cache keys.
    - Carry that ID through the normal command approval request so clients
    can show which environment is being approved.
    - Expose the environment through app-server as a required nullable
    `environmentId` and show it in the inline TUI approval prompt.
    - Keep older recorded approval events compatible when the environment is
    absent.
    
    For example, `echo ok` in local `/workspace` and `echo ok` in executor
    `/workspace` now produce different approval keys and separate prompts.
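    
    A minimal sketch of an environment-aware cache key, assuming a simple
    set-based cache. `ApprovalKey`, `ApprovalCache`, and `approval_key` are
    hypothetical names, and the environment IDs `local` and `executor-1` are
    made up for the example.
    
    ```rust
    use std::collections::HashSet;
    
    /// Hypothetical cache key: the environment ID is part of the identity.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    struct ApprovalKey {
        command: Vec<String>,
        cwd: String,
        environment_id: Option<String>, // None when the environment is absent
    }
    
    #[derive(Default)]
    struct ApprovalCache {
        approved: HashSet<ApprovalKey>,
    }
    
    impl ApprovalCache {
        fn approve(&mut self, key: ApprovalKey) {
            self.approved.insert(key);
        }
    
        fn is_approved(&self, key: &ApprovalKey) -> bool {
            self.approved.contains(key)
        }
    }
    
    /// Convenience constructor for the example.
    fn approval_key(command: &[&str], cwd: &str, environment_id: Option<&str>) -> ApprovalKey {
        ApprovalKey {
            command: command.iter().map(|part| part.to_string()).collect(),
            cwd: cwd.to_string(),
            environment_id: environment_id.map(str::to_owned),
        }
    }
    ```
    
    Approving `echo ok` in `/workspace` for `local` leaves the same command
    and path under `executor-1` unapproved, so it prompts separately.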
    
    ## Scope
    
    This PR does not change network approvals, Guardian review actions, MCP
    elicitation, full-screen TUI rendering, or environment-ID validation.
    Remote `shell_command` execution itself remains in #28722; this PR only
    makes its approval key environment-aware.
  • Revert thread recencyAt for sidebar ordering (#28655)
    ## Why
    
    Revert #27910 to remove the newly introduced thread `recencyAt`
    persistence and API behavior from `main`.
    
    ## What changed
    
    This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
    including the state migration, thread-store propagation, app-server API
    surface, generated schemas, and related tests.
    
    ## Validation
    
    Not run before opening; relying on CI for the initial fast signal.
  • Add thread recencyAt for sidebar ordering (#27910)
    ## Summary
    
    Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
    sort key for product recency ordering while preserving the existing
    meaning of `updatedAt` as the latest persisted thread mutation.
    
    This is the server-side alternative to #27697. Rather than narrowing
    `updatedAt`, clients can sort the sidebar by `recency_at` and continue
    treating `updatedAt` as mutation time.
    
    Paired Codex Apps PR:
    [openai/openai#1024599](https://github.com/openai/openai/pull/1024599)
    
    ## Contract
    
    - `recencyAt` initializes when a thread is created.
    - A turn start advances `recencyAt` monotonically.
    - Commentary, agent output, tool results, token/accounting updates, turn
    completion, archive, unarchive, resume, and generic metadata writes do
    not advance it.
    - `updatedAt` retains its existing behavior and continues to advance for
    persisted thread mutations.
    - Current servers populate `recencyAt`; the response field is optional
    in generated TypeScript so clients connected to older servers can fall
    back to `updatedAt`.
    - Filesystem-only fallback uses existing updated/mtime ordering when
    SQLite is unavailable.
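    
    The contract can be illustrated with a small model. `ThreadTimes` and
    its methods are hypothetical; timestamps are plain integers here, and how
    a turn start affects `updatedAt` is deliberately left out of the sketch.
    
    ```rust
    /// Hypothetical view of a thread's timestamps.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct ThreadTimes {
        updated_at: i64,
        recency_at: Option<i64>, // None models a response from an older server
    }
    
    impl ThreadTimes {
        /// A turn start advances recency monotonically: it never moves backwards.
        fn on_turn_start(&mut self, now: i64) {
            self.recency_at = Some(self.recency_at.map_or(now, |current| current.max(now)));
        }
    
        /// Other persisted mutations (output, archive, metadata writes, ...)
        /// touch only `updated_at`.
        fn on_other_mutation(&mut self, now: i64) {
            self.updated_at = now;
        }
    
        /// Client-side sort key: prefer recency, fall back to `updated_at`.
        fn sidebar_sort_key(&self) -> i64 {
            self.recency_at.unwrap_or(self.updated_at)
        }
    }
    ```
    
    In this model a tool result or archive changes `updated_at` but leaves
    the sidebar position alone, which is the behavior the contract describes.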
    
    ## Persistence and compatibility
    
    Migration 0038 adds second- and millisecond-precision recency columns,
    backfills them from the existing updated timestamp, creates list
    indexes, and includes an insert trigger so older binaries writing to a
    migrated database seed recency without causing later mutations to
    advance it.
    
    Generic metadata upserts preserve existing recency values. Turn-start
    updates use a dedicated monotonic touch, and process-local allocation
    keeps millisecond cursor values unique. State DB list, search, read,
    filtered-list repair, rollout fallback propagation, and app-server
    conversions all carry the new field.
    
    ## API
    
    `Thread` responses include:
    
    ```ts
    recencyAt?: number
    ```
    
    `thread/list` and `thread/search` accept:
    
    ```json
    { "sortKey": "recency_at" }
    ```
    
    Generated TypeScript and JSON schemas are included.
    
    ## Validation
    
    - `just test -p codex-state` — 146 passed
    - `just test -p codex-rollout` — 69 passed
    - `just test -p codex-thread-store` — 81 passed
    - `just test -p codex-app-server-protocol` — 231 passed
    - Focused app-server list ordering, response mapping, archive/unarchive,
    and resume lifecycle tests passed
    - Scoped `just fix` for state, rollout, thread-store,
    app-server-protocol, and app-server
    - `just fmt`
    - `git diff --check`
    - Independent correctness, simplicity, elegance, security, and
    test-quality reviews; actionable ordering, lifecycle, query-projection,
    and timestamp-uniqueness findings were addressed
  • [codex] Add interruptible sleep tool (#28429)
    ## Why
    
    Models sometimes need to pause briefly while waiting for external work,
    but using a shell command for that delay ties the wait to a process and
    does not naturally resume when new turn input arrives.
    
    ## What changed
    
    - add a built-in `sleep` tool behind the under-development `sleep_tool`
    feature
    - accept a bounded `duration_ms` argument, matching the millisecond
    convention used by unified exec
    - end the sleep early when either steered user input or mailbox input
    arrives
    - include elapsed wall-clock time in completed and interrupted outputs
    - emit a dedicated core `SleepItem` through `item/started` and
    `item/completed`
    - expose the sleep item as app-server v2 `ThreadItem::Sleep` and retain
    it in reconstructed thread history
    - regenerate the configuration schema for the new feature flag
    - regenerate app-server JSON and TypeScript schema fixtures
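    
    The early-wake behavior can be approximated with a channel and a timed
    receive. This is a blocking, standard-library sketch rather than the
    tool's implementation: `interruptible_sleep`, `SleepOutcome`, and the
    `MAX_SLEEP_MS` bound are hypothetical, and a single channel stands in for
    both steered user input and mailbox input.
    
    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::{Duration, Instant};
    
    #[derive(Debug, PartialEq, Eq)]
    enum SleepOutcome {
        Completed,
        Interrupted,
    }
    
    /// Hypothetical upper bound on a single sleep.
    const MAX_SLEEP_MS: u64 = 60_000;
    
    /// Sleep for up to `duration_ms`, waking early if new input arrives.
    /// Returns the outcome and the elapsed wall-clock time.
    fn interruptible_sleep(duration_ms: u64, new_input: &Receiver<()>) -> (SleepOutcome, Duration) {
        let started = Instant::now();
        let timeout = Duration::from_millis(duration_ms.min(MAX_SLEEP_MS));
        let outcome = match new_input.recv_timeout(timeout) {
            Ok(()) => SleepOutcome::Interrupted,
            Err(RecvTimeoutError::Timeout) => SleepOutcome::Completed,
            Err(RecvTimeoutError::Disconnected) => {
                // No sender can ever interrupt us; finish the remaining wait.
                std::thread::sleep(timeout.saturating_sub(started.elapsed()));
                SleepOutcome::Completed
            }
        };
        (outcome, started.elapsed())
    }
    ```
    
    Unlike a shell `sleep`, the wait here is not tied to a child process, and
    a pending message ends it immediately with the elapsed time still
    reported.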
    
    ## Test plan
    
    - `just test -p codex-core sleep_tool_follows_feature_gate`
    - `just test -p codex-core any_new_input_interrupts_sleep`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    sleep_emits_started_and_completed_items`
  • [codex-analytics] Analytics Capture to File in Debug Builds (#27093)
    ## This PR
    
    The original [combined remote plugin analytics PR
    #26281](https://github.com/openai/codex/pull/26281) mixed reusable
    analytics test infrastructure, two manual smoke workflows, a metadata
    refactor, and the final identity behavior. This PR isolates the generic
    capture mechanism so it can be reviewed and landed before any
    plugin-specific behavior.
    
    - Add a debug-only analytics destination that writes final request
    payloads as JSONL.
    - Suppress HTTP delivery whenever capture mode is selected, including
    after capture write failures.
    - Keep release behavior unchanged even when the capture environment
    variable is present.
    - Keep the mechanism generic; this PR contains no plugin-specific
    behavior.
    
    Set `CODEX_ANALYTICS_EVENTS_CAPTURE_FILE=/path/events.jsonl` when
    running a debug Codex binary to inspect the exact batched payload that
    would otherwise be sent to the analytics endpoint.
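    
    The destination choice described above might be modeled like this.
    `AnalyticsDestination`, `select_destination`, and `destination_from_env`
    are hypothetical names, and treating an empty value as unset is an
    assumption of the sketch.
    
    ```rust
    use std::path::PathBuf;
    
    #[derive(Debug, PartialEq, Eq)]
    enum AnalyticsDestination {
        Http,
        CaptureFile(PathBuf),
    }
    
    /// Capture is honored only in debug builds; release builds always use HTTP,
    /// even when the variable is present.
    fn select_destination(is_debug_build: bool, capture_file: Option<&str>) -> AnalyticsDestination {
        match capture_file {
            Some(path) if is_debug_build && !path.is_empty() => {
                AnalyticsDestination::CaptureFile(PathBuf::from(path))
            }
            _ => AnalyticsDestination::Http,
        }
    }
    
    /// Read the variable from the process environment.
    fn destination_from_env() -> AnalyticsDestination {
        let value = std::env::var("CODEX_ANALYTICS_EVENTS_CAPTURE_FILE").ok();
        select_destination(cfg!(debug_assertions), value.as_deref())
    }
    ```
    
    Once `CaptureFile` is selected, a caller following this PR's rule would
    skip HTTP delivery entirely, including after a failed capture write.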
    
    ## Testing
    
    - `just test -p codex-analytics` (76 passed)
    - `just test --release -p codex-analytics` (73 passed)
    - CI is green across the required platform matrix.
    
    ## Split Overview
    
    ```text
    main
    ├── #27093  Debug analytics capture                 ← you are here
    │   └── #27099  Non-mutating plugin smoke
    │       └── #27100  Remote install/uninstall smoke
    └── #27102  Plugin telemetry metadata refactor
    
    After #27093, #27099, #27100, and #27102 merge:
    └── Final PR: add remote_plugin_id to plugin analytics
    ```
    
    Review order and dependencies:
    
    1. [#27093 Add debug-only analytics event
    capture](https://github.com/openai/codex/pull/27093) **(this PR, based
    on `main`)**
    2. [#27099 Add a plugin analytics smoke
    workflow](https://github.com/openai/codex/pull/27099) (stacked on
    #27093)
    3. [#27100 Add a remote plugin analytics mutation smoke
    workflow](https://github.com/openai/codex/pull/27100) (stacked on
    #27099)
    4. [#27102 Centralize plugin telemetry metadata
    construction](https://github.com/openai/codex/pull/27102) (independent,
    based on `main`)
    5. Final remote-ID behavior PR (created after PRs 1-4 merge)
    
    The original [#26281](https://github.com/openai/codex/pull/26281)
    remains open as the green aggregate reference until the final PR is
    published.
  • Add Guardian catalog diagnostics metadata (#27109)
    ## Why
    
    We need request-level evidence for Guardian cases where
    `codex-auto-review` is missing from the client-side model catalog and
    the review falls back to the parent model.
    
    ## What changed
    
    - Add `guardian_catalog_contains_auto_review` to Guardian Responses API
    client metadata.
    - Add `guardian_model_provider_id` to Guardian Responses API client
    metadata.
    - Keep review-session metadata optional so callers without metadata
    preserve the existing `None` path.
    - Add tests for override, normal preferred-model, and
    missing-auto-review-catalog behavior.
    
    ## Validation
    
    - `just test -p codex-core
    guardian_review_records_missing_auto_review_model_in_request_metadata`
    - `just test -p codex-core
    guardian_review_uses_model_catalog_override_when_preferred_review_model_exists`
    - `just test -p codex-core
    guardian_review_uses_preferred_review_model_without_model_catalog_override`
    - `git diff --check origin/main`
  • Emit plugin ID on MCP tool call analytics events (#27483)
    MCP tool-call items already carry the runtime-resolved plugin owner, but
    the analytics reducer dropped that field. Forwarding the existing value
    provides direct attribution without downstream server-name inference.
    
    ## Summary
    
    - emit `plugin_id` on `codex_mcp_tool_call_event` payloads
    - preserve `null` for MCP calls without a plugin owner
    - verify the serialized field through the MCP item lifecycle test
    
    ## Test
    
    - `cd codex-rs && just test -p codex-analytics`
    - `cd codex-rs && just fix -p codex-analytics`
    - `cd codex-rs && just fmt`
  • [codex-analytics] Emit structured compaction codex errors (#27082)
    ## Summary
    - replace raw compaction `error` analytics with `codex_error_kind` and
    `codex_error_http_status_code`
    - derive compaction error telemetry from `CodexErr` using the same
    `CodexErrKind` mapping and HTTP status helper used by turn events
    - remove the pre-compact hook stop reason from the internal compaction
    outcome now that it is no longer emitted as raw analytics text
    
    ## Why
    Compaction `error` was a raw `CodexErr::to_string()` value, which can
    carry free-form provider or user-derived text. Structured Codex error
    fields preserve useful low-cardinality telemetry without sending the raw
    string.
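
    A minimal sketch of the idea, assuming a reduced error type: the
    `CodexErr` variants other than `InvalidRequest(String)`, the
    `CodexErrKind` variants, and the `telemetry_for` helper are hypothetical
    stand-ins, not the real definitions. It shows how only a low-cardinality
    kind and an optional HTTP status leave the process while the free-form
    text stays behind.

    ```rust
    /// Hypothetical, reduced error type; the real `CodexErr` has more variants.
    #[allow(dead_code)]
    #[derive(Debug)]
    enum CodexErr {
        InvalidRequest(String),
        UnexpectedStatus { status: u16, body: String },
        Timeout,
    }

    /// Hypothetical low-cardinality classification of an error.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum CodexErrKind {
        InvalidRequest,
        UnexpectedStatus,
        Timeout,
    }

    /// The two structured fields named in the summary above.
    #[derive(Debug, PartialEq, Eq)]
    struct CompactionErrorTelemetry {
        codex_error_kind: CodexErrKind,
        codex_error_http_status_code: Option<u16>,
    }

    /// Map an error to structured telemetry without copying any message text.
    fn telemetry_for(err: &CodexErr) -> CompactionErrorTelemetry {
        let (kind, status) = match err {
            // The String payload is deliberately ignored.
            CodexErr::InvalidRequest(_) => (CodexErrKind::InvalidRequest, None),
            // Only the numeric status is kept; the body is dropped.
            CodexErr::UnexpectedStatus { status, .. } => {
                (CodexErrKind::UnexpectedStatus, Some(*status))
            }
            CodexErr::Timeout => (CodexErrKind::Timeout, None),
        };
        CompactionErrorTelemetry {
            codex_error_kind: kind,
            codex_error_http_status_code: status,
        }
    }
    ```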
    
    ## Validation
    - `just fmt`
    - `just test -p codex-analytics`
    - `just test -p codex-core
    compact::tests::build_token_limited_compacted_history_appends_summary_message`
    
    Attempted `just test -p codex-core`; the changed crate compiled, but the
    full target failed in unrelated environment-dependent tests such as
    missing helper binaries and shell snapshot timeouts.
  • [codex-analytics] report cached input tokens for v2 compaction (#27103)
    ## Summary
    
    - add nullable `cached_input_tokens` to the compaction analytics event
    - populate it from response usage for compaction v2
    - leave it `null` for other compaction implementations
    
    This adds visibility into prompt-cache usage for v2 compaction without
    changing compaction behavior.
    
    ## Testing
    
    - `just test -p codex-analytics`
    - `just test -p codex-core
    collect_compaction_output_accepts_additional_output_items`
  • [codex] Compact when comp_hash changes (#27520)
    ## Summary
    - snapshot `comp_hash` into `TurnContext` when the turn is created and
    use that snapshot as the downstream source of truth
    - persist the turn hash in rollout context and recover it into
    previous-turn settings during resume and fork replay
    - compact existing history with the previous model only when both
    adjacent turns provide hashes and the values differ
    - record `comp_hash_changed` as the compaction reason
    - cover ordinary transitions, resume, and missing-hash compatibility
    with end-to-end tests
    
    ## Why
    History produced under one compaction-compatible model configuration may
    not be safe to carry directly into another. Compacting at the turn
    boundary converts that history before context updates and the new user
    message are added. Persisting the turn snapshot in `TurnContextItem`
    makes the same protection work after resuming a rollout.
    
    A missing hash is not treated as evidence of incompatibility. `None →
    Some`, `Some → None`, and `None → None` do not trigger compaction; only
    `Some(previous) → Some(current)` with unequal values does.
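
    The rule above can be sketched as a small predicate. The function name
    and the use of `&str` for the hash are assumptions for illustration; the
    real code may use a different hash type.

    ```rust
    /// Hypothetical helper: decide whether history should be compacted at a
    /// turn boundary, given the previous and current turn's `comp_hash`.
    fn should_compact_for_comp_hash(previous: Option<&str>, current: Option<&str>) -> bool {
        match (previous, current) {
            // Both turns provide a hash: compact only when the values differ.
            (Some(prev), Some(cur)) => prev != cur,
            // A missing hash on either side is not evidence of incompatibility.
            _ => false,
        }
    }
    ```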
    
    ## Stack
    - depends on #27532
    - #27532 is based directly on `main`
    
    ## Testing
    - `just test -p codex-core pre_sampling_compact_` — 6 passed
    - `just test -p codex-core
    turn_context_item_uses_turn_context_comp_hash_snapshot` — passed
    - `just fix -p codex-core -p codex-protocol -p codex-analytics -p
    codex-models-manager`
  • [codex-analytics] emit internally started turn events (#27392)
    ## Why
    Currently, the analytics reducer omits `codex_turn_event` for internally
    started subagent turns:
    - It uses `TurnState.connection_id` to select app-server client and
    runtime metadata
    - `turn/start` sets this field for client-started turns, while internal
    subagent turns bypass that path
    - Spawned child threads inherit the correct connection, but turn
    emission does not use thread state
    
    ## What Changed
    - Keeps explicit `TurnState.connection_id` authoritative for
    client-started turns
    - Falls back to the matching thread’s inherited connection when the turn
    connection is absent
    - Preserves completeness gates, event schema, and post-emission state
    removal
    - Extends subagent lifecycle test coverage
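
    The fallback described above can be sketched as follows. The struct
    shapes, the `u64` connection ID type, and `resolve_connection_id` are
    assumptions made for illustration, not the reducer's actual types.

    ```rust
    /// Hypothetical, reduced view of per-turn reducer state.
    struct TurnState {
        connection_id: Option<u64>,
    }

    /// Hypothetical, reduced view of per-thread reducer state.
    struct ThreadState {
        connection_id: Option<u64>,
    }

    /// Prefer the explicit turn connection; otherwise fall back to the
    /// connection inherited by the matching thread, if any.
    fn resolve_connection_id(turn: &TurnState, thread: Option<&ThreadState>) -> Option<u64> {
        turn.connection_id
            .or_else(|| thread.and_then(|t| t.connection_id))
    }
    ```

    When neither source has a connection, the sketch returns `None`, which
    matches the idea that the completeness gates still apply.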
    
    ## Verification
    - `just test -p codex-analytics` (71 tests passed)
    - `just fix -p codex-analytics`
    - `just fmt`
  • [codex] Retry transient Guardian review failures (#27062)
    ## Background
    
    Codex can use **Auto Review** for permission requests. Instead of asking
    the user immediately, Codex starts a separate locked-down reviewer
    session called **Guardian**, which returns a structured `allow` or
    `deny` assessment.
    
    The Guardian reviewer is itself a Codex session, so its model request
    can fail for transient infrastructure reasons such as model overload,
    HTTP connection failure, or response-stream disconnect. Today, any such
    failure immediately ends the Auto Review attempt and blocks the action.
    
    This PR adds bounded retries for failures that the existing protocol
    explicitly identifies as transient.
    
    Linear context:
    [CA-539](https://linear.app/openai/issue/CA-539/retry-auto-review-infrastructure-failures-and-fall-back-to-manual)
    
    ## What changes
    
    A Guardian review can now make at most **three total attempts**:
    
    1. Run the review normally.
    2. Retry after a jittered delay of roughly 180–220 ms if the first
    attempt fails with an eligible error.
    3. Retry after a jittered delay of roughly 360–440 ms if the second
    attempt also fails with an eligible error.
    
    All attempts share the original review deadline. Jitter spreads retries
    from concurrent clients to reduce synchronized load during broader
    outages. The retries do not reset the user's maximum wait time, and the
    backoff waits terminate early if the review is cancelled or the deadline
    expires.
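
    A rough sketch of the delay arithmetic, under stated assumptions: a
    200 ms base that doubles per retry with ±10% jitter is inferred from the
    "roughly 180–220 ms" and "360–440 ms" ranges above, and the names
    `retry_delay` and `next_wait` are hypothetical. The jitter sample is
    passed in so the example stays deterministic and std-only. As a
    simplification, the sketch gives up when the delay does not fit in the
    remaining budget, whereas the PR describes interrupting the wait.

    ```rust
    use std::time::Duration;

    /// Three total attempts, as described above.
    const MAX_ATTEMPTS: u32 = 3;
    /// Assumed base delay, inferred from the documented ranges.
    const BASE_DELAY_MS: u64 = 200;

    /// Delay before retry number `retry` (1-based). `jitter` is a sample in
    /// [-1.0, 1.0] that is scaled to ±10% of the exponential base delay.
    fn retry_delay(retry: u32, jitter: f64) -> Duration {
        let base = BASE_DELAY_MS * 2u64.pow(retry.saturating_sub(1));
        let factor = 1.0 + 0.1 * jitter.clamp(-1.0, 1.0);
        Duration::from_millis((base as f64 * factor).round() as u64)
    }

    /// Wait before the next attempt, or `None` when no retry should happen.
    /// `remaining` is the time left until the shared review deadline.
    fn next_wait(attempts_made: u32, jitter: f64, remaining: Duration) -> Option<Duration> {
        if attempts_made == 0 || attempts_made >= MAX_ATTEMPTS {
            return None;
        }
        let delay = retry_delay(attempts_made, jitter);
        // All attempts share one deadline, so a retry never extends it.
        if delay >= remaining {
            None
        } else {
            Some(delay)
        }
    }
    ```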
    
    Before retrying, the existing Guardian session lifecycle decides whether
    the session remains usable. Healthy trunks are reused, broken trunks are
    removed by the existing cleanup path, and ephemeral sessions continue to
    clean themselves up.
    
    The review still emits one logical lifecycle to clients. Recoverable
    intermediate failures do not produce warnings or terminal events.
    
    ## Retry policy
    
    ### Retried up to twice
    
    - model/server overload
    - HTTP connection failure
    - response-stream connection failure
    - response-stream disconnect
    - internal server error
    - a final reviewer message that cannot be parsed as the required
    Guardian assessment
    
    ### Not retried
    
    - bad or invalid requests
    - authentication failures
    - usage limits
    - cyber-policy failures
    - errors without a structured category
    - a request that already exhausted the lower-level Responses retry
    budget
    - a completed Guardian turn with no assessment payload
    - prompt-construction failures
    - Guardian review timeout
    - cancellation or abort
    - a valid `deny` assessment
    
    The session-error classification uses `ErrorEvent.codex_error_info`; it
    does not inspect error-message strings.
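
    One way to picture the classification is an exhaustive match over a
    structured category. The `ErrorCategory` enum below is a hypothetical
    stand-in for `CodexErrorInfo`, with variant names invented for this
    sketch; it covers only the session-error part of the policy, not the
    unparseable-assessment case.

    ```rust
    /// Hypothetical stand-in for the structured error category.
    #[allow(dead_code)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ErrorCategory {
        ServerOverloaded,
        HttpConnectionFailed,
        ResponseStreamConnectionFailed,
        ResponseStreamDisconnected,
        InternalServerError,
        BadRequest,
        Unauthorized,
        UsageLimitExceeded,
        CyberPolicy,
        RetryBudgetExhausted,
    }

    /// Retry only categories that are explicitly transient. An absent
    /// category is never retried, and no message string is inspected.
    fn is_retryable(category: Option<ErrorCategory>) -> bool {
        use ErrorCategory::*;
        match category {
            Some(
                ServerOverloaded
                | HttpConnectionFailed
                | ResponseStreamConnectionFailed
                | ResponseStreamDisconnected
                | InternalServerError,
            ) => true,
            Some(
                BadRequest | Unauthorized | UsageLimitExceeded | CyberPolicy
                | RetryBudgetExhausted,
            ) => false,
            None => false,
        }
    }
    ```

    Because the match has no wildcard arm, adding a new category forces an
    explicit retry decision at compile time.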
    
    ## Implementation notes
    
    - `wait_for_guardian_review` preserves the complete `ErrorEvent`,
    including structured `codex_error_info`.
    - Guardian session failures preserve the original message and optional
    structured `CodexErrorInfo`.
    - The retry policy classifies the explicitly transient `CodexErrorInfo`
    variants; unknown, absent, and deterministic categories are not retried.
    - The Guardian session manager receives the caller's deadline rather
    than creating a new timeout per attempt.
    - Analytics record the final `attempt_count`.
    - Retry orchestration does not add a separate session-cleanup protocol;
    it relies on the existing trunk and ephemeral lifecycle decisions.
    
    ## Automated testing
    
    Focused Guardian coverage verifies:
    
    - every supported transient `CodexErrorInfo` is classified as retryable,
    while absent and non-transient categories are not;
    - structured transient session failure -> retry -> approval with the
    healthy trunk reused;
    - two invalid Guardian responses -> third attempt -> approval, with
    exactly three requests;
    - three invalid responses -> existing fail-closed result, with exactly
    three requests and one terminal lifecycle;
    - valid denial, missing payload, invalid request, timeout, cancellation,
    and prompt/session construction failures are not retried;
    - retry eligibility ends after the third attempt;
    - retry delays use the shared exponential backoff helper and remain
    within the expected jitter bounds;
    - cancellation and deadline expiry interrupt the backoff wait;
    - healthy trunks are reused across retryable failures;
    - broken event streams remove the trunk through the existing lifecycle
    cleanup;
    - an ephemeral retry does not disturb a concurrent trunk review.
    
    Validation performed:
    
    - `just test -p codex-core guardian_review_
    guardian_ephemeral_retry_preserves_parallel_trunk_and_fork_history
    run_review_removes_trunk_when_event_stream_is_broken` — **42 passed**;
    - `just test -p codex-analytics` — **71 passed**;
    - scoped Clippy fixes for `codex-core` and `codex-analytics` passed.
    
    A prior full `codex-core` run had unrelated environment-sensitive
    failures outside Guardian coverage.
    
    ## Manual QA
    
    The focused integration tests use the local mock Responses server to
    inspect exact request counts and emitted lifecycle events. They confirm
    that retries are internal, a successful later attempt supplies the final
    decision, non-retryable failures issue only one request, and exhausted
    retries emit only one terminal result.
  • [codex] Fix post-merge analytics integration failures (#27285)
    ## Why
    
    Recent merges left `main` with analytics integration build failures.
    Local Cargo runs also made the trimmed-skills test depend on
    developer-installed skills, while Bazel used an isolated home.
    
    ## What changed
    
    - Clone `thread_metadata.thread_source` when constructing goal analytics
    event parameters.
    - Group app-server thread extension inputs into
    `ThreadExtensionDependencies`.
    - Isolate the trimmed-skills test home so its exact fixture count is
    stable across Cargo and Bazel.
    
    ## Validation
    
    - `cargo check -p codex-analytics`
    - `just test -p codex-analytics` (71 tests)
    - `just test -p codex-app-server` (837 tests; one unrelated zsh-fork
    timeout passed on retry)
  • [codex-analytics] emit goal lifecycle analytics (#27078)
    ## Why
    - Currently, there is no analytics event for `/goal` behavior
    - Existing events cannot identify goal execution or its resulting
    outcome
    - The original update in
    [#26182](https://github.com/openai/codex/pull/26182) was implemented
    before `/goal` moved into `codex-goal-extension`.
    
    ## What Changed
    - Adds `codex_goal_event` serialization and enrichment to
    `codex-analytics`
    - Emits goal events from the canonical `codex-goal-extension` mutation
    and accounting paths:
      - `created` when a new logical goal is persisted
      - `usage_accounted` when cumulative goal usage is persisted
      - `status_changed` when the stored goal status changes
      - `cleared` when the goal is deleted
    - Preserves causal `turn_id` for turn-driven events and uses null
    attribution for external or idle lifecycle events
    - Changes goal deletion to return the deleted row so `cleared` retains
    the stable goal ID
    
    ## Event Details
    
    Includes standard analytics metadata along with goal specific fields:
    - `goal_id`: Stable ID stored in the local SQLite goal row and shared
    across the goal's events
    - `event_kind`: Observed operation (one of the four lifecycle events
    listed above: `created`, `usage_accounted`, `status_changed`, `cleared`)
    - `goal_status`: Resulting or last stored status: `active`, `paused`,
    `blocked`, `usage_limited`, etc.
    - `has_token_budget`: Indicates whether a token budget is configured
    - `turn_id`: Causal turn ID, or null when no causal turn exists
    - `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted`
    events; null otherwise
    - `cumulative_time_accounted_seconds`: Cumulative active time on
    `usage_accounted` events; null otherwise
    
    ## Validation
    - `just test -p codex-analytics -p codex-state -p codex-goal-extension`
    - `just test -p codex-core -E 'test(/goal/)'`
    - `just test -p codex-app-server`
    - `cargo build -p codex-analytics -p codex-core -p codex-state -p
    codex-app-server`
  • [codex-analytics] add extensible feature thread sources (#27063)
    ## Why
    - `ThreadSource` currently defines a closed set of core-owned values
    - Product features also create threads for background or scheduled work
    - Adding every product-specific value to the core enum would require
    repeated `codex-rs` protocol changes
    - Feature-backed values let product callers provide precise attribution
    while preserving the existing core classifications
    
    ## What Changed
    - Adds `ThreadSource::Feature(String)` for app-owned thread source
    values
    - Represents all app-server v2 thread sources as scalar strings, so a
    feature source is supplied as `"automation"`
    - Persists and emits the feature's plain string label, so `"automation"`
    produces `thread_source="automation"` in analytics
    - Keeps `user`, `subagent`, and `memory_consolidation` as explicit
    core-owned values and regenerates the app-server schemas and TypeScript
    bindings
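
    A minimal sketch of the scalar-label mapping, assuming the core-owned
    wire labels are exactly `user`, `subagent`, and `memory_consolidation`.
    The `as_label` and `from_label` helpers are hypothetical; the real crate
    may implement this through its serialization layer instead.

    ```rust
    /// Reduced sketch of the enum described above.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum ThreadSource {
        User,
        Subagent,
        MemoryConsolidation,
        /// App-owned value such as "automation".
        Feature(String),
    }

    impl ThreadSource {
        /// Scalar label that is persisted and emitted as `thread_source`.
        fn as_label(&self) -> &str {
            match self {
                ThreadSource::User => "user",
                ThreadSource::Subagent => "subagent",
                ThreadSource::MemoryConsolidation => "memory_consolidation",
                ThreadSource::Feature(label) => label.as_str(),
            }
        }

        /// Known labels map to core-owned variants; anything else is
        /// treated as a feature-owned label.
        fn from_label(label: &str) -> Self {
            match label {
                "user" => ThreadSource::User,
                "subagent" => ThreadSource::Subagent,
                "memory_consolidation" => ThreadSource::MemoryConsolidation,
                other => ThreadSource::Feature(other.to_string()),
            }
        }
    }
    ```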
    
    ## Verification
    - `just write-app-server-schema`
    - `cargo check --workspace`
    - `just test -p codex-protocol
    feature_thread_source_serializes_as_its_app_owned_label`
    - `just test -p codex-app-server-protocol
    thread_sources_round_trip_as_scalar_labels`
    - `cargo test -p codex-analytics
    thread_initialized_event_serializes_expected_shape`
    - `just fmt`
  • multi-agent: add path-based v2 activity tracking (#27007)
    ## Why
    
    Multi-agent v2 identifies agents by canonical paths, but its tool
    handlers still emitted the larger legacy collaboration begin/end events
    built around nickname and role metadata. App-server, rollout-trace,
    analytics, and TUI consumers therefore lacked one compact path-based
    completion signal that behaved consistently across live events and
    replay.
    
    The TUI also needs a bounded `/agent` status surface for v2 agents. It
    should use recent local activity for previews, refresh liveness without
    loading full histories, and keep the legacy picker available when no
    path-backed v2 agent is known.
    
    ## What changed
    
    - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
    `interrupt_agent` legacy lifecycle emissions with a success-only
    `SubAgentActivity` event. The event records the tool call ID, occurrence
    time, affected thread, canonical agent path, and `started`,
    `interacted`, or `interrupted` kind.
    - Expose the activity as a completion-only app-server v2
    `subAgentActivity` thread item in live notifications and reconstructed
    history, regenerate the protocol schemas, and count it in sub-agent tool
    analytics.
    - Track canonical paths from live activity and loaded-thread metadata in
    the TUI, and render the activity in live and replayed transcripts.
    - Make `/agent` list running path-backed agents with summaries from
    bounded local event buffers. Each summary is capped at 240 graphemes,
    the scan is capped at six recent items, only the last three wrapped
    lines are shown, and command output is omitted. Liveness falls back to
    metadata-only `thread/read` when local turn state is unavailable.
    - Persist the activity as a terminal rollout-trace runtime payload and
    reduce it to the corresponding spawn, send, follow-up, or close
    interaction edge. `interrupt_agent` is classified as a close-edge
    operation.
    - Preserve the legacy picker when no path-backed v2 agent is known.
    
    ## Compatibility
    
    App-server v2 clients that consumed `collabAgentToolCall` begin/end
    pairs for these tools must handle the new completion-only
    `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.
    
    ## Screenshot
    
    <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
    src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
    />
    
    ## Testing
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-rollout-trace`
    - Added focused coverage for activity analytics, terminal trace
    serialization, spawn-edge reduction, `interrupt_agent` classification,
    TUI status rendering without aggregated command output, and clearing
    stale running state after a completed turn.
  • [codex-analytics] stop sending codex error subreason (#27060)
    ## Summary
    - stop emitting `codex_error_subreason` on `codex_turn_event`
    - remove the transient analytics fact plumbing that copied
    `CodexErr::InvalidRequest(String)` into the event
    - update analytics serialization coverage accordingly
    
    ## Why
    `codex_error_subreason` is a free-form copy of `InvalidRequest(String)`,
    including raw provider 400 bodies in some paths. That makes it unsafe as
    an analytics field because it can carry user-derived or sensitive text.
    
    ## Validation
    - `just fmt`
    - `just test -p codex-analytics`
  • [codex-analytics] report compaction analytics details (#26680)
    ## Why
    
    Compaction analytics adds retained image count and compaction summary
    output tokens for v1.5 specifically.
    
    ## What changed
    
    - Add nullable `retained_image_count` and `compaction_summary_tokens`
    fields to `codex_compaction_event`.
    - Populate them only for `responses_compaction_v2`: retained images come
    from the retained v2 compacted history, and summary tokens come from
    `response.completed.token_usage.output_tokens`.
    - Leave local and legacy remote compaction events as `null` for these
    detail fields.
    
    ## Verification
    
    - `just fmt`
    - `just fix -p codex-core`
    - `just test -p codex-core
    build_v2_compacted_history_counts_retained_input_images`
    - `git diff --check`
  • [codex] Add turn profiling analytics (#26484)
    ## Summary
    
    Add flat profiling fields to `codex_turn_event` so analytics can explain
    where turn wall-clock time is spent without changing tool execution
    behavior.
    
    The profile reports:
    - time before the first sampling request
    - sampling time across all attempts and follow-ups
    - overhead between sampling requests
    - time blocked in the post-sampling tool drain
    - time after the final sampling request
    - sampling request and retry counts
    
    ## Implementation
    
    - Extend the existing turn timing state with constant-memory phase
    accounting and one RAII phase guard.
    - Observe sampling and the existing post-sampling drain only at turn
    orchestration boundaries.
    - Keep tool runtime, tool futures, response item handling, and turn
    lifecycle values unchanged.
    - Add the profiling fields directly to the existing analytics turn event
    without changing app-server protocol or rollout persistence.
    - Use the existing turn `status` to distinguish completed, failed, and
    interrupted profiles.
    
    Exact sampling/tool overlap is intentionally omitted because measuring
    tool completion accurately would require hooks in the tool execution
    path.
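
    The RAII phase guard idea can be sketched as below. `TurnProfile`,
    `Phase`, and `PhaseGuard` are hypothetical names with a reduced field
    set; the sketch only illustrates how a guard can charge elapsed time to
    a fixed set of counters when it is dropped, which keeps memory constant
    regardless of how many phases a turn runs through.

    ```rust
    use std::cell::RefCell;
    use std::time::{Duration, Instant};

    /// Fixed-size accumulator: one field per tracked phase.
    #[derive(Debug, Default)]
    struct TurnProfile {
        sampling: Duration,
        tool_drain: Duration,
        sampling_requests: u32,
    }

    #[derive(Debug, Clone, Copy)]
    enum Phase {
        Sampling,
        ToolDrain,
    }

    /// Charges the time between creation and drop to one phase.
    struct PhaseGuard<'a> {
        profile: &'a RefCell<TurnProfile>,
        phase: Phase,
        started: Instant,
    }

    impl<'a> PhaseGuard<'a> {
        fn enter(profile: &'a RefCell<TurnProfile>, phase: Phase) -> Self {
            if let Phase::Sampling = phase {
                profile.borrow_mut().sampling_requests += 1;
            }
            PhaseGuard { profile, phase, started: Instant::now() }
        }
    }

    impl Drop for PhaseGuard<'_> {
        fn drop(&mut self) {
            // Runs on every exit path, including early returns and errors.
            let elapsed = self.started.elapsed();
            let mut profile = self.profile.borrow_mut();
            match self.phase {
                Phase::Sampling => profile.sampling += elapsed,
                Phase::ToolDrain => profile.tool_drain += elapsed,
            }
        }
    }
    ```

    Binding the guard to a named variable such as `_guard` keeps it alive
    until the end of the enclosing scope; binding it to `_` would drop it
    immediately.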
    
    ## Validation
    
    - Add app-server end-to-end coverage for a single-sampling turn with no
    blocking tool work.
    - Add app-server end-to-end coverage for `request_user_input` blocking
    followed by a second sampling request.
    - CI is running on the PR; tests were not executed locally per
    repository guidance.
  • [codex-analytics] emit forked thread id on initialization (#26248)
    ## Why
    - Thread initialization analytics do not identify the source thread for
    forked threads.
    - The session viewer needs this lineage to construct thread trees.
    - Depends on openai/openai#987854. Do not release this change before
    that backend schema change is deployed.
    
    ## What Changed
    - Adds optional `forked_from_thread_id` to `codex_thread_initialized`.
    - Populates it from the existing thread fork lineage for app-server and
    in-process subagent initialization paths.
    - Keeps it null for non-forked threads.
    
    ## Verification
    - `just fmt`
    - `just test -p codex-analytics`
    - `just test -p codex-app-server
    thread_fork_tracks_thread_initialized_analytics`
  • log plugin MCP server names (#26002)
    ## Summary
    - emit the plugin capability summary's exact MCP server names in
    `codex_plugin_used`
    
    ## Test
    - `just test -p codex-analytics`
    - `just test -p codex-core
    explicit_plugin_mentions_track_plugin_used_analytics`
    - `just fix -p codex-analytics`
  • Populate workspace kind on Codex turn events (#25135)
    ## Summary
    - carry `workspace_kind` from Responses API client metadata into the
    turn resolved analytics fact
    - serialize the optional value on `codex_turn_event`
    - cover both the turn metadata source and turn event serialization
    
    `workspace_kind` tells us whether a thread had a project attached or was
    projectless. This is an indicator of who is adopting Codex for
    knowledge work outside of coding.
    
    ## Testing
    - `env UV_CACHE_DIR=/private/tmp/uv-cache
    /private/tmp/cargo-tools/bin/just fmt`
    - `env PATH=/private/tmp/cargo-tools/bin:$PATH
    CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
    /private/tmp/cargo-tools/bin/just test -p codex-analytics`
    - `env PATH=/private/tmp/cargo-tools/bin:$PATH
    CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
    /private/tmp/cargo-tools/bin/just test -p codex-core turn_metadata`
    
    Paired with openai/openai#970661, which keeps forwarding the same
    metadata key through Responses API headers.
  • Propagate permission approval environment id (#25862)
    ## Stack
    
    1. #25850 - Key request-permission grants by environment: stores and
    applies sticky permission grants per environment id.
    2. #25858 - Add `environmentId` to `request_permissions`: lets the model
    target a selected environment and resolves relative permission paths
    against it.
    3. This PR (#25862) - Propagate permission approval environment id:
    carries the selected environment id through approval events, app-server
    requests, TUI prompts, and delegate forwarding.
    4. #25867 - Add remote request permissions integration coverage:
    verifies the selected remote environment across request, approval, grant
    reuse, and exec.
    
    This PR is stacked on #25858, and #25867 is stacked on this PR.
    
    ## Why
    
    PR2 lets the model bind a `request_permissions` call to a selected
    environment, but the approval event and client-facing request still
    needed to carry that binding. For CCA, the user-facing prompt and
    delegated approval path should know which environment the grant applies
    to instead of relying on cwd alone.
    
    ## What Changed
    
    - Added optional `environmentId` to `RequestPermissionsEvent`.
    - Emit the selected environment id from core permission approval events.
    - Preserve the environment id through delegate forwarding, including
    cwd-based delegated requests.
    - Added `environmentId` to app-server permission approval params,
    generated schema/TypeScript artifacts, and README examples.
    - Preserve and display the environment id in TUI permission approval
    prompts.
    - Updated focused core, app-server protocol, and TUI conversion
    coverage.
    
    ## Testing
    
    Not run locally per instruction. Performed read-only `git diff --check`.
  • [codex-analytics] Track CodexErr details in turn analytics (#25707)
    ## Summary
    - add analytics-only `CodexErr` telemetry to `codex_turn_event` while
    leaving existing `turn_error` unchanged
    - record terminal `CodexErr` facts from core immediately before the
    existing turn error event is sent
    - emit source-truth `codex_error_*` fields for downstream analytics,
    including the raw `CodexErr::InvalidRequest(String)` message as
    `codex_error_subreason`
    
    ## Validation
    - `just test -p codex-analytics`
    - attempted `just test -p codex-core`, but the local run timed out
    across unrelated integration suites in this environment and is not being
    used as validation
  • store and expose parent_thread_id on Threads (#25113)
    ## Why
    
    A [review
    discussion](https://github.com/openai/codex/pull/24161#discussion_r3325692763)
    on #24161 revealed a subagent data modeling issue: we overloaded
    `forked_from_id` to also mean `parent_thread_id`. That is incorrect,
    because guardian and review subagents can be subagents without forking
    the main thread's history.
    
    The solution here is to explicitly store a new `parent_thread_id` on
    `SessionMeta`, alongside `forked_from_id` which already exists. While
    we're at it, also expose it in the app-server protocol on the `Thread`
    object.
    
    A thread->subagent relationship and a fork of thread history are
    orthogonal concepts.
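
    To make the distinction concrete, here is a reduced sketch with the two
    fields stored independently. The struct is a hypothetical subset of
    `SessionMeta`, and the `String` IDs and helper methods are assumptions
    for illustration only.

    ```rust
    /// Hypothetical subset of `SessionMeta` with only the lineage fields.
    #[derive(Debug, Default, Clone, PartialEq, Eq)]
    struct SessionMeta {
        /// Thread whose history was copied into this one, if any.
        forked_from_id: Option<String>,
        /// Thread that spawned this one as a subagent, if any.
        parent_thread_id: Option<String>,
    }

    impl SessionMeta {
        fn is_subagent(&self) -> bool {
            self.parent_thread_id.is_some()
        }

        fn is_fork(&self) -> bool {
            self.forked_from_id.is_some()
        }
    }
    ```

    With separate fields, a guardian subagent can have a parent without
    being a fork, and a user-initiated fork can copy history without having
    a parent.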
    
    ## What Changed
    
    - Added top-level `parent_thread_id` persistence on `SessionMeta` and
    runtime/session plumbing through `SessionConfiguredEvent`,
    `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
    `TurnContext`, and `ModelClient`.
    - Made turn metadata, request headers, analytics, and subagent-start
    events read the separate runtime/top-level parent field instead of
    deriving general parent lineage from `SessionSource` or
    `forked_from_thread_id`.
    - Passed parent lineage separately at delegated subagent, review,
    guardian, agent-job, and multi-agent spawn construction sites;
    copied-history fork lineage remains derived only from `InitialHistory`.
    - Persisted and exposed parent lineage through rollout/thread-store
    projections and app-server v2 `Thread.parentThreadId`.
    - Updated app-server README text and regenerated app-server schema
    fixtures for the additive `parentThreadId` response field.
  • Add cloud-managed config layer support (#24620)
    ## Summary
    
    PR 3 of 5 in the cloud-managed config client stack.
    
    Adds enterprise-managed cloud config as a first-class config layer
    source. The layer metadata is preserved through config loading,
    diagnostics, debug output, hook attribution, and app-server protocol
    surfaces.
    
    ## Details
    
    - Enterprise-managed config becomes a normal config layer source with
    backend-supplied `id` and display `name` attached for provenance.
    - These layers are designed to behave like non-file managed config: they
    can surface syntax/type diagnostics by layer name even though there is
    no physical config file.
    - Relative path settings are resolved from a stored config base so
    cloud-delivered config remains consistent with existing MDM-delivered
    config semantics.
    - Hook attribution distinguishes config-delivered hooks from
    requirements-delivered hooks via `HookSource::CloudManagedConfig`.
    - This remains pull-based and snapshot-oriented; the PR adds layer
    identity/diagnostics, not dynamic reload behavior.
    
    ## Validation
    
    Validated through the targeted stack checks after rebasing onto current
    `main`:
    
    - Rust crate tests for
    config/hooks/cloud-config/backend-client/app-server-protocol
    - Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
    tests
    - Python generated-file contract test
    - `cargo shear --deny-warnings`
    - Targeted `argument-comment-lint` for config/hooks
  • Add subagent lineage metadata for responsesapi (#24161)
    ## Why
    
    We recently added `forked_from_thread_id` which lets us trace where a
    thread's _context_ comes from, but we also want to understand subagent
    lineage (e.g. which parent thread spawned this subagent? what kind of
    subagent is it?) which is orthogonal.
    
    This PR adds `parent_thread_id` and `subagent_kind` to the
    `x-codex-turn-metadata` header sent to the Responses API.
    
    ## What changed
    
    - Adds `parent_thread_id` and `subagent_kind` to core-owned
    `x-codex-turn-metadata`.
    - Restores persisted `SessionSource` and `ThreadSource` from resumed
    session metadata so cold-resumed subagent threads keep their lineage on
    later Responses API requests.
    - Centralizes parent-thread extraction on `SessionSource` /
    `SubAgentSource` and reuses it in the Responses client, analytics, agent
    control, and state parsing paths.
    - Extends reserved-key, git-enrichment, thread-spawn, and app-server v2
    metadata coverage for the new lineage fields.
    
    ## Verification
    
    - Not run locally per request.
    - Added focused coverage in `core/src/turn_metadata_tests.rs` and
    `app-server/tests/suite/v2/client_metadata.rs`.
  • [codex] Add user input client ids (#24653)
    ## Summary
    
    Adds an optional `clientId` field to app-server v2 `UserInput` and
    carries it through the core `UserInput` model so clients can correlate
    echoed user input items without relying on payload equality.
    
    ## Details
    
    - Adds `client_id: Option<String>` to core `UserInput` variants.
    - Exposes the v2 app-server field as `clientId` on the wire and in
    generated TypeScript.
    - Preserves the id when converting between app-server v2 and core
    protocol types.
    - Regenerates app-server schema fixtures.
    
    ## Validation
    
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-protocol`
    - `git diff --check`
  • feat(app-server): include turns page on thread resume (#23534)
    ## Summary
    
    The client currently calls `thread/resume` to establish live updates and
    immediately follows it with `thread/turns/list` to hydrate recent turns.
    This lets `thread/resume` return that page directly, eliminating a round
    trip and the ordering/deduplication gap between the two calls.
    
    Experimental clients opt in with `initialTurnsPage: { limit,
    sortDirection, itemsView }`. The response returns `initialTurnsPage` as
    a `TurnsPage`, including cursors for paging further back in history.
    Keeping the controls in a nested opt-in object provides the useful
    `thread/turns/list` knobs without spreading page-specific parameters
    across `thread/resume`.
    
    ## Verification
    
    - `just fmt`
    - `just write-app-server-schema --experimental`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server thread_resume_initial_turns_page_matches_requested_turns_list_page --tests`
    - `cargo test -p codex-app-server thread_resume_rejoins_running_thread_even_with_override_mismatch --tests`
    - `just fix -p codex-app-server-protocol -p codex-app-server`
  • [codex-analytics] add grouped session id to runtime events (#24655)
    ## Why
    - Runtime analytics events report `thread_id`, which identifies the
    individual thread emitting an event
    - They don't report `session_id`, which identifies the shared session
    for a root thread and its subagent threads
    - Emitting both identifiers allows analytics to group related activity
    
    ## What Changed
    - Adds `session_id` to relevant analytics events (`thread_initialized`,
    `turn`, `turn_steer`, `compaction`, `guardian_review`)
    - Tracks each thread's session ID in the analytics reducer so subsequent
    thread-scoped events emit the same value
    - Carries the shared session ID through subagent initialization
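    
    The reducer behavior described above can be sketched as follows. This is
    an illustration only: the event shapes and the `SessionTrackingReducer`
    name are hypothetical, and only the `thread_id` / `session_id` fields are
    taken from this description.
    
    ```ts
    // Hypothetical event shapes; only `thread_id` and `session_id` are from the PR.
    type AnalyticsInput =
      | { kind: "thread_initialized"; thread_id: string; session_id: string }
      | { kind: "turn" | "turn_steer" | "compaction" | "guardian_review"; thread_id: string };
    
    interface EmittedEvent {
      kind: string;
      thread_id: string;
      session_id: string | undefined;
    }
    
    class SessionTrackingReducer {
      private sessionByThread = new Map<string, string>();
    
      reduce(input: AnalyticsInput): EmittedEvent {
        // Remember the session when a thread is initialized.
        if (input.kind === "thread_initialized") {
          this.sessionByThread.set(input.thread_id, input.session_id);
        }
        // Later thread-scoped events reuse the remembered session ID.
        return {
          kind: input.kind,
          thread_id: input.thread_id,
          session_id: this.sessionByThread.get(input.thread_id),
        };
      }
    }
    
    const reducer = new SessionTrackingReducer();
    reducer.reduce({ kind: "thread_initialized", thread_id: "root", session_id: "session-A" });
    reducer.reduce({ kind: "thread_initialized", thread_id: "child", session_id: "session-A" });
    const childTurn = reducer.reduce({ kind: "turn", thread_id: "child" });
    ```
    
    Here the root thread and its subagent thread report different
    `thread_id` values but the same `session_id`, so they can be grouped.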
    
    ## Verification
    - `just test -p codex-analytics` validates event payloads and subagent
    session grouping.
    - Focused `codex-app-server` tests validate session IDs for thread,
    turn, and steer events.
    - Focused `codex-core` tests validate root and subagent session ID
    propagation.
  • Add experimental turn additional context (#24154)
    ## Summary
    
    Adds experimental `additionalContext` support to `turn/start` and
    `turn/steer` so clients can provide ephemeral external context, such as
    browser or automation state, without turning that plumbing into a
    visible user prompt or triggering user-prompt lifecycle behavior.
    
    ## API Shape
    
    The parameter shape is:
    
    ```ts
    additionalContext?: Record<string, {
      value: string
      kind: "untrusted" | "application"
    }> | null
    ```
    
    Example:
    
    ```json
    {
      "additionalContext": {
        "browser_info": {
          "value": "Active tab is CI failures.",
          "kind": "untrusted"
        },
        "automation_info": {
          "value": "CI rerun is in progress.",
          "kind": "application"
        }
      }
    }
    ```
    
    The keys are opaque and caller-defined.
    
    ## Context Injection
    
    When provided, accepted entries are inserted into model context as
    hidden contextual message items, not as visible thread user-message
    items.
    
    `kind: "untrusted"` entries are inserted with role `user`:
    
    ```text
    <external_${key}>${value}</external_${key}>
    ```
    
    `kind: "application"` entries are inserted with role `developer`:
    
    ```text
    <${key}>${value}</${key}>
    ```
    
    Values are not escaped. Each value is truncated to approximately 1k
    tokens before wrapping.
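    
    The role routing and wrapping rules can be sketched as a single function.
    This is not the core implementation: `renderContextItem` and
    `HiddenContextItem` are hypothetical names, and truncation is left out
    because the token-approximation method is not specified here.
    
    ```ts
    type ContextKind = "untrusted" | "application";
    
    interface ContextEntry {
      value: string;
      kind: ContextKind;
    }
    
    // Hypothetical representation of a hidden contextual message item.
    interface HiddenContextItem {
      role: "user" | "developer";
      text: string;
    }
    
    // Wraps one entry per the rules above. Values are inserted unescaped,
    // and truncation is intentionally omitted from this sketch.
    function renderContextItem(key: string, entry: ContextEntry): HiddenContextItem {
      if (entry.kind === "untrusted") {
        return { role: "user", text: `<external_${key}>${entry.value}</external_${key}>` };
      }
      return { role: "developer", text: `<${key}>${entry.value}</${key}>` };
    }
    
    const browserItem = renderContextItem("browser_info", {
      value: "Active tab is CI failures.",
      kind: "untrusted",
    });
    ```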
    
    For `turn/start`, accepted additional context is inserted before normal
    user input. For `turn/steer`, additional context is merged only when the
    steer includes non-empty user input; context-only steers are still
    rejected as empty input.
    
    ## Dedupe Strategy
    
    `AdditionalContextStore` lives on session state and stores the latest
    complete additional-context map.
    
    Each `turn/start` or non-empty `turn/steer` treats its
    `additionalContext` as the current complete set of values. Entries are
    injected only when the key is new or the exact entry for that key
    changed, including `value` or `kind`. After merging, the store is
    replaced with the provided map, so omitted keys are removed from the
    retained set and can be injected again later if reintroduced.
    
    Omitting `additionalContext`, passing `null`, or passing an empty object
    resets the store to empty and injects nothing.
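    
    The dedupe rules above can be modeled roughly as below. This is a sketch
    under stated assumptions, not the `AdditionalContextStore` in core: the
    class name, the `merge` method, and its return value (the keys that would
    be injected) are all invented for illustration.
    
    ```ts
    interface ContextEntry {
      value: string;
      kind: "untrusted" | "application";
    }
    type ContextMap = Record<string, ContextEntry>;
    
    // Hypothetical model of the store; the real type lives on session state.
    class AdditionalContextStoreSketch {
      private retained = new Map<string, ContextEntry>();
    
      // Returns the keys to inject, then replaces the retained map wholesale.
      merge(provided?: ContextMap | null): string[] {
        const next: ContextMap = provided ?? {};
        const entries = Object.entries(next);
        const toInject = entries
          .filter(([key, cur]) => {
            const prev = this.retained.get(key);
            // Inject when the key is new or its value or kind changed.
            return prev === undefined || prev.value !== cur.value || prev.kind !== cur.kind;
          })
          .map(([key]) => key);
        // Omitted keys drop out, so they can be injected again later.
        this.retained = new Map(entries);
        return toInject;
      }
    }
    
    const contextStore = new AdditionalContextStoreSketch();
    const ciState: ContextMap = { ci: { value: "rerun in progress", kind: "application" } };
    contextStore.merge(ciState); // injects "ci"
    contextStore.merge(ciState); // unchanged, injects nothing
    contextStore.merge(null); // resets the store
    contextStore.merge(ciState); // injects "ci" again
    ```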
    
    ## What Changed
    
    - Threads experimental v2 `additionalContext` through app-server into
    core turn start and steer handling.
    - Adds separate contextual fragment types for untrusted user-role
    context and application developer-role context.
    - Uses pending response input items so additional context can be
    combined with normal user input without treating it as prompt text.
    - Adds integration coverage for start/steer flow, role routing,
    dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
    behavior, empty context-only steer rejection, external-fragment marker
    matching, and truncation.
  • [codex-analytics] split compaction v2 analytics implementation (#24146)
    ## What changed
    
    - Add a distinct `responses_compaction_v2` value for
    `CodexCompactionEvent.implementation`.
    - Emit that value from the remote compaction v2 path.
    - Keep local compaction as `responses` and legacy `/responses/compact`
    as `responses_compact`.
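    
    For reference, the resulting mapping can be written out as below. The
    three string values are the ones listed above; the `CompactionPath`
    labels and the function name are hypothetical and do not mirror the Rust
    types.
    
    ```ts
    // Hypothetical labels for the three compaction paths.
    type CompactionPath = "local" | "legacy_responses_compact" | "remote_v2";
    
    // Values reported in `CodexCompactionEvent.implementation`.
    type CompactionImplementation =
      | "responses"
      | "responses_compact"
      | "responses_compaction_v2";
    
    function compactionImplementation(path: CompactionPath): CompactionImplementation {
      switch (path) {
        case "local":
          return "responses";
        case "legacy_responses_compact":
          return "responses_compact";
        case "remote_v2":
          // Previously reported as "responses", colliding with the local path.
          return "responses_compaction_v2";
      }
    }
    ```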
    
    ## Why
    
    Remote compaction v2 and local prompt-based compaction were both
    reported as `responses`, which made the analytics table collapse two
    different compaction mechanisms into one implementation bucket.
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-analytics`
    
    `just test -p codex-core` was started locally, but this PR is
    intentionally being pushed for CI to finish the remaining validation.
  • [codex] Add plugin id to MCP tool call items (#23737)
    Add the owning plugin id to MCP tool call items so they can be filtered
    at the plugin level.
    
    ## Summary
    - add optional `plugin_id` to MCP tool-call items and legacy begin/end
    events
    - propagate plugin metadata into emitted core items and app-server v2
    `ThreadItem::McpToolCall`
    - preserve plugin ids through app-server replay/redaction paths and
    regenerate v2 schema fixtures
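    
    A consumer-side filter might look like the sketch below. The item shape
    is hypothetical: the field is spelled `plugin_id` as in this summary, the
    `server` and `tool` fields are assumed, and the actual wire casing in
    app-server v2 is not confirmed here.
    
    ```ts
    // Hypothetical item shape; only the optional plugin id is from the PR.
    interface McpToolCallItem {
      server: string;
      tool: string;
      plugin_id?: string | null;
    }
    
    // Keeps only the tool calls owned by the given plugin. Items without a
    // plugin id never match.
    function toolCallsForPlugin(items: McpToolCallItem[], pluginId: string): McpToolCallItem[] {
      return items.filter((item) => item.plugin_id === pluginId);
    }
    
    const sampleCalls: McpToolCallItem[] = [
      { server: "docs", tool: "search", plugin_id: "plugin-a" },
      { server: "docs", tool: "fetch" },
      { server: "ci", tool: "rerun", plugin_id: "plugin-b" },
    ];
    const pluginACalls = toolCallsForPlugin(sampleCalls, "plugin-a");
    ```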
    
    ## Testing
    - `just write-app-server-schema`
    - `just fmt`
    - `just fix -p codex-core`
    - `cargo test -p codex-protocol -p codex-app-server-protocol`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib`
    - `cargo check -p codex-tui --tests`
    - `cargo check -p codex-app-server --tests`
    - `git diff --check`
    
    ## Notes
    - `just fix -p codex-core` completed with two non-fatal
    `too_many_arguments` warnings on the touched MCP notification helpers.
    - A broader `cargo test -p codex-core` run passed core unit tests, then
    hit shell/sandbox/snapshot failures in the integration target.
    - A broader app-server downstream run hit the existing
    `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack
    overflow; `cargo test -p codex-exec` also hit the existing sandbox
    expectation mismatch in
    `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
  • Add SubagentStop hook (#22873)
    # What
    
    `SubagentStop` runs when a thread-spawned subagent turn is about to
    finish. Thread-spawned subagents use `SubagentStop` instead of the
    normal root-agent `Stop` hook.
    
    Configured handlers match on `agent_type`. Hook input includes the
    normal stop fields plus:
    
    - `agent_id`: the child thread id.
    - `agent_type`: the resolved subagent type.
    - `agent_transcript_path`: the child subagent transcript path.
    - `transcript_path`: the parent thread transcript path.
    - `last_assistant_message`: the final assistant message from the child
    turn, when available.
    - `stop_hook_active`: `true` when the child is already continuing
    because an earlier stop-like hook blocked completion.
    
    `SubagentStop` shares the same completion-control semantics as `Stop`,
    scoped to the child turn:
    
    - No decision allows the child turn to finish.
    - `decision: "block"` with a non-empty `reason` records that reason as
    hook feedback and continues the child with that prompt.
    - `continue: false` stops the child turn. If `stopReason` is present,
    Codex surfaces it as the stop reason.
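    
    The three outcomes above can be sketched as a small resolver. This is an
    illustration, not the hook runtime: the type and function names are
    hypothetical, and two behaviors are assumptions the text does not state,
    namely that `continue: false` takes precedence over `decision: "block"`
    and that a block without a non-empty `reason` falls through to finishing.
    
    ```ts
    // Output fields named in the description above.
    interface StopHookOutput {
      decision?: "block";
      reason?: string;
      continue?: boolean;
      stopReason?: string;
    }
    
    type ChildTurnOutcome =
      | { action: "finish" }
      | { action: "continue_with_feedback"; prompt: string }
      | { action: "stop"; stopReason: string | undefined };
    
    function resolveSubagentStop(output: StopHookOutput): ChildTurnOutcome {
      // Assumption: an explicit `continue: false` wins over a block decision.
      if (output.continue === false) {
        return { action: "stop", stopReason: output.stopReason };
      }
      // A block with a non-empty reason continues the child with that prompt.
      if (output.decision === "block" && output.reason !== undefined && output.reason.trim() !== "") {
        return { action: "continue_with_feedback", prompt: output.reason };
      }
      // No decision (or, by assumption, a block without a reason): finish.
      return { action: "finish" };
    }
    
    const blocked = resolveSubagentStop({ decision: "block", reason: "Tests still fail." });
    ```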
    
    # Lifecycle Scope
    
    Only thread-spawned subagents run `SubagentStop`.
    
    Internal/system subagents such as Review, Compact, MemoryConsolidation,
    and Other do not run normal `Stop` hooks and do not run `SubagentStop`.
    This avoids exposing synthetic matcher labels for internal
    implementation paths.
    
    # Stack
    
    1. #22782: add `SubagentStart`.
    2. This PR: add `SubagentStop`.
    3. #22882: add subagent identity to normal hook inputs.