codex

feat(app-server): add history_mode to thread (#29927)

## Description

This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
This will be stored in `SessionMeta` in the JSONL rollout file and as a
new column in the SQLite thread_metadata table, and exposed on
`thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored
metadata, local/in-memory stores, rollout metadata extraction, and the
existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and
`thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported
for legacy full-history reads, `load_history`, live resume, and create
paths.
- Regenerated app-server schema fixtures and added
protocol/state/thread-store/app-server coverage for persistence and
fail-closed behavior.

## Compatibility floor
Because users may be running various versions of Codex binaries on the
same machine (TUI, Codex App, etc.), we will need to establish a
compatibility floor for upcoming paginated threads, which will change
how thread storage reads and writes work.

The overall plan here:
```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that
encounters a persisted `paginated` thread before it knows how to fully
support paginated history. In app-server, if a thread is `paginated`, we
will:

- allow metadata-only discovery paths like `thread/list` and
`thread/read(includeTurns=false)`, so clients can still see the thread
and inspect its `historyMode`
- reject legacy full-history/live-thread paths like
`thread/read(includeTurns=true)` and `thread/resume` with an unsupported
JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations
that would need to load or replay the full thread history for a
paginated thread. That gives us the behavior we want for Release N:
future paginated threads are visible, but this binary fails closed
instead of trying to operate on them as if they were legacy threads.
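
The fail-closed rules above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `ThreadOp`, `parse_history_mode`, and `check_supported` are hypothetical names, and the sketch assumes a missing `historyMode` field (an older `SessionMeta`) reads as `legacy`.

```rust
/// The two history modes named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadHistoryMode {
    Legacy,
    Paginated,
}

/// Hypothetical operation categories, grouped the way the description groups them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadOp {
    List,
    ReadMetadataOnly,
    ReadWithTurns,
    Resume,
}

/// Parse a persisted mode. A missing field is assumed to mean `legacy`;
/// an unknown or future value is an error instead of a silent `legacy`.
pub fn parse_history_mode(raw: Option<&str>) -> Result<ThreadHistoryMode, String> {
    match raw {
        None | Some("legacy") => Ok(ThreadHistoryMode::Legacy),
        Some("paginated") => Ok(ThreadHistoryMode::Paginated),
        Some(other) => Err(format!("unsupported historyMode: {other}")),
    }
}

/// Metadata-only discovery stays available for paginated threads;
/// full-history and live-thread paths are rejected.
pub fn check_supported(mode: ThreadHistoryMode, op: ThreadOp) -> Result<(), String> {
    match (mode, op) {
        (ThreadHistoryMode::Legacy, _) => Ok(()),
        (ThreadHistoryMode::Paginated, ThreadOp::List | ThreadOp::ReadMetadataOnly) => Ok(()),
        (ThreadHistoryMode::Paginated, rejected) => {
            Err(format!("{rejected:?} is unsupported for paginated threads"))
        }
    }
}
```

In this sketch a caller would map the `Err` case to the unsupported JSON-RPC error mentioned above.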

Owen Lin · 2026-06-26 09:12:42 -07:00

5267e805fb

[codex] Attribute app-server analytics by thread originator (#29935)

## Why

Desktop Work threads and regular Codex threads can share the same
app-server connection. App-server analytics currently copy
`product_client_id` from connection metadata for every thread-scoped
event, so Work thread activity is attributed to the Desktop connection
instead of the thread's resolved originator. This prevents analytics
from distinguishing the two products on a shared connection.

## What changed

- Publish the resolved originator after a thread is materialized,
covering new, resumed, forked, and subagent threads.
- Store that originator in the analytics reducer's existing per-thread
state.
- Override only `app_server_client.product_client_id` for thread, turn,
tool, review, goal, guardian, and compaction events while preserving the
connection's client name, version, and transport metadata.
- Fall back to the connection-wide product client ID when a thread has
no originator override.
- Preserve persisted originators in thread initialization analytics for
resume and fork flows.
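
As a rough sketch of the override-with-fallback rule (not the real reducer; `AnalyticsState` and its methods are hypothetical names), the per-thread originator wins when present and the connection-wide product client ID is used otherwise:

```rust
use std::collections::HashMap;

/// Hypothetical slice of per-thread reducer state: thread ID -> resolved originator.
#[derive(Default)]
pub struct AnalyticsState {
    originators: HashMap<String, String>,
}

impl AnalyticsState {
    /// Record the originator published after a thread is materialized.
    pub fn set_thread_originator(&mut self, thread_id: &str, originator: &str) {
        self.originators
            .insert(thread_id.to_string(), originator.to_string());
    }

    /// Pick the `product_client_id` for a thread-scoped event: the thread's
    /// originator if one was recorded, else the connection-wide value.
    pub fn product_client_id<'a>(
        &'a self,
        thread_id: &str,
        connection_client_id: &'a str,
    ) -> &'a str {
        self.originators
            .get(thread_id)
            .map(String::as_str)
            .unwrap_or(connection_client_id)
    }
}
```

Only the product client ID is resolved this way; the connection's client name, version, and transport metadata are left untouched.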

## Validation

- `just test -p codex-analytics
thread_originator_overrides_shared_connection_across_thread_events
subagent_events_keep_thread_originator_with_explicit_turn_connection`
- `just test -p codex-app-server
turn_start_tracks_thread_originator_in_analytics
thread_start_tracks_thread_initialized_analytics
thread_fork_tracks_thread_initialized_analytics
thread_resume_tracks_thread_initialized_analytics`
- `just test -p codex-core thread_manager`

alexsong-oai · 2026-06-25 18:15:48 -07:00

841f30598c

[plugins] Track plugin install requests by ID (#29684)

Summary
- Emit `codex_plugin_install_requested` when a validated plugin install
request is made, before the user accepts or declines the elicitation.
- Record the exact model-visible plugin ID, remote plugin ID, required
connector IDs, stable suggestion ID, and `endpoint_recommendation` vs
`legacy_discovery` source.
- Keep `suggest_reason` out of telemetry and leave connector-only
install requests unchanged.

Rollout
- Backend/schema dependency:
https://github.com/openai/openai/pull/1065270
- Land the backend PR before this producer starts sending the event.

Validation
- `just test -p codex-analytics` (83 passed)
- `just test -p codex-core request_plugin_install` (17 passed)
- `just fix -p codex-analytics`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`

Alex Daley · 2026-06-24 21:29:11 +00:00

24423f5712

Support thread-level originator overrides (#29477)

## Why

Work (TPP) threads can be launched from the Desktop app, but if they all
keep the Desktop app's default originator then downstream attribution
cannot distinguish local Work launches from cloud-backed Work launches.
`thread/start.serviceName` already carries that launch signal, while
`SessionMeta.originator` is the durable thread-level value that survives
resume and fork.

This change converts the Desktop Work service names into an effective
originator at thread creation time, persists that originator with the
thread, and keeps using it for later model requests and memory writes.

## What changed

- Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
per-thread originators, while preserving
`CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
- Persist the effective originator in `SessionMeta.originator`, read it
back on resume/fork, and inherit the parent originator for subagent
spawns when there is no persisted session metadata.
- Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
back to the live parent originator when the forked history no longer
includes `SessionMeta`.
- Thread the per-thread originator through Responses headers,
websocket/compaction request paths, thread-store creation, rollout
metadata, and memory stage-one telemetry.
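
The precedence described above might look like the sketch below. The function name and the originator strings are placeholders chosen for illustration; the description does not state the mapped values.

```rust
/// Resolve the originator to persist at thread creation time.
/// Precedence: internal override, then service-name mapping, then the default.
pub fn effective_originator(
    internal_override: Option<&str>, // value of CODEX_INTERNAL_ORIGINATOR_OVERRIDE, if set
    service_name: Option<&str>,      // `thread/start.serviceName`
    default_originator: &str,        // the app's default originator
) -> String {
    if let Some(overridden) = internal_override {
        return overridden.to_string();
    }
    match service_name {
        // Placeholder originator values; the real mapping is not shown here.
        Some("CODEX_WORK_LOCAL") => "work_local_placeholder".to_string(),
        Some("CODEX_WORK_CLOUD") => "work_cloud_placeholder".to_string(),
        _ => default_originator.to_string(),
    }
}
```

Because the result is persisted in `SessionMeta.originator`, resume and fork would read the stored value back instead of recomputing it.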

## Verification

- `just test -p codex-core
agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
thread_manager::tests::originator_override_precedes_service_name_remapping`
- `just test -p codex-core
agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
- `just test -p codex-memories-write`
- `just fix -p codex-core -p codex-memories-write`
- `git diff --check`

alexsong-oai · 2026-06-23 17:23:38 -07:00

1acb722e8a

[codex] rename rollout budget error to session budget error (#29744)

## Summary

- rename the rollout-budget exhaustion error from
`RolloutBudgetExceeded` to `SessionBudgetExceeded`
- expose the matching app-server v2 wire value as
`sessionBudgetExceeded`
- regenerate JSON/TypeScript schema fixtures and update the app-server
docs and focused tests

This is a naming-only follow-up to #29715 based on [Pavel's review
suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
Runtime behavior is unchanged.

## Tests

- `just test -p codex-core rollout_budget`
- `just test -p codex-app-server-protocol`
- `just fmt`
- `just write-app-server-schema`

rka-oai · 2026-06-23 16:49:13 -07:00

1ec3def0b5

[codex] surface rollout budget exhaustion (#29715)

## Summary
- surface shared rollout-budget exhaustion as
`CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
- map it through the existing `CodexErrorInfo` and app-server v2
`codexErrorInfo` path
- keep local compaction from retrying after the shared rollout budget is
exhausted

This gives app-server clients a stable `rolloutBudgetExceeded` error
they can classify without guessing from `status="interrupted"`.

## Tests
- `just test -p codex-core rollout_budget`

rka-oai · 2026-06-23 15:01:28 -07:00

bbbea91960

Separate local and remote plugin analytics IDs (#29495)

## Why

Plugin analytics overloaded `plugin_id`: most events used the Codex
`<plugin>@<marketplace>` identity, while remote install events used the
backend plugin ID. That makes the same field change meaning across event
types and complicates downstream identity resolution.

This change makes the contract unambiguous:

- `plugin_id`: the local Codex `<plugin>@<marketplace>` identity, when
resolved
- `remote_plugin_id`: the backend plugin identity, when available

For a remote install failure that happens before plugin details resolve,
`plugin_id` is `null` and `remote_plugin_id` remains populated.

## What changed

All six plugin analytics events use the same identity contract:

- `codex_plugin_installed`
- `codex_plugin_install_failed`
- `codex_plugin_uninstalled`
- `codex_plugin_enabled`
- `codex_plugin_disabled`
- `codex_plugin_used`

Remote identity is resolved from the current installed-plugin snapshot
first, with persisted install metadata as fallback. The telemetry
metadata type keeps local identity optional for failures that occur
before remote details are available.

The app-server test client's manual analytics smokes now find remote
mutation events through `remote_plugin_id` and validate that `plugin_id`
remains local.

## Remote uninstall

Resolve and capture telemetry metadata before removing the local plugin
cache, then emit `codex_plugin_uninstalled` after the backend confirms
success. The event is also emitted when backend uninstall succeeds but
local cache cleanup reports `CacheRemove`.

If a concurrent remote-cache refresh removes the local bundle before
telemetry capture, the already-fetched remote plugin detail supplies
fallback capability metadata.

## Validation

- `just test -p codex-analytics` — 82 passed
- `just test -p codex-core-plugins` — 271 passed
- `just test -p codex-app-server-test-client` — 5 passed
- `just test -p codex-plugin` — 3 passed
- `just test -p codex-app-server plugin_install` — 37 passed
- `just test -p codex-app-server plugin_uninstall` — 10 passed

The production app-server install/uninstall flow was also exercised
against `plugins~Plugin_f1b845ac33888191ac156169c58733c2`
(`build-ios-apps@openai-curated-remote`), and the plugin's original
uninstalled state was restored.

jameswt-oai · 2026-06-23 12:27:14 -07:00

ff50b47dce

core: add extra metadata field to Thread struct (#29675)

# Summary

Adds a `Thread.extras` field that can hold arbitrary metadata specific
to a given thread.

Boyang Niu · 2026-06-23 19:15:59 +00:00

354807920e

chore(core) rm AskForApproval::OnFailure (#28418)

## Summary
Deletes the `OnFailure` variant of the `AskForApproval` enum. This option
has been deprecated since #11631.

## Testing
- [x] Tests pass

Dylan Hurd · 2026-06-23 12:13:54 -07:00

2cf2a6a844

[codex] Centralize Plugin Analytics Metadata (#27102)

This PR moves construction of `PluginTelemetryMetadata` from loader and
model helpers into `PluginsManager`, which already owns installed plugin
state and will eventually perform remote identity enrichment. The
metadata type remains in `codex-plugin`, and serialized analytics events
remain unchanged.

## Before

```mermaid
flowchart LR
    subgraph Events["Analytics event paths"]
        direction TB
        Lifecycle["Local install / uninstall"]
        Config["Enable / disable"]
        Remote["Remote install"]
        Used["Plugin used"]
    end

    subgraph Construction["Metadata construction"]
        direction TB
        Loader["Loader telemetry helpers"]
        Summary["PluginCapabilitySummary::telemetry_metadata"]
        Override["Caller adds remote_plugin_id"]
    end

    Metadata["PluginTelemetryMetadata"]

    Lifecycle --> Loader
    Config --> Loader
    Remote --> Loader
    Loader -->|"local events"| Metadata
    Loader -->|"remote install"| Override
    Override --> Metadata
    Used --> Summary
    Summary --> Metadata
```

Telemetry metadata was constructed through loader helpers, a
capability-summary method, and a remote-install call-site override.

## After

```mermaid
flowchart LR
    subgraph Events["Analytics event paths"]
        direction TB
        Lifecycle["Local install / uninstall"]
        Config["Enable / disable"]
        Remote["Remote install"]
        Used["Plugin used"]
    end

    Manager["PluginsManager — single construction owner"]
    Metadata["PluginTelemetryMetadata"]

    Lifecycle --> Manager
    Config --> Manager
    Remote -->|"authoritative remote ID"| Manager
    Used -->|"capability summary"| Manager
    Manager --> Metadata
```

Every analytics path delegates metadata construction to
`PluginsManager`. Remote install still supplies its authoritative
backend ID explicitly.

## What Changes

- Make loader code return a focused plugin capability summary instead of
constructing analytics metadata.
- Centralize immutable plugin telemetry metadata construction in
`PluginsManager`.
- Route local install/uninstall, remote install, enable/disable, and
plugin-used emitters through the manager.
- Preserve the current serialized analytics contract exactly.

Normal metadata still has no remote override. Remote install continues
to provide its authoritative backend ID explicitly, so the existing
serializer continues reporting that ID through `plugin_id`.
Snapshot-based enrichment is intentionally deferred to the final PR.

## Testing

- `just test -p codex-core-plugins` (238 tests passed)
- `just test -p codex-plugin` (3 tests passed)
- Scoped Clippy/compile checks passed for `codex-plugin`,
`codex-core-plugins`, `codex-app-server`, and `codex-core`.

## Split Overview

```text
main
├── #27093  Debug analytics capture                 (merged)
├── #27099  Non-mutating plugin smoke               (merged)
├── #27100  Remote install/uninstall smoke          (merged)
└── #27102  Plugin telemetry metadata refactor      ← you are here
    └── #27669  Persist remote plugin identity

After #27102 and #27669 merge:
└── Final PR: add explicit local and remote IDs to plugin analytics
```

Review order and dependencies:

1. [#27093 Add debug-only analytics event
capture](https://github.com/openai/codex/pull/27093) (merged)
2. [#27099 Add a plugin analytics smoke
workflow](https://github.com/openai/codex/pull/27099) (merged)
3. [#27100 Add a remote plugin analytics mutation smoke
workflow](https://github.com/openai/codex/pull/27100) (merged)
4. This metadata refactor, independent and based on `main`
5. [#27669 Persist remote plugin
identity](https://github.com/openai/codex/pull/27669), stacked on this
PR
6. Final remote-ID behavior PR, created after the prerequisites merge

The original [#26281](https://github.com/openai/codex/pull/26281)
remains open as the aggregate reference until the final replacement PR
is published.

jameswt-oai · 2026-06-22 10:27:23 -07:00

44dbae90eb

Expose thread-level multi-agent mode (#28792)

## Why

Once multi-agent mode can be selected per turn, clients also need to
choose the initial selection when creating a thread and observe that
selection through lifecycle and settings APIs.

The selected value is intentionally distinct from the effective
model-visible value: no client selection is represented as `null`, even
though an eligible multi-agent v2 turn derives `explicitRequestOnly` as
its effective default.
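
A small sketch of the selected-versus-effective distinction, under stated assumptions: `Proactive` is a made-up second variant (only `explicitRequestOnly` is named here), and returning no effective mode for an ineligible turn is a guess about behavior this description does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MultiAgentMode {
    ExplicitRequestOnly,
    /// Hypothetical second mode, present only to make the sketch non-trivial.
    Proactive,
}

/// What lifecycle and settings APIs report: the client's selection,
/// with `None` (wire `null`) when nothing was selected.
pub fn reported_mode(selected: Option<MultiAgentMode>) -> Option<MultiAgentMode> {
    selected
}

/// What an eligible turn would use: the selection, or `ExplicitRequestOnly`
/// as the derived default. Ineligible turns yield `None` in this sketch.
pub fn effective_mode(
    selected: Option<MultiAgentMode>,
    turn_is_eligible: bool,
) -> Option<MultiAgentMode> {
    if !turn_is_eligible {
        return None;
    }
    Some(selected.unwrap_or(MultiAgentMode::ExplicitRequestOnly))
}
```

The point of keeping the two apart is that an omitted selection stays observable as unset even though the model-visible value is already defined.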

## What changed

- Add the optional experimental `thread/start.multiAgentMode` parameter
and pass it through thread creation.
- Preserve an omitted initial value as an unset selection rather than
eagerly storing `explicitRequestOnly`.
- Apply an explicit `thread/start` selection to the first turn through
the session configuration established at thread creation.
- Restore the latest persisted effective mode as the selected baseline
on cold resume when rollout history contains one.
- Inherit the optional selected mode from a loaded parent when creating
related runtime threads.
- Return the current selected `multiAgentMode` from `thread/start`,
`thread/resume`, `thread/fork`, and thread settings, using `null` when
no mode is selected.
- Keep lifecycle reporting independent from model capability and feature
eligibility; core turn construction remains responsible for calculating
and persisting the effective mode.

## Not covered

- Clearing an existing loaded-session selection back to unset through
`turn/start`; omitted or `null` currently retains the session's
selection.
- A TUI control, slash command, or `config.toml` preference.

## Verification

- `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
- `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`

The focused app-server coverage verifies explicit `thread/start`
initialization, first-turn prompting, nullable reporting for an omitted
selection, and retention of selections that are not currently
runtime-eligible.

## Stack

Stacked on #28685. This PR contains only the thread initialization and
lifecycle/settings API layer.

Shijie Rao · 2026-06-19 10:50:44 +02:00

7abfcf220b

Emit Trusted MCP App Identity on Tool-Call Items (#27132)

## Summary

- Add optional `appContext` to app-server MCP tool-call items with
trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata.
- Preserve that context across tool-call events, persisted history,
reconnects, and thread resume.
- Keep the deprecated top-level `mcpAppResourceUri` temporarily for
client migration.

The consumer contract is `{ appContext: { connectorId, linkId,
mcpAppResourceUri }, tool }`.

## Validation

- Full GitHub Actions suite passes, including CLA, Bazel tests, clippy,
release builds, and argument-comment lint.

---------

Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>

martinauyeung-oai · 2026-06-18 14:02:54 -07:00

765309d5a6

Support openai/form extended form elicitations (#27500)

# Summary
Allow App Server clients to opt into `openai/form` MCP elicitations.

Gabriel Peal · 2026-06-18 11:54:49 -07:00

21a599fa56

unified-exec: retain PathUri in command events (#28780)

## Why

App-server must report command events containing foreign-platform paths
without changing existing client or rollout path-string formats.

## What changed

- retain `PathUri` through exec command begin/end events
- convert cwd values to `LegacyAppPathString` at the app-server
compatibility boundary
- drop command actions with foreign paths and log them
- serialize rollout-trace cwd values using their inferred native path
representation
- restore Wine coverage for retained Windows cwd values and successful
completion

Adam Perry @ OpenAI · 2026-06-18 05:00:04 +00:00

3931bc2bde

[codex] Track plugin install and import telemetry failures (#28731)

## Summary
- Track plugin install failures through the unified
`codex_plugin_install_failed` event for local installs, remote install
preflight failures, bundle failures, and remote catalog/backend
failures.
- Send classified `error_type` values in plugin install failure
analytics instead of raw error strings.
- Stop sending raw external-agent import errors in analytics while
preserving raw failure details in app-facing import
notifications/history.
- Keep raw plugin/migration diagnostics in `tracing::warn!` logs.
- Keep remote failure plugin names as the existing local placeholder
(`unknown`) and remove the extra telemetry plugin-name override.
- Change `ExternalAgentConfigImportParams.source` from a generated enum
to `string | null`, with legacy `claudeCode` / `claudeCowork` inputs
normalized to existing analytics values.

## Testing

charlesgong-openai · 2026-06-17 13:16:34 -07:00

3959ab0ffc

[codex] Restore thread recency with compatible migration history (#28671)

## Summary

- Revert #28655, restoring the thread `recencyAt` behavior introduced by
#27910.
- Move `threads_recency_at` to migration 0039 so it no longer collides
with `external_agent_config_imports` at version 0038.
- Repair databases that already applied the recency migration as version
38 by moving the matching migration-history row to version 39 before
SQLx validation. The current version-38 migration can then apply
normally.
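
The repair step could be pictured as below. This is an in-memory sketch rather than the SQL the migration code runs: `AppliedMigration` is a hypothetical stand-in for a migration-history row, and matching on the description string is an assumption about how the "matching" row is identified.

```rust
/// Hypothetical stand-in for one row of the migration-history table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AppliedMigration {
    pub version: i64,
    pub description: String,
}

/// If the recency migration was recorded as version 38, move that row to
/// version 39 so validation sees it under its new number. Returns whether
/// a row was moved. Does nothing if version 39 is already recorded.
pub fn repair_recency_version(rows: &mut [AppliedMigration]) -> bool {
    if rows.iter().any(|row| row.version == 39) {
        return false;
    }
    for row in rows.iter_mut() {
        if row.version == 38 && row.description == "threads_recency_at" {
            row.version = 39;
            return true;
        }
    }
    false
}
```

After the move, version 38 is free again, which is why the current version-38 migration can apply normally.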

## Validation

- `just test -p codex-state
migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
- `just test -p codex-state -p codex-rollout -p codex-thread-store -p
codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
could not open the machine's existing read-only incident database at
`~/.codex/sqlite/state_5.sqlite`.
- `just fix -p codex-state`
- `just fmt`
- Verified that state migration versions are unique.

Jeremy Rose · 2026-06-17 18:52:18 +00:00

7dc7096ae1

Scope command approvals by execution environment (#28738)

## Why

Command approval cache keys included the command and working directory,
but not the execution environment. An approval for `/workspace` locally
could therefore be reused for the same command and path on an executor.

## What changed

- Include the selected environment ID in shell and unified-exec approval
cache keys.
- Carry that ID through the normal command approval request so clients
can show which environment is being approved.
- Expose the environment through app-server as a required nullable
`environmentId` and show it in the inline TUI approval prompt.
- Keep older recorded approval events compatible when the environment is
absent.

For example, `echo ok` in local `/workspace` and `echo ok` in executor
`/workspace` now produce different approval keys and separate prompts.
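
A minimal sketch of why the extra field separates the two approvals. `ApprovalKey` and `ApprovalCache` are hypothetical names, and the sketch assumes `None` stands for an absent environment ID, as in older recorded events.

```rust
use std::collections::HashSet;

/// Hypothetical cache key: command and cwd, plus the selected environment ID.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ApprovalKey {
    pub command: Vec<String>,
    pub cwd: String,
    pub environment_id: Option<String>,
}

#[derive(Default)]
pub struct ApprovalCache {
    approved: HashSet<ApprovalKey>,
}

impl ApprovalCache {
    pub fn approve(&mut self, key: ApprovalKey) {
        self.approved.insert(key);
    }

    /// An approval is reused only when command, cwd, and environment all match.
    pub fn is_approved(&self, key: &ApprovalKey) -> bool {
        self.approved.contains(key)
    }
}
```

With the environment in the key, approving `echo ok` in local `/workspace` leaves the executor lookup as a cache miss, so the user is prompted again.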

## Scope

This PR does not change network approvals, Guardian review actions, MCP
elicitation, full-screen TUI rendering, or environment-ID validation.
Remote `shell_command` execution itself remains in #28722; this PR only
makes its approval key environment-aware.

jif · 2026-06-17 19:52:43 +02:00

1391d786bc

Revert thread recencyAt for sidebar ordering (#28655)

## Why

Revert #27910 to remove the newly introduced thread `recencyAt`
persistence and API behavior from `main`.

## What changed

This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
including the state migration, thread-store propagation, app-server API
surface, generated schemas, and related tests.

## Validation

Not run before opening; relying on CI for the initial fast signal.

pakrym-oai · 2026-06-16 21:39:30 -07:00

cb15c64760

Add thread recencyAt for sidebar ordering (#27910)

## Summary

Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
sort key for product recency ordering while preserving the existing
meaning of `updatedAt` as the latest persisted thread mutation.

This is the server-side alternative to #27697. Rather than narrowing
`updatedAt`, clients can sort the sidebar by `recency_at` and continue
treating `updatedAt` as mutation time.

Paired Codex Apps PR:
[openai/openai#1024599](https://github.com/openai/openai/pull/1024599)

## Contract

- `recencyAt` initializes when a thread is created.
- A turn start advances `recencyAt` monotonically.
- Commentary, agent output, tool results, token/accounting updates, turn
completion, archive, unarchive, resume, and generic metadata writes do
not advance it.
- `updatedAt` retains its existing behavior and continues to advance for
persisted thread mutations.
- Current servers populate `recencyAt`; the response field is optional
in generated TypeScript so clients connected to older servers can fall
back to `updatedAt`.
- Filesystem-only fallback uses existing updated/mtime ordering when
SQLite is unavailable.
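
The contract can be summarized with a small sketch. The event enum lists only a few of the cases named above, and the function names are hypothetical.

```rust
/// A few of the thread events named in the contract.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadEvent {
    TurnStarted,
    AgentOutput,
    Archived,
    Resumed,
}

/// Only a turn start advances recency, and never backwards.
pub fn next_recency(current: i64, event: ThreadEvent, now: i64) -> i64 {
    match event {
        ThreadEvent::TurnStarted => current.max(now),
        _ => current,
    }
}

/// Client-side sort key: use `recencyAt` when the server sent it,
/// otherwise fall back to `updatedAt` (older servers).
pub fn sidebar_sort_key(recency_at: Option<i64>, updated_at: i64) -> i64 {
    recency_at.unwrap_or(updated_at)
}
```

Using `max` keeps the value monotonic even if a turn start arrives with an older clock reading.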

## Persistence and compatibility

Migration 0038 adds second- and millisecond-precision recency columns,
backfills them from the existing updated timestamp, creates list
indexes, and includes an insert trigger so older binaries writing to a
migrated database seed recency without causing later mutations to
advance it.

Generic metadata upserts preserve existing recency values. Turn-start
updates use a dedicated monotonic touch, and process-local allocation
keeps millisecond cursor values unique. State DB list, search, read,
filtered-list repair, rollout fallback propagation, and app-server
conversions all carry the new field.

## API

`Thread` responses include:

```ts
recencyAt?: number
```

`thread/list` and `thread/search` accept:

```json
{ "sortKey": "recency_at" }
```

Generated TypeScript and JSON schemas are included.

## Validation

- `just test -p codex-state` — 146 passed
- `just test -p codex-rollout` — 69 passed
- `just test -p codex-thread-store` — 81 passed
- `just test -p codex-app-server-protocol` — 231 passed
- Focused app-server list ordering, response mapping, archive/unarchive,
and resume lifecycle tests passed
- Scoped `just fix` for state, rollout, thread-store,
app-server-protocol, and app-server
- `just fmt`
- `git diff --check`
- Independent correctness, simplicity, elegance, security, and
test-quality reviews; actionable ordering, lifecycle, query-projection,
and timestamp-uniqueness findings were addressed

Jeremy Rose · 2026-06-16 17:06:22 -07:00

fac3158c2a

[codex] Add interruptible sleep tool (#28429)

## Why

Models sometimes need to pause briefly while waiting for external work,
but using a shell command for that delay ties the wait to a process and
does not naturally resume when new turn input arrives.

## What changed

- add a built-in `sleep` tool behind the under-development `sleep_tool`
feature
- accept a bounded `duration_ms` argument, matching the millisecond
convention used by unified exec
- end the sleep early when either steered user input or mailbox input
arrives
- include elapsed wall-clock time in completed and interrupted outputs
- emit a dedicated core `SleepItem` through `item/started` and
`item/completed`
- expose the sleep item as app-server v2 `ThreadItem::Sleep` and retain
it in reconstructed thread history
- regenerate the configuration schema for the new feature flag
- regenerate app-server JSON and TypeScript schema fixtures
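
For illustration, an interruptible wait can be built on a channel receive with a timeout. This is a synchronous standard-library sketch, not the tool's implementation: the 60-second cap, clamping (rather than rejecting) oversized durations, and a single `()` channel standing in for both steered user input and mailbox input are all assumptions.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Assumed upper bound for `duration_ms`; the real limit is not stated here.
pub const MAX_SLEEP_MS: u64 = 60_000;

#[derive(Debug, PartialEq, Eq)]
pub enum SleepOutcome {
    Completed { elapsed_ms: u128 },
    Interrupted { elapsed_ms: u128 },
}

/// Sleep for up to `duration_ms`, waking early if any input arrives.
pub fn interruptible_sleep(duration_ms: u64, input: &Receiver<()>) -> SleepOutcome {
    let total = Duration::from_millis(duration_ms.min(MAX_SLEEP_MS));
    let started = Instant::now();
    match input.recv_timeout(total) {
        // New input arrived: report how long we actually waited.
        Ok(()) => SleepOutcome::Interrupted {
            elapsed_ms: started.elapsed().as_millis(),
        },
        Err(RecvTimeoutError::Timeout) => SleepOutcome::Completed {
            elapsed_ms: started.elapsed().as_millis(),
        },
        // No sender left, so nothing can interrupt: sleep out the remainder.
        Err(RecvTimeoutError::Disconnected) => {
            std::thread::sleep(total.saturating_sub(started.elapsed()));
            SleepOutcome::Completed {
                elapsed_ms: started.elapsed().as_millis(),
            }
        }
    }
}
```

Both outcomes carry the elapsed wall-clock time, mirroring the completed and interrupted outputs described above.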

## Test plan

- `just test -p codex-core sleep_tool_follows_feature_gate`
- `just test -p codex-core any_new_input_interrupts_sleep`
- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server
sleep_emits_started_and_completed_items`

pakrym-oai · 2026-06-15 21:39:21 -07:00

08901fc8e1

[codex-analytics] Analytics Capture to File in Debug Builds (#27093)

## This PR

The original [combined remote plugin analytics PR
#26281](https://github.com/openai/codex/pull/26281) mixed reusable
analytics test infrastructure, two manual smoke workflows, a metadata
refactor, and the final identity behavior. This PR isolates the generic
capture mechanism so it can be reviewed and landed before any
plugin-specific behavior.

- Add a debug-only analytics destination that writes final request
payloads as JSONL.
- Suppress HTTP delivery whenever capture mode is selected, including
after capture write failures.
- Keep release behavior unchanged even when the capture environment
variable is present.
- Keep the mechanism generic; this PR contains no plugin-specific
behavior.

Set `CODEX_ANALYTICS_EVENTS_CAPTURE_FILE=/path/events.jsonl` when
running a debug Codex binary to inspect the exact batched payload that
would otherwise be sent to the analytics endpoint.
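
The destination choice might be sketched like this. `Destination` and `select_destination` are hypothetical names, the build mode is passed in as a plain flag so the sketch stays testable, and treating an empty variable as unset is an assumption.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Destination {
    /// Normal delivery to the analytics endpoint.
    Http,
    /// Debug-only capture of final request payloads as JSONL.
    CaptureFile(PathBuf),
}

/// `capture_env` is the value of `CODEX_ANALYTICS_EVENTS_CAPTURE_FILE`, if set.
/// Release builds ignore it and always use HTTP.
pub fn select_destination(is_debug_build: bool, capture_env: Option<&str>) -> Destination {
    match capture_env {
        Some(path) if is_debug_build && !path.is_empty() => {
            Destination::CaptureFile(PathBuf::from(path))
        }
        _ => Destination::Http,
    }
}
```

A real caller could pass `cfg!(debug_assertions)` and `std::env::var(...).ok().as_deref()`. Once `CaptureFile` is selected, HTTP delivery stays suppressed even if a capture write fails, which this sketch does not model.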

## Testing

- `just test -p codex-analytics` (76 passed)
- `just test --release -p codex-analytics` (73 passed)
- CI is green across the required platform matrix.

## Split Overview

```text
main
├── #27093  Debug analytics capture                 ← you are here
│   └── #27099  Non-mutating plugin smoke
│       └── #27100  Remote install/uninstall smoke
└── #27102  Plugin telemetry metadata refactor

After #27093, #27099, #27100, and #27102 merge:
└── Final PR: add remote_plugin_id to plugin analytics
```

Review order and dependencies:

1. [#27093 Add debug-only analytics event
capture](https://github.com/openai/codex/pull/27093) **(this PR, based
on `main`)**
2. [#27099 Add a plugin analytics smoke
workflow](https://github.com/openai/codex/pull/27099) (stacked on
#27093)
3. [#27100 Add a remote plugin analytics mutation smoke
workflow](https://github.com/openai/codex/pull/27100) (stacked on
#27099)
4. [#27102 Centralize plugin telemetry metadata
construction](https://github.com/openai/codex/pull/27102) (independent,
based on `main`)
5. Final remote-ID behavior PR (created after PRs 1-4 merge)

The original [#26281](https://github.com/openai/codex/pull/26281)
remains open as the green aggregate reference until the final PR is
published.

jameswt-oai · 2026-06-15 16:32:38 -07:00

e512e884ed

Add Guardian catalog diagnostics metadata (#27109)

## Why

We need request-level evidence for Guardian cases where
`codex-auto-review` is missing from the client-side model catalog and
the review falls back to the parent model.

## What changed

- Add `guardian_catalog_contains_auto_review` to Guardian Responses API
client metadata.
- Add `guardian_model_provider_id` to Guardian Responses API client
metadata.
- Keep review-session metadata optional so callers without metadata
preserve the existing `None` path.
- Add tests for override, normal preferred-model, and
missing-auto-review-catalog behavior.

## Validation

- `just test -p codex-core
guardian_review_records_missing_auto_review_model_in_request_metadata`
- `just test -p codex-core
guardian_review_uses_model_catalog_override_when_preferred_review_model_exists`
- `just test -p codex-core
guardian_review_uses_preferred_review_model_without_model_catalog_override`
- `git diff --check origin/main`

Won Park · 2026-06-12 15:50:30 -07:00

0605f9c14f

Emit plugin ID on MCP tool call analytics events (#27483)

MCP tool-call items already carry the runtime-resolved plugin owner, but
the analytics reducer dropped that field. Forwarding the existing value
provides direct attribution without downstream server-name inference.

## Summary

- emit `plugin_id` on `codex_mcp_tool_call_event` payloads
- preserve `null` for MCP calls without a plugin owner
- verify the serialized field through the MCP item lifecycle test

## Test

- `cd codex-rs && just test -p codex-analytics`
- `cd codex-rs && just fix -p codex-analytics`
- `cd codex-rs && just fmt`

Chris Dong · 2026-06-11 09:55:53 -07:00

df9dd22248

[codex-analytics] Emit structured compaction codex errors (#27082)

## Summary
- replace raw compaction `error` analytics with `codex_error_kind` and
`codex_error_http_status_code`
- derive compaction error telemetry from `CodexErr` using the same
`CodexErrKind` mapping and HTTP status helper used by turn events
- remove the pre-compact hook stop reason from the internal compaction
outcome now that it is no longer emitted as raw analytics text

## Why
Compaction `error` was a raw `CodexErr::to_string()` value, which can
carry free-form provider or user-derived text. Structured Codex error
fields preserve useful low-cardinality telemetry without sending the raw
string.

## Validation
- `just fmt`
- `just test -p codex-analytics`
- `just test -p codex-core
compact::tests::build_token_limited_compacted_history_appends_summary_message`

Attempted `just test -p codex-core`; the changed crate compiled, but the
full target failed in unrelated environment-dependent tests such as
missing helper binaries and shell snapshot timeouts.

rhan-oai · 2026-06-11 06:07:06 +00:00

8d9f33c87c

[codex-analytics] report cached input tokens for v2 compaction (#27103 )

## Summary

- add nullable `cached_input_tokens` to the compaction analytics event
- populate it from response usage for compaction v2
- leave it `null` for other compaction implementations

This adds visibility into prompt-cache usage for v2 compaction without
changing compaction behavior.

## Testing

- `just test -p codex-analytics`
- `just test -p codex-core
collect_compaction_output_accepts_additional_output_items`

rhan-oai · 2026-06-10 22:47:22 -07:00

383708e74e

[codex] Compact when comp_hash changes (#27520 )

## Summary
- snapshot `comp_hash` into `TurnContext` when the turn is created and
use that snapshot as the downstream source of truth
- persist the turn hash in rollout context and recover it into
previous-turn settings during resume and fork replay
- compact existing history with the previous model only when both
adjacent turns provide hashes and the values differ
- record `comp_hash_changed` as the compaction reason
- cover ordinary transitions, resume, and missing-hash compatibility
with end-to-end tests

## Why
History produced under one compaction-compatible model configuration may
not be safe to carry directly into another. Compacting at the turn
boundary converts that history before context updates and the new user
message are added. Persisting the turn snapshot in `TurnContextItem`
makes the same protection work after resuming a rollout.

A missing hash is not treated as evidence of incompatibility. `None →
Some`, `Some → None`, and `None → None` do not trigger compaction; only
`Some(previous) → Some(current)` with unequal values does.
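
The rule above can be sketched as a small predicate. This is a minimal illustration, not the PR's code: the function name and the use of `&str` for the hash type are assumptions.

```rust
/// Hypothetical helper (name and hash type are assumptions): decides whether
/// existing history should be compacted at a turn boundary, given the
/// `comp_hash` snapshots of the previous and current turns.
fn should_compact_on_hash_change(previous: Option<&str>, current: Option<&str>) -> bool {
    match (previous, current) {
        // Both turns provided a hash: compact only when the values differ.
        (Some(prev), Some(cur)) => prev != cur,
        // A missing hash on either side is not evidence of incompatibility.
        _ => false,
    }
}
```

Matching on the pair keeps the three missing-hash cases in a single fall-through arm, so a new case cannot trigger compaction by accident.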

## Stack
- depends on #27532
- #27532 is based directly on `main`

## Testing
- `just test -p codex-core pre_sampling_compact_` — 6 passed
- `just test -p codex-core
turn_context_item_uses_turn_context_comp_hash_snapshot` — passed
- `just fix -p codex-core -p codex-protocol -p codex-analytics -p
codex-models-manager`

Ahmed Ibrahim · 2026-06-11 04:11:26 +00:00

ba4925b3c2

[codex-analytics] emit internally started turn events (#27392 )

## Why
Currently, the analytics reducer omits `codex_turn_event` for internally
started subagent turns:
- It uses `TurnState.connection_id` to select app-server client and
runtime metadata
- `turn/start` sets this field for client-started turns, while internal
subagent turns bypass that path
- Spawned child threads inherit the correct connection, but turn
emission does not use thread state

## What Changed
- Keeps explicit `TurnState.connection_id` authoritative for
client-started turns
- Falls back to the matching thread’s inherited connection when the turn
connection is absent
- Preserves completeness gates, event schema, and post-emission state
removal
- Extends subagent lifecycle test coverage
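
A rough sketch of the fallback described above, assuming simplified stand-ins for the reducer state. The struct shapes, the `u64` connection type, and the `resolve_connection` name are illustrative assumptions, not the actual `codex-analytics` types.

```rust
/// Simplified, hypothetical stand-ins for the reducer's per-turn and
/// per-thread state; the real types carry more fields.
struct TurnState {
    connection_id: Option<u64>,
}

struct ThreadState {
    connection_id: Option<u64>,
}

/// Picks the connection used to select client and runtime metadata.
/// An explicit turn connection stays authoritative; the thread's inherited
/// connection is consulted only when the turn has none.
fn resolve_connection(turn: &TurnState, thread: Option<&ThreadState>) -> Option<u64> {
    turn.connection_id
        .or_else(|| thread.and_then(|t| t.connection_id))
}
```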

## Verification
- `just test -p codex-analytics` (71 tests passed)
- `just fix -p codex-analytics`
- `just fmt`

marksteinbrick-oai · 2026-06-10 15:35:41 -07:00

b39f943a63

[codex] Retry transient Guardian review failures (#27062 )

## Background

Codex can use **Auto Review** for permission requests. Instead of asking
the user immediately, Codex starts a separate locked-down reviewer
session called **Guardian**, which returns a structured `allow` or
`deny` assessment.

The Guardian reviewer is itself a Codex session, so its model request
can fail for transient infrastructure reasons such as model overload,
HTTP connection failure, or response-stream disconnect. Today, any such
failure immediately ends the Auto Review attempt and blocks the action.

This PR adds bounded retries for failures that the existing protocol
explicitly identifies as transient.

Linear context:
[CA-539](https://linear.app/openai/issue/CA-539/retry-auto-review-infrastructure-failures-and-fall-back-to-manual)

## What changes

A Guardian review can now make at most **three total attempts**:

1. Run the review normally.
2. Retry after a jittered delay of roughly 180–220 ms if the first
attempt fails with an eligible error.
3. Retry after a jittered delay of roughly 360–440 ms if the second
attempt also fails with an eligible error.

All attempts share the original review deadline. Jitter spreads retries
from concurrent clients to reduce synchronized load during broader
outages. The retries do not reset the user's maximum wait time, and the
backoff waits terminate early if the review is cancelled or the deadline
expires.
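
The schedule above is consistent with a 200 ms base delay that doubles per retry with ±10% jitter. The sketch below encodes that reading. The constants and function names are assumptions inferred from the stated ranges, the jitter sample is passed in by the caller rather than drawn from a random source, and clamping to the remaining deadline is a simplification of the early-termination behavior.

```rust
use std::time::Duration;

const MAX_ATTEMPTS: u32 = 3; // three total attempts, per the description
const BASE_DELAY_MS: u64 = 200; // assumed nominal delay before the first retry
const JITTER_PERCENT: u64 = 10; // assumed jitter spread around the nominal delay

/// Lower and upper bounds of the delay before retry number `retry`
/// (1 = first retry, 2 = second retry).
fn retry_delay_bounds(retry: u32) -> (Duration, Duration) {
    let nominal = BASE_DELAY_MS << retry.saturating_sub(1);
    let spread = nominal * JITTER_PERCENT / 100;
    (
        Duration::from_millis(nominal - spread),
        Duration::from_millis(nominal + spread),
    )
}

/// Picks a delay inside the bounds from a jitter sample `unit` in [0, 1],
/// never exceeding the time left before the shared review deadline.
fn retry_delay(retry: u32, unit: f64, remaining: Duration) -> Duration {
    let (lo, hi) = retry_delay_bounds(retry);
    let span_ms = (hi - lo).as_millis() as f64;
    let delay = lo + Duration::from_millis((span_ms * unit.clamp(0.0, 1.0)) as u64);
    delay.min(remaining)
}

/// A further attempt is allowed only for eligible errors and only while
/// fewer than `MAX_ATTEMPTS` attempts have been made.
fn may_retry(attempts_made: u32, error_is_eligible: bool) -> bool {
    error_is_eligible && attempts_made < MAX_ATTEMPTS
}
```

Because every delay is clamped to the remaining budget, retries can never extend the user's maximum wait time.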

Before retrying, the existing Guardian session lifecycle decides whether
the session remains usable. Healthy trunks are reused, broken trunks are
removed by the existing cleanup path, and ephemeral sessions continue to
clean themselves up.

The review still emits one logical lifecycle to clients. Recoverable
intermediate failures do not produce warnings or terminal events.

## Retry policy

### Retried up to twice

- model/server overload
- HTTP connection failure
- response-stream connection failure
- response-stream disconnect
- internal server error
- a final reviewer message that cannot be parsed as the required
Guardian assessment

### Not retried

- bad or invalid requests
- authentication failures
- usage limits
- cyber-policy failures
- errors without a structured category
- a request that already exhausted the lower-level Responses retry
budget
- a completed Guardian turn with no assessment payload
- prompt-construction failures
- Guardian review timeout
- cancellation or abort
- a valid `deny` assessment

The session-error classification uses `ErrorEvent.codex_error_info`; it
does not inspect error-message strings.
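
As an illustration of the policy above, the sketch below classifies a structured category without looking at message text. `FailureCategory` and its variants are hypothetical names chosen to mirror the two lists; they are not the real `CodexErrorInfo` variants.

```rust
/// Hypothetical failure categories mirroring the lists above. These are
/// illustrative names, not the actual `CodexErrorInfo` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureCategory {
    ServerOverloaded,
    HttpConnectionFailed,
    ResponseStreamConnectionFailed,
    ResponseStreamDisconnected,
    InternalServerError,
    UnparseableAssessment,
    BadRequest,
    Unauthorized,
    UsageLimitExceeded,
    CyberPolicy,
    ResponsesRetryBudgetExhausted,
    MissingAssessmentPayload,
    PromptConstructionFailed,
    Timeout,
    Cancelled,
}

/// Returns true only for explicitly transient categories. `None` models an
/// error without a structured category, which is never retried.
fn is_retryable(category: Option<FailureCategory>) -> bool {
    use FailureCategory::*;
    matches!(
        category,
        Some(
            ServerOverloaded
                | HttpConnectionFailed
                | ResponseStreamConnectionFailed
                | ResponseStreamDisconnected
                | InternalServerError
                | UnparseableAssessment
        )
    )
}
```

Listing the retryable categories explicitly means any new or unknown category defaults to not being retried.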

## Implementation notes

- `wait_for_guardian_review` preserves the complete `ErrorEvent`,
including structured `codex_error_info`.
- Guardian session failures preserve the original message and optional
structured `CodexErrorInfo`.
- The retry policy classifies the explicitly transient `CodexErrorInfo`
variants; unknown, absent, and deterministic categories are not retried.
- The Guardian session manager receives the caller's deadline rather
than creating a new timeout per attempt.
- Analytics record the final `attempt_count`.
- Retry orchestration does not add a separate session-cleanup protocol;
it relies on the existing trunk and ephemeral lifecycle decisions.

## Automated testing

Focused Guardian coverage verifies:

- every supported transient `CodexErrorInfo` is classified as retryable,
while absent and non-transient categories are not;
- structured transient session failure -> retry -> approval with the
healthy trunk reused;
- two invalid Guardian responses -> third attempt -> approval, with
exactly three requests;
- three invalid responses -> existing fail-closed result, with exactly
three requests and one terminal lifecycle;
- valid denial, missing payload, invalid request, timeout, cancellation,
and prompt/session construction failures are not retried;
- retry eligibility ends after the third attempt;
- retry delays use the shared exponential backoff helper and remain
within the expected jitter bounds;
- cancellation and deadline expiry interrupt the backoff wait;
- healthy trunks are reused across retryable failures;
- broken event streams remove the trunk through the existing lifecycle
cleanup;
- an ephemeral retry does not disturb a concurrent trunk review.

Validation performed:

- `just test -p codex-core guardian_review_
guardian_ephemeral_retry_preserves_parallel_trunk_and_fork_history
run_review_removes_trunk_when_event_stream_is_broken` — **42 passed**;
- `just test -p codex-analytics` — **71 passed**;
- scoped Clippy fixes for `codex-core` and `codex-analytics` passed.

A prior full `codex-core` run had unrelated environment-sensitive
failures outside Guardian coverage.

## Manual QA

The focused integration tests use the local mock Responses server to
inspect exact request counts and emitted lifecycle events. They confirm
that retries are internal, a successful later attempt supplies the final
decision, non-retryable failures issue only one request, and exhausted
retries emit only one terminal result.

kbazzi · 2026-06-10 11:46:57 -07:00

ccf1a18518

[codex] Fix post-merge analytics integration failures (#27285 )

## Why

Recent merges left `main` with analytics integration build failures.
Local Cargo runs also made the trimmed-skills test depend on
developer-installed skills, while Bazel used an isolated home.

## What changed

- Clone `thread_metadata.thread_source` when constructing goal analytics
event parameters.
- Group app-server thread extension inputs into
`ThreadExtensionDependencies`.
- Isolate the trimmed-skills test home so its exact fixture count is
stable across Cargo and Bazel.

## Validation

- `cargo check -p codex-analytics`
- `just test -p codex-analytics` (71 tests)
- `just test -p codex-app-server` (837 tests; one unrelated zsh-fork
timeout passed on retry)

Adam Perry @ OpenAI · 2026-06-09 20:52:09 -07:00

e0cb4ede4e

[codex-analytics] emit goal lifecycle analytics (#27078 )

## Why
- Currently, there is no analytics event for `/goal` behavior
- Existing events cannot identify goal execution or its resulting
outcome
- The original update in
[#26182](https://github.com/openai/codex/pull/26182) was implemented
before `/goal` moved into `codex-goal-extension`.

## What Changed
- Adds `codex_goal_event` serialization and enrichment to
`codex-analytics`
- Emits goal events from the canonical `codex-goal-extension` mutation
and accounting paths:
  - `created` when a new logical goal is persisted
  - `usage_accounted` when cumulative goal usage is persisted
  - `status_changed` when the stored goal status changes
  - `cleared` when the goal is deleted
- Preserves causal `turn_id` for turn-driven events and uses null
attribution for external or idle lifecycle events
- Changes goal deletion to return the deleted row so `cleared` retains
the stable goal ID

## Event Details

Includes standard analytics metadata along with goal-specific fields:
- `goal_id`: Stable ID stored in the local SQLite goal row and shared
across the goal's events
- `event_kind`: Observed operation (one of the four lifecycle events
listed under "What Changed")
- `goal_status`: Resulting or last stored status: `active`, `paused`,
`blocked`, `usage_limited`, etc.
- `has_token_budget`: Indicates whether a token budget is configured
- `turn_id`: Causal turn ID, or null when no causal turn exists
- `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted`
events; null otherwise
- `cumulative_time_accounted_seconds`: Cumulative active time on
`usage_accounted` events; null otherwise

## Validation
- `just test -p codex-analytics -p codex-state -p codex-goal-extension`
- `just test -p codex-core -E 'test(/goal/)'`
- `just test -p codex-app-server`
- `cargo build -p codex-analytics -p codex-core -p codex-state -p
codex-app-server`

marksteinbrick-oai · 2026-06-09 18:45:54 -07:00

608b8b1cc6

[codex-analytics] add extensible feature thread sources (#27063 )

## Why
- `ThreadSource` currently defines a closed set of core-owned values
- Product features also create threads for background or scheduled work
- Adding every product-specific value to the core enum would require
repeated `codex-rs` protocol changes
- Feature-backed values let product callers provide precise attribution
while preserving the existing core classifications

## What Changed
- Adds `ThreadSource::Feature(String)` for app-owned thread source
values
- Represents all app-server v2 thread sources as scalar strings, so a
feature source is supplied as `"automation"`
- Persists and emits the feature's plain string label, so `"automation"`
produces `thread_source="automation"` in analytics
- Keeps `user`, `subagent`, and `memory_consolidation` as explicit
core-owned values and regenerates the app-server schemas and TypeScript
bindings
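
A minimal sketch of the scalar-label round trip, assuming the core variants are named as below. Only `ThreadSource::Feature(String)` is taken from the description; the other variant names and the `as_label`/`from_label` helpers are illustrative.

```rust
/// Sketch of the thread source enum. `Feature(String)` comes from the PR
/// description; the remaining variant names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ThreadSource {
    User,
    Subagent,
    MemoryConsolidation,
    Feature(String),
}

impl ThreadSource {
    /// Scalar label as it would appear on the wire and in analytics.
    fn as_label(&self) -> &str {
        match self {
            ThreadSource::User => "user",
            ThreadSource::Subagent => "subagent",
            ThreadSource::MemoryConsolidation => "memory_consolidation",
            ThreadSource::Feature(label) => label.as_str(),
        }
    }

    /// Core-owned labels map to explicit variants; any other label is
    /// treated as an app-owned feature source.
    fn from_label(label: &str) -> Self {
        match label {
            "user" => ThreadSource::User,
            "subagent" => ThreadSource::Subagent,
            "memory_consolidation" => ThreadSource::MemoryConsolidation,
            other => ThreadSource::Feature(other.to_string()),
        }
    }
}
```

With this shape, a product caller can introduce a label such as `"automation"` without a protocol change, while the core-owned classifications stay explicit.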

## Verification
- `just write-app-server-schema`
- `cargo check --workspace`
- `just test -p codex-protocol
feature_thread_source_serializes_as_its_app_owned_label`
- `just test -p codex-app-server-protocol
thread_sources_round_trip_as_scalar_labels`
- `cargo test -p codex-analytics
thread_initialized_event_serializes_expected_shape`
- `just fmt`

marksteinbrick-oai · 2026-06-09 12:27:10 -07:00

a71e040df5

multi-agent: add path-based v2 activity tracking (#27007 )

## Why

Multi-agent v2 identifies agents by canonical paths, but its tool
handlers still emitted the larger legacy collaboration begin/end events
built around nickname and role metadata. App-server, rollout-trace,
analytics, and TUI consumers therefore lacked one compact path-based
completion signal that behaved consistently across live events and
replay.

The TUI also needs a bounded `/agent` status surface for v2 agents. It
should use recent local activity for previews, refresh liveness without
loading full histories, and keep the legacy picker available when no
path-backed v2 agent is known.

## What changed

- Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
`interrupt_agent` legacy lifecycle emissions with a success-only
`SubAgentActivity` event. The event records the tool call ID, occurrence
time, affected thread, canonical agent path, and `started`,
`interacted`, or `interrupted` kind.
- Expose the activity as a completion-only app-server v2
`subAgentActivity` thread item in live notifications and reconstructed
history, regenerate the protocol schemas, and count it in sub-agent tool
analytics.
- Track canonical paths from live activity and loaded-thread metadata in
the TUI, and render the activity in live and replayed transcripts.
- Make `/agent` list running path-backed agents with summaries from
bounded local event buffers. Each summary is capped at 240 graphemes,
the scan is capped at six recent items, only the last three wrapped
lines are shown, and command output is omitted. Liveness falls back to
metadata-only `thread/read` when local turn state is unavailable.
- Persist the activity as a terminal rollout-trace runtime payload and
reduce it to the corresponding spawn, send, follow-up, or close
interaction edge. `interrupt_agent` is classified as a close-edge
operation.
- Preserve the legacy picker when no path-backed v2 agent is known.

## Compatibility

App-server v2 clients that consumed `collabAgentToolCall` begin/end
pairs for these tools must handle the new completion-only
`subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.

## Screenshot

<img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
/>

## Testing

- `just test -p codex-app-server-protocol`
- `just test -p codex-rollout-trace`
- Added focused coverage for activity analytics, terminal trace
serialization, spawn-edge reduction, `interrupt_agent` classification,
TUI status rendering without aggregated command output, and clearing
stale running state after a completed turn.

jif · 2026-06-09 12:14:48 +02:00

fae2709320

[codex-analytics] stop sending codex error subreason (#27060 )

## Summary
- stop emitting `codex_error_subreason` on `codex_turn_event`
- remove the transient analytics fact plumbing that copied
`CodexErr::InvalidRequest(String)` into the event
- update analytics serialization coverage accordingly

## Why
`codex_error_subreason` is a free-form copy of `InvalidRequest(String)`,
including raw provider 400 bodies in some paths. That makes it unsafe as
an analytics field because it can carry user-derived or sensitive text.

## Validation
- `just fmt`
- `just test -p codex-analytics`

rhan-oai · 2026-06-08 21:29:06 +00:00

ee6c91d5cf

[codex-analytics] report compaction analytics details (#26680 )

## Why

Compaction analytics adds retained image count and compaction summary
output tokens for v1.5 specifically.

## What changed

- Add nullable `retained_image_count` and `compaction_summary_tokens`
fields to `codex_compaction_event`.
- Populate them only for `responses_compaction_v2`: retained images come
from the retained v2 compacted history, and summary tokens come from
`response.completed.token_usage.output_tokens`.
- Leave local and legacy remote compaction events as `null` for these
detail fields.

## Verification

- `just fmt`
- `just fix -p codex-core`
- `just test -p codex-core
build_v2_compacted_history_counts_retained_input_images`
- `git diff --check`

rhan-oai · 2026-06-08 10:52:31 -07:00

f1c18df9ae

[codex] Add turn profiling analytics (#26484 )

## Summary

Add flat profiling fields to `codex_turn_event` so analytics can explain
where turn wall-clock time is spent without changing tool execution
behavior.

The profile reports:
- time before the first sampling request
- sampling time across all attempts and follow-ups
- overhead between sampling requests
- time blocked in the post-sampling tool drain
- time after the final sampling request
- sampling request and retry counts

## Implementation

- Extend the existing turn timing state with constant-memory phase
accounting and one RAII phase guard.
- Observe sampling and the existing post-sampling drain only at turn
orchestration boundaries.
- Keep tool runtime, tool futures, response item handling, and turn
lifecycle values unchanged.
- Add the profiling fields directly to the existing analytics turn event
without changing app-server protocol or rollout persistence.
- Use the existing turn `status` to distinguish completed, failed, and
interrupted profiles.

Exact sampling/tool overlap is intentionally omitted because measuring
tool completion accurately would require hooks in the tool execution
path.
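
The constant-memory accounting with an RAII phase guard might look roughly like the following. `TurnProfile`, `PhaseGuard`, and `begin_sampling` are hypothetical names, and only the sampling phase is modeled.

```rust
use std::time::{Duration, Instant};

/// Hypothetical constant-memory profile: fixed counters only, no per-request
/// allocations. Only the sampling phase is modeled in this sketch.
#[derive(Debug, Default)]
struct TurnProfile {
    sampling_time: Duration,
    sampling_requests: u32,
}

/// RAII guard: adds the elapsed time to the borrowed total when dropped,
/// including on early returns and error paths.
struct PhaseGuard<'a> {
    total: &'a mut Duration,
    started: Instant,
}

impl Drop for PhaseGuard<'_> {
    fn drop(&mut self) {
        *self.total += self.started.elapsed();
    }
}

impl TurnProfile {
    /// Counts one sampling request and starts timing it.
    fn begin_sampling(&mut self) -> PhaseGuard<'_> {
        self.sampling_requests += 1;
        PhaseGuard {
            total: &mut self.sampling_time,
            started: Instant::now(),
        }
    }
}
```

Holding the guard only at the orchestration boundary keeps the measurement out of the tool execution path, which matches the stated goal of leaving tool behavior unchanged.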

## Validation

- Add app-server end-to-end coverage for a single-sampling turn with no
blocking tool work.
- Add app-server end-to-end coverage for `request_user_input` blocking
followed by a second sampling request.
- CI is running on the PR; tests were not executed locally per
repository guidance.

Ahmed Ibrahim · 2026-06-05 11:27:10 -07:00

8d72fb6de9

[codex-analytics] emit forked thread id on initialization (#26248 )

## Why
- Thread initialization analytics do not identify the source thread for
forked threads.
- The session viewer needs this lineage to construct thread trees.
- Depends on openai/openai#987854. Do not release this change before
that backend schema change is deployed.

## What Changed
- Adds optional `forked_from_thread_id` to `codex_thread_initialized`.
- Populates it from the existing thread fork lineage for app-server and
in-process subagent initialization paths.
- Keeps it null for non-forked threads.

## Verification
- `just fmt`
- `just test -p codex-analytics`
- `just test -p codex-app-server
thread_fork_tracks_thread_initialized_analytics`

kbazzi · 2026-06-04 11:24:12 -07:00

9e41f8ddbe

log plugin MCP server names (#26002 )

## Summary
- emit the plugin capability summary's exact MCP server names in
`codex_plugin_used`

## Test
- `just test -p codex-analytics`
- `just test -p codex-core
explicit_plugin_mentions_track_plugin_used_analytics`
- `just fix -p codex-analytics`

Chris Dong · 2026-06-03 16:06:52 -07:00

4d4837c495

Populate workspace kind on Codex turn events (#25135 )

## Summary
- carry `workspace_kind` from Responses API client metadata into the
turn resolved analytics fact
- serialize the optional value on `codex_turn_event`
- cover both the turn metadata source and turn event serialization

The `workspace_kind` tells us whether a thread had a project attached or
was projectless. This is an indicator of who is adopting Codex for
knowledge work outside of coding.

## Testing
- `env UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just fmt`
- `env PATH=/private/tmp/cargo-tools/bin:$PATH
CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just test -p codex-analytics`
- `env PATH=/private/tmp/cargo-tools/bin:$PATH
CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just test -p codex-core turn_metadata`

Paired with openai/openai#970661, which keeps forwarding the same
metadata key through Responses API headers.

knittel-openai · 2026-06-02 12:46:14 -07:00

b794182ea7

Propagate permission approval environment id (#25862 )

## Stack

1. #25850 - Key request-permission grants by environment: stores and
applies sticky permission grants per environment id.
2. #25858 - Add `environmentId` to `request_permissions`: lets the model
target a selected environment and resolves relative permission paths
against it.
3. This PR (#25862) - Propagate permission approval environment id:
carries the selected environment id through approval events, app-server
requests, TUI prompts, and delegate forwarding.
4. #25867 - Add remote request permissions integration coverage:
verifies the selected remote environment across request, approval, grant
reuse, and exec.

This PR is stacked on #25858, and #25867 is stacked on this PR.

## Why

PR2 lets the model bind a `request_permissions` call to a selected
environment, but the approval event and client-facing request still
needed to carry that binding. For CCA, the user-facing prompt and
delegated approval path should know which environment the grant applies
to instead of relying on cwd alone.

## What Changed

- Added optional `environmentId` to `RequestPermissionsEvent`.
- Emit the selected environment id from core permission approval events.
- Preserve the environment id through delegate forwarding, including
cwd-based delegated requests.
- Added `environmentId` to app-server permission approval params,
generated schema/TypeScript artifacts, and README examples.
- Preserve and display the environment id in TUI permission approval
prompts.
- Updated focused core, app-server protocol, and TUI conversion
coverage.

## Testing

Not run locally per instruction. Performed read-only `git diff --check`.

jif · 2026-06-02 21:09:34 +02:00

9de568372d

[codex-analytics] Track CodexErr details in turn analytics (#25707 )

## Summary
- add analytics-only `CodexErr` telemetry to `codex_turn_event` while
leaving existing `turn_error` unchanged
- record terminal `CodexErr` facts from core immediately before the
existing turn error event is sent
- emit source-truth `codex_error_*` fields for downstream analytics,
including the raw `CodexErr::InvalidRequest(String)` message as
`codex_error_subreason`

## Validation
- `just test -p codex-analytics`
- attempted `just test -p codex-core`, but the local run timed out
across unrelated integration suites in this environment and is not being
used as validation

rhan-oai · 2026-06-02 11:40:35 -07:00

8e4b92d294

store and expose parent_thread_id on Threads (#25113 )

## Why

This PR discussion
(https://github.com/openai/codex/pull/24161#discussion_r3325692763)
revealed a subagent data modeling issue: we overloaded
`forked_from_id` to also mean `parent_thread_id`. That's incorrect,
since guardian and review subagents can be subagents without forking
the main thread's history.

The solution here is to explicitly store a new `parent_thread_id` on
`SessionMeta`, alongside `forked_from_id` which already exists. While
we're at it, also expose it in the app-server protocol on the `Thread`
object.

A thread->subagent relationship and a fork of thread history are
orthogonal concepts.

## What Changed

- Added top-level `parent_thread_id` persistence on `SessionMeta` and
runtime/session plumbing through `SessionConfiguredEvent`,
`CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
`TurnContext`, and `ModelClient`.
- Made turn metadata, request headers, analytics, and subagent-start
events read the separate runtime/top-level parent field instead of
deriving general parent lineage from `SessionSource` or
`forked_from_thread_id`.
- Passed parent lineage separately at delegated subagent, review,
guardian, agent-job, and multi-agent spawn construction sites;
copied-history fork lineage remains derived only from `InitialHistory`.
- Persisted and exposed parent lineage through rollout/thread-store
projections and app-server v2 `Thread.parentThreadId`.
- Updated app-server README text and regenerated app-server schema
fixtures for the additive `parentThreadId` response field.

Owen Lin · 2026-06-01 04:33:20 +00:00

cf0911076f

Add cloud-managed config layer support (#24620 )

## Summary

PR 3 of 5 in the cloud-managed config client stack.

Adds enterprise-managed cloud config as a first-class config layer
source. The layer metadata is preserved through config loading,
diagnostics, debug output, hook attribution, and app-server protocol
surfaces.

## Details

- Enterprise-managed config becomes a normal config layer source with
backend-supplied `id` and display `name` attached for provenance.
- These layers are designed to behave like non-file managed config: they
can surface syntax/type diagnostics by layer name even though there is
no physical config file.
- Relative path settings are resolved from a stored config base so
cloud-delivered config remains consistent with existing MDM-delivered
config semantics.
- Hook attribution distinguishes config-delivered hooks from
requirements-delivered hooks via `HookSource::CloudManagedConfig`.
- This remains pull-based and snapshot-oriented; the PR adds layer
identity/diagnostics, not dynamic reload behavior.

## Validation

Validated through the targeted stack checks after rebasing onto current
`main`:

- Rust crate tests for
config/hooks/cloud-config/backend-client/app-server-protocol
- Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
tests
- Python generated-file contract test
- `cargo shear --deny-warnings`
- Targeted `argument-comment-lint` for config/hooks

joeflorencio-openai · 2026-05-31 15:54:31 -07:00

8a556296f0

Add subagent lineage metadata for responsesapi (#24161 )

## Why

We recently added `forked_from_thread_id` which lets us trace where a
thread's _context_ comes from, but we also want to understand subagent
lineage (e.g. which parent thread spawned this subagent? what kind of
subagent is it?) which is orthogonal.

This PR adds `parent_thread_id` and `subagent_kind` to the
`x-codex-turn-metadata` header sent to the Responses API.

## What changed

- Adds `parent_thread_id` and `subagent_kind` to core-owned
`x-codex-turn-metadata`.
- Restores persisted `SessionSource` and `ThreadSource` from resumed
session metadata so cold-resumed subagent threads keep their lineage on
later Responses API requests.
- Centralizes parent-thread extraction on `SessionSource` /
`SubAgentSource` and reuses it in the Responses client, analytics, agent
control, and state parsing paths.
- Extends reserved-key, git-enrichment, thread-spawn, and app-server v2
metadata coverage for the new lineage fields.

## Verification

- Not run locally per request.
- Added focused coverage in `core/src/turn_metadata_tests.rs` and
`app-server/tests/suite/v2/client_metadata.rs`.

Owen Lin · 2026-05-29 11:28:12 -07:00

fc9cf62efb

[codex] Add user input client ids (#24653 )

## Summary

Adds an optional `clientId` field to app-server v2 `UserInput` and
carries it through the core `UserInput` model so clients can correlate
echoed user input items without relying on payload equality.

## Details

- Adds `client_id: Option<String>` to core `UserInput` variants.
- Exposes the v2 app-server field as `clientId` on the wire and in
generated TypeScript.
- Preserves the id when converting between app-server v2 and core
protocol types.
- Regenerates app-server schema fixtures.

## Validation

- `just fmt`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-protocol`
- `git diff --check`

Alexi Christakis · 2026-05-28 14:54:39 -07:00

e92c952b2e

feat(app-server): include turns page on thread resume (#23534 )

## Summary

The client currently calls `thread/resume` to establish live updates and
immediately follows it with `thread/turns/list` to hydrate recent turns.
This lets `thread/resume` return that page directly, eliminating a round
trip and the ordering/deduplication gap between the two calls.

Experimental clients opt in with `initialTurnsPage: { limit,
sortDirection, itemsView }`. The response returns `initialTurnsPage` as
a `TurnsPage`, including cursors for paging further back in history.
Keeping the controls in a nested opt-in object provides the useful
`thread/turns/list` knobs without spreading page-specific parameters
across `thread/resume`.

## Verification

- `just fmt`
- `just write-app-server-schema --experimental`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server
thread_resume_initial_turns_page_matches_requested_turns_list_page
--tests`
- `cargo test -p codex-app-server
thread_resume_rejoins_running_thread_even_with_override_mismatch
--tests`
- `just fix -p codex-app-server-protocol -p codex-app-server`

Brent Traut · 2026-05-28 09:18:13 -07:00

2a1158b8e2

[codex-analytics] add grouped session id to runtime events (#24655 )

## Why
- Runtime analytics events report `thread_id`, which identifies the
individual thread emitting an event
- They don't report `session_id`, which identifies the shared session
for a root thread and its subagent threads
- Emitting both identifiers allows analytics to group related activity

## What Changed
- Adds `session_id` to relevant analytics events (thread_initialized,
turn, turn_steer, compaction, guardian_review)
- Tracks each thread's session ID in the analytics reducer so subsequent
thread-scoped events emit the same value
- Carries the shared session ID through subagent initialization

## Verification
- `just test -p codex-analytics` validates event payloads and subagent
session grouping.
- Focused `codex-app-server` tests validate session IDs for thread,
turn, and steer events.
- Focused `codex-core` tests validate root and subagent session ID
propagation.

marksteinbrick-oai · 2026-05-26 16:38:46 -07:00

487521733b

Add experimental turn additional context (#24154 )

## Summary

Adds experimental `additionalContext` support to `turn/start` and
`turn/steer` so clients can provide ephemeral external context, such as
browser or automation state, without turning that plumbing into a
visible user prompt or triggering user-prompt lifecycle behavior.

## API Shape

The parameter shape is:

```ts
additionalContext?: Record<string, {
  value: string
  kind: "untrusted" | "application"
}> | null
```

Example:

```json
{
  "additionalContext": {
    "browser_info": {
      "value": "Active tab is CI failures.",
      "kind": "untrusted"
    },
    "automation_info": {
      "value": "CI rerun is in progress.",
      "kind": "application"
    }
  }
}
```

The keys are opaque and caller-defined.

## Context Injection

When provided, accepted entries are inserted into model context as
hidden contextual message items, not as visible thread user-message
items.

`kind: "untrusted"` entries are inserted with role `user`:

```text
<external_${key}>${value}</external_${key}>
```

`kind: "application"` entries are inserted with role `developer`:

```text
<${key}>${value}</${key}>
```

Values are not escaped. Each value is truncated to approximately 1k
tokens before wrapping.
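
The two wrapping rules can be sketched as one function that returns the message role and the wrapped text. The `ContextKind` enum and `render_context_entry` name are assumptions, and truncation is deliberately left out of this sketch.

```rust
/// Hypothetical mirror of the `kind` field on an additional-context entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContextKind {
    Untrusted,
    Application,
}

/// Returns the (role, wrapped text) pair for one entry. The value is inserted
/// verbatim (no escaping); truncation is omitted from this sketch.
fn render_context_entry(key: &str, value: &str, kind: ContextKind) -> (&'static str, String) {
    match kind {
        // Untrusted context is user-role and marked with an `external_` prefix.
        ContextKind::Untrusted => (
            "user",
            format!("<external_{key}>{value}</external_{key}>"),
        ),
        // Application context is developer-role and uses the bare key.
        ContextKind::Application => ("developer", format!("<{key}>{value}</{key}>")),
    }
}
```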

For `turn/start`, accepted additional context is inserted before normal
user input. For `turn/steer`, additional context is merged only when the
steer includes non-empty user input; context-only steers are still
rejected as empty input.

## Dedupe Strategy

`AdditionalContextStore` lives on session state and stores the latest
complete additional-context map.

Each `turn/start` or non-empty `turn/steer` treats its
`additionalContext` as the current complete set of values. Entries are
injected only when the key is new or the exact entry for that key
changed, including `value` or `kind`. After merging, the store is
replaced with the provided map, so omitted keys are removed from the
retained set and can be injected again later if reintroduced.

Omitting `additionalContext`, passing `null`, or passing an empty object
resets the store to empty and injects nothing.
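
A minimal sketch of the dedupe behavior described in this section, assuming the store only needs to remember the last complete map. `AdditionalContextStoreSketch` and `merge` are hypothetical names for illustration; the real `AdditionalContextStore` lives in core and its API is not shown in this description.

```ts
type ContextEntry = { value: string; kind: "untrusted" | "application" };
type ContextMap = Record<string, ContextEntry>;

class AdditionalContextStoreSketch {
  // Latest complete additional-context map seen by this session.
  private latest = new Map<string, ContextEntry>();

  // Returns the keys that should be injected for this turn, then
  // replaces the retained map with the provided one.
  merge(provided: ContextMap | null | undefined): string[] {
    const next = new Map<string, ContextEntry>(Object.entries(provided ?? {}));
    const toInject: string[] = [];
    for (const [key, entry] of next) {
      const prev = this.latest.get(key);
      const changed =
        prev === undefined ||
        prev.value !== entry.value ||
        prev.kind !== entry.kind;
      if (changed) {
        toInject.push(key);
      }
    }
    // Omitted keys are dropped here, so they count as new if they return.
    this.latest = next;
    return toInject;
  }
}
```

With this shape, sending the same map twice injects nothing the second time, while omitting the map (or passing `null` or `{}`) clears the retained set so the next non-empty map is injected in full.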

## What Changed

- Threads experimental v2 `additionalContext` through app-server into
core turn start and steer handling.
- Adds separate contextual fragment types for untrusted user-role
context and application developer-role context.
- Uses pending response input items so additional context can be
combined with normal user input without treating it as prompt text.
- Adds integration coverage for start/steer flow, role routing,
dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
behavior, empty context-only steer rejection, external-fragment marker
matching, and truncation.

pakrym-oai · 2026-05-26 13:02:34 -07:00

768848ab6f

[codex-analytics] split compaction v2 analytics implementation (#24146)

## What changed

- Add a distinct `responses_compaction_v2` value for
`CodexCompactionEvent.implementation`.
- Emit that value from the remote compaction v2 path.
- Keep local compaction as `responses` and legacy `/responses/compact`
as `responses_compact`.
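
The resulting mapping from compaction path to reported value can be summarized as below. The three string values come from this PR; the `CompactionPath` labels and the `implementationFor` function are hypothetical names used only to illustrate the split.

```ts
// Values reported in CodexCompactionEvent.implementation, per this PR.
type CompactionImplementation =
  | "responses"
  | "responses_compaction_v2"
  | "responses_compact";

// Hypothetical labels for the three code paths.
type CompactionPath = "local" | "remote_v2" | "legacy_compact_endpoint";

function implementationFor(path: CompactionPath): CompactionImplementation {
  switch (path) {
    case "local":
      return "responses";
    case "remote_v2":
      return "responses_compaction_v2";
    case "legacy_compact_endpoint":
      return "responses_compact";
  }
}
```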

## Why

Remote compaction v2 and local prompt-based compaction were both
reported as `responses`, which made the analytics table collapse two
different compaction mechanisms into one implementation bucket.

## Validation

- `just fmt`
- `just test -p codex-analytics`

`just test -p codex-core` was started locally, but this PR is
intentionally being pushed for CI to finish the remaining validation.

rhan-oai · 2026-05-22 21:34:22 +00:00

6419402a7c

[codex] Add plugin id to MCP tool call items (#23737)

Add the owning plugin id to MCP tool call items so we can better filter
them at the plugin level.

## Summary
- add optional `plugin_id` to MCP tool-call items and legacy begin/end
events
- propagate plugin metadata into emitted core items and app-server v2
`ThreadItem::McpToolCall`
- preserve plugin ids through app-server replay/redaction paths and
regenerate v2 schema fixtures
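
As a rough illustration of the plugin-level filtering this enables, a client could select tool calls by owner as sketched below. The item shape here is an assumption: only the optional `plugin_id` field is taken from this PR, while `server`, `tool`, and the exact field casing on the wire are hypothetical.

```ts
// Hypothetical, simplified item shape; only `plugin_id` comes from this PR.
interface McpToolCallItemSketch {
  server: string;
  tool: string;
  plugin_id?: string | null;
}

// Keeps only the tool calls owned by the given plugin. Items without a
// plugin id (for example, from older events) never match.
function toolCallsForPlugin(
  items: McpToolCallItemSketch[],
  pluginId: string,
): McpToolCallItemSketch[] {
  return items.filter((item) => item.plugin_id === pluginId);
}

const sampleItems: McpToolCallItemSketch[] = [
  { server: "docs", tool: "search", plugin_id: "plugin-a" },
  { server: "ci", tool: "rerun", plugin_id: "plugin-b" },
  { server: "legacy", tool: "ping" },
];
const pluginACalls = toolCallsForPlugin(sampleItems, "plugin-a");
```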

## Testing
- `just write-app-server-schema`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-protocol -p codex-app-server-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib`
- `cargo check -p codex-tui --tests`
- `cargo check -p codex-app-server --tests`
- `git diff --check`

## Notes
- `just fix -p codex-core` completed with two non-fatal
`too_many_arguments` warnings on the touched MCP notification helpers.
- A broader `cargo test -p codex-core` run passed core unit tests, then
hit shell/sandbox/snapshot failures in the integration target.
- A broader app-server downstream run hit the existing
`in_process::tests::in_process_start_clamps_zero_channel_capacity` stack
overflow; `cargo test -p codex-exec` also hit the existing sandbox
expectation mismatch in
`thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.

Matthew Zeng · 2026-05-20 17:02:10 -07:00

0a4179bb19

Add SubagentStop hook (#22873)

# What

<img width="1792" height="1024" alt="image"
src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e"
/>

`SubagentStop` runs when a thread-spawned subagent turn is about to
finish. Thread-spawned subagents use `SubagentStop` instead of the
normal root-agent `Stop` hook.

Configured handlers match on `agent_type`. Hook input includes the
normal stop fields plus:

- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.
- `agent_transcript_path`: the child subagent transcript path.
- `transcript_path`: the parent thread transcript path.
- `last_assistant_message`: the final assistant message from the child
turn, when available.
- `stop_hook_active`: `true` when the child is already continuing
because an earlier stop-like hook blocked completion.

`SubagentStop` shares the same completion-control semantics as `Stop`,
scoped to the child turn:

- No decision allows the child turn to finish.
- `decision: "block"` with a non-empty `reason` records that reason as
hook feedback and continues the child with that prompt.
- `continue: false` stops the child turn. If `stopReason` is present,
Codex surfaces it as the stop reason.
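
The three outcomes above can be sketched as a single decision function. This is not the hook engine's implementation: `resolveSubagentStop` and `ChildTurnOutcome` are hypothetical names, and two details are assumptions, namely that `continue: false` takes precedence over `decision: "block"` and that a block with an empty `reason` is treated as no decision.

```ts
// Fields named in the description; other hook output fields are omitted.
interface StopHookOutputSketch {
  decision?: "block";
  reason?: string;
  continue?: boolean;
  stopReason?: string;
}

type ChildTurnOutcome =
  | { action: "finish" }
  | { action: "continue_with_prompt"; prompt: string }
  | { action: "stop"; stopReason?: string };

function resolveSubagentStop(output: StopHookOutputSketch): ChildTurnOutcome {
  // Assumption: an explicit `continue: false` wins over a block decision.
  if (output.continue === false) {
    return output.stopReason === undefined
      ? { action: "stop" }
      : { action: "stop", stopReason: output.stopReason };
  }
  // A block needs a non-empty reason, which becomes the follow-up prompt.
  if (
    output.decision === "block" &&
    output.reason !== undefined &&
    output.reason.trim() !== ""
  ) {
    return { action: "continue_with_prompt", prompt: output.reason };
  }
  // No decision: the child turn is allowed to finish.
  return { action: "finish" };
}
```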

# Lifecycle Scope

Only thread-spawned subagents run `SubagentStop`.

Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `Stop` hooks and do not run `SubagentStop`.
This avoids exposing synthetic matcher labels for internal
implementation paths.

# Stack

1. #22782: add `SubagentStart`.
2. This PR: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.

Abhinav · 2026-05-20 14:59:41 -07:00

eee3e60db3
