mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
dev
109 Commits
-
feat(app-server): add history_mode to thread (#29927)
## Description This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`. This will be stored in `SessionMeta` in the JSONL rollout file and as a new column in the SQLite thread_metadata table, and exposed on `thread/start` and on the `Thread` object in app-server. ## What changed - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`, defaulting old and new SessionMeta to `legacy`. - Carried `history_mode` through core session config, ThreadStore stored metadata, local/in-memory stores, rollout metadata extraction, and the existing SQLite `threads` table. - Added experimental `historyMode` to app-server v2 `Thread` and `thread/start`. - Made paginated stored threads metadata-discoverable but unsupported for legacy full-history reads, `load_history`, live resume, and create paths. - Regenerated app-server schema fixtures and added protocol/state/thread-store/app-server coverage for persistence and fail-closed behavior. ## Compatibility floor Because users may be running various versions of Codex binaries on the same machine (TUI, Codex App, etc.), we will need to establish a compatibility floor for upcoming paginated threads, which will change how thread storage reads and writes work. The overall plan here: ``` Release N: - Add historyMode to SessionMeta / Thread / SQLite metadata. - Teach binaries to understand paginated threads. - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread. - Default remains `"legacy"`. Release N+1: - First-party clients start opting into paginated threads where appropriate. - Internal dogfood / staged rollout. - Measure old-client usage and paginated-thread unsupported errors. Release N+2: - Only after Release N+ is overwhelmingly deployed, make paginated the default. - Accept that a small tail of N-1-or-older binaries may not understand paginated threads. ``` The important behavior change is fail-closed handling for a binary that encounters a persisted `paginated` thread before it knows how to fully support paginated history. In app-server, if a thread is `paginated`, we will: - allow metadata-only discovery paths like `thread/list` and `thread/read(includeTurns=false)`, so clients can still see the thread and inspect its `historyMode` - reject legacy full-history/live-thread paths like `thread/read(includeTurns=true)` and `thread/resume` with an unsupported JSON-RPC error - avoid silently treating an unknown or future `historyMode` as `legacy` Under the hood, the ThreadStore layer also rejects legacy operations that would need to load or replay the full thread history for a paginated thread. That gives us the behavior we want for Release N: future paginated threads are visible, but this binary fails closed instead of trying to operate on them as if they were legacy threads.
Owen Lin ·
2026-06-26 09:12:42 -07:00 -
[codex] Attribute app-server analytics by thread originator (#29935)
## Why Desktop Work threads and regular Codex threads can share the same app-server connection. App-server analytics currently copy `product_client_id` from connection metadata for every thread-scoped event, so Work thread activity is attributed to the Desktop connection instead of the thread's resolved originator. This prevents analytics from distinguishing the two products on a shared connection. ## What changed - Publish the resolved originator after a thread is materialized, covering new, resumed, forked, and subagent threads. - Store that originator in the analytics reducer's existing per-thread state. - Override only `app_server_client.product_client_id` for thread, turn, tool, review, goal, guardian, and compaction events while preserving the connection's client name, version, and transport metadata. - Fall back to the connection-wide product client ID when a thread has no originator override. - Preserve persisted originators in thread initialization analytics for resume and fork flows. ## Validation - `just test -p codex-analytics thread_originator_overrides_shared_connection_across_thread_events subagent_events_keep_thread_originator_with_explicit_turn_connection` - `just test -p codex-app-server turn_start_tracks_thread_originator_in_analytics thread_start_tracks_thread_initialized_analytics thread_fork_tracks_thread_initialized_analytics thread_resume_tracks_thread_initialized_analytics` - `just test -p codex-core thread_manager`
alexsong-oai ·
2026-06-25 18:15:48 -07:00 -
[plugins] Track plugin install requests by ID (#29684)
Summary - Emit `codex_plugin_install_requested` when a validated plugin install request is made, before the user accepts or declines the elicitation. - Record the exact model-visible plugin ID, remote plugin ID, required connector IDs, stable suggestion ID, and `endpoint_recommendation` vs `legacy_discovery` source. - Keep `suggest_reason` out of telemetry and leave connector-only install requests unchanged. Rollout - Backend/schema dependency: https://github.com/openai/openai/pull/1065270 - Land the backend PR before this producer starts sending the event. Validation - `just test -p codex-analytics` (83 passed) - `just test -p codex-core request_plugin_install` (17 passed) - `just fix -p codex-analytics` - `just fix -p codex-core` - `just fmt` - `git diff --check`
Alex Daley ·
2026-06-24 21:29:11 +00:00 -
Support thread-level originator overrides (#29477)
## Why Work(TPP) threads can be launched from the Desktop app, but if they all keep the Desktop app's default originator then downstream attribution cannot distinguish local Work launches from cloud-backed Work launches. `thread/start.serviceName` already carries that launch signal, while `SessionMeta.originator` is the durable thread-level value that survives resume and fork. This change converts the Desktop Work service names into an effective originator at thread creation time, persists that originator with the thread, and keeps using it for later model requests and memory writes. ## What changed - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to per-thread originators, while preserving `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override. - Persist the effective originator in `SessionMeta.originator`, read it back on resume/fork, and inherit the parent originator for subagent spawns when there is no persisted session metadata. - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling back to the live parent originator when the forked history no longer includes `SessionMeta`. - Thread the per-thread originator through Responses headers, websocket/compaction request paths, thread-store creation, rollout metadata, and memory stage-one telemetry. ## Verification - `just test -p codex-core agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta thread_manager::tests::originator_override_precedes_service_name_remapping` - `just test -p codex-core agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode` - `just test -p codex-memories-write` - `just fix -p codex-core -p codex-memories-write` - `git diff --check`
alexsong-oai ·
2026-06-23 17:23:38 -07:00 -
[codex] rename rollout budget error to session budget error (#29744)
## Summary - rename the rollout-budget exhaustion error from `RolloutBudgetExceeded` to `SessionBudgetExceeded` - expose the matching app-server v2 wire value as `sessionBudgetExceeded` - regenerate JSON/TypeScript schema fixtures and update the app-server docs and focused tests This is a naming-only follow-up to #29715 based on [Pavel's review suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480). Runtime behavior is unchanged. ## Tests - `just test -p codex-core rollout_budget` - `just test -p codex-app-server-protocol` - `just fmt` - `just write-app-server-schema`
rka-oai ·
2026-06-23 16:49:13 -07:00 -
[codex] surface rollout budget exhaustion (#29715)
## Summary - surface shared rollout-budget exhaustion as `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn - map it through the existing `CodexErrorInfo` and app-server v2 `codexErrorInfo` path - keep local compaction from retrying after the shared rollout budget is exhausted This gives app-server clients a stable `rolloutBudgetExceeded` error they can classify without guessing from `status="interrupted"`. ## Tests - `just test -p codex-core rollout_budget`
rka-oai ·
2026-06-23 15:01:28 -07:00 -
Separate local and remote plugin analytics IDs (#29495)
## Why Plugin analytics overloaded `plugin_id`: most events used the Codex `<plugin>@<marketplace>` identity, while remote install events used the backend plugin ID. That makes the same field change meaning across event types and complicates downstream identity resolution. This change makes the contract unambiguous: - `plugin_id`: the local Codex `<plugin>@<marketplace>` identity, when resolved - `remote_plugin_id`: the backend plugin identity, when available For a remote install failure that happens before plugin details resolve, `plugin_id` is `null` and `remote_plugin_id` remains populated. ## What changed All six plugin analytics events use the same identity contract: - `codex_plugin_installed` - `codex_plugin_install_failed` - `codex_plugin_uninstalled` - `codex_plugin_enabled` - `codex_plugin_disabled` - `codex_plugin_used` Remote identity is resolved from the current installed-plugin snapshot first, with persisted install metadata as fallback. The telemetry metadata type keeps local identity optional for failures that occur before remote details are available. The app-server test client's manual analytics smokes now find remote mutation events through `remote_plugin_id` and validate that `plugin_id` remains local. ## Remote uninstall Resolve and capture telemetry metadata before removing the local plugin cache, then emit `codex_plugin_uninstalled` after the backend confirms success. The event is also emitted when backend uninstall succeeds but local cache cleanup reports `CacheRemove`. If a concurrent remote-cache refresh removes the local bundle before telemetry capture, the already-fetched remote plugin detail supplies fallback capability metadata. ## Validation - `just test -p codex-analytics` — 82 passed - `just test -p codex-core-plugins` — 271 passed - `just test -p codex-app-server-test-client` — 5 passed - `just test -p codex-plugin` — 3 passed - `just test -p codex-app-server plugin_install` — 37 passed - `just test -p codex-app-server plugin_uninstall` — 10 passed The production app-server install/uninstall flow was also exercised against `plugins~Plugin_f1b845ac33888191ac156169c58733c2` (`build-ios-apps@openai-curated-remote`), and the plugin's original uninstalled state was restored.
jameswt-oai ·
2026-06-23 12:27:14 -07:00 -
core: add extra metadata field to Thread struct (#29675)
# Summary Adds a field Thread.extras that can be used to hold arbitrary metadata specific to a given thread.
Boyang Niu ·
2026-06-23 19:15:59 +00:00 -
chore(core) rm AskForApproval::OnFailure (#28418)
## Summary Deletes the OnFailure variant of the `AskForApproval` enum. This option has been deprecated since #11631. ## Testing - [x] Tests pass
Dylan Hurd ·
2026-06-23 12:13:54 -07:00 -
[codex] Centralize Plugin Analytics Metadata (#27102)
This PR moves construction of `PluginTelemetryMetadata` from loader and model helpers into `PluginsManager`, which already owns installed plugin state and will eventually perform remote identity enrichment. The metadata type remains in `codex-plugin`, and serialized analytics events remain unchanged. ## Before ```mermaid flowchart LR subgraph Events["Analytics event paths"] direction TB Lifecycle["Local install / uninstall"] Config["Enable / disable"] Remote["Remote install"] Used["Plugin used"] end subgraph Construction["Metadata construction"] direction TB Loader["Loader telemetry helpers"] Summary["PluginCapabilitySummary::telemetry_metadata"] Override["Caller adds remote_plugin_id"] end Metadata["PluginTelemetryMetadata"] Lifecycle --> Loader Config --> Loader Remote --> Loader Loader -->|"local events"| Metadata Loader -->|"remote install"| Override Override --> Metadata Used --> Summary Summary --> Metadata ``` Telemetry metadata was constructed through loader helpers, a capability-summary method, and a remote-install call-site override. ## After ```mermaid flowchart LR subgraph Events["Analytics event paths"] direction TB Lifecycle["Local install / uninstall"] Config["Enable / disable"] Remote["Remote install"] Used["Plugin used"] end Manager["PluginsManager — single construction owner"] Metadata["PluginTelemetryMetadata"] Lifecycle --> Manager Config --> Manager Remote -->|"authoritative remote ID"| Manager Used -->|"capability summary"| Manager Manager --> Metadata ``` Every analytics path delegates metadata construction to `PluginsManager`. Remote install still supplies its authoritative backend ID explicitly. ## What Changes - Make loader code return a focused plugin capability summary instead of constructing analytics metadata. - Centralize immutable plugin telemetry metadata construction in `PluginsManager`. - Route local install/uninstall, remote install, enable/disable, and plugin-used emitters through the manager. - Preserve the current serialized analytics contract exactly. Normal metadata still has no remote override. Remote install continues to provide its authoritative backend ID explicitly, so the existing serializer continues reporting that ID through `plugin_id`. Snapshot-based enrichment is intentionally deferred to the final PR. ## Testing - `just test -p codex-core-plugins` (238 tests passed) - `just test -p codex-plugin` (3 tests passed) - Scoped Clippy/compile checks passed for `codex-plugin`, `codex-core-plugins`, `codex-app-server`, and `codex-core`. ## Split Overview ```text main ├── #27093 Debug analytics capture (merged) ├── #27099 Non-mutating plugin smoke (merged) ├── #27100 Remote install/uninstall smoke (merged) └── #27102 Plugin telemetry metadata refactor ← you are here └── #27669 Persist remote plugin identity After #27102 and #27669 merge: └── Final PR: add explicit local and remote IDs to plugin analytics ``` Review order and dependencies: 1. [#27093 Add debug-only analytics event capture](https://github.com/openai/codex/pull/27093) (merged) 2. [#27099 Add a plugin analytics smoke workflow](https://github.com/openai/codex/pull/27099) (merged) 3. [#27100 Add a remote plugin analytics mutation smoke workflow](https://github.com/openai/codex/pull/27100) (merged) 4. This metadata refactor, independent and based on `main` 5. [#27669 Persist remote plugin identity](https://github.com/openai/codex/pull/27669), stacked on this PR 6. Final remote-ID behavior PR, created after the prerequisites merge The original [#26281](https://github.com/openai/codex/pull/26281) remains open as the aggregate reference until the final replacement PR is published.jameswt-oai ·
2026-06-22 10:27:23 -07:00 -
Expose thread-level multi-agent mode (#28792)
## Why Once multi-agent mode can be selected per turn, clients also need to choose the initial selection when creating a thread and observe that selection through lifecycle and settings APIs. The selected value is intentionally distinct from the effective model-visible value: no client selection is represented as `null`, even though an eligible multi-agent v2 turn derives `explicitRequestOnly` as its effective default. ## What changed - Add the optional experimental `thread/start.multiAgentMode` parameter and pass it through thread creation. - Preserve an omitted initial value as an unset selection rather than eagerly storing `explicitRequestOnly`. - Apply an explicit `thread/start` selection to the first turn through the session configuration established at thread creation. - Restore the latest persisted effective mode as the selected baseline on cold resume when rollout history contains one. - Inherit the optional selected mode from a loaded parent when creating related runtime threads. - Return the current selected `multiAgentMode` from `thread/start`, `thread/resume`, `thread/fork`, and thread settings, using `null` when no mode is selected. - Keep lifecycle reporting independent from model capability and feature eligibility; core turn construction remains responsible for calculating and persisting the effective mode. ## Not covered - Clearing an existing loaded-session selection back to unset through `turn/start`; omitted or `null` currently retains the session's selection. - A TUI control, slash command, or `config.toml` preference. ## Verification - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode` The focused app-server coverage verifies explicit `thread/start` initialization, first-turn prompting, nullable reporting for an omitted selection, and retention of selections that are not currently runtime-eligible. ## Stack Stacked on #28685. This PR contains only the thread initialization and lifecycle/settings API layer.
Shijie Rao ·
2026-06-19 10:50:44 +02:00 -
Emit Trusted MCP App Identity on Tool-Call Items (#27132)
## Summary - Add optional `appContext` to app-server MCP tool-call items with trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata. - Preserve that context across tool-call events, persisted history, reconnects, and thread resume. - Keep the deprecated top-level `mcpAppResourceUri` temporarily for client migration. The consumer contract is `{ appContext: { connectorId, linkId, mcpAppResourceUri }, tool }`. ## Validation - Full GitHub Actions suite passes, including CLA, Bazel tests, clippy, release builds, and argument-comment lint. --------- Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>martinauyeung-oai ·
2026-06-18 14:02:54 -07:00 -
Support
openai/formextended form elicitations (#27500)# Summary Allow App Server clients to opt into `openai/form` MCP elicitations.
Gabriel Peal ·
2026-06-18 11:54:49 -07:00 -
unified-exec: retain PathUri in command events (#28780)
## Why App-server must report command events containing foreign-platform paths without changing existing client or rollout path-string formats. ## What changed - retain `PathUri` through exec command begin/end events - convert cwd values to `LegacyAppPathString` at the app-server compatibility boundary - drop command actions with foreign paths and log them - serialize rollout-trace cwd values using their inferred native path representation - restore Wine coverage for retained Windows cwd values and successful completion
Adam Perry @ OpenAI ·
2026-06-18 05:00:04 +00:00 -
[codex] Track plugin install and import telemetry failures (#28731)
## Summary - Track plugin install failures through the unified `codex_plugin_install_failed` event for local installs, remote install preflight failures, bundle failures, and remote catalog/backend failures. - Send classified `error_type` values in plugin install failure analytics instead of raw error strings. - Stop sending raw external-agent import errors in analytics while preserving raw failure details in app-facing import notifications/history. - Keep raw plugin/migration diagnostics in `tracing::warn!` logs. - Keep remote failure plugin names as the existing local placeholder (`unknown`) and remove the extra telemetry plugin-name override. - Change `ExternalAgentConfigImportParams.source` from a generated enum to `string | null`, with legacy `claudeCode` / `claudeCowork` inputs normalized to existing analytics values. ## Testing
charlesgong-openai ·
2026-06-17 13:16:34 -07:00 -
[codex] Restore thread recency with compatible migration history (#28671)
## Summary - Revert #28655, restoring the thread `recencyAt` behavior introduced by #27910. - Move `threads_recency_at` to migration 0039 so it no longer collides with `external_agent_config_imports` at version 0038. - Repair databases that already applied the recency migration as version 38 by moving the matching migration-history row to version 39 before SQLx validation. The current version-38 migration can then apply normally. ## Validation - `just test -p codex-state migrations::tests::repairs_recency_migration_that_was_applied_as_version_38` - `just test -p codex-state -p codex-rollout -p codex-thread-store -p codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests could not open the machine's existing read-only incident database at `~/.codex/sqlite/state_5.sqlite`. - `just fix -p codex-state` - `just fmt` - Verified that state migration versions are unique.
Jeremy Rose ·
2026-06-17 18:52:18 +00:00 -
Scope command approvals by execution environment (#28738)
## Why Command approval cache keys included the command and working directory, but not the execution environment. An approval for `/workspace` locally could therefore be reused for the same command and path on an executor. ## What changed - Include the selected environment ID in shell and unified-exec approval cache keys. - Carry that ID through the normal command approval request so clients can show which environment is being approved. - Expose the environment through app-server as a required nullable `environmentId` and show it in the inline TUI approval prompt. - Keep older recorded approval events compatible when the environment is absent. For example, `echo ok` in local `/workspace` and `echo ok` in executor `/workspace` now produce different approval keys and separate prompts. ## Scope This PR does not change network approvals, Guardian review actions, MCP elicitation, full-screen TUI rendering, or environment-ID validation. Remote `shell_command` execution itself remains in #28722; this PR only makes its approval key environment-aware.
jif ·
2026-06-17 19:52:43 +02:00 -
Revert thread recencyAt for sidebar ordering (#28655)
## Why Revert #27910 to remove the newly introduced thread `recencyAt` persistence and API behavior from `main`. ## What changed This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`, including the state migration, thread-store propagation, app-server API surface, generated schemas, and related tests. ## Validation Not run before opening; relying on CI for the initial fast signal.
pakrym-oai ·
2026-06-16 21:39:30 -07:00 -
Add thread recencyAt for sidebar ordering (#27910)
## Summary Add a server-owned `recencyAt` timestamp and `recency_at` thread-list sort key for product recency ordering while preserving the existing meaning of `updatedAt` as the latest persisted thread mutation. This is the server-side alternative to #27697. Rather than narrowing `updatedAt`, clients can sort the sidebar by `recency_at` and continue treating `updatedAt` as mutation time. Paired Codex Apps PR: [openai/openai#1024599](https://github.com/openai/openai/pull/1024599) ## Contract - `recencyAt` initializes when a thread is created. - A turn start advances `recencyAt` monotonically. - Commentary, agent output, tool results, token/accounting updates, turn completion, archive, unarchive, resume, and generic metadata writes do not advance it. - `updatedAt` retains its existing behavior and continues to advance for persisted thread mutations. - Current servers populate `recencyAt`; the response field is optional in generated TypeScript so clients connected to older servers can fall back to `updatedAt`. - Filesystem-only fallback uses existing updated/mtime ordering when SQLite is unavailable. ## Persistence and compatibility Migration 0038 adds second- and millisecond-precision recency columns, backfills them from the existing updated timestamp, creates list indexes, and includes an insert trigger so older binaries writing to a migrated database seed recency without causing later mutations to advance it. Generic metadata upserts preserve existing recency values. Turn-start updates use a dedicated monotonic touch, and process-local allocation keeps millisecond cursor values unique. State DB list, search, read, filtered-list repair, rollout fallback propagation, and app-server conversions all carry the new field. ## API `Thread` responses include: ```ts recencyAt?: number ``` `thread/list` and `thread/search` accept: ```json { "sortKey": "recency_at" } ``` Generated TypeScript and JSON schemas are included. ## Validation - `just test -p codex-state` — 146 passed - `just test -p codex-rollout` — 69 passed - `just test -p codex-thread-store` — 81 passed - `just test -p codex-app-server-protocol` — 231 passed - Focused app-server list ordering, response mapping, archive/unarchive, and resume lifecycle tests passed - Scoped `just fix` for state, rollout, thread-store, app-server-protocol, and app-server - `just fmt` - `git diff --check` - Independent correctness, simplicity, elegance, security, and test-quality reviews; actionable ordering, lifecycle, query-projection, and timestamp-uniqueness findings were addressed
Jeremy Rose ·
2026-06-16 17:06:22 -07:00 -
[codex] Add interruptible sleep tool (#28429)
## Why Models sometimes need to pause briefly while waiting for external work, but using a shell command for that delay ties the wait to a process and does not naturally resume when new turn input arrives. ## What changed - add a built-in `sleep` tool behind the under-development `sleep_tool` feature - accept a bounded `duration_ms` argument, matching the millisecond convention used by unified exec - end the sleep early when either steered user input or mailbox input arrives - include elapsed wall-clock time in completed and interrupted outputs - emit a dedicated core `SleepItem` through `item/started` and `item/completed` - expose the sleep item as app-server v2 `ThreadItem::Sleep` and retain it in reconstructed thread history - regenerate the configuration schema for the new feature flag - regenerate app-server JSON and TypeScript schema fixtures ## Test plan - `just test -p codex-core sleep_tool_follows_feature_gate` - `just test -p codex-core any_new_input_interrupts_sleep` - `just test -p codex-app-server-protocol` - `just test -p codex-app-server sleep_emits_started_and_completed_items`
pakrym-oai ·
2026-06-15 21:39:21 -07:00 -
[codex-analytics] Analytics Capture to File in Debug Builds (#27093)
## This PR The original [combined remote plugin analytics PR #26281](https://github.com/openai/codex/pull/26281) mixed reusable analytics test infrastructure, two manual smoke workflows, a metadata refactor, and the final identity behavior. This PR isolates the generic capture mechanism so it can be reviewed and landed before any plugin-specific behavior. - Add a debug-only analytics destination that writes final request payloads as JSONL. - Suppress HTTP delivery whenever capture mode is selected, including after capture write failures. - Keep release behavior unchanged even when the capture environment variable is present. - Keep the mechanism generic; this PR contains no plugin-specific behavior. Set `CODEX_ANALYTICS_EVENTS_CAPTURE_FILE=/path/events.jsonl` when running a debug Codex binary to inspect the exact batched payload that would otherwise be sent to the analytics endpoint. ## Testing - `just test -p codex-analytics` (76 passed) - `just test --release -p codex-analytics` (73 passed) - CI is green across the required platform matrix. ## Split Overview ```text main ├── #27093 Debug analytics capture ← you are here │ └── #27099 Non-mutating plugin smoke │ └── #27100 Remote install/uninstall smoke └── #27102 Plugin telemetry metadata refactor After #27093, #27099, #27100, and #27102 merge: └── Final PR: add remote_plugin_id to plugin analytics ``` Review order and dependencies: 1. [#27093 Add debug-only analytics event capture](https://github.com/openai/codex/pull/27093) **(this PR, based on `main`)** 2. [#27099 Add a plugin analytics smoke workflow](https://github.com/openai/codex/pull/27099) (stacked on #27093) 3. [#27100 Add a remote plugin analytics mutation smoke workflow](https://github.com/openai/codex/pull/27100) (stacked on #27099) 4. [#27102 Centralize plugin telemetry metadata construction](https://github.com/openai/codex/pull/27102) (independent, based on `main`) 5. Final remote-ID behavior PR (created after PRs 1-4 merge) The original [#26281](https://github.com/openai/codex/pull/26281) remains open as the green aggregate reference until the final PR is published.
jameswt-oai ·
2026-06-15 16:32:38 -07:00 -
Add Guardian catalog diagnostics metadata (#27109)
## Why We need request-level evidence for Guardian cases where `codex-auto-review` is missing from the client-side model catalog and the review falls back to the parent model. ## What changed - Add `guardian_catalog_contains_auto_review` to Guardian Responses API client metadata. - Add `guardian_model_provider_id` to Guardian Responses API client metadata. - Keep review-session metadata optional so callers without metadata preserve the existing `None` path. - Add tests for override, normal preferred-model, and missing-auto-review-catalog behavior. ## Validation - `just test -p codex-core guardian_review_records_missing_auto_review_model_in_request_metadata` - `just test -p codex-core guardian_review_uses_model_catalog_override_when_preferred_review_model_exists` - `just test -p codex-core guardian_review_uses_preferred_review_model_without_model_catalog_override` - `git diff --check origin/main`
Won Park ·
2026-06-12 15:50:30 -07:00 -
Emit plugin ID on MCP tool call analytics events (#27483)
MCP tool-call items already carry the runtime-resolved plugin owner, but the analytics reducer dropped that field. Forwarding the existing value provides direct attribution without downstream server-name inference. ## Summary - emit `plugin_id` on `codex_mcp_tool_call_event` payloads - preserve `null` for MCP calls without a plugin owner - verify the serialized field through the MCP item lifecycle test ## Test - `cd codex-rs && just test -p codex-analytics` - `cd codex-rs && just fix -p codex-analytics` - `cd codex-rs && just fmt`
Chris Dong ·
2026-06-11 09:55:53 -07:00 -
[codex-analytics] Emit structured compaction codex errors (#27082)
## Summary - replace raw compaction `error` analytics with `codex_error_kind` and `codex_error_http_status_code` - derive compaction error telemetry from `CodexErr` using the same `CodexErrKind` mapping and HTTP status helper used by turn events - remove the pre-compact hook stop reason from the internal compaction outcome now that it is no longer emitted as raw analytics text ## Why Compaction `error` was a raw `CodexErr::to_string()` value, which can carry free-form provider or user-derived text. Structured Codex error fields preserve useful low-cardinality telemetry without sending the raw string. ## Validation - `just fmt` - `just test -p codex-analytics` - `just test -p codex-core compact::tests::build_token_limited_compacted_history_appends_summary_message` Attempted `just test -p codex-core`; the changed crate compiled, but the full target failed in unrelated environment-dependent tests such as missing helper binaries and shell snapshot timeouts.
rhan-oai ·
2026-06-11 06:07:06 +00:00 -
[codex-analytics] report cached input tokens for v2 compaction (#27103)
## Summary - add nullable `cached_input_tokens` to the compaction analytics event - populate it from response usage for compaction v2 - leave it `null` for other compaction implementations This adds visibility into prompt-cache usage for v2 compaction without changing compaction behavior. ## Testing - `just test -p codex-analytics` - `just test -p codex-core collect_compaction_output_accepts_additional_output_items`
rhan-oai ·
2026-06-10 22:47:22 -07:00 -
[codex] Compact when comp_hash changes (#27520)
## Summary - snapshot `comp_hash` into `TurnContext` when the turn is created and use that snapshot as the downstream source of truth - persist the turn hash in rollout context and recover it into previous-turn settings during resume and fork replay - compact existing history with the previous model only when both adjacent turns provide hashes and the values differ - record `comp_hash_changed` as the compaction reason - cover ordinary transitions, resume, and missing-hash compatibility with end-to-end tests ## Why History produced under one compaction-compatible model configuration may not be safe to carry directly into another. Compacting at the turn boundary converts that history before context updates and the new user message are added. Persisting the turn snapshot in `TurnContextItem` makes the same protection work after resuming a rollout. A missing hash is not treated as evidence of incompatibility. `None → Some`, `Some → None`, and `None → None` do not trigger compaction; only `Some(previous) → Some(current)` with unequal values does. ## Stack - depends on #27532 - #27532 is based directly on `main` ## Testing - `just test -p codex-core pre_sampling_compact_` — 6 passed - `just test -p codex-core turn_context_item_uses_turn_context_comp_hash_snapshot` — passed - `just fix -p codex-core -p codex-protocol -p codex-analytics -p codex-models-manager`
Ahmed Ibrahim ·
2026-06-11 04:11:26 +00:00 -
[codex-analytics] emit internally started turn events (#27392)
## Why Currently, the analytics reducer omits `codex_turn_event` for internally started subagent turns - It uses `TurnState.connection_id` to select app-server client and runtime metadata - `turn/start` sets this field for client-started turns, while internal subagent turns bypass that path - Spawned child threads inherit the correct connection, but turn emission does not use thread state ## What Changed - Keeps explicit `TurnState.connection_id` authoritative for client-started turns - Falls back to the matching thread’s inherited connection when the turn connection is absent - Preserves completeness gates, event schema, and post-emission state removal - Extends subagent lifecycle test coverage ## Verification - `just test -p codex-analytics` (71 tests passed) - `just fix -p codex-analytics` - `just fmt`
marksteinbrick-oai ·
2026-06-10 15:35:41 -07:00 -
[codex] Retry transient Guardian review failures (#27062)
## Background Codex can use **Auto Review** for permission requests. Instead of asking the user immediately, Codex starts a separate locked-down reviewer session called **Guardian**, which returns a structured `allow` or `deny` assessment. The Guardian reviewer is itself a Codex session, so its model request can fail for transient infrastructure reasons such as model overload, HTTP connection failure, or response-stream disconnect. Today, any such failure immediately ends the Auto Review attempt and blocks the action. This PR adds bounded retries for failures that the existing protocol explicitly identifies as transient. Linear context: [CA-539](https://linear.app/openai/issue/CA-539/retry-auto-review-infrastructure-failures-and-fall-back-to-manual) ## What changes A Guardian review can now make at most **three total attempts**: 1. Run the review normally. 2. Retry after a jittered delay of roughly 180–220 ms if the first attempt fails with an eligible error. 3. Retry after a jittered delay of roughly 360–440 ms if the second attempt also fails with an eligible error. All attempts share the original review deadline. Jitter spreads retries from concurrent clients to reduce synchronized load during broader outages. The retries do not reset the user's maximum wait time, and the backoff waits terminate early if the review is cancelled or the deadline expires. Before retrying, the existing Guardian session lifecycle decides whether the session remains usable. Healthy trunks are reused, broken trunks are removed by the existing cleanup path, and ephemeral sessions continue to clean themselves up. The review still emits one logical lifecycle to clients. Recoverable intermediate failures do not produce warnings or terminal events. ## Retry policy ### Retried up to twice - model/server overload - HTTP connection failure - response-stream connection failure - response-stream disconnect - internal server error - a final reviewer message that cannot be parsed as the required Guardian assessment ### Not retried - bad or invalid requests - authentication failures - usage limits - cyber-policy failures - errors without a structured category - a request that already exhausted the lower-level Responses retry budget - a completed Guardian turn with no assessment payload - prompt-construction failures - Guardian review timeout - cancellation or abort - a valid `deny` assessment The session-error classification uses `ErrorEvent.codex_error_info`; it does not inspect error-message strings. ## Implementation notes - `wait_for_guardian_review` preserves the complete `ErrorEvent`, including structured `codex_error_info`. - Guardian session failures preserve the original message and optional structured `CodexErrorInfo`. - The retry policy classifies the explicitly transient `CodexErrorInfo` variants; unknown, absent, and deterministic categories are not retried. - The Guardian session manager receives the caller's deadline rather than creating a new timeout per attempt. - Analytics record the final `attempt_count`. - Retry orchestration does not add a separate session-cleanup protocol; it relies on the existing trunk and ephemeral lifecycle decisions. ## Automated testing Focused Guardian coverage verifies: - every supported transient `CodexErrorInfo` is classified as retryable, while absent and non-transient categories are not; - structured transient session failure -> retry -> approval with the healthy trunk reused; - two invalid Guardian responses -> third attempt -> approval, with exactly three requests; - three invalid responses -> existing fail-closed result, with exactly three requests and one terminal lifecycle; - valid denial, missing payload, invalid request, timeout, cancellation, and prompt/session construction failures are not retried; - retry eligibility ends after the third attempt; - retry delays use the shared exponential backoff helper and remain within the expected jitter bounds; - cancellation and deadline expiry interrupt the backoff wait; - healthy trunks are reused across retryable failures; - broken event streams remove the trunk through the existing lifecycle cleanup; - an ephemeral retry does not disturb a concurrent trunk review. Validation performed: - `just test -p codex-core guardian_review_ guardian_ephemeral_retry_preserves_parallel_trunk_and_fork_history run_review_removes_trunk_when_event_stream_is_broken` — **42 passed**; - `just test -p codex-analytics` — **71 passed**; - scoped Clippy fixes for `codex-core` and `codex-analytics` passed. A prior full `codex-core` run had unrelated environment-sensitive failures outside Guardian coverage. ## Manual QA The focused integration tests use the local mock Responses server to inspect exact request counts and emitted lifecycle events. They confirm that retries are internal, a successful later attempt supplies the final decision, non-retryable failures issue only one request, and exhausted retries emit only one terminal result.
kbazzi ·
2026-06-10 11:46:57 -07:00 -
[codex] Fix post-merge analytics integration failures (#27285)
## Why Recent merges left `main` with analytics integration build failures. Local Cargo runs also made the trimmed-skills test depend on developer-installed skills, while Bazel used an isolated home. ## What changed - Clone `thread_metadata.thread_source` when constructing goal analytics event parameters. - Group app-server thread extension inputs into `ThreadExtensionDependencies`. - Isolate the trimmed-skills test home so its exact fixture count is stable across Cargo and Bazel. ## Validation - `cargo check -p codex-analytics` - `just test -p codex-analytics` (71 tests) - `just test -p codex-app-server` (837 tests; one unrelated zsh-fork timeout passed on retry)
Adam Perry @ OpenAI ·
2026-06-09 20:52:09 -07:00 -
[codex-analytics] emit goal lifecycle analytics (#27078)
## Why - Currently, there is no analytics event for `/goal` behavior - Existing events cannot identify goal execution or its resulting outcome - The original update in [#26182](https://github.com/openai/codex/pull/26182) was implemented before `/goal` moved into `codex-goal-extension`. ## What Changed - Adds `codex_goal_event` serialization and enrichment to `codex-analytics` - Emits goal events from the canonical `codex-goal-extension` mutation and accounting paths: - `created` when a new logical goal is persisted - `usage_accounted` when cumulative goal usage is persisted - `status_changed` when the stored goal status changes - `cleared` when the goal is deleted - Preserves causal `turn_id` for turn driven events and uses null attribution for external or idle lifecycle events - Changes goal deletion to return the deleted row so `cleared` retains the stable goal ID ## Event Details Includes standard analytics metadata along with goal specific fields: - `goal_id`: Stable ID stored in the local SQLite goal row and shared across the goal's events - `event_kind`: Observed operation (see the 4 lifecycle events cited in the above bullet) - `goal_status`: Resulting or last stored status: `active`, `paused`, `blocked`, `usage_limited`, etc. - `has_token_budget`: Indicates whether a token budget is configured - `turn_id`: Causal turn ID, or null when no causal turn exists - `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted` events; null otherwise - `cumulative_time_accounted_seconds`: Cumulative active time on `usage_accounted` events; null otherwise ## Validation - `just test -p codex-analytics -p codex-state -p codex-goal-extension` - `just test -p codex-core -E 'test(/goal/)'` - `just test -p codex-app-server` - `cargo build -p codex-analytics -p codex-core -p codex-state -p codex-app-server`
marksteinbrick-oai ·
2026-06-09 18:45:54 -07:00 -
[codex-analytics] add extensible feature thread sources (#27063)
## Why - `ThreadSource` currently defines a closed set of core-owned values - Product features also create threads for background or scheduled work - Adding every product-specific value to the core enum would require repeated `codex-rs` protocol changes - Feature-backed values let product callers provide precise attribution while preserving the existing core classifications ## What Changed - Adds `ThreadSource::Feature(String)` for app-owned thread source values - Represents all app-server v2 thread sources as scalar strings, so a feature source is supplied as `"automation"` - Persists and emits the feature's plain string label, so `"automation"` produces `thread_source="automation"` in analytics - Keeps `user`, `subagent`, and `memory_consolidation` as explicit core-owned values and regenerates the app-server schemas and TypeScript bindings ## Verification - `just write-app-server-schema` - `cargo check --workspace` - `just test -p codex-protocol feature_thread_source_serializes_as_its_app_owned_label` - `just test -p codex-app-server-protocol thread_sources_round_trip_as_scalar_labels` - `cargo test -p codex-analytics thread_initialized_event_serializes_expected_shape` - `just fmt`
marksteinbrick-oai ·
2026-06-09 12:27:10 -07:00 -
multi-agent: add path-based v2 activity tracking (#27007)
## Why Multi-agent v2 identifies agents by canonical paths, but its tool handlers still emitted the larger legacy collaboration begin/end events built around nickname and role metadata. App-server, rollout-trace, analytics, and TUI consumers therefore lacked one compact path-based completion signal that behaved consistently across live events and replay. The TUI also needs a bounded `/agent` status surface for v2 agents. It should use recent local activity for previews, refresh liveness without loading full histories, and keep the legacy picker available when no path-backed v2 agent is known. ## What changed - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and `interrupt_agent` legacy lifecycle emissions with a success-only `SubAgentActivity` event. The event records the tool call ID, occurrence time, affected thread, canonical agent path, and `started`, `interacted`, or `interrupted` kind. - Expose the activity as a completion-only app-server v2 `subAgentActivity` thread item in live notifications and reconstructed history, regenerate the protocol schemas, and count it in sub-agent tool analytics. - Track canonical paths from live activity and loaded-thread metadata in the TUI, and render the activity in live and replayed transcripts. - Make `/agent` list running path-backed agents with summaries from bounded local event buffers. Each summary is capped at 240 graphemes, the scan is capped at six recent items, only the last three wrapped lines are shown, and command output is omitted. Liveness falls back to metadata-only `thread/read` when local turn state is unavailable. - Persist the activity as a terminal rollout-trace runtime payload and reduce it to the corresponding spawn, send, follow-up, or close interaction edge. `interrupt_agent` is classified as a close-edge operation. - Preserve the legacy picker when no path-backed v2 agent is known. ## Compatibility App-server v2 clients that consumed `collabAgentToolCall` begin/end pairs for these tools must handle the new completion-only `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged. ## Screenshot <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47" src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d" /> ## Testing - `just test -p codex-app-server-protocol` - `just test -p codex-rollout-trace` - Added focused coverage for activity analytics, terminal trace serialization, spawn-edge reduction, `interrupt_agent` classification, TUI status rendering without aggregated command output, and clearing stale running state after a completed turn.
jif ·
2026-06-09 12:14:48 +02:00 -
[codex-analytics] stop sending codex error subreason (#27060)
## Summary - stop emitting `codex_error_subreason` on `codex_turn_event` - remove the transient analytics fact plumbing that copied `CodexErr::InvalidRequest(String)` into the event - update analytics serialization coverage accordingly ## Why `codex_error_subreason` is a free-form copy of `InvalidRequest(String)`, including raw provider 400 bodies in some paths. That makes it unsafe as an analytics field because it can carry user-derived or sensitive text. ## Validation - `just fmt` - `just test -p codex-analytics`
rhan-oai ·
2026-06-08 21:29:06 +00:00 -
[codex-analytics] report compaction analytics details (#26680)
## Why Compaction analytics adds retained image count and compaction summary output tokens for v1.5 specifically. ## What changed - Add nullable `retained_image_count` and `compaction_summary_tokens` fields to `codex_compaction_event`. - Populate them only for `responses_compaction_v2`: retained images come from the retained v2 compacted history, and summary tokens come from `response.completed.token_usage.output_tokens`. - Leave local and legacy remote compaction events as `null` for these detail fields. ## Verification - `just fmt` - `just fix -p codex-core` - `just test -p codex-core build_v2_compacted_history_counts_retained_input_images` - `git diff --check`
rhan-oai ·
2026-06-08 10:52:31 -07:00 -
[codex] Add turn profiling analytics (#26484)
## Summary Add flat profiling fields to `codex_turn_event` so analytics can explain where turn wall-clock time is spent without changing tool execution behavior. The profile reports: - time before the first sampling request - sampling time across all attempts and follow-ups - overhead between sampling requests - time blocked in the post-sampling tool drain - time after the final sampling request - sampling request and retry counts ## Implementation - Extend the existing turn timing state with constant-memory phase accounting and one RAII phase guard. - Observe sampling and the existing post-sampling drain only at turn orchestration boundaries. - Keep tool runtime, tool futures, response item handling, and turn lifecycle values unchanged. - Add the profiling fields directly to the existing analytics turn event without changing app-server protocol or rollout persistence. - Use the existing turn `status` to distinguish completed, failed, and interrupted profiles. Exact sampling/tool overlap is intentionally omitted because measuring tool completion accurately would require hooks in the tool execution path. ## Validation - Add app-server end-to-end coverage for a single-sampling turn with no blocking tool work. - Add app-server end-to-end coverage for `request_user_input` blocking followed by a second sampling request. - CI is running on the PR; tests were not executed locally per repository guidance.
Ahmed Ibrahim ·
2026-06-05 11:27:10 -07:00 -
[codex-analytics] emit forked thread id on initialization (#26248)
## Why - Thread initialization analytics do not identify the source thread for forked threads. - The session viewer needs this lineage to construct thread trees. - Depends on openai/openai#987854. Do not release this change before that backend schema change is deployed. ## What Changed - Adds optional `forked_from_thread_id` to `codex_thread_initialized`. - Populates it from the existing thread fork lineage for app-server and in-process subagent initialization paths. - Keeps it null for non-forked threads. ## Verification - `just fmt` - `just test -p codex-analytics` - `just test -p codex-app-server thread_fork_tracks_thread_initialized_analytics`
kbazzi ·
2026-06-04 11:24:12 -07:00 -
log plugin MCP server names (#26002)
## Summary - emit the plugin capability summary's exact MCP server names in `codex_plugin_used` ## Test - `just test -p codex-analytics` - `just test -p codex-core explicit_plugin_mentions_track_plugin_used_analytics` - `just fix -p codex-analytics`
Chris Dong ·
2026-06-03 16:06:52 -07:00 -
Populate workspace kind on Codex turn events (#25135)
## Summary - carry `workspace_kind` from Responses API client metadata into the turn resolved analytics fact - serialize the optional value on `codex_turn_event` - cover both the turn metadata source and turn event serialization The `workspace_kind` tells us whether a thread had a project attached vs projectless. this is an indicator for who is adopting Codex for knowledge work outside of coding ## Testing - `env UV_CACHE_DIR=/private/tmp/uv-cache /private/tmp/cargo-tools/bin/just fmt` - `env PATH=/private/tmp/cargo-tools/bin:$PATH CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache /private/tmp/cargo-tools/bin/just test -p codex-analytics` - `env PATH=/private/tmp/cargo-tools/bin:$PATH CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache /private/tmp/cargo-tools/bin/just test -p codex-core turn_metadata` Paired with openai/openai#970661, which keeps forwarding the same metadata key through Responses API headers.
knittel-openai ·
2026-06-02 12:46:14 -07:00 -
Propagate permission approval environment id (#25862)
## Stack 1. #25850 - Key request-permission grants by environment: stores and applies sticky permission grants per environment id. 2. #25858 - Add `environmentId` to `request_permissions`: lets the model target a selected environment and resolves relative permission paths against it. 3. This PR (#25862) - Propagate permission approval environment id: carries the selected environment id through approval events, app-server requests, TUI prompts, and delegate forwarding. 4. #25867 - Add remote request permissions integration coverage: verifies the selected remote environment across request, approval, grant reuse, and exec. This PR is stacked on #25858, and #25867 is stacked on this PR. ## Why PR2 lets the model bind a `request_permissions` call to a selected environment, but the approval event and client-facing request still needed to carry that binding. For CCA, the user-facing prompt and delegated approval path should know which environment the grant applies to instead of relying on cwd alone. ## What Changed - Added optional `environmentId` to `RequestPermissionsEvent`. - Emit the selected environment id from core permission approval events. - Preserve the environment id through delegate forwarding, including cwd-based delegated requests. - Added `environmentId` to app-server permission approval params, generated schema/TypeScript artifacts, and README examples. - Preserve and display the environment id in TUI permission approval prompts. - Updated focused core, app-server protocol, and TUI conversion coverage. ## Testing Not run locally per instruction. Performed read-only `git diff --check`.
jif ·
2026-06-02 21:09:34 +02:00 -
[codex-analytics] Track CodexErr details in turn analytics (#25707)
## Summary - add analytics-only `CodexErr` telemetry to `codex_turn_event` while leaving existing `turn_error` unchanged - record terminal `CodexErr` facts from core immediately before the existing turn error event is sent - emit source-truth `codex_error_*` fields for downstream analytics, including the raw `CodexErr::InvalidRequest(String)` message as `codex_error_subreason` ## Validation - `just test -p codex-analytics` - attempted `just test -p codex-core`, but the local run timed out across unrelated integration suites in this environment and is not being used as validation
rhan-oai ·
2026-06-02 11:40:35 -07:00 -
store and expose parent_thread_id on Threads (#25113)
## Why This PR https://github.com/openai/codex/pull/24161#discussion_r3325692763 revealed a subagent data modeling issue, where we overloaded `forked_from_id` to also mean `parent_thread_id`. That's incorrect since guardian and review subagents can be a subagent and NOT fork the main thread's history. The solution here is to explicitly store a new `parent_thread_id` on `SessionMeta`, alongside `forked_from_id` which already exists. While we're at it, also expose it in the app-server protocol on the `Thread` object. A thread->subagent relationship and a fork of thread history are orthogonal concepts. ## What Changed - Added top-level `parent_thread_id` persistence on `SessionMeta` and runtime/session plumbing through `SessionConfiguredEvent`, `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`, `TurnContext`, and `ModelClient`. - Made turn metadata, request headers, analytics, and subagent-start events read the separate runtime/top-level parent field instead of deriving general parent lineage from `SessionSource` or `forked_from_thread_id`. - Passed parent lineage separately at delegated subagent, review, guardian, agent-job, and multi-agent spawn construction sites; copied-history fork lineage remains derived only from `InitialHistory`. - Persisted and exposed parent lineage through rollout/thread-store projections and app-server v2 `Thread.parentThreadId`. - Updated app-server README text and regenerated app-server schema fixtures for the additive `parentThreadId` response field.
Owen Lin ·
2026-06-01 04:33:20 +00:00 -
Add cloud-managed config layer support (#24620)
## Summary PR 3 of 5 in the cloud-managed config client stack. Adds enterprise-managed cloud config as a first-class config layer source. The layer metadata is preserved through config loading, diagnostics, debug output, hook attribution, and app-server protocol surfaces. ## Details - Enterprise-managed config becomes a normal config layer source with backend-supplied `id` and display `name` attached for provenance. - These layers are designed to behave like non-file managed config: they can surface syntax/type diagnostics by layer name even though there is no physical config file. - Relative path settings are resolved from a stored config base so cloud-delivered config remains consistent with existing MDM-delivered config semantics. - Hook attribution distinguishes config-delivered hooks from requirements-delivered hooks via `HookSource::CloudManagedConfig`. - This remains pull-based and snapshot-oriented; the PR adds layer identity/diagnostics, not dynamic reload behavior. ## Validation Validated through the targeted stack checks after rebasing onto current `main`: - Rust crate tests for config/hooks/cloud-config/backend-client/app-server-protocol - Filtered `codex-core` and `codex-app-server` `cloud_config_bundle` tests - Python generated-file contract test - `cargo shear --deny-warnings` - Targeted `argument-comment-lint` for config/hooks
joeflorencio-openai ·
2026-05-31 15:54:31 -07:00 -
Add subagent lineage metadata for responsesapi (#24161)
## Why We recently added `forked_from_thread_id` which lets us trace where a thread's _context_ comes from, but we also want to understand subagent lineage (e.g. which parent thread spawned this subagent? what kind of subagent is it?) which is orthogonal. This PR adds `parent_thread_id` and `subagent_kind` to the `x-codex-turn-metadata` header sent to ResponsesAPI. ## What changed - Adds `parent_thread_id` and `subagent_kind` to core-owned `x-codex-turn-metadata`. - Restores persisted `SessionSource` and `ThreadSource` from resumed session metadata so cold-resumed subagent threads keep their lineage on later Responses API requests. - Centralizes parent-thread extraction on `SessionSource` / `SubAgentSource` and reuses it in the Responses client, analytics, agent control, and state parsing paths. - Extends reserved-key, git-enrichment, thread-spawn, and app-server v2 metadata coverage for the new lineage fields. ## Verification - Not run locally per request. - Added focused coverage in `core/src/turn_metadata_tests.rs` and `app-server/tests/suite/v2/client_metadata.rs`.
Owen Lin ·
2026-05-29 11:28:12 -07:00 -
[codex] Add user input client ids (#24653)
## Summary Adds an optional `clientId` field to app-server v2 `UserInput` and carries it through the core `UserInput` model so clients can correlate echoed user input items without relying on payload equality. ## Details - Adds `client_id: Option<String>` to core `UserInput` variants. - Exposes the v2 app-server field as `clientId` on the wire and in generated TypeScript. - Preserves the id when converting between app-server v2 and core protocol types. - Regenerates app-server schema fixtures. ## Validation - `just fmt` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-protocol` - `git diff --check`
Alexi Christakis ·
2026-05-28 14:54:39 -07:00 -
feat(app-server): include turns page on thread resume (#23534)
## Summary The client currently calls `thread/resume` to establish live updates and immediately follows it with `thread/turns/list` to hydrate recent turns. This lets `thread/resume` return that page directly, eliminating a round trip and the ordering/deduplication gap between the two calls. Experimental clients opt in with `initialTurnsPage: { limit, sortDirection, itemsView }`. The response returns `initialTurnsPage` as a `TurnsPage`, including cursors for paging further back in history. Keeping the controls in a nested opt-in object provides the useful `thread/turns/list` knobs without spreading page-specific parameters across `thread/resume`. ## Verification - `just fmt` - `just write-app-server-schema --experimental` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-app-server thread_resume_initial_turns_page_matches_requested_turns_list_page --tests` - `cargo test -p codex-app-server thread_resume_rejoins_running_thread_even_with_override_mismatch --tests` - `just fix -p codex-app-server-protocol -p codex-app-server`Brent Traut ·
2026-05-28 09:18:13 -07:00 -
[codex-analytics] add grouped session id to runtime events (#24655)
## Why - Runtime analytics events report `thread_id`, which identifies the individual thread emitting an event - They don't report `session_id`, which identifies the shared session for a root thread and its subagent threads - Emitting both identifiers allows analytics to group related activity ## What Changed - Adds `session_id` to relevant analytics events (thread_initalized, turn, turn_steer, compaction, guardian_review) - Tracks each thread's session ID in the analytics reducer so subsequent thread scoped events emit the same value - Carries the shared session ID through subagent initialization ## Verification - `just test -p codex-analytics` validates event payloads and subagent session grouping. - Focused `codex-app-server` tests validate session IDs for thread, turn, and steer events. - Focused `codex-core` tests validate root and subagent session ID propagation.
marksteinbrick-oai ·
2026-05-26 16:38:46 -07:00 -
Add experimental turn additional context (#24154)
## Summary Adds experimental `additionalContext` support to `turn/start` and `turn/steer` so clients can provide ephemeral external context, such as browser or automation state, without turning that plumbing into a visible user prompt or triggering user-prompt lifecycle behavior. ## API Shape The parameter shape is: ```ts additionalContext?: Record<string, { value: string kind: "untrusted" | "application" }> | null ``` Example: ```json { "additionalContext": { "browser_info": { "value": "Active tab is CI failures.", "kind": "untrusted" }, "automation_info": { "value": "CI rerun is in progress.", "kind": "application" } } } ``` The keys are opaque and caller-defined. ## Context Injection When provided, accepted entries are inserted into model context as hidden contextual message items, not as visible thread user-message items. `kind: "untrusted"` entries are inserted with role `user`: ```text <external_${key}>${value}</external_${key}> ``` `kind: "application"` entries are inserted with role `developer`: ```text <${key}>${value}</${key}> ``` Values are not escaped. Each value is truncated to 1k approximate tokens before wrapping. For `turn/start`, accepted additional context is inserted before normal user input. For `turn/steer`, additional context is merged only when the steer includes non-empty user input; context-only steers still reject as empty input. ## Dedupe Strategy `AdditionalContextStore` lives on session state and stores the latest complete additional-context map. Each `turn/start` or non-empty `turn/steer` treats its `additionalContext` as the current complete set of values. Entries are injected only when the key is new or the exact entry for that key changed, including `value` or `kind`. After merging, the store is replaced with the provided map, so omitted keys are removed from the retained set and can be injected again later if reintroduced. Omitting `additionalContext`, passing `null`, or passing an empty object resets the store to empty and injects nothing. ## What Changed - Threads experimental v2 `additionalContext` through app-server into core turn start and steer handling. - Adds separate contextual fragment types for untrusted user-role context and application developer-role context. - Uses pending response input items so additional context can be combined with normal user input without treating it as prompt text. - Adds integration coverage for start/steer flow, role routing, dedupe/reset behavior, deletion/re-add behavior, hook-blocked input behavior, empty context-only steer rejection, external-fragment marker matching, and truncation.pakrym-oai ·
2026-05-26 13:02:34 -07:00 -
[codex-analytics] split compaction v2 analytics implementation (#24146)
## What changed - Add a distinct `responses_compaction_v2` value for `CodexCompactionEvent.implementation`. - Emit that value from the remote compaction v2 path. - Keep local compaction as `responses` and legacy `/responses/compact` as `responses_compact`. ## Why Remote compaction v2 and local prompt-based compaction were both reported as `responses`, which made the analytics table collapse two different compaction mechanisms into one implementation bucket. ## Validation - `just fmt` - `just test -p codex-analytics` `just test -p codex-core` was started locally, but this PR is intentionally being pushed for CI to finish the remaining validation.
rhan-oai ·
2026-05-22 21:34:22 +00:00 -
[codex] Add plugin id to MCP tool call items (#23737)
Add owning plugin id to MCP tool call items so we can better filter them at plugin level. ## Summary - add optional `plugin_id` to MCP tool-call items and legacy begin/end events - propagate plugin metadata into emitted core items and app-server v2 `ThreadItem::McpToolCall` - preserve plugin ids through app-server replay/redaction paths and regenerate v2 schema fixtures ## Testing - `just write-app-server-schema` - `just fmt` - `just fix -p codex-core` - `cargo test -p codex-protocol -p codex-app-server-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib` - `cargo check -p codex-tui --tests` - `cargo check -p codex-app-server --tests` - `git diff --check` ## Notes - `just fix -p codex-core` completed with two non-fatal `too_many_arguments` warnings on the touched MCP notification helpers. - A broader `cargo test -p codex-core` run passed core unit tests, then hit shell/sandbox/snapshot failures in the integration target. - A broader app-server downstream run hit the existing `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack overflow; `cargo test -p codex-exec` also hit the existing sandbox expectation mismatch in `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
Matthew Zeng ·
2026-05-20 17:02:10 -07:00 -
Add SubagentStop hook (#22873)
# What <img width="1792" height="1024" alt="image" src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e" /> `SubagentStop` runs when a thread-spawned subagent turn is about to finish. Thread-spawned subagents use `SubagentStop` instead of the normal root-agent `Stop` hook. Configured handlers match on `agent_type`. Hook input includes the normal stop fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. - `agent_transcript_path`: the child subagent transcript path. - `transcript_path`: the parent thread transcript path. - `last_assistant_message`: the final assistant message from the child turn, when available. - `stop_hook_active`: `true` when the child is already continuing because an earlier stop-like hook blocked completion. `SubagentStop` shares the same completion-control semantics as `Stop`, scoped to the child turn: - No decision allows the child turn to finish. - `decision: "block"` with a non-empty `reason` records that reason as hook feedback and continues the child with that prompt. - `continue: false` stops the child turn. If `stopReason` is present, Codex surfaces it as the stop reason. # Lifecycle Scope Only thread-spawned subagents run `SubagentStop`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `Stop` hooks and do not run `SubagentStop`. This avoids exposing synthetic matcher labels for internal implementation paths. # Stack 1. #22782: add `SubagentStart`. 2. This PR: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.
Abhinav ·
2026-05-20 14:59:41 -07:00