mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
fdd72e9cd9a952e14bc123d2c8cd13d950c1928a
88 Commits
-
Add Guardian catalog diagnostics metadata (#27109)
## Why We need request-level evidence for Guardian cases where `codex-auto-review` is missing from the client-side model catalog and the review falls back to the parent model. ## What changed - Add `guardian_catalog_contains_auto_review` to Guardian Responses API client metadata. - Add `guardian_model_provider_id` to Guardian Responses API client metadata. - Keep review-session metadata optional so callers without metadata preserve the existing `None` path. - Add tests for override, normal preferred-model, and missing-auto-review-catalog behavior. ## Validation - `just test -p codex-core guardian_review_records_missing_auto_review_model_in_request_metadata` - `just test -p codex-core guardian_review_uses_model_catalog_override_when_preferred_review_model_exists` - `just test -p codex-core guardian_review_uses_preferred_review_model_without_model_catalog_override` - `git diff --check origin/main`
Won Park ·
2026-06-12 15:50:30 -07:00 -
Emit plugin ID on MCP tool call analytics events (#27483)
MCP tool-call items already carry the runtime-resolved plugin owner, but the analytics reducer dropped that field. Forwarding the existing value provides direct attribution without downstream server-name inference. ## Summary - emit `plugin_id` on `codex_mcp_tool_call_event` payloads - preserve `null` for MCP calls without a plugin owner - verify the serialized field through the MCP item lifecycle test ## Test - `cd codex-rs && just test -p codex-analytics` - `cd codex-rs && just fix -p codex-analytics` - `cd codex-rs && just fmt`
Chris Dong ·
2026-06-11 09:55:53 -07:00 -
[codex-analytics] Emit structured compaction codex errors (#27082)
## Summary - replace raw compaction `error` analytics with `codex_error_kind` and `codex_error_http_status_code` - derive compaction error telemetry from `CodexErr` using the same `CodexErrKind` mapping and HTTP status helper used by turn events - remove the pre-compact hook stop reason from the internal compaction outcome now that it is no longer emitted as raw analytics text ## Why Compaction `error` was a raw `CodexErr::to_string()` value, which can carry free-form provider or user-derived text. Structured Codex error fields preserve useful low-cardinality telemetry without sending the raw string. ## Validation - `just fmt` - `just test -p codex-analytics` - `just test -p codex-core compact::tests::build_token_limited_compacted_history_appends_summary_message` Attempted `just test -p codex-core`; the changed crate compiled, but the full target failed in unrelated environment-dependent tests such as missing helper binaries and shell snapshot timeouts.
rhan-oai ·
2026-06-11 06:07:06 +00:00 -
[codex-analytics] report cached input tokens for v2 compaction (#27103)
## Summary - add nullable `cached_input_tokens` to the compaction analytics event - populate it from response usage for compaction v2 - leave it `null` for other compaction implementations This adds visibility into prompt-cache usage for v2 compaction without changing compaction behavior. ## Testing - `just test -p codex-analytics` - `just test -p codex-core collect_compaction_output_accepts_additional_output_items`
rhan-oai ·
2026-06-10 22:47:22 -07:00 -
[codex] Compact when comp_hash changes (#27520)
## Summary - snapshot `comp_hash` into `TurnContext` when the turn is created and use that snapshot as the downstream source of truth - persist the turn hash in rollout context and recover it into previous-turn settings during resume and fork replay - compact existing history with the previous model only when both adjacent turns provide hashes and the values differ - record `comp_hash_changed` as the compaction reason - cover ordinary transitions, resume, and missing-hash compatibility with end-to-end tests ## Why History produced under one compaction-compatible model configuration may not be safe to carry directly into another. Compacting at the turn boundary converts that history before context updates and the new user message are added. Persisting the turn snapshot in `TurnContextItem` makes the same protection work after resuming a rollout. A missing hash is not treated as evidence of incompatibility. `None → Some`, `Some → None`, and `None → None` do not trigger compaction; only `Some(previous) → Some(current)` with unequal values does. ## Stack - depends on #27532 - #27532 is based directly on `main` ## Testing - `just test -p codex-core pre_sampling_compact_` — 6 passed - `just test -p codex-core turn_context_item_uses_turn_context_comp_hash_snapshot` — passed - `just fix -p codex-core -p codex-protocol -p codex-analytics -p codex-models-manager`
Ahmed Ibrahim ·
2026-06-11 04:11:26 +00:00 -
[codex-analytics] emit internally started turn events (#27392)
## Why Currently, the analytics reducer omits `codex_turn_event` for internally started subagent turns - It uses `TurnState.connection_id` to select app-server client and runtime metadata - `turn/start` sets this field for client-started turns, while internal subagent turns bypass that path - Spawned child threads inherit the correct connection, but turn emission does not use thread state ## What Changed - Keeps explicit `TurnState.connection_id` authoritative for client-started turns - Falls back to the matching thread’s inherited connection when the turn connection is absent - Preserves completeness gates, event schema, and post-emission state removal - Extends subagent lifecycle test coverage ## Verification - `just test -p codex-analytics` (71 tests passed) - `just fix -p codex-analytics` - `just fmt`
marksteinbrick-oai ·
2026-06-10 15:35:41 -07:00 -
[codex] Retry transient Guardian review failures (#27062)
## Background Codex can use **Auto Review** for permission requests. Instead of asking the user immediately, Codex starts a separate locked-down reviewer session called **Guardian**, which returns a structured `allow` or `deny` assessment. The Guardian reviewer is itself a Codex session, so its model request can fail for transient infrastructure reasons such as model overload, HTTP connection failure, or response-stream disconnect. Today, any such failure immediately ends the Auto Review attempt and blocks the action. This PR adds bounded retries for failures that the existing protocol explicitly identifies as transient. Linear context: [CA-539](https://linear.app/openai/issue/CA-539/retry-auto-review-infrastructure-failures-and-fall-back-to-manual) ## What changes A Guardian review can now make at most **three total attempts**: 1. Run the review normally. 2. Retry after a jittered delay of roughly 180–220 ms if the first attempt fails with an eligible error. 3. Retry after a jittered delay of roughly 360–440 ms if the second attempt also fails with an eligible error. All attempts share the original review deadline. Jitter spreads retries from concurrent clients to reduce synchronized load during broader outages. The retries do not reset the user's maximum wait time, and the backoff waits terminate early if the review is cancelled or the deadline expires. Before retrying, the existing Guardian session lifecycle decides whether the session remains usable. Healthy trunks are reused, broken trunks are removed by the existing cleanup path, and ephemeral sessions continue to clean themselves up. The review still emits one logical lifecycle to clients. Recoverable intermediate failures do not produce warnings or terminal events. ## Retry policy ### Retried up to twice - model/server overload - HTTP connection failure - response-stream connection failure - response-stream disconnect - internal server error - a final reviewer message that cannot be parsed as the required Guardian assessment ### Not retried - bad or invalid requests - authentication failures - usage limits - cyber-policy failures - errors without a structured category - a request that already exhausted the lower-level Responses retry budget - a completed Guardian turn with no assessment payload - prompt-construction failures - Guardian review timeout - cancellation or abort - a valid `deny` assessment The session-error classification uses `ErrorEvent.codex_error_info`; it does not inspect error-message strings. ## Implementation notes - `wait_for_guardian_review` preserves the complete `ErrorEvent`, including structured `codex_error_info`. - Guardian session failures preserve the original message and optional structured `CodexErrorInfo`. - The retry policy classifies the explicitly transient `CodexErrorInfo` variants; unknown, absent, and deterministic categories are not retried. - The Guardian session manager receives the caller's deadline rather than creating a new timeout per attempt. - Analytics record the final `attempt_count`. - Retry orchestration does not add a separate session-cleanup protocol; it relies on the existing trunk and ephemeral lifecycle decisions. ## Automated testing Focused Guardian coverage verifies: - every supported transient `CodexErrorInfo` is classified as retryable, while absent and non-transient categories are not; - structured transient session failure -> retry -> approval with the healthy trunk reused; - two invalid Guardian responses -> third attempt -> approval, with exactly three requests; - three invalid responses -> existing fail-closed result, with exactly three requests and one terminal lifecycle; - valid denial, missing payload, invalid request, timeout, cancellation, and prompt/session construction failures are not retried; - retry eligibility ends after the third attempt; - retry delays use the shared exponential backoff helper and remain within the expected jitter bounds; - cancellation and deadline expiry interrupt the backoff wait; - healthy trunks are reused across retryable failures; - broken event streams remove the trunk through the existing lifecycle cleanup; - an ephemeral retry does not disturb a concurrent trunk review. Validation performed: - `just test -p codex-core guardian_review_ guardian_ephemeral_retry_preserves_parallel_trunk_and_fork_history run_review_removes_trunk_when_event_stream_is_broken` — **42 passed**; - `just test -p codex-analytics` — **71 passed**; - scoped Clippy fixes for `codex-core` and `codex-analytics` passed. A prior full `codex-core` run had unrelated environment-sensitive failures outside Guardian coverage. ## Manual QA The focused integration tests use the local mock Responses server to inspect exact request counts and emitted lifecycle events. They confirm that retries are internal, a successful later attempt supplies the final decision, non-retryable failures issue only one request, and exhausted retries emit only one terminal result.
kbazzi ·
2026-06-10 11:46:57 -07:00 -
[codex] Fix post-merge analytics integration failures (#27285)
## Why Recent merges left `main` with analytics integration build failures. Local Cargo runs also made the trimmed-skills test depend on developer-installed skills, while Bazel used an isolated home. ## What changed - Clone `thread_metadata.thread_source` when constructing goal analytics event parameters. - Group app-server thread extension inputs into `ThreadExtensionDependencies`. - Isolate the trimmed-skills test home so its exact fixture count is stable across Cargo and Bazel. ## Validation - `cargo check -p codex-analytics` - `just test -p codex-analytics` (71 tests) - `just test -p codex-app-server` (837 tests; one unrelated zsh-fork timeout passed on retry)
Adam Perry @ OpenAI ·
2026-06-09 20:52:09 -07:00 -
[codex-analytics] emit goal lifecycle analytics (#27078)
## Why - Currently, there is no analytics event for `/goal` behavior - Existing events cannot identify goal execution or its resulting outcome - The original update in [#26182](https://github.com/openai/codex/pull/26182) was implemented before `/goal` moved into `codex-goal-extension`. ## What Changed - Adds `codex_goal_event` serialization and enrichment to `codex-analytics` - Emits goal events from the canonical `codex-goal-extension` mutation and accounting paths: - `created` when a new logical goal is persisted - `usage_accounted` when cumulative goal usage is persisted - `status_changed` when the stored goal status changes - `cleared` when the goal is deleted - Preserves causal `turn_id` for turn driven events and uses null attribution for external or idle lifecycle events - Changes goal deletion to return the deleted row so `cleared` retains the stable goal ID ## Event Details Includes standard analytics metadata along with goal specific fields: - `goal_id`: Stable ID stored in the local SQLite goal row and shared across the goal's events - `event_kind`: Observed operation (see the 4 lifecycle events cited in the above bullet) - `goal_status`: Resulting or last stored status: `active`, `paused`, `blocked`, `usage_limited`, etc. - `has_token_budget`: Indicates whether a token budget is configured - `turn_id`: Causal turn ID, or null when no causal turn exists - `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted` events; null otherwise - `cumulative_time_accounted_seconds`: Cumulative active time on `usage_accounted` events; null otherwise ## Validation - `just test -p codex-analytics -p codex-state -p codex-goal-extension` - `just test -p codex-core -E 'test(/goal/)'` - `just test -p codex-app-server` - `cargo build -p codex-analytics -p codex-core -p codex-state -p codex-app-server`
marksteinbrick-oai ·
2026-06-09 18:45:54 -07:00 -
[codex-analytics] add extensible feature thread sources (#27063)
## Why - `ThreadSource` currently defines a closed set of core-owned values - Product features also create threads for background or scheduled work - Adding every product-specific value to the core enum would require repeated `codex-rs` protocol changes - Feature-backed values let product callers provide precise attribution while preserving the existing core classifications ## What Changed - Adds `ThreadSource::Feature(String)` for app-owned thread source values - Represents all app-server v2 thread sources as scalar strings, so a feature source is supplied as `"automation"` - Persists and emits the feature's plain string label, so `"automation"` produces `thread_source="automation"` in analytics - Keeps `user`, `subagent`, and `memory_consolidation` as explicit core-owned values and regenerates the app-server schemas and TypeScript bindings ## Verification - `just write-app-server-schema` - `cargo check --workspace` - `just test -p codex-protocol feature_thread_source_serializes_as_its_app_owned_label` - `just test -p codex-app-server-protocol thread_sources_round_trip_as_scalar_labels` - `cargo test -p codex-analytics thread_initialized_event_serializes_expected_shape` - `just fmt`
marksteinbrick-oai ·
2026-06-09 12:27:10 -07:00 -
multi-agent: add path-based v2 activity tracking (#27007)
## Why Multi-agent v2 identifies agents by canonical paths, but its tool handlers still emitted the larger legacy collaboration begin/end events built around nickname and role metadata. App-server, rollout-trace, analytics, and TUI consumers therefore lacked one compact path-based completion signal that behaved consistently across live events and replay. The TUI also needs a bounded `/agent` status surface for v2 agents. It should use recent local activity for previews, refresh liveness without loading full histories, and keep the legacy picker available when no path-backed v2 agent is known. ## What changed - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and `interrupt_agent` legacy lifecycle emissions with a success-only `SubAgentActivity` event. The event records the tool call ID, occurrence time, affected thread, canonical agent path, and `started`, `interacted`, or `interrupted` kind. - Expose the activity as a completion-only app-server v2 `subAgentActivity` thread item in live notifications and reconstructed history, regenerate the protocol schemas, and count it in sub-agent tool analytics. - Track canonical paths from live activity and loaded-thread metadata in the TUI, and render the activity in live and replayed transcripts. - Make `/agent` list running path-backed agents with summaries from bounded local event buffers. Each summary is capped at 240 graphemes, the scan is capped at six recent items, only the last three wrapped lines are shown, and command output is omitted. Liveness falls back to metadata-only `thread/read` when local turn state is unavailable. - Persist the activity as a terminal rollout-trace runtime payload and reduce it to the corresponding spawn, send, follow-up, or close interaction edge. `interrupt_agent` is classified as a close-edge operation. - Preserve the legacy picker when no path-backed v2 agent is known. ## Compatibility App-server v2 clients that consumed `collabAgentToolCall` begin/end pairs for these tools must handle the new completion-only `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged. ## Screenshot <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47" src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d" /> ## Testing - `just test -p codex-app-server-protocol` - `just test -p codex-rollout-trace` - Added focused coverage for activity analytics, terminal trace serialization, spawn-edge reduction, `interrupt_agent` classification, TUI status rendering without aggregated command output, and clearing stale running state after a completed turn.
jif ·
2026-06-09 12:14:48 +02:00 -
[codex-analytics] stop sending codex error subreason (#27060)
## Summary - stop emitting `codex_error_subreason` on `codex_turn_event` - remove the transient analytics fact plumbing that copied `CodexErr::InvalidRequest(String)` into the event - update analytics serialization coverage accordingly ## Why `codex_error_subreason` is a free-form copy of `InvalidRequest(String)`, including raw provider 400 bodies in some paths. That makes it unsafe as an analytics field because it can carry user-derived or sensitive text. ## Validation - `just fmt` - `just test -p codex-analytics`
rhan-oai ·
2026-06-08 21:29:06 +00:00 -
[codex-analytics] report compaction analytics details (#26680)
## Why Compaction analytics adds retained image count and compaction summary output tokens for v1.5 specifically. ## What changed - Add nullable `retained_image_count` and `compaction_summary_tokens` fields to `codex_compaction_event`. - Populate them only for `responses_compaction_v2`: retained images come from the retained v2 compacted history, and summary tokens come from `response.completed.token_usage.output_tokens`. - Leave local and legacy remote compaction events as `null` for these detail fields. ## Verification - `just fmt` - `just fix -p codex-core` - `just test -p codex-core build_v2_compacted_history_counts_retained_input_images` - `git diff --check`
rhan-oai ·
2026-06-08 10:52:31 -07:00 -
[codex] Add turn profiling analytics (#26484)
## Summary Add flat profiling fields to `codex_turn_event` so analytics can explain where turn wall-clock time is spent without changing tool execution behavior. The profile reports: - time before the first sampling request - sampling time across all attempts and follow-ups - overhead between sampling requests - time blocked in the post-sampling tool drain - time after the final sampling request - sampling request and retry counts ## Implementation - Extend the existing turn timing state with constant-memory phase accounting and one RAII phase guard. - Observe sampling and the existing post-sampling drain only at turn orchestration boundaries. - Keep tool runtime, tool futures, response item handling, and turn lifecycle values unchanged. - Add the profiling fields directly to the existing analytics turn event without changing app-server protocol or rollout persistence. - Use the existing turn `status` to distinguish completed, failed, and interrupted profiles. Exact sampling/tool overlap is intentionally omitted because measuring tool completion accurately would require hooks in the tool execution path. ## Validation - Add app-server end-to-end coverage for a single-sampling turn with no blocking tool work. - Add app-server end-to-end coverage for `request_user_input` blocking followed by a second sampling request. - CI is running on the PR; tests were not executed locally per repository guidance.
Ahmed Ibrahim ·
2026-06-05 11:27:10 -07:00 -
[codex-analytics] emit forked thread id on initialization (#26248)
## Why - Thread initialization analytics do not identify the source thread for forked threads. - The session viewer needs this lineage to construct thread trees. - Depends on openai/openai#987854. Do not release this change before that backend schema change is deployed. ## What Changed - Adds optional `forked_from_thread_id` to `codex_thread_initialized`. - Populates it from the existing thread fork lineage for app-server and in-process subagent initialization paths. - Keeps it null for non-forked threads. ## Verification - `just fmt` - `just test -p codex-analytics` - `just test -p codex-app-server thread_fork_tracks_thread_initialized_analytics`
kbazzi ·
2026-06-04 11:24:12 -07:00 -
log plugin MCP server names (#26002)
## Summary - emit the plugin capability summary's exact MCP server names in `codex_plugin_used` ## Test - `just test -p codex-analytics` - `just test -p codex-core explicit_plugin_mentions_track_plugin_used_analytics` - `just fix -p codex-analytics`
Chris Dong ·
2026-06-03 16:06:52 -07:00 -
Populate workspace kind on Codex turn events (#25135)
## Summary - carry `workspace_kind` from Responses API client metadata into the turn resolved analytics fact - serialize the optional value on `codex_turn_event` - cover both the turn metadata source and turn event serialization The `workspace_kind` tells us whether a thread had a project attached vs projectless. this is an indicator for who is adopting Codex for knowledge work outside of coding ## Testing - `env UV_CACHE_DIR=/private/tmp/uv-cache /private/tmp/cargo-tools/bin/just fmt` - `env PATH=/private/tmp/cargo-tools/bin:$PATH CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache /private/tmp/cargo-tools/bin/just test -p codex-analytics` - `env PATH=/private/tmp/cargo-tools/bin:$PATH CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache /private/tmp/cargo-tools/bin/just test -p codex-core turn_metadata` Paired with openai/openai#970661, which keeps forwarding the same metadata key through Responses API headers.
knittel-openai ·
2026-06-02 12:46:14 -07:00 -
Propagate permission approval environment id (#25862)
## Stack 1. #25850 - Key request-permission grants by environment: stores and applies sticky permission grants per environment id. 2. #25858 - Add `environmentId` to `request_permissions`: lets the model target a selected environment and resolves relative permission paths against it. 3. This PR (#25862) - Propagate permission approval environment id: carries the selected environment id through approval events, app-server requests, TUI prompts, and delegate forwarding. 4. #25867 - Add remote request permissions integration coverage: verifies the selected remote environment across request, approval, grant reuse, and exec. This PR is stacked on #25858, and #25867 is stacked on this PR. ## Why PR2 lets the model bind a `request_permissions` call to a selected environment, but the approval event and client-facing request still needed to carry that binding. For CCA, the user-facing prompt and delegated approval path should know which environment the grant applies to instead of relying on cwd alone. ## What Changed - Added optional `environmentId` to `RequestPermissionsEvent`. - Emit the selected environment id from core permission approval events. - Preserve the environment id through delegate forwarding, including cwd-based delegated requests. - Added `environmentId` to app-server permission approval params, generated schema/TypeScript artifacts, and README examples. - Preserve and display the environment id in TUI permission approval prompts. - Updated focused core, app-server protocol, and TUI conversion coverage. ## Testing Not run locally per instruction. Performed read-only `git diff --check`.
jif ·
2026-06-02 21:09:34 +02:00 -
[codex-analytics] Track CodexErr details in turn analytics (#25707)
## Summary - add analytics-only `CodexErr` telemetry to `codex_turn_event` while leaving existing `turn_error` unchanged - record terminal `CodexErr` facts from core immediately before the existing turn error event is sent - emit source-truth `codex_error_*` fields for downstream analytics, including the raw `CodexErr::InvalidRequest(String)` message as `codex_error_subreason` ## Validation - `just test -p codex-analytics` - attempted `just test -p codex-core`, but the local run timed out across unrelated integration suites in this environment and is not being used as validation
rhan-oai ·
2026-06-02 11:40:35 -07:00 -
store and expose parent_thread_id on Threads (#25113)
## Why This PR https://github.com/openai/codex/pull/24161#discussion_r3325692763 revealed a subagent data modeling issue, where we overloaded `forked_from_id` to also mean `parent_thread_id`. That's incorrect since guardian and review subagents can be a subagent and NOT fork the main thread's history. The solution here is to explicitly store a new `parent_thread_id` on `SessionMeta`, alongside `forked_from_id` which already exists. While we're at it, also expose it in the app-server protocol on the `Thread` object. A thread->subagent relationship and a fork of thread history are orthogonal concepts. ## What Changed - Added top-level `parent_thread_id` persistence on `SessionMeta` and runtime/session plumbing through `SessionConfiguredEvent`, `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`, `TurnContext`, and `ModelClient`. - Made turn metadata, request headers, analytics, and subagent-start events read the separate runtime/top-level parent field instead of deriving general parent lineage from `SessionSource` or `forked_from_thread_id`. - Passed parent lineage separately at delegated subagent, review, guardian, agent-job, and multi-agent spawn construction sites; copied-history fork lineage remains derived only from `InitialHistory`. - Persisted and exposed parent lineage through rollout/thread-store projections and app-server v2 `Thread.parentThreadId`. - Updated app-server README text and regenerated app-server schema fixtures for the additive `parentThreadId` response field.
Owen Lin ·
2026-06-01 04:33:20 +00:00 -
Add cloud-managed config layer support (#24620)
## Summary PR 3 of 5 in the cloud-managed config client stack. Adds enterprise-managed cloud config as a first-class config layer source. The layer metadata is preserved through config loading, diagnostics, debug output, hook attribution, and app-server protocol surfaces. ## Details - Enterprise-managed config becomes a normal config layer source with backend-supplied `id` and display `name` attached for provenance. - These layers are designed to behave like non-file managed config: they can surface syntax/type diagnostics by layer name even though there is no physical config file. - Relative path settings are resolved from a stored config base so cloud-delivered config remains consistent with existing MDM-delivered config semantics. - Hook attribution distinguishes config-delivered hooks from requirements-delivered hooks via `HookSource::CloudManagedConfig`. - This remains pull-based and snapshot-oriented; the PR adds layer identity/diagnostics, not dynamic reload behavior. ## Validation Validated through the targeted stack checks after rebasing onto current `main`: - Rust crate tests for config/hooks/cloud-config/backend-client/app-server-protocol - Filtered `codex-core` and `codex-app-server` `cloud_config_bundle` tests - Python generated-file contract test - `cargo shear --deny-warnings` - Targeted `argument-comment-lint` for config/hooks
joeflorencio-openai ·
2026-05-31 15:54:31 -07:00 -
Add subagent lineage metadata for responsesapi (#24161)
## Why We recently added `forked_from_thread_id` which lets us trace where a thread's _context_ comes from, but we also want to understand subagent lineage (e.g. which parent thread spawned this subagent? what kind of subagent is it?) which is orthogonal. This PR adds `parent_thread_id` and `subagent_kind` to the `x-codex-turn-metadata` header sent to ResponsesAPI. ## What changed - Adds `parent_thread_id` and `subagent_kind` to core-owned `x-codex-turn-metadata`. - Restores persisted `SessionSource` and `ThreadSource` from resumed session metadata so cold-resumed subagent threads keep their lineage on later Responses API requests. - Centralizes parent-thread extraction on `SessionSource` / `SubAgentSource` and reuses it in the Responses client, analytics, agent control, and state parsing paths. - Extends reserved-key, git-enrichment, thread-spawn, and app-server v2 metadata coverage for the new lineage fields. ## Verification - Not run locally per request. - Added focused coverage in `core/src/turn_metadata_tests.rs` and `app-server/tests/suite/v2/client_metadata.rs`.
Owen Lin ·
2026-05-29 11:28:12 -07:00 -
[codex] Add user input client ids (#24653)
## Summary Adds an optional `clientId` field to app-server v2 `UserInput` and carries it through the core `UserInput` model so clients can correlate echoed user input items without relying on payload equality. ## Details - Adds `client_id: Option<String>` to core `UserInput` variants. - Exposes the v2 app-server field as `clientId` on the wire and in generated TypeScript. - Preserves the id when converting between app-server v2 and core protocol types. - Regenerates app-server schema fixtures. ## Validation - `just fmt` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-protocol` - `git diff --check`
Alexi Christakis ·
2026-05-28 14:54:39 -07:00 -
feat(app-server): include turns page on thread resume (#23534)
## Summary The client currently calls `thread/resume` to establish live updates and immediately follows it with `thread/turns/list` to hydrate recent turns. This lets `thread/resume` return that page directly, eliminating a round trip and the ordering/deduplication gap between the two calls. Experimental clients opt in with `initialTurnsPage: { limit, sortDirection, itemsView }`. The response returns `initialTurnsPage` as a `TurnsPage`, including cursors for paging further back in history. Keeping the controls in a nested opt-in object provides the useful `thread/turns/list` knobs without spreading page-specific parameters across `thread/resume`. ## Verification - `just fmt` - `just write-app-server-schema --experimental` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-app-server thread_resume_initial_turns_page_matches_requested_turns_list_page --tests` - `cargo test -p codex-app-server thread_resume_rejoins_running_thread_even_with_override_mismatch --tests` - `just fix -p codex-app-server-protocol -p codex-app-server`Brent Traut ·
2026-05-28 09:18:13 -07:00 -
[codex-analytics] add grouped session id to runtime events (#24655)
## Why - Runtime analytics events report `thread_id`, which identifies the individual thread emitting an event - They don't report `session_id`, which identifies the shared session for a root thread and its subagent threads - Emitting both identifiers allows analytics to group related activity ## What Changed - Adds `session_id` to relevant analytics events (thread_initalized, turn, turn_steer, compaction, guardian_review) - Tracks each thread's session ID in the analytics reducer so subsequent thread scoped events emit the same value - Carries the shared session ID through subagent initialization ## Verification - `just test -p codex-analytics` validates event payloads and subagent session grouping. - Focused `codex-app-server` tests validate session IDs for thread, turn, and steer events. - Focused `codex-core` tests validate root and subagent session ID propagation.
marksteinbrick-oai ·
2026-05-26 16:38:46 -07:00 -
Add experimental turn additional context (#24154)
## Summary Adds experimental `additionalContext` support to `turn/start` and `turn/steer` so clients can provide ephemeral external context, such as browser or automation state, without turning that plumbing into a visible user prompt or triggering user-prompt lifecycle behavior. ## API Shape The parameter shape is: ```ts additionalContext?: Record<string, { value: string kind: "untrusted" | "application" }> | null ``` Example: ```json { "additionalContext": { "browser_info": { "value": "Active tab is CI failures.", "kind": "untrusted" }, "automation_info": { "value": "CI rerun is in progress.", "kind": "application" } } } ``` The keys are opaque and caller-defined. ## Context Injection When provided, accepted entries are inserted into model context as hidden contextual message items, not as visible thread user-message items. `kind: "untrusted"` entries are inserted with role `user`: ```text <external_${key}>${value}</external_${key}> ``` `kind: "application"` entries are inserted with role `developer`: ```text <${key}>${value}</${key}> ``` Values are not escaped. Each value is truncated to 1k approximate tokens before wrapping. For `turn/start`, accepted additional context is inserted before normal user input. For `turn/steer`, additional context is merged only when the steer includes non-empty user input; context-only steers still reject as empty input. ## Dedupe Strategy `AdditionalContextStore` lives on session state and stores the latest complete additional-context map. Each `turn/start` or non-empty `turn/steer` treats its `additionalContext` as the current complete set of values. Entries are injected only when the key is new or the exact entry for that key changed, including `value` or `kind`. After merging, the store is replaced with the provided map, so omitted keys are removed from the retained set and can be injected again later if reintroduced. Omitting `additionalContext`, passing `null`, or passing an empty object resets the store to empty and injects nothing. ## What Changed - Threads experimental v2 `additionalContext` through app-server into core turn start and steer handling. - Adds separate contextual fragment types for untrusted user-role context and application developer-role context. - Uses pending response input items so additional context can be combined with normal user input without treating it as prompt text. - Adds integration coverage for start/steer flow, role routing, dedupe/reset behavior, deletion/re-add behavior, hook-blocked input behavior, empty context-only steer rejection, external-fragment marker matching, and truncation.pakrym-oai ·
2026-05-26 13:02:34 -07:00 -
[codex-analytics] split compaction v2 analytics implementation (#24146)
## What changed - Add a distinct `responses_compaction_v2` value for `CodexCompactionEvent.implementation`. - Emit that value from the remote compaction v2 path. - Keep local compaction as `responses` and legacy `/responses/compact` as `responses_compact`. ## Why Remote compaction v2 and local prompt-based compaction were both reported as `responses`, which made the analytics table collapse two different compaction mechanisms into one implementation bucket. ## Validation - `just fmt` - `just test -p codex-analytics` `just test -p codex-core` was started locally, but this PR is intentionally being pushed for CI to finish the remaining validation.
rhan-oai ·
2026-05-22 21:34:22 +00:00 -
[codex] Add plugin id to MCP tool call items (#23737)
Add owning plugin id to MCP tool call items so we can better filter them at plugin level. ## Summary - add optional `plugin_id` to MCP tool-call items and legacy begin/end events - propagate plugin metadata into emitted core items and app-server v2 `ThreadItem::McpToolCall` - preserve plugin ids through app-server replay/redaction paths and regenerate v2 schema fixtures ## Testing - `just write-app-server-schema` - `just fmt` - `just fix -p codex-core` - `cargo test -p codex-protocol -p codex-app-server-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib` - `cargo check -p codex-tui --tests` - `cargo check -p codex-app-server --tests` - `git diff --check` ## Notes - `just fix -p codex-core` completed with two non-fatal `too_many_arguments` warnings on the touched MCP notification helpers. - A broader `cargo test -p codex-core` run passed core unit tests, then hit shell/sandbox/snapshot failures in the integration target. - A broader app-server downstream run hit the existing `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack overflow; `cargo test -p codex-exec` also hit the existing sandbox expectation mismatch in `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
Matthew Zeng ·
2026-05-20 17:02:10 -07:00 -
Add SubagentStop hook (#22873)
# What <img width="1792" height="1024" alt="image" src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e" /> `SubagentStop` runs when a thread-spawned subagent turn is about to finish. Thread-spawned subagents use `SubagentStop` instead of the normal root-agent `Stop` hook. Configured handlers match on `agent_type`. Hook input includes the normal stop fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. - `agent_transcript_path`: the child subagent transcript path. - `transcript_path`: the parent thread transcript path. - `last_assistant_message`: the final assistant message from the child turn, when available. - `stop_hook_active`: `true` when the child is already continuing because an earlier stop-like hook blocked completion. `SubagentStop` shares the same completion-control semantics as `Stop`, scoped to the child turn: - No decision allows the child turn to finish. - `decision: "block"` with a non-empty `reason` records that reason as hook feedback and continues the child with that prompt. - `continue: false` stops the child turn. If `stopReason` is present, Codex surfaces it as the stop reason. # Lifecycle Scope Only thread-spawned subagents run `SubagentStop`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `Stop` hooks and do not run `SubagentStop`. This avoids exposing synthetic matcher labels for internal implementation paths. # Stack 1. #22782: add `SubagentStart`. 2. This PR: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.
Abhinav ·
2026-05-20 14:59:41 -07:00 -
Add SubagentStart hook (#22782)
# What `SubagentStart` runs once when Codex creates a thread-spawned subagent, before that child sends its first model request. Thread-spawned subagents use `SubagentStart` instead of the normal root-agent `SessionStart` hook. Configured handlers match on the subagent `agent_type`, using the same value passed to `spawn_agent`. When no agent type is specified, Codex uses the default agent type. Hook input includes the normal session-start fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. `SubagentStart` may return `hookSpecificOutput.additionalContext`. That context is added to the child conversation before the first model request. # Lifecycle Scope Only thread-spawned subagents run `SubagentStart`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `SessionStart` hooks and do not run `SubagentStart`. This avoids exposing synthetic matcher labels for internal implementation paths. Also the `SessionStart` hook no longer fires for subagents, this matches behavior with other coding agents' implementation # Stack 1. This PR: add `SubagentStart`. 2. #22873: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.
Abhinav ·
2026-05-19 12:45:08 -07:00 -
test: construct permission profiles directly (#23030)
## Why `SandboxPolicy` is now a legacy compatibility shape, but several tests still built a `SandboxPolicy` only to immediately convert it into `PermissionProfile` for APIs that already accept canonical runtime permissions. Those detours make it harder to audit where legacy sandbox policy is still required, because boundary-only usages are mixed together with ordinary test setup. ## What Changed - Updated tests in `codex-core`, `codex-exec`, `codex-analytics`, and `codex-config` to construct `PermissionProfile` values directly when the code under test takes a permission profile. - Changed exec-policy, request-permissions, session, and sandbox test helpers to pass `PermissionProfile` through instead of converting from `SandboxPolicy` internally. - Left `SandboxPolicy` in place where tests are explicitly exercising legacy compatibility or request/response boundaries. ## Test Plan - `cargo test -p codex-analytics -p codex-config` - `cargo test -p codex-core --lib safety::tests` - `cargo test -p codex-core --lib exec_policy::tests::` - `cargo test -p codex-core --lib exec::tests` - `cargo test -p codex-core --lib guardian_review_session_config` - `cargo test -p codex-core --lib tools::network_approval::tests` - `cargo test -p codex-core --lib tools::runtimes::shell::unix_escalation::tests` - `cargo test -p codex-core --lib managed_network` - `cargo test -p codex-core --test all request_permissions::` - `cargo test -p codex-exec sandbox` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23030). * #23036 * __->__ #23030
Michael Bolin ·
2026-05-16 12:12:37 -07:00 -
Preserve image detail in app-server inputs (#20693)
## Summary - Add optional image detail to user image inputs across core, app-server v2, thread history/event mapping, and the generated app-server schemas/types. - Preserve requested detail when serializing Responses image inputs: omitted detail stays on the existing `high` default, while explicit `original` keeps local images on the original-resolution path. - Support `high`/`original` consistently for tool image outputs, including MCP `codex/imageDetail`, code-mode image helpers, and `view_image`.
Curtis 'Fjord' Hawthorne ·
2026-05-15 15:04:04 -07:00 -
app-server: stop returning thread permission profiles (#22792)
## Why The app-server thread lifecycle API should no longer expose the full `PermissionProfile` value. After the permissions-profile migration, clients should round-trip only the active profile identity through `activePermissionProfile` and `permissions` when that identity is known. The full profile is server-side config. Treating a response-derived legacy sandbox projection as a new local profile can lose named-profile restrictions and accidentally widen permissions on the next turn. The legacy `sandbox` response field remains only as the compatibility/display fallback. ## What Changed - Removed `permissionProfile` from `ThreadStartResponse`, `ThreadResumeResponse`, and `ThreadForkResponse`. - Stopped populating that field in app-server thread start/resume/fork responses. - Updated embedded exec/TUI response mapping to derive display permission state from local config or the legacy sandbox fallback instead of a response profile value. - Added a TUI turn override shape that distinguishes preserving server permissions, selecting an active profile id, and sending a legacy sandbox for an explicit local override. - Preserved remote app-server permissions across turns by sending `permissions` only when an `activePermissionProfile` id is known, and otherwise sending no sandbox override unless the user selected a local override. - Kept embedded `thread/resume` hydration server-authored when `activePermissionProfile` is absent, which matches the live-thread attach path where the server ignores requested overrides. - Updated the app-server README to remove the obsolete lifecycle response `permissionProfile` reference. The remaining `permissionProfile` README references are request-side permission overrides. - Regenerated app-server JSON schema and TypeScript fixtures. - Kept the generated typed response enum exempt from `large_enum_variant`, matching the existing payload enum exemption after the lifecycle response variants shrank. ## How To Review Start with `codex-rs/app-server-protocol/src/protocol/v2/thread.rs` to confirm the response shape, then check the response construction in `codex-rs/app-server/src/request_processors`. The generated schema and TypeScript fixture changes are mechanical follow-through from the protocol removal. The TUI behavior is the delicate part: review `codex-rs/tui/src/app_server_session.rs` for response hydration and turn-start override projection, then `codex-rs/tui/src/app/thread_routing.rs` for the decision about whether the next turn should preserve the server snapshot, send an active profile id, or send a legacy sandbox for an explicit local override. ## Verification - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol thread_lifecycle_responses_default_missing_optional_fields` - `cargo test -p codex-exec session_configured_from_thread_response_uses_permission_profile_from_config` - `cargo test -p codex-tui --lib thread_response` - `cargo test -p codex-tui turn_permissions_` - `cargo test -p codex-tui resume_response_restores_turns_from_thread_items` - `cargo test -p codex-analytics track_response_only_enqueues_analytics_relevant_responses` - `just fix -p codex-analytics` - `just fix -p codex-app-server-protocol` - `just fix -p codex-tui` - `just argument-comment-lint` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22792). * #22795 * __->__ #22792
Michael Bolin ·
2026-05-15 12:45:48 -07:00 -
app-server: use permission ids and runtime workspace roots (#22611)
## Why This PR builds on [#22610](https://github.com/openai/codex/pull/22610) and is the app-server side of the migration from mutable per-turn `SandboxPolicy` replacement toward selecting immutable permission profiles by id plus mutable runtime workspace roots. Once permission profiles can carry their own immutable `workspace_roots`, app-server no longer needs to mutate the selected `PermissionProfile` just to represent thread-specific filesystem context. The mutable part now lives on the thread as explicit `runtimeWorkspaceRoots`, while `:workspace_roots` remains symbolic until the sandbox is realized for a turn. ## What Changed - Replaced the v2 permission-selection wrapper surface with plain profile ids for `thread/start`, `thread/resume`, `thread/fork`, and `turn/start`. - Removed the API surface for profile modifications (`PermissionProfileSelectionParams`, `PermissionProfileModificationParams`, `ActivePermissionProfileModification`). - Added experimental `runtimeWorkspaceRoots` fields to the thread lifecycle and turn-start APIs. - Threaded runtime workspace roots through core session/thread snapshots, turn overrides, app-server request handling, and command execution permission resolution. - Kept session permission state symbolic so later runtime root updates and cwd-only implicit-root retargeting rebind `:workspace_roots` correctly. - Updated the embedded clients just enough to send and restore the new thread state. - Refreshed the generated schema/TypeScript artifacts and the app-server README to match the new contract. ## Verification Targeted coverage for this layer lives in: - `codex-rs/app-server-protocol/src/protocol/v2/tests.rs` - `codex-rs/app-server/tests/suite/v2/thread_start.rs` - `codex-rs/app-server/tests/suite/v2/thread_resume.rs` - `codex-rs/app-server/tests/suite/v2/turn_start.rs` - `codex-rs/core/src/session/tests.rs` The key regression checks exercise that: - `runtimeWorkspaceRoots` resolve against the effective cwd on thread start. - Profile-declared workspace roots are excluded from the runtime workspace roots returned by app-server. - A turn-level runtime workspace-root update persists onto the thread and is returned by `thread/resume`. - A named permission profile selected on one turn remains symbolic so a later runtime-root-only turn update changes the actual sandbox writes. - A cwd-only turn update retargets the implicit runtime cwd root while preserving additional runtime roots. - The protocol fixtures and generated client artifacts stay in sync with the string-based permission selection contract. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22611). * #22612 * __->__ #22611
Michael Bolin ·
2026-05-14 23:00:05 -07:00 -
Stop uploading accepted line fingerprints (#22180)
## Summary - keep accepted-line diff parsing and fingerprint hashing logic locally - stop uploading path/line hash fingerprints in the accepted-line analytics event payload - keep aggregate accepted added/deleted line counts in the event ## Testing - just fmt - cargo test -p codex-analytics - just fix -p codex-analytics
alexsong-oai ·
2026-05-11 15:41:38 -07:00 -
[codex-analytics] emit terminal review events (#18748)
## Why Review telemetry should describe reviews as first-class events, not only as counters denormalized onto terminal tool-item events. That lets us analyze guardian and user reviews consistently across command execution, file changes, permissions, and network access, while still preserving the terminal item summaries that existing tool analytics need. To make those review events accurate, analytics also needs the observed completion time for each review and enough command metadata to distinguish `shell` from `unified_exec` reviews. ## What changed - emit generic `codex_review_event` rows for completed user and guardian reviews, with review subjects, reviewer, trigger, terminal status, resolution, and observed duration - reduce approval request / response / abort facts into review events for command execution, file change, and permissions flows - keep denormalized review counts, final approval outcome, and permission-request flags on terminal tool-item events for item-associated reviews - plumb review completion timing so user-review responses and aborts use app-server-observed completion times, while guardian analytics reuse the same terminal timestamps emitted on guardian assessment events - carry command approval `source` through the protocol and app-server layers so review analytics can distinguish `shell` from `unified_exec` - add analytics coverage for user-review emission, guardian-review emission, permission reviews that should not denormalize onto tool items, item-summary isolation across threads, and the serialized review-event shape ## Verification - `cargo test -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18748). * __->__ #18748 * #21434 * #18747 * #17090 * #17089 * #20514
rhan-oai ·
2026-05-11 22:13:32 +00:00 -
[codex-analytics] add turn tool counts to turn events (#21431)
## Summary - accumulate completed tool-item counts per turn from the item lifecycle - populate the reserved count fields on `codex_turn_event` - add reducer coverage for zero-count turns and mixed completed tool items ## Why PR #17090 moved tool-item analytics onto the item lifecycle, so the turn reducer can now derive the per-turn tool counts from the same completed items instead of leaving the reserved fields null. ## Validation - `just fmt` - `cargo test -p codex-analytics`
rhan-oai ·
2026-05-11 18:18:02 +00:00 -
[codex] request desktop attestation from app (#20619)
## Summary TL;DR: teaches `codex-rs` / app-server to request a desktop-provided attestation token and attach it as `x-oai-attestation` on the scoped ChatGPT Codex request paths.  ## Details This PR teaches the Codex app-server runtime how to request and attach an attestation token. It does not generate DeviceCheck tokens directly; instead, it relies on the connected desktop app to advertise that it can generate attestation and then asks that app for a fresh header value when needed. The flow is: 1. The Codex desktop app connects to app-server. 2. During `initialize`, the app can advertise that it supports `requestAttestation`. 3. Before app-server calls selected ChatGPT Codex endpoints, it sends the internal server request `attestation/generate` to the app. 4. app-server receives a pre-encoded header value back. 5. app-server forwards that value as `x-oai-attestation` on the scoped outbound requests. The code in this repo is mostly protocol and runtime plumbing: it adds the app-server request/response shape, introduces an attestation provider in core, wires that provider into Responses / compaction / realtime setup paths, and covers the intended scoping with tests. The signed macOS DeviceCheck generation remains owned by the desktop app PR. ## Related PR - Codex desktop app implementation: https://github.com/openai/openai/pull/878649 ## Validation <details> <summary>Tests run</summary> ```sh cargo test -p codex-app-server-protocol cargo test -p codex-core attestation --lib cargo test -p codex-app-server --lib attestation ``` Also ran: ```sh just fix -p codex-core just fix -p codex-app-server just fix -p codex-app-server-protocol just fmt just write-app-server-schema ``` </details> <details> <summary>E2E DeviceCheck validation</summary> First validated the signed desktop app boundary directly: launched a packaged signed `Codex.app`, sent `attestation/generate`, decoded the returned `v1.` attestation header, and validated the extracted DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using bundle ID `com.openai.codex`. Apple returned `status_code: 200` and `is_ok: true`. Then ran the fuller app + app-server flow. The packaged `Codex.app` launched a current-branch app-server via `CODEX_CLI_PATH`, and a local MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server requested `attestation/generate` from the real Electron app process, and the intercepted `/backend-api/codex/responses` traffic included `x-oai-attestation` on both routes: ```text GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present ``` The captured header decoded to a DeviceCheck token that also validated with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`, team `2DC432GLL2`). </details> --------- Co-authored-by: Codex <noreply@openai.com>
Jiaming Zhang ·
2026-05-08 12:36:02 -07:00 -
Emit accepted line fingerprint analytics (#21601)
## Why Codex assisted-code attribution needs a client-side accepted-code source that does not upload raw code. This adds a hash-only analytics event derived from the turn diff so downstream attribution can compare accepted Codex lines against commit or PR diffs. ## What Changed - Parse accepted/effective added lines from the final turn diff and emit `codex_accepted_line_fingerprints` analytics. - Hash repo, path, and normalized line content before upload; raw code and raw diffs are not included in the event. - Chunk large fingerprint payloads and send accepted-line fingerprint events in isolated requests while preserving normal batching for other analytics events. - Canonicalize Git remote URLs before repo hashing so SSH/HTTPS GitHub remotes join to the same repo hash. - Add parser coverage for unified diff hunk lines that look like `+++` or `---` file headers. ## Verification - `cargo test -p codex-analytics` - `cargo test -p codex-git-utils canonicalize_git_remote_url` - `just fix -p codex-analytics` - `just bazel-lock-check` - `git diff --check`
alexsong-oai ·
2026-05-08 12:16:24 -07:00 -
[codex-analytics] plumb protocol-native review timing (#21434)
## Why We want terminal tool review analytics, but the reducer should not stamp review timing from its own wall clock. This PR plumbs review timing through the real protocol and app-server seams so downstream analytics can consume the emitter's timestamps directly. Guardian reviews keep their enriched `started_at` / `completed_at` analytics fields by deriving those legacy second-based values from the same protocol-native millisecond lifecycle timestamps, rather than sampling a separate analytics clock. ## What changed - add `started_at_ms` to user approval request payloads - add `started_at_ms` / `completed_at_ms` to guardian review notifications - preserve Guardian review `started_at` / `completed_at` enrichment from the protocol-native timing source - stamp typed `ServerResponse` analytics facts with app-server-observed `completed_at_ms` - thread the new timing fields through core, protocol, app-server, TUI, and analytics fixtures ## Verification - `cargo test -p codex-app-server outgoing_message --manifest-path codex-rs/Cargo.toml` - `cargo test -p codex-app-server-protocol guardian --manifest-path codex-rs/Cargo.toml` - `cargo test -p codex-tui guardian --manifest-path codex-rs/Cargo.toml` - `cargo test -p codex-analytics analytics_client_tests --manifest-path codex-rs/Cargo.toml` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21434). * #18748 * __->__ #21434 * #18747 * #17090 * #17089 * #20514
rhan-oai ·
2026-05-07 20:31:41 -07:00 -
[codex-analytics] add tool review event schema (#18747)
## Why We want to emit terminal review analytics for tool-related approval flows, but the event contract needs to exist before the reducer can publish anything. This PR is the schema-only slice for the Codex review event family. ## What changed - add the `ReviewEvent` analytics envelope in `codex-rs/analytics/src/events.rs` - define the review subject kind, reviewer, trigger, terminal status, and post-review resolution enums - define the review event payload with thread, turn, item, lineage, tool, and timing fields that the emitter stack will populate ## Verification - stacked verification in dependent PRs: `cargo test -p codex-analytics analytics_client_tests --manifest-path codex-rs/Cargo.toml` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18747). * #18748 * #21434 * __->__ #18747 * #17090 * #17089 * #20514
rhan-oai ·
2026-05-07 09:46:46 -07:00 -
Add compact lifecycle hooks (started by vincentkoc - external contrib) (#19905)
Based on work from Vincent K - https://github.com/openai/codex/pull/19060 <img width="1836" height="642" alt="CleanShot 2026-04-29 at 20 47 40@2x" src="https://github.com/user-attachments/assets/b647bb89-65fe-40c8-80b0-7a6b7c984634" /> ## Why Compaction rewrites the conversation context that future model turns receive, but hooks currently have no deterministic lifecycle point around that rewrite. This adds compact lifecycle hooks so users can audit manual and automatic compaction, surface hook messages in the UI, and run post-compaction follow-up without overloading tool or prompt hooks. ## What Changed - Added `PreCompact` and `PostCompact` hook events across hook config, discovery, dispatch, generated schemas, app-server notifications, analytics, and TUI hook rendering. - Added trigger matching for compact hooks with the documented `manual` and `auto` matcher values. - Wired `PreCompact` before both local and remote compaction, and `PostCompact` after successful local or remote compaction. - Kept compact hook command input to lifecycle metadata: session id, Codex turn id, transcript path, cwd, hook event name, model, and trigger. - Made compact stdout handling consistent with other hooks: plain stdout is ignored as debug output, while malformed JSON-looking stdout is reported as failed hook output. - Added integration coverage for compact hook dispatch, trigger matching, post-compact execution, and the audited behavior that `decision:"block"` does not block compaction. ## Out of Scope - Hook-specific compaction blocking is not implemented; `decision:"block"` and exit-code-2 blocking semantics are intentionally unsupported for `PreCompact`. - Custom compaction instructions are not exposed to compact hooks in this PR. - Compact summaries, summary character counts, and summary previews are not exposed to compact hooks in this PR. ## Verification - `cargo test -p codex-hooks` - `cargo test -p codex-core manual_pre_compact_block_decision_does_not_block_compaction` - `cargo test -p codex-app-server hooks_list` - `cargo test -p codex-core config_schema_matches_fixture` - `cargo test -p codex-tui hooks_browser` ## Docs The developer documentation for Codex hooks should be updated alongside this feature to document `PreCompact` and `PostCompact`, the `manual`/`auto` matcher values, and the compact hook payload fields. --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Andrei Eternal ·
2026-05-06 18:08:31 -07:00 -
[codex-analytics] emit tool item events from item lifecycle (#17090)
## Why After the tool-item schemas are in place, analytics needs to emit them from the app-server item lifecycle rather than requiring bespoke tracking at each callsite. The reducer should also reuse the shared thread analytics context introduced below it in the stack so later event families do not repeat the same reducer joins or missing-state ladder. ## What changed - Tracks tool-item completion notifications and emits the matching tool analytics event when a terminal item arrives. - Derives event-specific payload details for command execution, file changes, MCP calls, dynamic tools, collaboration tools, web search, and image generation. - Denormalizes thread, app-server client, runtime, and subagent provenance metadata through the shared thread analytics context. - Adds reducer coverage for item lifecycle emission and subagent metadata inheritance. ## Duration semantics `duration_ms` is computed from the app-server item lifecycle timestamps: `completed_at_ms - started_at_ms`. That makes it the duration of the lifecycle Codex observed locally, not necessarily the upstream provider's full execution time. - Web search usually has a meaningful observed lifecycle because Responses can send `response.output_item.added` before `response.output_item.done`; in that case `started_at_ms` comes from the added event and `completed_at_ms` comes from the done event. - Image generation can be much less precise. In the current observed stream, image generation often arrives only as a completed `response.output_item.done`; when there is no earlier added event, Codex synthesizes the started item immediately before completion, so `duration_ms` can be `0` even though upstream image generation took longer. - Standalone web search and standalone image generation work is expected to land after this stack. Those paths may introduce more direct lifecycle events or timing points, so the current web-search/image-generation duration semantics should be treated as the best available item-lifecycle approximation, not the final latency contract for those tool families. - `execution_duration_ms` is populated only where the completed item already carries a native execution duration; otherwise it remains `null` while `duration_ms` still reflects the local lifecycle interval. ## Currently placeholder / partial fields Some fields are included in the schema for the intended steady-state contract, but this PR does not yet populate them from real approval/review state: - `review_count`, `guardian_review_count`, and `user_review_count` currently default to `0`. - `final_approval_outcome` currently defaults to `unknown`. - `requested_additional_permissions` and `requested_network_access` currently default to `false`. ## Verification - `cargo test -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17090). * #18748 * #18747 * __->__ #17090 * #17089 * #20514
rhan-oai ·
2026-05-06 20:27:41 +00:00 -
feat(app-server): move v2
sessionIdontoThread(#21336)## Why `session_id` and `thread_id` are separate identities after #20437, but app-server only surfaced `sessionId` on the `thread/start`, `thread/resume`, and `thread/fork` response envelopes. Other thread-bearing surfaces such as `thread/list`, `thread/read`, `thread/started`, `thread/rollback`, `thread/metadata/update`, and `thread/unarchive` either lacked the grouping key or forced clients to special-case those three responses. Making `sessionId` part of the reusable `Thread` payload gives every v2 API surface one place to expose session-tree identity. ## Mental model 1. thread.sessionId lives on `Thread` 2. It is a view/runtime identity for the current live session tree, not durable stored lineage metadata 3. When app-server has a live loaded thread, it copies the real value from core’s session_configured.session_id 4. When it only has stored/unloaded data, it falls back to thread.sessionId = thread.id ## What changed - Added `sessionId` to the v2 [`Thread`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server-protocol/src/protocol/v2/thread_data.rs#L105-L109). - Removed the duplicate top-level `sessionId` fields from `thread/start`, `thread/resume`, and `thread/fork`; clients should now read `response.thread.sessionId`. - Populated `thread.sessionId` when building live thread responses, replaying loaded threads, and returning stored-thread summaries so the field is present across start, resume, fork, list, read, rollback, metadata-update, unarchive, and `thread/started` paths. See [`load_thread_from_resume_source_or_send_internal`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server/src/request_processors/thread_processor.rs#L2824-L2918) and [`thread_from_stored_thread`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server/src/request_processors/thread_processor.rs#L3671-L3719). - Preserved the stored-thread fallback: if a thread has not been loaded into a live session tree yet, `thread.sessionId` falls back to `thread.id`; once the thread is live again, the field reports the active session tree root. - Regenerated the JSON/TypeScript schemas and updated the app-server README examples to show [`thread.sessionId`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server/README.md#L306-L310) on the thread object.
jif-oai ·
2026-05-06 15:23:25 +02:00 -
feat: return session ID from thread/fork (#21332)
## Why `thread/start` and `thread/resume` already return `sessionId`, but `thread/fork` only returned the new thread. That left clients to infer the forked thread's session identity from `thread.id`, which kept the new `session_id` / `thread_id` split implicit at one lifecycle boundary. Follow-up to #20437. ## What changed - Add `sessionId` to `ThreadForkResponse`. - Populate it from the forked session configuration. - Regenerate the v2 JSON/TypeScript schema fixtures and update the app-server docs/example. - Extend the fork integration test to assert the returned `sessionId`. ## Verification - Added coverage in `thread_fork_creates_new_thread_and_emits_started` for the new response field.
jif-oai ·
2026-05-06 12:04:27 +02:00 -
feat: add
session_id(#20437)## Summary Related to https://openai.slack.com/archives/C095U48JNL9/p1777537279707449 TLDR: We update the meaning of session ids and thread ids: * thread_id stays as now * session_id become a shared id between every thread under a /root thread (i.e. every sub-agent share the same session id) This PR introduces an explicit `SessionId` and threads it through the protocol/client boundary so `session_id` and `thread_id` can diverge when they need to, while preserving compatibility for older serialized `session_configured` events. --------- Co-authored-by: Codex <noreply@openai.com>
jif-oai ·
2026-05-06 10:48:37 +02:00 -
[codex-analytics] rework thread_source for thread analytics (#20949)
## Summary - make `thread_source` an explicit optional thread-level field on `thread/start`, `thread/fork`, and returned thread payloads - persist `thread_source` in rollout/session metadata so resumed live threads retain the original value - replace the old best-effort `session_source` -> `thread_source` mapping with an explicit caller-supplied analytics classification ## Why Before this change, analytics `thread_source` was populated by a best-effort mapping from `session_source`. `session_source` describes the runtime/client surface, not the actual thread-level origin, so that projection was not accurate enough to distinguish cases such as `user`, `subagent`, `memory_consolidation`, and future thread origins reliably. Making `thread_source` explicit keeps one thread-level analytics field while letting callers provide the real classification directly instead of recovering it indirectly from `session_source`. ## Impact For new analytics events, `thread_source` now reflects the explicit thread-level classification supplied by the caller rather than an inferred value derived from `session_source`. Existing protocol fields remain optional; callers that omit `threadSource` now produce `null` instead of a best-effort inferred value. ## Validation - `just write-app-server-schema` - `cargo test -p codex-analytics -p codex-core -p codex-app-server-protocol --no-run` - `cargo test -p codex-app-server-protocol generated_ts_optional_nullable_fields_only_in_params` - `cargo test -p codex-analytics thread_initialized_event_serializes_expected_shape` - `cargo test -p codex-core resume_stopped_thread_from_rollout_preserves_thread_source`
rhan-oai ·
2026-05-06 02:12:31 +00:00 -
add turn items view to app-server turns (#21063)
## Why `Turn.items` currently overloads an empty array to mean either that no items exist or that the server intentionally did not load them for this response. That ambiguity blocks future lazy-loading work where clients need to distinguish unloaded, summary, and fully hydrated turn payloads. ## What changed - add a new `TurnItemsView` enum with `notLoaded`, `summary`, and `full` variants - add required `itemsView` metadata to app-server `Turn` payloads - mark reconstructed persisted history as `full` and live shell-style turn payloads as `notLoaded` - keep current `thread/turns/list` behavior unchanged and document that it still returns `full` turns today - regenerate the JSON and TypeScript protocol fixtures ## Verification - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-app-server thread_read_can_include_turns` - `cargo test -p codex-app-server thread_turns_list_can_page_backward_and_forward` - `cargo test -p codex-app-server thread_resume_rejects_history_when_thread_is_running` - `just fix -p codex-app-server-protocol` - `just fix -p codex-app-server` - `just fmt`
rhan-oai ·
2026-05-05 19:17:16 +00:00 -
[codex-analytics] add tool item event schemas (#17089)
## Why Tool analytics need stable, typed payloads before the later lifecycle reducer starts emitting them. Keeping the event schema definitions isolated in their own PR makes the emitted surface reviewable separately from the reducer logic that produces those events. ## What changed - Adds the common tool-item analytics event base plus event payload types for command execution, file changes, MCP calls, dynamic tools, collaboration tools, web search, and image generation. - Extends `TrackEventRequest` with the corresponding tool-item variants. - Adds serialization coverage for the command-execution event shape. ## Verification - `cargo test -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17089). * #18748 * #18747 * #17090 * __->__ #17089 * #20514
rhan-oai ·
2026-05-05 11:49:30 -07:00 -
edwardysun3 ·
2026-05-05 00:11:06 -04:00