codex

Add Guardian catalog diagnostics metadata (#27109 )

## Why

We need request-level evidence for Guardian cases where
`codex-auto-review` is missing from the client-side model catalog and
the review falls back to the parent model.

## What changed

- Add `guardian_catalog_contains_auto_review` to Guardian Responses API
client metadata.
- Add `guardian_model_provider_id` to Guardian Responses API client
metadata.
- Keep review-session metadata optional so callers without metadata
preserve the existing `None` path.
- Add tests for override, normal preferred-model, and
missing-auto-review-catalog behavior.

## Validation

- `just test -p codex-core
guardian_review_records_missing_auto_review_model_in_request_metadata`
- `just test -p codex-core
guardian_review_uses_model_catalog_override_when_preferred_review_model_exists`
- `just test -p codex-core
guardian_review_uses_preferred_review_model_without_model_catalog_override`
- `git diff --check origin/main`

Won Park · 2026-06-12 15:50:30 -07:00

0605f9c14f

Emit plugin ID on MCP tool call analytics events (#27483 )

MCP tool-call items already carry the runtime-resolved plugin owner, but
the analytics reducer dropped that field. Forwarding the existing value
provides direct attribution without downstream server-name inference.

## Summary

- emit `plugin_id` on `codex_mcp_tool_call_event` payloads
- preserve `null` for MCP calls without a plugin owner
- verify the serialized field through the MCP item lifecycle test

## Test

- `cd codex-rs && just test -p codex-analytics`
- `cd codex-rs && just fix -p codex-analytics`
- `cd codex-rs && just fmt`

Chris Dong · 2026-06-11 09:55:53 -07:00

df9dd22248

[codex-analytics] Emit structured compaction codex errors (#27082 )

## Summary
- replace raw compaction `error` analytics with `codex_error_kind` and
`codex_error_http_status_code`
- derive compaction error telemetry from `CodexErr` using the same
`CodexErrKind` mapping and HTTP status helper used by turn events
- remove the pre-compact hook stop reason from the internal compaction
outcome now that it is no longer emitted as raw analytics text

## Why
Compaction `error` was a raw `CodexErr::to_string()` value, which can
carry free-form provider or user-derived text. Structured Codex error
fields preserve useful low-cardinality telemetry without sending the raw
string.

## Validation
- `just fmt`
- `just test -p codex-analytics`
- `just test -p codex-core
compact::tests::build_token_limited_compacted_history_appends_summary_message`

Attempted `just test -p codex-core`; the changed crate compiled, but the
full target failed in unrelated environment-dependent tests such as
missing helper binaries and shell snapshot timeouts.

rhan-oai · 2026-06-11 06:07:06 +00:00

8d9f33c87c

[codex-analytics] report cached input tokens for v2 compaction (#27103 )

## Summary

- add nullable `cached_input_tokens` to the compaction analytics event
- populate it from response usage for compaction v2
- leave it `null` for other compaction implementations

This adds visibility into prompt-cache usage for v2 compaction without
changing compaction behavior.

## Testing

- `just test -p codex-analytics`
- `just test -p codex-core
collect_compaction_output_accepts_additional_output_items`

rhan-oai · 2026-06-10 22:47:22 -07:00

383708e74e

[codex] Compact when comp_hash changes (#27520 )

## Summary
- snapshot `comp_hash` into `TurnContext` when the turn is created and
use that snapshot as the downstream source of truth
- persist the turn hash in rollout context and recover it into
previous-turn settings during resume and fork replay
- compact existing history with the previous model only when both
adjacent turns provide hashes and the values differ
- record `comp_hash_changed` as the compaction reason
- cover ordinary transitions, resume, and missing-hash compatibility
with end-to-end tests

## Why
History produced under one compaction-compatible model configuration may
not be safe to carry directly into another. Compacting at the turn
boundary converts that history before context updates and the new user
message are added. Persisting the turn snapshot in `TurnContextItem`
makes the same protection work after resuming a rollout.

A missing hash is not treated as evidence of incompatibility. `None →
Some`, `Some → None`, and `None → None` do not trigger compaction; only
`Some(previous) → Some(current)` with unequal values does.
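
The trigger rule above can be sketched as a small predicate. This is an illustrative sketch only: the function name and the use of `&str` for the hash are assumptions, not the signatures used in `codex-core`.

```rust
/// Hypothetical helper: decides whether existing history should be compacted
/// at a turn boundary, given the previous and current `comp_hash` snapshots.
fn comp_hash_changed(previous: Option<&str>, current: Option<&str>) -> bool {
    match (previous, current) {
        // Only two known, unequal hashes count as evidence of incompatibility.
        (Some(prev), Some(cur)) => prev != cur,
        // A missing hash on either side never triggers compaction.
        _ => false,
    }
}
```

Treating `None` as "unknown" rather than "different" keeps rollouts recorded before the hash existed from being compacted on resume.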

## Stack
- depends on #27532
- #27532 is based directly on `main`

## Testing
- `just test -p codex-core pre_sampling_compact_` — 6 passed
- `just test -p codex-core
turn_context_item_uses_turn_context_comp_hash_snapshot` — passed
- `just fix -p codex-core -p codex-protocol -p codex-analytics -p
codex-models-manager`

Ahmed Ibrahim · 2026-06-11 04:11:26 +00:00

ba4925b3c2

[codex-analytics] emit internally started turn events (#27392 )

## Why
Currently, the analytics reducer omits `codex_turn_event` for internally
started subagent turns:
- It uses `TurnState.connection_id` to select app-server client and
runtime metadata
- `turn/start` sets this field for client-started turns, while internal
subagent turns bypass that path
- Spawned child threads inherit the correct connection, but turn
emission does not use thread state

## What Changed
- Keeps explicit `TurnState.connection_id` authoritative for
client-started turns
- Falls back to the matching thread’s inherited connection when the turn
connection is absent
- Preserves completeness gates, event schema, and post-emission state
removal
- Extends subagent lifecycle test coverage
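
A minimal sketch of the fallback described above, assuming simplified stand-in types. `TurnState` and `ThreadState` here carry only a connection ID, and the `u64` ID type and the function name are hypothetical.

```rust
/// Hypothetical, simplified reducer state; the real types track more fields.
struct TurnState {
    connection_id: Option<u64>,
}

struct ThreadState {
    connection_id: Option<u64>,
}

/// Prefers the turn's explicit connection and falls back to the connection
/// inherited by the matching thread when the turn has none.
fn resolve_connection_id(turn: &TurnState, thread: Option<&ThreadState>) -> Option<u64> {
    turn.connection_id
        .or_else(|| thread.and_then(|t| t.connection_id))
}
```

Because the explicit value is checked first, client-started turns resolve exactly as before; only turns with no connection of their own consult thread state.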

## Verification
- `just test -p codex-analytics` (71 tests passed)
- `just fix -p codex-analytics`
- `just fmt`

marksteinbrick-oai · 2026-06-10 15:35:41 -07:00

b39f943a63

[codex] Retry transient Guardian review failures (#27062 )

## Background

Codex can use **Auto Review** for permission requests. Instead of asking
the user immediately, Codex starts a separate locked-down reviewer
session called **Guardian**, which returns a structured `allow` or
`deny` assessment.

The Guardian reviewer is itself a Codex session, so its model request
can fail for transient infrastructure reasons such as model overload,
HTTP connection failure, or response-stream disconnect. Today, any such
failure immediately ends the Auto Review attempt and blocks the action.

This PR adds bounded retries for failures that the existing protocol
explicitly identifies as transient.

Linear context:
[CA-539](https://linear.app/openai/issue/CA-539/retry-auto-review-infrastructure-failures-and-fall-back-to-manual)

## What changes

A Guardian review can now make at most **three total attempts**:

1. Run the review normally.
2. Retry after a jittered delay of roughly 180–220 ms if the first
attempt fails with an eligible error.
3. Retry after a jittered delay of roughly 360–440 ms if the second
attempt also fails with an eligible error.

All attempts share the original review deadline. Jitter spreads retries
from concurrent clients to reduce synchronized load during broader
outages. The retries do not reset the user's maximum wait time, and the
backoff waits terminate early if the review is cancelled or the deadline
expires.
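
The delay schedule can be sketched as follows. The 200 ms base and the 10% jitter fraction are assumptions chosen to reproduce the "roughly 180–220 ms" and "360–440 ms" windows; the PR does not state the constants, and the function names are hypothetical. The random jitter is passed in by the caller so the sketch needs no external crate.

```rust
use std::time::Duration;

// Assumed constants: a 200 ms base with +/-10% jitter matches the windows
// quoted above. The real shared backoff helper may use different values.
const BASE_DELAY_MS: f64 = 200.0;
const JITTER_FRACTION: f64 = 0.10;
const MAX_ATTEMPTS: u32 = 3;

/// Delay before the retry that follows failed attempt `failed_attempt`
/// (1-based). `unit_jitter` is a caller-supplied random value in [-1.0, 1.0].
/// Returns `None` once the attempt budget is exhausted.
fn retry_delay(failed_attempt: u32, unit_jitter: f64) -> Option<Duration> {
    if failed_attempt == 0 || failed_attempt >= MAX_ATTEMPTS {
        return None;
    }
    // Exponential growth: 200 ms after the first failure, 400 ms after the second.
    let exponential = BASE_DELAY_MS * 2f64.powi(failed_attempt as i32 - 1);
    let jittered = exponential * (1.0 + JITTER_FRACTION * unit_jitter.clamp(-1.0, 1.0));
    Some(Duration::from_secs_f64(jittered / 1000.0))
}

/// Caps a backoff wait by the time left before the shared review deadline,
/// so retries never extend the user's maximum wait.
fn wait_within_deadline(delay: Duration, remaining: Duration) -> Option<Duration> {
    if remaining.is_zero() {
        None
    } else {
        Some(delay.min(remaining))
    }
}
```

Capping each wait by the remaining time, instead of starting a fresh timeout per attempt, is what keeps all three attempts inside the original deadline.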

Before retrying, the existing Guardian session lifecycle decides whether
the session remains usable. Healthy trunks are reused, broken trunks are
removed by the existing cleanup path, and ephemeral sessions continue to
clean themselves up.

The review still emits one logical lifecycle to clients. Recoverable
intermediate failures do not produce warnings or terminal events.

## Retry policy

### Retried up to twice

- model/server overload
- HTTP connection failure
- response-stream connection failure
- response-stream disconnect
- internal server error
- a final reviewer message that cannot be parsed as the required
Guardian assessment

### Not retried

- bad or invalid requests
- authentication failures
- usage limits
- cyber-policy failures
- errors without a structured category
- a request that already exhausted the lower-level Responses retry
budget
- a completed Guardian turn with no assessment payload
- prompt-construction failures
- Guardian review timeout
- cancellation or abort
- a valid `deny` assessment

The session-error classification uses `ErrorEvent.codex_error_info`; it
does not inspect error-message strings.
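
As a rough illustration of classifying on a structured category rather than on message text, the policy might look like the sketch below. `ErrorCategory` and its variant names are hypothetical stand-ins and do not mirror the real `CodexErrorInfo` enum.

```rust
/// Hypothetical stand-in for the structured error category. Variant names
/// are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCategory {
    ServerOverloaded,
    HttpConnectionFailed,
    ResponseStreamConnectionFailed,
    ResponseStreamDisconnected,
    InternalServerError,
    BadRequest,
    Unauthorized,
    UsageLimitExceeded,
    CyberPolicy,
    ResponseTooManyFailedAttempts,
}

/// Returns true only for categories that are explicitly transient.
/// An absent category (`None`) is never retried.
fn is_retryable(category: Option<ErrorCategory>) -> bool {
    use ErrorCategory::*;
    match category {
        Some(
            ServerOverloaded
            | HttpConnectionFailed
            | ResponseStreamConnectionFailed
            | ResponseStreamDisconnected
            | InternalServerError,
        ) => true,
        Some(
            BadRequest
            | Unauthorized
            | UsageLimitExceeded
            | CyberPolicy
            | ResponseTooManyFailedAttempts,
        ) => false,
        None => false,
    }
}
```

Listing every variant without a wildcard arm means a newly added category fails to compile until someone decides whether it is transient.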

## Implementation notes

- `wait_for_guardian_review` preserves the complete `ErrorEvent`,
including structured `codex_error_info`.
- Guardian session failures preserve the original message and optional
structured `CodexErrorInfo`.
- The retry policy classifies the explicitly transient `CodexErrorInfo`
variants; unknown, absent, and deterministic categories are not retried.
- The Guardian session manager receives the caller's deadline rather
than creating a new timeout per attempt.
- Analytics record the final `attempt_count`.
- Retry orchestration does not add a separate session-cleanup protocol;
it relies on the existing trunk and ephemeral lifecycle decisions.

## Automated testing

Focused Guardian coverage verifies:

- every supported transient `CodexErrorInfo` is classified as retryable,
while absent and non-transient categories are not;
- structured transient session failure -> retry -> approval with the
healthy trunk reused;
- two invalid Guardian responses -> third attempt -> approval, with
exactly three requests;
- three invalid responses -> existing fail-closed result, with exactly
three requests and one terminal lifecycle;
- valid denial, missing payload, invalid request, timeout, cancellation,
and prompt/session construction failures are not retried;
- retry eligibility ends after the third attempt;
- retry delays use the shared exponential backoff helper and remain
within the expected jitter bounds;
- cancellation and deadline expiry interrupt the backoff wait;
- healthy trunks are reused across retryable failures;
- broken event streams remove the trunk through the existing lifecycle
cleanup;
- an ephemeral retry does not disturb a concurrent trunk review.

Validation performed:

- `just test -p codex-core guardian_review_
guardian_ephemeral_retry_preserves_parallel_trunk_and_fork_history
run_review_removes_trunk_when_event_stream_is_broken` — **42 passed**;
- `just test -p codex-analytics` — **71 passed**;
- scoped Clippy fixes for `codex-core` and `codex-analytics` passed.

A prior full `codex-core` run had unrelated environment-sensitive
failures outside Guardian coverage.

## Manual QA

The focused integration tests use the local mock Responses server to
inspect exact request counts and emitted lifecycle events. They confirm
that retries are internal, a successful later attempt supplies the final
decision, non-retryable failures issue only one request, and exhausted
retries emit only one terminal result.

kbazzi · 2026-06-10 11:46:57 -07:00

ccf1a18518

[codex] Fix post-merge analytics integration failures (#27285 )

## Why

Recent merges left `main` with analytics integration build failures.
Local Cargo runs also made the trimmed-skills test depend on
developer-installed skills, while Bazel used an isolated home.

## What changed

- Clone `thread_metadata.thread_source` when constructing goal analytics
event parameters.
- Group app-server thread extension inputs into
`ThreadExtensionDependencies`.
- Isolate the trimmed-skills test home so its exact fixture count is
stable across Cargo and Bazel.

## Validation

- `cargo check -p codex-analytics`
- `just test -p codex-analytics` (71 tests)
- `just test -p codex-app-server` (837 tests; one unrelated zsh-fork
timeout passed on retry)

Adam Perry @ OpenAI · 2026-06-09 20:52:09 -07:00

e0cb4ede4e

[codex-analytics] emit goal lifecycle analytics (#27078 )

## Why
- Currently, there is no analytics event for `/goal` behavior
- Existing events cannot identify goal execution or its resulting
outcome
- The original update in
[#26182](https://github.com/openai/codex/pull/26182) was implemented
before `/goal` moved into `codex-goal-extension`.

## What Changed
- Adds `codex_goal_event` serialization and enrichment to
`codex-analytics`
- Emits goal events from the canonical `codex-goal-extension` mutation
and accounting paths:
  - `created` when a new logical goal is persisted
  - `usage_accounted` when cumulative goal usage is persisted
  - `status_changed` when the stored goal status changes
  - `cleared` when the goal is deleted
- Preserves causal `turn_id` for turn-driven events and uses null
attribution for external or idle lifecycle events
- Changes goal deletion to return the deleted row so `cleared` retains
the stable goal ID

## Event Details

Includes standard analytics metadata along with goal-specific fields:
- `goal_id`: Stable ID stored in the local SQLite goal row and shared
across the goal's events
- `event_kind`: Observed operation (one of the four lifecycle events
listed above)
- `goal_status`: Resulting or last stored status: `active`, `paused`,
`blocked`, `usage_limited`, etc.
- `has_token_budget`: Indicates whether a token budget is configured
- `turn_id`: Causal turn ID, or null when no causal turn exists
- `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted`
events; null otherwise
- `cumulative_time_accounted_seconds`: Cumulative active time on
`usage_accounted` events; null otherwise

## Validation
- `just test -p codex-analytics -p codex-state -p codex-goal-extension`
- `just test -p codex-core -E 'test(/goal/)'`
- `just test -p codex-app-server`
- `cargo build -p codex-analytics -p codex-core -p codex-state -p
codex-app-server`

marksteinbrick-oai · 2026-06-09 18:45:54 -07:00

608b8b1cc6

[codex-analytics] add extensible feature thread sources (#27063 )

## Why
- `ThreadSource` currently defines a closed set of core-owned values
- Product features also create threads for background or scheduled work
- Adding every product-specific value to the core enum would require
repeated `codex-rs` protocol changes
- Feature-backed values let product callers provide precise attribution
while preserving the existing core classifications

## What Changed
- Adds `ThreadSource::Feature(String)` for app-owned thread source
values
- Represents all app-server v2 thread sources as scalar strings, so a
feature source is supplied as `"automation"`
- Persists and emits the feature's plain string label, so `"automation"`
produces `thread_source="automation"` in analytics
- Keeps `user`, `subagent`, and `memory_consolidation` as explicit
core-owned values and regenerates the app-server schemas and TypeScript
bindings
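
A minimal sketch of the scalar-label mapping, using only the standard library. The core variant names and the `as_label` / `from_label` helpers are assumptions for illustration (the real types presumably derive their serialization), and treating every unrecognized label as a feature source is likewise an assumption.

```rust
/// Simplified mirror of the thread source enum. Only `Feature(String)` is
/// named in this change; the other variant names are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ThreadSource {
    User,
    Subagent,
    MemoryConsolidation,
    Feature(String),
}

impl ThreadSource {
    /// Scalar label as it would appear in `thread_source`.
    fn as_label(&self) -> &str {
        match self {
            ThreadSource::User => "user",
            ThreadSource::Subagent => "subagent",
            ThreadSource::MemoryConsolidation => "memory_consolidation",
            ThreadSource::Feature(label) => label.as_str(),
        }
    }

    /// Assumption: any label that is not core-owned is a feature source.
    fn from_label(label: &str) -> Self {
        match label {
            "user" => ThreadSource::User,
            "subagent" => ThreadSource::Subagent,
            "memory_consolidation" => ThreadSource::MemoryConsolidation,
            other => ThreadSource::Feature(other.to_owned()),
        }
    }
}
```

With this shape, a product feature can introduce a new label such as `"automation"` without another change to the core enum.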

## Verification
- `just write-app-server-schema`
- `cargo check --workspace`
- `just test -p codex-protocol
feature_thread_source_serializes_as_its_app_owned_label`
- `just test -p codex-app-server-protocol
thread_sources_round_trip_as_scalar_labels`
- `cargo test -p codex-analytics
thread_initialized_event_serializes_expected_shape`
- `just fmt`

marksteinbrick-oai · 2026-06-09 12:27:10 -07:00

a71e040df5

multi-agent: add path-based v2 activity tracking (#27007 )

## Why

Multi-agent v2 identifies agents by canonical paths, but its tool
handlers still emitted the larger legacy collaboration begin/end events
built around nickname and role metadata. App-server, rollout-trace,
analytics, and TUI consumers therefore lacked one compact path-based
completion signal that behaved consistently across live events and
replay.

The TUI also needs a bounded `/agent` status surface for v2 agents. It
should use recent local activity for previews, refresh liveness without
loading full histories, and keep the legacy picker available when no
path-backed v2 agent is known.

## What changed

- Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
`interrupt_agent` legacy lifecycle emissions with a success-only
`SubAgentActivity` event. The event records the tool call ID, occurrence
time, affected thread, canonical agent path, and `started`,
`interacted`, or `interrupted` kind.
- Expose the activity as a completion-only app-server v2
`subAgentActivity` thread item in live notifications and reconstructed
history, regenerate the protocol schemas, and count it in sub-agent tool
analytics.
- Track canonical paths from live activity and loaded-thread metadata in
the TUI, and render the activity in live and replayed transcripts.
- Make `/agent` list running path-backed agents with summaries from
bounded local event buffers. Each summary is capped at 240 graphemes,
the scan is capped at six recent items, only the last three wrapped
lines are shown, and command output is omitted. Liveness falls back to
metadata-only `thread/read` when local turn state is unavailable.
- Persist the activity as a terminal rollout-trace runtime payload and
reduce it to the corresponding spawn, send, follow-up, or close
interaction edge. `interrupt_agent` is classified as a close-edge
operation.
- Preserve the legacy picker when no path-backed v2 agent is known.

## Compatibility

App-server v2 clients that consumed `collabAgentToolCall` begin/end
pairs for these tools must handle the new completion-only
`subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.

## Screenshot

<img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
/>

## Testing

- `just test -p codex-app-server-protocol`
- `just test -p codex-rollout-trace`
- Added focused coverage for activity analytics, terminal trace
serialization, spawn-edge reduction, `interrupt_agent` classification,
TUI status rendering without aggregated command output, and clearing
stale running state after a completed turn.

jif · 2026-06-09 12:14:48 +02:00

fae2709320

[codex-analytics] stop sending codex error subreason (#27060 )

## Summary
- stop emitting `codex_error_subreason` on `codex_turn_event`
- remove the transient analytics fact plumbing that copied
`CodexErr::InvalidRequest(String)` into the event
- update analytics serialization coverage accordingly

## Why
`codex_error_subreason` is a free-form copy of `InvalidRequest(String)`,
including raw provider 400 bodies in some paths. That makes it unsafe as
an analytics field because it can carry user-derived or sensitive text.

## Validation
- `just fmt`
- `just test -p codex-analytics`

rhan-oai · 2026-06-08 21:29:06 +00:00

ee6c91d5cf

[codex-analytics] report compaction analytics details (#26680 )

## Why

Compaction analytics adds retained image count and compaction summary
output tokens for v1.5 specifically.

## What changed

- Add nullable `retained_image_count` and `compaction_summary_tokens`
fields to `codex_compaction_event`.
- Populate them only for `responses_compaction_v2`: retained images come
from the retained v2 compacted history, and summary tokens come from
`response.completed.token_usage.output_tokens`.
- Leave local and legacy remote compaction events as `null` for these
detail fields.

## Verification

- `just fmt`
- `just fix -p codex-core`
- `just test -p codex-core
build_v2_compacted_history_counts_retained_input_images`
- `git diff --check`

rhan-oai · 2026-06-08 10:52:31 -07:00

f1c18df9ae

[codex] Add turn profiling analytics (#26484 )

## Summary

Add flat profiling fields to `codex_turn_event` so analytics can explain
where turn wall-clock time is spent without changing tool execution
behavior.

The profile reports:
- time before the first sampling request
- sampling time across all attempts and follow-ups
- overhead between sampling requests
- time blocked in the post-sampling tool drain
- time after the final sampling request
- sampling request and retry counts

## Implementation

- Extend the existing turn timing state with constant-memory phase
accounting and one RAII phase guard.
- Observe sampling and the existing post-sampling drain only at turn
orchestration boundaries.
- Keep tool runtime, tool futures, response item handling, and turn
lifecycle values unchanged.
- Add the profiling fields directly to the existing analytics turn event
without changing app-server protocol or rollout persistence.
- Use the existing turn `status` to distinguish completed, failed, and
interrupted profiles.

Exact sampling/tool overlap is intentionally omitted because measuring
tool completion accurately would require hooks in the tool execution
path.
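
To illustrate the idea of constant-memory phase accounting with an RAII guard, here is a small sketch. The type, field, and method names are hypothetical and only two phases are modeled; the real turn timing state is not shown in this description.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy)]
enum Phase {
    Sampling,
    ToolDrain,
}

/// Constant-memory accumulators: one duration per phase plus a counter,
/// regardless of how many sampling requests a turn makes.
#[derive(Debug, Default)]
struct TurnProfile {
    sampling: Duration,
    tool_drain: Duration,
    sampling_requests: u32,
}

/// RAII guard: elapsed time is added to the profile when the guard drops,
/// which also covers early returns and `?` propagation.
struct PhaseGuard<'a> {
    profile: &'a mut TurnProfile,
    phase: Phase,
    started: Instant,
}

impl TurnProfile {
    fn enter(&mut self, phase: Phase) -> PhaseGuard<'_> {
        if matches!(phase, Phase::Sampling) {
            self.sampling_requests += 1;
        }
        PhaseGuard { profile: self, phase, started: Instant::now() }
    }
}

impl Drop for PhaseGuard<'_> {
    fn drop(&mut self) {
        let elapsed = self.started.elapsed();
        match self.phase {
            Phase::Sampling => self.profile.sampling += elapsed,
            Phase::ToolDrain => self.profile.tool_drain += elapsed,
        }
    }
}
```

A caller would hold the guard for the duration of a phase, for example `let _guard = profile.enter(Phase::Sampling);` around a sampling request. Because the guard borrows the profile mutably, this sketch allows only one active phase at a time.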

## Validation

- Add app-server end-to-end coverage for a single-sampling turn with no
blocking tool work.
- Add app-server end-to-end coverage for `request_user_input` blocking
followed by a second sampling request.
- CI is running on the PR; tests were not executed locally per
repository guidance.

Ahmed Ibrahim · 2026-06-05 11:27:10 -07:00

8d72fb6de9

[codex-analytics] emit forked thread id on initialization (#26248 )

## Why
- Thread initialization analytics do not identify the source thread for
forked threads.
- The session viewer needs this lineage to construct thread trees.
- Depends on openai/openai#987854. Do not release this change before
that backend schema change is deployed.

## What Changed
- Adds optional `forked_from_thread_id` to `codex_thread_initialized`.
- Populates it from the existing thread fork lineage for app-server and
in-process subagent initialization paths.
- Keeps it null for non-forked threads.

## Verification
- `just fmt`
- `just test -p codex-analytics`
- `just test -p codex-app-server
thread_fork_tracks_thread_initialized_analytics`

kbazzi · 2026-06-04 11:24:12 -07:00

9e41f8ddbe

log plugin MCP server names (#26002 )

## Summary
- emit the plugin capability summary's exact MCP server names in
`codex_plugin_used`

## Test
- `just test -p codex-analytics`
- `just test -p codex-core
explicit_plugin_mentions_track_plugin_used_analytics`
- `just fix -p codex-analytics`

Chris Dong · 2026-06-03 16:06:52 -07:00

4d4837c495

Populate workspace kind on Codex turn events (#25135 )

## Summary
- carry `workspace_kind` from Responses API client metadata into the
turn resolved analytics fact
- serialize the optional value on `codex_turn_event`
- cover both the turn metadata source and turn event serialization

`workspace_kind` indicates whether a thread had a project attached or
was projectless. It is an indicator of who is adopting Codex for
knowledge work outside of coding.

## Testing
- `env UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just fmt`
- `env PATH=/private/tmp/cargo-tools/bin:$PATH
CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just test -p codex-analytics`
- `env PATH=/private/tmp/cargo-tools/bin:$PATH
CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just test -p codex-core turn_metadata`

Paired with openai/openai#970661, which keeps forwarding the same
metadata key through Responses API headers.

knittel-openai · 2026-06-02 12:46:14 -07:00

b794182ea7

Propagate permission approval environment id (#25862 )

## Stack

1. #25850 - Key request-permission grants by environment: stores and
applies sticky permission grants per environment id.
2. #25858 - Add `environmentId` to `request_permissions`: lets the model
target a selected environment and resolves relative permission paths
against it.
3. This PR (#25862) - Propagate permission approval environment id:
carries the selected environment id through approval events, app-server
requests, TUI prompts, and delegate forwarding.
4. #25867 - Add remote request permissions integration coverage:
verifies the selected remote environment across request, approval, grant
reuse, and exec.

This PR is stacked on #25858, and #25867 is stacked on this PR.

## Why

PR2 lets the model bind a `request_permissions` call to a selected
environment, but the approval event and client-facing request still
needed to carry that binding. For CCA, the user-facing prompt and
delegated approval path should know which environment the grant applies
to instead of relying on cwd alone.

## What Changed

- Added optional `environmentId` to `RequestPermissionsEvent`.
- Emit the selected environment id from core permission approval events.
- Preserve the environment id through delegate forwarding, including
cwd-based delegated requests.
- Added `environmentId` to app-server permission approval params,
generated schema/TypeScript artifacts, and README examples.
- Preserve and display the environment id in TUI permission approval
prompts.
- Updated focused core, app-server protocol, and TUI conversion
coverage.

## Testing

Not run locally per instruction. Performed read-only `git diff --check`.

jif · 2026-06-02 21:09:34 +02:00

9de568372d

[codex-analytics] Track CodexErr details in turn analytics (#25707 )

## Summary
- add analytics-only `CodexErr` telemetry to `codex_turn_event` while
leaving existing `turn_error` unchanged
- record terminal `CodexErr` facts from core immediately before the
existing turn error event is sent
- emit source-truth `codex_error_*` fields for downstream analytics,
including the raw `CodexErr::InvalidRequest(String)` message as
`codex_error_subreason`

## Validation
- `just test -p codex-analytics`
- attempted `just test -p codex-core`, but the local run timed out
across unrelated integration suites in this environment and is not being
used as validation

rhan-oai · 2026-06-02 11:40:35 -07:00

8e4b92d294

store and expose parent_thread_id on Threads (#25113 )

## Why

A review discussion on #24161
(https://github.com/openai/codex/pull/24161#discussion_r3325692763)
revealed a subagent data-modeling issue: we overloaded `forked_from_id`
to also mean `parent_thread_id`. That is incorrect, since guardian and
review subagents can be subagents without forking the main thread's
history.

The solution here is to explicitly store a new `parent_thread_id` on
`SessionMeta`, alongside `forked_from_id` which already exists. While
we're at it, also expose it in the app-server protocol on the `Thread`
object.

A thread->subagent relationship and a fork of thread history are
orthogonal concepts.
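
The distinction can be pictured with a trimmed-down struct. `Lineage` and its helper methods are hypothetical; only the two field names come from this description.

```rust
/// Hypothetical, trimmed-down view of the two lineage fields.
#[derive(Debug, Default)]
struct Lineage {
    /// Thread whose history was copied into this one, if any.
    forked_from_id: Option<String>,
    /// Thread that spawned this one as a subagent, if any.
    parent_thread_id: Option<String>,
}

impl Lineage {
    fn is_subagent(&self) -> bool {
        self.parent_thread_id.is_some()
    }

    fn is_fork(&self) -> bool {
        self.forked_from_id.is_some()
    }
}
```

A guardian reviewer would set only `parent_thread_id`, while a user-initiated fork would set only `forked_from_id`; a subagent that also copies history could set both.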

## What Changed

- Added top-level `parent_thread_id` persistence on `SessionMeta` and
runtime/session plumbing through `SessionConfiguredEvent`,
`CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
`TurnContext`, and `ModelClient`.
- Made turn metadata, request headers, analytics, and subagent-start
events read the separate runtime/top-level parent field instead of
deriving general parent lineage from `SessionSource` or
`forked_from_thread_id`.
- Passed parent lineage separately at delegated subagent, review,
guardian, agent-job, and multi-agent spawn construction sites;
copied-history fork lineage remains derived only from `InitialHistory`.
- Persisted and exposed parent lineage through rollout/thread-store
projections and app-server v2 `Thread.parentThreadId`.
- Updated app-server README text and regenerated app-server schema
fixtures for the additive `parentThreadId` response field.

Owen Lin · 2026-06-01 04:33:20 +00:00

cf0911076f

Add cloud-managed config layer support (#24620 )

## Summary

PR 3 of 5 in the cloud-managed config client stack.

Adds enterprise-managed cloud config as a first-class config layer
source. The layer metadata is preserved through config loading,
diagnostics, debug output, hook attribution, and app-server protocol
surfaces.

## Details

- Enterprise-managed config becomes a normal config layer source with
backend-supplied `id` and display `name` attached for provenance.
- These layers are designed to behave like non-file managed config: they
can surface syntax/type diagnostics by layer name even though there is
no physical config file.
- Relative path settings are resolved from a stored config base so
cloud-delivered config remains consistent with existing MDM-delivered
config semantics.
- Hook attribution distinguishes config-delivered hooks from
requirements-delivered hooks via `HookSource::CloudManagedConfig`.
- This remains pull-based and snapshot-oriented; the PR adds layer
identity/diagnostics, not dynamic reload behavior.

## Validation

Validated through the targeted stack checks after rebasing onto current
`main`:

- Rust crate tests for
config/hooks/cloud-config/backend-client/app-server-protocol
- Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
tests
- Python generated-file contract test
- `cargo shear --deny-warnings`
- Targeted `argument-comment-lint` for config/hooks

joeflorencio-openai · 2026-05-31 15:54:31 -07:00

8a556296f0

Add subagent lineage metadata for responsesapi (#24161 )

## Why

We recently added `forked_from_thread_id` which lets us trace where a
thread's _context_ comes from, but we also want to understand subagent
lineage (e.g. which parent thread spawned this subagent? what kind of
subagent is it?) which is orthogonal.

This PR adds `parent_thread_id` and `subagent_kind` to the
`x-codex-turn-metadata` header sent to ResponsesAPI.

## What changed

- Adds `parent_thread_id` and `subagent_kind` to core-owned
`x-codex-turn-metadata`.
- Restores persisted `SessionSource` and `ThreadSource` from resumed
session metadata so cold-resumed subagent threads keep their lineage on
later Responses API requests.
- Centralizes parent-thread extraction on `SessionSource` /
`SubAgentSource` and reuses it in the Responses client, analytics, agent
control, and state parsing paths.
- Extends reserved-key, git-enrichment, thread-spawn, and app-server v2
metadata coverage for the new lineage fields.

## Verification

- Not run locally per request.
- Added focused coverage in `core/src/turn_metadata_tests.rs` and
`app-server/tests/suite/v2/client_metadata.rs`.

Owen Lin · 2026-05-29 11:28:12 -07:00

fc9cf62efb

[codex] Add user input client ids (#24653 )

## Summary

Adds an optional `clientId` field to app-server v2 `UserInput` and
carries it through the core `UserInput` model so clients can correlate
echoed user input items without relying on payload equality.

## Details

- Adds `client_id: Option<String>` to core `UserInput` variants.
- Exposes the v2 app-server field as `clientId` on the wire and in
generated TypeScript.
- Preserves the id when converting between app-server v2 and core
protocol types.
- Regenerates app-server schema fixtures.

## Validation

- `just fmt`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-protocol`
- `git diff --check`

Alexi Christakis · 2026-05-28 14:54:39 -07:00

e92c952b2e

feat(app-server): include turns page on thread resume (#23534 )

## Summary

The client currently calls `thread/resume` to establish live updates and
immediately follows it with `thread/turns/list` to hydrate recent turns.
This lets `thread/resume` return that page directly, eliminating a round
trip and the ordering/deduplication gap between the two calls.

Experimental clients opt in with `initialTurnsPage: { limit,
sortDirection, itemsView }`. The response returns `initialTurnsPage` as
a `TurnsPage`, including cursors for paging further back in history.
Keeping the controls in a nested opt-in object provides the useful
`thread/turns/list` knobs without spreading page-specific parameters
across `thread/resume`.

## Verification

- `just fmt`
- `just write-app-server-schema --experimental`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server
thread_resume_initial_turns_page_matches_requested_turns_list_page
--tests`
- `cargo test -p codex-app-server
thread_resume_rejoins_running_thread_even_with_override_mismatch
--tests`
- `just fix -p codex-app-server-protocol -p codex-app-server`

Brent Traut · 2026-05-28 09:18:13 -07:00

2a1158b8e2

[codex-analytics] add grouped session id to runtime events (#24655 )

## Why
- Runtime analytics events report `thread_id`, which identifies the
individual thread emitting an event
- They don't report `session_id`, which identifies the shared session
for a root thread and its subagent threads
- Emitting both identifiers allows analytics to group related activity

## What Changed
- Adds `session_id` to relevant analytics events (thread_initialized,
turn, turn_steer, compaction, guardian_review)
- Tracks each thread's session ID in the analytics reducer so subsequent
thread scoped events emit the same value
- Carries the shared session ID through subagent initialization
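
A minimal sketch of the grouping this enables on the consumer side, assuming a simplified event row that carries only an event type and the two identifiers. The row shape and the sample ids are hypothetical, not the real payload.

```ts
// Hypothetical, simplified row: only the two identifiers discussed above.
interface RuntimeEventRow {
  event_type: string;
  thread_id: string;
  session_id: string;
}

// Group rows by session_id so a root thread and its subagent threads
// end up in the same bucket even though their thread_id values differ.
function groupEventsBySession(
  rows: RuntimeEventRow[],
): Record<string, RuntimeEventRow[]> {
  const grouped: Record<string, RuntimeEventRow[]> = {};
  for (const row of rows) {
    const bucket = grouped[row.session_id] ?? [];
    bucket.push(row);
    grouped[row.session_id] = bucket;
  }
  return grouped;
}

const sampleRows: RuntimeEventRow[] = [
  { event_type: "turn", thread_id: "root-thread", session_id: "session-1" },
  { event_type: "turn", thread_id: "subagent-thread", session_id: "session-1" },
  { event_type: "compaction", thread_id: "other-thread", session_id: "session-2" },
];
const rowsBySession = groupEventsBySession(sampleRows);
```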

## Verification
- `just test -p codex-analytics` validates event payloads and subagent
session grouping.
- Focused `codex-app-server` tests validate session IDs for thread,
turn, and steer events.
- Focused `codex-core` tests validate root and subagent session ID
propagation.

marksteinbrick-oai · 2026-05-26 16:38:46 -07:00

487521733b

Add experimental turn additional context (#24154 )

## Summary

Adds experimental `additionalContext` support to `turn/start` and
`turn/steer` so clients can provide ephemeral external context, such as
browser or automation state, without turning that plumbing into a
visible user prompt or triggering user-prompt lifecycle behavior.

## API Shape

The parameter shape is:

```ts
additionalContext?: Record<string, {
  value: string
  kind: "untrusted" | "application"
}> | null
```

Example:

```json
{
  "additionalContext": {
    "browser_info": {
      "value": "Active tab is CI failures.",
      "kind": "untrusted"
    },
    "automation_info": {
      "value": "CI rerun is in progress.",
      "kind": "application"
    }
  }
}
```

The keys are opaque and caller-defined.

## Context Injection

When provided, accepted entries are inserted into model context as
hidden contextual message items, not as visible thread user-message
items.

`kind: "untrusted"` entries are inserted with role `user`:

```text
<external_${key}>${value}</external_${key}>
```

`kind: "application"` entries are inserted with role `developer`:

```text
<${key}>${value}</${key}>
```

Values are not escaped. Each value is truncated to approximately 1k
tokens before wrapping.

For `turn/start`, accepted additional context is inserted before normal
user input. For `turn/steer`, additional context is merged only when the
steer includes non-empty user input; context-only steers are still
rejected as empty input.
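
The role routing and wrapping rules above can be sketched as follows. The type and function names are hypothetical, and the truncation step is left out because the token approximation is not specified here.

```ts
interface AdditionalContextEntrySketch {
  value: string;
  kind: "untrusted" | "application";
}

interface HiddenContextItemSketch {
  role: "user" | "developer";
  text: string;
}

// Wrap one entry as described above. Values are inserted unescaped, as the
// description states; truncation to roughly 1k tokens is omitted here.
function renderContextEntry(
  key: string,
  entry: AdditionalContextEntrySketch,
): HiddenContextItemSketch {
  if (entry.kind === "untrusted") {
    return {
      role: "user",
      text: `<external_${key}>${entry.value}</external_${key}>`,
    };
  }
  return { role: "developer", text: `<${key}>${entry.value}</${key}>` };
}

const browserItem = renderContextEntry("browser_info", {
  value: "Active tab is CI failures.",
  kind: "untrusted",
});
const automationItem = renderContextEntry("automation_info", {
  value: "CI rerun is in progress.",
  kind: "application",
});
```

With the example payload from the API Shape section, `browserItem` is a `user`-role item wrapped in `<external_browser_info>` tags and `automationItem` is a `developer`-role item wrapped in `<automation_info>` tags.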

## Dedupe Strategy

`AdditionalContextStore` lives on session state and stores the latest
complete additional-context map.

Each `turn/start` or non-empty `turn/steer` treats its
`additionalContext` as the current complete set of values. Entries are
injected only when the key is new or the exact entry for that key
changed, including `value` or `kind`. After merging, the store is
replaced with the provided map, so omitted keys are removed from the
retained set and can be injected again later if reintroduced.

Omitting `additionalContext`, passing `null`, or passing an empty object
resets the store to empty and injects nothing.
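
A minimal sketch of this strategy, assuming a plain in-memory map. The class below is illustrative only and is not the actual `AdditionalContextStore` implementation; `merge` returns the keys that would be injected for the turn.

```ts
interface ContextEntrySketch {
  value: string;
  kind: "untrusted" | "application";
}
type ContextMapSketch = Record<string, ContextEntrySketch>;

class AdditionalContextStoreSketch {
  private retained: ContextMapSketch = {};

  // Returns the keys to inject, then replaces the retained map with the
  // provided one so omitted keys are forgotten.
  merge(provided: ContextMapSketch | null | undefined): string[] {
    const next: ContextMapSketch = provided ?? {};
    const toInject: string[] = [];
    for (const key of Object.keys(next)) {
      const current = next[key];
      if (current === undefined) {
        continue;
      }
      const previous = this.retained[key];
      if (
        previous === undefined ||
        previous.value !== current.value ||
        previous.kind !== current.kind
      ) {
        toInject.push(key);
      }
    }
    this.retained = { ...next };
    return toInject;
  }
}

const contextStore = new AdditionalContextStoreSketch();
const tabA: ContextMapSketch = { browser_info: { value: "tab A", kind: "untrusted" } };
const tabB: ContextMapSketch = { browser_info: { value: "tab B", kind: "untrusted" } };

const firstInjection = contextStore.merge(tabA); // new key: injected
const repeatInjection = contextStore.merge(tabA); // unchanged: skipped
const changedInjection = contextStore.merge(tabB); // value changed: injected
const resetInjection = contextStore.merge(null); // store reset, nothing injected
const readdedInjection = contextStore.merge(tabB); // reintroduced: injected again
```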

## What Changed

- Threads experimental v2 `additionalContext` through app-server into
core turn start and steer handling.
- Adds separate contextual fragment types for untrusted user-role
context and application developer-role context.
- Uses pending response input items so additional context can be
combined with normal user input without treating it as prompt text.
- Adds integration coverage for start/steer flow, role routing,
dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
behavior, empty context-only steer rejection, external-fragment marker
matching, and truncation.

pakrym-oai · 2026-05-26 13:02:34 -07:00

768848ab6f

[codex-analytics] split compaction v2 analytics implementation (#24146 )

## What changed

- Add a distinct `responses_compaction_v2` value for
`CodexCompactionEvent.implementation`.
- Emit that value from the remote compaction v2 path.
- Keep local compaction as `responses` and legacy `/responses/compact`
as `responses_compact`.

## Why

Remote compaction v2 and local prompt-based compaction were both
reported as `responses`, which made the analytics table collapse two
different compaction mechanisms into one implementation bucket.
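
The resulting three-way split can be sketched as a simple mapping. The path labels on the left are hypothetical names for this illustration; only the three `implementation` values come from this PR.

```ts
// Hypothetical labels for the three compaction code paths.
type CompactionPathSketch = "local" | "legacy_responses_compact" | "remote_v2";

// The implementation values named in this PR.
type CompactionImplementationSketch =
  | "responses"
  | "responses_compact"
  | "responses_compaction_v2";

function implementationFor(
  path: CompactionPathSketch,
): CompactionImplementationSketch {
  switch (path) {
    case "local":
      return "responses";
    case "legacy_responses_compact":
      return "responses_compact";
    case "remote_v2":
      // Previously reported as "responses", which merged it with local compaction.
      return "responses_compaction_v2";
  }
}

const remoteV2Implementation = implementationFor("remote_v2");
const localImplementation = implementationFor("local");
```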

## Validation

- `just fmt`
- `just test -p codex-analytics`

`just test -p codex-core` was started locally, but this PR is
intentionally being pushed for CI to finish the remaining validation.

rhan-oai · 2026-05-22 21:34:22 +00:00

6419402a7c

[codex] Add plugin id to MCP tool call items (#23737 )

Add the owning plugin id to MCP tool call items so we can better filter
them at the plugin level.

## Summary
- add optional `plugin_id` to MCP tool-call items and legacy begin/end
events
- propagate plugin metadata into emitted core items and app-server v2
`ThreadItem::McpToolCall`
- preserve plugin ids through app-server replay/redaction paths and
regenerate v2 schema fixtures

## Testing
- `just write-app-server-schema`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-protocol -p codex-app-server-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib`
- `cargo check -p codex-tui --tests`
- `cargo check -p codex-app-server --tests`
- `git diff --check`

## Notes
- `just fix -p codex-core` completed with two non-fatal
`too_many_arguments` warnings on the touched MCP notification helpers.
- A broader `cargo test -p codex-core` run passed core unit tests, then
hit shell/sandbox/snapshot failures in the integration target.
- A broader app-server downstream run hit the existing
`in_process::tests::in_process_start_clamps_zero_channel_capacity` stack
overflow; `cargo test -p codex-exec` also hit the existing sandbox
expectation mismatch in
`thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.

Matthew Zeng · 2026-05-20 17:02:10 -07:00

0a4179bb19

Add SubagentStop hook (#22873 )

# What

<img width="1792" height="1024" alt="image"
src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e"
/>

`SubagentStop` runs when a thread-spawned subagent turn is about to
finish. Thread-spawned subagents use `SubagentStop` instead of the
normal root-agent `Stop` hook.

Configured handlers match on `agent_type`. Hook input includes the
normal stop fields plus:

- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.
- `agent_transcript_path`: the child subagent transcript path.
- `transcript_path`: the parent thread transcript path.
- `last_assistant_message`: the final assistant message from the child
turn, when available.
- `stop_hook_active`: `true` when the child is already continuing
because an earlier stop-like hook blocked completion.

`SubagentStop` shares the same completion-control semantics as `Stop`,
scoped to the child turn:

- No decision allows the child turn to finish.
- `decision: "block"` with a non-empty `reason` records that reason as
hook feedback and continues the child with that prompt.
- `continue: false` stops the child turn. If `stopReason` is present,
Codex surfaces it as the stop reason.
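
The three outcomes above can be sketched as a small resolver. This is an illustration under two assumptions that the description does not settle: `continue: false` is checked before `decision`, and a `block` decision without a non-empty `reason` is treated as no decision. The type names are hypothetical.

```ts
interface StopHookOutputSketch {
  decision?: "block";
  reason?: string;
  continue?: boolean;
  stopReason?: string;
}

type ChildTurnOutcomeSketch =
  | { action: "finish" }
  | { action: "continue_with_prompt"; prompt: string }
  | { action: "stop"; stopReason: string | null };

function resolveChildTurn(output: StopHookOutputSketch): ChildTurnOutcomeSketch {
  // Assumption: `continue: false` takes precedence over `decision`.
  if (output.continue === false) {
    return { action: "stop", stopReason: output.stopReason ?? null };
  }
  // A block needs a non-empty reason; the reason becomes the next prompt.
  if (output.decision === "block" && output.reason !== undefined && output.reason !== "") {
    return { action: "continue_with_prompt", prompt: output.reason };
  }
  // No decision: the child turn is allowed to finish.
  return { action: "finish" };
}

const noDecision = resolveChildTurn({});
const blocked = resolveChildTurn({ decision: "block", reason: "Run the tests first." });
const stopped = resolveChildTurn({ continue: false, stopReason: "Budget exhausted." });
```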

# Lifecycle Scope

Only thread-spawned subagents run `SubagentStop`.

Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `Stop` hooks and do not run `SubagentStop`.
This avoids exposing synthetic matcher labels for internal
implementation paths.

# Stack

1. #22782: add `SubagentStart`.
2. This PR: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.

Abhinav · 2026-05-20 14:59:41 -07:00

eee3e60db3

Add SubagentStart hook (#22782 )

# What

`SubagentStart` runs once when Codex creates a thread-spawned subagent,
before that child sends its first model request. Thread-spawned
subagents use `SubagentStart` instead of the normal root-agent
`SessionStart` hook.

Configured handlers match on the subagent `agent_type`, using the same
value passed to `spawn_agent`. When no agent type is specified, Codex
uses the default agent type.

Hook input includes the normal session-start fields plus:

- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.

`SubagentStart` may return `hookSpecificOutput.additionalContext`. That
context is added to the child conversation before the first model
request.

# Lifecycle Scope

Only thread-spawned subagents run `SubagentStart`.

Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `SessionStart` hooks and do not run
`SubagentStart`. This avoids exposing synthetic matcher labels for
internal implementation paths.

The `SessionStart` hook also no longer fires for subagents, which
matches the behavior of other coding agents' implementations.

# Stack

1. This PR: add `SubagentStart`.
2. #22873: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.

Abhinav · 2026-05-19 12:45:08 -07:00

d661ab70ed

test: construct permission profiles directly (#23030 )

## Why

`SandboxPolicy` is now a legacy compatibility shape, but several tests
still built a `SandboxPolicy` only to immediately convert it into
`PermissionProfile` for APIs that already accept canonical runtime
permissions. Those detours make it harder to audit where legacy sandbox
policy is still required, because boundary-only usages are mixed
together with ordinary test setup.

## What Changed

- Updated tests in `codex-core`, `codex-exec`, `codex-analytics`, and
`codex-config` to construct `PermissionProfile` values directly when the
code under test takes a permission profile.
- Changed exec-policy, request-permissions, session, and sandbox test
helpers to pass `PermissionProfile` through instead of converting from
`SandboxPolicy` internally.
- Left `SandboxPolicy` in place where tests are explicitly exercising
legacy compatibility or request/response boundaries.

## Test Plan

- `cargo test -p codex-analytics -p codex-config`
- `cargo test -p codex-core --lib safety::tests`
- `cargo test -p codex-core --lib exec_policy::tests::`
- `cargo test -p codex-core --lib exec::tests`
- `cargo test -p codex-core --lib guardian_review_session_config`
- `cargo test -p codex-core --lib tools::network_approval::tests`
- `cargo test -p codex-core --lib
tools::runtimes::shell::unix_escalation::tests`
- `cargo test -p codex-core --lib managed_network`
- `cargo test -p codex-core --test all request_permissions::`
- `cargo test -p codex-exec sandbox`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23030).
* #23036
* __->__ #23030

Michael Bolin · 2026-05-16 12:12:37 -07:00

d91bc15618

Preserve image detail in app-server inputs (#20693 )

## Summary

- Add optional image detail to user image inputs across core, app-server
v2, thread history/event mapping, and the generated app-server
schemas/types.
- Preserve requested detail when serializing Responses image inputs:
omitted detail stays on the existing `high` default, while explicit
`original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs,
including MCP `codex/imageDetail`, code-mode image helpers, and
`view_image`.

Curtis 'Fjord' Hawthorne · 2026-05-15 15:04:04 -07:00

8543e39885

app-server: stop returning thread permission profiles (#22792 )

## Why

The app-server thread lifecycle API should no longer expose the full
`PermissionProfile` value. After the permissions-profile migration,
clients should round-trip only the active profile identity through
`activePermissionProfile` and `permissions` when that identity is known.

The full profile is server-side config. Treating a response-derived
legacy sandbox projection as a new local profile can lose named-profile
restrictions and accidentally widen permissions on the next turn. The
legacy `sandbox` response field remains only as the
compatibility/display fallback.

## What Changed

- Removed `permissionProfile` from `ThreadStartResponse`,
`ThreadResumeResponse`, and `ThreadForkResponse`.
- Stopped populating that field in app-server thread start/resume/fork
responses.
- Updated embedded exec/TUI response mapping to derive display
permission state from local config or the legacy sandbox fallback
instead of a response profile value.
- Added a TUI turn override shape that distinguishes preserving server
permissions, selecting an active profile id, and sending a legacy
sandbox for an explicit local override.
- Preserved remote app-server permissions across turns by sending
`permissions` only when an `activePermissionProfile` id is known, and
otherwise sending no sandbox override unless the user selected a local
override.
- Kept embedded `thread/resume` hydration server-authored when
`activePermissionProfile` is absent, which matches the live-thread
attach path where the server ignores requested overrides.
- Updated the app-server README to remove the obsolete lifecycle
response `permissionProfile` reference. The remaining
`permissionProfile` README references are request-side permission
overrides.
- Regenerated app-server JSON schema and TypeScript fixtures.
- Kept the generated typed response enum exempt from
`large_enum_variant`, matching the existing payload enum exemption after
the lifecycle response variants shrank.
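
A rough sketch of the three-way turn override described in the list above. The variant names, the function, and the sample values are hypothetical. The precedence shown (a known profile id first, then an explicit local override, otherwise preserve) is one reading of this description, not a confirmed rule.

```ts
type TurnPermissionsOverrideSketch =
  | { kind: "preserve_server" }
  | { kind: "active_profile"; permissions: string }
  | { kind: "legacy_sandbox"; sandbox: string };

function chooseTurnOverride(
  activeProfileId: string | null,
  localSandboxOverride: string | null,
): TurnPermissionsOverrideSketch {
  // Send `permissions` only when an active profile id is known.
  if (activeProfileId !== null) {
    return { kind: "active_profile", permissions: activeProfileId };
  }
  // Otherwise send a legacy sandbox only for an explicit local override.
  if (localSandboxOverride !== null) {
    return { kind: "legacy_sandbox", sandbox: localSandboxOverride };
  }
  // Default: send nothing and keep whatever the server already has.
  return { kind: "preserve_server" };
}

const preserved = chooseTurnOverride(null, null);
const byProfile = chooseTurnOverride("example-profile", null); // placeholder id
const byLocalOverride = chooseTurnOverride(null, "example-sandbox"); // placeholder value
```

The point of the default branch is that a client never rebuilds a profile from a response-derived sandbox projection, which is the widening risk described in the Why section.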

## How To Review

Start with `codex-rs/app-server-protocol/src/protocol/v2/thread.rs` to
confirm the response shape, then check the response construction in
`codex-rs/app-server/src/request_processors`. The generated schema and
TypeScript fixture changes are mechanical follow-through from the
protocol removal.

The TUI behavior is the delicate part: review
`codex-rs/tui/src/app_server_session.rs` for response hydration and
turn-start override projection, then
`codex-rs/tui/src/app/thread_routing.rs` for the decision about whether
the next turn should preserve the server snapshot, send an active
profile id, or send a legacy sandbox for an explicit local override.

## Verification

- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol
thread_lifecycle_responses_default_missing_optional_fields`
- `cargo test -p codex-exec
session_configured_from_thread_response_uses_permission_profile_from_config`
- `cargo test -p codex-tui --lib thread_response`
- `cargo test -p codex-tui turn_permissions_`
- `cargo test -p codex-tui
resume_response_restores_turns_from_thread_items`
- `cargo test -p codex-analytics
track_response_only_enqueues_analytics_relevant_responses`
- `just fix -p codex-analytics`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-tui`
- `just argument-comment-lint`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22792).
* #22795
* __->__ #22792

Michael Bolin · 2026-05-15 12:45:48 -07:00

83bbb4f326

app-server: use permission ids and runtime workspace roots (#22611 )

## Why

This PR builds on [#22610](https://github.com/openai/codex/pull/22610)
and is the app-server side of the migration from mutable per-turn
`SandboxPolicy` replacement toward selecting immutable permission
profiles by id plus mutable runtime workspace roots.

Once permission profiles can carry their own immutable
`workspace_roots`, app-server no longer needs to mutate the selected
`PermissionProfile` just to represent thread-specific filesystem
context. The mutable part now lives on the thread as explicit
`runtimeWorkspaceRoots`, while `:workspace_roots` remains symbolic until
the sandbox is realized for a turn.

## What Changed

- Replaced the v2 permission-selection wrapper surface with plain
profile ids for `thread/start`, `thread/resume`, `thread/fork`, and
`turn/start`.
- Removed the API surface for profile modifications
(`PermissionProfileSelectionParams`,
`PermissionProfileModificationParams`,
`ActivePermissionProfileModification`).
- Added experimental `runtimeWorkspaceRoots` fields to the thread
lifecycle and turn-start APIs.
- Threaded runtime workspace roots through core session/thread
snapshots, turn overrides, app-server request handling, and command
execution permission resolution.
- Kept session permission state symbolic so later runtime root updates
and cwd-only implicit-root retargeting rebind `:workspace_roots`
correctly.
- Updated the embedded clients just enough to send and restore the new
thread state.
- Refreshed the generated schema/TypeScript artifacts and the app-server
README to match the new contract.

## Verification

Targeted coverage for this layer lives in:

- `codex-rs/app-server-protocol/src/protocol/v2/tests.rs`
- `codex-rs/app-server/tests/suite/v2/thread_start.rs`
- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- `codex-rs/core/src/session/tests.rs`

The key regression checks exercise that:

- `runtimeWorkspaceRoots` resolve against the effective cwd on thread
start.
- Profile-declared workspace roots are excluded from the runtime
workspace roots returned by app-server.
- A turn-level runtime workspace-root update persists onto the thread
and is returned by `thread/resume`.
- A named permission profile selected on one turn remains symbolic so a
later runtime-root-only turn update changes the actual sandbox writes.
- A cwd-only turn update retargets the implicit runtime cwd root while
preserving additional runtime roots.
- The protocol fixtures and generated client artifacts stay in sync with
the string-based permission selection contract.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22611).
* #22612
* __->__ #22611

Michael Bolin · 2026-05-14 23:00:05 -07:00

8a5306ff88

Stop uploading accepted line fingerprints (#22180 )

## Summary
- keep accepted-line diff parsing and fingerprint hashing logic locally
- stop uploading path/line hash fingerprints in the accepted-line
analytics event payload
- keep aggregate accepted added/deleted line counts in the event

## Testing
- `just fmt`
- `cargo test -p codex-analytics`
- `just fix -p codex-analytics`

alexsong-oai · 2026-05-11 15:41:38 -07:00

bb6134c028

[codex-analytics] emit terminal review events (#18748 )

## Why

Review telemetry should describe reviews as first-class events, not only
as counters denormalized onto terminal tool-item events. That lets us
analyze guardian and user reviews consistently across command execution,
file changes, permissions, and network access, while still preserving
the terminal item summaries that existing tool analytics need.

To make those review events accurate, analytics also needs the observed
completion time for each review and enough command metadata to
distinguish `shell` from `unified_exec` reviews.

## What changed

- emit generic `codex_review_event` rows for completed user and guardian
reviews, with review subjects, reviewer, trigger, terminal status,
resolution, and observed duration
- reduce approval request / response / abort facts into review events
for command execution, file change, and permissions flows
- keep denormalized review counts, final approval outcome, and
permission-request flags on terminal tool-item events for
item-associated reviews
- plumb review completion timing so user-review responses and aborts use
app-server-observed completion times, while guardian analytics reuse the
same terminal timestamps emitted on guardian assessment events
- carry command approval `source` through the protocol and app-server
layers so review analytics can distinguish `shell` from `unified_exec`
- add analytics coverage for user-review emission, guardian-review
emission, permission reviews that should not denormalize onto tool
items, item-summary isolation across threads, and the serialized
review-event shape

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18748).
* __->__ #18748
* #21434
* #18747
* #17090
* #17089
* #20514

rhan-oai · 2026-05-11 22:13:32 +00:00

a175ddacc0

[codex-analytics] add turn tool counts to turn events (#21431 )

## Summary
- accumulate completed tool-item counts per turn from the item lifecycle
- populate the reserved count fields on `codex_turn_event`
- add reducer coverage for zero-count turns and mixed completed tool
items

## Why
PR #17090 moved tool-item analytics onto the item lifecycle, so the turn
reducer can now derive the per-turn tool counts from the same completed
items instead of leaving the reserved fields null.

## Validation
- `just fmt`
- `cargo test -p codex-analytics`

rhan-oai · 2026-05-11 18:18:02 +00:00

cf6342b75b

[codex] request desktop attestation from app (#20619 )

## Summary

TL;DR: teaches `codex-rs` / app-server to request a desktop-provided
attestation token and attach it as `x-oai-attestation` on the scoped
ChatGPT Codex request paths.

![DeviceCheck attestation
interface](https://raw.githubusercontent.com/openai/codex/dev/jm/devicecheck-diagram-assets/pr-assets/devicecheck-attestation-interface.png)

## Details

This PR teaches the Codex app-server runtime how to request and attach
an attestation token. It does not generate DeviceCheck tokens directly;
instead, it relies on the connected desktop app to advertise that it can
generate attestation and then asks that app for a fresh header value
when needed.

The flow is:

1. The Codex desktop app connects to app-server.
2. During `initialize`, the app can advertise that it supports
`requestAttestation`.
3. Before app-server calls selected ChatGPT Codex endpoints, it sends
the internal server request `attestation/generate` to the app.
4. app-server receives a pre-encoded header value back.
5. app-server forwards that value as `x-oai-attestation` on the scoped
outbound requests.
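
The flow above can be sketched as one helper. Everything here is illustrative: the function and parameter names are hypothetical, the scoping predicate is a stand-in, and the real `attestation/generate` call is an asynchronous server-to-client request that is modelled as a plain callback for brevity.

```ts
interface OutboundRequestSketch {
  path: string;
  headers: Record<string, string>;
}

function attachAttestation(
  request: OutboundRequestSketch,
  clientSupportsAttestation: boolean, // advertised during `initialize`
  isScopedEndpoint: (path: string) => boolean, // stand-in for the real scoping
  generateAttestation: () => string, // stands in for `attestation/generate`
): OutboundRequestSketch {
  // Skip clients that did not advertise support and endpoints out of scope.
  if (!clientSupportsAttestation || !isScopedEndpoint(request.path)) {
    return request;
  }
  // Ask the app for a fresh, pre-encoded header value and forward it as-is.
  const headerValue = generateAttestation();
  return {
    ...request,
    headers: { ...request.headers, "x-oai-attestation": headerValue },
  };
}

const scopedOnly = (path: string): boolean => path === "/backend-api/codex/responses";
const fakeGenerate = (): string => "v1.placeholder-token"; // not a real token

const attested = attachAttestation(
  { path: "/backend-api/codex/responses", headers: {} },
  true,
  scopedOnly,
  fakeGenerate,
);
const unattested = attachAttestation(
  { path: "/backend-api/codex/responses", headers: {} },
  false,
  scopedOnly,
  fakeGenerate,
);
```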

The code in this repo is mostly protocol and runtime plumbing: it adds
the app-server request/response shape, introduces an attestation
provider in core, wires that provider into Responses / compaction /
realtime setup paths, and covers the intended scoping with tests. The
signed macOS DeviceCheck generation remains owned by the desktop app PR.

## Related PR

- Codex desktop app implementation:
https://github.com/openai/openai/pull/878649

## Validation

<details>
<summary>Tests run</summary>

```sh
cargo test -p codex-app-server-protocol
cargo test -p codex-core attestation --lib
cargo test -p codex-app-server --lib attestation
```

Also ran:

```sh
just fix -p codex-core
just fix -p codex-app-server
just fix -p codex-app-server-protocol
just fmt
just write-app-server-schema
```

</details>

<details>
<summary>E2E DeviceCheck validation</summary>

First validated the signed desktop app boundary directly: launched a
packaged signed `Codex.app`, sent `attestation/generate`, decoded the
returned `v1.` attestation header, and validated the extracted
DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using
bundle ID `com.openai.codex`. Apple returned `status_code: 200` and
`is_ok: true`.

Then ran the fuller app + app-server flow. The packaged `Codex.app`
launched a current-branch app-server via `CODEX_CLI_PATH`, and a local
MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server
requested `attestation/generate` from the real Electron app process, and
the intercepted `/backend-api/codex/responses` traffic included
`x-oai-attestation` on both routes:

```text
GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present
POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present
```

The captured header decoded to a DeviceCheck token that also validated
with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`,
team `2DC432GLL2`).

</details>

---------

Co-authored-by: Codex <noreply@openai.com>

Jiaming Zhang · 2026-05-08 12:36:02 -07:00

5f4d0ec343

Emit accepted line fingerprint analytics (#21601 )

## Why

Codex assisted-code attribution needs a client-side accepted-code source
that does not upload raw code. This adds a hash-only analytics event
derived from the turn diff so downstream attribution can compare
accepted Codex lines against commit or PR diffs.

## What Changed

- Parse accepted/effective added lines from the final turn diff and emit
`codex_accepted_line_fingerprints` analytics.
- Hash repo, path, and normalized line content before upload; raw code
and raw diffs are not included in the event.
- Chunk large fingerprint payloads and send accepted-line fingerprint
events in isolated requests while preserving normal batching for other
analytics events.
- Canonicalize Git remote URLs before repo hashing so SSH/HTTPS GitHub
remotes join to the same repo hash.
- Add parser coverage for unified diff hunk lines that look like `+++`
or `---` file headers.
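
To illustrate the last point, here is a minimal sketch of a unified-diff parser that uses hunk header line counts to decide whether it is inside a hunk, so an added line whose content starts with `++` is not mistaken for a `+++` file header. It is not the parser from this PR; the function name and the path handling are assumptions.

```ts
interface AddedLineSketch {
  path: string;
  content: string;
}

function parseAddedLines(diff: string): AddedLineSketch[] {
  const added: AddedLineSketch[] = [];
  let currentPath = "";
  let oldRemaining = 0; // old-side lines still expected in the current hunk
  let newRemaining = 0; // new-side lines still expected in the current hunk

  for (const line of diff.split("\n")) {
    if (oldRemaining > 0 || newRemaining > 0) {
      // Inside a hunk: classify by first character only.
      const marker = line.charAt(0);
      if (marker === "+") {
        added.push({ path: currentPath, content: line.slice(1) });
        newRemaining -= 1;
      } else if (marker === "-") {
        oldRemaining -= 1;
      } else if (marker !== "\\") {
        // Context line; "\ No newline at end of file" markers are skipped.
        oldRemaining -= 1;
        newRemaining -= 1;
      }
      continue;
    }
    const hunk = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/.exec(line);
    if (hunk !== null) {
      const oldCount = hunk[1];
      const newCount = hunk[2];
      // An omitted count means 1 in the unified format.
      oldRemaining = oldCount === undefined ? 1 : parseInt(oldCount, 10);
      newRemaining = newCount === undefined ? 1 : parseInt(newCount, 10);
    } else if (line.indexOf("+++ ") === 0) {
      // Real file header: strip the conventional "b/" prefix.
      currentPath = line.slice(4).replace(/^b\//, "");
    }
  }
  return added;
}

const sampleDiff = [
  "--- a/src/lib.rs",
  "+++ b/src/lib.rs",
  "@@ -1,1 +1,3 @@",
  " fn main() {}",
  "+++ not a header",
  "+let x = 1;",
].join("\n");
const addedLines = parseAddedLines(sampleDiff);
```

In the sample, the fifth line is an added line whose content is `++ not a header`; because the parser is still inside the hunk, it is recorded as content for `src/lib.rs` rather than read as a new file header.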

## Verification

- `cargo test -p codex-analytics`
- `cargo test -p codex-git-utils canonicalize_git_remote_url`
- `just fix -p codex-analytics`
- `just bazel-lock-check`
- `git diff --check`

alexsong-oai · 2026-05-08 12:16:24 -07:00

bbb6bf0a37

[codex-analytics] plumb protocol-native review timing (#21434 )

## Why

We want terminal tool review analytics, but the reducer should not stamp
review timing from its own wall clock.

This PR plumbs review timing through the real protocol and app-server
seams so downstream analytics can consume the emitter's timestamps
directly. Guardian reviews keep their enriched `started_at` /
`completed_at` analytics fields by deriving those legacy second-based
values from the same protocol-native millisecond lifecycle timestamps,
rather than sampling a separate analytics clock.
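
A tiny sketch of that derivation, assuming the second-based values are obtained by flooring the millisecond timestamps. The rounding mode and the helper names are assumptions.

```ts
interface ReviewTimingMsSketch {
  started_at_ms: number;
  completed_at_ms: number;
}

// Assumption: legacy second-based fields truncate toward the earlier second.
function toLegacySeconds(timestampMs: number): number {
  return Math.floor(timestampMs / 1000);
}

function legacyReviewTiming(timing: ReviewTimingMsSketch): {
  started_at: number;
  completed_at: number;
} {
  // Both legacy fields come from the same protocol timestamps, so they
  // cannot drift from the millisecond values the way a second clock could.
  return {
    started_at: toLegacySeconds(timing.started_at_ms),
    completed_at: toLegacySeconds(timing.completed_at_ms),
  };
}

const legacyTiming = legacyReviewTiming({
  started_at_ms: 1778000000123,
  completed_at_ms: 1778000002999,
});
```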

## What changed

- add `started_at_ms` to user approval request payloads
- add `started_at_ms` / `completed_at_ms` to guardian review
notifications
- preserve Guardian review `started_at` / `completed_at` enrichment from
the protocol-native timing source
- stamp typed `ServerResponse` analytics facts with app-server-observed
`completed_at_ms`
- thread the new timing fields through core, protocol, app-server, TUI,
and analytics fixtures

## Verification

- `cargo test -p codex-app-server outgoing_message --manifest-path
codex-rs/Cargo.toml`
- `cargo test -p codex-app-server-protocol guardian --manifest-path
codex-rs/Cargo.toml`
- `cargo test -p codex-tui guardian --manifest-path codex-rs/Cargo.toml`
- `cargo test -p codex-analytics analytics_client_tests --manifest-path
codex-rs/Cargo.toml`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21434).
* #18748
* __->__ #21434
* #18747
* #17090
* #17089
* #20514

rhan-oai · 2026-05-07 20:31:41 -07:00

99016ec732

[codex-analytics] add tool review event schema (#18747 )

## Why

We want to emit terminal review analytics for tool-related approval
flows, but the event contract needs to exist before the reducer can
publish anything.

This PR is the schema-only slice for the Codex review event family.

## What changed

- add the `ReviewEvent` analytics envelope in
`codex-rs/analytics/src/events.rs`
- define the review subject kind, reviewer, trigger, terminal status,
and post-review resolution enums
- define the review event payload with thread, turn, item, lineage,
tool, and timing fields that the emitter stack will populate

## Verification

- stacked verification in dependent PRs: `cargo test -p codex-analytics
analytics_client_tests --manifest-path codex-rs/Cargo.toml`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18747).
* #18748
* #21434
* __->__ #18747
* #17090
* #17089
* #20514

rhan-oai · 2026-05-07 09:46:46 -07:00

3444b0d60a

Add compact lifecycle hooks (started by vincentkoc - external contrib) (#19905 )

Based on work from Vincent K -
https://github.com/openai/codex/pull/19060

<img width="1836" height="642" alt="CleanShot 2026-04-29 at 20 47 40@2x"
src="https://github.com/user-attachments/assets/b647bb89-65fe-40c8-80b0-7a6b7c984634"
/>

## Why

Compaction rewrites the conversation context that future model turns
receive, but hooks currently have no deterministic lifecycle point
around that rewrite. This adds compact lifecycle hooks so users can
audit manual and automatic compaction, surface hook messages in the UI,
and run post-compaction follow-up without overloading tool or prompt
hooks.

## What Changed

- Added `PreCompact` and `PostCompact` hook events across hook config,
discovery, dispatch, generated schemas, app-server notifications,
analytics, and TUI hook rendering.
- Added trigger matching for compact hooks with the documented `manual`
and `auto` matcher values.
- Wired `PreCompact` before both local and remote compaction, and
`PostCompact` after successful local or remote compaction.
- Kept compact hook command input to lifecycle metadata: session id,
Codex turn id, transcript path, cwd, hook event name, model, and
trigger.
- Made compact stdout handling consistent with other hooks: plain stdout
is ignored as debug output, while malformed JSON-looking stdout is
reported as failed hook output.
- Added integration coverage for compact hook dispatch, trigger
matching, post-compact execution, and the audited behavior that
`decision:"block"` does not block compaction.

## Out of Scope

- Hook-specific compaction blocking is not implemented;
`decision:"block"` and exit-code-2 blocking semantics are intentionally
unsupported for `PreCompact`.
- Custom compaction instructions are not exposed to compact hooks in
this PR.
- Compact summaries, summary character counts, and summary previews are
not exposed to compact hooks in this PR.

## Verification

- `cargo test -p codex-hooks`
- `cargo test -p codex-core
manual_pre_compact_block_decision_does_not_block_compaction`
- `cargo test -p codex-app-server hooks_list`
- `cargo test -p codex-core config_schema_matches_fixture`
- `cargo test -p codex-tui hooks_browser`

## Docs

The developer documentation for Codex hooks should be updated alongside
this feature to document `PreCompact` and `PostCompact`, the
`manual`/`auto` matcher values, and the compact hook payload fields.

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>

Andrei Eternal · 2026-05-06 18:08:31 -07:00

527d52df03

[codex-analytics] emit tool item events from item lifecycle (#17090 )

## Why

After the tool-item schemas are in place, analytics needs to emit them
from the app-server item lifecycle rather than requiring bespoke
tracking at each callsite. The reducer should also reuse the shared
thread analytics context introduced below it in the stack so later event
families do not repeat the same reducer joins or missing-state ladder.

## What changed

- Tracks tool-item completion notifications and emits the matching tool
analytics event when a terminal item arrives.
- Derives event-specific payload details for command execution, file
changes, MCP calls, dynamic tools, collaboration tools, web search, and
image generation.
- Denormalizes thread, app-server client, runtime, and subagent
provenance metadata through the shared thread analytics context.
- Adds reducer coverage for item lifecycle emission and subagent
metadata inheritance.

## Duration semantics

`duration_ms` is computed from the app-server item lifecycle timestamps:
`completed_at_ms - started_at_ms`. That makes it the duration of the
lifecycle Codex observed locally, not necessarily the upstream
provider's full execution time.

- Web search usually has a meaningful observed lifecycle because
Responses can send `response.output_item.added` before
`response.output_item.done`; in that case `started_at_ms` comes from the
added event and `completed_at_ms` comes from the done event.
- Image generation can be much less precise. In the current observed
stream, image generation often arrives only as a completed
`response.output_item.done`; when there is no earlier added event, Codex
synthesizes the started item immediately before completion, so
`duration_ms` can be `0` even though upstream image generation took
longer.
- Standalone web search and standalone image generation work is expected
to land after this stack. Those paths may introduce more direct
lifecycle events or timing points, so the current
web-search/image-generation duration semantics should be treated as the
best available item-lifecycle approximation, not the final latency
contract for those tool families.
- `execution_duration_ms` is populated only where the completed item
already carries a native execution duration; otherwise it remains `null`
while `duration_ms` still reflects the local lifecycle interval.
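
The duration rules above can be sketched as follows. This is a minimal illustration, not the reducer's actual code: `CompletedItemTiming`, `ToolItemDurations`, and `derive_durations` are hypothetical names, and only the `started_at_ms`, `completed_at_ms`, `duration_ms`, and `execution_duration_ms` field names come from this description.

```rust
/// Hypothetical stand-in for the timing data on a completed tool item.
#[derive(Debug, Clone, PartialEq)]
pub struct CompletedItemTiming {
    pub started_at_ms: i64,
    pub completed_at_ms: i64,
    /// Native execution duration, present only if the item already carries one.
    pub native_execution_duration_ms: Option<i64>,
}

/// Hypothetical view of the two duration fields on the analytics event.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolItemDurations {
    pub duration_ms: i64,
    pub execution_duration_ms: Option<i64>,
}

pub fn derive_durations(item: &CompletedItemTiming) -> ToolItemDurations {
    ToolItemDurations {
        // Locally observed lifecycle interval, not upstream execution time.
        duration_ms: item.completed_at_ms - item.started_at_ms,
        // Passed through unchanged; stays `None` (serialized as null)
        // when the item has no native execution duration.
        execution_duration_ms: item.native_execution_duration_ms,
    }
}
```

With a synthesized start (for example image generation that only arrives as a done event), `started_at_ms` and `completed_at_ms` coincide, so `duration_ms` comes out as `0`.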

## Currently placeholder / partial fields

Some fields are included in the schema for the intended steady-state
contract, but this PR does not yet populate them from real
approval/review state:

- `review_count`, `guardian_review_count`, and `user_review_count`
currently default to `0`.
- `final_approval_outcome` currently defaults to `unknown`.
- `requested_additional_permissions` and `requested_network_access`
currently default to `false`.
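
As a rough illustration of those placeholder defaults, assuming a hypothetical `ReviewPlaceholders` struct (the real event types may group these fields differently, and only the `unknown` outcome is named in this description):

```rust
/// Only the `unknown` outcome is mentioned in this PR description;
/// any other variants are intentionally omitted from this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum FinalApprovalOutcome {
    #[default]
    Unknown,
}

/// Hypothetical grouping of the placeholder fields listed above.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct ReviewPlaceholders {
    pub review_count: u32,
    pub guardian_review_count: u32,
    pub user_review_count: u32,
    pub final_approval_outcome: FinalApprovalOutcome,
    pub requested_additional_permissions: bool,
    pub requested_network_access: bool,
}
```

`ReviewPlaceholders::default()` then yields zero counts, `Unknown`, and `false` flags, matching the values consumers should expect until real approval/review state is wired in.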

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17090).
* #18748
* #18747
* __->__ #17090
* #17089
* #20514

rhan-oai · 2026-05-06 20:27:41 +00:00

fbdbc6b2fe

feat(app-server): move v2 sessionId onto Thread (#21336 )

## Why

`session_id` and `thread_id` are separate identities after #20437, but
app-server only surfaced `sessionId` on the `thread/start`,
`thread/resume`, and `thread/fork` response envelopes. Other
thread-bearing surfaces such as `thread/list`, `thread/read`,
`thread/started`, `thread/rollback`, `thread/metadata/update`, and
`thread/unarchive` either lacked the grouping key or forced clients to
special-case those three responses.

Making `sessionId` part of the reusable `Thread` payload gives every v2
API surface one place to expose session-tree identity.

## Mental model

1. `thread.sessionId` lives on `Thread`.
2. It is a view/runtime identity for the current live session tree, not
durable stored lineage metadata.
3. When app-server has a live loaded thread, it copies the real value
from core’s `session_configured.session_id`.
4. When it only has stored/unloaded data, it falls back to
`thread.sessionId = thread.id`.
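
The fallback in points 3 and 4 amounts to something like the sketch below. `resolve_thread_session_id` is a hypothetical helper written for illustration; it is not the function used in `thread_processor.rs`, and IDs are modeled as plain strings.

```rust
/// Picks the value to expose as `thread.sessionId`.
///
/// `live_session_id` is `Some` when app-server has a live loaded thread
/// (the value reported by core's session configuration) and `None` when
/// only stored/unloaded data is available.
pub fn resolve_thread_session_id(thread_id: &str, live_session_id: Option<&str>) -> String {
    // Stored-thread fallback: reuse the thread's own ID.
    live_session_id.unwrap_or(thread_id).to_string()
}
```

Because the fallback value is only a placeholder, clients should not persist it as lineage: once the thread is live again the field can change to the active session tree root.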

## What changed

- Added `sessionId` to the v2
[`Thread`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server-protocol/src/protocol/v2/thread_data.rs#L105-L109).
- Removed the duplicate top-level `sessionId` fields from
`thread/start`, `thread/resume`, and `thread/fork`; clients should now
read `response.thread.sessionId`.
- Populated `thread.sessionId` when building live thread responses,
replaying loaded threads, and returning stored-thread summaries so the
field is present across start, resume, fork, list, read, rollback,
metadata-update, unarchive, and `thread/started` paths. See
[`load_thread_from_resume_source_or_send_internal`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server/src/request_processors/thread_processor.rs#L2824-L2918)
and
[`thread_from_stored_thread`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server/src/request_processors/thread_processor.rs#L3671-L3719).
- Preserved the stored-thread fallback: if a thread has not been loaded
into a live session tree yet, `thread.sessionId` falls back to
`thread.id`; once the thread is live again, the field reports the active
session tree root.
- Regenerated the JSON/TypeScript schemas and updated the app-server
README examples to show
[`thread.sessionId`](https://github.com/openai/codex/blob/8fc9e9b4cf81b6f61d432e71f1eb266f6f104b63/codex-rs/app-server/README.md#L306-L310)
on the thread object.

jif-oai · 2026-05-06 15:23:25 +02:00

5ecff05196

feat: return session ID from thread/fork (#21332 )

## Why

`thread/start` and `thread/resume` already return `sessionId`, but
`thread/fork` only returned the new thread. That left clients to infer
the forked thread's session identity from `thread.id`, which kept the
new `session_id` / `thread_id` split implicit at one lifecycle boundary.
Follow-up to #20437.

## What changed

- Add `sessionId` to `ThreadForkResponse`.
- Populate it from the forked session configuration.
- Regenerate the v2 JSON/TypeScript schema fixtures and update the
app-server docs/example.
- Extend the fork integration test to assert the returned `sessionId`.

## Verification

- Added coverage in `thread_fork_creates_new_thread_and_emits_started`
for the new response field.

jif-oai · 2026-05-06 12:04:27 +02:00

06e5dfa4dd

feat: add session_id (#20437 )

## Summary

Related to
https://openai.slack.com/archives/C095U48JNL9/p1777537279707449

TL;DR: we update the meaning of session IDs and thread IDs:
* `thread_id` keeps its current meaning.
* `session_id` becomes an ID shared by every thread under a /root
thread (i.e. every sub-agent shares the same session ID).

This PR introduces an explicit `SessionId` and threads it through the
protocol/client boundary so `session_id` and `thread_id` can diverge
when they need to, while preserving compatibility for older serialized
`session_configured` events.
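
One way to picture the new relationship, as a sketch only: `ThreadIdentity` and its methods are hypothetical and do not reflect how `SessionId` is actually threaded through the protocol.

```rust
/// Hypothetical pairing of the two identities carried by a thread.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ThreadIdentity {
    pub thread_id: String,
    pub session_id: String,
}

impl ThreadIdentity {
    /// A root thread starts a new session tree.
    pub fn new_root(thread_id: &str, session_id: &str) -> Self {
        Self {
            thread_id: thread_id.to_string(),
            session_id: session_id.to_string(),
        }
    }

    /// A sub-agent gets its own thread ID but inherits the session ID.
    pub fn spawn_subagent(&self, thread_id: &str) -> Self {
        Self {
            thread_id: thread_id.to_string(),
            session_id: self.session_id.clone(),
        }
    }
}
```

Grouping by `session_id` then collects a root thread and all of its sub-agents, while `thread_id` still identifies each one individually.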

---------

Co-authored-by: Codex <noreply@openai.com>

jif-oai · 2026-05-06 10:48:37 +02:00

a98623511b

[codex-analytics] rework thread_source for thread analytics (#20949 )

## Summary
- make `thread_source` an explicit optional thread-level field on
`thread/start`, `thread/fork`, and returned thread payloads
- persist `thread_source` in rollout/session metadata so resumed live
threads retain the original value
- replace the old best-effort `session_source` -> `thread_source`
mapping with an explicit caller-supplied analytics classification

## Why
Before this change, analytics `thread_source` was populated by a
best-effort mapping from `session_source`. `session_source` describes
the runtime/client surface, not the actual thread-level origin, so that
projection was not accurate enough to distinguish cases such as `user`,
`subagent`, `memory_consolidation`, and future thread origins reliably.

Making `thread_source` explicit keeps one thread-level analytics field
while letting callers provide the real classification directly instead
of recovering it indirectly from `session_source`.

## Impact
For new analytics events, `thread_source` now reflects the explicit
thread-level classification supplied by the caller rather than an
inferred value derived from `session_source`. Existing protocol fields
remain optional; callers that omit `threadSource` now produce `null`
instead of a best-effort inferred value.
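
A minimal sketch of the explicit classification, assuming a hypothetical `ThreadSource` enum limited to the three origins named above and a hand-rolled JSON fragment in place of the real serializer:

```rust
/// Thread-level origins mentioned in this description; the real type may
/// carry more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadSource {
    User,
    Subagent,
    MemoryConsolidation,
}

impl ThreadSource {
    pub fn as_str(self) -> &'static str {
        match self {
            ThreadSource::User => "user",
            ThreadSource::Subagent => "subagent",
            ThreadSource::MemoryConsolidation => "memory_consolidation",
        }
    }
}

/// Renders the analytics field. An omitted value stays `null`; nothing is
/// inferred from `session_source`.
pub fn thread_source_json(source: Option<ThreadSource>) -> String {
    match source {
        Some(s) => format!("\"thread_source\":\"{}\"", s.as_str()),
        None => "\"thread_source\":null".to_string(),
    }
}
```

The `None` arm is the behavioral change to watch for downstream: dashboards that previously saw an inferred value will now see `null` for callers that have not started passing `threadSource`.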

## Validation
- `just write-app-server-schema`
- `cargo test -p codex-analytics -p codex-core -p
codex-app-server-protocol --no-run`
- `cargo test -p codex-app-server-protocol
generated_ts_optional_nullable_fields_only_in_params`
- `cargo test -p codex-analytics
thread_initialized_event_serializes_expected_shape`
- `cargo test -p codex-core
resume_stopped_thread_from_rollout_preserves_thread_source`

rhan-oai · 2026-05-06 02:12:31 +00:00

b3d4f1a9f0

add turn items view to app-server turns (#21063 )

## Why

`Turn.items` currently overloads an empty array to mean either that no
items exist or that the server intentionally did not load them for this
response. That ambiguity blocks future lazy-loading work where clients
need to distinguish unloaded, summary, and fully hydrated turn payloads.

## What changed

- add a new `TurnItemsView` enum with `notLoaded`, `summary`, and `full`
variants
- add required `itemsView` metadata to app-server `Turn` payloads
- mark reconstructed persisted history as `full` and live shell-style
turn payloads as `notLoaded`
- keep current `thread/turns/list` behavior unchanged and document that
it still returns `full` turns today
- regenerate the JSON and TypeScript protocol fixtures
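
To show how `itemsView` removes the ambiguity, here is a small sketch. The variant names mirror the ones listed above, but the `Turn` struct, the `wire_name` method, and the `is_known_empty` helper are simplified, hypothetical stand-ins for the generated protocol types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnItemsView {
    NotLoaded,
    Summary,
    Full,
}

impl TurnItemsView {
    /// Wire names as listed in this PR description.
    pub fn wire_name(self) -> &'static str {
        match self {
            TurnItemsView::NotLoaded => "notLoaded",
            TurnItemsView::Summary => "summary",
            TurnItemsView::Full => "full",
        }
    }
}

/// Simplified turn payload; items are plain strings in this sketch.
pub struct Turn {
    pub items: Vec<String>,
    pub items_view: TurnItemsView,
}

/// True only when an empty `items` list really means "no items exist".
pub fn is_known_empty(turn: &Turn) -> bool {
    turn.items.is_empty() && turn.items_view == TurnItemsView::Full
}
```

An empty `items` array paired with `notLoaded` tells a client to fetch the items separately rather than render an empty turn.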

## Verification

- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server thread_read_can_include_turns`
- `cargo test -p codex-app-server
thread_turns_list_can_page_backward_and_forward`
- `cargo test -p codex-app-server
thread_resume_rejects_history_when_thread_is_running`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fmt`

rhan-oai · 2026-05-05 19:17:16 +00:00

9e0c191c13

[codex-analytics] add tool item event schemas (#17089 )

## Why

Tool analytics need stable, typed payloads before the later lifecycle
reducer starts emitting them. Keeping the event schema definitions
isolated in their own PR makes the emitted surface reviewable separately
from the reducer logic that produces those events.

## What changed

- Adds the common tool-item analytics event base plus event payload
types for command execution, file changes, MCP calls, dynamic tools,
collaboration tools, web search, and image generation.
- Extends `TrackEventRequest` with the corresponding tool-item variants.
- Adds serialization coverage for the command-execution event shape.

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17089).
* #18748
* #18747
* #17090
* __->__ #17089
* #20514

rhan-oai · 2026-05-05 11:49:30 -07:00

fb7e1eb6fc

Add turn_id to Codex skill invocation analytics (#21122 )

edwardysun3 · 2026-05-05 00:11:06 -04:00

7e71d02610

88 Commits