mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
dev
701 Commits
-
[codex] Treat max as a first-class reasoning effort (#30467)
## Why

The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an opaque custom effort. That made the reasoning picker render it as lowercase `max` while known efforts use productized labels.

Making `max` a known effort aligns catalog data, parsing, and UI presentation without changing the `max` wire value or persisted representation.

## What changed

- Add first-class `ReasoningEffort::Max` parsing and serialization.
- Use the typed effort in the Bedrock catalog and render it as `Max` in the TUI.
- Preserve forward-compatible custom-effort coverage with a genuinely unknown `future` value.

### Before

<img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM" src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364" />

### After

<img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM" src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17" />
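The parsing and labeling behavior described in this change can be illustrated with a minimal sketch. The enum below is a hypothetical stand-in, not the actual `ReasoningEffort` definition in Codex: the variant set other than `Max`, the `Custom` bucket, and the method names `parse`, `as_wire`, and `label` are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for a reasoning-effort type; the real variants may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ReasoningEffort {
    Low,
    Medium,
    High,
    Max,
    /// Forward-compatible bucket for wire values this binary does not know yet.
    Custom(String),
}

impl ReasoningEffort {
    /// Parse a wire value. Unknown strings are preserved instead of rejected.
    pub fn parse(wire: &str) -> Self {
        match wire {
            "low" => Self::Low,
            "medium" => Self::Medium,
            "high" => Self::High,
            "max" => Self::Max,
            other => Self::Custom(other.to_string()),
        }
    }

    /// Serialize back to the wire value. `Max` still round-trips as `max`.
    pub fn as_wire(&self) -> &str {
        match self {
            Self::Low => "low",
            Self::Medium => "medium",
            Self::High => "high",
            Self::Max => "max",
            Self::Custom(raw) => raw.as_str(),
        }
    }

    /// Picker label: known efforts get a capitalized label, custom ones render verbatim.
    pub fn label(&self) -> String {
        match self {
            Self::Low => "Low".to_string(),
            Self::Medium => "Medium".to_string(),
            Self::High => "High".to_string(),
            Self::Max => "Max".to_string(),
            Self::Custom(raw) => raw.clone(),
        }
    }
}
```

With this shape, `max` parses to a typed variant and renders as `Max`, while a value such as `future` keeps flowing through the custom path unchanged.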
Shijie Rao ·
2026-06-29 09:38:49 -07:00 -
[codex] Use model metadata for skills usage instructions (#29740)
## Summary - add a false-by-default `include_skills_usage_instructions` model metadata field - enable the field for the bundled `gpt-5.5` model metadata - consume the metadata in both core and extension skill rendering - remove hardcoded legacy-model matching and its marker plumbing
ani-oai ·
2026-06-29 09:44:36 +09:00 -
Preserve namespaces on custom tool calls (#30302)
## Summary - Preserve the optional namespace on custom tool calls during response deserialization and app-server replay. - Use the namespaced tool identifier for streaming argument handling and tool dispatch. - Regenerate app-server protocol schemas. - Add regression tests covering namespace serialization and routing. ## Testing - Ran affected protocol and app-server test suites. - Ran the full core test suite; two load-sensitive timing tests passed when rerun individually. - Ran Clippy and formatting checks. - Verified with a local end-to-end app-server replay that the namespace is preserved through the complete request/response flow.
nhamidi-oai ·
2026-06-27 09:54:56 -07:00 -
feat(protocol): define missing rollout turn items (#30282)
## Description This PR adds canonical core `TurnItem` shapes for command execution, dynamic tool calls, collab agent tool calls, and sub-agent activity, to be stored in the rollout file soon. It also teaches app-server protocol / `ThreadHistoryBuilder` how to render those items, and adds the small legacy fanout helpers needed for existing event-based consumers. No core producer or rollout persistence behavior changes here, that will be done in a followup. ## Making ThreadHistoryBuilder stateless This is the first PR in a stack to make `ThreadHistoryBuilder` stateless enough that we can materialize app-server `ThreadItem`s from only a given slice of `RolloutItem` history, without ever needing to replay the whole thread from the beginning. The persisted legacy `RolloutItem::EventMsg` records are mostly shaped like live UI events, not like materialized `ThreadItem`s. They work if we replay the full rollout in order, but they often do not contain enough stable identity or complete item state to project an arbitrary suffix on its own. A few examples: - `UserMessageEvent` and `AgentMessageEvent` have content, but historically do not carry the persisted app-server item ID that should become the SQLite primary key. - `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are fragments. `ThreadHistoryBuilder` currently merges them into the last reasoning item, which means a slice starting in the middle of reasoning cannot know whether to append to an earlier item or create a new one. - `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and similar legacy events can often render a final-looking item, but they usually rely on prior replay state to know which turn owns the item. - Begin/end legacy events are partial views of one logical item. The builder correlates them by `call_id` and mutates prior state to synthesize the final `ThreadItem`. That is the problem this direction fixes. 
A persisted canonical lifecycle record looks much closer to the read model we actually want later:

```rust
ItemCompletedEvent {
    turn_id,
    item: TurnItem { id, ...full snapshot... },
    completed_at_ms,
}
```

Once rollout has explicit `turn_id`, stable `item.id`, and a canonical completed item snapshot, the future SQLite projector can reduce only the new rollout suffix and upsert the affected `thread_items` rows. It no longer needs to synthesize `item-N`, infer item ownership from the active turn, or replay earlier events just to reconstruct the current item snapshot.

## What changed

- Added core `TurnItem` variants and item structs for command execution, dynamic tool calls, collab agent tool calls, and sub-agent activity.
- Added conversions from those canonical items back into the legacy event shapes where current consumers still need them.
- Added app-server v2 `ThreadItem` conversion for the new core item variants.
- Taught `ThreadHistoryBuilder` and rollout persistence metrics to recognize the new item variants.

## Follow-up

The next PR https://github.com/openai/codex/pull/30283 switches the live core producers for these item families onto canonical `ItemStarted` / `ItemCompleted` events.
Owen Lin ·
2026-06-26 16:44:34 -07:00 -
feat(app-server): add history_mode to thread (#29927)
## Description This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`. This will be stored in `SessionMeta` in the JSONL rollout file and as a new column in the SQLite thread_metadata table, and exposed on `thread/start` and on the `Thread` object in app-server. ## What changed - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`, defaulting old and new SessionMeta to `legacy`. - Carried `history_mode` through core session config, ThreadStore stored metadata, local/in-memory stores, rollout metadata extraction, and the existing SQLite `threads` table. - Added experimental `historyMode` to app-server v2 `Thread` and `thread/start`. - Made paginated stored threads metadata-discoverable but unsupported for legacy full-history reads, `load_history`, live resume, and create paths. - Regenerated app-server schema fixtures and added protocol/state/thread-store/app-server coverage for persistence and fail-closed behavior. ## Compatibility floor Because users may be running various versions of Codex binaries on the same machine (TUI, Codex App, etc.), we will need to establish a compatibility floor for upcoming paginated threads, which will change how thread storage reads and writes work. The overall plan here: ``` Release N: - Add historyMode to SessionMeta / Thread / SQLite metadata. - Teach binaries to understand paginated threads. - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread. - Default remains `"legacy"`. Release N+1: - First-party clients start opting into paginated threads where appropriate. - Internal dogfood / staged rollout. - Measure old-client usage and paginated-thread unsupported errors. Release N+2: - Only after Release N+ is overwhelmingly deployed, make paginated the default. - Accept that a small tail of N-1-or-older binaries may not understand paginated threads. 
``` The important behavior change is fail-closed handling for a binary that encounters a persisted `paginated` thread before it knows how to fully support paginated history. In app-server, if a thread is `paginated`, we will: - allow metadata-only discovery paths like `thread/list` and `thread/read(includeTurns=false)`, so clients can still see the thread and inspect its `historyMode` - reject legacy full-history/live-thread paths like `thread/read(includeTurns=true)` and `thread/resume` with an unsupported JSON-RPC error - avoid silently treating an unknown or future `historyMode` as `legacy` Under the hood, the ThreadStore layer also rejects legacy operations that would need to load or replay the full thread history for a paginated thread. That gives us the behavior we want for Release N: future paginated threads are visible, but this binary fails closed instead of trying to operate on them as if they were legacy threads.
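The fail-closed rule described above can be sketched as a small gate in front of thread operations. Everything here is hypothetical naming (`HistoryMode`, `ThreadOp`, `check_op`, `Unsupported`); it is not the app-server's actual API. One assumption worth flagging: the description only says an unknown mode must not be treated as `legacy`, so this sketch chooses to handle unknown modes the same way as `paginated`.

```rust
/// Hypothetical parsed form of the persisted `historyMode` value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HistoryMode {
    Legacy,
    Paginated,
    /// A value written by a newer binary; never downgraded to `Legacy`.
    Unknown(String),
}

impl HistoryMode {
    pub fn parse(raw: &str) -> Self {
        match raw {
            "legacy" => Self::Legacy,
            "paginated" => Self::Paginated,
            other => Self::Unknown(other.to_string()),
        }
    }
}

/// Hypothetical operation kinds, loosely modeled on the paths named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadOp {
    List,
    ReadMetadataOnly,
    ReadWithTurns,
    Resume,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Unsupported {
    pub reason: String,
}

/// Allow metadata-only discovery for every mode, but reject full-history
/// and live-thread paths unless the thread is `legacy`.
pub fn check_op(mode: &HistoryMode, op: ThreadOp) -> Result<(), Unsupported> {
    let metadata_only = matches!(op, ThreadOp::List | ThreadOp::ReadMetadataOnly);
    match mode {
        HistoryMode::Legacy => Ok(()),
        _ if metadata_only => Ok(()),
        HistoryMode::Paginated => Err(Unsupported {
            reason: "paginated history is not supported by this binary".to_string(),
        }),
        HistoryMode::Unknown(raw) => Err(Unsupported {
            reason: format!("unknown history mode: {raw}"),
        }),
    }
}
```

The point of routing unknown values through their own variant is that a typo-tolerant default such as "fall back to legacy" can never happen by accident.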
Owen Lin ·
2026-06-26 09:12:42 -07:00 -
Expose MCP app identity in app context (#29934)
## Why MCP tool-call events need to expose trusted app identity and action metadata directly so v2 clients do not have to infer it from tool names or resource URIs. ## What changed - Add optional `appName`, `templateId`, and `actionName` fields to MCP tool-call `appContext`. - Populate `appName` and `templateId` from trusted Codex Apps metadata, and derive `actionName` from the trusted app resource metadata. - Preserve all three fields through core events, legacy protocol events, persisted thread history, resume redaction, and app-server v2 responses. - Document the public `appContext` fields in `codex-rs/app-server/README.md`. - Regenerate app-server JSON and TypeScript schemas and add coverage for serialization, persistence, redaction, and metadata propagation. ## Validation - `just test -p codex-app-server-protocol mcp_tool_call` - `just test -p codex-core mcp_tool_call_item_metadata_only_trusts_codex_apps_identity mcp_tool_call_item_includes_app_identity` - `just write-app-server-schema` --------- Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>
Martin Au-Yeung ·
2026-06-25 18:31:10 -07:00 -
[codex] Surface MCP reauthentication-required startup failures (#29877)
## Summary - distinguish expired, non-refreshable stored MCP OAuth credentials from first-time missing credentials - carry a typed `failureReason: "reauthenticationRequired"` on the existing `mcpServer/startupStatus/updated` notification only when user action is required - keep the public MCP auth-status API unchanged and regenerate the app-server protocol schemas and documentation ## Why An MCP server with an expired access token and no usable refresh token currently fails startup without giving clients a reliable, typed recovery signal. The existing startup-status notification is the natural place to carry this state. Its nullable `failureReason` keeps the recovery reason attached to the failed startup transition without adding a one-off notification. Internally, Codex distinguishes first-time login from reauthentication and emits the reason only when the startup error itself requires authentication. ## User impact App clients can prompt an existing user to reconnect an MCP server when automatic recovery is impossible by handling a failed `mcpServer/startupStatus/updated` notification whose `failureReason` is `reauthenticationRequired`. Starting, ready, cancelled, unrelated failures, and first-time setup carry no reauthentication reason. ## Companion app PR - openai/openai#1069582 ## Validation - `just test -p codex-app-server-protocol` — 248 passed; schema fixture tests passed - `cargo check -p codex-app-server -p codex-tui` - `just test -p codex-rmcp-client -p codex-mcp` — 184 passed, 2 skipped - `just test -p codex-protocol -p codex-app-server-protocol -p codex-mcp` — 579 passed - `just write-app-server-schema` - `just fmt`
felixxia-oai ·
2026-06-25 21:50:36 +00:00 -
Persist selected capability roots and resolve availability per model step (#29856)
## Why `selectedCapabilityRoots` is durable thread intent: “use this capability root from environment `worker`.” The important product assumption is: > One environment ID always names the same logical executor and stable contents. `worker` does not silently change from executor A to an unrelated executor B. The process-local connection handle for `worker` can still be replaced while Codex is running, though, for example when `environment/add` registers a fresh handle for the same logical environment. The thread should persist only the stable selection. Each model step should pair that selection with the exact ready handle captured for that step. ## The boundary ```text persisted thread intent plugin@1 -> environment "worker" | | capture the current step v model-step view unavailable, or plugin@1 + worker's exact captured ready handle ``` The environment ID is the stable identity and cache key. The `Arc<Environment>` is only a process-local handle retained so consumers of one model step use the same captured environment. It is never persisted and it does not imply different environment contents. ## What changes ### Persist the stable selection Selected roots are written into `SessionMeta` and restored with the thread. Forked subagents inherit the same selections, including bounded-history forks. Only stable data is persisted: root ID, environment ID, and root path. ### Capture readiness together with the exact handle The environment snapshot records: ```rust environment_id -> Some(Arc<Environment>) // ready in this step environment_id -> None // still starting in this step ``` This prevents readiness and execution from coming from different registry snapshots. 
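As a rough illustration of the snapshot shape above, the following sketch resolves persisted selections against a per-step snapshot. The type and function names (`SelectedRoot`, `ResolvedRoot`, `EnvironmentSnapshot`, `resolve_for_step`) and the `Environment` fields are assumptions for illustration only; the real resolver also handles lazy environment startup, which is left out here.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical process-local environment handle.
pub struct Environment {
    pub name: String,
}

/// The stable, persisted part of a selection: IDs and a path, never a handle.
pub struct SelectedRoot {
    pub root_id: String,
    pub environment_id: String,
    pub root_path: String,
}

/// Per-step view: `Some(handle)` means ready in this step, `None` means still starting.
pub type EnvironmentSnapshot = HashMap<String, Option<Arc<Environment>>>;

/// A selection paired with the exact handle captured for this step.
pub struct ResolvedRoot {
    pub root_id: String,
    pub environment: Arc<Environment>,
}

/// Resolve selections against one snapshot. Starting or missing environments
/// are omitted from the result, but the selection itself is left untouched.
pub fn resolve_for_step(
    selected: &[SelectedRoot],
    snapshot: &EnvironmentSnapshot,
) -> Vec<ResolvedRoot> {
    selected
        .iter()
        .filter_map(|root| {
            let handle = snapshot.get(&root.environment_id)?.as_ref()?;
            Some(ResolvedRoot {
                root_id: root.root_id.clone(),
                environment: Arc::clone(handle),
            })
        })
        .collect()
}
```

Because readiness and the handle come from the same map entry, a handle registered after the snapshot was taken cannot be mistaken for the one that was observed as ready.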
For example: ```text step snapshot: worker -> handle A, ready environment/add: worker -> fresh handle B for the same logical environment current step: plugin@1 still uses captured handle A ``` Without carrying handle A in the snapshot, the resolver could combine “A was ready” with handle B and treat B as ready before it had finished starting. This does not change cache invalidation. Stable capability metadata remains identified by environment ID and capability root. Replacing a process-local handle under the same stable environment ID does not invalidate or rediscover that metadata. ### Resolve availability per model step - A ready captured environment produces resolved roots using its captured handle. - A starting, missing, or failed environment is omitted from that step. - A selected lazy environment that is outside the turn's captured environment set is asked to start, and a later step can observe it as ready. - No capability files are scanned here. Transient transport disconnects remain the remote client's reconnect concern. This PR models initial attachment/readiness; it does not add live socket-connectivity state. ## Example ```text thread selection: plugin@1 -> environment "worker" step 1: worker is starting -> plugin@1 unavailable step 2: worker is ready -> plugin@1 resolves through worker's captured handle step 3: fresh local handle -> current step remains pinned; a later step captures its own view ``` Temporary unavailability does not discard the durable selection. Later PRs can retain stable metadata caches while projecting only currently available capabilities into model-visible World State. ## Compatibility The app-server request shape does not change. Older rollouts without `selected_capability_roots` deserialize to an empty list. ## Stack 1. **This PR:** persist stable selected roots and resolve them through an exact model-step handle. 2. #29960: cache stable skill metadata and project available skills into World State. 3. 
#29946: cache stable plugin declarations and manage the separate live MCP runtime.
jif ·
2026-06-25 17:49:43 +00:00 -
[codex] Add Ultra reasoning effort (#29899)
## Why Ultra should be one user-facing reasoning selection for work that benefits from both maximum reasoning and proactive multi-agent delegation. Without it, clients must coordinate maximum reasoning with the experimental `multiAgentMode` setting, even though the inference backend still expects its existing `max` effort value. This change makes reasoning effort the source of truth: clients select `ultra`, core derives proactive multi-agent behavior when the turn is eligible for multi-agent V2, and inference requests continue to use the backend-compatible `max` value. ## What changed - Add `ultra` as a first-class reasoning effort and preserve model-catalog ordering when exposing it to clients. - Convert `ultra` to `max` at the inference request boundary, including Responses HTTP/WebSocket requests, startup prewarm, compaction, and memory summarization. - Derive effective multi-agent mode per turn from effective reasoning effort: - eligible multi-agent V2 + `ultra` → `proactive` - eligible multi-agent V2 + any other effort → `explicitRequestOnly` - V1 or otherwise ineligible sessions → no multi-agent mode instruction - Keep the derived effective mode in turn context history so successive turns can emit a developer-message update only when the effective mode changes. - Remove selected multi-agent mode from core session configuration, turn construction, thread settings, resume/fork restoration, and subagent spawn plumbing. Subagents inherit reasoning effort and derive their own effective mode. - Retain the experimental app-server `multiAgentMode` fields for wire compatibility while marking them deprecated. Request values are accepted but ignored; compatibility response fields report `explicitRequestOnly`. - Display Ultra in the TUI using the order supplied by `model/list`. ## Validation - `just test -p codex-core ultra_reasoning_uses_max_for_requests` - `just test -p codex-tui model_reasoning_selection_popup`
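The two derivations in this change, the wire conversion and the per-turn multi-agent mode, can be sketched as pure functions. The enum and function names below are hypothetical, and the set of efforts other than `max` and `ultra` is assumed; only the mapping rules themselves come from the description above.

```rust
/// Hypothetical user-facing reasoning efforts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effort {
    Low,
    Medium,
    High,
    Max,
    Ultra,
}

/// Hypothetical effective multi-agent modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MultiAgentMode {
    Proactive,
    ExplicitRequestOnly,
}

/// Value sent at the inference request boundary: `ultra` is converted to `max`.
pub fn wire_effort(effort: Effort) -> &'static str {
    match effort {
        Effort::Low => "low",
        Effort::Medium => "medium",
        Effort::High => "high",
        Effort::Max | Effort::Ultra => "max",
    }
}

/// Derive the effective mode for one turn from eligibility and effort.
/// Ineligible sessions (V1 or otherwise) get no multi-agent mode at all.
pub fn effective_multi_agent_mode(
    eligible_for_v2: bool,
    effort: Effort,
) -> Option<MultiAgentMode> {
    if !eligible_for_v2 {
        return None;
    }
    Some(if effort == Effort::Ultra {
        MultiAgentMode::Proactive
    } else {
        MultiAgentMode::ExplicitRequestOnly
    })
}
```

Keeping both derivations as functions of the effort is what lets reasoning effort act as the single source of truth: nothing else needs to be stored or restored per thread.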
Shijie Rao ·
2026-06-24 20:13:52 -07:00 -
[2/3] core: persist world state in rollouts (#29835)
## Why `WorldState` currently remembers its model-visible diff baseline only in memory. That leaves no durable source for restoring the exact baseline after resume, fork, rollback, or compaction. This is the second PR in the WorldState persistence stack, built on #29833 and following #29249. It records durable state transitions; the next PR will replay them during rollout reconstruction. ## What - Add a `world_state` rollout item containing either a full snapshot or an RFC 7386 JSON Merge Patch. - Persist a full snapshot after initial context and after compaction establishes a new context window. - Persist non-empty patches when later sampling steps or turns advance the WorldState baseline. - Write model-visible history before its matching WorldState record, so an interrupted write can only cause a safe repeated update on replay. - Preserve WorldState records for full-history forks while excluding them from thread previews, metadata, and app-server history materialization. Older binaries read rollout lines independently, so they skip the unknown `world_state` records while retaining the rest of the thread. ## Testing - `just test -p codex-core snapshot_merge_patch_changes_and_removes_nested_values` - `just test -p codex-core world_state_baseline_deduplicates_until_history_is_replaced` - `just test -p codex-core deferred_executor_compaction_preserves_then_updates_environment_once` - `just test -p codex-protocol` - `just test -p codex-rollout` - `just test -p codex-state` - `just test -p codex-thread-store` - `just test -p codex-app-server-protocol`
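For readers unfamiliar with RFC 7386, the merge-patch rule that the `world_state` records rely on can be sketched without any dependencies. The `Json` type and the `obj` helper are stand-ins written for this example (a real implementation would more likely use an existing JSON library), and nothing here reflects how Codex actually stores or applies the patches.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value type, defined here only to keep the sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Convenience constructor for objects.
pub fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Object(
        pairs
            .iter()
            .map(|(key, value)| (key.to_string(), value.clone()))
            .collect(),
    )
}

/// Apply an RFC 7386 JSON Merge Patch to `target` and return the result.
pub fn merge_patch(target: Json, patch: &Json) -> Json {
    // A non-object patch replaces the target wholesale.
    let Json::Object(patch_fields) = patch else {
        return patch.clone();
    };
    // A non-object target is treated as an empty object.
    let mut merged = match target {
        Json::Object(fields) => fields,
        _ => BTreeMap::new(),
    };
    for (key, value) in patch_fields {
        if *value == Json::Null {
            // `null` in a patch means "remove this member".
            merged.remove(key);
        } else {
            let current = merged.remove(key).unwrap_or(Json::Null);
            merged.insert(key.clone(), merge_patch(current, value));
        }
    }
    Json::Object(merged)
}
```

One consequence of this format is that a patch cannot set a member to `null`, and arrays are replaced as a whole rather than merged element by element.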
sayan-oai ·
2026-06-24 20:13:49 -07:00 -
core: add configurable <context_window_guidance> message (#29936)
## Why This PR adds a configurable `<context_window_guidance>` developer section immediately after `<context_window>`. Harness integrations need this section to give the model deployment-specific instructions for preparing for context-window transitions. ## What changed - Add an optional `features.token_budget.guidance_message` config with a 1,000-byte runtime cap and generated schema support. - Render configured guidance as a developer `ContextualUserFragment` wrapped in `<context_window_guidance>` immediately after `<context_window>`. - Omit the section when guidance is unset, empty, or whitespace-only. - Preserve the resolved value in config locks and classify persisted guidance as contextual developer content. - Add integration coverage for rendered content and ordering.
Michael Bolin ·
2026-06-24 18:03:44 -07:00 -
Persist agent messages as response items (#29829)
## Why Inter-agent messages are recorded in live history as `ResponseItem::AgentMessage`, but rollouts stored `InterAgentCommunication` and rebuilt the response item during resume. This made the rollout differ from the actual Responses history. ## What changed - store the prepared `agent_message` response item directly - keep `trigger_turn` in a small local metadata record for fork truncation - keep reading older `inter_agent_communication` rollout items
jif ·
2026-06-24 15:43:10 +01:00 -
auth: move domain mode below app wire types (#29721)
## Why Authentication mode is a domain concept used by login, model selection, telemetry, and transports. Keeping the canonical type in app-server protocol forces those lower-level crates to depend on an unrelated wire API. ## What changed - Added canonical `codex_protocol::auth::AuthMode` domain values. - Kept the app-server wire DTO unchanged and added an explicit app-side conversion. - Removed production app-server-protocol dependencies from login, model-provider-info, models-manager, and otel call paths. ## Stack This is PR 2 of 6, stacked on [PR #29714](https://github.com/openai/codex/pull/29714). Review only the delta from `codex/split-json-rpc-protocols`. Next: [PR #29722](https://github.com/openai/codex/pull/29722). ## Validation - Auth and login coverage passed in the focused protocol/domain test run. - App-server account and auth conversion coverage passed.
Adam Perry @ OpenAI ·
2026-06-24 03:10:20 +00:00 -
chore: assign `amsg_` IDs to agent messages (#29750)
## Why

The `ItemIds` path fills in missing IDs before response items are persisted and emitted as raw item events. `ResponseItem::AgentMessage` is part of that same response-item stream, but it was skipped by the missing-ID repair path, leaving agent messages without stable item IDs while messages and tool items received generated IDs.

Agent messages recorded through `InterAgentCommunication` also need the generated ID to survive rollout persistence and resume. Otherwise clients can observe an `amsg_` ID for the live raw response item, then see that same persisted agent message lose its item ID after restart.

## What changed

- Assign missing `ResponseItem::AgentMessage` IDs with the `amsg_` prefix.
- Persist the generated item ID on `InterAgentCommunication` and replay it back into the reconstructed `ResponseItem::AgentMessage` on resume.
- Keep the persisted ID out of the model-visible inter-agent message envelope.
- Keep `CompactionTrigger` and `Other` skipped because they do not get generated item IDs.
- Update session/protocol tests for agent-message ID assignment and resume preservation.

## Manual Testing

Run the local dev build using `just c --enable item_ids` to ensure this code is exercised:

https://github.com/openai/codex/blob/322e33512b2d38d38d705e2ef692a8aca50decac/codex-rs/core/src/session/mod.rs#L2713-L2715

In the `.jsonl` file, I saw entries like:

```json
{
  "timestamp": "2026-06-24T00:44:03.098Z",
  "type": "inter_agent_communication",
  "payload": {
    "id": "amsg_019ef715-849a-7a50-becc-ce63c6a9c994",
```

## Test plan

- `just test -p codex-core record_inter_agent_communication_preserves_item_id_in_rollout_and_resume`
- `just test -p codex-core record_inter_agent_communication_sets_turn_id_in_rollout_and_resume`
- `just test -p codex-protocol inter_agent_communication_response_input_item_preserves_commentary_phase`
Michael Bolin ·
2026-06-23 17:57:03 -07:00 -
Support thread-level originator overrides (#29477)
## Why Work(TPP) threads can be launched from the Desktop app, but if they all keep the Desktop app's default originator then downstream attribution cannot distinguish local Work launches from cloud-backed Work launches. `thread/start.serviceName` already carries that launch signal, while `SessionMeta.originator` is the durable thread-level value that survives resume and fork. This change converts the Desktop Work service names into an effective originator at thread creation time, persists that originator with the thread, and keeps using it for later model requests and memory writes. ## What changed - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to per-thread originators, while preserving `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override. - Persist the effective originator in `SessionMeta.originator`, read it back on resume/fork, and inherit the parent originator for subagent spawns when there is no persisted session metadata. - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling back to the live parent originator when the forked history no longer includes `SessionMeta`. - Thread the per-thread originator through Responses headers, websocket/compaction request paths, thread-store creation, rollout metadata, and memory stage-one telemetry. ## Verification - `just test -p codex-core agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta thread_manager::tests::originator_override_precedes_service_name_remapping` - `just test -p codex-core agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode` - `just test -p codex-memories-write` - `just fix -p codex-core -p codex-memories-write` - `git diff --check`
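The precedence described above can be summarized in a short sketch. The function names and the originator strings `codex_work_local` and `codex_work_cloud` are placeholders invented for this example; the description does not state the actual originator values. The override is passed in as a parameter here rather than read from the process, which is also an assumption.

```rust
/// Pick the originator for a new thread.
/// Precedence: internal override, then service-name remapping, then the default.
pub fn effective_originator(
    internal_override: Option<&str>,
    service_name: Option<&str>,
    default_originator: &str,
) -> String {
    if let Some(forced) = internal_override {
        return forced.to_string();
    }
    match service_name {
        // Placeholder originator values; the real strings are not given in the source.
        Some("CODEX_WORK_LOCAL") => "codex_work_local".to_string(),
        Some("CODEX_WORK_CLOUD") => "codex_work_cloud".to_string(),
        _ => default_originator.to_string(),
    }
}

/// Pick the originator for a resumed or forked thread: prefer the value persisted
/// in session metadata, and fall back to the live parent when a truncated fork
/// no longer carries that metadata.
pub fn originator_for_fork(persisted: Option<&str>, live_parent: &str) -> String {
    persisted.unwrap_or(live_parent).to_string()
}
```

For example, a thread started with service name `CODEX_WORK_CLOUD` and no override would get the cloud placeholder, and any later fork that lost its session metadata would inherit whatever the live parent is using.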
alexsong-oai ·
2026-06-23 17:23:38 -07:00 -
[codex] rename rollout budget error to session budget error (#29744)
## Summary - rename the rollout-budget exhaustion error from `RolloutBudgetExceeded` to `SessionBudgetExceeded` - expose the matching app-server v2 wire value as `sessionBudgetExceeded` - regenerate JSON/TypeScript schema fixtures and update the app-server docs and focused tests This is a naming-only follow-up to #29715 based on [Pavel's review suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480). Runtime behavior is unchanged. ## Tests - `just test -p codex-core rollout_budget` - `just test -p codex-app-server-protocol` - `just fmt` - `just write-app-server-schema`
rka-oai ·
2026-06-23 16:49:13 -07:00 -
[codex] surface rollout budget exhaustion (#29715)
## Summary - surface shared rollout-budget exhaustion as `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn - map it through the existing `CodexErrorInfo` and app-server v2 `codexErrorInfo` path - keep local compaction from retrying after the shared rollout budget is exhausted This gives app-server clients a stable `rolloutBudgetExceeded` error they can classify without guessing from `status="interrupted"`. ## Tests - `just test -p codex-core rollout_budget`
rka-oai ·
2026-06-23 15:01:28 -07:00 -
Make selected plugin roots URI-native (#28918)
## Why Selected capability roots belong to the executor filesystem, not the app-server host. Converting their path strings into the host's native `Path` breaks whenever the two machines use different path conventions, such as a Windows executor behind a Unix app-server. This PR establishes `PathUri` as the selected-plugin boundary so the executor remains authoritative for its paths. ## What changed - Require `selectedCapabilityRoots[].location.path` to be a canonical `file:` URI and deserialize it directly as `PathUri`; native path strings are rejected. - Update the app-server schema, generated TypeScript, examples, and request coverage for the URI contract. - Keep selected roots, resolved plugin locations, manifest paths, and manifest resources as `PathUri`. - Inspect and read plugin roots and manifests only through the selected environment's `ExecutorFileSystem`. - Parse executor manifests with the shared URI-native parser from #29620 instead of projecting them onto the host filesystem. - Enforce resource containment lexically and preserve the root URI's POSIX or Windows path convention. - Cover foreign Windows plugin roots and URI-native manifest resources. ```text thread/start selectedCapabilityRoots[].location.path = "file:///C:/plugins/demo" | PathUri v ExecutorFileSystem | +--> plugin.json +--> manifest resources ``` This PR stops at the shared selected-plugin representation. The next two PRs remove the remaining host-path projections in the skill and MCP consumers. ## Stack 1. #29614 — add lexical `PathUri` containment. 2. #29620 — share URI-native manifest path resolution. 3. **This PR** — keep selected plugin roots and resources URI-native. 4. #29626 — load executor skills without host path conversion. 5. #29628 — resolve executor MCP working directories without host path conversion.
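To make "lexical containment" concrete, here is a deliberately simplified sketch that works on `file:` URI strings without touching any filesystem. It is not the `PathUri` implementation: the function names are hypothetical, and the sketch ignores percent-encoding, URI hosts, and Windows case-insensitivity, all of which a real implementation would need to consider.

```rust
/// Split the path of a `file://` URI into segments, resolving `.` and `..` lexically.
/// Returns `None` for non-`file://` input or when `..` would climb above the root.
fn normalized_segments(uri: &str) -> Option<Vec<&str>> {
    let path = uri.strip_prefix("file://")?;
    let mut segments = Vec::new();
    for segment in path.split('/') {
        match segment {
            "" | "." => {}
            ".." => {
                segments.pop()?;
            }
            other => segments.push(other),
        }
    }
    Some(segments)
}

/// True when `candidate_uri` stays inside `root_uri` after lexical normalization.
/// Drive letters such as `C:` are compared as ordinary segments, so the root's
/// POSIX or Windows convention is preserved rather than translated.
pub fn is_lexically_contained(root_uri: &str, candidate_uri: &str) -> bool {
    match (
        normalized_segments(root_uri),
        normalized_segments(candidate_uri),
    ) {
        (Some(root), Some(candidate)) => candidate.starts_with(&root),
        _ => false,
    }
}
```

Because the check never converts the URI into a host `Path`, a Unix app-server can validate resources under a Windows executor root such as `file:///C:/plugins/demo`.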
jif ·
2026-06-23 22:51:19 +01:00 -
core: persist initial context window metadata (#29519)
## Why PR #29494 made context-window IDs visible to the model by wrapping the token-budget window payload in `<context_window>`, but rollout JSONL consumers still could not see the initial window identity by tailing the session file. Compacted rollout items carry window IDs only after compaction has happened, so a session with no compaction had no durable JSONL record for window 0. This change gives tailing consumers a stable initial-window record at session creation time. ## What Changed - Added `session_meta.context_window.window_id` for the initial context-window identity. - `CreateThreadParams` now requires `initial_window_id: String`, so thread-store callers cannot accidentally create new threads without window-0 metadata. - Live thread creation derives the persisted initial window ID from the same `AutoCompactWindowIds` used to initialize `SessionState`, keeping runtime state and JSONL metadata aligned. - Rollout reconstruction uses `session_meta.context_window.window_id` as the initial-window fallback and derives `window_number = 0`, `first_window_id = window_id`, and `previous_window_id = None` internally. - Fork reconstruction intentionally uses the same rollout reconstruction path; consumers that need to distinguish copied initial-window metadata can use the rollout `thread_id`. - Legacy compactions without `window_number` still use compaction-count fallback accounting instead of being reset to window 0 by the initial-window fallback. - Compacted rollout metadata still takes precedence once compaction records exist, preserving the richer chain fields there. ## JSONL Shape Real rollout JSONL is one object per line. 
This example is expanded for readability, but shows the new initial `session_meta.context_window` record followed by the existing compacted rollout item shape that also carries window IDs: ```jsonl { "timestamp": "2026-06-22T12:00:00.000Z", "type": "session_meta", "payload": { "session_id": "<THREAD_ID>", "id": "<THREAD_ID>", "timestamp": "2026-06-22T12:00:00.000Z", "cwd": "/repo", "originator": "codex", "cli_version": "0.0.0", "source": "cli", "model_provider": "<MODEL_PROVIDER>", "context_window": { "window_id": "<INITIAL_WINDOW_ID>" } } } ... { "timestamp": "2026-06-22T12:34:56.000Z", "type": "compacted", "payload": { "message": "<COMPACTION_SUMMARY>", "replacement_history": [ "..." ], "window_number": 1, "first_window_id": "<INITIAL_WINDOW_ID>", "previous_window_id": "<INITIAL_WINDOW_ID>", "window_id": "<NEXT_WINDOW_ID>" } } ``` The nested `context_window` object is intentional: it gives rollout consumers a stable namespace for context-window metadata while only writing the non-derivable initial `window_id`. For the initial window, `window_number`, `first_window_id`, and `previous_window_id` are derived internally instead of being written to the rollout. ## Verification - `just test -p codex-protocol` - `just test -p codex-rollout recorder_materializes_on_flush_with_pending_items` - `just test -p codex-core reconstruct_history` - `just test -p codex-core record_initial_history_reconstructs_forked_transcript` - `just test -p codex-thread-store` - `just test -p codex-state` - `just test -p codex-app-server thread_read_returns_summary_without_turns` - `just test -p codex-rollout persistence_metrics`
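The fallback rule for the initial window can be sketched as follows. `WindowIds`, `initial_window`, and `current_window` are hypothetical names chosen for this example, and the sketch leaves out the legacy compaction-count fallback mentioned above.

```rust
/// Hypothetical in-memory view of context-window identity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WindowIds {
    pub window_number: u64,
    pub first_window_id: String,
    pub previous_window_id: Option<String>,
    pub window_id: String,
}

/// Derive the full record for window 0 from the single persisted `window_id`.
pub fn initial_window(window_id: &str) -> WindowIds {
    WindowIds {
        window_number: 0,
        first_window_id: window_id.to_string(),
        previous_window_id: None,
        window_id: window_id.to_string(),
    }
}

/// Compacted rollout metadata wins when present; otherwise fall back to the
/// initial window recorded in session metadata.
pub fn current_window(
    session_meta_window_id: &str,
    latest_compacted: Option<WindowIds>,
) -> WindowIds {
    latest_compacted.unwrap_or_else(|| initial_window(session_meta_window_id))
}
```

Only `window_id` needs to be written for the initial window because the other three fields are fully determined by it.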
Michael Bolin ·
2026-06-23 21:50:50 +00:00 -
core: resolve view_image paths in selected environment (#29526)
## Why view_image needs to support foreign OS remote executors. ## What - resolve image paths against the selected environment as `PathUri` and read them through that environment's filesystem - keep app-server's public path field wire-compatible as `LegacyAppPathString`, with purpose-specific UI rendering - cover relative and absolute target-native paths in the core integration test and run the full `view_image` suite under wine-exec without skips
Adam Perry @ OpenAI ·
2026-06-23 19:52:37 +00:00 -
chore(core) rm AskForApproval::OnFailure (#28418)
## Summary Deletes the OnFailure variant of the `AskForApproval` enum. This option has been deprecated since #11631. ## Testing - [x] Tests pass
Dylan Hurd ·
2026-06-23 12:13:54 -07:00 -
app-server: document thread and turn IDs are UUID7 (#27714)
It's actually a very nice property that these are UUID7s, so documenting them so we think twice before changing it away from UUID7s in the future.
Owen Lin ·
2026-06-23 11:46:36 -07:00 -
Share resumed rollout history (#28426)
## Summary Resuming a persisted thread currently deep-clones its complete rollout history several times. `InitialHistory` is retained for the app-server response, copied into thread persistence, and copied again by read-only accessors. These copies scale with the complete rollout rather than the bounded model context and add measurable latency for large sessions. This change stores resumed rollout history in `Arc<Vec<RolloutItem>>`. Rollout loading wraps the parsed vector once, while app-server response construction, session initialization, and thread persistence share it through inexpensive `Arc` clones. Read-only history access now returns a borrowed slice, and fork paths use `Arc::unwrap_or_clone` where they genuinely need mutable ownership. Rollout reconstruction also consumes its temporary context instead of cloning the reconstructed model history. The serialized representation remains unchanged. In an artificial 123 MB rollout benchmark, sharing resumed history reduced cold resume latency by roughly 9–10%. The affected crates compile with their test targets, all 80 thread-store tests pass, and the Bazel dependency lock remains valid.
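The sharing pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the actual Codex code: `RolloutItem` here is a hypothetical stand-in, and `ResumedHistory`, `from_parsed`, `into_owned`, and `handle_count` are names invented for the example.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real `RolloutItem`; only the sharing pattern matters.
#[derive(Clone, Debug, PartialEq)]
pub struct RolloutItem(pub String);

/// Resumed history that is parsed once and then shared between consumers.
#[derive(Clone)]
pub struct ResumedHistory {
    items: Arc<Vec<RolloutItem>>,
}

impl ResumedHistory {
    /// Wrap the parsed vector exactly once at load time.
    pub fn from_parsed(items: Vec<RolloutItem>) -> Self {
        Self { items: Arc::new(items) }
    }

    /// Read-only access borrows a slice instead of cloning the vector.
    pub fn items(&self) -> &[RolloutItem] {
        &self.items
    }

    /// Fork-style paths take ownership; the vector is cloned only if
    /// another handle still shares it.
    pub fn into_owned(self) -> Vec<RolloutItem> {
        Arc::unwrap_or_clone(self.items)
    }

    /// Number of live handles sharing the same allocation.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.items)
    }
}
```

Cloning `ResumedHistory` only bumps a reference count, so the cost of handing history to several consumers no longer scales with rollout size.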
Charlie Marsh ·
2026-06-23 10:23:25 -04:00 -
[codex] Use input items for Responses Lite tools (#27946)
When using Responses Lite, requests should use `additional_tools` and a developer item instead of the top-level `tools` array and `instructions` field. This keeps things 1-to-1. Forced namespacing for _all_ tools will land in a following PR after some coordination and fixes in the Responses API (around collisions and return items). The goal is to eventually expand the scope of this to _all_ requests from Codex, but that will require larger coordination across providers and a slower rollout.
rka-oai ·
2026-06-22 23:56:16 -07:00 -
Propagate safety buffering treatment metadata (#29473)
## Summary - read the request-scoped safety-buffering treatment from HTTP response headers and per-turn WebSocket metadata through one shared header parser - combine that treatment with Responses API safety-buffering signals - propagate `showBufferingUi` and nullable `fasterModel` through the existing `model/safetyBuffering/updated` app-server notification - update the app-server documentation and generated JSON and TypeScript schemas The public implementation contains no model mapping or real model identifier. Tests and protocol examples use generic `current-model` and `faster-model` placeholders only. ## Dependencies - server-side treatment evaluation: https://github.com/openai/openai/pull/1060247 - initial Responses API safety-buffering propagation: https://github.com/openai/codex/pull/29371 - Codex App UI: https://github.com/openai/openai/pull/1057789 ## Validation - Codex API tests: 129 passed - focused Codex core safety-buffering integration test passed - app-server protocol tests passed after regenerating schema fixtures - Clippy fix and repository formatting completed successfully The broader app-server run compiled all changed crates and completed with 1,269 passing tests. Its remaining failures were unrelated environment limitations: macOS sandbox application was denied, one expected test binary was unavailable, and several existing subprocess tests timed out as a result.
Francis Chalissery ·
2026-06-22 19:51:03 -07:00 -
chore: improve expired Bedrock credential errors (#28992)
## Why Amazon Bedrock returns a `401 Unauthorized` response containing `Signature expired:` when an AWS credential, including a short-lived `AWS_BEARER_TOKEN_BEDROCK`, has expired. Codex currently surfaces that response as a generic `unexpected status` error, which does not explain how to recover. Environment-provided bearer tokens cannot be refreshed automatically, so the error should direct users to refresh their AWS credentials or replace or remove the environment token and restart Codex. This classification belongs to the Amazon Bedrock provider so similar responses from other providers retain their existing behavior. ## What changed - Add a synchronous `ModelProvider::map_api_error` hook that defaults to the existing provider-neutral API error mapping, and route model request, stream, WebSocket, and terminal unauthorized errors through the active provider. - Override the hook for Amazon Bedrock. After preserving the structured status, body, URL, and request metadata, recognize `401` responses containing `Signature expired:` and attach actionable credential guidance. - Keep `codex-protocol` provider-neutral by representing the guidance as an optional `user_message`. Error rendering prefers this message while continuing to append the URL, request ID, Cloudflare ray, and authorization diagnostics. - Add model-provider coverage for expired signatures and negative cases, core coverage for provider dispatch after unauthorized recovery, and a TUI snapshot for the rendered error. ## Testing Tested with a real request with expired bedrock key: <img width="962" height="126" alt="Screenshot 2026-06-22 at 3 56 51 PM" src="https://github.com/user-attachments/assets/7e21cc7c-798e-4662-8467-7f304a2f2b59" />
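A rough sketch of the provider hook described above, assuming a simplified error type. Only `ModelProvider::map_api_error`, the `401` status, the `Signature expired:` marker, and the optional `user_message` come from the description; `ApiError`, `GenericProvider`, `BedrockProvider`, and the guidance wording are hypothetical.

```rust
/// Simplified, hypothetical error shape; the real type carries more metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ApiError {
    pub status: u16,
    pub body: String,
    /// Optional provider-supplied guidance that error rendering prefers.
    pub user_message: Option<String>,
}

pub trait ModelProvider {
    /// Default hook: the provider-neutral mapping leaves the error unchanged.
    fn map_api_error(&self, error: ApiError) -> ApiError {
        error
    }
}

/// Any provider that keeps the default behavior.
pub struct GenericProvider;
impl ModelProvider for GenericProvider {}

/// Provider that recognizes expired-signature responses.
pub struct BedrockProvider;
impl ModelProvider for BedrockProvider {
    fn map_api_error(&self, mut error: ApiError) -> ApiError {
        // Status and body are preserved; only guidance is attached.
        if error.status == 401 && error.body.contains("Signature expired:") {
            // Hypothetical wording, not the message Codex actually prints.
            error.user_message = Some(
                "AWS credentials have expired; refresh them or replace the environment token, then restart."
                    .to_string(),
            );
        }
        error
    }
}
```

Keeping the default implementation on the trait means other providers retain their existing behavior without any code changes.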
Celia Chen ·
2026-06-23 00:53:09 +00:00 -
feat(core): store turn_id on ResponseItem metadata (#28360)
## Description This PR is a followup to https://github.com/openai/codex/pull/28355 and starts assigning `internal_chat_message_metadata_passthrough.turn_id` to durable Responses API items created during a turn. The goal is that those items keep the `turn_id` that introduced them when Codex resends stateless HTTP context, reconstructs history for resume/fork paths, or reuses websocket response state. ## What changed - Set `internal_chat_message_metadata_passthrough.turn_id` when missing as response items enter durable history, initial/replacement history, inter-agent communication history, and local compaction summaries. - Preserve existing item turn IDs instead of overwriting them during persistence, resume reconstruction, compaction, forked history, and websocket incremental reuse. - Keep `compaction_trigger` fieldless because it is a request control, not a durable response item. - Update focused history/request assertions and fixtures for stateless requests, websocket incrementals, compaction, thread injection, prompt debug, and related CI coverage.
Owen Lin ·
2026-06-22 16:45:14 -07:00 -
core: wrap token budget window context (#29494)
Token-budget initial context carries thread and context-window lineage that the model should treat as one structured context-window block. Wrapping it in `<context_window>` makes that boundary explicit while preserving the existing window id content. Before this change, the window identifiers were injected as an untagged developer text fragment: ```text Thread id <THREAD_ID>. First context window id: <FIRST_WINDOW_ID> Current context window id: <WINDOW_ID> Previous context window id: <PREVIOUS_WINDOW_ID> ``` After this change, the same payload is wrapped as a context-window block: ```text <context_window> Thread id: <THREAD_ID> First context window id: <FIRST_WINDOW_ID> Current context window id: <WINDOW_ID> Previous context window id: <PREVIOUS_WINDOW_ID> </context_window> ``` This adds shared `CONTEXT_WINDOW_*_TAG` protocol constants, updates `TokenBudgetContext` to render with those markers, treats the new wrapper as contextual developer content when mapping history, and refreshes the token-budget request-shape assertions and snapshot. Verification: - `just test -p codex-core token_budget` - `just test -p codex-core recognizes_context_window_as_contextual_developer_content`
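The wrapped rendering can be sketched as follows. This is an illustration under assumptions: the constant names stand in for the `CONTEXT_WINDOW_*_TAG` constants, the struct fields and the one-field-per-line layout are guesses, and `is_context_window_block` is a hypothetical helper for recognizing the wrapper when mapping history.

```rust
// Hypothetical names standing in for the shared `CONTEXT_WINDOW_*_TAG` constants.
pub const CONTEXT_WINDOW_OPEN_TAG: &str = "<context_window>";
pub const CONTEXT_WINDOW_CLOSE_TAG: &str = "</context_window>";

/// Simplified stand-in for `TokenBudgetContext`; field names are assumptions.
pub struct TokenBudgetContext {
    pub thread_id: String,
    pub first_window_id: String,
    pub window_id: String,
    pub previous_window_id: String,
}

impl TokenBudgetContext {
    /// Render the lineage IDs inside the context-window markers.
    pub fn render(&self) -> String {
        format!(
            "{}\nThread id: {}\nFirst context window id: {}\nCurrent context window id: {}\nPrevious context window id: {}\n{}",
            CONTEXT_WINDOW_OPEN_TAG,
            self.thread_id,
            self.first_window_id,
            self.window_id,
            self.previous_window_id,
            CONTEXT_WINDOW_CLOSE_TAG,
        )
    }
}

/// Returns true when a developer text fragment is a wrapped context-window block.
pub fn is_context_window_block(text: &str) -> bool {
    let trimmed = text.trim();
    trimmed.starts_with(CONTEXT_WINDOW_OPEN_TAG) && trimmed.ends_with(CONTEXT_WINDOW_CLOSE_TAG)
}
```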
Michael Bolin ·
2026-06-22 23:37:49 +00:00 -
Allow ChatGPT accounts without email (#28991)
# Summary Codex required every ChatGPT account to have an email address. A service-account personal access token can return valid account metadata without one, so PAT login failed while decoding the metadata response. This change makes email optional in the account metadata type that owns it and preserves that absence through authentication, provider account state, the app-server API, generated clients, and TUI bootstrap. Existing accounts with email addresses keep the same behavior. ## Behavior-changing call sites | Call site | Behavior after this change | | --- | --- | | `login/src/auth/personal_access_token.rs` | PAT metadata accepts a missing or null email and retains `None`. | | `agent-identity/src/lib.rs` | Agent Identity JWT claims accept an omitted email. | | `login/src/auth/storage.rs` and `login/src/auth/agent_identity.rs` | Stored and managed Agent Identity records carry `Option<String>`. Deserialization maps the legacy empty-string sentinel to `None`. | | `login/src/auth/manager.rs` | `get_account_email` returns the stored option, and managed identity bootstrap no longer converts `None` to an empty string. | | `model-provider/src/provider.rs` and `protocol/src/account.rs` | A ChatGPT provider account requires a plan type but may carry no email. | | `app-server-protocol/src/protocol/v2/account.rs` | `account/read` keeps the `email` field on the wire and returns `null` when the account has no email. Generated TypeScript and JSON schemas describe a required, nullable field. | | `sdk/python/src/openai_codex/generated/v2_all.py` | The generated Python `ChatgptAccount` model accepts `None` for email. | | `tui/src/app_server_session.rs` | Email-less ChatGPT accounts bootstrap normally, keep external feedback routing, omit account-email telemetry, and display the plan in account status. | ## Design decisions - Missing email remains `None` at every layer. The code never uses an empty string as a substitute. 
- The app-server response includes `"email": null` instead of omitting the field. Clients retain a stable response shape. - Plan type remains required for provider account state. This change relaxes only the email assumption. ## Testing Tests: affected test targets compile, scoped Clippy and formatting pass, a focused TUI snapshot covers plan-only account status, real before/after PAT login smoke covers metadata without email, app-server smoke covers `account/read` with `email: null`, and a regression smoke covers an existing email-bearing PAT. Unit tests run in CI. ## Evidence Visual smoke evidence will be attached here.
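A small sketch of the "missing email stays `None`" rule, including the legacy empty-string sentinel mentioned in the table. Both function names are hypothetical, and the status-line format is an assumption made only to show a plan-only display.

```rust
/// Map a stored email to an option, treating the legacy empty-string
/// sentinel as "no email". Hypothetical helper name.
pub fn normalize_stored_email(raw: Option<String>) -> Option<String> {
    raw.filter(|email| !email.is_empty())
}

/// Build an account status label. With no email, only the plan is shown.
/// The exact format is an assumption for illustration.
pub fn account_status_line(email: Option<&str>, plan: &str) -> String {
    match email {
        Some(email) => format!("{email} ({plan})"),
        None => plan.to_string(),
    }
}
```

Normalizing once at the deserialization boundary keeps every later layer free of empty-string checks.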
efrazer-oai ·
2026-06-22 13:19:40 -07:00 -
core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
## Description This PR cuts Codex over from generic `ResponseItem.metadata` (introduced here: https://github.com/openai/codex/pull/28355) to `ResponseItem.internal_chat_message_metadata_passthrough`, which is the blessed path and has strongly-typed keys. For now we have to drop this MAv2 usage of `metadata`: https://github.com/openai/codex/pull/28561 until we figure out where that should live.
Owen Lin ·
2026-06-22 11:11:25 -07:00 -
Simplify multi-agent mode controls (#29324)
## Why Multi-agent delegation policy was split across `multiAgentMode`, `features.multi_agent_mode`, and `usage_hint_enabled`. These controls could disagree: a requested mode could be downgraded by the feature flag, and disabling usage hints also disabled mode instructions. Some clients also need multi-agent tools without adding delegation-policy text to model context. The previous two-mode API could not express that directly. ## What changed `multiAgentMode` is now the only live delegation-policy control: | Mode | Behavior | | --- | --- | | `none` | Keep multi-agent tools available without adding mode instructions. | | `explicitRequestOnly` | Only delegate after an explicit user request. | | `proactive` | Delegate when parallel work materially improves speed or quality. | - new threads default to `explicitRequestOnly`; omitting the mode on later turns keeps the current value - thread start, resume, fork, and settings responses always report the concrete current mode instead of `null` - mode selection remains sticky across turns and resume - usage-hint text no longer controls whether mode instructions apply - `features.multi_agent_mode` and `usage_hint_enabled` remain accepted as ignored compatibility settings so existing configs continue to load - app-server documentation and generated schemas describe the three-mode API ## Tests - `just test -p codex-core multi_agent_mode` - `just test -p codex-core multi_agent_v2_config_from_feature_table` - `just test -p codex-core spawn_agent_description` - `just test -p codex-features` - `just test -p codex-app-server-protocol` - `just test -p codex-app-server multi_agent_mode`
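The three-mode, sticky selection can be sketched like this. The wire strings come from the table above; the Rust type names, the `NoInstructions` variant name, and `ThreadSettings` are hypothetical.

```rust
/// The three delegation-policy modes. `NoInstructions` is the `none` wire value.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
pub enum MultiAgentMode {
    NoInstructions,
    #[default]
    ExplicitRequestOnly,
    Proactive,
}

impl MultiAgentMode {
    pub fn from_wire(value: &str) -> Option<Self> {
        match value {
            "none" => Some(Self::NoInstructions),
            "explicitRequestOnly" => Some(Self::ExplicitRequestOnly),
            "proactive" => Some(Self::Proactive),
            _ => None,
        }
    }

    pub fn as_wire(self) -> &'static str {
        match self {
            Self::NoInstructions => "none",
            Self::ExplicitRequestOnly => "explicitRequestOnly",
            Self::Proactive => "proactive",
        }
    }
}

/// Hypothetical per-thread state holding the concrete current mode.
pub struct ThreadSettings {
    mode: MultiAgentMode,
}

impl ThreadSettings {
    /// New threads default to `explicitRequestOnly` when no mode is given.
    pub fn new(initial: Option<MultiAgentMode>) -> Self {
        Self { mode: initial.unwrap_or_default() }
    }

    /// Omitting the mode on a later turn keeps the current value.
    pub fn start_turn(&mut self, requested: Option<MultiAgentMode>) -> MultiAgentMode {
        if let Some(mode) = requested {
            self.mode = mode;
        }
        self.mode
    }

    /// Always a concrete mode, never "unset".
    pub fn current(&self) -> MultiAgentMode {
        self.mode
    }
}
```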
jif ·
2026-06-22 10:05:36 +02:00 -
Persist session IDs across thread resume (#29327)
## Summary A cold-resumed subagent kept its durable thread ID but could receive a new session ID, splitting one agent tree across multiple sessions after a restart. Persist the root session ID in every rollout `SessionMeta`, carry it through thread creation, and restore it before initializing the resumed `Session` and `AgentControl`. ## Behavior For a nested agent tree: ```text root session R parent thread P child thread C ``` The child rollout stores: ```text session_id: R parent_thread_id: P id: C ``` After a cold resume, the child still belongs to root session `R` while its immediate parent remains `P`. The integration coverage uses distinct values for all three IDs so it catches restoring the session from `parent_thread_id`. ## Legacy rollouts Previous rollouts have `id` but no `session_id`. `SessionMetaLine` deserialization treats a missing `session_id` as `id`, keeping those files readable, listable, and resumable. When a legacy subagent is resumed through its root, that synthesized child ID no longer overrides the inherited root-scoped `AgentControl`. New rollouts always persist the explicit root session ID.
jif ·
2026-06-22 09:36:08 +02:00 -
Propagate safety buffering events to app-server clients (#29371)
Responses API safety buffering metadata currently stops at the transport boundary, so app-server clients cannot render the in-progress safety review state. This change: - decodes and deduplicates `safety_buffering` metadata from Responses API SSE and WebSocket events without suppressing the original response event - emits a typed core event containing the requested model plus backend use cases and reasons - forwards that event as `turn/safetyBuffering/updated` through app-server v2 and updates generated protocol schemas - keeps the side-channel event out of persisted rollouts and turn timing This supports the Codex Apps buffering UX and depends on the Responses API backend work in https://github.com/openai/openai/pull/1044569 and https://github.com/openai/openai/pull/1044571. Validation: - focused `codex-core` safety-buffering integration test passes - `cargo check -p codex-core -p codex-app-server -p codex-app-server-protocol` - `just fix -p codex-api -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-rollout -p codex-rollout-trace -p codex-otel` - `just fmt` - broad package test run: 4,430/4,492 passed; 62 unrelated local-environment/concurrency failures involved unavailable test binaries, MCP subprocess setup, and app-server timeouts
Francis Chalissery ·
2026-06-22 03:39:14 +00:00 -
core: add context window lineage IDs (#29256)
## Why The rendered `<token_budget>` fragment identifies the thread and current context window, but it does not expose enough lineage to identify the first window in the thread or the immediately preceding window. Those IDs also need to remain stable across compaction, resume, and rollback. ## What changed - Track first, previous, and current UUIDv7 context-window IDs in auto-compaction state. - Render `thread_id`, `first_window_id`, `previous_window_id`, and the current window ID in the full `<token_budget>` fragment. - Persist the first and previous window IDs in compacted rollout checkpoints and restore them during rollout reconstruction. - Preserve compatibility with older compacted records that do not contain the new optional fields. - Update focused state, rendering, reconstruction, rollback, and serialization coverage. ## Validation - `just test -p codex-core token_budget` - `just test -p codex-protocol compacted_item::tests` - `just test -p codex-core tracks_prefill_and_window_boundaries` - `just test -p codex-core reconstruct_history_uses_replacement_history_verbatim` - `just test -p codex-core thread_rollback_restores_cleared_reference_context_item_after_compaction`
pakrym-oai ·
2026-06-20 13:15:49 -07:00 -
Add indexed web search mode (#28489)
## Summary - Add `web_search = "indexed"` alongside `disabled`, `cached`, and `live`. - Use that same resolved mode for both hosted and standalone web search. - For hosted search, send `index_gated_web_access: true` with external web access enabled only when `indexed` is selected. - For standalone search, preserve the existing boolean wire values for existing modes (`cached` maps to `false` and `live` to `true`) and send `"indexed"` only for `indexed`; `disabled` keeps the tool unavailable. - Carry the mode through managed configuration requirements and generated schemas. ## Why Indexed search provides a middle ground between cached-only search and unrestricted live page fetching. Search queries can remain live while direct page fetches are limited to URLs admitted by the server. The existing `web_search` setting remains the single source of truth, so hosted and standalone executors cannot drift into different access modes. Without an explicit `indexed` selection, the existing model-visible tool and request shapes are unchanged. ```toml web_search = "indexed" [features] standalone_web_search = true ``` ## Validation - `just fmt` - `just test -p codex-api` (`126 passed`) - `just test -p codex-web-search-extension` (`7 passed`) - `just test -p codex-core code_mode_can_call_indexed_standalone_web_search` (`1 passed`) - Focused configuration, hosted request, standalone request, and managed-requirement coverage is included in the PR; remaining suites run in CI. The full workspace test suite was not run locally.
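The standalone wire mapping listed above can be sketched as a single function. The boolean and `"indexed"` values follow the description; the Rust types, `parse_web_search_mode`, and `hosted_index_gated` are hypothetical names for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WebSearchMode {
    Disabled,
    Cached,
    Live,
    Indexed,
}

/// Value sent to the standalone executor: a legacy boolean or a mode string.
#[derive(Debug, PartialEq, Eq)]
pub enum StandaloneWireValue {
    Bool(bool),
    Mode(&'static str),
}

/// Parse the `web_search` setting.
pub fn parse_web_search_mode(value: &str) -> Option<WebSearchMode> {
    match value {
        "disabled" => Some(WebSearchMode::Disabled),
        "cached" => Some(WebSearchMode::Cached),
        "live" => Some(WebSearchMode::Live),
        "indexed" => Some(WebSearchMode::Indexed),
        _ => None,
    }
}

/// `None` means the tool stays unavailable; existing modes keep their booleans.
pub fn standalone_wire_value(mode: WebSearchMode) -> Option<StandaloneWireValue> {
    match mode {
        WebSearchMode::Disabled => None,
        WebSearchMode::Cached => Some(StandaloneWireValue::Bool(false)),
        WebSearchMode::Live => Some(StandaloneWireValue::Bool(true)),
        WebSearchMode::Indexed => Some(StandaloneWireValue::Mode("indexed")),
    }
}

/// Hosted search sends `index_gated_web_access: true` only for `indexed`.
pub fn hosted_index_gated(mode: WebSearchMode) -> bool {
    mode == WebSearchMode::Indexed
}
```

Deriving both request shapes from one resolved mode is what keeps hosted and standalone executors from drifting apart.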
Winston Howes ·
2026-06-19 05:35:57 -07:00 -
Expose thread-level multi-agent mode (#28792)
## Why Once multi-agent mode can be selected per turn, clients also need to choose the initial selection when creating a thread and observe that selection through lifecycle and settings APIs. The selected value is intentionally distinct from the effective model-visible value: no client selection is represented as `null`, even though an eligible multi-agent v2 turn derives `explicitRequestOnly` as its effective default. ## What changed - Add the optional experimental `thread/start.multiAgentMode` parameter and pass it through thread creation. - Preserve an omitted initial value as an unset selection rather than eagerly storing `explicitRequestOnly`. - Apply an explicit `thread/start` selection to the first turn through the session configuration established at thread creation. - Restore the latest persisted effective mode as the selected baseline on cold resume when rollout history contains one. - Inherit the optional selected mode from a loaded parent when creating related runtime threads. - Return the current selected `multiAgentMode` from `thread/start`, `thread/resume`, `thread/fork`, and thread settings, using `null` when no mode is selected. - Keep lifecycle reporting independent from model capability and feature eligibility; core turn construction remains responsible for calculating and persisting the effective mode. ## Not covered - Clearing an existing loaded-session selection back to unset through `turn/start`; omitted or `null` currently retains the session's selection. - A TUI control, slash command, or `config.toml` preference. ## Verification - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode` The focused app-server coverage verifies explicit `thread/start` initialization, first-turn prompting, nullable reporting for an omitted selection, and retention of selections that are not currently runtime-eligible. ## Stack Stacked on #28685. 
This PR contains only the thread initialization and lifecycle/settings API layer.
Shijie Rao ·
2026-06-19 10:50:44 +02:00 -
Add per-turn multi-agent mode (#28685)
## Why Multi-agent v2 currently carries an explicit-request-only delegation rule in its static usage hint. That provides a safe default, but it prevents clients from selecting proactive delegation per turn without changing static guidance or rewriting prior model context. This change makes delegation mode a session selection that can be updated through `turn/start`, while deriving the effective model-visible mode separately for each turn. Eligible multi-agent v2 turns remain explicit-request-only unless proactive mode is both selected and enabled. ## What changed - Add the experimental `turn/start.multiAgentMode` parameter with `explicitRequestOnly` and `proactive` values. Omission retains the loaded session's current optional selection. - Add the default-off `features.multi_agent_mode` feature gate. Eligible multi-agent v2 turns use the selected mode when enabled; an unset selection or disabled gate resolves to `explicitRequestOnly`. - Treat mode prompting as inapplicable for multi-agent v1 and other unsupported session configurations, producing no multi-agent mode developer message rather than rejecting the turn. - Move the explicit-request-only rule out of the static v2 usage hint and into a bounded, tagged developer context fragment. - Emit the effective mode in initial context and only when that effective mode changes on later turns. - Persist the effective mode in `TurnContextItem` as the durable baseline for resume and context-update comparisons. Historical rollout items are not rewritten. Later mode developer messages establish the current rule incrementally. ## Not covered - Initial selection through `thread/start` and selected-mode reporting from thread lifecycle/settings APIs; those are isolated in the stacked #28792. - A TUI control or slash command for selecting the mode. - Persisting a preferred mode to `config.toml`; selection remains session/turn scoped. 
- Changes to multi-agent concurrency limits, tool availability, or model catalog capability declarations. - Rewriting historical rollout prompt items. Cold resume restores the latest persisted effective mode when available while leaving historical developer messages intact. ## Verification - `CARGO_INCREMENTAL=0 just test -p codex-core multi_agent_mode` - Focused app-server coverage verifies that `turn/start.multiAgentMode` produces proactive developer instructions for an eligible v2 turn. ## Stack Followed by #28792, which adds `thread/start` initialization and lifecycle/settings observability.
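The separation between the selected mode and the effective mode, as this PR describes it, can be sketched as below. All type and function names are hypothetical; the rules encoded are only those stated above (v1 produces no mode message, an unset selection or disabled gate resolves to `explicitRequestOnly`, and a message is emitted only when the effective mode changes).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MultiAgentVersion {
    V1,
    V2,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MultiAgentMode {
    ExplicitRequestOnly,
    Proactive,
}

/// Derive the model-visible mode for one turn.
/// `None` means mode prompting is inapplicable and no message is produced.
pub fn effective_mode(
    version: MultiAgentVersion,
    gate_enabled: bool,
    selected: Option<MultiAgentMode>,
) -> Option<MultiAgentMode> {
    match version {
        MultiAgentVersion::V1 => None,
        MultiAgentVersion::V2 => Some(if gate_enabled {
            selected.unwrap_or(MultiAgentMode::ExplicitRequestOnly)
        } else {
            MultiAgentMode::ExplicitRequestOnly
        }),
    }
}

/// Emit a mode developer message only when the effective mode differs from
/// the persisted baseline (a baseline of `None` covers the initial context).
pub fn should_emit_mode_message(
    baseline: Option<MultiAgentMode>,
    effective: Option<MultiAgentMode>,
) -> bool {
    effective.is_some() && baseline != effective
}
```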
Shijie Rao ·
2026-06-18 22:47:51 -07:00 -
[codex] Assign response item IDs when recording history (#28814)
## Why Client-created response items enter history without IDs, so their identity is lost across rollout persistence and resume. IDs should be assigned once at the history-recording boundary, while IDs returned by the server must remain unchanged. The Responses API validates item IDs using type-specific prefixes. Locally generated IDs therefore use the matching prefix plus a hyphenated UUIDv7, keeping them valid while distinguishable from server-generated IDs. Because this changes persisted history and provider request shapes, the behavior is opt-in behind the under-development `item_ids` feature. Compaction triggers remain request controls whose API shape does not accept an ID. ## What changed - Register the disabled-by-default `item_ids` feature and expose it in `config.schema.json`. - Make supported optional `ResponseItem` IDs serializable and expose them in the generated app-server schemas. - When `item_ids` is enabled, assign an ID during conversation-history preparation if an item has no ID. - Generate type-prefixed, hyphenated UUIDv7 IDs using the Responses API item conventions. - Preserve existing server IDs without rewriting them. - Persist assigned IDs in rollouts and include them in subsequent Responses requests. - Remove the unsupported ID field from `CompactionTrigger` and document why it has no ID. - Add integration coverage for enabled ID persistence, preservation of server IDs, and omission of generated IDs while the feature is disabled. `prepare_conversation_items_for_history` is the single response-item ID allocation boundary. 
## Test plan - `just test -p codex-features` - `just test -p codex-core response_item_ids_persist_across_resume_and_preserve_server_ids` - `just test -p codex-core non_openai_responses_requests_omit_item_turn_metadata` - `just test -p codex-core resize_all_images_prepares_failures_before_history_insertion` - `just test -p codex-protocol` - `just test -p codex-app-server-protocol` - `just test -p codex-api azure_default_store_attaches_ids_and_headers`
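A minimal sketch of "assign once at the recording boundary, never rewrite server IDs". Everything named here is an assumption: the `Item` and `ItemKind` types, the specific prefixes, the underscore separator, and the injected `new_uuid` closure (used so the example needs no UUID library) are illustrative, not the Codex implementation.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ItemKind {
    Message,
    Reasoning,
    FunctionCall,
}

/// Assumed type-specific prefixes; the real mapping may differ.
fn id_prefix(kind: ItemKind) -> &'static str {
    match kind {
        ItemKind::Message => "msg",
        ItemKind::Reasoning => "rs",
        ItemKind::FunctionCall => "fc",
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub kind: ItemKind,
    pub id: Option<String>,
}

/// Assign an ID only when the feature is on and the item has none.
/// `new_uuid` is expected to return a hyphenated UUIDv7 string.
pub fn assign_id_if_missing(
    item: &mut Item,
    item_ids_enabled: bool,
    new_uuid: impl FnOnce() -> String,
) {
    if !item_ids_enabled || item.id.is_some() {
        return; // Existing (for example, server-provided) IDs are preserved.
    }
    // The underscore separator is an assumption for this sketch.
    item.id = Some(format!("{}_{}", id_prefix(item.kind), new_uuid()));
}
```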
pakrym-oai ·
2026-06-18 17:30:55 -07:00 -
Always use AVAS for realtime WebRTC calls (#28856)
## Summary - Remove the realtime `architecture` selector from core protocol, app-server protocol, config parsing, generated schemas, and callers. - Always create WebRTC realtime calls with the AVAS query params: `intent=quicksilver&architecture=avas`. - Keep direct websocket realtime behavior on the existing config/default path, while WebRTC starts without an explicit version now default to realtime v1 because AVAS requires v1. ## Notes - WebRTC realtime now means AVAS. If a caller explicitly asks to start WebRTC with realtime v2, Codex rejects that request because the AVAS WebRTC path only supports realtime v1. Websocket realtime is separate and can still use realtime v2. - The old `[realtime] architecture = "realtimeapi" | "avas"` config knob is removed. Local configs that still set it will need to delete that line. - Some app-server tests that were only trying to exercise realtime v2 protocol behavior now use websocket transport, because WebRTC is intentionally locked to AVAS/v1. Separate WebRTC tests cover the AVAS query params, v1 startup, SDP flow, and sideband join. ## Validation - Merged fresh `origin/main` at `83e6a786a2`. - `just fmt` - `just write-config-schema` - `just write-app-server-schema` - `git diff --check` - `just test -p codex-api -p codex-core -p codex-app-server-protocol -p codex-app-server realtime` (176 passed) - `just test -p codex-protocol -p codex-config` (413 passed)
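The WebRTC rule in the notes above (always AVAS, v1 by default, explicit v2 rejected) can be sketched as one function. The query string is taken from the summary; the function name, the error text, and the base URL passed by the caller are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RealtimeVersion {
    V1,
    V2,
}

/// Query parameters always attached to WebRTC realtime calls.
pub const AVAS_QUERY: &str = "intent=quicksilver&architecture=avas";

/// Resolve the version and call URL for a WebRTC start request.
/// An omitted version defaults to v1; an explicit v2 is rejected.
pub fn webrtc_call_target(
    base_url: &str,
    requested: Option<RealtimeVersion>,
) -> Result<(RealtimeVersion, String), String> {
    match requested {
        Some(RealtimeVersion::V2) => {
            Err("WebRTC realtime uses AVAS, which only supports realtime v1".to_string())
        }
        Some(RealtimeVersion::V1) | None => Ok((
            RealtimeVersion::V1,
            format!("{}?{}", base_url, AVAS_QUERY),
        )),
    }
}
```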
Peter Bakkum ·
2026-06-18 19:11:21 -05:00 -
core: add UUIDv7 context window IDs (#28953)
## Why The token-budget context currently identifies a context window by its thread-local sequence number. A UUIDv7 gives the model a stable opaque identity that remains fixed for a window and rotates when compaction or `new_context` starts the next one. ## What changed - Preserve the existing monotonic value as `window_number` and add a UUIDv7 `window_id` to `CompactedItem`. - Generate and rotate the UUID with auto-compaction window state, persist it alongside the number, and reconstruct it on resume and rollback. - Accept legacy compacted rollout records where the numeric `window_id` represented the window number. - Use the UUID only in token-budget context; existing request headers and metadata continue using `thread_id:window_number`. ## Testing - `just test -p codex-protocol compacted_item::tests` - `just test -p codex-core token_budget`
pakrym-oai ·
2026-06-18 17:00:49 -07:00 -
Emit Trusted MCP App Identity on Tool-Call Items (#27132)
## Summary - Add optional `appContext` to app-server MCP tool-call items with trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata. - Preserve that context across tool-call events, persisted history, reconnects, and thread resume. - Keep the deprecated top-level `mcpAppResourceUri` temporarily for client migration. The consumer contract is `{ appContext: { connectorId, linkId, mcpAppResourceUri }, tool }`. ## Validation - Full GitHub Actions suite passes, including CLA, Bazel tests, clippy, release builds, and argument-comment lint. --------- Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>
martinauyeung-oai ·
2026-06-18 14:02:54 -07:00 -
Support `openai/form` extended form elicitations (#27500)
# Summary Allow App Server clients to opt into `openai/form` MCP elicitations.
Gabriel Peal ·
2026-06-18 11:54:49 -07:00 -
Add turn-scoped context contributions (#28911)
## Summary - keep context injection on a single ContextContributor trait - split context injection into thread-scoped and turn-scoped contribution methods - wire turn-scoped fragments into initial context assembly so extensions can contribute context from turn-local state
jif ·
2026-06-18 19:40:28 +02:00 -
unified-exec: retain PathUri in command events (#28780)
## Why App-server must report command events containing foreign-platform paths without changing existing client or rollout path-string formats. ## What changed - retain `PathUri` through exec command begin/end events - convert cwd values to `LegacyAppPathString` at the app-server compatibility boundary - drop command actions with foreign paths and log them - serialize rollout-trace cwd values using their inferred native path representation - restore Wine coverage for retained Windows cwd values and successful completion
Adam Perry @ OpenAI ·
2026-06-18 05:00:04 +00:00 -
[codex] Support assistant realtime append text (#28836)
## Why Frontend realtime voice continuity needs to replay a tiny previous-session overlap as actual conversation items, including assistant text. The app-server `thread/realtime/appendText` API already carries a role through to the Rust realtime websocket layer, but the shared role enum only accepted `user` and `developer`. ## What Changed - Added `assistant` to `ConversationTextRole` and regenerated the app-server schema/type fixtures. - Added `output_text` as a realtime conversation content type. - Updated realtime websocket item creation so assistant appendText emits `content: [{ type: "output_text", text }]`, while user and developer continue to emit `input_text`. - Updated app-server docs and tests to cover assistant appendText alongside the existing developer role behavior. ## Validation - `just write-app-server-schema` - `just fmt` (first sandboxed attempt failed because `uv` could not access `~/.cache/uv`; reran with filesystem access and passed) - `just test -p codex-api` passed: 126/126 - `just test -p codex-app-server-protocol` passed: 239/239, including generated JSON/TypeScript fixture checks - `just test -p codex-app-server` was started locally but stopped per request after unrelated local sandbox/Seatbelt failures (`sandbox-exec: sandbox_apply: Operation not permitted`) and one missing local `codex` binary failure; CI should be faster and more authoritative for the full suite.
guinness-oai ·
2026-06-17 20:57:13 -07:00 -
[codex] control automatic realtime handoff delivery (#27986)
## What Built on the realtime speech-control plumbing merged in #27917. - Add optional `codexResponseHandoffPrefix` to `thread/realtime/start`. - Apply that prefix only to automatic V1 commentary sent through `conversation.handoff.append`; final answers remain unprefixed. - Add opt-in `clientManagedHandoffs`. When true, core suppresses automatic response handoffs and completion output so delivery is controlled by explicit client append APIs. - Preserve existing automatic behavior by default. `codexResponsesAsItems: true` continues to select item routing when client-managed mode is disabled. ## Why Voice clients need two delivery policies: automatic background context with silent commentary instructions and fully client-owned handoffs. Phase-aware prefixing keeps routine commentary silent without suppressing the final answer, while client-managed mode lets an app decide exactly which updates to append. ## Validation - `just fmt` - `cargo test -p codex-app-server-protocol serialize_thread_realtime_start` - `RUST_MIN_STACK=16777216 cargo test -p codex-core --test all conversation_handoff_persists_across_item_done_until_turn_complete` - `RUST_MIN_STACK=16777216 cargo test -p codex-app-server --test all webrtc_v1_client_managed_handoffs_disable_automatic_output` - `RUST_MIN_STACK=16777216 cargo test -p codex-app-server --test all webrtc_v1_final_automatic_handoff_omits_silent_prefix` - `cargo build -p codex-cli --bin codex` - Local Codex Apps compatibility check: 43 focused webview tests passed, and a live voice session routed through the source-built app-server. The explicit `RUST_MIN_STACK` avoids a macOS Tokio test-worker stack overflow seen with the default test environment.
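The two delivery policies can be sketched as a single decision function. This is an assumption-heavy illustration: the type names, the `ResponsePhase` split, and joining the prefix by plain concatenation are all hypothetical; only the rules (prefix on automatic commentary, unprefixed final answers, no automatic output in client-managed mode) come from the description.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResponsePhase {
    Commentary,
    FinalAnswer,
}

/// Hypothetical mirror of the two `thread/realtime/start` options.
pub struct HandoffPolicy {
    pub response_handoff_prefix: Option<String>,
    pub client_managed_handoffs: bool,
}

/// Text to append automatically, or `None` when the client owns delivery.
pub fn automatic_handoff_text(
    policy: &HandoffPolicy,
    phase: ResponsePhase,
    text: &str,
) -> Option<String> {
    if policy.client_managed_handoffs {
        return None; // Delivery is left to explicit client append calls.
    }
    match (phase, policy.response_handoff_prefix.as_deref()) {
        // Only commentary gets the prefix; final answers stay unprefixed.
        (ResponsePhase::Commentary, Some(prefix)) => Some(format!("{prefix}{text}")),
        _ => Some(text.to_string()),
    }
}
```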
jiayuhuang-openai ·
2026-06-18 02:22:29 +00:00 -
[codex] Add optional IDs to response items (#28812)
## Why

`ResponseItem` variants do not have a consistent internal ID shape: some variants carry required IDs, some carry optional IDs, and some cannot represent an ID at all. The existing fields also use inconsistent serde, TypeScript, and JSON-schema annotations. A single enum-level access path is needed before history recording can assign and retain IDs.

This PR establishes that internal model only. It intentionally does not generate or serialize IDs; allocation and wire persistence are isolated in the stacked follow-up.

## What changed

- Give every concrete `ResponseItem` variant an `Option<String>` ID field.
- Apply the same internal-only annotations to every ID field: `#[serde(default, skip_serializing)]`, `#[ts(skip)]`, and `#[schemars(skip)]`.
- Add `ResponseItem::id()` and `ResponseItem::set_id()` as the shared accessors.
- Preserve IDs when history items are rewritten for truncation.
- Adapt consumers that previously assumed reasoning and image-generation IDs were required.
- Regenerate app-server schemas so the hidden fields are represented consistently.

The serde catch-all `ResponseItem::Other` remains ID-less because it must remain a unit variant.

## Test plan

- `cargo check --tests -p codex-core -p codex-api -p codex-rollout-trace -p codex-image-generation-extension`
- `just test -p codex-protocol`
- `just test -p codex-app-server-protocol`
- `just test -p codex-api -p codex-rollout-trace -p codex-image-generation-extension`
- `just test -p codex-core event_mapping`
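The shared accessor pattern described in this change can be sketched on a reduced enum. This is a simplified stand-in, not the real type: only two concrete variants are shown, their payload fields are invented for illustration, the serde/ts/schemars attributes are omitted to keep the example dependency-free, and the exact signatures of `id()` and `set_id()` are assumptions.

```rust
/// Reduced stand-in for `ResponseItem`: every concrete variant carries an
/// `Option<String>` ID, while the catch-all `Other` stays a unit variant.
#[derive(Debug, Clone, PartialEq)]
pub enum ResponseItem {
    Message { id: Option<String>, text: String },
    Reasoning { id: Option<String>, summary: String },
    Other,
}

impl ResponseItem {
    /// Single enum-level read path for the ID (assumed signature).
    pub fn id(&self) -> Option<&str> {
        match self {
            ResponseItem::Message { id, .. } | ResponseItem::Reasoning { id, .. } => id.as_deref(),
            // The catch-all cannot represent an ID.
            ResponseItem::Other => None,
        }
    }

    /// Single enum-level write path for the ID (assumed signature).
    pub fn set_id(&mut self, new_id: Option<String>) {
        match self {
            ResponseItem::Message { id, .. } | ResponseItem::Reasoning { id, .. } => *id = new_id,
            // Setting an ID on the catch-all is a no-op in this sketch.
            ResponseItem::Other => {}
        }
    }
}
```

Routing all reads and writes through the two accessors lets callers such as history recording treat every variant uniformly instead of matching on each one.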
pakrym-oai ·
2026-06-17 18:27:43 -07:00 -
Scope command approvals by execution environment (#28738)
## Why

Command approval cache keys included the command and working directory, but not the execution environment. An approval for `/workspace` locally could therefore be reused for the same command and path on an executor.

## What changed

- Include the selected environment ID in shell and unified-exec approval cache keys.
- Carry that ID through the normal command approval request so clients can show which environment is being approved.
- Expose the environment through app-server as a required nullable `environmentId` and show it in the inline TUI approval prompt.
- Keep older recorded approval events compatible when the environment is absent.

For example, `echo ok` in local `/workspace` and `echo ok` in executor `/workspace` now produce different approval keys and separate prompts.

## Scope

This PR does not change network approvals, Guardian review actions, MCP elicitation, full-screen TUI rendering, or environment-ID validation. Remote `shell_command` execution itself remains in #28722; this PR only makes its approval key environment-aware.
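The effect of adding the environment to the cache key can be sketched with a hashable key struct. This is a minimal sketch under assumptions: `ApprovalKey`, `ApprovalCache`, and `approval_key` are hypothetical names, and the environment IDs `"local"` and `"executor-1"` are made up for the example; the description does not say how a local environment is identified.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical approval cache key: command, working directory, and the
/// selected environment ID (absent for older recorded events).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ApprovalKey {
    pub command: Vec<String>,
    pub cwd: PathBuf,
    pub environment_id: Option<String>,
}

/// Convenience constructor for the example.
pub fn approval_key(command: &[&str], cwd: &str, environment_id: Option<&str>) -> ApprovalKey {
    ApprovalKey {
        command: command.iter().map(|part| part.to_string()).collect(),
        cwd: PathBuf::from(cwd),
        environment_id: environment_id.map(str::to_string),
    }
}

/// Remembers which keys the user has already approved.
#[derive(Debug, Default)]
pub struct ApprovalCache {
    approved: HashSet<ApprovalKey>,
}

impl ApprovalCache {
    pub fn approve(&mut self, key: ApprovalKey) {
        self.approved.insert(key);
    }

    /// A command needs a prompt unless this exact key was approved before.
    pub fn is_approved(&self, key: &ApprovalKey) -> bool {
        self.approved.contains(key)
    }
}
```

Because `environment_id` participates in the derived `Hash` and `Eq`, approving `echo ok` in `/workspace` for one environment does not satisfy a lookup for the same command and path in another.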
jif ·
2026-06-17 19:52:43 +02:00 -
Add join key for MAv2 inter-agent messages (#28561)
## Summary

This keeps inter-agent communication on the existing raw response item path and adds a join key for MAv2 tool calls.

MAv2 `spawn_agent`, `send_message`, and `followup_task` now stamp the originating tool call id into `ResponseItemMetadata.source_call_id` on the raw `ResponseItem::AgentMessage`. App-server clients can join that raw item back to the existing tool/activity event by call id, while using the raw agent message's existing sender, receiver, and content fields. No new app-server `ThreadItem` or notification type is added.

## Tests

- `just fmt`
- `just write-app-server-schema`
- `just test -p codex-protocol`
- `just test -p codex-app-server-protocol`
- `just test -p codex-core multi_agent_v2_spawn_returns_path_and_send_message_accepts_relative_path`
- `just test -p codex-core multi_agent_v2_followup_task_completion_notifies_parent_on_every_turn`
- `just fix -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core`
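A client-side join on the call id could look roughly like the sketch below. It is an illustration only: `ToolCallEvent`, `AgentMessage`, and `join_by_call_id` are hypothetical, and the flat `source_call_id` field stands in for `ResponseItemMetadata.source_call_id`, which in the real types sits in the item's metadata rather than directly on the message.

```rust
use std::collections::HashMap;

/// Hypothetical tool/activity event a client already receives.
pub struct ToolCallEvent {
    pub call_id: String,
    pub tool: String,
}

/// Flattened stand-in for a raw agent message and its metadata.
pub struct AgentMessage {
    pub sender: String,
    pub receiver: String,
    pub content: String,
    pub source_call_id: Option<String>,
}

/// Pair each agent message with the tool call that produced it.
/// Messages without a join key, or with an unknown one, are skipped.
pub fn join_by_call_id<'a>(
    events: &'a [ToolCallEvent],
    messages: &'a [AgentMessage],
) -> Vec<(&'a ToolCallEvent, &'a AgentMessage)> {
    // Index the events by call id once so each message lookup is O(1).
    let by_id: HashMap<&str, &ToolCallEvent> = events
        .iter()
        .map(|event| (event.call_id.as_str(), event))
        .collect();

    messages
        .iter()
        .filter_map(|message| {
            let call_id = message.source_call_id.as_deref()?;
            by_id.get(call_id).map(|event| (*event, message))
        })
        .collect()
}
```

The join leaves the message's sender, receiver, and content untouched, in line with the description's point that no new item or notification type is needed.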
jif ·
2026-06-17 14:48:56 +02:00 -
[codex] core: restore absolute turn context cwd (#28629)
## Why

#28152 jumped the gun on moving the rollout format to store URIs, and would likely break compatibility with some features that don't go through the same types as the core logic.

## What

Make `TurnContextItem.cwd` an `AbsolutePathBuf` again and remove the test added for `PathUri` serialization in rollouts. Also drops a bunch of error paths that are no longer needed.
Adam Perry @ OpenAI ·
2026-06-16 19:05:26 -07:00