mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
dev
78 Commits
-
feat(app-server): add history_mode to thread (#29927)
## Description This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`. This will be stored in `SessionMeta` in the JSONL rollout file and as a new column in the SQLite thread_metadata table, and exposed on `thread/start` and on the `Thread` object in app-server. ## What changed - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`, defaulting old and new SessionMeta to `legacy`. - Carried `history_mode` through core session config, ThreadStore stored metadata, local/in-memory stores, rollout metadata extraction, and the existing SQLite `threads` table. - Added experimental `historyMode` to app-server v2 `Thread` and `thread/start`. - Made paginated stored threads metadata-discoverable but unsupported for legacy full-history reads, `load_history`, live resume, and create paths. - Regenerated app-server schema fixtures and added protocol/state/thread-store/app-server coverage for persistence and fail-closed behavior. ## Compatibility floor Because users may be running various versions of Codex binaries on the same machine (TUI, Codex App, etc.), we will need to establish a compatibility floor for upcoming paginated threads, which will change how thread storage reads and writes work. The overall plan here: ``` Release N: - Add historyMode to SessionMeta / Thread / SQLite metadata. - Teach binaries to understand paginated threads. - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread. - Default remains `"legacy"`. Release N+1: - First-party clients start opting into paginated threads where appropriate. - Internal dogfood / staged rollout. - Measure old-client usage and paginated-thread unsupported errors. Release N+2: - Only after Release N+ is overwhelmingly deployed, make paginated the default. - Accept that a small tail of N-1-or-older binaries may not understand paginated threads. ``` The important behavior change is fail-closed handling for a binary that encounters a persisted `paginated` thread before it knows how to fully support paginated history. In app-server, if a thread is `paginated`, we will: - allow metadata-only discovery paths like `thread/list` and `thread/read(includeTurns=false)`, so clients can still see the thread and inspect its `historyMode` - reject legacy full-history/live-thread paths like `thread/read(includeTurns=true)` and `thread/resume` with an unsupported JSON-RPC error - avoid silently treating an unknown or future `historyMode` as `legacy` Under the hood, the ThreadStore layer also rejects legacy operations that would need to load or replay the full thread history for a paginated thread. That gives us the behavior we want for Release N: future paginated threads are visible, but this binary fails closed instead of trying to operate on them as if they were legacy threads.
Owen Lin ·
2026-06-26 09:12:42 -07:00 -
feat: add provider-aware model fallback to thread start (#29942)
## Why Helper threads such as task title generation can request a model ID that is valid for the default OpenAI provider but unavailable from the active provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the provider static catalog exposes Bedrock model IDs such as `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background 404s and can surface a misleading turn error even when the main turn succeeds. Clients need an explicit way to ask app-server to resolve an unavailable helper model to the active provider default. That fallback must remain limited to providers with an authoritative static catalog so custom or dynamically discovered model IDs are not rewritten based on an incomplete catalog. Fixes #28741. ## What changed - Add the experimental `allowProviderModelFallback` option to `thread/start`, defaulting to `false` to preserve existing behavior. - Thread the option through thread creation and model selection. - When enabled for a static model manager, preserve requested models present in the catalog and replace unavailable models with the provider default. - Continue preserving explicit model IDs for dynamic model managers without fetching a catalog solely to validate them. - Document the new `thread/start` behavior in the app-server API overview. ## Test Temporary test-client harness: ``` ThreadStartParams { model: Some("gpt-5.4-mini".to_string()), allow_provider_model_fallback: true, ..Default::default() } ``` Command: ``` CODEX_HOME=/tmp/codex-bedrock-thread-start-home \ CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \ ./target/debug/codex-app-server-test-client \ --codex-bin ./target/debug/codex \ -c 'model_provider="amazon-bedrock"' \ send-message-v2 --experimental-api ignored ``` Relevant output: ``` > "method": "thread/start", > "params": { > "model": "gpt-5.4-mini", > "modelProvider": null, > "allowProviderModelFallback": true, > ... > } < "result": { < "model": "openai.gpt-5.5", < "modelProvider": "amazon-bedrock", < ... < } ```
Celia Chen ·
2026-06-25 18:24:34 +00:00 -
feat: use run agent task auth for inference (#19051)
## Stack This is PR 3 of the simplified HAI single-run-task stack: - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity assertion and task-registration primitives, including the shared run-task helper used by existing Agent Identity JWT auth. - [#19049](https://github.com/openai/codex/pull/19049) Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted Agent Identity runtime auth and its single run task. - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped provider auth that uses one backend-owned task id for first-party inference and compaction requests. [#19054](https://github.com/openai/codex/pull/19054) collapsed out of the active stack because the simplified design no longer needs a separate background/control-plane task helper. ## Summary This PR moves Agent Identity usage into provider auth resolution. That keeps `AgentAssertion` auth tied to first-party OpenAI provider requests instead of applying a late session-wide override that could affect local, custom, Bedrock, API-key, or external-bearer providers. What changed: - adds a small `ProviderAuthScope` struct carrying the run auth policy and session source needed by provider-scoped auth resolution - lets `Session` opt the existing `ModelClient` into `ChatGptAuth` policy when `use_agent_identity` is enabled, without adding a second model-client constructor - resolves Agent Identity only for first-party OpenAI provider auth paths - uses the persisted run task id from the `AgentIdentityAuth` record to build `AgentAssertion` auth for Responses requests - routes shared request setup through scoped provider auth so unary compact requests use the same run-task assertion path as inference turns - keeps local/custom/Bedrock/env-key/external-bearer provider auth unchanged - lets missing run-task state surface through the existing model-request error path instead of silently falling back to bearer auth This PR intentionally does not create thread-scoped, target-scoped, or background-scoped task identities. The run task is the only task Codex registers in this POC shape. ## Testing - `just test -p codex-model-provider` - `just test -p codex-core client::tests::provider_auth_scope_uses` - `just test -p codex-core remote_compact_uses_agent_identity_assertion`
Adrian ·
2026-06-24 22:31:41 -07:00 -
[codex] Add Ultra reasoning effort (#29899)
## Why Ultra should be one user-facing reasoning selection for work that benefits from both maximum reasoning and proactive multi-agent delegation. Without it, clients must coordinate maximum reasoning with the experimental `multiAgentMode` setting, even though the inference backend still expects its existing `max` effort value. This change makes reasoning effort the source of truth: clients select `ultra`, core derives proactive multi-agent behavior when the turn is eligible for multi-agent V2, and inference requests continue to use the backend-compatible `max` value. ## What changed - Add `ultra` as a first-class reasoning effort and preserve model-catalog ordering when exposing it to clients. - Convert `ultra` to `max` at the inference request boundary, including Responses HTTP/WebSocket requests, startup prewarm, compaction, and memory summarization. - Derive effective multi-agent mode per turn from effective reasoning effort: - eligible multi-agent V2 + `ultra` → `proactive` - eligible multi-agent V2 + any other effort → `explicitRequestOnly` - V1 or otherwise ineligible sessions → no multi-agent mode instruction - Keep the derived effective mode in turn context history so successive turns can emit a developer-message update only when the effective mode changes. - Remove selected multi-agent mode from core session configuration, turn construction, thread settings, resume/fork restoration, and subagent spawn plumbing. Subagents inherit reasoning effort and derive their own effective mode. - Retain the experimental app-server `multiAgentMode` fields for wire compatibility while marking them deprecated. Request values are accepted but ignored; compatibility response fields report `explicitRequestOnly`. - Display Ultra in the TUI using the order supplied by `model/list`. ## Validation - `just test -p codex-core ultra_reasoning_uses_max_for_requests` - `just test -p codex-tui model_reasoning_selection_popup`
Shijie Rao ·
2026-06-24 20:13:52 -07:00 -
[2/3] core: persist world state in rollouts (#29835)
## Why `WorldState` currently remembers its model-visible diff baseline only in memory. That leaves no durable source for restoring the exact baseline after resume, fork, rollback, or compaction. This is the second PR in the WorldState persistence stack, built on #29833 and following #29249. It records durable state transitions; the next PR will replay them during rollout reconstruction. ## What - Add a `world_state` rollout item containing either a full snapshot or an RFC 7386 JSON Merge Patch. - Persist a full snapshot after initial context and after compaction establishes a new context window. - Persist non-empty patches when later sampling steps or turns advance the WorldState baseline. - Write model-visible history before its matching WorldState record, so an interrupted write can only cause a safe repeated update on replay. - Preserve WorldState records for full-history forks while excluding them from thread previews, metadata, and app-server history materialization. Older binaries read rollout lines independently, so they skip the unknown `world_state` records while retaining the rest of the thread. ## Testing - `just test -p codex-core snapshot_merge_patch_changes_and_removes_nested_values` - `just test -p codex-core world_state_baseline_deduplicates_until_history_is_replaced` - `just test -p codex-core deferred_executor_compaction_preserves_then_updates_environment_once` - `just test -p codex-protocol` - `just test -p codex-rollout` - `just test -p codex-state` - `just test -p codex-thread-store` - `just test -p codex-app-server-protocol`
sayan-oai ·
2026-06-24 20:13:49 -07:00 -
Persist agent messages as response items (#29829)
## Why Inter-agent messages are recorded in live history as `ResponseItem::AgentMessage`, but rollouts stored `InterAgentCommunication` and rebuilt the response item during resume. This made the rollout differ from the actual Responses history. ## What changed - store the prepared `agent_message` response item directly - keep `trigger_turn` in a small local metadata record for fork truncation - keep reading older `inter_agent_communication` rollout items
jif ·
2026-06-24 15:43:10 +01:00 -
Support thread-level originator overrides (#29477)
## Why Work(TPP) threads can be launched from the Desktop app, but if they all keep the Desktop app's default originator then downstream attribution cannot distinguish local Work launches from cloud-backed Work launches. `thread/start.serviceName` already carries that launch signal, while `SessionMeta.originator` is the durable thread-level value that survives resume and fork. This change converts the Desktop Work service names into an effective originator at thread creation time, persists that originator with the thread, and keeps using it for later model requests and memory writes. ## What changed - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to per-thread originators, while preserving `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override. - Persist the effective originator in `SessionMeta.originator`, read it back on resume/fork, and inherit the parent originator for subagent spawns when there is no persisted session metadata. - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling back to the live parent originator when the forked history no longer includes `SessionMeta`. - Thread the per-thread originator through Responses headers, websocket/compaction request paths, thread-store creation, rollout metadata, and memory stage-one telemetry. ## Verification - `just test -p codex-core agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta thread_manager::tests::originator_override_precedes_service_name_remapping` - `just test -p codex-core agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode` - `just test -p codex-memories-write` - `just fix -p codex-core -p codex-memories-write` - `git diff --check`
alexsong-oai ·
2026-06-23 17:23:38 -07:00 -
core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
## Description This PR cuts Codex over from generic `ResponseItem.metadata` (introduced here: https://github.com/openai/codex/pull/28355) to `ResponseItem.internal_chat_message_metadata_passthrough`, which is the blessed path and has strongly-typed keys. For now we have to drop this MAv2 usage of `metadata`: https://github.com/openai/codex/pull/28561 until we figure out where that should live.
Owen Lin ·
2026-06-22 11:11:25 -07:00 -
Expose thread-level multi-agent mode (#28792)
## Why Once multi-agent mode can be selected per turn, clients also need to choose the initial selection when creating a thread and observe that selection through lifecycle and settings APIs. The selected value is intentionally distinct from the effective model-visible value: no client selection is represented as `null`, even though an eligible multi-agent v2 turn derives `explicitRequestOnly` as its effective default. ## What changed - Add the optional experimental `thread/start.multiAgentMode` parameter and pass it through thread creation. - Preserve an omitted initial value as an unset selection rather than eagerly storing `explicitRequestOnly`. - Apply an explicit `thread/start` selection to the first turn through the session configuration established at thread creation. - Restore the latest persisted effective mode as the selected baseline on cold resume when rollout history contains one. - Inherit the optional selected mode from a loaded parent when creating related runtime threads. - Return the current selected `multiAgentMode` from `thread/start`, `thread/resume`, `thread/fork`, and thread settings, using `null` when no mode is selected. - Keep lifecycle reporting independent from model capability and feature eligibility; core turn construction remains responsible for calculating and persisting the effective mode. ## Not covered - Clearing an existing loaded-session selection back to unset through `turn/start`; omitted or `null` currently retains the session's selection. - A TUI control, slash command, or `config.toml` preference. ## Verification - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode` The focused app-server coverage verifies explicit `thread/start` initialization, first-turn prompting, nullable reporting for an omitted selection, and retention of selections that are not currently runtime-eligible. ## Stack Stacked on #28685. This PR contains only the thread initialization and lifecycle/settings API layer.
Shijie Rao ·
2026-06-19 10:50:44 +02:00 -
[codex] Assign response item IDs when recording history (#28814)
## Why Client-created response items enter history without IDs, so their identity is lost across rollout persistence and resume. IDs should be assigned once at the history-recording boundary, while IDs returned by the server must remain unchanged. The Responses API validates item IDs using type-specific prefixes. Locally generated IDs therefore use the matching prefix plus a hyphenated UUIDv7, keeping them valid while distinguishable from server-generated IDs. Because this changes persisted history and provider request shapes, the behavior is opt-in behind the under-development `item_ids` feature. Compaction triggers remain request controls whose API shape does not accept an ID. ## What changed - Register the disabled-by-default `item_ids` feature and expose it in `config.schema.json`. - Make supported optional `ResponseItem` IDs serializable and expose them in the generated app-server schemas. - When `item_ids` is enabled, assign an ID during conversation-history preparation if an item has no ID. - Generate type-prefixed, hyphenated UUIDv7 IDs using the Responses API item conventions. - Preserve existing server IDs without rewriting them. - Persist assigned IDs in rollouts and include them in subsequent Responses requests. - Remove the unsupported ID field from `CompactionTrigger` and document why it has no ID. - Add integration coverage for enabled ID persistence, preservation of server IDs, and omission of generated IDs while the feature is disabled. `prepare_conversation_items_for_history` is the single response-item ID allocation boundary. ## Test plan - `just test -p codex-features` - `just test -p codex-core response_item_ids_persist_across_resume_and_preserve_server_ids` - `just test -p codex-core non_openai_responses_requests_omit_item_turn_metadata` - `just test -p codex-core resize_all_images_prepares_failures_before_history_insertion` - `just test -p codex-protocol` - `just test -p codex-app-server-protocol` - `just test -p codex-api azure_default_store_attaches_ids_and_headers`
pakrym-oai ·
2026-06-18 17:30:55 -07:00 -
Support
openai/formextended form elicitations (#27500)# Summary Allow App Server clients to opt into `openai/form` MCP elicitations.
Gabriel Peal ·
2026-06-18 11:54:49 -07:00 -
[codex] Add optional IDs to response items (#28812)
## Why `ResponseItem` variants do not have a consistent internal ID shape: some variants carry required IDs, some carry optional IDs, and some cannot represent an ID at all. The existing fields also use inconsistent serde, TypeScript, and JSON-schema annotations. A single enum-level access path is needed before history recording can assign and retain IDs. This PR establishes that internal model only. It intentionally does not generate or serialize IDs; allocation and wire persistence are isolated in the stacked follow-up. ## What changed - Give every concrete `ResponseItem` variant an `Option<String>` ID field. - Apply the same internal-only annotations to every ID field: `#[serde(default, skip_serializing)]`, `#[ts(skip)]`, and `#[schemars(skip)]`. - Add `ResponseItem::id()` and `ResponseItem::set_id()` as the shared accessors. - Preserve IDs when history items are rewritten for truncation. - Adapt consumers that previously assumed reasoning and image-generation IDs were required. - Regenerate app-server schemas so the hidden fields are represented consistently. The serde catch-all `ResponseItem::Other` remains ID-less because it must remain a unit variant. ## Test plan - `cargo check --tests -p codex-core -p codex-api -p codex-rollout-trace -p codex-image-generation-extension` - `just test -p codex-protocol` - `just test -p codex-app-server-protocol` - `just test -p codex-api -p codex-rollout-trace -p codex-image-generation-extension` - `just test -p codex-core event_mapping`
pakrym-oai ·
2026-06-17 18:27:43 -07:00 -
feat(core): add metadata field to ResponseItem (#28355)
## Description This PR adds an optional `metadata` field to `ResponseItem` for Responses API calls. Only mechanical plumbing, no actual values populated and sent yet. Turns out just adding a new field to `ResponseItem` has quite a large blast radius already. This change is backwards compatible because `metadata` is optional and omitted when absent, so existing response items and rollout history without it still deserialize and requests that do not set it keep the same wire shape. For provider compatibility, we strip out `metadata` before non-OpenAI Responses requests so Azure and AWS Bedrock never see this field. My followup PR here will actually make use of it to start storing and passing along `turn_id`: https://github.com/openai/codex/pull/28360 ## What changed - Added `ResponseItemMetadata` with optional `turn_id`, plus optional `metadata` on Responses API item variants and inter-agent communication. - Preserved item metadata through response-item rewrites such as truncation, missing tool-output synthesis, compaction history rebuilding, visible-history conversion, rollout/resume, and generated app-server schemas/types. - Strip item metadata from non-OpenAI Responses requests while preserving it for OpenAI-shaped requests. - Updated the mechanical fixture/test construction churn required by the new optional field.
Owen Lin ·
2026-06-15 15:05:28 -07:00 -
[codex] Add external agent import result accounting (#28008)
## Why External-agent imports can complete synchronously or continue in the background for plugins/sessions. Clients need a stable import id to correlate the immediate response with the eventual completion notification, and the completion payload needs enough accounting to show which artifact types succeeded or failed without hiding partial failures. ## What Changed - `externalAgentConfig/import` now returns an `importId`; `externalAgentConfig/import/completed` includes the same `importId` plus type-level `itemResults`. - Completed `itemResults` report `successCount`, `errorCount`, `successes`, and `rawErrors` for each migrated item type. - Added protocol/schema/TypeScript types for import successes, raw errors, and type-level results. No progress notification is included in the final PR. - `ExternalAgentConfigService::import` now returns an outcome object with synchronous item results and pending plugin imports. - Plugin import outcomes track succeeded/failed marketplaces, plugin ids, and raw errors. Plugin failures can be reported in completed accounting while later migration items continue. - Non-plugin synchronous import failures still fail the request, so invalid config/skills-style failures are not reported as a successful import response. - Session imports now return item results. Successful imports include the source session path and imported thread id; prepare, persist, ledger, and source-validation failures become raw errors in completion accounting where the import can continue. - The request processor generates the `importId`, aggregates synchronous results with background plugin/session results, and sends a single completed notification when all selected work is done. - App-server docs and generated schema fixtures were updated for the new response/completed payload shapes. ## Validation - `just test -p codex-app-server-protocol` - `just test -p codex-app-server-client event_requires_delivery` - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-sync-error just test -p codex-app-server external_agent_config_import_returns_error_for_failed_sync_import` - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-external-agent just test -p codex-app-server external_agent_config` Note: local sandbox validation used `CODEX_SQLITE_HOME` because the default sqlite state path is read-only in this environment.
charlesgong-openai ·
2026-06-15 13:25:42 -07:00 -
[codex] simplify memory read metrics (#28164)
## Why Memory read telemetry currently reconstructs the executable shell command after a tool call finishes. That duplicates shell, login-policy, and cwd resolution owned by the tool handlers, and can diverge from the environment-specific command that unified exec actually ran. ## What changed - Expose the existing restricted shell-script parser directly for raw script text. - Parse `shell_command` and `exec_command` input into plain command argv before classifying memory reads. - Preserve all-or-nothing safe-command validation for multi-command scripts. - Remove cwd resolution, shell selection, and the unnecessary async boundary from memory read metric emission. ## Testing - `just test -p codex-shell-command` - `cargo check -p codex-core`
pakrym-oai ·
2026-06-15 08:28:02 -07:00 -
build: run buildifier from just fmt (#28125)
## Intent Keep Bazel and Starlark files consistently formatted without requiring contributors to install or version buildifier themselves. ## Implementation - Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier v8.5.1. - Run buildifier from the shared `just fmt` and `just fmt-check` driver, with Windows-safe explicit DotSlash invocation. - Provision DotSlash in formatting CI and contributor devcontainers, and document the source-build prerequisite. - Apply the initial mechanical buildifier formatting baseline.
Adam Perry @ OpenAI ·
2026-06-13 21:43:39 -07:00 -
Support plaintext agent messages (#27830)
## Why Multi-agent v2 `send_message` deliveries already reach the receiving model as typed `agent_message` items with encrypted content. Child-completion notifications are generated by Codex itself, so their content is plaintext and previously fell back to a serialized JSON envelope inside an assistant message. With plaintext `input_text` supported for `agent_message`, both delivery paths can use the same model-visible type while preserving explicit author and recipient metadata. ## What changed - add plaintext `input_text` support to `AgentMessageInputContent` and regenerate the affected app-server schemas - preserve `InterAgentCommunication` as structured mailbox input instead of converting it to assistant text - record delivered communications as typed `agent_message` history items - persist a dedicated rollout item so local delivery metadata such as `trigger_turn` remains available without leaking into the Responses request - reconstruct typed agent messages on resume and preserve fork-turn truncation behavior - remove request-time assistant-content parsing - preserve plaintext and encrypted inter-agent deliveries in stage-one memory inputs - normalize and link plaintext and encrypted agent messages in rollout traces without treating inbound messages as child results - cover the real MultiAgent V2 child-completion path end to end with deterministic mailbox synchronization ## Verification - `just test -p codex-core plaintext_multi_agent_v2_completion_sends_agent_message` - `just test -p codex-core input_queue_drains_mailbox_in_delivery_order record_initial_history_reconstructs_typed_inter_agent_message fork_turn_positions_use_inter_agent_delivery_metadata` - `just test -p codex-memories-write serializes_inter_agent_communications_for_memory` - `just test -p codex-rollout-trace agent_messages_preserve_routing_and_content sub_agent_started_activity_creates_spawn_edge` - `just test -p codex-rollout-trace agent_result_edge_falls_back_to_child_thread_without_result_message` - `just test -p codex-protocol -p codex-rollout -p codex-app-server-protocol`
jif ·
2026-06-12 13:50:04 -07:00 -
[codex] Load AGENTS.md from all bound environments (#27696)
## Why We already have the machinery to support multiple environments on a single thread, but we only show the model the contents of `AGENTS.md` files in the primary environment. We should show the model all of the relevant project instructions when we know there's more than one environment. ## Known Gaps As discussed in the RFC, this implementation: 1. doesn't handle environments being added/removed to/from the thread after its creation 2. it doesn't enforce an aggregate context budget across environments, and instead applies the configured project maximum independently to each environment ## Implementation - Discover project instructions in environment order with an independent byte budget per environment and preserve source provenance/order. - Keep the legacy fragment byte-for-byte when exactly one environment contributes project instructions; use environment-labeled sections when two or more environments contribute. - Freeze the complete rendered fragment in `LoadedAgentsMd`, insert it directly into requests, and recognize both layouts in contextual and memory filtering. - Add exact rendering, independent-budget, source-order, creation-snapshot, and consumer coverage without changing app-server schemas.
Adam Perry @ OpenAI ·
2026-06-12 00:10:06 -07:00 -
[codex] Remove async_trait from first-party code (#27475)
## Why First-party async traits should expose their `Send` contracts explicitly without requiring `async_trait`. This completes the migration pattern established in #27303 and #27304. ## What changed - Replaced the remaining first-party `async_trait` traits with native return-position `impl Future + Send` where statically dispatched and explicit boxed `Send` futures where object safety is required. - Kept implementations behavior-preserving, outlining existing async bodies into inherent methods where that keeps the diff reviewable. - Removed all direct first-party `async-trait` dependencies and the workspace dependency declaration. - Added a cargo-deny policy that permits `async-trait` only through the remaining transitive wrapper crates. - Updated `rand` from 0.8.5 to 0.8.6 to resolve RUSTSEC-2026-0097 and keep the full cargo-deny check passing. ## Validation - `just test -p codex-exec-server`: 216 passed, 2 skipped. - `just test -p codex-model-provider`: 39 passed. - `just test -p codex-core` and `just test`: changed tests passed; remaining failures are environment-sensitive suites unrelated to this migration. - `cargo deny check` - `just fix` - `just fmt` - `cargo shear` - `just bazel-lock-check`
Adam Perry @ OpenAI ·
2026-06-11 18:16:39 -07:00 -
core: Consolidate Responses API Codex metadata (#27122)
## What Introduce a `CodexResponsesMetadata` struct that defines all the core metadata we send to Responses API. Example fields are `thread_id`, `turn_id`, `window_id`, etc. Going forward, `client_metadata["x-codex-turn-metadata"]` will be the canonical way Codex sends metadata to Responses API across both HTTP and websocket transports. For now, we continue to emit the existing top-level HTTP headers and top-level `client_metadata` fields from the same `CodexResponsesMetadata` struct for compatibility reasons. Also, app-server clients who specify additional `responsesapi_client_metadata` via `turn/start` and `turn/steer` will have those fields merged into `client_metadata["x-codex-turn-metadata"]`, but cannot override the reserved fields that core uses (i.e. the fields in `CodexResponsesMetadata`). ## Why Responses API request instrumentation is the source of truth for downstream Codex analytics that join requests by Codex IDs such as session, thread, turn, and context window. Before this change, those values were assembled through several request-specific paths: HTTP request bodies, websocket handshake headers, websocket `response.create` payloads, compaction requests, and the rich `x-codex-turn-metadata` envelope all had their own wiring. That made metadata propagation easy to drift across API-key/direct Responses API requests, ChatGPT-auth/proxied requests, websocket requests, and compaction requests. It also made additions like `window_id` error-prone because a field could be added to one transport projection but missed in another. ## What changed - Added `CodexResponsesMetadata` as the core-owned snapshot for Codex metadata sent to ResponsesAPI. - Render `client_metadata["x-codex-turn-metadata"]`, flat `client_metadata` projections, and direct compatibility headers from that same snapshot. - Include the known Codex-owned fields in the turn metadata blob, including installation/session/thread/turn/window IDs, request kind, lineage, sandbox/workspace metadata, timing, and compaction details. - Treat app-server `responsesapi_client_metadata` as enrichment for the Codex turn metadata blob while preventing those extras from overriding Codex-owned fields. - Use the same metadata path for normal turns, websocket prewarm, local compaction, remote v1 compaction, and remote v2 compaction. - Keep websocket connection-only preconnect metadata separate so handshakes carry compatibility identity headers without inventing a fake turn metadata blob. ## Verification - `cargo check -p codex-core` - `just fix -p codex-core`
Owen Lin ·
2026-06-11 13:42:09 -07:00 -
[codex] Store compact window id in rollout (#27264)
## Why Compaction window identity is part of session history, not model-client transport state. Persisting it with the compacted rollout item lets resumed threads continue from the reconstructed window without keeping mutable window state on `ModelClient`. ## What changed - Added `window_id` to `CompactedItem` and stamp it when `replace_compacted_history` installs compacted history. - Moved auto-compact window id ownership into `AutoCompactWindow` / `SessionState`; `ModelClient` now receives the request window id from callers instead of storing it. - Returned `window_id` from rollout reconstruction for resume. Reconstruction uses the newest surviving compacted item's stored `window_id` when present, and falls back to the legacy compacted-item count when it is absent. - Kept fork startup at the fresh default window id and updated direct model-client tests to pass explicit test window ids. ## Validation - `cargo check -p codex-core --tests`
pakrym-oai ·
2026-06-10 08:47:16 -07:00 -
feat: use provider defaults for memory models (#27129)
## Why Memory startup used hardcoded OpenAI model slugs for extraction and consolidation. That works for the default OpenAI-compatible path, but provider-specific backends can require different model identifiers. In particular, Amazon Bedrock should use its Bedrock model ID for these background memory requests instead of the OpenAI `gpt-5.4-mini` / `gpt-5.4` slugs. ## What Changed - Added provider-owned preferred memory model methods alongside `approval_review_preferred_model`. - Updated memory extraction and consolidation to resolve their default model through the active `ModelProvider`. - Added Amazon Bedrock overrides so both memory stages use `openai.gpt-5.4` through Bedrock’s provider-specific model ID. - Kept explicit `memories.extract_model` and `memories.consolidation_model` config overrides taking precedence. - Added startup coverage for default OpenAI and Bedrock memory model selection. #closes #26288
Celia Chen ·
2026-06-09 23:49:09 +00:00 -
Load selected executor skills through extensions (#27184)
## Why CCA is moving toward a split runtime where the orchestrator may not have a filesystem, while executors can expose preinstalled plugins and skills. A thread therefore needs to select capabilities without asking app-server or core to interpret executor-owned paths through the orchestrator's filesystem. The longer-term model is broader than executor skills: - A plugin is a bundle of skills, MCP servers, connectors/apps, and hooks. - A plugin root can be local, executor-owned, or hosted by a backend. - Components inside one plugin can use different access and execution mechanisms. A skill may be read from a filesystem or through backend tools; an HTTP MCP server can run without an executor; a stdio MCP server or hook needs an execution environment. - Core should carry generic extension initialization data. The extension that owns a component should discover it, expose it to the model, and invoke it through the appropriate runtime. This PR establishes that architecture through one complete vertical: selecting a root on an executor, discovering the skills beneath it, exposing those skills to the model, and reading an explicitly invoked `SKILL.md` through the same executor. ## Contract `thread/start` gains an experimental `selectedCapabilityRoots` field: ```json { "selectedCapabilityRoots": [ { "id": "deploy-plugin@1", "location": { "type": "environment", "environmentId": "workspace", "path": "/opt/codex/plugins/deploy" } } ] } ``` The root is intentionally not classified as a "plugin" or "skill" in the API. It can point at a standalone skill, a directory containing several skills, or a plugin containing skills and other components. This PR only teaches the skills extension how to consume it; later extensions can resolve MCP, connector, and hook components from the same selection. The platform-supplied `id` is stable selection identity. The location says which runtime owns the root and gives that runtime an opaque path. App-server does not inspect or canonicalize the path. ## What changed ### Generic thread extension initialization App-server converts selected roots into `ExtensionDataInit`. Core carries that generic initialization value until the final thread ID is known, then creates thread-scoped `ExtensionData` before lifecycle contributors run. This keeps `Session` and core independent of the capability-selection contract. The initialization value is consumed during construction; it is not retained as another long-lived `Session` field. ### Executor-backed skills The skills extension now owns an `ExecutorSkillProvider` that: - resolves the selected environment through `EnvironmentManager` - discovers, canonicalizes, and reads skills through that environment's `ExecutorFileSystem` - contributes the bounded selected-skill catalog as stable developer context - reads an explicitly invoked skill body through the authority that listed it - warns when an environment or root is unavailable - never falls back to the orchestrator filesystem for an executor-owned root Skill catalog and instruction fragments have hard byte bounds, which also bound them below the 10K-token per-item context limit. If a selected executor skill has the same name as a legacy local skill, the executor selection owns that invocation and the local body is not injected a second time. Existing local and bundled skill loading remains in place. Omitting `selectedCapabilityRoots` therefore preserves current local-only behavior. ## Current semantics - Only environment-owned locations are represented in this first contract. - Roots are resolved by the destination extension, not by app-server or core. - An unavailable executor or invalid root produces a warning and no capabilities from that root; it does not trigger a local-filesystem fallback. - Selection applies to a newly started active thread. - MCP servers, connectors, and hooks beneath a selected plugin root are not activated yet. - Selection is not yet persisted or inherited across resume, fork, or subagent creation. Existing local capabilities continue to behave as they do today in those flows. ## Planned vertical follow-ups 1. **Hosted HTTP MCP:** add an extension-backed HTTP MCP source that works without an executor, then replace the special-purpose MCP plugins loader with that implementation. 2. **Executor MCP:** register and execute stdio MCP servers through the environment that owns the selected plugin root. 3. **Backend skills:** add a hosted skill source whose catalog and bodies are accessed through extension tools rather than a filesystem. 4. **Connectors and hooks:** activate those components through their owning extensions, using the same selected-root boundary and component-specific runtime. 5. **Durable selection:** define the desired-selection lifecycle, persist it, and make resume, fork, and subagent inheritance explicit rather than accidental. 6. **Local convergence:** incrementally route existing local plugin, skill, and MCP loading through the same extension model while preserving current local behavior. Each follow-up remains reviewable as an end-to-end capability. The platform selects roots, generic thread extension data carries the selection, and the owning extension resolves and operates its component. ## Verification Coverage added for: - app-server end-to-end discovery and explicit invocation of a skill inside an executor-selected plugin root - exclusive invocation when a selected executor skill collides with a local skill name - executor filesystem authority for discovery, canonicalization, and reads - thread extension initialization before lifecycle contributors run - stable executor catalog context, explicit invocation, context rebuilding, hidden skills, and preserved host/remote catalog behavior Targeted protocol, core-skills, skills-extension, core lifecycle, and app-server executor-skill tests were run during development.jif ·
2026-06-09 19:51:54 +02:00 -
Pair thread environment settings (#26687)
## Why Thread cwd and environment selections are a single logical setting in core: updating one without the other can silently desynchronize the next-turn execution context. This change makes that relationship explicit in the internal thread settings flow while preserving the existing app-server public API shape. ## What changed - Moved the cwd/environment pair through internal `ThreadSettingsOverrides.environment_settings` instead of a top-level internal `cwd` field. - Kept `thread/settings/update` public params unchanged, with app-server translating top-level `cwd` into the paired internal settings shape. - Moved `Op::UserInput` environment overrides into thread settings so user turns and settings updates use the same core path. - Updated core, app-server, MCP, memories, sample, and test callsites to construct the paired settings shape. ## Verification - `git diff --check` - Local test run starting after PR creation.
pakrym-oai ·
2026-06-08 13:55:15 -07:00 -
[codex] Support model-defined reasoning efforts (#26444)
## Summary - accept non-empty model-defined reasoning effort values while preserving built-in effort behavior - propagate the non-Copy effort type through core, app-server, TUI, telemetry, and persistence call sites - preserve string wire encoding and expose an open-string schema for clients - update model selection and shortcut behavior for model-advertised effort values ## Root cause `ReasoningEffort` gained a string-backed custom variant, so it could no longer implement `Copy` or rely on derived closed-enum serialization. Existing consumers still moved effort values from shared references and assumed a fixed built-in value set. ## Validation - `just fmt` - Local tests and compilation were not run per request; relying on CI.
Ahmed Ibrahim ·
2026-06-04 13:36:24 -07:00 -
feat: show enterprise monthly credit limits in status (#24812)
## Summary Enterprise users can have an effective monthly credit limit, but Codex `/status` currently drops that metadata from the account-usage response. This change adds the optional `spend_control.individual_limit` projection to the existing rate-limit snapshot flow. The backend client reads the monthly limit, app-server exposes it as `individualLimit`, and the TUI renders a `Monthly credit limit` row through the existing progress-bar renderer. When the backend does not return an effective monthly limit, existing rate-limit behavior is unchanged. ## Existing backend state The account-usage backend already returns the effective monthly limit and current usage together: ```json { "spend_control": { "reached": false, "individual_limit": { "limit": "25000", "used": "8000", "remaining": "17000", "used_percent": 32, "remaining_percent": 68, "reset_after_seconds": 86400, "reset_at": 1778137680 } } } ``` Before this change, Codex projected rolling `primary` and `secondary` windows plus `credits`. It ignored `spend_control.individual_limit`, so app-server clients and `/status` could not render the monthly cap. The updated flow is: ```text account usage backend -> backend-client reads spend_control.individual_limit -> existing rate-limit snapshot carries optional individual_limit -> app-server exposes optional individualLimit -> TUI renders Monthly credit limit ``` ## App-server contract `account/rateLimits/read` and sparse `account/rateLimits/updated` notifications now include an additive nullable `rateLimits.individualLimit` field: ```json { "individualLimit": { "limit": "25000", "used": "8000", "remainingPercent": 68, "resetsAt": 1778137680 } } ``` In an `account/rateLimits/read` response, `null` means no monthly limit is available. `account/rateLimits/updated` remains a sparse rolling notification: clients merge available values into their most recent `account/rateLimits/read` snapshot or refetch. Nullable account metadata in a rolling notification does not clear a previously observed value. ## Design decisions - Extend the existing rate-limit snapshot instead of introducing a separate request or wire-level update protocol. - Keep the Codex projection narrow: `/status` needs the effective limit, current usage, remaining percentage, and reset timestamp. - Render the monthly row through the existing progress-bar renderer, with one optional detail line for `8,000 of 25,000 credits used`. - Keep the backend response optional so existing accounts and older usage states preserve their current behavior. - Preserve cached monthly metadata when sparse rolling notifications omit it. Live account-usage reads remain authoritative and can clear a removed limit. ## Visual evidence ```text Monthly credit limit: [██████████████░░░░░░] 68% left (resets 07:08 on 7 May) 8,000 of 25,000 credits used ``` Snapshot: `codex-rs/tui/src/status/snapshots/codex_tui__status__tests__status_snapshot_includes_enterprise_monthly_credit_limit.snap` ## Testing Tests: generated app-server schema verification, protocol tests, backend-client tests, app-server integration coverage, TUI snapshot coverage, formatting, and workspace lint cleanup.efrazer-oai ·
2026-06-01 21:25:42 -07:00 -
app-server: remove experimental persist_extended_history bool flag (#25712)
## Summary Remove the dead experimental `persistExtendedHistory` app-server flag and collapse rollout persistence to the single policy app-server already used. ## What Changed - Removed `persistExtendedHistory` from v2 thread start/resume/fork params and deleted its deprecation notice path. - Removed the persistence-mode enums and plumbing through core, rollout, and thread-store. - Made rollout filtering mode-free, keeping the existing limited persisted-history behavior. ## Test Plan - `just write-app-server-schema` - `cargo nextest run --no-fail-fast -p codex-app-server-protocol schema_fixtures` - `cargo nextest run --no-fail-fast -p codex-app-server thread_shell_command_history_responses_exclude_persisted_command_executions` - `cargo nextest run --no-fail-fast -p codex-rollout -p codex-thread-store` - final `rg` for removed flag/type names
Owen Lin ·
2026-06-01 23:33:42 +00:00 -
store and expose parent_thread_id on Threads (#25113)
## Why This PR https://github.com/openai/codex/pull/24161#discussion_r3325692763 revealed a subagent data modeling issue, where we overloaded `forked_from_id` to also mean `parent_thread_id`. That's incorrect since guardian and review subagents can be a subagent and NOT fork the main thread's history. The solution here is to explicitly store a new `parent_thread_id` on `SessionMeta`, alongside `forked_from_id` which already exists. While we're at it, also expose it in the app-server protocol on the `Thread` object. A thread->subagent relationship and a fork of thread history are orthogonal concepts. ## What Changed - Added top-level `parent_thread_id` persistence on `SessionMeta` and runtime/session plumbing through `SessionConfiguredEvent`, `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`, `TurnContext`, and `ModelClient`. - Made turn metadata, request headers, analytics, and subagent-start events read the separate runtime/top-level parent field instead of deriving general parent lineage from `SessionSource` or `forked_from_thread_id`. - Passed parent lineage separately at delegated subagent, review, guardian, agent-job, and multi-agent spawn construction sites; copied-history fork lineage remains derived only from `InitialHistory`. - Persisted and exposed parent lineage through rollout/thread-store projections and app-server v2 `Thread.parentThreadId`. - Updated app-server README text and regenerated app-server schema fixtures for the additive `parentThreadId` response field.
Owen Lin ·
2026-06-01 04:33:20 +00:00 -
Move memories root setup out of core config (#24758)
## Why Config loading should not create or write-authorize the memories root just because memory support exists. Memory startup is the code path that actually materializes that tree. ## What - Stop creating the memories root during Config load and remove it from legacy workspace-write projections. - Grant the memories root read access only when the memories feature and use_memories are enabled. - Create the memories root inside memories startup before seeding extension instructions. - Update config and startup tests around the ownership boundary. ## Tests - just fmt - just fix -p codex-core - just fix -p codex-memories-write - just test -p codex-core memory_tool_makes_memories_root_readable_without_creating_or_widening_writes workspace_write_includes_configured_writable_root_once_without_memories_root permission_profile_override_keeps_memories_root_out_of_legacy_projection permissions_profiles_allow_direct_write_roots_outside_workspace_root default_permissions_profile_populates_runtime_sandbox_policy - just test -p codex-memories-write memories_startup_creates_memory_root Note: a broader just test -p codex-core run is not clean in this sandbox; it hit missing test_stdio_server plus seatbelt, realtime, and environment-sensitive failures. The changed config tests above pass.
jif-oai ·
2026-05-28 11:51:24 +02:00 -
[codex] add compaction metadata to turn headers (#24368)
## Summary - Add `request_kind` values for foreground turn, startup prewarm, compaction, and detached memory model requests. - Attach compaction dispatch metadata to local Responses, legacy `/v1/responses/compact`, and remote v2 compact requests. - Add the existing logical context-window identifier as `window_id` on turn-owned model request metadata. - Keep identity fields optional for detached memory requests, while still emitting `request_kind="memory"` in non-git/no-sandbox workspaces. ## Root Cause `x-codex-turn-metadata` has more than one producer. Foreground turns and compaction requests own a real turn and should carry that turn identity. Detached memory stage-one requests do not own a foreground turn, so absent identity fields are valid rather than missing data. Startup websocket prewarm is also a model request, but it has `generate=false` and must not be counted as a foreground turn. `thread_source` or session source identifies where a thread came from (for example review, guardian, or another subagent). `request_kind` identifies what the current outbound model request is doing (`turn`, `prewarm`, `compaction`, or `memory`). A review or guardian thread can issue either a normal turn request or a compaction request, so source cannot replace request kind. ## Behavior / Impact - Ordinary foreground requests send `request_kind="turn"`, their real identity fields, and `window_id="<thread_id>:<window_generation>"`. - Startup websocket warmup requests send `request_kind="prewarm"` so they are not counted as foreground turns. - Compaction requests send `request_kind="compaction"`, their real owning turn identity, the existing `window_id`, and `compaction.{trigger,reason,implementation,phase,strategy}`. - Detached memory stage-one requests send `request_kind="memory"` without `session_id`, `thread_id`, `turn_id`, or `window_id`; when no workspace metadata exists, the kind-only header is still emitted. - `session_id`, `thread_id`, `turn_id`, and `window_id` remain optional in the header schema because detached memory requests do not own a foreground turn or context window. - `window_id` is not a new ID system: it is copied from the already-sent `x-codex-window-id` / WS client metadata value at model-request dispatch time. - Existing `x-codex-window-id` HTTP/WS emission, value format, generation advancement, resume behavior, and fork reset behavior are unchanged. - `request_kind`, `window_id`, and upstream turn-owned identity fields remain schema-owned; input `responsesapi_client_metadata` cannot replace their canonical values. - No table, DAG, export, app-server API, or MCP `_meta` schema changes are included. A compaction attempt stopped by a pre-compact hook issues no model request and therefore has no request header; its outcome remains in analytics events. Status, error, duration, and token deltas also remain analytics fields rather than request-header fields. Future detached-memory attribution using a real initiating turn ID as `trigger_turn_id` is intentionally not part of this PR. ## Sync With Main - Final pushed head `716342e79` is rebased onto `origin/main@0d37db4b2`. - The metadata conflict came from upstream `#24160`, which added `forked_from_thread_id` on the same `turn_metadata` surface. Resolution preserves that field and its protection from client metadata override alongside this PR's request-kind, compaction, and window-id fields. - While resolving the overlapping commits, I removed an accidental recursive model-request overlay and a duplicate detached-memory header builder before completing the rebase. ## Latency / User Experience Boundary - Foreground turns perform no new filesystem, git, or network work. New fields are inserted into metadata already serialized for outgoing requests. - Compaction issues the same model/HTTP requests with the same prompt, model, service tier, and sampling settings; only metadata bytes change. - Startup prewarm already sent metadata; it is now correctly classified as `prewarm`. - Non-git detached memory now sends a small kind-only metadata header rather than no header. - This client diff adds no user-visible latency mechanism beyond negligible serialization and header bytes on already-existing requests. ## Validation On conflict-resolved head `1d35c2cfb` based on `origin/main@487521733`: - `just fmt` (passed) - `just fix -p codex-core` (passed) - `git diff --check origin/main...HEAD` (passed) - `just test -p codex-core -E 'test(turn_metadata) | test(websocket_first_turn_uses_startup_prewarm_and_create) | test(responses_stream_includes_turn_metadata_header_for_git_workspace_e2e) | test(responses_websocket_forwards_turn_metadata_on_initial_and_incremental_create) | test(remote_compact_v2_retries_failures_with_stream_retry_budget) | test(window_id_advances_after_compact_persists_on_resume_and_resets_on_fork)'` (`23 passed`; `bench-smoke` passed) - `just test -p codex-app-server -E 'test(turn_start_forwards_client_metadata_to_responses_request_v2) | test(turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2) | test(auto_compaction_remote_emits_started_and_completed_items)'` (`3 passed`; `bench-smoke` passed) - `just test -p codex-memories-write` (`29 passed`; `bench-smoke` passed)ningyi-oai ·
2026-05-27 11:09:33 -07:00 -
Uprev Rust toolchain pins to 1.95.0 (#24684)
## Summary - Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across Cargo, Bazel, CI, release workflows, devcontainers, and the Codex environment config. - Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts match the new version. - Leave purpose-specific toolchains unchanged, including the `argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0` build pin. - Includes fixes for new lints from `just fix` and a few codex-authored fixes for lints without a suggestion.
Adam Perry @ OpenAI ·
2026-05-26 20:59:47 -07:00 -
Add experimental turn additional context (#24154)
## Summary Adds experimental `additionalContext` support to `turn/start` and `turn/steer` so clients can provide ephemeral external context, such as browser or automation state, without turning that plumbing into a visible user prompt or triggering user-prompt lifecycle behavior. ## API Shape The parameter shape is: ```ts additionalContext?: Record<string, { value: string kind: "untrusted" | "application" }> | null ``` Example: ```json { "additionalContext": { "browser_info": { "value": "Active tab is CI failures.", "kind": "untrusted" }, "automation_info": { "value": "CI rerun is in progress.", "kind": "application" } } } ``` The keys are opaque and caller-defined. ## Context Injection When provided, accepted entries are inserted into model context as hidden contextual message items, not as visible thread user-message items. `kind: "untrusted"` entries are inserted with role `user`: ```text <external_${key}>${value}</external_${key}> ``` `kind: "application"` entries are inserted with role `developer`: ```text <${key}>${value}</${key}> ``` Values are not escaped. Each value is truncated to 1k approximate tokens before wrapping. For `turn/start`, accepted additional context is inserted before normal user input. For `turn/steer`, additional context is merged only when the steer includes non-empty user input; context-only steers still reject as empty input. ## Dedupe Strategy `AdditionalContextStore` lives on session state and stores the latest complete additional-context map. Each `turn/start` or non-empty `turn/steer` treats its `additionalContext` as the current complete set of values. Entries are injected only when the key is new or the exact entry for that key changed, including `value` or `kind`. After merging, the store is replaced with the provided map, so omitted keys are removed from the retained set and can be injected again later if reintroduced. Omitting `additionalContext`, passing `null`, or passing an empty object resets the store to empty and injects nothing. ## What Changed - Threads experimental v2 `additionalContext` through app-server into core turn start and steer handling. - Adds separate contextual fragment types for untrusted user-role context and application developer-role context. - Uses pending response input items so additional context can be combined with normal user input without treating it as prompt text. - Adds integration coverage for start/steer flow, role routing, dedupe/reset behavior, deletion/re-add behavior, hook-blocked input behavior, empty context-only steer rejection, external-fragment marker matching, and truncation.pakrym-oai ·
2026-05-26 13:02:34 -07:00 -
Move memory state to a dedicated SQLite DB (#24591)
## Summary Generated memory rows and their stage-one/stage-two job state currently live in `state_5.sqlite` alongside thread metadata. That makes memory cleanup and regeneration share the main state schema even though those rows are memory-pipeline data and can be rebuilt independently from the durable thread records. This PR moves the memory-owned tables into a dedicated `memories_1.sqlite` runtime database while keeping thread metadata in `state_5.sqlite`. ## Changes - Adds a separate memories DB runtime, migrator, path helpers, telemetry kind, and Bazel compile data for `state/memory_migrations`. - Introduces `MemoryStore` behind `StateRuntime::memories()` and moves memory table/job operations onto that store. - Drops the old memory tables from the state DB and recreates their schema in `state/memory_migrations/0001_memories.sql`. - Updates memory startup, citation usage tracking, rollout pollution handling, `debug clear-memories`, and app-server `memory/reset` to operate through the memories DB. - Preserves cross-DB behavior by hydrating thread metadata from the state DB when selecting visible memory outputs and checking stage-one staleness. ## Verification - Added/updated `codex-state` tests for deleted-thread memory visibility and already-polluted phase-two enqueue behavior. - Updated `debug clear-memories`, app-server `memory/reset`, and memories startup tests to seed and assert memory rows through `memories_1.sqlite`.
jif-oai ·
2026-05-26 20:07:25 +02:00 -
chore: move memory prompt builder into extension (#24558)
## Why The memories extension now owns the read-path developer instructions it injects at thread start. Keeping that prompt builder and template in `codex-memories-read` left the extension depending on a helper crate for extension-specific prompt assembly, and kept async template/truncation dependencies in the read crate after the remaining read surface no longer needed them. ## What changed - Moved `prompts.rs`, its tests, and `templates/memories/read_path.md` from `memories/read` into `ext/memories`. - Wired `MemoryExtension` to call the local prompt builder and added the moved templates to `ext/memories/BUILD.bazel` compile data. - Removed the now-unused prompt export and prompt-related dependencies from `codex-memories-read`. ## Testing - Not run locally.
jif-oai ·
2026-05-26 11:53:47 +02:00 -
chore: drop orphaned codex memories MCP crate (#24555)
## Why The memory read-tool surface had two implementations: the app-server extension path under `ext/memories`, and an unused `codex-memories-mcp` workspace crate under `memories/mcp`. The MCP crate no longer has reverse dependents, so keeping it around preserves duplicate backend, schema, and tool code that is not part of the live app-server memory path. Dropping the orphaned crate makes the remaining memory crate split clearer: `memories/read` owns read-path prompt/citation helpers, `memories/write` owns the write pipeline, and `ext/memories` owns the app-server extension integration. ## What changed - Removed the `memories/mcp` crate and its Bazel/Cargo metadata. - Removed `memories/mcp` from the Rust workspace and lockfile. - Updated `memories/README.md` so it only lists the remaining reusable memory crates. ## Verification - `cargo metadata --format-version 1 --no-deps` succeeds.
jif-oai ·
2026-05-26 11:29:37 +02:00 -
[5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
**Stack position:** [5 of 7] ## Summary This PR adds `Op::ThreadSettings`, a queued settings-only update mechanism for changing stored thread settings without starting a new turn. It also removes the legacy `Op::OverrideTurnContext` in the same layer, so reviewers can see the replacement and deletion together. ## Changes - Add `Op::ThreadSettings` for settings-only queued updates. - Emit `ThreadSettingsApplied` with the effective thread settings snapshot after core applies an update. - Route settings-only updates through the same submission queue as user input. - Migrate remaining `OverrideTurnContext` tests and callers to the queued `Op::ThreadSettings` path. - Delete `Op::OverrideTurnContext` from the core protocol and submission loop. This stack addresses #20656 and #22090. ## Stack 1. [1 of 7] [Add thread settings to UserInput](https://github.com/openai/codex/pull/23080) 2. [2 of 7] [Remove UserInputWithTurnContext](https://github.com/openai/codex/pull/23081) 3. [3 of 7] [Remove UserTurn](https://github.com/openai/codex/pull/23075) 4. [4 of 7] [Placeholder for OverrideTurnContext cleanup](https://github.com/openai/codex/pull/23087) 5. [5 of 7] [Replace OverrideTurnContext with ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR) 6. [6 of 7] [Add app-server thread settings API](https://github.com/openai/codex/pull/22509) 7. [7 of 7] [Sync TUI thread settings](https://github.com/openai/codex/pull/22510)
Eric Traut ·
2026-05-18 21:03:51 -07:00 -
[1 of 7] Add thread settings to UserInput (#23080)
**Stack position:** [1 of 7] ## Summary The first three PRs in this stack are a cleanup pass before the actual thread settings API work. Today, core has several overlapping "user input" ops: `UserInput`, `UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how much next-turn state they carry, which makes the later queued thread settings update harder to reason about and review. This PR starts that cleanup by adding the shared `ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry it. Existing variants remain in place here, so this layer is mostly a behavior-preserving API shape change plus mechanical constructor updates. ## End State After PR3 By the end of PR3, `Op::UserInput` is the only "user input" core op. It can carry optional thread settings overrides for callers that need to update stored defaults with a turn, while callers without updates use empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are deleted. ## End State After PR5 By the end of PR5, core will have only two ops for this area: - `Op::UserInput` for user-input-bearing submissions. - `Op::ThreadSettings` for settings-only updates. ## Stack 1. [1 of 7] [Add thread settings to UserInput](https://github.com/openai/codex/pull/23080) (this PR) 2. [2 of 7] [Remove UserInputWithTurnContext](https://github.com/openai/codex/pull/23081) 3. [3 of 7] [Remove UserTurn](https://github.com/openai/codex/pull/23075) 4. [4 of 7] [Placeholder for OverrideTurnContext cleanup](https://github.com/openai/codex/pull/23087) 5. [5 of 7] [Replace OverrideTurnContext with ThreadSettings](https://github.com/openai/codex/pull/22508) 6. [6 of 7] [Add app-server thread settings API](https://github.com/openai/codex/pull/22509) 7. [7 of 7] [Sync TUI thread settings](https://github.com/openai/codex/pull/22510)
Eric Traut ·
2026-05-18 18:48:35 -07:00 -
jif-oai ·
2026-05-18 19:25:27 +02:00 -
Densify and version memory summaries (#23148)
## Why `memory_summary.md` is injected into every session, so its value depends on staying compact, navigational, and easy to regenerate when the expected shape changes. The previous consolidation prompt encouraged a broad actionable inventory and allowed older summary structures to be patched in place, which makes it easier for stale or overly verbose summaries to keep accumulating. This change makes the summary format explicitly versioned and biases Phase 2 memory consolidation toward denser prompt-loaded context. ## What changed - Require `memory_summary.md` to begin with an exact `v1` header. - Teach consolidation to regenerate `memory_summary.md` from scratch when the header is missing or incompatible, while still allowing incremental updates to `MEMORY.md`. - Tighten the `memory_summary.md` instructions so it acts as a compact routing/index layer instead of a second handbook. - Lower `MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT` from `5_000` to `2_500` so the runtime prompt budget matches the denser summary target. ## Verification Not run; this is a prompt/template update plus a prompt budget constant change.
jif-oai ·
2026-05-18 09:59:34 +02:00 -
chore: drop built-in MCPs (#22173)
Drop something that was never used
jif-oai ·
2026-05-11 19:45:08 +02:00 -
[codex] request desktop attestation from app (#20619)
## Summary TL;DR: teaches `codex-rs` / app-server to request a desktop-provided attestation token and attach it as `x-oai-attestation` on the scoped ChatGPT Codex request paths.  ## Details This PR teaches the Codex app-server runtime how to request and attach an attestation token. It does not generate DeviceCheck tokens directly; instead, it relies on the connected desktop app to advertise that it can generate attestation and then asks that app for a fresh header value when needed. The flow is: 1. The Codex desktop app connects to app-server. 2. During `initialize`, the app can advertise that it supports `requestAttestation`. 3. Before app-server calls selected ChatGPT Codex endpoints, it sends the internal server request `attestation/generate` to the app. 4. app-server receives a pre-encoded header value back. 5. app-server forwards that value as `x-oai-attestation` on the scoped outbound requests. The code in this repo is mostly protocol and runtime plumbing: it adds the app-server request/response shape, introduces an attestation provider in core, wires that provider into Responses / compaction / realtime setup paths, and covers the intended scoping with tests. The signed macOS DeviceCheck generation remains owned by the desktop app PR. ## Related PR - Codex desktop app implementation: https://github.com/openai/openai/pull/878649 ## Validation <details> <summary>Tests run</summary> ```sh cargo test -p codex-app-server-protocol cargo test -p codex-core attestation --lib cargo test -p codex-app-server --lib attestation ``` Also ran: ```sh just fix -p codex-core just fix -p codex-app-server just fix -p codex-app-server-protocol just fmt just write-app-server-schema ``` </details> <details> <summary>E2E DeviceCheck validation</summary> First validated the signed desktop app boundary directly: launched a packaged signed `Codex.app`, sent `attestation/generate`, decoded the returned `v1.` attestation header, and validated the extracted DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using bundle ID `com.openai.codex`. Apple returned `status_code: 200` and `is_ok: true`. Then ran the fuller app + app-server flow. The packaged `Codex.app` launched a current-branch app-server via `CODEX_CLI_PATH`, and a local MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server requested `attestation/generate` from the real Electron app process, and the intercepted `/backend-api/codex/responses` traffic included `x-oai-attestation` on both routes: ```text GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present ``` The captured header decoded to a DeviceCheck token that also validated with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`, team `2DC432GLL2`). </details> --------- Co-authored-by: Codex <noreply@openai.com>
Jiaming Zhang ·
2026-05-08 12:36:02 -07:00 -
nit: comment (#21763)
Because of an async discussion
jif-oai ·
2026-05-08 17:15:46 +02:00 -
Disable empty Cargo test targets (#21584)
## Summary `cargo test` has entails both running standard Rust tests and doctests. It turns out that the doctest discovery is fairly slow, and it's a cost you pay even for crates that don't include any doctests. This PR disables doctests with `doctest = false` for crates that lack any doctests. For the collection of crates below, this speeds up test execution by >4x. E.g., before this PR: ``` Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml Time (mean ± σ): 1.849 s ± 4.455 s [User: 0.752 s, System: 1.367 s] Range (min … max): 0.418 s … 14.529 s 10 runs ``` And after: ``` Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml Time (mean ± σ): 428.6 ms ± 6.9 ms [User: 187.7 ms, System: 219.7 ms] Range (min … max): 418.0 ms … 436.8 ms 10 runs ``` For a single crate, with >2x speedup, before: ``` Benchmark 1: cargo test -p codex-utils-string Time (mean ± σ): 491.1 ms ± 9.0 ms [User: 229.8 ms, System: 234.9 ms] Range (min … max): 480.9 ms … 512.0 ms 10 runs ``` And after: ``` Benchmark 1: cargo test -p codex-utils-string Time (mean ± σ): 213.9 ms ± 4.3 ms [User: 112.8 ms, System: 84.0 ms] Range (min … max): 206.8 ms … 221.0 ms 13 runs ``` Co-authored-by: Codex <noreply@openai.com>
Charlie Marsh ·
2026-05-07 15:44:17 -07:00 -
feat: make built-in MCPs first-class runtime servers (#21356)
## DISCLAIMER This is experimental and no production service must rely on this ## Why Built-in MCPs are product-owned runtime capabilities, but they were previously flattened into the same config-backed stdio path as user-configured servers. That made them depend on a hidden `codex builtin-mcp` re-exec path, exposed them through config-oriented CLI flows, and erased distinctions the runtime needs to preserve—most notably whether an MCP call should count as external context for memory-mode pollution. ## What changed - Model product-owned built-ins separately from config-backed MCP servers via `BuiltinMcpServer` and `EffectiveMcpServer`. - Launch built-ins in process through a reusable async transport instead of the hidden `builtin-mcp` stdio subcommand. - Keep config-oriented CLI operations such as `codex mcp list/get/login/logout` scoped to configured servers, while merging built-ins only into the effective runtime server set. - Retain server metadata after launch so parallel-tool support and context classification come from the live server set; built-in `memories` is now classified as local Codex state rather than external context. ## Test plan - `cargo test -p codex-mcp` - `cargo test -p codex-core --test suite builtin_memories_mcp_call_does_not_mark_thread_memory_mode_polluted_when_configured` --------- Co-authored-by: Codex <noreply@openai.com>
jif-oai ·
2026-05-07 10:36:32 +02:00 -
2- Use string service tiers in session protocol (#20971)
## Summary - break service tier session/op/app-server protocol fields from the closed enum to string tier ids - send the service tier string directly through model requests, prewarm, compaction, memories, and TUI/app-server turn starts - regenerate app-server protocol JSON/TypeScript schemas, removing the standalone ServiceTier TS enum ## Verification - just fmt - cargo check -p codex-core -p codex-app-server -p codex-tui - just write-app-server-schema --------- Co-authored-by: Codex <noreply@openai.com>
Ahmed Ibrahim ·
2026-05-06 18:00:21 +03:00 -
chore: spawn MCP for memories (#21214)
Co-authored-by: Codex <noreply@openai.com>
jif-oai ·
2026-05-06 15:05:54 +02:00 -
feat: add
session_id(#20437)## Summary Related to https://openai.slack.com/archives/C095U48JNL9/p1777537279707449 TLDR: We update the meaning of session ids and thread ids: * thread_id stays as now * session_id become a shared id between every thread under a /root thread (i.e. every sub-agent share the same session id) This PR introduces an explicit `SessionId` and threads it through the protocol/client boundary so `session_id` and `thread_id` can diverge when they need to, while preserving compatibility for older serialized `session_configured` events. --------- Co-authored-by: Codex <noreply@openai.com>
jif-oai ·
2026-05-06 10:48:37 +02:00 -
[codex-analytics] rework thread_source for thread analytics (#20949)
## Summary - make `thread_source` an explicit optional thread-level field on `thread/start`, `thread/fork`, and returned thread payloads - persist `thread_source` in rollout/session metadata so resumed live threads retain the original value - replace the old best-effort `session_source` -> `thread_source` mapping with an explicit caller-supplied analytics classification ## Why Before this change, analytics `thread_source` was populated by a best-effort mapping from `session_source`. `session_source` describes the runtime/client surface, not the actual thread-level origin, so that projection was not accurate enough to distinguish cases such as `user`, `subagent`, `memory_consolidation`, and future thread origins reliably. Making `thread_source` explicit keeps one thread-level analytics field while letting callers provide the real classification directly instead of recovering it indirectly from `session_source`. ## Impact For new analytics events, `thread_source` now reflects the explicit thread-level classification supplied by the caller rather than an inferred value derived from `session_source`. Existing protocol fields remain optional; callers that omit `threadSource` now produce `null` instead of a best-effort inferred value. ## Validation - `just write-app-server-schema` - `cargo test -p codex-analytics -p codex-core -p codex-app-server-protocol --no-run` - `cargo test -p codex-app-server-protocol generated_ts_optional_nullable_fields_only_in_params` - `cargo test -p codex-analytics thread_initialized_event_serializes_expected_shape` - `cargo test -p codex-core resume_stopped_thread_from_rollout_preserves_thread_source`
rhan-oai ·
2026-05-06 02:12:31 +00:00 -
feat: add normalized matching to memory search (#21205)
## Why Memory search currently treats separators literally, so callers need to know whether a stored term uses spaces, hyphens, or no separators at all. That makes recall brittle for terms such as `MultiAgentV2` vs. `multi agent v2` and `cold-resume` vs. `cold resume`. ## What changed - Add an opt-in `normalized` mode to memory search that removes non-alphanumeric separators after any requested case folding. - Thread the new flag through the MCP `search` tool into the local backend while keeping existing literal matching as the default. - Reject queries that normalize to an empty string, and add regression coverage for both normalized matching and that validation path. ## Testing - `cargo test -p codex-memories-mcp`
jif-oai ·
2026-05-05 17:33:07 +02:00 -
feat: support windowed multi-query memory search (#21204)
## Why Memory search currently supports either independent substring matches or requiring every query to appear on the same line. That is too restrictive for memory files where related terms often land on nearby lines in the same note or bullet block. ## What changed - Replace the old `all` match mode with explicit tagged modes: `all_on_same_line` and `all_within_lines { line_count }`. - Add windowed matching in `codex-rs/memories/mcp/src/local.rs` so callers can require every query to appear within a bounded line range while returning only the minimal qualifying windows. - Reject invalid zero-width windows and update the MCP tool description plus argument parsing to expose the new mode. - Add coverage for same-line matching, windowed matching, and invalid `line_count` input. ## Verification - Added targeted coverage in `codex-rs/memories/mcp/src/local_tests.rs` for `search_supports_all_within_lines_match_mode` and `search_rejects_zero_line_window`. - Added server-side parsing coverage in `codex-rs/memories/mcp/src/server.rs` for `search_args_accept_windowed_all_match_mode`.jif-oai ·
2026-05-05 17:15:21 +02:00