Commit Graph

231 Commits

  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • feat: split codex-common into smaller utils crates (#11422)
    We are removing feature-gated shared crates from the `codex-rs`
    workspace. `codex-common` grouped several unrelated utilities behind
    `[features]`, which made dependency boundaries harder to reason about
    and worked against the ongoing effort to eliminate feature flags from
    workspace crates.
    
    Splitting these utilities into dedicated crates under `utils/` aligns
    this area with existing workspace structure and keeps each dependency
    explicit at the crate boundary.
    
    ## What changed
    
    - Removed `codex-rs/common` (`codex-common`) from workspace members and
    workspace dependencies.
    - Added six new utility crates under `codex-rs/utils/`:
      - `codex-utils-cli`
      - `codex-utils-elapsed`
      - `codex-utils-sandbox-summary`
      - `codex-utils-approval-presets`
      - `codex-utils-oss`
      - `codex-utils-fuzzy-match`
    - Migrated the corresponding modules out of `codex-common` into these
    crates (with tests), and added matching `BUILD.bazel` targets.
    - Updated direct consumers to use the new crates instead of
    `codex-common`:
      - `codex-rs/cli`
      - `codex-rs/tui`
      - `codex-rs/exec`
      - `codex-rs/app-server`
      - `codex-rs/mcp-server`
      - `codex-rs/chatgpt`
      - `codex-rs/cloud-tasks`
    - Updated workspace lockfile entries to reflect the new dependency graph
    and removal of `codex-common`.
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn id is constructed in-memory;
    2. on resuming threads, turn_id might not be unique;
    3. client cannot no the boundary of a turn from rollout files easily.
    
    This PR does three things:
    1. persist `task_started` and `task_complete` events;
    1. persist `turn_id` in rollout turn events;
    5. generate turn_id as unique uuids instead of incrementing it in
    memory.
    
    This helps us resolve the issue of clients wanting to have unique turn
    ids for resuming a thread, and knowing the boundry of each turn in
    rollout files.
    
    example debug logs
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    backward compatibility:
    if you try to resume an old session without task_started and
    task_complete event populated, the following happens:
    - If you resume and do nothing: those reconstructed historical IDs can
    differ next time you resume.
    - If you resume and send a new turn: the new turn gets a fresh UUID from
    live submission flow and is persisted, so that new turn’s ID is stable
    on later resumes.
    I think this behavior is fine, because we only care about deterministic
    turn id once a turn is triggered.
  • Fix: update parallel tool call exec approval to approve on request id (#11162)
    ### Summary
    
    In parallel tool call, exec command approvals were not approved at
    request level but at a turn level. i.e. when a single request is
    approved, the system currently treats all requests in turn as approved.
    
    ### Before
    
    https://github.com/user-attachments/assets/d50ed129-b3d2-4b2f-97fa-8601eb11f6a8
    
    ### After
    
    https://github.com/user-attachments/assets/36528a43-a4aa-4775-9e12-f13287ef19fc
  • feat: retain NetworkProxy, when appropriate (#11207)
    As of this PR, `SessionServices` retains a
    `Option<StartedNetworkProxy>`, if appropriate.
    
    Now the `network` field on `Config` is `Option<NetworkProxySpec>`
    instead of `Option<NetworkProxy>`.
    
    Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
    create the `StartedNetworkProxy`, which is a new struct that retains the
    `NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
    implemented for `NetworkProxyHandle` to ensure the proxies are shutdown
    when it is dropped.)
    
    The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
    the appropriate places.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
    * #11285
    * __->__ #11207
  • test: deflake nextest child-process leak in MCP harnesses (#11263)
    ## Summary
    - add deterministic child-process cleanup to both test `McpProcess`
    helpers
    - keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
    polling in `Drop`
    - document the failure mode and why this avoids nondeterministic `LEAK`
    flakes
    
    ## Why
    `cargo nextest` leak detection can intermittently report `LEAK` when a
    spawned server outlives test teardown, making CI flaky.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-app-server`
    - `cargo test -p codex-mcp-server`
    
    
    ## Failing CI Reference
    - Original failing job:
    https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
  • Upgrade rmcp to 0.14 (#10718)
    - [x] Upgrade rmcp to 0.14
  • Fix flaky windows CI test (#10993)
    Hardens PTY Python REPL test and make MCP test startup deterministic
    
    **Summary**
    - `utils/pty/src/tests.rs`
    - Added a REPL readiness handshake (`wait_for_python_repl_ready`) that
    repeatedly sends a marker and waits for it in PTY output before sending
    test commands.
      - Updated `pty_python_repl_emits_output_and_exits` to:
        - wait for readiness first,
        - preserve startup output,
        - append output collected through process exit.
    - Reduces Windows/ConPTY flakiness from early stdin writes racing REPL
    startup.
    
    - `mcp-server/tests/suite/codex_tool.rs`
    - Avoid remote model refresh during MCP test startup, reducing
    timeout-prone nondeterminism.
  • Add resume_agent collab tool (#10903)
    Summary
    - add the new resume_agent collab tool path through core, protocol, and
    the app server API, including the resume events
    - update the schema/TypeScript definitions plus docs so resume_agent
    appears in generated artifacts and README
    - note that resumed agents rehydrate rollout history without overwriting
    their base instructions
    
    Testing
    - Not run (not requested)
  • feat: add APIs to list and download public remote skills (#10448)
    Add API to list / download from remote public skills
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786)
    ## Summary
    - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
    in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
    tags from normal assistant output.
    - Persist plan items and rebuild them on resume so proposed plans show
    in thread history.
    - Wire plan items/deltas through app-server protocol v2 and render a
    dedicated proposed-plan view in the TUI, including the “Implement this
    plan?” prompt only when a plan item is present.
    
    ## Changes
    
    ### Core (`codex-rs/core`)
    - Added a generic, line-based tag parser that buffers each line until it
    can disprove a tag prefix; implements auto-close on `finish()` for
    unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
    - Refactored proposed plan parsing to wrap the generic parser.
    `codex-rs/core/src/proposed_plan_parser.rs`
    - In plan mode, stream assistant deltas as:
      - **Normal text** → `AgentMessageContentDelta`
      - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion  
      (`codex-rs/core/src/codex.rs`)
    - Final plan item content is derived from the completed assistant
    message (authoritative), not necessarily the concatenated deltas.
    - Strips `<proposed_plan>` blocks from assistant text in plan mode so
    tags don’t appear in normal messages.
    (`codex-rs/core/src/stream_events_utils.rs`)
    - Persist `ItemCompleted` events only for plan items for rollout replay.
    (`codex-rs/core/src/rollout/policy.rs`)
    - Guard `update_plan` tool in Plan Mode with a clear error message.
    (`codex-rs/core/src/tools/handlers/plan.rs`)
    - Updated Plan Mode prompt to:  
      - keep `<proposed_plan>` out of non-final reasoning/preambles  
      - require exact tag formatting  
      - allow only one `<proposed_plan>` block per turn  
      (`codex-rs/core/templates/collaboration_mode/plan.md`)
    
    ### Protocol / App-server protocol
    - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
    (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
    - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
    EXPERIMENTAL markers and note that deltas may not match the final plan
    item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
    - Added plan delta route in app-server protocol common mapping.
    (`codex-rs/app-server-protocol/src/protocol/common.rs`)
    - Rebuild plan items from persisted `ItemCompleted` events on resume.
    (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)
    
    ### App-server
    - Forward plan deltas to v2 clients and map core plan items to v2 plan
    items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
    `codex-rs/app-server/src/codex_message_processor.rs`)
    - Added v2 plan item tests.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ### TUI
    - Added a dedicated proposed plan history cell with special background
    and padding, and moved “• Proposed Plan” outside the highlighted block.
    (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
    - Only show “Implement this plan?” when a plan item exists.
    (`codex-rs/tui/src/chatwidget.rs`,
    `codex-rs/tui/src/chatwidget/tests.rs`)
    
    <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
    src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
    />
    
    ### Docs / Misc
    - Updated protocol docs to mention plan deltas.
    (`codex-rs/docs/protocol_v1.md`)
    - Minor plumbing updates in exec/debug clients to tolerate plan deltas.
    (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)
    
    ## Tests
    - Added core integration tests:
      - Plan mode strips plan from agent messages.
      - Missing `</proposed_plan>` closes at end-of-message.  
      (`codex-rs/core/tests/suite/items.rs`)
    - Added unit tests for generic tag parser (prefix buffering, non-tag
    lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
    - Existing app-server plan item tests in v2.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ## Notes / Behavior
    - Plan output no longer appears in standard assistant text in Plan Mode;
    it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
    - The final plan item content is authoritative and may diverge from
    streamed deltas (documented as experimental).
    - Reasoning summaries are not filtered; prompt instructs the model not
    to include `<proposed_plan>` outside the final plan message.
    
    ## Codex Author
    `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
  • Conversation naming (#8991)
    Session renaming:
    - `/rename my_session`
    - `/rename` without arg and passing an argument in `customViewPrompt`
    - AppExitInfo shows resume hint using the session name if set instead of
    uuid, defaults to uuid if not set
    - Names are stored in `CODEX_HOME/sessions.jsonl`
    
    Session resuming:
    - codex resume <name> lookup for `CODEX_HOME/sessions.jsonl` first entry
    matching the name and resumes the session
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • feat: dynamic tools injection (#9539)
    ## Summary
    Add dynamic tool injection to thread startup in API v2, wire dynamic
    tool calls through the app server to clients, and plumb responses back
    into the model tool pipeline.
    
    ### Flow (high level)
    - Thread start injects `dynamic_tools` into the model tool list for that
    thread (validation is done here).
    - When the model emits a tool call for one of those names, core raises a
    `DynamicToolCallRequest` event.
    - The app server forwards it to the client as `item/tool/call`, waits
    for the client’s response, then submits a `DynamicToolResponse` back to
    core.
    - Core turns that into a `function_call_output` in the next model
    request so the model can continue.
    
    ### What changed
    - Added dynamic tool specs to v2 thread start params and protocol types;
    introduced `item/tool/call` (request/response) for dynamic tool
    execution.
    - Core now registers dynamic tool specs at request time and routes those
    calls via a new dynamic tool handler.
    - App server validates tool names/schemas, forwards dynamic tool call
    requests to clients, and publishes tool outputs back into the session.
    - Integration tests
  • feat: ephemeral threads (#9765)
    Add ephemeral threads capabilities. Only exposed through the
    `app-server` v2
    
    The idea is to disable the rollout recorder for those threads.
  • Persist text elements through TUI input and history (#9393)
    Continuation of breaking up this PR
    https://github.com/openai/codex/pull/9116
    
    ## Summary
    - Thread user text element ranges through TUI/TUI2 input, submission,
    queueing, and history so placeholders survive resume/edit flows.
    - Preserve local image attachments alongside text elements and rehydrate
    placeholders when restoring drafts.
    - Keep model-facing content shapes clean by attaching UI metadata only
    to user input/events (no API content changes).
    
    ## Key Changes
    - TUI/TUI2 composer now captures text element ranges, trims them with
    text edits, and restores them when submission is suppressed.
    - User history cells render styled spans for text elements and keep
    local image paths for future rehydration.
    - Initial chat widget bootstraps accept empty `initial_text_elements` to
    keep initialization uniform.
    - Protocol/core helpers updated to tolerate the new InputText field
    shape without changing payloads sent to the API.
  • Feat: request user input tool (#9472)
    ### Summary
    * Add `requestUserInput` tool that the model can use for gather
    feedback/asking question mid turn.
    
    
    ### Tool input schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput input",
      "type": "object",
      "additionalProperties": false,
      "required": ["questions"],
      "properties": {
        "questions": {
          "type": "array",
          "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.",
          "minItems": 1,
          "maxItems": 3,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": ["id", "header", "question"],
            "properties": {
              "id": {
                "type": "string",
                "description": "Stable identifier for mapping answers (snake_case)."
              },
              "header": {
                "type": "string",
                "description": "Short header label shown in the UI (12 or fewer chars)."
              },
              "question": {
                "type": "string",
                "description": "Single-sentence prompt shown to the user."
              },
              "options": {
                "type": "array",
                "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.",
                "minItems": 2,
                "maxItems": 3,
                "items": {
                  "type": "object",
                  "additionalProperties": false,
                  "required": ["value", "label", "description"],
                  "properties": {
                    "value": {
                      "type": "string",
                      "description": "Machine-readable value (snake_case)."
                    },
                    "label": {
                      "type": "string",
                      "description": "User-facing label (1-5 words)."
                    },
                    "description": {
                      "type": "string",
                      "description": "One short sentence explaining impact/tradeoff if selected."
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```
    
    ### Tool output schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput output",
      "type": "object",
      "additionalProperties": false,
      "required": ["answers"],
      "properties": {
        "answers": {
          "type": "object",
          "description": "Map of question id to user answer.",
          "additionalProperties": {
            "type": "object",
            "additionalProperties": false,
            "required": ["selected"],
            "properties": {
              "selected": {
                "type": "array",
                "items": { "type": "string" }
              },
              "other": {
                "type": ["string", "null"]
              }
            }
          }
        }
      }
    }
    ```
  • feat: show forked from session id in /status (#9330)
    Summary:
    - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
    for forked sessions.
    - Surface forked_from in /status for tui + tui2 and add snapshots.
  • Add text element metadata to types (#9235)
    Initial type tweaking PR to make the diff of
    https://github.com/openai/codex/pull/9116 smaller
    
    This should not change any behavior, just adds some fields to types
  • feat: emit events around collab tools (#9095)
    Emit the following events around the collab tools. On the `app-server`
    this will be under `item/started` and `item/completed`
    ```
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentSpawnBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
        /// beginning.
        pub prompt: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentSpawnEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the newly spawned agent, if it was created.
        pub new_thread_id: Option<ThreadId>,
        /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
        /// beginning.
        pub prompt: String,
        /// Last known status of the new agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentInteractionBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
        /// leaking at the beginning.
        pub prompt: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentInteractionEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
        /// leaking at the beginning.
        pub prompt: String,
        /// Last known status of the receiver agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabWaitingBeginEvent {
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// ID of the waiting call.
        pub call_id: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabWaitingEndEvent {
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// ID of the waiting call.
        pub call_id: String,
        /// Last known status of the receiver agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabCloseBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabCloseEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Last known status of the receiver agent reported to the sender agent before
        /// the close.
        pub status: AgentStatus,
    }
    ```
  • feat: add threadId to MCP server messages (#9192)
    This favors `threadId` instead of `conversationId` so we use the same
    terms as https://developers.openai.com/codex/sdk/.
    
    To test the local build:
    
    ```
    cd codex-rs
    cargo build --bin codex
    npx -y @modelcontextprotocol/inspector ./target/debug/codex mcp-server
    ```
    
    I sent:
    
    ```json
    {
      "method": "tools/call",
      "params": {
        "name": "codex",
        "arguments": {
          "prompt": "favorite ls option?"
        },
        "_meta": {
          "progressToken": 0
        }
      }
    }
    ```
    
    and got:
    
    ```json
    {
      "content": [
        {
          "type": "text",
          "text": "`ls -lah` (or `ls -alh`) — long listing, includes dotfiles, human-readable sizes."
        }
      ],
      "structuredContent": {
        "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e"
      }
    }
    ```
    
    and successfully used the `threadId` in the follow-up with the
    `codex-reply` tool call:
    
    ```json
    {
      "method": "tools/call",
      "params": {
        "name": "codex-reply",
        "arguments": {
          "prompt": "what is the long versoin",
          "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e"
        },
        "_meta": {
          "progressToken": 1
        }
      }
    }
    ```
    
    whose response also has the `threadId`:
    
    ```json
    {
      "content": [
        {
          "type": "text",
          "text": "Long listing is `ls -l` (adds permissions, owner/group, size, timestamp)."
        }
      ],
      "structuredContent": {
        "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e"
      }
    }
    ```
    
    Fixes https://github.com/openai/codex/issues/3712.
  • Assemble sandbox/approval/network prompts dynamically (#8961)
    - Add a single builder for developer permissions messaging that accepts
    SandboxPolicy and approval policy. This builder now drives the developer
    “permissions” message that’s injected at session start and any time
    sandbox/approval settings change.
    - Trim EnvironmentContext to only include cwd, writable roots, and
    shell; removed sandbox/approval/network duplication and adjusted XML
    serialization and tests accordingly.
    
    Follow-up: adding a config value to replace the developer permissions
    message for custom sandboxes.
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep-going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To setup remote execution:
    - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cutover to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat(app-server): thread/rollback API (#8454)
    Add `thread/rollback` to app-server to support IDEs undo-ing the last N
    turns of a thread.
    
    For context, an IDE partner will be supporting an "undo" capability
    where the IDE (the app-server client) will be responsible for reverting
    the local changes made during the last turn. To support this well, we
    also need a way to drop the last turn (or more generally, the last N
    turns) from the agent's context. This is what `thread/rollback` does.
    
    **Core idea**: A Thread rollback is represented as a persisted event
    message (EventMsg::ThreadRollback) in the rollout JSONL file, not by
    rewriting history. On resume, both the model's context (core replay) and
    the UI turn list (app-server v2's thread history builder) apply these
    markers so the pruned history is consistent across live conversations
    and `thread/resume`.
    
    Implementation notes:
    - Rollback only affects agent context and appends to the rollout file;
    clients are responsible for reverting files on disk.
    - If a thread rollback is currently in progress, subsequent
    `thread/rollback` calls are rejected.
    - Because we use `CodexConversation::submit` and codex core tracks
    active turns, returning an error on concurrent rollbacks is communicated
    via an `EventMsg::Error` with a new variant
    `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and
    sends the BAD_REQUEST RPC response.
    
    Tests cover thread rollbacks in both core and app-server, including when
    `num_turns` > existing turns (which clears all turns).
    
    **Note**: this explicitly does **not** behave like `/undo` which we just
    removed from the CLI, which does the opposite of what `thread/rollback`
    does. `/undo` reverts local changes via ghost commits/snapshots and does
    not modify the agent's context / conversation history.
  • fix: update model examples to gpt-5.2 (#8566)
    The models are outdated and sometime get used by GPT when it to try
    delegate.
    
    I have read the CLA Document and I hereby sign the CLA
  • feat: expose outputSchema to user_turn/turn_start app_server API (#8377)
    What changed
    - Added `outputSchema` support to the app-server APIs, mirroring `codex
    exec --output-schema` behavior.
    - V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
    assistant message for that turn.
    - V2 `turn/start` now accepts `outputSchema` and constrains the final
    assistant message for that turn (explicitly per-turn only).
    
    Core behavior
    - `Op::UserTurn` already supported `final_output_json_schema`; now V1
    `sendUserTurn` forwards `outputSchema` into that field.
    - `Op::UserInput` now carries `final_output_json_schema` for per-turn
    settings updates; core maps it into
    `SessionSettingsUpdate.final_output_json_schema` so it applies to the
    created turn context.
    - V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
    (it’s applied only for the current turn). Other overrides
    (cwd/model/etc) keep their existing persistent behavior.
    
    API / docs
    - `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
    Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
    `outputSchema`).
    - `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
    Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
    - `codex-rs/app-server/README.md`: document `outputSchema` for
    `turn/start` and clarify it applies only to the current turn.
    - `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
    `sendUserTurn` and v2 `turn/start`.
    
    Tests added/updated
    - New app-server integration tests asserting `outputSchema` is forwarded
    into outbound `/responses` requests as `text.format`:
      - `codex-rs/app-server/tests/suite/output_schema.rs`
      - `codex-rs/app-server/tests/suite/v2/output_schema.rs`
    - Added per-turn semantics tests (schema does not leak to the next
    turn):
      - `send_user_turn_output_schema_is_per_turn_v1`
      - `turn_start_output_schema_is_per_turn_v2`
    - Added protocol wire-compat tests for the merged op:
      - serialize omits `final_output_json_schema` when `None`
      - deserialize works when field is missing
      - serialize includes `final_output_json_schema` when `Some(schema)`
    
    Call site updates (high level)
    - Updated all `Op::UserInput { .. }` constructions to include
    `final_output_json_schema`:
      - `codex-rs/app-server/src/codex_message_processor.rs`
      - `codex-rs/core/src/codex_delegate.rs`
      - `codex-rs/mcp-server/src/codex_tool_runner.rs`
      - `codex-rs/tui/src/chatwidget.rs`
      - `codex-rs/tui2/src/chatwidget.rs`
      - plus impacted core tests.
    
    Validation
    - `just fmt`
    - `cargo test -p codex-core`
    - `cargo test -p codex-app-server`
    - `cargo test -p codex-mcp-server`
    - `cargo test -p codex-tui`
    - `cargo test -p codex-tui2`
    - `cargo test -p codex-protocol`
    - `cargo clippy --all-features --tests --profile dev --fix -- -D
    warnings`
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • chore: cleanup Config instantiation codepaths (#8226)
    This PR does various types of cleanup before I can proceed with more
    ambitious changes to config loading.
    
    First, I noticed duplicated code across these two methods:
    
    
    https://github.com/openai/codex/blob/774bd9e432fa2e0f4e059e97648cf92216912e19/codex-rs/core/src/config/mod.rs#L314-L324
    
    
    https://github.com/openai/codex/blob/774bd9e432fa2e0f4e059e97648cf92216912e19/codex-rs/core/src/config/mod.rs#L334-L344
    
    This has now been consolidated in
    `load_config_as_toml_with_cli_overrides()`.
    
    Further, I noticed that `Config::load_with_cli_overrides()` took two
    similar arguments:
    
    
    https://github.com/openai/codex/blob/774bd9e432fa2e0f4e059e97648cf92216912e19/codex-rs/core/src/config/mod.rs#L308-L311
    
    The difference between `cli_overrides` and `overrides` was not
    immediately obvious to me. At first glance, it appears that one should
    be able to be expressed in terms of the other, but it turns out that
    some fields of `ConfigOverrides` (such as `cwd` and
    `codex_linux_sandbox_exe`) are, by design, not configurable via a
    `.toml` file or a command-line `--config` flag.
    
    That said, I discovered that many callers of
    `Config::load_with_cli_overrides()` were passing
    `ConfigOverrides::default()` for `overrides`, so I created two separate
    methods:
    
    - `Config::load_with_cli_overrides(cli_overrides: Vec<(String,
    TomlValue)>)`
    - `Config::load_with_cli_overrides_and_harness_overrides(cli_overrides:
    Vec<(String, TomlValue)>, harness_overrides: ConfigOverrides)`
    
    The latter has a long name, as it is _not_ what should be used in the
    common case, so the extra typing is designed to draw attention to this
    fact. I tried to update the existing callsites to use the shorter name,
    where possible.
    
    Further, in the cases where `ConfigOverrides` is used, usually only a
    limited subset of fields are actually set, so I updated the declarations
    to leverage `..Default::default()` where possible.
  • Add public skills + improve repo skill discovery and error UX (#8098)
    1. Adds SkillScope::Public end-to-end (core + protocol) and loads skills
    from the public cache directory
    2. Improves repo skill discovery by searching upward for the nearest
    .codex/skills within a git repo
    3. Deduplicates skills by name with deterministic ordering to avoid
    duplicates across sources
    4. Fixes garbled “Skill errors” overlay rendering by preventing pending
    history lines from being injected during the modal
    5. Updates the project docs “Skills” intro wording to avoid hardcoded
    paths
  • Reimplement skills loading using SkillsManager + skills/list op. (#7914)
    refactor the way we load and manage skills:
    1. Move skill discovery/caching into SkillsManager and reuse it across
    sessions.
    2. Add the skills/list API (Op::ListSkills/SkillsListResponse) to fetch
    skills for one or more cwds. Also update app-server for VSCE/App;
    3. Trigger skills/list during session startup so UIs preload skills and
    handle errors immediately.
  • Inject SKILL.md when it's explicitly mentioned. (#7763)
    1. Skills load once in core at session start; the cached outcome is
    reused across core and surfaced to TUI via SessionConfigured.
    2. TUI detects explicit skill selections, and core injects the matching
    SKILL.md content into the turn when a selected skill is present.
  • Removed experimental "command risk assessment" feature (#7799)
    This experimental feature received lukewarm reception during internal
    testing. Removing from the code base.
  • Refactor execpolicy fallback evaluation (#7544)
    ## Refactor of the `execpolicy` crate
    
    To illustrate why we need this refactor, consider an agent attempting to
    run `apple | rm -rf ./`. Suppose `apple` is allowed by `execpolicy`.
    Before this PR, `execpolicy` would consider `apple` and `pear` and only
    render one rule match: `Allow`. We would skip any heuristics checks on
    `rm -rf ./` and immediately approve `apple | rm -rf ./` to run.
    
    To fix this, we now thread a `fallback` evaluation function into
    `execpolicy` that runs when no `execpolicy` rules match a given command.
    In our example, we would run `fallback` on `rm -rf ./` and prevent
    `apple | rm -rf ./` from being run without approval.
  • whitelist command prefix integration in core and tui (#7033)
    this PR enables TUI to approve commands and add their prefixes to an
    allowlist:
    <img width="708" height="605" alt="Screenshot 2025-11-21 at 4 18 07 PM"
    src="https://github.com/user-attachments/assets/56a19893-4553-4770-a881-becf79eeda32"
    />
    
    note: we only show the option to whitelist the command when 
    1) command is not multi-part (e.g `git add -A && git commit -m 'hello
    world'`)
    2) command is not already matched by an existing rule
  • Migrate model preset (#7542)
    - Introduce `openai_models` in `/core`
    - Move `PRESETS` under it
    - Move `ModelPreset`, `ModelUpgrade`, `ReasoningEffortPreset`,
    `ReasoningEffortPreset`, and `ReasoningEffortPreset` to `protocol`
    - Introduce `Op::ListModels` and `EventMsg::AvailableModels`
    
    Next steps:
    - migrate `app-server` and `tui` to use the introduced Operation
  • chore: add cargo-deny configuration (#7119)
    - add GitHub workflow running cargo-deny on push/PR
    - document cargo-deny allowlist with workspace-dep notes and advisory
    ignores
    - align workspace crates to inherit version/edition/license for
    consistent checks
  • support MCP elicitations (#6947)
    No support for request schema yet, but we'll at least show the message
    and allow accept/decline.
    
    <img width="823" height="551" alt="Screenshot 2025-11-21 at 2 44 05 PM"
    src="https://github.com/user-attachments/assets/6fbb892d-ca12-4765-921e-9ac4b217534d"
    />
  • [app-server] feat: v2 apply_patch approval flow (#6760)
    This PR adds the API V2 version of the apply_patch approval flow, which
    centers around `ThreadItem::FileChange`.
    
    This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only)
    and related events (`item/started`, `item/completed` for
    `ThreadItem::FileChange`, which are emitted in both V1 and V2) through
    the app-server
    protocol. The new approval RPC is only sent when the user initiates a
    turn with the new `turn/start` API so we don't break backwards
    compatibility with VSCE.
    
    Similar to https://github.com/openai/codex/pull/6758, the approach I
    took was to make as few changes to the Codex core as possible,
    leveraging existing `EventMsg` core events, and translating those in
    app-server. I did have to add a few additional fields to
    `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those
    were fairly lightweight.
    
    However, the `EventMsg`s emitted by core are the following:
    ```
    1) Auto-approved (no request for approval)

    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    
    2) Approved by user
    - EventMsg::ApplyPatchApprovalRequest
    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    
    3) Declined by user
    - EventMsg::ApplyPatchApprovalRequest
    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    ```
    
    For a request triggering an approval, this would result in:
    ```
    item/fileChange/requestApproval
    item/started
    item/completed
    ```
    
    which is different from the `ThreadItem::CommandExecution` flow
    introduced in https://github.com/openai/codex/pull/6758, which does the
    below and is preferable:
    ```
    item/started
    item/commandExecution/requestApproval
    item/completed
    ```
    
    To fix this, we leverage `TurnSummaryStore` on codex_message_processor
    to store a little bit of state, allowing us to fire `item/started` and
    `item/fileChange/requestApproval` whenever we receive the underlying
    `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the
    `EventMsg::PatchApplyBegin` later.
    
    This is much less invasive than modifying the order of EventMsg within
    core (I tried).
    
    The resulting payloads:
    ```
    {
      "method": "item/started",
      "params": {
        "item": {
          "changes": [
            {
              "diff": "Hello from Codex!\n",
              "kind": "add",
              "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
            }
          ],
          "id": "call_Nxnwj7B3YXigfV6Mwh03d686",
          "status": "inProgress",
          "type": "fileChange"
        }
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "method": "item/fileChange/requestApproval",
      "params": {
        "grantRoot": null,
        "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686",
        "reason": null,
        "threadId": "019a9e11-8295-7883-a283-779e06502c6f",
        "turnId": "1"
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "result": {
        "decision": "accept"
      }
    }
    ```
    
    ```
    {
      "method": "item/completed",
      "params": {
        "item": {
          "changes": [
            {
              "diff": "Hello from Codex!\n",
              "kind": "add",
              "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
            }
          ],
          "id": "call_Nxnwj7B3YXigfV6Mwh03d686",
          "status": "completed",
          "type": "fileChange"
        }
      }
    }
    ```
  • fix: add more fields to ThreadStartResponse and ThreadResumeResponse (#6847)
    This adds the following fields to `ThreadStartResponse` and
    `ThreadResumeResponse`:
    
    ```rust
        pub model: String,
        pub model_provider: String,
        pub cwd: PathBuf,
        pub approval_policy: AskForApproval,
        pub sandbox: SandboxPolicy,
        pub reasoning_effort: Option<ReasoningEffort>,
    ```
    
    This is important because these fields are optional in
    `ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be
    able to determine what values were ultimately used to start/resume the
    conversation. (Though note that any of these could be changed later
    between turns in the conversation.)
    
    Though to get this information reliably, it must be read from the
    internal `SessionConfiguredEvent` that is created in response to the
    start of a conversation. Because `SessionConfiguredEvent` (as defined in
    `codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a
    number of them had to be added as part of this PR.
    
    Because `SessionConfiguredEvent` is referenced in many tests, test
    instances of `SessionConfiguredEvent` had to be updated, as well, which
    is why this PR touches so many files.