34 Commits

  • Support openai/form extended form elicitations (#27500)
    # Summary
    Allow App Server clients to opt into `openai/form` MCP elicitations.
  • [codex] request desktop attestation from app (#20619)
    ## Summary
    
    TL;DR: teaches `codex-rs` / app-server to request a desktop-provided
    attestation token and attach it as `x-oai-attestation` on the scoped
    ChatGPT Codex request paths.
    
    ![DeviceCheck attestation
    interface](https://raw.githubusercontent.com/openai/codex/dev/jm/devicecheck-diagram-assets/pr-assets/devicecheck-attestation-interface.png)
    
    ## Details
    
    This PR teaches the Codex app-server runtime how to request and attach
    an attestation token. It does not generate DeviceCheck tokens directly;
    instead, it relies on the connected desktop app to advertise that it can
    generate attestation and then asks that app for a fresh header value
    when needed.
    
    The flow is:
    
    1. The Codex desktop app connects to app-server.
    2. During `initialize`, the app can advertise that it supports
    `requestAttestation`.
    3. Before app-server calls selected ChatGPT Codex endpoints, it sends
    the internal server request `attestation/generate` to the app.
    4. app-server receives a pre-encoded header value back.
    5. app-server forwards that value as `x-oai-attestation` on the scoped
    outbound requests.
    
    The code in this repo is mostly protocol and runtime plumbing: it adds
    the app-server request/response shape, introduces an attestation
    provider in core, wires that provider into Responses / compaction /
    realtime setup paths, and covers the intended scoping with tests. The
    signed macOS DeviceCheck generation remains owned by the desktop app PR.
    
    ## Related PR
    
    - Codex desktop app implementation:
    https://github.com/openai/openai/pull/878649
    
    ## Validation
    
    <details>
    <summary>Tests run</summary>
    
    ```sh
    cargo test -p codex-app-server-protocol
    cargo test -p codex-core attestation --lib
    cargo test -p codex-app-server --lib attestation
    ```
    
    Also ran:
    
    ```sh
    just fix -p codex-core
    just fix -p codex-app-server
    just fix -p codex-app-server-protocol
    just fmt
    just write-app-server-schema
    ```
    
    </details>
    
    <details>
    <summary>E2E DeviceCheck validation</summary>
    
    First validated the signed desktop app boundary directly: launched a
    packaged signed `Codex.app`, sent `attestation/generate`, decoded the
    returned `v1.` attestation header, and validated the extracted
    DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using
    bundle ID `com.openai.codex`. Apple returned `status_code: 200` and
    `is_ok: true`.
    
    Then ran the fuller app + app-server flow. The packaged `Codex.app`
    launched a current-branch app-server via `CODEX_CLI_PATH`, and a local
    MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server
    requested `attestation/generate` from the real Electron app process, and
    the intercepted `/backend-api/codex/responses` traffic included
    `x-oai-attestation` on both routes:
    
    ```text
    GET  /backend-api/codex/responses  Upgrade: websocket  x-oai-attestation: present
    POST /backend-api/codex/responses  Upgrade: none       x-oai-attestation: present
    ```
    
    The captured header decoded to a DeviceCheck token that also validated
    with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`,
    team `2DC432GLL2`).
    
    </details>
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: Return codex home in initialize response (#15689)
    This allows clients to get enough information to interact with the codex
    skills/configuration/etc.
  • app-server: Add platform os and family to init response (#14527)
    This allows the client to pick os-specific behavior while interacting
    with the app server, e.g. to use proper path separators.
  • chore(app-server): stop emitting codex/event/ notifications (#14392)
    ## Description
    
    This PR stops emitting legacy `codex/event/*` notifications from the
    public app-server transports.
    
    It's been a long time coming! app-server was still producing a raw
    notification stream from core, alongside the typed app-server
    notifications and server requests, for compatibility reasons. Now,
    external clients should no longer be depending on those legacy
    notifications, so this change removes them from the stdio and websocket
    contract and updates the surrounding docs, examples, and tests to match.
    
    ### Caveat
    I left the "in-process" version of app-server alone for now, since
    `codex exec` was recently based on top of app-server via this in-process
    form here: https://github.com/openai/codex/pull/14005
    
    Seems like `codex exec` still consumes some legacy notifications
    internally, so this branch only removes `codex/event/*` from app-server
    over stdio and websockets.
    
    ## Follow-up
    
    Once `codex exec` is fully migrated off `codex/event/*` notifications,
    we'll be able to stop emitting them entirely entirely instead of just
    filtering it at the external transport boundary.
  • ignore v1 in JSON schema codegen (#12408)
    ## Why
    
    The generated unnamespaced JSON envelope schemas (`ClientRequest` and
    `ServerNotification`) still contained both v1 and v2 variants, which
    pulled legacy v1/core types and v2 types into the same `definitions`
    graph. That caused `schemars` to produce numeric suffix names (for
    example `AskForApproval2`, `ByteRange2`, `MessagePhase2`).
    
    This PR moves JSON codegen toward v2-only output while preserving the
    unnamespaced envelope artifacts, and avoids reintroducing numeric-suffix
    tolerance by removing the v1/internal-only variants that caused the
    collisions in those envelope schemas.
    
    ## What Changed
    
    - In `codex-rs/app-server-protocol/src/export.rs`, JSON generation now
    excludes v1 schema artifacts (`v1/*`) while continuing to emit
    unnamespaced/root JSON schemas and the JSON bundle.
    - Added a narrow JSON v1 allowlist (`JSON_V1_ALLOWLIST`) so
    `InitializeParams` and `InitializeResponse` are still emitted.
    - Added JSON-only post-processing for the mixed envelope schemas before
    collision checks run:
    - `ClientRequest`: strips v1 request variants from the generated `oneOf`
    using the temporary `V1_CLIENT_REQUEST_METHODS` list
    - `ServerNotification`: strips v1 notifications plus the internal-only
    `rawResponseItem/completed` notification using the temporary
    `EXCLUDED_SERVER_NOTIFICATION_METHODS_FOR_JSON` list
    - Added a temporary local-definition pruning pass for those envelope
    schemas so now-unreferenced v1/core definitions are removed from
    `definitions` after method filtering.
    - Updated the variant-title naming heuristic for single-property literal
    object variants to use the literal value (when available), avoiding
    collisions like multiple `state`-only variants all deriving the same
    title.
    - Collision handling remains fail-fast (no numeric suffix fallback map
    in this PR path).
    
    ## Verification
    
    - `just write-app-server-schema`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12408).
    * __->__ #12408
    * #12406
  • Wire realtime api to core (#12268)
    - Introduce `RealtimeConversationManager` for realtime API management 
    - Add `op::conversation` to start conversation, insert audio, insert
    text, and close conversation.
    - emit conversation lifecycle and realtime events.
    - Move shared realtime payload types into codex-protocol and add core
    e2e websocket tests for start/replace/transport-close paths.
    
    Things to consider:
    - Should we use the same `op::` and `Events` channel to carry audio? I
    think we should try this simple approach and later we can create
    separate one if the channels got congested.
    - Sending text updates to the client: we can start simple and later
    restrict that.
    - Provider auth isn't wired for now intentionally
  • feat: cleaner TUI for sub-agents (#12327)
    <img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
    src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
    />
  • feat: add nick name to sub-agents (#12320)
    Adding random nick name to sub-agents. Used for UX
    
    At the same time, also storing and wiring the role of the sub-agent
  • feat: add Reject approval policy with granular prompt rejection controls (#12087)
    ## Why
    
    We need a way to auto-reject specific approval prompt categories without
    switching all approvals off.
    
    The goal is to let users independently control:
    - sandbox escalation approvals,
    - execpolicy `prompt` rule approvals,
    - MCP elicitation prompts.
    
    ## What changed
    
    - Added a new primary approval mode in `protocol/src/protocol.rs`:
    
    ```rust
    pub enum AskForApproval {
        // ...
        Reject(RejectConfig),
        // ...
    }
    
    pub struct RejectConfig {
        pub sandbox_approval: bool,
        pub rules: bool,
        pub mcp_elicitations: bool,
    }
    ```
    
    - Wired `RejectConfig` semantics through approval paths in `core`:
      - `core/src/exec_policy.rs`
        - rejects rule-driven prompts when `rules = true`
        - rejects sandbox/escalation prompts when `sandbox_approval = true`
    - preserves rule priority when both rule and sandbox prompt conditions
    are present
      - `core/src/tools/sandboxing.rs`
    - applies `sandbox_approval` to default exec approval decisions and
    sandbox-failure retry gating
      - `core/src/safety.rs`
    - keeps `Reject { all false }` behavior aligned with `OnRequest` for
    patch safety
        - rejects out-of-root patch approvals when `sandbox_approval = true`
      - `core/src/mcp_connection_manager.rs`
        - auto-declines MCP elicitations when `mcp_elicitations = true`
    
    - Ensured approval policy used by MCP elicitation flow stays in sync
    with constrained session policy updates.
    
    - Updated app-server v2 conversions and generated schema/TypeScript
    artifacts for the new `Reject` shape.
    
    ## Verification
    
    Added focused unit coverage for the new behavior in:
    - `core/src/exec_policy.rs`
    - `core/src/tools/sandboxing.rs`
    - `core/src/mcp_connection_manager.rs`
    - `core/src/safety.rs`
    - `core/src/tools/runtimes/apply_patch.rs`
    
    Key cases covered include rule-vs-sandbox prompt precedence, MCP
    auto-decline behavior, and patch/sandbox retry behavior under
    `RejectConfig`.
  • feat(core): plumb distinct approval ids for command approvals (#12051)
    zsh fork PR stack:
    - https://github.com/openai/codex/pull/12051 👈 
    - https://github.com/openai/codex/pull/12052
    
    With upcoming support for a fork of zsh that allows us to intercept
    `execve` and run execpolicy checks for each subcommand as part of a
    `CommandExecution`, it will be possible for there to be multiple
    approval requests for a shell command like `/path/to/zsh -lc 'git status
    && rg \"TODO\" src && make test'`.
    
    To support that, this PR introduces a new `approval_id` field across
    core, protocol, and app-server so that we can associate approvals
    properly for subcommands.
  • Feat: add model reroute notification (#12001)
    ### Summary
    Builiding off
    https://github.com/openai/codex/pull/11964/files/5c75aa7b89a70bc2cc410a6fd238749306ec4c5e#diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0,
    we have created a `model/rerouted` notification that captures the event
    so that consumers can render as expected. Keep the `EventMsg::Warning`
    path in core so that this does not affect TUI rendering.
    
    `model/rerouted` is meant to be generic to account for future usage
    including capacity planning etc.
  • feat(core): add structured network approval plumbing and policy decision model (#11672)
    ### Description
    #### Summary
    Introduces the core plumbing required for structured network approvals
    
    #### What changed
    - Added structured network policy decision modeling in core.
    - Added approval payload/context types needed for network approval
    semantics.
    - Wired shell/unified-exec runtime plumbing to consume structured
    decisions.
    - Updated related core error/event surfaces for structured handling.
    - Updated protocol plumbing used by core approval flow.
    - Included small CLI debug sandbox compatibility updates needed by this
    layer.
    
    #### Why
    establishes the minimal backend foundation for network approvals without
    yet changing high-level orchestration or TUI behavior.
    
    #### Notes
    - Behavior remains constrained by existing requirements/config gating.
    - Follow-up PRs in the stack handle orchestration, UX, and app-server
    integration.
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • chore(core) Deprecate approval_policy: on-failure (#11631)
    ## Summary
    In an effort to start simplifying our sandbox setup, we're announcing
    this approval_policy as deprecated. In general, it performs worse than
    `on-request`, and we're focusing on making fewer sandbox configurations
    perform much better.
    
    ## Testing
    - [x] Tested locally
    - [x] Existing tests pass
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_full_history` flag to preserve
    those events when you start/resume/fork a thread (defaults to `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
    
    And we also persist `EventMsg::Error` which we can now map back to the
    Turn's status and populates the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (similar for apply_patch). These
    EventMsgs were never persisted before so I made it a required field.
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • change model cap to server overload (#11388)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • feat: support multiple rate limits (#11260)
    Added multi-limit support end-to-end by carrying limit_name in
    rate-limit snapshots and handling multiple buckets instead of only
    codex.
    Extended /usage client parsing to consume additional_rate_limits
    Updated TUI /status and in-memory state to store/render per-limit
    snapshots
    Extended app-server rate-limit read response: kept rate_limits and added
    rate_limits_by_name.
    Adjusted usage-limit error messaging for non-default codex limit buckets
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn id is constructed in-memory;
    2. on resuming threads, turn_id might not be unique;
    3. client cannot no the boundary of a turn from rollout files easily.
    
    This PR does three things:
    1. persist `task_started` and `task_complete` events;
    1. persist `turn_id` in rollout turn events;
    5. generate turn_id as unique uuids instead of incrementing it in
    memory.
    
    This helps us resolve the issue of clients wanting to have unique turn
    ids for resuming a thread, and knowing the boundry of each turn in
    rollout files.
    
    example debug logs
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    backward compatibility:
    if you try to resume an old session without task_started and
    task_complete event populated, the following happens:
    - If you resume and do nothing: those reconstructed historical IDs can
    differ next time you resume.
    - If you resume and send a new turn: the new turn gets a fresh UUID from
    live submission flow and is persisted, so that new turn’s ID is stable
    on later resumes.
    I think this behavior is fine, because we only care about deterministic
    turn id once a turn is triggered.
  • feat: opt-out of events in the app-server (#11319)
    Add `optOutNotificationMethods` in the app-server to opt-out events
    based on exact method matching
  • feat: retain NetworkProxy, when appropriate (#11207)
    As of this PR, `SessionServices` retains a
    `Option<StartedNetworkProxy>`, if appropriate.
    
    Now the `network` field on `Config` is `Option<NetworkProxySpec>`
    instead of `Option<NetworkProxy>`.
    
    Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
    create the `StartedNetworkProxy`, which is a new struct that retains the
    `NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
    implemented for `NetworkProxyHandle` to ensure the proxies are shutdown
    when it is dropped.)
    
    The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
    the appropriate places.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
    * #11285
    * __->__ #11207
  • fix(tui): keep unified exec summary on working line (#10962)
    ## Problem
    When unified-exec background sessions appear while the status indicator
    is visible, the bottom pane can grow by one row to show a dedicated
    footer line. That row insertion/removal makes the composer jump
    vertically and produces visible jitter/flicker during streaming turns.
    
    ## Mental model
    The bottom pane should expose one canonical background-exec summary
    string, but it should surface that string in only one place at a time:
    - if the status indicator row is visible, show the summary inline on
    that row;
    - if the status indicator row is hidden, show the summary as the
    standalone unified-exec footer row.
    
    This keeps status information visible while preserving a stable pane
    height.
    
    ## Non-goals
    This change does not alter unified-exec lifecycle, process tracking, or
    `/ps` behavior. It does not redesign status text copy, spinner timing,
    or interrupt handling semantics.
    
    ## Tradeoffs
    Inlining the summary preserves layout stability and keeps interrupt
    affordances in a fixed location, but it reduces horizontal space for
    long status/detail text in narrow terminals. We accept that truncation
    risk in exchange for removing vertical jitter and keeping the composer
    anchored.
    
    ## Architecture
    `UnifiedExecFooter` remains the source of truth for background-process
    summary copy via `summary_text()`. `BottomPane` mirrors that text into
    `StatusIndicatorWidget::update_inline_message()` whenever process state
    changes or a status widget is created. Rendering enforces single-surface
    output: the standalone footer row is skipped while status is present,
    and the status row appends the summary after the elapsed/interrupt
    segment.
    
    ## Documentation pass
    Added non-functional docs/comments that make the new invariant explicit:
    - status row owns inline summary when present;
    - unified-exec footer row renders only when status row is absent;
    - summary ordering keeps elapsed/interrupt affordance in a stable
    position.
    
    ## Observability
    No new telemetry or logs are introduced. The behavior is traceable
    through:
    - `BottomPane::set_unified_exec_processes()` for state updates,
    - `BottomPane::sync_status_inline_message()` for status-row
    synchronization,
    - `StatusIndicatorWidget::render()` for final inline ordering.
    
    ## Tests
    - Added
    `bottom_pane::tests::unified_exec_summary_does_not_increase_height_when_status_visible`
    to lock the no-height-growth invariant.
    - Updated the unified-exec status restoration snapshot to match inline
    rendering order.
    - Validated with:
      - `just fmt`
      - `cargo test -p codex-tui --lib`
    
    ---------
    
    Co-authored-by: Sayan Sisodiya <sayan@openai.com>
  • Add resume_agent collab tool (#10903)
    Summary
    - add the new resume_agent collab tool path through core, protocol, and
    the app server API, including the resume events
    - update the schema/TypeScript definitions plus docs so resume_agent
    appears in generated artifacts and README
    - note that resumed agents rehydrate rollout history without overwriting
    their base instructions
    
    Testing
    - Not run (not requested)
  • fix(tui): conditionally restore status indicator using message phase (#10947)
    TLDR: use new message phase field emitted by preamble-supported models
    to determine whether an AgentMessage is mid-turn commentary. if so,
    restore the status indicator afterwards to indicate the turn has not
    completed.
    
    ### Problem
    `commit_tick` hides the status indicator while streaming assistant text.
    For preamble-capable models, that text can be commentary mid-turn, so
    hiding was correct during streaming but restore timing mattered:
    - restoring too aggressively caused jitter/flashing
    - not restoring caused indicator to stay hidden before subsequent work
    (tool calls, web search, etc.)
    
    ### Fix
    - Add optional `phase` to `AgentMessageItem` and propagate it from
    `ResponseItem::Message`
    - Keep indicator hidden during streamed commit ticks, restore only when:
      - assistant item completes as `phase=commentary`, and
      - stream queues are idle + task is still running.
    - Treat `phase=None` as final-answer behavior (no restore) to keep
    existing behavior for non-preamble models
    
    ### Tests
    Add/update tests for:
    - no idle-tick restore without commentary completion
    - commentary completion restoring status before tool begin
    - snapshot coverage for preamble/status behavior
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567)
    Took over the work that @aaronl-openai started here:
    https://github.com/openai/codex/pull/10397
    
    Now that app-server clients are able to set up custom tools (called
    `dynamic_tools` in app-server), we should expose a way for clients to
    pass in not just text, but also image outputs. This is something the
    Responses API already supports for function call outputs, where you can
    pass in either a string or an array of content outputs (text, image,
    file):
    https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image
    
    So let's just plumb it through in Codex (with the caveat that we only
    support text and image for now). This is implemented end-to-end across
    app-server v2 protocol types and core tool handling.
    
    ## Breaking API change
    NOTE: This introduces a breaking change with dynamic tools, but I think
    it's ok since this concept was only recently introduced
    (https://github.com/openai/codex/pull/9539) and it's better to get the
    API contract correct. I don't think there are any real consumers of this
    yet (not even the Codex App).
    
    Old shape:
    `{ "output": "dynamic-ok", "success": true }`
    
    New shape:
    ```
    {
        "contentItems": [
          { "type": "inputText", "text": "dynamic-ok" },
          { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
        ]
      "success": true
    }
    ```
  • feat: add APIs to list and download public remote skills (#10448)
    Add API to list / download from remote public skills
  • Cleanup collaboration mode variants (#10404)
    ## Summary
    
    This PR simplifies collaboration modes to the visible set `default |
    plan`, while preserving backward compatibility for older partners that
    may still send legacy mode
    names.
    
    Specifically:
    - Renames the old Code behavior to **Default**.
    - Keeps **Plan** as-is.
    - Removes **Custom** mode behavior (fallbacks now resolve to Default).
    - Keeps `PairProgramming` and `Execute` internally for compatibility
    plumbing, while removing them from schema/API and UI visibility.
    - Adds legacy input aliasing so older clients can still send old mode
    names.
    
    ## What Changed
    
    1. Mode enum and compatibility
    - `ModeKind` now uses `Plan` + `Default` as active/public modes.
    - `ModeKind::Default` deserialization accepts legacy values:
      - `code`
      - `pair_programming`
      - `execute`
      - `custom`
    - `PairProgramming` and `Execute` variants remain in code but are hidden
    from protocol/schema generation.
    - `Custom` variant is removed; previous custom fallbacks now map to
    `Default`.
    
    2. Collaboration presets and templates
    - Built-in presets now return only:
      - `Plan`
      - `Default`
    - Template rename:
      - `core/templates/collaboration_mode/code.md` -> `default.md`
    - `execute.md` and `pair_programming.md` remain on disk but are not
    surfaced in visible preset lists.
    
    3. TUI updates
    - Updated user-facing naming and prompts from “Code” to “Default”.
    - Updated mode-cycle and indicator behavior to reflect only visible
    `Plan` and `Default`.
    - Updated corresponding tests and snapshots.
    
    4. request_user_input behavior
    - `request_user_input` remains allowed only in `Plan` mode.
    - Rejection messaging now consistently treats non-plan modes as
    `Default`.
    
    5. Schemas
    - Regenerated config and app-server schemas.
    - Public schema types now advertise mode values as:
      - `plan`
      - `default`
    
    ## Backward Compatibility Notes
    
    - Incoming legacy mode names (`code`, `pair_programming`, `execute`,
    `custom`) are accepted and coerced to `default`.
    - Outgoing/public schema surfaces intentionally expose only `plan |
    default`.
    - This allows tolerant ingestion of older partner payloads while
    standardizing new integrations on the reduced mode set.
    
    ## Codex author
    `codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`
  • chore: add phase to message responseitem (#10455)
    ### What
    
    add wiring for `phase` field on `ResponseItem::Message` to lay
    groundwork for differentiating model preambles and final messages.
    currently optional.
    
    follows pattern in #9698.
    
    updated schemas with `just write-app-server-schema` so we can see type
    changes.
    
    ### Tests
    Updated existing tests for SSE parsing and hydrating from history
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • feat: experimental flags (#10231)
    ## Problem being solved
    - We need a single, reliable way to mark app-server API surface as
    experimental so that:
      1. the runtime can reject experimental usage unless the client opts in
    2. generated TS/JSON schemas can exclude experimental methods/fields for
    stable clients.
    
    Right now that’s easy to drift or miss when done ad-hoc.
    
    ## How to declare experimental methods and fields
    - **Experimental method**: add `#[experimental("method/name")]` to the
    `ClientRequest` variant in `client_request_definitions!`.
    - **Experimental field**: on the params struct, derive `ExperimentalApi`
    and annotate the field with `#[experimental("method/name.field")]` + set
    `inspect_params: true` for the method variant so
    `ClientRequest::experimental_reason()` inspects params for experimental
    fields.
    
    ## How the macro solves it
    - The new derive macro lives in
    `codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
    `#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
    attributes.
    - **Structs**:
    - Generates `ExperimentalApi::experimental_reason(&self)` that checks
    only annotated fields.
      - The “presence” check is type-aware:
        - `Option<T>`: `is_some_and(...)` recursively checks inner.
        - `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
        - `bool`: must be `true`.
        - Other types: considered present (returns `true`).
    - Registers each experimental field in an `inventory` with `(type_name,
    serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
    that type. Field names are converted from `snake_case` to `camelCase`
    for schema/TS filtering.
    - **Enums**:
    - Generates an exhaustive `match` returning `Some(reason)` for annotated
    variants and `None` otherwise (no wildcard arm).
    - **Wiring**:
    - Runtime gating uses `ExperimentalApi::experimental_reason()` in
    `codex-rs/app-server/src/message_processor.rs` to reject requests unless
    `InitializeParams.capabilities.experimental_api == true`.
    - Schema/TS export filters use the inventory list and
    `EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
    strip experimental methods/fields when `experimental_api` is false.
  • feat: vendor app-server protocol schema fixtures (#10371)
    Similar to what @sayan-oai did in openai/codex#8956 for
    `config.schema.json`, this PR updates the repo so that it includes the
    output of `codex app-server generate-json-schema` and `codex app-server
    generate-ts` and adds a test to verify it is in sync with the current
    code.
    
    Motivation:
    - This makes any schema changes introduced by a PR transparent during
    code review.
    - In particular, this should help us catch PRs that would introduce a
    non-backwards-compatible change to the app schema (eventually, this
    should also be enforced by tooling).
    - Once https://github.com/openai/codex/pull/10231 is in to formalize the
    notion of "experimental" fields, we can work on ensuring the
    non-experimental bits are backwards-compatible.
    
    `codex-rs/app-server-protocol/tests/schema_fixtures.rs` was added as the
    test and `just write-app-server-schema` can be use to generate the
    vendored schema files.
    
    Incidentally, when I run:
    
    ```
    rg _ codex-rs/app-server-protocol/schema/typescript/v2
    ```
    
    I see a number of `snake_case` names that should be `camelCase`.