19 Commits

  • Use ApiPathString in app-server filesystem permission paths (#28367)
    ## Why
    
    Clients running an app-server on one OS and an exec-server on another OS
    need to be able to pass sandbox config to app-server that refers to
    resources on the executor's foreign OS.
    
    ## What
    
    `AbsolutePathBuf` can't represent these paths and we don't want users to
    be exposed to `PathUri` yet, so this moves the public app-server API to
    be expressed in terms of `ApiPathString`.
    
    Stacked on #28165.
    
    - change app-server v2 filesystem permission paths, including legacy
    read/write roots, to `ApiPathString`
    - localize API paths through `PathUri` when converting into the current
    native core permission types
    - make path-bearing permission conversions fallible and surface
    localization failures instead of silently treating malformed grants as
    ordinary denials
    - propagate conversion failures through app-server and TUI approval
    handling
    - regenerate the app-server JSON and TypeScript schemas
    - leave migration TODOs on native-path conversions so they can be
    removed once core permission paths use `PathUri`
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` has entails both running standard Rust tests and doctests.
    It turns out that the doctest discovery is fairly slow, and it's a cost
    you pay even for crates that don't include any doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    >4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, with >2x speedup, before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • Refactor config loading to use filesystem abstraction (#18209)
    Initial pass propagating FileSystem through config loading.
  • fix(guardian): fix ordering of guardian events (#16462)
    Guardian events were emitted a bit out of order for CommandExecution
    items. This would make it hard for the frontend to render a guardian
    auto-review, which has this payload:
    ```
    pub struct ItemGuardianApprovalReviewStartedNotification {
        pub thread_id: String,
        pub turn_id: String,
        pub target_item_id: String,
        pub review: GuardianApprovalReview,
        // FYI this is no longer a json blob
        pub action: Option<JsonValue>,
    }
    ```
    
    There is a `target_item_id` the auto-approval review is referring to,
    but the actual item had not been emitted yet.
    
    Before this PR:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`, and if approved...
    - `item/started`
    - `item/completed`
    
    After this PR:
    - `item/started`
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    - `item/completed`
    
    This lines up much better with existing patterns (i.e. human review in
    `Default mode`, where app-server would send a server request to prompt
    for user approval after `item/started`), and makes it easier for clients
    to render what guardian is actually reviewing.
    
    We do this following a similar pattern as `FileChange` (aka apply patch)
    items, where we create a FileChange item and emit `item/started` if we
    see the apply patch approval request, before the actual apply patch call
    runs.
  • Move git utilities into a dedicated crate (#15564)
    - create `codex-git-utils` and move the shared git helpers into it with
    file moves preserved for diff readability
    - move the `GitInfo` helpers out of `core` so stacked rollout work can
    depend on the shared crate without carrying its own git info module
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server): support mcp elicitations in v2 api (#13425)
    This adds a first-class server request for MCP server elicitations:
    `mcpServer/elicitation/request`.
    
    Until now, MCP elicitation requests only showed up as a raw
    `codex/event/elicitation_request` event from core. That made it hard for
    v2 clients to handle elicitations using the same request/response flow
    as other server-driven interactions (like shell and `apply_patch`
    tools).
    
    This also updates the underlying MCP elicitation request handling in
    core to pass through the full MCP request (including URL and form data)
    so we can expose it properly in app-server.
    
    ### Why not `item/mcpToolCall/elicitationRequest`?
    This is because MCP elicitations are related to MCP servers first, and
    only optionally to a specific MCP tool call.
    
    In the MCP protocol, elicitation is a server-to-client capability: the
    server sends `elicitation/create`, and the client replies with an
    elicitation result. RMCP models it that way as well.
    
    In practice an elicitation is often triggered by an MCP tool call, but
    not always.
    
    ### What changed
    - add `mcpServer/elicitation/request` to the v2 app-server API
    - translate core `codex/event/elicitation_request` events into the new
    v2 server request
    - map client responses back into `Op::ResolveElicitation` so the MCP
    server can continue
    - update app-server docs and generated protocol schema
    - add an end-to-end app-server test that covers the full round trip
    through a real RMCP elicitation flow
    - The new test exercises a realistic case where an MCP tool call
    triggers an elicitation, the app-server emits
    mcpServer/elicitation/request, the client accepts it, and the tool call
    resumes and completes successfully.
    
    ### app-server API flow
    - Client starts a thread with `thread/start`.
    - Client starts a turn with `turn/start`.
    - App-server sends `item/started` for the `mcpToolCall`.
    - While that tool call is in progress, app-server sends
    `mcpServer/elicitation/request`.
    - Client responds to that request with `{ action: "accept" | "decline" |
    "cancel" }`.
    - App-server sends `serverRequest/resolved`.
    - App-server sends `item/completed` for the mcpToolCall.
    - App-server sends `turn/completed`.
    - If the turn is interrupted while the elicitation is pending,
    app-server still sends `serverRequest/resolved` before the turn
    finishes.
  • app-server service tier plumbing (plus some cleanup) (#13334)
    followup to https://github.com/openai/codex/pull/13212 to expose fast
    tier controls to app server
    (majority of this PR is generated schema jsons - actual code is +69 /
    -35 and +24 tests )
    
    - add service tier fields to the app-server protocol surfaces used by
    thread lifecycle, turn start, config, and session configured events
    - thread service tier through the app-server message processor and core
    thread config snapshots
    - allow runtime config overrides to carry service tier for app-server
    callers
    
    cleanup:
    - Removing useless "legacy" code supporting "standard" - we moved to
    None | "fast", so "standard" is not needed.
  • app-server: improve thread resume rejoin flow (#11776)
    thread/resume response includes latest turn with all items, in band so
    no events are stale or lost
    
    Testing
    - e2e tested using app-server-test-client using flow described in
    "Testing Thread Rejoin Behavior" in
    codex-rs/app-server-test-client/README.md
    - e2e tested in codex desktop by reconnecting to a running turn
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_full_history` flag to preserve
    those events when you start/resume/fork a thread (defaults to `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
    
    And we also persist `EventMsg::Error` which we can now map back to the
    Turn's status and populates the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (similar for apply_patch). These
    EventMsgs were never persisted before so I made it a required field.
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • feat: experimental flags (#10231)
    ## Problem being solved
    - We need a single, reliable way to mark app-server API surface as
    experimental so that:
      1. the runtime can reject experimental usage unless the client opts in
    2. generated TS/JSON schemas can exclude experimental methods/fields for
    stable clients.
    
    Right now that’s easy to drift or miss when done ad-hoc.
    
    ## How to declare experimental methods and fields
    - **Experimental method**: add `#[experimental("method/name")]` to the
    `ClientRequest` variant in `client_request_definitions!`.
    - **Experimental field**: on the params struct, derive `ExperimentalApi`
    and annotate the field with `#[experimental("method/name.field")]` + set
    `inspect_params: true` for the method variant so
    `ClientRequest::experimental_reason()` inspects params for experimental
    fields.
    
    ## How the macro solves it
    - The new derive macro lives in
    `codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
    `#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
    attributes.
    - **Structs**:
    - Generates `ExperimentalApi::experimental_reason(&self)` that checks
    only annotated fields.
      - The “presence” check is type-aware:
        - `Option<T>`: `is_some_and(...)` recursively checks inner.
        - `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
        - `bool`: must be `true`.
        - Other types: considered present (returns `true`).
    - Registers each experimental field in an `inventory` with `(type_name,
    serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
    that type. Field names are converted from `snake_case` to `camelCase`
    for schema/TS filtering.
    - **Enums**:
    - Generates an exhaustive `match` returning `Some(reason)` for annotated
    variants and `None` otherwise (no wildcard arm).
    - **Wiring**:
    - Runtime gating uses `ExperimentalApi::experimental_reason()` in
    `codex-rs/app-server/src/message_processor.rs` to reject requests unless
    `InitializeParams.capabilities.experimental_api == true`.
    - Schema/TS export filters use the inventory list and
    `EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
    strip experimental methods/fields when `experimental_api` is false.
  • feat: vendor app-server protocol schema fixtures (#10371)
    Similar to what @sayan-oai did in openai/codex#8956 for
    `config.schema.json`, this PR updates the repo so that it includes the
    output of `codex app-server generate-json-schema` and `codex app-server
    generate-ts` and adds a test to verify it is in sync with the current
    code.
    
    Motivation:
    - This makes any schema changes introduced by a PR transparent during
    code review.
    - In particular, this should help us catch PRs that would introduce a
    non-backwards-compatible change to the app schema (eventually, this
    should also be enforced by tooling).
    - Once https://github.com/openai/codex/pull/10231 is in to formalize the
    notion of "experimental" fields, we can work on ensuring the
    non-experimental bits are backwards-compatible.
    
    `codex-rs/app-server-protocol/tests/schema_fixtures.rs` was added as the
    test and `just write-app-server-schema` can be use to generate the
    vendored schema files.
    
    Incidentally, when I run:
    
    ```
    rg _ codex-rs/app-server-protocol/schema/typescript/v2
    ```
    
    I see a number of `snake_case` names that should be `camelCase`.
  • fix: introduce AbsolutePathBuf as part of sandbox config (#7856)
    Changes the `writable_roots` field of the `WorkspaceWrite` variant of
    the `SandboxPolicy` enum from `Vec<PathBuf>` to `Vec<AbsolutePathBuf>`.
    This is helpful because now callers can be sure the value is an absolute
    path rather than a relative one. (Though when using an absolute path in
    a Seatbelt config policy, we still have to _canonicalize_ it first.)
    
    Because `writable_roots` can be read from a config file, it is important
    that we are able to resolve relative paths properly using the parent
    folder of the config file as the base path.
  • chore: add cargo-deny configuration (#7119)
    - add GitHub workflow running cargo-deny on push/PR
    - document cargo-deny allowlist with workspace-dep notes and advisory
    ignores
    - align workspace crates to inherit version/edition/license for
    consistent checks
  • [app-server & core] introduce new codex error code and v2 app-server error events (#6938)
    This PR does two things:
    1. populate a new `codex_error_code` protocol in error events sent from
    core to client;
    2. old v1 core events `codex/event/stream_error` and `codex/event/error`
    will now both become `error`. We also show codex error code for
    turncompleted -> error status.
    
    new events in app server test:
    ```
    < {
    <   "method": "codex/event/stream_error",
    <   "params": {
    <     "conversationId": "019aa34c-0c14-70e0-9706-98520a760d67",
    <     "id": "0",
    <     "msg": {
    <       "codex_error_code": {
    <         "response_stream_disconnected": {
    <           "http_status_code": 401
    <         }
    <       },
    <       "message": "Reconnecting... 2/5",
    <       "type": "stream_error"
    <     }
    <   }
    < }
    
     {
    <   "method": "error",
    <   "params": {
    <     "error": {
    <       "codexErrorCode": {
    <         "responseStreamDisconnected": {
    <           "httpStatusCode": 401
    <         }
    <       },
    <       "message": "Reconnecting... 2/5"
    <     }
    <   }
    < }
    
    < {
    <   "method": "turn/completed",
    <   "params": {
    <     "turn": {
    <       "error": {
    <         "codexErrorCode": {
    <           "responseTooManyFailedAttempts": {
    <             "httpStatusCode": 401
    <           }
    <         },
    <         "message": "exceeded retry limit, last status: 401 Unauthorized, request id: 9a1b495a1a97ed3e-SJC"
    <       },
    <       "id": "0",
    <       "items": [],
    <       "status": "failed"
    <     }
    <   }
    < }
    ```
  • [app-server] feat: add v2 command execution approval flow (#6758)
    This PR adds the API V2 version of the command‑execution approval flow
    for the shell tool.
    
    This PR wires the new RPC (`item/commandExecution/requestApproval`, V2
    only) and related events (`item/started`, `item/completed`, and
    `item/commandExecution/delta`, which are emitted in both V1 and V2)
    through the app-server
    protocol. The new approval RPC is only sent when the user initiates a
    turn with the new `turn/start` API so we don't break backwards
    compatibility with VSCE.
    
    The approach I took was to make as few changes to the Codex core as
    possible, leveraging existing `EventMsg` core events, and translating
    those in app-server. I did have to add additional fields to
    `EventMsg::ExecCommandEndEvent` to capture the command's input so that
    app-server can statelessly transform these events to a
    `ThreadItem::CommandExecution` item for the `item/completed` event.
    
    Once we stabilize the API and it's complete enough for our partners, we
    can work on migrating the core to be aware of command execution items as
    a first-class concept.
    
    **Note**: We'll need followup work to make sure these APIs work for the
    unified exec tool, but will wait til that's stable and landed before
    doing a pass on app-server.
    
    Example payloads below:
    ```
    {
      "method": "item/started",
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": null,
          "exitCode": null,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "inProgress",
          "type": "commandExecution"
        }
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "method": "item/commandExecution/requestApproval",
      "params": {
        "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
        "parsedCmd": [
          {
            "cmd": "touch /tmp/should-trigger-approval",
            "type": "unknown"
          }
        ],
        "reason": "Need to create file in /tmp which is outside workspace sandbox",
        "risk": null,
        "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
        "turnId": "1"
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "result": {
        "acceptSettings": {
          "forSession": false
        },
        "decision": "accept"
      }
    }
    ```
    
    ```
    {
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": 224,
          "exitCode": 0,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "completed",
          "type": "commandExecution"
        }
      }
    }
    ```
  • [app-server] feat: export.rs supports a v2 namespace, initial v2 notifications (#6212)
    **Typescript and JSON schema exports**
    While working on Thread/Turn/Items type definitions, I realize we will
    run into name conflicts between v1 and v2 APIs (e.g. `RateLimitWindow`
    which won't be reusable since v1 uses `RateLimitWindow` from `protocol/`
    which uses snake_case, but we want to expose camelCase everywhere, so
    we'll define a V2 version of that struct that serializes as camelCase).
    
    To set us up for a clean and isolated v2 API, generate types into a
    `v2/` namespace for both typescript and JSON schema.
    - TypeScript: v2 types emit under `out_dir/v2/*.ts`, and root index.ts
    now re-exports them via `export * as v2 from "./v2"`;.
    - JSON Schemas: v2 definitions bundle under `#/definitions/v2/*` rather
    than the root.
    
    The location for the original types (v1 and types pulled from
    `protocol/` and other core crates) haven't changed and are still at the
    root. This is for backwards compatibility: no breaking changes to
    existing usages of v1 APIs and types.
    
    **Notifications**
    While working on export.rs, I:
    - refactored server/client notifications with macros (like we already do
    for methods) so they also get exported (I noticed they weren't being
    exported at all).
    - removed the hardcoded list of types to export as JSON schema by
    leveraging the existing macros instead
    - and took a stab at API V2 notifications. These aren't wired up yet,
    and I expect to iterate on these this week.
  • Generate JSON schema for app-server protocol (#5063)
    Add annotations and an export script that let us generate app-server
    protocol types as typescript and JSONSchema.
    
    The script itself is a bit hacky because we need to manually label some
    of the types. Unfortunately it seems that enum variants don't get good
    names by default and end up with something like `EventMsg1`,
    `EventMsg2`, etc. I'm not an expert in this by any means, but since this
    is only run manually and we already need to enumerate the types required
    to describe the protocol, it didn't seem that much worse. An ideal
    solution here would be to have some kind of root that we could generate
    schemas for in one go, but I'm not sure if that's compatible with how we
    generate the protocol today.
  • fix: remove mcp-types from app server protocol (#4537)
    We continue the separation between `codex app-server` and `codex
    mcp-server`.
    
    In particular, we introduce a new crate, `codex-app-server-protocol`,
    and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
    `codex-rs/app-server-protocol/src/protocol.rs`.
    
    Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
    into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
    because it is referenced in a ton of places, we have to touch a lot of
    files as part of this PR.
    
    We also decide to get away from proper JSON-RPC 2.0 semantics, so we
    also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
    is basically the same `JSONRPCMessage` type defined in `mcp-types`
    except with all of the `"jsonrpc": "2.0"` removed.
    
    Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
    considerably simpler, as we can lean heavier on serde to serialize
    directly into the wire format that we use now.