Commit Graph

81 Commits

  • Separate interactive and non-interactive sessions (#4612)
    Do not show `exec` sessions in the VS Code/TUI session selector.
  • fix: remove mcp-types from app server protocol (#4537)
    We continue the separation between `codex app-server` and
    `codex mcp-server`.
    
    In particular, we introduce a new crate, `codex-app-server-protocol`,
    and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
    `codex-rs/app-server-protocol/src/protocol.rs`.
    
    Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
    into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
    because it is referenced in a ton of places, we have to touch a lot of
    files as part of this PR.
    
    We also move away from strict JSON-RPC 2.0 semantics, so we introduce
    `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is essentially
    the same `JSONRPCMessage` type defined in `mcp-types`, minus every
    `"jsonrpc": "2.0"` field.
    
    Dropping `"jsonrpc": "2.0"` makes our serialization logic considerably
    simpler, as we can lean more heavily on serde to serialize directly into
    the wire format we now use.
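
    To make the difference concrete, here is a minimal TypeScript sketch of
    "lite" JSON-RPC messages on the wire. The shapes mirror JSON-RPC 2.0
    (request, notification, response, error) with the `"jsonrpc"` field
    removed; the type and function names are illustrative assumptions, not
    the definitions in `jsonrpc_lite.rs`.

    ```typescript
    // Hypothetical "lite" message shapes: JSON-RPC 2.0 minus the
    // `"jsonrpc": "2.0"` field on every message.
    type RequestId = string | number;

    interface LiteRequest { id: RequestId; method: string; params?: unknown }
    interface LiteNotification { method: string; params?: unknown }
    interface LiteResponse { id: RequestId; result: unknown }
    interface LiteError { id: RequestId; error: { code: number; message: string; data?: unknown } }

    type LiteMessage = LiteRequest | LiteNotification | LiteResponse | LiteError;

    // Classify a parsed message purely by which keys it carries.
    function classify(msg: LiteMessage): "request" | "notification" | "response" | "error" {
      if ("method" in msg) return "id" in msg ? "request" : "notification";
      return "error" in msg ? "error" : "response";
    }

    // With no envelope field to add or strip, serialization is plain JSON.
    function toWire(msg: LiteMessage): string {
      return JSON.stringify(msg);
    }
    ```

    Dispatch still keys off `method`, `id`, and `error` exactly as in
    JSON-RPC 2.0; only the constant envelope field disappears.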
  • fix: use macros to ensure request/response symmetry (#4529)
    Manually curating `protocol-ts/src/lib.rs` was error-prone, as expected.
    I finally asked Codex to write some Rust macros so we can ensure that:
    
    - For every variant of `ClientRequest` and `ServerRequest`, there is an
    associated `params` and `response` type.
    - All response types are included automatically in the output of
    `codex generate-ts`.
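
    TypeScript can express an analogous symmetry guarantee: if the request
    union and the response types are both derived from one method table, a
    method cannot be added without declaring both. This is an analogy
    sketch; the `ClientMethods` table and its entries are hypothetical, and
    this is not what `codex generate-ts` emits.

    ```typescript
    type RequestId = string | number;

    // Hypothetical method table: each entry names a params and a response type.
    interface ClientMethods {
      initialize: { params: { clientInfo: { name: string } }; response: { greeting: string } };
      userInfo: { params: undefined; response: { email: string | null } };
    }

    // The request union is derived from the table, so every variant has params.
    type ClientRequest = {
      [M in keyof ClientMethods]: { method: M; id: RequestId; params: ClientMethods[M]["params"] };
    }[keyof ClientMethods];

    type ResponseFor<M extends keyof ClientMethods> = ClientMethods[M]["response"];

    // The handler table must cover every method with a matching signature.
    type Handlers = {
      [M in keyof ClientMethods]: (params: ClientMethods[M]["params"]) => ResponseFor<M>;
    };

    function dispatch(handlers: Handlers, req: ClientRequest): unknown {
      // Indexing with a union-typed method loses the method/params link,
      // so the handler is cast once here.
      const handler = handlers[req.method] as unknown as (params: unknown) => unknown;
      return handler(req.params);
    }
    ```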
  • fix: ensure every variant of ClientRequest has a params field (#4512)
    This changes the generated TypeScript type for `ClientRequest`
    so that instead of this:
    
    ```typescript
    /**
     * Request from the client to the server.
     */
    export type ClientRequest =
      | { method: "initialize"; id: RequestId; params: InitializeParams }
      | { method: "newConversation"; id: RequestId; params: NewConversationParams }
      // ...
      | { method: "getUserAgent"; id: RequestId }
      | { method: "userInfo"; id: RequestId }
      // ...
    ```
    
    we have this:
    
    ```typescript
    /**
     * Request from the client to the server.
     */
    export type ClientRequest =
      | { method: "initialize"; id: RequestId; params: InitializeParams }
      | { method: "newConversation"; id: RequestId; params: NewConversationParams }
      // ...
      | { method: "getUserAgent"; id: RequestId; params: undefined }
      | { method: "userInfo"; id: RequestId; params: undefined }
      // ...
    ```
    
    which makes TypeScript happier when destructuring instances of
    `ClientRequest`, because it no longer complains that `params` is not
    guaranteed to exist.
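
    A small sketch of why this helps: destructuring a property that is
    missing from some union members is a type error, so the uniform
    `params` field lets a handler destructure unconditionally. The
    `InitializeParams` shape and `describeRequest` are hypothetical
    stand-ins for the generated types.

    ```typescript
    type RequestId = string | number;
    // Hypothetical stand-in for the generated params type.
    interface InitializeParams { clientInfo: { name: string; version: string } }

    type ClientRequest =
      | { method: "initialize"; id: RequestId; params: InitializeParams }
      | { method: "getUserAgent"; id: RequestId; params: undefined }
      | { method: "userInfo"; id: RequestId; params: undefined };

    function describeRequest(req: ClientRequest): string {
      // Compiles only because every variant declares `params`.
      const { id, params } = req;
      const hasParams = params !== undefined;
      switch (req.method) {
        case "initialize":
          return `#${id} initialize from ${req.params.clientInfo.name}`;
        case "getUserAgent":
        case "userInfo":
          return `#${id} ${req.method} (params: ${hasParams ? "present" : "none"})`;
      }
    }
    ```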
  • fix: separate codex mcp into codex mcp-server and codex app-server (#4471)
    This is a very large PR with some non-backwards-compatible changes.
    
    Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish
    server that had two overlapping responsibilities:
    
    - Running an MCP server, providing some basic tool calls.
    - Running the app server used to power experiences such as the VS Code
    extension.
    
    This PR aims to separate these into distinct concepts:
    
    - `codex mcp-server` for the MCP server
    - `codex app-server` for the "application server"
    
    Note `codex mcp` still exists because it already has its own subcommands
    for MCP management (`list`, `add`, etc.)
    
    The MCP logic continues to live in `codex-rs/mcp-server` whereas the
    refactored app server logic is in the new `codex-rs/app-server` folder.
    Note that most of the existing integration tests in
    `codex-rs/mcp-server/tests/suite` were actually for the app server, so
    all the tests have been moved with the exception of
    `codex-rs/mcp-server/tests/suite/mod.rs`.
    
    Because this is already a large diff, I tried not to change more than I
    had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses
    the name `McpProcess` for now, but I will do some mechanical renamings
    to things like `AppServer` in subsequent PRs.
    
    While `mcp-server` and `app-server` share some overlapping functionality
    (like reading streams of JSONL and dispatching based on message types)
    and some differences (completely different message types), I ended up
    doing a bit of copypasta between the two crates, as both have somewhat
    similar `message_processor.rs` and `outgoing_message.rs` files for now,
    though I expect them to diverge more in the near future.
    
    One material change is to the initialize handshake for
    `codex app-server`, as we no longer use the MCP types for that handshake.
    Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an
    `Initialize` variant to `ClientRequest`, which takes the `ClientInfo`
    object we need to update the `USER_AGENT_SUFFIX` in
    `codex-rs/app-server/src/message_processor.rs`.
    
    One other material change is in
    `codex-rs/app-server/src/codex_message_processor.rs` where I eliminated
    a use of the `send_event_as_notification()` method I am generally trying
    to deprecate (because it blindly maps an `EventMsg` into a
    `JSONNotification`) in favor of `send_server_notification()`, which
    takes a `ServerNotification`, as that is intended to be a custom enum of
    all notification types supported by the app server. So to make this
    update, I had to introduce a new variant of `ServerNotification`,
    `SessionConfigured`, which is a non-backwards-compatible change relative
    to the old `codex mcp`, so clients will have to be updated after the next
    release that contains this PR. Note that
    `codex-rs/app-server/tests/suite/list_resume.rs` also had to be updated
    to reflect this change.
    
    I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility
    crate to avoid some of the copying between `mcp-server` and
    `app-server`.
  • Custom prompts begin with /prompts: (#4476)
    <img width="608" height="354" alt="Screenshot 2025-09-29 at 4 41 08 PM"
    src="https://github.com/user-attachments/assets/162508eb-c1ac-4bc0-95f2-5e23cb4ae428"
    />
  • Parse out frontmatter for custom prompts (#4456)
    [Cherry picked from https://github.com/openai/codex/pull/3565]
    
    Strips the frontmatter (description/args) from custom prompt files so
    that only the body is included.
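
    A minimal sketch of frontmatter stripping, assuming the common
    convention of an opening `---` line, `key: value` lines, and a closing
    `---` line. The function name and return shape are hypothetical; the
    actual parser may handle more cases (quoted values, CRLF endings).

    ```typescript
    interface ParsedPrompt {
      frontmatter: Record<string, string>; // e.g. description, args
      body: string;                        // the only part kept in the prompt
    }

    function parseCustomPrompt(contents: string): ParsedPrompt {
      const lines = contents.split("\n");
      // No opening delimiter: the whole file is the body.
      if (lines[0]?.trim() !== "---") return { frontmatter: {}, body: contents };
      const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
      // Unterminated frontmatter: treat the file as plain body text.
      if (end === -1) return { frontmatter: {}, body: contents };
      const frontmatter: Record<string, string> = {};
      for (const line of lines.slice(1, end)) {
        const colon = line.indexOf(":");
        if (colon > 0) {
          frontmatter[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
        }
      }
      return { frontmatter, body: lines.slice(end + 1).join("\n") };
    }
    ```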
  • [mcp-server] Expose fuzzy file search in MCP (#2677)
    ## Summary
    Expose a simple fuzzy file search implementation to MCP clients.
    
    ## Testing
    - [x] Tested locally
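
    To illustrate the idea, here is a toy subsequence-based fuzzy matcher.
    It is a deliberately simple stand-in, not the scoring algorithm the MCP
    server uses; the scoring rule (consecutive matches score higher, ties go
    to shorter paths) and the function names are assumptions.

    ```typescript
    // Every query character must appear in order in `path`
    // (case-insensitively); returns null when it does not match.
    function fuzzyScore(query: string, path: string): number | null {
      const q = query.toLowerCase();
      const p = path.toLowerCase();
      let score = 0;
      let last = -1;
      for (const ch of q) {
        const idx = p.indexOf(ch, last + 1);
        if (idx === -1) return null;
        // Consecutive matches are worth more than scattered ones.
        score += idx === last + 1 ? 3 : 1;
        last = idx;
      }
      return score;
    }

    // Returns up to `limit` matching paths, best score first.
    function fuzzySearch(query: string, paths: string[], limit = 10): string[] {
      const scored: { path: string; score: number }[] = [];
      for (const path of paths) {
        const score = fuzzyScore(query, path);
        if (score !== null) scored.push({ path, score });
      }
      // Higher score first; break ties by preferring the shorter path.
      scored.sort((a, b) => b.score - a.score || a.path.length - b.path.length);
      return scored.slice(0, limit).map((s) => s.path);
    }
    ```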
  • OpenTelemetry events (#2103)
    ## otel
    
    Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
    that describe each run: outbound API requests, streamed responses, user
    input, tool-approval decisions, and the result of every tool invocation.
    Export is **disabled by default** so local runs remain self-contained.
    Opt in by adding an `[otel]` table and choosing an exporter.
    
    ```toml
    [otel]
    environment = "staging"   # defaults to "dev"
    exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
    log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
    ```
    
    Codex tags every exported event with `service.name = "codex-cli"`, the
    CLI version, and an `env` attribute so downstream collectors can
    distinguish dev/staging/prod traffic. Only telemetry produced inside the
    `codex_otel` crate (the events listed below) is forwarded to the
    exporter.
    
    ### Event catalog
    
    Every event shares a common set of metadata fields: `event.timestamp`,
    `conversation.id`, `app.version`, `auth_mode` (when available),
    `user.account_id` (when available), `terminal.type`, `model`, and
    `slug`.
    
    With OTEL enabled, Codex emits the following event types (in addition to
    the metadata above):
    
    - `codex.api_request`
      - `cf_ray` (optional)
      - `attempt`
      - `duration_ms`
      - `http.response.status_code` (optional)
      - `error.message` (failures)
    - `codex.sse_event`
      - `event.kind`
      - `duration_ms`
      - `error.message` (failures)
      - `input_token_count` (completion only)
      - `output_token_count` (completion only)
      - `cached_token_count` (completion only, optional)
      - `reasoning_token_count` (completion only, optional)
      - `tool_token_count` (completion only)
    - `codex.user_prompt`
      - `prompt_length`
      - `prompt` (redacted unless `log_user_prompt = true`)
    - `codex.tool_decision`
      - `tool_name`
      - `call_id`
      - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
      - `source` (`config` or `user`)
    - `codex.tool_result`
      - `tool_name`
      - `call_id`
      - `arguments`
      - `duration_ms` (execution time for the tool)
      - `success` (`"true"` or `"false"`)
      - `output`
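
    As a sketch of how the shared metadata and the `log_user_prompt` switch
    fit together, the following builds a `codex.user_prompt` event. The
    attribute names come from the catalog above; the `EventMetadata` type,
    the `event.name` key, the `[REDACTED]` placeholder, and the function
    itself are hypothetical and do not reflect the `codex_otel` code.

    ```typescript
    // Shared fields attached to every event (optional ones may be absent).
    interface EventMetadata {
      "event.timestamp": string;
      "conversation.id": string;
      "app.version": string;
      auth_mode?: string;
      "user.account_id"?: string;
      "terminal.type": string;
      model: string;
      slug: string;
    }

    type Attributes = Record<string, string | number>;

    function userPromptEvent(meta: EventMetadata, prompt: string, logUserPrompt: boolean): Attributes {
      const event: Attributes = {
        "event.name": "codex.user_prompt", // hypothetical key
        prompt_length: prompt.length,      // UTF-16 code units here
        // Prompt text is exported only when explicitly enabled.
        prompt: logUserPrompt ? prompt : "[REDACTED]",
      };
      // Copy the shared metadata, skipping optional fields that are unset.
      for (const [key, value] of Object.entries(meta)) {
        if (value !== undefined) event[key] = value;
      }
      return event;
    }
    ```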
    
    ### Choosing an exporter
    
    Set `otel.exporter` to control where events go:
    
    - `none` – leaves instrumentation active but skips exporting. This is the
      default.
    - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify
      the endpoint, protocol, and headers your collector expects:
    
      ```toml
      [otel.exporter.otlp-http]
      endpoint = "https://otel.example.com/v1/logs"
      protocol = "binary"
      headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
      ```
    
    - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
      and any metadata headers:
    
      ```toml
      [otel.exporter.otlp-grpc]
      endpoint = "https://otel.example.com:4317"
      headers = { "x-otlp-meta" = "abc123" }
      ```
    
    If the exporter is `none`, nothing is written anywhere; otherwise you
    must run or point to your own collector. All exporters run on a
    background batch worker that is flushed on shutdown.
    
    If you build Codex from source, the OTEL crate is still behind an `otel`
    feature flag; the official prebuilt binaries ship with the feature
    enabled. When the feature is disabled, the telemetry hooks become no-ops
    so the CLI continues to function without the extra dependencies.
    
    ---------
    
    Co-authored-by: Anton Panasenko <apanasenko@openai.com>
  • Add Reset in for rate limits (#4111)
    - Parse the headers
    - Reorganize the struct because it's getting too long
    - Show the "resets at" time in the TUI
    
    <img width="324" height="79" alt="image"
    src="https://github.com/user-attachments/assets/ca15cd48-f112-4556-91ab-1e3a9bc4683d"
    />
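
    To illustrate the TUI side, here is a small sketch that turns a reset
    delay into a human-readable label. The input is assumed to be a number
    of seconds until reset; the actual header names, units, and label text
    used by Codex are not described here, so treat this as a hypothetical
    formatter.

    ```typescript
    // Formats a delay in seconds as e.g. "resets in 2h 5m", showing at most
    // the two largest non-zero units.
    function formatResetsIn(seconds: number): string {
      const total = Math.max(0, Math.round(seconds));
      const days = Math.floor(total / 86400);
      const hours = Math.floor((total % 86400) / 3600);
      const minutes = Math.floor((total % 3600) / 60);
      const secs = total % 60;
      const parts: string[] = [];
      if (days > 0) parts.push(`${days}d`);
      if (hours > 0) parts.push(`${hours}h`);
      if (minutes > 0) parts.push(`${minutes}m`);
      // Seconds are shown only when the reset is under a minute away.
      if (parts.length === 0) parts.push(`${secs}s`);
      return `resets in ${parts.slice(0, 2).join(" ")}`;
    }
    ```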
  • Send limits when getting rate limited (#4102)
    Users need visibility into rate limits when they are rate limited.
  • Add exec output-schema parameter (#4079)
    Adds structured output to `exec` via the `--output-schema` parameter.
  • feat: update default (#4076)
    Changes:
    - Default model and docs now use gpt-5-codex. 
    - Disables the GPT-5 Codex NUX by default.
    - Keeps presets available for API key users.
  • chore: unify cargo versions (#4044)
    Unify cargo versions at root
  • Forward Rate limits to the UI (#3965)
    We currently get information about rate limits in the response headers.
    We want to forward it to the clients for better transparency. UI/UX
    plans have been discussed, and they require this information.
  • feat: /review (#3774)
    Adds `/review` action in TUI
    
    <img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM"
    src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d"
    />
  • Switch to uuid_v7 and tighten ConversationId usage (#3819)
    Make sure conversations have a timestamp: a UUIDv7 embeds its creation
    time.
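
    UUIDv7 (RFC 9562) stores a 48-bit Unix timestamp in milliseconds in its
    first 12 hex digits, which is what lets an ID carry its creation time.
    The helper below is an illustrative sketch of reading that timestamp
    back; it is not part of Codex.

    ```typescript
    // Extracts the creation time (Unix milliseconds) from a UUIDv7 string.
    function uuidV7TimestampMs(uuid: string): number {
      const hex = uuid.replace(/-/g, "").toLowerCase();
      if (!/^[0-9a-f]{32}$/.test(hex)) throw new Error(`not a UUID: ${uuid}`);
      // The version nibble is the 13th hex digit; it must be 7.
      if (hex[12] !== "7") throw new Error(`not a version 7 UUID: ${uuid}`);
      // 48 bits = 12 hex digits, well below 2^53, so a JS number is exact.
      return parseInt(hex.slice(0, 12), 16);
    }
    ```

    `new Date(uuidV7TimestampMs(id))` then yields the creation date.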
  • Handle resuming/forking after compact (#3533)
    We need to construct the history differently when compaction happens:
    only the history after the compaction should be considered, and the
    compaction itself should be converted into a response item.
    
    Once #3446 is merged, this should switch to using
    `build_compact_history`.
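
    A minimal sketch of the idea, assuming a rollout is a flat list of items
    in which a compaction is recorded as a `compacted` item carrying the
    summary. Only items from the last compaction onward are kept, and the
    compaction becomes a message the model can see. The item shapes and the
    choice of a user-role message are hypothetical, not the real schema.

    ```typescript
    type RolloutItem =
      | { type: "message"; role: "user" | "assistant"; text: string }
      | { type: "compacted"; summary: string };

    type ResponseItem = { type: "message"; role: "user" | "assistant"; text: string };

    function historyForResume(rollout: RolloutItem[]): ResponseItem[] {
      // Index of the most recent compaction, or -1 if there was none.
      const lastCompact = rollout.map((item) => item.type).lastIndexOf("compacted");
      const history: ResponseItem[] = [];
      for (const item of rollout.slice(Math.max(lastCompact, 0))) {
        if (item.type === "compacted") {
          // Represent the compaction as a user message (an assumption).
          history.push({ type: "message", role: "user", text: item.summary });
        } else {
          history.push(item);
        }
      }
      return history;
    }
    ```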
  • Review Mode (Core) (#3401)
    ## 📝 Review Mode -- Core
    
    This PR introduces the Core implementation for Review mode:
    
    - New op `Op::Review { prompt: String }`: spawns a child review task
    with isolated context, a review‑specific system prompt, and a
    `Config.review_model`.
    - `EnteredReviewMode`: emitted when the child review session starts.
    Every event from this point onwards reflects the review session.
    - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
    finishes or is interrupted, with optional structured findings:
    
    ```json
    {
      "findings": [
        {
          "title": "<≤ 80 chars, imperative>",
          "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
          "confidence_score": <float 0.0-1.0>,
          "priority": <int 0-3>,
          "code_location": {
            "absolute_file_path": "<file path>",
            "line_range": {"start": <int>, "end": <int>}
          }
        }
      ],
      "overall_correctness": "patch is correct" | "patch is incorrect",
      "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
      "overall_confidence_score": <float 0.0-1.0>
    }
    ```
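
    Because findings arrive as JSON, a client will likely want to check them
    against the constraints the template describes. Here is a hedged
    sketch: the field names mirror the schema above, while
    `validateReviewOutput` and its exact checks (title length, score ranges,
    priority range, line-range order) are assumptions drawn from the
    placeholder comments, not an official validator.

    ```typescript
    interface ReviewFinding {
      title: string;
      body: string;
      confidence_score: number;
      priority: number;
      code_location: {
        absolute_file_path: string;
        line_range: { start: number; end: number };
      };
    }

    interface ReviewOutput {
      findings: ReviewFinding[];
      overall_correctness: "patch is correct" | "patch is incorrect";
      overall_explanation: string;
      overall_confidence_score: number;
    }

    // Returns human-readable problems; an empty list means the output passed.
    function validateReviewOutput(out: ReviewOutput): string[] {
      const problems: string[] = [];
      const inUnit = (x: number) => x >= 0 && x <= 1;
      if (!inUnit(out.overall_confidence_score)) problems.push("overall_confidence_score outside [0, 1]");
      out.findings.forEach((f, i) => {
        if (f.title.length > 80) problems.push(`finding ${i}: title longer than 80 chars`);
        if (!inUnit(f.confidence_score)) problems.push(`finding ${i}: confidence_score outside [0, 1]`);
        if (!Number.isInteger(f.priority) || f.priority < 0 || f.priority > 3) {
          problems.push(`finding ${i}: priority is not an integer in 0-3`);
        }
        const { start, end } = f.code_location.line_range;
        if (start > end) problems.push(`finding ${i}: line_range start after end`);
      });
      return problems;
    }
    ```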
    
    ## Questions
    
    ### Why separate out its own message history?
    
    We want the review thread to match the training of our review models as
    much as possible -- that means using a custom prompt, removing user
    instructions, and starting a clean chat history.
    
    We also want to make sure the review thread doesn't leak into the parent
    thread.
    
    ### Why do this as a mode, vs. sub-agents?
    
    1. We want review to be a synchronous task, so it's fine for now to do a
    bespoke implementation.
    2. We're still unclear about the final structure for sub-agents. We'd
    prefer to land this quickly and then refactor into sub-agents without
    rushing that implementation.
  • feat: context compaction (#3446)
    ## Compact feature:
    1. Stops the model when the context window becomes too large
    2. Adds a user turn asking the model to summarize
    3. Builds a bridge, rendered from a template, that contains all the
    previous user messages plus the summary
    4. Starts sampling again from a clean conversation containing only that
    bridge
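
    The four steps can be sketched as follows. The threshold, the
    summarization prompt, and the bridge template wording are placeholders
    chosen for illustration; the real prompt and template live in the Codex
    source and are not reproduced here.

    ```typescript
    type ChatMessage = { role: "user" | "assistant"; text: string };

    // Hypothetical placeholders, not Codex's actual prompt or template.
    const SUMMARIZE_PROMPT = "Summarize the conversation so far.";
    const bridgeTemplate = (userMessages: string[], summary: string) =>
      `Previous user messages:\n${userMessages.map((m) => `- ${m}`).join("\n")}\n\nSummary:\n${summary}`;

    // Step 1: decide whether the context window is too full.
    function needsCompaction(tokensUsed: number, contextWindow: number, threshold = 0.9): boolean {
      return tokensUsed >= contextWindow * threshold;
    }

    // Steps 2-4: ask for a summary, build the bridge, restart from it alone.
    function compact(history: ChatMessage[], summarize: (h: ChatMessage[]) => string): ChatMessage[] {
      const summary = summarize([...history, { role: "user", text: SUMMARIZE_PROMPT }]);
      const userMessages = history.filter((m) => m.role === "user").map((m) => m.text);
      return [{ role: "user", text: bridgeTemplate(userMessages, summary) }];
    }
    ```

    In practice `summarize` would be a model call; it is injected here so
    the control flow can be read on its own.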
  • feat: reasoning effort as optional (#3527)
    Allow the reasoning effort to be optional
  • feat: change the behavior of SetDefaultModel RPC so None clears the value. (#3529)
    It turns out that we want slightly different behavior for the
    `SetDefaultModel` RPC because some models do not work with reasoning
    (like GPT-4.1), so we should be able to explicitly clear this value.
    
    Verified in `codex-rs/mcp-server/tests/suite/set_default_model.rs`.
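
    A sketch of the semantics change, using a hypothetical in-memory view of
    the `config.toml` defaults: a field sent as `null` (Rust `None`) now
    clears the stored value instead of leaving it untouched. The `Defaults`
    and params shapes and `applySetDefaultModel` are illustrative
    assumptions.

    ```typescript
    interface Defaults { model?: string; reasoning_effort?: string }

    // Hypothetical params: `null` corresponds to Rust's `None`.
    interface SetDefaultModelParams { model: string | null; reasoning_effort: string | null }

    function applySetDefaultModel(current: Defaults, params: SetDefaultModelParams): Defaults {
      const next: Defaults = { ...current };
      // A null field clears the stored default rather than keeping it.
      if (params.model === null) delete next.model;
      else next.model = params.model;
      if (params.reasoning_effort === null) delete next.reasoning_effort;
      else next.reasoning_effort = params.reasoning_effort;
      return next;
    }
    ```

    Switching to a model without reasoning support, such as GPT-4.1, can
    then clear `reasoning_effort` in the same call.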
  • feat: added SetDefaultModel to JSON-RPC server (#3512)
    This adds `SetDefaultModel`, which takes `model` and `reasoning_effort`
    as optional fields. If set, the field will overwrite what is in the
    user's `config.toml`.
    
    This reuses logic that was added to support the `/model` command in the
    TUI: https://github.com/openai/codex/pull/2799.
  • feat: include reasoning_effort in NewConversationResponse (#3506)
    `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
  • bug: default to image (#3501)
    Default the MIME type to image
  • Add Compact and Turn Context to the rollout items (#3444)
    Adds compact and turn context to the rollout items.
    
    based on #3440
  • Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189)
    This PR does the following:
    * Adds the ability to paste or type an API key.
    * Removes the `preferred_auth_method` config option. The last login
    method is always persisted in auth.json, so this isn't needed.
    * If the OPENAI_API_KEY environment variable is defined, its value is
    used to prepopulate the new UI. The variable is otherwise ignored by the
    CLI.
    * Adds a new MCP server entry point "login_api_key" so we can implement
    this same API key behavior for the VS Code extension.
    <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
    src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
    />
    <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
    src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
    />
  • Change forking to read the rollout from file (#3440)
    This PR changes the get-history op into a get-path op, and forking then
    uses that path. This gives us one unified codepath for resuming and
    forking conversations and will also help keep the rollout history in
    order. It also fixes a bug where the UI would not appear when resuming
    after forking.
  • Unified execution (#3288)
    ## Unified PTY-Based Exec Tool
    
    Note: this requires the following flag in the config:
    `use_experimental_unified_exec_tool=true`
    
    - Adds a PTY-backed interactive exec feature (“unified_exec”) with
      session reuse via session_id, bounded output (128 KiB), and timeout
      clamping (≤ 60 s).
    - Protocol: introduces ResponseItem::UnifiedExec { session_id,
      arguments, timeout_ms }.
    - Tools: exposes unified_exec as a function tool (Responses API); it is
      excluded from the Chat Completions payload while still supported in
      tool lists.
    - Path handling: resolves commands via PATH (or explicit paths), with
      UTF‑8/newline‑aware truncation (truncate_middle).
    - Tests: cover command parsing, path resolution, session
      persistence/cleanup, multi‑session isolation, timeouts, and truncation
      behavior.
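
    To make the output and timeout bounds concrete, here is a sketch of
    middle truncation that never splits a UTF-8 character, plus timeout
    clamping. The 128 KiB and 60 s figures come from this PR; the marker
    text and the half-and-half split are assumptions, and unlike the real
    `truncate_middle`, this simplified version is not newline-aware.

    ```typescript
    // Bytes a single code point occupies when encoded as UTF-8.
    function utf8Len(ch: string): number {
      const cp = ch.codePointAt(0) ?? 0;
      return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    }

    // Keeps the head and tail of `text` within `maxBytes` UTF-8 bytes and
    // reports how many bytes were dropped. The marker is not counted.
    function truncateMiddle(text: string, maxBytes: number): string {
      const chars = Array.from(text); // one entry per code point
      const total = chars.reduce((n, ch) => n + utf8Len(ch), 0);
      if (total <= maxBytes) return text;
      // Fill roughly half of the budget from the front.
      let headEnd = 0;
      let headBytes = 0;
      while (headEnd < chars.length && headBytes + utf8Len(chars[headEnd]!) <= Math.floor(maxBytes / 2)) {
        headBytes += utf8Len(chars[headEnd]!);
        headEnd++;
      }
      // Spend the remaining budget on the tail.
      let tailStart = chars.length;
      let tailBytes = 0;
      while (tailStart > headEnd && tailBytes + utf8Len(chars[tailStart - 1]!) <= maxBytes - headBytes) {
        tailStart--;
        tailBytes += utf8Len(chars[tailStart]!);
      }
      const removed = total - headBytes - tailBytes;
      return `${chars.slice(0, headEnd).join("")}\n[... ${removed} bytes truncated ...]\n${chars.slice(tailStart).join("")}`;
    }

    // Requested timeouts above the cap are clamped, not rejected.
    const MAX_TIMEOUT_MS = 60_000;
    function clampTimeoutMs(requestedMs: number): number {
      return Math.min(Math.max(requestedMs, 0), MAX_TIMEOUT_MS);
    }
    ```

    A caller would apply `truncateMiddle(output, 128 * 1024)` to each chunk
    of session output before returning it.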
  • feat: add UserInfo request to JSON-RPC server (#3428)
    This adds a simple endpoint that provides the email address encoded in
    `$CODEX_HOME/auth.json`.
    
    As noted, for now, we do not hit the server to verify this is the user's
    true email address.
  • Added images to UserMessageEvent (#3400)
    This PR adds an `images` field to the existing `UserMessageEvent` so we
    can encode zero or more images associated with a user message. This
    allows images to be restored when conversations are restored.
  • Move initial history to protocol (#3422)
    To fix an edge case of forking then resuming
    
    #3419
  • Do not send reasoning item IDs (#3390)
    The Responses API no longer requires IDs on reasoning items.
    
    Fixes: https://github.com/openai/codex/issues/3292
  • feat: add ArchiveConversation to ClientRequest (#3353)
    Adds support for `ArchiveConversation` in the JSON-RPC server that takes
    a `(ConversationId, PathBuf)` pair and:
    
    - verifies the `ConversationId` corresponds to the rollout id at the
    `PathBuf`
    - if so, invokes
    `ConversationManager.remove_conversation(ConversationId)`
    - if the `CodexConversation` was in memory, sends `Shutdown` and waits
    for `ShutdownComplete` with a timeout
    - moves the `.jsonl` file to `$CODEX_HOME/archived_sessions`
    
    ---------
    
    Co-authored-by: Gabriel Peal <gabriel@openai.com>
  • fix: include rollout_path in NewConversationResponse (#3352)
    Adding the `rollout_path` to the `NewConversationResponse` makes it so a
    client can perform subsequent operations on a `(ConversationId,
    PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.
    
  • feat: Run cargo shear during CI (#3338)
    Run cargo shear as part of CI to ensure there are no unused dependencies.
  • Generate more typescript types and return conversation id with ConversationSummary (#3219)
    This PR does multiple things that are necessary for conversation resume
    to work from the extension. I wanted to make sure everything worked so
    these changes wound up in one PR:
    1. Generate more ts types
    2. Resume rollout history files rather than create a new one every time
    it is resumed so you don't see a duplicate conversation in history for
    every resume. Chatted with @aibrahim-oai to verify this
    3. Return conversation_id in conversation summaries
    4. [Cleanup] Use serde and strong types for a lot of the rollout file
    parsing
  • Format large numbers in a more readable way. (#2046)
    - In the bottom line of the TUI, print the number of tokens to 3 sigfigs
      with an SI suffix, e.g. "1.23K".
    - Elsewhere where we print a number, I figure it's worthwhile to print
      the exact number, because e.g. it's a summary of your session. Here we print
      the numbers comma-separated.
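
    Here is a sketch of the two formatting rules: three significant figures
    with a suffix for the compact status line, and comma-separated exact
    values elsewhere. The function names and the suffix set beyond "K" are
    assumptions for illustration.

    ```typescript
    const SUFFIXES = ["", "K", "M", "B", "T"]; // suffixes past "K" are assumed

    // Three significant figures plus a suffix, e.g. 1234 -> "1.23K".
    function formatCompact(n: number): string {
      let scaled = n;
      let tier = 0;
      while (Math.abs(scaled) >= 1000 && tier < SUFFIXES.length - 1) {
        scaled /= 1000;
        tier++;
      }
      if (tier === 0) return String(n);
      let text = scaled.toPrecision(3);
      // Rounding e.g. 999.9 yields "1.00e+3"; promote to the next suffix.
      if (Math.abs(Number(text)) >= 1000 && tier < SUFFIXES.length - 1) {
        scaled /= 1000;
        tier++;
        text = scaled.toPrecision(3);
      }
      return `${text}${SUFFIXES[tier]}`;
    }

    // Exact value with thousands separators, e.g. 1234567 -> "1,234,567".
    function formatExact(n: number): string {
      return n.toLocaleString("en-US");
    }
    ```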
  • Add a getUserAgent MCP method (#3320)
    This will allow the extension to use this user agent, plus its own
    suffix, for its requests.
  • Use ConversationId instead of raw Uuids (#3282)
    We're trying to migrate from `session_id: Uuid` to
    `conversation_id: ConversationId`. Not only does this give us more type
    safety, but it also unifies our terminology across Codex: with the
    implementation of session resuming, "conversation" (which can span
    multiple sessions) is the more appropriate term.
    
    I started this impl on https://github.com/openai/codex/pull/3219 as part
    of getting resume working in the extension but it's big enough that it
    should be broken out.
  • Move token usage/context information to session level (#3221)
    Move context information into the main loop so it can be used to
    interrupt the loop or start auto-compaction.
  • Never store requests (#3212)
    When item IDs are sent to the Responses API, it loads those items from
    the database and ignores the provided values. This adds extra latency.
    
    Removing the option to store requests also lets us simplify the code.
    
    ## Breaking change
    
    The `disable_response_storage` configuration option is removed.
  • chore: improve serialization of ServerNotification (#3193)
    This PR introduces a new
    `OutgoingMessage::AppServerNotification` variant that is designed to
    wrap a `ServerNotification`, which makes the serialization more
    straightforward compared to
    `OutgoingMessage::Notification(OutgoingNotification)`. We still use the
    latter for serializing an `Event` as a `JSONRPCMessage::Notification`,
    but I will try to get away from that in the near future.
    
    With this change, now the generated TypeScript type for
    `ServerNotification` is:
    
    ```typescript
    export type ServerNotification =
      | { "method": "authStatusChange", "params": AuthStatusChangeNotification }
      | { "method": "loginChatGptComplete", "params": LoginChatGptCompleteNotification };
    ```
    
    whereas before it was:
    
    ```typescript
    export type ServerNotification =
      | { type: "auth_status_change"; data: AuthStatusChangeNotification }
      | { type: "login_chat_gpt_complete"; data: LoginChatGptCompleteNotification };
    ```
    
    Once the `Event`s are migrated to the `ServerNotification` enum in Rust,
    it should be considerably easier to work with notifications on the
    TypeScript side, as it will be possible to `switch (message.method)` and
    check for exhaustiveness.
    
    Though we will probably need to introduce:
    
    ```typescript
    export type ServerMessage = ServerRequest | ServerNotification;
    ```
    
    and then we still need to group all of the `ServerResponse` types
    together, as well.
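
    The exhaustiveness check mentioned above relies on discriminated unions
    plus a `never` assertion. A sketch against the generated
    `ServerNotification` shape, with hypothetical payload fields for the two
    notification types:

    ```typescript
    // Hypothetical payload shapes; the generated types may differ.
    interface AuthStatusChangeNotification { authMethod: string | null }
    interface LoginChatGptCompleteNotification { success: boolean }

    type ServerNotification =
      | { method: "authStatusChange"; params: AuthStatusChangeNotification }
      | { method: "loginChatGptComplete"; params: LoginChatGptCompleteNotification };

    function assertNever(x: never): never {
      throw new Error(`unhandled notification: ${JSON.stringify(x)}`);
    }

    function handle(message: ServerNotification): string {
      switch (message.method) {
        case "authStatusChange":
          return `auth method is now ${message.params.authMethod ?? "none"}`;
        case "loginChatGptComplete":
          return message.params.success ? "login succeeded" : "login failed";
        default:
          // A new variant without a matching case becomes a compile error here.
          return assertNever(message);
      }
    }
    ```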
  • MCP: add session resume + history listing; (#3185)
  • Correctly calculate remaining context size (#3190)
    We had multiple issues with the context size calculation:
    1. The `initial_prompt_tokens` calculation based on cache size is not
    reliable; cache misses might set it to a much higher value. For now it
    is hardcoded to a safer constant.
    2. The input context size for GPT-5 is 272k (that's where 33% came from).
    
    Fixes.
  • [mcp-server] Update read config interface (#3093)
    ## Summary
    Follow-up to #3056
    
    This PR updates the mcp-server interface for reading the config settings
    saved by the user. At the risk of introducing _another_ Config struct, I
    think it makes sense to avoid tying our protocol to ConfigToml, as it has
    become a bit unwieldy. GetConfigTomlResponse was already a de facto
    struct for this; better to make it explicit, in my opinion.
    
    This is technically a breaking change of the mcp-server protocol, but
    given the previous interface was introduced so recently in #2725, and we
    have not yet even started to call it, I propose proceeding with the
    breaking change - but am open to preserving the old endpoint.
    
    ## Testing
    - [x] Added additional integration test coverage