Commit Graph

62 Commits

  • fix: separate codex mcp into codex mcp-server and codex app-server (#4471)
    This is a very large PR with some non-backwards-compatible changes.
    
    Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish
    server that had two overlapping responsibilities:
    
    - Running an MCP server, providing some basic tool calls.
    - Running the app server used to power experiences such as the VS Code
    extension.
    
    This PR aims to separate these into distinct concepts:
    
    - `codex mcp-server` for the MCP server
    - `codex app-server` for the "application server"
    
    Note `codex mcp` still exists because it already has its own subcommands
    for MCP management (`list`, `add`, etc.)
    
    The MCP logic continues to live in `codex-rs/mcp-server` whereas the
    refactored app server logic is in the new `codex-rs/app-server` folder.
    Note that most of the existing integration tests in
    `codex-rs/mcp-server/tests/suite` were actually for the app server, so
    all the tests have been moved with the exception of
    `codex-rs/mcp-server/tests/suite/mod.rs`.
    
    Because this is already a large diff, I tried not to change more than I
    had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses
    the name `McpProcess` for now, but I will do some mechanical renamings
    to things like `AppServer` in subsequent PRs.
    
    While `mcp-server` and `app-server` share some functionality (like
    reading streams of JSONL and dispatching based on message types), they
    also differ (completely different message types). I ended up doing a
    bit of copypasta between the two crates, as both have somewhat similar
    `message_processor.rs` and `outgoing_message.rs` files for now, though
    I expect them to diverge more in the near future.
    
    One material change is that of the initialize handshake for `codex
    app-server`, as we no longer use the MCP types for that handshake.
    Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an
    `Initialize` variant to `ClientRequest`, which takes the `ClientInfo`
    object we need to update the `USER_AGENT_SUFFIX` in
    `codex-rs/app-server/src/message_processor.rs`.
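
    A minimal sketch of what such a handshake request could look like.
    The field names on `ClientInfo`, the `NewConversation` placeholder
    variant, the `user_agent_suffix` helper, and the suffix format are
    assumptions for illustration; the real definitions live in
    `codex-rs/protocol/src/mcp_protocol.rs` and may differ.

    ```rust
    // Hypothetical, simplified shapes (not the actual protocol types).
    #[derive(Debug, Clone, PartialEq)]
    pub struct ClientInfo {
        pub name: String,
        pub version: String,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum ClientRequest {
        // The handshake carries the client info directly instead of MCP types.
        Initialize { client_info: ClientInfo },
        // Placeholder for the other request variants.
        NewConversation,
    }

    /// Derives a user-agent suffix from an `Initialize` request.
    /// Returns `None` for every other request. The format is an assumption.
    pub fn user_agent_suffix(request: &ClientRequest) -> Option<String> {
        match request {
            ClientRequest::Initialize { client_info } => {
                Some(format!("{}; {}", client_info.name, client_info.version))
            }
            _ => None,
        }
    }
    ```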
    
    One other material change is in
    `codex-rs/app-server/src/codex_message_processor.rs` where I eliminated
    a use of the `send_event_as_notification()` method I am generally trying
    to deprecate (because it blindly maps an `EventMsg` into a
    `JSONNotification`) in favor of `send_server_notification()`, which
    takes a `ServerNotification`, as that is intended to be a custom enum of
    all notification types supported by the app server. So to make this
    update, I had to introduce a new variant of `ServerNotification`,
    `SessionConfigured`, which is a non-backwards compatible change with the
    old `codex mcp`, and clients will have to be updated after the next
    release that contains this PR. Note that
    `codex-rs/app-server/tests/suite/list_resume.rs` also had to be updated
    to reflect this change.
    
    I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility
    crate to avoid some of the copying between `mcp-server` and
    `app-server`.
  • [mcp-server] Expose fuzzy file search in MCP (#2677)
    ## Summary
    Expose a simple fuzzy file search implementation for mcp clients to work
    with
    
    ## Testing
    - [x] Tested locally
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • feat: update default (#4076)
    Changes:
    - Default model and docs now use gpt-5-codex. 
    - Disables the GPT-5 Codex NUX by default.
    - Keeps presets available for API key users.
  • chore: clippy on redundant closure (#4058)
    Add redundant closure clippy rules and let Codex fix the violations by
    minimising fully qualified paths (FQP)
  • chore: unify cargo versions (#4044)
    Unify cargo versions at root
  • Switch to uuid_v7 and tighten ConversationId usage (#3819)
    Make sure conversations have a timestamp.
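
    For context, a UUIDv7 stores a Unix timestamp in milliseconds in its
    first 48 bits, which is why an id of this kind carries a creation
    time. The sketch below decodes it with the standard library only;
    `uuid_v7_unix_millis` is a hypothetical helper and is not taken from
    this PR.

    ```rust
    /// Extracts the Unix timestamp (milliseconds) from a UUIDv7 string.
    /// Returns `None` if the input is not a well-formed version 7 UUID.
    pub fn uuid_v7_unix_millis(uuid: &str) -> Option<u64> {
        // Drop the hyphens so that we are left with 32 hex digits.
        let hex: String = uuid.chars().filter(|c| *c != '-').collect();
        if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
            return None;
        }
        // The version nibble is the 13th hex digit (index 12).
        if hex.as_bytes()[12] != b'7' {
            return None;
        }
        // The first 12 hex digits (48 bits) hold the timestamp.
        u64::from_str_radix(&hex[..12], 16).ok()
    }
    ```

    For the example value `017F22E2-79B0-7CC3-98C4-DC0C0C07398F`, the
    function returns `Some(1645557742000)`.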
  • Fix get_auth_status response when using custom provider (#3581)
    This PR addresses an edge-case bug that appears in the VS Code extension
    in the following situation:
    1. Log in using ChatGPT (using either the CLI or extension). This will
    create an `auth.json` file.
    2. Manually modify `config.toml` to specify a custom provider.
    3. Start a fresh copy of the VS Code extension.
    
    The profile menu in the VS Code extension will indicate that you are
    logged in using ChatGPT even though you're not.
    
    This is caused by the `get_auth_status` method returning an
    `auth_method: 'chatgpt'` when a custom provider is configured and it
    doesn't use OpenAI auth (i.e. `requires_openai_auth` is false). The
    method should always return `auth_method: None` if
    `requires_openai_auth` is false.
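
    The intended rule can be sketched as follows. `AuthMode` and
    `reported_auth_method` are hypothetical names chosen for this
    illustration, not the actual types in the codebase.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum AuthMode {
        ApiKey,
        ChatGpt,
    }

    /// Returns the auth method to report to the client.
    /// `stored` is whatever was previously persisted in `auth.json`.
    pub fn reported_auth_method(
        requires_openai_auth: bool,
        stored: Option<AuthMode>,
    ) -> Option<AuthMode> {
        if requires_openai_auth {
            stored
        } else {
            // A custom provider that does not use OpenAI auth never
            // reports a method, even if a stale login exists on disk.
            None
        }
    }
    ```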
    
    The same bug also causes the NUX (new user experience) screen to be
    displayed in the VSCE in this situation.
  • feat: reasoning effort as optional (#3527)
    Allow the reasoning effort to be optional
  • feat: change the behavior of SetDefaultModel RPC so None clears the value. (#3529)
    It turns out that we want slightly different behavior for the
    `SetDefaultModel` RPC because some models do not work with reasoning
    (like GPT-4.1), so we should be able to explicitly clear this value.
    
    Verified in `codex-rs/mcp-server/tests/suite/set_default_model.rs`.
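
    A minimal sketch of the "`None` clears the value" semantics. The
    struct and function names here are hypothetical, and plain `String`
    values stand in for whatever types the real RPC uses.

    ```rust
    /// Stand-in for the defaults persisted in the user's `config.toml`.
    #[derive(Debug, Default, Clone, PartialEq)]
    pub struct StoredDefaults {
        pub model: Option<String>,
        pub reasoning_effort: Option<String>,
    }

    /// Stand-in for the parameters of the `SetDefaultModel` request.
    pub struct SetDefaultModelParams {
        pub model: Option<String>,
        pub reasoning_effort: Option<String>,
    }

    /// Assigns both fields unconditionally, so `None` clears a previously
    /// stored value instead of leaving it untouched.
    pub fn apply_set_default_model(
        stored: &mut StoredDefaults,
        params: SetDefaultModelParams,
    ) {
        stored.model = params.model;
        stored.reasoning_effort = params.reasoning_effort;
    }
    ```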
  • feat: added SetDefaultModel to JSON-RPC server (#3512)
    This adds `SetDefaultModel`, which takes `model` and `reasoning_effort`
    as optional fields. If set, the field will overwrite what is in the
    user's `config.toml`.
    
    This reuses logic that was added to support the `/model` command in the
    TUI: https://github.com/openai/codex/pull/2799.
  • feat: include reasoning_effort in NewConversationResponse (#3506)
    `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
  • Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189)
    This PR does the following:
    * Adds the ability to paste or type an API key.
    * Removes the `preferred_auth_method` config option. The last login
    method is always persisted in auth.json, so this isn't needed.
    * If OPENAI_API_KEY env variable is defined, the value is used to
    prepopulate the new UI. The env variable is otherwise ignored by the
    CLI.
    * Adds a new MCP server entry point "login_api_key" so we can implement
    this same API key behavior for the VS Code extension.
    <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
    src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
    />
    <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
    src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
    />
  • feat: add UserInfo request to JSON-RPC server (#3428)
    This adds a simple endpoint that provides the email address encoded in
    `$CODEX_HOME/auth.json`.
    
    As noted, for now, we do not hit the server to verify this is the user's
    true email address.
  • fix: ensure output of codex-rs/mcp-types/generate_mcp_types.py matches codex-rs/mcp-types/src/lib.rs (#3439)
    https://github.com/openai/codex/pull/3395 updated `mcp-types/src/lib.rs`
    by hand, but that file is generated code that is produced by
    `mcp-types/generate_mcp_types.py`. Unfortunately, we do not have
    anything in CI to verify this right now, but I will address that in a
    subsequent PR.
    
    #3395 ended up introducing a change that added a required field when
    deserializing `InitializeResult`, breaking Codex when used as an MCP
    client, so the quick fix in #3436 was to make the new field `Optional`
    with `skip_serializing_if = "Option::is_none"`, but that did not address
    the problem that `mcp-types/generate_mcp_types.py` and
    `mcp-types/src/lib.rs` are out of sync.
    
    This PR gets things back to where they are in sync. It removes the
    custom `mcp_types::McpClientInfo` type that was added to
    `mcp-types/src/lib.rs` and forces us to use the generated
    `mcp_types::Implementation` type. Though this PR also updates
    `generate_mcp_types.py` to generate the additional `user_agent:
    Optional<String>` field on `Implementation` so that we can continue to
    specify it when Codex operates as an MCP server.
    
    However, this also requires us to specify `user_agent: None` when Codex
    operates as an MCP client.
    
    We may want to introduce our own `InitializeResult` type that is
    specific to when we run as a server to avoid this in the future, but my
    immediate goal is just to get things back in sync.
  • Improved resiliency of two auth-related tests (#3427)
    This PR improves two existing auth-related tests. They were failing when
    run in an environment where an `OPENAI_API_KEY` env variable was
    defined. The change makes them more resilient.
  • Set a user agent suffix when used as a mcp server (#3395)
    This automatically adds a user agent suffix whenever the CLI is used as
    an MCP server
  • Introduce rollout items (#3380)
    This PR introduces Rollout items. This enables us to rollout eventmsgs
    and session meta.
    
    This is mostly #3214 with rebase on main
  • Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388)
    The previous config approach had a few issues:
    1. It is part of the config but not designed to be used externally
    2. It had to be wired through many places (look at the +/- on this PR)
    3. It wasn't guaranteed to be set consistently everywhere because we
    don't have a super well defined way that configs stack. For example, the
    extension would configure during newConversation but anything that
    happened outside of that (like login) wouldn't get it.
    
    This env var approach is cleaner and also creates one less thing we have
    to deal with when coming up with a better holistic story around configs.
    
    One downside is that I removed the unit test for the override because I
    don't want to deal with setting the global env or spawning child
    processes and figuring out how to introspect their originator header.
    The new code is sufficiently simple, and I tested it e2e, so I feel
    this trade-off is still worth it.
  • feat: add ArchiveConversation to ClientRequest (#3353)
    Adds support for `ArchiveConversation` in the JSON-RPC server that takes
    a `(ConversationId, PathBuf)` pair and:
    
    - verifies the `ConversationId` corresponds to the rollout id at the
    `PathBuf`
    - if so, invokes
    `ConversationManager.remove_conversation(ConversationId)`
    - if the `CodexConversation` was in memory, send `Shutdown` and wait for
    `ShutdownComplete` with a timeout
    - moves the `.jsonl` file to `$CODEX_HOME/archived_sessions`
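
    The file-system part of the steps above can be sketched as follows.
    This is an illustration under assumptions: it supposes the rollout
    file name ends with the conversation id, the function names are
    hypothetical, and the in-memory `Shutdown` / `ShutdownComplete` step
    is omitted.

    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};

    /// Returns true if the file name is `<id>.jsonl` or ends with
    /// `-<id>.jsonl`. The naming scheme is an assumption of this sketch.
    pub fn rollout_matches(conversation_id: &str, rollout_path: &Path) -> bool {
        let exact = format!("{conversation_id}.jsonl");
        let suffix = format!("-{conversation_id}.jsonl");
        rollout_path
            .file_name()
            .and_then(|name| name.to_str())
            .map(|name| name == exact || name.ends_with(&suffix))
            .unwrap_or(false)
    }

    /// Verifies the id against the path, then moves the rollout file into
    /// `<codex_home>/archived_sessions` and returns the new location.
    pub fn archive_rollout(
        codex_home: &Path,
        conversation_id: &str,
        rollout_path: &Path,
    ) -> io::Result<PathBuf> {
        if !rollout_matches(conversation_id, rollout_path) {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "conversation id does not match rollout path",
            ));
        }
        let file_name = rollout_path.file_name().ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "rollout path has no file name")
        })?;
        let archive_dir = codex_home.join("archived_sessions");
        fs::create_dir_all(&archive_dir)?;
        let destination = archive_dir.join(file_name);
        fs::rename(rollout_path, &destination)?;
        Ok(destination)
    }
    ```

    Checking the id before touching the file system means a mismatched
    `(ConversationId, PathBuf)` pair fails without moving anything.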
    
    ---------
    
    Co-authored-by: Gabriel Peal <gabriel@openai.com>
  • fix: include rollout_path in NewConversationResponse (#3352)
    Adding the `rollout_path` to the `NewConversationResponse` makes it so a
    client can perform subsequent operations on a `(ConversationId,
    PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352).
    * #3353
    * __->__ #3352
  • feat: Run cargo shear during CI (#3338)
    Run cargo shear as part of the CI to ensure no unused dependencies
  • Generate more typescript types and return conversation id with ConversationSummary (#3219)
    This PR does multiple things that are necessary for conversation resume
    to work from the extension. I wanted to make sure everything worked so
    these changes wound up in one PR:
    1. Generate more ts types
    2. Resume rollout history files rather than create a new one every time
    it is resumed so you don't see a duplicate conversation in history for
    every resume. Chatted with @aibrahim-oai to verify this
    3. Return conversation_id in conversation summaries
    4. [Cleanup] Use serde and strong types for a lot of the rollout file
    parsing
  • Add a getUserAgent MCP method (#3320)
    This will allow the extension to pass this user agent + a suffix for its
    requests
  • Use ConversationId instead of raw Uuids (#3282)
    We're trying to migrate from `session_id: Uuid` to `conversation_id:
    ConversationId`. Not only does this give us more type safety but it
    unifies our terminology across Codex and with the implementation of
    session resuming, a conversation (which can span multiple sessions) is
    more appropriate.
    
    I started this impl on https://github.com/openai/codex/pull/3219 as part
    of getting resume working in the extension but it's big enough that it
    should be broken out.
  • Never store requests (#3212)
    When item ids are sent to the Responses API, it loads the items from
    the database, ignoring the provided values. This adds extra latency.
    
    Not having the mode to store requests also allows us to simplify the
    code.
    
    ## Breaking change
    
    The `disable_response_storage` configuration option is removed.
  • MCP: add session resume + history listing; (#3185)
  • [mcp-server] Update read config interface (#3093)
    ## Summary
    Follow-up to #3056
    
    This PR updates the mcp-server interface for reading the config settings
    saved by the user. At risk of introducing _another_ Config struct, I
    think it makes sense to avoid tying our protocol to ConfigToml, as its
    become a bit unwieldy. GetConfigTomlResponse was a de-facto struct for
    this already - better to make it explicit, in my opinion.
    
    This is technically a breaking change of the mcp-server protocol, but
    given the previous interface was introduced so recently in #2725, and we
    have not yet even started to call it, I propose proceeding with the
    breaking change - but am open to preserving the old endpoint.
    
    ## Testing
    - [x] Added additional integration test coverage
  • Move CodexAuth and AuthManager to the core crate (#3074)
    Fix a long standing layering issue.
  • chore: print stderr from MCP server to test output using eprintln! (#2849)
    Related to https://github.com/openai/codex/pull/2848, I don't see the
    stderr from `codex mcp` colocated with the other stderr from
    `test_shell_command_approval_triggers_elicitation()` when it fails even
    though we have `RUST_LOG=debug` set when we spawn `codex mcp`:
    
    
    https://github.com/openai/codex/blob/1e9e703b969d3f0965b31d1cc3d70fed3ebdd6f6/codex-rs/mcp-server/tests/common/mcp_process.rs#L65
    
    Let's try this new logic which should be more explicit.
  • chore: try to make it easier to debug the flakiness of test_shell_command_approval_triggers_elicitation (#2848)
    `test_shell_command_approval_triggers_elicitation()` is one of a number
    of integration tests that we have observed to be flaky on GitHub CI, so
    this PR tries to reduce the flakiness _and_ to provide us with more
    information when it flakes. Specifically:
    
    - Changed the command that we use to trigger the elicitation from `git
    init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'`
    because running `git` seems more likely to invite variance.
    - Increased the timeout to wait for the task response from 10s to 20s.
    - Added more logging.
  • [mcp-server] Add GetConfig endpoint (#2725)
    ## Summary
    Adds a GetConfig request to the MCP Protocol, so MCP clients can
    evaluate the resolved config.toml settings which the harness is using.
    
    ## Testing
    - [x] Added an end to end test of the endpoint
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.
  • Fix flakiness in shell command approval test (#2547)
    ## Summary
    - read the shell exec approval request's actual id instead of assuming
    it is always 0
    - use that id when validating and responding in the test
    
    ## Testing
    - `cargo test -p codex-mcp-server
    test_shell_command_approval_triggers_elicitation`
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68a6ab9c732c832c81522cbf11812be0
  • Add AuthManager and enhance GetAuthStatus command (#2577)
    This PR adds a central `AuthManager` struct that manages the auth
    information used across conversations and the MCP server. Prior to this,
    each conversation and the MCP server got their own private snapshots of
    the auth information, and changes to one (such as a logout or token
    refresh) were not seen by others.
    
    This is especially problematic when multiple instances of the CLI are
    run. For example, consider the case where you start CLI 1 and log in to
    ChatGPT account X and then start CLI 2 and log out and then log in to
    ChatGPT account Y. The conversation in CLI 1 is still using account X,
    but if you create a new conversation, it will suddenly (and
    unexpectedly) switch to account Y.
    
    With the `AuthManager`, auth information is read from disk at the time
    the `ConversationManager` is constructed, and it is cached in memory.
    All new conversations use this same auth information, as do any token
    refreshes.
    
    The `AuthManager` is also used by the MCP server's GetAuthStatus
    command, which now returns the auth method currently used by the MCP
    server.
    
    This PR also includes an enhancement to the GetAuthStatus command. It
    now accepts two new (optional) input parameters: `include_token` and
    `refresh_token`. Callers can use this to request the in-use auth token
    and can optionally request to refresh the token.
    
    The PR also adds tests for the login and auth APIs that I recently added
    to the MCP server.
  • chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423)
    The existing `wire_format.rs` should share more types with the
    `codex-protocol` crate (like `AskForApproval` instead of maintaining a
    parallel `CodexToolCallApprovalPolicy` enum), so this PR moves
    `wire_format.rs` into `codex-protocol`, renaming it as
    `mcp_protocol.rs`. We also de-dupe types, where appropriate.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423).
    * #2424
    * __->__ #2423
  • fix: introduce EventMsg::TurnAborted (#2365)
    Introduces `EventMsg::TurnAborted` that should be sent in response to
    `Op::Interrupt`.
    
    In the MCP server, updates the handling of a
    `ClientRequest::InterruptConversation` request such that it sends the
    `Op::Interrupt` but does not respond to the request until it sees an
    `EventMsg::TurnAborted`.
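
    The "do not respond until `TurnAborted` is seen" behavior can be
    sketched with a blocking channel. This is only an illustration: the
    server is presumably asynchronous, and the simplified `EventMsg`
    enum, the `wait_for_turn_aborted` helper, and the timeout are all
    assumptions of this sketch.

    ```rust
    use std::sync::mpsc::Receiver;
    use std::time::Duration;

    /// Simplified stand-in for the real event type.
    pub enum EventMsg {
        AgentMessage(String),
        TurnAborted,
    }

    /// Drains events until `TurnAborted` arrives. Returns false if the
    /// channel closes or no event arrives within `timeout` (the timeout
    /// applies to each individual receive, not to the whole wait).
    pub fn wait_for_turn_aborted(events: &Receiver<EventMsg>, timeout: Duration) -> bool {
        loop {
            match events.recv_timeout(timeout) {
                Ok(EventMsg::TurnAborted) => return true,
                // Unrelated events do not complete the interrupt request.
                Ok(_) => continue,
                Err(_) => return false,
            }
        }
    }
    ```

    Only after this returns `true` would the handler send the response
    to the `InterruptConversation` request.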
  • feat: introduce ClientRequest::SendUserTurn (#2345)
    This adds a new request type, `SendUserTurn`, that makes it possible to
    submit a `Op::UserTurn` operation (introduced in #2329) to a
    conversation. This PR also adds a new integration test that verifies
    that changing from `AskForApproval::UnlessTrusted` to
    `AskForApproval::Never` mid-conversation ensures that an elicitation is
    no longer sent for running `python3 -c print(42)`.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345).
    * __->__ #2345
    * #2329
    * #2343
    * #2340
    * #2338
  • fix: try to fix flakiness in test_shell_command_approval_triggers_elicitation (#2344)
    I still see flakiness in
    `test_shell_command_approval_triggers_elicitation()` on occasion where
    `MockServer` claims it has not received all of its expected requests.
    
    I recently introduced a similar type of test in #2264,
    `test_codex_jsonrpc_conversation_flow()`, which I have not seen flake
    (yet!), so this PR pulls over two things I did in that test:
    
    - increased `worker_threads` from `2` to `4`
    - added an assertion to make sure the `task_complete` notification is
    received
    
    Honestly, I'm still not sure why `MockServer` claims it sometimes does
    not receive all its expected requests given that we assert that the
    final `JSONRPCResponse` is read on the stream, but let's give this a
    shot.
    
    Assuming this fixes things, my hypothesis is that the increase in
    `worker_threads` helps because perhaps there are async tasks in
    `MockServer` that do not reliably complete fully when there are not
    enough threads available? If that is correct, it seems like the test
    would still be flaky, though perhaps with lower frequency?
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * moved a bunch of allows to expects where possible
    
    Note: in integration tests, non-`#[test]` helper functions are not
    covered by this, so we had to leave a few lingering
    `expect(expect_used)` checks around.
  • fix: verify notifications are sent with the conversationId set (#2278)
    This updates `CodexMessageProcessor` so that each notification it sends
    for an `EventMsg` from a `CodexConversation` has the following
    properties:
    
    - The `params` always has an appropriate `conversationId` field.
    - The `method` now includes the name of the `EventMsg` type rather
    than using `codex/event` as the `method` type for all notifications. (We
    currently prefix the method name with `codex/event/`, but I think that
    should go away once we formalize the notification schema in
    `wire_format.rs`.)
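
    A minimal sketch of the resulting notification shape, using a
    hypothetical `EventNotification` struct in place of the real JSON-RPC
    notification type. The event name passed in is whatever identifies
    the `EventMsg` variant.

    ```rust
    /// Simplified stand-in for an outgoing notification.
    #[derive(Debug, Clone, PartialEq)]
    pub struct EventNotification {
        /// For example `codex/event/task_complete`.
        pub method: String,
        /// Serialized as `conversationId` inside `params` on the wire.
        pub conversation_id: String,
    }

    /// Builds a notification whose method is prefixed with `codex/event/`
    /// and that always carries the conversation id.
    pub fn event_notification(conversation_id: &str, event_type: &str) -> EventNotification {
        EventNotification {
            method: format!("codex/event/{event_type}"),
            conversation_id: conversation_id.to_string(),
        }
    }
    ```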
    
    As part of this, we update `test_codex_jsonrpc_conversation_flow()` to
    verify that the `task_finished` notification has made it through the
    system instead of sleeping for 5s and "hoping" the server finished
    processing the task. Note we have seen some flakiness in some of our
    other, similar integration tests, and I expect adding a similar check
    would help in those cases, as well.
  • feat: support traditional JSON-RPC request/response in MCP server (#2264)
    This introduces a new set of request types that our `codex mcp`
    supports. Note that these do not conform to MCP tool calls so that
    instead of having to send something like this:
    
    ```json
    {
      "jsonrpc": "2.0",
      "method": "tools/call",
      "id": 42,
      "params": {
        "name": "newConversation",
        "arguments": {
          "model": "gpt-5",
          "approvalPolicy": "on-request"
        }
      }
    }
    ```
    
    we can send something like this:
    
    
    ```json
    {
      "jsonrpc": "2.0",
      "method": "newConversation",
      "id": 42,
      "params": {
        "model": "gpt-5",
        "approvalPolicy": "on-request"
      }
    }
    ```
    
    Admittedly, this new format is not a valid MCP tool call, but we are OK
    with that right now. (That is, not everything we might want to request
    of `codex mcp` is something that is appropriate for an autonomous agent
    to do.)
    
    To start, this introduces four request types:
    
    - `newConversation`
    - `sendUserMessage`
    - `addConversationListener`
    - `removeConversationListener`
    
    The new `mcp-server/tests/codex_message_processor_flow.rs` shows how
    these can be used.
    
    The types are defined on the `CodexRequest` enum, so we introduce a new
    `CodexMessageProcessor` that is responsible for dealing with requests
    from this enum. The top-level `MessageProcessor` has been updated so
    that when `process_request()` is called, it first checks whether the
    request conforms to `CodexRequest` and dispatches it to
    `CodexMessageProcessor` if so.
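
    The dispatch order described above can be sketched as follows. For
    brevity this matches on the method name only, whereas the text says
    the check is whether the whole request conforms to `CodexRequest`;
    the `Route` enum and both function names are hypothetical.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum CodexRequest {
        NewConversation,
        SendUserMessage,
        AddConversationListener,
        RemoveConversationListener,
    }

    impl CodexRequest {
        /// Maps a camelCase wire method name to a request type, if any.
        pub fn from_method(method: &str) -> Option<Self> {
            match method {
                "newConversation" => Some(Self::NewConversation),
                "sendUserMessage" => Some(Self::SendUserMessage),
                "addConversationListener" => Some(Self::AddConversationListener),
                "removeConversationListener" => Some(Self::RemoveConversationListener),
                _ => None,
            }
        }
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Route {
        /// Handled by the `CodexMessageProcessor`.
        Codex(CodexRequest),
        /// Falls through to the regular MCP handling (e.g. `tools/call`).
        Mcp,
    }

    /// Tries the Codex-specific requests first, then falls back to MCP.
    pub fn route(method: &str) -> Route {
        match CodexRequest::from_method(method) {
            Some(request) => Route::Codex(request),
            None => Route::Mcp,
        }
    }
    ```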
    
    Note that I also decided to use `camelCase` for the on-the-wire format,
    as that seems to be the convention for MCP.
    
    For the moment, the new protocol is defined in `wire_format.rs` within
    the `mcp-server` crate, but in a subsequent PR, I will probably move it
    to its own crate to ensure the protocol has minimal dependencies and
    that we can codegen a schema from it.
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2264).
    * #2278
    * __->__ #2264
  • [env] Remove git config for now (#1884)
    ## Summary
    Forgot to remove this in #1869 last night! Too much of a performance hit
    on the main thread. We can bring it back via an async thread on startup.
  • [prompts] Add <environment_context> (#1869)
    ## Summary
    Includes a new user message in the api payload which provides useful
    environment context for the model, so it knows about things like the
    current working directory and the sandbox.
    
    ## Testing
    Updated unit tests
  • [tests] Investigate flakey mcp-server test (#1877)
    ## Summary
    Have seen these tests flaking over the course of today on different
    boxes. `wiremock` seems to be generally written with tokio/threads in
    mind but based on the weird panics from the tests, let's see if this
    helps.
  • Fix flaky test_shell_command_approval_triggers_elicitation test (#1802)
    This doesn't flake very often but this should fix it.
  • MCP: add conversation.create tool [Stack 2/2] (#1783)
    Introduce conversation.create handler (handle_create_conversation) and
    wire it in MessageProcessor.
    
    Stack:
    Top: #1783 
    Bottom: #1784
    
    ---------
    
    Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>