Commit Graph

61 Commits

  • Handle resuming/forking after compact (#3533)
    We need to construct the history differently when a compact happens. For
    this, we consider only the history after the compact and convert the
    compact to a response item.
    
    This needs to change to use `build_compact_history` once #3446 is
    merged.
  • Review Mode (Core) (#3401)
    ## 📝 Review Mode -- Core
    
    This PR introduces the Core implementation for Review mode:
    
    - New op `Op::Review { prompt: String }`: spawns a child review task
    with isolated context, a review‑specific system prompt, and a
    `Config.review_model`.
    - `EnteredReviewMode`: emitted when the child review session starts.
    Every event from this point onwards reflects the review session.
    - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
    finishes or is interrupted, with optional structured findings:
    
    ```json
    {
      "findings": [
        {
          "title": "<≤ 80 chars, imperative>",
          "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
          "confidence_score": <float 0.0-1.0>,
          "priority": <int 0-3>,
          "code_location": {
            "absolute_file_path": "<file path>",
            "line_range": {"start": <int>, "end": <int>}
          }
        }
      ],
      "overall_correctness": "patch is correct" | "patch is incorrect",
      "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
      "overall_confidence_score": <float 0.0-1.0>
    }
    ```
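    
    As a rough illustration, the findings shape above could be mirrored in
    Rust as follows. This is a sketch, not the actual Core types: the struct
    names, the field types (`f32`, `u8`, `u32`), and the `validate_finding`
    helper are assumptions, and the `start <= end` check is not stated in the
    schema.
    
    ```rust
    /// Hypothetical mirror of `line_range` in the JSON above.
    #[derive(Debug, Clone, PartialEq)]
    pub struct LineRange {
        pub start: u32,
        pub end: u32,
    }
    
    /// Hypothetical mirror of `code_location`.
    #[derive(Debug, Clone, PartialEq)]
    pub struct CodeLocation {
        pub absolute_file_path: String,
        pub line_range: LineRange,
    }
    
    /// Hypothetical mirror of one entry in `findings`.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Finding {
        pub title: String,
        pub body: String,
        pub confidence_score: f32,
        pub priority: u8,
        pub code_location: CodeLocation,
    }
    
    /// Checks the ranges documented in the schema placeholders.
    pub fn validate_finding(f: &Finding) -> Result<(), String> {
        if f.title.chars().count() > 80 {
            return Err("title is longer than 80 chars".to_string());
        }
        if !(0.0..=1.0).contains(&f.confidence_score) {
            return Err("confidence_score must be within 0.0-1.0".to_string());
        }
        if f.priority > 3 {
            return Err("priority must be within 0-3".to_string());
        }
        // Assumption: a range whose start is after its end is invalid.
        if f.code_location.line_range.start > f.code_location.line_range.end {
            return Err("line_range start is after end".to_string());
        }
        Ok(())
    }
    ```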
    
    ## Questions
    
    ### Why separate out its own message history?
    
    We want the review thread to match the training of our review models as
    much as possible -- that means using a custom prompt, removing user
    instructions, and starting a clean chat history.
    
    We also want to make sure the review thread doesn't leak into the parent
    thread.
    
    ### Why do this as a mode, vs. sub-agents?
    
    1. We want review to be a synchronous task, so it's fine for now to do a
    bespoke implementation.
    2. We're still unclear about the final structure for sub-agents. We'd
    prefer to land this quickly and then refactor into sub-agents without
    rushing that implementation.
  • feat: context compaction (#3446)
    ## Compact feature:
    1. Stops the model when the context window becomes too large
    2. Adds a user turn asking the model to summarize
    3. Builds a bridge that contains all the previous user messages plus the
    summary, rendered from a template
    4. Starts sampling again from a clean conversation with only that bridge
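    
    The steps above can be sketched roughly as follows. Everything here is
    illustrative: the function names, the 90% threshold, and the bridge layout
    are assumptions standing in for the real trigger and template.
    
    ```rust
    /// Step 1: decide when to stop. The 90% threshold is an assumed value.
    pub fn should_compact(tokens_used: u64, context_window: u64) -> bool {
        tokens_used * 10 >= context_window * 9
    }
    
    /// Step 3: build the bridge from the prior user messages and the summary.
    /// The layout is a hypothetical stand-in for the real template.
    pub fn build_bridge(user_messages: &[&str], summary: &str) -> String {
        let mut out = String::from("Previous user messages:\n");
        for (i, msg) in user_messages.iter().enumerate() {
            out.push_str(&format!("{}. {}\n", i + 1, msg));
        }
        out.push_str("\nSummary of the conversation so far:\n");
        out.push_str(summary);
        out
    }
    
    /// Step 4: the new conversation starts with only the bridge.
    pub fn compacted_history(user_messages: &[&str], summary: &str) -> Vec<String> {
        vec![build_bridge(user_messages, summary)]
    }
    ```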
  • feat: reasoning effort as optional (#3527)
    Allow the reasoning effort to be optional
  • feat: change the behavior of SetDefaultModel RPC so None clears the value. (#3529)
    It turns out that we want slightly different behavior for the
    `SetDefaultModel` RPC because some models do not work with reasoning
    (like GPT-4.1), so we should be able to explicitly clear this value.
    
    Verified in `codex-rs/mcp-server/tests/suite/set_default_model.rs`.
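    
    A minimal sketch of the "None clears the value" semantics. The `Defaults`
    struct and `set_default_model` function are hypothetical, and applying the
    same rule to both fields is an assumption; the description above only
    confirms it for the reasoning value.
    
    ```rust
    /// Hypothetical stand-in for the persisted defaults in `config.toml`.
    #[derive(Debug, Default, PartialEq)]
    pub struct Defaults {
        pub model: Option<String>,
        pub reasoning_effort: Option<String>,
    }
    
    /// Assumed semantics: each field is written exactly as given, so `None`
    /// clears a previously stored value instead of leaving it untouched.
    pub fn set_default_model(
        defaults: &mut Defaults,
        model: Option<String>,
        reasoning_effort: Option<String>,
    ) {
        defaults.model = model;
        defaults.reasoning_effort = reasoning_effort;
    }
    ```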
  • feat: added SetDefaultModel to JSON-RPC server (#3512)
    This adds `SetDefaultModel`, which takes `model` and `reasoning_effort`
    as optional fields. If set, the field will overwrite what is in the
    user's `config.toml`.
    
    This reuses logic that was added to support the `/model` command in the
    TUI: https://github.com/openai/codex/pull/2799.
  • feat: include reasoning_effort in NewConversationResponse (#3506)
    `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
  • bug: default to image (#3501)
    Default the MIME type to image
  • Add Compact and Turn Context to the rollout items (#3444)
    Adding compact and turn context to the rollout items
    
    based on #3440
  • Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189)
    This PR does the following:
    * Adds the ability to paste or type an API key.
    * Removes the `preferred_auth_method` config option. The last login
    method is always persisted in auth.json, so this isn't needed.
    * If OPENAI_API_KEY env variable is defined, the value is used to
    prepopulate the new UI. The env variable is otherwise ignored by the
    CLI.
    * Adds a new MCP server entry point "login_api_key" so we can implement
    this same API key behavior for the VS Code extension.
    <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
    src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
    />
    <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
    src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
    />
  • Change forking to read the rollout from file (#3440)
    This PR changes the get history op into a get path op, so forking now
    uses a path. This gives us one unified codepath for resuming/forking
    conversations and helps keep the rollout history in order. It also fixes
    a bug where you won't see the UI when resuming after forking.
  • Unified execution (#3288)
    ## Unified PTY-Based Exec Tool
    
    Note: this requires setting this flag in the config:
    `use_experimental_unified_exec_tool=true`
    
    - Adds a PTY-backed interactive exec feature (“unified_exec”) with
    session reuse via session_id, bounded output (128 KiB), and timeout
    clamping (≤ 60 s).
    - Protocol: introduces ResponseItem::UnifiedExec { session_id,
    arguments, timeout_ms }.
    - Tools: exposes unified_exec as a function tool (Responses API);
    excluded from Chat Completions payload while still supported in tool
    lists.
    - Path handling: resolves commands via PATH (or explicit paths), with
    UTF‑8/newline‑aware truncation (truncate_middle).
    - Tests: cover command parsing, path resolution, session
    persistence/cleanup, multi‑session isolation, timeouts, and truncation
    behavior.
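    
    To make the clamping and truncation limits concrete, here is a small
    sketch. It is not the PR's `truncate_middle`: the helper names, the default
    used when no timeout is given, and the `...` marker are assumptions, and
    this version is only UTF-8 aware (it ignores newlines).
    
    ```rust
    pub const MAX_TIMEOUT_MS: u64 = 60_000;
    pub const MAX_OUTPUT_BYTES: usize = 128 * 1024;
    
    /// Clamps a requested timeout to at most 60 s.
    /// Assumption: a missing timeout falls back to the maximum.
    pub fn clamp_timeout_ms(requested: Option<u64>) -> u64 {
        requested.unwrap_or(MAX_TIMEOUT_MS).min(MAX_TIMEOUT_MS)
    }
    
    /// Keeps at most `max_bytes` bytes of `s` (head plus tail) and joins the
    /// two halves with a "..." marker, never splitting a UTF-8 character.
    pub fn truncate_middle_sketch(s: &str, max_bytes: usize) -> String {
        if s.len() <= max_bytes {
            return s.to_string();
        }
        let half = max_bytes / 2;
        // Move the head cut backwards until it lands on a char boundary.
        let mut head_end = half;
        while !s.is_char_boundary(head_end) {
            head_end -= 1;
        }
        // Move the tail cut forwards until it lands on a char boundary.
        let mut tail_start = s.len() - half;
        while !s.is_char_boundary(tail_start) {
            tail_start += 1;
        }
        format!("{}...{}", &s[..head_end], &s[tail_start..])
    }
    ```
    
    With these definitions, output would be bounded with
    `truncate_middle_sketch(&output, MAX_OUTPUT_BYTES)`.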
  • feat: add UserInfo request to JSON-RPC server (#3428)
    This adds a simple endpoint that provides the email address encoded in
    `$CODEX_HOME/auth.json`.
    
    As noted, for now, we do not hit the server to verify this is the user's
    true email address.
  • Added images to UserMessageEvent (#3400)
    This PR adds an `images` field to the existing `UserMessageEvent` so we
    can encode zero or more images associated with a user message. This
    allows images to be restored when conversations are restored.
  • Move initial history to protocol (#3422)
    To fix an edge case of forking then resuming
    
    #3419
  • Do not send reasoning item IDs (#3390)
    The Responses API no longer requires IDs on reasoning items.
    
    Fixes: https://github.com/openai/codex/issues/3292
  • feat: add ArchiveConversation to ClientRequest (#3353)
    Adds support for `ArchiveConversation` in the JSON-RPC server that takes
    a `(ConversationId, PathBuf)` pair and:
    
    - verifies the `ConversationId` corresponds to the rollout id at the
    `PathBuf`
    - if so, invokes
    `ConversationManager.remove_conversation(ConversationId)`
    - if the `CodexConversation` was in memory, send `Shutdown` and wait for
    `ShutdownComplete` with a timeout
    - moves the `.jsonl` file to `$CODEX_HOME/archived_sessions`
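    
    The final move step could look roughly like this. The function names are
    hypothetical and the id verification and shutdown steps are left out; only
    the `archived_sessions` directory name comes from the description above.
    
    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Computes where a rollout file would land under
    /// `$CODEX_HOME/archived_sessions`, keeping its file name.
    pub fn archived_destination(codex_home: &Path, rollout_path: &Path) -> Option<PathBuf> {
        let name = rollout_path.file_name()?;
        Some(codex_home.join("archived_sessions").join(name))
    }
    
    /// Moves the rollout file into the archive directory, creating the
    /// directory first if it does not exist yet.
    pub fn archive_rollout(codex_home: &Path, rollout_path: &Path) -> io::Result<PathBuf> {
        let dest = archived_destination(codex_home, rollout_path).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "rollout path has no file name")
        })?;
        if let Some(dir) = dest.parent() {
            fs::create_dir_all(dir)?;
        }
        fs::rename(rollout_path, &dest)?;
        Ok(dest)
    }
    ```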
    
    ---------
    
    Co-authored-by: Gabriel Peal <gabriel@openai.com>
  • fix: include rollout_path in NewConversationResponse (#3352)
    Adding the `rollout_path` to the `NewConversationResponse` makes it so a
    client can perform subsequent operations on a `(ConversationId,
    PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.
    
  • feat: Run cargo shear during CI (#3338)
    Run cargo shear as part of the CI to ensure no unused dependencies
  • Generate more typescript types and return conversation id with ConversationSummary (#3219)
    This PR does multiple things that are necessary for conversation resume
    to work from the extension. I wanted to make sure everything worked so
    these changes wound up in one PR:
    1. Generate more ts types
    2. Resume rollout history files rather than create a new one every time
    it is resumed so you don't see a duplicate conversation in history for
    every resume. Chatted with @aibrahim-oai to verify this
    3. Return conversation_id in conversation summaries
    4. [Cleanup] Use serde and strong types for a lot of the rollout file
    parsing
  • Format large numbers in a more readable way. (#2046)
    - In the bottom line of the TUI, print the number of tokens to 3 sigfigs
      with an SI suffix, e.g. "1.23K".
    - Elsewhere where we print a number, I figure it's worthwhile to print
      the exact number, because e.g. it's a summary of your session. Here we print
      the numbers comma-separated.
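    
    A sketch of the two formats, under assumptions: the function names and the
    K/M/G suffix set are illustrative, and values that round up across a
    boundary (for example 9,999 or 999,999) are not special-cased here.
    
    ```rust
    /// Formats to roughly 3 significant figures with an SI suffix.
    pub fn format_si(n: u64) -> String {
        const UNITS: [(u64, &str); 3] =
            [(1_000_000_000, "G"), (1_000_000, "M"), (1_000, "K")];
        for &(scale, suffix) in UNITS.iter() {
            if n >= scale {
                let v = n as f64 / scale as f64;
                // Fewer decimals as the integer part grows.
                let decimals: usize = if v >= 100.0 {
                    0
                } else if v >= 10.0 {
                    1
                } else {
                    2
                };
                return format!("{:.*}{}", decimals, v, suffix);
            }
        }
        n.to_string()
    }
    
    /// Formats the exact number with comma thousands separators.
    pub fn format_with_commas(n: u64) -> String {
        let digits = n.to_string();
        let mut out = String::with_capacity(digits.len() + digits.len() / 3);
        for (i, ch) in digits.chars().enumerate() {
            // Insert a comma whenever a multiple of three digits remains.
            if i > 0 && (digits.len() - i) % 3 == 0 {
                out.push(',');
            }
            out.push(ch);
        }
        out
    }
    ```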
  • Add a getUserAgent MCP method (#3320)
    This will allow the extension to pass this user agent + a suffix for its
    requests
  • Use ConversationId instead of raw Uuids (#3282)
    We're trying to migrate from `session_id: Uuid` to `conversation_id:
    ConversationId`. Not only does this give us more type safety, but it
    also unifies our terminology across Codex. With the implementation of
    session resuming, a conversation (which can span multiple sessions) is
    the more appropriate term.
    
    I started this impl on https://github.com/openai/codex/pull/3219 as part
    of getting resume working in the extension but it's big enough that it
    should be broken out.
  • Move token usage/context information to session level (#3221)
    Move context information into the main loop so it can be used to
    interrupt the loop or start auto-compaction.
  • Never store requests (#3212)
    When item ids are sent to Responses API it will load them from the
    database ignoring the provided values. This adds extra latency.
    
    Not having the mode to store requests also allows us to simplify the
    code.
    
    ## Breaking change
    
    The `disable_response_storage` configuration option is removed.
  • chore: improve serialization of ServerNotification (#3193)
    This PR introduces a new
    `OutgoingMessage::AppServerNotification` variant that is designed to
    wrap a `ServerNotification`, which makes the serialization more
    straightforward compared to
    `OutgoingMessage::Notification(OutgoingNotification)`. We still use the
    latter for serializing an `Event` as a `JSONRPCMessage::Notification`,
    but I will try to get away from that in the near future.
    
    With this change, now the generated TypeScript type for
    `ServerNotification` is:
    
    ```typescript
    export type ServerNotification =
      | { "method": "authStatusChange", "params": AuthStatusChangeNotification }
      | { "method": "loginChatGptComplete", "params": LoginChatGptCompleteNotification };
    ```
    
    whereas before it was:
    
    ```typescript
    export type ServerNotification =
      | { type: "auth_status_change"; data: AuthStatusChangeNotification }
      | { type: "login_chat_gpt_complete"; data: LoginChatGptCompleteNotification };
    ```
    
    Once the `Event`s are migrated to the `ServerNotification` enum in Rust,
    it should be considerably easier to work with notifications on the
    TypeScript side, as it will be possible to `switch (message.method)` and
    check for exhaustiveness.
    
    Though we will probably need to introduce:
    
    ```typescript
    export type ServerMessage = ServerRequest | ServerNotification;
    ```
    
    and then we still need to group all of the `ServerResponse` types
    together, as well.
  • MCP: add session resume + history listing; (#3185)
  • Correctly calculate remaining context size (#3190)
    We had multiple issues with context size calculation:
    1. The `initial_prompt_tokens` calculation based on cache size is not
    reliable; cache misses might set it to a much higher value. For now it
    is hardcoded to a safer constant.
    2. Input context size for GPT-5 is 272k (that's where 33% came from).
    
    Fixes.
  • [mcp-server] Update read config interface (#3093)
    ## Summary
    Follow-up to #3056
    
    This PR updates the mcp-server interface for reading the config settings
    saved by the user. At risk of introducing _another_ Config struct, I
    think it makes sense to avoid tying our protocol to ConfigToml, as it has
    become a bit unwieldy. GetConfigTomlResponse was a de-facto struct for
    this already - better to make it explicit, in my opinion.
    
    This is technically a breaking change of the mcp-server protocol, but
    given the previous interface was introduced so recently in #2725, and we
    have not yet even started to call it, I propose proceeding with the
    breaking change - but am open to preserving the old endpoint.
    
    ## Testing
    - [x] Added additional integration test coverage
  • fix: fix serde_as annotation and verify with test (#3170)
    I didn't do https://github.com/openai/codex/pull/3163 correctly the
    first time: now verified with a test.
  • fix: use a more efficient wire format for ExecCommandOutputDeltaEvent.chunk (#3163)
    When serializing to JSON, the existing solution created an enormous
    array of ints, which is far more bytes on the wire than a base64-encoded
    string would be.
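    
    The size difference can be estimated with a little arithmetic. The helper
    names below are hypothetical; the sketch only counts characters of the two
    JSON encodings (an array of decimal ints versus a padded base64 string in
    quotes) and does not perform any encoding.
    
    ```rust
    /// Length of the JSON text `[b0,b1,...]` for the given bytes.
    pub fn json_int_array_len(bytes: &[u8]) -> usize {
        let digits: usize = bytes
            .iter()
            .map(|&b| match b {
                0..=9 => 1,
                10..=99 => 2,
                _ => 3,
            })
            .sum();
        let commas = bytes.len().saturating_sub(1);
        2 + digits + commas // 2 for the surrounding brackets
    }
    
    /// Length of a padded base64 JSON string for `n` bytes, including quotes.
    pub fn json_base64_len(n: usize) -> usize {
        2 + 4 * ((n + 2) / 3)
    }
    ```
    
    For a 1024-byte chunk of values ≥ 100, this gives 4097 characters for the
    int array versus 1370 for base64.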
  • Dividing UserMsgs into categories to send it back to the tui (#3127)
    This PR does the following:
    
    - Divides user msgs into 3 categories: plain, user instructions, and
    environment context
    - Centralizes adding user instructions and environment context to a
    degree
    - Improves the integration testing
    
    Building on top of #3123
    
    Specifically this
    [comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
    We need to send the user message while ignoring the User Instructions
    and Environment Context we attach.
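    
    One way to picture the three categories is the sketch below. The enum,
    the function, and in particular the `<user_instructions>` and
    `<environment_context>` wrapper tags are assumptions made for
    illustration, not something this PR description confirms.
    
    ```rust
    /// Hypothetical categories for a user message.
    #[derive(Debug, PartialEq)]
    pub enum UserMessageKind {
        Plain,
        UserInstructions,
        EnvironmentContext,
    }
    
    /// Classifies a message by an assumed leading wrapper tag.
    pub fn classify(text: &str) -> UserMessageKind {
        let t = text.trim_start();
        if t.starts_with("<user_instructions>") {
            UserMessageKind::UserInstructions
        } else if t.starts_with("<environment_context>") {
            UserMessageKind::EnvironmentContext
        } else {
            UserMessageKind::Plain
        }
    }
    
    /// Only plain messages would be sent back to the TUI.
    pub fn is_shown_to_user(text: &str) -> bool {
        classify(text) == UserMessageKind::Plain
    }
    ```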
  • Replay EventMsgs from Response Items when resuming a session with history. (#3123)
    ### Overview
    
    This PR introduces the following changes:
    1. Adds a unified mechanism to convert ResponseItem into EventMsg.
    2. Ensures that when a session is initialized with initial history, a
    vector of EventMsg is sent along with the session configuration. This
    allows clients to re-render the UI accordingly.
    3. Adds integration testing.
    
    ### Caveats
    
    This implementation does not send every EventMsg that was previously
    dispatched to clients. The excluded events fall into two categories:
    - “Arguably” rolled-out events:
    Examples include tool calls and apply-patch calls. While these events
    are conceptually rolled out, we currently only roll out ResponseItems.
    These events are already being handled elsewhere and transformed into
    EventMsg before being sent.
    - Non-rolled-out events:
    Certain events such as TurnDiff, Error, and TokenCount are not rolled
    out at all.
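    
    The conversion and its caveat can be pictured with simplified stand-in
    types. Both enums and their variants below are hypothetical and much
    smaller than the real `ResponseItem` and `EventMsg`; the point is only
    that some items map to replayable events while tool calls are skipped.
    
    ```rust
    /// Simplified, hypothetical stand-in for `ResponseItem`.
    pub enum ResponseItemSketch {
        UserMessage(String),
        AssistantMessage(String),
        Reasoning(String),
        FunctionCall { name: String },
    }
    
    /// Simplified, hypothetical stand-in for `EventMsg`.
    #[derive(Debug, PartialEq)]
    pub enum EventMsgSketch {
        UserMessage(String),
        AgentMessage(String),
        AgentReasoning(String),
    }
    
    /// Converts history into the events replayed on resume. Tool calls
    /// produce no event here, matching the first caveat above.
    pub fn replay_events(history: &[ResponseItemSketch]) -> Vec<EventMsgSketch> {
        history
            .iter()
            .filter_map(|item| match item {
                ResponseItemSketch::UserMessage(t) => Some(EventMsgSketch::UserMessage(t.clone())),
                ResponseItemSketch::AssistantMessage(t) => {
                    Some(EventMsgSketch::AgentMessage(t.clone()))
                }
                ResponseItemSketch::Reasoning(t) => Some(EventMsgSketch::AgentReasoning(t.clone())),
                ResponseItemSketch::FunctionCall { .. } => None,
            })
            .collect()
    }
    ```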
    
    ### Future Directions
    
    At present, resuming a session involves maintaining two states:
    - UI State:
    Clients can replay most of the important UI from the provided EventMsg
    history.
    - Model State:
    The model receives the complete session history to reconstruct its
    internal state.
    
    This design provides a solid foundation. If, in the future, more precise
    UI reconstruction is needed, we have two potential paths:
    1. Introduce a third data structure that allows us to derive both
    ResponseItems and EventMsgs.
    2. Clearly divide responsibilities: the core system ensures the
    integrity of the model state, while clients are responsible for
    reconstructing the UI.
  • MCP sandbox call (#3128)
  • chore: Clean up verbosity config (#3056)
    ## Summary
    It appears that #2108 hit a merge conflict with #2355 - I failed to
    notice the path difference when re-reviewing the former. This PR
    rectifies that, and consolidates it into the protocol package, in line
    with our philosophy of specifying types in one place.
    
    ## Testing
    - [x] Adds config test for model_verbosity
  • Following up on #2371 post commit feedback (#2852)
    - Introduces web search end to complement the begin
    - Moves the logic of adding the web search tool to
    create_tools_json_for_responses_api
    - Makes it the client's responsibility to toggle the tool on or off
    - Other misc changes from the #2371 post-commit feedback
    - Show the query:
    
    <img width="1392" height="151" alt="image"
    src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89"
    />
  • Custom /prompts (#2696)
    Adds custom `/prompts` to `~/.codex/prompts/<command>.md`.
    
    <img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM"
    src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679"
    />
    
    ---
    
    Details:
    
    1. Adds `Op::ListCustomPrompts` to core.
    2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt`
    (name, content).
    3. TUI calls the operation on load, and populates the custom prompts
    (excluding prompts that collide with builtins).
    4. Selecting the custom prompt automatically sends the prompt to the
    agent.
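    
    Step 3's collision rule can be sketched as a simple filter. The
    `CustomPrompt` fields follow the (name, content) pair mentioned above; the
    function names and the idea of deriving the name from the file stem are
    assumptions.
    
    ```rust
    use std::path::Path;
    
    /// A custom prompt as described above: a name plus its content.
    #[derive(Debug, Clone, PartialEq)]
    pub struct CustomPrompt {
        pub name: String,
        pub content: String,
    }
    
    /// Assumption: `<command>.md` yields the prompt name `<command>`.
    pub fn prompt_name(path: &Path) -> Option<String> {
        if path.extension()?.to_str()? != "md" {
            return None;
        }
        Some(path.file_stem()?.to_str()?.to_string())
    }
    
    /// Drops prompts whose names collide with builtin commands.
    pub fn visible_prompts(found: Vec<CustomPrompt>, builtins: &[&str]) -> Vec<CustomPrompt> {
        found
            .into_iter()
            .filter(|p| !builtins.contains(&p.name.as_str()))
            .collect()
    }
    ```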
  • [mcp-server] Add GetConfig endpoint (#2725)
    ## Summary
    Adds a GetConfig request to the MCP Protocol, so MCP clients can
    evaluate the resolved config.toml settings which the harness is using.
    
    ## Testing
    - [x] Added an end to end test of the endpoint
  • send context window with task started (#2752)
    - Send context window with task started
    - Accounting for changing the model per turn
  • Add web search tool (#2371)
    Adds web_search tool, enabling the model to use Responses API web_search
    tool.
    - Disabled by default, enabled by --search flag
    - When --search is passed, exposes web_search_request function tool to
    the model, which triggers user approval. When approved, the model can
    use the web_search tool for the remainder of the turn
    <img width="1033" height="294" alt="image"
    src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f"
    />
    
    ---------
    
    Co-authored-by: easong-openai <easong@openai.com>
  • send-aggregated output (#2364)
    We want to send an aggregated output of stderr and stdout so we don't
    have to aggregate them afterwards, since the ordering is sometimes lost.
    
    ---------
    
    Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>
  • fork conversation from a previous message (#2575)
    This can be the underlying logic for starting a conversation from a
    previous message. It will need some love in the UI.
    
    Base for building this: #2588
  • Move models.rs to protocol (#2595)
    Moving models.rs to protocol so we can use them in `Codex` operations
  • Add AuthManager and enhance GetAuthStatus command (#2577)
    This PR adds a central `AuthManager` struct that manages the auth
    information used across conversations and the MCP server. Prior to this,
    each conversation and the MCP server got their own private snapshots of
    the auth information, and changes to one (such as a logout or token
    refresh) were not seen by others.
    
    This is especially problematic when multiple instances of the CLI are
    run. For example, consider the case where you start CLI 1 and log in to
    ChatGPT account X and then start CLI 2 and log out and then log in to
    ChatGPT account Y. The conversation in CLI 1 is still using account X,
    but if you create a new conversation, it will suddenly (and
    unexpectedly) switch to account Y.
    
    With the `AuthManager`, auth information is read from disk at the time
    the `ConversationManager` is constructed, and it is cached in memory.
    All new conversations use this same auth information, as do any token
    refreshes.
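    
    The shared-cache idea can be sketched as below. This is an assumed shape,
    not the real `AuthManager`: the type names, the `account` field, and the
    method names are hypothetical. The point is that every clone of the
    manager sees the same in-memory value, unlike per-conversation snapshots.
    
    ```rust
    use std::sync::{Arc, RwLock};
    
    /// Hypothetical stand-in for the auth information read from disk.
    #[derive(Debug, Clone, PartialEq)]
    pub struct AuthSnapshot {
        pub account: String,
    }
    
    /// Cheaply cloneable handle; all clones share one cached value.
    #[derive(Clone)]
    pub struct AuthManagerSketch {
        cached: Arc<RwLock<Option<AuthSnapshot>>>,
    }
    
    impl AuthManagerSketch {
        /// Caches whatever was loaded from disk at construction time.
        pub fn new(loaded_from_disk: Option<AuthSnapshot>) -> Self {
            Self { cached: Arc::new(RwLock::new(loaded_from_disk)) }
        }
    
        /// Returns a copy of the currently cached auth, if any.
        pub fn auth(&self) -> Option<AuthSnapshot> {
            self.cached.read().unwrap().clone()
        }
    
        /// Replaces the cached auth, e.g. after a token refresh.
        pub fn refresh(&self, new_auth: AuthSnapshot) {
            *self.cached.write().unwrap() = Some(new_auth);
        }
    
        /// Clears the cached auth.
        pub fn logout(&self) {
            *self.cached.write().unwrap() = None;
        }
    }
    ```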
    
    The `AuthManager` is also used by the MCP server's GetAuthStatus
    command, which now returns the auth method currently used by the MCP
    server.
    
    This PR also includes an enhancement to the GetAuthStatus command. It
    now accepts two new (optional) input parameters: `include_token` and
    `refresh_token`. Callers can use this to request the in-use auth token
    and can optionally request to refresh the token.
    
    The PR also adds tests for the login and auth APIs that I recently added
    to the MCP server.
  • Added new auth-related methods and events to mcp server (#2496)
    This PR adds the following:
    * A getAuthStatus method on the mcp server. This returns the auth method
    currently in use (chatgpt or apikey) or none if the user is not
    authenticated. It also returns the "preferred auth method" which
    reflects the `preferred_auth_method` value in the config.
    * A logout method on the mcp server. If called, it logs out the user and
    deletes the `auth.json` file — the same behavior in the cli's `/logout`
    command.
    * An `authStatusChange` event notification that is sent when the auth
    status changes due to successful login or logout operations.
    * Logic to pass command-line config overrides to the mcp server at
    startup time. This allows use cases like `codex mcp -c
    preferred_auth_method=apikey`.