Commit Graph

600 Commits

  • feat: stream exec stdout events (#1786)
    ## Summary
    - stream command stdout as `ExecCommandStdout` events
    - forward streamed stdout to clients and ignore in human output
    processor
    - adjust call sites for new streaming API
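
    A minimal sketch of the streaming idea, assuming a hypothetical `Event`
    enum and `stream_stdout` helper (only the `ExecCommandStdout` name comes
    from this PR): each chunk is forwarded as soon as it is read rather than
    after the command exits.

    ```rust
    use std::io::Read;
    use std::sync::mpsc::Sender;

    /// Hypothetical event type; the field names are assumptions.
    #[derive(Debug, PartialEq)]
    pub enum Event {
        ExecCommandStdout { call_id: String, chunk: Vec<u8> },
    }

    /// Reads `stdout` in fixed-size chunks and sends one event per chunk.
    /// Returns the total number of bytes read.
    pub fn stream_stdout<R: Read>(
        mut stdout: R,
        call_id: &str,
        tx: &Sender<Event>,
    ) -> std::io::Result<usize> {
        let mut buf = [0u8; 4096];
        let mut total = 0;
        loop {
            let n = stdout.read(&mut buf)?;
            if n == 0 {
                break; // end of stream
            }
            total += n;
            let event = Event::ExecCommandStdout {
                call_id: call_id.to_string(),
                chunk: buf[..n].to_vec(),
            };
            // A closed receiver means nobody is listening any more.
            if tx.send(event).is_err() {
                break;
            }
        }
        Ok(total)
    }
    ```

    A client that renders output live would consume the receiver, while a
    human-output processor could simply drop these events.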
  • fix insert_history modifier handling (#1774)
    This fixes a bug in `insert_history_lines` where writing
    `Line::from(vec!["A".bold(), "B".into()])` would write "B" as bold,
    because "B" didn't explicitly subtract bold.
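
    One way to picture the fix, as a sketch rather than the actual patch:
    compute the difference between the previous span's modifiers and the
    next span's, and emit the "off" code for anything that should no longer
    apply. `Mods`, `transition`, and `render` are hypothetical names; the
    escape codes are standard SGR sequences.

    ```rust
    /// Hypothetical, minimal stand-in for a span's text modifiers.
    #[derive(Clone, Copy, Default, PartialEq, Debug)]
    pub struct Mods {
        pub bold: bool,
        pub italic: bool,
    }

    /// Emits the SGR codes needed to move from `prev` to `next`, including
    /// the "off" codes for modifiers that `next` does not carry.
    pub fn transition(prev: Mods, next: Mods) -> String {
        let mut out = String::new();
        if prev.bold && !next.bold {
            out.push_str("\x1b[22m"); // normal intensity
        }
        if !prev.bold && next.bold {
            out.push_str("\x1b[1m"); // bold on
        }
        if prev.italic && !next.italic {
            out.push_str("\x1b[23m"); // italic off
        }
        if !prev.italic && next.italic {
            out.push_str("\x1b[3m"); // italic on
        }
        out
    }

    /// Writes spans in order, tracking the modifiers currently in effect.
    pub fn render(spans: &[(&str, Mods)]) -> String {
        let mut out = String::new();
        let mut current = Mods::default();
        for (text, mods) in spans {
            out.push_str(&transition(current, *mods));
            out.push_str(text);
            current = *mods;
        }
        // Leave the terminal with no modifiers set.
        out.push_str(&transition(current, Mods::default()));
        out
    }
    ```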
  • Introduce a new function to just send user message [Stack 3/3] (#1686)
    - MCP server: add send-user-message tool to send user input to a running
    Codex session
    - Add integration tests for the happy and sad paths
    
    Changes:
    •	Add tool definition and schema.
    •	Expose tool in capabilities.
    •	Route and handle tool requests with validation.
    •	Tests for success, bad UUID, and missing session.
    
    
    follow‑ups
    • Listen path not implemented yet; the tool is present but marked “don’t
    use yet” in code comments.
    • Session run flag reset: clear running_session_id_set appropriately
    after turn completion/errors.
    
    This is the third PR in a stack.
    Stack:
    Final: #1686
    Intermediate: #1751
    First: #1750
  • Add /compact (#1527)
    - Add operation to summarize the context so far.
    - The operation runs a compact task that summarizes the context.
    - The operation clears the previous context to free the context window
    - The operation doesn't use `run_task`, to avoid corrupting the session
    - Add /compact in the tui
    
    
    
    https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c
  • MCP server: route structured tool-call requests and expose mcp_protocol [Stack 2/3] (#1751)
    - Expose mcp_protocol from mcp-server for reuse in tests and callers.
    - In MessageProcessor, detect structured ToolCallRequestParams in
    tools/call and forward to a new handler.
    - Add handle_new_tool_calls scaffold (returns error for now).
    - Test helper: add send_send_user_message_tool_call to McpProcess to
    send ConversationSendMessage requests.
    
    This is the second PR in a stack.
    Stack:
    Final: #1686
    Intermediate: #1751
    First: #1750
  • MCP Protocol: Align tool-call response with CallToolResult [Stack 1/3] (#1750)
    # Summary
    - Align MCP server responses with mcp_types by emitting [CallToolResult,
    RequestId] instead of an object.
    - Update send-message result to a tagged enum: Ok or Error { message }.
    
    # Why
    Protocol compliance with current MCP schema.
    
    # Tests
    - Updated assertions in mcp_protocol.rs for create/stream/send/list and
    error cases.
    
    This is the first PR in a stack.
    Stack:
    Final: #1686
    Intermediate: #1751
    First: #1750
  • Detect kitty terminals (#1748)
    We want to detect kitty terminals so we can preferentially upgrade their UX without degrading older terminals.
  • insert history lines with redraw (#1769)
    This delays the call to insert_history_lines until a redraw is
    happening. Crucially, the new lines are inserted _after the viewport is
    resized_. This results in fewer stray blank lines below the viewport
    when modals (e.g. user approval) are closed.
  • lighter approval modal (#1768)
    The yellow hazard stripes were too scary :)
    
    This also has the added benefit of not rendering anything at the full
    width of the terminal, so resizing is a little easier to handle.
    
    <img width="860" height="390" alt="Screenshot 2025-07-31 at 4 03 29 PM"
    src="https://github.com/user-attachments/assets/18476e1a-065d-4da9-92fe-e94978ab0fce"
    />
    
    <img width="860" height="390" alt="Screenshot 2025-07-31 at 4 05 03 PM"
    src="https://github.com/user-attachments/assets/337db0da-de40-48c6-ae71-0e40f24b87e7"
    />
  • do not dispatch key releases (#1771)
    when we enabled KKP in https://github.com/openai/codex/pull/1743, we
    started receiving keyup events, but didn't expect them anywhere in our
    code. for now, just don't dispatch them at all.
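
    A small sketch of the filter, using hypothetical `KeyKind` and `KeyEvent`
    types that only mirror the press/repeat/release distinction the keyboard
    protocol introduces (the real event types come from the terminal
    library):

    ```rust
    /// Hypothetical mirror of the kinds of key events a terminal can report.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum KeyKind {
        Press,
        Repeat,
        Release,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct KeyEvent {
        pub code: char,
        pub kind: KeyKind,
    }

    /// Key releases are dropped; presses and repeats are passed through.
    pub fn should_dispatch(ev: &KeyEvent) -> bool {
        ev.kind != KeyKind::Release
    }

    /// Keeps only the events that the rest of the app expects to see.
    pub fn dispatchable(events: &[KeyEvent]) -> Vec<KeyEvent> {
        events.iter().copied().filter(should_dispatch).collect()
    }
    ```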
  • Send account id when available (#1767)
    For users with multiple accounts we need to specify the account to use.
  • Initial planning tool (#1753)
    We need to optimize the prompt, but this causes the model to use the new
    planning_tool.
    
    <img width="765" height="110" alt="image"
    src="https://github.com/user-attachments/assets/45633f7f-3c85-4e60-8b80-902f1b3b508d"
    />
  • chore: refactor exec.rs: create separate seatbelt.rs and spawn.rs files (#1762)
    At 550 lines, `exec.rs` was a bit large. In particular, I found it hard
    to locate the Seatbelt-related code quickly without a file with
    `seatbelt` in the name, so this refactors things so:
    
    - `spawn_command_under_seatbelt()` and dependent code moves to a new
    `seatbelt.rs` file
    - `spawn_child_async()` and dependent code moves to a new `spawn.rs`
    file
  • Fix double-scrolling in approval modal (#1754)
    Previously, pressing up or down arrow in the new approval modal would be
    the equivalent of two up or down presses.
  • fix: ensure PatchApplyBeginEvent and PatchApplyEndEvent are dispatched reliably (#1760)
    This is a follow-up to https://github.com/openai/codex/pull/1705, as
    that PR inadvertently lost the logic where `PatchApplyBeginEvent` and
    `PatchApplyEndEvent` events were sent when patches were auto-approved.
    
    Though as part of this fix, I believe this also makes an important
    safety fix to `assess_patch_safety()`, as there was a case that returned
    `SandboxType::None`, which arguably is the thing we were trying to avoid
    in #1705.
    
    At a high level, we want there to be only one codepath where
    `apply_patch` happens, which should be unified with the path to run
    `exec`, in general, so that sandboxing is applied consistently for both
    cases.
    
    Prior to this change, `apply_patch()` in `core` would either:
    
    * exit early, delegating to `exec()` to shell out to `apply_patch` using
    the appropriate sandbox
    * proceed to run the logic for `apply_patch` in memory
    
    
    https://github.com/openai/codex/blob/549846b29ad52f6cb4f8560365a731966054a9b3/codex-rs/core/src/apply_patch.rs#L61-L63
    
    In this implementation, only the latter would dispatch
    `PatchApplyBeginEvent` and `PatchApplyEndEvent`, though the former would
    dispatch `ExecCommandBeginEvent` and `ExecCommandEndEvent` for the
    `apply_patch` call (or, more specifically, the `codex
    --codex-run-as-apply-patch PATCH` call).
    
    To unify things in this PR, we:
    
    * Eliminate the back half of the `apply_patch()` function, and instead
    have it also return with `DelegateToExec`, though we add an extra field
    to the return value, `user_explicitly_approved_this_action`.
    * In `codex.rs` where we process `DelegateToExec`, we use
    `SandboxType::None` when `user_explicitly_approved_this_action` is
    `true`. This means **we no longer run the apply_patch logic in memory**,
    as we always `exec()`. (Note this is what allowed us to delete so much
    code in `apply_patch.rs`.)
    * In `codex.rs`, we further update `notify_exec_command_begin()` and
    `notify_exec_command_end()` to take additional fields to determine what
    type of notification to send: `ExecCommand` or `PatchApply`.
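
    The decision described in the list above can be sketched as follows.
    Only `DelegateToExec`, `user_explicitly_approved_this_action`, and
    `SandboxType::None` are names from this PR; the other variants, the
    `command` field, and both helper functions are assumptions made for
    illustration.

    ```rust
    /// Only `SandboxType::None` is named in the PR text; the other
    /// variants are hypothetical.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum SandboxType {
        None,
        Seatbelt,
        Landlock,
    }

    /// Hypothetical shape of the value that `apply_patch()` hands back.
    pub struct DelegateToExec {
        pub command: Vec<String>,
        pub user_explicitly_approved_this_action: bool,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum NotificationKind {
        ExecCommand,
        PatchApply,
    }

    /// An explicit user approval lifts the sandbox; otherwise the
    /// configured sandbox is used, exactly as for any other exec call.
    pub fn sandbox_for(delegate: &DelegateToExec, configured: SandboxType) -> SandboxType {
        if delegate.user_explicitly_approved_this_action {
            SandboxType::None
        } else {
            configured
        }
    }

    /// Chooses which begin/end notification pair to send for an exec call.
    pub fn notification_kind(is_apply_patch: bool) -> NotificationKind {
        if is_apply_patch {
            NotificationKind::PatchApply
        } else {
            NotificationKind::ExecCommand
        }
    }
    ```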
    
    Admittedly, this PR also drops some of the functionality about giving
    the user the opportunity to expand the set of writable roots as part of
    approving the `apply_patch` command. I'm not sure how much that was
    used, and we should probably rethink how that works as we are currently
    tidying up the protocol to the TUI, in general.
  • Add codex login --api-key (#1759)
    Allow setting the API key via `codex login --api-key`
  • clamp render area to terminal size (#1758)
    this fixes a couple of panics that would happen when trying to render
    something larger than the terminal, or insert history lines when the top
    of the viewport is at y=0.
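
    A sketch of the clamping idea, with a hypothetical `Rect` type and
    `clamp_to_terminal` function (the real code uses the TUI library's own
    rectangle type): the requested area is cut down so that it never extends
    past the terminal's width or height.

    ```rust
    /// Hypothetical rectangle in terminal cells.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Rect {
        pub x: u16,
        pub y: u16,
        pub width: u16,
        pub height: u16,
    }

    /// Shrinks `area` so it lies entirely inside a `term_w` x `term_h` terminal.
    pub fn clamp_to_terminal(area: Rect, term_w: u16, term_h: u16) -> Rect {
        let x = area.x.min(term_w);
        let y = area.y.min(term_h);
        Rect {
            x,
            y,
            // `x <= term_w` and `y <= term_h`, so these cannot underflow.
            width: area.width.min(term_w - x),
            height: area.height.min(term_h - y),
        }
    }
    ```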
  • Show error message after panic (#1752)
    Previously we were swallowing errors and silently exiting, which isn't
    great for helping users help us.
  • fix git tests (#1747)
    the git tests were failing on my local machine due to gpg signing config
    in my ~/.gitconfig. tests should not be affected by ~/.gitconfig, so
    configure them to ignore it.
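
    One possible way to isolate a test's `git` invocations, shown as a
    sketch; the PR may use a different mechanism. `isolated_git` is a
    hypothetical helper. `GIT_CONFIG_GLOBAL` (honored by reasonably recent
    git versions) and `GIT_CONFIG_NOSYSTEM` are git's own environment
    variables, and `/dev/null` assumes a Unix-like system.

    ```rust
    use std::path::Path;
    use std::process::Command;

    /// Builds a `git` command that does not read the user's or the
    /// system's git configuration.
    pub fn isolated_git(repo: &Path) -> Command {
        let mut cmd = Command::new("git");
        cmd.current_dir(repo)
            // Point the "global" config at an empty file instead of ~/.gitconfig.
            .env("GIT_CONFIG_GLOBAL", "/dev/null")
            // Skip the system-wide config as well.
            .env("GIT_CONFIG_NOSYSTEM", "1");
        cmd
    }
    ```

    A test would then call something like
    `isolated_git(dir).args(["commit", "-m", "msg"]).status()`, so settings
    such as gpg signing in `~/.gitconfig` no longer apply.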
  • streamline ui (#1733)
    Simplify and improve many UI elements.
    * Remove all-around borders in most places. These interact badly with
    terminal resizing and look heavy. Prefer left-side-only borders.
    * Make the viewport adjust to the size of its contents.
    * <kbd>/</kbd> and <kbd>@</kbd> autocomplete boxes appear below the
    prompt, instead of above it.
    * Restyle the keyboard shortcut hints & move them to the left.
    * Restyle the approval dialog.
    * Use synchronized rendering to avoid flashing during rerenders.
    
    
    https://github.com/user-attachments/assets/96f044af-283b-411c-b7fc-5e6b8a433c20
    
    <img width="1117" height="858" alt="Screenshot 2025-07-30 at 5 29 20 PM"
    src="https://github.com/user-attachments/assets/0cc0af77-8396-429b-b6ee-9feaaccdbee7"
    />
  • add keyboard enhancements to support shift_return (#1743)
    For terminals that support [keyboard
    enhancements](https://docs.rs/libcrossterm/latest/crossterm/enum.KeyboardEnhancementFlags.html),
    this adds the enhancements (enabling the [kitty keyboard
    protocol](https://sw.kovidgoyal.net/kitty/keyboard-protocol/)) so that
    we can listen for shift+enter.
    
    Users with terminals listed on the
    [KKP](https://sw.kovidgoyal.net/kitty/keyboard-protocol/) page should be
    able to press shift+return to insert a new line.
    
    ---------
    
    Co-authored-by: easong-openai <easong@openai.com>
  • Auto format toml (#1745)
    Add recommended extension and configure it to auto format prompt.
  • chore: add support for a new label, codex-rust-review (#1744)
    The goal of this change is to try an experiment where we try to get AI
    to take on more of the code review load. The idea is that once you
    believe your PR is ready for review, please add the `codex-rust-review`
    label (as opposed to the `codex-review` label).
    
    Admittedly the corresponding prompt currently represents my personal
    biases in terms of code review, but we should massage it over time to
    represent the team's preferences.
  • resizable viewport (#1732)
    Proof of concept for a resizable viewport.
    
    The general approach here is to duplicate the `Terminal` struct from
    ratatui, but with our own logic. This is a "light fork" in that we are
    still using all the base ratatui functions (`Buffer`, `Widget` and so
    on), but we're doing our own bookkeeping at the top level to determine
    where to draw everything.
    
    This approach could use improvement: e.g., when the window is resized to
    a smaller size, if the UI wraps, we don't correctly clear out the
    artifacts from wrapping. This is possible with a little work (i.e.,
    tracking what parts of our UI would have been wrapped), but this
    behavior is at least on par with the existing behavior.
    
    
    https://github.com/user-attachments/assets/4eb17689-09fd-4daa-8315-c7ebc654986d
    
    
    cc @joshka who might have Thoughts™
  • fix: run apply_patch calls through the sandbox (#1705)
    Building on the work of https://github.com/openai/codex/pull/1702, this
    changes how a shell call to `apply_patch` is handled.
    
    Previously, a shell call to `apply_patch` was always handled in-process,
    never leveraging a sandbox. To determine whether the `apply_patch`
    operation could be auto-approved, the
    `is_write_patch_constrained_to_writable_paths()` function would check if
    all the paths listed in the patch were writable. If so, the agent would
    apply the changes listed in the patch.
    
    Unfortunately, this approach afforded a loophole: symlinks!
    
    * For a soft link, we could fix this issue by tracing the link and
    checking whether the target is in the set of writable paths, however...
    * ...For a hard link, things are not as simple. We can run `stat FILE`
    to see if the number of links is greater than 1, but then we would have
    to do something potentially expensive like `find . -inum <inode_number>`
    to find the other paths for `FILE`. Further, even if this worked, this
    approach runs the risk of a
    [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)
    race condition, so it is not robust.
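
    To make the two cases above concrete, here is a Unix-only sketch of what
    such path checks could look like. `resolves_inside` and
    `has_other_hard_links` are hypothetical helpers, not code from this PR,
    and both remain exposed to the TOCTOU race mentioned above because the
    filesystem can change between the check and the write.

    ```rust
    use std::fs;
    use std::io;
    use std::os::unix::fs::MetadataExt;
    use std::path::{Path, PathBuf};

    /// Follows symlinks and reports whether the real target of `path`
    /// lies under one of the writable roots.
    pub fn resolves_inside(path: &Path, writable_roots: &[PathBuf]) -> io::Result<bool> {
        let real = fs::canonicalize(path)?;
        for root in writable_roots {
            if real.starts_with(fs::canonicalize(root)?) {
                return Ok(true);
            }
        }
        Ok(false)
    }

    /// True if the file has more than one hard link, i.e. the link count
    /// that `stat FILE` prints is greater than 1. This says nothing about
    /// where the other links live.
    pub fn has_other_hard_links(path: &Path) -> io::Result<bool> {
        Ok(fs::metadata(path)?.nlink() > 1)
    }
    ```

    Note that a hard link inside a writable root passes the
    `resolves_inside` check even though the same inode is reachable from
    outside it, which is why the path-based approach is not sufficient.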
    
    The solution, implemented in this PR, is to take the virtual execution
    of the `apply_patch` CLI into an _actual_ execution using `codex
    --codex-run-as-apply-patch PATCH`, which we can run under the sandbox
    the user specified, just like any other `shell` call.
    
    This, of course, assumes that the sandbox prevents writing through
    symlinks as a mechanism to write to folders that are not in the writable
    set configured by the sandbox. I verified this by testing the following
    on both Mac and Linux:
    
    ```shell
    #!/usr/bin/env bash
    set -euo pipefail
    
    # Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR?
    
    # Codex is run in SANDBOX_DIR, so writes should be constrained to this directory.
    SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    # EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it.
    EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    
    echo "SANDBOX_DIR: $SANDBOX_DIR"
    echo "EXPLOIT_DIR: $EXPLOIT_DIR"
    
    cleanup() {
      # Only remove if it looks sane and still exists
      [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR"
      [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR"
    }
    
    trap cleanup EXIT
    
    echo "I am the original content" > "${EXPLOIT_DIR}/original.txt"
    
    # Drop the -s to test hard links.
    ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt"
    
    cat "${SANDBOX_DIR}/link-to-original.txt"
    
    if [[ "$(uname)" == "Linux" ]]; then
        SANDBOX_SUBCOMMAND=landlock
    else
        SANDBOX_SUBCOMMAND=seatbelt
    fi
    
    # Attempt the exploit
    cd "${SANDBOX_DIR}"
    
    codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true
    
    cat "${EXPLOIT_DIR}/original.txt"
    ```
    
    Admittedly, this change merits a proper integration test, but I think I
    will have to do that in a follow-up PR.
  • Add login status command (#1716)
    Print the current login mode, sanitized key and return an appropriate
    status.
  • moving input item from MCP Protocol back to core Protocol (#1740)
    - Currently we have duplicate input item. Let's have one source of truth
    in the core.
    - Use the `RequestId` type.
  • Add support for a separate chatgpt auth endpoint (#1712)
    Adds a `CodexAuth` type that encapsulates information about available
    auth modes and logic for refreshing the token.
    Changes `Responses` API to send requests to different endpoints based on
    the auth type.
    Updates login_with_chatgpt to support API-less mode and skip the key
    exchange.
  • fix ci (#1739)
    I think this commit broke the CI because it changed the
    `McpToolCallBeginEvent` type:
    https://github.com/openai/codex/commit/347c81ad0049103c84e0aa2c0d7e2988db18218a
  • remove conversation history widget (#1727)
    this widget is no longer used.
  • Mcp protocol (#1715)
    - Add typed MCP protocol surface in
    `codex-rs/mcp-server/src/mcp_protocol.rs` for `requests`, `responses`,
    and `notifications`
    - Requests: `NewConversation`, `Connect`, `SendUserMessage`,
    `GetConversations`
    - Message content parts: `Text`, `Image` (`ImageUrl`/`FileId`, optional
    `ImageDetail`), File (`Url`/`Id`/`inline Data`)
    - Responses: `ToolCallResponseEnvelope` with optional `isError` and
    `structuredContent` variants (`NewConversation`, `Connect`,
    `SendUserMessageAccepted`, `GetConversations`)
    - Notifications: `InitialState`, `ConnectionRevoked`, `CodexEvent`,
    `Cancelled`
    - Uniform `_meta` on `notifications` via `NotificationMeta`
    (`conversationId`, `requestId`)
    - Unit tests validate JSON wire shapes for key
    `requests`/`responses`/`notifications`
  • Trim bash lc and run with login shell (#1725)
    include .zshenv, .zprofile by running with the `-l` flag and don't start
    a shell inside a shell when we see the typical `bash -lc` invocation.
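
    A rough sketch of the unwrapping logic, under the assumption that
    commands arrive as argument vectors. `strip_bash_lc`, `shell_quote`, and
    `wrap_for_login_shell` are hypothetical names, and the quoting is a
    simple POSIX single-quote scheme rather than whatever the PR uses.

    ```rust
    /// Returns the script if `command` has the shape `["bash", "-lc", script]`.
    pub fn strip_bash_lc(command: &[String]) -> Option<&str> {
        match command {
            [shell, flag, script] if shell == "bash" && flag == "-lc" => Some(script.as_str()),
            _ => None,
        }
    }

    /// Single-quotes `arg` for a POSIX shell, escaping embedded quotes.
    pub fn shell_quote(arg: &str) -> String {
        format!("'{}'", arg.replace('\'', "'\\''"))
    }

    /// Builds the argv for running `command` in the user's login shell.
    /// A `bash -lc SCRIPT` invocation is unwrapped so that the script runs
    /// directly instead of as a shell inside a shell.
    pub fn wrap_for_login_shell(user_shell: &str, command: &[String]) -> Vec<String> {
        let script = match strip_bash_lc(command) {
            Some(script) => script.to_string(),
            None => command
                .iter()
                .map(|arg| shell_quote(arg))
                .collect::<Vec<_>>()
                .join(" "),
        };
        // `-l` makes the shell a login shell, so profile files are read.
        vec![user_shell.to_string(), "-lc".to_string(), script]
    }
    ```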
  • Add an experimental plan tool (#1726)
    This adds a tool the model can call to update a plan. The tool doesn't
    actually _do_ anything but it gives clients a chance to read and render
    the structured plan. We will likely iterate on the prompt and tools
    exposed for planning over time.
  • Relative instruction file (#1722)
    Passing in an instruction file with a bad path led to silent failures.
    Also, relative instruction paths were handled in an unintuitive fashion.
  • feat: map ^U to kill-line-to-head (#1711)
    see
    [discussion](https://github.com/rhysd/tui-textarea/issues/51#issuecomment-3021191712),
    it's surprising that ^U behaves this way. IMO the undo/redo
    functionality in tui-textarea isn't good enough to be worth preserving,
    but if we do bring it back it should probably be on C-z / C-S-z / C-y.
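
    For reference, "kill line to head" on a single-line buffer can be
    sketched like this. `kill_line_to_head` is a hypothetical free function,
    not the tui-textarea API, and it assumes `cursor` is a byte offset that
    falls on a character boundary.

    ```rust
    /// Deletes everything between the start of the line and the cursor,
    /// moves the cursor to column 0, and returns the removed text.
    pub fn kill_line_to_head(line: &mut String, cursor: &mut usize) -> String {
        // Never drain past the end of the line.
        let end = (*cursor).min(line.len());
        let killed: String = line.drain(..end).collect();
        *cursor = 0;
        killed
    }
    ```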
  • replace login screen with a simple prompt (#1713)
    Perhaps there was an intention to make the login screen prettier, but it
    feels quite silly right now to just have a screen that says "press q",
    so replace it with something that lets the user directly login without
    having to quit the app.
    
    <img width="1283" height="635" alt="Screenshot 2025-07-28 at 2 54 05 PM"
    src="https://github.com/user-attachments/assets/f19e5595-6ef9-4a2d-b409-aa61b30d3628"
    />
  • [mcp-server] Populate notifications._meta with requestId (#1704)
    ## Summary
    Per the [latest MCP
    spec](https://modelcontextprotocol.io/specification/2025-06-18/basic#meta),
    the `_meta` field is reserved for metadata. In the [Typescript
    Schema](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/0695a497eb50a804fc0e88c18a93a21a675d6b3e/schema/2025-06-18/schema.ts#L37-L40),
    `progressToken` is defined as a value to be attached to subsequent
    notifications for that request.
    
    The
    [CallToolRequestParams](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/0695a497eb50a804fc0e88c18a93a21a675d6b3e/schema/2025-06-18/schema.ts#L806-L817)
    extends this definition but overwrites the params field. This ambiguity
    makes our generated type definitions tricky, so I'm going to skip
    `progressToken` field for now and just send back the `requestId`
    instead.
     
    In a future PR, we can clarify, update our `generate_mcp_types.py`
    script, and update our progressToken logic accordingly.
    
    ## Testing
    - [x] Added unit tests
    - [x] Manually tested with mcp client
  • Fix approval workflow (#1696)
    (Hopefully) temporary solution to the invisible approvals problem -
    prints commands to history when they need approval and then also prints
    the result of the approval. In the near future we should be able to do
    some fancy stuff with updating commands before writing them to permanent
    history.
    
    Also, ctrl-c while in the approval modal now acts as esc (aborts command)
    and puts the TUI in the state where one additional ctrl-c will exit.
  • Serializing the eventmsg type to snake_case (#1709)
    This was an abrupt change for our clients. We need to serialize as
    snake_case.
  • chore: split apply_patch logic out of codex.rs and into apply_patch.rs (#1703)
    This is a straight refactor, moving apply-patch-related code from
    `codex.rs` and into the new `apply_patch.rs` file. The only "logical"
    change is inlining `#[allow(clippy::unwrap_used)]` instead of declaring
    `#![allow(clippy::unwrap_used)]` at the top of the file (which is
    currently the case in `codex.rs`).
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1703).
    * #1705
    * __->__ #1703
    * #1702
    * #1698
    * #1697
  • fix: support special --codex-run-as-apply-patch arg (#1702)
    This introduces some special behavior to the CLIs that are using the
    `codex-arg0` crate where if `arg1` is `--codex-run-as-apply-patch`, then
    it will run as if `apply_patch arg2` were invoked. This is important
    because it means we can do things like:
    
    ```
    SANDBOX_TYPE=landlock # or seatbelt for macOS
    codex debug "${SANDBOX_TYPE}" -- codex --codex-run-as-apply-patch PATCH
    ```
    
    which gives us a way to run `apply_patch` while ensuring it adheres to
    the sandbox the user specified.
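
    The dispatch on `arg1` might look roughly like the sketch below. `Mode`
    and `classify` are hypothetical names and do not reflect the actual
    `codex-arg0` API; only the `--codex-run-as-apply-patch` flag comes from
    this PR.

    ```rust
    use std::ffi::OsString;

    #[derive(Debug, PartialEq)]
    pub enum Mode {
        /// Behave as if `apply_patch PATCH` had been invoked.
        ApplyPatch(OsString),
        /// Run the normal CLI with the remaining arguments.
        Normal(Vec<OsString>),
    }

    /// Inspects the arguments after arg0 and decides how the process should run.
    pub fn classify<I: IntoIterator<Item = OsString>>(args: I) -> Result<Mode, String> {
        let mut args = args.into_iter();
        let _arg0 = args.next();
        let rest: Vec<OsString> = args.collect();
        let is_apply_patch =
            rest.first().and_then(|arg| arg.to_str()) == Some("--codex-run-as-apply-patch");
        if !is_apply_patch {
            return Ok(Mode::Normal(rest));
        }
        // arg2 carries the patch itself.
        match rest.into_iter().nth(1) {
            Some(patch) => Ok(Mode::ApplyPatch(patch)),
            None => Err("--codex-run-as-apply-patch requires a PATCH argument".to_string()),
        }
    }
    ```

    A binary would call `classify(std::env::args_os())` at the very start of
    `main`, before any regular argument parsing.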
    
    While it would be nice to use the `arg0` trick like we are currently
    doing for `codex-linux-sandbox`, there is no way to specify the `arg0`
    for the underlying command when running under `/usr/bin/sandbox-exec`,
    so it will not work for us in this case.
    
    Admittedly, we could have also supported this via a custom environment
    variable (e.g., `CODEX_ARG0`), but since environment variables are
    inherited by child processes, that seemed like a potentially leakier
    abstraction.
    
    This change, as well as our existing reliance on checking `arg0`, places
    additional requirements on those who include `codex-core`. Its
    `README.md` has been updated to reflect this.
    
    While we could have just added an `apply-patch` subcommand to the
    `codex` multitool CLI, that would not be sufficient for the standalone
    `codex-exec` CLI, which is something that we distribute as part of our
    GitHub releases for those who know they will not be using the TUI and
    therefore prefer to use a slightly smaller executable:
    
    https://github.com/openai/codex/releases/tag/rust-v0.10.0
    
    To that end, this PR adds an integration test to ensure that the
    `--codex-run-as-apply-patch` option works with the standalone
    `codex-exec` CLI.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1702).
    * #1705
    * #1703
    * __->__ #1702
    * #1698
    * #1697
  • fix: use std::env::args_os instead of std::env::args (#1698)
    Apparently `std::env::args()` will panic during iteration if any
    argument to the process is not valid Unicode:
    
    https://doc.rust-lang.org/std/env/fn.args.html
    
    Let's avoid the risk and just go with `std::env::args_os()`.
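
    A short sketch of handling arguments as `OsString` values, which can
    hold bytes that are not valid Unicode. `describe_args` is a hypothetical
    helper written for this example.

    ```rust
    use std::ffi::OsString;

    /// Renders each argument for display without assuming valid Unicode.
    pub fn describe_args<I: IntoIterator<Item = OsString>>(args: I) -> Vec<String> {
        args.into_iter()
            .map(|arg| match arg.to_str() {
                // Valid Unicode: use it as-is.
                Some(s) => s.to_string(),
                // Otherwise fall back to a lossy rendering instead of panicking.
                None => format!("<non-unicode: {}>", arg.to_string_lossy()),
            })
            .collect()
    }
    ```

    Calling `describe_args(std::env::args_os())` never panics on unusual
    arguments, whereas iterating `std::env::args()` would.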
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1698).
    * #1705
    * #1703
    * #1702
    * __->__ #1698
    * #1697
  • fix: correctly wrap history items (#1685)
    The overall idea here is: skip ratatui for writing into scrollback,
    because its primitives are wrong. We want to render full lines of text,
    that will be wrapped natively by the terminal, and which we never plan
    to update using ratatui (so the `Buffer` struct is overhead and in fact
    an inhibition).
    
    Instead, we use ANSI scrolling regions (link reference doc to come).
    Essentially, we:
    1. Define a scrolling region that extends from the top of the prompt
    area all the way to the top of scrollback
    2. Scroll that region up by N < (screen_height - viewport_height) lines,
    in this PR N=1
    3. Put our cursor at the top of the newly empty region
    4. Print out our new text like normal
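
    The four steps above can be sketched as a function that builds the
    escape sequences. `insert_above_viewport` and its parameters are
    hypothetical; the sequences are the standard DECSTBM (set scrolling
    region), SU (scroll up), and CUP (cursor position) controls. The sketch
    assumes each line fits within the terminal width and that rows are
    1-based, with `viewport_top` being the last row above the prompt area.

    ```rust
    /// Builds the bytes that scroll the region above the prompt up by
    /// `lines.len()` rows and print `lines` into the space that opens up.
    /// Returns an empty string if the lines would not fit.
    pub fn insert_above_viewport(viewport_top: u16, lines: &[&str]) -> String {
        if lines.is_empty() || lines.len() >= viewport_top as usize {
            return String::new();
        }
        let n = lines.len() as u16;
        let mut out = String::new();
        // 1. Scrolling region: from the first row down to the top of the prompt.
        out.push_str(&format!("\x1b[1;{}r", viewport_top));
        // 2. Scroll that region up by n lines.
        out.push_str(&format!("\x1b[{}S", n));
        // 3. Move the cursor to the top of the newly empty rows.
        out.push_str(&format!("\x1b[{};1H", viewport_top - n + 1));
        // 4. Print the new text; the last line gets no trailing newline so
        //    the region is not scrolled a second time.
        out.push_str(&lines.join("\r\n"));
        // Restore the full-screen scrolling region.
        out.push_str("\x1b[r");
        out
    }
    ```

    Setting or resetting the scrolling region moves the cursor on many
    terminals, so a caller would normally save and restore the cursor
    position around this write.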
    
    The terminal interactions here (write_spans and its dependencies) are
    mostly extracted from ratatui.