474 Commits

  • Use helpers instead of fixtures (#3888)
    Move to using test helper method everywhere.
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Previous to this PR, both of these functions take a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
  • feat: /review (#3774)
    Adds `/review` action in TUI
    
    <img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM"
    src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d"
    />
  • chore(deps): bump tracing-subscriber from 0.3.19 to 0.3.20 in /codex-rs (#3620)
    Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from
    0.3.19 to 0.3.20.
    <details>
    <summary>Release notes</summary>
    <p><em>Sourced from <a
    href="https://github.com/tokio-rs/tracing/releases">tracing-subscriber's
    releases</a>.</em></p>
    <blockquote>
    <h2>tracing-subscriber 0.3.20</h2>
    <p><strong>Security Fix</strong>: ANSI Escape Sequence Injection
    (CVE-TBD)</p>
    <h2>Impact</h2>
    <p>Previous versions of tracing-subscriber were vulnerable to ANSI
    escape sequence injection attacks. Untrusted user input containing ANSI
    escape sequences could be injected into terminal output when logged,
    potentially allowing attackers to:</p>
    <ul>
    <li>Manipulate terminal title bars</li>
    <li>Clear screens or modify terminal display</li>
    <li>Potentially mislead users through terminal manipulation</li>
    </ul>
    <p>In isolation, impact is minimal, however security issues have been
    found in terminal emulators that enabled an attacker to use ANSI escape
    sequences via logs to exploit vulnerabilities in the terminal
    emulator.</p>
    <h2>Solution</h2>
    <p>Version 0.3.20 fixes this vulnerability by escaping ANSI control
    characters in when writing events to destinations that may be printed to
    the terminal.</p>
    <h2>Affected Versions</h2>
    <p>All versions of tracing-subscriber prior to 0.3.20 are affected by
    this vulnerability.</p>
    <h2>Recommendations</h2>
    <p>Immediate Action Required: We recommend upgrading to
    tracing-subscriber 0.3.20 immediately, especially if your
    application:</p>
    <ul>
    <li>Logs user-provided input (form data, HTTP headers, query parameters,
    etc.)</li>
    <li>Runs in environments where terminal output is displayed to
    users</li>
    </ul>
    <h2>Migration</h2>
    <p>This is a patch release with no breaking API changes. Simply update
    your Cargo.toml:</p>
    <pre lang="toml"><code>[dependencies]
    tracing-subscriber = &quot;0.3.20&quot;
    </code></pre>
    <h2>Acknowledgments</h2>
    <p>We would like to thank <a href="http://github.com/zefr0x">zefr0x</a>
    who responsibly reported the issue at
    <code>security@tokio.rs</code>.</p>
    <p>If you believe you have found a security vulnerability in any
    tokio-rs project, please email us at <code>security@tokio.rs</code>.</p>
    </blockquote>
    </details>
    <details>
    <summary>Commits</summary>
    <ul>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/4c52ca5266a3920fc5dfeebda2accf15ee7fb278"><code>4c52ca5</code></a>
    fmt: fix ANSI escape sequence injection vulnerability (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3368">#3368</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/f71cebe41e4c12735b1d19ca804428d4ff7d905d"><code>f71cebe</code></a>
    subscriber: impl Clone for EnvFilter (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3360">#3360</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/3a1f571102b38bcdca13d59f3c454989d179055d"><code>3a1f571</code></a>
    Fix CI (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3361">#3361</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/e63ef57f3d686abe3727ddd586eb9af73d6715b7"><code>e63ef57</code></a>
    chore: prepare tracing-attributes 0.1.30 (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3316">#3316</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/6e59a13b1a7bcdd78b8b5a7cbcf70a0b2cdd76f0"><code>6e59a13</code></a>
    attributes: fix tracing::instrument regression around shadowing (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3311">#3311</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/e4df76127538aa8370d7dee32a6f84bbec6bbf10"><code>e4df761</code></a>
    tracing: update core to 0.1.34 and attributes to 0.1.29 (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3305">#3305</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/643f392ebb73c4fb856f56a78c066c82582dd22c"><code>643f392</code></a>
    chore: prepare tracing-attributes 0.1.29 (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3304">#3304</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/d08e7a6eea1833810ea527e18ea03b08cd402c9d"><code>d08e7a6</code></a>
    chore: prepare tracing-core 0.1.34 (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3302">#3302</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/6e70c571d319a033d5f37c885ccf99aa675a9eac"><code>6e70c57</code></a>
    tracing-subscriber: count numbers of enters in <code>Timings</code> (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/2944">#2944</a>)</li>
    <li><a
    href="https://github.com/tokio-rs/tracing/commit/c01d4fd9def2fb061669a310598095c789ca0a32"><code>c01d4fd</code></a>
    fix docs and enable CI on <code>main</code> branch (<a
    href="https://redirect.github.com/tokio-rs/tracing/issues/3295">#3295</a>)</li>
    <li>Additional commits viewable in <a
    href="https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.3.19...tracing-subscriber-0.3.20">compare
    view</a></li>
    </ul>
    </details>
    <br />
    
    
    [![Dependabot compatibility
    score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tracing-subscriber&package-manager=cargo&previous-version=0.3.19&new-version=0.3.20)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
    
    Dependabot will resolve any conflicts with this PR as long as you don't
    alter it yourself. You can also trigger a rebase manually by commenting
    `@dependabot rebase`.
    
    [//]: # (dependabot-automerge-start)
    [//]: # (dependabot-automerge-end)
    
    ---
    
    <details>
    <summary>Dependabot commands and options</summary>
    <br />
    
    You can trigger Dependabot actions by commenting on this PR:
    - `@dependabot rebase` will rebase this PR
    - `@dependabot recreate` will recreate this PR, overwriting any edits
    that have been made to it
    - `@dependabot merge` will merge this PR after your CI passes on it
    - `@dependabot squash and merge` will squash and merge this PR after
    your CI passes on it
    - `@dependabot cancel merge` will cancel a previously requested merge
    and block automerging
    - `@dependabot reopen` will reopen this PR if it is closed
    - `@dependabot close` will close this PR and stop Dependabot recreating
    it. You can achieve the same result by closing it manually
    - `@dependabot show <dependency name> ignore conditions` will show all
    of the ignore conditions of the specified dependency
    - `@dependabot ignore this major version` will close this PR and stop
    Dependabot creating any more for this major version (unless you reopen
    the PR or upgrade to it yourself)
    - `@dependabot ignore this minor version` will close this PR and stop
    Dependabot creating any more for this minor version (unless you reopen
    the PR or upgrade to it yourself)
    - `@dependabot ignore this dependency` will close this PR and stop
    Dependabot creating any more for this dependency (unless you reopen the
    PR or upgrade to it yourself)
    
    
    </details>
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • enable-resume (#3537)
    Adding the ability to resume conversations.
    we have one verb `resume`. 
    
    Behavior:
    
    `tui`:
    `codex resume`: opens session picker
    `codex resume --last`: continue last message
    `codex resume <session id>`: continue conversation with `session id`
    
    `exec`:
    `codex resume --last`: continue last conversation
    `codex resume <session id>`: continue conversation with `session id`
    
    Implementation:
    - I added a function to find the path in `~/.codex/sessions/` with a
    `UUID`. This is helpful in resuming with session id.
    - Added the above mentioned flags
    - Added lots of testing
  • Review Mode (Core) (#3401)
    ## 📝 Review Mode -- Core
    
    This PR introduces the Core implementation for Review mode:
    
    - New op `Op::Review { prompt: String }:` spawns a child review task
    with isolated context, a review‑specific system prompt, and a
    `Config.review_model`.
    - `EnteredReviewMode`: emitted when the child review session starts.
    Every event from this point onwards reflects the review session.
    - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
    finishes or is interrupted, with optional structured findings:
    
    ```json
    {
      "findings": [
        {
          "title": "<≤ 80 chars, imperative>",
          "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
          "confidence_score": <float 0.0-1.0>,
          "priority": <int 0-3>,
          "code_location": {
            "absolute_file_path": "<file path>",
            "line_range": {"start": <int>, "end": <int>}
          }
        }
      ],
      "overall_correctness": "patch is correct" | "patch is incorrect",
      "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
      "overall_confidence_score": <float 0.0-1.0>
    }
    ```
    
    ## Questions
    
    ### Why separate out its own message history?
    
    We want the review thread to match the training of our review models as
    much as possible -- that means using a custom prompt, removing user
    instructions, and starting a clean chat history.
    
    We also want to make sure the review thread doesn't leak into the parent
    thread.
    
    ### Why do this as a mode, vs. sub-agents?
    
    1. We want review to be a synchronous task, so it's fine for now to do a
    bespoke implementation.
    2. We're still unclear about the final structure for sub-agents. We'd
    prefer to land this quickly and then refactor into sub-agents without
    rushing that implementation.
  • feat: include reasoning_effort in NewConversationResponse (#3506)
    `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
  • chore: enable clippy::redundant_clone (#3489)
    Created this PR by:
    
    - adding `redundant_clone` to `[workspace.lints.clippy]` in
    `cargo-rs/Cargol.toml`
    - running `cargo clippy --tests --fix`
    - running `just fmt`
    
    Though I had to clean up one instance of the following that resulted:
    
    ```rust
    let codex = codex;
    ```
  • Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189)
    This PR does the following:
    * Adds the ability to paste or type an API key.
    * Removes the `preferred_auth_method` config option. The last login
    method is always persisted in auth.json, so this isn't needed.
    * If OPENAI_API_KEY env variable is defined, the value is used to
    prepopulate the new UI. The env variable is otherwise ignored by the
    CLI.
    * Adds a new MCP server entry point "login_api_key" so we can implement
    this same API key behavior for the VS Code extension.
    <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
    src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
    />
    <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
    src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
    />
  • Change forking to read the rollout from file (#3440)
    This PR changes get history op to get path. Then, forking will use a
    path. This will help us have one unified codepath for resuming/forking
    conversations. Will also help in having rollout history in order. It
    also fixes a bug where you won't see the UI when resuming after forking.
  • Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388)
    The previous config approach had a few issues:
    1. It is part of the config but not designed to be used externally
    2. It had to be wired through many places (look at the +/- on this PR
    3. It wasn't guaranteed to be set consistently everywhere because we
    don't have a super well defined way that configs stack. For example, the
    extension would configure during newConversation but anything that
    happened outside of that (like login) wouldn't get it.
    
    This env var approach is cleaner and also creates one less thing we have
    to deal with when coming up with a better holistic story around configs.
    
    One downside is that I removed the unit test testing for the override
    because I don't want to deal with setting the global env or spawning
    child processes and figuring out how to introspect their originator
    header. The new code is sufficiently simple and I tested it e2e that I
    feel as if this is still worth it.
  • fix: include rollout_path in NewConversationResponse (#3352)
    Adding the `rollout_path` to the `NewConversationResponse` makes it so a
    client can perform subsequent operations on a `(ConversationId,
    PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352).
    * #3353
    * __->__ #3352
  • feat: Run cargo shear during CI (#3338)
    Run cargo shear as part of the CI to ensure no unused dependencies
  • Format large numbers in a more readable way. (#2046)
    - In the bottom line of the TUI, print the number of tokens to 3 sigfigs
      with an SI suffix, e.g. "1.23K".
    - Elsewhere where we print a number, I figure it's worthwhile to print
      the exact number, because e.g. it's a summary of your session. Here we print
      the numbers comma-separated.
  • Use ConversationId instead of raw Uuids (#3282)
    We're trying to migrate from `session_id: Uuid` to `conversation_id:
    ConversationId`. Not only does this give us more type safety but it
    unifies our terminology across Codex and with the implementation of
    session resuming, a conversation (which can span multiple sessions) is
    more appropriate.
    
    I started this impl on https://github.com/openai/codex/pull/3219 as part
    of getting resume working in the extension but it's big enough that it
    should be broken out.
  • Move token usage/context information to session level (#3221)
    Move context information into the main loop so it can be used to
    interrupt the loop or start auto-compaction.
  • Never store requests (#3212)
    When item ids are sent to Responses API it will load them from the
    database ignoring the provided values. This adds extra latency.
    
    Not having the mode to store requests also allows us to simplify the
    code.
    
    ## Breaking change
    
    The `disable_response_storage` configuration option is removed.
  • Dividing UserMsgs into categories to send it back to the tui (#3127)
    This PR does the following:
    
    - divides user msgs into 3 categories: plain, user instructions, and
    environment context
    - Centralizes adding user instructions and environment context to a
    degree
    - Improve the integration testing
    
    Building on top of #3123
    
    Specifically this
    [comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
    We need to send the user message while ignoring the User Instructions
    and Environment Context we attach.
  • Replay EventMsgs from Response Items when resuming a session with history. (#3123)
    ### Overview
    
    This PR introduces the following changes:
    	1.	Adds a unified mechanism to convert ResponseItem into EventMsg.
    2. Ensures that when a session is initialized with initial history, a
    vector of EventMsg is sent along with the session configuration. This
    allows clients to re-render the UI accordingly.
    	3. 	Added integration testing
    
    ### Caveats
    
    This implementation does not send every EventMsg that was previously
    dispatched to clients. The excluded events fall into two categories:
    	•	“Arguably” rolled-out events
    Examples include tool calls and apply-patch calls. While these events
    are conceptually rolled out, we currently only roll out ResponseItems.
    These events are already being handled elsewhere and transformed into
    EventMsg before being sent.
    	•	Non-rolled-out events
    Certain events such as TurnDiff, Error, and TokenCount are not rolled
    out at all.
    
    ### Future Directions
    
    At present, resuming a session involves maintaining two states:
    	•	UI State
    Clients can replay most of the important UI from the provided EventMsg
    history.
    	•	Model State
    The model receives the complete session history to reconstruct its
    internal state.
    
    This design provides a solid foundation. If, in the future, more precise
    UI reconstruction is needed, we have two potential paths:
    1. Introduce a third data structure that allows us to derive both
    ResponseItems and EventMsgs.
    2. Clearly divide responsibilities: the core system ensures the
    integrity of the model state, while clients are responsible for
    reconstructing the UI.
  • Add a common way to create HTTP client (#3110)
    Ensure User-Agent and originator are always sent.
  • Move CodexAuth and AuthManager to the core crate (#3074)
    Fix a long standing layering issue.
  • Following up on #2371 post commit feedback (#2852)
    - Introduce websearch end to complement the begin 
    - Moves the logic of adding the sebsearch tool to
    create_tools_json_for_responses_api
    - Making it the client responsibility to toggle the tool on or off 
    - Other misc in #2371 post commit feedback
    - Show the query:
    
    <img width="1392" height="151" alt="image"
    src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89"
    />
  • Custom /prompts (#2696)
    Adds custom `/prompts` to `~/.codex/prompts/<command>.md`.
    
    <img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM"
    src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679"
    />
    
    ---
    
    Details:
    
    1. Adds `Op::ListCustomPrompts` to core.
    2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt`
    (name, content).
    3. TUI calls the operation on load, and populates the custom prompts
    (excluding prompts that collide with builtins).
    4. Selecting the custom prompt automatically sends the prompt to the
    agent.
  • chore: require uninlined_format_args from clippy (#2845)
    - added `uninlined_format_args` to `[workspace.lints.clippy]` in the
    `Cargo.toml` for the workspace
    - ran `cargo clippy --tests --fix`
    - ran `just fmt`
  • Add "View Image" tool (#2723)
    Adds a "View Image" tool so Codex can find and see images by itself:
    
    <img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40
    04 AM"
    src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03"
    />
  • send context window with task started (#2752)
    - Send context window with task started
    - Accounting for changing the model per turn
  • [exec] Clean up apply-patch tests (#2648)
    ## Summary
    These tests were getting a bit unwieldy, and they're starting to become
    load-bearing. Let's clean them up, and get them working solidly so we
    can easily expand this harness with new tests.
    
    ## Test Plan
    - [x] Tests continue to pass
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.
  • Add web search tool (#2371)
    Adds web_search tool, enabling the model to use Responses API web_search
    tool.
    - Disabled by default, enabled by --search flag
    - When --search is passed, exposes web_search_request function tool to
    the model, which triggers user approval. When approved, the model can
    use the web_search tool for the remainder of the turn
    <img width="1033" height="294" alt="image"
    src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f"
    />
    
    ---------
    
    Co-authored-by: easong-openai <easong@openai.com>
  • send-aggregated output (#2364)
    We want to send an aggregated output of stderr and stdout so we don't
    have to aggregate it stderr+stdout as we lose order sometimes.
    
    ---------
    
    Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>
  • fork conversation from a previous message (#2575)
    This can be the underlying logic in order to start a conversation from a
    previous message. will need some love in the UI.
    
    Base for building this: #2588
  • [apply_patch] freeform apply_patch tool (#2576)
    ## Summary
    GPT-5 introduced the concept of [custom
    tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
    which allow the model to send a raw string result back, simplifying
    json-escape issues. We are migrating gpt-5 to use this by default.
    
    However, gpt-oss models do not support custom tools, only normal
    functions. So we keep both tool definitions, and provide whichever one
    the model family supports.
    
    ## Testing
    - [x] Tested locally with various models
    - [x] Unit tests pass
  • Add AuthManager and enhance GetAuthStatus command (#2577)
    This PR adds a central `AuthManager` struct that manages the auth
    information used across conversations and the MCP server. Prior to this,
    each conversation and the MCP server got their own private snapshots of
    the auth information, and changes to one (such as a logout or token
    refresh) were not seen by others.
    
    This is especially problematic when multiple instances of the CLI are
    run. For example, consider the case where you start CLI 1 and log in to
    ChatGPT account X and then start CLI 2 and log out and then log in to
    ChatGPT account Y. The conversation in CLI 1 is still using account X,
    but if you create a new conversation, it will suddenly (and
    unexpectedly) switch to account Y.
    
    With the `AuthManager`, auth information is read from disk at the time
    the `ConversationManager` is constructed, and it is cached in memory.
    All new conversations use this same auth information, as do any token
    refreshes.
    
    The `AuthManager` is also used by the MCP server's GetAuthStatus
    command, which now returns the auth method currently used by the MCP
    server.
    
    This PR also includes an enhancement to the GetAuthStatus command. It
    now accepts two new (optional) input parameters: `include_token` and
    `refresh_token`. Callers can use this to request the in-use auth token
    and can optionally request to refresh the token.
    
    The PR also adds tests for the login and auth APIs that I recently added
    to the MCP server.
  • chore: upgrade to Rust 1.89 (#2465)
    Codex created this PR from the following prompt:
    
    > upgrade this entire repo to Rust 1.89. Note that this requires
    updating codex-rs/rust-toolchain.toml as well as the workflows in
    .github/. Make sure that things are "clippy clean" as this change will
    likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests`
    are sufficient to check for correctness
    
    Note this modifies a lot of lines because it folds nested `if`
    statements using `&&`.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465).
    * #2467
    * __->__ #2465
  • [tui] Support /mcp command (#2430)
    ## Summary
    Adds a `/mcp` command to list active tools. We can extend this command
    to allow configuration of MCP tools, but for now a simple list command
    will help debug if your config.toml and your tools are working as
    expected.
  • chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423)
    The existing `wire_format.rs` should share more types with the
    `codex-protocol` crate (like `AskForApproval` instead of maintaining a
    parallel `CodexToolCallApprovalPolicy` enum), so this PR moves
    `wire_format.rs` into `codex-protocol`, renaming it as
    `mcp-protocol.rs`. We also de-dupe types, where appropriate.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423).
    * #2424
    * __->__ #2423
  • fix: introduce EventMsg::TurnAborted (#2365)
    Introduces `EventMsg::TurnAborted` that should be sent in response to
    `Op::Interrupt`.
    
    In the MCP server, updates the handling of a
    `ClientRequest::InterruptConversation` request such that it sends the
    `Op::Interrupt` but does not respond to the request until it sees an
    `EventMsg::TurnAborted`.
  • [tools] Add apply_patch tool (#2303)
    ## Summary
    We've been seeing a number of issues and reports with our synthetic
    `apply_patch` tool, e.g. #802. Let's make this a real tool - in my
    anecdotal testing, it's critical for GPT-OSS models, but I'd like to
    make it the standard across GPT-5 and codex models as well.
    
    ## Testing
    - [x] Tested locally
    - [x] Integration test
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * moved a bunch of allows to expects where possible
    
    Note: in integration tests, non `#[test]` helper functions are not
    covered by this so we had to leave a few lingering `expect(expect_used`
    checks around
  • Fix AF_UNIX, sockpair, recvfrom in linux sandbox (#2309)
    When using codex-tui on a linux system I was unable to run `cargo
    clippy` inside of codex due to:
    ```
    [pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0,  <unfinished ...>
    [pid 3548370] close(8 <unfinished ...>
    [pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted)
    ```
    And
    ```
    3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted)
    ```
    
    This PR:
    * Fixes a bug that disallowed AF_UNIX to allow it on `socket()`
    * Adds recvfrom() to the syscall allow list, this should be fine since
    we disable opening new sockets. But we should validate there is not a
    open socket inheritance issue.
    * Allow socketpair to be called for AF_UNIX
    * Adds tests for AF_UNIX components
    * All of which allows running `cargo clippy` within the sandbox on
    linux, and possibly other tooling using a fork server model + AF_UNIX
    comms.
  • fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318)
    The high-order bit on this PR is that it makes it so `sandbox.rs` tests
    both Mac and Linux, as we introduce a general
    `spawn_command_under_sandbox()` function with platform-specific
    implementations for testing.
    
    An important, and interesting, discovery in porting the test to Linux is
    that (for reasons cited in the code comments), `/dev/shm` has to be
    added to `writable_roots` on Linux in order for `multiprocessing.Lock`
    to work there. Granting write access to `/dev/shm` comes with some
    degree of risk, so we do not make this the default for Codex CLI.
    
    Piggybacking on top of #2317, this moves the
    `python_multiprocessing_lock_works` test yet again, moving
    `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs`
    because in `codex-rs/exec/tests` we can use `cargo_bin()` like so:
    
    ```
    let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec");
    ```
    
    which is necessary so we can use `codex_linux_sandbox_exe` and therefore
    `spawn_command_under_linux_sandbox` in an integration test.
    
    This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs`
    and into `landlock.rs`, which makes things more consistent with
    `seatbelt.rs` in `codex-core`.
    
    For reference, https://github.com/openai/codex/pull/1808 is the PR that
    made the change to Seatbelt to get this test to pass on Mac.
  • chore: introduce ConversationManager as a clearinghouse for all conversations (#2240)
    This PR does two things because after I got deep into the first one I
    started pulling on the thread to the second:
    
    - Makes `ConversationManager` the place where all in-memory
    conversations are created and stored. Previously, `MessageProcessor` in
    the `codex-mcp-server` crate was doing this via its `session_map`, but
    this is something that should be done in `codex-core`.
    - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
    throughout our code. I think this made sense at one time, but now that
    we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
    I don't think this was quite right, so I removed it. For `codex exec`
    and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
    no longer make `Notify` a field of `Codex` or `CodexConversation`.
    
    Changes of note:
    
    - Adds the files `conversation_manager.rs` and `codex_conversation.rs`
    to `codex-core`.
    - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
    other crates must use `CodexConversation` instead (which is created via
    `ConversationManager`).
    - `core/src/codex_wrapper.rs` has been deleted in favor of
    `ConversationManager`.
    - `ConversationManager::new_conversation()` returns `NewConversation`,
    which is in line with the `new_conversation` tool we want to add to the
    MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
    we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
    verify `SessionConfiguredEvent` is the first event because that is now
    internal to `ConversationManager`.
    - Quite a bit of code was deleted from
    `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
    manage multiple conversations itself: it goes through
    `ConversationManager` instead.
    - `core/tests/live_agent.rs` has been deleted because I had to update a
    bunch of tests and all the tests in here were ignored, and I don't think
    anyone ever ran them, so this was just technical debt, at this point.
    - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
    hope to refactor the blandly-named `util.rs` into more descriptive
    files).
    - In general, I started replacing local variables named `codex` as
    `conversation`, where appropriate, though admittedly I didn't do it
    through all the integration tests because that would have added a lot of
    noise to this PR.
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
    * #2264
    * #2263
    * __->__ #2240
  • Re-add markdown streaming (#2029)
    Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.
  • [1/3] Parse exec commands and format them more nicely in the UI (#2095)
    # Note for reviewers
    The bulk of this PR is in in the new file, `parse_command.rs`. This file
    is designed to be written TDD and implemented with Codex. Do not worry
    about reviewing the code, just review the unit tests (if you want). If
    any cases are missing, we'll add more tests and have Codex fix them.
    
    I think the best approach will be to land and iterate. I have some
    follow-ups I want to do after this lands. The next PR after this will
    let us merge (and dedupe) multiple sequential cells of the same such as
    multiple read commands. The deduping will also be important because the
    model often reads the same file multiple times in a row in chunks
    
    ===
    
    This PR formats common commands like reading, formatting, testing, etc
    more nicely:
    
    It tries to extract things like file names, tests and falls back to the
    cmd if it doesn't. It also only shows stdout/err if the command failed.
    
    <img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15"
    src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e"
    />
    <img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32"
    src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9"
    />
    <img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2"
    src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f"
    />
    
    Part 2: https://github.com/openai/codex/pull/2097
    Part 3: https://github.com/openai/codex/pull/2110
  • [exec] Fix exec sandbox arg (#2034)
    ## Summary
    From codex-cli 😁 
    `-s/--sandbox` now correctly affects sandbox mode.
    
    What changed
    - In `codex-rs/exec/src/cli.rs`:
    - Added `value_enum` to the `--sandbox` flag so Clap parses enum values
    into `
    SandboxModeCliArg`.
    - This ensures values like `-s read-only`, `-s workspace-write`, and `-s
    dange
    r-full-access` are recognized and propagated.
    
    Why this fixes it
    - The enum already derives `ValueEnum`, but without `#[arg(value_enum)]`
    Clap ma
    y not map the string into the enum, leaving the option ineffective at
    runtime. W
    ith `value_enum`, `sandbox_mode` is parsed and then converted to
    `SandboxMode` i
    n `run_main`, which feeds into `ConfigOverrides` and ultimately into the
    effecti
    ve `sandbox_policy`.
  • [config] Onboarding flow with persistence (#1929)
    ## Summary
    In collaboration with @gpeal: upgrade the onboarding flow, and persist
    user settings.
    
    ---------
    
    Co-authored-by: Gabriel Peal <gabriel@openai.com>