Commit Graph

85 Commits

  • chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423)
    The existing `wire_format.rs` should share more types with the
    `codex-protocol` crate (like `AskForApproval` instead of maintaining a
    parallel `CodexToolCallApprovalPolicy` enum), so this PR moves
    `wire_format.rs` into `codex-protocol`, renaming it as
    `mcp_protocol.rs`. We also de-dupe types, where appropriate.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423).
    * #2424
    * __->__ #2423
  • fix: introduce EventMsg::TurnAborted (#2365)
    Introduces `EventMsg::TurnAborted` that should be sent in response to
    `Op::Interrupt`.
    
    In the MCP server, updates the handling of a
    `ClientRequest::InterruptConversation` request such that it sends the
    `Op::Interrupt` but does not respond to the request until it sees an
    `EventMsg::TurnAborted`.
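    
    A minimal sketch of the "wait for `TurnAborted` before responding" idea, using a
    standard-library channel. The `EventMsg` enum here is a simplified stand-in
    (the real one has more variants and payloads), and `wait_for_turn_aborted` is a
    hypothetical helper name, not a function from the codebase:
    
    ```rust
    use std::sync::mpsc::Receiver;
    
    /// Simplified stand-in for the real protocol enum (assumption).
    #[derive(Debug, PartialEq)]
    enum EventMsg {
        AgentMessage(String),
        TurnAborted,
    }
    
    /// Drains events until `TurnAborted` arrives and returns how many other
    /// events were seen first. Returns `None` if the channel closes before
    /// the turn is reported as aborted.
    fn wait_for_turn_aborted(events: &Receiver<EventMsg>) -> Option<usize> {
        let mut skipped = 0;
        while let Ok(event) = events.recv() {
            if event == EventMsg::TurnAborted {
                return Some(skipped);
            }
            skipped += 1;
        }
        None
    }
    ```
    
    Under this sketch, a request handler would send the interrupt, call the
    helper, and only then reply to the client.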
  • [tools] Add apply_patch tool (#2303)
    ## Summary
    We've been seeing a number of issues and reports with our synthetic
    `apply_patch` tool, e.g. #802. Let's make this a real tool - in my
    anecdotal testing, it's critical for GPT-OSS models, but I'd like to
    make it the standard across GPT-5 and codex models as well.
    
    ## Testing
    - [x] Tested locally
    - [x] Integration test
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * Moved a bunch of allows to expects where possible
    
    Note: in integration tests, non-`#[test]` helper functions are not
    covered by this, so we had to leave a few lingering `expect(expect_used)`
    checks around.
  • Fix AF_UNIX, sockpair, recvfrom in linux sandbox (#2309)
    When using codex-tui on a linux system I was unable to run `cargo
    clippy` inside of codex due to:
    ```
    [pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0,  <unfinished ...>
    [pid 3548370] close(8 <unfinished ...>
    [pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted)
    ```
    And
    ```
    3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted)
    ```
    
    This PR:
    * Fixes a bug that disallowed AF_UNIX on `socket()`, so it is now allowed
    * Adds recvfrom() to the syscall allow list. This should be fine since
    we disable opening new sockets, but we should validate that there is no
    open socket inheritance issue.
    * Allows socketpair to be called for AF_UNIX
    * Adds tests for AF_UNIX components
    * All of which allows running `cargo clippy` within the sandbox on
    linux, and possibly other tooling using a fork server model + AF_UNIX
    comms.
  • fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318)
    The high-order bit on this PR is that it makes it so `sandbox.rs` tests
    both Mac and Linux, as we introduce a general
    `spawn_command_under_sandbox()` function with platform-specific
    implementations for testing.
    
    An important, and interesting, discovery in porting the test to Linux is
    that (for reasons cited in the code comments), `/dev/shm` has to be
    added to `writable_roots` on Linux in order for `multiprocessing.Lock`
    to work there. Granting write access to `/dev/shm` comes with some
    degree of risk, so we do not make this the default for Codex CLI.
    
    Piggybacking on top of #2317, this moves the
    `python_multiprocessing_lock_works` test yet again, moving
    `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs`
    because in `codex-rs/exec/tests` we can use `cargo_bin()` like so:
    
    ```
    let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec");
    ```
    
    which is necessary so we can use `codex_linux_sandbox_exe` and therefore
    `spawn_command_under_linux_sandbox` in an integration test.
    
    This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs`
    and into `landlock.rs`, which makes things more consistent with
    `seatbelt.rs` in `codex-core`.
    
    For reference, https://github.com/openai/codex/pull/1808 is the PR that
    made the change to Seatbelt to get this test to pass on Mac.
  • chore: introduce ConversationManager as a clearinghouse for all conversations (#2240)
    This PR does two things because after I got deep into the first one I
    started pulling on the thread to the second:
    
    - Makes `ConversationManager` the place where all in-memory
    conversations are created and stored. Previously, `MessageProcessor` in
    the `codex-mcp-server` crate was doing this via its `session_map`, but
    this is something that should be done in `codex-core`.
    - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
    throughout our code. I think this made sense at one time, but now that
    we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
    I don't think this was quite right, so I removed it. For `codex exec`
    and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
    no longer make `Notify` a field of `Codex` or `CodexConversation`.
    
    Changes of note:
    
    - Adds the files `conversation_manager.rs` and `codex_conversation.rs`
    to `codex-core`.
    - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
    other crates must use `CodexConversation` instead (which is created via
    `ConversationManager`).
    - `core/src/codex_wrapper.rs` has been deleted in favor of
    `ConversationManager`.
    - `ConversationManager::new_conversation()` returns `NewConversation`,
    which is in line with the `new_conversation` tool we want to add to the
    MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
    we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
    verify `SessionConfiguredEvent` is the first event because that is now
    internal to `ConversationManager`.
    - Quite a bit of code was deleted from
    `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
    manage multiple conversations itself: it goes through
    `ConversationManager` instead.
    - `core/tests/live_agent.rs` has been deleted because I had to update a
    bunch of tests and all the tests in here were ignored, and I don't think
    anyone ever ran them, so this was just technical debt, at this point.
    - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
    hope to refactor the blandly-named `util.rs` into more descriptive
    files).
    - In general, I started renaming local variables named `codex` to
    `conversation`, where appropriate, though admittedly I didn't do it
    through all the integration tests because that would have added a lot of
    noise to this PR.
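    
    A rough, synchronous sketch of the "clearinghouse" idea: one owner creates
    conversations and stores them in memory, and callers hold only an id. The
    type names, the `u64` id, and the `get` method are assumptions for
    illustration; the real `ConversationManager` is async and returns a richer
    `NewConversation` value:
    
    ```rust
    use std::collections::HashMap;
    
    /// Stand-in for the real conversation handle (assumption).
    #[derive(Debug)]
    struct ConversationSketch {
        id: u64,
    }
    
    /// Hypothetical in-memory registry of conversations.
    #[derive(Default)]
    struct ConversationManagerSketch {
        next_id: u64,
        conversations: HashMap<u64, ConversationSketch>,
    }
    
    impl ConversationManagerSketch {
        /// Creates a conversation, stores it, and returns its id.
        fn new_conversation(&mut self) -> u64 {
            let id = self.next_id;
            self.next_id += 1;
            self.conversations.insert(id, ConversationSketch { id });
            id
        }
    
        /// Looks up a previously created conversation by id.
        fn get(&self, id: u64) -> Option<&ConversationSketch> {
            self.conversations.get(&id)
        }
    }
    ```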
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
    * #2264
    * #2263
    * __->__ #2240
  • Re-add markdown streaming (#2029)
    Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.
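    
    The "wait for newlines" step can be sketched as a small buffer that
    accumulates streamed deltas and releases only complete lines. `LineBuffer`
    is a hypothetical name, and markdown rendering and word wrapping are left
    out of this sketch:
    
    ```rust
    /// Hypothetical helper: holds streamed text until a full line is available.
    #[derive(Default)]
    struct LineBuffer {
        pending: String,
    }
    
    impl LineBuffer {
        /// Appends a delta and returns every line that is now complete,
        /// without its trailing newline. Incomplete text stays buffered.
        fn push(&mut self, delta: &str) -> Vec<String> {
            self.pending.push_str(delta);
            let mut lines = Vec::new();
            while let Some(pos) = self.pending.find('\n') {
                // Remove the line (including the newline) from the buffer.
                let line: String = self.pending.drain(..=pos).collect();
                lines.push(line.trim_end_matches('\n').to_string());
            }
            lines
        }
    }
    ```
    
    Each returned line could then be rendered as markdown and wrapped to the
    current terminal width before being emitted to the UI.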
  • [1/3] Parse exec commands and format them more nicely in the UI (#2095)
    # Note for reviewers
    The bulk of this PR is in the new file, `parse_command.rs`. This file
    is designed to be written TDD and implemented with Codex. Do not worry
    about reviewing the code, just review the unit tests (if you want). If
    any cases are missing, we'll add more tests and have Codex fix them.
    
    I think the best approach will be to land and iterate. I have some
    follow-ups I want to do after this lands. The next PR after this will
    let us merge (and dedupe) multiple sequential cells of the same type, such as
    multiple read commands. The deduping will also be important because the
    model often reads the same file multiple times in a row in chunks.
    
    ===
    
    This PR formats common commands like reading, formatting, testing, etc
    more nicely:
    
    It tries to extract things like file names and tests, and falls back to the
    raw command if it can't. It also only shows stdout/stderr if the command failed.
    
    <img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15"
    src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e"
    />
    <img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32"
    src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9"
    />
    <img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2"
    src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f"
    />
    
    Part 2: https://github.com/openai/codex/pull/2097
    Part 3: https://github.com/openai/codex/pull/2110
  • [exec] Fix exec sandbox arg (#2034)
    ## Summary
    From codex-cli 😁 
    `-s/--sandbox` now correctly affects sandbox mode.
    
    What changed
    - In `codex-rs/exec/src/cli.rs`:
      - Added `value_enum` to the `--sandbox` flag so Clap parses enum values
        into `SandboxModeCliArg`.
      - This ensures values like `-s read-only`, `-s workspace-write`, and
        `-s danger-full-access` are recognized and propagated.
    
    Why this fixes it
    - The enum already derives `ValueEnum`, but without `#[arg(value_enum)]`
      Clap may not map the string into the enum, leaving the option ineffective
      at runtime. With `value_enum`, `sandbox_mode` is parsed and then converted
      to `SandboxMode` in `run_main`, which feeds into `ConfigOverrides` and
      ultimately into the effective `sandbox_policy`.
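    
    In the PR itself, Clap's `ValueEnum` derive performs the string-to-enum
    mapping. As a standard-library-only illustration of what that mapping amounts
    to, here is a sketch using `FromStr`. The enum name and its variant names are
    assumptions; only the three string values come from the description above:
    
    ```rust
    use std::str::FromStr;
    
    /// Hypothetical mirror of the CLI enum (variant names are assumptions).
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxModeSketch {
        ReadOnly,
        WorkspaceWrite,
        DangerFullAccess,
    }
    
    impl FromStr for SandboxModeSketch {
        type Err = String;
    
        /// Maps the kebab-case CLI value onto the enum, rejecting anything else.
        fn from_str(s: &str) -> Result<Self, Self::Err> {
            match s {
                "read-only" => Ok(Self::ReadOnly),
                "workspace-write" => Ok(Self::WorkspaceWrite),
                "danger-full-access" => Ok(Self::DangerFullAccess),
                other => Err(format!("unknown sandbox mode: {other}")),
            }
        }
    }
    ```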
  • [config] Onboarding flow with persistence (#1929)
    ## Summary
    In collaboration with @gpeal: upgrade the onboarding flow, and persist
    user settings.
    
    ---------
    
    Co-authored-by: Gabriel Peal <gabriel@openai.com>
  • [fix] fix absolute and % token counts (#1931)
    - For absolute, use non-cached input + output.
    - For estimating what % of the model's context window is used, we need
    to account for reasoning output tokens from prior turns being dropped
    from the context window. We approximate this here by subtracting
    reasoning output tokens from the total. This will be off for the current
    turn and pending function calls. We can improve it later.
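    
    The two calculations above can be sketched as follows. The struct and its
    field names are assumptions made for illustration; only the arithmetic
    (non-cached input plus output, and total minus reasoning output) comes from
    the description:
    
    ```rust
    /// Hypothetical usage record; field names are assumptions.
    struct TokenUsageSketch {
        input_tokens: u64,
        cached_input_tokens: u64,
        output_tokens: u64,
        reasoning_output_tokens: u64,
        total_tokens: u64,
    }
    
    impl TokenUsageSketch {
        /// Absolute count: non-cached input plus output.
        fn absolute(&self) -> u64 {
            self.input_tokens.saturating_sub(self.cached_input_tokens) + self.output_tokens
        }
    
        /// Approximate percentage of the context window in use, with reasoning
        /// output tokens subtracted because they are dropped from later turns.
        fn percent_of_context_used(&self, context_window: u64) -> f64 {
            if context_window == 0 {
                return 0.0;
            }
            let in_context = self.total_tokens.saturating_sub(self.reasoning_output_tokens);
            100.0 * in_context as f64 / context_window as f64
        }
    }
    ```
    
    For example, with 1000 input tokens (400 cached), 200 output tokens, 50
    reasoning tokens, and a total of 1200, the absolute count is 800 and the
    estimate for a 10000-token window is 11.5%.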
  • Migrate GitWarning to OnboardingScreen (#1915)
    This paves the way to do per-directory approval settings
    (https://github.com/openai/codex/pull/1912).
    
    This also lets us pass in a Config/ChatWidgetArgs into onboarding which
    can then mutate it and emit the ChatWidgetArgs it wants at the end which
    may be modified by the said approval settings.
    
    <img width="1180" height="428" alt="CleanShot 2025-08-06 at 19 30 55"
    src="https://github.com/user-attachments/assets/4dcfda42-0f5e-4b6d-a16d-2597109cc31c"
    />
  • [feat] add /status slash command (#1873)
    - Added a `/status` command, which will be useful when we update the
    home screen to print less status.
    - Moved `create_config_summary_entries` to common since it's used in a
    few places.
    - Noticed we inconsistently had periods in slash command descriptions
    and just removed them everywhere.
    - Noticed the diff description was overflowing so made it shorter.
  • fix: exit cleanly when ShutdownComplete is received (#1864)
    Previous to this PR, `ShutdownComplete` was not being handled correctly
    in `codex exec`, so it always ended up printing the following to stderr:
    
    ```
    ERROR codex_exec: Error receiving event: InternalAgentDied
    ```
    
    Because we were not breaking out of the loop for `ShutdownComplete`,
    inevitably `codex.next_event()` would get called again and
    `rx_event.recv()` would fail and the error would get mapped to
    `InternalAgentDied`:
    
    
    https://github.com/openai/codex/blob/ea7d3f27bdc1da61df979419515889f64f36c5ce/codex-rs/core/src/codex.rs#L190-L197
    
    For reference, https://github.com/openai/codex/pull/1647 introduced the
    `ShutdownComplete` variant.
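    
    A simplified sketch of the fix, assuming a plain `std::sync::mpsc` channel
    in place of the real async event stream. `ExecEvent` and `run_event_loop`
    are hypothetical names. The key point is that `ShutdownComplete` breaks out
    of the loop, so the receiver is never polled again after the sender is gone:
    
    ```rust
    use std::sync::mpsc::Receiver;
    
    /// Simplified stand-in for the real event type (assumption).
    #[derive(Debug)]
    enum ExecEvent {
        Message(String),
        ShutdownComplete,
    }
    
    /// Collects messages until `ShutdownComplete`. If the channel closes
    /// first, that is reported as an error (the "agent died" case).
    fn run_event_loop(events: &Receiver<ExecEvent>) -> Result<Vec<String>, &'static str> {
        let mut seen = Vec::new();
        loop {
            match events.recv() {
                Ok(ExecEvent::Message(text)) => seen.push(text),
                // Without this break, the next recv() would fail once the
                // sender is dropped and be reported as an error.
                Ok(ExecEvent::ShutdownComplete) => break,
                Err(_) => return Err("agent died"),
            }
        }
        Ok(seen)
    }
    ```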
  • chore: remove unnecessary default_ prefix (#1854)
    This prefix is not in line with the other fields on the `ConfigOverrides`
    struct.
  • fix: when using --oss, ensure correct configuration is threaded through correctly (#1859)
    This PR started as an investigation with the goal of eliminating the use
    of `unsafe { std::env::set_var() }` in `ollama/src/client.rs`, as
    setting environment variables in a multithreaded context is indeed
    unsafe and these tests were observed to be flaky, as a result.
    
    Though as I dug deeper into the issue, I discovered that the logic for
    instantiating `OllamaClient` under test scenarios was not quite right.
    In this PR, I aimed to:
    
    - share more code between the two creation codepaths,
    `try_from_oss_provider()` and `try_from_provider_with_base_url()`
    - use the values from `Config` when setting up Ollama, as we have
    various mechanisms for overriding config values, so we should be sure
    that we are always using the ultimate `Config` for things such as the
    `ModelProviderInfo` associated with the `oss` id
    
    Once this was in place,
    `OllamaClient::try_from_provider_with_base_url()` could be used in unit
    tests for `OllamaClient` so it was possible to create a properly
    configured client without having to set environment variables.
  • Introduce --oss flag to use gpt-oss models (#1848)
    This adds support for easily running Codex backed by a local Ollama
    instance running our new open source models. See
    https://github.com/openai/gpt-oss for details.
    
    If you pass in `--oss` you'll be prompted to install/launch ollama, and
    it will automatically download the 20b model and attempt to use it.
    
    We'll likely want to expand this with some options later to make the
    experience smoother for users who can't run the 20b or want to run the
    120b.
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Rescue chat completion changes (#1846)
    https://github.com/openai/codex/pull/1835 has some messed up history.
    
    This adds support for streaming chat completions, which is useful for ollama. We should probably take a very skeptical eye to the code introduced in this PR.
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
  • chore: introduce ModelFamily abstraction (#1838)
    To date, we have a number of hardcoded OpenAI model slug checks spread
    throughout the codebase, which makes it hard to audit the various
    special cases for each model. To mitigate this issue, this PR introduces
    the idea of a `ModelFamily` that has fields to represent the existing
    special cases, such as `supports_reasoning_summaries` and
    `uses_local_shell_tool`.
    
    There is a `find_family_for_model()` function that maps the raw model
    slug to a `ModelFamily`. This function hardcodes all the knowledge about
    the special attributes for each model. This PR then replaces the
    hardcoded model name checks with checks against a `ModelFamily`.
    
    Note `ModelFamily` is now available as `Config::model_family`. We should
    ultimately remove `Config::model` in favor of
    `Config::model_family::slug`.
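    
    A minimal sketch of the shape this describes. The two boolean fields are
    named in the description above; the `family` field, the `_sketch` suffixes,
    and the `example-*` slug prefixes are placeholders and do not reflect the
    real mapping:
    
    ```rust
    /// Hypothetical mirror of the abstraction described above.
    #[derive(Debug, Clone, PartialEq)]
    struct ModelFamilySketch {
        slug: String,
        family: String,
        supports_reasoning_summaries: bool,
        uses_local_shell_tool: bool,
    }
    
    /// Maps a raw model slug to its family. All model-specific knowledge is
    /// kept in this one function instead of being scattered across call sites.
    fn find_family_for_model_sketch(slug: &str) -> Option<ModelFamilySketch> {
        // Placeholder prefixes, not real model names.
        let (family, summaries, local_shell) = if slug.starts_with("example-reasoning") {
            ("example-reasoning", true, false)
        } else if slug.starts_with("example-shell") {
            ("example-shell", false, true)
        } else {
            return None;
        };
        Some(ModelFamilySketch {
            slug: slug.to_string(),
            family: family.to_string(),
            supports_reasoning_summaries: summaries,
            uses_local_shell_tool: local_shell,
        })
    }
    ```
    
    Call sites would then check a field such as `supports_reasoning_summaries`
    rather than comparing model names directly.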
  • [codex] stop printing error message when --output-last-message is not specified (#1828)
    Previously, `codex exec` was printing `Warning: no file to write last
    message to` as a warning to stderr even though `--output-last-message`
    was not specified, which is wrong. This fixes the code and changes
    `handle_last_message()` so that it is only called when
    `last_message_path` is `Some`.
  • Add a TurnDiffTracker to create a unified diff for an entire turn (#1770)
    This lets us show an accumulating diff across all patches in a turn.
    Refer to the docs for TurnDiffTracker for implementation details.
    
    There are multiple ways this could have been done and this felt like the
    right tradeoff between reliability and completeness:
    *Pros*
    * It will pick up all changes to files that the model touched, including
    changes made by prettier or another command that updates them.
    * It will not pick up changes made by the user or other agents to files
    it didn't modify.
    
    *Cons*
    * It will pick up changes that the user made to a file that the model
    also touched
    * It will not pick up changes to codegen or files that were not modified
    with apply_patch
  • fix command duration display (#1806)
    We were always displaying "0ms" before.
    
    <img width="731" height="101" alt="Screenshot 2025-08-02 at 10 51 22 PM"
    src="https://github.com/user-attachments/assets/f56814ed-b9a4-4164-9e78-181c60ce19b7"
    />
  • feat: stream exec stdout events (#1786)
    ## Summary
    - stream command stdout as `ExecCommandStdout` events
    - forward streamed stdout to clients and ignore in human output
    processor
    - adjust call sites for new streaming API
  • Auto format toml (#1745)
    Add recommended extension and configure it to auto format prompt.
  • fix: run apply_patch calls through the sandbox (#1705)
    Building on the work of https://github.com/openai/codex/pull/1702, this
    changes how a shell call to `apply_patch` is handled.
    
    Previously, a shell call to `apply_patch` was always handled in-process,
    never leveraging a sandbox. To determine whether the `apply_patch`
    operation could be auto-approved, the
    `is_write_patch_constrained_to_writable_paths()` function would check if
    all the paths listed in the patch were writable. If so, the agent would
    apply the changes listed in the patch.
    
    Unfortunately, this approach afforded a loophole: symlinks!
    
    * For a soft link, we could fix this issue by tracing the link and
    checking whether the target is in the set of writable paths, however...
    * ...For a hard link, things are not as simple. We can run `stat FILE`
    to see if the number of links is greater than 1, but then we would have
    to do something potentially expensive like `find . -inum <inode_number>`
    to find the other paths for `FILE`. Further, even if this worked, this
    approach runs the risk of a
    [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)
    race condition, so it is not robust.
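    
    For illustration only, the soft-link check described above might look like
    the following sketch. The function names are hypothetical, and the roots are
    assumed to be already canonicalized. Note that this sketch does nothing about
    hard links and is still subject to the TOCTOU race, which is why it is not a
    robust fix:
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Pure check: does the resolved path sit under any writable root?
    /// `Path::starts_with` compares whole components, so `/tmp/workspace`
    /// is not considered to be under `/tmp/work`.
    fn is_under_any_root(resolved: &Path, writable_roots: &[PathBuf]) -> bool {
        writable_roots.iter().any(|root| resolved.starts_with(root))
    }
    
    /// Follows soft links with `canonicalize` before applying the check.
    /// The target can still change between this check and the later write.
    fn soft_link_target_is_writable(
        path: &Path,
        writable_roots: &[PathBuf],
    ) -> std::io::Result<bool> {
        let resolved = std::fs::canonicalize(path)?;
        Ok(is_under_any_root(&resolved, writable_roots))
    }
    ```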
    
    The solution, implemented in this PR, is to turn the virtual execution
    of the `apply_patch` CLI into an _actual_ execution using `codex
    --codex-run-as-apply-patch PATCH`, which we can run under the sandbox
    the user specified, just like any other `shell` call.
    
    This, of course, assumes that the sandbox prevents writing through
    symlinks as a mechanism to write to folders that are not in the writable
    set configured by the sandbox. I verified this by testing the following
    on both Mac and Linux:
    
    ```shell
    #!/usr/bin/env bash
    set -euo pipefail
    
    # Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR?
    
    # Codex is run in SANDBOX_DIR, so writes should be constrained to this directory.
    SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    # EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it.
    EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    
    echo "SANDBOX_DIR: $SANDBOX_DIR"
    echo "EXPLOIT_DIR: $EXPLOIT_DIR"
    
    cleanup() {
      # Only remove if it looks sane and still exists
      [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR"
      [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR"
    }
    
    trap cleanup EXIT
    
    echo "I am the original content" > "${EXPLOIT_DIR}/original.txt"
    
    # Drop the -s to test hard links.
    ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt"
    
    cat "${SANDBOX_DIR}/link-to-original.txt"
    
    if [[ "$(uname)" == "Linux" ]]; then
        SANDBOX_SUBCOMMAND=landlock
    else
        SANDBOX_SUBCOMMAND=seatbelt
    fi
    
    # Attempt the exploit
    cd "${SANDBOX_DIR}"
    
    codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true
    
    cat "${EXPLOIT_DIR}/original.txt"
    ```
    
    Admittedly, this change merits a proper integration test, but I think I
    will have to do that in a follow-up PR.
  • remove conversation history widget (#1727)
    this widget is no longer used.
  • Add an experimental plan tool (#1726)
    This adds a tool the model can call to update a plan. The tool doesn't
    actually _do_ anything but it gives clients a chance to read and render
    the structured plan. We will likely iterate on the prompt and tools
    exposed for planning over time.
  • Relative instruction file (#1722)
    Passing in an instruction file with a bad path led to silent failures,
    and relative instruction paths were handled in an unintuitive fashion.
  • fix: support special --codex-run-as-apply-patch arg (#1702)
    This introduces some special behavior to the CLIs that are using the
    `codex-arg0` crate where if `arg1` is `--codex-run-as-apply-patch`, then
    it will run as if `apply_patch arg2` were invoked. This is important
    because it means we can do things like:
    
    ```
    SANDBOX_TYPE=landlock # or seatbelt for macOS
    codex debug "${SANDBOX_TYPE}" -- codex --codex-run-as-apply-patch PATCH
    ```
    
    which gives us a way to run `apply_patch` while ensuring it adheres to
    the sandbox the user specified.
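    
    The dispatch on `arg1` can be sketched as a pure function over the argument
    list. `Dispatch` and `dispatch_from_args` are hypothetical names, and the
    `MissingPatch` case is an assumption about how a missing `arg2` might be
    handled:
    
    ```rust
    /// What the process should do, based on its command-line arguments.
    #[derive(Debug, PartialEq)]
    enum Dispatch {
        /// Behave as if `apply_patch <patch>` had been invoked.
        ApplyPatch(String),
        /// The special flag was given without a patch argument (assumption).
        MissingPatch,
        /// Continue with normal CLI startup.
        Normal,
    }
    
    /// `args[0]` is the program name, `args[1]` the potential special flag,
    /// and `args[2]` the patch body.
    fn dispatch_from_args(args: &[String]) -> Dispatch {
        match args.get(1).map(String::as_str) {
            Some("--codex-run-as-apply-patch") => match args.get(2) {
                Some(patch) => Dispatch::ApplyPatch(patch.clone()),
                None => Dispatch::MissingPatch,
            },
            _ => Dispatch::Normal,
        }
    }
    ```
    
    A real entry point would presumably pass in `std::env::args().collect::<Vec<_>>()`
    and perform this check before any regular argument parsing.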
    
    While it would be nice to use the `arg0` trick like we are currently
    doing for `codex-linux-sandbox`, there is no way to specify the `arg0`
    for the underlying command when running under `/usr/bin/sandbox-exec`,
    so it will not work for us in this case.
    
    Admittedly, we could have also supported this via a custom environment
    variable (e.g., `CODEX_ARG0`), but since environment variables are
    inherited by child processes, that seemed like a potentially leakier
    abstraction.
    
    This change, as well as our existing reliance on checking `arg0`, places
    additional requirements on those who include `codex-core`. Its
    `README.md` has been updated to reflect this.
    
    While we could have just added an `apply-patch` subcommand to the
    `codex` multitool CLI, that would not be sufficient for the standalone
    `codex-exec` CLI, which is something that we distribute as part of our
    GitHub releases for those who know they will not be using the TUI and
    therefore prefer to use a slightly smaller executable:
    
    https://github.com/openai/codex/releases/tag/rust-v0.10.0
    
    To that end, this PR adds an integration test to ensure that the
    `--codex-run-as-apply-patch` option works with the standalone
    `codex-exec` CLI.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1702).
    * #1705
    * #1703
    * __->__ #1702
    * #1698
    * #1697
  • chore: update Codex::spawn() to return a struct instead of a tuple (#1677)
    Also updates `init_codex()` to return a `struct` instead of a tuple.
  • Update render name in tui for approval_policy to match with config values (#1675)
    Currently, codex on start shows the value for the approval policy as the
    name of the
    [AskForApproval](https://github.com/openai/codex/blob/2437a8d17a0cf972d1a6e7f303d469b6e2f57eae/codex-rs/core/src/protocol.rs#L128)
    enum, which differs from
    [approval_policy](https://github.com/openai/codex/blob/2437a8d17a0cf972d1a6e7f303d469b6e2f57eae/codex-rs/config.md#approval_policy)
    config values.
    E.g. "untrusted" becomes "UnlessTrusted", "on-failure" -> "OnFailure",
    "never" -> "Never".
    This PR changes render names of the approval policy to match with
    configuration values.
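    
    A small sketch of rendering the enum with its configuration spelling via
    `Display`. The enum name here is a stand-in; the three variant names and
    their config strings are the ones listed above:
    
    ```rust
    use std::fmt;
    
    /// Stand-in for the real `AskForApproval` enum.
    #[derive(Debug, Clone, Copy)]
    enum ApprovalPolicySketch {
        UnlessTrusted,
        OnFailure,
        Never,
    }
    
    impl fmt::Display for ApprovalPolicySketch {
        /// Renders the value exactly as it would be written in the config file.
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            let name = match self {
                Self::UnlessTrusted => "untrusted",
                Self::OnFailure => "on-failure",
                Self::Never => "never",
            };
            f.write_str(name)
        }
    }
    ```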
  • Flaky CI fix (#1647)
    Flushing before sending `TaskCompleteEvent` and ending the submission
    loop to avoid race conditions.
  • Add support for custom base instructions (#1645)
    Allows providing custom instructions file as a config parameter and
    custom instruction text via MCP tool call.
  • [mcp-server] Add reply tool call (#1643)
    ## Summary
    Adds a new mcp tool call, `codex-reply`, so we can continue existing
    sessions. This is a first draft and does not yet support sessions from
    previous processes.
    
    ## Testing
    - [x] tested with mcp client
  • feat: add --json flag to codex exec (#1603)
    This is designed to facilitate programmatic use of Codex in a more
    lightweight way than using `codex mcp`.
    
    Passing `--json` to `codex exec` will print each event as a line of JSON
    to stdout. Note that it does not print the individual tokens as they are
    streamed, only full messages, as this is aimed at programmatic use
    rather than to power UI.
    
    <img width="1348" height="1307" alt="image"
    src="https://github.com/user-attachments/assets/fc7908de-b78d-46e4-a6ff-c85de28415c7"
    />
    
    I changed the existing `EventProcessor` into a trait and moved the
    implementation to `EventProcessorWithHumanOutput`. Then I introduced an
    alternative implementation, `EventProcessorWithJsonOutput`. The `--json`
    flag determines which implementation to use.
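    
    The trait-plus-two-implementations structure might be sketched like this.
    All names carry a `Sketch` suffix or are otherwise hypothetical, events are
    reduced to a `(kind, message)` pair, and the JSON escaping is deliberately
    minimal (real code would more likely use a serialization library):
    
    ```rust
    /// Hypothetical reduction of the event-processor trait.
    trait EventProcessorSketch {
        /// Formats one event as the line that would be printed.
        fn process(&mut self, kind: &str, message: &str) -> String;
    }
    
    struct HumanOutputSketch;
    struct JsonOutputSketch;
    
    /// Minimal escaping for quotes, backslashes, and newlines only.
    fn escape_json(s: &str) -> String {
        let mut out = String::with_capacity(s.len());
        for c in s.chars() {
            match c {
                '"' => out.push_str("\\\""),
                '\\' => out.push_str("\\\\"),
                '\n' => out.push_str("\\n"),
                other => out.push(other),
            }
        }
        out
    }
    
    impl EventProcessorSketch for HumanOutputSketch {
        fn process(&mut self, kind: &str, message: &str) -> String {
            format!("[{kind}] {message}")
        }
    }
    
    impl EventProcessorSketch for JsonOutputSketch {
        fn process(&mut self, kind: &str, message: &str) -> String {
            format!(
                "{{\"type\":\"{}\",\"message\":\"{}\"}}",
                escape_json(kind),
                escape_json(message)
            )
        }
    }
    
    /// Picks the implementation based on whether `--json` was passed.
    fn make_processor(json: bool) -> Box<dyn EventProcessorSketch> {
        if json {
            Box::new(JsonOutputSketch)
        } else {
            Box::new(HumanOutputSketch)
        }
    }
    ```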
  • Add streaming to exec and tui (#1594)
    Added support for streaming in `tui`
    Added support for streaming in `exec`
    
    
    https://github.com/user-attachments/assets/4215892e-d940-452c-a1d0-416ed0cf14eb
  • support deltas in core (#1587)
    - Added support for message and reasoning deltas
    - Skipped adding the support in the cli and tui for later
    - Commented out a failing test (wrong merge) that needs a fix in a separate
    PR.
    
    Side note: I think we need to disable merging when CI doesn't pass.
  • feat: add new config option: model_supports_reasoning_summaries (#1524)
    As noted in the updated docs, this makes it so that you can set:
    
    ```toml
    model_supports_reasoning_summaries = true
    ```
    
    as a way of overriding the existing heuristic for when to set the
    `reasoning` field on a sampling request:
    
    
    https://github.com/openai/codex/blob/341c091c5b09dc706ab5c7d629516e6ef5aaf902/codex-rs/core/src/client_common.rs#L152-L166
  • chore(rs): update dependencies (#1494)
    ### Chores
    - Update cargo dependencies
    - Remove unused cargo dependencies
    - Fix clippy warnings
    - Update Dockerfile (package.json requires node 22)
    - Let Dependabot update bun, cargo, devcontainers, docker,
    github-actions, npm (nix still not supported)
    
    ### TODO
    - Upgrade dependencies with breaking changes
    
    ```shell
    $ cargo update --verbose
       Unchanged crossterm v0.28.1 (available: v0.29.0)
       Unchanged schemars v0.8.22 (available: v1.0.4)
    ```
  • feat: add support for --sandbox flag (#1476)
    At a high level, we try to design `config.toml` so that you don't have
    to "comment out a lot of stuff" when testing different options.
    
    Previously, defining a sandbox policy was somewhat at odds with this
    principle because you would define the policy as attributes of
    `[sandbox]` like so:
    
    ```toml
    [sandbox]
    mode = "workspace-write"
    writable_roots = [ "/tmp" ]
    ```
    
    but if you wanted to temporarily change to a read-only sandbox, you
    might feel compelled to modify your file to be:
    
    ```toml
    [sandbox]
    mode = "read-only"
    # mode = "workspace-write"
    # writable_roots = [ "/tmp" ]
    ```
    
    Technically, commenting out `writable_roots` would not be strictly
    necessary, as `mode = "read-only"` would ignore `writable_roots`, but
    it's still a reasonable thing to do to keep things tidy.
    
    Currently, the various values for `mode` do not support that many
    attributes, so this is not that hard to maintain, but one could imagine
    this becoming more complex in the future.
    
    In this PR, we change Codex CLI so that it no longer recognizes
    `[sandbox]`. Instead, it introduces a top-level option, `sandbox_mode`,
    and `[sandbox_workspace_write]` is used to further configure the sandbox
    when `sandbox_mode = "workspace-write"` is used:
    
    ```toml
    sandbox_mode = "workspace-write"
    
    [sandbox_workspace_write]
    writable_roots = [ "/tmp" ]
    ```
    
    This feels a bit more future-proof in that it is less tedious to
    configure different sandboxes:
    
    ```toml
    sandbox_mode = "workspace-write"
    
    [sandbox_read_only]
    # read-only options here...
    
    [sandbox_workspace_write]
    writable_roots = [ "/tmp" ]
    
    [sandbox_danger_full_access]
    # danger-full-access options here...
    ```
    
    In this scheme, you never need to comment out the configuration for an
    individual sandbox type: you only need to redefine `sandbox_mode`.
    
    Relatedly, previous to this change, a user had to do `-c
    sandbox.mode=read-only` to change the mode on the command line. With
    this change, things are arguably a bit cleaner because the equivalent
    option is `-c sandbox_mode=read-only` (and now `-c
    sandbox_workspace_write=...` can be set separately).
    
    Though more importantly, we introduce the `-s/--sandbox` option to the
    CLI, which maps directly to `sandbox_mode` in `config.toml`, making
    config override behavior easier to reason about. Moreover, as you can
    see in the updates to the various Markdown files, it is much easier to
    explain how to configure sandboxing when things like `--sandbox
    read-only` can be used as an example.
    
    Relatedly, this cleanup also made it straightforward to add support for
    a `sandbox` option for Codex when used as an MCP server (see the changes
    to `mcp-server/src/codex_tool_config.rs`).
    
    Fixes https://github.com/openai/codex/issues/1248.
  • feat: show number of tokens remaining in UI (#1388)
    When using the OpenAI Responses API, we now record the `usage` field for
    a `"response.completed"` event, which includes metrics about the number
    of tokens consumed. We also introduce `openai_model_info.rs`, which
    includes current data about the most common OpenAI models available via
    the API (specifically `context_window` and `max_output_tokens`). If
    Codex does not recognize the model, you can set `model_context_window`
    and `model_max_output_tokens` explicitly in `config.toml`.
    
    We then introduce a new event type to `protocol.rs`, `TokenCount`,
    which includes the `TokenUsage` for the most recent turn.
    
    Finally, we update the TUI to record the running sum of tokens used so
    the percentage of available context window remaining can be reported via
    the placeholder text for the composer:
    
    ![Screenshot 2025-06-25 at 11 20
    55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49)
    
    We could certainly get much fancier with this (such as reporting the
    estimated cost of the conversation), but for now, we are just trying to
    achieve feature parity with the TypeScript CLI.
    
    Though arguably this improves upon the TypeScript CLI, as the TypeScript
    CLI uses heuristics to estimate the number of tokens used rather than
    using the `usage` information directly:
    
    
    https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16
    
    Fixes https://github.com/openai/codex/issues/1242
  • feat: add --dangerously-bypass-approvals-and-sandbox (#1384)
    This PR reworks `assess_command_safety()` so that the combination of
    `AskForApproval::Never` and `SandboxPolicy::DangerFullAccess` ensures
    that commands are run without _any_ sandbox and the user should never be
    prompted. In turn, it adds support for a new
    `--dangerously-bypass-approvals-and-sandbox` flag (that cannot be used
    with `--approval-policy` or `--full-auto`) that sets both of those
    options.
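    
    The combination described above can be sketched as a single predicate. The
    enums are stand-ins: `Never` and `DangerFullAccess` are named in this
    description, while the remaining variants and the function name are
    assumptions for illustration. The real `assess_command_safety()` handles
    more cases than this:
    
    ```rust
    /// Stand-in for `AskForApproval`.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum AskForApprovalSketch {
        UnlessTrusted,
        OnFailure,
        Never,
    }
    
    /// Stand-in for `SandboxPolicy` (variants other than `DangerFullAccess`
    /// are assumptions).
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxPolicySketch {
        ReadOnly,
        WorkspaceWrite,
        DangerFullAccess,
    }
    
    /// True only for the one combination that means "no sandbox, never prompt".
    fn runs_unsandboxed_without_prompt(
        approval: AskForApprovalSketch,
        sandbox: SandboxPolicySketch,
    ) -> bool {
        matches!(
            (approval, sandbox),
            (AskForApprovalSketch::Never, SandboxPolicySketch::DangerFullAccess)
        )
    }
    ```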
    
    Fixes https://github.com/openai/codex/issues/1254
  • chore: improve docstring for --full-auto (#1379)
    Reference `-c sandbox.mode=workspace-write` in the docstring and users
    can read the config docs for `sandbox` for more information.
  • fix: pretty-print the sandbox config in the TUI/exec modes (#1376)
    Now that https://github.com/openai/codex/pull/1373 simplified the
    sandbox config, we can print something much simpler in the TUI (and in
    `codex exec`) to summarize the sandbox config.
    
    Before:
    
    ![Screenshot 2025-06-24 at 5 45
    52 PM](https://github.com/user-attachments/assets/b7633efb-a619-43e1-9abe-7bb0be2d0ec0)
    
    With this change:
    
    ![Screenshot 2025-06-24 at 5 46
    44 PM](https://github.com/user-attachments/assets/8d099bdd-a429-4796-a08d-70931d984e4f)
    
    For reference, my `config.toml` contains:
    
    ```
    [sandbox]
    mode = "workspace-write"
    writable_roots = ["/tmp", "/Users/mbolin/.pyenv/shims"]
    ```
    
    Fixes https://github.com/openai/codex/issues/1248