Commit Graph

15 Commits

  • chore: clippy on redundant closure (#4058)
    Add redundant closure clippy rules and let Codex fix it by minimising
    FQP
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Previous to this PR, both of these functions take a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
  • Include command output when sending timeout to model (#3576)
    Being able to see the output helps the model decide how to handle the
    timeout.
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * moved a bunch of allows to expects where possible
    
    Note: in integration tests, non `#[test]` helper functions are not
    covered by this so we had to leave a few lingering `expect(expect_used`
    checks around
  • chore: introduce ConversationManager as a clearinghouse for all conversations (#2240)
    This PR does two things because after I got deep into the first one I
    started pulling on the thread to the second:
    
    - Makes `ConversationManager` the place where all in-memory
    conversations are created and stored. Previously, `MessageProcessor` in
    the `codex-mcp-server` crate was doing this via its `session_map`, but
    this is something that should be done in `codex-core`.
    - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
    throughout our code. I think this made sense at one time, but now that
    we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
    I don't think this was quite right, so I removed it. For `codex exec`
    and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
    no longer make `Notify` a field of `Codex` or `CodexConversation`.
    
    Changes of note:
    
    - Adds the files `conversation_manager.rs` and `codex_conversation.rs`
    to `codex-core`.
    - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
    other crates must use `CodexConversation` instead (which is created via
    `ConversationManager`).
    - `core/src/codex_wrapper.rs` has been deleted in favor of
    `ConversationManager`.
    - `ConversationManager::new_conversation()` returns `NewConversation`,
    which is in line with the `new_conversation` tool we want to add to the
    MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
    we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
    verify `SessionConfiguredEvent` is the first event because that is now
    internal to `ConversationManager`.
    - Quite a bit of code was deleted from
    `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
    manage multiple conversations itself: it goes through
    `ConversationManager` instead.
    - `core/tests/live_agent.rs` has been deleted because I had to update a
    bunch of tests and all the tests in here were ignored, and I don't think
    anyone ever ran them, so this was just technical debt, at this point.
    - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
    hope to refactor the blandly-named `util.rs` into more descriptive
    files).
    - In general, I started replacing local variables named `codex` as
    `conversation`, where appropriate, though admittedly I didn't do it
    through all the integration tests because that would have added a lot of
    noise to this PR.
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
    * #2264
    * #2263
    * __->__ #2240
  • Include output truncation message in tool call results (#2183)
    To avoid model being confused about incomplete output.
  • feat: add /tmp by default (#1919)
    Replaces the `include_default_writable_roots` option on
    `sandbox_workspace_write` (that defaulted to `true`, which was slightly
    weird/annoying) with `exclude_tmpdir_env_var`, which defaults to
    `false`.
    
    Though perhaps more importantly `/tmp` is now enabled by default as part
    of `sandbox_mode = "workspace-write"`, though `exclude_slash_tmp =
    false` can be used to disable this.
  • [approval_policy] Add OnRequest approval_policy (#1865)
    ## Summary
    A split-up PR of #1763 , stacked on top of a tools refactor #1858 to
    make the change clearer. From the previous summary:
    
    > Let's try something new: tell the model about the sandbox, and let it
    decide when it will need to break the sandbox. Some local testing
    suggests that it works pretty well with zero iteration on the prompt!
    
    ## Testing
    - [x] Added unit tests
    - [x] Tested locally and it appears to work smoothly!
  • chore: introduce SandboxPolicy::WorkspaceWrite::include_default_writable_roots (#1785)
    Without this change, it is challenging to create integration tests to
    verify that the folders not included in `writable_roots` in
    `SandboxPolicy::WorkspaceWrite` are read-only because, by default,
    `get_writable_roots_with_cwd()` includes `TMPDIR`, which is where most
    integrationt
    tests do their work.
    
    This introduces a `use_exact_writable_roots` option to disable the
    default
    includes returned by `get_writable_roots_with_cwd()`.
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1785).
    * #1765
    * __->__ #1785
  • feat: stream exec stdout events (#1786)
    ## Summary
    - stream command stdout as `ExecCommandStdout` events
    - forward streamed stdout to clients and ignore in human output
    processor
    - adjust call sites for new streaming API
  • chore(rs): update dependencies (#1494)
    ### Chores
    - Update cargo dependencies
    - Remove unused cargo dependencies
    - Fix clippy warnings
    - Update Dockerfile (package.json requires node 22)
    - Let Dependabot update bun, cargo, devcontainers, docker,
    github-actions, npm (nix still not supported)
    
    ### TODO
    - Upgrade dependencies with breaking changes
    
    ```shell
    $ cargo update --verbose
       Unchanged crossterm v0.28.1 (available: v0.29.0)
       Unchanged schemars v0.8.22 (available: v1.0.4)
    ```
  • feat: redesign sandbox config (#1373)
    This is a major redesign of how sandbox configuration works and aims to
    fix https://github.com/openai/codex/issues/1248. Specifically, it
    replaces `sandbox_permissions` in `config.toml` (and the
    `-s`/`--sandbox-permission` CLI flags) with a "table" with effectively
    three variants:
    
    ```toml
    # Safest option: full disk is read-only, but writes and network access are disallowed.
    [sandbox]
    mode = "read-only"
    
    # The cwd of the Codex task is writable, as well as $TMPDIR on macOS.
    # writable_roots can be used to specify additional writable folders.
    [sandbox]
    mode = "workspace-write"
    writable_roots = []  # Optional, defaults to the empty list.
    network_access = false  # Optional, defaults to false.
    
    # Disable sandboxing: use at your own risk!!!
    [sandbox]
    mode = "danger-full-access"
    ```
    
    This should make sandboxing easier to reason about. While we have
    dropped support for `-s`, the way it works now is:
    
    - no flags => `read-only`
    - `--full-auto` => `workspace-write`
    - currently, there is no way to specify `danger-full-access` via a CLI
    flag, but we will revisit that as part of
    https://github.com/openai/codex/issues/1254
    
    Outstanding issue:
    
    - As noted in the `TODO` on `SandboxPolicy::is_unrestricted()`, we are
    still conflating sandbox preferences with approval preferences in that
    case, which needs to be cleaned up.
  • fix: support arm64 build for Linux (#1225)
    Users were running into issues with glibc mismatches on arm64 linux. In
    the past, we did not provide a musl build for arm64 Linux because we had
    trouble getting the openssl dependency to build correctly. Though today
    I just tried the same trick in `Cargo.toml` that we were doing for
    `x86_64-unknown-linux-musl` (using `openssl-sys` with `features =
    ["vendored"]`), so I'm not sure what problem we had in the past the
    builds "just worked" today!
    
    Though one tweak that did have to be made is that the integration tests
    for Seccomp/Landlock empirically require longer timeouts on arm64 linux,
    or at least on the `ubuntu-24.04-arm` GitHub Runner. As such, we change
    the timeouts for arm64 in `codex-rs/linux-sandbox/tests/landlock.rs`.
    
    Though in solving this problem, I decided I needed a turnkey solution
    for testing the Linux build(s) from my Mac laptop, so this PR introduces
    `.devcontainer/Dockerfile` and `.devcontainer/devcontainer.json` to
    facilitate this. Detailed instructions are in `.devcontainer/README.md`.
    
    We will update `dotslash-config.json` and other release-related scripts
    in a follow-up PR.
  • fix: overhaul how we spawn commands under seccomp/landlock on Linux (#1086)
    Historically, we spawned the Seatbelt and Landlock sandboxes in
    substantially different ways:
    
    For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
    specified as an arg followed by the original command:
    
    
    https://github.com/openai/codex/blob/d1de7bb383552e8fadd94be79d65d188e00fd562/codex-rs/core/src/exec.rs#L147-L219
    
    For **Landlock/Seccomp**, we would do
    `tokio::runtime::Builder::new_current_thread()`, _invoke
    Landlock/Seccomp APIs to modify the permissions of that new thread_, and
    then spawn the command:
    
    
    https://github.com/openai/codex/blob/d1de7bb383552e8fadd94be79d65d188e00fd562/codex-rs/core/src/exec_linux.rs#L28-L49
    
    While it is neat that Landlock/Seccomp supports applying a policy to
    only one thread without having to apply it to the entire process, it
    requires us to maintain two different codepaths and is a bit harder to
    reason about. The tipping point was
    https://github.com/openai/codex/pull/1061, in which we had to start
    building up the `env` in an unexpected way for the existing
    Landlock/Seccomp approach to continue to work.
    
    This PR overhauls things so that we do similar things for Mac and Linux.
    It turned out that we were already building our own "helper binary"
    comparable to Mac's `sandbox-exec` as part of the `cli` crate:
    
    
    https://github.com/openai/codex/blob/d1de7bb383552e8fadd94be79d65d188e00fd562/codex-rs/cli/Cargo.toml#L10-L12
    
    We originally created this to build a small binary to include with the
    Node.js version of the Codex CLI to provide support for Linux
    sandboxing.
    
    Though the sticky bit is that, at this point, we still want to deploy
    the Rust version of Codex as a single, standalone binary rather than a
    CLI and a supporting sandboxing binary. To satisfy this goal, we use
    "the arg0 trick," in which we:
    
    * use `std::env::current_exe()` to get the path to the CLI that is
    currently running
    * use the CLI as the `program` for the `Command`
    * set `"codex-linux-sandbox"` as arg0 for the `Command`
    
    A CLI that supports sandboxing should check arg0 at the start of the
    program. If it is `"codex-linux-sandbox"`, it must invoke
    `codex_linux_sandbox::run_main()`, which runs the CLI as if it were
    `codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
    appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn
    the original command, so do _replace_ the process rather than spawn a
    subprocess. Incidentally, we do this before starting the Tokio runtime,
    so the process should only have one thread when `execvp(3)` is called.
    
    Because the `core` crate that needs to spawn the Linux sandboxing is not
    a CLI in its own right, this means that every CLI that includes `core`
    and relies on this behavior has to (1) implement it and (2) provide the
    path to the sandboxing executable. While the path is almost always
    `std::env::current_exe()`, we needed to make this configurable for
    integration tests, so `Config` now has a `codex_linux_sandbox_exe:
    Option<PathBuf>` property to facilitate threading this through,
    introduced in https://github.com/openai/codex/pull/1089.
    
    This common pattern is now captured in
    `codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
    functions that should use it have been updated as part of this PR.
    
    The `codex-linux-sandbox` crate added to the Cargo workspace as part of
    this PR now has the bulk of the Landlock/Seccomp logic, which makes
    `core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
    `core/src/landlock.rs` were removed/ported as part of this PR. I also
    moved the unit tests for this code into an integration test,
    `linux-sandbox/tests/landlock.rs`, in which I use
    `env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
    `codex_linux_sandbox_exe` since `std::env::current_exe()` is not
    appropriate in that case.