Commit Graph

12 Commits

  • enable-resume (#3537)
    Adding the ability to resume conversations.
    we have one verb `resume`. 
    
    Behavior:
    
    `tui`:
    `codex resume`: opens session picker
    `codex resume --last`: continue last message
    `codex resume <session id>`: continue conversation with `session id`
    
    `exec`:
    `codex resume --last`: continue last conversation
    `codex resume <session id>`: continue conversation with `session id`
    
    Implementation:
    - I added a function to find the path in `~/.codex/sessions/` with a
    `UUID`. This is helpful in resuming with session id.
    - Added the above mentioned flags
    - Added lots of testing
  • chore: enable clippy::redundant_clone (#3489)
    Created this PR by:
    
    - adding `redundant_clone` to `[workspace.lints.clippy]` in
    `cargo-rs/Cargol.toml`
    - running `cargo clippy --tests --fix`
    - running `just fmt`
    
    Though I had to clean up one instance of the following that resulted:
    
    ```rust
    let codex = codex;
    ```
  • chore: require uninlined_format_args from clippy (#2845)
    - added `uninlined_format_args` to `[workspace.lints.clippy]` in the
    `Cargo.toml` for the workspace
    - ran `cargo clippy --tests --fix`
    - ran `just fmt`
  • [exec] Clean up apply-patch tests (#2648)
    ## Summary
    These tests were getting a bit unwieldy, and they're starting to become
    load-bearing. Let's clean them up, and get them working solidly so we
    can easily expand this harness with new tests.
    
    ## Test Plan
    - [x] Tests continue to pass
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.
  • [apply_patch] freeform apply_patch tool (#2576)
    ## Summary
    GPT-5 introduced the concept of [custom
    tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
    which allow the model to send a raw string result back, simplifying
    json-escape issues. We are migrating gpt-5 to use this by default.
    
    However, gpt-oss models do not support custom tools, only normal
    functions. So we keep both tool definitions, and provide whichever one
    the model family supports.
    
    ## Testing
    - [x] Tested locally with various models
    - [x] Unit tests pass
  • [tools] Add apply_patch tool (#2303)
    ## Summary
    We've been seeing a number of issues and reports with our synthetic
    `apply_patch` tool, e.g. #802. Let's make this a real tool - in my
    anecdotal testing, it's critical for GPT-OSS models, but I'd like to
    make it the standard across GPT-5 and codex models as well.
    
    ## Testing
    - [x] Tested locally
    - [x] Integration test
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * moved a bunch of allows to expects where possible
    
    Note: in integration tests, non `#[test]` helper functions are not
    covered by this so we had to leave a few lingering `expect(expect_used`
    checks around
  • Fix AF_UNIX, sockpair, recvfrom in linux sandbox (#2309)
    When using codex-tui on a linux system I was unable to run `cargo
    clippy` inside of codex due to:
    ```
    [pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0,  <unfinished ...>
    [pid 3548370] close(8 <unfinished ...>
    [pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted)
    ```
    And
    ```
    3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted)
    ```
    
    This PR:
    * Fixes a bug that disallowed AF_UNIX to allow it on `socket()`
    * Adds recvfrom() to the syscall allow list, this should be fine since
    we disable opening new sockets. But we should validate there is not a
    open socket inheritance issue.
    * Allow socketpair to be called for AF_UNIX
    * Adds tests for AF_UNIX components
    * All of which allows running `cargo clippy` within the sandbox on
    linux, and possibly other tooling using a fork server model + AF_UNIX
    comms.
  • fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318)
    The high-order bit on this PR is that it makes it so `sandbox.rs` tests
    both Mac and Linux, as we introduce a general
    `spawn_command_under_sandbox()` function with platform-specific
    implementations for testing.
    
    An important, and interesting, discovery in porting the test to Linux is
    that (for reasons cited in the code comments), `/dev/shm` has to be
    added to `writable_roots` on Linux in order for `multiprocessing.Lock`
    to work there. Granting write access to `/dev/shm` comes with some
    degree of risk, so we do not make this the default for Codex CLI.
    
    Piggybacking on top of #2317, this moves the
    `python_multiprocessing_lock_works` test yet again, moving
    `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs`
    because in `codex-rs/exec/tests` we can use `cargo_bin()` like so:
    
    ```
    let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec");
    ```
    
    which is necessary so we can use `codex_linux_sandbox_exe` and therefore
    `spawn_command_under_linux_sandbox` in an integration test.
    
    This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs`
    and into `landlock.rs`, which makes things more consistent with
    `seatbelt.rs` in `codex-core`.
    
    For reference, https://github.com/openai/codex/pull/1808 is the PR that
    made the change to Seatbelt to get this test to pass on Mac.
  • fix: run apply_patch calls through the sandbox (#1705)
    Building on the work of https://github.com/openai/codex/pull/1702, this
    changes how a shell call to `apply_patch` is handled.
    
    Previously, a shell call to `apply_patch` was always handled in-process,
    never leveraging a sandbox. To determine whether the `apply_patch`
    operation could be auto-approved, the
    `is_write_patch_constrained_to_writable_paths()` function would check if
    all the paths listed in the paths were writable. If so, the agent would
    apply the changes listed in the patch.
    
    Unfortunately, this approach afforded a loophole: symlinks!
    
    * For a soft link, we could fix this issue by tracing the link and
    checking whether the target is in the set of writable paths, however...
    * ...For a hard link, things are not as simple. We can run `stat FILE`
    to see if the number of links is greater than 1, but then we would have
    to do something potentially expensive like `find . -inum <inode_number>`
    to find the other paths for `FILE`. Further, even if this worked, this
    approach runs the risk of a
    [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)
    race condition, so it is not robust.
    
    The solution, implemented in this PR, is to take the virtual execution
    of the `apply_patch` CLI into an _actual_ execution using `codex
    --codex-run-as-apply-patch PATCH`, which we can run under the sandbox
    the user specified, just like any other `shell` call.
    
    This, of course, assumes that the sandbox prevents writing through
    symlinks as a mechanism to write to folders that are not in the writable
    set configured by the sandbox. I verified this by testing the following
    on both Mac and Linux:
    
    ```shell
    #!/usr/bin/env bash
    set -euo pipefail
    
    # Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR?
    
    # Codex is run in SANDBOX_DIR, so writes should be constrianed to this directory.
    SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    # EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it.
    EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    
    echo "SANDBOX_DIR: $SANDBOX_DIR"
    echo "EXPLOIT_DIR: $EXPLOIT_DIR"
    
    cleanup() {
      # Only remove if it looks sane and still exists
      [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR"
      [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR"
    }
    
    trap cleanup EXIT
    
    echo "I am the original content" > "${EXPLOIT_DIR}/original.txt"
    
    # Drop the -s to test hard links.
    ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt"
    
    cat "${SANDBOX_DIR}/link-to-original.txt"
    
    if [[ "$(uname)" == "Linux" ]]; then
        SANDBOX_SUBCOMMAND=landlock
    else
        SANDBOX_SUBCOMMAND=seatbelt
    fi
    
    # Attempt the exploit
    cd "${SANDBOX_DIR}"
    
    codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true
    
    cat "${EXPLOIT_DIR}/original.txt"
    ```
    
    Admittedly, this change merits a proper integration test, but I think I
    will have to do that in a follow-up PR.
  • fix: support special --codex-run-as-apply-patch arg (#1702)
    This introduces some special behavior to the CLIs that are using the
    `codex-arg0` crate where if `arg1` is `--codex-run-as-apply-patch`, then
    it will run as if `apply_patch arg2` were invoked. This is important
    because it means we can do things like:
    
    ```
    SANDBOX_TYPE=landlock # or seatbelt for macOS
    codex debug "${SANDBOX_TYPE}" -- codex --codex-run-as-apply-patch PATCH
    ```
    
    which gives us a way to run `apply_patch` while ensuring it adheres to
    the sandbox the user specified.
    
    While it would be nice to use the `arg0` trick like we are currently
    doing for `codex-linux-sandbox`, there is no way to specify the `arg0`
    for the underlying command when running under `/usr/bin/sandbox-exec`,
    so it will not work for us in this case.
    
    Admittedly, we could have also supported this via a custom environment
    variable (e.g., `CODEX_ARG0`), but since environment variables are
    inherited by child processes, that seemed like a potentially leakier
    abstraction.
    
    This change, as well as our existing reliance on checking `arg0`, place
    additional requirements on those who include `codex-core`. Its
    `README.md` has been updated to reflect this.
    
    While we could have just added an `apply-patch` subcommand to the
    `codex` multitool CLI, that would not be sufficient for the standalone
    `codex-exec` CLI, which is something that we distribute as part of our
    GitHub releases for those who know they will not be using the TUI and
    therefore prefer to use a slightly smaller executable:
    
    https://github.com/openai/codex/releases/tag/rust-v0.10.0
    
    To that end, this PR adds an integration test to ensure that the
    `--codex-run-as-apply-patch` option works with the standalone
    `codex-exec` CLI.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1702).
    * #1705
    * #1703
    * __->__ #1702
    * #1698
    * #1697