Commit Graph

17 Commits

  • Refactor ExecServer filesystem split between local and remote (#15232)
    For each feature we have:
    1. Trait exposed on environment
    2. **Local Implementation** of the trait
    3. Remote implementation that uses the client to proxy via network
    4. Handler implementation that handles PRC requests and calls into
    **Local Implementation**
  • Add exec-server exec RPC implementation (#15090)
    Stacked PR 2/3, based on the stub PR.
    
    Adds the exec RPC implementation and process/event flow in exec-server
    only.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Move environment abstraction into exec server (#15125)
    The idea is that codex-exec exposes an Environment struct with services
    on it. Each of those is a trait.
    
    Depending on construction parameters passed to Environment they are
    either backed by local or remote server but core doesn't see these
    differences.
  • Add exec-server stub server and protocol docs (#15089)
    Stacked PR 1/3.
    
    This is the initialize-only exec-server stub slice: binary/client
    scaffolding and protocol docs, without exec/filesystem implementation.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • refactor: delete exec-server and move execve wrapper into shell-escalation (#12632)
    ## Why
    
    We already plan to remove the shell-tool MCP path, and doing that
    cleanup first makes the follow-on `shell-escalation` work much simpler.
    
    This change removes the last remaining reason to keep
    `codex-rs/exec-server` around by moving the `codex-execve-wrapper`
    binary and shared shell test fixtures to the crates/tests that now own
    that functionality.
    
    ## What Changed
    
    ### Delete `codex-rs/exec-server`
    
    - Remove the `exec-server` crate, including the MCP server binary,
    MCP-specific modules, and its test support/test suite
    - Remove `exec-server` from the `codex-rs` workspace and update
    `Cargo.lock`
    
    ### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
    
    - Move the wrapper implementation into `shell-escalation`
    (`src/unix/execve_wrapper.rs`)
    - Add the `codex-execve-wrapper` binary entrypoint under
    `shell-escalation/src/bin/`
    - Update `shell-escalation` exports/module layout so the wrapper
    entrypoint is hosted there
    - Move the wrapper README content from `exec-server` to
    `shell-escalation/README.md`
    
    ### Move shared shell test fixtures to `app-server`
    
    - Move the DotSlash `bash`/`zsh` test fixtures from
    `exec-server/tests/suite/` to `app-server/tests/suite/`
    - Update `app-server` zsh-fork tests to reference the new fixture paths
    
    ### Keep `shell-tool-mcp` as a shell-assets package
    
    - Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
    artifact contains only patched Bash/Zsh payloads (no Rust binaries)
    - Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
    and docs to reflect the shell-assets-only package shape
    - `shell-tool-mcp-ci.yml` does not need changes because it is already
    JS-only
    
    ## Verification
    
    - `cargo shear`
    - `cargo clippy -p codex-shell-escalation --tests`
    - `just clippy`
  • refactor: normalize unix module layout for exec-server and shell-escalation (#12556)
    ## Why
    Shell execution refactoring in `exec-server` had become split between
    duplicated code paths, which blocked a clean introduction of the new
    reusable shell escalation flow. This commit creates a dedicated
    foundation crate so later shell tooling changes can share one
    implementation.
    
    ## What changed
    - Added the `codex-shell-escalation` crate and moved the core escalation
    pieces (`mcp` protocol/socket/session flow, policy glue) that were
    previously in `exec-server` into it.
    - Normalized `exec-server` Unix structure under a dedicated `unix`
    module layout and kept non-Unix builds narrow.
    - Wired crate/build metadata so `shell-escalation` is a first-class
    workspace dependency for follow-on integration work.
    
    ## Verification
    - Built and linted the stack at this commit point with `just clippy`.
    
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12556).
    * #12584
    * #12583
    * __->__ #12556
  • test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518)
    ## Why
    
    The zsh integration tests were still brittle in two ways:
    
    - they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
    they often did not exercise the patched zsh fork that `shell-tool-mcp`
    ships
    - once the tests consistently used the vendored zsh fork, they exposed
    real Linux-specific zsh-fork issues in CI
    
    In particular, the Linux failures were not just test noise:
    
    - the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
    `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
    could receive malformed arguments
    - the
    `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
    test uses the zsh exec bridge (which talks to the parent over a Unix
    socket), but Linux restricted sandbox seccomp denies `connect(2)`,
    causing timeouts on `ubuntu-24.04` x86/arm
    
    This PR makes the zsh tests consistently run against the intended
    vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
    signal is meaningful.
    
    ## What Changed
    
    - Added a single shared test-only DotSlash file for the patched zsh fork
    at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
    `bash` test resource).
    - Updated both app-server and exec-server zsh tests to use that shared
    DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
    dependency).
    - Updated the app-server zsh-fork test helper to resolve the shared
    DotSlash zsh and avoid silently falling back to host zsh.
    - Kept the app-server zsh-fork tests configured via `config.toml`, using
    a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
    to `-c`) for the subcommand-decline test.
    - Hardened the app-server subcommand-decline zsh-fork test for CI
    variability:
      - tolerate an extra `/responses` POST with a no-op mock response
    - tolerate non-target approval ordering while remaining strict on the
    two `/usr/bin/true` approvals and decline behavior
    - use `DangerFullAccess` on Linux for this one test because it validates
    zsh approval flow, not Linux sandbox socket restrictions
    - Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
    `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
    arg0 dispatch continues to work.
    - Moved `maybe_run_zsh_exec_wrapper_mode()` under
    `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
    handling coexists correctly with arg0-dispatched helper modes.
    - Consolidated duplicated `dotslash -- fetch` resolution logic into
    shared test support (`core/tests/common/lib.rs`).
    - Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
    use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
    differences by:
      - resolving an absolute `git` path
      - running `git init --quiet .`
    - asserting success / `.git` creation instead of relying on banner text
    
    ## Verification
    
    - `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
    - `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
    - `bazel test //codex-rs/exec-server:exec-server-all-test
    --test_output=streamed --test_arg=--nocapture
    --test_arg=accept_elicitation_for_prompt_rule_with_zsh`
    - CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
    x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
    aarch64-unknown-linux-gnu` passed in [run
    22291424358](https://github.com/openai/codex/actions/runs/22291424358)
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
    - `codex_protocol::protocol` and related protocol/model types (including
    `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
    - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
    parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • fix: add integration tests for codex-exec-mcp-server with execpolicy (#7617)
    This PR introduces integration tests that run
    [codex-shell-tool-mcp](https://www.npmjs.com/package/@openai/codex-shell-tool-mcp)
    as a user would. Note that this requires running our fork of Bash, so we
    introduce a [DotSlash](https://dotslash-cli.com/) file for `bash` so
    that we can run the integration tests on multiple platforms without
    having to check the binaries into the repository. (As noted in the
    DotSlash file, it is slightly more heavyweight than necessary, which may
    be worth addressing as disk space in CI is limited:
    https://github.com/openai/codex/pull/7678.)
    
    To start, this PR adds two tests:
    
    - `list_tools()` makes the `list_tools` request to the MCP server and
    verifies we get the expected response
    - `accept_elicitation_for_prompt_rule()` defines a `prefix_rule()` with
    `decision="prompt"` and verifies the elicitation flow works as expected
    
    Though the `accept_elicitation_for_prompt_rule()` test **only works on
    Linux**, as this PR reveals that there are currently issues when running
    the Bash fork in a read-only sandbox on Linux. This will have to be
    fixed in a follow-up PR.
    
    Incidentally, getting this test run to correctly on macOS also requires
    a recent fix we made to `brew` that hasn't hit a mainline release yet,
    so getting CI green in this PR required
    https://github.com/openai/codex/pull/7680.
  • feat: exec policy integration in shell mcp (#7609)
    adding execpolicy support into the `posix` mcp
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • chore: add cargo-deny configuration (#7119)
    - add GitHub workflow running cargo-deny on push/PR
    - document cargo-deny allowlist with workspace-dep notes and advisory
    ignores
    - align workspace crates to inherit version/edition/license for
    consistent checks
  • feat: waiting for an elicitation should not count against a shell tool timeout (#6973)
    Previously, we were running into an issue where we would run the `shell`
    tool call with a timeout of 10s, but it fired an elicitation asking for
    user approval, the time the user took to respond to the elicitation was
    counted agains the 10s timeout, so the `shell` tool call would fail with
    a timeout error unless the user is very fast!
    
    This PR addresses this issue by introducing a "stopwatch" abstraction
    that is used to manage the timeout. The idea is:
    
    - `Stopwatch::new()` is called with the _real_ timeout of the `shell`
    tool call.
    - `process_exec_tool_call()` is called with the `Cancellation` variant
    of `ExecExpiration` because it should not manage its own timeout in this
    case
    - the `Stopwatch` expiration is wired up to the `cancel_rx` passed to
    `process_exec_tool_call()`
    - when an elicitation for the `shell` tool call is received, the
    `Stopwatch` pauses
    - because it is possible for multiple elicitations to arrive
    concurrently, it keeps track of the number of "active pauses" and does
    not resume until that counter goes down to zero
    
    I verified that I can test the MCP server using
    `@modelcontextprotocol/inspector` and specify `git status` as the
    `command` with a timeout of 500ms and that the elicitation pops up and I
    have all the time in the world to respond whereas previous to this PR,
    that would not have been possible.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6973).
    * #7005
    * __->__ #6973
    * #6972
  • chore: refactor exec-server to prepare it for standalone MCP use (#6944)
    This PR reorganizes things slightly so that:
    
    - Instead of a single multitool executable, `codex-exec-server`, we now
    have two executables:
      - `codex-exec-mcp-server` to launch the MCP server
    - `codex-execve-wrapper` is the `execve(2)` wrapper to use with the
    `BASH_EXEC_WRAPPER` environment variable
    - `BASH_EXEC_WRAPPER` must be a single executable: it cannot be a
    command string composed of an executable with args (i.e., it no longer
    adds the `escalate` subcommand, as before)
    - `codex-exec-mcp-server` takes `--bash` and `--execve` as options.
    Though if `--execve` is not specified, the MCP server will check the
    directory containing `std::env::current_exe()` and attempt to use the
    file named `codex-execve-wrapper` within it. In development, this works
    out since these executables are side-by-side in the `target/debug`
    folder.
    
    With respect to testing, this also fixes an important bug in
    `dummy_exec_policy()`, as I was using `ends_with()` as if it applied to
    a `String`, but in this case, it is used with a `&Path`, so the
    semantics are slightly different.
    
    Putting this all together, I was able to test this by running the
    following:
    
    ```
    ~/code/codex/codex-rs$ npx @modelcontextprotocol/inspector \
        ./target/debug/codex-exec-mcp-server --bash ~/code/bash/bash
    ```
    
    If I try to run `git status` in `/Users/mbolin/code/codex` via the
    `shell` tool from the MCP server:
    
    <img width="1589" height="1335" alt="image"
    src="https://github.com/user-attachments/assets/9db6aea8-7fbc-4675-8b1f-ec446685d6c4"
    />
    
    then I get prompted with the following elicitation, as expected:
    
    <img width="1589" height="1335" alt="image"
    src="https://github.com/user-attachments/assets/21b68fe0-494d-4562-9bad-0ddc55fc846d"
    />
    
    Though a current limitation is that the `shell` tool defaults to a
    timeout of 10s, which means I only have 10s to respond to the
    elicitation. Ideally, the time spent waiting for a response from a human
    should not count against the timeout for the command execution. I will
    address this in a subsequent PR.
    
    ---
    
    Note `~/code/bash/bash` was created by doing:
    
    ```
    cd ~/code
    git clone https://github.com/bminor/bash
    cd bash
    git checkout a8a1c2fac029404d3f42cd39f5a20f24b6e4fe4b
    <apply the patch below>
    ./configure
    make
    ```
    
    The patch:
    
    ```
    diff --git a/execute_cmd.c b/execute_cmd.c
    index 070f5119..d20ad2b9 100644
    --- a/execute_cmd.c
    +++ b/execute_cmd.c
    @@ -6129,6 +6129,19 @@ shell_execve (char *command, char **args, char **env)
       char sample[HASH_BANG_BUFSIZ];
       size_t larray;
    
    +  char* exec_wrapper = getenv("BASH_EXEC_WRAPPER");
    +  if (exec_wrapper && *exec_wrapper && !whitespace (*exec_wrapper))
    +    {
    +      char *orig_command = command;
    +
    +      larray = strvec_len (args);
    +
    +      memmove (args + 2, args, (++larray) * sizeof (char *));
    +      args[0] = exec_wrapper;
    +      args[1] = orig_command;
    +      command = exec_wrapper;
    +    }
    +
    ```
  • fix: prepare ExecPolicy in exec-server for execpolicy2 cutover (#6888)
    This PR introduces an extra layer of abstraction to prepare us for the
    migration to execpolicy2:
    
    - introduces a new trait, `EscalationPolicy`, whose `determine_action()`
    method is responsible for producing the `EscalateAction`
    - the existing `ExecPolicy` typedef is changed to return an intermediate
    `ExecPolicyOutcome` instead of `EscalateAction`
    - the default implementation of `EscalationPolicy`,
    `McpEscalationPolicy`, composes `ExecPolicy`
    - the `ExecPolicyOutcome` includes `codex_execpolicy2::Decision`, which
    has a `Prompt` variant
    - when `McpEscalationPolicy` gets `Decision::Prompt` back from
    `ExecPolicy`, it prompts the user via an MCP elicitation and maps the
    result into an `ElicitationAction`
    - now that the end user can reply to an elicitation with `Decline` or
    `Cancel`, we introduce a new variant, `EscalateAction::Deny`, which the
    client handles by returning exit code `1` without running anything
    
    Note the way the elicitation is created is still not quite right, but I
    will fix that once we have things running end-to-end for real in a
    follow-up PR.