Commit Graph

41 Commits

  • Move terminal module to terminal-detection crate (#15216)
    - Move core/src/terminal.rs and its tests into a standalone
    terminal-detection workspace crate.
    - Update direct consumers to depend on codex-terminal-detection and
    import terminal APIs directly.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add exec-server exec RPC implementation (#15090)
    Stacked PR 2/3, based on the stub PR.
    
    Adds the exec RPC implementation and process/event flow in exec-server
    only.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add experimental exec server URL handling (#15196)
    Add a config and attempt to start the server.
  • Move environment abstraction into exec server (#15125)
    The idea is that codex-exec exposes an Environment struct with services
    on it. Each of those is a trait.
    
    Depending on construction parameters passed to Environment they are
    either backed by local or remote server but core doesn't see these
    differences.
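
    A minimal sketch of that shape, under stated assumptions: the commit only says that `Environment` carries trait-based services chosen at construction time, so `FileSystem`, `LocalFs`, `RemoteFs`, `Backend`, and `first_line` are hypothetical names, and the remote side is stubbed rather than talking to a server.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical service trait; the real trait names are not given in the commit.
    trait FileSystem {
        fn read_to_string(&self, path: &str) -> Result<String, String>;
    }

    /// Local backing. Uses an in-memory map so the sketch stays hermetic.
    struct LocalFs {
        files: HashMap<String, String>,
    }

    impl FileSystem for LocalFs {
        fn read_to_string(&self, path: &str) -> Result<String, String> {
            self.files
                .get(path)
                .cloned()
                .ok_or_else(|| format!("not found: {}", path))
        }
    }

    /// Remote backing. A real one would send a request to the exec server; stubbed here.
    struct RemoteFs {
        url: String,
    }

    impl FileSystem for RemoteFs {
        fn read_to_string(&self, path: &str) -> Result<String, String> {
            Err(format!("remote read of {} via {} is not part of this sketch", path, self.url))
        }
    }

    /// Construction parameter that selects the backing (assumed shape).
    enum Backend {
        Local(HashMap<String, String>),
        Remote { url: String },
    }

    struct Environment {
        fs: Box<dyn FileSystem>,
    }

    impl Environment {
        fn new(backend: Backend) -> Self {
            let fs: Box<dyn FileSystem> = match backend {
                Backend::Local(files) => Box::new(LocalFs { files }),
                Backend::Remote { url } => Box::new(RemoteFs { url }),
            };
            Environment { fs }
        }
    }

    /// Caller code only sees the trait, never the concrete backing.
    fn first_line(env: &Environment, path: &str) -> Result<String, String> {
        let text = env.fs.read_to_string(path)?;
        Ok(text.lines().next().unwrap_or("").to_string())
    }
    ```

    `first_line` compiles against `Environment` alone, which is the property the commit describes: switching from `Backend::Local` to `Backend::Remote` changes no caller code.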
  • Remove stdio transport from exec server (#15119)
    Summary
    - delete the deprecated stdio transport plumbing from the exec server
    stack
    - add a basic `exec_server()` harness plus test utilities to start a
    server, send requests, and await events
    - refresh exec-server dependencies, configs, and documentation to
    reflect the new flow
    
    Testing
    - Not run (not requested)
    
    ---------
    
    Co-authored-by: starr-openai <starr@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Add exec-server stub server and protocol docs (#15089)
    Stacked PR 1/3.
    
    This is the initialize-only exec-server stub slice: binary/client
    scaffolding and protocol docs, without exec/filesystem implementation.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • refactor: delete exec-server and move execve wrapper into shell-escalation (#12632)
    ## Why
    
    We already plan to remove the shell-tool MCP path, and doing that
    cleanup first makes the follow-on `shell-escalation` work much simpler.
    
    This change removes the last remaining reason to keep
    `codex-rs/exec-server` around by moving the `codex-execve-wrapper`
    binary and shared shell test fixtures to the crates/tests that now own
    that functionality.
    
    ## What Changed
    
    ### Delete `codex-rs/exec-server`
    
    - Remove the `exec-server` crate, including the MCP server binary,
    MCP-specific modules, and its test support/test suite
    - Remove `exec-server` from the `codex-rs` workspace and update
    `Cargo.lock`
    
    ### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
    
    - Move the wrapper implementation into `shell-escalation`
    (`src/unix/execve_wrapper.rs`)
    - Add the `codex-execve-wrapper` binary entrypoint under
    `shell-escalation/src/bin/`
    - Update `shell-escalation` exports/module layout so the wrapper
    entrypoint is hosted there
    - Move the wrapper README content from `exec-server` to
    `shell-escalation/README.md`
    
    ### Move shared shell test fixtures to `app-server`
    
    - Move the DotSlash `bash`/`zsh` test fixtures from
    `exec-server/tests/suite/` to `app-server/tests/suite/`
    - Update `app-server` zsh-fork tests to reference the new fixture paths
    
    ### Keep `shell-tool-mcp` as a shell-assets package
    
    - Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
    artifact contains only patched Bash/Zsh payloads (no Rust binaries)
    - Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
    and docs to reflect the shell-assets-only package shape
    - `shell-tool-mcp-ci.yml` does not need changes because it is already
    JS-only
    
    ## Verification
    
    - `cargo shear`
    - `cargo clippy -p codex-shell-escalation --tests`
    - `just clippy`
  • refactor: normalize unix module layout for exec-server and shell-escalation (#12556)
    ## Why
    Shell execution refactoring in `exec-server` had become split between
    duplicated code paths, which blocked a clean introduction of the new
    reusable shell escalation flow. This commit creates a dedicated
    foundation crate so later shell tooling changes can share one
    implementation.
    
    ## What changed
    - Added the `codex-shell-escalation` crate and moved the core escalation
    pieces (`mcp` protocol/socket/session flow, policy glue) that were
    previously in `exec-server` into it.
    - Normalized `exec-server` Unix structure under a dedicated `unix`
    module layout and kept non-Unix builds narrow.
    - Wired crate/build metadata so `shell-escalation` is a first-class
    workspace dependency for follow-on integration work.
    
    ## Verification
    - Built and linted the stack at this commit point with `just clippy`.
    
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12556).
    * #12584
    * #12583
    * __->__ #12556
  • refactor: decouple MCP policy construction from escalate server (#12555)
    ## Why
    The current escalate path in `codex-rs/exec-server` still had policy
    creation coupled to MCP details, which makes it hard to reuse the shell
    execution flow outside the MCP server. This change is part of a broader
    goal to split MCP-specific behavior from shared escalation execution so
    other handlers (for example a future `ShellCommandHandler`) can reuse it
    without depending on MCP request context types.
    
    ## What changed
    - Added a new `EscalationPolicyFactory` abstraction in `mcp.rs`:
      - `crate`-relative path: `codex-rs/exec-server/src/posix/mcp.rs`
    -
    https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L87-L107
    - Made `run_escalate_server` in `mcp.rs` accept a policy factory instead
    of constructing `McpEscalationPolicy` directly.
    -
    https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L178-L201
    - Introduced `McpEscalationPolicyFactory` that stores MCP-only state
    (`RequestContext`, `preserve_program_paths`) and implements the new
    trait.
    -
    https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L100-L117
    - Updated `shell()` to pass a `McpEscalationPolicyFactory` instance into
    `run_escalate_server`, so the server remains the MCP-specific wiring
    layer.
    -
    https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L163-L170
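
    The factory extraction can be sketched roughly as follows. Only the names `EscalationPolicyFactory`, `McpEscalationPolicyFactory`, `McpEscalationPolicy`, `RequestContext`, `preserve_program_paths`, and `run_escalate_server` come from this commit; every signature, the `Decision` enum, `DenyAllFactory`, and the behavior attached to `preserve_program_paths` are invented for illustration.

    ```rust
    #[derive(Debug, PartialEq)]
    enum Decision {
        Run,
        Prompt,
        Deny,
    }

    trait EscalationPolicy {
        fn decide(&self, program: &str) -> Decision;
    }

    /// Builds a policy for an escalation session (signature assumed).
    trait EscalationPolicyFactory {
        type Policy: EscalationPolicy;
        fn create_policy(&self) -> Self::Policy;
    }

    /// Stand-in for the MCP request context type.
    struct RequestContext {
        client_name: String,
    }

    struct McpEscalationPolicy {
        preserve_program_paths: bool,
    }

    impl EscalationPolicy for McpEscalationPolicy {
        fn decide(&self, program: &str) -> Decision {
            // Illustrative rule only: match the full path or just the file name.
            let name = if self.preserve_program_paths {
                program
            } else {
                program.rsplit('/').next().unwrap_or(program)
            };
            if name == "git" { Decision::Run } else { Decision::Prompt }
        }
    }

    /// Holds the MCP-only state so the shared server never sees it.
    struct McpEscalationPolicyFactory {
        context: RequestContext,
        preserve_program_paths: bool,
    }

    impl EscalationPolicyFactory for McpEscalationPolicyFactory {
        type Policy = McpEscalationPolicy;
        fn create_policy(&self) -> McpEscalationPolicy {
            McpEscalationPolicy { preserve_program_paths: self.preserve_program_paths }
        }
    }

    /// A second, non-MCP factory showing that the server is reusable.
    struct DenyAll;
    impl EscalationPolicy for DenyAll {
        fn decide(&self, _program: &str) -> Decision {
            Decision::Deny
        }
    }
    struct DenyAllFactory;
    impl EscalationPolicyFactory for DenyAllFactory {
        type Policy = DenyAll;
        fn create_policy(&self) -> DenyAll {
            DenyAll
        }
    }

    /// The shared flow depends on the factory trait, not on MCP types.
    fn run_escalate_server<F: EscalationPolicyFactory>(factory: &F, programs: &[&str]) -> Vec<Decision> {
        let policy = factory.create_policy();
        programs.iter().map(|program| policy.decide(program)).collect()
    }
    ```

    In this sketch a future handler would supply its own factory, as `DenyAllFactory` does, without constructing a `RequestContext`.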
    
    ## Verification
    - Build and test execution was not re-run in this pass; changes are
    limited to `mcp.rs` and preserve the existing escalation flow semantics
    by only extracting policy construction behind a factory.
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12555).
    * #12556
    * __->__ #12555
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
      `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
      parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • Refactor network approvals to host/protocol/port scope (#12140)
    ## Summary
    Simplify network approvals by removing per-attempt proxy correlation and
    moving to session-level approval dedupe keyed by (host, protocol, port).
    Instead of encoding attempt IDs into proxy credentials/URLs, we now
    treat approvals as a destination policy decision.
    
    - Concurrent calls to the same destination share one approval prompt.
    - Different destinations (or same host on different ports) get separate
    prompts.
    - Allow once approves the current queued request group only.
    - Allow for session caches that (host, protocol, port) and auto-allows
    future matching requests.
    - Never policy continues to deny without prompting.
    
    Example:
    - 3 calls: 
      - a.com:443
      - b.com:443
      - a.com:443
    => 2 prompts total (a, b); the second a.com request waits on the first decision.
    - a.com:80 is treated separately from a.com:443
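
    The dedupe rules above can be sketched as below. This is an illustration, not the code in `tools::network_approval`: `Destination`, `UserChoice`, `SessionApprovals`, and `resolve_batch` are hypothetical names, and concurrency is modeled as a batch of queued requests resolved in order rather than with real async waiting.

    ```rust
    use std::collections::{HashMap, HashSet};

    /// Approval key: requests with the same (host, protocol, port) share a decision.
    #[derive(Clone, Debug, PartialEq, Eq, Hash)]
    struct Destination {
        host: String,
        protocol: String,
        port: u16,
    }

    #[derive(Clone, Copy, PartialEq)]
    enum UserChoice {
        AllowOnce,
        AllowForSession,
        Deny,
    }

    #[derive(Default)]
    struct SessionApprovals {
        allowed: HashSet<Destination>, // destinations approved for the whole session
        prompts_shown: usize,
    }

    impl SessionApprovals {
        /// Resolves one group of queued requests; returns allow/deny per request.
        fn resolve_batch(
            &mut self,
            requests: &[Destination],
            mut ask: impl FnMut(&Destination) -> UserChoice,
        ) -> Vec<bool> {
            // Decisions made within this group, so duplicates reuse the first answer.
            let mut batch: HashMap<Destination, bool> = HashMap::new();
            requests
                .iter()
                .map(|dest| {
                    if self.allowed.contains(dest) {
                        return true; // cached "allow for session"
                    }
                    if let Some(&decided) = batch.get(dest) {
                        return decided; // same destination already prompted in this group
                    }
                    self.prompts_shown += 1;
                    let choice = ask(dest);
                    if choice == UserChoice::AllowForSession {
                        self.allowed.insert(dest.clone());
                    }
                    let ok = choice != UserChoice::Deny;
                    batch.insert(dest.clone(), ok);
                    ok
                })
                .collect()
        }
    }
    ```

    With the three calls from the example (a.com:443, b.com:443, a.com:443), `prompts_shown` ends at 2, and a later a.com:80 request triggers its own prompt because the port is part of the key.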
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core tools::network_approval::tests`
    - `cargo test -p codex-core` (unit tests pass; existing
    integration-suite failures remain in this environment)
  • feat(core): add structured network approval plumbing and policy decision model (#11672)
    ### Description
    #### Summary
    Introduces the core plumbing required for structured network approvals.
    
    #### What changed
    - Added structured network policy decision modeling in core.
    - Added approval payload/context types needed for network approval
    semantics.
    - Wired shell/unified-exec runtime plumbing to consume structured
    decisions.
    - Updated related core error/event surfaces for structured handling.
    - Updated protocol plumbing used by core approval flow.
    - Included small CLI debug sandbox compatibility updates needed by this
    layer.
    
    #### Why
    Establishes the minimal backend foundation for network approvals without
    yet changing high-level orchestration or TUI behavior.
    
    #### Notes
    - Behavior remains constrained by existing requirements/config gating.
    - Follow-up PRs in the stack handle orchestration, UX, and app-server
    integration.
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • feat(shell-tool-mcp): add patched zsh build pipeline (#11668)
    ## Summary
    - add `shell-tool-mcp/patches/zsh-exec-wrapper.patch` against upstream
    zsh `77045ef899e53b9598bebc5a41db93a548a40ca6`
    - add `zsh-linux` and `zsh-darwin` jobs to
    `.github/workflows/shell-tool-mcp.yml`
    - stage zsh binaries under `artifacts/vendor/<target>/zsh/<variant>/zsh`
    - include zsh artifact jobs in `package.needs`
    - mark staged zsh binaries executable during packaging
    
    ## Notes
    - zsh source is cloned from `https://git.code.sf.net/p/zsh/code`
    - workflow pins zsh commit `77045ef899e53b9598bebc5a41db93a548a40ca6`
    - zsh build runs `./Util/preconfig` before `./configure`
    
    ## Validation
    - parsed workflow YAML locally (`yaml-ok`)
    - validated zsh patch applies cleanly with `git apply --check` on a
    fresh zsh clone
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
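
    A rough sketch of the policy shape and the fail-closed check, for orientation only. The variant and field names follow the list above; `writable_roots` stands in for the fields elided as `...`, and `read_access`, `check_backend`, `can_read`, and `SandboxError` are hypothetical helpers. `include_platform_defaults` is carried but not interpreted here.

    ```rust
    use std::path::{Path, PathBuf};

    #[derive(Clone, Debug, PartialEq)]
    enum ReadOnlyAccess {
        Restricted {
            include_platform_defaults: bool,
            readable_roots: Vec<PathBuf>,
        },
        FullAccess,
    }

    #[derive(Clone, Debug, PartialEq)]
    enum SandboxPolicy {
        ReadOnly { access: ReadOnlyAccess },
        // `writable_roots` is a placeholder for the fields the commit elides.
        WorkspaceWrite { writable_roots: Vec<PathBuf>, read_only_access: ReadOnlyAccess },
    }

    impl SandboxPolicy {
        fn read_access(&self) -> &ReadOnlyAccess {
            match self {
                SandboxPolicy::ReadOnly { access } => access,
                SandboxPolicy::WorkspaceWrite { read_only_access, .. } => read_only_access,
            }
        }
    }

    #[derive(Debug, PartialEq)]
    enum SandboxError {
        UnsupportedOperation,
    }

    /// Fail closed: a backend that cannot enforce restricted reads refuses to run.
    fn check_backend(policy: &SandboxPolicy, supports_restricted_reads: bool) -> Result<(), SandboxError> {
        match policy.read_access() {
            ReadOnlyAccess::Restricted { .. } if !supports_restricted_reads => {
                Err(SandboxError::UnsupportedOperation)
            }
            _ => Ok(()),
        }
    }

    /// Scoped read check: a path is readable only under one of the listed roots.
    fn can_read(policy: &SandboxPolicy, path: &Path) -> bool {
        match policy.read_access() {
            ReadOnlyAccess::FullAccess => true,
            ReadOnlyAccess::Restricted { readable_roots, .. } => {
                readable_roots.iter().any(|root| path.starts_with(root))
            }
        }
    }
    ```

    The point of returning an error instead of falling back to `FullAccess` is that a restricted policy never silently widens on a backend that cannot enforce it.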
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • feat: include NetworkConfig through ExecParams (#11105)
    This PR adds the following field to `Config`:
    
    ```rust
    pub network: Option<NetworkProxy>,
    ```
    
    Though for the moment, it will always be initialized as `None` (this
    will be addressed in a subsequent PR).
    
    This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we spawn a process.
  • Upgrade rmcp to 0.14 (#10718)
    - [x] Upgrade rmcp to 0.14
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    current Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, especially explicit writable roots with the option to make some
    directories read-only and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
      - Added internal retry without /proc when /proc mount is not permitted
      by the host/container.
  • Wire up cloud reqs in exec, app-server (#10241)
    We're fetching cloud requirements in TUI in
    https://github.com/openai/codex/pull/10167.
    
    This adds the same fetching in exec and app-server binaries also.
  • Fetch Requirements from cloud (#10167)
    Load requirements from Codex Backend. It only does this for enterprise
    customers signed in with ChatGPT.
    
    Todo in follow-up PRs:
    * Add to app-server and exec too
    * Switch from fail-open to fail-closed on failure
  • remove sandbox globals. (#9797)
    Threads sandbox updates through OverrideTurnContext for active turn
    Passes computed sandbox type into safety/exec
  • feat: load ExecPolicyManager from ConfigLayerStack (#8453)
    https://github.com/openai/codex/pull/8354 added support for in-repo
    `.config/` files, so this PR updates the logic for loading `*.rules`
    files to load `*.rules` files from all relevant layers. The main change
    to the business logic is `load_exec_policy()` in
    `codex-rs/core/src/exec_policy.rs`.
    
    Note this adds a `config_folder()` method to `ConfigLayerSource` that
    returns `Option<AbsolutePathBuf>` so that it is straightforward to
    iterate over the sources and get the associated config folder, if any.
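
    As a hedged illustration of why an `Option`-returning `config_folder()` makes the iteration simple: the variants of `ConfigLayerSource` below are invented, `PathBuf` stands in for `AbsolutePathBuf`, and `rules_dirs` is a hypothetical helper rather than the actual `load_exec_policy()`.

    ```rust
    use std::path::PathBuf;

    /// Illustrative variants; the real `ConfigLayerSource` variants are not listed in this commit.
    enum ConfigLayerSource {
        User { file: PathBuf },
        Project { dot_config_dir: PathBuf },
        SessionFlags,
    }

    impl ConfigLayerSource {
        /// Returns the folder associated with this layer, if it has one.
        fn config_folder(&self) -> Option<PathBuf> {
            match self {
                ConfigLayerSource::User { file } => file.parent().map(|dir| dir.to_path_buf()),
                ConfigLayerSource::Project { dot_config_dir } => Some(dot_config_dir.clone()),
                ConfigLayerSource::SessionFlags => None, // no folder on disk
            }
        }
    }

    /// Folders to scan for `*.rules` files, in layer order; layers without a folder are skipped.
    fn rules_dirs(layers: &[ConfigLayerSource]) -> Vec<PathBuf> {
        layers.iter().filter_map(|layer| layer.config_folder()).collect()
    }
    ```

    Because layers without a folder return `None`, `filter_map` drops them and no per-variant special casing is needed at the call site.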
  • fix: change codex/sandbox-state/update from a notification to a request (#8142)
    Historically, `accept_elicitation_for_prompt_rule()` was flaky because
    we were using a notification to update the sandbox followed by a `shell`
    tool request that we expected to be subject to the new sandbox config,
    but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate
    each incoming message to a new Tokio task, messages are not guaranteed
    to be processed in order, so sometimes the `shell` tool call would run
    before the notification was processed.
    
    Prior to this PR, we relied on a generous `sleep()` between the
    notification and the request to reduce the chance of the test flaking
    out.
    
    This PR implements a proper fix, which is to use a _request_ instead of
    a notification for the sandbox update so that we can wait for the
    response to the sandbox request before sending the request to the
    `shell` tool call. Previously, `rmcp` did not support custom requests,
    but I fixed that in
    https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it
    into the `0.12.0` release (see #8288).
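
    The ordering argument can be illustrated with a small std-only model. It is not rmcp code: OS threads stand in for Tokio tasks, channels stand in for the MCP transport, and `Message`, `spawn_server`, and `update_then_run` are hypothetical names.

    ```rust
    use std::sync::{mpsc, Arc, Mutex};
    use std::thread;

    enum Message {
        /// Sandbox update sent as a request: it carries a channel for the ack.
        UpdateSandbox { policy: String, ack: mpsc::Sender<()> },
        /// Shell call: replies with the policy that was in effect when it ran.
        Shell { reply: mpsc::Sender<String> },
    }

    /// Starts a server that handles every incoming message on its own thread,
    /// so ordering between two in-flight messages is not guaranteed.
    fn spawn_server() -> mpsc::Sender<Message> {
        let (tx, rx) = mpsc::channel::<Message>();
        let state = Arc::new(Mutex::new(String::from("read-only")));
        thread::spawn(move || {
            for msg in rx {
                let state = Arc::clone(&state);
                thread::spawn(move || match msg {
                    Message::UpdateSandbox { policy, ack } => {
                        *state.lock().unwrap() = policy;
                        let _ = ack.send(());
                    }
                    Message::Shell { reply } => {
                        let _ = reply.send(state.lock().unwrap().clone());
                    }
                });
            }
        });
        tx
    }

    /// Client side: wait for the ack before sending the shell call.
    fn update_then_run(server: &mpsc::Sender<Message>, policy: &str) -> String {
        let (ack_tx, ack_rx) = mpsc::channel();
        server
            .send(Message::UpdateSandbox { policy: policy.to_string(), ack: ack_tx })
            .unwrap();
        ack_rx.recv().unwrap(); // the update has been applied once this returns
        let (reply_tx, reply_rx) = mpsc::channel();
        server.send(Message::Shell { reply: reply_tx }).unwrap();
        reply_rx.recv().unwrap()
    }
    ```

    Because the shell message is not sent until the ack arrives, the handler for it always observes the updated policy, with no `sleep()` needed. A fire-and-forget update would leave the two handler threads racing.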
    
    This PR updates `shell-tool-mcp` to expect
    `"codex/sandbox-state/update"` as a _request_ instead of a notification
    and sends the appropriate ack. Note this behavior is tied to our custom
    `codex/sandbox-state` capability, which Codex honors as an MCP client,
    which is why `core/src/mcp_connection_manager.rs` had to be updated as
    part of this PR, as well.
    
    This PR also updates the docs at `shell-tool-mcp/README.md`.
  • chore: upgrade rmcp crate from 0.10.0 to 0.12.0 (#8288)
    Version `0.12.0` includes
    https://github.com/modelcontextprotocol/rust-sdk/pull/590, which I will
    use in https://github.com/openai/codex/pull/8142.
    
    Changes:
    
    - `rmcp::model::CustomClientNotification` was renamed to
    `rmcp::model::CustomNotification`
    - a bunch of types have a `meta` field now, but it is `Option`, so I
    added `meta: None` to a bunch of things
  • exec-server: additional context for errors (#7935)
    Add a .context() on some exec-server errors for debugging CI flakes.
    
    Also, "login": false in the test to make the test not affected by user
    profile.
  • refactoring with_escalated_permissions to use SandboxPermissions instead (#7750)
    helpful in the future if we want more granularity for requesting
    escalated permissions:
    e.g., when running in readonly sandbox, model can request to escalate to a
    sandbox that allows writes
  • fix: add integration tests for codex-exec-mcp-server with execpolicy (#7617)
    This PR introduces integration tests that run
    [codex-shell-tool-mcp](https://www.npmjs.com/package/@openai/codex-shell-tool-mcp)
    as a user would. Note that this requires running our fork of Bash, so we
    introduce a [DotSlash](https://dotslash-cli.com/) file for `bash` so
    that we can run the integration tests on multiple platforms without
    having to check the binaries into the repository. (As noted in the
    DotSlash file, it is slightly more heavyweight than necessary, which may
    be worth addressing as disk space in CI is limited:
    https://github.com/openai/codex/pull/7678.)
    
    To start, this PR adds two tests:
    
    - `list_tools()` makes the `list_tools` request to the MCP server and
    verifies we get the expected response
    - `accept_elicitation_for_prompt_rule()` defines a `prefix_rule()` with
    `decision="prompt"` and verifies the elicitation flow works as expected
    
    Though the `accept_elicitation_for_prompt_rule()` test **only works on
    macOS**, as this PR reveals that there are currently issues when running
    the Bash fork in a read-only sandbox on Linux. This will have to be
    fixed in a follow-up PR.
    
    Incidentally, getting this test to run correctly on macOS also requires
    a recent fix we made to `brew` that hasn't hit a mainline release yet,
    so getting CI green in this PR required
    https://github.com/openai/codex/pull/7680.
  • fix: exec-server stream was erroring for large requests (#7654)
    Previous to this change, large `EscalateRequest` payloads exceeded the
    kernel send buffer, causing our single `sendmsg(2)` call (with attached
    FDs) to be split and retried without proper control handling; this led
    to `EINVAL`/broken pipe in the
    `handle_escalate_session_respects_run_in_sandbox_decision()` test when
    using an `env` with large contents.
    
    **Before:** `AsyncSocket::send_with_fds()` called `send_json_message()`,
    which called `send_message_bytes()`, which made one `socket.sendmsg()`
    call followed by additional `socket.send()` calls, as necessary:
    
    
    https://github.com/openai/codex/blob/2e4a40252157751765dff176b35c692df8a9fb4e/codex-rs/exec-server/src/posix/socket.rs#L198-L209
    
    **After:** `AsyncSocket::send_with_fds()` now calls
    `send_stream_frame()`, which calls `send_stream_chunk()` one or more
    times. Each call to `send_stream_chunk()` calls `socket.sendmsg()`.
    
    In the previous implementation, the subsequent `socket.send()` writes
    had no control information associated with them, whereas in the new
    `send_stream_chunk()` implementation, a fresh `MsgHdr` (using
    `with_control()`, as appropriate) is created for `socket.sendmsg()` each
    time.
    
    Additionally, with this PR, stream sending attaches `SCM_RIGHTS` only on
    the first chunk, and omits control data when there are no FDs, allowing
    oversized payloads to deliver correctly while preserving FD limits and
    error checks.
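
    The chunking rule (FDs on the first chunk only, no control data when there are no FDs) can be sketched without touching a socket. `Chunk` and `plan_stream_chunks` are hypothetical names; the real `send_stream_chunk()` builds a `MsgHdr` and calls `sendmsg`, and any framing or partial-write handling it does is not modeled here.

    ```rust
    /// One planned `sendmsg` call: payload bytes plus the FDs to attach as SCM_RIGHTS.
    #[derive(Debug, PartialEq)]
    struct Chunk<'a> {
        bytes: &'a [u8],
        fds: &'a [i32], // empty slice means "send without control data"
    }

    /// Splits `payload` into pieces of at most `max_chunk` bytes and attaches
    /// the file descriptors to the first piece only.
    fn plan_stream_chunks<'a>(payload: &'a [u8], fds: &'a [i32], max_chunk: usize) -> Vec<Chunk<'a>> {
        assert!(max_chunk > 0, "max_chunk must be non-zero");
        if payload.is_empty() {
            // Sketch choice: still plan one call so any FDs are delivered.
            return vec![Chunk { bytes: payload, fds }];
        }
        payload
            .chunks(max_chunk)
            .enumerate()
            .map(|(index, bytes)| Chunk {
                bytes,
                fds: if index == 0 { fds } else { &fds[..0] },
            })
            .collect()
    }
    ```

    Each planned chunk corresponds to one `sendmsg` call with its own header, which is the difference from the earlier flow where follow-up `send()` writes carried no control information at all.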
  • feat: exec policy integration in shell mcp (#7609)
    adding execpolicy support into the `posix` mcp
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • feat: support --version flag for @openai/codex-shell-tool-mcp (#7504)
    I find it helpful to easily verify which version is running.
    
    Tested:
    
    ```shell
    ~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --help
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.19s
         Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --help`
    Usage: codex-exec-mcp-server [OPTIONS]
    
    Options:
          --execve <EXECVE_WRAPPER>  Executable to delegate execve(2) calls to in Bash
          --bash <BASH_PATH>         Path to Bash that has been patched to support execve() wrapping
      -h, --help                     Print help
      -V, --version                  Print version
    ~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --version
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
         Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --version`
    codex-exec-server 0.0.0
    ```
  • feat: declare server capability in shell-tool-mcp (#7112)
    This introduces a new feature to Codex when it operates as an MCP
    _client_ where if an MCP _server_ replies that it has an entry named
    `"codex/sandbox-state"` in its _server capabilities_, then Codex will
    send it an MCP notification with the following structure:
    
    ```json
    {
      "method": "codex/sandbox-state/update",
      "params": {
        "sandboxPolicy": {
          "type": "workspace-write",
          "network-access": false,
          "exclude-tmpdir-env-var": false,
          "exclude-slash-tmp": false
        },
        "codexLinuxSandboxExe": null,
        "sandboxCwd": "/Users/mbolin/code/codex2"
      }
    }
    ```
    
    or with whatever values are appropriate for the initial `sandboxPolicy`.
    
    **NOTE:** Codex _should_ continue to send the MCP server notifications
    of the same format if these things change over the lifetime of the
    thread, but that isn't wired up yet.
    
    The result is that `shell-tool-mcp` can consume these values so that
    when it calls `codex_core::exec::process_exec_tool_call()` in
    `codex-rs/exec-server/src/posix/escalate_server.rs`, it is now sure to
    call it with the correct values (whereas previously we relied on
    hardcoded values).
    
    While I would argue this is a supported use case within the MCP
    protocol, the `rmcp` crate that we are using today does not support
    custom notifications. As such, I had to patch it and I submitted it for
    review, so hopefully it will be accepted in some form:
    
    https://github.com/modelcontextprotocol/rust-sdk/pull/556
    
    To test out this change from end-to-end:
    
    - I ran `cargo build` in `~/code/codex2/codex-rs/exec-server`
    - I built the fork of Bash in `~/code/bash/bash`
    - I added the following to my `~/.codex/config.toml`:
    
    ```toml
    # Use with `codex --disable shell_tool`.
    [mcp_servers.execshell]
    args = ["--bash", "/Users/mbolin/code/bash/bash"]
    command = "/Users/mbolin/code/codex2/codex-rs/target/debug/codex-exec-mcp-server"
    ```
    
    - From `~/code/codex2/codex-rs`, I ran `just codex --disable shell_tool`
    - When the TUI started up, I verified that the sandbox mode is
    `workspace-write`
    - I ran `/mcp` to verify that the shell tool from the MCP is there:
    
    <img width="1387" height="1400" alt="image"
    src="https://github.com/user-attachments/assets/1a8addcc-5005-4e16-b59f-95cfd06fd4ab"
    />
    
    - Then I asked it:
    
    > what is the output of `gh issue list`
    
    because this should be auto-approved with our existing dummy policy:
    
    
    https://github.com/openai/codex/blob/af63e6eccc35783f1bf4dca3c61adb090efb6b8a/codex-rs/exec-server/src/posix.rs#L157-L164
    
    And it worked:
    
    <img width="1387" height="1400" alt="image"
    src="https://github.com/user-attachments/assets/7568d2f7-80da-4d68-86d0-c265a6f5e6c1"
    />
  • refactor: inline sandbox type lookup in process_exec_tool_call (#7122)
    `process_exec_tool_call()` was taking `SandboxType` as a param, but in
    practice, the only place it was constructed was in
    `codex_message_processor.rs` where it was derived from the other
    `sandbox_policy` param, so this PR inlines the logic that decides the
    `SandboxType` into `process_exec_tool_call()`.
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122).
    * #7112
    * __->__ #7122
  • feat: support login as an option on shell-tool-mcp (#7120)
    The unified exec tool has a `login` option that defaults to `true`:
    
    
    https://github.com/openai/codex/blob/3bdcbc72927acd3e0071da3df3247e1da7ec578c/codex-rs/core/src/tools/handlers/unified_exec.rs#L35-L36
    
    This updates the `ExecParams` for `shell-tool-mcp` to support the same
    parameter. Note it is declared as `Option<bool>` to ensure it is marked
    optional in the generated JSON schema.
  • feat: waiting for an elicitation should not count against a shell tool timeout (#6973)
    Previously, we were running into an issue where we would run the `shell`
    tool call with a timeout of 10s, but if it fired an elicitation asking
    for user approval, the time the user took to respond to the elicitation
    was counted against the 10s timeout, so the `shell` tool call would fail
    with a timeout error unless the user was very fast!
    
    This PR addresses this issue by introducing a "stopwatch" abstraction
    that is used to manage the timeout. The idea is:
    
    - `Stopwatch::new()` is called with the _real_ timeout of the `shell`
    tool call.
    - `process_exec_tool_call()` is called with the `Cancellation` variant
    of `ExecExpiration` because it should not manage its own timeout in this
    case
    - the `Stopwatch` expiration is wired up to the `cancel_rx` passed to
    `process_exec_tool_call()`
    - when an elicitation for the `shell` tool call is received, the
    `Stopwatch` pauses
    - because it is possible for multiple elicitations to arrive
    concurrently, it keeps track of the number of "active pauses" and does
    not resume until that counter goes down to zero
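
    A deterministic sketch of the pause-counting idea follows. It is an assumption-laden model, not the PR's `Stopwatch`: timestamps are passed in explicitly (as a `Duration` since some start point) so the behavior can be checked without sleeping, and the method names `pause`, `resume`, and `expired` are hypothetical.

    ```rust
    use std::time::Duration;

    /// Counts only "running" time against the limit; paused intervals are excluded.
    struct Stopwatch {
        limit: Duration,
        elapsed: Duration,            // running time accumulated before `resumed_at`
        resumed_at: Option<Duration>, // timestamp of the last resume; None while paused
        active_pauses: u32,           // number of elicitations currently pending
    }

    impl Stopwatch {
        fn new(limit: Duration, now: Duration) -> Self {
            Stopwatch { limit, elapsed: Duration::ZERO, resumed_at: Some(now), active_pauses: 0 }
        }

        /// Called when an elicitation starts. Only the first pause stops the clock.
        fn pause(&mut self, now: Duration) {
            if self.active_pauses == 0 {
                if let Some(start) = self.resumed_at.take() {
                    self.elapsed += now.saturating_sub(start);
                }
            }
            self.active_pauses += 1;
        }

        /// Called when an elicitation is answered. The clock restarts at zero pending pauses.
        fn resume(&mut self, now: Duration) {
            if self.active_pauses == 0 {
                return; // unmatched resume; ignore
            }
            self.active_pauses -= 1;
            if self.active_pauses == 0 {
                self.resumed_at = Some(now);
            }
        }

        fn expired(&self, now: Duration) -> bool {
            let running = self
                .resumed_at
                .map_or(Duration::ZERO, |start| now.saturating_sub(start));
            self.elapsed + running >= self.limit
        }
    }
    ```

    For example, with a 10s limit, two overlapping pauses starting at t=2s, and the last resume at t=100s, the stopwatch has used 2s by then and expires at t=108s rather than t=10s.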
    
    I verified that I can test the MCP server using
    `@modelcontextprotocol/inspector` and specify `git status` as the
    `command` with a timeout of 500ms and that the elicitation pops up and I
    have all the time in the world to respond whereas previous to this PR,
    that would not have been possible.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6973).
    * #7005
    * __->__ #6973
    * #6972
  • feat: update process_exec_tool_call() to take a cancellation token (#6972)
    This updates `ExecParams` so that instead of taking `timeout_ms:
    Option<u64>`, it now takes a more general cancellation mechanism,
    `ExecExpiration`, which is an enum that includes a
    `Cancellation(tokio_util::sync::CancellationToken)` variant.
    
    If the cancellation token is fired, then `process_exec_tool_call()`
    returns in the same way as if a timeout was exceeded.
    
    This is necessary so that in #6973, we can manage the timeout logic
    external to the `process_exec_tool_call()` because we want to "suspend"
    the timeout when an elicitation from a human user is pending.
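
    To make the "timeout or external cancellation" idea concrete, here is a std-only sketch. `CancelFlag` is a stand-in for `tokio_util::sync::CancellationToken`, the `Timeout` variant is an assumption (the commit only names `Cancellation`), and `run_until_expired` simulates a process with a polling loop instead of awaiting real I/O.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::time::Duration;

    /// Std-only stand-in for a cancellation token.
    #[derive(Clone, Default)]
    struct CancelFlag(Arc<AtomicBool>);

    impl CancelFlag {
        fn cancel(&self) {
            self.0.store(true, Ordering::SeqCst);
        }
        fn is_cancelled(&self) -> bool {
            self.0.load(Ordering::SeqCst)
        }
    }

    enum ExecExpiration {
        Timeout(Duration), // assumed variant
        Cancellation(CancelFlag),
    }

    #[derive(Debug, PartialEq)]
    enum Outcome {
        Finished,
        TimedOut,
    }

    /// Simulates a process that needs `work` of run time, checked every `step`.
    /// `on_tick` lets a caller react to simulated time, e.g. by cancelling.
    fn run_until_expired(
        expiration: &ExecExpiration,
        work: Duration,
        step: Duration,
        mut on_tick: impl FnMut(Duration),
    ) -> Outcome {
        assert!(step > Duration::ZERO, "step must be non-zero");
        let mut elapsed = Duration::ZERO;
        while elapsed < work {
            let expired = match expiration {
                ExecExpiration::Timeout(limit) => elapsed >= *limit,
                ExecExpiration::Cancellation(flag) => flag.is_cancelled(),
            };
            if expired {
                // Cancellation is reported the same way as an exceeded timeout.
                return Outcome::TimedOut;
            }
            elapsed += step;
            on_tick(elapsed);
        }
        Outcome::Finished
    }
    ```

    With the `Cancellation` variant the function keeps no deadline of its own, so whoever holds the other clone of the flag decides when time is up.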
    
    
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
    * #7005
    * #6973
    * __->__ #6972
  • fix: when displaying execv, show file instead of arg0 (#6966)
    After merging https://github.com/openai/codex/pull/6958, I realized that
    the `command` I was displaying was not quite right. Since we know it, we
    should show the _exact_ program being executed (the first arg to
    `execve(2)`) rather than `arg0` to be more precise.
    
    Below is the same command I used to test
    https://github.com/openai/codex/pull/6958, but now you can see it shows
    `/Users/mbolin/.openai/bin/git` instead of just `git`.
    
    <img width="1526" height="1444" alt="image"
    src="https://github.com/user-attachments/assets/428128d1-c658-456e-a64e-fc6a0009cb34"
    />
  • fix: clean up elicitation used by exec-server (#6958)
    Using appropriate message/title fields, I think this looks better now:
    
    <img width="3370" height="3208" alt="image"
    src="https://github.com/user-attachments/assets/e9bbf906-4ba8-4563-affc-62cdc6c97342"
    />
    
    Though note that in the current version of the Inspector (`0.17.2`), you
    cannot hit **Submit** until you fill out the field. I believe this is a
    bug in the Inspector, as it does not properly handle the case when all
    fields are optional. I put up a fix:
    
    https://github.com/modelcontextprotocol/inspector/pull/926
  • chore: refactor exec-server to prepare it for standalone MCP use (#6944)
    This PR reorganizes things slightly so that:
    
    - Instead of a single multitool executable, `codex-exec-server`, we now
    have two executables:
      - `codex-exec-mcp-server` to launch the MCP server
      - `codex-execve-wrapper` is the `execve(2)` wrapper to use with the
      `BASH_EXEC_WRAPPER` environment variable
    - `BASH_EXEC_WRAPPER` must be a single executable: it cannot be a
    command string composed of an executable with args (i.e., it no longer
    adds the `escalate` subcommand, as it did before)
    - `codex-exec-mcp-server` takes `--bash` and `--execve` as options.
    Though if `--execve` is not specified, the MCP server will check the
    directory containing `std::env::current_exe()` and attempt to use the
    file named `codex-execve-wrapper` within it. In development, this works
    out since these executables are side-by-side in the `target/debug`
    folder.
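    
    For illustration, the `--execve` fallback described above could look
    roughly like the sketch below. It is not the code from this PR:
    `sibling_wrapper()` and `resolve_execve_wrapper()` are hypothetical
    names, and the sketch does not check that the file actually exists.
    
    ```rust
    use std::env;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// File name of the wrapper expected next to the MCP server binary.
    const WRAPPER_NAME: &str = "codex-execve-wrapper";
    
    /// Hypothetical helper: given the path of the running executable, return
    /// the path of a file named `WRAPPER_NAME` in the same directory.
    fn sibling_wrapper(current_exe: &Path) -> Option<PathBuf> {
        current_exe.parent().map(|dir| dir.join(WRAPPER_NAME))
    }
    
    /// Hypothetical helper: prefer an explicit `--execve` value, otherwise
    /// fall back to the sibling of `std::env::current_exe()`.
    fn resolve_execve_wrapper(explicit: Option<PathBuf>) -> io::Result<PathBuf> {
        if let Some(path) = explicit {
            return Ok(path);
        }
        let exe = env::current_exe()?;
        sibling_wrapper(&exe).ok_or_else(|| {
            io::Error::new(
                io::ErrorKind::NotFound,
                "current executable has no parent directory",
            )
        })
    }
    ```
    
    With a server binary at `target/debug/codex-exec-mcp-server`, the
    fallback in this sketch resolves to `target/debug/codex-execve-wrapper`.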
    
    With respect to testing, this also fixes an important bug in
    `dummy_exec_policy()`, as I was using `ends_with()` as if it applied to
    a `String`, but in this case, it is used with a `&Path`, so the
    semantics are slightly different.
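    
    The difference in semantics can be seen in a small standalone example.
    The exact check in `dummy_exec_policy()` is not shown here, so the
    functions below are hypothetical; they only contrast `str::ends_with()`
    (a character suffix) with `Path::ends_with()` (whole path components).
    
    ```rust
    use std::path::Path;
    
    /// String semantics: true for any text that ends in "git".
    fn str_matches(program: &str) -> bool {
        program.ends_with("git")
    }
    
    /// Path semantics: true only if the trailing component(s) equal "git".
    fn path_matches(program: &Path) -> bool {
        program.ends_with("git")
    }
    ```
    
    Here `str_matches("/usr/bin/legit")` is `true`, while
    `path_matches(Path::new("/usr/bin/legit"))` is `false`, because
    `Path::ends_with()` never matches part of a component.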
    
    Putting this all together, I was able to test this by running the
    following:
    
    ```
    ~/code/codex/codex-rs$ npx @modelcontextprotocol/inspector \
        ./target/debug/codex-exec-mcp-server --bash ~/code/bash/bash
    ```
    
    If I try to run `git status` in `/Users/mbolin/code/codex` via the
    `shell` tool from the MCP server:
    
    <img width="1589" height="1335" alt="image"
    src="https://github.com/user-attachments/assets/9db6aea8-7fbc-4675-8b1f-ec446685d6c4"
    />
    
    then I get prompted with the following elicitation, as expected:
    
    <img width="1589" height="1335" alt="image"
    src="https://github.com/user-attachments/assets/21b68fe0-494d-4562-9bad-0ddc55fc846d"
    />
    
    Though a current limitation is that the `shell` tool defaults to a
    timeout of 10s, which means I only have 10s to respond to the
    elicitation. Ideally, the time spent waiting for a response from a human
    should not count against the timeout for the command execution. I will
    address this in a subsequent PR.
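    
    One way to picture a timeout that does not count time spent waiting on
    a human is a budget that only shrinks while the clock is running. The
    sketch below is an assumption for illustration, not the approach taken
    in the follow-up PR; `PausableTimeout` and its methods are hypothetical.
    
    ```rust
    use std::time::{Duration, Instant};
    
    /// Hypothetical helper: a time budget that is only consumed while running.
    struct PausableTimeout {
        remaining: Duration,
        running_since: Option<Instant>,
    }
    
    impl PausableTimeout {
        /// Start the clock at `now` with the full budget.
        fn start(budget: Duration, now: Instant) -> Self {
            Self { remaining: budget, running_since: Some(now) }
        }
    
        /// Stop the clock, e.g. when an elicitation is sent to the user.
        fn pause(&mut self, now: Instant) {
            if let Some(since) = self.running_since.take() {
                let used = now.saturating_duration_since(since);
                self.remaining = self.remaining.saturating_sub(used);
            }
        }
    
        /// Restart the clock, e.g. when the user has answered.
        fn resume(&mut self, now: Instant) {
            if self.running_since.is_none() {
                self.running_since = Some(now);
            }
        }
    
        /// Budget left at `now`; paused time is not subtracted.
        fn remaining(&self, now: Instant) -> Duration {
            match self.running_since {
                Some(since) => self
                    .remaining
                    .saturating_sub(now.saturating_duration_since(since)),
                None => self.remaining,
            }
        }
    
        fn is_expired(&self, now: Instant) -> bool {
            self.remaining(now).is_zero()
        }
    }
    ```
    
    Passing `now` explicitly keeps the sketch deterministic: with a 10s
    budget paused after 3s, 7s remain no matter how long the pause lasts.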
    
    ---
    
    Note `~/code/bash/bash` was created by doing:
    
    ```
    cd ~/code
    git clone https://github.com/bminor/bash
    cd bash
    git checkout a8a1c2fac029404d3f42cd39f5a20f24b6e4fe4b
    <apply the patch below>
    ./configure
    make
    ```
    
    The patch:
    
    ```
    diff --git a/execute_cmd.c b/execute_cmd.c
    index 070f5119..d20ad2b9 100644
    --- a/execute_cmd.c
    +++ b/execute_cmd.c
    @@ -6129,6 +6129,19 @@ shell_execve (char *command, char **args, char **env)
       char sample[HASH_BANG_BUFSIZ];
       size_t larray;
    
    +  char* exec_wrapper = getenv("BASH_EXEC_WRAPPER");
    +  if (exec_wrapper && *exec_wrapper && !whitespace (*exec_wrapper))
    +    {
    +      char *orig_command = command;
    +
    +      larray = strvec_len (args);
    +
    +      memmove (args + 2, args, (++larray) * sizeof (char *));
    +      args[0] = exec_wrapper;
    +      args[1] = orig_command;
    +      command = exec_wrapper;
    +    }
    +
    ```
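    
    Reading the patch, the wrapper is invoked with `argv[0]` set to the
    wrapper itself, `argv[1]` set to the original `command`, and the
    original `args` (including the original `arg0`) from `argv[2]` onward.
    A wrapper could split that layout as sketched below. This is not the
    actual `codex-execve-wrapper` code; `WrappedExec` and
    `parse_wrapper_args()` are hypothetical names.
    
    ```rust
    /// Hypothetical view of the argv that the patched bash hands to the wrapper.
    #[derive(Debug, PartialEq)]
    struct WrappedExec {
        /// The file bash originally intended to pass to `execve(2)`.
        file: String,
        /// The original argv, starting with the original `arg0`.
        argv: Vec<String>,
    }
    
    /// Split `[wrapper, file, arg0, args...]` into its parts.
    /// Returns `None` if the original argv (at least `arg0`) is missing.
    fn parse_wrapper_args(args: &[String]) -> Option<WrappedExec> {
        match args {
            [_wrapper, file, argv @ ..] if !argv.is_empty() => Some(WrappedExec {
                file: file.clone(),
                argv: argv.to_vec(),
            }),
            _ => None,
        }
    }
    ```
    
    For `["/w/codex-execve-wrapper", "/usr/bin/git", "git", "status"]`, the
    sketch yields `file = "/usr/bin/git"` and `argv = ["git", "status"]`.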
  • fix: prepare ExecPolicy in exec-server for execpolicy2 cutover (#6888)
    This PR introduces an extra layer of abstraction to prepare us for the
    migration to execpolicy2:
    
    - introduces a new trait, `EscalationPolicy`, whose `determine_action()`
    method is responsible for producing the `EscalateAction`
    - the existing `ExecPolicy` typedef is changed to return an intermediate
    `ExecPolicyOutcome` instead of `EscalateAction`
    - the default implementation of `EscalationPolicy`,
    `McpEscalationPolicy`, composes `ExecPolicy`
    - the `ExecPolicyOutcome` includes `codex_execpolicy2::Decision`, which
    has a `Prompt` variant
    - when `McpEscalationPolicy` gets `Decision::Prompt` back from
    `ExecPolicy`, it prompts the user via an MCP elicitation and maps the
    result into an `ElicitationAction`
    - now that the end user can reply to an elicitation with `Decline` or
    `Cancel`, we introduce a new variant, `EscalateAction::Deny`, which the
    client handles by returning exit code `1` without running anything
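    
    The layering described in the list above can be sketched with simplified
    stand-in types. None of the definitions below are the real ones: the
    signatures, the synchronous `elicit` callback, the `Allow`/`Forbidden`
    and `Escalate` variants, and the mapping for anything other than
    `Decline`/`Cancel` are assumptions made for illustration.
    
    ```rust
    use std::path::Path;
    
    /// Stand-in for `codex_execpolicy2::Decision`; only `Prompt` is named in
    /// the description, the other variants are assumed.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Decision { Allow, Prompt, Forbidden }
    
    /// Stand-in for the intermediate result returned by `ExecPolicy`.
    struct ExecPolicyOutcome { decision: Decision }
    
    /// Stand-in for the user's reply to an MCP elicitation.
    enum ElicitationAction { Accept, Decline, Cancel }
    
    /// Stand-in for `EscalateAction`; only `Deny` is named in the description.
    #[derive(Debug, PartialEq)]
    enum EscalateAction { Escalate, Deny }
    
    /// Assumed shape of the `ExecPolicy` typedef.
    type ExecPolicy = fn(&Path, &[String]) -> ExecPolicyOutcome;
    
    trait EscalationPolicy {
        fn determine_action(&self, program: &Path, argv: &[String]) -> EscalateAction;
    }
    
    /// Composes an `ExecPolicy` and asks the user when it returns `Prompt`.
    struct McpEscalationPolicy {
        policy: ExecPolicy,
        /// Hypothetical synchronous stand-in for sending an MCP elicitation.
        elicit: fn(&Path, &[String]) -> ElicitationAction,
    }
    
    impl EscalationPolicy for McpEscalationPolicy {
        fn determine_action(&self, program: &Path, argv: &[String]) -> EscalateAction {
            match (self.policy)(program, argv).decision {
                Decision::Allow => EscalateAction::Escalate, // assumed mapping
                Decision::Forbidden => EscalateAction::Deny, // assumed mapping
                Decision::Prompt => match (self.elicit)(program, argv) {
                    ElicitationAction::Accept => EscalateAction::Escalate,
                    // Decline and Cancel both end in Deny (client exits with 1).
                    ElicitationAction::Decline | ElicitationAction::Cancel => {
                        EscalateAction::Deny
                    }
                },
            }
        }
    }
    
    /// Example policy for the sketch: always defer to the user.
    fn prompt_everything(_program: &Path, _argv: &[String]) -> ExecPolicyOutcome {
        ExecPolicyOutcome { decision: Decision::Prompt }
    }
    ```
    
    In this sketch, a policy built from `prompt_everything` and a callback
    that returns `ElicitationAction::Decline` produces `EscalateAction::Deny`.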
    
    Note that the way the elicitation is created is still not quite right.
    I will fix that in a follow-up PR once we have things running
    end-to-end for real.