Commit Graph

30 Commits

  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • deflake linux-sandbox NoNewPrivs timeout (#11245)
    Deflake `codex-linux-sandbox::all
    suite::landlock::test_no_new_privs_is_enabled`.
    
    CI has intermittently failed with `Sandbox(Timeout)` (exit 124) because
    the sandboxed
    `grep '^NoNewPrivs:' /proc/self/status` can run close to the short
    timeout budget.
    
    This updates only this test to use `LONG_TIMEOUT_MS`, which removes the
    near-threshold timeout
    behavior while keeping the rest of the suite unchanged.
    
    Refs (previous failures):
    - PR:
    https://github.com/openai/codex/actions/runs/21836764823/job/63009902779
    - PR:
    https://github.com/openai/codex/actions/runs/21837427251/job/63012470353
    - main:
    https://github.com/openai/codex/actions/runs/21830746538/job/62988079964
    
    Validation:
    - Local: `cd codex-rs && cargo test -p codex-linux-sandbox` (non-Linux
    runs 0 tests)
  • feat: include NetworkConfig through ExecParams (#11105)
    This PR adds the following field to `Config`:
    
    ```rust
    pub network: Option<NetworkProxy>,
    ```
    
    Though for the moment, it will always be initialized as `None` (this
    will be addressed in a subsequent PR).
    
    This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    curent Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, especially explicit writable roots with the option to make some
    directories read-only and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
    - Added internal retry without /proc when /proc mount is not permitted
    by the host/container.
  • Inject CODEX_THREAD_ID into the terminal environment (#10096)
    Inject CODEX_THREAD_ID (when applicable) into the terminal environment
    so that the agent (and skills) can refer to the current thread / session
    ID.
    
    Discussion:
    https://openai.slack.com/archives/C095U48JNL9/p1769542492067109
  • remove sandbox globals. (#9797)
    Threads sandbox updates through OverrideTurnContext for active turn
    Passes computed sandbox type into safety/exec
  • revert: remove pre-Landlock bind mounts apply (#9300)
    **Description**
    
    This removes the pre‑Landlock read‑only bind‑mount step from the Linux
    sandbox so filesystem restrictions rely solely on Landlock again.
    `mounts.rs` is kept in place but left unused. The linux‑sandbox README
    is updated to match the new behavior and manual test expectations.
  • fix: fallback to Landlock-only when user namespaces unavailable and set PR_SET_NO_NEW_PRIVS early (#9250)
    fixes https://github.com/openai/codex/issues/9236
    
    ### Motivation
    - Prevent sandbox setup from failing when unprivileged user namespaces
    are denied so Landlock-only protections can still be applied.
    - Ensure `PR_SET_NO_NEW_PRIVS` is set before installing seccomp and
    Landlock restrictions to avoid kernel `EPERM`/`LandlockRestrict`
    ordering issues.
    
    ### Description
    - Add `is_permission_denied` helper that detects `EPERM` /
    `PermissionDenied` from `CodexErr` to drive fallback logic.
    - In `apply_read_only_mounts` skip read-only bind-mount setup and return
    `Ok(())` when `unshare_user_and_mount_namespaces()` fails with
    permission-denied so Landlock rules can still be installed.
    - Add `set_no_new_privs()` and call it from
    `apply_sandbox_policy_to_current_thread` before installing seccomp
    filters and Landlock rules when disk or network access is restricted.
  • feat: add support for read-only bind mounts in the linux sandbox (#9112)
    ### Motivation
    
    - Landlock alone cannot prevent writes to sensitive in-repo files like
    `.git/` when the repo root is writable, so explicit mount restrictions
    are required for those paths.
    - The sandbox must set up any mounts before calling Landlock so Landlock
    can still be applied afterwards and the two mechanisms compose
    correctly.
    
    ### Description
    
    - Add a new `linux-sandbox` helper `apply_read_only_mounts` in
    `linux-sandbox/src/mounts.rs` that: unshares namespaces, maps uids/gids
    when required, makes mounts private, bind-mounts targets, and remounts
    them read-only.
    - Wire the mount step into the sandbox flow by calling
    `apply_read_only_mounts(...)` before network/seccomp and before applying
    Landlock rules in `linux-sandbox/src/landlock.rs`.
  • fix: introduce AbsolutePathBuf as part of sandbox config (#7856)
    Changes the `writable_roots` field of the `WorkspaceWrite` variant of
    the `SandboxPolicy` enum from `Vec<PathBuf>` to `Vec<AbsolutePathBuf>`.
    This is helpful because now callers can be sure the value is an absolute
    path rather than a relative one. (Though when using an absolute path in
    a Seatbelt config policy, we still have to _canonicalize_ it first.)
    
    Because `writable_roots` can be read from a config file, it is important
    that we are able to resolve relative paths properly using the parent
    folder of the config file as the base path.
  • refactoring with_escalated_permissions to use SandboxPermissions instead (#7750)
    helpful in the future if we want more granularity for requesting
    escalated permissions:
    e.g when running in readonly sandbox, model can request to escalate to a
    sandbox that allows writes
  • refactor: inline sandbox type lookup in process_exec_tool_call (#7122)
    `process_exec_tool_call()` was taking `SandboxType` as a param, but in
    practice, the only place it was constructed was in
    `codex_message_processor.rs` where it was derived from the other
    `sandbox_policy` param, so this PR inlines the logic that decides the
    `SandboxType` into `process_exec_tool_call()`.
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122).
    * #7112
    * __->__ #7122
  • feat: update process_exec_tool_call() to take a cancellation token (#6972)
    This updates `ExecParams` so that instead of taking `timeout_ms:
    Option<u64>`, it now takes a more general cancellation mechanism,
    `ExecExpiration`, which is an enum that includes a
    `Cancellation(tokio_util::sync::CancellationToken)` variant.
    
    If the cancellation token is fired, then `process_exec_tool_call()`
    returns in the same way as if a timeout was exceeded.
    
    This is necessary so that in #6973, we can manage the timeout logic
    external to the `process_exec_tool_call()` because we want to "suspend"
    the timeout when an elicitation from a human user is pending.
    
    
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
    * #7005
    * #6973
    * __->__ #6972
  • chore: rework tools execution workflow (#5278)
    Re-work the tool execution flow. Read `orchestrator.rs` to understand
    the structure
  • chore: clippy on redundant closure (#4058)
    Add redundant closure clippy rules and let Codex fix it by minimising
    FQP
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Previous to this PR, both of these functions take a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
  • Include command output when sending timeout to model (#3576)
    Being able to see the output helps the model decide how to handle the
    timeout.
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * moved a bunch of allows to expects where possible
    
    Note: in integration tests, non `#[test]` helper functions are not
    covered by this so we had to leave a few lingering `expect(expect_used`
    checks around
  • chore: introduce ConversationManager as a clearinghouse for all conversations (#2240)
    This PR does two things because after I got deep into the first one I
    started pulling on the thread to the second:
    
    - Makes `ConversationManager` the place where all in-memory
    conversations are created and stored. Previously, `MessageProcessor` in
    the `codex-mcp-server` crate was doing this via its `session_map`, but
    this is something that should be done in `codex-core`.
    - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
    throughout our code. I think this made sense at one time, but now that
    we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
    I don't think this was quite right, so I removed it. For `codex exec`
    and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
    no longer make `Notify` a field of `Codex` or `CodexConversation`.
    
    Changes of note:
    
    - Adds the files `conversation_manager.rs` and `codex_conversation.rs`
    to `codex-core`.
    - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
    other crates must use `CodexConversation` instead (which is created via
    `ConversationManager`).
    - `core/src/codex_wrapper.rs` has been deleted in favor of
    `ConversationManager`.
    - `ConversationManager::new_conversation()` returns `NewConversation`,
    which is in line with the `new_conversation` tool we want to add to the
    MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
    we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
    verify `SessionConfiguredEvent` is the first event because that is now
    internal to `ConversationManager`.
    - Quite a bit of code was deleted from
    `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
    manage multiple conversations itself: it goes through
    `ConversationManager` instead.
    - `core/tests/live_agent.rs` has been deleted because I had to update a
    bunch of tests and all the tests in here were ignored, and I don't think
    anyone ever ran them, so this was just technical debt, at this point.
    - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
    hope to refactor the blandly-named `util.rs` into more descriptive
    files).
    - In general, I started replacing local variables named `codex` as
    `conversation`, where appropriate, though admittedly I didn't do it
    through all the integration tests because that would have added a lot of
    noise to this PR.
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
    * #2264
    * #2263
    * __->__ #2240
  • Include output truncation message in tool call results (#2183)
    To avoid model being confused about incomplete output.
  • feat: add /tmp by default (#1919)
    Replaces the `include_default_writable_roots` option on
    `sandbox_workspace_write` (that defaulted to `true`, which was slightly
    weird/annoying) with `exclude_tmpdir_env_var`, which defaults to
    `false`.
    
    Though perhaps more importantly `/tmp` is now enabled by default as part
    of `sandbox_mode = "workspace-write"`, though `exclude_slash_tmp =
    false` can be used to disable this.
  • [approval_policy] Add OnRequest approval_policy (#1865)
    ## Summary
    A split-up PR of #1763 , stacked on top of a tools refactor #1858 to
    make the change clearer. From the previous summary:
    
    > Let's try something new: tell the model about the sandbox, and let it
    decide when it will need to break the sandbox. Some local testing
    suggests that it works pretty well with zero iteration on the prompt!
    
    ## Testing
    - [x] Added unit tests
    - [x] Tested locally and it appears to work smoothly!
  • chore: introduce SandboxPolicy::WorkspaceWrite::include_default_writable_roots (#1785)
    Without this change, it is challenging to create integration tests to
    verify that the folders not included in `writable_roots` in
    `SandboxPolicy::WorkspaceWrite` are read-only because, by default,
    `get_writable_roots_with_cwd()` includes `TMPDIR`, which is where most
    integrationt
    tests do their work.
    
    This introduces a `use_exact_writable_roots` option to disable the
    default
    includes returned by `get_writable_roots_with_cwd()`.
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1785).
    * #1765
    * __->__ #1785
  • feat: stream exec stdout events (#1786)
    ## Summary
    - stream command stdout as `ExecCommandStdout` events
    - forward streamed stdout to clients and ignore in human output
    processor
    - adjust call sites for new streaming API
  • chore(rs): update dependencies (#1494)
    ### Chores
    - Update cargo dependencies
    - Remove unused cargo dependencies
    - Fix clippy warnings
    - Update Dockerfile (package.json requires node 22)
    - Let Dependabot update bun, cargo, devcontainers, docker,
    github-actions, npm (nix still not supported)
    
    ### TODO
    - Upgrade dependencies with breaking changes
    
    ```shell
    $ cargo update --verbose
       Unchanged crossterm v0.28.1 (available: v0.29.0)
       Unchanged schemars v0.8.22 (available: v1.0.4)
    ```
  • feat: redesign sandbox config (#1373)
    This is a major redesign of how sandbox configuration works and aims to
    fix https://github.com/openai/codex/issues/1248. Specifically, it
    replaces `sandbox_permissions` in `config.toml` (and the
    `-s`/`--sandbox-permission` CLI flags) with a "table" with effectively
    three variants:
    
    ```toml
    # Safest option: full disk is read-only, but writes and network access are disallowed.
    [sandbox]
    mode = "read-only"
    
    # The cwd of the Codex task is writable, as well as $TMPDIR on macOS.
    # writable_roots can be used to specify additional writable folders.
    [sandbox]
    mode = "workspace-write"
    writable_roots = []  # Optional, defaults to the empty list.
    network_access = false  # Optional, defaults to false.
    
    # Disable sandboxing: use at your own risk!!!
    [sandbox]
    mode = "danger-full-access"
    ```
    
    This should make sandboxing easier to reason about. While we have
    dropped support for `-s`, the way it works now is:
    
    - no flags => `read-only`
    - `--full-auto` => `workspace-write`
    - currently, there is no way to specify `danger-full-access` via a CLI
    flag, but we will revisit that as part of
    https://github.com/openai/codex/issues/1254
    
    Outstanding issue:
    
    - As noted in the `TODO` on `SandboxPolicy::is_unrestricted()`, we are
    still conflating sandbox preferences with approval preferences in that
    case, which needs to be cleaned up.
  • fix: support arm64 build for Linux (#1225)
    Users were running into issues with glibc mismatches on arm64 linux. In
    the past, we did not provide a musl build for arm64 Linux because we had
    trouble getting the openssl dependency to build correctly. Though today
    I just tried the same trick in `Cargo.toml` that we were doing for
    `x86_64-unknown-linux-musl` (using `openssl-sys` with `features =
    ["vendored"]`), so I'm not sure what problem we had in the past the
    builds "just worked" today!
    
    Though one tweak that did have to be made is that the integration tests
    for Seccomp/Landlock empirically require longer timeouts on arm64 linux,
    or at least on the `ubuntu-24.04-arm` GitHub Runner. As such, we change
    the timeouts for arm64 in `codex-rs/linux-sandbox/tests/landlock.rs`.
    
    Though in solving this problem, I decided I needed a turnkey solution
    for testing the Linux build(s) from my Mac laptop, so this PR introduces
    `.devcontainer/Dockerfile` and `.devcontainer/devcontainer.json` to
    facilitate this. Detailed instructions are in `.devcontainer/README.md`.
    
    We will update `dotslash-config.json` and other release-related scripts
    in a follow-up PR.
  • fix: overhaul how we spawn commands under seccomp/landlock on Linux (#1086)
    Historically, we spawned the Seatbelt and Landlock sandboxes in
    substantially different ways:
    
    For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
    specified as an arg followed by the original command:
    
    
    https://github.com/openai/codex/blob/d1de7bb383552e8fadd94be79d65d188e00fd562/codex-rs/core/src/exec.rs#L147-L219
    
    For **Landlock/Seccomp**, we would do
    `tokio::runtime::Builder::new_current_thread()`, _invoke
    Landlock/Seccomp APIs to modify the permissions of that new thread_, and
    then spawn the command:
    
    
    https://github.com/openai/codex/blob/d1de7bb383552e8fadd94be79d65d188e00fd562/codex-rs/core/src/exec_linux.rs#L28-L49
    
    While it is neat that Landlock/Seccomp supports applying a policy to
    only one thread without having to apply it to the entire process, it
    requires us to maintain two different codepaths and is a bit harder to
    reason about. The tipping point was
    https://github.com/openai/codex/pull/1061, in which we had to start
    building up the `env` in an unexpected way for the existing
    Landlock/Seccomp approach to continue to work.
    
    This PR overhauls things so that we do similar things for Mac and Linux.
    It turned out that we were already building our own "helper binary"
    comparable to Mac's `sandbox-exec` as part of the `cli` crate:
    
    
    https://github.com/openai/codex/blob/d1de7bb383552e8fadd94be79d65d188e00fd562/codex-rs/cli/Cargo.toml#L10-L12
    
    We originally created this to build a small binary to include with the
    Node.js version of the Codex CLI to provide support for Linux
    sandboxing.
    
    Though the sticky bit is that, at this point, we still want to deploy
    the Rust version of Codex as a single, standalone binary rather than a
    CLI and a supporting sandboxing binary. To satisfy this goal, we use
    "the arg0 trick," in which we:
    
    * use `std::env::current_exe()` to get the path to the CLI that is
    currently running
    * use the CLI as the `program` for the `Command`
    * set `"codex-linux-sandbox"` as arg0 for the `Command`
    
    A CLI that supports sandboxing should check arg0 at the start of the
    program. If it is `"codex-linux-sandbox"`, it must invoke
    `codex_linux_sandbox::run_main()`, which runs the CLI as if it were
    `codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
    appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn
    the original command, so do _replace_ the process rather than spawn a
    subprocess. Incidentally, we do this before starting the Tokio runtime,
    so the process should only have one thread when `execvp(3)` is called.
    
    Because the `core` crate that needs to spawn the Linux sandboxing is not
    a CLI in its own right, this means that every CLI that includes `core`
    and relies on this behavior has to (1) implement it and (2) provide the
    path to the sandboxing executable. While the path is almost always
    `std::env::current_exe()`, we needed to make this configurable for
    integration tests, so `Config` now has a `codex_linux_sandbox_exe:
    Option<PathBuf>` property to facilitate threading this through,
    introduced in https://github.com/openai/codex/pull/1089.
    
    This common pattern is now captured in
    `codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
    functions that should use it have been updated as part of this PR.
    
    The `codex-linux-sandbox` crate added to the Cargo workspace as part of
    this PR now has the bulk of the Landlock/Seccomp logic, which makes
    `core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
    `core/src/landlock.rs` were removed/ported as part of this PR. I also
    moved the unit tests for this code into an integration test,
    `linux-sandbox/tests/landlock.rs`, in which I use
    `env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
    `codex_linux_sandbox_exe` since `std::env::current_exe()` is not
    appropriate in that case.