30 Commits

  • Add network environment ID plumbing (#28766)
    ## Why
    
    Prepare network approval scoping to distinguish execution environments
    without changing behavior yet.
    
    ## What changed
    
    - Add optional environment IDs to network policy requests.
    - Add optional network environment IDs to exec and sandbox request
    structs.
    - Thread default None values through existing construction points.
    - Fix stale constructor call sites that caused the CI compile failures.
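
    A minimal sketch of the "optional ID, default `None`" pattern described
    above. The struct, field, and method names here are hypothetical and
    are not taken from the PR; the point is only that existing construction
    points keep compiling and behaving the same while new callers can opt in.

    ```rust
    /// Hypothetical request shape; the real struct and field names may differ.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct NetworkPolicyRequest {
        pub host: String,
        pub port: u16,
        /// Optional ID of the execution environment that issued the request.
        pub environment_id: Option<String>,
    }

    impl NetworkPolicyRequest {
        /// Existing construction points default the new field to `None`,
        /// so behavior is unchanged.
        pub fn new(host: impl Into<String>, port: u16) -> Self {
            Self {
                host: host.into(),
                port,
                environment_id: None,
            }
        }

        /// Later call sites can attach an environment ID without touching `new`.
        pub fn with_environment_id(mut self, id: impl Into<String>) -> Self {
            self.environment_id = Some(id.into());
            self
        }
    }
    ```

    A caller that does nothing new gets `environment_id == None`; only a
    caller that chains `with_environment_id(...)` carries an ID.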
    
    ## Not included
    
    - Per-environment proxy listeners.
    - Network approval cache or prompt behavior changes.
    - Ambiguous request attribution handling.
    
    Those behavior changes moved to stacked follow-up #28899.
    
    ## Validation
    
    - just fmt
    - CI will run tests and clippy
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
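
    A sketch of the before/after shape of this cleanup. The function names
    are hypothetical. In the PR the allow is a single crate-root
    `#![allow(clippy::expect_used)]`; an item-level attribute is used below
    only so the snippet can be pasted into any file.

    ```rust
    use std::num::ParseIntError;

    /// The verbose form the cleanup replaces.
    pub fn parse_port_before(input: &str) -> u16 {
        input
            .parse::<u16>()
            .unwrap_or_else(|err| panic!("port should be a valid u16: {}", err))
    }

    /// The same unwrap written with `expect`.
    #[allow(clippy::expect_used)]
    pub fn parse_port_after(input: &str) -> u16 {
        input.parse::<u16>().expect("port should be a valid u16")
    }

    /// `expect_err` covers the case where the test wants the error value.
    #[allow(clippy::expect_used)]
    pub fn parse_error_after(input: &str) -> ParseIntError {
        input
            .parse::<u16>()
            .expect_err("input should not parse as a u16")
    }
    ```

    Both port helpers panic on bad input; only the amount of code differs.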
  • windows-sandbox: pass workspace roots to runner (#24108)
    ## Why
    
    #23813 switches the Windows sandbox runner path to `PermissionProfile`,
    but it still left one runtime anchor for resolving symbolic
    `:workspace_roots` entries. That is not enough once a turn has multiple
    effective workspace roots: exact entries and deny globs under
    `:workspace_roots` need to be materialized for every runtime root before
    the command runner chooses token mode or builds ACL plans.
    
    ## What Changed
    
    - Replaces the Windows runner/setup `permission_profile_cwd` plumbing
    with `workspace_roots: Vec<AbsolutePathBuf>`.
    - Resolves Windows-local `PermissionProfile` data with
    `materialize_project_roots_with_workspace_roots(...)` instead of the
    single-cwd helper.
    - Threads `Config::effective_workspace_roots()` through core execution,
    unified exec, TUI setup/read-grant flows, app-server setup, app-server
    `command/exec`, and `debug sandbox` on Windows.
    - Preserves those workspace roots through the zsh-fork escalation
    executor instead of rebuilding them from `sandbox_policy_cwd`.
    - Makes `ExecRequest::new(...)` and the remaining
    `build_exec_request(...)` helper path take
    `windows_sandbox_workspace_roots` explicitly so new call sites cannot
    silently fall back to `vec![cwd]`.
    - Clarifies the `debug sandbox` non-Windows comment: remaining
    cwd-dependent resolution still uses `sandbox_policy_cwd`, while
    `:workspace_roots` entries are already materialized from config roots.
    - Updates elevated runner IPC `SpawnRequest` to send `workspace_roots`
    and bumps the framed IPC protocol version to `3` for the payload shape
    change.
    - Adds Windows-local resolver coverage for expanding exact and glob
    `:workspace_roots` entries across multiple roots, plus core helper
    coverage proving explicit roots are preserved.
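
    To illustrate why a single anchor is not enough, here is a simplified
    sketch of expanding one symbolic entry across several roots. The type
    and function below are hypothetical stand-ins, not the actual
    `materialize_project_roots_with_workspace_roots(...)` implementation,
    and plain `PathBuf` is used instead of `AbsolutePathBuf`.

    ```rust
    use std::path::PathBuf;

    /// Hypothetical symbolic entry: a path relative to every workspace root.
    pub struct WorkspaceRelativeEntry<'a> {
        pub relative: &'a str,
    }

    /// Expands one symbolic entry into one concrete path per runtime root.
    pub fn materialize_for_roots(
        entry: &WorkspaceRelativeEntry<'_>,
        roots: &[PathBuf],
    ) -> Vec<PathBuf> {
        roots.iter().map(|root| root.join(entry.relative)).collect()
    }
    ```

    With two roots, a single entry such as `.git` yields two concrete
    paths, which is the case a lone cwd anchor cannot express.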
    
    ## Verification
    
    - `cargo check -p codex-windows-sandbox -p codex-core -p codex-tui -p
    codex-cli -p codex-app-server`
    - `cargo test -p codex-windows-sandbox`
    - `cargo test -p codex-core windows_sandbox`
    - `cargo test -p codex-core unix_escalation`
    - `cargo test -p codex-app-server windows_sandbox`
    - `cargo test -p codex-tui windows_sandbox`
    - `cargo test -p codex-cli debug_sandbox`
    - `just test -p codex-core unified_exec`
    - `just test -p codex-core
    build_exec_request_preserves_windows_workspace_roots`
    - `env -u CODEX_NETWORK_PROXY_ACTIVE -u
    CODEX_NETWORK_ALLOW_LOCAL_BINDING just test -p codex-app-server --lib
    command_exec`
    - `just test -p codex-windows-sandbox`
    - `just test -p codex-exec sandbox`
    - `just fix -p codex-core -p codex-app-server -p codex-windows-sandbox`
    
    A local macOS cross-check with `cargo check --target
    x86_64-pc-windows-msvc ...` did not reach crate Rust code because native
    dependencies require Windows SDK headers (`windows.h` / `assert.h`) in
    this environment; Windows CI remains the real target validation.
    
    Two local targeted filters compile but do not run assertions on macOS:
    `env -u CODEX_NETWORK_PROXY_ACTIVE -u CODEX_NETWORK_ALLOW_LOCAL_BINDING
    just test -p codex-app-server --lib command_exec_processor` matched zero
    tests, and `just test -p codex-linux-sandbox landlock` matched zero
    tests because the landlock suite is Linux-only.
  • core tests: migrate more turns to permission profiles (#20013)
    ## Summary
    - Migrate another batch of direct `Op::UserTurn` test construction from
    legacy `SandboxPolicy` values to `PermissionProfile` inputs via
    `turn_permission_fields()`.
    - Replace a one-off read-only `SandboxPolicy` bridge in the macOS exec
    test with `PermissionProfile::read_only()`.
    - Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 32
    files at the start of the cleanup stack to 27 files.
    
    ## Testing
    - `cargo check -p codex-core --tests`
    - `just fmt`
    - `just fix -p codex-core`
  • permissions: make runtime config profile-backed (#19606)
    ## Why
    
    This supersedes #19391. During stack repair, GitHub marked #19391 as
    merged into a temporary stack branch rather than into `main`, so the
    runtime-config change needed a fresh PR.
    
    `PermissionProfile` is now the canonical permissions shape after #19231
    because it can distinguish `Managed`, `Disabled`, and `External`
    enforcement while also carrying filesystem rules that legacy
    `SandboxPolicy` cannot represent cleanly. Core config and session state
    still needed to accept profile-backed permissions without forcing every
    profile through the strict legacy bridge, which rejected valid runtime
    profiles such as direct write roots.
    
    The unrelated CI/test hardening that previously rode along with this PR
    has been split into #19683 so this PR stays focused on the permissions
    model migration.
    
    ## What Changed
    
    - Adds `Permissions.permission_profile` and
    `SessionConfiguration.permission_profile` as constrained runtime state,
    while keeping `sandbox_policy` as a legacy compatibility projection.
    - Introduces profile setters that keep `PermissionProfile`, split
    filesystem/network policies, and legacy `SandboxPolicy` projections
    synchronized.
    - Uses a compatibility projection for requirement checks and legacy
    consumers instead of rejecting profiles that cannot round-trip through
    `SandboxPolicy` exactly.
    - Updates config loading, config overrides, session updates, turn
    context plumbing, prompt permission text, sandbox tags, and exec request
    construction to carry profile-backed runtime permissions.
    - Preserves configured deny-read entries and `glob_scan_max_depth` when
    command/session profiles are narrowed.
    - Adds `PermissionProfile::read_only()` and
    `PermissionProfile::workspace_write()` presets that match legacy
    defaults.
    
    ## Verification
    
    - `cargo test -p codex-core direct_write_roots`
    - `cargo test -p codex-core runtime_roots_to_legacy_projection`
    - `cargo test -p codex-app-server
    requested_permissions_trust_project_uses_permission_profile_intent`

    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
    * #19395
    * #19394
    * #19393
    * #19392
    * __->__ #19606
  • sandbox: remove dead seatbelt helper and update tests (#17859)
    ## Why
    
    `spawn_command_under_seatbelt()` in `codex-rs/core/src/seatbelt.rs` had
    fallen out of production use and was only referenced by test-only
    wrappers. That left us with sandbox tests that could stay green even if
    the actual seatbelt exec path regressed, because production shell
    execution now flows through `SandboxManager::transform()` and
    `ExecRequest::from_sandbox_exec_request()` instead of that helper.
    
    Removing the dead helper also exposed one downstream `codex-exec`
    integration test that still imported it, which broke `just clippy`.
    
    ## What Changed
    
    - Removed `codex-rs/core/src/seatbelt.rs` and stopped exporting
    `codex_core::seatbelt`.
    - Removed the redundant `codex-rs/core/tests/suite/seatbelt.rs` coverage
    that only exercised the dead helper.
    - Kept the `openpty` regression check, but moved it into
    `codex-rs/core/tests/suite/exec.rs` so it now runs through
    `process_exec_tool_call()`.
    - Fixed the seatbelt denial test in `codex-rs/core/tests/suite/exec.rs`
    to use `/usr/bin/touch`, so it actually exercises the sandbox instead of
    a nonexistent path.
    - Updated `codex-rs/exec/tests/suite/sandbox.rs` on macOS to build the
    sandboxed command through `build_exec_request()` and spawn the
    transformed command, instead of importing the removed helper.
    - Left the lower-level seatbelt policy coverage in
    `codex-rs/sandboxing/src/seatbelt_tests.rs`, where the policy generator
    is still covered directly.
    
    ## Verification
    
    - `cargo test -p codex-core suite::exec::`
    - `cargo test -p codex-exec`
    - `cargo clippy -p codex-exec --tests -- -D warnings`
  • Spread AbsolutePathBuf (#17792)
    Mechanical change to promote absolute paths through code.
  • Use AbsolutePathBuf for exec cwd plumbing (#17063)
    ## Summary
    - Carry `AbsolutePathBuf` through tool cwd parsing/resolution instead of
    resolving workdirs to raw `PathBuf`s.
    - Type exec/sandbox request cwd fields as `AbsolutePathBuf` through
    `ExecParams`, `ExecRequest`, `SandboxCommand`, and unified exec runtime
    requests.
    - Keep `PathBuf` conversions at external/event boundaries and update
    existing tests/fixtures for the typed cwd.
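
    A minimal sketch of the idea behind a typed absolute path, assuming a
    simple newtype over `PathBuf`. `AbsPathSketch` and its methods are
    hypothetical and are not the repository's `AbsolutePathBuf` API; they
    only show how the invariant is checked once and converted back to
    `PathBuf` at the boundary.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical newtype whose invariant is "the inner path is absolute".
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct AbsPathSketch(PathBuf);

    impl AbsPathSketch {
        /// Accepts only paths that are already absolute.
        pub fn from_absolute(path: PathBuf) -> Option<Self> {
            if path.is_absolute() {
                Some(Self(path))
            } else {
                None
            }
        }

        /// Resolves a possibly relative workdir against this absolute base.
        /// `Path::join` replaces the base when `workdir` is itself absolute.
        pub fn resolve(&self, workdir: impl AsRef<Path>) -> Self {
            Self(self.0.join(workdir))
        }

        pub fn as_path(&self) -> &Path {
            &self.0
        }

        /// Conversion back to `PathBuf` for external/event boundaries.
        pub fn into_path_buf(self) -> PathBuf {
            self.0
        }
    }
    ```

    Because the check happens at construction, downstream request structs
    can take the typed value and skip re-validating the cwd.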
    
    ## Validation
    - `cargo check -p codex-core --tests`
    - `cargo check -p codex-sandboxing --tests`
    - `cargo test -p codex-sandboxing`
    - `cargo test -p codex-core --lib tools::handlers::`
    - `just fix -p codex-sandboxing`
    - `just fix -p codex-core`
    - `just fmt`
    
    Full `codex-core` test suite was not run locally; per repo guidance I
    kept local validation targeted.
  • remove temporary ownership re-exports (#16626)
    Stacked on #16508.
    
    This removes the temporary `codex-core` / `codex-login` re-export shims
    from the ownership split and rewrites callsites to import directly from
    `codex-model-provider-info`, `codex-models-manager`, `codex-api`,
    `codex-protocol`, `codex-feedback`, and `codex-response-debug-context`.
    
    No behavior change intended; this is the mechanical import cleanup layer
    split out from the ownership move.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • extract models manager and related ownership from core (#16508)
    ## Summary
    - split `models-manager` out of `core` and add `ModelsManagerConfig`
    plus `Config::to_models_manager_config()` so model metadata paths stop
    depending on `core::Config`
    - move login-owned/auth-owned code out of `core` into `codex-login`,
    move model provider config into `codex-model-provider-info`, move API
    bridge mapping into `codex-api`, move protocol-owned types/impls into
    `codex-protocol`, and move response debug helpers into a dedicated
    `response-debug-context` crate
    - move feedback tag emission into `codex-feedback`, relocate tests to
    the crates that now own the code, and keep broad temporary re-exports so
    this PR avoids a giant import-only rewrite
    
    ## Major moves and decisions
    - created `codex-models-manager` as the owner for model
    cache/catalog/config/model info logic, including the new
    `ModelsManagerConfig` struct
    - created `codex-model-provider-info` as the owner for provider config
    parsing/defaults and kept temporary `codex-login`/`codex-core`
    re-exports for old import paths
    - moved `api_bridge` error mapping + `CoreAuthProvider` into
    `codex-api`, while `codex-login::api_bridge` temporarily re-exports
    those symbols and keeps the `auth_provider_from_auth` wrapper
    - moved `auth_env_telemetry` and `provider_auth` ownership to
    `codex-login`
    - moved `CodexErr` ownership to `codex-protocol::error`, plus
    `StreamOutput`, `bytes_to_string_smart`, and network policy helpers to
    protocol-owned modules
    - created `codex-response-debug-context` for
    `extract_response_debug_context`, `telemetry_transport_error_message`,
    and related response-debug plumbing instead of leaving that behavior in
    `core`
    - moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and
    `emit_feedback_request_tags_with_auth_env` to `codex-feedback`
    - deferred removal of temporary re-exports and the mechanical import
    rewrites to a stacked follow-up PR so this PR stays reviewable
    
    ## Test moves
    - moved auth refresh coverage from `core/tests/suite/auth_refresh.rs` to
    `login/tests/suite/auth_refresh.rs`
    - moved text encoding coverage from
    `core/tests/suite/text_encoding_fix.rs` to
    `protocol/src/exec_output_tests.rs`
    - moved model info override coverage from
    `core/tests/suite/model_info_overrides.rs` to
    `models-manager/src/model_info_overrides_tests.rs`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
    ## Summary
    - launch Windows sandboxed children on a private desktop instead of
    `Winsta0\Default`
    - make private desktop the default while keeping
    `windows.sandbox_private_desktop=false` as the escape hatch
    - centralize process launch through the shared
    `create_process_as_user(...)` path
    - scope the private desktop ACL to the launching logon SID
    
    ## Why
    Today sandboxed Windows commands run on the visible shared desktop. That
    leaves an avoidable same-desktop attack surface for window interaction,
    spoofing, and related UI/input issues. This change moves sandboxed
    commands onto a dedicated per-launch desktop by default so the sandbox
    no longer shares `Winsta0\Default` with the user session.
    
    The implementation stays conservative on security: there is no silent
    fallback to `Winsta0\Default`.
    
    If private-desktop setup fails on a machine, users can still opt out
    explicitly with `windows.sandbox_private_desktop=false`.
    
    ## Validation
    - `cargo build -p codex-cli`
    - elevated-path `codex exec` desktop-name probe returned
    `CodexSandboxDesktop-*`
    - elevated-path `codex exec` smoke sweep for shell commands, nested
    `pwsh`, jobs, and hidden `notepad` launch
    - unelevated-path full private-desktop compatibility sweep via `codex
    exec` with `-c windows.sandbox=unelevated`
  • sandboxing: plumb split sandbox policies through runtime (#13439)
    ## Why
    
    `#13434` introduces split `FileSystemSandboxPolicy` and
    `NetworkSandboxPolicy`, but the runtime still made most execution-time
    sandbox decisions from the legacy `SandboxPolicy` projection.
    
    That projection loses information about combinations like unrestricted
    filesystem access with restricted network access. In practice, that
    means the runtime can choose the wrong platform sandbox behavior or set
    the wrong network-restriction environment for a command even when config
    has already separated those concerns.
    
    This PR carries the split policies through the runtime so sandbox
    selection, process spawning, and exec handling can consult the policy
    that actually matters.
    
    ## What changed
    
    - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through
    `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state,
    unified exec, and app-server exec overrides
    - updated sandbox selection in `core/src/sandboxing/mod.rs` and
    `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus
    `NetworkSandboxPolicy`, rather than inferring behavior only from the
    legacy `SandboxPolicy`
    - updated process spawning in `core/src/spawn.rs` and the platform
    wrappers to use `NetworkSandboxPolicy` when deciding whether to set
    `CODEX_SANDBOX_NETWORK_DISABLED`
    - kept additional-permissions handling and legacy `ExternalSandbox`
    compatibility projections aligned with the split policies, including
    explicit user-shell execution and Windows restricted-token routing
    - updated callers across `core`, `app-server`, and `linux-sandbox` to
    pass the split policies explicitly
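
    A toy model of the information loss described in the Why section. The
    enums below are hypothetical simplifications, not the real
    `SandboxPolicy`, `FileSystemSandboxPolicy`, or `NetworkSandboxPolicy`
    definitions. It only shows how a one-axis projection can give the wrong
    answer about setting `CODEX_SANDBOX_NETWORK_DISABLED`.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum FileSystemKind {
        Restricted,
        Unrestricted,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum NetworkPolicy {
        Restricted,
        Enabled,
    }

    /// Hypothetical one-axis legacy shape.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum LegacyPolicy {
        FullAccess,
        Sandboxed,
    }

    /// Lossy projection: the network axis is dropped entirely.
    pub fn legacy_projection(fs: FileSystemKind, _net: NetworkPolicy) -> LegacyPolicy {
        match fs {
            FileSystemKind::Unrestricted => LegacyPolicy::FullAccess,
            FileSystemKind::Restricted => LegacyPolicy::Sandboxed,
        }
    }

    /// Decision made from the projection alone.
    pub fn network_disabled_from_legacy(legacy: LegacyPolicy) -> bool {
        legacy == LegacyPolicy::Sandboxed
    }

    /// Decision made from the policy that actually governs the network.
    pub fn network_disabled_from_split(net: NetworkPolicy) -> bool {
        net == NetworkPolicy::Restricted
    }
    ```

    For unrestricted filesystem access with restricted network access, the
    projection-based decision says "not disabled" while the split-policy
    decision says "disabled".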
    
    ## Verification
    
    - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to
    verify `RunUserShellCommand` does not inherit
    `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn
    - added coverage in `core/src/exec.rs` for Windows restricted-token
    sandbox selection when the legacy projection is `ExternalSandbox`
    - updated Linux sandbox coverage in
    `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy
    exec path
    - verified the current PR state with `just clippy`

    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439).
    * #13453
    * #13452
    * #13451
    * #13449
    * #13448
    * #13445
    * #13440
    * __->__ #13439
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • Revert "Ensure shell command skills trigger approval (#12697)" (#12721)
    This reverts commit daf0f03ac8.
  • Ensure shell command skills trigger approval (#12697)
    Summary
    - detect skill-invoking shell commands based on the original command
    string, request approvals when needed, and cache positive decisions per
    session
    - keep implicit skill invocation emitted after approval and keep skill
    approval decline messaging centralized to the shell handler
    - expand and adjust skill approval tests to cover shell-based skill
    scripts while matching the new detection expectations
    
    Testing
    - Not run (not requested)
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
        `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
        parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • Refactor network approvals to host/protocol/port scope (#12140)
    ## Summary
    Simplify network approvals by removing per-attempt proxy correlation and
    moving to session-level approval dedupe keyed by (host, protocol, port).
    Instead of encoding attempt IDs into proxy credentials/URLs, we now
    treat approvals as a destination policy decision.
    
    - Concurrent calls to the same destination share one approval prompt.
    - Different destinations (or same host on different ports) get separate
    prompts.
    - "Allow once" approves only the current queued request group.
    - "Allow for session" caches that (host, protocol, port) and auto-allows
    future matching requests.
    - A "Never" policy continues to deny without prompting.
    
    Example:
    - 3 calls:
      - a.com:443
      - b.com:443
      - a.com:443
    => 2 prompts total (a, b); the second a.com call waits on the first decision.
    - a.com:80 is treated separately from a.com:443.
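
    The dedupe rule can be sketched as a set keyed by (host, protocol,
    port). This is an illustrative model only: `ApprovalKey`,
    `SessionApprovals`, and their methods are hypothetical names, and the
    queueing of concurrent requests behind one prompt is reduced to
    counting distinct keys.

    ```rust
    use std::collections::HashSet;

    /// Hypothetical approval key: one entry per destination.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    pub struct ApprovalKey {
        pub host: String,
        pub protocol: String,
        pub port: u16,
    }

    impl ApprovalKey {
        pub fn new(host: &str, protocol: &str, port: u16) -> Self {
            Self {
                host: host.to_string(),
                protocol: protocol.to_string(),
                port,
            }
        }
    }

    /// Session-level cache of destinations approved "for session".
    #[derive(Debug, Default)]
    pub struct SessionApprovals {
        allowed: HashSet<ApprovalKey>,
    }

    impl SessionApprovals {
        pub fn allow_for_session(&mut self, key: ApprovalKey) {
            self.allowed.insert(key);
        }

        pub fn is_allowed(&self, key: &ApprovalKey) -> bool {
            self.allowed.contains(key)
        }

        /// Number of prompts a batch of pending requests would need:
        /// one per distinct destination that is not already cached.
        pub fn prompts_needed(&self, pending: &[ApprovalKey]) -> usize {
            let distinct: HashSet<&ApprovalKey> = pending
                .iter()
                .filter(|key| !self.allowed.contains(*key))
                .collect();
            distinct.len()
        }
    }
    ```

    Keying on the full tuple is what keeps a.com:80 separate from
    a.com:443 while letting repeated calls to the same destination share
    one decision.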
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core tools::network_approval::tests`
    - `cargo test -p codex-core` (unit tests pass; existing
    integration-suite failures remain in this environment)
  • feat(core): add structured network approval plumbing and policy decision model (#11672)
    ### Description
    #### Summary
    Introduces the core plumbing required for structured network approvals.
    
    #### What changed
    - Added structured network policy decision modeling in core.
    - Added approval payload/context types needed for network approval
    semantics.
    - Wired shell/unified-exec runtime plumbing to consume structured
    decisions.
    - Updated related core error/event surfaces for structured handling.
    - Updated protocol plumbing used by core approval flow.
    - Included small CLI debug sandbox compatibility updates needed by this
    layer.
    
    #### Why
    Establishes the minimal backend foundation for network approvals without
    yet changing high-level orchestration or TUI behavior.
    
    #### Notes
    - Behavior remains constrained by existing requirements/config gating.
    - Follow-up PRs in the stack handle orchestration, UX, and app-server
    integration.
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • feat: include NetworkConfig through ExecParams (#11105)
    This PR adds the following field to `Config`:
    
    ```rust
    pub network: Option<NetworkProxy>,
    ```
    
    Though for the moment, it will always be initialized as `None` (this
    will be addressed in a subsequent PR).
    
    This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we spawn a process.
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    current Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model: explicit writable roots, the option to make some directories
    read-only, and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
      - Added internal retry without /proc when /proc mount is not permitted
        by the host/container.
  • remove sandbox globals. (#9797)
    Threads sandbox updates through OverrideTurnContext for the active turn.
    Passes the computed sandbox type into safety/exec.
  • refactoring with_escalated_permissions to use SandboxPermissions instead (#7750)
    Helpful in the future if we want more granularity when requesting
    escalated permissions: e.g., when running in a read-only sandbox, the
    model can request escalation to a sandbox that allows writes.
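
    A sketch of why an enum leaves more room than a boolean here. The
    variant names and the `From<bool>` bridge are assumptions for
    illustration and may not match the actual `SandboxPermissions`
    definition.

    ```rust
    /// Hypothetical variants; a future variant could describe a narrower
    /// escalation, such as "allow writes", without changing call signatures.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub enum SandboxPermissions {
        #[default]
        UseDefault,
        RequireEscalated,
    }

    impl SandboxPermissions {
        pub fn requires_escalation(self) -> bool {
            matches!(self, Self::RequireEscalated)
        }
    }

    /// Bridge from the old boolean flag.
    impl From<bool> for SandboxPermissions {
        fn from(with_escalated_permissions: bool) -> Self {
            if with_escalated_permissions {
                Self::RequireEscalated
            } else {
                Self::UseDefault
            }
        }
    }
    ```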
  • refactor: inline sandbox type lookup in process_exec_tool_call (#7122)
    `process_exec_tool_call()` was taking `SandboxType` as a param, but in
    practice, the only place it was constructed was in
    `codex_message_processor.rs` where it was derived from the other
    `sandbox_policy` param, so this PR inlines the logic that decides the
    `SandboxType` into `process_exec_tool_call()`.

    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122).
    * #7112
    * __->__ #7122
  • feat: update process_exec_tool_call() to take a cancellation token (#6972)
    This updates `ExecParams` so that instead of taking `timeout_ms:
    Option<u64>`, it now takes a more general cancellation mechanism,
    `ExecExpiration`, which is an enum that includes a
    `Cancellation(tokio_util::sync::CancellationToken)` variant.
    
    If the cancellation token is fired, then `process_exec_tool_call()`
    returns in the same way as if a timeout was exceeded.
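
    A std-only sketch of the enum shape described above. To keep the
    example free of external crates, an `Arc<AtomicBool>` stands in for
    `tokio_util::sync::CancellationToken`. The `Timeout` variant and the
    `is_expired` helper are assumptions for illustration, not the PR's
    actual definitions.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::time::{Duration, Instant};

    /// Sketch of a general expiration mechanism.
    pub enum ExecExpiration {
        /// Assumed variant carrying the old fixed timeout.
        Timeout(Duration),
        /// Stand-in for the `CancellationToken` variant: the flag is set
        /// by whoever owns the timeout logic.
        Cancellation(Arc<AtomicBool>),
    }

    impl ExecExpiration {
        /// Both variants lead to the same "expired" outcome for the caller.
        pub fn is_expired(&self, started: Instant) -> bool {
            match self {
                ExecExpiration::Timeout(limit) => started.elapsed() >= *limit,
                ExecExpiration::Cancellation(flag) => flag.load(Ordering::SeqCst),
            }
        }
    }
    ```

    With the cancellation form, an external owner can hold off on setting
    the flag while a user prompt is pending, which a fixed `timeout_ms`
    cannot express.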
    
    This is necessary so that in #6973, we can manage the timeout logic
    external to the `process_exec_tool_call()` because we want to "suspend"
    the timeout when an elicitation from a human user is pending.

    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
    * #7005
    * #6973
    * __->__ #6972
  • chore: rework tools execution workflow (#5278)
    Rework the tool execution flow. Read `orchestrator.rs` to understand
    the structure.
  • chore: clippy on redundant closure (#4058)
    Add redundant closure clippy rules and let Codex fix the violations by
    minimising fully qualified paths (FQP).
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Prior to this PR, both of these functions took a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    The approach is inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html):
    we move all test cases in tests/ into a single suite so that they
    compile into a single binary. Each test binary carries significant
    overhead, and test execution is only parallelized within a single
    binary.