Commit Graph

18 Commits

  • Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
    ## Summary
    - launch Windows sandboxed children on a private desktop instead of
    `Winsta0\Default`
    - make private desktop the default while keeping
    `windows.sandbox_private_desktop=false` as the escape hatch
    - centralize process launch through the shared
    `create_process_as_user(...)` path
    - scope the private desktop ACL to the launching logon SID
    
    ## Why
    Today sandboxed Windows commands run on the visible shared desktop. That
    leaves an avoidable same-desktop attack surface for window interaction,
    spoofing, and related UI/input issues. This change moves sandboxed
    commands onto a dedicated per-launch desktop by default so the sandbox
    no longer shares `Winsta0\Default` with the user session.
    
    The implementation stays conservative on security: there is no silent
    fallback to `Winsta0\Default`.
    
    If private-desktop setup fails on a machine, users can still opt out
    explicitly with `windows.sandbox_private_desktop=false`.
    
    ## Validation
    - `cargo build -p codex-cli`
    - elevated-path `codex exec` desktop-name probe returned
    `CodexSandboxDesktop-*`
    - elevated-path `codex exec` smoke sweep for shell commands, nested
    `pwsh`, jobs, and hidden `notepad` launch
    - unelevated-path full private-desktop compatibility sweep via `codex
    exec` with `-c windows.sandbox=unelevated`
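
    The "no silent fallback" rule above can be sketched as a small decision
    function. This is a minimal illustration, not the code from the PR:
    `DesktopChoice`, `choose_desktop`, and the `create_private` closure are
    hypothetical names, and the closure stands in for the real desktop setup.

    ```rust
    /// Hypothetical model of where a sandboxed child is launched.
    #[derive(Debug, PartialEq)]
    enum DesktopChoice {
        /// The visible shared desktop, `Winsta0\Default`.
        SharedDefault,
        /// A dedicated per-launch desktop, identified by name.
        Private(String),
    }

    /// Picks the desktop for one launch.
    ///
    /// `private_desktop` mirrors `windows.sandbox_private_desktop`.
    /// `create_private` is a stand-in for the real (fallible) desktop setup.
    fn choose_desktop(
        private_desktop: bool,
        create_private: impl FnOnce() -> Result<String, String>,
    ) -> Result<DesktopChoice, String> {
        if !private_desktop {
            // Explicit opt-out: the user asked for the shared desktop.
            return Ok(DesktopChoice::SharedDefault);
        }
        // No silent fallback: a setup failure is returned to the caller
        // instead of quietly switching to the shared desktop.
        create_private().map(DesktopChoice::Private)
    }
    ```

    With this shape, the shared desktop is reachable only through the explicit
    opt-out, never as a side effect of a failed setup.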
  • sandboxing: plumb split sandbox policies through runtime (#13439)
    ## Why
    
    `#13434` introduces split `FileSystemSandboxPolicy` and
    `NetworkSandboxPolicy`, but the runtime still made most execution-time
    sandbox decisions from the legacy `SandboxPolicy` projection.
    
    That projection loses information about combinations like unrestricted
    filesystem access with restricted network access. In practice, that
    means the runtime can choose the wrong platform sandbox behavior or set
    the wrong network-restriction environment for a command even when config
    has already separated those concerns.
    
    This PR carries the split policies through the runtime so sandbox
    selection, process spawning, and exec handling can consult the policy
    that actually matters.
    
    ## What changed
    
    - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through
    `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state,
    unified exec, and app-server exec overrides
    - updated sandbox selection in `core/src/sandboxing/mod.rs` and
    `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus
    `NetworkSandboxPolicy`, rather than inferring behavior only from the
    legacy `SandboxPolicy`
    - updated process spawning in `core/src/spawn.rs` and the platform
    wrappers to use `NetworkSandboxPolicy` when deciding whether to set
    `CODEX_SANDBOX_NETWORK_DISABLED`
    - kept additional-permissions handling and legacy `ExternalSandbox`
    compatibility projections aligned with the split policies, including
    explicit user-shell execution and Windows restricted-token routing
    - updated callers across `core`, `app-server`, and `linux-sandbox` to
    pass the split policies explicitly
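
    To illustrate why a single projected policy loses information, here is a
    deliberately simplified sketch. The enums below are hypothetical stand-ins
    and do not reproduce the real `FileSystemSandboxPolicy`,
    `NetworkSandboxPolicy`, or `SandboxPolicy` types; the projection is a toy
    one chosen to be lossy.

    ```rust
    /// Hypothetical, simplified stand-in for the filesystem policy kind.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum FileSystemKind {
        Restricted,
        Unrestricted,
    }

    /// Hypothetical, simplified stand-in for the network policy.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum NetworkPolicy {
        Restricted,
        Enabled,
    }

    /// Toy legacy projection: one value has to describe both concerns.
    #[derive(Debug, PartialEq)]
    enum LegacyPolicy {
        FullAccess,
        Sandboxed,
    }

    /// Projects the split policies onto the toy legacy value.
    /// The network policy is ignored, which is exactly the information loss.
    fn legacy_projection(fs: FileSystemKind, _net: NetworkPolicy) -> LegacyPolicy {
        match fs {
            FileSystemKind::Unrestricted => LegacyPolicy::FullAccess,
            FileSystemKind::Restricted => LegacyPolicy::Sandboxed,
        }
    }

    /// Decides whether `CODEX_SANDBOX_NETWORK_DISABLED` should be set,
    /// consulting only the network policy.
    fn sets_network_disabled_env(net: NetworkPolicy) -> bool {
        net == NetworkPolicy::Restricted
    }
    ```

    For unrestricted filesystem access with restricted network access, the toy
    projection reports `FullAccess`, while the split network policy still says
    the environment variable should be set.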
    
    ## Verification
    
    - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to
    verify `RunUserShellCommand` does not inherit
    `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn
    - added coverage in `core/src/exec.rs` for Windows restricted-token
    sandbox selection when the legacy projection is `ExternalSandbox`
    - updated Linux sandbox coverage in
    `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy
    exec path
    - verified the current PR state with `just clippy`
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439).
    * #13453
    * #13452
    * #13451
    * #13449
    * #13448
    * #13445
    * #13440
    * __->__ #13439
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • Revert "Ensure shell command skills trigger approval (#12697)" (#12721)
    This reverts commit daf0f03ac8.
    
  • Ensure shell command skills trigger approval (#12697)
    Summary
    - detect skill-invoking shell commands based on the original command
    string, request approvals when needed, and cache positive decisions per
    session
    - keep implicit skill invocation emitted after approval and keep skill
    approval decline messaging centralized to the shell handler
    - expand and adjust skill approval tests to cover shell-based skill
    scripts while matching the new detection expectations
    
    Testing
    - Not run (not requested)
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
        `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
        parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • Refactor network approvals to host/protocol/port scope (#12140)
    ## Summary
    Simplify network approvals by removing per-attempt proxy correlation and
    moving to session-level approval dedupe keyed by (host, protocol, port).
    Instead of encoding attempt IDs into proxy credentials/URLs, we now
    treat approvals as a destination policy decision.
    
    - Concurrent calls to the same destination share one approval prompt.
    - Different destinations (or same host on different ports) get separate
    prompts.
    - Allow once approves the current queued request group only.
    - Allow for session caches that (host, protocol, port) and auto-allows
    future matching requests.
    - Never policy continues to deny without prompting.
    
    Example:
    - 3 calls:
      - a.com:443
      - b.com:443
      - a.com:443
    => 2 prompts total (a, b), second a waits on the first decision.
    - a.com:80 is treated separately from a.com:443
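
    The dedupe rule in the example can be sketched as follows. This is an
    illustrative model only: `Destination`, `SessionApprovals`, and their
    methods are hypothetical names, and the `"https"` protocol string used in
    the usage note is an assumption.

    ```rust
    use std::collections::HashSet;

    /// Approval key: one prompt per distinct (host, protocol, port).
    #[derive(Clone, Debug, PartialEq, Eq, Hash)]
    struct Destination {
        host: String,
        protocol: String,
        port: u16,
    }

    fn dest(host: &str, protocol: &str, port: u16) -> Destination {
        Destination {
            host: host.to_string(),
            protocol: protocol.to_string(),
            port,
        }
    }

    /// Hypothetical session-level approval state.
    #[derive(Default)]
    struct SessionApprovals {
        /// Destinations approved with "Allow for session".
        allowed: HashSet<Destination>,
        /// Destinations that already have a prompt in flight.
        pending: HashSet<Destination>,
    }

    impl SessionApprovals {
        /// Returns true when this request must open a new prompt.
        /// Requests for an already pending destination share that prompt.
        fn needs_prompt(&mut self, d: &Destination) -> bool {
            if self.allowed.contains(d) {
                return false;
            }
            // `insert` returns true only for the first pending request.
            self.pending.insert(d.clone())
        }

        /// "Allow for session": cache the key and auto-allow later requests.
        fn allow_for_session(&mut self, d: &Destination) {
            self.pending.remove(d);
            self.allowed.insert(d.clone());
        }

        /// "Allow once": resolve the current group without caching.
        fn allow_once(&mut self, d: &Destination) {
            self.pending.remove(d);
        }
    }
    ```

    Feeding `dest("a.com", "https", 443)`, `dest("b.com", "https", 443)`, and
    `dest("a.com", "https", 443)` through `needs_prompt` opens two prompts, and
    `dest("a.com", "https", 80)` opens its own.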
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core tools::network_approval::tests`
    - `cargo test -p codex-core` (unit tests pass; existing
    integration-suite failures remain in this environment)
  • feat(core): add structured network approval plumbing and policy decision model (#11672)
    ### Description
    #### Summary
    Introduces the core plumbing required for structured network approvals.
    
    #### What changed
    - Added structured network policy decision modeling in core.
    - Added approval payload/context types needed for network approval
    semantics.
    - Wired shell/unified-exec runtime plumbing to consume structured
    decisions.
    - Updated related core error/event surfaces for structured handling.
    - Updated protocol plumbing used by core approval flow.
    - Included small CLI debug sandbox compatibility updates needed by this
    layer.
    
    #### Why
    Establishes the minimal backend foundation for network approvals without
    yet changing high-level orchestration or TUI behavior.
    
    #### Notes
    - Behavior remains constrained by existing requirements/config gating.
    - Follow-up PRs in the stack handle orchestration, UX, and app-server
    integration.
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • feat: include NetworkConfig through ExecParams (#11105)
    This PR adds the following field to `Config`:
    
    ```rust
    pub network: Option<NetworkProxy>,
    ```
    
    Though for the moment, it will always be initialized as `None` (this
    will be addressed in a subsequent PR).
    
    This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we spawn a process.
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    current Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, in particular explicit writable roots with the option to make
    some directories read-only, as well as granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
      - Added internal retry without /proc when /proc mount is not permitted
        by the host/container.
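
    The retry without /proc could look roughly like the sketch below. The
    argument list is an assumption built from standard `bwrap` options
    (`--ro-bind`, `--bind`, `--proc`), not the arguments this PR actually
    passes, and `LaunchError`, `bwrap_args`, and `run_with_proc_retry` are
    hypothetical names.

    ```rust
    /// Hypothetical launch failure classification.
    #[derive(Debug, PartialEq)]
    enum LaunchError {
        /// The host/container refused to mount /proc.
        ProcMountDenied,
        /// Any other failure; never retried here.
        Other(String),
    }

    /// Builds an illustrative bwrap argument list: read-only root,
    /// explicit writable roots, and optionally a fresh /proc.
    fn bwrap_args(writable_roots: &[&str], mount_proc: bool) -> Vec<String> {
        let mut args = vec!["--ro-bind".to_string(), "/".to_string(), "/".to_string()];
        for root in writable_roots {
            args.extend(["--bind".to_string(), root.to_string(), root.to_string()]);
        }
        if mount_proc {
            args.extend(["--proc".to_string(), "/proc".to_string()]);
        }
        args
    }

    /// Tries with /proc first; retries once without it only when the
    /// /proc mount itself was denied.
    fn run_with_proc_retry<T>(
        writable_roots: &[&str],
        mut launch: impl FnMut(&[String]) -> Result<T, LaunchError>,
    ) -> Result<T, LaunchError> {
        match launch(&bwrap_args(writable_roots, true)) {
            Err(LaunchError::ProcMountDenied) => launch(&bwrap_args(writable_roots, false)),
            other => other,
        }
    }
    ```

    Limiting the retry to the /proc failure keeps unrelated launch errors
    visible instead of masking them behind a second attempt.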
  • remove sandbox globals. (#9797)
    - Threads sandbox updates through `OverrideTurnContext` for the active turn.
    - Passes the computed sandbox type into safety/exec.
  • refactoring with_escalated_permissions to use SandboxPermissions instead (#7750)
    This will be helpful in the future if we want more granularity for
    requesting escalated permissions: e.g., when running in a read-only
    sandbox, the model can request to escalate to a sandbox that allows
    writes.
  • refactor: inline sandbox type lookup in process_exec_tool_call (#7122)
    `process_exec_tool_call()` was taking `SandboxType` as a param, but in
    practice, the only place it was constructed was in
    `codex_message_processor.rs` where it was derived from the other
    `sandbox_policy` param, so this PR inlines the logic that decides the
    `SandboxType` into `process_exec_tool_call()`.
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122).
    * #7112
    * __->__ #7122
  • feat: update process_exec_tool_call() to take a cancellation token (#6972)
    This updates `ExecParams` so that instead of taking `timeout_ms:
    Option<u64>`, it now takes a more general cancellation mechanism,
    `ExecExpiration`, which is an enum that includes a
    `Cancellation(tokio_util::sync::CancellationToken)` variant.
    
    If the cancellation token is fired, then `process_exec_tool_call()`
    returns in the same way as if a timeout was exceeded.
    
    This is necessary so that in #6973, we can manage the timeout logic
    external to the `process_exec_tool_call()` because we want to "suspend"
    the timeout when an elicitation from a human user is pending.
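
    A dependency-free sketch of the idea follows. The real `Cancellation`
    variant wraps `tokio_util::sync::CancellationToken`; here an
    `Arc<AtomicBool>` stands in for the token so the example needs only the
    standard library. The `Timeout` variant and the `is_expired` helper are
    assumptions made for illustration.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::time::{Duration, Instant};

    /// Simplified stand-in for `ExecExpiration`.
    enum ExecExpiration {
        /// Assumed variant: expire after a fixed duration.
        Timeout(Duration),
        /// Stand-in for `Cancellation(CancellationToken)`: expire when the
        /// shared flag is set by whoever manages the timeout externally.
        Cancellation(Arc<AtomicBool>),
    }

    impl ExecExpiration {
        /// Reports whether execution that began at `started` should stop.
        /// Both variants report expiry the same way, so a caller can treat
        /// cancellation like an exceeded timeout.
        fn is_expired(&self, started: Instant) -> bool {
            match self {
                ExecExpiration::Timeout(limit) => started.elapsed() >= *limit,
                ExecExpiration::Cancellation(flag) => flag.load(Ordering::SeqCst),
            }
        }
    }
    ```

    Because the flag lives outside the enum, an external owner can hold off
    on setting it while a user elicitation is pending, which is the
    "suspend" behavior described above.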
    
    
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
    * #7005
    * #6973
    * __->__ #6972
  • chore: rework tools execution workflow (#5278)
    Re-work the tool execution flow. Read `orchestrator.rs` to understand
    the structure.
  • chore: clippy on redundant closure (#4058)
    Add redundant-closure clippy rules and let Codex fix the resulting
    warnings by minimising fully qualified paths (FQP).
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Prior to this PR, both of these functions took a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
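
    The distinction can be pictured with a small sketch. It assumes that
    relative paths in the sandbox policy resolve against `sandbox_cwd` while
    the command itself runs in `cwd`; `SandboxedCommand` and
    `resolve_policy_path` are hypothetical names, not the signatures of the
    functions linked above.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical launch description carrying both directories.
    struct SandboxedCommand {
        /// Directory the command itself runs in.
        cwd: PathBuf,
        /// Directory that relative policy paths are resolved against.
        sandbox_cwd: PathBuf,
    }

    /// Resolves a policy path against `sandbox_cwd`, never against `cwd`.
    /// `Path::join` returns an absolute `path` unchanged.
    fn resolve_policy_path(cmd: &SandboxedCommand, path: &Path) -> PathBuf {
        cmd.sandbox_cwd.join(path)
    }
    ```

    For a command running in `/repo/subdir` with `sandbox_cwd` set to `/repo`,
    a relative policy path `target` resolves to `/repo/target`, independent of
    where the command runs.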
  • test: faster test execution in codex-core (#2633)
    This dramatically improves the time to run `cargo test -p codex-core`
    (~25x speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    Both runs were measured "hot", i.e. on a second run with no filesystem
    changes, to exclude compile times.
    
    The approach is inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html):
    we move all test cases in `tests/` into a single suite in order to have a
    single binary, because there is significant overhead for each test binary
    executed and test execution is only parallelized within a single
    binary.