26 Commits

  • [codex] Reject unlowered PowerShell AST regions (#24092)
    ## Why
    
    On Windows, Codex uses a PowerShell safe-command classifier to decide
    whether a command is read-only enough to run without additional
    approval. The classifier lowers `EndBlock.Statements` into argv-like
    command words and checks those words against a safelist.
    
    PowerShell can execute code stored elsewhere in the AST. Parameter
    defaults, named blocks, `using` preambles, and top-level `trap` handlers
    are not represented in the lowered statement list. Ignoring those
    regions can make a side-effecting script look like a read-only command.
    
    ## What
    
    Fail closed whenever a PowerShell script contains executable AST content
    that the current lowering does not represent.
    
    ## How
    
    - Return `unsupported` for parameter, dynamic-parameter, begin, process,
    and clean blocks.
    - Return `unsupported` for `using module` and `using assembly`
    preambles.
    - Return `unsupported` for non-empty `EndBlock.Traps` collections.
    - Preserve compatibility with Windows PowerShell 5.1 by looking up
    `CleanBlock` dynamically.
    - Treat `unsupported` as a failure to prove that the command is safe,
    routing it through the normal approval path.
    - Add parser-level and end-to-end regressions for parameter blocks,
    named blocks, using statements, and trap handlers.
    
    This does not make these PowerShell forms invalid or prevent them from
    running. It prevents automatic safe-command approval when the classifier
    cannot account for all executable behavior.
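    A minimal sketch of the fail-closed rule. It assumes a simplified, hypothetical `ScriptAst` summary of the parsed script; the real classifier inspects the PowerShell AST through its parser script. Any region that the lowering cannot represent yields `Unsupported`, and the caller treats that result as "not provably safe". Following the bullets above, only `using module` and `using assembly` are rejected, so `using namespace` passes in this sketch.

    ```rust
    /// Result of lowering a script into argv-like command words.
    #[derive(Debug, PartialEq)]
    enum Lowering {
        Commands(Vec<Vec<String>>),
        Unsupported,
    }

    /// Hypothetical summary of the AST regions named in this PR.
    #[derive(Default)]
    struct ScriptAst {
        has_param_block: bool,
        has_dynamic_param_block: bool,
        has_begin_block: bool,
        has_process_block: bool,
        has_clean_block: bool,
        /// Kinds of `using` statements, e.g. "namespace", "module", "assembly".
        using_kinds: Vec<String>,
        end_block_trap_count: usize,
        end_block_statements: Vec<Vec<String>>,
    }

    fn lower(ast: &ScriptAst) -> Lowering {
        let has_unlowered_block = ast.has_param_block
            || ast.has_dynamic_param_block
            || ast.has_begin_block
            || ast.has_process_block
            || ast.has_clean_block;
        let has_executable_using = ast
            .using_kinds
            .iter()
            .any(|kind| kind == "module" || kind == "assembly");
        if has_unlowered_block || has_executable_using || ast.end_block_trap_count > 0 {
            // Fail closed: code outside EndBlock.Statements could run unseen.
            return Lowering::Unsupported;
        }
        Lowering::Commands(ast.end_block_statements.clone())
    }

    /// Auto-approve only when lowering is complete and every command is safe;
    /// `Unsupported` falls through to the normal approval path.
    fn is_auto_approvable(ast: &ScriptAst, is_safe: fn(&[String]) -> bool) -> bool {
        match lower(ast) {
            Lowering::Commands(commands) => commands.iter().all(|c| is_safe(c.as_slice())),
            Lowering::Unsupported => false,
        }
    }

    /// Hypothetical safelist predicate used only for this demo.
    fn demo_is_safe(words: &[String]) -> bool {
        words.first().map(String::as_str) == Some("Get-ChildItem")
    }

    fn main() {
        let plain = ScriptAst {
            end_block_statements: vec![vec!["Get-ChildItem".to_string()]],
            ..Default::default()
        };
        let with_trap = ScriptAst {
            end_block_trap_count: 1,
            end_block_statements: vec![vec!["Get-ChildItem".to_string()]],
            ..Default::default()
        };
        println!("plain: {}", is_auto_approvable(&plain, demo_is_safe));
        println!("with trap: {}", is_auto_approvable(&with_trap, demo_is_safe));
    }
    ```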
    
    ## Testing
    
    - `just test -p codex-shell-command`
    - Windows CI exercises the parser and end-to-end safe-command
    regressions against a real PowerShell installation.
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • [codex] simplify memory read metrics (#28164)
    ## Why
    
    Memory read telemetry currently reconstructs the executable shell
    command after a tool call finishes. That duplicates shell, login-policy,
    and cwd resolution owned by the tool handlers, and can diverge from the
    environment-specific command that unified exec actually ran.
    
    ## What changed
    
    - Expose the existing restricted shell-script parser directly for raw
    script text.
    - Parse `shell_command` and `exec_command` input into plain command argv
    before classifying memory reads.
    - Preserve all-or-nothing safe-command validation for multi-command
    scripts.
    - Remove cwd resolution, shell selection, and the unnecessary async
    boundary from memory read metric emission.
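    The all-or-nothing rule can be sketched as follows. `parse_plain_commands` is a deliberately tiny, hypothetical stand-in for the restricted shell-script parser (it only accepts bare words separated by `&&` or `;`), and `reads_memory` with its `memories/` prefix is an invented predicate used purely for illustration.

    ```rust
    /// Toy stand-in for the restricted parser: bare words joined by `&&` or `;`.
    /// Anything else (pipes, redirects, quoting, substitutions) is rejected.
    fn parse_plain_commands(script: &str) -> Option<Vec<Vec<String>>> {
        let mut commands = Vec::new();
        let mut current: Vec<String> = Vec::new();
        for token in script.split_whitespace() {
            if token == "&&" || token == ";" {
                if current.is_empty() {
                    return None; // empty command between separators
                }
                commands.push(std::mem::take(&mut current));
            } else if token.chars().any(|c| "|&;<>$`()\"'\\".contains(c)) {
                return None;
            } else {
                current.push(token.to_string());
            }
        }
        if current.is_empty() {
            return None;
        }
        commands.push(current);
        Some(commands)
    }

    /// Hypothetical predicate: `cat`/`ls` whose arguments all live under `memories/`.
    fn reads_memory(argv: &[String]) -> bool {
        matches!(argv.first().map(String::as_str), Some("cat" | "ls"))
            && argv.len() > 1
            && argv[1..].iter().all(|arg| arg.starts_with("memories/"))
    }

    /// All-or-nothing: a script counts as a memory read only if it parses and
    /// every command in it is a memory read.
    fn is_memory_read_script(script: &str, is_read: fn(&[String]) -> bool) -> bool {
        match parse_plain_commands(script) {
            Some(commands) => commands.iter().all(|argv| is_read(argv.as_slice())),
            None => false,
        }
    }

    fn main() {
        for script in ["cat memories/a.md && ls memories/", "cat memories/a.md ; rm memories/a.md"] {
            println!("{script:?} -> {}", is_memory_read_script(script, reads_memory));
        }
    }
    ```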
    
    ## Testing
    
    - `just test -p codex-shell-command`
    - `cargo check -p codex-core`
  • build: run buildifier from just fmt (#28125)
    ## Intent
    
    Keep Bazel and Starlark files consistently formatted without requiring
    contributors to install or version buildifier themselves.
    
    ## Implementation
    
    - Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier
    v8.5.1.
    - Run buildifier from the shared `just fmt` and `just fmt-check` driver,
    with Windows-safe explicit DotSlash invocation.
    - Provision DotSlash in formatting CI and contributor devcontainers, and
    document the source-build prerequisite.
    - Apply the initial mechanical buildifier formatting baseline.
  • [codex] Add /usr/bin/bash shell fallback (#26538)
    ## Why
    
    Some Linux environments expose `bash` at `/usr/bin/bash` instead of
    `/bin/bash`. The shell detection fallback list should cover both
    standard locations once PATH/user-shell probing fails.
    
    Stacked on #26480.
    
    ## What changed
    
    - Add `/usr/bin/bash` to the bash fallback path list in
    `codex-shell-command`.
    - Extend shell type detection coverage for `/usr/bin/bash`.
    - Add AGENTS.md testing guidance to avoid tests for statically defined
    values and negative tests for removed logic.
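    A minimal sketch of the fallback probe, assuming the list is checked in order after PATH and user-shell lookup have failed. The `/bin/bash`-first ordering, the `find_bash_fallback` name, and the injectable `exists` callback are assumptions; the callback keeps the sketch testable without touching the filesystem.

    ```rust
    use std::path::{Path, PathBuf};

    /// Standard bash locations probed after PATH/user-shell lookup fails.
    /// The order here is an assumption.
    const BASH_FALLBACKS: &[&str] = &["/bin/bash", "/usr/bin/bash"];

    /// Returns the first fallback path for which `exists` reports true.
    fn find_bash_fallback(exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
        BASH_FALLBACKS
            .iter()
            .map(|s| Path::new(*s))
            .find(|p| exists(*p))
            .map(Path::to_path_buf)
    }

    fn main() {
        // Real probe against the local filesystem.
        println!("{:?}", find_bash_fallback(|p| p.is_file()));
    }
    ```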
    
    ## Verification
    
    - `just test -p codex-shell-command`
  • [codex] Add environment shell info (#26480)
    ## Why
    
    Shell detection needs to be available through the `Environment`
    abstraction so callers can ask the selected local or remote environment
    for shell metadata without adding a separate HTTP endpoint or parallel
    info-source path. This keeps shell metadata shaped like the existing
    environment-owned filesystem capability and lets remote environments
    answer through exec-server JSON-RPC.
    
    ## What changed
    
    - Added `environment/info` to the exec-server protocol/client/server and
    exposed `Environment::info()`.
    - Added local and remote environment info providers on `Environment`,
    following the existing capability-provider pattern used for filesystem
    access.
    - Moved the shared shell detection logic into `codex-shell-command` and
    kept core shell APIs as wrappers around that implementation.
    - Returned shell metadata as `EnvironmentInfo { shell: ShellInfo }`
    using the existing shell detection path.
    - Added a remote environment test that calls `Environment::info()`
    through an exec-server-backed environment.
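    The capability-provider shape described above can be sketched with a trait object. This is a synchronous simplification: `EnvironmentInfoProvider`, `LocalInfo`, `RemoteInfo`, and the string-keyed `call` transport are hypothetical. Only `Environment::info()`, `EnvironmentInfo { shell: ShellInfo }`, and the `environment/info` method name come from the PR.

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    struct ShellInfo {
        shell_type: String,
        path: String,
    }

    #[derive(Debug, Clone, PartialEq)]
    struct EnvironmentInfo {
        shell: ShellInfo,
    }

    /// Hypothetical capability trait, mirroring how filesystem access is provided.
    trait EnvironmentInfoProvider {
        fn info(&self) -> Result<EnvironmentInfo, String>;
    }

    /// Local environments run shell detection in-process.
    struct LocalInfo {
        detect_shell: fn() -> ShellInfo,
    }

    impl EnvironmentInfoProvider for LocalInfo {
        fn info(&self) -> Result<EnvironmentInfo, String> {
            Ok(EnvironmentInfo { shell: (self.detect_shell)() })
        }
    }

    /// Remote environments forward `environment/info` to the exec-server;
    /// `call` stands in for the JSON-RPC client.
    struct RemoteInfo {
        call: Box<dyn Fn(&str) -> Result<EnvironmentInfo, String>>,
    }

    impl EnvironmentInfoProvider for RemoteInfo {
        fn info(&self) -> Result<EnvironmentInfo, String> {
            (self.call)("environment/info")
        }
    }

    struct Environment {
        info_provider: Box<dyn EnvironmentInfoProvider>,
    }

    impl Environment {
        fn info(&self) -> Result<EnvironmentInfo, String> {
            self.info_provider.info()
        }
    }

    fn detect_bash() -> ShellInfo {
        ShellInfo { shell_type: "bash".to_string(), path: "/bin/bash".to_string() }
    }

    fn main() {
        let local = Environment { info_provider: Box::new(LocalInfo { detect_shell: detect_bash }) };
        println!("{:?}", local.info());
    }
    ```

    Callers only see `Environment::info()`, so local and remote environments answer through the same method.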
    
    ## Validation
    
    - `git diff --check`
    - `just test -p codex-shell-command`
    - `just test -p codex-core -E 'test(/shell::tests::/)'`
    - `just test -p codex-exec-server environment`
  • [codex] Avoid PowerShell safety parsing off Windows (#24946)
    ## Summary
    
    This fixes BUGB-17567 by preventing non-Windows command safety
    classification from invoking the Windows PowerShell safelist/parser
    path.
    
    Previously, `is_known_safe_command` called the Windows PowerShell
    classifier on every platform. That classifier recognizes
    `pwsh`/`powershell` by basename and delegates script parsing to the
    PowerShell AST parser. The parser starts the supplied executable, so on
    macOS/Linux a repository-controlled `pwsh` path could execute during
    safety parsing before the normal sandboxed command execution path.
    
    The change gates the Windows PowerShell classifier and module behind
    `#[cfg(windows)]`. On macOS/Linux, PowerShell-looking commands are no
    longer auto-approved by the Windows classifier and instead fall through
    to the normal non-Windows safe-command logic.
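    The gate can be sketched with `#[cfg(windows)]`. The function and module names are hypothetical placeholders. The point is that on macOS and Linux the PowerShell classifier is not compiled in at all, so classifying a command can never spawn a PowerShell executable.

    ```rust
    /// Platform-gated safety check (hypothetical names).
    fn is_known_safe_command(argv: &[&str]) -> bool {
        #[cfg(windows)]
        {
            // Only Windows builds contain, and can call, the PowerShell classifier.
            if windows_powershell::is_safe(argv) {
                return true;
            }
        }
        generic_is_safe(argv)
    }

    #[cfg(windows)]
    mod windows_powershell {
        /// Placeholder for the Windows classifier, which would start a PowerShell parser.
        pub fn is_safe(_argv: &[&str]) -> bool {
            false
        }
    }

    /// Placeholder for the cross-platform safelist; it never spawns a process.
    fn generic_is_safe(argv: &[&str]) -> bool {
        matches!(argv.first(), Some(&"ls") | Some(&"pwd"))
    }

    fn main() {
        let commands: [&[&str]; 2] = [&["ls"], &["pwsh", "-Command", "Get-ChildItem"]];
        for argv in commands {
            println!("{argv:?} -> {}", is_known_safe_command(argv));
        }
    }
    ```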
    
    ## Validation
    
    - `/private/tmp/codex-tools/bin/just fmt`
    - `PATH=/private/tmp/codex-tools/bin:$PATH
    /private/tmp/codex-tools/bin/just test -p codex-shell-command`
    
    The focused test run passed 135 tests with 0 skipped and completed the
    crate bench-smoke step.
    
    ## Notes
    
    This PR is scoped to the BUGB-17567 macOS/Linux path. Windows still uses
    the PowerShell classifier; a separate hardening follow-up should ensure
    Windows safety parsing only executes a trusted PowerShell parser binary
    and does not spawn the command's `argv[0]` when that path may be
    repository-controlled.
  • [codex] Handle PowerShell UTF-8 setup failures (#24949)
    Fixes #12496.
    
    ## Why
    
    Windows sandboxed PowerShell commands can run under
    `ConstrainedLanguage` on some machines, especially enterprise-managed
    Windows environments. In that mode, our PowerShell command prelude could
    fail before every command because it directly assigned
    `[Console]::OutputEncoding` to UTF-8. The actual user command still ran,
    but Codex surfaced noisy `Cannot set property. Property setting is
    supported only on core types in this language mode.` output for every
    shell call.
    
    ## What Changed
    
    - Makes the PowerShell UTF-8 output encoding prelude best-effort by
    wrapping the assignment in `try { ... } catch {}`.
    - Keeps the existing UTF-8 behavior when PowerShell allows the
    assignment.
    - Adds focused tests for adding the prelude and avoiding duplicate
    prelude insertion.
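    A minimal sketch of the best-effort prelude and the duplicate check. The exact prelude text, the newline separator, and the `add_utf8_prelude` name are assumptions; only the `try { ... } catch {}` wrapping around the `[Console]::OutputEncoding` assignment comes from the PR.

    ```rust
    /// Assumed prelude: set UTF-8 output where allowed and ignore failures such as
    /// ConstrainedLanguage property-setting errors.
    const UTF8_PRELUDE: &str =
        "try { [Console]::OutputEncoding = [System.Text.Encoding]::UTF8 } catch {}";

    /// Prepends the prelude once; a script that already starts with it is returned unchanged.
    fn add_utf8_prelude(script: &str) -> String {
        if script.starts_with(UTF8_PRELUDE) {
            return script.to_string();
        }
        format!("{UTF8_PRELUDE}\n{script}")
    }

    fn main() {
        let once = add_utf8_prelude("Get-ChildItem");
        let twice = add_utf8_prelude(&once);
        assert_eq!(once, twice);
        println!("{once}");
    }
    ```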
    
    ## Validation
    
    - `cargo fmt -p codex-shell-command`
    - `cargo check -p codex-shell-command`
    - `git diff --check`
    - Verified a local `ConstrainedLanguage` PowerShell probe prints only
    the command output with no property-setting error.
    - Verified `codex exec` from a temporary `chcp 437` context reports
    `utf-8` / `65001` and preserves non-ASCII output (`café`, `漢字`).
  • [codex] treat PowerShell stop-parsing forms as unsupported (#22643)
    ## Summary
    - Treat PowerShell stop-parsing token forms as unsupported in the
    AST-backed command flattener.
    - Add focused regressions at the parser layer and Windows command-safety
    layer.
    
    ## Why
    The command-safety parser lowers PowerShell AST elements into argv-like
    words. Stop-parsing syntax preserves a native-command argument shape
    that this lowering does not model, so these forms should stay on the
    conservative unsupported path.
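    At the token level, the rule amounts to the sketch below. It assumes the command has already been split into word tokens (the real flattener works on the PowerShell AST), and `flatten_words` is a hypothetical name. After `--%`, PowerShell passes the rest of the line to the native command without its usual argument parsing (only cmd-style `%VAR%` references are expanded), so the words a safelist would check are not the arguments the command actually receives.

    ```rust
    #[derive(Debug, PartialEq)]
    enum Flattened {
        Words(Vec<String>),
        Unsupported,
    }

    /// Lowers tokens to argv-like words, refusing stop-parsing forms.
    fn flatten_words(tokens: &[&str]) -> Flattened {
        // `--%` hands the rest of the line to a native command verbatim, a shape this
        // lowering does not model. Tokens that merely begin with `--%` are also
        // rejected, to stay conservative.
        if tokens.iter().any(|t| t.starts_with("--%")) {
            return Flattened::Unsupported;
        }
        Flattened::Words(tokens.iter().map(|t| t.to_string()).collect())
    }

    fn main() {
        println!("{:?}", flatten_words(&["git", "status"]));
        println!("{:?}", flatten_words(&["cmd", "--%", "/c", "dir"]));
    }
    ```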
    
    ## Validation
    - `cargo fmt --manifest-path codex-rs/Cargo.toml --all --check`
    - `cargo test --manifest-path codex-rs/Cargo.toml -p
    codex-shell-command`
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` runs both standard Rust tests and doctests.
    It turns out that the doctest discovery is fairly slow, and it's a cost
    you pay even for crates that don't include any doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    more than 4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, the speedup is more than 2x. Before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] Handle git pagination flags by position (#21381)
    ## Why
    
    This is a follow-up to the Windows Git safe-command bypass fix for
    BUGB-15601. Git's global `--paginate` / `-p` flags can route output
    through a configured pager, so they should not be auto-approved as safe
    before the subcommand. At the same time, `-p` after read-only
    subcommands like `log`, `diff`, and `show` is the common patch-output
    flag, so treating every `-p` as unsafe would make ordinary read-only
    inspection commands prompt unnecessarily.
    
    ## What Changed
    
    - Split Git option safety matching into explicit global-option and
    subcommand-option lists.
    - Treat global `git --paginate ...` and `git -p ...` as unsafe.
    - Keep post-subcommand patch usage such as `git log -p`, `git diff -p`,
    and `git show -p HEAD` safe.
    - Keep the pagination coverage with the shared Git safe-command
    implementation rather than the Windows wrapper tests.
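    Position-sensitive handling of `-p` can be sketched as follows. `is_safe_git` and the short read-only list are hypothetical. The sketch takes the subcommand to be the first argument that does not start with `-`, which ignores value-taking global options such as `-C <path>`; those are rejected by separate checks that are not modeled here.

    ```rust
    /// Hypothetical read-only subcommands for this sketch.
    const READ_ONLY_SUBCOMMANDS: &[&str] = &["log", "diff", "show", "status"];

    /// `args` are the arguments after `git`.
    fn is_safe_git(args: &[&str]) -> bool {
        let Some(pos) = args.iter().position(|a| !a.starts_with('-')) else {
            return false; // no subcommand
        };
        let (globals, rest) = args.split_at(pos);
        // Before the subcommand, -p/--paginate is Git's global pager switch.
        if globals.iter().any(|g| matches!(*g, "-p" | "--paginate")) {
            return false;
        }
        // After it, `-p` is the subcommand's patch flag and stays allowed.
        READ_ONLY_SUBCOMMANDS.contains(&rest[0])
    }

    fn main() {
        for args in [&["log", "-p"][..], &["-p", "log"][..]] {
            println!("git {} -> {}", args.join(" "), is_safe_git(args));
        }
    }
    ```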
    - Remove the stale `git_global_option_requires_prompt` helper now that
    safe-command Git option matching owns the prompt-required lists.
    
    ## Testing
    
    - `cargo test -p codex-shell-command`
  • Share Git safe-command logic on Windows (#21275)
    ## Why
    
    BUGB-15601 showed that the Windows safe-command path had drifted from
    the generic Git classifier. The Windows-specific Git parser could
    classify a PowerShell-wrapped `git` command as safe as soon as it found
    a safelisted subcommand, without applying the generic checks for unsafe
    subcommand options such as `--output`, `--ext-diff`, `--textconv`,
    `--paginate`, or `cat-file --filters`.
    
    The generic classifier already models the Git command boundary and the
    read-only argument checks more carefully, so Windows should reuse that
    logic instead of maintaining a smaller parallel parser.
    
    ## What Changed
    
    - Extracted the existing generic Git classification logic into
    `is_safe_git_command`.
    - Updated `windows_safe_commands.rs` to call that shared helper for
    parsed PowerShell `git` commands.
    - Removed the Windows-only Git subcommand safelist, including the
    `cat-file` allowance that was part of the reported bypass.
    - Added a Windows regression test that keeps PowerShell-wrapped Git
    commands with side-effecting options classified unsafe.
    - Made the full-path PowerShell test discover the installed PowerShell
    executable instead of depending on one hard-coded `pwsh.exe` path.
    
    ## Verification
    
    - `cargo test -p codex-shell-command
    rejects_git_subcommand_options_with_side_effects`
    - `cargo test -p codex-shell-command
    git_global_override_flags_are_not_safe`
    - `cargo test -p codex-shell-command
    windows_powershell_full_path_is_safe -- --nocapture`
    
    Co-authored-by: Codex <codex@openai.com>
  • fix(exec_policy) heredoc parsing file_redirect (#20113)
    ## Summary
    Fixes a regression introduced in #10941 so that heredocs do not permit
    file redirects to be approved by rules, and adds scenario tests to cover
    this behavior.
    
    
    Previously, heredoc command parsing would allow redirects and
    environment variables:
    ```bash
    # commands_for_exec_policy() would parse this via parse_shell_lc_single_command_prefix
    PATH=/tmp/bad:$PATH cat <<'EOF' > /tmp/bad/hello.txt
    hello
    EOF
    ```
    This conflicts with the Codex Rules documentation: heredoc parsing
    should be as strict as the parsing of any other command.
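    One way to express the stricter rule, as a sketch with a hypothetical `ParsedCommand` shape and `argv_for_rules` helper: with or without a heredoc, a command only yields an argv for rule matching when it has no leading environment assignments and no file redirects. Otherwise the caller falls back to the normal approval flow.

    ```rust
    /// Hypothetical parse result for one simple command.
    struct ParsedCommand {
        env_assignments: Vec<String>, // e.g. `PATH=/tmp/bad:$PATH`
        words: Vec<String>,           // e.g. ["cat"]
        file_redirects: Vec<String>,  // e.g. `> /tmp/bad/hello.txt`
    }

    /// Returns the argv to match against rules, or None when the command has a
    /// shape that rules must not approve (heredoc or not).
    fn argv_for_rules(cmd: &ParsedCommand) -> Option<&[String]> {
        if !cmd.env_assignments.is_empty() || !cmd.file_redirects.is_empty() {
            return None;
        }
        Some(cmd.words.as_slice())
    }

    fn main() {
        // Mirrors the heredoc example: an env assignment plus a file redirect.
        let risky = ParsedCommand {
            env_assignments: vec!["PATH=/tmp/bad:$PATH".to_string()],
            words: vec!["cat".to_string()],
            file_redirects: vec!["/tmp/bad/hello.txt".to_string()],
        };
        println!("{:?}", argv_for_rules(&risky)); // None: falls back to approval
    }
    ```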
    
    
    ## Tests
    - [x] Updated unit tests accordingly
    - [x] Added scenario tests for these cases
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • execpolicy: unwrap PowerShell -Command wrappers on Windows (#20336)
    ## Why
    On Windows, Codex runs shell commands through a top-level
    `powershell.exe -NoProfile -Command ...` wrapper. `execpolicy` was
    matching that wrapper instead of the inner command, so prefix rules like
    `["git", "push"]` did not fire for PowerShell-wrapped commands even
    though the same normalization already happens for `bash -lc` on Unix.
    
    This change makes the Windows shell wrapper transparent to rule matching
    while preserving the existing Windows unmatched-command safelist and
    dangerous-command heuristics.
    
    ## What changed
    - add `parse_powershell_command_plain_commands()` in
    `shell-command/src/powershell.rs` to unwrap the top-level PowerShell
    `-Command` body with `extract_powershell_command()` and parse it with
    the existing PowerShell AST parser
    - update `core/src/exec_policy.rs` so `commands_for_exec_policy()`
    treats top-level PowerShell wrappers like `bash -lc` and evaluates rules
    against the parsed inner commands
    - carry a small `ExecPolicyCommandOrigin` through unmatched-command
    evaluation and expose `is_safe_powershell_words()` /
    `is_dangerous_powershell_words()` so Windows safelist and
    dangerous-command checks still work after unwrap
    - add Windows-focused tests for wrapped PowerShell prompt/allow matches,
    wrapper parsing, and unmatched safe/dangerous inner commands, and
    re-enable the end-to-end `execpolicy_blocks_shell_invocation` test on
    Windows
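    The unwrap step can be sketched as below. `unwrap_powershell_command` is a hypothetical stand-in, not the signature of `extract_powershell_command()`. It recognizes `powershell`/`pwsh` by file name, skips `-NoProfile`/`-NoLogo`, and returns the `-Command` body so rules can match the inner command. File names are split with `std::path`, so backslash-separated Windows paths are only split when the code is compiled for Windows.

    ```rust
    use std::path::Path;

    /// Hypothetical unwrap of `powershell.exe -NoProfile -Command ...`.
    /// Returns the `-Command` body, or None if `argv` is not such a wrapper.
    fn unwrap_powershell_command(argv: &[&str]) -> Option<String> {
        let exe = Path::new(*argv.first()?)
            .file_name()?
            .to_str()?
            .to_ascii_lowercase();
        if !matches!(exe.as_str(), "powershell" | "powershell.exe" | "pwsh" | "pwsh.exe") {
            return None;
        }
        let mut rest = argv[1..].iter();
        while let Some(arg) = rest.next() {
            match arg.to_ascii_lowercase().as_str() {
                "-noprofile" | "-nologo" => continue,
                "-command" | "-c" => {
                    // Everything after -Command is the script body.
                    let body: Vec<&str> = rest.by_ref().copied().collect();
                    return if body.is_empty() { None } else { Some(body.join(" ")) };
                }
                // Unknown flags or a script file: leave the command wrapped.
                _ => return None,
            }
        }
        None
    }

    fn main() {
        let argv = ["powershell.exe", "-NoProfile", "-Command", "git push origin main"];
        println!("{:?}", unwrap_powershell_command(&argv));
    }
    ```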
    
    ## Testing
    - `cargo test -p codex-shell-command`
  • fix: don't auto approve git -C ... (#20085)
    It's safer to make sure these commands go through approval flows.
  • [codex] Remove unused Rust helpers (#17146)
    ## Summary
    
    Removes high-confidence unused Rust helper functions and exports across
    `codex-tui`, `codex-shell-command`, and utility crates.
    
    The cleanup includes dead TUI helper methods, unused
    path/string/elapsed/fuzzy-match utilities, an unused Windows PowerShell
    lookup helper, and the unused terminal palette version counter. This
    keeps the remaining public surface smaller without changing behavior.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-tui -p codex-shell-command -p codex-utils-elapsed
    -p codex-utils-fuzzy-match -p codex-utils-string -p codex-utils-path`
    - `just fix -p codex-tui -p codex-shell-command -p codex-utils-elapsed
    -p codex-utils-fuzzy-match -p codex-utils-string -p codex-utils-path`
    - `git diff --check`
  • [codex] Make AbsolutePathBuf joins infallible (#16981)
    Having to check for errors every time join is called is painful and
    unnecessary.
  • [codex] reduce module visibility (#16978)
    ## Summary
    - reduce public module visibility across Rust crates, preferring private
    or crate-private modules with explicit crate-root public exports
    - update external call sites and tests to use the intended public crate
    APIs instead of reaching through module trees
    - add the module visibility guideline to AGENTS.md
    
    ## Validation
    - `cargo check --workspace --all-targets --message-format=short` passed
    before the final fix/format pass
    - `just fix` completed successfully
    - `just fmt` completed successfully
    - `git diff --check` passed
  • shell-command: reuse a PowerShell parser process on Windows (#16057)
    ## Why
    
    `//codex-rs/shell-command:shell-command-unit-tests` became a real
    bottleneck in the Windows Bazel lane because repeated calls to
    `is_safe_command_windows()` were starting a fresh PowerShell parser
    process for every `powershell.exe -Command ...` assertion.
    
    PR #16056 was motivated by that same bottleneck, but its test-only
    shortcut was the wrong layer to optimize because it weakened the
    end-to-end guarantee that our runtime path really asks PowerShell to
    parse the command the way we expect.
    
    This PR attacks the actual cost center instead: it keeps the real
    PowerShell parser in the loop, but turns that parser into a long-lived
    helper process so both tests and the runtime safe-command path can reuse
    it across many requests.
    
    ## What Changed
    
    - add `shell-command/src/command_safety/powershell_parser.rs`, which
    keeps one mutex-protected parser process per PowerShell executable path
    and speaks a simple JSON-over-stdio request/response protocol
    - turn `shell-command/src/command_safety/powershell_parser.ps1` into a
    long-running parser server with comments explaining the protocol, the
    AST-shape restrictions, and why unsupported constructs are rejected
    conservatively
    - keep request ids and a one-time respawn path so a dead or
    desynchronized cached child fails closed instead of silently returning
    mixed parser output
    - preserve separate parser processes for `powershell.exe` and
    `pwsh.exe`, since they do not accept the same language surface
    - avoid a direct `PipelineChainAst` type reference in the PowerShell
    script so the parser service still runs under Windows PowerShell 5.1 as
    well as newer `pwsh`
    - make `shell-command/src/command_safety/windows_safe_commands.rs`
    delegate to the new parser utility instead of spawning a fresh
    PowerShell process for every parse
    - add a Windows-only unit test that exercises multiple sequential
    requests against the same parser process
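    The reuse and fail-closed behavior can be sketched in plain Rust. Everything here is hypothetical: the `ParserChild` trait abstracts the real JSON-over-stdio child so the sketch runs without PowerShell, and `FakeChild` is a test double. One child is cached per executable path, each request carries an increasing id, and a reply with the wrong id (or an error) drops the child and retries exactly once before giving up.

    ```rust
    use std::collections::HashMap;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Mutex;

    /// Abstraction over the long-lived parser child.
    trait ParserChild {
        /// Sends `script` tagged with `id`; returns the echoed id plus the result.
        fn request(&mut self, id: u64, script: &str) -> Result<(u64, String), String>;
    }

    struct PoolState<C> {
        next_id: u64,
        children: HashMap<String, C>, // one child per executable path
    }

    struct ParserPool<C: ParserChild> {
        spawn: fn(&str) -> C,
        state: Mutex<PoolState<C>>,
    }

    impl<C: ParserChild> ParserPool<C> {
        fn new(spawn: fn(&str) -> C) -> Self {
            Self { spawn, state: Mutex::new(PoolState { next_id: 0, children: HashMap::new() }) }
        }

        fn parse(&self, exe: &str, script: &str) -> Result<String, String> {
            let mut state = self.state.lock().map_err(|_| "parser lock poisoned".to_string())?;
            // First attempt uses the cached child; the second uses a fresh one.
            for _attempt in 0..2 {
                state.next_id += 1;
                let id = state.next_id;
                let spawn = self.spawn;
                let child = state.children.entry(exe.to_string()).or_insert_with(|| spawn(exe));
                match child.request(id, script) {
                    Ok((echoed, output)) if echoed == id => return Ok(output),
                    // Dead or desynchronized child: drop it so the next attempt respawns.
                    _ => {
                        state.children.remove(exe);
                    }
                }
            }
            Err(format!("PowerShell parser for {exe} unavailable")) // fail closed
        }
    }

    /// Number of spawned children, used to show reuse.
    static SPAWNS: AtomicUsize = AtomicUsize::new(0);

    /// Test double: answers with the right id unless spawned for "broken".
    struct FakeChild {
        broken: bool,
    }

    impl ParserChild for FakeChild {
        fn request(&mut self, id: u64, script: &str) -> Result<(u64, String), String> {
            if self.broken {
                return Ok((id + 1, String::new())); // stale reply with the wrong id
            }
            Ok((id, format!("parsed: {script}")))
        }
    }

    fn spawn_fake(exe: &str) -> FakeChild {
        SPAWNS.fetch_add(1, Ordering::SeqCst);
        FakeChild { broken: exe == "broken" }
    }

    fn main() {
        let pool = ParserPool::<FakeChild>::new(spawn_fake);
        println!("{:?}", pool.parse("pwsh.exe", "Get-Date"));
        println!("{:?}", pool.parse("broken", "Get-Date"));
    }
    ```

    Keying the cache by executable path keeps `powershell.exe` and `pwsh.exe` in separate children, matching the bullet above.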
    
    ## Testing
    
    - adds a Windows-only parser-reuse unit test in `powershell_parser.rs`
    - the main end-to-end verification for this change is the Windows CI
    lane, because the new service depends on real `powershell.exe` /
    `pwsh.exe` behavior
  • [codex] Block unsafe git global options from safe allowlist (#15796)
    ## Summary
    - block git global options that can redirect config, repository, or
    helper lookup from being auto-approved as safe
    - share the unsafe global-option predicate across the Unix and Windows
    git safety checks
    - add regression coverage for inline and split forms, including `bash
    -lc` and PowerShell wrappers
    
    ## Root cause
    The Unix safe-command gate only rejected `-c` and `--config-env`, even
    though the shared git parser already knew how to skip additional
    pre-subcommand globals such as `--git-dir`, `--work-tree`,
    `--exec-path`, `--namespace`, and `--super-prefix`. That let those
    arguments slip through safe-command classification on otherwise
    read-only git invocations and bypass approval. The Windows-specific
    safe-command path had the same trust-boundary gap for git global
    options.
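    A sketch of a shared predicate that covers both the split form (`--git-dir /tmp/x`) and the inline form (`--git-dir=/tmp/x`). The option list is taken from the text above; the function name is hypothetical, and the real implementation may recognize additional options.

    ```rust
    /// Global options that can redirect config, repository, or helper lookup.
    const UNSAFE_GIT_GLOBALS: &[&str] = &[
        "-c",
        "--config-env",
        "--git-dir",
        "--work-tree",
        "--exec-path",
        "--namespace",
        "--super-prefix",
    ];

    /// True for the split form (`--git-dir /x`, `-c k=v`), where the token equals
    /// the option, and for the inline form (`--git-dir=/x`) of long options.
    fn is_unsafe_git_global_option(arg: &str) -> bool {
        UNSAFE_GIT_GLOBALS.iter().any(|opt| {
            arg == *opt || (opt.starts_with("--") && arg.starts_with(&format!("{opt}=")))
        })
    }

    fn main() {
        for arg in ["--git-dir=/tmp/repo", "-c", "--no-pager"] {
            println!("{arg}: {}", is_unsafe_git_global_option(arg));
        }
    }
    ```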
  • Collapse parsed command summaries when any stage is unknown (#13043)
    ## Summary
    - collapse parsed command output to a single `Unknown` whenever the
    normal parse includes any unknown entry
    - preserve the existing parsing flow and existing `cd` handling,
    including the current `cd && ...` collapse behavior
    - trim redundant tests and add focused coverage for collapse-on-unknown
    cases
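    The collapse rule is small enough to sketch directly. The `CommandSummary` variants and the `collapse_on_unknown` name are hypothetical simplifications of the parsed-command types, and the `cd` special cases mentioned above are not modeled.

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum CommandSummary {
        Read { path: String },
        ListFiles,
        Unknown { cmd: String },
    }

    /// If any stage is unknown, report the whole command as a single Unknown.
    fn collapse_on_unknown(parsed: Vec<CommandSummary>, original: &str) -> Vec<CommandSummary> {
        if parsed.iter().any(|p| matches!(p, CommandSummary::Unknown { .. })) {
            return vec![CommandSummary::Unknown { cmd: original.to_string() }];
        }
        parsed
    }

    fn main() {
        let parsed = vec![CommandSummary::ListFiles, CommandSummary::Unknown { cmd: "make".to_string() }];
        println!("{:?}", collapse_on_unknown(parsed, "ls && make"));
    }
    ```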
    
    ## Testing
    - `cargo test -p codex-shell-command`
  • core: resolve host_executable() rules during preflight (#13065)
    ## Why
    
    [#12964](https://github.com/openai/codex/pull/12964) added
    `host_executable()` support to `codex-execpolicy`, and
    [#13046](https://github.com/openai/codex/pull/13046) adopted it in the
    zsh-fork interception path.
    
    The remaining gap was the preflight execpolicy check in
    `core/src/exec_policy.rs`. That path derives approval requirements
    before execution for `shell`, `shell_command`, and `unified_exec`, but
    it was still using the default exact-token matcher.
    
    As a result, a command that already included an absolute executable
    path, such as `/usr/bin/git status`, could still miss a basename rule
    like `prefix_rule(pattern = ["git"], ...)` during preflight even when
    the policy also defined a matching `host_executable(name = "git", ...)`
    entry.
    
    This PR brings the same opt-in `host_executable()` resolution to the
    preflight approval path when an absolute program path is already present
    in the parsed command.
    
    ## What Changed
    
    - updated
    `ExecPolicyManager::create_exec_approval_requirement_for_command()` in
    `core/src/exec_policy.rs` to use `check_multiple_with_options(...)` with
    `MatchOptions { resolve_host_executables: true }`
    - kept the existing shell parsing flow for approval derivation, but now
    allow basename rules to match absolute executable paths during preflight
    when `host_executable()` permits it
    - updated requested-prefix amendment evaluation to use the same
    host-executable-aware matching mode, so suggested `prefix_rule()`
    amendments are checked consistently for absolute-path commands
    - added preflight coverage for:
      - absolute-path commands that should match basename rules through
        `host_executable()`
      - absolute-path commands whose paths are not in the allowed
        `host_executable()` mapping
      - requested prefix-rule amendments for absolute-path commands
    
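    The matching behavior can be sketched as follows, assuming Unix-style absolute paths. `Policy` and `matches_prefix` are hypothetical simplifications, and only the idea of a `resolve_host_executables` switch comes from the PR's `MatchOptions`. An absolute program path is reduced to its basename for rule matching only when the `host_executable()` mapping lists that exact path.

    ```rust
    use std::collections::HashMap;
    use std::path::Path;

    /// Hypothetical, simplified policy.
    struct Policy {
        /// Prefix rules such as ["git", "status"].
        prefix_rules: Vec<Vec<String>>,
        /// host_executable() entries: basename -> allowed absolute paths.
        host_executables: HashMap<String, Vec<String>>,
    }

    fn matches_prefix(policy: &Policy, argv: &[&str], resolve_host_executables: bool) -> bool {
        let Some((&program, rest)) = argv.split_first() else {
            return false;
        };
        // Reduce `/usr/bin/git` to `git` only if that exact path is allowed for `git`.
        let program = if resolve_host_executables && Path::new(program).is_absolute() {
            let base = Path::new(program)
                .file_name()
                .and_then(|name| name.to_str())
                .unwrap_or(program);
            match policy.host_executables.get(base) {
                Some(paths) if paths.iter().any(|p| p == program) => base,
                _ => program,
            }
        } else {
            program
        };
        policy.prefix_rules.iter().any(|rule| {
            rule.first().map(String::as_str) == Some(program)
                && rule.len() <= argv.len()
                && rule[1..].iter().zip(rest).all(|(r, a)| r == a)
        })
    }

    fn main() {
        let policy = Policy {
            prefix_rules: vec![vec!["git".to_string(), "status".to_string()]],
            host_executables: HashMap::from([("git".to_string(), vec!["/usr/bin/git".to_string()])]),
        };
        println!("{}", matches_prefix(&policy, &["/usr/bin/git", "status"], true));
    }
    ```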
    ## Verification
    
    - `just fix -p codex-core`
    - `cargo test -p codex-core --lib exec_policy::tests::`
  • fix(core) exec_policy parsing fixes (#11951)
    ## Summary
    Fixes a few things in our exec_policy handling of prefix_rules:
    1. Correctly match commands with redirects during exec_policy parsing.
    For example, with `prefix_rule(["echo"], decision="allow")`, the
    command `echo hello > output.txt` should match. This should fix #10321.
    2. If an existing rule (not just a prompt rule) would already match the
    proposed prefix rule, drop the proposed rule, since it would have no effect.
    
    
    ## Testing
    - [x] Updated unit tests, added approvals ScenarioSpecs
  • Remove git commands from dangerous command checks (#11510)
    ### Motivation
    
    - Git subcommand matching was being classified as "dangerous" and caused
    benign developer workflows (for example `git push --force-with-lease`)
    to be blocked by the preflight policy.
    - The change aligns behavior with the intent to reserve the dangerous
    checklist for truly destructive shell ops (e.g. `rm -rf`) and avoid
    surprising developer-facing blocks.
    
    ### Description
    
    - Remove git-specific subcommand checks from
    `is_dangerous_to_call_with_exec` in
    `codex-rs/shell-command/src/command_safety/is_dangerous_command.rs`,
    leaving only explicit `rm` and `sudo` passthrough checks.
    - Deleted the git-specific helper logic that classified `reset`,
    `branch`-delete, `push` (force/delete/refspec) and `clean --force` as
    dangerous.
    - Updated unit tests in the same file to assert that various `git
    reset`/`git branch`/`git push`/`git clean` variants are no longer
    classified as dangerous.
    - Kept `find_git_subcommand` (used by safe-command classification)
    intact so safe/unsafe parsing elsewhere remains functional.
    
    ### Testing
    
    - Ran formatter with `just fmt` successfully.  
    - Ran unit tests with `cargo test -p codex-shell-command` and all tests
    passed (`144 passed; 0 failed`).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_698d19dedb4883299c3ceb5bbc6a0dcf)
  • fix(exec-policy) No empty command lists (#11397)
    ## Summary
    This should rarely, if ever, happen in practice, but we should never
    provide an empty list of `commands` to ExecPolicy. This PR mostly adds
    tests around these cases.
    
    ## Testing
    - [x] Adds a bunch of unit tests for this
  • chore: rename codex-command to codex-shell-command (#11378)
    This addresses some post-merge feedback on
    https://github.com/openai/codex/pull/11361:
    
    - crate rename
    - reuse `detect_shell_type()` utility