25 Commits

  • release: publish standalone zsh artifacts (#30114)
    ## Why
    
    The patched zsh artifacts rarely change, but
    `.github/workflows/rust-release-zsh.yml` currently runs as part of every
    Rust release. Rebuilding the same four binaries for each Codex version
    wastes release capacity and ties an independently versioned runtime
    dependency to the main release cadence.
    
    This establishes the producer side of a build-once flow. The existing
    Rust release workflow remains unchanged until the first standalone
    artifact release has been published and the checked-in DotSlash
    manifests can be updated with its URLs and checksums.
    
    ## What changed
    
    - Run the zsh release workflow for protected `codex-zsh-vX.Y.Z` tags
    instead of as a reusable workflow.
    - Validate the semantic release tag before starting the platform builds.
    - Publish the four zsh archives to a GitHub prerelease so the release
    never becomes the repository latest release.
    - Publish the generated `codex-zsh` DotSlash manifest alongside the
    archives.
    - Document how to publish the next artifact version after changing the
    pinned zsh commit or patch.
    
    ## Tag protection
    
    An active repository tag ruleset named `codex-zsh-v*.*.*` targets
    `refs/tags/codex-zsh-v*.*.*`. It restricts tag creation, updates,
    deletion, and non-fast-forward changes; requires linear history; and
    limits bypass to the configured repository role.
    
    This was verified with:
    
    ```shell
    gh api repos/openai/codex/rulesets/18140982
    ```
    
    The response reported `"enforcement":"active"`, the expected tag
    condition, and the `creation`, `update`, `deletion`, `non_fast_forward`,
    and `required_linear_history` rules.
    
    ## Rollout
    
    After this lands, publish the first `codex-zsh-vX.Y.Z` release. A
    follow-up can then update the checked-in DotSlash manifests and remove
    the zsh rebuild from `.github/workflows/rust-release.yml`.
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/30114).
    * #30116
    * __->__ #30114
  • [codex] Remove async_trait from first-party code (#27475)
    ## Why
    
    First-party async traits should expose their `Send` contracts explicitly
    without requiring `async_trait`. This completes the migration pattern
    established in #27303 and #27304.
    
    ## What changed
    
    - Replaced the remaining first-party `async_trait` traits with native
    return-position `impl Future + Send` where statically dispatched and
    explicit boxed `Send` futures where object safety is required.
    - Kept implementations behavior-preserving, outlining existing async
    bodies into inherent methods where that keeps the diff reviewable.
    - Removed all direct first-party `async-trait` dependencies and the
    workspace dependency declaration.
    - Added a cargo-deny policy that permits `async-trait` only through the
    remaining transitive wrapper crates.
    - Updated `rand` from 0.8.5 to 0.8.6 to resolve RUSTSEC-2026-0097 and
    keep the full cargo-deny check passing.
    
    ## Validation
    
    - `just test -p codex-exec-server`: 216 passed, 2 skipped.
    - `just test -p codex-model-provider`: 39 passed.
    - `just test -p codex-core` and `just test`: changed tests passed;
    remaining failures are environment-sensitive suites unrelated to this
    migration.
    - `cargo deny check`
    - `just fix`
    - `just fmt`
    - `cargo shear`
    - `just bazel-lock-check`
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` has entails both running standard Rust tests and doctests.
    It turns out that the doctest discovery is fairly slow, and it's a cost
    you pay even for crates that don't include any doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    >4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, with >2x speedup, before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • permissions: make profiles represent enforcement (#19231)
    ## Why
    
    `PermissionProfile` is becoming the canonical permissions abstraction,
    but the old shape only carried optional filesystem and network fields.
    It could describe allowed access, but not who is responsible for
    enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
    when profiles were exported, cached, or round-tripped through app-server
    APIs.
    
    The important model change is that active permissions are now a disjoint
    union over the enforcement mode. Conceptually:
    
    ```rust
    pub enum PermissionProfile {
        Managed {
            file_system: FileSystemSandboxPolicy,
            network: NetworkSandboxPolicy,
        },
        Disabled,
        External {
            network: NetworkSandboxPolicy,
        },
    }
    ```
    
    This distinction matters because `Disabled` means Codex should apply no
    outer sandbox at all, while `External` means filesystem isolation is
    owned by an outside caller. Those are not equivalent to a broad managed
    sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
    inner sandbox may require the outer Codex layer to use no sandbox rather
    than a permissive one.
    
    ## How Existing Modeling Maps
    
    Legacy `SandboxPolicy` remains a boundary projection, but it now maps
    into the higher-fidelity profile model:
    
    - `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
    with restricted filesystem entries plus the corresponding network
    policy.
    - `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
    the “no outer sandbox” intent instead of treating it as a lax managed
    sandbox.
    - `ExternalSandbox { network_access }` maps to
    `PermissionProfile::External { network }`, preserving external
    filesystem enforcement while still carrying the active network policy.
    - Split runtime policies that legacy `SandboxPolicy` cannot faithfully
    express, such as managed unrestricted filesystem plus restricted
    network, stay `Managed` instead of being collapsed into
    `ExternalSandbox`.
    - Per-command/session/turn grants remain partial overlays via
    `AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
    complete active runtime permissions.
    
    ## What Changed
    
    - Change active `PermissionProfile` into a tagged union: `managed`,
    `disabled`, and `external`.
    - Keep partial permission grants separate with
    `AdditionalPermissionProfile` for command/session/turn overlays.
    - Represent managed filesystem permissions as either `restricted`
    entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
    present.
    - Preserve old rollout compatibility by accepting the pre-tagged `{
    network, file_system }` profile shape during deserialization.
    - Preserve fidelity for important edge cases: `DangerFullAccess`
    round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
    and managed unrestricted filesystem + restricted network stays managed
    instead of being mistaken for external enforcement.
    - Preserve configured deny-read entries and bounded glob scan depth when
    full profiles are projected back into runtime policies, including
    unrestricted replacements that now become `:root = write` plus deny
    entries.
    - Regenerate the experimental app-server v2 JSON/TypeScript schema and
    update the `command/exec` README example for the tagged
    `permissionProfile` shape.
    
    ## Compatibility
    
    Legacy `SandboxPolicy` remains available at config/API boundaries as the
    compatibility projection. Existing rollout lines with the old
    `PermissionProfile` shape continue to load. The app-server
    `permissionProfile` field is experimental, so its v2 wire shape is
    intentionally updated to match the higher-fidelity model.
    
    ## Verification
    
    - `just write-app-server-schema`
    - `cargo check --tests`
    - `cargo test -p codex-protocol permission_profile`
    - `cargo test -p codex-protocol
    preserving_deny_entries_keeps_unrestricted_policy_enforceable`
    - `cargo test -p codex-app-server-protocol
    permission_profile_file_system_permissions`
    - `cargo test -p codex-app-server-protocol serialize_client_response`
    - `cargo test -p codex-core
    session_configured_reports_permission_profile_for_external_sandbox`
    - `just fix`
    - `just fix -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-core`
    - `just fix -p codex-app-server`
  • shell-escalation: carry resolved permission profiles (#18287)
    ## Why
    
    Shell escalation still has adapter code that expects a legacy sandbox
    policy, but command approvals should carry the resolved
    `PermissionProfile` so callers can reason about the granted permissions
    canonically.
    
    ## What changed
    
    This introduces profile-shaped resolved escalation permissions while
    retaining the derived legacy sandbox policy for the Unix escalation
    adapter. It updates approval types, the escalation server protocol, and
    tests that inspect escalated command permissions.
    
    ## Verification
    
    - `cargo test -p codex-core --test all handle_container_exec_ --
    --nocapture`
    - `cargo test -p codex-core --test all handle_sandbox_ -- --nocapture`
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18287).
    * #18288
    * __->__ #18287
  • refactor: narrow async lock guard lifetimes (#18211)
    Follow-up to https://github.com/openai/codex/pull/18178, where we called
    out enabling the await-holding lint as a follow-up.
    
    The long-term goal is to enable Clippy coverage for async guards held
    across awaits. This PR is intentionally only the first, low-risk cleanup
    pass: it narrows obvious lock guard lifetimes and leaves
    `codex-rs/Cargo.toml` unchanged so the lint is not enabled until the
    remaining cases are fixed or explicitly justified. It intentionally
    leaves the active-turn/turn-state locking pattern alone because those
    checks and mutations need to stay atomic.
    
    ## Common fixes used here
    
    These are the main patterns reviewers should expect in this PR, and they
    are also the patterns to reach for when fixing future `await_holding_*`
    findings:
    
    - **Scope the guard to the synchronous work.** If the code only needs
    data from a locked value, move the lock into a small block, clone or
    compute the needed values, and do the later `.await` after the block.
    - **Use direct one-line mutations when there is no later await.** Cases
    like `map.lock().await.remove(&id)` are acceptable when the guard is
    only needed for that single mutation and the statement ends before any
    async work.
    - **Drain or clone work out of the lock before notifying or awaiting.**
    For example, the JS REPL drains pending exec senders into a local vector
    and the websocket writer clones buffered envelopes before it serializes
    or sends them.
    - **Use a `Semaphore` only when serialization is intentional across
    async work.** The test serialization guards intentionally span awaited
    setup or execution, so using a semaphore communicates "one at a time"
    without holding a mutex guard.
    - **Remove the mutex when there is only one owner.** The PTY stdin
    writer task owns `stdin` directly; the old `Arc<Mutex<_>>` did not
    protect shared access because nothing else had access to the writer.
    - **Do not split locks that protect an atomic invariant.** This PR
    deliberately leaves active-turn/turn-state paths alone because those
    checks and mutations need to stay atomic. Those cases should be fixed
    separately with a design change or documented with `#[expect]`.
    
    ## What changed
    
    - Narrow scoped async mutex guards in app-server, JS REPL, network
    approval, remote-control websocket, and the RMCP test server.
    - Replace test-only async mutex serialization guards with semaphores
    where the guard intentionally lives across async work.
    - Let the PTY pipe writer task own stdin directly instead of wrapping it
    in an async mutex.
    
    ## Verification
    
    - `just fix -p codex-core -p codex-app-server -p codex-rmcp-client -p
    codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness`
    - `just clippy -p codex-core`
    - `cargo test -p codex-core -p codex-app-server -p codex-rmcp-client -p
    codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness` was
    run; the app-server suite passed, and `codex-core` failed in the local
    sandbox on six otel approval tests plus
    `suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`,
    which appear to depend on local command approval/default rules and
    `CODEX_SANDBOX_NETWORK_DISABLED=1` in this environment.
  • [codex] Make AbsolutePathBuf joins infallible (#16981)
    Having to check for errors every time join is called is painful and
    unnecessary.
  • [codex] reduce module visibility (#16978)
    ## Summary
    - reduce public module visibility across Rust crates, preferring private
    or crate-private modules with explicit crate-root public exports
    - update external call sites and tests to use the intended public crate
    APIs instead of reaching through module trees
    - add the module visibility guideline to AGENTS.md
    
    ## Validation
    - `cargo check --workspace --all-targets --message-format=short` passed
    before the final fix/format pass
    - `just fix` completed successfully
    - `just fmt` completed successfully
    - `git diff --check` passed
  • ci: verify codex-rs Cargo manifests inherit workspace settings (#16353)
    ## Why
    
    Bazel clippy now catches lints that `cargo clippy` can still miss when a
    crate under `codex-rs` forgets to opt into workspace lints. The concrete
    example here was `codex-rs/app-server/tests/common/Cargo.toml`: Bazel
    flagged a clippy violation in `models_cache.rs`, but Cargo did not
    because that crate inherited workspace package metadata without
    declaring `[lints] workspace = true`.
    
    We already mirror the workspace clippy deny list into Bazel after
    [#15955](https://github.com/openai/codex/pull/15955), so we also need a
    repo-side check that keeps every `codex-rs` manifest opted into the same
    workspace settings.
    
    ## What changed
    
    - add `.github/scripts/verify_cargo_workspace_manifests.py`, which
    parses every `codex-rs/**/Cargo.toml` with `tomllib` and verifies:
      - `version.workspace = true`
      - `edition.workspace = true`
      - `license.workspace = true`
      - `[lints] workspace = true`
    - top-level crate names follow the `codex-*` / `codex-utils-*`
    conventions, with explicit exceptions for `windows-sandbox-rs` and
    `utils/path-utils`
    - run that script in `.github/workflows/ci.yml`
    - update the current outlier manifests so the check is enforceable
    immediately
    - fix the newly exposed clippy violations in the affected crates
    (`app-server/tests/common`, `file-search`, `feedback`,
    `shell-escalation`, and `debug-client`)
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16353).
    * #16351
    * __->__ #16353
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • ci: add Bazel clippy workflow for codex-rs (#15955)
    ## Why
    `bazel.yml` already builds and tests the Bazel graph, but `rust-ci.yml`
    still runs `cargo clippy` separately. This PR starts the transition to a
    Bazel-backed lint lane for `codex-rs` so we can eventually replace the
    duplicate Rust build, test, and lint work with Bazel while explicitly
    keeping the V8 Bazel path out of scope for now.
    
    To make that lane practical, the workflow also needs to look like the
    Bazel job we already trust. That means sharing the common Bazel setup
    and invocation logic instead of hand-copying it, and covering the arm64
    macOS path in addition to Linux.
    
    Landing the workflow green also required fixing the first lint findings
    that Bazel surfaced and adding the matching local entrypoint.
    
    ## What changed
    - add a reusable `build:clippy` config to `.bazelrc` and export
    `codex-rs/clippy.toml` from `codex-rs/BUILD.bazel` so Bazel can run the
    repository's existing Clippy policy
    - add `just bazel-clippy` so the local developer entrypoint matches the
    new CI lane
    - extend `.github/workflows/bazel.yml` with a dedicated Bazel clippy job
    for `codex-rs`, scoped to `//codex-rs/... -//codex-rs/v8-poc:all`
    - run that clippy job on Linux x64 and arm64 macOS
    - factor the shared Bazel workflow setup into
    `.github/actions/setup-bazel-ci/action.yml` and the shared Bazel
    invocation logic into `.github/scripts/run-bazel-ci.sh` so the clippy
    and build/test jobs stay aligned
    - fix the first Bazel-clippy findings needed to keep the lane green,
    including the cross-target `cmsghdr::cmsg_len` normalization in
    `codex-rs/shell-escalation/src/unix/socket.rs` and the no-`voice-input`
    dead-code warnings in `codex-rs/tui` and `codex-rs/tui_app_server`
    
    ## Verification
    - `just bazel-clippy`
    - `RUNNER_OS=macOS ./.github/scripts/run-bazel-ci.sh -- build
    --config=clippy --build_metadata=COMMIT_SHA=local-check
    --build_metadata=TAG_job=clippy -- //codex-rs/...
    -//codex-rs/v8-poc:all`
    - `bazel build --config=clippy
    //codex-rs/shell-escalation:shell-escalation`
    - `CARGO_TARGET_DIR=/tmp/codex4-shell-escalation-test cargo test -p
    codex-shell-escalation`
    - `ruby -e 'require "yaml";
    YAML.load_file(".github/workflows/bazel.yml");
    YAML.load_file(".github/actions/setup-bazel-ci/action.yml")'`
    
    ## Notes
    - `CARGO_TARGET_DIR=/tmp/codex4-tui-app-server-test cargo test -p
    codex-tui-app-server` still hits existing guardian-approvals test and
    snapshot failures unrelated to this PR's Bazel-clippy changes.
    
    Related: #15954
  • fix: keep zsh-fork release assets after removing shell-tool-mcp (#15644)
    ## Why
    
    `shell-tool-mcp` and the Bash fork are no longer needed, but the patched
    zsh fork is still relevant for shell escalation and for the
    DotSlash-backed zsh-fork integration tests.
    
    Deleting the old `shell-tool-mcp` workflow also deleted the only
    pipeline that rebuilt those patched zsh binaries. This keeps the package
    removal, while preserving a small release path that can be reused
    whenever `codex-rs/shell-escalation/patches/zsh-exec-wrapper.patch`
    changes.
    
    ## What changed
    
    - removed the `shell-tool-mcp` workspace package, its npm
    packaging/release jobs, the Bash test fixture, and the remaining
    Bash-specific compatibility wiring
    - deleted the old `.github/workflows/shell-tool-mcp.yml` and
    `.github/workflows/shell-tool-mcp-ci.yml` workflows now that their
    responsibilities have been replaced or removed
    - kept the zsh patch under
    `codex-rs/shell-escalation/patches/zsh-exec-wrapper.patch` and updated
    the `codex-rs/shell-escalation` docs/code to describe the zsh-based flow
    directly
    - added `.github/workflows/rust-release-zsh.yml` to build only the three
    zsh binaries that `codex-rs/app-server/tests/suite/zsh` needs today:
      - `aarch64-apple-darwin` on `macos-15`
      - `x86_64-unknown-linux-musl` on `ubuntu-24.04`
      - `aarch64-unknown-linux-musl` on `ubuntu-24.04`
    - extracted the shared zsh build/smoke-test/stage logic into
    `.github/scripts/build-zsh-release-artifact.sh`, made that helper
    directly executable, and now invoke it directly from the workflow so the
    Linux and macOS jobs only keep the OS-specific setup in YAML
    - wired those standalone `codex-zsh-*.tar.gz` assets into
    `rust-release.yml` and added `.github/dotslash-zsh-config.json` so
    releases also publish a `codex-zsh` DotSlash file
    - updated the checked-in `codex-rs/app-server/tests/suite/zsh` fixture
    comments to explain that new releases come from the standalone zsh
    assets, while the checked-in fixture remains pinned to the latest
    historical release until a newer zsh artifact is published
    - tightened a couple of follow-on cleanups in
    `codex-rs/shell-escalation`: the `ExecParams::command` comment now
    describes the shell `-c`/`-lc` string more clearly, and the README now
    points at the same `git.code.sf.net` zsh source URL that the workflow
    uses
    
    ## Testing
    
    - `cargo test -p codex-shell-escalation`
    - `just argument-comment-lint`
    - `bash -n .github/scripts/build-zsh-release-artifact.sh`
    - attempted `cargo test -p codex-core`; unrelated existing failures
    remain, but the touched `tools::runtimes::shell::unix_escalation::*`
    coverage passed during that run
  • fix: preserve zsh-fork escalation fds across unified-exec spawn paths (#13644)
    ## Why
    
    `zsh-fork` sessions launched through unified-exec need the escalation
    socket to survive the wrapper -> server -> child handoff so later
    intercepted `exec()` calls can still reach the escalation server.
    
    The inherited-fd spawn path also needs to avoid closing Rust's internal
    exec-error pipe, and the shell-escalation handoff needs to tolerate the
    receive-side case where a transferred fd is installed into the same
    stdio slot it will be mapped onto.
    
    ## What Changed
    
    - Added `SpawnLifecycle::inherited_fds()` in
    `codex-rs/core/src/unified_exec/process.rs` and threaded inherited fds
    through `codex-rs/core/src/unified_exec/process_manager.rs` so
    unified-exec can preserve required descriptors across both PTY and
    no-stdin pipe spawn paths.
    - Updated `codex-rs/core/src/tools/runtimes/shell/zsh_fork_backend.rs`
    to expose the escalation socket fd through the spawn lifecycle.
    - Added inherited-fd-aware spawn helpers in
    `codex-rs/utils/pty/src/pty.rs` and `codex-rs/utils/pty/src/pipe.rs`,
    including Unix pre-exec fd pruning that preserves requested inherited
    fds while leaving `FD_CLOEXEC` descriptors alone. The pruning helper is
    now named `close_inherited_fds_except()` to better describe that
    behavior.
    - Updated `codex-rs/shell-escalation/src/unix/escalate_client.rs` to
    duplicate local stdio before transfer and send destination stdio numbers
    in `SuperExecMessage`, so the wrapper keeps using its own
    `stdin`/`stdout`/`stderr` until the escalated child takes over.
    - Updated `codex-rs/shell-escalation/src/unix/escalate_server.rs` so the
    server accepts the overlap case where a received fd reuses the same
    stdio descriptor number that the child setup will target with `dup2`.
    - Added comments around the PTY stdio wiring and the overlap regression
    helper to make the fd handoff and controlling-terminal setup easier to
    follow.
    
    ## Verification
    
    - `cargo test -p codex-utils-pty`
    - covers preserved-fd PTY spawn behavior, PTY resize, Python REPL
    continuity, exec-failure reporting, and the no-stdin pipe path
    - `cargo test -p codex-shell-escalation`
    - covers duplicated-fd transfer on the client side and verifies the
    overlap case by passing a pipe-backed stdin payload through the
    server-side `dup2` path
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13644).
    * #14624
    * __->__ #13644
  • start of hooks engine (#13276)
    (Experimental)
    
    This PR adds a first MVP for hooks, with SessionStart and Stop
    
    The core design is:
    
    - hooks live in a dedicated engine under codex-rs/hooks
    - each hook type has its own event-specific file
    - hook execution is synchronous and blocks normal turn progression while
    running
    - matching hooks run in parallel, then their results are aggregated into
    a normalized HookRunSummary
    
    On the AppServer side, hooks are exposed as operational metadata rather
    than transcript-native items:
    
    - new live notifications: hook/started, hook/completed
    - persisted/replayed hook results live on Turn.hookRuns
    - we intentionally did not add hook-specific ThreadItem variants
    
    Hooks messages are not persisted, they remain ephemeral. The context
    changes they add are (they get appended to the user's prompt)
  • fix(ci): restore guardian coverage and bazel unit tests (#13912)
    ## Summary
    - restore the guardian review request snapshot test and its tracked
    snapshot after it was dropped from `main`
    - make Bazel Rust unit-test wrappers resolve runfiles correctly on
    manifest-only platforms like macOS and point Insta at the real workspace
    root
    - harden the shell-escalation socket-closure assertion so the musl Bazel
    test no longer depends on fd reuse behavior
    
    ## Verification
    - cargo test -p codex-core
    guardian_review_request_layout_matches_model_visible_request_snapshot
    - cargo test -p codex-shell-escalation
    - bazel test //codex-rs/exec:exec-unit-tests
    //codex-rs/shell-escalation:shell-escalation-unit-tests
    
    Supersedes #13894.
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
    Co-authored-by: viyatb-oai <viyatb@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • refactor: prepare unified exec for zsh-fork backend (#13392)
    ## Why
    
    `shell_zsh_fork` already provides stronger guarantees around which
    executables receive elevated permissions. To reuse that machinery from
    unified exec without pushing Unix-specific escalation details through
    generic runtime code, the escalation bootstrap and session lifetime
    handling need a cleaner boundary.
    
    That boundary also needs to be safe for long-lived sessions: when an
    intercepted shell session is closed or pruned, any in-flight approval
    workers and any already-approved escalated child they spawned must be
    torn down with the session, and the inherited escalation socket must not
    leak into unrelated subprocesses.
    
    ## What Changed
    
    - Extracted a reusable `EscalationSession` and
    `EscalateServer::start_session(...)` in `shell-escalation` so callers
    can get the wrapper/socket env overlay and keep the escalation server
    alive without immediately running a one-shot command.
    - Documented that `EscalationSession::env()` and
    `ShellCommandExecutor::run(...)` exchange only that env overlay, which
    callers must merge into their own base shell environment.
    - Clarified the prepared-exec helper boundary in `core` by naming the
    new helper APIs around `ExecRequest`, while keeping the legacy
    `execute_env(...)` entrypoints as thin compatibility wrappers for
    existing callers that still use the older naming.
    - Added a small post-spawn hook on the prepared execution path so the
    parent copy of the inheritable escalation socket is closed immediately
    after both the existing one-shot shell-command spawn and the
    unified-exec spawn.
    - Made session teardown explicit with session-scoped cancellation:
    dropping an `EscalationSession` or canceling its parent request now
    stops intercept workers, and the server-spawned escalated child uses
    `kill_on_drop(true)` so teardown cannot orphan an already-approved
    child.
    - Added `UnifiedExecBackendConfig` plumbing through `ToolsConfig`, a
    `shell::zsh_fork_backend` facade, and an opaque unified-exec
    spawn-lifecycle hook so unified exec can prepare a wrapped `zsh -c/-lc`
    request without storing `EscalationSession` directly in generic
    process/runtime code.
    - Kept the existing `shell_command` zsh-fork behavior intact on top of
    the new bootstrap path. Tool selection is unchanged in this PR: when
    `shell_zsh_fork` is enabled, `ShellCommand` still wins over
    `exec_command`.
    
    ## Verification
    
    - `cargo test -p codex-shell-escalation`
      - includes coverage for `start_session_exposes_wrapper_env_overlay`
      - includes coverage for `exec_closes_parent_socket_after_shell_spawn`
    - includes coverage for
    `dropping_session_aborts_intercept_workers_and_kills_spawned_child`
    - `cargo test -p codex-core
    shell_zsh_fork_prefers_shell_command_over_unified_exec`
    - `cargo test -p codex-core --test all
    shell_zsh_fork_prompts_for_skill_script_execution`
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13392).
    * #13432
    * __->__ #13392
  • chore: Nest skill and protocol network permissions under network.enabled (#13427)
    ## Summary
    
    Changes the permission profile shape from a bare network boolean to a
    nested object.
    
    Before:
    
    ```yaml
    permissions:
      network: true
    ```
    
    After:
    
    ```yaml
    permissions:
      network:
        enabled: true
    ```
    
    This also updates the shared Rust and app-server protocol types so
    `PermissionProfile.network` is no longer `Option<bool>`, but
    `Option<NetworkPermissions>` with `enabled: Option<bool>`.
    
    ## What Changed
    
    - Updated `PermissionProfile` in `codex-rs/protocol/src/models.rs`:
    - `pub network: Option<bool>` -> `pub network:
    Option<NetworkPermissions>`
    - Added `NetworkPermissions` with:
      - `pub enabled: Option<bool>`
    - Changed emptiness semantics so `network` is only considered empty when
    `enabled` is `None`
    - Updated skill metadata parsing to accept `permissions.network.enabled`
    - Updated core permission consumers to read
    `network.enabled.unwrap_or(false)` where a concrete boolean is needed
    - Updated app-server v2 protocol types and regenerated schema/TypeScript
    outputs
    - Updated docs to mention `additionalPermissions.network.enabled`
  • fix: use https://git.savannah.gnu.org/git/bash instead of https://github.com/bolinfest/bash (#13057)
    Historically, we cloned the Bash repo from
    https://github.com/bminor/bash, but for whatever reason, it was removed
    at some point.
    
    I had a local clone of it, so I pushed it to
    https://github.com/bolinfest/bash so that we could continue running our
    CI job. I did this in https://github.com/openai/codex/pull/9563, and as
    you can see, I did not tamper with the commit hash we used as the basis
    of this build.
    
    Using a personal fork is not great, so this PR changes the CI job to use
    what appears to be considered the source of truth for Bash, which is
    https://git.savannah.gnu.org/git/bash.git.
    
    Though in testing this out, it appears this Git server does not support
    the combination of `git clone --depth 1
    https://git.savannah.gnu.org/git/bash` and `git fetch --depth 1 origin
    a8a1c2fac029404d3f42cd39f5a20f24b6e4fe4b`, as it fails with the
    following error:
    
    ```
    error: Server does not allow request for unadvertised object a8a1c2fac029404d3f42cd39f5a20f24b6e4fe4b
    ```
    
    so unfortunately this means that we have to do a full clone instead of a
    shallow clone in our CI jobs, which will be a bit slower.
    
    Also updated `codex-rs/shell-escalation/README.md` to reflect this
    change.
  • feat: include sandbox config with escalation request (#12839)
    ## Why
    
    Before this change, an escalation approval could say that a command
    should be rerun, but it could not carry the sandbox configuration that
    should still apply when the escalated command is actually spawned.
    
    That left an unsafe gap in the `zsh-fork` skill path: skill scripts
    under `scripts/` that did not declare permissions could be escalated
    without a sandbox, and scripts that did declare permissions could lose
    their bounded sandbox on rerun or cached session approval.
    
    This PR extends the escalation protocol so approvals can optionally
    carry sandbox configuration all the way through execution. That lets the
    shell runtime preserve the intended sandbox instead of silently widening
    access.
    
    We likely want a single permissions type for this codepath eventually,
    probably centered on `Permissions`. For now, the protocol needs to
    represent both the existing `PermissionProfile` form and the fuller
    `Permissions` form, so this introduces a temporary disjoint union,
    `EscalationPermissions`, to carry either one.
    
    Further, this means that today, a skill either:
    
    - does not declare any permissions, in which case it is run using the
    default sandbox for the turn
    - specifies permissions, in which case the skill is run using that exact
    sandbox, which might be more restrictive than the default sandbox for
    the turn
    
    We will likely change the skill's permissions to be additive to the
    existing permissions for the turn.
    
    ## What Changed
    
    - Added `EscalationPermissions` to `codex-protocol` so escalation
    requests can carry either a `PermissionProfile` or a full `Permissions`
    payload.
    - Added an explicit `EscalationExecution` mode to the shell escalation
    protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and
    `Permissions(...)` instead of overloading `None`.
    - Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution
    time, which keeps ordinary `UseDefault` commands on the turn sandbox and
    preserves turn-level macOS seatbelt profile extensions.
    - Updated the `zsh-fork` skill path so a skill with no declared
    permissions inherits the conversation's effective sandbox instead of
    escalating unsandboxed.
    - Updated the `zsh-fork` skill path so a skill with declared permissions
    reruns with exactly those permissions, including when a cached session
    approval is reused.
    
    ## Testing
    
    - Added unit coverage in
    `core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit
    `UseDefault` / `RequireEscalated` / `WithAdditionalPermissions`
    execution mapping.
    - Added unit coverage in
    `core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt
    extension preservation in both the `TurnDefault` and
    explicit-permissions rerun paths.
    - Added integration coverage in `core/tests/suite/skill_approval.rs` for
    permissionless skills inheriting the turn sandbox and explicit skill
    permissions remaining bounded across cached approval reuse.
  • fix: keep shell escalation exec paths absolute (#12750)
    ## Why
    
    In the `shell_zsh_fork` flow, `codex-shell-escalation` receives the
    executable path exactly as the shell passed it to `execve()`. That path
    is not guaranteed to be absolute.
    
    For commands such as `./scripts/hello-mbolin.sh`, if the shell was
    launched with a different `workdir`, resolving the intercepted `file`
    against the server process working directory makes policy checks and
    skill matching inspect the wrong executable. This change pushes that fix
    a step further by keeping the normalized path typed as `AbsolutePathBuf`
    throughout the rest of the escalation pipeline.
    
    That makes the absolute-path invariant explicit, so later code cannot
    accidentally treat the resolved executable path as an arbitrary
    `PathBuf`.
    
    ## What Changed
    
    - record the wrapper process working directory as an `AbsolutePathBuf`
    - update the escalation protocol so `workdir` is explicitly absolute
    while `file` remains the raw intercepted exec path
    - resolve a relative intercepted `file` against the request `workdir` as
    soon as the server receives the request
    - thread `AbsolutePathBuf` through `EscalationPolicy`,
    `CoreShellActionProvider`, and command normalization helpers so the
    resolved executable path stays type-checked as absolute
    - replace the `path-absolutize` dependency in `codex-shell-escalation`
    with `codex-utils-absolute-path`
    - add a regression test that covers a relative `file` with a distinct
    `workdir`
    
    ## Verification
    
    - `cargo test -p codex-shell-escalation`
  • fix: make EscalateServer public and remove shell escalation wrappers (#12724)
    ## Why
    
    `codex-shell-escalation` exposed a `codex-core`-specific adapter layer
    (`ShellActionProvider`, `ShellPolicyFactory`, and `run_escalate_server`)
    that existed only to bridge `codex-core` to `EscalateServer`. That
    indirection increased API surface and obscured crate ownership without
    adding behavior.
    
    This change moves orchestration into `codex-core` so boundaries are
    clearer: `codex-shell-escalation` provides reusable escalation
    primitives, and `codex-core` provides shell-tool policy decisions.
    
    Admittedly, @pakrym rightfully requested this sort of cleanup as part of
    https://github.com/openai/codex/pull/12649, though this avoids moving
    all of `codex-shell-escalation` into `codex-core`.
    
    ## What changed
    
    - Made `EscalateServer` public and exported it from `shell-escalation`.
    - Removed the adapter layer from `shell-escalation`:
      - deleted `shell-escalation/src/unix/core_shell_escalation.rs`
    - removed exports for `ShellActionProvider`, `ShellPolicyFactory`,
    `EscalationPolicyFactory`, and `run_escalate_server`
    - Updated `core/src/tools/runtimes/shell/unix_escalation.rs` to:
      - create `Stopwatch`/cancellation in `codex-core`
      - instantiate `EscalateServer` directly
      - implement `EscalationPolicy` directly on `CoreShellActionProvider`
    
    Net effect: same escalation flow with fewer wrappers and a smaller
    public API.
    
    ## Verification
    
    - Manually reviewed the old vs. new escalation call flow to confirm
    timeout/cancellation behavior and approval policy decisions are
    preserved while removing wrapper types.
  • feat: run zsh fork shell tool via shell-escalation (#12649)
    ## Why
    
    This PR switches the `shell_command` zsh-fork path over to
    `codex-shell-escalation` so the new shell tool can use the shared
    exec-wrapper/escalation protocol instead of the `zsh_exec_bridge`
    implementation that was introduced in
    https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on
    UNIX domain sockets, which is not as tamper-proof as the FD-based
    approach in `codex-shell-escalation`.
    
    ## What Changed
    
    - Added a Unix zsh-fork runtime adapter in `core`
    (`core/src/tools/runtimes/shell/unix_escalation.rs`) that:
    - runs zsh-fork commands through
    `codex_shell_escalation::run_escalate_server`
      - bridges exec-policy / approval decisions into `ShellActionProvider`
    - executes escalated commands via a `ShellCommandExecutor` that calls
    `process_exec_tool_call`
    - Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to
    select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving
    the generic `shell` tool path unchanged.
    - Removed the `zsh_exec_bridge`-based session service and deleted
    `core/src/zsh_exec_bridge/mod.rs`.
    - Moved exec-wrapper entrypoint dispatch to `arg0` by handling the
    `codex-execve-wrapper` arg0 alias there, and removed the old
    `codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and
    `app-server` mains.
    - Added the needed `codex-shell-escalation` dependencies for `core` and
    `arg0`.
    
    ## Tests
    
    - `cargo test -p codex-core
    shell_zsh_fork_prefers_shell_command_over_unified_exec`
    - `cargo test -p codex-app-server turn_start_shell_zsh_fork --
    --nocapture`
    - verifies zsh-fork command execution and approval flows through the new
    backend
    - includes subcommand approve/decline coverage using the shared zsh
    DotSlash fixture in `app-server/tests/suite/zsh`
    - To test manually, I added the following to `~/.codex/config.toml`:
    
    ```toml
    zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh"
    
    [features]
    shell_zsh_fork = true
    ```
    
    Then I ran `just c` to run the dev build of Codex with these changes and
    sent it the message:
    
    ```
    run `echo $0`
    ```
    
    And it replied with:
    
    ```
      echo $0 printed:
    
      /Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh
    
      In this tool context, $0 reflects the script path used to invoke the shell, not just zsh.
    ```
    
    so the tool appears to be wired up correctly.
    
    ## Notes
    
    - The zsh subcommand-decline integration test now uses `rm` under a
    `WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is
    auto-allowed by the new `shell-escalation` policy path, which no longer
    produces subcommand approval prompts.
  • refactor: decouple shell-escalation from codex-core (#12638)
    ## Why
    
    After removing `exec-server`, the next step is to wire a new shell tool
    to `codex-rs/shell-escalation` directly.
    
    That is blocked while `codex-shell-escalation` depends on `codex-core`,
    because the new integration would require `codex-core` to depend on
    `codex-shell-escalation` and create a dependency cycle.
    
    This change ports the reusable pieces from the earlier prep work, but
    drops the old compatibility shim because `exec-server`/MCP support is
    already gone.
    
    ## What Changed
    
    ### Decouple `shell-escalation` from `codex-core`
    
    - Introduce a crate-local `SandboxState` in `shell-escalation`
    - Introduce a `ShellCommandExecutor` trait so callers provide process
    execution/sandbox integration
    - Update `EscalateServer::exec(...)` and `run_escalate_server(...)` to
    use the injected executor
    - Remove the direct `codex_core::exec::process_exec_tool_call(...)` call
    from `shell-escalation`
    - Remove the `codex-core` dependency from `codex-shell-escalation`
    
    ### Restore reusable policy adapter exports
    
    - Re-enable `unix::core_shell_escalation`
    - Export `ShellActionProvider` and `ShellPolicyFactory` from
    `shell-escalation`
    - Keep the crate root API simple (no `legacy_api` compatibility layer)
    
    ### Port socket fixes from the earlier prep commit
    
    - Use `socket2::Socket::pair_raw(...)` for AF_UNIX socketpairs and
    restore `CLOEXEC` explicitly on both endpoints
    - Keep `CLOEXEC` cleared only on the single datagram client FD that is
    intentionally passed across `exec`
    - Clean up `tokio::AsyncFd::try_io(...)` error handling in the socket
    helpers
    
    ## Verification
    
    - `cargo shear`
    - `cargo clippy -p codex-shell-escalation --tests`
    - `cargo test -p codex-shell-escalation`
  • refactor: delete exec-server and move execve wrapper into shell-escalation (#12632)
    ## Why
    
    We already plan to remove the shell-tool MCP path, and doing that
    cleanup first makes the follow-on `shell-escalation` work much simpler.
    
    This change removes the last remaining reason to keep
    `codex-rs/exec-server` around by moving the `codex-execve-wrapper`
    binary and shared shell test fixtures to the crates/tests that now own
    that functionality.
    
    ## What Changed
    
    ### Delete `codex-rs/exec-server`
    
    - Remove the `exec-server` crate, including the MCP server binary,
    MCP-specific modules, and its test support/test suite
    - Remove `exec-server` from the `codex-rs` workspace and update
    `Cargo.lock`
    
    ### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
    
    - Move the wrapper implementation into `shell-escalation`
    (`src/unix/execve_wrapper.rs`)
    - Add the `codex-execve-wrapper` binary entrypoint under
    `shell-escalation/src/bin/`
    - Update `shell-escalation` exports/module layout so the wrapper
    entrypoint is hosted there
    - Move the wrapper README content from `exec-server` to
    `shell-escalation/README.md`
    
    ### Move shared shell test fixtures to `app-server`
    
    - Move the DotSlash `bash`/`zsh` test fixtures from
    `exec-server/tests/suite/` to `app-server/tests/suite/`
    - Update `app-server` zsh-fork tests to reference the new fixture paths
    
    ### Keep `shell-tool-mcp` as a shell-assets package
    
    - Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
    artifact contains only patched Bash/Zsh payloads (no Rust binaries)
    - Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
    and docs to reflect the shell-assets-only package shape
    - `shell-tool-mcp-ci.yml` does not need changes because it is already
    JS-only
    
    ## Verification
    
    - `cargo shear`
    - `cargo clippy -p codex-shell-escalation --tests`
    - `just clippy`
  • refactor: normalize unix module layout for exec-server and shell-escalation (#12556)
    ## Why
    Shell execution refactoring in `exec-server` had become split between
    duplicated code paths, which blocked a clean introduction of the new
    reusable shell escalation flow. This commit creates a dedicated
    foundation crate so later shell tooling changes can share one
    implementation.
    
    ## What changed
    - Added the `codex-shell-escalation` crate and moved the core escalation
    pieces (`mcp` protocol/socket/session flow, policy glue) that were
    previously in `exec-server` into it.
    - Normalized `exec-server` Unix structure under a dedicated `unix`
    module layout and kept non-Unix builds narrow.
    - Wired crate/build metadata so `shell-escalation` is a first-class
    workspace dependency for follow-on integration work.
    
    ## Verification
    - Built and linted the stack at this commit point with `just clippy`.
    
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12556).
    * #12584
    * #12583
    * __->__ #12556