Commit Graph

118 Commits

  • [codex][mcp] Add resource uri meta to tool call item. (#17831)
    - [x] Add resource uri meta to tool call item so that the app-server
    client can start prefetching resources immediately without loading mcp
    server status.
  • sandbox: remove dead seatbelt helper and update tests (#17859)
    ## Why
    
    `spawn_command_under_seatbelt()` in `codex-rs/core/src/seatbelt.rs` had
    fallen out of production use and was only referenced by test-only
    wrappers. That left us with sandbox tests that could stay green even if
    the actual seatbelt exec path regressed, because production shell
    execution now flows through `SandboxManager::transform()` and
    `ExecRequest::from_sandbox_exec_request()` instead of that helper.
    
    Removing the dead helper also exposed one downstream `codex-exec`
    integration test that still imported it, which broke `just clippy`.
    
    ## What Changed
    
    - Removed `codex-rs/core/src/seatbelt.rs` and stopped exporting
    `codex_core::seatbelt`.
    - Removed the redundant `codex-rs/core/tests/suite/seatbelt.rs` coverage
    that only exercised the dead helper.
    - Kept the `openpty` regression check, but moved it into
    `codex-rs/core/tests/suite/exec.rs` so it now runs through
    `process_exec_tool_call()`.
    - Fixed the seatbelt denial test in `codex-rs/core/tests/suite/exec.rs`
    to use `/usr/bin/touch`, so it actually exercises the sandbox instead of
    a nonexistent path.
    - Updated `codex-rs/exec/tests/suite/sandbox.rs` on macOS to build the
    sandboxed command through `build_exec_request()` and spawn the
    transformed command, instead of importing the removed helper.
    - Left the lower-level seatbelt policy coverage in
    `codex-rs/sandboxing/src/seatbelt_tests.rs`, where the policy generator
    is still covered directly.
    
    ## Verification
    
    - `cargo test -p codex-core suite::exec::`
    - `cargo test -p codex-exec`
    - `cargo clippy -p codex-exec --tests -- -D warnings`
  • Spread AbsolutePathBuf (#17792)
    Mechanical change to promote absolute paths through code.
  • [codex] reduce module visibility (#16978)
    ## Summary
    - reduce public module visibility across Rust crates, preferring private
    or crate-private modules with explicit crate-root public exports
    - update external call sites and tests to use the intended public crate
    APIs instead of reaching through module trees
    - add the module visibility guideline to AGENTS.md
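
    A minimal sketch of the pattern this summary describes, using
    hypothetical module and item names (`paths`, `normalize`, `is_root` are
    illustrative, not taken from the repo): the module stays private and the
    crate root re-exports only the intended public API.

    ```rust
    // Hypothetical crate root. All names here are illustrative only.

    // Private module: other crates cannot name `paths::...` directly.
    mod paths {
        /// Intended public API; re-exported from the crate root below.
        pub fn normalize(input: &str) -> String {
            input.trim_end_matches('/').to_string()
        }

        /// Crate-private helper; unreachable from other crates.
        pub(crate) fn is_root(input: &str) -> bool {
            input == "/"
        }
    }

    // Explicit crate-root export: external callers import this item
    // from the crate root instead of reaching through the module tree.
    pub use paths::normalize;

    /// Code inside the crate can still use the crate-private helper.
    pub fn describe(input: &str) -> &'static str {
        if paths::is_root(input) { "root" } else { "non-root" }
    }
    ```

    With this layout, moving or renaming `paths` later does not break
    external call sites, because they only ever see the crate-root export.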
    
    ## Validation
    - `cargo check --workspace --all-targets --message-format=short` passed
    before the final fix/format pass
    - `just fix` completed successfully
    - `just fmt` completed successfully
    - `git diff --check` passed
  • [codex-analytics] add protocol-native turn timestamps (#16638)
  • Remove OPENAI_BASE_URL config fallback (#16720)
    The `OPENAI_BASE_URL` environment variable has been a significant
    support issue, so we decided to deprecate it in favor of an
    `openai_base_url` config key. We've had the deprecation warning in place
    for about a month, so users have had time to migrate to the new
    mechanism. This PR removes support for `OPENAI_BASE_URL` entirely.
  • core: remove cross-crate re-exports from lib.rs (#16512)
    ## Why
    
    `codex-core` was re-exporting APIs owned by sibling `codex-*` crates,
    which made downstream crates depend on `codex-core` as a proxy module
    instead of the actual owner crate.
    
    Removing those forwards makes crate boundaries explicit and lets leaf
    crates drop unnecessary `codex-core` dependencies. In this PR, the
    following manifests now depend on `codex-login` instead of
    `codex-core`:
    
    ```
    codex-rs/backend-client/Cargo.toml
    codex-rs/mcp-server/tests/common/Cargo.toml
    ```
    
    ## What
    
    - Remove `codex-rs/core/src/lib.rs` re-exports for symbols owned by
    `codex-login`, `codex-mcp`, `codex-rollout`, `codex-analytics`,
    `codex-protocol`, `codex-shell-command`, `codex-sandboxing`,
    `codex-tools`, and `codex-utils-path`.
    - Delete the `default_client` forwarding shim in `codex-rs/core`.
    - Update in-crate and downstream callsites to import directly from the
    owning `codex-*` crate.
    - Add direct Cargo dependencies where callsites now target the owner
    crate, and remove `codex-core` from `codex-rs/backend-client`.
  • fix: fix comment linter lint violations in Linux-only code (#16118)
    https://github.com/openai/codex/pull/16071 took care of this for
    Windows, so this takes care of things for Linux.
    
    We don't touch the CI jobs in this PR because
    https://github.com/openai/codex/pull/16106 is going to be the real fix
    there (including a major speedup!).
  • Support Codex CLI stdin piping for codex exec (#15917)
    # Summary
    
    Claude Code supports a useful prompt-plus-stdin workflow:
    
    ```bash
    echo "complex input..." | claude -p "summarize concisely"
    ```
    
    Codex previously did not support the equivalent `codex exec` form. While
    `codex exec` could read the prompt from stdin, it could not combine
    piped input with an explicit prompt argument.
    
    This change adds that missing workflow:
    
    ```bash
    echo "complex input..." | codex exec "summarize concisely"
    ```
    
    With this change, when `codex exec` receives both a positional prompt
    and piped stdin, the prompt remains the instruction and stdin is passed
    along as structured `<stdin>...</stdin>` context.
    
    Example:
    
    ```bash
    curl https://jsonplaceholder.typicode.com/comments \
      | ./target/debug/codex exec --skip-git-repo-check "format the top 20 items into a markdown table" \
      > table.md
    ```
    
    This PR also adds regression coverage for:
    - prompt argument + piped stdin
    - legacy stdin-as-prompt behavior
    - `codex exec -` forced-stdin behavior
    - empty-stdin error cases
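
    The cases above can be sketched as a single resolution function. This is
    an illustration only: `resolve_prompt` is a hypothetical name, the exact
    layout of the `<stdin>` wrapper is assumed, and treating "prompt plus
    empty stdin" as "prompt only" is also an assumption.

    ```rust
    /// Decide what to send as the prompt, given the positional argument and
    /// whatever was piped on stdin (`None` when nothing was piped).
    /// Hypothetical helper; not the actual `codex exec` implementation.
    fn resolve_prompt(arg: Option<&str>, piped: Option<&str>) -> Result<String, String> {
        match (arg, piped) {
            // No argument, or the forced-stdin form `codex exec -`:
            // stdin is the prompt, and empty stdin is an error.
            (Some("-"), stdin) | (None, stdin) => match stdin.map(str::trim) {
                Some(text) if !text.is_empty() => Ok(text.to_string()),
                _ => Err("no prompt provided on stdin".to_string()),
            },
            // Prompt argument plus piped input: the prompt stays the
            // instruction and stdin is attached as structured context.
            (Some(prompt), Some(stdin)) if !stdin.trim().is_empty() => {
                Ok(format!("{prompt}\n\n<stdin>\n{stdin}\n</stdin>"))
            }
            // Prompt argument only (assumed: empty stdin is ignored).
            (Some(prompt), _) => Ok(prompt.to_string()),
        }
    }
    ```

    For example, `resolve_prompt(Some("summarize concisely"), Some("data"))`
    yields the prompt followed by a `<stdin>` block containing `data`.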
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
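
    The wrappers themselves are shell scripts; the defaulting and forwarding
    rules described above are sketched here in Rust for illustration. The
    function name and the list of flags that count as "narrowing the target
    set" are assumptions, not the wrappers' actual logic.

    ```rust
    /// Flags assumed to mean "the caller chose the targets explicitly".
    const TARGET_FLAGS: [&str; 6] =
        ["--all-targets", "--lib", "--bins", "--tests", "--examples", "--benches"];

    /// Build the final argument list from the wrapper's own arguments.
    /// Everything after the first `--` is forwarded to cargo unchanged.
    fn build_lint_args(args: &[&str]) -> Vec<String> {
        // Split at the first `--` so the separator is emitted exactly once.
        let (wrapper, forwarded) = match args.iter().position(|a| *a == "--") {
            Some(i) => (&args[..i], &args[i + 1..]),
            None => (args, &args[args.len()..]),
        };
        let is_fix = wrapper.contains(&"--fix");
        let narrowed = forwarded.iter().any(|a| TARGET_FLAGS.contains(a));

        let mut tail: Vec<String> = Vec::new();
        // Non-`--fix` runs default to all targets unless narrowed.
        if !is_fix && !narrowed {
            tail.push("--all-targets".to_string());
        }
        tail.extend(forwarded.iter().map(|s| s.to_string()));

        let mut out: Vec<String> = wrapper.iter().map(|s| s.to_string()).collect();
        if !tail.is_empty() {
            out.push("--".to_string());
            out.extend(tail);
        }
        out
    }
    ```

    Under these assumptions, a plain invocation gains `--all-targets`, while
    `-- --lib --bins` is forwarded as-is behind a single separator.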
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Protect first-time project .codex creation across Linux and macOS sandboxes (#15067)
    ## Problem
    
    Codex already treated an existing top-level project `./.codex` directory
    as protected, but there was a gap on first creation.
    
    If `./.codex` did not exist yet, a turn could create files under it,
    such as `./.codex/config.toml`, without going through the same approval
    path as later modifications. That meant the initial write could bypass
    the intended protection for project-local Codex state.
    
    ## What this changes
    
    This PR closes that first-creation gap in the Unix enforcement layers:
    
    - `codex-protocol`
      - treat the top-level project `./.codex` path as a protected carveout
      even when it does not exist yet
      - avoid injecting the default carveout when the user already has an
      explicit rule for that exact path
    - macOS Seatbelt
      - deny writes to both the exact protected path and anything beneath it,
      so creating `./.codex` itself is blocked in addition to writes inside it
    - Linux bubblewrap
      - preserve the same protected-path behavior for first-time creation
      under `./.codex`
    - tests
      - add protocol regressions for missing `./.codex` and explicit-rule
      collisions
      - add Unix sandbox coverage for blocking first-time `./.codex` creation
      - tighten Seatbelt policy assertions around excluded subpaths
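
    The core idea, protecting the path lexically so that it is covered even
    before it exists, can be sketched as follows. `is_protected_write` is a
    hypothetical helper, not the sandbox's actual policy code.

    ```rust
    use std::path::Path;

    /// Returns true when `target` is the project's `.codex` directory itself
    /// or anything beneath it. The check is purely lexical, so it applies
    /// even when `.codex` has not been created yet.
    fn is_protected_write(project_root: &Path, target: &Path) -> bool {
        let protected = project_root.join(".codex");
        // `Path::starts_with` compares whole components, so `.codexrc`
        // does not match `.codex`.
        target.starts_with(&protected)
    }
    ```

    Because no filesystem lookup is involved, creating `./.codex` and writing
    `./.codex/config.toml` are treated the same way on the first attempt as
    on later ones.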
    
    ## Scope
    
    This change is intentionally scoped to protecting the top-level project
    `.codex` subtree from agent writes.
    
    It does not make `.codex` unreadable, and it does not change the product
    behavior around loading project skills from `.codex` when project config
    is untrusted.
    
    ## Why this shape
    
    The fix is pointed rather than broad:
    - it preserves the current model of “project `.codex` is protected from
    writes”
    - it closes the security-relevant first-write hole
    - it avoids folding a larger permissions-model redesign into this PR
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-sandboxing seatbelt`
    - `cargo test -p codex-exec --test all
    sandbox_blocks_first_time_dot_codex_creation -- --nocapture`
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • fix: fix old system bubblewrap compatibility without falling back to vendored bwrap (#15693)
    Fixes #15283.
    
    ## Summary
    Older system bubblewrap builds reject `--argv0`, which makes our Linux
    sandbox fail before the helper can re-exec. This PR keeps using system
    `/usr/bin/bwrap` whenever it exists and only falls back to vendored
    bwrap when the system binary is missing. That matters on stricter
    AppArmor hosts, where the distro bwrap package also provides the policy
    setup needed for user namespaces.
    
    For old system bwrap, we avoid `--argv0` instead of switching binaries:
    - pass the sandbox helper a full-path `argv0`,
    - keep the existing `current_exe() + --argv0` path when the selected
    launcher supports it,
    - otherwise omit `--argv0` and re-exec through the helper's own
    `argv[0]` path, whose basename still dispatches as
    `codex-linux-sandbox`.
    
    Also updates the launcher/warning tests and docs so they match the new
    behavior: present-but-old system bwrap uses the compatibility path, and
    only absent system bwrap falls back to vendored.
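
    A minimal sketch of the selection rule and the basename dispatch
    described above. The type and function names are hypothetical, and
    probing whether the system binary accepts `--argv0` is left out.

    ```rust
    use std::path::Path;

    /// Which bubblewrap binary to launch, and whether to pass `--argv0`.
    #[derive(Debug, PartialEq)]
    enum Launcher {
        /// System `/usr/bin/bwrap`; `pass_argv0` is false for old builds.
        SystemBwrap { pass_argv0: bool },
        /// Vendored bwrap, used only when the system binary is missing.
        VendoredBwrap,
    }

    /// Prefer the system binary whenever it exists, even if it is old.
    fn select_launcher(system_bwrap_exists: bool, system_supports_argv0: bool) -> Launcher {
        if system_bwrap_exists {
            Launcher::SystemBwrap { pass_argv0: system_supports_argv0 }
        } else {
            Launcher::VendoredBwrap
        }
    }

    /// Without `--argv0`, the helper is re-executed through a full path
    /// whose basename still selects the sandbox entry point.
    fn dispatches_as_sandbox_helper(argv0: &str) -> bool {
        Path::new(argv0).file_name().and_then(|name| name.to_str())
            == Some("codex-linux-sandbox")
    }
    ```

    The point of the shape is that "old" and "absent" are distinct states:
    only the latter changes which binary runs.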
    
    ### Validation
    
    1. Install Ubuntu 20.04 in a VM
    2. Compile Codex and run it without bubblewrap installed; a warning
    about falling back to the vendored bwrap appears
    3. Install bwrap and verify that the version is 0.4.0, which lacks
    `--argv0` support
    4. Run Codex and use the `apply_patch` tool without errors
    
    <img width="802" height="631" alt="Screenshot 2026-03-25 at 11 48 36 PM"
    src="https://github.com/user-attachments/assets/77248a29-aa38-4d7c-9833-496ec6a458b8"
    />
    <img width="807" height="634" alt="Screenshot 2026-03-25 at 11 47 32 PM"
    src="https://github.com/user-attachments/assets/5af8b850-a466-489b-95a6-455b76b5050f"
    />
    <img width="812" height="635" alt="Screenshot 2026-03-25 at 11 45 45 PM"
    src="https://github.com/user-attachments/assets/438074f0-8435-4274-a667-332efdd5cb57"
    />
    <img width="801" height="623" alt="Screenshot 2026-03-25 at 11 43 56 PM"
    src="https://github.com/user-attachments/assets/0dc8d3f5-e8cf-4218-b4b4-a4f7d9bf02e3"
    />
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Finish moving codex exec to app-server (#15424)
    This PR completes the conversion of non-interactive `codex exec` to use
    app server rather than directly using core events and methods.
    
    ### Summary
    - move `codex-exec` off exec-owned `AuthManager` and `ThreadManager`
    state
    - route exec bootstrap, resume, and auth refresh through existing
    app-server paths
    - replace legacy `codex/event/*` decoding in exec with typed app-server
    notification handling
    - update human and JSONL exec output adapters to translate existing
    app-server notifications only
    - clean up "app server client" layer by eliminating support for legacy
    notifications; this is no longer needed
    - remove exposure of `authManager` and `threadManager` from "app server
    client" layer
    
    ### Testing
    - `exec` has pretty extensive unit and integration tests already, and
    these all pass
    - In addition, I asked Codex to put together a comprehensive manual set
    of tests to cover all of the `codex exec` functionality (including
    command-line options), and it successfully generated and ran these tests
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
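
    The config-load rules above can be sketched like this. The struct and
    function are hypothetical stand-ins for the real config types; only the
    three key names come from the description.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Reviewer {
        User,
        GuardianSubagent,
    }

    /// Hypothetical view of one config scope as loaded from `config.toml`.
    #[derive(Debug, Default)]
    #[allow(dead_code)]
    struct RawConfig {
        smart_approvals: Option<bool>,        // rollout/UI gate only
        guardian_approval: Option<bool>,      // deprecated alias
        approvals_reviewer: Option<Reviewer>, // runtime control
    }

    /// Resolve the reviewer for a scope. `smart_approvals` is deliberately
    /// ignored: the gate alone never selects a reviewer.
    fn effective_reviewer(cfg: &RawConfig) -> Reviewer {
        match (cfg.approvals_reviewer, cfg.guardian_approval) {
            // An explicitly configured reviewer always wins.
            (Some(reviewer), _) => reviewer,
            // Alias migration backfills the guardian reviewer.
            (None, Some(true)) => Reviewer::GuardianSubagent,
            // Otherwise fall back to manual review.
            (None, _) => Reviewer::User,
        }
    }
    ```

    Under this sketch, a config containing only `smart_approvals = true`
    still resolves to `Reviewer::User`.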
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
      - pending guardian review state in the live status footer, including
      parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Include spawn agent model metadata in app-server items (#14410)
    - add model and reasoning effort to app-server collab spawn items and
    notifications
    - regenerate app-server protocol schemas for the new fields
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Show spawned agent model and effort in TUI (#14273)
    - include the requested sub-agent model and reasoning effort in the
    spawn begin event
    - render that metadata next to the spawned agent name and role in the
    TUI transcript
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server service tier plumbing (plus some cleanup) (#13334)
    Follow-up to https://github.com/openai/codex/pull/13212 that exposes
    fast tier controls to app-server.
    (The majority of this PR is generated schema JSON; the actual code is
    +69 / -35 and +24 tests.)
    
    - add service tier fields to the app-server protocol surfaces used by
    thread lifecycle, turn start, config, and session configured events
    - thread service tier through the app-server message processor and core
    thread config snapshots
    - allow runtime config overrides to carry service tier for app-server
    callers
    
    cleanup:
    - Remove the unneeded "legacy" code supporting "standard": we moved to
    None | "fast", so "standard" is no longer needed.
  • core: bundle settings diff updates into one dev/user envelope (#12417)
    ## Summary
    - bundle contextual prompt injection into at most one developer message
    plus one contextual user message in both:
      - per-turn settings updates
      - initial context insertion
    - preserve `<model_switch>` across compaction by rebuilding it through
    canonical initial-context injection, instead of relying on
    strip/reattach hacks
    - centralize contextual user fragment detection in one shared definition
    table and reuse it for parsing/compaction logic
    - keep `AGENTS.md` in its natural serialized format:
      - `# AGENTS.md instructions for {dirname}`
      - `<INSTRUCTIONS>...</INSTRUCTIONS>`
    - simplify related tests/helpers and accept the expected snapshot/layout
    updates from bundled multi-part messages
    
    ## Why
    The goal is to converge toward a simpler, more intentional prompt shape
    where contextual updates are consistently represented as one developer
    envelope plus one contextual user envelope, while keeping parsing and
    compaction behavior aligned with that representation.
    
    ## Notable details
    - the temporary `SettingsUpdateEnvelope` wrapper was removed; these
    paths now return `Vec<ResponseItem>` directly
    - local/remote compaction no longer rely on model-switch strip/restore
    helpers
    - contextual user detection is now driven by shared fragment definitions
    instead of ad hoc matcher assembly
    - AGENTS/user instructions are still the same logical context; only the
    synthetic `<user_instructions>` wrapper was replaced by the natural
    AGENTS text format
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-app-server
    codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
    -- --exact`
    - `cargo test -p codex-core
    compact::tests::collect_user_messages_filters_session_prefix_entries
    --lib -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_developer_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_user_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::resume_includes_initial_messages_and_sends_prior_items'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::review::review_input_isolated_from_parent_history' -- --exact`
    - `cargo test -p codex-exec --test all
    'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
    --exact`
    - `cargo test -p core_test_support
    context_snapshot::tests::full_text_mode_preserves_unredacted_text --
    --exact`
    
    ## Notes
    - I also ran several targeted `compact`, `compact_remote`,
    `prompt_caching`, `model_visible_layout`, and `event_mapping` tests
    while iterating on prompt-shape changes.
    - I have not claimed a clean full-workspace `cargo test` from this
    environment because local sandbox/resource conditions have previously
    produced unrelated failures in large workspace runs.
  • fix(exec) Patch resume test race condition (#12648)
    ## Summary
    The test `exec_resume_last_respects_cwd_filter_and_all_flag` makes one
    session “newest” by resuming it, but rollout `updated_at` is
    stored/sorted at second precision. On fast CI (especially Windows), the
    touch could land in the same second as initial session creation, making
    ordering nondeterministic.
    
    This change adds a short sleep before the recency-touch step so the
    resumed session is guaranteed to have a later `updated_at`, preserving
    the intended assertion without changing product behavior.
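
    The race comes down to timestamp truncation. A small sketch with
    hypothetical helper names, modelling timestamps as durations since an
    arbitrary epoch:

    ```rust
    use std::time::Duration;

    /// Model of second-precision storage: sub-second detail is dropped.
    fn stored_updated_at(ts: Duration) -> u64 {
        ts.as_secs()
    }

    /// The test's ordering is only well defined when the touch lands in a
    /// strictly later second than the creation.
    fn touch_sorts_after_creation(created: Duration, touched: Duration) -> bool {
        stored_updated_at(touched) > stored_updated_at(created)
    }
    ```

    A touch 300 ms after creation can compare equal once truncated, whereas
    any delay of at least one second always yields a later stored value.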
  • fix: codex-arg0 no longer depends on codex-core (#12434)
    ## Why
    
    `codex-rs/arg0` only needed two things from `codex-core`:
    
    - the `find_codex_home()` wrapper
    - the special argv flag used for the internal `apply_patch`
    self-invocation path
    
    That made `codex-arg0` depend on `codex-core` for a very small surface
    area. This change removes that dependency edge and moves the shared
    `apply_patch` invocation flag to a more natural boundary
    (`codex-apply-patch`) while keeping the contract explicitly documented.
    
    ## What Changed
    
    - Moved the internal `apply_patch` argv[1] flag constant out of
    `codex-core` and into `codex-apply-patch`.
    - Renamed the constant to `CODEX_CORE_APPLY_PATCH_ARG1` and documented
    that it is part of the Codex core process-invocation contract (even
    though it now lives in `codex-apply-patch`).
    - Updated `arg0`, the core apply-patch runtime, and the `codex-exec`
    apply-patch test to import the constant from `codex-apply-patch`.
    - Updated `codex-rs/arg0` to call
    `codex_utils_home_dir::find_codex_home()` directly instead of
    `codex_core::config::find_codex_home()`.
    - Removed the `codex-core` dependency from `codex-rs/arg0` and added the
    needed direct dependency on `codex-utils-home-dir`.
    - Added `codex-apply-patch` as a dev-dependency for `codex-rs/exec`
    tests (the apply-patch test now imports the moved constant directly).
    
    ## Verification
    
    - `cargo test -p codex-apply-patch`
    - `cargo test -p codex-arg0`
    - `cargo test -p codex-core --lib apply_patch`
    - `cargo test -p codex-exec
    test_standalone_exec_cli_can_use_apply_patch`
    - `cargo shear`
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
      `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
      parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • feat: cleaner TUI for sub-agents (#12327)
    <img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
    src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
    />
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_extended_history` flag to
    preserve those events when you start/resume/fork a thread (defaults to
    `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
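
    A minimal sketch of how the two modes could gate persistence. Only
    `EventPersistenceMode` and its variants come from the description;
    `EventKind` and `should_persist` are hypothetical, and the `Limited`
    subset shown is just the examples named under Motivation.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum EventPersistenceMode {
        Limited,  // existing behavior, default
        Extended, // new opt-in behavior
    }

    /// Hypothetical, simplified stand-in for the relevant `EventMsg` kinds.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum EventKind {
        UserMessage,
        Reasoning,
        AssistantMessage,
        WebSearch,
        CommandExecution,
        PatchApply,
        McpToolCall,
        ImageView,
        CollabToolOutcome,
        ContextCompaction,
        ReviewMode,
    }

    /// Decide whether the rollout recorder keeps an event of this kind.
    fn should_persist(mode: EventPersistenceMode, kind: EventKind) -> bool {
        match kind {
            // Persisted in both modes.
            EventKind::UserMessage | EventKind::Reasoning | EventKind::AssistantMessage => true,
            // Everything else is kept only when the caller opted in.
            _ => mode == EventPersistenceMode::Extended,
        }
    }
    ```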
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
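
    For illustration, a byte-bounded truncation can be sketched as below.
    This is not core's `truncate_text`, whose signature and exact behavior
    are not shown here; `truncate_to_bytes` is a hypothetical helper that
    only demonstrates capping output at a byte budget without splitting a
    UTF-8 character.

    ```rust
    /// Return the longest prefix of `text` that fits in `max_bytes` and
    /// ends on a UTF-8 character boundary.
    fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
        if text.len() <= max_bytes {
            return text;
        }
        let mut end = max_bytes;
        // Back up until the cut no longer lands inside a multi-byte char.
        // Index 0 is always a boundary, so this loop terminates.
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }
    ```

    Called with a budget of 10,000, a 20,000-byte ASCII string comes back as
    exactly 10,000 bytes, while shorter output is returned unchanged.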
    
    We also persist `EventMsg::Error`, which we can now map back to the
    Turn's status and use to populate the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (similar for apply_patch). These
    EventMsgs were never persisted before so I made it a required field.
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
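
    The shape of the new types and the fail-closed check can be sketched
    roughly as follows. The enum shapes follow the bullets above; the
    `read_access` accessor, the `check_backend` function, its boolean
    parameter, and the `UnsupportedOperation` struct layout are assumptions
    made for illustration, and the other `WorkspaceWrite` fields (elided
    above) are omitted.

    ```rust
    use std::path::PathBuf;

    #[derive(Clone, Debug, PartialEq, Eq)]
    pub enum ReadOnlyAccess {
        Restricted {
            include_platform_defaults: bool,
            readable_roots: Vec<PathBuf>,
        },
        FullAccess,
    }

    #[derive(Clone, Debug, PartialEq, Eq)]
    pub enum SandboxPolicy {
        ReadOnly { access: ReadOnlyAccess },
        // Other fields of this variant are elided in the PR text and omitted here.
        WorkspaceWrite { read_only_access: ReadOnlyAccess },
    }

    /// Hypothetical error type standing in for the `UnsupportedOperation` error.
    #[derive(Debug, PartialEq, Eq)]
    pub struct UnsupportedOperation(pub &'static str);

    impl SandboxPolicy {
        /// Hypothetical accessor returning the read-access configuration.
        pub fn read_access(&self) -> &ReadOnlyAccess {
            match self {
                SandboxPolicy::ReadOnly { access } => access,
                SandboxPolicy::WorkspaceWrite { read_only_access } => read_only_access,
            }
        }
    }

    /// Hypothetical backend gate: refuse restricted reads instead of silently
    /// granting full read access when the backend cannot enforce them.
    pub fn check_backend(
        policy: &SandboxPolicy,
        backend_supports_restricted_reads: bool,
    ) -> Result<(), UnsupportedOperation> {
        match policy.read_access() {
            ReadOnlyAccess::FullAccess => Ok(()),
            ReadOnlyAccess::Restricted { .. } if backend_supports_restricted_reads => Ok(()),
            ReadOnlyAccess::Restricted { .. } => Err(UnsupportedOperation(
                "restricted read access is not implemented on this backend",
            )),
        }
    }
    ```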
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn id is constructed in-memory;
    2. on resuming threads, turn_id might not be unique;
    3. clients cannot easily know the boundary of a turn from rollout files.
    
    This PR does three things:
    1. persist `task_started` and `task_complete` events;
    2. persist `turn_id` in rollout turn events;
    3. generate turn_id as unique uuids instead of incrementing it in
    memory.
    
    This helps us resolve the issue of clients wanting to have unique turn
    ids for resuming a thread, and knowing the boundary of each turn in
    rollout files.
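
    A minimal sketch of how persisted start/complete events make turn
    boundaries recoverable. `RolloutEvent`, `Turn`, and `build_turns` are
    hypothetical, simplified types for illustration and do not mirror the real
    rollout schema or the `thread_history` code.

    ```rust
    /// Hypothetical, simplified rollout events.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub enum RolloutEvent {
        TaskStarted { turn_id: String },
        Item { text: String },
        TaskComplete { turn_id: String },
    }

    /// Hypothetical reconstructed turn.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct Turn {
        pub id: String,
        pub items: Vec<String>,
        pub completed: bool,
    }

    /// Group events into turns using the persisted turn ids as boundaries.
    pub fn build_turns(events: &[RolloutEvent]) -> Vec<Turn> {
        let mut turns = Vec::new();
        let mut current: Option<Turn> = None;
        for event in events {
            match event {
                RolloutEvent::TaskStarted { turn_id } => {
                    // A new start closes any turn that never saw its completion.
                    if let Some(open) = current.take() {
                        turns.push(open);
                    }
                    current = Some(Turn {
                        id: turn_id.clone(),
                        items: Vec::new(),
                        completed: false,
                    });
                }
                RolloutEvent::Item { text } => {
                    // Items outside any turn are ignored in this sketch.
                    if let Some(open) = current.as_mut() {
                        open.items.push(text.clone());
                    }
                }
                RolloutEvent::TaskComplete { turn_id } => match current.take() {
                    Some(mut open) if open.id == *turn_id => {
                        open.completed = true;
                        turns.push(open);
                    }
                    // Mismatched or stray completion: leave state unchanged.
                    other => current = other,
                },
            }
        }
        if let Some(open) = current.take() {
            turns.push(open);
        }
        turns
    }
    ```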
    
    example debug logs
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    backward compatibility:
    if you try to resume an old session without `task_started` and
    `task_complete` events populated, the following happens:
    - If you resume and do nothing: those reconstructed historical IDs can
    differ next time you resume.
    - If you resume and send a new turn: the new turn gets a fresh UUID from
    live submission flow and is persisted, so that new turn’s ID is stable
    on later resumes.
    I think this behavior is fine, because we only care about deterministic
    turn id once a turn is triggered.
  • feat: retain NetworkProxy, when appropriate (#11207)
    As of this PR, `SessionServices` retains an
    `Option<StartedNetworkProxy>`, if appropriate.
    
    Now the `network` field on `Config` is `Option<NetworkProxySpec>`
    instead of `Option<NetworkProxy>`.
    
    Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
    create the `StartedNetworkProxy`, which is a new struct that retains the
    `NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
    implemented for `NetworkProxyHandle` to ensure the proxies are shut down
    when it is dropped.)
    
    The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
    the appropriate places.
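
    The ownership pattern can be sketched as below. The type names come from
    this PR, but every field, method body, and the shared `AtomicBool` flag
    are assumptions for illustration; the real proxy starts actual listeners
    rather than flipping a flag.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;

    /// Stand-in for the proxy that gets threaded through to consumers.
    #[derive(Clone)]
    pub struct NetworkProxy {
        running: Arc<AtomicBool>,
    }

    impl NetworkProxy {
        pub fn is_running(&self) -> bool {
            self.running.load(Ordering::SeqCst)
        }
    }

    /// Stand-in for the handle whose `Drop` shuts the proxy down.
    pub struct NetworkProxyHandle {
        running: Arc<AtomicBool>,
    }

    impl Drop for NetworkProxyHandle {
        fn drop(&mut self) {
            // Hypothetical shutdown: mark the proxy as stopped.
            self.running.store(false, Ordering::SeqCst);
        }
    }

    /// Retains both the proxy and the handle that keeps it alive.
    pub struct StartedNetworkProxy {
        proxy: NetworkProxy,
        _handle: NetworkProxyHandle,
    }

    impl StartedNetworkProxy {
        /// Hypothetical accessor returning a clone to thread through call sites.
        pub fn proxy(&self) -> NetworkProxy {
            self.proxy.clone()
        }
    }

    /// Stand-in for the config-level spec stored on `Config`.
    pub struct NetworkProxySpec;

    impl NetworkProxySpec {
        pub fn start_proxy(&self) -> StartedNetworkProxy {
            let running = Arc::new(AtomicBool::new(true));
            StartedNetworkProxy {
                proxy: NetworkProxy { running: Arc::clone(&running) },
                _handle: NetworkProxyHandle { running },
            }
        }
    }
    ```

    Keeping the handle inside `StartedNetworkProxy` ties the proxy's lifetime
    to whichever struct retains it, so dropping that owner stops the proxy.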
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
    * #11285
    * __->__ #11207
  • feat(sandbox): enforce proxy-aware network routing in sandbox (#11113)
    ## Summary
    - expand proxy env injection to cover common tool env vars
    (`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY` families +
    tool-specific variants)
    - harden macOS Seatbelt network policy generation to route through
    inferred loopback proxy endpoints and fail closed when proxy env is
    malformed
    - thread proxy-aware Linux sandbox flags and add minimal bwrap netns
    isolation hook for restricted non-proxy runs
    - add/refresh tests for proxy env wiring, Seatbelt policy generation,
    and Linux sandbox argument wiring
  • Handle required MCP startup failures across components (#10902)
    Summary
    - add a `required` flag for MCP servers everywhere config/CLI data is
    touched so mandatory helpers can be round-tripped
    - have `codex exec` and `codex app-server` thread start/resume fail fast
    when required MCPs fail to initialize
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    current Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, especially explicit writable roots with the option to make some
    directories read-only and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
      - Added internal retry without /proc when /proc mount is not permitted
      by the host/container.
  • Cleanup collaboration mode variants (#10404)
    ## Summary
    
    This PR simplifies collaboration modes to the visible set `default |
    plan`, while preserving backward compatibility for older partners that
    may still send legacy mode names.
    
    Specifically:
    - Renames the old Code behavior to **Default**.
    - Keeps **Plan** as-is.
    - Removes **Custom** mode behavior (fallbacks now resolve to Default).
    - Keeps `PairProgramming` and `Execute` internally for compatibility
    plumbing, while removing them from schema/API and UI visibility.
    - Adds legacy input aliasing so older clients can still send old mode
    names.
    
    ## What Changed
    
    1. Mode enum and compatibility
    - `ModeKind` now uses `Plan` + `Default` as active/public modes.
    - `ModeKind::Default` deserialization accepts legacy values:
      - `code`
      - `pair_programming`
      - `execute`
      - `custom`
    - `PairProgramming` and `Execute` variants remain in code but are hidden
    from protocol/schema generation.
    - `Custom` variant is removed; previous custom fallbacks now map to
    `Default`.
    
    2. Collaboration presets and templates
    - Built-in presets now return only:
      - `Plan`
      - `Default`
    - Template rename:
      - `core/templates/collaboration_mode/code.md` -> `default.md`
    - `execute.md` and `pair_programming.md` remain on disk but are not
    surfaced in visible preset lists.
    
    3. TUI updates
    - Updated user-facing naming and prompts from “Code” to “Default”.
    - Updated mode-cycle and indicator behavior to reflect only visible
    `Plan` and `Default`.
    - Updated corresponding tests and snapshots.
    
    4. request_user_input behavior
    - `request_user_input` remains allowed only in `Plan` mode.
    - Rejection messaging now consistently treats non-plan modes as
    `Default`.
    
    5. Schemas
    - Regenerated config and app-server schemas.
    - Public schema types now advertise mode values as:
      - `plan`
      - `default`
    
    ## Backward Compatibility Notes
    
    - Incoming legacy mode names (`code`, `pair_programming`, `execute`,
    `custom`) are accepted and coerced to `default`.
    - Outgoing/public schema surfaces intentionally expose only `plan |
    default`.
    - This allows tolerant ingestion of older partner payloads while
    standardizing new integrations on the reduced mode set.
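
    The tolerant-ingest, strict-output behavior can be sketched without serde
    as follows. `parse_mode` and `mode_name` are hypothetical helpers written
    for illustration; the real change implements the aliasing in `ModeKind`
    deserialization, and the hidden `PairProgramming`/`Execute` variants are
    left out of this sketch.

    ```rust
    /// Reduced stand-in for the public mode set.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ModeKind {
        Plan,
        Default,
    }

    /// Hypothetical parser: accept current names plus the legacy aliases
    /// listed above, coercing every legacy name to `Default`.
    pub fn parse_mode(raw: &str) -> Option<ModeKind> {
        match raw {
            "plan" => Some(ModeKind::Plan),
            "default" | "code" | "pair_programming" | "execute" | "custom" => {
                Some(ModeKind::Default)
            }
            _ => None,
        }
    }

    /// Hypothetical serializer: only the reduced set is ever emitted.
    pub fn mode_name(mode: ModeKind) -> &'static str {
        match mode {
            ModeKind::Plan => "plan",
            ModeKind::Default => "default",
        }
    }
    ```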
    
    ## Codex author
    `codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```diff
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • add missing fields to WebSearchAction and update app-server types (#10276)
    - add `WebSearchAction` to app-server v2 types
    - add `queries` to `WebSearchAction::Search` type
    
    Updated tests.
  • Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786)
    ## Summary
    - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
    in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
    tags from normal assistant output.
    - Persist plan items and rebuild them on resume so proposed plans show
    in thread history.
    - Wire plan items/deltas through app-server protocol v2 and render a
    dedicated proposed-plan view in the TUI, including the “Implement this
    plan?” prompt only when a plan item is present.
    
    ## Changes
    
    ### Core (`codex-rs/core`)
    - Added a generic, line-based tag parser that buffers each line until it
    can disprove a tag prefix; implements auto-close on `finish()` for
    unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
    - Refactored proposed plan parsing to wrap the generic parser.
    `codex-rs/core/src/proposed_plan_parser.rs`
    - In plan mode, stream assistant deltas as:
      - **Normal text** → `AgentMessageContentDelta`
      - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion  
      (`codex-rs/core/src/codex.rs`)
    - Final plan item content is derived from the completed assistant
    message (authoritative), not necessarily the concatenated deltas.
    - Strips `<proposed_plan>` blocks from assistant text in plan mode so
    tags don’t appear in normal messages.
    (`codex-rs/core/src/stream_events_utils.rs`)
    - Persist `ItemCompleted` events only for plan items for rollout replay.
    (`codex-rs/core/src/rollout/policy.rs`)
    - Guard `update_plan` tool in Plan Mode with a clear error message.
    (`codex-rs/core/src/tools/handlers/plan.rs`)
    - Updated Plan Mode prompt to:  
      - keep `<proposed_plan>` out of non-final reasoning/preambles  
      - require exact tag formatting  
      - allow only one `<proposed_plan>` block per turn  
      (`codex-rs/core/templates/collaboration_mode/plan.md`)
    
    ### Protocol / App-server protocol
    - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
    (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
    - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
    EXPERIMENTAL markers and note that deltas may not match the final plan
    item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
    - Added plan delta route in app-server protocol common mapping.
    (`codex-rs/app-server-protocol/src/protocol/common.rs`)
    - Rebuild plan items from persisted `ItemCompleted` events on resume.
    (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)
    
    ### App-server
    - Forward plan deltas to v2 clients and map core plan items to v2 plan
    items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
    `codex-rs/app-server/src/codex_message_processor.rs`)
    - Added v2 plan item tests.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ### TUI
    - Added a dedicated proposed plan history cell with special background
    and padding, and moved “• Proposed Plan” outside the highlighted block.
    (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
    - Only show “Implement this plan?” when a plan item exists.
    (`codex-rs/tui/src/chatwidget.rs`,
    `codex-rs/tui/src/chatwidget/tests.rs`)
    
    <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
    src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
    />
    
    ### Docs / Misc
    - Updated protocol docs to mention plan deltas.
    (`codex-rs/docs/protocol_v1.md`)
    - Minor plumbing updates in exec/debug clients to tolerate plan deltas.
    (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)
    
    ## Tests
    - Added core integration tests:
      - Plan mode strips plan from agent messages.
      - Missing `</proposed_plan>` closes at end-of-message.  
      (`codex-rs/core/tests/suite/items.rs`)
    - Added unit tests for generic tag parser (prefix buffering, non-tag
    lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
    - Existing app-server plan item tests in v2.
    (`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
    
    ## Notes / Behavior
    - Plan output no longer appears in standard assistant text in Plan Mode;
    it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
    - The final plan item content is authoritative and may diverge from
    streamed deltas (documented as experimental).
    - Reasoning summaries are not filtered; prompt instructs the model not
    to include `<proposed_plan>` outside the final plan message.
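
    To illustrate the tag handling, here is a deliberately simplified,
    non-streaming sketch. It splits a complete message into normal and plan
    segments and auto-closes an unterminated block at end of input. `Segment`,
    `split_plan`, and `flush` are hypothetical names; the real parser in
    `tagged_block_parser.rs` works incrementally on deltas and buffers partial
    lines, which this sketch does not attempt.

    ```rust
    const OPEN: &str = "<proposed_plan>";
    const CLOSE: &str = "</proposed_plan>";

    #[derive(Debug, PartialEq, Eq)]
    pub enum Segment {
        Normal(String),
        Plan(String),
    }

    /// Split `text` line by line; tags must sit on their own line.
    pub fn split_plan(text: &str) -> Vec<Segment> {
        let mut segments = Vec::new();
        let mut buffer = String::new();
        let mut in_plan = false;
        for line in text.lines() {
            let trimmed = line.trim();
            if !in_plan && trimmed == OPEN {
                flush(&mut segments, &mut buffer, false);
                in_plan = true;
            } else if in_plan && trimmed == CLOSE {
                flush(&mut segments, &mut buffer, true);
                in_plan = false;
            } else {
                buffer.push_str(line);
                buffer.push('\n');
            }
        }
        // An unterminated plan block is closed at end of input.
        flush(&mut segments, &mut buffer, in_plan);
        segments
    }

    /// Move any buffered text into a segment of the given kind.
    fn flush(segments: &mut Vec<Segment>, buffer: &mut String, is_plan: bool) {
        if buffer.is_empty() {
            return;
        }
        let text = std::mem::take(buffer);
        segments.push(if is_plan {
            Segment::Plan(text)
        } else {
            Segment::Normal(text)
        });
    }
    ```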
    
    ## Codex Author
    `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
  • Conversation naming (#8991)
    Session renaming:
    - `/rename my_session`
    - `/rename` without arg and passing an argument in `customViewPrompt`
    - AppExitInfo shows resume hint using the session name if set instead of
    uuid, defaults to uuid if not set
    - Names are stored in `CODEX_HOME/sessions.jsonl`
    
    Session resuming:
    - `codex resume <name>` looks up the first entry in
    `CODEX_HOME/sessions.jsonl` matching the name and resumes that session
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • [bazel] Improve runfiles handling (#10098)
    We can't use the runfiles directory on Windows due to path lengths, so
    swap to the manifest strategy. Parsing the manifest is a bit complex and
    the format is changing in Bazel upstream, so pull in the official Rust
    library (via a small hack to make it importable...) and clean up all the
    associated logic to work cleanly in both Bazel and Cargo without extra
    confusion.
  • fix: handle all web_search actions and in progress invocations (#9960)
    ### Summary
    - Parse all `web_search` tool actions (`search`, `find_in_page`,
    `open_page`).
    - Previously we only parsed + displayed `search`, which made the TUI
    appear to pause when the other actions were being used.
    - Show in progress `web_search` calls as `Searching the web`
      - Previously we only showed completed tool calls
    
    <img width="308" height="149" alt="image"
    src="https://github.com/user-attachments/assets/90a4e8ff-b06a-48ff-a282-b57b31121845"
    />
    
    ### Tests
    Added + updated tests, tested locally
    
    ### Follow ups
    Update VSCode extension to display these as well
  • Fix flakey resume test (#9789)
    Sessions' `updated_at` times are truncated to seconds, with the UUID
    session ID used to break ties. If the two test sessions are created in
    the same second, AND the session B UUID < session A UUID, the test
    fails.
    
    Fix this by mutating the session mtimes, from which we derive the
    updated_at time, to ensure session B's updated_at is later than session A's.
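
    The flake can be reproduced in miniature with the sketch below.
    `SessionInfo` and `most_recent` are hypothetical, and the assumption that
    the larger UUID wins a tie is inferred from the failure condition described
    above rather than taken from the real listing code.

    ```rust
    /// Hypothetical session record with a seconds-resolution timestamp.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct SessionInfo {
        pub id: String,
        pub updated_at_secs: u64,
    }

    /// Pick the most recent session, breaking timestamp ties by id.
    pub fn most_recent(sessions: &[SessionInfo]) -> Option<&SessionInfo> {
        sessions
            .iter()
            .max_by(|a, b| (a.updated_at_secs, &a.id).cmp(&(b.updated_at_secs, &b.id)))
    }
    ```

    With both sessions in the same second, the result depends only on the ids;
    bumping session B's timestamp by a second makes the outcome deterministic.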
  • feat: ephemeral threads (#9765)
    Add ephemeral thread capabilities. Only exposed through the
    `app-server` v2.
    
    The idea is to disable the rollout recorder for those threads.
  • feat: show forked from session id in /status (#9330)
    Summary:
    - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
    for forked sessions.
    - Surface forked_from in /status for tui + tui2 and add snapshots.
  • Made codex exec resume --last consistent with codex resume --last (#9352)
    PR #9245 made `codex resume --last` honor cwd, but I forgot to make the
    same change for `codex exec resume --last`. This PR fixes the
    inconsistency.
    
    This addresses #8700
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
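
    A minimal sketch of the lookup logic described above. The
    `resolve_resource` function and `find_resource_sketch!` macro are
    hypothetical and only illustrate why a macro is needed (so that
    `option_env!` expands in the caller's crate). The real `find_resource!`
    may differ in signature, error handling, and the exact runfiles layout,
    which can include additional path components.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical resolver: prefer the Bazel layout when both the runtime
    /// runfiles directory and the compile-time package are known, otherwise
    /// fall back to the Cargo manifest directory.
    pub fn resolve_resource(
        runfiles_dir: Option<&str>,
        bazel_package: Option<&str>,
        cargo_manifest_dir: Option<&str>,
        relative: &str,
    ) -> Option<PathBuf> {
        match (runfiles_dir, bazel_package) {
            (Some(runfiles), Some(package)) => {
                Some(Path::new(runfiles).join(package).join(relative))
            }
            _ => cargo_manifest_dir.map(|dir| Path::new(dir).join(relative)),
        }
    }

    /// Hypothetical macro: `option_env!` is evaluated where the macro is
    /// expanded, so the values belong to the calling crate, not to the crate
    /// that defines the macro.
    #[allow(unused_macros)]
    macro_rules! find_resource_sketch {
        ($relative:expr) => {
            resolve_resource(
                std::env::var("RUNFILES_DIR").ok().as_deref(),
                option_env!("BAZEL_PACKAGE"),
                option_env!("CARGO_MANIFEST_DIR"),
                $relative,
            )
        };
    }
    ```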
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
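
    For readers unfamiliar with the distinction, here is a std-only sketch of
    purely lexical normalization. `normalize_lexically` is a hypothetical
    stand-in written for illustration; the PR itself uses the
    `path-absolutize` crate, whose behavior may differ in edge cases. Unlike
    `std::fs::canonicalize`, nothing here touches the filesystem, so it works
    even when `common/` does not exist on disk.

    ```rust
    use std::path::{Component, Path, PathBuf};

    /// Resolve `.` and `..` components textually, without filesystem access.
    pub fn normalize_lexically(path: &Path) -> PathBuf {
        let mut out = PathBuf::new();
        for component in path.components() {
            match component {
                Component::CurDir => {}
                Component::ParentDir => {
                    let last_is_normal =
                        matches!(out.components().next_back(), Some(Component::Normal(_)));
                    let at_root = matches!(
                        out.components().next_back(),
                        Some(Component::RootDir | Component::Prefix(_))
                    );
                    if last_is_normal {
                        // `a/b/..` collapses to `a`.
                        out.pop();
                    } else if !at_root {
                        // Keep leading `..` on relative paths.
                        out.push("..");
                    }
                    // At the root, `..` is a no-op.
                }
                other => out.push(other.as_os_str()),
            }
        }
        out
    }
    ```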
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.