Commit Graph

46 Commits

  • Use released DotSlash package for argument-comment lint (#15199)
    ## Why
    The argument-comment lint now has a packaged DotSlash artifact from
    [#15198](https://github.com/openai/codex/pull/15198), so the normal repo
    lint path should use that released payload instead of rebuilding the
    lint from source every time.
    
    That keeps `just clippy` and CI aligned with the shipped artifact while
    preserving a separate source-build path for people actively hacking on
    the lint crate.
    
    The current alpha package also exposed two integration wrinkles that the
    repo-side prebuilt wrapper needs to smooth over:
    - the bundled Dylint library filename includes the host triple, for
    example `@nightly-2025-09-18-aarch64-apple-darwin`, and Dylint derives
    `RUSTUP_TOOLCHAIN` from that filename
    - on Windows, Dylint's driver path also expects `RUSTUP_HOME` to be
    present in the environment
    
    Without those adjustments, the prebuilt CI jobs fail during `cargo
    metadata` or driver setup. This change makes the checked-in prebuilt
    wrapper normalize the packaged library name to the plain
    `nightly-2025-09-18` channel before invoking `cargo-dylint`, and it
    teaches both the wrapper and the packaged runner source to infer
    `RUSTUP_HOME` from `rustup show home` when the environment does not
    already provide it.
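    The filename normalization described above amounts to dropping a host triple from a dated nightly channel. The Rust sketch below assumes the channel portion has already been pulled out of the library filename and always has the shape `nightly-YYYY-MM-DD`. The helper name `normalize_nightly_channel` is hypothetical; it is not taken from the wrapper or the runner source.
    
    ```rust
    /// Hypothetical helper: reduce a host-qualified nightly channel such as
    /// "nightly-2025-09-18-aarch64-apple-darwin" to "nightly-2025-09-18".
    /// Assumes dated channels have exactly four dash-separated components.
    fn normalize_nightly_channel(channel: &str) -> &str {
        if !channel.starts_with("nightly-") {
            // Not a nightly channel; leave it untouched.
            return channel;
        }
        let mut dashes = 0;
        for (idx, ch) in channel.char_indices() {
            if ch == '-' {
                dashes += 1;
                // The fourth dash separates "nightly-YYYY-MM-DD" from the host triple.
                if dashes == 4 {
                    return &channel[..idx];
                }
            }
        }
        // Fewer than four dashes: already a plain dated channel.
        channel
    }
    ```
    
    Because the function returns a subslice of its input, it does not allocate. The caller can pass the result straight through as `RUSTUP_TOOLCHAIN`.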
    
    After the prebuilt Windows lint job started running successfully, it
    also surfaced a handful of existing anonymous literal callsites in
    `windows-sandbox-rs`. This PR now annotates those callsites so the new
    cross-platform lint job is green on the current tree.
    
    ## What Changed
    - checked in the current
    `tools/argument-comment-lint/argument-comment-lint` DotSlash manifest
    - kept `tools/argument-comment-lint/run.sh` as the source-build wrapper
    for lint development
    - added `tools/argument-comment-lint/run-prebuilt-linter.sh` as the
    normal enforcement path, using the checked-in DotSlash package and
    bundled `cargo-dylint`
    - updated `just clippy` and `just argument-comment-lint` to use the
    prebuilt wrapper
    - split `.github/workflows/rust-ci.yml` so source-package checks live in
    a dedicated `argument_comment_lint_package` job, while the released lint
    runs in an `argument_comment_lint_prebuilt` matrix on Linux, macOS, and
    Windows
    - kept the pinned `nightly-2025-09-18` toolchain install in the prebuilt
    CI matrix, since the prebuilt package still relies on rustup-provided
    toolchain components
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` to
    normalize host-qualified nightly library filenames, keep the `rustup`
    shim directory ahead of direct toolchain `cargo` binaries, and export
    `RUSTUP_HOME` when needed for Windows Dylint driver setup
    - updated `tools/argument-comment-lint/src/bin/argument-comment-lint.rs`
    so future published DotSlash artifacts apply the same nightly-filename
    normalization and `RUSTUP_HOME` inference internally
    - fixed the remaining Windows lint violations in
    `codex-rs/windows-sandbox-rs` by adding the required `/*param*/`
    comments at the reported callsites
    - documented the checked-in DotSlash file, wrapper split, archive
    layout, nightly prerequisite, and Windows `RUSTUP_HOME` requirement in
    `tools/argument-comment-lint/README.md`
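    The `RUSTUP_HOME` inference can be sketched as "use the environment value if present, otherwise ask `rustup show home`". Both function names below are hypothetical. The sketch also assumes that an empty `RUSTUP_HOME` should count as unset. The probe is passed in as a closure so the fallback logic can be exercised without rustup installed.
    
    ```rust
    use std::path::PathBuf;
    use std::process::Command;
    
    /// Hypothetical: prefer an explicit, non-empty `RUSTUP_HOME`; otherwise ask
    /// `probe` (normally `rustup show home`) and trim its trailing newline.
    fn resolve_rustup_home(
        env_value: Option<String>,
        probe: impl FnOnce() -> Option<String>,
    ) -> Option<PathBuf> {
        if let Some(value) = env_value.filter(|v| !v.trim().is_empty()) {
            return Some(PathBuf::from(value));
        }
        probe()
            .map(|out| out.trim().to_string())
            .filter(|out| !out.is_empty())
            .map(PathBuf::from)
    }
    
    /// Runs `rustup show home` and returns its stdout, or `None` if rustup is
    /// missing or exits unsuccessfully.
    fn rustup_show_home() -> Option<String> {
        let output = Command::new("rustup").args(["show", "home"]).output().ok()?;
        if !output.status.success() {
            return None;
        }
        String::from_utf8(output.stdout).ok()
    }
    ```
    
    A wrapper would call `resolve_rustup_home(std::env::var("RUSTUP_HOME").ok(), rustup_show_home)`. It would then pass any result to the `cargo-dylint` child process with `Command::env("RUSTUP_HOME", path)`.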
  • Move TUI on top of app server (parallel code) (#14717)
    This PR replicates the `tui` code directory and creates a temporary
    parallel `tui_app_server` directory. It also implements a new feature
    flag `tui_app_server` to select between the two tui implementations.
    
    Once the new app-server-based TUI is stabilized, we'll delete the old
    `tui` directory and feature flag.
  • client: extend custom CA handling across HTTPS and websocket clients (#14239)
    ## Stacked PRs
    
    This work is now effectively split across two steps:
    
    - #14178: add custom CA support for browser and device-code login flows,
    docs, and hermetic subprocess tests
    - #14239: extend that shared custom CA handling across Codex HTTPS
    clients and secure websocket TLS
    
    Note: #14240 was merged into this branch while it was stacked on top of
    this PR. This PR now subsumes that websocket follow-up and should be
    treated as the combined change.
    
    Builds on top of #14178.
    
    ## Problem
    
    Custom CA support landed first in the login path, but the real
    requirement is broader. Codex constructs outbound TLS clients in
    multiple places, and both HTTPS and secure websocket paths can fail
    behind enterprise TLS interception if they do not honor
    `CODEX_CA_CERTIFICATE` or `SSL_CERT_FILE` consistently.
    
    This PR broadens the shared custom-CA logic beyond login and applies the
    same policy to websocket TLS, so the enterprise-proxy story is no longer
    split between “HTTPS works” and “websockets still fail”.
    
    ## What This Delivers
    
    Custom CA support is no longer limited to login. Codex outbound HTTPS
    clients and secure websocket connections can now honor the same
    `CODEX_CA_CERTIFICATE` / `SSL_CERT_FILE` configuration, so enterprise
    proxy/intercept setups work more consistently end-to-end.
    
    For users and operators, nothing new needs to be configured beyond the
    same CA env vars introduced in #14178. The change is that more of Codex
    now respects them, including websocket-backed flows that were previously
    still using default trust roots.
    
    I also manually validated the proxy path locally with mitmproxy using:
    `CODEX_CA_CERTIFICATE=~/.mitmproxy/mitmproxy-ca-cert.pem
    HTTPS_PROXY=http://127.0.0.1:8080 just codex`
    with mitmproxy installed via `brew install mitmproxy` and configured as
    the macOS system proxy.
    
    ## Mental model
    
    `codex-client` is now the owner of shared custom-CA policy for outbound
    TLS client construction. Reqwest callers start from the builder
    configuration they already need, then pass that builder through
    `build_reqwest_client_with_custom_ca(...)`. Websocket callers ask the
    same module for a rustls client config when a custom CA bundle is
    configured.
    
    The env precedence is the same everywhere:
    - `CODEX_CA_CERTIFICATE` wins
    - otherwise fall back to `SSL_CERT_FILE`
    - otherwise use system roots
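    The precedence rule above fits in a few lines. The sketch below uses a hypothetical `CaSource` enum and `select_ca_source` function rather than the real `codex-client::custom_ca` API. It also assumes that an empty variable is treated as unset. Environment lookup is injected as a closure so the rule can be tested without mutating process env.
    
    ```rust
    /// Where the CA bundle for outbound TLS should come from (hypothetical type).
    #[derive(Debug, PartialEq, Eq)]
    enum CaSource {
        /// A PEM bundle path taken from the named environment variable.
        EnvVar { var: &'static str, path: String },
        /// No override configured: use the platform's default trust roots.
        SystemRoots,
    }
    
    /// `CODEX_CA_CERTIFICATE` wins, then `SSL_CERT_FILE`, then system roots.
    /// Treating empty values as unset is an assumption of this sketch.
    fn select_ca_source(lookup: impl Fn(&str) -> Option<String>) -> CaSource {
        for var in ["CODEX_CA_CERTIFICATE", "SSL_CERT_FILE"] {
            if let Some(path) = lookup(var).filter(|p| !p.is_empty()) {
                return CaSource::EnvVar { var, path };
            }
        }
        CaSource::SystemRoots
    }
    ```
    
    In a real process the lookup would be `|name| std::env::var(name).ok()`. Keeping the selected variable name in the result also makes it easy to report which variable chose the bundle.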
    
    The helper is intentionally narrow. It loads every usable certificate
    from the configured PEM bundle into the appropriate root store and
    returns either a configured transport or a typed error that explains
    what went wrong.
    
    ## Non-goals
    
    This does not add handshake-level integration tests against a live TLS
    endpoint. It does not validate that the configured bundle forms a
    meaningful certificate chain. It also does not try to force every
    transport in the repo through one abstraction; it extends the shared CA
    policy across the reqwest and websocket paths that actually needed it.
    
    ## Tradeoffs
    
    The main tradeoff is centralizing CA behavior in `codex-client` while
    still leaving adoption up to call sites. That keeps the implementation
    additive and reviewable, but it means the rule "outbound Codex TLS that
    should honor enterprise roots must use the shared helper" is still
    partly enforced socially rather than by types.
    
    For websockets, the shared helper only builds an explicit rustls config
    when a custom CA bundle is configured. When no override env var is set,
    websocket callers still use their ordinary default connector path.
    
    ## Architecture
    
    `codex-client::custom_ca` now owns CA bundle selection, PEM
    normalization, mixed-section parsing, certificate extraction, typed
    CA-loading errors, and optional rustls client-config construction for
    websocket TLS.
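    To illustrate the "mixed-section parsing" idea, here is a label-only scan of a PEM bundle. The `PemSummary` type and `summarize_pem` function are hypothetical, and the sketch does not decode any base64 payloads. In this sketch, `TRUSTED CERTIFICATE` sections count as certificates and are tallied as normalized, while `X509 CRL` sections are skipped. Any other label is counted separately.
    
    ```rust
    /// Counts of what a PEM bundle contains, by section label (hypothetical type).
    #[derive(Debug, Default, PartialEq, Eq)]
    struct PemSummary {
        certificates: usize,
        trusted_labels_normalized: usize,
        crls_ignored: usize,
        other_sections: usize,
    }
    
    /// Hypothetical label scan: only `-----BEGIN <LABEL>-----` lines are inspected.
    fn summarize_pem(bundle: &str) -> PemSummary {
        let mut summary = PemSummary::default();
        for line in bundle.lines() {
            let Some(label) = line
                .trim()
                .strip_prefix("-----BEGIN ")
                .and_then(|rest| rest.strip_suffix("-----"))
            else {
                continue; // END markers and base64 body lines
            };
            match label {
                "CERTIFICATE" => summary.certificates += 1,
                "TRUSTED CERTIFICATE" => {
                    summary.certificates += 1;
                    summary.trusted_labels_normalized += 1;
                }
                "X509 CRL" => summary.crls_ignored += 1,
                _ => summary.other_sections += 1,
            }
        }
        summary
    }
    ```
    
    A real loader would go on to decode each accepted section into the root store. It would also return a typed error when `certificates` is zero, which matches the empty-file error case the tests describe.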
    
    The affected consumers now call into that shared helper directly rather
    than carrying login-local CA behavior:
    - backend-client
    - cloud-tasks
    - RMCP client paths that use `reqwest`
    - TUI voice HTTP paths
    - `codex-core` default reqwest client construction
    - `codex-api` websocket clients for both responses and realtime
    websocket connections
    
    The subprocess CA probe, env-sensitive integration tests, and shared PEM
    fixtures also live in `codex-client`, which is now the actual owner of
    the behavior they exercise.
    
    ## Observability
    
    The shared CA path logs:
    - which environment variable selected the bundle
    - which path was loaded
    - how many certificates were accepted
    - when `TRUSTED CERTIFICATE` labels were normalized
    - when CRLs were ignored
    - where client construction failed
    
    Returned errors remain user-facing and include the relevant env var,
    path, and remediation hint. That same error model now applies whether
    the failure surfaced while building a reqwest client or websocket TLS
    configuration.
    
    ## Tests
    
    Pure unit tests in `codex-client` cover env precedence and PEM
    normalization behavior. Real client construction remains in subprocess
    tests so the suite can control process env and avoid the macOS seatbelt
    panic path that motivated the hermetic test split.
    
    The subprocess coverage verifies:
    - `CODEX_CA_CERTIFICATE` precedence over `SSL_CERT_FILE`
    - fallback to `SSL_CERT_FILE`
    - single-cert and multi-cert bundles
    - malformed and empty-file errors
    - OpenSSL `TRUSTED CERTIFICATE` handling
    - CRL tolerance for well-formed CRL sections
    
    The websocket side is covered by the existing `codex-api` / `codex-core`
    websocket test suites plus the manual mitmproxy validation above.
    
    ---------
    
    Co-authored-by: Ivan Zakharchanka <3axap4eHko@gmail.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat: discourage the use of the --all-features flag (#12429)
    ## Why
    
    Developers are frequently running low on disk space, and routine use of
    `--all-features` contributes to larger Cargo build caches in `target/`
    by compiling additional feature combinations.
    
    This change updates local workflow guidance to avoid `--all-features` by
    default and reserve it for cases where full feature coverage is
    specifically needed.
    
    ## What Changed
    
    - Updated `AGENTS.md` guidance for `codex-rs` to recommend `cargo test`
    / `just test` for full-suite local runs, and to call out the disk-usage
    cost of routine `--all-features` usage.
    - Updated the root `justfile` so `just fix` and `just clippy` no longer
    pass `--all-features` by default.
    - Updated `docs/install.md` to explicitly describe `cargo test
    --all-features` as an optional heavier-weight run (more build time and
    `target/` disk usage).
    
    ## Verification
    
    - Confirmed the `justfile` parses and the recipes list successfully with
    `just --list`.
  • bazel: enforce MODULE.bazel.lock sync with Cargo.lock (#11790)
    ## Why this change
    
    When Cargo dependencies change, it is easy to end up with an unexpected
    local diff in `MODULE.bazel.lock` after running Bazel. That creates
    noisy working copies and pushes lockfile fixes later in the cycle. This
    change addresses that pain point directly.
    
    ## What this change enforces
    
    The expected invariant is: after dependency updates, `MODULE.bazel.lock`
    is already in sync with Cargo resolution. In practice, running
    `bazel mod deps` should not mutate the lockfile in a clean state. If it
    does, the dependency update is incomplete.
    
    ## How this is enforced
    
    This change adds a single lockfile check script that snapshots
    `MODULE.bazel.lock`, runs `bazel mod deps`, and fails if the file
    changes. The same check is wired into local workflow commands
    (`just bazel-lock-update` and `just bazel-lock-check`) and into Bazel CI
    (Linux x86_64 job) so drift is caught early and consistently. The
    developer documentation is updated in `codex-rs/docs/bazel.md` and
    `AGENTS.md` to make the expected flow explicit.
    
    `MODULE.bazel.lock` is also refreshed in this PR to match the current
    Cargo dependency resolution.
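    The check itself is a script. For illustration, the same snapshot, regenerate, and compare logic is rendered below in Rust. Both function names are hypothetical. The sketch assumes that a byte-for-byte comparison is what "the file changes" means. The regeneration step is injected so the comparison can be tested without Bazel.
    
    ```rust
    use std::fs;
    use std::io;
    use std::path::Path;
    use std::process::Command;
    
    /// Hypothetical: returns Ok(true) if running `regenerate` leaves `lockfile`
    /// byte-for-byte unchanged, i.e. the lockfile was already in sync.
    fn lockfile_unchanged_after(
        lockfile: &Path,
        regenerate: impl FnOnce() -> io::Result<()>,
    ) -> io::Result<bool> {
        let before = fs::read(lockfile)?; // snapshot
        regenerate()?;
        let after = fs::read(lockfile)?;
        Ok(before == after)
    }
    
    /// The real regeneration step: `bazel mod deps` in the current directory.
    fn bazel_mod_deps() -> io::Result<()> {
        let status = Command::new("bazel").args(["mod", "deps"]).status()?;
        if status.success() {
            Ok(())
        } else {
            Err(io::Error::new(
                io::ErrorKind::Other,
                format!("`bazel mod deps` failed with {status}"),
            ))
        }
    }
    ```
    
    A CI step would call `lockfile_unchanged_after(Path::new("MODULE.bazel.lock"), bazel_mod_deps)` and fail the job when it returns `Ok(false)`.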
    
    ## Expected developer workflow
    
    After changing `Cargo.toml` or `Cargo.lock`, run `just bazel-lock-update`,
    then run `just bazel-lock-check`, and include any resulting
    `MODULE.bazel.lock` update in the same change.
    
    ## Testing
    
    Ran `just bazel-lock-check` locally.
  • docs: require insta snapshot coverage for UI changes (#10669)
    Adds an explicit requirement in AGENTS.md that any user-visible UI
    change includes corresponding insta snapshot coverage and that snapshots
    are reviewed/accepted in the PR.
    
    Tests: N/A (docs only)
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_extended_history` flag to
    preserve those events when you start/resume/fork a thread (defaults to
    `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
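    The two modes can be pictured as a predicate over event kinds. `EventPersistenceMode` with `Limited` and `Extended` comes from the PR. `EventKind` and `should_persist` are hypothetical, simplified stand-ins for the real `EventMsg` variants and recorder logic, and only a few of the listed ThreadItems are modeled.
    
    ```rust
    /// Modes named in the PR; `Limited` is the default.
    #[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
    enum EventPersistenceMode {
        #[default]
        Limited,
        Extended,
    }
    
    /// Hypothetical, simplified stand-in for `EventMsg` variants.
    #[derive(Clone, Copy, Debug)]
    enum EventKind {
        UserMessage,
        Reasoning,
        AgentMessage,
        WebSearch,
        CommandExecution,
        PatchApply,
        McpToolCall,
        ContextCompaction,
    }
    
    impl EventPersistenceMode {
        /// Hypothetical predicate deciding whether the recorder keeps an event.
        fn should_persist(self, kind: EventKind) -> bool {
            match kind {
                // The small subset persisted in every mode.
                EventKind::UserMessage | EventKind::Reasoning | EventKind::AgentMessage => true,
                // Extra events kept only when the client opts in.
                EventKind::WebSearch
                | EventKind::CommandExecution
                | EventKind::PatchApply
                | EventKind::McpToolCall
                | EventKind::ContextCompaction => self == EventPersistenceMode::Extended,
            }
        }
    }
    ```
    
    An exhaustive `match` with no wildcard arm means that adding a new event kind forces a decision about whether it is persisted.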
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
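    A byte budget on a Rust `&str` has to respect UTF-8 character boundaries. The sketch below keeps only a prefix and uses a hypothetical name, `truncate_to_byte_budget`. Core's actual `truncate_text` may behave differently, for example by keeping head and tail or by adding a marker.
    
    ```rust
    /// Upper bound from the PR: 10,000 bytes of command output.
    const MAX_PERSISTED_OUTPUT_BYTES: usize = 10_000;
    
    /// Hypothetical stand-in for core's `truncate_text`: keeps at most
    /// `max_bytes` bytes, backing off to a UTF-8 character boundary.
    fn truncate_to_byte_budget(text: &str, max_bytes: usize) -> &str {
        if text.len() <= max_bytes {
            return text;
        }
        let mut end = max_bytes;
        // Index 0 is always a boundary, so this loop terminates.
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }
    ```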
    
    We also persist `EventMsg::Error`, which we can now map back to the
    Turn's status and use to populate the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (similarly for apply_patch). These
    EventMsgs were never persisted before, so I made `status` a required field.
  • fix(app-server): fix TS annotations for optional fields on requests (#10412)
    This updates our generated TypeScript types to be more correct with how
    the server actually behaves, **specifically for JSON-RPC requests**.
    
    Before this PR, we'd generate `field: T | null`. After this PR, we will
    have `field?: T | null`. The latter matches how the server actually
    works, in that if an optional field is omitted, the server will treat it
    as null. This also makes it less annoying in theory for clients to
    upgrade to newer versions of Codex, since adding a new optional field to
    a JSON-RPC request should not require a client change.
    
    NOTE: This only applies to JSON-RPC requests. All other payloads (i.e.,
    responses and notifications) keep the `field: T | null` form as before.
  • Reject request_user_input outside Plan/Pair (#9955)
    ## Context
    
    Previous work in https://github.com/openai/codex/pull/9560 only rejected
    `request_user_input` in Execute and Custom modes. Since then, additional
    modes (e.g., Code) were added, so the guard should be mode-agnostic.
    
    ## What changed
    
    - Switch the handler to an allowlist: only Plan and PairProgramming are
    allowed
    - Return the same error for any other mode (including Code)
    - Add a Code-mode rejection test alongside the existing Execute/Custom
    tests
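    An allowlist rejects unknown modes by default. In the sketch below, the `Mode` enum and `check_request_user_input` function are hypothetical stand-ins for the real handler, and the error text is invented for illustration.
    
    ```rust
    /// Hypothetical mode enum covering the modes named in this PR.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum Mode {
        Plan,
        PairProgramming,
        Execute,
        Custom,
        Code,
    }
    
    /// Allowlist: only Plan and PairProgramming may use `request_user_input`;
    /// every other mode, including ones added later, gets the same error.
    fn check_request_user_input(mode: Mode) -> Result<(), String> {
        match mode {
            Mode::Plan | Mode::PairProgramming => Ok(()),
            _ => Err("request_user_input is unavailable in this mode".to_string()),
        }
    }
    ```
    
    The wildcard arm matters here. Adding a new `Mode` variant does not require touching this function, and the new mode is rejected until someone explicitly adds it to the allowed arm.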
    
    ## Why
    
    This prevents `request_user_input` from being used in modes where it is
    not intended, even as new modes are introduced.
  • chore: tweak AGENTS.md (#9650)
    ## Summary
    Update AGENTS.md to improve testing flow
    
    ## Testing
    - [x] Tested locally, much faster
  • don't ask for approval for just fix (#9586)
    It blocks all my skills from executing because it asks to run `just fmt`.
    It's a quick command that doesn't need approval.
    
    
    <img width="967" height="120" alt="image"
    src="https://github.com/user-attachments/assets/f8e6ca76-a650-49e9-beb2-ce98ba48d310"
    />
  • add generated jsonschema for config.toml (#8956)
    ### What
    Add JSON Schema generation for `config.toml`, with checked‑in
    `docs/config.schema.json`. We can move the schema elsewhere if preferred
    (and host it if there's demand).
    
    Add fixture test to prevent drift and `just write-config-schema` to
    regenerate on schema changes.
    
    Generate MCP config schema from `RawMcpServerConfig` instead of
    `McpServerConfig` because that is the runtime type used for
    deserialization.
    
    Populate feature flag values into generated schema so they can be
    autocompleted.
    
    ### Tests
    Added tests + regenerate script to prevent drift. Tested autocompletions
    using generated jsonschema locally with Even Better TOML.
    
    
    
    https://github.com/user-attachments/assets/5aa7cd39-520c-4a63-96fb-63798183d0bc
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
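    The reason `find_resource!` must be a macro is that `option_env!` expands where the macro is invoked. The simplified version below is hypothetical: it returns a plain `PathBuf` rather than whatever the real macro returns. It also uses `option_env!("CARGO_MANIFEST_DIR")` with a `"."` fallback so that it compiles outside Cargo.
    
    ```rust
    /// Hypothetical, simplified `find_resource!`. Because this is a macro, the
    /// `option_env!` calls expand at the call site, so each crate sees its own
    /// compile-time `BAZEL_PACKAGE` / `CARGO_MANIFEST_DIR`.
    macro_rules! find_resource {
        ($relative:expr) => {{
            match (
                ::std::env::var_os("RUNFILES_DIR"),
                option_env!("BAZEL_PACKAGE"),
            ) {
                // Under Bazel: runtime runfiles root + compile-time package.
                (Some(runfiles), Some(package)) => ::std::path::PathBuf::from(runfiles)
                    .join(package)
                    .join($relative),
                // Under Cargo (or no BAZEL_PACKAGE): manifest-relative path.
                _ => ::std::path::PathBuf::from(option_env!("CARGO_MANIFEST_DIR").unwrap_or("."))
                    .join($relative),
            }
        }};
    }
    ```
    
    A test would call `find_resource!("tests/fixtures/sample.txt")`. Under either build system the path ends with the same relative suffix; only the root differs.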
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
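    Resolving a possibly relative `CARGO_BIN_EXE_*` value is a small path operation. Both function names below are hypothetical. Resolving relative values against the current working directory is an assumption; the real crate may use a project root instead, as `buck_project_root()` suggests.
    
    ```rust
    use std::env;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Absolute values (what Cargo provides) pass through unchanged; relative
    /// ones (as in the WIP Buck2 setup) are joined onto `base`.
    fn resolve_bin_path(raw: &Path, base: &Path) -> PathBuf {
        if raw.is_absolute() {
            raw.to_path_buf()
        } else {
            base.join(raw)
        }
    }
    
    /// Hypothetical lookup for a binary named `name`; relative values are
    /// resolved against the current working directory (an assumption).
    fn cargo_bin(name: &str) -> io::Result<PathBuf> {
        let var = format!("CARGO_BIN_EXE_{name}");
        let raw = env::var_os(&var)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, format!("{var} is not set")))?;
        Ok(resolve_bin_path(Path::new(&raw), &env::current_dir()?))
    }
    ```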
    
    
  • docs: remove blanket ban on unsigned integers (#7957)
    Drop the AGENTS.md rule that forbids unsigned ints. The blanket guidance
    causes unnecessary complexity in cases where values are naturally
    unsigned, leading to extra clamping/conversion code instead of using
    checked or saturating arithmetic where needed.
  • feat(core) Add login to shell_command tool (#6846)
    ## Summary
    Adds the `login` parameter to the `shell_command` tool - optional,
    defaults to true.
    
    ## Testing
    - [x] Tested locally
  • Prefer wait_for_event over wait_for_event_with_timeout. (#6346)
    No need to specify the timeout in most cases.
  • Auto compact at ~90% (#5292)
    Users currently hit the context-window limit and usually don't know what
    to do. This change starts auto-compaction at ~90% of the window.
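    The trigger reduces to a threshold comparison. The sketch below uses the hypothetical name `should_auto_compact` and treats "~90%" as exactly 90%. It uses integer arithmetic (`used * 10 >= window * 9`) to avoid floating-point rounding. For a hypothetical 200,000-token window, compaction would start at 180,000 tokens.
    
    ```rust
    /// Hypothetical threshold check: true once `tokens_used` reaches 90% of
    /// `context_window`. Saturating multiplication guards against overflow.
    fn should_auto_compact(tokens_used: u64, context_window: u64) -> bool {
        context_window > 0 && tokens_used.saturating_mul(10) >= context_window.saturating_mul(9)
    }
    ```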
  • [MCP] Add support for resources (#5239)
    This PR adds support for [MCP
    resources](https://modelcontextprotocol.io/specification/2025-06-18/server/resources)
    by adding three new tools for the model:
    1. `list_resources`
    2. `list_resource_templates`
    3. `read_resource`
    
    These 3 tools correspond to the [three primary MCP resource protocol
    messages](https://modelcontextprotocol.io/specification/2025-06-18/server/resources#protocol-messages).
    
    Example of listing and reading a GitHub resource template:
    <img width="2984" height="804" alt="CleanShot 2025-10-15 at 17 31 10"
    src="https://github.com/user-attachments/assets/89b7f215-2e2a-41c5-90dd-b932ac84a585"
    />
    
    `/mcp` with Figma configured
    <img width="2984" height="442" alt="CleanShot 2025-10-15 at 18 29 35"
    src="https://github.com/user-attachments/assets/a7578080-2ed2-4c59-b9b4-d8461f90d8ee"
    />
    
    Fixes #4956
  • Simplify request body assertions (#4845)
    We'll have a lot more tests like these.
  • chore: subject docs/*.md to Prettier checks (#4645)
    Apparently we were not running our `pnpm run prettier` check in CI, so
    many files that were covered by the existing Prettier check were not
    well-formatted.
    
    This updates CI and formats the files.
  • [MCP] Add support for MCP Oauth credentials (#4517)
    This PR adds OAuth login support to streamable HTTP servers when
    `experimental_use_rmcp_client` is enabled.
    
    This PR is large but represents the minimal amount of work required for
    this to work. To keep this PR smaller, login can only be done with
    `codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp`
    or `codex mcp list` yet. Fingers crossed that this is the last large MCP
    PR and that subsequent PRs can be smaller.
    
    Under the hood, credentials are stored using platform credential
    managers using the [keyring crate](https://crates.io/crates/keyring).
    When the keyring isn't available, it falls back to storing credentials
    in `CODEX_HOME/.credentials.json` which is consistent with how other
    coding agents handle authentication.
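    The "keyring first, file fallback" policy can be sketched with a small trait. To stay self-contained, this sketch does not call the `keyring` crate. `CredentialStore`, `KeyringStore`, `FileStore`, and `save_credential` are all hypothetical, and both stores are kept in memory.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical abstraction over where OAuth credentials are stored.
    trait CredentialStore {
        fn name(&self) -> &'static str;
        fn save(&mut self, server: &str, secret: &str) -> Result<(), String>;
    }
    
    /// Stand-in for a platform credential manager; `available` simulates
    /// whether one can be reached.
    struct KeyringStore {
        available: bool,
        entries: HashMap<String, String>,
    }
    
    /// Stand-in for the `CODEX_HOME/.credentials.json` fallback (in memory here).
    #[derive(Default)]
    struct FileStore {
        entries: HashMap<String, String>,
    }
    
    impl CredentialStore for KeyringStore {
        fn name(&self) -> &'static str {
            "keyring"
        }
        fn save(&mut self, server: &str, secret: &str) -> Result<(), String> {
            if !self.available {
                return Err("keyring unavailable".to_string());
            }
            self.entries.insert(server.to_string(), secret.to_string());
            Ok(())
        }
    }
    
    impl CredentialStore for FileStore {
        fn name(&self) -> &'static str {
            "file"
        }
        fn save(&mut self, server: &str, secret: &str) -> Result<(), String> {
            self.entries.insert(server.to_string(), secret.to_string());
            Ok(())
        }
    }
    
    /// Try the primary store first and fall back if it fails. Returns the name
    /// of the backend that actually stored the credential.
    fn save_credential(
        primary: &mut dyn CredentialStore,
        fallback: &mut dyn CredentialStore,
        server: &str,
        secret: &str,
    ) -> Result<&'static str, String> {
        match primary.save(server, secret) {
            Ok(()) => Ok(primary.name()),
            Err(_) => fallback.save(server, secret).map(|()| fallback.name()),
        }
    }
    ```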
    
    I tested this on macOS, Windows, WSL (Ubuntu), and Linux. I wasn't able
    to test the D-Bus store on Linux but did verify that the fallback works.
    
    One quirk: if you have stored credentials, every development build
    produces its own ad-hoc binary, so the keyring won't recognize the
    reader as the same binary as the writer and may ask for the user's
    password. I may add an override to disable this or allow
    users/enterprises to opt out of the keyring storage if it causes issues.
    
    <img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40"
    src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d"
    />
    <img width="745" height="486" alt="image"
    src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e"
    />
  • AGENTS.md: Add instruction to install missing commands (#3807)
    This change instructs the model to install any missing command.
    Otherwise, tokens are wasted when it repeatedly tries to run commands
    that aren't available before installing them.
  • syntax-highlight bash lines (#3142)
    i'm not yet convinced i have the best heuristics for what to highlight,
    but this feels like a useful step towards something a bit easier to
    read, esp. when the model is producing large commands.
    
    <img width="669" height="589" alt="Screenshot 2025-09-03 at 8 21 56 PM"
    src="https://github.com/user-attachments/assets/b9cbcc43-80e8-4d41-93c8-daa74b84b331"
    />
    
    also a fairly significant refactor of our line wrapping logic.
  • AGENTS.md: clarify test approvals for codex-rs (#3132)
    Clarifies codex-rs testing approvals in AGENTS.md:
    
    - Allow running project-specific or individual tests without asking.
    - Require asking before running the complete test suite.
    - Keep `just fmt` always allowed without approval.
  • core: correct sandboxed shell tool description (reads allowed anywhere) (#3069)
    Correct the `shell` tool description for sandboxed runs and add targeted
    tests.
    
    - Fix the WorkspaceWrite description to clearly state that writes
    outside the writable roots require escalated permissions; reads are not
    restricted. The previous wording/formatting could be read as restricting
    reads outside the workspace.
    - Render the writable roots list on its own lines under a newline after
    "writable roots:" for clarity.
    - Show the "Commands that require network access" note only in
    WorkspaceWrite when network is disabled.
    - Add focused tests that call `create_shell_tool_for_sandbox` directly
    and assert the exact description text for WorkspaceWrite, ReadOnly, and
    DangerFullAccess.
    - Update AGENTS.md to note that `just fmt` can be run automatically
    without asking.
  • prefer ratatui Stylized for constructing lines/spans (#3068)
    no functional change, just simplifying ratatui styling and adding
    guidance in AGENTS.md for future.
  • Fix typo in AGENTS.md (#2518)
    - Change `examole` to `example`
  • tui: tab-completing a command moves the cursor to the end (#2362)
    also tweak AGENTS.md for faster `just fix`
  • fix: clean up styles & colors and define in styles.md (#2401)
    New style guide:
    
      # Headers, primary, and secondary text
    
      - **Headers:** Use `bold`. For markdown with various header levels,
        leave in the `#` signs.
      - **Primary text:** Default.
      - **Secondary text:** Use `dim`.
    
      # Foreground colors
    
      - **Default:** Most of the time, just use the default foreground color.
        `reset` can help get it back.
      - **Selection:** Use ANSI `blue`. (Ed & AE want to make this cyan too,
        but we'll do that in a follow-up since it's riskier in different
        themes.)
      - **User input tips and status indicators:** Use ANSI `cyan`.
      - **Success and additions:** Use ANSI `green`.
      - **Errors, failures and deletions:** Use ANSI `red`.
      - **Codex:** Use ANSI `magenta`.
    
      # Avoid
    
      - Avoid custom colors because there's no guarantee that they'll contrast
        well or look good on various terminal color themes.
      - Avoid ANSI `black`, `white`, `yellow` as foreground colors because the
        terminal theme will do a better job. (Use `reset` if you need to in
        order to get those.) The exception is if you need contrast rendering
        over a manually colored background.
    
      (There are some rules to try to catch this in `clippy.toml`.)
    
    # Testing
    
    Tested in a variety of light and dark color themes in Terminal, iTerm2, and Ghostty.
  • [1/3] Parse exec commands and format them more nicely in the UI (#2095)
    # Note for reviewers
    The bulk of this PR is in the new file, `parse_command.rs`. This file
    is meant to be developed test-first (TDD), with the implementation
    written by Codex. Do not worry about reviewing the code, just review the
    unit tests (if you want). If any cases are missing, we'll add more tests
    and have Codex fix them.
    
    I think the best approach will be to land and iterate. I have some
    follow-ups I want to do after this lands. The next PR after this will
    let us merge (and dedupe) multiple sequential cells of the same type,
    such as multiple read commands. The deduping will also be important
    because the model often reads the same file multiple times in a row, in
    chunks.
    
    ===
    
    This PR formats common commands like reading, formatting, testing, etc.,
    more nicely:
    
    It tries to extract details such as file names and test names, and falls
    back to the raw command when it can't. It also only shows stdout/stderr
    if the command failed.
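    The "extract a summary or fall back to the raw command" behavior maps naturally onto slice patterns. The sketch below is hypothetical: `ParsedCommand`, its variants, and the handful of recognized commands are illustrative and are not taken from `parse_command.rs`.
    
    ```rust
    /// Hypothetical summary of a shell command for display.
    #[derive(Debug, PartialEq, Eq)]
    enum ParsedCommand {
        Read { name: String },
        Test { cmd: String },
        Unknown { cmd: String },
    }
    
    /// Last path component, used as a short display name for a file.
    fn basename(path: &str) -> String {
        path.rsplit('/').next().unwrap_or(path).to_string()
    }
    
    /// Recognize a few common shapes; anything else falls back to the raw command.
    fn parse_command(argv: &[&str]) -> ParsedCommand {
        match argv {
            ["cat", file] => ParsedCommand::Read { name: basename(file) },
            // `head`/`tail` with any flags; the last argument is the file.
            ["head" | "tail", .., file] => ParsedCommand::Read { name: basename(file) },
            ["cargo" | "just", "test", ..] => ParsedCommand::Test { cmd: argv.join(" ") },
            _ => ParsedCommand::Unknown { cmd: argv.join(" ") },
        }
    }
    ```
    
    Writing the recognizers as match arms keeps each case small and easy to cover with a unit test, which fits the test-first approach described above.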
    
    <img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15"
    src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e"
    />
    <img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32"
    src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9"
    />
    <img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2"
    src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f"
    />
    
    Part 2: https://github.com/openai/codex/pull/2097
    Part 3: https://github.com/openai/codex/pull/2110
  • feat: make .git read-only within a writable root when using Seatbelt (#1765)
    To make `--full-auto` safer, this PR updates the Seatbelt policy so that
    a `SandboxPolicy` with a `writable_root` that contains a `.git/`
    _directory_ will make `.git/` _read-only_ (though as a follow-up, we
    should also consider the case where `.git` is a _file_ with a `gitdir:
    /path/to/actual/repo/.git` entry that should also be protected).
    
    The two major changes in this PR:
    
    - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a
    `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot`
    can specify a list of read-only subpaths.
    - Updating `create_seatbelt_command_args()` to honor the read-only
    subpaths in `WritableRoot`.
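    The policy decision behind the Seatbelt change can be sketched as "under the root, but not under any read-only subpath". `WritableRoot` is named in the PR, but the field names, `new`, and `can_write` below are hypothetical. The sketch models the decision only; it does not generate Seatbelt arguments.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Writable root with read-only carve-outs (field names are hypothetical).
    #[derive(Debug)]
    struct WritableRoot {
        root: PathBuf,
        read_only_subpaths: Vec<PathBuf>,
    }
    
    impl WritableRoot {
        /// Protects `<root>/.git` when it is a directory. The `.git` *file*
        /// (`gitdir:`) case is left for a follow-up, as in the PR.
        fn new(root: PathBuf) -> Self {
            let dot_git = root.join(".git");
            let read_only_subpaths = if dot_git.is_dir() { vec![dot_git] } else { Vec::new() };
            WritableRoot { root, read_only_subpaths }
        }
    
        /// `Path::starts_with` compares whole components, so `.gitignore` is not
        /// mistaken for something inside `.git`.
        fn can_write(&self, path: &Path) -> bool {
            path.starts_with(&self.root)
                && !self.read_only_subpaths.iter().any(|ro| path.starts_with(ro))
        }
    }
    ```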
    
    The logic to update the policy is a fairly straightforward update to
    `create_seatbelt_command_args()`, but perhaps the more interesting part
    of this PR is the introduction of an integration test in
    `tests/sandbox.rs`. Leveraging the new API in #1785, we test
    `SandboxPolicy` under various conditions, including ones where `$TMPDIR`
    is not readable, which is critical for verifying the new behavior.
    
    To ensure that Codex can run its own tests, e.g.:
    
    ```
    just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only
    ```
    
    I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is
    comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being
    used.
    
    Adding a comparable change for Landlock will be done in a subsequent PR.
  • chore: rename toolchain file (#1604)
    Rename toolchain file so older versions of cargo can pick it up.
  • chore: auto format code on save and add more details to AGENTS.md (#1582)
    Adds a default VS Code config with generally applicable settings.
    Adds more entrypoints to the justfile, both for environment setup and to
    help agents better verify changes.
  • feat: add support for AGENTS.md in Rust CLI (#885)
    The TypeScript CLI already has support for including the contents of
    `AGENTS.md` in the instructions sent with the first turn of a
    conversation. This PR brings this functionality to the Rust CLI.
    
    To be considered, `AGENTS.md` must be in the `cwd` of the session, or in
    one of the parent folders up to a Git/filesystem root (whichever is
    encountered first).
    
    By default, a maximum of 32 KiB of `AGENTS.md` will be included, though
    this is configurable using the new-in-this-PR `project_doc_max_bytes`
    option in `config.toml`.
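    The discovery rule and byte cap can be sketched as follows. The function and constant names are hypothetical. The sketch assumes that the Git root is the nearest ancestor containing a `.git` entry, and that truncation happens on raw bytes before UTF-8 decoding.
    
    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Default cap from the PR; configurable via `project_doc_max_bytes`.
    const DEFAULT_PROJECT_DOC_MAX_BYTES: usize = 32 * 1024;
    
    /// Walk from `cwd` upward and return the first `AGENTS.md`, stopping after
    /// the directory that contains `.git` or at the filesystem root.
    fn find_project_doc(cwd: &Path) -> Option<PathBuf> {
        for dir in cwd.ancestors() {
            let candidate = dir.join("AGENTS.md");
            if candidate.is_file() {
                return Some(candidate);
            }
            if dir.join(".git").exists() {
                break; // reached the Git root without finding a doc
            }
        }
        None
    }
    
    /// Read at most `max_bytes` bytes. A cut inside a multi-byte character
    /// becomes U+FFFD via `from_utf8_lossy`.
    fn read_project_doc(path: &Path, max_bytes: usize) -> io::Result<String> {
        let bytes = fs::read(path)?;
        let end = bytes.len().min(max_bytes);
        Ok(String::from_utf8_lossy(&bytes[..end]).into_owned())
    }
    ```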