Commit Graph

69 Commits

  • Specify platform support in AGENTS.md (#27966)
    Codex seems to do interesting things with `cfg`'s sometimes and it seems
    it would be good to give it guidance about how broadly our Rust needs to
    work.
    
    This adds a very brief section to AGENTS.md explaining that we target
    the major desktop OSes and that we want the vast majority of our logic
    to be portable across them.
  • [codex] Add crate API surface review rule (#27939)
    ## Why
    
    Review guidance should explicitly discourage widening crate APIs for
    testing convenience. Keeping those boundaries narrow reduces accidental
    coupling and prevents one-off test utilities from becoming durable
    public surface area.
    
    ## What
    
    - Add a crate API surface rule to `AGENTS.md`.
    - Ask reviewers to keep crate APIs small and avoid proliferating
    test-only helpers.
    
    ## Test plan
    
    - Not run (documentation-only change).
  • lint: allow self-documenting builder arguments (#27507)
    Builder-style setters often repeat the setting name in both the method
    and its sole argument. Calls such as `.enabled(false)` are already
    self-documenting, so requiring `/*enabled*/` adds noise without
    clarifying the call.
    
    ## What changed
    
    - Exempt a method's sole non-self argument when its resolved parameter
    name matches the method name.
    - Continue validating any explicit argument comment against the resolved
    parameter name.
    - Continue requiring comments when method and parameter names differ or
    when a method has multiple non-self arguments.
    - Document the exception in `AGENTS.md` and the lint's own behavior
    documentation.
    
    ## Examples
    
    Before this change we'd need redundant comments like this:
    
    ```rust
    builder.enabled(/*enabled*/ false);
    builder.retry_count(/*retry_count*/ 3);
    builder.base_url(/*base_url*/ None);
    ```
    
    Now these calls can be written like this:
    
    ```rust
    builder.enabled(false);
    builder.retry_count(3);
    builder.base_url(None);
    ```
    
    Still disallowed:
    
    ```rust
    client.set_flag(true); // Method name does not match parameter `enabled`.
    options.enabled(false, /*retry_count*/ 3); // More than one non-self argument.
    options.enabled(/*value*/ false); // Explicit comment does not match `enabled`.
    ```
    
    ## Validation
    
    Added UI coverage for boolean, numeric, and `None` builder arguments,
    multi-argument methods, and explicit comment mismatches. Ran `rustup run
    nightly-2025-09-18 cargo test` in `tools/argument-comment-lint`.
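
    The decision described above can be sketched as a small predicate. This
    is an illustration only, not the lint's actual implementation: `Arg`,
    `Verdict`, and `check_arg` are hypothetical names, and the real lint
    works on compiler data structures rather than plain strings.

    ```rust
    /// Hypothetical model of one literal call argument as the lint might see it.
    pub struct Arg<'a> {
        /// Parameter name resolved from the callee's signature.
        pub param_name: &'a str,
        /// Text of an explicit `/*name*/` comment, if the caller wrote one.
        pub comment: Option<&'a str>,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum Verdict {
        Ok,
        MissingComment,
        MismatchedComment,
    }

    /// Applies the rule from the description to a single argument.
    pub fn check_arg(method_name: &str, non_self_arg_count: usize, arg: &Arg<'_>) -> Verdict {
        match arg.comment {
            // An explicit comment is always validated against the parameter name.
            Some(comment) if comment == arg.param_name => Verdict::Ok,
            Some(_) => Verdict::MismatchedComment,
            // Exemption: sole non-self argument whose parameter name matches the method.
            None if non_self_arg_count == 1 && method_name == arg.param_name => Verdict::Ok,
            None => Verdict::MissingComment,
        }
    }
    ```

    Under this sketch, `.enabled(false)` passes, while `set_flag(true)` with
    a parameter named `enabled` still needs a comment.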
  • Remove just bench-smoke from just test. (#26716)
    ## Why
    
    `just test` should run the test suite without also compiling and
    executing benchmark smoke tests. Keeping benchmark validation explicit
    avoids adding unrelated work to every project-specific test invocation.
    
    ## What changed
    
    - Remove the `just bench-smoke` step from the Unix and Windows `test`
    recipes.
    - Document `just bench` and `just bench-smoke` as the explicit benchmark
    commands in `AGENTS.md`.
    
    ## Validation
    
    - `just test -p codex-arg0`
    - `just --dry-run test`
    - `just --dry-run bench-smoke`
  • [codex] Add /usr/bin/bash shell fallback (#26538)
    ## Why
    
    Some Linux environments expose `bash` at `/usr/bin/bash` instead of
    `/bin/bash`. The shell detection fallback list should cover both
    standard locations once PATH/user-shell probing fails.
    
    Stacked on #26480.
    
    ## What changed
    
    - Add `/usr/bin/bash` to the bash fallback path list in
    `codex-shell-command`.
    - Extend shell type detection coverage for `/usr/bin/bash`.
    - Add AGENTS.md testing guidance to avoid tests for statically defined
    values and negative tests for removed logic.
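
    A minimal sketch of the fallback probe, assuming the list is tried in
    order and the first existing path wins. The constant and function names
    are hypothetical, and the ordering of the two paths is an assumption;
    only the paths themselves come from this change.

    ```rust
    use std::path::{Path, PathBuf};

    /// Fallback locations named in this change (ordering is an assumption).
    pub const BASH_FALLBACK_PATHS: [&str; 2] = ["/bin/bash", "/usr/bin/bash"];

    /// Returns the first candidate for which `exists` reports true.
    /// The existence check is injected so the logic can run without touching the disk.
    pub fn first_existing(candidates: &[&str], exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
        for candidate in candidates {
            let path = Path::new(candidate);
            if exists(path) {
                return Some(path.to_path_buf());
            }
        }
        None
    }
    ```

    A real caller would pass something like `|p| p.is_file()` once
    PATH/user-shell probing has failed.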
    
    ## Verification
    
    - `just test -p codex-shell-command`
  • Move code review rules into AGENTS (#25738)
    ## Why
    Codex Review now supports repository-specific review rules in AGENTS.md.
    Adding the review prompts there makes the guidance available as
    repository review rules next to the code it governs while keeping the
    existing local review skills intact.
    
    ## What changed
    - Added a `## Code Review Rules` section to `AGENTS.md` with the
    existing review prompts for model context, breaking changes, test
    authoring, and change size.
    - Preserved the existing `.codex/skills/code-review*` skill files.
    
    ## Verification
    - `git diff --check origin/main...HEAD`
  • Add Python version compatibility guidance (#25690)
    ## Why
    
    Python contributions in this repository should target the declared
    Python 3 runtime instead of carrying Python 2 compatibility patterns
    forward. When compatibility across Python 3 point releases matters,
    contributors need a consistent source of truth for the minimum supported
    version.
    
    ## What changed
    
    - Added Python development guidance to `AGENTS.md` stating that the
    repository uses Python 3+ and should not use the `__future__` module.
    - Documented that contributors should check the nearest `pyproject.toml`
    `requires-python` field when evaluating Python 3 point-release
    compatibility.
    
    ## Testing
    
    Not run (guidance-only change).
  • [codex] document out-of-line test module convention (#25682)
    ## Why
    
    New unit test modules should follow one consistent layout so
    implementation files stay focused and test suites remain easy to locate,
    without creating cleanup churn in existing inline test modules.
    
    ## What changed
    
    - Added `AGENTS.md` guidance requiring new test modules to use separate
    sibling `*_tests.rs` files with an explicit `#[path = "..._tests.rs"]`
    attribute.
    - Clarified that existing inline `#[cfg(test)] mod tests { ... }`
    modules should not be moved solely to follow the new convention.
    
    ## Validation
    
    - Ran `git diff --check`.
  • Check root Python script formatting in CI (#25165)
    ## Why
    
    Python files under `scripts/` were not covered by the repository
    formatting recipe or the CI formatting job, so formatting drift could
    merge unnoticed.
    
    ## What
    
    - Add a dedicated `scripts/pyproject.toml` and `scripts/uv.lock` so
    root-script formatting uses a locked Ruff version.
    - Extend `just fmt` to format root Python scripts and add
    `fmt-scripts-check` for CI.
    - Run `just fmt-scripts-check` from `.github/workflows/ci.yml`,
    installing `uv` through SHA-pinned `astral-sh/setup-uv` while retaining
    the `uv` `0.11.3` pin.
    - Apply Ruff formatting to the root Python scripts, including
    `scripts/just-shell.py`, and extend
    `sdk/python/tests/test_artifact_workflow_and_binaries.py` to cover the
    root formatting recipe.
    - Update `AGENTS.md` so agents run `just fmt` after code changes
    anywhere in the repository.
    
    ## Validation
    
    - Extended the existing Python SDK workflow test to assert that `just
    fmt` includes root Python scripts.
  • [codex] Remove external client session reset plumbing (#24157)
    ## Why
    
    The turn loop no longer needs to decide when a `ModelClientSession`
    should reset its websocket state after compaction. That reset behavior
    belongs inside the model client, where the websocket cache and retry
    state are owned. The repo guidance now calls this out explicitly so
    future changes let the incremental request logic decide whether the
    previous request can be reused.
    
    ## What Changed
    
    - Removed the `reset_client_session` return value from pre-sampling and
    auto-compact helpers in `core/src/session/turn.rs`.
    - Changed compaction helpers to return `CodexResult<()>` so callers only
    handle success or failure.
    - Made `ModelClientSession::reset_websocket_session` private to
    `core/src/client.rs`, leaving it callable only from model-client
    internals.
    - Added `AGENTS.md` guidance not to call `reset_client_session`
    unnecessarily.
    
    ## Validation
    
    - `just test -p codex-core session::turn`
  • Prefer just test over cargo test in docs (#23910)
    `cargo test` for the core and other crates fails on a fresh macOS
    checkout without the right stack size variable. This change encourages
    using the just test command that sets the environment up correctly.
    
    As a bonus, this should encourage agents to get more benefit out of
    nextest's parallel execution.
  • Clarify docs folder guidance in AGENTS.md (#21772)
    ## Summary
    
    Codex keeps trying to add documentation to the `docs/` directory. With
    the exception of app server API documentation, the docs for Codex should
    not live in this repo. We don't want the local `docs/` folder to become
    a stale shadow of the official docs.
    
    This PR updates `AGENTS.md` to make that boundary explicit and scopes
    the existing API documentation guidance to app-server docs/examples. It
    also removes the extra `docs/config.md` sections that were recently
    added.
  • Ensure all mentions of cargo-install are --locked (#21592)
    There's already a preference for this in the codebase, but a few
    mentions have drifted away from it. Generally `--locked` is preferred to
    reduce exposure to supply-chain attacks (and just generally improve
    reproducibility).
    
    In an ideal world these dependencies would maybe even be pinned to
    versions but Cargo is kinda bad at that for devtools. Still better to
    use `--locked` than not.
  • docs: discourage #[async_trait] and #[allow(async_fn_in_trait)] (#20242)
    ## Why
    
    We have run into two avoidable problems when introducing async trait
    APIs in Rust:
    
    - `#[async_trait]` has caused materially worse build times in this
    repository.
    - `#[allow(async_fn_in_trait)]` makes it too easy to ship a public trait
    without spelling out whether the returned future is `Send`, which hides
    an important part of the trait contract.
    
    We already have a good example of the preferred alternative in
    [#16630](https://github.com/openai/codex/pull/16630) /
    [`3c7f013f9735`](https://github.com/openai/codex/commit/3c7f013f9735),
    but that guidance currently lives only as prior art in the codebase.
    This PR documents the rule in `AGENTS.md` so contributors are more
    likely to follow the native RPITIT pattern before these two shortcuts
    spread further.
    
    ## What Changed
    
    - added Rust guidance in `AGENTS.md` discouraging both `#[async_trait]`
    and `#[allow(async_fn_in_trait)]`
    - pointed contributors to the native RPITIT pattern with explicit `Send`
    bounds on the returned future
    - clarified that implementations may still use `async fn` when they
    satisfy that trait contract
    
    ## Verification
    
    - docs-only change; no tests run
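
    For illustration, the native RPITIT shape looks roughly like this. The
    `Greeter` trait, the `English` type, and the tiny `block_on` helper are
    hypothetical and exist only to make the sketch self-contained; they are
    not taken from the referenced PR.

    ```rust
    use std::future::Future;
    use std::pin::pin;
    use std::sync::Arc;
    use std::task::{Context, Poll, Wake, Waker};

    /// Hypothetical trait: the `Send` bound on the returned future is spelled out
    /// in the trait contract instead of being left implicit.
    pub trait Greeter {
        fn greet(&self, name: &str) -> impl Future<Output = String> + Send;
    }

    pub struct English;

    impl Greeter for English {
        // Implementations may still use `async fn` as long as the future is `Send`.
        async fn greet(&self, name: &str) -> String {
            format!("hello, {name}")
        }
    }

    /// Compile-time check that a value is `Send`.
    pub fn assert_send<T: Send>(value: T) -> T {
        value
    }

    struct NoopWaker;

    impl Wake for NoopWaker {
        fn wake(self: Arc<Self>) {}
    }

    /// Busy-polling executor, adequate only for futures that complete without
    /// waiting on external wake-ups.
    pub fn block_on<F: Future>(future: F) -> F::Output {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        let mut future = pin!(future);
        loop {
            if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
                return output;
            }
        }
    }
    ```

    Because the bound lives on the trait method, callers that need to spawn
    the future on a multi-threaded runtime can rely on `Send` without
    inspecting each implementation.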
  • feat(tui): add clear-context plan implementation (#17499)
    ## TL;DR
    
    - Adds a second Plan Mode handoff: implement the approved plan after
    clearing context.
    - Keeps the existing same-thread `Yes, implement this plan` action
    unchanged.
    - Reuses the `/clear` thread-start path and submits the approved plan as
    the fresh thread's first prompt.
    - Covers the new popup option, event plumbing, initial-message behavior,
    and disabled states in TUI tests.
    
    ## Problem
    
    Plan Mode already asks whether to implement an approved plan, but the
    only affirmative path continues in the same thread. That is useful when
    the planning conversation itself is still valuable, but it does not
    support the workflow where exploratory planning context is discarded and
    implementation starts from the final approved plan as the only
    model-visible handoff.
    
    ## Mental model
    
    There are now two implementation choices after a proposed plan. The
    existing choice, `Yes, implement this plan`, is unchanged: it switches
    to Default mode and submits `Implement the plan.` in the current thread.
    The new choice, `Yes, clear context and implement`, treats the proposed
    plan as a handoff artifact. It clears the UI/session context through the
    same thread-start source used by `/clear`, then submits an initial
    prompt containing the approved plan after the fresh thread is
    configured.
    
    The important distinction is that the new path is not compaction. The
    model receives a deliberate implementation prompt built from the
    approved plan markdown, not a summary of the previous planning
    transcript. Both implementation choices require the Default
    collaboration preset to be available, so the popup does not offer a
    coding handoff when the fresh thread would fall back to another mode.
    
    ## Non-goals
    
    This change does not alter `/clear`, `/compact`, or the existing
    same-context Plan Mode implementation option. It does not add protocol
    surface area or app-server schema changes. It also does not carry the
    previous transcript path or a generated planning summary into the new
    model context.
    
    ## Tradeoffs
    
    The fresh-context option relies on the approved plan being sufficiently
    complete. That matches the Plan Mode contract, but it means vague plans
    will produce weaker implementation starts than a compacted transcript
    would. The upside is that rejected ideas, exploratory dead ends, and
    planning corrections do not leak into the implementation turn.
    
    The current implementation stores the latest proposed plan in
    `ChatWidget` rather than deriving it from history cells at selection
    time. This keeps the popup action simple and deterministic, but it makes
    the cache lifecycle important: it must be reset when a new task starts
    so an old plan cannot be submitted later.
    
    ## Architecture
    
    The TUI stores the most recent completed proposed-plan markdown when a
    plan item completes. The Plan Mode approval popup uses that cache to
    enable the fresh-context option and to build a first-turn prompt that
    instructs the model to implement the approved plan in a fresh context.
    
    Selecting the new option emits a TUI-internal
    `ClearUiAndSubmitUserMessage` event. `App` handles that event by reusing
    the existing clear flow: clear terminal state, reset app UI state, start
    a new app-server thread with `ThreadStartSource::Clear`, and attach a
    replacement `ChatWidget` with an initial user message. The existing
    initial-message suppression in `enqueue_primary_thread_session` ensures
    the prompt is submitted only after the new session is configured and any
    startup replay is rendered.
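
    The cache lifecycle described above can be sketched as follows.
    `PlanCache`, its methods, the event payload shape, and the prompt
    wording are all assumptions made for illustration; only the
    `ClearUiAndSubmitUserMessage` event name comes from this change.

    ```rust
    /// TUI-internal event named in this change; the `text` field is an assumed shape.
    #[derive(Debug, PartialEq, Eq)]
    pub enum AppEvent {
        ClearUiAndSubmitUserMessage { text: String },
    }

    /// Hypothetical stand-in for the part of `ChatWidget` that caches the plan.
    #[derive(Default)]
    pub struct PlanCache {
        latest_proposed_plan: Option<String>,
    }

    impl PlanCache {
        /// Remember the markdown of the most recently completed proposed plan.
        pub fn on_plan_completed(&mut self, markdown: &str) {
            self.latest_proposed_plan = Some(markdown.to_string());
        }

        /// Reset on every new task so a stale plan cannot be submitted later.
        pub fn on_task_started(&mut self) {
            self.latest_proposed_plan = None;
        }

        /// Returns the event for the clear-context option, or `None` when the
        /// option should be disabled.
        pub fn clear_context_action(&self, default_mode_available: bool) -> Option<AppEvent> {
            if !default_mode_available {
                return None;
            }
            let plan = self.latest_proposed_plan.as_ref()?;
            Some(AppEvent::ClearUiAndSubmitUserMessage {
                // Placeholder wording; the plan itself is embedded verbatim.
                text: format!("Implement this approved plan in a fresh context:\n\n{plan}"),
            })
        }
    }
    ```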
    
    ## Observability
    
    The previous thread remains resumable through the existing clear-session
    summary hint. There is no new telemetry or protocol event for this path,
    so debugging should start at the TUI event boundary: confirm the popup
    emitted `ClearUiAndSubmitUserMessage`, confirm the app-server thread
    start used `ThreadStartSource::Clear`, then confirm the fresh widget
    submitted the initial user message after `SessionConfigured`.
    
    ## Tests
    
    The Plan Mode popup snapshots cover the new option and preserve the
    original option as the first/default action. Unit coverage verifies the
    original same-context option still emits `SubmitUserMessageWithMode`,
    the new option emits `ClearUiAndSubmitUserMessage` with the approved
    plan embedded verbatim, and the clear-context option is disabled when
    Default mode is unavailable or no approved plan exists. The broader
    `codex-tui` test package passes with the updated fresh-thread
    initial-message plumbing.
  • feat: add Codex Apps sediment file remapping (#15197)
    ## Summary
    - bridge Codex Apps tools that declare `_meta["openai/fileParams"]`
    through the OpenAI file upload flow
    - mask those file params in model-visible tool schemas so the model
    provides absolute local file paths instead of raw file payload objects
    - rewrite those local file path arguments client-side into
    `ProvidedFilePayload`-shaped objects before the normal MCP tool call
    
    ## Details
    - applies to scalar and array file params declared in
    `openai/fileParams`
    - Codex uploads local files directly to the backend and uses the
    uploaded file metadata to build the MCP tool arguments locally
    - this PR is input-only
    
    ## Verification
    - `just fmt`
    - `cargo test -p codex-core mcp_tool_call -- --nocapture`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] reduce module visibility (#16978)
    ## Summary
    - reduce public module visibility across Rust crates, preferring private
    or crate-private modules with explicit crate-root public exports
    - update external call sites and tests to use the intended public crate
    APIs instead of reaching through module trees
    - add the module visibility guideline to AGENTS.md
    
    ## Validation
    - `cargo check --workspace --all-targets --message-format=short` passed
    before the final fix/format pass
    - `just fix` completed successfully
    - `just fmt` completed successfully
    - `git diff --check` passed
  • docs: update argument_comment_lint instructions in AGENTS.md (#16375)
    I noticed that Codex was spending more time on running this lint check
    locally than I would like. Now that we have the linter running
    cross-platform using Bazel in CI, I find it's better to update the PR
    ASAP to get CI going than to wait for `just argument-comment-lint` to
    finish locally before updating the PR.
  • Refactor external auth to use a single trait (#16356)
    ## Summary
    - Replace the separate external auth enum and refresher trait with a
    single `ExternalAuth` trait in the login auth flow
    - Move bearer token auth behind `BearerTokenRefresher` and update
    `AuthManager` and app-server wiring to use the generic external auth API
  • Rename tui_app_server to tui (#16104)
    This is a follow-up to https://github.com/openai/codex/pull/15922. That
    previous PR deleted the old `tui` directory and left the new
    `tui_app_server` directory in place. This PR renames `tui_app_server` to
    `tui` and fixes up all references.
  • Remove the legacy TUI split (#15922)
    This is part 1 of 2 PRs that will delete the `tui` /
    `tui_app_server` split. This part simply deletes the existing `tui`
    directory and marks the `tui_app_server` feature flag as removed. I left
    the `tui_app_server` feature flag in place for now so its presence
    doesn't result in an error. It is simply ignored.
    
    Part 2 will rename the `tui_app_server` directory to `tui`. I did this as
    two parts to reduce visible code churn.
  • docs: update AGENTS.md to discourage adding code to codex-core (#15910)
    ## Why
    
    `codex-core` is already the largest crate in `codex-rs`, so defaulting
    to it for new functionality makes it harder to keep the workspace
    modular. The repo guidance should make it explicit that contributors are
    expected to look for an existing non-`codex-core` crate, or introduce a
    new crate, before growing `codex-core` further.
    
    ## What Changed
    
    - Added a dedicated "The `codex-core` crate" section to `AGENTS.md`.
    - Documented why `codex-core` should be treated as a last resort for new
    functionality.
    - Added concrete guidance for both implementation and review: prefer an
    existing non-`codex-core` crate when possible, introduce a new workspace
    crate when that is the cleaner boundary, and push back on PRs that grow
    `codex-core` unnecessarily.
  • chore: ask agents md not to play with PIDs (#15877)
    Ask Codex to be patient with Rust
  • Use released DotSlash package for argument-comment lint (#15199)
    ## Why
    The argument-comment lint now has a packaged DotSlash artifact from
    [#15198](https://github.com/openai/codex/pull/15198), so the normal repo
    lint path should use that released payload instead of rebuilding the
    lint from source every time.
    
    That keeps `just clippy` and CI aligned with the shipped artifact while
    preserving a separate source-build path for people actively hacking on
    the lint crate.
    
    The current alpha package also exposed two integration wrinkles that the
    repo-side prebuilt wrapper needs to smooth over:
    - the bundled Dylint library filename includes the host triple, for
    example `@nightly-2025-09-18-aarch64-apple-darwin`, and Dylint derives
    `RUSTUP_TOOLCHAIN` from that filename
    - on Windows, Dylint's driver path also expects `RUSTUP_HOME` to be
    present in the environment
    
    Without those adjustments, the prebuilt CI jobs fail during `cargo
    metadata` or driver setup. This change makes the checked-in prebuilt
    wrapper normalize the packaged library name to the plain
    `nightly-2025-09-18` channel before invoking `cargo-dylint`, and it
    teaches both the wrapper and the packaged runner source to infer
    `RUSTUP_HOME` from `rustup show home` when the environment does not
    already provide it.
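
    The filename normalization can be illustrated with a small helper. This
    is a sketch rather than the wrapper's actual logic: the function name is
    hypothetical, and it assumes the input is only the toolchain portion of
    the library filename (the text after `@`, without a file extension).

    ```rust
    /// Returns the plain `nightly-YYYY-MM-DD` channel if `toolchain` starts with
    /// one, dropping any trailing `-<host-triple>` suffix.
    pub fn plain_nightly_channel(toolchain: &str) -> Option<&str> {
        const PREFIX: &str = "nightly-";
        const DATE_LEN: usize = "YYYY-MM-DD".len();

        let date = toolchain.strip_prefix(PREFIX)?.get(..DATE_LEN)?;
        // Digits everywhere except the two `-` separators at offsets 4 and 7.
        let well_formed = date.bytes().enumerate().all(|(i, b)| match i {
            4 | 7 => b == b'-',
            _ => b.is_ascii_digit(),
        });

        // The channel must end here or be followed by `-<host-triple>`.
        let channel_len = PREFIX.len() + DATE_LEN;
        let rest = &toolchain[channel_len..];
        (well_formed && (rest.is_empty() || rest.starts_with('-')))
            .then(|| &toolchain[..channel_len])
    }
    ```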
    
    After the prebuilt Windows lint job started running successfully, it
    also surfaced a handful of existing anonymous literal callsites in
    `windows-sandbox-rs`. This PR now annotates those callsites so the new
    cross-platform lint job is green on the current tree.
    
    ## What Changed
    - checked in the current
    `tools/argument-comment-lint/argument-comment-lint` DotSlash manifest
    - kept `tools/argument-comment-lint/run.sh` as the source-build wrapper
    for lint development
    - added `tools/argument-comment-lint/run-prebuilt-linter.sh` as the
    normal enforcement path, using the checked-in DotSlash package and
    bundled `cargo-dylint`
    - updated `just clippy` and `just argument-comment-lint` to use the
    prebuilt wrapper
    - split `.github/workflows/rust-ci.yml` so source-package checks live in
    a dedicated `argument_comment_lint_package` job, while the released lint
    runs in an `argument_comment_lint_prebuilt` matrix on Linux, macOS, and
    Windows
    - kept the pinned `nightly-2025-09-18` toolchain install in the prebuilt
    CI matrix, since the prebuilt package still relies on rustup-provided
    toolchain components
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` to
    normalize host-qualified nightly library filenames, keep the `rustup`
    shim directory ahead of direct toolchain `cargo` binaries, and export
    `RUSTUP_HOME` when needed for Windows Dylint driver setup
    - updated `tools/argument-comment-lint/src/bin/argument-comment-lint.rs`
    so future published DotSlash artifacts apply the same nightly-filename
    normalization and `RUSTUP_HOME` inference internally
    - fixed the remaining Windows lint violations in
    `codex-rs/windows-sandbox-rs` by adding the required `/*param*/`
    comments at the reported callsites
    - documented the checked-in DotSlash file, wrapper split, archive
    layout, nightly prerequisite, and Windows `RUSTUP_HOME` requirement in
    `tools/argument-comment-lint/README.md`
  • Move TUI on top of app server (parallel code) (#14717)
    This PR replicates the `tui` code directory and creates a temporary
    parallel `tui_app_server` directory. It also implements a new feature
    flag `tui_app_server` to select between the two tui implementations.
    
    Once the new app-server-based TUI is stabilized, we'll delete the old
    `tui` directory and feature flag.
  • client: extend custom CA handling across HTTPS and websocket clients (#14239)
    ## Stacked PRs
    
    This work is now effectively split across two steps:
    
    - #14178: add custom CA support for browser and device-code login flows,
    docs, and hermetic subprocess tests
    - #14239: extend that shared custom CA handling across Codex HTTPS
    clients and secure websocket TLS
    
    Note: #14240 was merged into this branch while it was stacked on top of
    this PR. This PR now subsumes that websocket follow-up and should be
    treated as the combined change.
    
    Builds on top of #14178.
    
    ## Problem
    
    Custom CA support landed first in the login path, but the real
    requirement is broader. Codex constructs outbound TLS clients in
    multiple places, and both HTTPS and secure websocket paths can fail
    behind enterprise TLS interception if they do not honor
    `CODEX_CA_CERTIFICATE` or `SSL_CERT_FILE` consistently.
    
    This PR broadens the shared custom-CA logic beyond login and applies the
    same policy to websocket TLS, so the enterprise-proxy story is no longer
    split between “HTTPS works” and “websockets still fail”.
    
    ## What This Delivers
    
    Custom CA support is no longer limited to login. Codex outbound HTTPS
    clients and secure websocket connections can now honor the same
    `CODEX_CA_CERTIFICATE` / `SSL_CERT_FILE` configuration, so enterprise
    proxy/intercept setups work more consistently end-to-end.
    
    For users and operators, nothing new needs to be configured beyond the
    same CA env vars introduced in #14178. The change is that more of Codex
    now respects them, including websocket-backed flows that were previously
    still using default trust roots.
    
    I also manually validated the proxy path locally with mitmproxy using:
    `CODEX_CA_CERTIFICATE=~/.mitmproxy/mitmproxy-ca-cert.pem
    HTTPS_PROXY=http://127.0.0.1:8080 just codex`
    with mitmproxy installed via `brew install mitmproxy` and configured as
    the macOS system proxy.
    
    ## Mental model
    
    `codex-client` is now the owner of shared custom-CA policy for outbound
    TLS client construction. Reqwest callers start from the builder
    configuration they already need, then pass that builder through
    `build_reqwest_client_with_custom_ca(...)`. Websocket callers ask the
    same module for a rustls client config when a custom CA bundle is
    configured.
    
    The env precedence is the same everywhere:
    - `CODEX_CA_CERTIFICATE` wins
    - otherwise fall back to `SSL_CERT_FILE`
    - otherwise use system roots
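
    As a sketch, the precedence above reduces to a single match. The
    `CaSource` enum and `select_ca_source` function are hypothetical names,
    not the `codex-client` API, and the sketch treats any set value as
    selected because how empty values are handled is not stated here.

    ```rust
    /// Hypothetical result type: which source supplies the trust roots.
    #[derive(Debug, PartialEq, Eq)]
    pub enum CaSource<'a> {
        CodexCaCertificate(&'a str),
        SslCertFile(&'a str),
        SystemRoots,
    }

    /// Picks the CA bundle source from the two env var values, if set.
    pub fn select_ca_source<'a>(
        codex_ca_certificate: Option<&'a str>,
        ssl_cert_file: Option<&'a str>,
    ) -> CaSource<'a> {
        match (codex_ca_certificate, ssl_cert_file) {
            // `CODEX_CA_CERTIFICATE` wins whenever it is present.
            (Some(path), _) => CaSource::CodexCaCertificate(path),
            (None, Some(path)) => CaSource::SslCertFile(path),
            (None, None) => CaSource::SystemRoots,
        }
    }
    ```

    Taking the values as parameters instead of reading the environment
    directly keeps the rule testable without mutating process env.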
    
    The helper is intentionally narrow. It loads every usable certificate
    from the configured PEM bundle into the appropriate root store and
    returns either a configured transport or a typed error that explains
    what went wrong.
    
    ## Non-goals
    
    This does not add handshake-level integration tests against a live TLS
    endpoint. It does not validate that the configured bundle forms a
    meaningful certificate chain. It also does not try to force every
    transport in the repo through one abstraction; it extends the shared CA
    policy across the reqwest and websocket paths that actually needed it.
    
    ## Tradeoffs
    
    The main tradeoff is centralizing CA behavior in `codex-client` while
    still leaving adoption up to call sites. That keeps the implementation
    additive and reviewable, but it means the rule "outbound Codex TLS that
    should honor enterprise roots must use the shared helper" is still
    partly enforced socially rather than by types.
    
    For websockets, the shared helper only builds an explicit rustls config
    when a custom CA bundle is configured. When no override env var is set,
    websocket callers still use their ordinary default connector path.
    
    ## Architecture
    
    `codex-client::custom_ca` now owns CA bundle selection, PEM
    normalization, mixed-section parsing, certificate extraction, typed
    CA-loading errors, and optional rustls client-config construction for
    websocket TLS.
    
    The affected consumers now call into that shared helper directly rather
    than carrying login-local CA behavior:
    - backend-client
    - cloud-tasks
    - RMCP client paths that use `reqwest`
    - TUI voice HTTP paths
    - `codex-core` default reqwest client construction
    - `codex-api` websocket clients for both responses and realtime
    websocket connections
    
    The subprocess CA probe, env-sensitive integration tests, and shared PEM
    fixtures also live in `codex-client`, which is now the actual owner of
    the behavior they exercise.
    
    ## Observability
    
    The shared CA path logs:
    - which environment variable selected the bundle
    - which path was loaded
    - how many certificates were accepted
    - when `TRUSTED CERTIFICATE` labels were normalized
    - when CRLs were ignored
    - where client construction failed
    
    Returned errors remain user-facing and include the relevant env var,
    path, and remediation hint. That same error model now applies whether
    the failure surfaced while building a reqwest client or websocket TLS
    configuration.
    
    ## Tests
    
    Pure unit tests in `codex-client` cover env precedence and PEM
    normalization behavior. Real client construction remains in subprocess
    tests so the suite can control process env and avoid the macOS seatbelt
    panic path that motivated the hermetic test split.
    
    The subprocess coverage verifies:
    - `CODEX_CA_CERTIFICATE` precedence over `SSL_CERT_FILE`
    - fallback to `SSL_CERT_FILE`
    - single-cert and multi-cert bundles
    - malformed and empty-file errors
    - OpenSSL `TRUSTED CERTIFICATE` handling
    - CRL tolerance for well-formed CRL sections
    
    The websocket side is covered by the existing `codex-api` / `codex-core`
    websocket test suites plus the manual mitmproxy validation above.
    
    ---------
    
    Co-authored-by: Ivan Zakharchanka <3axap4eHko@gmail.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat: discourage the use of the --all-features flag (#12429)
    ## Why
    
    Developers are frequently running low on disk space, and routine use of
    `--all-features` contributes to larger Cargo build caches in `target/`
    by compiling additional feature combinations.
    
    This change updates local workflow guidance to avoid `--all-features` by
    default and reserve it for cases where full feature coverage is
    specifically needed.
    
    ## What Changed
    
    - Updated `AGENTS.md` guidance for `codex-rs` to recommend `cargo test`
    / `just test` for full-suite local runs, and to call out the disk-usage
    cost of routine `--all-features` usage.
    - Updated the root `justfile` so `just fix` and `just clippy` no longer
    pass `--all-features` by default.
    - Updated `docs/install.md` to explicitly describe `cargo test
    --all-features` as an optional heavier-weight run (more build time and
    `target/` disk usage).
    
    ## Verification
    
    - Confirmed the `justfile` parses and the recipes list successfully with
    `just --list`.
  • bazel: enforce MODULE.bazel.lock sync with Cargo.lock (#11790)
    ## Why this change
    
    When Cargo dependencies change, it is easy to end up with an unexpected
    local diff in `MODULE.bazel.lock` after running Bazel. That creates
    noisy working copies and pushes lockfile fixes later in the cycle. This
    change addresses that pain point directly.
    
    ## What this change enforces
    
    The expected invariant is: after dependency updates, `MODULE.bazel.lock`
    is already in sync with Cargo resolution. In practice, running
    `bazel mod deps` should not mutate the lockfile in a clean state. If it
    does, the dependency update is incomplete.
    
    ## How this is enforced
    
    This change adds a single lockfile check script that snapshots
    `MODULE.bazel.lock`, runs `bazel mod deps`, and fails if the file changes.
    The same check is wired into local workflow commands
    (`just bazel-lock-update` and `just bazel-lock-check`) and into Bazel CI
    (Linux x86_64 job) so drift is caught early and consistently. The
    developer documentation is updated in `codex-rs/docs/bazel.md` and
    `AGENTS.md` to make the expected flow explicit.
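    
    The snapshot-and-compare idea can be sketched in Rust as follows. This is
    only an illustration, not the script added by this PR: the function names
    `lockfile_unchanged` and `run_bazel_mod_deps` are hypothetical, and the
    update step is passed in as a closure so the comparison logic can be
    exercised without Bazel installed.
    
    ```rust
    use std::fs;
    use std::io;
    use std::path::Path;
    use std::process::Command;
    
    /// Hypothetical helper: snapshots `lockfile`, runs `update`, and reports
    /// whether the file is byte-for-byte identical afterwards.
    fn lockfile_unchanged<F>(lockfile: &Path, update: F) -> io::Result<bool>
    where
        F: FnOnce() -> io::Result<()>,
    {
        let before = fs::read(lockfile)?;
        update()?;
        let after = fs::read(lockfile)?;
        Ok(before == after)
    }
    
    /// Hypothetical update step: runs `bazel mod deps` in the current directory
    /// and turns a non-zero exit status into an error.
    fn run_bazel_mod_deps() -> io::Result<()> {
        let status = Command::new("bazel").args(["mod", "deps"]).status()?;
        if status.success() {
            Ok(())
        } else {
            Err(io::Error::new(io::ErrorKind::Other, "bazel mod deps failed"))
        }
    }
    ```
    
    A check built this way would call
    `lockfile_unchanged(Path::new("MODULE.bazel.lock"), run_bazel_mod_deps)`
    and fail when the result is `Ok(false)`.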
    
    `MODULE.bazel.lock` is also refreshed in this PR to match the current
    Cargo dependency resolution.
    
    ## Expected developer workflow
    
    After changing `Cargo.toml` or `Cargo.lock`, run
    `just bazel-lock-update`, then run `just bazel-lock-check`, and include
    any resulting `MODULE.bazel.lock` update in the same change.
    
    ## Testing
    
    Ran `just bazel-lock-check` locally.
  • docs: require insta snapshot coverage for UI changes (#10669)
    Adds an explicit requirement in AGENTS.md that any user-visible UI
    change includes corresponding insta snapshot coverage and that snapshots
    are reviewed/accepted in the PR.
    
    Tests: N/A (docs only)
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_extended_history` flag to
    preserve those events when you start/resume/fork a thread (defaults to
    `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
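    
    As a rough sketch of how such a mode can gate persistence: only the
    `EventPersistenceMode`, `Limited`, and `Extended` names come from the
    description above. The `Event` enum and `should_persist` function are
    hypothetical stand-ins, and the real `EventMsg` has many more variants.
    
    ```rust
    /// Which events the rollout recorder keeps. `Limited` is the default.
    #[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
    enum EventPersistenceMode {
        #[default]
        Limited,
        Extended,
    }
    
    /// Simplified, hypothetical stand-in for `EventMsg`.
    #[derive(Debug)]
    enum Event {
        UserMessage,
        AssistantMessage,
        CommandExecution,
        WebSearch,
    }
    
    /// Hypothetical filter: the small existing subset is always persisted,
    /// while the richer events are kept only in `Extended` mode.
    fn should_persist(mode: EventPersistenceMode, event: &Event) -> bool {
        match event {
            Event::UserMessage | Event::AssistantMessage => true,
            Event::CommandExecution | Event::WebSearch => {
                mode == EventPersistenceMode::Extended
            }
        }
    }
    ```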
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
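    
    To illustrate what a byte-bounded cap involves, here is a minimal sketch.
    It is not the real `truncate_text` from core, whose signature and behavior
    are not shown here; `truncate_to_bytes` and `MAX_OUTPUT_BYTES` are
    hypothetical names. The main subtlety is cutting on a UTF-8 character
    boundary so the stored output stays valid text.
    
    ```rust
    /// Hypothetical constant mirroring the 10,000-byte bound mentioned above.
    const MAX_OUTPUT_BYTES: usize = 10_000;
    
    /// Returns a prefix of `text` that is at most `max_bytes` long and ends on
    /// a UTF-8 character boundary.
    fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
        if text.len() <= max_bytes {
            return text;
        }
        // Walk back until the cut does not split a multi-byte character.
        // Index 0 is always a boundary, so this loop terminates.
        let mut end = max_bytes;
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }
    ```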
    
    We also persist `EventMsg::Error`, which we can now map back to the
    Turn's status and use to populate the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (and similarly for apply_patch). These
    EventMsgs were never persisted before, so I made `status` a required
    field.
  • fix(app-server): fix TS annotations for optional fields on requests (#10412)
    This updates our generated TypeScript types to be more correct with how
    the server actually behaves, **specifically for JSON-RPC requests**.
    
    Before this PR, we'd generate `field: T | null`. After this PR, we will
    have `field?: T | null`. The latter matches how the server actually
    works, in that if an optional field is omitted, the server will treat it
    as null. This also makes it less annoying in theory for clients to
    upgrade to newer versions of Codex, since adding a new optional field to
    a JSON-RPC request should not require a client change.
    
    NOTE: This only applies to JSON-RPC requests. All other payloads (i.e.
    responses, notifications) will return `field: T | null` as usual.
  • Reject request_user_input outside Plan/Pair (#9955)
    ## Context
    
    Previous work in https://github.com/openai/codex/pull/9560 only rejected
    `request_user_input` in Execute and Custom modes. Since then, additional
    modes (e.g., Code) were added, so the guard should be mode-agnostic.
    
    ## What changed
    
    - Switch the handler to an allowlist: only Plan and PairProgramming are allowed
    - Return the same error for any other mode (including Code)
    - Add a Code-mode rejection test alongside the existing Execute/Custom tests
    
    ## Why
    
    This prevents `request_user_input` from being used in modes where it is
    not intended, even as new modes are introduced.
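    
    The difference between an allowlist and the earlier denylist can be
    sketched as follows. The mode names come from the description above, but
    the `Mode` enum, the `check_request_user_input` function, and the error
    text are hypothetical and do not reflect the actual handler.
    
    ```rust
    /// Hypothetical enum listing the modes named in this PR.
    #[derive(Clone, Copy, Debug)]
    enum Mode {
        Plan,
        PairProgramming,
        Execute,
        Custom,
        Code,
    }
    
    /// Allowlist check: only the listed modes pass. Every other mode falls
    /// through to the same error, so the guard itself needs no update when
    /// new modes appear.
    fn check_request_user_input(mode: Mode) -> Result<(), String> {
        match mode {
            Mode::Plan | Mode::PairProgramming => Ok(()),
            other => Err(format!(
                "request_user_input is unavailable in {other:?} mode"
            )),
        }
    }
    ```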
  • chore: tweak AGENTS.md (#9650)
    ## Summary
    Update AGENTS.md to improve testing flow
    
    ## Testing
    - [x] Tested locally, much faster
  • don't ask for approval for just fix (#9586)
    It blocks all my skills from executing because it asks to run `just fmt`.
    It's a quick command that doesn't need approval.
    
    
    <img width="967" height="120" alt="image"
    src="https://github.com/user-attachments/assets/f8e6ca76-a650-49e9-beb2-ce98ba48d310"
    />
  • add generated jsonschema for config.toml (#8956)
    ### What
    Add JSON Schema generation for `config.toml`, with checked‑in
    `docs/config.schema.json`. We can move the schema elsewhere if preferred
    (and host it if there's demand).
    
    Add fixture test to prevent drift and `just write-config-schema` to
    regenerate on schema changes.
    
    Generate MCP config schema from `RawMcpServerConfig` instead of
    `McpServerConfig` because that is the runtime type used for
    deserialization.
    
    Populate feature flag values into generated schema so they can be
    autocompleted.
    
    ### Tests
    Added tests + regenerate script to prevent drift. Tested autocompletions
    using generated jsonschema locally with Even Better TOML.
    
    
    
    https://github.com/user-attachments/assets/5aa7cd39-520c-4a63-96fb-63798183d0bc
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
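    
    A simplified sketch of this mechanism is shown below. It is not the macro
    from this PR: `resolve_resource` and `find_resource_sketch!` are
    hypothetical names, the Cargo fallback is simplified, and the real
    runfiles layout may contain additional path segments. The point is that
    `option_env!` is expanded in the crate that invokes the macro, while
    `RUNFILES_DIR` is read at runtime.
    
    ```rust
    use std::path::PathBuf;
    
    /// Hypothetical helper: under Bazel (runtime `RUNFILES_DIR` set and
    /// compile-time `BAZEL_PACKAGE` known) join the two; otherwise fall back
    /// to the Cargo manifest directory.
    fn resolve_resource(
        runfiles_dir: Option<&str>,
        bazel_package: Option<&str>,
        cargo_manifest_dir: &str,
        relative: &str,
    ) -> PathBuf {
        match (runfiles_dir, bazel_package) {
            (Some(root), Some(package)) => {
                PathBuf::from(root).join(package).join(relative)
            }
            _ => PathBuf::from(cargo_manifest_dir).join(relative),
        }
    }
    
    /// Because this is a macro, both `option_env!` calls are evaluated while
    /// compiling the caller's crate, not the crate that defines the macro.
    macro_rules! find_resource_sketch {
        ($relative:expr) => {
            resolve_resource(
                std::env::var("RUNFILES_DIR").ok().as_deref(),
                option_env!("BAZEL_PACKAGE"),
                option_env!("CARGO_MANIFEST_DIR").unwrap_or("."),
                $relative,
            )
        };
    }
    ```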
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
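    
    The distinction matters because canonicalizing touches the filesystem and
    fails when an intermediate component such as `common` does not exist,
    whereas normalizing is purely lexical. The PR uses the `path-absolutize`
    crate for this; the following std-only sketch, with the hypothetical name
    `normalize_lexically`, only illustrates the lexical idea and does not make
    the path absolute.
    
    ```rust
    use std::path::{Component, Path, PathBuf};
    
    /// Hypothetical helper: resolves `.` and `..` components textually,
    /// without checking whether any part of the path exists on disk.
    fn normalize_lexically(path: &Path) -> PathBuf {
        let mut out = PathBuf::new();
        for component in path.components() {
            match component {
                Component::CurDir => {}
                Component::ParentDir => {
                    let last_is_normal = matches!(
                        out.components().next_back(),
                        Some(Component::Normal(_))
                    );
                    let at_root = matches!(
                        out.components().next_back(),
                        Some(Component::RootDir | Component::Prefix(_))
                    );
                    if last_is_normal {
                        // `a/b/..` collapses to `a`.
                        out.pop();
                    } else if !at_root {
                        // Nothing to cancel: keep a leading `..`.
                        out.push("..");
                    }
                    // At the root, `..` is a no-op.
                }
                other => out.push(other.as_os_str()),
            }
        }
        out
    }
    ```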
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
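    
    The resolution step can be pictured roughly like this. The function
    `resolve_bin_path` and its `base` parameter are hypothetical; the actual
    crate API and the way it finds the project root are not shown here.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical helper: keeps an absolute `CARGO_BIN_EXE_*` value as-is
    /// (the Cargo case) and resolves a relative one against `base` (the
    /// Buck2-style case).
    fn resolve_bin_path(value: &str, base: &Path) -> PathBuf {
        let path = Path::new(value);
        if path.is_absolute() {
            path.to_path_buf()
        } else {
            base.join(path)
        }
    }
    ```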
    
    
  • docs: remove blanket ban on unsigned integers (#7957)
    Drop the AGENTS.md rule that forbids unsigned ints. The blanket guidance
    causes unnecessary complexity in cases where values are naturally
    unsigned, leading to extra clamping/conversion code instead of using
    checked or saturating arithmetic where needed.
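    
    For example, a naturally unsigned quantity can stay unsigned while
    underflow is still handled explicitly. The function names below are
    hypothetical; `saturating_sub` and `checked_sub` are standard library
    methods on the unsigned integer types.
    
    ```rust
    /// Clamps at zero instead of wrapping or panicking on underflow.
    fn remaining_budget(limit: u32, used: u32) -> u32 {
        limit.saturating_sub(used)
    }
    
    /// Returns `None` on underflow so the caller decides what to do.
    fn try_remaining_budget(limit: u32, used: u32) -> Option<u32> {
        limit.checked_sub(used)
    }
    ```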
  • feat(core) Add login to shell_command tool (#6846)
    ## Summary
    Adds the `login` parameter to the `shell_command` tool - optional,
    defaults to true.
    
    ## Testing
    - [x] Tested locally
  • Prefer wait_for_event over wait_for_event_with_timeout. (#6346)
    No need to specify the timeout in most cases.
  • Auto compact at ~90% (#5292)
    Users now hit a "window exceeded" limit and usually don't know what to do
    next. This starts auto compact at ~90% of the window.
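    
    A threshold check of this kind might look like the sketch below. The
    function name, the use of token counts as the unit, and the exact 90%
    cutoff are assumptions for illustration; the commit message only says
    "~90% of the window".
    
    ```rust
    /// Hypothetical check: true once usage reaches 90% of the window.
    /// Integer form of `tokens_used / window >= 0.9`, widened to `u128` so
    /// the multiplication cannot overflow.
    fn should_auto_compact(tokens_used: u64, window: u64) -> bool {
        (tokens_used as u128) * 10 >= (window as u128) * 9
    }
    ```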
  • [MCP] Add support for resources (#5239)
    This PR adds support for [MCP
    resources](https://modelcontextprotocol.io/specification/2025-06-18/server/resources)
    by adding three new tools for the model:
    1. `list_resources`
    2. `list_resource_templates`
    3. `read_resource`
    
    These 3 tools correspond to the [three primary MCP resource protocol
    messages](https://modelcontextprotocol.io/specification/2025-06-18/server/resources#protocol-messages).
    
    Example of listing and reading a GitHub resource template:
    <img width="2984" height="804" alt="CleanShot 2025-10-15 at 17 31 10"
    src="https://github.com/user-attachments/assets/89b7f215-2e2a-41c5-90dd-b932ac84a585"
    />
    
    `/mcp` with Figma configured
    <img width="2984" height="442" alt="CleanShot 2025-10-15 at 18 29 35"
    src="https://github.com/user-attachments/assets/a7578080-2ed2-4c59-b9b4-d8461f90d8ee"
    />
    
    Fixes #4956
  • Simplify request body assertions (#4845)
    We'll have a lot more tests like these.
  • chore: subject docs/*.md to Prettier checks (#4645)
    Apparently we were not running our `pnpm run prettier` check in CI, so
    many files that were covered by the existing Prettier check were not
    well-formatted.
    
    This updates CI and formats the files.