37 Commits

  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
  • Guard core test subprocess cleanup (#27343)
    ## Why
    
    Local integration-heavy `codex-core` CLI tests can time out or be
    interrupted after spawning `codex exec`. Stopping only the direct child
    is not enough: `codex exec` can leave grandchildren behind, including
    `python3`/`python3.12` processes that get reparented to PID 1 and keep
    running after the test is gone.
    
    This PR fixes that failure mode directly for the affected CLI
    integration tests, without changing production code or reducing local
    test concurrency.
    
    ## What
    
    - Run the `cli_stream` `codex exec` subprocesses through a small private
    wrapper in `core/tests/suite/cli_stream.rs`.
    - Spawn those subprocesses in their own process group before execution.
    - Keep `.output()`-style stdout/stderr capture and the existing
    30-second timeout behavior.
    - Own each spawned process with a drop guard that kills the whole
    process group on success, timeout, panic, or other early return.
    
    The switch from `assert_cmd::Command` to `std::process::Command` is only
    for these subprocess launches; `assert_cmd` does not expose a pre-spawn
    hook for setting the process group.
    
    ## Verification
    
    - `just test -p codex-core --test all responses_mode_stream_cli`
    
    This is limited to core integration tests; it does not change production
    `src` code paths.
  • [codex-rs] support v2 personal access tokens (#25731)
    ## Summary
    
    - add v2 personal access token support for `codex login
    --with-access-token` and `CODEX_ACCESS_TOKEN`
    - classify opaque `at-` tokens separately from legacy Agent Identity
    JWTs
    - hydrate required ChatGPT account metadata through AuthAPI
    `/v1/user-auth-credential/whoami`
    - use PATs directly as bearer tokens while preserving existing ChatGPT
    account surfaces
    - expose PAT-backed auth as the explicit `personalAccessToken`
    app-server auth mode
    
    ## Implementation
    
    PAT auth is intentionally small and stateless. Loading a PAT performs
    one AuthAPI metadata request, stores the hydrated metadata in the
    in-memory auth object, and redacts the secret from debug output. Legacy
    Agent Identity JWT handling remains unchanged. The shared access-token
    classifier lives in a private neutral module because it dispatches
    between both credential types.
    
    PAT hydration fails closed when AuthAPI omits any required metadata,
    including email. Hydrated metadata is intentionally not persisted:
    startup performs a live `whoami` preflight so revoked tokens or changed
    account metadata are not accepted from a stale cache.
    
    ## Workspace restriction scope
    
    This change intentionally does **not** apply
    `forced_chatgpt_workspace_id` to PAT authentication. The setting is a
    client-side config guardrail, not an authorization boundary, and PAT
    does not currently require workspace-ID parity. The PAT login and
    `CODEX_ACCESS_TOKEN` paths therefore validate through AuthAPI without
    threading workspace-restriction state through access-token loading.
    Existing workspace checks for non-PAT auth remain on their established
    paths.
    
    ## App-server compatibility
    
    The public app-server `AuthMode` is shared across v1 and v2, and
    PAT-backed auth reports `personalAccessToken` through both APIs.
    Following human review, this intentionally removes the temporary v1
    compatibility mapping that reported PATs as `chatgpt`; the deprecated v1
    API is kept in parity with v2 rather than maintaining a separate closed
    enum. Clients with exhaustive auth-mode handling in either API version
    must add the new case and should generally treat it as ChatGPT-backed
    unless they need PAT-specific behavior.
    
    The v1 auth-status response still omits the raw PAT when `includeToken`
    is requested because that response cannot carry the account metadata
    needed to reuse the credential safely. Persisted PAT auth also omits the
    new enum value so older Codex builds can deserialize `auth.json` and
    infer PAT auth from the credential field after a rollback.
    
    ## Validation
    
    Latest review-fix validation:
    
    - `CARGO_INCREMENTAL=0 just test -p codex-login` (126 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli` (263 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli
    stored_auth_validation_handles_personal_access_token`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` (226
    passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-models-manager
    refresh_available_models_uses_remote_only_catalog_for_chatgpt_auth`
    - `CARGO_INCREMENTAL=0 just test -p codex-tui
    existing_non_oauth_chatgpt_login_counts_as_signed_in`
    - `CARGO_INCREMENTAL=0 just fix -p codex-login -p
    codex-app-server-protocol -p codex-models-manager -p codex-tui -p
    codex-cli`
    - `just fmt`
    - `git diff --check`
    
    The broader `codex-tui` suite previously compiled and ran 2,834 tests.
    Three unrelated environment-sensitive guardian/IDE-socket tests failed
    after retries; the PAT-relevant TUI coverage passed.
  • fix: drop flake (#24588)
    Dropping already commented out stuff
  • cli: rename profile v2 flag to --profile (#23883)
    ## Why
    
    Profile v2 is taking over the user-facing profile selection path, so the
    CLI no longer needs to expose the transitional `--profile-v2` spelling.
    This switches the public args surface to `--profile` before the
    remaining legacy profile plumbing is removed separately.
    
    ## What
    
    - Rebind `--profile` and `-p` to the v2 profile name argument that
    selects `$CODEX_HOME/<name>.config.toml`.
    - Stop parsing the legacy shared CLI profile argument while keeping its
    implementation path in place for follow-up cleanup.
    - Update CLI validation, profile-name parse errors, and the
    legacy-profile collision message/tests to refer to `--profile`.
    
    ## Testing
    
    - `cargo test -p codex-cli -p codex-config -p codex-protocol -p
    codex-utils-cli`
  • Remove CODEX_RS_SSE_FIXTURE test hook (#22413)
    ## Why
    
    `CODEX_RS_SSE_FIXTURE` let integration-style CLI, exec, and TUI tests
    bypass the normal Responses transport by reading SSE from local files.
    That kept test-only behavior wired through production client code. The
    affected tests can stay hermetic by using the existing
    `core_test_support::responses` mock server and passing `openai_base_url`
    instead.
    
    ## What Changed
    
    - Removed the `CODEX_RS_SSE_FIXTURE` flag,
    `codex_api::stream_from_fixture`, the `env-flags` dependency, and the
    checked-in SSE fixture files.
    - Repointed the affected core, exec, and TUI tests at `MockServer` with
    the existing SSE event constructors.
    - Removed the Bazel test data plumbing for the deleted fixtures and
    refreshed cargo/Bazel lock state.
    
    ## Verification
    
    - `cargo build -p codex-cli`
    - `cargo test -p codex-api`
    - `cargo test -p codex-core --test all responses_api_stream_cli`
    - `cargo test -p codex-core --test all
    integration_creates_and_checks_session_file`
    - `cargo test -p codex-exec --test all ephemeral`
    - `cargo test -p codex-exec --test all resume`
    - `cargo test -p codex-tui --test all
    resume_startup_does_not_consume_model_availability_nux_count`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just fix -p codex-api -p codex-core -p codex-exec -p codex-tui`
    - `git diff --check`
  • Remove OPENAI_BASE_URL config fallback (#16720)
    The `OPENAI_BASE_URL` environment variable has been a significant
    support issue, so we decided to deprecate it in favor of an
    `openai_base_url` config key. We've had the deprecation warning in place
    for about a month, so users have had time to migrate to the new
    mechanism. This PR removes support for `OPENAI_BASE_URL` entirely.
  • core: remove cross-crate re-exports from lib.rs (#16512)
    ## Why
    
    `codex-core` was re-exporting APIs owned by sibling `codex-*` crates,
    which made downstream crates depend on `codex-core` as a proxy module
    instead of the actual owner crate.
    
    Removing those forwards makes crate boundaries explicit and lets leaf
    crates drop unnecessary `codex-core` dependencies. In this PR, this
    reduces the dependency on `codex-core` to `codex-login` in the following
    files:
    
    ```
    codex-rs/backend-client/Cargo.toml
    codex-rs/mcp-server/tests/common/Cargo.toml
    ```
    
    ## What
    
    - Remove `codex-rs/core/src/lib.rs` re-exports for symbols owned by
    `codex-login`, `codex-mcp`, `codex-rollout`, `codex-analytics`,
    `codex-protocol`, `codex-shell-command`, `codex-sandboxing`,
    `codex-tools`, and `codex-utils-path`.
    - Delete the `default_client` forwarding shim in `codex-rs/core`.
    - Update in-crate and downstream callsites to import directly from the
    owning `codex-*` crate.
    - Add direct Cargo dependencies where callsites now target the owner
    crate, and remove `codex-core` from `codex-rs/backend-client`.
  • Move git utilities into a dedicated crate (#15564)
    - create `codex-git-utils` and move the shared git helpers into it with
    file moves preserved for diff readability
    - move the `GitInfo` helpers out of `core` so stacked rollout work can
    depend on the shared crate without carrying its own git info module
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Add openai_base_url config override for built-in provider (#12031)
    We regularly get bug reports from users who mistakenly have the
    `OPENAI_BASE_URL` environment variable set. This PR deprecates this
    environment variable in favor of a top-level config key
    `openai_base_url` that is used for the same purpose. By making it a
    config key, it will be more visible to users. It will also participate
    in all of the infrastructure we've added for layered and managed
    configs.
    
    Summary
    - introduce the `openai_base_url` top-level config key, update
    schema/tests, and route the built-in openai provider through it while
    - fall back to deprecated `OPENAI_BASE_URL` env var but warn user of
    deprecation when no `openai_base_url` config key is present
    - update CLI, SDK, and TUI code to prefer the new config path (with a
    deprecated env-var fallback) and document the SDK behavior change
  • Fix codex exec --profile handling (#14524)
    PR #14005 introduced a regression whereby `codex exec --profile`
    overrides were dropped when starting or resuming a thread. That causes
    the thread to miss profile-scoped settings like
    `model_instructions_file`.
    
    This PR preserve the active profile in the thread start/resume config
    overrides so the
    app-server rebuild sees the same profile that exec resolved. 
    
    Fixes #14515
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
    - `codex_protocol::protocol` and related protocol/model types (including
    `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
    - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
    parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • Leverage state DB metadata for thread summaries (#10621)
    Summary:
    - read conversation summaries and cwd info from the state DB when
    possible so we no longer rely on rollout files for metadata and avoid
    extra I/O
    - persist CLI version in thread metadata, surface it through summary
    builders, and add the necessary DB migration hooks
    - simplify thread listing by using enriched state DB data directly
    rather than reading rollout heads
    
    Testing:
    - Not run (not requested)
  • [bazel] Improve runfiles handling (#10098)
    we can't use runfiles directory on Windows due to path lengths, so swap
    to manifest strategy. Parsing the manifest is a bit complex and the
    format is changing in Bazel upstream, so pull in the official Rust
    library (via a small hack to make it importable...) and cleanup all the
    associated logic to work cleanly in both bazel and cargo without extra
    confusion
  • feat: rename experimental_instructions_file to model_instructions_file (#9555)
    A user who has `experimental_instructions_file` set will now see this:
    
    <img width="888" height="660" alt="image"
    src="https://github.com/user-attachments/assets/51c98312-eb9b-4881-81f1-bea6677e158d"
    />
    
    And a `codex exec` would include this warning:
    
    <img width="888" height="660" alt="image"
    src="https://github.com/user-attachments/assets/a89f62be-1edf-4593-a75e-e0b4a762ed7d"
    />
  • feat(app-server, core): return threads by created_at or updated_at (#9247)
    Add support for returning threads by either `created_at` OR `updated_at`
    descending. Previously core always returned threads ordered by
    `created_at`.
    
    This PR:
    - updates core to be able to list threads by `updated_at` OR
    `created_at` descending based on what the caller wants
    - also update `thread/list` in app-server to expose this (default to
    `created_at` if not specified)
    
    All existing codepaths (app-server, TUI) still default to `created_at`,
    so no behavior change is expected with this PR.
    
    **Implementation**
    To sort by `updated_at` is a bit nontrivial (whereas `created_at` is
    easy due to the way we structure the folders and filenames on disk,
    which are all based on `created_at`).
    
    The most naive way to do this without introducing a cache file or sqlite
    DB (which we have to implement/maintain) is to scan files in reverse
    `created_at` order on disk, and look at the file's mtime (last modified
    timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
    (currently set to 10,000). Then, we can return the most recent N
    threads.
    
    Based on some quick and dirty benchmarking on my machine with ~1000
    rollout files, calling `thread/list` with limit 50, the `updated_at`
    path is slower as expected due to all the I/O:
    - updated-at: average 103.10 ms
    - created-at: average 41.10 ms
    
    Those absolute numbers aren't a big deal IMO, but we can certainly
    optimize this in a followup if needed by introducing more state stored
    on disk.
    
    **Caveat**
    There's also a limitation in that any files older than `MAX_SCAN_FILES`
    will be excluded, which means if a user continues a REALLY old thread,
    it's possible to not be included. In practice that should not be too big
    of an issue.
    
    If a user makes...
    - 1000 rollouts/day → threads older than 10 days won't show up
    - 100 rollouts/day → ~100 days
    
    If this becomes a problem for some reason, even more motivation to
    implement an updated_at cache.
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • chore: speed-up pipeline (#5812)
    Speed-up pipeline by:
    * Decoupling tests and clippy
    * Use pre-built binary in tests
    * `sccache` for caching of the builds
  • feat: annotate conversations with model_provider for filtering (#5658)
    Because conversations that use the Responses API can have encrypted
    reasoning messages, trying to resume a conversation with a different
    provider could lead to confusing "failed to decrypt" errors. (This is
    reproducible by starting a conversation using ChatGPT login and resuming
    it as a conversation that uses OpenAI models via Azure.)
    
    This changes `ListConversationsParams` to take a `model_providers:
    Option<Vec<String>>` and adds `model_provider` on each
    `ConversationSummary` it returns so these cases can be disambiguated.
    
    Note this ended up making changes to
    `codex-rs/core/src/rollout/tests.rs` because it had a number of cases
    where it expected `Some` for the value of `next_cursor`, but the list of
    rollouts was complete, so according to this docstring:
    
    
    https://github.com/openai/codex/blob/bcd64c7e7231d6316a2377d1525a0fa74f21b783/codex-rs/app-server-protocol/src/protocol.rs#L334-L337
    
    If there are no more items to return, then `next_cursor` should be
    `None`. This PR updates that logic.
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5658).
    * #5803
    * #5793
    * __->__ #5658
  • test: reduce time dependency on test harness (#5053)
    Tightened the CLI integration tests to stop relying on wall-clock
    sleeps—new fs watcher helper waits for session files instead of timing
    out, and SSE mocks/fixtures make the flows deterministic.
  • Simplify request body assertions (#4845)
    We'll have a lot more test like these
  • Separate interactive and non-interactive sessions (#4612)
    Do not show exec session in VSCode/TUI selector.
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • enable-resume (#3537)
    Adding the ability to resume conversations.
    we have one verb `resume`. 
    
    Behavior:
    
    `tui`:
    `codex resume`: opens session picker
    `codex resume --last`: continue last message
    `codex resume <session id>`: continue conversation with `session id`
    
    `exec`:
    `codex resume --last`: continue last conversation
    `codex resume <session id>`: continue conversation with `session id`
    
    Implementation:
    - I added a function to find the path in `~/.codex/sessions/` with a
    `UUID`. This is helpful in resuming with session id.
    - Added the above mentioned flags
    - Added lots of testing
  • Move initial history to protocol (#3422)
    To fix an edge case of forking then resuming
    
    #3419
  • Introduce rollout items (#3380)
    This PR introduces Rollout items. This enable us to rollout eventmsgs
    and session meta.
    
    This is mostly #3214 with rebase on main
  • Generate more typescript types and return conversation id with ConversationSummary (#3219)
    This PR does multiple things that are necessary for conversation resume
    to work from the extension. I wanted to make sure everything worked so
    these changes wound up in one PR:
    1. Generate more ts types
    2. Resume rollout history files rather than create a new one every time
    it is resumed so you don't see a duplicate conversation in history for
    every resume. Chatted with @aibrahim-oai to verify this
    3. Return conversation_id in conversation summaries
    4. [Cleanup] Use serde and strong types for a lot of the rollout file
    parsing
  • chore: unify history loading (#2736)
    We have two ways of loading conversation with a previous history. Fork
    conversation and the experimental resume that we had before. In this PR,
    I am unifying their code path. The path is getting the history items and
    recording them in a brand new conversation. This PR also constraint the
    rollout recorder responsibilities to be only recording to the disk and
    loading from the disk.
    
    The PR also fixes a current bug when we have two forking in a row:
    History 1:
    <Environment Context>
    UserMessage_1
    UserMessage_2
    UserMessage_3
    
    **Fork with n = 1 (only remove one element)**
    History 2:
    <Environment Context>
    UserMessage_1
    UserMessage_2
    <Environment Context>
    
    **Fork with n = 1 (only remove one element)**
    History 2:
    <Environment Context>
    UserMessage_1
    UserMessage_2
    **<Environment Context>**
    
    This shouldn't happen but because we were appending the `<Environment
    Context>` after each spawning and it's considered as _user message_.
    Now, we don't add this message if restoring and old conversation.
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.