Commit Graph

25 Commits

  • Leverage state DB metadata for thread summaries (#10621)
    Summary:
    - read conversation summaries and cwd info from the state DB when
    possible so we no longer rely on rollout files for metadata and avoid
    extra I/O
    - persist CLI version in thread metadata, surface it through summary
    builders, and add the necessary DB migration hooks
    - simplify thread listing by using enriched state DB data directly
    rather than reading rollout heads
    
    Testing:
    - Not run (not requested)
  • [bazel] Improve runfiles handling (#10098)
    We can't use the runfiles directory on Windows due to path length
    limits, so swap to the manifest strategy. Parsing the manifest is a bit
    complex and the format is changing in Bazel upstream, so pull in the
    official Rust library (via a small hack to make it importable...) and
    clean up all the associated logic so it works in both Bazel and Cargo
    without extra confusion.
  • feat: rename experimental_instructions_file to model_instructions_file (#9555)
    A user who has `experimental_instructions_file` set will now see this:
    
    <img width="888" height="660" alt="image"
    src="https://github.com/user-attachments/assets/51c98312-eb9b-4881-81f1-bea6677e158d"
    />
    
    And a `codex exec` would include this warning:
    
    <img width="888" height="660" alt="image"
    src="https://github.com/user-attachments/assets/a89f62be-1edf-4593-a75e-e0b4a762ed7d"
    />
  • feat(app-server, core): return threads by created_at or updated_at (#9247)
    Add support for returning threads by either `created_at` OR `updated_at`
    descending. Previously core always returned threads ordered by
    `created_at`.
    
    This PR:
    - updates core to be able to list threads by `updated_at` OR
    `created_at` descending based on what the caller wants
    - also update `thread/list` in app-server to expose this (default to
    `created_at` if not specified)
    
    All existing codepaths (app-server, TUI) still default to `created_at`,
    so no behavior change is expected with this PR.
    
    **Implementation**
    Sorting by `updated_at` is a bit nontrivial (whereas `created_at` is
    easy due to the way we structure the folders and filenames on disk,
    which are all based on `created_at`).
    
    The most naive way to do this without introducing a cache file or sqlite
    DB (which we have to implement/maintain) is to scan files in reverse
    `created_at` order on disk, and look at the file's mtime (last modified
    timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
    (currently set to 10,000). Then, we can return the most recent N
    threads.
    
    Based on some quick and dirty benchmarking on my machine with ~1000
    rollout files, calling `thread/list` with limit 50, the `updated_at`
    path is slower as expected due to all the I/O:
    - updated-at: average 103.10 ms
    - created-at: average 41.10 ms
    
    Those absolute numbers aren't a big deal IMO, but we can certainly
    optimize this in a followup if needed by introducing more state stored
    on disk.
    
    **Caveat**
    There's also a limitation: any files beyond the newest `MAX_SCAN_FILES`
    (by `created_at`) are excluded from the scan, so if a user continues a
    REALLY old thread, it may not be included in the results. In practice
    that should not be too big of an issue.
    
    If a user makes...
    - 1000 rollouts/day → threads older than 10 days won't show up
    - 100 rollouts/day → ~100 days
    
    If this becomes a problem for some reason, even more motivation to
    implement an updated_at cache.
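    
    The scan described above can be sketched roughly as follows. This is a
    minimal illustration, not the code in this PR: `RolloutFile`,
    `list_by_updated_at`, and `visible_days` are hypothetical names, and the
    files are passed in memory rather than read from disk. Only
    `MAX_SCAN_FILES` and its value of 10,000 come from the description.
    
    ```rust
    use std::time::{Duration, SystemTime};
    
    /// Hypothetical stand-in for a rollout file discovered on disk.
    #[derive(Debug, Clone, PartialEq)]
    struct RolloutFile {
        name: String,
        created_at: SystemTime,
        updated_at: SystemTime, // the filesystem mtime in the description
    }
    
    /// Scan cap; the description says it is currently 10,000.
    const MAX_SCAN_FILES: usize = 10_000;
    
    /// Return the `limit` most recently updated files, inspecting only the
    /// newest `max_scan` files by `created_at`.
    fn list_by_updated_at(
        mut files: Vec<RolloutFile>,
        limit: usize,
        max_scan: usize,
    ) -> Vec<RolloutFile> {
        // Walk in reverse `created_at` order, which the on-disk layout allows.
        files.sort_by(|a, b| b.created_at.cmp(&a.created_at));
        // Anything past the cap is never inspected (the caveat above).
        files.truncate(max_scan);
        // Re-sort the scanned window by mtime, newest first.
        files.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
        files.truncate(limit);
        files
    }
    
    /// Rough age in days beyond which a thread falls outside the scan window.
    fn visible_days(rollouts_per_day: usize) -> usize {
        MAX_SCAN_FILES / rollouts_per_day
    }
    
    fn main() {
        let t = |secs: u64| SystemTime::UNIX_EPOCH + Duration::from_secs(secs);
        let files = vec![
            RolloutFile { name: "a".into(), created_at: t(1), updated_at: t(100) },
            RolloutFile { name: "b".into(), created_at: t(2), updated_at: t(20) },
            RolloutFile { name: "c".into(), created_at: t(3), updated_at: t(30) },
        ];
        let names = |v: Vec<RolloutFile>| v.into_iter().map(|f| f.name).collect::<Vec<_>>();
        // Generous cap: the old but recently updated thread "a" comes first.
        println!("{:?}", names(list_by_updated_at(files.clone(), 2, MAX_SCAN_FILES))); // ["a", "c"]
        // Cap of 2: "a" is never scanned, so it drops out.
        println!("{:?}", names(list_by_updated_at(files, 2, 2))); // ["c", "b"]
        println!("{} days", visible_days(1000)); // 10 days
    }
    ```
    
    The second call shows the caveat in miniature: a thread that was created
    long ago but updated recently disappears once it falls outside the scan
    window.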
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
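    
    As a rough sketch of that lookup (not the macro in this PR): the names
    `resolve_resource` and `find_resource_sketch!` are hypothetical, and the
    Cargo fallback uses `option_env!("CARGO_MANIFEST_DIR")` as an assumption.
    A real runfiles layout may involve additional path components that this
    sketch ignores.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical helper: choose the base directory for a resource lookup.
    /// `runfiles_dir` is the runtime `RUNFILES_DIR`; `bazel_package` and
    /// `manifest_dir` are compile-time values captured at the call site.
    fn resolve_resource(
        runfiles_dir: Option<&Path>,
        bazel_package: Option<&str>,
        manifest_dir: Option<&str>,
        relative: &str,
    ) -> Option<PathBuf> {
        match (runfiles_dir, bazel_package) {
            // Running under Bazel: RUNFILES_DIR + BAZEL_PACKAGE + relative path.
            (Some(runfiles), Some(package)) => Some(runfiles.join(package).join(relative)),
            // Otherwise fall back to the Cargo manifest directory, if known.
            _ => manifest_dir.map(|dir| Path::new(dir).join(relative)),
        }
    }
    
    /// Illustrative macro, not the project's actual `find_resource!`.
    /// Because it is a macro, `option_env!` expands in the caller's crate,
    /// so each caller sees its own compile-time values.
    macro_rules! find_resource_sketch {
        ($relative:expr) => {
            resolve_resource(
                std::env::var_os("RUNFILES_DIR")
                    .as_deref()
                    .map(std::path::Path::new),
                option_env!("BAZEL_PACKAGE"),
                option_env!("CARGO_MANIFEST_DIR"),
                $relative,
            )
        };
    }
    
    fn main() {
        // The result depends on how and where this is built and run.
        println!("{:?}", find_resource_sketch!("tests/fixtures/data.txt"));
    }
    ```
    
    Keeping the path logic in a plain function and leaving only the
    `option_env!` capture in the macro makes the logic testable with explicit
    inputs.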
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
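    
    To illustrate the difference between normalizing and canonicalizing,
    here is a std-only sketch of lexical normalization. It is a stand-in for
    illustration, not the `path-absolutize` API, and `normalize_lexically` is
    a hypothetical name. Unlike `std::fs::canonicalize`, it never touches the
    filesystem, so a `common/..` component resolves even when `common` does
    not exist on disk.
    
    ```rust
    use std::path::{Component, Path, PathBuf};
    
    /// Lexically normalize a path: drop `.` and resolve `..` against the
    /// preceding component, without consulting the filesystem.
    fn normalize_lexically(path: &Path) -> PathBuf {
        let mut out = PathBuf::new();
        for component in path.components() {
            match component {
                Component::CurDir => {}
                Component::ParentDir => {
                    // Pop a normal component if there is one to pop.
                    let can_pop =
                        matches!(out.components().next_back(), Some(Component::Normal(_)));
                    if can_pop {
                        out.pop();
                    } else if !out.has_root() {
                        // Relative path with nothing to pop: keep the `..`.
                        out.push("..");
                    }
                    // At the root, `..` is dropped.
                }
                other => out.push(other.as_os_str()),
            }
        }
        out
    }
    
    fn main() {
        let raw = Path::new("/runfiles/pkg/tests/common/../suite/data.txt");
        println!("{}", normalize_lexically(raw).display()); // /runfiles/pkg/tests/suite/data.txt
    }
    ```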
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
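    
    A minimal sketch of that resolution step, assuming a relative value is
    joined onto some project root (for example, whatever
    `buck_project_root()` returns; how that root is found is not shown here).
    `resolve_bin_path` is a hypothetical name, not the crate's API.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical helper: turn a `CARGO_BIN_EXE_*`-style value into an
    /// absolute path. Absolute values pass through unchanged; relative
    /// values are joined onto `project_root`.
    fn resolve_bin_path(value: &str, project_root: &Path) -> PathBuf {
        let path = Path::new(value);
        if path.is_absolute() {
            path.to_path_buf()
        } else {
            project_root.join(path)
        }
    }
    
    fn main() {
        let root = Path::new("/repo");
        // Relative value, as the WIP Buck2 setup would pass it.
        println!("{}", resolve_bin_path("out/bin/codex", root).display());
        // Absolute value, as Cargo provides it.
        println!("{}", resolve_bin_path("/target/debug/codex", root).display());
    }
    ```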
    
    
  • chore: speed-up pipeline (#5812)
    Speed up the pipeline by:
    * Decoupling tests and clippy
    * Using a pre-built binary in tests
    * Using `sccache` to cache builds
  • feat: annotate conversations with model_provider for filtering (#5658)
    Because conversations that use the Responses API can have encrypted
    reasoning messages, trying to resume a conversation with a different
    provider could lead to confusing "failed to decrypt" errors. (This is
    reproducible by starting a conversation using ChatGPT login and resuming
    it as a conversation that uses OpenAI models via Azure.)
    
    This changes `ListConversationsParams` to take a `model_providers:
    Option<Vec<String>>` and adds `model_provider` on each
    `ConversationSummary` it returns so these cases can be disambiguated.
    
    Note this ended up making changes to
    `codex-rs/core/src/rollout/tests.rs` because it had a number of cases
    where it expected `Some` for the value of `next_cursor`, but the list of
    rollouts was complete, so according to this docstring:
    
    
    https://github.com/openai/codex/blob/bcd64c7e7231d6316a2377d1525a0fa74f21b783/codex-rs/app-server-protocol/src/protocol.rs#L334-L337
    
    If there are no more items to return, then `next_cursor` should be
    `None`. This PR updates that logic.
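    
    The filtering and the `next_cursor` contract can be sketched as below.
    This is a simplified model, not the actual implementation: the cursor is
    a plain index, `id` is a placeholder field, and `list_conversations` is a
    hypothetical function. Only `model_providers`, `model_provider`,
    `ConversationSummary`, and `next_cursor` come from the description.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    struct ConversationSummary {
        id: u32, // placeholder; stands in for the real identifier
        model_provider: String,
    }
    
    /// Return one page plus `next_cursor`. The cursor is an index into `all`,
    /// which is an assumption of this sketch.
    fn list_conversations(
        all: &[ConversationSummary],
        cursor: Option<usize>,
        page_size: usize,
        model_providers: Option<&[String]>,
    ) -> (Vec<ConversationSummary>, Option<usize>) {
        let mut page = Vec::new();
        let mut index = cursor.unwrap_or(0);
        while index < all.len() && page.len() < page_size {
            let item = &all[index];
            // `None` means "no filter"; otherwise keep only listed providers.
            let keep = model_providers
                .map_or(true, |allowed| allowed.contains(&item.model_provider));
            if keep {
                page.push(item.clone());
            }
            index += 1;
        }
        // Only hand back a cursor when unread items remain.
        let next_cursor = if index < all.len() { Some(index) } else { None };
        (page, next_cursor)
    }
    
    fn main() {
        let all = vec![
            ConversationSummary { id: 1, model_provider: "openai".into() },
            ConversationSummary { id: 2, model_provider: "azure".into() },
            ConversationSummary { id: 3, model_provider: "openai".into() },
        ];
        let (first, cursor) = list_conversations(&all, None, 2, None);
        println!("{:?} {:?}", first, cursor);
        let (rest, cursor) = list_conversations(&all, cursor, 2, None);
        println!("{:?} {:?}", rest, cursor); // the cursor is None: the list is complete
    }
    ```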
  • test: reduce time dependency on test harness (#5053)
    Tightened the CLI integration tests to stop relying on wall-clock
    sleeps: a new fs watcher helper waits for session files instead of
    timing out, and SSE mocks/fixtures make the flows deterministic.
  • Simplify request body assertions (#4845)
    We'll have a lot more tests like these.
  • Separate interactive and non-interactive sessions (#4612)
    Do not show exec session in VSCode/TUI selector.
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • enable-resume (#3537)
    Adding the ability to resume conversations.
    There is a single verb, `resume`.
    
    Behavior:
    
    `tui`:
    `codex resume`: opens session picker
    `codex resume --last`: continue last message
    `codex resume <session id>`: continue conversation with `session id`
    
    `exec`:
    `codex resume --last`: continue last conversation
    `codex resume <session id>`: continue conversation with `session id`
    
    Implementation:
    - I added a function to find the path in `~/.codex/sessions/` with a
    `UUID`. This is helpful in resuming with session id.
    - Added the above-mentioned flags
    - Added lots of testing
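    
    The lookup by `UUID` could look roughly like the sketch below. It
    assumes, for illustration only, that session files live in nested
    directories under the sessions root and carry the id somewhere in the
    file name; `find_session_path` is a hypothetical name, not the function
    added in this PR.
    
    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Recursively search `root` for a file whose name contains `session_id`.
    fn find_session_path(root: &Path, session_id: &str) -> io::Result<Option<PathBuf>> {
        for entry in fs::read_dir(root)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                // Descend into subdirectories and stop at the first match.
                if let Some(found) = find_session_path(&path, session_id)? {
                    return Ok(Some(found));
                }
            } else if entry.file_name().to_string_lossy().contains(session_id) {
                return Ok(Some(path));
            }
        }
        Ok(None)
    }
    
    fn main() -> io::Result<()> {
        // Build a throwaway directory tree so the example is self-contained.
        let root = std::env::temp_dir().join(format!("sessions-sketch-{}", std::process::id()));
        let dir = root.join("2025").join("01").join("02");
        fs::create_dir_all(&dir)?;
        let id = "123e4567-e89b-12d3-a456-426614174000";
        fs::write(dir.join(format!("rollout-{id}.jsonl")), b"{}")?;
    
        println!("{:?}", find_session_path(&root, id)?);
        fs::remove_dir_all(&root)?;
        Ok(())
    }
    ```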
  • Move initial history to protocol (#3422)
    To fix an edge case of forking then resuming
    
    #3419
  • Introduce rollout items (#3380)
    This PR introduces rollout items. This enables us to roll out eventmsgs
    and session meta.
    
    This is mostly #3214 rebased on main.
  • Generate more typescript types and return conversation id with ConversationSummary (#3219)
    This PR does multiple things that are necessary for conversation resume
    to work from the extension. I wanted to make sure everything worked so
    these changes wound up in one PR:
    1. Generate more ts types
    2. Resume rollout history files rather than create a new one every time
    it is resumed so you don't see a duplicate conversation in history for
    every resume. Chatted with @aibrahim-oai to verify this
    3. Return conversation_id in conversation summaries
    4. [Cleanup] Use serde and strong types for a lot of the rollout file
    parsing
  • chore: unify history loading (#2736)
    We have two ways of loading a conversation with previous history: fork
    conversation and the experimental resume that we had before. In this PR,
    I am unifying their code paths. The unified path gets the history items
    and records them in a brand new conversation. This PR also constrains
    the rollout recorder's responsibilities to only recording to disk and
    loading from disk.
    
    The PR also fixes a current bug when we have two forks in a row:
    History 1:
    <Environment Context>
    UserMessage_1
    UserMessage_2
    UserMessage_3
    
    **Fork with n = 1 (only remove one element)**
    History 2:
    <Environment Context>
    UserMessage_1
    UserMessage_2
    <Environment Context>
    
    **Fork with n = 1 (only remove one element)**
    History 3:
    <Environment Context>
    UserMessage_1
    UserMessage_2
    **<Environment Context>**
    
    This shouldn't happen, but it did because we were appending the
    `<Environment Context>` after each spawn and it is counted as a _user
    message_. Now, we don't add this message when restoring an old
    conversation.
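    
    A simplified model of the intended behavior, under stated assumptions:
    `Item`, `fork_history`, and `spawn_conversation` are hypothetical names,
    and "fork with n" is read here as "drop the last n user messages and
    everything after them". The real types and code paths differ.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum Item {
        EnvironmentContext,
        UserMessage(String),
    }
    
    /// Drop the last `n` user messages and everything after them.
    /// `EnvironmentContext` is deliberately not counted as a user message.
    fn fork_history(history: &[Item], n: usize) -> Vec<Item> {
        let mut remaining = n;
        let mut cut = history.len();
        for (index, item) in history.iter().enumerate().rev() {
            if remaining == 0 {
                break;
            }
            if matches!(item, Item::UserMessage(_)) {
                cut = index;
                remaining -= 1;
            }
        }
        history[..cut].to_vec()
    }
    
    /// Start a conversation. The environment context is added only for a
    /// fresh conversation, not when restoring an existing history.
    fn spawn_conversation(restored: Option<Vec<Item>>) -> Vec<Item> {
        match restored {
            Some(history) => history,
            None => vec![Item::EnvironmentContext],
        }
    }
    
    fn main() {
        let mut history1 = spawn_conversation(None);
        for text in ["UserMessage_1", "UserMessage_2", "UserMessage_3"] {
            history1.push(Item::UserMessage(text.to_string()));
        }
        let history2 = spawn_conversation(Some(fork_history(&history1, 1)));
        let history3 = spawn_conversation(Some(fork_history(&history2, 1)));
        // Each fork removes one real user message.
        println!("{:?}", history2);
        println!("{:?}", history3);
    }
    ```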
  • test: faster test execution in codex-core (#2633)
    This dramatically improves the time to run `cargo test -p codex-core`
    (~25x speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    Both runs were measured "hot", i.e. on a second run with no filesystem
    changes, to exclude compile times.
    
    The approach is inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html):
    we move all test cases in tests/ into a single suite so that there is a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized within a
    single binary.