Commit Graph

33 Commits

  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
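
    A minimal sketch of the pattern described above, assuming a simplified
    crate layout. `codex_core_sketch`, `Session`, `set_retries_for_tests`,
    and `session_with_retries` are hypothetical names chosen for
    illustration and are not taken from the PR. In this single-file sketch
    a module stands in for the crate, so `pub(crate)` only hides the hook
    once the module is compiled as its own crate.

    ```rust
    // Hypothetical stand-in for a crate such as `codex-core`.
    mod codex_core_sketch {
        pub struct Session {
            retries: u32,
        }

        impl Session {
            pub fn new() -> Self {
                Session { retries: 3 }
            }

            pub fn retries(&self) -> u32 {
                self.retries
            }

            // Internal hook: crate-visible only, not part of the public API.
            pub(crate) fn set_retries_for_tests(&mut self, retries: u32) {
                self.retries = retries;
            }
        }

        // The one public module that test code in other crates imports from.
        pub mod test_support {
            use super::Session;

            // Thin public wrapper around the crate-private hook.
            pub fn session_with_retries(retries: u32) -> Session {
                let mut session = Session::new();
                session.set_retries_for_tests(retries);
                session
            }
        }
    }
    ```

    Because the module is always compiled, there is no feature flag to
    toggle, which is what lets every consumer share one build of the crate.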
  • Remove WebSocket wire format (#10179)
    I'd like WireApi to go away (when chat is removed), and WebSockets is
    still the Responses API, just over a different transport.
  • Use test_codex more (#9961)
    Reduces boilerplate.
  • feat(core) update Personality on turn (#9644)
    ## Summary
    Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
    is explicitly unused by the clients so far, to simplify PRs - app-server
    and tui implementations will be follow-ups.
    
    ## Testing
    - [x] added integration tests
  • Add text element metadata to types (#9235)
    Initial type tweaking PR to make the diff of
    https://github.com/openai/codex/pull/9116 smaller
    
    This should not change any behavior; it just adds some fields to types.
  • Support response.done and add integration tests (#9129)
    The agent loop now uses a persistent incremental WebSocket connection.
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat: agent controller (#8783)
    Added an agent control plane that lets sessions spawn or message other
    conversations via `AgentControl`.
    
    `AgentBus` (core/src/agent/bus.rs) keeps track of the last known status
    of a conversation.
    
    ConversationManager now holds shared state behind an Arc so AgentControl
    keeps only a weak back-reference; the goal is simply to avoid an explicit
    reference cycle.
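
    The ownership shape described above can be sketched roughly as follows.
    `Manager`, `ManagerState`, `Control`, and `spawn` are hypothetical
    stand-ins, not the actual `ConversationManager` or `AgentControl` types.

    ```rust
    use std::sync::{Arc, Mutex, Weak};

    // Hypothetical shared state owned by the manager.
    struct ManagerState {
        conversations: Mutex<Vec<String>>,
    }

    struct Manager {
        state: Arc<ManagerState>,
    }

    // The control handle holds only a Weak pointer, so it never keeps the
    // manager's state alive and no Arc cycle can form.
    struct Control {
        manager: Weak<ManagerState>,
    }

    impl Manager {
        fn new() -> Self {
            Manager {
                state: Arc::new(ManagerState {
                    conversations: Mutex::new(Vec::new()),
                }),
            }
        }

        fn control(&self) -> Control {
            Control {
                manager: Arc::downgrade(&self.state),
            }
        }
    }

    impl Control {
        // Registers a conversation and returns the new count, or None once
        // the manager (and therefore its state) has been dropped.
        fn spawn(&self, name: &str) -> Option<usize> {
            let state = self.manager.upgrade()?;
            let mut conversations = state.conversations.lock().unwrap();
            conversations.push(name.to_string());
            Some(conversations.len())
        }
    }
    ```

    With this shape, dropping the manager frees the shared state even while
    control handles are still around; their `upgrade()` calls simply start
    returning `None`.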
    
    Follow-ups:
    * Build a small tool in the TUI to be able to see every agent and send
    a manual message to each of them
    * Handle approval requests in this TUI
    * Add tools to spawn/communicate between agents (see related design)
    * Define agent types
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
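
    As an illustration of the resolution step, here is a minimal sketch.
    `resolve_bin_path` and its `project_root` parameter are hypothetical;
    the actual crate's API is not shown in this message.

    ```rust
    use std::path::{Path, PathBuf};

    // Hypothetical helper: `value` stands in for a `CARGO_BIN_EXE_*` value
    // and `project_root` for whatever root the build system reports.
    fn resolve_bin_path(value: &str, project_root: &Path) -> PathBuf {
        let path = Path::new(value);
        if path.is_absolute() {
            // Cargo case: the value is already absolute, so use it as-is.
            path.to_path_buf()
        } else {
            // Relative case: anchor the value at the project root.
            project_root.join(path)
        }
    }
    ```

    A caller would read the environment variable itself and pass the value
    in, which keeps the helper free of global state and easy to test.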
    
    
  • chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276)
    https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
    this PR updates all non-test call sites to use it instead of
    `Config::load_from_base_config_with_overrides()`.
    
    This is important because `load_from_base_config_with_overrides()` uses
    an empty `ConfigRequirements`, which is a reasonable default for testing
    so the tests are not influenced by the settings on the host. This method
    is now guarded by `#[cfg(test)]` so it cannot be used by business logic.
    
    Because `ConfigBuilder::build()` is `async`, many of the test methods
    had to be migrated to be `async`, as well. On the bright side, this made
    it possible to eliminate a bunch of `block_on_future()` stuff.
  • Reimplement skills loading using SkillsManager + skills/list op. (#7914)
    Refactor the way we load and manage skills:
    1. Move skill discovery/caching into SkillsManager and reuse it across
    sessions.
    2. Add the skills/list API (Op::ListSkills/SkillsListResponse) to fetch
    skills for one or more cwds. Also update app-server for VSCE/App.
    3. Trigger skills/list during session startup so UIs preload skills and
    handle errors immediately.
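
    The caching in step 1 could look something like the sketch below. It
    assumes a cache keyed by cwd; `SkillsCache`, `skills_for`, and the
    `loads` counter are hypothetical and do not reflect the real
    `SkillsManager` API.

    ```rust
    use std::collections::HashMap;
    use std::path::{Path, PathBuf};

    // Hypothetical per-cwd cache of discovered skill names.
    struct SkillsCache {
        by_cwd: HashMap<PathBuf, Vec<String>>,
        // Counts how many times discovery actually ran.
        loads: usize,
    }

    impl SkillsCache {
        fn new() -> Self {
            SkillsCache {
                by_cwd: HashMap::new(),
                loads: 0,
            }
        }

        // Runs `discover` only on the first request for a given cwd; later
        // requests for the same cwd reuse the cached result.
        fn skills_for<F>(&mut self, cwd: &Path, discover: F) -> &[String]
        where
            F: FnOnce(&Path) -> Vec<String>,
        {
            if !self.by_cwd.contains_key(cwd) {
                self.loads += 1;
                let skills = discover(cwd);
                self.by_cwd.insert(cwd.to_path_buf(), skills);
            }
            &self.by_cwd[cwd]
        }
    }
    ```

    Sharing one such cache across sessions means a second session in the
    same cwd would not repeat discovery.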
  • Inject SKILL.md when it's explicitly mentioned. (#7763)
    1. Skills load once in core at session start; the cached outcome is
    reused across core and surfaced to TUI via SessionConfigured.
    2. TUI detects explicit skill selections, and core injects the matching
    SKILL.md content into the turn when a selected skill is present.
  • make model optional in config (#7769)
    - Make Config.model optional and centralize default-selection logic in
    ModelsManager, including a default_model helper (with
    codex-auto-balanced when available) so sessions now carry an explicit
    chosen model separate from the base config.
    - Resolve `model` once in `core` and `tui` from config, then store the
    resolved value on other structs.
    - Refresh models before resolving the default model.
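
    A rough sketch of the resolution order described above. `ConfigSketch`
    and `resolve_model` are hypothetical, and falling back to the first
    available model is an assumption made for illustration; only the
    `codex-auto-balanced` name and the `default_model` helper name come
    from this message.

    ```rust
    // Hypothetical config with an optional model.
    struct ConfigSketch {
        model: Option<String>,
    }

    // Prefer `codex-auto-balanced` when it is available; otherwise fall back
    // to the first available model (an assumption for this sketch).
    fn default_model(available: &[&str]) -> Option<String> {
        available
            .iter()
            .find(|m| **m == "codex-auto-balanced")
            .or_else(|| available.first())
            .map(|m| (*m).to_string())
    }

    // An explicit config value wins; the default is consulted only when the
    // config leaves the model unset.
    fn resolve_model(config: &ConfigSketch, available: &[&str]) -> Option<String> {
        config.model.clone().or_else(|| default_model(available))
    }
    ```

    Resolving once and storing the result keeps later code from re-deriving
    the default against a model list that may have changed.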
  • remove model_family from `config` (#7571)
    - Remove `model_family` from `config`
    - Make sure to still override config elements related to `model_family`
    like supporting reasoning
  • Migrate model family to models manager (#7565)
    This PR moves `ModelsFamily` to `openai_models`. It also propagates
    `ModelsManager` to session services and uses it to drive the model
    family. We also make `derive_default_model_family` private because it's
    a step towards what we want: one place that provides model configuration.
    
    This is a second step towards having one source of truth for model
    information and config: `ModelsManager`.
    
    The next step would be to remove `ModelsFamily` from config. That's a
    large change because it's used in 41 places, mostly before launching
    `codex`. We also need to make `find_family_for_model` private. That is
    also a big change because it's used in 21 places, almost all of them
    tests.
  • fix(apply_patch) tests for shell_command (#7307)
    ## Summary
    Adds test coverage for invocations of apply_patch via shell_command with
    heredoc, to validate behavior.
    
    ## Testing
    - [x] These are tests
  • feat: remote compaction (#6795)
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • chore(core) Add shell_serialization coverage (#6810)
    ## Summary
    Similar to #6545, this PR updates the shell_serialization test suite to
    cover the various `shell` tool invocations we have. Note that this does
    not cover unified_exec, which has its own suite of tests. This should
    provide some test coverage for when we eventually consolidate
    serialization logic.
    
    ## Testing
    - [x] These are tests
  • Promote shared helpers for suite tests (#6460)
    ## Summary
    - add `TestCodex::submit_turn_with_policies` and extend the response
    helpers with reusable tool-call utilities
    - update the grep_files, read_file, list_dir, shell_serialization, and
    tools suites to rely on the shared helpers instead of local copies
    - make the list_dir helper return `anyhow::Result` so clippy no longer
    warns about `expect`
    
    ## Testing
    - `just fix -p codex-core`
    - `cargo test -p codex-core --test all
    suite::grep_files::grep_files_tool_collects_matches`
    - `cargo test -p codex-core
    suite::grep_files::grep_files_tool_collects_matches -- --ignored`
    (filter requests ignored tests so nothing runs, but the build stays
    clean)
    
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69112d53abac83219813cab4d7cb6446)
  • chore(core) Consolidate apply_patch tests (#6545)
    ## Summary
    Consolidates our apply_patch tests into one suite, and ensures each test
    case tests the various ways the harness supports apply_patch:
    1. Freeform custom tool call
    2. JSON function tool
    3. Simple shell call
    4. Heredoc shell call
    
    There are a few test cases that are specific to a particular variant;
    I've left those alone.
    
    ## Testing
    - [x] This adds a significant number of tests
  • Set verbosity to low for 5.1 (#6568)
    And improve test coverage
  • chore: testing on freeform apply_patch (#5952)
    ## Summary
    Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform
    apply_patch tool as opposed to the function call path. The good news is
    that all the tests pass with zero logical changes, with the exception of
    the heredoc, which doesn't really make sense in the freeform tool
    context anyway.
    
    @jif-oai since you wrote the original tests in #5557, I'd love your
    opinion on the right way to DRY these test cases between the two. Happy
    to set up a more sophisticated harness, but didn't want to go down the
    rabbit hole until we agreed on the right pattern
    
    ## Testing
    - [x] These are tests
  • feat: deprecation warning (#5825)
    <img width="955" height="311" alt="Screenshot 2025-10-28 at 14 26 25"
    src="https://github.com/user-attachments/assets/99729b3d-3bc9-4503-aab3-8dc919220ab4"
    />
  • fix: apply_patch shell_serialization tests (#4786)
    ## Summary
    Adds additional shell_serialization tests specifically for apply_patch
    and other cases.
    
    ## Test Plan
    - [x] These are all tests
  • [MCP] Add support for MCP Oauth credentials (#4517)
    This PR adds OAuth login support to streamable HTTP servers when
    `experimental_use_rmcp_client` is enabled.
    
    This PR is large but represents the minimal amount of work required for
    this to work. To keep this PR smaller, login can only be done with
    `codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp`
    or `codex mcp list` yet. Fingers crossed that this is the last large MCP
    PR and that subsequent PRs can be smaller.
    
    Under the hood, credentials are stored using platform credential
    managers using the [keyring crate](https://crates.io/crates/keyring).
    When the keyring isn't available, it falls back to storing credentials
    in `CODEX_HOME/.credentials.json` which is consistent with how other
    coding agents handle authentication.
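
    The fallback behavior can be sketched with in-memory stand-ins. This
    does not use the keyring crate; `CredentialStore`, `KeyringSketch`,
    `FileSketch`, and `save_with_fallback` are all hypothetical names for
    illustration.

    ```rust
    use std::collections::HashMap;

    // Hypothetical abstraction over a credential backend.
    trait CredentialStore {
        fn save(&mut self, server: &str, token: &str) -> Result<(), String>;
        fn load(&self, server: &str) -> Option<String>;
    }

    // Stand-in for a platform keyring that may be unavailable.
    struct KeyringSketch {
        available: bool,
        entries: HashMap<String, String>,
    }

    // Stand-in for the credentials-file fallback.
    struct FileSketch {
        entries: HashMap<String, String>,
    }

    impl CredentialStore for KeyringSketch {
        fn save(&mut self, server: &str, token: &str) -> Result<(), String> {
            if !self.available {
                return Err("keyring unavailable".to_string());
            }
            self.entries.insert(server.to_string(), token.to_string());
            Ok(())
        }

        fn load(&self, server: &str) -> Option<String> {
            if !self.available {
                return None;
            }
            self.entries.get(server).cloned()
        }
    }

    impl CredentialStore for FileSketch {
        fn save(&mut self, server: &str, token: &str) -> Result<(), String> {
            self.entries.insert(server.to_string(), token.to_string());
            Ok(())
        }

        fn load(&self, server: &str) -> Option<String> {
            self.entries.get(server).cloned()
        }
    }

    // Try the primary store first; if it fails, write to the fallback.
    // Returns which store ended up holding the credential.
    fn save_with_fallback(
        primary: &mut dyn CredentialStore,
        fallback: &mut dyn CredentialStore,
        server: &str,
        token: &str,
    ) -> Result<&'static str, String> {
        match primary.save(server, token) {
            Ok(()) => Ok("primary"),
            Err(_) => fallback.save(server, token).map(|()| "fallback"),
        }
    }
    ```

    Reads would need the same ordering, checking the keyring first and the
    file second, so a credential written to the fallback is still found.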
    
    I tested this on macOS, Windows, WSL (Ubuntu), and Linux. I wasn't able
    to test the dbus store on Linux but did verify that the fallback works.
    
    One quirk is that during development every build produces its own
    ad-hoc binary, so if you have stored credentials the keyring won't
    recognize the reader as being the same as the writer and may ask for
    the user's password. I may add an override to disable this or allow
    users/enterprises to opt out of the keyring storage if it causes issues.
    
    <img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40"
    src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d"
    />
    <img width="745" height="486" alt="image"
    src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e"
    />
  • Add notifier tests (#4064)
    Proposal:
    1. Use anyhow for tests and avoid unwrap
    2. Extract a helper for starting a test instance of codex