Commit Graph

32 Commits

  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
  • Remove deterministic_process_ids feature to avoid duplicate codex-core builds (#11393)
    ## Why
    
    `codex-core` enabled `deterministic_process_ids` through a self
    dev-dependency.
    That forced a second feature-resolved build of the same crate, which
    increased
    compile time and test latency.
    
    ## What Changed
    
    - Removed the `deterministic_process_ids` feature from
    `codex-rs/core/Cargo.toml`.
    - Removed the self dev-dependency on `codex-core` that enabled that
    feature.
    - Removed the Bazel `deterministic_process_ids` crate feature for
    `codex-core`.
    - Added a test-only `AtomicBool` override in unified exec process-id
    allocation.
    - Added a test-support setter for that override and re-exported it from
    `codex-core`.
    - Enabled deterministic process IDs in integration tests via
    `core_test_support` ctor.
    
    ## Behavior
    
    - Production behavior remains random process IDs.
    - Unit tests remain deterministic via `cfg(test)`.
    - Integration tests remain deterministic via explicit test-support
    initialization.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-core unified_exec::`
    - `cargo test -p codex-core --test all unified_exec -- --test-threads=1`
    - `cargo tree -p codex-core -e features` (verified the removed feature
    path)
  • Update tests to stop using sse_completed fixture (#10638)
    Summary:
    - replace the `sse_completed` fixture and related JSON template with
    direct `responses::ev_completed` payload builders
    - cascade the new SSE helpers through all affected core tests for
    consistency and clarity
    - remove legacy fixtures that were no longer needed once the helpers are
    in place
    
    Testing:
    - Not run (not requested)
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • fix: leverage codex_utils_cargo_bin() in codex-rs/core/tests/suite (#8887)
    This eliminates our dependency on the `escargot` crate and better
    prepares us for Bazel builds: https://github.com/openai/codex/pull/8875.
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276)
    https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
    this PR updates all call non-test call sites to use it instead of
    `Config::load_from_base_config_with_overrides()`.
    
    This is important because `load_from_base_config_with_overrides()` uses
    an empty `ConfigRequirements`, which is a reasonable default for testing
    so the tests are not influenced by the settings on the host. This method
    is now guarded by `#[cfg(test)]` so it cannot be used by business logic.
    
    Because `ConfigBuilder::build()` is `async`, many of the test methods
    had to be migrated to be `async`, as well. On the bright side, this made
    it possible to eliminate a bunch of `block_on_future()` stuff.
  • Fix unified_exec on windows (#7620)
    Fix unified_exec on windows
    
    Requires removal of PSUEDOCONSOLE_INHERIT_CURSOR flag so child processed
    don't attempt to wait for cursor position response (and timeout).
    
    
    https://github.com/wezterm/wezterm/compare/main...pakrym:wezterm:PSUEDOCONSOLE_INHERIT_CURSOR?expand=1
    
    ---------
    
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • feat(core) Add login to shell_command tool (#6846)
    ## Summary
    Adds the `login` parameter to the `shell_command` tool - optional,
    defaults to true.
    
    ## Testing
    - [x] Tested locally
  • [App-server] v2 for account/updated and account/logout (#6175)
    V2 for `account/updated` and `account/logout` for app server. correspond
    to old `authStatusChange` and `LogoutChatGpt` respectively. Followup PRs
    will make other v2 endpoints call `account/updated` instead of
    `authStatusChange` too.
  • Add ItemStarted/ItemCompleted events for UserInputItem (#5306)
    Adds a new ItemStarted event and delivers UserMessage as the first item
    type (more to come).
    
    
    Renames `InputItem` to `UserInput` considering we're using the `Item`
    suffix for actual items.
  • test: reduce time dependency on test harness (#5053)
    Tightened the CLI integration tests to stop relying on wall-clock
    sleeps—new fs watcher helper waits for session files instead of timing
    out, and SSE mocks/fixtures make the flows deterministic.
  • Make output assertions more explicit (#4784)
    Match using precise regexes.
  • chore: refactor tool handling (#4510)
    # Tool System Refactor
    
    - Centralizes tool definitions and execution in `core/src/tools/*`:
    specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
    registry/dispatch (`registry.rs`), and shared context (`context.rs`).
    One registry now builds the model-visible tool list and binds handlers.
    - Router converts model responses to tool calls; Registry dispatches
    with consistent telemetry via `codex-rs/otel` and unified error
    handling. Function, Local Shell, MCP, and experimental `unified_exec`
    all flow through this path; legacy shell aliases still work.
    - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
    make adding tools predictable and testable.
    
    Example: `read_file`
    - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
    registered by `build_specs`).
    - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
    1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
    - E2E test: `core/tests/suite/read_file.rs` validates the tool returns
    the requested lines.
    
    ## Next steps:
    - Decompose `handle_container_exec_with_params` 
    - Add parallel tool calls
  • Add codex exec testing helpers (#4254)
    Add a shortcut to create working directories and run codex exec with
    fake server.
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • Add notifier tests (#4064)
    Proposal:
    1. Use anyhow for tests and avoid unwrap
    2. Extract a helper for starting a test instance of codex
  • [tools] Add apply_patch tool (#2303)
    ## Summary
    We've been seeing a number of issues and reports with our synthetic
    `apply_patch` tool, e.g. #802. Let's make this a real tool - in my
    anecdotal testing, it's critical for GPT-OSS models, but I'd like to
    make it the standard across GPT-5 and codex models as well.
    
    ## Testing
    - [x] Tested locally
    - [x] Integration test
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * moved a bunch of allows to expects where possible
    
    Note: in integration tests, non `#[test]` helper functions are not
    covered by this so we had to leave a few lingering `expect(expect_used`
    checks around
  • chore: introduce ConversationManager as a clearinghouse for all conversations (#2240)
    This PR does two things because after I got deep into the first one I
    started pulling on the thread to the second:
    
    - Makes `ConversationManager` the place where all in-memory
    conversations are created and stored. Previously, `MessageProcessor` in
    the `codex-mcp-server` crate was doing this via its `session_map`, but
    this is something that should be done in `codex-core`.
    - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
    throughout our code. I think this made sense at one time, but now that
    we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
    I don't think this was quite right, so I removed it. For `codex exec`
    and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
    no longer make `Notify` a field of `Codex` or `CodexConversation`.
    
    Changes of note:
    
    - Adds the files `conversation_manager.rs` and `codex_conversation.rs`
    to `codex-core`.
    - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
    other crates must use `CodexConversation` instead (which is created via
    `ConversationManager`).
    - `core/src/codex_wrapper.rs` has been deleted in favor of
    `ConversationManager`.
    - `ConversationManager::new_conversation()` returns `NewConversation`,
    which is in line with the `new_conversation` tool we want to add to the
    MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
    we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
    verify `SessionConfiguredEvent` is the first event because that is now
    internal to `ConversationManager`.
    - Quite a bit of code was deleted from
    `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
    manage multiple conversations itself: it goes through
    `ConversationManager` instead.
    - `core/tests/live_agent.rs` has been deleted because I had to update a
    bunch of tests and all the tests in here were ignored, and I don't think
    anyone ever ran them, so this was just technical debt, at this point.
    - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
    hope to refactor the blandly-named `util.rs` into more descriptive
    files).
    - In general, I started replacing local variables named `codex` as
    `conversation`, where appropriate, though admittedly I didn't do it
    through all the integration tests because that would have added a lot of
    noise to this PR.
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
    * #2264
    * #2263
    * __->__ #2240
  • Re-add markdown streaming (#2029)
    Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.
  • [core] Allow resume after client errors (#2053)
    ## Summary
    Allow tui conversations to resume after the client fails out of retries.
    I tested this with exec / mocked api failures as well, and it appears to
    be fine. But happy to add an exec integration test as well!
    
    ## Testing
    - [x] Added integration test
    - [x] Tested locally
  • fix: create separate test_support crates to eliminate #[allow(dead_code)] (#1667)
    Because of a quirk of how implementation tests work in Rust, we had a
    number of `#[allow(dead_code)]` annotations that were misleading because
    the functions _were_ being used, just not by all integration tests in a
    `tests/` folder, so when compiling the test that did not use the
    function, clippy would complain that it was unused.
    
    This fixes things by create a "test_support" crate under the `tests/`
    folder that is imported as a dev dependency for the respective crate.