13 Commits

  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
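
    As a rough illustration of the mechanical rewrite (the function names
    and messages below are hypothetical, not taken from the diff):

    ```rust
    // Assumed to sit at the crate root of an integration-test crate, for
    // example `tests/all.rs`, as an inner attribute:
    //
    //     #![allow(clippy::expect_used)]

    // Before: the verbose form that avoided `expect`.
    fn parse_port_before(raw: &str) -> u16 {
        raw.parse::<u16>()
            .unwrap_or_else(|err| panic!("port should parse: {}", err))
    }

    // After: the same panic-on-failure behavior, stated with `expect`.
    fn parse_port_after(raw: &str) -> u16 {
        raw.parse::<u16>().expect("port should parse")
    }

    // `expect_err` covers the inverse case, where the test requires a failure.
    fn parse_error_message(raw: &str) -> String {
        raw.parse::<u16>()
            .expect_err("port should be rejected")
            .to_string()
    }
    ```

    Both `parse_port_*` helpers return `8080` for `"8080"` and panic on
    malformed input; only the spelling changes.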
  • app-server: remove experimental persist_extended_history bool flag (#25712)
    ## Summary
    
    Remove the dead experimental `persistExtendedHistory` app-server flag
    and collapse rollout persistence to the single policy app-server already
    used.
    
    ## What Changed
    
    - Removed `persistExtendedHistory` from v2 thread start/resume/fork
    params and deleted its deprecation notice path.
    - Removed the persistence-mode enums and plumbing through core, rollout,
    and thread-store.
    - Made rollout filtering mode-free, keeping the existing limited
    persisted-history behavior.
    
    ## Test Plan
    
    - `just write-app-server-schema`
    - `cargo nextest run --no-fail-fast -p codex-app-server-protocol
    schema_fixtures`
    - `cargo nextest run --no-fail-fast -p codex-app-server
    thread_shell_command_history_responses_exclude_persisted_command_executions`
    - `cargo nextest run --no-fail-fast -p codex-rollout -p
    codex-thread-store`
    - final `rg` for removed flag/type names
  • fix: rename McpServer to TestAppServer (#25701)
    This PR brought to you via VS Code rather than Codex...
    
    - opened `codex-rs/app-server/tests/common/mcp_process.rs`
    - put the cursor on `McpServer`
    - hit `F2` and renamed the symbol to `TestAppServer`
    - went to the file tree
    - hit enter and renamed `mcp_process.rs` to `test_app_server.rs`
    - ran **Save All Files** from the Command Palette
    - ran `just fmt`
    
    The End
    
    (Admittedly, most of the local variables for `TestAppServer` are still
    named `mcp`, though.)
  • [codex] Add user input client ids (#24653)
    ## Summary
    
    Adds an optional `clientId` field to app-server v2 `UserInput` and
    carries it through the core `UserInput` model so clients can correlate
    echoed user input items without relying on payload equality.
    
    ## Details
    
    - Adds `client_id: Option<String>` to core `UserInput` variants.
    - Exposes the v2 app-server field as `clientId` on the wire and in
    generated TypeScript.
    - Preserves the id when converting between app-server v2 and core
    protocol types.
    - Regenerates app-server schema fixtures.
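
    A minimal sketch of how a client might use the id for correlation. The
    enum below is a trimmed-down, hypothetical stand-in for the core
    `UserInput` model, and `find_echo` is an illustrative helper that does
    not exist in the codebase:

    ```rust
    // Hypothetical stand-in; the real `UserInput` has more variants and fields.
    #[derive(Debug, Clone, PartialEq)]
    enum UserInput {
        Text { text: String, client_id: Option<String> },
        Image { image_url: String, client_id: Option<String> },
    }

    impl UserInput {
        // Returns the client-supplied id, if the sender provided one.
        fn client_id(&self) -> Option<&str> {
            match self {
                UserInput::Text { client_id, .. } | UserInput::Image { client_id, .. } => {
                    client_id.as_deref()
                }
            }
        }
    }

    // Finds the echoed item for a locally generated id without comparing
    // payloads, since two inputs may carry identical text.
    fn find_echo<'a>(echoed: &'a [UserInput], wanted: &str) -> Option<&'a UserInput> {
        echoed.iter().find(|item| item.client_id() == Some(wanted))
    }
    ```

    Two `Text` inputs with the same `text` but different `client_id` values
    stay distinguishable, which payload equality cannot guarantee.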
    
    ## Validation
    
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-protocol`
    - `git diff --check`
  • Make local environment optional in EnvironmentManager (#23369)
    ## Summary
    - make `EnvironmentManager` local environment/runtime paths optional
    - simplify constructor surface around snapshot materialization
    - rename local env accessors to `require_local_environment` /
    `try_local_environment`
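
    The accessor split can be sketched roughly as follows. Only the two
    method names come from this change; the struct fields, the error type,
    and the return types are assumptions made for illustration:

    ```rust
    // Hypothetical sketch; the real types carry far more state than this.
    #[derive(Debug, Clone, PartialEq)]
    struct LocalEnvironment {
        root: String,
    }

    #[derive(Debug, PartialEq)]
    struct MissingLocalEnvironment;

    struct EnvironmentManager {
        // `None` when the manager was built without a local environment.
        local: Option<LocalEnvironment>,
    }

    impl EnvironmentManager {
        // For callers that can proceed without a local environment.
        fn try_local_environment(&self) -> Option<&LocalEnvironment> {
            self.local.as_ref()
        }

        // For callers that cannot; absence becomes an explicit error.
        fn require_local_environment(&self) -> Result<&LocalEnvironment, MissingLocalEnvironment> {
            self.local.as_ref().ok_or(MissingLocalEnvironment)
        }
    }
    ```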
    
    ## Validation
    - devbox Bazel build for touched crate surfaces
    - `//codex-rs/exec-server:exec-server-unit-tests`
    - `//codex-rs/app-server-client:app-server-client-unit-tests`
    - filtered touched `//codex-rs/core:core-unit-tests` cases
  • feat(app-server, threadstore): Thread pagination APIs and ThreadStore contract (#21566)
    ## Why
    The goal of this PR is to align on app-server and `ThreadStore` API
    updates for paginating through large threads.
    
    
    #### app-server
    ##### `thread/turns/list`
    - Updates `thread/turns/list` to support `itemsView?: "notLoaded" |
    "summary" | "full" | null`, defaulting to `summary`.
    - Implements the current `thread/turns/list` behavior over the existing
    persisted rollout-history fallback:
      - `notLoaded` returns turn envelopes with empty `items`.
      - `summary` returns the first user message and final assistant message
        when available.
      - `full` preserves the existing full item behavior.
    
    Note that this method still uses the naive approach of loading the
    entire rollout file, and returns just the filtered slice of the data.
    Real pagination will come later by leveraging SQLite.
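
    A minimal sketch of the three views as a projection over one turn's
    already-loaded items. The `Item` enum and `project_items` are
    hypothetical simplifications, not the real protocol types:

    ```rust
    // Hypothetical, simplified shapes; the real protocol types differ.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ItemsView {
        NotLoaded,
        Summary,
        Full,
    }

    #[derive(Debug, Clone, PartialEq)]
    enum Item {
        UserMessage(String),
        AssistantMessage(String),
        CommandExecution(String),
    }

    // Projects one turn's items according to the requested view. A missing
    // `itemsView` is treated as `Summary`, matching the documented default.
    fn project_items(items: &[Item], view: Option<ItemsView>) -> Vec<Item> {
        match view.unwrap_or(ItemsView::Summary) {
            ItemsView::NotLoaded => Vec::new(),
            ItemsView::Full => items.to_vec(),
            ItemsView::Summary => {
                let first_user = items
                    .iter()
                    .find(|item| matches!(**item, Item::UserMessage(_)));
                let last_assistant = items
                    .iter()
                    .rev()
                    .find(|item| matches!(**item, Item::AssistantMessage(_)));
                first_user.into_iter().chain(last_assistant).cloned().collect()
            }
        }
    }
    ```

    Because the projection runs after the whole rollout is loaded, it
    reduces response size but not load cost, which is the limitation the
    note above describes.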
    
    ##### `thread/turns/items/list`
    - Adds the experimental `thread/turns/items/list` protocol, schema,
    dispatcher, and processor stub. The app-server currently returns
    JSON-RPC `-32601` with `thread/turns/items/list is not supported yet`.
    
    #### ThreadStore
    - Adds `ThreadStore` contract types and stubbed methods for listing
    thread turns and listing items within a turn.
    - Adds a typed `StoredTurnStatus` and `StoredTurnError` to avoid baking
    app-server API enums or lossy string status values into the store-facing
    turn contract.
    
    This also sketches the storage abstraction we expect to need once turns
    are indexed/stored. In particular, `notLoaded` is useful only if
    ThreadStore can eventually list turn metadata without loading every
    persisted item for each turn.
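
    The stubbed contract can be pictured roughly like this. The method
    names, parameters, status variants, and the `RolloutOnlyStore` type are
    illustrative assumptions; only `ThreadStore`, `StoredTurnStatus`, and
    `ThreadStoreError::Unsupported` are named in this PR:

    ```rust
    // Hypothetical sketch of the contract shape, not the real signatures.
    #[derive(Debug, PartialEq)]
    enum ThreadStoreError {
        Unsupported,
    }

    #[derive(Debug, Clone, PartialEq)]
    enum StoredTurnStatus {
        InProgress,
        Completed,
        Failed,
    }

    #[derive(Debug, Clone, PartialEq)]
    struct StoredTurn {
        id: String,
        status: StoredTurnStatus,
    }

    trait ThreadStore {
        // Default: stores that have not indexed turns report `Unsupported`.
        fn list_turns(&self, _thread_id: &str) -> Result<Vec<StoredTurn>, ThreadStoreError> {
            Err(ThreadStoreError::Unsupported)
        }

        // Default: listing items within a turn is likewise unsupported.
        fn list_turn_items(
            &self,
            _thread_id: &str,
            _turn_id: &str,
        ) -> Result<Vec<String>, ThreadStoreError> {
            Err(ThreadStoreError::Unsupported)
        }
    }

    // A store that overrides nothing inherits both stubs.
    struct RolloutOnlyStore;
    impl ThreadStore for RolloutOnlyStore {}
    ```

    Keeping the status typed in the store layer means an implementation
    backed by an index can later override `list_turns` without the
    app-server enums leaking into it.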
    
    ## Validation
    
    - Added/updated protocol serialization coverage for the new request and
    response shapes.
    - Added app-server integration coverage for `thread/turns/list` default
    summary behavior and all three `itemsView` modes.
    - Added app-server integration coverage that `thread/turns/items/list`
    returns the expected unsupported JSON-RPC error when experimental APIs
    are enabled.
    - Added thread-store coverage that the default trait methods return
    `ThreadStoreError::Unsupported`.
    
    No developers.openai.com documentation update is needed for this
    internal experimental app-server API surface.
  • feat(app-server): always return limited thread history (#20682)
    ## Why
    
    Whenever we return a thread's history (turns and items) over app-server,
    always return the limited form as specified by the rollout policy
    `EventPersistenceMode::Limited`, even if the thread was previously
    started with `EventPersistenceMode::Extended`.
    
    Returning the extended history does not scale, so apply the rollout
    policy's filtering logic whenever we load and return a thread's
    history.
    
    ## What Changed
    
    - Reuse the rollout persistence policy when reconstructing app-server
    `ThreadItem` history so only `EventPersistenceMode::Limited` rollout
    items are replayed into API turns.
    - Route `thread/read`, `thread/resume`, `thread/fork`,
    `thread/turns/list`, and rollback responses through the same filtered
    app-server history projection.
    - Keep live active turns intact when composing a response for a
    currently running thread.
    - Update command execution coverage so persisted extended command events
    are excluded from returned history for `thread/read`, `thread/fork`, and
    `thread/turns/list`.
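
    In spirit, the shared projection looks something like the sketch
    below. `RolloutItem`, `kept_in_limited_history`, and `project_history`
    are hypothetical stand-ins; the real policy covers many more event
    kinds:

    ```rust
    // Hypothetical stand-ins: the real rollout item and policy types are richer.
    #[derive(Debug, Clone, PartialEq)]
    enum RolloutItem {
        UserMessage(String),
        AgentMessage(String),
        // Treated as an extended-only event for this sketch.
        CommandExecution(String),
    }

    // Assumed policy predicate: which items the limited mode keeps.
    fn kept_in_limited_history(item: &RolloutItem) -> bool {
        !matches!(item, RolloutItem::CommandExecution(_))
    }

    // Every history-returning endpoint goes through one projection, so a
    // thread recorded with extended persistence still yields limited history.
    fn project_history(persisted: &[RolloutItem]) -> Vec<RolloutItem> {
        persisted
            .iter()
            .filter(|item| kept_in_limited_history(item))
            .cloned()
            .collect()
    }
    ```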
    
    ## Test Plan
    
    - `cargo test -p codex-app-server limited`
    - `cargo test -p codex-app-server thread_shell_command`
    - `cargo test -p codex-app-server thread_read`
    - `cargo test -p codex-app-server thread_rollback`
    - `cargo test -p codex-app-server thread_fork`
    - `cargo test -p codex-app-server-protocol`
  • Add sorting/backwardsCursor to thread/list and new thread/turns/list api (#17305)
    To improve performance of UI loads from the app, this makes two main
    improvements:
    1. The `thread/list` API now accepts a `sortDirection` request field and
    returns a `backwardsCursor` in the response, which lets you paginate
    forwards and backwards from a window. You can fetch the first few items
    to display immediately while paginating to fill in history, then
    paginate "backwards" on future loads to catch up with any changes since
    the last UI load, without reloading the entire data set.
    2. Adds a new `thread/turns/list` API, which also has `sortDirection`
    and `backwardsCursor` and behaves like `thread/list`: a small fetch for
    immediate display, followed by background fill-in and resync catch-up.
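
    The fetch-then-catch-up flow can be sketched as below. This is a toy
    model under stated assumptions: ids stand in for list entries, cursors
    are plain ids with exclusive bounds, and `list_page` is a hypothetical
    function. The real cursors are opaque and their encoding is not
    described here:

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SortDirection {
        Asc,
        Desc,
    }

    #[derive(Debug, PartialEq)]
    struct Page {
        ids: Vec<u64>,
        // Continue in the same direction; `None` when the listing is exhausted.
        next_cursor: Option<u64>,
        // Anchor at the start of this page, for paging the other way later.
        backwards_cursor: Option<u64>,
    }

    // Returns up to `limit` ids strictly past `cursor` in `direction`.
    fn list_page(all_ids: &[u64], cursor: Option<u64>, limit: usize, direction: SortDirection) -> Page {
        let mut candidates: Vec<u64> = all_ids
            .iter()
            .copied()
            .filter(|id| match (direction, cursor) {
                (_, None) => true,
                (SortDirection::Asc, Some(c)) => *id > c,
                (SortDirection::Desc, Some(c)) => *id < c,
            })
            .collect();
        candidates.sort_unstable();
        if direction == SortDirection::Desc {
            candidates.reverse();
        }
        let has_more = candidates.len() > limit;
        candidates.truncate(limit);
        Page {
            next_cursor: if has_more { candidates.last().copied() } else { None },
            backwards_cursor: candidates.first().copied(),
            ids: candidates,
        }
    }
    ```

    With ids 1 through 10, a first `Desc` page of 3 returns `[10, 9, 8]`
    with `backwards_cursor` 10. If ids 11 and 12 arrive later, an `Asc`
    request from that cursor returns just `[11, 12]`, which is the catch-up
    without a full reload.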
  • Fix Windows Bazel app-server trust tests (#16711)
    ## Why
    
    Extracted from [#16528](https://github.com/openai/codex/pull/16528) so
    the Windows Bazel app-server test failures can be reviewed independently
    from the rest of that PR.
    
    This PR targets:
    
    -
    `suite::v2::thread_shell_command::thread_shell_command_runs_as_standalone_turn_and_persists_history`
    -
    `suite::v2::thread_start::thread_start_with_elevated_sandbox_trusts_project_and_followup_loads_project_config`
    -
    `suite::v2::thread_start::thread_start_with_nested_git_cwd_trusts_repo_root`
    
    There were two Windows-specific assumptions baked into those tests and
    the underlying trust lookup:
    
    - project trust keys were persisted and looked up using raw path
    strings, but Bazel's Windows test environment can surface canonicalized
    paths with `\\?\` / UNC prefixes or normalized symlink/junction targets,
    so follow-up `thread/start` requests no longer matched the project entry
    that had just been written
    - `item/commandExecution/outputDelta` assertions compared exact trailing
    line endings even though shell output chunk boundaries and CRLF handling
    can differ on Windows, and Bazel made that timing-sensitive mismatch
    visible
    
    There was also one behavior bug separate from the assertion cleanup:
    `thread/start` decided whether to persist trust from the final resolved
    sandbox policy, but on Windows an explicit `workspace-write` request may
    be downgraded to `read-only`. That incorrectly skipped writing trust
    even though the request had asked to elevate the project, so the new
    logic also keys off the requested sandbox mode.
    
    ## What
    
    - Canonicalize project trust keys when persisting/loading `[projects]`
    entries, while still accepting legacy raw keys for existing configs.
    - Persist project trust when `thread/start` explicitly requests
    `workspace-write` or `danger-full-access`, even if the resolved policy
    is later downgraded on Windows.
    - Make the Windows app-server tests compare persisted trust paths and
    command output deltas in a path/newline-normalized way.
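
    The two trust rules can be sketched as follows. The enum, the function
    names, and the string-keyed map are hypothetical simplifications of the
    real config types, and canonicalization itself is left to the caller:

    ```rust
    use std::collections::HashMap;

    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxMode {
        ReadOnly,
        WorkspaceWrite,
        DangerFullAccess,
    }

    // Trust is persisted based on what the request asked for, not on the
    // policy that was finally resolved (which Windows may downgrade).
    fn should_persist_trust(requested: Option<SandboxMode>) -> bool {
        matches!(
            requested,
            Some(SandboxMode::WorkspaceWrite) | Some(SandboxMode::DangerFullAccess)
        )
    }

    // Looks up a `[projects]` entry by canonical key first, then falls back
    // to the legacy raw key so existing configs keep working.
    fn lookup_trust<'a>(
        projects: &'a HashMap<String, String>,
        raw_key: &str,
        canonical_key: &str,
    ) -> Option<&'a str> {
        projects
            .get(canonical_key)
            .or_else(|| projects.get(raw_key))
            .map(String::as_str)
    }
    ```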
    
    ## Verification
    
    - Existing app-server v2 tests cover the three failing Windows Bazel
    cases above.
  • app-server: make thread/shellCommand tests shell-aware (#16635)
    ## Why
    `thread/shellCommand` executes the raw command string through the
    current user shell, which is PowerShell on Windows. The two v2
    app-server tests in `app-server/tests/suite/v2/thread_shell_command.rs`
    used POSIX `printf`, so Bazel CI on Windows failed with `printf` not
    being recognized as a PowerShell command.
    
    For reference, the user-shell task wraps commands with the active shell
    before execution:
    [`core/src/tasks/user_shell.rs`](https://github.com/openai/codex/blob/7a3eec6fdb356bd71f80582119eb829179ff0da1/codex-rs/core/src/tasks/user_shell.rs#L120-L126).
    
    ## What Changed
    Added a test-local helper that builds a shell-appropriate output command
    and expected newline sequence from `default_user_shell()`:
    
    - PowerShell: `Write-Output '...'` with `\r\n`
    - Cmd: `echo ...` with `\r\n`
    - POSIX shells: `printf '%s\n' ...` with `\n`
    
    Both `thread_shell_command_runs_as_standalone_turn_and_persists_history`
    and `thread_shell_command_uses_existing_active_turn` now use that
    helper.
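
    A minimal sketch of what such a helper could look like. `ShellKind`
    and `output_command` are hypothetical names, and the real helper
    derives the shell from `default_user_shell()` rather than taking it as
    a parameter:

    ```rust
    // Hypothetical stand-in for the shell kind of the current user shell.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ShellKind {
        PowerShell,
        Cmd,
        Posix,
    }

    // Returns (command, expected newline) for printing `text` in the shell.
    // Quoting is deliberately naive: `text` is assumed to contain no quotes
    // or shell metacharacters, which holds for fixed test strings.
    fn output_command(shell: ShellKind, text: &str) -> (String, &'static str) {
        match shell {
            ShellKind::PowerShell => (format!("Write-Output '{}'", text), "\r\n"),
            ShellKind::Cmd => (format!("echo {}", text), "\r\n"),
            ShellKind::Posix => (format!("printf '%s\\n' '{}'", text), "\n"),
        }
    }
    ```

    A test would then send the returned command through
    `thread/shellCommand` and compare the output against `text` followed
    by the returned newline sequence.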
    
    ## Verification
    - `cargo test -p codex-app-server thread_shell_command`
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
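
    The wrapper behavior can be modeled roughly as below. The real
    wrappers are shell scripts; this Rust rendering, the `build_invocation`
    name, the treatment of `--fix` as a wrapper-side flag, and the exact
    list of narrowing flags are all assumptions made for illustration:

    ```rust
    // Cargo target-selection flags that count as "narrowing" in this sketch.
    const TARGET_FLAGS: [&str; 6] = [
        "--lib", "--bins", "--tests", "--examples", "--benches", "--all-targets",
    ];

    // Builds the final argument list from what the caller passed.
    fn build_invocation(argv: &[&str]) -> Vec<String> {
        // Split at the first `--`: the left side is for the linter wrapper,
        // the right side is forwarded to cargo.
        let split = argv.iter().position(|arg| *arg == "--");
        let (wrapper_args, cargo_args) = match split {
            Some(index) => (&argv[..index], &argv[index + 1..]),
            None => (argv, &argv[argv.len()..]),
        };
        let is_fix = wrapper_args.contains(&"--fix");
        let narrowed = cargo_args.iter().any(|arg| TARGET_FLAGS.contains(arg));

        let mut out: Vec<String> = wrapper_args.iter().map(|arg| arg.to_string()).collect();
        // Emit exactly one separator, whether or not the caller supplied one.
        out.push("--".to_string());
        if !is_fix && !narrowed {
            out.push("--all-targets".to_string());
        }
        out.extend(cargo_args.iter().map(|arg| arg.to_string()));
        out
    }
    ```

    Under these assumptions a plain invocation yields `-- --all-targets`,
    while `-- --lib --bins` is forwarded unchanged with a single separator,
    which mirrors the macOS versus Linux/Windows split described above.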
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Add thread/shellCommand to app server API surface (#14988)
    This PR adds a new `thread/shellCommand` app server API so clients can
    implement `!` shell commands. These commands are executed within the
    sandbox, and the command text and output are visible to the model.
    
    The internal implementation mirrors the current TUI `!` behavior.
    - persist shell command execution as `CommandExecution` thread items,
    including source and formatted output metadata
    - bridge live and replayed app-server command execution events back into
    the existing `tui_app_server` exec rendering path
    
    This PR also wires `tui_app_server` to submit `!` commands through the
    new API.