Commit Graph

3440 Commits

  • Fixed a flaky test (#10970)
    ## Summary
    
    Stabilize v2 review integration tests by making them hermetic with
    respect to model discovery.
    
    `app-server` review tests were intermittently timing out in CI
    (especially on Windows runners) because their test config allowed remote
    model refresh. During `thread/start`, the test process could issue live
    `/v1/models` requests, introducing external network latency and
    nondeterministic timing before review flow assertions.
    
    This change disables remote model fetching in the review test config
    helper used by these tests.
  • refactor(network-proxy): flatten network config under [network] (#10965)
    Summary:
    - Rename config table from network_proxy to network.
    - Flatten allowed_domains, denied_domains, allow_unix_sockets, and
    allow_local_binding onto NetworkProxySettings.
    - Update runtime, state constraints, tests, and README to the new config
    shape.
  • fix(tui): conditionally restore status indicator using message phase (#10947)
    TL;DR: use the new message `phase` field emitted by preamble-capable
    models to determine whether an `AgentMessage` is mid-turn commentary. If
    so, restore the status indicator afterwards to indicate the turn has not
    completed.
    
    ### Problem
    `commit_tick` hides the status indicator while streaming assistant text.
    For preamble-capable models, that text can be commentary mid-turn, so
    hiding was correct during streaming but restore timing mattered:
    - restoring too aggressively caused jitter/flashing
    - not restoring left the indicator hidden before subsequent work
    (tool calls, web search, etc.)
    
    ### Fix
    - Add optional `phase` to `AgentMessageItem` and propagate it from
    `ResponseItem::Message`
    - Keep indicator hidden during streamed commit ticks, restore only when:
      - assistant item completes as `phase=commentary`, and
      - stream queues are idle + task is still running.
    - Treat `phase=None` as final-answer behavior (no restore) to keep
    existing behavior for non-preamble models
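
    A minimal std-only Rust sketch of the restore rule above. The
    `MessagePhase` enum and `should_restore_status` function are hypothetical
    names for illustration, not the actual TUI code.

    ```rust
    /// Hypothetical stand-in for the message `phase` field.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum MessagePhase {
        Commentary,
        FinalAnswer,
    }

    /// Decide whether to re-show the status indicator once an assistant item
    /// completes. `None` (non-preamble models) behaves like a final answer,
    /// so the indicator stays hidden.
    pub fn should_restore_status(
        phase: Option<MessagePhase>,
        stream_queues_idle: bool,
        task_running: bool,
    ) -> bool {
        matches!(phase, Some(MessagePhase::Commentary)) && stream_queues_idle && task_running
    }
    ```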
    
    ### Tests
    Add/update tests for:
    - no idle-tick restore without commentary completion
    - commentary completion restoring status before tool begin
    - snapshot coverage for preamble/status behavior
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • Mark Config.apps as experimental, correct schema generation issue (#10938)
    This PR makes `Config.apps` experimental-only and fixes a TS schema
    post-processing bug that removed needed imports. The bug happened
    because import pruning only checked the inner type body after filtering,
    not the full alias, so `JsonValue` got dropped from `Config.ts`. We now
    prune against the full alias body and added a regression test for this
    scenario.
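
    The fix amounts to checking each candidate import against the whole alias
    text. Here is a rough Rust sketch of that check. `prune_imports` is a
    hypothetical helper, and the real post-processing step may be written
    differently.

    ```rust
    /// Keep only the imports whose identifier appears somewhere in the full
    /// type-alias text, not just in a filtered inner body.
    pub fn prune_imports<'a>(imports: &[&'a str], full_alias: &str) -> Vec<&'a str> {
        imports
            .iter()
            .copied()
            .filter(|name| references(full_alias, name))
            .collect()
    }

    /// Whole-word match, so an import named `Json` is not kept just because
    /// `JsonValue` appears in the body.
    fn references(body: &str, ident: &str) -> bool {
        body.split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .any(|token| token == ident)
    }
    ```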
  • TUI/Core: preserve duplicate skill/app mention selection across submit + resume (#10855)
    ## What changed
    
    - In `codex-rs/core/src/skills/injection.rs`, we now honor explicit
    `UserInput::Skill { name, path }` first, then fall back to text mentions
    only when safe.
    - In `codex-rs/tui/src/bottom_pane/chat_composer.rs`, mention selection
    is now token-bound (selected mention is tied to the specific inserted
    `$token`), and we snapshot bindings at submit time so selection is not
    lost.
    - In `codex-rs/tui/src/chatwidget.rs` and
    `codex-rs/tui/src/bottom_pane/mod.rs`, submit/queue paths now consume
    the submit-time mention snapshot (instead of rereading cleared composer
    state).
    - In `codex-rs/tui/src/mention_codec.rs` and
    `codex-rs/tui/src/bottom_pane/chat_composer_history.rs`, history now
    round-trips mention targets so resume restores the same selected
    duplicate.
    - In `codex-rs/tui/src/bottom_pane/skill_popup.rs` and
    `codex-rs/tui/src/bottom_pane/chat_composer.rs`, duplicate labels are
    normalized to `[Repo]` / `[App]`, app rows no longer show `Connected -`,
    and description space is a bit wider.
    
    <img width="550" height="163" alt="Screenshot 2026-02-05 at 9 56 56 PM"
    src="https://github.com/user-attachments/assets/346a7eb2-a342-4a49-aec8-68dfec0c7d89"
    />
    <img width="550" height="163" alt="Screenshot 2026-02-05 at 9 57 09 PM"
    src="https://github.com/user-attachments/assets/5e04d9af-cccf-4932-98b3-c37183e445ed"
    />
    
    
    ## Before vs now
    
    - Before: selecting a duplicate could still submit the default/repo
    match, and resume could lose which duplicate was originally selected.
    - Now: the exact selected target (skill path or app id) is preserved
    through submit, queue/restore, and resume.
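
    The submit-time snapshot idea can be sketched in std-only Rust.
    `Composer`, `MentionTarget`, and both methods are hypothetical
    simplifications. The real composer in `chat_composer.rs` is more
    involved.

    ```rust
    /// Hypothetical target a `$mention` can resolve to.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub enum MentionTarget {
        Skill { path: String },
        App { id: String },
    }

    #[derive(Default)]
    pub struct Composer {
        text: String,
        // One (token, target) pair per inserted `$token`, in insertion order.
        bindings: Vec<(String, MentionTarget)>,
    }

    impl Composer {
        /// Insert a mention and bind the chosen target to that specific token.
        pub fn insert_mention(&mut self, token: &str, target: MentionTarget) {
            if !self.text.is_empty() {
                self.text.push(' ');
            }
            self.text.push_str(token);
            self.bindings.push((token.to_string(), target));
        }

        /// Snapshot text and bindings together, leaving the composer empty, so
        /// the submit/queue path never rereads cleared composer state.
        pub fn submit(&mut self) -> (String, Vec<(String, MentionTarget)>) {
            (std::mem::take(&mut self.text), std::mem::take(&mut self.bindings))
        }
    }
    ```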
    
    ## Manual test
    
    1. Build and run this branch locally:
       - `cd /Users/daniels/code/codex/codex-rs`
       - `cargo build -p codex-cli --bin codex`
       - `./target/debug/codex`
    2. Open mention picker with `$` and pick a duplicate entry (not the
    first one).
    3. Confirm duplicate UI:
       - repo duplicate rows show `[Repo]`
       - app duplicate rows show `[App]`
       - app description does **not** start with `Connected -`
    4. Submit the prompt, then press Up to restore draft and submit again.  
       Expected: it keeps the same selected duplicate target.
    5. Use `/resume` to reopen the session and send again.  
       Expected: restored mention still resolves to the same duplicate target.
  • Support alternative websocket API (#10861)
    **Test plan**
    
    ```
    cargo build -p codex-cli && RUST_LOG='codex_api::endpoint::responses_websocket=trace,codex_core::client=debug,codex_core::codex=debug' \
      ./target/debug/codex \
        --enable responses_websockets_v2 \
        --profile byok \
        --full-auto
    ```
  • Treat compaction failure as failure state (#10927)
    - Return compaction errors from local and remote compaction flows.
    - Stop turns/tasks when auto-compaction fails instead of continuing
    execution.
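
    In Rust terms, the change means the compaction result is propagated
    instead of swallowed. The following is a hypothetical sketch. `run_turn`,
    `CompactionError`, and `TurnOutcome` do not come from the codebase, and
    `compact` stands in for either the local or the remote flow.

    ```rust
    #[derive(Debug, PartialEq)]
    pub struct CompactionError(pub String);

    #[derive(Debug, PartialEq)]
    pub enum TurnOutcome {
        Completed,
    }

    pub fn run_turn<F>(needs_compaction: bool, compact: F) -> Result<TurnOutcome, CompactionError>
    where
        F: FnOnce() -> Result<(), CompactionError>,
    {
        if needs_compaction {
            // Propagate the error so the turn stops instead of continuing execution.
            compact()?;
        }
        // Reached only when compaction succeeded or was not needed.
        Ok(TurnOutcome::Completed)
    }
    ```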
  • chore(app-server): add experimental annotation to relevant fields (#10928)
    These fields had always been documented as experimental/unstable with
    docstrings, but now let's actually use the `experimental` annotation to
    be more explicit.
    
    - thread/start.experimentalRawEvents
    - thread/resume.history
    - thread/resume.path
    - thread/fork.path
    - turn/start.collaborationMode
    - account/login/start.chatgptAuthTokens
  • core: refresh developer instructions after compaction replacement history (#10574)
    ## Summary
    
    When replaying compacted history (especially `replacement_history` from
    remote compaction), we should not keep stale developer messages from
    older session state. This PR trims developer-
    role messages from compacted replacement history and reinjects fresh
    developer instructions derived from current turn/session state.
    
    This aligns compaction replay behavior with the intended "fresh
    instructions after summary" model.
    
    ## Problem
    
    Compaction replay had two paths:
    
    - `Compacted { replacement_history: None }`: rebuilt with fresh initial
    context
    - `Compacted { replacement_history: Some(...) }`: previously used raw
    replacement history as-is
    
    The second path could carry stale developer instructions
    (permissions/personality/collab-mode guidance) across session changes.
    
    ## What Changed
    
    ### 1) Added helper to refresh compacted developer instructions
    
    - **File:** `codex-rs/core/src/compact.rs`
    - **Function:** `refresh_compacted_developer_instructions(...)`
    
    Behavior:
    - remove all `ResponseItem::Message { role: "developer", .. }` from
    compacted history
    - append fresh developer messages from current
    `build_initial_context(...)`
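
    The two steps above can be sketched as follows. `Message` is a simplified
    stand-in for `ResponseItem::Message`. The parameter list is an assumption,
    since the description elides it as `(...)`.

    ```rust
    /// Simplified stand-in for `ResponseItem::Message`.
    #[derive(Clone, Debug, PartialEq)]
    pub struct Message {
        pub role: String,
        pub text: String,
    }

    pub fn refresh_compacted_developer_instructions(
        compacted: Vec<Message>,
        initial_context: &[Message],
    ) -> Vec<Message> {
        // 1. Drop every developer message carried over from older session state.
        let mut refreshed: Vec<Message> = compacted
            .into_iter()
            .filter(|m| m.role != "developer")
            .collect();
        // 2. Append the developer messages from the freshly built initial context.
        refreshed.extend(
            initial_context
                .iter()
                .filter(|m| m.role == "developer")
                .cloned(),
        );
        refreshed
    }
    ```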
    
    ### 2) Applied helper in remote compaction flow
    
    - **File:** `codex-rs/core/src/compact_remote.rs`
    - After receiving compact endpoint output, refresh developer
    instructions before replacing history and persisting
    `replacement_history`.
    
    ### 3) Applied helper while reconstructing history from rollout
    
    - **File:** `codex-rs/core/src/codex.rs`
    - In `reconstruct_history_from_rollout(...)`, when processing
    `Compacted` entries with `replacement_history`, refresh developer
    instructions instead of directly replacing with raw history.
    
    ## Non-Goals / Follow-up
    
    This PR does **not** address the existing first-turn-after-resume
    double-injection behavior.
    A follow-up PR will handle resume-time dedup/idempotence separately.
    
    ## Codex author
    `codex fork 019c25e6-706e-75d1-9198-688ec00a8256`
  • core: preconnect Responses websocket for first turn (#10698)
    ## Problem
    The first user turn can pay websocket handshake latency even when a
    session has already started. We want to reduce that initial delay while
    preserving turn semantics and avoiding any prompt send during startup.
    
    Reviewer feedback also called out duplicated connect/setup paths and
    unnecessary preconnect state complexity.
    
    ## Mental model
    `ModelClient` owns session-scoped transport state. During session
    startup, it can opportunistically warm one websocket handshake slot. A
    turn-scoped `ModelClientSession` adopts that slot once if available,
    restores captured sticky turn-state, and otherwise opens a websocket
    through the same shared connect path.
    
    If startup preconnect is still in flight, first turn setup awaits that
    task and treats it as the first connection attempt for the turn.
    
    Preconnect is handshake-only. The first `response.create` is still sent
    only when a turn starts.
    
    ## Non-goals
    This change does not make preconnect required for correctness and does
    not change prompt/turn payload semantics. It also does not expand
    fallback behavior beyond clearing preconnect state when fallback
    activates.
    
    ## Tradeoffs
    The implementation prioritizes simpler ownership and shared connection
    code over header-match gating for reuse. The single-slot cache keeps
    lifecycle straightforward but only benefits the immediate next turn.
    
    Awaiting in-flight preconnect has the same app-level connect-timeout
    semantics as existing websocket connect behavior (no new timeout class
    introduced by this PR).
    
    ## Architecture
    `core/src/client.rs`:
    - Added session-level preconnect lifecycle state (`Idle` / `InFlight` /
    `Ready`) carrying one warmed websocket plus optional captured
    turn-state.
    - Added `pre_establish_connection()` startup warmup and `preconnect()`
    handshake-only setup.
    - Deduped auth/provider resolution into `current_client_setup()` and
    websocket handshake wiring into `connect_websocket()` /
    `build_websocket_headers()`.
    - Updated turn websocket path to adopt preconnect first, await in-flight
    preconnect when present, then create a new websocket only when needed.
    - Ensured fallback activation clears warmed preconnect state.
    - Added documentation for lifecycle, ownership, sticky-routing
    invariants, and timeout semantics.
    
    `core/src/codex.rs`:
    - Session startup invokes `model_client.pre_establish_connection(...)`.
    - Turn metadata resolution uses the shared timeout helper.
    
    `core/src/turn_metadata.rs`:
    - Centralized shared timeout helper used by both turn-time metadata
    resolution and startup preconnect metadata building.
    
    `core/tests/common/responses.rs` + websocket test suites:
    - Added deterministic handshake waiting helper (`wait_for_handshakes`)
    with bounded polling.
    - Added startup preconnect and in-flight preconnect reuse coverage.
    - Fallback expectations now assert exactly two websocket attempts in
    covered scenarios (startup preconnect + turn attempt before fallback
    sticks).
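
    The single-slot `Idle` / `InFlight` / `Ready` lifecycle described for
    `core/src/client.rs` can be sketched with std threads standing in for
    async tasks. `Preconnect`, `Conn`, and `take` are hypothetical. The real
    client is async and also carries captured turn-state.

    ```rust
    use std::thread::JoinHandle;

    /// Placeholder for a websocket that has completed its handshake.
    #[derive(Debug, PartialEq)]
    pub struct Conn(pub u32);

    /// Single warmed-connection slot owned by the session-scoped client.
    pub enum Preconnect {
        Idle,
        InFlight(JoinHandle<Result<Conn, String>>),
        Ready(Conn),
    }

    impl Preconnect {
        /// Adopt the warmed connection at most once; the slot is reset to `Idle`.
        /// An in-flight handshake is awaited (here: joined). A missing or failed
        /// preconnect yields `None`, and the caller opens its own connection.
        pub fn take(&mut self) -> Option<Conn> {
            match std::mem::replace(self, Preconnect::Idle) {
                Preconnect::Idle => None,
                Preconnect::Ready(conn) => Some(conn),
                Preconnect::InFlight(handle) => handle.join().ok().and_then(Result::ok),
            }
        }
    }
    ```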
    
    ## Observability
    Preconnect remains best-effort and non-fatal. Existing
    websocket/fallback telemetry remains in place, and debug logs now make
    preconnect-await behavior and preconnect failures easier to reason
    about.
    
    ## Tests
    Validated with:
    1. `just fmt`
    2. `cargo test -p codex-core websocket_preconnect -- --nocapture`
    3. `cargo test -p codex-core websocket_fallback -- --nocapture`
    4. `cargo test -p codex-core websocket_first_turn_waits_for_inflight_preconnect -- --nocapture`
  • fix(linux-sandbox): block io_uring syscalls in no-network seccomp policy (#10814)
    ## Summary
    
    - Add seccomp deny rules for `io_uring` syscalls in the Linux sandbox
    network policy.
    - Specifically deny:
      - `SYS_io_uring_setup`
      - `SYS_io_uring_enter`
      - `SYS_io_uring_register`
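
    For reference, here is a hypothetical std-only sketch of the deny list.
    The numbers are the Linux io_uring syscall numbers shared by x86_64 and
    aarch64. Real code should take them from `libc::SYS_io_uring_*` instead
    of hard-coding them. Blocking them is presumably needed because io_uring
    operations are submitted through shared ring buffers, so seccomp rules on
    individual socket syscalls never see them.

    ```rust
    /// io_uring syscalls denied by the no-network policy (x86_64/aarch64 numbers).
    pub const DENIED_IN_NO_NETWORK_POLICY: [(&str, i64); 3] = [
        ("io_uring_setup", 425),
        ("io_uring_enter", 426),
        ("io_uring_register", 427),
    ];

    /// True if the syscall number is on the deny list above.
    pub fn is_denied(syscall_nr: i64) -> bool {
        DENIED_IN_NO_NETWORK_POLICY
            .iter()
            .any(|&(_, nr)| nr == syscall_nr)
    }
    ```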
  • feat(network-proxy): add structured policy decision to blocked errors (#10420)
    ## Summary
    Add explicit, model-visible network policy decision metadata to blocked
    proxy responses/errors.
    
    Introduces a standardized prefix line: `CODEX_NETWORK_POLICY_DECISION {json}`
    
    and wires it through blocked paths for:
    - HTTP requests
    - HTTPS CONNECT
    - SOCKS5 TCP/UDP denials
    
    ## Why
    The model should see *why* a request was blocked
    (reason/source/protocol/host/port) so it can choose the correct next
    action.
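
    Here is a std-only sketch of building the prefix line. The field names
    mirror the reason/source/protocol/host/port list above, but the exact
    JSON keys and their order are assumptions. `PolicyDecision`, `json_str`,
    and `decision_line` are hypothetical.

    ```rust
    /// Hypothetical decision payload.
    pub struct PolicyDecision<'a> {
        pub reason: &'a str,
        pub source: &'a str,
        pub protocol: &'a str,
        pub host: &'a str,
        pub port: u16,
    }

    /// Minimal JSON string escaping: quotes, backslashes, control characters.
    pub fn json_str(s: &str) -> String {
        let mut out = String::from("\"");
        for c in s.chars() {
            match c {
                '"' => out.push_str("\\\""),
                '\\' => out.push_str("\\\\"),
                c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
                c => out.push(c),
            }
        }
        out.push('"');
        out
    }

    /// Build the `CODEX_NETWORK_POLICY_DECISION {json}` prefix line.
    pub fn decision_line(d: &PolicyDecision<'_>) -> String {
        format!(
            "CODEX_NETWORK_POLICY_DECISION {{\"reason\":{},\"source\":{},\"protocol\":{},\"host\":{},\"port\":{}}}",
            json_str(d.reason),
            json_str(d.source),
            json_str(d.protocol),
            json_str(d.host),
            d.port
        )
    }
    ```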
    
    ## Notes
    - This PR is intentionally independent of config-layering/network-rule
    runtime integration.
    - Focus is blocked decision surface only.
  • Add app configs to config.toml (#10822)
    Adds app configs to config.toml + tests
  • Queue nudges while plan generating (#10457)
    ## Summary
    
    This PR fixes a UI/streaming race when nudged or steer-enabled messages
    are queued during an active Plan stream.
    
    Previously, `submit_user_message_with_mode` switched collaboration mode
    immediately (via `set_collaboration_mask`) even when the message was
    queued. If that happened mid-Plan stream, `active_mode_kind` could flip
    away from Plan before the turn finished, causing subsequent
    `on_plan_delta` updates to be ignored in the UI.
    
    Now, mode switching is deferred until the queued message is actually
    submitted.
    
    ## What changed
    
    - Added a per-message deferred mode override on `UserMessage`:
      - `collaboration_mode_override: Option<CollaborationModeMask>`
    - Updated `submit_user_message_with_mode` to:
      - create a `UserMessage` carrying the mode override
      - queue or submit that message without mutating global mode immediately
    - Updated `submit_user_message` to:
      - apply `collaboration_mode_override` just before constructing/sending
        `Op::UserTurn`
    - Kept queueing condition scoped to active Plan stream rendering:
      - queue only while plan output is actively streaming in TUI
        (`plan_stream_controller.is_some()`)
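
    The deferral can be sketched as follows. `Mode` is a simplified stand-in
    for `CollaborationModeMask`, and `plan_streaming` stands in for
    `plan_stream_controller.is_some()`. `Chat` and `on_plan_stream_finished`
    are hypothetical.

    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum Mode {
        Plan,
        Code,
    }

    pub struct UserMessage {
        pub text: String,
        pub collaboration_mode_override: Option<Mode>,
    }

    pub struct Chat {
        pub active_mode: Mode,
        pub plan_streaming: bool,
        pub queued: Vec<UserMessage>,
        pub sent: Vec<(String, Mode)>, // (text, mode it was sent in)
    }

    impl Chat {
        pub fn submit_user_message_with_mode(&mut self, text: &str, mode: Mode) {
            let msg = UserMessage { text: text.to_string(), collaboration_mode_override: Some(mode) };
            // Do not touch `active_mode` here; a queued message must not flip it.
            if self.plan_streaming {
                self.queued.push(msg);
            } else {
                self.submit_user_message(msg);
            }
        }

        pub fn submit_user_message(&mut self, msg: UserMessage) {
            // Apply the override only now, right before the turn is sent.
            if let Some(mode) = msg.collaboration_mode_override {
                self.active_mode = mode;
            }
            self.sent.push((msg.text, self.active_mode));
        }

        pub fn on_plan_stream_finished(&mut self) {
            self.plan_streaming = false;
            for msg in std::mem::take(&mut self.queued) {
                self.submit_user_message(msg);
            }
        }
    }
    ```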
    
    ## Why
    
    This preserves Plan mode for the remainder of the in-flight Plan turn,
    so streamed plan deltas continue rendering correctly, while still
    ensuring the follow-up queued message is sent with the intended
    collaboration mode.
    
    ## Behavior after this change
    
    - If a nudged/steer submission happens while Plan output is actively
    streaming:
      - message is queued
      - UI stays in Plan mode for the running turn
      - once dequeued/submitted, mode override is applied and the message is
        sent in the intended mode
    - If no Plan stream is active:
      - submission proceeds immediately and mode override is applied as before
    
    ## Tests
    
    Added/updated coverage in `tui/src/chatwidget/tests.rs`:
    
    - `submit_user_message_with_mode_queues_while_plan_stream_is_active`
      - asserts mode remains Plan while queued
      - asserts mode switches to Code when queued message is actually
        submitted
    - `submit_user_message_with_mode_submits_when_plan_stream_is_not_active`
    - `steer_enter_queues_while_plan_stream_is_active`
    - `steer_enter_submits_when_plan_stream_is_not_active`
    
    Also updated existing `UserMessage { ... }` test fixtures to include the
    new field.
    
    ## Codex author
    `codex fork 019c1047-d5d5-7c92-a357-6009604dc7e8`
  • Removed "exec_policy" feature flag (#10851)
    This is no longer needed because the feature is now on by default.
  • Handle required MCP startup failures across components (#10902)
    Summary
    - add a `required` flag for MCP servers everywhere config/CLI data is
    touched so mandatory helpers can be round-tripped
    - have `codex exec` and `codex app-server` thread start/resume fail fast
    when required MCPs fail to initialize
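
    Here is a minimal sketch of the fail-fast check, assuming a per-server
    `required` flag as described. `McpServer`, `check_required_mcp`, and the
    error text are hypothetical.

    ```rust
    pub struct McpServer {
        pub name: String,
        pub required: bool,
    }

    /// Fail thread start/resume if any `required` server failed to start.
    /// Optional servers may fail without blocking. `started` reports the
    /// per-server startup result.
    pub fn check_required_mcp(
        servers: &[McpServer],
        started: impl Fn(&str) -> bool,
    ) -> Result<(), String> {
        let failed: Vec<&str> = servers
            .iter()
            .filter(|s| s.required && !started(s.name.as_str()))
            .map(|s| s.name.as_str())
            .collect();
        if failed.is_empty() {
            Ok(())
        } else {
            Err(format!("required MCP servers failed to initialize: {}", failed.join(", ")))
        }
    }
    ```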
  • nit: test an (#10892)
  • Personality setting is no longer available in experimental menu (#10852)
    This PR removes the inaccurate "Disable in /experimental." statement now
    that the "personality" feature flag is no longer experimental.
    
    This addresses #10850
  • Gate app tooltips to macOS (#10784)
    - Gate app promo tips to macOS and use non-app copy elsewhere.
  • Print warning when config does not meet requirements (#10792)
    <img width="1019" height="284" alt="Screenshot 2026-02-05 at 23 34 08"
    src="https://github.com/user-attachments/assets/19ec3ce1-3c3b-40f5-b251-a31d964bf3bb"
    />
    
    Currently, if a config value is set that fails the requirements, we exit
    Codex.
    
    Now, instead of this, we print a warning and default to a
    requirements-permitting value.
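
    The new behavior is roughly "clamp and warn". Here is a generic std-only
    sketch. `constrain` is a hypothetical helper, and the fallback is assumed
    to be a value the requirements permit.

    ```rust
    use std::fmt::Debug;

    /// Return the requested value if allowed; otherwise return `fallback`
    /// plus a warning message instead of aborting startup.
    pub fn constrain<T: PartialEq + Debug>(
        name: &str,
        requested: T,
        allowed: &[T],
        fallback: T,
    ) -> (T, Option<String>) {
        if allowed.contains(&requested) {
            (requested, None)
        } else {
            let warning =
                format!("{name} = {requested:?} is not permitted by requirements; using {fallback:?}");
            (fallback, Some(warning))
        }
    }
    ```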
  • feat(app-server): turn/steer API (#10821)
    This PR adds a dedicated `turn/steer` API for appending user input to an
    in-flight turn.
    
    ## Motivation
    Currently, steering in the app is implemented by just calling
    `turn/start` while a turn is running. This has some really weird quirks:
    - Client gets back a new `turn.id`, even though streamed
    events/approvals remain tied to the original active turn ID.
    - All the various turn-level override params on `turn/start` do not
    apply to the "steer", and would only apply to the next real turn.
    - There can also be a race condition where the client thinks the turn is
    active but the server has already completed it, so there might be bugs
    if the client has baked in some client-specific behavior thinking it's a
    steer when in fact the server kicked off a new turn. This is
    particularly possible when running a client against a remote app-server.
    
    Having a dedicated `turn/steer` API eliminates all those quirks.
    
    `turn/steer` behavior:
    - Requires an active turn on `threadId`. Returns a JSON-RPC error if there
    is no active turn.
    - If `expectedTurnId` is provided, it must match the active turn (more
    useful when connecting to a remote app-server).
    - Does not emit `turn/started`.
    - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
    `outputSchema` to accurately reflect that these are not applied when
    steering.
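
    The two request checks can be sketched as a pure function. `validate_steer`
    and `SteerError` are hypothetical. Per the description, the real handler
    returns a JSON-RPC error when there is no active turn.

    ```rust
    #[derive(Debug, PartialEq)]
    pub enum SteerError {
        NoActiveTurn,
        TurnIdMismatch { active: String },
    }

    /// Validate a `turn/steer` request against the thread's active turn.
    /// On success, returns the id of the turn the input is appended to;
    /// no new turn id is minted.
    pub fn validate_steer(
        active_turn: Option<&str>,
        expected_turn_id: Option<&str>,
    ) -> Result<String, SteerError> {
        let active = active_turn.ok_or(SteerError::NoActiveTurn)?;
        match expected_turn_id {
            Some(expected) if expected != active => {
                Err(SteerError::TurnIdMismatch { active: active.to_string() })
            }
            _ => Ok(active.to_string()),
        }
    }
    ```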
  • Add stage field for experimental flags. (#10793)
    - [x] Add stage field for experimental flags.
  • updates: use brew api for version check (#10809)
    ## Problem
    
    `codex` currently prompts you to update via `brew upgrade --cask codex`
    even though the brew API does not yet return the new version.
    
    > <img width="1500" height="822" alt="Screenshot 2026-02-05 at 12 36
    09 PM"
    src="https://github.com/user-attachments/assets/9e12929d-95e8-43f4-8fba-ab93f5f76e73"
    />
    
    ## Solution
    
    `codex-rs/tui/src/updates.rs` was using the [latest cask in
    github](https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/c/codex.rb),
    but this does not agree with the brew API, which leads to the issue
    above. Instead we use the [brew API JSON
    endpoint](https://formulae.brew.sh/api/cask/codex.json)
    to ensure our version check agrees with the upgrade command.
  • go back to auto-enabling web_search for azure (#10820)
    ###### What
    Remove special-casing that prevented auto-enabling `web_search` for
    Azure model provider users. Addresses #10071, #10257.
    
    ###### Why
    Azure fixed their Responses API implementation; `web_search` is now
    supported on models it wasn't before (like `gpt-5.1-codex-max`).
    
    This request now works:
    ```
    curl "$AZURE_API_ENDPOINT" -H "Content-Type: application/json" -H "Authorization: Bearer $AZURE_API_KEY" -d '{
      "model": "gpt-5.1-codex-max",
      "tools": [
        { "type": "web_search" }
      ],
      "tool_choice": "auto",
      "input": "Find the sunrise time in Paris today and cite the source."
    }'
    ```
    
    ###### Tests
    Tested with the curl command above; removed Azure-specific tests.
  • Sync app-server requirements API with refreshed cloud loader (#10815)
    configRequirements/read now returns updated cloud requirements after
    login.
  • Add app-server transport layer with websocket support (#10693)
    - Adds `--listen <URL>` to `codex app-server` with two listen modes:
      - `stdio://` (default, existing behavior)
      - `ws://IP:PORT` (new websocket transport)
    - Refactors message routing to be connection-aware:
      - Tracks per-connection session state (initialize/experimental
        capability)
      - Routes responses/errors to the originating connection
      - Broadcasts server notifications/requests to initialized connections
    - Updates initialization semantics to be per connection (not
    process-global), and updates app-server docs accordingly.
    - Adds websocket accept/read/write handling (JSON-RPC per text frame,
    ping/pong handling, connection lifecycle events).
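
    Here is a std-only sketch of parsing the two `--listen` forms. `Listen`
    and `parse_listen` are hypothetical. This version only accepts literal
    socket addresses (no host names), which may be stricter or looser than
    the real parser.

    ```rust
    use std::net::SocketAddr;

    #[derive(Debug, PartialEq)]
    pub enum Listen {
        Stdio,
        WebSocket(SocketAddr),
    }

    /// Parse a `--listen` value: `stdio://` (the default) or `ws://IP:PORT`.
    pub fn parse_listen(url: &str) -> Result<Listen, String> {
        if url == "stdio://" {
            return Ok(Listen::Stdio);
        }
        match url.strip_prefix("ws://") {
            Some(addr) => addr
                .parse::<SocketAddr>()
                .map(Listen::WebSocket)
                .map_err(|e| format!("invalid ws address {addr:?}: {e}")),
            None => Err(format!("unsupported listen URL: {url}")),
        }
    }
    ```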
    
    Testing
    
    - Unit tests for transport URL parsing and targeted response/error
    routing.
    - New websocket integration test validating:
      - per-connection initialization requirements
      - no cross-connection response leakage
      - same request IDs on different connections route independently.
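
    The routing rules under test can be modeled with per-connection outboxes.
    This is a hypothetical sketch (`Router`, `Connection`, and the JSON shape
    are assumptions). It shows why equal request IDs on two connections
    cannot collide, and why only initialized connections receive broadcasts.

    ```rust
    use std::collections::HashMap;

    pub type ConnectionId = u64;

    #[derive(Default)]
    pub struct Connection {
        pub initialized: bool,
        pub outbox: Vec<String>,
    }

    #[derive(Default)]
    pub struct Router {
        pub connections: HashMap<ConnectionId, Connection>,
    }

    impl Router {
        /// A response goes only to the connection that sent the request, so the
        /// same request id on another connection is unaffected.
        pub fn send_response(&mut self, conn: ConnectionId, request_id: i64, result: &str) {
            if let Some(c) = self.connections.get_mut(&conn) {
                c.outbox.push(format!("{{\"id\":{request_id},\"result\":{result}}}"));
            }
        }

        /// Notifications are broadcast, but only to connections that completed
        /// `initialize`.
        pub fn broadcast(&mut self, notification: &str) {
            for c in self.connections.values_mut().filter(|c| c.initialized) {
                c.outbox.push(notification.to_string());
            }
        }
    }
    ```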
  • chore: limit update to 0.98.0 NUX to < 0.98.0 ver (#10787)
    This seemed like a footgun if we forget to remove it before releasing
    0.99.0, so the announcement is now limited to versions < 0.98.0.
  • [app-server] Add a method to list experimental features. (#10721)
    - [x] Add a method to list experimental features.
  • chore: rm web-search-eligible header (#10660)
    Default enablement of `web_search` is now client-side, so there is no
    need to send eligibility headers to the backend.

    Tested locally; the headers are no longer sent.

    Will wait for the corresponding backend change to deploy before merging.
  • add sandbox policy and sandbox name to codex.tool.call metrics (#10711)
    This will give visibility into the comparative success rate of the
    Windows sandbox implementations compared to other platforms.
  • fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423)
    This way the rest of the codebase (like the TUI) doesn't need to be
    concerned with whether ChatGPT auth was handled by Codex itself or
    passed in via app-server's external auth mode.
  • feat(tui): add sortable resume picker with created/updated timestamp toggle (#10752)
    ## Summary
    
    - Add sorting support to the resume session picker with Tab key toggle
    - Sessions can now be sorted by either creation time or last updated
    time
    - Display the current sort mode in the picker header
    - Default to sorting by creation time (most recent first)
    
    ## Changes
    
    - Add `sort_key` field to `PickerState` to track current sort order
    - Pass sort key to `RolloutRecorder::list_threads()` for proper backend
    sorting
    - Add Tab key handler to toggle between `CreatedAt` and `UpdatedAt`
    sorting
    - Show current sort mode ("Created at" / "Updated at") in header
    - Add "Tab to toggle sort" keyboard hint
    - Intelligently hide secondary date column when terminal is narrow
    - Reload session list when sort order changes
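
    The toggle itself is small. Here is a hypothetical sketch. The PR pushes
    sorting down to `RolloutRecorder::list_threads()`, whereas this version
    sorts in memory purely for illustration. `Session` and `sort_sessions`
    are assumptions.

    ```rust
    use std::cmp::Reverse;

    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum SortKey {
        CreatedAt,
        UpdatedAt,
    }

    impl SortKey {
        /// Tab flips between the two keys.
        pub fn toggled(self) -> Self {
            match self {
                SortKey::CreatedAt => SortKey::UpdatedAt,
                SortKey::UpdatedAt => SortKey::CreatedAt,
            }
        }

        /// Header text for the current mode.
        pub fn label(self) -> &'static str {
            match self {
                SortKey::CreatedAt => "Created at",
                SortKey::UpdatedAt => "Updated at",
            }
        }
    }

    pub struct Session {
        pub id: &'static str,
        pub created_at: u64,
        pub updated_at: u64,
    }

    /// Most recent first for the chosen key.
    pub fn sort_sessions(sessions: &mut [Session], key: SortKey) {
        sessions.sort_by_key(|s| {
            Reverse(match key {
                SortKey::CreatedAt => s.created_at,
                SortKey::UpdatedAt => s.updated_at,
            })
        });
    }
    ```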
    
    ## Test plan
    
    - [x] Unit tests for sort key toggle functionality
    - [x] Snapshot tests updated for new header format
    - [x] Test that Tab key triggers reload with new sort key
    - [x] Test column visibility adapts to narrow terminals
  • feat(tui): add /statusline command for interactive status line configuration (#10546)
    ## Summary
    - Adds a new `/statusline` command to configure TUI footer status line
    - Introduces reusable `MultiSelectPicker` component with keyboard
    navigation, optional ordering and toggle support
    - Implements a status line setup modal that persists configuration to
    config.toml
    
      ## Status Line Items
      The following items can be displayed in the status line:
      - **Model**: Current model name (with optional reasoning level)
      - **Context**: Remaining/used context window percentage
      - **Rate Limits**: 5-hour and weekly usage limits
      - **Git**: Current branch (with optimized lookups)
      - **Tokens**: Used tokens, input/output token counts
      - **Session**: Session ID (full or shortened prefix)
      - **Paths**: Current directory, project root
      - **Version**: Codex version
    
      ## Features
      - Live preview while configuring status line items
      - Fuzzy search filtering in the picker
      - Intelligent truncation when items don't fit
      - Items gracefully omit when data is unavailable
      - Configuration persists to `config.toml`
      - Validates and warns about invalid status line items
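
      One way to get the omit-and-truncate behavior is to append items while
      they fit. This is a hypothetical sketch. The separator, the
      left-to-right priority, and counting width in `char`s (rather than
      terminal display width) are all assumptions.

      ```rust
      /// Join available items with a separator, stopping before the first item
      /// that would overflow `width`. `None` items (data unavailable) are skipped.
      pub fn render_status_line(items: &[Option<String>], width: usize) -> String {
          const SEP: &str = " | ";
          let mut line = String::new();
          for item in items.iter().flatten() {
              let extra = if line.is_empty() { 0 } else { SEP.chars().count() };
              if line.chars().count() + extra + item.chars().count() > width {
                  break;
              }
              if !line.is_empty() {
                  line.push_str(SEP);
              }
              line.push_str(item);
          }
          line
      }
      ```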
    
      ## Test plan
      - [x] Run `/statusline` and verify picker UI appears
      - [x] Toggle items on/off and verify live preview updates
      - [x] Confirm selection persists after restart
      - [x] Verify truncation behavior with many items selected
      - [x] Test git branch detection in and out of git repos
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • Add hooks implementation and wire up to notify (#9691)
    This introduces a `Hooks` service. It registers hooks from config and
    dispatches hook events at runtime.
    
    N.B. The hook config is not wired up to this yet. But for legacy
    reasons, we wire up `notify` from config and power it using hooks now.
    Nothing about the `notify` interface has changed.
    
    I'd start by reviewing `hooks/types.rs`
    
    Some things to note:
      - hook names subject to change
      - no hook result yet
      - stopping semantics yet to be introduced
      - additional hooks yet to be introduced
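
    Here is a minimal sketch of a register/dispatch service in the spirit of
    `Hooks`. `HookEvent::AgentTurnComplete` and the string payload are
    hypothetical. As noted above, there are no hook results or stop semantics
    yet, so `dispatch` returns nothing.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical event name; hook names are subject to change.
    #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
    pub enum HookEvent {
        AgentTurnComplete,
    }

    pub type Hook = Box<dyn Fn(&str)>;

    #[derive(Default)]
    pub struct Hooks {
        registry: HashMap<HookEvent, Vec<Hook>>,
    }

    impl Hooks {
        /// Register a hook (for example, one built from the `notify` config).
        pub fn register(&mut self, event: HookEvent, hook: Hook) {
            self.registry.entry(event).or_default().push(hook);
        }

        /// Run every hook registered for `event`, in registration order.
        pub fn dispatch(&self, event: HookEvent, payload: &str) {
            for hook in self.registry.get(&event).into_iter().flatten() {
                hook(payload);
            }
        }
    }
    ```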