Commit Graph

3718 Commits

  • Personality setting is no longer available in experimental menu (#10852)
    This PR removes the inaccurate "Disable in /experimental." statement now
    that the "personality" feature flag is no longer experimental.
    
    This addresses #10850
  • Gate app tooltips to macOS (#10784)
    - Gate app promo tips to macOS and use non-app copy elsewhere.
  • Print warning when config does not meet requirements (#10792)
    Currently, if a config value fails the requirements, Codex exits.
    
    Now we print a warning instead and fall back to a value that the
    requirements permit.
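    
    The fallback behavior described above could look roughly like the
    following. This is a minimal sketch, not the code from the PR: the
    `Requirement` and `Resolved` types, the `resolve` function, and the choice
    of "first permitted value" as the fallback are all assumptions made for
    illustration.
    
    ```rust
    /// Hypothetical requirement: the values an admin permits for one setting.
    #[derive(Debug)]
    pub struct Requirement<'a> {
        pub allowed: &'a [&'a str],
    }
    
    /// Outcome of checking a configured value against a requirement.
    #[derive(Debug, PartialEq)]
    pub struct Resolved {
        pub value: String,
        pub warning: Option<String>,
    }
    
    /// Keeps the configured value when it is permitted. Otherwise falls back
    /// to the first permitted value (an assumption) and records a warning
    /// instead of exiting. Returns `None` only if nothing is permitted.
    pub fn resolve(key: &str, configured: &str, req: &Requirement) -> Option<Resolved> {
        if req.allowed.contains(&configured) {
            return Some(Resolved {
                value: configured.to_string(),
                warning: None,
            });
        }
        let fallback = req.allowed.first()?;
        Some(Resolved {
            value: fallback.to_string(),
            warning: Some(format!(
                "warning: {key} = \"{configured}\" does not meet requirements; using \"{fallback}\""
            )),
        })
    }
    ```
    
    A caller would print `warning` (when present) and continue with `value`.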
  • feat(app-server): turn/steer API (#10821)
    This PR adds a dedicated `turn/steer` API for appending user input to an
    in-flight turn.
    
    ## Motivation
    Currently, steering in the app is implemented by just calling
    `turn/start` while a turn is running. This has some really weird quirks:
    - The client gets back a new `turn.id`, even though streamed
    events/approvals remain tied to the original active turn ID.
    - All the various turn-level override params on `turn/start` do not
    apply to the "steer", and would only apply to the next real turn.
    - There can also be a race condition where the client thinks the turn is
    active but the server has already completed it, so there might be bugs
    if the client has baked in some client-specific behavior thinking it's a
    steer when in fact the server kicked off a new turn. This is
    particularly possible when running a client against a remote app-server.
    
    Having a dedicated `turn/steer` API eliminates all those quirks.
    
    `turn/steer` behavior:
    - Requires an active turn on threadId. Returns a JSON-RPC error if there
    is no active turn.
    - If expectedTurnId is provided, it must match the active turn (more
    useful when connecting to a remote app-server).
    - Does not emit `turn/started`.
    - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
    `outputSchema` to accurately reflect that these are not applied when
    steering.
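    
    The `turn/steer` rules listed above can be sketched as a small state
    check. This is an illustrative model only, assuming a hypothetical
    `ThreadState` and `SteerError`; the real app-server types and JSON-RPC
    error shapes are not shown in this description.
    
    ```rust
    /// Hypothetical error type standing in for a JSON-RPC error response.
    #[derive(Debug, PartialEq)]
    pub enum SteerError {
        NoActiveTurn,
        TurnIdMismatch { expected: String, active: String },
    }
    
    /// Minimal per-thread state: the active turn (if any) and its queued input.
    #[derive(Debug, Default)]
    pub struct ThreadState {
        pub active_turn_id: Option<String>,
        pub pending_input: Vec<String>,
    }
    
    impl ThreadState {
        /// Appends input to the in-flight turn. It never starts a new turn and
        /// returns the id of the turn that was steered.
        pub fn steer(
            &mut self,
            input: &str,
            expected_turn_id: Option<&str>,
        ) -> Result<String, SteerError> {
            // Steering requires an active turn on the thread.
            let active = self
                .active_turn_id
                .clone()
                .ok_or(SteerError::NoActiveTurn)?;
            // When the client states which turn it expects, it must match.
            if let Some(expected) = expected_turn_id {
                if expected != active.as_str() {
                    return Err(SteerError::TurnIdMismatch {
                        expected: expected.to_string(),
                        active,
                    });
                }
            }
            self.pending_input.push(input.to_string());
            Ok(active)
        }
    }
    ```
    
    Because the function returns the existing turn id, a client never sees a
    fresh `turn.id` for a steer, which is the quirk the dedicated API removes.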
  • Add stage field for experimental flags. (#10793)
    - [x] Add stage field for experimental flags.
  • updates: use brew api for version check (#10809)
    ## Problem
    
    `codex` currently prompts you to update via `brew upgrade --cask codex`,
    but the brew API does not return the new version.
    
    ## Solution
    
    `codex-rs/tui/src/updates.rs` was using the [latest cask on
    GitHub](https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/c/codex.rb),
    but this does not agree with the brew API, which leads to the issue
    above. Instead, we use the [brew API JSON
    endpoint](https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/c/codex.rb)
    to ensure our version check agrees with the upgrade command.
  • go back to auto-enabling web_search for azure (#10820)
    ###### What
    Remove special-casing that prevented auto-enabling `web_search` for
    Azure model provider users. Addresses #10071, #10257.
    
    ###### Why
    Azure fixed their Responses API implementation; `web_search` is now
    supported on models where it previously was not (like `gpt-5.1-codex-max`).
    
    This request now works:
    ```
    curl "$AZURE_API_ENDPOINT" -H "Content-Type: application/json" -H "Authorization: Bearer $AZURE_API_KEY" -d '{
      "model": "gpt-5.1-codex-max",
      "tools": [
        { "type": "web_search" }
      ],
      "tool_choice": "auto",
      "input": "Find the sunrise time in Paris today and cite the source."
    }'
    ```
    
    ###### Tests
    Tested with above curl, removed Azure-specific tests.
  • Sync app-server requirements API with refreshed cloud loader (#10815)
    configRequirements/read now returns updated cloud requirements after
    login.
  • Add app-server transport layer with websocket support (#10693)
    - Adds `--listen <URL>` to `codex app-server` with two listen modes:
      - `stdio://` (default, existing behavior)
      - `ws://IP:PORT` (new websocket transport)
    - Refactors message routing to be connection-aware:
      - Tracks per-connection session state (initialize/experimental
        capability)
      - Routes responses/errors to the originating connection
      - Broadcasts server notifications/requests to initialized connections
    - Updates initialization semantics to be per connection (not
    process-global), and updates app-server docs accordingly.
    - Adds websocket accept/read/write handling (JSON-RPC per text frame,
    ping/pong handling, connection lifecycle events).
    
    Testing
    
    - Unit tests for transport URL parsing and targeted response/error
    routing.
    - New websocket integration test validating:
      - per-connection initialization requirements
      - no cross-connection response leakage
      - same request IDs on different connections route independently.
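    
    As an illustration of the two listen modes, the sketch below parses a
    `--listen` value into a transport target. The `ListenTarget` enum and
    `parse_listen_url` function are hypothetical names, and the sketch assumes
    the websocket form carries a literal `IP:PORT` as described above; the
    PR's actual parser may accept more or differ in its error handling.
    
    ```rust
    use std::net::SocketAddr;
    
    /// Hypothetical representation of where the app-server should listen.
    #[derive(Debug, PartialEq)]
    pub enum ListenTarget {
        Stdio,
        WebSocket(SocketAddr),
    }
    
    /// Parses `stdio://` or `ws://IP:PORT`; anything else is rejected.
    pub fn parse_listen_url(url: &str) -> Result<ListenTarget, String> {
        if url == "stdio://" {
            return Ok(ListenTarget::Stdio);
        }
        if let Some(rest) = url.strip_prefix("ws://") {
            // `SocketAddr` parsing requires a literal IP and a port.
            return rest
                .parse::<SocketAddr>()
                .map(ListenTarget::WebSocket)
                .map_err(|e| format!("invalid ws address {rest:?}: {e}"));
        }
        Err(format!("unsupported listen URL: {url}"))
    }
    ```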
  • chore: limit update to 0.98.0 NUX to < 0.98.0 ver (#10787)
    This seems like a footgun if we forget to remove it before releasing
    0.99.0, so the announcement is limited to versions < 0.98.0.
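    
    A version gate of this kind could be sketched as follows. The function
    names are hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH`
    strings; it deliberately treats anything else (including pre-release
    suffixes) as "do not show".
    
    ```rust
    /// Parses "MAJOR.MINOR.PATCH" into a tuple. Returns `None` for any other
    /// shape, which this sketch does not try to handle.
    pub fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
        let mut parts = v.trim().split('.');
        let major: u64 = parts.next()?.parse().ok()?;
        let minor: u64 = parts.next()?.parse().ok()?;
        let patch: u64 = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None;
        }
        Some((major, minor, patch))
    }
    
    /// Shows the 0.98.0 announcement only to versions strictly below 0.98.0.
    /// Tuples compare lexicographically, which matches version ordering here.
    pub fn should_show_announcement(current: &str) -> bool {
        match parse_version(current) {
            Some(v) => v < (0, 98, 0),
            None => false,
        }
    }
    ```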
  • [app-server] Add a method to list experimental features. (#10721)
    - [x] Add a method to list experimental features.
  • chore: rm web-search-eligible header (#10660)
    default-enablement of web_search is now client-side, no need to send
    eligibility headers to backend.
    
    Tested locally, headers no longer sent.
    
    will wait for corresponding backend change to deploy before merging
  • add sandbox policy and sandbox name to codex.tool.call metrics (#10711)
    This gives visibility into the success rate of the Windows sandbox
    implementations compared to other platforms.
  • fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423)
    So that the rest of the codebase (like the TUI) doesn't need to be
    concerned with whether ChatGPT auth was handled by Codex itself or passed
    in via app-server's external auth mode.
  • feat(tui): add sortable resume picker with created/updated timestamp toggle (#10752)
    ## Summary
    
    - Add sorting support to the resume session picker with Tab key toggle
    - Sessions can now be sorted by either creation time or last updated
    time
    - Display the current sort mode in the picker header
    - Default to sorting by creation time (most recent first)
    
    ## Changes
    
    - Add `sort_key` field to `PickerState` to track current sort order
    - Pass sort key to `RolloutRecorder::list_threads()` for proper backend
    sorting
    - Add Tab key handler to toggle between `CreatedAt` and `UpdatedAt`
    sorting
    - Show current sort mode ("Created at" / "Updated at") in header
    - Add "Tab to toggle sort" keyboard hint
    - Intelligently hide secondary date column when terminal is narrow
    - Reload session list when sort order changes
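    
    The toggle described in this list can be sketched as below. The
    `CreatedAt`/`UpdatedAt` variants and the header labels come from the
    description; the `SortKey` type name, the `Session` struct, and the
    in-memory `sort_sessions` helper are assumptions. In the PR the sort key
    is passed to `RolloutRecorder::list_threads()` so sorting happens in the
    backend, not in a helper like this one.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum SortKey {
        CreatedAt,
        UpdatedAt,
    }
    
    impl SortKey {
        /// Tab flips between the two sort modes.
        pub fn toggled(self) -> Self {
            match self {
                SortKey::CreatedAt => SortKey::UpdatedAt,
                SortKey::UpdatedAt => SortKey::CreatedAt,
            }
        }
    
        /// Label shown in the picker header.
        pub fn label(self) -> &'static str {
            match self {
                SortKey::CreatedAt => "Created at",
                SortKey::UpdatedAt => "Updated at",
            }
        }
    }
    
    /// Hypothetical session row with timestamps as plain integers.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Session {
        pub id: &'static str,
        pub created_at: u64,
        pub updated_at: u64,
    }
    
    /// Sorts most recent first by the selected key (stand-in for backend sorting).
    pub fn sort_sessions(sessions: &mut [Session], key: SortKey) {
        sessions.sort_by_key(|s| {
            std::cmp::Reverse(match key {
                SortKey::CreatedAt => s.created_at,
                SortKey::UpdatedAt => s.updated_at,
            })
        });
    }
    ```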
    
    ## Test plan
    
    - [x] Unit tests for sort key toggle functionality
    - [x] Snapshot tests updated for new header format
    - [x] Test that Tab key triggers reload with new sort key
    - [x] Test column visibility adapts to narrow terminals
  • feat(tui): add /statusline command for interactive status line configuration (#10546)
    ## Summary
    - Adds a new `/statusline` command to configure the TUI footer status line
    - Introduces a reusable `MultiSelectPicker` component with keyboard
    navigation, optional ordering, and toggle support
    - Implements a status line setup modal that persists configuration to
    `config.toml`
    
      ## Status Line Items
      The following items can be displayed in the status line:
      - **Model**: Current model name (with optional reasoning level)
      - **Context**: Remaining/used context window percentage
      - **Rate Limits**: 5-hour and weekly usage limits
      - **Git**: Current branch (with optimized lookups)
      - **Tokens**: Used tokens, input/output token counts
      - **Session**: Session ID (full or shortened prefix)
      - **Paths**: Current directory, project root
      - **Version**: Codex version
    
      ## Features
      - Live preview while configuring status line items
      - Fuzzy search filtering in the picker
      - Intelligent truncation when items don't fit
      - Items gracefully omit when data is unavailable
      - Configuration persists to `config.toml`
      - Validates and warns about invalid status line items
    
      ## Test plan
      - [x] Run `/statusline` and verify picker UI appears
      - [x] Toggle items on/off and verify live preview updates
      - [x] Confirm selection persists after restart
      - [x] Verify truncation behavior with many items selected
      - [x] Test git branch detection in and out of git repos
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • Add hooks implementation and wire up to notify (#9691)
    This introduces a `Hooks` service. It registers hooks from config and
    dispatches hook events at runtime.
    
    N.B. The hook config is not wired up to this yet. But for legacy
    reasons, we wire up `notify` from config and power it using hooks now.
    Nothing about the `notify` interface has changed.
    
    I'd start by reviewing `hooks/types.rs`
    
    Some things to note:
      - hook names subject to change
      - no hook result yet
      - stopping semantics yet to be introduced
      - additional hooks yet to be introduced
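    
    To make the register/dispatch shape concrete, here is a minimal sketch
    of a hooks service. It is not the contents of `hooks/types.rs`: the
    `HookEvent::TurnComplete` variant, the `Hook` alias, and the method names
    are hypothetical, and (as noted above) hooks return no result here.
    
    ```rust
    /// Hypothetical event type; real hook names are subject to change.
    #[derive(Debug, Clone, PartialEq)]
    pub enum HookEvent {
        TurnComplete { thread_id: String },
    }
    
    /// A hook is a callback that observes an event and returns nothing.
    pub type Hook = Box<dyn Fn(&HookEvent)>;
    
    /// Holds registered hooks and dispatches events to them in order.
    #[derive(Default)]
    pub struct Hooks {
        hooks: Vec<Hook>,
    }
    
    impl Hooks {
        /// Registers a hook, for example one that powers legacy `notify`.
        pub fn register(&mut self, hook: Hook) {
            self.hooks.push(hook);
        }
    
        /// Calls every registered hook and returns how many were invoked.
        pub fn dispatch(&self, event: &HookEvent) -> usize {
            for hook in &self.hooks {
                hook(event);
            }
            self.hooks.len()
        }
    }
    ```
    
    Under this model, wiring up `notify` amounts to registering one hook that
    forwards the event to the configured notifier.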
  • Leverage state DB metadata for thread summaries (#10621)
    Summary:
    - read conversation summaries and cwd info from the state DB when
    possible so we no longer rely on rollout files for metadata and avoid
    extra I/O
    - persist CLI version in thread metadata, surface it through summary
    builders, and add the necessary DB migration hooks
    - simplify thread listing by using enriched state DB data directly
    rather than reading rollout heads
    
    Testing:
    - Not run (not requested)
  • feat: add memory tool (#10637)
    Add a memory tool that retrieves a full memory by its memory ID.
  • feat: resumable backfill (#10745)
    ## Summary
    
    This PR makes SQLite rollout backfill resumable and repeatable instead
    of one-shot-on-db-create.
    
    ## What changed
    
    - Added a persisted backfill state table:
      - state/migrations/0008_backfill_state.sql
      - Tracks status (pending|running|complete), last_watermark, and
        last_success_at.
    - Added backfill state model/types in codex-state:
      - BackfillState, BackfillStatus (state/src/model/backfill_state.rs)
    - Added runtime APIs to manage backfill lifecycle/progress:
      - get_backfill_state
      - mark_backfill_running
      - checkpoint_backfill
      - mark_backfill_complete
    - Updated core startup behavior:
      - Backfill now runs whenever state is not Complete (not only when DB
        file is newly created).
    - Reworked backfill execution:
      - Collect rollout files, derive deterministic watermark per path, sort,
        resume from last_watermark.
      - Process in batches (BACKFILL_BATCH_SIZE = 200), checkpoint after each
        batch.
      - Mark complete with last_success_at at the end.
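    
    The resume-from-watermark loop can be sketched in memory as follows.
    This is an illustration under stated assumptions, not the PR's code: the
    path itself is used as the watermark, `max_batches` exists only to
    simulate a process exit, `last_success_at` is omitted, and state lives in
    a struct instead of the SQLite table.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum BackfillStatus {
        Pending,
        Running,
        Complete,
    }
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct BackfillState {
        pub status: BackfillStatus,
        pub last_watermark: Option<String>,
    }
    
    /// Processes paths in sorted order, skipping everything at or below the
    /// stored watermark, and checkpoints after each batch. Stops early after
    /// `max_batches` batches to simulate an interrupted run.
    pub fn run_backfill(
        state: &mut BackfillState,
        paths: &[&str],
        batch_size: usize,
        max_batches: usize,
        processed: &mut Vec<String>,
    ) {
        if state.status == BackfillStatus::Complete {
            return;
        }
        state.status = BackfillStatus::Running;
    
        let resume_from = state.last_watermark.clone();
        let mut remaining: Vec<&str> = paths
            .iter()
            .copied()
            .filter(|p| resume_from.as_deref().map_or(true, |w| *p > w))
            .collect();
        remaining.sort();
    
        for (i, batch) in remaining.chunks(batch_size.max(1)).enumerate() {
            if i == max_batches {
                return; // interrupted: status stays Running, watermark is kept
            }
            for path in batch {
                processed.push(path.to_string());
            }
            // Checkpoint: remember the last path handled in this batch.
            state.last_watermark = batch.last().map(|p| p.to_string());
        }
        state.status = BackfillStatus::Complete;
    }
    ```
    
    Calling `run_backfill` again after an interruption picks up after the
    checkpointed watermark instead of starting over.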
    
    ## Why
    
    Previous behavior could leave users permanently partially backfilled if
    the process exited during initial async backfill. This change allows
    safe continuation across restarts and avoids restarting from scratch.
  • Include real OS info in metrics. (#10425)
    Calculates a hashed user ID from either the auth user ID or the API key.
    Also correctly populates the OS.
    
    These will make our metrics more useful and powerful for analysis.
  • Update explorer role default model (#10748)
    Summary
    - switch the explorer role in core agent configuration to use
    `gpt-5.1-codex-mini` as the default model override
    - leave other role defaults untouched
    
    Testing
    - Not run (not requested)
  • adding fork information (UI) when forking (#10246)
    - shows the `/fork` command that ran in the previous session
    - shows `session forked from name (uuid) || uuid (if name is not set)` as an event in the new session
  • Allow user shell commands to run alongside active turns (#10513)
    Summary
    - refactor user shell command execution into a shared helper and add
    modes for standalone vs active-turn execution
    - run user shell commands asynchronously when a turn is already active
    so they don’t replace or abort the current turn
    - extend the tests to cover the new behavior and add the generated Codex
    environment manifest
    
    Testing
    - Not run (not requested)
  • fix(tui): flush input buffer on init to prevent early exit on Windows (#10729)
    Fixes #10661.
    
    ### Problem
    On Windows, the sign-in menu can exit immediately if the OS-level input
    buffer contains trailing characters (like the Enter key from running the
    command).
    
    ### Solution
    **Flush Input Buffer on Init**: Use FlushConsoleInputBuffer on Windows
    (and tcflush on Unix) in ui::init() to discard any input captured before
    the TUI was ready.
    
    Verified by @CodebyAmbrose in #10661.
  • fix(core,app-server) resume with different model (#10719)
    ## Summary
    When resuming with a different model, we should also append a developer
    message with the model instructions.
    
    ## Testing
    - [x] Added unit tests
  • Reload cloud requirements after user login (#10725)
    Reload cloud requirements after user login so they take effect
    immediately.
  • Fix remote compaction estimator/payload instruction small mismatch (#10692)
    ## Summary
    This PR fixes a deterministic mismatch in remote compaction where
    pre-trim estimation and the `/v1/responses/compact` payload could use
    different base instructions.
    
    Before this change:
    - pre-trim estimation used model-derived instructions
    (`model_info.get_model_instructions(...)`)
    - compact payload used session base instructions
    (`sess.get_base_instructions()`)
    
    After this change:
    - remote pre-trim estimation and compact payload both use the same
    `BaseInstructions` instance from session state.
    
    ## Changes
    - Added a shared estimator entry point in `ContextManager`:
      - `estimate_token_count_with_base_instructions(&self, base_instructions: &BaseInstructions) -> Option<i64>`
    - Kept `estimate_token_count(&TurnContext)` as a thin wrapper that
    resolves model/personality instructions and delegates to the new helper.
    - Updated remote compaction flow to fetch base instructions once and
    reuse it for both:
      - trim preflight estimation
      - compact request payload construction
    - Added regression coverage for parity and behavior:
      - unit test verifying explicit-base estimator behavior
      - integration test proving remote compaction uses session override
        instructions and trims accordingly
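    
    The parity idea (one `BaseInstructions` value feeding both the estimate
    and the payload) can be sketched like this. Only the method name
    `estimate_token_count_with_base_instructions` comes from the description;
    the struct fields, the 4-bytes-per-token estimate, the drop-oldest trim
    rule, and `build_compact_request` are assumptions for illustration.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    pub struct BaseInstructions {
        pub text: String,
    }
    
    /// Simplified stand-in: history is just a list of strings.
    pub struct ContextManager {
        pub items: Vec<String>,
    }
    
    impl ContextManager {
        /// Crude estimate (about 4 bytes per token); not a real tokenizer.
        fn approx_tokens(text: &str) -> i64 {
            ((text.len() + 3) / 4) as i64
        }
    
        pub fn estimate_token_count_with_base_instructions(
            &self,
            base_instructions: &BaseInstructions,
        ) -> Option<i64> {
            let history: i64 = self.items.iter().map(|i| Self::approx_tokens(i)).sum();
            Some(Self::approx_tokens(&base_instructions.text) + history)
        }
    }
    
    pub struct CompactPayload {
        pub instructions: String,
        pub estimated_tokens: i64,
    }
    
    /// Trims oldest items until the estimate fits, then builds the payload
    /// from the same `base` that was used for the estimate.
    pub fn build_compact_request(
        cm: &mut ContextManager,
        base: &BaseInstructions,
        limit: i64,
    ) -> Option<CompactPayload> {
        loop {
            let est = cm.estimate_token_count_with_base_instructions(base)?;
            if est <= limit || cm.items.is_empty() {
                return Some(CompactPayload {
                    instructions: base.text.clone(),
                    estimated_tokens: est,
                });
            }
            cm.items.remove(0);
        }
    }
    ```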
    
    ## Why this matters
    This removes a deterministic divergence source where pre-trim could
    think the request fits while the actual compact request exceeded context
    because its instructions were longer/different.
    
    ## Scope
    In scope:
    - estimator/payload base-instructions parity in remote compaction
    
    Out of scope:
    - retry-on-`context_length_exceeded`
    - compaction threshold/headroom policy changes
    - broader trimming policy changes
    
    ## Codex author:
    `codex fork 019c2b24-c2df-7b31-a482-fb8cf7a28559`
  • Make steer stable by default (#10690)
    Promotes the Steer feature from Experimental to Stable and enables it by
    default.
    
    ## What is Steer mode?
    
    Steer mode changes how message submission works in the TUI:
    
    - **With Steer enabled (new default)**:
      - `Enter` submits messages immediately, even when a task is running
      - `Tab` queues messages when a task is running (allows building up a
        queue)
    
    - **With Steer disabled (old behavior)**:
      - `Enter` queues messages when a task is running
      - This preserves the previous "queue while a task is running" behavior
    
    ## How Steer vs Queue work
    
    The key difference is in the submission behavior:
    
    1. **Steer mode** (`steer_enabled = true`):
       - Enter → `InputResult::Submitted` → sends immediately via
         `submit_user_message()`
       - Tab → `InputResult::Queued` → queues via `queue_user_message()` if a
         task is running
       - This gives users direct control: Enter for immediate submission, Tab
         for queuing
    
    2. **Queue mode** (`steer_enabled = false`, previous default):
       - Enter → `InputResult::Queued` → always queues when a task is running
       - Tab → `InputResult::Queued` → queues when a task is running
       - This preserves the original behavior where Enter respects the running
         task queue
    
    ## Implementation details
    
    The behavior is controlled in
    `ChatComposer::handle_key_event_without_popup()`:
    - When `steer_enabled` is true, Enter calls `handle_submission(false)`
    (submit immediately)
    - When `steer_enabled` is false, Enter calls `handle_submission(true)`
    (queue)
    
    See `codex-rs/tui/src/bottom_pane/chat_composer.rs` for the
    implementation.
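    
    The key mapping described above fits in a small decision table. The
    sketch below is a simplified model, not the code in `chat_composer.rs`:
    `Key`, `handle_key`, and the `Unhandled` variant are hypothetical, and two
    cases the description leaves open are marked as assumptions in comments.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Key {
        Enter,
        Tab,
    }
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum InputResult {
        Submitted,
        Queued,
        /// Placeholder for cases the description does not cover.
        Unhandled,
    }
    
    pub fn handle_key(key: Key, steer_enabled: bool, task_running: bool) -> InputResult {
        match (key, steer_enabled, task_running) {
            // Steer mode: Enter submits immediately, even mid-task.
            (Key::Enter, true, _) => InputResult::Submitted,
            // Queue mode: Enter queues while a task is running.
            (Key::Enter, false, true) => InputResult::Queued,
            // Assumption: with no task running, Enter submits in queue mode too.
            (Key::Enter, false, false) => InputResult::Submitted,
            // Tab queues in both modes while a task is running.
            (Key::Tab, _, true) => InputResult::Queued,
            // Assumption: Tab with no running task is left to other handlers.
            (Key::Tab, _, false) => InputResult::Unhandled,
        }
    }
    ```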
    
    ## Documentation
    
    For more details on the chat composer behavior, see:
    - [TUI Chat Composer documentation](docs/tui-chat-composer.md)
    - Feature flag definition: `codex-rs/core/src/features.rs`
  • Sync collaboration mode naming across Default prompt, tools, and TUI (#10666)
    ## Summary
    - add shared `ModeKind` helpers for display names, TUI visibility, and
    `request_user_input` availability
    - derive TUI mode filtering/labels from shared `ModeKind` metadata
    instead of local hardcoded matches
    - derive `request_user_input` availability text and unavailable error
    mode names from shared mode metadata
    - replace hardcoded known mode names in the Default collaboration-mode
    template with `{{KNOWN_MODE_NAMES}}` and fill it from
    `TUI_VISIBLE_COLLABORATION_MODES`
    - add regression tests for mode metadata sync and placeholder
    replacement
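    
    Deriving the template text from shared mode metadata might look like the
    sketch below. The `{{KNOWN_MODE_NAMES}}` placeholder and the `ModeKind`
    name come from the summary; the specific variants (including the
    hypothetical `Internal`), the helper names, `ALL_MODES`, and the
    comma-separated rendering are assumptions.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ModeKind {
        Default,
        Plan,
        /// Hypothetical mode that is hidden from the TUI.
        Internal,
    }
    
    impl ModeKind {
        pub fn display_name(self) -> &'static str {
            match self {
                ModeKind::Default => "Default",
                ModeKind::Plan => "Plan",
                ModeKind::Internal => "Internal",
            }
        }
    
        pub fn is_tui_visible(self) -> bool {
            !matches!(self, ModeKind::Internal)
        }
    }
    
    /// Hypothetical list of every mode, in display order.
    pub const ALL_MODES: [ModeKind; 3] = [ModeKind::Default, ModeKind::Plan, ModeKind::Internal];
    
    /// Fills the placeholder from the same metadata the TUI filter uses, so
    /// the prompt and the UI cannot drift apart.
    pub fn render_template(template: &str) -> String {
        let names: Vec<&str> = ALL_MODES
            .iter()
            .copied()
            .filter(|m| m.is_tui_visible())
            .map(|m| m.display_name())
            .collect();
        template.replace("{{KNOWN_MODE_NAMES}}", &names.join(", "))
    }
    ```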
    
    ## Notes
    - `cargo test -p codex-core` integration target (`tests/all`) still
    shows pre-existing env-specific failures in this environment due to
    missing `test_stdio_server` binary resolution; core unit tests are green.
    
    ## Codex author
    `codex resume 019c26ff-dfe7-7173-bc04-c9e1fff1e447`
  • fix(core) switching model appends model instructions (#10651)
    ## Summary
    When switching models, we should append the instructions of the new
    model to the conversation as a developer message.
    
    ## Test
    - [x] Adds a unit test
  • chore(config) Default Personality Pragmatic (#10705)
    ## Summary
    Switch back to Pragmatic personality
    
    ## Testing
    - [x] Updated unit tests
  • fix: ensure resume args precede image args (#10709)
    ## Summary
    Fixes argument ordering when `resumeThread()` is used with
    `local_image`. The SDK previously emitted CLI args with `--image` before
    `resume <threadId>`, which caused the Codex CLI to treat `resume`/UUID
    as image paths and start a new session. This PR moves `resume
    <threadId>` before any `--image` flags and adds a regression test.
    
    ## Bug Report / Links
    - OpenAI issue: https://github.com/openai/codex/issues/10708
    - Repro repo:
    https://github.com/cryptonerdcn/codex-resume-local-image-repro
    - Repro issue (repo):
    https://github.com/cryptonerdcn/codex-resume-local-image-repro/issues/1
    
    ## Repro (pre-fix)
    1. Build SDK from source
    2. Run resume + local_image
    3. Args order: `--image <path> resume <id>`
    4. Result: new session created (thread id changes)
    
    ## Fix
    Move `resume <threadId>` before `--image` in `CodexExec.run` and add a
    regression test to assert ordering.
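    
    The ordering rule is easy to state in code. The sketch below is a
    language-neutral illustration written in Rust, not the TypeScript in
    `CodexExec.run`: `build_args` is a hypothetical helper that only
    assembles the `resume <threadId>` and `--image` portion of the argument
    list.
    
    ```rust
    /// Emits `resume <threadId>` first, then one `--image <path>` pair per
    /// image, so the CLI never parses `resume` or the id as an image path.
    pub fn build_args(resume_thread_id: Option<&str>, images: &[&str]) -> Vec<String> {
        let mut args: Vec<String> = Vec::new();
        if let Some(id) = resume_thread_id {
            args.push("resume".to_string());
            args.push(id.to_string());
        }
        for image in images {
            args.push("--image".to_string());
            args.push(image.to_string());
        }
        args
    }
    ```
    
    A regression test for this shape only needs to assert that the index of
    `resume` is lower than the index of the first `--image`.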
    
    ## Tests
    - `cd sdk/typescript && npm test`
      - **Failed**: `codex-rs/target/debug/codex` missing (ENOENT)
    
    ## Notes
    - I can rerun tests in an environment with `codex-rs` built and report
    results.