Commit Graph

3718 Commits

  • fix: ensure status indicator present earlier in exec path (#10700)
    ensure status indicator present in all classifications of exec tool.
    fixes indicator disappearing after preambles, will look into using
    `phase` to avoid this class of error in a few hours.
    
    commands parsed as unknown faced this issue
    
    tested locally, added test for specific failure flow
  • fix(tui): restore working shimmer after preamble output (#10701)
    ## Problem
    When a turn streamed a preamble line before any tool activity,
    `ChatWidget` hid the status row while committing streamed lines and did
    not restore it until a later event (commonly `ExecCommandBegin`). During
    that idle gap, the UI looked finished even though the turn was still
    active.
    
    ## Mental model
    The bottom status row and transcript stream are separate progress
    affordances:
    - transcript stream shows committed output
    - status row (spinner/shimmer + header) shows liveness of an active turn
    
    While stream output is actively committing, hiding the status row is
    acceptable to avoid redundant visual noise. Once stream controllers go
    idle, an active turn must restore the status row immediately so liveness
    remains visible across preamble-to-tool gaps.
    
    ## Non-goals
    - No changes to streaming chunking policy or pacing.
    - No changes to final completion behavior (status still hides when task
    actually ends).
    - No refactor of status lifecycle ownership between `ChatWidget` and
    `BottomPane`.
    
    ## Tradeoffs
    - We keep the existing behavior of hiding the status row during active
    stream commits.
    - We add explicit restoration on the idle boundary when the task is
    still running.
    - This introduces one extra status update on idle transitions, which is
    small overhead but makes liveness semantics consistent.
    
    ## Architecture
    `run_commit_tick_with_scope` in `chatwidget.rs` now documents and
    enforces a two-phase contract:
    1. For each committed streamed cell, hide status and append transcript
    output.
    2. If controllers are present and all idle, restore status iff task is
    still running, preserving the current header.
    
    This keeps status ownership in `ChatWidget` while relying on
    `BottomPane` helpers:
    - `hide_status_indicator()` during active stream commits
    - `ensure_status_indicator()` +
    `set_status_header(current_status_header)` at stream-idle boundary
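    
    The two-phase contract above can be sketched roughly as follows. This is
    a simplified model, not the actual `ChatWidget` code: `StatusRow`, its
    fields, and `commit_tick` are hypothetical stand-ins for the
    `BottomPane` state and `run_commit_tick_with_scope`.
    
    ```rust
    /// Hypothetical stand-in for the status-related state behind `BottomPane`.
    #[derive(Debug)]
    struct StatusRow {
        visible: bool,
        header: String,
        task_running: bool,
    }
    
    /// Simplified model of one commit tick.
    /// `controllers_idle` is `None` when no stream controllers exist,
    /// otherwise `Some(true)` when all of them are idle.
    fn commit_tick(
        status: &mut StatusRow,
        committed: &[&str],
        controllers_idle: Option<bool>,
        current_header: &str,
        transcript: &mut Vec<String>,
    ) {
        // Phase 1: each committed cell hides the status row and is appended
        // to the transcript.
        for cell in committed {
            status.visible = false;
            transcript.push(cell.to_string());
        }
        // Phase 2: controllers present and all idle, so restore the status
        // row only if the task is still running, keeping the current header.
        if controllers_idle == Some(true) && status.task_running {
            status.visible = true;
            status.header = current_header.to_string();
        }
    }
    
    fn main() {
        let mut status = StatusRow {
            visible: true,
            header: String::from("Working"),
            task_running: true,
        };
        let mut transcript = Vec::new();
        commit_tick(&mut status, &["preamble line"], Some(false), "Working", &mut transcript);
        println!("while streaming: visible = {}", status.visible);
        commit_tick(&mut status, &[], Some(true), "Working", &mut transcript);
        println!("after idle: visible = {}, header = {}", status.visible, status.header);
    }
    ```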
    
    Documentation pass additions:
    - Clarified the function-level contract and lifecycle intent in
    `run_commit_tick_with_scope`.
    - Added an explicit regression snapshot test comment describing the
    failing sequence.
    
    ## Observability
    Signal that the fix is present:
    - In the preamble-idle state, rendered output still includes `• Working
    (… esc to interrupt)`.
    - New snapshot:
    `codex_tui__chatwidget__tests__preamble_keeps_working_status.snap`.
    
    Debug path for future regressions:
    - Start at `run_commit_tick_with_scope` for hide/restore transitions.
    - Verify `bottom_pane.is_task_running()` at idle transition.
    - Confirm `current_status_header` continuity when status is recreated.
    - Use the new snapshot and targeted test sequence to reproduce
    deterministic preamble-idle behavior.
    
    ## Tests
    - Updated regression assertion:
      - `streaming_final_answer_keeps_task_running_state` now expects status
        widget to remain present while turn is running.
    - Renamed/updated behavioral regression:
      - `preamble_keeps_status_indicator_visible_until_exec_begin`.
    - Added snapshot regression coverage:
      - `preamble_keeps_working_status_snapshot`.
    - Snapshot file:
      `tui/src/chatwidget/snapshots/codex_tui__chatwidget__tests__preamble_keeps_working_status.snap`.
    
    Commands run:
    - `just fmt`
    - `cargo test -p codex-tui preamble_keeps_status_indicator_visible_until_exec_begin`
    - `cargo test -p codex-tui preamble_keeps_working_status_snapshot`
    
    ## Risks / Inconsistencies
    - Status visibility policy is still split across multiple event paths
    (`commit tick`, `turn complete`, `exec begin`), so future regressions
    can reintroduce ordering gaps.
    - Restoration depends on `is_task_running()` correctness; if task
    lifecycle flags drift, status behavior will drift too.
    - Snapshot proves rendered state, not animation cadence; cadence still
    relies on frame scheduling behavior elsewhere.
  • chore(core) personality migration tests (#10650)
    ## Summary
    Adds additional tests for personality edge cases
    
    ## Testing
    - [x] These are tests
  • Cloud Requirements: increase timeout and retries (#10631)
    Add retries and a longer timeout for loading Cloud Requirements.
    
    Co-authored-by: alexsong-oai <alexsong@openai.com>
  • feat(core): add configurable log_dir (#10678)
    Adds a top-level `log_dir` config key (defaults to `$CODEX_HOME/log`) so
    one-off runs can redirect `codex-tui.log` via `-c`, e.g.:
    
      codex -c log_dir=./.codex-log
    
    Also resolves relative paths in CLI `-c/--config` overrides for
    `AbsolutePathBuf` values against the effective cwd (when available).
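    
    A minimal sketch of the relative-path resolution described above,
    assuming a hypothetical helper named `resolve_override_path`; the real
    handling of `AbsolutePathBuf` overrides may differ, and this version
    does not normalize `.` or `..` components.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical helper: join a relative override onto the effective cwd
    /// when one is available; leave absolute paths (or a missing cwd) alone.
    fn resolve_override_path(value: &Path, cwd: Option<&Path>) -> PathBuf {
        match cwd {
            Some(base) if value.is_relative() => base.join(value),
            _ => value.to_path_buf(),
        }
    }
    
    fn main() {
        // "/work/project" is an illustrative cwd, not a path from the PR.
        let resolved = resolve_override_path(
            Path::new("./.codex-log"),
            Some(Path::new("/work/project")),
        );
        println!("{}", resolved.display());
    }
    ```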
    
    Tests:
    - cargo test -p codex-core
  • Session-level model client (#10664)
    Make ModelClient a session-scoped object.
    Move state that is session level onto the client, and make state that is
    per-turn explicit on corresponding methods.
    Stop taking a huge Config object, instead only pass in values that are
    actually needed.
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • chore(app-server): document experimental API opt-in (#10667)
    Add a section on how to opt in to the experimental API.
  • feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567)
    Took over the work that @aaronl-openai started here:
    https://github.com/openai/codex/pull/10397
    
    Now that app-server clients are able to set up custom tools (called
    `dynamic_tools` in app-server), we should expose a way for clients to
    pass in not just text, but also image outputs. This is something the
    Responses API already supports for function call outputs, where you can
    pass in either a string or an array of content outputs (text, image,
    file):
    https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image
    
    So let's just plumb it through in Codex (with the caveat that we only
    support text and image for now). This is implemented end-to-end across
    app-server v2 protocol types and core tool handling.
    
    ## Breaking API change
    NOTE: This introduces a breaking change with dynamic tools, but I think
    it's ok since this concept was only recently introduced
    (https://github.com/openai/codex/pull/9539) and it's better to get the
    API contract correct. I don't think there are any real consumers of this
    yet (not even the Codex App).
    
    Old shape:
    `{ "output": "dynamic-ok", "success": true }`
    
    New shape:
    ```
    {
      "contentItems": [
        { "type": "inputText", "text": "dynamic-ok" },
        { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
      ],
      "success": true
    }
    ```
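    
    For clients migrating from the old shape, the mapping can be sketched as
    below. The Rust type and function names are hypothetical mirrors of the
    JSON above, not the actual app-server v2 protocol types, and
    serialization is left out.
    
    ```rust
    /// Hypothetical mirror of one entry in `contentItems`.
    #[derive(Debug, Clone, PartialEq)]
    enum ContentItem {
        InputText { text: String },
        InputImage { image_url: String },
    }
    
    /// Hypothetical mirror of the new dynamic tool output shape.
    #[derive(Debug, Clone, PartialEq)]
    struct DynamicToolResponse {
        content_items: Vec<ContentItem>,
        success: bool,
    }
    
    /// Lift the old `{ "output": ..., "success": ... }` shape into the new
    /// one by wrapping the string in a single text item.
    fn from_legacy(output: &str, success: bool) -> DynamicToolResponse {
        DynamicToolResponse {
            content_items: vec![ContentItem::InputText { text: output.to_string() }],
            success,
        }
    }
    
    fn main() {
        let mut response = from_legacy("dynamic-ok", true);
        // New capability: an image can be returned next to the text.
        response.content_items.push(ContentItem::InputImage {
            image_url: String::from("data:image/png;base64,AAA"),
        });
        println!("{response:?}");
    }
    ```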
  • add none personality option (#10688)
    - add none personality enum value and empty placeholder behavior
    - add docs/schema updates and e2e coverage
  • Added support for live updates to skills (#10478)
    Add a centralized FileWatcher in codex-core (using notify) that watches
    skill roots from the config layer stack (recursive)
    
    Send `SkillsChanged` events when relevant file system changes are
    detected
    
    On `SkillsChanged`:
    * Invalidate the skills cache immediately in ThreadManager
    * Emit EventMsg::SkillsUpdateAvailable to active sessions
    ~~* Broadcast a new app-server notification:
    SkillsListUpdatedNotification~~
    
    This change does not inject new items into the event stream. That means
    the agent will not know about new skills, so it won't be able to
    implicitly invoke new skills. It also won't know about changes to
    existing skills, so if it has already read the contents of a modified
    skill, it will not honor the new behavior.
    
    This change also does not detect modifications to AGENTS.md.
    
    I plan to address these limitations in a follow-on PR modeled after
    #9985. Injection of new skills and AGENTS was deemed too risky, hence the
    need to split the feature into two stages. The changes in this PR were
    designed to easily accommodate the second stage once we have some other
    foundational changes in place.
    
    Testing: In addition to automated tests, I did manual testing to confirm
    that newly-created skills, deleted skills, and renamed skills are
    reflected in the TUI skill picker menu. Also confirmed that
    modifications to behaviors for explicitly-invoked skills are honored.
    
    ---------
    
    Co-authored-by: Xin Lin <xl@openai.com>
  • Fix test_shell_command_interruption flake (#10649)
    ## Human summary
    Sandboxing (specifically `LandlockRestrict`) means that e.g. `sleep
    10` fails immediately. Therefore it cannot be interrupted.
    
    In suite::interrupt::test_shell_command_interruption, sleep 10 is issued
    at 17:28:16.554 (ToolCall: shell_command {"command":"sleep 10"...}),
    then fails at 17:28:16.589 with duration_ms=34, success=false,
    exit_code=101, and
        Sandbox(LandlockRestrict).
    
    ## Codex summary
    - set `sandbox_mode = "danger-full-access"` in `interrupt` and
    `v2/turn_interrupt` integration tests
    - set `sandbox: Some(SandboxMode::DangerFullAccess)` in
    `test_codex_jsonrpc_conversation_flow`
    - set `sandbox_policy: Some(SandboxPolicy::DangerFullAccess)` in
    `command_execution_notifications_include_process_id`
    
    ## Why
    On some Linux CI environments, command execution fails immediately with
    `LandlockRestrict` when sandboxed. These tests are intended to validate
    JSON-RPC/task lifecycle behavior (interrupt semantics, command
    notification shape/process id, request flow), but early sandbox startup
    failure changes turn flow and can trigger extra follow-up requests,
    causing flakes.
    
    This change removes environment-specific sandbox startup dependency from
    these tests while preserving their primary intent.
    
    ## Testing
    - not run in this environment (per request)
  • [apps] Cache MCP actions from apps. (#10662)
    MCP actions take a long time to load for users with lots of apps
    installed. Add a cache for these actions with a 1hr expiration, given
    that they almost never change unless people install another app, which
    also requires restarting codex to pick it up.
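    
    A minimal sketch of a time-based cache of this kind, assuming a
    hypothetical single-slot `TtlCache` type; the real cache's structure and
    invalidation rules are not described in this commit. The current time is
    passed in explicitly so expiry is easy to test.
    
    ```rust
    use std::time::{Duration, Instant};
    
    /// Hypothetical single-slot cache with a fixed time-to-live.
    struct TtlCache<T> {
        ttl: Duration,
        entry: Option<(Instant, T)>,
    }
    
    impl<T: Clone> TtlCache<T> {
        fn new(ttl: Duration) -> Self {
            Self { ttl, entry: None }
        }
    
        /// Return the cached value if it is still fresh at `now`;
        /// otherwise call `load` and cache its result.
        fn get_or_load(&mut self, now: Instant, load: impl FnOnce() -> T) -> T {
            if let Some((stored_at, value)) = &self.entry {
                if now.duration_since(*stored_at) < self.ttl {
                    return value.clone();
                }
            }
            let value = load();
            self.entry = Some((now, value.clone()));
            value
        }
    }
    
    fn main() {
        // One hour, matching the expiration mentioned above.
        let mut cache = TtlCache::new(Duration::from_secs(60 * 60));
        let start = Instant::now();
        // "search" is an illustrative action name.
        let first = cache.get_or_load(start, || vec![String::from("search")]);
        // Ten seconds later the loader is not called again.
        let second = cache.get_or_load(start + Duration::from_secs(10), Vec::new);
        println!("{first:?} {second:?}");
    }
    ```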
  • Fix jitter in TUI apps/connectors picker (#10593)
    This PR fixes jitter in the TUI apps menu by making the description
    column stable during rendering and height measurement.
    Added a `stable_desc_col` option to
    `SelectionViewParams`/`ListSelectionView`, introduced stable variants of
    the shared row render/measure helpers in `selection_popup_common`, and
    enabled the stable mode for the apps/connectors picker in `chatwidget`.
    With these changes, only the apps/connectors picker uses this new
    option, though it could be used elsewhere in the future.
    
    Why: previously, the description column was computed from only currently
    visible rows, so as you scrolled or filtered, the column could shift and
    cause wrapping/height changes that looked jumpy. Computing it from all
    rows in this popup keeps alignment and layout consistent as users scroll
    through available apps.
    
    **Before:**
    
    https://github.com/user-attachments/assets/3856cb72-5465-4b90-a993-65a2ffb09113
    
    **After:**
    
    https://github.com/user-attachments/assets/37b9d626-0b21-4c0f-8bb8-244c9ef971ff
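    
    The "Why" paragraph above can be illustrated with a small sketch. The
    `desc_col` helper, the two-column gap, and the app names are assumptions
    for illustration; the real helpers in `selection_popup_common` measure
    rendered width rather than counting characters.
    
    ```rust
    /// Hypothetical helper: the column where descriptions start is the
    /// widest name in `names` plus a fixed gap of two columns.
    fn desc_col(names: &[&str]) -> usize {
        names.iter().map(|n| n.chars().count()).max().unwrap_or(0) + 2
    }
    
    fn main() {
        let all = ["Calendar", "Mail", "Project Management Suite"];
        // Suppose only the first two rows fit in the viewport.
        let visible = &all[..2];
        // Computed from visible rows, the column moves once the long row
        // scrolls into view.
        println!("from visible rows: {}", desc_col(visible));
        // Computed from all rows, it stays fixed while scrolling.
        println!("from all rows: {}", desc_col(&all));
    }
    ```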
  • feat: add phase 1 mem db (#10634)
    - Schema: thread_id (PK, FK to threads.id with cascade delete),
      trace_summary, memory_summary, updated_at.
    - Migration: creates the table and an index on (updated_at DESC,
      thread_id DESC) for efficient recent-first reads.
    - Runtime API (DB-only):
      - `get_thread_memory(thread_id)`: fetch one memory row.
      - `upsert_thread_memory(thread_id, trace_summary, memory_summary)`:
        insert/update by thread id and always advance updated_at.
      - `get_last_n_thread_memories_for_cwd(cwd, n)`: join thread_memory with
        threads and return newest n rows for an exact cwd match.
    - Model layer: introduced ThreadMemory and row conversion types to keep
      query decoding typed and consistent with existing state models.
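    
    The upsert and recent-first semantics can be modeled in memory as shown
    below. This is only an illustration of the described behavior, not the
    SQLite-backed implementation: `MemoryStore`, its logical `clock`, and
    `newest` are hypothetical, and the cwd join against `threads` is left
    out.
    
    ```rust
    use std::collections::HashMap;
    
    #[derive(Debug, Clone, PartialEq)]
    struct ThreadMemory {
        thread_id: String,
        trace_summary: String,
        memory_summary: String,
        updated_at: u64,
    }
    
    /// Hypothetical in-memory stand-in for the thread_memory table.
    #[derive(Default)]
    struct MemoryStore {
        rows: HashMap<String, ThreadMemory>,
        clock: u64, // logical clock standing in for a timestamp
    }
    
    impl MemoryStore {
        /// Insert or update by thread id, always advancing `updated_at`.
        fn upsert_thread_memory(&mut self, thread_id: &str, trace: &str, memory: &str) {
            self.clock += 1;
            let row = ThreadMemory {
                thread_id: thread_id.to_string(),
                trace_summary: trace.to_string(),
                memory_summary: memory.to_string(),
                updated_at: self.clock,
            };
            self.rows.insert(thread_id.to_string(), row);
        }
    
        fn get_thread_memory(&self, thread_id: &str) -> Option<&ThreadMemory> {
            self.rows.get(thread_id)
        }
    
        /// Newest `n` rows, ordered by (updated_at DESC, thread_id DESC).
        fn newest(&self, n: usize) -> Vec<&ThreadMemory> {
            let mut rows: Vec<&ThreadMemory> = self.rows.values().collect();
            rows.sort_by(|a, b| (b.updated_at, &b.thread_id).cmp(&(a.updated_at, &a.thread_id)));
            rows.truncate(n);
            rows
        }
    }
    
    fn main() {
        let mut store = MemoryStore::default();
        store.upsert_thread_memory("a", "trace a", "memory a");
        store.upsert_thread_memory("b", "trace b", "memory b");
        store.upsert_thread_memory("a", "trace a2", "memory a2");
        for row in store.newest(2) {
            println!("{} {} {}", row.thread_id, row.updated_at, row.memory_summary);
        }
        println!("{:?}", store.get_thread_memory("b").map(|row| &row.trace_summary));
    }
    ```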
  • Persist pending input user events (#10656)
    - Persist user-message events for mid-turn injected input by emitting
    user message turn items when pending input is recorded.
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    current Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, especially explicit writable roots with the option to make some
    directories read-only and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
      - Added internal retry without /proc when /proc mount is not permitted
        by the host/container.
  • Cloud Requirements: take precedence over MDM (#10633)
    Cloud Requirements should be applied before MDM requirements.
  • Add option to approve and remember MCP/Apps tool usage (#10584)
    This PR adds a new approval option for app/MCP tool calls: “Allow and
    remember” (session-scoped).
    When selected, Codex stores a temporary approval and auto-approves
    matching future calls for the rest of the session.
    
    Added a session-scoped approval key (`server`, `connector_id`,
    `tool_name`) and persisted it in `tool_approvals` as
    `ApprovedForSession`.
    On subsequent matching calls, approval is skipped and treated as
    accepted.
    - Updated the approval question options to conditionally include:
      - Accept
      - Allow and remember (conditional)
      - Decline
      - Cancel
    
    The new “Allow and remember” option is only shown when all of these are
    true:
    
    1. The call is routed through the Codex Apps MCP server (codex_apps).
    2. The tool requires approval based on annotations:
       - read_only_hint == false, and
       - destructive_hint == true or open_world_hint == true.
    3. The tool includes a connector_id in metadata (used to build the
    remembered approval key).
    
    If no `connector_id` is present, the prompt still appears (when approval
    is required), but only with the existing choices (Accept / Decline /
    Cancel). Approval prompting in this path has an explicit early return
    unless server == `codex_apps`.
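    
    The conditions and the session-scoped key described above can be
    sketched as follows. Field names follow the description, but
    `ToolAnnotations`, `SessionApprovals`, and the helper functions are
    hypothetical simplifications, as are the `calendar` and `create_event`
    values.
    
    ```rust
    use std::collections::HashSet;
    
    /// Session-scoped approval key: (server, connector_id, tool_name).
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    struct ApprovalKey {
        server: String,
        connector_id: String,
        tool_name: String,
    }
    
    /// Hypothetical view of the annotations named in the description.
    struct ToolAnnotations {
        read_only_hint: bool,
        destructive_hint: bool,
        open_world_hint: bool,
    }
    
    fn requires_approval(a: &ToolAnnotations) -> bool {
        !a.read_only_hint && (a.destructive_hint || a.open_world_hint)
    }
    
    /// "Allow and remember" is offered only when all three conditions hold.
    fn can_remember(server: &str, a: &ToolAnnotations, connector_id: Option<&str>) -> bool {
        server == "codex_apps" && requires_approval(a) && connector_id.is_some()
    }
    
    /// Hypothetical store of approvals remembered for the session.
    #[derive(Default)]
    struct SessionApprovals {
        remembered: HashSet<ApprovalKey>,
    }
    
    impl SessionApprovals {
        fn remember(&mut self, key: ApprovalKey) {
            self.remembered.insert(key);
        }
    
        fn is_pre_approved(&self, key: &ApprovalKey) -> bool {
            self.remembered.contains(key)
        }
    }
    
    fn main() {
        let annotations = ToolAnnotations {
            read_only_hint: false,
            destructive_hint: true,
            open_world_hint: false,
        };
        let key = ApprovalKey {
            server: String::from("codex_apps"),
            connector_id: String::from("calendar"),
            tool_name: String::from("create_event"),
        };
        let mut session = SessionApprovals::default();
        if can_remember(&key.server, &annotations, Some(key.connector_id.as_str())) {
            session.remember(key.clone());
        }
        println!("pre-approved next time: {}", session.is_pre_approved(&key));
    }
    ```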
  • Stop client from being state carrier (#10595)
    I'd like to make client session wide. This requires shedding all random
    state it has to carry.
  • feat: land unified_exec (#10641)
    Land `unified_exec` for all non-Windows OSes
  • Update tests to stop using sse_completed fixture (#10638)
    Summary:
    - replace the `sse_completed` fixture and related JSON template with
    direct `responses::ev_completed` payload builders
    - cascade the new SSE helpers through all affected core tests for
    consistency and clarity
    - remove legacy fixtures that were no longer needed once the helpers are
    in place
    
    Testing:
    - Not run (not requested)
  • Migrate state DB path helpers to versioned filename (#10623)
    Summary
    - add versioned state sqlite filename helpers and re-export them from
    the state crate
    - remove legacy state files when initializing the runtime and update
    consumers/tests to use the new helpers
    - tweak logs client description and database resolution to match the new
    path
  • Add a codex.rate_limits event for websockets (#10324)
    When communicating over websockets, we can't rely on headers to deliver
    rate limit information. This PR adds a `codex.rate_limits` event that
    the server can pass to the client to inform them about rate limit usage.
    The client parses this data the same way we parse rate limit headers in
    HTTP mode.
    
    This PR also wires up the etag and reasoning headers for websockets
  • chore: simplify user message detection (#10611)
    We no longer check for response items with the `user` role, since they
    may be instructions, etc.
  • Requirements: add source to constrained requirement values (#10568)
    If we want to build `/debug-config`, we'll need to know the requirements
    sources that supplied the values.
    
    This PR adds those sources such that we can render them in the UI.
  • Prefer state DB thread listings before filesystem (#10544)
    Summary
    - add Cursor/ThreadsPage conversions so state DB listings can be mapped
    back into the rollout list model
    - make recorder list helpers query the state DB first (archived flag
    included) and only fall back to file traversal if needed, along with
    populating head bytes lazily
    - add extensive tests to ensure the DB path is honored for active and
    archived threads and that the fallback works
    
    Testing
    - Not run (not requested)
    
    <img width="1196" height="693" alt="Screenshot 2026-02-03 at 20 42 33"
    src="https://github.com/user-attachments/assets/826b3c7a-ef11-4b27-802a-3c343695794a"
    />
  • fix(core) Request Rule guidance tweak (#10598)
    ## Summary
    Forgot to include this tweak.
    
    ## Testing
    - [x] Unit tests pass
  • fix(core) updated request_rule guidance (#10379)
    ## Summary
    Update guidance for request_rule
    
    ## Testing
    - [x] Unit tests pass
  • Move metadata calculation out of client (#10589)
    Model client shouldn't be responsible for this.
  • Add thread/compact v2 (#10445)
    - add `thread/compact` as a trigger-only v2 RPC that submits
    `Op::Compact` and returns `{}` immediately.
    - add v2 compaction e2e coverage for success and invalid/unknown thread
    ids, and update protocol schemas/docs.
  • tui: make Esc clear request_user_input notes while notes are shown (#10569)
    ## Summary
    
    This PR updates the `request_user_input` TUI overlay so `Esc` is
    context-aware:
    
    - When notes are visible for an option question, `Esc` now clears notes
    and exits notes mode.
    - When notes are not visible (normal option selection UI), `Esc` still
    interrupts as before.
    
    It also updates footer guidance text to match behavior.
    
    ## Changes
    
    - Added a shared notes-clear path for option questions:
      - `Tab` and `Esc` now both clear notes and return focus to options when
        notes are visible.
    - Updated footer hint text in notes-visible state:
      - from: `tab to clear notes | ... | esc to interrupt`
      - to: `tab or esc to clear notes | ...`
    - Hid `esc to interrupt` hint while notes are visible for option
    questions.
    - Kept `esc to interrupt` visible and functional in normal
    option-selection mode.
    - Updated tests to assert the new `Esc` behavior in notes mode.
    - Updated snapshot output for the notes-visible footer row.
    - Updated docs in `docs/tui-request-user-input.md` to reflect
    mode-specific `Esc` behavior.
  • chore: add codex debug app-server tooling (#10367)
    codex debug app-server <user message> forwards the message through
    codex-app-server-test-client’s send_message_v2 library entry point,
    using std::env::current_exe() to resolve the codex binary.
    
    For what it looks like, see:
    
    ```
    celia@com-92114 codex-rs % cargo build -p codex-cli && target/debug/codex debug app-server --help                       
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.34s
    Tooling: helps debug the app server
    
    Usage: codex debug app-server [OPTIONS] <COMMAND>
    
    Commands:
      send-message-v2  
      help             Print this message or the help of the given subcommand(s)
    ```
    and
    ```
    celia@com-92114 codex-rs % cargo build -p codex-cli && target/debug/codex debug app-server send-message-v2 "hello world"
       Compiling codex-cli v0.0.0 (/Users/celia/code/codex/codex-rs/cli)
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.38s
    > {
    >   "method": "initialize",
    >   "id": "f8ba9f60-3a49-4ea9-81d6-4ab6853e3954",
    >   "params": {
    >     "clientInfo": {
    >       "name": "codex-toy-app-server",
    >       "title": "Codex Toy App Server",
    >       "version": "0.0.0"
    >     },
    >     "capabilities": {
    >       "experimentalApi": true
    >     }
    >   }
    > }
    < {
    <   "id": "f8ba9f60-3a49-4ea9-81d6-4ab6853e3954",
    <   "result": {
    <     "userAgent": "codex-toy-app-server/0.0.0 (Mac OS 26.2.0; arm64) vscode/2.4.27 (codex-toy-app-server; 0.0.0)"
    <   }
    < }
    < initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.2.0; arm64) vscode/2.4.27 (codex-toy-app-server; 0.0.0)" }
    > {
    >   "method": "thread/start",
    >   "id": "203f1630-beee-4e60-b17b-9eff16b1638b",
    >   "params": {
    >     "model": null,
    >     "modelProvider": null,
    >     "cwd": null,
    >     "approvalPolicy": null,
    >     "sandbox": null,
    >     "config": null,
    >     "baseInstructions": null,
    >     "developerInstructions": null,
    >     "personality": null,
    >     "ephemeral": null,
    >     "dynamicTools": null,
    >     "mockExperimentalField": null,
    >     "experimentalRawEvents": false
    >   }
    > }
    ...
    ```
  • feat(tui): pace catch-up stream chunking with hysteresis (#10461)
    ## Summary
    - preserve baseline streaming behavior (smooth mode still commits one
    line per 50ms tick)
    - extract adaptive chunking policy and commit-tick orchestration from
    ChatWidget into `streaming/chunking.rs` and `streaming/commit_tick.rs`
    - add hysteresis-based catch-up behavior with bounded batch draining to
    reduce queue lag without bursty single-frame jumps
    - document policy behavior, tuning guidance, and debug flow in rustdoc +
    docs
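    
    A rough sketch of hysteresis-based catch-up with bounded batches. The
    `ChunkingPolicy` type, its thresholds, and the batch size are
    hypothetical; the actual policy in `streaming/chunking.rs` may use
    different signals and tuning.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Mode {
        Smooth,
        CatchUp,
    }
    
    /// Hypothetical policy: enter catch-up at `enter_at` queued lines and
    /// leave it only once the queue drops to `exit_at` or below.
    struct ChunkingPolicy {
        enter_at: usize,
        exit_at: usize,
        max_batch: usize,
        mode: Mode,
    }
    
    impl ChunkingPolicy {
        /// Decide how many queued lines to commit on this tick.
        fn lines_to_commit(&mut self, queued: usize) -> usize {
            // Hysteresis: separate enter and exit thresholds keep the mode
            // from flapping when the queue hovers around one value.
            self.mode = match self.mode {
                Mode::Smooth if queued >= self.enter_at => Mode::CatchUp,
                Mode::CatchUp if queued <= self.exit_at => Mode::Smooth,
                unchanged => unchanged,
            };
            match self.mode {
                // Baseline: at most one line per tick.
                Mode::Smooth => queued.min(1),
                // Catch-up: drain a bounded batch, never the whole queue.
                Mode::CatchUp => queued.min(self.max_batch),
            }
        }
    }
    
    fn main() {
        let mut policy = ChunkingPolicy { enter_at: 8, exit_at: 2, max_batch: 4, mode: Mode::Smooth };
        for queued in [3, 8, 5, 2, 0] {
            let n = policy.lines_to_commit(queued);
            println!("queued={queued} mode={:?} commit={n}", policy.mode);
        }
    }
    ```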
    
    ## Testing
    - just fmt
    - cargo test -p codex-tui
  • Feat: add upgrade to app server modelList (#10556)
    ### Summary
    * Add model upgrade info to the listModel app server endpoint to support
    dynamically showing a model upgrade banner.
  • Handle exec shutdown on Interrupt (fixes immortal codex exec with websockets) (#10519)
    ### Motivation
    - Ensure `codex exec` exits when a running turn is interrupted (e.g.,
    Ctrl-C) so the CLI is not "immortal" when websockets/streaming are used.
    
    ### Description
    - Return `CodexStatus::InitiateShutdown` when handling
    `EventMsg::TurnAborted` in
    `exec/src/event_processor_with_human_output.rs` so human-output exec
    mode shuts down after an interrupt.
    - Treat `protocol::EventMsg::TurnAborted` as
    `CodexStatus::InitiateShutdown` in
    `exec/src/event_processor_with_jsonl_output.rs` so JSONL output mode
    behaves the same.
    - Applied formatting with `just fmt`.
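    
    In outline, the change maps the aborted-turn event to a shutdown status.
    The enums below are pared-down, hypothetical versions of `EventMsg` and
    `CodexStatus` (only `TurnAborted` and `InitiateShutdown` are named in
    this description), and `process_event` stands in for the two event
    processors.
    
    ```rust
    /// Pared-down stand-in for the status returned by the event processors.
    #[derive(Debug, PartialEq)]
    enum CodexStatus {
        Running, // assumed "keep going" status
        InitiateShutdown,
    }
    
    /// Pared-down stand-in for the protocol event type.
    enum EventMsg {
        TurnAborted,
        Other, // placeholder for every event not covered here
    }
    
    fn process_event(event: &EventMsg) -> CodexStatus {
        match event {
            // An interrupted turn now ends the exec session.
            EventMsg::TurnAborted => CodexStatus::InitiateShutdown,
            EventMsg::Other => CodexStatus::Running,
        }
    }
    
    fn main() {
        let events = [EventMsg::Other, EventMsg::TurnAborted, EventMsg::Other];
        for (index, event) in events.iter().enumerate() {
            if process_event(event) == CodexStatus::InitiateShutdown {
                println!("shutting down after event {index}");
                break;
            }
        }
    }
    ```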
    
    ### Testing
    - Ran `just fmt` successfully.
    - Ran `cargo test -p codex-exec`; many unit tests ran and the test
    command completed, but the full test run in this environment produced
    `35 passed, 11 failed` where the failures are due to Landlock sandbox
    panics and 403 responses in the test harness (environmental/integration
    issues) and are not caused by the interrupt/shutdown changes.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_698165cec4e083258d17702bd29014c1)
  • feat: add APIs to list and download public remote skills (#10448)
    Add APIs to list and download remote public skills.
  • chore(arg0): advisory-lock janitor for codex tmp paths (#10039)
    ## Description
    
    ### What changed
    - Switch the arg0 helper root from `~/.codex/tmp/path` to
    `~/.codex/tmp/path2`
    - Add `Arg0PathEntryGuard` to keep both the `TempDir` and an exclusive
    `.lock` file alive for the process lifetime
    - Add a startup janitor that scans `path2` and deletes only directories
    whose lock can be acquired
    
    ### Tests
    - `cargo clippy -p codex-arg0`
    - `cargo clippy -p codex-core`
    - `cargo test -p codex-arg0`
    - `cargo test -p codex-core`
  • [codex] Default values from requirements if unset (#10531)
    If we don't set any explicit values for sandbox or approval policy,
    let's try to use a requirements-satisfying value.
  • implement per-workspace capability SIDs for workspace specific ACLs (#10189)
    Today, there is a single capability SID that allows the sandbox to write
    to
    * workspace (cwd)
    * tmp directories if enabled
    * additional writable roots
    
    This change splits those up, so that each workspace has its own
    capability SID, while tmp and additional roots, which are
    installation-wide, are still governed by the "generic" capability SID
    
    This isolates workspaces from each other in terms of sandbox write
    access.
    Also allows us to protect <cwd>/.codex when codex runs in a specific
    <cwd>
  • [apps] Gateway MCP should be blocking. (#10289)
    Make Apps Gateway MCP blocking since otherwise app mentions may not work
    when apps are not loaded. Messages sent before apps become available
    will be queued.
    
    This only takes effect when the `apps` feature is enabled.