Commit Graph

3908 Commits

  • Improve token usage estimate for images (#12419)
    Fixes #11845.
    
    Adjust context/token estimation for inline image `data:*;base64,...`
    URLs so we do not count the raw base64 payload as model-visible text.
    
    What changed:
    - keep the existing JSON-length estimator as the baseline
    - detect only inline base64 `data:` image URLs in message and
      function-call output content items
    - subtract only the base64 payload bytes (preserving data URL prefix +
      JSON overhead)
    - add a fixed per-image estimate of 340 bytes (~85 tokens at the repo’s
      4-bytes/token heuristic)
    
    This avoids large overestimates from MCP image tool outputs while
    leaving normal image URLs (`https://`, `file://`, non-base64 `data:`
    URLs) unchanged.
    
    Tests:
    - message image data URL estimate regression
    - function-call output image data URL estimate regression
    - non-base64 image URLs unchanged
    - non-base64 `data:` URLs unchanged
    - `data:application/octet-stream;base64,...` adjusted
    - multiple inline images apply multiple fixed costs
    - text-only items unchanged
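    
    The adjustment described above can be sketched roughly as follows. This
    is a minimal illustration, not the code from the PR: the function names
    and the idea of passing image URLs as a plain slice are assumptions made
    for the example; only the 340-byte fixed cost comes from the commit
    message.
    
    ```rust
    /// Fixed per-image cost quoted in the commit message
    /// (340 bytes, roughly 85 tokens at 4 bytes/token).
    const IMAGE_ESTIMATE_BYTES: usize = 340;
    
    /// Hypothetical helper: returns the base64 payload length when `url` is an
    /// inline `data:<mime>;base64,<payload>` URL, and `None` for anything else.
    fn base64_payload_len(url: &str) -> Option<usize> {
        let rest = url.strip_prefix("data:")?;
        let (header, payload) = rest.split_once(',')?;
        if header.ends_with(";base64") {
            Some(payload.len())
        } else {
            None
        }
    }
    
    /// Hypothetical helper: starts from a baseline byte estimate (for example
    /// the serialized JSON length) and, for each inline base64 URL, subtracts
    /// the payload bytes and adds the fixed per-image cost.
    fn adjusted_estimate_bytes(baseline_bytes: usize, image_urls: &[&str]) -> usize {
        image_urls
            .iter()
            .fold(baseline_bytes, |acc, url| match base64_payload_len(url) {
                Some(payload) => acc.saturating_sub(payload) + IMAGE_ESTIMATE_BYTES,
                None => acc,
            })
    }
    ```
    
    Under these assumptions the data URL prefix and JSON overhead stay in
    the baseline, and `https://`, `file://`, and non-base64 `data:` URLs
    pass through unchanged.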
  • Prefer v2 websockets if available (#12428)
    Also clean up the settings flow to avoid reading many separate flags.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Prevent replayed runtime events from forcing active status (#12420)
    Fixes #11852
    
    Resume replay was applying transient runtime events (`TurnStarted`,
    `StreamError`) as if they were live, which could leave the TUI stuck in
    a stale `Working` / `Reconnecting...` state after resuming an
    interrupted reconnect.
    
    This change makes replay transcript-oriented for these events by:
    - skipping retry-status restoration for replayed non-stream events
    - ignoring replayed `TurnStarted` for task-running state
    - ignoring replayed `StreamError` for reconnect/status UI
    
    Also adds TUI regression tests and snapshot coverage for the interrupted
    reconnect replay case.
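    
    The replay behavior above could look something like the sketch below.
    Only the `TurnStarted` and `StreamError` event names come from the
    commit message; `UiState`, `AgentMessage`, `apply_event`, and the
    `is_replay` flag are hypothetical stand-ins for the real TUI types.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum Event {
        TurnStarted,
        StreamError(String),
        // Hypothetical transcript-bearing event, used only for illustration.
        AgentMessage(String),
    }
    
    #[derive(Debug, Default, PartialEq)]
    struct UiState {
        task_running: bool,
        status: Option<String>,
        transcript: Vec<String>,
    }
    
    /// Applies an event to the UI state. Replayed transient events are
    /// ignored for live status, while transcript content is always restored.
    fn apply_event(state: &mut UiState, event: &Event, is_replay: bool) {
        match event {
            Event::TurnStarted => {
                if !is_replay {
                    state.task_running = true;
                }
            }
            Event::StreamError(message) => {
                if !is_replay {
                    state.status = Some(format!("Reconnecting... ({message})"));
                }
            }
            Event::AgentMessage(text) => state.transcript.push(text.clone()),
        }
    }
    ```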
  • profile-level model_catalog_json override (#12410)
    Enable the `model_catalog_json` config value on `ConfigProfile` as well.
  • feat(linux-sandbox): implement proxy-only egress via TCP-UDS-TCP bridge (#11293)
    ## Summary
    - Implement Linux proxy-only routing in `codex-rs/linux-sandbox` with a
    two-stage bridge: host namespace `loopback TCP proxy endpoint -> UDS`,
    then bwrap netns `loopback TCP listener -> host UDS`.
    - Add hidden `--proxy-route-spec` plumbing for outer-to-inner stage
    handoff.
    - Fail closed in proxy mode when no valid loopback proxy endpoints can
    be routed.
    - Introduce explicit network seccomp modes: `Restricted` (legacy
    restricted networking) and `ProxyRouted` (allow INET/INET6 for routed
    proxy access, deny `AF_UNIX` and `socketpair`).
    - Enforce that proxy bridge/routing is bwrap-only by validating
    `--apply-seccomp-then-exec` requires `--use-bwrap-sandbox`.
    - Keep landlock-only flows unchanged (no proxy bridge behavior outside
    bwrap).
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • chore: delete empty codex-rs/code file (#12440)
    This file was added in https://github.com/openai/codex/pull/4195, but I
    think it may have been a mistake?
  • refactor(core): move embedded system skills into codex-skills crate (#12435)
    ## Why
    
    `codex-core` was carrying the embedded system-skill sample assets (and a
    `build.rs` that walks those files to register rerun triggers). Those
    assets change infrequently, but any change under `codex-core` still ties
    them to `codex-core`'s build/cache lifecycle.
    
    This change moves the embedded system-skills packaging into a dedicated
    `codex-skills` crate so it can be cached independently. That reduces
    unnecessary invalidation/rebuild pressure on `codex-core` when the
    skills bundle is the only thing that changes.
    
    ## What Changed
    
    - Added a new `codex-rs/skills` crate (`codex-skills`) with:
      - `Cargo.toml`
      - `BUILD.bazel`
      - `build.rs` to track skill asset file changes for Cargo rebuilds
      - `src/lib.rs` containing the embedded system-skills install/cache logic
        previously in `codex-core`
    - Moved the embedded sample skill assets from
    `codex-rs/core/src/skills/assets/samples` to
    `codex-rs/skills/src/assets/samples`.
    - Updated `codex-rs/core/Cargo.toml` to depend on `codex-skills` and
    removed `codex-core`'s direct `include_dir` dependency.
    - Removed `codex-core`'s `build.rs`.
    - Replaced `codex-rs/core/src/skills/system.rs` implementation with a
    thin re-export wrapper to keep existing `codex-core` call sites
    unchanged.
    - Updated workspace manifests/lockfile (`codex-rs/Cargo.toml`,
    `codex-rs/Cargo.lock`) for the new crate.
  • fix: codex-arg0 no longer depends on codex-core (#12434)
    ## Why
    
    `codex-rs/arg0` only needed two things from `codex-core`:
    
    - the `find_codex_home()` wrapper
    - the special argv flag used for the internal `apply_patch`
    self-invocation path
    
    That made `codex-arg0` depend on `codex-core` for a very small surface
    area. This change removes that dependency edge and moves the shared
    `apply_patch` invocation flag to a more natural boundary
    (`codex-apply-patch`) while keeping the contract explicitly documented.
    
    ## What Changed
    
    - Moved the internal `apply_patch` argv[1] flag constant out of
    `codex-core` and into `codex-apply-patch`.
    - Renamed the constant to `CODEX_CORE_APPLY_PATCH_ARG1` and documented
    that it is part of the Codex core process-invocation contract (even
    though it now lives in `codex-apply-patch`).
    - Updated `arg0`, the core apply-patch runtime, and the `codex-exec`
    apply-patch test to import the constant from `codex-apply-patch`.
    - Updated `codex-rs/arg0` to call
    `codex_utils_home_dir::find_codex_home()` directly instead of
    `codex_core::config::find_codex_home()`.
    - Removed the `codex-core` dependency from `codex-rs/arg0` and added the
    needed direct dependency on `codex-utils-home-dir`.
    - Added `codex-apply-patch` as a dev-dependency for `codex-rs/exec`
    tests (the apply-patch test now imports the moved constant directly).
    
    ## Verification
    
    - `cargo test -p codex-apply-patch`
    - `cargo test -p codex-arg0`
    - `cargo test -p codex-core --lib apply_patch`
    - `cargo test -p codex-exec test_standalone_exec_cli_can_use_apply_patch`
    - `cargo shear`
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
        `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
        parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • chore: move config diagnostics out of codex-core (#12427)
    ## Why
    
    Compiling `codex-rs/core` is a bottleneck for local iteration, so this
    change continues the ongoing extraction of config-related functionality
    out of `codex-core` and into `codex-config`.
    
    The goal is not just to move code, but to reduce `codex-core` ownership
    and indirection so more code depends on `codex-config` directly.
    
    ## What Changed
    
    - Moved config diagnostics logic from
    `core/src/config_loader/diagnostics.rs` into
    `config/src/diagnostics.rs`.
    - Updated `codex-core` to use `codex-config` diagnostics types/functions
    directly where possible.
    - Removed the `core/src/config_loader/diagnostics.rs` shim module
    entirely; the remaining `ConfigToml`-specific calls are in
    `core/src/config_loader/mod.rs`.
    - Moved `CONFIG_TOML_FILE` into `codex-config` and updated existing
    references to use `codex_config::CONFIG_TOML_FILE` directly.
    - Added a direct `codex-config` dependency to `codex-cli` for its
    `CONFIG_TOML_FILE` use.
  • Fix compaction context reinjection and model baselines (#12252)
    ## Summary
    - move regular-turn context diff/full-context persistence into
    `run_turn` so pre-turn compaction runs before incoming context updates
    are recorded
    - after successful pre-turn compaction, rely on a cleared
    `reference_context_item` to trigger full context reinjection on the
    follow-up regular turn (manual `/compact` keeps replacement history
    summary-only and also clears the baseline)
    - preserve `<model_switch>` when full context is reinjected, and inject
    it *before* the rest of the full-context items
    - scope `reference_context_item` and `previous_model` to regular user
    turns only so standalone tasks (`/compact`, shell, review, undo) cannot
    suppress future reinjection or `<model_switch>` behavior
    - make context-diff persistence + `reference_context_item` updates
    explicit in the regular-turn path, with clearer docs/comments around the
    invariant
    - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
    (only regular turns persist `TurnContextItem` now)
    - simplify resume/fork previous-model/reference-baseline hydration by
    looking up the last surviving turn context from rollout lifecycle
    events, including rollback and compaction-crossing handling
    - remove the legacy fallback that guessed from bare `TurnContext`
    rollouts without lifecycle events
    - update compaction/remote-compaction/model-visible snapshots and
    compact test assertions (including remote compaction mock response
    shape)
    
    ## Why
    We were persisting incoming context items before spawning the regular
    turn task, which let pre-turn compaction requests accidentally include
    incoming context diffs without the new user message. Fixing that exposed
    follow-on baseline issues around `/compact`, resume/fork, and standalone
    tasks that could cause duplicate context injection or suppress
    `<model_switch>` instructions.
    
    This PR re-centers the invariants around regular turns:
    - regular turns persist model-visible context diffs/full reinjection and
    update the `reference_context_item`
    - standalone tasks do not advance those regular-turn baselines
    - compaction clears the baseline when replacement history may have
    stripped the referenced context diffs
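    
    A toy model of these invariants, for illustration only. The field name
    `reference_context_item` comes from the PR text; `Session`,
    `Injection`, and the method names are hypothetical, and context is
    reduced to a plain string so the baseline logic is easy to follow.
    
    ```rust
    #[derive(Debug, PartialEq)]
    enum Injection {
        /// No baseline exists, so the full context is (re)injected.
        Full(String),
        /// A baseline exists and differs, so only a diff is recorded.
        Diff { from: String, to: String },
        /// The baseline already matches the current context.
        Unchanged,
    }
    
    #[derive(Debug, Default)]
    struct Session {
        reference_context_item: Option<String>,
    }
    
    impl Session {
        /// Regular turns persist context and advance the baseline.
        fn regular_turn(&mut self, current: &str) -> Injection {
            let injection = match self.reference_context_item.as_deref() {
                None => Injection::Full(current.to_string()),
                Some(prev) if prev == current => Injection::Unchanged,
                Some(prev) => Injection::Diff {
                    from: prev.to_string(),
                    to: current.to_string(),
                },
            };
            self.reference_context_item = Some(current.to_string());
            injection
        }
    
        /// Standalone tasks leave the regular-turn baseline untouched.
        fn standalone_task(&self) {}
    
        /// Compaction clears the baseline, forcing full reinjection next turn.
        fn compact(&mut self) {
            self.reference_context_item = None;
        }
    }
    ```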
    
    ## Follow-ups (TODOs left in code)
    - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
    comprehensively
    - `TODO(ccunningham)`: include pending incoming context items in
    pre-turn compaction threshold estimation
    - `TODO(ccunningham)`: inject updated personality spec alongside
    `<model_switch>` so some model-switch paths can avoid forced full
    reinjection
    - `TODO(ccunningham)`: review task turn lifecycle
    (`TurnStarted`/`TurnComplete`) behavior and emit task-start context
    diffs for task types that should have them (excluding `/compact`)
    
    ## Validation
    - `just fmt`
    - CI should cover the updated compaction/resume/model-visible snapshot
    expectations and rollout-hydration behavior
    - I did **not** rerun the full local test suite after the latest
    resume-lookup / rollout-persistence simplifications
  • feat: discourage the use of the --all-features flag (#12429)
    ## Why
    
    Developers are frequently running low on disk space, and routine use of
    `--all-features` contributes to larger Cargo build caches in `target/`
    by compiling additional feature combinations.
    
    This change updates local workflow guidance to avoid `--all-features` by
    default and reserve it for cases where full feature coverage is
    specifically needed.
    
    ## What Changed
    
    - Updated `AGENTS.md` guidance for `codex-rs` to recommend `cargo test`
    / `just test` for full-suite local runs, and to call out the disk-usage
    cost of routine `--all-features` usage.
    - Updated the root `justfile` so `just fix` and `just clippy` no longer
    pass `--all-features` by default.
    - Updated `docs/install.md` to explicitly describe `cargo test
    --all-features` as an optional heavier-weight run (more build time and
    `target/` disk usage).
    
    ## Verification
    
    - Confirmed the `justfile` parses and the recipes list successfully with
    `just --list`.
  • fix(core) Filter non-matching prefix rules (#12314)
    ## Summary
    `gpt-5.3-codex` really likes to write complicated shell scripts and then
    suggest a partial `prefix_rule` that wouldn't actually approve the
    command. We should only show the `prefix_rule` suggestion from the model
    if it would actually fully approve the command the user is seeing.
    
    This will technically cause more instances of overly-specific
    suggestions when we fall back, but I think the UX is clearer,
    particularly when the model doesn't necessarily understand the current
    limitations of execpolicy parsing.
    
    ## Testing
     - [x] Add unit tests
     - [x] Add integration tests
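    
    One way to picture the filter, as a rough sketch: assume the command
    shown to the user has already been parsed into one or more sub-commands
    (token lists), and keep the model's suggestion only if the prefix
    matches every one of them. The function names and this token-list
    representation are assumptions for the example and do not reflect the
    actual execpolicy API.
    
    ```rust
    /// True when `command` begins with every token of `prefix_rule`.
    fn rule_matches(prefix_rule: &[&str], command: &[&str]) -> bool {
        !prefix_rule.is_empty() && command.starts_with(prefix_rule)
    }
    
    /// Hypothetical filter: show the suggested rule only if it would approve
    /// every parsed sub-command of the script the user is looking at.
    fn should_show_suggestion(prefix_rule: &[&str], commands: &[Vec<&str>]) -> bool {
        !commands.is_empty() && commands.iter().all(|cmd| rule_matches(prefix_rule, cmd))
    }
    ```
    
    With a script that runs `git add .` and then `git commit -m x`, a
    suggested prefix of `["git", "add"]` would be dropped under this
    sketch, because it covers only the first sub-command.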
  • ignore v1 in JSON schema codegen (#12408)
    ## Why
    
    The generated unnamespaced JSON envelope schemas (`ClientRequest` and
    `ServerNotification`) still contained both v1 and v2 variants, which
    pulled legacy v1/core types and v2 types into the same `definitions`
    graph. That caused `schemars` to produce numeric suffix names (for
    example `AskForApproval2`, `ByteRange2`, `MessagePhase2`).
    
    This PR moves JSON codegen toward v2-only output while preserving the
    unnamespaced envelope artifacts, and avoids reintroducing numeric-suffix
    tolerance by removing the v1/internal-only variants that caused the
    collisions in those envelope schemas.
    
    ## What Changed
    
    - In `codex-rs/app-server-protocol/src/export.rs`, JSON generation now
    excludes v1 schema artifacts (`v1/*`) while continuing to emit
    unnamespaced/root JSON schemas and the JSON bundle.
    - Added a narrow JSON v1 allowlist (`JSON_V1_ALLOWLIST`) so
    `InitializeParams` and `InitializeResponse` are still emitted.
    - Added JSON-only post-processing for the mixed envelope schemas before
    collision checks run:
      - `ClientRequest`: strips v1 request variants from the generated `oneOf`
        using the temporary `V1_CLIENT_REQUEST_METHODS` list
      - `ServerNotification`: strips v1 notifications plus the internal-only
        `rawResponseItem/completed` notification using the temporary
        `EXCLUDED_SERVER_NOTIFICATION_METHODS_FOR_JSON` list
    - Added a temporary local-definition pruning pass for those envelope
    schemas so now-unreferenced v1/core definitions are removed from
    `definitions` after method filtering.
    - Updated the variant-title naming heuristic for single-property literal
    object variants to use the literal value (when available), avoiding
    collisions like multiple `state`-only variants all deriving the same
    title.
    - Collision handling remains fail-fast (no numeric suffix fallback map
    in this PR path).
    
    ## Verification
    
    - `just write-app-server-schema`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12408).
    * __->__ #12408
    * #12406
  • test(app-server): wait for turn/completed in turn_start tests (#12376)
    ## Summary
    - switch a few app-server `turn_start` tests from
    `codex/event/task_complete` waits to `turn/completed` waits
    - avoid matching unrelated/background `task_complete` events
    - keep this flaky test fix separate from the /title feature PR
    
    ## Why
    On Windows ARM CI, these tests can return early after observing a
    generic `codex/event/task_complete` notification from another task. That
    can leave the mock Responses server with fewer calls than expected and
    fail the test with a wiremock verification mismatch.
    
    Using `turn/completed` matches the app-server turn lifecycle
    notification the tests actually care about.
    
    ## Validation
    - `cargo test -p codex-app-server turn_start_updates_sandbox_and_cwd_between_turns_v2 -- --nocapture`
    - `cargo test -p codex-app-server turn_start_exec_approval_ -- --nocapture`
    - `just fmt`
  • feat: use OAI Responses API MessagePhase type directly in App Server v2 (#12422)
    https://github.com/openai/codex/pull/10455 introduced the `phase` field,
    and then https://github.com/openai/codex/pull/12072 introduced a
    `MessagePhase` type in `v2.rs` that paralleled the `MessagePhase` type
    in `codex-rs/protocol/src/models.rs`.
    
    The app server protocol prefers `camelCase` while the Responses API uses
    `snake_case`, so this meant we had two versions of `MessagePhase` with
    different serialization rules. When the app server protocol refers to
    types from the Responses API, we use the wire format of the
    Responses API even though it is inconsistent with the app server API.
    
    This PR deletes `MessagePhase` from `v2.rs` and consolidates on the
    Responses API version to eliminate confusion.
  • fix: address flakiness in thread_resume_rejoins_running_thread_even_with_override_mismatch (#12381)
    ## Why
    `thread/resume` responses for already-running threads can be reported as
    `Idle` even while a turn is still in progress. This is caused by a
    timing window where the runtime watch state has not yet observed the
    running-thread transition, so API clients can receive stale status
    information at resume time.
    
    Possibly related: https://github.com/openai/codex/pull/11786
    
    ## What
    - Add a shared status normalization helper, `resolve_thread_status`, in
    `codex-rs/app-server/src/thread_status.rs` that resolves
    `Idle`/`NotLoaded` to `Active { active_flags: [] }` when an in-progress
    turn is known.
    - Reuse this helper across thread response paths in
    `codex-rs/app-server/src/codex_message_processor.rs` (including
    `thread/start`, `thread/unarchive`, `thread/read`, `thread/resume`,
    `thread/fork`, and review/thread-started notification responses).
    - In `handle_pending_thread_resume_request`, use both the in-memory
    `active_turn_snapshot` and the resumed rollout turns to decide whether a
    turn is in progress before resolving thread status for the response.
    - Extend `thread_status` tests to validate the new status-resolution
    behavior directly.
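    
    A simplified sketch of the normalization rule, assuming a pared-down
    `ThreadStatus` enum. The helper name and the
    `Idle`/`NotLoaded`/`Active { active_flags }` variants are taken from
    the description above; the flag element type and the boolean parameter
    are assumptions made to keep the example self-contained.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum ThreadStatus {
        NotLoaded,
        Idle,
        // The flag element type is a placeholder for this sketch.
        Active { active_flags: Vec<String> },
    }
    
    /// Upgrades a stale `Idle`/`NotLoaded` status to `Active` when a turn is
    /// known to be in progress; every other combination is returned as-is.
    fn resolve_thread_status(status: ThreadStatus, has_in_progress_turn: bool) -> ThreadStatus {
        match status {
            ThreadStatus::Idle | ThreadStatus::NotLoaded if has_in_progress_turn => {
                ThreadStatus::Active { active_flags: Vec::new() }
            }
            other => other,
        }
    }
    ```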
    
    ## Verification
    - `cargo test -p codex-app-server suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
  • Add experimental realtime websocket backend prompt override (#12418)
    - add top-level `experimental_realtime_ws_backend_prompt` config key
    (experimental / do not use) and include it in config schema
    - apply the override only to `Op::RealtimeConversation` websocket
    `backend_prompt`, with config + realtime tests
  • Improve Plan mode reasoning selection flow (#12303)
    Addresses https://github.com/openai/codex/issues/11013
    
    ## Summary
    - add a Plan implementation path in the TUI that lets users choose
    reasoning before switching to Default mode and implementing
    - add Plan-mode reasoning scope handling (Plan-only override vs
    all-modes default), including config/schema/docs plumbing for
    `plan_mode_reasoning_effort`
    - remove the hardcoded Plan preset medium default and make the reasoning
    popup reflect the active Plan override as `(current)`
    - split the collaboration-mode switch notification UI hint into #12307
    to keep this diff focused
    
    If I have `plan_mode_reasoning_effort = "medium"` set in my
    `config.toml`:
    <img width="699" height="127" alt="Screenshot 2026-02-20 at 6 59 37 PM"
    src="https://github.com/user-attachments/assets/b33abf04-6b7a-49ed-b2e9-d24b99795369"
    />
    
    If I don't have `plan_mode_reasoning_effort` set in my `config.toml`:
    <img width="704" height="129" alt="Screenshot 2026-02-20 at 7 01 51 PM"
    src="https://github.com/user-attachments/assets/88a086d4-d2f1-49c7-8be4-f6f0c0fa1b8d"
    />
    
    ## Codex author
    `codex resume 019c78a2-726b-7fe3-adac-3fa4523dcc2a`
  • Add experimental realtime websocket URL override (#12416)
    - add top-level `experimental_realtime_ws_base_url` config key
    (experimental / do not use) and include it in config schema
    - apply the override only to `Op::RealtimeConversation` websocket
    transport, with config + realtime tests
  • fix(nix): include libcap dependency on linux builds (#12415)
    Commit 923f931121 introduced a dependency on `libcap`. This PR fixes the
    Nix build by including `libcap` in Nix's build inputs.
    
    Issue number: #12102. @etraut-openai gave me permission to open this PR.
    
    Testing:
    Running `nix run .#codex-rs` works on both macOS (aarch64) and NixOS
    (x86-64).
  • Wire realtime api to core (#12268)
    - Introduce `RealtimeConversationManager` for realtime API management 
    - Add `op::conversation` to start conversation, insert audio, insert
    text, and close conversation.
    - emit conversation lifecycle and realtime events.
    - Move shared realtime payload types into codex-protocol and add core
    e2e websocket tests for start/replace/transport-close paths.
    
    Things to consider:
    - Should we use the same `op::` and `Events` channel to carry audio? I
    think we should try this simple approach first; later we can create a
    separate channel if these get congested.
    - Sending text updates to the client: we can start simple and later
    restrict that.
    - Provider auth is intentionally not wired up for now.
  • Add field to Thread object for the latest rename set for a given thread (#12301)
    Exposes the latest name set for a thread through the app server. This
    enables other surfaces to use the core as the source of truth for thread
    naming. `threadName` is gathered using the helper functions used to
    interact with `session_index.jsonl`, and is hydrated in:
    - `thread/list`
    - `thread/read`
    - `thread/resume`
    - `thread/unarchive`
    - `thread/rollback`
    
    We don't do this for `thread/start` and `thread/fork`.
  • fix: explicitly list name collisions in JSON schema generation (#12406)
    ## Why
    
    JSON schema codegen was silently resolving naming collisions by
    appending numeric suffixes (for example `...2`, `...3`). That makes the
    generated schema names unstable: removing an earlier colliding type can
    cause a later type to be renumbered, which is a breaking change for
    consumers that referenced the old generated name.
    
    This PR makes those collisions explicit and reviewable.
    
    Though note that once we remove `v1` from the codegen, we will no longer
    support naming collisions. Or rather, naming collisions will have to be
    handled explicitly rather than the numeric suffix approach.
    
    ## What Changed
    
    - In `codex-rs/app-server-protocol/src/export.rs`, replaced implicit
    numeric suffix collision handling for generated variant titles with
    explicit special-case maps.
    - Added a panic when a collision occurs without an entry in the map, so
    new collisions fail loudly instead of silently renaming generated schema
    types.
    - Added the currently required special cases so existing generated names
    remain stable.
    - Extended the same approach to numbered `definitions` / `$defs`
    collisions (for example `MessagePhase2`-style names) so those are also
    explicitly tracked.
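    
    The fail-loudly approach can be illustrated with a small sketch. This
    is not the code in `export.rs`: `assign_names`, the `(type_path,
    base_name)` input shape, and the example type paths are hypothetical.
    It only shows the idea of replacing automatic numeric suffixes with an
    explicit map plus a panic for unlisted collisions.
    
    ```rust
    use std::collections::{HashMap, HashSet};
    
    /// Assigns a schema name to each `(type_path, base_name)` candidate.
    /// On a collision, the name must come from `special_cases`; an unlisted
    /// collision panics instead of silently getting a numeric suffix.
    fn assign_names(
        candidates: &[(&str, &str)],
        special_cases: &HashMap<&str, &str>,
    ) -> Vec<String> {
        let mut used: HashSet<String> = HashSet::new();
        let mut names = Vec::with_capacity(candidates.len());
        for (type_path, base_name) in candidates {
            let name = if used.contains(*base_name) {
                special_cases
                    .get(*type_path)
                    .copied()
                    .unwrap_or_else(|| {
                        panic!("unhandled schema name collision: {type_path} -> {base_name}")
                    })
                    .to_string()
            } else {
                base_name.to_string()
            };
            // An explicit name that is itself already taken is also an error.
            assert!(used.insert(name.clone()), "name {name} is already taken");
            names.push(name);
        }
        names
    }
    ```
    
    Because the mapping is explicit, removing an earlier colliding type
    does not renumber later ones; the map entry simply becomes unused.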
    
    ## Verification
    
    - Ran targeted generator-path test:
      - `cargo test -p codex-app-server-protocol generate_json_filters_experimental_fields_and_methods -- --nocapture`
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12406).
    * #12408
    * __->__ #12406
  • [apps] Bump MCP tool call timeout. (#12405)
    - [x] Bump MCP tool call timeout.
  • fix(tui): queued-message edit shortcut unreachable in some terminals (#12240)
    ## Problem
    
    The TUI's "edit queued message" shortcut (Alt+Up) is either silently
    swallowed or recognized as another key combination by Apple Terminal,
    Warp, and VSCode's integrated terminal on macOS. Users in those
    environments see the hint but pressing the keys does nothing.
    
    ## Mental model
    
    When a model turn is in progress the user can still type follow-up
    messages. These are queued and displayed below the composer with a hint
    line showing how to pop the most recent one back into the editor. The
    hint text and the actual key handler must agree on which shortcut is
    used, and that shortcut must actually reach the TUI—i.e. it must not be
    intercepted by the host terminal.
    
    Three terminals are known to intercept Alt+Up: Apple Terminal (remaps it
    to cursor movement), Warp (consumes it for its own command palette), and
    VSCode (maps it to "move line up"). For these we use Shift+Left instead.
    
    <p align="center">
    <img width="283" height="182" alt="image"
    src="https://github.com/user-attachments/assets/4a9c5d13-6e47-4157-bb41-28b4ce96a914"
    />
    </p>
    
    | macOS Native Terminal | Warp | VSCode Terminal |
    |---|---|---|
    | <img width="1557" height="1010" alt="SCR-20260219-kigi"
    src="https://github.com/user-attachments/assets/f4ff52f8-119e-407b-a3f3-52f564c36d70"
    /> | <img width="1479" height="1261" alt="SCR-20260219-krrf"
    src="https://github.com/user-attachments/assets/5807d7c4-17ae-4a2b-aa27-238fd49d90fd"
    /> | <img width="1612" height="1312" alt="SCR-20260219-ksbz"
    src="https://github.com/user-attachments/assets/1cedb895-6966-4d63-ac5f-0eea0f7057e8"
    /> |
    
    ## Non-goals
    
    - Making the binding user-configurable at runtime (deferred to a broader
    keybinding-config effort).
    - Remapping any other shortcuts that might be terminal-specific.
    
    ## Tradeoffs
    
    - **Exhaustive match instead of a wildcard default.** The
    `queued_message_edit_binding_for_terminal` function explicitly lists
    every `TerminalName` variant. This is intentional: adding a new terminal
    to the enum will produce a compile error, forcing the author to decide
    which binding that terminal should use.
    
    - **Binding lives on `ChatWidget`, hint lives on `QueuedUserMessages`.**
    The key event handler that actually acts on the press is in
    `ChatWidget`, but the rendered hint text is inside `QueuedUserMessages`.
    These are kept in sync by `ChatWidget` calling
    `bottom_pane.set_queued_message_edit_binding(self.queued_message_edit_binding)`
    during construction. A mismatch would show the wrong hint but would not
    lose data.
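    
    The exhaustive-match tradeoff is easy to see in a reduced example. The
    function name comes from the description above, but the enum variants
    and the `KeyBinding` type here are simplified stand-ins, not the real
    `TerminalName` or TUI key-binding types.
    
    ```rust
    // Simplified stand-in for the real `TerminalName` enum.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum TerminalName {
        AppleTerminal,
        WarpTerminal,
        VsCode,
        Ghostty,
        Iterm2,
        Unknown,
    }
    
    // Simplified stand-in for the TUI key-binding type.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum KeyBinding {
        AltUp,
        ShiftLeft,
    }
    
    /// No wildcard arm: adding a variant to `TerminalName` makes this match
    /// non-exhaustive, so the compiler forces a decision for the new terminal.
    fn queued_message_edit_binding_for_terminal(name: TerminalName) -> KeyBinding {
        match name {
            TerminalName::AppleTerminal
            | TerminalName::WarpTerminal
            | TerminalName::VsCode => KeyBinding::ShiftLeft,
            TerminalName::Ghostty
            | TerminalName::Iterm2
            | TerminalName::Unknown => KeyBinding::AltUp,
        }
    }
    ```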
    
    ## Architecture
    
    ```mermaid
      graph TD
          TI["terminal_info().name"] --> FN["queued_message_edit_binding_for_terminal(name)"]
          FN --> KB["KeyBinding"]
          KB --> CW["ChatWidget.queued_message_edit_binding<br/><i>key event matching</i>"]
          KB --> BP["BottomPane.set_queued_message_edit_binding()"]
          BP --> QUM["QueuedUserMessages.edit_binding<br/><i>rendered in hint line</i>"]
    
          subgraph "Special terminals (Shift+Left)"
              AT["Apple Terminal"]
              WT["Warp"]
              VS["VSCode"]
          end
    
          subgraph "Default (Alt+Up)"
              GH["Ghostty"]
              IT["iTerm2"]
              OT["Others…"]
          end
    
          AT --> FN
          WT --> FN
          VS --> FN
          GH --> FN
          IT --> FN
          OT --> FN
    ```
    
    No new crates or public API surface. The only cross-crate dependency
    added is `codex_core::terminal::{TerminalName, terminal_info}`, which
    already existed for telemetry.
    
    ## Observability
    
    No new logging. Terminal detection already emits a `tracing::debug!` log
    line at startup with the detected terminal name, which is sufficient to
    diagnose binding mismatches.
    
    ## Tests
    
    - Existing `alt_up_edits_most_recent_queued_message` test is preserved
    and explicitly sets the Alt+Up binding to isolate from the host
    terminal.
    - New parameterized async tests verify Shift+Left works for Apple
    Terminal, Warp, and VSCode.
    - A sync unit test asserts the mapping table covers the three special
    terminals (Shift+Left) and that iTerm2 still gets Alt+Up.
      
    Fixes #4490
  • [apps] Fix gateway url. (#12403)
    - [x] Fix connectors gateway url.
  • Show model/reasoning hint when switching modes (#12307)
    ## Summary
    - show an info message when switching collaboration modes changes the
    effective model or reasoning
    - include the target mode in the message (for example `... for Plan
    mode.`)
    - add TUI tests for model-change and reasoning-only change notifications
    on mode switch
    
    <img width="715" height="184" alt="Screenshot 2026-02-20 at 2 01 40 PM"
    src="https://github.com/user-attachments/assets/18d1beb3-ab87-4e1c-9ada-a10218520420"
    />
  • clarify model_catalog_json only applied on startup (#12379)
  • Move sanitizer into codex-secrets (#12306)
    ## Summary
    - move the sanitizer implementation into `codex-secrets`
    (`secrets/src/sanitizer.rs`) and re-export `redact_secrets`
    - switch `codex-core` to depend on/import `codex-secrets` for sanitizer
    usage
    - remove the old `utils/sanitizer` crate wiring and refresh lockfiles
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-secrets`
    - `cargo test -p codex-core --no-run`
    - `cargo clippy -p codex-secrets -p codex-core --all-targets
    --all-features -- -D warnings`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    
    ## Notes
    - not run: `cargo test --all-features` (full workspace suite)
  • Add ability to attach extra files to feedback (#12370)
    Allow clients to provide extra files.
  • docs: use --locked when installing cargo-nextest (#12377)
    ## What
    
    Updates the optional `cargo-nextest` install command in
    `docs/install.md`:
    
    - `cargo install cargo-nextest` -> `cargo install --locked
    cargo-nextest`
    
    ## Why
    
    The current docs command can fail during source install because recent
    `cargo-nextest` releases intentionally require `--locked`.
    
    Repro (macOS, but likely not platform-specific):
    - `cargo install cargo-nextest`
    - Fails with a compile error from `locked-tripwire` indicating:
      - `Nextest does not support being installed without --locked`
      - suggests `cargo install --locked cargo-nextest`
    
    Using the locked command succeeds:
    - `cargo install --locked cargo-nextest`
    
    ## How
    
    Single-line docs change in `docs/install.md` to match current
    `cargo-nextest` install requirements.
    
    ## Validation
    
    - Reproduced failure locally using a temporary `CARGO_HOME` directory
    (clean Cargo home)
    - Example command used: `CARGO_HOME=/tmp/cargo-home-test cargo install
    cargo-nextest`
    - Confirmed success with `cargo install --locked cargo-nextest`
  • [apps] Enforce simple logo url format. (#12374)
    - [x] Enforce simple logo url format when loading apps directory to save
    bandwidth.
  • core tests: use hermetic mock server in review suite (#12291)
    ## Summary
    - switch the review test SSE mock helper to use the shared hermetic mock
    server setup
    - ensure review tests always have a default `/v1/models` stub during
    Codex session bootstrap
    - remove the race that caused intermittent `/v1/models` connection
    failures and flaky ETag refresh assertions
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-core --test all
    refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch`
    - `cargo test -p codex-core --test all
    review_uses_custom_review_model_from_config`
    - repeated both targeted tests 5x in a loop
    - `cargo clippy -p codex-core --tests -- -D warnings`
  • app-server: harden disconnect cleanup paths (#12218)
    Hardens codex-rs/app-server connection lifecycle and outbound routing
    for websocket clients. Fixes some FUD I was having.
    
    - Added per-connection disconnect signaling (CancellationToken) for
    websocket transports.
    - Split websocket handling into independent inbound/outbound tasks
    coordinated by cancellation.
    - Changed outbound routing so websocket connections use non-blocking
    try_send; slow/full websocket writers are disconnected instead of
    stalling broadcast delivery.
    - Kept stdio behavior blocking-on-send (no forced disconnect) so local
    stdio clients are not dropped when queues are temporarily full.
    - Simplified outbound router flow by removing deferred
    pending_closed_connections handling.
    - Added guards to drop incoming response/notification/error messages
    from unknown connections.
    - Fixed listener teardown race in thread listener tasks using a
    listener_generation check so stale tasks do not clear newer listeners.
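
The outbound routing rule above (non-blocking send for websocket, blocking send for stdio) can be sketched roughly as follows. This is a minimal illustration using `std::sync::mpsc` bounded channels as a stand-in for whatever channel type the app-server actually uses; `Transport`, `Connection`, and `broadcast` are hypothetical names, not the real app-server types.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical transport kinds; the real app-server types are not shown in this log.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Transport {
    Websocket,
    Stdio,
}

/// Hypothetical per-connection handle holding a bounded outbound queue.
pub struct Connection {
    pub id: u64,
    pub transport: Transport,
    pub outbound: SyncSender<String>,
}

/// Deliver `msg` to every connection and return the ids of connections
/// that should be disconnected.
pub fn broadcast(connections: &[Connection], msg: &str) -> Vec<u64> {
    let mut to_disconnect = Vec::new();
    for conn in connections {
        let delivered = match conn.transport {
            // Websocket: never block the broadcast loop. A full or closed
            // queue marks the connection for disconnect.
            Transport::Websocket => match conn.outbound.try_send(msg.to_string()) {
                Ok(()) => true,
                Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
            },
            // Stdio: block until there is room, so a temporarily full queue
            // does not drop the local client. Dropping a stdio connection
            // whose receiver is gone is an assumption of this sketch.
            Transport::Stdio => conn.outbound.send(msg.to_string()).is_ok(),
        };
        if !delivered {
            to_disconnect.push(conn.id);
        }
    }
    to_disconnect
}
```

With this shape, one slow websocket consumer costs only a failed `try_send`, so delivery to the remaining connections is not stalled.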
    
    Fixes
    https://linear.app/openai/issue/CODEX-4966/multiclient-handle-slow-notification-consumers
    
    ## Tests
    
    Added/updated transport tests covering:
    
    - broadcast does not block on a slow/full websocket connection
    - stdio connection waits instead of disconnecting on full queue
    
    I (maxj) have tested manually and will retest before landing
  • fix(core): require approval for destructive MCP tool calls (#12353)
    Summary
    - ensure destructive tool annotations short-circuit to require approval
    - simplify approval logic to only require read/write + open-world when
    destructive is false
    - update the unit test to cover the new destructive behavior
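
One reading of the approval rule summarized above, as a small sketch. The `ToolAnnotations` struct and its plain `bool` fields are hypothetical simplifications (how missing annotations are defaulted is not covered here), and `requires_approval` is an illustrative name rather than the function in `codex-core`.

```rust
/// Hypothetical, simplified view of a tool's annotations.
#[derive(Clone, Copy, Debug)]
pub struct ToolAnnotations {
    pub destructive: bool,
    pub read_only: bool,
    pub open_world: bool,
}

/// Decide whether a tool call needs user approval.
pub fn requires_approval(a: ToolAnnotations) -> bool {
    // Destructive tools short-circuit: approval is always required.
    if a.destructive {
        return true;
    }
    // Otherwise, require approval only for tools that can write
    // (not read-only) and that interact with an open world.
    !a.read_only && a.open_world
}
```

Under this reading, a tool marked destructive requires approval even if it is also marked read-only.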
    
    Testing
    - Not run (not requested)
  • [apps] Implement apps configs. (#12086)
    - [x] Implement apps configs.
  • ci(bazel): install Node from node-version.txt in remote image (#12205)
    ## Summary
    Install Node in the Bazel remote execution image using the version
    pinned in `codex-rs/node-version.txt`.
    
    ## Why
    `js_repl` tests run under Bazel remote execution and require a modern
    Node runtime. Runner-level `setup-node` does not guarantee Node is
    available (or recent enough) inside the remote worker container.
    
    ## What changed
    - Updated `.github/workflows/Dockerfile.bazel` to install Node from
    official tarballs at image build time.
    - Added `xz-utils` for extracting `.tar.xz` archives.
    - Copied `codex-rs/node-version.txt` into the image build context and
    used it as the single source of truth for Node version.
    - Added architecture mapping for multi-arch builds:
      - `amd64 -> x64`
      - `arm64 -> arm64`
    - Verified install during image build with:
      - `node --version`
      - `npm --version`
    
    ## Impact
    - Bazel remote workers should now have the required Node version
    available for `js_repl` tests.
    - Keeps Node version synchronized with repo policy via
    `codex-rs/node-version.txt`.
    
    ## Testing
    - Verified Dockerfile changes and build steps locally (build-time
    commands are deterministic and fail fast on unsupported arch/version
    fetch issues).
    
    ## Follow-up
    - Rebuild and publish the Bazel runner image for both `linux/amd64` and
    `linux/arm64`.
    - Update image digests in `rbe.bzl` to roll out this runtime update in
    CI.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    -  `1` https://github.com/openai/codex/pull/12300
    -  `2` https://github.com/openai/codex/pull/12275
    - 👉 `3` https://github.com/openai/codex/pull/12205
    -  `4` https://github.com/openai/codex/pull/12185
    -  `5` https://github.com/openai/codex/pull/10673
  • CODEX-4927: Surface local login entitlement denials in browser (#12289)
    ## Problem
    Users without Codex access can hit a confusing local login loop. In the
    denial case, the callback could fall through to generic behavior
    (including a plain "Missing authorization code" page) instead of clearly
    explaining that access was denied.
    
    <img width="842" height="464" alt="Screenshot 2026-02-19 at 11 43 45 PM"
    src="https://github.com/user-attachments/assets/f7a25e1d-e480-4ac2-b0ff-8bfe31003e66"
    />
    <img width="842" height="464" alt="Screenshot 2026-02-19 at 11 44 53 PM"
    src="https://github.com/user-attachments/assets/8a4fe6e4-b27b-483c-9f0c-60164933221d"
    />
    
    
    ## Scope
    This PR improves local login error clarity only. It does not change
    entitlement policy, RBAC rules, or who is allowed to use Codex.
    
    ## What Changed
    - The local OAuth callback handler now parses `error` and
    `error_description` on `/auth/callback` and exits the callback loop with
    a real failure.
    - Callback failures render a branded local Codex error page instead of a
    generic/plain page.
    - `access_denied` + `missing_codex_entitlement` is now mapped to an
    explicit user-facing message telling the user Codex is not enabled for
    their workspace and to contact their workspace administrator for access.
    - Unknown OAuth callback errors continue to use a generic error page
    while preserving the OAuth error code/details for debugging.
    - Added the login error page template to Bazel assets so the local
    binary can render it in Bazel builds.
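
A rough sketch of the callback classification described above. It assumes that `missing_codex_entitlement` arrives as the `error_description` value, and it skips percent-decoding for brevity; `CallbackOutcome`, `query_param`, and `classify_callback` are hypothetical names, not the `codex-login` API.

```rust
/// Hypothetical outcome of parsing the `/auth/callback` query string.
#[derive(Debug, PartialEq, Eq)]
pub enum CallbackOutcome {
    /// An authorization code was returned.
    Code(String),
    /// `access_denied` with the missing-entitlement reason: show admin guidance.
    MissingEntitlement,
    /// Any other OAuth error: generic page, details kept for debugging.
    OtherError { error: String, description: Option<String> },
    /// Neither an error nor a code was present.
    MissingCode,
}

/// Look up a raw (not percent-decoded) query parameter by key.
fn query_param<'a>(query: &'a str, key: &str) -> Option<&'a str> {
    query.split('&').find_map(|pair| {
        let (k, v) = pair.split_once('=')?;
        (k == key).then_some(v)
    })
}

pub fn classify_callback(query: &str) -> CallbackOutcome {
    // Check for an OAuth error first so a denial never falls through to
    // the "missing authorization code" path.
    if let Some(error) = query_param(query, "error") {
        let description = query_param(query, "error_description");
        if error == "access_denied" && description == Some("missing_codex_entitlement") {
            return CallbackOutcome::MissingEntitlement;
        }
        return CallbackOutcome::OtherError {
            error: error.to_string(),
            description: description.map(str::to_string),
        };
    }
    match query_param(query, "code") {
        Some(code) if !code.is_empty() => CallbackOutcome::Code(code.to_string()),
        _ => CallbackOutcome::MissingCode,
    }
}
```

Checking `error` before `code` is the key design point: the denial case exits with a real failure instead of looping.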
    
    ## Non-goals
    - No TUI onboarding/toast changes in this PR.
    - No backend entitlement or policy changes.
    
    ## Tests
    - Added an end-to-end `codex-login` test for `access_denied` +
    `missing_codex_entitlement` and verified the page shows the actionable
    admin guidance.
    - Added an end-to-end `codex-login` test for a generic `access_denied`
    reason to verify we keep a generic fallback page/message.
  • js_repl: remove codex.state helper references (#12275)
    ## Summary
    
    This PR removes `codex.state` from the `js_repl` helper surface and
    removes all corresponding documentation/instruction references.
    
    ## Motivation
    
    Top-level bindings in `js_repl` now persist across cells, so the extra
    `codex.state` helper is redundant and adds unnecessary API/docs surface.
    
    ## Changes
    
    - Removed the long-lived `state` object from the Node kernel helper
    wiring.
    - Stopped exposing `codex.state` (and `context.state`) during `js_repl`
    execution.
    - Updated user-facing `js_repl` docs to remove `codex.state`.
    - Updated generated instruction text and related test expectations to
    list only:
      - `codex.tmpDir`
      - `codex.tool(name, args?)`
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    -  `1` https://github.com/openai/codex/pull/12300
    - 👉 `2` https://github.com/openai/codex/pull/12275
    -  `3` https://github.com/openai/codex/pull/12205
    -  `4` https://github.com/openai/codex/pull/12185
    -  `5` https://github.com/openai/codex/pull/10673
  • fix(network-proxy): add unix socket allow-all and update seatbelt rules (#11368)
    ## Summary
    Adds support for a Unix socket escape hatch so we can bypass socket
    allowlisting when explicitly enabled.
    
    ## Description
    * Added a new flag, `network.dangerously_allow_all_unix_sockets`, as an
    explicit escape hatch.
    * In `codex-network-proxy`, enabling that flag now allows any absolute
    Unix socket path from `x-unix-socket` instead of requiring each path to be
    explicitly allowlisted. Relative paths are still rejected.
    * Updated the macOS seatbelt path in core so it enforces the same Unix
    socket behavior:
      * allowlisted sockets generate explicit `network*` subpath rules
      * allow-all generates a broad `network*` (subpath "/") rule
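
The path check described above could look roughly like this. It is a sketch only: `unix_socket_allowed` and its parameters are hypothetical, exact string matching against the allowlist is an assumption, and the real proxy may normalize paths differently.

```rust
/// Decide whether a requested Unix socket path may be used.
///
/// `allow_all` stands in for `network.dangerously_allow_all_unix_sockets`;
/// `allowlist` holds explicitly permitted absolute paths.
pub fn unix_socket_allowed(requested: &str, allow_all: bool, allowlist: &[&str]) -> bool {
    // Relative paths are rejected regardless of the escape hatch.
    if !requested.starts_with('/') {
        return false;
    }
    // With the escape hatch on, any absolute path passes; otherwise the
    // path must be explicitly allowlisted.
    allow_all || allowlist.iter().any(|allowed| *allowed == requested)
}
```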
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • js_repl: block wrapped payload prefixes in grammar (#12300)
    ## Summary
    
    Tighten the `js_repl` freeform Lark grammar to block the most common
    malformed payload wrappers before they reach runtime validation.
    
    ## What Changed
    
    - Replaced the overly permissive `js_repl` freeform grammar (`start:
    /[\s\S]*/`) with a structured grammar that still supports:
      - plain JS source
      - optional first-line `// codex-js-repl:` pragma followed by JS source
    - Added grammar-level filtering for common bad payload shapes by
    rejecting inputs whose first significant token starts with:
      - `{` (JSON object wrapper like `{"code":"..."}`)
      - `"` (quoted code string)
      - `` ``` `` (markdown code fences)
    - Implemented the grammar without regex lookahead/lookbehind because the
    API-side Lark regex engine does not support look-around.
    - Added a unit test to validate the grammar shape and guard against
    reintroducing unsupported lookaround.
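
To illustrate which payload shapes are targeted, here is a plain prefix check in Rust. It is not the Lark grammar itself, only a sketch of the same first-token rule; `looks_like_wrapped_payload` is a hypothetical name, and treating leading whitespace as insignificant is an assumption.

```rust
/// Return true when the input starts with one of the wrapper prefixes the
/// grammar is meant to reject: a JSON object, a quoted string, or a
/// markdown code fence.
pub fn looks_like_wrapped_payload(input: &str) -> bool {
    // Skip leading whitespace to find the first significant character.
    let body = input.trim_start();
    // Three backticks, built at runtime to keep this example fence-safe.
    let fence = "`".repeat(3);
    body.starts_with('{') || body.starts_with('"') || body.starts_with(fence.as_str())
}
```

As with the grammar, this check also flags valid JavaScript that begins with a top-level block or a quoted directive such as `"use strict";`.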
    
    ## Why
    
    `js_repl` is a freeform tool, but the model sometimes emits wrapped
    payloads (JSON, quoted strings, markdown fences) instead of raw
    JavaScript. We already reject those at runtime, but this change moves
    the constraint into the tool grammar so the model is less likely to
    generate invalid tool-call payloads in the first place.
    
    ## Testing
    
    - `cargo test -p codex-core
    js_repl_freeform_grammar_blocks_common_non_js_prefixes`
    - `cargo test -p codex-core parse_freeform_args_rejects_`
    
    ## Notes
    
    - This intentionally over-blocks a few uncommon valid JS starts (for
    example top-level `{ ... }` blocks or top-level quoted directives like
    `"use strict";`) in exchange for preventing the common wrapped-payload
    mistakes.
    
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/12300
    -  `2` https://github.com/openai/codex/pull/12275
    -  `3` https://github.com/openai/codex/pull/12205
    -  `4` https://github.com/openai/codex/pull/12185
    -  `5` https://github.com/openai/codex/pull/10673
  • Refactor network approvals to host/protocol/port scope (#12140)
    ## Summary
    Simplify network approvals by removing per-attempt proxy correlation and
    moving to session-level approval dedupe keyed by (host, protocol, port).
    Instead of encoding attempt IDs into proxy credentials/URLs, we now
    treat approvals as a destination policy decision.
    
    - Concurrent calls to the same destination share one approval prompt.
    - Different destinations (or same host on different ports) get separate
    prompts.
    - `Allow once` approves only the current queued request group.
    - `Allow for session` caches that (host, protocol, port) and auto-allows
    future matching requests.
    - The `Never` policy continues to deny without prompting.
    
    Example:
    - 3 calls:
      - a.com:443
      - b.com:443
      - a.com:443
    => 2 prompts total (a, b); the second a.com call waits on the first decision.
    - a.com:80 is treated separately from a.com:443
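
The dedupe rule in the example above can be sketched as a set keyed by (host, protocol, port). `ApprovalKey`, `key`, and `prompts_needed` are hypothetical names for illustration; the actual implementation in `codex-core` also has to park concurrent requests until the shared decision arrives, which this sketch leaves out.

```rust
use std::collections::HashSet;

/// Hypothetical approval key: one prompt per distinct destination.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ApprovalKey {
    pub host: String,
    pub protocol: String,
    pub port: u16,
}

/// Convenience constructor for the sketch.
pub fn key(host: &str, protocol: &str, port: u16) -> ApprovalKey {
    ApprovalKey { host: host.to_string(), protocol: protocol.to_string(), port }
}

/// Return the distinct destinations that would each trigger one prompt,
/// in first-seen order. Repeated destinations share the earlier prompt.
pub fn prompts_needed(requests: &[ApprovalKey]) -> Vec<ApprovalKey> {
    let mut seen = HashSet::new();
    let mut prompts = Vec::new();
    for req in requests {
        // `insert` returns false when the key was already present.
        if seen.insert(req.clone()) {
            prompts.push(req.clone());
        }
    }
    prompts
}
```

For the three calls in the example, `prompts_needed` yields two entries (a.com and b.com on port 443); adding a.com on port 80 yields a third.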
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core tools::network_approval::tests`
    - `cargo test -p codex-core` (unit tests pass; existing
    integration-suite failures remain in this environment)
  • app-server: add JSON tracing logs (#12287)
    - add `LOG_FORMAT=json` support for app-server tracing logs via
    `tracing_subscriber`'s built-in JSON formatter
    - keep the default human-readable format unchanged and keep `RUST_LOG`
    filtering behavior
    - document the env var and update lockfile
  • Reuse connection between turns (#12294)
    Add a pool of one to the model client to reuse connections across turns.