Commit Graph

6657 Commits

  • [codex] Honor role-defined spawn service tiers (#22169)
    ## Why
    Custom agent roles are ordinary config layers, so a role file can
    already express `service_tier` just like other config values. The
    spawned-agent tier path needs to preserve that effective role config and
    follow the same precedence pattern as model/reasoning.
    
    ## What changed
    - Apply an explicit spawn-time `service_tier` onto the child config
    before role application, so a role config layer can override it just
    like role-defined model/reasoning settings do.
    - Validate the final effective child tier after the final child model is
    known, while still falling back to the parent tier when no child tier
    survives.
    - Add focused integration coverage for both v1 and v2 proving role TOML
    loads a service tier, spawned children keep that role-configured tier,
    and a role tier wins over a conflicting spawn-time tier.
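
    The precedence described above can be sketched roughly as follows. This
    is a minimal illustration, not the actual implementation: `RoleLayer`,
    `resolve_child_tier`, and the `is_valid` callback are hypothetical names,
    and falling back to the parent tier when validation fails is an assumption
    drawn from the description.

    ```rust
    /// Hypothetical role layer: a role file may or may not set `service_tier`.
    struct RoleLayer {
        service_tier: Option<String>,
    }

    /// Resolve the tier a spawned child ends up with.
    /// `is_valid` stands in for validation against the final child model.
    fn resolve_child_tier(
        spawn_tier: Option<&str>,
        role: &RoleLayer,
        parent_tier: Option<&str>,
        is_valid: impl Fn(&str) -> bool,
    ) -> Option<String> {
        // 1. Start from the explicit spawn-time tier, if any.
        let mut effective: Option<String> = spawn_tier.map(str::to_owned);
        // 2. Apply the role layer afterwards so it can override the spawn-time value.
        if let Some(tier) = &role.service_tier {
            effective = Some(tier.clone());
        }
        // 3. Validate the effective tier; fall back to the parent tier if none survives.
        effective
            .filter(|tier| is_valid(tier.as_str()))
            .or_else(|| parent_tier.map(str::to_owned))
    }
    ```

    With this ordering, a role-defined tier wins over a conflicting
    spawn-time tier, and the parent tier is used only when neither source
    yields a valid value.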
    
    ## Validation
    - `just fmt`
    - `git diff --check`
    - Local Rust tests not run, per repo guidance; CI should exercise the
    new coverage.
  • fix: serialize unix app-server startup (#23516)
    # Summary
    
    Unix-socket app-server startup can currently race when multiple launch
    attempts target the same `CODEX_HOME`. Those processes can overlap
    before the control socket exists, which lets them enter SQLite state
    initialization concurrently and reproduce the startup corruption pattern
    seen in SSH mode.
    
    This change makes the app-server own that singleton startup guarantee.
    Unix-socket startup now takes a `CODEX_HOME`-scoped advisory lock before
    SQLite initialization, runs the existing control-socket preparation
    check while holding that lock, returns the established `AddrInUse` error
    when another live listener already owns the socket, and releases the
    lock once the new listener has bound its socket.
    
    # Design decisions
    
    - The singleton rule lives in `app-server --listen unix://`, not in a
    desktop-only caller path, so every Unix-socket launch gets the same race
    protection.
    - A duplicate raw app-server launch returns an error instead of silently
    succeeding. The attach operation remains `app-server proxy`, which
    continues to connect to an already-running listener.
    - The lock is held only across the dangerous startup window: socket
    preparation, SQLite initialization, and socket bind. It is not held for
    the app-server lifetime.
    - Listener detection stays in `prepare_control_socket_path(...)`, so the
    preexisting live-listener and stale-socket behavior remains the single
    source of truth.
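
    The lock scope described above can be modeled in-process as a rough
    sketch. Everything here is a stand-in: `Home`, `start_app_server`, and
    `StartError` are hypothetical names, a `Mutex` plays the role of the
    `CODEX_HOME`-scoped advisory lock, and an `AtomicBool` plays the role of
    the bound control socket.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Mutex;

    #[derive(Debug, PartialEq)]
    enum StartError {
        AddrInUse,
    }

    /// In-process stand-in for the state shared under one `CODEX_HOME`.
    struct Home {
        startup_lock: Mutex<()>,    // stands in for the advisory lock
        listener_bound: AtomicBool, // stands in for the control socket
    }

    impl Home {
        fn new() -> Self {
            Home {
                startup_lock: Mutex::new(()),
                listener_bound: AtomicBool::new(false),
            }
        }
    }

    fn start_app_server(home: &Home) -> Result<(), StartError> {
        // Take the lock before any state initialization.
        let guard = home.startup_lock.lock().unwrap();
        // Socket preparation: a live listener means this launch is a duplicate.
        // The early return drops the guard automatically.
        if home.listener_bound.load(Ordering::SeqCst) {
            return Err(StartError::AddrInUse);
        }
        // State initialization would run here, still under the lock.
        // Bind the socket, then release the lock; it is not held while serving.
        home.listener_bound.store(true, Ordering::SeqCst);
        drop(guard);
        Ok(())
    }
    ```

    A second call against the same `Home` returns `AddrInUse` rather than
    silently succeeding, which mirrors the duplicate-launch rule above.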
    
    # Testing
    
    Tests: targeted Unix-socket transport tests on the branch checkout, full
    `codex-cli` build on `efrazer-db10`, and an SSH-style smoke on
    `efrazer-db10` covering concurrent app-server starts, explicit
    duplicate-start errors, and absence of SQLite startup-error matches in
    launch logs.
  • Split plugin install discovery into list and request tools (#23372)
    ## Summary
    - Add `list_available_plugins_to_install` as the inventory step for
    plugin and connector install suggestions.
    - Slim `request_plugin_install` so it only handles the actual
    elicitation, instead of carrying the full discoverable list in its
    prompt.
    - Emit send-time telemetry when an install elicitation is dispatched,
    including requested tool identity in the event payload.
    - Emit install-result telemetry through `SessionTelemetry`, including
    tool type, user response action, and completion status.
    - Update registration and tests to cover the new two-step flow while
    keeping the existing `tool_suggest` feature gate unchanged.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core request_plugin_install`
    - `cargo test -p codex-core list_available_plugins_to_install`
    - `cargo test -p codex-core
    install_suggestion_tools_can_be_registered_without_search_tool`
    - `cargo test -p codex-otel
    manager_records_plugin_install_suggestion_metric`
    - `cargo test -p codex-otel
    manager_records_plugin_install_elicitation_sent_metric`
    - `just fix -p codex-core`
    - `just fix -p codex-tools`
    - `just fix -p codex-otel`
    - `cargo check -p codex-core`
  • Route local-only app-server gating through processors (#23551)
    ## Summary
    - move local-only app-server gating out of `MessageProcessor`
    - let `fs/*`, `command/exec`, and `process/spawn` resolve local
    availability inside their owning processors
    - keep `fs/*` mounted for the future environment-param path while
    preserving current no-local error behavior
    
    ## Validation
    - not run locally per Codex repo guidance
  • Fix empty rollout path app-server handling (#23400)
    ## Summary
    - Coerce `path: ""` to `None` at the v2 protocol params deserialization
    boundary for `thread/resume` and `thread/fork`.
    - Restore the pre-ThreadStore running-thread resume behavior: if
    `threadId` is already running, rejoin it by id and treat a non-empty
    `path` only as a consistency check; otherwise cold resume keeps
    `history > path > threadId` precedence.
    - Add protocol, resume, and fork regression coverage for empty path
    payloads; refresh app-server schema fixtures for the clarified params
    docs.
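
    As a rough sketch of the two rules above (empty path coercion and
    cold-resume precedence), assuming simplified stand-in types:
    `ResumeParams`, `ResumeSource`, `normalize_path`, and
    `cold_resume_source` are hypothetical names, and the real protocol types
    carry more fields.

    ```rust
    /// Hypothetical stand-in for the resume parameters.
    #[derive(Debug, Default)]
    struct ResumeParams {
        history: Option<Vec<String>>,
        path: Option<String>,
        thread_id: Option<String>,
    }

    #[derive(Debug, PartialEq)]
    enum ResumeSource {
        History(Vec<String>),
        Path(String),
        ThreadId(String),
    }

    /// Treat `path: ""` the same as an absent path.
    fn normalize_path(path: Option<String>) -> Option<String> {
        path.filter(|p| !p.is_empty())
    }

    /// Cold-resume precedence: history > path > threadId.
    fn cold_resume_source(params: ResumeParams) -> Option<ResumeSource> {
        if let Some(history) = params.history {
            return Some(ResumeSource::History(history));
        }
        if let Some(path) = normalize_path(params.path) {
            return Some(ResumeSource::Path(path));
        }
        params.thread_id.map(ResumeSource::ThreadId)
    }
    ```

    In this sketch an empty `path` no longer shadows `threadId`, because it
    is dropped before the precedence check runs.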
    
    ## Tests
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol
    thread_path_params_deserialize_empty_path_as_none`
    - `cargo test -p codex-app-server-protocol --test schema_fixtures`
    - `cargo test -p codex-app-server empty_path`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-app-server --test all
    thread_resume_rejects_mismatched_path_for_running_thread_id`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-app-server --test all
    thread_resume_uses_path_over_non_running_thread_id`
  • fix(tui): preserve modified enter in plan questions (#23536)
    ## Why
    
    Plan mode questionnaires reuse the shared composer for free-form
    answers, but the surrounding `request_user_input` overlay still treated
    every `KeyCode::Enter` as “advance to the next question.” That made
    `Shift+Enter` insert a newline in the composer and then immediately
    advance the questionnaire anyway.
    
    Fixes #23448.
    
    ## What Changed
    
    - pass the live `RuntimeKeymap` into `RequestUserInputOverlay` so its
    embedded composer honors existing `/keymap` composer/editor remaps
    - advance free-form questions only on the configured composer submit
    binding, instead of any Enter-shaped key event
    - add regressions for `Shift+Enter` newline behavior and configured
    composer submit bindings inside the questionnaire UI
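
    The change in key handling can be illustrated with a small sketch. The
    types below (`KeyPress`, `Key`, `Action`, `handle_key`) are hypothetical
    and much simpler than the real `RuntimeKeymap` and terminal key events;
    the point is only that the overlay compares against the configured
    binding rather than matching any Enter key.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum Key {
        Enter,
        Char(char),
    }

    /// Simplified key event; real terminal libraries carry more detail.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    struct KeyPress {
        key: Key,
        shift: bool,
    }

    #[derive(Debug, PartialEq)]
    enum Action {
        AdvanceQuestion,
        ForwardToComposer,
    }

    /// Advance only when the event equals the configured submit binding;
    /// everything else (including Shift+Enter) goes to the composer.
    fn handle_key(event: KeyPress, submit_binding: KeyPress) -> Action {
        if event == submit_binding {
            Action::AdvanceQuestion
        } else {
            Action::ForwardToComposer
        }
    }
    ```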
    
    ## How to Test
    
    1. Start Codex in Plan mode and trigger a `request_user_input`
    questionnaire with a free-form answer field.
    2. Focus the free-form field, type a line, then press `Shift+Enter`.
    3. Confirm the answer gains a newline and the questionnaire stays on the
    same question.
    4. Press the configured submit binding, or plain `Enter` with the
    default keymap, and confirm the questionnaire advances as before.
    
    Targeted tests:
    - `cargo test -p codex-tui
    bottom_pane::request_user_input::tests::freeform_ -- --nocapture`
    
    ## Notes
    
    - `cargo test -p codex-tui` still reaches an unrelated existing stack
    overflow in
    `app::tests::discard_side_thread_removes_agent_navigation_entry` on this
    checkout.
    - `just argument-comment-lint` is locally blocked by Bazel analysis
    failing in external `compiler-rt` before the lint runs.
  • Refactor exec-server websocket pump (#23327)
    ## Why
    Exec-server websocket handling had separate reader and writer tasks for
    the same socket. That made websocket control-frame handling asymmetric:
    the task reading frames could observe `Ping`, but the task allowed to
    write frames was elsewhere. This PR moves each physical websocket onto
    one always-running pump so the socket owner can handle application
    frames and websocket control frames together.
    
    ## What changed
    - Refactored direct exec-server websocket connections in `connection.rs`
    to use one task that owns the websocket for outbound JSON-RPC, inbound
    JSON-RPC, periodic keepalive pings, and `Ping` -> `Pong` replies.
    - Refactored relay websocket handling in `relay.rs` the same way for
    both the harness-side logical connection and the multiplexed executor
    physical socket.
    - Preserved the existing keepalive ownership policy: outbound direct
    websocket clients still send periodic pings, inbound Axum accepts only
    reply with pongs, and relay physical websocket endpoints keep their
    existing periodic pings.
    - Added focused websocket pump tests for ping/pong, binary JSON-RPC,
    relay data, malformed relay text frames, and close/disconnect behavior.
    - Reconnect behavior is intentionally left for a follow-up.
    
    ## Validation
    - Devbox Bazel focused unit target:
      `//codex-rs/exec-server:exec-server-unit-tests
      --test_filter='websocket_connection_|harness_connection_|multiplexed_executor_'`
  • Make local environment optional in EnvironmentManager (#23369)
    ## Summary
    - make `EnvironmentManager` local environment/runtime paths optional
    - simplify constructor surface around snapshot materialization
    - rename local env accessors to `require_local_environment` /
    `try_local_environment`
    
    ## Validation
    - devbox Bazel build for touched crate surfaces
    - `//codex-rs/exec-server:exec-server-unit-tests`
    - `//codex-rs/app-server-client:app-server-client-unit-tests`
    - filtered touched `//codex-rs/core:core-unit-tests` cases
  • build: add Codex package builder (#23513)
    ## Why
    
    Codex CLI packaging is currently split across npm staging, standalone
    installers, and release bundle creation, which makes it hard to define
    and validate a single valid package directory. This adds the first
    standalone package builder so later release paths can converge on the
    same canonical layout.
    
    ## What changed
    
    - Added `scripts/build_codex_package.py` as the stable executable
    wrapper around `scripts/codex_package`.
    - Added modules for CLI parsing, target metadata, grouped cargo builds,
    package layout validation, and archive writing.
    - The builder creates a package directory with `codex-package.json`,
    `bin/`, `codex-resources/`, and `codex-path`, and can serialize it as
    `.tar.gz`, `.tar.zst`, or `.zip`.
    - Source-built artifacts are built by one grouped `cargo build`: `codex`
    for all targets, `bwrap` for Linux, and the Windows sandbox helpers for
    Windows. `rg` remains an input because it is vendored from upstream
    rather than built from this repo.
    - Added `scripts/codex_package/README.md` to document the package
    layout, source-built artifacts, and cargo profile behavior.
    
    ## Verification
    
    - Ran wrapper/module syntax compilation.
    - Ran `scripts/build_codex_package.py --help` from `/private/tmp`.
    - Ran fake-cargo package/archive builds for macOS, Linux, and Windows
    target layouts, including an assertion that generated tar archives
    contain no duplicate member names.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23513).
    * #23526
    * __->__ #23513
  • Add SubagentStart hook (#22782)
    # What
    
    `SubagentStart` runs once when Codex creates a thread-spawned subagent,
    before that child sends its first model request. Thread-spawned
    subagents use `SubagentStart` instead of the normal root-agent
    `SessionStart` hook.
    
    Configured handlers match on the subagent `agent_type`, using the same
    value passed to `spawn_agent`. When no agent type is specified, Codex
    uses the default agent type.
    
    Hook input includes the normal session-start fields plus:
    
    - `agent_id`: the child thread id.
    - `agent_type`: the resolved subagent type.
    
    `SubagentStart` may return `hookSpecificOutput.additionalContext`. That
    context is added to the child conversation before the first model
    request.
    
    # Lifecycle Scope
    
    Only thread-spawned subagents run `SubagentStart`.
    
    Internal/system subagents such as Review, Compact, MemoryConsolidation,
    and Other do not run normal `SessionStart` hooks and do not run
    `SubagentStart`. This avoids exposing synthetic matcher labels for
    internal implementation paths.
    
    Also, the `SessionStart` hook no longer fires for subagents; this matches
    the behavior of other coding agents' implementations.
    
    # Stack
    
    1. This PR: add `SubagentStart`.
    2. #22873: add `SubagentStop`.
    3. #22882: add subagent identity to normal hook inputs.
  • Harden CLI rate limit window labels (#22929)
    ## Context
    
    The CLI rate-limit surfaces previously described usage windows as fixed
    5-hour and weekly limits. We want the CLI to display whatever supported
    rate-limit period the server returns instead of assuming a 5-hour/1-week
    pair. This supports generalized Codex rate-limit periods.
    
    ## Summary
    
    - Formats CLI rate-limit warning/status labels only for the supported
    returned window durations: approximate 5h, daily, weekly, monthly, and
    annual.
    - Uses generic fallback copy when a primary or secondary window has no
    duration, so missing secondary protection data does not produce stale
    weekly copy.
    - Uses generic fallback copy for unsupported window durations instead of
    adding arbitrary hourly, multi-day, multi-week, or multi-year labels.
    - Updates status line and terminal title setup descriptions/previews to
    talk about primary/secondary usage limits rather than fixed 5h/weekly
    limits.
    - Adds rendered insta snapshot coverage for the updated rate-limit
    status surfaces and `/status` fallback labels.
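
    The labeling rule above could look roughly like this. It is a sketch
    under assumptions: `window_label` is a hypothetical name, the label
    strings are placeholders, and durations are matched exactly in minutes,
    whereas the description implies the 5h window is matched approximately.

    ```rust
    /// Map a window length in minutes to a display label.
    fn window_label(window_minutes: Option<u64>) -> &'static str {
        const HOUR: u64 = 60;
        const DAY: u64 = 24 * HOUR;
        match window_minutes {
            Some(m) if m == 5 * HOUR => "5h",
            Some(m) if m == DAY => "daily",
            Some(m) if m == 7 * DAY => "weekly",
            Some(m) if m == 30 * DAY => "monthly",
            Some(m) if m == 365 * DAY => "annual",
            // Missing or unsupported durations use generic fallback copy
            // instead of inventing hourly or multi-day labels.
            _ => "usage",
        }
    }
    ```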
    
    ## Tests
    Tested locally:
    - one primary window
    - one secondary window
    - primary and secondary window
  • Make deny canonical for filesystem permission entries (#23493)
    ## Why
    Filesystem permission profiles used `none` for deny-read entries, which
    is less direct than the action the entry actually represents. This
    change makes `deny` the canonical filesystem permission spelling while
    preserving compatibility for older configs that still send `none`.
    
    ## What changed
    - rename `FileSystemAccessMode::None` to `Deny`
    - serialize and generate schemas with `deny` as the canonical value
    - retain `none` only as a legacy input alias for temporary config
    compatibility
    - update filesystem glob diagnostics and regression coverage to use the
    canonical spelling
    - refresh config and app-server schema fixtures to match the new wire
    shape
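
    A minimal sketch of the alias behavior, without the serialization
    framework the real code uses: only `FileSystemAccessMode::Deny` and the
    `deny`/`none` spellings come from this change; the `Read` and `Write`
    variants and their spellings are assumptions added for illustration.

    ```rust
    use std::fmt;
    use std::str::FromStr;

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum FileSystemAccessMode {
        Read,  // assumed variant
        Write, // assumed variant
        Deny,
    }

    impl FromStr for FileSystemAccessMode {
        type Err = String;

        fn from_str(value: &str) -> Result<Self, Self::Err> {
            match value {
                "read" => Ok(Self::Read),
                "write" => Ok(Self::Write),
                // `none` is accepted only as a legacy input alias.
                "deny" | "none" => Ok(Self::Deny),
                other => Err(format!("unknown access mode: {other}")),
            }
        }
    }

    impl fmt::Display for FileSystemAccessMode {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // Output always uses the canonical spelling.
            let text = match self {
                Self::Read => "read",
                Self::Write => "write",
                Self::Deny => "deny",
            };
            f.write_str(text)
        }
    }
    ```

    Parsing `none` and writing the value back out yields `deny`, so older
    configs keep loading while new output uses the canonical value.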
    
    ## Validation
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core config_toml_deserializes_permission_profiles
    --lib`
    - `cargo test -p codex-core
    read_write_glob_patterns_still_reject_non_subpath_globs --lib`
    
    Earlier in the session, a broad `cargo test -p codex-core` run reached
    unrelated pre-existing failures in timing/snapshot/git-info tests under
    this environment; the targeted surfaces touched by this PR passed
    cleanly.
  • chore: namespace v1 sub-agent tools (#23475)
    ## Why
    
    The v1 sub-agent tools are a single tool family, but they were exposed
    as separate flat function tools. This makes the model-visible surface
    less clearly grouped and leaves the legacy names in the same flat
    namespace as newer agent tooling.
    
    ## What
    
    - Wraps the v1 `spawn_agent`, `send_input`, `resume_agent`,
    `wait_agent`, and `close_agent` specs in the `multi_agent_v1` namespace.
    - Registers the corresponding handlers with namespaced runtime tool
    names.
    - Updates tool-planning, deferred tool search, and sub-agent
    notification tests to assert the namespace shape and child `spawn_agent`
    lookup.
    
    ## Verification
    
    - Updated `codex-core` coverage for the v1 multi-agent tool plan,
    deferred tool search output, and sub-agent tool descriptions.
  • [codex] Make contextual user fragments dyn-renderable (#23397)
    ## Why
    `ContextualUserFragment` needs to be usable behind `dyn` for render-only
    paths, but associated constants made the trait non-object-safe.
    
    ## What changed
    - Replaced associated constants with trait methods so `dyn
    ContextualUserFragment` can render fragments.
    - Preserved the existing typed `T::matches_text(text)` registration
    pattern via `type_markers()`.
    - Kept default `render()` on the main trait so implementations only
    provide role, markers, and body.
    - Added unit coverage for rendering a `Box<dyn ContextualUserFragment>`.
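
    To illustrate why the switch matters: a trait with associated constants
    cannot be used as `dyn Trait`, while the same values exposed through
    `&self` methods can. The sketch below is not the real trait; the method
    names, the marker strings, and `EnvironmentNote` are hypothetical.

    ```rust
    /// Dyn-compatible shape: role and markers are methods, not constants.
    trait ContextualUserFragment {
        fn role(&self) -> &'static str;
        fn markers(&self) -> (&'static str, &'static str);
        fn body(&self) -> String;

        /// Default rendering shared by all implementations.
        fn render(&self) -> String {
            let (open, close) = self.markers();
            format!("{}{}{}", open, self.body(), close)
        }
    }

    /// Hypothetical fragment used only for this example.
    struct EnvironmentNote {
        cwd: String,
    }

    impl ContextualUserFragment for EnvironmentNote {
        fn role(&self) -> &'static str {
            "user"
        }

        fn markers(&self) -> (&'static str, &'static str) {
            ("<note>", "</note>")
        }

        fn body(&self) -> String {
            format!("cwd={}", self.cwd)
        }
    }

    /// Render-only path that works on trait objects.
    fn render_all(fragments: &[Box<dyn ContextualUserFragment>]) -> Vec<String> {
        fragments.iter().map(|fragment| fragment.render()).collect()
    }
    ```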
    
    ## Verification
    - `cargo test -p codex-core contextual_user_fragment_is_dyn_compatible`
    - `just fix -p codex-core`
  • [2 of 4] tui: route app and skill enablement through app server (#22914)
    ## Why
    App and skill toggles are user config mutations too. When the TUI is
    attached to a remote app server, writing those toggles into the local
    `config.toml` makes the UI report success without updating the server
    that actually owns the session.
    
    This is **[2 of 4]** in a stacked series that moves TUI-owned config
    mutations onto app-server APIs.
    
    ## What changed
    - Routed app enable/disable persistence through app-server config batch
    writes.
    - Routed skill enable/disable persistence through `skills/config/write`.
    - Avoided refreshing local config from disk after these writes when the
    TUI is connected to a remote app server.
    
    ## Config keys affected
    - `apps.<app_id>.enabled`
    - `apps.<app_id>.disabled_reason`
    - `[[skills.config]]` entries keyed by `path`, with `enabled = false`
    used for persisted disables
    
    ## Suggested manual validation
    - Connect the TUI to a remote app server, disable an app, reconnect, and
    confirm the app remains disabled from remote config rather than local
    disk state.
    - Re-enable the same app and confirm both `apps.<app_id>.enabled` and
    `apps.<app_id>.disabled_reason` are cleared remotely.
    - Disable a skill in the manage-skills UI and confirm a remote
    `[[skills.config]]` disable entry appears.
    - Re-enable that skill and confirm the disable entry is removed and the
    effective enabled state updates without relying on local config reloads.
    
    ## Stack
    1. [#22913](https://github.com/openai/codex/pull/22913) `[1 of 4]`
    primary settings writes
    2. [#22914](https://github.com/openai/codex/pull/22914) `[2 of 4]` app
    and skill enablement
    3. [#22915](https://github.com/openai/codex/pull/22915) `[3 of 4]`
    feature and memory toggles
    4. [#22916](https://github.com/openai/codex/pull/22916) `[4 of 4]`
    startup and onboarding bookkeeping
  • [codex] Preserve steer input as user input (#23405)
    ## Why
    
    Steered input was queued as a `ResponseInputItem`, then parsed back into
    a user message before recording. That path loses information that only
    exists on `UserInput`, such as UI text elements.
    
    This change keeps turn-local pending input typed as either original
    `UserInput` or existing response items, so steered user input reaches
    user-message recording without being reconstructed from a response item.
    
    ## What changed
    
    - Add `TurnInput` for active-turn pending input.
    - Queue `Session::steer_input` as `TurnInput::UserInput`.
    - Run pending-input hook inspection only for `TurnInput::UserInput`.
    - Process drained pending input item by item: accepted items are
    recorded, blocked items append hook context and are skipped.
    - Remove the pending-input prepend/requeue path.
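
    The item-by-item processing can be sketched as below. This is a
    simplified model: the payloads are plain strings instead of the real
    `UserInput` and response item types, and `process_pending` and
    `hook_allows` are hypothetical names.

    ```rust
    /// Pending input for the active turn, kept typed by origin.
    #[derive(Debug, Clone, PartialEq)]
    enum TurnInput {
        UserInput(String),
        ResponseItem(String),
    }

    /// Drain pending input item by item. `hook_allows` stands in for
    /// pending-input hook inspection, which runs only for user input.
    /// Returns the recorded items and any hook context that was appended.
    fn process_pending(
        pending: Vec<TurnInput>,
        hook_allows: impl Fn(&str) -> bool,
    ) -> (Vec<TurnInput>, Vec<String>) {
        let mut recorded = Vec::new();
        let mut hook_context = Vec::new();
        for item in pending {
            if let TurnInput::UserInput(text) = &item {
                if !hook_allows(text.as_str()) {
                    // Blocked: append hook context and skip recording.
                    hook_context.push(format!("blocked: {}", text));
                    continue;
                }
            }
            // Accepted user input and response items are recorded as-is.
            recorded.push(item);
        }
        (recorded, hook_context)
    }
    ```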
    
    ## Validation
    
    - `just fmt`
    - `just fix -p codex-core`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib
    session::tests::task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input
    -- --nocapture`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib steer_input`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib pending_input`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-core --test all
    pending_input`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-core` (unit tests passed:
    1835 passed, 0 failed, 4 ignored; integration `all` target failed due to
    missing helper binaries such as `codex`/`test_stdio_server` plus
    unrelated MCP/search/code-mode expectations)
  • [codex] Move hook request plumbing into hook runtime (#23388)
    ## Why
    
    `run_turn` was still hand-building hook payloads and lifecycle events
    for a couple of hook paths. Most hook call sites already delegate
    request construction and event emission to `hook_runtime`, which keeps
    turn orchestration focused on model-flow decisions rather than hook
    plumbing.
    
    This also keeps the legacy `after_agent` message extraction next to the
    legacy hook dispatch instead of leaving response-item walking in
    `run_turn`.
    
    ## What changed
    
    - Added `run_stop_hooks` in `hook_runtime` to build `StopRequest`, emit
    preview start events, run the hook, and emit completion events.
    - Added `run_legacy_after_agent_hook` in `hook_runtime` to build and
    dispatch the legacy `AfterAgent` hook payload, including extracting
    input messages from response items.
    - Updated `run_turn` to call the hook runtime helpers and keep only the
    resulting continuation/block/stop decisions inline.
    - Removed the repeated pending session-start hook check from the run
    loop.
    
    ## Validation
    
    - `cargo test -p codex-core hook_runtime`
  • [codex] Allow empty turn/start requests (#23409)
    ## Why
    
    `turn/start` already accepts an input array on the wire, including an
    empty array, but core treated empty input as a no-op before the turn
    could reach the model. App-server clients need to be able to start a
    real turn even when there is no new user message, for example to let the
    model proceed from existing thread context.
    
    ## What changed
    
    - Removed the `run_turn` early return that skipped empty-input turns
    when there was no pending input.
    - Kept empty active-turn steering rejected by moving the `steer_input`
    empty-input check until after core has determined whether there is an
    active regular turn.
    - Empty regular turns now refresh `previous_turn_settings` like other
    regular turns, so follow-up context injection state advances
    consistently.
    - Added an app-server v2 integration test proving `turn/start` with
    `input: []` emits started/completed notifications, sends one Responses
    request, and does not synthesize an empty user message.
    
    ## Validation
    
    - `cargo test -p codex-app-server --test all
    turn_start_with_empty_input_runs_model_request`
  • Defer v1 multi-agent tools behind tool search (#23144)
    Summary: defer v1 multi-agent tools when `tool_search` and namespace tools
    are available; keep concise searchable descriptions and move the v1
    usage guidance into developer instructions; add targeted coverage.
    Testing: tests not run per request; ran `just fmt`.
  • Add body_after_prefix auto-compact token limit scope (#22870)
    ## Why
    
    `model_auto_compact_token_limit` has only been able to budget the full
    active context. That makes it hard to set a small "growth since
    compaction" budget for sessions that preserve a large carried window
    prefix: the preserved prefix can consume the whole budget and force
    immediate repeated compaction.
    
    This PR adds an opt-in `body_after_prefix` scope so callers can apply
    `model_auto_compact_token_limit` to sampled output and later growth
    after the current carried prefix, while still forcing compaction before
    the full model context window is exhausted.
    
    ## What changed
    
    - Adds `AutoCompactTokenLimitScope` with the existing `total` behavior
    as the default and a new `body_after_prefix` mode:
    [`config_types.rs`](https://github.com/openai/codex/blob/973806b1cb35792555bead994cb3ed94656eb171/codex-rs/protocol/src/config_types.rs#L24-L37).
    - Threads `model_auto_compact_token_limit_scope` through config loading,
    `Config`, `core-api`, and app-server v2 schema/TypeScript generation.
    - Records the first observed input-token count for a `body_after_prefix`
    compaction window and uses it as the baseline when deciding whether the
    scoped auto-compaction budget is exhausted:
    [`turn.rs`](https://github.com/openai/codex/blob/973806b1cb35792555bead994cb3ed94656eb171/codex-rs/core/src/session/turn.rs#L743-L781).
    - Keeps a hard context-window cap in `body_after_prefix`, so scoped
    budgeting cannot let the active context overrun the usable window.
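
    The budgeting decision can be sketched as a small function. This is an
    illustration under assumptions, not the code in `turn.rs`:
    `should_compact` and its parameters are hypothetical, the `>=`
    comparisons are a guess at the exact boundary, and applying the hard cap
    in both scopes is a simplification.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AutoCompactTokenLimitScope {
        Total,
        BodyAfterPrefix,
    }

    /// Decide whether to auto-compact.
    /// `baseline` is the first input-token count observed in the current
    /// compaction window, i.e. the size of the carried prefix.
    fn should_compact(
        scope: AutoCompactTokenLimitScope,
        limit: u64,
        context_window: u64,
        baseline: u64,
        active_tokens: u64,
    ) -> bool {
        // Hard cap: never let the active context overrun the usable window.
        if active_tokens >= context_window {
            return true;
        }
        match scope {
            // Existing behavior: budget the full active context.
            AutoCompactTokenLimitScope::Total => active_tokens >= limit,
            // New behavior: budget only the growth after the carried prefix.
            AutoCompactTokenLimitScope::BodyAfterPrefix => {
                active_tokens.saturating_sub(baseline) >= limit
            }
        }
    }
    ```

    With a 50,000-token prefix and a 10,000-token limit, the `total` scope
    compacts immediately, while `body_after_prefix` waits until roughly
    10,000 tokens of growth have accumulated.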
    
    ## Verification
    
    Added compact-suite coverage for the two key behaviors:
    `body_after_prefix` does not re-compact just because the carried prefix
    is larger than the scoped budget, and it still compacts when the total
    active context reaches the configured context window:
    [`compact.rs`](https://github.com/openai/codex/blob/973806b1cb35792555bead994cb3ed94656eb171/codex-rs/core/tests/suite/compact.rs#L3003-L3128).
  • Remove ToolsConfig from tool planning (#22835)
    ## Why
    
    `codex-tools` is meant to hold reusable tool primitives, but
    `ToolsConfig` had become a second copy of core runtime decisions instead
    of a small shared contract. It carried provider capabilities, auth/model
    gates, permission and environment state, web/search/image feature gates,
    multi-agent settings, and goal availability from core into `codex-tools`
    ([definition](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/tools/src/tool_config.rs#L97),
    [stored on each
    `TurnContext`](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/core/src/session/turn_context.rs#L87)).
    Every session/context variant then had to build and mutate that snapshot
    before assembling tools.
    
    This PR removes that master object instead of renaming it. Tool planning
    now reads the live `TurnContext`, where `codex-core` already owns those
    decisions, while `codex-tools` keeps only reusable primitives and a
    generic `ToolSetBuilder`/`ToolSet` accumulator.
    
    ## What Changed
    
    - Removed `ToolsConfig` / `ToolsConfigParams` from `codex-tools`; the
    crate keeps the shared helpers that still belong there, including
    request-user-input mode selection, shell backend/type resolution,
    `UnifiedExecShellMode`, and `ToolEnvironmentMode`.
    - Replaced config-snapshot planning with `ToolRouter::from_turn_context`
    and a `spec_plan` pipeline over `CoreToolPlanContext`, deriving provider
    capabilities, auth gates, model support, feature gates, environment
    count, goal support, multi-agent options, web search, and image
    generation from the authoritative turn state.
    - Added generic `codex_tools::ToolSetBuilder` / `ToolSet`, plus the
    small core adapter needed to accumulate `CoreToolRuntime` values and
    hosted model specs.
    - Added the `tool_family::shell` registration module and moved
    shell/unified-exec/memory accounting call sites to read the narrow
    per-turn fields directly.
    - Narrowed `TurnContext` to the remaining explicit per-turn fields
    needed by planning: `available_models`, `unified_exec_shell_mode`, and
    `goal_tools_supported`.
    - Reworked MCP exposure and tool-search setup so deferred/direct MCP
    behavior is driven by the current turn rather than a precomputed config
    snapshot.
    - Replaced the large expected-spec fixture tests with focused
    behavior-level coverage for shell tools, environments, goal and
    agent-job gates, MCP direct/deferred exposure, tool search,
    request-plugin-install, code mode, multi-agent mode, hosted tools, and
    extension executor dispatch.
    
    ## Verification
    
    - `cargo check -p codex-tools`
    - `cargo check -p codex-core --lib`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core spec_plan --lib`
    - `cargo test -p codex-core router --lib`
  • feat: dedicated goal DB (#23300)
    ## Why
    
    Thread goals are moving toward extension-owned runtime behavior, but
    their persisted state was still stored in the shared state database.
    This makes the goal store harder to isolate and keeps future storage
    splits tied to ad hoc runtime plumbing.
    
    This PR gives goals their own SQLite database while keeping the existing
    `StateRuntime` entry point. The goal is to make this the pattern for
    adding more dedicated runtime databases later.
    
    This also reduces load on the existing DB and reduces contention.
    
    ## Limitation
    Thread preview from goal is not supported anymore. I'm looking into this.
    [EDIT]: solved.
    
    ## What changed
    
    - Added a dedicated `goals_1.sqlite` database with its own
    `goals_migrations` directory.
    - Moved `thread_goals` creation into the goals DB migration set.
    - Dropped the old `thread_goals` table from the main state DB with a
    normal state migration. There is intentionally no backfill for existing
    goal rows.
    - Changed `GoalStore` to be backed only by the goals DB pool.
    - Removed the old goal-write side effect that filled empty
    `threads.preview` values from the goal objective.
    - Added shared runtime DB path metadata so startup, telemetry, `codex
    doctor`, and repair handling can include future DBs without bespoke path
    lists.
    - Updated Bazel compile data so the new goals migration directory is
    available to `sqlx::migrate!`.
    
    ## Verification
    
    - `cargo check --tests -p codex-state -p codex-cli -p codex-core -p
    codex-app-server`
    - `just fix -p codex-state`
    - `just fix -p codex-cli`
    - `just fix -p codex-app-server`
  • Preserve context baselines for full-history agent forks (#23352)
    ## Why
    
    Full-history agent forks should continue from the same prompt prefix as
    the parent. Dropping the stored `TurnContext` baseline forced the child
    to rebuild startup context on its first turn, which can duplicate
    developer instructions and also loses the cache continuity that a
    full-history fork is supposed to preserve.
    
    Truncated forks are different: once we keep only the last N turns, the
    original prompt prefix is no longer intact, so the child must establish
    a fresh context baseline.
    
    ## What changed
    
    - Preserve `RolloutItem::TurnContext` when forking with
    `SpawnAgentForkMode::FullHistory`, and keep dropping it for truncated
    forks:
    https://github.com/openai/codex/blob/4090717d94c1fc7f33c9bd122be133a0c5752052/codex-rs/core/src/agent/control.rs#L98-L126
    and
    https://github.com/openai/codex/blob/4090717d94c1fc7f33c9bd122be133a0c5752052/codex-rs/core/src/agent/control.rs#L399-L401
    - Remove the special-case MultiAgentV2 usage-hint filtering path.
    Full-history fork now preserves the cached developer prefix instead of
    trying to reconstruct part of it.
    - Extend the fork coverage to assert both sides of the contract:
    full-history forks keep the parent reference baseline, while last-N
    forks rebuild context after truncation:
    https://github.com/openai/codex/blob/4090717d94c1fc7f33c9bd122be133a0c5752052/codex-rs/core/src/agent/control_tests.rs#L603-L759
    and
    https://github.com/openai/codex/blob/4090717d94c1fc7f33c9bd122be133a0c5752052/codex-rs/core/src/agent/control_tests.rs#L854-L977
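
    The fork contract above can be sketched as a filter over rollout
    items. This is a minimal illustration, not the code in `control.rs`:
    `Item`, `ForkMode`, and `fork_items` are hypothetical stand-ins for
    `RolloutItem` and `SpawnAgentForkMode`, and each turn is reduced to a
    single labeled item, which is an assumption made for brevity.

    ```rust
    /// Hypothetical, simplified stand-in for `RolloutItem`.
    #[derive(Debug, Clone, PartialEq)]
    enum Item {
        /// Stored context baseline (stands in for `RolloutItem::TurnContext`).
        TurnContext(String),
        /// One conversation turn, reduced to a label for illustration.
        Turn(String),
    }

    /// Hypothetical stand-in for `SpawnAgentForkMode`.
    enum ForkMode {
        FullHistory,
        LastNTurns(usize),
    }

    /// Select the parent items a forked child starts from.
    fn fork_items(parent: &[Item], mode: ForkMode) -> Vec<Item> {
        match mode {
            // The prompt prefix is intact: keep everything, baseline included.
            ForkMode::FullHistory => parent.to_vec(),
            // The prefix is no longer intact: drop the baseline and keep
            // only the last `n` turns, so the child rebuilds its context.
            ForkMode::LastNTurns(n) => {
                let turns: Vec<Item> = parent
                    .iter()
                    .filter(|item| matches!(item, Item::Turn(_)))
                    .cloned()
                    .collect();
                let skip = turns.len().saturating_sub(n);
                turns.into_iter().skip(skip).collect()
            }
        }
    }
    ```

    In this sketch a full-history fork returns the parent items unchanged,
    while a last-N fork never contains an `Item::TurnContext`.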
    
    ## Verification
    
    - `cargo test -p codex-core
    spawn_agent_can_fork_parent_thread_history_with_sanitized_items --
    --nocapture`
    - `RUST_MIN_STACK=16777216 cargo test -p codex-core
    spawn_agent_fork_last_n_turns_keeps_only_recent_turns -- --nocapture`
  • core: expose permission profile picker metadata (#22928)
    ## Why
    
    The `/permissions` picker needs a config-level way to distinguish legacy
    anonymous presets from named permission-profile mode. That signal cannot
    be inferred reliably in the TUI, especially for the edge case where
    `default_permissions = ":workspace"` is present without a
    `[permissions]` table.
    
    ## What changed
    
    - Expose whether the merged config is explicitly in permission-profile
    mode.
    - Expose the configured custom permission profile IDs alongside the
    built-in profile semantics.
    - Add regression coverage for profile mode detection and custom profile
    metadata, including the `default_permissions = ":workspace"` case.
    - Update the thread-manager sample config literal to match the expanded
    config shape.
    
    ## Stack
    
    1. **This PR**: config metadata needed by downstream permission-profile
    consumers.
    2. [#22931](https://github.com/openai/codex/pull/22931): refresh active
    permission profiles through runtime/session/network state.
    3. [#21559](https://github.com/openai/codex/pull/21559): switch
    `/permissions` to the profile-aware TUI picker.
    
    ## Verification
    
    - `cargo check -p codex-thread-manager-sample`
    - `cargo test -p codex-core
    default_permissions_can_select_builtin_profile_without_permissions_table`
    - `cargo test -p codex-core
    permissions_profiles_allow_direct_write_roots_outside_workspace_root`
  • Remove explicit connector tool undeferral (#23390)
    ## Summary
    - remove the explicit-connector carveout that kept mentioned app tools
    directly exposed instead of deferred
    - keep the surviving explicit-mention reconstruction only for analytics,
    preserving `codex_app_mentioned` and `codex_app_used.invoke_type`
    - trim the now-unused prompt/tool-exposure plumbing and refresh coverage
    around always-defer behavior
    
    ## Verification
    - `just fmt`
    - `cargo test -p codex-analytics`
    - `cargo test -p codex-core` *(one transient timeout in
    `shell_snapshot::tests::macos_zsh_snapshot_includes_sections`; isolated
    rerun passed)*
    - `cargo test -p codex-core --lib
    shell_snapshot::tests::macos_zsh_snapshot_includes_sections`
    - `cargo test -p codex-core --test all
    explicit_app_mentions_respect_always_defer`
    - `cargo test -p codex-core --lib
    mcp_tool_exposure::tests::always_defer_feature_defers_apps_too`
    - `just fix -p codex-analytics`
    - `just fix -p codex-core`
  • CI: Customize v8 building (#22086)
    ## Summary
    
    Move the rusty_v8 artifact production into a hermetic Bazel path and
    bump the `v8` crate to `147.4.0`.
    
    The new flow builds V8 release artifacts from source for Darwin and
    Linux targets, publishes both the current release-compatible artifacts
    and sandbox-enabled variants, and keeps Cargo consumers on prebuilt
    binaries by continuing to feed the `v8` crate the archive and generated
    binding files it already expects.
    
    ## Why
    
    We need control over V8 build-time features without giving up prebuilt
    artifacts for downstream Cargo builds.
    
    Upstream `rusty_v8` already supports source-only features such as
    `v8_enable_sandbox`, but its normal prebuilt release assets do not cover
    every feature combination we need. Building the artifacts ourselves lets
    us enable settings such as the V8 sandbox and pointer compression at
    artifact build time, then publish those outputs so ordinary Cargo builds
    can still consume prebuilts instead of compiling V8 locally.
    
    This keeps the fast consumer experience of prebuilt `rusty_v8` archives
    while giving us a reproducible path to ship featureful variants that
    upstream does not currently publish for us.
    
    ## Implementation Notes
    
    The Bazel graph in this PR is not copied wholesale from `rusty_v8`;
    `rusty_v8`'s normal source build is still GN/Ninja-based.
    
    Instead, this change starts from upstream V8's Bazel rules and adapts
    them to Codex's hermetic toolchains and dependency layout. Where we
    intentionally follow `rusty_v8`, we mirror its existing artifact
    contract:
    
    - the same `v8` crate version and generated binding expectations
    - the same sandbox feature relationship, where sandboxing requires
    pointer compression
    - the same custom libc++ model expected by Cargo's default
    `use_custom_libcxx` feature
    - the same release-style archive plus `src_binding` outputs consumed by
    the `v8` crate
    
    To preserve that contract, the Bazel release path pins the libc++,
    libc++abi, and llvm-libc revisions used by `rusty_v8 v147.4.0`, builds
    release artifacts with `--config=rusty-v8-upstream-libcxx`, and folds
    the matching runtime objects into the final static archive.
    
    ## Windows
    
    Windows is annoyingly handled differently.
    
    Codex's current hermetic Bazel Windows C++ platform is `windows-gnullvm`
    / `x86_64-w64-windows-gnu`, while upstream `rusty_v8` publishes Windows
    prebuilts for `*-pc-windows-msvc`. Those are different ABIs, so the
    Bazel graph cannot truthfully reproduce the upstream MSVC artifacts
    until we add a real MSVC-targeting C++ toolchain.
    
    For now:
    
    - Windows MSVC consumers continue to use upstream `rusty_v8` release
    archives.
    - Windows GNU targets are built in-tree so they link against a matching
    GNU ABI.
    - The canary workflow separately exercises upstream `rusty_v8` source
    builds for MSVC sandbox artifacts, but MSVC is not yet part of the
    Bazel-produced release matrix.
    
    ## Validation
    This PR is technically self-validating through CI. I have already
    published it as a release tag, so the artifacts from this branch are
    published to
    https://github.com/openai/codex/releases/tag/rusty-v8-v147.4.0. CI for
    this PR should therefore consume our own release targets. I have also
    tested locally on Linux and Darwin.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
    **Stack position:** [5 of 7]
    
    ## Summary
    
    This PR adds `Op::ThreadSettings`, a queued settings-only update
    mechanism for changing stored thread settings without starting a new
    turn. It also removes the legacy `Op::OverrideTurnContext` in the same
    layer, so reviewers can see the replacement and deletion together.
    
    ## Changes
    
    - Add `Op::ThreadSettings` for settings-only queued updates.
    - Emit `ThreadSettingsApplied` with the effective thread settings
    snapshot after core applies an update.
    - Route settings-only updates through the same submission queue as user
    input.
    - Migrate remaining `OverrideTurnContext` tests and callers to the
    queued `Op::ThreadSettings` path.
    - Delete `Op::OverrideTurnContext` from the core protocol and submission
    loop.
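
    The queued behavior described above can be sketched as a single FIFO
    queue that handles both op kinds. This is an illustration under
    assumptions, not the core protocol: the `Overrides` and `Settings`
    fields, the `TurnStarted` event, and the `drain` helper are
    hypothetical; only the `Op::UserInput`, `Op::ThreadSettings`, and
    `ThreadSettingsApplied` names come from this PR.

    ```rust
    use std::collections::VecDeque;

    /// Hypothetical overrides payload; the real fields are not shown here.
    #[derive(Debug, Clone, Default, PartialEq)]
    struct Overrides {
        model: Option<String>,
    }

    /// Hypothetical stored thread settings.
    #[derive(Debug, Clone, PartialEq)]
    struct Settings {
        model: String,
    }

    enum Op {
        /// User input that may also carry settings overrides.
        UserInput { text: String, overrides: Overrides },
        /// Settings-only update; does not start a turn.
        ThreadSettings(Overrides),
    }

    #[derive(Debug, PartialEq)]
    enum Event {
        /// Snapshot of the effective settings after an update is applied.
        ThreadSettingsApplied(Settings),
        /// Hypothetical marker that a turn started with the given model.
        TurnStarted { text: String, model: String },
    }

    fn apply(settings: &mut Settings, overrides: Overrides) {
        if let Some(model) = overrides.model {
            settings.model = model;
        }
    }

    /// Process ops in submission order so settings updates and user input
    /// cannot be reordered relative to each other.
    fn drain(queue: &mut VecDeque<Op>, settings: &mut Settings) -> Vec<Event> {
        let mut events = Vec::new();
        while let Some(op) = queue.pop_front() {
            match op {
                Op::ThreadSettings(overrides) => {
                    apply(settings, overrides);
                    events.push(Event::ThreadSettingsApplied(settings.clone()));
                }
                Op::UserInput { text, overrides } => {
                    apply(settings, overrides);
                    let model = settings.model.clone();
                    events.push(Event::TurnStarted { text, model });
                }
            }
        }
        events
    }
    ```

    Because both ops share one queue in this sketch, a settings update
    submitted before a user input is always visible to the turn that
    input starts.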
    
    This stack addresses #20656 and #22090.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • fix(plugins): keep version upgrades additive (#23356)
    ## Why
    
    Windows can reject plugin cache upgrades when a running MCP server still
    has its working directory inside the currently active plugin version.
    The existing cache refresh path replaces
    `plugins/cache/<marketplace>/<plugin>` as a whole, so a live handle
    under the old version can make an otherwise ordinary version bump fail.
    
    This PR keeps the existing plugin-selection model intact while making
    version bumps less disruptive.
    
    ## What changed
    
    - When installing a new version beside an existing plugin cache root,
    move only the staged version directory into place instead of replacing
    the whole plugin root.
    - Best-effort prune older sibling version directories after the new
    version is activated.
    - Preserve the existing whole-root replacement path for first installs
    and same-version refreshes.
    - Add regression coverage for upgrading from `1.0.0` to `2.0.0` without
    replacing the plugin root.
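
    A minimal sketch of the additive upgrade path, using only the
    standard library. The `activate_version` helper and its parameters
    are hypothetical, and it assumes the staged directory lives on the
    same filesystem as the plugin root so that `fs::rename` can move it;
    the actual cache code may differ.

    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};

    /// Move a staged version directory into an existing plugin root, then
    /// best-effort prune older sibling version directories.
    fn activate_version(
        plugin_root: &Path,
        staged: &Path,
        version: &str,
    ) -> io::Result<PathBuf> {
        let target = plugin_root.join(version);
        // Only the new version directory moves; the plugin root itself is
        // never replaced, so handles under an old version do not block this.
        fs::rename(staged, &target)?;

        // Best-effort prune: any failure (for example a live working
        // directory under the old version) is ignored.
        if let Ok(entries) = fs::read_dir(plugin_root) {
            let siblings: Vec<PathBuf> = entries
                .filter_map(Result::ok)
                .map(|entry| entry.path())
                .filter(|path| path.is_dir() && *path != target)
                .collect();
            for old in siblings {
                let _ = fs::remove_dir_all(&old);
            }
        }
        Ok(target)
    }
    ```

    The design point is that activation succeeds or fails on the rename
    alone, while pruning can be retried later without affecting the newly
    active version.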
    
    ## Verification
    
    - `cargo test -p codex-core-plugins install_with_new_version`
    - `cargo fmt --package codex-core-plugins --check`
  • [codex] Extract turn skill and plugin injections (#23396)
    ## Why
    
    `run_turn` had accumulated the turn-scoped skill, plugin, app, MCP,
    connector-selection, and analytics setup inline. That made the
    orchestration path harder to scan even though the actual turn item
    injection still needs to stay in `run_turn` so ordering is explicit.
    
    ## What changed
    
    This extracts that setup into `build_skills_and_plugins`, which returns
    the combined injection `ResponseItem`s and the explicitly enabled
    connector IDs. `run_turn` now keeps the required orchestration pieces:
    context update recording, user input handling, connector selection
    merge, and the explicit per-item `record_conversation_items` calls for
    injection items.
    
    The refactor keeps the change LOC-neutral in `core/src/session/turn.rs`
    and preserves the existing response-item based injection path.
    
    ## Validation
    
    - `cargo test -p codex-core collect_explicit_app_ids_from_skill_items`
    - `just fix -p codex-core`
  • [3 of 7] Remove UserTurn (#23075)
    **Stack position:** [3 of 7]
    
    ## Summary
    
    This PR finishes the input-op consolidation by moving the remaining
    `Op::UserTurn` callers onto `Op::UserInput` and deleting `Op::UserTurn`.
    This touches a lot of files, but it is a low-risk mechanical migration.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075) (this PR)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • [2 of 7] Remove UserInputWithTurnContext (#23081)
    **Stack position:** [2 of 7]
    
    ## Summary
    
    This PR removes the overlapping `Op::UserInputWithTurnContext` variant
    now that `Op::UserInput` can carry thread settings overrides directly.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    (this PR)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • [1 of 7] Add thread settings to UserInput (#23080)
    **Stack position:** [1 of 7]
    
    ## Summary
    
    The first three PRs in this stack are a cleanup pass before the actual
    thread settings API work.
    
    Today, core has several overlapping "user input" ops: `UserInput`,
    `UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how
    much next-turn state they carry, which makes the later queued thread
    settings update harder to reason about and review.
    
    This PR starts that cleanup by adding the shared
    `ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry
    it. Existing variants remain in place here, so this layer is mostly a
    behavior-preserving API shape change plus mechanical constructor
    updates.
    
    ## End State After PR3
    
    By the end of PR3, `Op::UserInput` is the only "user input" core op. It
    can carry optional thread settings overrides for callers that need to
    update stored defaults with a turn, while callers without updates use
    empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are
    deleted.
    
    ## End State After PR5
    
    By the end of PR5, core will have only two ops for this area:
    
    - `Op::UserInput` for user-input-bearing submissions.
    - `Op::ThreadSettings` for settings-only updates.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080) (this PR)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • Remove ToolSearch feature toggle (#23389)
    ## Summary
    - mark `ToolSearch` as removed and ignore stale config writes for its
    legacy key
    - make search tool exposure depend only on model capability, not a
    feature toggle
    - remove app-server enablement support and prune now-obsolete test
    coverage/setup
    
    ## Verification
    - `cargo test -p codex-features`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core search_tool_requires_model_capability`
    - `cargo test -p codex-app-server experimental_feature_enablement_set_`
    
    ## Notes
    - This keeps the legacy config key as a no-op for compatibility while
    removing the ability to toggle the behavior off cleanly.
    - No developer-facing docs update outside the touched app-server README
    was needed.
  • cleanup: Remove skill env var dependency prompting (#22721)
    Deletes the skill env var dependency prompt feature and its runtime
    path. `env_var` entries in skill dependency metadata are now silently
    ignored during skill loading.
  • [codex] Remove external websocket session resets (#23384)
    ## Why
    
    Compaction now installs replacement history inside the session, but the
    turn and compaction callers were still reaching into
    `ModelClientSession` to reset websocket transport state after that
    install. That made a transport-level reset part of the compaction API
    even though websocket incremental request selection already checks
    whether the next request is a strict extension of the previous one and
    falls back to a full `response.create` when it is not.
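
    The strict-extension check mentioned above can be sketched as a
    prefix comparison. This is an illustration only: `NextRequest` and
    `select_request` are hypothetical names, and input items are modeled
    as plain strings rather than the real request types.

    ```rust
    /// What to send next over the websocket (hypothetical names).
    #[derive(Debug, PartialEq)]
    enum NextRequest<'a> {
        /// Send only the items appended since the previous request.
        Incremental(&'a [String]),
        /// Send the whole input again (a full `response.create`).
        Full(&'a [String]),
    }

    /// Choose an incremental request only when `next` strictly extends
    /// `previous`; otherwise fall back to a full request.
    fn select_request<'a>(previous: &[String], next: &'a [String]) -> NextRequest<'a> {
        let strictly_extends = !previous.is_empty()
            && next.len() > previous.len()
            && next.starts_with(previous);
        if strictly_extends {
            NextRequest::Incremental(&next[previous.len()..])
        } else {
            NextRequest::Full(next)
        }
    }
    ```

    After compaction installs replacement history, the next input no
    longer starts with the previous one, so a check of this shape selects
    the full request without any external reset.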
    
    ## What changed
    
    - Removed the compaction-side calls to `reset_websocket_session` from
    `compact.rs` and `session/turn.rs`.
    - Simplified pre-sampling and mid-turn compaction helpers so they return
    `CodexResult<()>` instead of carrying a reset flag.
    - Made `ModelClientSession::reset_websocket_session` private to
    `client.rs`, leaving only the websocket timeout recovery path inside the
    client as a caller.
    
    ## Validation
    
    - `cargo test -p codex-core --test all
    responses_websocket_creates_on_non_prefix`
    - `cargo test -p codex-core --test all
    steered_user_input_waits_for_model_continuation_after_mid_turn_compact`
    - `cargo test -p codex-core --test all
    pre_sampling_compact_runs_on_switch_to_smaller_context_model`
  • app-server: use profile ids in v2 permission params (#23360)
    ## Why
    
    The v2 app-server permission profile fields are experimental, but the
    previous migration kept a legacy object payload for profile selection.
    That made clients aware of server-owned `activePermissionProfile`
    metadata such as `extends`, and it kept a
    `legacy_additional_writable_roots` path even though
    `runtimeWorkspaceRoots` now owns runtime workspace-root selection.
    
    This PR makes the client contract match the intended model: clients
    select a permission profile by id, and the server resolves and reports
    active profile provenance in response payloads.
    
    Follow-up to #22611.
    
    ## What Changed
    
    - Changed `thread/start`, `thread/resume`, `thread/fork`, and
    `turn/start` permission profile selection to plain profile id strings.
    - Changed `command/exec.permissionProfile` to a plain profile id string
    for the same client/server ownership split.
    - Removed `PermissionProfileSelectionParams` and the legacy `{ type:
    "profile", modifications: [...] }` compatibility deserializer.
    - Updated app-server, TUI, and `codex exec` call sites to send only ids,
    while keeping `activePermissionProfile` as server response metadata.
    - Updated app-server docs and schema fixtures for the revised
    `command/exec.permissionProfile` shape.
    
    ## Verification
    
    - `cargo test -p codex-app-server-protocol`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-app-server`
    - `cargo test -p codex-exec`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-tui`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23360).
    * #23368
    * __->__ #23360
  • [codex-analytics] preserve user thread source for exec threads (#23376)
    ## Why
    - Follows #20949.
    - That change moved `thread_source` attribution from the reducer to
    explicit caller-provided metadata.
    - The `codex exec` path still omitted this metadata, leaving
    exec-created threads without `thread_source`.
    
    
    ## What Changed
    - Ensures exec threads are marked as user created (`thread_source =
    "user"`)
    - Preserves thread-source metadata in exec’s startup session event
    
    
    ## Verification
    - Updated unit tests to validate exec `thread_source` propagation.
    - `cargo +1.93.0 test -p codex-exec --manifest-path codex-rs/Cargo.toml`
    - `cargo +1.93.1 build -p codex-cli --manifest-path codex-rs/Cargo.toml`
    - Validated locally with a freshly built `codex exec` run:
      - Startup logs showed `thread_source: Some(User)`.
      - Rollout metadata recorded `"thread_source":"user"`.
  • fix(tui): warn on unsupported iTerm2 pet versions (#23371)
    ## Why
    
    Older iTerm2 builds can be detected as supporting the image transport
    that terminal pets use, but in practice they fail to render the pet flow
    correctly. Instead of silently attempting image rendering, Codex should
    tell the user that their iTerm2 version is too old and that upgrading is
    the fix.
    
    ## What Changed
    
    - gate iTerm2 pet auto-detection on version `3.6.0` or newer
    - show a dedicated upgrade message for older or unknown iTerm2 versions
    instead of the generic unsupported-terminal warning
    - keep the existing generic unsupported-terminal path for non-iTerm
    terminals
    - add regression coverage for iTerm2 version parsing and the old-iTerm
    warning path
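
    The version gate can be sketched as a parse-and-compare step. This is
    a simplified illustration, not the `codex-terminal-detection` code:
    `PetSupport`, `parse_version`, and `iterm2_pet_support` are
    hypothetical names, and the version string is passed in directly
    rather than read from the terminal.

    ```rust
    /// Outcome of the pet support check (hypothetical names).
    #[derive(Debug, PartialEq)]
    enum PetSupport {
        Supported,
        /// Show the dedicated "upgrade iTerm2" message.
        UpgradeITerm2,
    }

    /// Parse "major.minor" or "major.minor.patch" into a comparable tuple.
    fn parse_version(raw: &str) -> Option<(u32, u32, u32)> {
        let mut parts = raw.trim().split('.');
        let major: u32 = parts.next()?.parse().ok()?;
        let minor: u32 = parts.next()?.parse().ok()?;
        let patch: u32 = match parts.next() {
            Some(p) => p.parse().ok()?,
            None => 0,
        };
        Some((major, minor, patch))
    }

    /// Older and unknown (missing or unparsable) versions both get the
    /// upgrade message; only 3.6.0 or newer is treated as supported.
    fn iterm2_pet_support(version: Option<&str>) -> PetSupport {
        match version.and_then(parse_version) {
            Some(v) if v >= (3, 6, 0) => PetSupport::Supported,
            _ => PetSupport::UpgradeITerm2,
        }
    }
    ```

    Comparing tuples rather than strings keeps `3.10` ordered after `3.6`,
    which a plain string comparison would get wrong.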
    
    ## How to Test
    
    1. Start Codex in iTerm2 3.6 or newer.
    2. Run `/pets`.
    3. Confirm the pets picker opens instead of showing a warning.
    4. Start Codex in an older iTerm2 build, or exercise the equivalent test
    path.
    5. Run `/pets`.
    6. Confirm Codex warns that pets require iTerm2 3.6 or newer and tells
    the user to upgrade.
    7. Also verify that a non-iTerm unsupported terminal still shows the
    generic unsupported-terminal message.
    
    Targeted tests:
    - `cargo test -p codex-terminal-detection`
    - `cargo test -p codex-tui pets::`
    - `cargo test -p codex-tui slash_pets_on_unsupported_terminal`
    - `cargo test -p codex-tui slash_pets_on_old_iterm2`
  • [codex] Move pending input into input queue (#22728)
    ## Why
    
    Pending model input was split across `Session`, `TurnState`, and the
    agent mailbox. That made it easy for new paths to manage queued user
    input or mailbox delivery outside the intended ownership boundary.
    
    This PR consolidates the model-facing input lifecycle behind the session
    input queue so turn-local pending input, next-turn queued items, and
    mailbox delivery coordination are owned in one place.
    
    ## What Changed
    
    - Added `session/input_queue.rs` to own pending input queues and mailbox
    delivery coordination.
    - Removed the standalone `agent/mailbox.rs` channel wrapper and stored
    mailbox items directly in the input queue.
    - Moved pending-input mutations off `TurnState`; for now, `TurnState`
    exposes the queue-owned storage directly.
    - Routed abort cleanup, mailbox delivery phase changes, next-turn queued
    items, and active-turn pending input through `InputQueue`.
    - Boxed stack-heavy agent resume/fork startup futures that the refactor
    pushed over the default test stack.
    - Updated session, task, goal, stream-event, and multi-agent call sites
    and tests to use the new queue ownership.
    
    ## Verification
    
    - `cargo test -p codex-core --lib agent::control::tests`
    - `cargo test -p codex-core --lib
    agent::control::tests::resume_closed_child_reopens_open_descendants --
    --exact`
    - `cargo test -p codex-core --lib
    agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns
    -- --exact`
    - `cargo test -p codex-core --lib
    agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role
    -- --exact`
    - `cargo test -p codex-core` was also run; it completed with 1814
    passed, 4 ignored, and one timeout in
    `agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role`,
    which passed when rerun in isolation.
  • Include plugin id in plugin MCP tool metadata (#23353)
    Add the ID of the plugin that contains the MCP server (if any) so we
    can apply filters at the plugin level.
    
    ## Summary
    - carry the plugin owner into MCP runtime provenance
    - attach `plugin_id` to outbound plugin-backed MCP tool-call `_meta`
    - avoid misattributing user-configured MCP servers that shadow plugin
    server names
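
    The shadowing rule in the last bullet can be sketched as a lookup
    that prefers user-configured servers. This is an illustration under
    assumptions: `ServerSource`, `effective_source`, and
    `plugin_id_for_meta` are hypothetical names, and servers are modeled
    as plain name lists rather than real MCP config.

    ```rust
    /// Where an MCP server definition came from (hypothetical, simplified).
    #[derive(Debug, Clone, PartialEq)]
    enum ServerSource {
        /// Defined by the user in their own config.
        User,
        /// Contributed by the plugin with the given id.
        Plugin { plugin_id: String },
    }

    /// Resolve the effective source for a server name. A user-configured
    /// server with the same name shadows the plugin's server.
    fn effective_source(
        user_servers: &[&str],
        plugin_servers: &[(&str, &str)], // (server name, plugin id)
        name: &str,
    ) -> Option<ServerSource> {
        if user_servers.iter().any(|server| *server == name) {
            return Some(ServerSource::User);
        }
        plugin_servers
            .iter()
            .find(|(server, _)| *server == name)
            .map(|(_, plugin_id)| ServerSource::Plugin {
                plugin_id: plugin_id.to_string(),
            })
    }

    /// Only plugin-backed servers contribute a `plugin_id` to `_meta`.
    fn plugin_id_for_meta(source: &ServerSource) -> Option<&str> {
        match source {
            ServerSource::User => None,
            ServerSource::Plugin { plugin_id } => Some(plugin_id.as_str()),
        }
    }
    ```

    Resolving the source before building `_meta` means a shadowing user
    server never inherits the plugin's id in this sketch.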
    
    ## Testing
    - `just fmt`
    - `just fix -p codex-mcp`
    - `just fix -p codex-core`
    - `cargo test -p codex-mcp`
    - `cargo test -p codex-core
    plugin_mcp_tool_call_request_meta_includes_plugin_id`
    - `cargo test -p codex-core
    to_mcp_config_omits_plugin_id_when_user_server_shadows_plugin_mcp`
    - `cargo test -p codex-core
    rebuild_preserving_session_layers_refreshes_plugin_derived_mcp_config`
    - `git diff --check`
    
    ## Notes
    - Attempted `cargo test -p codex-core`; it aborted in
    `agent::control::tests::resume_agent_from_rollout_skips_descendants_when_parent_resume_fails`
    with a stack overflow before the full suite completed.
  • [codex] Trim unused TurnContextItem fields (#22709)
    ## Why
    
    `TurnContextItem` is the durable baseline used to reconstruct context
    diffs across resume/fork. Most of the old persisted-only fields on it
    are no longer read, so keeping them in rollout snapshots adds schema
    surface and state that can drift without affecting reconstruction.
    
    `summary` is the exception: older Codex versions require it to
    deserialize `turn_context` records, so keep writing a default
    compatibility value until that schema surface can be removed safely.
    
    ## What changed
    
    - Removed the unused persisted fields from `TurnContextItem`: trace ids,
    user/developer instructions, output schema, and truncation policy.
    - Kept `summary` with a compatibility comment and made
    `TurnContext::to_turn_context_item` write `ReasoningSummary::Auto`
    instead of live turn state.
    - Updated rollout/context reconstruction fixtures for the retained
    summary field.
    
    ## Verification
    
    - `cargo test -p codex-protocol --lib turn_context_item`
    - `cargo test -p codex-rollout
    resume_candidate_matches_cwd_reads_latest_turn_context`
    - `cargo test -p codex-state turn_context`
    - `cargo test -p codex-core --lib
    new_default_turn_captures_current_span_trace_id`
    - `cargo test -p codex-core --lib
    record_initial_history_resumed_turn_context_after_compaction_reestablishes_reference_context_item`
    - `cargo test -p codex-core --test all
    emits_warning_when_resumed_model_differs`
    - `git diff --check`
  • Publish Linux runtime wheels with glibc-compatible tags (#21812)
    ## Why
    
    The Python SDK depends on `openai-codex-cli-bin` runtime wheels being
    installable on the Linux hosts our users actually run. The release
    workflow currently tags the Linux runtime artifacts as `musllinux_*`,
    which makes pip ignore them on normal glibc distributions even though
    the bundled Rust executables are intended to run there.
    
    ## What changed
    
    - Tag the Linux runtime wheels as `manylinux_2_17_aarch64` and
    `manylinux_2_17_x86_64` instead of `musllinux_1_1_*`.
    - Keep the existing runtime wheel build and publish flow unchanged
    otherwise.
    
    ## Verification
    
    - Confirmed the wheel-tag issue against the PyPA platform-tag rules for
    `manylinux` vs `musllinux`.
    - This PR is now intentionally scoped to the tag correction only; the
    broader Python runtime release workflow has already landed on `main`
    through the merged stack.
    
    ## Follow-up
    
    After publishing the next alpha from this branch, install the
    SDK/runtime in a fresh glibc Linux environment and confirm pip resolves
    the tagged Linux wheel as expected.
    
    Co-authored-by: Codex <noreply@openai.com>
  • Improve codex remote-control CLI UX (#22878)
    ## Description
    
    This PR makes `codex remote-control` behave like a foreground CLI
    command by default. Running it now starts remote control, waits for
    readiness, prints a clear status message with the machine name, and
    stays alive until Ctrl-C.
    
    Users who want daemon behavior can use `codex remote-control start`, and
    `codex remote-control stop` now prints concise human-readable output.
    `--json` remains available for scripts.
    
    Implementation-wise, this now verifies the real app-server state instead
    of just assuming startup worked. The CLI starts or connects to
    app-server, probes its control socket, calls the `remoteControl/enable`
    API, and waits for the remote-control status response/notification
    before printing success.
    
    For daemon mode, `codex remote-control start` also reports which managed
    app-server binary was used, including its path and best-effort `codex
    --version`, so failures are easier to diagnose.
    
    ## Examples
    
    Example output:
    ```
    > codex remote-control
    Starting app-server with remote control enabled...
    This machine is available for remote control as com-97826.
    Press Ctrl-C to stop.
    ```
    
    Error case using daemon (currently expected based on our publicly
    released CLI version):
    ```
    > ./target/debug/codex remote-control start
    Starting app-server daemon with remote control enabled...
    Error: app server did not become ready on /Users/owen/.codex/app-server-control/app-server-control.sock
    
    Daemon used app-server:
      path: /Users/owen/.codex/packages/standalone/current/codex
      version: 0.130.0
    
    Managed app-server stderr (/Users/owen/.codex/app-server-daemon/app-server.stderr.log):
      error: unexpected argument '--remote-control' found
      
      Usage: codex app-server [OPTIONS] [COMMAND]
      
      For more information, try '--help'.
    
    Caused by:
        0: failed to connect to /Users/owen/.codex/app-server-control/app-server-control.sock
        1: No such file or directory (os error 2)
    ```
    
    ## What changed
    
    - `codex remote-control` now runs remote control in the foreground and
    prints a Ctrl-C stop hint.
    - `codex remote-control start` starts the daemon and waits for remote
    control readiness before reporting success.
    - `codex remote-control stop` reports stopped/not-running status in
    plain language.
    - Startup failures now include recent managed app-server stderr to make
    daemon issues easier to diagnose.
    - Added coverage for CLI output, readiness waiting, foreground shutdown,
    and stderr log tailing.
  • Reduce rust-ci-full Windows nextest timeout flakes (#23253)
    ## Why
    Recent `rust-ci-full` failures were dominated by transient Windows
    timeout clusters in process-heavy tests such as `suite::resume`,
    `suite::cli_stream`, `suite::auth_env`,
    `start_thread_uses_all_default_environments_from_codex_home`, and
    `connect_stdio_command_initializes_json_rpc_client_on_windows`.
    
    The goal here is to make those known flaky paths less likely to fail
    full CI without relaxing the global nextest timeout policy.
    
    ## What changed
    - Enable one global nextest retry with `retries = 1` so a single
    transient failure can recover.
    - Add a `windows_process_heavy` test group with `max-threads = 2` for
    the recurring Windows subprocess/session-heavy timeout families.
    - Add Windows-only slow-timeout overrides for that process-heavy group.
    - Add a narrower Windows-only timeout override for
    `start_thread_uses_all_default_environments_from_codex_home`, which
    still exceeded the broader Windows bucket in both Windows full-CI lanes.
    - Increase the `rust-ci-full` nextest job timeout from `45m` to `60m` so
    Windows ARM64 still has job-level headroom after retries and targeted
    per-test timeout increases.
    - Keep the global `slow-timeout` unchanged at `15s`.
    
    ## Validation
    Validated through `rust-ci-full` GitHub Actions reruns on this PR.
    
    Observed improvement on the tuned Windows lanes:
    - Windows x64 went from `5 timed out` to `0 timed out`.
    - Windows ARM64 went from `2 timed out` to `0 timed out`.
    - `start_thread_uses_all_default_environments_from_codex_home` recovered
    as a flaky pass on Windows ARM64 instead of timing out.
    
    The remaining failing tests in those runs were unrelated hard failures
    outside this nextest timeout tuning.
  • Add tool lifecycle extension contributor (#23309)
    ## Why
    
    Extensions that need to track runtime progress currently have no typed
    host signal for tool execution. The goal extension in particular needs
    to observe tool attempts without inspecting tool payloads, owning tool
    implementations, or staying coupled to core-only runtime plumbing.
    
    This adds a narrow lifecycle contributor API for host-owned tool
    execution: extensions can observe when an accepted tool call starts and
    how it finishes, while policy hooks and tool handlers continue to own
    payload rewriting, blocking, and execution.
    
    Relevant code:
    
    -
    [`ToolLifecycleContributor`](https://github.com/openai/codex/blob/3ad2850ffc7d8a1da19c65a92425637a59098f1b/codex-rs/ext/extension-api/src/contributors.rs#L119)
    defines the extension-facing observer contract.
    -
    [`tool_lifecycle.rs`](https://github.com/openai/codex/blob/3ad2850ffc7d8a1da19c65a92425637a59098f1b/codex-rs/ext/extension-api/src/contributors/tool_lifecycle.rs)
    defines the typed start/finish inputs, source, and outcome enums.
    - [`notify_tool_start` /
    `notify_tool_finish`](https://github.com/openai/codex/blob/3ad2850ffc7d8a1da19c65a92425637a59098f1b/codex-rs/core/src/tools/lifecycle.rs)
    bridge core tool dispatch into the extension registry.
    
    ## What Changed
    
    - Added `ToolLifecycleContributor` to `codex-extension-api`, including:
      - `ToolStartInput`
      - `ToolFinishInput`
      - `ToolCallSource`
      - `ToolCallOutcome`
    - Added registration and lookup support on `ExtensionRegistryBuilder` /
    `ExtensionRegistry`.
    - Wired core tool dispatch to notify lifecycle contributors for:
      - accepted tool starts
      - completed tool calls, including the tool output success marker
      - pre-tool-use blocks
      - failures before or after the handler runs
      - cancellation/abort in the parallel tool path
    - Registered the goal extension as a lifecycle contributor and added the
    outcome filter it will use for goal progress accounting.
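The observer contract described above can be sketched roughly as follows. This is a simplified illustration, not the actual `codex-extension-api` definitions: the type names are borrowed from this PR, but every field, variant, and method name (`on_tool_start`, `on_tool_finish`, `AttemptCounter`, `dispatch`) is an assumption made for the example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical outcome enum; the real `ToolCallOutcome` variants may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ToolCallOutcome {
    Completed { success: bool },
    Blocked,
    Failed,
    Aborted,
}

/// Hypothetical start input: identifies the call, carries no payload.
pub struct ToolStartInput<'a> {
    pub call_id: &'a str,
    pub tool_name: &'a str,
}

/// Hypothetical finish input: identifies the call and how it ended.
pub struct ToolFinishInput<'a> {
    pub call_id: &'a str,
    pub outcome: ToolCallOutcome,
}

/// Observer contract: contributors learn that a call started and how it
/// finished, but they cannot rewrite, block, or execute it.
pub trait ToolLifecycleContributor: Send + Sync {
    fn on_tool_start(&self, input: &ToolStartInput<'_>);
    fn on_tool_finish(&self, input: &ToolFinishInput<'_>);
}

/// Example contributor that counts tool attempts.
#[derive(Default)]
pub struct AttemptCounter {
    started: AtomicU32,
    ran: AtomicU32,
}

impl AttemptCounter {
    pub fn started(&self) -> u32 {
        self.started.load(Ordering::SeqCst)
    }
    pub fn ran(&self) -> u32 {
        self.ran.load(Ordering::SeqCst)
    }
}

impl ToolLifecycleContributor for AttemptCounter {
    fn on_tool_start(&self, _input: &ToolStartInput<'_>) {
        self.started.fetch_add(1, Ordering::SeqCst);
    }

    fn on_tool_finish(&self, input: &ToolFinishInput<'_>) {
        // Assumed outcome filter: count only calls whose handler actually ran.
        if matches!(
            input.outcome,
            ToolCallOutcome::Completed { .. } | ToolCallOutcome::Failed
        ) {
            self.ran.fetch_add(1, Ordering::SeqCst);
        }
    }
}

/// Host-side dispatch: notify contributors before and after the handler.
pub fn dispatch(
    contributors: &[Arc<dyn ToolLifecycleContributor>],
    call_id: &str,
    tool_name: &str,
    handler: impl FnOnce() -> Result<bool, String>,
) -> ToolCallOutcome {
    for c in contributors {
        c.on_tool_start(&ToolStartInput { call_id, tool_name });
    }
    let outcome = match handler() {
        Ok(success) => ToolCallOutcome::Completed { success },
        Err(_) => ToolCallOutcome::Failed,
    };
    for c in contributors {
        c.on_tool_finish(&ToolFinishInput { call_id, outcome: outcome.clone() });
    }
    outcome
}
```

Keeping the contributor read-only is the point of the design: the host owns execution and policy, so an extension such as the goal extension can account for progress without depending on tool payloads.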
    
    ## Test Coverage
    
    - Added `dispatch_notifies_tool_lifecycle_contributors` to cover
    lifecycle notification ordering and outcomes for successful and
    handler-failed tool calls.
  • fix: default unknown tool schemas to empty schemas (#22380)
    ## Why
    
    Some tool providers, especially MCP servers and dynamic tool sources,
    can supply schema nodes that omit `type` and have no recognized JSON
    Schema shape hints. Previously, `sanitize_json_schema` filled those
    unknown nodes in as `string`, which made the schema parseable but
    invented a scalar constraint that the provider did not specify. For
    description-only fields, that could incorrectly steer tool arguments
    away from the provider's actual accepted shape.
    
    The Responses API accepts permissive empty schemas such as `{}` at
    nested property positions, so Codex should preserve that permissive
    meaning instead of coercing unknown schema nodes into a misleading
    scalar type.
    
    ## What Changed
    
    - Changed the no-hints fallback in `codex-rs/tools/src/json_schema.rs`
    to clear unrecognized object schema nodes to `{}`.
    - Empty schemas now remain `{}` rather than becoming `type: "string"`.
    - Description-only or otherwise metadata-only nested property schemas
    now become `{}` while surrounding object/array/string/number inference
    still applies when recognized hints are present.
    - Updated `codex-tools` and `codex-core` tests to cover top-level empty
    schemas, nested empty schemas, metadata-only malformed schemas, dynamic
    tools, and MCP tool specs.
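As a rough illustration of the new fallback, the sketch below sanitizes a schema node using a tiny hand-rolled JSON model so it needs no external crates. It is not the code in `codex-rs/tools/src/json_schema.rs`: the `Json` enum, the `sanitize` and `infer_type` names, and the specific hint keys checked are all assumptions for the example.

```rust
use std::collections::BTreeMap;

/// Minimal JSON model (strings and objects only) to keep the sketch std-only.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Obj(BTreeMap<String, Json>),
}

/// Infer a `type` from shape hints. The hint keys are assumed, not
/// taken from the real sanitizer.
fn infer_type(node: &BTreeMap<String, Json>) -> Option<&'static str> {
    if node.contains_key("properties") {
        Some("object")
    } else if node.contains_key("items") {
        Some("array")
    } else if node.contains_key("enum") || node.contains_key("format") {
        Some("string")
    } else if node.contains_key("minimum") || node.contains_key("maximum") {
        Some("number")
    } else {
        None
    }
}

/// Sanitize one schema node in place.
pub fn sanitize(node: &mut BTreeMap<String, Json>) {
    // Sanitize nested property schemas first.
    if let Some(Json::Obj(props)) = node.get_mut("properties") {
        for child in props.values_mut() {
            if let Json::Obj(child_node) = child {
                sanitize(child_node);
            }
        }
    }
    // An explicit `type` is kept as-is.
    if node.contains_key("type") {
        return;
    }
    match infer_type(node) {
        // Recognized hints: fill in the inferred type.
        Some(t) => {
            node.insert("type".to_string(), Json::Str(t.to_string()));
        }
        // No hints (empty or metadata-only): clear to the permissive `{}`
        // instead of inventing `type: "string"`.
        None => node.clear(),
    }
}
```

Under these assumptions, a description-only property nested inside an object schema becomes `{}`, while the enclosing node still gains `type: "object"` from its `properties` hint.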
    
    ## Verification
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core
    test_mcp_tool_property_missing_type_defaults_to_empty_schema`
    - Manually verified the real Responses API behavior for both
    empty-schema positions:
      - Top-level function `parameters: {}` is accepted and echoed back as
      `{"type":"object","properties":{}}`; when forced to call the tool,
      Responses emitted empty object arguments: `"arguments": "{}"`.
      - Nested property schema `{}` is accepted and preserved as `{}`; when
      forced to call a tool with `metadata.extra`, Responses emitted
      `"arguments": "{\"metadata\":{\"extra\":\"codex schema sanitizer
      behavior\"}}"`.
  • codex: route global AGENTS reads through LOCAL_FS (#23343)
    ## Summary
    - make `load_global_instructions` read through an `ExecutorFileSystem`
    - call global AGENTS reads with explicit `LOCAL_FS` so they stay tied to
    local codex-home state
    
    ## Validation
    - `bazel test --bes_backend= --bes_results_url=
    --test_filter=instruction_sources_include_global_before_agents_md_docs
    //codex-rs/core:core-unit-tests` on `dev`
  • feat(app-server): add optional thread_id to experimentalFeature/list (#23335)
    ## Why
    
    `experimentalFeature/list` reports effective feature enablement, but it
    currently does not resolve that state against a working directory.
    Project-local `config.toml` files can toggle features on or off when
    they are merged into the effective config along with the other config
    layers, so features set in project-local config are effectively (and
    incorrectly) ignored.
    
    To address that, this PR exposes an optional `thread_id` parameter,
    which allows us to load the thread's `cwd`.
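A minimal sketch of the intended resolution, assuming a much simpler config model than the real one: with no `thread_id`, only the global layer applies; with one, the thread's `cwd` selects a project-local layer that overrides it. The `Host` struct, its fields, and `list_features` are hypothetical names used only for illustration.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

type Features = BTreeMap<String, bool>;

/// Hypothetical stand-in for the app-server state: a global feature layer,
/// a thread-to-cwd lookup, and per-directory project-local layers.
pub struct Host {
    pub global: Features,
    pub thread_cwds: BTreeMap<String, PathBuf>,
    pub project_configs: BTreeMap<PathBuf, Features>,
}

impl Host {
    /// Resolve effective features, optionally against a thread's cwd.
    pub fn list_features(&self, thread_id: Option<&str>) -> Features {
        let mut effective = self.global.clone();
        // Without a thread id (or for an unknown thread) there is no cwd,
        // so only the global layer applies.
        let cwd = thread_id.and_then(|id| self.thread_cwds.get(id));
        if let Some(local) = cwd.and_then(|dir| self.project_configs.get(dir)) {
            // Project-local values override the global layer.
            for (name, enabled) in local {
                effective.insert(name.clone(), *enabled);
            }
        }
        effective
    }
}
```

Making the parameter optional keeps existing callers working unchanged, since omitting it yields the same global-only answer as before.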
    
    ## Testing
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server experimental_feature_list`
  • feat(tui): handle paste in session picker (#23338)
    ## Why
    
    The session picker already supports typed search, but it ignored
    bracketed paste events entirely. On macOS terminals this makes pasted
    text look like a no-op on the resume screen, which is especially
    noticeable when a user wants to paste part of a thread name, branch, or
    path into the search field.
    
    ## What Changed
    
    - route `TuiEvent::Paste(String)` into the session picker instead of
    dropping it
    - normalize pasted search text into a single-line query by collapsing
    whitespace
    - ignore whitespace-only pastes
    - reuse the existing `set_query(...)` path so pasted searches keep the
    same filtering and pagination behavior as typed input
    - add focused tests for append behavior, whitespace normalization,
    whitespace-only paste, and the existing search-loading path
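The normalization step above can be sketched like this. It is an illustrative reconstruction, not the code in `codex-tui`: `normalize_pasted_query`, the `SessionPicker` struct, and `handle_paste` are hypothetical names, and appending to the current query is an assumption based on the "append behavior" test mentioned above.

```rust
/// Collapse every run of whitespace (spaces, tabs, newlines) into a single
/// space and trim the ends. Returns `None` for whitespace-only input.
pub fn normalize_pasted_query(pasted: &str) -> Option<String> {
    let normalized = pasted.split_whitespace().collect::<Vec<_>>().join(" ");
    if normalized.is_empty() {
        None
    } else {
        Some(normalized)
    }
}

/// Hypothetical picker state holding only the search query.
#[derive(Default)]
pub struct SessionPicker {
    query: String,
}

impl SessionPicker {
    pub fn query(&self) -> &str {
        &self.query
    }

    /// Single entry point for query changes, shared by typed and pasted input.
    pub fn set_query(&mut self, query: String) {
        self.query = query;
    }

    /// Append normalized pasted text to the query; ignore empty pastes.
    pub fn handle_paste(&mut self, pasted: &str) {
        if let Some(text) = normalize_pasted_query(pasted) {
            let mut next = self.query.clone();
            next.push_str(&text);
            self.set_query(next);
        }
    }
}
```

Routing the paste through the same `set_query` path as typed input is what keeps filtering and pagination behavior identical for both.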
    
    This PR is stacked on top of #23234 and contains only the net change
    relative to `etraut/clarify-resume-hints`.
    
    ## How to Test
    
    1. Start Codex in a terminal that emits bracketed paste, for example
    iTerm2 on macOS.
    2. Open the resume picker so the search UI is visible.
    3. Copy a term that should match one of the visible sessions, then paste
    it into the picker.
    4. Confirm the query updates immediately and the list filters as if the
    text had been typed.
    5. Also verify that pasting text with newlines or tabs still produces a
    usable single-line search query.
    6. Also verify that normal typed search still works and that `Esc` still
    clears the query / exits as before.
    
    Targeted tests:
    - `cargo test -p codex-tui`
    
    ---------
    
    Co-authored-by: Eric Traut <etraut@openai.com>
  • goals: keep pause transitions explicit (#23088)
    ## Problem
    
    This addresses several user-reported cases where active goals were
    paused even though the user had not explicitly asked for that
    transition:
    
    - the guardian approval-review circuit breaker interrupted a turn and
    implicitly paused the goal
    - a shutdown in one app-server instance could pause a goal while a
    second instance was still actively running the same thread
    - steering-style interrupts could also pause the goal even though they
    are meant to redirect work, not stop the goal lifecycle
    
    The common problem was that core treated `TurnAbortReason::Interrupted`
    as an implicit request to transition the persisted goal to `paused`.
    That made unrelated interrupt paths mutate goal state as a side effect,
    and in the multi-app-server case it allowed stale process teardown to
    pause a live goal owned by another running client.
    
    After this change, transitioning a goal to `paused` is always an
    explicit action performed by a client or another intentional goal-state
    mutation. It is never an implicit transition triggered by generic
    interrupt handling.
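The rule can be illustrated with a small state sketch. The types here are simplified stand-ins rather than the real goal runtime: `GoalStatus`, `GoalEvent`, `Goal`, and the `tokens_used` field are hypothetical, and only the behavior (aborts account usage but never change status) is taken from this description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Active,
    Paused,
}

/// Hypothetical events reaching the goal runtime.
pub enum GoalEvent {
    /// A task was aborted (interrupt, steering, shutdown). It carries usage
    /// but no abort reason, because the reason no longer affects goal state.
    TaskAborted { tokens_used: u64 },
    /// An explicit, intentional status change requested by a client.
    SetStatus(GoalStatus),
}

pub struct Goal {
    pub status: GoalStatus,
    pub tokens_used: u64,
}

impl Goal {
    pub fn apply(&mut self, event: GoalEvent) {
        match event {
            // Aborts still account usage but never mutate the status.
            GoalEvent::TaskAborted { tokens_used } => {
                self.tokens_used += tokens_used;
            }
            // Only an explicit request can move the goal to `Paused`.
            GoalEvent::SetStatus(status) => {
                self.status = status;
            }
        }
    }
}
```

With status changes confined to one explicit event, teardown of a stale process can no longer pause a goal that another client is still running.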
    
    Refs #22884.
    
    ## What changed
    
    - Remove the goal runtime path that paused active goals after
    interrupted task aborts.
    - Drop the now-unused abort reason from `GoalRuntimeEvent::TaskAborted`.
    - Update the focused regression coverage so an interrupted active goal
    still accounts usage but remains `active`.