Commit Graph

5277 Commits

  • Handle closed TUI input stream as shutdown (#17430)
    Addresses #17276
    
    Problem: Closing the terminal while the TUI input stream is pending
    could leave the app outside the normal shutdown path, which is risky
    when an approval prompt is active.
    
    Solution: Treat a closed TUI input stream as ShutdownFirst so existing
    thread shutdown behavior cancels pending work and approvals before exit.
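
    A minimal sketch of the mapping described above. Only the
    `ShutdownFirst` name comes from this commit; `InputPoll`, `LoopAction`,
    and `next_action` are hypothetical names used for illustration.

    ```rust
    /// Hypothetical result of polling the TUI input stream.
    #[derive(Debug, PartialEq)]
    enum InputPoll {
        Event(String),
        Closed,
    }

    /// Hypothetical action taken by the app loop.
    #[derive(Debug, PartialEq)]
    enum LoopAction {
        Handle(String),
        ShutdownFirst,
    }

    fn next_action(poll: InputPoll) -> LoopAction {
        match poll {
            InputPoll::Event(ev) => LoopAction::Handle(ev),
            // A closed stream takes the normal shutdown path, so pending
            // work and approvals are cancelled before exit.
            InputPoll::Closed => LoopAction::ShutdownFirst,
        }
    }
    ```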
  • fix(tui): recall accepted slash commands locally (#17336)
    # TL;DR
    
    - Adds recognized slash commands to the TUI's local in-session recall
    history.
    - This is the MVP of the whole feature. It keeps slash-command recall
    local only: nothing is written to persistent history, app-server
    history, or core history storage.
    - Treats slash commands like submitted text once they parse as a known
    built-in command, regardless of whether command dispatch later succeeds.
    
    # Problem
    
    Slash commands are handled outside the normal message submission path,
    so they could clear the composer without becoming part of the local
    Up-arrow recall list. That made command-heavy workflows awkward: after
    running `/diff`, `/rename Better title`, `/plan investigate this`, or
    even a valid command that reports a usage error, users had to retype the
    command instead of recalling and editing it like a normal prompt.
    
    The goal of this PR is to make slash commands feel like submitted input
    inside the current TUI session while keeping the change deliberately
    local. This is not persistent history yet; it only affects the
    composer's in-memory recall behavior.
    
    # Mental model
    
    The composer owns draft state and local recall. When slash input parses
    as a recognized built-in command, the composer stages the submitted
    command text before returning `InputResult::Command` or
    `InputResult::CommandWithArgs`. `ChatWidget` then dispatches the command
    and records the staged entry once dispatch returns to the input-result
    path.
    
    Command-name recognition is the only validation before local recall. A
    valid slash command is recallable whether it succeeds, fails with a
    usage error, no-ops, is unavailable while a task is running, or is
    skipped by command-specific logic. An unrecognized slash command is
    different: it is restored as a draft, surfaces the existing
    unrecognized-command message, and is not added to recall.
    
    Bare commands recalled from typed text use the trimmed submitted draft.
    Commands selected from the popup record the canonical command text, such
    as `/diff`, rather than the partial filter text the user typed. Inline
    commands with arguments keep the original command invocation available
    locally even when their arguments are later prepared through the normal
    submission pipeline.
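
    The stage-then-record flow can be sketched roughly as follows. This is
    a simplified stand-in, not the actual `ChatComposer` code: `Recall`,
    `stage_if_recognized`, and `record_staged` are hypothetical names, and
    the real entry carries more than plain text.

    ```rust
    /// Hypothetical, simplified stand-in for the composer's local recall state.
    #[derive(Default)]
    struct Recall {
        staged: Option<String>,
        history: Vec<String>,
    }

    impl Recall {
        /// Stage the trimmed command text if its name is a recognized built-in.
        /// Unrecognized commands are never staged.
        fn stage_if_recognized(&mut self, input: &str, known: &[&str]) -> bool {
            let trimmed = input.trim();
            let name = trimmed
                .strip_prefix('/')
                .and_then(|rest| rest.split_whitespace().next());
            match name {
                Some(n) if known.contains(&n) => {
                    self.staged = Some(trimmed.to_string());
                    true
                }
                _ => false,
            }
        }

        /// Record the staged entry once dispatch returns, whatever the outcome.
        fn record_staged(&mut self) {
            if let Some(entry) = self.staged.take() {
                self.history.push(entry);
            }
        }
    }
    ```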
    
    # Non-goals
    
    Persisting slash commands across sessions is intentionally out of scope.
    This change does not modify app-server history, core history storage,
    protocol events, or message submission semantics.
    
    This does not change command availability, command side effects, popup
    filtering, command parsing, or the semantics of unsupported commands. It
    only changes whether recognized slash-command invocations are available
    through local Up-arrow recall after the user submits them.
    
    # Tradeoffs
    
    The main tradeoff is that recall is based on command recognition, not
    command outcome. This intentionally favors a simpler user model: if the
    TUI accepted the input as a slash command, the user can recall and edit
    that input just like plain text. That means valid-but-unsuccessful
    invocations such as usage errors are recallable, which is useful when
    the next action is usually to edit and retry.
    
    The previous accept/reject design required command dispatch to report a
    boolean outcome, which made the dispatcher API noisier and forced every
    branch to decide history behavior. This version keeps the dispatch APIs
    as side-effect-only methods and localizes history recording to the
    slash-command input path.
    
    Inline command handling still avoids double-recording by preparing
    inline arguments without using the normal message-submission history
    path. The staged slash-command entry remains the single local recall
    record for the command invocation.
    
    # Architecture
    
    `ChatComposer` stages a pending `HistoryEntry` when recognized
    slash-command input is promoted into an input result. The pending entry
    mirrors the existing local history payload shape so recall can restore
    text elements, local images, remote images, mention bindings, and
    pending paste state when those are present.
    
    `BottomPane` exposes a narrow method for recording that staged command
    entry because it owns the composer. `ChatWidget` records the staged
    entry after dispatching a recognized command from the input-result
    match. Valid commands rejected before they reach `ChatWidget`, such as
    commands unavailable while a task is running, are staged and recorded in
    the composer path that detects the rejection.
    
    Slash-command dispatch itself now lives in
    `chatwidget/slash_dispatch.rs` so the behavior is reviewable without
    adding more weight to `chatwidget.rs`. The extraction is
    behavior-preserving: the dispatch match arms stay intact, while the
    input flow in `chatwidget.rs` remains the single place that connects
    submitted slash-command input to dispatch.
    
    # Observability
    
    There is no new logging because this is a local UI recall behavior and
    the result is directly visible through Up-arrow recall. The practical
    debug path is to trace Enter through
    `ChatComposer::try_dispatch_bare_slash_command`,
    `ChatComposer::try_dispatch_slash_command_with_args`, or popup Enter/Tab
    handling, then confirm the recognized command is staged before dispatch
    and recorded exactly once afterward.
    
    If a valid command unexpectedly does not appear in recall, check whether
    the input path staged slash history before clearing the composer and
    whether it used the `ChatWidget` slash-dispatch wrapper. If an
    unrecognized command unexpectedly appears in recall, check the parser
    branch that should restore the draft instead of staging history.
    
    # Tests
    
    Composer-level tests cover staging and recording for a bare typed slash
    command, a popup-selected command, and an inline command with arguments.
    
    Chat-widget tests cover valid commands being recallable after normal
    dispatch, inline dispatch, usage errors, task-running unavailability,
    no-op stub dispatch, and command-specific skip behavior such as `/init`
    when an instructions file already exists. They also cover the negative
    case: unrecognized slash commands are not added to local recall.
  • Pass turn id with feedback uploads (#17314)
    ## Summary
    - Add an optional `tags` dictionary to feedback upload params.
    - Capture the active app-server turn id in the TUI and submit it as
    `tags.turn_id` with `/feedback` uploads.
    - Merge client-provided feedback tags into Sentry feedback tags while
    preserving reserved system fields like `thread_id`, `classification`,
    `cli_version`, `session_source`, and `reason`.
    
    ## Behavior / impact
    Existing feedback upload callers remain compatible because `tags` is
    optional and nullable. The wire shape is still a normal JSON object /
    TypeScript dictionary, so adding future feedback metadata will not
    require a new top-level protocol field each time. This change only adds
    feedback metadata for Codex CLI/TUI uploads; it does not affect existing
    pipelines, DAGs, exports, or downstream consumers unless they choose to
    read the new `turn_id` feedback tag.
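
    A rough sketch of the merge rule, assuming string-valued tags. The
    reserved keys are the ones listed in the summary; `merge_tags` is a
    hypothetical helper, not the function in `codex-feedback`.

    ```rust
    use std::collections::BTreeMap;

    /// Reserved keys from the summary; client tags must not override them.
    const RESERVED: [&str; 5] = [
        "thread_id",
        "classification",
        "cli_version",
        "session_source",
        "reason",
    ];

    /// Hypothetical merge helper: reserved system fields are preserved,
    /// every other client tag is copied through.
    fn merge_tags(
        system: &BTreeMap<String, String>,
        client: Option<&BTreeMap<String, String>>,
    ) -> BTreeMap<String, String> {
        let mut merged = system.clone();
        if let Some(client) = client {
            for (key, value) in client {
                if !RESERVED.contains(&key.as_str()) {
                    merged.insert(key.clone(), value.clone());
                }
            }
        }
        merged
    }
    ```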
    
    ## Tests
    - `cargo fmt -- --config imports_granularity=Item` passed; stable
    rustfmt warned that `imports_granularity` is nightly-only.
    - `cargo run -p codex-app-server-protocol --bin write_schema_fixtures`
    - `cargo test -p codex-feedback
    upload_tags_include_client_tags_and_preserve_reserved_fields`
    - `cargo test -p codex-app-server-protocol
    schema_fixtures_match_generated`
    - `cargo test -p codex-tui build_feedback_upload_params`
    - `cargo test -p codex-tui
    live_app_server_turn_started_sets_feedback_turn_id`
    - `cargo check -p codex-app-server --tests`
    - `git diff --check`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(devcontainer): add separate secure customer profile (#10431)
    ## Description
    
    Keeps the existing Codex contributor devcontainer in place and adds a
    separate secure profile for customer use.
    
    ## What changed
    
    - leaves `.devcontainer/devcontainer.json` and the contributor
    `Dockerfile` aligned with `main`
    - adds `.devcontainer/devcontainer.secure.json` and
    `.devcontainer/Dockerfile.secure`
    - adds secure-profile bootstrap scripts:
      - `post_install.py`
      - `post-start.sh`
      - `init-firewall.sh`
    - updates `.devcontainer/README.md` to explain when to use each path
    
    ## Secure profile behavior
    
    The new secure profile is opt-in and is meant for running Codex in a
    stricter project container:
    
    - preinstalls the Codex CLI plus common build tools
    - uses persistent volumes for Codex state, Cargo, Rustup, and GitHub
    auth
    - applies an allowlist-driven outbound firewall at startup
    - blocks IPv6 by default so the allowlist cannot be bypassed via AAAA
    routes
    - keeps the stricter networking isolated from the default contributor
    workflow
    
    ## Resulting behavior
    
    - `devcontainer.json` remains the low-friction Codex contributor setup
    - `devcontainer.secure.json` is the customer-facing secure option
    - the repo supports both workflows without forcing the secure profile on
    Codex contributors
  • Fix thread/list cwd filtering for Windows verbatim paths (#17414)
    Addresses #17302
    
    Problem: `thread/list` compared cwd filters with raw path equality, so
    `resume --last` could miss Windows sessions when the saved cwd used a
    verbatim path form and the current cwd did not.
    
    Solution: Normalize cwd comparisons through the existing path comparison
    utilities before falling back to direct equality, and add Windows
    regression coverage for verbatim paths. I made this a general utility
    function and replaced all of the duplicated instances of it across the
    codebase.
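
    To illustrate why raw equality fails, here is a string-based sketch of
    comparing a verbatim path with its plain form. `strip_verbatim` and
    `cwd_matches` are hypothetical; the actual fix goes through the
    existing path comparison utilities.

    ```rust
    /// Strip a Windows verbatim prefix (`\\?\` or `\\?\UNC\`) so both
    /// spellings of the same path compare equal.
    fn strip_verbatim(path: &str) -> String {
        if let Some(rest) = path.strip_prefix(r"\\?\UNC\") {
            // `\\?\UNC\server\share` corresponds to `\\server\share`.
            format!(r"\\{rest}")
        } else if let Some(rest) = path.strip_prefix(r"\\?\") {
            rest.to_string()
        } else {
            path.to_string()
        }
    }

    /// Compare the saved cwd with the current cwd after normalization.
    fn cwd_matches(saved: &str, current: &str) -> bool {
        strip_verbatim(saved) == strip_verbatim(current)
    }
    ```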
  • Stabilize marketplace add local source test (#17424)
    ## Summary
    - Update the marketplace add local-source integration test to pass an
    explicit relative local path.
    - Keep the change test-only; no CLI source parsing behavior changes.
    
    ## Tests
    - cargo fmt -p codex-cli
    - cargo test -p codex-cli --test marketplace_add
    
    ## Impact
    - Production behavior is unchanged.
    - No impact to feedback upload logic, DAGs, exports, or downstream
    pipelines.
    
    Co-authored-by: Codex <noreply@openai.com>
  • [mcp] Support MCP Apps part 3 - Add mcp tool call support. (#17364)
    - [x] Add a new app-server method so that MCP Apps can call their own
    MCP server directly.
  • fix: unblock private DNS in macOS sandbox (#17370)
    ## Summary
    - keep hostname targets proxied by default by removing hostname suffixes
    from the managed `NO_PROXY` value while preserving private/link-local
    CIDRs
    - make the macOS `allow_local_binding` sandbox rules match the local
    socket shape used by DNS tools by allowing wildcard local binds
    - allow raw DNS egress to remote port 53 only when `allow_local_binding`
    is enabled, without opening blanket outbound network access
    
    ## Root cause
    Raw DNS tools do not honor `HTTP_PROXY` or `ALL_PROXY`, so the
    proxy-only Seatbelt policy blocked their resolver traffic before it
    could reach host DNS. In the affected managed config,
    `allow_local_binding = true`, but the existing rule only allowed
    `localhost:*` binds; `dig`/BIND can bind sockets in a way that needs
    wildcard local binding. Separately, hostname suffixes in `NO_PROXY`
    could force internal hostnames to resolve locally instead of through the
    proxy path.
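
    As a sketch of the `NO_PROXY` part only: keep entries that are IP
    addresses or CIDRs and drop hostname suffixes. `strip_hostname_entries`
    is a hypothetical helper and does not check that a CIDR is actually
    private or link-local.

    ```rust
    use std::net::IpAddr;

    /// Keep only IP/CIDR entries from a comma-separated NO_PROXY value,
    /// dropping hostname suffixes such as `.corp.example`.
    fn strip_hostname_entries(no_proxy: &str) -> String {
        no_proxy
            .split(',')
            .map(str::trim)
            .filter(|entry| {
                // Look at the address part before any `/prefix` length.
                let addr = entry.split('/').next().unwrap_or("");
                addr.parse::<IpAddr>().is_ok()
            })
            .collect::<Vec<_>>()
            .join(",")
    }
    ```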
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • TUI: enforce core boundary (#17399)
    Problem: The TUI still depended on `codex-core` directly in a number of
    places, and we had no enforcement to keep this problem from getting
    worse.
    
    Solution: Route TUI core access through
    `codex-app-server-client::legacy_core`, add CI enforcement for that
    boundary, and re-export this legacy bridge inside the TUI as
    `crate::legacy_core` so the remaining call sites stay readable. There is
    no functional change in this PR — just changes to import targets.
    
    Over time, we can whittle away at the remaining symbols in this legacy
    namespace with the eventual goal of removing them all. In the meantime,
    this linter rule will prevent us from inadvertently importing new
    symbols from core.
  • representing guardian review timeouts in protocol types (#17381)
    ## Summary
    
    - Add `TimedOut` to Guardian/review carrier types:
      - `ReviewDecision::TimedOut`
      - `GuardianAssessmentStatus::TimedOut`
      - app-server v2 `GuardianApprovalReviewStatus::TimedOut`
    - Regenerate app-server JSON/TypeScript schemas for the new wire shape.
    - Wire the new status through core/app-server/TUI mappings with
    conservative fail-closed handling.
    - Keep `TimedOut` non-user-selectable in the approval UI.
    
    **Does not change runtime behavior yet; emitting `TimedOut` and
    parent-model timeout messaging will come in follow-up PRs.**
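
    A minimal sketch of what fail-closed handling could look like. Only
    `ReviewDecision::TimedOut` is taken from this PR; the other variants
    and both helper functions are illustrative assumptions.

    ```rust
    /// Simplified stand-in for the review decision type.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ReviewDecision {
        Approved,
        Denied,
        TimedOut,
    }

    /// Fail-closed: anything other than an explicit approval blocks the action.
    fn allows_action(decision: ReviewDecision) -> bool {
        matches!(decision, ReviewDecision::Approved)
    }

    /// `TimedOut` is reported by the system, not offered in the approval UI.
    fn user_selectable(decision: ReviewDecision) -> bool {
        !matches!(decision, ReviewDecision::TimedOut)
    }
    ```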
  • Fix Windows exec-server output test flake (#17409)
    Problem: The Windows exec-server test command could let separator
    whitespace become part of `echo` output, making the exact
    retained-output assertion flaky.
    
    Solution: Tighten the Windows `cmd.exe` command by placing command
    separators directly after the echoed tokens so stdout remains
    deterministic while preserving the exact assertion.
  • Add marketplace command (#17087)
    Added a new top-level `codex marketplace add` command for installing
    plugin marketplaces into Codex’s local marketplace cache.
    
    This change adds source parsing for local directories, GitHub shorthand,
    and git URLs, supports optional `--ref` and git-only `--sparse` checkout
    paths, stages the source in a temp directory, validates the marketplace
    manifest, and installs it under
    `$CODEX_HOME/marketplaces/<marketplace-name>`.
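
    One way to picture the three source kinds is the sketch below. The
    classification rules here are assumptions made for illustration, not
    the parsing rules the CLI actually implements; `Source` and
    `parse_source` are hypothetical names.

    ```rust
    /// Hypothetical classification of a marketplace source argument.
    #[derive(Debug, PartialEq)]
    enum Source {
        LocalDir(String),
        GitHubShorthand { owner: String, repo: String },
        GitUrl(String),
    }

    fn parse_source(arg: &str) -> Source {
        if arg.starts_with("https://") || arg.starts_with("ssh://") || arg.starts_with("git@") {
            return Source::GitUrl(arg.to_string());
        }
        // Explicit relative or absolute paths are treated as local directories.
        if arg.starts_with("./") || arg.starts_with("../") || arg.starts_with('/') {
            return Source::LocalDir(arg.to_string());
        }
        // `owner/repo` with exactly one slash is treated as GitHub shorthand.
        if let Some((owner, repo)) = arg.split_once('/') {
            if !owner.is_empty() && !repo.is_empty() && !repo.contains('/') {
                return Source::GitHubShorthand {
                    owner: owner.to_string(),
                    repo: repo.to_string(),
                };
            }
        }
        Source::LocalDir(arg.to_string())
    }
    ```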
    
    Included tests cover local install behavior in the CLI and marketplace
    discovery from installed roots in core. Scoped formatting and fix passes
    were run, and targeted CLI/core tests passed.
  • feat(analytics): add guardian review event schema (#17055)
    Just the analytics schema definition for guardian evaluations. No wiring
    done yet.
  • fix(permissions): fix symlinked writable roots in sandbox permissions (#15981)
    ## Summary
    - preserve logical symlink paths during permission normalization and
    config cwd handling
    - bind real targets for symlinked readable/writable roots in bwrap and
    remap carveouts and unreadable roots there
    - add regressions for symlinked carveouts and nested symlink escape
    masking
    
    ## Root cause
    Permission normalization canonicalized symlinked writable roots and cwd
    to their real targets too early. That drifted policy checks away from
    the logical paths the sandboxed process can actually address, while
    bwrap still needed the real targets for mounts. The mismatch caused
    shell and apply_patch failures on symlinked writable roots.
    
    ## Impact
    Fixes #15781.
    
    Also fixes #17079:
    - #17079 is the protected symlinked carveout side: bwrap now binds the
    real symlinked writable-root target and remaps carveouts before masking.
    
    Related to #15157:
    - #15157 is the broader permission-check side of this path-identity
    problem. This PR addresses the shared logical-vs-canonical normalization
    issue, but the reported Darwin prompt behavior should be validated
    separately before auto-closing it.
    
    This should also fix #14672, #14694, #14715, and #15725:
    - #14672, #14694, and #14715 are the same Linux
    symlinked-writable-root/bwrap family as #15781.
    - #15725 is the protected symlinked workspace path variant; the PR
    preserves the protected logical path in policy space while bwrap applies
    read-only or unreadable treatment to the resolved target so
    file-vs-directory bind mismatches do not abort sandbox setup.
    
    ## Notes
    - Added Linux-only regressions for symlinked writable ancestors and
    protected symlinked directory targets, including nested symlink escape
    masking without rebinding the escape target writable.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: add pipelined config rpc regression test (#17371)
    ### Summary
    Adds regression coverage for pipelined config RPC reads after writes.
    ### Testing
    These are new tests.
  • Revert "Option to Notify Workspace Owner When Usage Limit is Reached" (#17391)
    Reverts openai/codex#16969
    
    #sev3-2026-04-10-accountscheckversion-500s-for-openai-workspace-7300
  • fix(guardian, app-server): introduce guardian review ids (#17298)
    ## Description
    
    This PR introduces `review_id` as the stable identifier for guardian
    reviews and exposes it in app-server `item/autoApprovalReview/started`
    and `item/autoApprovalReview/completed` events.
    
    Internally, guardian rejection state is now keyed by `review_id` instead
    of the reviewed tool item ID. `target_item_id` is still included when a
    review maps to a concrete thread item, but it is no longer overloaded as
    the review lifecycle identifier.
    
    ## Motivation
    
    We'd like to give users the ability to preempt a guardian review while
    it's running (approve or decline).
    
    However, we can't implement the API that allows the user to override a
    running guardian review because we didn't have a unique `review_id` per
    guardian review. Using `target_item_id` is not correct since:
    - with execve reviews, there can be multiple execve calls (and therefore
    guardian reviews) per shell command
    - with network policy reviews, there is no target item ID
    
    The PR that actually implements user overrides will use `review_id` as
    the stable identifier.
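
    A small sketch of why keying by `review_id` helps: several reviews can
    share one target item, and a review may have no target item at all.
    `Review` and `ReviewState` are hypothetical types, not the ones in this
    PR.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical record for one guardian review.
    #[derive(Debug, Clone, PartialEq)]
    struct Review {
        /// Present only when the review maps to a concrete thread item.
        target_item_id: Option<String>,
        rejected: bool,
    }

    /// Rejection state keyed by `review_id` rather than by tool item ID.
    #[derive(Default)]
    struct ReviewState {
        by_review_id: HashMap<String, Review>,
    }

    impl ReviewState {
        fn start(&mut self, review_id: &str, target_item_id: Option<&str>) {
            self.by_review_id.insert(
                review_id.to_string(),
                Review {
                    target_item_id: target_item_id.map(str::to_string),
                    rejected: false,
                },
            );
        }

        /// Mark one review as rejected; returns false for an unknown id.
        fn reject(&mut self, review_id: &str) -> bool {
            match self.by_review_id.get_mut(review_id) {
                Some(review) => {
                    review.rejected = true;
                    true
                }
                None => false,
            }
        }
    }
    ```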
  • Support clear SessionStart source (#17073)
    ## Motivation
    
    The `SessionStart` hook already receives `startup` and `resume` sources,
    but sessions created from `/clear` previously looked like normal startup
    sessions. This made it impossible for hook authors to distinguish
    between the two with the matcher.
    
    ## Summary
    
    - Add `InitialHistory::Cleared` so `/clear`-created sessions can be
    distinguished from ordinary startup sessions.
    - Add `SessionStartSource::Clear` and wire it through core, app-server
    thread start params, and TUI clear-session flow.
    - Update app-server protocol schemas, generated TypeScript, docs, and
    related tests.
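
    For hook authors, the effect can be sketched as below. The `startup`
    and `resume` strings appear in the motivation; the `clear` string and
    the exact-match `matcher_accepts` helper are assumptions.

    ```rust
    /// The three sources a `SessionStart` hook can now observe.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SessionStartSource {
        Startup,
        Resume,
        Clear,
    }

    impl SessionStartSource {
        /// String a matcher could compare against (`clear` is assumed by analogy).
        fn as_str(self) -> &'static str {
            match self {
                SessionStartSource::Startup => "startup",
                SessionStartSource::Resume => "resume",
                SessionStartSource::Clear => "clear",
            }
        }
    }

    /// Hypothetical exact-match matcher check.
    fn matcher_accepts(matcher: &str, source: SessionStartSource) -> bool {
        matcher == source.as_str()
    }
    ```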
    
    
    https://github.com/user-attachments/assets/9cae3cb4-41c7-4d06-b34f-966252442e5c
  • [codex] Improve hook status rendering (#17266)
    # Motivation
    
    Make hook display less noisy and more useful by keeping transient hook
    activity out of permanent history unless there is useful output,
    preserving visibility for meaningful hook work, and making completed
    hook severity easier to scan.
    
    Also addresses some of the concerns in
    https://github.com/openai/codex/issues/15497
    
    # Changes
    
    ## Demo
    
    
    https://github.com/user-attachments/assets/9d8cebd4-a502-4c95-819c-c806c0731288
    
    Reverse spec for the behavior changes in this branch:
    
    ## Hook Lifecycle Rendering
    - Hook start events no longer write permanent history rows like `Running
    PreToolUse hook`.
    - Running hooks now render in a dedicated live hook area above the
    composer. It's similar to the active cell we use for tool calls, but
    it's a separate lane.
    - Running hook rows use the existing animation setting.
    
    ## Hook Reveal Timing
    - We wait 300ms before showing running hook rows and linger for up to
    600ms once visible.
    - This keeps fast hooks from flashing a transient `Running hook` row
    that disappears before the user can read it.
    - If a fast hook completes with meaningful output, only the completed
    hook result is written to history.
    - If a fast hook completes successfully with no output, it leaves no
    visible trace.
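
    The reveal rules above can be approximated with a small decision
    function. This is a sketch under simplifying assumptions: `Render` and
    `render` are hypothetical, and the 600ms linger is not modelled.

    ```rust
    use std::time::Duration;

    /// Delay before a running hook row is revealed.
    const REVEAL_DELAY: Duration = Duration::from_millis(300);

    #[derive(Debug, PartialEq)]
    enum Render {
        Hidden,
        LiveRow,
        HistoryRow,
    }

    /// Decide how a hook is shown, given how long it has been running.
    fn render(running_for: Duration, completed: bool, has_output: bool) -> Render {
        match (completed, has_output) {
            // Completed hooks reach history only when they produced output.
            (true, true) => Render::HistoryRow,
            (true, false) => Render::Hidden,
            // Running hooks stay hidden until the reveal delay has elapsed.
            (false, _) if running_for >= REVEAL_DELAY => Render::LiveRow,
            (false, _) => Render::Hidden,
        }
    }
    ```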
    
    ## Completed Hook Output
    - Completed hooks with output are sticky, for example `• SessionStart
    hook (completed)`.
    - Hook output entries are rendered under that row with stable prefixes:
    `warning:`, `stop:`, `feedback:`, `hook context:`, and `error:`.
    - Blocked hooks show feedback entries, for example `• PreToolUse hook
    (blocked)` followed by `feedback: ...`.
    - Failed hooks show error entries, for example `• PostToolUse hook
    (failed)` followed by `error: ...`.
    - Stopped hooks show stop entries and remain visually treated as
    non-success.
    
    ## Parallel Hook Behavior
    - Multiple simultaneously running hooks can be tracked in one live hook
    cell.
    - Adjacent running hooks with the same hook event name and same status
    message collapse into a count, for example `• Running 3 PreToolUse
    hooks: checking command policy`.
    - Running hooks with different event names or different status messages
    remain separate rows.
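
    The collapsing rule can be sketched as a single pass over adjacent
    rows. `collapse` is a hypothetical helper working on
    `(event name, status message)` pairs, and the singular row wording is
    an assumption; only the counted form is quoted above.

    ```rust
    /// Collapse adjacent running hooks that share event name and status message.
    fn collapse(hooks: &[(&str, &str)]) -> Vec<String> {
        let mut groups: Vec<(&str, &str, usize)> = Vec::new();
        for &(event, status) in hooks {
            if let Some((e, s, count)) = groups.last_mut() {
                if *e == event && *s == status {
                    *count += 1;
                    continue;
                }
            }
            // Different event name or status message: start a new row.
            groups.push((event, status, 1));
        }
        groups
            .into_iter()
            .map(|(event, status, count)| {
                if count > 1 {
                    format!("Running {count} {event} hooks: {status}")
                } else {
                    format!("Running {event} hook: {status}")
                }
            })
            .collect()
    }
    ```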
    
    ## Hook Run Identity
    - `PreToolUse` and `PostToolUse` hook run IDs now include the tool call
    ID, which prevents concurrent tool-use hooks from sharing a run ID and
    clobbering each other in the UI.
    - This ID scoping applies to tool-use hooks only; other hook event types
    keep their existing run identity behavior.
    
    ## App-Server Hook Notifications
    - App-server `HookStarted` and `HookCompleted` notifications use the
    same live hook rendering path as core hook events.
    - `UserPromptSubmit` hook notifications now render through the same
    completed hook output format, including warning and stop entries.
  • add parent-id to guardian context (#17194)
    Adds the parent Codex session id to the guardian prompt.
  • Add thread title to configurable TUI status line (#17187)
    - Add thread-title as an optional TUI status line item, omitted unless
    the user has set a custom name (`ChatWidget.thread_name`).
    - Refresh the status line when threads are renamed.
    - Add snapshot coverage for renamed-thread footer behavior.
  • [codex-analytics] add compaction analytics event (#17155)
    - event for compaction analytics
    - introduces thread-connection and thread metadata caches for data
    denormalization, expected to be useful for denormalization onto core
    emitted events in general
    - threads analytics event client into core (mirrors approved
    implementation in #16640)
    - denormalizes key thread metadata (thread_source, subagent_source,
    parent_thread_id) as well as app-server client and runtime metadata
    - compaction strategy defaults to memento, forward compatible with
    expected prefill_compaction strategy
    
    1. Manual standalone compact, local
    `INFO | 2026-04-09 17:35:50 | codex_backend.routers.analytics_events |
    analytics_events.track_analytics_events:526 | Tracked
    codex_compaction_event event params={'thread_id':
    '019d74d0-5cfb-70c0-bef9-165c3bf9b2df', 'turn_id':
    '019d74d0-d7f6-7c81-acc6-aae2030243d6', 'product_surface': 'codex',
    'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
    'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
    'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
    '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
    'runtime_arch': 'aarch64'}, 'trigger': 'manual', 'reason':
    'user_requested', 'implementation': 'responses', 'phase':
    'standalone_turn', 'strategy': 'memento', 'status': 'completed',
    'active_context_tokens_before': 20170, 'active_context_tokens_after':
    4830, 'started_at': 1775781337, 'completed_at': 1775781350,
    'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
    None, 'error': None, 'duration_ms': 13524} | `
    
    2. Auto pre-turn compact, local
    `INFO | 2026-04-09 17:37:30 | codex_backend.routers.analytics_events |
    analytics_events.track_analytics_events:526 | Tracked
    codex_compaction_event event params={'thread_id':
    '019d74d2-45ef-71d1-9c93-23cc0c13d988', 'turn_id':
    '019d74d2-7b42-7372-9f0e-c0da3f352328', 'product_surface': 'codex',
    'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
    'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
    'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
    '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
    'runtime_arch': 'aarch64'}, 'trigger': 'auto', 'reason':
    'context_limit', 'implementation': 'responses', 'phase': 'pre_turn',
    'strategy': 'memento', 'status': 'completed',
    'active_context_tokens_before': 20063, 'active_context_tokens_after':
    4822, 'started_at': 1775781444, 'completed_at': 1775781449,
    'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
    None, 'error': None, 'duration_ms': 5497} | `
    
    3. Auto mid-turn compact, local
    `INFO | 2026-04-09 17:38:28 | codex_backend.routers.analytics_events |
    analytics_events.track_analytics_events:526 | Tracked
    codex_compaction_event event params={'thread_id':
    '019d74d3-212f-7a20-8c0a-4816a978675e', 'turn_id':
    '019d74d3-3ee1-7462-89f6-2ffbeefcd5e3', 'product_surface': 'codex',
    'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
    'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
    'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
    '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
    'runtime_arch': 'aarch64'}, 'trigger': 'auto', 'reason':
    'context_limit', 'implementation': 'responses', 'phase': 'mid_turn',
    'strategy': 'memento', 'status': 'completed',
    'active_context_tokens_before': 20325, 'active_context_tokens_after':
    14641, 'started_at': 1775781500, 'completed_at': 1775781508,
    'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
    None, 'error': None, 'duration_ms': 7507} | `
    
    4. Remote /responses/compact, manual standalone
    `INFO | 2026-04-09 17:40:20 | codex_backend.routers.analytics_events |
    analytics_events.track_analytics_events:526 | Tracked
    codex_compaction_event event params={'thread_id':
    '019d74d4-7a11-78a1-89f7-0535a1149416', 'turn_id':
    '019d74d4-e087-7183-9c20-b1e40b7578c0', 'product_surface': 'codex',
    'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
    'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
    'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
    '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
    'runtime_arch': 'aarch64'}, 'trigger': 'manual', 'reason':
    'user_requested', 'implementation': 'responses_compact', 'phase':
    'standalone_turn', 'strategy': 'memento', 'status': 'completed',
    'active_context_tokens_before': 23461, 'active_context_tokens_after':
    6171, 'started_at': 1775781601, 'completed_at': 1775781620,
    'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
    None, 'error': None, 'duration_ms': 18971} | `
  • Strengthen realtime backend delegation prompt (#17363)
    Encourages realtime prompt handling to delegate user requests to the
    backend agent by default when repo inspection, commands, implementation,
    or validation may help.
    
    Co-authored-by: Codex <noreply@openai.com>
  • Queue Realtime V2 response.create while active (#17306)
    Builds on #17264.
    
    - queues Realtime V2 `response.create` while an active response is open,
    then flushes it after `response.done` or `response.cancelled`
    - requests `response.create` after background agent final output and
    steering acknowledgements
    - adds app-server integration coverage for all `response.create` paths
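
    The queueing behavior can be pictured as a tiny state machine. This is
    a sketch, not the app-server implementation: `ResponseQueue` is
    hypothetical, and coalescing several queued requests into one is an
    assumption.

    ```rust
    /// Hypothetical tracker for Realtime V2 `response.create` requests.
    #[derive(Default)]
    struct ResponseQueue {
        active: bool,
        pending: bool,
        sent: u32,
    }

    impl ResponseQueue {
        /// Send immediately when idle, otherwise remember that one is owed.
        fn request_create(&mut self) {
            if self.active {
                self.pending = true;
            } else {
                self.send();
            }
        }

        /// Call on `response.done` or `response.cancelled`.
        fn on_response_finished(&mut self) {
            self.active = false;
            if self.pending {
                self.pending = false;
                self.send();
            }
        }

        fn send(&mut self) {
            self.active = true;
            self.sent += 1;
        }
    }
    ```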
    
    Validation:
    - `just fmt`
    - `cargo check -p codex-app-server --tests`
    - `git diff --check`
    - CI green
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(guardian): send only transcript deltas on guardian followups (#17269)
    ## Description
    
    We reuse a guardian thread for a given user thread when we can. However,
    we had always sent the full transcript history every time we made a
    followup review request to an existing guardian thread.
    
    This is especially bad for long guardian threads since we keep
    re-appending old transcript entries instead of just what has changed.
    The fix is to just send what's new.
    
    **Caveat**: Whenever a thread is compacted or rolled back, we fall back
    to sending the full transcript to guardian again since the thread's
    history has been modified. However in the happy path we get a nice
    optimization.
    
    ## Before
    Initial guardian review sends the full parent transcript:
    
    ```
    The following is the Codex agent history whose request action you are assessing...
    >>> TRANSCRIPT START
    [1] user: Please check the repo visibility and push the docs fix if needed.
    [2] tool gh_repo_view call: {"repo":"openai/codex"}
    [3] tool gh_repo_view result: repo visibility: public
    [4] assistant: The repo is public; I now need approval to push the docs fix.
    >>> TRANSCRIPT END
    The Codex agent has requested the following action:
    >>> APPROVAL REQUEST START
    ...
    >>> APPROVAL REQUEST END
    ```
    
    And a followup to the same guardian thread would send the full
    transcript again (including items 1-4 we already sent):
    ```
    The following is the Codex agent history whose request action you are assessing...
    >>> TRANSCRIPT START
    [1] user: Please check the repo visibility and push the docs fix if needed.
    [2] tool gh_repo_view call: {"repo":"openai/codex"}
    [3] tool gh_repo_view result: repo visibility: public
    [4] assistant: The repo is public; I now need approval to push the docs fix.
    [5] user: Please push the second docs fix too.
    [6] assistant: I need approval for the second docs fix.
    >>> TRANSCRIPT END
    The Codex agent has requested the following action:
    >>> APPROVAL REQUEST START
    ...
    >>> APPROVAL REQUEST END
    ```
    
    ## After
    Initial guardian review sends the full parent transcript (this is
    unchanged):
    
    ```
    The following is the Codex agent history whose request action you are assessing...
    >>> TRANSCRIPT START
    [1] user: Please check the repo visibility and push the docs fix if needed.
    [2] tool gh_repo_view call: {"repo":"openai/codex"}
    [3] tool gh_repo_view result: repo visibility: public
    [4] assistant: The repo is public; I now need approval to push the docs fix.
    >>> TRANSCRIPT END
    The Codex agent has requested the following action:
    >>> APPROVAL REQUEST START
    ...
    >>> APPROVAL REQUEST END
    ```
    
    But a followup now sends:
    ```
    The following is the Codex agent history added since your last approval assessment. Continue the same review conversation...
    >>> TRANSCRIPT DELTA START
    [5] user: Please push the second docs fix too.
    [6] assistant: I need approval for the second docs fix.
    >>> TRANSCRIPT DELTA END
    The Codex agent has requested the following next action:
    >>> APPROVAL REQUEST START
    ...
    >>> APPROVAL REQUEST END
    ```
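    
    For illustration, the choice between the full and delta forms can be
    sketched as below. This is a minimal sketch and not the code from this
    change: `TranscriptItem`, `render_guardian_transcript`, and the
    `already_sent` counter are hypothetical names, and only the marker
    lines are taken from the examples above.
    
    ```rust
    /// One transcript entry, e.g. role "user" with its message text.
    /// Hypothetical type; the real guardian transcript type is not shown here.
    pub struct TranscriptItem {
        pub role: String,
        pub text: String,
    }
    
    /// Renders the transcript portion of a guardian review prompt.
    /// `already_sent` is how many items earlier reviews on this guardian
    /// thread already included; 0 means this is the initial review.
    pub fn render_guardian_transcript(items: &[TranscriptItem], already_sent: usize) -> String {
        let is_followup = already_sent > 0;
        let (start_marker, end_marker) = if is_followup {
            (">>> TRANSCRIPT DELTA START", ">>> TRANSCRIPT DELTA END")
        } else {
            (">>> TRANSCRIPT START", ">>> TRANSCRIPT END")
        };
    
        let mut out = String::new();
        out.push_str(start_marker);
        out.push('\n');
        // Numbering stays absolute, so follow-up entries continue at [5], [6], ...
        for (index, item) in items.iter().enumerate().skip(already_sent) {
            out.push_str(&format!("[{}] {}: {}\n", index + 1, item.role, item.text));
        }
        out.push_str(end_marker);
        out
    }
    ```
    
    Calling it with `already_sent = 4` on the six-item transcript above
    yields only entries `[5]` and `[6]` between the delta markers.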
  • fix: MCP leaks in app-server (#17223)
    The disconnect path now reuses the same teardown flow as explicit
    unsubscribe, and the thread-state bookkeeping consistently reports only
    threads that lost their last subscriber.
    
    https://github.com/openai/codex/issues/16895
  • feat: make rollout recorder reliable against errors (#17214)
    The rollout writer now keeps an owned/monitored task handle, returns
    real Result acks for flush/persist/shutdown, retries failed flushes by
    reopening the rollout file, and keeps buffered items until they are
    successfully written. Session flushes are now real durability barriers
    for fork/rollback/read-after-write paths, while turn completion surfaces
    a warning if the rollout still cannot be saved after recovery.
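    
    The "keep buffered items until they are successfully written" behavior
    can be sketched roughly as follows. This is an illustration under
    assumptions, not the recorder's actual code: `RetainingWriter` and the
    `reopen` hook are hypothetical, and the sketch does not deal with
    partially written items, which a real implementation would need to
    handle.
    
    ```rust
    use std::io::{self, Write};
    
    /// Buffers lines and only drops them once a flush has fully succeeded.
    /// `reopen` is a hypothetical hook that yields a fresh writer, for
    /// example by reopening the rollout file.
    pub struct RetainingWriter<W: Write, F: FnMut() -> io::Result<W>> {
        sink: Option<W>,
        reopen: F,
        pending: Vec<String>,
    }
    
    impl<W: Write, F: FnMut() -> io::Result<W>> RetainingWriter<W, F> {
        pub fn new(reopen: F) -> Self {
            Self { sink: None, reopen, pending: Vec::new() }
        }
    
        pub fn push(&mut self, line: impl Into<String>) {
            self.pending.push(line.into());
        }
    
        pub fn pending_len(&self) -> usize {
            self.pending.len()
        }
    
        /// Tries to write every pending line, retrying once with a reopened
        /// sink. On error the pending lines are kept for a later flush.
        pub fn flush(&mut self) -> io::Result<()> {
            let mut last_err = None;
            for _attempt in 0..2 {
                match self.try_flush_once() {
                    Ok(()) => {
                        self.pending.clear();
                        return Ok(());
                    }
                    Err(err) => {
                        // Drop the sink so the next attempt reopens it.
                        self.sink = None;
                        last_err = Some(err);
                    }
                }
            }
            Err(last_err.expect("loop runs at least once"))
        }
    
        fn try_flush_once(&mut self) -> io::Result<()> {
            if self.sink.is_none() {
                self.sink = Some((self.reopen)()?);
            }
            let sink = self.sink.as_mut().expect("sink was just set");
            for line in &self.pending {
                writeln!(sink, "{}", line)?;
            }
            sink.flush()
        }
    }
    ```
    
    Returning a real `Result` from `flush` is what lets a caller treat it
    as a durability barrier: an `Ok(())` means nothing is left pending.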
  • feat: move exec-server ownership (#16344)
    This introduces session-scoped ownership for exec-server so ws
    disconnects no longer immediately kill running remote exec processes,
    and it prepares the protocol for reconnect-based resume.
    - add session_id / resume_session_id to the exec-server initialize
    handshake
    - move process ownership under a shared session registry
    - detach sessions on websocket disconnect and expire them after a TTL
    instead of killing processes immediately (resume will build on this)
    - allow a new connection to resume an existing session and take over
    notifications/ownership
    - use UUIDs as session ids so they are not predictable, since we don't
    have auth for now
    - make detached-session expiry authoritative at resume time so teardown
    wins at the TTL boundary
    - reject long-poll process/read calls that get resumed out from under an
    older attachment
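    
    A rough sketch of the detach / resume / expiry rules listed above,
    under stated assumptions: `SessionRegistry`, `Attachment`, and
    `ResumeError` are hypothetical names, session ids are taken as opaque
    strings (random UUIDs in the real protocol), and the clock is passed
    in explicitly so the TTL boundary is easy to reason about.
    
    ```rust
    use std::collections::HashMap;
    use std::time::{Duration, Instant};
    
    /// Whether a session currently has a live connection.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Attachment {
        Attached { connection_id: u64 },
        Detached { since: Instant },
    }
    
    #[derive(Debug, PartialEq)]
    pub enum ResumeError {
        UnknownSession,
        Expired,
    }
    
    /// Hypothetical registry that owns sessions independently of connections.
    pub struct SessionRegistry {
        ttl: Duration,
        sessions: HashMap<String, Attachment>,
    }
    
    impl SessionRegistry {
        pub fn new(ttl: Duration) -> Self {
            Self { ttl, sessions: HashMap::new() }
        }
    
        pub fn attach_new(&mut self, session_id: &str, connection_id: u64) {
            self.sessions
                .insert(session_id.to_string(), Attachment::Attached { connection_id });
        }
    
        /// Websocket disconnect: keep the session but start the TTL clock.
        pub fn detach(&mut self, session_id: &str, now: Instant) {
            if let Some(state) = self.sessions.get_mut(session_id) {
                *state = Attachment::Detached { since: now };
            }
        }
    
        /// A new connection takes over the session. Expiry is checked here,
        /// so teardown wins when the TTL has been reached exactly.
        pub fn resume(
            &mut self,
            session_id: &str,
            connection_id: u64,
            now: Instant,
        ) -> Result<(), ResumeError> {
            match self.sessions.get(session_id).copied() {
                None => Err(ResumeError::UnknownSession),
                Some(Attachment::Detached { since })
                    if now.saturating_duration_since(since) >= self.ttl =>
                {
                    self.sessions.remove(session_id);
                    Err(ResumeError::Expired)
                }
                Some(_) => {
                    self.attach_new(session_id, connection_id);
                    Ok(())
                }
            }
        }
    
        /// False for an older attachment that was resumed out from under;
        /// such a caller's pending long-poll should be rejected.
        pub fn is_current_owner(&self, session_id: &str, connection_id: u64) -> bool {
            matches!(
                self.sessions.get(session_id),
                Some(Attachment::Attached { connection_id: id }) if *id == connection_id
            )
        }
    }
    ```
    
    In this sketch a long-poll handler would call `is_current_owner`
    before replying and reject the call when it returns `false`.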
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add output_schema to code mode render (#17210)
    This updates code-mode tool rendering so MCP tools can surface
    structured output types from their `outputSchema`.
    
    What changed:
    - Detect MCP tool-call result wrappers from the output schema shape
    instead of relying on tool-name parsing or provenance flags.
    - Render shared TypeScript aliases once for MCP tool results
    (`CallToolResult`, `ContentBlock`, etc.) so multiple MCP tool
    declarations stay compact.
    - Type `structuredContent` from the tool definition's `outputSchema`
    instead of rendering it as `unknown`.
    - Update the shared MCP aliases to match the MCP draft `CallToolResult`
    schema more closely.
    
    Example:
    - Before: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
    string; message: string; }): Promise<{ _meta?: unknown; content:
    Array<unknown>; isError?: boolean; structuredContent?: unknown; }>; };`
    - After: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
    string; message: string; }): Promise<CallToolResult<{ echo: string; env:
    string | null; }>>; };`
  • Stream Realtime V2 background agent progress (#17264)
    Stream Realtime V2 background agent updates while the background agent
    task is still running, then send the final tool output when it
    completes. User input during an active V2 handoff is acknowledged back
    to realtime as a steering update.
    
    Stack:
    - Depends on #17278 for the background_agent rename.
    - Depends on #17280 for the input task handler refactor.
    
    Coverage:
    - Adds an app-server integration regression test that verifies V2
    progress is sent before the final function-call output.
    
    Validation:
    - just fmt
    - cargo check -p codex-core
    - cargo check -p codex-app-server --tests
    - git diff --check
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • adding parent_thread_id in guardian (#17249)
    ## Summary
    
    This PR adds the parent conversation/session id to the subagent-start
    analytics event for Guardian subagents.
    
    Previously, Guardian sessions were emitted as subagent
    thread-initialized events, but their `parent_thread_id` was serialized
    as `null`. After this change, the `codex_thread_initialized` analytics
    event for a Guardian child session includes the parent user conversation
    id.
  • Extract realtime input task handlers (#17280)
    Refactor the realtime input task select loop into named handlers for
    user text, background agent output, realtime server events, and user
    audio without changing the V2 behavior.
    
    Stack:
    - Depends on #17278 for the background_agent rename.
    
    Validation:
    - just fmt
    - cargo check -p codex-core
    - git diff --check
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Rename Realtime V2 tool to background_agent (#17278)
    Rename the Realtime V2 delegation tool and parser constant to
    background_agent, and update the tool description and fixtures to match.
    
    Validation: just fmt; cargo check -p codex-api; git diff --check
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Option to Notify Workspace Owner When Usage Limit is Reached (#16969)
    ## Summary
    - Replace the manual `/notify-owner` flow with an inline confirmation
    prompt when a usage-based workspace member hits a credits-depleted
    limit.
    - Fetch the current workspace role from the live ChatGPT
    `accounts/check/v4-2023-04-27` endpoint so owner/member behavior matches
    the desktop and web clients.
    - Keep owner, member, and spend-cap messaging distinct so we only offer
    the owner nudge when the workspace is actually out of credits.
    
    ## What Changed
    - `backend-client`
      - Added a typed fetch for the current account role from
        `accounts/check`.
      - Mapped backend role values into a Rust workspace-role enum.
    - `app-server` and protocol
      - Added `workspaceRole` to `account/read` and `account/updated`.
      - Derived `isWorkspaceOwner` from the live role, with a fallback to the
        cached token claim when the role fetch is unavailable.
    - `tui`
      - Removed the explicit `/notify-owner` slash command.
      - When a member is blocked because the workspace is out of credits, the
        error now prompts:
        - `Your workspace is out of credits. Request more from your workspace
          owner? [y/N]`
      - Choosing `y` sends the existing owner-notification request.
      - Choosing `n`, pressing `Esc`, or accepting the default selection
        dismisses the prompt without sending anything.
      - Selection popups now honor explicit item shortcuts, which is how the
        `y` / `n` interaction is wired.
    
    ## Reviewer Notes
    - The main behavior change is scoped to usage-based workspace members
    whose workspace credits are depleted.
    - Spend-cap reached should not show the owner-notification prompt.
    - Owners and admins should continue to see `/usage` guidance instead of
    the member prompt.
    - The live role fetch is best-effort; if it fails, we fall back to the
    existing token-derived ownership signal.
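    
    The gating rules in these notes can be summarized in a small sketch.
    It is illustrative only: `WorkspaceRole`, `LimitKind`, and
    `should_offer_owner_nudge` are hypothetical names, and the actual role
    values returned by the backend are not shown in this description.
    
    ```rust
    /// Hypothetical role enum; the real backend role values may differ.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum WorkspaceRole {
        Owner,
        Admin,
        Member,
    }
    
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum LimitKind {
        CreditsDepleted,
        SpendCapReached,
    }
    
    /// Decides whether to show the "request more from your owner" prompt.
    /// `live_role` is `None` when the best-effort role fetch failed, in
    /// which case the cached token claim is used instead.
    pub fn should_offer_owner_nudge(
        live_role: Option<WorkspaceRole>,
        token_claims_owner: bool,
        limit: LimitKind,
    ) -> bool {
        let is_member = match live_role {
            Some(role) => role == WorkspaceRole::Member,
            None => !token_claims_owner,
        };
        // Only members of a workspace that is out of credits get the prompt.
        is_member && limit == LimitKind::CreditsDepleted
    }
    ```
    
    Keeping the decision in one pure function makes the owner, member, and
    spend-cap cases straightforward to cover with table-style tests.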
    
    ## Testing
    - Manual verification
      - Workspace owner does not see the member prompt.
      - Workspace member with depleted credits sees the confirmation prompt
        and can send the nudge with `y`.
      - Workspace member with spend cap reached does not see the
        owner-notification prompt.
    
    ### Workspace member out of usage
    
    https://github.com/user-attachments/assets/341ac396-eff4-4a7f-bf0c-60660becbea1
    
    ### Workspace owner
    <img width="1728" height="1086" alt="Screenshot 2026-04-09 at 11 48
    22 AM"
    src="https://github.com/user-attachments/assets/06262a45-e3fc-4cc4-8326-1cbedad46ed6"
    />
  • Install rustls provider for remote websocket client (#17288)
    Addresses #17283
    
    Problem: `codex --remote wss://...` could panic because
    app-server-client did not install rustls' process-level crypto provider
    before opening TLS websocket connections.
    
    Solution: Add the existing rustls provider utility dependency and
    install it before the remote websocket connect.
  • Emit live hook prompts before raw-event filtering (#17189)
    # What
    Project raw Stop-hook prompt response items into typed v2 hookPrompt
    item-completed notifications before applying the raw-response-event
    filter. Keep ordinary raw response items filtered for normal
    subscribers; only the existing hookPrompt bridge runs on the filtered
    raw-item path.
    
    # Why
    Blocked Stop hooks record their continuation instruction as a raw
    model-history user item. Normal v2 desktop subscribers do not opt into
    raw response events, so the app-server listener filtered that raw item
    before the existing hookPrompt translator could emit the typed live
    item/completed notification. As a result, the hook-prompt bubble only
    appeared after thread history was reloaded.
  • preserve search results order in tool_search_output (#17263)
    We used to sort tool search results alphabetically as a side effect of
    storing them in a `BTreeMap`, which discarded the actual search result
    ordering.
    
    Now we use a `Vec` to preserve it.
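    
    To make the difference concrete, here is a small standalone sketch
    (the function names and sample tool names are made up for
    illustration): a `BTreeMap` iterates in key order, while a `Vec` keeps
    insertion order, which here is the search ranking.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Collects (tool name, description) hits into a `BTreeMap`, which
    /// iterates in key order and therefore loses the search ranking.
    pub fn names_via_btreemap(hits: &[(&str, &str)]) -> Vec<String> {
        let map: BTreeMap<&str, &str> = hits.iter().copied().collect();
        map.keys().map(|name| name.to_string()).collect()
    }
    
    /// Keeps hits in a `Vec`, preserving the order the search returned.
    pub fn names_via_vec(hits: &[(&str, &str)]) -> Vec<String> {
        hits.iter().map(|(name, _)| name.to_string()).collect()
    }
    ```
    
    With hits ranked `zip_files`, `apply_patch`, `mv`, the first function
    returns them alphabetically and the second returns them as ranked.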
    
    ### Tests
    Updated tests
  • fix: support split carveouts in windows elevated sandbox (#14568)
    ## Summary
    - preserve legacy Windows elevated sandbox behavior for existing
    policies
    - add elevated-only support for split filesystem policies that can be
    represented as readable-root overrides, writable-root overrides, and
    extra deny-write carveouts
    - resolve those elevated filesystem overrides during sandbox transform
    and thread them through setup and policy refresh
    - keep failing closed for explicit unreadable (`none`) carveouts and
    reopened writable descendants under read-only carveouts
    - for explicit read-only-under-writable-root carveouts, materialize
    missing carveout directories during elevated setup before applying the
    deny-write ACL
    - document the elevated vs restricted-token support split in the core
    README
    
    ## Example
    Given a split filesystem policy like:
    
    ```toml
    ":root" = "read"
    ":cwd" = "write"
    "./docs" = "read"
    "C:/scratch" = "write"
    ```
    
    the elevated backend now provisions the readable-root overrides,
    writable-root overrides, and extra deny-write carveouts during setup and
    refresh instead of collapsing back to the legacy workspace-only shape.
    
    If a read-only carveout under a writable root is missing at setup time,
    elevated setup creates that carveout as an empty directory before
    applying its deny-write ACE; otherwise the sandboxed command could
    create it later and bypass the carveout. This is only for explicit
    policy carveouts. Best-effort workspace protections like `.codex/` and
    `.agents/` still skip missing directories.
    
    A policy like:
    
    ```toml
    "/workspace" = "write"
    "/workspace/docs" = "read"
    "/workspace/docs/tmp" = "write"
    ```
    
    still fails closed, because the elevated backend does not reopen
    writable descendants under read-only carveouts yet.
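    
    The two fail-closed cases can be sketched as a standalone check. This
    is an approximation under assumptions, not the elevated backend's real
    logic: `Access`, `Unsupported`, and `check_elevated_support` are
    hypothetical, special tokens such as `:root` and `:cwd` are assumed to
    be resolved to absolute paths already, and a "read-only carveout" is
    taken to mean a `read` entry that sits under some `write` entry.
    
    ```rust
    use std::path::Path;
    
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Access {
        None,
        Read,
        Write,
    }
    
    #[derive(Debug, PartialEq)]
    pub enum Unsupported {
        UnreadableCarveout(String),
        WritableUnderReadOnly { writable: String, read_only_parent: String },
    }
    
    /// Component-wise "is a strict descendant of" check.
    fn strictly_under(child: &str, parent: &str) -> bool {
        let (child, parent) = (Path::new(child), Path::new(parent));
        child != parent && child.starts_with(parent)
    }
    
    /// Returns an error for the policy shapes that still fail closed.
    pub fn check_elevated_support(entries: &[(&str, Access)]) -> Result<(), Unsupported> {
        for (path, access) in entries {
            if *access == Access::None {
                return Err(Unsupported::UnreadableCarveout(path.to_string()));
            }
        }
        for (writable, access) in entries {
            if *access != Access::Write {
                continue;
            }
            for (read_only, read_access) in entries {
                // A read entry is a carveout only if it sits under a writable root.
                let is_carveout = *read_access == Access::Read
                    && entries
                        .iter()
                        .any(|(root, a)| *a == Access::Write && strictly_under(read_only, root));
                if is_carveout && strictly_under(writable, read_only) {
                    return Err(Unsupported::WritableUnderReadOnly {
                        writable: writable.to_string(),
                        read_only_parent: read_only.to_string(),
                    });
                }
            }
        }
        Ok(())
    }
    ```
    
    Under these assumptions the second policy above is rejected because
    `/workspace/docs/tmp` is writable under the read-only `/workspace/docs`.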
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Stop Realtime V2 response.done delegation (#17267)
    Stop parsing Realtime V2 response completion as a Codex handoff;
    delegation stays tied to item completion.
    
    Validation: just fmt; git diff --check
    
    Co-authored-by: Codex <noreply@openai.com>
  • Omit empty app-server instruction overrides (#17258)
    ## Summary
    - omit serialized Responses instructions when an app-server base
    instruction override is empty
    - skip empty developer instruction messages and add v2 coverage for the
    empty-override request shape
    
    ## Validation
    - just fmt
    - git diff --check
  • feat(tui): Ctrl+O copy hotkey and harden copy-as-markdown behavior (#16966)
    ## TL;DR
    
    - New `Ctrl+O` shortcut on top of the existing `/copy` command, allowing
    users to copy the latest agent response without having to cancel a plan
    or type `/copy`
    - Copies to the client's clipboard when running over SSH (via OSC 52)
    - Fixes Linux copy behavior: the clipboard handle has to be kept alive
    until the paste happens for the contents to be preserved
    - Uses arboard as the primary mechanism on Windows, falling back to a
    PowerShell clipboard command
    - Works with resumes, rolling back during a session, etc.
    
    Tested on macOS, Linux/X11, Windows WSL2, Windows cmd.exe, Windows
    PowerShell, Windows VSCode PowerShell, Windows VSCode WSL2, SSH (macOS
    -> macOS).
    
    ## Problem
    
    The TUI's `/copy` command was fragile. It relied on a single
    `last_copyable_output` field that was bluntly cleared on every rollback
    and thread reconfiguration, making copied content unavailable after
    common operations like backtracking. It also had no keyboard shortcut,
    requiring users to type `/copy` each time. The previous clipboard
    backend mixed platform selection policy with low-level I/O in a way that
    was hard to test, and it did not keep the Linux clipboard owner alive —
    meaning pasted content could vanish once the process that wrote it
    dropped its `arboard::Clipboard`.
    
    This addresses the text-copy failure modes reported in #12836, #15452,
    and #15663: native Linux clipboard access failing in remote or
    unreachable-display environments, copy state going blank even after
    visible assistant output, and local Linux X11 reporting success while
    leaving the clipboard empty.
    
    ## Shortcut rationale
    
    The copy hotkey is `Ctrl+O` rather than `Alt+C` because Alt/Option
    combinations are not delivered consistently by macOS terminal emulators.
    Terminal.app and iTerm2 can treat Option as text input or as a
    configurable Meta/Esc prefix, and Option+C may be consumed or
    transformed before the TUI sees an `Alt+C` key event. `Ctrl+O` is a
    stable control-key chord in Terminal.app, iTerm2, SSH, and the existing
    cross-platform terminal stack.
    
    ## Mental model
    
    Agent responses are now tracked as a bounded, ordinal-indexed history
    (`agent_turn_markdowns: Vec<AgentTurnMarkdown>`) rather than a single
    nullable string. Each completed agent turn appends an entry keyed by its
    ordinal (the number of user turns seen so far). Rollbacks pop entries
    whose ordinal exceeds the remaining turn count, then use the visible
    transcript cells as a best-effort fallback if the ordinal history no
    longer has a surviving entry. This means `/copy` and `Ctrl+O` reflect
    the most recent surviving agent response after a backtrack, instead of
    going blank.
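    
    A simplified sketch of that history, for illustration only:
    `AgentTurnMarkdown` and `MAX_AGENT_COPY_HISTORY` are named in this
    description, but the `CopyHistory` wrapper and its method names are
    hypothetical, and the transcript fallback is left out.
    
    ```rust
    /// Cap on retained entries, as described under Tradeoffs.
    pub const MAX_AGENT_COPY_HISTORY: usize = 256;
    
    #[derive(Debug, Clone, PartialEq)]
    pub struct AgentTurnMarkdown {
        pub ordinal: usize,
        pub markdown: String,
    }
    
    /// Hypothetical wrapper around the ordinal-indexed history.
    #[derive(Default)]
    pub struct CopyHistory {
        entries: Vec<AgentTurnMarkdown>,
    }
    
    impl CopyHistory {
        /// Appends the response for `ordinal`, or updates the last entry if
        /// it already belongs to the same turn.
        pub fn record(&mut self, ordinal: usize, markdown: &str) {
            let updates_last = self
                .entries
                .last()
                .map_or(false, |last| last.ordinal == ordinal);
            if updates_last {
                if let Some(last) = self.entries.last_mut() {
                    last.markdown = markdown.to_string();
                }
            } else {
                self.entries.push(AgentTurnMarkdown {
                    ordinal,
                    markdown: markdown.to_string(),
                });
            }
            // Drop the oldest entries once the cap is exceeded.
            if self.entries.len() > MAX_AGENT_COPY_HISTORY {
                let excess = self.entries.len() - MAX_AGENT_COPY_HISTORY;
                self.entries.drain(..excess);
            }
        }
    
        /// Rollback: drop entries whose ordinal exceeds the remaining turns.
        pub fn truncate_to_turn_count(&mut self, remaining_turns: usize) {
            self.entries.retain(|entry| entry.ordinal <= remaining_turns);
        }
    
        pub fn last_markdown(&self) -> Option<&str> {
            self.entries.last().map(|entry| entry.markdown.as_str())
        }
    
        pub fn len(&self) -> usize {
            self.entries.len()
        }
    }
    ```
    
    After recording turns 1 to 3 and rolling back to two turns,
    `last_markdown()` returns the turn-2 response rather than `None`.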
    
    The clipboard backend was rewritten as `clipboard_copy.rs` with a
    strategy-injection design: `copy_to_clipboard_with` accepts closures for
    the OSC 52, arboard, and WSL PowerShell paths, making the selection
    logic fully unit-testable without touching real clipboards. On Linux,
    the `Clipboard` handle is returned as a `ClipboardLease` stored on
    `ChatWidget`, keeping X11/Wayland clipboard ownership alive for the
    lifetime of the TUI. When native copy fails under WSL, the backend now
    tries the Windows clipboard through PowerShell before falling back to
    OSC 52.
    
    ## Non-goals
    
    - This change does not introduce rich-text (HTML) clipboard support; the
    copied content is raw markdown.
    - It does not add a paste-from-history picker or multi-entry clipboard
    ring.
    - WSL support remains a best-effort fallback, not a new configuration
    surface or guarantee for every terminal/host combination.
    
    ## Tradeoffs
    
    - **Bounded history (256 entries)**: `MAX_AGENT_COPY_HISTORY` caps
    memory. For sessions with thousands of turns this silently drops the
    oldest entries. The cap is generous enough for realistic sessions.
    - **`saw_copy_source_this_turn` flag**: Prevents double-recording when
    both `AgentMessage` and `TurnComplete.last_agent_message` fire for the
    same turn. The flag is reset on turn start and on turn complete,
    creating a narrow window where a race between the two events could
    theoretically skip recording. In practice the protocol delivers them
    sequentially.
    - **Transcript fallback on rollback**:
    `last_agent_markdown_from_transcript` walks the visible transcript cells
    to reconstruct plain text when the ordinal history has been fully
    truncated. This path uses `AgentMessageCell::plain_text()` which joins
    rendered spans, so it reconstructs display text rather than the original
    raw markdown. It keeps visible text copyable after rollback, but
    responses with markdown-specific syntax can diverge from the original
    source.
    - **Clipboard fallback ordering**: SSH still uses OSC 52 exclusively
    because native/PowerShell clipboard access would target the wrong
    machine. Local sessions try native clipboard first, then WSL PowerShell
    when running under WSL, then OSC 52. This adds one process-spawn
    fallback for WSL users but keeps the normal desktop and SSH paths
    simple.
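    
    The fallback ordering lends itself to a short sketch in the
    strategy-injection style. This is not the real
    `copy_to_clipboard_with`: the function name, the `String` error type,
    and the returned strategy label are assumptions made to keep the
    example self-contained.
    
    ```rust
    /// Tries clipboard strategies in the order described above. Returns the
    /// label of the strategy that succeeded, or every collected error.
    pub fn copy_with_fallbacks(
        text: &str,
        is_ssh: bool,
        is_wsl: bool,
        osc52: impl Fn(&str) -> Result<(), String>,
        native: impl Fn(&str) -> Result<(), String>,
        wsl_powershell: impl Fn(&str) -> Result<(), String>,
    ) -> Result<&'static str, Vec<String>> {
        let mut errors = Vec::new();
        // Over SSH, native and PowerShell paths would hit the wrong machine.
        if !is_ssh {
            match native(text) {
                Ok(()) => return Ok("native"),
                Err(err) => errors.push(format!("native: {}", err)),
            }
            if is_wsl {
                match wsl_powershell(text) {
                    Ok(()) => return Ok("wsl-powershell"),
                    Err(err) => errors.push(format!("wsl-powershell: {}", err)),
                }
            }
        }
        match osc52(text) {
            Ok(()) => Ok("osc52"),
            Err(err) => {
                errors.push(format!("osc52: {}", err));
                Err(errors)
            }
        }
    }
    ```
    
    Because every strategy is a closure, tests can drive each branch with
    stubs and never touch a real clipboard.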
    
    ## Architecture
    
    ```
    chatwidget.rs
    ├── agent_turn_markdowns: Vec<AgentTurnMarkdown>  // ordinal-indexed history
    ├── last_agent_markdown: Option<String>            // always == last entry's markdown
    ├── completed_turn_count: usize                    // incremented when user turns enter history
    ├── saw_copy_source_this_turn: bool                // dedup guard
    ├── clipboard_lease: Option<ClipboardLease>        // keeps Linux clipboard owner alive
    │
    ├── record_agent_markdown(&str)                    // append/update history entry
    ├── truncate_agent_turn_markdowns_to_turn_count()  // rollback support
    ├── copy_last_agent_markdown()                     // public entry point (slash + hotkey)
    └── copy_last_agent_markdown_with(fn)              // testable core
    
    clipboard_copy.rs
    ├── copy_to_clipboard(text) -> Result<Option<ClipboardLease>>
    ├── copy_to_clipboard_with(text, ssh, wsl, osc52_fn, arboard_fn, wsl_fn)
    ├── ClipboardLease { _clipboard on linux }
    ├── arboard_copy(text)          // platform-conditional native clipboard path
    ├── wsl_clipboard_copy(text)    // WSL PowerShell fallback
    ├── osc52_copy(text)            // /dev/tty -> stdout fallback
    ├── SuppressStderr              // macOS stderr redirect guard
    ├── is_ssh_session()
    └── is_wsl_session()
    
    app_backtrack.rs
    ├── last_agent_markdown_from_transcript()  // reconstruct from visible cells
    └── truncate call sites in trim/apply_confirmed_rollback
    ```
    
    ## Observability
    
    - `tracing::warn!` on native clipboard failure before OSC 52 fallback.
    - `tracing::debug!` on `/dev/tty` open/write failure before stdout
    fallback.
    - History cell messages: "Copied last message to clipboard", "Copy
    failed: {error}", "No agent response to copy" appear in the TUI
    transcript.
    
    ## Tests
    
    - `clipboard_copy.rs`: Unit tests cover OSC 52 encoding roundtrip,
    payload size rejection, writer output, SSH-only OSC52 routing, non-WSL
    native-to-OSC52 fallback, WSL native-to-PowerShell fallback, WSL
    PowerShell-to-OSC52 fallback, and all-error reporting via strategy
    injection.
    - `chatwidget/tests/slash_commands.rs`: Updated existing `/copy` tests
    to use `last_agent_markdown_text()` accessor. Added coverage for the
    Linux clipboard lease lifecycle, missing
    `TurnComplete.last_agent_message` fallback through completed assistant
    items, replayed legacy agent messages, stale-output prevention after
    rollback, and the `Ctrl+O` no-output hotkey path.
    - `app_backtrack.rs`: Added
    `agent_group_count_ignores_context_compacted_marker` verifying that
    info-event cells don't inflate the agent group count.
    
    ---------
    
    Co-authored-by: Felipe Coury <felipe.coury@gmail.com>
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • [mcp] Expand tool search to custom MCPs. (#16944)
    - [x] Expand tool search to custom MCPs.
    - [x] Rename several variables/fields to be more generic.
    
    Updated tool & server name lifecycles:
    
    **Raw Identity**
    
    ToolInfo.server_name is the raw MCP server name.
    ToolInfo.tool.name is the raw MCP tool name.
    MCP calls route back to raw via parse_tool_name() returning
    (tool.server_name, tool.tool.name).
    mcpServerStatus/list now groups by raw server and keys tools by
    Tool.name: mod.rs:599
    App-server just forwards that grouped raw snapshot:
    codex_message_processor.rs:5245
    
    **Callable Names**
    
    On list-tools, we create provisional callable_namespace / callable_name:
    mcp_connection_manager.rs:1556
    For non-app MCP, provisional callable name starts as raw tool name.
    For codex-apps, provisional callable name is sanitized and strips
    connector name/id prefix; namespace includes connector name.
    Then qualify_tools() sanitizes callable namespace + name to ASCII alnum
    / _ only: mcp_tool_names.rs:128
    Note: this is stricter than Responses API. Hyphen is currently replaced
    with _ for code-mode compatibility.
    
    **Collision Handling**
    
    We do initially collapse example-server and example_server to the same
    base.
    Then qualify_tools() detects distinct raw namespace identities behind
    the same sanitized namespace and appends a hash to the callable
    namespace: mcp_tool_names.rs:137
    Same idea for tool-name collisions: hash suffix goes on callable tool
    name.
    Final list_all_tools() map key is callable_namespace + callable_name:
    mcp_connection_manager.rs:769
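    
    As an illustration of the sanitize-then-disambiguate idea (not the
    actual `qualify_tools()` code): the helper names below are
    hypothetical, `DefaultHasher` is only a stand-in for whatever hash the
    real code uses, and this sketch suffixes every member of a colliding
    group, which may differ from the real behavior.
    
    ```rust
    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::{Hash, Hasher};
    
    /// Replaces every character outside ASCII alphanumerics and `_` with `_`.
    pub fn sanitize(raw: &str) -> String {
        raw.chars()
            .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
            .collect()
    }
    
    /// Short hex suffix derived from the raw name (stand-in hash).
    fn short_hash(raw: &str) -> String {
        let mut hasher = DefaultHasher::new();
        raw.hash(&mut hasher);
        format!("{:08x}", hasher.finish() & 0xffff_ffff)
    }
    
    /// Maps raw namespaces to callable namespaces. When several distinct
    /// raw names sanitize to the same base, each gets a hash suffix.
    pub fn qualify_namespaces(raw_names: &[&str]) -> HashMap<String, String> {
        let mut by_base: HashMap<String, Vec<&str>> = HashMap::new();
        for &raw in raw_names {
            let group = by_base.entry(sanitize(raw)).or_default();
            if !group.contains(&raw) {
                group.push(raw);
            }
        }
        let mut callable = HashMap::new();
        for (base, raws) in by_base {
            for &raw in &raws {
                let name = if raws.len() > 1 {
                    format!("{}_{}", base, short_hash(raw))
                } else {
                    base.clone()
                };
                callable.insert(raw.to_string(), name);
            }
        }
        callable
    }
    ```
    
    Here `example-server` and `example_server` both sanitize to
    `example_server`, so each ends up with a distinct suffixed namespace.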
    
    **Direct Model Tools**
    
    Direct MCP tool declarations use the full qualified sanitized key as the
    Responses function name.
    The raw rmcp Tool is converted but renamed for model exposure.
    
    **Tool Search / Deferred**
    
    Tool search result namespace = final ToolInfo.callable_namespace:
    tool_search.rs:85
    Tool search result nested name = final ToolInfo.callable_name:
    tool_search.rs:86
    Deferred tool handler is registered as "{namespace}:{name}":
    tool_registry_plan.rs:248
    When a function call comes back, core recombines namespace + name, looks
    up the full qualified key, and gets the raw server/tool for MCP
    execution: codex.rs:4353
    
    **Separate Legacy Snapshot**
    
    collect_mcp_snapshot_from_manager_with_detail() still returns a map
    keyed by qualified callable name.
    mcpServerStatus/list no longer uses that; it uses
    McpServerStatusSnapshot, which is raw-inventory shaped.
  • app-server: Use shared receivers for app-server message processors (#17256)
    We do not rely on the mutability here, so express it in the type system.
  • Forward app-server turn clientMetadata to Responses (#16009)
    ## Summary
    App-server v2 already receives turn-scoped `clientMetadata`, but the
    Rust app-server was dropping it before the outbound Responses request.
    This change keeps the fix lightweight by threading that metadata through
    the existing turn-metadata path rather than inventing a new transport.
    
    ## What we're trying to do and why
    We want turn-scoped metadata from the app-server protocol layer,
    especially fields like Hermes/GAAS run IDs, to survive all the way to
    the actual Responses API request so it is visible in downstream
    websocket request logging and analytics.
    
    The specific bug was:
    - app-server protocol uses camelCase `clientMetadata`
    - Responses transport already has an existing turn metadata carrier:
    `x-codex-turn-metadata`
    - websocket transport already rewrites that header into
    `request.request_body.client_metadata["x-codex-turn-metadata"]`
    - but the Rust app-server never parsed or stored `clientMetadata`, so
    nothing from the app-server request was making it into that existing
    path
    
    This PR fixes that without adding a new header or a second metadata
    channel.
    
    ## How we did it
    ### Protocol surface
    - Add optional `clientMetadata` to v2 `TurnStartParams` and
    `TurnSteerParams`
    - Regenerate the JSON schema / TypeScript fixtures
    - Update app-server docs to describe the field and its behavior
    
    ### Runtime plumbing
    - Add a dedicated core op for app-server user input carrying turn-scoped
    metadata: `Op::UserInputWithClientMetadata`
    - Wire `turn/start` and `turn/steer` through that op / signature path
    instead of dropping the metadata at the message-processor boundary
    - Store the metadata in `TurnMetadataState`
    
    ### Transport behavior
    - Reuse the existing serialized `x-codex-turn-metadata` payload
    - Merge the new app-server `clientMetadata` into that JSON additively
    - Do **not** replace built-in reserved fields already present in the
    turn metadata payload
    - Keep websocket behavior unchanged at the outer shape level: it still
    sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
    string now contains the merged fields
    - Keep HTTP fallback behavior unchanged except that the existing
    `x-codex-turn-metadata` header now includes the merged fields too
    
    ### Request shape before / after
    Before, a websocket `response.create` looked like:
    ```json
    {
      "type": "response.create",
      "client_metadata": {
        "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
      }
    }
    ```
    Even if the app-server caller supplied `clientMetadata`, it was not
    represented there.
    
    After, the same request shape is preserved, but the serialized payload
    now includes the new turn-scoped fields:
    ```json
    {
      "type": "response.create",
      "client_metadata": {
        "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
      }
    }
    ```
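    
    The additive merge can be sketched as below. This is a simplified
    illustration rather than the `TurnMetadataState` code: the function
    name is hypothetical, values are plain strings instead of JSON, and
    "reserved" is approximated as "already present in the turn metadata".
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Adds caller-supplied metadata to the turn metadata without replacing
    /// keys that are already present (such as `session_id` and `turn_id`).
    pub fn merge_client_metadata(
        turn_metadata: &mut BTreeMap<String, String>,
        client_metadata: &BTreeMap<String, String>,
    ) {
        for (key, value) in client_metadata {
            // `or_insert_with` leaves an existing built-in value untouched.
            turn_metadata
                .entry(key.clone())
                .or_insert_with(|| value.clone());
        }
    }
    ```
    
    A caller that sends its own `session_id` therefore cannot overwrite
    the built-in one, while new keys such as `origin` are carried through.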
    
    ## Validation
    ### Targeted tests added / updated
    - protocol round-trip coverage for `clientMetadata` on `turn/start` and
    `turn/steer`
    - protocol round-trip coverage for `Op::UserInputWithClientMetadata`
    - `TurnMetadataState` merge test proving client metadata is added
    without overwriting reserved built-in fields
    - websocket request-shape test proving outbound `response.create`
    contains merged metadata inside
    `client_metadata["x-codex-turn-metadata"]`
    - app-server integration tests proving:
      - `turn/start` forwards `clientMetadata` into the outbound Responses
        request path
      - websocket warmup + real turn request both behave correctly
      - `turn/steer` updates the follow-up request metadata
    
    ### Commands run
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core
    turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
    --lib`
    - `cargo test -p codex-core --test all
    responses_websocket_preserves_custom_turn_metadata_fields`
    - `cargo test -p codex-app-server --test all client_metadata`
    - `cargo test -p codex-app-server --test all
    turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
    -- --nocapture`
    - `just fmt`
    - `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
    -p codex-app-server`
    - `just fix -p codex-exec -p codex-tui-app-server`
    - `just argument-comment-lint`
    
    ### Full suite note
    `cargo test` in `codex-rs` still fails in:
    - `suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`
    
    I verified that same failure on a clean detached `HEAD` worktree with an
    isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.