Commit Graph

368 Commits

  • Pass turn id with feedback uploads (#17314)
    ## Summary
    - Add an optional `tags` dictionary to feedback upload params.
    - Capture the active app-server turn id in the TUI and submit it as
    `tags.turn_id` with `/feedback` uploads.
    - Merge client-provided feedback tags into Sentry feedback tags while
    preserving reserved system fields like `thread_id`, `classification`,
    `cli_version`, `session_source`, and `reason`.
    
    ## Behavior / impact
    Existing feedback upload callers remain compatible because `tags` is
    optional and nullable. The wire shape is still a normal JSON object /
    TypeScript dictionary, so adding future feedback metadata will not
    require a new top-level protocol field each time. This change only adds
    feedback metadata for Codex CLI/TUI uploads; it does not affect existing
    pipelines, DAGs, exports, or downstream consumers unless they choose to
    read the new `turn_id` feedback tag.
    
    ## Tests
    - `cargo fmt -- --config imports_granularity=Item` passed; stable
    rustfmt warned that `imports_granularity` is nightly-only.
    - `cargo run -p codex-app-server-protocol --bin write_schema_fixtures`
    - `cargo test -p codex-feedback
    upload_tags_include_client_tags_and_preserve_reserved_fields`
    - `cargo test -p codex-app-server-protocol
    schema_fixtures_match_generated`
    - `cargo test -p codex-tui build_feedback_upload_params`
    - `cargo test -p codex-tui
    live_app_server_turn_started_sets_feedback_turn_id`
    - `cargo check -p codex-app-server --tests`
    - `git diff --check`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [mcp] Support MCP Apps part 3 - Add mcp tool call support. (#17364)
    - [x] Add a new app-server method so that MCP Apps can call their own
    MCP server directly.
  • representing guardian review timeouts in protocol types (#17381)
    ## Summary
    
    - Add `TimedOut` to Guardian/review carrier types:
      - `ReviewDecision::TimedOut`
      - `GuardianAssessmentStatus::TimedOut`
      - app-server v2 `GuardianApprovalReviewStatus::TimedOut`
    - Regenerate app-server JSON/TypeScript schemas for the new wire shape.
    - Wire the new status through core/app-server/TUI mappings with
    conservative fail-closed handling.
    - Keep `TimedOut` non-user-selectable in the approval UI.
    
    **Does not change runtime behavior yet; emitting `TimeOut` and
    parent-model timeout messaging will come in followup PRs**
  • Revert "Option to Notify Workspace Owner When Usage Limit is Reached" (#17391)
    Reverts openai/codex#16969
    
    #sev3-2026-04-10-accountscheckversion-500s-for-openai-workspace-7300
  • fix(guardian, app-server): introduce guardian review ids (#17298)
    ## Description
    
    This PR introduces `review_id` as the stable identifier for guardian
    reviews and exposes it in app-server `item/autoApprovalReview/started`
    and `item/autoApprovalReview/completed` events.
    
    Internally, guardian rejection state is now keyed by `review_id` instead
    of the reviewed tool item ID. `target_item_id` is still included when a
    review maps to a concrete thread item, but it is no longer overloaded as
    the review lifecycle identifier.
    
    ## Motivation
    
    We'd like to give users the ability to preempt a guardian review while
    it's running (approve or decline).
    
    However, we can't implement the API that allows the user to override a
    running guardian review because we didn't have a unique `review_id` per
    guardian review. Using `target_item_id` is not correct since:
    - with execve reviews, there can be multiple execve calls (and therefore
    guardian reviews) per shell command
    - with network policy reviews, there is no target item ID
    
    The PR that actually implements user overrides will use `review_id` as
    the stable identifier.
  • Support clear SessionStart source (#17073)
    ## Motivation
    
    The `SessionStart` hook already receives `startup` and `resume` sources,
    but sessions created from `/clear` previously looked like normal startup
    sessions. This makes it impossible for hook authors to distinguish
    between these with the matcher.
    
    ## Summary
    
    - Add `InitialHistory::Cleared` so `/clear`-created sessions can be
    distinguished from ordinary startup sessions.
    - Add `SessionStartSource::Clear` and wire it through core, app-server
    thread start params, and TUI clear-session flow.
    - Update app-server protocol schemas, generated TypeScript, docs, and
    related tests.
    
    
    https://github.com/user-attachments/assets/9cae3cb4-41c7-4d06-b34f-966252442e5c
  • Option to Notify Workspace Owner When Usage Limit is Reached (#16969)
    ## Summary
    - Replace the manual `/notify-owner` flow with an inline confirmation
    prompt when a usage-based workspace member hits a credits-depleted
    limit.
    - Fetch the current workspace role from the live ChatGPT
    `accounts/check/v4-2023-04-27` endpoint so owner/member behavior matches
    the desktop and web clients.
    - Keep owner, member, and spend-cap messaging distinct so we only offer
    the owner nudge when the workspace is actually out of credits.
    
    ## What Changed
    - `backend-client`
    - Added a typed fetch for the current account role from
    `accounts/check`.
      - Mapped backend role values into a Rust workspace-role enum.
    - `app-server` and protocol
      - Added `workspaceRole` to `account/read` and `account/updated`.
    - Derived `isWorkspaceOwner` from the live role, with a fallback to the
    cached token claim when the role fetch is unavailable.
    - `tui`
      - Removed the explicit `/notify-owner` slash command.
    - When a member is blocked because the workspace is out of credits, the
    error now prompts:
    - `Your workspace is out of credits. Request more from your workspace
    owner? [y/N]`
      - Choosing `y` sends the existing owner-notification request.
    - Choosing `n`, pressing `Esc`, or accepting the default selection
    dismisses the prompt without sending anything.
    - Selection popups now honor explicit item shortcuts, which is how the
    `y` / `n` interaction is wired.
    
    ## Reviewer Notes
    - The main behavior change is scoped to usage-based workspace members
    whose workspace credits are depleted.
    - Spend-cap reached should not show the owner-notification prompt.
    - Owners and admins should continue to see `/usage` guidance instead of
    the member prompt.
    - The live role fetch is best-effort; if it fails, we fall back to the
    existing token-derived ownership signal.
    
    ## Testing
    - Manual verification
      - Workspace owner does not see the member prompt.
    - Workspace member with depleted credits sees the confirmation prompt
    and can send the nudge with `y`.
    - Workspace member with spend cap reached does not see the
    owner-notification prompt.
    
    ### Workspace member out of usage
    
    https://github.com/user-attachments/assets/341ac396-eff4-4a7f-bf0c-60660becbea1
    
    ### Workspace owner
    <img width="1728" height="1086" alt="Screenshot 2026-04-09 at 11 48
    22 AM"
    src="https://github.com/user-attachments/assets/06262a45-e3fc-4cc4-8326-1cbedad46ed6"
    />
  • Forward app-server turn clientMetadata to Responses (#16009)
    ## Summary
    App-server v2 already receives turn-scoped `clientMetadata`, but the
    Rust app-server was dropping it before the outbound Responses request.
    This change keeps the fix lightweight by threading that metadata through
    the existing turn-metadata path rather than inventing a new transport.
    
    ## What we're trying to do and why
    We want turn-scoped metadata from the app-server protocol layer,
    especially fields like Hermes/GAAS run IDs, to survive all the way to
    the actual Responses API request so it is visible in downstream
    websocket request logging and analytics.
    
    The specific bug was:
    - app-server protocol uses camelCase `clientMetadata`
    - Responses transport already has an existing turn metadata carrier:
    `x-codex-turn-metadata`
    - websocket transport already rewrites that header into
    `request.request_body.client_metadata["x-codex-turn-metadata"]`
    - but the Rust app-server never parsed or stored `clientMetadata`, so
    nothing from the app-server request was making it into that existing
    path
    
    This PR fixes that without adding a new header or a second metadata
    channel.
    
    ## How we did it
    ### Protocol surface
    - Add optional `clientMetadata` to v2 `TurnStartParams` and
    `TurnSteerParams`
    - Regenerate the JSON schema / TypeScript fixtures
    - Update app-server docs to describe the field and its behavior
    
    ### Runtime plumbing
    - Add a dedicated core op for app-server user input carrying turn-scoped
    metadata: `Op::UserInputWithClientMetadata`
    - Wire `turn/start` and `turn/steer` through that op / signature path
    instead of dropping the metadata at the message-processor boundary
    - Store the metadata in `TurnMetadataState`
    
    ### Transport behavior
    - Reuse the existing serialized `x-codex-turn-metadata` payload
    - Merge the new app-server `clientMetadata` into that JSON additively
    - Do **not** replace built-in reserved fields already present in the
    turn metadata payload
    - Keep websocket behavior unchanged at the outer shape level: it still
    sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
    string now contains the merged fields
    - Keep HTTP fallback behavior unchanged except that the existing
    `x-codex-turn-metadata` header now includes the merged fields too
    
    ### Request shape before / after
    Before, a websocket `response.create` looked like:
    ```json
    {
      "type": "response.create",
      "client_metadata": {
        "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
      }
    }
    ```
    Even if the app-server caller supplied `clientMetadata`, it was not
    represented there.
    
    After, the same request shape is preserved, but the serialized payload
    now includes the new turn-scoped fields:
    ```json
    {
      "type": "response.create",
      "client_metadata": {
        "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
      }
    }
    ```
    
    ## Validation
    ### Targeted tests added / updated
    - protocol round-trip coverage for `clientMetadata` on `turn/start` and
    `turn/steer`
    - protocol round-trip coverage for `Op::UserInputWithClientMetadata`
    - `TurnMetadataState` merge test proving client metadata is added
    without overwriting reserved built-in fields
    - websocket request-shape test proving outbound `response.create`
    contains merged metadata inside
    `client_metadata["x-codex-turn-metadata"]`
    - app-server integration tests proving:
    - `turn/start` forwards `clientMetadata` into the outbound Responses
    request path
      - websocket warmup + real turn request both behave correctly
      - `turn/steer` updates the follow-up request metadata
    
    ### Commands run
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core
    turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
    --lib`
    - `cargo test -p codex-core --test all
    responses_websocket_preserves_custom_turn_metadata_fields`
    - `cargo test -p codex-app-server --test all client_metadata`
    - `cargo test -p codex-app-server --test all
    turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
    -- --nocapture`
    - `just fmt`
    - `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
    -p codex-app-server`
    - `just fix -p codex-exec -p codex-tui-app-server`
    - `just argument-comment-lint`
    
    ### Full suite note
    `cargo test` in `codex-rs` still fails in:
    -
    `suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`
    
    I verified that same failure on a clean detached `HEAD` worktree with an
    isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
  • Add realtime voice selection (#17176)
    - Add realtime voice selection for realtime/start.
    - Expose the supported v1/v2 voice lists and cover explicit, configured,
    default, and invalid voice paths.
  • Move default realtime prompt into core (#17165)
    - Adds a core-owned realtime backend prompt template and preparation
    path.
    - Makes omitted realtime start prompts use the core default, while null
    or empty prompts intentionally send empty instructions.
    - Covers the core realtime path and app-server v2 path with integration
    coverage.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Update guardian output schema (#17061)
    ## Summary
    - Update guardian output schema to separate risk, authorization,
    outcome, and rationale.
    - Feed guardian rationale into rejection messages.
    - Split the guardian policy into template and tenant-config sections.
    
    ## Validation
    - `cargo test -p codex-core mcp_tool_call`
    - `env -u CODEX_SANDBOX_NETWORK_DISABLED INSTA_UPDATE=always cargo test
    -p codex-core guardian::`
    
    ---------
    
    Co-authored-by: Owen Lin <owen@openai.com>
  • Use model metadata for Fast Mode status (#16949)
    Fast Mode status was still tied to one model name in the TUI and
    model-list plumbing. This changes the model metadata shape so a model
    can advertise additional speed tiers, carries that field through the
    app-server model list, and uses it to decide when to show Fast Mode
    status.
    
    For people using Codex, the behavior is intended to stay the same for
    existing models. Fast Mode still requires the existing signed-in /
    feature-gated path; the difference is that the UI can now recognize any
    model the model list marks as Fast-capable, instead of requiring a new
    client-side slug check.
  • Add WebRTC transport to realtime start (#16960)
    Adds WebRTC startup to the experimental app-server
    `thread/realtime/start` method with an optional transport enum. The
    websocket path remains the default; WebRTC offers create the realtime
    session through the shared start flow and emit the answer SDP via
    `thread/realtime/sdp`.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [app-server-protocol] introduce generic ServerResponse for app-server-protocol (#17044)
    - introduces `ServerResponse` as the symmetrical typed response union to
    `ServerRequest` for app-server-protocol
    - enables scalable event stream ingestion for use cases such as
    analytics, particularly for tools/approvals
    - no runtime behavior changes, protocol/schema plumbing only
    - mirrors #15921
  • app-server: Move watch_id to request of fs/watch (#17026)
    It's easier for clients to maintain watchers if they define the watch
    id, so move it into the request.
    It's not used yet, so should be a safe change.
  • Honor null thread instructions (#16964)
    - Treat explicit null thread instructions as a blank-slate override
    while preserving omitted-field fallback behavior.
    - Preserve null through rollout resume/fork and keep explicit empty
    strings distinct.
    - Add app-server v2 start/fork coverage for the tri-state instruction
    params.
  • [codex] Add danger-full-access denylist-only network mode (#16946)
    ## Summary
    
    This adds `experimental_network.danger_full_access_denylist_only` for
    orgs that want yolo / danger-full-access sessions to keep full network
    access while still enforcing centrally managed deny rules.
    
    When the flag is true and the session sandbox is `danger-full-access`,
    the network proxy starts with:
    
    - domain allowlist set to `*`
    - managed domain `deny` entries enforced
    - upstream proxy use allowed
    - all Unix sockets allowed
    - local/private binding allowed
    
    Caveat: the denylist is best effort only. In yolo / danger-full-access
    mode, Codex or the model can use an allowed socket or other
    local/private network path to bypass the proxy denylist, so this should
    not be treated as a hard security boundary.
    
    The flag is intentionally scoped to `SandboxPolicy::DangerFullAccess`.
    Read-only and workspace-write modes keep the existing managed/user
    allowlist, denylist, Unix socket, and local-binding behavior. This does
    not enable the non-loopback proxy listener setting; that still requires
    its own explicit config.
    
    This also threads the new field through config requirements parsing,
    app-server protocol/schema output, config API mapping, and the TUI debug
    config output.
    
    ## How to use
    
    Add the flag under `[experimental_network]` in the network policy config
    that is delivered to Codex. The setting is not under `[permissions]`.
    
    ```toml
    [experimental_network]
    enabled = true
    danger_full_access_denylist_only = true
    
    [experimental_network.domains]
    "blocked.example.com" = "deny"
    "*.blocked.example.com" = "deny"
    ```
    
    With that configuration, yolo / danger-full-access sessions get broad
    network access except for the managed denied domains above. The denylist
    remains a best-effort proxy policy because the session may still use
    allowed sockets to bypass it. Other sandbox modes do not get the
    wildcard domain allowlist or the socket/local-binding relaxations from
    this flag.
    
    ## Verification
    
    - `cargo test -p codex-config network_requirements`
    - `cargo test -p codex-core network_proxy_spec`
    - `cargo test -p codex-app-server map_requirements_toml_to_api`
    - `cargo test -p codex-tui debug_config_output`
    - `cargo test -p codex-app-server-protocol`
    - `just write-app-server-schema`
    - `just fmt`
    - `just fix -p codex-config -p codex-core -p codex-app-server-protocol
    -p codex-app-server -p codex-tui`
    - `just fix -p codex-core -p codex-config`
    - `git diff --check`
    - `cargo clean`
  • [mcp] Support MCP Apps part 1. (#16082)
    - [x] Add `mcpResource/read` method to read mcp resource.
  • Speed up /mcp inventory listing (#16831)
    Addresses #16244
    
    This was a performance regression introduced when we moved the TUI on
    top of the app server API.
    
    Problem: `/mcp` rebuilt a full MCP inventory through
    `mcpServerStatus/list`, including resources and resource templates that
    made the TUI wait on slow inventory probes.
    
    Solution: add a lightweight `detail` mode to `mcpServerStatus/list`,
    have `/mcp` request tools-and-auth only, and cover the fast path with
    app-server and TUI tests.
    
    Testing: Confirmed slow (multi-second) response prior to change and
    immediate response after change.
    
    I considered two options:
    1. Change the existing `mcpServerStatus/list` API to accept an optional
    "details" parameter so callers can request only a subset of the
    information.
    2. Add a separate `mcpServer/list` API that returns only the servers,
    tools, and auth but omits the resources.
    
    I chose option 1, but option 2 is also a reasonable approach.
  • [codex-analytics] add protocol-native turn timestamps (#16638)
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16638).
    * #16870
    * #16706
    * #16659
    * #16641
    * #16640
    * __->__ #16638
  • fix(guardian): fix ordering of guardian events (#16462)
    Guardian events were emitted a bit out of order for CommandExecution
    items. This would make it hard for the frontend to render a guardian
    auto-review, which has this payload:
    ```
    pub struct ItemGuardianApprovalReviewStartedNotification {
        pub thread_id: String,
        pub turn_id: String,
        pub target_item_id: String,
        pub review: GuardianApprovalReview,
        // FYI this is no longer a json blob
        pub action: Option<JsonValue>,
    }
    ```
    
    There is a `target_item_id` the auto-approval review is referring to,
    but the actual item had not been emitted yet.
    
    Before this PR:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`, and if approved...
    - `item/started`
    - `item/completed`
    
    After this PR:
    - `item/started`
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    - `item/completed`
    
    This lines up much better with existing patterns (i.e. human review in
    `Default mode`, where app-server would send a server request to prompt
    for user approval after `item/started`), and makes it easier for clients
    to render what guardian is actually reviewing.
    
    We do this following a similar pattern as `FileChange` (aka apply patch)
    items, where we create a FileChange item and emit `item/started` if we
    see the apply patch approval request, before the actual apply patch call
    runs.
  • feat(requirements): support allowed_approval_reviewers (#16701)
    ## Description
    
    Add requirements.toml support for `allowed_approvals_reviewers =
    ["user", "guardian_subagent"]`, so admins can now restrict the use of
    guardian mode.
    
    Note: If a user sets a reviewer that isn’t allowed by requirements.toml,
    config loading falls back to the first allowed reviewer and emits a
    startup warning.
    
    The table below describes the possible admin controls.
    | Admin intent | `requirements.toml` | User `config.toml` | End result |
    |---|---|---|---|
    | Leave Guardian optional | omit `allowed_approvals_reviewers` or set
    `["user", "guardian_subagent"]` | user chooses `approvals_reviewer =
    "user"` or `"guardian_subagent"` | Guardian off for `user`, on for
    `guardian_subagent` + `approval_policy = "on-request"` |
    | Force Guardian off | `allowed_approvals_reviewers = ["user"]` | any
    user value | Effective reviewer is `user`; Guardian off |
    | Force Guardian on | `allowed_approvals_reviewers =
    ["guardian_subagent"]` and usually `allowed_approval_policies =
    ["on-request"]` | any user reviewer value; user should also have
    `approval_policy = "on-request"` unless policy is forced | Effective
    reviewer is `guardian_subagent`; Guardian on when effective approval
    policy is `on-request` |
    | Allow both, but default to manual if user does nothing |
    `allowed_approvals_reviewers = ["user", "guardian_subagent"]` | omit
    `approvals_reviewer` | Effective reviewer is `user`; Guardian off |
    | Allow both, and user explicitly opts into Guardian |
    `allowed_approvals_reviewers = ["user", "guardian_subagent"]` |
    `approvals_reviewer = "guardian_subagent"` and `approval_policy =
    "on-request"` | Guardian on |
    | Invalid admin config | `allowed_approvals_reviewers = []` | anything |
    Config load error |
  • Fix fork source display in /status (expose forked_from_id in app server) (#16596)
    Addresses #16560
    
    Problem: `/status` stopped showing the source thread id in forked TUI
    sessions after the app-server migration.
    
    Solution: Carry fork source ids through app-server v2 thread data and
    the TUI session adapter, and update TUI fixtures so `/status` matches
    the old TUI behavior.
  • fix(guardian): make GuardianAssessmentEvent.action strongly typed (#16448)
    ## Description
    
    Previously the `action` field on `EventMsg::GuardianAssessment`, which
    describes what Guardian is reviewing, was typed as an arbitrary JSON
    blob. This PR cleans it up and defines a sum type representing all the
    various actions that Guardian can review.
    
    This is a breaking change (on purpose), which is fine because:
    - the Codex app / VSCE does not actually use `action` at the moment
    - the TUI code that consumes `action` is updated in this PR as well
    - rollout files that serialized old `EventMsg::GuardianAssessment` will
    just silently drop these guardian events
    - the contract is defined as unstable, so other clients have a fair
    warning :)
    
    This will make things much easier for followup Guardian work.
    
    ## Why
    
    The old guardian review payloads worked, but they pushed too much shape
    knowledge into downstream consumers. The TUI had custom JSON parsing
    logic for commands, patches, network requests, and MCP calls, and the
    app-server protocol was effectively just passing through an opaque blob.
    
    Typing this at the protocol boundary makes the contract clearer.
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Add ChatGPT device-code login to app server (#15525)
    ## Problem
    
    App-server clients could only initiate ChatGPT login through the browser
    callback flow, even though the shared login crate already supports
    device-code auth. That left VS Code, Codex App, and other app-server
    clients without a first-class way to use the existing device-code
    backend when browser redirects are brittle or when the client UX wants
    to own the login ceremony.
    
    ## Mental model
    
    This change adds a second ChatGPT login start path to app-server:
    clients can now call `account/login/start` with `type:
    "chatgptDeviceCode"`. App-server immediately returns a `loginId` plus
    the device-code UX payload (`verificationUrl` and `userCode`), then
    completes the login asynchronously in the background using the existing
    `codex_login` polling flow. Successful device-code login still resolves
    to ordinary `chatgpt` auth, and completion continues to flow through the
    existing `account/login/completed` and `account/updated` notifications.
    
    ## Non-goals
    
    This does not introduce a new auth mode, a new account shape, or a
    device-code eligibility discovery API. It also does not add automatic
    fallback to browser login in core; clients remain responsible for
    choosing when to request device code and whether to retry with a
    different UX if the backend/admin policy rejects it.
    
    ## Tradeoffs
    
    We intentionally keep `login_chatgpt_common` as a local validation
    helper instead of turning it into a capability probe. Device-code
    eligibility is checked by actually calling `request_device_code`, which
    means policy-disabled cases surface as an immediate request error rather
    than an async completion event. We also keep the active-login state
    machine minimal: browser and device-code logins share the same public
    cancel contract, but device-code cancellation is implemented with a
    local cancel token rather than a larger cross-crate refactor.
    
    ## Architecture
    
    The protocol grows a new `chatgptDeviceCode` request/response variant in
    app-server v2. On the server side, the new handler reuses the existing
    ChatGPT login precondition checks, calls `request_device_code`, returns
    the device-code payload, and then spawns a background task that waits on
    either cancellation or `complete_device_code_login`. On success, it
    reuses the existing auth reload and cloud-requirements refresh path
    before emitting `account/login/completed` success and `account/updated`.
    On failure or cancellation, it emits only `account/login/completed`
    failure. The existing `account/login/cancel { loginId }` contract
    remains unchanged and now works for both browser and device-code
    attempts.
    
    
    ## Tests
    
    Added protocol serialization coverage for the new request/response
    variant, plus app-server tests for device-code success, failure, cancel,
    and start-time rejection behavior. Existing browser ChatGPT login
    coverage remains in place to show that the callback-based flow is
    unchanged.
  • chore: refactor network permissions to use explicit domain and unix socket rule maps (#15120)
    ## Summary
    
    This PR replaces the legacy network allow/deny list model with explicit
    rule maps for domains and unix sockets across managed requirements,
    permissions profiles, the network proxy config, and the app server
    protocol.
    
    Concretely, it:
    
    - introduces typed domain (`allow` / `deny`) and unix socket permission
    (`allow` / `none`) entries instead of separate `allowed_domains`,
    `denied_domains`, and `allow_unix_sockets` lists
    - updates config loading, managed requirements merging, and exec-policy
    overlays to read and upsert rule entries consistently
    - exposes the new shape through protocol/schema outputs, debug surfaces,
    and app-server config APIs
    - rejects the legacy list-based keys and updates docs/tests to reflect
    the new config format
    
    ## Why
    
    The previous representation split related network policy across multiple
    parallel lists, which made merging and overriding rules harder to reason
    about. Moving to explicit keyed permission maps gives us a single source
    of truth per host/socket entry, makes allow/deny precedence clearer, and
    gives protocol consumers access to the full rule state instead of
    derived projections only.
    
    ## Backward Compatibility
    
    ### Backward compatible
    
    - Managed requirements still accept the legacy
    `experimental_network.allowed_domains`,
    `experimental_network.denied_domains`, and
    `experimental_network.allow_unix_sockets` fields. They are normalized
    into the new canonical `domains` and `unix_sockets` maps internally.
    - App-server v2 still deserializes legacy `allowedDomains`,
    `deniedDomains`, and `allowUnixSockets` payloads, so older clients can
    continue reading managed network requirements.
    - App-server v2 responses still populate `allowedDomains`,
    `deniedDomains`, and `allowUnixSockets` as legacy compatibility views
    derived from the canonical maps.
    - `managed_allowed_domains_only` keeps the same behavior after
    normalization. Legacy managed allowlists still participate in the same
    enforcement path as canonical `domains` entries.
    
    ### Not backward compatible
    
    - Permissions profiles under `[permissions.<profile>.network]` no longer
    accept the legacy list-based keys. Those configs must use the canonical
    `[domains]` and `[unix_sockets]` tables instead of `allowed_domains`,
    `denied_domains`, or `allow_unix_sockets`.
    - Managed `experimental_network` config cannot mix canonical and legacy
    forms in the same block. For example, `domains` cannot be combined with
    `allowed_domains` or `denied_domains`, and `unix_sockets` cannot be
    combined with `allow_unix_sockets`.
    - The canonical format can express explicit `"none"` entries for unix
    sockets, but those entries do not round-trip through the legacy
    compatibility fields because the legacy fields only represent allow/deny
    lists.
    ## Testing
    `/target/debug/codex sandbox macos --log-denials /bin/zsh -c 'curl
    https://www.example.com' ` gives 200 with config
    ```
    [permissions.workspace.network.domains]
    "www.example.com" = "allow"
    ```
    and fails when set to deny: `curl: (56) CONNECT tunnel failed, response
    403`.
    
    Also tested backward compatibility path by verifying that adding the
    following to `/etc/codex/requirements.toml` works:
    ```
    [experimental_network]
    allowed_domains = ["www.example.com"]
    ```
  • [app-server-protocol] introduce generic ClientResponse for app-server-protocol (#15921)
    - introduces `ClientResponse` as the symmetrical typed response union to
    `ClientRequest` for app-server-protocol
    - enables scalable event stream ingestion for use cases such as
    analytics
    - no runtime behavior changes, protocol/schema plumbing only
  • permissions: remove macOS seatbelt extension profiles (#15918)
    ## Why
    
    `PermissionProfile` should only describe the per-command permissions we
    still want to grant dynamically. Keeping
    `MacOsSeatbeltProfileExtensions` in that surface forced extra macOS-only
    approval, protocol, schema, and TUI branches for a capability we no
    longer want to expose.
    
    ## What changed
    
    - Removed the macOS-specific permission-profile types from
    `codex-protocol`, the app-server v2 API, and the generated
    schema/TypeScript artifacts.
    - Deleted the core and sandboxing plumbing that threaded
    `MacOsSeatbeltProfileExtensions` through execution requests and seatbelt
    construction.
    - Simplified macOS seatbelt generation so it always includes the fixed
    read-only preferences allowlist instead of carrying a configurable
    profile extension.
    - Removed the macOS additional-permissions UI/docs/test coverage and
    deleted the obsolete macOS permission modules.
    - Tightened `request_permissions` intersection handling so explicitly
    empty requested read lists are preserved only when that field was
    actually granted, avoiding zero-grant responses being stored as active
    permissions.
  • chore: remove skill metadata from command approval payloads (#15906)
    ## Why
    
    This is effectively a follow-up to
    [#15812](https://github.com/openai/codex/pull/15812). That change
    removed the special skill-script exec path, but `skill_metadata` was
    still being threaded through command-approval payloads even though the
    approval flow no longer uses it to render prompts or resolve decisions.
    
    Keeping it around added extra protocol, schema, and client surface area
    without changing behavior.
    
    Removing it keeps the command-approval contract smaller and avoids
    carrying a dead field through app-server, TUI, and MCP boundaries.
    
    ## What changed
    
    - removed `ExecApprovalRequestSkillMetadata` and the corresponding
    `skillMetadata` field from core approval events and the v2 app-server
    protocol
    - removed the generated JSON and TypeScript schema output for that field
    - updated app-server, MCP server, TUI, and TUI app-server approval
    plumbing to stop forwarding the field
    - cleaned up tests that previously constructed or asserted
    `skillMetadata`
    
    ## Testing
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-app-server-test-client`
    - `cargo test -p codex-mcp-server`
    - `just argument-comment-lint`
  • [hooks] add non-streaming (non-stdin style) shell-only PostToolUse support (#15531)
    CHAINED PR - note that base is eternal/hooks-pretooluse-bash, not main
    -- so the following PR should be first
    
    Matching post-tool hook to the pre-tool functionality here:
    https://github.com/openai/codex/pull/15211
    
    So, PreToolUse calls for plain shell calls, allows blocking. This
    PostToolUse call runs after the command executed
    
    example run:
    
    ```
    › as a test, run in parallel the following commands:
      - echo 'one'
      - echo '[block-pre-tool-use]'
      - echo '[block-post-tool-use]'
    
    
    ⚠ MCP startup incomplete (failed: notion, linear)
    
    • Cruising through those three commands in parallel now, and I’ll share the exact outputs right after
      they land.
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    PreToolUse hook (blocked)
      warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
      feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo 'one'
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo '[block-post-tool-use]'
    
    • Ran echo '[block-post-tool-use]'
      └ [block-post-tool-use]
    
    • Ran echo 'one'
      └ one
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    PostToolUse hook (blocked)
      warning: wizard-tower PostToolUse demo blocked the Bash follow-through on purpose.
      feedback: PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to
    continue.
    
    PostToolUse hook (completed)
      warning: wizard-tower PostToolUse demo inspected Bash: echo 'one'
      hook context: PostToolUse demo saw the Bash result. Response preview: one
    
    ──────────────────────────────────────────────────────────────────────────────────────────────────────
    
    • Ran them in parallel. Little wave report:
    
      - echo 'one' completed and returned one.
      - echo '[block-pre-tool-use]' was blocked before execution by the PreToolUse hook.
      - echo '[block-post-tool-use]' ran, but its result was blocked afterward by the PostToolUse hook.
    
      The hook messages were:
    
      - PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
      - PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue.
    
    
    › for the blocked messages (correctly blocked in this test) did the tool output land back into
      context?
    
    
    • Yeah, from this run:
    
      - PreToolUse block: the command itself did not execute, so no command output landed in context. Only
        the block message did.
      - PostToolUse block: the command did execute, but the actual stdout did not come back into context.
        Only the post-tool block message did.
    
      So the blocked notifications landed in context, but the blocked command payloads themselves did not.
      Nice clean guardrail, aloha.
    ```
  • [app-server] Add a method to override feature flags. (#15601)
    - [x] Add a method to override feature flags globally and not just
    thread level.
  • app-server: Return codex home in initialize response (#15689)
    This allows clients to get enough information to interact with the codex
    skills/configuration/etc.
  • app-server: add filesystem watch support (#14533)
    ### Summary
    Add the v2 app-server filesystem watch RPCs and notifications, wire them
    through the message processor, and implement connection-scoped watches
    with notify-backed change delivery. This also updates the schema
    fixtures, app-server documentation, and the v2 integration coverage for
    watch and unwatch behavior.
    
    This allows clients to efficiently watch for filesystem updates, e.g. to
    react on branch changes.
    
    ### Testing
    - exercise watch lifecycles for directory changes, atomic file
    replacement, missing-file targets, and unwatch cleanup
  • Move git utilities into a dedicated crate (#15564)
    - create `codex-git-utils` and move the shared git helpers into it with
    file moves preserved for diff readability
    - move the `GitInfo` helpers out of `core` so stacked rollout work can
    depend on the shared crate without carrying its own git info module
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Add fork snapshot modes (#15239)
    ## Summary
    - add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can
    request either a committed snapshot or an interrupted snapshot
    - share the model-visible `<turn_aborted>` history marker between the
    live interrupt path and interrupted forks
    - update the small set of direct fork callsites to pass
    `ForkSnapshotMode::Committed`
    
    Note: this enables /btw to work similarly as Esc to interrupt (hopefully
    somewhat in distribution)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [hooks] add non-streaming (non-stdin style) shell-only PreToolUse support (#15211)
    - add `PreToolUse` hook for bash-like tool execution only at first
    - block shell execution before dispatch with deny-only hook behavior
    - introduces common.rs matcher framework for matching when hooks are run
    
    example run:
    
    ```
    › run three parallel echo commands, and the second one should echo "[block-pre-tool-use]" as a test
    
    
    • Running the three echo commands in parallel now and I’ll report the output directly.
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo "first parallel echo"
      
    PreToolUse hook (blocked)
      warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
      feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo "third parallel echo"
    
    • Ran echo "first parallel echo"
      └ first parallel echo
    
    • Ran echo "third parallel echo"
      └ third parallel echo
    
    • Three little waves went out in parallel.
    
      1. printed first parallel echo
      2. was blocked before execution because it contained the exact test string [block-pre-tool-use]
      3. printed third parallel echo
    
      There was also an unrelated macOS defaults warning around the successful commands, but the echoes
      themselves worked fine. If you want, I can rerun the second one with a slightly modified string so
      it passes cleanly.
    ```
  • feat: support disable skills by name. (#15378)
    Support disabling skills by name, primarily for plugin skills. We can’t
    use the path, since plugin skill paths may change across versions.
  • tui: queue follow-ups during manual /compact (#15259)
    ## Summary
    - queue input after the user submits `/compact` until that manual
    compact turn ends
    - mirror the same behavior in the app-server TUI
    - add regressions for input queued before compact starts and while it is
    running
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add realtime transcript notification in v2 (#15344)
    - emit a typed `thread/realtime/transcriptUpdated` notification from
    live realtime transcript deltas
    - expose that notification as flat `threadId`, `role`, and `text` fields
    instead of a nested transcript array
    - continue forwarding raw `handoff_request` items on
    `thread/realtime/itemAdded`, including the accumulated
    `active_transcript`
    - update app-server docs, tests, and generated protocol schema artifacts
    to match the delta-based payloads
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Feat/restore image generation history (#15223)
    Restore image generation items in resumed thread history
  • feat(app-server): add mcpServer/startupStatus/updated notification (#15220)
    Exposes the legacy `codex/event/mcp_startup_update` event as an API v2
    notification.
    
    The legacy event has this shape:
    ```
    #[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
    pub struct McpStartupUpdateEvent {
        /// Server name being started.
        pub server: String,
        /// Current startup status.
        pub status: McpStartupStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
    #[serde(rename_all = "snake_case", tag = "state")]
    #[ts(rename_all = "snake_case", tag = "state")]
    pub enum McpStartupStatus {
        Starting,
        Ready,
        Failed { error: String },
        Cancelled,
    }
    ```
  • [hooks] use a user message > developer message for prompt continuation (#14867)
    ## Summary
    
    Persist Stop-hook continuation prompts as `user` messages instead of
    hidden `developer` messages + some requested integration tests
    
    This is a followup to @pakrym 's comment in
    https://github.com/openai/codex/pull/14532 to make sure stop-block
    continuation prompts match training for turn loops
    
    - Stop continuation now writes `<hook_prompt hook_run_id="...">stop
    hook's user prompt<hook_prompt>`
    - Introduces quick-xml dependency, though we already indirectly depended
    on it anyway via syntect
    - This PR only has about 500 lines of actual logic changes, the rest is
    tests/schema
    
    ## Testing
    
    Example run (with a sessionstart hook and 3 stop hooks) - this shows
    context added by session start, then two stop hooks sending their own
    additional prompts in a new turn. The model responds with a single
    message addressing both. Then when that turn ends, the hooks detect that
    they just ran using `stop_hook_active` and decide not to infinite loop
    
    test files for this (unzip, move codex -> .codex):
    [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)
    
    ```
    › cats
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
      cat facts, cat breeds, cat names, or build something cat-themed in this repo.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: cook the stonpet
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: eat the cooked stonpet
    
    • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
      rested until the hyperspace juices settle.
    
      Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
      smoky, bright, and totally out of this dimension.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    ```
  • feat: support product-scoped plugins. (#15041)
    1. Added SessionSource::Custom(String) and --session-source.
      2. Enforced plugin and skill products by session_source.
      3. Applied the same filtering to curated background refresh.
  • Add thread/shellCommand to app server API surface (#14988)
    This PR adds a new `thread/shellCommand` app server API so clients can
    implement `!` shell commands. These commands are executed within the
    sandbox, and the command text and output are visible to the model.
    
    The internal implementation mirrors the current TUI `!` behavior.
    - persist shell command execution as `CommandExecution` thread items,
    including source and formatted output metadata
    - bridge live and replayed app-server command execution events back into
    the existing `tui_app_server` exec rendering path
    
    This PR also wires `tui_app_server` to submit `!` commands through the
    new API.
  • Simple directory mentions (#14970)
    - Adds simple support for directory mentions in the TUI.
    - Codex App/VS Code will require minor change to recognize a directory
    mention as such and change the link behavior.
    - Directory mentions have a trailing slash to differentiate from
    extensionless files
    
    
    <img width="972" height="382" alt="image"
    src="https://github.com/user-attachments/assets/8035b1eb-0978-465b-8d7a-4db2e5feca39"
    />
    <img width="978" height="228" alt="image"
    src="https://github.com/user-attachments/assets/af22cf0b-dd10-4440-9bee-a09915f6ba52"
    />