Commit Graph

6961 Commits

  • feat: Add focused diagnostics for MCP HTTP send failures (#25013)
    Adds failure-only logging for MCP streamable HTTP post_message calls and
    the underlying reqwest send path, capturing the MCP method/request id,
    endpoint shape, auth-header presence, timeout/connect classification,
    and sanitized error source chain without logging headers, bodies,
    tokens, or full URLs.
  • Move config document helpers into their own module (#25110)
    ## Why
    
    `core/src/config/edit.rs` owns the config edit state machine, but it
    also carried the TOML document helper code inline as a nested module.
    Moving those helpers into their own file keeps the edit orchestration
    easier to scan without changing the config persistence behavior.
    
    ## What changed
    
    - Moved the existing `document_helpers` module from
    `core/src/config/edit.rs` into
    `core/src/config/edit/document_helpers.rs`.
    - Added `mod document_helpers;` so the existing `pub(super)` helper API
    remains available to the rest of `config::edit`.
    
    ## Testing
    
    Not run; this is a refactor-only module extraction with no intended
    behavior change.
  • Show activity for standalone web search calls (#24693)
    ## Why
    
    Standalone `web.run` calls run in the extension, so they need normal
    web-search progress activity while a request is in flight and durable
    completed activity after a thread is reloaded.
    
    Follow-up to #23823; uses the extension turn-item emission path added in
    #24813.
    
    ## What changed
    
    - Emit standalone `web.run` start/completion items through the host
    turn-item emitter, preserving standard client delivery and rollout
    persistence.
    - Include useful completion detail for queries, image queries, and
    literal-URL `open`/`find` commands.
    - Render completed searches as `Searched the web` or `Searched the web
    for <detail>`, with snapshot coverage for the detail-free case.
    - Extend the app-server round-trip test to verify completed search
    activity is reconstructed by `thread/read` after a fresh-process reload.
    
    ## Testing
    
    - `just test -p codex-web-search-extension`
    - `just test -p codex-app-server -E
    "test(standalone_web_search_round_trips_encrypted_output)"`
  • [codex] Add model tool mode selector (#25031)
    ## Why
    Some models need to select their code-execution behavior through model
    catalog metadata. Models without that metadata must continue to follow
    the existing `CodeMode` and `CodeModeOnly` feature flags, including when
    a newer server sends an enum value this client does not recognize.
    
    ## What changed
    - add optional `ModelInfo.tool_mode` metadata with `direct`,
    `code_mode`, and `code_mode_only`
    - treat omitted and unknown wire values as `None`
    - resolve `None` from the existing feature flags
    - carry the resolved `ToolMode` directly on `TurnContext`, outside
    `Config`
    - use the resolved value for turn creation, model switches, review
    turns, tool planning, and code execution
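
    The resolution order described above can be sketched roughly as follows. This is an illustration rather than the code from the PR: the function names and the `bool` flag parameters are hypothetical, and the precedence of `code_mode_only` over `code_mode` in the fallback is an assumption.

    ```rust
    /// Code-execution behavior selected for a turn.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ToolMode {
        Direct,
        CodeMode,
        CodeModeOnly,
    }

    /// Parse the optional wire value. Omitted and unrecognized values both become `None`.
    pub fn parse_tool_mode(wire: Option<&str>) -> Option<ToolMode> {
        match wire? {
            "direct" => Some(ToolMode::Direct),
            "code_mode" => Some(ToolMode::CodeMode),
            "code_mode_only" => Some(ToolMode::CodeModeOnly),
            // A newer server may send a value this client does not know.
            _ => None,
        }
    }

    /// Explicit metadata wins; `None` falls back to the feature flags.
    /// Assumption: `code_mode_only` takes precedence over `code_mode`.
    pub fn resolve_tool_mode(
        metadata: Option<ToolMode>,
        code_mode: bool,
        code_mode_only: bool,
    ) -> ToolMode {
        metadata.unwrap_or(if code_mode_only {
            ToolMode::CodeModeOnly
        } else if code_mode {
            ToolMode::CodeMode
        } else {
            ToolMode::Direct
        })
    }
    ```

    Mapping unknown values to `None` at parse time means an older client degrades to its flag-driven behavior instead of failing to deserialize the model catalog.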
    
    ## Coverage
    - add protocol coverage for omitted, known, and unknown enum values
    - add focused coverage for flag fallback and explicit metadata
    overriding feature flags
    - add core integration coverage that fetches remote model metadata
    through `/v1/models` and verifies the outbound `/responses` tools for
    explicit `direct` and `code_mode_only` selectors
    
    ## Stack
    - followed by #25032
  • Render multiline hook output in TUI (#24965)
    # Why
    
    Fixes #24529. Completed hook output in the TUI rendered each
    `HookOutputEntry` as one ratatui line, so explicit newlines inside hook
    output were not shown as separate transcript rows. That made multiline
    `SessionStart.additionalContext` hard to inspect even though the
    model-facing context path preserved the original text.
    
    # What
    
    - Split completed hook output entries on explicit newlines before
    rendering them in `codex-rs/tui/src/history_cell/hook_cell.rs`.
    - Keep the hook output prefix, such as `hook context:` or `warning:`, on
    the first physical line only.
    - Preserve explicit blank lines and render continuation lines with the
    hook body indent.
    - Add unit coverage for multiline context and warning output, plus a
    chatwidget snapshot regression for `SessionStart` history output.
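
    A minimal sketch of the splitting rule, using plain strings instead of ratatui lines. The function name and parameters are hypothetical, and rendering blank lines as empty rows (without the indent) is an assumption.

    ```rust
    /// Render one hook output entry as transcript rows.
    /// The prefix appears on the first row only; later rows get the body indent.
    pub fn render_hook_entry(prefix: &str, text: &str, indent: &str) -> Vec<String> {
        text.split('\n') // `split` keeps empty segments, so blank lines survive
            .enumerate()
            .map(|(i, line)| {
                if i == 0 {
                    format!("{prefix} {line}")
                } else if line.is_empty() {
                    String::new()
                } else {
                    format!("{indent}{line}")
                }
            })
            .collect()
    }
    ```

    For example, `render_hook_entry("warning:", "first\n\nsecond", "  ")` yields three rows: the prefixed first line, an empty row, and the indented continuation.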
    
    # Testing
    
    - `cargo nextest run -p codex-tui completed_hook_multiline
    hook_completed_before_reveal_renders_completed_without_running_flash`
    - `just argument-comment-lint -p codex-tui -- --ignore-rust-version
    --lib --tests`
  • Remove stale rollout TODO tests (#25106)
    ## Summary
    
    Remove a stale `TODO(jif)` block of commented-out rollout listing tests
    that still referenced an older listing API.
    
    The current rollout listing behavior is covered by the active state DB
    and filesystem fallback tests, so keeping the dead commented tests just
    adds noise.
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-rollout`
  • Handle goal usage limits from turn errors (#25095)
    ## Summary
    - handle goal usage-limit turn errors in the goal extension
    - exercise the extension path in the goal backend test
    
    ## Tests
    - just fmt
    - just test -p codex-goal-extension
    - just fix -p codex-goal-extension
  • [codex] Improve built-in tool schema docs (#24794)
    ## Summary
    - Clarify default, omission, and bounded behavior across built-in tool
    schemas, including unified exec, classic shell, Code Mode exec/wait,
    multi-agent, agent job, MCP resource, image, goal, plan, tool_search,
    and test-sync fields.
    - Convert update_plan status to an enum and add short field descriptions
    where the schema previously relied on surrounding context.
    - Remove the dedicated permission-approval schema test and keep only
    updates to existing expected-spec tests.
    
    ## Validation
    - Ran `just fmt`.
    - Ran `git diff --check`.
    - Did not run clippy or tests, per request.
    
    Regressions were evaluated
    [here](https://openai.slack.com/archives/C09GDSP1J9X/p1779905065496949),
    and none were found.
  • Drop debug-client prompt state tracking (#25070)
    Deletes `codex-rs/debug-client/src/state.rs` as one step in removing the
    stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Remove debug-client server event reader (#25069)
    Deletes `codex-rs/debug-client/src/reader.rs` as one step in removing
    the stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Delete debug-client JSONL output helper (#25068)
    Deletes `codex-rs/debug-client/src/output.rs` as one step in removing
    the stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Remove the debug-client CLI entrypoint (#25067)
    Deletes `codex-rs/debug-client/src/main.rs` as one step in removing the
    stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Retire debug-client interactive command parsing (#25066)
    Deletes `codex-rs/debug-client/src/commands.rs` as one step in removing
    the stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Delete debug-client app-server process plumbing (#25065)
    Deletes `codex-rs/debug-client/src/client.rs` as one step in removing
    the stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Remove the generated debug-client README (#25064)
    Deletes `codex-rs/debug-client/README.md` as one step in removing the
    stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Drop the stale debug-client manifest (#25063)
    Deletes `codex-rs/debug-client/Cargo.toml` as one step in removing the
    stale app-server debug client.
    
    This intentionally leaves Cargo workspace and lockfile cleanup for a
    later follow-up PR.
  • Use inject_if_running for active goal steering (#24924)
    ## Why
    
    This PR is stacked on #24918, which moves goal steering onto
    source-labeled internal model context fragments. Active-turn goal
    steering should use the same running-turn injection path as other
    runtime steering, so those fragments enter the pending input queue as
    `ResponseItem`s through the existing
    [`Session::inject_if_running`](https://github.com/openai/codex/blob/8d6f6cdf69b055c27682e7cdea9caf72a3e2ee7f/codex-rs/core/src/session/inject.rs#L12-L27)
    behavior instead of through a goal-specific conversion wrapper.
    
    ## What Changed
    
    - Exposes a narrow `CodexThread::inject_if_running` bridge for callers
    that only hold a thread handle.
    - Changes `ext/goal` active-turn steering to pass `ResponseItem`s
    directly.
    - Builds goal steering prompts as contextual internal model context
    `ResponseItem`s before injecting them into the running turn.
    
    ## Testing
    
    Not run locally; PR metadata update only.
  • Use internal model context fragments for goal steering (#24918)
    ## Why
    
    Goal steering is one form of runtime-owned model context, but the old
    `<goal_context>` wrapper made the contextual-fragment hiding path
    goal-specific. Using a source-labeled internal context fragment gives
    core and extensions a shared shape for hidden model steering while
    keeping those prompts out of visible turn history.
    
    The change also keeps legacy `<goal_context>` messages recognized as
    hidden contextual input so existing stored history does not start
    rendering old goal-steering prompts as user-visible turn items.
    
    ## What Changed
    
    - Replaces `GoalContext` with `InternalModelContextFragment` plus a
    validated `InternalContextSource`.
    - Renders goal steering as `<codex_internal_context
    source="goal">...</codex_internal_context>`.
    - Updates core goal steering and `ext/goal` steering to inject the new
    internal-context fragment.
    - Updates contextual-fragment, event-mapping, goal, and session tests
    for the new wrapper.
    
    ## Test Coverage
    
    - Adds coverage for detecting the new internal model context fragment.
    - Preserves coverage for hiding legacy `<goal_context>` fragments.
    - Verifies invalid internal context sources are rejected and arbitrary
    context tags are not hidden.
    - Updates goal steering/session assertions to expect the new
    `source="goal"` wrapper.
  • Fix fs/watch debounce batching (#24716)
    ## Summary
    
    `fs/watch` was using a local debounce wrapper whose deadline was
    initialized once and then reused after the first batch. Once that stale
    deadline was in the past, later file changes could bypass the intended
    200ms debounce and send noisier `fs/changed` notifications.
    
    This moves the debounce wrapper into `codex-file-watcher` as
    `DebouncedWatchReceiver`, resets the debounce deadline for each event
    batch, preserves pending paths across cancelled receives, and updates
    app-server `fs/watch` to use the shared wrapper.
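
    The per-batch deadline reset can be illustrated with a small, time-injected sketch. This is not the `DebouncedWatchReceiver` implementation: the `Debouncer` type and its methods are hypothetical, and the caller supplies `now` so the behavior is deterministic.

    ```rust
    use std::collections::BTreeSet;
    use std::path::PathBuf;
    use std::time::{Duration, Instant};

    pub struct Debouncer {
        window: Duration,
        deadline: Option<Instant>,
        pending: BTreeSet<PathBuf>,
    }

    impl Debouncer {
        pub fn new(window: Duration) -> Self {
            Self { window, deadline: None, pending: BTreeSet::new() }
        }

        /// Record a changed path. The first event of a batch arms a fresh deadline.
        pub fn push(&mut self, path: PathBuf, now: Instant) {
            self.pending.insert(path);
            if self.deadline.is_none() {
                self.deadline = Some(now + self.window);
            }
        }

        /// Return the batch once its deadline has passed. Clearing the deadline
        /// here is what keeps a stale one from being reused by the next batch.
        pub fn poll(&mut self, now: Instant) -> Option<Vec<PathBuf>> {
            match self.deadline {
                Some(deadline) if now >= deadline => {
                    self.deadline = None;
                    Some(std::mem::take(&mut self.pending).into_iter().collect())
                }
                _ => None,
            }
        }
    }
    ```

    Because pending paths live in the struct rather than in a local of the receive call, an abandoned `poll` does not lose them.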
    
    Fixes #24692.
  • fix: preserve deny-read sandboxing for safe commands (#23943)
    ## Why
    
    Permission profiles can mark filesystem entries as unreadable with
    `deny` rules, including glob patterns. Several shell execution paths
    treated known-safe commands or execpolicy `allow` rules as sufficient to
    run outside the filesystem sandbox. That is not valid for read-capable
    commands: for example, `cat` or `ls` may be reasonable to allow
    generally, but dropping the sandbox would also drop deny-read
    constraints such as `**/*.env`.
    
    ## What changed
    
    - Added a shared check that treats active deny-read restrictions as
    incompatible with unsandboxed execution.
    - Kept first-attempt execution sandboxed for explicit escalation and
    execpolicy allow bypasses when deny-read entries are present.
    - Prevented no-sandbox retry after a sandbox denial when the active
    filesystem policy contains deny-read entries.
    - Updated the zsh-fork execve path so prefix-rule `allow` decisions
    continue inside the current sandbox when deny-read restrictions are
    active.
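
    The shared check amounts to one predicate consulted at every point where the sandbox could be dropped. A simplified sketch follows; the `FsPolicy`, `Placement`, and function names are hypothetical and much smaller than the real permission-profile types.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Access {
        Read,
        Write,
        Deny,
    }

    /// Simplified filesystem policy: (path or glob, access) pairs.
    pub struct FsPolicy {
        pub entries: Vec<(String, Access)>,
    }

    impl FsPolicy {
        /// True when any entry marks a path or glob as unreadable.
        pub fn has_deny_read(&self) -> bool {
            self.entries.iter().any(|(_, access)| *access == Access::Deny)
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum Placement {
        Sandboxed,
        Unsandboxed,
    }

    /// A safe-command match or execpolicy `allow` may skip the sandbox only
    /// when no deny-read entries are active.
    pub fn first_attempt(policy: &FsPolicy, bypass_requested: bool) -> Placement {
        if bypass_requested && !policy.has_deny_read() {
            Placement::Unsandboxed
        } else {
            Placement::Sandboxed
        }
    }

    /// After a sandbox denial, an unsandboxed retry is off the table while
    /// deny-read entries are active.
    pub fn may_retry_without_sandbox(policy: &FsPolicy) -> bool {
        !policy.has_deny_read()
    }
    ```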
    
    ## Verification
    
    - `cargo test -p codex-core tools::sandboxing::tests`
    - `cargo test -p codex-core
    tools::runtimes::shell::unix_escalation::tests`
    - `cargo test -p codex-core
    shell_command_enforces_glob_deny_read_policy`
  • Seed prompt history from resumed messages (#24298)
    ## Why
    
    When the TUI resumes a thread, transcript replay renders prior user
    messages but did not seed the composer history. That leaves the resumed
    session with empty in-memory prompt history, so pressing Up can fall
    through to persisted global history and surface a prompt from another
    thread.
    
    The expected behavior is that prompts from the resumed thread are
    recalled first, with global history only as a fallback.
    
    ## What changed
    
    - Record replayed user messages into the composer history during resume
    replay.
    - Preserve the existing persisted history format and avoid any startup
    history scan.
    - Add focused TUI coverage showing replayed prompts are recalled before
    persisted global history.
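
    The intended recall order can be modeled with two lists and a cursor. This is a hypothetical sketch, not the TUI's composer history type; the struct and method names are assumptions.

    ```rust
    pub struct ComposerHistory {
        local: Vec<String>,  // prompts from the resumed thread, oldest first
        global: Vec<String>, // persisted cross-thread history, oldest first
        cursor: usize,       // how many entries back from the newest
    }

    impl ComposerHistory {
        pub fn new(global: Vec<String>) -> Self {
            Self { local: Vec::new(), global, cursor: 0 }
        }

        /// Called for each user message seen during resume replay.
        pub fn record_replayed(&mut self, prompt: &str) {
            self.local.push(prompt.to_string());
            self.cursor = 0;
        }

        /// Pressing Up: thread-local prompts first (newest first), then global.
        pub fn up(&mut self) -> Option<&str> {
            let total = self.local.len() + self.global.len();
            if self.cursor >= total {
                return None;
            }
            let back = self.cursor;
            self.cursor += 1;
            let entry = if back < self.local.len() {
                &self.local[self.local.len() - 1 - back]
            } else {
                &self.global[total - 1 - back]
            };
            Some(entry.as_str())
        }
    }
    ```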
    
    ## Validation
    
    - Added `replayed_user_messages_seed_composer_history` in
    `codex-rs/tui/src/chatwidget/tests/history_replay.rs`.
    - `just test -p codex-tui replayed_user_messages_seed_composer_history`
    passed.
  • Add runtime extra skill roots API (#24977)
    ## Summary
    - Add v2 `skills/extraRoots/set` to replace app-server process-local
    standalone skill roots. The setting is not persisted, accepts missing
    roots, and `extraRoots: []` clears the runtime set.
    - Wire runtime roots into core skill discovery for `skills/list` and
    turn loads, clear skill caches on set, and register the roots with the
    skills watcher so later filesystem changes emit `skills/changed`.
    - Update app-server docs, generated JSON/TypeScript schemas, and
    coverage for serialization, missing roots, empty clears, and restart
    behavior.
    
    ## Testing
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core-skills`
    - `cargo test -p codex-app-server
    skills_extra_roots_set_updates_process_runtime_roots`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-core-skills`
    - `just fix -p codex-app-server`
  • [codex] Avoid PowerShell safety parsing off Windows (#24946)
    ## Summary
    
    This fixes BUGB-17567 by preventing non-Windows command safety
    classification from invoking the Windows PowerShell safelist/parser
    path.
    
    Previously, `is_known_safe_command` called the Windows PowerShell
    classifier on every platform. That classifier recognizes
    `pwsh`/`powershell` by basename and delegates script parsing to the
    PowerShell AST parser. The parser starts the supplied executable, so on
    macOS/Linux a repository-controlled `pwsh` path could execute during
    safety parsing before the normal sandboxed command execution path.
    
    The change gates the Windows PowerShell classifier and module behind
    `#[cfg(windows)]`. On macOS/Linux, PowerShell-looking commands are no
    longer auto-approved by the Windows classifier and instead fall through
    to the normal non-Windows safe-command logic.
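
    The gating pattern looks roughly like this. The module and helper names are hypothetical, and the stub classifier matches a single literal command purely for illustration; it is not the real safelist and never spawns a process.

    ```rust
    /// Stand-in for the Windows-only PowerShell classifier. Because the whole
    /// module is behind `#[cfg(windows)]`, other platforms cannot reach it.
    #[cfg(windows)]
    mod windows_powershell {
        pub fn is_safe(argv: &[&str]) -> bool {
            matches!(argv, ["pwsh" | "powershell", "-Command", "Get-ChildItem"])
        }
    }

    /// Platform-independent safe-command check (illustrative list only).
    fn is_generic_safe(argv: &[&str]) -> bool {
        matches!(argv.first().copied(), Some("ls" | "cat" | "pwd"))
    }

    pub fn is_known_safe_command(argv: &[&str]) -> bool {
        // Compiled out entirely on macOS/Linux.
        #[cfg(windows)]
        {
            if windows_powershell::is_safe(argv) {
                return true;
            }
        }
        is_generic_safe(argv)
    }
    ```

    Using `#[cfg(windows)]` rather than a runtime `cfg!(windows)` check means the classifier is not even compiled on other targets, so no code path can reach it there.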
    
    ## Validation
    
    - `/private/tmp/codex-tools/bin/just fmt`
    - `PATH=/private/tmp/codex-tools/bin:$PATH
    /private/tmp/codex-tools/bin/just test -p codex-shell-command`
    
    The focused test run passed 135 tests with 0 skipped and completed the
    crate bench-smoke step.
    
    ## Notes
    
    This PR is scoped to the BUGB-17567 macOS/Linux path. Windows still uses
    the PowerShell classifier; a separate hardening follow-up should ensure
    Windows safety parsing only executes a trusted PowerShell parser binary
    and does not spawn the command's `argv[0]` when that path may be
    repository-controlled.
  • fix(config): use deny for Unix socket permissions (#24970)
    ## Why
    
    Unix socket permissions still accepted and displayed `"none"` while file
    permissions use the clearer `"deny"` spelling. This keeps network Unix
    socket policy vocabulary consistent with filesystem policy vocabulary.
    
    ## What changed
    
    - Replace the Unix socket permission variant and serialized spelling
    from `none` to `deny` across config, feature configuration, and network
    proxy types.
    - Update app-server v2 serialization, TUI debug output, focused tests,
    and generated schemas to expose `"deny"`.
    - Add coverage for denied Unix socket entries in managed requirements
    and profile overlay behavior.
    
    ## Security
    
    This is a vocabulary change for explicit Unix socket rejection, not a
    network access expansion. Denied entries continue to be omitted from the
    effective allowlist.
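
    As a sketch of that invariant: denied entries stay visible in the configuration but never reach the allowlist. The enum and function names here are hypothetical, and the `"allow"` spelling of the permitting variant is an assumption; only `"deny"` is confirmed by this change.

    ```rust
    use std::collections::BTreeMap;

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum UnixSocketPermission {
        Allow,
        Deny,
    }

    impl UnixSocketPermission {
        /// Serialized spelling. `"deny"` replaces the former `"none"`.
        pub fn as_str(self) -> &'static str {
            match self {
                UnixSocketPermission::Allow => "allow", // assumed spelling
                UnixSocketPermission::Deny => "deny",
            }
        }
    }

    /// Only allowed socket paths reach the effective allowlist.
    pub fn effective_allowlist(
        entries: &BTreeMap<String, UnixSocketPermission>,
    ) -> Vec<&str> {
        entries
            .iter()
            .filter(|(_, permission)| **permission == UnixSocketPermission::Allow)
            .map(|(path, _)| path.as_str())
            .collect()
    }
    ```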
    
    ## Validation
    
    - `just fmt`
    - `just write-config-schema`
    - `just write-app-server-schema`
    - `just test -p codex-config -p codex-core -p codex-app-server-protocol
    -p codex-tui -E
    'test(network_requirements_are_preserved_as_constraints_with_source) |
    test(network_permission_containers_project_allowed_and_denied_entries) |
    test(network_toml_overlays_unix_socket_permissions_by_path) |
    test(permissions_profiles_resolve_extends_parent_first_with_child_overrides)
    | test(network_requirements_serializes_canonical_and_legacy_fields) |
    test(debug_config_output_formats_unix_socket_permissions)'`
    - Automatic `bench-smoke` follow-up from `just test`
    - `cargo clippy -p codex-config -p codex-core -p codex-features -p
    codex-network-proxy -p codex-app-server-protocol -p codex-app-server -p
    codex-tui --all-targets -- -D warnings`
  • feat(app-server): migrate remote control to server tokens (#24141)
    ## Why
    
    `codex-backend` now authenticates remote-control server websocket
    connections with short-lived server tokens instead of the user's ChatGPT
    access token. `app-server` needs to mint and refresh those server tokens
    without persisting them, so a restart can reconnect from durable
    enrollment identity while keeping the bearer token memory-only.
    
    ## What Changed
    
    Updated the remote-control transport to consume `remote_control_token`
    and `expires_at` from server enroll responses and added
    `/server/refresh` support for persisted enrollments or expiring cached
    tokens.
    
    Websocket handshakes now send `Authorization: Bearer
    <remote_control_token>` with the existing server identity headers, and
    no longer send the ChatGPT bearer token or `chatgpt-account-id` on that
    websocket path.
    
    The in-memory enrollment state now owns the ephemeral server token
    cache, while SQLite still persists only `server_id`, `environment_id`,
    and `server_name`. Websocket `401`/`403` clears only the cached token
    for refresh on reconnect; websocket or refresh `404` clears stale
    persisted enrollment and re-enrolls. Response body previews redact
    `remote_control_token` before surfacing parse errors.
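
    The status handling above can be read as a small state machine. The sketch below uses hypothetical type and method names; dropping the cached token on `404` and the generic `Retry` fallback for other statuses are assumptions.

    ```rust
    /// Durable enrollment identity (the only part that is persisted).
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Enrollment {
        pub server_id: String,
        pub environment_id: String,
        pub server_name: String,
    }

    #[derive(Debug, Default)]
    pub struct RemoteControlState {
        pub persisted: Option<Enrollment>,
        pub cached_token: Option<String>, // memory-only bearer token
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum NextStep {
        RefreshToken,
        Enroll,
        Retry,
    }

    impl RemoteControlState {
        /// Decide how to recover from a failed websocket or refresh response.
        pub fn on_failure_status(&mut self, status: u16) -> NextStep {
            match status {
                // Token rejected: keep the enrollment, refresh on reconnect.
                401 | 403 => {
                    self.cached_token = None;
                    NextStep::RefreshToken
                }
                // Enrollment unknown to the backend: start over.
                404 => {
                    self.persisted = None;
                    self.cached_token = None; // assumption: a stale token is dropped too
                    NextStep::Enroll
                }
                _ => NextStep::Retry,
            }
        }
    }
    ```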
    
    ## Verification
    
    - `just test -p codex-app-server-transport`
    - Manual prod smoke with an isolated `CODEX_HOME`: `codex remote-control
    --json -c 'chatgpt_base_url="https://chatgpt.com/backend-api"'` reached
    `status:"connected"` with
    `environmentId:"env_i_6a17d9f1d764832986da2e80f4554f1b"`.
  • Tighten hook output event schemas (#24962)
    # Why
    
    Fixes #23993.
    
    Hook command output schemas are published as the contract for hook
    authors and schema-driven tooling. The event-specific output schemas
    previously described `hookSpecificOutput.hookEventName` as the global
    `HookEventNameWire` enum, so a `pre-tool-use.command.output` schema
    would validate mismatched values like `PostToolUse`. That made the
    schemas less precise than the intended event-specific contract.
    
    # What
    
    Constrain each hook-specific output schema to the matching literal
    `hookEventName` value, mirroring the existing input-schema shape.
    
    Also split `SubagentStartHookSpecificOutputWire` from the session-start
    output wire so `subagent-start.command.output.schema.json` can emit
    `const: "SubagentStart"` instead of sharing the session-start
    definition.
    
    # Verification
    
    - `cargo nextest run -p codex-hooks`
    - `just fix -p codex-hooks`
    - `just argument-comment-lint -p codex-hooks -- --all-targets`
  • windows-sandbox: fix capture cancellation test roots (#24974)
    ## Why
    
    The Windows Bazel job on `main` started failing after #24108 because one
    Windows-only capture test still passed `cwd.as_path()` to
    `run_windows_sandbox_capture`. That helper now expects the explicit
    `workspace_roots` slice introduced by #24108, so the Windows test target
    no longer compiled.
    
    ## What Changed
    
    - Updates `legacy_capture_cancellation_is_not_reported_as_timeout` to
    pass `workspace_roots_for(cwd.as_path()).as_slice()`, matching the
    adjacent capture test and the new runner signature.
    
    ## Verification
    
    - GitHub Actions CI is the important validation for this Windows-only
    compile path.
    - Created quickly to get Windows CI running while the separate Ubuntu
    `compact_resume_fork` timeout is still under investigation.
  • windows-sandbox: pass workspace roots to runner (#24108)
    ## Why
    
    #23813 switches the Windows sandbox runner path to `PermissionProfile`,
    but it still left one runtime anchor for resolving symbolic
    `:workspace_roots` entries. That is not enough once a turn has multiple
    effective workspace roots: exact entries and deny globs under
    `:workspace_roots` need to be materialized for every runtime root before
    the command runner chooses token mode or builds ACL plans.
    
    ## What Changed
    
    - Replaces the Windows runner/setup `permission_profile_cwd` plumbing
    with `workspace_roots: Vec<AbsolutePathBuf>`.
    - Resolves Windows-local `PermissionProfile` data with
    `materialize_project_roots_with_workspace_roots(...)` instead of the
    single-cwd helper.
    - Threads `Config::effective_workspace_roots()` through core execution,
    unified exec, TUI setup/read-grant flows, app-server setup, app-server
    `command/exec`, and `debug sandbox` on Windows.
    - Preserves those workspace roots through the zsh-fork escalation
    executor instead of rebuilding them from `sandbox_policy_cwd`.
    - Makes `ExecRequest::new(...)` and the remaining
    `build_exec_request(...)` helper path take
    `windows_sandbox_workspace_roots` explicitly so new call sites cannot
    silently fall back to `vec![cwd]`.
    - Clarifies the `debug sandbox` non-Windows comment: remaining
    cwd-dependent resolution still uses `sandbox_policy_cwd`, while
    `:workspace_roots` entries are already materialized from config roots.
    - Updates elevated runner IPC `SpawnRequest` to send `workspace_roots`
    and bumps the framed IPC protocol version to `3` for the payload shape
    change.
    - Adds Windows-local resolver coverage for expanding exact and glob
    `:workspace_roots` entries across multiple roots, plus core helper
    coverage proving explicit roots are preserved.
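
    Materializing `:workspace_roots` entries for every root is essentially a cross product of entries and roots. A simplified sketch, with a hypothetical `materialize` function standing in for `materialize_project_roots_with_workspace_roots(...)`, whose real signature is not shown here:

    ```rust
    use std::path::PathBuf;

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Access {
        Read,
        Write,
        Deny,
    }

    /// Expand each root-relative entry (exact path or glob) under every root.
    pub fn materialize(
        roots: &[PathBuf],
        entries: &[(&str, Access)],
    ) -> Vec<(PathBuf, Access)> {
        let mut out = Vec::new();
        for (relative, access) in entries {
            for root in roots {
                // "." names the root itself.
                let path = if *relative == "." {
                    root.clone()
                } else {
                    root.join(relative)
                };
                out.push((path, *access));
            }
        }
        out
    }
    ```

    With a single anchor, a `deny` entry would be resolved under only one root and leave the same subtree readable under the others.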
    
    ## Verification
    
    - `cargo check -p codex-windows-sandbox -p codex-core -p codex-tui -p
    codex-cli -p codex-app-server`
    - `cargo test -p codex-windows-sandbox`
    - `cargo test -p codex-core windows_sandbox`
    - `cargo test -p codex-core unix_escalation`
    - `cargo test -p codex-app-server windows_sandbox`
    - `cargo test -p codex-tui windows_sandbox`
    - `cargo test -p codex-cli debug_sandbox`
    - `just test -p codex-core unified_exec`
    - `just test -p codex-core
    build_exec_request_preserves_windows_workspace_roots`
    - `env -u CODEX_NETWORK_PROXY_ACTIVE -u
    CODEX_NETWORK_ALLOW_LOCAL_BINDING just test -p codex-app-server --lib
    command_exec`
    - `just test -p codex-windows-sandbox`
    - `just test -p codex-exec sandbox`
    - `just fix -p codex-core -p codex-app-server -p codex-windows-sandbox`
    
    A local macOS cross-check with `cargo check --target
    x86_64-pc-windows-msvc ...` did not reach crate Rust code because native
    dependencies require Windows SDK headers (`windows.h` / `assert.h`) in
    this environment; Windows CI remains the real target validation.
    
    Two local targeted filters compile but do not run assertions on macOS:
    `env -u CODEX_NETWORK_PROXY_ACTIVE -u CODEX_NETWORK_ALLOW_LOCAL_BINDING
    just test -p codex-app-server --lib command_exec_processor` matched zero
    tests, and `just test -p codex-linux-sandbox landlock` matched zero
    tests because the landlock suite is Linux-only.
  • Surface filesystem permission profiles in prompt context (#23924)
    ## Summary
    Some permission profiles can encode filesystem reads that should remain
    unavailable to the agent. Before this change, the model-visible context
    and automatic approval review prompt summarized the effective
    permissions as a legacy sandbox mode, which can omit permission-profile
    filesystem entries from escalation decisions.
    
    For example, a profile can grant workspace access while denying a
    private subtree across every workspace root:
    
    ```toml
    default_permissions = "restricted-workspace"
    
    [permissions.restricted-workspace.workspace_roots]
    "/Users/alice/project" = true
    "/Users/alice/other-project" = true
    
    [permissions.restricted-workspace.filesystem]
    ":minimal" = "read"
    
    [permissions.restricted-workspace.filesystem.":workspace_roots"]
    "." = "write"
    "private" = "deny"
    "private/**" = "deny"
    ```
    
    The context window now describes the workspace roots and effective
    filesystem side of the `PermissionProfile` directly, with deny entries
    marked as non-escalatable:
    
    ```xml
    <environment_context>
      <cwd>/Users/alice/project</cwd>
      <shell>zsh</shell>
      <filesystem><workspace_roots><root>/Users/alice/project</root><root>/Users/alice/other-project</root></workspace_roots><permission_profile type="managed"><file_system type="restricted"><entry access="read"><special>:minimal</special></entry><entry access="write"><path>/Users/alice/project</path></entry><entry access="write"><path>/Users/alice/other-project</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/project/private</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/other-project/private</path></entry><entry access="deny" escalatable="false"><glob>/Users/alice/project/private/**</glob></entry><entry access="deny" escalatable="false"><glob>/Users/alice/other-project/private/**</glob></entry></file_system></permission_profile></filesystem>
    </environment_context>
    ```
    
    Managed requirements can impose the same kind of deny-read restriction:
    
    ```toml
    [permissions.filesystem]
    deny_read = [
      "/Users/alice/project/private",
      "/Users/alice/project/private/**",
    ]
    ```
    
    The automatic approval review prompt also receives the parent turn's
    denied-read context, so review decisions can account for the active
    permission profile.
    
    ## What Changed
    - Render the effective filesystem profile in `<environment_context>`,
    including profile type, filesystem entries, workspace roots, and
    non-escalatable deny entries.
    - Persist effective `workspace_roots` in `TurnContextItem` so
    resumed/replayed context does not have to bind `:workspace_roots`
    through legacy `cwd` fallback.
    - Add explicit permission instructions that denied reads are policy
    restrictions, not escalation targets.
    - Pass the parent turn's denied-read context into automatic approval
    reviews.
    - Add targeted coverage for prompt rendering, workspace-root
    materialization, replay context, and review prompt context.
    - Keep the prompt-context test expectations platform-aware so the same
    filesystem rendering assertions pass on Unix and Windows paths.
    
    ## Testing
    - `just test -p codex-core
    context::environment_context::tests::serialize_environment_context_with_full_filesystem_profile`
    - `just test -p codex-core
    context::environment_context::tests::turn_context_item_filesystem_uses_workspace_roots_instead_of_cwd`
    - `just test -p codex-core
    context::permissions_instructions::permissions_instructions_tests::builds_permissions_from_profile_with_denied_reads`
    - `just fix -p codex-core`
    
    I also attempted `just test -p codex-core`; the changed prompt-context
    tests passed, but the full local run did not complete cleanly in this
    sandboxed macOS environment due to unrelated user-shell `CODEX_SANDBOX*`
    expectations and integration-test timeouts.
  • [codex] Add user input client ids (#24653)
    ## Summary
    
    Adds an optional `clientId` field to app-server v2 `UserInput` and
    carries it through the core `UserInput` model so clients can correlate
    echoed user input items without relying on payload equality.
    
    ## Details
    
    - Adds `client_id: Option<String>` to core `UserInput` variants.
    - Exposes the v2 app-server field as `clientId` on the wire and in
    generated TypeScript.
    - Preserves the id when converting between app-server v2 and core
    protocol types.
    - Regenerates app-server schema fixtures.
    
    ## Validation
    
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-protocol`
    - `git diff --check`
  • fix(exec-server): reject websocket requests with Origin headers (#24947)
    ## Why
    
    `codex exec-server` has a local WebSocket listener, but it did not apply
    the same browser-origin request handling as the `app-server` WebSocket
    transport. Requests that carry an `Origin` header should not be upgraded
    by this local transport, keeping both local WebSocket servers consistent
    and avoiding unexpected browser-initiated connections.
    
    ## What changed
    
    - Added an Axum middleware guard in
    `codex-rs/exec-server/src/server/transport.rs` that returns `403
    Forbidden` for requests carrying an `Origin` header.
    - Added an integration test in `codex-rs/exec-server/tests/websocket.rs`
    that covers rejection of an `Origin`-bearing WebSocket handshake.
    - Kept ordinary WebSocket clients unchanged: existing no-`Origin`
    initialization and process behavior remains covered by the crate tests.
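
    Stripped of the Axum plumbing, the guard reduces to one header check. The function below is a framework-free sketch with a hypothetical name, not the middleware added in `transport.rs`:

    ```rust
    /// Decide how to answer an upgrade request from its headers.
    /// Returns `Some(403)` when an `Origin` header is present, `None` to proceed.
    pub fn origin_guard(headers: &[(&str, &str)]) -> Option<u16> {
        headers
            .iter()
            // Header names are case-insensitive.
            .any(|(name, _)| name.eq_ignore_ascii_case("origin"))
            .then_some(403)
    }
    ```

    Browsers attach `Origin` to WebSocket handshakes, while typical non-browser clients do not, so the check rejects browser-initiated connections without affecting ordinary clients.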
    
    ## Validation
    
    - `just test -p codex-exec-server` test phase (`186 passed`; run outside
    the parent macOS sandbox so nested sandbox tests can execute)
    - `just clippy -p codex-exec-server`
  • fix: cancel Windows sandbox on network denial (#19880)
    ## Why
    
    When Guardian or the sandbox network proxy detects and denies a network
    attempt, core cancels the associated execution through `ExecExpiration`.
    The Windows sandbox capture path was only forwarding the timeout
    component of that expiration state. As a result, a sandboxed Windows
    command whose network attempt had already been denied could keep running
    until its timeout elapsed rather than terminating promptly in response
    to the denial.
    
    This change closes that cancellation-propagation gap for Windows sandbox
    execution.
    
    ## What changed
    
    - Added `WindowsSandboxCancellationToken` as the cancellation hook
    exposed to Windows capture backends.
    - Extracted the cancellation token from `ExecExpiration` in core and
    passed it to both the direct and elevated Windows sandbox capture paths
    alongside the existing timeout.
    - Updated direct capture to poll for process exit, timeout, or
    cancellation and to terminate cancelled processes without reporting them
    as timed out.
    - Updated elevated capture to watch for cancellation and send the
    existing `Terminate` IPC frame to the elevated runner. The watcher parks
    for 50 ms between checks to bound response latency without a tight busy
    wait.
    - Added Windows regression coverage for a long-running PowerShell
    command: cancellation ends capture before its timeout and does not set
    `timed_out`.
    - Added a visible skip diagnostic when that PowerShell-dependent
    regression test cannot execute, and consolidated the duplicated
    expiration-policy branch identified in review.
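    
    A rough sketch of the three-way wait described above, assuming a
    polling loop with a 50 ms pause (the interval the PR states for the
    elevated watcher). `WaitOutcome`, `wait_for_outcome`, and the
    `has_exited` callback are hypothetical stand-ins, not the crate's API.
    
    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::thread;
    use std::time::{Duration, Instant};
    
    /// Why capture stopped waiting on the child process.
    #[derive(Debug, PartialEq)]
    enum WaitOutcome {
        Exited,
        TimedOut,
        Cancelled,
    }
    
    /// Poll until the process exits, cancellation fires, or the timeout elapses.
    /// `has_exited` is a hypothetical stand-in for a real process-status check.
    fn wait_for_outcome(
        mut has_exited: impl FnMut() -> bool,
        cancelled: &AtomicBool,
        timeout: Duration,
    ) -> WaitOutcome {
        let deadline = Instant::now() + timeout;
        loop {
            if has_exited() {
                return WaitOutcome::Exited;
            }
            // Check cancellation before the deadline so a cancelled run is
            // never reported as timed out.
            if cancelled.load(Ordering::SeqCst) {
                return WaitOutcome::Cancelled;
            }
            if Instant::now() >= deadline {
                return WaitOutcome::TimedOut;
            }
            // Pause between checks instead of busy-waiting.
            thread::sleep(Duration::from_millis(50));
        }
    }
    ```
    
    Keeping `Cancelled` distinct from `TimedOut` is what lets the caller
    leave `timed_out` unset for a cancelled command.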
    
    ## Security
    
    This improves enforcement after a denied network attempt has been
    attributed to a Windows sandboxed execution: the command no longer
    remains alive simply because Windows capture lost the cancellation
    signal.
    
    This PR does not claim to make Windows offline mode an airtight
    no-network or no-exfiltration boundary. It does not introduce
    AppContainer or change how network denial is detected; it makes an
    already-detected denial promptly stop the affected sandboxed command.
    
    ## Validation
    
    ### Commands run
    
    - `just fmt`
    - `cargo test -p codex-windows-sandbox`
    - `cargo test -p codex-core network_denial`
    - `cargo clippy -p codex-core -p codex-windows-sandbox --tests --no-deps
    -- -D warnings`
    - `just argument-comment-lint -p codex-windows-sandbox -p codex-core`
    
    The new capture regression is `cfg(target_os = "windows")`, so Windows
    CI is the execution coverage for that test path. The local macOS test
    runs validate the host-runnable crate and core network-denial behavior.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • runtime: prepend zsh fork bin dir to PATH (#23768)
    ## Why
    
    #23756 makes packaged Codex builds include and default to the bundled
    zsh fork. The important reason to put that fork's directory at the front
    of `PATH` is to keep executable-level escalation working after a command
    leaves the original shell and later re-enters zsh through `env`.
    
    The expected chain is:
    
    1. The zsh fork runs the top-level shell command.
    2. That command launches another program, such as `python3`, while
    inheriting the `EXEC_WRAPPER` environment and the escalation socket fd.
    3. That program spawns a shell script whose shebang is `#!/usr/bin/env
    zsh` rather than `#!/bin/zsh`, and it does not close the escalation fd.
    4. `/usr/bin/env` resolves `zsh` through `PATH`, so it must find the
    packaged zsh fork before the system zsh.
    5. Commands inside that nested script are intercepted by the zsh fork
    and can still request escalation from Codex.
    
    If `PATH` resolves `zsh` to the system shell instead, the nested script
    loses zsh-fork exec interception. Commands that should request
    escalation can then run only in the original sandbox, or fail there,
    without Codex ever receiving the approval request.
    
    Shell snapshots make this slightly more subtle: a snapshot can restore
    an older `PATH` after the child shell starts. This PR treats the zsh
    fork `PATH` prepend as an explicit environment override so snapshot
    wrapping preserves it.
    
    ## What Changed
    
    - Added shared zsh-fork runtime helpers that prepend the configured zsh
    executable parent directory to `PATH` without duplicate entries.
    - Applied the zsh fork `PATH` prepend to both zsh-fork `shell_command`
    launches and unified-exec zsh-fork launches before sandbox command
    construction.
    - Kept the shell-command zsh-fork backend API narrow: it derives the
    configured zsh path from session services and rebuilds its sandbox
    environment from `req.env`, rather than accepting a second, competing
    environment map or a separately threaded bin dir.
    - Kept Unix-only zsh-fork `PATH` mutation out of Windows clippy-visible
    mutability.
    - Added coverage for duplicate `PATH` entries, for preserving the zsh
    fork prepend through shell snapshot wrapping, and for the nested
    `python3` -> `#!/usr/bin/env zsh` escalation flow.
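    
    The duplicate-free prepend can be illustrated as follows. This is a
    sketch under assumptions: `prepend_path_entry` is a hypothetical name,
    it handles only the Unix `:` separator, and the real helper may treat
    existing or empty entries differently.
    
    ```rust
    /// Return `path` with `dir` as its first entry, dropping any later copies
    /// of `dir` so it appears exactly once. Empty entries are dropped too,
    /// which is a simplification made for this sketch.
    fn prepend_path_entry(path: &str, dir: &str) -> String {
        let mut entries = vec![dir];
        entries.extend(
            path.split(':')
                .filter(|entry| !entry.is_empty() && *entry != dir),
        );
        entries.join(":")
    }
    ```
    
    With the fork's directory first, `/usr/bin/env zsh` resolves to the
    packaged zsh fork even when the system zsh is also on `PATH`.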
    
    ## Testing
    
    - `just fmt`
    - `just fix -p codex-core`
    
    I left final test validation to CI after the latest review-comment
    cleanup. Before that cleanup, `just test -p codex-core zsh_fork` passed
    locally for the zsh-fork-focused tests.
  • [codex] Remove Bedrock OSS models from catalog (#24960)
    Remove the GPT OSS 120B and 20B entries from the Amazon Bedrock static
    model catalog, as they are no longer supported.
  • [codex] Handle PowerShell UTF-8 setup failures (#24949)
    Fixes #12496.
    
    ## Why
    
    Windows sandboxed PowerShell commands can run under
    `ConstrainedLanguage` on some machines, especially enterprise-managed
    Windows environments. In that mode, our PowerShell command prelude could
    fail before every command because it directly assigned
    `[Console]::OutputEncoding` to UTF-8. The actual user command still ran,
    but Codex surfaced noisy `Cannot set property. Property setting is
    supported only on core types in this language mode.` output for every
    shell call.
    
    ## What Changed
    
    - Makes the PowerShell UTF-8 output encoding prelude best-effort by
    wrapping the assignment in `try { ... } catch {}`.
    - Keeps the existing UTF-8 behavior when PowerShell allows the
    assignment.
    - Adds focused tests for adding the prelude and avoiding duplicate
    prelude insertion.
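    
    A minimal sketch of a best-effort, idempotent prelude. The prelude text
    and the `with_utf8_prelude` name are assumptions for illustration; the
    PR only states that the assignment is wrapped in `try { ... } catch {}`.
    
    ```rust
    /// Best-effort UTF-8 prelude. The exact text is an assumption for this sketch.
    const UTF8_PRELUDE: &str =
        "try { [Console]::OutputEncoding = [System.Text.Encoding]::UTF8 } catch {}; ";
    
    /// Prefix `command` with the prelude unless it is already present.
    fn with_utf8_prelude(command: &str) -> String {
        if command.starts_with(UTF8_PRELUDE) {
            command.to_string()
        } else {
            format!("{}{}", UTF8_PRELUDE, command)
        }
    }
    ```
    
    Because the `catch` block is empty, a failed assignment no longer
    writes an error before the user's command runs.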
    
    ## Validation
    
    - `cargo fmt -p codex-shell-command`
    - `cargo check -p codex-shell-command`
    - `git diff --check`
    - Verified a local `ConstrainedLanguage` PowerShell probe prints only
    the command output with no property-setting error.
    - Verified `codex exec` from a temporary `chcp 437` context reports
    `utf-8` / `65001` and preserves non-ASCII output (`café`, `漢字`).
  • fix(tui): prevent repository-configured code execution in /diff (#24954)
    ## Why
    
    `/diff` is intended to display working-tree changes, but its Git
    invocations honored repository-selected executable helpers. A repository
    could configure diff/text conversion helpers, clean/process filters,
    `core.fsmonitor`, or `post-index-change` hooks that execute when a user
    runs `/diff`.
    
    Fixes
    [PSEC-4395](https://linear.app/openai/issue/PSEC-4395/codex-cli-diff-executes-repository-selected-diff-helpers).
    
    ## What Changed
    
    - Pass `--no-textconv` and `--no-ext-diff` for tracked and untracked
    diff generation.
    - Discover configured `filter.<driver>.clean` and `.process` entries,
    then neutralize the selected drivers through structured
    `GIT_CONFIG_KEY_*` / `GIT_CONFIG_VALUE_*` overrides, including driver
    names containing `=`.
    - Run all `/diff` Git probes with `core.fsmonitor=false` and a null
    `core.hooksPath`.
    - Use short submodule reporting while ignoring dirty submodule
    worktrees, since inspecting a checked-out submodule for dirtiness can
    execute filters from that child repository. This intentionally omits
    dirty-only submodule markers in order to preserve the non-executing
    security boundary.
    - Add real-Git marker tests covering filters, fsmonitor, hooks, and
    configured helpers inside checked-out submodules.
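    
    The structured override approach can be sketched like this. Git reads
    `GIT_CONFIG_COUNT` plus numbered `GIT_CONFIG_KEY_n` / `GIT_CONFIG_VALUE_n`
    pairs from the environment, so a driver name containing `=` does not
    have to be parsed out of a `key=value` string. `filter_override_env` is
    a hypothetical helper, and using an empty value to neutralize a driver
    is an assumption; the PR does not say which value it sets.
    
    ```rust
    /// Build environment pairs that override each driver's `clean` and
    /// `process` commands through Git's numbered config variables.
    fn filter_override_env(drivers: &[&str]) -> Vec<(String, String)> {
        let mut env = Vec::new();
        let mut count = 0usize;
        for driver in drivers {
            for kind in ["clean", "process"] {
                env.push((
                    format!("GIT_CONFIG_KEY_{count}"),
                    format!("filter.{driver}.{kind}"),
                ));
                // ASSUMPTION: an empty command value disables the filter.
                env.push((format!("GIT_CONFIG_VALUE_{count}"), String::new()));
                count += 1;
            }
        }
        // Git only reads the numbered pairs when the count is present.
        env.push(("GIT_CONFIG_COUNT".to_string(), count.to_string()));
        env
    }
    ```
    
    The resulting pairs would be set on the `git diff` child process
    alongside the `--no-textconv` and `--no-ext-diff` flags.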
    
    ## How to Test
    
    1. In a repository with ordinary tracked and untracked edits, run
    `/diff`.
    2. Confirm the normal working-tree diff is shown for top-level files.
    3. Run the targeted tests below; they configure executable marker
    helpers for repository filters, fsmonitor, hooks, and a checked-out
    submodule, then verify `/diff` does not invoke them.
    4. Confirm a dirty-only submodule does not cause Codex to enter the
    submodule and execute its configured helper.
    
    Targeted tests:
    - `just test -p codex-tui get_git_diff_`
    
    Validation note: `just test -p codex-tui` runs the new coverage, but
    this worktree currently also has two unrelated failing guardian tests:
    `app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
    and
    `app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`.
  • Add codex app-server --stdio alias (#24940)
    ## Summary
    - Add `--stdio` as a direct alias for `codex app-server --listen
    stdio://`.
    - Keep `--stdio` and `--listen` mutually exclusive.
    - Update the app-server README to document both forms.
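    
    A small sketch of the alias and its exclusivity rule. `resolve_listen`
    and the error text are hypothetical; the real CLI presumably enforces
    this in its argument parser.
    
    ```rust
    /// Resolve the listen target from the two flags.
    /// Returns `Ok(None)` when neither flag was given, leaving the default
    /// to the caller.
    fn resolve_listen(stdio: bool, listen: Option<&str>) -> Result<Option<String>, String> {
        match (stdio, listen) {
            // The two flags are mutually exclusive.
            (true, Some(_)) => Err("--stdio cannot be used with --listen".to_string()),
            // `--stdio` is shorthand for `--listen stdio://`.
            (true, None) => Ok(Some("stdio://".to_string())),
            (false, other) => Ok(other.map(str::to_string)),
        }
    }
    ```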
  • Move Bazel Windows jobs onto codex-runners (#24952)
    The codex-windows runner group should be much faster than the default
    GHA runners. Since Bazel jobs on Windows are frequently the long pole
    for PR checks, this will hopefully help people land changes a bit faster.
  • Add feature-gated standalone image generation extension (#24723)
    ## Why
    
    Add a standalone image generation path that can be exercised
    independently of hosted Responses image generation, while retaining the
    hosted tool as fallback unless the extension is actually available to
    the model.
    
    ## What changed
    
    - Added the `codex-image-generation-extension` crate with standalone
    generate/edit execution, prior-image selection for edits, model-visible
    image output, and local generated-image persistence.
    - Installed the extension in app-server behind the disabled-by-default
    `imagegenext` feature and backend eligibility checks.
    - Updated core tool planning so eligible `image_gen.imagegen` exposure
    replaces hosted `image_generation`, while unavailable configurations
    retain hosted fallback.
    - Added coverage for extension behavior, edit history reuse, feature
    gating, auth eligibility, and hosted-tool replacement.
    - The extension is installed through app-server only in this PR; other
    execution paths retain hosted image generation because hosted
    replacement occurs only when the standalone executor is actually
    registered and model-visible.
    - The initial extension contract intentionally fixes the image model to
    `gpt-image-2` and uses automatic image parameters.
    - Native generated-image history/card parity and rollout persistence
    cleanup are intentionally deferred follow-up work.
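    
    The replacement-versus-fallback rule can be summarized in a few lines.
    This is an illustrative sketch only: `image_tool_for` and its three
    boolean inputs are hypothetical simplifications of the real planning
    inputs.
    
    ```rust
    /// Pick which image tool the model should see.
    /// The standalone tool replaces the hosted one only when every gate passes.
    fn image_tool_for(
        feature_enabled: bool,
        backend_eligible: bool,
        executor_registered: bool,
    ) -> &'static str {
        if feature_enabled && backend_eligible && executor_registered {
            "image_gen.imagegen"
        } else {
            // Any failed gate keeps the hosted tool as fallback.
            "image_generation"
        }
    }
    ```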
    
    ## Validation
    
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-features`
    - `just test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
    - `just test -p codex-app-server`
    - `just fix -p codex-image-generation-extension -p codex-features -p
    codex-core -p codex-app-server`
    - `just fmt`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • Wire task completion into thread-idle lifecycle (#24928)
    ## Why
    
    #24744 introduced the thread idle lifecycle hook so idle continuation
    can be owned by lifecycle contributors instead of hard-coded goal
    runtime plumbing. Task completion still called
    `goal_runtime_apply(GoalRuntimeEvent::MaybeContinueIfIdle)` directly, so
    the post-turn idle transition remained goal-specific and did not notify
    generic thread lifecycle contributors.
    
    ## What Changed
    
    - Add `Session::emit_thread_idle_lifecycle_if_idle()` to gate idle
    emission on both no active turn and no queued trigger-turn mailbox work.
    - Call that helper when a task clears the active turn, replacing the
    direct `GoalRuntimeEvent::MaybeContinueIfIdle` path.
    - Cover the behavior with `codex-core` session tests for emitting after
    task completion and suppressing idle emission while trigger-turn mailbox
    work is pending.
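    
    The gate itself reduces to a two-condition check. A sketch with
    hypothetical names follows; the real helper reads this state from the
    session rather than taking booleans.
    
    ```rust
    /// Idle is emitted only when there is no active turn and no queued
    /// trigger-turn mailbox work.
    fn should_emit_thread_idle(has_active_turn: bool, has_pending_trigger_turn_work: bool) -> bool {
        !has_active_turn && !has_pending_trigger_turn_work
    }
    ```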
    
    ## Verification
    
    - New tests in `core/src/session/tests.rs` exercise the idle lifecycle
    emission and trigger-turn mailbox guard.
  • TUI: Unified mentions tweaks + polish mentions rendering (#23363)
    This change keeps unified @mentions behind the mentions_v2 gate, moves
    the flag to under-development, and polishes mention rendering/history
    behavior.
    
    It also adds a few small improvements to the mentions feature around
    mention rendering and history round-tripping for plugin/tool mentions in
    message edit scenarios. Plugin selections now insert `@` mentions with
    better casing, and saved history preserves the visible sigil so recalled
    messages look the same as what the user typed.
    
    - Preserves `@` sigils when encoding/decoding mention history for
    tool/plugin paths.
    - Improves plugin mention insertion so display names/casing are
    reflected more cleanly in the composer.
    - Updates the composer to render user-entered plugin mentions in the
    same color as the mentions menu. Also applies to recalled/edited messages.
    - Left/right arrows no longer switch unified-mention search modes after
    an @mention has already been accepted (Ex: arrowing left through a
    composed message that contains @mentions).
    - Keeps bound mentions stable around punctuation, so accepted `@`
    mentions do not reopen the popup and punctuated `$` mentions still
    persist to cross-session history.
    
    **Steps to test**
    - Ensure `mentions_v2` is enabled through configuration or `--enable
    mentions_v2`.
    - Type `@` in the TUI composer and verify filesystem/plugin/skill
    results are displayed in the unified mentions menu.
    - Select a plugin mention from the `@` popup and confirm the inserted
    text is an `@...` mention with the expected display casing, then
    recall/edit the message and confirm it still renders as `@...`.
    - Mention a skill and verify that skills still insert as `$skill`
    mentions rather than `@` mentions.
    - Verify punctuated mentions such as `@plugin.` and `($skill)` keep
    their bound mention behavior across editing and history recall.
  • chore: add GPT-5.5 to the Amazon Bedrock catalog (#24701)
    ## Summary
    
    Amazon Bedrock should expose GPT-5.5 alongside GPT-5.4, and the Bedrock
    GPT entries should stay aligned with the canonical bundled OpenAI model
    metadata instead of carrying a separate hand-written copy that can drift
    over time. This change will be merged when the model is online.
    
    This change:
    
    - Adds the Bedrock Mantle model id for `openai.gpt-5.5`.
    - Builds the Bedrock GPT-5.5 and GPT-5.4 catalog entries from the
    bundled OpenAI model catalog, then overrides the Bedrock-facing slug,
    explicit priority, and Bedrock-specific context windows.
    - Hardcodes both `context_window` and `max_context_window` to `272000`
    for Bedrock GPT-5.5 and GPT-5.4.
    - Keeps `openai.gpt-5.5` as the default Bedrock model ahead of
    `openai.gpt-5.4` and the Bedrock OSS models.
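    
    Deriving a Bedrock entry from a canonical one might look like the
    sketch below. `ModelEntry`, its fields, and `bedrock_entry` are
    hypothetical; only the `272000` context window and the `openai.gpt-5.5`
    slug come from this change.
    
    ```rust
    /// Hypothetical, simplified catalog entry.
    #[derive(Clone, Debug, PartialEq)]
    struct ModelEntry {
        slug: String,
        display_name: String,
        priority: u32,
        context_window: u64,
        max_context_window: u64,
    }
    
    /// Context window hardcoded for the Bedrock GPT entries.
    const BEDROCK_CONTEXT_WINDOW: u64 = 272_000;
    
    /// Copy the canonical entry, then override the Bedrock-specific fields.
    fn bedrock_entry(base: &ModelEntry, slug: &str, priority: u32) -> ModelEntry {
        ModelEntry {
            slug: slug.to_string(),
            priority,
            context_window: BEDROCK_CONTEXT_WINDOW,
            max_context_window: BEDROCK_CONTEXT_WINDOW,
            // Everything else is inherited from the canonical metadata.
            ..base.clone()
        }
    }
    ```
    
    Inheriting the remaining fields is what keeps the Bedrock entries from
    drifting away from the bundled OpenAI metadata.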
  • Fix extension turn item emitter test event ordering (#24936)
    ## Why
    
    PR #24813 added extension `TurnItemEmitter` coverage and introduced a
    test that records a conversation history item before asserting
    extension-emitted turn item events.
    
    `record_conversation_items()` also emits a `RawResponseItem` event to
    observers. The test was reading from the same event receiver and
    expected the next event to be `ItemStarted`, so the test failed reliably
    once the setup history item was present.
    
    ## What Changed
    
    Update
    `passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call` to
    consume and assert the expected setup `RawResponseItem` before checking
    the extension `ItemStarted`, `WebSearchBegin`, `ItemCompleted`, and
    `WebSearchEnd` events.
    
    This is test-only and does not change extension runtime behavior.
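    
    The ordering issue can be reproduced with a plain channel. This is a
    toy model with hypothetical names (`Event`, `record_history_item`,
    `emit_extension_item`), not the real test harness.
    
    ```rust
    use std::sync::mpsc;
    
    /// Simplified stand-ins for the observer events named above.
    #[derive(Debug, PartialEq)]
    enum Event {
        RawResponseItem,
        ItemStarted,
        ItemCompleted,
    }
    
    /// Recording history also notifies observers, as in the real setup step.
    fn record_history_item(tx: &mpsc::Sender<Event>) {
        tx.send(Event::RawResponseItem).unwrap();
    }
    
    /// The extension call emits its own start and completion events.
    fn emit_extension_item(tx: &mpsc::Sender<Event>) {
        tx.send(Event::ItemStarted).unwrap();
        tx.send(Event::ItemCompleted).unwrap();
    }
    
    /// Collect everything a single shared receiver observes, in order.
    fn observed_events() -> Vec<Event> {
        let (tx, rx) = mpsc::channel();
        record_history_item(&tx);
        emit_extension_item(&tx);
        drop(tx); // Close the channel so the iterator below terminates.
        rx.iter().collect()
    }
    ```
    
    Since both sources share one receiver, a test has to consume the setup
    event before it can assert on the first extension event.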
    
    ## Verification
    
    - `cargo nextest run --no-fail-fast -p codex-core
    tools::handlers::extension_tools::tests::passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call`
  • Reap stale multi-agent slots (#24903)
    ## Summary
    
    - Let `close_agent` clean up an agent that is still registered in
    `AgentRegistry` even when its underlying thread is already missing.
    - Preserve the explicit-close boundary: for known stale thread-spawn
    agents, mark the persisted spawn edge `Closed`, then treat
    `ThreadNotFound` / `InternalAgentDied` as a successful close so the
    registry slot can be released.
    - Add a regression for MultiAgentV2 task-name targets where
    `close_agent("worker")` succeeds after the worker thread has already
    disappeared.
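    
    The "missing thread counts as closed" rule can be sketched as a result
    mapping. `CloseError` and `normalize_close` are hypothetical; the
    variant names mirror the errors named above, and the sketch omits the
    step that marks the persisted spawn edge `Closed`.
    
    ```rust
    /// Simplified error type for this sketch.
    #[derive(Debug, PartialEq)]
    enum CloseError {
        ThreadNotFound,
        InternalAgentDied,
        Other(String),
    }
    
    /// Treat "the thread is already gone" as a successful close so the
    /// caller can go on to release the registry slot.
    fn normalize_close(result: Result<(), CloseError>) -> Result<(), CloseError> {
        match result {
            Err(CloseError::ThreadNotFound) | Err(CloseError::InternalAgentDied) => Ok(()),
            other => other,
        }
    }
    ```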
    
    ## Motivation
    
    A worker can disappear from `ThreadManager` while its metadata still
    exists in the root `AgentRegistry`. Before this change, the close tool
    failed while trying to subscribe to the missing thread status, so it
    never reached the cleanup path that releases the registered agent slot.
    With `agents.max_threads = 1`, an explicit close of that stale task-name
    agent could fail and leave the session unable to spawn a replacement.
    
    ## Scope
    
    This PR intentionally does not add automatic stale-agent reaping to
    `spawn_agent`, `resume_agent`, or `list_agents`. A thread being missing
    from `ThreadManager` is not the same as an explicit close: persisted
    open spawn edges are still the durable source of truth for resume and
    task-name ownership until `close_agent` is called.
    
    ## Validation
    
    - `just test -p codex-core -E
    'test(multi_agent_v2_close_agent_reaps_stale_task_name_target) |
    test(resume_agent_from_rollout_reopens_open_descendants_after_manager_shutdown)'`
    - `just fix -p codex-core`
  • Expose MCP server info as part of server status (#24698)
    # Summary
    
    Expose MCP server info via App Server (when available) so apps can
    render a richer MCP experience.
  • feat(app-server): include turns page on thread resume (#23534)
    ## Summary
    
    The client currently calls `thread/resume` to establish live updates and
    immediately follows it with `thread/turns/list` to hydrate recent turns.
    This lets `thread/resume` return that page directly, eliminating a round
    trip and the ordering/deduplication gap between the two calls.
    
    Experimental clients opt in with `initialTurnsPage: { limit,
    sortDirection, itemsView }`. The response returns `initialTurnsPage` as
    a `TurnsPage`, including cursors for paging further back in history.
    Keeping the controls in a nested opt-in object provides the useful
    `thread/turns/list` knobs without spreading page-specific parameters
    across `thread/resume`.
    
    ## Verification
    
    - `just fmt`
    - `just write-app-server-schema --experimental`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server
    thread_resume_initial_turns_page_matches_requested_turns_list_page
    --tests`
    - `cargo test -p codex-app-server
    thread_resume_rejoins_running_thread_even_with_override_mismatch
    --tests`
    - `just fix -p codex-app-server-protocol -p codex-app-server`
  • extension-api: add TurnItemEmitter to tool calls (#24813)
    ## Why
    Extension-contributed tools need to emit visible turn items through
    Codex's normal event and persistence pipeline.
    
    ## What
    - Add `TurnItemEmitter` to extension `ToolCall`s and route the core
    implementation through `Session::emit_turn_item_*`.
    - Hold weak session and turn references so retained tool calls cannot
    keep host state alive.
    - Provide a no-op emitter for extension test callers.
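    
    The weak-reference design can be illustrated with a toy model.
    `Session` and `SketchEmitter` here are hypothetical simplifications,
    not the real types; the point is only that the emitter never holds a
    strong reference to host state.
    
    ```rust
    use std::sync::{Arc, Mutex, Weak};
    
    /// Toy host state that records emitted items.
    struct Session {
        items: Mutex<Vec<String>>,
    }
    
    /// Emitter that holds only a weak reference to the session.
    struct SketchEmitter {
        session: Weak<Session>,
    }
    
    impl SketchEmitter {
        fn new(session: &Arc<Session>) -> Self {
            Self { session: Arc::downgrade(session) }
        }
    
        /// Returns true if the item was delivered, false if the session is gone.
        fn emit(&self, item: &str) -> bool {
            match self.session.upgrade() {
                Some(session) => {
                    session.items.lock().unwrap().push(item.to_string());
                    true
                }
                None => false,
            }
        }
    }
    ```
    
    A tool call that outlives its turn can keep its emitter around; once
    the session is dropped, `emit` simply reports that nothing was
    delivered.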
    
    ## Test Plan
    - `just test -p codex-core -E
    'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>