Commit Graph

4889 Commits

  • Add ChatGPT device-code login to app server (#15525)
    ## Problem
    
    App-server clients could only initiate ChatGPT login through the browser
    callback flow, even though the shared login crate already supports
    device-code auth. That left VS Code, Codex App, and other app-server
    clients without a first-class way to use the existing device-code
    backend when browser redirects are brittle or when the client UX wants
    to own the login ceremony.
    
    ## Mental model
    
    This change adds a second ChatGPT login start path to app-server:
    clients can now call `account/login/start` with `type:
    "chatgptDeviceCode"`. App-server immediately returns a `loginId` plus
    the device-code UX payload (`verificationUrl` and `userCode`), then
    completes the login asynchronously in the background using the existing
    `codex_login` polling flow. Successful device-code login still resolves
    to ordinary `chatgpt` auth, and completion continues to flow through the
    existing `account/login/completed` and `account/updated` notifications.
    
    ## Non-goals
    
    This does not introduce a new auth mode, a new account shape, or a
    device-code eligibility discovery API. It also does not add automatic
    fallback to browser login in core; clients remain responsible for
    choosing when to request device code and whether to retry with a
    different UX if the backend/admin policy rejects it.
    
    ## Tradeoffs
    
    We intentionally keep `login_chatgpt_common` as a local validation
    helper instead of turning it into a capability probe. Device-code
    eligibility is checked by actually calling `request_device_code`, which
    means policy-disabled cases surface as an immediate request error rather
    than an async completion event. We also keep the active-login state
    machine minimal: browser and device-code logins share the same public
    cancel contract, but device-code cancellation is implemented with a
    local cancel token rather than a larger cross-crate refactor.
    
    ## Architecture
    
    The protocol grows a new `chatgptDeviceCode` request/response variant in
    app-server v2. On the server side, the new handler reuses the existing
    ChatGPT login precondition checks, calls `request_device_code`, returns
    the device-code payload, and then spawns a background task that waits on
    either cancellation or `complete_device_code_login`. On success, it
    reuses the existing auth reload and cloud-requirements refresh path
    before emitting `account/login/completed` success and `account/updated`.
    On failure or cancellation, it emits only `account/login/completed`
    failure. The existing `account/login/cancel { loginId }` contract
    remains unchanged and now works for both browser and device-code
    attempts.
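    
    As a rough illustration of the flow above, the background task can be
    modeled as "whichever of cancel or completion arrives first wins",
    followed by a fixed set of notifications per outcome. This is a minimal
    sketch, not the app-server code: `LoginEvent`, `DeviceCodeOutcome`,
    `wait_for_outcome`, and `notifications_for` are hypothetical names, and a
    std channel stands in for the real async task and cancel token.
    
    ```rust
    use std::sync::mpsc::Receiver;
    
    // Hypothetical event and outcome types, for illustration only.
    #[derive(Debug)]
    pub enum LoginEvent {
        Cancel,
        Completed(Result<(), String>),
    }
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum DeviceCodeOutcome {
        Success,
        Failure(String),
        Cancelled,
    }
    
    /// Blocks until the first event arrives; later events are ignored.
    pub fn wait_for_outcome(events: Receiver<LoginEvent>) -> DeviceCodeOutcome {
        match events.recv() {
            Ok(LoginEvent::Completed(Ok(()))) => DeviceCodeOutcome::Success,
            Ok(LoginEvent::Completed(Err(message))) => DeviceCodeOutcome::Failure(message),
            Ok(LoginEvent::Cancel) => DeviceCodeOutcome::Cancelled,
            // All senders dropped without reporting anything: treat as failure.
            Err(_) => DeviceCodeOutcome::Failure("login task dropped".to_string()),
        }
    }
    
    /// Notification methods emitted for a finished attempt, per the text above.
    pub fn notifications_for(outcome: &DeviceCodeOutcome) -> Vec<&'static str> {
        match outcome {
            DeviceCodeOutcome::Success => {
                vec!["account/login/completed", "account/updated"]
            }
            // Failure and cancellation both emit only the completion notification.
            DeviceCodeOutcome::Failure(_) | DeviceCodeOutcome::Cancelled => {
                vec!["account/login/completed"]
            }
        }
    }
    ```
    
    In this sketch a cancel that arrives before completion yields
    `Cancelled`, so only `account/login/completed` is emitted.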
    
    
    ## Tests
    
    Added protocol serialization coverage for the new request/response
    variant, plus app-server tests for device-code success, failure, cancel,
    and start-time rejection behavior. Existing browser ChatGPT login
    coverage remains in place to show that the callback-based flow is
    unchanged.
  • chore: refactor network permissions to use explicit domain and unix socket rule maps (#15120)
    ## Summary
    
    This PR replaces the legacy network allow/deny list model with explicit
    rule maps for domains and unix sockets across managed requirements,
    permissions profiles, the network proxy config, and the app server
    protocol.
    
    Concretely, it:
    
    - introduces typed domain (`allow` / `deny`) and unix socket permission
    (`allow` / `none`) entries instead of separate `allowed_domains`,
    `denied_domains`, and `allow_unix_sockets` lists
    - updates config loading, managed requirements merging, and exec-policy
    overlays to read and upsert rule entries consistently
    - exposes the new shape through protocol/schema outputs, debug surfaces,
    and app-server config APIs
    - rejects the legacy list-based keys in permissions profiles and updates
    docs/tests to reflect the new config format
    
    ## Why
    
    The previous representation split related network policy across multiple
    parallel lists, which made merging and overriding rules harder to reason
    about. Moving to explicit keyed permission maps gives us a single source
    of truth per host/socket entry, makes allow/deny precedence clearer, and
    gives protocol consumers access to the full rule state instead of
    derived projections only.
    
    ## Backward Compatibility
    
    ### Backward compatible
    
    - Managed requirements still accept the legacy
    `experimental_network.allowed_domains`,
    `experimental_network.denied_domains`, and
    `experimental_network.allow_unix_sockets` fields. They are normalized
    into the new canonical `domains` and `unix_sockets` maps internally.
    - App-server v2 still deserializes legacy `allowedDomains`,
    `deniedDomains`, and `allowUnixSockets` payloads, so older clients can
    continue reading managed network requirements.
    - App-server v2 responses still populate `allowedDomains`,
    `deniedDomains`, and `allowUnixSockets` as legacy compatibility views
    derived from the canonical maps.
    - `managed_allowed_domains_only` keeps the same behavior after
    normalization. Legacy managed allowlists still participate in the same
    enforcement path as canonical `domains` entries.
    
    ### Not backward compatible
    
    - Permissions profiles under `[permissions.<profile>.network]` no longer
    accept the legacy list-based keys. Those configs must use the canonical
    `[domains]` and `[unix_sockets]` tables instead of `allowed_domains`,
    `denied_domains`, or `allow_unix_sockets`.
    - Managed `experimental_network` config cannot mix canonical and legacy
    forms in the same block. For example, `domains` cannot be combined with
    `allowed_domains` or `denied_domains`, and `unix_sockets` cannot be
    combined with `allow_unix_sockets`.
    - The canonical format can express explicit `"none"` entries for unix
    sockets, but those entries do not round-trip through the legacy
    compatibility fields because the legacy fields only represent allow/deny
    lists.
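    
    The normalization and the lossy legacy view described above might look
    roughly like the following. This is a sketch under assumptions, not the
    actual implementation: the type and function names are hypothetical, and
    "deny wins when a host appears in both legacy lists" is an assumed
    precedence that the text does not state.
    
    ```rust
    use std::collections::BTreeMap;
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum DomainPermission {
        Allow,
        Deny,
    }
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum UnixSocketPermission {
        Allow,
        None,
    }
    
    /// Folds the legacy parallel lists into one keyed map.
    /// Assumption: denied entries are inserted last, so deny wins on conflict.
    pub fn normalize_domains(
        allowed: &[&str],
        denied: &[&str],
    ) -> BTreeMap<String, DomainPermission> {
        let mut map = BTreeMap::new();
        for host in allowed {
            map.insert((*host).to_string(), DomainPermission::Allow);
        }
        for host in denied {
            map.insert((*host).to_string(), DomainPermission::Deny);
        }
        map
    }
    
    /// Legacy compatibility view: only `Allow` entries survive, so an explicit
    /// `None` entry cannot round-trip through this list.
    pub fn legacy_allow_unix_sockets(
        sockets: &BTreeMap<String, UnixSocketPermission>,
    ) -> Vec<String> {
        sockets
            .iter()
            .filter(|(_, permission)| **permission == UnixSocketPermission::Allow)
            .map(|(path, _)| path.clone())
            .collect()
    }
    ```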
    
    ## Testing
    
    `/target/debug/codex sandbox macos --log-denials /bin/zsh -c 'curl
    https://www.example.com'` gives 200 with config
    ```
    [permissions.workspace.network.domains]
    "www.example.com" = "allow"
    ```
    and fails when set to deny: `curl: (56) CONNECT tunnel failed, response
    403`.
    
    Also tested the backward-compatibility path by verifying that adding the
    following to `/etc/codex/requirements.toml` works:
    ```
    [experimental_network]
    allowed_domains = ["www.example.com"]
    ```
  • [app-server-protocol] introduce generic ClientResponse for app-server-protocol (#15921)
    - introduces `ClientResponse` as the symmetrical typed response union to
    `ClientRequest` for app-server-protocol
    - enables scalable event stream ingestion for use cases such as
    analytics
    - no runtime behavior changes, protocol/schema plumbing only
  • fix: increase timeout for rust-ci to 45 minutes for now (#15948)
    https://github.com/openai/codex/pull/15478 raised the timeout to 35
    minutes for `windows-arm64` only, though I just hit 35 minutes on
    https://github.com/openai/codex/actions/runs/23628986591/job/68826740108?pr=15944,
    so let's just increase it to 45 minutes. As noted, I'm hoping that we
    can bring it back down once we no longer have two copies of the `tui`
    crate.
  • codex-tools: extract MCP schema adapters (#15928)
    ## Why
    
    `codex-tools` already owns the shared tool input schema model and parser
    from the first extraction step, but `core/src/tools/spec.rs` still owned
    the MCP-specific adapter that normalizes `rmcp::model::Tool` schemas and
    wraps `structuredContent` into the call result output schema.
    
    Keeping that adapter in `codex-core` means the reusable MCP schema path
    is still split across crates, and the unit tests for that logic stay
    anchored in `codex-core` even though the runtime orchestration does not
    need to move yet.
    
    This change takes the next small step by moving the reusable MCP schema
    adapter into `codex-tools` while leaving `ResponsesApiTool` assembly in
    `codex-core`.
    
    ## What changed
    
    - added `tools/src/mcp_tool.rs` and sibling
    `tools/src/mcp_tool_tests.rs`
    - introduced `ParsedMcpTool`, `parse_mcp_tool()`, and
    `mcp_call_tool_result_output_schema()` in `codex-tools`
    - updated `core/src/tools/spec.rs` to consume parsed MCP tool parts from
    `codex-tools`
    - removed the now-redundant MCP schema unit tests from
    `core/src/tools/spec_tests.rs`
    - expanded `codex-rs/tools/README.md` to describe this second migration
    step
    
    ## Test plan
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core --lib tools::spec::`
  • feat(windows-sandbox): add network proxy support (#12220)
    ## Summary
    
    This PR makes Windows sandbox proxying enforceable by routing proxy-only
    runs through the existing `offline` sandbox user and reserving direct
    network access for the existing `online` sandbox user.
    
    In brief:
    
    - if a Windows sandbox run should be proxy-enforced, we run it as the
    `offline` user
    - the `offline` user gets firewall rules that block direct outbound
    traffic and only permit the configured localhost proxy path
    - if a Windows sandbox run should have true direct network access, we
    run it as the `online` user
    - no new sandbox identity is introduced
    
    This brings Windows in line with the intended model: proxy use is not
    just env-based, it is backed by OS-level egress controls. Windows
    already has two sandbox identities:
    
    - `offline`: intended to have no direct network egress
    - `online`: intended to have full network access
    
    This PR makes proxy-enforced runs use that model directly.
    
    ### Proxy-enforced runs
    
    When proxy enforcement is active:
    
    - the run is assigned to the `offline` identity
    - setup extracts the loopback proxy ports from the sandbox env
    - Windows setup programs firewall rules for the `offline` user that:
      - block all non-loopback outbound traffic
      - block loopback UDP
      - block loopback TCP except for the configured proxy ports
    - optionally allow broader localhost access when `allow_local_binding=1`
    
    So the sandboxed process can only talk to the local proxy. It cannot
    open direct outbound sockets or do local UDP-based DNS on its own. The
    proxy then performs the real outbound network access outside that
    restricted sandbox identity.
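    
    The identity choice and the firewall rule set for proxy-enforced runs can
    be summarized in a small planning function. This is only a sketch of the
    decision described above: `NetworkMode`, `FirewallRule`, and `plan_run`
    are hypothetical names, the rules are plain data rather than real Windows
    firewall calls, and the optional `allow_local_binding=1` relaxation is
    left out.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum SandboxIdentity {
        Offline,
        Online,
    }
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum NetworkMode {
        /// Proxy enforcement is active; ports come from the sandbox env.
        ProxyEnforced { proxy_ports: Vec<u16> },
        /// Proxy enforcement is off and full network access is allowed.
        Direct,
    }
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum FirewallRule {
        BlockNonLoopbackOutbound,
        BlockLoopbackUdp,
        BlockLoopbackTcpExcept(Vec<u16>),
    }
    
    /// Picks the sandbox identity and the rules to program for that identity.
    pub fn plan_run(mode: &NetworkMode) -> (SandboxIdentity, Vec<FirewallRule>) {
        match mode {
            NetworkMode::ProxyEnforced { proxy_ports } => (
                SandboxIdentity::Offline,
                vec![
                    FirewallRule::BlockNonLoopbackOutbound,
                    FirewallRule::BlockLoopbackUdp,
                    FirewallRule::BlockLoopbackTcpExcept(proxy_ports.clone()),
                ],
            ),
            // Direct runs use the `online` identity with no proxy-only rules.
            NetworkMode::Direct => (SandboxIdentity::Online, Vec::new()),
        }
    }
    ```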
    
    ### Direct-network runs
    
    When proxy enforcement is not active and full network access is allowed:
    
    - the run is assigned to the `online` identity
    - no proxy-only firewall restrictions are applied
    - the process gets normal direct network access
    
    ### Unelevated vs elevated
    
    The restricted-token / unelevated path cannot enforce per-identity
    firewall policy by itself.
    
    So for Windows proxy-enforced runs, we transparently use the logon-user
    sandbox path under the hood, even if the caller started from the
    unelevated mode. That keeps enforcement real instead of best-effort.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • permissions: remove macOS seatbelt extension profiles (#15918)
    ## Why
    
    `PermissionProfile` should only describe the per-command permissions we
    still want to grant dynamically. Keeping
    `MacOsSeatbeltProfileExtensions` in that surface forced extra macOS-only
    approval, protocol, schema, and TUI branches for a capability we no
    longer want to expose.
    
    ## What changed
    
    - Removed the macOS-specific permission-profile types from
    `codex-protocol`, the app-server v2 API, and the generated
    schema/TypeScript artifacts.
    - Deleted the core and sandboxing plumbing that threaded
    `MacOsSeatbeltProfileExtensions` through execution requests and seatbelt
    construction.
    - Simplified macOS seatbelt generation so it always includes the fixed
    read-only preferences allowlist instead of carrying a configurable
    profile extension.
    - Removed the macOS additional-permissions UI/docs/test coverage and
    deleted the obsolete macOS permission modules.
    - Tightened `request_permissions` intersection handling so explicitly
    empty requested read lists are preserved only when that field was
    actually granted, avoiding zero-grant responses being stored as active
    permissions.
  • codex-tools: extract shared tool schema parsing (#15923)
    ## Why
    
    `parse_tool_input_schema` and the supporting `JsonSchema` model were
    living in `core/src/tools/spec.rs`, but they already serve callers
    outside `codex-core`.
    
    Keeping that shared schema parsing logic inside `codex-core` makes the
    crate boundary harder to reason about and works against the guidance in
    `AGENTS.md` to avoid growing `codex-core` when reusable code can live
    elsewhere.
    
    This change takes the first extraction step by moving the schema parsing
    primitive into its own crate while keeping the rest of the tool-spec
    assembly in `codex-core`.
    
    ## What changed
    
    - added a new `codex-tools` crate under `codex-rs/tools`
    - moved the shared tool input schema model and sanitizer/parser into
    `tools/src/json_schema.rs`
    - kept `tools/src/lib.rs` exports-only, with the module-level unit tests
    split into `json_schema_tests.rs`
    - updated `codex-core` to use `codex-tools::JsonSchema` and re-export
    `parse_tool_input_schema`
    - updated `codex-app-server` dynamic tool validation to depend on
    `codex-tools` directly instead of reaching through `codex-core`
    - wired the new crate into the Cargo workspace and Bazel build graph
  • bazel: re-organize bazelrc (#15522)
    Replaced ci.bazelrc and v8-ci.bazelrc with custom configs inside the main
    .bazelrc file. As a result, the GitHub workflow setup is simplified down
    to a single '--config=<foo>' flag.
    
    Moved the build metadata flags to config=ci.
    Added custom tags metadata to help differentiate invocations based on
    workflow (bazel vs v8) and os (linux/macos/windows).
    
    Enabled users to override the default values in .bazelrc by using a
    user.bazelrc file locally.
    Added user.bazelrc to gitignore.
  • Preserve bazel repository cache in github actions (#14495)
    Highlights:
    
    - Trimmed down to just the repository cache for faster upload / download
    - Made the cache key only include files that affect external
    dependencies (since that's what the repository cache caches) -
    MODULE.bazel, codex-rs/Cargo.lock, codex-rs/Cargo.toml
    - Split the caching action into explicit restore / save steps (similar
    to your rust CI) which allows us to skip uploads on cache hit, and not
    fail the build if upload fails
    
    This should get rid of 842 network fetches that are happening on every
    Bazel CI run, while also reducing the GitHub flakiness @bolinfest
    reported. Uploading should be faster (since we're not caching many small
    files), and will only happen when MODULE.bazel or Cargo.lock /
    Cargo.toml files change.
    
    In my testing, it [took 3s to save the repository
    cache](https://github.com/siggisim/codex/actions/runs/23014186143/job/66832859781).
  • fix(network-proxy): fail closed on network-proxy DNS lookup errors (#15909)
    ## Summary
    
    Fail closed when the network proxy's local/private IP pre-check hits a
    DNS lookup error or timeout, instead of treating the hostname as public
    and allowing the request.
    
    ## Root cause
    
    `host_resolves_to_non_public_ip()` returned `false` on resolver failure,
    which created a fail-open path in the `allow_local_binding = false`
    boundary. The eventual connect path performs its own DNS resolution
    later, so a transient pre-check failure is not evidence that the
    destination is public.
    
    ## Changes
    
    - Treat DNS lookup errors/timeouts as local/private for blocking
    purposes
    - Add a regression test for an allowlisted hostname that fails DNS
    resolution
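    
    A minimal sketch of the fail-closed behavior, assuming the lookup result
    is passed in so the decision can be exercised without a real resolver.
    The function name matches the one mentioned above, but the signature, the
    `is_non_public` helper, the set of ranges it checks, and the handling of
    an empty result are assumptions made for illustration.
    
    ```rust
    use std::io;
    use std::net::IpAddr;
    
    /// Hypothetical classifier covering a few well-known non-public ranges.
    fn is_non_public(ip: &IpAddr) -> bool {
        match ip {
            IpAddr::V4(v4) => {
                v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
            }
            IpAddr::V6(v6) => {
                let first = v6.segments()[0];
                v6.is_loopback()
                    || v6.is_unspecified()
                    || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                    || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
            }
        }
    }
    
    /// Returns true when the request should be treated as local/private.
    pub fn host_resolves_to_non_public_ip(lookup: io::Result<Vec<IpAddr>>) -> bool {
        match lookup {
            // Assumption: an empty answer is also treated as non-public.
            Ok(addrs) => addrs.is_empty() || addrs.iter().any(is_non_public),
            // Fail closed: a resolver error or timeout is not evidence that
            // the destination is public.
            Err(_) => true,
        }
    }
    ```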
    
    ## Validation
    
    - `cargo test -p codex-network-proxy`
    - `cargo clippy -p codex-network-proxy --all-targets -- -D warnings`
    - `just fmt`
    - `just argument-comment-lint`
  • chore: remove skill metadata from command approval payloads (#15906)
    ## Why
    
    This is effectively a follow-up to
    [#15812](https://github.com/openai/codex/pull/15812). That change
    removed the special skill-script exec path, but `skill_metadata` was
    still being threaded through command-approval payloads even though the
    approval flow no longer uses it to render prompts or resolve decisions.
    
    Keeping it around added extra protocol, schema, and client surface area
    without changing behavior.
    
    Removing it keeps the command-approval contract smaller and avoids
    carrying a dead field through app-server, TUI, and MCP boundaries.
    
    ## What changed
    
    - removed `ExecApprovalRequestSkillMetadata` and the corresponding
    `skillMetadata` field from core approval events and the v2 app-server
    protocol
    - removed the generated JSON and TypeScript schema output for that field
    - updated app-server, MCP server, TUI, and TUI app-server approval
    plumbing to stop forwarding the field
    - cleaned up tests that previously constructed or asserted
    `skillMetadata`
    
    ## Testing
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-app-server-test-client`
    - `cargo test -p codex-mcp-server`
    - `just argument-comment-lint`
  • chore: move bwrap config helpers into dedicated module (#15898)
    ## Summary
    - move the bwrap PATH lookup and warning helpers out of config/mod.rs
    - move the related tests into a dedicated bwrap_tests.rs file
    
    ## Validation
    - git diff --check
    - skipped heavier local tests per request
    
    Follow-up to #15791.
  • docs: update AGENTS.md to discourage adding code to codex-core (#15910)
    ## Why
    
    `codex-core` is already the largest crate in `codex-rs`, so defaulting
    to it for new functionality makes it harder to keep the workspace
    modular. The repo guidance should make it explicit that contributors are
    expected to look for an existing non-`codex-core` crate, or introduce a
    new crate, before growing `codex-core` further.
    
    ## What Changed
    
    - Added a dedicated `The \`codex-core\` crate` section to `AGENTS.md`.
    - Documented why `codex-core` should be treated as a last resort for new
    functionality.
    - Added concrete guidance for both implementation and review: prefer an
    existing non-`codex-core` crate when possible, introduce a new workspace
    crate when that is the cleaner boundary, and push back on PRs that grow
    `codex-core` unnecessarily.
  • sandboxing: use OsString for SandboxCommand.program (#15897)
    ## Why
    
    `SandboxCommand.program` represents an executable path, but keeping it
    as `String` forced path-backed callers to run `to_string_lossy()` before
    the sandbox layer ever touched the command. That loses fidelity earlier
    than necessary and adds avoidable conversions in runtimes that already
    have a `PathBuf`.
    
    ## What changed
    
    - Changed `SandboxCommand.program` to `OsString`.
    - Updated `SandboxManager::transform` to keep the program and argv in
    `OsString` form until the `SandboxExecRequest` conversion boundary.
    - Switched the path-backed `apply_patch` and `js_repl` runtimes to pass
    `into_os_string()` instead of `to_string_lossy()`.
    - Updated the remaining string-backed builders and tests to match the
    new type while preserving the existing Linux helper `arg0` behavior.
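    
    For illustration, a path-backed caller can hand its `PathBuf` over
    without any string conversion. The struct below is a simplified stand-in
    with only two fields, and `command_for_path` is a hypothetical helper;
    the real `SandboxCommand` is not reproduced here.
    
    ```rust
    use std::ffi::OsString;
    use std::path::PathBuf;
    
    // Simplified stand-in for the real type.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct SandboxCommand {
        pub program: OsString,
        pub args: Vec<OsString>,
    }
    
    /// Builds a command from a path without going through `to_string_lossy()`,
    /// so non-UTF-8 bytes in the path are carried through unchanged.
    pub fn command_for_path(program: PathBuf, args: &[&str]) -> SandboxCommand {
        SandboxCommand {
            program: program.into_os_string(),
            args: args.iter().map(|arg| OsString::from(*arg)).collect(),
        }
    }
    ```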
    
    ## Verification
    
    - `cargo test -p codex-sandboxing`
    - `just argument-comment-lint -p codex-core -p codex-sandboxing`
    - `cargo test -p codex-core` currently fails in unrelated existing
    config tests: `config::tests::approvals_reviewer_*` and
    `config::tests::smart_approvals_alias_*`
  • [codex] import token_data from codex-login directly (#15903)
    ## Why
    `token_data` is owned by `codex-login`, but `codex-core` was still
    re-exporting it. That let callers pull auth token types through
    `codex-core`, which keeps otherwise unrelated crates coupled to
    `codex-core` and makes `codex-core` more of a build-graph bottleneck.
    
    ## What changed
    - remove the `codex-core` re-export of `codex_login::token_data`
    - update the remaining `codex-core` internals that used
    `crate::token_data` to import `codex_login::token_data` directly
    - update downstream callers in `codex-rs/chatgpt`,
    `codex-rs/tui_app_server`, `codex-rs/app-server/tests/common`, and
    `codex-rs/core/tests` to import `codex_login::token_data` directly
    - add explicit `codex-login` workspace dependencies and refresh lock
    metadata for crates that now depend on it directly
    
    ## Validation
    - `cargo test -p codex-chatgpt --locked`
    - `just argument-comment-lint`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    
    ## Notes
    - attempted `cargo test -p codex-core --locked` and `cargo test -p
    codex-core auth_refresh --locked`, but both ran out of disk while
    linking `codex-core` test binaries in the local environment
  • Protect first-time project .codex creation across Linux and macOS sandboxes (#15067)
    ## Problem
    
    Codex already treated an existing top-level project `./.codex` directory
    as protected, but there was a gap on first creation.
    
    If `./.codex` did not exist yet, a turn could create files under it,
    such as `./.codex/config.toml`, without going through the same approval
    path as later modifications. That meant the initial write could bypass
    the intended protection for project-local Codex state.
    
    ## What this changes
    
    This PR closes that first-creation gap in the Unix enforcement layers:
    
    - `codex-protocol`
      - treat the top-level project `./.codex` path as a protected carveout
        even when it does not exist yet
      - avoid injecting the default carveout when the user already has an
        explicit rule for that exact path
    - macOS Seatbelt
      - deny writes to both the exact protected path and anything beneath it,
        so creating `./.codex` itself is blocked in addition to writes inside it
    - Linux bubblewrap
      - preserve the same protected-path behavior for first-time creation
        under `./.codex`
    - tests
      - add protocol regressions for missing `./.codex` and explicit-rule
        collisions
      - add Unix sandbox coverage for blocking first-time `./.codex` creation
      - tighten Seatbelt policy assertions around excluded subpaths
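    
    The protocol-level rule above can be approximated with pure path logic.
    In this sketch, `default_protected_paths` and `write_is_denied` are
    hypothetical helpers; the real enforcement happens in the Seatbelt and
    bubblewrap policies, which are not shown.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Returns the default carveout for the top-level `.codex` directory.
    /// There is deliberately no existence check, so the path is protected
    /// before it has ever been created.
    pub fn default_protected_paths(project_root: &Path, explicit_rules: &[PathBuf]) -> Vec<PathBuf> {
        let dot_codex = project_root.join(".codex");
        if explicit_rules.iter().any(|rule| *rule == dot_codex) {
            // The user already has a rule for this exact path; do not inject ours.
            Vec::new()
        } else {
            vec![dot_codex]
        }
    }
    
    /// Denies writes to a protected path itself and to anything beneath it.
    /// `Path::starts_with` compares whole components, so `.codexfoo` is not
    /// treated as being under `.codex`.
    pub fn write_is_denied(target: &Path, protected: &[PathBuf]) -> bool {
        protected.iter().any(|root| target.starts_with(root))
    }
    ```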
    
    ## Scope
    
    This change is intentionally scoped to protecting the top-level project
    `.codex` subtree from agent writes.
    
    It does not make `.codex` unreadable, and it does not change the product
    behavior around loading project skills from `.codex` when project config
    is untrusted.
    
    ## Why this shape
    
    The fix is pointed rather than broad:
    - it preserves the current model of “project `.codex` is protected from
    writes”
    - it closes the security-relevant first-write hole
    - it avoids folding a larger permissions-model redesign into this PR
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-sandboxing seatbelt`
    - `cargo test -p codex-exec --test all
    sandbox_blocks_first_time_dot_codex_creation -- --nocapture`
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • app-server: Split transport module (#15811)
    `transport.rs` is getting pretty big, so split the individual transport
    implementations into separate files.
  • skills: remove unused skill permission metadata (#15900)
    ## Why
    
    Skill metadata accepted a `permissions` block and stored the result on
    `SkillMetadata`, but that data was never consumed by runtime behavior.
    Leaving the dead parsing path in place makes it look like skills can
    widen or otherwise influence execution permissions when, in practice,
    declared skill permissions are ignored.
    
    This change removes that misleading surface area so the skill metadata
    model matches what the system actually uses.
    
    ## What changed
    
    - removed `permission_profile` and `managed_network_override` from
    `core-skills::SkillMetadata`
    - stopped parsing `permissions` from skill metadata in
    `core-skills/src/loader.rs`
    - deleted the loader tests that only exercised the removed permissions
    parsing path
    - cleaned up dependent `SkillMetadata` constructors in tests and TUI
    code that were only carrying `None` for those fields
    
    ## Testing
    
    - `cargo test -p codex-core-skills`
    - `cargo test -p codex-tui
    submission_prefers_selected_duplicate_skill_path`
    - `just argument-comment-lint`
  • fix: resolve bwrap from trusted PATH entry (#15791)
    ## Summary
    - resolve system bwrap from PATH instead of hardcoding /usr/bin/bwrap
    - skip PATH entries that resolve inside the current workspace before
    launching the sandbox helper
    - keep the vendored bubblewrap fallback when no trusted system bwrap is
    found
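    
    A rough sketch of the lookup, assuming Unix-style `PATH` separators.
    `find_trusted_bwrap` and its `exists` callback are hypothetical; the
    sketch compares paths lexically and skips relative entries, whereas the
    summary above says entries are skipped based on where they resolve, so
    the real code presumably resolves them first.
    
    ```rust
    use std::env;
    use std::ffi::OsStr;
    use std::path::{Path, PathBuf};
    
    /// Returns the first `bwrap` candidate from PATH that is not inside the
    /// workspace. `None` means the caller should use the vendored fallback.
    pub fn find_trusted_bwrap(
        path_var: &OsStr,
        workspace: &Path,
        exists: impl Fn(&Path) -> bool,
    ) -> Option<PathBuf> {
        env::split_paths(path_var)
            // Assumption: relative entries are skipped outright, since they
            // could resolve into the workspace.
            .filter(|dir| dir.is_absolute() && !dir.starts_with(workspace))
            .map(|dir| dir.join("bwrap"))
            .find(|candidate| exists(candidate.as_path()))
    }
    ```
    
    Passing the existence check as a closure keeps the sketch testable
    without touching the filesystem; a real caller could pass
    `|p| p.is_file()`.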
    
    ## Validation
    - cargo test -p codex-core bwrap --lib
    - cargo test -p codex-linux-sandbox
    - just fix -p codex-core
    - just fix -p codex-linux-sandbox
    - just fmt
    - just argument-comment-lint
    - cargo clean
  • [plugins] Polish tool suggest prompts. (#15891)
    - [x] Polish tool suggest prompts to distinguish between missing
    connectors and discoverable plugins, and be very precise about the
    triggering conditions.
  • [mcp] Fix legacy_tools (#15885)
    - [x] Fix legacy_tools
  • feat(tui): add terminal title support to tui app server (#15860)
    ## TL;DR
    
    Replicates the `/title` command from `tui` to `tui_app_server`.
    
    ## Problem
    
    The classic `tui` crate supports customizing the terminal window/tab
    title via `/title`, but the `tui_app_server` crate does not. Users on
    the app-server path have no way to configure what their terminal title
    shows (project name, status, spinner, thread, etc.), making it harder to
    identify Codex sessions across tabs or windows.
    
    ## Mental model
    
    The terminal title is a *status surface* -- conceptually parallel to the
    footer status line. Both surfaces are configurable lists of items, both
    share expensive inputs (git branch lookup, project root discovery), and
    both must be refreshed at the same lifecycle points. This change ports
    the classic `tui`'s design verbatim:
    
    1. **`terminal_title.rs`** owns the low-level OSC write path and input
    sanitization. It strips control characters and bidi/invisible codepoints
    before placing untrusted text (model output, thread names, project
    paths) inside an escape sequence.
    
    2. **`title_setup.rs`** defines `TerminalTitleItem` (the 8 configurable
    items) and `TerminalTitleSetupView` (the interactive picker that wraps
    `MultiSelectPicker`).
    
    3. **`status_surfaces.rs`** is the shared refresh pipeline. It parses
    both surface configs once per refresh, warns about invalid items once
    per session, synchronizes the git-branch cache, then renders each
    surface from the same `StatusSurfaceSelections` snapshot.
    
    4. **`chatwidget.rs`** sets `TerminalTitleStatusKind` at each state
    transition (Working, Thinking, Undoing, WaitingForBackgroundTerminal)
    and calls `refresh_terminal_title()` whenever relevant state changes.
    
    5. **`app.rs`** handles the three setup events (confirm/preview/cancel),
    persists config via `ConfigEditsBuilder`, and clears the managed title
    on `Drop`.
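    
    To make item 1 concrete, here is a minimal sketch of sanitizing a title
    and emitting it as an OSC 0 sequence (`ESC ] 0 ; text BEL`). It is not
    the contents of `terminal_title.rs`: the function names, the exact set of
    stripped codepoints, and the `MAX_TITLE_CHARS` limit are assumptions.
    
    ```rust
    use std::io::{self, Write};
    
    // Assumed limit; the real truncation length is not stated above.
    const MAX_TITLE_CHARS: usize = 240;
    
    /// Characters that must not appear inside the escape sequence.
    fn is_disallowed(c: char) -> bool {
        c.is_control()
            || matches!(
                c,
                '\u{200B}'..='\u{200F}'     // zero-width characters, LRM/RLM
                | '\u{202A}'..='\u{202E}'   // bidi embeddings and overrides
                | '\u{2066}'..='\u{2069}'   // bidi isolates
                | '\u{FEFF}'                // zero-width no-break space
            )
    }
    
    /// Drops disallowed characters, then truncates to the assumed limit.
    pub fn sanitize_title(raw: &str) -> String {
        raw.chars()
            .filter(|c| !is_disallowed(*c))
            .take(MAX_TITLE_CHARS)
            .collect()
    }
    
    /// Writes the sanitized title as an OSC 0 sequence terminated by BEL.
    pub fn write_title<W: Write>(out: &mut W, raw: &str) -> io::Result<()> {
        write!(out, "\x1b]0;{}\x07", sanitize_title(raw))?;
        out.flush()
    }
    ```
    
    Because ESC and BEL are control characters, untrusted text cannot
    terminate the sequence early or start a new one in this sketch.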
    
    ## Non-goals
    
    - **Restoring the previous terminal title on exit.** There is no
    portable way to read the terminal's current title, so `Drop` clears the
    managed title rather than restoring it.
    - **Sharing code between `tui` and `tui_app_server`.** The
    implementation is a parallel copy, matching the existing pattern for the
    status-line feature. Extracting a shared crate is future work.
    
    ## Tradeoffs
    
    - **Duplicate code across crates.** The three core files
    (`terminal_title.rs`, `title_setup.rs`, `status_surfaces.rs`) are
    byte-for-byte copies from the classic `tui`. This was chosen for
    consistency with the existing status-line port and to avoid coupling the
    two crates at the dependency level. Future changes must be applied in
    both places.
    
    - **`status_surfaces.rs` is large (~660 lines).** It absorbs logic that
    previously lived inline in `chatwidget.rs` (status-line refresh, git
    branch management, project root discovery) plus all new terminal-title
    logic. This consolidation trades file size for a single place where both
    surfaces are coordinated.
    
    - **Spinner scheduling on every refresh.** The terminal title spinner
    (when active) schedules a frame every 100ms. This is the same pattern
    the status-indicator spinner already uses; the overhead is a timer
    registration, not a redraw.
    
    ## Architecture
    
    ```
    /title command
      -> SlashCommand::Title
      -> open_terminal_title_setup()
      -> TerminalTitleSetupView (MultiSelectPicker)
      -> on_change:  AppEvent::TerminalTitleSetupPreview  -> preview_terminal_title()
      -> on_confirm: AppEvent::TerminalTitleSetup         -> ConfigEditsBuilder + setup_terminal_title()
      -> on_cancel:  AppEvent::TerminalTitleSetupCancelled -> cancel_terminal_title_setup()
    
    Runtime title refresh:
      state change (turn start, reasoning, undo, plan update, thread rename, ...)
      -> set terminal_title_status_kind
      -> refresh_terminal_title()
      -> status_surface_selections()  (parse configs, collect invalids)
      -> refresh_terminal_title_from_selections()
         -> terminal_title_value_for_item() for each configured item
         -> assemble title string with separators
         -> skip if identical to last_terminal_title (dedup OSC writes)
         -> set_terminal_title() (sanitize + OSC 0 write)
         -> schedule spinner frame if animating
    
    Widget replacement:
      replace_chat_widget_with_app_server_thread()
      -> transfer last_terminal_title from old widget to new
      -> avoids redundant OSC clear+rewrite on session switch
    ```
    
    ## Observability
    
    - Invalid terminal-title item IDs in config emit a once-per-session
    warning via `on_warning()` (gated by
    `terminal_title_invalid_items_warned` `AtomicBool`).
    - OSC write failures are logged at `tracing::debug` level.
    - Config persistence failures are logged at `tracing::error` and
    surfaced to the user via `add_error_message()`.
    
    ## Tests
    
    - `terminal_title.rs`: 4 unit tests covering sanitization (control
    chars, bidi codepoints, truncation) and OSC output format.
    - `title_setup.rs`: 3 tests covering setup view snapshot rendering,
    parse order preservation, and invalid-ID rejection.
    - `chatwidget/tests.rs`: Updated test helpers with new fields; existing
    tests continue to pass.
    
    ---------
    
    Co-authored-by: Eric Traut <etraut@openai.com>
  • Add wildcard in the middle test coverage (#15813)
    ## Summary
    Add a focused codex network proxy unit test for the denylist pattern
    with a wildcard in the middle of a label, `region*.some.malicious.tunnel.com`.
    This does not change how existing code works; it only adds a CI guard so
    that the existing behavior stays the same.
    
    ## Why
    The managed Codex denylist update relies on this mid-label glob form,
    and the existing tests only covered exact hosts, `*.` subdomains, and
    `**.` apex plus subdomains.
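
    To make the mid-label form concrete, here is a std-only stand-in matcher. It is not the globset-based code the test exercises, and it assumes that `*` matches zero or more characters within a single DNS label; the real semantics may differ. `matches_mid_label` is a hypothetical name.

    ```rust
    /// Hypothetical matcher for patterns with at most one `*`.
    /// Assumption: `*` never crosses a `.` label boundary.
    fn matches_mid_label(pattern: &str, host: &str) -> bool {
        let pattern = pattern.to_ascii_lowercase();
        let host = host.to_ascii_lowercase();
        match pattern.split_once('*') {
            None => pattern == host,
            Some((prefix, suffix)) => {
                host.len() >= prefix.len() + suffix.len()
                    && host.starts_with(prefix)
                    && host.ends_with(suffix)
                    // The part covered by `*` must stay inside one label.
                    && !host[prefix.len()..host.len() - suffix.len()].contains('.')
            }
        }
    }
    ```

    Under that assumption, `region1.some.malicious.tunnel.com` matches the pattern, while `other.some.malicious.tunnel.com` does not.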
    
    ## Validation
    `cargo test -p codex-network-proxy
    compile_globset_supports_mid_label_wildcards`
    `cargo test -p codex-network-proxy`
    `./tools/argument-comment-lint/run-prebuilt-linter.sh -p
    codex-network-proxy`
  • [codex] Block unsafe git global options from safe allowlist (#15796)
    ## Summary
    - block git global options that can redirect config, repository, or
    helper lookup from being auto-approved as safe
    - share the unsafe global-option predicate across the Unix and Windows
    git safety checks
    - add regression coverage for inline and split forms, including `bash
    -lc` and PowerShell wrappers
    
    ## Root cause
    The Unix safe-command gate only rejected `-c` and `--config-env`, even
    though the shared git parser already knew how to skip additional
    pre-subcommand globals such as `--git-dir`, `--work-tree`,
    `--exec-path`, `--namespace`, and `--super-prefix`. That let those
    arguments slip through safe-command classification on otherwise
    read-only git invocations and bypass approval. The Windows-specific
    safe-command path had the same trust-boundary gap for git global
    options.
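
    A minimal sketch of such a shared predicate, covering the inline (`--git-dir=/x`) and split (`--git-dir /x`) forms. The function names are hypothetical and the option list is limited to the flags named above; the real predicate may cover more.

    ```rust
    /// Long options named in the root-cause description.
    const UNSAFE_LONG_GLOBALS: &[&str] = &[
        "--config-env",
        "--git-dir",
        "--work-tree",
        "--exec-path",
        "--namespace",
        "--super-prefix",
    ];

    /// Hypothetical predicate: true for `-c` and for the long options above,
    /// either bare (split form) or followed by `=value` (inline form).
    fn is_unsafe_git_global_option(arg: &str) -> bool {
        if arg == "-c" {
            return true;
        }
        UNSAFE_LONG_GLOBALS.iter().any(|opt| {
            arg == *opt
                || arg
                    .strip_prefix(*opt)
                    .is_some_and(|rest| rest.starts_with('='))
        })
    }

    /// Conservative check: rejects if any argument after `git` looks unsafe.
    /// This can also reject safe commands, e.g. a subcommand's own `-c` flag.
    fn git_args_safe_to_auto_approve(argv: &[&str]) -> bool {
        argv.iter().skip(1).all(|arg| !is_unsafe_git_global_option(arg))
    }
    ```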
  • fix: box apply_patch test harness futures (#15835)
    ## Why
    
    `#[large_stack_test]` made the `apply_patch_cli` tests pass by giving
    them more stack, but it did not address why those tests needed the extra
    stack in the first place.
    
    The real problem is the async state built by the `apply_patch_cli`
    harness path. Those tests await three helper boundaries directly:
    harness construction, turn submission, and apply-patch output
    collection. If those helpers inline their full child futures, the test
    future grows to include the whole harness startup and request/response
    path.
    
    This change replaces the workaround from #12768 with the same basic
    approach used in #13429, but keeps the fix narrower: only the helper
    boundaries awaited directly by `apply_patch_cli` stay boxed.
    
    ## What Changed
    
    - removed `#[large_stack_test]` from
    `core/tests/suite/apply_patch_cli.rs`
    - restored ordinary `#[tokio::test(flavor = "multi_thread",
    worker_threads = 2)]` annotations in that suite
    - deleted the now-unused `codex-test-macros` crate and removed its
    workspace wiring
    - boxed only the three helper boundaries that the suite awaits directly:
      - `apply_patch_harness_with(...)`
      - `TestCodexHarness::submit(...)`
      - `TestCodexHarness::apply_patch_output(...)`
    - added comments at those boxed boundaries explaining why they remain
    boxed
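
    The effect of boxing a helper boundary can be illustrated by comparing future sizes. The helpers below are hypothetical stand-ins for the harness functions, and the 4096-byte buffer is an arbitrary way to give the child future a large state.

    ```rust
    use std::future::Future;
    use std::pin::Pin;

    /// Stand-in for a helper with a large async state: `buf` lives across an await.
    async fn big_helper(seed: u8) -> usize {
        let buf = [seed; 4096];
        std::future::ready(()).await;
        buf.iter().map(|b| *b as usize).sum()
    }

    /// Awaiting the helper directly embeds its whole state in the caller's future.
    async fn caller_inline(seed: u8) -> usize {
        big_helper(seed).await
    }

    /// Boxed boundary: the child state moves to the heap, the caller holds a pointer.
    fn big_helper_boxed(seed: u8) -> Pin<Box<dyn Future<Output = usize>>> {
        Box::pin(big_helper(seed))
    }

    async fn caller_boxed(seed: u8) -> usize {
        big_helper_boxed(seed).await
    }
    ```

    Comparing `std::mem::size_of_val(&caller_inline(1))` with `std::mem::size_of_val(&caller_boxed(1))` shows the difference without needing an executor, since the futures are only constructed and never polled.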
    
    ## Testing
    
    - `cargo test -p codex-core --test all suite::apply_patch_cli --
    --nocapture`
    
    ## References
    
    - #12768
    - #13429
  • Add MCP connector metrics (#15805)
    ## Summary
    - enrich `codex.mcp.call` with `tool`, `connector_id`, and sanitized
    `connector_name` for actual MCP executions
    - record `codex.mcp.call.duration_ms` for actual MCP executions so
    connector-level latency is visible in metrics
    - keep skipped, blocked, declined, and cancelled paths on the plain
    status-only `codex.mcp.call` counter
    
    ## Included Changes
    - `codex-rs/core/src/mcp_tool_call.rs`: add connector-sliced MCP count
    and duration metrics only for executed tool calls, while leaving
    non-executed outcomes as status-only counts
    - `codex-rs/core/src/mcp_tool_call_tests.rs`: cover metric tag shaping,
    connector-name sanitization, and the new duration metric tags
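
    The tag shaping could look roughly like the sketch below. The sanitization rule (keep ASCII alphanumerics plus `.`, `_`, `-`), the `ExecutedCall` struct, and both function names are assumptions for illustration; only the tag names come from the summary above.

    ```rust
    /// Hypothetical sanitizer: replace anything outside a small safe set with `_`.
    fn sanitize_tag_value(raw: &str) -> String {
        raw.chars()
            .map(|c| {
                if c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-') {
                    c
                } else {
                    '_'
                }
            })
            .collect()
    }

    /// Hypothetical description of an MCP call that actually executed.
    struct ExecutedCall<'a> {
        tool: &'a str,
        connector_id: &'a str,
        connector_name: &'a str,
    }

    /// Status-only tags for non-executed outcomes, enriched tags for executed calls.
    fn mcp_call_tags(status: &str, executed: Option<&ExecutedCall<'_>>) -> Vec<(&'static str, String)> {
        let mut tags = vec![("status", status.to_string())];
        if let Some(call) = executed {
            tags.push(("tool", call.tool.to_string()));
            tags.push(("connector_id", call.connector_id.to_string()));
            tags.push(("connector_name", sanitize_tag_value(call.connector_name)));
        }
        tags
    }
    ```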
    
    ## Testing
    - `cargo test -p codex-core`
    - `just fix -p codex-core`
    - `just fmt`
    
    ## Notes
    - `cargo test -p codex-core` still hits existing unrelated failures in
    approvals-reviewer config tests and the sandboxed JS REPL `mktemp` test
    - full workspace `cargo test` was not run
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Fix duplicate /review messages in app-server TUI (#15839)
    ## Symptoms
    When `/review` ran through `tui_app_server`, the TUI could show
    duplicate review content:
    - the `>> Code review started: ... <<` banner appeared twice
    - the final review body could also appear twice
    
    ## Problem
    `tui_app_server` was treating review lifecycle items as renderable
    content on more than one delivery path.
    
    Specifically:
    - `EnteredReviewMode` was rendered both when the item started and again
    when it completed
    - `ExitedReviewMode` rendered the review text itself, even though the
    same review text was also delivered later as the assistant message item
    
    That meant the same logical review event was committed into history
    multiple times.
    
    ## Solution
    Make review lifecycle items control state transitions only once, and
    keep the final review body sourced from the assistant message item:
    - render the review-start banner from the live `ItemStarted` path, while
    still allowing replay to restore it once
    - treat `ExitedReviewMode` as a mode-exit/finish-banner event instead of
    rendering the review body from it
    - preserve the existing assistant-message rendering path as the single
    source of final review text
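
    One way to picture the resulting rules is a single render function keyed on the item and its delivery path. The `Delivery` and `ReviewItem` types and the finish-banner text are hypothetical simplifications, not the types used in `tui_app_server`.

    ```rust
    /// Hypothetical delivery paths for a thread item.
    #[derive(Clone, Copy)]
    enum Delivery {
        Started,
        Completed,
        Replayed,
    }

    /// Hypothetical, simplified item shapes.
    enum ReviewItem<'a> {
        EnteredReviewMode { target: &'a str },
        ExitedReviewMode,
        AssistantMessage { text: &'a str },
    }

    /// Returns the history line to commit, if any, so each logical event renders once.
    fn render(item: &ReviewItem<'_>, delivery: Delivery) -> Option<String> {
        match (item, delivery) {
            // Start banner: live start or replay, never again on completion.
            (ReviewItem::EnteredReviewMode { target }, Delivery::Started | Delivery::Replayed) => {
                Some(format!(">> Code review started: {target} <<"))
            }
            (ReviewItem::EnteredReviewMode { .. }, Delivery::Completed) => None,
            // Exit only marks the end of review mode; it does not render the body.
            (ReviewItem::ExitedReviewMode, Delivery::Completed | Delivery::Replayed) => {
                Some("<< Code review finished >>".to_string())
            }
            (ReviewItem::ExitedReviewMode, Delivery::Started) => None,
            // The assistant message is the single source of the review text.
            (ReviewItem::AssistantMessage { text }, Delivery::Completed | Delivery::Replayed) => {
                Some(text.to_string())
            }
            (ReviewItem::AssistantMessage { .. }, Delivery::Started) => None,
        }
    }
    ```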
  • [plugins] Update the suggestable plugins list. (#15829)
    - [x] Update the suggestable plugins list to be featured plugins.
  • feat: use ProcessId in exec-server (#15866)
    Use a full struct for the ProcessId to increase readability and make it
    easier to evolve in the future if needed.
  • chore: ask agents md not to play with PIDs (#15877)
    Ask Codex to be patient with Rust
  • feat: exec-server prep for unified exec (#15691)
    This PR partially rebases `unified_exec` on the `exec-server` and adapts
    the `exec-server` accordingly.
    
    ## What changed in `exec-server`
    
    1. Replaced the old "broadcast-driven; process-global" event model with
    process-scoped session events. The goal is to allow a dedicated
    handler for each process.
    2. Extended the protocol contract to support explicit lifecycle status
    and stream ordering:
      - `WriteResponse` now returns `WriteStatus` (Accepted, UnknownProcess,
      StdinClosed, Starting) instead of a bool.
      - Added seq fields to output/exited notifications.
      - Added terminal process/closed notification.
    3. Demultiplexed remote notifications into per-process channels, same as
    for the event system.
    4. Local and remote backends now both implement ExecBackend.
    5. Local backend wraps internal process ID/operations into per-process
    ExecProcess objects.
    6. Remote backend registers a session channel before launch and
    unregisters on failed launch.
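
    The per-process demultiplexing and the register-before-launch rule can be sketched with standard channels. All type and method names here are hypothetical, and the notification shapes are reduced to the fields mentioned above.

    ```rust
    use std::collections::HashMap;
    use std::sync::mpsc::{channel, Receiver, Sender};

    #[derive(Clone, Debug, PartialEq, Eq, Hash)]
    struct ProcessId(String);

    /// Simplified notifications; `seq` makes stream ordering explicit.
    #[derive(Debug, PartialEq, Eq)]
    enum Notification {
        Output { seq: u64, chunk: Vec<u8> },
        Exited { seq: u64, exit_code: i32 },
        Closed,
    }

    /// Routes each notification to the channel registered for its process.
    #[derive(Default)]
    struct Demux {
        sessions: HashMap<ProcessId, Sender<Notification>>,
    }

    impl Demux {
        /// Register before launching so early output is not lost.
        fn register(&mut self, id: ProcessId) -> Receiver<Notification> {
            let (tx, rx) = channel();
            self.sessions.insert(id, tx);
            rx
        }

        /// Called when the launch fails.
        fn unregister(&mut self, id: &ProcessId) {
            self.sessions.remove(id);
        }

        /// Returns false when no session is registered for `id`.
        fn route(&mut self, id: &ProcessId, notification: Notification) -> bool {
            let is_closed = matches!(notification, Notification::Closed);
            let delivered = match self.sessions.get(id) {
                Some(tx) => tx.send(notification).is_ok(),
                None => false,
            };
            if is_closed {
                self.sessions.remove(id); // terminal notification ends the session
            }
            delivered
        }
    }
    ```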
    
    ## What changed in `unified_exec`
    
    1. Added unified process-state model and backend-neutral process
    wrapper. This will probably disappear in the future, but it makes it
    easier to keep the work flowing on both sides.
    - `UnifiedExecProcess` now handles both local PTY sessions and remote
    exec-server processes through a shared `ProcessHandle`.
    - Added `ProcessState` to track has_exited, exit_code, and terminal
    failure message consistently across backends.
    2. Routed write and lifecycle handling through process-level methods.
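
    A minimal sketch of what a backend-neutral state plus write-status handling might look like. The field names follow the description above; the methods, the `check_write` helper, and the way `Starting` is treated as an error are assumptions.

    ```rust
    /// Variants as listed for `WriteStatus` in the exec-server changes.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum WriteStatus {
        Accepted,
        UnknownProcess,
        StdinClosed,
        Starting,
    }

    /// State tracked the same way for local and remote processes.
    #[derive(Debug, Default, Clone, PartialEq, Eq)]
    struct ProcessState {
        has_exited: bool,
        exit_code: Option<i32>,
        failure_message: Option<String>,
    }

    impl ProcessState {
        /// Hypothetical transition for a normal exit.
        fn exited(&self, exit_code: Option<i32>) -> Self {
            Self { has_exited: true, exit_code, failure_message: self.failure_message.clone() }
        }

        /// Hypothetical transition for a terminal failure (launch, write, transport).
        fn failed(&self, message: impl Into<String>) -> Self {
            Self { has_exited: true, exit_code: self.exit_code, failure_message: Some(message.into()) }
        }
    }

    /// Hypothetical mapping from a write status to a caller-visible result.
    fn check_write(status: WriteStatus) -> Result<(), &'static str> {
        match status {
            WriteStatus::Accepted => Ok(()),
            WriteStatus::UnknownProcess => Err("unknown process"),
            WriteStatus::StdinClosed => Err("stdin is closed"),
            WriteStatus::Starting => Err("process is still starting"),
        }
    }
    ```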
    
    ## Some rationale
    
    1. The change centralizes execution transport in exec-server while
    preserving policy and orchestration ownership in core, avoiding
    duplicated launch approval logic. This comes from internal discussion.
    2. Session-scoped events remove coupling/cross-talk between processes
    and make stream ordering and terminal state explicit (seq, closed,
    failed).
    3. The failure-path surfacing (remote launch failures, write failures,
    transport disconnects) makes command tool output and cleanup behavior
    deterministic.
    
    ## Follow-ups:
    * Unify the concept of thread ID behind an obfuscated struct
    * FD handling
    * Full zsh-fork compatibility
    * Full network sandboxing compatibility
    * Handle ws disconnection
  • feat: clean spawn v1 (#15861)
    Avoid the usage of path in the v1 spawn
  • feat: replace askama by custom lib (#15784)
    Finish dropping `askama` in favor of our internal lib.
  • fix: fix old system bubblewrap compatibility without falling back to vendored bwrap (#15693)
    Fixes #15283.
    
    ## Summary
    Older system bubblewrap builds reject `--argv0`, which makes our Linux
    sandbox fail before the helper can re-exec. This PR keeps using system
    `/usr/bin/bwrap` whenever it exists and only falls back to vendored
    bwrap when the system binary is missing. That matters on stricter
    AppArmor hosts, where the distro bwrap package also provides the policy
    setup needed for user namespaces.
    
    For old system bwrap, we avoid `--argv0` instead of switching binaries:
    - pass the sandbox helper a full-path `argv0`,
    - keep the existing `current_exe() + --argv0` path when the selected
    launcher supports it,
    - otherwise omit `--argv0` and re-exec through the helper's own
    `argv[0]` path, whose basename still dispatches as
    `codex-linux-sandbox`.
    
    Also updates the launcher/warning tests and docs so they match the new
    behavior: present-but-old system bwrap uses the compatibility path, and
    only absent system bwrap falls back to vendored.
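
    The selection rule can be written as a small decision function. The enum and function names are hypothetical, and the sketch assumes the vendored bwrap always supports `--argv0`.

    ```rust
    #[derive(Debug, PartialEq, Eq)]
    enum Launcher {
        SystemBwrap,
        VendoredBwrap,
    }

    #[derive(Debug, PartialEq, Eq)]
    enum ReExec {
        /// Existing path: `current_exe()` plus `--argv0`.
        CurrentExeWithArgv0Flag,
        /// Compatibility path: omit `--argv0`, re-exec via the helper's own `argv[0]`.
        HelperArgv0Path,
    }

    /// Prefer system bwrap whenever present; only its absence selects vendored bwrap.
    fn plan_launch(system_bwrap_present: bool, system_supports_argv0: bool) -> (Launcher, ReExec) {
        if !system_bwrap_present {
            // Assumption: the vendored build supports `--argv0`.
            return (Launcher::VendoredBwrap, ReExec::CurrentExeWithArgv0Flag);
        }
        if system_supports_argv0 {
            (Launcher::SystemBwrap, ReExec::CurrentExeWithArgv0Flag)
        } else {
            (Launcher::SystemBwrap, ReExec::HelperArgv0Path)
        }
    }
    ```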
    
    ### Validation
    
    1. Install Ubuntu 20.04 in a VM
    2. Compile codex and run without bubblewrap installed - see a warning
    about falling back to the vendored bwrap
    3. Install bwrap and verify the version is 0.4.0, which lacks `--argv0` support
    4. Run codex and use the apply_patch tool without errors
    
    <img width="802" height="631" alt="Screenshot 2026-03-25 at 11 48 36 PM"
    src="https://github.com/user-attachments/assets/77248a29-aa38-4d7c-9833-496ec6a458b8"
    />
    <img width="807" height="634" alt="Screenshot 2026-03-25 at 11 47 32 PM"
    src="https://github.com/user-attachments/assets/5af8b850-a466-489b-95a6-455b76b5050f"
    />
    <img width="812" height="635" alt="Screenshot 2026-03-25 at 11 45 45 PM"
    src="https://github.com/user-attachments/assets/438074f0-8435-4274-a667-332efdd5cb57"
    />
    <img width="801" height="623" alt="Screenshot 2026-03-25 at 11 43 56 PM"
    src="https://github.com/user-attachments/assets/0dc8d3f5-e8cf-4218-b4b4-a4f7d9bf02e3"
    />
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Expand home-relative paths on Windows (#15817)
    Follow-up to https://github.com/openai/codex/pull/9193: also support
    this on Windows.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Wire remote app-server auth through the client (#14853)
    For app-server websocket auth, support the two server-side mechanisms
    from PR #14847:
    
    - `--ws-auth capability-token --ws-token-file /abs/path`
    - `--ws-auth signed-bearer-token --ws-shared-secret-file /abs/path`
      with optional `--ws-issuer`, `--ws-audience`, and
      `--ws-max-clock-skew-seconds`
    
    On the client side, add interactive remote support via:
    
    - `--remote ws://host:port` or `--remote wss://host:port`
    - `--remote-auth-token-env <ENV_VAR>`
    
    Codex reads the bearer token from the named environment variable and
    sends it as `Authorization: Bearer <token>` during the websocket
    handshake. Remote auth tokens are only allowed for `wss://` URLs or
    loopback `ws://` URLs.
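
    The URL restriction might be checked along these lines. This is a std-only sketch with hand-rolled URL parsing; the function names are hypothetical, and treating `localhost` as loopback is an assumption.

    ```rust
    use std::net::IpAddr;

    /// Extract the host from an authority such as `user@host:port` or `[::1]:port`.
    fn host_of(authority: &str) -> &str {
        let authority = authority.rsplit('@').next().unwrap_or(authority);
        if let Some(rest) = authority.strip_prefix('[') {
            // Bracketed IPv6 literal.
            rest.split(']').next().unwrap_or("")
        } else {
            authority.split(':').next().unwrap_or("")
        }
    }

    /// True when a bearer token may be sent: any `wss://` URL, or `ws://` to loopback.
    fn remote_auth_token_allowed(url: &str) -> bool {
        if url.starts_with("wss://") {
            return true;
        }
        let Some(rest) = url.strip_prefix("ws://") else {
            return false;
        };
        let authority = rest.split(['/', '?', '#']).next().unwrap_or("");
        let host = host_of(authority);
        host.eq_ignore_ascii_case("localhost")
            || host.parse::<IpAddr>().is_ok_and(|ip| ip.is_loopback())
    }

    /// Header value sent during the websocket handshake.
    fn authorization_header(token: &str) -> String {
        format!("Bearer {token}")
    }
    ```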
    
    Testing:
    - tested both auth methods manually to confirm connection success and
    rejection for both auth types
  • Fix quoted command rendering in tui_app_server (#15825)
    When `tui_app_server` is enabled, shell commands in the transcript
    render as fully quoted invocations like `/bin/zsh -lc "..."`. The
    non-app-server TUI correctly shows the parsed command body.
    
    Root cause:
    The app-server stores `ThreadItem::CommandExecution.command` as a
    shell-quoted string. When `tui_app_server` bridges that item back into
    the exec renderer, it was passing `vec![command]` unchanged instead of
    splitting the string back into argv. That prevented
    `strip_bash_lc_and_escape()` from recognizing the shell wrapper, so the
    renderer displayed the wrapper literally.
    
    Solution:
    Add a shared command-string splitter that round-trips shell-quoted
    commands back into argv when it is safe to do so, while preserving
    non-roundtrippable inputs as a single string. Use that helper everywhere
    `tui_app_server` reconstructs exec commands from app-server payloads,
    including live command-execution items, replayed thread items, and exec
    approval requests. This restores the same command display behavior as
    the direct TUI path without breaking Windows-style commands that cannot
    be safely round-tripped.
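
    A simplified, std-only sketch of the split-with-fallback idea follows. It is not the shared helper from this PR: the quoting rules are reduced (any backslash escapes the next character), and it only falls back on parse failure, whereas the real helper also preserves inputs that would not round-trip. Both function names are hypothetical.

    ```rust
    /// Split a shell-quoted string into words; None on unterminated quotes or escapes.
    fn split_simple(command: &str) -> Option<Vec<String>> {
        let mut args = Vec::new();
        let mut current = String::new();
        let mut in_word = false;
        let mut chars = command.chars();
        while let Some(c) = chars.next() {
            match c {
                '\'' => {
                    in_word = true;
                    loop {
                        match chars.next()? {
                            '\'' => break,
                            ch => current.push(ch),
                        }
                    }
                }
                '"' => {
                    in_word = true;
                    loop {
                        match chars.next()? {
                            '"' => break,
                            '\\' => current.push(chars.next()?),
                            ch => current.push(ch),
                        }
                    }
                }
                '\\' => {
                    in_word = true;
                    current.push(chars.next()?);
                }
                c if c.is_whitespace() => {
                    if in_word {
                        args.push(std::mem::take(&mut current));
                        in_word = false;
                    }
                }
                c => {
                    in_word = true;
                    current.push(c);
                }
            }
        }
        if in_word {
            args.push(current);
        }
        Some(args)
    }

    /// Recover argv when possible; otherwise keep the input as a single string.
    fn command_to_argv(command: &str) -> Vec<String> {
        match split_simple(command) {
            Some(argv) if !argv.is_empty() => argv,
            _ => vec![command.to_string()],
        }
    }
    ```

    With argv recovered as `["/bin/zsh", "-lc", "..."]`, a wrapper-stripping step such as `strip_bash_lc_and_escape()` has the shape it needs to recognize the shell wrapper.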
  • [plugins] Flip flags on. (#15820)
    - [x] Flip flags on.
  • [hooks] add non-streaming (non-stdin style) shell-only PostToolUse support (#15531)
    CHAINED PR - note that the base is eternal/hooks-pretooluse-bash, not
    main, so the PR linked below should go first.
    
    This adds the post-tool hook matching the pre-tool functionality here:
    https://github.com/openai/codex/pull/15211
    
    PreToolUse is called for plain shell calls and allows blocking. This
    PostToolUse hook runs after the command has executed.
    
    example run:
    
    ```
    › as a test, run in parallel the following commands:
      - echo 'one'
      - echo '[block-pre-tool-use]'
      - echo '[block-post-tool-use]'
    
    
    ⚠ MCP startup incomplete (failed: notion, linear)
    
    • Cruising through those three commands in parallel now, and I’ll share the exact outputs right after
      they land.
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    PreToolUse hook (blocked)
      warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
      feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo 'one'
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo '[block-post-tool-use]'
    
    • Ran echo '[block-post-tool-use]'
      └ [block-post-tool-use]
    
    • Ran echo 'one'
      └ one
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    PostToolUse hook (blocked)
      warning: wizard-tower PostToolUse demo blocked the Bash follow-through on purpose.
      feedback: PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to
    continue.
    
    PostToolUse hook (completed)
      warning: wizard-tower PostToolUse demo inspected Bash: echo 'one'
      hook context: PostToolUse demo saw the Bash result. Response preview: one
    
    ──────────────────────────────────────────────────────────────────────────────────────────────────────
    
    • Ran them in parallel. Little wave report:
    
      - echo 'one' completed and returned one.
      - echo '[block-pre-tool-use]' was blocked before execution by the PreToolUse hook.
      - echo '[block-post-tool-use]' ran, but its result was blocked afterward by the PostToolUse hook.
    
      The hook messages were:
    
      - PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
      - PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue.
    
    
    › for the blocked messages (correctly blocked in this test) did the tool output land back into
      context?
    
    
    • Yeah, from this run:
    
      - PreToolUse block: the command itself did not execute, so no command output landed in context. Only
        the block message did.
      - PostToolUse block: the command did execute, but the actual stdout did not come back into context.
        Only the post-tool block message did.
    
      So the blocked notifications landed in context, but the blocked command payloads themselves did not.
      Nice clean guardrail, aloha.
    ```
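
    The behavior shown in the run above can be summarized as: a PreToolUse block prevents execution, a PostToolUse block lets the command run but withholds its output, and in both cases only the block feedback reaches context. The sketch below models that with hypothetical types and demo hooks keyed on the markers from the transcript; it is not the hook runtime itself.

    ```rust
    enum HookDecision {
        Continue,
        Block { feedback: String },
    }

    #[derive(Debug, PartialEq, Eq)]
    struct ToolOutcome {
        executed: bool,
        /// What lands back in the model's context.
        context: String,
    }

    fn run_shell_tool<Pre, Exec, Post>(command: &str, pre: Pre, exec: Exec, post: Post) -> ToolOutcome
    where
        Pre: Fn(&str) -> HookDecision,
        Exec: FnOnce(&str) -> String,
        Post: Fn(&str, &str) -> HookDecision,
    {
        // A pre-tool block means the command never runs.
        if let HookDecision::Block { feedback } = pre(command) {
            return ToolOutcome { executed: false, context: feedback };
        }
        let stdout = exec(command);
        // A post-tool block replaces the command output with the feedback.
        match post(command, &stdout) {
            HookDecision::Block { feedback } => ToolOutcome { executed: true, context: feedback },
            HookDecision::Continue => ToolOutcome { executed: true, context: stdout },
        }
    }

    /// Demo hooks modeled on the markers used in the example run.
    fn demo_pre(command: &str) -> HookDecision {
        if command.contains("[block-pre-tool-use]") {
            HookDecision::Block { feedback: "PreToolUse demo blocked the command.".to_string() }
        } else {
            HookDecision::Continue
        }
    }

    fn demo_post(_command: &str, stdout: &str) -> HookDecision {
        if stdout.contains("[block-post-tool-use]") {
            HookDecision::Block { feedback: "PostToolUse demo blocked the result after execution.".to_string() }
        } else {
            HookDecision::Continue
        }
    }
    ```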