Commit Graph

5829 Commits

  • Fix codex-core config test type paths (#19726)
    Summary:
    - Update config tests to reference config requirement types from
    codex_config after the loader split.
    
    Tests:
    - just fmt
    - cargo build -p codex-core --tests
    - cargo clippy -p codex-core --tests -- -D warnings
  • permissions: migrate approval and sandbox consumers to profiles (#19393)
    ## Why
    
    Runtime decisions should not infer permissions from the lossy legacy
    sandbox projection once `PermissionProfile` is available. In particular,
    `Disabled` and `External` need to remain distinct, and managed profiles
    with split filesystem or deny-read rules should not be collapsed before
    approval, network, safety, or analytics code makes decisions.
    
    ## What Changed
    
    - Changes managed network proxy setup and network approval logic to use
    `PermissionProfile` when deciding whether a managed sandbox is active.
    - Migrates patch safety, Guardian/user-shell approval paths, Landlock
    helper setup, analytics sandbox classification, and selected
    turn/session code to profile-backed permissions.
    - Validates command-level profile overrides against the constrained
    `PermissionProfile` rather than a strict `SandboxPolicy` round trip.
    - Preserves configured deny-read restrictions when command profiles are
    narrowed.
    - Adds coverage for profile-backed trust, network proxy/approval
    behavior, patch safety, analytics classification, and command-profile
    narrowing.
    
    ## Verification
    
    - `cargo test -p codex-core direct_write_roots`
    - `cargo test -p codex-core runtime_roots_to_legacy_projection`
    - `cargo test -p codex-app-server
    requested_permissions_trust_project_uses_permission_profile_intent`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
    * #19395
    * #19394
    * __->__ #19393
  • [codex] Move config loading into codex-config (#19487)
    ## Why
    
    Config loading had become split across crates: `codex-config` owned the
    config types and merge logic, while `codex-core` still owned the loader
    that assembled the layer stack. This change consolidates that
    responsibility in `codex-config`, so the crate that defines config
    behavior also owns how configs are discovered and loaded.
    
    To make that move possible without reintroducing the old dependency
    cycle, the shell-environment policy types and helpers that
    `codex-exec-server` needs now live in `codex-protocol` instead of
    flowing through `codex-config`.
    
    This also makes the migrated loader tests more deterministic on machines
    that already have managed or system Codex config installed by letting
    tests override the system config and requirements paths instead of
    reading the host's `/etc/codex`.
    
    ## What Changed
    
    - moved the config loader implementation from `codex-core` into
    `codex-config::loader` and deleted the old `core::config_loader` module
    instead of leaving a compatibility shim
    - moved shell-environment policy types and helpers into
    `codex-protocol`, then updated `codex-exec-server` and other downstream
    crates to import them from their new home
    - updated downstream callers to use loader/config APIs from
    `codex-config`
    - added test-only loader overrides for system config and requirements
    paths so loader-focused tests do not depend on host-managed config state
    - cleaned up now-unused dependency entries and platform-specific cfgs
    that were surfaced by post-push CI
    
    ## Testing
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core config_loader_tests::`
    - `cargo test -p codex-protocol -p codex-exec-server -p
    codex-cloud-requirements -p codex-rmcp-client --lib`
    - `cargo test --lib -p codex-app-server-client -p codex-exec`
    - `cargo test --no-run --lib -p codex-app-server`
    - `cargo test -p codex-linux-sandbox --lib`
    - `cargo shear`
    - `just bazel-lock-check`
    
    ## Notes
    
    - I did not chase unrelated full-suite failures outside the migrated
    loader surface.
    - `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
    failures on this machine, and Windows CI still shows unrelated
    long-running or timed-out tests outside the loader migration itself.
  • Lift app-server JSON-RPC error handling to request boundary (#19484)
    ## Why
    
    App-server request handling had a lot of repeated JSON-RPC error
    construction and one-off `send_error`/`return` branches. This made small
    handlers noisy and pushed error response details into leaf code that
    otherwise only needed to validate input or call the underlying API.
    
    ## What Changed
    
    - Added shared JSON-RPC error constructors in
    `codex-rs/app-server/src/error_code.rs`.
    - Lifted straightforward request result emission into
    `codex-rs/app-server/src/message_processor.rs` so response/error
    dispatch happens at the request boundary.
    - Reused the result helpers across command exec, config, filesystem,
    device-key, external-agent config, fs-watch, and outgoing-message paths.
    - Removed leaf wrapper handlers where the method body was only
    forwarding to a response helper.
    - Returned request validation errors upward in the simple cases instead
    of sending an error locally and immediately returning.
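
    The pattern described above can be sketched roughly as follows. This
    is a minimal illustration, not the app-server code: `RpcError`,
    `Outgoing`, `invalid_params`, `read_file_handler`, and `dispatch` are
    hypothetical names. The only fixed value is `-32602`, the JSON-RPC 2.0
    "Invalid params" code.

    ```rust
    /// Hypothetical JSON-RPC error payload.
    #[derive(Debug, PartialEq)]
    struct RpcError {
        code: i64,
        message: String,
    }

    /// Shared constructor so leaf handlers do not build errors by hand.
    /// -32602 is the JSON-RPC 2.0 "Invalid params" code.
    fn invalid_params(message: impl Into<String>) -> RpcError {
        RpcError { code: -32602, message: message.into() }
    }

    /// What the request boundary finally emits (hypothetical shape).
    #[derive(Debug, PartialEq)]
    enum Outgoing {
        Response(String),
        Error(RpcError),
    }

    /// A leaf handler only validates input and returns a `Result`;
    /// it never sends an error itself.
    fn read_file_handler(path: &str) -> Result<String, RpcError> {
        if path.is_empty() {
            return Err(invalid_params("path must not be empty"));
        }
        Ok(format!("contents of {}", path))
    }

    /// The request boundary turns any handler result into one outgoing message.
    fn dispatch(result: Result<String, RpcError>) -> Outgoing {
        match result {
            Ok(value) => Outgoing::Response(value),
            Err(error) => Outgoing::Error(error),
        }
    }
    ```

    With this shape, adding a new validation failure to a handler is a
    one-line `return Err(...)`, and response/error emission lives in a
    single place.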
    
    ## Verification
    
    - `cargo test -p codex-app-server --lib command_exec::tests`
    - `cargo test -p codex-app-server --lib outgoing_message::tests`
    - `cargo test -p codex-app-server --lib in_process::tests`
    - `cargo test -p codex-app-server --test all v2::fs`
    - `cargo test -p codex-app-server --test all v2::config_rpc`
    - `cargo test -p codex-app-server --test all v2::external_agent_config`
    - `cargo test -p codex-app-server --test all v2::initialize`
    - `just fix -p codex-app-server`
    - `git diff --check`
    
    Note: full `cargo test -p codex-app-server` was attempted and stopped in
    `message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
    with a stack overflow after unrelated tests had already passed.
  • permissions: derive compatibility policies from profiles (#19392)
    ## Why
    
    After #19391, `PermissionProfile` and the split filesystem/network
    policies could still be stored in parallel. That creates drift risk: a
    profile can preserve deny globs, external enforcement, or split
    filesystem entries while a cached projection silently loses those
    details. This PR makes the profile the runtime source and derives
    compatibility views from it.
    
    ## What Changed
    
    - Removes stored filesystem/network sandbox projections from
    `Permissions` and `SessionConfiguration`; their accessors now derive
    from the canonical `PermissionProfile`.
    - Derives legacy `SandboxPolicy` snapshots from profiles only where an
    older API still needs that field.
    - Updates MCP connection and elicitation state to track
    `PermissionProfile` instead of `SandboxPolicy` for auto-approval
    decisions.
    - Adds semantic filesystem-policy comparison so cwd changes can preserve
    richer profiles while still recognizing equivalent legacy projections
    independent of entry ordering.
    - Updates config/session tests to assert profile-derived projections
    instead of parallel stored fields.
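
    One way to read "semantic comparison independent of entry ordering" is
    a set comparison over policy entries. The sketch below assumes that
    reading. `Access`, `Entry`, and `semantically_equal` are hypothetical
    and do not mirror the real filesystem-policy types.

    ```rust
    use std::collections::BTreeSet;

    /// Hypothetical access levels for a filesystem policy entry.
    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
    enum Access {
        Read,
        Write,
        Deny,
    }

    /// Hypothetical policy entry: a path plus the access granted or denied.
    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
    struct Entry {
        path: String,
        access: Access,
    }

    /// Two policies are treated as equivalent when they contain the same
    /// entries, regardless of order (duplicates are ignored as well).
    fn semantically_equal(a: &[Entry], b: &[Entry]) -> bool {
        let a: BTreeSet<&Entry> = a.iter().collect();
        let b: BTreeSet<&Entry> = b.iter().collect();
        a == b
    }
    ```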
    
    ## Verification
    
    - `cargo test -p codex-core direct_write_roots`
    - `cargo test -p codex-core runtime_roots_to_legacy_projection`
    - `cargo test -p codex-app-server
    requested_permissions_trust_project_uses_permission_profile_intent`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
    * #19395
    * #19394
    * #19393
    * __->__ #19392
  • permissions: make runtime config profile-backed (#19606)
    ## Why
    
    This supersedes #19391. During stack repair, GitHub marked #19391 as
    merged into a temporary stack branch rather than into `main`, so the
    runtime-config change needed a fresh PR.
    
    `PermissionProfile` is now the canonical permissions shape after #19231
    because it can distinguish `Managed`, `Disabled`, and `External`
    enforcement while also carrying filesystem rules that legacy
    `SandboxPolicy` cannot represent cleanly. Core config and session state
    still needed to accept profile-backed permissions without forcing every
    profile through the strict legacy bridge, which rejected valid runtime
    profiles such as direct write roots.
    
    The unrelated CI/test hardening that previously rode along with this PR
    has been split into #19683 so this PR stays focused on the permissions
    model migration.
    
    ## What Changed
    
    - Adds `Permissions.permission_profile` and
    `SessionConfiguration.permission_profile` as constrained runtime state,
    while keeping `sandbox_policy` as a legacy compatibility projection.
    - Introduces profile setters that keep `PermissionProfile`, split
    filesystem/network policies, and legacy `SandboxPolicy` projections
    synchronized.
    - Uses a compatibility projection for requirement checks and legacy
    consumers instead of rejecting profiles that cannot round-trip through
    `SandboxPolicy` exactly.
    - Updates config loading, config overrides, session updates, turn
    context plumbing, prompt permission text, sandbox tags, and exec request
    construction to carry profile-backed runtime permissions.
    - Preserves configured deny-read entries and `glob_scan_max_depth` when
    command/session profiles are narrowed.
    - Adds `PermissionProfile::read_only()` and
    `PermissionProfile::workspace_write()` presets that match legacy
    defaults.
    
    ## Verification
    
    - `cargo test -p codex-core direct_write_roots`
    - `cargo test -p codex-core runtime_roots_to_legacy_projection`
    - `cargo test -p codex-app-server
    requested_permissions_trust_project_uses_permission_profile_intent`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
    * #19395
    * #19394
    * #19393
    * #19392
    * __->__ #19606
  • feat: load AgentIdentity from JWT login/env (#18904)
    ## Summary
    
    This PR lets programmatic AgentIdentity users provide one token through
    either stdin login or environment auth.
    
    `codex login --with-agent-identity` reads an Agent Identity JWT from
    stdin, validates that it has the required claims, and stores that token
    as the `agent_identity` value in `auth.json`. The file format is
    token-only; the decoded account and key fields are runtime state, not
    hand-authored auth.json fields.
    
    The Agent Identity JWT claim shape and decoder live in
    `codex-agent-identity`; `codex-login` only owns env/storage precedence
    and conversion into `CodexAuth::AgentIdentity`.
    
    When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same
    JWT without writing auth state to disk. `CODEX_API_KEY` still wins if
    both env vars are set.
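
    The env precedence described above can be sketched as below. This is
    an illustration under assumptions, not the `codex-login`
    implementation: `EnvAuth` and `auth_from_env` are hypothetical, and
    JWT validation is omitted. Only the two variable names and the rule
    that `CODEX_API_KEY` wins come from this description.

    ```rust
    /// Hypothetical result of resolving auth from the environment.
    #[derive(Debug, PartialEq)]
    enum EnvAuth {
        ApiKey(String),
        AgentIdentity(String),
    }

    /// Resolve env auth through an injected lookup so the precedence rule
    /// can be exercised without touching the real process environment.
    fn auth_from_env(lookup: impl Fn(&str) -> Option<String>) -> Option<EnvAuth> {
        // `CODEX_API_KEY` takes precedence when both variables are set.
        if let Some(key) = lookup("CODEX_API_KEY") {
            return Some(EnvAuth::ApiKey(key));
        }
        // Otherwise fall back to the Agent Identity JWT, if any.
        lookup("CODEX_AGENT_IDENTITY").map(EnvAuth::AgentIdentity)
    }
    ```

    In a real binary the lookup could be `|name| std::env::var(name).ok()`.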
    
    Reference old stack: https://github.com/openai/codex/pull/17387/changes
    Reference JWT/env stack: https://github.com/openai/codex/pull/18176
    
    ## Stack
    
    1. https://github.com/openai/codex/pull/18757: full revert
    2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
    crate
    3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
    auth mode and startup task allocation
    4. https://github.com/openai/codex/pull/18811: migrate Codex backend
    auth callsites through AuthProvider
    5. This PR: accept AgentIdentity JWTs through login/env
    
    ## Testing
    
    Tests: targeted login and Agent Identity crate tests, CLI checks, scoped
    formatter/linter cleanup, and CI.
    
    ---------
    
    Co-authored-by: Shijie Rao <shijie.rao@openai.com>
  • test: harden app-server integration tests (#19683)
    ## Why
    
    Windows Bazel runs in the permissions stack exposed that app-server
    integration tests were launching normal plugin startup warmups in every
    subprocess. Those warmups can call
    `https://chatgpt.com/backend-api/plugins/featured` when a test is not
    specifically exercising plugin startup, which adds slow background work,
    noisy stderr, and dependence on external network state. The relevant
    startup/featured-plugin behavior was introduced across #15042 and
    #15264.
    
    A few app-server tests also had long optional waits or unbounded cleanup
    paths, making failures expensive to diagnose and contributing to slow
    Windows shards. One external-agent config test from #18246 used a
    GitHub-style marketplace source, which was enough to exercise the
    pending remote-import path but also meant the background completion task
    could attempt a real clone.
    
    ## What Changed
    
    - Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
    plumbing and a hidden debug-only
    `--disable-plugin-startup-tasks-for-tests` app-server flag, so
    integration tests can suppress startup plugin warmups without adding a
    production env-var gate.
    - Has the app-server test harness pass that hidden flag by default,
    while opting plugin-startup coverage back in for tests that
    intentionally exercise startup sync and featured-plugin warmup behavior.
    - Lowers normal app-server subprocess logging from `info`/`debug` to
    `warn` to avoid multi-megabyte stderr output in Bazel logs.
    - Prevents the external-agent config test from attempting a real
    marketplace clone by using an invalid non-local source while still
    exercising the pending-import completion path.
    - Bounds optional filesystem/realtime waits and fake WebSocket
    test-server shutdown so failures produce targeted timeouts instead of
    hanging a shard.
    - Fixes the Unix script-resolution test in `rmcp-client` to exercise
    PATH resolution directly and include the actual spawn error in failures.
    
    ## Verification
    
    - `cargo check -p codex-app-server`
    - `cargo clippy -p codex-app-server --tests -- -D warnings`
    - `cargo test -p codex-rmcp-client
    program_resolver::tests::test_unix_executes_script_without_extension`
    - `cargo test -p codex-app-server --test all
    external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
    -- --nocapture`
    - `cargo test -p codex-app-server --test all
    plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
    --nocapture`
    - Windows Local Bazel passed with this test-hardening bundle before it
    was extracted from #19606.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
    * #19395
    * #19394
    * #19393
    * #19392
    * #19606
    * __->__ #19683
  • [codex] remove responses command (#19640)
    This removes the hidden `codex responses` CLI subcommand after
    confirming no downstream callers rely on it, deleting the raw Responses
    passthrough implementation, unregistering the subcommand, and dropping
    the now-unused CLI dependencies on `codex-api` and
    `codex-model-provider`.
  • Support end_turn in response.completed (#19610)
    Some Responses API providers forward a model-defined `end_turn`
    boolean that states explicitly whether the model wants to end the turn
    or be sampled again. This PR updates the sampling loop to honor that
    field when it is set. If the provider does not set it, we fall back to
    the existing sampling logic.
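
    As a rough sketch of that decision (not the actual sampling loop):
    `needs_follow_up` is a hypothetical helper, and
    `fallback_needs_follow_up` stands in for whatever the existing logic
    would have decided.

    ```rust
    /// Decide whether the sampling loop should sample the model again.
    ///
    /// `end_turn` is the optional provider-supplied flag from
    /// `response.completed`; `fallback_needs_follow_up` is the decision the
    /// pre-existing logic would have made (an assumption of this sketch).
    fn needs_follow_up(end_turn: Option<bool>, fallback_needs_follow_up: bool) -> bool {
        match end_turn {
            // The provider forwarded an explicit signal: honor it.
            Some(end) => !end,
            // No signal from the provider: defer to the existing logic.
            None => fallback_needs_follow_up,
        }
    }
    ```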
  • fix(tui): reflow scrollback on terminal resize (#18575)
    Fixes multiple scrollback and terminal resize issues: #5538, #5576,
    #8352, #12223, #16165, and #15380.
    
    ## Why
    
    Codex writes finalized transcript output into terminal scrollback after
    wrapping it for the current viewport width. A later terminal resize
    could leave that scrollback shaped for the old width, so wider windows
    kept narrow output and narrower windows could show stale wrapping
    artifacts until enough new output replaced the visible area.
    
    This is also the foundation PR for responsive markdown tables. Table
    rendering needs finalized transcript content to be width-sensitive after
    insertion, not only while content is first streaming. Markdown table
    rendering itself stays in #18576.
    
    ## Stack
    
    - PR1: resize backlog reflow and interrupt cleanup
    - #18576: markdown table support
    
    ## What Changed
    
    - Rebuild source-backed transcript history when the terminal width
    changes. `terminal_resize_reflow` is introduced through the experimental
    feature system, but is enabled by default for this rollout so we can
    validate behavior across real terminals.
    - Preserve assistant and plan stream source so finalized streaming
    output can participate in resize reflow after consolidation.
    - Debounce resize work, but force a final source-backed reflow when a
    resize happened during active or unconsolidated streaming output.
    - Clear stale pending history lines on resize so old-width wrapped
    output is not emitted just before rebuilt scrollback.
    - Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
    omitted uses terminal-specific defaults, `0` keeps all rendered rows,
    and a positive value sets an explicit cap. The cap applies both while
    initially replaying a resumed transcript into scrollback and when
    rebuilding scrollback after terminal resize.
    - Consolidate interrupted assistant streams before cleanup, then clear
    pending stream output and active-tail state consistently.
    - Move resize reflow and thread event buffering helpers out of `app.rs`
    into dedicated TUI modules.
    - Add focused coverage for resize reflow, feature-gated behavior,
    streaming source preservation, interrupted output cleanup,
    unicode-neutral text, terminal-specific row caps, and composer/layout
    stability.
    
    ## Runtime Bounds
    
    Resize reflow keeps only the most recent rendered rows when a row cap is
    active. The default is `auto`, which maps to the detected terminal's
    default scrollback size where Codex can identify it: VS Code `1000`,
    Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
    Terminals without a dedicated mapping use the conservative fallback of
    `1000` rows. Users can override this with `[tui.terminal_resize_reflow]
    max_rows = N`, or set `max_rows = 0` to disable row limiting.
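
    The cap resolution above can be sketched as follows. The numbers come
    from this description; the `Terminal` enum and the `auto_row_cap`,
    `effective_row_cap`, and `retain_recent` helpers are hypothetical and
    only illustrate the rule.

    ```rust
    /// Terminals named in this description, plus a catch-all.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum Terminal {
        VsCode,
        WindowsTerminal,
        WezTerm,
        Alacritty,
        Other,
    }

    /// Default cap used when `max_rows` is omitted (`auto`).
    fn auto_row_cap(terminal: Terminal) -> usize {
        match terminal {
            Terminal::VsCode => 1000,
            Terminal::WindowsTerminal => 9001,
            Terminal::WezTerm => 3500,
            Terminal::Alacritty => 10000,
            // Conservative fallback for terminals without a mapping.
            Terminal::Other => 1000,
        }
    }

    /// Resolve the configured value; `None` in the result means "no limit".
    fn effective_row_cap(max_rows: Option<usize>, terminal: Terminal) -> Option<usize> {
        match max_rows {
            None => Some(auto_row_cap(terminal)), // omitted: terminal default
            Some(0) => None,                      // 0: keep all rendered rows
            Some(n) => Some(n),                   // positive: explicit cap
        }
    }

    /// Keep only the most recent rows when a cap is active.
    fn retain_recent<T>(rows: &[T], cap: Option<usize>) -> &[T] {
        match cap {
            Some(n) if rows.len() > n => &rows[rows.len() - n..],
            _ => rows,
        }
    }
    ```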
    
    ## Validation
    
    - `just fmt`
    - `git diff --check`
    - `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
    - `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
    transcript_reflow`
    - `just fix -p codex-tui`
    - PR CI in progress on the squashed branch
  • Guard npm update readiness (#19389)
    ## Why
    For npm/Bun-managed installs, the update prompt was treating the latest
    GitHub release as ready to install. During the `0.124.0` release, GitHub
    and npm visibility were not atomic: the root npm wrapper could become
    visible before the npm registry marked that version as the package
    `latest`. That left a window where users could be prompted to upgrade
    before npm was ready for the release.
    
    ## What changed
    - Keep GitHub Releases as the candidate latest-version source for
    npm/Bun installs, but only write the existing `version.json` cache after
    npm registry metadata proves that same root version is ready.
    - Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
    checking `dist-tags.latest` and root package `dist` metadata for the
    GitHub candidate version.
    - Move version parsing helpers into
    `codex-rs/tui/src/update_versions.rs` so that logic can be tested
    without compiling the release-only `updates.rs` module under tests.
    - Update `.github/workflows/rust-release.yml` so the six known platform
    tarballs publish before the root `@openai/codex` wrapper. Other npm
    tarballs publish before the root wrapper, and the SDK publishes after
    the root package it depends on.
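
    A minimal sketch of the readiness rule, assuming the registry metadata
    has already been fetched and parsed. `NpmMetadata` and `npm_ready` are
    hypothetical and do not reflect the contents of `npm_registry.rs`.

    ```rust
    /// Hypothetical, already-parsed view of the npm registry metadata.
    struct NpmMetadata {
        /// Value of `dist-tags.latest`, if present.
        dist_tag_latest: Option<String>,
        /// Versions of the root package that carry `dist` metadata.
        versions_with_dist: Vec<String>,
    }

    /// The GitHub candidate counts as ready only when npm marks it as
    /// `latest` and the root package has `dist` metadata for that version.
    fn npm_ready(candidate: &str, meta: &NpmMetadata) -> bool {
        meta.dist_tag_latest.as_deref() == Some(candidate)
            && meta.versions_with_dist.iter().any(|v| v == candidate)
    }
    ```

    Under this rule the `version.json` cache would be written only when
    `npm_ready` returns `true` for the GitHub candidate.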
  • fix: restore 30-minute timeout for Bazel builds (#19609)
    I think raising it to 45 minutes in
    https://github.com/openai/codex/pull/19578 was a mistake, for the
    reasons explained in the code comments. Instead, we defend against
    timeouts by increasing the number of shards in `app-server-all-test`,
    so that a "true failure" that gets run 3x takes less wall-clock time.
  • test: stabilize app-server path assertions on Windows (#19604)
    ## Why
    
    Windows can represent the same canonical local path with either a normal
    drive path or a verbatim device path prefix. The failure pattern that
    motivated this PR was an assertion diff like `C:\...` versus
    `\\?\C:\...`: different spellings, same file.
    
    That became visible while validating the permissions stack above this
    PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
    normalizes supported Windows device prefixes, while several existing
    tests still built expected values directly with
    `std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
    raw `PathBuf`. On Windows, that can make tests fail because the two
    sides choose different textual forms for an otherwise equivalent
    canonical path.
    
    This PR is intentionally split out as the bottom PR below #19606. The
    runtime permissions migration should not carry unrelated Windows test
    stabilization, and reviewers should be able to verify this as a
    test-only change before looking at the larger permissions changes.
    
    ## Failure Modes Covered
    
    - `conversation_summary` expected rollout paths were built from raw
    canonicalized `PathBuf`s, while app-server responses could carry
    `AbsolutePathBuf`-normalized paths.
    - `thread_resume` compared returned thread paths directly to previously
    stored or fixture paths, so a verbatim-prefix spelling could fail an
    otherwise correct resume.
    - `marketplace_add` compared plugin install roots through `as_path()`
    against raw canonicalized paths, reproducing the same `C:\...` versus
    `\\?\C:\...` mismatch in both app-server and core-plugin coverage.
    
    ## What Changed
    
    - In `app-server/tests/suite/conversation_summary.rs`, normalize both
    expected rollout paths and received `ConversationSummary.path` values
    through `AbsolutePathBuf` before comparing the full summary object.
    - In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
    of thread path comparisons before asserting equality. This keeps the
    tests focused on whether resume returned the same existing path, not
    whether Windows used the same string spelling.
    - In `app-server/tests/suite/v2/marketplace_add.rs` and
    `core-plugins/src/marketplace_add.rs`, compare install roots as
    `AbsolutePathBuf` values instead of comparing an absolute-path wrapper
    to a raw canonicalized `PathBuf`.
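
    To illustrate comparing paths after normalizing both sides, here is a
    string-level sketch that runs on any platform. It is a stand-in for
    illustration, not `AbsolutePathBuf`: `strip_verbatim_prefix` and
    `same_windows_path` are hypothetical, and only the `\\?\C:\...` drive
    form from this description is handled.

    ```rust
    /// Strip a Windows verbatim prefix (`\\?\`) from a drive path, if present.
    fn strip_verbatim_prefix(path: &str) -> &str {
        match path.strip_prefix(r"\\?\") {
            // Only simplify drive paths such as `\\?\C:\...`; other verbatim
            // forms (for example `\\?\UNC\...`) are left untouched.
            Some(rest) if rest.as_bytes().get(1) == Some(&b':') => rest,
            _ => path,
        }
    }

    /// Compare two path spellings after normalizing both sides the same way.
    fn same_windows_path(a: &str, b: &str) -> bool {
        strip_verbatim_prefix(a) == strip_verbatim_prefix(b)
    }
    ```

    The point is that both the expected and the received value go through
    the same normalization before the equality check.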
    
    ## Behavior
    
    This PR does not change production app-server or marketplace behavior.
    It only changes tests to assert semantic path identity across Windows
    path spelling variants. It also leaves API response values untouched;
    the normalization happens inside assertions only.
    
    ## Verification
    
    Targeted local checks run while extracting this fix:
    
    - `cargo test -p codex-app-server
    get_conversation_summary_by_thread_id_reads_rollout`
    - `cargo test -p codex-app-server
    get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
    - `cargo test -p codex-app-server
    thread_resume_prefers_path_over_thread_id`
    
    Windows-specific confidence comes from the Bazel Windows CI job for this
    PR, since the failure is platform-specific.
    
    ## Docs
    
    No docs update is needed because this is test-only infrastructure
    stabilization.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
    * #19395
    * #19394
    * #19393
    * #19392
    * #19606
    * __->__ #19604
  • [codex] Bypass managed network for escalated exec (#19595)
    ## Why
    
    `sandbox_permissions = "require_escalated"` is treated as an explicit
    request to approve the command and run it outside the
    filesystem/platform sandbox. Before this change, shell and unified exec
    still registered managed network approval context and could inject
    Codex-managed proxy state into the child process, which meant an
    approved escalated command could still hit a second network approval
    path.
    
    This PR makes that escalation boundary consistent: once a command is
    explicitly approved to run outside the sandbox, Codex does not also
    route that process through the managed network proxy.
    
    ## Security impact
    
    Command/filesystem sandbox approval now implies network approval for
    that command. If an untrusted command or script is allowed to run with
    `require_escalated`, its network calls are unsandboxed: Codex-managed
    network allowlists and denylists are not respected for that process, so
    the command can exfiltrate any data it can read.
    
    ## What changed
    
    - Skip managed network approval specs for
    `SandboxPermissions::RequireEscalated`.
    - Pass `network: None` into shell, zsh-fork shell, and unified exec
    sandbox preparation for explicitly escalated requests.
    - Strip Codex-managed proxy environment variables when
    `CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
    when the Codex marker is absent.
    - Add regression coverage for the prepared exec request so the old
    behavior cannot silently reappear.
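
    The marker-gated stripping can be sketched as below. Only
    `CODEX_NETWORK_PROXY_ACTIVE` comes from this description. The list of
    proxy variables, the choice to remove the marker itself, and the
    function name are assumptions of this sketch.

    ```rust
    use std::collections::HashMap;

    /// Marker that says the proxy variables were injected by Codex.
    const MARKER: &str = "CODEX_NETWORK_PROXY_ACTIVE";

    /// Assumed list of proxy variables; the real set is not given here.
    const ASSUMED_PROXY_VARS: [&str; 4] = ["HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY"];

    /// Remove proxy variables only when the Codex marker is present, so a
    /// user's own proxy configuration survives when it is absent.
    fn strip_managed_proxy_env(env: &mut HashMap<String, String>) {
        if !env.contains_key(MARKER) {
            return; // user-provided proxy settings: leave untouched
        }
        env.remove(MARKER);
        for name in ASSUMED_PROXY_VARS.iter() {
            env.remove(*name);
        }
    }
    ```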
    
    ## Verification
    
    - `cargo test -p codex-core explicit_escalation`
    - `cargo clippy -p codex-core --all-targets -- -D warnings`
  • Keep slash command popup columns stable while scrolling (#19511)
    ## Why
    
    Fixes #19499.
    
    The slash-command popup recalculated the command-name column from only
    the rows visible in the current viewport. That made the description
    column shift horizontally while scrolling through `/` commands whenever
    longer command names entered or left the visible window.
    
    ## What Changed
    
    `codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared
    selection-popup `AutoAllRows` column-width mode for both height
    measurement and rendering. This keeps the command description column
    based on the full filtered slash-command list instead of the current
    viewport.
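
    The difference between the two width modes can be sketched as
    follows. This is not the popup code: `name_column_width` and
    `visible_window` are hypothetical, and character count stands in for
    display width.

    ```rust
    /// Width of the name column for a set of rows (character count is a
    /// simplification of real display width).
    fn name_column_width(rows: &[&str]) -> usize {
        rows.iter().map(|name| name.chars().count()).max().unwrap_or(0)
    }

    /// The rows currently visible in a viewport of `height` rows.
    fn visible_window<'a, 'b>(rows: &'a [&'b str], start: usize, height: usize) -> &'a [&'b str] {
        let start = start.min(rows.len());
        let end = (start + height).min(rows.len());
        &rows[start..end]
    }
    ```

    Measuring `name_column_width(visible_window(...))` changes as the
    window scrolls past long names, while measuring the full filtered list
    yields one width for every scroll position.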
    
    ## Verification
    
    - `cargo test -p codex-tui bottom_pane::command_popup`
  • test: isolate remote thread store regression from plugin warmups (#19593)
    Follow-up to #19266.
    
    ## Why
    
    
    `thread_start_with_non_local_thread_store_does_not_create_local_persistence`
    is meant to catch accidental local thread persistence when a non-local
    thread store is configured. The Windows flake reported in [this
    BuildBuddy
    invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
    showed that the assertion was tripping on an unexpected top-level `.tmp`
    entry:
    
    ```diff
     {
    +    ".tmp",
         "config.toml",
         "installation_id",
         "memories",
         "skills",
     }
    ```
    
    That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
    from unrelated plugin startup work that can legitimately materialize
    `codex_home/.tmp`, including the startup remote plugin sync marker in
    [`core/src/plugins/startup_sync.rs`](https://github.com/openai/codex/blob/bce74c70ce058982534507330ff33f7b196708ef/codex-rs/core/src/plugins/startup_sync.rs#L13-L15)
    and the curated plugin snapshot under
    [`.tmp/plugins`](https://github.com/openai/codex/blob/bce74c70ce058982534507330ff33f7b196708ef/codex-rs/core-plugins/src/startup_sync.rs#L25-L26).
    
    That makes the regression test race against unrelated background
    startup tasks instead of validating the thread-store invariant it was
    added to cover.
    Rather than weakening the assertion to allow arbitrary `.tmp` entries,
    this change isolates the test from plugin warmups so it can stay strict
    about unexpected local thread persistence artifacts.
    
    ## What changed
    
    - disable plugins in the generated config used by
    `app-server/tests/suite/v2/remote_thread_store.rs`
    - keep the existing `codex_home` assertions unchanged so the test still
    fails if local session or sqlite persistence is introduced
    
    ## Verification
    
    - `cargo test -p codex-app-server
    suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
    -- --exact`
  • Restore persisted model provider on thread resume (#19287)
    Fixes #15219.
    
    ## Why
    
    `thread/resume` should continue a persisted thread with the same model
    provider that created the thread. The app server already restores the
    persisted model and reasoning effort before resuming, but it was leaving
    `model_provider` unset. If a user created a thread with one provider and
    later switched their active profile to another provider, resumed
    encrypted history could be sent to the wrong endpoint and fail with
    `invalid_encrypted_content`.
    
    The thread metadata already records the original provider, so resume
    should apply it when the caller has not explicitly requested a different
    model/provider/reasoning configuration.
    
    ## What changed
    
    This updates `merge_persisted_resume_metadata` in
    `app-server/src/codex_message_processor.rs` to copy
    `ThreadMetadata::model_provider` into `ConfigOverrides::model_provider`
    alongside the persisted model.
    
    The existing resume metadata tests now also assert that:
    
    - the persisted provider is restored for normal resume
    - explicit model, provider, or reasoning-effort overrides still prevent
    persisted resume metadata from being applied
    - a thread with no persisted model or reasoning effort still resumes
    with its persisted provider
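
    The merge rule those assertions describe can be sketched as below.
    This is not `merge_persisted_resume_metadata` itself: `Overrides`,
    `Persisted`, and `merge_persisted` are simplified, hypothetical
    stand-ins for `ConfigOverrides` and `ThreadMetadata`.

    ```rust
    /// Hypothetical, simplified stand-in for `ConfigOverrides`.
    #[derive(Default, Debug, PartialEq)]
    struct Overrides {
        model: Option<String>,
        model_provider: Option<String>,
        reasoning_effort: Option<String>,
    }

    /// Hypothetical, simplified stand-in for `ThreadMetadata`.
    struct Persisted {
        model: Option<String>,
        model_provider: String,
        reasoning_effort: Option<String>,
    }

    /// Apply persisted metadata only when the caller requested no explicit
    /// model, provider, or reasoning-effort override.
    fn merge_persisted(overrides: &mut Overrides, persisted: &Persisted) {
        let caller_overrode = overrides.model.is_some()
            || overrides.model_provider.is_some()
            || overrides.reasoning_effort.is_some();
        if caller_overrode {
            return;
        }
        overrides.model = persisted.model.clone();
        // The provider is restored even when no model or effort was persisted.
        overrides.model_provider = Some(persisted.model_provider.clone());
        overrides.reasoning_effort = persisted.reasoning_effort.clone();
    }
    ```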
    
    ## Verification
    
    - `cargo test -p codex-app-server` passed the app-server unit tests,
    including the updated resume metadata coverage. The broader integration
    portion of that command failed in an unrelated environment-sensitive
    skills-budget warning assertion, where this run saw 8 omitted skills
    instead of the expected 7.
    - `just fix -p codex-app-server` completed successfully.
  • fix: increase Bazel timeout to 45 minutes (#19578)
    Unfortunately, if most of the build graph is invalidated such that there
    are few cache hits, the Windows Bazel build for all the tests often
    takes more than `30` minutes, so this PR increases the timeout to `45`
    minutes until we set up distributed builds.
  • [codex] Order codex-mcp items by visibility (#19526)
    ## Why
    
    The visibility cleanup in the base PR reduced what `codex-mcp` exposes,
    but several files still made reviewers read private support machinery
    before the public or crate-facing entry points. This ordering pass makes
    each file easier to scan: exported API first, crate-visible MCP
    internals next, then private helpers in breadth-first order from the
    higher-level MCP flows to leaf utilities.
    
    ## What Changed
    
    - Reordered `codex-mcp` exports so the runtime, configuration, snapshot,
    auth, and helper surfaces are grouped by visibility and reader
    importance.
    - Moved public and crate-visible MCP items ahead of private helpers in
    the auth, MCP planning/snapshot, connection manager, and tool-name
    modules.
    - Kept the change mechanical, with no behavior changes intended.
    
    ## Verification
    
    - `cargo check -p codex-mcp`
  • [codex] Prune unused codex-mcp API and duplicate helpers (#19524)
    ## Why
    
    `codex-mcp` currently exposes more API than the rest of the workspace
    uses. Some of that surface is simply visibility that can be tightened,
    and some of it is public helper code that remains compiler-valid because
    it is exported even though no workspace caller uses it.
    
    That distinction matters: Rust does not warn on exported API just
    because the current workspace does not call it. This PR intentionally
    treats those exported-but-workspace-unreferenced paths as stale
    `codex-mcp` surface. The main example is MCP skill dependency
    collection, where the active implementation now lives in
    `codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older
    `codex-mcp` copy makes it unclear which implementation owns skill MCP
    installation.
    
    ## What Changed
    
    - Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`.
    - Removed non-runtime helper methods from `McpConnectionManager` so it
    stays focused on live MCP clients.
    - Made `ToolPluginProvenance` lookup methods crate-private.
    - Removed workspace-unreferenced snapshot wrapper APIs and
    qualified-tool grouping helpers.
    - Deleted the duplicate `codex-mcp` skill dependency module and tests
    now that skill MCP dependency handling is owned by `core`.
    
    ## Verification
    
    - `cargo check -p codex-mcp`
  • Enable unavailable dummy tools by default (#19459)
    ## Summary
    - Mark `unavailable_dummy_tools` as a stable feature and enable it by
    default
    - Update the feature registry test to match the new default state
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-features`
  • Fix codex-rs README grammar (#19514)
    ## Why
    
    Issue #19418 points out a small grammar issue in `codex-rs/README.md`
    under "Code Organization." The current sentence says "we hope this to
    be," which reads awkwardly.
    
    Fixes #19418.
    
    ## What changed
    
    Updated the `core/` crate description so the sentence reads "we hope
    this becomes a library crate."
    
    ## Verification
    
    Documentation-only change. Reviewed the Markdown diff.
  • Split approval matrix test groups (#19454)
    ## Why
    
    Recent `main` CI repeatedly timed out in:
    
    - `codex-core::all suite::approvals::approval_matrix_covers_all_modes`
    
    It failed in runs
    [24909500958](https://github.com/openai/codex/actions/runs/24909500958),
    [24908076251](https://github.com/openai/codex/actions/runs/24908076251),
    [24906197645](https://github.com/openai/codex/actions/runs/24906197645),
    [24905823212](https://github.com/openai/codex/actions/runs/24905823212),
    [24903439629](https://github.com/openai/codex/actions/runs/24903439629),
    [24903336028](https://github.com/openai/codex/actions/runs/24903336028),
    and
    [24898949647](https://github.com/openai/codex/actions/runs/24898949647).
    
    The failure pattern was a 60s Linux remote timeout. Logs showed many
    approval scenarios completing before the single matrix test timed out.
    
    ## Root Cause
    
    `approval_matrix_covers_all_modes` packed every approval/sandbox/tool
    scenario into one test case. That made the test vulnerable to normal CI
    variance: one slow scenario or a slow process startup could push the
    whole monolithic case past the 60s per-test timeout. It also hid which
    part of the matrix was slow because the runner only reported the one
    large matrix test.
    
    ## What Changed
    
    - Keep the shared `scenarios()` table as the single source of approval
    matrix coverage.
    - Use one `#[test_case]` per `ScenarioGroup` to generate five async
    Tokio tests: danger/full-access, read-only, workspace-write,
    apply-patch, and unified-exec.
    - Keep the group runner small and add per-scenario error context so a
    failure still reports the specific scenario name.
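    
    The shape of the group runner can be sketched roughly as follows. This
    uses only the standard library and omits the `#[test_case]` and
    `#[tokio::test]` attributes; the `Scenario` fields, the scenario names,
    and the `run_group` helper are hypothetical, and one scenario fails on
    purpose to show the per-scenario error context:
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ScenarioGroup {
        DangerFullAccess,
        ReadOnly,
        WorkspaceWrite,
        ApplyPatch,
        UnifiedExec,
    }
    
    struct Scenario {
        name: &'static str,
        group: ScenarioGroup,
        run: fn() -> Result<(), String>,
    }
    
    /// Single shared table; every group test filters this same list.
    fn scenarios() -> Vec<Scenario> {
        vec![
            Scenario { name: "read_only_allows_cat", group: ScenarioGroup::ReadOnly, run: || Ok(()) },
            Scenario {
                name: "read_only_blocks_write",
                group: ScenarioGroup::ReadOnly,
                run: || Err("write was not blocked".to_string()),
            },
            Scenario { name: "apply_patch_in_workspace", group: ScenarioGroup::ApplyPatch, run: || Ok(()) },
        ]
    }
    
    /// Run one group and attach the scenario name to any failure.
    fn run_group(group: ScenarioGroup) -> Result<usize, String> {
        let mut ran = 0;
        for scenario in scenarios().into_iter().filter(|s| s.group == group) {
            (scenario.run)().map_err(|e| format!("scenario `{}` failed: {e}", scenario.name))?;
            ran += 1;
        }
        Ok(ran)
    }
    ```
    
    Each generated test would then call the runner with exactly one
    `ScenarioGroup`, so a slow group only consumes its own timeout window.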
    
    ## Why This Should Be Reliable
    
    Each scenario group now has its own test harness timeout instead of
    sharing one timeout window with the full matrix. That removes the long
    sequential loop from a single test while keeping the implementation
    compact and easy to scan.
    
    The tests still run through the same scenario definitions and runner, so
    this preserves coverage. `test-case` already composes with
    `#[tokio::test]` in this crate and is already available for test code.
    
    ## Verification
    
    - `cargo test -p codex-core --test all approval_matrix_ -- --list`
    - `cargo test -p codex-core --test all approval_matrix_`
  • Add goal TUI UX (5 / 5) (#18077)
    Adds the TUI user experience for goals on top of the core runtime from
    PR 4.
    
    ## Why
    
    Users need a direct TUI control surface for long-running goals. The UI
    should make the current goal visible, support common goal actions
    without waiting for a model turn, and avoid confusing end-of-turn
    notifications while an active goal is immediately continuing.
    
    ## What changed
    
    - Added `/goal` summary rendering for the current goal, including
    active, paused, budget-limited, and complete states.
    - Added `/goal <objective>` creation/replacement through the app-server
    goal API rather than a model prompt.
    - Added `/goal clear`, `/goal pause`, and `/goal unpause` command
    variants.
    - Added a confirmation menu when the user enters a new goal while
    another goal already exists.
    - Updated `/goal` help and summary tip text so it reflects the supported
    command variants without advertising slash-command token budgets.
    - Added footer/statusline goal indicators, including elapsed time and
    token budget display when a budget exists from API/tool-created goals.
    - Added handling for goal updated/cleared notifications so the TUI
    stays in sync with external app-server changes.
    - Suppressed end-of-turn desktop notifications only when a goal is still
    active and follow-up work is expected.
    - Preserved slash-command history behavior and avoided leaking queued
    `/goal` state into unrelated submissions.
    
    ## Verification
    
    - Added TUI unit and snapshot coverage for goal command availability,
    summary rendering, control commands, replacement menu behavior,
    status/footer display, notification handling, and command history.
  • Add goal core runtime (4 / 5) (#18076)
    Adds the core runtime behavior for active goals on top of the model
    tools from PR 3.
    
    ## Why
    
    A long-running goal should be a core runtime concern, not something
    every client has to implement. Core owns the turn lifecycle, tool
    completion boundaries, interruptions, resume behavior, and token usage,
    so it is the right place to account progress, enforce budgets, and
    decide when to continue work.
    
    ## What changed
    
    - Centralized goal lifecycle side effects behind
    `Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
    - Starts goal continuation turns only when the session is idle; pending
    user input and mailbox work take priority.
    - Accounts token and wall-clock usage at turn, tool, mutation,
    interrupt, and resume boundaries; `get_thread_goal` remains read-only.
    - Preserves sub-second wall-clock remainder across accounting boundaries
    so long-running goals do not drift downward over time.
    - Treats token budget exhaustion as a soft stop by marking the goal
    `budget_limited` and injecting wrap-up steering instead of aborting the
    active turn.
    - Suppresses budget steering when `update_goal` marks a goal complete.
    - Pauses active goals on interrupt and auto-reactivates paused goals
    when a thread resumes outside plan mode.
    - Suppresses repeated automatic continuation when a continuation turn
    makes no tool calls.
    - Added continuation and budget-limit prompt templates.
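    
    To illustrate the sub-second remainder point: if each accounting
    boundary truncated elapsed time to whole seconds, many short intervals
    would each lose up to a second. A minimal sketch that carries the
    remainder forward (the `WallClockAccount` type and its fields are
    hypothetical, not the actual core state):
    
    ```rust
    use std::time::Duration;
    
    #[derive(Debug, Default)]
    struct WallClockAccount {
        /// Whole seconds already charged to the goal.
        accounted_seconds: u64,
        /// Leftover time below one second, carried to the next boundary.
        remainder: Duration,
    }
    
    impl WallClockAccount {
        fn record(&mut self, elapsed: Duration) {
            let total = self.remainder + elapsed;
            self.accounted_seconds += total.as_secs();
            // Keep only the sub-second part for the next accounting boundary.
            self.remainder = Duration::from_nanos(u64::from(total.subsec_nanos()));
        }
    }
    ```
    
    With this scheme, four 600 ms intervals are charged as 2 whole seconds
    with 400 ms carried over, whereas truncating each interval separately
    would charge nothing.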
    
    ## Verification
    
    - Added focused core coverage for continuation scheduling, accounting
    boundaries, budget-limit steering, completion accounting, interrupt
    pause behavior, resume auto-activation, and wall-clock remainder
    accounting.
  • Add goal model tools (3 / 5) (#18075)
    Adds the model-facing goal tools on top of the app-server API from PR 2.
    
    ## Why
    
    Once goals are persisted and exposed to clients, the model needs a
    small, constrained tool surface for goal workflows. The tool contract
    should let the model inspect goals, create them only when explicitly
    requested, and mark them complete without giving it broad control over
    user/runtime-owned state.
    
    ## What changed
    
    - Added `get_goal`, `create_goal`, and `update_goal` tool specs behind
    the `goals` feature flag.
    - Added core goal tool handlers that validate objectives and token
    budgets before mutating persisted state.
    - Constrained `create_goal` to create only when no goal exists, with
    optional `token_budget` only when a budget is explicitly provided.
    - Tightened the `create_goal` instructions so the model does not infer
    goals from ordinary task requests.
    - Constrained `update_goal` to expose only goal completion; pause,
    resume, clear, and budget-limited transitions remain user- or
    runtime-controlled.
    - Registered the goal tools in the tool registry and kept them out of
    review contexts where they should not appear.
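    
    The constraints above can be sketched as two small handlers. The types,
    the error strings, and the specific validation rules (non-empty
    objective, non-zero budget) are assumptions for illustration; the PR
    only states that objectives and budgets are validated:
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum GoalStatus {
        Active,
        Paused,
        Complete,
    }
    
    #[derive(Debug, Clone, PartialEq)]
    struct Goal {
        objective: String,
        token_budget: Option<u64>,
        status: GoalStatus,
    }
    
    /// Create a goal only when none exists yet.
    fn create_goal(slot: &mut Option<Goal>, objective: &str, token_budget: Option<u64>) -> Result<(), String> {
        if slot.is_some() {
            return Err("a goal already exists".to_string());
        }
        let objective = objective.trim();
        if objective.is_empty() {
            return Err("objective must not be empty".to_string());
        }
        if token_budget == Some(0) {
            return Err("token budget must be positive".to_string());
        }
        *slot = Some(Goal { objective: objective.to_string(), token_budget, status: GoalStatus::Active });
        Ok(())
    }
    
    /// The model may only mark a goal complete; other transitions are rejected.
    fn update_goal(slot: &mut Option<Goal>, requested: GoalStatus) -> Result<(), String> {
        if requested != GoalStatus::Complete {
            return Err("only completion can be requested by the model".to_string());
        }
        match slot {
            Some(goal) => {
                goal.status = GoalStatus::Complete;
                Ok(())
            }
            None => Err("no goal exists".to_string()),
        }
    }
    ```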
    
    ## Verification
    
    - Added tool-registry coverage for feature gating and tool availability.
    - Added core session tests for create/get/update behavior, duplicate
    goal rejection, budget validation, and completion-only updates.
  • Add goal app-server API (2 / 5) (#18074)
    Adds the app-server v2 goal API on top of the persisted goal state from
    PR 1.
    
    ## Why
    
    Clients need a stable app-server surface for reading and controlling
    materialized thread goals before the model tools and TUI can use them.
    Goal changes also need to be observable by app-server clients, including
    clients that resume an existing thread.
    
    ## What changed
    
    - Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
    RPCs for materialized threads.
    - Added `thread/goal/updated` and `thread/goal/cleared` notifications so
    clients can keep local goal state in sync.
    - Added resume/snapshot wiring so reconnecting clients see the current
    goal state for a thread.
    - Added app-server handlers that reconcile persisted rollout state
    before direct goal mutations.
    - Updated the app-server README plus generated JSON and TypeScript
    schema fixtures for the new API surface.
    
    ## Verification
    
    - Added app-server v2 coverage for goal get/set/clear behavior,
    notification emission, resume snapshots, and non-local thread-store
    interactions.
  • Add goal persistence foundation (1 / 5) (#18073)
    Adds the persisted goal foundation for the rest of the stack. This PR is
    intentionally limited to feature flag and state-layer behavior;
    app-server APIs, model tools, runtime continuation, and TUI UX are
    layered in later PRs.
    
    ## Why
    
    Goal mode needs durable thread-level state before clients or model tools
    can safely build on it. The state layer needs to know whether a goal
    exists, what objective it tracks, whether it is active, paused,
    budget-limited, or complete, and how much time/token usage has already
    been accounted.
    
    ## What changed
    
    - Added the `goals` feature flag and generated config schema entry.
    - Added the `thread_goals` state table and Rust model for persisted
    thread goals.
    - Added state runtime APIs for creating, replacing, updating, deleting,
    and accounting goal usage.
    - Added `goal_id`-based stale update protection so an old goal update
    cannot overwrite a replacement.
    - Kept this PR scoped to persistence and state runtime behavior, with no
    app-server, model-facing, continuation, or TUI behavior yet.
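    
    The `goal_id` guard can be pictured as a compare-before-write check. The
    sketch below uses an in-memory store instead of the `thread_goals`
    table, and the integer id, field names, and method names are all
    hypothetical:
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    struct ThreadGoal {
        goal_id: u64,
        objective: String,
        tokens_used: u64,
    }
    
    #[derive(Debug, Default)]
    struct GoalStore {
        current: Option<ThreadGoal>,
        next_id: u64,
    }
    
    impl GoalStore {
        /// Replace the goal and hand back the id of the new one.
        fn replace(&mut self, objective: &str) -> u64 {
            self.next_id += 1;
            self.current = Some(ThreadGoal {
                goal_id: self.next_id,
                objective: objective.to_string(),
                tokens_used: 0,
            });
            self.next_id
        }
    
        /// Apply usage only if `goal_id` still names the current goal.
        /// Returns false for stale updates, leaving the replacement untouched.
        fn account_tokens(&mut self, goal_id: u64, tokens: u64) -> bool {
            match self.current.as_mut() {
                Some(goal) if goal.goal_id == goal_id => {
                    goal.tokens_used += tokens;
                    true
                }
                _ => false,
            }
        }
    }
    ```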
    
    ## Verification
    
    - Added state runtime coverage for goal creation, replacement, stale
    update protection, status transitions, token-budget behavior, and usage
    accounting.
  • Fix Bazel cargo_bin runfiles paths (#19468)
    ## Summary
    
    Fix a Bazel-only path resolution bug in
    `codex_utils_cargo_bin::cargo_bin`.
    
    Under Bazel runfiles, `rlocation` can return a relative `bazel-out/...`
    path even though `cargo_bin()` documents that it returns an absolute
    path. That can break callers that store the returned binary path and
    later spawn it after changing cwd, because the relative path is resolved
    from the wrong directory.
    
    This patch absolutizes the runfiles-resolved path before returning it.
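    
    A minimal sketch of that kind of fix, assuming the simplest approach of
    joining a relative path onto the current working directory at lookup
    time. The `absolutize` helper is illustrative and is not taken from the
    patch:
    
    ```rust
    use std::env;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Return `path` unchanged if it is already absolute; otherwise anchor
    /// it to the working directory that is current *now*, so a later
    /// change of cwd cannot change what the path points to.
    fn absolutize(path: &Path) -> io::Result<PathBuf> {
        if path.is_absolute() {
            Ok(path.to_path_buf())
        } else {
            Ok(env::current_dir()?.join(path))
        }
    }
    ```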
  • ci: pin codex-action v1.7 (#19472)
    ## Summary
    - update Codex issue automation to pin `openai/codex-action` to
    `5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02`, the commit for `v1.7`
    - keep the release intent visible with `# v1.7` comments beside the hash
    pins
    
    ## Test plan
    - `git diff --check`
    - `yq e '.' .github/workflows/issue-labeler.yml`
    - `yq e '.' .github/workflows/issue-deduplicator.yml`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • permissions: remove legacy read-only access modes (#19449)
    ## Why
    
    `ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
    `FullAccess` meant the historical read-only/workspace-write modes could
    read the full filesystem, while `Restricted` tried to carry partial
    readable roots. The partial-read model now belongs in
    `FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
    `SandboxPolicy` makes every legacy projection reintroduce lossy
    read-root bookkeeping and creates unnecessary noise in the rest of the
    permissions migration.
    
    This PR makes the legacy policy model narrower and explicit:
    `SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
    the old full-read sandbox modes only. Split readable roots, deny-read
    globs, and platform-default/minimal read behavior stay in the runtime
    permissions model.
    
    ## What changed
    
    - Removes `ReadOnlyAccess` from
    `codex_protocol::protocol::SandboxPolicy`, including the generated
    `access` and `readOnlyAccess` API fields.
    - Updates legacy policy/profile conversions so restricted filesystem
    reads are represented only by `FileSystemSandboxPolicy` /
    `PermissionProfile` entries.
    - Keeps app-server v2 compatible with legacy `fullAccess` read-access
    payloads by accepting and ignoring that no-op shape, while rejecting
    legacy `restricted` read-access payloads instead of silently widening
    them to full-read legacy policies.
    - Carries Windows sandbox platform-default read behavior with an
    explicit override flag instead of depending on
    `ReadOnlyAccess::Restricted`.
    - Refreshes generated app-server schema/types and updates tests/docs for
    the simplified legacy policy shape.
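    
    The compatibility rule for legacy read-access payloads can be sketched
    as follows. The enum, function name, and error text are hypothetical;
    only the accept/ignore versus reject behavior comes from the
    description above:
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum LegacyReadAccess {
        FullAccess,
        Restricted,
    }
    
    /// Accept a missing or `fullAccess` value as a no-op, and reject
    /// `restricted` rather than silently widening it to full-read.
    fn check_legacy_read_access(access: Option<LegacyReadAccess>) -> Result<(), String> {
        match access {
            None | Some(LegacyReadAccess::FullAccess) => Ok(()),
            Some(LegacyReadAccess::Restricted) => {
                Err("restricted read access is no longer supported on legacy sandbox policies".to_string())
            }
        }
    }
    ```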
    
    ## Verification
    
    - `cargo check -p codex-app-server-protocol --tests`
    - `cargo check -p codex-windows-sandbox --tests`
    - `cargo test -p codex-app-server-protocol sandbox_policy_`
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
    * #19395
    * #19394
    * #19393
    * #19392
    * #19391
    * __->__ #19449
  • fix: Bedrock GPT-5.4 reasoning levels (#19461)
    ## Why
    
    When using the Amazon Bedrock provider with `openai.gpt-5.4-cmb`, the
    model picker allowed `xhigh` because the CMB catalog entry was derived
    from the bundled `gpt-5.4` reasoning metadata. Bedrock rejects that
    effort level, causing the request to fail before the turn can run:
    
    ```text
    {"error":{"code":"validation_error","message":"Failed to deserialize the JSON body into the target type: Invalid 'reasoning': Invalid 'effort': unknown variant `xhigh`, expected one of `high`, `low`, `medium`, `minimal` at line 1 column 77239","param":null,"type":"invalid_request_error"}}
    ```
    
    ## What Changed
    
    - Replace the runtime lookup of bundled `gpt-5.4` metadata for
    `openai.gpt-5.4-cmb` with an explicit Bedrock CMB `ModelInfo` entry.
    - Advertise only the Bedrock-supported CMB reasoning levels: `minimal`,
    `low`, `medium`, and `high`.
    - Keep the existing GPT OSS Bedrock model metadata and reasoning levels
    unchanged.
    - Add catalog coverage for the hardcoded CMB metadata and
    Bedrock-compatible reasoning level list.
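    
    As an illustration of advertising an explicit level list instead of a
    derived one, here is a small sketch. The enum and helper names are
    hypothetical and do not reflect the real `ModelInfo` shape:
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ReasoningEffort {
        Minimal,
        Low,
        Medium,
        High,
        XHigh,
    }
    
    /// Explicit list for the Bedrock CMB entry, matching the levels the
    /// Bedrock error message reports as accepted.
    fn bedrock_cmb_efforts() -> &'static [ReasoningEffort] {
        &[
            ReasoningEffort::Minimal,
            ReasoningEffort::Low,
            ReasoningEffort::Medium,
            ReasoningEffort::High,
        ]
    }
    
    /// The picker should offer a level only if the catalog entry lists it.
    fn picker_allows(effort: ReasoningEffort) -> bool {
        bedrock_cmb_efforts().contains(&effort)
    }
    ```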
  • Refactor log DB into LogWriter interface (#19234)
    ## Why
    
    This prepares feedback log capture for a future remote app-server hook
    sink without changing the current local SQLite upload path. The
    important boundary is now intentionally small: a log sink is a tracing
    `Layer` that can also flush entries it has accepted.
    
    That keeps the existing SQLite implementation simple while giving the
    upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
    different worker/write semantics, so this PR avoids introducing a shared
    buffered-sink abstraction and instead lets each `LogWriter` own the
    buffering mechanics it needs.
    
    ## What Changed
    
    - Added `LogSinkQueueConfig` with the existing local defaults: queue
    capacity `512`, batch size `128`, and flush interval `2s`.
    - Added `LogDbLayer::start_with_config(...)` while preserving
    `LogDbLayer::start(...)` and `log_db::start(...)` defaults.
    - Introduced the `LogWriter` trait as the minimal shared interface:
    `tracing_subscriber::Layer` plus `flush()`.
    - Made `LogDbLayer` implement `LogWriter`.
    - Kept tracing event formatting inside `LogDbLayer`; it still creates
    one `LogEntry` per tracing event before queueing it for SQLite.
    - Kept normal event capture best-effort and non-blocking via bounded
    `try_send`.
    
    ## Behavior Notes
    
    This does not change the SQLite schema, retention behavior,
    `/feedback/upload`, or Sentry upload behavior. Normal log events still
    drop when the queue is full; explicit `flush()` still waits for queue
    capacity and receiver processing before returning.
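    
    The drop-when-full behavior can be illustrated with a bounded channel
    and `try_send`. This sketch uses the standard library's `sync_channel`
    as a stand-in for whatever channel the real layer uses, and the
    `BestEffortQueue` type is hypothetical; it shows only the non-blocking
    capture path, not the `flush()` barrier:
    
    ```rust
    use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
    
    struct BestEffortQueue {
        tx: SyncSender<String>,
        dropped: usize,
    }
    
    fn new_queue(capacity: usize) -> (BestEffortQueue, Receiver<String>) {
        let (tx, rx) = sync_channel(capacity);
        (BestEffortQueue { tx, dropped: 0 }, rx)
    }
    
    impl BestEffortQueue {
        /// Never blocks: when the queue is full (or the receiver is gone),
        /// the entry is dropped and counted instead.
        fn push(&mut self, entry: String) -> bool {
            match self.tx.try_send(entry) {
                Ok(()) => true,
                Err(_) => {
                    self.dropped += 1;
                    false
                }
            }
        }
    }
    ```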
    
    ## Verification
    
    - `cargo test -p codex-state log_db`
    - `cargo test -p codex-state`
    - `just fix -p codex-state`
    
    The added tests cover configured batch-size flushing, configured
    interval flushing, queue-full drops, and the flush barrier semantics.
  • Serialize legacy Windows PowerShell sandbox tests (#19453)
    ## Why
    
    Recent `main` CI had repeated Windows timeouts in the legacy sandbox
    process tests:
    
    - `codex-windows-sandbox
    session::tests::legacy_capture_powershell_emits_output` failed in runs
    [24909500958](https://github.com/openai/codex/actions/runs/24909500958),
    [24908076251](https://github.com/openai/codex/actions/runs/24908076251),
    [24906197645](https://github.com/openai/codex/actions/runs/24906197645),
    [24905411571](https://github.com/openai/codex/actions/runs/24905411571),
    [24903336028](https://github.com/openai/codex/actions/runs/24903336028),
    and
    [24898949647](https://github.com/openai/codex/actions/runs/24898949647).
    - `legacy_tty_powershell_emits_output_and_accepts_input` failed in the
    same set of runs.
    - `legacy_non_tty_cmd_emits_output` failed in runs
    [24909500958](https://github.com/openai/codex/actions/runs/24909500958),
    [24908076251](https://github.com/openai/codex/actions/runs/24908076251),
    [24906197645](https://github.com/openai/codex/actions/runs/24906197645),
    and
    [24903336028](https://github.com/openai/codex/actions/runs/24903336028).
    - `legacy_non_tty_powershell_emits_output` failed in runs
    [24908076251](https://github.com/openai/codex/actions/runs/24908076251),
    [24906197645](https://github.com/openai/codex/actions/runs/24906197645),
    and
    [24903336028](https://github.com/openai/codex/actions/runs/24903336028).
    
    These failures were 30s timeouts on Windows x64 and/or arm64 rather than
    assertion failures.
    
    ## Root Cause
    
    The active legacy Windows sandbox process tests all exercise host-level
    resources: sandbox setup, ACL/user state, private desktop process
    launch, stdio capture, and PowerShell/cmd child cleanup. Running several
    of these tests concurrently can leave them competing for the same
    Windows sandbox setup path and process/session resources, which makes
    command startup or output collection hang under CI load.
    
    ## What Changed
    
    - Added a shared in-process mutex for the active legacy Windows sandbox
    process tests.
    - Held that guard across each legacy cmd/PowerShell process test so
    those host-resource-heavy cases run one at a time.
    - Kept the skipped legacy cmd TTY tests unchanged.
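    
    A minimal sketch of the serialization pattern, assuming a process-wide
    static `Mutex` whose guard each test holds for its full body. The
    function names and the overlap counter are hypothetical and exist only
    to demonstrate that guarded sections never run concurrently:
    
    ```rust
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::{Mutex, MutexGuard};
    use std::thread;
    
    /// Shared guard; a poisoned lock is recovered so that one failing
    /// test does not cascade into every later test.
    fn legacy_process_test_guard() -> MutexGuard<'static, ()> {
        static LOCK: Mutex<()> = Mutex::new(());
        LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }
    
    static ACTIVE: AtomicUsize = AtomicUsize::new(0);
    
    /// Stand-in for a host-resource-heavy test body. Returns how many
    /// guarded sections were active while this one ran (including itself).
    fn serialized_section() -> usize {
        let _guard = legacy_process_test_guard();
        let concurrent = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
        thread::yield_now();
        ACTIVE.fetch_sub(1, Ordering::SeqCst);
        concurrent
    }
    
    /// Run several sections on separate threads and report the largest overlap.
    fn max_overlap(threads: usize) -> usize {
        let handles: Vec<_> = (0..threads).map(|_| thread::spawn(serialized_section)).collect();
        handles.into_iter().map(|h| h.join().unwrap()).max().unwrap_or(0)
    }
    ```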
    
    ## Why This Should Be Reliable
    
    The tests still use unique homes and run the real legacy sandbox process
    path, but they no longer overlap the fragile host-level setup and
    process/session lifecycle. Serializing just this small group removes the
    concurrency race without reducing the behavioral coverage of each test.
    
    ## Verification
    
    - `cargo test -p codex-windows-sandbox`
    - GitHub Windows CI is the primary validation signal for the affected
    tests; on this PR, Windows clippy, Windows release, and Windows local
    Bazel passed after the serialization fix.
  • [codex] Forward Codex Apps tool call IDs to backend metadata (#19207)
    ## Summary
    - include the outer tool `call_id` in Codex Apps MCP request metadata
    under `_meta._codex_apps.call_id`
    - preserve existing Codex Apps metadata like `resource_uri` and
    `contains_mcp_source`
    - add request metadata coverage for both the existing-metadata and
    no-existing-metadata cases
    
    ## Why
    The paired backend change in
    [openai/openai#850796](https://github.com/openai/openai/pull/850796)
    updates MCP compliance logging to prefer `_meta._codex_apps.call_id`
    instead of the JSON-RPC request id. This client change sends that outer
    tool call id so the backend can record the model/tool call identifier
    when it is available.
    
    This is wire-compatible with older backends because `_meta._codex_apps`
    is already reserved backend-only metadata. Backends that do not read
    `call_id` will ignore the extra field.
    
    ## Testing
    - `cargo test -p codex-core request_meta`
    - `just fmt`
    - `just fix -p codex-core`
  • feat: Compress skill paths with root aliases (#19098)
    Add skill root tracking so model-visible skill lists can use short path
    aliases when absolute paths would exceed the metadata budget.
  • [codex] add non-local thread store regression harness (#19266)
    - Add an integration test that guarantees nothing gets written to the
    Codex home directory or SQLite when running a rollout with a non-local
    `ThreadStore`
    - Add an in-memory "spy" `ThreadStore` for tests like this
    
    Note: I could not find a good way to also ensure that no filesystem
    _reads_ bypass the `ThreadStore`. I explored a more elaborate
    sandboxed-subprocess approach, but it is not portable across platforms
    and did not (yet) seem worth the complexity.
  • Clarify bundled OpenAI Docs upgrade guide wording (#19422)
    ## Summary
    - Mirrors the OpenAI Docs skill cleanup in the bundled Codex skill copy
    - Clarifies reasoning-effort recommendation wording
    - Replaces internal snake_case prompt block names with natural-language
    guidance aligned to the prompting guide
    
    ## Test plan
    - `git diff --check`
    - Verified the old snake_case prompt block names no longer appear in the
    bundled upgrade guide
  • ci: publish codex-app-server release artifacts (#19447)
    ## Why
    The VS Code extension and desktop app do not need the full TUI binary,
    and `codex-app-server` is materially smaller than standalone `codex`. We
    still want to publish it as an official release artifact, but building
    it by tacking another `--bin` onto the existing release `cargo build`
    invocations would lengthen those jobs.
    
    This change keeps `codex-app-server` on its own release bundle so it can
    build in parallel with the existing `codex` and helper bundles.
    
    ## What changed
    - Made `.github/workflows/rust-release.yml` bundle-aware so each macOS
    and Linux MUSL target now builds either the existing `primary` bundle
    (`codex` and `codex-responses-api-proxy`) or a standalone `app-server`
    bundle (`codex-app-server`).
    - Preserved the historical artifact names for the primary macOS/Linux
    bundles so `scripts/stage_npm_packages.py` and
    `codex-cli/scripts/install_native_deps.py` continue to find release
    assets under the paths they already expect, while giving the new
    app-server artifacts distinct names.
    - Added a matching `app-server` bundle to
    `.github/workflows/rust-release-windows.yml`, and updated the final
    Windows packaging job to download, sign, stage, and archive
    `codex-app-server.exe` alongside the existing release binaries.
    - Generalized the shared signing actions in
    `.github/actions/linux-code-sign/action.yml`,
    `.github/actions/macos-code-sign/action.yml`, and
    `.github/actions/windows-code-sign/action.yml` so each workflow row
    declares its binaries once and reuses that list for build, signing, and
    staging.
    - Added `codex-app-server` to `.github/dotslash-config.json` so releases
    also publish a generated DotSlash manifest for the standalone app-server
    binary.
    - Kept the macOS DMG focused on the existing `primary` bundle;
    `codex-app-server` ships as the regular standalone archives and DotSlash
    manifest.
    
    ## Verification
    - Parsed the modified workflow and action YAML files locally with
    `python3` + `yaml.safe_load(...)`.
    - Parsed `.github/dotslash-config.json` locally with `python3` +
    `json.loads(...)`.
    - Reviewed the resulting release matrices, artifact names, and packaging
    paths to confirm that `codex-app-server` is built separately on macOS,
    Linux MUSL, and Windows, while the existing npm staging and Windows
    `codex` zip bundling contracts remain intact.
  • Add gpt-image-2 to bundled OpenAI Docs skill (#19443)
    ## Summary
    - Mirrors openai/skills#374 in the Codex bundled OpenAI Docs skill
    - Adds `gpt-image-2` as the best image generation/edit model
    - Updates the `gpt-image-1.5` entry to describe it as the less
    expensive image generation/edit option
    
    ## Test plan
    - `git diff --check`
  • ci: stop publishing GNU Linux release artifacts (#19445)
    ## Why
    We already prefer shipping the MUSL Linux builds, and the in-repo
    release consumers resolve Linux release assets through the MUSL targets.
    Keeping the GNU release jobs around adds release time and extra assets
    without serving the paths we actually publish and consume.
    
    This is also easier to reason about as a standalone change: future work
    can point back to this PR as the intentional decision to stop publishing
    `x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu` release
    artifacts.
    
    ## What changed
    - Removed the `x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu`
    entries from the `build` matrix in `.github/workflows/rust-release.yml`.
    - Added a short comment in that matrix documenting that Linux release
    artifacts intentionally ship MUSL-linked binaries.
    
    ## Verification
    - Reviewed `.github/workflows/rust-release.yml` to confirm that the
    release workflow now only builds Linux release artifacts for
    `x86_64-unknown-linux-musl` and `aarch64-unknown-linux-musl`.
  • Migrate fork and resume reads to thread store (#18900)
    - Route cold thread/resume and thread/fork source loading through
    ThreadStore reads instead of direct rollout path operations
    - Keep lookups that explicitly specify a rollout path on the local
    thread store methods, but return an invalid-request error for them when
    the ThreadStore configuration is remote
    - Add some additional unit tests for code path coverage
  • permissions: make legacy profile conversion cwd-free (#19414)
    ## Why
    
    The profile conversion path still required a `cwd` even when it was only
    translating a legacy `SandboxPolicy` into a `PermissionProfile`. That
    made profile producers invent an ambient `cwd`, which is exactly the
    anchoring we are trying to remove from permission-profile data. A legacy
    workspace-write policy can be represented symbolically instead: `:cwd =
    write` plus read-only `:project_roots` metadata subpaths.
    
    This PR creates that cwd-free base so the rest of the stack can stop
    threading cwd through profile construction. Callers that actually need a
    concrete runtime filesystem policy for a specific cwd still have an
    explicitly named cwd-bound conversion.
    
    ## What Changed
    
    - `PermissionProfile::from_legacy_sandbox_policy` now takes only
    `&SandboxPolicy`.
    - `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the
    symbolic, cwd-free projection for profiles.
    - The old concrete projection is retained as
    `FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for
    runtime/boundary code that must materialize legacy cwd behavior.
    - Workspace-write profiles preserve `CurrentWorkingDirectory` and
    `ProjectRoots` special entries instead of materializing cwd into
    absolute paths.
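    
    The difference between the symbolic projection and the cwd-bound one
    can be sketched like this. All type and function names here are
    simplified, hypothetical stand-ins, and `.git` is used only as an
    example of a read-only metadata subpath:
    
    ```rust
    use std::path::{Path, PathBuf};
    
    #[derive(Debug, Clone, PartialEq)]
    enum SpecialPath {
        CurrentWorkingDirectory,
        ProjectRoots { subpath: PathBuf },
    }
    
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Access {
        Read,
        Write,
    }
    
    #[derive(Debug, Clone, PartialEq)]
    struct Entry {
        path: SpecialPath,
        access: Access,
    }
    
    /// cwd-free projection: no concrete path is needed to build the profile.
    fn symbolic_workspace_write(read_only_subpaths: &[&str]) -> Vec<Entry> {
        let mut entries = vec![Entry { path: SpecialPath::CurrentWorkingDirectory, access: Access::Write }];
        entries.extend(read_only_subpaths.iter().map(|&sub| Entry {
            path: SpecialPath::ProjectRoots { subpath: PathBuf::from(sub) },
            access: Access::Read,
        }));
        entries
    }
    
    /// cwd-bound conversion: only runtime/boundary code resolves the symbols.
    fn materialize_for_cwd(entries: &[Entry], cwd: &Path, project_roots: &[PathBuf]) -> Vec<(PathBuf, Access)> {
        let mut concrete = Vec::new();
        for entry in entries {
            match &entry.path {
                SpecialPath::CurrentWorkingDirectory => concrete.push((cwd.to_path_buf(), entry.access)),
                SpecialPath::ProjectRoots { subpath } => {
                    for root in project_roots {
                        concrete.push((root.join(subpath), entry.access));
                    }
                }
            }
        }
        concrete
    }
    ```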
    
    ## Verification
    
    - `cargo check -p codex-protocol -p codex-core -p
    codex-app-server-protocol -p codex-app-server -p codex-exec -p
    codex-exec-server -p codex-tui -p codex-sandboxing -p
    codex-linux-sandbox -p codex-analytics --tests`
    - `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
    -p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p
    codex-sandboxing -p codex-linux-sandbox -p codex-analytics`
  • Skip disabled rows in selection menu numbering and default focus (#19170)
    Selection menus in the TUI currently let disabled rows interfere with
    numbering and default focus. This makes mixed menus harder to read and
    can land the selection on rows that are not actionable. This change
    updates the shared selection-menu behavior in `list_selection_view` so
    that disabled rows are neither selected when these views open nor
    numbered like selectable rows.
    
    - Disabled rows no longer receive numeric labels
    - Digit shortcuts map to enabled rows only
    - Default selection moves to the first enabled row in mixed menus
    - Updated affected snapshot
    - Added snapshot coverage for a plugin detail error popup
    - Added a focused unit test for shared selection-view behavior
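
    The numbering and focus rules listed above can be sketched roughly as
    follows. `Row` and every function name here are hypothetical and are
    not taken from `list_selection_view`; the sketch only assumes that each
    row carries a label and a disabled flag.

    ```rust
    // Hypothetical row model: a label plus a disabled flag.
    #[derive(Debug, Clone)]
    struct Row {
        label: String,
        disabled: bool,
    }

    // Enabled rows get consecutive 1-based numbers; disabled rows get none.
    fn numeric_labels(rows: &[Row]) -> Vec<Option<usize>> {
        let mut next = 1;
        rows.iter()
            .map(|row| {
                if row.disabled {
                    None
                } else {
                    let n = next;
                    next += 1;
                    Some(n)
                }
            })
            .collect()
    }

    // Default focus is the index of the first enabled row, if any.
    fn default_selection(rows: &[Row]) -> Option<usize> {
        rows.iter().position(|row| !row.disabled)
    }

    // A digit shortcut maps to the n-th enabled row, skipping disabled ones.
    fn row_for_digit(rows: &[Row], digit: usize) -> Option<usize> {
        if digit == 0 {
            return None;
        }
        rows.iter()
            .enumerate()
            .filter(|(_, row)| !row.disabled)
            .nth(digit - 1)
            .map(|(index, _)| index)
    }

    // Renders rows with a number prefix for enabled rows and padding otherwise.
    fn render_rows(rows: &[Row]) -> Vec<String> {
        numeric_labels(rows)
            .iter()
            .zip(rows)
            .map(|(number, row)| match number {
                Some(n) => format!("{n}. {}", row.label),
                None => format!("   {}", row.label),
            })
            .collect()
    }
    ```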
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Update unix socket transport to use WebSocket upgrade (#19244)
    ## Summary
    - Switch Unix socket app-server connections to perform the standard
    WebSocket HTTP Upgrade handshake
    - Update the Unix socket test to exercise a real upgrade over the Unix
    stream
    - Refresh the app-server README to describe the new Unix socket behavior
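
    For readers unfamiliar with the handshake, here is a minimal std-only
    sketch of the client side of a WebSocket HTTP Upgrade as defined by RFC
    6455. It is not the app-server implementation, which may well use a
    WebSocket library instead. The function names are hypothetical, the
    `Host` value is an assumption for a Unix socket peer, and the sketch
    deliberately skips validating `Sec-WebSocket-Accept`, which needs SHA-1
    and base64.

    ```rust
    // Builds the client request that asks the server to switch protocols.
    // `key_b64` must be a base64-encoded 16-byte nonce chosen by the client.
    fn build_upgrade_request(host: &str, path: &str, key_b64: &str) -> String {
        format!(
            "GET {path} HTTP/1.1\r\n\
             Host: {host}\r\n\
             Upgrade: websocket\r\n\
             Connection: Upgrade\r\n\
             Sec-WebSocket-Key: {key_b64}\r\n\
             Sec-WebSocket-Version: 13\r\n\
             \r\n"
        )
    }

    // Checks the response head for "101" plus an "Upgrade: websocket" header.
    // A full client would also verify the Sec-WebSocket-Accept header.
    fn is_upgrade_accepted(response_head: &str) -> bool {
        let mut lines = response_head.split("\r\n");
        let status_ok = lines
            .next()
            .map_or(false, |line| line.starts_with("HTTP/1.1 101"));
        let has_upgrade = lines.any(|line| {
            line.split_once(':').map_or(false, |(name, value)| {
                name.trim().eq_ignore_ascii_case("upgrade")
                    && value.trim().eq_ignore_ascii_case("websocket")
            })
        });
        status_ok && has_upgrade
    }
    ```

    The request bytes can be written to any stream, including a
    `std::os::unix::net::UnixStream`, before WebSocket frames are exchanged
    on the same connection.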
    
    ## Testing
    - `cargo test -p codex-app-server transport::unix_socket_tests`
    - `just fmt`
    - `git diff --check`
  • [codex] Omit fork turns from thread started notifications (#19093)
    ## Why
    
    `thread/fork` responses intentionally include copied history so the
    caller can render the fork immediately, but `thread/started` is a
    lifecycle notification. The v2 `Thread` contract says notifications
    should return `turns: []`, and the fork path was reusing the response
    thread directly, causing copied turns to be emitted through
    `thread/started` as well.
    
    ## What Changed
    
    - Route app-server `thread/started` notification construction through a
    helper that clears `thread.turns` before sending.
    - Keep `thread/fork` responses unchanged so callers still receive copied
    history.
    - Add persistent and ephemeral fork coverage that asserts
    `thread/started` emits an empty `turns` array while the response retains
    fork history.
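
    A minimal sketch of the helper described above, assuming simplified
    `Thread` and `Turn` shapes. The type and function names are
    hypothetical and do not mirror the actual app-server protocol types.

    ```rust
    // Hypothetical, simplified protocol types.
    #[derive(Debug, Clone, PartialEq)]
    struct Turn {
        id: String,
    }

    #[derive(Debug, Clone, PartialEq)]
    struct Thread {
        id: String,
        turns: Vec<Turn>,
    }

    #[derive(Debug, Clone, PartialEq)]
    struct ThreadStartedNotification {
        thread: Thread,
    }

    // Builds the lifecycle notification; copied history is always dropped
    // so the notification carries `turns: []`.
    fn thread_started_notification(mut thread: Thread) -> ThreadStartedNotification {
        thread.turns.clear();
        ThreadStartedNotification { thread }
    }
    ```

    A fork handler following this pattern would pass a clone of the
    response thread to the helper, so the response itself keeps its turns.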
    
    ## Testing
    
    - `just fmt`
    - `cargo test -p codex-app-server`
  • Fix: use function apply_patch tool for Bedrock model (#19416)
    ## Why
    
    `openai.gpt-5.4-cmb` is served through the Amazon Bedrock provider,
    whose request validator currently accepts `function` and `mcp` tool
    specs but rejects Responses `custom` tools. The CMB catalog entry reuses
    the bundled `gpt-5.4` metadata, which marks `apply_patch_tool_type` as
    `freeform`. That causes Codex to include an `apply_patch` tool with
    `type: "custom"`, so even sessions with most features disabled can fail
    before the model runs with:
    
    ```text
    Invalid tools: unknown variant `custom`, expected `function` or `mcp`
    ```
    
    This is provider-specific: the model should still expose `apply_patch`,
    but for Bedrock it needs to use the JSON/function tool shape instead of
    the freeform/custom shape.
    
    ## What Changed
    
    - Override the `openai.gpt-5.4-cmb` static catalog entry to set
    `apply_patch_tool_type` to `function` after inheriting the rest of the
    `gpt-5.4` model metadata.
    - Update the catalog test expectation so the CMB entry continues to
    track `gpt-5.4` metadata except for this Bedrock-specific tool shape
    override.
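
    The inherit-then-override pattern can be sketched with Rust's struct
    update syntax. `ModelInfo`, its `display_name` field, and both function
    names are hypothetical simplifications of the catalog types; only the
    slug, the `apply_patch_tool_type` values, and the `custom` / `function`
    tool types come from the description above.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ApplyPatchToolType {
        Freeform,
        Function,
    }

    // Hypothetical, trimmed-down catalog entry.
    #[derive(Debug, Clone, PartialEq)]
    struct ModelInfo {
        slug: String,
        display_name: String,
        apply_patch_tool_type: ApplyPatchToolType,
    }

    // Inherits everything from the base entry except the slug and the
    // provider-specific tool shape.
    fn bedrock_cmb_entry(base: &ModelInfo) -> ModelInfo {
        ModelInfo {
            slug: "openai.gpt-5.4-cmb".to_string(),
            apply_patch_tool_type: ApplyPatchToolType::Function,
            ..base.clone()
        }
    }

    // Tool `type` sent on the wire for each apply_patch shape.
    fn apply_patch_wire_type(kind: ApplyPatchToolType) -> &'static str {
        match kind {
            ApplyPatchToolType::Freeform => "custom",
            ApplyPatchToolType::Function => "function",
        }
    }
    ```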
    
    ## Verification
    
    - `cargo test -p codex-model-provider`
    - `just fix -p codex-model-provider`