Commit Graph

7148 Commits

  • [codex] Preserve logical paths during AGENTS.md discovery (#26465)
    ## Intent
    
    Follow up on #26205 by avoiding unnecessary filesystem canonicalization
    during `AGENTS.md` discovery. The configured working directory is
    already absolute, and canonicalization incorrectly switches symlinked
    workspaces from their logical parent hierarchy to the target's
    hierarchy.
    
    ## User-facing behavior
    
    For a symlinked working directory such as:
    
    ```text
    test-root/
    |-- logical-repo/
    |   |-- AGENTS.md              ("logical parent doc")
    |   `-- workspace ------------> physical-repo/workspace/
    `-- physical-repo/
        |-- AGENTS.md              ("physical parent doc")
        `-- workspace/
            `-- AGENTS.md          ("workspace doc")
    ```
    
    Before this change, Codex canonicalized `logical-repo/workspace` to
    `physical-repo/workspace` before discovery. It therefore loaded
    `physical-repo/AGENTS.md` and `physical-repo/workspace/AGENTS.md`,
    ignoring the instructions from the repository through which the user
    entered the workspace.
    
    After this change, ancestor discovery walks the configured logical path,
    so Codex loads `logical-repo/AGENTS.md`. Opening
    `logical-repo/workspace/AGENTS.md` still follows the symlink through the
    host filesystem, so the workspace document is also loaded.
    `physical-repo/AGENTS.md` is not loaded.
    
    ## Implementation
    
    Use the logical absolute working directory when discovering project
    instructions and reporting instruction sources. Filesystem reads still
    follow the working-directory symlink, so an `AGENTS.md` in the target
    workspace continues to load while ancestor discovery uses the symlink's
    parents.
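    
    A minimal Rust sketch of the ancestor walk described above, assuming
    discovery only needs the lexical parents of the configured path.
    `agents_md_candidates` is a hypothetical helper, not the function Codex
    uses. It never calls `canonicalize`, so a symlinked `workspace` keeps
    `logical-repo` as its parent.
    
    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical helper: list candidate AGENTS.md paths from the filesystem
    /// root down to `logical_cwd`, using only the path's lexical ancestors.
    /// No symlink is resolved here, so the logical hierarchy is preserved.
    fn agents_md_candidates(logical_cwd: &Path) -> Vec<PathBuf> {
        // `ancestors()` yields the path itself first, then each parent in turn.
        let mut dirs: Vec<&Path> = logical_cwd.ancestors().collect();
        dirs.reverse(); // outermost directory first
        dirs.into_iter().map(|dir| dir.join("AGENTS.md")).collect()
    }
    ```
    
    For `/test-root/logical-repo/workspace`, the candidates include
    `/test-root/logical-repo/AGENTS.md` and never mention `physical-repo`.
    Opening the last candidate would still follow the symlink at read time.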
    
    ## Validation
    
    Added integration coverage proving that discovery loads the logical
    parent's instructions and the target workspace's instructions, but not
    the target parent's instructions.
  • Use Winget release environment secret (#26466)
    ## Why
    `WINGET_PUBLISH_PAT` now lives as a GitHub environment secret under
    `mainline-release-winget`. The WinGet release job needs to enter that
    environment so `secrets.WINGET_PUBLISH_PAT` resolves during
    stable/mainline Rust releases.
    
    ## What Changed
    - Attach the `winget` job in `.github/workflows/rust-release.yml` to the
    `mainline-release-winget` environment.
    - Set `deployment: false` so the job can read environment secrets
    without creating GitHub deployment records.
    
    ## Operational Note
    The `mainline-release-winget` environment must allow `rust-v*.*.*` tag
    refs before this can run on release tags. The live environment currently
    has a custom policy named `rust-v*.*.*` with type `branch`; add the
    corresponding `tag` policy before relying on this path for a release.
    
    ## Validation
    - `git diff --check origin/main...HEAD --
    .github/workflows/rust-release.yml`
    - `ruby -e 'require "yaml"; ARGV.each { |f| YAML.load_file(f); puts
    "yaml ok: #{f}" }' .github/workflows/rust-release.yml`
  • [codex] Use model-advertised reasoning effort order (#26446)
    ## Summary
    - preserve the model catalog order for app-server
    `supportedReasoningEfforts` and document that client contract
    - render TUI reasoning choices in the advertised order
    - step reasoning shortcuts by adjacent list position instead of deriving
    order from known effort names
    - anchor unsupported configured values to the advertised default, or the
    first option when needed
    - remove canonical effort ordering helpers and the unused upgrade effort
    mapping
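    
    The stepping and anchoring rules above can be sketched roughly as
    follows. This is an illustration under stated assumptions, not the TUI
    code: `anchor_index` and `step_effort` are hypothetical names, and
    clamping at the ends of the list (rather than wrapping) is an assumption
    the summary does not confirm.
    
    ```rust
    /// Position of `value` in the model-advertised list, if present.
    fn position_of(advertised: &[&str], value: &str) -> Option<usize> {
        advertised.iter().position(|effort| *effort == value)
    }

    /// Index to step from: the current value if it is advertised, otherwise
    /// the advertised default, otherwise the first option.
    fn anchor_index(advertised: &[&str], current: &str, default: Option<&str>) -> Option<usize> {
        if advertised.is_empty() {
            return None;
        }
        let index = position_of(advertised, current)
            .or_else(|| default.and_then(|value| position_of(advertised, value)))
            .unwrap_or(0);
        Some(index)
    }

    /// Move to the adjacent entry in advertised order. No knowledge of effort
    /// names is used. Assumption: stepping clamps at either end of the list.
    fn step_effort<'a>(
        advertised: &[&'a str],
        current: &str,
        default: Option<&str>,
        forward: bool,
    ) -> Option<&'a str> {
        let index = anchor_index(advertised, current, default)?;
        let next = if forward {
            (index + 1).min(advertised.len() - 1)
        } else {
            index.saturating_sub(1)
        };
        Some(advertised[next])
    }
    ```
    
    Because only list positions are compared, a model that advertises an
    unusual order or unfamiliar effort names is handled the same way as the
    built-in set.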
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
    
    Stacked on #26444.
  • [codex] Support model-defined reasoning efforts (#26444)
    ## Summary
    - accept non-empty model-defined reasoning effort values while
    preserving built-in effort behavior
    - propagate the non-Copy effort type through core, app-server, TUI,
    telemetry, and persistence call sites
    - preserve string wire encoding and expose an open-string schema for
    clients
    - update model selection and shortcut behavior for model-advertised
    effort values
    
    ## Root cause
    `ReasoningEffort` gained a string-backed custom variant, so it could no
    longer implement `Copy` or rely on derived closed-enum serialization.
    Existing consumers still moved effort values from shared references and
    assumed a fixed built-in value set.
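    
    To illustrate the root cause, here is a small stand-in for an effort
    type with a string-backed variant. `EffortSketch` and its variant names
    are assumptions for illustration; the real `ReasoningEffort` definition
    is not shown in this description. Holding a `String` rules out `Copy`,
    and the string encoding is written by hand rather than derived.
    
    ```rust
    use std::fmt;
    use std::str::FromStr;

    /// Illustrative effort type. `Custom` owns a `String`, so the enum can
    /// derive `Clone` but not `Copy`; callers must borrow or clone.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum EffortSketch {
        Low,
        Medium,
        High,
        Custom(String), // model-defined value, never empty
    }

    impl FromStr for EffortSketch {
        type Err = String;

        fn from_str(value: &str) -> Result<Self, Self::Err> {
            match value {
                "" => Err("reasoning effort must not be empty".to_string()),
                "low" => Ok(Self::Low),
                "medium" => Ok(Self::Medium),
                "high" => Ok(Self::High),
                other => Ok(Self::Custom(other.to_string())),
            }
        }
    }

    impl fmt::Display for EffortSketch {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // Built-in and custom values share one plain-string encoding.
            let text = match self {
                Self::Low => "low",
                Self::Medium => "medium",
                Self::High => "high",
                Self::Custom(value) => value.as_str(),
            };
            f.write_str(text)
        }
    }
    ```
    
    Code that previously moved an effort out of a shared reference would
    need an explicit `.clone()` with a type shaped like this.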
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
  • Cleanup experimentalFeature/enablement/set (#26312)
    ## Why
    
    `experimentalFeature/enablement/set` still allowed several keys that no
    longer need to be managed through this API. Keeping those keys also
    preserved corresponding special-case logic, including refreshing the
    apps list when the `apps` key was enabled.
    
    The endpoint also rejected an entire request when any key was invalid or
    unsupported. That makes clients brittle when they send a mix of current
    and stale keys, even when the valid entries can still be applied safely.
    
    ## What changed
    
    - remove the feature keys that no longer need to be supported by
    `experimentalFeature/enablement/set`
    - remove the corresponding apps-list refresh path and its auth/config
    plumbing
    - ignore and warn on invalid or unsupported keys while still applying
    valid keys from the same request
    - update the app-server documentation and integration coverage for the
    reduced key set and partial-acceptance behavior
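    
    A rough sketch of the partial-acceptance behavior, assuming the
    request is a list of key/value pairs. `apply_enablement` and the
    warning text are hypothetical; the actual handler and its key set are
    not shown here.
    
    ```rust
    use std::collections::BTreeMap;

    /// Apply every supported key and collect one warning per unsupported key,
    /// instead of rejecting the whole request when any key is invalid.
    fn apply_enablement(
        supported: &[&str],
        request: &[(&str, bool)],
    ) -> (BTreeMap<String, bool>, Vec<String>) {
        let mut applied = BTreeMap::new();
        let mut warnings = Vec::new();
        for (key, enabled) in request {
            if supported.contains(key) {
                applied.insert(key.to_string(), *enabled);
            } else {
                warnings.push(format!("ignoring unsupported feature key `{key}`"));
            }
        }
        (applied, warnings)
    }
    ```
    
    A client that sends a mix of current and stale keys gets the current
    ones applied and a warning for each stale one.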
    
    ## Test plan
    
    - `just test -p codex-app-server experimental_feature_enablement_set` (6
    passed)
    - `just test -p codex-app-server` exercised the changed tests
    successfully; unrelated sandbox-dependent and watcher/timing tests
    failed locally
  • Remove response.processed websocket request (#26447)
    ## Why
    
    The Responses websocket client no longer needs to send a follow-up
    `response.processed` request after a turn response has already been
    recorded. Keeping that extra acknowledgement path adds feature-gated
    control flow and a second websocket request shape that no longer carries
    useful behavior.
    
    ## What Changed
    
    - Removed the `response.processed` websocket request type and sender.
    - Removed the `responses_websocket_response_processed` feature flag and
    schema entry.
    - Removed turn and remote-compaction plumbing that only tracked response
    IDs to send the acknowledgement.
    - Removed tests that existed solely to cover the deleted feature path.
    
    ## Validation
    
    - `just fix -p codex-core -p codex-api -p codex-features`
  • build: use ThinLTO for release binaries (#23710)
    ## Why
    
    Fat LTO makes release builds substantially slower without providing
    enough measured runtime benefit to justify the release CI long pole. The
    build-profile investigation found that keeping Cargo's default release
    `opt-level=3` and switching from fat LTO to ThinLTO (`3/thin/1`) reduced
    a clean `codex-cli` release build from 2073.893 seconds to 1243.172
    seconds, a 40.06% improvement.
    
    The resulting binary increased from 196.7 MiB to 211.8 MiB (+7.63%).
    Measured runtime changes were small: the worst image workload median was
    +0.86% and app-server startup was +0.31% relative to fat LTO. ThinLTO
    retains cross-crate optimization while avoiding most of the fat-LTO
    build cost.
    
    This deliberately avoids global size optimization: final-executable
    testing showed a substantial regression on the image request path, which
    is expected to become more important as image usage grows.
    
    ## What changed
    
    - Set the workspace release profile to `lto = "thin"`, retaining Cargo's
    default release `opt-level=3`.
    - Remove release and CI workflow-specific LTO overrides so
    release-profile builds consistently use the workspace setting.
    - Remove the now-unused Windows release workflow input and related
    diagnostic output.
    
    ## Validation
    
    - Confirmed the release profile parses with `cargo metadata --no-deps
    --format-version 1`.
    - CI validates release builds across the supported target matrix.
  • [codex] Fix Windows sandbox build script lint (#26445)
    ## Why
    
    The Windows ARM64 Cargo clippy job on `main` is failing because
    workspace lints deny `clippy::expect_used`, and the
    `codex-windows-sandbox` build script used `expect()` while reading
    `CARGO_MANIFEST_DIR`.
    
    ## What changed
    
    `codex-rs/windows-sandbox-rs/build.rs` now returns `Result<(), String>`
    from `main()` and converts a missing `CARGO_MANIFEST_DIR` into an
    explicit build-script error. The non-Windows early return and Windows
    linker argument behavior are unchanged.
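    
    The pattern is roughly the following. This is a sketch, not the
    contents of `build.rs`: `required_env` and `build_main` are hypothetical
    names, and the error wording is an assumption.
    
    ```rust
    use std::env;

    /// Read a required environment variable, turning a missing or non-UTF-8
    /// value into an error message instead of panicking via `expect()`.
    fn required_env(name: &str) -> Result<String, String> {
        env::var(name).map_err(|err| format!("failed to read {name}: {err}"))
    }

    /// Shape of a fallible build-script entry point. A real `main` with this
    /// signature fails the build when it returns `Err`.
    fn build_main() -> Result<(), String> {
        let manifest_dir = required_env("CARGO_MANIFEST_DIR")?;
        // A real build script would use `manifest_dir`; this sketch echoes it.
        println!("manifest dir: {manifest_dir}");
        Ok(())
    }
    ```
    
    Using `?` keeps the failure explicit without tripping
    `clippy::expect_used`.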
    
    ## Verification
    
    - `just clippy -p codex-windows-sandbox -- -D warnings`
    - `just test -p codex-windows-sandbox`
  • Route AGENTS.md loading through environment filesystems (#26205)
    ## Why
    
    Workspace-specific `AGENTS.md` loading needs to use the selected
    environment filesystem so remote workspaces and child agents read
    instructions from their actual environment instead of the host
    filesystem. The app-server should report the same instruction sources
    the initialized thread actually loaded, rather than independently
    rescanning configuration and filesystem state.
    
    ## What changed
    
    - Introduce `LoadedAgentsMd` to retain ordered user, project, and
    internal instructions with their provenance.
    - Load and canonicalize workspace `AGENTS.md` paths through the primary
    `EnvironmentManager` environment, then render the loaded instructions
    when constructing turn context.
    - Expose cached loaded instruction sources from initialized threads and
    use them for app-server start, resume, and fork responses.
    - Preserve global `CODEX_HOME` loading and separator behavior while
    excluding empty project files that did not supply model-visible
    instructions.
    - Add integration coverage for CLI injection, selected-environment
    provenance and rendering, empty environment selection, and cached
    sources on loaded-thread resume.
    
    ## Validation
    
    - `just test -p codex-core agents_md`
    - `just test -p codex-core
    selected_environment_sources_match_model_visible_instructions`
    - `just test -p codex-exec agents_md`
    - `just test -p codex-app-server instruction_sources`
    - `just test -p codex-app-server --status-level fail`
  • Use Azure artifact signing environment secrets (#25945)
    ## Why
    Windows release signing should read Azure signing credentials from the
    `azure-artifact-signing` environment instead of the old repo-level
    `AZURE_TRUSTED_SIGNING_*` names. The smoke runs confirmed the
    environment secrets resolve with the new `AZURE_ARTIFACT_SIGNING_*`
    names once the Windows signing job is attached to that environment.
    
    ## What Changed
    - Put the real Windows signing job in the `azure-artifact-signing`
    environment.
    - Switch the Windows signing action inputs from
    `AZURE_TRUSTED_SIGNING_*` to `AZURE_ARTIFACT_SIGNING_*`.
    - Drop the obsolete `workflow_call.secrets` declarations for the old
    repo-level secret names; the caller continues to use `secrets: inherit`.
    - Remove the temporary branch-trigger and Windows-only smoke-test
    workflow changes before finalizing this PR.
    
    ## Validation
    - `git diff --check -- .github/workflows/rust-release.yml
    .github/workflows/rust-release-windows.yml`
    - `ruby -e 'require "yaml"; ARGV.each { |f| YAML.load_file(f); puts
    "yaml ok: #{f}" }' .github/workflows/rust-release.yml
    .github/workflows/rust-release-windows.yml`
  • core: allow excluding tool namespaces from code mode (#26320)
    ## Why
    
    Research and training setups need to control which tool namespaces
    appear inside code mode's nested `tools` surface without disabling those
    tools entirely. This makes it possible to train against a deliberately
    reduced nested-tool setup while preserving the normal direct and
    deferred tool paths.
    
    ## What
    
    - Extend `features.code_mode` to accept structured configuration while
    preserving the existing boolean syntax.
    - Add an exact `excluded_tool_namespaces` list under
    `[features.code_mode]`:
    
      ```toml
      [features.code_mode]
      enabled = true
      excluded_tool_namespaces = ["mcp__codex_apps", "multi_agent_v1"]
      ```
    
    - Filter matching canonical `ToolName` namespaces when constructing code
    mode's nested router and code-mode-specific direct tool descriptions.
    - Keep excluded tools registered, directly exposed in mixed code mode,
    and discoverable through top-level `tool_search` when otherwise
    eligible.
    - Derive deferred nested-tool guidance after namespace filtering so the
    `exec` description does not advertise excluded-only deferred tools.
    - Preserve the boolean/table representation when materializing config
    locks and update the generated config schema.
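    
    The exact-match filtering can be pictured with a short sketch.
    `ToolSketch` and `nested_tools` are hypothetical; the real `ToolName`
    type and router construction are not shown in this description, and
    keeping tools that have no namespace is an assumption.
    
    ```rust
    use std::collections::HashSet;

    /// Hypothetical tool record used only for this illustration.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct ToolSketch {
        namespace: Option<String>,
        name: String,
    }

    /// Keep only tools whose namespace is not in the exclusion list.
    /// Matching is exact: no prefix or glob matching is attempted.
    fn nested_tools(all: &[ToolSketch], excluded: &[&str]) -> Vec<ToolSketch> {
        let excluded: HashSet<&str> = excluded.iter().copied().collect();
        all.iter()
            .filter(|tool| match tool.namespace.as_deref() {
                Some(namespace) => !excluded.contains(namespace),
                None => true, // assumption: un-namespaced tools are kept
            })
            .cloned()
            .collect()
    }
    ```
    
    Only the nested view is filtered in this sketch; the input list is left
    untouched, mirroring how excluded tools stay registered elsewhere.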
    
    ## Testing
    
    - `just test -p codex-features`
    - `just test -p codex-config`
    - `just test -p codex-core load_config_resolves_code_mode_config`
    - `just test -p codex-core
    lock_contains_prompts_and_materializes_features`
    - `just test -p codex-core
    excluded_deferred_namespaces_do_not_enable_nested_tool_guidance`
    - `just test -p codex-core
    code_mode_excludes_configured_nested_tool_namespaces`
    - `cargo check -p codex-thread-manager-sample`
  • [codex-analytics] emit forked thread id on initialization (#26248)
    ## Why
    - Thread initialization analytics do not identify the source thread for
    forked threads.
    - The session viewer needs this lineage to construct thread trees.
    - Depends on openai/openai#987854. Do not release this change before
    that backend schema change is deployed.
    
    ## What Changed
    - Adds optional `forked_from_thread_id` to `codex_thread_initialized`.
    - Populates it from the existing thread fork lineage for app-server and
    in-process subagent initialization paths.
    - Keeps it null for non-forked threads.
    
    ## Verification
    - `just fmt`
    - `just test -p codex-analytics`
    - `just test -p codex-app-server
    thread_fork_tracks_thread_initialized_analytics`
  • external-agent-migration: avoid mixed MCP transport configs (#26435)
    ## Why
    
    MCP migration could recursively merge an imported server into an
    existing same-named Codex server. When one definition used stdio and the
    other used HTTP, this produced an invalid mixed configuration containing
    both `command` and `url`.
    
    ## What changed
    
    - Merge MCP configuration at the server level instead of field by field.
    - Preserve an existing same-named Codex MCP server unchanged.
    - Report only MCP servers that would actually be added during detection.
    - Add regression coverage for mixed command/HTTP source configurations.
    - Use neutral fixture names and reserved `example.com` URLs.
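    
    A minimal sketch of server-level merging, assuming each server is
    modeled as a single transport value. `McpTransportSketch` and
    `merge_servers` are hypothetical names. Because a server is replaced or
    kept as a whole, `command` and `url` can never end up in the same entry.
    
    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical transport model: a server is either stdio or HTTP.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum McpTransportSketch {
        Stdio { command: String },
        Http { url: String },
    }

    /// Merge imported servers one whole server at a time. An existing server
    /// with the same name is left unchanged. Returns the names actually added,
    /// which is also what detection would report.
    fn merge_servers(
        existing: &mut BTreeMap<String, McpTransportSketch>,
        imported: BTreeMap<String, McpTransportSketch>,
    ) -> Vec<String> {
        let mut added = Vec::new();
        for (name, server) in imported {
            if !existing.contains_key(&name) {
                existing.insert(name.clone(), server);
                added.push(name);
            }
        }
        added
    }
    ```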
    
    ## Test plan
    
    - `just test -p codex-app-server repo_mcp`
      - 5 tests passed.
    - `just test -p codex-external-agent-migration
    mcp_migration_prefers_command_transport_for_mixed_server_config`
      - 1 test passed.
  • app-server: support -c config overrides (#26436)
    ## Why
    
    The standalone `codex-app-server` binary already routed a
    `CliConfigOverrides` value into app-server startup, but its own clap
    args did not expose the shared `-c/--config` option. That meant
    `codex-app-server -c key=value` was rejected before the existing config
    override path could run, unlike the main `codex` CLI.
    
    ## What Changed
    
    - Flatten `CliConfigOverrides` into `AppServerArgs` in
    `codex-rs/app-server/src/main.rs`.
    - Pass parsed overrides to `run_main_with_transport_options` instead of
    always using `CliConfigOverrides::default()`.
    - Add a binary parser test covering both `-c` and `--config` for the
    standalone app-server.
    
    ## Verification
    
    - `just test -p codex-app-server
    app_server_accepts_cli_config_overrides`
    
    The broader `just test -p codex-app-server` run was also attempted. It
    compiled and ran 812 tests, with 796 passing, but failed in this local
    sandbox on unrelated `sandbox-exec: sandbox_apply: Operation not
    permitted` command-exec/turn integration paths and a skills watcher
    timeout.
  • Expose configured marketplace source in plugin list JSON (#26417)
    ## Summary
    - Follow-up to #25330
    - Add `marketplaceSource` to `codex plugin list --json` entries for
    configured marketplaces
    - Keep the existing per-plugin `source` field unchanged; this still
    reports the local plugin source path
    - Include only the configured marketplace `sourceType` and `source` from
    `config.toml`
    - Keep human-readable output unchanged
    - Add CLI coverage for configured local and git marketplace sources
    
    Example:
    
    ```json
    {
      "source": {
        "source": "local",
        "path": "/path/to/.codex/.tmp/marketplaces/debug/plugins/sample"
      },
      "marketplaceSource": {
        "sourceType": "git",
        "source": "https://example.com/acme/agent-skills.git"
      }
    }
    ```
    
    ## Validation
    - `just fmt`
    - `just fix -p codex-cli`
    - `just test -p codex-cli plugin_list`
  • Bound external agent session detection work (#26291)
    ## Why
    
    External agent migration detection parsed and hashed every JSONL session
    file. For users with many large conversations, launching migration could
    consume substantial CPU and disk resources.
    
    Detection only needs the most recent sessions for the migration UI, so
    full-content work should be bounded.
    
    ## What
    
    - Use file modification metadata to select the 50 most recent eligible
    sessions before parsing JSONL content.
    - Skip unchanged imported sessions using metadata stored in the import
    ledger.
    - Preserve content hashing when metadata indicates a session may have
    changed.
    - Stream SHA-256 calculation through a 64 KiB buffer instead of loading
    an entire session into memory.
    - Continue detecting older sessions in subsequent batches after newer
    sessions are imported.
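    
    Two of the ideas above, sketched with the standard library only.
    `most_recent` and `stream_chunks` are hypothetical helpers. SHA-256 is
    not in `std`, so the hash update is replaced here by a caller-supplied
    closure; a real implementation would feed each chunk to a SHA-256
    hasher.
    
    ```rust
    use std::io::{self, ErrorKind, Read};
    use std::path::PathBuf;
    use std::time::SystemTime;

    /// Pick the `limit` most recently modified sessions from metadata alone,
    /// before any file content is parsed.
    fn most_recent(mut sessions: Vec<(PathBuf, SystemTime)>, limit: usize) -> Vec<PathBuf> {
        sessions.sort_by(|a, b| b.1.cmp(&a.1)); // newest first
        sessions.into_iter().take(limit).map(|(path, _)| path).collect()
    }

    /// Feed a reader to `update` through a fixed 64 KiB buffer, so memory use
    /// does not grow with file size. Returns the number of bytes read.
    fn stream_chunks<R: Read>(mut reader: R, mut update: impl FnMut(&[u8])) -> io::Result<u64> {
        let mut buffer = vec![0u8; 64 * 1024];
        let mut total = 0u64;
        loop {
            match reader.read(&mut buffer) {
                Ok(0) => return Ok(total), // end of input
                Ok(read) => {
                    update(&buffer[..read]);
                    total += read as u64;
                }
                Err(err) if err.kind() == ErrorKind::Interrupted => continue,
                Err(err) => return Err(err),
            }
        }
    }
    ```
    
    Calling `most_recent(sessions, 50)` first means `stream_chunks` only
    ever runs on the selected files.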
    
    ## Validation
    
    - `RUST_MIN_STACK=8388608 cargo nextest run --no-fail-fast -p
    codex-external-agent-sessions`
      - 20 tests passed.
    - Benchmarked release builds against 250 valid JSONL sessions totaling
    501 MiB:
      - Median detection time decreased from 1,138.8 ms to 47.0 ms.
      - CPU instructions decreased by 95.8%.
      - Both versions returned the expected 50 sessions.
    
    The benchmark used warm filesystem caches and measured the reduction in
    parsing, hashing, and CPU work.
  • Add saved image path hint to standalone image generation (#25947)
    ## Why
    
    Standalone image generation returns image bytes to the model, but the
    model also needs the host artifact path to reference the generated file
    in follow-up work.
    
    ## What changed
    
    - Append the default saved-image path hint alongside the generated image
    tool output.
    - Reuse the existing core image-generation hint text.
    - Pass the thread ID and Codex home directory needed to compute the
    artifact path.
    - Add app-server and extension coverage for the model-visible hint.
    
    ## Validation
    
    - `just fmt`
    - `just bazel-lock-check`
    - `just test -p codex-app-server
    standalone_image_generation_returns_saved_path_hint_to_model`
  • Simplify Codex CLI README (#26313)
    ## Summary
    
    The codex-rs README was left over from before we moved the docs into the
    developer site. Its contents were very much out of date, and we received
    some bug reports about it.
  • Load plugin hooks without other plugin capabilities (#26272)
    ## Summary
    
    `hooks/list` only consumes plugin hook declarations, but previously
    loaded every enabled plugin's skills, MCP configuration, apps, and
    capability summary before discarding them.
    
    In a local benchmark, this reduced `hooks/list` latency by over 100 ms
    (e.g., from 594 ms to 467 ms on startup, and from 168 ms to 16 ms when
    making a `hooks/list` call later in the same TUI session). This is on
    the critical path to rendering the TUI, so even tens of milliseconds
    deserve scrutiny (IMO).
    
    This change adds a hook-specific plugin loading path that preserves
    plugin enablement, remote/local conflict resolution, deterministic
    ordering, manifest resolution, and hook-loading warnings while skipping
    unrelated capabilities. (I think there's room for a more general design
    here that allows you to project the capabilities you need at load-time,
    but that seems unnecessary right now.)
  • Reduce SQLite contention from OpenTelemetry SDK debug logs (#26396)
    ## Summary
    
    - skip `opentelemetry_sdk` DEBUG and TRACE events before formatting or
    queueing them for the SQLite log sink
    - preserve INFO, WARN, and ERROR events from the SDK, along with TRACE
    events from application targets
    - add a persistence-level regression test for the target and level
    policy
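    
    The target and level policy amounts to a small predicate. The sketch
    below uses a hypothetical `LevelSketch` enum rather than a real logging
    crate's level type, and matching the SDK by the `opentelemetry_sdk`
    target prefix is an assumption.
    
    ```rust
    /// Severity levels ordered from most to least verbose. Deriving `Ord`
    /// makes `Trace < Debug < Info < Warn < Error`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    enum LevelSketch {
        Trace,
        Debug,
        Info,
        Warn,
        Error,
    }

    /// Decide whether an event should reach the SQLite sink. SDK events below
    /// INFO are dropped; everything else, including application TRACE, is kept.
    fn should_persist(target: &str, level: LevelSketch) -> bool {
        let from_sdk =
            target == "opentelemetry_sdk" || target.starts_with("opentelemetry_sdk::");
        !(from_sdk && level < LevelSketch::Info)
    }
    ```
    
    Running the check before formatting or queueing is what saves the
    work; the predicate itself only inspects metadata.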
    
    ## Why
    
    OpenTelemetry's batch log processor emits internal
    `BatchLogProcessor.ExportingDueToTimer` meta-events every second per
    Codex process. In measured high-fanout `logs_2.sqlite` databases,
    low-level `opentelemetry_sdk` events accounted for over 30% of retained
    rows (30-60% on the machines of people I asked to check).
    
    Persisting this SDK bookkeeping across many processes adds substantial
    write volume and contention without representing application activity.
    
    ## Validation
    
    - `just test -p codex-state` (132/132 tests passed, plus bench smoke)
    - `just fix -p codex-state`
    - `just fmt`
  • Optimize unbounded byte scans with memchr (#26265)
    ## Summary
    
    This PR adds `memchr` for some low-hanging performance improvements
    (namely, in MCP stdio, Ollama streaming, and full message-history
    newline counts).
    
    Codex produced the following release benchmarks:
    
    | Operation | Before | After | Speedup |
    | --- | ---: | ---: | ---: |
    | MCP 1 MiB chunked line | 2.172 s | 3.984 ms | 545x |
    | Ollama 1 MiB chunked line | 1.673 s | 2.790 ms | 600x |
    | Count newlines in 10 MiB history | 132.83 ms | 20.05 ms | 6.6x |
    
    With a "real" MCP setup (`ExecutorStdioServerLauncher` started a Python
    MCP server, completed `initialize`, requested `tools/list`, and
    deserialized a 1 MiB tool description over newline-delimited stdio),
    it's about 16x faster end-to-end:
    
    | Branch | 50 calls | Per call |
    | --- | ---: | ---: |
    | `main` | 862.53 ms | 17.25 ms |
    | this branch | 53.89 ms | 1.08 ms |
    
    `memchr` is already in our dependency tree and extremely widely used for
    this kind of optimized scanning.
  • Bridge host-loaded skills into the skills extension (#26172)
    ## Why
    
    The skills extension needs to become the path that exposes local host
    skills without losing the behavior already owned by core skill loading.
    Host skill discovery is not just `$CODEX_HOME/skills`: it also includes
    config layers, bundled-skill settings, plugin roots, runtime extra
    roots, and the filesystem for the selected primary environment.
    
    Rather than making the extension reload host skills and risk drifting
    from that authoritative load, this PR bridges the already-loaded
    per-turn skills outcome into the extension. That lets the extension
    advertise host skills and inject explicit `$skill` prompts while
    preserving the same roots, disabled/hidden state, rendered paths, and
    environment-backed file reads that the legacy path uses.
    
    ## What Changed
    
    - Adds `HostLoadedSkills` in `core-skills` to wrap the turn's
    `SkillLoadOutcome` and read `SKILL.md` through the filesystem that
    loaded that skill.
    - Stores `HostLoadedSkills` in turn extension data for normal turns and
    review turns, so the skills extension can consume the loaded host
    catalog without reloading it.
    - Adds `HostSkillProvider` under `ext/skills/src/provider/host.rs`,
    mapping host-loaded skill metadata into the skills-extension
    catalog/read contract.
    - Registers the host provider by default from
    `codex_skills_extension::install()`.
    - Preserves host skill metadata such as dependencies, disabled state,
    hidden-from-prompt policy, and slash-normalized display paths.
    - Passes host-loaded skills through `SkillListQuery` and
    `SkillReadRequest` so explicit skill invocation reads only resources
    from the loaded host catalog.
    - Adds integration coverage for a real legacy
    `$CODEX_HOME/skills/.../SKILL.md` skill being listed and injected
    through the installed extension.
    
    ## Testing
    
    - Added `installed_extension_loads_host_skills_from_legacy_roots` in
    `ext/skills/tests/skills_extension.rs`.
    - `just test -p codex-skills-extension`
  • Gate automatic idle turns in Plan mode (#26147)
    ## Why
    
    Goal idle continuation is extension-triggered model-visible work, so it
    should follow one core-owned rule for when automatic work may start. In
    particular, it should not jump ahead of queued user/client work, start
    while another task is active, or inject a continuation turn while the
    thread is in Plan mode.
    
    Keeping this policy in `try_start_turn_if_idle` avoids passing
    `collaboration_mode` or review-specific state through
    `ThreadLifecycleContributor::on_thread_idle`. Active `/review` is
    covered by the same active-task gate because Review turns are not
    steerable.
    
    ## What Changed
    
    - Teach `Session::try_start_turn_if_idle` to reject automatic idle turns
    in Plan mode, both before reserving an idle turn and after building the
    turn context.
    - Document `CodexThread::try_start_turn_if_idle` as the extension-facing
    gate for automatic idle work, including Plan-mode and active Review-task
    behavior.
    - Add focused coverage for Plan-mode rejection and active Review-task
    rejection without queuing synthetic input.
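    
    The gate can be summarized as one predicate over thread state. The
    types below are hypothetical simplifications for illustration, not the
    `Session` internals.
    
    ```rust
    /// Hypothetical collaboration mode for this sketch.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ModeSketch {
        Default,
        Plan,
    }

    /// Hypothetical snapshot of the state the gate needs to inspect.
    struct ThreadStateSketch {
        mode: ModeSketch,
        has_active_task: bool, // includes an active Review task
        queued_inputs: usize,  // pending user or client work
    }

    /// Automatic idle work may start only when nothing else is running or
    /// waiting and the thread is not in Plan mode.
    fn may_start_idle_turn(state: &ThreadStateSketch) -> bool {
        state.mode != ModeSketch::Plan && !state.has_active_task && state.queued_inputs == 0
    }
    ```
    
    Per the change description, the Plan-mode check runs both before
    reserving the idle turn and again after the turn context is built; in
    this sketch that would mean calling the predicate at both points.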
    
    ## Testing
    
    - `just test -p codex-core try_start_turn_if_idle`
  • chore: calm down (#26367)
    Prompt update to address feedback
  • ci: sign macOS release artifacts with Azure Key Vault (#26252)
    ## Why
    
    The public Codex release workflow needs to sign and notarize macOS
    binaries and DMGs without placing the Developer ID private key in
    GitHub. This moves the private-key operation behind the protected
    `codesigning` environment and uses GitHub OIDC with Azure Key Vault
    PKCS#11, while preserving the existing external `build_unsigned` /
    `promote_signed` fallback.
    
    ## What changed
    
    - Add a reusable Azure Key Vault (AKV) PKCS#11 setup action that
    authenticates to Azure with OIDC, downloads pinned signing tools,
    verifies their SHA-256 digests, and loads the public signing
    certificate from Key Vault.
    - Replace the legacy macOS signing action with scripts that support
    AKV-backed `rcodesign`, notarize signed binaries and DMGs, and staple
    DMG notarization tickets.
    - Restructure `rust-release.yml` so macOS builds produce unsigned
    artifacts first, protected jobs perform signing and notarization, macOS
    runners package and verify the results, and release publishing waits for
    verified artifacts.
    - Preserve the manual external-signing handoff flow and make manual-mode
    conditions explicit.
    - Move the Codex entitlements file alongside the signing scripts and
    update CODEOWNERS for the new signing surfaces.
    
    ## Verification
    
    - [Live protected signing workflow
    run](https://github.com/openai/codex/actions/runs/26903610631) completed
    successfully for both macOS architectures, including binary
    signing/notarization, DMG signing/notarization, and final artifact
    verification.
    - Downloaded both signed DMGs and independently verified their checksums
    and strict signatures.
    - Confirmed `xcrun stapler validate` succeeds and Gatekeeper accepts
    both DMGs as `Notarized Developer ID`.
    - Mounted both DMGs and confirmed the contained `codex` and
    `codex-responses-api-proxy` binaries have valid Developer ID signatures
    for the expected architectures.
    
    ---------
    
    Co-authored-by: shijie-openai <shijie.rao@openai.com>
  • [codex-analytics] report compaction request token counts (#25946)
    ## Why
    
    Compaction analytics need token counts that better represent the request
    being compacted. The existing session snapshot can diverge from the
    actual remote compaction request after output rewriting, and remote v2
    can use server-side Responses usage when available.
    
    ## What changed
    
    - Add an optional `active_context_tokens_before` override to
    `CompactionAnalyticsAttempt::track(...)` for remote compaction when it
    has a better before-token value than the begin-time session snapshot.
    The local `/compact` path passes no override.
    - For remote v1 `responses_compact`, subtract the estimated token delta
    from pre-compaction output rewriting from the session snapshot, capped
    by locally-added tokens since the last successful API response.
    - For remote v2 `responses_compaction_v2`, use the same bounded
    output-rewrite fallback as remote v1, then overwrite
    `active_context_tokens_before` with server `token_usage.input_tokens`
    from the `response.completed` event when present.
    - Keep the existing v2 compaction-output validation while carrying the
    completed response token usage through `collect_compaction_output`.
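    
    The arithmetic described above, as a hedged sketch. The function and
    parameter names are hypothetical, and saturating at zero is an
    assumption. For remote v1, `server_input_tokens` would always be `None`.
    
    ```rust
    /// Compute the before-compaction token count for a remote attempt.
    fn active_context_tokens_before(
        session_snapshot: u64,
        rewrite_delta_estimate: u64,
        locally_added_since_last_response: u64,
        server_input_tokens: Option<u64>,
    ) -> u64 {
        // The rewrite delta is capped by tokens added locally since the last
        // successful API response.
        let bounded_delta = rewrite_delta_estimate.min(locally_added_since_last_response);
        // Fallback shared by v1 and v2: snapshot minus the bounded delta.
        let fallback = session_snapshot.saturating_sub(bounded_delta);
        // v2 only: server-reported usage wins when it is present.
        server_input_tokens.unwrap_or(fallback)
    }
    ```
    
    With illustrative numbers, a snapshot of 10,000 tokens, an estimated
    rewrite delta of 1,500, and 1,000 locally added tokens gives 9,000; a
    server-reported value, when available, replaces that result.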
    
    ## Verification
    
    - `just fmt`
    - `just test -p codex-core
    collect_compaction_output_accepts_additional_output_items`
    - `git diff --check`
  • cli: add package path from install context (#26189)
    ## Why
    
    Codex package installs include helper binaries in `codex-path`, such as
    the bundled `rg`. Package-layout launches should add that directory
    before user commands run, but standalone launches were missing it while
    npm launches only worked because `codex.js` had its own legacy `PATH`
    rewrite. That made npm and standalone package behavior diverge.
    
    Shell snapshot restoration can also reset `PATH` after runtime setup.
    Any package-owned `PATH` prepend has to be recorded as an explicit
    runtime override so shells, unified exec, and user-shell commands keep
    access to `codex-path` after a snapshot is sourced.
    
    ## Repro
    
    Before this change, a curl-installed package could contain `rg` under
    `codex-path` but still fail to put it on `PATH`:
    
    ```shell
    mkdir /tmp/test-codex-curl
    curl -fsSL https://chatgpt.com/codex/install.sh \
      | CODEX_HOME=/tmp/test-codex-curl CODEX_NON_INTERACTIVE=1 sh
    /tmp/test-codex-curl/packages/standalone/current/bin/codex exec \
      --skip-git-repo-check 'print `which -a rg`'
    find /tmp/test-codex-curl -name rg
    ```
    
    The `which -a rg` output omitted the packaged helper even though `find`
    showed it under
    `/tmp/test-codex-curl/packages/standalone/releases/.../codex-path/rg`.
    
    The npm install path behaved differently only because
    `codex-cli/bin/codex.js` had legacy `PATH` rewriting:
    
    ```shell
    mkdir /tmp/test-codex-npm
    cd /tmp/test-codex-npm
    npm install @openai/codex
    ./node_modules/.bin/codex exec --skip-git-repo-check 'print `which -a rg`'
    ```
    
    That printed the npm package's `vendor/<target>/codex-path/rg` first.
    This PR moves that behavior into Rust-side package launch setup so
    curl/standalone and npm/bun launches agree without JS rewriting `PATH`.
    
    ## What Changed
    
    - `codex-rs/arg0` now uses
    `InstallContext::current().package_layout.path_dir` to prepend the
    package helper directory before any threads are created.
    - Package helper `PATH` setup is independent from the temporary arg0
    alias setup, so `codex-path` is still added even if `CODEX_HOME`
    tempdir, lock, or symlink setup fails.
    - `codex-rs/install-context` detects the canonical package layout we
    ship: `bin/`, `codex-resources/`, and `codex-path/` next to
    `codex-package.json`.
    - Shell, local unified exec, and user-shell runtimes now record package
    `codex-path` prepends in `explicit_env_overrides`, matching the existing
    zsh-fork behavior so shell snapshots cannot restore over the package
    helper path.
    - Remote unified exec requests do not receive the local app-server
    package path overlay.
    - `codex-cli/bin/codex.js` no longer computes or overrides `PATH`; it
    only locates the native binary in the canonical package layout and
    passes npm/bun management metadata.
    - Added regression tests for `PATH` ordering, package layout detection,
    and shell snapshot preservation of package path prepends.
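
    The prepend-and-record behavior described above can be sketched roughly
    as follows. This is an illustrative Python model, not the Rust
    implementation: `prepend_package_path`, `restore_snapshot`, and the
    dict-based `explicit_env_overrides` are hypothetical stand-ins.

    ```python
    import os

    def prepend_package_path(env, path_dir, explicit_env_overrides):
        """Prepend path_dir to PATH and record the result as an explicit override."""
        current = env.get("PATH", "")
        new_path = path_dir + os.pathsep + current if current else path_dir
        env["PATH"] = new_path
        # Recording the value lets it be re-applied after a snapshot resets PATH.
        explicit_env_overrides["PATH"] = new_path
        return new_path

    def restore_snapshot(env, snapshot, explicit_env_overrides):
        """Apply a shell snapshot, then re-apply explicit overrides so they win."""
        env.update(snapshot)
        env.update(explicit_env_overrides)
    ```

    Applying the overrides after the snapshot is what keeps the helper
    directory first on `PATH` in this model, even when the snapshot carries
    an older `PATH` value.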
    
    ## Verification
    
    - `node --check codex-cli/bin/codex.js`
    - `just test -p codex-install-context -p codex-arg0`
    - `just test -p codex-core
    user_shell_snapshot_preserves_package_path_prepend`
    - `just test -p codex-core tools::runtimes::tests`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just fix -p codex-install-context -p codex-arg0 -p codex-core`
  • log plugin MCP server names (#26002)
    ## Summary
    - emit the plugin capability summary's exact MCP server names in
    `codex_plugin_used`
    
    ## Test
    - `just test -p codex-analytics`
    - `just test -p codex-core
    explicit_plugin_mentions_track_plugin_used_analytics`
    - `just fix -p codex-analytics`
  • Use Windows setup marker as completion signal (#26074)
    # Why
    
    When an organization requires the elevated Windows sandbox, Codex
    launches an elevated helper to provision users, configure firewall and
    ACL rules, and lock persistent sandbox directories.
    
    We observed that closing the helper after setup started could leave the
    machine partially initialized while the TUI still announced **Sandbox
    ready**. Model-only turns continued to work, but the first shell command
    retried setup and failed with Windows cancellation error `1223`.
    
    This was not an enforcement bypass; command execution continued to fail
    closed. The issue was a false readiness signal: `setup_marker.json` was
    written during user provisioning, before the remaining setup stages had
    completed.
    
    # What
    
    Treat `setup_marker.json` as the commit record for Windows sandbox
    setup:
    
    1. Before full or provisioning setup begins, remove the existing marker
    and create the final marker path with a protected ACL.
    2. Keep the marker empty and therefore invalid while setup is in
    progress. Sandbox users cannot read, modify, or replace it.
    3. Run every synchronous setup stage.
    4. After setup succeeds, write the valid marker contents without
    changing its ACL.
    5. After the helper exits successfully, confirm that the existing
    readiness check passes before enabling the sandbox.
    
    If setup is canceled or fails, the marker remains invalid and Codex
    reports setup as incomplete instead of announcing readiness.
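
    The commit-record sequence above can be modeled with a short sketch.
    This is a simplified Python illustration under stated assumptions: the
    marker is assumed to be JSON with a `version` field, the protected ACL
    step is omitted, and `run_setup` and `marker_is_valid` are hypothetical
    names rather than the helper's real API.

    ```python
    import json
    import os

    def marker_is_valid(marker_path, expected_version):
        """Readiness check: the marker must parse and carry the expected version."""
        try:
            with open(marker_path, "r", encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, ValueError):
            return False  # a missing or empty marker means setup is incomplete
        return isinstance(data, dict) and data.get("version") == expected_version

    def run_setup(marker_path, stages, version):
        """Run all stages and commit the marker only after they succeed."""
        # Steps 1-2: replace any old marker with an empty (invalid) placeholder.
        # Applying the protected ACL is platform-specific and not shown.
        if os.path.exists(marker_path):
            os.remove(marker_path)
        open(marker_path, "w", encoding="utf-8").close()
        # Step 3: run every stage; an exception leaves the marker invalid.
        for stage in stages:
            stage()
        # Step 4: write valid contents into the existing file.
        with open(marker_path, "w", encoding="utf-8") as f:
            json.dump({"version": version}, f)
        # Step 5: confirm readiness before reporting success.
        return marker_is_valid(marker_path, version)
    ```

    If any stage raises, the placeholder stays empty, so a later call to
    `marker_is_valid` returns `False` instead of reporting a ready sandbox.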
    
    Refresh-only and read-ACL-only helper runs continue to leave the marker
    untouched. The setup version remains `5` to avoid forcing all existing
    Windows users through elevated setup again.
    
    # Verification
    
    - Added coverage confirming sandbox users cannot read or modify the
    setup marker after elevated setup.
    - Added coverage confirming a successful helper exit without complete
    setup artifacts is rejected.
    - Ran `just test -p codex-windows-sandbox`.
  • codex-pr-body: avoid confidential references (#26260)
    ## Why
    
    PR descriptions can be visible outside the context used to generate
    them. In #23710, a generated description referenced an internal
    document, showing that the skill needs an explicit guardrail against
    exposing confidential context.
    
    ## What changed
    
    - Updated the `codex-pr-body` guidance to prohibit confidential
    references, including codenames and OpenAI-internal URLs.
  • Rewrite oversized tool outputs during remote compaction (#26251)
    ## Why
    
    When fitting history under the compaction limit, rewrite oversized tool
    output items instead of removing them entirely. Removing them breaks
    incrementality relative to the previous response.
  • feat: catalog multi-agent v2 config (#26254)
    ## Why
    
    Model metadata can now select multi-agent v2 even when a user has not
    enabled `features.multi_agent_v2` in their config. Some existing configs
    still set the legacy `agents.max_threads` knob for v1 multi-agent
    behavior, so treating every v2 runtime as incompatible with
    `agents.max_threads` would break users whose only v2 signal came from
    the model catalog.
    
    The incompatible configuration is specifically enabling
    `features.multi_agent_v2` while also setting `agents.max_threads`.
    Catalog-forced v2 should use the v2 concurrency setting and ignore the
    legacy v1 cap instead of rejecting the config.
    
    ## What changed
    
    - Split config validation from runtime concurrency calculation:
    `effective_agent_max_threads` now just returns the effective cap for the
    resolved multi-agent runtime.
    - Added explicit validation for `features.multi_agent_v2` +
    `agents.max_threads` at session startup.
    - Preserved catalog-selected v2 behavior when `features.multi_agent_v2`
    is disabled, so existing configs with `agents.max_threads` keep
    starting.
    - Updated model-runtime selector coverage so a catalog v2 model still
    exposes v2 tools even when `agents.max_threads` is set and the config
    flag is disabled.
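
    The split between validation and cap calculation can be sketched as
    below. This Python model is only illustrative: the `Config` fields and
    the `v2_max_concurrency` default are assumptions, and the real
    `effective_agent_max_threads` is Rust code with its own signature.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Config:
        feature_multi_agent_v2: bool = False      # features.multi_agent_v2
        agents_max_threads: Optional[int] = None  # legacy v1 knob
        v2_max_concurrency: int = 4               # hypothetical v2 setting

    def validate_config(cfg):
        """Reject only the explicit combination of the v2 flag and the v1 cap."""
        if cfg.feature_multi_agent_v2 and cfg.agents_max_threads is not None:
            raise ValueError(
                "features.multi_agent_v2 cannot be combined with agents.max_threads"
            )

    def effective_agent_max_threads(cfg, runtime_is_v2):
        """Return the cap for the resolved runtime; performs no validation."""
        if runtime_is_v2:
            return cfg.v2_max_concurrency  # the legacy v1 cap is ignored
        return cfg.agents_max_threads      # None means no explicit v1 cap
    ```

    With this shape, a config that sets only `agents.max_threads` passes
    validation and still resolves to the v2 setting when the catalog selects
    the v2 runtime.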
    
    ## Validation
    
    - `cargo check -p codex-core --lib`
    - `just test -p codex-core --lib -E
    "test(multi_agent_v2_feature_rejects_agents_max_threads) |
    test(catalog_v2_allows_agents_max_threads_when_feature_disabled)"`
  • [codex] Split Python runtime release workflow (#26226)
    ## Why
    
    Python SDK releases pin an exact `openai-codex-cli-bin` version, so all
    eight platform runtime wheels must be available on PyPI before the SDK
    package is built and published. PyPI does not support reusable workflows
    as Trusted Publishers, which means OIDC-backed publishing must run from
    each top-level release workflow.
    
    ## What changed
    
    - add reusable `python-runtime-build.yml` to prepare and upload all
    eight runtime wheels without publishing
    - add top-level `python-runtime-release.yml` for manual runtime
    publication before updating an SDK pin
    - have `python-sdk-release.yml` publish and verify the prepared runtime
    wheels from its own top-level trusted job before building the SDK
    - verify PyPI exposes exactly the expected eight runtime wheels before
    either release workflow continues
    
    ## PyPI configuration
    
    - keep the trusted publisher for
    `.github/workflows/python-sdk-release.yml` with environment `pypi`
    - add a trusted publisher for
    `.github/workflows/python-runtime-release.yml` with environment `pypi`
    - no trusted publisher is needed for
    `.github/workflows/python-runtime-build.yml`
    
    ## Validation
    
    - parsed all three workflow YAML files
    - validated all embedded shell blocks with `bash -n`
    - no local tests run; relying on online CI
  • Restore Windows coverage for code-mode image generation exposure (#25960)
    ## Summary
    
    Restore Windows coverage for standalone image generation in code mode.
    
    The previous test executed a V8-backed code-mode cell on Windows CI,
    where that runtime path is intentionally excluded because it is
    unreliable. The test was then ignored entirely on Windows, removing
    useful coverage.
    
    This splits the test into two checks:
    
    - All platforms verify that `image_gen__imagegen` is exposed to the
    model when image generation is configured for code mode only.
    - Non-Windows platforms continue to execute the full V8-backed flow and
    verify that the nested image-generation call succeeds.
    
    ## Verification
    
    - `just fmt`
    - `git diff --check`
    - `just test -p codex-app-server standalone_image_generation`
    
    Result: 3 tests passed, plus the required bench smoke check.
  • Fix forked thread name inheritance (#26075)
    Fixes #25950.
    
    ## Why
    Forking a renamed thread could fall back to the source thread's
    first-prompt title because the fork path did not preserve the source's
    explicit name. That meant fork-of-renamed-fork flows could show stale
    sidebar labels even though the user had renamed the parent.
    
    ## What changed
    `thread/fork` now reads the source thread's distinct `name`, normalizes
    it, persists it onto materialized forks, and applies it to the returned
    API thread. Because the source `name` already excludes first-prompt
    pseudo-titles, forks inherit only an explicit user rename instead of
    stale generated metadata.
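
    A rough sketch of the inheritance rule, in Python for illustration. The
    dict-based thread shape and the trimming done by `normalize_thread_name`
    are assumptions; the source does not spell out what normalization
    involves.

    ```python
    def normalize_thread_name(name):
        """Trim whitespace; treat empty or missing names as no explicit name."""
        if name is None:
            return None
        trimmed = name.strip()
        return trimmed or None

    def fork_thread(source, new_id):
        """Create a fork that inherits only the source's explicit name."""
        fork = {"id": new_id, "forked_from": source["id"]}
        name = normalize_thread_name(source.get("name"))
        if name is not None:
            fork["name"] = name  # persisted on the fork and returned to the caller
        return fork
    ```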
  • [profile-switcher][rust] -- [1/2] Add app-server account session protocol (#25469)
    ## Summary
    
    Adds the app-server v2 `accountSession/*` protocol used by the Desktop
    profile switcher and the backend account metadata client needed to
    populate workspace choices.
    
    This is the protocol layer only. The app-server lifecycle and
    consolidated saved-session storage are split into a follow-up PR.
    
    ## Rust Stack
    
    1. This PR
    2. [openai/codex#25383](https://github.com/openai/codex/pull/25383) adds
    app-server session lifecycle behavior and consolidated saved-session
    storage.
    
    ## Validation
    
    - Generated app-server schema fixtures are included from the existing
    generation flow in the lifecycle PR where the routes are registered.
    - Did not run tests per requested scope.
  • Expose local image paths to models (#25944)
    ## Why
    
    Local image attachments include image bytes, but the adjacent
    model-visible label omits the source path. Exposing the path lets
    model-selected workflows refer back to the intended local image
    explicitly.
    
    ## What changed
    
    - Include an escaped `path` attribute in model-visible local image
    opening tags.
    - Reuse the path-aware marker generator in rollout coverage.
    - Update protocol, replay, and rollout coverage for the new request
    shape.
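
    As an illustration of the escaped `path` attribute, here is a minimal
    Python sketch. The tag name `image` and the HTML-style escaping are
    assumptions made for the example; the actual marker format is defined in
    the Rust protocol code.

    ```python
    from html import escape

    def local_image_open_tag(path, tag="image"):
        """Build an opening tag whose `path` attribute value is escaped."""
        return f'<{tag} path="{escape(path, quote=True)}">'
    ```

    Escaping keeps quotes or angle brackets in a file name from terminating
    the attribute early or being read as markup.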
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-protocol`
    - `just test -p codex-core skips_local_image_label_text`
    - `just test -p codex-core
    copy_paste_local_image_persists_rollout_request_shape`
    - `git diff --check`
  • Preserve remote plugin default prompts (#25887)
    ## Summary
    
    - Read `default_prompts` from remote plugin release metadata.
    - Prefer the plural prompt list over legacy `default_prompt`.
    - Fall back to `default_prompt` as a single-item list for backward
    compatibility.
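
    The precedence above can be sketched in a few lines of Python. The
    dict-based metadata shape and the name `resolve_default_prompts` are
    hypothetical, and treating an empty `default_prompts` list as absent is
    an assumption.

    ```python
    def resolve_default_prompts(metadata):
        """Prefer `default_prompts`; fall back to legacy `default_prompt`."""
        prompts = metadata.get("default_prompts")
        if prompts:
            return list(prompts)
        legacy = metadata.get("default_prompt")
        # The legacy single prompt is wrapped as a one-item list.
        return [legacy] if legacy else []
    ```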
    
    ## Testing
    
    - `just test -p codex-core-plugins`
    - `just test -p codex-app-server`
  • [codex] Pin Python SDK to runtime 0.137.0a4 (#26216)
    ## Summary
    - pin the Python SDK runtime to `openai-codex-cli-bin==0.137.0a4`
    - refresh generated protocol artifacts from `rust-v0.137.0-alpha.4`
    - refresh `sdk/python/uv.lock` with all eight published runtime wheels
    
    ## Runtime publication
    - published `openai-codex-cli-bin==0.137.0a4` through the
    `python-sdk-release` workflow
    - includes macOS, manylinux, musllinux, and Windows wheels
    - publication run:
    https://github.com/openai/codex/actions/runs/26905608531
    
    ## Validation
    - ran `just fmt`
    - generated artifacts from the `rust-v0.137.0-alpha.4` release wheel
    - ran `uv lock --check --default-index https://pypi.org/simple`
    - did not run tests locally, per request; CI provides the test signal
  • [codex] Copy user Bazel settings into Codex worktrees (#25925)
    ## Why
    
    Codex-created linked worktrees do not include ignored files from the
    main worktree. Bazel users who keep local overrides in `user.bazelrc`
    therefore lose those settings in every new worktree.
    
    The setup must also work on Windows and must not overwrite a file that
    already exists in the worktree.
    
    ## What changed
    
    The checked-in Codex environment now invokes
    `.codex/environments/setup.py`. The script resolves the main worktree
    and current worktree, then uses
    `copy_from_main_worktree_to_worktree(repo_relative_path)` to copy
    ignored files into new worktrees without overwriting existing
    destinations.
    
    `main()` currently copies `user.bazelrc`. Additional repository-relative
    paths can be added as further calls to the same helper.
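
    A minimal sketch of the copy-without-overwrite behavior. It is not the
    checked-in script: `copy_if_missing` is a hypothetical helper that takes
    both worktree roots explicitly, whereas the real
    `copy_from_main_worktree_to_worktree` resolves them itself.

    ```python
    import shutil
    from pathlib import Path

    def copy_if_missing(main_root, worktree_root, repo_relative_path):
        """Copy one file from the main worktree unless the destination exists.

        Returns True if a copy happened, False otherwise.
        """
        src = Path(main_root) / repo_relative_path
        dst = Path(worktree_root) / repo_relative_path
        if not src.is_file():
            return False  # nothing to copy, e.g. no user.bazelrc in the main worktree
        if dst.exists():
            return False  # never overwrite a file already present in the worktree
        dst.parent.mkdir(parents=True, exist_ok=True)  # support nested paths
        shutil.copy2(src, dst)
        return True
    ```

    Built from `pathlib` and `shutil` only, this sketch makes no
    platform-specific calls, which fits the Windows requirement stated above.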
    
    ## Validation
    
    - Ran the setup script in a linked worktree and confirmed it handles a
    missing main-worktree `user.bazelrc`.
    - Verified the helper copies a main-worktree file, preserves an existing
    worktree file, and creates parent directories for a nested path.
  • core: stop threading SandboxPolicy through exec (#25700)
    ## Why
    
    #25450 attempts a broad `SandboxPolicy` removal across several unrelated
    surfaces, which makes it hard to review and still leaves new helper code
    moving legacy policies around. This PR is a narrower alternative:
    migrate only the exec-side Windows sandbox plumbing so the review can
    focus on one production path and one compatibility boundary.
    
    The goal is to stop threading `SandboxPolicy` through exec code without
    expanding the migration into app-server, protocol, telemetry, config, or
    session behavior.
    
    ## What changed
    
    - Removed `ExecRequest::compatibility_sandbox_policy()`.
    - Changed the Windows restricted-token and elevated filesystem override
    helpers to accept `PermissionProfile` plus the split filesystem/network
    policies instead of a `SandboxPolicy`.
    - Kept the remaining legacy projection local to the writable-root
    comparison that still needs to compare split policy behavior against the
    legacy Windows backend model.
    - Rejected restricted split filesystem policies that still grant
    full-disk writes before using the Windows restricted-token backend,
    preserving the previous clear-failure behavior for profiles that project
    to `ExternalSandbox`.
    - Updated the Windows sandbox override tests to exercise the new call
    shape and cover the full-write split-profile regression.
    
    ## Verification
    
    - `just test -p codex-core windows_restricted_token`
    - `just test -p codex-core windows_elevated`
  • Fix multiline paste in /goal edit (#26047)
    Fixes #26025.
    
    ## Why
    `/goal edit` opens `CustomPromptView`, which did not use the paste-burst
    handling that protects the main composer when terminals deliver paste as
    rapid key events. On Windows terminals, the first pasted newline could
    be treated as Enter-to-submit, truncating the goal edit and leaving the
    rest of the paste behind.
    
    ## What
    This reuses `PasteBurst` in `CustomPromptView` as a lightweight
    Enter-suppression detector for paste-like key streams. Characters still
    insert directly, explicit paste still goes through the view paste path,
    and ordinary text entry still submits on Enter.
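
    The idea of suppressing Enter inside a paste-like key stream can be
    sketched as follows. This Python model is an assumption-heavy
    illustration, not the `PasteBurst` implementation: the class name, the
    timing threshold, and the minimum burst length are all hypothetical.

    ```python
    class EnterSuppressor:
        """Treat Enter as a newline when it arrives inside a rapid burst of keys."""

        def __init__(self, burst_gap_s=0.01, min_burst_len=3):
            self.burst_gap_s = burst_gap_s      # max gap between keys in a burst
            self.min_burst_len = min_burst_len  # keys needed before suppressing
            self._last_key_time = None
            self._burst_len = 0

        def _note_key(self, now):
            # Extend the burst if this key follows the previous one closely.
            if (self._last_key_time is not None
                    and now - self._last_key_time <= self.burst_gap_s):
                self._burst_len += 1
            else:
                self._burst_len = 1
            self._last_key_time = now

        def on_char(self, now):
            """Record an ordinary character key; insertion itself is unchanged."""
            self._note_key(now)

        def on_enter(self, now):
            """Return 'newline' inside a paste-like burst, else 'submit'."""
            in_burst = (
                self._last_key_time is not None
                and now - self._last_key_time <= self.burst_gap_s
                and self._burst_len >= self.min_burst_len
            )
            self._note_key(now)
            return "newline" if in_burst else "submit"
    ```

    Keys typed at human speed never build a burst in this model, so Enter
    after ordinary typing still submits.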
  • feat: guard git enrichment (#26175)
    Skip turn git metadata enrichment when a turn has remote or multiple
    executors, so we do not report the orchestrator checkout as executor
    workspace metadata.
    
    Test: `just test -p codex-core` (blocked by existing
    `Session::conversation_id` compile error in `close_agent.rs`).
  • nit: small prompt update for MAv2 (#26179)
    Simple prompt change for MAv2 because of OOD compared to CBv9
  • [codex] Restore setup helper UAC manifest (#25949)
    ## Why
    
    #23764 removed Windows resource stamping from `codex-windows-sandbox`,
    but it also removed the setup helper's UAC manifest. That manifest did
    more than supply cosmetic version metadata: Microsoft documents
    `requestedExecutionLevel level="asInvoker"` as the setting that makes an
    executable run at the same permission level as the process that started
    it:
    https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#trustinfo
    
    In the reported session, `codex-windows-sandbox-setup.exe` was launched
    for a non-elevated setup refresh and `CreateProcess` failed with `os
    error 740` (`The requested operation requires elevation`). Restoring an
    explicit `asInvoker` manifest records the helper's intended default
    launch contract: normal launches inherit the caller's token, and
    elevation only happens through the code paths that request it
    explicitly.
    
    The setup helper has two launch modes:
    
    - setup refresh uses a normal `Command::new(...)` spawn and should never
    trigger UAC
    - full setup explicitly uses `ShellExecuteExW` with the `runas` verb
    when elevation is required
    
    Restoring `asInvoker` keeps refresh non-elevated by default while
    preserving the explicit elevated path for full setup.
    
    ## What changed
    
    - Restored a minimal `codex-windows-sandbox-setup.manifest` containing
    only `requestedExecutionLevel level="asInvoker"`.
    - Added a small build script that passes setup-helper-scoped manifest
    linker args for MSVC and the Windows GNU/LLVM target used by Bazel.
    - Wired the manifest into Bazel build-script data.
    
    This does not restore `winres`, `FileDescription`, `ProductName`, or
    package-wide resource stamping, so other Codex binaries that link
    `codex-windows-sandbox` do not inherit metadata from this package.
    
    ## Verification
    
    - `cargo fmt -p codex-windows-sandbox`
    - `cargo build -p codex-windows-sandbox --bin
    codex-windows-sandbox-setup`
    - `cargo build -p codex-windows-sandbox --bin codex-command-runner`
    - `cargo build -p codex-windows-sandbox --lib`
    - Build-script output simulation for `CARGO_CFG_TARGET_ENV=msvc` emits
    `/MANIFEST:EMBED` and `/MANIFESTINPUT:<manifest>`.
    - Build-script output simulation for `CARGO_CFG_TARGET_ENV=gnu` +
    `CARGO_CFG_TARGET_ABI=llvm` emits `-Wl,-Xlink=/manifest:embed` and
    `-Wl,-Xlink=/manifestinput:<manifest>`.
    - Inspected the built binaries and confirmed:
      - `codex-windows-sandbox-setup.exe` contains `requestedExecutionLevel` /
        `asInvoker`
      - `codex-command-runner.exe` does not contain those manifest strings
      - Windows `VersionInfo` remains blank for `FileDescription` /
        `ProductName`
    - `just test -p codex-windows-sandbox` ran through Nextest, with 114
    passing, 2 skipped, and 1 existing Windows sandbox failure:
    `unified_exec::tests::legacy_non_tty_cmd_emits_output` fails with
    `CreateRestrictedToken failed: 87`.
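
    The per-target linker arguments listed in the verification notes can be
    summarized in a small sketch. This is a Python model of the selection
    logic only; the real build script is Rust, `manifest_link_args` is a
    hypothetical name, and returning no arguments for other targets is an
    assumption.

    ```python
    def manifest_link_args(target_env, target_abi, manifest_path):
        """Return manifest-embedding linker args for the given target."""
        if target_env == "msvc":
            return ["/MANIFEST:EMBED", f"/MANIFESTINPUT:{manifest_path}"]
        if target_env == "gnu" and target_abi == "llvm":
            # Forwarded to the linker through the compiler driver.
            return [
                "-Wl,-Xlink=/manifest:embed",
                f"-Wl,-Xlink=/manifestinput:{manifest_path}",
            ]
        return []  # assumed: no manifest args for other targets
    ```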
  • Implement v1 skills extension prompt injection (#26167)
    ## Why
    
    The skills extension needs a real turn-time path before host, executor,
    or remote skills can be routed through it. The previous code was mostly
    a placeholder catalog/provider sketch, so there was no bounded
    available-skills fragment, no source-owned `SKILL.md` read, and no place
    for warnings or per-turn selection state to live.
    
    This PR makes `ext/skills` the authority-preserving flow for listing
    candidate skills and injecting only explicitly selected main prompts,
    without adding more of that logic to `codex-core`.
    
    ## What changed
    
    - Expands catalog entries with `main_prompt`, display path, short
    description, dependency metadata, enabled/prompt visibility flags, and
    authority/package-aware read requests.
    - Replaces the placeholder `providers/*` modules with
    `SkillProviderSource` and `SkillProviders`, routing list/read/search
    calls by source kind and surfacing provider failures as warnings.
    - Adds bounded available-skills rendering and `SKILL.md` main-prompt
    truncation before the fragments enter model context.
    - Resolves explicit skill selections from structured `UserInput::Skill`,
    skill-file mentions, `skill://...` paths, and plain `$skill` text
    mentions, then reads selected prompts through their owning provider.
    - Stores mutable per-thread skills config and per-turn
    catalog/selection/warning state.
    - Adds `install_with_providers` so tests and future host wiring can
    supply concrete providers.
    
    ## Testing
    
    - Not run locally.
    - Added `codex-rs/ext/skills/tests/skills_extension.rs` coverage for
    available-catalog injection, selected prompt injection through the
    owning provider, and prompt-hidden skills that remain invokable.
  • chore: mechanical rename (#26156)
    Rename `Session::conversation_id` to `Session::thread_id` using an
    automated refactor in RustRover.