Commit Graph

6773 Commits

  • Prefer just test over cargo test in docs (#23910)
    `cargo test` for the core and other crates fails on a fresh macOS
    checkout without the right stack size variable. This change encourages
    using the just test command that sets the environment up correctly.
    
    As a bonus, this should encourage agents to get more benefit out of
    nextest's parallel execution.
  • fix(app-server): fix optional bool annotations (#24099)
    `#[serde(default)]` alone wasn't sufficient for our generated TS types to
    reflect that clients don't have to set optional bool fields. We also need
    `skip_serializing_if = "std::ops::Not::not"`. This is already a rule in
    our agents.md file.
  • ci: Use codex produced v8 artifacts for release builds (#23934)
    Updates our build script to pull down the artifacts like we do in CI for
    building v8 into our targets.
    
    This changes the flow so that the workflow now pre-installs pre-built
    rusty v8 assets for all of our release targets.
    Secondarily, when the Python script is run locally and the user hasn't
    set the proper values, it now optionally pulls the assets down and
    provides those values itself.
    
    Sorry for the miss here.
  • fix: reject legacy profile selectors (#24059)
    ## Why
    
    `--profile` now selects `<name>.config.toml`, so the legacy `profile`
    selector should not be reintroduced through config write or MCP tool
    paths. A matching legacy selector in base user config also needs the
    same migration guard as a matching legacy `[profiles.<name>]` table so
    profile loading fails with one clear migration error instead of mixing
    the old and new profile models.
    
    ## What
    
    - reject non-null app-server config writes to the top-level legacy
    `profile` selector
    - make `--profile <name>` reject base user config that still selects the
    same legacy `profile = "<name>"` value, alongside the existing matching
    legacy profile-table guard
    - reject removed MCP `codex` tool fields such as `profile` by denying
    unknown tool-call parameters and exposing that restriction in the
    generated schema
    - add regression coverage for the app-server write paths, config loader
    guard, and MCP tool input/schema behavior
    
    ## Verification
    
    - targeted regression tests cover the new app-server, config loader, and
    MCP rejection paths
  • otel: drop legacy profile usage telemetry (#24061)
    ## Summary
    - drop the dead legacy profile usage metric and active-profile
    conversation-start fields
    - update role comments so they describe provider and service-tier
    preservation without legacy config-profile wording
    - pair the code cleanup with the file-backed profile docs update in
    openai/developers-website#1476
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-otel`
    - `cargo test -p codex-core` *(fails: existing stack overflow in
    `mcp_tool_call::tests::guardian_mode_mcp_denial_returns_rationale_message`)*
    - `cargo test -p codex-core --lib
    mcp_tool_call::tests::guardian_mode_mcp_denial_returns_rationale_message`
    *(fails with the same stack overflow)*
  • Avoid config snapshots in live agent subtree traversal (#24057)
    ## Why
    `/feedback` asks `ThreadManager` for the selected agent subtree before
    it uploads logs. The previous live subtree path reconstructed
    parent-child links by iterating every loaded thread and awaiting each
    thread config snapshot, so unrelated loaded-thread state could stall
    feedback subtree enumeration.
    
    The loaded-thread set already belongs to
    [`ThreadManagerState`](https://github.com/openai/codex/blob/50e6644c9425df2dcbfe52f65fd60bd7f15a8ea2/codex-rs/core/src/thread_manager.rs).
    Reading thread-spawn parents from the captured `CodexThread` session
    sources at that boundary keeps unload and resume behavior manager-owned
    while avoiding per-session config inspection.
    
    ## What Changed
    - expose parent-child thread-spawn edges for loaded, non-internal
    threads from `ThreadManagerState`
    - build the live child map from those edges while keeping agent metadata
    lookup and ordering in `AgentControl`
    - add regression coverage for live subtree enumeration when no state DB
    is available
    
    ## Validation
    - `git diff --check`
    - local Rust tests not run per request
  • config: remove legacy profile write paths (#24055)
    ## Why
    
    [#23883](https://github.com/openai/codex/pull/23883) moved the
    user-facing `--profile` flag onto profile v2 and
    [#23886](https://github.com/openai/codex/pull/23886) removed CLI
    forwarding for the legacy profile-v1 path. Core and TUI config
    persistence still carried `active_profile` and
    `ConfigEditsBuilder::with_profile`, which let later writes continue
    targeting legacy `[profiles.<name>]` tables after profile selection
    moved to profile-v2 config files.
    
    ## What
    
    - Remove legacy profile routing from
    [`ConfigEditsBuilder`](https://github.com/openai/codex/blob/4b38e9c22e762261d7f7eef49d8a21792e241a06/codex-rs/core/src/config/edit.rs#L1064-L1294),
    so core config edits no longer carry `with_profile` or infer
    `[profiles.*]` write targets from a `profile` key.
    - Drop `active_profile` plumbing from runtime `Config`, TUI
    startup/state, app-server config override forwarding, and Windows
    sandbox setup persistence.
    - Make app-server-backed TUI config edits use unscoped model,
    service-tier, feature, Auto-review, plan-mode, and Windows sandbox paths
    through
    [`tui/src/config_update.rs`](https://github.com/openai/codex/blob/4b38e9c22e762261d7f7eef49d8a21792e241a06/codex-rs/tui/src/config_update.rs#L43-L112).
    - Update config edit coverage so legacy `profile` state stays untouched
    by direct model writes, and remove tests whose only contract was the
    deleted profile-scoped persistence path.
    
    ## Testing
    
    - Not run locally.
  • config: remove legacy profile v1 resolution (#24051)
    ## Why
    
    [#23883](https://github.com/openai/codex/pull/23883) moved user-facing
    `--profile` selection onto profile v2, and
    [#23886](https://github.com/openai/codex/pull/23886) removed the old CLI
    `config_profile` override path. Core still had a second legacy path:
    `profile = "..."` could select `[profiles.*]` values while runtime
    config was built. Keeping that resolver alive preserves the old
    precedence model and profile-carrying surfaces even though profile
    selection now points at `$CODEX_HOME/<name>.config.toml`.
    
    ## What
    
    - Reject legacy top-level `profile = "..."` config while loading runtime
    config, with an error that points callers at `--profile <name>` and
    `<name>.config.toml` in the [core load
    path](https://github.com/openai/codex/blob/3d923366eca10a29143623124c6c6e538f058269/codex-rs/core/src/config/mod.rs#L2524-L2531).
    - Remove the remaining profile-v1 merge points from runtime config
    resolution, including features, permissions, model/provider selection,
    web search, Windows sandbox settings, TUI settings, role reloads, and
    OSS provider lookup.
    - Drop the leftover profile override surface from
    [`ConfigOverrides`](https://github.com/openai/codex/blob/3d923366eca10a29143623124c6c6e538f058269/codex-rs/core/src/config/mod.rs#L2118-L2148)
    and from the MCP server `codex` tool schema.
    - Prune profile-precedence tests that only exercised the removed
    resolver and replace them with rejection coverage for the legacy
    selector.
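
    As a rough illustration of the guard described above, the sketch below
    builds the `$CODEX_HOME/<name>.config.toml` path and rejects a legacy
    top-level selector. The helper names and the error text are hypothetical
    and are not taken from the codex sources.

    ```rust
    use std::path::{Path, PathBuf};

    /// Builds the profile-v2 config path `$CODEX_HOME/<name>.config.toml`.
    /// `profile_config_path` is a hypothetical helper name.
    fn profile_config_path(codex_home: &Path, name: &str) -> PathBuf {
        codex_home.join(format!("{name}.config.toml"))
    }

    /// Rejects a legacy top-level `profile = "..."` selector.
    /// The error wording is illustrative only.
    fn reject_legacy_selector(legacy_profile: Option<&str>) -> Result<(), String> {
        match legacy_profile {
            Some(name) => Err(format!(
                "legacy `profile = \"{name}\"` is not supported; \
                 use `--profile {name}` with `{name}.config.toml`"
            )),
            None => Ok(()),
        }
    }
    ```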
    
    ## Testing
    
    - Not run in this metadata pass.
    - Added
    [`legacy_profile_selection_is_rejected`](https://github.com/openai/codex/blob/3d923366eca10a29143623124c6c6e538f058269/codex-rs/core/src/config/config_tests.rs#L7942-L7965)
    coverage for the new runtime guard.
  • mcp: surface profile migration guidance under --profile (#23890)
    ## Why
    
    `codex --profile <name> mcp ...` should reach the same profile-v2
    migration guard as runtime commands. Otherwise legacy
    `[profiles.<name>]` users see the generic command-scope rejection
    instead of the existing guidance to move settings into
    `$CODEX_HOME/<name>.config.toml`.
    
    ## What
    
    - Allow `codex mcp` through the `--profile` subcommand gate.
    - Pass profile loader overrides into the MCP entry point only to
    validate profile-v2 migration when a profile is present.
    - Keep MCP add/remove/list/get/login/logout behavior otherwise
    unchanged; this does not add profile-scoped MCP server management.
    - Cover the legacy profile migration error for `codex --profile work mcp
    list`.
    
    ## Testing
    
    - `cargo test -p codex-cli`
  • [codex] Enable Node env proxy for managed network proxy (#23905)
    ## Summary
    - set `NODE_USE_ENV_PROXY=1` when Codex applies managed network proxy
    environment overrides
    - keep the Node opt-in in the proxy environment key set used by
    shell/runtime env handling
    - cover the new env var in the focused network proxy env test
    
    ## Why
    Codex already sets HTTP proxy environment variables for child processes
    when the managed network proxy is active. Node's built-in network
    behavior needs the `NODE_USE_ENV_PROXY` opt-in to honor those env vars,
    so Node-based skill scripts can otherwise skip the managed proxy path
    and fail under restricted network access.
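
    A minimal sketch of the resulting child-process environment, assuming
    `HTTP_PROXY` and `HTTPS_PROXY` stand in for whichever proxy variables
    Codex actually sets. Only `NODE_USE_ENV_PROXY=1` comes from this change;
    the function name is hypothetical.

    ```rust
    use std::collections::BTreeMap;

    /// Builds env overrides for child processes when a managed proxy is active.
    /// Assumption: the two proxy variable names below are placeholders.
    fn managed_proxy_env(proxy_url: &str) -> BTreeMap<String, String> {
        let mut env = BTreeMap::new();
        env.insert("HTTP_PROXY".to_string(), proxy_url.to_string());
        env.insert("HTTPS_PROXY".to_string(), proxy_url.to_string());
        // Node's built-in networking needs this opt-in to honor the proxy env vars.
        env.insert("NODE_USE_ENV_PROXY".to_string(), "1".to_string());
        env
    }
    ```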
    
    ## Validation
    - `just fmt` in `codex-rs`
    - `cargo test -p codex-network-proxy` in `codex-rs`
  • Allow parallel MCP tool calls when annotated readOnly (#23750)
    ## Summary
    - Treat MCP tools with `readOnlyHint: true` as parallel-safe even when
    `supports_parallel_tool_calls` is unset or `false`.
    - Keep server-level `supports_parallel_tool_calls` as an additive
    override for non-read-only tools.
    - Add focused unit coverage for the MCP handler eligibility decision.
    - Update RMCP integration coverage to keep the serial baseline on a
    mutable tool, verify read-only concurrency without server opt-in, and
    preserve the server opt-in concurrency path separately.
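
    The eligibility rule in the summary can be sketched as a single
    predicate. The function name and signature are hypothetical; the two
    inputs mirror the `readOnlyHint` annotation and the server-level
    `supports_parallel_tool_calls` setting.

    ```rust
    /// Decides whether an MCP tool call may run in parallel with others.
    /// Hypothetical helper, not the handler code from the PR.
    fn is_parallel_safe(
        read_only_hint: Option<bool>,
        server_supports_parallel: Option<bool>,
    ) -> bool {
        // A read-only tool is parallel-safe regardless of the server setting.
        let read_only = read_only_hint == Some(true);
        // The server opt-in stays an additive override for other tools.
        let server_opt_in = server_supports_parallel == Some(true);
        read_only || server_opt_in
    }
    ```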
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-core --lib tools::handlers::mcp::tests::`
    - `cargo test -p codex-core --test all
    stdio_mcp_read_only_tool_calls_run_concurrently_without_server_opt_in`
    - `cargo test -p codex-core --test all
    stdio_mcp_parallel_tool_calls_opt_in_runs_concurrently`
    - `cargo test -p codex-rmcp-client`
  • feat: best-effort compact large tool schemas (#23904)
    ## Why
    
    The `dev/cc/ref-def` branch preserves richer JSON Schema detail for
    connector tools, including `$defs` and nested shapes. That improves
    fidelity, but it pushes the largest connector schemas well past the
    intended tool-schema budget. This PR adds a best-effort compaction pass
    for unusually large tool input schemas so the p99 and max tails stay
    small while ordinary schemas are left alone.
    
    ## What Changed
    
    - Added best-effort large-schema compaction in
    `codex-rs/tools/src/json_schema.rs` after schema sanitization and
    definition pruning.
    - Compaction runs as a waterfall only while the compact JSON budget
    proxy is exceeded:
      1. Strip schema `description` metadata.
      2. Drop root `$defs` / `definitions`.
      3. Collapse deep nested complex schema objects to `{}`.
    - Kept top-level argument names and immediate schema shape where
    possible.
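
    The waterfall shape can be sketched with a toy model in which a schema
    is reduced to per-category token counts. `ToySchema` and the pass
    functions are illustrative assumptions and do not reflect the types in
    `codex-rs/tools/src/json_schema.rs`.

    ```rust
    /// Toy stand-in for a tool schema: each field is a token count.
    #[derive(Debug, Clone, PartialEq)]
    struct ToySchema {
        core: usize,         // top-level argument names and immediate shape
        descriptions: usize, // `description` metadata
        defs: usize,         // root `$defs` / `definitions`
        deep_nested: usize,  // deep nested complex schema objects
    }

    impl ToySchema {
        fn size(&self) -> usize {
            self.core + self.descriptions + self.defs + self.deep_nested
        }
    }

    fn strip_descriptions(schema: &mut ToySchema) {
        schema.descriptions = 0;
    }

    fn drop_root_defs(schema: &mut ToySchema) {
        schema.defs = 0;
    }

    fn collapse_deep_nested(schema: &mut ToySchema) {
        schema.deep_nested = 0;
    }

    /// Runs the passes in order and stops as soon as the budget is met.
    /// Best effort: the result may still exceed the budget after all passes.
    fn compact(mut schema: ToySchema, budget: usize) -> ToySchema {
        let passes: [fn(&mut ToySchema); 3] =
            [strip_descriptions, drop_root_defs, collapse_deep_nested];
        for pass in passes {
            if schema.size() <= budget {
                break;
            }
            pass(&mut schema);
        }
        schema
    }
    ```

    Schemas already under the budget are returned untouched, which matches
    the stated goal of leaving ordinary schemas alone.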
    
    ## Corpus Results
    
    Scope: 2,025 schemas under `golden_schemas`, all parsed successfully.
    Token count is `o200k_base` over compact JSON from
    `parse_tool_input_schema`.
    
    | Percentile | Before `origin/main` `4dbca61e20` | After branch `dev/cc/ref-def` `f9bf071758` | After this PR |
    |---|---:|---:|---:|
    | p0 | 9 | 9 | 9 |
    | p10 | 59 | 63 | 63 |
    | p25 | 81 | 86 | 86 |
    | p50 | 114 | 127 | 125 |
    | p75 | 174 | 205 | 202 |
    | p90 | 295 | 335 | 322 |
    | p95 | 391 | 526 | 422 |
    | p99 | 794 | 1,303 | 689 |
    | max | 2,836 | 3,337 | 887 |
    
    After this PR, `0 / 2,025` schemas are over 1k tokens.
    
    ### Compaction Savings
    
    These are cumulative waterfall stages over the same corpus. Later passes
    only run for schemas that are still over the compact JSON budget proxy.
    
    | Stage | Total tokens | Step savings | Schemas changed by step |
    |---|---:|---:|---:|
    | No compaction | 391,862 | - | - |
    | Strip schema `description` metadata | 350,961 | 40,901 | 66 |
    | Drop root `$defs` / `definitions` | 340,683 | 10,278 | 13 |
    | Collapse deep complex schemas to `{}` | 335,875 | 4,808 | 6 |
  • Expose conversation history to extension tools (#23963)
    ## Why
    
    Extension tools that need conversation context should be able to read it
    from the live tool invocation instead of reaching into thread
    persistence themselves.
    
    ## What changed
    
    - Add a `ConversationHistory` snapshot to extension `ToolCall`s and
    populate it from the current raw in-memory response history.
    - Expose all history items at this boundary so each extension can filter
    and bound the subset it needs before consuming or forwarding it.
    - Cover the adapter and registry dispatch paths and update existing
    extension tests that construct `ToolCall` literals.
    
    ## Test plan
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-extension-api`
    - `cargo test -p codex-goal-extension`
    - `cargo test -p codex-memories-extension`
    - `cargo test -p codex-core passes_turn_fields_to_extension_call`
    - `cargo test -p codex-core
    extension_tool_executors_are_model_visible_and_dispatchable`
  • feat: support local refs and defs in tool input schemas (#23357)
    # Why
    
    Some connector tool input schemas use local JSON Schema references and
    definition tables to avoid duplicating large nested shapes. Codex
    previously lowered these schemas into the supported subset in a way that
    could discard `$ref`-only schema objects and lose the corresponding
    definitions, which made non-strict tool registration less faithful than
    the original connector schema.
    
    This keeps the existing minimal-lowering policy: Codex still does not
    raw-pass through arbitrary JSON Schema, but it now preserves local
    reference structure that fits the Responses-compatible subset and prunes
    definition entries that cannot be reached by following `$ref`s from the
    root schema after sanitization, including refs found transitively inside
    other reachable definitions. The pruning matters because Responses
    parses definition tables even when entries are unused, so keeping dead
    definitions wastes prompt tokens.
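
    The reachability pruning described above can be sketched as a graph
    walk. Here the definition table is simplified to a map from each
    definition name to the names it references; that adjacency-map model and
    the function names are assumptions made for illustration.

    ```rust
    use std::collections::{BTreeMap, BTreeSet};

    /// Returns the definition names reachable from `root_refs`, following
    /// refs found inside reachable definitions transitively.
    fn reachable_defs(
        root_refs: &[&str],
        defs: &BTreeMap<String, Vec<String>>,
    ) -> BTreeSet<String> {
        let mut reachable = BTreeSet::new();
        let mut pending: Vec<String> =
            root_refs.iter().map(|name| name.to_string()).collect();
        while let Some(name) = pending.pop() {
            // Skip dangling refs and definitions that were already visited.
            if !defs.contains_key(&name) || !reachable.insert(name.clone()) {
                continue;
            }
            pending.extend(defs[&name].iter().cloned());
        }
        reachable
    }

    /// Keeps only the reachable entries of the definition table.
    fn prune_defs(
        root_refs: &[&str],
        defs: &BTreeMap<String, Vec<String>>,
    ) -> BTreeMap<String, Vec<String>> {
        let keep = reachable_defs(root_refs, defs);
        defs.iter()
            .filter(|(name, _)| keep.contains(*name))
            .map(|(name, refs)| (name.clone(), refs.clone()))
            .collect()
    }
    ```

    The visited set also guards against cycles between definitions.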
    
    # What changed
    
    - Added `$ref`, `$defs`, and legacy `definitions` fields to the tool
    `JsonSchema` representation.
    - Updated `parse_tool_input_schema` lowering so `$ref`-only schema
    objects survive sanitization instead of becoming `{}`.
    - Sanitized definition tables recursively and dropped malformed
    definition tables so non-strict registration degrades gracefully.
    - Added reachability pruning for root definition tables by starting from
    refs outside definition tables, then following refs inside reachable
    definitions.
    - Added JSON Pointer decoding for local definition refs such as
    `#/$defs/Foo~1Bar`.
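
    For the JSON Pointer decoding step, a minimal sketch follows the RFC 6901
    escape rules (`~1` is `/`, `~0` is `~`). The function name is
    hypothetical, and this version deliberately ignores percent-encoding in
    the URI fragment and refs that point inside a definition.

    ```rust
    /// Extracts the definition name from a local ref such as `#/$defs/Foo~1Bar`.
    /// Returns `None` for refs that do not point directly at a root definition.
    fn decode_local_def_ref(reference: &str) -> Option<String> {
        let token = reference
            .strip_prefix("#/$defs/")
            .or_else(|| reference.strip_prefix("#/definitions/"))?;
        // A remaining '/' would address something nested inside the definition.
        if token.is_empty() || token.contains('/') {
            return None;
        }
        // Decode `~1` before `~0` so that `~01` becomes `~1` rather than `/`.
        Some(token.replace("~1", "/").replace("~0", "~"))
    }
    ```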
    
    # Verification
    Ran local golden-schema probes against representative connector schemas
    to validate behavior on real generated schemas:
    
    | Golden schema | Before bytes | After bytes | `$defs` before -> after | `$ref` before -> after | Result |
    |---|---:|---:|---:|---:|---|
    | `google_calendar/create_space` | 7111 | 4526 | 7 -> 7 | 7 -> 7 | all definitions preserved because all are reachable |
    | `figma/apply_file_variable_changes` | 4609 | 999 | 8 -> 5 | 8 -> 5 | unused defs pruned after unsupported `oneOf` shapes lower away |
    | `snowflake/list_catalog_integrations` | 1380 | 404 | 3 -> 0 | 0 -> 0 | all defs pruned because none are referenced |
    | `dropbox/create_shared_link` | 8894 | 1836 | 14 -> 4 | 9 -> 4 | only defs reachable from the root schema after sanitization are retained, including transitively through other retained defs |
    
    Token increase across golden schema due to this change:
    <img width="817" height="366" alt="Screenshot 2026-05-19 at 1 47 04 PM"
    src="https://github.com/user-attachments/assets/d5c80fe9-da85-41e6-8ac7-a01d1e0b0f71"
    />
  • Fix auto-review permission profile override (#23956)
    ## Summary
    The auto-review runtime sync path was assigning a raw
    `PermissionProfile` into `runtime_permission_profile_override`, whose
    field now expects `RuntimePermissionProfileOverride`. That broke the TUI
    Bazel build.
    
    This changes the assignment to store
    `RuntimePermissionProfileOverride::from_config(&self.config)`, matching
    the other runtime override paths and preserving the active profile and
    network metadata with the permission profile.
  • Add Bedrock Mantle GovCloud region (#23860)
    ## Summary
    - Add `us-gov-west-1` to the Bedrock Mantle supported region list
    - Cover the GovCloud endpoint URL in the existing `base_url` unit test
    
    ## Test
    - `cargo test -p codex-model-provider`
  • fix: Allow plugin skills to share plugin-level icon assets (#23776)
    Thread the plugin root through plugin skill loading so skill interface
    icons can reference shared plugin assets, such as ../../assets/logo.svg.
  • [3 of 4] tui: route feature and memory toggles through app server (#22915)
    ## Why
    Experimental feature toggles and memory settings can update several
    related config values in one interaction. Keeping those writes local in
    a remote TUI session is especially dangerous because the UI can diverge
    from the app-server config while also leaving behind partially stale
    supporting keys.
    
    This is **[3 of 4]** in a stacked series that moves TUI-owned config
    mutations onto app-server APIs.
    
    ## What changed
    - Routed feature flag persistence through app-server batch writes,
    including the supporting reviewer and permission updates used by
    guardian approval.
    - Routed Windows sandbox mode persistence and legacy Windows feature
    cleanup through app-server writes.
    - Routed memory settings through app-server batch writes and updated the
    TUI tests to exercise the embedded app-server path.
    
    ## Config keys affected
    - `features.<feature_key>`
    - `profiles.<profile>.features.<feature_key>`
    - `approval_policy`
    - `sandbox_mode`
    - `approvals_reviewer`
    - `windows.sandbox`
    - `features.experimental_windows_sandbox`
    - `features.elevated_windows_sandbox`
    - `features.enable_experimental_windows_sandbox`
    - Profile-scoped Windows legacy feature variants under
    `profiles.<profile>.features.*`
    - `memories.use_memories`
    - `memories.generate_memories`
    - Profile-scoped memory variants under `profiles.<profile>.memories.*`
    
    ## Suggested manual validation
    - Connect the TUI to a remote app server, toggle guardian approval on
    and off, and confirm the remote config updates
    `features.guardian_approval`, reviewer state, approval policy, and
    sandbox mode coherently.
    - Toggle a default-false experimental feature at the root level, disable
    it again, and confirm the key clears instead of lingering as an
    unnecessary explicit `false`.
    - Change memory settings and confirm the remote config updates both
    memory keys while the running TUI reflects the new state.
    - On Windows, switch sandbox mode through the TUI and confirm
    `windows.sandbox` is updated while the legacy Windows feature keys are
    cleared.
    
    ## Stack
    1. [#22913](https://github.com/openai/codex/pull/22913) `[1 of 4]`
    primary settings writes
    2. [#22914](https://github.com/openai/codex/pull/22914) `[2 of 4]` app
    and skill enablement
    3. [#22915](https://github.com/openai/codex/pull/22915) `[3 of 4]`
    feature and memory toggles
    4. [#22916](https://github.com/openai/codex/pull/22916) `[4 of 4]`
    startup and onboarding bookkeeping
  • Add subagent identity to hook inputs (#22882)
    # What
    
    When a normal hook fires inside a thread-spawned subagent, Codex now
    includes these optional top-level fields in the hook input:
    
    - `agent_id`: the child thread id
    - `agent_type`: the subagent role
    
    Root-agent hook inputs omit these fields. `SubagentStart` and
    `SubagentStop` keep their existing required `agent_id` and `agent_type`
    fields because those events are inherently subagent-scoped.
    
    This does not change matcher behavior. Tool hooks still match on tool
    name, compact hooks still match on trigger, and `UserPromptSubmit` still
    ignores matchers. Only `SubagentStart` and `SubagentStop` match on
    `agent_type`.
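
    The unchanged matcher behavior can be summarized in a small sketch. The
    `Tool` and `Compact` variant names are placeholders for the tool and
    compact hook events, and the type and function names are hypothetical.

    ```rust
    /// Hypothetical grouping of hook events by what their matcher sees.
    enum HookEvent<'a> {
        Tool { tool_name: &'a str },
        Compact { trigger: &'a str },
        UserPromptSubmit,
        SubagentStart { agent_type: &'a str },
        SubagentStop { agent_type: &'a str },
    }

    /// Returns the string a matcher is compared against, or `None` when the
    /// event ignores matchers.
    fn matcher_subject<'a>(event: &HookEvent<'a>) -> Option<&'a str> {
        match event {
            HookEvent::Tool { tool_name } => Some(*tool_name),
            HookEvent::Compact { trigger } => Some(*trigger),
            HookEvent::UserPromptSubmit => None,
            // Only the subagent lifecycle events match on `agent_type`.
            HookEvent::SubagentStart { agent_type }
            | HookEvent::SubagentStop { agent_type } => Some(*agent_type),
        }
    }
    ```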
  • fix(remote-control): retry after auth recovery (#23775)
    ## Why
    
    When remote control hits an auth failure such as a revoked or reused
    refresh token, the websocket loop falls into reconnect backoff. If the
    user fixes auth while that loop is sleeping, remote control can stay
    offline until the old retry timer expires because nothing wakes the loop
    or resets its exhausted auth recovery state.
    
    ## What Changed
    
    Added an auth-change watch on `AuthManager` for refresh-relevant cached
    auth updates.
    
    The remote-control websocket loop now subscribes to that signal, resets
    `UnauthorizedRecovery` and reconnect backoff when auth changes, and
    retries immediately instead of waiting for the previous delay.
    
    Updated the remote-control transport test to verify that reloading auth
    with the now-available account id wakes enrollment before the prior
    retry delay.
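
    The wake-on-auth-change idea can be sketched with a blocking channel. A
    std `mpsc` receiver stands in for the auth-change watch on `AuthManager`,
    and the doubling backoff is an assumption; the real loop is asynchronous
    and may use a different policy.

    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::Duration;

    /// Why a backoff sleep ended.
    #[derive(Debug, PartialEq)]
    enum Wake {
        AuthChanged,
        DelayElapsed,
        SourceClosed,
    }

    /// Sleeps for `delay`, but returns early when an auth-change signal arrives.
    fn wait_for_retry(auth_changes: &Receiver<()>, delay: Duration) -> Wake {
        match auth_changes.recv_timeout(delay) {
            Ok(()) => Wake::AuthChanged,
            Err(RecvTimeoutError::Timeout) => Wake::DelayElapsed,
            Err(RecvTimeoutError::Disconnected) => Wake::SourceClosed,
        }
    }

    /// Picks the next reconnect delay: reset after an auth change, else double
    /// the current delay up to `max` (the doubling is an assumed policy).
    fn next_delay(wake: &Wake, current: Duration, initial: Duration, max: Duration) -> Duration {
        match wake {
            Wake::AuthChanged => initial,
            _ => (current * 2).min(max),
        }
    }
    ```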
    
    ## Verification
    
    `cargo test -p codex-app-server-transport
    remote_control_waits_for_account_id_before_enrolling`
  • [codex] Make thread search case-insensitive (#23921)
    ## Summary
    - make rollout content search prefilter rollout files case-insensitively
    - keep the no-ripgrep fallback scan and visible snippet matcher aligned
    with that behavior
    - cover a lowercase `thread/search` query matching mixed-case
    conversation content
    
    ## Why
    The rollout-backed `thread/search` path used exact string matching in
    both its `rg` prefilter and semantic snippet generation. A content
    result could be missed solely because the query casing did not match the
    stored conversation text.
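
    One simple way to get case-insensitive matching in a fallback scan is to
    lowercase both sides before comparing. This is a sketch of that approach
    only; the function name is hypothetical and the matcher in the PR may
    work differently.

    ```rust
    /// Returns true when `haystack` contains `query`, ignoring case.
    fn contains_ignore_case(haystack: &str, query: &str) -> bool {
        haystack.to_lowercase().contains(&query.to_lowercase())
    }
    ```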
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-app-server thread_search_returns_content_matches`
    - `cargo test -p codex-rollout`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `cargo build -p codex-cli`
    - launched a local Electron dev instance with the rebuilt CLI binary
  • npm: remove legacy package artifact synthesis (#23836)
    ## Why
    
    `rust-release` now publishes `codex-package-<target>.tar.gz` as the
    canonical native package payload. npm staging should consume those
    archives directly instead of keeping legacy synthesis code that fetched
    `rg`, copied standalone binaries, and rebuilt an approximate package
    layout.
    
    That also means the package builder should not know the internal shape
    of `codex-package`. It should extract and copy the target payload
    wholesale so future layout changes stay localized to the archive
    producer.
    
    The release job stages `codex`, `codex-responses-api-proxy`, and
    `codex-sdk` together, so native artifact download should be filtered,
    observable, and shared across component installs. Since that native
    hydration is now only used by release staging, keeping a separate
    `install_native_deps.py` CLI adds an extra wrapper without a real
    caller.
    
    ## What Changed
    
    - Removed legacy `codex-package` synthesis and related compatibility
    flags from npm staging.
    - Folded the remaining native artifact hydration code into
    `scripts/stage_npm_packages.py` and deleted
    `codex-cli/scripts/install_native_deps.py`.
    - Made platform package staging copy the full extracted target directory
    instead of enumerating package entries.
    - Kept non-`codex-package` native components under their component
    directory name instead of using a legacy destination map.
    - Split native staging by component set while sharing one
    workflow-artifact cache across the invocation.
    - Changed workflow artifact download to select target artifacts by name,
    print sizes/progress, and reuse cached artifacts.
    - Removed the implicit `CI=true` default from `build_npm_package.py`;
    local CI-shaped runs should set that environment explicitly.
    - Kept `npm pack` cache/log output in its temporary directory so packing
    does not write to the user npm cache.
    
    ## Verification
    
    - `python3 -m py_compile scripts/stage_npm_packages.py
    codex-cli/scripts/build_npm_package.py`
    - `python3 -m unittest discover -s scripts/codex_package -p "test_*.py"`
    - `scripts/stage_npm_packages.py --help`
    - `codex-cli/scripts/build_npm_package.py --help`
    - Ran the release-shaped staging command from `rust-release.yml` against
    workflow run https://github.com/openai/codex/actions/runs/26240748758
    with `CI=true` set locally to match GitHub Actions:
    
    ```sh
    CI=true python3 ./scripts/stage_npm_packages.py \
      --release-version 0.133.0 \
      --workflow-url https://github.com/openai/codex/actions/runs/26240748758 \
      --package codex \
      --package codex-responses-api-proxy \
      --package codex-sdk
    ```
    
    That completed successfully, downloaded only the six target artifacts
    once, reused the cache for `codex-responses-api-proxy`, and produced all
    nine npm tarballs. Generated tarballs and staging/artifact temp dirs
    were cleaned afterward.
  • Remove plugin hooks feature flag (#22552)
    # Why
    
    This is a follow-up stacked on top of the `plugin_hooks` default-on
    change. Once we are comfortable making plugin hooks part of the normal
    plugin behavior, the separate feature flag stops buying us much and
    leaves extra branching/cache state behind.
    
    # What
    
    - remove the `PluginHooks` feature and generated config-schema entries
    - make plugin hook loading/listing follow plugin enablement directly
    - drop plugin-manager cache/state that only existed to distinguish
    hook-flag toggles
    - remove tests and fixtures that modeled `plugin_hooks = true/false`
  • [codex] Add rollout-backed thread content search (#23519)
    ## Summary
    - add experimental `thread/search` for local rollout-backed thread
    search using `rg` over JSONL rollouts
    - return search-specific result rows with optional previews instead of
    storing preview data on `StoredThread` or ordinary `Thread` responses
    - keep `thread/list` separate from full-content search and document the
    new app-server surface
    
    ## Testing
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server
    thread_search_returns_content_and_title_matches -- --nocapture`
  • TUI: skip goal replace prompt for completed goals (#23792)
    ## Why
    Users reported that the replacement confirmation feels unnecessary when
    the current thread goal is already complete. In that state, `/goal
    <objective>` is starting fresh rather than interrupting active work.
    
    ## What changed
    `/goal <objective>` now skips the replace confirmation when the existing
    goal has `complete` status and uses the existing fresh replacement path.
    Goals that are active, paused, blocked, usage-limited, or budget-limited
    still require confirmation before being replaced.
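
    The confirmation rule can be sketched as an exhaustive match over the
    statuses listed above. The enum and function names are hypothetical and
    are not taken from the TUI code.

    ```rust
    /// Goal statuses named in this change (hypothetical enum).
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum GoalStatus {
        Active,
        Paused,
        Blocked,
        UsageLimited,
        BudgetLimited,
        Complete,
    }

    /// Returns true when `/goal <objective>` should ask before replacing.
    fn needs_replace_confirmation(existing: GoalStatus) -> bool {
        match existing {
            // A completed goal means the user is starting fresh.
            GoalStatus::Complete => false,
            GoalStatus::Active
            | GoalStatus::Paused
            | GoalStatus::Blocked
            | GoalStatus::UsageLimited
            | GoalStatus::BudgetLimited => true,
        }
    }
    ```

    Listing every variant explicitly means a newly added status fails to
    compile until someone decides which side it belongs on.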
  • Reconnect disconnected exec-server websocket clients with fresh sessions (#23867)
    ## Summary
    - replace the one-shot lazy remote exec-server cache with a
    lock-protected current client
    - when the cached websocket client is already disconnected, create one
    fresh websocket client/session on the next `get()`
    - keep existing disconnect failure behavior for old process sessions and
    HTTP body streams; do not add session resume or request retry
    
    ## Why
    The prior PR direction was trying to grow into session restore: resume
    the old `session_id`, preserve existing process handles, and add
    reconnect retry policy. That is more machinery than we want for this
    slice.
    
    For now, the useful minimum is simpler: later fresh remote operations
    should not be stuck behind a dead cached websocket client, but anything
    already attached to the dead connection should fail loudly through the
    existing disconnect path. The server already has detached-session
    cleanup via its existing TTL, so this PR does not need to add
    client-side session preservation.
    
    ## What Changed
    - `LazyRemoteExecServerClient::get()` now keeps the current concrete
    client in a small mutex-protected cache plus one async connect lock.
    - If that cached client is still connected, `get()` returns it.
    - If that cached websocket client has observed the transport close,
    `get()` creates a brand-new websocket client with a brand-new
    exec-server session and replaces the cache.
    - If that cached client is stdio-backed, behavior stays one-shot: the
    dead client is returned and later work surfaces the existing disconnect
    error.
    - No `resume_session_id`, backoff, request replay, or existing
    `RemoteExecProcess` rebinding is added here.
    - Added focused websocket coverage that proves two concurrent `get()`
    calls after disconnect share one fresh replacement client/session.
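
    The cache behavior listed above can be modeled synchronously. This
    sketch uses a single `Mutex` where the PR describes a mutex-protected
    cache plus an async connect lock, and every type and method name here is
    a hypothetical stand-in.

    ```rust
    use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
    use std::sync::{Arc, Mutex};

    #[derive(Clone, Copy, PartialEq)]
    enum Transport {
        WebSocket,
        Stdio,
    }

    /// Hypothetical stand-in for a connected exec-server client.
    struct Client {
        transport: Transport,
        session_id: u64,
        disconnected: AtomicBool,
    }

    /// Simplified model of a lazily connected client cache.
    struct LazyClient {
        transport: Transport,
        current: Mutex<Option<Arc<Client>>>,
        next_session: AtomicU64,
    }

    impl LazyClient {
        fn new(transport: Transport) -> Self {
            Self {
                transport,
                current: Mutex::new(None),
                next_session: AtomicU64::new(1),
            }
        }

        /// Pretends to connect and open a brand-new session.
        fn connect(&self) -> Arc<Client> {
            Arc::new(Client {
                transport: self.transport,
                session_id: self.next_session.fetch_add(1, Ordering::SeqCst),
                disconnected: AtomicBool::new(false),
            })
        }

        fn get(&self) -> Arc<Client> {
            // Holding the lock across connect lets concurrent callers share
            // one replacement instead of each opening their own.
            let mut current = self.current.lock().unwrap();
            if let Some(client) = current.as_ref() {
                let dead = client.disconnected.load(Ordering::SeqCst);
                // Stdio stays one-shot: the dead client is handed back as-is.
                if !dead || client.transport == Transport::Stdio {
                    return Arc::clone(client);
                }
            }
            let fresh = self.connect();
            *current = Some(Arc::clone(&fresh));
            fresh
        }
    }
    ```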
  • Improve /goal error messages for ephemeral sessions (#23796)
    ## Why
    
    When a user runs `/goal` in a temporary session, the TUI can currently
    surface an internal app-server failure such as `thread/goal/get failed
    in TUI`. That message is technically true, but it does not explain the
    actual constraint: goals require a saved session because goal state is
    persisted with the thread.
    
    This is especially confusing when `codex doctor` reports the background
    app-server as running in ephemeral mode, since that wording is easy to
    conflate with ephemeral thread/session behavior.
    
    ## What changed
    
    - Added a TUI-side formatter for thread-goal RPC failures in
    `codex-rs/tui/src/app/thread_goal_actions.rs`.
    - Detects app-server/core errors that indicate goals are unsupported for
    an ephemeral thread/session.
    - Replaces the internal RPC failure with a user-facing explanation:
    
    ```text
    Goals need a saved session. This session is temporary.
    Run `codex` to start a saved session, or `codex resume` / `/resume` to reopen one.
    ```
    
    - Preserves the existing generic failure wording for non-ephemeral goal
    errors.
    
    ## Verification
    
    - `cargo test -p codex-tui thread_goal_error_message --lib`
    
    I also tried `cargo test -p codex-tui`; it built successfully but the
    test runner aborted in an unrelated side-thread stack overflow
    (`app::tests::discard_side_thread_removes_agent_navigation_entry`),
    which reproduced when run by itself.
  • packaging: move rg manifest out of npm bin (#23833)
    ## Why
    
    Installing `@openai/codex` currently places a Dotslash `rg` manifest at
    `node_modules/@openai/codex/bin/rg`, even though the native optional
    dependency already ships the actual helper under
    `vendor/<target>/codex-path/rg`. The launcher prepends that `codex-path`
    directory, so the top-level `bin/rg` file is redundant in the npm
    install.
    
    The remaining direct consumers of the manifest are package-building
    paths: `scripts/codex_package/ripgrep.py` and
    `codex-cli/scripts/install_native_deps.py`. Keeping the manifest under
    `codex-cli/bin` makes it look like a shipped npm binary, so this moves
    it next to the package-builder code that owns it. The checked-in
    `@openai/codex` package metadata should likewise describe only the meta
    package payload; generated platform packages continue to publish
    `vendor`.
    
    ## What Changed
    
    - Moved the Dotslash ripgrep manifest from `codex-cli/bin/rg` to
    `scripts/codex_package/rg`.
    - Updated the package builder, npm native-artifact hydrator, README, and
    CLI help text to reference the new manifest location.
    - Stopped `codex-cli/scripts/build_npm_package.py` from copying `rg`
    into the `@openai/codex` meta package.
    - Narrowed the checked-in meta package `files` whitelist to
    `bin/codex.js`.
    
    ## Verification
    
    - `python3 -m unittest discover -s scripts/codex_package -p "test_*.py"`
    - `python3 -m unittest discover -s codex-cli/scripts -p "test_*.py"`
    - `python3 -m py_compile codex-cli/scripts/build_npm_package.py
    codex-cli/scripts/install_native_deps.py
    scripts/codex_package/ripgrep.py scripts/codex_package/cli.py
    scripts/stage_npm_packages.py`
    - `codex-cli/scripts/build_npm_package.py --package codex --version
    0.0.0-test --pack-output <tmp>/codex-meta-no-vendor.tgz`
    - `tar -tf <tmp>/codex-meta-no-vendor.tgz` showed only
    `package/bin/codex.js`, `package/package.json`, and `package/README.md`.
    - Direct staging check showed `codex` uses `files: ["bin/codex.js"]`
    while `codex-darwin-arm64` still uses `files: ["vendor"]`.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23833).
    * #23836
    * __->__ #23833
  • tui: plumb permission profile selection (#23708)
    ## Why
    
    The named-profile `/permissions` picker needs a small TUI action path
    that can select permission profiles without folding the menu UI and
    profile metadata into the same review.
    
    ## What changed
    
    - Carry permission-profile selections through the TUI app event flow.
    - Persist selected profiles while preserving the existing approval
    settings and guardrail prompts.
    - Keep the legacy `/permissions` picker behavior in this layer; the
    profile-mode menu stays in the follow-up PR.
    
    ## Stack
    
    1. [#22931](https://github.com/openai/codex/pull/22931):
    runtime/session/network propagation for active permission profiles.
    2. **This PR**: TUI selection plumbing and guardrail flow.
    3. [#21559](https://github.com/openai/codex/pull/21559): profile-aware
    `/permissions` menu and custom profile display.
    
    <img width="1632" height="1186" alt="image"
    src="https://github.com/user-attachments/assets/69ddcd5e-b57c-468d-8c1d-246916323c15"
    />
    
    ## Validation
    
    - `git diff --cached --check` before commit.
    - Full test run skipped at the user's request while pushing the split
    stack.
  • cli: remove legacy profile v1 plumbing (#23886)
    ## Why
    
    [#23883](https://github.com/openai/codex/pull/23883) moved the
    user-facing `--profile` flag onto profile v2. The shared CLI option
    layer still carried the old `config_profile` slot and several CLI
    entrypoints still copied that value into legacy config overrides.
    Leaving that path around makes the CLI surface look like it still
    selects legacy `[profiles.*]` state even though `--profile` now means
    `$CODEX_HOME/<name>.config.toml`.
    
    ## What
    
    - Remove the legacy `config_profile` field and merge/copy path from
    [`SharedCliOptions`](https://github.com/openai/codex/blob/95baaf72920c8db22097df8d15a0bb76c84528b6/codex-rs/utils/cli/src/shared_options.rs#L8-L177).
    - Stop forwarding profile-v1 overrides from CLI, exec, TUI, doctor,
    debug, feature, and exec-server paths; runtime profile selection remains
    on `config_profile_v2` through
    [`loader_overrides_for_profile`](https://github.com/openai/codex/blob/95baaf72920c8db22097df8d15a0bb76c84528b6/codex-rs/cli/src/main.rs#L1606-L1619).
    - Resolve local OSS provider selection from the base config in exec and
    TUI now that the legacy profile argument is gone.
    
    ## Testing
    
    - Not run (cleanup-only follow-up to #23883).
  • Route MCP servers through explicit environments (#23583)
    ## Summary
    - route each configured MCP server through an explicit per-server
    `environment_id` instead of a manager-wide remote toggle
    - default omitted `environment_id` to `local`, resolve named ids through
    `EnvironmentManager`, and fail only the affected MCP server when an
    explicit id is unknown
    - keep local stdio on the existing local launcher path for now, while
    named-environment stdio uses the selected environment backend and
    requires an absolute `cwd`
    - allow local HTTP MCP servers to keep using the ambient HTTP client
    when no local `Environment` is configured; named-environment HTTP MCPs
    use that environment's HTTP client
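    
    The resolution rules above could be sketched roughly as follows. The
    types and function names are illustrative stand-ins, `known` stands in
    for whatever ids `EnvironmentManager` can resolve, and treating an
    explicit `local` the same as an omitted id is an assumption.
    
    ```rust
    /// Where an MCP server should run (simplified stand-in type).
    #[derive(Debug, Clone, PartialEq)]
    enum ResolvedEnvironment {
        Local,
        Named(String),
    }
    
    /// Resolves one server's optional `environment_id`.
    fn resolve_environment(
        environment_id: Option<&str>,
        known: &[&str],
    ) -> Result<ResolvedEnvironment, String> {
        match environment_id {
            // An omitted id defaults to the local environment.
            None | Some("local") => Ok(ResolvedEnvironment::Local),
            Some(id) if known.contains(&id) => Ok(ResolvedEnvironment::Named(id.to_string())),
            Some(id) => Err(format!("unknown environment_id `{id}`")),
        }
    }
    
    /// Resolves every server independently, so one unknown id fails only
    /// the affected server and leaves the others usable.
    fn resolve_all(
        servers: &[(&str, Option<&str>)],
        known: &[&str],
    ) -> Vec<(String, Result<ResolvedEnvironment, String>)> {
        servers
            .iter()
            .map(|(name, env)| (name.to_string(), resolve_environment(*env, known)))
            .collect()
    }
    ```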
    
    ## Validation
    - devbox Bazel build: `bazel build --bes_backend= --bes_results_url=
    //codex-rs/cli:codex //codex-rs/rmcp-client:test_stdio_server
    //codex-rs/rmcp-client:test_streamable_http_server`
    - devbox app-server config matrix with real `config.toml` /
    `environments.toml` files covering omitted local, explicit local,
    omitted local under remote default, explicit remote stdio, local HTTP
    without local env, explicit remote HTTP, local stdio without local env,
    unknown explicit env, and remote stdio without `cwd`
  • docs: add description to codex-cli/package.json (#23835)
    Fix this eyesore where our lack of a `"description"` was causing our
    `README.md` to be used for previews on npm.
    
    <img width="1291" height="178" alt="image"
    src="https://github.com/user-attachments/assets/a9bc08c5-0def-4755-8bcc-0c90e096b9c2"
    />
  • cli: rename profile v2 flag to --profile (#23883)
    ## Why
    
    Profile v2 is taking over the user-facing profile selection path, so the
    CLI no longer needs to expose the transitional `--profile-v2` spelling.
    This switches the public args surface to `--profile` before the
    remaining legacy profile plumbing is removed separately.
    
    ## What
    
    - Rebind `--profile` and `-p` to the v2 profile name argument that
    selects `$CODEX_HOME/<name>.config.toml`.
    - Stop parsing the legacy shared CLI profile argument while keeping its
    implementation path in place for follow-up cleanup.
    - Update CLI validation, profile-name parse errors, and the
    legacy-profile collision message/tests to refer to `--profile`.
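    
    For illustration, the file that `--profile <name>` selects can be
    derived as below. `profile_config_path` is a hypothetical helper, not
    the loader's actual function, and it skips the profile-name validation
    mentioned above.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Builds `$CODEX_HOME/<name>.config.toml` for a v2 profile name.
    fn profile_config_path(codex_home: &Path, profile_name: &str) -> PathBuf {
        codex_home.join(format!("{profile_name}.config.toml"))
    }
    ```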
    
    ## Testing
    
    - `cargo test -p codex-cli -p codex-config -p codex-protocol -p
    codex-utils-cli`
  • chore: link doc in profile error messages (#23879)
    Update the profile error messages to include a link to the docs.
  • refactor: centralize tool exposure planning (#23876)
    ## Why
    
    Tool exposure is a planning concern, but the deferred MCP path and
    dispatch-only legacy shell path were carrying those decisions in handler
    constructors and a shell-only tool-family builder. Keeping those
    decisions in `spec_plan` makes the core tool plan easier to follow and
    keeps handlers focused on runtime behavior.
    
    ## What changed
    
    - add `PlannedTools` helpers for ordinary runtimes, exposure overrides,
    dispatch-only runtimes, and hosted specs
    - inline shell tool assembly into `core/src/tools/spec_plan.rs` and
    remove the shell-only `tool_family` module
    - remove exposure state and special exposure constructors from
    `McpHandler` and `ShellCommandHandler`
    - keep hidden runtime behavior centralized in `ExposureOverride`,
    including disabling parallel tool calls for hidden handlers
    
    ## Testing
    
    - Not run (refactor only)
  • [codex] Stabilize subagent start hook test (#23882)
    ## What
    
    Remove the exact captured request-count assertion from the
    `SubagentStart` hook integration test while still waiting for the child
    request that matches the injected hook context.
    
    ## Why
    
    The test owns the start-hook behavior and already verifies that the
    child request reaches the context matcher plus that the start/session
    hook logs have the expected invocations. Counting every request captured
    by the response mock makes the test sensitive to lifecycle timing
    outside that contract and has been flaky in CI.
    
    ## Testing
    
    - `cargo test -p codex-core --test all
    suite::subagent_notifications::subagent_start_replaces_session_start_and_injects_context
    -- --exact`
  • Make tool executor specs mandatory (#23870)
    ## Why
    
    `ToolExecutor` is the runtime contract that keeps a callable tool and
    its model-visible spec together. Leaving `spec()` optional lets a
    registered runtime silently omit that half of the contract, and it also
    overloads a missing spec as an exposure decision for tools that should
    stay dispatchable without being shown to the model.
    
    ## What
    
    - Make `ToolExecutor::spec()` required and update core, extension, and
    test tool executors to return a concrete `ToolSpec`.
    - Add `ToolExposure::Hidden` for dispatch-only tools. The legacy
    `shell_command` runtime in unified-exec sessions now uses that explicit
    exposure instead of hiding itself by omitting a spec.
    - Build MCP tool specs when `McpHandler` is constructed so invalid MCP
    specs are skipped before the handler is registered.
    - Keep tool planning aligned with the new contract for direct, deferred,
    hidden, code-mode, dynamic, and namespaced tool paths.
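    
    A simplified sketch of the contract, assuming a `Direct` variant and a
    default `exposure()` method that the real trait may not have. The
    `ToolSpec` shape and the `ReadFile` tool are illustrative only.
    
    ```rust
    /// Simplified stand-in for the model-visible tool description.
    #[derive(Debug, Clone, PartialEq)]
    struct ToolSpec {
        name: String,
    }
    
    /// Whether the model sees the tool; `Hidden` tools stay dispatchable.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ToolExposure {
        Direct, // assumed variant name
        Hidden,
    }
    
    trait ToolExecutor {
        /// Required: every registered runtime must return a concrete spec.
        fn spec(&self) -> ToolSpec;
    
        /// Visible by default; dispatch-only tools override this.
        fn exposure(&self) -> ToolExposure {
            ToolExposure::Direct
        }
    }
    
    struct LegacyShellCommand;
    
    impl ToolExecutor for LegacyShellCommand {
        fn spec(&self) -> ToolSpec {
            ToolSpec { name: "shell_command".to_string() }
        }
        fn exposure(&self) -> ToolExposure {
            // Hidden explicitly instead of by omitting the spec.
            ToolExposure::Hidden
        }
    }
    
    struct ReadFile; // hypothetical ordinary tool
    
    impl ToolExecutor for ReadFile {
        fn spec(&self) -> ToolSpec {
            ToolSpec { name: "read_file".to_string() }
        }
    }
    
    /// Specs shown to the model: hidden tools are registered but filtered out.
    fn model_visible_specs(tools: &[&dyn ToolExecutor]) -> Vec<ToolSpec> {
        tools
            .iter()
            .filter(|tool| tool.exposure() == ToolExposure::Direct)
            .map(|tool| tool.spec())
            .collect()
    }
    ```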
    
    ## Testing
    
    - Added tool-plan coverage that invalid MCP tool specs are not
    registered.
    - Updated shell-family coverage for the hidden legacy `shell_command`
    runtime and the affected tool executor test fixtures.
  • feat: retain remote compaction truncation parity in v2 (#23728)
    ## Why
    
    Remote compaction now has two implementations: the existing
    server-rebuilt v1 path and the newer client-rebuilt v2 path behind
    `remote_compaction_v2`. The v1 path bounds retained
    user/developer/system history before installing the compaction item,
    while v2 was previously carrying the full retained history forward. That
    made the two paths diverge for large pre-compaction transcripts even
    though they are meant to preserve the same compaction contract.
    
    This aligns v2 with the retained-history budget expected from v1 so
    switching the feature flag does not materially change which
    pre-compaction messages survive into the rebuilt history.
    
    ## What changed
    
    - Apply a retained-message character budget while rebuilding v2
    compacted history in `core/src/compact_remote_v2.rs`.
    - Keep newest retained messages first, truncate the boundary message
    with the shared `truncate_text(...)` helper, and drop older retained
    messages once the budget is exhausted.
    - Preserve non-text retained message content such as images while
    truncating text content.
    - Use the current `64_000`-token retained-message default, translated
    to a character budget with the existing `4x` multiplier.
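    
    The retention rule can be sketched as follows, assuming plain-text
    messages ordered oldest to newest. `truncate_chars` is a stand-in for
    the shared `truncate_text(...)` helper, whose real signature and
    truncation style are not shown here, and the constant names are
    hypothetical.
    
    ```rust
    /// Assumed reading of the PR text: a 64_000-token default at 4 characters per token.
    const RETAINED_MESSAGE_TOKENS: usize = 64_000;
    const CHARS_PER_TOKEN: usize = 4;
    const RETAINED_MESSAGE_CHAR_BUDGET: usize = RETAINED_MESSAGE_TOKENS * CHARS_PER_TOKEN;
    
    /// Stand-in helper: keeps the first `max_chars` characters.
    fn truncate_chars(text: &str, max_chars: usize) -> String {
        text.chars().take(max_chars).collect()
    }
    
    /// Keeps the newest messages that fit in `budget` characters.
    /// The result is returned in the original (oldest to newest) order.
    fn retain_within_budget(messages: &[String], budget: usize) -> Vec<String> {
        let mut remaining = budget;
        let mut kept = Vec::new();
        // Walk from newest to oldest so recent messages win.
        for message in messages.iter().rev() {
            if remaining == 0 {
                break; // Budget exhausted: drop all older messages.
            }
            let len = message.chars().count();
            if len <= remaining {
                kept.push(message.clone());
                remaining -= len;
            } else {
                // Boundary message: keep only the part that still fits.
                kept.push(truncate_chars(message, remaining));
                remaining = 0;
            }
        }
        kept.reverse();
        kept
    }
    ```
    
    Under these assumptions the default budget works out to `256_000`
    characters.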
    
    ## Testing
    
    - `cargo test -p codex-core compact_remote_v2::tests::`
    - Added focused coverage for newest-first retention and truncating
    multipart retained messages without dropping images.
  • [codex] Steer budget-limited goal extension turns (#23718)
    ## What
    - Add a small extension capability for injecting model-visible response
    items into the active turn
    - Have the goal extension inject hidden goal-context steering when
    tool-finish accounting reaches `BudgetLimited`
    - Cover the extension backend path with an assertion on the injected
    steering item
    
    ## Why
    PR #23696 persists and emits the budget-limited goal update from
    tool-finish accounting, but it leaves the model unaware of that
    transition. The existing core runtime steers the model to wrap up in
    this case; the extension path should do the same through an explicit
    host capability.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-goal-extension`
    - `cargo test -p codex-extension-api`
  • Trace logical websocket request after untraced warmup (#23581)
    ## Why
    
    `prewarm_websocket` intentionally stays out of rollout inference
    tracing, but the next traced websocket request can still reuse the
    warmup `response_id` and send an empty `input` delta. If tracing records
    that wire payload verbatim, replay sees an incremental request whose
    parent was never traced and cannot reconstruct the conversation.
    
    This fixes that at the producer boundary instead of relaxing
    `rollout-trace` replay semantics around unresolved
    `previous_response_id` values.
    
    ## What
    
    - track whether the last websocket response came from an untraced warmup
    and clear that state when the websocket session is reset or reconnected
    - when a traced websocket request reuses that warmup parent, keep
    sending the compressed websocket request on the wire but record the
    logical `ResponsesApiRequest` in the rollout trace
    - add a regression test that proves replay reconstructs the logical user
    message even though the websocket follow-up carries
    `previous_response_id = warm-1` with empty `input`
    - update `InferenceTraceAttempt::record_started` docs to reflect that
    callers may record a logical request rather than the exact transport
    payload
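    
    A rough sketch of the producer-side decision, with hypothetical type
    and field names. It models only which request gets recorded; the wire
    payload itself is unchanged.
    
    ```rust
    /// Simplified request shape; the real `ResponsesApiRequest` has more fields.
    #[derive(Debug, Clone, PartialEq)]
    struct Request {
        previous_response_id: Option<String>,
        input: Vec<String>,
    }
    
    /// Tracks whether the last websocket response came from an untraced warmup.
    struct WarmupState {
        last_response_from_untraced_warmup: bool,
    }
    
    impl WarmupState {
        /// Called when the websocket session is reset or reconnected.
        fn reset(&mut self) {
            self.last_response_from_untraced_warmup = false;
        }
    }
    
    /// Picks the request to record in the rollout trace. When the wire
    /// request would reference an untraced warmup parent, the logical
    /// request is recorded instead.
    fn request_to_trace<'a>(
        state: &WarmupState,
        wire: &'a Request,
        logical: &'a Request,
    ) -> &'a Request {
        if state.last_response_from_untraced_warmup && wire.previous_response_id.is_some() {
            logical
        } else {
            wire
        }
    }
    ```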
    
    ## Testing
    
    - `cargo test -p codex-core --test all
    responses_websocket_request_prewarm_traces_logical_request`
  • sdk: launch packaged Codex runtimes (#23786)
    ## Why
    
    The Python and TypeScript SDKs launch the native Codex runtime directly,
    so they need to consume the same package artifact shape that release
    jobs now produce. The runtime wheel should be built from the canonical
    Codex package archive rather than reconstructing a parallel layout from
    loose binaries.
    
    ## What Changed
    
    - Stage `openai-codex-cli-bin` by extracting
    `codex-package-<target>.tar.gz` into `src/codex_cli_bin` and validating
    the expected package layout.
    - Update release workflows to pass the generated package archive into
    `stage-runtime` instead of the temporary package directory.
    - Update Python runtime setup to download `codex-package-*.tar.gz`
    release assets directly.
    - Expose Python runtime helpers for the bundled package directory and
    `codex-path`, and prepend that path when `openai_codex` launches the
    installed runtime without duplicating Windows `Path`/`PATH` keys.
    - Teach the TypeScript SDK to resolve package-layout optional
    dependencies while keeping the existing npm fallback layout, and
    preserve the existing Windows path variable casing when prepending
    `codex-path`.
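    
    The SDKs implement this in Python and TypeScript. The sketch below
    shows the same idea in Rust for illustration only, assuming the
    environment is held in a map; `prepend_codex_path` is a hypothetical
    name.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Prepends `codex_path` to the PATH variable without adding a second key.
    /// On Windows the existing key casing (for example `Path`) is reused.
    fn prepend_codex_path(env: &mut BTreeMap<String, String>, codex_path: &str, windows: bool) {
        let separator = if windows { ";" } else { ":" };
        // Windows variable names are case-insensitive, so match any casing there.
        let key = env
            .keys()
            .find(|k| {
                if windows {
                    k.eq_ignore_ascii_case("PATH")
                } else {
                    k.as_str() == "PATH"
                }
            })
            .cloned()
            .unwrap_or_else(|| "PATH".to_string());
        let value = match env.get(&key) {
            Some(existing) if !existing.is_empty() => {
                format!("{codex_path}{separator}{existing}")
            }
            _ => codex_path.to_string(),
        };
        env.insert(key, value);
    }
    ```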
    
    ## Test Plan
    
    - `python3 -m py_compile sdk/python/scripts/update_sdk_artifacts.py
    sdk/python/_runtime_setup.py sdk/python/src/openai_codex/client.py
    sdk/python-runtime/src/codex_cli_bin/__init__.py`
    - `uv run --frozen --project sdk/python --extra dev ruff check
    sdk/python/scripts/update_sdk_artifacts.py sdk/python/_runtime_setup.py
    sdk/python/src/openai_codex/client.py
    sdk/python/tests/test_artifact_workflow_and_binaries.py
    sdk/python-runtime/src/codex_cli_bin/__init__.py`
    - `uv run --frozen --project sdk/python --extra dev pytest
    sdk/python/tests/test_artifact_workflow_and_binaries.py`
    - `pnpm eslint src/exec.ts tests/exec.test.ts`
    - `pnpm test --runInBand tests/exec.test.ts`
  • core: pass permission profiles to Windows runner (#23715)
    ## Why
    
    This is the functional handoff PR for the Windows sandbox
    `PermissionProfile` migration. After #23714, the Windows elevated
    backend can accept a profile-native request, but core still sent a
    compatibility `SandboxPolicy` into the elevated command-runner path.
    That meant profile-only details such as deny globs had to be translated
    through side channels instead of being preserved in the runner
    `SpawnRequest`.
    
    Passing the real `PermissionProfile` completes the command-runner
    handoff while leaving the unelevated restricted-token fallback on the
    legacy policy-string API.
    
    ## What
    
    - Updates one-shot Windows elevated execution in `core/src/exec.rs` to
    call `run_windows_sandbox_capture_for_permission_profile_elevated`.
    - Updates unified exec in `core/src/unified_exec/process_manager.rs` to
    call `spawn_windows_sandbox_session_elevated_for_permission_profile`.
    - Passes `request.permission_profile` /
    `exec_request.permission_profile` and the stored Windows sandbox policy
    cwd to the elevated backend.
    - Keeps compatibility `SandboxPolicy` serialization only for the
    non-elevated restricted-token fallback.
    
    ## Verification
    
    - `cargo test -p codex-core --test all --no-run`
  • feat: support managed permission profiles in requirements.toml (#23433)
    ## Why
    
    Cloud-managed `requirements.toml` should be able to define the managed
    permission profiles a client may select and constrain that selectable
    set without requiring local user config to recreate the profile catalog.
    
    This keeps requirements focused on restrictions. The selected default
    remains a config or session choice, while requirements contribute the
    managed profile bodies and `allowed_permissions` allowlist that the
    config-loading boundary validates before a resolved runtime
    `PermissionProfile` is installed.
    
    ## What changed
    
    - Add `requirements.toml` support for a managed permission-profile
    catalog plus its allowlist:
    
    ```toml
    allowed_permissions = ["review", "build"]
    
    [permissions.review]
    extends = ":read-only"
    
    [permissions.build]
    extends = ":workspace"
    ```
    
    - Merge requirements-defined profile bodies into the effective
    permission catalog and reject profile ids that collide with
    config-defined profiles.
    - Validate that every `allowed_permissions` entry resolves to a built-in
    or catalog profile before selection uses it.
    - Preserve allowed configured named-profile selections. When a
    configured named profile is disallowed, fall back to the first allowed
    requirements profile with a startup warning.
    - Keep built-in selections and the stock trust-based `:read-only` /
    `:workspace` fallback path intact when no permission profile is
    explicitly selected.
    - Centralize the managed catalog and allowlist selection path in
    `EffectivePermissionSelection` so the requirements boundary is visible
    in config loading.
    - Surface `allowedPermissions` through `configRequirements/read`, and
    update the generated app-server schema fixtures plus the app-server
    README.
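    
    The allowlist validation and fallback described above might look
    roughly like this. All names are illustrative and are not the
    `EffectivePermissionSelection` API; the sketch treats the first
    allowlist entry as the fallback and errors on an empty allowlist, both
    of which are assumptions.
    
    ```rust
    /// Outcome of applying the requirements allowlist to a configured selection.
    #[derive(Debug, PartialEq)]
    struct Selection {
        profile: String,
        warning: Option<String>,
    }
    
    /// Checks that every allowlist entry is a built-in or catalog profile.
    fn validate_allowlist(
        allowed: &[&str],
        builtins: &[&str],
        catalog: &[&str],
    ) -> Result<(), String> {
        for id in allowed {
            if !builtins.contains(id) && !catalog.contains(id) {
                return Err(format!("allowed_permissions entry `{id}` is not a known profile"));
            }
        }
        Ok(())
    }
    
    /// Keeps an allowed configured profile; otherwise falls back with a warning.
    fn select_profile(configured: &str, allowed: &[&str]) -> Result<Selection, String> {
        if allowed.contains(&configured) {
            return Ok(Selection { profile: configured.to_string(), warning: None });
        }
        let fallback = allowed.first().ok_or("allowed_permissions is empty")?;
        Ok(Selection {
            profile: fallback.to_string(),
            warning: Some(format!(
                "profile `{configured}` is not allowed; using `{fallback}`"
            )),
        })
    }
    ```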
    
    ## Validation
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core system_requirements_`
    - `cargo test -p codex-core system_allowed_permissions_`
    - `cargo test -p codex-app-server-protocol`
    - `just write-app-server-schema`
    
    ## Related work
    
    - Uses merged permission-profile inheritance support from #22270 and
    #23705.
    - Kept separate from the in-flight permission profile listing API in
    #23412.
  • windows-sandbox: add profile-native elevated APIs (#23714)
    ## Why
    
    This is the next step after #23167 in the Windows sandbox
    `PermissionProfile` migration. The elevated Windows backend still
    exposed policy-string entry points, which forced callers to pass a
    compatibility `SandboxPolicy` before the command-runner IPC could
    receive a profile.
    
    Adding profile-native APIs first keeps the core switch in the next PR
    small: reviewers can see that the Windows crate can prepare elevated
    setup, capability SIDs, and runner IPC from a resolved
    `PermissionProfile` without changing core behavior yet.
    
    ## What
    
    - Adds `ElevatedSandboxProfileCaptureRequest` and
    `run_windows_sandbox_capture_for_permission_profile_elevated` for
    one-shot elevated capture.
    - Adds `spawn_windows_sandbox_session_elevated_for_permission_profile`
    for unified exec sessions.
    - Factors elevated spawn prep through
    `prepare_elevated_spawn_context_for_permissions`, so both new APIs
    operate from `ResolvedWindowsSandboxPermissions` directly.
    - Keeps the existing legacy policy-string APIs as adapters for callers
    that have not moved yet.
    
    ## Verification
    
    - `cargo test -p codex-windows-sandbox`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23714).
    * #23715
    * __->__ #23714
  • [codex] Reject read-only fallback with approvals disabled (#23774)
    ## Why
    
    If a user configures `approval_policy = "never"` with `sandbox_mode =
    "danger-full-access"`, managed requirements can reject full access and
    force the existing permission fallback to read-only. That leaves Codex
    in a dead-end session: writes are blocked by the sandbox, while
    approvals are disabled so the session cannot ask to proceed.
    
    This PR rejects that constrained configuration during startup instead of
    letting the TUI enter a read-only session that cannot make progress. The
    rejection is attached to the requirement-constrained permission path in
    [`Config`](https://github.com/openai/codex/blob/39f0abc0a7c0ed0e348a6843e9f0c7b76e2400bc/codex-rs/core/src/config/mod.rs#L3301-L3318).
    
    ## What changed
    
    - Reject the `danger-full-access` to read-only managed-requirements
    fallback when the effective approval policy is `never`.
    - Explain in the startup config error why the fallback is invalid and
    how to fix it.
    - Add a regression test for the managed requirements path.
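    
    A minimal sketch of the startup check, assuming simplified enums and a
    boolean for whether managed requirements allow full access. The
    function name, the `OnRequest` variant, and the error text are
    illustrative and are not the actual `Config` code.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ApprovalPolicy {
        Never,
        OnRequest, // illustrative stand-in for any policy that can prompt
    }
    
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxMode {
        ReadOnly,
        DangerFullAccess,
    }
    
    /// Applies the managed-requirements fallback, rejecting the dead-end case.
    fn apply_requirements_fallback(
        requested: SandboxMode,
        full_access_allowed: bool,
        approval: ApprovalPolicy,
    ) -> Result<SandboxMode, String> {
        if requested != SandboxMode::DangerFullAccess || full_access_allowed {
            return Ok(requested); // Nothing to constrain.
        }
        if approval == ApprovalPolicy::Never {
            // Read-only with approvals disabled could never make progress.
            return Err(
                "requirements force read-only, but approval_policy = \"never\" \
                 cannot ask to proceed; change one of the two settings"
                    .to_string(),
            );
        }
        Ok(SandboxMode::ReadOnly)
    }
    ```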
  • Use named MITM permissions config (#18240)
    ## Stack
    1. Parent PR: #18868 adds MITM hook config and model only.
    2. Parent PR: #20659 wires hook enforcement into the proxy request path.
    3. This PR changes the user-facing PermissionProfile TOML shape.
    
    ## Why
    1. The broader goal is to make MITM clamping usable from the same
    permission profile that already controls network behavior.
    2. This PR is the config UX layer for the stack. It moves MITM policy
    into `[permissions.<profile>.network.mitm]` instead of exposing the flat
    runtime shape to users.
    3. The named hook and action tables belong here because users need
    reusable policy blocks that are easy to review, while the proxy runtime
    only needs a flat hook list.
    4. This PR validates action refs during config parsing so mistakes in
    the user-facing policy fail before a proxy session starts.
    5. Keeping the lowering here lets the proxy keep its simpler runtime
    model and lets PermissionProfile remain the single source of network
    permission policy.
    
    ## Summary
    1. Keep MITM policy inside `[permissions.<profile>.network.mitm]` so the
    selected PermissionProfile owns network proxy policy.
    2. Use named MITM hooks under
    `[permissions.<profile>.network.mitm.hooks.<name>]`.
    3. Put host, methods, path prefixes, query, headers, body, and action
    refs on the hook table.
    4. Define reusable action blocks under
    `[permissions.<profile>.network.mitm.actions.<name>]`.
    5. Represent action blocks with `NetworkMitmActionToml`, then lower them
    into the proxy runtime action config.
    6. Reject unknown refs, empty refs, and empty action blocks during
    config parsing.
    7. Keep the runtime hook model unchanged by lowering config into the
    existing proxy hook list.
    8. Preserve the #20659 activation fix for nested MITM policy.
    
    ## Example
    ```toml
    [permissions.workspace.network.mitm]
    enabled = true
    
    [permissions.workspace.network.mitm.hooks.github_write]
    host = "api.github.com"
    methods = ["POST", "PUT"]
    path_prefixes = ["/repos/openai/"]
    action = ["strip_auth"]
    
    [permissions.workspace.network.mitm.actions.strip_auth]
    strip_request_headers = ["authorization"]
    ```
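    
    The parse-time checks on a config like the one above could be sketched
    as follows. The structs are simplified stand-ins, not
    `NetworkMitmActionToml`, and an action block is treated as empty when
    `strip_request_headers` is empty, which is an assumption.
    
    ```rust
    use std::collections::HashMap;
    
    /// Simplified hook table: only the action refs are modeled.
    struct MitmHook {
        action: Vec<String>,
    }
    
    /// Simplified action block: only header stripping is modeled.
    struct MitmAction {
        strip_request_headers: Vec<String>,
    }
    
    /// Rejects empty action blocks, empty refs, and refs to unknown actions.
    fn validate_mitm(
        hooks: &HashMap<String, MitmHook>,
        actions: &HashMap<String, MitmAction>,
    ) -> Result<(), String> {
        for (name, action) in actions {
            if action.strip_request_headers.is_empty() {
                return Err(format!("action `{name}` is empty"));
            }
        }
        for (hook_name, hook) in hooks {
            for action_ref in &hook.action {
                if action_ref.is_empty() {
                    return Err(format!("hook `{hook_name}` has an empty action ref"));
                }
                if !actions.contains_key(action_ref) {
                    return Err(format!(
                        "hook `{hook_name}` references unknown action `{action_ref}`"
                    ));
                }
            }
        }
        Ok(())
    }
    ```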
    
    ## Validation
    1. Regenerated the config schema.
    2. Ran the core MITM config parsing and validation tests.
    3. Ran the core PermissionProfile MITM proxy activation tests.
    4. Ran the core config schema fixture test.
    5. Ran the network proxy MITM policy tests.
    6. Ran the scoped Clippy fixer for the network proxy crate.
    7. Ran the scoped Clippy fixer for the core crate.
    
    ---------
    
    Co-authored-by: Winston Howes <winston@openai.com>
  • [codex] Add plugin id to MCP tool call items (#23737)
    Add the owning plugin id to MCP tool call items so we can better filter
    them at the plugin level.
    
    ## Summary
    - add optional `plugin_id` to MCP tool-call items and legacy begin/end
    events
    - propagate plugin metadata into emitted core items and app-server v2
    `ThreadItem::McpToolCall`
    - preserve plugin ids through app-server replay/redaction paths and
    regenerate v2 schema fixtures
    
    ## Testing
    - `just write-app-server-schema`
    - `just fmt`
    - `just fix -p codex-core`
    - `cargo test -p codex-protocol -p codex-app-server-protocol`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib`
    - `cargo check -p codex-tui --tests`
    - `cargo check -p codex-app-server --tests`
    - `git diff --check`
    
    ## Notes
    - `just fix -p codex-core` completed with two non-fatal
    `too_many_arguments` warnings on the touched MCP notification helpers.
    - A broader `cargo test -p codex-core` run passed core unit tests, then
    hit shell/sandbox/snapshot failures in the integration target.
    - A broader app-server downstream run hit the existing
    `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack
    overflow; `cargo test -p codex-exec` also hit the existing sandbox
    expectation mismatch in
    `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
  • ci: run Codex package builder tests (#23760)
    ## Why
    
    #23752 and #23759 add Python unit tests for the Codex package builder,
    but the root CI workflow did not run tests under
    `scripts/codex_package`. That left the `zstd` resolution and
    prebuilt-resource packaging behavior covered locally without a CI check.
    
    ## What changed
    
    - Add a root CI step in `.github/workflows/ci.yml` that runs `python3 -m
    unittest discover -s scripts/codex_package -p "test_*.py"`.
    - Keep the step with the existing Python verification checks before
    Node/pnpm setup.
    
    ## Verification
    
    - `python3 -m unittest discover -s scripts/codex_package -p "test_*.py"`
    - `python3 -m py_compile scripts/codex_package/*.py`