Commit Graph

103 Commits

  • Warn for structured feature toggles (#27076)
    ## Summary
    Startup warnings for under-development features only recognized bare
    boolean toggles like `features.foo = true`. An upcoming feature will use
    table-format config, so `features.foo = { enabled = true, ... }` needs
    to count as an explicit opt-in too.
    
    This updates the warning predicate to recognize structured tables with
    `enabled = true`, while leaving tables without that field unwarned.
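
    A minimal sketch of the predicate described above. The `FeatureToggle`
    enum and `is_explicit_opt_in` function are hypothetical names used for
    illustration, not the actual types in `codex-features`:

    ```rust
    /// Hypothetical model of a `features.<name>` value (not the crate's real type).
    #[derive(Debug, Clone, PartialEq)]
    pub enum FeatureToggle {
        /// Bare form: `features.foo = true`
        Bool(bool),
        /// Table form: `features.foo = { enabled = true, ... }`.
        /// `enabled` is `None` when the table omits the field.
        Table { enabled: Option<bool> },
    }

    /// Returns true when the config explicitly opts in, in either format.
    /// Tables without `enabled = true` do not count, so they stay unwarned.
    pub fn is_explicit_opt_in(toggle: &FeatureToggle) -> bool {
        match toggle {
            FeatureToggle::Bool(value) => *value,
            FeatureToggle::Table { enabled } => *enabled == Some(true),
        }
    }
    ```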
    
    ## Testing
    - `just fmt`
    - `just test -p codex-features
    unstable_warning_event_mentions_enabled_structured_under_development_feature`
  • feat: add secret auth storage configuration (#27504)
    ## Why
    
    Windows Credential Manager limits generic credential blobs to 2,560
    bytes. The encrypted local secrets backend avoids storing large
    serialized auth payloads directly in the OS keyring, but selecting that
    backend needs an independently reviewable feature/config layer before
    the auth and secrets implementation is wired in.
    
    ## What Changed
    
    - Added the stable `secret_auth_storage` feature, enabled by default on
    Windows and disabled by default elsewhere.
    - Added `AuthKeyringBackendKind` and config resolution for full and
    bootstrap config loading.
    - Applied managed feature requirements when resolving the bootstrap auth
    backend.
    - Updated the generated config schema and added focused tests.
    
    This is the base PR for #17931. The auth, secrets, MCP, CLI, TUI, and
    app-server implementation remains in that follow-up PR.
    
    ## Validation
    
    - `just test -p codex-features`
    - `just test -p codex-config`
    - `just test -p codex-core
    resolve_bootstrap_auth_keyring_backend_kind_uses_secret_auth_storage_feature`
    - `just write-config-schema`
    - `just fix -p codex-core`
    
    The full `just test -p codex-core` run compiled successfully and ran
    2,690 tests; 2,589 passed, one was flaky, and 101 environment-sensitive
    tests failed because this shell injects a `pyenv` rehash warning into
    command output or because sandboxed subprocesses timed out.
  • core: enable remote compaction v2 by default (#27573)
    ## Why
    
    Remote compaction v2 is ready to become the default for providers that
    already support remote compaction. Leaving it behind an
    under-development opt-in keeps eligible sessions on the legacy
    remote-compaction path.
    
    This does not broaden provider eligibility: OpenAI and Azure move to v2,
    while Bedrock and OSS providers retain their existing local-compaction
    behavior.
    
    ## What changed
    
    - Mark `remote_compaction_v2` stable and enable it by default.
    - Make tests that intentionally cover legacy remote compaction
    explicitly disable v2.
    - Update parity coverage so v2 exercises the production default and only
    legacy mode opts out.
    
    ## Verification
    
    - `just test -p codex-core
    auto_compact_runs_after_resume_when_token_usage_is_over_limit
    auto_compact_counts_encrypted_reasoning_before_last_user
    auto_compact_runs_when_reasoning_header_clears_between_turns
    responses_lite_compact_request_uses_lite_transport_contract`
  • [codex] Add token budget context feature (#27438)
    ## Why
    
    The model should be able to see bounded context-window budget metadata
    when the `token_budget` feature is enabled. The full-window message is
    only injected with full context, while normal turns get a smaller
    follow-up only when reported usage first crosses a budget threshold.
    
    ## What changed
    
    - Added the `TokenBudget` feature flag.
    - Added `<token_budget>` developer fragments for full context-window
    metadata and current-window remaining tokens.
    - Inserted the threshold message during normal turn handling by
    comparing token usage before and after sampling, avoiding persistent
    threshold bookkeeping.
    - Added core integration coverage for full-context-only metadata and
    25/50/75 percent threshold messages.
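
    The before/after comparison can be sketched roughly as follows. This
    assumes thresholds are measured as a percentage of the context window
    that has been used; the function name and the `window` parameter are
    hypothetical, and the real implementation may differ:

    ```rust
    /// Percent thresholds named in the PR's test coverage.
    const THRESHOLDS_PERCENT: [u64; 3] = [25, 50, 75];

    /// Returns the highest threshold that usage crossed during this sampling
    /// step, or `None` if no threshold was newly crossed. Because only the
    /// usage before and after sampling is compared, no state has to be stored
    /// between turns.
    pub fn newly_crossed_threshold(before: u64, after: u64, window: u64) -> Option<u64> {
        if window == 0 {
            return None;
        }
        THRESHOLDS_PERCENT.iter().rev().copied().find(|&pct| {
            // Assumption: the limit is a share of used tokens, rounded down.
            let limit = window.saturating_mul(pct) / 100;
            before < limit && after >= limit
        })
    }
    ```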
    
    ## Verification
    
    - `just test -p codex-core token_budget`
    - `git diff --check`
  • core: resize all history images behind a feature flag (#27247)
    ## Summary
    
    Adds complete client-side image preparation behind the default-off
    `resize_all_images` feature flag.
    
    When enabled, local image producers defer decoding and resizing. Images
    are prepared centrally before insertion into conversation history,
    covering user input, `view_image`, and structured tool-output images.
    
    ## Behavior
    
    - Processes base64 `data:` images in messages and function/custom tool
    outputs.
    - Leaves non-data URLs, including HTTP(S) URLs, unchanged.
    - Applies image-detail budgets:
      - `high` and omitted: 2048px maximum dimension and 2.5K 32px patches.
      - `original`: 6000px maximum dimension and 10K 32px patches.
      - `auto`: uses the same 2048px / 2.5K-patch budget as high.
      - `low`: unsupported and replaced with an actionable placeholder.
    - Preserves original image bytes when no resize or format conversion is
    needed.
    - Enforces the shared 1 GiB encoded and decoded data-URL sanity limits.
    - Replaces only an image that fails preparation, preserving sibling
    content and tool-output metadata.
    - Uses bounded placeholders distinguishing generic processing failures,
    oversized images, and unsupported `low` detail.
    - Prepares resumed and forked history before installing it as live
    history without modifying persisted rollouts.
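
    The image-detail budgets listed above could be modeled along these
    lines. All type and function names are hypothetical; reading "2.5K" and
    "10K" as 2,500 and 10,000 patches, and counting patches by rounding each
    side up to a multiple of 32px, are assumptions:

    ```rust
    /// Hypothetical enum for the image-detail setting.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Detail {
        High,
        Original,
        Auto,
        Low,
    }

    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Budget {
        pub max_dimension: u32,
        pub max_patches: u64,
    }

    /// Maps a detail setting to its budget; `None` input means "omitted".
    /// Returns `None` for `low`, which is unsupported and gets a placeholder.
    pub fn budget_for(detail: Option<Detail>) -> Option<Budget> {
        match detail {
            None | Some(Detail::High) | Some(Detail::Auto) => Some(Budget {
                max_dimension: 2048,
                max_patches: 2_500, // assumed reading of "2.5K"
            }),
            Some(Detail::Original) => Some(Budget {
                max_dimension: 6000,
                max_patches: 10_000, // assumed reading of "10K"
            }),
            Some(Detail::Low) => None,
        }
    }

    /// Number of 32px patches needed to cover the image (assumed counting rule).
    pub fn patch_count(width: u32, height: u32) -> u64 {
        let patches = |side: u32| (u64::from(side) + 31) / 32;
        patches(width) * patches(height)
    }

    /// True when the image already satisfies both limits of the budget.
    pub fn fits(width: u32, height: u32, budget: Budget) -> bool {
        width.max(height) <= budget.max_dimension
            && patch_count(width, height) <= budget.max_patches
    }
    ```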
    
    ## Flag-Off Behavior
    
    When `resize_all_images` is disabled:
    
    - Existing local user-input and `view_image` processing remains
    unchanged.
    - Existing decoding and error behavior remains unchanged.
    - Arbitrary tool-output images are not processed.
    - HTTP(S) image URLs continue to be forwarded unchanged.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    -  `1` https://github.com/openai/codex/pull/27245
    - 👉 `2` https://github.com/openai/codex/pull/27247
    -  `3` https://github.com/openai/codex/pull/27246
    -  `4` https://github.com/openai/codex/pull/27266
  • [codex] remove blocking external agent migration flow (#27064)
    ## Why
    
    External-agent import should be initiated deliberately instead of
    interrupting eligible TUI startups. This cleanup removes the blocking
    startup flow before the replacement import experience is introduced
    later in the stack.
    
    ## What changed
    
    - remove the startup-blocking external-agent migration prompt
    - remove the now-unused external migration feature gate
    - remove the obsolete TUI app-server migration wrappers
    - retain the dormant picker behind a module-scoped dead-code allowance
    until the next stack item wires it back in
    - keep normal TUI startup focused on entering Codex immediately
    
    ## Validation
    
    - `bazel build --config=clippy //codex-rs/tui:tui
    //codex-rs/tui:tui-unit-tests-bin`
    - `just test -p codex-tui external_agent_config_migration` (8 passed)
    - `just test -p codex-tui` (2,786 passed, 12 unrelated local
    environment-sensitive failures, 4 skipped)
    - `just fix -p codex-tui`
    - `just fmt`
    
    ## Stack
    
    1. [#27064](https://github.com/openai/codex/pull/27064): remove the
    startup migration flow
    2. [#27065](https://github.com/openai/codex/pull/27065): extract the
    picker renderer
    3. [#27070](https://github.com/openai/codex/pull/27070): add the
    external-agent import picker UX
    4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow
    through `/import`
    
    **This PR is stack item 1.**
  • Use plugin-service MCP as the hosted plugin runtime (#27198)
    ## Stack
    
    - Base: #27191
    - This PR is the third vertical and should be reviewed against
    `jif/external-plugins-2`, not `main`.
    
    ## Why
    
    #27191 moves the host-owned Apps MCP registration behind an extension
    contributor, but deliberately preserves the existing endpoint-selection
    feature while that contribution contract lands. App-server can therefore
    resolve the server through extensions, yet the hosted plugin endpoint is
    still selected through temporary `apps_mcp_path_override` plumbing.
    
    That is not the long-term plugin model. A plugin can bundle skills,
    connectors, MCP servers, and hooks, and those components do not all need
    the same source or execution environment. In particular, an
    authenticated HTTP MCP server can expose plugin capabilities directly
    from a backend without an executor or an orchestrator filesystem.
    
    This PR completes that hosted vertical. App-server's MCP extension now
    owns the aggregate hosted plugin runtime at `/ps/mcp`. Connector actions
    continue to arrive as MCP tools, while backend-provided skills arrive as
    MCP resources and use Codex's existing resource list/read paths. No
    second backend client, skill filesystem, or generic plugin activation
    framework is introduced.
    
    The backend route remains the hosted implementation. This change
    replaces Codex's temporary endpoint-selection mechanism, not the service
    behind the endpoint.
    
    ## What changed
    
    ### Hosted plugin runtime
    
    The MCP extension now contributes `codex_apps` as the hosted plugin
    runtime rather than as a configurable Apps endpoint:
    
    - `https://chatgpt.com` resolves to
    `https://chatgpt.com/backend-api/ps/mcp`;
    - a bare custom ChatGPT base resolves to `/api/codex/ps/mcp`;
    - the existing product-SKU header and ChatGPT authentication behavior
    are preserved;
    - executor availability is never consulted for this streamable HTTP
    transport.
    
    The same MCP connection carries both component shapes supported by the
    hosted endpoint:
    
    - connector actions are discovered and invoked as MCP tools;
    - hosted skills are enumerated and read as MCP resources through the
    existing `list_mcp_resources` and `read_mcp_resource` paths.
    
    This keeps component access in the subsystem that already owns the
    protocol instead of downloading backend skills into an orchestrator
    filesystem or inventing a parallel hosted-skill client.
    
    ### Explicit runtime ordering
    
    `McpManager` now resolves the reserved `codex_apps` entry in three
    ordered phases:
    
    1. install the legacy Apps fallback for compatibility;
    2. apply ordered extension `Set` or `Remove` overlays;
    3. apply the final ChatGPT-auth gate without synthesizing the server
    again.
    
    This ordering is important:
    
    - an ordinary configured or plugin MCP server cannot claim the
    auth-bearing `codex_apps` name;
    - an extension-contributed hosted runtime wins over the fallback;
    - an extension `Remove` remains authoritative;
    - a host without the MCP extension retains the legacy Apps endpoint and
    current local-only behavior.
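
    A simplified sketch of the three phases, assuming overlays are applied
    in order and the auth gate only filters the result. The `Server` and
    `Overlay` types and the function name are hypothetical, and the
    Apps-enabled check is left out for brevity:

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    pub struct Server {
        pub url: String,
    }

    /// Hypothetical overlay contributed by an extension.
    pub enum Overlay {
        Set(Server),
        Remove,
    }

    /// Resolves the reserved `codex_apps` entry.
    pub fn resolve_codex_apps(
        legacy_fallback: Server,
        overlays: &[Overlay],
        chatgpt_auth: bool,
    ) -> Option<Server> {
        // Phase 1: install the legacy Apps fallback for compatibility.
        let mut entry = Some(legacy_fallback);

        // Phase 2: apply extension overlays in order; the last one wins.
        for overlay in overlays {
            entry = match overlay {
                Overlay::Set(server) => Some(server.clone()),
                Overlay::Remove => None,
            };
        }

        // Phase 3: the auth gate can only drop the entry, never re-create it,
        // so a `Remove` from phase 2 stays in effect.
        if chatgpt_auth {
            entry
        } else {
            None
        }
    }
    ```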
    
    The temporary `legacy_apps_mcp_loader_enabled` coordination flag is no
    longer needed.
    
    ### Remove the path override
    
    The `apps_mcp_path_override` feature and its runtime plumbing are
    removed, including:
    
    - the feature registry entry and structured feature config;
    - `Config` and `McpConfig` fields;
    - config schema output;
    - config-lock materialization;
    - URL override handling in `codex-mcp`.
    
    Existing boolean and structured forms still deserialize as ignored
    compatibility input. They are omitted from new serialized config, and
    config-lock comparison normalizes the removed input so older locks
    remain replayable.
    
    ### App-server coverage
    
    App-server MCP fixtures now serve the hosted route at
    `/api/codex/ps/mcp`. Existing resource-read and tool/elicitation flows
    therefore exercise the extension-owned endpoint rather than succeeding
    through the legacy fallback.
    
    The stack also adds the missing `codex_chatgpt::connectors` re-export
    for the manager-backed connector helper introduced in #27191.
    
    ## Compatibility
    
    - App-server installs the extension and uses `/ps/mcp` for the hosted
    runtime.
    - CLI and other hosts that do not install the extension retain the
    legacy Apps endpoint.
    - Disabling Apps or using non-ChatGPT authentication removes
    `codex_apps` from the effective runtime view.
    - Existing local plugins, local skills, executor-selected skills,
    configured MCP servers, and MCP OAuth behavior are otherwise unchanged.
    - Backend plugin enablement remains account/workspace state owned by the
    hosted endpoint; this PR does not add thread-local backend plugin
    selection.
    
    ## Architectural fit
    
    The stack now proves two independent runtime shapes:
    
    1. #27184 resolves filesystem-backed skills through the executor that
    owns a selected root.
    2. #27191 and this PR resolve a backend-hosted HTTP MCP through an
    extension with no executor.
    
    Together they preserve the intended separation:
    
    - selection identifies a plugin/root when explicit selection is needed;
    - each component's owning extension resolves its concrete access
    mechanism;
    - execution stays with the runtime required by that component;
    - existing skills, MCP, connector, and hook subsystems remain the
    downstream consumers.
    
    ## Planned follow-ups
    
    1. **Executor stdio MCP:** selecting an executor plugin registers a
    manifest-declared stdio MCP server and executes it in the environment
    that owns the plugin.
    2. **Optional backend selection:** only if CCA needs thread-local
    selection distinct from backend account/workspace enablement, add a
    concrete backend-owned capability location and surface those selected
    skills through the skills catalog.
    3. **Connector metadata and hooks:** activate those plugin components
    through their existing owning subsystems, with executor hooks remaining
    environment-bound.
    4. **Propagation and persistence:** define explicit resume, fork,
    subagent, refresh, and environment-removal semantics once selected roots
    have multiple real consumers.
    5. **Local convergence:** migrate legacy local skill, MCP, connector,
    and hook paths behind their owning extensions one vertical at a time,
    then remove duplicate core managers and compatibility plumbing after
    parity.
    
    ## Verification
    
    Coverage in this change exercises:
    
    - extension-owned `/backend-api/ps/mcp` registration without an
    executor;
    - preservation of the legacy endpoint in hosts without the extension;
    - extension `Set` and `Remove` precedence over the legacy fallback;
    - ChatGPT-auth gating for the reserved server;
    - hosted MCP resource reads with and without an active thread;
    - connector tool invocation and MCP elicitation through the hosted
    route;
    - ignored boolean and structured forms of the removed path override;
    - config-lock replay compatibility for the removed feature.
    
    `cargo check -p codex-features -p codex-mcp-extension -p
    codex-app-server` passes. Tests and Clippy were not run locally under
    the current development instruction; CI provides the full validation
    pass.
  • [codex] Gate terminal visualization instructions in TUI (#26013)
    ## Summary
    - add `Feature::TerminalVisualizationInstructions` as
    `UnderDevelopment`, disabled by default
    - keep terminal visualization instructions inside the TUI package
    - append them to existing developer instructions for TUI start, resume,
    and fork flows only when enabled
    - intentionally do not apply them to `codex exec`
    
    ## Rollout
    Control behavior is unchanged. TUI dogfooders can enable
    `terminal_visualization_instructions`; no default user receives the new
    terminal-specific instructions.
    
    The shared visualization-selection rule is supplied separately through
    the `codex_proxy_model_3` Statsig layer for every target Codex model
    slug in the gated cohort. This TUI feature determines how to render an
    appropriate visualization on the terminal surface; the model-layer
    treatment determines when to use one.
    
    ## Validation
    - `cargo test -p codex-tui
    terminal_visualization_instructions_are_gated_for_all_tui_thread_flows
    --lib`
    - `cargo test -p codex-features --lib`
    - `cargo fmt --all -- --check`
    - `git diff --check`
    - GPT-5.4 and GPT-5.5 real prompt-pipeline smoke tests: both visualized
    the positive mapping case, abstained on the negative route case, and
    passed exact prompt-stack verification on CLI and App
    - refreshed onto current `main` with a clean merge and reran the focused
    validation
    
    The full 53-probe all-model treatment comparison and requested
    production coding evals remain rollout gates before broadening beyond
    the initial employee cohort.
    
    This PR remains open for normal human review.
  • Remove response.processed websocket request (#26447)
    ## Why
    
    The Responses websocket client no longer needs to send a follow-up
    `response.processed` request after a turn response has already been
    recorded. Keeping that extra acknowledgement path adds feature-gated
    control flow and a second websocket request shape that no longer carries
    useful behavior.
    
    ## What Changed
    
    - Removed the `response.processed` websocket request type and sender.
    - Removed the `responses_websocket_response_processed` feature flag and
    schema entry.
    - Removed turn and remote-compaction plumbing that only tracked response
    IDs to send the acknowledgement.
    - Removed tests that existed solely to cover the deleted feature path.
    
    ## Validation
    
    - `just fix -p codex-core -p codex-api -p codex-features`
  • core: allow excluding tool namespaces from code mode (#26320)
    ## Why
    
    Research and training setups need to control which tool namespaces
    appear inside code mode's nested `tools` surface without disabling those
    tools entirely. This makes it possible to train against a deliberately
    reduced nested-tool setup while preserving the normal direct and
    deferred tool paths.
    
    ## What
    
    - Extend `features.code_mode` to accept structured configuration while
    preserving the existing boolean syntax.
    - Add an exact `excluded_tool_namespaces` list under
    `[features.code_mode]`:
    
      ```toml
      [features.code_mode]
      enabled = true
      excluded_tool_namespaces = ["mcp__codex_apps", "multi_agent_v1"]
      ```
    
    - Filter matching canonical `ToolName` namespaces when constructing code
    mode's nested router and code-mode-specific direct tool descriptions.
    - Keep excluded tools registered, directly exposed in mixed code mode,
    and discoverable through top-level `tool_search` when otherwise
    eligible.
    - Derive deferred nested-tool guidance after namespace filtering so the
    `exec` description does not advertise excluded-only deferred tools.
    - Preserve the boolean/table representation when materializing config
    locks and update the generated config schema.
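
    The exact-match namespace filter could look roughly like this. The
    `ToolName` struct here is a simplified stand-in for the real type, and
    `nested_tools` is a hypothetical helper:

    ```rust
    /// Simplified stand-in for the canonical tool name (assumed shape).
    #[derive(Debug, Clone, PartialEq)]
    pub struct ToolName {
        pub namespace: Option<String>,
        pub name: String,
    }

    /// Returns the tools that remain visible in code mode's nested `tools`
    /// surface. Matching is exact, so excluding `mcp__codex_apps` does not
    /// affect a namespace that merely shares that prefix. The input slice is
    /// left untouched, mirroring how excluded tools stay registered elsewhere.
    pub fn nested_tools<'a>(tools: &'a [ToolName], excluded: &[&str]) -> Vec<&'a ToolName> {
        tools
            .iter()
            .filter(|tool| match tool.namespace.as_deref() {
                Some(namespace) => !excluded.iter().any(|e| *e == namespace),
                None => true,
            })
            .collect()
    }
    ```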
    
    ## Testing
    
    - `just test -p codex-features`
    - `just test -p codex-config`
    - `just test -p codex-core load_config_resolves_code_mode_config`
    - `just test -p codex-core
    lock_contains_prompts_and_materializes_features`
    - `just test -p codex-core
    excluded_deferred_namespaces_do_not_enable_nested_tool_guidance`
    - `just test -p codex-core
    code_mode_excludes_configured_nested_tool_namespaces`
    - `cargo check -p codex-thread-manager-sample`
  • feat: gate unified exec zsh fork composition (#24979)
    ## Why
    
    `shell_zsh_fork` and unified exec need to remain independently
    controllable for enterprise rollouts, but we also need a third mode that
    composes them. That composed mode is intended to preserve unified exec
    command lifecycle support while letting the zsh fork provide more
    accurate `execv(2)` interception.
    
    Enabling `unified_exec_zsh_fork` by itself is intentionally not
    sufficient. It is a composition gate, not a dependency-enabling
    shortcut:
    
    - `unified_exec` selects the PTY-backed unified exec tool.
    - `shell_zsh_fork` opts into the zsh fork backend.
    - `unified_exec_zsh_fork` only allows those two already-enabled modes to
    be composed so local zsh unified exec commands can launch through the
    zsh fork.
    
    This separation is deliberate. Enterprises and staged rollouts must be
    able to enable or disable unified exec and zsh-fork independently. If
    `unified_exec_zsh_fork` implied either dependency, then enabling one
    under-development composition flag would silently activate a shell
    backend that the configured feature set left disabled.
    
    This PR introduces only the configuration and planning gate for that
    composition. Existing `shell_zsh_fork` behavior continues to use the
    standalone shell tool unless the new composition feature is explicitly
    enabled alongside both dependencies.
    
    ## What Changed
    
    - Added the under-development feature flag `unified_exec_zsh_fork`.
    - Added `UnifiedExecFeatureMode` so the three input feature flags
    collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool
    planning.
    - Updated tool selection so zsh-fork composition requires
    `unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
    - Kept the existing standalone zsh-fork shell tool behavior when only
    `shell_zsh_fork` is enabled.
    - Updated config schema output for the new feature flag.
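
    A rough sketch of how the three flags might collapse into one mode. The
    enum name and variants come from the description above; the function
    name is hypothetical, and treating `unified_exec` plus `shell_zsh_fork`
    without the composition gate as `Direct` is an assumption that the text
    does not spell out:

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum UnifiedExecFeatureMode {
        Disabled,
        Direct,
        ZshFork,
    }

    /// Collapses the three input feature flags before tool planning.
    pub fn unified_exec_mode(
        unified_exec: bool,
        shell_zsh_fork: bool,
        unified_exec_zsh_fork: bool,
    ) -> UnifiedExecFeatureMode {
        match (unified_exec, shell_zsh_fork, unified_exec_zsh_fork) {
            // Composition requires all three flags.
            (true, true, true) => UnifiedExecFeatureMode::ZshFork,
            // Assumption: any other combination with unified exec on is direct.
            (true, _, _) => UnifiedExecFeatureMode::Direct,
            // The composition gate never enables unified exec by itself.
            (false, _, _) => UnifiedExecFeatureMode::Disabled,
        }
    }
    ```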
    
    ## Verification
    
    - Added feature and tool-config coverage for the new gate.
    - Added planner coverage proving `shell_zsh_fork` remains standalone
    until composition is explicitly enabled.
    - Ran focused tests for `codex-features`, `codex-tools`, and the
    affected `codex-core` planner case.
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
    * #24982
    * #24981
    * #24980
    * __->__ #24979
  • Compress cold local rollouts (#25089)
    ## Rollout compression stack
    
    This stack splits #24941 into reviewable steps for local rollout
    compression. The design is intentionally staged:
    
    1. Teach readers, listing, search, and lookup to understand compressed
    rollouts.
    2. Make append and resume paths materialize compressed rollouts back to
    plain JSONL before writing.
    3. Add a disabled-by-default worker that can compress cold archived
    rollouts behind `local_thread_store_compression`.
    
    The key invariant is that writers append to plain `.jsonl`. A
    `.jsonl.zst` file is a cold/read representation; if a write is needed,
    the compressed file is materialized back to plain JSONL first. Readers
    prefer plain `.jsonl` when both forms exist and can fall back to the
    compressed sibling during transitions.
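
    The reader-side preference can be sketched as below. The function name
    is hypothetical, and the existence check is passed in as a closure so
    the example stays independent of the filesystem:

    ```rust
    use std::path::{Path, PathBuf};

    /// Picks the file a reader should open for a rollout: the plain `.jsonl`
    /// when it exists, otherwise the `.jsonl.zst` sibling, otherwise nothing.
    pub fn resolve_rollout_path(
        plain: &Path,
        exists: impl Fn(&Path) -> bool,
    ) -> Option<PathBuf> {
        if exists(plain) {
            return Some(plain.to_path_buf());
        }
        // Build the compressed sibling by appending `.zst` to the full name.
        let mut compressed = plain.as_os_str().to_os_string();
        compressed.push(".zst");
        let compressed = PathBuf::from(compressed);
        if exists(&compressed) {
            Some(compressed)
        } else {
            None
        }
    }
    ```

    In real use the closure would typically be `|p| p.exists()`.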
    
    The worker is deliberately the last PR and remains behind an
    under-development feature flag. It currently scans only
    `archived_sessions`, not active `sessions`, because active sessions have
    the highest resume/append race risk. That means this stack does not yet
    compress most unarchived local history.
    
    ## Known race / follow-up
    
    The remaining unresolved design question is writer/compressor
    coordination. Even for archived rollouts, a resume or metadata update
    can append while the worker is replacing the plain file with
    `.jsonl.zst`; the current double-stat checks narrow but do not fully
    eliminate the window where a writer has opened the plain file before
    unlink. Do not treat the worker PR as production-ready until we either:
    
    - prevent append/resume paths from racing archived compression, or
    - introduce a shared representation/append lock or equivalent
    coordination.
    
    The first two PRs are useful independently: they make compressed
    rollouts readable and make append paths safely recover back to plain
    JSONL. The third PR isolates the worker behavior so that coordination
    issue is reviewable separately.
    
    ## Validation
    
    Focused local validation for the stack includes:
    
    - `just test -p codex-rollout`
    - `just test -p codex-thread-store` where thread-store paths were
    touched
    - `just test -p codex-features` for the feature flag slice
    - `just bazel-lock-check` after dependency graph changes
    - scoped `just fix -p ...` passes for changed crates
    
    CI is still the source of truth for the full platform matrix.
    
    ## This PR in the stack
    
    This is PR 3/3, based on #25088. It adds the under-development feature
    flag and starts the best-effort background worker when enabled. The
    worker currently compresses only cold archived rollouts, skips active
    sessions, verifies compressed output, preserves mtime and permissions,
    keeps a store-level lock heartbeat, and cleans stale temp files.
    
    Stack order:
    
    1. #25087: read compressed local rollouts.
    2. #25088: materialize compressed rollouts before append.
    3. This PR: add the disabled local compression worker.
  • fix(config): use deny for Unix socket permissions (#24970)
    ## Why
    
    Unix socket permissions still accepted and displayed `"none"` while file
    permissions use the clearer `"deny"` spelling. This keeps network Unix
    socket policy vocabulary consistent with filesystem policy vocabulary.
    
    ## What changed
    
    - Replace the Unix socket permission variant and serialized spelling
    from `none` to `deny` across config, feature configuration, and network
    proxy types.
    - Update app-server v2 serialization, TUI debug output, focused tests,
    and generated schemas to expose `"deny"`.
    - Add coverage for denied Unix socket entries in managed requirements
    and profile overlay behavior.
    
    ## Security
    
    This is a vocabulary change for explicit Unix socket rejection, not a
    network access expansion. Denied entries continue to be omitted from the
    effective allowlist.
    
    ## Validation
    
    - `just fmt`
    - `just write-config-schema`
    - `just write-app-server-schema`
    - `just test -p codex-config -p codex-core -p codex-app-server-protocol
    -p codex-tui -E
    'test(network_requirements_are_preserved_as_constraints_with_source) |
    test(network_permission_containers_project_allowed_and_denied_entries) |
    test(network_toml_overlays_unix_socket_permissions_by_path) |
    test(permissions_profiles_resolve_extends_parent_first_with_child_overrides)
    | test(network_requirements_serializes_canonical_and_legacy_fields) |
    test(debug_config_output_formats_unix_socket_permissions)'`
    - Automatic `bench-smoke` follow-up from `just test`
    - `cargo clippy -p codex-config -p codex-core -p codex-features -p
    codex-network-proxy -p codex-app-server-protocol -p codex-app-server -p
    codex-tui --all-targets -- -D warnings`
  • Add feature-gated standalone image generation extension (#24723)
    ## Why
    
    Add a standalone image generation path that can be exercised
    independently of hosted Responses image generation, while retaining the
    hosted tool as fallback unless the extension is actually available to
    the model.
    
    ## What changed
    
    - Added the `codex-image-generation-extension` crate with standalone
    generate/edit execution, prior-image selection for edits, model-visible
    image output, and local generated-image persistence.
    - Installed the extension in app-server behind the disabled-by-default
    `imagegenext` feature and backend eligibility checks.
    - Updated core tool planning so eligible `image_gen.imagegen` exposure
    replaces hosted `image_generation`, while unavailable configurations
    retain hosted fallback.
    - Added coverage for extension behavior, edit history reuse, feature
    gating, auth eligibility, and hosted-tool replacement.
    - The extension is installed through app-server only in this PR; other
    execution paths retain hosted image generation because hosted
    replacement occurs only when the standalone executor is actually
    registered and model-visible.
    - The initial extension contract intentionally fixes the image model to
    `gpt-image-2` and uses automatic image parameters.
    - Native generated-image history/card parity and rollout persistence
    cleanup are intentionally deferred follow-up work.
    
    ## Validation
    
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-features`
    - `just test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
    - `just test -p codex-app-server`
    - `just fix -p codex-image-generation-extension -p codex-features -p
    codex-core -p codex-app-server`
    - `just fmt`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • TUI: Unified mentions tweaks + polish mentions rendering (#23363)
    This change keeps unified @mentions behind the mentions_v2 gate, moves
    the flag to under-development, and polishes mention rendering/history
    behavior.
    
    It also improves mention rendering and history round-tripping for
    plugin/tool mentions when messages are edited. Plugin selections now
    insert `@` mentions with
    better casing, and saved history preserves the visible sigil so recalled
    messages look the same as what the user typed.
    
    - Preserves `@` sigils when encoding/decoding mention history for
    tool/plugin paths.
    - Improves plugin mention insertion so display names/casing are
    reflected more cleanly in the composer.
    - Updates the composer to render user-entered plugin mentions in the
    same color as the mentions menu. This also applies to recalled/edited
    messages.
    - Left/right arrows no longer switch unified-mention search modes after
    an @mention has already been accepted (e.g., arrowing left through a
    composed message that contains @mentions).
    - Keeps bound mentions stable around punctuation, so accepted `@`
    mentions do not reopen the popup and punctuated `$` mentions still
    persist to cross-session history.
    
    **Steps to test**
    - Ensure mentions_v2 is enabled through configuration or `--enable
    mentions_v2`
    - Type `@` in the TUI composer and verify filesystem/plugin/skill
    results are displayed in the unified mentions menu.
    - Select a plugin mention from the `@` popup and confirm the inserted
    text is an `@...` mention with the expected casing, then recall/edit the
    message and confirm it still renders as `@...`.
    - Mention a skill and verify that skills still insert as `$skill`
    mentions rather than `@` mentions.
    - Verify punctuated mentions such as `@plugin.` and `($skill)` keep
    their bound mention behavior across editing and history recall.
  • standalone websearch extension (#23823)
    ## Summary
    
    Add the extension-backed standalone `web.run` tool so Codex can call the
    standalone search endpoint through the `codex-api` search client and
    return its encrypted output to Responses.
    
    - gate the new tool behind `standalone_web_search`
    - install the extension in the app-server thread registry and hide
    hosted `web_search` when standalone search is enabled for OpenAI
    providers so the two paths stay mutually exclusive
    - build search context from persisted history using a small tail
    heuristic: previous user message, assistant text between the last two
    user turns capped at about 1k tokens, and current user message
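    
    The tail heuristic in the last bullet could look roughly like the sketch
    below. The `Message` enum, `SearchContext`, `build_search_context`, and
    the whitespace-word stand-in for tokens are all assumptions made for
    illustration; the extension's actual types, tokenizer, and truncation
    direction are not shown in this PR.
    
    ```rust
    /// Hypothetical history item; the persisted history types are not shown in this PR.
    #[derive(Debug, Clone, PartialEq)]
    enum Message {
        User(String),
        Assistant(String),
    }
    
    /// Illustrative output shape for the three pieces named in the bullet above.
    #[derive(Debug, PartialEq)]
    struct SearchContext {
        previous_user: Option<String>,
        assistant_between: String,
        current_user: String,
    }
    
    /// Keep at most `max_tokens` whitespace-separated words, dropping the oldest first.
    /// Assumption: words stand in for tokens, and the most recent text is kept.
    fn cap_tail(text: &str, max_tokens: usize) -> String {
        let words: Vec<&str> = text.split_whitespace().collect();
        let start = words.len().saturating_sub(max_tokens);
        words[start..].join(" ")
    }
    
    fn user_text(message: &Message) -> Option<&str> {
        match message {
            Message::User(text) => Some(text.as_str()),
            Message::Assistant(_) => None,
        }
    }
    
    fn assistant_text(message: &Message) -> Option<&str> {
        match message {
            Message::Assistant(text) => Some(text.as_str()),
            Message::User(_) => None,
        }
    }
    
    fn build_search_context(history: &[Message], max_tokens: usize) -> Option<SearchContext> {
        // The last user message is treated as the current turn.
        let current_idx = history.iter().rposition(|m| user_text(m).is_some())?;
        let current_user = user_text(&history[current_idx])?.to_string();
    
        // The user message before it, if any, bounds the assistant text we keep.
        let previous_idx = history[..current_idx]
            .iter()
            .rposition(|m| user_text(m).is_some());
        let (previous_user, assistant_between) = match previous_idx {
            Some(prev) => {
                let between: Vec<&str> = history[prev + 1..current_idx]
                    .iter()
                    .filter_map(assistant_text)
                    .collect();
                (
                    user_text(&history[prev]).map(str::to_string),
                    cap_tail(&between.join(" "), max_tokens),
                )
            }
            None => (None, String::new()),
        };
    
        Some(SearchContext { previous_user, assistant_between, current_user })
    }
    ```
    
    Calling `build_search_context(&history, 1000)` would correspond to the
    "about 1k tokens" cap under this word-count approximation.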
    
    ## Test Plan
    
    - `cargo test -p codex-web-search-extension`
    - `cargo test -p codex-api`
    - `cargo test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
  • Move MCP tool naming mode into manager (#21576)
    ## Why
    
    The `non_prefixed_mcp_tool_names` feature should be applied where MCP
    tools become model-visible, not by remapping names later in core.
    Keeping the decision in `McpConnectionManager` construction makes
    `ToolInfo` the single shaped view that spec building, deferred tool
    search, routing, and unavailable-tool placeholders can consume directly.
    
    This also preserves the existing external behavior while the feature is
    off, and keeps the feature-on behavior for code mode and hooks explicit
    at the manager boundary.
    
    ## What Changed
    
    - Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
    into `McpConnectionManager::new`.
    - Normalize MCP `ToolInfo` names in the manager using either
    legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
    adds `mcp__` without restoring the old trailing namespace suffix.
    - Remove the core-side MCP name remapping path so specs, tool search,
    session resolution, and unavailable-tool placeholder construction use
    the manager-provided `ToolName` values directly.
    - Keep code mode flattening on the `__` namespace separator.
    - Preserve hook compatibility by giving non-prefixed MCP hook names
    legacy `mcp__...` matcher aliases.
    - Add/adjust integration and unit coverage for non-prefixed code-mode
    behavior, hook matching with the feature on and off, and manager-level
    legacy prefixing.
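    
    As a rough illustration of the naming rules in the list above, the
    sketch below models the two modes as pure string functions.
    `McpToolNameMode` is named in this PR, but the variant names, the
    helper functions, and the exact `mcp__<server>__<tool>` hook name shape
    are assumptions, not the manager's real API.
    
    ```rust
    /// `McpToolNameMode` is named in the PR; these variant names are assumptions.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum McpToolNameMode {
        LegacyPrefixed,
        NonPrefixed,
    }
    
    /// Namespace the model would see for one MCP server (hypothetical helper).
    fn namespace_for(server: &str, mode: McpToolNameMode) -> String {
        match mode {
            // The legacy path adds the `mcp__` prefix and no trailing suffix.
            McpToolNameMode::LegacyPrefixed => format!("mcp__{}", server),
            McpToolNameMode::NonPrefixed => server.to_string(),
        }
    }
    
    /// Code mode flattens namespace and tool name with the `__` separator.
    fn flatten_for_code_mode(namespace: &str, tool: &str) -> String {
        format!("{}__{}", namespace, tool)
    }
    
    /// Names a hook matcher could accept: the current name first, plus the
    /// legacy `mcp__...` alias when the current name is not already prefixed.
    fn hook_matcher_names(server: &str, tool: &str, mode: McpToolNameMode) -> Vec<String> {
        let current = flatten_for_code_mode(&namespace_for(server, mode), tool);
        let legacy = flatten_for_code_mode(
            &namespace_for(server, McpToolNameMode::LegacyPrefixed),
            tool,
        );
        if current == legacy {
            vec![current]
        } else {
            vec![current, legacy]
        }
    }
    ```
    
    Keeping the alias list next to the name-shaping code mirrors the PR's
    intent that one place decides what a tool is called.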
    
    ## Testing
    
    - `cargo test -p codex-mcp --lib`
    - `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tools -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
    - `cargo test -p codex-core --test all mcp_tool -- --nocapture`
    - `cargo test -p codex-core --test all search_tool -- --nocapture`
    - `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
    - `cargo test -p codex-core --test all
    code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
    --nocapture`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-features`
  • Remove plugin hooks feature flag (#22552)
    # Why
    
    This is a follow-up stacked on top of the `plugin_hooks` default-on
    change. Once we are comfortable making plugin hooks part of the normal
    plugin behavior, the separate feature flag stops buying us much and
    leaves extra branching/cache state behind.
    
    # What
    
    - remove the `PluginHooks` feature and generated config-schema entries
    - make plugin hook loading/listing follow plugin enablement directly
    - drop plugin-manager cache/state that only existed to distinguish
    hook-flag toggles
    - remove tests and fixtures that modeled `plugin_hooks = true/false`
  • Make goals feature on by default and no longer experimental (#23732)
    ## Why
    
    The `goals` feature is ready to be available without requiring users to
    opt into experimental features. Keeping it behind the beta flag leaves
    persisted thread goals and automatic goal continuation disabled by
    default.
    
    This PR also marks the goal-related app server APIs and events as no
    longer experimental.
    
    ## What changed
    
    - Mark `goals` as `Stage::Stable`.
    - Enable `goals` by default in `codex-rs/features/src/lib.rs`.
  • Remove ToolSearch feature toggle (#23389)
    ## Summary
    - mark `ToolSearch` as removed and ignore stale config writes for its
    legacy key
    - make search tool exposure depend only on model capability, not a
    feature toggle
    - remove app-server enablement support and prune now-obsolete test
    coverage/setup
    
    ## Verification
    - `cargo test -p codex-features`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core search_tool_requires_model_capability`
    - `cargo test -p codex-app-server experimental_feature_enablement_set_`
    
    ## Notes
    - This keeps the legacy config key as a no-op for compatibility while
    removing the ability to toggle the behavior off cleanly.
    - No developer-facing docs update outside the touched app-server README
    was needed.
  • cleanup: Remove skill env var dependency prompting (#22721)
    Deletes the skill env var dependency prompt feature and its runtime
    path. env_var entries in skill dependency metadata are now silently
    ignored during skill loading.
  • Make multi-agent v2 tool namespace configurable (#23147)
    ## Summary
    - Add `features.multi_agent_v2.tool_namespace` with config/schema
    validation for Responses-compatible namespace values.
    - Thread the resolved namespace into `ToolsConfig` for normal turns and
    review turns.
    - Wrap MultiAgentV2 tool specs and registry names in the configured
    namespace when namespace tools are supported, while falling back to the
    plain tool names when they are not.
    
    ## Validation
    - `just fmt`
    - `just write-config-schema`
    - `cargo test -p codex-features multi_agent_v2_feature_config --
    --nocapture`
    - `cargo test -p codex-core test_build_specs_multi_agent_v2 --
    --nocapture`
    - `cargo test -p codex-core multi_agent_v2_config -- --nocapture`
    - `cargo test -p codex-core
    multi_agent_v2_rejects_invalid_tool_namespace -- --nocapture`
    - `cargo test -p codex-tools`
    - `git diff --check`
  • [codex] Group removed feature flags (#22730)
    ## Summary
    - move removed feature enum variants under the existing Removed section
    - keep active feature variants grouped away from no-op compatibility
    flags
    
    ## Test plan
    - just fmt
    - cargo test -p codex-features
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore(features) rm Feature::ApplyPatchFreeform (#22711)
    ## Summary
    Removes the feature since this is effectively on by default in all cases
    where we should use it, or can be configured via models.json.
    
    ## Testing
    - [x] unit tests pass
  • enable/disable remote control at runtime, not via features (#22578)
    ## Why
    Reapplies https://github.com/openai/codex/pull/22386, which was
    previously reverted.
    
    Also introduces `remoteControl/enable` and `remoteControl/disable`
    app-server APIs to toggle remote control on or off at runtime for a
    given running app-server instance.
    
    ## What Changed
    
    - Adds experimental v2 RPCs:
      - `remoteControl/enable`
      - `remoteControl/disable`
    - Adds `RemoteControlRequestProcessor` and routes the new RPCs through
    it instead of `ConfigRequestProcessor`.
    - Adds named `RemoteControlHandle::enable`, `disable`, and `status`
    methods.
    - Makes `remoteControl/enable` return an error when sqlite state DB is
    unavailable, while keeping enrollment/websocket failures as async status
    updates.
    - Adds `AppServerRuntimeOptions.remote_control_enabled` and hidden
    `--remote-control` flags for `codex app-server` and `codex-app-server`.
    - Updates managed daemon startup to use `codex app-server
    --remote-control --listen unix://`.
    - Marks `Feature::RemoteControl` as removed and ignores
    `[features].remote_control`.
    - Updates app-server README entries for the new remote-control methods.
  • chore(config) rm experimental_use_freeform_apply_patch (#22565)
    ## Summary
    Get rid of the `experimental_use_freeform_apply_patch` config option,
    since it is now encoded in model config. No deprecation message since it
    has been experimental this entire time.
    
    ## Testing
    - [x] Updated unit tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Make multi_agent_v2 wait_agent timeouts configurable (#22528)
    ## Why
    
    `multi_agent_v2` already allowed configuring the minimum `wait_agent`
    timeout, but the default timeout and upper bound were still hard-coded.
    That made it hard to tune waits for subagent mailbox activity in
    sessions that need either faster wakeups or longer waits, and it meant
    the model-visible `wait_agent` schema could not fully reflect the
    resolved runtime limits.
    
    ## What Changed
    
    - Added `features.multi_agent_v2.max_wait_timeout_ms` and
    `features.multi_agent_v2.default_wait_timeout_ms` alongside the existing
    `min_wait_timeout_ms` setting.
    - Validated all three timeouts in config as `0..=3_600_000`, with
    `min_wait_timeout_ms <= default_wait_timeout_ms <= max_wait_timeout_ms`.
    - Thread and review session tool config now passes the resolved
    min/default/max values into the `wait_agent` tool schema.
    - `wait_agent` now uses the configured default when `timeout_ms` is
    omitted and rejects explicit values outside the configured min/max range
    instead of silently clamping them.
    - Updated the generated config schema and config-lock test coverage for
    the new fields.
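    
    The validation and resolution rules above can be sketched as follows.
    This is a minimal illustration, not the code from this PR: the
    `WaitTimeouts` struct, both function names, and the error strings are
    hypothetical; only the `0..=3_600_000` range, the
    `min <= default <= max` ordering, and the reject-instead-of-clamp
    behavior come from the description.
    
    ```rust
    /// Upper bound for every timeout, taken from the `0..=3_600_000` range above.
    const MAX_ALLOWED_MS: u64 = 3_600_000;
    
    /// Hypothetical container for the three resolved settings.
    #[derive(Debug, Clone, Copy, PartialEq)]
    struct WaitTimeouts {
        min_ms: u64,
        default_ms: u64,
        max_ms: u64,
    }
    
    /// Config-time check: each value in range, and min <= default <= max.
    fn validate(timeouts: WaitTimeouts) -> Result<WaitTimeouts, String> {
        let fields = [
            ("min_wait_timeout_ms", timeouts.min_ms),
            ("default_wait_timeout_ms", timeouts.default_ms),
            ("max_wait_timeout_ms", timeouts.max_ms),
        ];
        for (name, value) in fields {
            if value > MAX_ALLOWED_MS {
                return Err(format!("{} must be in 0..={}", name, MAX_ALLOWED_MS));
            }
        }
        if timeouts.min_ms > timeouts.default_ms || timeouts.default_ms > timeouts.max_ms {
            return Err("expected min <= default <= max".to_string());
        }
        Ok(timeouts)
    }
    
    /// Call-time resolution: use the default when `timeout_ms` is omitted,
    /// and reject (rather than clamp) explicit values outside min..=max.
    fn resolve_timeout(timeouts: &WaitTimeouts, requested_ms: Option<u64>) -> Result<u64, String> {
        match requested_ms {
            None => Ok(timeouts.default_ms),
            Some(ms) if ms < timeouts.min_ms || ms > timeouts.max_ms => Err(format!(
                "timeout_ms {} is outside {}..={}",
                ms, timeouts.min_ms, timeouts.max_ms
            )),
            Some(ms) => Ok(ms),
        }
    }
    ```
    
    Rejecting out-of-range values makes a mismatch between the model's
    request and the configured limits visible instead of silently changing
    the wait.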
  • Enable plugin hooks by default (#22549)
    # Why
    
    Plugin-bundled hooks are already wired through the plugin manager,
    session setup, and app-server hook listing paths. Keeping `plugin_hooks`
    disabled by default means users still need an explicit feature opt-in
    before that existing behavior participates in normal plugin loading.
    
    # What
    
    - mark `plugin_hooks` as stable and enable it by default
    - add feature-registry test coverage for the new default/stage pairing
    
    Validation:
    
    - `cargo test -p codex-features`
    - `just fmt`
  • chore(config) rm Feature::CodexGitCommit (#22412)
    ## Summary
    Removes the unused Feature::CodexGitCommit
    
    ## Testing
    - [x] tests pass
  • feat: expose multi-agent v2 as model-only tools (#22514)
    ## Why
    
    `code_mode_only` filters code-mode nested tools out of the top-level
    tool list. For multi-agent v2, we need a rollout shape where the
    collaboration tools remain callable as normal model tools without also
    being embedded into the code-mode `exec` tool declaration.
    
    Related to this:
    https://openai-corpws.slack.com/archives/C0AQLHB4U75/p1778660267922549
    
    ## What Changed
    
    - Adds `features.multi_agent_v2.non_code_mode_only`, including config
    resolution, profile override handling, and generated schema coverage.
    - Introduces `ToolExposure::DirectModelOnly` so a tool can be included
    in the initial model-visible list while staying out of the nested
    code-mode tool surface.
    - Applies that exposure to the multi-agent v2 tools when the new flag is
    set: `spawn_agent`, `send_message`, `followup_task`, `wait_agent`,
    `close_agent`, and `list_agents`.
    - Updates code-mode-only filtering so direct-model-only tools remain
    visible while ordinary nested code-mode tools are still hidden.
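    
    One way to picture the filtering change is the sketch below.
    `ToolExposure::DirectModelOnly` is named in this PR; the
    `DirectAndNested` variant, the `ToolSpec` struct, both functions, and
    the `shell` tool name are hypothetical stand-ins for whatever the tool
    registry actually uses.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ToolExposure {
        /// Hypothetical name for an ordinary tool that code mode also nests.
        DirectAndNested,
        /// Named in the PR: model-visible, but kept out of the nested surface.
        DirectModelOnly,
    }
    
    /// Hypothetical tool record used only for this illustration.
    struct ToolSpec {
        name: &'static str,
        exposure: ToolExposure,
    }
    
    /// Tools listed directly to the model. With `code_mode_only`, ordinary
    /// nested tools are hidden here, while direct-model-only tools stay visible.
    fn top_level_tools(tools: &[ToolSpec], code_mode_only: bool) -> Vec<&'static str> {
        tools
            .iter()
            .filter(|tool| match tool.exposure {
                ToolExposure::DirectModelOnly => true,
                ToolExposure::DirectAndNested => !code_mode_only,
            })
            .map(|tool| tool.name)
            .collect()
    }
    
    /// Tools embedded in the nested code-mode surface; direct-model-only
    /// tools are never included.
    fn nested_code_mode_tools(tools: &[ToolSpec]) -> Vec<&'static str> {
        tools
            .iter()
            .filter(|tool| tool.exposure == ToolExposure::DirectAndNested)
            .map(|tool| tool.name)
            .collect()
    }
    ```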
    
    ## Verification
    
    - Added config parsing/profile tests for `non_code_mode_only`.
    - Added tool spec coverage for the code-mode-only multi-agent v2
    exposure behavior.
  • Remove unavailable MCP placeholder tool backfill (#22439)
    ## Why
    
    `UnavailableDummyTools` kept synthetic placeholder tools alive for
    historical tool calls whose backing MCP tool was no longer available.
    That path adds stale model-visible tool specs and special routing at the
    point where unavailable MCP calls should use ordinary current-tool
    handling. This removes the runtime backfill instead of preserving a
    second compatibility lane.
    
    ## Is it safe to remove?
    
    The unavailable tools were added in #17853 after a CS issue when a
    previously-called MCP tool failed to load and was omitted from the CS
    spec. Now that we have tool search, I think this is resolved:
    - API merges tools from previous TST output into the effective tool set,
    so they're always in the CS spec
    - if an MCP tool surfaced by TST later becomes unavailable, the model
    can still call it and it will just return a model-visible error
    - both TST output and function call output are dropped on compaction, so
    the model will not remember old calls to MCP post-compaction
    
    ## What changed
    
    - Delete unavailable-tool collection, placeholder handler, router/spec
    plumbing, and obsolete placeholder coverage.
    - Keep `features.unavailable_dummy_tools` as a removed no-op feature
    tombstone so existing configs still parse cleanly.
    - Add an integration-style `tool_search` regression test showing that a
    deferred MCP tool surfaced through `tool_search` still routes through
    MCP and returns a model-visible tool-call error rather than `unsupported
    call`.
    
    ## Verification
    
    - `cargo test -p codex-core tool_search`
  • feat: Expose plugin versions and gate plugin sharing (#22397)
    - Adds localVersion to plugin summaries and remoteVersion to share
    context, including generated API schemas.
    - Hydrates local and remote plugin versions from manifests and remote
    release metadata.
    - Adds default-on plugin_sharing gate for shared-with-me listing and
    plugin/share/save, with disabled-path errors and focused coverage.
  • mark Feature::RemoteControl as removed (#22386)
    ## Why
    
    `remote_control` can appear in `config.toml`, CLI feature overrides, and
    the app-server config APIs. Before this PR, app-server startup treated
    `config.features.enabled(Feature::RemoteControl)` as the signal to start
    remote control ([base
    code](https://github.com/openai/codex/blob/5e3ee5eddfa5333f2e0b011880abf0cbf92bd295/codex-rs/app-server/src/lib.rs#L678-L680)).
    That meant a user with:
    
    ```toml
    [features]
    remote_control = true
    ```
    
    would accidentally opt every app-server process into remote control.
    Remote-control startup should instead be a per-process launch decision
    made by CLI flags.
    
    ## What Changed
    
    - Marks `Feature::RemoteControl` as `Stage::Removed`, keeping
    `remote_control` as a known compatibility key while making it
    config-inert.
    - Adds a hidden `--remote-control` process flag to `codex app-server`
    and standalone `codex-app-server`.
    - Plumbs that flag through
    `AppServerRuntimeOptions.remote_control_enabled` and makes app-server
    startup use only that runtime option to decide whether to start remote
    control.
    - Removes the app-server config mutation hook that reloaded config and
    toggled remote control at runtime.
    - Updates managed daemon spawning to use `codex app-server
    --remote-control --listen unix://` instead of `--enable remote_control`.
    
    Config APIs can still list, read, write, and set `remote_control`; those
    operations just no longer affect remote-control process enrollment.
  • [codex] Remove workspace owner usage nudge gate (#20509)
    ## Summary
    - make workspace owner nudge handling unconditional in the TUI now that
    it is fully rolled out
    - keep `workspace_owner_usage_nudge` as a removed no-op compatibility
    flag so old configs/app overrides remain accepted during rollout
    - remove flag-disabled test setup
    
    ## Companion PR
    - https://github.com/openai/openai/pull/876351 removes the Codex Apps
    Statsig rollout gate override after this change is available to the
    app/runtime path
    
    ## Validation
    - `just write-config-schema`
    - `just fmt`
    - `cargo test -p codex-features`
    - `cargo test -p codex-tui status_and_layout`
  • feat: add network proxy feature flag (#20147)
    ## Why
    
    The permissions migration is making
    `permissions.<profile>.network.enabled` the canonical sandbox network
    bit, while proxy startup is a separate concern. Enabling network access
    should not implicitly start the proxy, and users who are still on legacy
    sandbox modes need a separate place to opt into proxy startup and
    provide proxy-specific settings.
    
    This follow-up to #19900 gives the network proxy its own feature surface
    instead of overloading permission-profile network semantics.
    
    ## What changed
    
    - Add an experimental `network_proxy` feature with a configurable
    `[features.network_proxy]` table.
    - Overlay `features.network_proxy` settings onto the configured proxy
    state after permission-profile selection, so the proxy only starts when
    the active `NetworkSandboxPolicy` already allows network access.
    - Preserve `[experimental_network]` startup behavior independently of
    the new feature flag.
    
    ## Behavior and examples
    
    There are now three related knobs:
    
    - `permissions.<profile>.network.enabled` controls whether the active
    permission profile has network access at all.
    - `features.network_proxy` enables proxy restrictions for an
    already-network-enabled profile.
    - Legacy `sandbox_mode` plus `[sandbox_workspace_write].network_access`
    still control whether legacy `workspace-write` has network access at
    all.
    
    The rule is:
    
    - network off + proxy flag on -> network stays off, proxy is a no-op
    - network on + proxy flag off -> unrestricted direct network
    - network on + proxy flag on -> network stays on, with proxy
    restrictions applied
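    
    The three-line rule above reduces to a small truth function. The sketch
    below is illustrative only: `NetworkOutcome` and `resolve_network` are
    hypothetical names, and it deliberately ignores `[experimental_network]`,
    which starts the proxy on its own.
    
    ```rust
    /// Hypothetical result type: whether the sandbox has network access and
    /// whether proxy restrictions apply to it.
    #[derive(Debug, Clone, Copy, PartialEq)]
    struct NetworkOutcome {
        network_on: bool,
        proxy_on: bool,
    }
    
    /// `network_enabled` stands for the active profile's network bit (or the
    /// legacy `network_access` setting); `proxy_feature` stands for
    /// `features.network_proxy`.
    fn resolve_network(network_enabled: bool, proxy_feature: bool) -> NetworkOutcome {
        NetworkOutcome {
            // The feature flag never turns network access on by itself.
            network_on: network_enabled,
            // The proxy only applies when network access is already allowed.
            proxy_on: network_enabled && proxy_feature,
        }
    }
    ```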
    
    For permission profiles, the feature toggle adds proxy restrictions only
    when network access is already enabled:
    
    ```toml
    default_permissions = "workspace"
    
    [permissions.workspace.filesystem]
    ":minimal" = "read"
    
    [permissions.workspace.network]
    enabled = true
    
    [features]
    network_proxy = true
    ```
    
    If `network.enabled = false`, the same feature flag is a no-op: network
    remains off and the proxy does not start.
    
    For legacy sandbox config, `network_access` remains the master switch:
    
    ```toml
    sandbox_mode = "workspace-write"
    
    [sandbox_workspace_write]
    network_access = true
    
    [features]
    network_proxy = true
    ```
    
    That keeps legacy `workspace-write` network access on, but routes it
    through the proxy policy. If `network_access = false`, the proxy feature
    is a no-op and legacy `workspace-write` remains offline.
    
    The same proxy opt-in can be supplied from the CLI:
    
    ```bash
    codex -c 'features.network_proxy=true'
    ```
    
    Additional proxy settings can be supplied when a table is needed:
    
    ```bash
    codex \
      -c 'features.network_proxy.enabled=true' \
      -c 'features.network_proxy.enable_socks5=false'
    ```
    
    The intended behavior matrix is:
    
    | Config surface | Network setting | `features.network_proxy` | Direct sandbox network | Proxy |
    | --- | --- | --- | --- | --- |
    | Permission profile | `network.enabled = false` | off | restricted | off |
    | Permission profile | `network.enabled = false` | on | restricted | off |
    | Permission profile | `network.enabled = true` | off | enabled | off |
    | Permission profile | `network.enabled = true` | on | enabled | on |
    | Legacy `workspace-write` | `network_access = false` | off | restricted | off |
    | Legacy `workspace-write` | `network_access = false` | on | restricted | off |
    | Legacy `workspace-write` | `network_access = true` | off | enabled | off |
    | Legacy `workspace-write` | `network_access = true` | on | enabled | on |
    
    `[experimental_network]` requirements remain separate from the user
    feature toggle and still start the proxy on their own.
    
    Relevant code:
    -
    [`features/src/feature_configs.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/features/src/feature_configs.rs#L58-L117)
    defines the feature-specific proxy config.
    -
    [`core/src/config/mod.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L1959-L1964)
    reads the feature table, and [later applies it only when network access
    is already
    enabled](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L2448-L2458).
    
    ## Verification
    
    Added focused coverage for:
    - keeping the proxy off when `features.network_proxy` is enabled but
    sandbox network access is disabled
    - the full permission-profile and legacy `workspace-write` matrix above
    - preserving `[experimental_network]` startup without the feature
    - reusing profile-supplied proxy settings when the feature is enabled
    
    Ran:
    - `cargo test -p codex-features`
    - `cargo test -p codex-core network_proxy_feature`
    - `cargo test -p codex-core
    experimental_network_requirements_enable_proxy_without_feature`
  • Unified mentions in TUI (#19068)
    This PR replaces the TUI’s file-only `@mention` popup with a unified
    mentions experience. Typing `@...` now searches across filesystem
    matches, installed plugins, and skills in one popup, with result types
    clearly labeled and selectable from the same flow.
    
    - Adds a unified `@mentions` popup that returns:
      - plugins
      - skills
      - files
      - directories
    
    - Adds search modes so users can narrow the popup without changing their
    query:
      - All Results _(default/same as Codex App)_
      - Filesystem Only
      - Plugins _(...and skills)_
    
    - Preserves existing insertion behavior:
      - selected file paths are inserted into the prompt
      - paths with spaces are quoted
      - image file selections still attach as images when possible
      - selecting a plugin or skill inserts the corresponding `$name`
      - the composer records the canonical mention binding, such as
    `plugin://...` or the skill path
    
    - Expanded `@mentions` rendering:
      - type tags for Plugin, Skill, File, and Dir
      - distinct plugin/filesystem colors
      - stable fixed-height layout (8 rows)
      - truncation behavior for narrow terminals
    
    Note:
    - The unified mentions popup does not display app connectors under
    `@mention` results for Codex App parity. Connector mentions remain
    available through the existing `$mention` path.
    
    
    https://github.com/user-attachments/assets/f93781ed-57d3-4cb5-9972-675bc5f3ef3f
  • chore: drop built-in MCPs (#22173)
    Drop something that was never used
  • [codex] Enable apply_patch freeform by default (#21687)
    ## Summary
    - enable `apply_patch_freeform` by default in the feature registry
    
    ## Why
    - make the freeform `apply_patch` tool available by default when model
    metadata does not explicitly opt into another mode
    
    ## Validation
    - `just fmt`
    - did not run tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] Add response.processed websocket request (#21284)
    ## Summary
    
    - Add a `response.processed` websocket request payload and sender for
    Responses API websockets.
    - Send `response.processed` from `try_run_sampling_request` after a
    response completes, local turn processing succeeds, and the
    session-owned feature flag is enabled.
    - Add websocket coverage for both enabled and disabled feature-flag
    behavior.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-core response_processed`
    - `cargo test -p codex-api responses_websocket`
    - `cargo test -p codex-features
    responses_websocket_response_processed_is_under_development`
    - `git diff --check`
    - `just fix -p codex-api -p codex-core -p codex-features`
    - `git diff --check origin/main...HEAD`
  • Support Codex Apps auth elicitations (#19193)
    ## Summary
    
    - request URL-mode MCP elicitations when Codex Apps tool calls fail with
    connector auth metadata
    - route Codex Apps auth URL elicitations into the TUI app-link flow
    
    ## Test plan
    
    - `just fmt`
    - `cargo test -p codex-core mcp_tool_call::tests`
    - `cargo test -p codex-mcp`
    - `cargo test -p codex-tui bottom_pane::app_link_view::tests`
    - `just fix -p codex-core`
    - `just fix -p codex-mcp`
    - `just fix -p codex-tui`
    
    Also attempted broader local runs:
    
    - `cargo test -p codex-core` fails in unrelated
    config/request-permission/proxy-sensitive tests under the current Codex
    Desktop environment.
    - `cargo test -p codex-tui` fails in unrelated status
    snapshots/trust-default tests because the ambient environment renders
    workspace-write/network permission defaults.
  • feat: add remote compaction v2 Responses client path (#20773)
    ## Why
    
    This adds the `remote_compaction_v2` client path so remote compaction
    can run through the normal Responses stream and install a
    `context_compaction` item that triggers a compaction.
    
    The goal is to migrate some of the compaction logic on the client side.
    
    We keep the v2 transport behind a feature flag while letting follow-up
    requests reuse the compacted context instead of falling back to the
    legacy compaction item shape.
    
    ## What changed
    
    - add `ResponseItem::ContextCompaction` and refresh the generated
    app-server / schema / TypeScript fixtures that expose response items on
    the wire
    - add `core/src/compact_remote_v2.rs` to send compaction through the
    standard streamed Responses client, require exactly one
    `context_compaction` output item, and install that item into compacted
    history
    - route manual compact and auto-compaction through the v2 path when
    `remote_compaction_v2` is enabled, while keeping the existing remote
    compaction path as the fallback
    - preserve the new item type across history retention, follow-up request
    construction, telemetry, rollout persistence, and rollout-trace
    normalization
    - add targeted coverage for the feature flag, `context_compaction`
    serialization, rollout-trace normalization, and remote-compaction
    follow-up behavior
    
    ## Verification
    
    - added protocol tests for `context_compaction`
    serialization/deserialization in `protocol/src/models.rs`
    - added rollout-trace coverage for `context_compaction` normalization in
    `rollout-trace/src/reducer/conversation_tests.rs`
    - added remote compaction integration coverage for v2 follow-up reuse
    and mixed compaction output streams in
    `core/tests/suite/compact_remote.rs`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: export and replay effective config locks (#20405)
    ## Why
    
    For reproducibility. A hand-written `config.toml` is not enough to
    recreate what a Codex session actually ran with because layered config,
    CLI overrides, defaults, feature aliases, resolved feature config,
    prompt setup, and model-catalog/session values can all affect the final
    runtime behavior.
    
    This PR adds an effective config lockfile path: one run can export the
    resolved session config, and a later run can replay that lockfile and
    fail early if the regenerated effective config drifts.
    
    ## What Changed
    
    - Add a dedicated `ConfigLockfileToml` wrapper with top-level lockfile
    metadata plus the replayable config:
    
      ```toml
      version = 1
      codex_version = "..."
    
      [config]
      # effective ConfigToml fields
      ```
    
    - Keep lockfile metadata out of regular `ConfigToml`; replay loads
    `ConfigLockfileToml` and then uses its nested `config` as the
    authoritative config layer.
    - Add `debug.config_lockfile.export_dir` to write
    `<thread_id>.config.lock.toml` when a root session starts.
    - Add `debug.config_lockfile.load_path` to replay a saved lockfile and
    validate the regenerated session lockfile against it.
    - Add `debug.config_lockfile.allow_codex_version_mismatch` to optionally
    tolerate Codex binary version drift while still comparing the rest of
    the lockfile.
    - Add `debug.config_lockfile.save_fields_resolved_from_model_catalog` so
    lock creation can either save model-catalog/session-resolved fields or
    intentionally leave those fields dynamic.
    - Build lockfiles from the effective config plus resolved runtime values
    such as model selection, reasoning settings, prompts, service tier, web
    search mode, feature states/config, memories config, skill instructions,
    and agent limits.
    - Materialize feature aliases and custom feature config into the
    lockfile so replay compares canonical resolved behavior instead of
    user-authored alias shape.
    - Strip profile/debug/file-include/environment-specific inputs from
    generated lockfiles so they contain replayable values rather than the
    inputs that produced those values.
    - Surface JSON-RPC server error code/data in app-server client and TUI
    bootstrap errors so config-lock replay failures include the actual TOML
    diff.
    - Regenerate the config schema for the new debug config keys.
    
    ## Review Notes
    
    The main flow is split across these files:
    
    - `config/src/config_toml.rs`: lockfile/debug TOML shapes.
    - `core/src/config/mod.rs`: loading `debug.config_lockfile.*`, replaying
    a lockfile as a config layer, and preserving the expected lockfile for
    validation.
    - `core/src/session/config_lock.rs`: exporting the current session
    lockfile and materializing resolved session/config values.
    - `core/src/config_lock.rs`: lockfile parsing, metadata/version checks,
    replay comparison, and diff formatting.
    
    ## Usage
    
    Export a lockfile from a normal session:
    
    ```sh
    codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"'
    ```
    
    Export a lockfile without saving model-catalog/session-resolved fields:
    
    ```sh
    codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"' \
      -c 'debug.config_lockfile.save_fields_resolved_from_model_catalog=false'
    ```
    
    Replay a saved lockfile in a later session:
    
    ```sh
    codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"'
    ```
    
    If replay resolves to a different effective config, startup fails with a
    TOML diff.
    
    To tolerate Codex binary version drift during replay:
    
    ```sh
    codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"' \
      -c 'debug.config_lockfile.allow_codex_version_mismatch=true'
    ```
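    
    The replay check behind these commands can be pictured with the sketch
    below. It is not the code in `core/src/config_lock.rs`: the `Lockfile`
    struct only mirrors the `version`, `codex_version`, and `config` fields
    shown earlier, the flattened key/value map is an assumption, and
    `validate_replay` with its diff format is hypothetical.
    
    ```rust
    use std::collections::{BTreeMap, BTreeSet};
    
    /// Simplified lockfile shape for illustration.
    #[derive(Debug, Clone, PartialEq)]
    struct Lockfile {
        version: u32,
        codex_version: String,
        /// Assumption: effective config flattened to dotted keys and TOML-ish values.
        config: BTreeMap<String, String>,
    }
    
    /// Compare the saved lockfile with the regenerated one and return a
    /// line-oriented diff on drift. The Codex version is skipped when
    /// `allow_codex_version_mismatch` is set.
    fn validate_replay(
        expected: &Lockfile,
        actual: &Lockfile,
        allow_codex_version_mismatch: bool,
    ) -> Result<(), String> {
        let mut diff: Vec<String> = Vec::new();
        if expected.version != actual.version {
            diff.push(format!("-version = {}", expected.version));
            diff.push(format!("+version = {}", actual.version));
        }
        if !allow_codex_version_mismatch && expected.codex_version != actual.codex_version {
            diff.push(format!("-codex_version = \"{}\"", expected.codex_version));
            diff.push(format!("+codex_version = \"{}\"", actual.codex_version));
        }
        // Walk the union of keys so added and removed entries both show up.
        let keys: BTreeSet<&String> = expected.config.keys().chain(actual.config.keys()).collect();
        for key in keys {
            let (old, new) = (expected.config.get(key), actual.config.get(key));
            if old != new {
                if let Some(value) = old {
                    diff.push(format!("-{} = {}", key, value));
                }
                if let Some(value) = new {
                    diff.push(format!("+{} = {}", key, value));
                }
            }
        }
        if diff.is_empty() {
            Ok(())
        } else {
            Err(diff.join("\n"))
        }
    }
    ```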
    
    ## Limitations
    
    This does not support custom rules/network policies.
    
    ## Verification
    
    - `cargo test -p codex-core config_lock`
    - `cargo test -p codex-config`
    - `cargo test -p codex-thread-manager-sample`
  • Alias codex_hooks feature as hooks (#20522)
    # Why
    
    The hooks feature flag should use the concise canonical name `hooks`,
    while existing configs that still use `codex_hooks` continue to work
    during the rename.
    
    # What
    
    - change the canonical `Feature::CodexHooks` key from `codex_hooks` to
    `hooks`
    - register `codex_hooks` through the existing legacy-alias path
    - update the config schema and canonical config fixtures to prefer
    `hooks`
    - add regression coverage that both `hooks` and `codex_hooks` resolve to
    `Feature::CodexHooks`
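    
    A minimal sketch of the alias behavior, assuming a simple table lookup:
    `Feature::CodexHooks`, `hooks`, and `codex_hooks` come from this PR,
    while the two constant tables and the function names are hypothetical
    and do not reflect the real registry layout.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Feature {
        CodexHooks,
    }
    
    /// Canonical keys, as written to the schema and config fixtures.
    const CANONICAL_KEYS: &[(&str, Feature)] = &[("hooks", Feature::CodexHooks)];
    
    /// Legacy aliases that are still accepted when reading config.
    const LEGACY_ALIASES: &[(&str, Feature)] = &[("codex_hooks", Feature::CodexHooks)];
    
    /// Resolve a config key to a feature, checking canonical keys first.
    fn feature_for_key(key: &str) -> Option<Feature> {
        CANONICAL_KEYS
            .iter()
            .chain(LEGACY_ALIASES.iter())
            .find(|(name, _)| *name == key)
            .map(|(_, feature)| *feature)
    }
    
    /// The key that should be preferred when writing config back out.
    fn canonical_key(feature: Feature) -> Option<&'static str> {
        CANONICAL_KEYS
            .iter()
            .find(|(_, candidate)| *candidate == feature)
            .map(|(name, _)| *name)
    }
    ```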
    
    # Verification
    
    - `cargo test -p codex-features`
    - `cargo test -p codex-core config::schema_tests`
    - `cargo test -p codex-core
    pre_tool_use_blocks_shell_when_defined_in_config_toml`
    - `cargo test -p codex-app-server
    hooks_list_uses_each_cwds_effective_feature_enablement`
  • [Codex] Add browser use external feature flag (#20245)
    ## Summary
    
    - Adds a separate feature control for external-browser Browser Use
    integrations.
    - Registers `browser_use_external` as a stable, default-enabled
    requirements-owned feature key.
    - Updates feature registry tests and regenerates the config schema.
    
    Codex validation:
    - `cargo fmt -- --config imports_granularity=Item`
    - `cargo run -p codex-core --bin codex-write-config-schema`
    - `cargo test -p codex-features`
    
    ## Addendum
    
    This gives enterprise policy a coarse control for Browser Use outside
    the Codex-managed in-app browser. The existing `browser_use` feature is
    the Browser Use control, while `browser_use_external` can gate
    extension/native integrations for external browsers as that surface
    grows.
  • Mark goals feature as experimental (#20083)
    ## Why
    
    The `goals` feature flag is ready to move out of the hidden
    under-development bucket and into the user-facing experimental surface.
    Marking it experimental lets users discover it through the experimental
    features UI while still making clear that it is opt-in.
    
    ## What changed
    
    - Changed `goals` from `Stage::UnderDevelopment` to
    `Stage::Experimental` in `codex-rs/features/src/lib.rs`.
    - Added experimental menu metadata for the feature with the description
    `Set a persistent goal Codex can continue over time`.
    
    ## Verification
    
    - `cargo test -p codex-features`
  • [apps] Add apps MCP path override (#20231)
    Summary
    
    - Add `[features.apps_mcp_path_override]` config with a `path` field for
    overriding only the built-in apps MCP path.
    - Keep existing host/base URL derivation unchanged and append the
    configured path after that base.
    - Regenerate the config schema with the custom feature-config case.
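
    As a rough sketch of "append the configured path after that base": the
    helper below is hypothetical (`apps_mcp_url`, the `default_path` parameter,
    and the slash normalization are all assumptions for illustration, not the
    behavior of the real implementation).

    ```rust
    /// Hypothetical helper: join an already-derived base URL with either the
    /// configured override path or a default path.
    /// The slash trimming is an assumption made for this sketch.
    pub fn apps_mcp_url(base: &str, default_path: &str, path_override: Option<&str>) -> String {
        // Only the path is overridden; the base is used exactly as derived.
        let path = path_override.unwrap_or(default_path);
        format!(
            "{}/{}",
            base.trim_end_matches('/'),
            path.trim_start_matches('/')
        )
    }
    ```

    The point of the design is that the override never replaces the host or
    base URL; it only changes the trailing path segment.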
    
    Test Plan
    
    - Not run for latest revision; only `just fmt` and `just
    write-config-schema` were run.
    - Earlier revision: `cargo test -p codex-features`
    - Earlier revision: `cargo test -p codex-mcp`
  • Discover hooks bundled with plugins (#19705)
    ## Why
    
    Plugins can bundle lifecycle hooks, but Codex previously only discovered
    hooks from user, project, and managed config layers. This adds the
    plugin discovery and runtime plumbing needed for plugin-bundled hooks
    while keeping execution behind the `plugin_hooks` feature flag.
    
    ## What
    
    - Discovers plugin hook sources from each plugin's default
    `hooks/hooks.json`.
    - Supports `plugin.json` manifest `hooks` entries as either relative
    paths or inline hook objects.
    - Plumbs discovered plugin hook sources through plugin loading into the
    hook runtime when `plugin_hooks` is enabled.
    - Marks plugin-originated hook runs as `HookSource::Plugin`.
    - Injects `PLUGIN_ROOT` and `CLAUDE_PLUGIN_ROOT` into plugin hook
    command environments.
    - Updates generated schemas and hook source metadata for the plugin hook
    source.
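
    The discovery and environment rules above could be modeled along these
    lines. This is a sketch only: `ManifestHookEntry`, `PluginHookSource`,
    `resolve_hook_sources`, and `plugin_hook_env` are hypothetical names, inline
    hooks are kept as raw JSON text, and falling back to `hooks/hooks.json`
    only when the manifest lists no entries is an assumption.

    ```rust
    use std::collections::HashMap;
    use std::path::{Path, PathBuf};

    /// Hypothetical model of a `plugin.json` `hooks` entry.
    #[derive(Debug, Clone, PartialEq)]
    pub enum ManifestHookEntry {
        /// Path relative to the plugin root.
        RelativePath(PathBuf),
        /// Inline hook object, kept as raw JSON text in this sketch.
        Inline(String),
    }

    /// Hypothetical resolved hook source for one plugin.
    #[derive(Debug, Clone, PartialEq)]
    pub enum PluginHookSource {
        File(PathBuf),
        Inline(String),
    }

    /// Resolve manifest entries against the plugin root.
    /// Assumption: the default `hooks/hooks.json` is used when no entries exist.
    pub fn resolve_hook_sources(
        plugin_root: &Path,
        entries: &[ManifestHookEntry],
    ) -> Vec<PluginHookSource> {
        if entries.is_empty() {
            let default = plugin_root.join("hooks").join("hooks.json");
            return vec![PluginHookSource::File(default)];
        }
        entries
            .iter()
            .map(|entry| match entry {
                ManifestHookEntry::RelativePath(rel) => {
                    PluginHookSource::File(plugin_root.join(rel))
                }
                ManifestHookEntry::Inline(json) => PluginHookSource::Inline(json.clone()),
            })
            .collect()
    }

    /// Environment variables added to a plugin hook command's environment.
    pub fn plugin_hook_env(plugin_root: &Path) -> HashMap<String, String> {
        let root = plugin_root.to_string_lossy().into_owned();
        HashMap::from([
            ("PLUGIN_ROOT".to_string(), root.clone()),
            ("CLAUDE_PLUGIN_ROOT".to_string(), root),
        ])
    }
    ```

    Both environment variables carry the same plugin root, so a hook command
    can locate bundled scripts under either name.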
    
    ## Stack
    
    1. This PR - openai/codex#19705
    2. openai/codex#19778
    3. openai/codex#19840
    4. openai/codex#19882
    
    ## Reviewer Notes
    
    - Core logic is in `codex-rs/core-plugins/src/loader.rs` and
    `codex-rs/hooks/src/engine/discovery.rs`
    - Moved existing tests and added new ones in
    `codex-rs/core-plugins/src/loader_tests.rs`, hence the large diff there
    - Otherwise mostly plumbing and minor schema updates
    
    ### Core Changes
    
    The `codex-rs/core` changes are limited to wiring plugin hook support
    into existing core flows:
    
    - `core/src/session/session.rs` conditionally pulls effective plugin
    hook sources and plugin hook load warnings from `PluginsManager` when
    `plugin_hooks` is enabled, then passes them into `HooksConfig`.
    - `core/src/hook_runtime.rs` adds the `plugin` metric tag for
    `HookSource::Plugin`.
    - `core/config.schema.json` picks up the new `plugin_hooks` feature
    flag, and `core/src/plugins/manager_tests.rs` updates fixtures for the
    added plugin hook fields.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>