Commit Graph

4541 Commits

  • Fix stale create_wait_tool reference (#14639)
    ## Summary
    - replace the stale `create_wait_tool()` reference in `spec_tests.rs`
    - use `create_wait_agent_tool()` to match the actual multi-agent tool
    rename from `#14631`
    - fix the resulting `codex-core` spec-test compile failure on current
    `main`
    
    ## Context
    `#14631` renamed the model-facing multi-agent tool from `wait` to
    `wait_agent` and renamed the corresponding spec helper to
    `create_wait_agent_tool()`.
    
    One `spec_tests.rs` call site was left behind, so current `main` fails
    to compile `codex-core` tests with:
    - `cannot find function create_wait_tool`
    
    Using `create_wait_agent_tool()` is the correct fix here;
    `create_exec_wait_tool()` would point at the separate exec wait tool and
    would not match the renamed multi-agent toolset.
    
    ## Testing
    - not rerun locally after the rebase
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
      - pending guardian review state in the live status footer, including parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Fix wait_agent expectations in core tests (#14637)
    ## Summary
    - update stale core tool-spec expectations from `wait` to `wait_agent`
    - update the prompt-caching tool-name assertion to match the renamed
    tool
    - fix the Bazel regressions introduced after #14631 renamed the
    multi-agent wait tool
    
    ## Testing
    - `cargo test -p codex-core tools::spec::tests`
    - `cargo test -p codex-core suite::prompt_caching::prompt_tools_are_consistent_across_requests`
    
    Co-authored-by: Codex <noreply@openai.com>
  • Normalize MCP tool names to code-mode safe form (#14605)
    Code mode doesn't allow `-` in names and it's better if function names
    and code-mode names are the same.
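
    A minimal sketch of this kind of normalization. It assumes the rule is
    "replace anything that is not an ASCII letter, digit, or `_` with `_`"
    (the source only states that `-` is disallowed), and it also assumes a
    leading digit gets a `_` prefix. `normalize_tool_name` is a hypothetical
    helper, not the function codex-core uses.

    ```rust
    /// Hypothetical helper: rewrite an MCP tool name so it is also a valid
    /// code-mode identifier. Assumption: every character other than an ASCII
    /// letter, digit, or `_` (including `-`) becomes `_`.
    fn normalize_tool_name(name: &str) -> String {
        let mut out: String = name
            .chars()
            .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
            .collect();
        // Assumption: identifiers may not start with a digit, so prefix `_`.
        if out.chars().next().map_or(false, |c| c.is_ascii_digit()) {
            out.insert(0, '_');
        }
        out
    }

    fn main() {
        for name in ["get-issue", "list_prs", "2fa-check"] {
            println!("{name} -> {}", normalize_tool_name(name));
        }
    }
    ```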
  • app-server: add v2 filesystem APIs (#14245)
    Add a protocol-level filesystem surface to the v2 app-server so Codex
    clients can read and write files, inspect directories, and subscribe to
    path changes without relying on host-specific helpers.
    
    High-level changes:
    - define the new v2 fs/readFile, fs/writeFile, fs/createDirectory,
    fs/getMetadata, fs/readDirectory, fs/remove, fs/copy RPCs
    - implement the app-server handlers, including absolute-path validation,
    base64 file payloads, and recursive copy/remove semantics
    - document the API, regenerate protocol schemas/types, and add
    end-to-end tests for filesystem operations and copy edge cases
    
    Testing plan:
    - validate protocol serialization and generated schema output for the
    new fs request, response, and notification types
    - run app-server integration coverage for file and directory CRUD paths,
    metadata/readDirectory responses, copy failure modes, and absolute-path
    validation
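
    A minimal sketch of the absolute-path validation step, using only
    `std::path`. `validate_absolute_path` and its `String` error are
    hypothetical; the real handlers may apply additional checks.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical check an `fs/*` handler could run before touching disk:
    /// reject relative paths outright instead of resolving them against an
    /// implicit working directory.
    fn validate_absolute_path(raw: &str) -> Result<PathBuf, String> {
        let path = Path::new(raw);
        if path.is_absolute() {
            Ok(path.to_path_buf())
        } else {
            Err(format!("expected an absolute path, got `{raw}`"))
        }
    }

    fn main() {
        // "/tmp/notes.txt" is absolute on Unix; Windows also needs a drive or UNC prefix.
        for raw in ["/tmp/notes.txt", "notes.txt"] {
            println!("{raw}: {:?}", validate_absolute_path(raw));
        }
    }
    ```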
  • Stabilize multi-agent feature flag (#14622)
    - make multi_agent stable and enabled by default
    - update feature and tool-spec coverage to match the new default
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Rename multi-agent wait tool to wait_agent (#14631)
    - rename the multi-agent tool name the model sees to wait_agent
    - update the model-facing prompts and tool descriptions to match
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Slash copy osc52 wsl support (#13201)
    This PR is a follow-up to the `/copy` feature to support WSL and SSH!
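
    For context, OSC 52 is the terminal escape sequence
    `ESC ] 52 ; c ; <base64> BEL`, which asks the terminal emulator to set
    its clipboard. Because it travels in the normal output stream, a local
    terminal that supports it can receive the copy even when the process
    runs over SSH or inside WSL. The sketch below only builds the sequence,
    with a small hand-written base64 encoder so it has no dependencies; it
    is an illustration, not this PR's implementation, and some terminals or
    multiplexers need extra configuration to honor it.

    ```rust
    use std::io::Write;

    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /// Standard base64 (RFC 4648 alphabet) with `=` padding.
    fn base64_encode(input: &[u8]) -> String {
        let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
        for chunk in input.chunks(3) {
            // Pack up to three bytes into one 24-bit group.
            let b0 = u32::from(chunk[0]);
            let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
            let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
            let n = (b0 << 16) | (b1 << 8) | b2;
            // Emit four 6-bit symbols, padding with `=` for short chunks.
            out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
            out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
            out.push(if chunk.len() > 1 { ALPHABET[((n >> 6) & 63) as usize] as char } else { '=' });
            out.push(if chunk.len() > 2 { ALPHABET[(n & 63) as usize] as char } else { '=' });
        }
        out
    }

    /// OSC 52 "set clipboard": ESC ] 52 ; c ; <base64 payload> BEL.
    fn osc52_copy_sequence(text: &str) -> String {
        format!("\x1b]52;c;{}\x07", base64_encode(text.as_bytes()))
    }

    fn main() -> std::io::Result<()> {
        // Writing the sequence to the terminal asks the emulator to copy "hello".
        let mut stdout = std::io::stdout();
        stdout.write_all(osc52_copy_sequence("hello").as_bytes())?;
        stdout.flush()
    }
    ```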
  • Add code_mode_only feature (#14617)
    Summary
    - add the code_mode_only feature flag/config schema and wire its
    dependency on code_mode
    - update code mode tool descriptions to list nested tools with detailed
    headers
    - restrict available tools for prompt and exec descriptions when
    code_mode_only is enabled and test the behavior
    
    Testing
    - Not run (not requested)
  • fix: preserve zsh-fork escalation fds across unified-exec spawn paths (#13644)
    ## Why
    
    `zsh-fork` sessions launched through unified-exec need the escalation
    socket to survive the wrapper -> server -> child handoff so later
    intercepted `exec()` calls can still reach the escalation server.
    
    The inherited-fd spawn path also needs to avoid closing Rust's internal
    exec-error pipe, and the shell-escalation handoff needs to tolerate the
    receive-side case where a transferred fd is installed into the same
    stdio slot it will be mapped onto.
    
    ## What Changed
    
    - Added `SpawnLifecycle::inherited_fds()` in
    `codex-rs/core/src/unified_exec/process.rs` and threaded inherited fds
    through `codex-rs/core/src/unified_exec/process_manager.rs` so
    unified-exec can preserve required descriptors across both PTY and
    no-stdin pipe spawn paths.
    - Updated `codex-rs/core/src/tools/runtimes/shell/zsh_fork_backend.rs`
    to expose the escalation socket fd through the spawn lifecycle.
    - Added inherited-fd-aware spawn helpers in
    `codex-rs/utils/pty/src/pty.rs` and `codex-rs/utils/pty/src/pipe.rs`,
    including Unix pre-exec fd pruning that preserves requested inherited
    fds while leaving `FD_CLOEXEC` descriptors alone. The pruning helper is
    now named `close_inherited_fds_except()` to better describe that
    behavior.
    - Updated `codex-rs/shell-escalation/src/unix/escalate_client.rs` to
    duplicate local stdio before transfer and send destination stdio numbers
    in `SuperExecMessage`, so the wrapper keeps using its own
    `stdin`/`stdout`/`stderr` until the escalated child takes over.
    - Updated `codex-rs/shell-escalation/src/unix/escalate_server.rs` so the
    server accepts the overlap case where a received fd reuses the same
    stdio descriptor number that the child setup will target with `dup2`.
    - Added comments around the PTY stdio wiring and the overlap regression
    helper to make the fd handoff and controlling-terminal setup easier to
    follow.
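
    A rough, std-only model of the pruning decision, using hypothetical
    `OpenFd` and `fds_to_close` names. The real helper runs in a Unix
    pre-exec hook and performs `fcntl`/`close` through libc; this sketch only
    computes which descriptors would be closed. Rust's standard library
    reports `exec()` failures through a pipe marked `FD_CLOEXEC`, so skipping
    close-on-exec descriptors leaves that pipe intact. Always keeping stdio
    (0 to 2) is an assumption of the sketch.

    ```rust
    use std::collections::HashSet;

    /// Hypothetical snapshot of one open descriptor just before exec().
    #[derive(Debug, Clone, Copy)]
    struct OpenFd {
        fd: i32,
        cloexec: bool,
    }

    /// Close everything except stdio, explicitly inherited fds (such as the
    /// escalation socket), and FD_CLOEXEC fds, which exec() closes by itself.
    fn fds_to_close(open: &[OpenFd], inherited: &HashSet<i32>) -> Vec<i32> {
        open.iter()
            .filter(|f| f.fd > 2 && !inherited.contains(&f.fd) && !f.cloexec)
            .map(|f| f.fd)
            .collect()
    }

    fn main() {
        let open = [
            OpenFd { fd: 0, cloexec: false },
            OpenFd { fd: 3, cloexec: true },  // e.g. an exec-error pipe
            OpenFd { fd: 5, cloexec: false }, // inherited escalation socket
            OpenFd { fd: 7, cloexec: false }, // stray descriptor
        ];
        let inherited = HashSet::from([5]);
        println!("close: {:?}", fds_to_close(&open, &inherited));
    }
    ```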
    
    ## Verification
    
    - `cargo test -p codex-utils-pty`
      - covers preserved-fd PTY spawn behavior, PTY resize, Python REPL
      continuity, exec-failure reporting, and the no-stdin pipe path
    - `cargo test -p codex-shell-escalation`
      - covers duplicated-fd transfer on the client side and verifies the
      overlap case by passing a pipe-backed stdin payload through the
      server-side `dup2` path
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13644).
    * #14624
    * __->__ #13644
  • feat(app-server, core): add more spans (#14479)
    ## Description
    
    This PR expands tracing coverage across app-server thread startup, core
    session initialization, and the Responses transport layer. It also gives
    core dispatch spans stable operation-specific names so traces are easier
    to follow than the old generic `submission_dispatch` spans.
    
    Also use `fmt::Display` for types that we serialize in traces, so we
    send strings instead of Rust types.
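
    A minimal sketch of the idea with a hypothetical `Op` enum: a
    `fmt::Display` impl gives a trace field a short string instead of a
    `Debug` dump. With the `tracing` crate, a field written as `op = %op`
    is recorded via `Display`, while `op = ?op` uses `Debug`.

    ```rust
    use std::fmt;

    /// Hypothetical operation type; the variant names are illustrative.
    #[derive(Debug)]
    enum Op {
        UserTurn { items: usize },
        Interrupt,
    }

    impl fmt::Display for Op {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // A short, stable name reads better in a trace than the Debug form.
            let name = match self {
                Op::UserTurn { .. } => "user_turn",
                Op::Interrupt => "interrupt",
            };
            f.write_str(name)
        }
    }

    fn main() {
        let op = Op::UserTurn { items: 3 };
        println!("{op}");   // user_turn
        println!("{op:?}"); // UserTurn { items: 3 }
    }
    ```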
  • Override local apps settings with requirements.toml settings (#14304)
    This PR changes app and connector enablement when `requirements.toml` is
    present locally or via remote configuration.
    
    For apps.* entries:
    - `enabled = false` in `requirements.toml` overrides the user’s local
    `config.toml` and forces the app to be disabled.
    - `enabled = true` in `requirements.toml` does not re-enable an app the
    user has disabled in config.toml.
    
    This behavior applies whether or not the user has an explicit entry for
    that app in `config.toml`. It also applies to cloud-managed policies and
    configurations when the admin sets the override through
    `requirements.toml`.
    
    Scenarios tested and verified:
    - Remote managed, user config (present) override
      - Admin-defined policies & configurations include a connector override:
        `[apps.<appID>]` with `enabled = false`
      - User's config.toml has the same connector configured with
        `enabled = true`
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer

    - Remote managed, user config (absent) override
      - Admin-defined policies & configurations include a connector override:
        `[apps.<appID>]` with `enabled = false`
      - User's config.toml has no entry for the same connector
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer

    - Locally managed, user config (present) override
      - Local requirements.toml includes a connector override:
        `[apps.<appID>]` with `enabled = false`
      - User's config.toml has the same connector configured with
        `enabled = true`
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer

    - Locally managed, user config (absent) override
      - Local requirements.toml includes a connector override:
        `[apps.<appID>]` with `enabled = false`
      - User's config.toml has no entry for the same connector
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer
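
    The enablement rule above reduces to a logical AND of the two settings.
    A minimal sketch with a hypothetical `effective_app_enabled` helper,
    assuming an app with no entry in either file counts as enabled:

    ```rust
    /// Hypothetical helper. `user` is the app's `enabled` value in config.toml
    /// and `required` is the value from requirements.toml (`None` = no entry).
    /// Assumption: an app with no entry in either file is enabled.
    fn effective_app_enabled(user: Option<bool>, required: Option<bool>) -> bool {
        let user_enabled = user.unwrap_or(true);
        match required {
            // requirements.toml can force an app off, whatever config.toml says...
            Some(false) => false,
            // ...but `enabled = true` there never re-enables a user-disabled app.
            Some(true) | None => user_enabled,
        }
    }

    fn main() {
        // The verified scenarios all set `enabled = false` in requirements.toml.
        for user in [Some(true), None] {
            println!("user={user:?} -> {}", effective_app_enabled(user, Some(false)));
        }
    }
    ```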
  • Use subagents naming in the TUI (#14618)
    - rename user-facing TUI multi-agent wording to subagents
    - rename the surfaced slash command to `subagents` and update
    tests/snapshots
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: Add platform os and family to init response (#14527)
    This allows the client to pick OS-specific behavior while interacting
    with the app server, e.g. to use proper path separators.
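
    A minimal sketch of where such values can come from on the server side,
    using Rust's `std::env::consts::OS` and `std::env::consts::FAMILY`. The
    `PlatformInfo` struct, its field names, and the client-side
    `path_separator` helper are hypothetical; the source does not show the
    actual init response shape.

    ```rust
    use std::env::consts;

    /// Hypothetical shape for the platform fields in an init response.
    #[derive(Debug)]
    struct PlatformInfo {
        os: &'static str,     // e.g. "linux", "macos", "windows"
        family: &'static str, // e.g. "unix", "windows"
    }

    fn server_platform() -> PlatformInfo {
        PlatformInfo { os: consts::OS, family: consts::FAMILY }
    }

    /// Example client decision keyed off the reported family.
    fn path_separator(family: &str) -> char {
        if family == "windows" { '\\' } else { '/' }
    }

    fn main() {
        let info = server_platform();
        println!("{info:?} uses '{}'", path_separator(info.family));
    }
    ```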
  • Unify realtime v1/v2 session config (#14606)
    ## Summary
    - unify realtime websocket settings under `[realtime]` (`version` and
    `type`)
    - remove `realtime_conversation_v2` and select parser/session mode from
    config
    
    ## Testing
    - not run (per request)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Start TUI on embedded app server (#14512)
    This PR is part of the effort to move the TUI on top of the app server.
    In a previous PR, we introduced an in-process app server and moved
    `exec` on top of it.
    
    For the TUI, we want to do the migration in stages. The app server
    doesn't currently expose all of the functionality required by the TUI,
    so we're going to need to support a hybrid approach as we make the
    transition.
    
    This PR changes the TUI initialization to instantiate an in-process app
    server and access its `AuthManager` and `ThreadManager` rather than
    constructing its own copies. It also adds a placeholder TUI event
    handler that will eventually translate app server events into TUI
    events. App server notifications are accepted but ignored for now. It
    also adds proper shutdown of the app server when the TUI terminates.
  • [bazel] Bump up cc and rust toolchains (#14542)
    This lets us drop various patches and go all the way to a very clean
    setup.
    
    In case folks are curious what was going on... we were depending on the
    toolchain finding stdlib headers as sibling files of `clang++`, and for
    linking we were providing a `-resource-dir` containing the runtime libs.
    However, some users of the cc toolchain (such as rust build scripts) do
    the equivalent of `$CC $CCFLAGS $LDFLAGS` so the `-resource-dir` was
    being passed when compiling, which suppressed the default stdlib header
    location logic. The upstream fix was to swap to using `-isystem` to pass
    the stdlib headers, while carefully controlling the ordering to simulate
    them coming from the resource-dir.
  • chore: clarify plugin + app copy in model instructions (#14541)
    - clarify app mentions are in user messages
    - clarify what it means for tools to be provided via `codex_apps` MCP
    - add plugin descriptions (with basic sanitization) to the top-level
    `## Plugins` section alongside the corresponding plugin names
    - explain in the top-level `## Plugins` section that skills from plugins
    are prefixed with `plugin_name:`
    
    changes to more logically organize `Apps`, `Skills`, and `Plugins`
    instructions will be in a separate PR, as that shuffles dev + user
    instructions in ways that change tests broadly.
    
    ### Tests
    confirmed in local rollout, some new tests.
  • sending ImageGenerationCall response back to Responses API (#14558)
    Send the `ResponseItem::ImageGenerationCall` back as-is, because the API
    side now supports it.
  • Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
    ## Summary
    - launch Windows sandboxed children on a private desktop instead of
    `Winsta0\Default`
    - make private desktop the default while keeping
    `windows.sandbox_private_desktop=false` as the escape hatch
    - centralize process launch through the shared
    `create_process_as_user(...)` path
    - scope the private desktop ACL to the launching logon SID
    
    ## Why
    Today sandboxed Windows commands run on the visible shared desktop. That
    leaves an avoidable same-desktop attack surface for window interaction,
    spoofing, and related UI/input issues. This change moves sandboxed
    commands onto a dedicated per-launch desktop by default so the sandbox
    no longer shares `Winsta0\Default` with the user session.
    
    The implementation stays conservative on security: there is no silent
    fallback to `Winsta0\Default`.
    
    If private-desktop setup fails on a machine, users can still opt out
    explicitly with `windows.sandbox_private_desktop=false`.
    
    ## Validation
    - `cargo build -p codex-cli`
    - elevated-path `codex exec` desktop-name probe returned
    `CodexSandboxDesktop-*`
    - elevated-path `codex exec` smoke sweep for shell commands, nested
    `pwsh`, jobs, and hidden `notepad` launch
    - unelevated-path full private-desktop compatibility sweep via `codex
    exec` with `-c windows.sandbox=unelevated`
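
    The "no silent fallback" rule can be modeled as a small decision
    function. This is a sketch with hypothetical names (`Desktop`,
    `select_desktop`); the `create_private` closure stands in for the real
    Win32 desktop setup, which is not shown here.

    ```rust
    /// Hypothetical model of where a sandboxed child is launched.
    #[derive(Debug, PartialEq)]
    enum Desktop {
        /// A dedicated per-launch desktop; its name is assigned at creation.
        Private(String),
        /// The shared `Winsta0\Default` desktop, used only on explicit opt-out.
        SharedDefault,
    }

    /// `private_desktop_enabled` mirrors `windows.sandbox_private_desktop`
    /// (default true). A setup failure is returned as an error; the function
    /// deliberately never falls back to the shared desktop on its own.
    fn select_desktop<F>(private_desktop_enabled: bool, create_private: F) -> Result<Desktop, String>
    where
        F: FnOnce() -> Result<String, String>,
    {
        if private_desktop_enabled {
            create_private().map(Desktop::Private)
        } else {
            Ok(Desktop::SharedDefault)
        }
    }

    fn main() {
        let result = select_desktop(true, || Err("private desktop setup failed".to_string()));
        println!("{result:?}");
    }
    ```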
  • code mode: single line tool declarations (#14526)
    ## Summary
    - render code mode tool declarations as single-line TypeScript snippets
    - make the JSON schema renderer emit inline object shapes for these
    declarations
    - update code mode/spec expectations to match the new inline rendering
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-core render_json_schema_to_typescript`
    - `cargo test -p codex-core code_mode_augments_`
    - `cargo test -p codex-core --test all exports_all_tools_metadata -- --nocapture`
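
    A toy version of the rendering idea. It assumes a hypothetical `Schema`
    enum covering only a few JSON Schema types: object schemas become
    inline `{ a: T; b?: U }` shapes, so each tool declaration fits on one
    line. The `declare function ...: Promise<unknown>;` wrapper is an
    illustrative format, not the exact text codex-core emits.

    ```rust
    /// Hypothetical, minimal subset of JSON Schema.
    enum Schema {
        String,
        Number,
        Boolean,
        Array(Box<Schema>),
        /// (property name, property schema, required?)
        Object(Vec<(String, Schema, bool)>),
    }

    /// Render a schema as an inline TypeScript type with no line breaks.
    fn render(schema: &Schema) -> String {
        match schema {
            Schema::String => "string".to_string(),
            Schema::Number => "number".to_string(),
            Schema::Boolean => "boolean".to_string(),
            Schema::Array(item) => format!("Array<{}>", render(item)),
            Schema::Object(props) if props.is_empty() => "{}".to_string(),
            Schema::Object(props) => {
                let fields: Vec<String> = props
                    .iter()
                    .map(|(name, s, required)| {
                        let opt = if *required { "" } else { "?" };
                        format!("{name}{opt}: {}", render(s))
                    })
                    .collect();
                format!("{{ {} }}", fields.join("; "))
            }
        }
    }

    /// Hypothetical single-line declaration for one tool.
    fn declare_tool(name: &str, args: &Schema) -> String {
        format!("declare function {name}(args: {}): Promise<unknown>;", render(args))
    }

    fn main() {
        let args = Schema::Object(vec![
            ("path".to_string(), Schema::String, true),
            ("limit".to_string(), Schema::Number, false),
            ("tags".to_string(), Schema::Array(Box::new(Schema::String)), false),
        ]);
        println!("{}", declare_tool("read_file", &args));
    }
    ```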
  • Split multi-agent handler into dedicated files (#14603)
    ## Summary
    - move the multi-agent handlers suite into its own files for spawn,
    wait, resume, send input, and close logic
    - keep the aggregated module in place while delegating each handler to
    its new file to keep things organized per handler
    
    ## Testing
    - Not run (not requested)
  • Add diagnostics for read_only_unless_trusted timeout flake (#14518)
    ## Summary
    - add targeted diagnostic logging for the
    read_only_unless_trusted_requires_approval scenarios in
    approval_matrix_covers_all_modes
    - add a scoped timeout buffer only for ro_unless_trusted write-file
    scenarios: 1000ms -> 2000ms
    - keep all other write-file scenarios at 1000ms
    
    ## Why
    The last two main failures were both in codex-core::all
    suite::approvals::approval_matrix_covers_all_modes with exit_code=124 in
    the same scenario. This points to execution-time jitter in CI rather
    than a semantic approval-policy mismatch.
    
    ## Notes
    - This does not introduce any >5s timeout and does not
    disable/quarantine tests.
    - The timeout increase is tightly scoped to the single flaky path and
    keeps the matrix deterministic under CI scheduling variance.
  • Add realtime transcription mode for websocket sessions (#14556)
    - add experimental_realtime_ws_mode (conversational/transcription) and
    plumb it into realtime conversation session config
    - switch realtime websocket intent and session.update payload shape
    based on mode
    - update config schema and realtime/config tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add codex tool support for realtime v2 handoff (#14554)
    - Advertise a `codex` function tool in realtime v2 session updates.
    - Emit handoff replies as `function_call_output` items while keeping v1
    behavior unchanged.
    - Split realtime event parsing into explicit v1/v2 modules with shared
    common helpers.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: support skill-scoped managed network domain overrides in skill config (#14522)
    ## Summary
    
    This lets skill loading split `permissions.network` into two distinct
    pieces:
    
    - `permissions.network.enabled` still feeds the skill
    `PermissionProfile` and remains the coarse gate for whether the skill
    can use network access at all.
    - `permissions.network.allowed_domains` and
    `permissions.network.denied_domains` are lifted into a new
    `SkillManagedNetworkOverride` so managed-network sessions can start
    per-skill scoped proxies with the right domain overrides.
    
    The change also updates `SkillMetadata` construction sites and adds
    loader tests covering YAML parsing plus normalization of the network
    gate vs. domain override fields.
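
    A minimal sketch of the split. `SkillManagedNetworkOverride` is the
    type name from this PR, but its fields, the `NetworkPermissions` input
    type, and `split_network` are assumptions. The normalization shown
    (no override when both domain lists are empty) is one plausible rule,
    not necessarily the one the loader implements.

    ```rust
    /// Hypothetical parsed form of a skill's `permissions.network` block.
    #[derive(Debug, Default)]
    struct NetworkPermissions {
        enabled: Option<bool>,
        allowed_domains: Vec<String>,
        denied_domains: Vec<String>,
    }

    /// Type name from the PR; these fields are assumptions.
    #[derive(Debug, PartialEq)]
    struct SkillManagedNetworkOverride {
        allowed_domains: Vec<String>,
        denied_domains: Vec<String>,
    }

    /// Separate the coarse gate (which would feed the skill's
    /// `PermissionProfile`) from the per-skill domain override.
    fn split_network(
        p: NetworkPermissions,
    ) -> (Option<bool>, Option<SkillManagedNetworkOverride>) {
        let domain_override = if p.allowed_domains.is_empty() && p.denied_domains.is_empty() {
            None
        } else {
            Some(SkillManagedNetworkOverride {
                allowed_domains: p.allowed_domains,
                denied_domains: p.denied_domains,
            })
        };
        (p.enabled, domain_override)
    }

    fn main() {
        let parsed = NetworkPermissions {
            enabled: Some(true),
            allowed_domains: vec!["api.example.com".to_string()],
            denied_domains: Vec::new(),
        };
        println!("{:?}", split_network(parsed));
    }
    ```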
    
    ## Follow-up
    A follow-up PR will use `network_override` to spin up a skill-specific
    proxy when `network_override` is not `None`.
  • Add realtime v2 event parser behind feature flag (#14537)
    - Add a feature-flagged realtime v2 parser on the existing
    websocket/session pipeline.
    - Wire parser selection from core feature flags and map the codex
    handoff tool-call path into existing handoff events.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Refactor cloud requirements error and surface in JSON-RPC error (#14504)
    Refactors cloud requirements error handling to carry structured error
    metadata and surfaces that metadata through JSON-RPC config-load
    failures, including:
    * adds typed CloudRequirementsLoadErrorCode values plus optional
    statusCode
    * marks thread/start, thread/resume, and thread/fork config failures
    with structured cloud-requirements error data
  • code_mode: Move exec params from runtime declarations to @pragma (#14511)
    This change moves code_mode exec session settings out of the runtime API
    and into an optional first-line pragma. Instead of calling runtime
    helpers like `set_yield_time()` or `set_max_output_tokens_per_exec_call()`,
    the model can write
    `// @exec: {"yield_time_ms": ..., "max_output_tokens": ...}` at the top
    of the freeform exec source. Rust now parses that pragma before building
    the source, validates it, and passes the values directly in the exec
    start message to the code-mode broker, which applies them at session
    start without any worker-runtime mutation path. The `@openai/code_mode`
    module no longer exposes those setter functions, the docs and grammar
    were updated to describe the pragma form, and the existing code_mode
    tests were converted to use pragma-based configuration instead.
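
    A minimal sketch of the first step, splitting the pragma off the source.
    `split_exec_pragma` is hypothetical, and the JSON parsing and
    validation of `yield_time_ms` / `max_output_tokens` that would follow
    are deliberately omitted to keep the sketch dependency-free.

    ```rust
    /// Hypothetical helper: if the first line is `// @exec: {...}`, return
    /// the raw JSON payload and the remaining source; otherwise return the
    /// source unchanged. Validating the JSON is left to a later step.
    fn split_exec_pragma(source: &str) -> (Option<&str>, &str) {
        const PREFIX: &str = "// @exec:";
        let (first, rest) = match source.split_once('\n') {
            Some((first, rest)) => (first, rest),
            None => (source, ""),
        };
        match first.trim_start().strip_prefix(PREFIX) {
            // `trim` also drops a trailing '\r' from CRLF sources.
            Some(payload) => (Some(payload.trim()), rest),
            None => (None, source),
        }
    }

    fn main() {
        let src = "// @exec: {\"yield_time_ms\": 500}\nconsole.log(\"hi\");";
        let (pragma, body) = split_exec_pragma(src);
        println!("pragma={pragma:?} body={body:?}");
    }
    ```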
  • Add plugin usage telemetry (#14531)
    adding metrics including: 
    * plugin used
    * plugin installed/uninstalled
    * plugin enabled/disabled
  • fix: reopen writable linux carveouts under denied parents (#14514)
    ## Summary
    - preserve Linux bubblewrap semantics for `write -> none -> write`
    filesystem policies by recreating masked mount targets before rebinding
    narrower writable descendants
    - add a Linux runtime regression for `/repo = write`, `/repo/a = none`,
    `/repo/a/b = write` so the nested writable child is exercised under
    bubblewrap
    - document the supported legacy Landlock fallback and the split-policy
    bubblewrap behavior for overlapping carveouts
    
    ## Example
    Given a split filesystem policy like:
    
    ```toml
    "/repo" = "write"
    "/repo/a" = "none"
    "/repo/a/b" = "write"
    ```
    
    this PR keeps `/repo` writable, masks `/repo/a`, and still reopens
    `/repo/a/b` as writable again under bubblewrap.
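
    A rough sketch of the ordering idea, under stated assumptions: entries
    are applied shallowest first so later mounts overlay earlier ones; a
    `none` entry is modeled as an empty `--tmpfs` mask; and a writable
    descendant of a masked path first has its mount point recreated with
    `--dir` before `--bind` reopens it. `--bind`, `--tmpfs`, and `--dir` are
    standard bubblewrap options, but `Access` and `bwrap_args` are
    hypothetical, and the real sandbox may mask paths differently.

    ```rust
    /// Hypothetical access level for one policy entry.
    #[derive(Clone, Copy, PartialEq, Debug)]
    enum Access {
        Write,
        None,
    }

    /// Build bubblewrap-style arguments for a split filesystem policy.
    fn bwrap_args(policy: &[(&str, Access)]) -> Vec<String> {
        let mut entries = policy.to_vec();
        // Depth = number of path components; parents sort before children.
        entries.sort_by_key(|(path, _)| path.split('/').filter(|c| !c.is_empty()).count());
        let mut args = Vec::new();
        let mut masked: Vec<&str> = Vec::new();
        for (path, access) in entries {
            match access {
                Access::None => {
                    // Hide the subtree behind an empty tmpfs (assumed masking method).
                    args.extend(["--tmpfs".to_string(), path.to_string()]);
                    masked.push(path);
                }
                Access::Write => {
                    // Recreate the mount target if a masked parent hid it.
                    if masked.iter().any(|m| path.starts_with(&format!("{m}/"))) {
                        args.extend(["--dir".to_string(), path.to_string()]);
                    }
                    args.extend(["--bind".to_string(), path.to_string(), path.to_string()]);
                }
            }
        }
        args
    }

    fn main() {
        let policy = [("/repo", Access::Write), ("/repo/a", Access::None), ("/repo/a/b", Access::Write)];
        println!("bwrap {}", bwrap_args(&policy).join(" "));
    }
    ```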
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-linux-sandbox`
    - `cargo clippy -p codex-linux-sandbox --tests -- -D warnings`
  • Add typed multi-agent tool outputs (#14536)
    ## Summary
    - return typed `ToolOutput` values from the multi-agent handlers instead
    of plain `FunctionToolOutput`
    - keep the regular function-call response shape as JSON text while
    exposing structured values to code mode
    - add output schemas for `spawn_agent`, `send_input`, `resume_agent`,
    `wait`, and `close_agent`
    
    ## Verification
    - `just fmt`
    - focused multi-agent and integration tests passed earlier in this
    branch during iteration
    - after the final edit, I only reran formatting before opening this PR
  • client: extend custom CA handling across HTTPS and websocket clients (#14239)
    ## Stacked PRs
    
    This work is now effectively split across two steps:
    
    - #14178: add custom CA support for browser and device-code login flows,
    docs, and hermetic subprocess tests
    - #14239: extend that shared custom CA handling across Codex HTTPS
    clients and secure websocket TLS
    
    Note: #14240 was merged into this branch while it was stacked on top of
    this PR. This PR now subsumes that websocket follow-up and should be
    treated as the combined change.
    
    Builds on top of #14178.
    
    ## Problem
    
    Custom CA support landed first in the login path, but the real
    requirement is broader. Codex constructs outbound TLS clients in
    multiple places, and both HTTPS and secure websocket paths can fail
    behind enterprise TLS interception if they do not honor
    `CODEX_CA_CERTIFICATE` or `SSL_CERT_FILE` consistently.
    
    This PR broadens the shared custom-CA logic beyond login and applies the
    same policy to websocket TLS, so the enterprise-proxy story is no longer
    split between “HTTPS works” and “websockets still fail”.
    
    ## What This Delivers
    
    Custom CA support is no longer limited to login. Codex outbound HTTPS
    clients and secure websocket connections can now honor the same
    `CODEX_CA_CERTIFICATE` / `SSL_CERT_FILE` configuration, so enterprise
    proxy/intercept setups work more consistently end-to-end.
    
    For users and operators, nothing new needs to be configured beyond the
    same CA env vars introduced in #14178. The change is that more of Codex
    now respects them, including websocket-backed flows that were previously
    still using default trust roots.
    
    I also manually validated the proxy path locally with mitmproxy using:
    `CODEX_CA_CERTIFICATE=~/.mitmproxy/mitmproxy-ca-cert.pem HTTPS_PROXY=http://127.0.0.1:8080 just codex`
    with mitmproxy installed via `brew install mitmproxy` and configured as
    the macOS system proxy.
    
    ## Mental model
    
    `codex-client` is now the owner of shared custom-CA policy for outbound
    TLS client construction. Reqwest callers start from the builder
    configuration they already need, then pass that builder through
    `build_reqwest_client_with_custom_ca(...)`. Websocket callers ask the
    same module for a rustls client config when a custom CA bundle is
    configured.
    
    The env precedence is the same everywhere:
    - `CODEX_CA_CERTIFICATE` wins
    - otherwise fall back to `SSL_CERT_FILE`
    - otherwise use system roots
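
    The precedence rule above can be sketched as a pure function. `CaSource`
    and `select_ca_source` are hypothetical names, not the `codex-client`
    API; env access is passed in as a closure so the rule can be tested
    without mutating process env. Treating an empty value as unset is an
    assumption of the sketch.

    ```rust
    use std::path::PathBuf;

    /// Hypothetical description of where the trust roots come from.
    #[derive(Debug, PartialEq)]
    enum CaSource {
        Env { var: &'static str, path: PathBuf },
        SystemRoots,
    }

    /// CODEX_CA_CERTIFICATE wins, then SSL_CERT_FILE, then system roots.
    /// Assumption: an empty value is treated the same as an unset variable.
    fn select_ca_source(lookup: impl Fn(&str) -> Option<String>) -> CaSource {
        for var in ["CODEX_CA_CERTIFICATE", "SSL_CERT_FILE"] {
            if let Some(value) = lookup(var).filter(|v| !v.is_empty()) {
                return CaSource::Env { var, path: PathBuf::from(value) };
            }
        }
        CaSource::SystemRoots
    }

    fn main() {
        let source = select_ca_source(|k| std::env::var(k).ok());
        println!("{source:?}");
    }
    ```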
    
    The helper is intentionally narrow. It loads every usable certificate
    from the configured PEM bundle into the appropriate root store and
    returns either a configured transport or a typed error that explains
    what went wrong.
    
    ## Non-goals
    
    This does not add handshake-level integration tests against a live TLS
    endpoint. It does not validate that the configured bundle forms a
    meaningful certificate chain. It also does not try to force every
    transport in the repo through one abstraction; it extends the shared CA
    policy across the reqwest and websocket paths that actually needed it.
    
    ## Tradeoffs
    
    The main tradeoff is centralizing CA behavior in `codex-client` while
    still leaving adoption up to call sites. That keeps the implementation
    additive and reviewable, but it means the rule "outbound Codex TLS that
    should honor enterprise roots must use the shared helper" is still
    partly enforced socially rather than by types.
    
    For websockets, the shared helper only builds an explicit rustls config
    when a custom CA bundle is configured. When no override env var is set,
    websocket callers still use their ordinary default connector path.
    
    ## Architecture
    
    `codex-client::custom_ca` now owns CA bundle selection, PEM
    normalization, mixed-section parsing, certificate extraction, typed
    CA-loading errors, and optional rustls client-config construction for
    websocket TLS.
    
    The affected consumers now call into that shared helper directly rather
    than carrying login-local CA behavior:
    - backend-client
    - cloud-tasks
    - RMCP client paths that use `reqwest`
    - TUI voice HTTP paths
    - `codex-core` default reqwest client construction
    - `codex-api` websocket clients for both responses and realtime
    websocket connections
    
    The subprocess CA probe, env-sensitive integration tests, and shared PEM
    fixtures also live in `codex-client`, which is now the actual owner of
    the behavior they exercise.
    
    ## Observability
    
    The shared CA path logs:
    - which environment variable selected the bundle
    - which path was loaded
    - how many certificates were accepted
    - when `TRUSTED CERTIFICATE` labels were normalized
    - when CRLs were ignored
    - where client construction failed
    
    Returned errors remain user-facing and include the relevant env var,
    path, and remediation hint. That same error model now applies whether
    the failure surfaced while building a reqwest client or websocket TLS
    configuration.
    
    ## Tests
    
    Pure unit tests in `codex-client` cover env precedence and PEM
    normalization behavior. Real client construction remains in subprocess
    tests so the suite can control process env and avoid the macOS seatbelt
    panic path that motivated the hermetic test split.
    
    The subprocess coverage verifies:
    - `CODEX_CA_CERTIFICATE` precedence over `SSL_CERT_FILE`
    - fallback to `SSL_CERT_FILE`
    - single-cert and multi-cert bundles
    - malformed and empty-file errors
    - OpenSSL `TRUSTED CERTIFICATE` handling
    - CRL tolerance for well-formed CRL sections
    
    The websocket side is covered by the existing `codex-api` / `codex-core`
    websocket test suites plus the manual mitmproxy validation above.
    
    ---------
    
    Co-authored-by: Ivan Zakharchanka <3axap4eHko@gmail.com>
    Co-authored-by: Codex <noreply@openai.com>
  • [js_repl] Hard-stop active js_repl execs on explicit user interrupts (#13329)
    ## Summary
    - hard-stop `js_repl` only for `TurnAbortReason::Interrupted`,
    preserving the persistent REPL across replaced turns
    - track the current top-level exec by turn and only reset when the
    interrupted turn owns submitted work or a freshly started kernel for the
    current exec attempt
    - close both interrupt races: the write-window race by marking the exec
    as submitted before async pipe writes begin, and the startup-window race
    by tracking fresh-kernel ownership until submission
    - add regression coverage for interrupted in-flight execs and the
    pending-kernel-start window
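    
    The ownership rule in the bullets above fits in a pure decision function. This is a minimal sketch, not the `codex-core` code. `AbortReason` only mirrors the idea of `TurnAbortReason`, and its `Replaced` variant is an assumption. `ExecOwnership` and `should_reset_kernel` are hypothetical names.
    
    ```rust
    /// Why a turn ended early. Mirrors the idea of `TurnAbortReason`; the
    /// variants here are assumptions for illustration.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AbortReason {
        /// Explicit user stop: the only reason that may hard-stop `js_repl`.
        Interrupted,
        /// The turn was replaced by a new one: keep the persistent REPL.
        Replaced,
    }
    
    /// Hypothetical bookkeeping for the current top-level exec.
    #[derive(Debug, Clone, Copy)]
    struct ExecOwnership {
        /// Turn that owns this exec attempt.
        turn_id: u64,
        /// Set before the async pipe write begins (closes the write-window race).
        submitted: bool,
        /// Set when this attempt started a fresh kernel and held until
        /// submission (closes the startup-window race).
        started_fresh_kernel: bool,
    }
    
    /// Returns true when aborting `aborted_turn` should reset the kernel.
    fn should_reset_kernel(
        reason: AbortReason,
        aborted_turn: u64,
        exec: Option<ExecOwnership>,
    ) -> bool {
        if reason != AbortReason::Interrupted {
            return false;
        }
        match exec {
            // Reset only if the interrupted turn owns submitted or starting work.
            Some(e) if e.turn_id == aborted_turn => e.submitted || e.started_fresh_kernel,
            _ => false,
        }
    }
    ```
    
    Because the decision is pure, tests can cover both race windows without a live kernel. `submitted` is set before any pipe write, and `started_fresh_kernel` stays set until submission.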
    
    ## Why
    Stopping a turn previously surfaced `aborted by user after Xs` even
    though the underlying `js_repl` kernel could continue executing. Earlier
    fixes also risked resetting the session-scoped REPL too broadly or
    missing already-dispatched work. This change keeps cleanup scoped to
    explicit stop semantics and makes the interrupt path line up with both
    submitted execs and newly started kernels.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-core`
    - `just fix -p codex-core`
    
    `cargo test -p codex-core` passes the updated `js_repl` coverage,
    including the new startup-window regression test, but still has
    unrelated integration failures in this environment outside `js_repl`.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Split multi-agent handlers per tool (#14535)
    Summary
    - move the existing multi-agent handler logic into each tool-specific
    handler and inline helper implementations
    - remove the old central dispatcher now that each handler encapsulates
    its own behavior
    - adjust handler specs and tests to match the new structure without
    macros
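    
    Here is a minimal sketch of the per-tool shape, assuming a trait-object registry. Each tool implements its own handler, and a name-keyed map takes the place of a central dispatcher. `ToolHandler`, `registry`, and the `spawn_agent` / `wait_agent` handlers are illustrative only. They are not the actual `codex-core` types or tool definitions.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical handler trait: each tool encapsulates its own behavior.
    trait ToolHandler {
        fn name(&self) -> &'static str;
        fn handle(&self, args: &str) -> Result<String, String>;
    }
    
    struct SpawnAgent;
    struct WaitAgent;
    
    impl ToolHandler for SpawnAgent {
        fn name(&self) -> &'static str {
            "spawn_agent"
        }
        fn handle(&self, args: &str) -> Result<String, String> {
            Ok(format!("spawned agent for task {args:?}"))
        }
    }
    
    impl ToolHandler for WaitAgent {
        fn name(&self) -> &'static str {
            "wait_agent"
        }
        fn handle(&self, args: &str) -> Result<String, String> {
            if args.is_empty() {
                return Err("wait_agent needs an agent id".to_string());
            }
            Ok(format!("waiting on {args}"))
        }
    }
    
    /// Registry keyed by tool name; there is no central `match` over tool kinds.
    fn registry() -> HashMap<&'static str, Box<dyn ToolHandler>> {
        let handlers: [Box<dyn ToolHandler>; 2] = [Box::new(SpawnAgent), Box::new(WaitAgent)];
        handlers.into_iter().map(|h| (h.name(), h)).collect()
    }
    ```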
    
    Testing
    - Not run (not requested)
  • login: add custom CA support for login flows (#14178)
    ## Stacked PRs
    
    This work is split across three stacked PRs:
    
    - #14178: add custom CA support for browser and device-code login flows,
    docs, and hermetic subprocess tests
    - #14239: broaden the shared custom CA path from login to other outbound
    `reqwest` clients across Codex
    - #14240: extend that shared custom CA handling to secure websocket TLS
    so websocket connections honor the same CA env vars
    
    Review order: #14178, then #14239, then #14240.
    
    Supersedes #6864.
    
    Thanks to @3axap4eHko for the original implementation and investigation
    here. Although this version rearranges the code and history
    significantly, the majority of the credit for this work belongs to them.
    
    ## Problem
    
    Login flows need to work in enterprise environments where outbound TLS
    is intercepted by an internal proxy or gateway. In those setups, system
    root certificates alone are often insufficient to validate the OAuth and
    device-code endpoints used during login. The change adds a
    login-specific custom CA loading path, but the important contracts
    around env precedence, PEM compatibility, test boundaries, and
    probe-only workarounds need to be explicit so reviewers can understand
    what behavior is intentional.
    
    For users and operators, the behavior is simple: if login needs to trust
    a custom root CA, set `CODEX_CA_CERTIFICATE` to a PEM file containing
    one or more certificates. If that variable is unset, login falls back to
    `SSL_CERT_FILE`. If neither is set, login uses system roots. Invalid or
    empty PEM files now fail with an error that points back to those
    environment variables and explains how to recover.
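    
    The selection order fits in a small pure function. This is a minimal sketch. `CaSource` and `select_ca_source` are hypothetical names. Treating an empty value as unset is an assumption, based on the empty-value handling that the unit tests cover.
    
    ```rust
    use std::path::PathBuf;
    
    const CODEX_CA_CERTIFICATE: &str = "CODEX_CA_CERTIFICATE";
    const SSL_CERT_FILE: &str = "SSL_CERT_FILE";
    
    /// Where the trust roots for an outbound client should come from.
    #[derive(Debug, PartialEq, Eq)]
    enum CaSource {
        /// A PEM bundle selected by the named environment variable.
        Bundle { env_var: &'static str, path: PathBuf },
        /// No override is configured: use the system roots.
        SystemRoots,
    }
    
    /// `lookup` abstracts over `std::env::var` so the precedence rule can be
    /// tested without mutating the process environment.
    fn select_ca_source(lookup: impl Fn(&str) -> Option<String>) -> CaSource {
        for env_var in [CODEX_CA_CERTIFICATE, SSL_CERT_FILE] {
            // Assumption: an empty value counts as unset, so it cannot mask
            // the next variable in the chain.
            if let Some(value) = lookup(env_var).filter(|v| !v.is_empty()) {
                return CaSource::Bundle { env_var, path: PathBuf::from(value) };
            }
        }
        CaSource::SystemRoots
    }
    ```
    
    In production, `select_ca_source(|name| std::env::var(name).ok())` reads the real environment. Tests can pass a closure over a map instead.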
    
    ## What This Delivers
    
    Users can now make Codex login work behind enterprise TLS interception
    by pointing `CODEX_CA_CERTIFICATE` at a PEM bundle containing the
    relevant root certificates. If that variable is unset, login falls back
    to `SSL_CERT_FILE`, then to system roots.
    
    This PR applies that behavior to both browser-based and device-code
    login flows. It also makes login tolerant of the PEM shapes operators
    actually have in hand: multi-certificate bundles, OpenSSL `TRUSTED
    CERTIFICATE` labels, and bundles that include well-formed CRLs.
    
    ## Mental model
    
    `codex-login` is the place where the login flows construct ad hoc
    outbound HTTP clients. That makes it the right boundary for a narrow CA
    policy: look for `CODEX_CA_CERTIFICATE`, fall back to `SSL_CERT_FILE`,
    load every parseable certificate block in that bundle into a
    `reqwest::Client`, and fail early with a clear user-facing error if the
    bundle is unreadable or malformed.
    
    The implementation is intentionally pragmatic about PEM input shape. It
    accepts ordinary certificate bundles, multi-certificate bundles, OpenSSL
    `TRUSTED CERTIFICATE` labels, and bundles that also contain CRLs. It
    does not validate a certificate chain or prove a handshake; it only
    constructs the root store used by login.
    
    ## Non-goals
    
    This change does not introduce a general-purpose transport abstraction
    for the rest of the product. It does not validate whether the provided
    bundle forms a real chain, and it does not add handshake-level
    integration tests against a live TLS server. It also does not change
    login state management or OAuth semantics beyond ensuring the existing
    flows share the same CA-loading rules.
    
    ## Tradeoffs
    
    The main tradeoff is keeping this logic scoped to login-specific client
    construction rather than lifting it into a broader shared HTTP layer.
    That keeps the review surface smaller, but it also means future
    login-adjacent code must continue to use `build_login_http_client()` or
    it can silently bypass enterprise CA overrides.
    
    The `TRUSTED CERTIFICATE` handling is also intentionally a local
    compatibility shim. The rustls ecosystem does not currently accept that
    PEM label upstream, so the code normalizes it locally and trims the
    OpenSSL `X509_AUX` trailer bytes down to the certificate DER that
    `reqwest` can consume.
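    
    An OpenSSL `TRUSTED CERTIFICATE` body is the X.509 `Certificate` DER followed by the `X509_AUX` structure. Trimming therefore amounts to keeping the first complete DER element. This std-only sketch assumes the PEM body has already been base64-decoded. `first_der_element` is a hypothetical helper, and the real code may rely on a DER parser instead.
    
    ```rust
    /// Returns the first complete DER element in `der` and drops any bytes
    /// after it. Applied to a decoded `TRUSTED CERTIFICATE` body, this keeps
    /// the X.509 `Certificate` SEQUENCE and discards the `X509_AUX` trailer.
    fn first_der_element(der: &[u8]) -> Option<&[u8]> {
        // An X.509 certificate is a constructed SEQUENCE (tag 0x30).
        if *der.first()? != 0x30 {
            return None;
        }
        let first = *der.get(1)?;
        let (header_len, content_len) = if first < 0x80 {
            // Short form: the length fits in the low seven bits.
            (2, usize::from(first))
        } else {
            // Long form: the low bits say how many big-endian length bytes follow.
            let n = usize::from(first & 0x7f);
            // 0x80 means indefinite length, which DER forbids.
            if n == 0 || n > std::mem::size_of::<usize>() {
                return None;
            }
            let len_bytes = der.get(2..2 + n)?;
            let len = len_bytes
                .iter()
                .fold(0usize, |acc, &b| (acc << 8) | usize::from(b));
            (2 + n, len)
        };
        let end = header_len.checked_add(content_len)?;
        // `get` returns None if the input is truncated.
        der.get(..end)
    }
    ```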
    
    ## Architecture
    
    `custom_ca.rs` is now the single place that owns login CA behavior. It
    selects the CA file from the environment, reads it, normalizes PEM label
    shape where needed, iterates mixed PEM sections with `rustls-pki-types`,
    ignores CRLs, trims OpenSSL trust metadata when necessary, and returns
    either a configured `reqwest::Client` or a typed error.
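    
    The per-section policy can be shown with a std-only scanner. This sketch illustrates the policy, not the implementation, which iterates sections with `rustls-pki-types`. `scan_pem_bundle` and `BundleSummary` are hypothetical, and bodies stay as base64 text. Skipping unrecognized labels is an assumption.
    
    ```rust
    /// Summary of the sections found in a PEM bundle.
    #[derive(Debug, Default, PartialEq, Eq)]
    struct BundleSummary {
        /// Base64 bodies of accepted certificates, in file order.
        certificates: Vec<String>,
        /// How many `TRUSTED CERTIFICATE` labels were normalized.
        normalized_trusted: usize,
        /// How many `X509 CRL` sections were ignored.
        ignored_crls: usize,
    }
    
    fn scan_pem_bundle(pem: &str) -> Result<BundleSummary, String> {
        let mut summary = BundleSummary::default();
        let mut current: Option<(String, String)> = None; // (label, base64 body)
        for line in pem.lines().map(str::trim) {
            let begin = line.strip_prefix("-----BEGIN ").and_then(|l| l.strip_suffix("-----"));
            let end = line.strip_prefix("-----END ").and_then(|l| l.strip_suffix("-----"));
            if let Some(label) = begin {
                if current.is_some() {
                    return Err(format!("nested BEGIN {label}"));
                }
                current = Some((label.to_string(), String::new()));
            } else if let Some(label) = end {
                let (open, body) = current.take().ok_or_else(|| format!("END {label} without BEGIN"))?;
                if open != label {
                    return Err(format!("BEGIN {open} closed by END {label}"));
                }
                match label {
                    "CERTIFICATE" => summary.certificates.push(body),
                    "TRUSTED CERTIFICATE" => {
                        // Treated as a plain certificate; the DER trailer still
                        // has to be trimmed after base64 decoding.
                        summary.normalized_trusted += 1;
                        summary.certificates.push(body);
                    }
                    "X509 CRL" => summary.ignored_crls += 1,
                    _ => {} // assumption: other sections are skipped
                }
            } else if let Some((_, body)) = current.as_mut() {
                body.push_str(line);
            }
            // Text outside any section (comments, blank lines) is ignored.
        }
        if let Some((open, _)) = current {
            return Err(format!("unterminated BEGIN {open}"));
        }
        if summary.certificates.is_empty() {
            return Err("bundle contains no certificates".to_string());
        }
        Ok(summary)
    }
    ```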
    
    The browser login server and the device-code flow both call
    `build_login_http_client()`, so they share the same trust-store policy.
    Environment-sensitive tests run through the `login_ca_probe` helper
    binary because those tests must control process-wide env vars and cannot
    reliably build a real reqwest client in-process on macOS seatbelt runs.
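    
    Here is a minimal sketch of how a subprocess test can pin the child's CA environment using only `std::process::Command`. `probe_command` is hypothetical. The probe binary's arguments and output are left out because the PR does not describe them.
    
    ```rust
    use std::path::Path;
    use std::process::Command;
    
    /// Builds the probe invocation with a fully controlled CA environment.
    fn probe_command(
        probe_bin: &Path,
        codex_ca: Option<&Path>,
        ssl_cert_file: Option<&Path>,
    ) -> Command {
        let mut cmd = Command::new(probe_bin);
        // Remove both variables first so an override in the developer's shell
        // cannot leak into the child process.
        cmd.env_remove("CODEX_CA_CERTIFICATE").env_remove("SSL_CERT_FILE");
        if let Some(path) = codex_ca {
            cmd.env("CODEX_CA_CERTIFICATE", path);
        }
        if let Some(path) = ssl_cert_file {
            cmd.env("SSL_CERT_FILE", path);
        }
        cmd
    }
    ```
    
    The parent test process never changes its own environment, so the tests stay independent of one another. Calling `.output()` on the returned command runs the probe.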
    
    ## Observability
    
    The custom CA path logs which environment variable selected the bundle,
    which file path was loaded, how many certificates were accepted, when
    `TRUSTED CERTIFICATE` labels were normalized, when CRLs were ignored,
    and where client construction failed. Returned errors remain user-facing
    and include the relevant path, env var, and remediation hint.
    
    This gives enough signal for three audiences:
    - users can see why login failed and which env/file caused it
    - sysadmins can confirm which override actually won
    - developers can tell whether the failure happened during file read, PEM
    parsing, certificate registration, or final reqwest client construction
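    
    One way to carry the stage, env var, path, and remediation hint in a single typed error is sketched below. `CaLoadError`, `CaLoadStage`, and the message wording are assumptions, not the actual error type.
    
    ```rust
    use std::fmt;
    use std::path::PathBuf;
    
    /// Hypothetical stage at which CA loading failed.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum CaLoadStage {
        ReadFile,
        ParsePem,
        RegisterCertificate,
        BuildClient,
    }
    
    #[derive(Debug)]
    struct CaLoadError {
        env_var: &'static str,
        path: PathBuf,
        stage: CaLoadStage,
        detail: String,
    }
    
    impl fmt::Display for CaLoadError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            let action = match self.stage {
                CaLoadStage::ReadFile => "could not read",
                CaLoadStage::ParsePem => "could not parse PEM in",
                CaLoadStage::RegisterCertificate => "could not register a certificate from",
                CaLoadStage::BuildClient => "could not build a client from",
            };
            // User-facing message: what failed, where, why, and how to recover.
            write!(
                f,
                "{} CA bundle {} (selected by {}): {}. Point {} at a readable \
                 PEM file with one or more certificates, or unset it.",
                action,
                self.path.display(),
                self.env_var,
                self.detail,
                self.env_var,
            )
        }
    }
    
    impl std::error::Error for CaLoadError {}
    ```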
    
    ## Tests
    
    Pure unit tests stay limited to env precedence and empty-value handling.
    Real client construction lives in subprocess tests so the suite remains
    hermetic with respect to process env and macOS sandbox behavior.
    
    The subprocess tests verify:
    - `CODEX_CA_CERTIFICATE` precedence over `SSL_CERT_FILE`
    - fallback to `SSL_CERT_FILE`
    - single-certificate and multi-certificate bundles
    - malformed and empty-bundle errors
    - OpenSSL `TRUSTED CERTIFICATE` handling
    - CRL tolerance for well-formed CRL sections
    
    The named PEM fixtures under `login/tests/fixtures/` are shared by the
    tests so their purpose stays reviewable.
    
    ---------
    
    Co-authored-by: Ivan Zakharchanka <3axap4eHko@gmail.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat: add plugin/read. (#14445)
    Returns more information about a specific plugin.
  • Fix codex exec --profile handling (#14524)
    PR #14005 introduced a regression whereby `codex exec --profile`
    overrides were dropped when starting or resuming a thread. That causes
    the thread to miss profile-scoped settings like
    `model_instructions_file`.
    
    This PR preserves the active profile in the thread start/resume config
    overrides so the app-server rebuild sees the same profile that exec
    resolved.
    
    Fixes #14515
  • Reapply "Pass more params to compaction" (#14298) (#14521)
    This reverts commit 8af97ce4b08fdedadc6037851b5e20cc653e9536.
    
    Confirmed that this runs locally without the previous issues with tool
    use
  • Expose code-mode tools through globals (#14517)
    Summary
    - make all code-mode tools accessible as globals so callers only need
    `tools.<name>`
    - rename text/image helpers and key globals (store, load, ALL_TOOLS,
    etc.) to reflect the new shared namespace
    - update the JS bridge, runners, descriptions, router, and tests to
    follow the new API
    
    Testing
    - Not run (not requested)
  • Persist js_repl codex helpers across cells (#14503)
    ## Summary
    
    This changes `js_repl` so saved references to `codex.tool(...)` and
    `codex.emitImage(...)` keep working across cells.
    
    Previously, those helpers were recreated per exec and captured that
    exec's `message.id`. If a persisted object or saved closure reused an
    old helper in a later cell, the nested tool/image call could fail with
    `js_repl exec context not found`.
    
    This patch:
    - keeps stable `codex.tool` and `codex.emitImage` helper identities in
    the kernel
    - resolves the current exec dynamically at call time using
    `AsyncLocalStorage`
    - adds regression coverage for persisted helper references across cells
    - updates the js_repl docs and project-doc instructions to describe the
    new behavior and its limits
    
    ## Why
    
    We already support persistent top-level bindings across `js_repl` cells,
    so persisted objects should be able to reuse `codex` helpers in later
    active cells. The bug was that helper identity was exec-scoped, not
    kernel-scoped.
    
    Using `AsyncLocalStorage` fixes the cross-cell reuse case without
    falling back to a single global active exec that could accidentally
    attribute stale background callbacks to the wrong cell.
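    
    The same pattern has a direct Rust analog. A helper with a stable identity reads the current exec from context-local storage when it is called, instead of capturing the exec when it is created. This sketch uses a `thread_local!` slot in place of `AsyncLocalStorage`, so it is an analog rather than the kernel code. `with_exec` and `emit_image` are hypothetical names. A thread-local follows the OS thread, not an async continuation. In async Rust, tokio's `task_local!` would be the closer match.
    
    ```rust
    use std::cell::RefCell;
    
    thread_local! {
        // The exec that is currently running on this thread, if any.
        static CURRENT_EXEC: RefCell<Option<String>> = RefCell::new(None);
    }
    
    /// Runs `f` with `exec_id` as the current exec and restores the previous
    /// value afterwards, including on panic, through the guard's `Drop`.
    fn with_exec<R>(exec_id: &str, f: impl FnOnce() -> R) -> R {
        struct Restore(Option<String>);
        impl Drop for Restore {
            fn drop(&mut self) {
                let prev = self.0.take();
                CURRENT_EXEC.with(|slot| *slot.borrow_mut() = prev);
            }
        }
        let prev = CURRENT_EXEC.with(|slot| slot.replace(Some(exec_id.to_string())));
        let _restore = Restore(prev);
        f()
    }
    
    /// Stable helper: it captures no exec id and resolves one at call time,
    /// so a reference saved during an earlier cell keeps working later.
    fn emit_image(name: &str) -> Result<String, String> {
        CURRENT_EXEC.with(|slot| match slot.borrow().as_deref() {
            Some(exec) => Ok(format!("{exec}: emitted {name}")),
            None => Err("js_repl exec context not found".to_string()),
        })
    }
    ```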
  • Update tool search prompts (#14500)
    - [x] Add mentions of connectors, because the model always thinks in
    connector terms in its CoT.
    - [x] Suppress `list_mcp_resources` in favor of tool search for available
    apps.
  • Rename exec session IDs to cell IDs (#14510)
    - Update the code-mode executor, wait handler, and protocol plumbing to
    use cell IDs instead of session IDs for node communication
    - Switch tool metadata, wait description, and suite tests to refer to
    cell IDs so user-visible messages match the new terminology
    
    **Testing**
    - Not run (not requested)
  • memories: focus write prompts on user preferences (#14493)
    ## Summary
    - update `codex-rs/core/templates/memories/stage_one_system.md` so phase
    1 captures stronger user-preference signals, richer task summaries, and
    cwd provenance without branch-specific fields
    - update `codex-rs/core/templates/memories/consolidation.md` so phase 2
    keeps separate sections for user preferences, reusable knowledge, and
    failure shields while staying cwd-aware but branchless
    - document the `codex` prompt-template maintenance rule in
    `codex-rs/core/src/memories/README.md`: the undated templates are
    canonical here and should be edited in place
    
    ## Testing
    - `cargo test -p codex-core memories --manifest-path codex-rs/Cargo.toml`
  • Fix MCP tool calling (#14491)
    Properly escape MCP tool names and make tools available only via
    imports.