Commit Graph

2552 Commits

  • feat: add explicit AgentIdentity auth mode (#18785)
    ## Summary
    
    This PR adds `CodexAuth::AgentIdentity` as an explicit auth mode.
    
    An AgentIdentity auth record is a standalone `auth.json` mode. When
    `AuthManager::auth().await` loads that mode, it registers one
    process-scoped task and stores it in runtime-only state on the auth
    value. Header creation stays synchronous after that because the task is
    initialized before callers receive the auth object.
    
    This PR also removes the old feature flag path. AgentIdentity is
    selected by explicit auth mode, not by a hidden flag or lazy mutation of
    ChatGPT auth records.
    
    Reference old stack: https://github.com/openai/codex/pull/17387/changes
    
    ## Design Decisions
    
    - AgentIdentity is a real auth enum variant because it can be the only
    credential in `auth.json`.
    - The process task is ephemeral runtime state. It is not serialized and
    is not stored in rollout/session data.
    - Account/user metadata needed by existing Codex backend checks lives on
    the AgentIdentity record for now.
    - `is_chatgpt_auth()` remains token-specific.
    - `uses_codex_backend()` is the broader predicate for ChatGPT-token auth
    and AgentIdentity auth.
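    The split between the two predicates can be sketched as follows. This is a minimal illustration only: the enum is a simplified stand-in, and the `ApiKey` and `ChatGpt` variant names and all fields are hypothetical, not the crate's actual definitions.

    ```rust
    /// Hypothetical, simplified stand-in for the real `CodexAuth` enum.
    #[allow(dead_code)]
    #[derive(Debug)]
    pub enum CodexAuth {
        ApiKey { key: String },
        ChatGpt { access_token: String, account_id: String },
        // Account metadata lives on the record; the process task is runtime-only.
        AgentIdentity { account_id: String },
    }

    impl CodexAuth {
        /// Token-specific: true only for ChatGPT token auth.
        pub fn is_chatgpt_auth(&self) -> bool {
            matches!(self, CodexAuth::ChatGpt { .. })
        }

        /// Broader: true for every mode that talks to the Codex backend.
        pub fn uses_codex_backend(&self) -> bool {
            matches!(self, CodexAuth::ChatGpt { .. } | CodexAuth::AgentIdentity { .. })
        }
    }
    ```

    Keeping `is_chatgpt_auth()` narrow means callers that need a ChatGPT token cannot accidentally accept an AgentIdentity record.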
    
    ## Stack
    
    1. https://github.com/openai/codex/pull/18757: full revert
    2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
    crate
    3. This PR: explicit AgentIdentity auth mode and startup task allocation
    4. https://github.com/openai/codex/pull/18811: migrate Codex backend
    auth callsites through AuthProvider
    5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
    and load `CODEX_AGENT_IDENTITY`
    
    ## Testing
    
    Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
  • core: derive active permission profiles (#18277)
    ## Why
    
    `Permissions` should not store a separate `PermissionProfile` that can
    drift from the constrained `SandboxPolicy` and network settings. The
    active profile needs to be derived from the same constrained values that
    already honor `requirements.toml`.
    
    ## What changed
    
    This adds derivation of the active `PermissionProfile` from the
    constrained runtime permission settings and exposes that derived value
    through config snapshots and thread state. The app-server can then
    report the active profile without introducing a second source of truth.
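    The "derive, don't store" idea can be sketched like this. This assumes simplified, hypothetical `SandboxPolicy`, `FileSystemAccess`, and `PermissionProfile` shapes rather than the real types. The profile is recomputed from the already-constrained values on every call, so it has no separate state that could drift.

    ```rust
    use std::path::PathBuf;

    /// Hypothetical, simplified sandbox policy (already constrained by requirements).
    #[derive(Debug, Clone, PartialEq)]
    pub enum SandboxPolicy {
        ReadOnly,
        WorkspaceWrite { writable_roots: Vec<PathBuf> },
        DangerFullAccess,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum FileSystemAccess {
        ReadOnly,
        WriteRoots(Vec<PathBuf>),
        Unrestricted,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct PermissionProfile {
        pub file_system: FileSystemAccess,
        pub network: bool,
    }

    /// Runtime permissions: note there is no stored `PermissionProfile` field.
    pub struct Permissions {
        pub sandbox_policy: SandboxPolicy,
        pub network_enabled: bool,
    }

    impl Permissions {
        /// Derive the active profile from the constrained values each time.
        pub fn active_profile(&self) -> PermissionProfile {
            let file_system = match &self.sandbox_policy {
                SandboxPolicy::ReadOnly => FileSystemAccess::ReadOnly,
                SandboxPolicy::WorkspaceWrite { writable_roots } => {
                    FileSystemAccess::WriteRoots(writable_roots.clone())
                }
                SandboxPolicy::DangerFullAccess => FileSystemAccess::Unrestricted,
            };
            PermissionProfile { file_system, network: self.network_enabled }
        }
    }
    ```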
    
    ## Verification
    
    - `cargo test -p codex-core --test all permissions_messages --
    --nocapture`
    - `cargo test -p codex-core --test all request_permissions --
    --nocapture`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18277).
    * #18288
    * #18287
    * #18286
    * #18285
    * #18284
    * #18283
    * #18282
    * #18281
    * #18280
    * #18279
    * #18278
    * __->__ #18277
  • [codex] Clean guardian instructions (#18934)
    ## Summary
    - Keep the guardian policy installed as guardian base instructions.
    - Clear inherited parent `developer_instructions` for guardian review
    sessions.
    - Update guardian config tests to assert developer instructions are
    cleared and policy text is sourced from base instructions.
    
    ## Why
    Guardian review sessions are intended to run under an isolated guardian
    policy. Because the guardian config is cloned from the parent config,
    inherited custom or managed developer instructions could otherwise
    remain active and conflict with guardian review behavior.
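    A minimal sketch of the config derivation, assuming a hypothetical `SessionConfig` with only the fields relevant here. The helper name `guardian_config_from_parent` is also illustrative. The clone keeps unrelated parent settings, installs the policy as base instructions, and clears inherited developer instructions.

    ```rust
    /// Hypothetical subset of a session config.
    #[derive(Debug, Clone)]
    pub struct SessionConfig {
        pub base_instructions: Option<String>,
        pub developer_instructions: Option<String>,
        pub model: String,
    }

    /// Derive a guardian review config from the parent config.
    pub fn guardian_config_from_parent(parent: &SessionConfig, guardian_policy: &str) -> SessionConfig {
        let mut config = parent.clone();
        // The guardian policy is the only instruction source for review sessions.
        config.base_instructions = Some(guardian_policy.to_string());
        // Custom or managed developer instructions must not leak into the review.
        config.developer_instructions = None;
        config
    }
    ```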
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-core guardian_review_session_config`
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore(tui) debug-config guardian_policy_config (#18923)
    ## Summary
    List `guardian_policy_config_source` in the `/debug-config` output.
    
    ## Testing
    - [x] Ran locally
  • exec-server: carry filesystem sandbox profiles (#18276)
    ## Why
    
    The exec-server still needs platform sandbox inputs, but the migration
    should preserve the `PermissionProfile` that produced them. Keeping only
    the derived legacy sandbox map would keep `SandboxPolicy` as the
    effective abstraction and would make full-disk vs. restricted profiles
    harder to preserve as the permissions stack starts round-tripping
    profiles.
    
    `PermissionProfile` entries can also be cwd-sensitive (`:cwd`,
    `:project_roots`, relative globs), so the exec-server must carry the
    request sandbox cwd instead of resolving those entries against the
    long-lived exec-server process cwd.
    
    ## What changed
    
    `FileSystemSandboxContext` now carries `permissions: PermissionProfile`
    plus an optional `cwd`:
    
    - removed `sandboxPolicy`, `sandboxPolicyCwd`,
    `fileSystemSandboxPolicy`, and `additionalPermissions`
    - added `permissions` and `cwd`
    - kept the platform knobs `windowsSandboxLevel`,
    `windowsSandboxPrivateDesktop`, and `useLegacyLandlock`
    
    Core turn and apply-patch paths populate the context from the active
    runtime permissions and request cwd. Exec-server derives platform
    `SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary,
    adds helper runtime reads there, and rejects cwd-dependent profiles that
    arrive without a cwd.
    
    The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor
    now preserves the old workspace-write conversion semantics for
    compatibility tests/callers.
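    The cwd rule above can be sketched as follows, assuming a hypothetical `PathEntry` enum in place of the real profile entries (`:project_roots` is omitted for brevity). Cwd-dependent entries resolve against the request cwd carried in the context, and a missing cwd is an error rather than a silent fallback to the exec-server process cwd.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical permission-entry shape; the real `PermissionProfile` differs.
    #[derive(Debug, Clone, PartialEq)]
    pub enum PathEntry {
        Cwd,              // `:cwd`
        Glob(String),     // e.g. `**/*.env` (relative) or `/etc/**` (absolute)
        Absolute(PathBuf),
    }

    /// Resolve entries against the request cwd, never the process cwd.
    pub fn resolve_entries(entries: &[PathEntry], request_cwd: Option<&Path>) -> Result<Vec<PathBuf>, String> {
        entries
            .iter()
            .map(|entry| match entry {
                PathEntry::Absolute(path) => Ok(path.clone()),
                PathEntry::Cwd => request_cwd
                    .map(Path::to_path_buf)
                    .ok_or_else(|| "`:cwd` entry requires a sandbox cwd".to_string()),
                // Absolute globs do not depend on the cwd.
                PathEntry::Glob(pattern) if Path::new(pattern).is_absolute() => Ok(PathBuf::from(pattern)),
                PathEntry::Glob(pattern) => request_cwd
                    .map(|cwd| cwd.join(pattern))
                    .ok_or_else(|| format!("relative glob `{pattern}` requires a sandbox cwd")),
            })
            .collect()
    }
    ```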
    
    ## Verification
    
    - `cargo test -p codex-exec-server`
    - `cargo test -p codex-exec-server sandbox_cwd -- --nocapture`
    - `cargo test -p codex-exec-server
    sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths
    -- --nocapture`
    - `cargo test -p codex-core --lib
    file_system_sandbox_context_uses_active_attempt -- --nocapture`
  • feat: Support remote plugin list/read. (#18452)
    Add a temporary internal remote_plugin feature flag that merges remote
    marketplaces into plugin/list and routes plugin/read through the remote
    APIs when needed, while keeping pure local marketplaces working as
    before.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: add AWS SigV4 auth for OpenAI-compatible model providers (#17820)
    ## Summary
    
    Add first-class Amazon Bedrock Mantle provider support so Codex can keep
    using its existing Responses API transport with OpenAI-compatible
    AWS-hosted endpoints such as AOA/Mantle.
    
    This is needed for the AWS launch path, where provider traffic should
    authenticate with AWS credentials instead of OpenAI bearer credentials.
    Requests are authenticated immediately before transport send, so SigV4
    signs the final method, URL, headers, and body bytes that `reqwest` will
    send.
    
    ## What Changed
    
    - Added a new `codex-aws-auth` crate for loading AWS SDK config,
    resolving credentials, and signing finalized HTTP requests with AWS
    SigV4.
    - Added a built-in `amazon-bedrock` provider that targets Bedrock Mantle
    Responses endpoints, defaults to `us-east-1`, supports region/profile
    overrides, disables WebSockets, and does not require OpenAI auth.
    - Added Amazon Bedrock auth resolution in `codex-model-provider`: prefer
    `AWS_BEARER_TOKEN_BEDROCK` when set, otherwise use AWS SDK credentials
    and SigV4 signing.
    - Added `AuthProvider::apply_auth` and `Request::prepare_body_for_send`
    so request-signing providers can sign the exact outbound request after
    JSON serialization/compression.
    - Determined the region from the `aws.region` config first (required
    for the bearer-token code path), falling back to the SDK default region.
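    The selection order above can be sketched as a pure function. The parameters are stand-ins: `env_bearer` for `AWS_BEARER_TOKEN_BEDROCK`, `config_region` for `aws.region`, and `sdk_default_region` for the SDK's resolved region. `BedrockAuth` is a hypothetical type. Treating an empty token as unset, and failing the bearer path when `aws.region` is absent, are assumptions based on one reading of "required for bearer token codepath".

    ```rust
    /// Hypothetical outcome of auth resolution for the `amazon-bedrock` provider.
    #[derive(Debug, PartialEq)]
    pub enum BedrockAuth {
        Bearer { token: String, region: String },
        SigV4 { region: String },
    }

    pub fn resolve_bedrock_auth(
        env_bearer: Option<&str>,
        config_region: Option<&str>,
        sdk_default_region: Option<&str>,
    ) -> Result<BedrockAuth, String> {
        match env_bearer.filter(|token| !token.is_empty()) {
            // Prefer the bearer token when set; assumed to need an explicit region.
            Some(token) => {
                let region = config_region
                    .ok_or("`aws.region` is required when AWS_BEARER_TOKEN_BEDROCK is set")?;
                Ok(BedrockAuth::Bearer { token: token.to_string(), region: region.to_string() })
            }
            // Otherwise sign with SDK credentials; config region wins over the SDK default.
            None => {
                let region = config_region.or(sdk_default_region).ok_or("no AWS region configured")?;
                Ok(BedrockAuth::SigV4 { region: region.to_string() })
            }
        }
    }
    ```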
    
    ## Testing
    Amazon Bedrock Mantle Responses paths:
    
    - Built the local Codex binary with `cargo build`.
    - Verified the custom proxy-backed `aws` provider using `env_key =
    "AWS_BEARER_TOKEN_BEDROCK"` streamed raw `responses` output with
    `response.output_text.delta`, `response.completed`, and `mantle-env-ok`.
    - Verified a full `codex exec --profile aws` turn returned
    `mantle-env-ok`.
    - Confirmed the custom provider used the bearer env var, not AWS profile
    auth: bogus `AWS_PROFILE` still passed, empty env var failed locally,
    and malformed env var reached Mantle and failed with `401
    invalid_api_key`.
    - Verified built-in `amazon-bedrock` with `AWS_BEARER_TOKEN_BEDROCK` set
    passed despite bogus AWS profiles, returning `amazon-bedrock-env-ok`.
    - Verified built-in `amazon-bedrock` SDK/SigV4 auth passed with
    `AWS_BEARER_TOKEN_BEDROCK` unset and temporary AWS session env
    credentials, returning `amazon-bedrock-sdk-env-ok`.
  • test(core): move prompt debug coverage to integration suite (#18916)
    ## Why
    
    `build_prompt_input` now initializes `ExecServerRuntimePaths`, which
    requires a configured Codex executable path. The previous inline unit
    test in `core/src/prompt_debug.rs` built a bare `test_config()` and then
    failed before it could assert anything useful:
    
    ```text
    Codex executable path is not configured
    ```
    
    This coverage is also integration-shaped: it drives the public
    `build_prompt_input` entry point through config, thread, and session
    setup rather than testing a small internal helper in isolation.
    
    Bazel CI did not catch this earlier because the affected test was behind
    the same wrapped Rust unit-test path fixed by #18913. Before that
    launcher/sharding fix, the outer `workspace_root_test` changed the
    working directory for Insta compatibility while the inner `rules_rust`
    sharding wrapper still expected its runfiles working directory. In
    practice, Bazel could report success without executing the Rust test
    cases in that shard. Once #18913 makes the wrapper run the Rust test
    binary directly and shard with libtest arguments, this stale unit test
    actually runs and exposes the missing `codex_self_exe` setup.
    
    ## What Changed
    
    - Moved `build_prompt_input_includes_context_and_user_message` out of
    `core/src/prompt_debug.rs`.
    - Added `core/tests/suite/prompt_debug_tests.rs` and registered it from
    `core/tests/suite/mod.rs`.
    - Builds the test config with `ConfigBuilder` and provides
    `codex_self_exe` using the current test executable, matching the
    runtime-path invariant required by prompt debug setup.
    - Preserves the existing assertions that the generated prompt input
    includes both the debug user message and project-specific user
    instructions.
    
    ## Verification
    
    - `cargo test -p codex-core --test all
    prompt_debug_tests::build_prompt_input_includes_context_and_user_message`
    - `bazel test //codex-rs/core:core-all-test
    --test_arg=prompt_debug_tests::build_prompt_input_includes_context_and_user_message
    --test_output=errors`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18916).
    * #18913
    * __->__ #18916
  • fix(core): emit hooks for apply_patch edits (#18391)
    Fixes https://github.com/openai/codex/issues/16732.
    
    ## Why
    
    `apply_patch` is Codex's primary file edit path, but it was not emitting
    `PreToolUse` or `PostToolUse` hook events. That meant hook-based policy,
    auditing, and write coordination could observe shell commands while
    missing the actual file mutation performed by `apply_patch`.
    
    The issue also exposed that the hook runtime serialized command hook
    payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch`
    supplied hook payloads, hooks would either fail to match it directly or
    receive misleading stdin that identified the edit as a Bash tool call.
    
    ## What Changed
    
    - Added `PreToolUse` and `PostToolUse` payload support to
    `ApplyPatchHandler`.
    - Exposed the raw patch body as `tool_input.command` for both
    JSON/function and freeform `apply_patch` calls.
    - Taught tool hook payloads to carry a handler-supplied hook-facing
    `tool_name`.
    - Preserved existing shell compatibility by continuing to emit `Bash`
    for shell-like tools.
    - Serialized the selected hook `tool_name` into hook stdin instead of
    hardcoding `Bash`.
    - Relaxed the generated hook command input schema so `tool_name` can
    represent tools other than `Bash`.
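    A sketch of the hook-facing name selection and stdin serialization. The `Handler` enum is a hypothetical stand-in for the real handler types. The JSON is built by hand with minimal escaping so the example stays dependency-free, and it carries only the `tool_name` and `tool_input.command` fields mentioned above.

    ```rust
    /// Hypothetical handler kinds; the real handlers are separate types.
    #[derive(Debug, Clone, Copy)]
    pub enum Handler {
        Shell,
        ExecCommand,
        ApplyPatch,
    }

    impl Handler {
        /// Shell-like tools keep `Bash` for compatibility with existing hooks.
        pub fn hook_tool_name(self) -> &'static str {
            match self {
                Handler::Shell | Handler::ExecCommand => "Bash",
                Handler::ApplyPatch => "apply_patch",
            }
        }
    }

    /// Minimal JSON string escaping for the sketch (quotes, backslashes, newlines).
    fn json_escape(s: &str) -> String {
        let mut out = String::with_capacity(s.len());
        for c in s.chars() {
            match c {
                '"' => out.push_str("\\\""),
                '\\' => out.push_str("\\\\"),
                '\n' => out.push_str("\\n"),
                c => out.push(c),
            }
        }
        out
    }

    /// Serialize the handler-selected tool name instead of a hardcoded "Bash".
    pub fn hook_stdin(handler: Handler, command: &str) -> String {
        format!(
            r#"{{"tool_name":"{}","tool_input":{{"command":"{}"}}}}"#,
            handler.hook_tool_name(),
            json_escape(command)
        )
    }
    ```

    With this shape, a hook matcher such as `^apply_patch$` sees the real tool name rather than `Bash`.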
    
    ## Verification
    
    Added focused handler coverage for:
    
    - JSON/function `apply_patch` calls producing a `PreToolUse` payload.
    - Freeform `apply_patch` calls producing a `PreToolUse` payload.
    - Successful `apply_patch` output producing a `PostToolUse` payload.
    - Shell and `exec_command` handlers continuing to expose `Bash`.
    
    Added end-to-end hook coverage for:
    
    - A `PreToolUse` hook matching `^apply_patch$` blocking the patch before
    the target file is created.
    - A `PostToolUse` hook matching `^apply_patch$` receiving the patch
    input and tool response, then adding context to the follow-up model
    request.
    - Non-participating tools such as the plan tool continuing not to emit
    `PreToolUse`/`PostToolUse` hook events.
    
    Also validated manually with a live `codex exec` smoke test using an
    isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed
    that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with
    `tool_name: "apply_patch"`, a shell command still emits `tool_name:
    "Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file
    from being created.
  • Add turn-scoped environment selections (#18416)
    ## Summary
    - add experimental turn/start.environments params for per-turn
    environment id + cwd selections
    - pass selections through core protocol ops and resolve them with
    EnvironmentManager before TurnContext creation
    - treat omitted selections as default behavior, empty selections as no
    environment, and non-empty selections as making the first
    environment/cwd the turn's primary
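    The three-way rule for selections maps naturally onto a slice pattern. In this sketch, `EnvironmentSelection` and `TurnEnvironment` are hypothetical types, not the protocol's actual definitions.

    ```rust
    /// Hypothetical per-turn selection: an environment id plus a cwd.
    #[derive(Debug, Clone, PartialEq)]
    pub struct EnvironmentSelection {
        pub environment_id: String,
        pub cwd: String,
    }

    #[derive(Debug, PartialEq)]
    pub enum TurnEnvironment {
        Default,
        NoEnvironment,
        Primary(EnvironmentSelection),
    }

    pub fn resolve_turn_environment(selections: Option<&[EnvironmentSelection]>) -> TurnEnvironment {
        match selections {
            None => TurnEnvironment::Default,            // omitted: default behavior
            Some([]) => TurnEnvironment::NoEnvironment,  // empty: no environment
            Some([first, ..]) => TurnEnvironment::Primary(first.clone()), // first is primary
        }
    }
    ```

    Distinguishing `None` from `Some([])` is what lets an explicit empty list mean "no environment" instead of falling back to the default.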
    
    ## Testing
    - ran `just fmt`
    - ran `just write-app-server-schema`
    - not run: unit tests for this stacked PR
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • sandboxing: materialize cwd-relative permission globs (#18867)
    ## Why
    
    #18275 anchors session-scoped `:cwd` and `:project_roots` grants to the
    request cwd before recording them for reuse. Relative deny glob entries
    need the same treatment. Without anchoring, a stored session permission
    can keep a pattern such as `**/*.env` relative, then reinterpret that
    deny against a later turn cwd. That makes the persisted profile depend
    on the cwd at reuse time instead of the cwd that was reviewed and
    approved.
    
    ## What changed
    
    `intersect_permission_profiles` now materializes retained
    `FileSystemPath::GlobPattern` entries against the request cwd, matching
    the existing materialization for cwd-sensitive special paths.
    
    Materialized accepted grants are now deduplicated before deny retention
    runs. This keeps the sticky-grant preapproval shape stable when a
    repeated request is merged with the stored grant and both `:cwd = write`
    and the materialized absolute cwd write are present.
    
    The preapproval check compares against the same materialized form, so a
    later request for the same cwd-relative deny glob still matches the
    stored anchored grant instead of re-prompting or rejecting.
    
    Tests cover both the storage path and the preapproval path: a
    session-scoped `:cwd = write` grant with `**/*.env = none` is stored
    with both the cwd write and deny glob anchored to the original request
    cwd, cannot be reused from a later cwd, and remains preapproved when
    re-requested from the original cwd after merging with the stored grant.
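    A sketch of materialization, deduplication, and the preapproval comparison. It uses hypothetical `Grant` and `Access` types and a string-keyed request format. Anchoring both `:cwd` and relative globs to the request cwd means a grant stored from `/repo` cannot be matched by the same request issued from a different cwd.

    ```rust
    use std::path::{Path, PathBuf};

    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub enum Access {
        Deny,
        Read,
        Write,
    }

    /// Hypothetical anchored grant: an absolute path or glob plus an access level.
    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
    pub struct Grant {
        pub target: PathBuf,
        pub access: Access,
    }

    /// Anchor `:cwd` and relative targets to the request cwd, then dedupe.
    pub fn materialize(request: &[(String, Access)], request_cwd: &Path) -> Vec<Grant> {
        let mut grants: Vec<Grant> = request
            .iter()
            .map(|(target, access)| {
                let target = if target == ":cwd" {
                    request_cwd.to_path_buf()
                } else {
                    // `Path::join` keeps absolute targets unchanged.
                    request_cwd.join(target)
                };
                Grant { target, access: *access }
            })
            .collect();
        grants.sort();
        grants.dedup();
        grants
    }

    /// Preapproved only if every materialized entry is already stored.
    pub fn is_preapproved(stored: &[Grant], request: &[(String, Access)], request_cwd: &Path) -> bool {
        materialize(request, request_cwd).iter().all(|grant| stored.contains(grant))
    }
    ```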
    
    ## Verification
    
    - `cargo test -p codex-sandboxing policy_transforms`
    - `cargo test -p codex-core --lib
    relative_deny_glob_grants_remain_preapproved_after_materialization`
    - `cargo clippy -p codex-sandboxing --tests -- -D
    clippy::redundant_clone`
    - `cargo clippy -p codex-core --lib -- -D clippy::redundant_clone`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18867).
    * #18288
    * #18287
    * #18286
    * #18285
    * #18284
    * #18283
    * #18282
    * #18281
    * #18280
    * #18279
    * #18278
    * #18277
    * #18276
    * __->__ #18867
  • Allow guardian bare allow output (#18797)
    ## Summary
    
    Allow guardian to skip other fields and output only
    `{"outcome":"allow"}` when the command is low risk.
    This change lets guardian reviews use a non-strict text format while
    keeping the JSON schema itself as plain user-visible schema data, so
    transport strictness is carried out-of-band instead of through a schema
    marker key.
    
    ## What changed
    
    - Add an explicit `output_schema_strict` flag to model prompts and pass
    it into `codex-api` text formatting.
    - Set guardian reviewer prompts to non-strict schema validation while
    preserving strict-by-default behavior for normal callers.
    - Update the guardian output contract so definitely-low-risk decisions
    may return only `{"outcome":"allow"}`.
    - Treat bare allow responses as low-risk approvals in the guardian
    parser.
    - Add tests and snapshots covering the non-strict guardian request and
    optional guardian output fields.
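    The parser change can be sketched over an already-deserialized value. The `GuardianOutput` fields `risk_level` and `rationale`, and the `Decision` enum, are hypothetical names for illustration, and mapping every non-`allow` outcome to `Deny` is a simplification. A bare `{"outcome":"allow"}` arrives with no risk level and is treated as a low-risk approval.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum RiskLevel {
        Low,
        Medium,
        High,
    }

    /// Hypothetical parsed guardian output: only `outcome` is required.
    #[derive(Debug, Default)]
    pub struct GuardianOutput {
        pub outcome: String,
        pub risk_level: Option<RiskLevel>,
        pub rationale: Option<String>,
    }

    #[derive(Debug, PartialEq)]
    pub enum Decision {
        Approve { risk: RiskLevel },
        Deny,
    }

    pub fn interpret(output: &GuardianOutput) -> Decision {
        match (output.outcome.as_str(), output.risk_level) {
            // Bare allow: no other fields, treated as a low-risk approval.
            ("allow", None) => Decision::Approve { risk: RiskLevel::Low },
            ("allow", Some(risk)) => Decision::Approve { risk },
            _ => Decision::Deny,
        }
    }
    ```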
    
    ## Verification
    
    - `cargo test -p codex-core guardian::tests::guardian`
    - `cargo test -p codex-core guardian::tests::`
    - `cargo test -p codex-core client_common::tests::`
    - `cargo test -p codex-protocol
    user_input_serialization_includes_final_output_json_schema`
    - `cargo test -p codex-api`
    - `git diff --check`
    
    Note: `cargo test -p codex-core` was also attempted, but this desktop
    environment injects ambient config/proxy state that causes unrelated
    config/session tests expecting pristine defaults to fail.
    
    ---------
    
    Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Support multiple managed environments (#18401)
    ## Summary
    - refactor EnvironmentManager to own keyed environments with
    default/local lookup helpers
    - keep remote exec-server client creation lazy until exec/fs use
    - preserve disabled agent environment access separately from internal
    local environment access
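    A minimal sketch of keyed environments with lazy client creation, using `std::sync::OnceLock`. `Environment`, `ExecServerClient`, and the `"local"` key are hypothetical. The remote client is only constructed on the first `client()` call, mirroring "lazy until exec/fs use".

    ```rust
    use std::collections::HashMap;
    use std::sync::OnceLock;

    /// Hypothetical stand-in for a remote exec-server client.
    #[derive(Debug)]
    pub struct ExecServerClient {
        pub url: String,
    }

    pub struct Environment {
        url: Option<String>,                // None for the local environment
        client: OnceLock<ExecServerClient>, // created on first exec/fs use
    }

    impl Environment {
        pub fn local() -> Self {
            Environment { url: None, client: OnceLock::new() }
        }

        pub fn remote(url: &str) -> Self {
            Environment { url: Some(url.to_string()), client: OnceLock::new() }
        }

        /// Lazily create the remote client; local environments have none.
        pub fn client(&self) -> Option<&ExecServerClient> {
            let url = self.url.as_ref()?;
            Some(self.client.get_or_init(|| ExecServerClient { url: url.clone() }))
        }

        pub fn client_created(&self) -> bool {
            self.client.get().is_some()
        }
    }

    pub struct EnvironmentManager {
        environments: HashMap<String, Environment>,
        default_id: String,
    }

    impl EnvironmentManager {
        pub fn new(default_id: &str) -> Self {
            let mut environments = HashMap::new();
            environments.insert("local".to_string(), Environment::local());
            EnvironmentManager { environments, default_id: default_id.to_string() }
        }

        pub fn insert(&mut self, id: &str, environment: Environment) {
            self.environments.insert(id.to_string(), environment);
        }

        pub fn get(&self, id: &str) -> Option<&Environment> {
            self.environments.get(id)
        }

        pub fn default_environment(&self) -> Option<&Environment> {
            self.get(&self.default_id)
        }

        pub fn local_environment(&self) -> Option<&Environment> {
            self.get("local")
        }
    }
    ```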
    
    ## Validation
    - not run (per Codex worktree instruction to avoid tests/builds unless
    requested)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • fix: fully revert agent identity runtime wiring (#18757)
    ## Summary
    
    This PR fully reverts the previously merged Agent Identity runtime
    integration from the old stack:
    https://github.com/openai/codex/pull/17387/changes
    
    It removes the Codex-side task lifecycle wiring, rollout/session
    persistence, feature flag plumbing, lazy `auth.json` mutation,
    background task auth paths, and request callsite changes introduced by
    that stack.
    
    This leaves the repo in a clean pre-AgentIdentity integration state so
    the follow-up PRs can reintroduce the pieces in smaller reviewable
    layers.
    
    ## Stack
    
    1. This PR: full revert
    2. https://github.com/openai/codex/pull/18871: move Agent Identity
    business logic into a crate
    3. https://github.com/openai/codex/pull/18785: add explicit
    AgentIdentity auth mode and startup task allocation
    4. https://github.com/openai/codex/pull/18811: migrate auth callsites
    through AuthProvider
    
    ## Testing
    
    Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
  • chore: default multi-agent v2 fork to all (#18873)
    Default sub-agents v2 to `all` for the fork mode
  • Add Windows sandbox unified exec runtime support (#15578)
    ## Summary
    
    This is the runtime/foundation half of the Windows sandbox unified-exec
    work.
    
    - add Windows sandbox `unified_exec` session support in
    `windows-sandbox-rs` for both:
      - the legacy restricted-token backend
      - the elevated runner backend
    - extend the PTY/process runtime so driver-backed sessions can support:
      - stdin streaming
      - stdout/stderr separation
      - exit propagation
      - PTY resize hooks
    - add Windows sandbox runtime coverage in `codex-windows-sandbox` /
    `codex-utils-pty`
    
    This PR does **not** enable Windows sandbox `UnifiedExec` for product
    callers yet because hooking this up to app-server comes in the next PR.
    
    Windows sandbox advertising is intentionally kept aligned with `main`,
    so sandboxed Windows callers still fall back to `ShellCommand`.
    
    This PR isolates the runtime/session layer so it can be reviewed
    independently from product-surface enablement.
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • sandboxing: intersect permission profiles semantically (#18275)
    ## Why
    
    Permission approval responses must not be able to grant more access than
    the tool requested. Moving this flow to `PermissionProfile` means the
    comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
    cwd-relative special paths such as `:cwd` and `:project_roots` must stay
    anchored to the turn that produced the request.
    
    ## What changed
    
    This implements semantic `PermissionProfile` intersection in
    `codex-sandboxing` for file-system and network permissions. The
    intersection accepts narrower path grants, rejects broader grants,
    preserves deny-read carve-outs and glob scan depth, and materializes
    cwd-dependent special-path grants to absolute paths before they can be
    recorded for reuse.
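    The "narrower accepted, broader rejected" rule for path grants can be sketched as below. This covers only plain path prefixes with hypothetical `PathGrant` and `Access` types. Deny-read carve-outs, glob scan depth, and network permissions are not modeled. `Path::starts_with` compares whole components, so `/repo2` is not treated as lying under `/repo`.

    ```rust
    use std::path::PathBuf;

    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub enum Access {
        Read,
        Write,
    }

    /// Hypothetical file-system grant.
    #[derive(Debug, Clone, PartialEq)]
    pub struct PathGrant {
        pub path: PathBuf,
        pub access: Access,
    }

    /// Keep a granted entry only if some requested entry covers it: same path
    /// or a descendant, with access no broader than what was requested.
    pub fn intersect(requested: &[PathGrant], granted: &[PathGrant]) -> Vec<PathGrant> {
        granted
            .iter()
            .filter(|g| {
                requested
                    .iter()
                    .any(|r| g.path.starts_with(&r.path) && g.access <= r.access)
            })
            .cloned()
            .collect()
    }
    ```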
    
    The request-permissions response paths now use that intersection
    consistently. App-server captures the request turn cwd before waiting
    for the client response, includes that cwd in the v2 approval params,
    and core stores the requested profile plus cwd for direct TUI/client
    responses and Guardian decisions before recording turn- or
    session-scoped grants. The TUI app-server bridge now preserves the
    app-server request cwd when converting permission approval params into
    core events.
    
    ## Verification
    
    - `cargo test -p codex-sandboxing intersect_permission_profiles --
    --nocapture`
    - `cargo test -p codex-app-server request_permissions_response --
    --nocapture`
    - `cargo test -p codex-core
    request_permissions_response_materializes_session_cwd_grants_before_recording
    -- --nocapture`
    - `cargo check -p codex-tui --tests`
    - `cargo check --tests`
    - `cargo test -p codex-tui
    app_server_request_permissions_preserves_file_system_permissions`
  • Split DeveloperInstructions into individual fragments. (#18813)
    Split DeveloperInstructions into individual fragments.
  • Refactor app-server config loading into ConfigManager (#18442)
    Localize app-server configuration loading in one place.
  • Propagate thread id in MCP tool metadata (#18093)
    ## Summary
    - attach the authoritative Codex thread id to MCP tool request
    `_meta.threadId` for model-initiated tool calls
    - attach the same thread id for manual `mcpServer/tool/call` requests
    before invoking the MCP server
    - cover both metadata helper behavior and the manual app-server MCP path
    in tests
    
    This is needed because the Rust app-server is the last place that still
    has authoritative knowledge that “this model-generated MCP tool call
    belongs to conversation/thread X” before the request leaves Codex and
    reaches Hoopa. It adds `threadId` to MCP request metadata in the
    model-generated tool-call path, using `sess.conversation_id`, and does
    the same for the manual `mcpServer/tool/call` path.
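    A minimal sketch of stamping the thread id into request metadata. `_meta` is modeled as a string map rather than JSON, `with_thread_id` is a hypothetical helper, and overwriting any client-supplied `threadId` is an assumption drawn from the thread id being "authoritative".

    ```rust
    use std::collections::BTreeMap;

    /// Insert the authoritative thread id into `_meta`, keeping other keys.
    pub fn with_thread_id(mut meta: BTreeMap<String, String>, thread_id: &str) -> BTreeMap<String, String> {
        // Assumed: the Codex-side value replaces any existing `threadId`.
        meta.insert("threadId".to_string(), thread_id.to_string());
        meta
    }
    ```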
    
    ## Test plan
    - `cargo test -p codex-core
    mcp_tool_call_thread_id_meta_is_added_to_request_meta --lib`
    - `cargo test -p codex-app-server
    mcp_server_tool_call_returns_tool_result`
    
    Paired Hoopa consumer PR: https://github.com/openai/openai/pull/833263
  • Move external agent config out of core (#18850)
    ## Summary
    - Move external agent config migration logic and tests from `codex-core`
    into `app-server/src/config`.
    - Keep the migration service crate-private to app-server and update the
    API adapter imports.
    - Remove stale core re-exports and expose only the needed marketplace
    source helper.
    
    ## Testing
    - `cargo test -p codex-app-server config::external_agent_config`
    - `just fmt`
    - `just fix -p codex-app-server`
    - `just fix -p codex-core`
    - `git diff --check`
  • [tool search] support namespaced deferred dynamic tools (#18413)
    Deferred dynamic tools need to round-trip a namespace so a tool returned
    by `tool_search` can be called through the same registry key that core
    uses for dispatch.
    
    This change adds namespace support for dynamic tool specs/calls,
    persists it through app-server thread state, and routes dynamic tool
    calls by full `ToolName` while still sending the app the leaf tool name.
    Deferred dynamic tools must provide a namespace; non-deferred dynamic
    tools may remain top-level.
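    Routing by full name while exposing the leaf name can be sketched as follows. `ToolName` here is a simplified, hypothetical struct, and `DynamicToolSpec`, `validate`, and `route` are illustrative names. Two tools with the same leaf name in different namespaces stay distinct in the registry.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical, simplified `ToolName`: optional namespace plus leaf name.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    pub struct ToolName {
        pub namespace: Option<String>,
        pub name: String,
    }

    pub struct DynamicToolSpec {
        pub tool: ToolName,
        pub deferred: bool,
    }

    /// Deferred dynamic tools must declare a namespace; others may be top-level.
    pub fn validate(spec: &DynamicToolSpec) -> Result<(), String> {
        if spec.deferred && spec.tool.namespace.is_none() {
            return Err(format!("deferred dynamic tool `{}` must declare a namespace", spec.tool.name));
        }
        Ok(())
    }

    /// Look up by the full name; the app only receives the leaf name.
    pub fn route<'a>(registry: &'a HashMap<ToolName, String>, call: &'a ToolName) -> Option<(&'a str, &'a str)> {
        registry.get(call).map(|app| (app.as_str(), call.name.as_str()))
    }
    ```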
    
    It also introduces `LoadableToolSpec` as the shared
    function-or-namespace Responses shape used by both `tool_search` output
    and dynamic tool registration, so dynamic tools use the same wrapping
    logic in both paths.
    
    Validation:
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core tool_search`
    
    ---------
    
    Co-authored-by: Sayan Sisodiya <sayan@openai.com>
  • chore: document intentional await-holding cases (#18423)
    ## Why
    
    This PR prepares the stack to enable Clippy await-holding lints that
    were left disabled in #18178. The mechanical lock-scope cleanup is
    handled separately; this PR is the documentation/configuration layer for
    the remaining await-across-guard sites.
    
    Without explicit annotations, reviewers and future maintainers cannot
    tell whether an await-holding warning is a real concurrency smell or an
    intentional serialization boundary.
    
    ## What changed
    
    - Configures `clippy.toml` so `await_holding_invalid_type` also covers
    `tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
    - Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
    ...)]` annotations for intentional async guard lifetimes.
    - Documents the main categories of intentional cases: active-turn state
    transitions that must remain atomic, session-owned MCP manager accesses,
    remote-control websocket serialization, JS REPL kernel/process
    serialization, OAuth persistence, external bearer token refresh
    serialization, and tests that intentionally serialize shared global or
    session-owned state.
    - For external bearer token refresh, documents the existing
    serialization boundary: holding `cached_token` across the provider
    command prevents concurrent cache misses from starting duplicate refresh
    commands, and the current behavior is small enough that an explicit
    expectation is easier to maintain than adding another synchronization
    primitive.
    
    ## Verification
    
    - `cargo clippy -p codex-login --all-targets`
    - `cargo clippy -p codex-connectors --all-targets`
    - `cargo clippy -p codex-core --all-targets`
    - The follow-up PR #18698 enables `await_holding_invalid_type` and
    `await_holding_lock` as workspace `deny` lints, so any undocumented
    remaining offender will fail Clippy.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
    * #18698
    * __->__ #18423
  • Organize context fragments (#18794)
    Organize context fragments under `core/context`. Implement the same trait
    on all of them.
  • Add remote_sandbox_config to our config requirements (#18763)
    ## Why
    
    Customers need finer-grained control over allowed sandbox modes based on
    the host Codex is running on. For example, they may want stricter
    sandbox limits on devboxes while keeping a different default elsewhere.
    
    Our current cloud requirements can target user/account groups, but they
    cannot vary sandbox requirements by host. That makes remote development
    environments awkward because the same top-level `allowed_sandbox_modes`
    has to apply everywhere.
    
    ## What
    
    Adds a new `remote_sandbox_config` section to `requirements.toml`:
    
    ```toml
    allowed_sandbox_modes = ["read-only"]
    
    [[remote_sandbox_config]]
    hostname_patterns = ["*.org"]
    allowed_sandbox_modes = ["read-only", "workspace-write"]
    
    [[remote_sandbox_config]]
    hostname_patterns = ["*.sh", "runner-*.ci"]
    allowed_sandbox_modes = ["read-only", "danger-full-access"]
    ```
    
    During requirements resolution, Codex resolves the local host name once,
    preferring the machine FQDN when available and falling back to the
    cleaned kernel hostname. This host classification is best effort rather
    than authenticated device proof.
    
    Each requirements source applies its first matching
    `remote_sandbox_config` entry before it is merged with other sources.
    The shared merge helper keeps that `apply_remote_sandbox_config` step
    paired with requirements merging so new requirements sources do not have
    to remember the extra call.
    
    That preserves source precedence: a lower-precedence requirements file
    with a matching `remote_sandbox_config` cannot override a
    higher-precedence source that already set `allowed_sandbox_modes`.
    
    This also wires the hostname-aware resolution through app-server,
    CLI/TUI config loading, config API reads, and config layer metadata so
    they all evaluate remote sandbox requirements consistently.
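    The per-source application and precedence rule can be sketched as below. `Requirements` and `RemoteSandboxConfig` are simplified, hypothetical mirrors of the TOML shape. The matcher supports only `*` and compares case-insensitively, which is an assumption about hostname matching. Each source applies its first matching entry before merging, and sources are ordered from highest to lowest precedence.

    ```rust
    /// Hypothetical, simplified requirements source.
    #[derive(Debug, Clone, Default)]
    pub struct Requirements {
        pub allowed_sandbox_modes: Option<Vec<String>>,
        pub remote_sandbox_config: Vec<RemoteSandboxConfig>,
    }

    #[derive(Debug, Clone)]
    pub struct RemoteSandboxConfig {
        pub hostname_patterns: Vec<String>,
        pub allowed_sandbox_modes: Vec<String>,
    }

    /// `*` matches any run of bytes (including none); everything else is literal.
    fn glob_match(pattern: &[u8], text: &[u8]) -> bool {
        match pattern.split_first() {
            None => text.is_empty(),
            Some((&b'*', rest)) => (0..=text.len()).any(|i| glob_match(rest, &text[i..])),
            Some((c, rest)) => text.first() == Some(c) && glob_match(rest, &text[1..]),
        }
    }

    impl Requirements {
        /// Apply this source's first matching entry, if any, to this source only.
        fn apply_remote_sandbox_config(mut self, host: &str) -> Self {
            let host = host.to_ascii_lowercase();
            let matched = self
                .remote_sandbox_config
                .iter()
                .find(|entry| {
                    entry.hostname_patterns.iter().any(|pattern| {
                        glob_match(pattern.to_ascii_lowercase().as_bytes(), host.as_bytes())
                    })
                })
                .map(|entry| entry.allowed_sandbox_modes.clone());
            if let Some(modes) = matched {
                self.allowed_sandbox_modes = Some(modes);
            }
            self
        }
    }

    /// Apply-then-merge: the first source (highest precedence) that sets modes wins.
    pub fn resolve_allowed_modes(sources: Vec<Requirements>, host: &str) -> Option<Vec<String>> {
        sources
            .into_iter()
            .map(|source| source.apply_remote_sandbox_config(host))
            .find_map(|source| source.allowed_sandbox_modes)
    }
    ```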
    
    ## Verification
    
    - `cargo test -p codex-config remote_sandbox_config`
    - `cargo test -p codex-config host_name`
    - `cargo test -p codex-core
    load_config_layers_applies_matching_remote_sandbox_config`
    - `cargo test -p codex-core
    system_remote_sandbox_config_keeps_cloud_sandbox_modes`
    - `cargo test -p codex-config`
    - `cargo test -p codex-core` unit tests passed; `tests/all.rs`
    integration matrix was intentionally stopped after the relevant focused
    tests passed
    - `just fix -p codex-config`
    - `just fix -p codex-core`
    - `cargo check -p codex-app-server`
  • feat(auto-review) Handle request_permissions calls (#18393)
    ## Summary
    When auto-review is enabled, it should handle the `request_permissions`
    tool. We'll need to clean up the UX, but I'm planning to do that in a
    separate pass.
    
    ## Testing
    - [x] Ran locally
    <img width="893" height="396" alt="Screenshot 2026-04-17 at 1 16 13 PM"
    src="https://github.com/user-attachments/assets/4c045c5f-1138-4c6c-ac6e-2cb6be4514d8"
    />
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • fix(guardian) Dont hard error on feature disable (#18795)
    ## Summary
    Disabling the feature should not cause a hard error for now.
    
    ## Test plan
    - [x] Updated unit test
  • feat: add a built-in Amazon Bedrock model provider (#18744)
    ## Why
    
    Codex needs a first-class `amazon-bedrock` model provider so users can
    select Bedrock without copying a full provider definition into
    `config.toml`. The provider has Codex-owned defaults for the pieces that
    should stay consistent across users: the display `name`, Bedrock
    `base_url`, and `wire_api`.
    
    At the same time, users still need a way to choose the AWS credential
    profile used by their local environment. This change makes
    `amazon-bedrock` a partially modifiable built-in provider: code owns the
    provider identity and endpoint defaults, while user config can set
    `model_providers.amazon-bedrock.aws.profile`.
    
    For example:
    
    ```toml
    model_provider = "amazon-bedrock"
    
    [model_providers.amazon-bedrock.aws]
    profile = "codex-bedrock"
    ```
    
    ## What Changed
    
    - Added `amazon-bedrock` to the built-in model provider map with:
      - `name = "Amazon Bedrock"`
      - `base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"`
      - `wire_api = "responses"`
    - Added AWS provider auth config with a profile-only shape:
    `model_providers.<id>.aws.profile`.
    - Kept AWS auth config restricted to `amazon-bedrock`; custom providers
    that set `aws` are rejected.
    - Allowed `model_providers.amazon-bedrock` through reserved-provider
    validation so it can act as a partial override.
    - During config loading, only `aws.profile` is copied from the
    user-provided `amazon-bedrock` entry onto the built-in provider. Other
    Bedrock provider fields remain hard-coded by the built-in definition.
    - Updated the generated config schema for the new provider AWS profile
    config.
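
    A rough sketch of the partial-override rule, assuming simplified provider
    structs. `ProviderConfig`, `AwsAuthConfig`, and `resolve_providers` are
    illustrative names, not the actual config types. The sketch copies only
    `aws.profile` from the user's `amazon-bedrock` entry and rejects `aws`
    on any other provider.

    ```rust
    use std::collections::HashMap;

    // Illustrative shapes only; field and type names are assumptions.
    #[derive(Clone, Debug, Default, PartialEq)]
    struct AwsAuthConfig {
        profile: Option<String>,
    }

    #[derive(Clone, Debug, Default, PartialEq)]
    struct ProviderConfig {
        name: String,
        base_url: Option<String>,
        wire_api: String,
        aws: Option<AwsAuthConfig>,
    }

    /// Code-owned defaults for the built-in provider.
    fn built_in_bedrock() -> ProviderConfig {
        ProviderConfig {
            name: "Amazon Bedrock".to_string(),
            base_url: Some("https://bedrock-mantle.us-east-1.api.aws/v1".to_string()),
            wire_api: "responses".to_string(),
            aws: None,
        }
    }

    fn resolve_providers(
        user: &HashMap<String, ProviderConfig>,
    ) -> Result<HashMap<String, ProviderConfig>, String> {
        let mut providers = HashMap::new();
        let mut bedrock = built_in_bedrock();
        for (id, cfg) in user {
            if id == "amazon-bedrock" {
                // Only aws.profile is copied; name, base_url and wire_api stay built-in.
                if let Some(profile) = cfg.aws.as_ref().and_then(|a| a.profile.clone()) {
                    bedrock.aws = Some(AwsAuthConfig { profile: Some(profile) });
                }
            } else if cfg.aws.is_some() {
                return Err(format!("model_providers.{id}: `aws` is only supported for amazon-bedrock"));
            } else {
                providers.insert(id.clone(), cfg.clone());
            }
        }
        providers.insert("amazon-bedrock".to_string(), bedrock);
        Ok(providers)
    }
    ```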
  • fix: fix stale proxy env restoration after shell snapshots (#17271)
    ## Summary
    
    This fixes a stale-environment path in shell snapshot restoration. A
    sandboxed command can source a shell snapshot that was captured while an
    older proxy process was running. If that proxy has died and come back on
    a different port, the snapshot can otherwise put old proxy values back
    into the command environment, which is how tools like `pip` end up
    talking to a dead proxy.
    
    The wrapper now captures the live process environment before sourcing
    the snapshot and then restores or clears every proxy env var from the
    proxy crate's canonical list. That makes proxy state after shell
    snapshot restoration match the current command environment, rather than
    whatever proxy values happened to be present in the snapshot. On macOS,
    the Codex-generated `GIT_SSH_COMMAND` is refreshed when the SOCKS
    listener changes, while custom SSH wrappers are still left alone.
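
    The wrapper itself is shell, but its restore step can be modeled in Rust
    as below. This is a sketch: `PROXY_ENV_VARS` is a hypothetical subset
    standing in for the proxy crate's canonical list, and `restore_proxy_env`
    is an illustrative name.

    ```rust
    use std::collections::HashMap;

    // Hypothetical subset; the real canonical list lives in the proxy crate.
    const PROXY_ENV_VARS: &[&str] = &[
        "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY",
        "http_proxy", "https_proxy", "all_proxy", "no_proxy",
    ];

    /// After sourcing a snapshot, reset every proxy var to its live value,
    /// or remove it when the live environment did not have it.
    fn restore_proxy_env(live: &HashMap<String, String>, after_snapshot: &mut HashMap<String, String>) {
        for &var in PROXY_ENV_VARS {
            match live.get(var) {
                Some(value) => {
                    after_snapshot.insert(var.to_string(), value.clone());
                }
                None => {
                    after_snapshot.remove(var);
                }
            }
        }
    }
    ```

    Non-proxy variables from the snapshot are left untouched; only the
    listed proxy variables are forced back to the live state.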
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add session config loader interface (#18208)
    ## Why
    
    Cloud-hosted sessions need a way for the service that starts or manages
    a thread to provide session-owned config without treating all config as
    if it came from the same user/project/workspace TOML stack.
    
    The important boundary is ownership: some values should be controlled by
    the session/orchestrator, some by the authenticated user, and later some
    may come from the executor. The earlier broad config-store shape made
    that boundary too fuzzy and overlapped heavily with the existing
    filesystem-backed config loader. This PR starts with the smaller piece
    we need now: a typed session config loader that can feed the existing
    config layer stack while preserving the normal precedence and merge
    behavior.
    
    ## What Changed
    
    - Added `ThreadConfigLoader` and related typed payloads in
    `codex-config`.
      - `SessionThreadConfig` currently supports `model_provider`,
        `model_providers`, and feature flags.
      - `UserThreadConfig` is present as an ownership boundary, but does not
        yet add TOML-backed fields.
      - `NoopThreadConfigLoader` preserves existing behavior when no external
        loader is configured.
      - `StaticThreadConfigLoader` supports tests and simple callers.
    
    - Taught thread config sources to produce ordinary `ConfigLayerEntry`
    values so the existing `ConfigLayerStack` remains the place where
    precedence and merging happen.
    
    - Wired the loader through `ConfigBuilder`, the config loader, and
    app-server startup paths so app-server can provide session-owned config
    before deriving a thread config.
    
    - Added coverage for:
      - translating typed thread config into config layers,
      - inserting thread config layers into the stack at the right precedence,
      - applying session-provided model provider and feature settings when
        app-server derives config from thread params.
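
    A simplified sketch of how a session loader could feed the layer stack,
    assuming a single `model_provider` field and numeric precedence values.
    `ThreadConfigSource`, `SessionOverrides`, `ConfigLayer`, and both
    precedence constants are hypothetical. The only behavior taken from this
    PR is that a no-op loader changes nothing and session config wins over
    request-provided config.

    ```rust
    // Hypothetical, simplified shapes; not the actual codex-config types.
    #[derive(Clone, Debug, Default)]
    struct SessionOverrides {
        model_provider: Option<String>,
    }

    trait ThreadConfigSource {
        fn load(&self) -> Option<SessionOverrides>;
    }

    /// Preserves existing behavior: contributes no layer.
    struct NoopSource;

    impl ThreadConfigSource for NoopSource {
        fn load(&self) -> Option<SessionOverrides> {
            None
        }
    }

    /// Returns a fixed config, useful for tests and simple callers.
    struct StaticSource(SessionOverrides);

    impl ThreadConfigSource for StaticSource {
        fn load(&self) -> Option<SessionOverrides> {
            Some(self.0.clone())
        }
    }

    #[derive(Clone, Debug)]
    struct ConfigLayer {
        precedence: u8,
        model_provider: Option<String>,
    }

    // Hypothetical values: session config must outrank request config.
    const REQUEST_PRECEDENCE: u8 = 10;
    const SESSION_PRECEDENCE: u8 = 20;

    /// Turns session config into an ordinary layer, then resolves the stack:
    /// the highest-precedence layer that sets a value wins.
    fn resolve_model_provider(mut layers: Vec<ConfigLayer>, source: &dyn ThreadConfigSource) -> Option<String> {
        if let Some(session) = source.load() {
            layers.push(ConfigLayer {
                precedence: SESSION_PRECEDENCE,
                model_provider: session.model_provider,
            });
        }
        layers.sort_by_key(|layer| layer.precedence);
        layers
            .into_iter()
            .fold(None, |acc, layer| layer.model_provider.or(acc))
    }
    ```

    Keeping the loader output as ordinary layers means precedence and
    merging stay in one place instead of being special-cased per source.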
    
    ## Follow-Ups
    
    This intentionally stops short of adding the remote/service transport.
    The next pieces are expected to be:
    
    1. Define the proto/API shape for this interface.
    2. Add a client implementation that can source session config from the
    service side.
    
    ## Verification
    
    - Added unit coverage in `codex-config` for the loader and layer
    conversion.
    - Added `codex-core` config loader coverage for thread config layer
    precedence.
    - Added app-server coverage that verifies session thread config wins
    over request-provided config for model provider and feature settings.
  • Add realtime silence tool (#18635)
    ## Summary
    
    Adds a second realtime v2 function tool, `remain_silent`, so the
    realtime model has an explicit non-speaking action when the
    collaboration mode or latest context says it should not answer aloud.
    This is stacked on #18597.
    
    ## Design
    
    - Advertise `remain_silent` alongside `background_agent` in realtime v2
    conversational sessions.
    - Parse `remain_silent` function calls into a typed
    `RealtimeEvent::NoopRequested` event.
    - Have core answer that function call with an empty
    `function_call_output` and deliberately avoid `response.create`, so no
    follow-up realtime response is requested.
    - Keep the event hidden from app-server/TUI surfaces; it is operational
    plumbing, not user-visible conversation content.
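
    The design bullets above can be sketched as follows. The enums and
    function names are hypothetical. The tool names, and the rule that a
    silence request gets an empty `function_call_output` with no follow-up
    `response.create`, come from this description.

    ```rust
    // Hypothetical event and message types for illustration.
    #[derive(Debug, PartialEq)]
    enum RealtimeEvent {
        BackgroundAgentRequested { call_id: String, input: String },
        NoopRequested { call_id: String },
        Unknown { name: String },
    }

    #[derive(Debug, PartialEq)]
    enum ClientMessage {
        FunctionCallOutput { call_id: String, output: String },
        ResponseCreate,
    }

    /// Maps an advertised function call name onto a typed event.
    fn parse_function_call(name: &str, call_id: &str, arguments: &str) -> RealtimeEvent {
        match name {
            "background_agent" => RealtimeEvent::BackgroundAgentRequested {
                call_id: call_id.to_string(),
                input: arguments.to_string(),
            },
            "remain_silent" => RealtimeEvent::NoopRequested { call_id: call_id.to_string() },
            other => RealtimeEvent::Unknown { name: other.to_string() },
        }
    }

    /// Closes a silence request with an empty output and deliberately
    /// does not send `ResponseCreate`, so the model stays quiet.
    fn answer_noop(event: &RealtimeEvent) -> Vec<ClientMessage> {
        match event {
            RealtimeEvent::NoopRequested { call_id } => vec![ClientMessage::FunctionCallOutput {
                call_id: call_id.clone(),
                output: String::new(),
            }],
            _ => Vec::new(),
        }
    }
    ```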
  • [codex] prefer inherited spawn agent model (#18701)
    This updates the spawn-agent tool contract so subagents are presented as
    inheriting the parent model by default. The visible model list is now
    framed as optional overrides, the model parameter tells callers to leave
    it unset, and the delegation guidance no longer nudges models toward
    picking a smaller/mini override.
    
    Fixes reports that 5.4 would occasionally pick 5.2 or an older model for
    its sub-agents.
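
    The intended contract, sketched as a hypothetical helper: leaving `model`
    unset inherits the parent model. The PR itself changes tool descriptions
    rather than adding such a function, and treating an empty string as unset
    is an assumption of this sketch.

    ```rust
    /// Hypothetical helper: a sub-agent inherits the parent's model unless
    /// the caller passes a non-empty override.
    fn resolve_subagent_model(parent_model: &str, requested: Option<&str>) -> String {
        match requested {
            Some(model) if !model.trim().is_empty() => model.to_string(),
            _ => parent_model.to_string(),
        }
    }
    ```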
  • Add experimental remote thread store config (#18714)
    Adds an experimental config option so app-server can use a remote thread
    store instead of the local thread store implementation.
  • Fix stale model test fixtures (#18719)
    Fixes stale test fixtures left after the active bundled model catalog
    updates in #18586 and #18388. Those changes made `gpt-5.4` the current
    default and removed several older hardcoded slugs, which left Windows
    Bazel shards failing TUI and config tests.
    
    What changed:
    - Refresh TUI model migration, availability NUX, plan-mode, status, and
    snapshot fixtures to use active bundled model slugs.
    - Update the config edit test expectation for the TOML-quoted
    `"gpt-5.2"` migration key.
    - Move the model catalog tests into
    `codex-rs/tui/src/app/tests/model_catalog.rs` so touching them does not
    trip the blob-size policy for `app.rs`.
    
    Verification:
    - CI Bazel/lint checks are expected to cover the affected test shards.
  • Update realtime handoff transcript handling (#18597)
    ## Summary
    
    This PR aims to improve integration between the realtime model and the
    codex agent by sharing more context with each other. In particular, we
    now share full realtime conversation transcript deltas in addition to
    the delegation message.
    
    `realtime_conversation.rs` now turns a handoff into:
    ```
    <realtime_delegation>
      <input>...</input>
      <transcript_delta>...</transcript_delta>
    </realtime_delegation>
    ```
    
    ## Implementation notes
    
    The transcript is accumulated in the realtime websocket layer as parsed
    realtime events arrive. When a background-agent handoff is requested,
    the current transcript snapshot is copied onto the handoff event and
    then serialized by `realtime_conversation.rs` into the hidden realtime
    delegation envelope that Codex receives as user-turn context.
    
    For Realtime V2, the session now explicitly enables input audio
    transcription, and the parser handles the relevant input/output
    transcript completion events so the snapshot includes both user speech
    and realtime model responses. The delegation `<input>` remains the
    actual handoff request, while `<transcript_delta>` carries the
    surrounding conversation history for context.
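
    A sketch of the accumulate-then-serialize flow, assuming a `role: text`
    line format for transcript entries. `TranscriptAccumulator`, `Speaker`,
    and `delegation_envelope` are hypothetical names. This sketch also does
    not escape XML-special characters, which real code may need to handle.

    ```rust
    // Hypothetical types; only the envelope tag names come from this PR.
    #[derive(Clone, Debug, PartialEq)]
    enum Speaker {
        User,
        Assistant,
    }

    #[derive(Default)]
    struct TranscriptAccumulator {
        entries: Vec<(Speaker, String)>,
    }

    impl TranscriptAccumulator {
        /// Called as completed input/output transcript events arrive.
        fn push(&mut self, speaker: Speaker, text: &str) {
            self.entries.push((speaker, text.to_string()));
        }

        /// Snapshot copied onto the handoff event when delegation is requested.
        fn snapshot(&self) -> String {
            self.entries
                .iter()
                .map(|(speaker, text)| {
                    let role = match speaker {
                        Speaker::User => "user",
                        Speaker::Assistant => "assistant",
                    };
                    format!("{role}: {text}")
                })
                .collect::<Vec<_>>()
                .join("\n")
        }
    }

    /// Builds the hidden delegation envelope; no escaping is performed.
    fn delegation_envelope(input: &str, transcript_delta: &str) -> String {
        format!(
            "<realtime_delegation>\n  <input>{input}</input>\n  <transcript_delta>{transcript_delta}</transcript_delta>\n</realtime_delegation>"
        )
    }
    ```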
    
    Reviewers should note that the transcript payload is intended for Codex
    context sharing, not UI rendering. The realtime delegation envelope
    should stay hidden from the user-facing transcript surface, while still
    being included in the background-agent turn so Codex can answer with the
    same conversational context the realtime model had.
  • chore(guardian) disable mcps and plugins (#18722)
    ## Summary
    Disables apps, plugins, and MCP servers for the guardian subagent thread.
    
    ## Testing
    - [x] Added unit tests
  • [codex-analytics] guardian review analytics schema polishing (#17692)
    ## Why
    
    Guardian review analytics needs a Rust event shape that matches the
    backend schema while avoiding unnecessary PII exposure from reviewed
    tool calls. This PR narrows the analytics payload to the fields we
    intend to emit and keeps shared Guardian assessment enums in protocol
    instead of duplicating equivalent analytics-only enums.
    
    ## What changed
    
    - Uses protocol Guardian enums directly for `risk_level`,
    `user_authorization`, `outcome`, and command source values.
    - Removes high-risk reviewed-action fields from the analytics payload,
    including raw commands, display strings, working directories, file
    paths, network targets/hosts, justification text, retry reason, and
    rationale text.
    - Makes `target_item_id` and `tool_call_count` nullable so the Codex
    event can represent cases where the app-server protocol or producer does
    not have those values.
    - Keeps lower-risk structured reviewed-action metadata such as sandbox
    permissions, permission profile, `tty`, `execve` source/program, network
    protocol/port, and MCP connector/tool labels.
    - Adds an analytics reducer/client test covering `codex_guardian_review`
    serialization with an optional `target_item_id` and absent removed
    fields.
    
    ## Verification
    
    - `cargo test -p codex-analytics
    guardian_review_event_ingests_custom_fact_with_optional_target_item`
    - `cargo fmt --check`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17692).
    * #17696
    * #17695
    * #17693
    * __->__ #17692
  • Wire the PatchUpdated events through app_server (#18289)
    Wires `patch_updated` events through app_server. These events are parsed
    and streamed while the model is still writing the `apply_patch` call.
    This also adds 500 ms of buffering to `patch_updated` events in the
    `diff_consumer`.
    
    The eventual goal is to use this to display better progress indicators in
    the codex app.
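
    One way to implement the 500 ms buffering, sketched under the assumption
    that each `patch_updated` event carries the full patch written so far,
    so keeping only the newest pending update loses nothing.
    `PatchUpdateBuffer` is a hypothetical name, and time is passed in
    explicitly to keep the sketch deterministic.

    ```rust
    use std::time::{Duration, Instant};

    /// Hypothetical buffer: holds the newest update and releases it at most
    /// once per window.
    struct PatchUpdateBuffer {
        window: Duration,
        pending: Option<String>,
        last_flush: Option<Instant>,
    }

    impl PatchUpdateBuffer {
        fn new(window: Duration) -> Self {
            Self { window, pending: None, last_flush: None }
        }

        /// Records an update, replacing any older pending one, and flushes if due.
        fn push(&mut self, update: String, now: Instant) -> Option<String> {
            self.pending = Some(update);
            self.poll(now)
        }

        /// Releases the pending update if the window since the last flush elapsed.
        fn poll(&mut self, now: Instant) -> Option<String> {
            let due = match self.last_flush {
                None => true,
                Some(last) => now.duration_since(last) >= self.window,
            };
            if due && self.pending.is_some() {
                self.last_flush = Some(now);
                self.pending.take()
            } else {
                None
            }
        }
    }
    ```

    The first update goes out immediately; later updates inside the window
    collapse into the newest one, which a timer-driven `poll` releases.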
  • Update models.json (#18586)
    - Replace the active models-manager catalog with the deleted core
    catalog contents.
    - Replace stale hardcoded test model slugs with current bundled model
    slugs.
    - Keep this as a stacked change on top of the cleanup PR.
  • refactor: use semaphores for async serialization gates (#18403)
    This is the second cleanup in the await-holding lint stack. The
    higher-level goal, following https://github.com/openai/codex/pull/18178
    and https://github.com/openai/codex/pull/18398, is to enable Clippy
    coverage for guards held across `.await` points without carrying broad
    suppressions.
    
    The stack is working toward enabling Clippy's
    [`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
    lint and the configurable
    [`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
    lint for Tokio guard types.
    
    Several existing fields used `tokio::sync::Mutex<()>` only as
    one-at-a-time async gates. Those guards intentionally lived across
    `.await` while an operation was serialized. A mutex over `()` suggests
    protected data and trips the await-holding lint shape; a single-permit
    `tokio::sync::Semaphore` expresses the intended serialization directly.
    
    ## What changed
    
    - Replace `Mutex<()>` serialization gates with `Semaphore::new(1)` for
    agent identity ensure, exec policy updates, guardian review session
    reuse, plugin remote sync, managed network proxy refresh, auth token
    refresh, and RMCP session recovery.
    - Update call sites from `lock().await` / `try_lock()` to
    `acquire().await` / `try_acquire()`.
    - Map closed-semaphore errors into the existing local error types, even
    though these semaphores are owned for the lifetime of their managers.
    - Update session test builders for the new
    `managed_network_proxy_refresh_lock` type.
    
    ## Verification
    
    - The split stack was verified at the final lint-enabling head with
    `just clippy`.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18403).
    * #18698
    * #18423
    * #18418
    * __->__ #18403
  • protocol: canonicalize file system permissions (#18274)
    ## Why
    
    `PermissionProfile` needs stable, canonical file-system semantics before
    it can become the primary runtime permissions abstraction. Without a
    canonical form, callers have to keep re-deriving legacy sandbox maps and
    profile comparisons remain lossy or order-dependent.
    
    ## What changed
    
    This adds canonicalization helpers for `FileSystemPermissions` and
    `PermissionProfile`, expands special paths into explicit sandbox
    entries, and updates permission request/conversion paths to consume
    those canonical entries. It also tightens the legacy bridge so root-wide
    write profiles with narrower carveouts are not silently projected as
    full-disk legacy access.
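
    A toy model of canonicalization and the tightened legacy bridge. The
    `Access` enum and both functions are hypothetical. Resolving duplicate
    entries for the same path to the broader access is an assumption of this
    sketch, not documented behavior.

    ```rust
    use std::collections::BTreeMap;
    use std::path::PathBuf;

    #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
    enum Access {
        Read,
        Write,
    }

    /// Canonical form: one entry per path, sorted by path, so comparisons
    /// no longer depend on the order entries were written in.
    fn canonicalize(entries: &[(PathBuf, Access)]) -> Vec<(PathBuf, Access)> {
        let mut map: BTreeMap<PathBuf, Access> = BTreeMap::new();
        for (path, access) in entries {
            let slot = map.entry(path.clone()).or_insert(*access);
            if *access > *slot {
                *slot = *access; // assumption: broader access wins on duplicates
            }
        }
        map.into_iter().collect()
    }

    /// Legacy projection: a root write counts as full-disk write access only
    /// when no narrower read-only carveout exists.
    fn is_full_disk_write(canonical: &[(PathBuf, Access)]) -> bool {
        let root = PathBuf::from("/");
        let root_write = canonical.iter().any(|(p, a)| *p == root && *a == Access::Write);
        let has_carveout = canonical.iter().any(|(p, a)| *p != root && *a == Access::Read);
        root_write && !has_carveout
    }
    ```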
    
    ## Verification
    
    - `cargo test -p codex-protocol
    root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
    - `cargo test -p codex-sandboxing permission -- --nocapture`
    - `cargo test -p codex-tui permissions -- --nocapture`
  • Avoid false shell snapshot cleanup warnings (#18441)
    ## Why
    Fresh app-server thread startup can create a shell snapshot through a
    temp file and then promote it to the final snapshot path. The previous
    implementation briefly wrapped the temp path in `ShellSnapshot`, so
    after a successful rename its `Drop` attempted to delete the old temp
    path and could log a false `ENOENT` warning.
    
    Fixes #17549.
    
    ## What changed
    - Validate the temp snapshot path directly before promotion.
    - Rename the temp path directly to the final snapshot path.
    - Keep explicit cleanup of the temp path on validation or finalization
    failures.
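
    The new promotion flow, sketched with `std::fs`. `validate_snapshot` is a
    hypothetical placeholder check (the real validation is not described
    here). The point is that the temp path is never owned by a type whose
    `Drop` deletes it, so cleanup only runs on the failure path.

    ```rust
    use std::fs;
    use std::io;
    use std::path::Path;

    /// Hypothetical validation: rejects an empty snapshot file.
    fn validate_snapshot(path: &Path) -> io::Result<()> {
        let contents = fs::read_to_string(path)?;
        if contents.trim().is_empty() {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "empty snapshot"));
        }
        Ok(())
    }

    /// Validates the temp path directly, then renames it into place.
    fn promote_snapshot(temp: &Path, final_path: &Path) -> io::Result<()> {
        let result = validate_snapshot(temp).and_then(|()| fs::rename(temp, final_path));
        if result.is_err() {
            // Explicit cleanup on failure only; ignore errors if it is already gone.
            let _ = fs::remove_file(temp);
        }
        result
    }
    ```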
  • [codex] Use background agent task auth for backend calls (#18094)
    ## Summary
    
    Introduces a single background/control-plane agent task for ChatGPT
    backend requests that do not have a thread-scoped task, with
    `AuthManager` owning the default ChatGPT backend authorization decision.
    
    Callers now ask `AuthManager` for the default ChatGPT backend
    authorization header. `AuthManager` decides whether that is bearer or
    background AgentAssertion based on config/internal state, while
    low-level bootstrap paths can explicitly request bearer-only auth.
    
    This PR is stacked on PR4 and focuses on the shared background task auth
    plumbing plus the first tranche of backend/control-plane consumers. The
    remaining callsite wiring is split into PR4.2 to keep review size down.
    
    ## Stack
    
    - PR1: https://github.com/openai/codex/pull/17385 - add
    `features.use_agent_identity`
    - PR2: https://github.com/openai/codex/pull/17386 - register agent
    identities when enabled
    - PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
    when enabled
    - PR3.1: https://github.com/openai/codex/pull/17978 - persist and
    prewarm registered tasks per thread
    - PR4: https://github.com/openai/codex/pull/17980 - use task-scoped
    `AgentAssertion` for downstream calls
    - PR4.1: this PR - introduce AuthManager-owned background/control-plane
    `AgentAssertion` auth
    - PR4.2: https://github.com/openai/codex/pull/18260 - use background
    task auth for additional backend/control-plane calls
    
    ## What Changed
    
    - add background task registration and assertion minting inside
    `codex-login`
    - persist `agent_identity.background_task_id` separately from
    per-session task state
    - make `BackgroundAgentTaskManager` private to `codex-login`; call sites
    do not instantiate or pass it around
    - teach `AuthManager` the ChatGPT backend base URL and feature-derived
    background auth mode from resolved config
    - expose bearer-only helpers for bootstrap/registration/refresh-style
    paths that must not use AgentAssertion
    - wire `AuthManager` default ChatGPT authorization through app listing,
    connector directory listing, remote plugins, MCP status/listing,
    analytics, and core-skills remote calls
    - preserve bearer fallback when the feature is disabled, the backend
    host is unsupported, or background task registration is not available
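
    The authorization decision described above, reduced to a sketch.
    `BackendAuthorization`, `AuthDecisionInputs`, and the `mint_assertion`
    callback are hypothetical. The preconditions for choosing
    `AgentAssertion` and the bearer fallback follow the bullets above.

    ```rust
    // Hypothetical types for illustration; not the codex-login API.
    #[derive(Debug, PartialEq)]
    enum BackendAuthorization {
        Bearer(String),
        AgentAssertion(String),
    }

    struct AuthDecisionInputs<'a> {
        feature_enabled: bool,
        backend_host_supported: bool,
        background_task_id: Option<&'a str>,
        bearer_token: &'a str,
    }

    /// Default ChatGPT backend auth: AgentAssertion only when every
    /// precondition holds, bearer otherwise.
    fn default_backend_authorization(
        inputs: &AuthDecisionInputs,
        mint_assertion: impl Fn(&str) -> String,
    ) -> BackendAuthorization {
        match inputs.background_task_id {
            Some(task_id) if inputs.feature_enabled && inputs.backend_host_supported => {
                BackendAuthorization::AgentAssertion(mint_assertion(task_id))
            }
            _ => BackendAuthorization::Bearer(inputs.bearer_token.to_string()),
        }
    }

    /// Bootstrap, registration, and refresh paths never use AgentAssertion.
    fn bearer_only_authorization(inputs: &AuthDecisionInputs) -> BackendAuthorization {
        BackendAuthorization::Bearer(inputs.bearer_token.to_string())
    }
    ```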
    
    ## Validation
    
    - `just fmt`
    - `cargo check -p codex-core -p codex-login -p codex-analytics -p
    codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
    codex-models-manager -p codex-chatgpt -p codex-model-provider -p
    codex-mcp -p codex-core-skills`
    - `cargo test -p codex-login agent_identity`
    - `cargo test -p codex-model-provider bearer_auth_provider`
    - `cargo test -p codex-core agent_assertion`
    - `cargo test -p codex-app-server remote_control`
    - `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
    - `cargo test -p codex-models-manager manager::tests`
    - `cargo test -p codex-chatgpt`
    - `cargo test -p codex-cloud-tasks`
    - `just fix -p codex-core -p codex-login -p codex-analytics -p
    codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
    codex-models-manager -p codex-chatgpt -p codex-model-provider -p
    codex-mcp -p codex-core-skills`
    - `just fix -p codex-app-server`
    - `git diff --check`
  • feat: add metric to track the number of turns with memory usage (#18662)
    Adds a `codex.turn.memory` metric that records whether a turn used
    memories. It is a separate metric rather than a label on the other turn
    metrics, to limit their cardinality.
  • feat: add --ignore-user-config and --ignore-rules (#18646)
    Adds these two flags so a `codex exec` run can be fully isolated from any
    user-configured rules or tools. This will be used by Chronicle.
  • fix: FS watcher when file does not exist yet (#18492)
    The initial goal of this PR was to stabilise the test
    `fs_watch_allows_missing_file_targets`. After further investigation, it
    turned out that this test had always been failing, and its apparent
    flakiness came mostly from a race between timeouts.

    The test covers what happens when a notifier subscribes while the target
    file does not exist yet. The main code was actually broken in that case:
    when the file did not exist yet, the notifier never fired, even after
    the file was created.

    This PR fixes the main code (and the test). When the file does not exist,
    we now watch its parent directory and refresh the watch once the file is
    created.
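
    A sketch of the watch-target selection, assuming the watcher walks up to
    the nearest existing ancestor (the PR only mentions the parent
    directory). `watch_target` and `event_is_relevant` are hypothetical
    helpers, not the actual watcher API.

    ```rust
    use std::path::{Path, PathBuf};

    /// Chooses what to register with the OS watcher: the target itself if it
    /// exists, otherwise its nearest existing ancestor directory.
    fn watch_target(path: &Path) -> PathBuf {
        let mut candidate = path.to_path_buf();
        while !candidate.exists() {
            match candidate.parent() {
                Some(parent) => candidate = parent.to_path_buf(),
                None => break,
            }
        }
        candidate
    }

    /// An event matters if it is for the target itself or for one of its
    /// ancestor directories; once the target exists, the caller can re-arm
    /// the watch on the file.
    fn event_is_relevant(event_path: &Path, target: &Path) -> bool {
        event_path == target || target.starts_with(event_path)
    }
    ```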
  • chore: morpheus to path (#18353)
    Make the morpheus agent (the phase 2 memories agent) follow the agent-v2
    path system by naming it `/morpheus`. Preserving the path primitive
    requires moving it to a dedicated `AgentControl`.
    
    Co-authored-by: Codex <noreply@openai.com>