Commit Graph

5666 Commits

  • app-server: expose thread permission profiles (#18278)
    ## Why
    
    The `PermissionProfile` migration needs app-server clients to see the
    same constrained permission model that core is using at runtime. Before
    this PR, thread lifecycle responses only exposed the legacy
    `SandboxPolicy` shape, so clients still had to infer active permissions
    from sandbox fields. That makes downstream resume, fork, and override
    flows harder to make `PermissionProfile`-first.
    
    External sandbox policies are intentionally excluded from this canonical
    view. External enforcement cannot be round-tripped as a
    `PermissionProfile`, and exposing a lossy root-write profile would let
    clients accidentally change sandbox semantics if they echo the profile
    back later.
    
    ## What changed
    
    - Adds the app-server v2 `PermissionProfile` wire shape, including
    filesystem permissions and glob scan depth metadata.
    - Adds `PermissionProfileNetworkPermissions` so the profile response
    does not expose active network state through the older
    additional-permissions naming.
    - Returns `permissionProfile` from thread start, resume, and fork
    responses when the active sandbox can be represented as a
    `PermissionProfile`.
    - Keeps legacy `sandbox` in those responses for compatibility and
    documents `permissionProfile` as canonical when present.
    - Makes lifecycle `permissionProfile` nullable and returns `null` for
    `ExternalSandbox` to avoid exposing a lossy profile.
    - Regenerates the app-server JSON schema and TypeScript fixtures.
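
    The nullable `permissionProfile` rule above can be sketched roughly as
    follows. This is an illustration only: `PermissionProfile` here is a
    simplified stand-in, and `ActiveSandbox` and
    `permission_profile_for_response` are hypothetical names, not the
    app-server's actual types.

    ```rust
    // Hypothetical, simplified stand-in for the real wire type.
    #[derive(Debug, Clone, PartialEq)]
    struct PermissionProfile {
        writable_roots: Vec<String>,
        network_enabled: bool,
    }

    // Hypothetical view of the active sandbox at response time.
    #[derive(Debug, Clone, PartialEq)]
    enum ActiveSandbox {
        /// Enforcement that can be expressed as a profile.
        Profile(PermissionProfile),
        /// Enforcement delegated to an external sandbox; no faithful profile exists.
        External,
    }

    /// Returns the canonical profile for a lifecycle response, or `None`
    /// (which would serialize as `null`) when the sandbox cannot be
    /// round-tripped as a profile.
    fn permission_profile_for_response(active: &ActiveSandbox) -> Option<PermissionProfile> {
        match active {
            ActiveSandbox::Profile(profile) => Some(profile.clone()),
            ActiveSandbox::External => None,
        }
    }
    ```

    Returning `None` rather than a best-effort profile means a client that
    echoes the value back cannot silently change sandbox semantics.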
    
    ## Verification
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server
    thread_response_permission_profile_omits_external_sandbox --
    --nocapture`
    - `cargo check --tests -p codex-analytics -p codex-exec -p codex-tui`
    - `just fix -p codex-app-server-protocol -p codex-app-server -p
    codex-analytics -p codex-exec -p codex-tui`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18278).
    * #18279
    * __->__ #18278
  • use long-lived sessions for codex sandbox windows (#18953)
    `codex sandbox windows` previously did a one-shot spawn for all
    commands.
    This change uses the `unified_exec` session to spawn long-lived
    processes instead, and implements a simple bridge to forward stdin to
    the spawned session and stdout/stderr from the spawned session back to
    the caller.
    
    It also fixes a bug with the new shared spawn context code where the
    "no-network env" was being applied to both elevated and unelevated
    sandbox spawns. It should only be applied for the unelevated sandbox
    because the elevated one uses firewall rules instead of an env-based
    network suppression strategy.
  • feat: add explicit AgentIdentity auth mode (#18785)
    ## Summary
    
    This PR adds `CodexAuth::AgentIdentity` as an explicit auth mode.
    
    An AgentIdentity auth record is a standalone `auth.json` mode. When
    `AuthManager::auth().await` loads that mode, it registers one
    process-scoped task and stores it in runtime-only state on the auth
    value. Header creation stays synchronous after that because the task is
    initialized before callers receive the auth object.
    
    This PR also removes the old feature flag path. AgentIdentity is
    selected by explicit auth mode, not by a hidden flag or lazy mutation of
    ChatGPT auth records.
    
    Reference old stack: https://github.com/openai/codex/pull/17387/changes
    
    ## Design Decisions
    
    - AgentIdentity is a real auth enum variant because it can be the only
    credential in `auth.json`.
    - The process task is ephemeral runtime state. It is not serialized and
    is not stored in rollout/session data.
    - Account/user metadata needed by existing Codex backend checks lives on
    the AgentIdentity record for now.
    - `is_chatgpt_auth()` remains token-specific.
    - `uses_codex_backend()` is the broader predicate for ChatGPT-token auth
    and AgentIdentity auth.
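
    A minimal sketch of the relationship between the two predicates,
    assuming a simplified enum. The `AuthMode` type and its variant names
    are hypothetical, and the `ApiKey` variant is added only for contrast;
    the real `CodexAuth` type may look different.

    ```rust
    // Hypothetical, simplified auth modes for illustration only.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum AuthMode {
        ApiKey,
        ChatGptToken,
        AgentIdentity,
    }

    impl AuthMode {
        /// Token-specific: true only for ChatGPT-token auth.
        fn is_chatgpt_auth(self) -> bool {
            matches!(self, AuthMode::ChatGptToken)
        }

        /// Broader predicate: ChatGPT-token auth or AgentIdentity auth.
        fn uses_codex_backend(self) -> bool {
            matches!(self, AuthMode::ChatGptToken | AuthMode::AgentIdentity)
        }
    }
    ```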
    
    ## Stack
    
    1. https://github.com/openai/codex/pull/18757: full revert
    2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
    crate
    3. This PR: explicit AgentIdentity auth mode and startup task allocation
    4. https://github.com/openai/codex/pull/18811: migrate Codex backend
    auth callsites through AuthProvider
    5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
    and load `CODEX_AGENT_IDENTITY`
    
    ## Testing
    
    Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
  • core: derive active permission profiles (#18277)
    ## Why
    
    `Permissions` should not store a separate `PermissionProfile` that can
    drift from the constrained `SandboxPolicy` and network settings. The
    active profile needs to be derived from the same constrained values that
    already honor `requirements.toml`.
    
    ## What changed
    
    This adds derivation of the active `PermissionProfile` from the
    constrained runtime permission settings and exposes that derived value
    through config snapshots and thread state. The app-server can then
    report the active profile without introducing a second source of truth.
    
    ## Verification
    
    - `cargo test -p codex-core --test all permissions_messages --
    --nocapture`
    - `cargo test -p codex-core --test all request_permissions --
    --nocapture`

    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18277).
    * #18288
    * #18287
    * #18286
    * #18285
    * #18284
    * #18283
    * #18282
    * #18281
    * #18280
    * #18279
    * #18278
    * __->__ #18277
  • chore: remove unused Bedrock auth lazy loading (#18948)
    ## Summary
    
    The Bedrock Mantle SigV4 auth provider currently looks like it can
    lazily load `AwsAuthContext`, but the provider is only constructed after
    `resolve_auth_method` has already loaded that context. Because
    `with_context` always pre-populates the `OnceCell`, the
    `get_or_try_init` fallback is unused in normal operation and makes the
    provider lifecycle harder to reason about.
    
    This change removes that dead lazy-loading path and makes the actual
    behavior explicit:
    
    - `BedrockAuthMethod::AwsSdkAuth` carries only the resolved
    `AwsAuthContext`.
    - `BedrockMantleSigV4AuthProvider` stores the resolved context directly.
    - request signing uses the stored context without going through
    `OnceCell`.
    
    The existing eager AWS auth resolution behavior is unchanged; this is a
    simplification of the provider state, not a behavior change.
    
    ## Testing
    
    - `cargo shear`
    - `cargo test -p codex-model-provider`
    - `just bazel-lock-check`
  • [codex] Clean guardian instructions (#18934)
    ## Summary
    - Keep the guardian policy installed as guardian base instructions.
    - Clear inherited parent `developer_instructions` for guardian review
    sessions.
    - Update guardian config tests to assert developer instructions are
    cleared and policy text is sourced from base instructions.
    
    ## Why
    Guardian review sessions are intended to run under an isolated guardian
    policy. Because the guardian config is cloned from the parent config,
    inherited custom or managed developer instructions could otherwise
    remain active and conflict with guardian review behavior.
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-core guardian_review_session_config`
    
    Co-authored-by: Codex <noreply@openai.com>
  • tests: serialize process-heavy Windows CI suites (#18943)
    ## Why
    
    A [Windows Cargo
    build](https://github.com/openai/codex/actions/runs/24754807756/job/72425641062)
    on `main` timed out in several unrelated-looking suites at the same
    time:
    
    - `codex-app-server` account tests failed before account logic, while
    `mcp.initialize()` was waiting for the first JSON-RPC response.
    - `codex-core` `apply_patch_cli` tests timed out while running full
    Codex/apply_patch turns.
    - `codex-windows-sandbox` legacy session tests timed out while creating
    restricted-token child processes and private desktops.
    
    The app-server log reached the test harness write path in
    [`McpProcess::initialize_with_params`](https://github.com/openai/codex/blob/731b54d08fb93aec31fb020999a62447886f3ab3/codex-rs/app-server/tests/common/mcp_process.rs#L244-L263),
    but never printed the matching stdout read from
    [`read_jsonrpc_message`](https://github.com/openai/codex/blob/731b54d08fb93aec31fb020999a62447886f3ab3/codex-rs/app-server/tests/common/mcp_process.rs#L1123-L1128).
    The server initialize handler is a small bookkeeping/response path
    ([`message_processor.rs`](https://github.com/openai/codex/blob/731b54d08fb93aec31fb020999a62447886f3ab3/codex-rs/app-server/src/message_processor.rs#L601-L728)),
    so the failure looks like Windows runner process/pipe scheduling
    starvation rather than account-specific behavior.
    
    ## What Changed
    
    This updates `.config/nextest.toml` to serialize two process-heavy sets:
    
    - `codex-core` tests matching `package(codex-core) & kind(test) &
    test(apply_patch_cli)`
    - `codex-windows-sandbox` tests matching `package(codex-windows-sandbox)
    & test(legacy_)`
    
    `codex-app-server` integration tests were already serialized inside
    their own package; this change reduces overlap with the other suites
    that were saturating the runner at the same time.
    
    ## Verification
    
    - `cargo nextest list --filterset "package(codex-core) & kind(test) &
    test(apply_patch_cli)"`
    - `cargo nextest list --filterset "package(codex-windows-sandbox) &
    test(legacy_)"`
    
    The Windows sandbox filter naturally lists no tests on macOS, but it
    validates the nextest filter/config syntax locally.
  • chore(tui) debug-config guardian_policy_config (#18923)
    ## Summary
    List `guardian_policy_config_source` in `/debug-config` output.
    
    ## Testing
     - [x] Ran locally
  • ci: keep argument comment lint checks materialized (#18926)
    ## Why
    
    The fast `rust-ci` workflow decides whether to run the cross-platform
    `argument-comment-lint` job based on changed paths. PRs that touch
    Rust-adjacent Bazel wrapper files, such as `defs.bzl` or
    `workspace_root_test_launcher.*.tpl`, can change how Rust tests and lint
    targets behave without changing any `.rs` files.
    
    When that detector returned false, GitHub skipped the matrix job before
    expanding it. That produced a single skipped check named `Argument
    comment lint - ${{ matrix.name }}` instead of the Linux, macOS, and
    Windows check names that branch protection expects, leaving the PR
    unable to go green when those matrix checks are required.
    
    ## What Changed
    
    - Treat root Bazel wrapper files as `argument-comment-lint` relevant
    changes.
    - Keep the `argument_comment_lint_prebuilt` matrix job materialized for
    every PR so the per-platform check names always exist.
    - Add a single gate step that decides whether the real lint work should
    run.
    - Move the checkout-adjacent Bazel setup and OS-specific lint commands
    into `.github/actions/run-argument-comment-lint/action.yml` so the
    workflow does not repeat the same path-detection condition on each step.
    
    ## Verification
    
    - Parsed `.github/workflows/rust-ci.yml` and
    `.github/actions/run-argument-comment-lint/action.yml` with Python YAML
    loading.
    - Simulated the workflow path-matching shell conditions for the root
    Bazel wrapper files and confirmed they set `argument_comment_lint=true`.
  • exec-server: carry filesystem sandbox profiles (#18276)
    ## Why
    
    The exec-server still needs platform sandbox inputs, but the migration
    should preserve the `PermissionProfile` that produced them. Keeping only
    the derived legacy sandbox map would keep `SandboxPolicy` as the
    effective abstraction and would make full-disk vs. restricted profiles
    harder to preserve as the permissions stack starts round-tripping
    profiles.
    
    `PermissionProfile` entries can also be cwd-sensitive (`:cwd`,
    `:project_roots`, relative globs), so the exec-server must carry the
    request sandbox cwd instead of resolving those entries against the
    long-lived exec-server process cwd.
    
    ## What changed
    
    `FileSystemSandboxContext` now carries `permissions: PermissionProfile`
    plus an optional `cwd`:
    
    - removed `sandboxPolicy`, `sandboxPolicyCwd`,
    `fileSystemSandboxPolicy`, and `additionalPermissions`
    - added `permissions` and `cwd`
    - kept the platform knobs `windowsSandboxLevel`,
    `windowsSandboxPrivateDesktop`, and `useLegacyLandlock`
    
    Core turn and apply-patch paths populate the context from the active
    runtime permissions and request cwd. Exec-server derives platform
    `SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary,
    adds helper runtime reads there, and rejects cwd-dependent profiles that
    arrive without a cwd.
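
    The rejection of cwd-dependent profiles can be sketched as below. The
    `ProfilePath` enum and `resolve_paths` function are hypothetical
    simplifications for illustration, not the exec-server's real types.

    ```rust
    use std::path::{Path, PathBuf};

    // Hypothetical, simplified profile entries.
    #[derive(Debug, Clone, PartialEq)]
    enum ProfilePath {
        /// Already absolute; does not depend on the cwd.
        Absolute(PathBuf),
        /// The `:cwd` special path.
        Cwd,
        /// A relative glob such as `**/*.env`.
        RelativeGlob(String),
    }

    /// Resolves entries against the request sandbox cwd, rejecting
    /// cwd-dependent entries when no cwd was supplied.
    fn resolve_paths(entries: &[ProfilePath], cwd: Option<&Path>) -> Result<Vec<PathBuf>, String> {
        entries
            .iter()
            .map(|entry| match (entry, cwd) {
                (ProfilePath::Absolute(path), _) => Ok(path.clone()),
                (ProfilePath::Cwd, Some(cwd)) => Ok(cwd.to_path_buf()),
                (ProfilePath::RelativeGlob(glob), Some(cwd)) => Ok(cwd.join(glob)),
                // Only cwd-dependent entries can reach this arm.
                (_, None) => Err(format!("entry {entry:?} requires a sandbox cwd")),
            })
            .collect()
    }
    ```

    Failing loudly here avoids silently resolving `:cwd` against the
    long-lived process cwd.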
    
    The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor
    now preserves the old workspace-write conversion semantics for
    compatibility tests/callers.
    
    ## Verification
    
    - `cargo test -p codex-exec-server`
    - `cargo test -p codex-exec-server sandbox_cwd -- --nocapture`
    - `cargo test -p codex-exec-server
    sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths
    -- --nocapture`
    - `cargo test -p codex-core --lib
    file_system_sandbox_context_uses_active_attempt -- --nocapture`
  • refactor: add agent identity crate (#18871)
    ## Summary
    
    This PR adds `codex-agent-identity` as an isolated crate for Agent
    Identity business logic.
    
    The crate owns:
    - AgentAssertion construction.
    - Agent task registration.
    - private-key assertion signing.
    - bounded blocking HTTP for task registration.
    
    It does not wire AgentIdentity into `auth.json`, `AuthManager`, rollout
    state, or request callsites. That integration happens in later PRs.
    
    Reference old stack: https://github.com/openai/codex/pull/17387/changes
    
    ## Stack
    
    1. https://github.com/openai/codex/pull/18757: full revert
    2. This PR: isolated Agent Identity crate
    3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
    auth mode and startup task allocation
    4. https://github.com/openai/codex/pull/18811: migrate Codex backend
    auth callsites through AuthProvider
    5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
    and load `CODEX_AGENT_IDENTITY`
    
    ## Testing
    
    Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
  • Fix remote app-server shutdown race (#18936)
    ## Why
    
    A Mac Bazel CI run saw `remote_notifications_arrive_over_websocket` fail
    during shutdown with `remote app-server shutdown channel is closed`
    (https://app.buildbuddy.io/invocation/9dac05d6-ae20-40f9-b627-fca6e91cf127).
    The remote websocket worker can legitimately finish while `shutdown()`
    is waiting for the shutdown acknowledgement: after the test server sends
    a notification and exits, the worker may deliver the required disconnect
    event, observe that the caller has dropped the event receiver, and exit
    before it sends the shutdown one-shot.
    
    That state is already terminal cleanup, not a failed shutdown, so
    callers should not see a `BrokenPipe` from the acknowledgement channel.
    
    ## What Changed
    
    - Treat a closed remote shutdown acknowledgement as an already-exited
    worker while still propagating websocket close errors when the worker
    returns them.
    - Added a deterministic regression test for the interleaving where the
    shutdown command is received and the worker exits before replying.
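
    A minimal sketch of the tolerant acknowledgement handling. It uses a
    blocking `std::sync::mpsc` channel as a stand-in for the real one-shot
    channel, and `wait_for_shutdown_ack` is a hypothetical name.

    ```rust
    use std::sync::mpsc::Receiver;

    /// Waits for the worker's shutdown acknowledgement.
    fn wait_for_shutdown_ack(ack: Receiver<Result<(), String>>) -> Result<(), String> {
        match ack.recv() {
            // The worker replied: propagate its result, including close errors.
            Ok(result) => result,
            // The sender was dropped without a reply: the worker has already
            // exited, which is terminal cleanup rather than a failed shutdown.
            Err(_) => Ok(()),
        }
    }
    ```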
    
    ## Verification
    
    - `cargo test -p codex-app-server-client`
    - New test:
    `remote::tests::shutdown_tolerates_worker_exit_after_command_is_queued`
  • feat: Support remote plugin list/read. (#18452)
    Add a temporary internal remote_plugin feature flag that merges remote
    marketplaces into plugin/list and routes plugin/read through the remote
    APIs when needed, while keeping pure local marketplaces working as
    before.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • bazel: run wrapped Rust unit test shards (#18913)
    ## Why
    
    The `codex-tui` Cargo test suite was catching stale snapshot
    expectations, but the matching Bazel unit-test target was still green.
    The TUI unit target is wrapped by `workspace_root_test` so tests run
    from the repository root and Insta can resolve Cargo-like snapshot
    paths. After native Bazel sharding was enabled for that wrapped target,
    rules_rust also inserted its own sharding wrapper around the Rust test
    binary.
    
    Those two wrappers did not compose: rules_rust's sharding wrapper
    expects to run from its own runfiles cwd, while `workspace_root_test`
    deliberately changes cwd to the repo root before invoking the test. In
    that configuration, the inner wrapper could fail to enumerate the Rust
    tests and exit successfully with empty shards, so snapshot regressions
    were not being exercised by Bazel.
    
    ## What Changed
    
    - Stop enabling rules_rust's inner `experimental_enable_sharding` for
    unit-test binaries created by `codex_rust_crate`.
    - Keep the configured `shard_count` on the outer `workspace_root_test`
    target.
    - Add libtest sharding directly to `workspace_root_test_launcher.sh.tpl`
    and `workspace_root_test_launcher.bat.tpl` after the launcher has
    resolved the actual test binary and established the intended
    repository-root cwd.
    - Partition tests by a stable FNV-1a hash of each libtest test name,
    matching the stable-shard behavior we wanted without depending on the
    inner rules_rust wrapper.
    - Preserve ad-hoc local test filters by running the resolved test binary
    directly when explicit test args are supplied.
    - On Windows, run selected libtest names from the shard list in bounded
    PowerShell batches instead of concatenating every selected test into one
    `cmd.exe` command line.
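
    The hash-based partitioning can be illustrated with a short Rust
    sketch. The actual launchers are shell and batch templates, so this is
    only a model of the idea: the 64-bit FNV-1a variant, the modulo
    assignment, and the names `shard_for` and `tests_in_shard` are
    assumptions, not taken from the launcher code.

    ```rust
    // Standard 64-bit FNV-1a parameters.
    const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

    /// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
    fn fnv1a_64(bytes: &[u8]) -> u64 {
        let mut hash = FNV_OFFSET_BASIS;
        for &byte in bytes {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(FNV_PRIME);
        }
        hash
    }

    /// Maps a test name to a shard index in `0..shard_count`.
    /// Panics if `shard_count` is zero.
    fn shard_for(test_name: &str, shard_count: u64) -> u64 {
        assert!(shard_count > 0, "shard_count must be positive");
        fnv1a_64(test_name.as_bytes()) % shard_count
    }

    /// Selects the test names that belong to one shard.
    fn tests_in_shard<'a>(names: &[&'a str], shard_index: u64, shard_count: u64) -> Vec<&'a str> {
        names
            .iter()
            .copied()
            .filter(|name| shard_for(name, shard_count) == shard_index)
            .collect()
    }
    ```

    Because the assignment depends only on the test name and the shard
    count, adding or removing one test does not move other tests between
    shards.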
    
    This PR is stacked on top of #18912, which contains only the snapshot
    expectation updates exposed once the Bazel target actually runs the TUI
    unit tests. It is also the reason #18916 becomes visible: once this
    wrapper fix makes Bazel execute the affected `codex-core` test, that
    test needs its own executable-path setup fixed.
    
    ## Verification
    
    - `cargo test -p codex-tui`
    - `bazel test //codex-rs/tui:tui-unit-tests --test_output=errors`
    - `bazel test //codex-rs/tui:all --test_output=errors`
    - `bash -n workspace_root_test_launcher.sh.tpl`
    - Exercised the Windows PowerShell batching fragment locally with a fake
    test binary and shard-list file.
  • feat: add AWS SigV4 auth for OpenAI-compatible model providers (#17820)
    ## Summary
    
    Add first-class Amazon Bedrock Mantle provider support so Codex can keep
    using its existing Responses API transport with OpenAI-compatible
    AWS-hosted endpoints such as AOA/Mantle.
    
    This is needed for the AWS launch path, where provider traffic should
    authenticate with AWS credentials instead of OpenAI bearer credentials.
    Requests are authenticated immediately before transport send, so SigV4
    signs the final method, URL, headers, and body bytes that `reqwest` will
    send.
    
    ## What Changed
    
    - Added a new `codex-aws-auth` crate for loading AWS SDK config,
    resolving credentials, and signing finalized HTTP requests with AWS
    SigV4.
    - Added a built-in `amazon-bedrock` provider that targets Bedrock Mantle
    Responses endpoints, defaults to `us-east-1`, supports region/profile
    overrides, disables WebSockets, and does not require OpenAI auth.
    - Added Amazon Bedrock auth resolution in `codex-model-provider`: prefer
    `AWS_BEARER_TOKEN_BEDROCK` when set, otherwise use AWS SDK credentials
    and SigV4 signing.
    - Added `AuthProvider::apply_auth` and `Request::prepare_body_for_send`
    so request-signing providers can sign the exact outbound request after
    JSON serialization/compression.
    - Determines the region from the `aws.region` config first (required for
    the bearer-token code path), then falls back to the SDK default region.
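
    The auth and region precedence described above can be sketched as
    follows. The `BedrockAuthChoice` enum and both function names are
    hypothetical, and inputs are passed as plain `Option` values instead
    of being read from the environment or the AWS SDK.

    ```rust
    // Hypothetical, simplified result of auth resolution.
    #[derive(Debug, PartialEq)]
    enum BedrockAuthChoice {
        /// Use the value of `AWS_BEARER_TOKEN_BEDROCK` as a bearer token.
        BearerToken(String),
        /// Use AWS SDK credentials and SigV4 request signing.
        SdkSigV4,
    }

    /// Prefers the bearer token when it is set; otherwise uses SDK credentials.
    fn choose_bedrock_auth(bearer_token: Option<String>) -> BedrockAuthChoice {
        match bearer_token {
            Some(token) => BedrockAuthChoice::BearerToken(token),
            None => BedrockAuthChoice::SdkSigV4,
        }
    }

    /// Takes the configured region first, then the SDK default region.
    fn resolve_region(config_region: Option<String>, sdk_default: Option<String>) -> Option<String> {
        config_region.or(sdk_default)
    }
    ```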
    
    ## Testing
    Amazon Bedrock Mantle Responses paths:
    
    - Built the local Codex binary with `cargo build`.
    - Verified the custom proxy-backed `aws` provider using `env_key =
    "AWS_BEARER_TOKEN_BEDROCK"` streamed raw `responses` output with
    `response.output_text.delta`, `response.completed`, and `mantle-env-ok`.
    - Verified a full `codex exec --profile aws` turn returned
    `mantle-env-ok`.
    - Confirmed the custom provider used the bearer env var, not AWS profile
    auth: bogus `AWS_PROFILE` still passed, empty env var failed locally,
    and malformed env var reached Mantle and failed with `401
    invalid_api_key`.
    - Verified built-in `amazon-bedrock` with `AWS_BEARER_TOKEN_BEDROCK` set
    passed despite bogus AWS profiles, returning `amazon-bedrock-env-ok`.
    - Verified built-in `amazon-bedrock` SDK/SigV4 auth passed with
    `AWS_BEARER_TOKEN_BEDROCK` unset and temporary AWS session env
    credentials, returning `amazon-bedrock-sdk-env-ok`.
  • test(core): move prompt debug coverage to integration suite (#18916)
    ## Why
    
    `build_prompt_input` now initializes `ExecServerRuntimePaths`, which
    requires a configured Codex executable path. The previous inline unit
    test in `core/src/prompt_debug.rs` built a bare `test_config()` and then
    failed before it could assert anything useful:
    
    ```text
    Codex executable path is not configured
    ```
    
    This coverage is also integration-shaped: it drives the public
    `build_prompt_input` entry point through config, thread, and session
    setup rather than testing a small internal helper in isolation.
    
    Bazel CI did not catch this earlier because the affected test was behind
    the same wrapped Rust unit-test path fixed by #18913. Before that
    launcher/sharding fix, the outer `workspace_root_test` changed the
    working directory for Insta compatibility while the inner `rules_rust`
    sharding wrapper still expected its runfiles working directory. In
    practice, Bazel could report success without executing the Rust test
    cases in that shard. Once #18913 makes the wrapper run the Rust test
    binary directly and shard with libtest arguments, this stale unit test
    actually runs and exposes the missing `codex_self_exe` setup.
    
    ## What Changed
    
    - Moved `build_prompt_input_includes_context_and_user_message` out of
    `core/src/prompt_debug.rs`.
    - Added `core/tests/suite/prompt_debug_tests.rs` and registered it from
    `core/tests/suite/mod.rs`.
    - Builds the test config with `ConfigBuilder` and provides
    `codex_self_exe` using the current test executable, matching the
    runtime-path invariant required by prompt debug setup.
    - Preserves the existing assertions that the generated prompt input
    includes both the debug user message and project-specific user
    instructions.
    
    ## Verification
    
    - `cargo test -p codex-core --test all
    prompt_debug_tests::build_prompt_input_includes_context_and_user_message`
    - `bazel test //codex-rs/core:core-all-test
    --test_arg=prompt_debug_tests::build_prompt_input_includes_context_and_user_message
    --test_output=errors`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18916).
    * #18913
    * __->__ #18916
  • fix(core): emit hooks for apply_patch edits (#18391)
    Fixes https://github.com/openai/codex/issues/16732.
    
    ## Why
    
    `apply_patch` is Codex's primary file edit path, but it was not emitting
    `PreToolUse` or `PostToolUse` hook events. That meant hook-based policy,
    auditing, and write coordination could observe shell commands while
    missing the actual file mutation performed by `apply_patch`.
    
    The issue also exposed that the hook runtime serialized command hook
    payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch`
    supplied hook payloads, hooks would either fail to match it directly or
    receive misleading stdin that identified the edit as a Bash tool call.
    
    ## What Changed
    
    - Added `PreToolUse` and `PostToolUse` payload support to
    `ApplyPatchHandler`.
    - Exposed the raw patch body as `tool_input.command` for both
    JSON/function and freeform `apply_patch` calls.
    - Taught tool hook payloads to carry a handler-supplied hook-facing
    `tool_name`.
    - Preserved existing shell compatibility by continuing to emit `Bash`
    for shell-like tools.
    - Serialized the selected hook `tool_name` into hook stdin instead of
    hardcoding `Bash`.
    - Relaxed the generated hook command input schema so `tool_name` can
    represent tools other than `Bash`.
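
    A rough sketch of a handler-supplied hook name, using a hypothetical
    `Tool` enum and `hook_tool_name` function. The real handlers supply
    the name through their own payload types.

    ```rust
    // Hypothetical set of tool handlers for illustration only.
    #[derive(Debug, Clone, Copy)]
    enum Tool {
        Shell,
        ExecCommand,
        ApplyPatch,
        Plan,
    }

    /// Returns the hook-facing `tool_name`, or `None` for tools that do not
    /// emit `PreToolUse`/`PostToolUse` events.
    fn hook_tool_name(tool: Tool) -> Option<&'static str> {
        match tool {
            // Shell-like tools keep reporting `Bash` for compatibility.
            Tool::Shell | Tool::ExecCommand => Some("Bash"),
            Tool::ApplyPatch => Some("apply_patch"),
            // Non-participating tools emit no hook events.
            Tool::Plan => None,
        }
    }
    ```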
    
    ## Verification
    
    Added focused handler coverage for:
    
    - JSON/function `apply_patch` calls producing a `PreToolUse` payload.
    - Freeform `apply_patch` calls producing a `PreToolUse` payload.
    - Successful `apply_patch` output producing a `PostToolUse` payload.
    - Shell and `exec_command` handlers continuing to expose `Bash`.
    
    Added end-to-end hook coverage for:
    
    - A `PreToolUse` hook matching `^apply_patch$` blocking the patch before
    the target file is created.
    - A `PostToolUse` hook matching `^apply_patch$` receiving the patch
    input and tool response, then adding context to the follow-up model
    request.
    - Non-participating tools such as the plan tool continuing not to emit
    `PreToolUse`/`PostToolUse` hook events.
    
    Also validated manually with a live `codex exec` smoke test using an
    isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed
    that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with
    `tool_name: "apply_patch"`, a shell command still emits `tool_name:
    "Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file
    from being created.
  • Add turn-scoped environment selections (#18416)
    ## Summary
    - add experimental turn/start.environments params for per-turn
    environment id + cwd selections
    - pass selections through core protocol ops and resolve them with
    EnvironmentManager before TurnContext creation
    - treat omitted selections as default behavior, empty selections as no
    environment, and non-empty selections as using the first environment/cwd
    as the turn primary
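
    The three cases for selections can be sketched as below. The
    `EnvironmentSelection` and `TurnEnvironment` types and the
    `resolve_turn_environment` function are hypothetical names for
    illustration; the real resolution goes through `EnvironmentManager`.

    ```rust
    // Hypothetical per-turn selection: an environment id plus a cwd.
    #[derive(Debug, Clone, PartialEq)]
    struct EnvironmentSelection {
        environment_id: String,
        cwd: String,
    }

    // Hypothetical outcome of resolving the selections for one turn.
    #[derive(Debug, PartialEq)]
    enum TurnEnvironment {
        /// Selections omitted: keep the default behavior.
        Default,
        /// Empty list: run the turn with no environment.
        NoEnvironment,
        /// Non-empty list: the first entry is the turn primary.
        Primary(EnvironmentSelection),
    }

    fn resolve_turn_environment(selections: Option<&[EnvironmentSelection]>) -> TurnEnvironment {
        match selections {
            None => TurnEnvironment::Default,
            Some([]) => TurnEnvironment::NoEnvironment,
            Some([first, ..]) => TurnEnvironment::Primary(first.clone()),
        }
    }
    ```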
    
    ## Testing
    - ran `just fmt`
    - ran `just write-app-server-schema`
    - not run: unit tests for this stacked PR
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • fix: windows snapshot for external_agent_config_migration::tests::prompt_snapshot did not match windows output (#18915)
    Fix a snapshot test that fails on Windows but is currently missed by
    Bazel because of the issue addressed in
    https://github.com/openai/codex/pull/18913. The failure does show up in
    Cargo builds on Windows.
    
    This Bazel vs. Cargo inconsistency explains why
    https://github.com/openai/codex/pull/18768 did not fix the Cargo Windows
    build.
  • sandboxing: materialize cwd-relative permission globs (#18867)
    ## Why
    
    #18275 anchors session-scoped `:cwd` and `:project_roots` grants to the
    request cwd before recording them for reuse. Relative deny glob entries
    need the same treatment. Without anchoring, a stored session permission
    can keep a pattern such as `**/*.env` relative, then reinterpret that
    deny against a later turn cwd. That makes the persisted profile depend
    on the cwd at reuse time instead of the cwd that was reviewed and
    approved.
    
    ## What changed
    
    `intersect_permission_profiles` now materializes retained
    `FileSystemPath::GlobPattern` entries against the request cwd, matching
    the existing materialization for cwd-sensitive special paths.
    
    Materialized accepted grants are now deduplicated before deny retention
    runs. This keeps the sticky-grant preapproval shape stable when a
    repeated request is merged with the stored grant and both `:cwd = write`
    and the materialized absolute cwd write are present.
    
    The preapproval check compares against the same materialized form, so a
    later request for the same cwd-relative deny glob still matches the
    stored anchored grant instead of re-prompting or rejecting.
    
    Tests cover both the storage path and the preapproval path: a
    session-scoped `:cwd = write` grant with `**/*.env = none` is stored
    with both the cwd write and deny glob anchored to the original request
    cwd, cannot be reused from a later cwd, and remains preapproved when
    re-requested from the original cwd after merging with the stored grant.
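
    Anchoring a relative glob to the request cwd can be sketched as
    follows, assuming Unix-style paths. `materialize_glob` is a
    hypothetical helper and is not the logic inside
    `intersect_permission_profiles`.

    ```rust
    use std::path::Path;

    /// Anchors a relative glob pattern to the cwd that was reviewed and
    /// approved; absolute patterns are returned unchanged.
    fn materialize_glob(request_cwd: &Path, pattern: &str) -> String {
        if Path::new(pattern).is_absolute() {
            pattern.to_string()
        } else {
            request_cwd.join(pattern).to_string_lossy().into_owned()
        }
    }
    ```

    Once stored in this anchored form, the same pattern requested from a
    different cwd materializes to a different string, so it no longer
    matches the stored grant.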
    
    ## Verification
    
    - `cargo test -p codex-sandboxing policy_transforms`
    - `cargo test -p codex-core --lib
    relative_deny_glob_grants_remain_preapproved_after_materialization`
    - `cargo clippy -p codex-sandboxing --tests -- -D
    clippy::redundant_clone`
    - `cargo clippy -p codex-core --lib -- -D clippy::redundant_clone`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18867).
    * #18288
    * #18287
    * #18286
    * #18285
    * #18284
    * #18283
    * #18282
    * #18281
    * #18280
    * #18279
    * #18278
    * #18277
    * #18276
    * __->__ #18867
  • Update /statusline and /title snapshots (#18909)
    Update `/statusline` and `/title` snapshots
  • [codex] Tighten external migration prompt tests (#18768)
    ## Summary
    - tighten the external migration prompt snapshot around stable synthetic
    fixture text
    - add focused display_description tests for relative path rewriting and
    plugin summaries
    - split the path-format assertions into smaller, easier-to-read unit
    tests
    
    ## Why
    The previous prompt snapshot was coupled to path text that came from
    detected migration items, which made it noisier and more brittle than
    necessary. This change keeps the snapshot focused on stable UI structure
    and moves dynamic path formatting checks into targeted unit tests.
    
    ## Validation
    - cargo test -p codex-tui external_agent_config_migration::tests::
    - cargo test -p codex-tui
    external_agent_config_migration::tests::display_description_
    - just fmt
    
    ## Notes
    Per the repo instructions, I did not rerun tests after the final `just
    fmt` pass.
  • Normalize /statusline & /title items (#18886)
    This change aligns the `/statusline` and `/title` UIs around the same
    normalized item model so both surfaces use consistent ids, labels, and
    preview semantics. It keeps the shared preview work from #18435,
    tightens the remaining mismatches by standardizing item naming, expands
    title/status item coverage where appropriate, and makes `/title` preview
    use the same title-specific formatting path as the real rendered
    terminal title.
    
    - Normalizes persisted item ids and keeps legacy aliases for
    compatibility
    - Aligns `status-line` and `terminal-title` items with the shared
    preview model
    - Routes `terminal-title` preview through title-specific formatting and
    truncation
    - Updates the affected status/title setup snapshots
    
    Added to `/statusline`:
    - status
    - task-progress
      
    Normalized in `/statusline`:
    - model-name -> model
    - project-root -> project-name
    
    Added to `/title`:
    - current-dir
    - context-remaining
    - context-used
    - five-hour-limit
    - weekly-limit
    - codex-version
    - used-tokens
    - total-input-tokens
    - total-output-tokens
    - session-id
    - fast-mode
    - model-with-reasoning
    
    Normalized in `/title`:
    - project -> project-name
    - thread -> thread-title
    - model-name -> model
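    
    The renames listed above can be read as a small alias table. The sketch
    below is illustrative only: `Surface` and `normalize_item_id` are
    hypothetical names, and scoping each alias to a single surface is an
    assumption based on how the lists are grouped, not the actual
    implementation.
    
    ```rust
    /// Which setup UI a persisted item id belongs to (hypothetical type).
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum Surface {
        StatusLine,
        Title,
    }
    
    /// Maps a persisted item id to its normalized form.
    /// Legacy aliases are still accepted; unknown ids pass through unchanged.
    pub fn normalize_item_id(surface: Surface, id: &str) -> &str {
        match (surface, id) {
            // Alias listed for both `/statusline` and `/title`.
            (_, "model-name") => "model",
            // Alias listed only for `/statusline`.
            (Surface::StatusLine, "project-root") => "project-name",
            // Aliases listed only for `/title`.
            (Surface::Title, "project") => "project-name",
            (Surface::Title, "thread") => "thread-title",
            // Already normalized, or not a known alias.
            _ => id,
        }
    }
    ```
    
    Normalizing on read keeps previously persisted configuration valid
    without a separate migration step.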
  • Allow guardian bare allow output (#18797)
    ## Summary
    
    Allow guardian to skip other fields and output only
    `{"outcome":"allow"}` when the command is low risk.
    This change lets guardian reviews use a non-strict text format while
    keeping the JSON schema itself as plain user-visible schema data, so
    transport strictness is carried out-of-band instead of through a schema
    marker key.
    
    ## What changed
    
    - Add an explicit `output_schema_strict` flag to model prompts and pass
    it into `codex-api` text formatting.
    - Set guardian reviewer prompts to non-strict schema validation while
    preserving strict-by-default behavior for normal callers.
    - Update the guardian output contract so definitely-low-risk decisions
    may return only `{"outcome":"allow"}`.
    - Treat bare allow responses as low-risk approvals in the guardian
    parser.
    - Add tests and snapshots covering the non-strict guardian request and
    optional guardian output fields.
    
    ## Verification
    
    - `cargo test -p codex-core guardian::tests::guardian`
    - `cargo test -p codex-core guardian::tests::`
    - `cargo test -p codex-core client_common::tests::`
    - `cargo test -p codex-protocol
    user_input_serialization_includes_final_output_json_schema`
    - `cargo test -p codex-api`
    - `git diff --check`
    
    Note: `cargo test -p codex-core` was also attempted, but this desktop
    environment injects ambient config/proxy state that causes unrelated
    config/session tests expecting pristine defaults to fail.
    
    ---------
    
    Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Support multiple managed environments (#18401)
    ## Summary
    - refactor EnvironmentManager to own keyed environments with
    default/local lookup helpers
    - keep remote exec-server client creation lazy until exec/fs use
    - preserve disabled agent environment access separately from internal
    local environment access
    
    ## Validation
    - not run (per Codex worktree instruction to avoid tests/builds unless
    requested)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [rollout_trace] Add rollout trace crate (#18876)
    ## Summary
    
    Adds the standalone `codex-rollout-trace` crate, which defines the raw
    trace event format, replay/reduction model, writer, and reducer logic
    for reconstructing model-visible conversation/runtime state from
    recorded rollout data.
    
    The crate-level design is documented in
    [`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).
    
    ## Stack
    
    This is PR 1/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR intentionally does not wire tracing into live Codex execution.
    It establishes the data model and reducer contract first, with
    crate-local tests covering conversation reconstruction, compaction
    boundaries, tool/session edges, and code-cell lifecycle reduction. Later
    PRs emit into this model.
    
    The README is the best entry point for reviewing the intended trace
    format and reduction semantics before diving into the reducer modules.
  • Preserve Cloudflare HTTP cookies in codex (#17783)
    ## Summary
    - Adds a process-local, in-memory cookie store for ChatGPT HTTP clients.
    - Limits cookie storage and replay to a shared ChatGPT host allowlist.
    - Wires the shared store into the default Codex reqwest client and
    backend client.
    - Shares the ChatGPT host allowlist with remote-control URL validation
    to avoid drift.
    - Enables reqwest cookie support and updates lockfiles.
  • fix: fully revert agent identity runtime wiring (#18757)
    ## Summary
    
    This PR fully reverts the previously merged Agent Identity runtime
    integration from the old stack:
    https://github.com/openai/codex/pull/17387/changes
    
    It removes the Codex-side task lifecycle wiring, rollout/session
    persistence, feature flag plumbing, lazy `auth.json` mutation,
    background task auth paths, and request callsite changes introduced by
    that stack.
    
    This leaves the repo in a clean pre-AgentIdentity integration state so
    the follow-up PRs can reintroduce the pieces in smaller reviewable
    layers.
    
    ## Stack
    
    1. This PR: full revert
    2. https://github.com/openai/codex/pull/18871: move Agent Identity
    business logic into a crate
    3. https://github.com/openai/codex/pull/18785: add explicit
    AgentIdentity auth mode and startup task allocation
    4. https://github.com/openai/codex/pull/18811: migrate auth callsites
    through AuthProvider
    
    ## Testing
    
    Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
  • app-server: implement device key v2 methods (#18430)
    ## Why
    
    The device-key protocol needs an app-server implementation that keeps
    local key operations behind the same request-processing boundary as
    other v2 APIs.
    
    app-server owns request dispatch, transport policy, documentation, and
    JSON-RPC error shaping. `codex-device-key` owns key binding, validation,
    platform provider selection, and signing mechanics. Keeping the adapter
    thin makes the boundary easier to review and avoids moving local
    key-management details into thread orchestration code.
    
    ## What changed
    
    - Added `DeviceKeyApi` as the app-server adapter around
    `DeviceKeyStore`.
    - Converted protocol protection policies, payload variants, algorithms,
    and protection classes to and from the device-key crate types.
    - Encoded SPKI public keys and DER signatures as base64 protocol fields.
    - Routed `device/key/create`, `device/key/public`, and `device/key/sign`
    through `MessageProcessor`.
    - Rejected remote transports before provider access while allowing local
    `stdio` and in-process callers to reach the device-key API.
    - Added stdio, in-process, and websocket tests for device-key validation
    and transport policy.
    - Documented the device-key methods in the app-server v2 method list.
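    
    The transport policy described above amounts to a gate that runs before
    any provider access. This is a minimal sketch under stated assumptions:
    `Transport`, `Rejected`, and `handle_device_key_request` are hypothetical
    names, and the real code routes through `MessageProcessor` and
    `DeviceKeyApi` with JSON-RPC error shaping.
    
    ```rust
    /// Transports a request can arrive on (hypothetical enum).
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum Transport {
        Stdio,
        InProcess,
        WebSocket,
    }
    
    /// Error returned when a transport may not use device-key methods.
    #[derive(Debug, PartialEq, Eq)]
    pub struct Rejected(pub &'static str);
    
    /// Allows local transports and rejects remote ones.
    pub fn check_device_key_transport(transport: Transport) -> Result<(), Rejected> {
        match transport {
            Transport::Stdio | Transport::InProcess => Ok(()),
            Transport::WebSocket => Err(Rejected(
                "device key methods are only available over local transports",
            )),
        }
    }
    
    /// Runs the policy check first, so the provider closure is never
    /// invoked for a rejected transport.
    pub fn handle_device_key_request<T>(
        transport: Transport,
        call_provider: impl FnOnce() -> T,
    ) -> Result<T, Rejected> {
        check_device_key_transport(transport)?;
        Ok(call_provider())
    }
    ```
    
    Taking the provider call as a closure makes the ordering explicit: a
    rejected request cannot touch key material.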
    
    ## Test coverage
    
    - `device_key_create_rejects_empty_account_user_id`
    - `in_process_allows_device_key_requests_to_reach_device_key_api`
    - `device_key_methods_are_rejected_over_websocket`
    
    ## Stack
    
    This is PR 3 of 4 in the device-key app-server stack. It is stacked on
    #18429.
    
    ## Validation
    
    - `cargo test -p codex-app-server device_key`
    - `just fix -p codex-app-server`
  • feat(tui): shortcuts to change reasoning level temporarily (#18866)
    ## Summary
    
    Adds main-chat shortcuts for changing reasoning effort one step at a
    time:
    
    - `Alt+,` lowers reasoning (has the `<` arrow on the key)
    - `Alt+.` raises reasoning (similarly, has the `>` arrow)
    
    The shortcut updates the active session only. It does not persist the
    selected reasoning level as the default for future sessions. In Plan
    mode, it applies temporarily to Plan mode without opening the
    global-vs-Plan scope prompt.
    
    ## Details
    
    The shortcut uses the active model preset to decide which reasoning
    levels are valid. If the current session has no explicit reasoning
    effort, it starts from the model default. Each keypress moves to the
    next supported level in the requested direction.
    
    The shortcut only runs from the main chat surface. If a popup or modal
    is open, input remains owned by that UI.
    
    In Plan mode, the shortcut updates the in-memory Plan reasoning override
    directly. The model/reasoning picker still keeps the existing scope
    prompt for explicit picker changes.
    
    ## Notes
    
    Ctrl-plus and Ctrl-minus were considered, but terminals do not deliver
    those combinations consistently, so this PR uses Alt shortcuts instead.
    
    If the current effort is unsupported by the selected model, the shortcut
    skips to the nearest supported level in the requested direction. If
    there is no valid step, it shows the existing boundary message.
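    
    The stepping rule described above can be sketched as follows. This is
    an illustration, not the TUI code: the `Effort` variants, `Direction`,
    and `step_effort` are hypothetical names, and the supported levels are
    passed in as a plain slice rather than read from a model preset.
    
    ```rust
    /// Reasoning levels in ascending order (hypothetical variants).
    #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
    pub enum Effort {
        Minimal,
        Low,
        Medium,
        High,
    }
    
    #[derive(Clone, Copy, Debug)]
    pub enum Direction {
        Lower,
        Raise,
    }
    
    /// Returns the nearest supported level strictly beyond the starting
    /// point in the requested direction, or `None` at the boundary.
    /// With no explicit effort, stepping starts from the model default.
    pub fn step_effort(
        current: Option<Effort>,
        model_default: Effort,
        supported: &[Effort],
        direction: Direction,
    ) -> Option<Effort> {
        let start = current.unwrap_or(model_default);
        match direction {
            Direction::Raise => supported.iter().copied().filter(|e| *e > start).min(),
            Direction::Lower => supported.iter().copied().filter(|e| *e < start).max(),
        }
    }
    ```
    
    Because the search compares against `start` instead of looking it up in
    `supported`, an unsupported current effort falls through to the nearest
    supported level without a special case. A `None` result corresponds to
    showing the boundary message.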
    
    ## Tests
    
    - `cargo test -p codex-tui reasoning_shortcuts`
    - `cargo test -p codex-tui reasoning_effort`
    - `cargo test -p codex-tui reasoning_shortcut`
    - `cargo test -p codex-tui footer_snapshots`
    - `cargo test -p codex-tui`
    - `just fix -p codex-tui`
    - `./tools/argument-comment-lint/run.py -p codex-tui -- --tests`
    
    ---------
    
    Co-authored-by: Eric Traut <etraut@openai.com>
  • Load app-server config through ConfigManager (#18870)
    ## Summary
    - Load app-server startup config through `ConfigManager` instead of
    direct `ConfigBuilder` calls.
    - Move `ConfigManager` constructor-owned state (`cli_overrides`, runtime
    feature map, cloud requirements loader) behind internal manager fields.
    - Pass `ConfigManager` into `MessageProcessor` directly instead of
    reconstructing it from raw args.
    
    ## Tests
    - `cargo check -p codex-app-server`
    - `cargo test -p codex-app-server`
    - `just fix -p codex-app-server`
    - `just fmt`
  • chore: default multi-agent v2 fork to all (#18873)
    Default sub-agents v2 to `all` for the fork mode
  • app-server: fix Bazel clippy in tracing tests (#18872)
    ## Why
    
    PR #18431 exposed a Bazel clippy failure in the app-server unit-test
    target across Linux, macOS, and Windows. The failing lint was
    `clippy::await_holding_invalid_type`: two tracing tests serialized
    access to global tracing state by holding a `tokio::sync::MutexGuard`
    across awaited test work.
    
    That serialization is still needed because the tests share
    process-global tracing setup and exporter state, but it should not
    require holding an async mutex guard through the whole test body.
    
    ## What changed
    
    - Replaced the bespoke async `tracing_test_guard` helper with
    `serial_test` on the two tracing tests that need global tracing
    serialization.
    - Removed the `#[expect(clippy::await_holding_invalid_type)]`
    annotations and the lock guard callsites that Bazel clippy rejected.
    
    ## Validation
    
    - `cargo test -p codex-app-server jsonrpc_span`
    - `just fix -p codex-app-server`
    - `git diff --check`
    
    I also attempted the exact failing Bazel clippy target locally with
    BuildBuddy disabled: `bazel --noexperimental_remote_repo_contents_cache
    build --config=clippy --bes_backend= --remote_cache=
    --experimental_remote_downloader= --
    //codex-rs/app-server:app-server-unit-tests-bin`. That run did not reach
    clippy because Bazel timed out downloading `libcap-2.27.tar.gz` from
    `kernel.org`.
  • app-server: add codex-device-key crate (#18429)
    ## Why
    
    Device-key storage and signing are local security-sensitive operations
    with platform-specific behavior. Keeping the core API in
    `codex-device-key` keeps app-server focused on routing and business
    logic instead of owning key-management details.
    
    The crate keeps the signing surface intentionally narrow: callers can
    create a bound key, fetch its public key, or sign one of the structured
    payloads accepted by the crate. It does not expose a generic
    arbitrary-byte signing API.
    
    Key IDs cross into platform-specific labels, tags, and metadata paths,
    so externally supplied IDs are constrained to the same auditable
    namespace created by the crate: `dk_` followed by unpadded base64url for
    32 bytes. Remote-control target paths are also tied to each signed
    payload shape so connection proofs cannot be reused for enrollment
    endpoints, or vice versa.
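    
    As a rough illustration of the key ID namespace, 32 bytes encode to 43
    unpadded base64url characters, so a shape check needs only the prefix,
    the length, and the alphabet. The function name below is hypothetical,
    and the crate's real validation may be stricter (for example, rejecting
    non-canonical trailing bits).
    
    ```rust
    const KEY_ID_PREFIX: &str = "dk_";
    /// ceil(32 * 8 / 6) = 43 characters for 32 bytes without padding.
    const ENCODED_LEN: usize = 43;
    
    /// Returns true when `key_id` is `dk_` followed by exactly 43
    /// base64url characters (`A-Z`, `a-z`, `0-9`, `-`, `_`) and no padding.
    /// This checks shape only; it does not decode the bytes.
    pub fn is_well_formed_key_id(key_id: &str) -> bool {
        match key_id.strip_prefix(KEY_ID_PREFIX) {
            Some(body) => {
                body.len() == ENCODED_LEN
                    && body
                        .bytes()
                        .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
            }
            None => false,
        }
    }
    ```
    
    Rejecting anything outside this shape before it reaches a provider
    keeps path separators, padding, and other unexpected characters out of
    platform labels and metadata paths.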
    
    ## What changed
    
    - Added the `codex-device-key` workspace crate.
    - Added account/client-bound key creation with stable `dk_` key IDs.
    - Added strict `key_id` validation before public-key lookup or signing
    reaches a provider.
    - Added public-key lookup and structured signing APIs.
    - Split remote-control client endpoint allowlists by connection vs
    enrollment payload shape.
    - Added validation for key bindings, accepted payload fields, token
    expiration, and payload/key binding mismatches.
    - Added flow-oriented docs on the validation helpers that gate provider
    signing.
    - Added protection policy and protection-class types without wiring a
    platform provider yet.
    - Added an unsupported default provider so platforms without an
    implementation fail explicitly instead of silently falling back to
    software-backed keys.
    - Updated Cargo and Bazel lock metadata for the new crate and
    non-platform-specific dependencies.
    
    ## Stack
    
    This is stacked on #18428.
    
    ## Validation
    
    - `cargo test -p codex-device-key`
    - Added unit coverage for strict `key_id` validation before provider
    use.
    - Added unit coverage that rejects remote-control paths from the wrong
    signed payload shape.
    - `just bazel-lock-update`
    - `just bazel-lock-check`
  • Add Windows sandbox unified exec runtime support (#15578)
    ## Summary
    
    This is the runtime/foundation half of the Windows sandbox unified-exec
    work.
    
    - add Windows sandbox `unified_exec` session support in
    `windows-sandbox-rs` for both:
      - the legacy restricted-token backend
      - the elevated runner backend
    - extend the PTY/process runtime so driver-backed sessions can support:
      - stdin streaming
      - stdout/stderr separation
      - exit propagation
      - PTY resize hooks
    - add Windows sandbox runtime coverage in `codex-windows-sandbox` /
    `codex-utils-pty`
    
    This PR does **not** enable Windows sandbox `UnifiedExec` for product
    callers yet because hooking this up to app-server comes in the next PR.
    
    Windows sandbox advertising is intentionally kept aligned with `main`,
    so sandboxed Windows callers still fall back to `ShellCommand`.
    
    This PR isolates the runtime/session layer so it can be reviewed
    independently from product-surface enablement.
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Refresh generated Python app-server SDK types (#18862)
    This is the first step in splitting the Python SDK PyPI publish work
    into reviewable layers: land the generated SDK refresh by itself before
    changing packaging mechanics. The next PRs will make the runtime wheel
    publishable, then wire the SDK package/version pinning to that runtime.
    
    ## Summary
    - Refresh generated Python app-server v2 models and notification
    registry from the current schema.
    - Update the public API signature expectations for the newly generated
    kwargs.
    
    ## Stack
    - PR 1 of 3 for the Python SDK PyPI publishing split.
    - Follow-up PRs will handle runtime wheel publishing mechanics, then
    SDK/package version pinning.
    
    ## Tests
    - `uv run --extra dev pytest` in `sdk/python` -> 51 passed, 37 skipped.
  • sandboxing: intersect permission profiles semantically (#18275)
    ## Why
    
    Permission approval responses must not be able to grant more access than
    the tool requested. Moving this flow to `PermissionProfile` means the
    comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
    cwd-relative special paths such as `:cwd` and `:project_roots` must stay
    anchored to the turn that produced the request.
    
    ## What changed
    
    This implements semantic `PermissionProfile` intersection in
    `codex-sandboxing` for file-system and network permissions. The
    intersection accepts narrower path grants, rejects broader grants,
    preserves deny-read carve-outs and glob scan depth, and materializes
    cwd-dependent special-path grants to absolute paths before they can be
    recorded for reuse.
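    
    To make the narrower-versus-broader rule concrete, here is a heavily
    simplified sketch that covers only path roots. Both function names are
    hypothetical, dropping a broader grant (rather than failing the whole
    response) is an assumption, and the real intersection in
    `codex-sandboxing` also handles deny-read carve-outs, glob scan depth,
    and network permissions.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Keeps only granted roots that lie inside a requested root.
    /// `Path::starts_with` compares whole components, so `/repository`
    /// is not treated as being inside `/repo`.
    pub fn intersect_roots(requested: &[PathBuf], granted: &[PathBuf]) -> Vec<PathBuf> {
        granted
            .iter()
            .filter(|grant| requested.iter().any(|req| grant.starts_with(req)))
            .cloned()
            .collect()
    }
    
    /// Resolves a cwd-relative grant against the cwd of the turn that
    /// produced the request, so the stored grant no longer depends on
    /// whatever the cwd is when it is reused.
    pub fn materialize(request_cwd: &Path, grant: &Path) -> PathBuf {
        if grant.is_absolute() {
            grant.to_path_buf()
        } else {
            request_cwd.join(grant)
        }
    }
    ```
    
    The sketch does not normalize `..` components or symlinks; a real
    implementation would need to handle those before comparing paths.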
    
    The request-permissions response paths now use that intersection
    consistently. App-server captures the request turn cwd before waiting
    for the client response, includes that cwd in the v2 approval params,
    and core stores the requested profile plus cwd for direct TUI/client
    responses and Guardian decisions before recording turn- or
    session-scoped grants. The TUI app-server bridge now preserves the
    app-server request cwd when converting permission approval params into
    core events.
    
    ## Verification
    
    - `cargo test -p codex-sandboxing intersect_permission_profiles --
    --nocapture`
    - `cargo test -p codex-app-server request_permissions_response --
    --nocapture`
    - `cargo test -p codex-core
    request_permissions_response_materializes_session_cwd_grants_before_recording
    -- --nocapture`
    - `cargo check -p codex-tui --tests`
    - `cargo check --tests`
    - `cargo test -p codex-tui
    app_server_request_permissions_preserves_file_system_permissions`
  • Split DeveloperInstructions into individual fragments. (#18813)
    Split DeveloperInstructions into individual fragments.
  • Refactor app-server config loading into ConfigManager (#18442)
    Localize app-server configuration loading in one place.
  • Move TUI app tests to modules they cover (#18799)
    ## Summary
    
    The TUI app refactor in #18753 moved the old `app.rs` tests into a
    single `app/tests.rs` file. That kept the split mechanically simple, but
    it left several focused unit tests far from the modules they exercise.
    
    This PR is a follow-up that moves tests next to the code they cover.
    
    It also adds `tui/src/app/test_support.rs` for shared fixture
    construction.
    
    This is just a mechanical refactoring (no functional changes) and does
    not affect any production code.
  • Stabilize debug clear memories integration test (#18858)
    ## Why
    
    `debug_clear_memories_resets_state_and_removes_memory_dir` can be flaky
    because the test drops its `sqlx::SqlitePool` immediately before
    invoking `codex debug clear-memories`. Dropping the pool does not wait
    for all SQLite connections to close, so the CLI can race with still-open
    test connections.
    
    ## What changed
    
    - Await `pool.close()` before spawning `codex debug clear-memories`.
    - Close the reopened verification pool before the temp `CODEX_HOME` is
    torn down.
    
    ## Verification
    
    - `cargo test -p codex-cli --test debug_clear_memories
    debug_clear_memories_resets_state_and_removes_memory_dir`
  • Queue follow-up input during user shell commands (#18820)
    Fixes #17954.
    
    ## Why
    When a manual shell command like `!sleep 10` is running, submitting
    plain text such as `hi` currently sends that text as a steer for the
    active shell turn. User shell turns are not steerable like model turns,
    so the TUI can remain stuck in `Working` after the shell command
    finishes.
    
    ## What Changed
    - Detect when the only active work is one or more
    `ExecCommandSource::UserShell` commands.
    - Queue plain submitted input in that state so it drains after the shell
    command and shell turn complete.
    - Preserve `!cmd` submissions during running work so explicit shell
    commands keep their existing behavior.
    - Add regression coverage for the `!sleep 10` plus `hi` flow in
    `chatwidget::tests::exec_flow::user_message_during_user_shell_command_is_queued_not_steered`.
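    
    The queueing rule above can be sketched as a small predicate. Only
    `ExecCommandSource::UserShell` comes from this PR; the `Agent` variant
    and both function names are hypothetical, and the real widget tracks
    more state than a slice of sources.
    
    ```rust
    /// Where a running exec command came from. `Agent` is a stand-in
    /// for any non-user-shell source.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ExecCommandSource {
        Agent,
        UserShell,
    }
    
    /// True when there is running work and all of it is user shell commands.
    pub fn only_user_shell_running(running: &[ExecCommandSource]) -> bool {
        !running.is_empty() && running.iter().all(|s| *s == ExecCommandSource::UserShell)
    }
    
    /// Plain text submitted during a user shell turn is queued instead of
    /// being sent as a steer. `!cmd` submissions keep their existing path.
    pub fn should_queue(input: &str, running: &[ExecCommandSource]) -> bool {
        let is_shell_submission = input.trim_start().starts_with('!');
        !is_shell_submission && only_user_shell_running(running)
    }
    ```
    
    For the `!sleep 10` plus `hi` flow, `should_queue("hi", ...)` returns
    true while the shell command is the only active work, so the message
    drains after the shell turn completes.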
    
    ## Verification
    - Manually confirmed hang before the fix and no hang after the fix
  • [codex] Add tmux-aware OSC 9 notifications (#17836)
    ## Summary
    - wrap OSC 9 notifications in tmux's DCS passthrough so terminal
    notifications make it through tmux
    - use codex-terminal-detection for OSC 9 auto-selection so tmux sessions
    inherit the underlying client terminal support
    - add focused notification backend tests for plain OSC 9 and
    tmux-wrapped output
    
    ## Stack
    - base PR: #18479
    - review order: #18479, then this PR
    
    ## Why
    Tmux does not forward OSC 9 notifications directly; the sequence has to
    be wrapped in tmux's DCS passthrough envelope. Codex also had local
    notification heuristics that could miss supported terminals when running
    under tmux, even though codex-terminal-detection already knows how to
    attribute tmux sessions to the client terminal.
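    
    For readers unfamiliar with the passthrough envelope, the wrapping can
    be sketched as below. The function names are hypothetical and this is
    not the TUI's notification backend; it only shows the byte layout:
    `ESC P tmux;`, the inner sequence with every ESC doubled, then
    `ESC \`.
    
    ```rust
    const ESC: char = '\x1b';
    
    /// Builds a plain OSC 9 notification: ESC ] 9 ; <message> BEL.
    /// The message is assumed to contain no control characters.
    pub fn osc9(message: &str) -> String {
        format!("\x1b]9;{message}\x07")
    }
    
    /// Wraps an escape sequence in tmux's DCS passthrough envelope.
    /// Each ESC in the inner sequence is doubled so tmux forwards it
    /// to the outer terminal instead of interpreting it.
    pub fn wrap_for_tmux(sequence: &str) -> String {
        let mut out = String::from("\x1bPtmux;");
        for ch in sequence.chars() {
            if ch == ESC {
                out.push(ESC);
            }
            out.push(ch);
        }
        out.push_str("\x1b\\");
        out
    }
    ```
    
    A caller would emit `osc9(msg)` directly outside tmux and
    `wrap_for_tmux(&osc9(msg))` inside it.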
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-tui` *(currently blocked by an unrelated existing
    compile error in `app-server/src/message_processor.rs:754` referencing
    `connection_id` out of scope; not caused by this change)*
    
    Co-authored-by: Codex <noreply@openai.com>
  • Propagate thread id in MCP tool metadata (#18093)
    ## Summary
    - attach the authoritative Codex thread id to MCP tool request
    `_meta.threadId` for model-initiated tool calls
    - attach the same thread id for manual `mcpServer/tool/call` requests
    before invoking the MCP server
    - cover both metadata helper behavior and the manual app-server MCP path
    in tests
    
    
    This is needed because the Rust app-server is the last place that still
    has authoritative knowledge of “this model-generated MCP tool call
    belongs to conversation/thread X” before the request leaves Codex and
    reaches Hoopa. The change adds `threadId` to MCP request metadata in the
    model-generated tool-call path, using `sess.conversation_id`, and does
    the same for the manual `mcpServer/tool/call` path.
    
    ## Test plan
    - `cargo test -p codex-core
    mcp_tool_call_thread_id_meta_is_added_to_request_meta --lib`
    - `cargo test -p codex-app-server
    mcp_server_tool_call_returns_tool_result`
    
    Paired Hoopa consumer PR: https://github.com/openai/openai/pull/833263
  • app-server: define device key v2 protocol (#18428)
    ## Why
    
    Clients need a stable app-server protocol surface for enrolling a local
    device key, retrieving its public key, and producing a device-bound
    proof.
    
    The protocol reports `protectionClass` explicitly so clients can
    distinguish hardware-backed keys from an explicitly allowed OS-protected
    fallback. Signing uses a tagged `DeviceKeySignPayload` enum rather than
    arbitrary bytes so each signed statement is auditable at the API
    boundary.
    
    ## What changed
    
    - Added v2 JSON-RPC methods for `device/key/create`,
    `device/key/public`, and `device/key/sign`.
    - Added request/response types for device-key metadata, SPKI public
    keys, protection classes, and ECDSA signatures.
    - Added `DeviceKeyProtectionPolicy` with hardware-only default behavior
    and an explicit `allow_os_protected_nonextractable` option.
    - Added the initial `remoteControlClientConnection` signing payload
    variant.
    - Regenerated JSON Schema and TypeScript fixtures for app-server
    clients.
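    
    One way to picture the relationship between the protection policy and
    the reported protection class is the sketch below. The enum and variant
    names are hypothetical simplifications of `DeviceKeyProtectionPolicy`
    and `protectionClass`; the actual wire types are defined in the
    generated schema.
    
    ```rust
    /// What the caller is willing to accept. Hardware-only is the default.
    #[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
    pub enum ProtectionPolicy {
        #[default]
        HardwareOnly,
        AllowOsProtectedNonextractable,
    }
    
    /// How a key is actually protected, as reported back to the client.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ProtectionClass {
        HardwareBacked,
        OsProtectedNonextractable,
    }
    
    /// Returns true when a key of the given class satisfies the policy.
    pub fn policy_permits(policy: ProtectionPolicy, class: ProtectionClass) -> bool {
        match (policy, class) {
            // Hardware-backed keys satisfy every policy.
            (_, ProtectionClass::HardwareBacked) => true,
            // The OS-protected fallback requires an explicit opt-in.
            (
                ProtectionPolicy::AllowOsProtectedNonextractable,
                ProtectionClass::OsProtectedNonextractable,
            ) => true,
            (ProtectionPolicy::HardwareOnly, ProtectionClass::OsProtectedNonextractable) => false,
        }
    }
    ```
    
    Making the strict policy the default means a client that omits the
    option never receives a weaker key class without asking for it.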
    
    ## Stack
    
    This is PR 1 of 4 in the device-key app-server stack.
    
    ## Validation
    
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
  • core: make test-log a dev dependency (#18846)
    The `test-log` crate is only used by `codex-core` tests, so it does not
    need to be part of the normal `codex-core` dependency graph. Keeping
    `test-log` in `dev-dependencies` removes it from normal `codex-core`
    builds and keeps the production dependency set a little smaller.
    
    Verification:
    
    - `cargo tree -p codex-core --edges normal --invert test-log`
    - `cargo check -p codex-core --lib`
    - `cargo test -p codex-core --lib`
  • feat: baseline lib (#18848)
    This adds a baseline library with two entry points:
    * `reset_git_repository`, which takes a directory and sets it as a new
    git root
    * `diff_since_latest_init`, which returns the diff for a given directory
    since the last `reset_git_repository`
  • build: reduce Rust dev debuginfo (#18844)
    ## What changed
    
    This PR makes the default Cargo dev profile use line-tables-only debug
    info:
    
    ```toml
    [profile.dev]
    debug = 1
    ```
    
    That keeps useful backtraces while avoiding the cost of full variable
    debug info in normal local dev builds.
    
    This also makes the Bazel CI setting explicit with `-Cdebuginfo=0` for
    target and exec-configuration Rust actions. Bazel/rules_rust does not
    read Cargo profiles for this setting, and the current fastbuild action
    already emitted `--codegen=debuginfo=0`; the Bazel part of this PR makes
    that choice direct in our build configuration.
    
    ## Why
    
    The slow codex-core rebuilds are dominated by debug-info codegen, not
    parsing or type checking. On a warm-dependency package rebuild, the
    baseline codex-core compile was about 39.5s wall / 38.9s rustc total,
    with codegen_crate around 14.0s and LLVM_passes around 13.4s. Setting
    codex-core to line-tables-only debug info brought that to about 27.2s
    wall / 26.7s rustc total, with codegen_crate around 3.1s and LLVM_passes
    around 2.8s.
    
    `debug = 0` was only about another 0.7s faster than `debug = 1` in the
    codex-core measurement, so `debug = 1` is the better default dev
    tradeoff: it captures nearly all of the compile-time win while
    preserving basic debuggability.
    
    I also sampled other first-party crates instead of keeping a
    codex-core-only package override. codex-app-server showed the same
    pattern: rustc total dropped from 15.85s to 10.48s, while codegen_crate
    plus LLVM_passes dropped from about 13.47s to 3.23s.
    codex-app-server-protocol had a smaller but still real improvement,
    16.05s to 14.58s total, and smaller crates showed modest wins. That
    points to a workspace dev-profile policy rather than a hand-maintained
    list of large crates.
    
    ## Relationship to #18612
    
    [#18612](https://github.com/openai/codex/pull/18612) added the
    `dev-small` profile. That remains useful when someone wants a working
    local build quickly and is willing to opt in with
    `cargo build --profile dev-small`.
    
    This PR is deliberately less aggressive: it changes the common default
    dev profile while preserving line tables/backtraces. `dev-small` remains
    the explicit "build quickly, no debuggability concern" path.
    
    ## Other investigation
    
    I looked for another structural win comparable to
    [#16631](https://github.com/openai/codex/pull/16631) and
    [#16630](https://github.com/openai/codex/pull/16630), but did not find
    one. The attempted TOML monomorphization changes were noisy or worse in
    measurement, and the async task changes reduced some instantiations but
    only translated to roughly a one-second improvement while being much
    more disruptive. The debug-info setting was the one repeatable, material
    win that survived measurement.
    
    ## Verification
    
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `cargo check -p codex-core --lib`
    - `cargo test -p codex-core --lib`
    - Bazel `aquery --config=ci-linux` confirmed `--codegen=debuginfo=0` and
      `-Cdebuginfo=0` for `//codex-rs/core:core`
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18844).
    * #18846
    * __->__ #18844