Commit Graph

653 Commits

  • fix: reduce flakiness of compact_resume_after_second_compaction_preserves_history (#11663)
    ## Why
    `compact_resume_after_second_compaction_preserves_history` has been
    intermittently flaky in Windows CI.
    
    The test had two one-shot request matchers in the second compact/resume
    phase that could overlap, and it waited for the first `Warning` event
    after compaction. In practice, that made the test sensitive to
    platform/config-specific prompt shape and unrelated warning timing.
    
    ## What Changed
    - Hardened the second compaction matcher in
    `codex-rs/core/tests/suite/compact_resume_fork.rs` so it accepts
    expected compact-request variants while explicitly excluding the
    `AFTER_SECOND_RESUME` payload.
    - Updated `compact_conversation()` to wait for the specific compaction
    warning (`COMPACT_WARNING_MESSAGE`) rather than any `Warning` event.
    - Added an inline comment explaining why the matcher is intentionally
    broad but disjoint from the follow-up resume matcher.
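    A minimal sketch of the "wait for a specific event" idea follows. It assumes a simplified `Event` enum delivered over a std channel. `Event`, `wait_for_event`, and the placeholder message text are hypothetical stand-ins for the real test helpers and `COMPACT_WARNING_MESSAGE`. Unrelated warnings are skipped rather than treated as the awaited signal.

    ```rust
    use std::sync::mpsc::Receiver;
    use std::time::{Duration, Instant};

    /// Hypothetical, simplified event type; the real protocol enum is richer.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Event {
        Warning(String),
        TaskComplete,
    }

    /// Hypothetical placeholder for the real `COMPACT_WARNING_MESSAGE` text.
    pub const COMPACT_WARNING_MESSAGE: &str = "compact warning";

    /// Drain events until one satisfies `pred`, skipping unrelated ones.
    /// Returns `None` once the overall `timeout` elapses or the sender is gone.
    pub fn wait_for_event<F>(rx: &Receiver<Event>, timeout: Duration, pred: F) -> Option<Event>
    where
        F: Fn(&Event) -> bool,
    {
        let deadline = Instant::now() + timeout;
        loop {
            // Remaining time budget; `None` means the deadline already passed.
            let remaining = deadline.checked_duration_since(Instant::now())?;
            match rx.recv_timeout(remaining) {
                Ok(ev) if pred(&ev) => return Some(ev),
                Ok(_) => continue, // unrelated event: keep waiting
                Err(_) => return None, // timed out or disconnected
            }
        }
    }
    ```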
    
    ## Test Plan
    - `cargo test -p codex-core --test all suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history -- --exact`
    - Repeated the same test in a loop (40 runs) to check for local
    nondeterminism.
  • core: limit search_tool_bm25 to Apps and clarify discovery guidance (#11669)
    ## Summary
    - Limit `search_tool_bm25` indexing to `codex_apps` tools only, so
    non-Apps MCP servers are no longer discoverable through this search
    path.
    - Move search-tool discovery guidance into the `search_tool_bm25` tool
    description (via template include) instead of injecting it as a separate
    developer message.
    - Update Apps discovery guidance wording to clarify when to use
    `search_tool_bm25` for Apps-backed systems (for example Slack, Google
    Drive, Jira, Notion) and when to call tools directly.
    - Remove dead `core` helper code (`filter_codex_apps_mcp_tools` and
    `codex_apps_connector_id`) that is no longer used after the
    tool-selection refactor.
    - Update `core` search-tool tests to assert codex-apps-only behavior and
    to validate guidance from the tool description.
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-core search_tool`
    - ⚠️ `cargo test -p codex-core` was attempted, but the run repeatedly
    stalled on
    `tools::js_repl::tests::js_repl_can_attach_image_via_view_image_tool`.
    
    ## Tickets
    - None
  • chore(approvals) More approvals scenarios (#11660)
    ## Summary
    Add more tests covering the approvals flow.
    
    ## Testing
    - [x] these are tests
  • Persist complete TurnContextItem state via canonical conversion (#11656)
    ## Summary
    
    This PR delivers the first small, shippable step toward model-visible
    state diffing by making `TurnContextItem` more complete and
    standardizing how it is built.
    
    Specifically, it:
    - Adds persisted network context to `TurnContextItem`.
    - Introduces a single canonical `TurnContext -> TurnContextItem`
    conversion path.
    - Routes existing rollout write sites through that canonical conversion
    helper.
    
    No context injection/diff behavior changes are included in this PR.
    
    ## Why this change
    
    The design goal is to make `TurnContextItem` the canonical source of
    truth for context-diff decisions.
    Before this PR:
    - `TurnContextItem` did not include all TurnContext-derived environment
    inputs needed for v1 completeness.
    - Construction was duplicated at multiple write sites.
    
    This PR addresses both with a minimal, reviewable change.
    
    ## Changes
    
    ### 1) Extend `TurnContextItem` with network state
    - Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
    - Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
    - Kept backward compatibility by making the new field optional and
    skipped when absent.
    
    Files:
    - `codex-rs/protocol/src/protocol.rs`
    
    ### 2) Canonical conversion helper
    - Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
    - Added internal helper to derive network fields from
    `config_layer_stack.requirements().network`.
    
    Files:
    - `codex-rs/core/src/codex.rs`
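    A minimal sketch of a single canonical conversion, using heavily simplified hypothetical structs. The real `TurnContext` and `TurnContextItem` carry many more fields, `collaboration_mode` is modeled as a plain string here, and `NetworkRequirements` is a stand-in for `config_layer_stack.requirements().network`. Every write site calls the one method instead of building the item ad hoc.

    ```rust
    /// Hypothetical stand-in for network requirements from the config layer stack.
    #[derive(Debug, Clone, Default)]
    pub struct NetworkRequirements {
        pub allowed_domains: Vec<String>,
        pub denied_domains: Vec<String>,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct TurnContextNetworkItem {
        pub allowed_domains: Vec<String>,
        pub denied_domains: Vec<String>,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct TurnContextItem {
        pub cwd: String,
        pub model: String,
        pub collaboration_mode: Option<String>,
        /// Optional so that older rollout lines without it stay readable.
        pub network: Option<TurnContextNetworkItem>,
    }

    /// Simplified turn context; only a few fields are modeled.
    pub struct TurnContext {
        pub cwd: String,
        pub model: String,
        pub network: Option<NetworkRequirements>,
    }

    impl TurnContext {
        /// Single canonical conversion used by every rollout write site.
        pub fn to_turn_context_item(&self, collaboration_mode: Option<String>) -> TurnContextItem {
            TurnContextItem {
                cwd: self.cwd.clone(),
                model: self.model.clone(),
                collaboration_mode,
                network: self.network_item(),
            }
        }

        /// Derive persisted network fields when network requirements are configured.
        fn network_item(&self) -> Option<TurnContextNetworkItem> {
            self.network.as_ref().map(|n| TurnContextNetworkItem {
                allowed_domains: n.allowed_domains.clone(),
                denied_domains: n.denied_domains.clone(),
            })
        }
    }
    ```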
    
    ### 3) Use canonical conversion at rollout write sites
    - Replaced ad hoc `TurnContextItem { ... }` construction with
    `to_turn_context_item(...)` in:
      - sampling request path
      - compaction path
    
    Files:
    - `codex-rs/core/src/codex.rs`
    - `codex-rs/core/src/compact.rs`
    
    ### 4) Update fixtures/tests for new optional field
    - Updated existing `TurnContextItem` literals in tests to include
    `network: None`.
    - Added protocol tests for:
      - deserializing old payloads with no `network`
      - serializing when `network` is present
    
    Files:
    - `codex-rs/core/tests/suite/resume_warning.rs`

    ## Behavior impact
    - No replay/diff logic changes.
    - Persisted rollout `TurnContextItem` now carries additional network
    context when available.
    - Older rollout lines without `network` remain readable.
  • feat: introduce Permissions (#11633)
    ## Why
    We currently carry multiple permission-related concepts directly on
    `Config` for shell/unified-exec behavior (`approval_policy`,
    `sandbox_policy`, `network`, `shell_environment_policy`,
    `windows_sandbox_mode`).
    
    Consolidating these into one in-memory struct makes permission handling
    easier to reason about and sets up the next step: supporting named
    permission profiles (`[permissions.PROFILE_NAME]`) without changing
    behavior now.
    
    This change is mostly mechanical: it updates existing callsites to go
    through `config.permissions`, but it does not yet refactor those
    callsites to take a single `Permissions` value in places where multiple
    permission fields are still threaded separately.
    
    This PR intentionally **does not** change the on-disk `config.toml`
    format yet and keeps compatibility with legacy config keys.
    
    ## What Changed
    - Introduced `Permissions` in `core/src/config/mod.rs`.
    - Added `Config::permissions` and moved effective runtime permission
    fields under it:
      - `approval_policy`
      - `sandbox_policy`
      - `network`
      - `shell_environment_policy`
      - `windows_sandbox_mode`
    - Updated config loading/building so these effective values are still
    derived from the same existing config inputs and constraints.
    - Updated Windows sandbox helpers/resolution to read/write via
    `permissions`.
    - Threaded the new field through all permission consumers across core
    runtime, app-server, CLI/exec, TUI, and sandbox summary code.
    - Updated affected tests to reference `config.permissions.*`.
    - Renamed the struct/field from
    `EffectivePermissions`/`effective_permissions` to
    `Permissions`/`permissions` and aligned variable naming accordingly.
    
    ## Verification
    - `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
    -p codex-exec -p codex-utils-sandbox-summary`
    - `cargo build -p codex-core -p codex-tui -p codex-cli -p
    codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
  • Add js_repl host helpers and exec end events (#10672)
    ## Summary
    
    This PR adds host-integrated helper APIs for `js_repl` and updates model
    guidance so the agent can use them reliably.
    
    ### What’s included
    
    - Add `codex.tool(name, args?)` in the JS kernel so `js_repl` can call
    normal Codex tools.
    - Keep persistent JS state and scratch-path helpers available:
      - `codex.state`
      - `codex.tmpDir`
    - Wire `js_repl` tool calls through the standard tool router path.
    - Add/align `js_repl` execution completion/end event behavior with
    existing tool logging patterns.
    - Update dynamic prompt injection (`project_doc`) to document:
      - how to call `codex.tool(...)`
      - raw output behavior
      - image flow via `view_image` (`codex.tmpDir` +
      `codex.tool("view_image", ...)`)
      - stdio safety guidance (`console.log` / `codex.tool`, avoid direct
      `process.std*`)
    
    ## Why
    
    - Standardize JS-side tool usage on `codex.tool(...)`
    - Make `js_repl` behavior more consistent with existing tool execution
    and event/logging patterns.
    - Give the model enough runtime guidance to use `js_repl` safely and
    effectively.
    
    ## Testing
    
    - Added/updated unit and runtime tests for:
      - `codex.tool` calls from `js_repl` (including shell/MCP paths)
      - image handoff flow via `view_image`
      - prompt-injection text for `js_repl` guidance
      - execution/end event behavior and related regression coverage
    
    
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    -  `1` https://github.com/openai/codex/pull/10674
    - 👉 `2` https://github.com/openai/codex/pull/10672
    -  `3` https://github.com/openai/codex/pull/10671
    -  `4` https://github.com/openai/codex/pull/10673
    -  `5` https://github.com/openai/codex/pull/10670
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_extended_history` flag to preserve
    those events when you start/resume/fork a thread (defaults to `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
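    A minimal sketch of how a persistence mode can gate which events reach the rollout file. `EventKind` is a hypothetical, reduced set of event kinds, and the exact membership of each group (for example, treating token counts as never persisted) is an assumption for illustration, not the recorder's actual filter.

    ```rust
    /// The two modes described above; `Limited` is the default.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub enum EventPersistenceMode {
        #[default]
        Limited,
        Extended,
    }

    /// Hypothetical, reduced set of event kinds.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum EventKind {
        UserMessage,
        AgentMessage,
        Reasoning,
        ExecCommandEnd,
        PatchApplyEnd,
        McpToolCallEnd,
        Error,
        TokenCount,
    }

    /// Decide whether an event is written to the rollout file.
    pub fn should_persist(mode: EventPersistenceMode, kind: EventKind) -> bool {
        use EventKind::*;
        match kind {
            // Persisted in both modes (the existing small subset).
            UserMessage | AgentMessage | Reasoning => true,
            // Only persisted when the client opted in to extended history.
            ExecCommandEnd | PatchApplyEnd | McpToolCallEnd | Error => {
                mode == EventPersistenceMode::Extended
            }
            // Assumed to stay ephemeral in this sketch.
            TokenCount => false,
        }
    }
    ```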
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
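    A minimal sketch of byte-bounded truncation that never splits a UTF-8 character. `truncate_to_bytes` is hypothetical and simply keeps a prefix. The existing `truncate_text` in core may use a different strategy (for example keeping head and tail, or adding a marker), so treat this only as an illustration of the 10,000-byte bound.

    ```rust
    /// Upper bound mentioned above for persisted command output.
    pub const MAX_PERSISTED_OUTPUT_BYTES: usize = 10_000;

    /// Truncate `text` to at most `max_bytes` bytes without splitting a UTF-8 character.
    pub fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
        if text.len() <= max_bytes {
            return text;
        }
        // Walk back from the limit to the nearest char boundary (0 always is one).
        let mut end = max_bytes;
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }
    ```

    For example, `truncate_to_bytes("naïve", 3)` yields `"na"`, because byte 3 falls inside the two-byte `ï`.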
    
    We also persist `EventMsg::Error`, which we can now map back to the
    Turn's status and use to populate the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (and likewise for apply_patch). Since these
    EventMsgs were never persisted before, I made `status` a required field.
  • fix(core) model_info preserves slug (#11602)
    ## Summary
    Preserve the specified model slug when we get a prefix-based match
    
    ## Testing
    - [x] added unit test
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
  • Fix test flake (#11448)
    The test was flaking with:
    
    ```
       Nextest run ID 6b7ff5f7-57f6-4c9c-8026-67f08fa2f81f with nextest profile: default
          Starting 3282 tests across 118 binaries (21 tests skipped)
              FAIL [  14.548s] (1367/3282) codex-core::all suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input
        stdout ───
    
          running 1 test
          test suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input ... FAILED
    
          failures:
    
          failures:
              suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input
    
          test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 522 filtered out; finished in 14.41s
    
        stderr ───
    
          thread 'suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input' (15632) panicked at C:\a\codex\codex\codex-rs\core\tests\common\lib.rs:186:14:
          timeout waiting for event: Elapsed(())
          stack backtrace:
          read_output:
          Exit code: 0
          Wall time: 8.5 seconds
          Output:
          line1
          naïve café
          line3
    
          stdout:
          line1
          naïve café
          line3
          patch:
          *** Begin Patch
          *** Add File: target.txt
          +line1
          +naïve café
          +line3
          *** End Patch
          note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    ```
  • Handle response.incomplete (#11558)
    Treat it the same as an error.
  • Hide the first websocket retry (#11548)
    Sometimes the connection needs to be quickly re-established; don't
    surface an error for that first retry.
  • fix compilation (#11532)
    fix broken main
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
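    A minimal sketch of the read-access model and the fail-closed check. `ReadOnlyAccess` mirrors the variants listed above. `Backend`, `SandboxError`, `check_read_access`, and `is_readable` are hypothetical names for illustration, and platform defaults are ignored in the path check for brevity.

    ```rust
    use std::path::{Path, PathBuf};

    #[derive(Debug, Clone, PartialEq)]
    pub enum ReadOnlyAccess {
        Restricted { include_platform_defaults: bool, readable_roots: Vec<PathBuf> },
        FullAccess,
    }

    /// Hypothetical error standing in for the backends' `UnsupportedOperation`.
    #[derive(Debug, PartialEq)]
    pub enum SandboxError {
        UnsupportedOperation(String),
    }

    /// Hypothetical backend selector.
    #[derive(Debug, Clone, Copy)]
    pub enum Backend {
        Seatbelt,
        Linux,
        Windows,
    }

    /// Fail closed: a backend that cannot enforce restricted reads rejects the
    /// policy instead of silently granting full read access.
    pub fn check_read_access(backend: Backend, access: &ReadOnlyAccess) -> Result<(), SandboxError> {
        match (backend, access) {
            (_, ReadOnlyAccess::FullAccess) => Ok(()),
            (Backend::Seatbelt, ReadOnlyAccess::Restricted { .. }) => Ok(()),
            (Backend::Linux | Backend::Windows, ReadOnlyAccess::Restricted { .. }) => Err(
                SandboxError::UnsupportedOperation("restricted read access".to_string()),
            ),
        }
    }

    /// Whether `path` is readable (platform defaults omitted in this sketch).
    pub fn is_readable(access: &ReadOnlyAccess, path: &Path) -> bool {
        match access {
            ReadOnlyAccess::FullAccess => true,
            ReadOnlyAccess::Restricted { readable_roots, .. } => {
                // Component-wise prefix check against each readable root.
                readable_roots.iter().any(|root| path.starts_with(root))
            }
        }
    }
    ```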
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • Update context window after model switch (#11520)
    - Update token usage aggregation to refresh model context window after a
    model change.
    - Add protocol/core tests, including an e2e model-switch test that
    validates switching to a smaller model updates telemetry.
  • Clamp auto-compact limit to context window (#11516)
    - Clamp auto-compaction to the minimum of the configured limit and 90% of
    the context window
    - Add an e2e compact test for clamped behavior
    - Update remote compact tests to account for earlier auto-compaction in
    setup turns
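    A minimal sketch of the clamp. `effective_auto_compact_limit` is a hypothetical name, and token counts are assumed to be `u64`. The 90% ceiling is computed in integer math, with `saturating_mul` guarding against overflow.

    ```rust
    /// Clamp the auto-compaction threshold to min(configured, 90% of context window).
    pub fn effective_auto_compact_limit(configured_limit: u64, context_window: u64) -> u64 {
        // 90% of the window, computed without floating point.
        let ceiling = context_window.saturating_mul(9) / 10;
        configured_limit.min(ceiling)
    }
    ```

    For a 272,000-token window the ceiling is 244,800, so a configured limit of 300,000 is clamped while 100,000 is kept as-is.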
  • Pre-sampling compact with previous model context (#11504)
    - Run pre-sampling compact through a single helper that builds
    previous-model turn context and compacts before the follow-up request
    when switching to a smaller context window.
    - Keep compaction events on the parent turn id and add compact suite
    coverage for switch-in-session and resume+switch flows.
  • Consolidate search_tool feature into apps (#11509)
    ## Summary
    - Remove `Feature::SearchTool` and the `search_tool` config key from the
    feature registry/schema.
    - Gate `search_tool_bm25` exposure via `Feature::Apps` in
    `core/src/tools/spec.rs`.
    - Update MCP selection logic in `core/src/codex.rs` to use
    `Feature::Apps` for search-tool behavior.
    - Update `core/tests/suite/search_tool.rs` to enable `Feature::Apps`.
    - Regenerate `core/config.schema.json` via `just write-config-schema`.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-core --test all suite::search_tool::`
    
    ## Tickets
    - None
  • Reapply "Add app-server transport layer with websocket support" (#11370)
    Reapply "Add app-server transport layer with websocket support" with
    additional fixes from https://github.com/openai/codex/pull/11313/changes
    to avoid deadlocking.
    
    This reverts commit 47356ff83c.
    
    ## Summary
    
    To avoid deadlocking when queues are full, we maintain separate tokio
    tasks dedicated to incoming vs. outgoing event handling:
    - split the app-server main loop into two tasks in
    `run_main_with_transport`
      - inbound handling (`transport_event_rx`)
      - outbound handling (`outgoing_rx` + `thread_created_rx`)
    - separate incoming and outgoing websocket tasks
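    A minimal sketch of the inbound/outbound split. It swaps tokio tasks for std threads and a bounded `sync_channel` so that it stays dependency-free, and `run_split` is a hypothetical name. Because one worker only enqueues and the other only drains, a full queue blocks the inbound side briefly but never deadlocks it against itself.

    ```rust
    use std::sync::mpsc::{self, Receiver, SyncSender};
    use std::thread::{self, JoinHandle};

    /// Spawn separate inbound and outbound workers connected by a bounded queue.
    /// The outbound handle yields every message it "wrote" to the transport.
    pub fn run_split(
        incoming: Receiver<String>,
        queue_capacity: usize,
    ) -> (JoinHandle<()>, JoinHandle<Vec<String>>) {
        let (out_tx, out_rx): (SyncSender<String>, Receiver<String>) =
            mpsc::sync_channel(queue_capacity);

        // Inbound worker: reads requests and enqueues responses only.
        let inbound = thread::spawn(move || {
            for request in incoming {
                // May block when the queue is full; the outbound worker keeps draining it.
                if out_tx.send(format!("response to {request}")).is_err() {
                    break;
                }
            }
            // `out_tx` is dropped here, which closes the outgoing queue.
        });

        // Outbound worker: drains the queue and "writes" to the transport only.
        let outbound = thread::spawn(move || out_rx.iter().collect::<Vec<_>>());

        (inbound, outbound)
    }
    ```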
    
    ## Validation
    
    Integration tests, plus thorough end-to-end testing in the Codex app with
    more than 10 concurrent requests.
    
    <img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
    src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
    />
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • Do not attempt to append after response.completed (#11402)
    Completed responses are fully done, and a new response must be created.
  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
  • feat: support multiple rate limits (#11260)
    Added multi-limit support end-to-end by carrying limit_name in
    rate-limit snapshots and handling multiple buckets instead of only
    codex.
    Extended /usage client parsing to consume additional_rate_limits
    Updated TUI /status and in-memory state to store/render per-limit
    snapshots
    Extended app-server rate-limit read response: kept rate_limits and added
    rate_limits_by_name.
    Adjusted usage-limit error messaging for non-default codex limit buckets
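    A minimal sketch of indexing snapshots by `limit_name`, roughly the shape a `rate_limits_by_name` map would have. `RateLimitSnapshot` is simplified here to a name and a usage percentage (hypothetical fields), and later snapshots for the same bucket are assumed to replace earlier ones.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical, simplified snapshot; only the fields needed here are modeled.
    #[derive(Debug, Clone, PartialEq)]
    pub struct RateLimitSnapshot {
        pub limit_name: String,
        pub used_percent: f64,
    }

    /// Index snapshots by `limit_name`, keeping the latest snapshot per bucket.
    pub fn rate_limits_by_name(
        snapshots: Vec<RateLimitSnapshot>,
    ) -> BTreeMap<String, RateLimitSnapshot> {
        let mut by_name = BTreeMap::new();
        for snap in snapshots {
            // A later snapshot for the same bucket replaces the earlier one.
            by_name.insert(snap.limit_name.clone(), snap);
        }
        by_name
    }
    ```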
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn id is constructed in-memory;
    2. on resuming threads, turn_id might not be unique;
    3. clients cannot easily tell the boundaries of a turn from rollout files.

    This PR does three things:
    1. persist `task_started` and `task_complete` events;
    2. persist `turn_id` in rollout turn events;
    3. generate turn_id as unique UUIDs instead of incrementing it in
    memory.

    This helps us resolve the issue of clients wanting unique turn
    ids for resuming a thread, and knowing the boundary of each turn in
    rollout files.
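    A minimal sketch of rebuilding turns from persisted start/complete markers. `RolloutItem` and `Turn` are hypothetical, reduced types, not the app-server's actual `thread_history` code. Messages outside any started turn are simply ignored here, whereas legacy rollouts without markers need the fallback described under backward compatibility.

    ```rust
    /// Hypothetical, reduced rollout line types.
    #[derive(Debug, Clone, PartialEq)]
    pub enum RolloutItem {
        TaskStarted { turn_id: String },
        UserMessage(String),
        AgentMessage(String),
        TaskComplete { turn_id: String },
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct Turn {
        pub id: String,
        pub messages: Vec<String>,
    }

    /// Group rollout items into turns using persisted start/complete markers.
    pub fn build_turns(items: &[RolloutItem]) -> Vec<Turn> {
        let mut turns = Vec::new();
        let mut current: Option<Turn> = None;
        for item in items {
            match item {
                RolloutItem::TaskStarted { turn_id } => {
                    // A new start closes any turn that never recorded completion.
                    if let Some(t) = current.take() {
                        turns.push(t);
                    }
                    current = Some(Turn { id: turn_id.clone(), messages: Vec::new() });
                }
                RolloutItem::UserMessage(m) | RolloutItem::AgentMessage(m) => {
                    // Ignored when no turn is open (sketch simplification).
                    if let Some(t) = current.as_mut() {
                        t.messages.push(m.clone());
                    }
                }
                RolloutItem::TaskComplete { .. } => {
                    if let Some(t) = current.take() {
                        turns.push(t);
                    }
                }
            }
        }
        // A trailing turn without a completion marker is still reported.
        if let Some(t) = current {
            turns.push(t);
        }
        turns
    }
    ```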
    
    Example debug logs:
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    Backward compatibility:
    if you resume an old session that has no `task_started` and
    `task_complete` events, the following happens:
    - If you resume and do nothing: those reconstructed historical IDs can
    differ next time you resume.
    - If you resume and send a new turn: the new turn gets a fresh UUID from
    live submission flow and is persisted, so that new turn’s ID is stable
    on later resumes.
    I think this behavior is fine, because we only care about deterministic
    turn id once a turn is triggered.
  • Do not resend output items in incremental websockets connections (#11383)
    In incremental websocket mode, output items are already part of the
    context, so there is no need to send them again and duplicate them.
  • fix(exec-policy) No empty command lists (#11397)
    ## Summary
    This should rarely, if ever, happen in practice. Regardless, we
    should never provide an empty list of `commands` to ExecPolicy. This PR
    mostly adds tests around these cases.
    
    ## Testing
    - [x] Adds a bunch of unit tests for this
  • Remove deterministic_process_ids feature to avoid duplicate codex-core builds (#11393)
    ## Why
    
    `codex-core` enabled `deterministic_process_ids` through a self
    dev-dependency. That forced a second feature-resolved build of the same
    crate, which increased compile time and test latency.
    
    ## What Changed
    
    - Removed the `deterministic_process_ids` feature from
    `codex-rs/core/Cargo.toml`.
    - Removed the self dev-dependency on `codex-core` that enabled that
    feature.
    - Removed the Bazel `deterministic_process_ids` crate feature for
    `codex-core`.
    - Added a test-only `AtomicBool` override in unified exec process-id
    allocation.
    - Added a test-support setter for that override and re-exported it from
    `codex-core`.
    - Enabled deterministic process IDs in integration tests via
    `core_test_support` ctor.
    
    ## Behavior
    
    - Production behavior remains random process IDs.
    - Unit tests remain deterministic via `cfg(test)`.
    - Integration tests remain deterministic via explicit test-support
    initialization.
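    A minimal sketch of a test-only `AtomicBool` override for process-id allocation. `DETERMINISTIC_PROCESS_IDS`, `set_deterministic_process_ids`, and `allocate_process_id` are hypothetical names. The non-deterministic branch hashes a counter with std's randomly keyed `RandomState` only to stay dependency-free, which is a stand-in and not necessarily how production IDs are generated.

    ```rust
    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hasher};
    use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

    /// Test-only override: when set, process IDs are allocated sequentially.
    static DETERMINISTIC_PROCESS_IDS: AtomicBool = AtomicBool::new(false);
    static NEXT_ID: AtomicU64 = AtomicU64::new(1);

    /// Hypothetical test-support setter, called from integration-test setup.
    pub fn set_deterministic_process_ids(enabled: bool) {
        DETERMINISTIC_PROCESS_IDS.store(enabled, Ordering::SeqCst);
    }

    pub fn allocate_process_id() -> u64 {
        // Unit tests (`cfg(test)`) or an explicit override get sequential IDs.
        if cfg!(test) || DETERMINISTIC_PROCESS_IDS.load(Ordering::SeqCst) {
            NEXT_ID.fetch_add(1, Ordering::SeqCst)
        } else {
            // Non-sequential IDs via a randomly keyed std hasher (stand-in for a real RNG).
            let mut h = RandomState::new().build_hasher();
            h.write_u64(NEXT_ID.fetch_add(1, Ordering::SeqCst));
            h.finish()
        }
    }
    ```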
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-core unified_exec::`
    - `cargo test -p codex-core --test all unified_exec -- --test-threads=1`
    - `cargo tree -p codex-core -e features` (verified the removed feature
    path)
  • Prefer websocket transport when model opts in (#11386)
    Summary
    - add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
    in all fixtures and constructors
    - wire the new flag into websocket selection so models that opt in
    always use websocket transport even when the feature gate is off
    
    Testing
    - Not run (not requested)
  • Update models.json (#11274)
    Automated update of models.json.
    
    ---------
    
    Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
    Co-authored-by: Sayan Sisodiya <sayan@openai.com>
  • Strip unsupported images from prompt history to guard against model switch (#11349)
    - Make `ContextManager::for_prompt` modality-aware and strip input_image
    content when the active model is text-only.
    - Added a test for multi-model -> text-only model switch
  • include sandbox (seatbelt, elevated, etc.) as in turn metadata header (#10946)
    This will help us understand retention/usage for folks who use the
    Windows (or any other) sandboxes
  • Sanitize MCP image output for text-only models (#11346)
    - Replace image blocks in MCP tool results with a text placeholder when
    the active model does not accept image input.
    - Add an e2e rmcp test to verify sanitized tool output is what gets sent
    back to the model.
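    A minimal sketch of replacing image blocks with a text placeholder for text-only models. `ContentBlock`, `IMAGE_PLACEHOLDER`, and `sanitize_for_model` are hypothetical, and the placeholder wording is invented for illustration.

    ```rust
    /// Hypothetical, simplified MCP content block.
    #[derive(Debug, Clone, PartialEq)]
    pub enum ContentBlock {
        Text(String),
        Image { mime_type: String, data: String },
    }

    /// Hypothetical placeholder text.
    pub const IMAGE_PLACEHOLDER: &str = "[image omitted: model does not accept image input]";

    /// Replace image blocks with a text placeholder when the model is text-only.
    pub fn sanitize_for_model(blocks: Vec<ContentBlock>, supports_images: bool) -> Vec<ContentBlock> {
        if supports_images {
            return blocks;
        }
        blocks
            .into_iter()
            .map(|b| match b {
                ContentBlock::Image { .. } => ContentBlock::Text(IMAGE_PLACEHOLDER.to_string()),
                other => other, // text blocks pass through unchanged
            })
            .collect()
    }
    ```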
  • Always expose view_image and return unsupported image-input error (#11336)
    - Keep `view_image` in the advertised tool list for all models.
    - Return a clear error when the current model does not support image
    inputs, and cover it with a unit test.
  • Compare full request for websockets incrementality (#11343)
    Tools can dynamically change mid-turn now. We need to be more thorough
    about reusing incremental connections.
  • test(core): stabilize ARM bazel remote-model and parallelism tests (#11330)
    ## Summary
    - keep wiremock MockServer handles alive through async assertions in
    remote model suite tests
    - assert /models request count in remote_models_hide_picker_only_models
    - use a slightly higher parallel timing threshold on aarch64 while
    keeping existing x86 threshold
    
    ## Validation
    - `just fmt`
    - targeted tests:
      - `cargo test -p codex-core --test all suite::remote_models::remote_models_merge_replaces_overlapping_model -- --exact`
      - `cargo test -p codex-core --test all suite::remote_models::remote_models_hide_picker_only_models -- --exact`
      - `cargo test -p codex-core --test all suite::tool_parallelism::shell_tools_run_in_parallel -- --exact`
    - soak loop: 40 iterations of all three targeted tests
    
    ## Notes
    - cargo test -p codex-core has one unrelated local-env failure in
    shell_snapshot::tests::try_new_creates_and_deletes_snapshot_file from
    exported certificate env content in this workspace.
    - local `bazel test //codex-rs/core:core-all-test` failed to build due
    to a missing `rust-objcopy` in this host toolchain.
  • Fix: update parallel tool call exec approval to approve on request id (#11162)
    ### Summary
    
    With parallel tool calls, exec command approvals were applied at the
    turn level rather than the request level: once a single request was
    approved, the system treated every request in the turn as approved.
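    A minimal sketch of tracking approvals per request id instead of with one turn-wide flag. `TurnApprovals` is a hypothetical type, and request ids are assumed to be strings.

    ```rust
    use std::collections::HashSet;

    /// Approvals keyed by request id rather than a single turn-level boolean.
    #[derive(Default)]
    pub struct TurnApprovals {
        approved_requests: HashSet<String>,
    }

    impl TurnApprovals {
        pub fn approve(&mut self, request_id: &str) {
            self.approved_requests.insert(request_id.to_string());
        }

        /// Only the specific request the user approved may run.
        pub fn is_approved(&self, request_id: &str) -> bool {
            self.approved_requests.contains(request_id)
        }
    }
    ```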
    
    ### Before
    
    https://github.com/user-attachments/assets/d50ed129-b3d2-4b2f-97fa-8601eb11f6a8
    
    ### After
    
    https://github.com/user-attachments/assets/36528a43-a4aa-4775-9e12-f13287ef19fc
  • fix(protocol): approval policy never prompt (#11288)
    This removes overly directed language about how the model should behave
    when it's in `approval_policy=never` mode.
    
    ---------
    
    Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
  • Fix pending input test waiting logic (#11322)
    ## Summary
    - remove redundant user message wait that could time out and cause
    flakiness
    - rely on the existing turn-complete wait to ensure the follow-up
    request is observed
    
    ## Testing
    - Not run (not requested)
  • feat(sandbox): enforce proxy-aware network routing in sandbox (#11113)
    ## Summary
    - expand proxy env injection to cover common tool env vars
    (`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY` families +
    tool-specific variants)
    - harden macOS Seatbelt network policy generation to route through
    inferred loopback proxy endpoints and fail closed when proxy env is
    malformed
    - thread proxy-aware Linux sandbox flags and add minimal bwrap netns
    isolation hook for restricted non-proxy runs
    - add/refresh tests for proxy env wiring, Seatbelt policy generation,
    and Linux sandbox argument wiring
  • Adjust shell command timeouts for Windows (#11247)
    Summary
    - add platform-aware defaults for shell command timeouts so Windows
    tests get longer waits
    - keep the medium timeout longer on Windows to reduce flakiness
    
    Testing
    - Not run (not requested)
  • Remove offline fallback for models (#11238)
  • Use longest remote model prefix matching (#11228)
    Match model metadata by longest matching remote slug prefix before local
    fallback.
    
    - Update `get_model_info` to prefer the most specific remote slug prefix
    for the requested model.
    - Add an integration test to assert `gpt-5.3-codex-test` resolves to
    `gpt-5.3-codex` over `gpt-5.3`.
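    A minimal sketch of longest-prefix resolution that also keeps the requested slug on the result, combining the behavior described here with the slug preservation from #11602. `ModelInfo` is reduced to two fields, `resolve_model_info` is hypothetical, the `context_window` values in the test are made up, and the local fallback is not modeled.

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    pub struct ModelInfo {
        pub slug: String,
        pub context_window: u64,
    }

    /// Resolve metadata by the longest remote slug that prefixes `requested`.
    /// The requested slug is kept on the result instead of the matched prefix.
    pub fn resolve_model_info(requested: &str, remote: &[ModelInfo]) -> Option<ModelInfo> {
        remote
            .iter()
            .filter(|m| requested.starts_with(m.slug.as_str()))
            .max_by_key(|m| m.slug.len()) // most specific prefix wins
            .map(|m| ModelInfo { slug: requested.to_string(), ..m.clone() })
    }
    ```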
  • [apps] Add gated instructions for Apps. (#10924)
    - [x] Add gated instructions for Apps.
  • feat: search_tool (#10657)
    **Why We Did This**
    - The goal is to reduce MCP tool context pollution by not exposing the
    full MCP tool list up front
    - It forces an explicit discovery step (`search_tool_bm25`) so the model
    narrows tool scope before making MCP calls, which helps relevance and
    lowers prompt/tool clutter.
    
    **What It Changed**
    - Added a new experimental feature flag `search_tool` in
    `core/src/features.rs:90` and `core/src/features.rs:430`.
    - Added config/schema support for that flag in
    `core/config.schema.json:214` and `core/config.schema.json:1235`.
    - Added BM25 dependency (`bm25`) in `Cargo.toml:129` and
    `core/Cargo.toml:23`.
    - Added new tool handler `search_tool_bm25` in
    `core/src/tools/handlers/search_tool_bm25.rs:18`.
    - Registered the handler and tool spec in
    `core/src/tools/handlers/mod.rs:11` and `core/src/tools/spec.rs:780` and
    `core/src/tools/spec.rs:1344`.
    - Extended `ToolsConfig` to carry `search_tool` enablement in
    `core/src/tools/spec.rs:32` and `core/src/tools/spec.rs:56`.
    - Injected dedicated developer instructions for tool-discovery workflow
    in `core/src/codex.rs:483` and `core/src/codex.rs:1976`, using
    `core/templates/search_tool/developer_instructions.md:1`.
    - Added session state to store one-shot selected MCP tools in
    `core/src/state/session.rs:27` and `core/src/state/session.rs:131`.
    - Added filtering so when feature is enabled, only selected MCP tools
    are exposed on the next request (then consumed) in
    `core/src/codex.rs:3800` and `core/src/codex.rs:3843`.
    - Added E2E suite coverage for
    enablement/instructions/hide-until-search/one-turn-selection in
    `core/tests/suite/search_tool.rs:72`,
    `core/tests/suite/search_tool.rs:109`,
    `core/tests/suite/search_tool.rs:147`, and
    `core/tests/suite/search_tool.rs:218`.
    - Refactored test helper utilities to support config-driven tool
    collection in `core/tests/suite/tools.rs:281`.
    
    **Net Behavioral Effect**
    - With `search_tool` **off**: existing MCP behavior (tools exposed
    normally).
    - With `search_tool` **on**: MCP tools start hidden, model must call
    `search_tool_bm25`, and only returned `selected_tools` are available for
    the next model call.
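    A minimal sketch of the one-shot selection: MCP tools stay hidden until a search returns `selected_tools`, which are exposed for exactly one request and then consumed. `SessionState`, `set_selected_tools`, and `take_exposed_tools` are hypothetical names, not the fields added in `core/src/state/session.rs`.

    ```rust
    use std::collections::HashSet;

    /// Hypothetical session state holding the last search's selection.
    #[derive(Default)]
    pub struct SessionState {
        selected_mcp_tools: Option<HashSet<String>>,
    }

    impl SessionState {
        /// Record the `selected_tools` returned by a `search_tool_bm25` call.
        pub fn set_selected_tools(&mut self, tools: impl IntoIterator<Item = String>) {
            self.selected_mcp_tools = Some(tools.into_iter().collect());
        }

        /// MCP tools to expose on the next request; the selection is consumed.
        pub fn take_exposed_tools(&mut self, all_mcp_tools: &[&str]) -> Vec<String> {
            match self.selected_mcp_tools.take() {
                Some(selected) => all_mcp_tools
                    .iter()
                    .filter(|t| selected.contains(**t))
                    .map(|t| t.to_string())
                    .collect(),
                // No search yet (or selection already used): MCP tools stay hidden.
                None => Vec::new(),
            }
        }
    }
    ```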