Commit Graph

131 Commits

  • chore: drop and clean from phase 1 (#11605)
    This PR is mostly cleaning and simplifying phase 1 of memories
  • Extract codex-config from codex-core (#11389)
    `codex-core` had accumulated config loading, requirements parsing,
    constraint logic, and config-layer state handling in a single crate.
    This change extracts that subsystem into `codex-config` to reduce
    `codex-core` rebuild/test surface area and isolate future config work.
    
    ## What Changed
    
    ### Added `codex-config`
    
    - Added new workspace crate `codex-rs/config` (`codex-config`).
    - Added workspace/build wiring in:
      - `codex-rs/Cargo.toml`
      - `codex-rs/config/Cargo.toml`
      - `codex-rs/config/BUILD.bazel`
    - Updated lockfiles (`codex-rs/Cargo.lock`, `MODULE.bazel.lock`).
    - Added `codex-core` -> `codex-config` dependency in
    `codex-rs/core/Cargo.toml`.
    
    ### Moved config internals from `core` into `config`
    
    Moved modules to `codex-rs/config/src/`:
    
    - `core/src/config/constraint.rs` -> `config/src/constraint.rs`
    - `core/src/config_loader/cloud_requirements.rs` ->
    `config/src/cloud_requirements.rs`
    - `core/src/config_loader/config_requirements.rs` ->
    `config/src/config_requirements.rs`
    - `core/src/config_loader/fingerprint.rs` -> `config/src/fingerprint.rs`
    - `core/src/config_loader/merge.rs` -> `config/src/merge.rs`
    - `core/src/config_loader/overrides.rs` -> `config/src/overrides.rs`
    - `core/src/config_loader/requirements_exec_policy.rs` ->
    `config/src/requirements_exec_policy.rs`
    - `core/src/config_loader/state.rs` -> `config/src/state.rs`
    
    `codex-config` now re-exports this surface from `config/src/lib.rs` at
    the crate top level.
    
    ### Updated `core` to consume/re-export `codex-config`
    
    - `core/src/config_loader/mod.rs` now imports/re-exports config-loader
    types/functions from top-level `codex_config::*`.
    - Local moved modules were removed from `core/src/config_loader/`.
    - `core/src/config/mod.rs` now re-exports constraint types from
    `codex_config`.
  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
  • Remove deterministic_process_ids feature to avoid duplicate codex-core builds (#11393)
    ## Why
    
    `codex-core` enabled `deterministic_process_ids` through a self
    dev-dependency. That forced a second feature-resolved build of the same
    crate, which increased compile time and test latency.
    
    ## What Changed
    
    - Removed the `deterministic_process_ids` feature from
    `codex-rs/core/Cargo.toml`.
    - Removed the self dev-dependency on `codex-core` that enabled that
    feature.
    - Removed the Bazel `deterministic_process_ids` crate feature for
    `codex-core`.
    - Added a test-only `AtomicBool` override in unified exec process-id
    allocation.
    - Added a test-support setter for that override and re-exported it from
    `codex-core`.
    - Enabled deterministic process IDs in integration tests via
    `core_test_support` ctor.
    
    ## Behavior
    
    - Production behavior remains random process IDs.
    - Unit tests remain deterministic via `cfg(test)`.
    - Integration tests remain deterministic via explicit test-support
    initialization.
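
    A minimal sketch of the override pattern described above, using only the
    standard library. The static and function names are hypothetical, and the
    random path is a stand-in, not the allocation logic used in `codex-core`.

    ```rust
    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hasher};
    use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

    // Hypothetical names: the real override lives in unified exec
    // process-id allocation.
    static FORCE_DETERMINISTIC_IDS: AtomicBool = AtomicBool::new(false);
    static NEXT_DETERMINISTIC_ID: AtomicU32 = AtomicU32::new(1000);

    /// Test-support setter (assumed shape). Integration tests would call
    /// this once during start-up.
    pub fn set_deterministic_process_ids(enabled: bool) {
        FORCE_DETERMINISTIC_IDS.store(enabled, Ordering::SeqCst);
    }

    pub fn allocate_process_id() -> u32 {
        if cfg!(test) || FORCE_DETERMINISTIC_IDS.load(Ordering::SeqCst) {
            // Deterministic path: a monotonically increasing counter.
            NEXT_DETERMINISTIC_ID.fetch_add(1, Ordering::SeqCst)
        } else {
            // Production path: std-only stand-in for a random ID.
            let random = RandomState::new().build_hasher().finish();
            (random % 100_000) as u32
        }
    }
    ```

    Because the toggle is a runtime flag rather than a crate feature, every
    consumer can link against the same build of the crate.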
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-core unified_exec::`
    - `cargo test -p codex-core --test all unified_exec -- --test-threads=1`
    - `cargo tree -p codex-core -e features` (verified the removed feature
    path)
  • chore: rename codex-command to codex-shell-command (#11378)
    This addresses some post-merge feedback on
    https://github.com/openai/codex/pull/11361:
    
    - crate rename
    - reuse `detect_shell_type()` utility
  • Split command parsing/safety out of codex-core into new codex-command (#11361)
    `codex-core` had accumulated command parsing and command safety logic
    (`bash`, `powershell`, `parse_command`, and `command_safety`) that is
    logically cohesive but orthogonal to most core session/runtime logic.
    Keeping this code in `codex-core` made the crate increasingly monolithic
    and raised iteration cost for unrelated core changes.
    
    This change extracts that surface into a dedicated crate,
    `codex-command`, while preserving existing `codex_core::...` call sites
    via re-exports.
    
    ## Why this refactor
    
    During analysis, command parsing/safety stood out as a good first split
    because it has:
    
    - a clear domain boundary (shell parsing + safety classification)
    - relatively self-contained dependencies (notably `tree-sitter` /
    `tree-sitter-bash`)
    - a meaningful standalone test surface (`134` tests moved with the
    crate)
    - many downstream uses that benefit from independent compilation and
    caching
    
    The practical problem was build latency from a large `codex-core`
    compile/test graph. Clean-build timings before and after this split
    showed measurable wins:
    
    - `cargo check -p codex-core`: `57.08s` -> `53.54s` (~`6.2%` faster)
    - `cargo test -p codex-core --no-run`: `2m39.9s` -> `2m20s` (~`12.4%`
    faster)
    - `codex-core lib` compile unit: `57.18s` -> `49.67s` (~`13.1%` faster)
    - `codex-core lib(test)` compile unit: `60.87s` -> `53.21s` (~`12.6%`
    faster)
    
    This gives a concrete reduction in core build overhead without changing
    behavior.
    
    ## What changed
    
    ### New crate
    
    - Added `codex-rs/command` as workspace crate `codex-command`.
    - Added:
      - `command/src/lib.rs`
      - `command/src/bash.rs`
      - `command/src/powershell.rs`
      - `command/src/parse_command.rs`
      - `command/src/command_safety/*`
      - `command/src/shell_detect.rs`
      - `command/BUILD.bazel`
    
    ### Code moved out of `codex-core`
    
    - Moved modules from `core/src` into `command/src`:
      - `bash.rs`
      - `powershell.rs`
      - `parse_command.rs`
      - `command_safety/*`
    
    ### Dependency graph updates
    
    - Added workspace member/dependency entries for `codex-command` in
    `codex-rs/Cargo.toml`.
    - Added `codex-command` dependency to `codex-rs/core/Cargo.toml`.
    - Removed `tree-sitter` and `tree-sitter-bash` from `codex-core` direct
    deps (now owned by `codex-command`).
    
    ### API compatibility for callers
    
    To avoid immediate downstream churn, `codex-core` now re-exports the
    moved modules/functions:
    
    - `codex_command::bash`
    - `codex_command::powershell`
    - `codex_command::parse_command`
    - `codex_command::is_safe_command`
    - `codex_command::is_dangerous_command`
    
    This keeps existing `codex_core::...` paths working while enabling
    gradual migration to direct `codex-command` usage.
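
    The re-export approach can be sketched in a single file, with modules
    standing in for the two crates. The function signature and the toy
    allowlist are assumptions made for illustration only.

    ```rust
    // Stand-in for the extracted crate; in the workspace this is a
    // separate crate, not a module.
    pub mod codex_command {
        /// Assumed signature and toy allowlist, for illustration only.
        pub fn is_safe_command(command: &[String]) -> bool {
            matches!(
                command.first().map(String::as_str),
                Some("ls" | "cat" | "pwd")
            )
        }
    }

    // Stand-in for `codex-core`: re-export the moved item so that the old
    // `codex_core::...` path keeps compiling.
    pub mod codex_core {
        pub use super::codex_command::is_safe_command;
    }

    /// A downstream caller that still uses the old path.
    pub fn caller_uses_old_path(command: &[String]) -> bool {
        codex_core::is_safe_command(command)
    }
    ```

    Callers can later switch to the `codex_command::` path one at a time
    without any behavior change, since both paths name the same item.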
    
    ### Internal decoupling detail
    
    - Added `command::shell_detect` so moved `bash`/`powershell` logic no
    longer depends on core shell internals.
    - Adjusted PowerShell helper visibility in `codex-command` for existing
    core test usage (`UTF8` prefix helper + executable discovery functions).
    
    ## Validation
    
    - `just fmt`
    - `just fix -p codex-command -p codex-core`
    - `cargo test -p codex-command` (`134` passed)
    - `cargo test -p codex-core --no-run`
    - `cargo test -p codex-core shell_command_handler`
    
    ## Notes / follow-up
    
    This commit intentionally prioritizes boundary extraction and
    compatibility. A follow-up can migrate downstream crates to depend
    directly on `codex-command` (instead of through `codex-core` re-exports)
    to realize additional incremental build wins.
  • Extract hooks into dedicated crate (#11311)
    Summary
    - move `core/src/hooks` implementation into a new `codex-hooks` crate
    with its own manifest
    - update `codex-rs` workspace and `codex-core` crate to depend on the
    extracted `hooks` crate and wire up the shared APIs
    - ensure references, modules, and lockfile reflect the new crate layout
    
    Testing
    - Not run (not requested)
  • memories: add extraction and prompt module foundation (#11200)
    ## Summary
    - add the new `core/src/memories` module (phase-one parsing, rollout
    filtering, storage, selection, prompts)
    - add Askama-backed memory templates for stage-one input/system and
    consolidation prompts
    - add module tests for parsing, filtering, path bucketing, and summary
    maintenance
    
    ## Testing
    - just fmt
    - cargo test -p codex-core --lib memories::
  • feat: search_tool (#10657)
    **Why We Did This**
    - The goal is to reduce MCP tool context pollution by not exposing the
    full MCP tool list up front
    - It forces an explicit discovery step (`search_tool_bm25`) so the model
    narrows tool scope before making MCP calls, which helps relevance and
    lowers prompt/tool clutter.
    
    **What It Changed**
    - Added a new experimental feature flag `search_tool` in
    `core/src/features.rs:90` and `core/src/features.rs:430`.
    - Added config/schema support for that flag in
    `core/config.schema.json:214` and `core/config.schema.json:1235`.
    - Added BM25 dependency (`bm25`) in `Cargo.toml:129` and
    `core/Cargo.toml:23`.
    - Added new tool handler `search_tool_bm25` in
    `core/src/tools/handlers/search_tool_bm25.rs:18`.
    - Registered the handler and tool spec in
    `core/src/tools/handlers/mod.rs:11` and `core/src/tools/spec.rs:780` and
    `core/src/tools/spec.rs:1344`.
    - Extended `ToolsConfig` to carry `search_tool` enablement in
    `core/src/tools/spec.rs:32` and `core/src/tools/spec.rs:56`.
    - Injected dedicated developer instructions for tool-discovery workflow
    in `core/src/codex.rs:483` and `core/src/codex.rs:1976`, using
    `core/templates/search_tool/developer_instructions.md:1`.
    - Added session state to store one-shot selected MCP tools in
    `core/src/state/session.rs:27` and `core/src/state/session.rs:131`.
    - Added filtering so when feature is enabled, only selected MCP tools
    are exposed on the next request (then consumed) in
    `core/src/codex.rs:3800` and `core/src/codex.rs:3843`.
    - Added E2E suite coverage for
    enablement/instructions/hide-until-search/one-turn-selection in
    `core/tests/suite/search_tool.rs:72`,
    `core/tests/suite/search_tool.rs:109`,
    `core/tests/suite/search_tool.rs:147`, and
    `core/tests/suite/search_tool.rs:218`.
    - Refactored test helper utilities to support config-driven tool
    collection in `core/tests/suite/tools.rs:281`.
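
    The one-shot selection state can be sketched with `Option::take`. The
    struct, field, and method names below are hypothetical and are not taken
    from `core/src/state/session.rs`.

    ```rust
    /// Hypothetical session state holding the tools chosen by the last search.
    #[derive(Default)]
    pub struct SessionState {
        pub selected_mcp_tools: Option<Vec<String>>,
    }

    impl SessionState {
        /// Records the tools returned by the most recent search call.
        pub fn set_selected_tools(&mut self, tools: Vec<String>) {
            self.selected_mcp_tools = Some(tools);
        }

        /// Returns the MCP tools to expose on the next request and clears
        /// the selection, so that it applies to one request only.
        pub fn take_tools_for_next_request(&mut self, all_tools: &[String]) -> Vec<String> {
            match self.selected_mcp_tools.take() {
                Some(selected) => all_tools
                    .iter()
                    .filter(|tool| selected.contains(*tool))
                    .cloned()
                    .collect(),
                // Nothing selected: MCP tools stay hidden.
                None => Vec::new(),
            }
        }
    }
    ```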
    
    **Net Behavioral Effect**
    - With `search_tool` **off**: existing MCP behavior (tools exposed
    normally).
    - With `search_tool` **on**: MCP tools start hidden, model must call
    `search_tool_bm25`, and only returned `selected_tools` are available for
    the next model call.
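
    For readers unfamiliar with BM25, here is a std-only sketch of Okapi BM25
    ranking over tool descriptions. It does not use the `bm25` crate added by
    this PR; the tokenizer, the `K1`/`B` constants, and the function names are
    assumptions chosen for illustration.

    ```rust
    use std::collections::HashMap;

    /// Lowercases and splits on non-alphanumeric characters.
    fn tokenize(text: &str) -> Vec<String> {
        text.split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(|t| t.to_lowercase())
            .collect()
    }

    /// Scores each document against `query` with Okapi BM25 and returns
    /// `(index, score)` pairs sorted from best to worst.
    pub fn bm25_rank(docs: &[&str], query: &str) -> Vec<(usize, f64)> {
        const K1: f64 = 1.2;
        const B: f64 = 0.75;

        let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
        let n = tokenized.len() as f64;
        let avg_len = tokenized.iter().map(|d| d.len() as f64).sum::<f64>() / n.max(1.0);

        // Document frequency: in how many documents each term appears.
        let mut doc_freq: HashMap<&str, f64> = HashMap::new();
        for doc in &tokenized {
            let mut unique: Vec<&str> = doc.iter().map(String::as_str).collect();
            unique.sort_unstable();
            unique.dedup();
            for term in unique {
                *doc_freq.entry(term).or_insert(0.0) += 1.0;
            }
        }

        let query_terms = tokenize(query);
        let mut scored: Vec<(usize, f64)> = tokenized
            .iter()
            .enumerate()
            .map(|(index, doc)| {
                let len = doc.len() as f64;
                let norm = if avg_len > 0.0 { len / avg_len } else { 0.0 };
                let score = query_terms
                    .iter()
                    .map(|q| {
                        let tf = doc.iter().filter(|t| *t == q).count() as f64;
                        let df = doc_freq.get(q.as_str()).copied().unwrap_or(0.0);
                        let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                        idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * norm))
                    })
                    .sum::<f64>();
                (index, score)
            })
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored
    }
    ```

    In a discovery step like the one described above, the top-ranked entries
    would become the `selected_tools` exposed on the next model call.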
  • Load requirements on windows (#10770)
    We support requirements on Unix, loading from
    `/etc/codex/requirements.toml`. On macOS, we also support MDM.
    
    Now, on Windows, we'll load requirements from
    `%ProgramData%\OpenAI\Codex\requirements.toml`.
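
    A minimal sketch of the per-platform lookup, assuming a hypothetical
    helper named `system_requirements_path` and that `%ProgramData%` is read
    from the process environment. The MDM source is not modeled.

    ```rust
    use std::path::PathBuf;

    /// Windows: `%ProgramData%\OpenAI\Codex\requirements.toml`.
    /// Returns `None` when the `ProgramData` variable is not set.
    #[cfg(windows)]
    pub fn system_requirements_path() -> Option<PathBuf> {
        std::env::var_os("ProgramData").map(|dir| {
            PathBuf::from(dir)
                .join("OpenAI")
                .join("Codex")
                .join("requirements.toml")
        })
    }

    /// Unix: `/etc/codex/requirements.toml`.
    #[cfg(not(windows))]
    pub fn system_requirements_path() -> Option<PathBuf> {
        Some(PathBuf::from("/etc/codex/requirements.toml"))
    }
    ```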
  • feat: include NetworkConfig through ExecParams (#11105)
    This PR adds the following field to `Config`:
    
    ```rust
    pub network: Option<NetworkProxy>,
    ```
    
    Though for the moment, it will always be initialized as `None` (this
    will be addressed in a subsequent PR).
    
    This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we spawn a process.
  • Added support for live updates to skills (#10478)
    Add a centralized FileWatcher in codex-core (using notify) that
    recursively watches skill roots from the config layer stack.
    
    Send `SkillsChanged` events when relevant file system changes are
    detected.
    
    On `SkillsChanged`:
    * Invalidate the skills cache immediately in ThreadManager
    * Emit EventMsg::SkillsUpdateAvailable to active sessions
    ~~* Broadcast a new app-server notification:
    SkillsListUpdatedNotification~~
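
    The `SkillsChanged` handling can be sketched with a standard-library
    channel per session. This does not use the `notify` crate, and the struct
    layout and method name are hypothetical. Only the event names come from
    the description above.

    ```rust
    use std::sync::mpsc::Sender;

    #[derive(Debug, PartialEq)]
    pub enum WatchEvent {
        SkillsChanged,
    }

    #[derive(Debug, PartialEq)]
    pub enum EventMsg {
        SkillsUpdateAvailable,
    }

    /// Hypothetical stand-in for the manager that owns the skills cache.
    pub struct ThreadManager {
        pub skills_cache: Option<Vec<String>>,
        pub sessions: Vec<Sender<EventMsg>>,
    }

    impl ThreadManager {
        pub fn handle_watch_event(&mut self, event: WatchEvent) {
            match event {
                WatchEvent::SkillsChanged => {
                    // Invalidate the cache so the next lookup reloads skills.
                    self.skills_cache = None;
                    // Notify active sessions; drop any that have gone away.
                    self.sessions
                        .retain(|tx| tx.send(EventMsg::SkillsUpdateAvailable).is_ok());
                }
            }
        }
    }
    ```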
    
    This change does not inject new items into the event stream. That means
    the agent will not know about new skills, so it won't be able to
    implicitly invoke new skills. It also won't know about changes to
    existing skills, so if it has already read the contents of a modified
    skill, it will not honor the new behavior.
    
    This change also does not detect modifications to AGENTS.md.
    
    I plan to address these limitations in a follow-on PR modeled after
    #9985. Injection of new skills and AGENTS was deemed too risky, hence the
    need to split the feature into two stages. The changes in this PR were
    designed to easily accommodate the second stage once we have some other
    foundational changes in place.
    
    Testing: In addition to automated tests, I did manual testing to confirm
    that newly-created skills, deleted skills, and renamed skills are
    reflected in the TUI skill picker menu. Also confirmed that
    modifications to behaviors for explicitly-invoked skills are honored.
    
    ---------
    
    Co-authored-by: Xin Lin <xl@openai.com>
  • feat: add APIs to list and download public remote skills (#10448)
    Add API to list / download from remote public skills
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. Though one advantage that our custom types had
    is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
    whereas the types in `rmcp` do not. As such, part of the work of this PR
    is leveraging the adapters between `rmcp` types and the serializable
    types that are API for us (app server and MCP) introduced in #10356.
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```diff
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • feat: Read personal skills from .agents/skills (#10437)
    - Issue: https://github.com/agentskills/agentskills/issues/15
    - Follow-up to https://github.com/openai/codex/pull/10317 (for team/repo
    skills)
    - This change now also loads personal/user skills from
    `$HOME/.agents/skills` (or `~/.agents/skills`) in addition to loading
    from `.agents/skills` inside of git repos.
    - The location of `.system` skills remains unchanged.
    - Keeping backwards compatibility with `~/.codex/skills` for now until
    we fully deprecate.
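
    A small sketch of the personal skill roots described above. The helper
    name and the ordering of the two roots are assumptions; the description
    does not say which root takes precedence.

    ```rust
    use std::path::{Path, PathBuf};

    /// Candidate personal skill roots under the user's home directory:
    /// the new `.agents/skills` location, followed by the legacy
    /// `.codex/skills` location kept for backwards compatibility.
    pub fn personal_skill_roots(home: &Path) -> Vec<PathBuf> {
        vec![
            home.join(".agents").join("skills"),
            home.join(".codex").join("skills"),
        ]
    }

    /// Keeps only the roots that exist as directories on disk.
    pub fn existing_skill_roots(home: &Path) -> Vec<PathBuf> {
        personal_skill_roots(home)
            .into_iter()
            .filter(|root| root.is_dir())
            .collect()
    }
    ```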
    
    With skills in both personal folders:
    <img width="831" height="421" alt="image"
    src="https://github.com/user-attachments/assets/ad8ac918-bfe6-4a2d-8a8e-d608c9d3d701"
    />
    
    We load from both places:
    <img width="607" height="236" alt="image"
    src="https://github.com/user-attachments/assets/480f4db0-ae64-4dc1-bdf5-c5de98c16f5c"
    />
  • fix: System skills marker includes nested folders recursively (#10350)
    Updated system skills bundled with Codex were not correctly replacing
    the user's skills in their .system folder.
    
    - Fix `.codex-system-skills.marker` not updating by hashing embedded
    system skills recursively (nested dirs + file contents), so updates
    trigger a reinstall.
    - Added a build Cargo hook to rerun if there are changes in
    `src/skills/assets/samples/*`, ensuring embedded skill updates rebuild
    correctly under caching.
    - Add a small unit test to ensure nested entries are included in the
    fingerprint.
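
    A sketch of recursive fingerprinting, assuming the skills are read from a
    directory on disk and hashed with the standard library's `DefaultHasher`.
    The fix itself hashes embedded assets, and its hash function is not stated
    here, so treat the names and the hasher as illustrative.

    ```rust
    use std::collections::hash_map::DefaultHasher;
    use std::fs;
    use std::hash::{Hash, Hasher};
    use std::io;
    use std::path::Path;

    /// Feeds every entry under `dir` into `hasher`: relative path first,
    /// then file contents, descending into nested directories.
    fn hash_dir(root: &Path, dir: &Path, hasher: &mut DefaultHasher) -> io::Result<()> {
        let mut entries = fs::read_dir(dir)?.collect::<Result<Vec<_>, _>>()?;
        // Sort so the fingerprint does not depend on directory listing order.
        entries.sort_by_key(|entry| entry.file_name());
        for entry in entries {
            let path = entry.path();
            let relative = path.strip_prefix(root).unwrap_or(&path);
            relative.to_string_lossy().hash(hasher);
            if path.is_dir() {
                hash_dir(root, &path, hasher)?;
            } else {
                fs::read(&path)?.hash(hasher);
            }
        }
        Ok(())
    }

    /// Fingerprint of a whole skills tree, including nested folders.
    pub fn skills_fingerprint(root: &Path) -> io::Result<u64> {
        let mut hasher = DefaultHasher::new();
        hash_dir(root, root, &mut hasher)?;
        Ok(hasher.finish())
    }
    ```

    A marker file would store this value, and a mismatch at startup would
    trigger a reinstall.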
  • Add websocket telemetry metrics and labels (#10316)
    Summary
    - expose websocket telemetry hooks through the responses client so
    request durations and event processing can be reported
    - record websocket request/event metrics and emit runtime telemetry
    events that the history UI now surfaces
    - improve tests to cover websocket telemetry reporting and guard runtime
    summary updates
    
    
    <img width="824" height="79" alt="Screenshot 2026-01-31 at 5 28 12 PM"
    src="https://github.com/user-attachments/assets/ea9a7965-d8b4-4e3c-a984-ef4fdc44c81d"
    />
  • Validate CODEX_HOME before resolving (#10249)
    Summary
    - require `CODEX_HOME` to point to an existing directory before
    canonicalizing and surface clear errors otherwise
    - share the same helper logic in both `core` and `rmcp-client` and add
    unit tests that cover missing, non-directory, valid, and default paths
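
    The validation order can be sketched as follows. The function name, its
    parameters, and the error wording are assumptions, not the shared helper
    added by this PR.

    ```rust
    use std::io;
    use std::path::{Path, PathBuf};

    /// Resolves the Codex home directory. `codex_home_env` is the value of
    /// `CODEX_HOME`, if set; `default_home` is used when it is unset or empty.
    pub fn resolve_codex_home(
        codex_home_env: Option<&str>,
        default_home: &Path,
    ) -> io::Result<PathBuf> {
        let raw = match codex_home_env {
            Some(value) if !value.is_empty() => value,
            _ => return Ok(default_home.to_path_buf()),
        };
        let path = PathBuf::from(raw);
        // Check existence first so the error names the variable, instead of
        // surfacing a bare failure from `canonicalize`.
        let metadata = std::fs::metadata(&path).map_err(|err| {
            io::Error::new(
                err.kind(),
                format!("CODEX_HOME points to {raw:?}, which could not be read: {err}"),
            )
        })?;
        if !metadata.is_dir() {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                format!("CODEX_HOME points to {raw:?}, which is not a directory"),
            ));
        }
        path.canonicalize()
    }
    ```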
    
    This addresses #9222
  • feat: sqlite 1 (#10004)
    Add a `.sqlite` database to store rollout metadata (and later logs).
    This PR is phase 1:
    * Add the database and the required infrastructure
    * Add a backfill of the database
    * Persist the newly created rollout both in files and in the DB
    * When we need to get metadata or a rollout, consider the `JSONL` as the
    source of truth but compare the results with the DB and show any errors
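
    The last bullet can be sketched as a comparison that always returns the
    `JSONL` value and reports any mismatch with the DB. The `RolloutMetadata`
    fields and the function name are hypothetical.

    ```rust
    /// Hypothetical metadata shape; the real fields are not shown here.
    #[derive(Debug, Clone, PartialEq)]
    pub struct RolloutMetadata {
        pub id: String,
        pub title: String,
    }

    /// Returns the value read from the `JSONL` file (the source of truth)
    /// together with a description of any disagreement with the DB.
    pub fn read_rollout_metadata(
        from_jsonl: Option<RolloutMetadata>,
        from_db: Option<RolloutMetadata>,
    ) -> (Option<RolloutMetadata>, Option<String>) {
        let discrepancy = match (&from_jsonl, &from_db) {
            (Some(file), Some(db)) if file != db => {
                Some(format!("DB row for rollout {} differs from JSONL", file.id))
            }
            (Some(file), None) => Some(format!("rollout {} is missing from the DB", file.id)),
            (None, Some(db)) => Some(format!("DB has rollout {} with no JSONL file", db.id)),
            _ => None,
        };
        (from_jsonl, discrepancy)
    }
    ```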
  • Add exec policy TOML representation (#10026)
    We'd like to represent these in `requirements.toml`. This just adds the
    representation and the tests; it doesn't wire it up anywhere yet.
  • Another round of improvements for config error messages (#9746)
    In a [recent PR](https://github.com/openai/codex/pull/9182), I made some
    improvements to config error messages so errors didn't leave app server
    clients in a dead state. This is a follow-on PR to make these error
    messages more readable and actionable for both TUI and GUI users. For
    example, see #9668 where the user was understandably confused about the
    source of the problem and how to fix it.
    
    The improved error message:
    1. Clearly identifies the config file where the error was found (which
    is more important now that we support layered configs)
    2. Provides a line and column number of the error
    3. Displays the line where the error occurred and underlines it
    
    For example, if my `config.toml` includes the following:
    ```toml
    [features]
    collaboration_modes = "true"
    ```
    
    Here's the current CLI error message:
    ```
    Error loading config.toml: invalid type: string "true", expected a boolean in `features`
    ```
    
    And here's the improved message:
    ```
    Error loading config.toml:
    /Users/etraut/.codex/config.toml:43:23: invalid type: string "true", expected a boolean
       |
    43 | collaboration_modes = "true"
       |                       ^^^^^^
    ```
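
    A sketch of how a message in this layout could be rendered once the line,
    column, and length of the offending value are known. The function name and
    parameters are hypothetical, columns are assumed to be 1-based character
    offsets, and this is not the code in `config_loader/diagnostics.rs`.

    ```rust
    /// Renders `file:line:col: message`, followed by the source line and a
    /// row of carets under the offending span.
    pub fn render_diagnostic(
        file: &str,
        source: &str,
        line: usize,
        col: usize,
        len: usize,
        message: &str,
    ) -> String {
        // Lines and columns are 1-based; fall back to an empty line if the
        // requested line is out of range.
        let text = source.lines().nth(line.saturating_sub(1)).unwrap_or("");
        // The gutter is as wide as the line number so the bars line up.
        let gutter = " ".repeat(line.to_string().len());
        format!(
            "{file}:{line}:{col}: {message}\n{gutter} |\n{line} | {text}\n{gutter} | {pad}{carets}",
            pad = " ".repeat(col.saturating_sub(1)),
            carets = "^".repeat(len.max(1)),
        )
    }
    ```

    Finding the text range for a given TOML path is the harder part and is
    not covered by this sketch.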
    
    The bulk of the new logic is contained within a new module
    `config_loader/diagnostics.rs` that is responsible for calculating the
    text range for a given toml path (which is more involved than I would
    have expected).
    
    In addition, this PR adds the file name and text range to the
    `ConfigWarningNotification` app server struct. This allows GUI clients
    to present the user with a better error message and an optional link to
    open the errant config file. This was a suggestion from @.bolinfest when
    he reviewed my previous PR.
  • chore(deps): bump arc-swap from 1.7.1 to 1.8.0 in /codex-rs (#9468)
    Bumps [arc-swap](https://github.com/vorner/arc-swap) from 1.7.1 to
    1.8.0.
    **Changelog**
    
    Sourced from [arc-swap's
    changelog](https://github.com/vorner/arc-swap/blob/master/CHANGELOG.md).
    
    1.8.0:
    
    - Support for Pin
    ([#185](https://redirect.github.com/vorner/arc-swap/issues/185),
    [#183](https://redirect.github.com/vorner/arc-swap/issues/183)).
    - Fix (hopefully) crash on ARM
    ([#164](https://redirect.github.com/vorner/arc-swap/issues/164)).
    - Fix Miri check
    ([#186](https://redirect.github.com/vorner/arc-swap/issues/186),
    [#156](https://redirect.github.com/vorner/arc-swap/issues/156)).
    - Fix support for Rust 1.31.0.
    - Some minor clippy lints.
    
    **Commits**
    
    - [`2540d26`](https://github.com/vorner/arc-swap/commit/2540d266a837143948a0541a05d200fa1087a7db)
    Version bump to 1.8.0
    - [`9981e3a`](https://github.com/vorner/arc-swap/commit/9981e3af2351d82fe6f77761ee1e4a8479ec1fc7)
    Keep "old" Cargo.lock around
    - [`57a8abb`](https://github.com/vorner/arc-swap/commit/57a8abbfc4100d918bcc4511eaa3c61740fe9c10)
    Fix documentation links
    - [`346c5b6`](https://github.com/vorner/arc-swap/commit/346c5b642b00acb30ea8756f8186599a30e1edbc)
    Fix some clippy warnings
    - [`0bd349a`](https://github.com/vorner/arc-swap/commit/0bd349a56bd448e0712a034f8892edfb6d4a41f2)
    Fix support for Rust 1.31.0
    - [`57aa522`](https://github.com/vorner/arc-swap/commit/57aa5224c19124ad2fa26eae70fa7778dd2224ac)
    Merge pull request
    [#185](https://redirect.github.com/vorner/arc-swap/issues/185) from
    SpriteOvO/pin
    - [`4c0c4ab`](https://github.com/vorner/arc-swap/commit/4c0c4ab3218beeb0ae0b73d00e2a6c71b5b612f3)
    Implement `RefCnt` for `Pin<Arc>` and `Pin<Rc>`
    - [`e596275`](https://github.com/vorner/arc-swap/commit/e596275acf37fceceb643a835e8b42563c42d919)
    Avoid warnings about hidden lifetimes
    - [`d849a2d`](https://github.com/vorner/arc-swap/commit/d849a2d17e02b66c58a67f95beff8f072e6a306c)
    Use SeqCst in debt-lists
    - [`1f9b221`](https://github.com/vorner/arc-swap/commit/1f9b221da9907d690ff10a119c7d0155e99d09cb)
    Merge pull request
    [#186](https://redirect.github.com/vorner/arc-swap/issues/186) from
    nbdd0121/prov
    - Additional commits viewable in [compare
    view](https://github.com/vorner/arc-swap/compare/v1.7.1...v1.8.0)
    
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • add generated jsonschema for config.toml (#8956)
    ### What
    Add JSON Schema generation for `config.toml`, with checked‑in
    `docs/config.schema.json`. We can move the schema elsewhere if preferred
    (and host it if there's demand).
    
    Add fixture test to prevent drift and `just write-config-schema` to
    regenerate on schema changes.
    
    Generate MCP config schema from `RawMcpServerConfig` instead of
    `McpServerConfig` because that is the runtime type used for
    deserialization.
    
    Populate feature flag values into generated schema so they can be
    autocompleted.
    
    ### Tests
    Added tests + regenerate script to prevent drift. Tested autocompletions
    using generated jsonschema locally with Even Better TOML.
    
    
    
    https://github.com/user-attachments/assets/5aa7cd39-520c-4a63-96fb-63798183d0bc
  • Use markdown for migration screen (#8952)
    Next steps will be routing this to model info
  • fix: leverage codex_utils_cargo_bin() in codex-rs/core/tests/suite (#8887)
    This eliminates our dependency on the `escargot` crate and better
    prepares us for Bazel builds: https://github.com/openai/codex/pull/8875.
  • feat: metrics capabilities (#8318)
    Add metrics capabilities to Codex. The `README.md` is up to date.
    
    This will not be merged with the metrics before this PR of course:
    https://github.com/openai/codex/pull/8350
  • Add feature for optional request compression (#8767)
    Adds a new feature, `enable_request_compression`, that compresses
    requests to the codex-backend with zstd. Compression currently applies
    only to the codex-backend, so even when the feature is enabled it takes
    effect only for openai providers when using chatgpt::auth.
    
    Also added a new info log line for evaluating the compression ratio and
    the overhead of compressing before sending the request. You can enable it
    with `RUST_LOG=$RUST_LOG,codex_client::transport=info`
    
    ```
    2026-01-06T00:09:48.272113Z  INFO codex_client::transport: Compressed request body with zstd pre_compression_bytes=28914 post_compression_bytes=11485 compression_duration_ms=0
    ```
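
    As a worked example of reading that log line, the ratio can be computed
    from the two byte counts. The helper below is hypothetical and only does
    the arithmetic; it performs no zstd compression.

    ```rust
    /// Compressed size divided by original size, or `None` for an empty body.
    pub fn compression_ratio(pre_compression_bytes: u64, post_compression_bytes: u64) -> Option<f64> {
        if pre_compression_bytes == 0 {
            None
        } else {
            Some(post_compression_bytes as f64 / pre_compression_bytes as f64)
        }
    }
    ```

    For the sample line above, 11485 / 28914 is about 0.40, so the request
    body shrank by roughly 60%.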
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
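
    The resolution step can be sketched as follows. The function names and
    the explicit `base_dir` parameter are assumptions, and this is not the
    API of `codex-utils-cargo-bin`.

    ```rust
    use std::path::{Path, PathBuf};

    /// Leaves absolute paths untouched and resolves relative ones against
    /// `base_dir` (for example, a project root).
    pub fn resolve_bin_path(raw: &str, base_dir: &Path) -> PathBuf {
        let path = Path::new(raw);
        if path.is_absolute() {
            path.to_path_buf()
        } else {
            base_dir.join(path)
        }
    }

    /// Looks up `CARGO_BIN_EXE_<name>` at runtime and resolves its value.
    /// Returns `None` when the variable is not set.
    pub fn cargo_bin(name: &str, base_dir: &Path) -> Option<PathBuf> {
        let value = std::env::var(format!("CARGO_BIN_EXE_{name}")).ok()?;
        Some(resolve_bin_path(&value, base_dir))
    }
    ```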
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • Add ExecPolicyManager (#8349)
    Move exec policy management into services to keep turn context
    immutable.
  • feat: introduce ExternalSandbox policy (#8290)
    ## Description
    
    Introduces the `ExternalSandbox` policy to cover the use case where the
    sandbox is defined by the outside environment. It effectively translates
    to `SandboxMode#DangerFullAccess` for the file system (since the sandbox
    is configured at the container level) plus a configurable
    `network_access` (either Restricted or Enabled by the outside
    environment).
    
    As an example, you can configure the `ExternalSandbox` policy as part of
    the `sendUserTurn` v1 app_server API:
    
    ```
    {
        "conversationId": <id>,
        "cwd": <cwd>,
        "approvalPolicy": "never",
        "sandboxPolicy": {
            "type": "external-sandbox",
            "network_access": "enabled"/"restricted"
        },
        "model": <model>,
        "effort": <effort>,
        ....
    }
    ```
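    
    The semantics described above could be modeled roughly as follows. This
    is a simplified sketch, not the real type: the enum shapes, the
    `NetworkAccess` name, and both helper methods are assumptions made for
    illustration.
    
    ```rust
    /// Hypothetical: whether the outside environment allows network access.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum NetworkAccess {
        Restricted,
        Enabled,
    }
    
    /// Hypothetical, trimmed-down policy enum with only three variants.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum SandboxPolicy {
        DangerFullAccess,
        ReadOnly,
        /// The sandbox is enforced by the container or host, not by Codex.
        ExternalSandbox { network_access: NetworkAccess },
    }
    
    impl SandboxPolicy {
        /// For the file system, ExternalSandbox behaves like DangerFullAccess.
        fn has_full_disk_write_access(&self) -> bool {
            match self {
                SandboxPolicy::DangerFullAccess => true,
                SandboxPolicy::ExternalSandbox { .. } => true,
                SandboxPolicy::ReadOnly => false,
            }
        }
    
        /// Network access follows whatever the outside environment declared.
        fn has_full_network_access(&self) -> bool {
            match self {
                SandboxPolicy::DangerFullAccess => true,
                SandboxPolicy::ExternalSandbox { network_access } => {
                    *network_access == NetworkAccess::Enabled
                }
                SandboxPolicy::ReadOnly => false,
            }
        }
    }
    ```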
  • chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276)
    https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
    this PR updates all non-test call sites to use it instead of
    `Config::load_from_base_config_with_overrides()`.
    
    This is important because `load_from_base_config_with_overrides()` uses
    an empty `ConfigRequirements`, which is a reasonable default for testing
    so the tests are not influenced by the settings on the host. This method
    is now guarded by `#[cfg(test)]` so it cannot be used by business logic.
    
    Because `ConfigBuilder::build()` is `async`, many of the test methods
    had to be migrated to be `async`, as well. On the bright side, this made
    it possible to eliminate a bunch of `block_on_future()` stuff.
  • Support SYSTEM skills. (#8220)
    1. Remove PUBLIC skills and introduce SYSTEM skills embedded in the
    binary and installed into $CODEX_HOME/skills/.system at startup.
    2. Skills are now always enabled (feature flag removed).
    3. Update skills/list to accept forceReload and plumb it through (not
    used by clients yet).
  • Removed experimental "command risk assessment" feature (#7799)
    This experimental feature received lukewarm reception during internal
    testing. Removing from the code base.
  • fix: introduce AbsolutePathBuf and resolve relative paths in config.toml (#7796)
    This PR attempts to solve two problems by introducing an
    `AbsolutePathBuf` type with a special deserializer:
    
    - `AbsolutePathBuf` attempts to be a generally useful abstraction, as it
    ensures, by construction, that it represents a value that is an
    absolute, normalized path, which is a stronger guarantee than an
    arbitrary `PathBuf`.
    - Values in `config.toml` that can be either an absolute or relative
    path should be resolved against the folder containing the `config.toml`
    in the relative path case. This PR makes this easy to support: the main
    cost is ensuring `AbsolutePathBufGuard` is used inside
    `deserialize_config_toml_with_base()`.
    
    While `AbsolutePathBufGuard` may seem slightly distasteful because it
    relies on thread-local storage, this seems much cleaner to me than my
    various experiments with
    https://docs.rs/serde/latest/serde/de/trait.DeserializeSeed.html.
    Further, since the `deserialize()` method from the `Deserialize` trait
    is not async, we do not really have to worry about the deserialization
    work being spread across multiple threads in a way that would interfere
    with `AbsolutePathBufGuard`.
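    
    The thread-local guard idea can be sketched without serde. The names
    `BaseDirGuard` and `AbsPath` below are hypothetical stand-ins, not the
    types from this PR, and the sketch skips path normalization:
    
    ```rust
    use std::cell::RefCell;
    use std::path::{Path, PathBuf};
    
    thread_local! {
        // Base directory used to resolve relative paths on this thread.
        static BASE_DIR: RefCell<Option<PathBuf>> = RefCell::new(None);
    }
    
    /// Hypothetical guard: sets the base directory for its lifetime and
    /// restores the previous value when dropped.
    struct BaseDirGuard {
        previous: Option<PathBuf>,
    }
    
    impl BaseDirGuard {
        fn new(base: &Path) -> Self {
            let previous = BASE_DIR.with(|b| b.replace(Some(base.to_path_buf())));
            BaseDirGuard { previous }
        }
    }
    
    impl Drop for BaseDirGuard {
        fn drop(&mut self) {
            let previous = self.previous.take();
            BASE_DIR.with(|b| {
                *b.borrow_mut() = previous;
            });
        }
    }
    
    /// Hypothetical absolute-path wrapper (no normalization in this sketch).
    #[derive(Debug, PartialEq, Eq)]
    struct AbsPath(PathBuf);
    
    impl AbsPath {
        /// Absolute input is kept; relative input needs an active guard.
        fn resolve(raw: &str) -> Option<Self> {
            let path = Path::new(raw);
            if path.is_absolute() {
                return Some(AbsPath(path.to_path_buf()));
            }
            BASE_DIR.with(|b| b.borrow().as_ref().map(|base| AbsPath(base.join(path))))
        }
    }
    ```
    
    In a real deserializer, the body of `resolve` would run inside the
    `Deserialize` impl, with the guard created by the caller just before
    deserialization starts on the same thread.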
    
    To start, this PR introduces the use of `AbsolutePathBuf` in
    `OtelTlsConfig`. Note how this simplifies `otel_provider.rs` because it
    no longer requires `settings.codex_home` to be threaded through.
    Furthermore, this sets us up better for a world where multiple
    `config.toml` files from different folders could be loaded and then
    merged together, as the absolutifying of the paths must be done against
    the correct parent folder.
  • Wire with_remote_overrides to construct model families (#7621)
    - This PR wires `with_remote_overrides` and makes
    `construct_model_families` an async function
    - Moves getting the model family a level above to keep the function `sync`
    - Updates the tests to use a local, offline, `sync` helper for model
    families
  • feat: experimental support for skills.md (#7412)
    This change prototypes support for Skills with the CLI. This is an
    **experimental** feature for internal testing.
    
    ---------
    
    Co-authored-by: Gav Verma <gverma@openai.com>
  • chore: add cargo-deny configuration (#7119)
    - add GitHub workflow running cargo-deny on push/PR
    - document cargo-deny allowlist with workspace-dep notes and advisory
    ignores
    - align workspace crates to inherit version/edition/license for
    consistent checks
  • Windows: flag some invocations that launch browsers/URLs as dangerous (#7111)
    Prevent certain PowerShell/cmd invocations from reaching the sandbox
    when they are trying to launch a browser, or run a command with a URL,
    etc.
  • Fix: Improve text encoding for shell output in VSCode preview (#6178) (#6182)
    ## 🐛 Problem
    
    Users running commands with non-ASCII characters (like Russian text
    "пример") in Windows/WSL environments experience garbled text in
    VSCode's shell preview window, with Unicode replacement characters (�)
    appearing instead of the actual text.
    
    **Issue**: https://github.com/openai/codex/issues/6178
    
    ## 🔧 Root Cause
    
    The issue was in the `StreamOutput<Vec<u8>>::from_utf8_lossy()` method in
    `codex-rs/core/src/exec.rs`, which used `String::from_utf8_lossy()` to
    convert shell output bytes to strings. This function immediately
    replaces any invalid UTF-8 byte sequences with replacement characters,
    without attempting to decode using other common encodings.
    
    In Windows/WSL environments, shell output often uses encodings like:
    
    - Windows-1252 (common Windows encoding)
    - Latin-1/ISO-8859-1 (extended ASCII)
    
    ## 🛠️ Solution
    
    Replaced the simple `String::from_utf8_lossy()` call with intelligent
    encoding detection via a new `bytes_to_string_smart()` function that
    tries multiple encoding strategies:
    
    1. **UTF-8** (fast path for valid UTF-8)
    2. **Windows-1252** (handles Windows-specific characters in 0x80-0x9F
    range)
    3. **Latin-1** (fallback for extended ASCII)
    4. **Lossy UTF-8** (final fallback, same as before)
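    
    A std-only sketch of this fallback chain follows. It is not the PR's
    `bytes_to_string_smart()`: the function names and the hand-written
    Windows-1252 table are assumptions for illustration. Because Latin-1
    maps every byte to a character, the lossy UTF-8 step is never reached in
    this simplified version and is left out.
    
    ```rust
    /// Maps a Windows-1252 byte in 0x80..=0x9F to its character.
    /// Returns None for the five bytes the code page leaves unassigned.
    fn cp1252_high(byte: u8) -> Option<char> {
        const TABLE: [Option<char>; 32] = [
            Some('\u{20AC}'), None, Some('\u{201A}'), Some('\u{0192}'),
            Some('\u{201E}'), Some('\u{2026}'), Some('\u{2020}'), Some('\u{2021}'),
            Some('\u{02C6}'), Some('\u{2030}'), Some('\u{0160}'), Some('\u{2039}'),
            Some('\u{0152}'), None, Some('\u{017D}'), None,
            None, Some('\u{2018}'), Some('\u{2019}'), Some('\u{201C}'),
            Some('\u{201D}'), Some('\u{2022}'), Some('\u{2013}'), Some('\u{2014}'),
            Some('\u{02DC}'), Some('\u{2122}'), Some('\u{0161}'), Some('\u{203A}'),
            Some('\u{0153}'), None, Some('\u{017E}'), Some('\u{0178}'),
        ];
        byte.checked_sub(0x80)
            .and_then(|i| TABLE.get(i as usize).copied().flatten())
    }
    
    /// Decodes as Windows-1252, failing on any unassigned byte.
    fn decode_cp1252(bytes: &[u8]) -> Option<String> {
        bytes
            .iter()
            .map(|&b| match b {
                0x80..=0x9F => cp1252_high(b),
                // Outside that range Windows-1252 matches Latin-1.
                _ => Some(b as char),
            })
            .collect()
    }
    
    /// Decodes as Latin-1: each byte maps to the same Unicode code point.
    fn decode_latin1(bytes: &[u8]) -> String {
        bytes.iter().map(|&b| b as char).collect()
    }
    
    /// Hypothetical entry point: UTF-8 first, then Windows-1252, then Latin-1.
    fn decode_shell_output(bytes: &[u8]) -> String {
        if let Ok(text) = std::str::from_utf8(bytes) {
            return text.to_owned(); // fast path for valid UTF-8
        }
        if let Some(text) = decode_cp1252(bytes) {
            return text;
        }
        decode_latin1(bytes)
    }
    ```
    
    Valid UTF-8 input such as "пример" takes the first branch and is
    returned unchanged, which is why the fast path costs only one validation
    pass.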
    
    ## 📁 Changes
    
    ### New Files
    
    - `codex-rs/core/src/text_encoding.rs` - Smart encoding detection module
    - `codex-rs/core/tests/suite/text_encoding_fix.rs` - Integration tests
    
    ### Modified Files
    
    - `codex-rs/core/src/lib.rs` - Added text_encoding module
    - `codex-rs/core/src/exec.rs` - Updated StreamOutput::from_utf8_lossy()
    - `codex-rs/core/tests/suite/mod.rs` - Registered new test module
    
    ## Testing
    
    - **5 unit tests** covering UTF-8, Windows-1252, Latin-1, and fallback
    scenarios
    - **2 integration tests** simulating the exact Issue #6178 scenario
    - **Demonstrates improvement** over the previous
    `String::from_utf8_lossy()` approach
    
    All tests pass:
    
    ```bash
    cargo test -p codex-core text_encoding
    cargo test -p codex-core test_shell_output_encoding_issue_6178
    ```
    
    ## 🎯 Impact
    
    - **Eliminates garbled text** in VSCode shell preview for non-ASCII
    content
    - **Supports Windows/WSL environments** with proper encoding detection
    - **Zero performance impact** for UTF-8 text (fast path)
    - **Backward compatible** - UTF-8 content works exactly as before
    - **Handles edge cases** with robust fallback mechanism
    
    ## 🧪 Test Scenarios
    
    The fix has been tested with:
    
    - Russian text ("пример")
    - Windows-1252 quotation marks ("“test”")
    - Latin-1 accented characters ("café")
    - Mixed encoding content
    - Invalid byte sequences (graceful fallback)
    
    ## 📋 Checklist
    
    - [X] Addresses the reported issue
    - [X] Includes comprehensive tests
    - [X] Maintains backward compatibility
    - [X] Follows project coding conventions
    - [X] No breaking changes
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • execpolicy2 core integration (#6641)
    This PR threads execpolicy2 into codex-core.
    
    Activated via the `exec_policy` feature flag (on by default).
    
    Reads and parses all `.codexpolicy` files in `codex_home/codex`.
    
    Refactors the tool runtime API to integrate execpolicy logic.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Fix FreeBSD/OpenBSD builds: target-specific keyring features and BSD hardening (#6680)
    ## Summary
    Builds on FreeBSD and OpenBSD were failing due to globally enabled
    Linux-specific keyring features and hardening code paths not gated by
    OS. This PR scopes keyring native backends to the appropriate targets,
    disables default features at the workspace root, and adds a BSD-specific
    hardening function. Linux/macOS/Windows behavior remains unchanged, while
    FreeBSD/OpenBSD now build and run with a supported backend.
    
    ## Key Changes
    
    - Keyring features:
      - Disable keyring default features at the workspace root to avoid
        pulling Linux backends on non-Linux.
      - Move native backend features into target-specific sections in the
        affected crates:
        - Linux: linux-native-async-persistent
        - macOS: apple-native
        - Windows: windows-native
        - FreeBSD/OpenBSD: sync-secret-service
    - Process hardening:
      - Add pre_main_hardening_bsd() for FreeBSD/OpenBSD, applying:
        - Set RLIMIT_CORE to 0
        - Clear LD_* environment variables
      - Simplify process-hardening Cargo deps to unconditional libc (avoid
        conflicting OS fragments).
    - No changes to CODEX_SANDBOX_* behavior.
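    
    As a rough illustration of the "clear LD_* environment variables" step,
    the selection logic could look like the sketch below. The function name
    `ld_vars_to_clear` is hypothetical. The actual removal from the
    environment and the `RLIMIT_CORE` call are omitted because they need
    process-level side effects and the `libc` crate.
    
    ```rust
    /// Hypothetical helper: given environment variable names, return those a
    /// hardening step would clear (names beginning with the `LD_` prefix).
    fn ld_vars_to_clear<'a, I>(names: I) -> Vec<&'a str>
    where
        I: IntoIterator<Item = &'a str>,
    {
        names
            .into_iter()
            // Prefix match only: `OLD_VAR` contains "LD_" but is not selected.
            .filter(|name| name.starts_with("LD_"))
            .collect()
    }
    ```
    
    A caller on FreeBSD/OpenBSD would presumably feed this the names from
    the process environment and remove each returned key before `main`
    runs.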
    
    ## Rationale
    
    - Previously, enabling keyring native backends globally pulled
    Linux-only features on BSD, causing build errors.
    - Hardening logic was tailored for Linux/macOS; BSD builds lacked a
    gated path with equivalent safeguards.
    - Target-scoped features and BSD hardening make the crates portable
    across these OSes without affecting existing behavior elsewhere.
    
    ## Impact by Platform
    
    - Linux: No functional change; backends now selected via target cfg.
    - macOS: No functional change; explicit apple-native mapping.
    - Windows: No functional change; explicit windows-native mapping.
    - FreeBSD/OpenBSD: Builds succeed using sync-secret-service; BSD
    hardening applied during startup.
    
    ## Testing
    
    - Verified compilation across affected crates with target-specific
    features.
    - Smoke-checked that Linux/macOS/Windows feature sets remain identical
    functionally after scoping.
    - On BSD, confirmed keyring resolves to sync-secret-service and
    hardening compiles.
    
    ## Risks / Compatibility
    
    - Minimal risk: only feature scoping and OS-gated additions.
    - No public API changes in the crates; runtime behavior on non-BSD
    platforms is preserved.
    - On BSD, the new hardening clears LD_*; this is consistent with
    security posture on other Unix platforms.
    
    ## Reviewer Notes
    
    - Pay attention to target-specific sections for keyring in the affected
    Cargo.toml files.
    - Confirm pre_main_hardening_bsd() mirrors the safe subset of
    Linux/macOS hardening without introducing Linux-only calls.
    - Confirm no references to CODEX_SANDBOX_ENV_VAR or
    CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR were added/modified.
    
    ## Checklist
    
    - Disable keyring default features at workspace root.
    - Target-specific keyring features mapped per OS
    (Linux/macOS/Windows/BSD).
    - Add BSD hardening (RLIMIT_CORE=0, clear LD_*).
    - Simplify process-hardening dependencies to unconditional libc.
    - No changes to sandbox env var code.
    - Formatting and linting: just fmt + just fix -p for changed crates.
    - Project tests pass for changed crates; broader suite unchanged.
    
    ---------
    
    Co-authored-by: celia-oai <celia@openai.com>
  • chore(core) Consolidate apply_patch tests (#6545)
    ## Summary
    Consolidates our apply_patch tests into one suite, and ensures each test
    case tests the various ways the harness supports apply_patch:
    1. Freeform custom tool call
    2. JSON function tool
    3. Simple shell call
    4. Heredoc shell call
    
    There are a few test cases that are specific to a particular variant,
    I've left those alone.
    
    ## Testing
    - [x] This adds a significant number of tests
  • Use codex-linux-sandbox in unified exec (#6480)
    Unified exec isn't working on Linux because we don't provide the correct
    arg0.
    
    The library we use for pty management doesn't allow setting arg0
    separately from the executable. Use the same aliasing strategy we use for
    `apply_patch` for `codex-linux-sandbox`.
    
    Use a `#[ctor]` hack to dispatch codex-linux-sandbox calls.
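    
    The aliasing strategy amounts to choosing behavior from the file name
    the process was launched under. A minimal sketch of that check, with the
    hypothetical names `Arg0Mode` and `mode_for_arg0` (the `#[ctor]` wiring
    itself is not shown):
    
    ```rust
    use std::path::Path;
    
    /// Hypothetical set of behaviors a single binary can take on.
    #[derive(Debug, PartialEq, Eq)]
    enum Arg0Mode {
        LinuxSandbox,
        ApplyPatch,
        Main,
    }
    
    /// Picks a mode from the final path component of the launched executable.
    fn mode_for_arg0(arg0: &str) -> Arg0Mode {
        let exe_name = Path::new(arg0)
            .file_name()
            .and_then(|name| name.to_str())
            .unwrap_or("");
        match exe_name {
            "codex-linux-sandbox" => Arg0Mode::LinuxSandbox,
            "apply_patch" => Arg0Mode::ApplyPatch,
            _ => Arg0Mode::Main,
        }
    }
    ```
    
    Since the pty library cannot set arg0 independently, a symlink or copy
    named `codex-linux-sandbox` pointing at the main binary would be enough
    to make this check select the sandbox path.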
    
    
    Addresses https://github.com/openai/codex/issues/6450