Commit Graph

32 Commits

  • Support CODEX_API_KEY for codex exec (#4615)
    Allows setting the API key per invocation of `codex exec`.
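    Here is a minimal sketch of scoping the key to one invocation from a Rust caller. It assumes a `codex` binary on `PATH`. `std::process::Command::env` sets `CODEX_API_KEY` only in the spawned child's environment, so the key does not leak into the parent process or into other invocations. The helper name `codex_exec_command` is hypothetical. The sketch only builds and inspects the command and never runs it.

    ```rust
    use std::ffi::OsStr;
    use std::process::Command;

    /// Hypothetical helper: build a `codex exec` invocation whose API key is
    /// scoped to this one child process via the `CODEX_API_KEY` variable.
    fn codex_exec_command(prompt: &str, api_key: &str) -> Command {
        let mut cmd = Command::new("codex");
        cmd.arg("exec").arg(prompt);
        // `env` only affects the child's environment, not the current process.
        cmd.env("CODEX_API_KEY", api_key);
        cmd
    }

    fn main() {
        let cmd = codex_exec_command("list my assigned github issues", "sk-example");
        // Inspect the command without spawning it; call `cmd.status()` to run it.
        let args: Vec<&OsStr> = cmd.get_args().collect();
        println!("program: {:?}, args: {:?}", cmd.get_program(), args);
        for (key, value) in cmd.get_envs() {
            // Print only whether the override is set, never the secret itself.
            println!("env override: {key:?} set: {}", value.is_some());
        }
    }
    ```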
  • fix: remove mcp-types from app server protocol (#4537)
    We continue the separation between `codex app-server` and `codex
    mcp-server`.
    
    In particular, we introduce a new crate, `codex-app-server-protocol`,
    and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
    `codex-rs/app-server-protocol/src/protocol.rs`.
    
    Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
    into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
    because it is referenced in a ton of places, we have to touch a lot of
    files as part of this PR.
    
    We also move away from strict JSON-RPC 2.0 semantics, so we introduce
    `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is basically
    the same `JSONRPCMessage` type defined in `mcp-types` except with all
    of the `"jsonrpc": "2.0"` removed.
    
    Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
    considerably simpler, as we can lean more heavily on serde to serialize
    directly into the wire format that we use now.
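    The following deliberately simplified sketch makes the wire-format difference concrete. `LiteMessage` is a hypothetical stand-in, not the real `JSONRPCMessage` from `jsonrpc_lite.rs`. It formats JSON by hand with `format!` only to stay dependency-free, whereas the PR itself relies on serde. Each message carries only `id`, `method`, or `result`, with no `"jsonrpc": "2.0"` member.

    ```rust
    /// Hypothetical, simplified message shape: the usual JSON-RPC
    /// request/notification/response split, minus the `"jsonrpc": "2.0"` member.
    #[derive(Debug)]
    enum LiteMessage {
        Request { id: i64, method: String },
        Notification { method: String },
        /// `result` holds already-serialized JSON.
        Response { id: i64, result: String },
    }

    impl LiteMessage {
        /// Hand-rolled serializer for illustration only; assumes `method`
        /// needs no JSON escaping.
        fn to_wire(&self) -> String {
            match self {
                LiteMessage::Request { id, method } => {
                    format!(r#"{{"id":{id},"method":"{method}"}}"#)
                }
                LiteMessage::Notification { method } => format!(r#"{{"method":"{method}"}}"#),
                LiteMessage::Response { id, result } => {
                    format!(r#"{{"id":{id},"result":{result}}}"#)
                }
            }
        }
    }

    fn main() {
        // "ping" is an illustrative method name, not part of the protocol.
        let messages = [
            LiteMessage::Request { id: 1, method: "ping".to_string() },
            LiteMessage::Response { id: 1, result: "{}".to_string() },
        ];
        for message in &messages {
            println!("{}", message.to_wire());
        }
    }
    ```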
  • Remove legacy codex exec --json format (#4525)
    `codex exec --json` now maps to the behavior of `codex exec
    --experimental-json` with new event and item shapes.
    
    Thread events:
    - thread.started
    - turn.started
    - turn.completed
    - turn.failed
    - item.started
    - item.updated
    - item.completed
    
    Item types: 
    - assistant_message
    - reasoning
    - command_execution
    - file_change
    - mcp_tool_call
    - web_search
    - todo_list
    - error
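    A consumer can model the event `type` values listed above as a closed enum and treat anything it does not recognize as skippable. That keeps the consumer working if new event types appear later. This is a hypothetical consumer-side sketch: `ThreadEventKind` and `ends_turn` are not names from the Codex crates. It assumes a turn ends with either `turn.completed` or `turn.failed`.

    ```rust
    /// Hypothetical consumer-side view of the `type` field of each JSONL event.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ThreadEventKind {
        ThreadStarted,
        TurnStarted,
        TurnCompleted,
        TurnFailed,
        ItemStarted,
        ItemUpdated,
        ItemCompleted,
    }

    impl ThreadEventKind {
        /// Map a wire `type` string to a kind; unknown types return `None`
        /// so callers can skip them instead of failing.
        fn parse(s: &str) -> Option<Self> {
            let kind = match s {
                "thread.started" => Self::ThreadStarted,
                "turn.started" => Self::TurnStarted,
                "turn.completed" => Self::TurnCompleted,
                "turn.failed" => Self::TurnFailed,
                "item.started" => Self::ItemStarted,
                "item.updated" => Self::ItemUpdated,
                "item.completed" => Self::ItemCompleted,
                _ => return None,
            };
            Some(kind)
        }

        /// Assumption: a turn ends with either `turn.completed` or `turn.failed`.
        fn ends_turn(self) -> bool {
            matches!(self, Self::TurnCompleted | Self::TurnFailed)
        }
    }

    fn main() {
        for t in ["thread.started", "item.completed", "turn.completed", "some.future_event"] {
            match ThreadEventKind::parse(t) {
                Some(kind) => println!("{t}: {kind:?} (ends turn: {})", kind.ends_turn()),
                None => println!("{t}: unknown, skipped"),
            }
        }
    }
    ```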
    
    Sample output:
    
    <details>
    `codex exec "list my assigned github issues"  --json | jq`
    
    ```
    {
      "type": "thread.started",
      "thread_id": "01999ce5-f229-7661-8570-53312bd47ea3"
    }
    {
      "type": "turn.started"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_0",
        "item_type": "reasoning",
        "text": "**Planning to list assigned GitHub issues**"
      }
    }
    {
      "type": "item.started",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "in_progress"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_2",
        "item_type": "reasoning",
        "text": "**Organizing final message structure**"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_3",
        "item_type": "assistant_message",
        "text": "**Assigned Issues**\n- openai/codex#3267 – “stream error: stream disconnected before completion…” (bug) – last update 2025-09-08\n- openai/codex#3257 – “You've hit your usage limit. Try again in 4 days 20 hours 9 minutes.” – last update 2025-09-23\n- openai/codex#3054 – “reqwest SSL panic (library has no ciphers)” (bug) – last update 2025-09-03\n- openai/codex#3051 – “thread 'main' panicked at linux-sandbox/src/linux_run_main.rs:53:5:” (bug) – last update 2025-09-10\n- openai/codex#3004 – “Auto-compact when approaching context limit” (enhancement) – last update 2025-09-26\n- openai/codex#2916 – “Feature request: Add OpenAI service tier support for cost optimization” – last update 2025-09-12\n- openai/codex#1581 – “stream error: stream disconnected before completion: stream closed before response.complete; retrying...” (bug) – last update 2025-09-17"
      }
    }
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 34785,
        "cached_input_tokens": 12544,
        "output_tokens": 560
      }
    }
    ```
    
    </details>
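    The `usage` object on `turn.completed` lends itself to a quick breakdown of input tokens. This minimal sketch assumes that `cached_input_tokens` is a subset of `input_tokens`, as with cached tokens in the OpenAI Responses API. `Usage` and its methods are hypothetical names. For the sample above, 34785 - 12544 = 22241 input tokens were uncached, a cache-hit ratio of about 36.1%.

    ```rust
    /// Hypothetical mirror of the `usage` object on `turn.completed`.
    #[derive(Debug, Clone, Copy)]
    struct Usage {
        input_tokens: u64,
        cached_input_tokens: u64,
        output_tokens: u64,
    }

    impl Usage {
        /// Input tokens not served from the prompt cache.
        /// Assumes cached tokens are counted inside `input_tokens`.
        fn uncached_input_tokens(&self) -> u64 {
            self.input_tokens.saturating_sub(self.cached_input_tokens)
        }

        /// Fraction of input tokens served from the cache (0.0 when no input).
        fn cache_hit_ratio(&self) -> f64 {
            if self.input_tokens == 0 {
                0.0
            } else {
                self.cached_input_tokens as f64 / self.input_tokens as f64
            }
        }
    }

    fn main() {
        let usage = Usage { input_tokens: 34785, cached_input_tokens: 12544, output_tokens: 560 };
        println!("uncached input: {}", usage.uncached_input_tokens());
        println!("cache hit ratio: {:.1}%", usage.cache_hit_ratio() * 100.0);
        println!("output: {}", usage.output_tokens);
    }
    ```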
  • Wire up web search item (#4511)
    Add handling for web search events.
  • Add MCP tool call item to codex exec (#4481)
    No arguments/results for now.
    ```
    {
      "type": "item.started",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "in_progress"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "completed"
      }
    }
    ```
  • OpenTelemetry events (#2103)
    ## otel
    
    Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
    that describe each run: outbound API requests, streamed responses, user
    input, tool-approval decisions, and the result of every tool invocation.
    Export is **disabled by default** so local runs remain self-contained.
    Opt in by adding an `[otel]` table and choosing an exporter.
    
    ```toml
    [otel]
    environment = "staging"   # defaults to "dev"
    exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
    log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
    ```
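    The defaults in the comments of the `[otel]` example can be captured in a small typed config. This is a hypothetical mirror of the `[otel]` table, not the actual `codex_otel` types. Field names follow the TOML keys, and the exporter variants follow the `none`/`otlp-http`/`otlp-grpc` options this PR describes.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical mirror of `otel.exporter`.
    #[derive(Debug, PartialEq)]
    enum Exporter {
        None,
        OtlpHttp { endpoint: String, protocol: String, headers: HashMap<String, String> },
        OtlpGrpc { endpoint: String, headers: HashMap<String, String> },
    }

    /// Hypothetical mirror of the `[otel]` table.
    #[derive(Debug)]
    struct OtelConfig {
        environment: String,
        exporter: Exporter,
        log_user_prompt: bool,
    }

    impl Default for OtelConfig {
        // Documented defaults: "dev", no exporter, prompts redacted.
        fn default() -> Self {
            Self { environment: "dev".to_string(), exporter: Exporter::None, log_user_prompt: false }
        }
    }

    impl OtelConfig {
        /// Instrumentation stays active either way; only `none` skips exporting.
        fn exports(&self) -> bool {
            self.exporter != Exporter::None
        }
    }

    fn main() {
        let local = OtelConfig::default();
        let staging = OtelConfig {
            environment: "staging".to_string(),
            exporter: Exporter::OtlpGrpc {
                endpoint: "https://otel.example.com:4317".to_string(),
                headers: HashMap::from([("x-otlp-meta".to_string(), "abc123".to_string())]),
            },
            ..OtelConfig::default()
        };
        println!("{} exports: {}", local.environment, local.exports());
        println!("{} exports: {}", staging.environment, staging.exports());
        println!("log prompts: {}", staging.log_user_prompt);
    }
    ```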
    
    Codex tags every exported event with `service.name = "codex-cli"`, the
    CLI version, and an `env` attribute so downstream collectors can
    distinguish dev/staging/prod traffic. Only telemetry produced inside the
    `codex_otel` crate (the events listed below) is forwarded to the
    exporter.
    
    ### Event catalog
    
    Every event shares a common set of metadata fields: `event.timestamp`,
    `conversation.id`, `app.version`, `auth_mode` (when available),
    `user.account_id` (when available), `terminal.type`, `model`, and
    `slug`.
    
    With OTEL enabled, Codex emits the following event types (in addition to
    the metadata above):
    
    - `codex.api_request`
      - `cf_ray` (optional)
      - `attempt`
      - `duration_ms`
      - `http.response.status_code` (optional)
      - `error.message` (failures)
    - `codex.sse_event`
      - `event.kind`
      - `duration_ms`
      - `error.message` (failures)
      - `input_token_count` (completion only)
      - `output_token_count` (completion only)
      - `cached_token_count` (completion only, optional)
      - `reasoning_token_count` (completion only, optional)
      - `tool_token_count` (completion only)
    - `codex.user_prompt`
      - `prompt_length`
      - `prompt` (redacted unless `log_user_prompt = true`)
    - `codex.tool_decision`
      - `tool_name`
      - `call_id`
      - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
      - `source` (`config` or `user`)
    - `codex.tool_result`
      - `tool_name`
      - `call_id`
      - `arguments`
      - `duration_ms` (execution time for the tool)
      - `success` (`"true"` or `"false"`)
      - `output`
    
    ### Choosing an exporter
    
    Set `otel.exporter` to control where events go:
    
    - `none` – leaves instrumentation active but skips exporting. This is the
      default.
    - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify
      the endpoint, protocol, and headers your collector expects:
    
      ```toml
      [otel.exporter.otlp-http]
      endpoint = "https://otel.example.com/v1/logs"
      protocol = "binary"
      headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
      ```
    
    - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
      and any metadata headers:
    
      ```toml
      [otel.exporter.otlp-grpc]
      endpoint = "https://otel.example.com:4317"
      headers = { "x-otlp-meta" = "abc123" }
      ```
    
    If the exporter is `none`, nothing is written anywhere; otherwise you
    must run or point to your own collector. All exporters run on a
    background batch worker that is flushed on shutdown.
    
    If you build Codex from source, the OTEL crate is still behind an `otel`
    feature flag; the official prebuilt binaries ship with the feature
    enabled. When the feature is disabled, the telemetry hooks become no-ops,
    so the CLI continues to function without the extra dependencies.
    
    ---------
    
    Co-authored-by: Anton Panasenko <apanasenko@openai.com>
  • Add turn started/completed events and correct exit code on error (#4309)
    Adds a new `turn.completed` event that includes usage. Also ensures we
    exit with code 1 on failures.
    ```
    {
      "type": "session.created",
      "session_id": "019987a7-93e7-7b20-9e05-e90060e411ea"
    }
    {
      "type": "turn.started"
    }
    ...
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 78913,
        "cached_input_tokens": 65280,
        "output_tokens": 1099
      }
    }
    ```
  • Add todo-list tool support (#4255)
    Adds a todo-list item (one per turn) and an `item.updated` event.
    
    ```jsonl
    {"type":"item.started","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan  now","completed":false},{"text":"Update progress to next step","completed":false}]}}
    {"type":"item.updated","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan  now","completed":true},{"text":"Update progress to next step","completed":false}]}}
    {"type":"item.completed","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan  now","completed":true},{"text":"Update progress to next step","completed":false}]}}
    ```
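    Every `item.*` event carries the full item, so a consumer does not need to diff updates. The latest snapshot for an item id simply replaces the previous one. The following minimal sketch applies that pattern to `todo_list` items, with JSON parsing omitted. `TodoEntry`, `TodoLists`, `progress`, and `pending` are hypothetical names.

    ```rust
    use std::collections::HashMap;

    /// One entry of a `todo_list` item.
    #[derive(Debug, Clone, PartialEq)]
    struct TodoEntry {
        text: String,
        completed: bool,
    }

    /// Hypothetical consumer state: latest `todo_list` snapshot per item id.
    #[derive(Debug, Default)]
    struct TodoLists {
        by_id: HashMap<String, Vec<TodoEntry>>,
    }

    impl TodoLists {
        /// Apply an `item.started`, `item.updated`, or `item.completed` payload:
        /// the new snapshot replaces whatever was stored for that id.
        fn apply(&mut self, item_id: &str, entries: Vec<TodoEntry>) {
            self.by_id.insert(item_id.to_string(), entries);
        }

        /// (completed, total) for an item, if it has been seen.
        fn progress(&self, item_id: &str) -> Option<(usize, usize)> {
            let entries = self.by_id.get(item_id)?;
            let done = entries.iter().filter(|e| e.completed).count();
            Some((done, entries.len()))
        }

        /// Texts of entries not yet completed.
        fn pending(&self, item_id: &str) -> Vec<&str> {
            self.by_id
                .get(item_id)
                .map(|entries| entries.iter().filter(|e| !e.completed).map(|e| e.text.as_str()).collect())
                .unwrap_or_default()
        }
    }

    fn entry(text: &str, completed: bool) -> TodoEntry {
        TodoEntry { text: text.to_string(), completed }
    }

    fn main() {
        let mut lists = TodoLists::default();
        // item.started, then item.updated for the same id.
        lists.apply("item_6", vec![entry("Record plan", false), entry("Update progress", false)]);
        lists.apply("item_6", vec![entry("Record plan", true), entry("Update progress", false)]);
        println!("{:?} pending: {:?}", lists.progress("item_6"), lists.pending("item_6"));
    }
    ```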
  • Add codex exec testing helpers (#4254)
    Adds a shortcut to create working directories and run `codex exec`
    against a fake server.
  • [codex exec] Add item.started and support it for command execution (#4250)
    Adds a new `item.started` event to `codex exec` and implements it for
    the `command_execution` item type.
    
    ```jsonl
    {"type":"session.created","session_id":"019982d1-75f0-7920-b051-e0d3731a5ed8"}
    {"type":"item.completed","item":{"id":"item_0","item_type":"reasoning","text":"**Executing commands securely**\n\nI'm thinking about how the default harness typically uses \"bash -lc,\" while historically \"bash\" is what we've been using. The command should be executed as a string in our CLI, so using \"bash -lc 'echo hello'\" is optimal but calling \"echo hello\" directly feels safer. The sandbox makes sure environment variables like CODEX_SANDBOX_NETWORK_DISABLED=1 are set, so I won't ask for approval. I just need to run \"echo hello\" and correctly present the output."}}
    {"type":"item.completed","item":{"id":"item_1","item_type":"reasoning","text":"**Preparing for tool calls**\n\nI realize that I need to include a preamble before making any tool calls. So, I'll first state the preamble in the commentary channel, then proceed with the tool call. After that, I need to present the final message along with the output. It's possible that the CLI will show the output inline, but I must ensure that I present the result clearly regardless. Let's move forward and get this organized!"}}
    {"type":"item.completed","item":{"id":"item_2","item_type":"assistant_message","text":"Running `echo` to confirm shell access and print output."}}
    {"type":"item.started","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"","exit_code":null,"status":"in_progress"}}
    {"type":"item.completed","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"hello\n","exit_code":0,"status":"completed"}}
    {"type":"item.completed","item":{"id":"item_4","item_type":"assistant_message","text":"hello"}}
    ```
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • Add explicit codex exec events (#4177)
    This pull request adds a new experimental JSON output format.
    
    You can try it using `codex exec --experimental-json`.
    
    The design takes a lot of inspiration from Responses API items and
    their stream format.
    
    # Session and items
    Each invocation of `codex exec` starts or resumes a session.
    
    A session contains multiple high-level item types:
    1. Assistant message
    2. Assistant thinking
    3. Command execution
    4. File changes
    5. To-do lists
    6. etc.
    
    # Events
    Sessions and items go through life cycles, which are represented by
    events.
    
    Session events are `session.created` and `session.resumed`.
    Item events are `item.added`, `item.updated`, `item.completed`, and
    `item.require_approval` (or other event types like `item.output_delta`
    when we need streaming).
    
    So a typical session can look like:
    
    <details>
    
    ```
    {
      "type": "session.created",
      "session_id": "01997dac-9581-7de3-b6a0-1df8256f2752"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_0",
        "item_type": "assistant_message",
        "text": "I’ll locate the top-level README and remove its first line. Then I’ll show a quick summary of what changed."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_1",
        "item_type": "command_execution",
        "command": "bash -lc ls -la | sed -n '1,200p'",
        "aggregated_output": "pyenv: cannot rehash: /Users/pakrym/.pyenv/shims isn't writable\ntotal 192\ndrwxr-xr-x@  33 pakrym  staff   1056 Sep 24 14:36 .\ndrwxr-xr-x   41 pakrym  staff   1312 Sep 24 09:17 ..\n-rw-r--r--@   1 pakrym  staff      6 Jul  9 16:16 .codespellignore\n-rw-r--r--@   1 pakrym  staff    258 Aug 13 09:40 .codespellrc\ndrwxr-xr-x@   5 pakrym  staff    160 Jul 23 08:26 .devcontainer\n-rw-r--r--@   1 pakrym  staff   6148 Jul 22 10:03 .DS_Store\ndrwxr-xr-x@  15 pakrym  staff    480 Sep 24 14:38 .git\ndrwxr-xr-x@  12 pakrym  staff    384 Sep  2 16:00 .github\n-rw-r--r--@   1 pakrym  staff    778 Jul  9 16:16 .gitignore\ndrwxr-xr-x@   3 pakrym  staff     96 Aug 11 09:37 .husky\n-rw-r--r--@   1 pakrym  staff    104 Jul  9 16:16 .npmrc\n-rw-r--r--@   1 pakrym  staff     96 Sep  2 08:52 .prettierignore\n-rw-r--r--@   1 pakrym  staff    170 Jul  9 16:16 .prettierrc.toml\ndrwxr-xr-x@   5 pakrym  staff    160 Sep 14 17:43 .vscode\ndrwxr-xr-x@   2 pakrym  staff     64 Sep 11 11:37 2025-09-11\n-rw-r--r--@   1 pakrym  staff   5505 Sep 18 09:28 AGENTS.md\n-rw-r--r--@   1 pakrym  staff     92 Sep  2 08:52 CHANGELOG.md\n-rw-r--r--@   1 pakrym  staff   1145 Jul  9 16:16 cliff.toml\ndrwxr-xr-x@  11 pakrym  staff    352 Sep 24 13:03 codex-cli\ndrwxr-xr-x@  38 pakrym  staff   1216 Sep 24 14:38 codex-rs\ndrwxr-xr-x@  18 pakrym  staff    576 Sep 23 11:01 docs\n-rw-r--r--@   1 pakrym  staff   2038 Jul  9 16:16 flake.lock\n-rw-r--r--@   1 pakrym  staff   1434 Jul  9 16:16 flake.nix\n-rw-r--r--@   1 pakrym  staff  10926 Jul  9 16:16 LICENSE\ndrwxr-xr-x@ 465 pakrym  staff  14880 Jul 15 07:36 node_modules\n-rw-r--r--@   1 pakrym  staff    242 Aug  5 08:25 NOTICE\n-rw-r--r--@   1 pakrym  staff    578 Aug 14 12:31 package.json\n-rw-r--r--@   1 pakrym  staff    498 Aug 11 09:37 pnpm-lock.yaml\n-rw-r--r--@   1 pakrym  staff     58 Aug 11 09:37 pnpm-workspace.yaml\n-rw-r--r--@   1 pakrym  staff   2402 Jul  9 16:16 PNPM.md\n-rw-r--r--@   1 pakrym  staff   4393 Sep 12 14:36 README.md\ndrwxr-xr-x@   4 pakrym  staff    128 Sep 18 09:28 scripts\ndrwxr-xr-x@   2 pakrym  staff     64 Sep 11 11:34 tmp\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_2",
        "item_type": "reasoning",
        "text": "**Reviewing README.md file**\n\nI've located the README.md file at the root, and it’s 4393 bytes. Now, I need to remove the first line, but first, I should check its content to make sure I’m patching it correctly. I’ll use sed to display the first 20 lines. By reviewing those lines, I can determine exactly what needs to be removed before I proceed with the editing. Let's do this carefully!"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_3",
        "item_type": "command_execution",
        "command": "bash -lc sed -n '1,40p' README.md",
        "aggregated_output": "<h1 align=\"center\">OpenAI Codex CLI</h1>\n\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n  <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n  </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n```shell\nnpm install -g @openai/codex\n```\n\nAlternatively, if you use Homebrew:\n\n```shell\nbrew install codex\n```\n\nThen simply run `codex` to get started:\n\n```shell\ncodex\n```\n\n<details>\n<summary>You can also go to the <a href=\"https://github.com/openai/codex/releases/latest\">latest GitHub Release</a> and download the appropriate binary for your platform.</summary>\n\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_4",
        "item_type": "reasoning",
        "text": "**Updating README file**\n\nThe user wants to update the README by removing the first line, which is an HTML h1 tag. I'll proceed to delete that line while keeping the rest intact. To do this, I need to use the apply_patch tool and create a patch in the diff format. I’ll take care to include context lines around the change. After removing the header, I’m considering whether to delete the following blank line as well to keep it tidy."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_5",
        "item_type": "reasoning",
        "text": "**Crafting the patch**\n\nThe user has asked to remove just the first line, without mentioning the blank lines. Technically, leaving a blank line at the start is acceptable, but it's not ideal. Ideally, I'd prefer to remove the first line while maintaining the subsequent blank line. However, if I can also remove that blank line, it could tidy things up even more. Still, since the user's request is strictly to remove only the first line, I'll stick to that and craft the patch accordingly."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_6",
        "item_type": "assistant_message",
        "text": "Applying the change by removing the first line from `README.md`."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_7",
        "item_type": "file_change",
        "changes": [
          {
            "path": "/Users/pakrym/code/codex/README.md",
            "kind": "update"
          }
        ],
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_8",
        "item_type": "reasoning",
        "text": "**Validating README Update**\n\nI updated the README.md and now I need to validate that the first line has been successfully removed. To do this, I'll check the first 10 lines of the document to ensure everything looks correct. It’s always good to double-check these kinds of changes, so I’m going to take that extra step to make sure the update was executed properly. Let's see what those initial lines say!"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_9",
        "item_type": "command_execution",
        "command": "bash -lc sed -n '1,20p' README.md",
        "aggregated_output": "\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n  <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n  </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_10",
        "item_type": "assistant_message",
        "text": "- Updated `README.md:1` by removing the opening H1 line.\n- The README now starts with the install instructions paragraph.\n\nWant me to also remove the leading blank line at the top?"
      }
    }
    ```
    
    </details>
    
    The idea is to give users fully formatted items they can use directly in
    their rendering/application logic, instead of having them build up
    items manually from events (unless they want to, for streaming).
    
    This PR implements only the `item.completed` payload for some item
    types; more event types and item types will follow.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Add exec output-schema parameter (#4079)
    Adds structured output to `exec` via the `--output-schema`
    parameter.
  • chore: clippy on redundant closure (#4058)
    Adds redundant-closure clippy rules and lets Codex fix the resulting
    warnings, minimizing fully qualified paths (FQP).
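    The following small before/after sketch shows what `clippy::redundant_closure` flags and the path-style fix. Reading "FQP" as fully qualified paths is an interpretation. For method calls, clippy's suggestion can be a long path. The short `ToString::to_string` (reachable through the prelude) reads better than `std::string::ToString::to_string`. The function names are illustrative.

    ```rust
    fn double(x: i32) -> i32 {
        x * 2
    }

    fn before(v: &[i32]) -> Vec<i32> {
        // `clippy::redundant_closure` flags this closure: it only forwards `x`.
        v.iter().copied().map(|x| double(x)).collect()
    }

    fn after(v: &[i32]) -> Vec<i32> {
        // Suggested fix: pass the function path directly.
        v.iter().copied().map(double).collect()
    }

    fn names_after(v: &[&str]) -> Vec<String> {
        // Method-call variant: a short path instead of `|s| s.to_string()`
        // or the fully qualified `std::string::ToString::to_string`.
        v.iter().map(ToString::to_string).collect()
    }

    fn main() {
        println!("{:?} {:?}", before(&[1, 2, 3]), after(&[1, 2, 3]));
        println!("{:?}", names_after(&["exec", "tui"]));
    }
    ```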
  • Use helpers instead of fixtures (#3888)
    Moves to using test helper methods everywhere.
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Prior to this PR, both of these functions took a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
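    The distinction can be sketched as two separate inputs. One is the directory the command runs in. The other is the directory against which the sandbox policy's relative writable roots are resolved. `SandboxPolicy`, `build_command`, and the resolution rule are hypothetical and are not the actual `seatbelt.rs`/`landlock.rs` signatures.

    ```rust
    use std::path::{Path, PathBuf};
    use std::process::Command;

    /// Hypothetical policy with writable roots that may be relative.
    struct SandboxPolicy {
        writable_roots: Vec<PathBuf>,
    }

    /// Build a command that runs in `command_cwd`, while resolving the policy's
    /// relative writable roots against `sandbox_policy_cwd`. The two directories
    /// are independent inputs; neither is derived from the other.
    fn build_command(
        program: &str,
        command_cwd: &Path,
        sandbox_policy_cwd: &Path,
        policy: &SandboxPolicy,
    ) -> (Command, Vec<PathBuf>) {
        let mut cmd = Command::new(program);
        cmd.current_dir(command_cwd);
        // `join` keeps absolute roots as-is and anchors relative ones.
        let roots = policy
            .writable_roots
            .iter()
            .map(|root| sandbox_policy_cwd.join(root))
            .collect();
        (cmd, roots)
    }

    fn main() {
        let policy = SandboxPolicy { writable_roots: vec![PathBuf::from("target"), PathBuf::from("/tmp")] };
        let (cmd, roots) = build_command("ls", Path::new("/work/sub"), Path::new("/work"), &policy);
        println!("runs in {:?}, writable: {:?}", cmd.get_current_dir(), roots);
    }
    ```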
  • enable-resume (#3537)
    Adds the ability to resume conversations, via a single verb: `resume`.
    
    Behavior:
    
    `tui`:
    - `codex resume`: opens the session picker
    - `codex resume --last`: continues the last conversation
    - `codex resume <session id>`: continues the conversation with `session id`
    
    `exec`:
    - `codex resume --last`: continues the last conversation
    - `codex resume <session id>`: continues the conversation with `session id`
    
    Implementation:
    - Added a function that finds the session file in `~/.codex/sessions/`
    for a given `UUID`; this is what resuming by session id relies on.
    - Added the flags mentioned above
    - Added lots of tests
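    Here is a minimal sketch of such a lookup. It assumes session files live somewhere under the sessions directory, possibly in nested subdirectories, and contain the session UUID in their file name. `find_session_file` is a hypothetical name. The naming scheme is an assumption, not a format Codex guarantees.

    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};

    /// Hypothetical lookup: recursively search `root` (e.g. `~/.codex/sessions/`)
    /// for a file whose name contains `session_id`. Returns the first match found;
    /// directory iteration order is unspecified.
    fn find_session_file(root: &Path, session_id: &str) -> io::Result<Option<PathBuf>> {
        for entry in fs::read_dir(root)? {
            let path = entry?.path();
            if path.is_dir() {
                if let Some(found) = find_session_file(&path, session_id)? {
                    return Ok(Some(found));
                }
            } else if path
                .file_name()
                .and_then(|name| name.to_str())
                .is_some_and(|name| name.contains(session_id))
            {
                return Ok(Some(path));
            }
        }
        Ok(None)
    }

    fn main() -> io::Result<()> {
        // Usage: <binary> <session-uuid>
        let Some(session_id) = std::env::args().nth(1) else {
            eprintln!("usage: find-session <session-uuid>");
            return Ok(());
        };
        let home = std::env::var_os("HOME").map(PathBuf::from).unwrap_or_default();
        let root = home.join(".codex").join("sessions");
        println!("{:?}", find_session_file(&root, &session_id)?);
        Ok(())
    }
    ```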
  • chore: enable clippy::redundant_clone (#3489)
    Created this PR by:
    
    - adding `redundant_clone` to `[workspace.lints.clippy]` in
    `codex-rs/Cargo.toml`
    - running `cargo clippy --tests --fix`
    - running `just fmt`
    
    I did have to manually clean up one resulting instance of the following:
    
    ```rust
    let codex = codex;
    ```
  • chore: require uninlined_format_args from clippy (#2845)
    - added `uninlined_format_args` to `[workspace.lints.clippy]` in the
    `Cargo.toml` for the workspace
    - ran `cargo clippy --tests --fix`
    - ran `just fmt`
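    In brief, the lint requires plain identifiers to be captured directly inside the format string. The following small before/after sketch uses illustrative function names. Only bare identifiers can be inlined. Expressions such as `v.len()` or `self.name` still need positional arguments.

    ```rust
    /// Illustrative function showing the style `uninlined_format_args` enforces.
    fn greet(name: &str, count: usize) -> String {
        // Before (flagged by clippy::uninlined_format_args):
        //     format!("{} has {} open tasks", name, count)
        // After `cargo clippy --fix`, identifiers are captured inline:
        format!("{name} has {count} open tasks")
    }

    fn summarize(tasks: &[&str]) -> String {
        // Expressions are not identifiers, so they stay as positional arguments.
        format!("{} tasks, first: {:?}", tasks.len(), tasks.first())
    }

    fn main() {
        println!("{}", greet("codex", 3));
        println!("{}", summarize(&["fmt", "clippy"]));
    }
    ```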
  • [exec] Clean up apply-patch tests (#2648)
    ## Summary
    These tests were getting a bit unwieldy, and they're starting to become
    load-bearing. Let's clean them up, and get them working solidly so we
    can easily expand this harness with new tests.
    
    ## Test Plan
    - [x] Tests continue to pass
  • test: faster test execution in codex-core (#2633)
    This dramatically improves the time to run `cargo test -p codex-core`
    (~25x speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    Both tests were measured "hot", i.e. on a 2nd run with no filesystem
    changes, to exclude compile times.
    
    The approach is inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html):
    we move all test cases in `tests/` into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized within a
    single binary.
  • [apply_patch] freeform apply_patch tool (#2576)
    ## Summary
    GPT-5 introduced the concept of [custom
    tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
    which allow the model to send a raw string result back, sidestepping
    JSON-escaping issues. We are migrating gpt-5 to use this by default.
    
    However, gpt-oss models do not support custom tools, only normal
    functions. So we keep both tool definitions, and provide whichever one
    the model family supports.
    
    ## Testing
    - [x] Tested locally with various models
    - [x] Unit tests pass
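    Here is a minimal sketch of the selection rule described above. `ApplyPatchTool` and `apply_patch_tool_for` are hypothetical names. Keying on model-slug prefixes is an assumption about how model families are identified: `gpt-5` gets the freeform tool, and everything else, including `gpt-oss`, gets the function tool.

    ```rust
    /// Hypothetical: which definition of the `apply_patch` tool to offer.
    #[derive(Debug, PartialEq)]
    enum ApplyPatchTool {
        /// Custom (freeform) tool: the model returns the raw patch text.
        Freeform,
        /// Regular function tool: the patch is a JSON-escaped string argument.
        Function,
    }

    /// Assumption: model families are identified by slug prefix.
    fn apply_patch_tool_for(model_slug: &str) -> ApplyPatchTool {
        if model_slug.starts_with("gpt-5") {
            ApplyPatchTool::Freeform
        } else {
            // e.g. gpt-oss models, which only support function tools.
            ApplyPatchTool::Function
        }
    }

    fn main() {
        for slug in ["gpt-5", "gpt-oss-120b"] {
            println!("{slug}: {:?}", apply_patch_tool_for(slug));
        }
    }
    ```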
  • [tools] Add apply_patch tool (#2303)
    ## Summary
    We've been seeing a number of issues and reports with our synthetic
    `apply_patch` tool, e.g. #802. Let's make this a real tool - in my
    anecdotal testing, it's critical for GPT-OSS models, but I'd like to
    make it the standard across GPT-5 and codex models as well.
    
    ## Testing
    - [x] Tested locally
    - [x] Integration test
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added a `clippy.toml` to configure allowable `expect`/`unwrap` usage in
    tests
    * Removed as many `expect`/`allow` lines as possible from tests
    * Moved a bunch of `allow`s to `expect`s where possible
    
    Note: in integration tests, non-`#[test]` helper functions are not
    covered by this, so we had to leave a few lingering
    `#[expect(clippy::expect_used)]` checks around.
  • Fix AF_UNIX, sockpair, recvfrom in linux sandbox (#2309)
    When using codex-tui on a Linux system, I was unable to run `cargo
    clippy` inside of codex due to:
    ```
    [pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0,  <unfinished ...>
    [pid 3548370] close(8 <unfinished ...>
    [pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted)
    ```
    And
    ```
    3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted)
    ```
    
    This PR:
    * Fixes a bug that disallowed AF_UNIX, so it is now allowed on `socket()`
    * Adds `recvfrom()` to the syscall allow list; this should be fine since
    we disable opening new sockets, but we should validate that there is no
    open-socket inheritance issue.
    * Allows `socketpair()` to be called for AF_UNIX
    * Adds tests for AF_UNIX components
    * All of which allows running `cargo clippy` within the sandbox on
    Linux, and possibly other tooling using a fork-server model + AF_UNIX
    comms.
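    A small probe can check whether the relaxed filter lets AF_UNIX traffic through when run as a command inside the sandbox. It is a sketch, not the test added by this PR. `UnixStream::pair()` creates an AF_UNIX stream pair, not the `SOCK_SEQPACKET` pair from the trace above. Whether reads reach the kernel as `recvfrom` depends on the standard library, libc, and architecture. Under a filter that still blocked these calls, the probe should report an error (the trace above shows `EPERM`) instead of echoing the message. `unix_socketpair_roundtrip` is a hypothetical name.

    ```rust
    use std::io::{Read, Write};
    use std::os::unix::net::UnixStream;

    /// Hypothetical probe: create an AF_UNIX socket pair and round-trip a message.
    fn unix_socketpair_roundtrip(msg: &[u8]) -> std::io::Result<Vec<u8>> {
        let (mut writer, mut reader) = UnixStream::pair()?;
        writer.write_all(msg)?;
        drop(writer); // close the write end so `read_to_end` sees EOF
        let mut buf = Vec::new();
        reader.read_to_end(&mut buf)?;
        Ok(buf)
    }

    fn main() {
        match unix_socketpair_roundtrip(b"ping") {
            Ok(bytes) => println!("ok: {}", String::from_utf8_lossy(&bytes)),
            Err(e) => eprintln!("AF_UNIX round-trip failed: {e}"),
        }
    }
    ```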
  • fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318)
    The high-order bit on this PR is that it makes it so `sandbox.rs` tests
    both Mac and Linux, as we introduce a general
    `spawn_command_under_sandbox()` function with platform-specific
    implementations for testing.
    
    An important, and interesting, discovery in porting the test to Linux is
    that (for reasons cited in the code comments), `/dev/shm` has to be
    added to `writable_roots` on Linux in order for `multiprocessing.Lock`
    to work there. Granting write access to `/dev/shm` comes with some
    degree of risk, so we do not make this the default for Codex CLI.
    
    Piggybacking on top of #2317, this moves the
    `python_multiprocessing_lock_works` test yet again, moving
    `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs`
    because in `codex-rs/exec/tests` we can use `cargo_bin()` like so:
    
    ```
    let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec");
    ```
    
    which is necessary so we can use `codex_linux_sandbox_exe` and therefore
    `spawn_command_under_linux_sandbox` in an integration test.
    
    This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs`
    and into `landlock.rs`, which makes things more consistent with
    `seatbelt.rs` in `codex-core`.
    
    For reference, https://github.com/openai/codex/pull/1808 is the PR that
    made the change to Seatbelt to get this test to pass on Mac.
  • fix: run apply_patch calls through the sandbox (#1705)
    Building on the work of https://github.com/openai/codex/pull/1702, this
    changes how a shell call to `apply_patch` is handled.
    
    Previously, a shell call to `apply_patch` was always handled in-process,
    never leveraging a sandbox. To determine whether the `apply_patch`
    operation could be auto-approved, the
    `is_write_patch_constrained_to_writable_paths()` function would check if
    all the paths listed in the patch were writable. If so, the agent would
    apply the changes listed in the patch.
    
    Unfortunately, this approach afforded a loophole: symlinks!
    
    * For a soft link, we could fix this issue by tracing the link and
    checking whether the target is in the set of writable paths, however...
    * ...For a hard link, things are not as simple. We can run `stat FILE`
    to see if the number of links is greater than 1, but then we would have
    to do something potentially expensive like `find . -inum <inode_number>`
    to find the other paths for `FILE`. Further, even if this worked, this
    approach runs the risk of a
    [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)
    race condition, so it is not robust.
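    The link-count check from the second bullet can be sketched with `std::os::unix::fs::MetadataExt::nlink`, the same count `stat` reports. The sketch illustrates the limitation described above. The count shows that other names for the file exist, not where they are, and it can change between the check and the write. `has_other_hard_links` is a hypothetical name, and the code is Unix-only.

    ```rust
    use std::fs;
    use std::io;
    use std::os::unix::fs::MetadataExt;
    use std::path::Path;

    /// Returns true if `path` has more than one hard link. This says *that*
    /// other names exist, not *where*, and the answer can change before any
    /// subsequent write (TOCTOU).
    fn has_other_hard_links(path: &Path) -> io::Result<bool> {
        // symlink_metadata: do not follow a symlink at `path` itself.
        Ok(fs::symlink_metadata(path)?.nlink() > 1)
    }

    fn main() -> io::Result<()> {
        let dir = std::env::temp_dir().join(format!("nlink-demo-{}", std::process::id()));
        fs::create_dir_all(&dir)?;
        let original = dir.join("original.txt");
        fs::write(&original, "I am the original content\n")?;
        println!("before: {}", has_other_hard_links(&original)?);
        fs::hard_link(&original, dir.join("link-to-original.txt"))?;
        println!("after:  {}", has_other_hard_links(&original)?);
        fs::remove_dir_all(&dir)
    }
    ```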
    
    The solution, implemented in this PR, is to turn the virtual execution
    of the `apply_patch` CLI into an _actual_ execution using `codex
    --codex-run-as-apply-patch PATCH`, which we can run under the sandbox
    the user specified, just like any other `shell` call.
    
    This, of course, assumes that the sandbox prevents writing through
    symlinks as a mechanism to write to folders that are not in the writable
    set configured by the sandbox. I verified this by testing the following
    on both Mac and Linux:
    
    ```shell
    #!/usr/bin/env bash
    set -euo pipefail
    
    # Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR?
    
    # Codex is run in SANDBOX_DIR, so writes should be constrained to this directory.
    SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    # EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it.
    EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
    
    echo "SANDBOX_DIR: $SANDBOX_DIR"
    echo "EXPLOIT_DIR: $EXPLOIT_DIR"
    
    cleanup() {
      # Only remove if it looks sane and still exists
      [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR"
      [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR"
    }
    
    trap cleanup EXIT
    
    echo "I am the original content" > "${EXPLOIT_DIR}/original.txt"
    
    # Drop the -s to test hard links.
    ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt"
    
    cat "${SANDBOX_DIR}/link-to-original.txt"
    
    if [[ "$(uname)" == "Linux" ]]; then
        SANDBOX_SUBCOMMAND=landlock
    else
        SANDBOX_SUBCOMMAND=seatbelt
    fi
    
    # Attempt the exploit
    cd "${SANDBOX_DIR}"
    
    codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true
    
    cat "${EXPLOIT_DIR}/original.txt"
    ```
    
    Admittedly, this change merits a proper integration test, but I think I
    will have to do that in a follow-up PR.
  • fix: support special --codex-run-as-apply-patch arg (#1702)
    This introduces some special behavior to the CLIs that are using the
    `codex-arg0` crate where if `arg1` is `--codex-run-as-apply-patch`, then
    it will run as if `apply_patch arg2` were invoked. This is important
    because it means we can do things like:
    
    ```
    SANDBOX_TYPE=landlock # or seatbelt for macOS
    codex debug "${SANDBOX_TYPE}" -- codex --codex-run-as-apply-patch PATCH
    ```
    
    which gives us a way to run `apply_patch` while ensuring it adheres to
    the sandbox the user specified.
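    Here is a minimal sketch of the argument check described above. It assumes the patch is passed as a single argument after the flag. `Dispatch` and `dispatch` are hypothetical names. The sketch covers only the `arg1` check and omits the `arg0`-based dispatch that `codex-arg0` also performs.

    ```rust
    const APPLY_PATCH_FLAG: &str = "--codex-run-as-apply-patch";

    /// Hypothetical outcome of inspecting the command line.
    #[derive(Debug, PartialEq)]
    enum Dispatch {
        /// Run as if `apply_patch <patch>` had been invoked.
        ApplyPatch(String),
        /// The flag was given without a patch argument.
        MissingPatch,
        /// Fall through to the normal CLI.
        Normal(Vec<String>),
    }

    fn dispatch(args: Vec<String>) -> Dispatch {
        // args[0] is the program name; only arg1 is checked for the flag.
        if args.get(1).map(String::as_str) == Some(APPLY_PATCH_FLAG) {
            return match args.get(2) {
                Some(patch) => Dispatch::ApplyPatch(patch.clone()),
                None => Dispatch::MissingPatch,
            };
        }
        Dispatch::Normal(args)
    }

    fn main() {
        println!("{:?}", dispatch(std::env::args().collect()));
    }
    ```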
    
    While it would be nice to use the `arg0` trick like we are currently
    doing for `codex-linux-sandbox`, there is no way to specify the `arg0`
    for the underlying command when running under `/usr/bin/sandbox-exec`,
    so it will not work for us in this case.
    
    Admittedly, we could have also supported this via a custom environment
    variable (e.g., `CODEX_ARG0`), but since environment variables are
    inherited by child processes, that seemed like a potentially leakier
    abstraction.
    
    This change, as well as our existing reliance on checking `arg0`, places
    additional requirements on those who include `codex-core`. Its
    `README.md` has been updated to reflect this.
    
    While we could have just added an `apply-patch` subcommand to the
    `codex` multitool CLI, that would not be sufficient for the standalone
    `codex-exec` CLI, which is something that we distribute as part of our
    GitHub releases for those who know they will not be using the TUI and
    therefore prefer to use a slightly smaller executable:
    
    https://github.com/openai/codex/releases/tag/rust-v0.10.0
    
    To that end, this PR adds an integration test to ensure that the
    `--codex-run-as-apply-patch` option works with the standalone
    `codex-exec` CLI.
    