Commit Graph

52 Commits

  • [core] add optional status_code to error events (#6865)
    We want to better uncover error status code for clients. Add an optional
    status_code to error events (thread error, error, stream error) so app
    server could uncover the status code from the client side later.
    
    in event log:
    ```
    < {
    <   "method": "codex/event/stream_error",
    <   "params": {
    <     "conversationId": "019a9a32-f576-7292-9711-8e57e8063536",
    <     "id": "0",
    <     "msg": {
    <       "message": "Reconnecting... 5/5",
    <       "status_code": 401,
    <       "type": "stream_error"
    <     }
    <   }
    < }
    < {
    <   "method": "codex/event/error",
    <   "params": {
    <     "conversationId": "019a9a32-f576-7292-9711-8e57e8063536",
    <     "id": "0",
    <     "msg": {
    <       "message": "exceeded retry limit, last status: 401 Unauthorized, request id: 9a0cb03a485067f7-SJC",
    <       "status_code": 401,
    <       "type": "error"
    <     }
    <   }
    < }
    ```
  • fix: add more fields to ThreadStartResponse and ThreadResumeResponse (#6847)
    This adds the following fields to `ThreadStartResponse` and
    `ThreadResumeResponse`:
    
    ```rust
        pub model: String,
        pub model_provider: String,
        pub cwd: PathBuf,
        pub approval_policy: AskForApproval,
        pub sandbox: SandboxPolicy,
        pub reasoning_effort: Option<ReasoningEffort>,
    ```
    
    This is important because these fields are optional in
    `ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be
    able to determine what values were ultimately used to start/resume the
    conversation. (Though note that any of these could be changed later
    between turns in the conversation.)
    
    Though to get this information reliably, it must be read from the
    internal `SessionConfiguredEvent` that is created in response to the
    start of a conversation. Because `SessionConfiguredEvent` (as defined in
    `codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a
    number of them had to be added as part of this PR.
    
    Because `SessionConfiguredEvent` is referenced in many tests, test
    instances of `SessionConfiguredEvent` had to be updated, as well, which
    is why this PR touches so many files.
  • Update defaults to gpt-5.1 (#6652)
    ## Summary
    - update documentation, example configs, and automation defaults to
    reference gpt-5.1 / gpt-5.1-codex
    - bump the CLI and core configuration defaults, model presets, and error
    messaging to the new models while keeping the model-family/tool coverage
    for legacy slugs
    - refresh tests, fixtures, and TUI snapshots so they expect the upgraded
    defaults
    
    ## Testing
    - `cargo test -p codex-core
    config::tests::test_precedence_fixture_with_gpt5_profile`
    
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)
  • [app-server] feat: add v2 command execution approval flow (#6758)
    This PR adds the API V2 version of the command‑execution approval flow
    for the shell tool.
    
    This PR wires the new RPC (`item/commandExecution/requestApproval`, V2
    only) and related events (`item/started`, `item/completed`, and
    `item/commandExecution/delta`, which are emitted in both V1 and V2)
    through the app-server
    protocol. The new approval RPC is only sent when the user initiates a
    turn with the new `turn/start` API so we don't break backwards
    compatibility with VSCE.
    
    The approach I took was to make as few changes to the Codex core as
    possible, leveraging existing `EventMsg` core events, and translating
    those in app-server. I did have to add additional fields to
    `EventMsg::ExecCommandEndEvent` to capture the command's input so that
    app-server can statelessly transform these events to a
    `ThreadItem::CommandExecution` item for the `item/completed` event.
    
    Once we stabilize the API and it's complete enough for our partners, we
    can work on migrating the core to be aware of command execution items as
    a first-class concept.
    
    **Note**: We'll need followup work to make sure these APIs work for the
    unified exec tool, but will wait til that's stable and landed before
    doing a pass on app-server.
    
    Example payloads below:
    ```
    {
      "method": "item/started",
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": null,
          "exitCode": null,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "inProgress",
          "type": "commandExecution"
        }
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "method": "item/commandExecution/requestApproval",
      "params": {
        "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
        "parsedCmd": [
          {
            "cmd": "touch /tmp/should-trigger-approval",
            "type": "unknown"
          }
        ],
        "reason": "Need to create file in /tmp which is outside workspace sandbox",
        "risk": null,
        "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
        "turnId": "1"
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "result": {
        "acceptSettings": {
          "forSession": false
        },
        "decision": "accept"
      }
    }
    ```
    
    ```
    {
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": 224,
          "exitCode": 0,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "completed",
          "type": "commandExecution"
        }
      }
    }
    ```
  • feat: better UI for unified_exec (#6515)
    <img width="376" height="132" alt="Screenshot 2025-11-12 at 17 36 22"
    src="https://github.com/user-attachments/assets/ce693f0d-5ca0-462e-b170-c20811dcc8d5"
    />
  • feat: Add support for --add-dir to exec and TypeScript SDK (#6565)
    ## Summary
    
    Adds support for specifying additional directories in the TypeScript SDK
    through a new `additionalDirectories` option in `ThreadOptions`.
    
    ## Changes
    
    - Added `additionalDirectories` parameter to `ThreadOptions` interface
    - Updated `CodexExec` to accept and pass through additional directories
    via the `--config` flag for `sandbox_workspace_write.writable_roots`
    - Added comprehensive test coverage for the new functionality
    
    ## Test plan
    
    - Added test case that verifies `additionalDirectories` is correctly
    passed as repeated flags
    - Existing tests continue to pass
    
    ---------
    
    Co-authored-by: Claude <noreply@anthropic.com>
  • Fix warning message phrasing (#6446)
    Small fix for sentence phrasing in the warning message
    
    Co-authored-by: AndrewNikolin <877163+AndrewNikolin@users.noreply.github.com>
  • Add warning on compact (#6052)
    This PR introduces the ability for `core` to send `warnings` as it can
    send `errors. It also sends a warning on compaction.
    
    <img width="811" height="187" alt="image"
    src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a"
    />
  • [exec] Add MCP tool arguments and results (#5899)
    Extends mcp_tool_call item to include arguments and results.
  • feature: Add "!cmd" user shell execution (#2471)
    feature: Add "!cmd" user shell execution
    
    This change lets users run local shell commands directly from the TUI by
    prefixing their input with ! (e.g. !ls). Output is truncated to keep the
    exec cell usable, and Ctrl-C cleanly
      interrupts long-running commands (e.g. !sleep 10000).
    
    **Summary of changes**
    
    - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask
    (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs.
    - Reuse the existing tool router: the task constructs a ToolCall for the
    local_shell tool and relies on ShellHandler, so no manual MCP tool
    lookup is required.
    - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the
    TUI can show command metadata, live output, and exit status.
    
    **End-to-end flow**
    
      **TUI handling**
    
    1. ChatWidget::submit_user_message (TUI) intercepts messages starting
    with !.
    2. Non-empty commands dispatch Op::RunUserShellCommand { command };
    empty commands surface a help hint.
    3. No UserInput items are created, so nothing is enqueued for the model.
    
      **Core submission loop**
    4. The submission loop routes the op to handlers::run_user_shell_command
    (core/src/codex.rs).
    5. A fresh TurnContext is created and Session::spawn_user_shell_command
    enqueues UserShellCommandTask.
    
      **Task execution**
    6. UserShellCommandTask::run emits TaskStartedEvent, formats the
    command, and prepares a ToolCall targeting local_shell.
      7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler.
    
      **Shell tool runtime**
    8. ShellHandler::run_exec_like launches the process via the unified exec
    runtime, honoring sandbox and shell policies, and emits
    ExecCommandBegin/End.
    9. Stdout/stderr are captured for the UI, but the task does not turn the
    resulting ToolOutput into a model response.
    
      **Completion**
    10. After ExecCommandEnd, the task finishes without an assistant
    message; the session marks it complete and the exec cell displays the
    final output.
    
      **Conversation context**
    
    - The command and its output never enter the conversation history or the
    model prompt; the flow is local-only.
      - Only exec/task events are emitted for UI rendering.
    
    **Demo video**
    
    
    https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
  • feat(tui): clarify Windows auto mode requirements (#5568)
    ## Summary
    - Coerce Windows `workspace-write` configs back to read-only, surface
    the forced downgrade in the approvals popup,
      and funnel users toward WSL or Full Access.
    - Add WSL installation instructions to the Auto preset on Windows while
    keeping the preset available for other
      platforms.
    - Skip the trust-on-first-run prompt on native Windows so new folders
    remain read-only without additional
      confirmation.
    - Expose a structured sandbox policy resolution from config to flag
    Windows downgrades and adjust tests (core,
    exec, TUI) to reflect the new behavior; provide a Windows-only approvals
    snapshot.
    
      ## Testing
      - cargo fmt
    - cargo test -p codex-core
    config::tests::add_dir_override_extends_workspace_writable_roots
    - cargo test -p codex-exec
    suite::resume::exec_resume_preserves_cli_configuration_overrides
    - cargo test -p codex-tui
    chatwidget::tests::approvals_selection_popup_snapshot
    - cargo test -p codex-tui
    approvals_popup_includes_wsl_note_for_auto_mode
      - cargo test -p codex-tui windows_skips_trust_prompt
      - just fix -p codex-core
      - just fix -p codex-tui
  • chore: drop approve all (#5503)
    Not needed anymore
  • Set codex SDK TypeScript originator (#4894)
    ## Summary
    - ensure the TypeScript SDK sets CODEX_INTERNAL_ORIGINATOR_OVERRIDE to
    codex_sdk_ts when spawning the Codex CLI
    - extend the responses proxy test helper to capture request headers for
    assertions
    - add coverage that verifies Codex threads launched from the TypeScript
    SDK send the codex_sdk_ts originator header
    
    ## Testing
    - Not Run (not requested)
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e561b125248320a487f129093d16e7
  • Simplify request body assertions (#4845)
    We'll have a lot more test like these
  • Use response helpers when mounting SSE test responses (#4783)
    ## Summary
    - replace manual wiremock SSE mounts in the compact suite with the
    shared response helpers
    - simplify the exec auth_env integration test by using the
    mount_sse_once_match helper
    - rely on mount_sse_sequence plus server request collection to replace
    the bespoke SeqResponder utility in tests
    
    ## Testing
    - just fmt
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e2e238f2a88320a337f0b9e4098093
  • Add helper for response created SSE events in tests (#4758)
    ## Summary
    - add a reusable `ev_response_created` helper that builds
    `response.created` SSE events for integration tests
    - update the exec and core integration suites to use the new helper
    instead of repeating manual JSON literals
    - keep the streaming fixtures consistent by relying on the shared helper
    in every touched test
    
    ## Testing
    - `just fmt`
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61
  • feat: codex exec writes only the final message to stdout (#4644)
    This updates `codex exec` so that, by default, most of the agent's
    activity is written to stderr so that only the final agent message is
    written to stdout. This makes it easier to pipe `codex exec` into
    another tool without extra filtering.
    
    I introduced `#![deny(clippy::print_stdout)]` to help enforce this
    change and renamed the `ts_println!()` macro to `ts_msg()` because (1)
    it no longer calls `println!()` and (2), `ts_eprintln!()` seemed too
    long of a name.
    
    While here, this also adds `-o` as an alias for `--output-last-message`.
    
    Fixes https://github.com/openai/codex/issues/1670
  • chore: refactor tool handling (#4510)
    # Tool System Refactor
    
    - Centralizes tool definitions and execution in `core/src/tools/*`:
    specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
    registry/dispatch (`registry.rs`), and shared context (`context.rs`).
    One registry now builds the model-visible tool list and binds handlers.
    - Router converts model responses to tool calls; Registry dispatches
    with consistent telemetry via `codex-rs/otel` and unified error
    handling. Function, Local Shell, MCP, and experimental `unified_exec`
    all flow through this path; legacy shell aliases still work.
    - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
    make adding tools predictable and testable.
    
    Example: `read_file`
    - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
    registered by `build_specs`).
    - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
    1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
    - E2E test: `core/tests/suite/read_file.rs` validates the tool returns
    the requested lines.
    
    ## Next steps:
    - Decompose `handle_container_exec_with_params` 
    - Add parallel tool calls
  • Support CODEX_API_KEY for codex exec (#4615)
    Allows to set API key per invocation of `codex exec`
  • fix: remove mcp-types from app server protocol (#4537)
    We continue the separation between `codex app-server` and `codex
    mcp-server`.
    
    In particular, we introduce a new crate, `codex-app-server-protocol`,
    and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
    `codex-rs/app-server-protocol/src/protocol.rs`.
    
    Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
    into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
    because it is referenced in a ton of places, we have to touch a lot of
    files as part of this PR.
    
    We also decide to get away from proper JSON-RPC 2.0 semantics, so we
    also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
    is basically the same `JSONRPCMessage` type defined in `mcp-types`
    except with all of the `"jsonrpc": "2.0"` removed.
    
    Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
    considerably simpler, as we can lean heavier on serde to serialize
    directly into the wire format that we use now.
  • Remove legacy codex exec --json format (#4525)
    `codex exec --json` now maps to the behavior of `codex exec
    --experimental-json` with new event and item shapes.
    
    Thread events:
    - thread.started
    - turn.started
    - turn.completed
    - turn.failed
    - item.started
    - item.updated
    - item.completed
    
    Item types: 
    - assistant_message
    - reasoning
    - command_execution
    - file_change
    - mcp_tool_call
    - web_search
    - todo_list
    - error
    
    Sample output:
    
    <details>
    `codex exec "list my assigned github issues"  --json | jq`
    
    ```
    {
      "type": "thread.started",
      "thread_id": "01999ce5-f229-7661-8570-53312bd47ea3"
    }
    {
      "type": "turn.started"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_0",
        "item_type": "reasoning",
        "text": "**Planning to list assigned GitHub issues**"
      }
    }
    {
      "type": "item.started",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "in_progress"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_2",
        "item_type": "reasoning",
        "text": "**Organizing final message structure**"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_3",
        "item_type": "assistant_message",
        "text": "**Assigned Issues**\n- openai/codex#3267 – “stream error: stream disconnected before completion…” (bug) – last update 2025-09-08\n- openai/codex#3257 – “You've hit your usage limit. Try again in 4 days 20 hours 9 minutes.” – last update 2025-09-23\n- openai/codex#3054 – “reqwest SSL panic (library has no ciphers)” (bug) – last update 2025-09-03\n- openai/codex#3051 – “thread 'main' panicked at linux-sandbox/src/linux_run_main.rs:53:5:” (bug) – last update 2025-09-10\n- openai/codex#3004 – “Auto-compact when approaching context limit” (enhancement) – last update 2025-09-26\n- openai/codex#2916 – “Feature request: Add OpenAI service tier support for cost optimization” – last update 2025-09-12\n- openai/codex#1581 – “stream error: stream disconnected before completion: stream closed before response.complete; retrying...” (bug) – last update 2025-09-17"
      }
    }
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 34785,
        "cached_input_tokens": 12544,
        "output_tokens": 560
      }
    }
    ```
    
    </details>
  • Wire up web search item (#4511)
    Add handling for web search events.
  • Add MCP tool call item to codex exec (#4481)
    No arguments/results for now.
    ```
    {
      "type": "item.started",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "in_progress"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "completed"
      }
    }
    ```
  • OpenTelemetry events (#2103)
    ### Title
    
    ## otel
    
    Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
    that
    describe each run: outbound API requests, streamed responses, user
    input,
    tool-approval decisions, and the result of every tool invocation. Export
    is
    **disabled by default** so local runs remain self-contained. Opt in by
    adding an
    `[otel]` table and choosing an exporter.
    
    ```toml
    [otel]
    environment = "staging"   # defaults to "dev"
    exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
    log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
    ```
    
    Codex tags every exported event with `service.name = "codex-cli"`, the
    CLI
    version, and an `env` attribute so downstream collectors can distinguish
    dev/staging/prod traffic. Only telemetry produced inside the
    `codex_otel`
    crate—the events listed below—is forwarded to the exporter.
    
    ### Event catalog
    
    Every event shares a common set of metadata fields: `event.timestamp`,
    `conversation.id`, `app.version`, `auth_mode` (when available),
    `user.account_id` (when available), `terminal.type`, `model`, and
    `slug`.
    
    With OTEL enabled Codex emits the following event types (in addition to
    the
    metadata above):
    
    - `codex.api_request`
      - `cf_ray` (optional)
      - `attempt`
      - `duration_ms`
      - `http.response.status_code` (optional)
      - `error.message` (failures)
    - `codex.sse_event`
      - `event.kind`
      - `duration_ms`
      - `error.message` (failures)
      - `input_token_count` (completion only)
      - `output_token_count` (completion only)
      - `cached_token_count` (completion only, optional)
      - `reasoning_token_count` (completion only, optional)
      - `tool_token_count` (completion only)
    - `codex.user_prompt`
      - `prompt_length`
      - `prompt` (redacted unless `log_user_prompt = true`)
    - `codex.tool_decision`
      - `tool_name`
      - `call_id`
    - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
      - `source` (`config` or `user`)
    - `codex.tool_result`
      - `tool_name`
      - `call_id`
      - `arguments`
      - `duration_ms` (execution time for the tool)
      - `success` (`"true"` or `"false"`)
      - `output`
    
    ### Choosing an exporter
    
    Set `otel.exporter` to control where events go:
    
    - `none` – leaves instrumentation active but skips exporting. This is
    the
      default.
    - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
    Specify the
      endpoint, protocol, and headers your collector expects:
    
      ```toml
      [otel]
      exporter = { otlp-http = {
        endpoint = "https://otel.example.com/v1/logs",
        protocol = "binary",
        headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
      }}
      ```
    
    - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
    and any
      metadata headers:
    
      ```toml
      [otel]
      exporter = { otlp-grpc = {
        endpoint = "https://otel.example.com:4317",
        headers = { "x-otlp-meta" = "abc123" }
      }}
      ```
    
    If the exporter is `none` nothing is written anywhere; otherwise you
    must run or point to your
    own collector. All exporters run on a background batch worker that is
    flushed on
    shutdown.
    
    If you build Codex from source the OTEL crate is still behind an `otel`
    feature
    flag; the official prebuilt binaries ship with the feature enabled. When
    the
    feature is disabled the telemetry hooks become no-ops so the CLI
    continues to
    function without the extra dependencies.
    
    ---------
    
    Co-authored-by: Anton Panasenko <apanasenko@openai.com>
  • Add turn started/completed events and correct exit code on error (#4309)
    Adds new event for session completed that includes usage. Also ensures
    we return 1 on failures.
    ```
    {
      "type": "session.created",
      "session_id": "019987a7-93e7-7b20-9e05-e90060e411ea"
    }
    {
      "type": "turn.started"
    }
    ...
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 78913,
        "cached_input_tokens": 65280,
        "output_tokens": 1099
      }
    }
    ```
  • Add todo-list tool support (#4255)
    Adds a 1-per-turn todo-list item and item.updated event
    
    ```jsonl
    {"type":"item.started","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan  now","completed":false},{"text":"Update progress to next step","completed":false}]}}
    {"type":"item.updated","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan  now","completed":true},{"text":"Update progress to next step","completed":false}]}}
    {"type":"item.completed","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan  now","completed":true},{"text":"Update progress to next step","completed":false}]}}
    ```
  • Add codex exec testing helpers (#4254)
    Add a shortcut to create working directories and run codex exec with
    fake server.
  • [codex exec] Add item.started and support it for command execution (#4250)
    Adds a new `item.started` event to `codex exec` and implements it for
    command_execution item type.
    
    ```jsonl
    {"type":"session.created","session_id":"019982d1-75f0-7920-b051-e0d3731a5ed8"}
    {"type":"item.completed","item":{"id":"item_0","item_type":"reasoning","text":"**Executing commands securely**\n\nI'm thinking about how the default harness typically uses \"bash -lc,\" while historically \"bash\" is what we've been using. The command should be executed as a string in our CLI, so using \"bash -lc 'echo hello'\" is optimal but calling \"echo hello\" directly feels safer. The sandbox makes sure environment variables like CODEX_SANDBOX_NETWORK_DISABLED=1 are set, so I won't ask for approval. I just need to run \"echo hello\" and correctly present the output."}}
    {"type":"item.completed","item":{"id":"item_1","item_type":"reasoning","text":"**Preparing for tool calls**\n\nI realize that I need to include a preamble before making any tool calls. So, I'll first state the preamble in the commentary channel, then proceed with the tool call. After that, I need to present the final message along with the output. It's possible that the CLI will show the output inline, but I must ensure that I present the result clearly regardless. Let's move forward and get this organized!"}}
    {"type":"item.completed","item":{"id":"item_2","item_type":"assistant_message","text":"Running `echo` to confirm shell access and print output."}}
    {"type":"item.started","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"","exit_code":null,"status":"in_progress"}}
    {"type":"item.completed","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"hello\n","exit_code":0,"status":"completed"}}
    {"type":"item.completed","item":{"id":"item_4","item_type":"assistant_message","text":"hello"}}
    ```
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • Add explicit codex exec events (#4177)
    This pull request add a new experimental format of JSON output.
    
    You can try it using `codex exec --experimental-json`.
    
    Design takes a lot of inspiration from Responses API items and stream
    format.
    
    # Session and items
    Each invocation of `codex exec` starts or resumes a session. 
    
    Session contains multiple high-level item types:
    1. Assistant message 
    2. Assistant thinking 
    3. Command execution 
    4. File changes
    5. To-do lists
    6. etc.
    
    # Events 
    Session and items are going through their life cycles which is
    represented by events.
    
    Session is `session.created` or `session.resumed`
    Items are `item.added`, `item.updated`, `item.completed`,
    `item.require_approval` (or other item types like `item.output_delta`
    when we need streaming).
    
    So a typical session can look like:
    
    <details>
    
    ```
    {
      "type": "session.created",
      "session_id": "01997dac-9581-7de3-b6a0-1df8256f2752"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_0",
        "item_type": "assistant_message",
        "text": "I’ll locate the top-level README and remove its first line. Then I’ll show a quick summary of what changed."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_1",
        "item_type": "command_execution",
        "command": "bash -lc ls -la | sed -n '1,200p'",
        "aggregated_output": "pyenv: cannot rehash: /Users/pakrym/.pyenv/shims isn't writable\ntotal 192\ndrwxr-xr-x@  33 pakrym  staff   1056 Sep 24 14:36 .\ndrwxr-xr-x   41 pakrym  staff   1312 Sep 24 09:17 ..\n-rw-r--r--@   1 pakrym  staff      6 Jul  9 16:16 .codespellignore\n-rw-r--r--@   1 pakrym  staff    258 Aug 13 09:40 .codespellrc\ndrwxr-xr-x@   5 pakrym  staff    160 Jul 23 08:26 .devcontainer\n-rw-r--r--@   1 pakrym  staff   6148 Jul 22 10:03 .DS_Store\ndrwxr-xr-x@  15 pakrym  staff    480 Sep 24 14:38 .git\ndrwxr-xr-x@  12 pakrym  staff    384 Sep  2 16:00 .github\n-rw-r--r--@   1 pakrym  staff    778 Jul  9 16:16 .gitignore\ndrwxr-xr-x@   3 pakrym  staff     96 Aug 11 09:37 .husky\n-rw-r--r--@   1 pakrym  staff    104 Jul  9 16:16 .npmrc\n-rw-r--r--@   1 pakrym  staff     96 Sep  2 08:52 .prettierignore\n-rw-r--r--@   1 pakrym  staff    170 Jul  9 16:16 .prettierrc.toml\ndrwxr-xr-x@   5 pakrym  staff    160 Sep 14 17:43 .vscode\ndrwxr-xr-x@   2 pakrym  staff     64 Sep 11 11:37 2025-09-11\n-rw-r--r--@   1 pakrym  staff   5505 Sep 18 09:28 AGENTS.md\n-rw-r--r--@   1 pakrym  staff     92 Sep  2 08:52 CHANGELOG.md\n-rw-r--r--@   1 pakrym  staff   1145 Jul  9 16:16 cliff.toml\ndrwxr-xr-x@  11 pakrym  staff    352 Sep 24 13:03 codex-cli\ndrwxr-xr-x@  38 pakrym  staff   1216 Sep 24 14:38 codex-rs\ndrwxr-xr-x@  18 pakrym  staff    576 Sep 23 11:01 docs\n-rw-r--r--@   1 pakrym  staff   2038 Jul  9 16:16 flake.lock\n-rw-r--r--@   1 pakrym  staff   1434 Jul  9 16:16 flake.nix\n-rw-r--r--@   1 pakrym  staff  10926 Jul  9 16:16 LICENSE\ndrwxr-xr-x@ 465 pakrym  staff  14880 Jul 15 07:36 node_modules\n-rw-r--r--@   1 pakrym  staff    242 Aug  5 08:25 NOTICE\n-rw-r--r--@   1 pakrym  staff    578 Aug 14 12:31 package.json\n-rw-r--r--@   1 pakrym  staff    498 Aug 11 09:37 pnpm-lock.yaml\n-rw-r--r--@   1 pakrym  staff     58 Aug 11 09:37 pnpm-workspace.yaml\n-rw-r--r--@   1 pakrym  staff   2402 Jul  9 16:16 PNPM.md\n-rw-r--r--@   1 pakrym  staff   4393 Sep 12 14:36 README.md\ndrwxr-xr-x@   4 pakrym  staff    128 Sep 18 09:28 scripts\ndrwxr-xr-x@   2 pakrym  staff     64 Sep 11 11:34 tmp\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_2",
        "item_type": "reasoning",
        "text": "**Reviewing README.md file**\n\nI've located the README.md file at the root, and it’s 4393 bytes. Now, I need to remove the first line, but first, I should check its content to make sure I’m patching it correctly. I’ll use sed to display the first 20 lines. By reviewing those lines, I can determine exactly what needs to be removed before I proceed with the editing. Let's do this carefully!"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_3",
        "item_type": "command_execution",
        "command": "bash -lc sed -n '1,40p' README.md",
        "aggregated_output": "<h1 align=\"center\">OpenAI Codex CLI</h1>\n\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n  <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n  </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n```shell\nnpm install -g @openai/codex\n```\n\nAlternatively, if you use Homebrew:\n\n```shell\nbrew install codex\n```\n\nThen simply run `codex` to get started:\n\n```shell\ncodex\n```\n\n<details>\n<summary>You can also go to the <a href=\"https://github.com/openai/codex/releases/latest\">latest GitHub Release</a> and download the appropriate binary for your platform.</summary>\n\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_4",
        "item_type": "reasoning",
        "text": "**Updating README file**\n\nThe user wants to update the README by removing the first line, which is an HTML h1 tag. I'll proceed to delete that line while keeping the rest intact. To do this, I need to use the apply_patch tool and create a patch in the diff format. I’ll take care to include context lines around the change. After removing the header, I’m considering whether to delete the following blank line as well to keep it tidy."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_5",
        "item_type": "reasoning",
        "text": "**Crafting the patch**\n\nThe user has asked to remove just the first line, without mentioning the blank lines. Technically, leaving a blank line at the start is acceptable, but it's not ideal. Ideally, I'd prefer to remove the first line while maintaining the subsequent blank line. However, if I can also remove that blank line, it could tidy things up even more. Still, since the user's request is strictly to remove only the first line, I'll stick to that and craft the patch accordingly."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_6",
        "item_type": "assistant_message",
        "text": "Applying the change by removing the first line from `README.md`."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_7",
        "item_type": "file_change",
        "changes": [
          {
            "path": "/Users/pakrym/code/codex/README.md",
            "kind": "update"
          }
        ],
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_8",
        "item_type": "reasoning",
        "text": "**Validating README Update**\n\nI updated the README.md and now I need to validate that the first line has been successfully removed. To do this, I'll check the first 10 lines of the document to ensure everything looks correct. It’s always good to double-check these kinds of changes, so I’m going to take that extra step to make sure the update was executed properly. Let's see what those initial lines say!"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_9",
        "item_type": "command_execution",
        "command": "bash -lc sed -n '1,20p' README.md",
        "aggregated_output": "\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n  <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n  </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_10",
        "item_type": "assistant_message",
        "text": "- Updated `README.md:1` by removing the opening H1 line.\n- The README now starts with the install instructions paragraph.\n\nWant me to also remove the leading blank line at the top?"
      }
    }
    ```
    
    </details>
    
    The idea is to give users fully formatted items they can use directly in
    their rendering/application logic and avoid having them building up
    items manually based on events (unless they want to for streaming).
    
    This PR implements only the `item.completed` payload for some event
    types, more event types and item types to come.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Add exec output-schema parameter (#4079)
    Adds structured output to `exec` via the `--structured-output`
    parameter.
  • chore: clippy on redundant closure (#4058)
    Add redundant closure clippy rules and let Codex fix it by minimising
    FQP
  • Use helpers instead of fixtures (#3888)
    Move to using test helper method everywhere.
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Previous to this PR, both of these functions take a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
  • enable-resume (#3537)
    Adding the ability to resume conversations.
    we have one verb `resume`. 
    
    Behavior:
    
    `tui`:
    `codex resume`: opens session picker
    `codex resume --last`: continue last message
    `codex resume <session id>`: continue conversation with `session id`
    
    `exec`:
    `codex resume --last`: continue last conversation
    `codex resume <session id>`: continue conversation with `session id`
    
    Implementation:
    - I added a function to find the path in `~/.codex/sessions/` with a
    `UUID`. This is helpful in resuming with session id.
    - Added the above mentioned flags
    - Added lots of testing
  • chore: enable clippy::redundant_clone (#3489)
    Created this PR by:
    
    - adding `redundant_clone` to `[workspace.lints.clippy]` in
    `cargo-rs/Cargol.toml`
    - running `cargo clippy --tests --fix`
    - running `just fmt`
    
    Though I had to clean up one instance of the following that resulted:
    
    ```rust
    let codex = codex;
    ```
  • chore: require uninlined_format_args from clippy (#2845)
    - added `uninlined_format_args` to `[workspace.lints.clippy]` in the
    `Cargo.toml` for the workspace
    - ran `cargo clippy --tests --fix`
    - ran `just fmt`
  • [exec] Clean up apply-patch tests (#2648)
    ## Summary
    These tests were getting a bit unwieldy, and they're starting to become
    load-bearing. Let's clean them up, and get them working solidly so we
    can easily expand this harness with new tests.
    
    ## Test Plan
    - [x] Tests continue to pass
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.
  • [apply_patch] freeform apply_patch tool (#2576)
    ## Summary
    GPT-5 introduced the concept of [custom
    tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
    which allow the model to send a raw string result back, simplifying
    json-escape issues. We are migrating gpt-5 to use this by default.
    
    However, gpt-oss models do not support custom tools, only normal
    functions. So we keep both tool definitions, and provide whichever one
    the model family supports.
    
    ## Testing
    - [x] Tested locally with various models
    - [x] Unit tests pass
  • [tools] Add apply_patch tool (#2303)
    ## Summary
    We've been seeing a number of issues and reports with our synthetic
    `apply_patch` tool, e.g. #802. Let's make this a real tool - in my
    anecdotal testing, it's critical for GPT-OSS models, but I'd like to
    make it the standard across GPT-5 and codex models as well.
    
    ## Testing
    - [x] Tested locally
    - [x] Integration test
  • Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
    This PR:
    * Added the clippy.toml to configure allowable expect / unwrap usage in
    tests
    * Removed as many expect/allow lines as possible from tests
    * moved a bunch of allows to expects where possible
    
    Note: in integration tests, non `#[test]` helper functions are not
    covered by this so we had to leave a few lingering `expect(expect_used`
    checks around
  • Fix AF_UNIX, sockpair, recvfrom in linux sandbox (#2309)
    When using codex-tui on a linux system I was unable to run `cargo
    clippy` inside of codex due to:
    ```
    [pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0,  <unfinished ...>
    [pid 3548370] close(8 <unfinished ...>
    [pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted)
    ```
    And
    ```
    3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted)
    ```
    
    This PR:
    * Fixes a bug that disallowed AF_UNIX to allow it on `socket()`
    * Adds recvfrom() to the syscall allow list, this should be fine since
    we disable opening new sockets. But we should validate there is not a
    open socket inheritance issue.
    * Allow socketpair to be called for AF_UNIX
    * Adds tests for AF_UNIX components
    * All of which allows running `cargo clippy` within the sandbox on
    linux, and possibly other tooling using a fork server model + AF_UNIX
    comms.
  • fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318)
    The high-order bit on this PR is that it makes it so `sandbox.rs` tests
    both Mac and Linux, as we introduce a general
    `spawn_command_under_sandbox()` function with platform-specific
    implementations for testing.
    
    An important, and interesting, discovery in porting the test to Linux is
    that (for reasons cited in the code comments), `/dev/shm` has to be
    added to `writable_roots` on Linux in order for `multiprocessing.Lock`
    to work there. Granting write access to `/dev/shm` comes with some
    degree of risk, so we do not make this the default for Codex CLI.
    
    Piggybacking on top of #2317, this moves the
    `python_multiprocessing_lock_works` test yet again, moving
    `codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs`
    because in `codex-rs/exec/tests` we can use `cargo_bin()` like so:
    
    ```
    let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec");
    ```
    
    which is necessary so we can use `codex_linux_sandbox_exe` and therefore
    `spawn_command_under_linux_sandbox` in an integration test.
    
    This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs`
    and into `landlock.rs`, which makes things more consistent with
    `seatbelt.rs` in `codex-core`.
    
    For reference, https://github.com/openai/codex/pull/1808 is the PR that
    made the change to Seatbelt to get this test to pass on Mac.