Commit Graph

92 Commits

  • chore: cleanup Config instantiation codepaths (#8226)
    This PR does various types of cleanup before I can proceed with more
    ambitious changes to config loading.
    
    First, I noticed duplicated code across these two methods:
    
    
    https://github.com/openai/codex/blob/774bd9e432fa2e0f4e059e97648cf92216912e19/codex-rs/core/src/config/mod.rs#L314-L324
    
    
    https://github.com/openai/codex/blob/774bd9e432fa2e0f4e059e97648cf92216912e19/codex-rs/core/src/config/mod.rs#L334-L344
    
    This has now been consolidated in
    `load_config_as_toml_with_cli_overrides()`.
    
    Further, I noticed that `Config::load_with_cli_overrides()` took two
    similar arguments:
    
    
    https://github.com/openai/codex/blob/774bd9e432fa2e0f4e059e97648cf92216912e19/codex-rs/core/src/config/mod.rs#L308-L311
    
    The difference between `cli_overrides` and `overrides` was not
    immediately obvious to me. At first glance, it appears that one should
    be able to be expressed in terms of the other, but it turns out that
    some fields of `ConfigOverrides` (such as `cwd` and
    `codex_linux_sandbox_exe`) are, by design, not configurable via a
    `.toml` file or a command-line `--config` flag.
    
    That said, I discovered that many callers of
    `Config::load_with_cli_overrides()` were passing
    `ConfigOverrides::default()` for `overrides`, so I created two separate
    methods:
    
    - `Config::load_with_cli_overrides(cli_overrides: Vec<(String,
    TomlValue)>)`
    - `Config::load_with_cli_overrides_and_harness_overrides(cli_overrides:
    Vec<(String, TomlValue)>, harness_overrides: ConfigOverrides)`
    
    The latter has a long name, as it is _not_ what should be used in the
    common case, so the extra typing is designed to draw attention to this
    fact. I tried to update the existing callsites to use the shorter name,
    where possible.
    
    Further, in the cases where `ConfigOverrides` is used, usually only a
    limited subset of fields are actually set, so I updated the declarations
    to leverage `..Default::default()` where possible.
  • feat: Constrain values for approval_policy (#7778)
    Constrain `approval_policy` through new `admin_policy` config.
    
    This PR will:
    1. Add a `admin_policy` section to config, with a single field (for now)
    `allowed_approval_policies`. This list constrains the set of
    user-settable `approval_policy`s.
    2. Introduce a new `Constrained<T>` type, which combines a current value
    and a validator function. The validator function ensures disallowed
    values are not set.
    3. Change the type of `approval_policy` on `Config` and
    `SessionConfiguration` from `AskForApproval` to
    `Constrained<AskForApproval>`. The validator function is set by the
    values passed into `allowed_approval_policies`.
    4. `GenericDisplayRow`: add a `disabled_reason: Option<String>`. When
    set, it disables selection of the value and indicates as such in the
    menu. This also makes it unselectable with arrow keys or numbers. This
    is used in the `/approvals` menu.
    
    Follow ups are:
    1. Do the same thing to `sandbox_policy`.
    2. Propagate the allowed set of values through app-server for the
    extension (though already this should prevent app-server from setting
    this values, it's just that we want to disable UI elements that are
    unsettable).
    
    Happy to split this PR up if you prefer, into the logical numbered areas
    above. Especially if there are parts we want to gavel on separately
    (e.g. admin_policy).
    
    Disabled full access:
    <img width="1680" height="380" alt="image"
    src="https://github.com/user-attachments/assets/1fb61c8c-1fcb-4dc4-8355-2293edb52ba0"
    />
    
    Disabled `--yolo` on startup:
    <img width="749" height="76" alt="image"
    src="https://github.com/user-attachments/assets/0a1211a0-6eb1-40d6-a1d7-439c41e94ddb"
    />
    
    CODEX-4087
  • make model optional in config (#7769)
    - Make Config.model optional and centralize default-selection logic in
    ModelsManager, including a default_model helper (with
    codex-auto-balanced when available) so sessions now carry an explicit
    chosen model separate from the base config.
    - Resolve `model` once in `core` and `tui` from config. Then store the
    state of it on other structs.
    - Move refreshing models to be before resolving the default model
  • Removed experimental "command risk assessment" feature (#7799)
    This experimental feature received lukewarm reception during internal
    testing. Removing from the code base.
  • support MCP elicitations (#6947)
    No support for request schema yet, but we'll at least show the message
    and allow accept/decline.
    
    <img width="823" height="551" alt="Screenshot 2025-11-21 at 2 44 05 PM"
    src="https://github.com/user-attachments/assets/6fbb892d-ca12-4765-921e-9ac4b217534d"
    />
  • codex-exec: allow resume --last to read prompt #6717 (#6719)
    ### Description
    
    - codex exec --json resume --last "<prompt>" bailed out because clap
    treated the prompt as SESSION_ID. I removed the conflicts_with flag and
    reinterpret that positional as a prompt when
    --last is set, so the flow now keeps working in JSON mode.
    (codex-rs/exec/src/cli.rs:84-104, codex-rs/exec/src/lib.rs:75-130)
    - Added a regression test that exercises resume --last in JSON mode to
    ensure the prompt is accepted and the rollout file is updated.
    (codex-rs/exec/tests/suite/resume.rs:126-178)
    
    ### Testing
    
      - just fmt
      - cargo test -p codex-exec
      - just fix -p codex-exec
      - cargo test -p codex-exec
    
    #6717
    
    Signed-off-by: Dmitri Khokhlov <dkhokhlov@cribl.io>
  • LM Studio OSS Support (#2312)
    ## Overview
    
    Adds LM Studio OSS support. Closes #1883
    
    
    ### Changes
    This PR enhances the behavior of `--oss` flag to support LM Studio as a
    provider. Additionally, it introduces a new flag`--local-provider` which
    can take in `lmstudio` or `ollama` as values if the user wants to
    explicitly choose which one to use.
    
    If no provider is specified `codex --oss` will auto-select the provider
    based on whichever is running.
    
    #### Additional enhancements 
    The default can be set using `oss-provider` in config like:
    
    ```
    oss_provider = "lmstudio"
    ```
    
    For non-interactive users, they will need to either provide the provider
    as an arg or have it in their `config.toml`
    
    ### Notes
    For best performance, [set the default context
    length](https://lmstudio.ai/docs/app/advanced/per-model) for gpt-oss to
    the maximum your machine can support
    
    ---------
    
    Co-authored-by: Matt Clayton <matt@lmstudio.ai>
    Co-authored-by: Eric Traut <etraut@openai.com>
  • feat: Add support for --add-dir to exec and TypeScript SDK (#6565)
    ## Summary
    
    Adds support for specifying additional directories in the TypeScript SDK
    through a new `additionalDirectories` option in `ThreadOptions`.
    
    ## Changes
    
    - Added `additionalDirectories` parameter to `ThreadOptions` interface
    - Updated `CodexExec` to accept and pass through additional directories
    via the `--config` flag for `sandbox_workspace_write.writable_roots`
    - Added comprehensive test coverage for the new functionality
    
    ## Test plan
    
    - Added test case that verifies `additionalDirectories` is correctly
    passed as repeated flags
    - Existing tests continue to pass
    
    ---------
    
    Co-authored-by: Claude <noreply@anthropic.com>
  • [Hygiene] Remove include_view_image_tool config (#5976)
    There's still some debate about whether we want to expose
    `tools.view_image` or `feature.view_image` so those are left unchanged
    for now, but this old `include_view_image_tool` config is good-to-go.
    Also updated the doc to reflect that `view_image` tool is now by default
    true.
  • [codex] add developer instructions (#5897)
    we are using developer instructions for code reviews, we need to pass
    them in cli as well.
  • feat: compaction prompt configurable (#5959)
    ```
     codex -c compact_prompt="Summarize in bullet points"
     ```
  • Fixed bug that results in a sporadic hang when attaching images (#5891)
    Addresses https://github.com/openai/codex/issues/5773
    
    Testing: I tested that images work (regardless of order that they are
    associated with the task prompt) in both the CLI and Extension. Also
    verified that conversations in CLI and extension with images can be
    resumed.
  • [Auth] Choose which auth storage to use based on config (#5792)
    This PR is a follow-up to #5591. It allows users to choose which auth
    storage mode they want by using the new
    `cli_auth_credentials_store_mode` config.
  • feat: annotate conversations with model_provider for filtering (#5658)
    Because conversations that use the Responses API can have encrypted
    reasoning messages, trying to resume a conversation with a different
    provider could lead to confusing "failed to decrypt" errors. (This is
    reproducible by starting a conversation using ChatGPT login and resuming
    it as a conversation that uses OpenAI models via Azure.)
    
    This changes `ListConversationsParams` to take a `model_providers:
    Option<Vec<String>>` and adds `model_provider` on each
    `ConversationSummary` it returns so these cases can be disambiguated.
    
    Note this ended up making changes to
    `codex-rs/core/src/rollout/tests.rs` because it had a number of cases
    where it expected `Some` for the value of `next_cursor`, but the list of
    rollouts was complete, so according to this docstring:
    
    
    https://github.com/openai/codex/blob/bcd64c7e7231d6316a2377d1525a0fa74f21b783/codex-rs/app-server-protocol/src/protocol.rs#L334-L337
    
    If there are no more items to return, then `next_cursor` should be
    `None`. This PR updates that logic.
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5658).
    * #5803
    * #5793
    * __->__ #5658
  • Added model summary and risk assessment for commands that violate sandbox policy (#5536)
    This PR adds support for a model-based summary and risk assessment for
    commands that violate the sandbox policy and require user approval. This
    aids the user in evaluating whether the command should be approved.
    
    The feature works by taking a failed command and passing it back to the
    model and asking it to summarize the command, give it a risk level (low,
    medium, high) and a risk category (e.g. "data deletion" or "data
    exfiltration"). It uses a new conversation thread so the context in the
    existing thread doesn't influence the answer. If the call to the model
    fails or takes longer than 5 seconds, it falls back to the current
    behavior.
    
    For now, this is an experimental feature and is gated by a config key
    `experimental_sandbox_command_assessment`.
    
    Here is a screen shot of the approval prompt showing the risk assessment
    and summary.
    
    <img width="723" height="282" alt="image"
    src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
    />
  • chore: drop approve all (#5503)
    Not needed anymore
  • Enable plan tool by default (#5384)
    ## Summary
    - make the plan tool available by default by removing the feature flag
    and always registering the handler
    - drop plan-tool CLI and API toggles across the exec, TUI, MCP server,
    and app server code paths
    - update tests and configs to reflect the always-on plan tool and guard
    workspace restriction tests against env leakage
    
    ## Testing
    Manually tested the extension. 
    ------
    https://chatgpt.com/codex/tasks/task_i_68f67a3ff2d083209562a773f814c1f9
  • Add ItemStarted/ItemCompleted events for UserInputItem (#5306)
    Adds a new ItemStarted event and delivers UserMessage as the first item
    type (more to come).
    
    
    Renames `InputItem` to `UserInput` considering we're using the `Item`
    suffix for actual items.
  • Add forced_chatgpt_workspace_id and forced_login_method configuration options (#5303)
    This PR adds support for configs to specify a forced login method
    (chatgpt or api) as well as a forced chatgpt account id. This lets
    enterprises uses [managed
    configs](https://developers.openai.com/codex/security#managed-configuration)
    to force all employees to use their company's workspace instead of their
    own or any other.
    
    When a workspace id is set, a query param is sent to the login flow
    which auto-selects the given workspace or errors if the user isn't a
    member of it.
    
    This PR is large but a large % of it is tests, wiring, and required
    formatting changes.
    
    API login with chatgpt forced
    <img width="1592" height="116" alt="CleanShot 2025-10-19 at 22 40 04"
    src="https://github.com/user-attachments/assets/560c6bb4-a20a-4a37-95af-93df39d057dd"
    />
    
    ChatGPT login with api forced
    <img width="1018" height="100" alt="CleanShot 2025-10-19 at 22 40 29"
    src="https://github.com/user-attachments/assets/d010bbbb-9c8d-4227-9eda-e55bf043b4af"
    />
    
    Onboarding with api forced
    <img width="892" height="460" alt="CleanShot 2025-10-19 at 22 41 02"
    src="https://github.com/user-attachments/assets/cc0ed45c-b257-4d62-a32e-6ca7514b5edd"
    />
    
    Onboarding with ChatGPT forced
    <img width="1154" height="426" alt="CleanShot 2025-10-19 at 22 41 27"
    src="https://github.com/user-attachments/assets/41c41417-dc68-4bb4-b3e7-3b7769f7e6a1"
    />
    
    Logging in with the wrong workspace
    <img width="2222" height="84" alt="CleanShot 2025-10-19 at 22 42 31"
    src="https://github.com/user-attachments/assets/0ff4222c-f626-4dd3-b035-0b7fe998a046"
    />
  • feat: add --add-dir flag for extra writable roots (#5335)
    Add a `--add-dir` CLI flag so sessions can use extra writable roots in
    addition to the ones specified in the config file. These are ephemerally
    added during the session only.
    
    Fixes #3303
    Fixes #2797
  • Set codex SDK TypeScript originator (#4894)
    ## Summary
    - ensure the TypeScript SDK sets CODEX_INTERNAL_ORIGINATOR_OVERRIDE to
    codex_sdk_ts when spawning the Codex CLI
    - extend the responses proxy test helper to capture request headers for
    assertions
    - add coverage that verifies Codex threads launched from the TypeScript
    SDK send the codex_sdk_ts originator header
    
    ## Testing
    - Not Run (not requested)
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e561b125248320a487f129093d16e7
  • feat: Freeform apply_patch with simple shell output (#4718)
    ## Summary
    This PR is an alternative approach to #4711, but instead of changing our
    storage, parses out shell calls in the client and reserializes them on
    the fly before we send them out as part of the request.
    
    What this changes:
    1. Adds additional serialization logic when the
    ApplyPatchToolType::Freeform is in use.
    2. Adds a --custom-apply-patch flag to enable this setting on a
    session-by-session basis.
    
    This change is delicate, but is not meant to be permanent. It is meant
    to be the first step in a migration:
    1. (This PR) Add in-flight serialization with config
    2. Update model_family default
    3. Update serialization logic to store turn outputs in a structured
    format, with logic to serialize based on model_family setting.
    4. Remove this rewrite in-flight logic.
    
    ## Test Plan
    - [x] Additional unit tests added
    - [x] Integration tests added
    - [x] Tested locally
  • add(core): managed config (#3868)
    ## Summary
    
    - Factor `load_config_as_toml` into `core::config_loader` so config
    loading is reusable across callers.
    - Layer `~/.codex/config.toml`, optional `~/.codex/managed_config.toml`,
    and macOS managed preferences (base64) with recursive table merging and
    scoped threads per source.
    
    ## Config Flow
    
    ```
    Managed prefs (macOS profile: com.openai.codex/config_toml_base64)
                                   ▲
                                   │
    ~/.codex/managed_config.toml   │  (optional file-based override)
                                   ▲
                                   │
                    ~/.codex/config.toml (user-defined settings)
    ```
    
    - The loader searches under the resolved `CODEX_HOME` directory
    (defaults to `~/.codex`).
    - Managed configs let administrators ship fleet-wide overrides via
    device profiles which is useful for enforcing certain settings like
    sandbox or approval defaults.
    - For nested hash tables: overlays merge recursively. Child tables are
    merged key-by-key, while scalar or array values replace the prior layer
    entirely. This lets admins add or tweak individual fields without
    clobbering unrelated user settings.
  • feat: codex exec writes only the final message to stdout (#4644)
    This updates `codex exec` so that, by default, most of the agent's
    activity is written to stderr so that only the final agent message is
    written to stdout. This makes it easier to pipe `codex exec` into
    another tool without extra filtering.
    
    I introduced `#![deny(clippy::print_stdout)]` to help enforce this
    change and renamed the `ts_println!()` macro to `ts_msg()` because (1)
    it no longer calls `println!()` and (2), `ts_eprintln!()` seemed too
    long of a name.
    
    While here, this also adds `-o` as an alias for `--output-last-message`.
    
    Fixes https://github.com/openai/codex/issues/1670
  • chore: refactor tool handling (#4510)
    # Tool System Refactor
    
    - Centralizes tool definitions and execution in `core/src/tools/*`:
    specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
    registry/dispatch (`registry.rs`), and shared context (`context.rs`).
    One registry now builds the model-visible tool list and binds handlers.
    - Router converts model responses to tool calls; Registry dispatches
    with consistent telemetry via `codex-rs/otel` and unified error
    handling. Function, Local Shell, MCP, and experimental `unified_exec`
    all flow through this path; legacy shell aliases still work.
    - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
    make adding tools predictable and testable.
    
    Example: `read_file`
    - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
    registered by `build_specs`).
    - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
    1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
    - E2E test: `core/tests/suite/read_file.rs` validates the tool returns
    the requested lines.
    
    ## Next steps:
    - Decompose `handle_container_exec_with_params` 
    - Add parallel tool calls
  • Use supports_color in codex exec (#4633)
    It knows how to detect github actions
  • Separate interactive and non-interactive sessions (#4612)
    Do not show exec session in VSCode/TUI selector.
  • Support CODEX_API_KEY for codex exec (#4615)
    Allows to set API key per invocation of `codex exec`
  • Remove legacy codex exec --json format (#4525)
    `codex exec --json` now maps to the behavior of `codex exec
    --experimental-json` with new event and item shapes.
    
    Thread events:
    - thread.started
    - turn.started
    - turn.completed
    - turn.failed
    - item.started
    - item.updated
    - item.completed
    
    Item types: 
    - assistant_message
    - reasoning
    - command_execution
    - file_change
    - mcp_tool_call
    - web_search
    - todo_list
    - error
    
    Sample output:
    
    <details>
    `codex exec "list my assigned github issues"  --json | jq`
    
    ```
    {
      "type": "thread.started",
      "thread_id": "01999ce5-f229-7661-8570-53312bd47ea3"
    }
    {
      "type": "turn.started"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_0",
        "item_type": "reasoning",
        "text": "**Planning to list assigned GitHub issues**"
      }
    }
    {
      "type": "item.started",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "in_progress"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_2",
        "item_type": "reasoning",
        "text": "**Organizing final message structure**"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_3",
        "item_type": "assistant_message",
        "text": "**Assigned Issues**\n- openai/codex#3267 – “stream error: stream disconnected before completion…” (bug) – last update 2025-09-08\n- openai/codex#3257 – “You've hit your usage limit. Try again in 4 days 20 hours 9 minutes.” – last update 2025-09-23\n- openai/codex#3054 – “reqwest SSL panic (library has no ciphers)” (bug) – last update 2025-09-03\n- openai/codex#3051 – “thread 'main' panicked at linux-sandbox/src/linux_run_main.rs:53:5:” (bug) – last update 2025-09-10\n- openai/codex#3004 – “Auto-compact when approaching context limit” (enhancement) – last update 2025-09-26\n- openai/codex#2916 – “Feature request: Add OpenAI service tier support for cost optimization” – last update 2025-09-12\n- openai/codex#1581 – “stream error: stream disconnected before completion: stream closed before response.complete; retrying...” (bug) – last update 2025-09-17"
      }
    }
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 34785,
        "cached_input_tokens": 12544,
        "output_tokens": 560
      }
    }
    ```
    
    </details>
  • Set originator for codex exec (#4485)
    Distinct from the main CLI.
  • OpenTelemetry events (#2103)
    ### Title
    
    ## otel
    
    Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
    that
    describe each run: outbound API requests, streamed responses, user
    input,
    tool-approval decisions, and the result of every tool invocation. Export
    is
    **disabled by default** so local runs remain self-contained. Opt in by
    adding an
    `[otel]` table and choosing an exporter.
    
    ```toml
    [otel]
    environment = "staging"   # defaults to "dev"
    exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
    log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
    ```
    
    Codex tags every exported event with `service.name = "codex-cli"`, the
    CLI
    version, and an `env` attribute so downstream collectors can distinguish
    dev/staging/prod traffic. Only telemetry produced inside the
    `codex_otel`
    crate—the events listed below—is forwarded to the exporter.
    
    ### Event catalog
    
    Every event shares a common set of metadata fields: `event.timestamp`,
    `conversation.id`, `app.version`, `auth_mode` (when available),
    `user.account_id` (when available), `terminal.type`, `model`, and
    `slug`.
    
    With OTEL enabled Codex emits the following event types (in addition to
    the
    metadata above):
    
    - `codex.api_request`
      - `cf_ray` (optional)
      - `attempt`
      - `duration_ms`
      - `http.response.status_code` (optional)
      - `error.message` (failures)
    - `codex.sse_event`
      - `event.kind`
      - `duration_ms`
      - `error.message` (failures)
      - `input_token_count` (completion only)
      - `output_token_count` (completion only)
      - `cached_token_count` (completion only, optional)
      - `reasoning_token_count` (completion only, optional)
      - `tool_token_count` (completion only)
    - `codex.user_prompt`
      - `prompt_length`
      - `prompt` (redacted unless `log_user_prompt = true`)
    - `codex.tool_decision`
      - `tool_name`
      - `call_id`
    - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
      - `source` (`config` or `user`)
    - `codex.tool_result`
      - `tool_name`
      - `call_id`
      - `arguments`
      - `duration_ms` (execution time for the tool)
      - `success` (`"true"` or `"false"`)
      - `output`
    
    ### Choosing an exporter
    
    Set `otel.exporter` to control where events go:
    
    - `none` – leaves instrumentation active but skips exporting. This is
    the
      default.
    - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
    Specify the
      endpoint, protocol, and headers your collector expects:
    
      ```toml
      [otel]
      exporter = { otlp-http = {
        endpoint = "https://otel.example.com/v1/logs",
        protocol = "binary",
        headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
      }}
      ```
    
    - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
    and any
      metadata headers:
    
      ```toml
      [otel]
      exporter = { otlp-grpc = {
        endpoint = "https://otel.example.com:4317",
        headers = { "x-otlp-meta" = "abc123" }
      }}
      ```
    
    If the exporter is `none` nothing is written anywhere; otherwise you
    must run or point to your
    own collector. All exporters run on a background batch worker that is
    flushed on
    shutdown.
    
    If you build Codex from source the OTEL crate is still behind an `otel`
    feature
    flag; the official prebuilt binaries ship with the feature enabled. When
    the
    feature is disabled the telemetry hooks become no-ops so the CLI
    continues to
    function without the extra dependencies.
    
    ---------
    
    Co-authored-by: Anton Panasenko <apanasenko@openai.com>
  • Add turn started/completed events and correct exit code on error (#4309)
    Adds new event for session completed that includes usage. Also ensures
    we return 1 on failures.
    ```
    {
      "type": "session.created",
      "session_id": "019987a7-93e7-7b20-9e05-e90060e411ea"
    }
    {
      "type": "turn.started"
    }
    ...
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 78913,
        "cached_input_tokens": 65280,
        "output_tokens": 1099
      }
    }
    ```
  • Add explicit codex exec events (#4177)
    This pull request add a new experimental format of JSON output.
    
    You can try it using `codex exec --experimental-json`.
    
    Design takes a lot of inspiration from Responses API items and stream
    format.
    
    # Session and items
    Each invocation of `codex exec` starts or resumes a session. 
    
    Session contains multiple high-level item types:
    1. Assistant message 
    2. Assistant thinking 
    3. Command execution 
    4. File changes
    5. To-do lists
    6. etc.
    
    # Events 
    Session and items are going through their life cycles which is
    represented by events.
    
    Session is `session.created` or `session.resumed`
    Items are `item.added`, `item.updated`, `item.completed`,
    `item.require_approval` (or other item types like `item.output_delta`
    when we need streaming).
    
    So a typical session can look like:
    
    <details>
    
    ```
    {
      "type": "session.created",
      "session_id": "01997dac-9581-7de3-b6a0-1df8256f2752"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_0",
        "item_type": "assistant_message",
        "text": "I’ll locate the top-level README and remove its first line. Then I’ll show a quick summary of what changed."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_1",
        "item_type": "command_execution",
        "command": "bash -lc ls -la | sed -n '1,200p'",
        "aggregated_output": "pyenv: cannot rehash: /Users/pakrym/.pyenv/shims isn't writable\ntotal 192\ndrwxr-xr-x@  33 pakrym  staff   1056 Sep 24 14:36 .\ndrwxr-xr-x   41 pakrym  staff   1312 Sep 24 09:17 ..\n-rw-r--r--@   1 pakrym  staff      6 Jul  9 16:16 .codespellignore\n-rw-r--r--@   1 pakrym  staff    258 Aug 13 09:40 .codespellrc\ndrwxr-xr-x@   5 pakrym  staff    160 Jul 23 08:26 .devcontainer\n-rw-r--r--@   1 pakrym  staff   6148 Jul 22 10:03 .DS_Store\ndrwxr-xr-x@  15 pakrym  staff    480 Sep 24 14:38 .git\ndrwxr-xr-x@  12 pakrym  staff    384 Sep  2 16:00 .github\n-rw-r--r--@   1 pakrym  staff    778 Jul  9 16:16 .gitignore\ndrwxr-xr-x@   3 pakrym  staff     96 Aug 11 09:37 .husky\n-rw-r--r--@   1 pakrym  staff    104 Jul  9 16:16 .npmrc\n-rw-r--r--@   1 pakrym  staff     96 Sep  2 08:52 .prettierignore\n-rw-r--r--@   1 pakrym  staff    170 Jul  9 16:16 .prettierrc.toml\ndrwxr-xr-x@   5 pakrym  staff    160 Sep 14 17:43 .vscode\ndrwxr-xr-x@   2 pakrym  staff     64 Sep 11 11:37 2025-09-11\n-rw-r--r--@   1 pakrym  staff   5505 Sep 18 09:28 AGENTS.md\n-rw-r--r--@   1 pakrym  staff     92 Sep  2 08:52 CHANGELOG.md\n-rw-r--r--@   1 pakrym  staff   1145 Jul  9 16:16 cliff.toml\ndrwxr-xr-x@  11 pakrym  staff    352 Sep 24 13:03 codex-cli\ndrwxr-xr-x@  38 pakrym  staff   1216 Sep 24 14:38 codex-rs\ndrwxr-xr-x@  18 pakrym  staff    576 Sep 23 11:01 docs\n-rw-r--r--@   1 pakrym  staff   2038 Jul  9 16:16 flake.lock\n-rw-r--r--@   1 pakrym  staff   1434 Jul  9 16:16 flake.nix\n-rw-r--r--@   1 pakrym  staff  10926 Jul  9 16:16 LICENSE\ndrwxr-xr-x@ 465 pakrym  staff  14880 Jul 15 07:36 node_modules\n-rw-r--r--@   1 pakrym  staff    242 Aug  5 08:25 NOTICE\n-rw-r--r--@   1 pakrym  staff    578 Aug 14 12:31 package.json\n-rw-r--r--@   1 pakrym  staff    498 Aug 11 09:37 pnpm-lock.yaml\n-rw-r--r--@   1 pakrym  staff     58 Aug 11 09:37 pnpm-workspace.yaml\n-rw-r--r--@   1 pakrym  staff   2402 Jul  9 16:16 PNPM.md\n-rw-r--r--@   1 pakrym  staff   4393 Sep 12 14:36 README.md\ndrwxr-xr-x@   4 pakrym  staff    128 Sep 18 09:28 scripts\ndrwxr-xr-x@   2 pakrym  staff     64 Sep 11 11:34 tmp\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_2",
        "item_type": "reasoning",
        "text": "**Reviewing README.md file**\n\nI've located the README.md file at the root, and it’s 4393 bytes. Now, I need to remove the first line, but first, I should check its content to make sure I’m patching it correctly. I’ll use sed to display the first 20 lines. By reviewing those lines, I can determine exactly what needs to be removed before I proceed with the editing. Let's do this carefully!"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_3",
        "item_type": "command_execution",
        "command": "bash -lc sed -n '1,40p' README.md",
        "aggregated_output": "<h1 align=\"center\">OpenAI Codex CLI</h1>\n\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n  <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n  </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n```shell\nnpm install -g @openai/codex\n```\n\nAlternatively, if you use Homebrew:\n\n```shell\nbrew install codex\n```\n\nThen simply run `codex` to get started:\n\n```shell\ncodex\n```\n\n<details>\n<summary>You can also go to the <a href=\"https://github.com/openai/codex/releases/latest\">latest GitHub Release</a> and download the appropriate binary for your platform.</summary>\n\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_4",
        "item_type": "reasoning",
        "text": "**Updating README file**\n\nThe user wants to update the README by removing the first line, which is an HTML h1 tag. I'll proceed to delete that line while keeping the rest intact. To do this, I need to use the apply_patch tool and create a patch in the diff format. I’ll take care to include context lines around the change. After removing the header, I’m considering whether to delete the following blank line as well to keep it tidy."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_5",
        "item_type": "reasoning",
        "text": "**Crafting the patch**\n\nThe user has asked to remove just the first line, without mentioning the blank lines. Technically, leaving a blank line at the start is acceptable, but it's not ideal. Ideally, I'd prefer to remove the first line while maintaining the subsequent blank line. However, if I can also remove that blank line, it could tidy things up even more. Still, since the user's request is strictly to remove only the first line, I'll stick to that and craft the patch accordingly."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_6",
        "item_type": "assistant_message",
        "text": "Applying the change by removing the first line from `README.md`."
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_7",
        "item_type": "file_change",
        "changes": [
          {
            "path": "/Users/pakrym/code/codex/README.md",
            "kind": "update"
          }
        ],
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_8",
        "item_type": "reasoning",
        "text": "**Validating README Update**\n\nI updated the README.md and now I need to validate that the first line has been successfully removed. To do this, I'll check the first 10 lines of the document to ensure everything looks correct. It’s always good to double-check these kinds of changes, so I’m going to take that extra step to make sure the update was executed properly. Let's see what those initial lines say!"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_9",
        "item_type": "command_execution",
        "command": "bash -lc sed -n '1,20p' README.md",
        "aggregated_output": "\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n  <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n  </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n",
        "exit_code": 0,
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "itm_10",
        "item_type": "assistant_message",
        "text": "- Updated `README.md:1` by removing the opening H1 line.\n- The README now starts with the install instructions paragraph.\n\nWant me to also remove the leading blank line at the top?"
      }
    }
    ```
    
    </details>
    
    The idea is to give users fully formatted items they can use directly in
    their rendering/application logic and avoid having them building up
    items manually based on events (unless they want to for streaming).
    
    This PR implements only the `item.completed` payload for some event
    types, more event types and item types to come.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • [exec] add include-plan-tool flag and print it nicely (#3461)
    ### Summary
    Sometimes in exec runs, we want to allow the model to use the
    `update_plan` tool, but that's not easily configurable. This change adds
    a feature flag for this, and formats the output so it's human-readable
    
    ## Test Plan
    <img width="1280" height="354" alt="Screenshot 2025-09-11 at 12 39
    44 AM"
    src="https://github.com/user-attachments/assets/72e11070-fb98-47f5-a784-5123ca7333d9"
    />
  • Add exec output-schema parameter (#4079)
    Adds structured output to `exec` via the `--structured-output`
    parameter.
  • enable-resume (#3537)
    Adding the ability to resume conversations.
    we have one verb `resume`. 
    
    Behavior:
    
    `tui`:
    `codex resume`: opens session picker
    `codex resume --last`: continue last message
    `codex resume <session id>`: continue conversation with `session id`
    
    `exec`:
    `codex resume --last`: continue last conversation
    `codex resume <session id>`: continue conversation with `session id`
    
    Implementation:
    - I added a function to find the path in `~/.codex/sessions/` with a
    `UUID`. This is helpful in resuming with session id.
    - Added the above mentioned flags
    - Added lots of testing
  • Review Mode (Core) (#3401)
    ## 📝 Review Mode -- Core
    
    This PR introduces the Core implementation for Review mode:
    
    - New op `Op::Review { prompt: String }:` spawns a child review task
    with isolated context, a review‑specific system prompt, and a
    `Config.review_model`.
    - `EnteredReviewMode`: emitted when the child review session starts.
    Every event from this point onwards reflects the review session.
    - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
    finishes or is interrupted, with optional structured findings:
    
    ```json
    {
      "findings": [
        {
          "title": "<≤ 80 chars, imperative>",
          "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
          "confidence_score": <float 0.0-1.0>,
          "priority": <int 0-3>,
          "code_location": {
            "absolute_file_path": "<file path>",
            "line_range": {"start": <int>, "end": <int>}
          }
        }
      ],
      "overall_correctness": "patch is correct" | "patch is incorrect",
      "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
      "overall_confidence_score": <float 0.0-1.0>
    }
    ```
    
    ## Questions
    
    ### Why separate out its own message history?
    
    We want the review thread to match the training of our review models as
    much as possible -- that means using a custom prompt, removing user
    instructions, and starting a clean chat history.
    
    We also want to make sure the review thread doesn't leak into the parent
    thread.
    
    ### Why do this as a mode, vs. sub-agents?
    
    1. We want review to be a synchronous task, so it's fine for now to do a
    bespoke implementation.
    2. We're still unclear about the final structure for sub-agents. We'd
    prefer to land this quickly and then refactor into sub-agents without
    rushing that implementation.
  • Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189)
    This PR does the following:
    * Adds the ability to paste or type an API key.
    * Removes the `preferred_auth_method` config option. The last login
    method is always persisted in auth.json, so this isn't needed.
    * If OPENAI_API_KEY env variable is defined, the value is used to
    prepopulate the new UI. The env variable is otherwise ignored by the
    CLI.
    * Adds a new MCP server entry point "login_api_key" so we can implement
    this same API key behavior for the VS Code extension.
    <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
    src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
    />
    <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
    src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
    />
  • Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388)
    The previous config approach had a few issues:
    1. It is part of the config but not designed to be used externally
    2. It had to be wired through many places (look at the +/- on this PR
    3. It wasn't guaranteed to be set consistently everywhere because we
    don't have a super well defined way that configs stack. For example, the
    extension would configure during newConversation but anything that
    happened outside of that (like login) wouldn't get it.
    
    This env var approach is cleaner and also creates one less thing we have
    to deal with when coming up with a better holistic story around configs.
    
    One downside is that I removed the unit test testing for the override
    because I don't want to deal with setting the global env or spawning
    child processes and figuring out how to introspect their originator
    header. The new code is sufficiently simple and I tested it e2e that I
    feel as if this is still worth it.
  • Never store requests (#3212)
    When item ids are sent to Responses API it will load them from the
    database ignoring the provided values. This adds extra latency.
    
    Not having the mode to store requests also allows us to simplify the
    code.
    
    ## Breaking change
    
    The `disable_response_storage` configuration option is removed.
  • Add a common way to create HTTP client (#3110)
    Ensure User-Agent and originator are always sent.
  • Move CodexAuth and AuthManager to the core crate (#3074)
    Fix a long standing layering issue.
  • Add "View Image" tool (#2723)
    Adds a "View Image" tool so Codex can find and see images by itself:
    
    <img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40
    04 AM"
    src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03"
    />
  • Add web search tool (#2371)
    Adds web_search tool, enabling the model to use Responses API web_search
    tool.
    - Disabled by default, enabled by --search flag
    - When --search is passed, exposes web_search_request function tool to
    the model, which triggers user approval. When approved, the model can
    use the web_search tool for the remainder of the turn
    <img width="1033" height="294" alt="image"
    src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f"
    />
    
    ---------
    
    Co-authored-by: easong-openai <easong@openai.com>