Commit Graph

453 Commits

  • Separate interactive and non-interactive sessions (#4612)
    Do not show exec session in VSCode/TUI selector.
  • fix: handle JSON Schema in additionalProperties for MCP tools (#4454)
    Fixes #4176
    
    Some common tools provide a schema (even if just an empty object schema)
    as the value for `additionalProperties`. The parsing as it currently
    stands fails when it encounters this. This PR updates the schema to
    accept a schema object in addition to a boolean value, per the JSON
    Schema spec.
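
    The boolean-or-schema shape described above can be sketched as a small
    sum type. This is an illustration only: `Json`, `AdditionalProperties`,
    and `parse_additional_properties` are hypothetical names, and the `Json`
    type is a drastically reduced stand-in for a real JSON value.

    ```rust
    /// Hypothetical, heavily reduced JSON value used only for this sketch.
    #[derive(Debug, Clone, PartialEq)]
    enum Json {
        Bool(bool),
        Object(Vec<(String, Json)>),
        Other,
    }

    /// `additionalProperties` is either a boolean or a nested schema object.
    #[derive(Debug, Clone, PartialEq)]
    enum AdditionalProperties {
        Boolean(bool),
        Schema(Vec<(String, Json)>),
    }

    /// Accepts both forms; anything else is reported as an error.
    fn parse_additional_properties(value: &Json) -> Result<AdditionalProperties, String> {
        match value {
            Json::Bool(b) => Ok(AdditionalProperties::Boolean(*b)),
            // An empty object `{}` is a valid (permissive) schema.
            Json::Object(fields) => Ok(AdditionalProperties::Schema(fields.clone())),
            Json::Other => {
                Err("additionalProperties must be a boolean or a schema object".to_string())
            }
        }
    }
    ```

    With this shape, an empty object schema parses to `Schema(vec![])`
    instead of failing.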
  • Support CODEX_API_KEY for codex exec (#4615)
    Allows setting the API key per invocation of `codex exec`.
  • Include request ID in the error message (#4572)
    To help with issue debugging
    <img width="1414" height="253" alt="image"
    src="https://github.com/user-attachments/assets/254732df-44ac-4252-997a-6c5e0927355b"
    />
  • canonicalize display of Agents.md paths on Windows. (#4577)
    Canonicalize path on Windows to 
    - remove unattractive path prefixes such as `\\?\`
    - simplify it (`../AGENTS.md` vs
    `C:\Users\iceweasel\code\coded\Agents.md`)
    before: <img width="1110" height="45" alt="Screenshot 2025-10-01 123520"
    src="https://github.com/user-attachments/assets/48920ae6-d89c-41b8-b4ea-df5c18fb5fad"
    />
    
    after: 
    <img width="585" height="46" alt="Screenshot 2025-10-01 123612"
    src="https://github.com/user-attachments/assets/70a1761a-9d97-4836-b14c-670b6f13e608"
    />
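
    A minimal sketch of the prefix removal, assuming the path is already
    available as a string. `display_path` is a hypothetical helper, not the
    function added by this commit, and it does not attempt the
    relative-path simplification.

    ```rust
    /// Removes the Windows verbatim prefix (`\\?\`) for display purposes only.
    /// Verbatim UNC paths (`\\?\UNC\server\share`) are mapped back to `\\server\share`.
    fn display_path(path: &str) -> String {
        match path.strip_prefix(r"\\?\") {
            Some(rest) if rest.starts_with(r"UNC\") => format!(r"\\{}", &rest[4..]),
            Some(rest) => rest.to_string(),
            // No verbatim prefix: show the path unchanged.
            None => path.to_string(),
        }
    }
    ```

    The result is meant for rendering only; code that opens the file should
    keep using the original path.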
  • Fall back to configured instruction files if AGENTS.md isn't available (#4544)
    Allow users to configure an agents.md alternative to consume, but warn
    the user it may degrade model performance.
    
    Fixes #4376
  • implement command safety for PowerShell commands (#4269)
    Implement command safety for PowerShell commands on Windows
    
    This change adds a new Windows-specific command-safety module under
    `codex-rs/core/src/command_safety/windows_safe_commands.rs` to strictly
    sanitise PowerShell invocations. Key points:
    
    - Introduce `is_safe_command_windows()` to only allow explicitly
    read-only PowerShell calls.
    - Parse and split PowerShell invocations (including inline `-Command`
    scripts and pipelines).
    - Block unsafe switches (`-File`, `-EncodedCommand`, `-ExecutionPolicy`,
    unknown flags, call operators, redirections, separators).
    - Whitelist only read-only cmdlets (`Get-ChildItem`, `Get-Content`,
    `Select-Object`, etc.), safe Git subcommands (`status`, `log`, `show`,
    `diff`, `cat-file`), and ripgrep without unsafe options.
    - Add comprehensive unit tests covering allowed and rejected command
    patterns (nested calls, side effects, chaining, redirections).
    
    This ensures Codex on Windows can safely execute discover-only
    PowerShell workflows without risking destructive operations.
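
    The allowlist idea can be illustrated with a much smaller check than the
    real module performs. In this sketch, `READ_ONLY_CMDLETS` and
    `is_read_only_pipeline` are hypothetical, the cmdlet list is only the
    subset named above, and switch validation, nested calls, and the Git and
    ripgrep rules are left out.

    ```rust
    /// Hypothetical allowlist: only the cmdlets named in the commit message.
    const READ_ONLY_CMDLETS: &[&str] = &["get-childitem", "get-content", "select-object"];

    /// Returns true only when the script has no separators, call operators, or
    /// redirections, and every pipeline stage starts with an allowlisted cmdlet.
    fn is_read_only_pipeline(script: &str) -> bool {
        if script.chars().any(|c| matches!(c, ';' | '&' | '>' | '<')) {
            return false;
        }
        script.split('|').all(|stage| {
            match stage.split_whitespace().next() {
                // PowerShell cmdlet names are case-insensitive.
                Some(cmd) => READ_ONLY_CMDLETS.contains(&cmd.to_ascii_lowercase().as_str()),
                // An empty stage (or empty script) is rejected.
                None => false,
            }
        })
    }
    ```

    The default answer is "unsafe": anything the check does not recognize is
    rejected rather than allowed.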
  • chore: sandbox extraction (#4286)
    # Extract and Centralize Sandboxing
    - Goal: Improve safety and clarity by centralizing sandbox planning and
      execution.
    - Approach:
      - Add planner (ExecPlan) and backend registry (Direct/Seatbelt/Linux)
        with run_with_plan.
      - Refactor codex.rs to plan-then-execute; handle failures/escalation via
        the plan.
      - Delegate apply_patch to the codex binary and run it with an empty env
        for determinism.
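
    A rough sketch of the plan-then-execute split follows. The names
    `ExecPlan`, `Direct`/`Seatbelt`/`Linux`, and `run_with_plan` come from the
    description above; the fields, the `plan` function, and the selection
    rule are assumptions made for illustration, and `run_with_plan` here only
    describes the run instead of spawning a process.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum Backend {
        Direct,
        Seatbelt,
        Linux,
    }

    /// Hypothetical plan: which backend runs which command.
    #[derive(Debug, PartialEq)]
    struct ExecPlan {
        backend: Backend,
        command: Vec<String>,
    }

    /// Planning step: decide the backend up front, before anything executes.
    fn plan(command: Vec<String>, sandboxed: bool, os: &str) -> Result<ExecPlan, String> {
        let backend = match (sandboxed, os) {
            (false, _) => Backend::Direct,
            (true, "macos") => Backend::Seatbelt,
            (true, "linux") => Backend::Linux,
            // Assumption: refuse rather than silently run unsandboxed.
            (true, other) => return Err(format!("no sandbox backend for {other}")),
        };
        Ok(ExecPlan { backend, command })
    }

    /// Execution step: a stand-in that reports what would be run.
    fn run_with_plan(plan: &ExecPlan) -> String {
        format!("{:?}: {}", plan.backend, plan.command.join(" "))
    }
    ```

    A caller could pass `std::env::consts::OS` as the `os` argument and
    handle escalation by building a second plan when the first one fails.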
  • fix: remove mcp-types from app server protocol (#4537)
    We continue the separation between `codex app-server` and `codex
    mcp-server`.
    
    In particular, we introduce a new crate, `codex-app-server-protocol`,
    and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
    `codex-rs/app-server-protocol/src/protocol.rs`.
    
    Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
    into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
    because it is referenced in a ton of places, we have to touch a lot of
    files as part of this PR.
    
    We also decide to get away from proper JSON-RPC 2.0 semantics, so we
    also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
    is basically the same `JSONRPCMessage` type defined in `mcp-types`
    except with all of the `"jsonrpc": "2.0"` removed.
    
    Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
    considerably simpler, as we can lean more heavily on serde to serialize
    directly into the wire format that we use now.
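
    To make the wire shape concrete, here is a std-only sketch of a message
    without the `"jsonrpc": "2.0"` member. It is not the crate's code (which,
    per the text above, relies on serde): `Message` and `to_wire` are
    hypothetical, `params` is assumed to be pre-serialized JSON, and method
    names are assumed to need no escaping.

    ```rust
    /// Hypothetical reduced message type; the real `JSONRPCMessage` has more variants.
    enum Message {
        Request { id: u64, method: String, params: String },
        Notification { method: String, params: String },
    }

    /// Serializes a message with no `"jsonrpc"` member in the output.
    fn to_wire(msg: &Message) -> String {
        match msg {
            Message::Request { id, method, params } => {
                format!(r#"{{"id":{id},"method":"{method}","params":{params}}}"#)
            }
            Message::Notification { method, params } => {
                format!(r#"{{"method":"{method}","params":{params}}}"#)
            }
        }
    }
    ```

    A request with id `1`, method `ping`, and params `{}` serializes to
    `{"id":1,"method":"ping","params":{}}`.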
  • Add cloud tasks (#3197)
    Adds a TUI for managing, applying, and creating cloud tasks
  • Set originator for codex exec (#4485)
    Distinct from the main CLI.
  • [Core]: add tail in the rollout data (#4461)
    This will help us show the conversation tail and last updated timestamp.
  • Parse out frontmatter for custom prompts (#4456)
    [Cherry picked from https://github.com/openai/codex/pull/3565]
    
    Removes the frontmatter description/args from custom prompt files and
    only includes body.
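
    One way to drop the frontmatter and keep only the body is sketched
    below, assuming the frontmatter is delimited by `---` lines at the very
    top of the file. `strip_frontmatter` is a hypothetical helper, not the
    function from this commit.

    ```rust
    /// Returns the prompt body with a leading `---` ... `---` block removed.
    /// Files without a complete frontmatter block are returned unchanged.
    fn strip_frontmatter(content: &str) -> &str {
        let mut lines = content.split_inclusive('\n');
        let Some(first) = lines.next() else {
            return content;
        };
        if first.trim_end() != "---" {
            return content;
        }
        // Track the byte offset just past each consumed line.
        let mut offset = first.len();
        for line in lines {
            offset += line.len();
            if line.trim_end() == "---" {
                return &content[offset..];
            }
        }
        // No closing delimiter: treat the whole file as body.
        content
    }
    ```

    Using `trim_end` on each delimiter line keeps the check tolerant of
    `\r\n` line endings.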
  • OpenTelemetry events (#2103)
    ## otel
    
    Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
    that describe each run: outbound API requests, streamed responses, user
    input, tool-approval decisions, and the result of every tool invocation.
    Export is **disabled by default** so local runs remain self-contained. Opt
    in by adding an `[otel]` table and choosing an exporter.
    
    ```toml
    [otel]
    environment = "staging"   # defaults to "dev"
    exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
    log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
    ```
    
    Codex tags every exported event with `service.name = "codex-cli"`, the CLI
    version, and an `env` attribute so downstream collectors can distinguish
    dev/staging/prod traffic. Only telemetry produced inside the `codex_otel`
    crate (the events listed below) is forwarded to the exporter.
    
    ### Event catalog
    
    Every event shares a common set of metadata fields: `event.timestamp`,
    `conversation.id`, `app.version`, `auth_mode` (when available),
    `user.account_id` (when available), `terminal.type`, `model`, and
    `slug`.
    
    With OTEL enabled, Codex emits the following event types (in addition to
    the metadata above):
    
    - `codex.api_request`
      - `cf_ray` (optional)
      - `attempt`
      - `duration_ms`
      - `http.response.status_code` (optional)
      - `error.message` (failures)
    - `codex.sse_event`
      - `event.kind`
      - `duration_ms`
      - `error.message` (failures)
      - `input_token_count` (completion only)
      - `output_token_count` (completion only)
      - `cached_token_count` (completion only, optional)
      - `reasoning_token_count` (completion only, optional)
      - `tool_token_count` (completion only)
    - `codex.user_prompt`
      - `prompt_length`
      - `prompt` (redacted unless `log_user_prompt = true`)
    - `codex.tool_decision`
      - `tool_name`
      - `call_id`
      - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
      - `source` (`config` or `user`)
    - `codex.tool_result`
      - `tool_name`
      - `call_id`
      - `arguments`
      - `duration_ms` (execution time for the tool)
      - `success` (`"true"` or `"false"`)
      - `output`
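
    The `codex.user_prompt` redaction rule can be sketched as follows. This
    is an assumption-laden illustration: `user_prompt_attributes` is a
    hypothetical helper, the `[REDACTED]` placeholder is invented for the
    example, and `prompt_length` is assumed to count characters.

    ```rust
    /// Returns (`prompt_length`, `prompt`) as they might be attached to the event.
    /// The prompt text is only included when `log_user_prompt` is true.
    fn user_prompt_attributes(prompt: &str, log_user_prompt: bool) -> (usize, String) {
        let logged = if log_user_prompt {
            prompt.to_string()
        } else {
            // Hypothetical placeholder; the real redaction marker may differ.
            "[REDACTED]".to_string()
        };
        (prompt.chars().count(), logged)
    }
    ```

    The length is still reported when the text is redacted, so collectors
    can track prompt size without seeing prompt content.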
    
    ### Choosing an exporter
    
    Set `otel.exporter` to control where events go:
    
    - `none` – leaves instrumentation active but skips exporting. This is the
      default.
    - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify
      the endpoint, protocol, and headers your collector expects:
    
      ```toml
      [otel]
      exporter = { otlp-http = {
        endpoint = "https://otel.example.com/v1/logs",
        protocol = "binary",
        headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
      }}
      ```
    
    - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
      and any metadata headers:
    
      ```toml
      [otel]
      exporter = { otlp-grpc = {
        endpoint = "https://otel.example.com:4317",
        headers = { "x-otlp-meta" = "abc123" }
      }}
      ```
    
    If the exporter is `none`, nothing is written anywhere; otherwise you must
    run or point to your own collector. All exporters run on a background
    batch worker that is flushed on shutdown.
    
    If you build Codex from source, the OTEL crate is still behind an `otel`
    feature flag; the official prebuilt binaries ship with the feature
    enabled. When the feature is disabled, the telemetry hooks become no-ops so
    the CLI continues to function without the extra dependencies.
    
    ---------
    
    Co-authored-by: Anton Panasenko <apanasenko@openai.com>
  • [MCP] Add experimental support for streamable HTTP MCP servers (#4317)
    This PR adds support for streamable HTTP MCP servers when the
    `experimental_use_rmcp_client` is enabled.
    
    To set one up, simply add a new mcp server config with the url:
    ```
    [mcp_servers.figma]
    url = "http://127.0.0.1:3845/mcp"
    ```
    
    It also supports an optional `bearer_token` which will be provided in an
    authorization header. The full oauth flow is not supported yet.
    
    The config parsing will throw if it detects that the user mixed and
    matched config fields (like command + bearer token or url + env).
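
    The "no mixing" rule can be pictured as resolving a raw config entry into
    exactly one transport. Everything here is a sketch under assumptions:
    `RawServerConfig`, `TransportConfig`, and `resolve_transport` are
    hypothetical names, and the error messages are invented.

    ```rust
    /// Hypothetical raw view of one `[mcp_servers.<name>]` table.
    #[derive(Default)]
    struct RawServerConfig {
        command: Option<String>,
        env: Option<Vec<(String, String)>>,
        url: Option<String>,
        bearer_token: Option<String>,
    }

    #[derive(Debug, PartialEq)]
    enum TransportConfig {
        Stdio { command: String, env: Vec<(String, String)> },
        StreamableHttp { url: String, bearer_token: Option<String> },
    }

    /// Rejects entries that mix stdio fields with streamable-HTTP fields.
    fn resolve_transport(raw: RawServerConfig) -> Result<TransportConfig, String> {
        match (raw.command, raw.url) {
            (Some(_), Some(_)) => Err("specify either `command` or `url`, not both".to_string()),
            (None, None) => Err("one of `command` or `url` is required".to_string()),
            (Some(command), None) => {
                if raw.bearer_token.is_some() {
                    return Err("`bearer_token` is only valid with `url`".to_string());
                }
                Ok(TransportConfig::Stdio { command, env: raw.env.unwrap_or_default() })
            }
            (None, Some(url)) => {
                if raw.env.is_some() {
                    return Err("`env` is only valid with `command`".to_string());
                }
                Ok(TransportConfig::StreamableHttp { url, bearer_token: raw.bearer_token })
            }
        }
    }
    ```

    Failing at parse time means a misconfigured server is reported before any
    connection attempt is made.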
    
    The best way to review it is to review `core/src` and then
    `rmcp-client/src/rmcp_client.rs` first. The rest is tests and
    propagating the `Transport` struct around the codebase.
    
    Example with the Figma MCP:
    <img width="5084" height="1614" alt="CleanShot 2025-09-26 at 13 35 40"
    src="https://github.com/user-attachments/assets/eaf2771e-df3e-4300-816b-184d7dec5a28"
    />
  • reject dangerous commands for AskForApproval::Never (#4307)
    If we detect a dangerous command but approval_policy is Never, simply
    reject the command.
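
    As a sketch of that decision, assuming the danger check has already run:
    `ApprovalPolicy`, `Outcome`, and `decide` are hypothetical stand-ins (only
    the `Never` policy is named above), and `Proceed` means "continue with the
    normal handling", not "run unconditionally".

    ```rust
    /// Hypothetical stand-in for the approval policy; only `Never` matters here.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ApprovalPolicy {
        Never,
        Other,
    }

    #[derive(Debug, PartialEq)]
    enum Outcome {
        Proceed,
        AskUser,
        Reject,
    }

    /// A dangerous, not-yet-approved command is rejected under `Never`,
    /// because there is no one to ask; under other policies the user is asked.
    fn decide(dangerous: bool, already_approved: bool, policy: ApprovalPolicy) -> Outcome {
        if !dangerous || already_approved {
            return Outcome::Proceed;
        }
        match policy {
            ApprovalPolicy::Never => Outcome::Reject,
            ApprovalPolicy::Other => Outcome::AskUser,
        }
    }
    ```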
  • /status followup (#4304)
    - Render `send a message to load usage data` at the beginning of the
    session
    - Render `data not available yet` if no rate limits were received
    - nit case
    - Deleted stale snapshots that were moved to
    `codex-rs/tui/src/status/snapshots`
  • [MCP] Introduce an experimental official rust sdk based mcp client (#4252)
    The [official Rust
    SDK](https://github.com/modelcontextprotocol/rust-sdk/tree/57fc428c578a1a3fe851ee0838bf068bda120eb3)
    has come a long way since we first started our mcp client implementation
    5 months ago and, today, it is much more complete than our own
    stdio-only implementation.
    
    This PR introduces a new config flag `experimental_use_rmcp_client`
    which will use a new mcp client powered by the sdk instead of our own.
    
    To keep this PR simple, I've only implemented the same stdio MCP
    functionality that we had but will expand on it with future PRs.
    
    ---------
    
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • fix: token usage for compaction (#4281)
    Emit token usage update when draining compaction
  • ref: state - 2 (#4229)
    Extract tasks into a module and start abstracting them behind a trait
    (more to come on this, but each task will be tackled in a dedicated PR).
    The goal is to drop `ActiveTask` and, potentially, to allow a set of
    tasks during each turn.
  • core: add potentially dangerous command check (#4211)
    Certain shell commands are potentially dangerous, and we want to check
    for them.
    Unless the user has explicitly approved a command, we will *always* ask
    them for approval
    when one of these commands is encountered, regardless of whether they
    are in a sandbox, or what their approval policy is.
    
    The first (of probably many) such examples is `git reset --hard`. We
    will be conservative and check for any `git reset`.
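
    The conservative `git reset` check could look roughly like this, assuming
    the command is already split into arguments. `is_potentially_dangerous` is
    a hypothetical name, and the sketch covers only this first example.

    ```rust
    /// Flags any `git reset ...` invocation, not just `git reset --hard`.
    fn is_potentially_dangerous(argv: &[&str]) -> bool {
        matches!(argv, ["git", "reset", ..])
    }
    ```

    Matching on the first two arguments is deliberately broad: it also flags
    `git reset --soft`, trading a few extra prompts for safety.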
  • Fixed login failure with API key in IDE extension when a .codex directory doesn't exist (#4258)
    This addresses bug #4092
    
    Testing:
    * Confirmed error occurs prior to fix if logging in using API key and no
    `~/.codex` directory exists
    * Confirmed after fix that `~/.codex` directory is properly created and
    error doesn't occur
  • fix (#4251)
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • Fix error message (#4204)
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
  • chore: refactor attempt_stream_responses() out of stream_responses() (#4194)
    I would like to be able to swap in a different way to resolve model
    sampling requests, so this refactoring consolidates things behind
    `attempt_stream_responses()` to make that easier. Ideally, we would
    support an in-memory backend that we can use in our integration tests,
    for example.
  • ref: full state refactor (#4174)
    ## Current State Observations
    - `Session` currently holds many unrelated responsibilities (history,
    approval queues, task handles, rollout recorder, shell discovery, token
    tracking, etc.), making it hard to reason about ownership and lifetimes.
    - The anonymous `State` struct inside `codex.rs` mixes session-long data
    with turn-scoped queues and approval bookkeeping.
    - Turn execution (`run_task`) relies on ad-hoc local variables that
    should conceptually belong to a per-turn state object.
    - External modules (`codex::compact`, tests) frequently poke the raw
    `Session.state` mutex, which couples them to implementation details.
    - Interrupts, approvals, and rollout persistence all have bespoke
    cleanup paths, contributing to subtle bugs when a turn is aborted
    mid-flight.
    
    ## Desired End State
    - Keep a slim `Session` object that acts as the orchestrator and façade.
    It should expose a focused API (submit, approvals, interrupts, event
    emission) without storing unrelated fields directly.
    - Introduce a `state` module that encapsulates all mutable data
    structures:
      - `SessionState`: session-persistent data (history, approved commands,
      token/rate-limit info, maybe user preferences).
      - `ActiveTurn`: metadata for the currently running turn (sub-id, task
      kind, abort handle) and an `Arc<TurnState>`.
      - `TurnState`: all turn-scoped pieces (pending inputs, approval waiters,
      diff tracker, review history, auto-compact flags, last agent message,
      outstanding tool call bookkeeping).
    - Group long-lived helpers/managers into a dedicated `SessionServices`
    struct so `Session` does not accumulate "random" fields.
    - Provide clear, lock-safe APIs so other modules never touch raw
    mutexes.
    - Ensure every turn creates/drops a `TurnState` and that
    interrupts/finishes delegate cleanup to it.
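
    The desired end state can be sketched with a few small types. The type
    names `Session`, `SessionState`, `ActiveTurn`, and `TurnState` come from
    the plan above; the fields are reduced to one or two each, and every
    method shown is a hypothetical example of a "lock-safe API", not the
    actual interface.

    ```rust
    use std::sync::{Arc, Mutex};

    #[derive(Default)]
    struct SessionState {
        history: Vec<String>, // session-persistent data
    }

    #[derive(Default)]
    struct TurnState {
        pending_input: Mutex<Vec<String>>, // turn-scoped data
    }

    struct ActiveTurn {
        sub_id: String,
        turn_state: Arc<TurnState>,
    }

    struct Session {
        state: Mutex<SessionState>,
        active_turn: Mutex<Option<ActiveTurn>>,
    }

    impl Session {
        fn new() -> Self {
            Session { state: Mutex::new(SessionState::default()), active_turn: Mutex::new(None) }
        }

        /// Every turn gets a fresh `TurnState`.
        fn start_turn(&self, sub_id: &str) {
            let turn = ActiveTurn { sub_id: sub_id.to_string(), turn_state: Arc::new(TurnState::default()) };
            *self.active_turn.lock().unwrap() = Some(turn);
        }

        /// Queues input for the running turn; returns false when no turn is active.
        fn queue_input(&self, input: &str) -> bool {
            let active = self.active_turn.lock().unwrap();
            match active.as_ref() {
                Some(turn) => {
                    turn.turn_state.pending_input.lock().unwrap().push(input.to_string());
                    true
                }
                None => false,
            }
        }

        fn pending_count(&self) -> usize {
            let active = self.active_turn.lock().unwrap();
            match active.as_ref() {
                Some(turn) => {
                    let n = turn.turn_state.pending_input.lock().unwrap().len();
                    n
                }
                None => 0,
            }
        }

        /// Ends the turn: turn-scoped state is dropped, only a summary is kept.
        fn finish_turn(&self, summary: &str) {
            if let Some(turn) = self.active_turn.lock().unwrap().take() {
                self.state.lock().unwrap().history.push(format!("{}: {}", turn.sub_id, summary));
            }
        }

        fn history(&self) -> Vec<String> {
            self.state.lock().unwrap().history.clone()
        }
    }
    ```

    Callers only see methods such as `queue_input` and `finish_turn`; the
    mutexes stay private to `Session`, which is the point of the refactor.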
  • chore: drop unused values from env_flags (#4188)
    For the most part, we try to avoid environment variables in favor of
    config options so the environment variables do not leak into child
    processes. These environment variables are no longer honored, so let's
    delete them to be clear.
    
    Ultimately, I would also like to eliminate `CODEX_RS_SSE_FIXTURE` in
    favor of something cleaner.
  • adds a windows-specific method to check if a command is safe (#4119)
    Refactors the command_safety files into their own package so we can add
    platform-specific ones.
    Also creates a Windows-specific version of `is_known_safe_command` that
    always returns false, since that is what happens today.
  • Simplify tool implementations (#4160)
    Use Result<String, FunctionCallError> for all tool handling code and
    rely on error propagation instead of creating failed items everywhere.
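
    A small sketch of the pattern: handlers return
    `Result<String, FunctionCallError>` and use `?` instead of building a
    failed item at each error site. The error variants, the `echo` tool, and
    both function names are hypothetical; only the result type comes from the
    description above.

    ```rust
    /// Hypothetical variants; the real error type may be shaped differently.
    #[derive(Debug, PartialEq)]
    enum FunctionCallError {
        MissingArgument(String),
        UnknownTool(String),
    }

    /// Looks up a required argument, returning an error when it is absent.
    fn required_arg<'a>(args: &'a [(&'a str, &'a str)], key: &str) -> Result<&'a str, FunctionCallError> {
        args.iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| *v)
            .ok_or_else(|| FunctionCallError::MissingArgument(key.to_string()))
    }

    /// Tool dispatch: errors propagate with `?` up to a single conversion point.
    fn handle_tool(name: &str, args: &[(&str, &str)]) -> Result<String, FunctionCallError> {
        match name {
            "echo" => {
                let text = required_arg(args, "text")?;
                Ok(text.to_string())
            }
            other => Err(FunctionCallError::UnknownTool(other.to_string())),
        }
    }
    ```

    The caller can then turn an `Err` into a failed response item in one
    place rather than in every handler.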
  • chore: upgrade to Rust 1.90 (#4124)
    Inspired by Dependabot's attempt to do this:
    https://github.com/openai/codex/pull/4029
    
    The new version of Clippy found some unused structs that are removed in
    this PR.
    
    Though nothing stood out to me in the Release Notes in terms of things
    we should start to take advantage of:
    https://blog.rust-lang.org/2025/09/18/Rust-1.90.0/.
  • nit: 350k tokens (#4156)
    Set the gpt-5-codex auto-compaction threshold to 350k tokens and update
    the comments to describe it better.
  • Add Reset in for rate limits (#4111)
    - Parse the headers
    - Reorganize the struct because it's getting too long
    - Show the reset time in the TUI
    
    <img width="324" height="79" alt="image"
    src="https://github.com/user-attachments/assets/ca15cd48-f112-4556-91ab-1e3a9bc4683d"
    />
  • nit: drop instruction override for auto-compact (#4137)
    Drop the instruction override for auto-compact: it is unused, and it is
    dangerous because it invalidates the cache.
  • nit: update auto compact to 250k (#4135)
    update auto compact for gpt-5-codex to 250k
  • Send limits when getting rate limited (#4102)
    Users need visibility on rate limits when they are rate limited.
  • Add exec output-schema parameter (#4079)
    Adds structured output to `exec` via the `--structured-output`
    parameter.
  • chore: compact do not modify instructions (#4088)
    Keep the developer instruction and insert the summarisation message as a
    user message instead
  • chore: enable auto-compaction for gpt-5-codex (#4093)
    enable auto-compaction for `gpt-5-codex` at 220k tokens
  • Add notifier tests (#4064)
    Proposal:
    1. Use anyhow for tests and avoid unwrap
    2. Extract a helper for starting a test instance of codex
  • feat: update default (#4076)
    Changes:
    - Default model and docs now use gpt-5-codex. 
    - Disables the GPT-5 Codex NUX by default.
    - Keeps presets available for API key users.
  • Truncate potentially long user messages in compact message. (#4068)
    If a prior user message is massive, any future `/compact` task would
    fail because we're verbatim copying the user message into the new chat.
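
    One possible shape for such a truncation, assuming a byte budget per
    message: `truncate_for_compact` and the ` [truncated]` marker are
    hypothetical, and the actual limit used by the fix is not stated here.

    ```rust
    /// Cuts a message to at most `max_bytes` bytes of original text, backing up
    /// to a UTF-8 character boundary, and appends a marker when something was cut.
    fn truncate_for_compact(message: &str, max_bytes: usize) -> String {
        if message.len() <= max_bytes {
            return message.to_string();
        }
        let mut end = max_bytes;
        // Index 0 is always a boundary, so this loop terminates.
        while !message.is_char_boundary(end) {
            end -= 1;
        }
        format!("{} [truncated]", &message[..end])
    }
    ```

    Backing up to a character boundary avoids the panic that slicing a
    `&str` in the middle of a multi-byte character would cause.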