Commit Graph

30 Commits

  • feat: change ConfigLayerName into a disjoint union rather than a simple enum (#8095)
    This attempts to tighten up the types related to "config layers."
    Currently, `ConfigLayerEntry` is defined as follows:
    
    
    https://github.com/openai/codex/blob/bef36f4ae765f471d7cd69372fcf1b92c8f0367a/codex-rs/core/src/config_loader/state.rs#L19-L25
    
    but the `source` field is a bit of a lie, as:
    
    - for `ConfigLayerName::Mdm`, it is
    `"com.openai.codex/config_toml_base64"`
    - for `ConfigLayerName::SessionFlags`, it is `"--config"`
    - for `ConfigLayerName::User`, it is `"config.toml"` (just the file
    name, not the path to the `config.toml` on disk that was read)
    - for `ConfigLayerName::System`, it seems like it is usually
    `/etc/codex/managed_config.toml` in practice, though on Windows, it is
    `%CODEX_HOME%/managed_config.toml`:
    
    
    https://github.com/openai/codex/blob/bef36f4ae765f471d7cd69372fcf1b92c8f0367a/codex-rs/core/src/config_loader/layer_io.rs#L84-L101
    
    All that is to say, in three out of the four `ConfigLayerName`, `source`
    is a `PathBuf` that is not an absolute path (or even a true path).
    
    This PR tries to uplevel things by eliminating `source` from
    `ConfigLayerEntry` and turning `ConfigLayerName` into a disjoint union
    named `ConfigLayerSource` that has the appropriate metadata for each
    variant, favoring the use of `AbsolutePathBuf` where appropriate:
    
    ```rust
    pub enum ConfigLayerSource {
        /// Managed preferences layer delivered by MDM (macOS only).
        #[serde(rename_all = "camelCase")]
        #[ts(rename_all = "camelCase")]
        Mdm { domain: String, key: String },
        /// Managed config layer from a file (usually `managed_config.toml`).
        #[serde(rename_all = "camelCase")]
        #[ts(rename_all = "camelCase")]
        System { file: AbsolutePathBuf },
        /// Session-layer overrides supplied via `-c`/`--config`.
        SessionFlags,
        /// User config layer from a file (usually `config.toml`).
        #[serde(rename_all = "camelCase")]
        #[ts(rename_all = "camelCase")]
        User { file: AbsolutePathBuf },
    }
    ```
  • fix: introduce AbsolutePathBuf as part of sandbox config (#7856)
    Changes the `writable_roots` field of the `WorkspaceWrite` variant of
    the `SandboxPolicy` enum from `Vec<PathBuf>` to `Vec<AbsolutePathBuf>`.
    This is helpful because now callers can be sure the value is an absolute
    path rather than a relative one. (Though when using an absolute path in
    a Seatbelt config policy, we still have to _canonicalize_ it first.)
    
    Because `writable_roots` can be read from a config file, it is important
    that we are able to resolve relative paths properly using the parent
    folder of the config file as the base path.
  • seatbelt: allow openpty() (#7507)
    This allows `openpty(3)` to run in the default sandbox. Also permit
    reading `kern.argmax`, which is the maximum number of arguments to
    exec().
  • codex-exec: allow resume --last to read prompt #6717 (#6719)
    ### Description
    
    - codex exec --json resume --last "<prompt>" bailed out because clap
    treated the prompt as SESSION_ID. I removed the conflicts_with flag and
    reinterpret that positional as a prompt when
    --last is set, so the flow now keeps working in JSON mode.
    (codex-rs/exec/src/cli.rs:84-104, codex-rs/exec/src/lib.rs:75-130)
    - Added a regression test that exercises resume --last in JSON mode to
    ensure the prompt is accepted and the rollout file is updated.
    (codex-rs/exec/tests/suite/resume.rs:126-178)
    
    ### Testing
    
      - just fmt
      - cargo test -p codex-exec
      - just fix -p codex-exec
      - cargo test -p codex-exec
    
    #6717
    
    Signed-off-by: Dmitri Khokhlov <dkhokhlov@cribl.io>
  • Update defaults to gpt-5.1 (#6652)
    ## Summary
    - update documentation, example configs, and automation defaults to
    reference gpt-5.1 / gpt-5.1-codex
    - bump the CLI and core configuration defaults, model presets, and error
    messaging to the new models while keeping the model-family/tool coverage
    for legacy slugs
    - refresh tests, fixtures, and TUI snapshots so they expect the upgraded
    defaults
    
    ## Testing
    - `cargo test -p codex-core
    config::tests::test_precedence_fixture_with_gpt5_profile`
    
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)
  • feat: Add support for --add-dir to exec and TypeScript SDK (#6565)
    ## Summary
    
    Adds support for specifying additional directories in the TypeScript SDK
    through a new `additionalDirectories` option in `ThreadOptions`.
    
    ## Changes
    
    - Added `additionalDirectories` parameter to `ThreadOptions` interface
    - Updated `CodexExec` to accept and pass through additional directories
    via the `--config` flag for `sandbox_workspace_write.writable_roots`
    - Added comprehensive test coverage for the new functionality
    
    ## Test plan
    
    - Added test case that verifies `additionalDirectories` is correctly
    passed as repeated flags
    - Existing tests continue to pass
    
    ---------
    
    Co-authored-by: Claude <noreply@anthropic.com>
  • feat(tui): clarify Windows auto mode requirements (#5568)
    ## Summary
    - Coerce Windows `workspace-write` configs back to read-only, surface
    the forced downgrade in the approvals popup,
      and funnel users toward WSL or Full Access.
    - Add WSL installation instructions to the Auto preset on Windows while
    keeping the preset available for other
      platforms.
    - Skip the trust-on-first-run prompt on native Windows so new folders
    remain read-only without additional
      confirmation.
    - Expose a structured sandbox policy resolution from config to flag
    Windows downgrades and adjust tests (core,
    exec, TUI) to reflect the new behavior; provide a Windows-only approvals
    snapshot.
    
      ## Testing
      - cargo fmt
    - cargo test -p codex-core
    config::tests::add_dir_override_extends_workspace_writable_roots
    - cargo test -p codex-exec
    suite::resume::exec_resume_preserves_cli_configuration_overrides
    - cargo test -p codex-tui
    chatwidget::tests::approvals_selection_popup_snapshot
    - cargo test -p codex-tui
    approvals_popup_includes_wsl_note_for_auto_mode
      - cargo test -p codex-tui windows_skips_trust_prompt
      - just fix -p codex-core
      - just fix -p codex-tui
  • chore: drop approve all (#5503)
    Not needed anymore
  • Set codex SDK TypeScript originator (#4894)
    ## Summary
    - ensure the TypeScript SDK sets CODEX_INTERNAL_ORIGINATOR_OVERRIDE to
    codex_sdk_ts when spawning the Codex CLI
    - extend the responses proxy test helper to capture request headers for
    assertions
    - add coverage that verifies Codex threads launched from the TypeScript
    SDK send the codex_sdk_ts originator header
    
    ## Testing
    - Not Run (not requested)
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e561b125248320a487f129093d16e7
  • Simplify request body assertions (#4845)
    We'll have a lot more test like these
  • Use response helpers when mounting SSE test responses (#4783)
    ## Summary
    - replace manual wiremock SSE mounts in the compact suite with the
    shared response helpers
    - simplify the exec auth_env integration test by using the
    mount_sse_once_match helper
    - rely on mount_sse_sequence plus server request collection to replace
    the bespoke SeqResponder utility in tests
    
    ## Testing
    - just fmt
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e2e238f2a88320a337f0b9e4098093
  • Add helper for response created SSE events in tests (#4758)
    ## Summary
    - add a reusable `ev_response_created` helper that builds
    `response.created` SSE events for integration tests
    - update the exec and core integration suites to use the new helper
    instead of repeating manual JSON literals
    - keep the streaming fixtures consistent by relying on the shared helper
    in every touched test
    
    ## Testing
    - `just fmt`
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61
  • feat: codex exec writes only the final message to stdout (#4644)
    This updates `codex exec` so that, by default, most of the agent's
    activity is written to stderr so that only the final agent message is
    written to stdout. This makes it easier to pipe `codex exec` into
    another tool without extra filtering.
    
    I introduced `#![deny(clippy::print_stdout)]` to help enforce this
    change and renamed the `ts_println!()` macro to `ts_msg()` because (1)
    it no longer calls `println!()` and (2), `ts_eprintln!()` seemed too
    long of a name.
    
    While here, this also adds `-o` as an alias for `--output-last-message`.
    
    Fixes https://github.com/openai/codex/issues/1670
  • Support CODEX_API_KEY for codex exec (#4615)
    Allows to set API key per invocation of `codex exec`
  • OpenTelemetry events (#2103)
    ### Title
    
    ## otel
    
    Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
    that
    describe each run: outbound API requests, streamed responses, user
    input,
    tool-approval decisions, and the result of every tool invocation. Export
    is
    **disabled by default** so local runs remain self-contained. Opt in by
    adding an
    `[otel]` table and choosing an exporter.
    
    ```toml
    [otel]
    environment = "staging"   # defaults to "dev"
    exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
    log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
    ```
    
    Codex tags every exported event with `service.name = "codex-cli"`, the
    CLI
    version, and an `env` attribute so downstream collectors can distinguish
    dev/staging/prod traffic. Only telemetry produced inside the
    `codex_otel`
    crate—the events listed below—is forwarded to the exporter.
    
    ### Event catalog
    
    Every event shares a common set of metadata fields: `event.timestamp`,
    `conversation.id`, `app.version`, `auth_mode` (when available),
    `user.account_id` (when available), `terminal.type`, `model`, and
    `slug`.
    
    With OTEL enabled Codex emits the following event types (in addition to
    the
    metadata above):
    
    - `codex.api_request`
      - `cf_ray` (optional)
      - `attempt`
      - `duration_ms`
      - `http.response.status_code` (optional)
      - `error.message` (failures)
    - `codex.sse_event`
      - `event.kind`
      - `duration_ms`
      - `error.message` (failures)
      - `input_token_count` (completion only)
      - `output_token_count` (completion only)
      - `cached_token_count` (completion only, optional)
      - `reasoning_token_count` (completion only, optional)
      - `tool_token_count` (completion only)
    - `codex.user_prompt`
      - `prompt_length`
      - `prompt` (redacted unless `log_user_prompt = true`)
    - `codex.tool_decision`
      - `tool_name`
      - `call_id`
    - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
      - `source` (`config` or `user`)
    - `codex.tool_result`
      - `tool_name`
      - `call_id`
      - `arguments`
      - `duration_ms` (execution time for the tool)
      - `success` (`"true"` or `"false"`)
      - `output`
    
    ### Choosing an exporter
    
    Set `otel.exporter` to control where events go:
    
    - `none` – leaves instrumentation active but skips exporting. This is
    the
      default.
    - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
    Specify the
      endpoint, protocol, and headers your collector expects:
    
      ```toml
      [otel]
      exporter = { otlp-http = {
        endpoint = "https://otel.example.com/v1/logs",
        protocol = "binary",
        headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
      }}
      ```
    
    - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
    and any
      metadata headers:
    
      ```toml
      [otel]
      exporter = { otlp-grpc = {
        endpoint = "https://otel.example.com:4317",
        headers = { "x-otlp-meta" = "abc123" }
      }}
      ```
    
    If the exporter is `none` nothing is written anywhere; otherwise you
    must run or point to your
    own collector. All exporters run on a background batch worker that is
    flushed on
    shutdown.
    
    If you build Codex from source the OTEL crate is still behind an `otel`
    feature
    flag; the official prebuilt binaries ship with the feature enabled. When
    the
    feature is disabled the telemetry hooks become no-ops so the CLI
    continues to
    function without the extra dependencies.
    
    ---------
    
    Co-authored-by: Anton Panasenko <apanasenko@openai.com>
  • Add turn started/completed events and correct exit code on error (#4309)
    Adds new event for session completed that includes usage. Also ensures
    we return 1 on failures.
    ```
    {
      "type": "session.created",
      "session_id": "019987a7-93e7-7b20-9e05-e90060e411ea"
    }
    {
      "type": "turn.started"
    }
    ...
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 78913,
        "cached_input_tokens": 65280,
        "output_tokens": 1099
      }
    }
    ```
  • Add codex exec testing helpers (#4254)
    Add a shortcut to create working directories and run codex exec with
    fake server.
  • make tests pass cleanly in sandbox (#4067)
    This changes the reqwest client used in tests to be sandbox-friendly,
    and skips a bunch of other tests that don't work inside the
    sandbox/without network.
  • Add exec output-schema parameter (#4079)
    Adds structured output to `exec` via the `--structured-output`
    parameter.
  • chore: clippy on redundant closure (#4058)
    Add redundant closure clippy rules and let Codex fix it by minimising
    FQP
  • Use helpers instead of fixtures (#3888)
    Move to using test helper method everywhere.
  • fix: ensure cwd for conversation and sandbox are separate concerns (#3874)
    Previous to this PR, both of these functions take a single `cwd`:
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/seatbelt.rs#L19-L25
    
    
    https://github.com/openai/codex/blob/71038381aa0f51aa62e1a2bcc7cbf26a05b141f3/codex-rs/core/src/landlock.rs#L16-L23
    
    whereas `cwd` and `sandbox_cwd` should be set independently (fixed in
    this PR).
    
    Added `sandbox_distinguishes_command_and_policy_cwds()` to
    `codex-rs/exec/tests/suite/sandbox.rs` to verify this.
  • enable-resume (#3537)
    Adding the ability to resume conversations.
    we have one verb `resume`. 
    
    Behavior:
    
    `tui`:
    `codex resume`: opens session picker
    `codex resume --last`: continue last message
    `codex resume <session id>`: continue conversation with `session id`
    
    `exec`:
    `codex resume --last`: continue last conversation
    `codex resume <session id>`: continue conversation with `session id`
    
    Implementation:
    - I added a function to find the path in `~/.codex/sessions/` with a
    `UUID`. This is helpful in resuming with session id.
    - Added the above mentioned flags
    - Added lots of testing
  • chore: enable clippy::redundant_clone (#3489)
    Created this PR by:
    
    - adding `redundant_clone` to `[workspace.lints.clippy]` in
    `cargo-rs/Cargol.toml`
    - running `cargo clippy --tests --fix`
    - running `just fmt`
    
    Though I had to clean up one instance of the following that resulted:
    
    ```rust
    let codex = codex;
    ```
  • chore: require uninlined_format_args from clippy (#2845)
    - added `uninlined_format_args` to `[workspace.lints.clippy]` in the
    `Cargo.toml` for the workspace
    - ran `cargo clippy --tests --fix`
    - ran `just fmt`
  • [exec] Clean up apply-patch tests (#2648)
    ## Summary
    These tests were getting a bit unwieldy, and they're starting to become
    load-bearing. Let's clean them up, and get them working solidly so we
    can easily expand this harness with new tests.
    
    ## Test Plan
    - [x] Tests continue to pass
  • test: faster test execution in codex-core (#2633)
    this dramatically improves time to run `cargo test -p codex-core` (~25x
    speedup).
    
    before:
    ```
    cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
    ```
    
    after:
    ```
    cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
    ```
    
    both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
    to exclude compile times.
    
    approach inspired by [Delete Cargo Integration
    Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
    we move all test cases in tests/ into a single suite in order to have a
    single binary, as there is significant overhead for each test binary
    executed, and because test execution is only parallelized with a single
    binary.