34 Commits

  • [codex] Record exec-server lifecycle metrics (#27467)
    ## Summary
    
    - Record bounded connection, request, and process lifecycle metrics.
    - Report active gauges from callbacks on every collection, including
    delta exports.
    - Serialize active-count updates so concurrent starts and finishes
    cannot publish stale values.
    - Serialize process exit, explicit termination, and shutdown through the
    process registry so exactly one completion result wins.
    - Keep the implementation small with single-owner RAII guards and one
    real OTLP/HTTP integration test using the existing `wiremock`
    dependency.
    
    ## Root cause
    
    Process exit and session shutdown previously used cloned completion
    state. That avoided duplicate emission, but it duplicated lifecycle
    ownership and made the ordering harder to reason about. The process
    registry mutex already defines the lifecycle ordering, so the final
    implementation stores the metric guard and termination flag directly on
    the process entry. Whichever path claims the entry first owns the
    completion result.
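    
    The single-winner rule can be illustrated with a std-only sketch. `ProcessRegistry`, `ProcessEntry`, `Completion`, and `claim` are hypothetical names chosen for this example, not the exec-server types. The only point is that removing the entry while holding the registry mutex decides which path gets to report completion.
    
    ```rust
    use std::collections::HashMap;
    use std::sync::Mutex;
    
    /// Hypothetical completion reasons; the real exec-server types may differ.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Completion {
        Exited,
        Terminated,
        Shutdown,
    }
    
    /// Hypothetical per-process entry. In the PR the entry also owns the
    /// metric guard; here it only carries the termination flag.
    pub struct ProcessEntry {
        pub terminate_requested: bool,
    }
    
    #[derive(Default)]
    pub struct ProcessRegistry {
        entries: Mutex<HashMap<u32, ProcessEntry>>,
    }
    
    impl ProcessRegistry {
        pub fn insert(&self, id: u32) {
            let mut entries = self.entries.lock().expect("registry mutex poisoned");
            entries.insert(id, ProcessEntry { terminate_requested: false });
        }
    
        /// Removes the entry under the registry lock. Only the first caller
        /// gets `Some`, so only that caller may publish the completion result.
        pub fn claim(&self, id: u32, reason: Completion) -> Option<Completion> {
            let mut entries = self.entries.lock().expect("registry mutex poisoned");
            entries.remove(&id).map(|_entry| reason)
        }
    }
    ```
    
    In this sketch a process exit that calls `claim` first wins, and a later shutdown call for the same id gets `None` and emits nothing.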
    
    Production metric export uses delta temporality. Event-only synchronous
    gauge recordings disappear after the next collection when no count
    changes, so active counts now use observable callbacks that report
    current state on every collection.
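    
    A minimal sketch of the callback-based approach, using only the standard library. `ActiveCount`, `ActiveGuard`, and `observe` are hypothetical names, and an atomic counter stands in for whatever serialization the real implementation uses. The idea is that a collector calls `observe` on every collection, so the current value is reported even when nothing changed since the previous export.
    
    ```rust
    use std::sync::atomic::{AtomicI64, Ordering};
    use std::sync::Arc;
    
    /// Hypothetical shared counter behind one active gauge (for example, open
    /// connections).
    #[derive(Clone, Default)]
    pub struct ActiveCount(Arc<AtomicI64>);
    
    /// RAII guard: counts as active while alive and decrements on drop.
    pub struct ActiveGuard(Arc<AtomicI64>);
    
    impl ActiveCount {
        /// Marks one unit of work as active until the returned guard is dropped.
        pub fn start(&self) -> ActiveGuard {
            self.0.fetch_add(1, Ordering::SeqCst);
            ActiveGuard(Arc::clone(&self.0))
        }
    
        /// What an observable-gauge callback would report on each collection.
        pub fn observe(&self) -> i64 {
            self.0.load(Ordering::SeqCst)
        }
    }
    
    impl Drop for ActiveGuard {
        fn drop(&mut self) {
            self.0.fetch_sub(1, Ordering::SeqCst);
        }
    }
    ```
    
    Because the guard is the single owner of the decrement, an early return or panic unwinding still releases the count.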
    
    The cleanup also removes the constant `result="accepted"` connection
    tag, redundant route and response assertions, a custom HTTP collector,
    and fallback initialization machinery that did not add behavior.
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics **(this PR)**
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (158 passed)
    - `just test -p codex-cli --test exec_server` (3 passed)
    - `just test -p codex-otel
    observable_gauge_is_collected_on_every_delta_snapshot` (1 passed)
    - `CARGO_BUILD_JOBS=1 just fix -p codex-otel -p codex-exec-server`
    - `just fmt`
    - `git diff --check`
  • feat: use run agent task auth for inference (#19051)
    ## Stack
    
    This is PR 3 of the simplified HAI single-run-task stack:
    
    - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
    assertion and task-registration primitives, including the shared
    run-task helper used by existing Agent Identity JWT auth.
    - [#19049](https://github.com/openai/codex/pull/19049)
    Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
    Agent Identity runtime auth and its single run task.
    - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
    provider auth that uses one backend-owned task id for first-party
    inference and compaction requests.
    
    [#19054](https://github.com/openai/codex/pull/19054) was collapsed out
    of the active stack because the simplified design no longer needs a
    separate background/control-plane task helper.
    
    ## Summary
    
    This PR moves Agent Identity usage into provider auth resolution. That
    keeps `AgentAssertion` auth tied to first-party OpenAI provider requests
    instead of applying a late session-wide override that could affect
    local, custom, Bedrock, API-key, or external-bearer providers.
    
    What changed:
    
    - adds a small `ProviderAuthScope` struct carrying the run auth policy
    and session source needed by provider-scoped auth resolution
    - lets `Session` opt the existing `ModelClient` into `ChatGptAuth`
    policy when `use_agent_identity` is enabled, without adding a second
    model-client constructor
    - resolves Agent Identity only for first-party OpenAI provider auth
    paths
    - uses the persisted run task id from the `AgentIdentityAuth` record to
    build `AgentAssertion` auth for Responses requests
    - routes shared request setup through scoped provider auth so unary
    compact requests use the same run-task assertion path as inference turns
    - keeps local/custom/Bedrock/env-key/external-bearer provider auth
    unchanged
    - lets missing run-task state surface through the existing model-request
    error path instead of silently falling back to bearer auth
    
    This PR intentionally does not create thread-scoped, target-scoped, or
    background-scoped task identities. The run task is the only task Codex
    registers in this POC shape.
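    
    The resolution rule described above can be sketched as follows. Every name here (`ProviderKind`, `AuthScope`, `ResolvedAuth`, `resolve_auth`) is a hypothetical stand-in, not the actual `ProviderAuthScope` API. The sketch only shows the two properties the summary calls out: Agent Identity applies to first-party OpenAI providers only, and a missing run task id surfaces as an error instead of falling back to bearer auth.
    
    ```rust
    /// Hypothetical provider kinds; the real provider model is richer.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ProviderKind {
        FirstPartyOpenAi,
        Custom,
        Bedrock,
    }
    
    /// Hypothetical stand-in for the scope carried into auth resolution.
    #[derive(Debug, Clone)]
    pub struct AuthScope {
        pub use_agent_identity: bool,
        pub run_task_id: Option<String>,
    }
    
    #[derive(Debug, PartialEq, Eq)]
    pub enum ResolvedAuth {
        /// Assertion built from the persisted run task id.
        AgentAssertion { task_id: String },
        /// Whatever the provider used before; unchanged by this sketch.
        ProviderDefault,
    }
    
    pub fn resolve_auth(provider: ProviderKind, scope: &AuthScope) -> Result<ResolvedAuth, String> {
        if provider != ProviderKind::FirstPartyOpenAi || !scope.use_agent_identity {
            return Ok(ResolvedAuth::ProviderDefault);
        }
        match &scope.run_task_id {
            Some(task_id) => Ok(ResolvedAuth::AgentAssertion { task_id: task_id.clone() }),
            // No silent fallback to bearer auth: the caller sees the error.
            None => Err("agent identity is enabled but no run task id is persisted".to_string()),
        }
    }
    ```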
    
    ## Testing
    
    - `just test -p codex-model-provider`
    - `just test -p codex-core client::tests::provider_auth_scope_uses`
    - `just test -p codex-core remote_compact_uses_agent_identity_assertion`
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
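    
    For illustration, the before and after forms look roughly like this. The helper names are hypothetical and not taken from the diff. At an integration-test crate root the allowance would be the inner attribute `#![allow(clippy::expect_used)]`; the sketch uses outer attributes so it can be pasted anywhere.
    
    ```rust
    /// Verbose form of the kind the cleanup replaces.
    pub fn parse_port_verbose(raw: &str) -> u16 {
        raw.parse::<u16>()
            .unwrap_or_else(|err| panic!("port should parse: {}", err))
    }
    
    /// Equivalent form using `expect`; the error is still printed on panic.
    #[allow(clippy::expect_used)]
    pub fn parse_port(raw: &str) -> u16 {
        raw.parse::<u16>().expect("port should parse")
    }
    
    /// `expect_err` covers assertions on the failure path.
    #[allow(clippy::expect_used)]
    pub fn parse_port_error(raw: &str) -> std::num::ParseIntError {
        raw.parse::<u16>().expect_err("port should be rejected")
    }
    ```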
  • [codex] Add second-based OTEL duration histograms (#27058)
    ## Why
    
    Exec-server request and connection latencies need fractional-second
    histograms. The existing duration API records integer milliseconds and
    uses millisecond-scale buckets.
    
    ## What changed
    
    - Adds a described duration API that records `Duration` values as
    fractional seconds.
    - Uses second-scale explicit histogram boundaries.
    - Caches duration histograms by name, unit, and description, matching
    the existing instrument caching model.
    - Covers exact boundaries, representative bucket placement, fractional
    sums, and exported metadata.
    
    This PR only adds the duration primitive. It does not add exec-server
    adoption.
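    
    A small sketch of the two ideas, recording a `Duration` as fractional seconds and placing it in second-scale explicit buckets. The boundary values and function names are illustrative assumptions; the PR text does not list the actual boundaries. Bucket upper bounds are treated as inclusive, which is how OpenTelemetry explicit-bucket histograms define them.
    
    ```rust
    use std::time::Duration;
    
    /// Illustrative second-scale boundaries (assumed, not the PR's values).
    pub const BOUNDARIES_SECONDS: [f64; 11] =
        [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0];
    
    /// Converts a duration to fractional seconds instead of integer milliseconds.
    pub fn as_fractional_seconds(duration: Duration) -> f64 {
        duration.as_secs_f64()
    }
    
    /// Returns the bucket index for `seconds`. Bucket `i` covers
    /// `(boundaries[i - 1], boundaries[i]]`; index `boundaries.len()` is the
    /// overflow bucket.
    pub fn bucket_index(seconds: f64, boundaries: &[f64]) -> usize {
        boundaries.partition_point(|upper| *upper < seconds)
    }
    ```
    
    With these assumed boundaries, a 250 ms request lands in the bucket whose upper bound is `0.25`, and anything above 10 s falls into the overflow bucket.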
    
    ## Stack
    
    1. #26091: counter descriptions
    2. #27057: gauge instruments
    3. **#27058: second-based duration histograms**
    4. #25019: initialize exec-server OpenTelemetry at startup
    
    Related independent coverage: #27059 tests OTLP HTTP log and trace event
    export.
    
    ## Validation
    
    - `just test -p codex-otel`
  • [codex] Cover OTLP HTTP log and trace event export (#27059)
    ## Why
    
    The generic OTLP HTTP paths for log events and trace events need
    end-to-end coverage before exec-server relies on them.
    
    ## What changed
    
    - Adds loopback coverage for exporting `codex_otel.log_only` events to
    `/v1/logs`.
    - Verifies `codex_otel.trace_safe` events are present in the exported
    trace payload.
    
    This is a test-only PR. It does not change OTEL runtime behavior or
    metric APIs.
    
    ## Related work
    
    - #26091: counter descriptions
    - #27057: gauge instruments
    - #27058: second-based duration histograms
    
    This PR is independent and can land directly on `main`.
    
    ## Validation
    
    - `just test -p codex-otel`
    - `just fix -p codex-otel`
    - `just fmt`
  • [codex] Add reusable OTEL gauge instruments (#27057)
    ## Why
    
    Exec-server observability needs current-value measurements in addition
    to counters. The reusable OTEL client should expose that primitive
    without coupling it to exec-server runtime behavior.
    
    ## What changed
    
    - Adds integer gauge instruments, with optional descriptions.
    - Caches gauges by name and description so instrument metadata remains
    part of the declaration identity.
    - Covers gauge values, descriptions, merged attributes, and OTLP HTTP
    export.
    
    This PR only adds the gauge primitive. It does not add second-based
    duration histograms or exec-server adoption.
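    
    The caching rule can be sketched with a std-only stand-in. `Gauge` and `GaugeCache` are hypothetical types, and a real gauge would wrap an SDK instrument instead of an atomic. The sketch shows why the description is part of the cache key: two declarations with the same name but different descriptions resolve to different instruments.
    
    ```rust
    use std::collections::HashMap;
    use std::sync::atomic::{AtomicI64, Ordering};
    use std::sync::{Arc, Mutex};
    
    /// Hypothetical in-memory integer gauge.
    #[derive(Default)]
    pub struct Gauge(AtomicI64);
    
    impl Gauge {
        pub fn record(&self, value: i64) {
            self.0.store(value, Ordering::SeqCst);
        }
    
        pub fn value(&self) -> i64 {
            self.0.load(Ordering::SeqCst)
        }
    }
    
    /// Cache key: instrument name plus optional description.
    type GaugeKey = (String, Option<String>);
    
    #[derive(Default)]
    pub struct GaugeCache {
        gauges: Mutex<HashMap<GaugeKey, Arc<Gauge>>>,
    }
    
    impl GaugeCache {
        /// Returns the cached gauge for this declaration, creating it on first use.
        pub fn gauge(&self, name: &str, description: Option<&str>) -> Arc<Gauge> {
            let key = (name.to_string(), description.map(str::to_string));
            let mut gauges = self.gauges.lock().expect("gauge cache mutex poisoned");
            gauges.entry(key).or_default().clone()
        }
    }
    ```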
    
    ## Stack
    
    1. #26091: counter descriptions
    2. **#27057: gauge instruments**
    3. #27058: second-based duration histograms
    
    Related independent coverage: #27059 tests OTLP HTTP log and trace event
    export.
    
    ## Validation
    
    - `just test -p codex-otel`
    - `just fix -p codex-otel`
    - `just fmt`
  • [codex] Add OTEL counter descriptions (#26091)
    ## Why
    
    Metric descriptions should be declared with reusable OTEL instruments
    instead of being coupled to individual consumers. Counter descriptions
    are the smallest API primitive needed by the exec-server observability
    work.
    
    ## What changed
    
    - Adds `counter_with_description` while preserving the existing counter
    API.
    - Caches counters by name and description so instrument metadata remains
    part of the declaration identity.
    - Covers the exported description together with the existing value and
    attribute contract.
    
    This PR only adds counter descriptions. It does not add gauges,
    second-based durations, or exec-server adoption.
    
    ## Stack
    
    1. **#26091: counter descriptions**
    2. #27057: gauge instruments
    3. #27058: second-based duration histograms
    
    Related independent coverage: #27059 tests OTLP HTTP log and trace event
    export.
    
    The `codex-exec-server` bounded service tag now stays with the
    exec-server adoption change instead of this reusable infrastructure
    stack.
    
    ## Validation
    
    - `just test -p codex-otel`
    - `just fix -p codex-otel`
    - `just fmt`
  • otel: drop legacy profile usage telemetry (#24061)
    ## Summary
    - drop the dead legacy profile usage metric and active-profile
    conversation-start fields
    - update role comments so they describe provider and service-tier
    preservation without legacy config-profile wording
    - pair the code cleanup with the file-backed profile docs update in
    openai/developers-website#1476
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-otel`
    - `cargo test -p codex-core` *(fails: existing stack overflow in
    `mcp_tool_call::tests::guardian_mode_mcp_denial_returns_rationale_message`)*
    - `cargo test -p codex-core --lib
    mcp_tool_call::tests::guardian_mode_mcp_denial_returns_rationale_message`
    *(fails with the same stack overflow)*
  • Split plugin install discovery into list and request tools (#23372)
    ## Summary
    - Add `list_available_plugins_to_install` as the inventory step for
    plugin and connector install suggestions.
    - Slim `request_plugin_install` so it only handles the actual
    elicitation, instead of carrying the full discoverable list in its
    prompt.
    - Emit send-time telemetry when an install elicitation is dispatched,
    including requested tool identity in the event payload.
    - Emit install-result telemetry through `SessionTelemetry`, including
    tool type, user response action, and completion status.
    - Update registration and tests to cover the new two-step flow while
    keeping the existing `tool_suggest` feature gate unchanged.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core request_plugin_install`
    - `cargo test -p codex-core list_available_plugins_to_install`
    - `cargo test -p codex-core
    install_suggestion_tools_can_be_registered_without_search_tool`
    - `cargo test -p codex-otel
    manager_records_plugin_install_suggestion_metric`
    - `cargo test -p codex-otel
    manager_records_plugin_install_elicitation_sent_metric`
    - `just fix -p codex-core`
    - `just fix -p codex-tools`
    - `just fix -p codex-otel`
    - `cargo check -p codex-core`
  • Preserve image detail in app-server inputs (#20693)
    ## Summary
    
    - Add optional image detail to user image inputs across core, app-server
    v2, thread history/event mapping, and the generated app-server
    schemas/types.
    - Preserve requested detail when serializing Responses image inputs:
    omitted detail stays on the existing `high` default, while explicit
    `original` keeps local images on the original-resolution path.
    - Support `high`/`original` consistently for tool image outputs,
    including MCP `codex/imageDetail`, code-mode image helpers, and
    `view_image`.
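    
    The defaulting rule in the second bullet can be sketched as below. `ImageDetail`, `effective_detail`, and `wire_value` are hypothetical names, and only the two values named in the summary are modeled.
    
    ```rust
    /// Hypothetical detail levels; only the values named in the summary.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ImageDetail {
        High,
        Original,
    }
    
    /// Omitted detail stays on the existing `high` default.
    pub fn effective_detail(requested: Option<ImageDetail>) -> ImageDetail {
        requested.unwrap_or(ImageDetail::High)
    }
    
    /// String form assumed for serialized image inputs.
    pub fn wire_value(detail: ImageDetail) -> &'static str {
        match detail {
            ImageDetail::High => "high",
            ImageDetail::Original => "original",
        }
    }
    ```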
  • Simplify MCP tool handler plumbing (#21595)
    ## Why
    The MCP tool path had accumulated a few core-owned special cases: a
    dedicated payload variant, resolver plumbing, a legacy `AfterToolUse`
    translation path, and a side channel for parallel-call metadata. That
    made `ToolRegistry` and the spec builder know more about MCP than they
    needed to.
    
    This change moves MCP-specific execution details back onto `ToolInfo`
    and `McpHandler` so `codex-core` can treat MCP calls like normal
    function calls while still preserving MCP-specific dispatch and
    telemetry behavior where it belongs.
    
    ## What changed
    - removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and
    the remaining registry-side MCP resolver path
    - stored MCP routing metadata directly on `McpHandler` and `ToolInfo`,
    including `supports_parallel_tool_calls`
    - deleted the legacy `AfterToolUse` consumer in `core`, which removes
    the need for handler-specific `after_tool_use_payload` implementations
    - switched tool-result telemetry to handler-provided tags and kept
    MCP-specific dispatch payload construction inside the handler
    - simplified tool spec planning/building by passing `ToolInfo` directly
    and dropping the direct/deferred MCP wrapper structs and the
    parallel-server side table
    
    ## Testing
    - `cargo check -p codex-core -p codex-mcp -p codex-otel`
    - `cargo test -p codex-core
    mcp_parallel_support_uses_exact_payload_server`
    - `cargo test -p codex-core
    direct_mcp_tools_register_namespaced_handlers`
    - `cargo test -p codex-core
    search_tool_description_lists_each_mcp_source_once`
    - `cargo test -p codex-mcp
    list_all_tools_uses_startup_snapshot_while_client_is_pending`
    - `just fix -p codex-core -p codex-mcp -p codex-otel`
  • codex-otel: add configurable trace metadata (#21556)
    Add Codex config for static trace span attributes and structured W3C
    tracestate field upserts. The config flows through OtelSettings so
    callers can attach trace metadata without touching every span call site.
    
    Apply span attributes with an SDK span processor so every exported
    trace span carries the configured metadata. Model tracestate as nested
    member fields so configured keys can be upserted while unrelated
    propagated state in the same member is preserved.
    
    Validate configured tracestate before installing provider-global state,
    including header-unsafe values the SDK does not reject by itself. This
    keeps Codex from propagating malformed trace context from config.
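    
    A rough sketch of upsert plus validation, under two loudly stated assumptions: the nested encoding (`name:value` pairs joined by `;` inside one tracestate member) is invented for this example, and the function names are hypothetical. The character check follows the W3C tracestate rule that member values are printable ASCII without `,` or `=`.
    
    ```rust
    /// True when `value` is non-empty printable ASCII without `,` or `=` and
    /// does not end in a space, so it is safe inside a tracestate member.
    pub fn is_header_safe(value: &str) -> bool {
        !value.is_empty()
            && !value.ends_with(' ')
            && value
                .bytes()
                .all(|b| (0x20..=0x7e).contains(&b) && b != b',' && b != b'=')
    }
    
    /// True when `text` contains a separator of the assumed nested encoding.
    fn has_separator(text: &str) -> bool {
        text.contains(|c: char| c == ':' || c == ';')
    }
    
    /// Upserts `field` inside one member value, preserving unrelated fields.
    /// Assumes (hypothetically) `name:value` pairs joined by `;`.
    pub fn upsert_field(member_value: &str, field: &str, value: &str) -> Result<String, String> {
        if !is_header_safe(field) || !is_header_safe(value) {
            return Err(format!("tracestate field `{}` is not header-safe", field));
        }
        if has_separator(field) || has_separator(value) {
            return Err(format!("tracestate field `{}` contains a separator", field));
        }
        let mut fields: Vec<(String, String)> = member_value
            .split(';')
            .filter_map(|pair| pair.split_once(':'))
            .map(|(name, existing)| (name.to_string(), existing.to_string()))
            .collect();
        let position = fields.iter().position(|(name, _)| name.as_str() == field);
        match position {
            Some(index) => fields[index].1 = value.to_string(),
            None => fields.push((field.to_string(), value.to_string())),
        }
        let pairs: Vec<String> = fields
            .iter()
            .map(|(name, existing)| format!("{}:{}", name, existing))
            .collect();
        Ok(pairs.join(";"))
    }
    ```
    
    Rejecting the value before any state is installed mirrors the ordering described above: invalid config fails early instead of producing a malformed header later.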
    
    Update the config schema, public docs, and OTLP loopback coverage for
    config parsing, span export, propagation, and invalid-header rejection.
  • [codex] reduce module visibility (#16978)
    ## Summary
    - reduce public module visibility across Rust crates, preferring private
    or crate-private modules with explicit crate-root public exports
    - update external call sites and tests to use the intended public crate
    APIs instead of reaching through module trees
    - add the module visibility guideline to AGENTS.md
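    
    As a generic illustration of the pattern (the module and item names are made up, not taken from the diff), a private module with an explicit crate-root export looks like this:
    
    ```rust
    // Private module: external callers cannot reach through `limits::...`.
    mod limits {
        /// Public item, exported through the crate root below.
        pub struct RetryLimit(pub u32);
    
        /// Crate-internal helper that is never part of the public API.
        pub(crate) fn default_retry_limit() -> RetryLimit {
            RetryLimit(3)
        }
    }
    
    // Explicit crate-root export: the intended public path is `RetryLimit`.
    pub use limits::RetryLimit;
    
    /// Public entry point that hides where the default lives.
    pub fn retry_limit() -> RetryLimit {
        limits::default_retry_limit()
    }
    ```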
    
    ## Validation
    - `cargo check --workspace --all-targets --message-format=short` passed
    before the final fix/format pass
    - `just fix` completed successfully
    - `just fmt` completed successfully
    - `git diff --check` passed
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Add auth env observability (#14905)
    CXC-410 Emit Env Var Status with `/feedback` report
    
    Add more observability on top of #14611.
    
    [Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)
    
    [Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)
    
    
    ###### Summary
    - Adds auth-env telemetry that records whether key auth-related env
    overrides were present on session start and request paths.
    - Threads those auth-env fields through `/responses`, websocket, and
    `/models` telemetry and feedback metadata.
    - Buckets custom provider `env_key` configuration to a safe
    `"configured"` value instead of emitting raw config text.
    - Keeps the slice observability-only: no raw token values or raw URLs
    are emitted.
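    
    The presence-only contract can be sketched in a few lines. The function names are hypothetical and this is not the `AuthEnvTelemetry` implementation. It only shows the two behaviors the summary describes: record whether an env var is set without reading its value into telemetry, and collapse user-controlled `env_key` text to the fixed label `"configured"`.
    
    ```rust
    /// Presence-only view of an environment variable: the value is never
    /// returned or logged.
    pub fn env_var_present(name: &str) -> bool {
        std::env::var_os(name).is_some()
    }
    
    /// Buckets a user-controlled provider `env_key` setting to a fixed label
    /// so raw config text never reaches telemetry.
    pub fn bucket_env_key_name(configured: Option<&str>) -> Option<&'static str> {
        match configured {
            Some(name) if !name.trim().is_empty() => Some("configured"),
            _ => None,
        }
    }
    ```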
    
    ###### Rationale (from spec findings)
    - 401 and auth-path debugging needs a way to distinguish env-driven auth
    paths from sessions with no auth env override.
    - Startup and model-refresh failures need the same auth-env diagnostics
    as normal request failures.
    - Feedback and Sentry tags need the same auth-env signal as OTel events
    so reports can be triaged consistently.
    - Custom provider config is user-controlled text, so the telemetry
    contract must stay presence-only / bucketed.
    
    ###### Scope
    - Adds a small `AuthEnvTelemetry` bundle for env presence collection and
    threads it through the main request/session telemetry paths.
    - Does not add endpoint/base-url/provider-header/geo routing attribution
    or broader telemetry API redesign.
    
    ###### Trade-offs
    - `provider_env_key_name` is bucketed to `"configured"` instead of
    preserving the literal configured env var name.
    - `/models` is included because startup/model-refresh auth failures need
    the same diagnostics, but broader parity work remains out of scope.
    - This slice keeps the existing telemetry APIs and layers auth-env
    fields onto them rather than redesigning the metadata model.
    
    ###### Client follow-up
    - Add the separate endpoint/base-url attribution slice if routing-source
    diagnosis is still needed.
    - Add provider-header or residency attribution only if auth-env presence
    proves insufficient in real reports.
    - Revisit whether any additional auth-related env inputs need safe
    bucketing after more 401 triage data.
    
    ###### Testing
    - `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
    - `cargo test -p codex-core
    collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
    - `cargo test -p codex-core
    models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
    --nocapture`
    - `cargo test -p codex-otel
    otel_export_routing_policy_routes_api_request_auth_observability --
    --nocapture`
    - `cargo test -p codex-otel
    otel_export_routing_policy_routes_websocket_connect_auth_observability
    -- --nocapture`
    - `cargo test -p codex-otel
    otel_export_routing_policy_routes_websocket_request_transport_observability
    -- --nocapture`
    - `cargo test -p codex-core --no-run --message-format short`
    - `cargo test -p codex-otel --no-run --message-format short`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add auth 401 observability to client bug reports (#14611)
    CXC-392
    
      [With
      401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream)
      <img width="1909" height="555" alt="401 auth tags in Sentry"
      src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3"
      />
    
    
      - `auth_401_*`: facts preserved from the latest unauthorized response snapshot
      - `auth_*`: auth-related facts from the latest request attempt
      - `auth_recovery_*`: unauthorized recovery state and follow-up result
    
    
      Without 401
      <img width="1917" height="522" alt="happy-path auth tags in Sentry"
      src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f"
      />
    
      ###### Summary
      - Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation.
      - Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
      - Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
      - Preserve original unauthorized evidence under `auth_401_*` while keeping follow-up result tags separate.
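      
      The separation between preserved 401 evidence and follow-up results can be sketched as below. The types and the tag suffixes (`header_attached`, `request_id`) are hypothetical; only the `auth_` and `auth_401_` prefixes come from this description.
      
      ```rust
      use std::collections::BTreeMap;
      
      /// Hypothetical safe facts about one request attempt (no token values).
      #[derive(Debug, Clone)]
      pub struct AttemptFacts {
          pub status: u16,
          pub auth_header_attached: bool,
          pub request_id: Option<String>,
      }
      
      #[derive(Debug, Default)]
      pub struct AuthTags {
          latest: Option<AttemptFacts>,
          latest_unauthorized: Option<AttemptFacts>,
      }
      
      impl AuthTags {
          /// Every attempt updates `auth_*`; only 401 responses update the
          /// preserved `auth_401_*` snapshot.
          pub fn record(&mut self, attempt: AttemptFacts) {
              if attempt.status == 401 {
                  self.latest_unauthorized = Some(attempt.clone());
              }
              self.latest = Some(attempt);
          }
      
          pub fn to_tags(&self) -> BTreeMap<String, String> {
              let mut tags = BTreeMap::new();
              if let Some(facts) = &self.latest {
                  insert_facts(&mut tags, "auth_", facts);
              }
              if let Some(facts) = &self.latest_unauthorized {
                  insert_facts(&mut tags, "auth_401_", facts);
              }
              tags
          }
      }
      
      fn insert_facts(tags: &mut BTreeMap<String, String>, prefix: &str, facts: &AttemptFacts) {
          tags.insert(
              format!("{}header_attached", prefix),
              facts.auth_header_attached.to_string(),
          );
          if let Some(request_id) = &facts.request_id {
              tags.insert(format!("{}request_id", prefix), request_id.clone());
          }
      }
      ```
      
      In this sketch a successful retry overwrites the `auth_*` tags but leaves the `auth_401_*` tags pointing at the original unauthorized response.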
    
      ###### Rationale (from spec findings)
      - The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
      - Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
      - Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
      - The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.
    
      ###### Scope
      - Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing.
      - Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.
    
      ###### Trade-offs
      - This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
      - This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
      - Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.
    
      ###### Client follow-up
      - PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing.
      - A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
      - `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path.
    
      ###### Testing
      - `cargo test -p codex-core refresh_available_models_sorts_by_priority`
      - `cargo test -p codex-core emit_feedback_request_tags_`
      - `cargo test -p codex-core emit_feedback_auth_recovery_tags_`
      - `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
      - `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers`
      - `cargo test -p codex-core identity_auth_details`
      - `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details`
      - `cargo test -p codex-core --all-features --no-run`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`
  • fix(otel): make HTTP trace export survive app-server runtimes (#14300)
    ## Summary
    
    This PR fixes OTLP HTTP trace export in runtimes where the previous
    exporter setup was unreliable, especially around app-server usage. It
    also removes the old `codex_otel::otel_provider` compatibility shim and
    switches remaining call sites over to the crate-root
    `codex_otel::OtelProvider` export.
    
    ## What changed
    
    - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
    - Add an async HTTP client path for trace export when we are already
    inside a multi-thread Tokio runtime.
    - Make provider shutdown flush traces before tearing down the tracer
    provider.
    - Add loopback coverage that verifies traces are actually sent to
    `/v1/traces`:
      - outside Tokio
      - inside a multi-thread Tokio runtime
      - inside a current-thread Tokio runtime
    - Remove the `codex_otel::otel_provider` shim and update remaining
    imports.
    
    ## Why
    
    I hit cases where spans were being created correctly but never made it
    to the collector. The issue turned out to be in exporter/runtime
    behavior rather than the span plumbing itself. This PR narrows that gap
    and gives us regression coverage for the actual export path.
  • chore(otel): rename OtelManager to SessionTelemetry (#13808)
    ## Summary
    This is a purely mechanical refactor of `OtelManager` ->
    `SessionTelemetry` to better convey what the struct is doing. No
    behavior change.
    
    ## Why
    
    `OtelManager` ended up sounding much broader than what this type
    actually does. It doesn't manage OTEL globally; it's the session-scoped
    telemetry surface for emitting log/trace events and recording metrics
    with consistent session metadata (`app_version`, `model`, `slug`,
    `originator`, etc.).
    
    `SessionTelemetry` is a more accurate name, and updating the call sites
    makes that boundary a lot easier to follow.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-otel`
    - `cargo test -p codex-core`
  • feat(otel, core): record turn TTFT and TTFM metrics in codex-core (#13630)
    ### Summary
    This adds turn-level latency metrics for the first model output and the
    first completed agent message.
    - `codex.turn.ttft.duration_ms` starts at turn start and records on the
    first output signal we see from the model. That includes normal
    assistant text, reasoning deltas, and non-text outputs like tool-call
    items.
    - `codex.turn.ttfm.duration_ms` also starts at turn start, but it
    records when the first agent message finishes streaming rather than when
    its first delta arrives.
    
    ### Implementation notes
    The timing is tracked in codex-core, not app-server, so the definition
    stays consistent across CLI, TUI, and app-server clients.
    
    I reused the existing turn lifecycle boundary that already drives
    `codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
    state, and record each metric once per turn.
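    
    A minimal sketch of the once-per-turn rule, assuming a hypothetical `TurnTimer` that lives in turn state. The real code records into the OTEL metrics named above; here each method simply returns the duration the first time it fires and `None` afterwards.
    
    ```rust
    use std::time::{Duration, Instant};
    
    /// Hypothetical per-turn timing state.
    pub struct TurnTimer {
        started_at: Instant,
        ttft: Option<Duration>,
        ttfm: Option<Duration>,
    }
    
    impl TurnTimer {
        pub fn start(started_at: Instant) -> Self {
            Self { started_at, ttft: None, ttfm: None }
        }
    
        /// Call on any first output signal (text, reasoning delta, tool call).
        /// Returns the TTFT once; later calls return `None`.
        pub fn on_first_output(&mut self, now: Instant) -> Option<Duration> {
            if self.ttft.is_some() {
                return None;
            }
            let elapsed = now.saturating_duration_since(self.started_at);
            self.ttft = Some(elapsed);
            Some(elapsed)
        }
    
        /// Call when an agent message finishes streaming.
        /// Returns the TTFM once; later calls return `None`.
        pub fn on_agent_message_completed(&mut self, now: Instant) -> Option<Duration> {
            if self.ttfm.is_some() {
                return None;
            }
            let elapsed = now.saturating_duration_since(self.started_at);
            self.ttfm = Some(elapsed);
            Some(elapsed)
        }
    }
    ```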
    
    I also wired the new metric names into the OTEL runtime metrics summary
    so they show up in the same in-memory/debug snapshot path as the
    existing timing metrics.
  • feat(otel): safe tracing (#13626)
    ### Motivation
    Today `config.toml` has three different OTEL knobs under `[otel]`:
    - `exporter` controls where OTEL logs go
    - `trace_exporter` controls where OTEL traces go
    - `metrics_exporter` controls where metrics go
    
    Those often (pretty much always?) serve different purposes.
    
    For example, for OpenAI internal usage, the **log exporter** is already
    being used for IT/security telemetry, and that use case is intentionally
    content-rich: tool calls, arguments, outputs, MCP payloads, and in some
    cases user content are all useful there. `log_user_prompt` is a good
    example of that distinction. When it’s enabled, we include raw prompt
    text in OTEL logs, which is acceptable for the security use case.
    
    The **trace exporter** is a different story. The goal there is to give
    OpenAI engineers visibility into latency and request behavior when they
    run Codex locally, without sending sensitive prompt or tool data as
    trace event data. In other words, traces should help answer “what was
    slow?” or “where did time go?”, not “what did the user say?” or “what
    did the tool return?”
    
    The complication is that Rust’s `tracing` crate does not make a hard
    distinction between “logs” and “trace events.” It gives us one
    instrumentation API for logs and trace events (via `tracing::event!`),
    and subscribers decide what gets treated as logs, trace events, or both.
    
    Before this change, our OTEL trace layer was effectively attached to the
    general tracing stream, which meant turning on `trace_exporter` could
    pick up content-rich events that were originally written with logging
    (and the log `exporter`) in mind. That made it too easy for sensitive
    data to end up in exported traces by accident.
    
    ### Concrete example
    In `otel_manager.rs`, this `tracing::event!` call would be exported in
    both logs AND traces (as a trace event).
    ```
        pub fn user_prompt(&self, items: &[UserInput]) {
            let prompt = items
                .iter()
                .flat_map(|item| match item {
                    UserInput::Text { text, .. } => Some(text.as_str()),
                    _ => None,
                })
                .collect::<String>();
    
            let prompt_to_log = if self.metadata.log_user_prompts {
                prompt.as_str()
            } else {
                "[REDACTED]"
            };
    
            tracing::event!(
                tracing::Level::INFO,
                event.name = "codex.user_prompt",
                event.timestamp = %timestamp(),
                // ...
                prompt = %prompt_to_log,
            );
        }
    ```
    
    Instead of `tracing::event!`, we should now use `log_event!` and
    `trace_event!` to indicate more clearly which sink (logs vs. traces)
    an event should be exported to.
    
    ### What changed
    This PR makes the log and trace export distinct instead of treating them
    as two sinks for the same data.
    
    On the provider side, OTEL logs and traces now have separate
    routing/filtering policy. The log exporter keeps receiving the existing
    `codex_otel` events, while trace export is limited to spans and trace
    events.
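
    One way to picture separate routing policy is a per-sink filter on the
    event target. This is only a std-only sketch under stated assumptions:
    the `Sink` enum, the two target strings, and `sink_accepts` are
    hypothetical names chosen for illustration, not the provider code in
    this PR.

    ```rust
    /// Hypothetical export sinks; names are illustrative only.
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum Sink {
        Logs,
        Traces,
    }

    /// Hypothetical targets an event could be tagged with (assumed names).
    const LOG_ONLY_TARGET: &str = "codex_otel.log_only";
    const TRACE_SAFE_TARGET: &str = "codex_otel.trace_safe";

    /// Decide whether a sink exports an event with the given target.
    /// Any event without an explicit trace-safe target is rejected by the
    /// trace sink, so content-rich events cannot reach traces by accident.
    fn sink_accepts(sink: Sink, target: &str) -> bool {
        match sink {
            Sink::Logs => target.starts_with(LOG_ONLY_TARGET),
            Sink::Traces => target.starts_with(TRACE_SAFE_TARGET),
        }
    }
    ```

    The point of the sketch is the default: the trace sink is opt-in per
    event, rather than being attached to the general event stream.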
    
    On the event side, `OtelManager` now emits two flavors of telemetry
    where needed:
    - a log-only event with the current rich payloads
    - a tracing-safe event with summaries only
    
    It also has a convenience `log_and_trace_event!` macro for emitting to
    both logs and traces when it's safe to do so. The macro also supports
    log-specific and trace-specific fields.
    
    That means prompts, tool args, tool output, account email, MCP metadata,
    and similar content stay in the log lane, while traces get the pieces
    that are actually useful for performance work: durations, counts, sizes,
    status, token counts, tool origin, and normalized error classes.
    
    This preserves current IT/security logging behavior while making it safe
    to turn on trace export for employees.
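
    As a rough illustration of the split between rich log payloads and
    trace-safe summaries, the sketch below derives a summary from a tool
    result. The `ToolResult` and `ToolResultSummary` types and the
    `trace_safe_summary` function are hypothetical and exist only for this
    example; the actual fields emitted by `OtelManager` may differ.

    ```rust
    use std::time::Duration;

    /// Hypothetical rich record of a tool call, suitable for the log lane only.
    struct ToolResult {
        arguments: String,
        output: String,
        duration: Duration,
        success: bool,
    }

    /// Hypothetical trace-safe summary: sizes, timing, and status, no content.
    #[derive(Debug, PartialEq)]
    struct ToolResultSummary {
        arguments_len: usize,
        output_len: usize,
        duration_ms: u128,
        success: bool,
    }

    /// Reduce a rich tool result to the fields that are useful for
    /// performance work, dropping the raw arguments and output.
    fn trace_safe_summary(result: &ToolResult) -> ToolResultSummary {
        ToolResultSummary {
            arguments_len: result.arguments.len(),
            output_len: result.output.len(),
            duration_ms: result.duration.as_millis(),
            success: result.success,
        }
    }
    ```

    The log lane would keep emitting the full `ToolResult`, while the trace
    lane would only ever see the summary.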
    
    ### Full list of things removed from trace export
    - raw user prompt text from `codex.user_prompt`
    - raw tool arguments and output from `codex.tool_result`
    - MCP server metadata from `codex.tool_result` (`mcp_server`,
    `mcp_server_origin`)
    - account identity fields like `user.email` and `user.account_id` from
    trace-safe OTEL events
    - `host.name` from trace resources
    - generic `codex.tool_decision` events from traces
    - generic `codex.sse_event` events from traces
    - the full ToolCall debug payload from the `handle_tool_call` span
    
    What traces now keep instead is mostly:
    - spans
    - trace-safe OTEL events
    - counts, lengths, durations, status, token counts, and tool origin
    summaries
  • feat: add service name to app-server (#12319)
    Add a service name to the app-server so that the app can use its own
    service name
    
    This is set at the thread level because we may later make the
    app-server a singleton on the machine
  • Add MCP server context to otel tool_result logs (#12267)
    Summary
    - capture the origin for each configured MCP server and expose it via
    the connection manager
    - plumb MCP server name/origin into tool logging and emit
    codex.tool_result events with those fields
    - add unit coverage for origin parsing and extend OTEL tests to assert
    empty MCP fields for non-MCP tools
    - currently not logging full urls or url paths to prevent logging
    potentially sensitive data
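
    A minimal sketch of what logging an origin without the URL path could
    look like is shown below. The `server_origin` function is hypothetical
    and uses plain string handling; the origin parsing added in this PR may
    work differently.

    ```rust
    /// Reduce a server URL to `scheme://host[:port]`, dropping the path,
    /// query, fragment, and any embedded credentials.
    /// Returns `None` when the input has no `scheme://` prefix or no host.
    fn server_origin(url: &str) -> Option<String> {
        let (scheme, rest) = url.split_once("://")?;
        if scheme.is_empty() {
            return None;
        }
        // The authority ends at the first path, query, or fragment delimiter.
        let end = rest
            .find(|c: char| matches!(c, '/' | '?' | '#'))
            .unwrap_or(rest.len());
        let authority = &rest[..end];
        // Drop any `user:password@` prefix so credentials never reach the logs.
        let host = authority
            .rsplit_once('@')
            .map_or(authority, |(_, host)| host);
        if host.is_empty() {
            return None;
        }
        Some(format!("{}://{}", scheme, host))
    }
    ```

    For example, `https://mcp.example.com/v1/tools?token=abc` would be
    reduced to `https://mcp.example.com`.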
    
    Testing
    - Not run (not requested)
  • add sandbox policy and sandbox name to codex.tool.call metrics (#10711)
    This will give visibility into how the success rate of the Windows
    sandbox implementations compares with other platforms.
  • fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423)
    So that the rest of the codebase (like the TUI) doesn't need to be
    concerned with whether ChatGPT auth was handled by Codex itself or
    passed in via app-server's external auth mode.
  • Add websocket telemetry metrics and labels (#10316)
    Summary
    - expose websocket telemetry hooks through the responses client so
    request durations and event processing can be reported
    - record websocket request/event metrics and emit runtime telemetry
    events that the history UI now surfaces
    - improve tests to cover websocket telemetry reporting and guard runtime
    summary updates
    
    
    <img width="824" height="79" alt="Screenshot 2026-01-31 at 5 28 12 PM"
    src="https://github.com/user-attachments/assets/ea9a7965-d8b4-4e3c-a984-ef4fdc44c81d"
    />
  • feat: show runtime metrics in console (#10278)
    Summary of changes:
    
    - Adds a new feature flag: runtime_metrics
      - Declared in core/src/features.rs
      - Added to core/config.schema.json
      - Wired into OTEL init in core/src/otel_init.rs
    
    - Enables on-demand runtime metric snapshots in OTEL
      - Adds runtime_metrics: bool to otel/src/config.rs
      - Enables experimental custom reader features in otel/Cargo.toml
      - Adds snapshot/reset/summary APIs in:
        - otel/src/lib.rs
        - otel/src/metrics/client.rs
        - otel/src/metrics/config.rs
        - otel/src/metrics/error.rs
    
    - Defines metric names and a runtime summary builder
      - New files:
        - otel/src/metrics/names.rs
        - otel/src/metrics/runtime_metrics.rs
      - Summarizes totals for:
        - Tool calls
        - API requests
        - SSE/streaming events
    
    - Instruments metrics collection in OTEL manager
      - otel/src/traces/otel_manager.rs now records:
        - API call counts + durations
        - SSE event counts + durations (success/failure)
        - Tool call metrics now use shared constants
    
    - Surfaces runtime metrics in the TUI
      - Resets runtime metrics at turn start in tui/src/chatwidget.rs
      - Displays metrics in the final separator line in
        tui/src/history_cell.rs
    
    - Adds tests
      - New OTEL tests:
        - otel/tests/suite/snapshot.rs
        - otel/tests/suite/runtime_summary.rs
      - New TUI test:
        - final_message_separator_includes_runtime_metrics in
          tui/src/history_cell.rs
    
    Scope:
    - 19 files changed
    - ~652 insertions, 38 deletions
    
    
    <img width="922" height="169" alt="Screenshot 2026-01-30 at 4 11 34 PM"
    src="https://github.com/user-attachments/assets/1efd754d-a16d-4564-83a5-f4442fd2f998"
    />
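
    The snapshot/reset/summary flow described above can be sketched with a
    small in-memory collector. Everything here (`Totals`, `RuntimeSummary`,
    `RuntimeMetrics`, and their methods) is a hypothetical stand-in for
    illustration; the PR describes OTEL-backed snapshots, not this struct.

    ```rust
    use std::time::Duration;

    /// Count and total duration for one category of runtime activity.
    #[derive(Debug, Default, Clone, Copy, PartialEq)]
    struct Totals {
        count: u64,
        duration: Duration,
    }

    impl Totals {
        /// Record one completed operation and its elapsed time.
        fn record(&mut self, elapsed: Duration) {
            self.count += 1;
            self.duration += elapsed;
        }
    }

    /// Hypothetical per-turn summary covering the three categories above.
    #[derive(Debug, Default, Clone, Copy, PartialEq)]
    struct RuntimeSummary {
        tool_calls: Totals,
        api_requests: Totals,
        streaming_events: Totals,
    }

    /// Hypothetical in-memory collector used only for this sketch.
    #[derive(Default)]
    struct RuntimeMetrics {
        current: RuntimeSummary,
    }

    impl RuntimeMetrics {
        /// Copy out the totals accumulated since the last reset.
        fn snapshot(&self) -> RuntimeSummary {
            self.current
        }

        /// Clear all totals, e.g. at the start of a turn.
        fn reset(&mut self) {
            self.current = RuntimeSummary::default();
        }
    }
    ```

    In this sketch, a UI would call `reset` at turn start and `snapshot`
    when rendering the final separator line.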
  • feat: backfill timing metric (#10218)
    1. Add a metric to measure the backfill time
    2. Add a unit to the timing histogram
  • feat: add session source as otel metadata tag (#9720)
    Add session.source and user.account_id as global OTEL metric tags to
    identify client surface and user.
  • otel test: retry WouldBlock errors (#8915)
    This test looks flaky on Windows:
    
    ```
            FAIL [   0.034s] (1442/2802) codex-otel::tests suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector
      stdout ───
    
        running 1 test
        test suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector ... FAILED
    
        failures:
    
        failures:
            suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector
    
        test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 14 filtered out; finished in 0.02s
        
      stderr ───
        Error: ProviderShutdown { source: InternalFailure("[InternalFailure(\"Failed to shutdown\")]") }
    
    ────────────
         Summary [ 175.360s] 2802 tests run: 2801 passed, 1 failed, 15 skipped
            FAIL [   0.034s] (1442/2802) codex-otel::tests suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector
    ```
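
    A minimal sketch of the retry pattern named in the title follows. The
    `retry_would_block` helper, its parameters, and the fixed delay are
    assumptions made for illustration; they are not taken from the test
    code changed in this PR.

    ```rust
    use std::io;
    use std::thread;
    use std::time::Duration;

    /// Run `op`, retrying while it fails with `WouldBlock`, up to
    /// `max_attempts` total attempts. Any other error, or a `WouldBlock`
    /// on the final attempt, is returned to the caller unchanged.
    fn retry_would_block<T>(
        max_attempts: usize,
        delay: Duration,
        mut op: impl FnMut() -> io::Result<T>,
    ) -> io::Result<T> {
        let mut attempt = 1;
        loop {
            match op() {
                Err(err)
                    if err.kind() == io::ErrorKind::WouldBlock
                        && attempt < max_attempts =>
                {
                    attempt += 1;
                    // Give the socket a moment to become ready before retrying.
                    thread::sleep(delay);
                }
                other => return other,
            }
        }
    }
    ```

    Bounding the number of attempts keeps a persistently blocked socket
    from hanging the test run.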
  • feat: metrics capabilities (#8318)
    Add metrics capabilities to Codex. The `README.md` is up to date.
    
    This will not be merged with the metrics before this PR of course:
    https://github.com/openai/codex/pull/8350