Commit Graph

75 Commits

  • Move auth code into login crate (#15150)
    - Move the auth implementation and token data into codex-login.
    - Keep codex-core re-exporting that surface from codex-login for
    existing callers.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Log automated reviewer approval sources distinctly (#15201)
    ## Summary
    
    - log guardian-reviewed tool approvals as `source=automated_reviewer` in
    `codex.tool_decision`
    - keep direct user approvals as `source=user` and config-driven
    approvals as `source=config`
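
    A minimal sketch of how the three approval sources could be modeled.
    The `ApprovalSource` type and `tool_decision_line` helper are
    hypothetical; only the three `source=` strings and the
    `codex.tool_decision` event name come from this PR.

    ```rust
    /// Who approved a tool call. Hypothetical type for illustration.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ApprovalSource {
        User,
        Config,
        AutomatedReviewer,
    }

    impl ApprovalSource {
        /// Value logged as `source=...` on `codex.tool_decision`.
        pub fn as_str(self) -> &'static str {
            match self {
                ApprovalSource::User => "user",
                ApprovalSource::Config => "config",
                ApprovalSource::AutomatedReviewer => "automated_reviewer",
            }
        }
    }

    /// Formats one illustrative log line (the real event shape may differ).
    pub fn tool_decision_line(tool: &str, source: ApprovalSource) -> String {
        format!("codex.tool_decision tool={tool} source={}", source.as_str())
    }
    ```

    Keeping the mapping in one `match` means a new approval path cannot
    compile without choosing an explicit `source` value.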
    
    ## Testing
    
    - `/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-fmt-quiet.sh`
    - `/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-test-quiet.sh -p codex-otel`
    (fails in sandboxed loopback bind tests under
    `otel/tests/suite/otlp_http_loopback.rs`)
    - `cargo test -p codex-core guardian -- --nocapture` (original-tree run
    reached Guardian tests and only hit sandbox-related listener/proxy
    failures)
    
    Co-authored-by: Codex <noreply@openai.com>
  • Revert "fix: harden plugin feature gating" (#15102)
    Reverts openai/codex#15020
    
    I messed up the commit in my PR and accidentally merged changes that
    were still under review.
  • fix: harden plugin feature gating (#15020)
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
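
    Item 3 can be sketched as follows. The `Marketplace` type and the
    `collect_marketplaces` function are assumptions made for
    illustration, not the names used in the PR.

    ```rust
    /// Hypothetical result of loading one marketplace.json file.
    #[derive(Debug, PartialEq)]
    pub struct Marketplace {
        pub name: String,
    }

    /// Keeps every marketplace that loaded and collects the errors of the
    /// rest, so one bad file does not fail the whole listing.
    pub fn collect_marketplaces(
        loaded: Vec<Result<Marketplace, String>>,
    ) -> (Vec<Marketplace>, Vec<String>) {
        let mut ok = Vec::new();
        let mut skipped = Vec::new();
        for item in loaded {
            match item {
                Ok(marketplace) => ok.push(marketplace),
                // Remember why the file was skipped instead of returning early.
                Err(reason) => skipped.push(reason),
            }
        }
        (ok, skipped)
    }
    ```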
  • Add auth env observability (#14905)
    CXC-410 Emit Env Var Status with `/feedback` report
    
    Add more observability on top of #14611 
    
    [Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)
    
    [Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)
    <img width="1063" height="610" alt="image"
    src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e"
    />
    
    
    ###### Summary
    - Adds auth-env telemetry that records whether key auth-related env
    overrides were present on session start and request paths.
    - Threads those auth-env fields through `/responses`, websocket, and
    `/models` telemetry and feedback metadata.
    - Buckets custom provider `env_key` configuration to a safe
    `"configured"` value instead of emitting raw config text.
    - Keeps the slice observability-only: no raw token values or raw URLs
    are emitted.
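
    The presence-only contract could look roughly like this. It is a
    sketch under assumptions: the struct fields, the
    `collect_auth_env_telemetry` signature, and the `OPENAI_API_KEY`
    variable name are illustrative, and `lookup` stands in for
    `std::env::var` so the example does not read the real environment.

    ```rust
    /// Presence-only view of auth-related environment input.
    /// Field names are hypothetical apart from `provider_env_key_name`.
    #[derive(Debug, PartialEq)]
    pub struct AuthEnvTelemetry {
        pub api_key_env_present: bool,
        pub provider_env_key_name: Option<&'static str>,
    }

    /// Collects presence flags without ever copying a raw value.
    pub fn collect_auth_env_telemetry(
        lookup: impl Fn(&str) -> Option<String>,
        provider_env_key: Option<&str>,
    ) -> AuthEnvTelemetry {
        AuthEnvTelemetry {
            // Record only whether the variable is set and non-empty.
            api_key_env_present: lookup("OPENAI_API_KEY").is_some_and(|v| !v.is_empty()),
            // User-controlled config text is bucketed, not echoed.
            provider_env_key_name: provider_env_key.map(|_| "configured"),
        }
    }
    ```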
    
    ###### Rationale (from spec findings)
    - 401 and auth-path debugging needs a way to distinguish env-driven auth
    paths from sessions with no auth env override.
    - Startup and model-refresh failures need the same auth-env diagnostics
    as normal request failures.
    - Feedback and Sentry tags need the same auth-env signal as OTel events
    so reports can be triaged consistently.
    - Custom provider config is user-controlled text, so the telemetry
    contract must stay presence-only / bucketed.
    
    ###### Scope
    - Adds a small `AuthEnvTelemetry` bundle for env presence collection and
    threads it through the main request/session telemetry paths.
    - Does not add endpoint/base-url/provider-header/geo routing attribution
    or broader telemetry API redesign.
    
    ###### Trade-offs
    - `provider_env_key_name` is bucketed to `"configured"` instead of
    preserving the literal configured env var name.
    - `/models` is included because startup/model-refresh auth failures need
    the same diagnostics, but broader parity work remains out of scope.
    - This slice keeps the existing telemetry APIs and layers auth-env
    fields onto them rather than redesigning the metadata model.
    
    ###### Client follow-up
    - Add the separate endpoint/base-url attribution slice if routing-source
    diagnosis is still needed.
    - Add provider-header or residency attribution only if auth-env presence
    proves insufficient in real reports.
    - Revisit whether any additional auth-related env inputs need safe
    bucketing after more 401 triage data.
    
    ###### Testing
    - `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
    - `cargo test -p codex-core
    collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
    - `cargo test -p codex-core
    models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
    --nocapture`
    - `cargo test -p codex-otel
    otel_export_routing_policy_routes_api_request_auth_observability --
    --nocapture`
    - `cargo test -p codex-otel
    otel_export_routing_policy_routes_websocket_connect_auth_observability
    -- --nocapture`
    - `cargo test -p codex-otel
    otel_export_routing_policy_routes_websocket_request_transport_observability
    -- --nocapture`
    - `cargo test -p codex-core --no-run --message-format short`
    - `cargo test -p codex-otel --no-run --message-format short`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • fix(core): prevent hanging turn/start due to websocket warming issues (#14838)
    ## Description
    
    This PR fixes a bad first-turn failure mode in app-server when the
    startup websocket prewarm hangs. Before this change, `initialize ->
    thread/start -> turn/start` could sit behind the prewarm for up to five
    minutes, so the client would not see `turn/started`, and even
    `turn/interrupt` would block because the turn had not actually started
    yet.
    
    Now:
    - websocket startup has a (configurable) timeout of 15s, exposed as
    `websocket_connect_timeout_ms` in config.toml
    - `turn/started` is sent immediately on `turn/start` even if the
    websocket is still connecting
    - `turn/interrupt` can be used to cancel a turn that is still waiting on
    the websocket warmup
    - the turn task waits for the full 15s websocket warming timeout
    before falling back
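
    The bounded wait can be sketched with the standard library alone.
    This is a blocking simplification under stated assumptions: the
    real code presumably runs on an async runtime, and the `Warmup`
    enum and `wait_for_warmup` function are hypothetical names.

    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::Duration;

    /// Outcome of waiting for the startup websocket prewarm.
    #[derive(Debug, PartialEq)]
    pub enum Warmup {
        Ready,
        /// The prewarm did not finish in time; the caller falls back.
        TimedOut,
        /// The prewarm task went away without reporting.
        Abandoned,
    }

    /// Waits for a readiness signal for at most `timeout`.
    pub fn wait_for_warmup(ready: &Receiver<()>, timeout: Duration) -> Warmup {
        match ready.recv_timeout(timeout) {
            Ok(()) => Warmup::Ready,
            Err(RecvTimeoutError::Timeout) => Warmup::TimedOut,
            Err(RecvTimeoutError::Disconnected) => Warmup::Abandoned,
        }
    }
    ```

    Because the wait is bounded, the turn can always leave the warmup
    state, which is what lets the lifecycle events keep flowing.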
    
    ## Why
    
    The old behavior made app-server feel stuck at exactly the moment the
    client expects turn lifecycle events to start flowing. That was
    especially painful for external clients, because from their point of
    view the server had accepted the request but then went silent for
    minutes.
    
    ## Configuring the websocket startup timeout
    Set it in config.toml like this:
    ```
    [model_providers.openai]
    supports_websockets = true
    websocket_connect_timeout_ms = 15000
    ```
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
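
    For readers unfamiliar with the convention, here is a small
    illustration of the `/*param*/` style. The functions are made up
    for this example and do not exist in `codex-rs`.

    ```rust
    /// Hypothetical function whose positional arguments are opaque at the
    /// call site.
    pub fn retry_delay_ms(attempt: u32, base_ms: u64, jitter: bool) -> u64 {
        let delay = base_ms.saturating_mul(1u64 << attempt.min(16));
        if jitter { delay + 1 } else { delay }
    }

    pub fn example_call() -> u64 {
        // Numeric and boolean literals get a comment naming the parameter.
        retry_delay_ms(/*attempt*/ 2, /*base_ms*/ 100, /*jitter*/ false)
    }

    /// Second hypothetical function, taking string and char arguments.
    pub fn greeting(name: &str, punctuation: char) -> String {
        format!("hello {name}{punctuation}")
    }

    pub fn example_string_call() -> String {
        // String and char literals are left uncommented under the policy.
        greeting("codex", '!')
    }
    ```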
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • Add auth 401 observability to client bug reports (#14611)
    CXC-392
    
      [With
      401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream)
      <img width="1909" height="555" alt="401 auth tags in Sentry"
      src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3"
      />
    
    
      - auth_401_*: preserved facts from the latest unauthorized response snapshot
      - auth_*: auth-related facts from the latest request attempt
      - auth_recovery_*: unauthorized recovery state and follow-up result
    
    
      Without 401
      <img width="1917" height="522" alt="happy-path auth tags in Sentry"
      src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f"
      />
    
      ###### Summary
      - Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation.
      - Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
      - Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
      - Preserve original unauthorized evidence under `auth_401_*` while keeping follow-up result tags separate.
    
      ###### Rationale (from spec findings)
      - The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
      - Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
      - Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
      - The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.
    
      ###### Scope
      - Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing.
      - Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.
    
      ###### Trade-offs
      - This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
      - This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
      - Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.
    
      ###### Client follow-up
      - PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing.
      - A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
      - `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path.
    
      ###### Testing
      - `cargo test -p codex-core refresh_available_models_sorts_by_priority`
      - `cargo test -p codex-core emit_feedback_request_tags_`
      - `cargo test -p codex-core emit_feedback_auth_recovery_tags_`
      - `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
      - `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers`
      - `cargo test -p codex-core identity_auth_details`
      - `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details`
      - `cargo test -p codex-core --all-features --no-run`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`
  • feat: migrate search_tool to the bring-your-own tool of the Responses API (#14274)
    ## Why
    
    To support the new bring-your-own search tool in the Responses API
    (https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search),
    we are migrating our bm25 search tool to the official way of executing
    search on the client and communicating additional tools to the model.
    
    ## What
    - replace the legacy `search_tool_bm25` flow with client-executed
    `tool_search`
    - add protocol, SSE, history, and normalization support for
    `tool_search_call` and `tool_search_output`
    - return namespaced Codex Apps search results and wire namespaced
    follow-up tool calls back into MCP dispatch
  • feat(core): emit turn metric for network proxy state (#14250)
    ## Summary
    - add a per-turn `codex.turn.network_proxy` metric constant
    - emit the metric from turn completion using the live managed proxy
    enabled state
    - add focused tests for active and inactive tag emission
  • fix(otel): make HTTP trace export survive app-server runtimes (#14300)
    ## Summary
    
    This PR fixes OTLP HTTP trace export in runtimes where the previous
    exporter setup was unreliable, especially around app-server usage. It
    also removes the old `codex_otel::otel_provider` compatibility shim and
    switches remaining call sites over to the crate-root
    `codex_otel::OtelProvider` export.
    
    ## What changed
    
    - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
    - Add an async HTTP client path for trace export when we are already
    inside a multi-thread Tokio runtime.
    - Make provider shutdown flush traces before tearing down the tracer
    provider.
    - Add loopback coverage that verifies traces are actually sent to
    `/v1/traces`:
      - outside Tokio
      - inside a multi-thread Tokio runtime
      - inside a current-thread Tokio runtime
    - Remove the `codex_otel::otel_provider` shim and update remaining
    imports.
    
    ## Why
    
    I hit cases where spans were being created correctly but never made it
    to the collector. The issue turned out to be in exporter/runtime
    behavior rather than the span plumbing itself. This PR narrows that gap
    and gives us regression coverage for the actual export path.
  • feat(otel): Centralize OTEL metric names and shared tag builders (#14117)
    This cleans up a bunch of metric plumbing that had started to drift.
    
    The main change is making `codex-otel` the canonical home for shared
    metric definitions and metric tag helpers. I moved the `turn/thread`
    metric names that were still duplicated into the OTEL metric registry,
    added a shared `metrics::tags` module for common tag keys and session
    tag construction, and updated `SessionTelemetry` to build its metadata
    tags through that shared path.
    
    On the codex-core side, TTFT/TTFM now use the shared metric-name
    constants instead of local string definitions. I also switched the
    obvious remaining turn/thread metric callsites over to the shared
    constants, and added a small helper so TTFT/TTFM can attach an optional
    sanitized client.name tag from TurnContext.
    
    This should make follow-on telemetry work less ad hoc:
    - one canonical place for metric names
    - one canonical place for common metric tag keys/builders
    - less duplication between `codex-core` and `codex-otel`
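
    A rough picture of what a shared registry could look like. The
    three metric names appear elsewhere in this log; the module layout,
    constant names, and the `session_tags` signature are guesses rather
    than the actual `codex-otel` API.

    ```rust
    /// Canonical metric names kept in one module.
    pub mod names {
        pub const TURN_E2E_DURATION: &str = "codex.turn.e2e_duration_ms";
        pub const TURN_TTFT_DURATION: &str = "codex.turn.ttft.duration_ms";
        pub const TURN_TTFM_DURATION: &str = "codex.turn.ttfm.duration_ms";
    }

    /// Common tag keys and builders (hypothetical shape).
    pub mod tags {
        pub const MODEL: &str = "model";
        pub const APP_VERSION: &str = "app_version";

        /// Builds the session tags that every metric shares.
        pub fn session_tags(model: &str, app_version: &str) -> Vec<(&'static str, String)> {
            vec![
                (MODEL, model.to_string()),
                (APP_VERSION, app_version.to_string()),
            ]
        }
    }
    ```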
  • chore(otel): rename OtelManager to SessionTelemetry (#13808)
    ## Summary
    This is a purely mechanical refactor of `OtelManager` ->
    `SessionTelemetry` to better convey what the struct is doing. No
    behavior change.
    
    ## Why
    
    `OtelManager` ended up sounding much broader than what this type
    actually does. It doesn't manage OTEL globally; it's the session-scoped
    telemetry surface for emitting log/trace events and recording metrics
    with consistent session metadata (`app_version`, `model`, `slug`,
    `originator`, etc.).
    
    `SessionTelemetry` is a more accurate name, and updating the call sites
    makes that boundary a lot easier to follow.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-otel`
    - `cargo test -p codex-core`
  • chore(otel): reorganize codex-otel crate (#13800)
    ## Summary
    This is a structural cleanup of `codex-otel` to make the ownership
    boundaries a lot clearer.
    
    For example, previously it was quite confusing that `OtelManager` which
    emits log + trace event telemetry lived under
    `codex-rs/otel/src/traces/`. Also, there were two places that defined
    methods on OtelManager via `impl OtelManager` (`lib.rs` and
    `otel_manager.rs`).
    
    What changed:
    - move the `OtelProvider` implementation into `src/provider.rs`
    - move `OtelManager` and session-scoped event emission into
    `src/events/otel_manager.rs`
    - collapse the shared log/trace event helpers into
    `src/events/shared.rs`
    - pull target classification into `src/targets.rs`
    - move `traceparent_context_from_env()` into `src/trace_context.rs`
    - keep `src/otel_provider.rs` as a compatibility shim for existing
    imports
    - update the `codex-otel` README to reflect the new layout
    
    ## Why
    `lib.rs` and `otel_provider.rs` were doing too many different jobs at
    once: provider setup, export routing, trace-context helpers, and session
    event emission all lived together.
    
    This refactor separates those concerns without trying to change the
    behavior of the crate. The goal is to make future OTEL work easier to
    reason about and easier to review.
    
    ## Notes
    - no intended behavior change
    - `OtelManager` remains the session-scoped event emitter in this PR
    - the `otel_provider` shim keeps downstream churn low while the
    internals move around
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-otel`
    - `just fix -p codex-otel`
  • feat(otel, core): record turn TTFT and TTFM metrics in codex-core (#13630)
    ### Summary
    This adds turn-level latency metrics for the first model output and the
    first completed agent message.
    - `codex.turn.ttft.duration_ms` starts at turn start and records on the
    first output signal we see from the model. That includes normal
    assistant text, reasoning deltas, and non-text outputs like tool-call
    items.
    - `codex.turn.ttfm.duration_ms` also starts at turn start, but it
    records when the first agent message finishes streaming rather than when
    its first delta arrives.
    
    ### Implementation notes
    The timing is tracked in codex-core, not app-server, so the definition
    stays consistent across CLI, TUI, and app-server clients.
    
    I reused the existing turn lifecycle boundary that already drives
    `codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
    state, and record each metric once per turn.
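
    The once-per-turn recording can be sketched like this. `TurnTimer`
    and its methods are hypothetical; the PR only says that the turn
    start timestamp lives in turn state and that each metric is
    recorded once.

    ```rust
    use std::time::{Duration, Instant};

    /// Per-turn timing state (hypothetical type).
    pub struct TurnTimer {
        started_at: Instant,
        ttft: Option<Duration>,
        ttfm: Option<Duration>,
    }

    impl TurnTimer {
        pub fn start(now: Instant) -> Self {
            Self { started_at: now, ttft: None, ttfm: None }
        }

        /// Call on any model output signal (text, reasoning delta, tool call).
        /// Returns the TTFT only for the first signal of the turn.
        pub fn on_output(&mut self, now: Instant) -> Option<Duration> {
            if self.ttft.is_some() {
                return None;
            }
            let elapsed = now.duration_since(self.started_at);
            self.ttft = Some(elapsed);
            Some(elapsed)
        }

        /// Call when an agent message finishes streaming.
        /// Returns the TTFM only for the first completed message.
        pub fn on_message_completed(&mut self, now: Instant) -> Option<Duration> {
            if self.ttfm.is_some() {
                return None;
            }
            let elapsed = now.duration_since(self.started_at);
            self.ttfm = Some(elapsed);
            Some(elapsed)
        }
    }
    ```

    Passing `now` in explicitly keeps the timer deterministic in tests.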
    
    I also wired the new metric names into the OTEL runtime metrics summary
    so they show up in the same in-memory/debug snapshot path as the
    existing timing metrics.
  • feat(otel): safe tracing (#13626)
    ### Motivation
    Today config.toml has three different OTEL knobs under `[otel]`:
    - `exporter` controls where OTEL logs go
    - `trace_exporter` controls where OTEL traces go
    - `metrics_exporter` controls where metrics go
    
    Those often (pretty much always?) serve different purposes.
    
    For example, for OpenAI internal usage, the **log exporter** is already
    being used for IT/security telemetry, and that use case is intentionally
    content-rich: tool calls, arguments, outputs, MCP payloads, and in some
    cases user content are all useful there. `log_user_prompt` is a good
    example of that distinction. When it’s enabled, we include raw prompt
    text in OTEL logs, which is acceptable for the security use case.
    
    The **trace exporter** is a different story. The goal there is to give
    OpenAI engineers visibility into latency and request behavior when they
    run Codex locally, without sending sensitive prompt or tool data as
    trace event data. In other words, traces should help answer “what was
    slow?” or “where did time go?”, not “what did the user say?” or “what
    did the tool return?”
    
    The complication is that Rust’s `tracing` crate does not make a hard
    distinction between “logs” and “trace events.” It gives us one
    instrumentation API for logs and trace events (via `tracing::event!`),
    and subscribers decide what gets treated as logs, trace events, or both.
    
    Before this change, our OTEL trace layer was effectively attached to the
    general tracing stream, which meant turning on `trace_exporter` could
    pick up content-rich events that were originally written with logging
    (and the `log_exporter`) in mind. That made it too easy for sensitive
    data to end up in exported traces by accident.
    
    ### Concrete example
    In `otel_manager.rs`, this `tracing::event!` call would be exported in
    both logs AND traces (as a trace event).
    ```
        pub fn user_prompt(&self, items: &[UserInput]) {
            let prompt = items
                .iter()
                .flat_map(|item| match item {
                    UserInput::Text { text, .. } => Some(text.as_str()),
                    _ => None,
                })
                .collect::<String>();
    
            let prompt_to_log = if self.metadata.log_user_prompts {
                prompt.as_str()
            } else {
                "[REDACTED]"
            };
    
            tracing::event!(
                tracing::Level::INFO,
                event.name = "codex.user_prompt",
                event.timestamp = %timestamp(),
                // ...
                prompt = %prompt_to_log,
            );
        }
    ```
    
    Instead of `tracing::event!`, we should now use `log_event!` and
    `trace_event!` to indicate more clearly which sink (logs vs. traces)
    an event should be exported to.
    
    ### What changed
    This PR makes the log and trace export distinct instead of treating them
    as two sinks for the same data.
    
    On the provider side, OTEL logs and traces now have separate
    routing/filtering policy. The log exporter keeps receiving the existing
    `codex_otel` events, while trace export is limited to spans and trace
    events.
    
    On the event side, `OtelManager` now emits two flavors of telemetry
    where needed:
    - a log-only event with the current rich payloads
    - a tracing-safe event with summaries only
    
    It also has a convenience `log_and_trace_event!` macro for emitting to
    both logs and traces when it's safe to do so, as well as log- and
    trace-specific fields.
    
    That means prompts, tool args, tool output, account email, MCP metadata,
    and similar content stay in the log lane, while traces get the pieces
    that are actually useful for performance work: durations, counts, sizes,
    status, token counts, tool origin, and normalized error classes.
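
    To make the two lanes concrete, here is a sketch that derives both
    field sets from one record. It does not use the PR's macros; the
    `ToolResult` struct, the field names, and both functions are
    assumptions for illustration.

    ```rust
    /// One tool result as seen by the emitter (hypothetical shape).
    pub struct ToolResult<'a> {
        pub arguments: &'a str,
        pub output: &'a str,
        pub duration_ms: u64,
        pub success: bool,
    }

    /// Log lane: keeps the rich payload.
    pub fn log_fields(r: &ToolResult<'_>) -> Vec<(&'static str, String)> {
        vec![
            ("arguments", r.arguments.to_string()),
            ("output", r.output.to_string()),
            ("duration_ms", r.duration_ms.to_string()),
            ("success", r.success.to_string()),
        ]
    }

    /// Trace lane: summaries only, never the raw content.
    pub fn trace_fields(r: &ToolResult<'_>) -> Vec<(&'static str, String)> {
        vec![
            ("arguments_len", r.arguments.len().to_string()),
            ("output_len", r.output.len().to_string()),
            ("duration_ms", r.duration_ms.to_string()),
            ("success", r.success.to_string()),
        ]
    }
    ```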
    
    This preserves current IT/security logging behavior while making it safe
    to turn on trace export for employees.
    
    ### Full list of things removed from trace export
    - raw user prompt text from `codex.user_prompt`
    - raw tool arguments and output from `codex.tool_result`
    - MCP server metadata from `codex.tool_result` (mcp_server,
    mcp_server_origin)
    - account identity fields like `user.email` and `user.account_id` from
    trace-safe OTEL events
    - `host.name` from trace resources
    - generic `codex.tool_decision` events from traces
    - generic `codex.sse_event` events from traces
    - the full ToolCall debug payload from the `handle_tool_call` span
    
    What traces now keep instead is mostly:
    - spans
    - trace-safe OTEL events
    - counts, lengths, durations, status, token counts, and tool origin
    summaries
  • feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602)
    This PR adds a durable trace linkage for each turn by storing the active
    trace ID on the rollout TurnContext record stored in session rollout
    files.
    
    Before this change, we propagated trace context at runtime but didn’t
    persist a stable per-turn trace key in rollout history. That made
    after-the-fact debugging harder (for example, mapping a historical turn
    to the corresponding trace in Datadog). This sets us up for much easier
    debugging in the future.
    
    ### What changed
    - Added an optional `trace_id` to TurnContextItem (rollout schema).
    - Added a small OTEL helper to read the current span trace ID.
    - Captured `trace_id` when creating `TurnContext` and included it in
    `to_turn_context_item()`.
    - Updated tests and fixtures that construct TurnContextItem so
    older/no-trace cases still work.
    
    ### Why this approach
    TurnContext is already the canonical durable per-turn metadata in
    rollout. This keeps ownership clean: trace linkage lives with other
    persisted turn metadata.
  • image-gen-core (#13290)
    Core tool-calling for image-gen: handles the request and response logic
    for images using the Responses API.
  • feat(app-server): propagate app-server trace context into core (#13368)
    ### Summary
    Propagate trace context originating at app-server RPC method handlers ->
    codex core submission loop (so this includes spans such as `run_turn`!).
    This implements PR 2 of the app-server tracing rollout.
    
    This also removes the old lower-level env-based reparenting in core so
    explicit request/submission ancestry wins instead of being overridden by
    ambient `TRACEPARENT` state.
    
    ### What changed
    - Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
    - Taught `Codex::submit()` / `submit_with_id()` to automatically capture
    the current span context when constructing or forwarding a submission
    - Wrapped the core submission loop in a submission_dispatch span
    parented from Submission.trace
    - Warn on invalid submission trace carriers and ignore them cleanly
    - Removed the old env-based downstream reparenting path in core task
    execution
    - Stopped OTEL provider init from implicitly attaching env trace context
    process-wide
    - Updated mcp-server Submission call sites for the new field
    
    Added focused unit tests for:
    - capturing trace context into Submission
    - preferring `Submission.trace` when building the core dispatch span
    
    ### Why
    PR 1 gave us consistent inbound request spans in app-server, but that
    only covered the transport boundary. For long-running work like turns
    and reviews, the important missing piece was preserving ancestry after
    the request handler returns and core continues work on a different async
    path.
    
    This change makes that handoff explicit and keeps the parentage rules
    simple:
    - app-server request span sets the current context
    - `Submission.trace` snapshots that context
    - core restores it once, at the submission boundary
    - deeper core spans inherit naturally
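
    The "warn and ignore invalid carriers" step depends on validating
    the W3C `traceparent` value. The sketch below handles version `00`
    only and uses hypothetical names (`TraceParent`,
    `parse_traceparent`); the PR's `W3cTraceContext` type is not shown
    in this log.

    ```rust
    /// Parsed W3C `traceparent` value (version 00).
    #[derive(Debug, PartialEq)]
    pub struct TraceParent {
        pub trace_id: String,
        pub parent_id: String,
        pub sampled: bool,
    }

    fn is_lower_hex(s: &str, len: usize) -> bool {
        s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
    }

    /// Returns `None` for malformed carriers so the caller can warn and
    /// continue without a parent.
    pub fn parse_traceparent(value: &str) -> Option<TraceParent> {
        let mut parts = value.trim().split('-');
        let (version, trace_id, parent_id, flags) =
            (parts.next()?, parts.next()?, parts.next()?, parts.next()?);
        // Version 00 has exactly four fields.
        if version != "00" || parts.next().is_some() {
            return None;
        }
        if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
            return None;
        }
        // All-zero ids are invalid per the specification.
        if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
            return None;
        }
        let flag_bits = u8::from_str_radix(flags, 16).ok()?;
        Some(TraceParent {
            trace_id: trace_id.to_string(),
            parent_id: parent_id.to_string(),
            sampled: flag_bits & 1 == 1,
        })
    }
    ```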
    
    That also lets us stop relying on env-based reparenting for this path,
    which was too ambient and could override explicit ancestry.
  • feat(app-server): add tracing to all app-server APIs (#13285)
    ### Overview
    This PR adds the first piece of tracing for app-server JSON-RPC
    requests.
    
    There are two main changes:
    - JSON-RPC requests can now take an optional W3C trace context at the
    top level via a `trace` field (`traceparent` / `tracestate`).
    - app-server now creates a dedicated request span for every inbound
    JSON-RPC request in `MessageProcessor`, and uses the request-level trace
    context as the parent when present.
    
    For compatibility with existing flows, app-server still falls back to
    the TRACEPARENT env var when there is no request-level traceparent.
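
    The precedence described above can be written as a small pure
    function. `ParentSource` and `select_parent` are hypothetical
    names; the sketch only encodes the ordering (request-level value
    first, then the env var, otherwise a new root).

    ```rust
    /// Where the parent context for a request span comes from.
    #[derive(Debug, PartialEq)]
    pub enum ParentSource<'a> {
        /// `traceparent` carried in the JSON-RPC request's `trace` field.
        Request(&'a str),
        /// Value of the TRACEPARENT environment variable.
        Env(&'a str),
        /// No carrier available: the request span starts a new trace.
        Root,
    }

    pub fn select_parent<'a>(
        request_traceparent: Option<&'a str>,
        env_traceparent: Option<&'a str>,
    ) -> ParentSource<'a> {
        match (request_traceparent, env_traceparent) {
            (Some(tp), _) => ParentSource::Request(tp),
            (None, Some(tp)) => ParentSource::Env(tp),
            (None, None) => ParentSource::Root,
        }
    }
    ```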
    
    This PR is intentionally scoped to the app-server boundary. In a
    followup, we'll actually propagate trace context through the async
    handoff into core execution spans like run_turn, which will make
    app-server traces much more useful.
    
    ### Spans
    A few details on the app-server span shape:
    - each inbound request gets its own server span
    - span/resource names are based on the JSON-RPC method (`initialize`,
    `thread/start`, `turn/start`, etc.)
    - spans record transport (stdio vs websocket), request id, connection
    id, and client name/version when available
    - `initialize` stores client metadata in session state so later requests
    on the same connection can reuse it
  • otel: add host.name resource attribute to logs/traces via gethostname (#12352)
    **PR Summary**
    
    This PR adds the OpenTelemetry `host.name` resource attribute to Codex
    OTEL exports so every OTEL log (and trace, via the shared resource)
    carries the machine hostname.
    
    **What changed**
    
    - Added `host.name` to the shared OTEL `Resource` in
    `codex-rs/otel/src/otel_provider.rs`
      - This applies to both:
        - OTEL logs (`SdkLoggerProvider`)
        - OTEL traces (`SdkTracerProvider`)
    - Hostname is now resolved via `gethostname::gethostname()`
    (best-effort)
      - Value is trimmed
      - Empty values are omitted (non-fatal)
    - Added focused unit tests for:
      - including `host.name` when present
      - omitting `host.name` when missing/empty
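
    The trim-and-omit rule is small enough to sketch directly. The
    `host_name_attribute` function is hypothetical, and it takes the
    raw hostname as a parameter instead of calling
    `gethostname::gethostname()` so the example has no dependencies.

    ```rust
    /// Returns the `host.name` resource attribute, or `None` when the
    /// hostname is missing or blank after trimming.
    pub fn host_name_attribute(raw: Option<&str>) -> Option<(&'static str, String)> {
        let trimmed = raw?.trim();
        if trimmed.is_empty() {
            None
        } else {
            Some(("host.name", trimmed.to_string()))
        }
    }
    ```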
    
    **Why**
    
    - `host.name` is host/process metadata and belongs on the OTEL
    `resource`, not per-event attributes.
    - Attaching it in the shared resource is the smallest change that
    guarantees coverage across all exported OTEL logs/traces.
    
    **Scope / Non-goals**
    
    - No public API changes
    - No changes to metrics behavior (this PR only updates log/trace
    resource metadata)
    
    **Dependency updates**
    
    - Added `gethostname` as a workspace dependency and `codex-otel`
    dependency
    - `Cargo.lock` updated accordingly
    - `MODULE.bazel.lock` unchanged after refresh/check
    
    **Validation**
    
    - `just fmt`
    - `cargo test -p codex-otel`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
  • feat: add service name to app-server (#12319)
    Add a service name to the app-server so that the app can use its own
    service name.

    This is set at the thread level because we may later make the
    app-server a singleton on the computer.
  • Add MCP server context to otel tool_result logs (#12267)
    Summary
    - capture the origin for each configured MCP server and expose it via
    the connection manager
    - plumb MCP server name/origin into tool logging and emit
    codex.tool_result events with those fields
    - add unit coverage for origin parsing and extend OTEL tests to assert
    empty MCP fields for non-MCP tools
    - currently not logging full urls or url paths to prevent logging
    potentially sensitive data
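    
    As a rough illustration of logging only an origin and never a full URL
    or path, the sketch below reduces a URL string to `scheme://host[:port]`.
    `origin_only` is a hypothetical name and the hand-rolled parsing is an
    assumption made to keep the example dependency-free; the parsing in
    this PR may differ.
    
    ```rust
    /// Hypothetical helper: reduce a URL to `scheme://host[:port]`, dropping
    /// userinfo, path, query, and fragment so none of them reach the logs.
    fn origin_only(url: &str) -> Option<String> {
        let (scheme, rest) = url.split_once("://")?;
        // The authority ends at the first '/', '?', or '#'.
        let authority = rest.split(|c| c == '/' || c == '?' || c == '#').next()?;
        // Drop any `user:password@` prefix.
        let host_port = authority.rsplit('@').next()?;
        if scheme.is_empty() || host_port.is_empty() {
            return None;
        }
        Some(format!("{scheme}://{host_port}"))
    }
    
    fn main() {
        // Credentials, path, and query string are all stripped.
        assert_eq!(
            origin_only("https://user:secret@mcp.example.com:8443/v1/tools?token=abc"),
            Some("https://mcp.example.com:8443".to_string())
        );
        // Inputs without a scheme separator yield no origin.
        assert_eq!(origin_only("not a url"), None);
    }
    ```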
    
    Testing
    - Not run (not requested)
  • add(core): safety check downgrade warning (#11964)
    Add per-turn notice when a request is downgraded to a fallback model due
    to cyber safety checks.
    
    **Changes**
    
    - codex-api: Emit a ServerModel event based on the openai-model response
    header and/or response payload (SSE + WebSocket), including when the
    model changes mid-stream.
    - core: When the server-reported model differs from the requested model,
    emit a single per-turn warning explaining the reroute to gpt-5.2 and
    directing users to Trusted Access verification and the cyber safety
    explainer.
    - app-server (v2): Surface these cyber model-routing warnings as
    synthetic userMessage items with text prefixed by Warning: (and document
    this behavior).
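    
    The "single per-turn warning" behavior can be sketched as a small piece
    of per-turn state. Everything here is hypothetical: `TurnState`,
    `on_server_model`, the placeholder model name `requested-model`, and
    the shortened warning text are assumptions for illustration, not the
    types or wording used in codex-core.
    
    ```rust
    /// Hypothetical per-turn state; a fresh value is created for each turn.
    #[derive(Default)]
    struct TurnState {
        warned_about_reroute: bool,
    }
    
    impl TurnState {
        /// Returns a warning the first time the server-reported model differs
        /// from the requested one, and `None` on every later call in the turn.
        fn on_server_model(&mut self, requested: &str, reported: &str) -> Option<String> {
            if requested == reported || self.warned_about_reroute {
                return None;
            }
            self.warned_about_reroute = true;
            Some(format!(
                "Warning: request for {requested} was rerouted to {reported}"
            ))
        }
    }
    
    fn main() {
        let mut turn = TurnState::default();
        // No warning while the server reports the requested model.
        assert!(turn.on_server_model("requested-model", "requested-model").is_none());
        // The first mismatch produces a warning; repeats within the turn do not.
        assert!(turn.on_server_model("requested-model", "gpt-5.2").is_some());
        assert!(turn.on_server_model("requested-model", "gpt-5.2").is_none());
        // A new turn starts with fresh state and can warn again.
        let mut next_turn = TurnState::default();
        assert!(next_turn.on_server_model("requested-model", "gpt-5.2").is_some());
    }
    ```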
  • add sandbox policy and sandbox name to codex.tool.call metrics (#10711)
    This will give visibility into the comparative success rate of the
    Windows sandbox implementations compared to other platforms.
  • fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423)
    So that the rest of the codebase (like the TUI) doesn't need to be
    concerned with whether ChatGPT auth was handled by Codex itself or
    passed in via app-server's external auth mode.
  • Include real OS info in metrics. (#10425)
    Calculates a hashed user ID from either the auth user ID or the API key.
    Also correctly populates the OS.
    
    These will make our metrics more useful and powerful for analysis.
  • Add websocket telemetry metrics and labels (#10316)
    Summary
    - expose websocket telemetry hooks through the responses client so
    request durations and event processing can be reported
    - record websocket request/event metrics and emit runtime telemetry
    events that the history UI now surfaces
    - improve tests to cover websocket telemetry reporting and guard runtime
    summary updates
    
    
    <img width="824" height="79" alt="Screenshot 2026-01-31 at 5 28 12 PM"
    src="https://github.com/user-attachments/assets/ea9a7965-d8b4-4e3c-a984-ef4fdc44c81d"
    />
  • feat: show runtime metrics in console (#10278)
    Summary of changes:
    
    - Adds a new feature flag: runtime_metrics
      - Declared in core/src/features.rs
      - Added to core/config.schema.json
      - Wired into OTEL init in core/src/otel_init.rs
    
    - Enables on-demand runtime metric snapshots in OTEL
      - Adds runtime_metrics: bool to otel/src/config.rs
      - Enables experimental custom reader features in otel/Cargo.toml
      - Adds snapshot/reset/summary APIs in:
        - otel/src/lib.rs
        - otel/src/metrics/client.rs
        - otel/src/metrics/config.rs
        - otel/src/metrics/error.rs
    
    - Defines metric names and a runtime summary builder
      - New files:
        - otel/src/metrics/names.rs
        - otel/src/metrics/runtime_metrics.rs
      - Summarizes totals for:
        - Tool calls
        - API requests
        - SSE/streaming events
    
    - Instruments metrics collection in OTEL manager
      - otel/src/traces/otel_manager.rs now records:
        - API call counts + durations
        - SSE event counts + durations (success/failure)
        - Tool call metrics now use shared constants
    
    - Surfaces runtime metrics in the TUI
      - Resets runtime metrics at turn start in tui/src/chatwidget.rs
      - Displays metrics in the final separator line in
        tui/src/history_cell.rs
    
    - Adds tests
      - New OTEL tests:
        - otel/tests/suite/snapshot.rs
        - otel/tests/suite/runtime_summary.rs
      - New TUI test:
        - final_message_separator_includes_runtime_metrics in
          tui/src/history_cell.rs
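    
    The snapshot/reset/summary flow described above might look roughly like
    the sketch below. It is a simplified stand-in built on plain counters,
    not the OTEL reader-based implementation in this PR; `RuntimeTotals`,
    `RuntimeMetrics`, their methods, and the summary format are all
    hypothetical.
    
    ```rust
    /// Hypothetical totals for the three categories the summary covers.
    #[derive(Debug, Default, Clone, PartialEq)]
    struct RuntimeTotals {
        tool_calls: u64,
        api_requests: u64,
        sse_events: u64,
    }
    
    /// Hypothetical collector with snapshot, reset, and summary operations.
    #[derive(Default)]
    struct RuntimeMetrics {
        totals: RuntimeTotals,
    }
    
    impl RuntimeMetrics {
        fn record_tool_call(&mut self) {
            self.totals.tool_calls += 1;
        }
        fn record_api_request(&mut self) {
            self.totals.api_requests += 1;
        }
        fn record_sse_event(&mut self) {
            self.totals.sse_events += 1;
        }
        /// Copy of the current totals; does not clear them.
        fn snapshot(&self) -> RuntimeTotals {
            self.totals.clone()
        }
        /// Clear the totals, e.g. at the start of a turn.
        fn reset(&mut self) {
            self.totals = RuntimeTotals::default();
        }
        /// One-line summary suitable for a separator line.
        fn summary(&self) -> String {
            format!(
                "tools: {} | api: {} | sse: {}",
                self.totals.tool_calls, self.totals.api_requests, self.totals.sse_events
            )
        }
    }
    
    fn main() {
        let mut metrics = RuntimeMetrics::default();
        metrics.record_tool_call();
        metrics.record_api_request();
        metrics.record_api_request();
        assert_eq!(metrics.summary(), "tools: 1 | api: 2 | sse: 0");
        // Resetting at turn start returns every total to zero.
        metrics.reset();
        assert_eq!(metrics.snapshot(), RuntimeTotals::default());
    }
    ```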
    
    Scope:
    - 19 files changed
    - ~652 insertions, 38 deletions
    
    
    <img width="922" height="169" alt="Screenshot 2026-01-30 at 4 11 34 PM"
    src="https://github.com/user-attachments/assets/1efd754d-a16d-4564-83a5-f4442fd2f998"
    />
  • feat: backfill timing metric (#10218)
    1. Add a metric to measure the backfill time
    2. Add a unit to the timing histogram
  • feat: sqlite 1 (#10004)
    Add a `.sqlite` database to store rollout metadata (and later logs).
    This PR is phase 1:
    * Add the database and the required infrastructure
    * Add a backfill of the database
    * Persist the newly created rollout both in files and in the DB
    * When we need to get metadata or a rollout, consider the `JSONL` as the
    source of truth but compare the results with the DB and show any errors
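    
    The last point, treating the `JSONL` data as authoritative while
    checking it against the DB, can be sketched as below. `RolloutMeta`,
    its fields, and `read_with_check` are hypothetical names chosen for
    illustration; the real metadata type and error reporting are not shown
    here.
    
    ```rust
    /// Hypothetical rollout metadata record.
    #[derive(Debug, Clone, PartialEq)]
    struct RolloutMeta {
        id: String,
        title: String,
    }
    
    /// Always returns the JSONL value, plus a description of any disagreement
    /// with the DB so the caller can surface it.
    fn read_with_check(
        from_jsonl: RolloutMeta,
        from_db: Option<RolloutMeta>,
    ) -> (RolloutMeta, Option<String>) {
        let mismatch = match from_db {
            None => Some(format!("rollout {} is missing from the DB", from_jsonl.id)),
            Some(db) if db != from_jsonl => Some(format!(
                "rollout {} differs between JSONL and the DB",
                from_jsonl.id
            )),
            Some(_) => None,
        };
        (from_jsonl, mismatch)
    }
    
    fn main() {
        let jsonl = RolloutMeta { id: "r1".into(), title: "fix bug".into() };
        let stale = RolloutMeta { id: "r1".into(), title: "old title".into() };
    
        // Matching records produce no error.
        let (meta, err) = read_with_check(jsonl.clone(), Some(jsonl.clone()));
        assert_eq!(meta, jsonl);
        assert!(err.is_none());
    
        // On a mismatch the JSONL value still wins, and the difference is reported.
        let (meta, err) = read_with_check(jsonl.clone(), Some(stale));
        assert_eq!(meta, jsonl);
        assert!(err.is_some());
    }
    ```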
  • feat: add session source as otel metadata tag (#9720)
    Add session.source and user.account_id as global OTEL metric tags to
    identify client surface and user.
  • feat: support proxy for ws connection (#9409)
    Unfortunately, tokio-tungstenite doesn't support proxy configuration
    out of the box. While
    https://github.com/snapview/tokio-tungstenite/pull/370 is in review, we
    can depend on the source code for now.
  • Act on reasoning-included per turn (#9402)
    - Reset reasoning-included flag each turn and update compaction test
  • Add text element metadata to types (#9235)
    Initial type tweaking PR to make the diff of
    https://github.com/openai/codex/pull/9116 smaller
    
    This should not change any behavior, just adds some fields to types
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep-going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To set up remote execution:
    - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cut over to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>