4 Commits

  • [codex] Observe remote exec-server lifecycle (#27470)
    ## Summary
    
    - Record bounded duration and outcome metrics for remote environment
    registration and Noise rendezvous connection attempts.
    - Count reconnects by bounded reason: disconnect, connection failure, or
    rejected registration.
    - Trace registration at the owning client boundary without exporting raw
    environment or registration identifiers.
    - Replace the stale pre-Noise WebSocket observability design with the
    current remote transport model.
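
    As a rough illustration of what "bounded reason" means here, the sketch
    below models the three reconnect reasons as a closed enum so the metric
    tag cannot grow without limit. The type names, tag strings, and the
    in-memory counter are hypothetical and are not taken from the PR.

    ```rust
    use std::collections::HashMap;

    /// Closed set of reconnect reasons, so the tag has bounded cardinality.
    #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
    pub enum ReconnectReason {
        Disconnect,
        ConnectionFailure,
        RejectedRegistration,
    }

    impl ReconnectReason {
        /// Tag value for the metric. These strings are illustrative only.
        pub fn as_tag(self) -> &'static str {
            match self {
                ReconnectReason::Disconnect => "disconnect",
                ReconnectReason::ConnectionFailure => "connection_failure",
                ReconnectReason::RejectedRegistration => "rejected_registration",
            }
        }
    }

    /// In-memory stand-in for a counter instrument (hypothetical).
    #[derive(Default)]
    pub struct ReconnectCounter {
        counts: HashMap<&'static str, u64>,
    }

    impl ReconnectCounter {
        /// Adds one reconnect under the tag for `reason`.
        pub fn record(&mut self, reason: ReconnectReason) {
            *self.counts.entry(reason.as_tag()).or_insert(0) += 1;
        }

        /// Returns how many reconnects were recorded for `reason`.
        pub fn get(&self, reason: ReconnectReason) -> u64 {
            self.counts.get(reason.as_tag()).copied().unwrap_or(0)
        }

        /// Number of distinct tag values seen so far (at most three).
        pub fn distinct_tags(&self) -> usize {
            self.counts.len()
        }
    }
    ```

    Because callers can only pass an enum variant, raw error strings or
    identifiers cannot leak into the tag.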
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    **(this PR)**
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (149 passed)
    - `just test -p codex-cli --test exec_server` (4 passed)
    - `just argument-comment-lint`
    - `just bazel-lock-check`
    - `just fix -p codex-exec-server -p codex-cli`
    - `just fmt`
  • [codex] Record exec-server lifecycle metrics (#27467)
    ## Summary
    
    - Record bounded connection, request, and process lifecycle metrics.
    - Report active gauges from callbacks on every collection, including
    delta exports.
    - Serialize active-count updates so concurrent starts and finishes
    cannot publish stale values.
    - Serialize process exit, explicit termination, and shutdown through the
    process registry so exactly one completion result wins.
    - Keep the implementation small with single-owner RAII guards and one
    real OTLP/HTTP integration test using the existing `wiremock`
    dependency.
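
    A minimal sketch of the single-owner RAII guard idea, assuming a
    mutex-protected count. `ActiveCount` and `ActiveGuard` are hypothetical
    names, and the real implementation may be structured differently.

    ```rust
    use std::sync::{Arc, Mutex};

    /// Shared active count. Every update happens under the mutex, so
    /// concurrent starts and finishes cannot interleave.
    #[derive(Clone, Default)]
    pub struct ActiveCount {
        inner: Arc<Mutex<u64>>,
    }

    impl ActiveCount {
        /// Increments the count and returns a guard that decrements on drop.
        pub fn start(&self) -> ActiveGuard {
            let mut n = self.inner.lock().unwrap();
            *n += 1;
            ActiveGuard {
                inner: Arc::clone(&self.inner),
            }
        }

        /// Reads the current value, e.g. from a gauge callback.
        pub fn current(&self) -> u64 {
            *self.inner.lock().unwrap()
        }
    }

    /// Deliberately not `Clone`: one guard accounts for exactly one
    /// increment and one decrement.
    pub struct ActiveGuard {
        inner: Arc<Mutex<u64>>,
    }

    impl Drop for ActiveGuard {
        fn drop(&mut self) {
            let mut n = self.inner.lock().unwrap();
            *n -= 1;
        }
    }
    ```

    Holding the guard for the lifetime of a connection, request, or process
    keeps the decrement tied to scope exit, including early returns.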
    
    ## Root cause
    
    Process exit and session shutdown previously used cloned completion
    state. That avoided duplicate emission, but it duplicated lifecycle
    ownership and made the ordering harder to reason about. The process
    registry mutex already defines the lifecycle ordering, so the final
    implementation stores the metric guard and termination flag directly on
    the process entry. Whichever path claims the entry first owns the
    completion result.
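
    The claim-by-removal pattern described above can be sketched roughly as
    follows. This is an illustration under assumptions, not the code in this
    PR: `ProcessRegistry`, `Completion`, and the method names are
    hypothetical, and the metric guard is omitted for brevity.

    ```rust
    use std::collections::HashMap;
    use std::sync::Mutex;

    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum Completion {
        Exited,
        Terminated,
        Shutdown,
    }

    /// Per-process state stored directly on the registry entry.
    struct ProcessEntry {
        terminate_requested: bool,
    }

    #[derive(Default)]
    pub struct ProcessRegistry {
        entries: Mutex<HashMap<u32, ProcessEntry>>,
    }

    impl ProcessRegistry {
        pub fn insert(&self, id: u32) {
            let entry = ProcessEntry {
                terminate_requested: false,
            };
            self.entries.lock().unwrap().insert(id, entry);
        }

        /// Marks the entry as explicitly terminated. Returns false if the
        /// entry was already claimed.
        pub fn request_termination(&self, id: u32) -> bool {
            match self.entries.lock().unwrap().get_mut(&id) {
                Some(entry) => {
                    entry.terminate_requested = true;
                    true
                }
                None => false,
            }
        }

        /// Exit path: removes the entry under the lock. Only the first
        /// claimant gets `Some`, so only one result is reported.
        pub fn claim_exit(&self, id: u32) -> Option<Completion> {
            let entry = self.entries.lock().unwrap().remove(&id)?;
            Some(if entry.terminate_requested {
                Completion::Terminated
            } else {
                Completion::Exited
            })
        }

        /// Shutdown path: same removal, different result.
        pub fn claim_shutdown(&self, id: u32) -> Option<Completion> {
            self.entries.lock().unwrap().remove(&id)?;
            Some(Completion::Shutdown)
        }
    }
    ```

    Since removal happens while the registry mutex is held, a racing exit
    and shutdown cannot both observe the entry.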
    
    Production metric export uses delta temporality. Event-only synchronous
    gauge recordings disappear after the next collection when no count
    changes, so active counts now use observable callbacks that report
    current state on every collection.
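
    The difference can be shown with a toy model that uses only the standard
    library. `DeltaReader` is a hypothetical stand-in for a delta-temporality
    reader and is not the OpenTelemetry SDK API: event-recorded points are
    drained on each collection, while callbacks are invoked every time.

    ```rust
    /// Toy reader: event-recorded gauge points are consumed by a
    /// collection, while registered callbacks run on every collection.
    pub struct DeltaReader {
        pending_events: Vec<u64>,
        callbacks: Vec<Box<dyn Fn() -> u64>>,
    }

    impl DeltaReader {
        pub fn new() -> Self {
            DeltaReader {
                pending_events: Vec::new(),
                callbacks: Vec::new(),
            }
        }

        /// Synchronous, event-only gauge recording.
        pub fn record_event(&mut self, value: u64) {
            self.pending_events.push(value);
        }

        /// Observable gauge: the callback reports current state on demand.
        pub fn register_callback(&mut self, f: impl Fn() -> u64 + 'static) {
            self.callbacks.push(Box::new(f));
        }

        /// Returns (last event-recorded value since the previous
        /// collection, values reported by callbacks).
        pub fn collect(&mut self) -> (Option<u64>, Vec<u64>) {
            let event_gauge = std::mem::take(&mut self.pending_events).pop();
            let observed = self.callbacks.iter().map(|f| f()).collect();
            (event_gauge, observed)
        }
    }
    ```

    In this model, a second `collect()` with no intervening count change
    yields `None` for the event-recorded gauge but still yields the
    callback's value.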
    
    The cleanup also removes the constant `result="accepted"` connection
    tag, redundant route and response assertions, a custom HTTP collector,
    and fallback initialization machinery that did not add behavior.
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics **(this PR)**
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (158 passed)
    - `just test -p codex-cli --test exec_server` (3 passed)
    - `just test -p codex-otel observable_gauge_is_collected_on_every_delta_snapshot` (1 passed)
    - `CARGO_BUILD_JOBS=1 just fix -p codex-otel -p codex-exec-server`
    - `just fmt`
    - `git diff --check`
  • [codex] Initialize exec-server OpenTelemetry at startup (#25019)
    ## Summary
    
    - Initialize stderr tracing and the configured OpenTelemetry provider
    for local and remote `codex exec-server` startup.
    - Instrument the local and remote server entrypoints with a root runtime
    span.
    - Keep raw Noise environment, registration, and stream identifiers out
    of exported spans while preserving them in local debug events.
    - Keep telemetry setup in a focused CLI module instead of growing the
    top-level command entrypoint.
    
    ## Stack
    
    - Previous: none (`#27058` has merged)
    - Next: #27466
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (139 passed)
    - `just test -p codex-cli --test exec_server` (3 passed)
    - `just bazel-lock-check`
    - `just fix -p codex-exec-server -p codex-cli`
    - `just fmt`
    
    ---------
    
    Co-authored-by: Richard Lee <richardlee@openai.com>
  • cli: add strict config to exec-server (#23719)
    ## Why
    
    PR #20559 added opt-in strict config parsing to the config-loading
    command surfaces, but `codex exec-server` was left out. That meant
    `codex exec-server --strict-config` was rejected even though the command
    can load config for remote registration, and local server startup had no
    way to fail fast on misspelled config keys.
    
    ## What Changed
    
    - Added `--strict-config` to `codex exec-server`.
    - Allowed root-level inheritance from `codex --strict-config exec-server`.
    - Validated config before local exec-server startup when strict mode is
    requested.
    - Reused the loaded strict-config-aware config for remote exec-server
    registration auth.
    - Added CLI coverage showing `codex exec-server --strict-config` rejects
    unknown config fields.
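
    For readers unfamiliar with strict config parsing, the behavior can be
    sketched as below. This is a simplified model, not the codex
    implementation: the function names and the known-key list are
    hypothetical, and real parsing works on the config file rather than a
    flat list of keys.

    ```rust
    use std::collections::BTreeSet;

    /// Hypothetical set of recognized keys, for illustration only.
    const KNOWN_KEYS: &[&str] = &["endpoint", "timeout_secs"];

    /// Strict mode applies if set on the root command or the subcommand.
    pub fn effective_strict(root_flag: bool, subcommand_flag: bool) -> bool {
        root_flag || subcommand_flag
    }

    /// In strict mode, returns the unknown keys as an error so startup can
    /// fail fast. In non-strict mode, unknown keys are tolerated.
    pub fn check_keys<'a>(
        keys: impl IntoIterator<Item = &'a str>,
        strict: bool,
    ) -> Result<(), Vec<String>> {
        if !strict {
            return Ok(());
        }
        let known: BTreeSet<&str> = KNOWN_KEYS.iter().copied().collect();
        let unknown: Vec<String> = keys
            .into_iter()
            .filter(|k| !known.contains(k))
            .map(str::to_string)
            .collect();
        if unknown.is_empty() {
            Ok(())
        } else {
            Err(unknown)
        }
    }
    ```

    With this shape, a misspelled key such as `timeuot_secs` is reported
    before the server starts instead of being silently ignored.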
    
    ## Verification
    
    - `cargo test -p codex-cli`
    - New integration test:
    `strict_config_rejects_unknown_config_fields_for_exec_server`
    
    ## Documentation
    
    Any strict-config command list on developers.openai.com/codex should
    include `codex exec-server` with the other supported config-loading
    entry points.