Commit Graph

22 Commits

  • Use workspace requirements for guardian prompt override (#14727)
    ## Summary
    - move `guardian_developer_instructions` from managed config into
    workspace-managed `requirements.toml`
    - have guardian continue using the override when present and otherwise
    fall back to the bundled local guardian prompt
    - keep the generalized prompt-quality improvements in the shared
    guardian default prompt
    - update requirements parsing, layering, schema, and tests for the new
    source of truth
    
    ## Context
    This replaces the earlier managed-config / MDM rollout plan.
    
    The intended rollout path is workspace-managed requirements, including
    cloud enterprise policies, rather than backend model metadata, Statsig,
    or Jamf-managed config. That keeps the default/fallback behavior local
    to `codex-rs` while allowing faster policy updates through the
    enterprise requirements plane.
    
    This is intentionally an admin-managed policy input, not a user
    preference: the guardian prompt should come either from the bundled
    `codex-rs` default or from enterprise-managed `requirements.toml`, and
    normal user/project/session config should not override it.
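
    The fallback described above can be sketched as follows. This is a
    minimal illustration, not the `codex-rs` implementation: the
    `Requirements` struct, the constant, and the function name are
    hypothetical stand-ins, and only the `guardian_developer_instructions`
    key comes from this change.

    ```rust
    /// Placeholder for the bundled default guardian prompt (hypothetical text).
    const BUNDLED_GUARDIAN_PROMPT: &str = "bundled default guardian prompt";

    /// Hypothetical, simplified view of the parsed `requirements.toml`.
    #[derive(Default)]
    struct Requirements {
        guardian_developer_instructions: Option<String>,
    }

    /// Use the admin-managed override when present, otherwise the bundled
    /// prompt. User, project, and session config are deliberately not inputs.
    fn effective_guardian_prompt(requirements: &Requirements) -> &str {
        requirements
            .guardian_developer_instructions
            .as_deref()
            .unwrap_or(BUNDLED_GUARDIAN_PROMPT)
    }
    ```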
    
    ## Updating The OpenAI Prompt
    After this lands, the OpenAI-specific guardian prompt should be updated
    through the workspace Policies UI at `/codex/settings/policies` rather
    than through Jamf or codex-backend model metadata.
    
    Operationally:
    - open the workspace Policies editor as a Codex admin
    - edit the default `requirements.toml` policy, or a higher-precedence
    group-scoped override if we ever want different behavior for a subset of
    users
    - set `guardian_developer_instructions = """..."""` to the full
    OpenAI-specific guardian prompt text
    - save the policy; codex-backend stores the raw TOML and `codex-rs`
    fetches the effective requirements file from `/wham/config/requirements`
    
    When updating the OpenAI-specific prompt, keep it aligned with the
    shared default guardian policy in `codex-rs` except for intentional
    OpenAI-only additions.
    
    ## Testing
    - `cargo check --tests -p codex-core -p codex-config -p
    codex-cloud-requirements --message-format short`
    - `cargo run -p codex-core --bin codex-write-config-schema`
    - `cargo fmt`
    - `git diff --check`
    
    Co-authored-by: Codex <noreply@openai.com>
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
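
    For readers unfamiliar with the convention, a call site in the
    `/*param*/` style might look like the sketch below. `render_line` and
    its parameters are hypothetical and exist only to show which literals
    get a comment.

    ```rust
    /// Hypothetical helper that takes several positional literals.
    fn render_line(text: &str, width: usize, truncate: bool, fill: char) -> String {
        let limit = if truncate { width } else { usize::MAX };
        let mut line: String = text.chars().take(limit).collect();
        // Pad with the fill character until the line reaches `width`.
        while line.chars().count() < width {
            line.push(fill);
        }
        line
    }

    fn example() -> String {
        // Opaque numeric and boolean literals carry an exact `/*param*/`
        // comment; string and char literals stay uncommented.
        render_line("ok", /*width*/ 4, /*truncate*/ true, '.')
    }
    ```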
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • Move TUI on top of app server (parallel code) (#14717)
    This PR replicates the `tui` code directory and creates a temporary
    parallel `tui_app_server` directory. It also implements a new feature
    flag `tui_app_server` to select between the two TUI implementations.
    
    Once the new app-server-based TUI is stabilized, we'll delete the old
    `tui` directory and feature flag.
  • Add auth 401 observability to client bug reports (#14611)
    CXC-392
    
      [With
      401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream)
      <img width="1909" height="555" alt="401 auth tags in Sentry"
      src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3"
      />
    
    
      - auth_401_*: preserved facts from the latest unauthorized response snapshot
      - auth_*: latest auth-related facts from the latest request attempt
      - auth_recovery_*: unauthorized recovery state and follow-up result
    
    
      Without 401
      <img width="1917" height="522" alt="happy-path auth tags in Sentry"
      src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f"
      />
    
      ###### Summary
      - Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation.
      - Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
      - Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
      - Preserve original unauthorized evidence under `auth_401_*` while keeping follow-up result tags separate.
    
      ###### Rationale (from spec findings)
      - The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
      - Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
      - Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
      - The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.
    
      ###### Scope
      - Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing.
      - Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.
    
      ###### Trade-offs
      - This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
      - This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
      - Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.
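
      As a rough sketch of the tag grouping, assuming hypothetical struct
      and tag names (only the `auth_401_*`, `auth_*`, and `auth_recovery_*`
      prefixes come from this PR), the feedback tags could be assembled
      from safe evidence like this:

      ```rust
      use std::collections::BTreeMap;

      /// Hypothetical snapshot of the latest unauthorized response.
      struct Unauthorized401 {
          request_id: Option<String>,
          cf_ray: Option<String>,
      }

      /// Hypothetical facts about the latest request attempt.
      struct AuthAttempt {
          auth_header_attached: bool,
          auth_header_name: Option<String>,
      }

      /// Build tags from safe evidence only: no token values, no raw bodies.
      fn feedback_tags(
          last_401: Option<&Unauthorized401>,
          attempt: &AuthAttempt,
          recovery_outcome: Option<&str>,
      ) -> BTreeMap<String, String> {
          let mut tags = BTreeMap::new();
          tags.insert(
              "auth_header_attached".to_string(),
              attempt.auth_header_attached.to_string(),
          );
          if let Some(name) = &attempt.auth_header_name {
              tags.insert("auth_header_name".to_string(), name.clone());
          }
          // The original 401 evidence lives under its own prefix so later
          // attempts do not overwrite it.
          if let Some(snapshot) = last_401 {
              if let Some(id) = &snapshot.request_id {
                  tags.insert("auth_401_request_id".to_string(), id.clone());
              }
              if let Some(ray) = &snapshot.cf_ray {
                  tags.insert("auth_401_cf_ray".to_string(), ray.clone());
              }
          }
          if let Some(outcome) = recovery_outcome {
              tags.insert("auth_recovery_outcome".to_string(), outcome.to_string());
          }
          tags
      }
      ```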
    
      ###### Client follow-up
      - PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing.
      - A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
      - `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path.
    
      ###### Testing
      - `cargo test -p codex-core refresh_available_models_sorts_by_priority`
      - `cargo test -p codex-core emit_feedback_request_tags_`
      - `cargo test -p codex-core emit_feedback_auth_recovery_tags_`
      - `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
      - `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers`
      - `cargo test -p codex-core identity_auth_details`
      - `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details`
      - `cargo test -p codex-core --all-features --no-run`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability`
      - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`
  • Override local apps settings with requirements.toml settings (#14304)
    This PR changes app and connector enablement when `requirements.toml` is
    present locally or via remote configuration.
    
    For apps.* entries:
    - `enabled = false` in `requirements.toml` overrides the user’s local
    `config.toml` and forces the app to be disabled.
    - `enabled = true` in `requirements.toml` does not re-enable an app the
    user has disabled in config.toml.
    
    This behavior applies whether or not the user has an explicit entry for
    that app in `config.toml`. It also applies to cloud-managed policies and
    configurations when the admin sets the override through
    `requirements.toml`.
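
    The precedence rule can be summarized in a small sketch. The function
    name is hypothetical, and the default of "enabled" for an app with no
    user entry is an assumption made for illustration.

    ```rust
    /// `requirement` is the `enabled` value from `requirements.toml`, if any;
    /// `user` is the value from the user's `config.toml`, if any.
    fn effective_app_enabled(requirement: Option<bool>, user: Option<bool>) -> bool {
        match requirement {
            // An admin `enabled = false` always wins.
            Some(false) => false,
            // `enabled = true` (or no requirement) defers to the user's choice.
            // Assumption: an app with no user entry defaults to enabled.
            Some(true) | None => user.unwrap_or(true),
        }
    }
    ```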
    
    Scenarios tested and verified:
    - Remote managed, user config (present) override
      - Admin-defined policies & configurations include a connector override:
        `enabled = false` under `[apps.<appID>]`
      - User's config.toml has the same connector configured with
        `enabled = true`
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer

    - Remote managed, user config (absent) override
      - Admin-defined policies & configurations include a connector override:
        `enabled = false` under `[apps.<appID>]`
      - User's config.toml has no entry for the same connector
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer

    - Locally managed, user config (present) override
      - Local requirements.toml includes a connector override:
        `enabled = false` under `[apps.<appID>]`
      - User's config.toml has the same connector configured with
        `enabled = true`
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer

    - Locally managed, user config (absent) override
      - Local requirements.toml includes a connector override:
        `enabled = false` under `[apps.<appID>]`
      - User's config.toml has no entry for the same connector
      - TUI/App should show connector as disabled
      - Connector should be unavailable for use in the composer
    
    
    
    
    <img width="1446" height="753" alt="image"
    src="https://github.com/user-attachments/assets/61c714ca-dcca-4952-8ad2-0afc16ff3835"
    />
    <img width="595" height="233" alt="image"
    src="https://github.com/user-attachments/assets/7c8ab147-8fd7-429a-89fb-591c21c15621"
    />
  • Refactor cloud requirements error and surface in JSON-RPC error (#14504)
    Refactors cloud requirements error handling to carry structured error
    metadata and surfaces that metadata through JSON-RPC config-load
    failures, including:
    * adds typed CloudRequirementsLoadErrorCode values plus optional
    statusCode
    * marks thread/start, thread/resume, and thread/fork config failures
    with structured cloud-requirements error data
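
    A minimal sketch of what the structured metadata might look like. Only
    the `CloudRequirementsLoadErrorCode` type name and the optional
    `statusCode` come from this change; the variants, the
    `cloudRequirementsErrorCode` field name, and the hand-rolled JSON
    rendering are assumptions for illustration.

    ```rust
    /// Hypothetical variants; the real enum's values may differ.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum CloudRequirementsLoadErrorCode {
        Auth,
        Timeout,
        RequestFailed,
    }

    /// Structured metadata carried with a config-load failure.
    #[derive(Debug, PartialEq)]
    struct CloudRequirementsLoadError {
        code: CloudRequirementsLoadErrorCode,
        status_code: Option<u16>,
    }

    impl CloudRequirementsLoadError {
        /// Render an assumed JSON-RPC error `data` payload as a JSON string.
        fn to_json_rpc_data(&self) -> String {
            let status = match self.status_code {
                Some(code) => code.to_string(),
                None => "null".to_string(),
            };
            format!(
                "{{\"cloudRequirementsErrorCode\":\"{:?}\",\"statusCode\":{}}}",
                self.code, status
            )
        }
    }
    ```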
  • fix: properly handle 401 error in cloud requirement fetch. (#14049)
    Handle cloud requirements 401s with the same auth recovery flow as
    normal requests, so permanent refresh failures surface the existing
    user-facing auth message instead of a generic workspace-config load
    error.
  • config: enforce enterprise feature requirements (#13388)
    ## Why
    
    Enterprises can already constrain approvals, sandboxing, and web search
    through `requirements.toml` and MDM, but feature flags were still only
    configurable as managed defaults. That meant an enterprise could suggest
    feature values, but it could not actually pin them.
    
    This change closes that gap and makes enterprise feature requirements
    behave like the other constrained settings. The effective feature set
    now stays consistent with enterprise requirements during config load,
    when config writes are validated, and when runtime code mutates feature
    flags later in the session.
    
    It also tightens the runtime API for managed features. `ManagedFeatures`
    now follows the same constraint-oriented shape as `Constrained<T>`
    instead of exposing panic-prone mutation helpers, and production code
    can no longer construct it through an unconstrained `From<Features>`
    path.
    
    The PR also hardens the `compact_resume_fork` integration coverage on
    Windows. After the feature-management changes,
    `compact_resume_after_second_compaction_preserves_history` was
    overflowing the libtest/Tokio thread stacks on Windows, so the test now
    uses an explicit larger-stack harness as a pragmatic mitigation. That
    may not be the ideal root-cause fix, and it merits a parallel
    investigation into whether part of the async future chain should be
    boxed to reduce stack pressure instead.
    
    ## What Changed
    
    Enterprises can now pin feature values in `requirements.toml` with the
    requirements-side `features` table:
    
    ```toml
    [features]
    personality = true
    unified_exec = false
    ```
    
    Only canonical feature keys are allowed in the requirements `features`
    table; omitted keys remain unconstrained.
    
    - Added a requirements-side pinned feature map to
    `ConfigRequirementsToml`, threaded it through source-preserving
    requirements merge and normalization in `codex-config`, and made the
    TOML surface use `[features]` (while still accepting legacy
    `[feature_requirements]` for compatibility).
    - Exposed `featureRequirements` from `configRequirements/read`,
    regenerated the JSON/TypeScript schema artifacts, and updated the
    app-server README.
    - Wrapped the effective feature set in `ManagedFeatures`, backed by
    `ConstrainedWithSource<Features>`, and changed its API to mirror
    `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
    and result-returning `enable` / `disable` / `set_enabled` helpers.
    - Removed the legacy-usage and bulk-map passthroughs from
    `ManagedFeatures`; callers that need those behaviors now mutate a plain
    `Features` value and reapply it through `set(...)`, so the constrained
    wrapper remains the enforcement boundary.
    - Removed the production loophole for constructing unconstrained
    `ManagedFeatures`. Non-test code now creates it through the configured
    feature-loading path, and `impl From<Features> for ManagedFeatures` is
    restricted to `#[cfg(test)]`.
    - Rejected legacy feature aliases in enterprise feature requirements,
    and return a load error when a pinned combination cannot survive
    dependency normalization.
    - Validated config writes against enterprise feature requirements before
    persisting changes, including explicit conflicting writes and
    profile-specific feature states that normalize into invalid
    combinations.
    - Updated runtime and TUI feature-toggle paths to use the constrained
    setter API and to persist or apply the effective post-constraint value
    rather than the requested value.
    - Updated the `core_test_support` Bazel target to include the bundled
    core model-catalog fixtures in its runtime data, so helper code that
    resolves `core/models.json` through runfiles works in remote Bazel test
    environments.
    - Renamed the core config test coverage to emphasize that effective
    feature values are normalized at runtime, while conflicting persisted
    config writes are rejected.
    - Ran `compact_resume_after_second_compaction_preserves_history` inside
    an explicit 8 MiB test thread and Tokio runtime worker stack, following
    the existing larger-stack integration-test pattern, to keep the Windows
    `compact_resume_fork` test slice from aborting while a parallel
    investigation continues into whether some of the underlying async
    futures should be boxed.
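
    To illustrate the constraint-oriented API shape, here is a heavily
    simplified stand-in. The real `ManagedFeatures` is backed by
    `ConstrainedWithSource<Features>`; the map-based storage, the
    `ConstraintError` type, and `is_enabled` below are assumptions that
    only mirror the `can_set` / result-returning setter idea.

    ```rust
    use std::collections::BTreeMap;

    /// Simplified stand-in for the constrained feature set.
    struct ManagedFeatures {
        values: BTreeMap<String, bool>,
        /// Pins from the requirements-side `[features]` table.
        pinned: BTreeMap<String, bool>,
    }

    #[derive(Debug, PartialEq)]
    struct ConstraintError {
        key: String,
        required: bool,
    }

    impl ManagedFeatures {
        fn new(pinned: BTreeMap<String, bool>) -> Self {
            // Pinned values seed the effective set, so it starts consistent.
            Self { values: pinned.clone(), pinned }
        }

        /// A value is settable when the key is unpinned or matches the pin.
        fn can_set(&self, key: &str, enabled: bool) -> bool {
            self.pinned.get(key).map_or(true, |required| *required == enabled)
        }

        /// Result-returning setter: a conflicting write is rejected, not a panic.
        fn set_enabled(&mut self, key: &str, enabled: bool) -> Result<(), ConstraintError> {
            match self.pinned.get(key) {
                Some(required) if *required != enabled => Err(ConstraintError {
                    key: key.to_string(),
                    required: *required,
                }),
                _ => {
                    self.values.insert(key.to_string(), enabled);
                    Ok(())
                }
            }
        }

        fn is_enabled(&self, key: &str) -> bool {
            self.values.get(key).copied().unwrap_or(false)
        }
    }
    ```

    Callers that want to apply a requested value can check `can_set` first
    or handle the `Err` and keep the effective value unchanged.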
    
    ## Verification
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core feature_requirements_ -- --nocapture`
    - `cargo test -p codex-core
    load_requirements_toml_produces_expected_constraints -- --nocapture`
    - `cargo test -p codex-core
    compact_resume_after_second_compaction_preserves_history -- --nocapture`
    - `cargo test -p codex-core compact_resume_fork -- --nocapture`
    - Re-ran the built `codex-core` `tests/all` binary with
    `RUST_MIN_STACK=262144` for
    `compact_resume_after_second_compaction_preserves_history` to confirm
    the explicit-stack harness fixes the deterministic low-stack repro.
    - `cargo test -p codex-core`
    - This still fails locally in unrelated integration areas that expect
    the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
    wiremock mismatches.
    
    ## Docs
    
    `developers.openai.com/codex` should document the requirements-side
    `[features]` table for enterprise and MDM-managed configuration,
    including that it only accepts canonical feature keys and that
    conflicting config writes are rejected.
  • Make cloud_requirements fail close (#13063)
    Fail closed only for the CLI for now; this will be extended to
    app-server later.
  • Add a background job to refresh the requirements local cache (#12936)
    - Update the cloud requirements cache TTL to 30 minutes.
    - Add a background job to refresh the cache every 5 minutes.
    - Ensure there is only one refresh job per process.
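
    One way to guarantee a single refresh job per process is an atomic
    guard, sketched below. The constant and function names are
    hypothetical; only the 30-minute TTL and 5-minute interval come from
    this change, and the actual mechanism in the codebase may differ.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::time::Duration;

    /// Values taken from the commit message above.
    const CACHE_TTL: Duration = Duration::from_secs(30 * 60);
    const REFRESH_INTERVAL: Duration = Duration::from_secs(5 * 60);

    /// Process-wide guard so that only one refresh job is ever started.
    static REFRESH_JOB_STARTED: AtomicBool = AtomicBool::new(false);

    /// Returns `true` only for the single caller that should spawn the job.
    fn claim_refresh_job() -> bool {
        REFRESH_JOB_STARTED
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Number of refresh ticks that fit inside one TTL window.
    fn refreshes_per_ttl() -> u64 {
        CACHE_TTL.as_secs() / REFRESH_INTERVAL.as_secs()
    }
    ```

    With these values the job runs several times per TTL window, so a
    single failed refresh need not let the cache expire.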
  • Cache cloud requirements (#11305)
    We're loading these from the web on every startup. This puts them in a
    local file with a 1hr TTL.
    
    We sign the downloaded requirements with a key compiled into the Codex
    CLI to prevent unsophisticated tampering (determined circumvention is
    outside of our threat model: after all, one could just compile Codex
    without any of these checks).
    
    If any of the following are true, we ignore the local cache and re-fetch
    from Cloud:
    * The signature is invalid for the payload (== requirements, sign time,
    ttl, user identity)
    * The identity does not match the auth'd user's identity
    * The TTL has expired
    * We cannot parse requirements.toml from the payload
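
    The four checks above can be sketched as a single decision function.
    The struct fields and names are hypothetical, and signature
    verification and TOML parsing are passed in as closures because this
    sketch does not assume a particular signing scheme or parser.

    ```rust
    /// Hypothetical shape of the signed cache payload.
    struct CachedRequirements {
        requirements_toml: String,
        signed_at_secs: u64,
        ttl_secs: u64,
        user_identity: String,
        signature: Vec<u8>,
    }

    #[derive(Debug, PartialEq)]
    enum CacheDecision {
        Use,
        Refetch(&'static str),
    }

    /// Decide whether the local cache is usable or must be re-fetched.
    fn check_cache(
        entry: &CachedRequirements,
        current_user: &str,
        now_secs: u64,
        signature_is_valid: impl Fn(&CachedRequirements) -> bool,
        parses_as_requirements: impl Fn(&str) -> bool,
    ) -> CacheDecision {
        if !signature_is_valid(entry) {
            return CacheDecision::Refetch("invalid signature");
        }
        if entry.user_identity != current_user {
            return CacheDecision::Refetch("identity mismatch");
        }
        if now_secs >= entry.signed_at_secs.saturating_add(entry.ttl_secs) {
            return CacheDecision::Refetch("ttl expired");
        }
        if !parses_as_requirements(&entry.requirements_toml) {
            return CacheDecision::Refetch("unparseable requirements");
        }
        CacheDecision::Use
    }
    ```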
  • feat(core): add network constraints schema to requirements.toml (#10958)
    ## Summary
    
    Add `requirements.toml` schema support for admin-defined network
    constraints in the requirements layer.
    
    Example config:
    
    ```toml
    [experimental_network]
    enabled = true
    allowed_domains = ["api.openai.com"]
    denied_domains = ["example.com"]
    ```
  • feat: add support for allowed_web_search_modes in requirements.toml (#10964)
    This PR makes it possible to disable live web search via an enterprise
    config even if the user is running in `--yolo` mode (though cached web
    search will still be available). To do this, create
    `/etc/codex/requirements.toml` as follows:
    
    ```toml
    # "live" is not allowed; "disabled" is allowed even though not listed explicitly.
    allowed_web_search_modes = ["cached"]
    ```
    
    Or set `requirements_toml_base64` via MDM, as explained at
    https://developers.openai.com/codex/security/#locations.
    
    ### Why
    - Enforce admin/MDM/`requirements.toml` constraints on web-search
    behavior, independent of user config and per-turn sandbox defaults.
    - Ensure per-turn config resolution and review-mode overrides never
    crash when constraints are present.
    
    ### What
    - Add `allowed_web_search_modes` to requirements parsing and surface it
    in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
    fixtures updated.
    - Define a requirements allowlist type (`WebSearchModeRequirement`) and
    normalize semantics:
      - `disabled` is always implicitly allowed (even if not listed).
      - An empty list is treated as `["disabled"]`.
    - Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
    requirements via `ConstrainedWithSource<WebSearchMode>`.
    - Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
      - Prefer `Live → Cached → Disabled` when
      `SandboxPolicy::DangerFullAccess` is active (subject to requirements),
      unless the user preference is explicitly `Disabled`.
      - Otherwise, honor the user’s preferred mode, falling back to an allowed
      mode when necessary.
    - Update TUI `/debug-config` and app-server mapping to display
    normalized `allowed_web_search_modes` (including implicit `disabled`).
    - Fix web-search integration tests to assert cached behavior under
    `SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
    `live` when allowed).
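
    The normalization and per-turn rules above can be sketched as follows.
    This is a simplified illustration rather than the actual
    `resolve_web_search_mode_for_turn`: the function names and the boolean
    `danger_full_access` parameter are hypothetical, and the fallback order
    in the non-full-access case is an assumption, since the description
    only says "an allowed mode".

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum WebSearchMode {
        Disabled,
        Cached,
        Live,
    }

    /// `Disabled` is always implicitly allowed, so an empty list
    /// normalizes to `[Disabled]`.
    fn normalize_allowed(listed: &[WebSearchMode]) -> Vec<WebSearchMode> {
        let mut allowed = listed.to_vec();
        if !allowed.contains(&WebSearchMode::Disabled) {
            allowed.push(WebSearchMode::Disabled);
        }
        allowed
    }

    /// Pick the mode for one turn, subject to the requirements allowlist.
    fn resolve_for_turn(
        preferred: WebSearchMode,
        danger_full_access: bool,
        listed: &[WebSearchMode],
    ) -> WebSearchMode {
        let allowed = normalize_allowed(listed);
        let order = if danger_full_access && preferred != WebSearchMode::Disabled {
            [WebSearchMode::Live, WebSearchMode::Cached, WebSearchMode::Disabled]
        } else {
            // Assumption: after the preference, try Cached, then Disabled.
            [preferred, WebSearchMode::Cached, WebSearchMode::Disabled]
        };
        order
            .iter()
            .copied()
            .find(|mode| allowed.contains(mode))
            .unwrap_or(WebSearchMode::Disabled)
    }
    ```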
  • Cloud Requirements: increase timeout and retries (#10631)
    Add retries and a longer timeout for loading Cloud Requirements.
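
    A retry loop of this kind might look like the sketch below. The attempt
    count and timeout are hypothetical values (the commit does not state
    the actual numbers), and `fetch` stands in for whatever performs the
    request with the given per-attempt timeout.

    ```rust
    use std::time::Duration;

    /// Hypothetical policy values, chosen only for illustration.
    const MAX_ATTEMPTS: u32 = 3;
    const REQUEST_TIMEOUT: Duration = Duration::from_secs(15);

    /// Call `fetch` up to `MAX_ATTEMPTS` times, passing the per-attempt
    /// timeout, and return the first success or the last error.
    fn fetch_with_retries<T, E>(
        mut fetch: impl FnMut(Duration) -> Result<T, E>,
    ) -> Result<T, E> {
        let mut last = fetch(REQUEST_TIMEOUT);
        let mut attempts = 1;
        while last.is_err() && attempts < MAX_ATTEMPTS {
            last = fetch(REQUEST_TIMEOUT);
            attempts += 1;
        }
        last
    }
    ```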
    
    Co-authored-by: alexsong-oai <alexsong@openai.com>
  • Fix minor typos in comments and documentation (#10287)
    ## Summary
    
    I have read the contribution guidelines.  
    All changes in this PR are limited to text corrections and do not modify
    any business logic, runtime behavior, or user-facing functionality.
    
    ## Details
    
    This PR fixes several minor typos, including:
    
    - `create` -> `crate`
    - `analagous` -> `analogous`
    - `apply-patch` -> `apply_patch`
    - `codecs` -> `codex`
    - ` '/" ` -> ` '/' `
    - `Respesent` -> `Represent`
  • Turn on cloud requirements for business too (#10283)
    Cloud requirements need to check for both "enterprise" and "business".
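
    In sketch form, assuming the plan is available as a lowercase string
    (the function name and the string representation are hypothetical):

    ```rust
    /// Returns true for plans that should load cloud requirements.
    /// Assumption: the plan is exposed as a lowercase string.
    fn plan_uses_cloud_requirements(plan: &str) -> bool {
        matches!(plan, "enterprise" | "business")
    }
    ```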
  • Add enforce_residency to requirements (#10263)
    Add `enforce_residency` to requirements.toml and thread it through to a
    header on `default_client`.
  • Load exec policy rules from requirements (#10190)
    `requirements.toml` should be able to specify rules which always run. 
    
    My intention here was that these rules could only ever be restrictive,
    which means the decision can be "prompt" or "forbidden" but never
    "allow". A requirement of "you must always allow this command" didn't
    make sense to me, but happy to be gaveled otherwise.
    
    Rules already apply the most restrictive decision, so we can safely
    merge these with rules found in other config folders.
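
    The "restrictive only" intent and the most-restrictive merge can be
    sketched like this. The enum and function names are hypothetical, and
    rejecting "allow" at validation time is one possible way to enforce the
    intent described above, not necessarily how the change does it.

    ```rust
    /// Ordered from least to most restrictive, so `max` picks the strictest.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    enum Decision {
        Allow,
        Prompt,
        Forbidden,
    }

    /// Requirements rules may only restrict, so `Allow` is rejected.
    fn validate_requirement_decision(decision: Decision) -> Result<Decision, String> {
        match decision {
            Decision::Allow => Err("requirements rules cannot use \"allow\"".to_string()),
            other => Ok(other),
        }
    }

    /// Merge the decisions of all matching rules, whatever their source.
    /// Returns `None` when no rule matched.
    fn most_restrictive(decisions: &[Decision]) -> Option<Decision> {
        decisions.iter().copied().max()
    }
    ```

    Because the merge always takes the strictest decision, a requirements
    rule can tighten what a user rule allows but never loosen it.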
  • Fetch Requirements from cloud (#10167)
    Load requirements from Codex Backend. It only does this for enterprise
    customers signed in with ChatGPT.
    
    Todo in follow-up PRs:
    * Add to app-server and exec too
    * Switch from fail-open to fail-closed on failure