Commit Graph

5178 Commits

  • codex debug 7 (guardian approved) (#17123)
    Removes lines 43-49 from core/templates/agents/orchestrator.md.
  • codex debug 5 (guardian approved) (#17121)
    Removes lines 29-35 from core/templates/agents/orchestrator.md.
  • codex debug 3 (guardian approved) (#17119)
    Removes lines 15-21 from core/templates/agents/orchestrator.md.
  • codex debug 1 (guardian approved) (#17117)
    Removes lines 1-7 from core/templates/agents/orchestrator.md.
  • feat: single app-server bootstrap in TUI (#16582)
    Before this change, the TUI started two app-servers: one to check the
    login status and one to actually start the session.
    
    This PR starts only one app-server and defers the login check to an
    async task, outside of the frame rendering path.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Support anyOf and enum in JsonSchema (#16875)
    This brings us into better alignment with the JSON schema subset that is
    supported in
    <https://developers.openai.com/api/docs/guides/structured-outputs#supported-schemas>,
    and also allows us to render richer function signatures in code mode
    (e.g., anyOf{null, OtherObjectType})
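    
    To illustrate the richer signature rendering mentioned above, here is a
    minimal sketch. It is not the actual `JsonSchema` type: `SchemaSketch`
    and `render` are hypothetical names, and the `anyOf{...}` output format
    is only assumed from the example in the text.
    
    ```rust
    /// Hypothetical, simplified schema model (not the real `JsonSchema` type).
    #[derive(Debug, Clone, PartialEq)]
    enum SchemaSketch {
        Null,
        String,
        /// A string restricted to a fixed set of values (`enum`).
        Enum(Vec<String>),
        /// A reference to a named object type.
        Object(String),
        /// Any one of the listed alternatives (`anyOf`).
        AnyOf(Vec<SchemaSketch>),
    }
    
    /// Renders a compact, signature-style description of a schema.
    fn render(schema: &SchemaSketch) -> String {
        match schema {
            SchemaSketch::Null => "null".to_string(),
            SchemaSketch::String => "string".to_string(),
            SchemaSketch::Enum(values) => {
                // Quote each allowed value and separate alternatives with " | ".
                let quoted: Vec<String> =
                    values.iter().map(|v| format!("\"{}\"", v)).collect();
                quoted.join(" | ")
            }
            SchemaSketch::Object(name) => name.clone(),
            SchemaSketch::AnyOf(variants) => {
                // Render each alternative recursively inside anyOf{...}.
                let parts: Vec<String> = variants.iter().map(render).collect();
                format!("anyOf{{{}}}", parts.join(", "))
            }
        }
    }
    ```
    
    Under these assumptions, an `AnyOf` of `Null` and
    `Object("OtherObjectType")` renders in the same shape as the example in
    the description.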
  • Remove obsolete codex-cli README (#17096)
    Problem: codex-cli/README.md is obsolete and confusing to keep around.
    
    Solution: Delete codex-cli/README.md so the stale README is no longer
    present in the repository.
  • Remove expired April 2nd tooltip copy (#16698)
    Addresses #16677
    
    Problem: Paid-plan startup tooltips still advertised 2x rate limits
    until April 2nd after that promo had expired.
    
    Solution: Remove the stale expiry copy and use evergreen Codex App /
    Codex startup tips instead.
  • fix: refresh network proxy settings when sandbox mode changes (#17040)
    ## Summary
    
    Fix network proxy sessions so changing sandbox mode recomputes the
    effective managed network policy and applies it to the already-running
    per-session proxy.
    
    ## Root Cause
    
    `danger_full_access_denylist_only` injects `"*"` only while building the
    proxy spec for Full Access. Sessions built that spec once at startup, so
    a later permission switch to Full Access left the live proxy in its
    original restricted policy. Switching back needed the same recompute
    path to remove the synthetic wildcard again.
    
    ## What Changed
    
    - Preserve the original managed network proxy config/requirements so the
    effective spec can be recomputed for a new sandbox policy.
    - Refresh the current session proxy when sandbox settings change, then
    reapply exec-policy network overlays.
    - Add an in-place proxy state update path while rejecting
    listener/port/SOCKS changes that cannot be hot-reloaded.
    - Keep runtime proxy settings cheap to snapshot and update.
    - Add regression coverage for workspace-write -> Full Access ->
    workspace-write.
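    
    The root cause above can be sketched as follows. This is a simplified
    model, not the actual proxy code: `SandboxMode`, `BaseProxyConfig`, and
    `effective_allowlist` are hypothetical names, and only the flag name
    `danger_full_access_denylist_only` comes from the description.
    
    ```rust
    /// Hypothetical sandbox modes, named after the modes mentioned above.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxMode {
        WorkspaceWrite,
        FullAccess,
    }
    
    /// Hypothetical stand-in for the preserved managed proxy configuration.
    #[derive(Debug, Clone)]
    struct BaseProxyConfig {
        allowed_domains: Vec<String>,
        danger_full_access_denylist_only: bool,
    }
    
    /// Recomputes the effective allowlist from the preserved base config.
    /// The synthetic "*" is derived on every call and never stored in the
    /// base, so switching away from Full Access removes it again.
    fn effective_allowlist(base: &BaseProxyConfig, mode: SandboxMode) -> Vec<String> {
        let mut allowed = base.allowed_domains.clone();
        if mode == SandboxMode::FullAccess && base.danger_full_access_denylist_only {
            allowed.push("*".to_string());
        }
        allowed
    }
    ```
    
    Because the base config is kept unmodified, calling
    `effective_allowlist` for workspace-write, then Full Access, then
    workspace-write again yields the original allowlist on the last call.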
  • Add project-local codex bug triage skill (#17064)
    Add a `codex-bug` skill to help diagnose and fix bugs in codex.
  • Add remote exec start script (#17059)
    Just pass an SSH host
    (in this example, `codex-remote`):
    ```
    ./scripts/start-codex-exec.sh codex-remote
    ```
  • Add regression tests for JsonSchema (#17052)
    Tests added for existing JsonSchema in
    `codex-rs/tools/src/json_schema_tests.rs`:
    
    - `parse_tool_input_schema_coerces_boolean_schemas`
    - `parse_tool_input_schema_infers_object_shape_and_defaults_properties`
    - `parse_tool_input_schema_normalizes_integer_and_missing_array_items`
    - `parse_tool_input_schema_sanitizes_additional_properties_schema`
    - `parse_tool_input_schema_infers_object_shape_from_boolean_additional_properties_only`
    - `parse_tool_input_schema_infers_number_from_numeric_keywords`
    - `parse_tool_input_schema_infers_number_from_multiple_of`
    - `parse_tool_input_schema_infers_string_from_enum_const_and_format_keywords`
    - `parse_tool_input_schema_defaults_empty_schema_to_string`
    - `parse_tool_input_schema_infers_array_from_prefix_items`
    - `parse_tool_input_schema_preserves_boolean_additional_properties_on_inferred_object`
    - `parse_tool_input_schema_infers_object_shape_from_schema_additional_properties_only`
    
    Tests that we expect to fail on the baseline normalizer, but pass with
    the new JsonSchema:
    
    - `parse_tool_input_schema_preserves_nested_nullable_type_union`
    - `parse_tool_input_schema_preserves_nested_any_of_property`
  • fix(tui): reduce startup and new-session latency (#17039)
    ## TL;DR
    
    - Fetches account/rateLimits/read asynchronously so the TUI can continue
    starting without waiting for the rate-limit response.
    - Fixes the /status card so it no longer leaves a stale “refreshing
    cached limits...” notice in terminal history.
    
    ## Problem
    
    The TUI bootstrap path fetched account rate limits synchronously
    (`account/rateLimits/read`) before the event loop started for
    ChatGPT/OpenAI-authenticated startups. This added ~670 ms of blocking
    latency in the measured hot-start case, even though rate-limit data is
    not needed to render the initial UI or accept user input. The delay was
    especially noticeable on hot starts where every other RPC
    (`account/read`, `model/list`, `thread/start`) completed in under 70 ms
    total.
    
    Moving that fetch to the background also exposed a `/status` UI bug: the
    status card is flattened into terminal scrollback when it is inserted. A
    transient "refreshing limits in background..." line could not be cleared
    later, because the async completion updated the retained `HistoryCell`,
    not the already-written terminal history.
    
    ## Mental model
    
    Before this change, `AppServerSession::bootstrap()` performed three
    sequential RPCs: `account/read` → `model/list` →
    `account/rateLimits/read`. The result of the third call was baked into
    `AppServerBootstrap` and applied to the chat widget before the event
    loop began.
    
    After this change, `bootstrap()` only performs two RPCs (`account/read`
    + `model/list`), and rate-limit fetching is kicked off as an async
    background task immediately after the first frame is scheduled. A new
    enum, `RateLimitRefreshOrigin`, tags each fetch so the event handler
    knows whether the result came from the startup prefetch or from a
    user-initiated `/status` command; they have different completion
    side-effects.
    
    The `get_login_status()` helper (used outside the main app flow) was
    also decoupled: it previously called the full `bootstrap()` just to
    check auth mode, wasting model-list and rate-limit work. It now calls
    the narrower `read_account()` directly.
    
    For `/status`, this PR keeps the background refresh request but stops
    printing transient refresh notices into status history when cached
    limits are already available. If a refresh updates the cache, the next
    `/status` command will render the new values.
    
    ## Non-goals
    
    - This change does not alter the rate-limit data itself.
    - This change does not introduce caching, retries, or staleness
    management for rate limits.
    - This change does not affect the `model/list` or `thread/start` RPCs;
    they remain on the critical startup path.
    
    ## Tradeoffs
    
    - **Stale-on-first-render**: The status bar will briefly show no
    rate-limit info until the background fetch completes; observed
    background fetches landed roughly in the 400-900 ms range after the UI
    appeared. This is acceptable because the user cannot meaningfully act on
    rate-limit data in the first fraction of a second.
    - **Error silence on startup prefetch**: If the startup prefetch fails,
    the error is logged but the UI is not notified (unlike `/status` refresh
    failures, which go through the status-command completion path). This
    avoids surfacing transient network errors as a startup blocker.
    - **Static `/status` history**: `/status` output is terminal history,
    not a live widget. The card now avoids progress-style language that
    would appear stuck in scrollback; users can run `/status` again to see
    newly cached values.
    - **`account_auth_mode` field removed from `AppServerBootstrap`**: The
    only consumer was `get_login_status()`, which no longer goes through
    `bootstrap()`. The field was dead weight.
    
    ## Architecture
    
    ### New types
    
    - `RateLimitRefreshOrigin` (in `app_event.rs`): A `Copy` enum
    distinguishing `StartupPrefetch` from `StatusCommand { request_id }`.
    Carried through `RefreshRateLimits` and `RateLimitsLoaded` events so the
    handler applies the right completion behavior.
    
    ### Modified types
    
    - `AppServerBootstrap`: Lost `account_auth_mode` and
    `rate_limit_snapshots`; gained `requires_openai_auth: bool` (passed
    through from the account response so the caller can decide whether to
    fire the prefetch).
    
    ### Control flow
    
    1. `bootstrap()` returns with `requires_openai_auth` and
    `has_chatgpt_account`.
    2. After scheduling the first frame, `App::run_inner` fires
    `refresh_rate_limits(StartupPrefetch)` if both flags are true.
    3. When `RateLimitsLoaded { StartupPrefetch, Ok(..) }` arrives,
    snapshots are applied and a frame is scheduled to repaint the status
    bar.
    4. When `RateLimitsLoaded { StartupPrefetch, Err(..) }` arrives, the
    error is logged and no UI update occurs.
    5. `/status`-initiated refreshes continue to use `StatusCommand {
    request_id }` and call `finish_status_rate_limit_refresh` on completion
    (success or failure).
    6. `/status` history cells with cached rate-limit rows no longer render
    an additional "refreshing limits" notice; the async refresh updates the
    cache for future status output.
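    
    The completion behavior in steps 3 to 5 can be sketched as a single
    match. This is an illustration only: the variant names of
    `RateLimitRefreshOrigin` follow the description, but the `u64` request
    id, the `Completion` enum, and `on_rate_limits_loaded` are assumptions
    made for the example.
    
    ```rust
    /// Sketch of the origin tag; the `u64` request id type is an assumption.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum RateLimitRefreshOrigin {
        StartupPrefetch,
        StatusCommand { request_id: u64 },
    }
    
    /// Hypothetical summary of what the handler does when a fetch completes.
    #[derive(Debug, PartialEq)]
    enum Completion {
        /// Apply snapshots and schedule a frame to repaint the status bar.
        ApplySnapshotsAndRepaint,
        /// Log the error; no UI update.
        LogOnly,
        /// Finish the `/status` refresh, on success or failure.
        FinishStatusRefresh { request_id: u64 },
    }
    
    /// Maps (origin, result) to the completion side-effect.
    fn on_rate_limits_loaded(
        origin: RateLimitRefreshOrigin,
        result: Result<(), String>,
    ) -> Completion {
        match (origin, result) {
            (RateLimitRefreshOrigin::StartupPrefetch, Ok(())) => {
                Completion::ApplySnapshotsAndRepaint
            }
            (RateLimitRefreshOrigin::StartupPrefetch, Err(_)) => Completion::LogOnly,
            (RateLimitRefreshOrigin::StatusCommand { request_id }, _) => {
                Completion::FinishStatusRefresh { request_id }
            }
        }
    }
    ```
    
    Tagging the event with its origin keeps the two paths in one handler
    while still letting a failed startup prefetch stay silent in the UI.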
    
    ### Extracted method
    
    - `AppServerSession::read_account()`: Factored out of `bootstrap()` so
    that `get_login_status()` can call it independently without triggering
    model-list or rate-limit work.
    
    ## Observability
    
    - The existing `tracing::warn!` for rate-limit fetch failures is
    preserved for the startup path.
    - No new metrics or spans are introduced. The startup-time improvement
    is observable via the existing `ready` timestamp in TUI startup logs.
    
    ## Tests
    
    - Existing tests in `status_command_tests.rs` are updated to match on
    `RateLimitRefreshOrigin::StatusCommand { request_id }` instead of a bare
    `request_id`.
    - Focused `/status` tests now assert that status history avoids
    transient refresh text, continues to request an async refresh, and uses
    refreshed cached limits in future status output.
    - No new tests are added for the startup prefetch path because it is a
    fire-and-forget spawn with no observable side-effect other than the
    widget state update, which is already covered by the
    snapshot-application tests.
    
    ---------
    
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • Use model metadata for Fast Mode status (#16949)
    Fast Mode status was still tied to one model name in the TUI and
    model-list plumbing. This changes the model metadata shape so a model
    can advertise additional speed tiers, carries that field through the
    app-server model list, and uses it to decide when to show Fast Mode
    status.
    
    For people using Codex, the behavior is intended to stay the same for
    existing models. Fast Mode still requires the existing signed-in /
    feature-gated path; the difference is that the UI can now recognize any
    model the model list marks as Fast-capable, instead of requiring a new
    client-side slug check.
  • [codex] Apply patches through executor filesystem (#17048)
    ## Summary
    - run apply_patch through the executor filesystem when a remote
    environment is present instead of shelling out to the local process
    - thread the executor FileSystem into apply_patch interception and keep
    existing local behavior for non-remote turns
    - make the apply_patch integration harness use the executor filesystem
    for setup/assertions
    - add remote-aware skips for turn-diff coverage that still reads the
    test-runner filesystem
    
    ## Why
    Remote apply_patch needed to mutate the remote workspace instead of the
    local checkout. The tests also needed to seed and assert workspace state
    through the same filesystem abstraction so local and remote runs
    exercise the same behavior.
    
    ## Validation
    - `just fmt`
    - `git diff --check`
    - `cargo check -p core_test_support --tests`
    - `cargo test -p codex-core --test all
    suite::shell_serialization::apply_patch_custom_tool_call -- --nocapture`
    - `cargo test -p codex-core --test all
    suite::apply_patch_cli::apply_patch_cli_updates_file_appends_trailing_newline
    -- --nocapture`
    - remote `cargo test -p codex-core --test all apply_patch_cli --
    --nocapture` (229 passed)
  • Fix remote address format to work with Windows Firewall rules. (#17053)
    Since March 27, most elevated sandbox setups have been failing with:
    ```
    {
      "code": "helper_firewall_rule_create_or_add_failed",
      "message": "SetRemoteAddresses_failed__Error___code__HRESULT_0xD000000D___message___An_invalid_parameter_was_passed_to_a_service_or_function.",
      "originator": "Codex_Desktop",
      "__metric_type": "sum"
    }
    ```
  • Add WebRTC transport to realtime start (#16960)
    Adds WebRTC startup to the experimental app-server
    `thread/realtime/start` method with an optional transport enum. The
    websocket path remains the default; WebRTC offers create the realtime
    session through the shared start flow and emit the answer SDP via
    `thread/realtime/sdp`.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [app-server-protocol] introduce generic ServerResponse for app-server-protocol (#17044)
    - introduces `ServerResponse` as the symmetrical typed response union to
    `ServerRequest` for app-server-protocol
    - enables scalable event stream ingestion for use cases such as
    analytics, particularly for tools/approvals
    - no runtime behavior changes, protocol/schema plumbing only
    - mirrors #15921
  • [codex] Migrate apply_patch to executor filesystem (#17027)
    - Migrate apply-patch verification and application internals to use the
    async `ExecutorFileSystem` abstraction from `exec-server`.
    - Convert apply-patch `cwd` handling to `AbsolutePathBuf` through the
    verifier/parser/handler boundary.
    
    Doesn't change how the tool itself works.
  • fix(core) revert Command line in unified exec output (#17031)
    ## Summary
    https://github.com/openai/codex/pull/13860 changed the serialized output
    format of Unified Exec. This PR reverts those changes and some related
    test changes.
    
    ## Testing
    - [x] Update tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] Fix unified exec test build (#17032)
    ## Summary
    - Remove the stale `?` after `AbsolutePathBuf::join` in the unified exec
    integration test helper.
    
    ## Root Cause
    - `AbsolutePathBuf::join` was made infallible, but
    `core/tests/suite/unified_exec.rs` still treated it as a `Result`, which
    broke the Windows test build for the `all` integration test target.
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-core --test all
    unified_exec_resolves_relative_workdir`
  • app-server: Allow enabling remote control in runtime (#16973)
    Refresh the feature flag on writes to the config.
  • Add full-ci branch trigger (#16980)
    Allow branches to trigger full CI (helpful for running remote tests).
  • app-server: Move watch_id to request of fs/watch (#17026)
    It's easier for clients to maintain watchers if they define the watch
    id, so move it into the request.
    It's not used yet, so this should be a safe change.
  • [codex] Make unified exec tests remote aware (#16977)
    ## Summary
    - Convert unified exec integration tests that can run against the remote
    executor to use the remote-aware test harness.
    - Create workspace directories through the executor filesystem for
    remote runs.
    - Install `python3` and `zsh` in the remote test container so restored
    Python/zsh-based test commands work in fresh Ubuntu containers.
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-core --test all unified_exec_defaults_to_pipe`
    - `cargo test -p codex-core --test all unified_exec_can_enable_tty`
    - `cargo test -p codex-core --test all unified_exec`
    - Remote on `codex-remote`: `source scripts/test-remote-env.sh && cd
    codex-rs && cargo test -p codex-core --test all unified_exec`
    - `just fix -p codex-core`
  • Update README (#16348)
    Rename ChatGPT Team to ChatGPT Business as the correct plan name in the
    README.
  • [codex] Make AbsolutePathBuf joins infallible (#16981)
    Having to check for errors every time join is called is painful and
    unnecessary.
  • fix(guardian): don't throw away transcript when over budget (#16956)
    ## Description
    
    This PR changes guardian transcript compaction so oversized
    conversations no longer collapse into a nearly empty placeholder.
    
    Before this change, if the retained user history alone exceeded the
    message budget, guardian would replace the entire transcript with
    `<transcript omitted to preserve budget for planned action>`.
    
    That meant approvals, especially network approvals, could lose the
    recent tool call and tool result that explained what guardian was
    actually reviewing. Now we keep a compact but usable transcript instead
    of dropping it all.
    
    ### Before
    ```
    The following is the Codex agent history whose request action you are assessing...
    >>> TRANSCRIPT START
    <transcript omitted to preserve budget for planned action>
    >>> TRANSCRIPT END
    
    Conversation transcript omitted due to size.
    
    The Codex agent has requested the following action:
    >>> APPROVAL REQUEST START
    Retry reason:
    Sandbox blocked outbound network access.
    
    Assess the exact planned action below. Use read-only tool checks when local state matters.
    Planned action JSON:
    {
      "tool": "network_access",
      "target": "https://example.com:443",
      "host": "example.com",
      "protocol": "https",
      "port": 443
    }
    >>> APPROVAL REQUEST END
    ```
    
    ### After
    ```
    The following is the Codex agent history whose request action you are assessing...
    >>> TRANSCRIPT START
    [1] user: Please investigate why uploads to example.com are failing and retry if needed.
    [8] user: If the request looks correct, go ahead and try again with network access.
    [9] tool shell call: {"command":["curl","-X","POST","https://example.com/upload"],"cwd":"/repo"}
    [10] tool shell result: sandbox blocked outbound network access
    >>> TRANSCRIPT END
    
    Some conversation entries were omitted.
    
    The Codex agent has requested the following action:
    >>> APPROVAL REQUEST START
    Retry reason:
    Sandbox blocked outbound network access.
    
    Assess the exact planned action below. Use read-only tool checks when local state matters.
    Planned action JSON:
    {
      "tool": "network_access",
      "target": "https://example.com:443",
      "host": "example.com",
      "protocol": "https",
      "port": 443
    }
    >>> APPROVAL REQUEST END
    ```
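    
    One way to keep a compact but usable transcript is sketched below. It
    assumes a simple newest-first policy with a byte budget, which may well
    differ from guardian's actual selection rules (the "After" example also
    keeps early user messages); `compact_transcript` is a hypothetical name.
    
    ```rust
    /// Keeps the most recent entries whose combined length fits `budget`.
    /// Returns the kept entries in original order and whether any entry
    /// was omitted. Assumed policy, not guardian's real algorithm.
    fn compact_transcript(entries: &[String], budget: usize) -> (Vec<String>, bool) {
        let mut used = 0;
        let mut kept = Vec::new();
        // Walk from newest to oldest so recent tool calls and results survive.
        for entry in entries.iter().rev() {
            if used + entry.len() > budget {
                break;
            }
            used += entry.len();
            kept.push(entry.clone());
        }
        // Restore chronological order for display.
        kept.reverse();
        let omitted = kept.len() < entries.len();
        (kept, omitted)
    }
    ```
    
    The `omitted` flag corresponds to the point at which a note such as
    "Some conversation entries were omitted." would be emitted instead of
    replacing the whole transcript with a placeholder.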
  • feat(analytics): generate an installation_id and pass it in responsesapi client_metadata (#16912)
    ## Summary
    
    This adds a stable Codex installation ID and includes it on Responses
    API requests via `x-codex-installation-id` passed in via the
    `client_metadata` field for analytics/debugging.
    
    The main pieces are:
    - persist a UUID in `$CODEX_HOME/installation_id`
    - thread the installation ID into `ModelClient`
    - send it in `client_metadata` on Responses requests so it works
    consistently across HTTP and WebSocket transports
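    
    The persistence step can be sketched with the standard library alone.
    This is an assumption-laden illustration: `load_or_create_installation_id`
    is a hypothetical helper, and the ID generator is passed in as a closure
    because the standard library has no UUID support.
    
    ```rust
    use std::fs;
    use std::io;
    use std::path::Path;
    
    /// Returns the ID stored in `<codex_home>/installation_id`, creating the
    /// file with a freshly generated ID if it is missing or empty.
    /// `generate` stands in for a real UUID generator (assumption).
    fn load_or_create_installation_id(
        codex_home: &Path,
        generate: impl FnOnce() -> String,
    ) -> io::Result<String> {
        let path = codex_home.join("installation_id");
        // Reuse an existing, non-empty ID so it stays stable across runs.
        if let Ok(existing) = fs::read_to_string(&path) {
            let trimmed = existing.trim();
            if !trimmed.is_empty() {
                return Ok(trimmed.to_string());
            }
        }
        let id = generate();
        fs::create_dir_all(codex_home)?;
        fs::write(&path, &id)?;
        Ok(id)
    }
    ```
    
    A second call with a different generator returns the first ID, which is
    the property that makes the value usable as a stable installation
    identifier.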
  • Fix missing resume hint on zero-token exits (#16987)
    Addresses #16421
    
    Problem: Resumed interactive sessions that exited before any new token
    usage skipped all footer lines, hiding the `codex resume` continuation
    command.
    
    It's not clear whether this was an intentional design choice, but I
    think it's reasonable to expect this message under these circumstances.
    
    Solution: Compose token usage and resume hints independently so
    resumable sessions still print the continuation command with zero usage.
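    
    A minimal sketch of composing the two footer parts independently
    follows. `footer_lines` is a hypothetical helper and the exact footer
    wording is assumed, not taken from the CLI.
    
    ```rust
    /// Builds the exit footer. Each line is decided independently, so a
    /// resumable session with zero token usage still gets its resume hint.
    /// The message wording here is illustrative only.
    fn footer_lines(total_tokens: u64, resumable_thread_id: Option<&str>) -> Vec<String> {
        let mut lines = Vec::new();
        if total_tokens > 0 {
            lines.push(format!("Token usage: total={}", total_tokens));
        }
        if let Some(id) = resumable_thread_id {
            lines.push(format!("To continue this session, run codex resume {}", id));
        }
        lines
    }
    ```
    
    With zero usage and a resumable thread, the result contains only the
    resume hint rather than being empty.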
  • Preserve null developer instructions (#16976)
    Preserve explicit null developer-instruction overrides across app-server
    resume and fork flows.
  • Fix nested exec thread ID restore (#16882)
    Addresses #15527
    
    Problem: Nested `codex exec` commands could source a shell snapshot that
    re-exported the parent `CODEX_THREAD_ID`, so commands inside the nested
    session were attributed to the wrong thread.
    
    Solution: Reapply the live command env's `CODEX_THREAD_ID` after
    sourcing the snapshot.
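    
    The fix can be modeled on plain maps, as a sketch. The real change
    operates on a shell environment; `env_after_snapshot` is a hypothetical
    function, and leaving the snapshot value in place when the live env has
    no thread ID is an assumption.
    
    ```rust
    use std::collections::HashMap;
    
    const THREAD_ID_VAR: &str = "CODEX_THREAD_ID";
    
    /// Applies the snapshot's exports on top of the live env, then reapplies
    /// the live `CODEX_THREAD_ID` so the snapshot cannot override it.
    fn env_after_snapshot(
        live: &HashMap<String, String>,
        snapshot: &HashMap<String, String>,
    ) -> HashMap<String, String> {
        let mut env = live.clone();
        // "Sourcing" the snapshot: its exports win over existing values.
        env.extend(snapshot.iter().map(|(k, v)| (k.clone(), v.clone())));
        // Restore the live thread id after the snapshot has been applied.
        if let Some(id) = live.get(THREAD_ID_VAR) {
            env.insert(THREAD_ID_VAR.to_string(), id.clone());
        }
        env
    }
    ```
    
    Other variables from the snapshot are kept; only the thread ID is
    forced back to the value of the nested session.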
  • Fix read-only apply_patch rejection message (#16885)
    Addresses #15532
    
    Problem: Nested read-only `apply_patch` rejections report in-project
    files as outside the project.
    
    Solution: Choose the rejection message based on sandbox mode so
    read-only sessions report a read-only-specific reason, and add focused
    safety coverage.
  • Stabilize flaky multi-agent followup interrupt test (#16739)
    Problem: The multi-agent followup interrupt test polled history before
    interrupt cleanup and mailbox wakeup were guaranteed to settle, which
    made it flaky under CI scheduling variance.
    
    Solution: Wait for the child turn's `TurnAborted(Interrupted)` event
    before asserting that the redirected assistant envelope is recorded and
    no plain user message is left behind.
  • [codex] reduce module visibility (#16978)
    ## Summary
    - reduce public module visibility across Rust crates, preferring private
    or crate-private modules with explicit crate-root public exports
    - update external call sites and tests to use the intended public crate
    APIs instead of reaching through module trees
    - add the module visibility guideline to AGENTS.md
    
    ## Validation
    - `cargo check --workspace --all-targets --message-format=short` passed
    before the final fix/format pass
    - `just fix` completed successfully
    - `just fmt` completed successfully
    - `git diff --check` passed
  • [codex] ez - rename env=>request in codex-rs/core/src/unified_exec/process_manager.rs (#16724)
  • collapse dev message into one (#16988)
    Collapse the image-gen dev message into one.
  • Honor null thread instructions (#16964)
    - Treat explicit null thread instructions as a blank-slate override
    while preserving omitted-field fallback behavior.
    - Preserve null through rollout resume/fork and keep explicit empty
    strings distinct.
    - Add app-server v2 start/fork coverage for the tri-state instruction
    params.
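    
    The tri-state described above maps naturally onto a nested `Option`.
    The sketch below is illustrative: `InstructionsParam` and
    `resolve_instructions` are hypothetical names, and the fallback string
    stands in for whatever default the omitted-field path uses.
    
    ```rust
    /// Tri-state parameter: outer `None` means the field was omitted,
    /// `Some(None)` means explicit null, and `Some(Some(s))` is an explicit
    /// string (possibly empty).
    type InstructionsParam = Option<Option<String>>;
    
    /// Resolves the effective instructions. `fallback` is a hypothetical
    /// default used only when the field is omitted.
    fn resolve_instructions(param: &InstructionsParam, fallback: &str) -> Option<String> {
        match param {
            // Omitted: fall back to the default instructions.
            None => Some(fallback.to_string()),
            // Explicit null: blank-slate override, no instructions at all.
            Some(None) => None,
            // Explicit string: used as-is, so "" stays distinct from null.
            Some(Some(text)) => Some(text.clone()),
        }
    }
    ```
    
    Keeping the outer and inner `Option` separate is what lets null and the
    empty string survive a resume or fork without collapsing into the
    omitted case.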
  • Make AGENTS.md discovery FS-aware (#15826)
    ## Summary
    - make AGENTS.md discovery and loading fully FS-aware and remove the
    non-FS discover helper
    - migrate remote-aware codex-core tests to use TestEnv workspace setup
    instead of syncing a local workspace copy
    - add AGENTS.md corner-case coverage, including directory fallbacks and
    remote-aware integration coverage
    
    ## Testing
    - cargo test -p codex-core project_doc -- --nocapture
    - cargo test -p codex-core hierarchical_agents -- --nocapture
    - cargo test -p codex-core agents_md -- --nocapture
    - cargo test -p codex-tui status -- --nocapture
    - cargo test -p codex-tui-app-server status -- --nocapture
    - just fix
    - just fmt
    - just bazel-lock-update
    - just bazel-lock-check
    - just argument-comment-lint
    - remote Linux executor tests in progress via scripts/test-remote-env.sh