Commit Graph

1364 Commits

  • feat: sqlite 1 (#10004)
    Add a `.sqlite` database to store rollout metadata (and
    later logs).
    This PR is phase 1:
    * Add the database and the required infrastructure
    * Add a backfill of the database
    * Persist the newly created rollout both in files and in the DB
    * When we need to get metadata or a rollout, consider the `JSONL` as the
    source of truth but compare the results with the DB and show any errors
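
    The read path in the last bullet could be sketched as below. This is a
    minimal sketch, not the PR's code: `RolloutMeta`, its fields, and
    `read_with_check` are hypothetical names, and the real schema is not
    shown in this log.

    ```rust
    // Hypothetical metadata record standing in for whatever the DB stores.
    #[derive(Debug, Clone, PartialEq)]
    struct RolloutMeta {
        thread_id: String,
        updated_at: u64,
    }

    /// Returns the JSONL value (the source of truth) together with a
    /// description of any mismatch found in the DB copy.
    fn read_with_check(
        jsonl: RolloutMeta,
        db: Option<&RolloutMeta>,
    ) -> (RolloutMeta, Option<String>) {
        let mismatch = match db {
            // The backfill or the dual write missed this rollout.
            None => Some(format!("thread {} missing from DB", jsonl.thread_id)),
            // The DB row exists but disagrees with the file.
            Some(row) if *row != jsonl => Some(format!(
                "thread {} differs: jsonl={:?} db={:?}",
                jsonl.thread_id, jsonl, row
            )),
            Some(_) => None,
        };
        // The caller always gets the JSONL value; the mismatch is only reported.
        (jsonl, mismatch)
    }
    ```

    The caller never sees the DB value in this phase, so a bad row can only
    produce a reported error, not a wrong result.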
  • Add exec policy TOML representation (#10026)
    We'd like to represent these in `requirements.toml`. This just adds the
    representation and the tests, doesn't wire it up anywhere yet.
  • feat(core) RequestRule (#9489)
    ## Summary
    Instead of trying to derive the prefix_rule for a command mechanically,
    let's let the model decide for us.
    
    ## Testing
    - [x] tested locally
  • fix(core) info cleanup (#9986)
    ## Summary
    Simplify this logic a bit.
  • [skills] Auto install MCP dependencies when running skills with dependency specs. (#9982)
    Auto install MCP dependencies when running skills with dependency specs.
  • fix: allow unknown fields on Notice in schema (#10041)
    The `notice` field didn't allow unknown fields in the schema, which
    caused errors where there shouldn't have been any.
    
    Now we allow unknown fields.
    
    <img width="2260" height="720" alt="image"
    src="https://github.com/user-attachments/assets/1de43b60-0d50-4a96-9c9c-34419270d722"
    />
  • fix: enable per-turn updates to web search mode (#10040)
    web_search can now be updated per-turn, for things like changes to
    sandbox policy.
    
    `SandboxPolicy::DangerFullAccess` now sets web_search to `live`, and the
    default is still `cached`.
    
    Added integration tests.
  • enable live web search for DangerFullAccess sandbox policy (#10008)
    Auto-enable live `web_search` tool when sandbox policy is
    `DangerFullAccess`.
    
    Explicitly setting `web_search` (canonical setting), or enabling
    `web_search_cached` or `web_search_request` still takes precedence over
    this sandbox-policy-driven enablement.
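
    The precedence described above might look like the following. This is a
    sketch under assumptions: `WebSearchMode`, `effective_web_search`, and the
    `ReadOnly` variant are illustrative names, and the three explicit settings
    are collapsed into a single `Option` for brevity.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum WebSearchMode {
        Cached,
        Live,
    }

    // Only `DangerFullAccess` is named in the PR text; `ReadOnly` is a stand-in
    // for "any other policy".
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxPolicy {
        ReadOnly,
        DangerFullAccess,
    }

    /// An explicit user setting wins; otherwise the sandbox policy decides.
    fn effective_web_search(
        explicit: Option<WebSearchMode>,
        policy: SandboxPolicy,
    ) -> WebSearchMode {
        match (explicit, policy) {
            (Some(mode), _) => mode,
            (None, SandboxPolicy::DangerFullAccess) => WebSearchMode::Live,
            (None, _) => WebSearchMode::Cached,
        }
    }
    ```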
  • remove sandbox globals. (#9797)
    - Threads sandbox updates through `OverrideTurnContext` for the active turn.
    - Passes the computed sandbox type into safety/exec.
  • fix(app-server, core): defer initial context write to rollout file until first turn (#9950)
    ### Overview
    Currently, calling `thread/resume` always bumps the thread's
    `updated_at` timestamp. This PR makes the `updated_at` timestamp
    change only if a turn is triggered.
    
    ### Additional context
    What we typically do on resuming a thread is to **always** write “initial
    context” to the rollout file immediately. This initial context includes:
    - Developer instructions derived from sandbox/approval policy + cwd
    - Optional developer instructions (if provided)
    - Optional collaboration-mode instructions
    - Optional user instructions (if provided)
    - Environment context (cwd, shell, etc.)
    
    This PR defers writing the “initial context” to the rollout file until
    the first `turn/start`, so we don't inadvertently bump the thread's
    `updated_at` timestamp until a turn is actually triggered.
    
    This works even though both `thread/resume` and `turn/start` accept
    overrides (such as `model`, `cwd`, etc.) because the initial context is
    seeded from the effective `TurnContext` in memory, computed at
    `turn/start` time, after both sets of overrides have been applied.
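
    A minimal sketch of the deferral, assuming a hypothetical `Thread` type
    whose `rollout` vector stands in for the rollout file. None of these
    names come from the PR.

    ```rust
    struct Thread {
        // Stand-in for lines appended to the rollout file.
        rollout: Vec<String>,
        initial_context_written: bool,
    }

    impl Thread {
        /// Resuming touches nothing on disk, so `updated_at` stays put.
        fn resume() -> Self {
            Thread {
                rollout: Vec::new(),
                initial_context_written: false,
            }
        }

        /// The first turn writes the initial context, built from the
        /// effective settings after all overrides have been applied.
        fn start_turn(&mut self, effective_cwd: &str, user_input: &str) {
            if !self.initial_context_written {
                self.rollout
                    .push(format!("initial_context cwd={}", effective_cwd));
                self.initial_context_written = true;
            }
            self.rollout.push(format!("turn input={}", user_input));
        }
    }
    ```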
    
    **NOTE**: This is a very short-lived solution until we introduce sqlite.
    Then we can remove this.
  • Fix: cap aggregated exec output consistently (#9759)
    ## WHAT?
    - Bias aggregated output toward stderr under contention (2/3 stderr, 1/3
    stdout) while keeping the 1 MiB cap.
    - Rebalance unused stderr share back to stdout when stderr is tiny to
    avoid underfilling.
    - Add tests for contention, small-stderr rebalance, and under-cap
    ordering (stdout then stderr).
    
    ## WHY?
    - Review feedback requested stderr priority under contention.
    - Avoid underfilled aggregated output when stderr is small while
    preserving a consistent cap across exec paths.
    
    ## HOW?
    - Update `aggregate_output` to compute stdout/stderr shares, then
    reassign unused capacity to the other stream.
    - Use the helper in both Windows and async exec paths.
    - Add regression tests for contention/rebalance and under-cap ordering.
    
    ## BEFORE
    ```rust
    // Best-effort aggregate: stdout then stderr (capped).
    let mut aggregated = Vec::with_capacity(
        stdout
            .text
            .len()
            .saturating_add(stderr.text.len())
            .min(EXEC_OUTPUT_MAX_BYTES),
    );
    append_capped(&mut aggregated, &stdout.text, EXEC_OUTPUT_MAX_BYTES);
    append_capped(&mut aggregated, &stderr.text, EXEC_OUTPUT_MAX_BYTES);
    let aggregated_output = StreamOutput {
        text: aggregated,
        truncated_after_lines: None,
    };
    ```
    
    ## AFTER
    ```rust
    fn aggregate_output(
        stdout: &StreamOutput<Vec<u8>>,
        stderr: &StreamOutput<Vec<u8>>,
    ) -> StreamOutput<Vec<u8>> {
        let total_len = stdout.text.len().saturating_add(stderr.text.len());
        let max_bytes = EXEC_OUTPUT_MAX_BYTES;
        let mut aggregated = Vec::with_capacity(total_len.min(max_bytes));
    
        if total_len <= max_bytes {
            aggregated.extend_from_slice(&stdout.text);
            aggregated.extend_from_slice(&stderr.text);
            return StreamOutput {
                text: aggregated,
                truncated_after_lines: None,
            };
        }
    
        // Under contention, reserve 1/3 for stdout and 2/3 for stderr; rebalance unused stderr to stdout.
        let want_stdout = stdout.text.len().min(max_bytes / 3);
        let want_stderr = stderr.text.len();
        let stderr_take = want_stderr.min(max_bytes.saturating_sub(want_stdout));
        let remaining = max_bytes.saturating_sub(want_stdout + stderr_take);
        let stdout_take = want_stdout + remaining.min(stdout.text.len().saturating_sub(want_stdout));
    
        aggregated.extend_from_slice(&stdout.text[..stdout_take]);
        aggregated.extend_from_slice(&stderr.text[..stderr_take]);
    
        StreamOutput {
            text: aggregated,
            truncated_after_lines: None,
        }
    }
    ```
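
    To make the share arithmetic easier to check by hand, the same
    computation can be written over lengths only, with the cap passed in.
    `split_cap` is a hypothetical helper for illustration, not part of the
    change.

    ```rust
    /// Returns (stdout_take, stderr_take) for the given lengths and cap,
    /// mirroring the arithmetic in `aggregate_output`.
    fn split_cap(stdout_len: usize, stderr_len: usize, max_bytes: usize) -> (usize, usize) {
        // Under the cap: keep everything.
        if stdout_len.saturating_add(stderr_len) <= max_bytes {
            return (stdout_len, stderr_len);
        }
        // Reserve up to 1/3 for stdout, then give stderr what is left.
        let want_stdout = stdout_len.min(max_bytes / 3);
        let stderr_take = stderr_len.min(max_bytes.saturating_sub(want_stdout));
        // Hand any capacity stderr did not use back to stdout.
        let remaining = max_bytes.saturating_sub(want_stdout + stderr_take);
        let stdout_take = want_stdout + remaining.min(stdout_len.saturating_sub(want_stdout));
        (stdout_take, stderr_take)
    }
    ```

    With a cap of 9 bytes, two 10-byte streams split as (3, 6), while a
    10-byte stdout and a 2-byte stderr split as (7, 2) because the unused
    stderr share is returned to stdout.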
    
    ## TESTS
    - [x] `just fmt`
    - [x] `just fix -p codex-core`
    - [x] `cargo test -p codex-core aggregate_output_`
    - [x] `cargo test -p codex-core`
    - [x] `cargo test --all-features`
    
    ## FIXES
    Fixes #9758
  • Fixing main and make plan mode reasoning effort medium (#9980)
    It's overthinking so much on high and going over the context window.
  • make plan prompt less detailed (#9977)
    This was too much to ask for
  • make cached web_search client-side default (#9974)
    [Experiment](https://console.statsig.com/50aWbk2p4R76rNX9lN5VUw/experiments/codex_web_search_rollout/summary)
    for default cached `web_search` completed; cached chosen as default.
    
    Update client to reflect that.
  • plan prompt (#9975)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • prompt (#9970)
  • prompt final (#9969)
    hopefully final this time (at least tonight) >_<
  • Improve plan mode prompt (#9968)
  • plan prompt v7 (#9966)
  • fix: handle all web_search actions and in progress invocations (#9960)
    ### Summary
    - Parse all `web_search` tool actions (`search`, `find_in_page`,
    `open_page`).
    - Previously we only parsed + displayed `search`, which made the TUI
    appear to pause when the other actions were being used.
    - Show in progress `web_search` calls as `Searching the web`
      - Previously we only showed completed tool calls
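
    One way to picture the parsing and display change is below. It is a
    sketch only: the enum, its field names, and every label except
    `Searching the web` are assumptions made for illustration.

    ```rust
    // The three action kinds named in the summary; field names are hypothetical.
    enum WebSearchAction {
        Search { query: String },
        OpenPage { url: String },
        FindInPage { url: String, pattern: String },
    }

    /// `None` models an in-progress call whose action is not known yet.
    fn status_line(action: Option<&WebSearchAction>) -> String {
        match action {
            None => "Searching the web".to_string(),
            Some(WebSearchAction::Search { query }) => format!("Searched: {}", query),
            Some(WebSearchAction::OpenPage { url }) => format!("Opened {}", url),
            Some(WebSearchAction::FindInPage { url, pattern }) => {
                format!("Found '{}' in {}", pattern, url)
            }
        }
    }
    ```

    Matching on every variant means a newly added action fails to compile
    until it gets a display line, rather than silently showing nothing.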
    
    <img width="308" height="149" alt="image"
    src="https://github.com/user-attachments/assets/90a4e8ff-b06a-48ff-a282-b57b31121845"
    />
    
    ### Tests
    Added + updated tests, tested locally
    
    ### Follow ups
    Update VSCode extension to display these as well
  • Use test_codex more (#9961)
    Reduces boilerplate.
  • Reject request_user_input outside Plan/Pair (#9955)
    ## Context
    
    Previous work in https://github.com/openai/codex/pull/9560 only rejected
    `request_user_input` in Execute and Custom modes. Since then, additional
    modes (e.g., Code) were added, so the guard should be mode-agnostic.
    
    ## What changed
    
    - Switch the handler to an allowlist: only Plan and PairProgramming are
    allowed
    - Return the same error for any other mode (including Code)
    - Add a Code-mode rejection test alongside the existing Execute/Custom
    tests
    
    ## Why
    
    This prevents `request_user_input` from being used in modes where it is
    not intended, even as new modes are introduced.
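
    The allowlist idea can be sketched as follows. `ModeKind`,
    `check_request_user_input`, and the error text are hypothetical; only the
    mode names come from the description above.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ModeKind {
        Plan,
        PairProgramming,
        Execute,
        Custom,
        Code,
    }

    /// Allowlist: only the named modes pass; everything else, including any
    /// mode added later, falls through to the same error.
    fn check_request_user_input(mode: ModeKind) -> Result<(), String> {
        match mode {
            ModeKind::Plan | ModeKind::PairProgramming => Ok(()),
            other => Err(format!(
                "request_user_input is unavailable in {:?} mode",
                other
            )),
        }
    }
    ```

    A denylist would need a new arm for each added mode, whereas the
    catch-all arm here rejects new modes by default.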
  • Add composer config and shared menu surface helpers (#9891)
    Centralize built-in slash-command gating and extract shared menu-surface
    helpers.
    
    - Add bottom_pane::slash_commands and reuse it from composer + command
    popup.
    - Introduce ChatComposerConfig + shared menu surface rendering without
    changing default behavior.
  • plan prompt (#9943)
  • Add MCP server scopes config and use it as fallback for OAuth login (#9647)
    ### Motivation
    - Allow MCP OAuth flows to request scopes defined in `config.toml`
    instead of requiring users to always pass `--scopes` on the CLI.
    CLI/remote parameters should still override config values.
    
    ### Description
    - Add optional `scopes: Option<Vec<String>>` to `McpServerConfig` and
    `RawMcpServerConfig`, and propagate it through deserialization and the
    built config types.
    - Serialize `scopes` into the MCP server TOML via
    `serialize_mcp_server_table` in `core/src/config/edit.rs` and include
    `scopes` in the generated config schema (`core/config.schema.json`).
    - CLI: update `codex-rs/cli/src/mcp_cmd.rs` `run_login` to fall back to
    `server.scopes` when the `--scopes` flag is empty, with explicit CLI
    scopes still taking precedence.
    - App server: update
    `codex-rs/app-server/src/codex_message_processor.rs`
    `mcp_server_oauth_login` to use `params.scopes.or_else(||
    server.scopes.clone())` so the RPC path also respects configured scopes.
    - Update many test fixtures to initialize the new `scopes` field (set to
    `None`) so test code builds with the new struct field.
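
    The two fallback rules can be sketched in isolation. The function names
    are hypothetical; the `or_else` expression is the one quoted above, and
    the CLI rule is written from the description ("fall back when the flag
    is empty").

    ```rust
    /// RPC path: explicit params win, otherwise use the configured scopes.
    fn rpc_scopes(
        params_scopes: Option<Vec<String>>,
        server_scopes: &Option<Vec<String>>,
    ) -> Option<Vec<String>> {
        params_scopes.or_else(|| server_scopes.clone())
    }

    /// CLI path: an empty `--scopes` list means "not provided".
    fn cli_scopes(flag_scopes: Vec<String>, server_scopes: &Option<Vec<String>>) -> Vec<String> {
        if flag_scopes.is_empty() {
            server_scopes.clone().unwrap_or_default()
        } else {
            flag_scopes
        }
    }
    ```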
    
    ### Testing
    - Ran config tooling and formatters: `just write-config-schema`
    (succeeded), `just fmt` (succeeded), and `just fix -p codex-core`, `just
    fix -p codex-cli`, `just fix -p codex-app-server` (succeeded where
    applicable).
    - Ran unit tests for the CLI: `cargo test -p codex-cli` (passed).
    - Ran unit tests for core: `cargo test -p codex-core` (ran; many tests
    passed but several failed, including model refresh/403-related tests,
    shell snapshot/timeouts, and several `unified_exec` expectations).
    - Ran app-server tests: `cargo test -p codex-app-server` (ran; many
    integration-suite tests failed due to mocked/remote HTTP 401/403
    responses and wiremock expectations).
    
    If you want, I can split the tests into smaller focused runs or help
    debug the failing integration tests (they appear to be unrelated to the
    config change and stem from external HTTP/mocking behaviors encountered
    during the test runs).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69718f505914832ea1f334b3ba064553)
  • Aligned feature stage names with public feature maturity stages (#9929)
    We've recently standardized a [feature maturity
    model](https://developers.openai.com/codex/feature-maturity) that we're
    using in our docs and support forums to communicate expectations to
    users. This PR updates the internal stage names and descriptions to
    match.
    
    This change involves a simple internal rename and updates to a few
    user-visible strings. No functional change.
  • Add thread/unarchive to restore archived rollouts (#9843)
    ## Summary
    - Adds a new `thread/unarchive` RPC to move archived thread rollouts
    back into the active `sessions/` tree.
    
    ## What changed
    - **Protocol**
      - Adds `thread/unarchive` request/response types and wiring.
    - **Server**
      - Implements `thread_unarchive` in the app server.
      - Validates the archived rollout path and thread ID.
      - Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout
        filename timestamp.
    - **Core**
      - Adds `find_archived_thread_path_by_id_str` helper for archived
        rollouts.
    - **Docs**
      - Documents the new RPC and usage example.
    - **Tests**
      - Adds an end-to-end server test that:
        1) starts a thread,
        2) archives it,
        3) unarchives it,
        4) asserts the file is restored to `sessions/`.
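
    Deriving the restore location from the filename might look like this.
    It is a sketch under a stated assumption: the filename is taken to start
    with `rollout-YYYY-MM-DD`, which this log does not confirm, and
    `restored_path` is a hypothetical name.

    ```rust
    use std::path::PathBuf;

    /// Maps a rollout filename to `sessions/YYYY/MM/DD/<filename>`, or `None`
    /// if the name does not carry a date in the assumed position.
    fn restored_path(file_name: &str) -> Option<PathBuf> {
        // Assumed layout: "rollout-YYYY-MM-DD..." (not confirmed by the source).
        let rest = file_name.strip_prefix("rollout-")?;
        let date = rest.get(..10)?;
        let mut parts = date.split('-');
        let (y, m, d) = (parts.next()?, parts.next()?, parts.next()?);
        let all_digits =
            |s: &str, n: usize| s.len() == n && s.bytes().all(|b| b.is_ascii_digit());
        if !(all_digits(y, 4) && all_digits(m, 2) && all_digits(d, 2)) {
            return None;
        }
        Some(PathBuf::from("sessions").join(y).join(m).join(d).join(file_name))
    }
    ```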
    
    ## How to use
    ```json
    { "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } }
    ```
    
    ## Author Codex Session
    
    `codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this
    kind of session UUID forkable by anyone with the right
    `session_object_storage_url` line in their config, but for now just
    pasting it here for my reference)
  • prompt (#9928)
  • fix: attempt to reduce high cpu usage when using collab (#9776)
    Reproduce with a prompt like this with collab enabled:
    ```
    Examine the code at <some subdirectory with a deeply nested project>.  Find the most urgent issue to resolve and describe it to me.
    ```
    
    Existing behavior causes the top-level agent to busy wait on subagents.
  • Fix flakey shell snapshot test (#9919)
    Sometimes fails with:
    
    ```
    failures:
    
      ---- shell_snapshot::tests::timed_out_snapshot_shell_is_terminated stdout ----
    
      thread 'shell_snapshot::tests::timed_out_snapshot_shell_is_terminated' panicked at codex-rs/core/src/shell_snapshot.rs:588:9:
      expected timeout error, got Failed to execute sh
    
      Caused by:
          Text file busy (os error 26)
    
    
      failures:
          shell_snapshot::tests::timed_out_snapshot_shell_is_terminated
    
      test result: FAILED. 815 passed; 1 failed; 4 ignored; 0 measured; 0 filtered out; finished in 18.00s
    ```
  • Feat: add isOther to question returned by request user input tool (#9890)
    ### Summary
    Add `isOther` to the question object in the `request_user_input` tool
    input and remove the `other` option from the tool prompt to better
    handle tool input.
  • Fix up config disabled err msg (#9916)
    **Before:**
    <img width="745" height="375" alt="image"
    src="https://github.com/user-attachments/assets/d6c23562-b87f-4af9-8642-329aab8e594d"
    />
    
    **After:**
    <img width="1042" height="354" alt="image"
    src="https://github.com/user-attachments/assets/c9a2413c-c945-4c34-8b7e-c6c9b8fbf762"
    />
    
    Two changes:
    1. Only display the error if there is a `config.toml` that is skipped
    (i.e. if there is just `.codex/skills` but no `.codex/config.toml` we do
    not display the error).
    2. Clarify the implications and the fix in the error message.
  • feat: dynamic tools injection (#9539)
    ## Summary
    Add dynamic tool injection to thread startup in API v2, wire dynamic
    tool calls through the app server to clients, and plumb responses back
    into the model tool pipeline.
    
    ### Flow (high level)
    - Thread start injects `dynamic_tools` into the model tool list for that
    thread (validation is done here).
    - When the model emits a tool call for one of those names, core raises a
    `DynamicToolCallRequest` event.
    - The app server forwards it to the client as `item/tool/call`, waits
    for the client’s response, then submits a `DynamicToolResponse` back to
    core.
    - Core turns that into a `function_call_output` in the next model
    request so the model can continue.
    
    ### What changed
    - Added dynamic tool specs to v2 thread start params and protocol types;
    introduced `item/tool/call` (request/response) for dynamic tool
    execution.
    - Core now registers dynamic tool specs at request time and routes those
    calls via a new dynamic tool handler.
    - App server validates tool names/schemas, forwards dynamic tool call
    requests to clients, and publishes tool outputs back into the session.
    - Integration tests
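
    The round trip in the flow above can be sketched with a closure standing
    in for the client. The type names echo the events named in the summary,
    but their fields and the `route_tool_call` function are assumptions made
    for illustration.

    ```rust
    use std::collections::HashSet;

    // Field layouts are hypothetical; only the type names appear in the PR text.
    struct DynamicToolCallRequest {
        call_id: String,
        tool: String,
        arguments: String,
    }

    struct DynamicToolResponse {
        call_id: String,
        output: String,
    }

    #[derive(Debug, PartialEq)]
    struct FunctionCallOutput {
        call_id: String,
        output: String,
    }

    /// Routes a model tool call: names registered at thread start go to the
    /// client, and the reply becomes a `function_call_output` for the next
    /// model request. Unknown names return `None` for other handlers.
    fn route_tool_call<F>(
        dynamic_tools: &HashSet<String>,
        call_id: &str,
        tool: &str,
        arguments: &str,
        client: F,
    ) -> Option<FunctionCallOutput>
    where
        F: FnOnce(DynamicToolCallRequest) -> DynamicToolResponse,
    {
        if !dynamic_tools.contains(tool) {
            return None;
        }
        // In the real flow this hop is the `item/tool/call` request to the client.
        let response = client(DynamicToolCallRequest {
            call_id: call_id.to_string(),
            tool: tool.to_string(),
            arguments: arguments.to_string(),
        });
        Some(FunctionCallOutput {
            call_id: response.call_id,
            output: response.output,
        })
    }
    ```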
  • chore(core) move model_instructions_template config (#9871)
    ## Summary
    Move `model_instructions_template` config to the experimental slug while
    we iterate on this feature
    
    ## Testing
    - [x] Tested locally, unit tests still pass
  • feat(tui) /personality (#9718)
    ## Summary
    Adds /personality selector in the TUI, which leverages the new core
    interface in #9644
    
    Notes:
    - We are doing some of our own state management for model_info loading
    here, but not sure if that's ideal. Open to opinions on a simpler
    approach, but would like to avoid blocking on a larger refactor.
    - Right now, the `/personality` selector just hides when the model
    doesn't support it. We can update this behavior down the line.
    
    ## Testing
    - [x] Tested locally
    - [x] Added snapshot tests
  • Plan prompt (#9877)
  • Prompt (#9874)