Commit Graph

515 Commits

  • feat: sqlite 1 (#10004)
    Add a `.sqlite` database to be used to store rollout metadata (and
    later logs)
    This PR is phase 1:
    * Add the database and the required infrastructure
    * Add a backfill of the database
    * Persist the newly created rollout both in files and in the DB
    * When we need to get metadata or a rollout, consider the `JSONL` as the
    source of truth but compare the results with the DB and show any errors
  • feat(core) RequestRule (#9489)
    ## Summary
    Instead of trying to derive the prefix_rule for a command mechanically,
    let's let the model decide for us.
    
    ## Testing
    - [x] tested locally
  • fix: enable per-turn updates to web search mode (#10040)
    web_search can now be updated per-turn, for things like changes to
    sandbox policy.
    
    `SandboxPolicy::DangerFullAccess` now sets web_search to `live`, and the
    default is still `cached`.
    
    Added integration tests.
  • remove sandbox globals. (#9797)
    Threads sandbox updates through OverrideTurnContext for active turn
    Passes computed sandbox type into safety/exec
  • make cached web_search client-side default (#9974)
    [Experiment](https://console.statsig.com/50aWbk2p4R76rNX9lN5VUw/experiments/codex_web_search_rollout/summary)
    for default cached `web_search` completed; cached chosen as default.
    
    Update client to reflect that.
  • fix: handle all web_search actions and in progress invocations (#9960)
    ### Summary
    - Parse all `web_search` tool actions (`search`, `find_in_page`,
    `open_page`).
    - Previously we only parsed + displayed `search`, which made the TUI
    appear to pause when the other actions were being used.
    - Show in progress `web_search` calls as `Searching the web`
      - Previously we only showed completed tool calls
    
    <img width="308" height="149" alt="image"
    src="https://github.com/user-attachments/assets/90a4e8ff-b06a-48ff-a282-b57b31121845"
    />
    
    ### Tests
    Added + updated tests, tested locally
    
    ### Follow ups
    Update VSCode extension to display these as well
  • Use test_codex more (#9961)
    Reduces boilerplate.
  • Reject request_user_input outside Plan/Pair (#9955)
    ## Context
    
    Previous work in https://github.com/openai/codex/pull/9560 only rejected
    `request_user_input` in Execute and Custom modes. Since then, additional
    modes (e.g., Code) were added, so the guard should be mode-agnostic.
    
    ## What changed
    
    - Switch the handler to an allowlist: only Plan and PairProgramming are
    allowed
    - Return the same error for any other mode (including Code)
    - Add a Code-mode rejection test alongside the existing Execute/Custom
    tests
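
    A minimal sketch of the allowlist described above. The mode names come from this PR, but the enum, the function name, and the error text are illustrative assumptions rather than the actual handler code.

    ```rust
    /// Collaboration modes named in this PR; the enum itself is illustrative.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Mode {
        Plan,
        PairProgramming,
        Execute,
        Custom,
        Code,
    }

    /// Allowlist check: only Plan and PairProgramming may call the tool.
    /// Every other mode, including ones added later, gets the same error.
    /// The error wording is a placeholder, not the real model-facing message.
    pub fn check_request_user_input(mode: Mode) -> Result<(), String> {
        match mode {
            Mode::Plan | Mode::PairProgramming => Ok(()),
            _ => Err("request_user_input is not available in this mode".to_string()),
        }
    }
    ```

    Because the match ends in a catch-all arm, a newly added mode is rejected by default instead of needing its own denylist entry.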
    
    ## Why
    
    This prevents `request_user_input` from being used in modes where it is
    not intended, even as new modes are introduced.
  • Add MCP server scopes config and use it as fallback for OAuth login (#9647)
    ### Motivation
    - Allow MCP OAuth flows to request scopes defined in `config.toml`
    instead of requiring users to always pass `--scopes` on the CLI.
    CLI/remote parameters should still override config values.
    
    ### Description
    - Add optional `scopes: Option<Vec<String>>` to `McpServerConfig` and
    `RawMcpServerConfig`, and propagate it through deserialization and the
    built config types.
    - Serialize `scopes` into the MCP server TOML via
    `serialize_mcp_server_table` in `core/src/config/edit.rs` and include
    `scopes` in the generated config schema (`core/config.schema.json`).
    - CLI: update `codex-rs/cli/src/mcp_cmd.rs` `run_login` to fall back to
    `server.scopes` when the `--scopes` flag is empty, with explicit CLI
    scopes still taking precedence.
    - App server: update
    `codex-rs/app-server/src/codex_message_processor.rs`
    `mcp_server_oauth_login` to use `params.scopes.or_else(||
    server.scopes.clone())` so the RPC path also respects configured scopes.
    - Update many test fixtures to initialize the new `scopes` field (set to
    `None`) so test code builds with the new struct field.
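
    The precedence rule can be sketched as follows. `resolve_scopes` mirrors the `params.scopes.or_else(|| server.scopes.clone())` expression quoted above; `resolve_cli_scopes` is an assumed shape for the CLI path, where an empty `--scopes` list is treated as "not provided". Both function names are hypothetical.

    ```rust
    /// RPC path: explicit request scopes win, configured scopes are the fallback.
    pub fn resolve_scopes(
        requested: Option<Vec<String>>,
        configured: &Option<Vec<String>>,
    ) -> Option<Vec<String>> {
        requested.or_else(|| configured.clone())
    }

    /// CLI path (assumed shape): an empty `--scopes` list falls back to config.
    pub fn resolve_cli_scopes(
        cli_scopes: Vec<String>,
        configured: &Option<Vec<String>>,
    ) -> Vec<String> {
        if cli_scopes.is_empty() {
            configured.clone().unwrap_or_default()
        } else {
            cli_scopes
        }
    }
    ```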
    
    ### Testing
    - Ran config tooling and formatters: `just write-config-schema`
    (succeeded), `just fmt` (succeeded), and `just fix -p codex-core`, `just
    fix -p codex-cli`, `just fix -p codex-app-server` (succeeded where
    applicable).
    - Ran unit tests for the CLI: `cargo test -p codex-cli` (passed).
    - Ran unit tests for core: `cargo test -p codex-core` (ran; many tests
    passed but several failed, including model refresh/403-related tests,
    shell snapshot/timeouts, and several `unified_exec` expectations).
    - Ran app-server tests: `cargo test -p codex-app-server` (ran; many
    integration-suite tests failed due to mocked/remote HTTP 401/403
    responses and wiremock expectations).
    
    The failing integration tests appear to be unrelated to the config
    change and stem from external HTTP/mocking behaviors encountered during
    the test runs.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69718f505914832ea1f334b3ba064553)
  • Add thread/unarchive to restore archived rollouts (#9843)
    ## Summary
    - Adds a new `thread/unarchive` RPC to move archived thread rollouts
    back into the active `sessions/` tree.
    
    ## What changed
    - **Protocol**
      - Adds `thread/unarchive` request/response types and wiring.
    - **Server**
      - Implements `thread_unarchive` in the app server.
      - Validates the archived rollout path and thread ID.
      - Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout
        filename timestamp.
    - **Core**
      - Adds `find_archived_thread_path_by_id_str` helper for archived
        rollouts.
    - **Docs**
      - Documents the new RPC and usage example.
    - **Tests**
      - Adds an end-to-end server test that:
        1) starts a thread,
        2) archives it,
        3) unarchives it,
        4) asserts the file is restored to `sessions/`.
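
    A rough sketch of deriving the `sessions/YYYY/MM/DD/...` destination from a rollout filename. It assumes file names of the form `rollout-YYYY-MM-DDThh-mm-ss-<id>.jsonl`; that format and the `restored_path` helper are assumptions for illustration, not the server's actual code.

    ```rust
    use std::path::{Path, PathBuf};

    /// Derive `<sessions_root>/YYYY/MM/DD/<file_name>` from a rollout file name.
    /// Assumes names look like `rollout-YYYY-MM-DDThh-mm-ss-<id>.jsonl`.
    pub fn restored_path(sessions_root: &Path, file_name: &str) -> Option<PathBuf> {
        let rest = file_name.strip_prefix("rollout-")?;
        // The date is the first 10 bytes after the prefix: YYYY-MM-DD.
        let date = rest.get(..10)?;
        let mut parts = date.split('-');
        let (year, month, day) = (parts.next()?, parts.next()?, parts.next()?);
        let all_digits =
            |s: &str, len: usize| s.len() == len && s.bytes().all(|b| b.is_ascii_digit());
        if !(all_digits(year, 4) && all_digits(month, 2) && all_digits(day, 2)) {
            return None;
        }
        Some(sessions_root.join(year).join(month).join(day).join(file_name))
    }
    ```

    Returning `None` for names that do not match lets the caller reject an unexpected file instead of guessing a destination.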
    
    ## How to use
    ```json
    { "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } }
    ```
    
    ## Author Codex Session
    
    `codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this
    kind of session UUID forkable by anyone with the right
    `session_object_storage_url` line in their config, but for now just
    pasting it here for my reference)
  • Feat: add isOther to question returned by request user input tool (#9890)
    ### Summary
    Add `isOther` to question object from request_user_input tool input and
    remove `other` option from the tool prompt to better handle tool input.
  • chore(core) move model_instructions_template config (#9871)
    ## Summary
    Move `model_instructions_template` config to the experimental slug while
    we iterate on this feature
    
    ## Testing
    - [x] Tested locally, unit tests still pass
  • feat(tui) /personality (#9718)
    ## Summary
    Adds /personality selector in the TUI, which leverages the new core
    interface in #9644
    
    Notes:
    - We are doing some of our own state management for model_info loading
    here, but not sure if that's ideal. open to opinions on simpler
    approach, but would like to avoid blocking on a larger refactor
    - Right now, the `/personality` selector just hides when the model
    doesn't support it. we can update this behavior down the line
    
    ## Testing
    - [x] Tested locally
    - [x] Added snapshot tests
  • feat: ephemeral threads (#9765)
    Add ephemeral threads capabilities. Only exposed through the
    `app-server` v2
    
    The idea is to disable the rollout recorder for those threads.
  • change collaboration mode to struct (#9793)
    Shouldn't cause behavioral change
  • feat: support proxy for ws connection (#9719)
    reapply websocket changes without changing tls lib.
  • feat(core) update Personality on turn (#9644)
    ## Summary
    Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
    is explicitly unused by the clients so far, to simplify PRs - app-server
    and tui implementations will be follow-ups.
    
    ## Testing
    - [x] added integration tests
  • Support end_turn flag (#9698)
    Experimental flag that signals the end of the turn.
  • feat(core) ModelInfo.model_instructions_template (#9597)
    ## Summary
    #9555 is the start of a rename, so I'm starting to standardize here.
    Sets up `model_instructions` templating with a strongly-typed object for
    injecting a personality block into the model instructions.
    
    ## Testing
    - [x] Added tests
    - [x] Ran locally
  • Reduce burst testing flake (#9549)
    ## Summary
    
    - make paste-burst tests deterministic by injecting explicit timestamps
    instead of relying on wall clock timing
    - add time-aware helpers for input/submission paths so tests can drive
    the burst heuristic precisely
    - update burst-related tests to flush using computed timeouts while
    preserving behavior assertions
    - increase timeout slack in
    shell_tools_start_before_response_completed_when_stream_delayed to
    reduce flakiness
  • Add collaboration_mode to TurnContextItem (#9583)
    ## Summary
    - add optional `collaboration_mode` to `TurnContextItem` in rollouts
    - persist the current collaboration mode when recording turn context
    (sampling + compaction)
    
    ## Rationale
    We already persist turn context data for resume logic. Capturing
    collaboration mode in the rollout gives us the mode context for each
    turn, enabling follow-up work to diff mode instructions correctly on
    resume.
    
    ## Changes
    - protocol: add optional `collaboration_mode` field to `TurnContextItem`
    - core: persist collaboration mode alongside other turn context settings
    in rollouts
  • Chore: update plan mode output in prompt (#9592)
    ### Summary
    * Update plan prompt output
    * Update requestUserInput response to be a single key value pair
    `answer: String`.
  • chore: defensive shell snapshot (#9609)
    This PR adds 2 defensive mechanisms for shell snapshotting:
    * Filter out invalid env variables (containing `-` for example) without
    dropping the whole snapshot
    * Validate the snapshot before considering it as valid by running a mock
    command with a shell snapshot
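
    The first mechanism might look like the sketch below. It assumes the POSIX rule for variable names (`[A-Za-z_][A-Za-z0-9_]*`); the exact validity check in the PR is not stated, and both function names are hypothetical.

    ```rust
    /// True for names a POSIX shell can export: [A-Za-z_][A-Za-z0-9_]*.
    /// This rule is an assumption; the PR only gives `-` as an invalid example.
    pub fn is_valid_env_name(name: &str) -> bool {
        let mut chars = name.chars();
        match chars.next() {
            Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
            _ => return false,
        }
        chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
    }

    /// Keep only exportable variables instead of rejecting the whole snapshot.
    pub fn filter_env(vars: Vec<(String, String)>) -> Vec<(String, String)> {
        vars.into_iter()
            .filter(|(name, _)| is_valid_env_name(name))
            .collect()
    }
    ```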
  • Reject ask user question tool in Execute and Custom (#9560)
    ## Summary
    - Keep `request_user_input` in the tool list but reject it at runtime in
    Execute/Custom modes with a clear model-facing error.
    - Add a session accessor for current collaboration mode and enforce the
    gate in the request_user_input handler.
    - Update core/app-server tests to use Plan mode for success and add
    Execute/Custom rejection coverage.
  • feat: rename experimental_instructions_file to model_instructions_file (#9555)
    A user who has `experimental_instructions_file` set will now see this:
    
    <img width="888" height="660" alt="image"
    src="https://github.com/user-attachments/assets/51c98312-eb9b-4881-81f1-bea6677e158d"
    />
    
    And a `codex exec` would include this warning:
    
    <img width="888" height="660" alt="image"
    src="https://github.com/user-attachments/assets/a89f62be-1edf-4593-a75e-e0b4a762ed7d"
    />
  • Add total (non-partial) TextElement placeholder accessors (#9545)
    ## Summary
    - Make `TextElement` placeholders private and add a text-backed accessor
    to avoid assuming `Some`.
    - Since they are optional in the protocol, we want to make sure any
    accessors properly handle the None case (getting the placeholder using
    the byte range in the text)
    - Preserve placeholders during protocol/app-server conversions using the
    accessor fallback.
    - Update TUI composer/remap logic and tests to use the new
    constructor/accessor.
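
    A sketch of the accessor fallback, using simplified stand-ins for `TextElement` and `ByteRange`. The field names, constructor, and signature here are assumptions; only the idea (prefer the stored placeholder, otherwise slice the text by byte range) comes from the summary above.

    ```rust
    /// Illustrative stand-ins for the protocol types; field names are assumptions.
    pub struct ByteRange {
        pub start: usize,
        pub end: usize,
    }

    pub struct TextElement {
        byte_range: ByteRange,
        placeholder: Option<String>, // private: callers go through the accessor
    }

    impl TextElement {
        pub fn new(byte_range: ByteRange, placeholder: Option<String>) -> Self {
            Self { byte_range, placeholder }
        }

        /// Use the stored placeholder if present, else slice it out of `text`.
        /// Yields None only when the range is out of bounds or splits a UTF-8 char.
        pub fn placeholder<'a>(&'a self, text: &'a str) -> Option<&'a str> {
            self.placeholder
                .as_deref()
                .or_else(|| text.get(self.byte_range.start..self.byte_range.end))
        }
    }
    ```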
  • merge remote models (#9547)
    We have `models.json` and `/models` response
    Behavior:
    1. New models from the `/models` endpoint get added
    2. Shared models get replaced by remote ones
    3. Existing models in `models.json` but not `/models` are kept
    4. Mark highest priority as default
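
    The four rules can be sketched as a merge keyed by model slug. The `Model` struct and its fields are hypothetical, and the sketch assumes that a larger `priority` number means higher priority, which the description does not specify.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical model record; the real type has more fields.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Model {
        pub slug: String,
        pub priority: i32,
        pub is_default: bool,
    }

    /// Merge `models.json` entries with the `/models` response.
    pub fn merge_models(local: Vec<Model>, remote: Vec<Model>) -> Vec<Model> {
        let mut by_slug: BTreeMap<String, Model> = BTreeMap::new();
        // Rule 3: start from the local entries so local-only models are kept.
        for m in local {
            by_slug.insert(m.slug.clone(), m);
        }
        // Rules 1 and 2: remote entries are added, or replace a shared slug.
        for m in remote {
            by_slug.insert(m.slug.clone(), m);
        }
        let mut merged: Vec<Model> = by_slug.into_values().collect();
        // Rule 4: exactly one default, the first entry with the top priority.
        // Assumption: a larger number means higher priority.
        let best = merged.iter().map(|m| m.priority).max();
        let mut chosen = false;
        for m in &mut merged {
            m.is_default = !chosen && Some(m.priority) == best;
            chosen |= m.is_default;
        }
        merged
    }
    ```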
  • fix: prevent repeating interrupted turns (#9043)
    ## What
    Record a model-visible `<turn_aborted>` marker in history when a turn is
    interrupted, and treat it as a session prefix.
    
    ## Why
    When a turn is interrupted, Codex emits `TurnAborted` but previously did
    not persist anything model-visible in the conversation history. On the
    next user turn, the model can’t tell the previous work was aborted and
    may resume/repeat earlier actions (including duplicated side effects
    like re-opening PRs).
    
    Fixes: https://github.com/openai/codex/issues/9042
    
    ## How
    On `TurnAbortReason::Interrupted`, append a hidden user message
    containing a `<turn_aborted>…</turn_aborted>` marker and flush.
    Treat `<turn_aborted>` like `<environment_context>` for session-prefix
    filtering.
    Add a regression test to ensure follow-up turns don’t repeat side
    effects from an aborted turn.
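
    A minimal sketch of the marker and the session-prefix check. The two tag names are from this PR; the helper names and the text inside the marker are placeholders, not what Codex actually records.

    ```rust
    /// Tags whose messages count as session prefix rather than real user turns.
    const SESSION_PREFIX_TAGS: [&str; 2] = ["<environment_context>", "<turn_aborted>"];

    /// Build the hidden marker appended when a turn is interrupted.
    /// The inner wording is a placeholder, not the text Codex writes.
    pub fn turn_aborted_marker(note: &str) -> String {
        format!("<turn_aborted>{note}</turn_aborted>")
    }

    /// True when a user-role message is bookkeeping that prefix filtering should skip.
    pub fn is_session_prefix(text: &str) -> bool {
        let trimmed = text.trim_start();
        SESSION_PREFIX_TAGS.iter().any(|tag| trimmed.starts_with(*tag))
    }
    ```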
    
    ## Testing
    `just fmt`
    `just fix -p codex-core`
    `cargo test -p codex-core -- --test-threads=1`
    `cargo test --all-features -- --test-threads=1`
    
    ---------
    
    Co-authored-by: Skylar Graika <sgraika127@gmail.com>
    Co-authored-by: jif-oai <jif@openai.com>
    Co-authored-by: Eric Traut <etraut@openai.com>
  • feat(personality) introduce model_personality config (#9459)
    ## Summary
    Introduces the concept of a config model_personality. I would consider
    this an MVP for testing out the feature. There are a number of
    follow-ups to this PR:
    
    - More sophisticated templating with validation
    - In-product experience to manage this
    
    ## Testing
    - [x] Testing locally
  • feat: support proxy for ws connection (#9409)
    Unfortunately, tokio-tungstenite doesn't support proxy configuration out
    of the box. While https://github.com/snapview/tokio-tungstenite/pull/370
    is in review, we can depend on the source code for now.
  • fix(core) Preserve base_instructions in SessionMeta (#9427)
    ## Summary
    This PR consolidates base_instructions onto SessionMeta /
    SessionConfiguration, so we ensure `base_instructions` is set once per
    session and should be (mostly) immutable, unless:
    - overridden by config on resume / fork
    - sub-agent tasks, like review or collab
    
    
    In a future PR, we should convert all references to `base_instructions`
    to consistently use the typed struct, so it's less likely that we put
    other strings there. See #9423. However, this PR is already quite
    complex, so I'm deferring that to a follow-up.
    
    ## Testing
    - [x] Added a resume test to assert that instructions are preserved. In
    particular, `resume_switches_models_preserves_base_instructions` fails
    against main.
    
    Existing test coverage that asserts base instructions are preserved
    across multiple requests in a session:
    - Manual compact keeps baseline instructions:
    core/tests/suite/compact.rs:199
    - Auto-compact keeps baseline instructions:
    core/tests/suite/compact.rs:1142
    - Prompt caching reuses the same instructions across two requests:
    core/tests/suite/prompt_caching.rs:150 and
    core/tests/suite/prompt_caching.rs:157
    - Prompt caching with explicit expected string across two requests:
    core/tests/suite/prompt_caching.rs:213 and
    core/tests/suite/prompt_caching.rs:222
    - Resume with model switch keeps original instructions:
    core/tests/suite/resume.rs:136
    - Compact/resume/fork uses request 0 instructions for later expected
    payloads: core/tests/suite/compact_resume_fork.rs:215
  • Act on reasoning-included per turn (#9402)
    - Reset reasoning-included flag each turn and update compaction test
  • Feat: request user input tool (#9472)
    ### Summary
    * Add a `requestUserInput` tool that the model can use to gather
    feedback or ask questions mid-turn.
    
    
    ### Tool input schema
    ```json
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput input",
      "type": "object",
      "additionalProperties": false,
      "required": ["questions"],
      "properties": {
        "questions": {
          "type": "array",
          "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.",
          "minItems": 1,
          "maxItems": 3,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": ["id", "header", "question"],
            "properties": {
              "id": {
                "type": "string",
                "description": "Stable identifier for mapping answers (snake_case)."
              },
              "header": {
                "type": "string",
                "description": "Short header label shown in the UI (12 or fewer chars)."
              },
              "question": {
                "type": "string",
                "description": "Single-sentence prompt shown to the user."
              },
              "options": {
                "type": "array",
                "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.",
                "minItems": 2,
                "maxItems": 3,
                "items": {
                  "type": "object",
                  "additionalProperties": false,
                  "required": ["value", "label", "description"],
                  "properties": {
                    "value": {
                      "type": "string",
                      "description": "Machine-readable value (snake_case)."
                    },
                    "label": {
                      "type": "string",
                      "description": "User-facing label (1-5 words)."
                    },
                    "description": {
                      "type": "string",
                      "description": "One short sentence explaining impact/tradeoff if selected."
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```
    
    ### Tool output schema
    ```json
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput output",
      "type": "object",
      "additionalProperties": false,
      "required": ["answers"],
      "properties": {
        "answers": {
          "type": "object",
          "description": "Map of question id to user answer.",
          "additionalProperties": {
            "type": "object",
            "additionalProperties": false,
            "required": ["selected"],
            "properties": {
              "selected": {
                "type": "array",
                "items": { "type": "string" }
              },
              "other": {
                "type": ["string", "null"]
              }
            }
          }
        }
      }
    }
    ```
  • Add collaboration developer instructions (#9424)
    - Add additional instructions when they are available
    - Make sure to update them when they change via either UserInput or UserTurn
  • chore(instructions) Remove unread SessionMeta.instructions field (#9423)
    ### Description
    - Remove the now-unused `instructions` field from the session metadata
    to simplify SessionMeta and stop propagating transient instruction text
    through the rollout recorder API. This was only saving
    user_instructions, and was never being read.
    - Stop passing user instructions into the rollout writer at session
    creation so the rollout header only contains canonical session metadata.
    
    ### Testing
    
    - Ran `just fmt` which completed successfully.
    - Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
    -p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
    codex-tui2` which completed (Clippy fixes applied) as part of
    verification.
    - Ran `cargo test -p codex-protocol` which passed (28 tests).
    - Ran `cargo test -p codex-core` which showed failures in a small set of
    tests (not caused by the protocol type change directly):
    `default_client::tests::test_create_client_sets_default_headers`,
    several `models_manager::manager::tests::refresh_available_models_*`,
    and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
    tests failed in this CI run).
    - Ran `cargo test -p codex-app-server` which reported several failing
    integration tests (including
    `suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
    `suite::output_schema::send_user_turn_*`, and
    `suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
    - `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
    attempted but aborted due to disk space exhaustion (`No space left on
    device`).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
  • feat(app-server, core): return threads by created_at or updated_at (#9247)
    Add support for returning threads by either `created_at` OR `updated_at`
    descending. Previously core always returned threads ordered by
    `created_at`.
    
    This PR:
    - updates core to be able to list threads by `updated_at` OR
    `created_at` descending based on what the caller wants
    - also update `thread/list` in app-server to expose this (default to
    `created_at` if not specified)
    
    All existing codepaths (app-server, TUI) still default to `created_at`,
    so no behavior change is expected with this PR.
    
    **Implementation**
    Sorting by `updated_at` is nontrivial (whereas `created_at` is
    easy due to the way we structure the folders and filenames on disk,
    which are all based on `created_at`).
    
    The most naive way to do this without introducing a cache file or sqlite
    DB (which we have to implement/maintain) is to scan files in reverse
    `created_at` order on disk, and look at the file's mtime (last modified
    timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
    (currently set to 10,000). Then, we can return the most recent N
    threads.
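
    The scan-then-sort approach can be sketched in memory as follows. `MAX_SCAN_FILES` and its value come from the description above; the `Rollout` struct and `list_by_updated_at` are hypothetical, and the real code reads mtimes from the filesystem instead of taking them as input.

    ```rust
    use std::time::SystemTime;

    /// Scan cap from the PR description.
    pub const MAX_SCAN_FILES: usize = 10_000;

    /// One rollout file: its path and filesystem mtime (hypothetical shape).
    #[derive(Debug, Clone, PartialEq)]
    pub struct Rollout {
        pub path: String,
        pub mtime: SystemTime,
    }

    /// `newest_created_first` must already be in reverse `created_at` order,
    /// which the on-disk layout provides. Only the first `MAX_SCAN_FILES`
    /// entries are considered; those are re-sorted by mtime, newest first.
    pub fn list_by_updated_at(newest_created_first: Vec<Rollout>, limit: usize) -> Vec<Rollout> {
        let mut scanned: Vec<Rollout> = newest_created_first
            .into_iter()
            .take(MAX_SCAN_FILES)
            .collect();
        scanned.sort_by(|a, b| b.mtime.cmp(&a.mtime));
        scanned.truncate(limit);
        scanned
    }
    ```

    The `take(MAX_SCAN_FILES)` step is what bounds the I/O, and it is also the source of the caveat about very old threads.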
    
    Based on some quick and dirty benchmarking on my machine with ~1000
    rollout files, calling `thread/list` with limit 50, the `updated_at`
    path is slower as expected due to all the I/O:
    - updated-at: average 103.10 ms
    - created-at: average 41.10 ms
    
    Those absolute numbers aren't a big deal IMO, but we can certainly
    optimize this in a followup if needed by introducing more state stored
    on disk.
    
    **Caveat**
    There's also a limitation: any files beyond the first `MAX_SCAN_FILES`
    scanned (newest `created_at` first) are excluded, so if a user continues
    a REALLY old thread, it may not be included. In practice that should not
    be too big of an issue.
    
    If a user makes...
    - 1000 rollouts/day → threads older than 10 days won't show up
    - 100 rollouts/day → ~100 days
    
    If this becomes a problem for some reason, even more motivation to
    implement an updated_at cache.
  • Turn-state sticky routing per turn (#9332)
    - capture the header from SSE/WS handshakes, store it per
    ModelClientSession using `OnceLock`, echo it on turn-scoped requests,
    and add SSE+WS integration tests for within-turn persistence +
    cross-turn reset.
    
    - keep `x-codex-turn-state` sticky within a user turn to maintain
    routing continuity for retries/tool follow-ups.
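
    A small sketch of the capture-once, echo-many pattern with `std::sync::OnceLock`. The header name is from this PR; `TurnSession` and its methods are illustrative, and creating a fresh value per turn stands in for the cross-turn reset.

    ```rust
    use std::sync::OnceLock;

    /// Per-turn session state; the struct and method names are illustrative.
    #[derive(Default)]
    pub struct TurnSession {
        turn_state: OnceLock<String>,
    }

    impl TurnSession {
        /// Record the header from the first handshake of the turn.
        /// `set` fails once a value is stored, so later values are ignored.
        pub fn capture(&self, header_value: &str) {
            let _ = self.turn_state.set(header_value.to_string());
        }

        /// Header to echo on follow-up requests in the same turn, if captured.
        pub fn sticky_header(&self) -> Option<(&'static str, &str)> {
            self.turn_state
                .get()
                .map(|value| ("x-codex-turn-state", value.as_str()))
        }
    }
    ```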
  • chore: close pipe on non-pty processes (#9369)
    Close the STDIN of piped processes when starting them, so that commands
    like `rg` don't wait for content on STDIN and hang forever.
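
    One way to express the idea with `std::process` is sketched below; it is not the code from this PR, and `run_with_closed_stdin` is a hypothetical helper. Dropping the child's stdin handle closes the write end of the pipe, so the child reads EOF instead of blocking.

    ```rust
    use std::io;
    use std::process::{Command, Output, Stdio};

    /// Spawn a piped (non-PTY) child and close its stdin right away.
    pub fn run_with_closed_stdin(program: &str, args: &[&str]) -> io::Result<Output> {
        let mut child = Command::new(program)
            .args(args)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()?;
        // Dropping the write end closes the pipe, so the child sees EOF on stdin.
        drop(child.stdin.take());
        child.wait_with_output()
    }
    ```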
  • Add text element metadata to protocol, app server, and core (#9331)
    The second part of breaking up PR
    https://github.com/openai/codex/pull/9116
    
    Summary:
    
    - Add `TextElement` / `ByteRange` to protocol user inputs and user
    message events with defaults.
    - Thread `text_elements` through app-server v1/v2 request handling and
    history rebuild.
    - Preserve UI metadata only in user input/events (not `ContentItem`)
    while keeping local image attachments in user events for rehydration.
    
    Details:
    
    - Protocol: `UserInput::Text` carries `text_elements`;
    `UserMessageEvent` carries `text_elements` + `local_images`.
    Serialization includes empty vectors for backward compatibility.
    - app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in
    camelCase with conversions; v2 uses its own camelCase wrapper.
    - app-server: v1/v2 input mapping includes `text_elements`; thread
    history rebuilds include them.
    - Core: user event emission preserves UI metadata while model history
    stays clean; history replay round-trips the metadata.
  • [search] allow explicitly disabling web search (#9249)
    moving `web_search` rollout serverside, so need a way to explicitly
    disable search + signal eligibility from the client.
    
    - Add `x-oai-web-search-eligible` header that signifies whether the
    request can have web search.
    - Only attach the `web_search` tool when the resolved `WebSearchMode` is
    `Live` or `Cached`.
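
    The tool-attachment rule could be sketched like this. The variant names follow `WebSearchMode` as described in #9216 (with `Cached` as the default); the `tools_for` function and the string tool list are simplifications invented for the example.

    ```rust
    /// Mirror of the enum described in #9216; `Cached` is the stated default.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub enum WebSearchMode {
        Disabled,
        #[default]
        Cached,
        Live,
    }

    /// Tool list for a request; only the `web_search` entry is modeled here.
    pub fn tools_for(mode: WebSearchMode) -> Vec<&'static str> {
        match mode {
            WebSearchMode::Live | WebSearchMode::Cached => vec!["web_search"],
            WebSearchMode::Disabled => Vec::new(),
        }
    }
    ```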
  • Propagate MCP disabled reason (#9207)
    Indicate why MCP servers are disabled when they are disabled by
    requirements:
    
    ```
    ➜  codex git:(main) ✗ just codex mcp list
    cargo run --bin codex -- "$@"
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.27s
         Running `target/debug/codex mcp list`
    Name         Command          Args  Env  Cwd  Status                                                                  Auth
    docs         docs-mcp         -     -    -    disabled: requirements (MDM com.openai.codex:requirements_toml_base64)  Unsupported
    hello_world  hello-world-mcp  -     -    -    disabled: requirements (MDM com.openai.codex:requirements_toml_base64)  Unsupported
    
    ➜  codex git:(main) ✗ just c
    cargo run --bin codex -- "$@"
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.90s
         Running `target/debug/codex`
    ╭─────────────────────────────────────────────╮
    │ >_ OpenAI Codex (v0.0.0)                    │
    │                                             │
    │ model:     gpt-5.2 xhigh   /model to change │
    │ directory: ~/code/codex/codex-rs            │
    ╰─────────────────────────────────────────────╯
    
    /mcp
    
    🔌  MCP Tools
    
      • No MCP tools available.
    
      • docs (disabled)
        • Reason: requirements (MDM com.openai.codex:requirements_toml_base64)
    
      • hello_world (disabled)
        • Reason: requirements (MDM com.openai.codex:requirements_toml_base64)
    ```
  • Add text element metadata to types (#9235)
    Initial type tweaking PR to make the diff of
    https://github.com/openai/codex/pull/9116 smaller
    
    This should not change any behavior, just adds some fields to types
  • add WebSearchMode enum (#9216)
    ### What
    Add `WebSearchMode` enum (disabled, cached, live; defaults to cached) to
    config + V2 protocol. This enum takes precedence over legacy flags:
    `web_search_cached`, `web_search_request`, and `tools.web_search`.
    
    Keep `--search` as live.
    
    ### Tests
    Added tests