Commit Graph

239 Commits

  • change collaboration mode to struct (#9793)
    Shouldn't cause behavioral change
  • Fix execpolicy parsing for multiline quoted args (#9565)
    ## What
    Fix bash command parsing to accept double-quoted strings that contain
    literal newlines so execpolicy can match allow rules.
    
    ## Why
    Allow rules like [git, commit] should still match when commit messages
    include a newline in a quoted argument; the parser currently rejects
    these strings and falls back to the outer shell invocation.
    
    ## How
    - Validate double-quoted strings by ensuring all named children are
    string_content and then stripping the outer quotes from the raw node
    text so embedded newlines are preserved.
    - Reuse the helper for concatenated arguments.
    - Ensure large SI suffix formatting uses the caller-provided locale
    formatter for grouping.
    - Add coverage for newline-containing quoted arguments.
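    
    As a rough illustration of the quote-stripping step, the sketch below
    works on a plain string instead of a parser node. The function name
    `strip_plain_double_quoted` is hypothetical, and the character check is
    only a stand-in for the "all named children are string_content"
    validation described above.
    
    ```rust
    /// Returns the text between the outer double quotes, keeping embedded
    /// newlines, or `None` when the input is not a plain double-quoted string.
    /// Hypothetical helper: the real change validates parser nodes instead.
    fn strip_plain_double_quoted(raw: &str) -> Option<&str> {
        // Both outer quotes must be present.
        let inner = raw.strip_prefix('"')?.strip_suffix('"')?;
        // Approximation: reject anything that could be an expansion or escape.
        if inner.contains(|c: char| matches!(c, '$' | '`' | '\\' | '"')) {
            return None;
        }
        Some(inner)
    }
    ```
    
    With this shape, a quoted commit message spanning two lines comes back
    with its newline intact, while `"$HOME"` is rejected.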
    
    Fixes #9541.
    
    ## Tests
    - cargo test -p codex-core
    - just fix -p codex-core
    - cargo test -p codex-protocol
    - just fix -p codex-protocol
    - cargo test --all-features
  • feat(core) update Personality on turn (#9644)
    ## Summary
    Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
    is explicitly unused by the clients so far, to simplify PRs - app-server
    and tui implementations will be follow-ups.
    
    ## Testing
    - [x] added integration tests
  • Support end_turn flag (#9698)
    Experimental flag that signals the end of the turn.
  • Add UI for skill enable/disable. (#9627)
    "/skill" will now allow you to enable/disable skills:
    <img width="658" height="199" alt="image"
    src="https://github.com/user-attachments/assets/bf8994c8-d6c1-462f-8bbb-f1ee9241caa4"
    />
  • feat(core) ModelInfo.model_instructions_template (#9597)
    ## Summary
    #9555 is the start of a rename, so I'm starting to standardize here.
    Sets up `model_instructions` templating with a strongly-typed object for
    injecting a personality block into the model instructions.
    
    ## Testing
    - [x] Added tests
    - [x] Ran locally
  • Add layered config.toml support to app server (#9510)
    This PR adds support for chained (layered) config.toml file merging for
    clients that use the app server interface. This feature already exists
    for the TUI, but it does not work for GUI clients.
    
    It does the following:
    * Changes code paths for new thread, resume thread, and fork thread to
    use the effective config based on the cwd.
    * Updates the `config/read` API to accept an optional `cwd` parameter.
    If specified, the API returns the effective config based on that cwd
    path. Also optionally includes all layers including project config
    files. If cwd is not specified, the API falls back on its older behavior
    where it considers only the global (non-project) config files when
    computing the effective config.
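    
    The cwd-dependent behavior described above can be sketched roughly as
    follows. This is an illustration only: `Layer`, `effective_config`, and
    `layers_for` are hypothetical names, config is flattened to string keys
    for brevity, and the assumption that project layers override global
    ones is not confirmed by this PR text.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// One config layer, flattened to string keys (a simplification).
    type Layer = BTreeMap<String, String>;
    
    /// Merge layers from lowest to highest precedence; later layers win.
    fn effective_config(layers: &[Layer]) -> Layer {
        let mut merged = Layer::new();
        for layer in layers {
            for (key, value) in layer {
                merged.insert(key.clone(), value.clone());
            }
        }
        merged
    }
    
    /// Choose the layers for a read request: without a cwd, only the
    /// global (non-project) layer is considered.
    fn layers_for(global: &Layer, project: &Layer, cwd: Option<&str>) -> Vec<Layer> {
        match cwd {
            Some(_) => vec![global.clone(), project.clone()],
            None => vec![global.clone()],
        }
    }
    ```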
    
    The changes in codex_message_processor.rs look deceptively large. They
    mostly just involve moving existing blocks of code to a later point in
    some functions so it can use the cwd to calculate the config.
    
    This PR builds upon #9509 and should be reviewed and merged after that
    PR.
    
    Tested:
    * Verified change with (dependent, as-yet-uncommitted) changes to IDE
    Extension and confirmed correct behavior
    
    The full fix requires additional changes in the IDE Extension code base,
    but they depend on this PR.
  • Add collaboration_mode to TurnContextItem (#9583)
    ## Summary
    - add optional `collaboration_mode` to `TurnContextItem` in rollouts
    - persist the current collaboration mode when recording turn context
    (sampling + compaction)
    
    ## Rationale
    We already persist turn context data for resume logic. Capturing
    collaboration mode in the rollout gives us the mode context for each
    turn, enabling follow‑up work to diff mode instructions correctly on
    resume.
    
    ## Changes
    - protocol: add optional `collaboration_mode` field to `TurnContextItem`
    - core: persist collaboration mode alongside other turn context settings
    in rollouts
  • Chore: update plan mode output in prompt (#9592)
    ### Summary
    * Update plan prompt output
    * Update requestUserInput response to be a single key value pair
    `answer: String`.
  • Prompt Expansion: Preserve Text Elements (#9518)
    Summary
    - Preserve `text_elements` through custom prompt argument parsing and
    expansion (named and numeric placeholders).
    - Translate text element ranges through Shlex parsing using sentinel
    substitution, and rehydrate text + element ranges per arg.
    - Drop image attachments when their placeholder does not survive prompt
    expansion, keeping attachments consistent with rendered elements.
    - Mirror changes in TUI2 and expand tests for prompt parsing/expansion
    edge cases.
    
    Tests
    - placeholders with spaces as single tokens (positional + key=value,
    quoted + unquoted),
    - prompt expansion with image placeholders,
    - large paste + image arg combinations,
    - unused image arg dropped after expansion.
  • Add total (non-partial) TextElement placeholder accessors (#9545)
    ## Summary
    - Make `TextElement` placeholders private and add a text-backed accessor
    to avoid assuming `Some`.
    - Since they are optional in the protocol, we want to make sure any
    accessors properly handle the None case (getting the placeholder using
    the byte range in the text)
    - Preserve placeholders during protocol/app-server conversions using the
    accessor fallback.
    - Update TUI composer/remap logic and tests to use the new
    constructor/accessor.
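    
    A minimal sketch of what such an accessor might look like, assuming a
    half-open byte range and an optional stored placeholder. The field and
    method names here are illustrative and may not match the actual types.
    
    ```rust
    /// Half-open byte range into the message text.
    #[derive(Debug, Clone, Copy, PartialEq)]
    struct ByteRange {
        start: usize,
        end: usize,
    }
    
    #[derive(Debug, Clone, PartialEq)]
    struct TextElement {
        byte_range: ByteRange,
        // Kept private so callers cannot assume `Some`.
        placeholder: Option<String>,
    }
    
    impl TextElement {
        fn new(byte_range: ByteRange, placeholder: Option<String>) -> Self {
            Self { byte_range, placeholder }
        }
    
        /// Returns the stored placeholder, or falls back to the slice of
        /// `text` covered by the byte range. Yields `None` only when the
        /// range is out of bounds or splits a UTF-8 character.
        fn placeholder<'a>(&'a self, text: &'a str) -> Option<&'a str> {
            match &self.placeholder {
                Some(stored) => Some(stored.as_str()),
                None => text.get(self.byte_range.start..self.byte_range.end),
            }
        }
    }
    ```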
  • fix(core) Preserve base_instructions in SessionMeta (#9427)
    ## Summary
    This PR consolidates base_instructions onto SessionMeta /
    SessionConfiguration, so we ensure `base_instructions` is set once per
    session and should be (mostly) immutable, unless:
    - overridden by config on resume / fork
    - sub-agent tasks, like review or collab
    
    
    In a future PR, we should convert all references to `base_instructions`
    to consistently use the typed struct, so it's less likely that we put
    other strings there. See #9423. However, this PR is already quite
    complex, so I'm deferring that to a follow-up.
    
    ## Testing
    - [x] Added a resume test to assert that instructions are preserved. In
    particular, `resume_switches_models_preserves_base_instructions` fails
    against main.
    
    Existing test coverage that asserts base instructions are preserved
    across multiple requests in a session:
    - Manual compact keeps baseline instructions:
    core/tests/suite/compact.rs:199
    - Auto-compact keeps baseline instructions:
    core/tests/suite/compact.rs:1142
    - Prompt caching reuses the same instructions across two requests:
    core/tests/suite/prompt_caching.rs:150 and
    core/tests/suite/prompt_caching.rs:157
    - Prompt caching with explicit expected string across two requests:
    core/tests/suite/prompt_caching.rs:213 and
    core/tests/suite/prompt_caching.rs:222
    - Resume with model switch keeps original instructions:
    core/tests/suite/resume.rs:136
    - Compact/resume/fork uses request 0 instructions for later expected
    payloads: core/tests/suite/compact_resume_fork.rs:215
  • Migrate tui to use UserTurn (#9497)
    - `tui/` and `tui2/` submit `Op::UserTurn` and own full turn context
    (cwd/approval/sandbox/model/etc.).
    - `Op::UserInput` is documented as legacy in `codex-protocol` (doc-only;
    no `#[deprecated]` to avoid `-D warnings` fallout).
    - Remove obsolete `#[allow(deprecated)]` and the unused `ConversationId`
    alias/re-export.
  • Feat: request user input tool (#9472)
    ### Summary
    * Add `requestUserInput` tool that the model can use to gather
    feedback or ask questions mid-turn.
    
    
    ### Tool input schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput input",
      "type": "object",
      "additionalProperties": false,
      "required": ["questions"],
      "properties": {
        "questions": {
          "type": "array",
          "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.",
          "minItems": 1,
          "maxItems": 3,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": ["id", "header", "question"],
            "properties": {
              "id": {
                "type": "string",
                "description": "Stable identifier for mapping answers (snake_case)."
              },
              "header": {
                "type": "string",
                "description": "Short header label shown in the UI (12 or fewer chars)."
              },
              "question": {
                "type": "string",
                "description": "Single-sentence prompt shown to the user."
              },
              "options": {
                "type": "array",
                "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.",
                "minItems": 2,
                "maxItems": 3,
                "items": {
                  "type": "object",
                  "additionalProperties": false,
                  "required": ["value", "label", "description"],
                  "properties": {
                    "value": {
                      "type": "string",
                      "description": "Machine-readable value (snake_case)."
                    },
                    "label": {
                      "type": "string",
                      "description": "User-facing label (1-5 words)."
                    },
                    "description": {
                      "type": "string",
                      "description": "One short sentence explaining impact/tradeoff if selected."
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```
    
    ### Tool output schema
    ```
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "requestUserInput output",
      "type": "object",
      "additionalProperties": false,
      "required": ["answers"],
      "properties": {
        "answers": {
          "type": "object",
          "description": "Map of question id to user answer.",
          "additionalProperties": {
            "type": "object",
            "additionalProperties": false,
            "required": ["selected"],
            "properties": {
              "selected": {
                "type": "array",
                "items": { "type": "string" }
              },
              "other": {
                "type": ["string", "null"]
              }
            }
          }
        }
      }
    }
    ```
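    
    To make the array bounds in the input schema concrete, here is a small
    sketch that checks only the `minItems`/`maxItems` constraints. The
    struct names and `validate_questions` are hypothetical and are not the
    tool's actual types. Limits that appear only in descriptions (such as
    the header length) are deliberately not enforced.
    
    ```rust
    struct QuestionOption {
        value: String,
        label: String,
        description: String,
    }
    
    struct Question {
        id: String,
        header: String,
        question: String,
        // Omitted entirely for free-form questions.
        options: Option<Vec<QuestionOption>>,
    }
    
    /// Checks the array bounds from the input schema: 1-3 questions and,
    /// when options are present, 2-3 options per question.
    fn validate_questions(questions: &[Question]) -> Result<(), String> {
        if !(1..=3).contains(&questions.len()) {
            return Err(format!("expected 1-3 questions, got {}", questions.len()));
        }
        for q in questions {
            if let Some(options) = &q.options {
                if !(2..=3).contains(&options.len()) {
                    return Err(format!(
                        "question `{}`: expected 2-3 options, got {}",
                        q.id,
                        options.len()
                    ));
                }
            }
        }
        Ok(())
    }
    ```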
  • Remove unused protocol collaboration mode prompts (#9463)
    Delete duplicate collaboration mode markdown under protocol prompts;
    core templates remain the single source of truth.
  • Add collaboration modes test prompts (#9443)
  • Add collaboration developer instructions (#9424)
    - Add additional instructions when they are available
    - Make sure to update them when they change via either UserInput or UserTurn
  • chore(instructions) Remove unread SessionMeta.instructions field (#9423)
    ### Description
    - Remove the now-unused `instructions` field from the session metadata
    to simplify SessionMeta and stop propagating transient instruction text
    through the rollout recorder API. This was only saving
    user_instructions, and was never being read.
    - Stop passing user instructions into the rollout writer at session
    creation so the rollout header only contains canonical session metadata.
    
    ### Testing
    
    - Ran `just fmt` which completed successfully.
    - Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
    -p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
    codex-tui2` which completed (Clippy fixes applied) as part of
    verification.
    - Ran `cargo test -p codex-protocol` which passed (28 tests).
    - Ran `cargo test -p codex-core` which showed failures in a small set of
    tests (not caused by the protocol type change directly):
    `default_client::tests::test_create_client_sets_default_headers`,
    several `models_manager::manager::tests::refresh_available_models_*`,
    and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
    tests failed in this CI run).
    - Ran `cargo test -p codex-app-server` which reported several failing
    integration tests (including
    `suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
    `suite::output_schema::send_user_turn_*`, and
    `suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
    - `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
    attempted but aborted due to disk space exhaustion (`No space left on
    device`).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
  • Introduce collaboration modes (#9340)
    - Merge `model` and `reasoning_effort` under collaboration modes.
    - Add additional instructions for custom collaboration mode
    - Default to Custom to not change behavior
  • feat: show forked from session id in /status (#9330)
    Summary:
    - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
    for forked sessions.
    - Surface forked_from in /status for tui + tui2 and add snapshots.
  • chore: upgrade to Rust 1.92.0 (#8860)
    **Summary**
    - Upgrade Rust toolchain used by CI to 1.92.0.
    - Address new clippy `derivable_impls` warnings by deriving `Default`
    for enums across protocol, core, backend openapi models, and
    windows-sandbox setup.
    - Tidy up related test/config behavior (originator header handling, env
    override cleanup) and remove a now-unused assignment in TUI/TUI2 render
    layout.
    
    **Testing**
    - `just fmt`
    - `just fix -p codex-tui`
    - `just fix -p codex-tui2`
    - `just fix -p codex-windows-sandbox`
    - `cargo test -p codex-tui`
    - `cargo test -p codex-tui2`
    - `cargo test -p codex-windows-sandbox`
    - `cargo test -p codex-core --test all`
    - `cargo test -p codex-app-server --test all`
    - `cargo test -p codex-mcp-server --test all`
    - `cargo test --all-features`
  • Add text element metadata to protocol, app server, and core (#9331)
    The second part of breaking up PR
    https://github.com/openai/codex/pull/9116
    
    Summary:
    
    - Add `TextElement` / `ByteRange` to protocol user inputs and user
    message events with defaults.
    - Thread `text_elements` through app-server v1/v2 request handling and
    history rebuild.
    - Preserve UI metadata only in user input/events (not `ContentItem`)
    while keeping local image attachments in user events for rehydration.
    
    Details:
    
    - Protocol: `UserInput::Text` carries `text_elements`;
    `UserMessageEvent` carries `text_elements` + `local_images`.
    Serialization includes empty vectors for backward compatibility.
    - app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in
    camelCase with conversions; v2 uses its own camelCase wrapper.
    - app-server: v1/v2 input mapping includes `text_elements`; thread
    history rebuilds include them.
    - Core: user event emission preserves UI metadata while model history
    stays clean; history replay round-trips the metadata.
  • Support SKILL.toml file. (#9125)
    We’re introducing a new SKILL.toml to hold skill metadata so Codex can
    deliver a richer Skills experience.
    
    Initial focus is the interface block:
    ```
    [interface]
    display_name = "Optional user-facing name"
    short_description = "Optional user-facing description"
    icon_small = "./assets/small-400px.png"
    icon_large = "./assets/large-logo.svg"
    brand_color = "#3B82F6"
    default_prompt = "Optional surrounding prompt to use the skill with"
    ```
    
    All fields are exposed via the app server API.
    display_name and short_description are consumed by the TUI.
  • Add migration_markdown in model_info (#9219)
    Next step would be to clean Model Upgrade in model presets
    
    ---------
    
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
  • Add text element metadata to types (#9235)
    Initial type tweaking PR to make the diff of
    https://github.com/openai/codex/pull/9116 smaller
    
    This should not change any behavior, just adds some fields to types
  • add WebSearchMode enum (#9216)
    ### What
    Add `WebSearchMode` enum (disabled, cached, live; defaults to cached) to
    config + V2 protocol. This enum takes precedence over legacy flags:
    `web_search_cached`, `web_search_request`, and `tools.web_search`.
    
    Keep `--search` as live.
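    
    A rough sketch of the precedence described above. The enum variants
    follow the PR text, but `resolve_web_search` is a hypothetical helper,
    the mapping from the individual legacy flags to a mode is left out, and
    letting `--search` win over the configured enum is an assumption.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    enum WebSearchMode {
        Disabled,
        #[default]
        Cached,
        Live,
    }
    
    /// `explicit` is the new enum from config; `from_legacy` is whatever
    /// mode the legacy flags imply (derivation not shown here).
    fn resolve_web_search(
        explicit: Option<WebSearchMode>,
        from_legacy: Option<WebSearchMode>,
        cli_search: bool,
    ) -> WebSearchMode {
        if cli_search {
            // `--search` keeps meaning live search.
            return WebSearchMode::Live;
        }
        // The enum takes precedence over legacy flags; otherwise default.
        explicit.or(from_legacy).unwrap_or_default()
    }
    ```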
    
    ### Tests
    Added tests
  • feat: emit events around collab tools (#9095)
    Emit the following events around the collab tools. On the `app-server`
    this will be under `item/started` and `item/completed`
    ```
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentSpawnBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
        /// beginning.
        pub prompt: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentSpawnEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the newly spawned agent, if it was created.
        pub new_thread_id: Option<ThreadId>,
        /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
        /// beginning.
        pub prompt: String,
        /// Last known status of the new agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentInteractionBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
        /// leaking at the beginning.
        pub prompt: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabAgentInteractionEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
        /// leaking at the beginning.
        pub prompt: String,
        /// Last known status of the receiver agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabWaitingBeginEvent {
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// ID of the waiting call.
        pub call_id: String,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabWaitingEndEvent {
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// ID of the waiting call.
        pub call_id: String,
        /// Last known status of the receiver agent reported to the sender agent.
        pub status: AgentStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabCloseBeginEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
    pub struct CollabCloseEndEvent {
        /// Identifier for the collab tool call.
        pub call_id: String,
        /// Thread ID of the sender.
        pub sender_thread_id: ThreadId,
        /// Thread ID of the receiver.
        pub receiver_thread_id: ThreadId,
        /// Last known status of the receiver agent reported to the sender agent before
        /// the close.
        pub status: AgentStatus,
    }
    ```
  • feat: add support for read-only bind mounts in the linux sandbox (#9112)
    ### Motivation
    
    - Landlock alone cannot prevent writes to sensitive in-repo files like
    `.git/` when the repo root is writable, so explicit mount restrictions
    are required for those paths.
    - The sandbox must set up any mounts before calling Landlock so Landlock
    can still be applied afterwards and the two mechanisms compose
    correctly.
    
    ### Description
    
    - Add a new `linux-sandbox` helper `apply_read_only_mounts` in
    `linux-sandbox/src/mounts.rs` that: unshares namespaces, maps uids/gids
    when required, makes mounts private, bind-mounts targets, and remounts
    them read-only.
    - Wire the mount step into the sandbox flow by calling
    `apply_read_only_mounts(...)` before network/seccomp and before applying
    Landlock rules in `linux-sandbox/src/landlock.rs`.
  • clean models manager (#9168)
    Keep only the following methods:
    - `list_models`: get the currently available models
    - `try_list_models`: sync version without refresh, for TUI use
    - `get_default_model`: get the default model (should be tightened to
    core and received on session configuration)
    - `get_model_info`: get `ModelInfo` for a specific model (should be
    tightened to core but used in tests)
    - `refresh_if_new_etag`: trigger refresh on different etags
    
    Also move the cache to its own struct
  • Use markdown for migration screen (#8952)
    Next steps will be routing this to model info
  • Add model client sessions (#9102)
    Maintain a long-running session.
  • Assemble sandbox/approval/network prompts dynamically (#8961)
    - Add a single builder for developer permissions messaging that accepts
    SandboxPolicy and approval policy. This builder now drives the developer
    “permissions” message that’s injected at session start and any time
    sandbox/approval settings change.
    - Trim EnvironmentContext to only include cwd, writable roots, and
    shell; removed sandbox/approval/network duplication and adjusted XML
    serialization and tests accordingly.
    
    Follow-up: adding a config value to replace the developer permissions
    message for custom sandboxes.
  • feat: hot reload mcp servers (#8957)
    ### Summary
    * Added `mcpServer/refresh` command to inform app servers and active
    threads to refresh mcpServer on next turn event.
    * Added `pending_mcp_server_refresh_config` to codex core so that if the
    value is populated, we reinitialize the mcp server manager on the thread
    level.
    * The config is updated on the `mcpServer/refresh` command: we iterate
    through the threads and give each one the latest config value as of the
    last write.
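    
    The pending-refresh flow above might be sketched like this. Only the
    field name `pending_mcp_server_refresh_config` comes from this PR;
    `McpConfig`, `ThreadState`, `refresh_all`, and `begin_turn` are
    hypothetical, and the reinitialization itself is reduced to swapping
    the active config.
    
    ```rust
    #[derive(Debug, Clone, PartialEq)]
    struct McpConfig {
        servers: Vec<String>,
    }
    
    struct ThreadState {
        active: McpConfig,
        pending_mcp_server_refresh_config: Option<McpConfig>,
    }
    
    /// Handler for a refresh command: hand every thread the latest config.
    fn refresh_all(threads: &mut [ThreadState], latest: &McpConfig) {
        for thread in threads.iter_mut() {
            thread.pending_mcp_server_refresh_config = Some(latest.clone());
        }
    }
    
    impl ThreadState {
        /// Called at the start of a turn. Returns true when a pending
        /// config was consumed and the manager was (notionally) rebuilt.
        fn begin_turn(&mut self) -> bool {
            match self.pending_mcp_server_refresh_config.take() {
                Some(config) => {
                    self.active = config;
                    true
                }
                None => false,
            }
        }
    }
    ```
    
    Using `Option::take` means the refresh happens at most once per
    command, and threads that never start another turn pay nothing.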
  • feat: add wait tool implementation for collab (#9088)
    Add implementation for the `wait` tool.
    
    For this we consider every status other than `PendingInit` and
    `Running` as terminal. The `wait` tool call returns either after a
    given timeout or when the agent reaches a terminal status.
    
    A few points to note:
    * The usage of a channel is preferred to prevent some races (just
    looping on `get_status()` could "miss" a terminal status)
    * The order of operations is very important, we need to first subscribe
    and then check the last known status to prevent race conditions
    * If the channel gets dropped, we return an error on purpose
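    
    The points above can be sketched with a standard-library channel
    standing in for whatever channel the implementation actually uses. The
    `Completed` and `Errored` variants and the `wait_for_terminal` helper
    are assumptions made for illustration. The caller is expected to have
    subscribed (obtained the receiver) before reading `last_known`.
    
    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::{Duration, Instant};
    
    #[derive(Debug, Clone, PartialEq)]
    enum AgentStatus {
        PendingInit,
        Running,
        Completed, // hypothetical terminal variant
        Errored,   // hypothetical terminal variant
    }
    
    /// Every status other than `PendingInit` and `Running` is terminal.
    fn is_terminal(status: &AgentStatus) -> bool {
        !matches!(status, AgentStatus::PendingInit | AgentStatus::Running)
    }
    
    /// Ok(Some(status)) on a terminal status, Ok(None) on timeout,
    /// Err when the status channel is dropped.
    fn wait_for_terminal(
        updates: &Receiver<AgentStatus>,
        last_known: AgentStatus,
        timeout: Duration,
    ) -> Result<Option<AgentStatus>, String> {
        // Checked after subscribing, so no terminal update can be missed.
        if is_terminal(&last_known) {
            return Ok(Some(last_known));
        }
        let deadline = Instant::now() + timeout;
        loop {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match updates.recv_timeout(remaining) {
                Ok(status) if is_terminal(&status) => return Ok(Some(status)),
                Ok(_) => continue, // still pending or running
                Err(RecvTimeoutError::Timeout) => return Ok(None),
                Err(RecvTimeoutError::Disconnected) => {
                    return Err("status channel dropped".to_string())
                }
            }
        }
    }
    ```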
  • Label attached images so agent can understand in-message labels (#8950)
    Agent wouldn't "see" attached images and would instead try to use the
    view_file tool:
    <img width="1516" height="504" alt="image"
    src="https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12"
    />
    
    In this PR, we wrap image content items in XML tags with the name of
    each image (now just a numbered name like `[Image #1]`), so that the
    model can understand inline image references (based on name). We also
    put the image content items above the user message which the model seems
    to prefer (maybe it's more used to definitions being before references).
    
    We also tweak the view_file tool description, which seemed to help a bit.
    
    Results on a simple eval set of images:
    
    Before
    <img width="980" height="310" alt="image"
    src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
    />
    
    After
    <img width="918" height="322" alt="image"
    src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
    />
    
    ```json
    [
      {
        "id": "single_describe",
        "prompt": "Describe the attached image in one sentence.",
        "images": ["image_a.png"]
      },
      {
        "id": "single_color",
        "prompt": "What is the dominant color in the image? Answer with a single color word.",
        "images": ["image_b.png"]
      },
      {
        "id": "orientation_check",
        "prompt": "Is the image portrait or landscape? Answer in one sentence.",
        "images": ["image_c.png"]
      },
      {
        "id": "detail_request",
        "prompt": "Look closely at the image and call out any small details you notice.",
        "images": ["image_d.png"]
      },
      {
        "id": "two_images_compare",
        "prompt": "I attached two images. Are they the same or different? Briefly explain.",
        "images": ["image_a.png", "image_b.png"]
      },
      {
        "id": "two_images_captions",
        "prompt": "Provide a short caption for each image (Image 1, Image 2).",
        "images": ["image_c.png", "image_d.png"]
      },
      {
        "id": "multi_image_rank",
        "prompt": "Rank the attached images from most colorful to least colorful.",
        "images": ["image_a.png", "image_b.png", "image_c.png"]
      },
      {
        "id": "multi_image_choice",
        "prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
        "images": ["image_b.png", "image_d.png"]
      }
    ]
    ```
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep-going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To set up remote execution:
    - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cut over to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • fix: add tui.alternate_screen config and --no-alt-screen CLI flag for Zellij scrollback (#8555)
    Fixes #2558
    
    Codex uses alternate screen mode (CSI 1049), which, per the xterm spec,
    doesn't support scrollback. Zellij follows this strictly, so users can't
    scroll back through output.
    
    **Changes:**
    - Add `tui.alternate_screen` config: `auto` (default), `always`, `never`
    - Add `--no-alt-screen` CLI flag
    - Auto-detect Zellij and skip alt screen (uses existing `ZELLIJ` env var
    detection)
    
    **Usage:**
    ```bash
    # CLI flag
    codex --no-alt-screen
    
    # Or in config.toml
    [tui]
    alternate_screen = "never"
    ```
    
    With default `auto` mode, Zellij users get working scrollback without
    any config changes.
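    
    The decision described above can be sketched roughly as follows. This is
    a minimal illustration, not the code from this PR: the `AlternateScreen`
    enum and `use_alternate_screen` function are hypothetical names, and it
    assumes that the CLI flag takes precedence over the config value and that
    Zellij is detected by the mere presence of the `ZELLIJ` env var.
    
    ```rust
    use std::env;
    
    /// Hypothetical mirror of the `tui.alternate_screen` setting.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AlternateScreen {
        Auto,
        Always,
        Never,
    }
    
    /// Decide whether to enter the alternate screen.
    /// `no_alt_screen` models the `--no-alt-screen` flag (assumed to win over
    /// the config value); `in_zellij` is the result of the `ZELLIJ` check.
    fn use_alternate_screen(mode: AlternateScreen, no_alt_screen: bool, in_zellij: bool) -> bool {
        if no_alt_screen {
            return false;
        }
        match mode {
            AlternateScreen::Always => true,
            AlternateScreen::Never => false,
            // `auto`: skip the alternate screen only when running inside Zellij.
            AlternateScreen::Auto => !in_zellij,
        }
    }
    
    fn main() {
        // Assumption: presence of the variable is enough to detect Zellij.
        let in_zellij = env::var_os("ZELLIJ").is_some();
        let enabled = use_alternate_screen(AlternateScreen::Auto, false, in_zellij);
        println!("alternate screen enabled: {enabled}");
    }
    ```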
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • Merge ModelFamily into ModelInfo (#8763)
    - Merge ModelFamily into ModelInfo
    - Remove logic for adding instructions to apply patch
    - Add compaction limit and visible context window to `ModelInfo`
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat(app-server): thread/rollback API (#8454)
    Add `thread/rollback` to app-server to support IDEs undo-ing the last N
    turns of a thread.
    
    For context, an IDE partner will be supporting an "undo" capability
    where the IDE (the app-server client) will be responsible for reverting
    the local changes made during the last turn. To support this well, we
    also need a way to drop the last turn (or more generally, the last N
    turns) from the agent's context. This is what `thread/rollback` does.
    
    **Core idea**: A Thread rollback is represented as a persisted event
    message (EventMsg::ThreadRollback) in the rollout JSONL file, not by
    rewriting history. On resume, both the model's context (core replay) and
    the UI turn list (app-server v2's thread history builder) apply these
    markers so the pruned history is consistent across live conversations
    and `thread/resume`.
    
    Implementation notes:
    - Rollback only affects agent context and appends to the rollout file;
    clients are responsible for reverting files on disk.
    - If a thread rollback is currently in progress, subsequent
    `thread/rollback` calls are rejected.
    - Because we use `CodexConversation::submit` and codex core tracks
    active turns, an error on a concurrent rollback is communicated via an
    `EventMsg::Error` with a new variant
    `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and
    sends the BAD_REQUEST RPC response.
    
    Tests cover thread rollbacks in both core and app-server, including when
    `num_turns` > existing turns (which clears all turns).
    
    **Note**: this explicitly does **not** behave like `/undo`, which we just
    removed from the CLI. `/undo` does the opposite of `thread/rollback`: it
    reverts local changes via ghost commits/snapshots and does not modify the
    agent's context / conversation history.
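    
    To illustrate the core idea of replaying rollback markers rather than
    rewriting history, here is a minimal sketch. The `RolloutItem` enum and
    `replay` function are hypothetical simplifications (the real rollout
    items and `EventMsg::ThreadRollback` payload are richer); the sketch only
    shows how a marker prunes the turns accumulated so far, including the
    case where `num_turns` exceeds the number of existing turns.
    
    ```rust
    /// Hypothetical, simplified stand-in for items read from a rollout file.
    #[derive(Debug, Clone, PartialEq)]
    enum RolloutItem {
        /// A completed turn, identified here only by its text.
        Turn(String),
        /// Marker meaning "drop the last `num_turns` turns".
        ThreadRollback { num_turns: usize },
    }
    
    /// Replay items in order, applying each rollback marker to the turns
    /// seen so far. The rollout itself is never rewritten.
    fn replay(items: &[RolloutItem]) -> Vec<String> {
        let mut turns: Vec<String> = Vec::new();
        for item in items {
            match item {
                RolloutItem::Turn(text) => turns.push(text.clone()),
                RolloutItem::ThreadRollback { num_turns } => {
                    // Rolling back more turns than exist clears all turns.
                    let keep = turns.len().saturating_sub(*num_turns);
                    turns.truncate(keep);
                }
            }
        }
        turns
    }
    
    fn main() {
        let items = vec![
            RolloutItem::Turn("a".to_string()),
            RolloutItem::Turn("b".to_string()),
            RolloutItem::ThreadRollback { num_turns: 1 },
            RolloutItem::Turn("c".to_string()),
        ];
        println!("{:?}", replay(&items)); // prints ["a", "c"]
    }
    ```
    
    Because the same replay can be applied on resume by both the model
    context and the UI turn list, the two views stay consistent.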
  • feat: expose outputSchema to user_turn/turn_start app_server API (#8377)
    What changed
    - Added `outputSchema` support to the app-server APIs, mirroring `codex
    exec --output-schema` behavior.
    - V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
    assistant message for that turn.
    - V2 `turn/start` now accepts `outputSchema` and constrains the final
    assistant message for that turn (explicitly per-turn only).
    
    Core behavior
    - `Op::UserTurn` already supported `final_output_json_schema`; now V1
    `sendUserTurn` forwards `outputSchema` into that field.
    - `Op::UserInput` now carries `final_output_json_schema` for per-turn
    settings updates; core maps it into
    `SessionSettingsUpdate.final_output_json_schema` so it applies to the
    created turn context.
    - V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
    (it’s applied only for the current turn). Other overrides
    (cwd/model/etc) keep their existing persistent behavior.
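    
    The per-turn versus persistent distinction above can be sketched as
    follows. All types here (`SessionSettings`, `TurnContext`, `TurnStart`)
    and the `start_turn` function are hypothetical simplifications, with the
    schema held as a plain string; they are not the actual codex types.
    
    ```rust
    /// Hypothetical session settings that persist across turns.
    #[derive(Debug, Clone, PartialEq)]
    struct SessionSettings {
        model: String,
    }
    
    /// Hypothetical per-turn context built when a turn starts.
    #[derive(Debug, Clone, PartialEq)]
    struct TurnContext {
        model: String,
        /// JSON schema text constraining the final assistant message, if any.
        final_output_json_schema: Option<String>,
    }
    
    /// Hypothetical parameters for starting a turn.
    #[derive(Debug, Default)]
    struct TurnStart {
        model: Option<String>,
        output_schema: Option<String>,
    }
    
    fn start_turn(settings: &mut SessionSettings, params: TurnStart) -> TurnContext {
        // Persistent override: written back into the session settings.
        if let Some(model) = params.model {
            settings.model = model;
        }
        // Per-turn only: the schema lives in this turn's context and nowhere else.
        TurnContext {
            model: settings.model.clone(),
            final_output_json_schema: params.output_schema,
        }
    }
    
    fn main() {
        let mut settings = SessionSettings { model: "model-a".to_string() };
        let first = start_turn(
            &mut settings,
            TurnStart {
                model: Some("model-b".to_string()),
                output_schema: Some(r#"{"type":"object"}"#.to_string()),
            },
        );
        let second = start_turn(&mut settings, TurnStart::default());
        // The model override persists; the schema does not leak to the next turn.
        println!("{first:?}\n{second:?}");
    }
    ```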
    
    API / docs
    - `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
    Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
    `outputSchema`).
    - `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
    Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
    - `codex-rs/app-server/README.md`: document `outputSchema` for
    `turn/start` and clarify it applies only to the current turn.
    - `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
    `sendUserTurn` and v2 `turn/start`.
    
    Tests added/updated
    - New app-server integration tests asserting `outputSchema` is forwarded
    into outbound `/responses` requests as `text.format`:
      - `codex-rs/app-server/tests/suite/output_schema.rs`
      - `codex-rs/app-server/tests/suite/v2/output_schema.rs`
    - Added per-turn semantics tests (schema does not leak to the next
    turn):
      - `send_user_turn_output_schema_is_per_turn_v1`
      - `turn_start_output_schema_is_per_turn_v2`
    - Added protocol wire-compat tests for the merged op:
      - serialize omits `final_output_json_schema` when `None`
      - deserialize works when field is missing
      - serialize includes `final_output_json_schema` when `Some(schema)`
    
    Call site updates (high level)
    - Updated all `Op::UserInput { .. }` constructions to include
    `final_output_json_schema`:
      - `codex-rs/app-server/src/codex_message_processor.rs`
      - `codex-rs/core/src/codex_delegate.rs`
      - `codex-rs/mcp-server/src/codex_tool_runner.rs`
      - `codex-rs/tui/src/chatwidget.rs`
      - `codex-rs/tui2/src/chatwidget.rs`
      - plus impacted core tests.
    
    Validation
    - `just fmt`
    - `cargo test -p codex-core`
    - `cargo test -p codex-app-server`
    - `cargo test -p codex-mcp-server`
    - `cargo test -p codex-tui`
    - `cargo test -p codex-tui2`
    - `cargo test -p codex-protocol`
    - `cargo clippy --all-features --tests --profile dev --fix -- -D
    warnings`
  • Account for last token count on resume (#8677)
    The last token count in the context manager is initialized to 0 and is
    populated only on events from the server.
    
    This PR populates it on resume so we can decide whether we need to
    compact.
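    
    A rough sketch of the idea, under stated assumptions: the
    `ContextManager` shape, the `restore_on_resume` and `needs_compaction`
    names, the source of the recorded counts, and the threshold comparison
    are all hypothetical, since the actual mechanism is not described here.
    
    ```rust
    /// Hypothetical, simplified context manager.
    struct ContextManager {
        last_token_count: u64,
    }
    
    impl ContextManager {
        fn new() -> Self {
            // Starts at 0 until something populates it.
            Self { last_token_count: 0 }
        }
    
        /// On resume, seed the count from the most recent recorded usage, if any.
        /// Assumption: recorded counts are available in chronological order.
        fn restore_on_resume(&mut self, recorded_token_counts: &[u64]) {
            if let Some(&last) = recorded_token_counts.last() {
                self.last_token_count = last;
            }
        }
    
        /// Assumption: compaction is needed once the count reaches the limit.
        fn needs_compaction(&self, compaction_limit: u64) -> bool {
            self.last_token_count >= compaction_limit
        }
    }
    
    fn main() {
        let mut manager = ContextManager::new();
        // Without restoring, a resumed thread would look empty.
        println!("before resume: {}", manager.needs_compaction(1_000));
        manager.restore_on_resume(&[400, 1_200]);
        println!("after resume: {}", manager.needs_compaction(1_000));
    }
    ```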
  • Log compaction request bodies (#8676)
    We already log request bodies for normal requests; logging them for
    compaction as well helps with debugging.