Commit Graph

161 Commits

  • [apps] Add thread_id param to optionally load thread config for apps feature check. (#11279)
    - [x] Add thread_id param to optionally load thread config for apps
    feature check
  • fix(app-server): for external auth, replace id_token with chatgpt_account_id and chatgpt_plan_type (#11240)
    
    ### Summary
    Following up on external auth mode which was introduced here:
    https://github.com/openai/codex/pull/10012
    
    Turns out some clients have a differently shaped ID token and don't have
    a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
    So, let's replace `id_token` param with `chatgpt_account_id` and
    `chatgpt_plan_type` (optional) when initializing the external ChatGPT
    auth mode (`account/login/start` with `chatgptAuthTokens`).
    
    The client was able to test end-to-end with a Codex build from this
    branch and verified it worked!
  • feat: do not close unified exec processes across turns (#10799)
    With this PR we do not close the unified exec processes (i.e. background
    terminals) at the end of a turn unless:
    * The user interrupts the turn
    * The user decides to clean up the processes through `app-server` or
    `/clean`
    
    I made sure that `codex exec` correctly kills all the processes.
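    
    The rule above can be sketched as a small predicate. This is only an
    illustration: `TurnEnd` and `should_close_processes` are hypothetical
    names, not the actual types in the codebase.
    
    ```rust
    /// How a turn ended. Hypothetical type for illustration only.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum TurnEnd {
        Completed,
        Interrupted,
    }
    
    /// Returns true when background terminals should be terminated:
    /// either the user interrupted the turn or explicitly asked for cleanup.
    pub fn should_close_processes(end: TurnEnd, user_requested_clean: bool) -> bool {
        user_requested_clean || end == TurnEnd::Interrupted
    }
    ```
    
    A turn that completes normally without a cleanup request leaves the
    processes running for the next turn.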
  • feat: include NetworkConfig through ExecParams (#11105)
    This PR adds the following field to `Config`:
    
    ```rust
    pub network: Option<NetworkProxy>,
    ```
    
    Though for the moment, it will always be initialized as `None` (this
    will be addressed in a subsequent PR).
    
    This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we spawn a process.
  • [apps] Improve app loading. (#10994)
    There are two concepts of apps that we load in the harness:
    
    - Directory apps, which are all the apps that the user can install.
    - Accessible apps, which are the apps the user has actually installed and
    that can be inserted with `$` and used by the model. These are extracted
    from the tools that are loaded through the gateway MCP.
    
    Previously we waited for both sets of apps before returning the full apps
    list. This caused many issues because accessible apps weren't available
    to the UI or the model if directory apps hadn't loaded or had failed to
    load.
    
    In this PR we are separating them so that accessible apps can be loaded
    separately and are instantly available to be shown in the UI and to be
    provided in model context. We also added an app-server event so that
    clients can subscribe to also get accessible apps without being blocked
    on the full app list.
    
    - [x] Separate accessible apps and directory apps loading.
    - [x] `app/list` request will also emit `app/list/updated` notifications
    that app-server clients can subscribe to. This allows clients to get the
    accessible apps list to render in the $ menu without being blocked by
    directory apps.
    - [x] Cache both accessible and directory apps with 1 hour TTL to avoid
    reloading them when creating new threads.
    - [x] TUI improvements to redraw $ menu and /apps menu when app list is
    updated.
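    
    For reference, a 1 hour TTL cache can be sketched with only the standard
    library. This is a minimal illustration, not the cache used in the
    harness; `TtlCache` and its methods are hypothetical names, and the
    clock is passed in explicitly to keep the sketch deterministic.
    
    ```rust
    use std::time::{Duration, Instant};
    
    /// Single-slot cache whose value expires once it is `ttl` old.
    pub struct TtlCache<T> {
        ttl: Duration,
        slot: Option<(Instant, T)>,
    }
    
    impl<T: Clone> TtlCache<T> {
        pub fn new(ttl: Duration) -> Self {
            Self { ttl, slot: None }
        }
    
        /// Stores a value, stamped with the time it was loaded.
        pub fn put(&mut self, value: T, now: Instant) {
            self.slot = Some((now, value));
        }
    
        /// Returns the cached value only while it is younger than the TTL.
        pub fn get(&self, now: Instant) -> Option<T> {
            match &self.slot {
                Some((stored_at, value)) if now.duration_since(*stored_at) < self.ttl => {
                    Some(value.clone())
                }
                _ => None,
            }
        }
    }
    ```
    
    With `TtlCache::new(Duration::from_secs(3600))`, a new thread created
    within the hour would reuse the stored list instead of reloading it.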
  • Defer persistence of rollout file (#11028)
    - Defer rollout persistence for fresh threads (`InitialHistory::New`):
    keep rollout events in memory and only materialize rollout file + state
    DB row on first `EventMsg::UserMessage`.
    - Keep precomputed rollout path available before materialization.
    - Change `thread/start` to build thread response from live config
    snapshot and optional precomputed path.
    - Improve pre-materialization behavior in app-server/TUI: clearer
    invalid-request errors for file-backed ops and a friendlier `/fork` “not
    ready yet” UX.
    - Update tests to match deferred semantics across
    start/read/archive/unarchive/fork/resume/review flows.
    - Improved resilience of the user_shell test, which is nominally
    unrelated to this change but appears to be affected by its timing changes
    
    For Reviewers:
    * The primary change is in recorder.rs
    * Most of the other changes were to fix up broken assumptions in
    existing tests
    
    Testing:
    * Manually tested CLI
    * Exercised app server paths by manually running IDE Extension with
    rebuilt CLI binary
    * Only user-visible change is that `/fork` in TUI generates visible
    error if used prior to first turn
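    
    The deferred semantics can be sketched roughly as follows. This is a
    simplified model, not the code in recorder.rs: `Event` and
    `DeferredRecorder` are hypothetical, and a `Vec` stands in for the
    rollout file and state DB row.
    
    ```rust
    /// Hypothetical, simplified event type.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Event {
        SessionConfigured,
        UserMessage(String),
        AgentMessage(String),
    }
    
    /// Recorder that keeps events in memory until the first user message.
    #[derive(Default)]
    pub struct DeferredRecorder {
        pending: Vec<Event>,
        /// Stands in for the rollout file; `None` until materialized.
        persisted: Option<Vec<Event>>,
    }
    
    impl DeferredRecorder {
        pub fn record(&mut self, event: Event) {
            if self.persisted.is_none() && matches!(event, Event::UserMessage(_)) {
                // First user message: flush everything buffered so far.
                self.persisted = Some(std::mem::take(&mut self.pending));
            }
            match self.persisted.as_mut() {
                Some(file) => file.push(event),
                None => self.pending.push(event),
            }
        }
    
        pub fn is_materialized(&self) -> bool {
            self.persisted.is_some()
        }
    
        pub fn persisted_len(&self) -> usize {
            self.persisted.as_ref().map_or(0, Vec::len)
        }
    }
    ```
    
    File-backed operations such as `/fork` would check `is_materialized()`
    and report "not ready yet" when it returns false.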
  • app-server: treat null mode developer instructions as built-in defaults (#10983)
    ## Summary
    - make `turn/start` normalize
    `collaborationMode.settings.developer_instructions: null` to the
    built-in instructions for the selected mode
    - prevent app-server clients from accidentally clearing mode-switch
    developer instructions by sending `null`
    - document this behavior in the v2 protocol and app-server docs
    
    ## What changed
    - `codex-rs/app-server/src/codex_message_processor.rs`
      - added a small `normalize_turn_start_collaboration_mode` helper
      - in `turn_start`, apply normalization before `OverrideTurnContext`
    - `codex-rs/app-server/tests/suite/v2/turn_start.rs`
      - extended `turn_start_accepts_collaboration_mode_override_v2` to assert
      the outgoing request includes default-mode instruction text when the
      client sends `developer_instructions: null`
    - `codex-rs/app-server-protocol/src/protocol/v2.rs`
      - clarified `TurnStartParams.collaboration_mode` docs:
      `settings.developer_instructions: null` means use built-in mode
      instructions
    - regenerated schema fixture:
      - `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts`
    - docs:
      - `codex-rs/app-server/README.md`
      - `codex-rs/docs/codex_mcp_interface.md`
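    
    The normalization amounts to "`null` means use the built-in text". A
    minimal sketch, assuming a two-variant `Mode` enum and placeholder
    instruction strings (both hypothetical, not the real helper or the real
    instructions):
    
    ```rust
    /// Hypothetical collaboration modes for illustration.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Mode {
        Default,
        Plan,
    }
    
    /// Placeholder text; the real built-in instructions live in Codex.
    pub fn builtin_instructions(mode: Mode) -> &'static str {
        match mode {
            Mode::Default => "built-in default-mode instructions",
            Mode::Plan => "built-in plan-mode instructions",
        }
    }
    
    /// A missing (`null`) value falls back to the built-in text for the mode;
    /// an explicit value is passed through unchanged.
    pub fn normalize_developer_instructions(mode: Mode, provided: Option<String>) -> String {
        provided.unwrap_or_else(|| builtin_instructions(mode).to_string())
    }
    ```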
  • feat(app-server): turn/steer API (#10821)
    This PR adds a dedicated `turn/steer` API for appending user input to an
    in-flight turn.
    
    ## Motivation
    Currently, steering in the app is implemented by just calling
    `turn/start` while a turn is running. This has some really weird quirks:
    - Client gets back a new `turn.id`, even though streamed
    events/approvals remained tied to the original active turn ID.
    - All the various turn-level override params on `turn/start` do not
    apply to the "steer", and would only apply to the next real turn.
    - There can also be a race condition where the client thinks the turn is
    active but the server has already completed it, so there might be bugs
    if the client has baked in some client-specific behavior thinking it's a
    steer when in fact the server kicked off a new turn. This is
    particularly possible when running a client against a remote app-server.
    
    Having a dedicated `turn/steer` API eliminates all those quirks.
    
    `turn/steer` behavior:
    - Requires an active turn on threadId. Returns a JSON-RPC error if there
    is no active turn.
    - If expectedTurnId is provided, it must match the active turn (more
    useful when connecting to a remote app-server).
    - Does not emit `turn/started`.
    - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
    `outputSchema` to accurately reflect that these are not applied when
    steering.
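    
    The first two rules can be sketched as a validation step. This is an
    illustration only: `SteerError` and `validate_steer` are hypothetical
    names, and the real server reports failures as JSON-RPC errors rather
    than a Rust enum.
    
    ```rust
    /// Hypothetical error type for the sketch.
    #[derive(Debug, PartialEq, Eq)]
    pub enum SteerError {
        NoActiveTurn,
        TurnMismatch { expected: String, active: String },
    }
    
    /// Returns the active turn id to steer, or an error if there is no active
    /// turn or the caller's `expected_turn_id` does not match it.
    pub fn validate_steer<'a>(
        active_turn: Option<&'a str>,
        expected_turn_id: Option<&str>,
    ) -> Result<&'a str, SteerError> {
        let active = active_turn.ok_or(SteerError::NoActiveTurn)?;
        match expected_turn_id {
            Some(expected) if expected != active => Err(SteerError::TurnMismatch {
                expected: expected.to_string(),
                active: active.to_string(),
            }),
            _ => Ok(active),
        }
    }
    ```
    
    Passing `expected_turn_id` lets a remote client detect the race where
    the turn it meant to steer has already completed.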
  • Add stage field for experimental flags. (#10793)
    - [x] Add stage field for experimental flags.
  • Sync app-server requirements API with refreshed cloud loader (#10815)
    configRequirements/read now returns updated cloud requirements after
    login.
  • Add app-server transport layer with websocket support (#10693)
    - Adds --listen <URL> to codex app-server with two listen modes:
      - stdio:// (default, existing behavior)
      - ws://IP:PORT (new websocket transport)
    - Refactors message routing to be connection-aware:
      - Tracks per-connection session state (initialize/experimental
      capability)
      - Routes responses/errors to the originating connection
      - Broadcasts server notifications/requests to initialized connections
    - Updates initialization semantics to be per connection (not
    process-global), and updates app-server docs accordingly.
    - Adds websocket accept/read/write handling (JSON-RPC per text frame,
    ping/pong handling, connection lifecycle events).
    
    Testing
    
    - Unit tests for transport URL parsing and targeted response/error
    routing.
    - New websocket integration test validating:
      - per-connection initialization requirements
      - no cross-connection response leakage
      - same request IDs on different connections route independently.
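    
    Parsing the two listen modes could look roughly like this. It is a
    sketch under assumptions: `Transport` and `parse_listen_url` are
    hypothetical names, and the real parser may accept or reject inputs
    differently (for example hostnames, which `SocketAddr` does not parse).
    
    ```rust
    use std::net::SocketAddr;
    
    /// Hypothetical representation of the two listen modes.
    #[derive(Debug, PartialEq, Eq)]
    pub enum Transport {
        Stdio,
        WebSocket(SocketAddr),
    }
    
    /// Accepts `stdio://` or `ws://IP:PORT`; anything else is an error.
    pub fn parse_listen_url(url: &str) -> Result<Transport, String> {
        if url == "stdio://" {
            return Ok(Transport::Stdio);
        }
        if let Some(rest) = url.strip_prefix("ws://") {
            // `SocketAddr` requires a literal IP address and a port.
            return rest
                .parse::<SocketAddr>()
                .map(Transport::WebSocket)
                .map_err(|e| format!("invalid ws address {rest:?}: {e}"));
        }
        Err(format!("unsupported listen URL: {url}"))
    }
    ```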
  • [app-server] Add a method to list experimental features. (#10721)
    - [x] Add a method to list experimental features.
  • fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423)
    So that the rest of the codebase (like the TUI) doesn't need to be
    concerned with whether ChatGPT auth was handled by Codex itself or passed
    in via app-server's external auth mode.
  • Leverage state DB metadata for thread summaries (#10621)
    Summary:
    - read conversation summaries and cwd info from the state DB when
    possible so we no longer rely on rollout files for metadata and avoid
    extra I/O
    - persist CLI version in thread metadata, surface it through summary
    builders, and add the necessary DB migration hooks
    - simplify thread listing by using enriched state DB data directly
    rather than reading rollout heads
    
    Testing:
    - Not run (not requested)
  • Reload cloud requirements after user login (#10725)
    Reload cloud requirements after user login so that they take effect
    immediately.
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    current Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, in particular explicit writable roots (with the option to make
    some directories read-only) and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
      - Added internal retry without /proc when /proc mount is not permitted
      by the host/container.
  • Add thread/compact v2 (#10445)
    - add `thread/compact` as a trigger-only v2 RPC that submits
    `Op::Compact` and returns `{}` immediately.
    - add v2 compaction e2e coverage for success and invalid/unknown thread
    ids, and update protocol schemas/docs.
  • feat: add APIs to list and download public remote skills (#10448)
    Add API to list / download from remote public skills
  • Inject CODEX_THREAD_ID into the terminal environment (#10096)
    Inject CODEX_THREAD_ID (when applicable) into the terminal environment
    so that the agent (and skills) can refer to the current thread / session
    ID.
    
    Discussion:
    https://openai.slack.com/archives/C095U48JNL9/p1769542492067109
  • feat: experimental flags (#10231)
    ## Problem being solved
    - We need a single, reliable way to mark app-server API surface as
    experimental so that:
      1. the runtime can reject experimental usage unless the client opts in
      2. generated TS/JSON schemas can exclude experimental methods/fields for
      stable clients.
    
    Right now that’s easy to drift or miss when done ad-hoc.
    
    ## How to declare experimental methods and fields
    - **Experimental method**: add `#[experimental("method/name")]` to the
    `ClientRequest` variant in `client_request_definitions!`.
    - **Experimental field**: on the params struct, derive `ExperimentalApi`
    and annotate the field with `#[experimental("method/name.field")]` + set
    `inspect_params: true` for the method variant so
    `ClientRequest::experimental_reason()` inspects params for experimental
    fields.
    
    ## How the macro solves it
    - The new derive macro lives in
    `codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
    `#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
    attributes.
    - **Structs**:
      - Generates `ExperimentalApi::experimental_reason(&self)` that checks
      only annotated fields.
      - The “presence” check is type-aware:
        - `Option<T>`: `is_some_and(...)` recursively checks inner.
        - `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
        - `bool`: must be `true`.
        - Other types: considered present (returns `true`).
      - Registers each experimental field in an `inventory` with `(type_name,
      serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
      that type. Field names are converted from `snake_case` to `camelCase`
      for schema/TS filtering.
    - **Enums**:
      - Generates an exhaustive `match` returning `Some(reason)` for annotated
      variants and `None` otherwise (no wildcard arm).
    - **Wiring**:
      - Runtime gating uses `ExperimentalApi::experimental_reason()` in
      `codex-rs/app-server/src/message_processor.rs` to reject requests unless
      `InitializeParams.capabilities.experimental_api == true`.
      - Schema/TS export filters use the inventory list and
      `EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
      strip experimental methods/fields when `experimental_api` is false.
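    
    To make the type-aware presence check and the field-name conversion
    concrete, here is a hand-written sketch of what the generated code does
    conceptually. The `Presence` trait and `snake_to_camel` are hypothetical
    names; the derive macro does not necessarily emit a trait like this.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical trait modelling the "is this field actually used?" check.
    pub trait Presence {
        fn is_present(&self) -> bool;
    }
    
    impl Presence for bool {
        fn is_present(&self) -> bool {
            *self
        }
    }
    
    /// Stands in for "other types": always considered present.
    impl Presence for String {
        fn is_present(&self) -> bool {
            true
        }
    }
    
    impl<T> Presence for Vec<T> {
        fn is_present(&self) -> bool {
            !self.is_empty()
        }
    }
    
    impl<K, V> Presence for HashMap<K, V> {
        fn is_present(&self) -> bool {
            !self.is_empty()
        }
    }
    
    impl<T: Presence> Presence for Option<T> {
        fn is_present(&self) -> bool {
            // Recurse into the inner value, so `Some(false)` is not present.
            self.as_ref().is_some_and(|inner| inner.is_present())
        }
    }
    
    /// Converts `snake_case` to `camelCase` for schema/TS filtering.
    pub fn snake_to_camel(name: &str) -> String {
        let mut out = String::with_capacity(name.len());
        let mut upper_next = false;
        for ch in name.chars() {
            if ch == '_' {
                upper_next = true;
            } else if upper_next {
                out.extend(ch.to_uppercase());
                upper_next = false;
            } else {
                out.push(ch);
            }
        }
        out
    }
    ```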
  • Bump thread updated_at on unarchive to refresh sidebar ordering (#10280)
    ## Summary
    - Touch restored rollout files on `thread/unarchive` so `updatedAt`
    reflects the unarchive time.
    - Add a regression test to ensure unarchiving bumps `updated_at` from an
    old mtime.
    
    ## Notes
    This fixes the UX issue where unarchived old threads don’t reappear near
    the top of recent threads.
  • chore(config) Rename config setting to personality (#10314)
    ## Summary
    Let's make the setting name consistent with the SlashCommand!
    
    ## Testing
    - [x] Updated tests
  • Wire up cloud reqs in exec, app-server (#10241)
    We're fetching cloud requirements in TUI in
    https://github.com/openai/codex/pull/10167.
    
    This adds the same fetching in exec and app-server binaries also.
  • chore: rename ChatGpt -> Chatgpt in type names (#10244)
    When using ChatGPT in names of types, we should be consistent, so this
    renames some types with `ChatGpt` in the name to `Chatgpt`. From
    https://rust-lang.github.io/api-guidelines/naming.html:
    
    > In `UpperCamelCase`, acronyms and contractions of compound words count
    as one word: use `Uuid` rather than `UUID`, `Usize` rather than `USize`
    or `Stdin` rather than `StdIn`. In `snake_case`, acronyms and
    contractions are lower-cased: `is_xid_start`.
    
    This PR updates existing uses of `ChatGpt` and changes them to
    `Chatgpt`. Though in all cases where it could affect the wire format, I
    visually inspected that we don't change anything there. That said, this
    _will_ change the codegen because it will affect the spelling of type
    names.
    
    For example, this renames `AuthMode::ChatGPT` to `AuthMode::Chatgpt` in
    `app-server-protocol`, but the wire format is still `"chatgpt"`.
    
    This PR also updates a number of types in `codex-rs/core/src/auth.rs`.
  • feat: refactor CodexAuth so invalid state cannot be represented (#10208)
    Previously, `CodexAuth` was defined as follows:
    
    
    https://github.com/openai/codex/blob/d550fbf41afc09d7d7b5ac813aea38de07b2a73f/codex-rs/core/src/auth.rs#L39-L46
    
    But if you looked at its constructors, we had creation for
    `AuthMode::ApiKey` where `storage` was built using a nonsensical path
    (`PathBuf::new()`) and `auth_dot_json` was `None`:
    
    
    https://github.com/openai/codex/blob/d550fbf41afc09d7d7b5ac813aea38de07b2a73f/codex-rs/core/src/auth.rs#L212-L220
    
    By comparison, when `AuthMode::ChatGPT` was used, `api_key` was always
    `None`:
    
    
    https://github.com/openai/codex/blob/d550fbf41afc09d7d7b5ac813aea38de07b2a73f/codex-rs/core/src/auth.rs#L665-L671
    
    https://github.com/openai/codex/pull/10012 took things further because
    it introduced a new `ChatgptAuthTokens` variant to `AuthMode`, which is
    important when invoking `account/login/start` via the app server, but
    most logic _internal_ to the app server should just reason about two
    `AuthMode` variants: `ApiKey` and `ChatGPT`.
    
    This PR tries to clean things up as follows:
    
    - `LoginAccountParams` and `AuthMode` in `codex-rs/app-server-protocol/`
    both continue to have the `ChatgptAuthTokens` variant, though it is used
    exclusively for the on-the-wire messaging.
    - `codex-rs/core/src/auth.rs` now has its own `AuthMode` enum, which
    only has two variants: `ApiKey` and `ChatGPT`.
    - `CodexAuth` has been changed from a struct to an enum. It is a
    disjoint union where each variant (`ApiKey`, `ChatGpt`, and
    `ChatGptAuthTokens`) has only the associated fields that make sense for
    that variant.
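    
    As a rough picture of the resulting shape: only the variant names below
    come from this description. The fields (`api_key`, `storage`,
    `access_token`) and the `mode()`/`api_key()` helpers are hypothetical
    and are not copied from `auth.rs`.
    
    ```rust
    use std::path::PathBuf;
    
    /// Core-side auth mode with only two variants.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum AuthMode {
        ApiKey,
        ChatGPT,
    }
    
    /// Each variant carries only the data that makes sense for it, so an
    /// API-key login can no longer hold a dummy storage path.
    /// Field names are assumptions for this sketch.
    #[derive(Debug, Clone, PartialEq)]
    pub enum CodexAuth {
        ApiKey { api_key: String },
        ChatGpt { storage: PathBuf },
        ChatGptAuthTokens { access_token: String },
    }
    
    impl CodexAuth {
        /// Internal logic only needs to distinguish API key from ChatGPT auth.
        pub fn mode(&self) -> AuthMode {
            match self {
                CodexAuth::ApiKey { .. } => AuthMode::ApiKey,
                CodexAuth::ChatGpt { .. } | CodexAuth::ChatGptAuthTokens { .. } => {
                    AuthMode::ChatGPT
                }
            }
        }
    
        /// Only the `ApiKey` variant has a key to return.
        pub fn api_key(&self) -> Option<&str> {
            match self {
                CodexAuth::ApiKey { api_key } => Some(api_key.as_str()),
                _ => None,
            }
        }
    }
    ```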
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10208).
    * #10224
    * __->__ #10208
  • Conversation naming (#8991)
    Session renaming:
    - `/rename my_session`
    - `/rename` without arg and passing an argument in `customViewPrompt`
    - AppExitInfo shows resume hint using the session name if set instead of
    uuid, defaults to uuid if not set
    - Names are stored in `CODEX_HOME/sessions.jsonl`
    
    Session resuming:
    - `codex resume <name>` looks up the first entry in
    `CODEX_HOME/sessions.jsonl` matching the name and resumes that session
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • feat(app-server): support external auth mode (#10012)
    This enables a new use case where `codex app-server` is embedded into a
    parent application that will directly own the user's ChatGPT auth
    lifecycle, which means it owns the user’s auth tokens and refreshes them
    when necessary. The parent application would just want a way to pass in
    the auth tokens for codex to use directly.
    
    The idea is that we are introducing a new "auth mode" currently only
    exposed via app server: **`chatgptAuthTokens`**, which consists of the
    `id_token` (stores account metadata) and `access_token` (the bearer
    token used directly for backend API calls). These auth tokens are only
    stored in-memory. This new mode is in addition to the existing `apiKey`
    and `chatgpt` auth modes.
    
    This PR reuses the shape of our existing app-server account APIs as much
    as possible:
    - Update `account/login/start` with a new `chatgptAuthTokens` variant,
    which will allow the client to pass in the tokens and have codex
    app-server use them directly. Upon success, the server emits
    `account/login/completed` and `account/updated` notifications.
    - A new server->client request called
    `account/chatgptAuthTokens/refresh` which the server can use whenever
    the access token previously passed in has expired and it needs a new one
    from the parent application.
    
    I leveraged the core 401 retry loop which typically triggers auth token
    refreshes automatically, but made it pluggable:
    - **chatgpt** mode refreshes internally, as usual.
    - **chatgptAuthTokens** mode calls the client via
    `account/chatgptAuthTokens/refresh`, the client responds with updated
    tokens, codex updates its in-memory auth, then retries. This RPC has a
    10s timeout and handles JSON-RPC errors from the client.
    
    Also some additional things:
    - chatgpt logins are blocked while external auth is active (you have to
    log out first; typically clients will pick one or the other, not support
    both)
    - `account/logout` clears external auth in memory
    - Ensures that if `forced_chatgpt_workspace_id` is set via the user's
    config, we respect it in both:
      - `account/login/start` with `chatgptAuthTokens` (returns a JSON-RPC
      error back to the client)
      - `account/chatgptAuthTokens/refresh` (fails the turn, and on next
      request app-server will send another `account/chatgptAuthTokens/refresh`
      request to the client).
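    
    The pluggable 401 retry can be pictured as follows. This is a
    synchronous sketch under assumptions: `TokenRefresher` and
    `call_with_refresh` are hypothetical names, the request is modelled as a
    closure returning an HTTP status, and the 10s RPC timeout is not
    modelled.
    
    ```rust
    /// Hypothetical hook: internal refresh for `chatgpt` mode, or a round
    /// trip to the client for `chatgptAuthTokens` mode.
    pub trait TokenRefresher {
        fn refresh(&mut self) -> Result<String, String>;
    }
    
    /// Sends a request; on a 401, refreshes the token once and retries.
    pub fn call_with_refresh<R: TokenRefresher>(
        token: &mut String,
        refresher: &mut R,
        mut send: impl FnMut(&str) -> u16,
    ) -> Result<u16, String> {
        let status = send(token.as_str());
        if status != 401 {
            return Ok(status);
        }
        // Update the in-memory token, then retry with the new value.
        *token = refresher.refresh()?;
        Ok(send(token.as_str()))
    }
    ```
    
    If `refresh` returns an error (for example a JSON-RPC error from the
    client), the call fails instead of retrying.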
  • [connectors] Support connectors part 2 - slash command and tui (#9728)
    - [x] Support `/apps` slash command to browse the apps in tui.
    - [x] Support inserting apps to prompt using `$`.
    - [x] Lots of simplification/renaming from connectors to apps.
  • feat: sqlite 1 (#10004)
    Add a `.sqlite` database to be used to store rollout metadata (and
    later logs)
    This PR is phase 1:
    * Add the database and the required infrastructure
    * Add a backfill of the database
    * Persist the newly created rollout both in files and in the DB
    * When we need to get metadata or a rollout, consider the `JSONL` as the
    source of truth but compare the results with the DB and show any errors
  • [skills] Auto install MCP dependencies when running skills with dependency specs. (#9982)
    Auto install MCP dependencies when running skills with dependency specs.
  • remove sandbox globals. (#9797)
    Threads sandbox updates through OverrideTurnContext for active turn
    Passes computed sandbox type into safety/exec
  • Add MCP server scopes config and use it as fallback for OAuth login (#9647)
    ### Motivation
    - Allow MCP OAuth flows to request scopes defined in `config.toml`
    instead of requiring users to always pass `--scopes` on the CLI.
    CLI/remote parameters should still override config values.
    
    ### Description
    - Add optional `scopes: Option<Vec<String>>` to `McpServerConfig` and
    `RawMcpServerConfig`, and propagate it through deserialization and the
    built config types.
    - Serialize `scopes` into the MCP server TOML via
    `serialize_mcp_server_table` in `core/src/config/edit.rs` and include
    `scopes` in the generated config schema (`core/config.schema.json`).
    - CLI: update `codex-rs/cli/src/mcp_cmd.rs` `run_login` to fall back to
    `server.scopes` when the `--scopes` flag is empty, with explicit CLI
    scopes still taking precedence.
    - App server: update
    `codex-rs/app-server/src/codex_message_processor.rs`
    `mcp_server_oauth_login` to use `params.scopes.or_else(||
    server.scopes.clone())` so the RPC path also respects configured scopes.
    - Update many test fixtures to initialize the new `scopes` field (set to
    `None`) so test code builds with the new struct field.
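    
    The precedence rule (explicit scopes first, configured scopes as a
    fallback) is small enough to show in isolation. The function names here
    are hypothetical; only the `or_else` expression mirrors the one quoted
    above.
    
    ```rust
    /// RPC path: explicit request scopes win, otherwise use the configured ones.
    pub fn resolve_scopes(
        request: Option<Vec<String>>,
        configured: &Option<Vec<String>>,
    ) -> Option<Vec<String>> {
        request.or_else(|| configured.clone())
    }
    
    /// CLI path: an empty `--scopes` flag falls back to the configured scopes.
    pub fn resolve_cli_scopes(flag: Vec<String>, configured: &Option<Vec<String>>) -> Vec<String> {
        if flag.is_empty() {
            configured.clone().unwrap_or_default()
        } else {
            flag
        }
    }
    ```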
    
    ### Testing
    - Ran config tooling and formatters: `just write-config-schema`
    (succeeded), `just fmt` (succeeded), and `just fix -p codex-core`, `just
    fix -p codex-cli`, `just fix -p codex-app-server` (succeeded where
    applicable).
    - Ran unit tests for the CLI: `cargo test -p codex-cli` (passed).
    - Ran unit tests for core: `cargo test -p codex-core` (ran; many tests
    passed but several failed, including model refresh/403-related tests,
    shell snapshot/timeouts, and several `unified_exec` expectations).
    - Ran app-server tests: `cargo test -p codex-app-server` (ran; many
    integration-suite tests failed due to mocked/remote HTTP 401/403
    responses and wiremock expectations).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69718f505914832ea1f334b3ba064553)
  • Add thread/unarchive to restore archived rollouts (#9843)
    ## Summary
    - Adds a new `thread/unarchive` RPC to move archived thread rollouts
    back into the active `sessions/` tree.
    
    ## What changed
    - **Protocol**
      - Adds `thread/unarchive` request/response types and wiring.
    - **Server**
      - Implements `thread_unarchive` in the app server.
      - Validates the archived rollout path and thread ID.
      - Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout
      filename timestamp.
    - **Core**
      - Adds `find_archived_thread_path_by_id_str` helper for archived
      rollouts.
    - **Docs**
      - Documents the new RPC and usage example.
    - **Tests**
      - Adds an end-to-end server test that:
        1) starts a thread,
        2) archives it,
        3) unarchives it,
        4) asserts the file is restored to `sessions/`.
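    
    Deriving the restore location from the filename could look like the
    sketch below. It assumes rollout filenames start with
    `rollout-YYYY-MM-DD`, which is an assumption for illustration, and
    `restored_path` is a hypothetical helper, not the server's actual code.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Maps `rollout-YYYY-MM-DD...` to `<sessions_root>/YYYY/MM/DD/<file_name>`.
    /// Returns `None` when the filename does not carry a date in that position.
    pub fn restored_path(sessions_root: &Path, file_name: &str) -> Option<PathBuf> {
        let rest = file_name.strip_prefix("rollout-")?;
        let date = rest.get(..10)?; // "YYYY-MM-DD"
        let mut parts = date.split('-');
        let (year, month, day) = (parts.next()?, parts.next()?, parts.next()?);
        let all_digits =
            |s: &str, len: usize| s.len() == len && s.bytes().all(|b| b.is_ascii_digit());
        if !(all_digits(year, 4) && all_digits(month, 2) && all_digits(day, 2)) {
            return None;
        }
        Some(sessions_root.join(year).join(month).join(day).join(file_name))
    }
    ```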
    
    ## How to use
    ```json
    { "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } }
    ```
    
    ## Author Codex Session
    
    `codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this
    kind of session UUID forkable by anyone with the right
    `session_object_storage_url` line in their config, but for now just
    pasting it here for my reference)
  • feat: dynamic tools injection (#9539)
    ## Summary
    Add dynamic tool injection to thread startup in API v2, wire dynamic
    tool calls through the app server to clients, and plumb responses back
    into the model tool pipeline.
    
    ### Flow (high level)
    - Thread start injects `dynamic_tools` into the model tool list for that
    thread (validation is done here).
    - When the model emits a tool call for one of those names, core raises a
    `DynamicToolCallRequest` event.
    - The app server forwards it to the client as `item/tool/call`, waits
    for the client’s response, then submits a `DynamicToolResponse` back to
    core.
    - Core turns that into a `function_call_output` in the next model
    request so the model can continue.
    
    ### What changed
    - Added dynamic tool specs to v2 thread start params and protocol types;
    introduced `item/tool/call` (request/response) for dynamic tool
    execution.
    - Core now registers dynamic tool specs at request time and routes those
    calls via a new dynamic tool handler.
    - App server validates tool names/schemas, forwards dynamic tool call
    requests to clients, and publishes tool outputs back into the session.
    - Integration tests
  • feat: ephemeral threads (#9765)
    Add ephemeral threads capabilities. Only exposed through the
    `app-server` v2
    
    The idea is to disable the rollout recorder for those threads.
  • feat(app-server) Expose personality (#9674)
    ### Motivation
    Exposes a per-thread / per-turn `personality` override in the v2
    app-server API so clients can influence model communication style at
    thread/turn start. Ensures the override is passed into the session
    configuration resolution so it becomes effective for subsequent turns
    and headless runners.
    
    ### Testing
    - [x] Add an integration-style test
    `turn_start_accepts_personality_override_v2` in
    `codex-rs/app-server/tests/suite/v2/turn_start.rs` that verifies a
    `/personality` override results in a developer update message containing
    `<personality_spec>` in the outbound model request.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_6971d646b1c08322a689a54d2649f3fe)
  • [connectors] Support connectors part 1 - App server & MCP (#9667)
    In order to make Codex work with connectors, we add a built-in gateway
    MCP that acts as a transparent proxy between the client and the
    connectors. The gateway MCP collects actions that are accessible to the
    user and sends them down to the user. When a connector action is chosen
    to be called, the client invokes the action through the gateway MCP as
    well.
    
     - [x] Add the system built-in gateway MCP to list and run connectors.
     - [x] Add the app server methods and protocol
  • feat(core) update Personality on turn (#9644)
    ## Summary
    Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
    is explicitly unused by the clients so far, to simplify PRs - app-server
    and tui implementations will be follow-ups.
    
    ## Testing
    - [x] added integration tests
  • Add layered config.toml support to app server (#9510)
    This PR adds support for chained (layered) config.toml file merging for
    clients that use the app server interface. This feature already exists
    for the TUI, but it does not work for GUI clients.
    
    It does the following:
    * Changes code paths for new thread, resume thread, and fork thread to
    use the effective config based on the cwd.
    * Updates the `config/read` API to accept an optional `cwd` parameter.
    If specified, the API returns the effective config based on that cwd
    path. Also optionally includes all layers including project config
    files. If cwd is not specified, the API falls back on its older behavior
    where it considers only the global (non-project) config files when
    computing the effective config.
    
    The changes in codex_message_processor.rs look deceptively large. They
    mostly just involve moving existing blocks of code to a later point in
    some functions so it can use the cwd to calculate the config.
    
    This PR builds upon #9509 and should be reviewed and merged after that
    PR.
    
    Tested:
    * Verified change with (dependent, as-yet-uncommitted) changes to IDE
    Extension and confirmed correct behavior
    
    The full fix requires additional changes in the IDE Extension code base,
    but they depend on this PR.
  • chore(instructions) Remove unread SessionMeta.instructions field (#9423)
    ### Description
    - Remove the now-unused `instructions` field from the session metadata
    to simplify SessionMeta and stop propagating transient instruction text
    through the rollout recorder API. This was only saving
    user_instructions, and was never being read.
    - Stop passing user instructions into the rollout writer at session
    creation so the rollout header only contains canonical session metadata.
    
    ### Testing
    
    - Ran `just fmt` which completed successfully.
    - Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
    -p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
    codex-tui2` which completed (Clippy fixes applied) as part of
    verification.
    - Ran `cargo test -p codex-protocol` which passed (28 tests).
    - Ran `cargo test -p codex-core` which showed failures in a small set of
    tests (not caused by the protocol type change directly):
    `default_client::tests::test_create_client_sets_default_headers`,
    several `models_manager::manager::tests::refresh_available_models_*`,
    and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
    tests failed in this CI run).
    - Ran `cargo test -p codex-app-server` which reported several failing
    integration tests (including
    `suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
    `suite::output_schema::send_user_turn_*`, and
    `suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
    - `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
    attempted but aborted due to disk space exhaustion (`No space left on
    device`).
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
  • Expose collaboration presets (#9421)
    Expose collaboration presets for clients
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • Support enable/disable skill via config/api. (#9328)
    In config.toml:
    ```
    [[skills.config]]
    path = "/Users/xl/.codex/skills/my_skill/SKILL.md"
    enabled = false
    ```
    
    API:
    skills/list, skills/config/write
  • feat(app-server, core): return threads by created_at or updated_at (#9247)
    Add support for returning threads by either `created_at` OR `updated_at`
    descending. Previously core always returned threads ordered by
    `created_at`.
    
    This PR:
    - updates core to be able to list threads by `updated_at` OR
    `created_at` descending based on what the caller wants
    - also update `thread/list` in app-server to expose this (default to
    `created_at` if not specified)
    
    All existing codepaths (app-server, TUI) still default to `created_at`,
    so no behavior change is expected with this PR.
    
    **Implementation**
    Sorting by `updated_at` is nontrivial (whereas `created_at` is easy due
    to the way we structure the folders and filenames on disk, which are all
    based on `created_at`).
    
    The most naive way to do this without introducing a cache file or sqlite
    DB (which we have to implement/maintain) is to scan files in reverse
    `created_at` order on disk, and look at the file's mtime (last modified
    timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
    (currently set to 10,000). Then, we can return the most recent N
    threads.
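    
    The scan-then-sort approach can be sketched over in-memory data. This is
    an illustration, not the core implementation: `Rollout` and
    `list_by_updated_at` are hypothetical, timestamps are plain integers,
    and the input vector stands in for the files on disk.
    
    ```rust
    /// Hypothetical stand-in for a rollout file and its timestamps.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Rollout {
        pub id: String,
        pub created_at: u64,
        /// Stands in for the file's mtime.
        pub updated_at: u64,
    }
    
    /// Scans at most `max_scan` rollouts in reverse `created_at` order, then
    /// returns the `limit` most recently updated among those scanned.
    pub fn list_by_updated_at(
        mut rollouts: Vec<Rollout>,
        max_scan: usize,
        limit: usize,
    ) -> Vec<Rollout> {
        // Newest-created first mirrors walking the date-based tree in reverse.
        rollouts.sort_by(|a, b| b.created_at.cmp(&a.created_at));
        rollouts.truncate(max_scan);
        rollouts.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
        rollouts.truncate(limit);
        rollouts
    }
    ```
    
    Anything created before the scan window is never considered, even if it
    was updated recently.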
    
    Based on some quick and dirty benchmarking on my machine with ~1000
    rollout files, calling `thread/list` with limit 50, the `updated_at`
    path is slower as expected due to all the I/O:
    - updated-at: average 103.10 ms
    - created-at: average 41.10 ms
    
    Those absolute numbers aren't a big deal IMO, but we can certainly
    optimize this in a followup if needed by introducing more state stored
    on disk.
    
    **Caveat**
    There's also a limitation in that any files beyond the first
    `MAX_SCAN_FILES` scanned will be excluded, which means that if a user
    continues a REALLY old thread, it may not be included. In practice that
    should not be too big of an issue.
    
    If a user makes...
    - 1000 rollouts/day → threads older than 10 days won't show up
    - 100 rollouts/day → ~100 days
    
    If this becomes a problem for some reason, even more motivation to
    implement an updated_at cache.
  • Add text element metadata to protocol, app server, and core (#9331)
    The second part of breaking up PR
    https://github.com/openai/codex/pull/9116
    
    Summary:
    
    - Add `TextElement` / `ByteRange` to protocol user inputs and user
    message events with defaults.
    - Thread `text_elements` through app-server v1/v2 request handling and
    history rebuild.
    - Preserve UI metadata only in user input/events (not `ContentItem`)
    while keeping local image attachments in user events for rehydration.
    
    Details:
    
    - Protocol: `UserInput::Text` carries `text_elements`;
    `UserMessageEvent` carries `text_elements` + `local_images`.
    Serialization includes empty vectors for backward compatibility.
    - app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in
    camelCase with conversions; v2 uses its own camelCase wrapper.
    - app-server: v1/v2 input mapping includes `text_elements`; thread
    history rebuilds include them.
    - Core: user event emission preserves UI metadata while model history
    stays clean; history replay round-trips the metadata.
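    
    As a rough illustration of how a byte range annotates user text: the
    struct layout below is an assumption (the real `TextElement` may carry
    more fields), and `element_text` is a hypothetical helper.
    
    ```rust
    /// Half-open byte range `[start, end)` into the user's text.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct ByteRange {
        pub start: usize,
        pub end: usize,
    }
    
    /// Hypothetical minimal element: just the range it covers.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct TextElement {
        pub byte_range: ByteRange,
    }
    
    /// Returns the slice an element refers to, or `None` if the range is out
    /// of bounds or does not fall on UTF-8 character boundaries.
    pub fn element_text<'a>(text: &'a str, element: &TextElement) -> Option<&'a str> {
        text.get(element.byte_range.start..element.byte_range.end)
    }
    ```
    
    Using `str::get` rather than indexing avoids a panic when stored
    metadata no longer matches the text.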