Commit Graph

227 Commits

  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
    - `codex_protocol::protocol` and related protocol/model types (including
    `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
    - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
    parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • ignore v1 in JSON schema codegen (#12408)
    ## Why
    
    The generated unnamespaced JSON envelope schemas (`ClientRequest` and
    `ServerNotification`) still contained both v1 and v2 variants, which
    pulled legacy v1/core types and v2 types into the same `definitions`
    graph. That caused `schemars` to produce numeric suffix names (for
    example `AskForApproval2`, `ByteRange2`, `MessagePhase2`).
    
    This PR moves JSON codegen toward v2-only output while preserving the
    unnamespaced envelope artifacts, and avoids reintroducing numeric-suffix
    tolerance by removing the v1/internal-only variants that caused the
    collisions in those envelope schemas.
    
    ## What Changed
    
    - In `codex-rs/app-server-protocol/src/export.rs`, JSON generation now
    excludes v1 schema artifacts (`v1/*`) while continuing to emit
    unnamespaced/root JSON schemas and the JSON bundle.
    - Added a narrow JSON v1 allowlist (`JSON_V1_ALLOWLIST`) so
    `InitializeParams` and `InitializeResponse` are still emitted.
    - Added JSON-only post-processing for the mixed envelope schemas before
    collision checks run:
    - `ClientRequest`: strips v1 request variants from the generated `oneOf`
    using the temporary `V1_CLIENT_REQUEST_METHODS` list
    - `ServerNotification`: strips v1 notifications plus the internal-only
    `rawResponseItem/completed` notification using the temporary
    `EXCLUDED_SERVER_NOTIFICATION_METHODS_FOR_JSON` list
    - Added a temporary local-definition pruning pass for those envelope
    schemas so now-unreferenced v1/core definitions are removed from
    `definitions` after method filtering.
    - Updated the variant-title naming heuristic for single-property literal
    object variants to use the literal value (when available), avoiding
    collisions like multiple `state`-only variants all deriving the same
    title.
    - Collision handling remains fail-fast (no numeric suffix fallback map
    in this PR path).
    
    ## Verification
    
    - `just write-app-server-schema`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12408).
    * __->__ #12408
    * #12406
  • feat: use OAI Responses API MessagePhase type directly in App Server v2 (#12422)
    https://github.com/openai/codex/pull/10455 introduced the `phase` field,
    and then https://github.com/openai/codex/pull/12072 introduced a
    `MessagePhase` type in `v2.rs` that paralleled the `MessagePhase` type
    in `codex-rs/protocol/src/models.rs`.
    
    The app server protocol prefers `camelCase` while the Responses API uses
    `snake_case`, so this meant we had two versions of `MessagePhase` with
    different serialization rules. When the app server protocol refers to
    types from the Responses API, we use the wire format of the the
    Responses API even though it is inconsistent with the app server API.
    
    This PR deletes `MessagePhase` from `v2.rs` and consolidates on the
    Responses API version to eliminate confusion.
  • Add field to Thread object for the latest rename set for a given thread (#12301)
    Exposes through the app server updated names set for a thread. This
    enables other surfaces to use the core as the source of truth for thread
    naming. `threadName` is gathered using the helper functions used to
    interact with `session_index.jsonl`, and is hydrated in:
    - `thread/list`
    - `thread/read`
    - `thread/resume`
    - `thread/unarchive`
    - `thread/rollback`
    
    We don't do this for `thread/start` and `thread/fork`.
  • fix: explicitly list name collisions in JSON schema generation (#12406)
    ## Why
    
    JSON schema codegen was silently resolving naming collisions by
    appending numeric suffixes (for example `...2`, `...3`). That makes the
    generated schema names unstable: removing an earlier colliding type can
    cause a later type to be renumbered, which is a breaking change for
    consumers that referenced the old generated name.
    
    This PR makes those collisions explicit and reviewable.
    
    Though note that once we remove `v1` from the codegen, we will no longer
    support naming collisions. Or rather, naming collisions will have to be
    handled explicitly rather than the numeric suffix approach.
    
    ## What Changed
    
    - In `codex-rs/app-server-protocol/src/export.rs`, replaced implicit
    numeric suffix collision handling for generated variant titles with
    explicit special-case maps.
    - Added a panic when a collision occurs without an entry in the map, so
    new collisions fail loudly instead of silently renaming generated schema
    types.
    - Added the currently required special cases so existing generated names
    remain stable.
    - Extended the same approach to numbered `definitions` / `$defs`
    collisions (for example `MessagePhase2`-style names) so those are also
    explicitly tracked.
    
    ## Verification
    
    - Ran targeted generator-path test:
    - `cargo test -p codex-app-server-protocol
    generate_json_filters_experimental_fields_and_methods -- --nocapture`
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12406).
    * #12408
    * __->__ #12406
  • Add ability to attach extra files to feedback (#12370)
    Allow clients to provide extra files.
  • [apps] Implement apps configs. (#12086)
    - [x] Implement apps configs.
  • fix(network-proxy): add unix socket allow-all and update seatbelt rules (#11368)
    ## Summary
    Adds support for a Unix socket escape hatch so we can bypass socket
    allowlisting when explicitly enabled.
    
    ## Description
    * added a new flag, `network.dangerously_allow_all_unix_sockets` as an
    explicit escape hatch
    * In codex-network-proxy, enabling that flag now allows any absolute
    Unix socket path from x-unix-socket instead of requiring each path to be
    explicitly allowlisted. Relative paths are still rejected.
    * updated the macOS seatbelt path in core so it enforces the same Unix
    socket behavior:
      * allowlisted sockets generate explicit network* subpath rules
      * allow-all generates a broad network* (subpath "/") rule
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • Refactor network approvals to host/protocol/port scope (#12140)
    ## Summary
    Simplify network approvals by removing per-attempt proxy correlation and
    moving to session-level approval dedupe keyed by (host, protocol, port).
    Instead of encoding attempt IDs into proxy credentials/URLs, we now
    treat approvals as a destination policy decision.
    
    - Concurrent calls to the same destination share one approval prompt.
    - Different destinations (or same host on different ports) get separate
    prompts.
    - Allow once approves the current queued request group only.
    - Allow for session caches that (host, protocol, port) and auto-allows
    future matching requests.
    - Never policy continues to deny without prompting.
    
    Example:
    - 3 calls: 
      - a.com (line 443)
      - b.com (line 443)
      - a.com (line 443)
    => 2 prompts total (a, b), second a waits on the first decision.
    - a.com:80 is treated separately from a.com line 443
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core tools::network_approval::tests`
    - `cargo test -p codex-core` (unit tests pass; existing
    integration-suite failures remain in this environment)
  • feat: cleaner TUI for sub-agents (#12327)
    <img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
    src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
    />
  • feat: add nick name to sub-agents (#12320)
    Adding random nick name to sub-agents. Used for UX
    
    At the same time, also storing and wiring the role of the sub-agent
  • app-server: improve thread resume rejoin flow (#11776)
    thread/resume response includes latest turn with all items, in band so
    no events are stale or lost
    
    Testing
    - e2e tested using app-server-test-client using flow described in
    "Testing Thread Rejoin Behavior" in
    codex-rs/app-server-test-client/README.md
    - e2e tested in codex desktop by reconnecting to a running turn
  • feat: add Reject approval policy with granular prompt rejection controls (#12087)
    ## Why
    
    We need a way to auto-reject specific approval prompt categories without
    switching all approvals off.
    
    The goal is to let users independently control:
    - sandbox escalation approvals,
    - execpolicy `prompt` rule approvals,
    - MCP elicitation prompts.
    
    ## What changed
    
    - Added a new primary approval mode in `protocol/src/protocol.rs`:
    
    ```rust
    pub enum AskForApproval {
        // ...
        Reject(RejectConfig),
        // ...
    }
    
    pub struct RejectConfig {
        pub sandbox_approval: bool,
        pub rules: bool,
        pub mcp_elicitations: bool,
    }
    ```
    
    - Wired `RejectConfig` semantics through approval paths in `core`:
      - `core/src/exec_policy.rs`
        - rejects rule-driven prompts when `rules = true`
        - rejects sandbox/escalation prompts when `sandbox_approval = true`
    - preserves rule priority when both rule and sandbox prompt conditions
    are present
      - `core/src/tools/sandboxing.rs`
    - applies `sandbox_approval` to default exec approval decisions and
    sandbox-failure retry gating
      - `core/src/safety.rs`
    - keeps `Reject { all false }` behavior aligned with `OnRequest` for
    patch safety
        - rejects out-of-root patch approvals when `sandbox_approval = true`
      - `core/src/mcp_connection_manager.rs`
        - auto-declines MCP elicitations when `mcp_elicitations = true`
    
    - Ensured approval policy used by MCP elicitation flow stays in sync
    with constrained session policy updates.
    
    - Updated app-server v2 conversions and generated schema/TypeScript
    artifacts for the new `Reject` shape.
    
    ## Verification
    
    Added focused unit coverage for the new behavior in:
    - `core/src/exec_policy.rs`
    - `core/src/tools/sandboxing.rs`
    - `core/src/mcp_connection_manager.rs`
    - `core/src/safety.rs`
    - `core/src/tools/runtimes/apply_patch.rs`
    
    Key cases covered include rule-vs-sandbox prompt precedence, MCP
    auto-decline behavior, and patch/sandbox retry behavior under
    `RejectConfig`.
  • app-server: expose loaded thread status via read/list and notifications (#11786)
    Motivation
    - Today, a newly connected client has no direct way to determine the
    current runtime status of threads from read/list responses alone.
    - This forces clients to infer state from transient events, which can
    lead to stale or inconsistent UI when reconnecting or attaching late.
    
    Changes
    - Add `status` to `thread/read` responses.
    - Add `statuses` to `thread/list` responses.
    - Emit `thread/status/changed` notifications with `threadId` and the new
    status.
    - Track runtime status for all loaded threads and default unknown
    threads to `idle`.
    - Update protocol/docs/tests/schema fixtures for the revised API.
    
    Testing
    - Validated protocol API changes with automated protocol tests and
    regenerated schema/type fixtures.
    - Validated app-server behavior with unit and integration test suites,
    including status transitions and notifications.
  • app-server support for Windows sandbox setup. (#12025)
    app-server support for initiating Windows sandbox setup.
    server responds quickly to setup request and makes a future RPC call
    back to client when the setup finishes.
    
    The TUI implementation is unaffected but in a future PR I'll update the
    TUI to use the shared setup helper
    (`windows_sandbox.run_windows_sandbox_setup`)
  • feat(core): plumb distinct approval ids for command approvals (#12051)
    zsh fork PR stack:
    - https://github.com/openai/codex/pull/12051 👈 
    - https://github.com/openai/codex/pull/12052
    
    With upcoming support for a fork of zsh that allows us to intercept
    `execve` and run execpolicy checks for each subcommand as part of a
    `CommandExecution`, it will be possible for there to be multiple
    approval requests for a shell command like `/path/to/zsh -lc 'git status
    && rg \"TODO\" src && make test'`.
    
    To support that, this PR introduces a new `approval_id` field across
    core, protocol, and app-server so that we can associate approvals
    properly for subcommands.
  • app-server: Emit thread archive/unarchive notifications (#12030)
    * Add v2 server notifications `thread/archived` and `thread/unarchived`
    with a `threadId` payload.
    * Wire new events into `thread/archive` and `thread/unarchive` success
    paths.
    * Update app-server protocol/schema/docs accordingly.
    
    Testing:
    - Updated archive/unarchive end-to-end tests to verify both
    notifications are emitted with the expected thread id payload.
  • [apps] Expose more fields from apps listing endpoints. (#11706)
    - [x] Expose app_metadata, branding, and labels in AppInfo.
  • Add remote skill scope/product_surface/enabled params and cleanup (#11801)
    skills/remote/list: params=hazelnutScope, productSurface, enabled;
    returns=data: { id, name, description }[]
    skills/remote/export: params=hazelnutId; returns={ id, path }
  • Feat: add model reroute notification (#12001)
    ### Summary
    Builiding off
    https://github.com/openai/codex/pull/11964/files/5c75aa7b89a70bc2cc410a6fd238749306ec4c5e#diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0,
    we have created a `model/rerouted` notification that captures the event
    so that consumers can render as expected. Keep the `EventMsg::Warning`
    path in core so that this does not affect TUI rendering.
    
    `model/rerouted` is meant to be generic to account for future usage
    including capacity planning etc.
  • fix: send unfiltered models over model/list (#11793)
    ### What
    to unblock filtering models in VSCE, change `model/list` app-server
    endpoint to send all models + visibility field `showInPicker` so
    filtering can be done in VSCE if desired.
    
    ### Tests
    Updated tests.
  • [app-server] add fuzzyFileSearch/sessionCompleted (#11773)
    this is to allow the client to know when to stop showing a spinner.
  • Add cwd as an optional field to thread/list (#11651)
    Add's the ability to filter app-server thread/list by cwd
  • [apps] Add is_enabled to app info. (#11417)
    - [x] Add is_enabled to app info and the response of `app/list`.
    - [x] Update TUI to have Enable/Disable button on the app detail page.
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_full_history` flag to preserve
    those events when you start/resume/fork a thread (defaults to `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
    
    And we also persist `EventMsg::Error` which we can now map back to the
    Turn's status and populates the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (similar for apply_patch). These
    EventMsgs were never persisted before so I made it a required field.
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • change model cap to server overload (#11388)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • feat: support multiple rate limits (#11260)
    Added multi-limit support end-to-end by carrying limit_name in
    rate-limit snapshots and handling multiple buckets instead of only
    codex.
    Extended /usage client parsing to consume additional_rate_limits
    Updated TUI /status and in-memory state to store/render per-limit
    snapshots
    Extended app-server rate-limit read response: kept rate_limits and added
    rate_limits_by_name.
    Adjusted usage-limit error messaging for non-default codex limit buckets
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn id is constructed in-memory;
    2. on resuming threads, turn_id might not be unique;
    3. client cannot no the boundary of a turn from rollout files easily.
    
    This PR does three things:
    1. persist `task_started` and `task_complete` events;
    1. persist `turn_id` in rollout turn events;
    5. generate turn_id as unique uuids instead of incrementing it in
    memory.
    
    This helps us resolve the issue of clients wanting to have unique turn
    ids for resuming a thread, and knowing the boundry of each turn in
    rollout files.
    
    example debug logs
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    backward compatibility:
    if you try to resume an old session without task_started and
    task_complete event populated, the following happens:
    - If you resume and do nothing: those reconstructed historical IDs can
    differ next time you resume.
    - If you resume and send a new turn: the new turn gets a fresh UUID from
    live submission flow and is persisted, so that new turn’s ID is stable
    on later resumes.
    I think this behavior is fine, because we only care about deterministic
    turn id once a turn is triggered.
  • feat: opt-out of events in the app-server (#11319)
    Add `optOutNotificationMethods` in the app-server to opt-out events
    based on exact method matching
  • [apps] Add thread_id param to optionally load thread config for apps feature check. (#11279)
    - [x] Add thread_id param to optionally load thread config for apps
    feature check
  • fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
    …ount_id and chatgpt_plan_type
    
    ### Summary
    Following up on external auth mode which was introduced here:
    https://github.com/openai/codex/pull/10012
    
    Turns out some clients have a differently shaped ID token and don't have
    a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
    So, let's replace `id_token` param with `chatgpt_account_id` and
    `chatgpt_plan_type` (optional) when initializing the external ChatGPT
    auth mode (`account/login/start` with `chatgptAuthTokens`).
    
    The client was able to test end-to-end with a Codex build from this
    branch and verified it worked!
  • feat: do not close unified exec processes across turns (#10799)
    With this PR we do not close the unified exec processes (i.e. background
    terminals) at the end of a turn unless:
    * The user interrupt the turn
    * The user decide to clean the processes through `app-server` or
    `/clean`
    
    I made sure that `codex exec` correctly kill all the processes
  • [apps] Improve app loading. (#10994)
    There are two concepts of apps that we load in the harness:
    
    - Directory apps, which is all the apps that the user can install.
    - Accessible apps, which is what the user actually installed and can be
    $ inserted and be used by the model. These are extracted from the tools
    that are loaded through the gateway MCP.
    
    Previously we wait for both sets of apps before returning the full apps
    list. Which causes many issues because accessible apps won't be
    available to the UI or the model if directory apps aren't loaded or
    failed to load.
    
    In this PR we are separating them so that accessible apps can be loaded
    separately and are instantly available to be shown in the UI and to be
    provided in model context. We also added an app-server event so that
    clients can subscribe to also get accessible apps without being blocked
    on the full app list.
    
    - [x] Separate accessible apps and directory apps loading.
    - [x] `app/list` request will also emit `app/list/updated` notifications
    that app-server clients can subscribe. Which allows clients to get
    accessible apps list to render in the $ menu without being blocked by
    directory apps.
    - [x] Cache both accessible and directory apps with 1 hour TTL to avoid
    reloading them when creating new threads.
    - [x] TUI improvements to redraw $ menu and /apps menu when app list is
    updated.
  • app-server: treat null mode developer instructions as built-in defaults (#10983)
    ## Summary
    - make `turn/start` normalize
    `collaborationMode.settings.developer_instructions: null` to the
    built-in instructions for the selected mode
    - prevent app-server clients from accidentally clearing mode-switch
    developer instructions by sending `null`
    - document this behavior in the v2 protocol and app-server docs
    
    ## What changed
    - `codex-rs/app-server/src/codex_message_processor.rs`
      - added a small `normalize_turn_start_collaboration_mode` helper
      - in `turn_start`, apply normalization before `OverrideTurnContext`
    - `codex-rs/app-server/tests/suite/v2/turn_start.rs`
    - extended `turn_start_accepts_collaboration_mode_override_v2` to assert
    the outgoing request includes default-mode instruction text when the
    client sends `developer_instructions: null`
    - `codex-rs/app-server-protocol/src/protocol/v2.rs`
    - clarified `TurnStartParams.collaboration_mode` docs:
    `settings.developer_instructions: null` means use built-in mode
    instructions
    - regenerated schema fixture:
    - `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts`
    - docs:
      - `codex-rs/app-server/README.md`
      - `codex-rs/docs/codex_mcp_interface.md`
  • feat(core): add network constraints schema to requirements.toml (#10958)
    ## Summary
    
    Add `requirements.toml` schema support for admin-defined network
    constraints in the requirements layer
    
    example config:
    
    ```
    [experimental_network]
    enabled = true
    allowed_domains = ["api.openai.com"]
    denied_domains = ["example.com"]
    ```
  • Add resume_agent collab tool (#10903)
    Summary
    - add the new resume_agent collab tool path through core, protocol, and
    the app server API, including the resume events
    - update the schema/TypeScript definitions plus docs so resume_agent
    appears in generated artifacts and README
    - note that resumed agents rehydrate rollout history without overwriting
    their base instructions
    
    Testing
    - Not run (not requested)
  • feat: add support for allowed_web_search_modes in requirements.toml (#10964)
    This PR makes it possible to disable live web search via an enterprise
    config even if the user is running in `--yolo` mode (though cached web
    search will still be available). To do this, create
    `/etc/codex/requirements.toml` as follows:
    
    ```toml
    # "live" is not allowed; "disabled" is allowed even though not listed explicitly.
    allowed_web_search_modes = ["cached"]
    ```
    
    Or set `requirements_toml_base64` MDM as explained on
    https://developers.openai.com/codex/security/#locations.
    
    ### Why
    - Enforce admin/MDM/`requirements.toml` constraints on web-search
    behavior, independent of user config and per-turn sandbox defaults.
    - Ensure per-turn config resolution and review-mode overrides never
    crash when constraints are present.
    
    ### What
    - Add `allowed_web_search_modes` to requirements parsing and surface it
    in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
    fixtures updated.
    - Define a requirements allowlist type (`WebSearchModeRequirement`) and
    normalize semantics:
      - `disabled` is always implicitly allowed (even if not listed).
      - An empty list is treated as `["disabled"]`.
    - Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
    requirements via `ConstrainedWithSource<WebSearchMode>`.
    - Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
    - Prefer `Live → Cached → Disabled` when
    `SandboxPolicy::DangerFullAccess` is active (subject to requirements),
    unless the user preference is explicitly `Disabled`.
    - Otherwise, honor the user’s preferred mode, falling back to an allowed
    mode when necessary.
    - Update TUI `/debug-config` and app-server mapping to display
    normalized `allowed_web_search_modes` (including implicit `disabled`).
    - Fix web-search integration tests to assert cached behavior under
    `SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
    `live` when allowed).
  • fix(tui): conditionally restore status indicator using message phase (#10947)
    TLDR: use new message phase field emitted by preamble-supported models
    to determine whether an AgentMessage is mid-turn commentary. if so,
    restore the status indicator afterwards to indicate the turn has not
    completed.
    
    ### Problem
    `commit_tick` hides the status indicator while streaming assistant text.
    For preamble-capable models, that text can be commentary mid-turn, so
    hiding was correct during streaming but restore timing mattered:
    - restoring too aggressively caused jitter/flashing
    - not restoring caused indicator to stay hidden before subsequent work
    (tool calls, web search, etc.)
    
    ### Fix
    - Add optional `phase` to `AgentMessageItem` and propagate it from
    `ResponseItem::Message`
    - Keep indicator hidden during streamed commit ticks, restore only when:
      - assistant item completes as `phase=commentary`, and
      - stream queues are idle + task is still running.
    - Treat `phase=None` as final-answer behavior (no restore) to keep
    existing behavior for non-preamble models
    
    ### Tests
    Add/update tests for:
    - no idle-tick restore without commentary completion
    - commentary completion restoring status before tool begin
    - snapshot coverage for preamble/status behavior
    
    ---------
    
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • Mark Config.apps as experimental, correct schema generation issue (#10938)
    This PR makes `Config.apps `experimental-only and fixes a TS schema
    post-processing bug that removed needed imports. The bug happened
    because import pruning only checked the inner type body after filtering,
    not the full alias, so `JsonValue` got dropped from `Config.ts`. We now
    prune against the full alias body and added a regression test for this
    scenario.
  • chore(app-server): add experimental annotation to relevant fields (#10928)
    These fields had always been documented as experimental/unstable with
    docstrings, but now let's actually use the `experimental` annotation to
    be more explicit.
    
    - thread/start.experimentalRawEvents
    - thread/resume.history
    - thread/resume.path
    - thread/fork.path
    - turn/start.collaborationMode
    - account/login/start.chatgptAuthTokens
  • Add app configs to config.toml (#10822)
    Adds app configs to config.toml + tests
  • feat(app-server): turn/steer API (#10821)
    This PR adds a dedicated `turn/steer` API for appending user input to an
    in-flight turn.
    
    ## Motivation
    Currently, steering in the app is implemented by just calling
    `turn/start` while a turn is running. This has some really weird quirks:
    - Client gets back a new `turn.id`, even though streamed
    events/approvals remained tied to the original active turn ID.
    - All the various turn-level override params on `turn/start` do not
    apply to the "steer", and would only apply to the next real turn.
    - There can also be a race condition where the client thinks the turn is
    active but the server has already completed it, so there might be bugs
    if the client has baked in some client-specific behavior thinking it's a
    steer when in fact the server kicked off a new turn. This is
    particularly possible when running a client against a remote app-server.
    
    Having a dedicated `turn/steer` API eliminates all those quirks.
    
    `turn/steer` behavior:
    - Requires an active turn on threadId. Returns a JSON-RPC error if there
    is no active turn.
    - If expectedTurnId is provided, it must match the active turn (more
    useful when connecting to a remote app-server).
    - Does not emit `turn/started`.
    - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
    `outputSchema` to accurately reflect that these are not applied when
    steering.
  • Add stage field for experimental flags. (#10793)
    - [x] Add stage field for experimental flags.
  • [app-server] Add a method to list experimental features. (#10721)
    - [x] Add a method to list experimental features.
  • feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567)
    Took over the work that @aaronl-openai started here:
    https://github.com/openai/codex/pull/10397
    
    Now that app-server clients are able to set up custom tools (called
    `dynamic_tools` in app-server), we should expose a way for clients to
    pass in not just text, but also image outputs. This is something the
    Responses API already supports for function call outputs, where you can
    pass in either a string or an array of content outputs (text, image,
    file):
    https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image
    
    So let's just plumb it through in Codex (with the caveat that we only
    support text and image for now). This is implemented end-to-end across
    app-server v2 protocol types and core tool handling.
    
    ## Breaking API change
    NOTE: This introduces a breaking change with dynamic tools, but I think
    it's ok since this concept was only recently introduced
    (https://github.com/openai/codex/pull/9539) and it's better to get the
    API contract correct. I don't think there are any real consumers of this
    yet (not even the Codex App).
    
    Old shape:
    `{ "output": "dynamic-ok", "success": true }`
    
    New shape:
    ```
    {
        "contentItems": [
          { "type": "inputText", "text": "dynamic-ok" },
          { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
        ]
      "success": true
    }
    ```