Commit Graph

333 Commits

  • Unify rollout reconstruction with resume/fork TurnContext hydration (#12612)
    ## Summary
    
    This PR unifies rollout history reconstruction and resume/fork metadata
    hydration under a single `Session::reconstruct_history_from_rollout`
    implementation.
    
    The key change from main is that replay metadata now comes from the same
    reconstruction pass that rebuilds model-visible history, instead of
    doing a second bespoke rollout scan to recover `previous_model` /
    `reference_context_item`.
    
    ## What Changed
    
    ### Unified reconstruction output
    
    `reconstruct_history_from_rollout` now returns a single
    `RolloutReconstruction` bundle containing:
    
    - rebuilt `history`
    - `previous_model`
    - `reference_context_item`
    
    Resume and fork both consume that shared output directly.
    
    ### Reverse replay core
    
    The reconstruction logic moved into
    `codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans
    rollout items newest-to-oldest.
    
    That reverse pass:
    
    - derives `previous_model`
    - derives whether `reference_context_item` is preserved or cleared
    - stops early once it has both resume metadata and a surviving
    `replacement_history` checkpoint
    
    History materialization is still bridged eagerly for now by replaying
    only the surviving suffix forward, which keeps the history result stable
    while moving the control flow toward the future lazy reverse loader
    design.
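
    The reverse pass and the eager forward bridge can be sketched as follows. This is a minimal sketch under simplifying assumptions, not the real `rollout_reconstruction.rs`. `Item`, `Reconstruction`, and `reconstruct` are hypothetical names, and the baseline is reduced to a model string. The sketch also assumes every compaction carries a `replacement_history` checkpoint, and it omits rollback and abort handling.

    ```rust
    /// Hypothetical, simplified rollout items (not the real `RolloutItem`).
    #[derive(Clone, Debug, PartialEq)]
    pub enum Item {
        Message(String),
        TurnContext { model: String },
        Compacted { replacement_history: Vec<String> },
    }

    /// Hypothetical stand-in for the `RolloutReconstruction` bundle.
    #[derive(Debug, PartialEq)]
    pub struct Reconstruction {
        pub history: Vec<String>,
        pub previous_model: Option<String>,
        /// Stand-in for `reference_context_item`: `None` means cleared.
        pub reference_model: Option<String>,
    }

    pub fn reconstruct(items: &[Item]) -> Reconstruction {
        let mut previous_model: Option<String> = None;
        // Outer `None`: undecided. `Some(None)`: cleared by a newer compaction.
        let mut reference: Option<Option<String>> = None;
        // Newest surviving `replacement_history` checkpoint: (index, history).
        let mut checkpoint: Option<(usize, &[String])> = None;

        // Reverse pass, newest to oldest.
        for (idx, item) in items.iter().enumerate().rev() {
            match item {
                Item::TurnContext { model } => {
                    if previous_model.is_none() {
                        previous_model = Some(model.clone());
                    }
                    if reference.is_none() {
                        reference = Some(Some(model.clone()));
                    }
                }
                Item::Compacted { replacement_history } => {
                    // A compaction newer than every TurnContext clears the baseline.
                    if reference.is_none() {
                        reference = Some(None);
                    }
                    if checkpoint.is_none() {
                        checkpoint = Some((idx, replacement_history.as_slice()));
                    }
                }
                Item::Message(_) => {}
            }
            // Stop early once resume metadata and a checkpoint are both known.
            if previous_model.is_some() && checkpoint.is_some() {
                break;
            }
        }

        // Eager bridge: replay only the surviving suffix forward.
        let (mut history, start) = match checkpoint {
            Some((idx, replacement)) => (replacement.to_vec(), idx + 1),
            None => (Vec::new(), 0),
        };
        for item in &items[start..] {
            if let Item::Message(text) = item {
                history.push(text.clone());
            }
        }

        Reconstruction {
            history,
            previous_model,
            reference_model: reference.flatten(),
        }
    }
    ```

    Consider `[Message("a"), TurnContext("m1"), Compacted(["summary"]), Message("b")]`. For this input, the sketch yields history `["summary", "b"]` and `previous_model = "m1"`. The baseline is cleared because the compaction is newer than the only turn context.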
    
    ### Removed bespoke context lookup
    
    This deletes `last_rollout_regular_turn_context_lookup` and its separate
    compaction-aware scan.
    
    The previous model / baseline metadata is now computed from the same
    replay state that rebuilds history, so resume/fork cannot drift from the
    reconstructed transcript view.
    
    ### `TurnContextItem` persistence contract
    
    `TurnContextItem` is now treated as the replay source of truth for
    durable model-visible baselines.
    
    This PR keeps the following contract explicit:
    
    - persist `TurnContextItem` for the first real user turn so resume can
    recover `previous_model`
    - persist it for later turns that emit model-visible context updates
    - if mid-turn compaction reinjects full initial context into replacement
    history, persist a fresh `TurnContextItem` after `Compacted` so
    resume/fork can re-establish the baseline from the rewritten history
    - do not treat manual compaction or pre-sampling compaction as creating
    a new durable baseline on their own
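
    The contract above can be read as a small decision table. This is an illustrative sketch only. `PersistPoint` and `should_persist_turn_context` are hypothetical names, not functions in the codebase. Returning `false` for a mid-turn compaction that does not reinject full context is an inference from the last bullet.

    ```rust
    /// Hypothetical classification of points where a `TurnContextItem` might be persisted.
    #[derive(Clone, Copy, Debug)]
    pub enum PersistPoint {
        FirstUserTurn,
        LaterTurn { emits_context_update: bool },
        /// Persisted after the `Compacted` item when full context was reinjected.
        MidTurnCompaction { reinjected_full_context: bool },
        ManualCompaction,
        PreSamplingCompaction,
    }

    /// Returns true when a fresh `TurnContextItem` should be written to the rollout.
    pub fn should_persist_turn_context(point: PersistPoint) -> bool {
        match point {
            PersistPoint::FirstUserTurn => true,
            PersistPoint::LaterTurn { emits_context_update } => emits_context_update,
            PersistPoint::MidTurnCompaction { reinjected_full_context } => reinjected_full_context,
            // These compactions never create a durable baseline on their own.
            PersistPoint::ManualCompaction | PersistPoint::PreSamplingCompaction => false,
        }
    }
    ```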
    
    ## Behavior Preserved
    
    - rollback replay stays aligned with `drop_last_n_user_turns`
    - rollback skips only user turns
    - incomplete active user turns are dropped before older finalized turns
    when rollback applies
    - unmatched aborts do not consume the current active turn
    - missing abort IDs still conservatively clear stale compaction state
    - compaction clears `reference_context_item` until a later
    `TurnContextItem` re-establishes it
    - `previous_model` still comes from the newest surviving user turn that
    established one
    
    ## Tests
    
    Targeted validation run for the current branch shape:
    
    - `cd codex-rs && cargo test -p codex-core --lib
    codex::rollout_reconstruction_tests -- --nocapture`
    - `cd codex-rs && just fmt`
    
    The branch also extracts the rollout reconstruction tests into
    `codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic
    has a dedicated home instead of living inline in `codex.rs`.
  • Clarify escalation guidance for sandbox-related network failures (#13051)
    This updates the on-request permissions instructions so likely
    sandbox-related network failures during dependency installation are
    treated as escalation candidates.
    
    Repro:
    - Run `codex -a on-request -s workspace-write` in a fresh temp dir.
    - Prompt: `Build a new rust app with one dependency, anyhow, and try
    installing the dependency`.
    - Before this change, DNS/registry failures like `Could not resolve
    host: index.crates.io` could be treated like ordinary transient failures
    and not escalate.
    
    Fix:
    - Clarify that likely sandbox-related network errors such as DNS/host
    resolution, registry/index access, and dependency download failures
    should trigger escalation.
    
    Validation:
    - Rebuild the CLI and rerun the same repro. The same instructions should
    now be more likely to trigger escalation instead of silently stopping.
    
    Related Slack canvas:
    - https://openai.enterprise.slack.com/docs/T0BQTNSUF/F0ACVNJAV09
  • fix: use AbsolutePathBuf for permission profile file roots (#12970)
    ## Why
    `PermissionProfile` should describe filesystem roots as absolute paths
    at the type level. Using `PathBuf` in `FileSystemPermissions` made the
    shared type too permissive and blurred together three different
    deserialization cases:
    
    - skill metadata in `agents/openai.yaml`, where relative paths should
    resolve against the skill directory
    - app-server API payloads, where callers should have to send absolute
    paths
    - local tool-call payloads for commands like `shell_command` and
    `exec_command`, where `additional_permissions.file_system` may
    legitimately be relative to the command `workdir`
    
    This change tightens the shared model without regressing the existing
    local command flow.
    
    ## What Changed
    - changed `protocol::models::FileSystemPermissions` and the app-server
    `AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
    - wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
    relative permission roots in `agents/openai.yaml` resolve against the
    containing skill directory
    - kept app-server/API deserialization strict, so relative
    `additionalPermissions.fileSystem.*` paths are rejected at the boundary
    - restored cwd/workdir-relative deserialization for local tool-call
    payloads by parsing `shell`, `shell_command`, and `exec_command`
    arguments under an `AbsolutePathBufGuard` rooted at the resolved command
    working directory
    - simplified runtime additional-permission normalization so it only
    canonicalizes and deduplicates absolute roots instead of trying to
    recover relative ones later
    - updated the app-server schema fixtures, `app-server/README.md`, and
    the affected transport/TUI tests to match the final behavior
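
    The three deserialization cases can be sketched with standard-library paths. `resolve_permission_root` is a hypothetical helper, not part of the codebase. The real change uses `AbsolutePathBuf` and `AbsolutePathBufGuard`, whose exact APIs are not reproduced here. The sketch does not canonicalize `..` components, and the usage below assumes Unix-style paths.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical helper: resolve a permission root strictly (app-server
    /// boundary, `base = None`) or against a base directory (skill directory
    /// or command workdir, `base = Some(..)`).
    pub fn resolve_permission_root(raw: &str, base: Option<&Path>) -> Result<PathBuf, String> {
        let path = Path::new(raw);
        if path.is_absolute() {
            return Ok(path.to_path_buf());
        }
        match base {
            // Skill metadata and local tool-call payloads: resolve against the base.
            Some(base) => Ok(base.join(path)),
            // Strict API boundary: relative roots are rejected outright.
            None => Err(format!("expected an absolute path, got `{raw}`")),
        }
    }
    ```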
  • Add model availability NUX metadata (#12972)
    - replace `show_nux` with structured `availability_nux` model metadata
    - expose availability NUX data through the app-server model API
    - update shared fixtures and tests for the new field
  • Support multimodal custom tool outputs (#12948)
    ## Summary
    
    This changes `custom_tool_call_output` to use the same output payload
    shape as `function_call_output`, so freeform tools can return either
    plain text or structured content items.
    
    The main goal is to let `js_repl` return image content from nested
    `view_image` calls in its own `custom_tool_call_output`, instead of
    relying on a separate injected message.
    
    ## What changed
    
    - Changed `custom_tool_call_output.output` from `string` to
    `FunctionCallOutputPayload`
    - Updated freeform tool plumbing to preserve structured output bodies
    - Updated `js_repl` to aggregate nested tool content items and attach
    them to the outer `js_repl` result
    - Removed the old `js_repl` special case that injected `view_image`
    results as a separate pending user image message
    - Updated normalization/history/truncation paths to handle multimodal
    `custom_tool_call_output`
    - Regenerated app-server protocol schema artifacts
    
    ## Behavior
    
    Direct `view_image` calls still return a `function_call_output` with
    image content.
    
    When `view_image` is called inside `js_repl`, the outer `js_repl`
    `custom_tool_call_output` now carries:
    - an `input_text` item if the JS produced text output
    - one or more `input_image` items from nested tool results
    
    So the nested image result now stays inside the `js_repl` tool output
    instead of being injected as a separate message.
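
    The aggregation step can be sketched with a simplified content-item enum. `ContentItem`, its `image_url` field, and `aggregate_js_repl_output` are hypothetical names. Placing the text item before the images is an assumption about ordering.

    ```rust
    /// Hypothetical content items mirroring the `input_text` / `input_image` types.
    #[derive(Clone, Debug, PartialEq)]
    pub enum ContentItem {
        InputText { text: String },
        InputImage { image_url: String },
    }

    /// Build the outer `js_repl` output: the JS text (if non-empty), then every
    /// image produced by nested tool calls such as `view_image`.
    pub fn aggregate_js_repl_output(
        js_text: Option<&str>,
        nested_results: &[Vec<ContentItem>],
    ) -> Vec<ContentItem> {
        let mut items = Vec::new();
        if let Some(text) = js_text.filter(|t| !t.is_empty()) {
            items.push(ContentItem::InputText { text: text.to_string() });
        }
        for result in nested_results {
            for item in result {
                // Only images are lifted from nested results.
                if let ContentItem::InputImage { .. } = item {
                    items.push(item.clone());
                }
            }
        }
        items
    }
    ```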
    
    ## Compatibility
    
    This is intended to be backward-compatible for resumed conversations.
    
    Older histories that stored `custom_tool_call_output.output` as a plain
    string still deserialize correctly, and older histories that used the
    previous injected-image-message flow also continue to resume.
    
    Added regression coverage for resuming a pre-change rollout containing:
    - string-valued `custom_tool_call_output`
    - legacy injected image message history
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/12948
  • feat: add local date/timezone to turn environment context (#12947)
    ## Summary
    
    This PR includes the session's local date and timezone in the
    model-visible environment context and persists that data in
    `TurnContextItem`.
    
    ## What changed
    - captures the current local date and IANA timezone when building a turn
    context, with a UTC fallback if the timezone lookup fails
    - includes `current_date` and `timezone` in the serialized
    `<environment_context>` payload
    - stores those fields on `TurnContextItem` so they survive rollout/history
    handling, subagent review threads, and resume flows
    - treats date/timezone changes as environment updates, so prompt caching
    and context refresh logic do not silently reuse stale time context
    - updates tests to validate the new environment fields without depending
    on a single hardcoded environment-context string
    
    ## test
    
    Built locally and confirmed the new fields in the rollout file:
    ```
    {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n  <shell>zsh</shell>\n  <current_date>2026-02-26</current_date>\n  <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
    ```
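
    The following hypothetical renderer reproduces the `<environment_context>` shape from the rollout line above. `render_environment_context` is an illustrative name, not the real serializer. Looking up the local date and IANA timezone is left to the caller, and `None` stands in for a failed timezone lookup that falls back to UTC.

    ```rust
    /// Hypothetical renderer for the fields shown in the rollout line above.
    /// A `None` timezone models a failed lookup and falls back to "UTC".
    pub fn render_environment_context(shell: &str, current_date: &str, timezone: Option<&str>) -> String {
        let timezone = timezone.unwrap_or("UTC");
        format!(
            "<environment_context>\n  <shell>{shell}</shell>\n  <current_date>{current_date}</current_date>\n  <timezone>{timezone}</timezone>\n</environment_context>"
        )
    }
    ```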
  • Allow clients not to send summary as an option (#12950)
    `summary` is a required parameter on `UserTurn`, but ideally core should
    decide the appropriate summary level.
    
    Make `summary` optional so clients can omit it when it is not needed.
  • feat: include sandbox config with escalation request (#12839)
    ## Why
    
    Before this change, an escalation approval could say that a command
    should be rerun, but it could not carry the sandbox configuration that
    should still apply when the escalated command is actually spawned.
    
    That left an unsafe gap in the `zsh-fork` skill path: skill scripts
    under `scripts/` that did not declare permissions could be escalated
    without a sandbox, and scripts that did declare permissions could lose
    their bounded sandbox on rerun or cached session approval.
    
    This PR extends the escalation protocol so approvals can optionally
    carry sandbox configuration all the way through execution. That lets the
    shell runtime preserve the intended sandbox instead of silently widening
    access.
    
    We likely want a single permissions type for this codepath eventually,
    probably centered on `Permissions`. For now, the protocol needs to
    represent both the existing `PermissionProfile` form and the fuller
    `Permissions` form, so this introduces a temporary disjoint union,
    `EscalationPermissions`, to carry either one.
    
    Further, this means that today, a skill either:
    
    - does not declare any permissions, in which case it is run using the
    default sandbox for the turn
    - specifies permissions, in which case the skill is run using that exact
    sandbox, which might be more restrictive than the default sandbox for
    the turn
    
    We will likely change the skill's permissions to be additive to the
    existing permissions for the turn.
    
    ## What Changed
    
    - Added `EscalationPermissions` to `codex-protocol` so escalation
    requests can carry either a `PermissionProfile` or a full `Permissions`
    payload.
    - Added an explicit `EscalationExecution` mode to the shell escalation
    protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and
    `Permissions(...)` instead of overloading `None`.
    - Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution
    time, which keeps ordinary `UseDefault` commands on the turn sandbox and
    preserves turn-level macOS seatbelt profile extensions.
    - Updated the `zsh-fork` skill path so a skill with no declared
    permissions inherits the conversation's effective sandbox instead of
    escalating unsandboxed.
    - Updated the `zsh-fork` skill path so a skill with declared permissions
    reruns with exactly those permissions, including when a cached session
    approval is reused.
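
    The rerun modes and the skill mapping can be sketched as follows. The variant names `Unsandboxed`, `TurnDefault`, and `Permissions` come from the description above. `Sandbox`, `resolve_sandbox`, `skill_execution`, and the string-typed permission payload are simplifying assumptions, not the real protocol types.

    ```rust
    /// Hypothetical sandbox description standing in for the real permission types.
    #[derive(Clone, Debug, PartialEq)]
    pub enum Sandbox {
        Unrestricted,
        Policy(String),
    }

    /// Simplified mirror of the three rerun modes.
    #[derive(Clone, Debug, PartialEq)]
    pub enum EscalationExecution {
        Unsandboxed,
        TurnDefault,
        Permissions(String),
    }

    /// Resolve at execution time, so `TurnDefault` uses the turn's sandbox as it
    /// is when the command is actually spawned.
    pub fn resolve_sandbox(exec: &EscalationExecution, turn_sandbox: &Sandbox) -> Sandbox {
        match exec {
            EscalationExecution::Unsandboxed => Sandbox::Unrestricted,
            EscalationExecution::TurnDefault => turn_sandbox.clone(),
            EscalationExecution::Permissions(policy) => Sandbox::Policy(policy.clone()),
        }
    }

    /// Skill scripts without declared permissions inherit the turn sandbox;
    /// declared permissions are used exactly, including on cached approvals.
    pub fn skill_execution(declared: Option<&str>) -> EscalationExecution {
        match declared {
            None => EscalationExecution::TurnDefault,
            Some(policy) => EscalationExecution::Permissions(policy.to_string()),
        }
    }
    ```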
    
    ## Testing
    
    - Added unit coverage in
    `core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit
    `UseDefault` / `RequireEscalated` / `WithAdditionalPermissions`
    execution mapping.
    - Added unit coverage in
    `core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt
    extension preservation in both the `TurnDefault` and
    explicit-permissions rerun paths.
    - Added integration coverage in `core/tests/suite/skill_approval.rs` for
    permissionless skills inheriting the turn sandbox and explicit skill
    permissions remaining bounded across cached approval reuse.
  • Use model catalog default for reasoning summary fallback (#12873)
    ## Summary
    - make `Config.model_reasoning_summary` optional so unset means use
    model default
    - resolve the optional config value to a concrete summary when building
    `TurnContext`
    - add protocol support for `default_reasoning_summary` in model metadata
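
    Resolving the optional config value amounts to a fallback to the model's catalog default. Here is a minimal sketch. The `ReasoningSummary` variants, `ModelMetadata`, and `resolve_reasoning_summary` are hypothetical simplifications.

    ```rust
    /// Hypothetical summary levels; the real enum's variants may differ.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum ReasoningSummary {
        Auto,
        Concise,
        Detailed,
    }

    /// Hypothetical slice of model metadata carrying the catalog default.
    pub struct ModelMetadata {
        pub default_reasoning_summary: ReasoningSummary,
    }

    /// An unset config value means "use the model's default".
    pub fn resolve_reasoning_summary(
        configured: Option<ReasoningSummary>,
        model: &ModelMetadata,
    ) -> ReasoningSummary {
        configured.unwrap_or(model.default_reasoning_summary)
    }
    ```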
    
    ## Validation
    - `cargo test -p codex-core --lib client::tests -- --nocapture`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Enforce user input length cap (#12823)
    Currently there is no bound on the length of a user message submitted in
    the TUI or through the app server interface. That means users can paste
    many megabytes of text, which can lead to bad performance, hangs, and
    crashes. In extreme cases, it can lead to a [kernel
    panic](https://github.com/openai/codex/issues/12323).
    
    This PR limits the length of a user input to 2**20 (about 1M)
    characters. This value was chosen because it fills the entire context
    window on the latest models, so accepting longer inputs wouldn't make
    sense anyway.
    
    Summary
    - add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol
    and surface it in TUI and app server code
    - block oversized submissions in the TUI submit flow and emit error
    history cells when validation fails
    - reject oversized app-server requests with JSON-RPC `-32602` and
    structured `input_too_large` data, plus document the behavior
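
    Here is a sketch of the length check. It assumes the limit counts Unicode scalar values (Rust `char`s) rather than bytes. `MAX_USER_INPUT_TEXT_CHARS` is the constant named above. `InputTooLarge`, its fields, and `validate_user_input` are hypothetical. `-32602` is the standard JSON-RPC 2.0 "Invalid params" code.

    ```rust
    /// 2^20 characters, as described above.
    pub const MAX_USER_INPUT_TEXT_CHARS: usize = 1 << 20;

    /// JSON-RPC 2.0 "Invalid params" error code.
    pub const INVALID_PARAMS: i64 = -32602;

    /// Hypothetical structured error data for an oversized input.
    #[derive(Debug, PartialEq)]
    pub struct InputTooLarge {
        pub max_chars: usize,
        pub actual_chars: usize,
    }

    pub fn validate_user_input(text: &str) -> Result<(), InputTooLarge> {
        // Count Unicode scalar values, not UTF-8 bytes.
        let actual_chars = text.chars().count();
        if actual_chars > MAX_USER_INPUT_TEXT_CHARS {
            Err(InputTooLarge { max_chars: MAX_USER_INPUT_TEXT_CHARS, actual_chars })
        } else {
            Ok(())
        }
    }
    ```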
    
    Testing
    - ran the IDE extension with this change and verified that when I
    attempt to paste a user message that's several MB long, it correctly
    reports an error instead of crashing or making my computer hot.
  • feat: include available decisions in command approval requests (#12758)
    Command-approval clients currently infer which choices to show from
    side-channel fields like `networkApprovalContext`,
    `proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
    the request shape harder to evolve, and it forces each client to
    replicate the server's heuristics instead of receiving the exact
    decision list for the prompt.
    
    This PR introduces a mapping between `CommandExecutionApprovalDecision`
    and `codex_protocol::protocol::ReviewDecision`:
    
    ```rust
    impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
        fn from(value: CoreReviewDecision) -> Self {
            match value {
                CoreReviewDecision::Approved => Self::Accept,
                CoreReviewDecision::ApprovedExecpolicyAmendment {
                    proposed_execpolicy_amendment,
                } => Self::AcceptWithExecpolicyAmendment {
                    execpolicy_amendment: proposed_execpolicy_amendment.into(),
                },
                CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
                CoreReviewDecision::NetworkPolicyAmendment {
                    network_policy_amendment,
                } => Self::ApplyNetworkPolicyAmendment {
                    network_policy_amendment: network_policy_amendment.into(),
                },
                CoreReviewDecision::Abort => Self::Cancel,
                CoreReviewDecision::Denied => Self::Decline,
            }
        }
    }
    ```
    
    And updates `CommandExecutionRequestApprovalParams` to have a new field:
    
    ```rust
    available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
    ```
    
    which, if specified, makes it easier for clients to display the
    appropriate list of options in the UI.
    
    This makes it possible for `CoreShellActionProvider::prompt()` in
    `unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
    adding support for `ApprovedForSession` when approving a skill script,
    which was previously missing in the TUI.
    
    Note this results in a significant change to `exec_options()` in
    `approval_overlay.rs`, as the displayed options are now derived from
    `available_decisions: &[ReviewDecision]`.
    
    ## What Changed
    
    - Add `available_decisions` to
    [`ExecApprovalRequestEvent`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/protocol/src/approvals.rs#L111-L175),
    including helpers to derive the legacy default choices when older
    senders omit the field.
    - Map `codex_protocol::protocol::ReviewDecision` to app-server
    `CommandExecutionApprovalDecision` and expose the ordered list as
    experimental `availableDecisions` in
    [`CommandExecutionRequestApprovalParams`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/app-server-protocol/src/protocol/v2.rs#L3798-L3807).
    - Thread optional `available_decisions` through the core approval path
    so Unix shell escalation can explicitly request `ApprovedForSession` for
    session-scoped approvals instead of relying on client heuristics.
    [`unix_escalation.rs`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs#L194-L214)
    - Update the TUI approval overlay to build its buttons from the ordered
    decision list, while preserving the legacy fallback when
    `available_decisions` is missing.
    - Update the app-server README, test client output, and generated schema
    artifacts to document and surface the new field.
    
    ## Testing
    
    - Add `approval_overlay.rs` coverage for explicit decision lists,
    including the generic `ApprovedForSession` path and network approval
    options.
    - Update `chatwidget/tests.rs` and app-server protocol tests to populate
    the new optional field and keep older event shapes working.
    
    ## Developers Docs
    
    - If we document `item/commandExecution/requestApproval` on
    [developers.openai.com/codex](https://developers.openai.com/codex), add
    experimental `availableDecisions` as the preferred source of approval
    choices and note that older servers may omit it.
  • Revert "Add skill approval event/response (#12633)" (#12811)
    This reverts https://github.com/openai/codex/pull/12633. We no longer
    need that PR because we now favor sending a normal exec command approval
    server request with `additional_permissions` set to the skill's
    permissions instead.
  • feat(app-server): add ThreadItem::DynamicToolCall (#12732)
    Previously, clients would call `thread/start` with `dynamic_tools` set.
    When the model invoked a dynamic tool, the server would simply send a
    server->client `item/tool/call` request and wait for the client's
    response to complete the tool call. This works, but it emits no
    `item/started` or `item/completed` event.
    
    Now we are doing this:
    - [new] emit `item/started` with `DynamicToolCall` populated with the
    call arguments
    - send an `item/tool/call` server request
    - [new] once the client responds, emit `item/completed` with
    `DynamicToolCall` populated with the response.
    
    Also, with `persistExtendedHistory: true`, dynamic tool calls are now
    reconstructable in `thread/read` and `thread/resume` as
    `ThreadItem::DynamicToolCall`.
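
    The new lifecycle can be sketched as a function that brackets the client round-trip with started/completed events. `DynamicToolEvent` and `run_dynamic_tool_call` are hypothetical names, and a closure stands in for the `item/tool/call` request. The real server is asynchronous and carries richer payloads.

    ```rust
    /// Hypothetical stand-ins for the `item/started` / `item/completed` notifications.
    #[derive(Debug, PartialEq)]
    pub enum DynamicToolEvent {
        ItemStarted { arguments: String },
        ItemCompleted { arguments: String, response: String },
    }

    /// Run one dynamic tool call. `request_client` stands in for the
    /// server->client `item/tool/call` request and returns the client's response.
    pub fn run_dynamic_tool_call<F>(
        arguments: &str,
        request_client: F,
        events: &mut Vec<DynamicToolEvent>,
    ) -> String
    where
        F: FnOnce(&str) -> String,
    {
        // New: announce the call with its arguments before asking the client.
        events.push(DynamicToolEvent::ItemStarted { arguments: arguments.to_string() });
        let response = request_client(arguments);
        // New: report completion with the client's response.
        events.push(DynamicToolEvent::ItemCompleted {
            arguments: arguments.to_string(),
            response: response.clone(),
        });
        response
    }
    ```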
  • chore: migrate additional permissions to PermissionProfile (#12731)
    This PR replaces the old `additional_permissions.fs_read/fs_write` shape
    with a shared `PermissionProfile`
    model and wires it through the command approval, sandboxing, protocol,
    and TUI layers. The schema is adopted from the
    `SkillManifestPermissions`, which is also refactored to use this unified
    struct. This helps us easily expose permission profiles in app
    server/core as a follow-up.
  • feat(core) Introduce Feature::RequestPermissions (#11871)
    ## Summary
    Introduces the initial implementation of Feature::RequestPermissions.
    RequestPermissions allows the model to request that a command be run
    inside the sandbox, with additional permissions, like writing to a
    specific folder. Eventually this will include other rules as well, and
    the ability to persist these permissions, but this PR is already quite
    large - let's get the core flow working and go from there!
    
    <img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
    src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
    />
    
    ## Testing
    - [x] Added tests
    - [x] Tested locally
    - [x] Feature
  • chore: rm hardcoded PRESETS list (#12650)
    Remove the `PRESETS` list hardcoded in `model_presets`, since we now
    bundle a `models.json` with equivalent info.
    
    Update the logic to rely on the bundled models instead, and update tests.
  • Add skill approval event/response (#12633)
    Set the stage for skill-level permission approval in addition to
    command-level.
    
    Behind a feature flag.
  • feat(core): persist network approvals in execpolicy (#12357)
    ## Summary
    Persist network approval allow/deny decisions as `network_rule(...)`
    entries in execpolicy (not proxy config):
    
    - add `network_rule` parsing + append support in `codex-execpolicy`,
    including `decision="prompt"` (parse-only; not compiled into proxy
    allow/deny lists)
    - compile execpolicy network rules into proxy allow/deny lists and
    update the live proxy state on approval
    - preserve requirements execpolicy `network_rule(...)` entries when
    merging with file-based execpolicy
    - reject broad wildcard hosts (for example `*`) for persisted
    `network_rule(...)`
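
    Here is a sketch of the compile and validation rules described above. `NetworkRule`, `NetworkDecision`, `compile_proxy_lists`, and `validate_persisted_host` are hypothetical names. The sketch does not parse the `network_rule(...)` syntax. It also rejects only an empty host or a bare `*`, which may be narrower than the real check.

    ```rust
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum NetworkDecision {
        Allow,
        Deny,
        Prompt,
    }

    /// Hypothetical parsed form of one persisted network rule.
    pub struct NetworkRule {
        pub host: String,
        pub decision: NetworkDecision,
    }

    /// Refuse to persist hosts that are too broad (here: empty or a bare `*`).
    pub fn validate_persisted_host(host: &str) -> Result<(), String> {
        let host = host.trim();
        if host.is_empty() || host == "*" {
            return Err(format!("refusing to persist broad network rule host `{host}`"));
        }
        Ok(())
    }

    /// Compile rules into proxy (allow, deny) lists; `prompt` rules are parse-only.
    pub fn compile_proxy_lists(rules: &[NetworkRule]) -> (Vec<String>, Vec<String>) {
        let mut allow = Vec::new();
        let mut deny = Vec::new();
        for rule in rules {
            match rule.decision {
                NetworkDecision::Allow => allow.push(rule.host.clone()),
                NetworkDecision::Deny => deny.push(rule.host.clone()),
                NetworkDecision::Prompt => {} // parsed, but not compiled
            }
        }
        (allow, deny)
    }
    ```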
  • Fix compaction context reinjection and model baselines (#12252)
    ## Summary
    - move regular-turn context diff/full-context persistence into
    `run_turn` so pre-turn compaction runs before incoming context updates
    are recorded
    - after successful pre-turn compaction, rely on a cleared
    `reference_context_item` to trigger full context reinjection on the
    follow-up regular turn (manual `/compact` keeps replacement history
    summary-only and also clears the baseline)
    - preserve `<model_switch>` when full context is reinjected, and inject
    it *before* the rest of the full-context items
    - scope `reference_context_item` and `previous_model` to regular user
    turns only so standalone tasks (`/compact`, shell, review, undo) cannot
    suppress future reinjection or `<model_switch>` behavior
    - make context-diff persistence + `reference_context_item` updates
    explicit in the regular-turn path, with clearer docs/comments around the
    invariant
    - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
    (only regular turns persist `TurnContextItem` now)
    - simplify resume/fork previous-model/reference-baseline hydration by
    looking up the last surviving turn context from rollout lifecycle
    events, including rollback and compaction-crossing handling
    - remove the legacy fallback that guessed from bare `TurnContext`
    rollouts without lifecycle events
    - update compaction/remote-compaction/model-visible snapshots and
    compact test assertions (including remote compaction mock response
    shape)
    
    ## Why
    We were persisting incoming context items before spawning the regular
    turn task, which let pre-turn compaction requests accidentally include
    incoming context diffs without the new user message. Fixing that exposed
    follow-on baseline issues around `/compact`, resume/fork, and standalone
    tasks that could cause duplicate context injection or suppress
    `<model_switch>` instructions.
    
    This PR re-centers the invariants around regular turns:
    - regular turns persist model-visible context diffs/full reinjection and
    update the `reference_context_item`
    - standalone tasks do not advance those regular-turn baselines
    - compaction clears the baseline when replacement history may have
    stripped the referenced context diffs
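
    The baseline invariant can be sketched as a choice between full reinjection and a diff. This is a hypothetical simplification: context items are plain strings, and `context_items_for_regular_turn` is an illustrative name. Placing `<model_switch>` first in the diff case is an assumption. The description above only fixes its position for full reinjection.

    ```rust
    /// Hypothetical: decide which context items a regular turn records.
    /// `reference` is the baseline (`None` after compaction cleared it).
    pub fn context_items_for_regular_turn(
        reference: Option<&[String]>,
        current: &[String],
        model_switch: Option<&str>,
    ) -> Vec<String> {
        let mut out = Vec::new();
        // `<model_switch>` is emitted before any other context items.
        if let Some(switch) = model_switch {
            out.push(switch.to_string());
        }
        match reference {
            // Cleared baseline: reinject the full current context.
            None => out.extend(current.iter().cloned()),
            // Surviving baseline: record only items the baseline lacks.
            Some(baseline) => {
                out.extend(current.iter().filter(|item| !baseline.contains(*item)).cloned())
            }
        }
        out
    }
    ```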
    
    ## Follow-ups (TODOs left in code)
    - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
    comprehensively
    - `TODO(ccunningham)`: include pending incoming context items in
    pre-turn compaction threshold estimation
    - `TODO(ccunningham)`: inject updated personality spec alongside
    `<model_switch>` so some model-switch paths can avoid forced full
    reinjection
    - `TODO(ccunningham)`: review task turn lifecycle
    (`TurnStarted`/`TurnComplete`) behavior and emit task-start context
    diffs for task types that should have them (excluding `/compact`)
    
    ## Validation
    - `just fmt`
    - CI should cover the updated compaction/resume/model-visible snapshot
    expectations and rollout-hydration behavior
    - I did **not** rerun the full local test suite after the latest
    resume-lookup / rollout-persistence simplifications
  • Wire realtime api to core (#12268)
    - Introduce `RealtimeConversationManager` for realtime API management 
    - Add `op::conversation` to start conversation, insert audio, insert
    text, and close conversation.
    - emit conversation lifecycle and realtime events.
    - Move shared realtime payload types into codex-protocol and add core
    e2e websocket tests for start/replace/transport-close paths.
    
    Things to consider:
    - Should we use the same `op::` and `Events` channel to carry audio? I
    think we should try this simple approach and later we can create
    separate one if the channels got congested.
    - Sending text updates to the client: we can start simple and later
    restrict that.
    - Provider auth isn't wired for now intentionally
  • feat: cleaner TUI for sub-agents (#12327)
    <img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
    src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
    />
  • feat: add nick name to sub-agents (#12320)
    Add a random nickname to sub-agents, used for UX.
    
    This also stores and wires through the sub-agent's role.
  • feat: add Reject approval policy with granular prompt rejection controls (#12087)
    ## Why
    
    We need a way to auto-reject specific approval prompt categories without
    switching all approvals off.
    
    The goal is to let users independently control:
    - sandbox escalation approvals,
    - execpolicy `prompt` rule approvals,
    - MCP elicitation prompts.
    
    ## What changed
    
    - Added a new primary approval mode in `protocol/src/protocol.rs`:
    
    ```rust
    pub enum AskForApproval {
        // ...
        Reject(RejectConfig),
        // ...
    }
    
    pub struct RejectConfig {
        pub sandbox_approval: bool,
        pub rules: bool,
        pub mcp_elicitations: bool,
    }
    ```
    
    - Wired `RejectConfig` semantics through approval paths in `core`:
      - `core/src/exec_policy.rs`
        - rejects rule-driven prompts when `rules = true`
        - rejects sandbox/escalation prompts when `sandbox_approval = true`
        - preserves rule priority when both rule and sandbox prompt
          conditions are present
      - `core/src/tools/sandboxing.rs`
        - applies `sandbox_approval` to default exec approval decisions and
          sandbox-failure retry gating
      - `core/src/safety.rs`
        - keeps `Reject { all false }` behavior aligned with `OnRequest` for
          patch safety
        - rejects out-of-root patch approvals when `sandbox_approval = true`
      - `core/src/mcp_connection_manager.rs`
        - auto-declines MCP elicitations when `mcp_elicitations = true`
    
    - Ensured approval policy used by MCP elicitation flow stays in sync
    with constrained session policy updates.
    
    - Updated app-server v2 conversions and generated schema/TypeScript
    artifacts for the new `Reject` shape.
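
    Here is a sketch of how `RejectConfig` could gate each prompt category. `PromptReason` and `auto_reject` are hypothetical. The sketch interprets "rule priority" to mean that when an execpolicy `prompt` rule matched, the `rules` flag alone decides. That interpretation is an assumption.

    ```rust
    /// Copy of the `RejectConfig` shape shown above.
    #[derive(Clone, Copy, Debug, Default)]
    pub struct RejectConfig {
        pub sandbox_approval: bool,
        pub rules: bool,
        pub mcp_elicitations: bool,
    }

    /// Hypothetical description of why a prompt would be shown.
    #[derive(Clone, Copy, Debug)]
    pub enum PromptReason {
        Exec { rule_prompt: bool, needs_sandbox_escalation: bool },
        McpElicitation,
    }

    /// Returns true if the prompt should be auto-rejected instead of shown.
    pub fn auto_reject(cfg: RejectConfig, reason: PromptReason) -> bool {
        match reason {
            // Assumed rule priority: a matched `prompt` rule is governed by `rules`.
            PromptReason::Exec { rule_prompt: true, .. } => cfg.rules,
            PromptReason::Exec { needs_sandbox_escalation: true, .. } => cfg.sandbox_approval,
            PromptReason::Exec { .. } => false,
            PromptReason::McpElicitation => cfg.mcp_elicitations,
        }
    }
    ```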
    
    ## Verification
    
    Added focused unit coverage for the new behavior in:
    - `core/src/exec_policy.rs`
    - `core/src/tools/sandboxing.rs`
    - `core/src/mcp_connection_manager.rs`
    - `core/src/safety.rs`
    - `core/src/tools/runtimes/apply_patch.rs`
    
    Key cases covered include rule-vs-sandbox prompt precedence, MCP
    auto-decline behavior, and patch/sandbox retry behavior under
    `RejectConfig`.
  • client side modelinfo overrides (#12101)
    TL;DR
    Add top-level `model_catalog_json` config support so users can supply a
    local model catalog override from a JSON file path (including adding new
    models) without backend changes.
    
    ### Problem
    Codex previously had no clean client-side way to replace/overlay model
    catalog data for local testing of model metadata and new model entries.
    
    ### Fix
    - Add top-level `model_catalog_json` config field (JSON file path).
    - Apply catalog entries when resolving `ModelInfo`:
      1. Base resolved model metadata (remote/fallback)
      2. Catalog overlay from `model_catalog_json`
      3. Existing global top-level overrides (`model_context_window`,
         `model_supports_reasoning_summaries`, etc.)
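
    The three-layer resolution order can be sketched as follows. `ModelInfo` here is a hypothetical, trimmed-down struct, and `resolve_model_info` is an illustrative name. In this sketch, a catalog entry replaces the base entry wholesale. That choice is consistent with per-field overrides being deferred to a follow-up.

    ```rust
    /// Hypothetical, trimmed-down model metadata.
    #[derive(Clone, Debug, PartialEq)]
    pub struct ModelInfo {
        pub slug: String,
        pub context_window: Option<u64>,
        pub supports_reasoning_summaries: bool,
    }

    /// Hypothetical global overrides, named after the config keys above.
    #[derive(Default)]
    pub struct GlobalOverrides {
        pub model_context_window: Option<u64>,
        pub model_supports_reasoning_summaries: Option<bool>,
    }

    /// Layer 1: base metadata; layer 2: catalog entry with the same slug;
    /// layer 3: global overrides. Later layers win.
    pub fn resolve_model_info(base: ModelInfo, catalog: &[ModelInfo], overrides: &GlobalOverrides) -> ModelInfo {
        let mut info = catalog
            .iter()
            .find(|m| m.slug == base.slug)
            .cloned()
            .unwrap_or(base);
        if let Some(window) = overrides.model_context_window {
            info.context_window = Some(window);
        }
        if let Some(flag) = overrides.model_supports_reasoning_summaries {
            info.supports_reasoning_summaries = flag;
        }
        info
    }
    ```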
    
    ### Note
    Will revisit per-field overrides in a follow-up
    
    ### Tests
    Added tests
  • fix: Restricted Read: /System is too permissive for macOS platform default (#11798)
    Update the list of platform defaults included for `ReadOnlyAccess`.
    
    When `ReadOnlyAccess::Restricted::include_platform_defaults` is `true`,
    the policy defined in
    `codex-rs/core/src/seatbelt_platform_defaults.sbpl` is appended to
    enable macOS programs to function properly.
  • feat(core): plumb distinct approval ids for command approvals (#12051)
    zsh fork PR stack:
    - https://github.com/openai/codex/pull/12051 👈 
    - https://github.com/openai/codex/pull/12052
    
    With upcoming support for a fork of zsh that allows us to intercept
    `execve` and run execpolicy checks for each subcommand as part of a
    `CommandExecution`, it will be possible for there to be multiple
    approval requests for a shell command like `/path/to/zsh -lc 'git status
    && rg "TODO" src && make test'`.
    
    To support that, this PR introduces a new `approval_id` field across
    core, protocol, and app-server so that we can associate approvals
    properly for subcommands.
  • Add remote skill scope/product_surface/enabled params and cleanup (#11801)
    - `skills/remote/list`: params `hazelnutScope`, `productSurface`, `enabled`;
      returns `data: { id, name, description }[]`
    - `skills/remote/export`: params `hazelnutId`; returns `{ id, path }`
  • Feat: add model reroute notification (#12001)
    ### Summary
    Building off
    https://github.com/openai/codex/pull/11964/files/5c75aa7b89a70bc2cc410a6fd238749306ec4c5e#diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0,
    we have created a `model/rerouted` notification that captures the event
    so that consumers can render it as expected. The `EventMsg::Warning`
    path is kept in core so that TUI rendering is unaffected.
    
    `model/rerouted` is intended to be generic so it can cover future uses,
    such as capacity planning.
  • chore(core) rm Feature::RequestRule (#11866)
    ## Summary
    This feature is now reasonably stable, so let's remove it to simplify
    our upcoming iterations here.
    
    ## Testing 
    - [x] Existing tests pass
  • fix: show user warning when using default fallback metadata (#11690)
    ### What
    It's currently unclear when the harness falls back to the default,
    generic `ModelInfo`. This happens when the `remote_models` feature is
    disabled or the model is truly unknown, and it can degrade performance
    and cause issues in the harness.
    
    Add a user-facing warning when this happens so users know when their
    setup is broken.
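    A minimal sketch of the idea, assuming a lookup that reports whether it had to fall back so the caller can surface a warning. `resolve_with_warning`, the `Resolved` enum, the known-model list, and the warning text are all hypothetical, not the actual harness code or message.

    ```rust
    /// Hypothetical outcome of resolving model metadata.
    #[derive(Debug, PartialEq)]
    pub enum Resolved {
        Known(String),
        /// Generic default metadata was used because the model could not be resolved.
        Fallback,
    }

    /// Looks the model up and, on fallback, returns a warning the caller can show the user.
    pub fn resolve_with_warning(
        model: &str,
        remote_models_enabled: bool,
        known: &[&str],
    ) -> (Resolved, Option<String>) {
        if remote_models_enabled && known.contains(&model) {
            (Resolved::Known(model.to_string()), None)
        } else {
            // Either `remote_models` is disabled or the model is unknown: warn instead of silently degrading.
            let warning = format!(
                "Model metadata for `{model}` not found; using default fallback metadata, which may degrade performance."
            );
            (Resolved::Fallback, Some(warning))
        }
    }
    ```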
    
    ### Tests
    Added tests, tested locally.
  • feat(core): add structured network approval plumbing and policy decision model (#11672)
    ### Description
    #### Summary
    Introduces the core plumbing required for structured network approvals.
    
    #### What changed
    - Added structured network policy decision modeling in core.
    - Added approval payload/context types needed for network approval
    semantics.
    - Wired shell/unified-exec runtime plumbing to consume structured
    decisions.
    - Updated related core error/event surfaces for structured handling.
    - Updated protocol plumbing used by core approval flow.
    - Included small CLI debug sandbox compatibility updates needed by this
    layer.
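    As an illustration of what a structured network policy decision can look like (instead of a bare allow/deny boolean), here is a sketch. The `NetworkDecision` enum, the deny-before-allow precedence, and the subdomain matching rule are assumptions for illustration, not the model introduced by this PR.

    ```rust
    /// Hypothetical structured outcome of evaluating a network request.
    #[derive(Debug, PartialEq)]
    pub enum NetworkDecision {
        Allow,
        Deny { reason: String },
        /// Not covered by policy: surface an approval request carrying the host as context.
        NeedsApproval { host: String },
    }

    /// Returns true if `host` equals `domain` or is a subdomain of it.
    fn host_matches(host: &str, domain: &str) -> bool {
        host == domain || host.ends_with(&format!(".{domain}"))
    }

    /// Evaluates `host` against allow/deny lists; deny entries take precedence.
    pub fn evaluate(host: &str, allowed: &[&str], denied: &[&str]) -> NetworkDecision {
        if let Some(d) = denied.iter().find(|d| host_matches(host, d)) {
            return NetworkDecision::Deny { reason: format!("{host} matches denied domain {d}") };
        }
        if allowed.iter().any(|d| host_matches(host, d)) {
            return NetworkDecision::Allow;
        }
        NetworkDecision::NeedsApproval { host: host.to_string() }
    }
    ```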
    
    #### Why
    Establishes the minimal backend foundation for network approvals without
    yet changing high-level orchestration or TUI behavior.
    
    #### Notes
    - Behavior remains constrained by existing requirements/config gating.
    - Follow-up PRs in the stack handle orchestration, UX, and app-server
    integration.
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • fix(protocol): make local image test Bazel-friendly (#11799)
    Fixes Bazel build failure in //codex-rs/protocol:protocol-unit-tests.
    
    The test used `include_bytes!` to read a PNG from codex-core assets. Cargo
    can read that file, but the Bazel sandbox cannot, so the crate fails to compile.
    
    This change inlines a tiny valid PNG in the test to keep it hermetic.
    
    Related regression: #10590 (cc: @charley-oai)
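    For illustration, here is a hermetic approach in the same spirit: instead of `include_bytes!` on a file owned by another crate, the test builds a 1x1 grayscale PNG in memory, computing the chunk CRC-32s and the zlib wrapper (one stored, uncompressed deflate block plus Adler-32) itself. The helper names are hypothetical; the PR itself inlines a literal byte array.

    ```rust
    /// CRC-32 (IEEE, reflected polynomial 0xEDB88320), as required for PNG chunks.
    fn crc32(bytes: &[u8]) -> u32 {
        let mut crc = 0xFFFF_FFFFu32;
        for &b in bytes {
            crc ^= b as u32;
            for _ in 0..8 {
                let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
                crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
            }
        }
        !crc
    }

    /// Adler-32 checksum that terminates a zlib stream.
    fn adler32(bytes: &[u8]) -> u32 {
        let (mut a, mut b) = (1u32, 0u32);
        for &x in bytes {
            a = (a + x as u32) % 65521;
            b = (b + a) % 65521;
        }
        (b << 16) | a
    }

    /// Appends one chunk: length, type, data, then CRC over type + data.
    fn push_chunk(out: &mut Vec<u8>, kind: &[u8; 4], data: &[u8]) {
        out.extend_from_slice(&(data.len() as u32).to_be_bytes());
        let mut crc_input = kind.to_vec();
        crc_input.extend_from_slice(data);
        out.extend_from_slice(&crc_input);
        out.extend_from_slice(&crc32(&crc_input).to_be_bytes());
    }

    /// Builds a valid 1x1, 8-bit grayscale PNG without touching the filesystem.
    fn tiny_png() -> Vec<u8> {
        let mut png = vec![0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
        // IHDR: width 1, height 1, bit depth 8, color type 0, default compression/filter/interlace.
        let mut ihdr = Vec::new();
        ihdr.extend_from_slice(&1u32.to_be_bytes());
        ihdr.extend_from_slice(&1u32.to_be_bytes());
        ihdr.extend_from_slice(&[8, 0, 0, 0, 0]);
        push_chunk(&mut png, b"IHDR", &ihdr);
        // One scanline: filter byte 0 followed by a single black pixel.
        let raw = [0u8, 0u8];
        let len = raw.len() as u16;
        // zlib header 0x78 0x01, then a final stored block (0x01), LEN, NLEN, data, Adler-32.
        let mut idat = vec![0x78, 0x01, 0x01];
        idat.extend_from_slice(&len.to_le_bytes());
        idat.extend_from_slice(&(!len).to_le_bytes());
        idat.extend_from_slice(&raw);
        idat.extend_from_slice(&adler32(&raw).to_be_bytes());
        push_chunk(&mut png, b"IDAT", &idat);
        push_chunk(&mut png, b"IEND", &[]);
        png
    }
    ```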
  • tui: preserve remote image attachments across resume/backtrack (#10590)
    ## Summary
    This PR makes app-server-provided image URLs first-class attachments in
    TUI, so they survive resume/backtrack/history recall and are resubmitted
    correctly.
    
    <img width="715" height="491" alt="Screenshot 2026-02-12 at 8 27 08 PM"
    src="https://github.com/user-attachments/assets/226cbd35-8f0c-4e51-a13e-459ef5dd1927"
    />
    
    The attached image can be deleted when backtracking:
    <img width="716" height="301" alt="Screenshot 2026-02-12 at 8 27 31 PM"
    src="https://github.com/user-attachments/assets/4558d230-f1bd-4eed-a093-8e1ab9c6db27"
    />
    
    In both history and composer, remote images are rendered as normal
    `[Image #N]` placeholders, with numbering unified with local images.
    
    ## What changed
    - Plumb remote image URLs through TUI message state:
      - `UserHistoryCell`
      - `BacktrackSelection`
      - `ChatComposerHistory::HistoryEntry`
      - `ChatWidget::UserMessage`
    - Show remote images as placeholder rows inside the composer box (above
    textarea), and in history cells.
    - Support keyboard selection/deletion for remote image rows in composer
    (`Up`/`Down`, `Delete`/`Backspace`).
    - Preserve remote-image-only turns in local composer history (Up/Down
    recall), including restore after backtrack.
    - Ensure submit/queue/backtrack resubmit include remote images in model
    input (`UserInput::Image`), and keep request shape stable for
    remote-image-only turns.
    - Keep image numbering contiguous across remote + local images:
      - remote images occupy `[Image #1]..[Image #M]`
      - local images start at `[Image #M+1]`
      - deletion renumbers consistently.
    - In protocol conversion, increment shared image index for remote images
    too, so mixed remote/local image tags stay in a single sequence.
    - Simplify restore logic to trust in-memory attachment order (no
    placeholder-number parsing path).
    - Backtrack/replay rollback handling now queues trims through
    `AppEvent::ApplyThreadRollback` and syncs transcript overlay/deferred
    lines after trims, so overlay/transcript state stays consistent.
    - Trim trailing blank rendered lines from user history rendering to
    avoid oversized blank padding.
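    The numbering rule above (remote images first, then local images, renumbered on deletion) can be sketched by deriving labels from position. The `Attachment` enum and `placeholders` function are hypothetical; the real TUI tracks this in its composer and history-cell state.

    ```rust
    /// Hypothetical attachment kinds in composer order.
    #[derive(Clone, Debug, PartialEq)]
    pub enum Attachment {
        Remote(String), // image URL provided by the app-server
        Local(String),  // local file path
    }

    /// Assigns `[Image #N]` labels: remote images take 1..=M, local images continue from M+1.
    /// Because labels are derived from position, deleting an attachment renumbers the rest.
    pub fn placeholders(attachments: &[Attachment]) -> Vec<(String, &Attachment)> {
        let remote = attachments.iter().filter(|a| matches!(a, Attachment::Remote(_)));
        let local = attachments.iter().filter(|a| matches!(a, Attachment::Local(_)));
        remote
            .chain(local)
            .enumerate()
            .map(|(i, a)| (format!("[Image #{}]", i + 1), a))
            .collect()
    }
    ```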
    
    ## Docs + tests
    - Updated: `docs/tui-chat-composer.md` (remote image flow,
    selection/deletion, numbering offsets)
    - Added/updated tests across `tui/src/chatwidget/tests.rs`,
    `tui/src/app.rs`, `tui/src/app_backtrack.rs`, `tui/src/history_cell.rs`,
    and `tui/src/bottom_pane/chat_composer.rs`
    - Added snapshot coverage for remote image composer states, including
    deleting the first of two remote images.
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-tui`
    
    ## Codex author
    `codex fork 019c2636-1571-74a1-8471-15a3b1c3f49d`
  • Persist complete TurnContextItem state via canonical conversion (#11656)
    ## Summary
    
    This PR delivers the first small, shippable step toward model-visible
    state diffing by making `TurnContextItem` more complete and
    standardizing how it is built.
    
    Specifically, it:
    - Adds persisted network context to `TurnContextItem`.
    - Introduces a single canonical `TurnContext -> TurnContextItem`
    conversion path.
    - Routes existing rollout write sites through that canonical conversion
    helper.
    
    No context injection/diff behavior changes are included in this PR.
    
    ## Why this change
    
    The design goal is to make `TurnContextItem` the canonical source of
    truth for context-diff decisions.
    Before this PR:
    - `TurnContextItem` did not include all TurnContext-derived environment
    inputs needed for v1 completeness.
    - Construction was duplicated at multiple write sites.
    
    This PR addresses both with a minimal, reviewable change.
    
    ## Changes
    
    ### 1) Extend `TurnContextItem` with network state
    - Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
    - Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
    - Kept backward compatibility by making the new field optional and
    skipped when absent.
    
    Files:
    - `codex-rs/protocol/src/protocol.rs`
    
    ### 2) Canonical conversion helper
    - Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
    - Added internal helper to derive network fields from
    `config_layer_stack.requirements().network`.
    
    Files:
    - `codex-rs/core/src/codex.rs`
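    A sketch of the single-conversion-path idea, assuming simplified stand-in types: `TurnContext`, `NetworkRequirements`, and the `model`/`cwd` fields are hypothetical reductions of the real structs, and `collaboration_mode` is modeled as a plain optional string.

    ```rust
    /// Hypothetical network requirements taken from the config layer stack.
    #[derive(Clone)]
    pub struct NetworkRequirements {
        pub allowed_domains: Vec<String>,
        pub denied_domains: Vec<String>,
    }

    #[derive(Debug, PartialEq)]
    pub struct TurnContextNetworkItem {
        pub allowed_domains: Vec<String>,
        pub denied_domains: Vec<String>,
    }

    #[derive(Debug, PartialEq)]
    pub struct TurnContextItem {
        pub model: String,
        pub cwd: String,
        pub collaboration_mode: Option<String>,
        /// Optional so that older rollout lines without `network` stay readable.
        pub network: Option<TurnContextNetworkItem>,
    }

    /// Hypothetical, trimmed-down turn context.
    pub struct TurnContext {
        pub model: String,
        pub cwd: String,
        pub network_requirements: Option<NetworkRequirements>,
    }

    impl TurnContext {
        /// The one place that builds a `TurnContextItem`; every rollout write site calls this.
        pub fn to_turn_context_item(&self, collaboration_mode: Option<String>) -> TurnContextItem {
            TurnContextItem {
                model: self.model.clone(),
                cwd: self.cwd.clone(),
                collaboration_mode,
                network: self.network_requirements.as_ref().map(|n| TurnContextNetworkItem {
                    allowed_domains: n.allowed_domains.clone(),
                    denied_domains: n.denied_domains.clone(),
                }),
            }
        }
    }
    ```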
    
    ### 3) Use canonical conversion at rollout write sites
    - Replaced ad hoc `TurnContextItem { ... }` construction with
    `to_turn_context_item(...)` in:
      - sampling request path
      - compaction path
    
    Files:
    - `codex-rs/core/src/codex.rs`
    - `codex-rs/core/src/compact.rs`
    
    ### 4) Update fixtures/tests for new optional field
    - Updated existing `TurnContextItem` literals in tests to include
    `network: None`.
    - Added protocol tests for:
      - deserializing old payloads with no `network`
      - serializing when `network` is present
    
    Files:
    - `codex-rs/core/tests/suite/resume_warning.rs`
    
    ## Behavior and compatibility
    - No replay/diff logic changes.
    - Persisted rollout `TurnContextItem` now carries additional network
    context when available.
    - Older rollout lines without `network` remain readable.
  • [apps] Add is_enabled to app info. (#11417)
    - [x] Add is_enabled to app info and the response of `app/list`.
    - [x] Update TUI to have Enable/Disable button on the app detail page.
  • chore(core) Deprecate approval_policy: on-failure (#11631)
    ## Summary
    In an effort to start simplifying our sandbox setup, we're announcing
    this approval_policy as deprecated. In general, it performs worse than
    `on-request`, and we're focusing on making fewer sandbox configurations
    perform much better.
    
    ## Testing
    - [x] Tested locally
    - [x] Existing tests pass
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (e.g.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness primarily affects the UI, since the model sees `ResponseItem`s,
    not `EventMsg`s.
    
    ### Approach
    This change introduces an opt-in `persist_extended_history` flag to preserve
    those events when you start/resume/fork a thread (defaults to `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
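    The mode switch can be pictured as a filter on which events reach the rollout file. In this sketch the `Event` enum is a hypothetical, abbreviated stand-in for `EventMsg` (variant names are illustrative), and `should_persist` is not the recorder's actual function.

    ```rust
    #[derive(Clone, Copy, Debug, Default, PartialEq)]
    pub enum EventPersistenceMode {
        #[default]
        Limited,
        Extended,
    }

    /// Hypothetical, abbreviated stand-in for `EventMsg`.
    pub enum Event {
        UserMessage,
        AgentMessage,
        AgentReasoning,
        WebSearchEnd,
        ExecCommandEnd,
        PatchApplyEnd,
        McpToolCallEnd,
        ContextCompacted,
        Error,
        /// Any other event; out of scope for this sketch.
        Other,
    }

    /// Returns whether `event` should be written to the rollout in the given mode.
    pub fn should_persist(event: &Event, mode: EventPersistenceMode) -> bool {
        match event {
            // The existing, limited subset is persisted in both modes.
            Event::UserMessage | Event::AgentMessage | Event::AgentReasoning => true,
            // Persisted only when extended history is requested.
            Event::WebSearchEnd
            | Event::ExecCommandEnd
            | Event::PatchApplyEnd
            | Event::McpToolCallEnd
            | Event::ContextCompacted
            | Event::Error => mode == EventPersistenceMode::Extended,
            Event::Other => false,
        }
    }
    ```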
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
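    A byte-bounded truncation that never splits a UTF-8 character can be sketched as follows. This is a hypothetical helper, not the actual `truncate_text` from core, which may use a different strategy (for example, keeping both head and tail).

    ```rust
    /// Upper bound on persisted command output, matching the 10,000-byte figure above.
    pub const MAX_PERSISTED_OUTPUT_BYTES: usize = 10_000;

    /// Truncates `text` to at most `max_bytes` bytes without splitting a UTF-8 character.
    pub fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
        if text.len() <= max_bytes {
            return text;
        }
        // Walk back from the limit to the nearest char boundary (index 0 is always one).
        let mut end = max_bytes;
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }
    ```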
    
    We also persist `EventMsg::Error`, which we can now map back to the
    Turn's status and use to populate the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously, it was not obvious whether a
    command had failed or been declined (and likewise for apply_patch). Since these
    EventMsgs were never persisted before, I made `status` a required field.
  • feat: mem slash commands (#11569)
    Add 2 slash commands for memories:
    * `/m_drop` deletes all memories
    * `/m_update` updates the memories with phases 1 and 2
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
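    To make the shape concrete, here is a sketch of the read-access model and how the read portion of a Seatbelt profile might be emitted from it, plus the fail-closed check for backends without support. The `(allow file-read* (subpath ...))` form is standard SBPL; the `PLATFORM_DEFAULTS` placeholder (standing in for `seatbelt_platform_defaults.sbpl`), the function names, and the string error are hypothetical.

    ```rust
    use std::path::PathBuf;

    #[derive(Clone, Debug, PartialEq)]
    pub enum ReadOnlyAccess {
        Restricted { include_platform_defaults: bool, readable_roots: Vec<PathBuf> },
        FullAccess,
    }

    /// Placeholder for the contents of `seatbelt_platform_defaults.sbpl`.
    const PLATFORM_DEFAULTS: &str = "; platform default read rules would go here";

    /// Emits the read portion of a Seatbelt (macOS) profile.
    pub fn seatbelt_read_rules(access: &ReadOnlyAccess) -> String {
        match access {
            ReadOnlyAccess::FullAccess => "(allow file-read*)".to_string(),
            ReadOnlyAccess::Restricted { include_platform_defaults, readable_roots } => {
                // One scoped rule per readable root instead of a blanket read grant.
                let mut rules: Vec<String> = readable_roots
                    .iter()
                    .map(|root| format!("(allow file-read* (subpath \"{}\"))", root.display()))
                    .collect();
                if *include_platform_defaults {
                    rules.push(PLATFORM_DEFAULTS.to_string());
                }
                rules.join("\n")
            }
        }
    }

    /// Backends without restricted-read support fail closed instead of granting full reads.
    pub fn check_backend_support(access: &ReadOnlyAccess, supports_restricted: bool) -> Result<(), String> {
        match access {
            ReadOnlyAccess::Restricted { .. } if !supports_restricted => {
                Err("UnsupportedOperation: restricted read access".to_string())
            }
            _ => Ok(()),
        }
    }
    ```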
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • Update context window after model switch (#11520)
    - Update token usage aggregation to refresh model context window after a
    model change.
    - Add protocol/core tests, including an e2e model-switch test that
    validates switching to a smaller model updates telemetry.
  • Clamp auto-compact limit to context window (#11516)
    - Clamp auto-compaction to the minimum of the configured limit and 90% of
    the context window
    - Add an e2e compact test for clamped behavior
    - Update remote compact tests to account for earlier auto-compaction in
    setup turns
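    The clamping rule is a simple minimum. In this sketch the function name and the integer rounding (multiply by 9, then divide by 10) are assumptions, not the exact core implementation.

    ```rust
    /// Effective auto-compaction threshold: the configured limit, but never above 90% of the window.
    pub fn effective_auto_compact_limit(configured_limit: u64, context_window: u64) -> u64 {
        let ninety_percent = context_window * 9 / 10;
        configured_limit.min(ninety_percent)
    }
    ```

    For example, with a 200,000-token window, a configured limit of 300,000 is clamped to 180,000, while a configured limit of 100,000 is kept as-is.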
  • change model cap to server overload (#11388)
  • feat: support multiple rate limits (#11260)
    - Added multi-limit support end-to-end by carrying `limit_name` in
    rate-limit snapshots and handling multiple buckets instead of only
    `codex`.
    - Extended `/usage` client parsing to consume `additional_rate_limits`.
    - Updated TUI `/status` and in-memory state to store and render per-limit
    snapshots.
    - Extended the app-server rate-limit read response: kept `rate_limits` and
    added `rate_limits_by_name`.
    - Adjusted usage-limit error messaging for non-default codex limit buckets.
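    One way to picture keeping `rate_limits` while adding `rate_limits_by_name` is to index snapshots by their `limit_name` and still expose the default `codex` bucket separately. The `RateLimitSnapshot` struct, its `used_percent` field, and `group_by_name` are hypothetical simplifications.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical rate-limit snapshot tagged with the bucket it belongs to.
    #[derive(Clone, Debug, PartialEq)]
    pub struct RateLimitSnapshot {
        pub limit_name: String,
        pub used_percent: f64,
    }

    /// Returns the default `codex` bucket (the single-limit view older clients read)
    /// alongside a map of every bucket keyed by name.
    pub fn group_by_name(
        snapshots: Vec<RateLimitSnapshot>,
    ) -> (Option<RateLimitSnapshot>, HashMap<String, RateLimitSnapshot>) {
        let by_name: HashMap<String, RateLimitSnapshot> = snapshots
            .into_iter()
            .map(|s| (s.limit_name.clone(), s))
            .collect();
        let default = by_name.get("codex").cloned();
        (default, by_name)
    }
    ```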
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn_id is constructed in memory;
    2. on resuming threads, turn_id might not be unique;
    3. clients cannot easily know the boundary of a turn from rollout files.
    
    This PR does three things:
    1. persists `task_started` and `task_complete` events;
    2. persists `turn_id` in rollout turn events;
    3. generates turn_id as a unique UUID instead of incrementing it in
    memory.
    
    This resolves the issue of clients wanting unique turn
    ids when resuming a thread, and knowing the boundary of each turn in
    rollout files.
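    With `task_started`/`task_complete` persisted and carrying `turn_id`, a reader can recover turn boundaries from the rollout in a single pass. A sketch, assuming a simplified `RolloutLine` enum with string items; these names are hypothetical and do not reflect the real rollout schema.

    ```rust
    /// Hypothetical, simplified rollout lines.
    pub enum RolloutLine {
        TaskStarted { turn_id: String },
        Item(String),
        TaskComplete { turn_id: String },
    }

    #[derive(Debug, PartialEq)]
    pub struct Turn {
        pub id: String,
        pub items: Vec<String>,
        pub completed: bool,
    }

    /// Groups items into turns using the persisted start/complete markers.
    pub fn turns_from_rollout(lines: &[RolloutLine]) -> Vec<Turn> {
        let mut turns: Vec<Turn> = Vec::new();
        for line in lines {
            match line {
                RolloutLine::TaskStarted { turn_id } => turns.push(Turn {
                    id: turn_id.clone(),
                    items: Vec::new(),
                    completed: false,
                }),
                RolloutLine::Item(item) => {
                    // Items outside an open turn are ignored in this sketch.
                    if let Some(turn) = turns.last_mut() {
                        if !turn.completed {
                            turn.items.push(item.clone());
                        }
                    }
                }
                RolloutLine::TaskComplete { turn_id } => {
                    if let Some(turn) = turns.iter_mut().rev().find(|t| &t.id == turn_id) {
                        turn.completed = true;
                    }
                }
            }
        }
        turns
    }
    ```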
    
    Example debug logs:
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    Backward compatibility:
    if you resume an old session whose rollout lacks `task_started` and
    `task_complete` events, the following happens:
    - If you resume and do nothing, the reconstructed historical IDs can
    differ the next time you resume.
    - If you resume and send a new turn, the new turn gets a fresh UUID from
    the live submission flow and is persisted, so that turn's ID is stable
    on later resumes.
    I think this behavior is fine, because we only need deterministic
    turn IDs once a turn is triggered.
  • Prefer websocket transport when model opts in (#11386)
    Summary
    - add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
    in all fixtures and constructors
    - wire the new flag into websocket selection so models that opt in
    always use websocket transport even when the feature gate is off
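    The selection rule reduces to a boolean OR. This sketch assumes the feature gate alone otherwise enables websockets; the trimmed `ModelInfo`, the `Transport` enum (with `Http` standing for the non-websocket streaming path), and `select_transport` are hypothetical.

    ```rust
    /// Hypothetical subset of `ModelInfo`; `prefer_websockets` defaults to `false`.
    #[derive(Default)]
    pub struct ModelInfo {
        pub prefer_websockets: bool,
    }

    #[derive(Debug, PartialEq)]
    pub enum Transport {
        Http,
        WebSocket,
    }

    /// A model that opts in always gets websockets, even when the feature gate is off.
    pub fn select_transport(websocket_feature_enabled: bool, model: &ModelInfo) -> Transport {
        if websocket_feature_enabled || model.prefer_websockets {
            Transport::WebSocket
        } else {
            Transport::Http
        }
    }
    ```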
    
    Testing
    - Not run (not requested)
  • Fix: update parallel tool call exec approval to approve on request id (#11162)
    ### Summary
    
    With parallel tool calls, exec command approvals were applied at the
    turn level rather than per request: when a single request was
    approved, the system treated every request in the turn as approved.
    
    ### Before
    
    https://github.com/user-attachments/assets/d50ed129-b3d2-4b2f-97fa-8601eb11f6a8
    
    ### After
    
    https://github.com/user-attachments/assets/36528a43-a4aa-4775-9e12-f13287ef19fc