Commit Graph

345 Commits

  • Feat: Preserve network access on read-only sandbox policies (#13409)
    ## Summary
    
    `PermissionProfile.network` could not be preserved when additional or
    compiled permissions resolved to
    `SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
    field. This change makes read-only + network
    enabled representable directly and threads that through the protocol,
    app-server v2 mirror, and permission-
      merging logic.
    
    ## What changed
    
    - Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
    protocol and app-server v2 protocol.
    - Kept backward compatibility by defaulting the new field to false, so
    legacy read-only payloads still
        deserialize unchanged.
    - Updated `has_full_network_access()` and sandbox summaries to respect
    read-only network access.
      - Preserved PermissionProfile.network when:
          - compiling skill permission profiles into sandbox policies
          - normalizing additional permissions
          - merging additional permissions into existing sandbox policies
    - Updated the approval overlay to show network in the rendered
    permission rule when requested.
      - Regenerated app-server schema fixtures for the new v2 wire shape.
  • feat(app-server): propagate app-server trace context into core (#13368)
    ### Summary
    Propagate trace context originating at app-server RPC method handlers ->
    codex core submission loop (so this includes spans such as `run_turn`!).
    This implements PR 2 of the app-server tracing rollout.
    
    This also removes the old lower-level env-based reparenting in core so
    explicit request/submission ancestry wins instead of being overridden by
    ambient `TRACEPARENT` state.
    
    ### What changed
    - Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
    - Taught `Codex::submit()` / `submit_with_id()` to automatically capture
    the current span context when constructing or forwarding a submission
    - Wrapped the core submission loop in a submission_dispatch span
    parented from Submission.trace
    - Warn on invalid submission trace carriers and ignore them cleanly
    - Removed the old env-based downstream reparenting path in core task
    execution
    - Stopped OTEL provider init from implicitly attaching env trace context
    process-wide
    - Updated mcp-server Submission call sites for the new field
    
    Added focused unit tests for:
    - capturing trace context into Submission
    - preferring `Submission.trace` when building the core dispatch span
    
    ### Why
    PR 1 gave us consistent inbound request spans in app-server, but that
    only covered the transport boundary. For long-running work like turns
    and reviews, the important missing piece was preserving ancestry after
    the request handler returns and core continues work on a different async
    path.
    
    This change makes that handoff explicit and keeps the parentage rules
    simple:
    - app-server request span sets the current context
    - `Submission.trace` snapshots that context
    - core restores it once, at the submission boundary
    - deeper core spans inherit naturally
    
    That also lets us stop relying on env-based reparenting for this path,
    which was too ambient and could override explicit ancestry.
  • Add under-development original-resolution view_image support (#13050)
    ## Summary
    
    Add original-resolution support for `view_image` behind the
    under-development `view_image_original_resolution` feature flag.
    
    When the flag is enabled and the target model is `gpt-5.3-codex` or
    newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
    `detail: "original"` to the Responses API instead of using the legacy
    resize/compress path.
    
    ## What changed
    
    - Added `view_image_original_resolution` as an under-development feature
    flag.
    - Added `ImageDetail` to the protocol models and support for serializing
    `detail: "original"` on tool-returned images.
    - Added `PromptImageMode::Original` to `codex-utils-image`.
      - Preserves original PNG/JPEG/WebP bytes.
      - Keeps legacy behavior for the resize path.
    - Updated `view_image` to:
    - use the shared `local_image_content_items_with_label_number(...)`
    helper in both code paths
      - select original-resolution mode only when:
        - the feature flag is enabled, and
        - the model slug parses as `gpt-5.3-codex` or newer
    - Kept local user image attachments on the existing resize path; this
    change is specific to `view_image`.
    - Updated history/image accounting so only `detail: "original"` images
    use the docs-based GPT-5 image cost calculation; legacy images still use
    the old fixed estimate.
    - Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
    at 85% quality unless lossless is required, while still allowing other
    formats when explicitly requested.
    - Updated tests and helper code that construct
    `FunctionCallOutputContentItem::InputImage` to carry the new `detail`
    field.
    
    ## Behavior
    
    ### Feature off
    - `view_image` keeps the existing resize/re-encode behavior.
    - History estimation keeps the existing fixed-cost heuristic.
    
    ### Feature on + `gpt-5.3-codex+`
    - `view_image` sends original-resolution images with `detail:
    "original"`.
    - PNG/JPEG/WebP source bytes are preserved when possible.
    - History estimation uses the GPT-5 docs-based image-cost calculation
    for those `detail: "original"` images.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/13050
    -  `2` https://github.com/openai/codex/pull/13331
    -  `3` https://github.com/openai/codex/pull/13049
  • realtime prompt changes (#13376)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • app-server service tier plumbing (plus some cleanup) (#13334)
    followup to https://github.com/openai/codex/pull/13212 to expose fast
    tier controls to app server
    (majority of this PR is generated schema jsons - actual code is +69 /
    -35 and +24 tests )
    
    - add service tier fields to the app-server protocol surfaces used by
    thread lifecycle, turn start, config, and session configured events
    - thread service tier through the app-server message processor and core
    thread config snapshots
    - allow runtime config overrides to carry service tier for app-server
    callers
    
    cleanup:
    - Removing useless "legacy" code supporting "standard" - we moved to
    None | "fast", so "standard" is not needed.
  • add fast mode toggle (#13212)
    - add a local Fast mode setting in codex-core (similar to how model id
    is currently stored on disk locally)
    - send `service_tier=priority` on requests when Fast is enabled
    - add `/fast` in the TUI and persist it locally
    - feature flag
  • Update realtime websocket API (#13265)
    - migrate the realtime websocket transport to the new session and
    handoff flow
    - make the realtime model configurable in config.toml and use API-key
    auth for the websocket
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server): add tracing to all app-server APIs (#13285)
    ### Overview
    This PR adds the first piece of tracing for app-server JSON-RPC
    requests.
    
    There are two main changes:
    - JSON-RPC requests can now take an optional W3C trace context at the
    top level via a `trace` field (`traceparent` / `tracestate`).
    - app-server now creates a dedicated request span for every inbound
    JSON-RPC request in `MessageProcessor`, and uses the request-level trace
    context as the parent when present.
    
    For compatibility with existing flows, app-server still falls back to
    the TRACEPARENT env var when there is no request-level traceparent.
    
    This PR is intentionally scoped to the app-server boundary. In a
    followup, we'll actually propagate trace context through the async
    handoff into core execution spans like run_turn, which will make
    app-server traces much more useful.
    
    ### Spans
    A few details on the app-server span shape:
    - each inbound request gets its own server span
    - span/resource names are based on the JSON-RPC method (`initialize`,
    `thread/start`, `turn/start`, etc.)
    - spans record transport (stdio vs websocket), request id, connection
    id, and client name/version when available
    - `initialize` stores client metadata in session state so later requests
    on the same connection can reuse it
  • feat: polluted memories (#13008)
    Add a feature flag to disable memory creation for "polluted"
  • Record realtime close marker on replacement (#13058)
    ## Summary
    - record a realtime close developer message when a new realtime session
    replaces an active one
    - assert the replacement marker through the mocked responses request
    path
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Charles Cunningham <ccunningham@openai.com>
  • Unify rollout reconstruction with resume/fork TurnContext hydration (#12612)
    ## Summary
    
    This PR unifies rollout history reconstruction and resume/fork metadata
    hydration under a single `Session::reconstruct_history_from_rollout`
    implementation.
    
    The key change from main is that replay metadata now comes from the same
    reconstruction pass that rebuilds model-visible history, instead of
    doing a second bespoke rollout scan to recover `previous_model` /
    `reference_context_item`.
    
    ## What Changed
    
    ### Unified reconstruction output
    
    `reconstruct_history_from_rollout` now returns a single
    `RolloutReconstruction` bundle containing:
    
    - rebuilt `history`
    - `previous_model`
    - `reference_context_item`
    
    Resume and fork both consume that shared output directly.
    
    ### Reverse replay core
    
    The reconstruction logic moved into
    `codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans
    rollout items newest-to-oldest.
    
    That reverse pass:
    
    - derives `previous_model`
    - derives whether `reference_context_item` is preserved or cleared
    - stops early once it has both resume metadata and a surviving
    `replacement_history` checkpoint
    
    History materialization is still bridged eagerly for now by replaying
    only the surviving suffix forward, which keeps the history result stable
    while moving the control flow toward the future lazy reverse loader
    design.
    
    ### Removed bespoke context lookup
    
    This deletes `last_rollout_regular_turn_context_lookup` and its separate
    compaction-aware scan.
    
    The previous model / baseline metadata is now computed from the same
    replay state that rebuilds history, so resume/fork cannot drift from the
    reconstructed transcript view.
    
    ### `TurnContextItem` persistence contract
    
    `TurnContextItem` is now treated as the replay source of truth for
    durable model-visible baselines.
    
    This PR keeps the following contract explicit:
    
    - persist `TurnContextItem` for the first real user turn so resume can
    recover `previous_model`
    - persist it for later turns that emit model-visible context updates
    - if mid-turn compaction reinjects full initial context into replacement
    history, persist a fresh `TurnContextItem` after `Compacted` so
    resume/fork can re-establish the baseline from the rewritten history
    - do not treat manual compaction or pre-sampling compaction as creating
    a new durable baseline on their own
    
    ## Behavior Preserved
    
    - rollback replay stays aligned with `drop_last_n_user_turns`
    - rollback skips only user turns
    - incomplete active user turns are dropped before older finalized turns
    when rollback applies
    - unmatched aborts do not consume the current active turn
    - missing abort IDs still conservatively clear stale compaction state
    - compaction clears `reference_context_item` until a later
    `TurnContextItem` re-establishes it
    - `previous_model` still comes from the newest surviving user turn that
    established one
    
    ## Tests
    
    Targeted validation run for the current branch shape:
    
    - `cd codex-rs && cargo test -p codex-core --lib
    codex::rollout_reconstruction_tests -- --nocapture`
    - `cd codex-rs && just fmt`
    
    The branch also extracts the rollout reconstruction tests into
    `codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic
    has a dedicated home instead of living inline in `codex.rs`.
  • Clarify escalation guidance for sandbox-related network failures (#13051)
    This updates the on-request permissions instructions so likely
    sandbox-related network failures during dependency installation are
    treated as escalation candidates.
    
    Repro:
    - Run `codex -a on-request -s workspace-write` in a fresh temp dir.
    - Prompt: `Build a new rust app with one dependency, anyhow, and try
    installing the dependency`.
    - Before this change, DNS/registry failures like `Could not resolve
    host: index.crates.io` could be treated like ordinary transient failures
    and not escalate.
    
    Fix:
    - Clarify that likely sandbox-related network errors such as DNS/host
    resolution, registry/index access, and dependency download failures
    should trigger escalation.
    
    Validation:
    - Rebuild the CLI and rerun the same repro. The same instructions should
    now be more likely to trigger escalation instead of silently stopping.
    
    Related Slack canvas:
    - https://openai.enterprise.slack.com/docs/T0BQTNSUF/F0ACVNJAV09
  • fix: use AbsolutePathBuf for permission profile file roots (#12970)
    ## Why
    `PermissionProfile` should describe filesystem roots as absolute paths
    at the type level. Using `PathBuf` in `FileSystemPermissions` made the
    shared type too permissive and blurred together three different
    deserialization cases:
    
    - skill metadata in `agents/openai.yaml`, where relative paths should
    resolve against the skill directory
    - app-server API payloads, where callers should have to send absolute
    paths
    - local tool-call payloads for commands like `shell_command` and
    `exec_command`, where `additional_permissions.file_system` may
    legitimately be relative to the command `workdir`
    
    This change tightens the shared model without regressing the existing
    local command flow.
    
    ## What Changed
    - changed `protocol::models::FileSystemPermissions` and the app-server
    `AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
    - wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
    relative permission roots in `agents/openai.yaml` resolve against the
    containing skill directory
    - kept app-server/API deserialization strict, so relative
    `additionalPermissions.fileSystem.*` paths are rejected at the boundary
    - restored cwd/workdir-relative deserialization for local tool-call
    payloads by parsing `shell`, `shell_command`, and `exec_command`
    arguments under an `AbsolutePathBufGuard` rooted at the resolved command
    working directory
    - simplified runtime additional-permission normalization so it only
    canonicalizes and deduplicates absolute roots instead of trying to
    recover relative ones later
    - updated the app-server schema fixtures, `app-server/README.md`, and
    the affected transport/TUI tests to match the final behavior
  • Add model availability NUX metadata (#12972)
    - replace show_nux with structured availability_nux model metadata
    - expose availability NUX data through the app-server model API
    - update shared fixtures and tests for the new field
  • Support multimodal custom tool outputs (#12948)
    ## Summary
    
    This changes `custom_tool_call_output` to use the same output payload
    shape as `function_call_output`, so freeform tools can return either
    plain text or structured content items.
    
    The main goal is to let `js_repl` return image content from nested
    `view_image` calls in its own `custom_tool_call_output`, instead of
    relying on a separate injected message.
    
    ## What changed
    
    - Changed `custom_tool_call_output.output` from `string` to
    `FunctionCallOutputPayload`
    - Updated freeform tool plumbing to preserve structured output bodies
    - Updated `js_repl` to aggregate nested tool content items and attach
    them to the outer `js_repl` result
    - Removed the old `js_repl` special case that injected `view_image`
    results as a separate pending user image message
    - Updated normalization/history/truncation paths to handle multimodal
    `custom_tool_call_output`
    - Regenerated app-server protocol schema artifacts
    
    ## Behavior
    
    Direct `view_image` calls still return a `function_call_output` with
    image content.
    
    When `view_image` is called inside `js_repl`, the outer `js_repl`
    `custom_tool_call_output` now carries:
    - an `input_text` item if the JS produced text output
    - one or more `input_image` items from nested tool results
    
    So the nested image result now stays inside the `js_repl` tool output
    instead of being injected as a separate message.
    
    ## Compatibility
    
    This is intended to be backward-compatible for resumed conversations.
    
    Older histories that stored `custom_tool_call_output.output` as a plain
    string still deserialize correctly, and older histories that used the
    previous injected-image-message flow also continue to resume.
    
    Added regression coverage for resuming a pre-change rollout containing:
    - string-valued `custom_tool_call_output`
    - legacy injected image message history
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/12948
  • feat: add local date/timezone to turn environment context (#12947)
    ## Summary
    
    This PR includes the session's local date and timezone in the
    model-visible environment context and persists that data in
    `TurnContextItem`.
    
      ## What changed
    - captures the current local date and IANA timezone when building a turn
    context, with a UTC fallback if the timezone lookup fails
    - includes current_date and timezone in the serialized
    <environment_context> payload
    - stores those fields on TurnContextItem so they survive rollout/history
    handling, subagent review threads, and resume flows
    - treats date/timezone changes as environment updates, so prompt caching
    and context refresh logic do not silently reuse stale time context
    - updates tests to validate the new environment fields without depending
    on a single hardcoded environment-context string
    
    ## test
    
    built a local build and saw it in the rollout file:
    ```
    {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n  <shell>zsh</shell>\n  <current_date>2026-02-26</current_date>\n  <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
    ```
  • Allow clients not to send summary as an option (#12950)
    Summary is a required parameter on UserTurn. Ideally we'd like the core
    to decide the appropriate summary level.
    
    Make the summary optional and don't send it when not needed.
  • feat: include sandbox config with escalation request (#12839)
    ## Why
    
    Before this change, an escalation approval could say that a command
    should be rerun, but it could not carry the sandbox configuration that
    should still apply when the escalated command is actually spawned.
    
    That left an unsafe gap in the `zsh-fork` skill path: skill scripts
    under `scripts/` that did not declare permissions could be escalated
    without a sandbox, and scripts that did declare permissions could lose
    their bounded sandbox on rerun or cached session approval.
    
    This PR extends the escalation protocol so approvals can optionally
    carry sandbox configuration all the way through execution. That lets the
    shell runtime preserve the intended sandbox instead of silently widening
    access.
    
    We likely want a single permissions type for this codepath eventually,
    probably centered on `Permissions`. For now, the protocol needs to
    represent both the existing `PermissionProfile` form and the fuller
    `Permissions` form, so this introduces a temporary disjoint union,
    `EscalationPermissions`, to carry either one.
    
    Further, this means that today, a skill either:
    
    - does not declare any permissions, in which case it is run using the
    default sandbox for the turn
    - specifies permissions, in which case the skill is run using that exact
    sandbox, which might be more restrictive than the default sandbox for
    the turn
    
    We will likely change the skill's permissions to be additive to the
    existing permissions for the turn.
    
    ## What Changed
    
    - Added `EscalationPermissions` to `codex-protocol` so escalation
    requests can carry either a `PermissionProfile` or a full `Permissions`
    payload.
    - Added an explicit `EscalationExecution` mode to the shell escalation
    protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and
    `Permissions(...)` instead of overloading `None`.
    - Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution
    time, which keeps ordinary `UseDefault` commands on the turn sandbox and
    preserves turn-level macOS seatbelt profile extensions.
    - Updated the `zsh-fork` skill path so a skill with no declared
    permissions inherits the conversation's effective sandbox instead of
    escalating unsandboxed.
    - Updated the `zsh-fork` skill path so a skill with declared permissions
    reruns with exactly those permissions, including when a cached session
    approval is reused.
    
    ## Testing
    
    - Added unit coverage in
    `core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit
    `UseDefault` / `RequireEscalated` / `WithAdditionalPermissions`
    execution mapping.
    - Added unit coverage in
    `core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt
    extension preservation in both the `TurnDefault` and
    explicit-permissions rerun paths.
    - Added integration coverage in `core/tests/suite/skill_approval.rs` for
    permissionless skills inheriting the turn sandbox and explicit skill
    permissions remaining bounded across cached approval reuse.
  • Use model catalog default for reasoning summary fallback (#12873)
    ## Summary
    - make `Config.model_reasoning_summary` optional so unset means use
    model default
    - resolve the optional config value to a concrete summary when building
    `TurnContext`
    - add protocol support for `default_reasoning_summary` in model metadata
    
    ## Validation
    - `cargo test -p codex-core --lib client::tests -- --nocapture`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Enforce user input length cap (#12823)
    Currently there is no bound on the length of a user message submitted in
    the TUI or through the app server interface. That means users can paste
    many megabytes of text, which can lead to bad performance, hangs, and
    crashes. In extreme cases, it can lead to a [kernel
    panic](https://github.com/openai/codex/issues/12323).
    
    This PR limits the length of a user input to 2**20 (about 1M)
    characters. This value was chosen because it fills the entire context
    window on the latest models, so accepting longer inputs wouldn't make
    sense anyway.
    
    Summary
    - add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol
    and surface it in TUI and app server code
    - block oversized submissions in the TUI submit flow and emit error
    history cells when validation fails
    - reject heavy app-server requests with JSON-RPC `-32602` and structured
    `input_too_large` data, plus document the behavior
    
    Testing
    - ran the IDE extension with this change and verified that when I
    attempt to paste a user message that's several MB long, it correctly
    reports an error instead of crashing or making my computer hot.
  • feat: include available decisions in command approval requests (#12758)
    Command-approval clients currently infer which choices to show from
    side-channel fields like `networkApprovalContext`,
    `proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
    the request shape harder to evolve, and it forces each client to
    replicate the server's heuristics instead of receiving the exact
    decision list for the prompt.
    
    This PR introduces a mapping between `CommandExecutionApprovalDecision`
    and `codex_protocol::protocol::ReviewDecision`:
    
    ```rust
    impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
        fn from(value: CoreReviewDecision) -> Self {
            match value {
                CoreReviewDecision::Approved => Self::Accept,
                CoreReviewDecision::ApprovedExecpolicyAmendment {
                    proposed_execpolicy_amendment,
                } => Self::AcceptWithExecpolicyAmendment {
                    execpolicy_amendment: proposed_execpolicy_amendment.into(),
                },
                CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
                CoreReviewDecision::NetworkPolicyAmendment {
                    network_policy_amendment,
                } => Self::ApplyNetworkPolicyAmendment {
                    network_policy_amendment: network_policy_amendment.into(),
                },
                CoreReviewDecision::Abort => Self::Cancel,
                CoreReviewDecision::Denied => Self::Decline,
            }
        }
    }
    ```
    
    And updates `CommandExecutionRequestApprovalParams` to have a new field:
    
    ```rust
    available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
    ```
    
    when, if specified, should make it easier for clients to display an
    appropriate list of options in the UI.
    
    This makes it possible for `CoreShellActionProvider::prompt()` in
    `unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
    adding support for `ApprovedForSession` when approving a skill script,
    which was previously missing in the TUI.
    
    Note this results in a significant change to `exec_options()` in
    `approval_overlay.rs`, as the displayed options are now derived from
    `available_decisions: &[ReviewDecision]`.
    
    ## What Changed
    
    - Add `available_decisions` to
    [`ExecApprovalRequestEvent`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/protocol/src/approvals.rs#L111-L175),
    including helpers to derive the legacy default choices when older
    senders omit the field.
    - Map `codex_protocol::protocol::ReviewDecision` to app-server
    `CommandExecutionApprovalDecision` and expose the ordered list as
    experimental `availableDecisions` in
    [`CommandExecutionRequestApprovalParams`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/app-server-protocol/src/protocol/v2.rs#L3798-L3807).
    - Thread optional `available_decisions` through the core approval path
    so Unix shell escalation can explicitly request `ApprovedForSession` for
    session-scoped approvals instead of relying on client heuristics.
    [`unix_escalation.rs`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs#L194-L214)
    - Update the TUI approval overlay to build its buttons from the ordered
    decision list, while preserving the legacy fallback when
    `available_decisions` is missing.
    - Update the app-server README, test client output, and generated schema
    artifacts to document and surface the new field.
    
    ## Testing
    
    - Add `approval_overlay.rs` coverage for explicit decision lists,
    including the generic `ApprovedForSession` path and network approval
    options.
    - Update `chatwidget/tests.rs` and app-server protocol tests to populate
    the new optional field and keep older event shapes working.
    
    ## Developers Docs
    
    - If we document `item/commandExecution/requestApproval` on
    [developers.openai.com/codex](https://developers.openai.com/codex), add
    experimental `availableDecisions` as the preferred source of approval
    choices and note that older servers may omit it.
  • Revert "Add skill approval event/response (#12633)" (#12811)
    This reverts commit https://github.com/openai/codex/pull/12633. We no
    longer need this PR, because we favor sending normal exec command
    approval server request with `additional_permissions` of skill
    permissions instead
  • feat(app-server): add ThreadItem::DynamicToolCall (#12732)
    Previously, clients would call `thread/start` with dynamic_tools set,
    and when a model invokes a dynamic tool, it would just make the
    server->client `item/tool/call` request and wait for the client's
    response to complete the tool call. This works, but it doesn't have an
    `item/started` or `item/completed` event.
    
    Now we are doing this:
    - [new] emit `item/started` with `DynamicToolCall` populated with the
    call arguments
    - send an `item/tool/call` server request
    - [new] once the client responds, emit `item/completed` with
    `DynamicToolCall` populated with the response.
    
    Also, with `persistExtendedHistory: true`, dynamic tool calls are now
    reconstructable in `thread/read` and `thread/resume` as
    `ThreadItem::DynamicToolCall`.
  • chore: migrate additional permissions to PermissionProfile (#12731)
    This PR replaces the old `additional_permissions.fs_read/fs_write` shape
    with a shared `PermissionProfile`
    model and wires it through the command approval, sandboxing, protocol,
    and TUI layers. The schema is adopted from the
    `SkillManifestPermissions`, which is also refactored to use this unified
    struct. This helps us easily expose permission profiles in app
    server/core as a follow-up.
  • feat(core) Introduce Feature::RequestPermissions (#11871)
    ## Summary
    Introduces the initial implementation of Feature::RequestPermissions.
    RequestPermissions allows the model to request that a command be run
    inside the sandbox, with additional permissions, like writing to a
    specific folder. Eventually this will include other rules as well, and
    the ability to persist these permissions, but this PR is already quite
    large - let's get the core flow working and go from there!
    
    <img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
    src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
    />
    
    ## Testing
    - [x] Added tests
    - [x] Tested locally
    - [x] Feature
  • chore: rm hardcoded PRESETS list (#12650)
    rm `PRESETS` list harcoded in `model_presets` as we now have bundled
    `models.json` with equivalent info.
    
    update logic to rely on bundled models instead, update tests.
  • Add skill approval event/response (#12633)
    Set the stage for skill-level permission approval in addition to
    command-level.
    
    Behind a feature flag.
  • feat(core): persist network approvals in execpolicy (#12357)
    ## Summary
    Persist network approval allow/deny decisions as `network_rule(...)`
    entries in execpolicy (not proxy config)
    
    It adds `network_rule` parsing + append support in `codex-execpolicy`,
    including `decision="prompt"` (parse-only; not compiled into proxy
    allow/deny lists)
    - compile execpolicy network rules into proxy allow/deny lists and
    update the live proxy state on approval
    - preserve requirements execpolicy `network_rule(...)` entries when
    merging with file-based execpolicy
    - reject broad wildcard hosts (for example `*`) for persisted
    `network_rule(...)`
  • Fix compaction context reinjection and model baselines (#12252)
    ## Summary
    - move regular-turn context diff/full-context persistence into
    `run_turn` so pre-turn compaction runs before incoming context updates
    are recorded
    - after successful pre-turn compaction, rely on a cleared
    `reference_context_item` to trigger full context reinjection on the
    follow-up regular turn (manual `/compact` keeps replacement history
    summary-only and also clears the baseline)
    - preserve `<model_switch>` when full context is reinjected, and inject
    it *before* the rest of the full-context items
    - scope `reference_context_item` and `previous_model` to regular user
    turns only so standalone tasks (`/compact`, shell, review, undo) cannot
    suppress future reinjection or `<model_switch>` behavior
    - make context-diff persistence + `reference_context_item` updates
    explicit in the regular-turn path, with clearer docs/comments around the
    invariant
    - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
    (only regular turns persist `TurnContextItem` now)
    - simplify resume/fork previous-model/reference-baseline hydration by
    looking up the last surviving turn context from rollout lifecycle
    events, including rollback and compaction-crossing handling
    - remove the legacy fallback that guessed from bare `TurnContext`
    rollouts without lifecycle events
    - update compaction/remote-compaction/model-visible snapshots and
    compact test assertions (including remote compaction mock response
    shape)
    
    ## Why
    We were persisting incoming context items before spawning the regular
    turn task, which let pre-turn compaction requests accidentally include
    incoming context diffs without the new user message. Fixing that exposed
    follow-on baseline issues around `/compact`, resume/fork, and standalone
    tasks that could cause duplicate context injection or suppress
    `<model_switch>` instructions.
    
    This PR re-centers the invariants around regular turns:
    - regular turns persist model-visible context diffs/full reinjection and
    update the `reference_context_item`
    - standalone tasks do not advance those regular-turn baselines
    - compaction clears the baseline when replacement history may have
    stripped the referenced context diffs
    
    ## Follow-ups (TODOs left in code)
    - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
    comprehensively
    - `TODO(ccunningham)`: include pending incoming context items in
    pre-turn compaction threshold estimation
    - `TODO(ccunningham)`: inject updated personality spec alongside
    `<model_switch>` so some model-switch paths can avoid forced full
    reinjection
    - `TODO(ccunningham)`: review task turn lifecycle
    (`TurnStarted`/`TurnComplete`) behavior and emit task-start context
    diffs for task types that should have them (excluding `/compact`)
    
    ## Validation
    - `just fmt`
    - CI should cover the updated compaction/resume/model-visible snapshot
    expectations and rollout-hydration behavior
    - I did **not** rerun the full local test suite after the latest
    resume-lookup / rollout-persistence simplifications
  • Wire realtime api to core (#12268)
    - Introduce `RealtimeConversationManager` for realtime API management 
    - Add `op::conversation` to start conversation, insert audio, insert
    text, and close conversation.
    - emit conversation lifecycle and realtime events.
    - Move shared realtime payload types into codex-protocol and add core
    e2e websocket tests for start/replace/transport-close paths.
    
    Things to consider:
    - Should we use the same `op::` and `Events` channel to carry audio? I
    think we should try this simple approach and later we can create
    separate one if the channels got congested.
    - Sending text updates to the client: we can start simple and later
    restrict that.
    - Provider auth isn't wired for now intentionally
  • feat: cleaner TUI for sub-agents (#12327)
    <img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
    src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
    />
  • feat: add nick name to sub-agents (#12320)
    Adding random nick name to sub-agents. Used for UX
    
    At the same time, also storing and wiring the role of the sub-agent
  • feat: add Reject approval policy with granular prompt rejection controls (#12087)
    ## Why
    
    We need a way to auto-reject specific approval prompt categories without
    switching all approvals off.
    
    The goal is to let users independently control:
    - sandbox escalation approvals,
    - execpolicy `prompt` rule approvals,
    - MCP elicitation prompts.
    
    ## What changed
    
    - Added a new primary approval mode in `protocol/src/protocol.rs`:
    
    ```rust
    pub enum AskForApproval {
        // ...
        Reject(RejectConfig),
        // ...
    }
    
    pub struct RejectConfig {
        pub sandbox_approval: bool,
        pub rules: bool,
        pub mcp_elicitations: bool,
    }
    ```
    
    - Wired `RejectConfig` semantics through approval paths in `core`:
      - `core/src/exec_policy.rs`
        - rejects rule-driven prompts when `rules = true`
        - rejects sandbox/escalation prompts when `sandbox_approval = true`
    - preserves rule priority when both rule and sandbox prompt conditions
    are present
      - `core/src/tools/sandboxing.rs`
    - applies `sandbox_approval` to default exec approval decisions and
    sandbox-failure retry gating
      - `core/src/safety.rs`
    - keeps `Reject { all false }` behavior aligned with `OnRequest` for
    patch safety
        - rejects out-of-root patch approvals when `sandbox_approval = true`
      - `core/src/mcp_connection_manager.rs`
        - auto-declines MCP elicitations when `mcp_elicitations = true`
    
    - Ensured approval policy used by MCP elicitation flow stays in sync
    with constrained session policy updates.
    
    - Updated app-server v2 conversions and generated schema/TypeScript
    artifacts for the new `Reject` shape.
    
    ## Verification
    
    Added focused unit coverage for the new behavior in:
    - `core/src/exec_policy.rs`
    - `core/src/tools/sandboxing.rs`
    - `core/src/mcp_connection_manager.rs`
    - `core/src/safety.rs`
    - `core/src/tools/runtimes/apply_patch.rs`
    
    Key cases covered include rule-vs-sandbox prompt precedence, MCP
    auto-decline behavior, and patch/sandbox retry behavior under
    `RejectConfig`.
  • client side modelinfo overrides (#12101)
    TL;DR
    Add top-level `model_catalog_json` config support so users can supply a
    local model catalog override from a JSON file path (including adding new
    models) without backend changes.
    
    ### Problem
    Codex previously had no clean client-side way to replace/overlay model
    catalog data for local testing of model metadata and new model entries.
    
    ### Fix
    - Add top-level `model_catalog_json` config field (JSON file path).
    - Apply catalog entries when resolving `ModelInfo`:
      1. Base resolved model metadata (remote/fallback)
      2. Catalog overlay from `model_catalog_json`
    3. Existing global top-level overrides (`model_context_window`,
    `model_supports_reasoning_summaries`, etc.)
    
    ### Note
    Will revisit per-field overrides in a follow-up
    
    ### Tests
    Added tests
  • fix: Restricted Read: /System is too permissive for macOS platform de… (#11798)
    …fault
    
    Update the list of platform defaults included for `ReadOnlyAccess`.
    
    When `ReadOnlyAccess::Restricted::include_platform_defaults` is `true`,
    the policy defined in
    `codex-rs/core/src/seatbelt_platform_defaults.sbpl` is appended to
    enable macOS programs to function properly.
  • feat(core): plumb distinct approval ids for command approvals (#12051)
    zsh fork PR stack:
    - https://github.com/openai/codex/pull/12051 👈 
    - https://github.com/openai/codex/pull/12052
    
    With upcoming support for a fork of zsh that allows us to intercept
    `execve` and run execpolicy checks for each subcommand as part of a
    `CommandExecution`, it will be possible for there to be multiple
    approval requests for a shell command like `/path/to/zsh -lc 'git status
    && rg \"TODO\" src && make test'`.
    
    To support that, this PR introduces a new `approval_id` field across
    core, protocol, and app-server so that we can associate approvals
    properly for subcommands.
  • Add remote skill scope/product_surface/enabled params and cleanup (#11801)
    skills/remote/list: params=hazelnutScope, productSurface, enabled;
    returns=data: { id, name, description }[]
    skills/remote/export: params=hazelnutId; returns={ id, path }
  • Feat: add model reroute notification (#12001)
    ### Summary
    Builiding off
    https://github.com/openai/codex/pull/11964/files/5c75aa7b89a70bc2cc410a6fd238749306ec4c5e#diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0,
    we have created a `model/rerouted` notification that captures the event
    so that consumers can render as expected. Keep the `EventMsg::Warning`
    path in core so that this does not affect TUI rendering.
    
    `model/rerouted` is meant to be generic to account for future usage
    including capacity planning etc.
  • chore(core) rm Feature::RequestRule (#11866)
    ## Summary
    This feature is now reasonably stable, let's remove it so we can
    simplify our upcoming iterations here.
    
    ## Testing 
    - [x] Existing tests pass
  • fix: show user warning when using default fallback metadata (#11690)
    ### What
    It's currently unclear when the harness falls back to the default,
    generic `ModelInfo`. This happens when the `remote_models` feature is
    disabled or the model is truly unknown, and can lead to bad performance
    and issues in the harness.
    
    Add a user-facing warning when this happens so they are aware when their
    setup is broken.
    
    ### Tests
    Added tests, tested locally.
  • feat(core): add structured network approval plumbing and policy decision model (#11672)
    ### Description
    #### Summary
    Introduces the core plumbing required for structured network approvals
    
    #### What changed
    - Added structured network policy decision modeling in core.
    - Added approval payload/context types needed for network approval
    semantics.
    - Wired shell/unified-exec runtime plumbing to consume structured
    decisions.
    - Updated related core error/event surfaces for structured handling.
    - Updated protocol plumbing used by core approval flow.
    - Included small CLI debug sandbox compatibility updates needed by this
    layer.
    
    #### Why
    establishes the minimal backend foundation for network approvals without
    yet changing high-level orchestration or TUI behavior.
    
    #### Notes
    - Behavior remains constrained by existing requirements/config gating.
    - Follow-up PRs in the stack handle orchestration, UX, and app-server
    integration.
    
    ---------
    
    Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
  • fix(protocol): make local image test Bazel-friendly (#11799)
    Fixes Bazel build failure in //codex-rs/protocol:protocol-unit-tests.
    
    The test used include_bytes! to read a PNG from codex-core assets; Cargo
    can read it,
    but Bazel sandboxing can't, so the crate fails to compile.
    
    This change inlines a tiny valid PNG in the test to keep it hermetic.
    
    Related regression: #10590 (cc: @charley-oai)
  • tui: preserve remote image attachments across resume/backtrack (#10590)
    ## Summary
    This PR makes app-server-provided image URLs first-class attachments in
    TUI, so they survive resume/backtrack/history recall and are resubmitted
    correctly.
    
    <img width="715" height="491" alt="Screenshot 2026-02-12 at 8 27 08 PM"
    src="https://github.com/user-attachments/assets/226cbd35-8f0c-4e51-a13e-459ef5dd1927"
    />
    
    Can delete the attached image upon backtracking:
    <img width="716" height="301" alt="Screenshot 2026-02-12 at 8 27 31 PM"
    src="https://github.com/user-attachments/assets/4558d230-f1bd-4eed-a093-8e1ab9c6db27"
    />
    
    In both history and composer, remote images are rendered as normal
    `[Image #N]` placeholders, with numbering unified with local images.
    
    ## What changed
    - Plumb remote image URLs through TUI message state:
      - `UserHistoryCell`
      - `BacktrackSelection`
      - `ChatComposerHistory::HistoryEntry`
      - `ChatWidget::UserMessage`
    - Show remote images as placeholder rows inside the composer box (above
    textarea), and in history cells.
    - Support keyboard selection/deletion for remote image rows in composer
    (`Up`/`Down`, `Delete`/`Backspace`).
    - Preserve remote-image-only turns in local composer history (Up/Down
    recall), including restore after backtrack.
    - Ensure submit/queue/backtrack resubmit include remote images in model
    input (`UserInput::Image`), and keep request shape stable for
    remote-image-only turns.
    - Keep image numbering contiguous across remote + local images:
      - remote images occupy `[Image #1]..[Image #M]`
      - local images start at `[Image #M+1]`
      - deletion renumbers consistently.
    - In protocol conversion, increment shared image index for remote images
    too, so mixed remote/local image tags stay in a single sequence.
    - Simplify restore logic to trust in-memory attachment order (no
    placeholder-number parsing path).
    - Backtrack/replay rollback handling now queues trims through
    `AppEvent::ApplyThreadRollback` and syncs transcript overlay/deferred
    lines after trims, so overlay/transcript state stays consistent.
    - Trim trailing blank rendered lines from user history rendering to
    avoid oversized blank padding.
    
    ## Docs + tests
    - Updated: `docs/tui-chat-composer.md` (remote image flow,
    selection/deletion, numbering offsets)
    - Added/updated tests across `tui/src/chatwidget/tests.rs`,
    `tui/src/app.rs`, `tui/src/app_backtrack.rs`, `tui/src/history_cell.rs`,
    and `tui/src/bottom_pane/chat_composer.rs`
    - Added snapshot coverage for remote image composer states, including
    deleting the first of two remote images.
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-tui`
    
    ## Codex author
    `codex fork 019c2636-1571-74a1-8471-15a3b1c3f49d`
  • Persist complete TurnContextItem state via canonical conversion (#11656)
    ## Summary
    
    This PR delivers the first small, shippable step toward model-visible
    state diffing by making
    `TurnContextItem` more complete and standardizing how it is built.
    
    Specifically, it:
    - Adds persisted network context to `TurnContextItem`.
    - Introduces a single canonical `TurnContext -> TurnContextItem`
    conversion path.
    - Routes existing rollout write sites through that canonical conversion
    helper.
    
    No context injection/diff behavior changes are included in this PR.
    
    ## Why this change
    
    The design goal is to make `TurnContextItem` the canonical source of
    truth for context-diff
    decisions.
    Before this PR:
    - `TurnContextItem` did not include all TurnContext-derived environment
    inputs needed for v1
    completeness.
    - Construction was duplicated at multiple write sites.
    
    This PR addresses both with a minimal, reviewable change.
    
    ## Changes
    
    ### 1) Extend `TurnContextItem` with network state
    - Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
    - Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
    - Kept backward compatibility by making the new field optional and
    skipped when absent.
    
    Files:
    - `codex-rs/protocol/src/protocol.rs`
    
    ### 2) Canonical conversion helper
    - Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
    - Added internal helper to derive network fields from
    `config_layer_stack.requirements().network`.
    
    Files:
    - `codex-rs/core/src/codex.rs`
    
    ### 3) Use canonical conversion at rollout write sites
    - Replaced ad hoc `TurnContextItem { ... }` construction with
    `to_turn_context_item(...)` in:
      - sampling request path
      - compaction path
    
    Files:
    - `codex-rs/core/src/codex.rs`
    - `codex-rs/core/src/compact.rs`
    
    ### 4) Update fixtures/tests for new optional field
    - Updated existing `TurnContextItem` literals in tests to include
    `network: None`.
    - Added protocol tests for:
      - deserializing old payloads with no `network`
      - serializing when `network` is present
    
    Files:
    - `codex-rs/core/tests/suite/resume_warning.rs`
    - No replay/diff logic changes.
    - Persisted rollout `TurnContextItem` now carries additional network
    context when available.
    - Older rollout lines without `network` remain readable.