Commit Graph

1780 Commits

  • config: enforce enterprise feature requirements (#13388)
    ## Why
    
    Enterprises can already constrain approvals, sandboxing, and web search
    through `requirements.toml` and MDM, but feature flags were still only
    configurable as managed defaults. That meant an enterprise could suggest
    feature values, but it could not actually pin them.
    
    This change closes that gap and makes enterprise feature requirements
    behave like the other constrained settings. The effective feature set
    now stays consistent with enterprise requirements during config load,
    when config writes are validated, and when runtime code mutates feature
    flags later in the session.
    
    It also tightens the runtime API for managed features. `ManagedFeatures`
    now follows the same constraint-oriented shape as `Constrained<T>`
    instead of exposing panic-prone mutation helpers, and production code
    can no longer construct it through an unconstrained `From<Features>`
    path.
    
    The PR also hardens the `compact_resume_fork` integration coverage on
    Windows. After the feature-management changes,
    `compact_resume_after_second_compaction_preserves_history` was
    overflowing the libtest/Tokio thread stacks on Windows, so the test now
    uses an explicit larger-stack harness as a pragmatic mitigation. That
    may not be the ideal root-cause fix, and it merits a parallel
    investigation into whether part of the async future chain should be
    boxed to reduce stack pressure instead.
    
    ## What Changed
    
    Enterprises can now pin feature values in `requirements.toml` with the
    requirements-side `features` table:
    
    ```toml
    [features]
    personality = true
    unified_exec = false
    ```
    
    Only canonical feature keys are allowed in the requirements `features`
    table; omitted keys remain unconstrained.
    
    - Added a requirements-side pinned feature map to
    `ConfigRequirementsToml`, threaded it through source-preserving
    requirements merge and normalization in `codex-config`, and made the
    TOML surface use `[features]` (while still accepting legacy
    `[feature_requirements]` for compatibility).
    - Exposed `featureRequirements` from `configRequirements/read`,
    regenerated the JSON/TypeScript schema artifacts, and updated the
    app-server README.
    - Wrapped the effective feature set in `ManagedFeatures`, backed by
    `ConstrainedWithSource<Features>`, and changed its API to mirror
    `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
    and result-returning `enable` / `disable` / `set_enabled` helpers.
    - Removed the legacy-usage and bulk-map passthroughs from
    `ManagedFeatures`; callers that need those behaviors now mutate a plain
    `Features` value and reapply it through `set(...)`, so the constrained
    wrapper remains the enforcement boundary.
    - Removed the production loophole for constructing unconstrained
    `ManagedFeatures`. Non-test code now creates it through the configured
    feature-loading path, and `impl From<Features> for ManagedFeatures` is
    restricted to `#[cfg(test)]`.
    - Rejected legacy feature aliases in enterprise feature requirements,
    and return a load error when a pinned combination cannot survive
    dependency normalization.
    - Validated config writes against enterprise feature requirements before
    persisting changes, including explicit conflicting writes and
    profile-specific feature states that normalize into invalid
    combinations.
    - Updated runtime and TUI feature-toggle paths to use the constrained
    setter API and to persist or apply the effective post-constraint value
    rather than the requested value.
    - Updated the `core_test_support` Bazel target to include the bundled
    core model-catalog fixtures in its runtime data, so helper code that
    resolves `core/models.json` through runfiles works in remote Bazel test
    environments.
    - Renamed the core config test coverage to emphasize that effective
    feature values are normalized at runtime, while conflicting persisted
    config writes are rejected.
    - Ran `compact_resume_after_second_compaction_preserves_history` inside
    an explicit 8 MiB test thread and Tokio runtime worker stack, following
    the existing larger-stack integration-test pattern, to keep the Windows
    `compact_resume_fork` test slice from aborting while a parallel
    investigation continues into whether some of the underlying async
    futures should be boxed.
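
    The constraint-oriented shape described above can be sketched roughly as
    follows. This is a simplified illustration and not the code in this PR:
    `Features` is reduced to a `BTreeMap<String, bool>`, `ConstraintError` is
    a hypothetical stand-in for the error carried by `ConstraintResult<()>`,
    and dependency normalization is left out.

    ```rust
    use std::collections::BTreeMap;

    /// Simplified stand-in for the real `Features` type (assumption).
    pub type Features = BTreeMap<String, bool>;

    /// Hypothetical error type; the real code returns `ConstraintResult<()>`.
    #[derive(Debug, PartialEq)]
    pub struct ConstraintError {
        pub key: String,
        pub required: bool,
    }

    /// Effective feature set plus the values pinned by enterprise requirements.
    pub struct ManagedFeatures {
        value: Features,
        pinned: Features,
    }

    impl ManagedFeatures {
        /// Applies the pins up front so the starting value is already valid.
        pub fn new(mut value: Features, pinned: Features) -> Self {
            for (key, required) in &pinned {
                value.insert(key.clone(), *required);
            }
            Self { value, pinned }
        }

        /// Checks a candidate feature set without mutating anything.
        pub fn can_set(&self, candidate: &Features) -> Result<(), ConstraintError> {
            for (key, required) in &self.pinned {
                // Keys missing from the candidate count as disabled in this sketch.
                if candidate.get(key).copied().unwrap_or(false) != *required {
                    return Err(ConstraintError { key: key.clone(), required: *required });
                }
            }
            Ok(())
        }

        /// Replaces the effective set only if it satisfies every pin.
        pub fn set(&mut self, candidate: Features) -> Result<(), ConstraintError> {
            self.can_set(&candidate)?;
            self.value = candidate;
            Ok(())
        }

        /// Mutates a plain copy, then reapplies it through `set`.
        pub fn set_enabled(&mut self, key: &str, enabled: bool) -> Result<(), ConstraintError> {
            let mut next = self.value.clone();
            next.insert(key.to_string(), enabled);
            self.set(next)
        }

        pub fn enabled(&self, key: &str) -> bool {
            self.value.get(key).copied().unwrap_or(false)
        }
    }
    ```

    In this sketch, a toggle that conflicts with a pin returns an error and
    leaves the effective value untouched, so callers can persist the
    post-constraint value rather than the requested one.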
    
    ## Verification
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core feature_requirements_ -- --nocapture`
    - `cargo test -p codex-core
    load_requirements_toml_produces_expected_constraints -- --nocapture`
    - `cargo test -p codex-core
    compact_resume_after_second_compaction_preserves_history -- --nocapture`
    - `cargo test -p codex-core compact_resume_fork -- --nocapture`
    - Re-ran the built `codex-core` `tests/all` binary with
    `RUST_MIN_STACK=262144` for
    `compact_resume_after_second_compaction_preserves_history` to confirm
    the explicit-stack harness fixes the deterministic low-stack repro.
    - `cargo test -p codex-core`
    - This still fails locally in unrelated integration areas that expect
    the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
    wiremock mismatches.
    
    ## Docs
    
    `developers.openai.com/codex` should document the requirements-side
    `[features]` table for enterprise and MDM-managed configuration,
    including that it only accepts canonical feature keys and that
    conflicting config writes are rejected.
  • Feat: Preserve network access on read-only sandbox policies (#13409)
    ## Summary
    
    `PermissionProfile.network` could not be preserved when additional or
    compiled permissions resolved to `SandboxPolicy::ReadOnly`, because
    `ReadOnly` had no `network_access` field. This change makes a read-only
    policy with network access enabled representable directly and threads
    that through the protocol, app-server v2 mirror, and permission-merging
    logic.
    
    ## What changed
    
    - Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
    protocol and app-server v2 protocol.
    - Kept backward compatibility by defaulting the new field to `false`, so
    legacy read-only payloads still deserialize unchanged.
    - Updated `has_full_network_access()` and sandbox summaries to respect
    read-only network access.
    - Preserved `PermissionProfile.network` when:
      - compiling skill permission profiles into sandbox policies
      - normalizing additional permissions
      - merging additional permissions into existing sandbox policies
    - Updated the approval overlay to show network in the rendered
    permission rule when requested.
    - Regenerated app-server schema fixtures for the new v2 wire shape.
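
    A minimal sketch of the representable state, assuming a heavily
    simplified `SandboxPolicy` (the real enum has more variants and fields)
    and a hypothetical `with_requested_network` helper standing in for the
    permission-merging logic:

    ```rust
    /// Simplified stand-in for the real `SandboxPolicy` (assumption).
    #[derive(Debug, Clone, PartialEq)]
    pub enum SandboxPolicy {
        DangerFullAccess,
        ReadOnly { network_access: bool },
        WorkspaceWrite { network_access: bool },
    }

    impl SandboxPolicy {
        /// Read-only policies now report network access from their own field.
        pub fn has_full_network_access(&self) -> bool {
            match self {
                SandboxPolicy::DangerFullAccess => true,
                SandboxPolicy::ReadOnly { network_access }
                | SandboxPolicy::WorkspaceWrite { network_access } => *network_access,
            }
        }
    }

    /// Hypothetical merge step: a requested network grant is kept even when
    /// the resulting policy stays read-only.
    pub fn with_requested_network(policy: SandboxPolicy, requested: bool) -> SandboxPolicy {
        match policy {
            SandboxPolicy::DangerFullAccess => SandboxPolicy::DangerFullAccess,
            SandboxPolicy::ReadOnly { network_access } => SandboxPolicy::ReadOnly {
                network_access: network_access || requested,
            },
            SandboxPolicy::WorkspaceWrite { network_access } => SandboxPolicy::WorkspaceWrite {
                network_access: network_access || requested,
            },
        }
    }
    ```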
  • feat(app-server): propagate app-server trace context into core (#13368)
    ### Summary
    Propagate trace context from app-server RPC method handlers into the
    codex core submission loop (so this includes spans such as `run_turn`!).
    This implements PR 2 of the app-server tracing rollout.
    
    This also removes the old lower-level env-based reparenting in core so
    explicit request/submission ancestry wins instead of being overridden by
    ambient `TRACEPARENT` state.
    
    ### What changed
    - Added `trace: Option<W3cTraceContext>` to `codex_protocol::Submission`
    - Taught `Codex::submit()` / `submit_with_id()` to automatically capture
    the current span context when constructing or forwarding a submission
    - Wrapped the core submission loop in a `submission_dispatch` span
    parented from `Submission.trace`
    - Warned on invalid submission trace carriers and ignored them cleanly
    - Removed the old env-based downstream reparenting path in core task
    execution
    - Stopped OTEL provider init from implicitly attaching env trace context
    process-wide
    - Updated mcp-server Submission call sites for the new field
    
    Added focused unit tests for:
    - capturing trace context into Submission
    - preferring `Submission.trace` when building the core dispatch span
    
    ### Why
    PR 1 gave us consistent inbound request spans in app-server, but that
    only covered the transport boundary. For long-running work like turns
    and reviews, the important missing piece was preserving ancestry after
    the request handler returns and core continues work on a different async
    path.
    
    This change makes that handoff explicit and keeps the parentage rules
    simple:
    - app-server request span sets the current context
    - `Submission.trace` snapshots that context
    - core restores it once, at the submission boundary
    - deeper core spans inherit naturally
    
    That also lets us stop relying on env-based reparenting for this path,
    which was too ambient and could override explicit ancestry.
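
    The "restore once at the submission boundary, ignore invalid carriers"
    rule can be illustrated with a small std-only sketch. `TraceCarrier` and
    `ParentContext` are hypothetical simplifications (the real
    `W3cTraceContext` and the OTEL span types may look different); the
    parsing follows the version-00 `traceparent` layout from the W3C Trace
    Context specification.

    ```rust
    /// Hypothetical, simplified carrier attached to a submission.
    #[derive(Debug, Clone, PartialEq)]
    pub struct TraceCarrier {
        pub traceparent: Option<String>,
    }

    /// Hypothetical parent context recovered from a carrier.
    #[derive(Debug, Clone, PartialEq)]
    pub struct ParentContext {
        pub trace_id: String,
        pub span_id: String,
        pub sampled: bool,
    }

    fn is_lower_hex(s: &str, len: usize) -> bool {
        s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
    }

    /// Parses a version-00 `traceparent`: `00-<32 hex>-<16 hex>-<2 hex>`.
    pub fn parse_traceparent(value: &str) -> Option<ParentContext> {
        let parts: Vec<&str> = value.trim().split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        let (trace_id, span_id, flags) = (parts[1], parts[2], parts[3]);
        if !is_lower_hex(trace_id, 32) || !is_lower_hex(span_id, 16) || !is_lower_hex(flags, 2) {
            return None;
        }
        // All-zero trace and span ids are invalid per the specification.
        if trace_id.bytes().all(|b| b == b'0') || span_id.bytes().all(|b| b == b'0') {
            return None;
        }
        let sampled = (u8::from_str_radix(flags, 16).ok()? & 0x01) == 0x01;
        Some(ParentContext {
            trace_id: trace_id.to_string(),
            span_id: span_id.to_string(),
            sampled,
        })
    }

    /// Chooses the parent for the dispatch span: a valid carrier wins, and a
    /// missing or invalid one yields `None` (no ambient fallback).
    pub fn dispatch_parent(trace: Option<&TraceCarrier>) -> Option<ParentContext> {
        trace?.traceparent.as_deref().and_then(parse_traceparent)
    }
    ```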
  • feat: load plugin apps (#13401)
    load plugin-apps from `.app.json`.
    
    make apps runtime-mentionable iff `codex_apps` MCP actually exposes
    tools for that `connector_id`.
    
    if the app isn't available, it's filtered out of runtime connector set,
    so no tools are added and no app-mentions resolve.
    
    right now we don't have a clean cli-side error for an app not being
    installed. can look at this after.
    
    ### Tests
    Added tests, tested locally that using a plugin that bundles an app
    picks up the app.
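
    A rough sketch of the filtering rule, assuming a hypothetical `PluginApp`
    shape and a map from `connector_id` to the tool names that the
    `codex_apps` MCP server currently exposes (both are illustrative, not the
    actual types):

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical shape of an app declared in a plugin's `.app.json`.
    #[derive(Debug, Clone, PartialEq)]
    pub struct PluginApp {
        pub connector_id: String,
        pub name: String,
    }

    /// Keeps only apps whose connector has at least one exposed tool, so
    /// unavailable apps add no tools and resolve no app-mentions.
    pub fn runtime_mentionable_apps(
        declared: &[PluginApp],
        tools_by_connector: &BTreeMap<String, Vec<String>>,
    ) -> Vec<PluginApp> {
        declared
            .iter()
            .filter(|app| {
                tools_by_connector
                    .get(&app.connector_id)
                    .map_or(false, |tools| !tools.is_empty())
            })
            .cloned()
            .collect()
    }
    ```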
  • Make js_repl image output controllable (#13331)
    ## Summary
    
    Instead of always adding inner function call outputs to the model
    context, let js code decide which ones to return.
    
    - Stop auto-hoisting nested tool outputs from `codex.tool(...)` into the
    outer `js_repl` function output.
    - Keep `codex.tool(...)` return values unchanged as structured JS
    objects.
    - Add `codex.emitImage(...)` as the explicit path for attaching an image
    to the outer `js_repl` function output.
    - Support emitting from a direct image URL, a single `input_image` item,
    an explicit `{ bytes, mimeType }` object, or a raw tool response object
    containing exactly one image.
    - Preserve existing `view_image` original-resolution behavior when JS
    emits the raw `view_image` tool result.
    - Suppress the special `ViewImageToolCall` event for `js_repl`-sourced
    `view_image` calls so nested inspection stays side-effect free until JS
    explicitly emits.
    - Update the `js_repl` docs and generated project instructions with both
    recommended patterns:
      - `await codex.emitImage(codex.tool("view_image", { path }))`
      - `await codex.emitImage({ bytes: await page.screenshot({ type: "jpeg",
      quality: 85 }), mimeType: "image/jpeg" })`
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    -  `1` https://github.com/openai/codex/pull/13050
    - 👉 `2` https://github.com/openai/codex/pull/13331
    -  `3` https://github.com/openai/codex/pull/13049
  • Add under-development original-resolution view_image support (#13050)
    ## Summary
    
    Add original-resolution support for `view_image` behind the
    under-development `view_image_original_resolution` feature flag.
    
    When the flag is enabled and the target model is `gpt-5.3-codex` or
    newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
    `detail: "original"` to the Responses API instead of using the legacy
    resize/compress path.
    
    ## What changed
    
    - Added `view_image_original_resolution` as an under-development feature
    flag.
    - Added `ImageDetail` to the protocol models and support for serializing
    `detail: "original"` on tool-returned images.
    - Added `PromptImageMode::Original` to `codex-utils-image`.
      - Preserves original PNG/JPEG/WebP bytes.
      - Keeps legacy behavior for the resize path.
    - Updated `view_image` to:
      - use the shared `local_image_content_items_with_label_number(...)`
      helper in both code paths
      - select original-resolution mode only when:
        - the feature flag is enabled, and
        - the model slug parses as `gpt-5.3-codex` or newer
    - Kept local user image attachments on the existing resize path; this
    change is specific to `view_image`.
    - Updated history/image accounting so only `detail: "original"` images
    use the docs-based GPT-5 image cost calculation; legacy images still use
    the old fixed estimate.
    - Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
    at 85% quality unless lossless is required, while still allowing other
    formats when explicitly requested.
    - Updated tests and helper code that construct
    `FunctionCallOutputContentItem::InputImage` to carry the new `detail`
    field.
    
    ## Behavior
    
    ### Feature off
    - `view_image` keeps the existing resize/re-encode behavior.
    - History estimation keeps the existing fixed-cost heuristic.
    
    ### Feature on + `gpt-5.3-codex+`
    - `view_image` sends original-resolution images with `detail:
    "original"`.
    - PNG/JPEG/WebP source bytes are preserved when possible.
    - History estimation uses the GPT-5 docs-based image-cost calculation
    for those `detail: "original"` images.
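
    The gating rule can be sketched as below. The slug grammar
    (`gpt-<major>.<minor>-codex...`), the `Resize` variant name, and the
    helper names are assumptions made for illustration; only
    `PromptImageMode::Original` and the two conditions come from this PR.

    ```rust
    /// `Original` is named in this PR; `Resize` is a hypothetical name for
    /// the legacy resize/compress path.
    #[derive(Debug, PartialEq)]
    pub enum PromptImageMode {
        Resize,
        Original,
    }

    /// Parses slugs of the assumed form `gpt-<major>.<minor>-codex...`.
    fn codex_version(slug: &str) -> Option<(u32, u32)> {
        let rest = slug.strip_prefix("gpt-")?;
        let (version, suffix) = rest.split_once('-')?;
        if !suffix.starts_with("codex") {
            return None;
        }
        let (major, minor) = version.split_once('.')?;
        Some((major.parse::<u32>().ok()?, minor.parse::<u32>().ok()?))
    }

    /// Original resolution only when the flag is on and the slug is new enough.
    pub fn select_mode(flag_enabled: bool, model_slug: &str) -> PromptImageMode {
        // Tuples compare lexicographically, so (6, 0) >= (5, 3) holds.
        let new_enough = codex_version(model_slug).map_or(false, |v| v >= (5, 3));
        if flag_enabled && new_enough {
            PromptImageMode::Original
        } else {
            PromptImageMode::Resize
        }
    }
    ```

    Slugs that do not parse fall back to the legacy path in this sketch,
    which keeps unknown models on the existing behavior.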
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/13050
    -  `2` https://github.com/openai/codex/pull/13331
    -  `3` https://github.com/openai/codex/pull/13049
  • Add thread metadata update endpoint to app server (#13280)
    ## Summary
    - add the v2 `thread/metadata/update` API, including
    protocol/schema/TypeScript exports and app-server docs
    - patch stored thread `gitInfo` in sqlite without resuming the thread,
    with validation plus support for explicit `null` clears
    - repair missing sqlite thread rows from rollout data before patching,
    and make those repairs safe by inserting only when absent and updating
    only git columns so newer metadata is not clobbered
    - keep sqlite authoritative for mutable thread git metadata by
    preserving existing sqlite git fields during reconcile/backfill and only
    using rollout `SessionMeta` git fields to fill gaps
    - add regression coverage for the endpoint, repair paths, concurrent
    sqlite writes, clearing git fields, and rollout/backfill reconciliation
    - fix the login server shutdown race so cancelling before the waiter
    starts still terminates `block_until_done()` correctly
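
    The distinction between "field omitted" and "explicit `null` clear" can
    be modeled with a three-state patch value. The sketch below is
    illustrative only: `Patch`, `GitInfoPatch`, and the `GitInfo` field names
    are hypothetical, and the sqlite layer is omitted.

    ```rust
    /// Three-state patch field: leave as-is, clear, or overwrite.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Patch<T> {
        Keep,
        Clear,
        Set(T),
    }

    impl<T> Patch<T> {
        pub fn apply(self, current: Option<T>) -> Option<T> {
            match self {
                Patch::Keep => current,
                Patch::Clear => None,
                Patch::Set(value) => Some(value),
            }
        }
    }

    /// Hypothetical stored git metadata for a thread.
    #[derive(Debug, Clone, Default, PartialEq)]
    pub struct GitInfo {
        pub sha: Option<String>,
        pub branch: Option<String>,
        pub origin_url: Option<String>,
    }

    /// Hypothetical update payload; only git columns are touched.
    pub struct GitInfoPatch {
        pub sha: Patch<String>,
        pub branch: Patch<String>,
        pub origin_url: Patch<String>,
    }

    pub fn apply_git_patch(stored: GitInfo, patch: GitInfoPatch) -> GitInfo {
        GitInfo {
            sha: patch.sha.apply(stored.sha),
            branch: patch.branch.apply(stored.branch),
            origin_url: patch.origin_url.apply(stored.origin_url),
        }
    }
    ```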
    
    ## Testing
    - `cargo test -p codex-state
    apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-state
    update_thread_git_info_preserves_newer_non_git_metadata`
    - `cargo test -p codex-core
    backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-app-server thread_metadata_update`
    - `cargo test`
      - currently fails in existing `codex-core` grep-files tests with
      `unsupported call: grep_files`:
        - `suite::grep_files::grep_files_tool_collects_matches`
        - `suite::grep_files::grep_files_tool_reports_empty_results`
  • tui: align pending steers with core acceptance (#12868)
    ## Summary
    - submit `Enter` steers immediately while a turn is already running
    instead of routing them through `queued_user_messages`
    - keep those submitted steers visible in the footer as `pending_steers`
    until core records them as a user message or aborts the turn
    - reconcile pending steers on `ItemCompleted(UserMessage)`, not
    `RawResponseItem`
    - emit user-message item lifecycle for leftover pending input at task
    finish, then remove the TUI `TurnComplete` fallback
    - keep `queued_user_messages` for actual queued drafts, rendered below
    pending steers
    
    ## Problem
    While the assistant was generating, pressing `Enter` could send the
    input into `queued_user_messages`. That queue only drains after the turn
    ends, so ordinary steers behaved like queued drafts instead of landing
    at the next core sampling boundary.
    
    The first version of this fix also used `RawResponseItem` to decide when
    a steer had landed. Review feedback was that this is the wrong
    abstraction for client behavior.
    
    There was also a late edge case in core: if pending steer input was
    accepted after the final sampling decision but before `TurnComplete`,
    core would record that user message into history at task finish without
    emitting `ItemStarted(UserMessage)` / `ItemCompleted(UserMessage)`. TUI
    had a fallback to paper over that gap locally.
    
    ## Approach
    - `Enter` during an active turn now submits a normal `Op::UserTurn`
    immediately
    - TUI keeps a local pending-steer preview instead of rendering that user
    message into history immediately
    - when core records the steer as `ItemCompleted(UserMessage)`, TUI
    matches and removes the corresponding pending preview, then renders the
    committed user message
    - core now emits the same user-message lifecycle when
    `on_task_finished(...)` drains leftover pending user input, before
    `TurnComplete`
    - with that lifecycle gap closed in core, TUI no longer needs to flush
    pending steers into history on `TurnComplete`
    - if the turn is interrupted, pending steers and queued drafts are both
    restored into the composer, with pending steers first
    
    ## Notes
    - `Tab` still uses the real queued-message path
    - `queued_user_messages` and `pending_steers` are separate state with
    separate semantics
    - the pending-steer matching key is built directly from `UserInput`
    - this removes the new TUI dependency on `RawResponseItem`
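
    A compact model of the two pieces of state and their transitions, under
    the simplifying assumption that a steer is identified by its text (the
    real matching key is built from `UserInput`). `ComposerState` and its
    method names are hypothetical:

    ```rust
    /// Hypothetical, simplified TUI state for steers and queued drafts.
    #[derive(Debug, Default)]
    pub struct ComposerState {
        pub pending_steers: Vec<String>,
        pub queued_user_messages: Vec<String>,
        pub history: Vec<String>,
        pub composer: String,
    }

    impl ComposerState {
        /// `Enter` during an active turn: submitted immediately, previewed as pending.
        pub fn submit_steer(&mut self, text: &str) {
            self.pending_steers.push(text.to_string());
        }

        /// `Tab`: a real queued draft that waits for the turn to end.
        pub fn queue_draft(&mut self, text: &str) {
            self.queued_user_messages.push(text.to_string());
        }

        /// Core recorded the user message: drop the matching preview, then
        /// render the committed message into history.
        pub fn on_user_message_completed(&mut self, text: &str) {
            if let Some(index) = self.pending_steers.iter().position(|p| p == text) {
                self.pending_steers.remove(index);
            }
            self.history.push(text.to_string());
        }

        /// Interrupt: restore both kinds of input, pending steers first.
        pub fn on_interrupted(&mut self) {
            let restored: Vec<String> = self
                .pending_steers
                .drain(..)
                .chain(self.queued_user_messages.drain(..))
                .collect();
            self.composer = restored.join("\n");
        }
    }
    ```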
    
    ## Validation
    - `just fmt`
    - `cargo test -p codex-core
    task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input --
    --nocapture`
    - `cargo test -p codex-tui`
  • Refactor plugin config and cache path (#13333)
    - Update `config.toml` plugin entries to use
    `<plugin_name>@<marketplace_name>` as the key.
    - Plugins are now stored under
    `plugins/cache/marketplace-name/plugin-name/$version/`.
    - Clean up the plugin code structure.
    - Add plugin install functionality (not used yet).
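
    For illustration, the key format and cache layout could be handled along
    these lines. `PluginKey`, `parse_plugin_key`, and `cache_dir` are
    hypothetical names, and the base directory passed in is an assumption;
    only the `name@marketplace` key and the directory order come from this
    change.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical parsed form of a `<plugin_name>@<marketplace_name>` key.
    #[derive(Debug, PartialEq)]
    pub struct PluginKey {
        pub plugin: String,
        pub marketplace: String,
    }

    /// Splits on the first `@` and rejects empty or ambiguous parts.
    pub fn parse_plugin_key(key: &str) -> Option<PluginKey> {
        let (plugin, marketplace) = key.split_once('@')?;
        if plugin.is_empty() || marketplace.is_empty() || marketplace.contains('@') {
            return None;
        }
        Some(PluginKey {
            plugin: plugin.to_string(),
            marketplace: marketplace.to_string(),
        })
    }

    /// Builds `<base>/plugins/cache/<marketplace>/<plugin>/<version>`.
    pub fn cache_dir(base: &Path, key: &PluginKey, version: &str) -> PathBuf {
        base.join("plugins")
            .join("cache")
            .join(&key.marketplace)
            .join(&key.plugin)
            .join(version)
    }
    ```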
  • Build delegated realtime handoff text from all messages (#13395)
    ## Summary
    - Route delegated realtime handoff turns from all handoff message texts,
    preserving order
    - Fall back to `input_transcript` only when no messages are present
    - Add regression coverage for multi-message handoff requests
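
    A minimal sketch of that selection rule. `HandoffRequest` is a
    hypothetical simplification, and joining message texts with a newline is
    an assumption made for the example:

    ```rust
    /// Hypothetical, simplified handoff request.
    pub struct HandoffRequest {
        pub messages: Vec<String>,
        pub input_transcript: String,
    }

    /// Uses every message text in order; the transcript is only a fallback.
    pub fn handoff_text(request: &HandoffRequest) -> String {
        if request.messages.is_empty() {
            request.input_transcript.clone()
        } else {
            request.messages.join("\n")
        }
    }
    ```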
  • feat: pres artifact part 5 (#13355)
    Mostly written by Codex
  • feat: presentation artifact p1 (#13341)
    Part 1 of presentation tool artifact
  • app-server service tier plumbing (plus some cleanup) (#13334)
    followup to https://github.com/openai/codex/pull/13212 to expose fast
    tier controls to app server
    (majority of this PR is generated schema jsons - actual code is +69 /
    -35 and +24 tests )
    
    - add service tier fields to the app-server protocol surfaces used by
    thread lifecycle, turn start, config, and session configured events
    - thread service tier through the app-server message processor and core
    thread config snapshots
    - allow runtime config overrides to carry service tier for app-server
    callers
    
    cleanup:
    - Removed the unneeded "legacy" code supporting "standard": the tier is
    now None | "fast", so "standard" is no longer needed.
  • fix: agent when profile (#13235)
    Co-authored-by: Josh McKinney <joshka@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • add fast mode toggle (#13212)
    - add a local Fast mode setting in codex-core (similar to how model id
    is currently stored on disk locally)
    - send `service_tier=priority` on requests when Fast is enabled
    - add `/fast` in the TUI and persist it locally
    - gate Fast mode behind a feature flag
  • chore: remove SkillMetadata.permissions and derive skill sandboxing from permission_profile (#13061)
    ## Summary
    
    This change removes the compiled permissions field from skill metadata
    and keeps permission_profile as the single source of truth.
    
    Skill loading no longer compiles skill permissions eagerly. Instead, the
    zsh-fork skill escalation path compiles `skill.permission_profile` when
    it needs to determine the sandbox to apply for a skill script.
    
    ## Behavior change
    
    For skills that declare:
    ```
      permissions: {}
    ```
    we now treat that the same as having no skill permissions override,
    instead of creating and using a default read-only sandbox. This change
    makes the behavior more intuitive:
    
    - only non-empty skill permission profiles affect sandboxing
    - omitting `permissions` and writing `permissions: {}` now mean the same
    thing
    - skill metadata keeps a single permissions representation instead of
    storing derived state too
    
    Overall, this makes skill sandbox behavior easier to understand and more
    predictable.
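
    The "empty profile means no override" rule might look like this in
    miniature. The `PermissionProfile` fields other than `network` and the
    `sandbox_override` helper are hypothetical:

    ```rust
    /// Simplified stand-in for a skill's permission profile (assumption).
    #[derive(Debug, Clone, Default, PartialEq)]
    pub struct PermissionProfile {
        pub network: Option<bool>,
        pub read_paths: Vec<String>,
        pub write_paths: Vec<String>,
    }

    impl PermissionProfile {
        /// A profile equal to the default declares nothing, like `permissions: {}`.
        pub fn is_empty(&self) -> bool {
            *self == Self::default()
        }
    }

    /// Only a non-empty profile yields a sandbox override for the skill script.
    pub fn sandbox_override(profile: Option<&PermissionProfile>) -> Option<&PermissionProfile> {
        profile.filter(|p| !p.is_empty())
    }
    ```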
  • Update realtime websocket API (#13265)
    - migrate the realtime websocket transport to the new session and
    handoff flow
    - make the realtime model configurable in config.toml and use API-key
    auth for the websocket
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: Update thread/name/set to support not-loaded threads (#13282)
    Currently `thread/name/set` only works for loaded threads.
    Expand the scope to also support persisted but not-yet-loaded ones for a
    more predictable API surface.
    This will make it possible to rename threads discovered via
    `thread/list` and similar operations.
  • fix(core) shell_snapshot multiline exports (#12642)
    ## Summary
    Codex discovered this one - shell_snapshot tests were breaking on my
    machine because I had a multiline env var. We should handle these!
    
    ## Testing
    - [x] existing tests pass
    - [x] Updated unit tests
  • Fix project trust config parsing so CLI overrides work (#13090)
    Fixes #13076
    
    This PR fixes a bug that causes command-line config overrides for MCP
    subtables to not be merged correctly.
    
    Summary
    - make project trust loading go through the dedicated struct so CLI
    overrides can update trusted project-local MCP transports
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • core: reuse parent shell snapshot for thread-spawn subagents (#13052)
    ## Summary
    - reuse the parent shell snapshot when spawning/forking/resuming
    `SessionSource::SubAgent(SubAgentSource::ThreadSpawn { .. })` sessions
    - plumb inherited snapshot through `AgentControl -> ThreadManager ->
    Codex::spawn -> SessionConfiguration`
    - skip shell snapshot refresh on cwd updates for thread-spawn subagents
    so inherited snapshots are not replaced
    
    ## Why
    - avoids per-subagent shell snapshot creation and cleanup work
    - keeps thread-spawn subagents on the parent snapshot path, matching the
    intended parent/child snapshot model
    
    ## Validation
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core --no-run`
    - `cargo test -p codex-core spawn_agent -- --nocapture`
    - `cargo test -p codex-core --test all
    suite::agent_jobs::spawn_agents_on_csv_runs_and_exports`
    
    ## Notes
    - full `cargo test -p codex-core --test all` was left running separately
    for broader verification
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: polluted memories (#13008)
    Add a feature flag to disable memory creation for "polluted"
  • Record realtime close marker on replacement (#13058)
    ## Summary
    - record a realtime close developer message when a new realtime session
    replaces an active one
    - assert the replacement marker through the mocked responses request
    path
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Charles Cunningham <ccunningham@openai.com>
  • fix: MacOSAutomationPermission::BundleIDs should allow communicating … (#12989)
    …with launchservicesd
    
    Add mach lookup for `launchservicesd` when extending the sandbox for
    `MacOSAutomationPermission::BundleIDs`. This is necessary so that the
    target application can be launched for automation.
    
    This omission was due to a spec error in a document, which has been
    fixed.
  • feat: load from plugins (#12864)
    Support loading plugins.
    
    Plugins can now be enabled via [plugins.<name>] in config.toml. They are
    loaded as first-class entities through PluginsManager, and their default
    skills/ and .mcp.json contributions are integrated into the existing
    skills and MCP flows.
  • core: resolve host_executable() rules during preflight (#13065)
    ## Why
    
    [#12964](https://github.com/openai/codex/pull/12964) added
    `host_executable()` support to `codex-execpolicy`, and
    [#13046](https://github.com/openai/codex/pull/13046) adopted it in the
    zsh-fork interception path.
    
    The remaining gap was the preflight execpolicy check in
    `core/src/exec_policy.rs`. That path derives approval requirements
    before execution for `shell`, `shell_command`, and `unified_exec`, but
    it was still using the default exact-token matcher.
    
    As a result, a command that already included an absolute executable
    path, such as `/usr/bin/git status`, could still miss a basename rule
    like `prefix_rule(pattern = ["git"], ...)` during preflight even when
    the policy also defined a matching `host_executable(name = "git", ...)`
    entry.
    
    This PR brings the same opt-in `host_executable()` resolution to the
    preflight approval path when an absolute program path is already present
    in the parsed command.
    
    ## What Changed
    
    - updated
    `ExecPolicyManager::create_exec_approval_requirement_for_command()` in
    `core/src/exec_policy.rs` to use `check_multiple_with_options(...)` with
    `MatchOptions { resolve_host_executables: true }`
    - kept the existing shell parsing flow for approval derivation, but now
    allow basename rules to match absolute executable paths during preflight
    when `host_executable()` permits it
    - updated requested-prefix amendment evaluation to use the same
    host-executable-aware matching mode, so suggested `prefix_rule()`
    amendments are checked consistently for absolute-path commands
    - added preflight coverage for:
      - absolute-path commands that should match basename rules through
      `host_executable()`
      - absolute-path commands whose paths are not in the allowed
      `host_executable()` mapping
      - requested prefix-rule amendments for absolute-path commands
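
    To make the matching behavior concrete, here is a toy model of
    host-executable-aware prefix matching. It does not use the
    `codex-execpolicy` API: `Policy`, its fields, and `matches` are
    hypothetical, and the rule language is reduced to allowed token
    prefixes plus a name-to-permitted-paths map.

    ```rust
    use std::collections::BTreeMap;
    use std::path::Path;

    /// Toy policy: allowed token prefixes plus, per executable name, the
    /// absolute paths permitted to stand in for that name.
    pub struct Policy {
        pub allowed_prefixes: Vec<Vec<String>>,
        pub host_executables: BTreeMap<String, Vec<String>>,
    }

    impl Policy {
        fn matches_tokens(&self, tokens: &[String]) -> bool {
            self.allowed_prefixes.iter().any(|prefix| tokens.starts_with(prefix))
        }

        /// Exact-token match first; optionally retry with the basename when
        /// the absolute program path is listed for that name.
        pub fn matches(&self, command: &[String], resolve_host_executables: bool) -> bool {
            if self.matches_tokens(command) {
                return true;
            }
            if !resolve_host_executables {
                return false;
            }
            let Some((program, args)) = command.split_first() else {
                return false;
            };
            let path = Path::new(program);
            if !path.is_absolute() {
                return false;
            }
            let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
                return false;
            };
            let permitted = self
                .host_executables
                .get(name)
                .map_or(false, |paths| paths.iter().any(|p| p == program));
            if !permitted {
                return false;
            }
            let mut rewritten = vec![name.to_string()];
            rewritten.extend_from_slice(args);
            self.matches_tokens(&rewritten)
        }
    }
    ```

    In this model, `/usr/bin/git status` matches a `["git"]` prefix only
    when resolution is enabled and `/usr/bin/git` is listed for `git`; an
    unlisted path such as `/opt/other/git` still misses.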
    
    ## Verification
    
    - `just fix -p codex-core`
    - `cargo test -p codex-core --lib exec_policy::tests::`
  • Speed up subagent startup (#12935)
    ## Summary
    - skip online model refresh for subagent sessions
    - avoid rollout flushes during subagent startup
    - keep /models refresh for non-subagent sessions
    
    ## Testing
    - cargo test -p codex-core --test all
    suite::models_etag_responses::refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch
    - cargo test -p codex-core --test all
    suite::remote_models::remote_models_long_model_slug_is_sent_with_high_reasoning
    - cargo test -p codex-core --test all
    suite::model_switching::model_switch_to_smaller_model_updates_token_context_window
    - cargo test -p codex-core --test all
    suite::compact::pre_sampling_compact_runs_on_switch_to_smaller_context_model
    - cargo test -p codex-core --test all
    suite::compact::pre_sampling_compact_runs_after_resume_and_switch_to_smaller_model
    - cargo test -p codex-core --test all
    suite::personality::remote_model_friendly_personality_instructions_with_feature
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Make cloud_requirements fail close (#13063)
    Make it fail closed only for the CLI for now.
    This will be extended to app-server later.
  • app-server: Add ephemeral field to Thread object (#13084)
    Currently there is no other way to tell that a thread is ephemeral;
    only the client that created it has that knowledge.
  • core: adopt host_executable() rules in zsh-fork (#13046)
    ## Why
    
    [#12964](https://github.com/openai/codex/pull/12964) added
    `host_executable()` support to `codex-execpolicy`, but the zsh-fork
    interception path in `unix_escalation.rs` was still evaluating commands
    with the default exact-token matcher.
    
    That meant an intercepted absolute executable such as `/usr/bin/git
    status` could still miss basename rules like `prefix_rule(pattern =
    ["git", "status"])`, even when the policy also defined a matching
    `host_executable(name = "git", ...)` entry.
    
    This PR adopts the new matching behavior in the zsh-fork runtime only.
    That keeps the rollout intentionally narrow: zsh-fork already requires
    explicit user opt-in, so it is a safer first caller to exercise the new
    `host_executable()` scheme before expanding it to other execpolicy call
    sites.
    
    It also brings zsh-fork back in line with the current `prefix_rule()`
    execution model. Until prefix rules can carry their own permission
    profiles, a matched `prefix_rule()` is expected to rerun the intercepted
    command unsandboxed on `allow`, or after the user accepts `prompt`,
    instead of merely continuing inside the inherited shell sandbox.
    
    ## What Changed
    
    - added `evaluate_intercepted_exec_policy()` in
    `core/src/tools/runtimes/shell/unix_escalation.rs` to centralize
    execpolicy evaluation for intercepted commands
    - switched intercepted direct execs in the zsh-fork path to
    `check_multiple_with_options(...)` with `MatchOptions {
    resolve_host_executables: true }`
    - added `commands_for_intercepted_exec_policy()` so zsh-fork policy
    evaluation works from intercepted `(program, argv)` data instead of
    reconstructing a synthetic command before matching
    - left shell-wrapper parsing intentionally disabled by default behind
    `ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so
    path-sensitive matching relies on later direct exec interception rather
    than shell-script parsing
    - made matched `prefix_rule()` decisions rerun intercepted commands with
    `EscalationExecution::Unsandboxed`, while unmatched-command fallback
    keeps the existing sandbox-preserving behavior
    - extracted the zsh-fork test harness into
    `core/tests/common/zsh_fork.rs` so both the skill-focused and
    approval-focused integration suites can exercise the same runtime setup
    - limited this change to the intercepted zsh-fork path rather than
    changing every execpolicy caller at once
    - added runtime coverage in
    `core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and
    disallowed `host_executable()` mappings and the wrapper-parsing modes
    - added integration coverage in `core/tests/suite/approvals.rs` to
    verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns
    under zsh-fork outside a restrictive `WorkspaceWrite` sandbox
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13046).
    * #13065
    * __->__ #13046
  • Add model availability NUX tooltips (#13021)
    - override startup tooltips with model availability NUX and persist
    per-model show counts in config
    - stop showing each model after four exposures and fall back to normal
    tooltips
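
    The exposure cap can be sketched as follows. The function name, the
    candidate list, and the in-memory map are hypothetical; in the real
    change the per-model counts are persisted in config.

    ```rust
    use std::collections::BTreeMap;

    /// Each model's NUX tooltip is shown at most this many times.
    pub const MAX_EXPOSURES: u32 = 4;

    /// Returns the first candidate still under the cap and records the
    /// exposure, or `None` so the caller falls back to normal tooltips.
    pub fn next_nux_model(
        candidates: &[&str],
        show_counts: &mut BTreeMap<String, u32>,
    ) -> Option<String> {
        let model: &str = candidates
            .iter()
            .copied()
            .find(|m| show_counts.get(*m).copied().unwrap_or(0) < MAX_EXPOSURES)?;
        *show_counts.entry(model.to_string()).or_insert(0) += 1;
        Some(model.to_string())
    }
    ```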
  • Handle missing plan info for ChatGPT accounts (#13072)
    Addresses https://github.com/openai/codex/issues/13007 and
    https://github.com/openai/codex/issues/12170
    
    There are situations where the ChatGPT auth backend might return a JWT
    that contains no plan information. Most code paths already handle this
    case well, but the internal implementation of the "account/read" app
    server call was failing in this case (returning an error rather than
    properly returning None for the plan).
    
    As a result, users had to log in again every time the extension or app
    started, even after a successful login in the previous session.
    
    Summary
    - allow ChatGPT-authenticated accounts to fall back to
    `AccountPlanType::Unknown` when the token omits the plan claim
    - add regression coverage in `app-server/tests/suite/v2/account.rs` to
    confirm `account/read` returns `plan_type: Unknown` when the claim is
    absent
    - ensure the Rust auth helpers and fixtures treat the plan claim as
    optional and default to `Unknown` when it is missing
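
    The fallback can be sketched as follows. This is an illustration only:
    the enum is trimmed down, the claim strings `"plus"` and `"pro"` are
    hypothetical, and mapping unrecognized claims to `Unknown` is an
    assumption rather than something this commit states.

    ```rust
    /// Hypothetical, trimmed-down plan enum; only `Unknown` is named in the commit.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AccountPlanType {
        Plus,
        Pro,
        Unknown,
    }

    /// Maps an optional plan claim to a plan type instead of returning an error.
    /// A missing claim (and, by assumption, an unrecognized one) becomes `Unknown`.
    fn plan_type_from_claim(claim: Option<&str>) -> AccountPlanType {
        match claim {
            Some("plus") => AccountPlanType::Plus,
            Some("pro") => AccountPlanType::Pro,
            _ => AccountPlanType::Unknown,
        }
    }
    ```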
  • Unify rollout reconstruction with resume/fork TurnContext hydration (#12612)
    ## Summary
    
    This PR unifies rollout history reconstruction and resume/fork metadata
    hydration under a single `Session::reconstruct_history_from_rollout`
    implementation.
    
    The key change from main is that replay metadata now comes from the same
    reconstruction pass that rebuilds model-visible history, instead of
    doing a second bespoke rollout scan to recover `previous_model` /
    `reference_context_item`.
    
    ## What Changed
    
    ### Unified reconstruction output
    
    `reconstruct_history_from_rollout` now returns a single
    `RolloutReconstruction` bundle containing:
    
    - rebuilt `history`
    - `previous_model`
    - `reference_context_item`
    
    Resume and fork both consume that shared output directly.
    
    ### Reverse replay core
    
    The reconstruction logic moved into
    `codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans
    rollout items newest-to-oldest.
    
    That reverse pass:
    
    - derives `previous_model`
    - derives whether `reference_context_item` is preserved or cleared
    - stops early once it has both resume metadata and a surviving
    `replacement_history` checkpoint
    
    For now, history is still materialized eagerly: only the surviving
    suffix is replayed forward. This keeps the history result stable while
    moving the control flow toward the future lazy reverse loader design.
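
    A rough sketch of the two-phase shape described above, under heavy
    simplification. `RolloutItem`, `Reconstruction`, and `reconstruct` are
    hypothetical stand-ins, not the types in `rollout_reconstruction.rs`;
    rollback, aborts, and `reference_context_item` handling are omitted.

    ```rust
    /// Hypothetical, heavily simplified rollout items.
    #[derive(Debug, Clone)]
    enum RolloutItem {
        /// Stand-in for a persisted `TurnContextItem`.
        TurnContext { model: String },
        /// Stand-in for `Compacted` carrying a `replacement_history` checkpoint.
        Compacted { replacement_history: Vec<String> },
        /// Any other model-visible item.
        Message(String),
    }

    #[derive(Debug, PartialEq)]
    struct Reconstruction {
        history: Vec<String>,
        previous_model: Option<String>,
    }

    fn reconstruct(items: &[RolloutItem]) -> Reconstruction {
        let mut previous_model: Option<String> = None;
        let mut checkpoint: Option<usize> = None;

        // Reverse pass: newest to oldest, stopping once both facts are known.
        for (idx, item) in items.iter().enumerate().rev() {
            match item {
                RolloutItem::TurnContext { model } if previous_model.is_none() => {
                    previous_model = Some(model.clone());
                }
                RolloutItem::Compacted { .. } if checkpoint.is_none() => {
                    checkpoint = Some(idx);
                }
                _ => {}
            }
            if previous_model.is_some() && checkpoint.is_some() {
                break;
            }
        }

        // Forward pass: replay only the suffix that starts at the checkpoint.
        let start = checkpoint.unwrap_or(0);
        let mut history = Vec::new();
        for item in &items[start..] {
            match item {
                RolloutItem::Compacted { replacement_history } => {
                    history = replacement_history.clone();
                }
                RolloutItem::Message(text) => history.push(text.clone()),
                RolloutItem::TurnContext { .. } => {}
            }
        }

        Reconstruction { history, previous_model }
    }
    ```

    In this sketch, items older than the newest checkpoint are never
    replayed, which is the property the early stop is meant to preserve.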
    
    ### Removed bespoke context lookup
    
    This deletes `last_rollout_regular_turn_context_lookup` and its separate
    compaction-aware scan.
    
    The previous model / baseline metadata is now computed from the same
    replay state that rebuilds history, so resume/fork cannot drift from the
    reconstructed transcript view.
    
    ### `TurnContextItem` persistence contract
    
    `TurnContextItem` is now treated as the replay source of truth for
    durable model-visible baselines.
    
    This PR keeps the following contract explicit:
    
    - persist `TurnContextItem` for the first real user turn so resume can
    recover `previous_model`
    - persist it for later turns that emit model-visible context updates
    - if mid-turn compaction reinjects full initial context into replacement
    history, persist a fresh `TurnContextItem` after `Compacted` so
    resume/fork can re-establish the baseline from the rewritten history
    - do not treat manual compaction or pre-sampling compaction as creating
    a new durable baseline on their own
    
    ## Behavior Preserved
    
    - rollback replay stays aligned with `drop_last_n_user_turns`
    - rollback skips only user turns
    - incomplete active user turns are dropped before older finalized turns
    when rollback applies
    - unmatched aborts do not consume the current active turn
    - missing abort IDs still conservatively clear stale compaction state
    - compaction clears `reference_context_item` until a later
    `TurnContextItem` re-establishes it
    - `previous_model` still comes from the newest surviving user turn that
    established one
    
    ## Tests
    
    Targeted validation run for the current branch shape:
    
    - `cd codex-rs && cargo test -p codex-core --lib
    codex::rollout_reconstruction_tests -- --nocapture`
    - `cd codex-rs && just fmt`
    
    The branch also extracts the rollout reconstruction tests into
    `codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic
    has a dedicated home instead of living inline in `codex.rs`.
  • execpolicy: add host_executable() path mappings (#12964)
    ## Why
    
    `execpolicy` currently keys `prefix_rule()` matching off the literal
    first token. That works for rules like `["/usr/bin/git"]`, but it means
    shared basename rules such as `["git"]` do not help when a caller passes
    an absolute executable path like `/usr/bin/git`.
    
    This PR lays the groundwork for basename-aware matching without changing
    existing callers yet. It adds typed host-executable metadata and an
    opt-in resolution path in `codex-execpolicy`, so a follow-up PR can
    adopt the new behavior in `unix_escalation.rs` and other call sites
    without having to redesign the policy layer first.
    
    ## What Changed
    
    - added `host_executable(name = ..., paths = [...])` to the execpolicy
    parser and validated it with `AbsolutePathBuf`
    - stored host executable mappings separately from prefix rules inside
    `Policy`
    - added `MatchOptions` and opt-in `*_with_options()` APIs that preserve
    existing behavior by default
    - implemented exact-first matching with optional basename fallback,
    gated by `host_executable()` allowlists when present
    - normalized executable names for cross-platform matching so Windows
    paths like `git.exe` can satisfy `host_executable(name = "git", ...)`
    - updated `match` / `not_match` example validation to exercise the
    host-executable resolution path instead of only raw prefix-rule matching
    - preserved source locations for deferred example-validation errors so
    policy load failures still point at the right file and line
    - surfaced `resolvedProgram` on `RuleMatch` so callers can tell when a
    basename rule matched an absolute executable path
    - preserved host executable metadata when requirements policies overlay
    file-based policies in `core/src/exec_policy.rs`
    - documented the new rule shape and CLI behavior in
    `execpolicy/README.md`
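
    The exact-first, basename-fallback order can be sketched as below. This
    is not the `codex-execpolicy` API: `Policy`, `rule_programs`, and
    `match_program` are hypothetical, paths are kept as plain strings
    instead of `AbsolutePathBuf`, and the `.exe` stripping is a simplified
    guess at the cross-platform normalization.

    ```rust
    use std::collections::{HashMap, HashSet};
    use std::path::Path;

    /// Hypothetical, simplified policy: only the first token of each rule is kept.
    #[derive(Default)]
    struct Policy {
        /// First tokens of prefix rules, e.g. "git" or "/usr/bin/git".
        rule_programs: HashSet<String>,
        /// Optional per-name allowlists of absolute paths.
        host_executables: HashMap<String, Vec<String>>,
    }

    impl Policy {
        /// Returns the rule key that matched, if any.
        fn match_program(&self, program: &str, resolve_host_executables: bool) -> Option<String> {
            // 1. An exact match on the literal first token always wins.
            if self.rule_programs.contains(program) {
                return Some(program.to_string());
            }
            // 2. Basename fallback is opt-in and only applies to absolute paths
            //    (as judged by the host platform's path rules).
            if !resolve_host_executables || !Path::new(program).is_absolute() {
                return None;
            }
            let file_name = Path::new(program).file_name()?.to_str()?;
            // Assumption: normalize `git.exe` to `git`.
            let name = file_name.strip_suffix(".exe").unwrap_or(file_name);
            if !self.rule_programs.contains(name) {
                return None;
            }
            // 3. When an allowlist exists for this name, the path must be on it.
            match self.host_executables.get(name) {
                Some(paths) if !paths.iter().any(|p| p.as_str() == program) => None,
                _ => Some(name.to_string()),
            }
        }
    }
    ```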
    
    ## Verification
    
    - `cargo test -p codex-execpolicy`
    - added coverage in `execpolicy/tests/basic.rs` for parsing, precedence,
    empty allowlists, basename fallback, exact-match precedence, and
    host-executable-backed `match` / `not_match` examples
    - added a regression test in `core/src/exec_policy.rs` to verify
    requirements overlays preserve `host_executable()` metadata
    - verified `cargo test -p codex-core --lib`, including source-rendering
    coverage for deferred validation errors
  • fix(tui): promote windows terminal diff ansi16 to truecolor (#13016)
    ## Summary
    
    - Promote ANSI-16 to truecolor for diff rendering when running inside
    Windows Terminal
    - Respect explicit `FORCE_COLOR` override, skipping promotion when set
    - Extract a pure `diff_color_level_for_terminal` function for
    testability
    - Strip background tints from ANSI-16 diff output, rendering add/delete
    lines with foreground color only
    - Introduce `RichDiffColorLevel` to type-safely restrict background
    fills to truecolor and ansi256
    
    ## Problem
    
    Windows Terminal fully supports 24-bit (truecolor) rendering but often
    does not provide the usual TERM metadata (`TERM`, `TERM_PROGRAM`,
    `COLORTERM`) in `cmd.exe`/PowerShell sessions. In those environments,
    `supports-color` can report only ANSI-16 support. The diff renderer
    therefore falls back to a 16-color palette, producing washed-out,
    hard-to-read diffs.
    
    The screenshots below demonstrate that both PowerShell and cmd.exe don't
    set any `*TERM*` environment variables.
    
    | PowerShell | cmd.exe |
    |---|---|
    | <img width="2032" height="1162" alt="SCR-20260226-nfvy" src="https://github.com/user-attachments/assets/59e968cc-4add-4c7b-a415-07163297e86a" /> | <img width="2032" height="1162" alt="SCR-20260226-nfyc" src="https://github.com/user-attachments/assets/d06b3e39-bf91-4ce3-9705-82bf9563a01b" /> |
    
    
    ## Mental model
    
    `StdoutColorLevel` (from `supports-color`) is the _detected_ capability.
    `DiffColorLevel` is the _intended_ capability for diff rendering. A new
    intermediary — `diff_color_level_for_terminal` — maps one to the other
    and is the single place where terminal-specific overrides live.
    
    Windows Terminal is detected two independent ways: the `TerminalName`
    parsed by `terminal_info()` and the raw presence of `WT_SESSION`. When
    `WT_SESSION` is present and `FORCE_COLOR` is not set, we promote
    unconditionally to truecolor. When `WT_SESSION` is absent but
    `TerminalName::WindowsTerminal` is detected, we promote only the ANSI-16
    level (not `Unknown`).
    
    A single override helper — `has_force_color_override()` — checks whether
    `FORCE_COLOR` is set. When it is, both the `WT_SESSION` fast-path and
    the `TerminalName`-based promotion are suppressed, preserving explicit
    user intent.
    
    | PowerShell | cmd.exe | WSL | Bash for Windows |
    |---|---|---|---|
    | ![SCR-20260226-msrh](https://github.com/user-attachments/assets/0f6297a6-4241-4dbf-b7ff-cf02da8941b0) | ![SCR-20260226-nbao](https://github.com/user-attachments/assets/bb5ff8a9-903c-4677-a2de-1f6e1f34b18e) | ![SCR-20260226-nbej](https://github.com/user-attachments/assets/26ecec2c-a7e9-410a-8702-f73995b490a6) | ![SCR-20260226-nbkz](https://github.com/user-attachments/assets/80c4bf9a-3b41-40e1-bc87-f5c565f96075) |
    
    ## Non-goals
    
    - This does not change color detection for anything outside the diff
    renderer (e.g. the chat widget, markdown rendering).
    - This does not add a user-facing config knob; `FORCE_COLOR` already
    serves that role.
    
    ## Tradeoffs
    
    - The `has_wt_session` signal is intentionally kept separate from
    `TerminalName::WindowsTerminal`. `terminal_info()` is derived with
    `TERM_PROGRAM` precedence, so it can differ from raw `WT_SESSION`.
    - Real-world validation in this issue: in both `cmd.exe` and PowerShell,
    `TERM`/`TERM_PROGRAM`/`COLORTERM` were absent, so TERM-based capability
    hints were unavailable in those sessions.
    - Checking `FORCE_COLOR` for presence rather than parsing its value is a
    simplification. In practice `supports-color` has already parsed it, so
    our check is a coarse "did the user set _anything_?" gate. The effective
    color level still comes from `supports-color`.
    - When `WT_SESSION` is present without `FORCE_COLOR`, we promote to
    truecolor regardless of `stdout_level` (including `Unknown`). This is
    aggressive but correct: `WT_SESSION` is a strong signal that we're in
    Windows Terminal.
    - ANSI-16 add/delete backgrounds (bright green/red) overpower
    syntax-highlighted token colors, making diffs harder to read.
    Foreground-only cues (colored text, gutter signs) preserve readability
    on low-color terminals.
    
    ## Architecture
    
    ```
    stdout_color_level()  ──┐
    terminal_info().name  ──┤
    WT_SESSION presence   ──┼──▶ diff_color_level_for_terminal() ──▶ DiffColorLevel
    FORCE_COLOR presence  ──┘                                            │
                                                                         ▼
                                                              RichDiffColorLevel::from_diff_color_level()
                                                                         │
                                                              ┌──────────┴──────────┐
                                                              │ Some(TrueColor|256) │ → bg tints
                                                              │ None (Ansi16)       │ → fg only
                                                              └─────────────────────┘
    ```
    
    `diff_color_level()` is the environment-reading entry point; it gathers
    the four runtime signals and delegates to the pure, testable
    `diff_color_level_for_terminal()`.
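
    A simplified sketch of the pure decision function, written from the
    rules in "Mental model" and the test table rather than from the actual
    source. The enum definitions and the parameter list are hypothetical,
    and mapping `Unknown` to the 16-color palette outside Windows Terminal
    is an assumption.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum StdoutColorLevel { TrueColor, Ansi256, Ansi16, Unknown }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum DiffColorLevel { TrueColor, Ansi256, Ansi16 }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum TerminalName { WindowsTerminal, WezTerm, Other }

    fn diff_color_level_for_terminal(
        stdout_level: StdoutColorLevel,
        terminal: TerminalName,
        has_wt_session: bool,
        has_force_color: bool,
    ) -> DiffColorLevel {
        // WT_SESSION fast path: promote unconditionally unless FORCE_COLOR is set.
        if has_wt_session && !has_force_color {
            return DiffColorLevel::TrueColor;
        }
        // Name-based promotion applies to the ANSI-16 level only (not `Unknown`).
        if stdout_level == StdoutColorLevel::Ansi16
            && terminal == TerminalName::WindowsTerminal
            && !has_force_color
        {
            return DiffColorLevel::TrueColor;
        }
        match stdout_level {
            StdoutColorLevel::TrueColor => DiffColorLevel::TrueColor,
            StdoutColorLevel::Ansi256 => DiffColorLevel::Ansi256,
            // Assumption: an unknown level falls back to the 16-color palette.
            StdoutColorLevel::Ansi16 | StdoutColorLevel::Unknown => DiffColorLevel::Ansi16,
        }
    }

    /// Only levels that can carry background tints are representable.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum RichDiffColorLevel { TrueColor, Ansi256 }

    impl RichDiffColorLevel {
        /// `None` means foreground-only rendering (the ANSI-16 case).
        fn from_diff_color_level(level: DiffColorLevel) -> Option<Self> {
            match level {
                DiffColorLevel::TrueColor => Some(Self::TrueColor),
                DiffColorLevel::Ansi256 => Some(Self::Ansi256),
                DiffColorLevel::Ansi16 => None,
            }
        }
    }
    ```

    Keeping the environment reads out of this function is what lets each
    row of the decision matrix be tested with plain arguments.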
    
    ## Observability
    
    No new logs or metrics. Incorrect color selection is immediately visible
    as broken diff rendering; the test suite covers the decision matrix
    exhaustively.
    
    ## Tests
    
    Six new unit tests exercise every branch of
    `diff_color_level_for_terminal`:
    
    | Test | Inputs | Expected |
    |------|--------|----------|
    | `windows_terminal_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 + WindowsTerminal name | TrueColor |
    | `wt_session_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 + WT_SESSION only | TrueColor |
    | `non_windows_terminal_keeps_ansi16_diff_palette` | Ansi16 + WezTerm | Ansi16 |
    | `wt_session_promotes_unknown_color_level_to_truecolor` | Unknown + WT_SESSION | TrueColor |
    | `explicit_force_override_keeps_ansi16_on_windows_terminal` | Ansi16 + WindowsTerminal + FORCE_COLOR | Ansi16 |
    | `explicit_force_override_keeps_ansi256_on_windows_terminal` | Ansi256 + WT_SESSION + FORCE_COLOR | Ansi256 |
    | `ansi16_add_style_uses_foreground_only` | Dark + Ansi16 | fg=Green, bg=None |
    | (and any other new snapshot/assertion tests from commits d757fee and d7c78b3) | | |
    
    ## Test plan
    
    - [x] Verify all new unit tests pass (`cargo test -p codex-tui --lib`)
    - [x] On Windows Terminal: confirm diffs render with truecolor
    backgrounds
    - [x] On Windows Terminal with `FORCE_COLOR` set: confirm promotion is
    disabled and output follows the forced `supports-color` level
    - [x] On macOS/Linux terminals: confirm no behavior change
    
    Fixes https://github.com/openai/codex/issues/12904 
    Fixes https://github.com/openai/codex/issues/12890
    Fixes https://github.com/openai/codex/issues/12912
    Fixes https://github.com/openai/codex/issues/12840
  • fix: use AbsolutePathBuf for permission profile file roots (#12970)
    ## Why
    `PermissionProfile` should describe filesystem roots as absolute paths
    at the type level. Using `PathBuf` in `FileSystemPermissions` made the
    shared type too permissive and blurred together three different
    deserialization cases:
    
    - skill metadata in `agents/openai.yaml`, where relative paths should
    resolve against the skill directory
    - app-server API payloads, where callers should have to send absolute
    paths
    - local tool-call payloads for commands like `shell_command` and
    `exec_command`, where `additional_permissions.file_system` may
    legitimately be relative to the command `workdir`
    
    This change tightens the shared model without regressing the existing
    local command flow.
    
    ## What Changed
    - changed `protocol::models::FileSystemPermissions` and the app-server
    `AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
    - wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
    relative permission roots in `agents/openai.yaml` resolve against the
    containing skill directory
    - kept app-server/API deserialization strict, so relative
    `additionalPermissions.fileSystem.*` paths are rejected at the boundary
    - restored cwd/workdir-relative deserialization for local tool-call
    payloads by parsing `shell`, `shell_command`, and `exec_command`
    arguments under an `AbsolutePathBufGuard` rooted at the resolved command
    working directory
    - simplified runtime additional-permission normalization so it only
    canonicalizes and deduplicates absolute roots instead of trying to
    recover relative ones later
    - updated the app-server schema fixtures, `app-server/README.md`, and
    the affected transport/TUI tests to match the final behavior
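
    The three deserialization cases reduce to one question: is a base
    directory in scope when a path is parsed? A minimal sketch of that idea,
    using a hypothetical `resolve_permission_root` helper with an explicit
    `base` argument in place of the real `AbsolutePathBufGuard` mechanism:

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical helper. `base` plays the role of the guard:
    /// - `Some(skill_dir)` for skill metadata,
    /// - `Some(workdir)` for local tool-call payloads,
    /// - `None` for app-server/API payloads, which must send absolute paths.
    fn resolve_permission_root(raw: &Path, base: Option<&Path>) -> Result<PathBuf, String> {
        if raw.is_absolute() {
            return Ok(raw.to_path_buf());
        }
        match base {
            Some(dir) if dir.is_absolute() => Ok(dir.join(raw)),
            Some(dir) => Err(format!("base directory is not absolute: {}", dir.display())),
            None => Err(format!("relative path rejected: {}", raw.display())),
        }
    }
    ```

    Because the result is absolute by construction, later normalization
    only has to canonicalize and deduplicate.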
  • notify: include client in legacy hook payload (#12968)
    ## Why
    
    The `notify` hook payload did not identify which Codex client started
    the turn. That meant downstream notification hooks could not distinguish
    between completions coming from the TUI and completions coming from
    app-server clients such as VS Code or Xcode. Now that the Codex App
    provides its own desktop notifications, hooks need a way to filter out
    completions that originate from it.
    
    This change adds that context without changing the existing payload
    shape for callers that do not know the client name, and keeps the new
    end-to-end test cross-platform.
    
    ## What changed
    
    - added an optional top-level `client` field to the legacy `notify` JSON
    payload
    - threaded that value through `core` and `hooks`; the internal session
    and turn state now carries it as `app_server_client_name`
    - set the field to `codex-tui` for TUI turns
    - captured `initialize.clientInfo.name` in the app server and applied it
    to subsequent turns before dispatching hooks
    - replaced the notify integration test hook with a `python3` script so
    the test does not rely on Unix shell permissions or `bash`
    - documented the new field in `docs/config.md`
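
    How the optional field is filled, and how a hook might use it, can be
    sketched as below. Both functions are hypothetical illustrations, not
    code from `core` or `hooks`, and `example-app` in the usage note is a
    made-up client name.

    ```rust
    /// Picks the value for the optional top-level `client` field.
    /// TUI turns report `codex-tui`; app-server turns report the name captured
    /// from `initialize.clientInfo.name`, when one is available.
    fn notify_client_name(is_tui: bool, initialize_client_name: Option<&str>) -> Option<String> {
        if is_tui {
            Some("codex-tui".to_string())
        } else {
            initialize_client_name.map(str::to_string)
        }
    }

    /// Hook-side filter: skip notifications from clients that already show
    /// their own. A payload without `client` is treated like the old shape
    /// and still notifies.
    fn should_notify(client: Option<&str>, suppressed_clients: &[&str]) -> bool {
        match client {
            Some(name) => !suppressed_clients.contains(&name),
            None => true,
        }
    }
    ```

    For example, `should_notify(Some("example-app"), &["example-app"])`
    is `false`, while a payload with no `client` field still notifies.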
    
    ## Testing
    
    - `cargo test -p codex-hooks`
    - `cargo test -p codex-tui`
    - `cargo test -p codex-app-server
    suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name
    -- --exact --nocapture`
    - `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs`
    still has unrelated existing failures in this environment)
    
    ## Docs
    
    The public config reference on `developers.openai.com/codex` should
    mention that the legacy `notify` payload may include a top-level
    `client` field. The TUI reports `codex-tui`, and the app server reports
    `initialize.clientInfo.name` when it is available.