Commit Graph

280 Commits

  • feat: support product-scoped plugins. (#15041)
    1. Added SessionSource::Custom(String) and --session-source.
    2. Enforced plugin and skill products by session_source.
    3. Applied the same filtering to curated background refresh.
  • Add thread/shellCommand to app server API surface (#14988)
    This PR adds a new `thread/shellCommand` app server API so clients can
    implement `!` shell commands. These commands are executed within the
    sandbox, and the command text and output are visible to the model.
    
    The internal implementation mirrors the current TUI `!` behavior.
    - persist shell command execution as `CommandExecution` thread items,
    including source and formatted output metadata
    - bridge live and replayed app-server command execution events back into
    the existing `tui_app_server` exec rendering path
    
    This PR also wires `tui_app_server` to submit `!` commands through the
    new API.
  • fix: harden plugin feature gating (#15104)
    Resubmit https://github.com/openai/codex/pull/15020 with correct
    content.
    
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
  • Feat: reuse persisted model and reasoning effort on thread resume (#14888)
    ## Summary
    
    This PR makes `thread/resume` reuse persisted thread model metadata when
    the caller does not explicitly override it.
    
    Changes:
    - read persisted thread metadata from SQLite during `thread/resume`
    - reuse persisted `model` and `model_reasoning_effort` as resume-time
    defaults
    - fetch persisted metadata once and reuse it later in the resume
    response path
    - keep thread summary loading on the existing rollout path, while
    reusing persisted metadata when available
    - document the resume fallback behavior in the app-server README
    
    ## Why
    
    Before this change, resuming a thread without explicit overrides derived
    `model` and `model_reasoning_effort` from current config, which could
    drift from the thread’s last persisted values. That meant a resumed
    thread could report and run with different model settings than the ones
    it previously used.
    
    ## Behavior
    
    Precedence on `thread/resume` is now:
    1. explicit resume overrides
    2. persisted SQLite metadata for the thread
    3. normal config resolution for the resumed cwd
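    
    The precedence above can be sketched as a simple fallback chain. This is
    a minimal illustration, not the app-server code: the function name and
    the placeholder model names are assumptions, and the same pattern would
    apply to `model_reasoning_effort`.
    
    ```rust
    /// Hypothetical helper: picks the effective model for a resumed thread.
    /// Order: explicit override, then persisted metadata, then config.
    fn resolve_resume_model(
        explicit_override: Option<&str>,
        persisted: Option<&str>,
        config_default: &str,
    ) -> String {
        explicit_override
            .or(persisted) // fall back to the value stored for the thread
            .unwrap_or(config_default) // finally, normal config resolution
            .to_string()
    }
    ```
    
    With no override and a persisted value present, the persisted value wins,
    so a resumed thread keeps the settings it last ran with.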
  • Revert "fix: harden plugin feature gating" (#15102)
    Reverts openai/codex#15020
    
    I messed up the commit in my PR and accidentally merged changes that
    were still under review.
  • fix: harden plugin feature gating (#15020)
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
  • Prefer websockets when providers support them (#13592)
    Remove all flags and model settings.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: Add product-aware plugin policies and clean up manifest naming (#14993)
    - Add shared Product support to marketplace plugin policy and skill
    policy (not enforced yet).
    - Move marketplace installation/authentication under policy and model it
    as MarketplacePluginPolicy.
    - Rename plugin/marketplace local manifest types to separate raw serde
    shapes from resolved in-memory models.
  • Cleanup skills/remote/xxx endpoints. (#14977)
    Remove the skills/remote/xxx endpoints, as they are not in use for now.
  • fix: align marketplace display name with existing interface conventions (#14886)
    1. camelCase for displayName;
    2. move displayName under interface.
  • feat: support remote_sync for plugin install/uninstall. (#14878)
    - Added forceRemoteSync to plugin/install and plugin/uninstall.
    - With forceRemoteSync=true, we update the remote plugin status first,
    then apply the local change only if the backend call succeeds.
    - Kept plugin/list(forceRemoteSync=true) as the main reconciliation path;
    for now it treats remote enabled=false as uninstall. We will eventually
    migrate to plugin/installed for more precise state handling.
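    
    The remote-first ordering described above can be sketched as follows. The
    function, the closure standing in for the backend call, and the in-memory
    plugin list are all hypothetical; only the ordering (remote update first,
    local change only on success) comes from the description.
    
    ```rust
    /// Hypothetical sketch of a remote-first uninstall.
    /// `update_remote` stands in for the backend call, which is not shown
    /// in this log.
    fn uninstall_plugin<F>(
        force_remote_sync: bool,
        update_remote: F,
        installed: &mut Vec<String>,
        plugin: &str,
    ) -> Result<(), String>
    where
        F: FnOnce(&str) -> Result<(), String>,
    {
        if force_remote_sync {
            // Bail out before touching local state if the backend rejects.
            update_remote(plugin)?;
        }
        // Apply the local change only after the remote step succeeded
        // (or was not requested).
        installed.retain(|p| p.as_str() != plugin);
        Ok(())
    }
    ```
    
    If the closure returns an error, the local list is left untouched.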
  • Add marketplace display names to plugin/list (#14861)
    Add display_name support to marketplace.json.
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • fix: tui freeze when sub-agents are present (#14816)
    The issue was a circular `Drop` dependency: the embedded app-server
    waited for listeners that were themselves waiting for the app-server.
    
    The fix is an explicit cleanup.
    
    **Repro:**
    * Start codex
    * Ask it to spawn a sub-agent
    * Close Codex
    * It takes 5s to exit
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool calling
    runtime (for example, from js_repl/code_mode).
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
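    
    A client could track pending reviews, including parallel ones, roughly
    like this. It is a sketch under assumptions: the type and method names
    are hypothetical, only `targetItemId` and `review.status` are modeled,
    and since the notifications are marked `[UNSTABLE]` the shape may change.
    
    ```rust
    use std::collections::HashSet;
    
    /// The four `status` values listed above.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ReviewStatus {
        InProgress,
        Approved,
        Denied,
        Aborted,
    }
    
    impl ReviewStatus {
        /// Parses the wire string; unknown values yield `None`.
        fn from_wire(raw: &str) -> Option<Self> {
            match raw {
                "inProgress" => Some(Self::InProgress),
                "approved" => Some(Self::Approved),
                "denied" => Some(Self::Denied),
                "aborted" => Some(Self::Aborted),
                _ => None,
            }
        }
    }
    
    /// Hypothetical client-side tracker keyed by `targetItemId`.
    #[derive(Default)]
    struct PendingReviews {
        targets: HashSet<String>,
    }
    
    impl PendingReviews {
        /// Handles `item/autoApprovalReview/started`.
        fn on_started(&mut self, target_item_id: &str) {
            self.targets.insert(target_item_id.to_string());
        }
    
        /// Handles `item/autoApprovalReview/completed`.
        fn on_completed(&mut self, target_item_id: &str, status: &str) -> Option<ReviewStatus> {
            let parsed = ReviewStatus::from_wire(status)?;
            if parsed != ReviewStatus::InProgress {
                self.targets.remove(target_item_id);
            }
            Some(parsed)
        }
    
        fn pending_count(&self) -> usize {
            self.targets.len()
        }
    }
    ```
    
    Keying by `targetItemId` lets the tracker work even when the underlying
    tool item has not been emitted yet.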
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
      - pending guardian review state in the live status footer, including
      parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: add v2 filesystem APIs (#14245)
    Add a protocol-level filesystem surface to the v2 app-server so Codex
    clients can read and write files, inspect directories, and subscribe to
    path changes without relying on host-specific helpers.
    
    High-level changes:
    - define the new v2 fs/readFile, fs/writeFile, fs/createDirectory,
    fs/getMetadata, fs/readDirectory, fs/remove, fs/copy RPCs
    - implement the app-server handlers, including absolute-path validation,
    base64 file payloads, recursive copy/remove semantics
    - document the API, regenerate protocol schemas/types, and add
    end-to-end tests for filesystem operations, copy edge cases
    
    Testing plan:
    - validate protocol serialization and generated schema output for the
    new fs request, response, and notification types
    - run app-server integration coverage for file and directory CRUD paths,
    metadata/readDirectory responses, copy failure modes, and absolute-path
    validation
  • feat(app-server, core): add more spans (#14479)
    ## Description
    
    This PR expands tracing coverage across app-server thread startup, core
    session initialization, and the Responses transport layer. It also gives
    core dispatch spans stable operation-specific names so traces are easier
    to follow than the old generic `submission_dispatch` spans.
    
    Also use `fmt::Display` for types that we serialize in traces, so we send
    strings instead of Rust types.
  • Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
    ## Summary
    - launch Windows sandboxed children on a private desktop instead of
    `Winsta0\Default`
    - make private desktop the default while keeping
    `windows.sandbox_private_desktop=false` as the escape hatch
    - centralize process launch through the shared
    `create_process_as_user(...)` path
    - scope the private desktop ACL to the launching logon SID
    
    ## Why
    Today sandboxed Windows commands run on the visible shared desktop. That
    leaves an avoidable same-desktop attack surface for window interaction,
    spoofing, and related UI/input issues. This change moves sandboxed
    commands onto a dedicated per-launch desktop by default so the sandbox
    no longer shares `Winsta0\Default` with the user session.
    
    The implementation stays conservative on security, with no silent
    fallback to `Winsta0\Default`.
    
    If private-desktop setup fails on a machine, users can still opt out
    explicitly with `windows.sandbox_private_desktop=false`.
    
    ## Validation
    - `cargo build -p codex-cli`
    - elevated-path `codex exec` desktop-name probe returned
    `CodexSandboxDesktop-*`
    - elevated-path `codex exec` smoke sweep for shell commands, nested
    `pwsh`, jobs, and hidden `notepad` launch
    - unelevated-path full private-desktop compatibility sweep via `codex
    exec` with `-c windows.sandbox=unelevated`
  • Refactor cloud requirements error and surface in JSON-RPC error (#14504)
    Refactors cloud requirements error handling to carry structured error
    metadata and surfaces that metadata through JSON-RPC config-load
    failures, including:
    * adds typed CloudRequirementsLoadErrorCode values plus optional
    statusCode
    * marks thread/start, thread/resume, and thread/fork config failures
    with structured cloud-requirements error data
  • feat: add plugin/read. (#14445)
    return more information for a specific plugin.
  • use scopes_supported for OAuth when present on MCP servers (#14419)
    Fixes [#8889](https://github.com/openai/codex/issues/8889).
    
    ## Summary
    - Discover and use advertised MCP OAuth `scopes_supported` when no
    explicit or configured scopes are present.
    - Apply the same scope precedence across `mcp add`, `mcp login`, skill
    dependency auto-login, and app-server MCP OAuth login.
    - Keep discovered scopes ephemeral and non-persistent.
    - Retry once without scopes for CLI and skill auto-login flows if the
    OAuth provider rejects discovered scopes.
    
    ## Motivation
    Some MCP servers advertise the scopes they expect clients to request
    during OAuth, but Codex was ignoring that metadata and typically
    starting OAuth with no scopes unless the user manually passed `--scopes`
    or configured `server.scopes`.
    
    That made compliant MCP servers harder to use out of the box and is the
    behavior described in
    [#8889](https://github.com/openai/codex/issues/8889).
    
    This change also brings our behavior in line with the MCP authorization
    spec's scope selection guidance:
    
    https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization#scope-selection-strategy
    
    ## Behavior
    Scope selection now follows this order everywhere:
    1. Explicit request scopes / CLI `--scopes`
    2. Configured `server.scopes`
    3. Discovered `scopes_supported`
    4. Legacy empty-scope behavior
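    
    The order above can be sketched as a small resolver. This is an
    illustration, not the shared resolver added in `core`: the names are
    hypothetical, and treating any `Some` value as "present" for explicit and
    configured scopes is an assumption.
    
    ```rust
    /// Which rule produced the final scope list.
    #[derive(Debug, PartialEq, Eq)]
    enum ScopeSource {
        Explicit,
        Configured,
        Discovered,
        LegacyEmpty,
    }
    
    /// Hypothetical resolver following the four-step precedence.
    fn resolve_scopes(
        explicit: Option<Vec<String>>,
        configured: Option<Vec<String>>,
        discovered: Option<Vec<String>>,
    ) -> (ScopeSource, Vec<String>) {
        if let Some(scopes) = explicit {
            return (ScopeSource::Explicit, scopes);
        }
        if let Some(scopes) = configured {
            return (ScopeSource::Configured, scopes);
        }
        match discovered {
            // Missing or empty discovery falls through to the legacy path.
            Some(scopes) if !scopes.is_empty() => (ScopeSource::Discovered, scopes),
            _ => (ScopeSource::LegacyEmpty, Vec::new()),
        }
    }
    ```
    
    Returning the source alongside the scopes would let a caller decide
    whether a retry without scopes is appropriate, since only discovered
    scopes are eligible for that retry.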
    
    Compatibility notes:
    - Existing working setups keep the same behavior because explicit and
    configured scopes still win.
    - Discovered scopes are never written back into config or token storage.
    - If discovery is missing, malformed, or empty, behavior falls back to
    the previous empty-scope path.
    - App-server login gets the same precedence rules, but does not add a
    transparent retry path in this change.
    
    ## Implementation
    - Extend streamable HTTP OAuth discovery to parse and normalize
    `scopes_supported`.
    - Add a shared MCP scope resolver in `core` so all login entrypoints use
    the same precedence rules.
    - Preserve provider callback errors from the OAuth flow so CLI/skill
    flows can safely distinguish provider rejections from other failures.
    - Reuse discovered scopes from the existing OAuth support check where
    possible instead of persisting new config.
  • refactor: make bubblewrap the default Linux sandbox (#13996)
    ## Summary
    - make bubblewrap the default Linux sandbox and keep
    `use_legacy_landlock` as the only override
    - remove `use_linux_sandbox_bwrap` from feature, config, schema, and
    docs surfaces
    - update Linux sandbox selection, CLI/config plumbing, and related
    tests/docs to match the new default
    - fold in the follow-up CI fixes for request-permissions responses and
    Linux read-only sandbox error text
  • chore: use AVAILABLE and ON_INSTALL as default plugin install and auth policies (#14407)
    make `AVAILABLE` the default plugin installPolicy when unset in
    `marketplace.json`. similarly, make `ON_INSTALL` the default authPolicy.
    
    this means, when unset, plugins are available to be installed (but not
    auto-installed), and the contained connectors will be authed at
    install-time.
    
    updated tests.
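    
    The defaulting rule can be sketched as follows. The enum and function
    names are hypothetical, and the `Other` variant is a placeholder for any
    policy values this log does not name; only `AVAILABLE`, `NOT_AVAILABLE`,
    and `ON_INSTALL` appear in the commit descriptions.
    
    ```rust
    #[derive(Debug, PartialEq, Eq)]
    enum InstallPolicy {
        Available,
        NotAvailable,
        /// Placeholder for values not named in this log.
        Other(String),
    }
    
    #[derive(Debug, PartialEq, Eq)]
    enum AuthPolicy {
        OnInstall,
        /// Placeholder for values not named in this log.
        Other(String),
    }
    
    /// An unset installPolicy resolves to AVAILABLE.
    fn install_policy_or_default(raw: Option<&str>) -> InstallPolicy {
        match raw {
            None | Some("AVAILABLE") => InstallPolicy::Available,
            Some("NOT_AVAILABLE") => InstallPolicy::NotAvailable,
            Some(other) => InstallPolicy::Other(other.to_string()),
        }
    }
    
    /// An unset authPolicy resolves to ON_INSTALL.
    fn auth_policy_or_default(raw: Option<&str>) -> AuthPolicy {
        match raw {
            None | Some("ON_INSTALL") => AuthPolicy::OnInstall,
            Some(other) => AuthPolicy::Other(other.to_string()),
        }
    }
    ```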
  • feat(app-server): propagate traces across tasks and core ops (#14387)
    ## Summary
    
    This PR keeps app-server RPC request trace context alive for the full
    lifetime of the work that request kicks off (e.g. for `thread/start`,
    this is `app-server rpc handler -> tokio background task -> core op
    submissions`). Previously, we lost trace lineage once the request
    handler returned or handed work off to background tasks.
    
    This approach is especially relevant for `thread/start` and other RPC
    handlers that run in a non-blocking way. In the near future we'll most
    likely want to make all app-server handlers run in a non-blocking way by
    default, and only queue operations that must operate in order (e.g.
    thread RPCs per thread?), so we want to make sure tracing in app-server
    just generally works.
    
    Depends on https://github.com/openai/codex/pull/14300
    
    **Before**
    <img width="155" height="207" alt="image"
    src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
    />
    
    
    **After**
    <img width="299" height="337" alt="image"
    src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
    />
    
    ## What changed
    
    - Keep request-scoped trace context around until we send the final
    response or error, or the connection closes.
    - Thread that trace context through detached `thread/start` work so
    background startup stays attached to the originating request.
    - Pass request trace context through to downstream core operations,
    including:
      - thread creation
      - resume/fork flows
      - turn submission
      - review
      - interrupt
      - realtime conversation operations
    - Add tracing tests that verify:
      - remote W3C trace context is preserved for `thread/start`
      - remote W3C trace context is preserved for `turn/start`
      - downstream core spans stay under the originating request span
      - request-scoped tracing state is cleaned up correctly
    - Clean up shutdown behavior so detached background tasks and spawned
    threads are drained before process exit.
  • chore(app-server): stop emitting codex/event/ notifications (#14392)
    ## Description
    
    This PR stops emitting legacy `codex/event/*` notifications from the
    public app-server transports.
    
    It's been a long time coming! app-server was still producing a raw
    notification stream from core, alongside the typed app-server
    notifications and server requests, for compatibility reasons. Now,
    external clients should no longer be depending on those legacy
    notifications, so this change removes them from the stdio and websocket
    contract and updates the surrounding docs, examples, and tests to match.
    
    ### Caveat
    I left the "in-process" version of app-server alone for now, since
    `codex exec` was recently based on top of app-server via this in-process
    form here: https://github.com/openai/codex/pull/14005
    
    Seems like `codex exec` still consumes some legacy notifications
    internally, so this branch only removes `codex/event/*` from app-server
    over stdio and websockets.
    
    ## Follow-up
    
    Once `codex exec` is fully migrated off `codex/event/*` notifications,
    we'll be able to stop emitting them entirely instead of just filtering
    them at the external transport boundary.
  • chore: wire through plugin policies + category from marketplace.json (#14305)
    wire plugin marketplace metadata through app-server endpoints:
    - `plugin/list` has `installPolicy` and `authPolicy`
    - `plugin/install` has plugin-level `authPolicy`
    
    `plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when
    installing.
    
    
    added tests.
  • Add ephemeral flag support to thread fork (#14248)
    ### Summary
    This PR adds first-class ephemeral support to thread/fork, bringing it
    in line with thread/start. The goal is to support one-off completions on
    full forked threads without persisting them as normal user-visible
    threads.
    
    ### Testing
  • feat: Allow sync with remote plugin status. (#14176)
    Add forceRemoteSync to plugin/list.
    When it is set to true, we sync the local plugin status with the
    remote one (backend-api/plugins/list).
  • Mark incomplete resumed turns interrupted when idle (#14125)
    Fixes a Codex app bug where quitting the app mid-run could leave the
    reopened thread stuck in progress and non-interactable. On cold thread
    resume, app-server could return an idle thread with a replayed turn
    still marked in progress. This marks incomplete replayed turns as
    interrupted unless the thread is actually active.
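    
    A minimal sketch of that rule, with hypothetical type and function names
    (the real turn model is not shown in this log):
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum TurnStatus {
        InProgress,
        Completed,
        Interrupted,
    }
    
    /// Hypothetical fix-up applied to replayed turns on cold resume.
    fn mark_incomplete_turns(turns: &mut [TurnStatus], thread_is_active: bool) {
        if thread_is_active {
            // A live thread may legitimately have a turn in progress.
            return;
        }
        for turn in turns.iter_mut() {
            if *turn == TurnStatus::InProgress {
                *turn = TurnStatus::Interrupted;
            }
        }
    }
    ```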
  • Log ChatGPT user ID for feedback tags (#13901)
    There are some bug investigations that currently require us to ask users
    for their user ID even though they've already uploaded logs and session
    details via `/feedback`. This frustrates users and increases the time
    for diagnosis.
    
    This PR includes the ChatGPT user ID in the metadata uploaded for
    `/feedback` (both the TUI and app-server).
  • Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296)
    ### Purpose
    While trying to build out CLI-Tools for the agent to use under skills we
    have found that those tools sometimes need to invoke a user elicitation.
    These elicitations are handled out of band of the codex app-server but
    need to indicate to the exec manager that the command running is not
    going to progress on the usual timeout horizon.
    
    ### Example
    Model calls universal exec:
    `$ download-credit-card-history --start-date 2026-01-19 --end-date
    2026-02-19 > credit_history.jsonl`
    
    download-credit-card-history might hit a hosted/preauthenticated service
    to fetch data. That service might decide that the request requires
    end-user approval for access to the personal data. It should be able to
    signal to the running thread that the command in question is blocked on
    user elicitation. In that case we want the exec to continue, but the
    timeout should not expire on the tool call, essentially freezing time
    until the user approves or rejects the command, at which point the tool
    would signal the app-server to decrement the outstanding elicitation
    count. Timeouts would then proceed as normal.
    
    ### What's Added
    
    - New v2 RPC methods:
        - thread/increment_elicitation
        - thread/decrement_elicitation
    - Protocol updates in:
        - codex-rs/app-server-protocol/src/protocol/common.rs
        - codex-rs/app-server-protocol/src/protocol/v2.rs
    - App-server handlers wired in:
        - codex-rs/app-server/src/codex_message_processor.rs
    
    ### Behavior
    
    - Counter starts at 0 per thread.
    - increment atomically increases the counter.
    - decrement atomically decreases the counter; decrement at 0 returns
    invalid request.
    - Transition rules:
        - 0 -> 1: broadcast pause state, pausing all active stopwatches
        immediately.
        - >0 -> >0: remain paused.
        - 1 -> 0: broadcast unpause state, resuming stopwatches.
    - Core thread/session logic:
        - codex-rs/core/src/codex_thread.rs
        - codex-rs/core/src/codex.rs
        - codex-rs/core/src/mcp_connection_manager.rs
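    
    The counter and its transition rules can be sketched with a standard
    atomic. This is an illustration only: the type names are hypothetical,
    the broadcast itself is omitted, and the real implementation in the files
    listed above may differ.
    
    ```rust
    use std::sync::atomic::{AtomicUsize, Ordering};
    
    /// What the caller should do after a counter change.
    #[derive(Debug, PartialEq, Eq)]
    enum Transition {
        Paused,      // 0 -> 1: broadcast pause
        StillPaused, // >0 -> >0: nothing to broadcast
        Resumed,     // 1 -> 0: broadcast unpause
    }
    
    /// Hypothetical per-thread counter, starting at 0.
    struct ElicitationCounter {
        outstanding: AtomicUsize,
    }
    
    impl ElicitationCounter {
        fn new() -> Self {
            Self { outstanding: AtomicUsize::new(0) }
        }
    
        fn increment(&self) -> Transition {
            let previous = self.outstanding.fetch_add(1, Ordering::SeqCst);
            if previous == 0 { Transition::Paused } else { Transition::StillPaused }
        }
    
        /// Decrementing at 0 is rejected and leaves the counter unchanged.
        fn decrement(&self) -> Result<Transition, &'static str> {
            let previous = self
                .outstanding
                .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
                .map_err(|_| "invalid request: no outstanding elicitations")?;
            Ok(if previous == 1 { Transition::Resumed } else { Transition::StillPaused })
        }
    }
    ```
    
    Using `fetch_update` with `checked_sub` keeps the zero check and the
    decrement in one atomic step, so two concurrent decrements cannot both
    observe a count of 1.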
    
    ### Exec-server stopwatch integration
    
    - Added centralized stopwatch tracking/controller:
        - codex-rs/exec-server/src/posix/stopwatch_controller.rs
    - Hooked pause/unpause broadcast handling + stopwatch registration:
        - codex-rs/exec-server/src/posix/mcp.rs
        - codex-rs/exec-server/src/posix/stopwatch.rs
        - codex-rs/exec-server/src/posix.rs
  • [apps] Fix apps enablement condition. (#14011)
    - [x] Fix apps enablement condition to check both the feature flag and
    that the user is not an API key user.
  • fix(protocol): preserve legacy workspace-write semantics (#13957)
    ## Summary
    This is a fast follow to the initial `[permissions]` structure.
    
    - keep the new split-policy carveout behavior for narrower non-write
    entries under broader writable roots
    - preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge
    that drops only redundant nested readable roots when projecting from
    `SandboxPolicy`
    - route the legacy macOS seatbelt adapter through that same legacy
    bridge so redundant nested readable roots do not become read-only
    carveouts on macOS
    - derive the legacy bridge for `command_exec` using the sandbox root cwd
    rather than the request cwd so policy derivation matches later sandbox
    enforcement
    - add regression coverage for the legacy macOS nested-readable-root case
    
    ## Examples
    ### Legacy `workspace-write` on macOS
    A legacy `workspace-write` policy can redundantly list a nested readable
    root under an already-writable workspace root.
    
    For example, legacy config can effectively mean:
    - workspace root (`.` / `cwd`) is writable
    - `docs/` is also listed in `readable_roots`
    
    The new shared split-policy helper intentionally treats a narrower
    non-write entry under a broader writable root as a carveout for real
    `[permissions]` configs. Without this fast follow, the unchanged macOS
    seatbelt legacy adapter could project that legacy shape into a
    `FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout
    under the writable workspace root. In practice, legacy callers on macOS
    could unexpectedly lose write access inside `docs/`, even though that
    path was writable before the `[permissions]` migration work.
    
    This change fixes that by routing the legacy seatbelt path through the
    cwd-aware legacy bridge, so:
    - legacy `workspace-write` keeps `docs/` writable when `docs/` was only
    a redundant readable root
    - explicit `[permissions]` entries like `'.' = 'write'` and `'docs' =
    'read'` still make `docs/` read-only, which is the new intended
    split-policy behavior
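    
    The core idea of the legacy projection, dropping readable roots that are
    already covered by a writable root, can be sketched as below. The
    function name is hypothetical, and the sketch assumes all paths are
    already absolute and normalized; the actual bridge is cwd-aware and is
    not reproduced here.
    
    ```rust
    use std::path::PathBuf;
    
    /// Hypothetical filter: keeps only readable roots that are not nested
    /// under (or equal to) a writable root.
    fn drop_redundant_readable_roots(
        writable_roots: &[PathBuf],
        readable_roots: Vec<PathBuf>,
    ) -> Vec<PathBuf> {
        readable_roots
            .into_iter()
            .filter(|readable| {
                // `Path::starts_with` compares whole components, so
                // `/repo-other` is not treated as nested under `/repo`.
                !writable_roots.iter().any(|writable| readable.starts_with(writable))
            })
            .collect()
    }
    ```
    
    With `/repo` writable and `/repo/docs` listed as readable, the readable
    entry is dropped, so it never becomes a read-only carveout.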
    
    ### Legacy `command_exec` with a subdirectory cwd
    `command_exec` can run a command from a request cwd that is narrower
    than the sandbox root cwd.
    
    For example:
    - sandbox root cwd is `/repo`
    - request cwd is `/repo/subdir`
    - legacy policy is still `workspace-write` rooted at `/repo`
    
    Before this fast follow, `command_exec` derived the legacy bridge using
    the request cwd, but the sandbox was later built using the sandbox root
    cwd. That mismatch could miss redundant legacy readable roots during
    projection and accidentally reintroduce read-only carveouts for paths
    that should still be writable under the legacy model.
    
    This change fixes that by deriving the legacy bridge with the same
    sandbox root cwd that sandbox enforcement later uses.
    
    ## Verification
    - `just fmt`
    - `cargo test -p codex-core
    seatbelt_legacy_workspace_write_nested_readable_root_stays_writable`
    - `cargo test -p codex-core test_sandbox_config_parsing`
    - `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D
    warnings`
    - `cargo clean`
  • chore: plugin/uninstall endpoint (#14111)
    add `plugin/uninstall` app-server endpoint to fully rm plugin from
    plugins cache dir and rm entry from user config file.
    
    plugin-enablement is session-scoped, so uninstalls are only picked up in
    new sessions (like installs).
    
    added tests.
  • app-server: require absolute cwd for windowsSandbox/setupStart (#13833)
    ## Summary
    - require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
    - reject relative cwd values at request parsing instead of normalizing
    them later in the setup flow
    - add RPC-layer coverage for relative cwd rejection and update the
    checked-in protocol schemas/docs
    
    ## Why
    windowsSandbox/setupStart was carrying the client-provided cwd as a raw
    PathBuf for command_cwd while config derivation normalized the same
    value into an absolute policy_cwd.
    
    That left room for relative-path ambiguity in the setup path, especially
    for inputs like cwd: "repo". Making the RPC accept only absolute paths
    removes that split entirely: the handler now receives one
    already-validated absolute path and uses it for both config derivation
    and setup.
    
    This keeps the trust model unchanged. Trusted clients could already
    choose the session cwd; this change is only about making the setup RPC
    reject relative paths so command_cwd and policy_cwd cannot diverge.
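
    A minimal sketch of rejecting relative paths at parse time, assuming a newtype wrapper around `PathBuf`. `AbsoluteCwd` is a hypothetical stand-in for illustration and does not reproduce the real `AbsolutePathBuf` API.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical stand-in for an absolute-path newtype.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct AbsoluteCwd(PathBuf);

    impl AbsoluteCwd {
        /// Reject relative paths up front instead of normalizing them later.
        fn parse(raw: &str) -> Result<Self, String> {
            let path = Path::new(raw);
            if path.is_absolute() {
                Ok(Self(path.to_path_buf()))
            } else {
                Err(format!("cwd must be an absolute path, got {raw:?}"))
            }
        }

        /// The single validated path that both consumers would share.
        fn as_path(&self) -> &Path {
            &self.0
        }
    }
    ```

    Because the handler only ever holds a value that passed `parse`, an input like `"repo"` fails at the request boundary and there is no second, separately normalized copy of the path.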
    
    ## Testing
    - cargo test -p codex-app-server windows_sandbox_setup (run locally by
    user)
    - cargo test -p codex-app-server-protocol windows_sandbox (run locally
    by user)
  • sandboxing: plumb split sandbox policies through runtime (#13439)
    ## Why
    
    `#13434` introduces split `FileSystemSandboxPolicy` and
    `NetworkSandboxPolicy`, but the runtime still made most execution-time
    sandbox decisions from the legacy `SandboxPolicy` projection.
    
    That projection loses information about combinations like unrestricted
    filesystem access with restricted network access. In practice, that
    means the runtime can choose the wrong platform sandbox behavior or set
    the wrong network-restriction environment for a command even when config
    has already separated those concerns.
    
    This PR carries the split policies through the runtime so sandbox
    selection, process spawning, and exec handling can consult the policy
    that actually matters.
    
    ## What changed
    
    - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through
    `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state,
    unified exec, and app-server exec overrides
    - updated sandbox selection in `core/src/sandboxing/mod.rs` and
    `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus
    `NetworkSandboxPolicy`, rather than inferring behavior only from the
    legacy `SandboxPolicy`
    - updated process spawning in `core/src/spawn.rs` and the platform
    wrappers to use `NetworkSandboxPolicy` when deciding whether to set
    `CODEX_SANDBOX_NETWORK_DISABLED`
    - kept additional-permissions handling and legacy `ExternalSandbox`
    compatibility projections aligned with the split policies, including
    explicit user-shell execution and Windows restricted-token routing
    - updated callers across `core`, `app-server`, and `linux-sandbox` to
    pass the split policies explicitly
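
    The information loss described above can be sketched with simplified stand-in types. The enums and functions below are hypothetical and much smaller than the real `FileSystemSandboxPolicy`, `NetworkSandboxPolicy`, and `SandboxPolicy`; they only show why a single projected value cannot carry both axes.

    ```rust
    /// Simplified stand-in for the filesystem policy kind.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum FileSystemKind {
        Restricted,
        Unrestricted,
    }

    /// Simplified stand-in for the network policy.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum NetworkPolicy {
        Restricted,
        Enabled,
    }

    /// Hypothetical legacy projection with a single axis.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum LegacyPolicy {
        FullAccess,
        WorkspaceWrite,
    }

    /// Assumption: the projection is driven by the filesystem kind only,
    /// so the network policy is dropped.
    fn project_to_legacy(fs: FileSystemKind, _net: NetworkPolicy) -> LegacyPolicy {
        match fs {
            FileSystemKind::Unrestricted => LegacyPolicy::FullAccess,
            FileSystemKind::Restricted => LegacyPolicy::WorkspaceWrite,
        }
    }

    /// Deciding from the network policy directly keeps the restriction.
    fn should_disable_network(net: NetworkPolicy) -> bool {
        net == NetworkPolicy::Restricted
    }
    ```

    In this sketch, unrestricted filesystem access projects to the same legacy value whether the network is restricted or not, while `should_disable_network` still distinguishes the two cases.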
    
    ## Verification
    
    - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to
    verify `RunUserShellCommand` does not inherit
    `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn
    - added coverage in `core/src/exec.rs` for Windows restricted-token
    sandbox selection when the legacy projection is `ExternalSandbox`
    - updated Linux sandbox coverage in
    `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy
    exec path
    - verified the current PR state with `just clippy`
    
    
    
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
    * Add an ability to stream stdin, stdout, and stderr
    * Streaming of stdout and stderr has a configurable cap for total amount
    of transmitted bytes (with an ability to disable it)
    * Add support for overriding environment variables
    * Add an ability to terminate running applications (using
    `command/exec/terminate`)
    * Add TTY/PTY support, with an ability to resize the terminal (using
    `command/exec/resize`)
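
    One way to picture the configurable byte cap is a small budget tracker. This is a sketch only: `OutputCap` is a hypothetical name, and modelling a disabled cap as `None` is an assumption rather than the app-server's actual representation.

    ```rust
    /// Tracks how many streamed bytes may still be forwarded.
    /// `remaining: None` models a disabled cap (assumption).
    struct OutputCap {
        remaining: Option<usize>,
    }

    impl OutputCap {
        fn new(cap: Option<usize>) -> Self {
            Self { remaining: cap }
        }

        /// Returns the prefix of `chunk` that still fits under the cap
        /// and deducts it from the remaining budget.
        fn admit<'a>(&mut self, chunk: &'a [u8]) -> &'a [u8] {
            match self.remaining.as_mut() {
                None => chunk,
                Some(remaining) => {
                    let take = chunk.len().min(*remaining);
                    *remaining -= take;
                    &chunk[..take]
                }
            }
        }
    }
    ```

    Sharing one `OutputCap` across stdout and stderr would bound the total number of transmitted bytes, which matches the "total amount" wording above.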
  • feat: Add curated plugin marketplace + Metadata Cleanup. (#13712)
    1. Add a synced curated plugin marketplace and include it in marketplace
    discovery.
    2. Expose optional plugin.json interface metadata in plugin/list.
    3. Tighten plugin and marketplace path handling using validated absolute
    paths.
    4. Let manifests override skill, MCP, and app config paths.
    5. Restrict plugin enablement/config loading to the user config layer so
    plugin enablement is at global level
  • check app auth in plugin/install (#13685)
    #### What
    On `plugin/install`, check whether the installed apps are already
    authenticated on ChatGPT, and return the list of apps that are not.
    Clients can use this list to trigger auth workflows as needed.
    
    Checks are best effort, based on `codex_apps` loading, much like
    `app/list`.
    
    #### Tests
    Added integration tests, tested locally.
  • support plugin/list. (#13540)
    Introduce a plugin/list endpoint that reads from the local marketplace.json.
    Also update the signature of plugin/install.
  • [tui] Show speed in session header (#13446)
    - add a speed row to the startup/session header under the model row
    - render the speed row with the same styling pattern as the model row,
    using /fast to change
    - show only Fast or Standard to users and update the affected snapshots
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Preserve persisted thread git info in resume (#13504)
    ## Summary
    - ensure `thread/resume` reuses the stored `gitInfo` instead of
    rebuilding it from the live working tree
    - persist and apply thread git metadata through the resume flow and add
    a regression test covering branch mismatch cases
    
    ## Testing
    - Not run (not requested)
  • plugin: support local-based marketplace.json + install endpoint. (#13422)
    Support a marketplace.json that points to a local file, with
    ```
    "source": {
        "source": "local",
        "path": "./plugin-1"
    },
    ```
    
    Add a new plugin/install endpoint that adds the plugin to the cache
    folder and enables it in config.toml.
  • allow apps to specify cwd for sandbox setup. (#13484)
    The electron app doesn't start up the app-server in a particular
    workspace directory.
    So sandbox setup happens in the app-installed directory instead of the
    project workspace.
    
    This allows the app to specify the workspace cwd so that the sandbox
    setup actually sets up the ACLs instead of exiting fast and then having
    the first shell command be slow.
  • config: enforce enterprise feature requirements (#13388)
    ## Why
    
    Enterprises can already constrain approvals, sandboxing, and web search
    through `requirements.toml` and MDM, but feature flags were still only
    configurable as managed defaults. That meant an enterprise could suggest
    feature values, but it could not actually pin them.
    
    This change closes that gap and makes enterprise feature requirements
    behave like the other constrained settings. The effective feature set
    now stays consistent with enterprise requirements during config load,
    when config writes are validated, and when runtime code mutates feature
    flags later in the session.
    
    It also tightens the runtime API for managed features. `ManagedFeatures`
    now follows the same constraint-oriented shape as `Constrained<T>`
    instead of exposing panic-prone mutation helpers, and production code
    can no longer construct it through an unconstrained `From<Features>`
    path.
    
    The PR also hardens the `compact_resume_fork` integration coverage on
    Windows. After the feature-management changes,
    `compact_resume_after_second_compaction_preserves_history` was
    overflowing the libtest/Tokio thread stacks on Windows, so the test now
    uses an explicit larger-stack harness as a pragmatic mitigation. That
    may not be the ideal root-cause fix, and it merits a parallel
    investigation into whether part of the async future chain should be
    boxed to reduce stack pressure instead.
    
    ## What Changed
    
    Enterprises can now pin feature values in `requirements.toml` with the
    requirements-side `features` table:
    
    ```toml
    [features]
    personality = true
    unified_exec = false
    ```
    
    Only canonical feature keys are allowed in the requirements `features`
    table; omitted keys remain unconstrained.
    
    - Added a requirements-side pinned feature map to
    `ConfigRequirementsToml`, threaded it through source-preserving
    requirements merge and normalization in `codex-config`, and made the
    TOML surface use `[features]` (while still accepting legacy
    `[feature_requirements]` for compatibility).
    - Exposed `featureRequirements` from `configRequirements/read`,
    regenerated the JSON/TypeScript schema artifacts, and updated the
    app-server README.
    - Wrapped the effective feature set in `ManagedFeatures`, backed by
    `ConstrainedWithSource<Features>`, and changed its API to mirror
    `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
    and result-returning `enable` / `disable` / `set_enabled` helpers.
    - Removed the legacy-usage and bulk-map passthroughs from
    `ManagedFeatures`; callers that need those behaviors now mutate a plain
    `Features` value and reapply it through `set(...)`, so the constrained
    wrapper remains the enforcement boundary.
    - Removed the production loophole for constructing unconstrained
    `ManagedFeatures`. Non-test code now creates it through the configured
    feature-loading path, and `impl From<Features> for ManagedFeatures` is
    restricted to `#[cfg(test)]`.
    - Rejected legacy feature aliases in enterprise feature requirements,
    and return a load error when a pinned combination cannot survive
    dependency normalization.
    - Validated config writes against enterprise feature requirements before
    persisting changes, including explicit conflicting writes and
    profile-specific feature states that normalize into invalid
    combinations.
    - Updated runtime and TUI feature-toggle paths to use the constrained
    setter API and to persist or apply the effective post-constraint value
    rather than the requested value.
    - Updated the `core_test_support` Bazel target to include the bundled
    core model-catalog fixtures in its runtime data, so helper code that
    resolves `core/models.json` through runfiles works in remote Bazel test
    environments.
    - Renamed the core config test coverage to emphasize that effective
    feature values are normalized at runtime, while conflicting persisted
    config writes are rejected.
    - Ran `compact_resume_after_second_compaction_preserves_history` inside
    an explicit 8 MiB test thread and Tokio runtime worker stack, following
    the existing larger-stack integration-test pattern, to keep the Windows
    `compact_resume_fork` test slice from aborting while a parallel
    investigation continues into whether some of the underlying async
    futures should be boxed.
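
    The constraint-oriented setter shape can be sketched as follows. `PinnedFeatures` is a hypothetical, much smaller stand-in for `ManagedFeatures`. It assumes features are plain string keys with boolean values and that unpinned features default to disabled, and it uses `String` errors instead of `ConstraintResult`.

    ```rust
    use std::collections::BTreeMap;

    /// Simplified stand-in for a constrained feature set.
    struct PinnedFeatures {
        pinned: BTreeMap<String, bool>,
        effective: BTreeMap<String, bool>,
    }

    impl PinnedFeatures {
        /// Start from the pinned values so they are always in effect.
        fn new(pinned: BTreeMap<String, bool>) -> Self {
            let effective = pinned.clone();
            Self { pinned, effective }
        }

        /// A value is allowed when the key is unpinned or matches the pin.
        fn can_set(&self, key: &str, enabled: bool) -> bool {
            self.pinned.get(key).map_or(true, |pinned| *pinned == enabled)
        }

        /// Returns an error instead of panicking when a pin would be violated.
        fn set_enabled(&mut self, key: &str, enabled: bool) -> Result<(), String> {
            if !self.can_set(key, enabled) {
                return Err(format!("feature `{key}` is pinned by requirements"));
            }
            self.effective.insert(key.to_string(), enabled);
            Ok(())
        }

        /// Assumption: features that were never set count as disabled.
        fn is_enabled(&self, key: &str) -> bool {
            self.effective.get(key).copied().unwrap_or(false)
        }
    }
    ```

    With the `[features]` example above pinned, a request to enable `unified_exec` returns an error and the effective value stays `false`, while unpinned keys remain freely settable.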
    
    ## Verification
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core feature_requirements_ -- --nocapture`
    - `cargo test -p codex-core
    load_requirements_toml_produces_expected_constraints -- --nocapture`
    - `cargo test -p codex-core
    compact_resume_after_second_compaction_preserves_history -- --nocapture`
    - `cargo test -p codex-core compact_resume_fork -- --nocapture`
    - Re-ran the built `codex-core` `tests/all` binary with
    `RUST_MIN_STACK=262144` for
    `compact_resume_after_second_compaction_preserves_history` to confirm
    the explicit-stack harness fixes the deterministic low-stack repro.
    - `cargo test -p codex-core`
    - This still fails locally in unrelated integration areas that expect
    the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
    wiremock mismatches.
    
    ## Docs
    
    `developers.openai.com/codex` should document the requirements-side
    `[features]` table for enterprise and MDM-managed configuration,
    including that it only accepts canonical feature keys and that
    conflicting config writes are rejected.
  • Add thread metadata update endpoint to app server (#13280)
    ## Summary
    - add the v2 `thread/metadata/update` API, including
    protocol/schema/TypeScript exports and app-server docs
    - patch stored thread `gitInfo` in sqlite without resuming the thread,
    with validation plus support for explicit `null` clears
    - repair missing sqlite thread rows from rollout data before patching,
    and make those repairs safe by inserting only when absent and updating
    only git columns so newer metadata is not clobbered
    - keep sqlite authoritative for mutable thread git metadata by
    preserving existing sqlite git fields during reconcile/backfill and only
    using rollout `SessionMeta` git fields to fill gaps
    - add regression coverage for the endpoint, repair paths, concurrent
    sqlite writes, clearing git fields, and rollout/backfill reconciliation
    - fix the login server shutdown race so cancelling before the waiter
    starts still terminates `block_until_done()` correctly
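
    Support for explicit `null` clears implies a tri-state patch: a field can be absent, `null`, or a value. A common way to model this in Rust is a nested `Option`, sketched below. The struct and field names (`GitInfoPatch`, `StoredGitInfo`, `branch`, `sha`) are hypothetical and not taken from the protocol schema.

    ```rust
    /// Tri-state patch fields: `None` leaves the stored value alone,
    /// `Some(None)` clears it, and `Some(Some(v))` sets it.
    #[derive(Debug, Clone, PartialEq, Eq, Default)]
    struct GitInfoPatch {
        branch: Option<Option<String>>,
        sha: Option<Option<String>>,
    }

    /// Hypothetical stored form of the git metadata.
    #[derive(Debug, Clone, PartialEq, Eq, Default)]
    struct StoredGitInfo {
        branch: Option<String>,
        sha: Option<String>,
    }

    /// Apply only the fields that are present in the patch.
    fn apply_patch(stored: &mut StoredGitInfo, patch: GitInfoPatch) {
        if let Some(branch) = patch.branch {
            stored.branch = branch;
        }
        if let Some(sha) = patch.sha {
            stored.sha = sha;
        }
    }
    ```

    Only the fields named in the patch change, so a request that clears the branch leaves the stored commit untouched.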
    
    ## Testing
    - `cargo test -p codex-state
    apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-state
    update_thread_git_info_preserves_newer_non_git_metadata`
    - `cargo test -p codex-core
    backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-app-server thread_metadata_update`
    - `cargo test`
    - currently fails in existing `codex-core` grep-files tests with
    `unsupported call: grep_files`:
        - `suite::grep_files::grep_files_tool_collects_matches`
        - `suite::grep_files::grep_files_tool_reports_empty_results`
  • chore(app-server): delete v1 RPC methods and notifications (#13375)
    ## Summary
    This removes the old app-server v1 methods and notifications we no
    longer need, while keeping the small set the main codex app client still
    depends on for now.
    
    The remaining legacy surface is:
    - `initialize`
    - `getConversationSummary`
    - `getAuthStatus`
    - `gitDiffToRemote`
    - `fuzzyFileSearch`
    - `fuzzyFileSearch/sessionStart`
    - `fuzzyFileSearch/sessionUpdate`
    - `fuzzyFileSearch/sessionStop`
    
    The raw `codex/event/*` notifications emitted from core also remain for
    now; they will be removed in a follow-up PR.
    
    ## What changed
    - removed deprecated v1 request variants from the protocol and
    app-server dispatcher
    - removed deprecated typed notifications: `authStatusChange`,
    `loginChatGptComplete`, and `sessionConfigured`
    - updated the app-server test client to use v2 flows instead of deleted
    v1 flows
    - deleted legacy-only app-server test suites and added focused coverage
    for `getConversationSummary`
    - regenerated app-server schema fixtures and updated the MCP interface
    docs to match the remaining compatibility surface
    
    ## Testing
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server`