Commit Graph

328 Commits

  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
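    
    A minimal sketch of the config-load rules above, assuming a simplified
    per-scope struct. `ScopeConfig` and `migrate_scope` are hypothetical
    names used for illustration, not the actual loader types, and carrying
    the alias value over to `smart_approvals` is an assumption:
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ApprovalsReviewer {
        User,
        GuardianSubagent,
    }
    
    /// One config scope, e.g. the top level or a single profile (hypothetical shape).
    #[derive(Debug, Default, Clone, PartialEq, Eq)]
    struct ScopeConfig {
        smart_approvals: Option<bool>,
        guardian_approval: Option<bool>, // deprecated alias
        approvals_reviewer: Option<ApprovalsReviewer>,
    }
    
    /// Migrates the deprecated alias within a single scope.
    fn migrate_scope(mut scope: ScopeConfig) -> ScopeConfig {
        if let Some(enabled) = scope.guardian_approval.take() {
            // Assumption: the alias value carries over to the new flag name
            // unless the new flag is already set in this scope.
            if scope.smart_approvals.is_none() {
                scope.smart_approvals = Some(enabled);
            }
            // Backfill the reviewer only when the alias was on and no
            // reviewer is configured in this same scope.
            if enabled && scope.approvals_reviewer.is_none() {
                scope.approvals_reviewer = Some(ApprovalsReviewer::GuardianSubagent);
            }
        }
        scope
    }
    ```
    
    Under these assumptions, a scope with only `smart_approvals = true`
    comes back with no reviewer set, while a scope with only
    `guardian_approval = true` comes back with `guardian_subagent`.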
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
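    
    As a rough client-side illustration of the parallel-review case, the
    sketch below tracks the latest `status` per `targetItemId` and counts
    reviews that are still pending. The `ReviewStatus` enum and
    `pending_reviews` helper are hypothetical; only the four status strings
    come from the payload description above:
    
    ```rust
    use std::collections::HashMap;
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ReviewStatus {
        InProgress,
        Approved,
        Denied,
        Aborted,
    }
    
    impl ReviewStatus {
        /// Parses the wire value of `review.status`; unknown values yield `None`.
        fn from_wire(value: &str) -> Option<Self> {
            match value {
                "inProgress" => Some(Self::InProgress),
                "approved" => Some(Self::Approved),
                "denied" => Some(Self::Denied),
                "aborted" => Some(Self::Aborted),
                _ => None,
            }
        }
    
        fn is_resolved(self) -> bool {
            !matches!(self, Self::InProgress)
        }
    }
    
    /// Counts target items whose most recent review status is still in progress.
    /// Each event is a `(targetItemId, status)` pair in arrival order.
    fn pending_reviews(events: &[(&str, &str)]) -> usize {
        let mut latest: HashMap<&str, ReviewStatus> = HashMap::new();
        for (target, status) in events {
            if let Some(parsed) = ReviewStatus::from_wire(status) {
                latest.insert(*target, parsed);
            }
        }
        latest.values().filter(|status| !status.is_resolved()).count()
    }
    ```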
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
      - pending guardian review state in the live status footer, including
        parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: add v2 filesystem APIs (#14245)
    Add a protocol-level filesystem surface to the v2 app-server so Codex
    clients can read and write files, inspect directories, and subscribe to
    path changes without relying on host-specific helpers.
    
    High-level changes:
    - define the new v2 fs/readFile, fs/writeFile, fs/createDirectory,
    fs/getMetadata, fs/readDirectory, fs/remove, fs/copy RPCs
    - implement the app-server handlers, including absolute-path validation,
    base64 file payloads, recursive copy/remove semantics
    - document the API, regenerate protocol schemas/types, and add
    end-to-end tests for filesystem operations, copy edge cases
    
    Testing plan:
    - validate protocol serialization and generated schema output for the
    new fs request, response, and notification types
    - run app-server integration coverage for file and directory CRUD paths,
    metadata/readDirectory responses, copy failure modes, and absolute-path
    validation
  • feat(app-server, core): add more spans (#14479)
    ## Description
    
    This PR expands tracing coverage across app-server thread startup, core
    session initialization, and the Responses transport layer. It also gives
    core dispatch spans stable operation-specific names so traces are easier
    to follow than the old generic `submission_dispatch` spans.
    
    Also use `fmt::Display` for types that we serialize in traces so we send
    strings instead of Rust types.
  • app-server: Add platform os and family to init response (#14527)
    This allows the client to pick os-specific behavior while interacting
    with the app server, e.g. to use proper path separators.
  • feat: add plugin/read. (#14445)
    return more information for a specific plugin.
  • chore(app-server): stop exporting EventMsg schemas (#14478)
    Follow up to https://github.com/openai/codex/pull/14392, stop exporting
    EventMsg types to TypeScript and JSON schema since we no longer emit
    them.
  • chore: use AVAILABLE and ON_INSTALL as default plugin install and auth policies (#14407)
    make `AVAILABLE` the default plugin installPolicy when unset in
    `marketplace.json`. similarly, make `ON_INSTALL` the default authPolicy.
    
    this means, when unset, plugins are available to be installed (but not
    auto-installed), and the contained connectors will be authed at
    install-time.
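    
    A small sketch of the defaulting rule, assuming the policies are parsed
    as optional fields. The `MarketplaceEntry` struct, the
    `effective_policies` helper, and the `OnUse` variant are hypothetical;
    only `AVAILABLE`, `NOT_AVAILABLE`, and `ON_INSTALL` appear in these
    commits:
    
    ```rust
    #[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
    enum InstallPolicy {
        NotAvailable,
        #[default]
        Available,
    }
    
    #[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
    enum AuthPolicy {
        #[default]
        OnInstall,
        OnUse, // hypothetical second variant, for illustration only
    }
    
    /// Policies as they might appear on a marketplace entry; `None` means unset.
    #[derive(Debug, Default)]
    struct MarketplaceEntry {
        install_policy: Option<InstallPolicy>,
        auth_policy: Option<AuthPolicy>,
    }
    
    /// Resolves unset policies to their defaults.
    fn effective_policies(entry: &MarketplaceEntry) -> (InstallPolicy, AuthPolicy) {
        (
            entry.install_policy.unwrap_or_default(),
            entry.auth_policy.unwrap_or_default(),
        )
    }
    ```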
    
    updated tests.
  • Include spawn agent model metadata in app-server items (#14410)
    - add model and reasoning effort to app-server collab spawn items and
    notifications
    - regenerate app-server protocol schemas for the new fields
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore(app-server): delete unused rpc methods from v1.rs (#14394)
    ## Description
    
    This PR trims `app-server-protocol`'s v1 surface down to the small set
    of legacy types we still actually use.
    
    Unfortunately, we can't delete all of them yet because:
    - a few one-off v1 RPCs are still used by the Codex app
    - a few of these app-server-protocol v1 types are actually imported by
    core crates
    
    This change deletes that unused RPC surface, keeps the remaining
    compatibility types in place, and makes the crate root re-export only
    the v1 structs that downstream crates still depend on.
    
    ## Why
    
    The main goal here is to make the legacy protocol surface match reality.
    Leaving a large pile of dead v1 structs in place makes it harder to tell
    which compatibility paths are still intentional, and it keeps old
    schema/types around even though nothing should be building against them
    anymore.
    
    This also gives us a cleaner boundary for future cleanup. Instead of
    re-exporting all of `protocol::v1::*`, the crate now explicitly exposes
    only the v1 types that are still live, which makes it much easier to see
    what remains and delete more safely later.
    
    ## What changed
    
    - Deleted the unused v1 RPC/request/response structs from
    `app-server-protocol/src/protocol/v1.rs`.
    - Kept the small set of v1 compatibility types that are still live,
    including:
      - `initialize`
      - `getConversationSummary`
      - `getAuthStatus`
      - `gitDiffToRemote`
      - legacy approval payloads
      - config-related structs still used by downstream crates
    - Replaced the blanket `pub use protocol::v1::*` export in
    `app-server-protocol/src/lib.rs` with an explicit list of the remaining
    supported v1 types.
    - Regenerated the schema/type artifacts, which also updated the
    `InitializeCapabilities` opt-out example to use `thread/started` instead
    of the old `codex/event/session_configured` example.
    
    ## Validation
    
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    
    ## Follow-up
    
    The next cleanup is to keep shrinking the remaining v1 compatibility
    surface as callers migrate off it. Once the remaining consumers stop
    importing these legacy types, we should be able to remove more of the v1
    module and eventually stop exporting it from the crate root entirely.
  • feat: search_tool migrate to bring you own tool of Responses API (#14274)
    ## Why
    
    To support the new bring-your-own search tool in the Responses API
    (https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search),
    we are migrating our bm25 search tool to the official way of executing
    search on the client and communicating additional tools to the model.
    
    ## What
    - replace the legacy `search_tool_bm25` flow with client-executed
    `tool_search`
    - add protocol, SSE, history, and normalization support for
    `tool_search_call` and `tool_search_output`
    - return namespaced Codex Apps search results and wire namespaced
    follow-up tool calls back into MCP dispatch
  • chore(app-server): stop emitting codex/event/ notifications (#14392)
    ## Description
    
    This PR stops emitting legacy `codex/event/*` notifications from the
    public app-server transports.
    
    It's been a long time coming! app-server was still producing a raw
    notification stream from core, alongside the typed app-server
    notifications and server requests, for compatibility reasons. Now,
    external clients should no longer be depending on those legacy
    notifications, so this change removes them from the stdio and websocket
    contract and updates the surrounding docs, examples, and tests to match.
    
    ### Caveat
    I left the "in-process" version of app-server alone for now, since
    `codex exec` was recently based on top of app-server via this in-process
    form here: https://github.com/openai/codex/pull/14005
    
    Seems like `codex exec` still consumes some legacy notifications
    internally, so this branch only removes `codex/event/*` from app-server
    over stdio and websockets.
    
    ## Follow-up
    
    Once `codex exec` is fully migrated off `codex/event/*` notifications,
    we'll be able to stop emitting them entirely instead of just filtering
    them at the external transport boundary.
  • chore: wire through plugin policies + category from marketplace.json (#14305)
    wire plugin marketplace metadata through app-server endpoints:
    - `plugin/list` has `installPolicy` and `authPolicy`
    - `plugin/install` has plugin-level `authPolicy`
    
    `plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when
    installing.
    
    
    added tests.
  • Show spawned agent model and effort in TUI (#14273)
    - include the requested sub-agent model and reasoning effort in the
    spawn begin event
    - render that metadata next to the spawned agent name and role in the
    TUI transcript
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: add a separate reject-policy flag for skill approvals (#14271)
    ## Summary
    - add `skill_approval` to `RejectConfig` and the app-server v2
    `AskForApproval::Reject` payload so skill-script prompts can be
    configured independently from sandbox and rule-based prompts
    - update Unix shell escalation to reject prompts based on the actual
    decision source, keeping prefix rules tied to `rules`, unmatched command
    fallbacks tied to `sandbox_approval`, and skill scripts tied to
    `skill_approval`
    - regenerate the affected protocol/config schemas and expand
    unit/integration coverage for the new flag and skill approval behavior
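    
    The decision-source mapping in the second bullet could be sketched
    roughly as follows. `DecisionSource` and `should_reject_prompt` are
    hypothetical names, and `RejectConfig` here shows only the three flags
    discussed in this commit:
    
    ```rust
    /// Subset of the reject flags; the real config has additional fields.
    #[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
    struct RejectConfig {
        rules: bool,
        sandbox_approval: bool,
        skill_approval: bool,
    }
    
    /// Where the decision to prompt came from (hypothetical enum).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum DecisionSource {
        PrefixRule,
        UnmatchedCommandFallback,
        SkillScript,
    }
    
    /// Returns true when a prompt from `source` should be rejected
    /// instead of being shown to the user.
    fn should_reject_prompt(config: &RejectConfig, source: DecisionSource) -> bool {
        match source {
            DecisionSource::PrefixRule => config.rules,
            DecisionSource::UnmatchedCommandFallback => config.sandbox_approval,
            DecisionSource::SkillScript => config.skill_approval,
        }
    }
    ```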
  • feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155)
    Add additional macOS Sandbox Permissions levers for the following:
    
    - Launch Services
    - Contacts
    - Reminders
  • Add ephemeral flag support to thread fork (#14248)
    ### Summary
    This PR adds first-class ephemeral support to thread/fork, bringing it
    in line with thread/start. The goal is to support one-off completions on
    full forked threads without persisting them as normal user-visible
    threads.
    
    ### Testing
  • app-server: propagate nested experimental gating for AskForApproval::Reject (#14191)
    ## Summary
    This change makes `AskForApproval::Reject` gate correctly anywhere it
    appears inside otherwise-stable app-server protocol types.
    
    Previously, experimental gating for `approval_policy: Reject` was
    handled with request-specific logic in `ClientRequest` detection. That
    covered a few request params types, but it did not generalize to other
    nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or
    `ConfigRequirements`.
    
    This PR replaces that ad hoc handling with a generic nested experimental
    propagation mechanism.
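    
    One way to picture nested propagation is a trait that every protocol
    type implements by delegating to its fields. This is only a sketch under
    that assumption: the trait, the simplified `Profile` and `Config`
    stand-ins, and `check_request` are hypothetical, and `Reject` is shown
    as a unit variant although the real variant carries a payload:
    
    ```rust
    trait ExperimentalReason {
        /// Returns a reason string if the value uses an experimental feature.
        fn experimental_reason(&self) -> Option<&'static str>;
    }
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AskForApproval {
        OnRequest,
        Never,
        Reject,
    }
    
    impl ExperimentalReason for AskForApproval {
        fn experimental_reason(&self) -> Option<&'static str> {
            match self {
                AskForApproval::Reject => Some("askForApproval.reject"),
                _ => None,
            }
        }
    }
    
    // Containers forward the question to whatever they hold.
    impl<T: ExperimentalReason> ExperimentalReason for Option<T> {
        fn experimental_reason(&self) -> Option<&'static str> {
            self.as_ref().and_then(|inner| inner.experimental_reason())
        }
    }
    
    impl<T: ExperimentalReason> ExperimentalReason for Vec<T> {
        fn experimental_reason(&self) -> Option<&'static str> {
            self.iter().find_map(|item| item.experimental_reason())
        }
    }
    
    struct Profile {
        approval_policy: Option<AskForApproval>,
    }
    
    impl ExperimentalReason for Profile {
        fn experimental_reason(&self) -> Option<&'static str> {
            self.approval_policy.experimental_reason()
        }
    }
    
    struct Config {
        approval_policy: Option<AskForApproval>,
        profiles: Vec<Profile>,
    }
    
    impl ExperimentalReason for Config {
        fn experimental_reason(&self) -> Option<&'static str> {
            self.approval_policy
                .experimental_reason()
                .or_else(|| self.profiles.experimental_reason())
        }
    }
    
    /// Rejects experimental content unless the client opted in.
    fn check_request<T: ExperimentalReason>(params: &T, experimental_api: bool) -> Result<(), String> {
        match params.experimental_reason() {
            Some(reason) if !experimental_api => {
                Err(format!("{reason} requires experimentalApi capability"))
            }
            _ => Ok(()),
        }
    }
    ```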
    
    ## Testing
    
    Seeing this when running `app-server-test-client` without the
    experimental API enabled:
    ```
     initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" }
    > {
    >   "id": "50244f6a-270a-425d-ace0-e9e98205bde7",
    >   "method": "thread/start",
    >   "params": {
    >     "approvalPolicy": {
    >       "reject": {
    >         "mcp_elicitations": false,
    >         "request_permissions": true,
    >         "rules": false,
    >         "sandbox_approval": true
    >       }
    >     },
    >     "baseInstructions": null,
    >     "config": null,
    >     "cwd": null,
    >     "developerInstructions": null,
    >     "dynamicTools": null,
    >     "ephemeral": null,
    >     "experimentalRawEvents": false,
    >     "mockExperimentalField": null,
    >     "model": null,
    >     "modelProvider": null,
    >     "persistExtendedHistory": false,
    >     "personality": null,
    >     "sandbox": null,
    >     "serviceName": null
    >   }
    > }
    < {
    <   "error": {
    <     "code": -32600,
    <     "message": "askForApproval.reject requires experimentalApi capability"
    <   },
    <   "id": "50244f6a-270a-425d-ace0-e9e98205bde7"
    < }
    [verified] thread/start rejected approvalPolicy=Reject without experimentalApi
    ```
    
    ---------
    
    Co-authored-by: celia-oai <celia@openai.com>
  • feat: Allow sync with remote plugin status. (#14176)
    Add forceRemoteSync to plugin/list.
    When it is set to True, we will sync the local plugin status with the
    remote one (backend-api/plugins/list).
  • Use realtime transcript for handoff context (#14132)
    - collect input/output transcript deltas into active handoff transcript
    state
    - attach and clear that transcript on each handoff, and regenerate
    schema/tests
  • Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296)
    ### Purpose
    While trying to build out CLI-Tools for the agent to use under skills we
    have found that those tools sometimes need to invoke a user elicitation.
    These elicitations are handled out of band of the codex app-server but
    need to indicate to the exec manager that the command running is not
    going to progress on the usual timeout horizon.
    
    ### Example
    Model calls universal exec:
    `$ download-credit-card-history --start-date 2026-01-19 --end-date
    2026-02-19 > credit_history.jsonl`
    
    `download-credit-card-history` might hit a hosted/preauthenticated
    service to fetch data. That service might decide that the request
    requires end-user approval for access to the personal data. It should be
    able to signal to the running thread that the command in question is
    blocked on user elicitation. In that case we want the exec to continue,
    but the timeout not to expire on the tool call, essentially freezing
    time until the user approves or rejects the command, at which point the
    tool would signal the app-server to decrement the outstanding
    elicitation count. Timeouts would then proceed as normal.
    
    ### What's Added
    
    - New v2 RPC methods:
        - thread/increment_elicitation
        - thread/decrement_elicitation
    - Protocol updates in:
        - codex-rs/app-server-protocol/src/protocol/common.rs
        - codex-rs/app-server-protocol/src/protocol/v2.rs
    - App-server handlers wired in:
        - codex-rs/app-server/src/codex_message_processor.rs
    
    ### Behavior
    
    - Counter starts at 0 per thread.
    - increment atomically increases the counter.
    - decrement atomically decreases the counter; decrement at 0 returns
    invalid request.
    - Transition rules:
        - 0 -> 1: broadcast pause state, pausing all active stopwatches
        immediately.
        - \>0 -> >0: remain paused.
        - 1 -> 0: broadcast unpause state, resuming stopwatches.
    - Core thread/session logic:
        - codex-rs/core/src/codex_thread.rs
        - codex-rs/core/src/codex.rs
        - codex-rs/core/src/mcp_connection_manager.rs
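    
    The counter behavior described above can be sketched with a single
    atomic. This is an illustrative model only: `ElicitationCounter` and
    `Transition` are hypothetical names, and the actual broadcast to the
    stopwatches is left out:
    
    ```rust
    use std::sync::atomic::{AtomicU64, Ordering};
    
    /// What the caller should broadcast after a counter change.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum Transition {
        Paused,      // 0 -> 1
        StillPaused, // >0 -> >0
        Resumed,     // 1 -> 0
    }
    
    struct ElicitationCounter {
        count: AtomicU64,
    }
    
    impl ElicitationCounter {
        fn new() -> Self {
            Self { count: AtomicU64::new(0) }
        }
    
        /// Atomically increments; only the 0 -> 1 edge triggers a pause.
        fn increment(&self) -> Transition {
            let previous = self.count.fetch_add(1, Ordering::SeqCst);
            if previous == 0 {
                Transition::Paused
            } else {
                Transition::StillPaused
            }
        }
    
        /// Atomically decrements; decrementing at 0 is an invalid request.
        fn decrement(&self) -> Result<Transition, &'static str> {
            let previous = self
                .count
                .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
                .map_err(|_| "invalid request: no outstanding elicitations")?;
            Ok(if previous == 1 {
                Transition::Resumed
            } else {
                Transition::StillPaused
            })
        }
    }
    ```
    
    Using `fetch_update` with `checked_sub` keeps the "reject at 0" check
    and the decrement in one atomic step, so two concurrent decrements
    cannot both observe a count of 1.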
    
    ### Exec-server stopwatch integration
    
    - Added centralized stopwatch tracking/controller:
        - codex-rs/exec-server/src/posix/stopwatch_controller.rs
    - Hooked pause/unpause broadcast handling + stopwatch registration:
        - codex-rs/exec-server/src/posix/mcp.rs
        - codex-rs/exec-server/src/posix/stopwatch.rs
        - codex-rs/exec-server/src/posix.rs
  • fix(core) default RejectConfig.request_permissions (#14165)
    ## Summary
    Adds a default here so existing config deserializes
    
    ## Testing
    - [x] Added a unit test
  • start of hooks engine (#13276)
    (Experimental)
    
    This PR adds a first MVP for hooks, with SessionStart and Stop
    
    The core design is:
    
    - hooks live in a dedicated engine under codex-rs/hooks
    - each hook type has its own event-specific file
    - hook execution is synchronous and blocks normal turn progression while
    running
    - matching hooks run in parallel, then their results are aggregated into
    a normalized HookRunSummary
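    
    A rough sketch of "run matching hooks in parallel, then aggregate",
    using scoped threads from the standard library. The fields of
    `HookRunSummary` and the `HookOutcome` enum shown here are assumptions
    for illustration; the real types in codex-rs/hooks may look different:
    
    ```rust
    use std::thread;
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum HookOutcome {
        Completed { added_context: Option<String> },
        Failed { message: String },
    }
    
    #[derive(Debug, Default, PartialEq, Eq)]
    struct HookRunSummary {
        completed: usize,
        failed: usize,
        added_context: Vec<String>,
    }
    
    /// Runs every hook on its own thread, blocks until all finish,
    /// and folds the outcomes into one summary (in hook order).
    fn run_hooks<F>(hooks: &[F]) -> HookRunSummary
    where
        F: Fn() -> HookOutcome + Sync,
    {
        let outcomes: Vec<HookOutcome> = thread::scope(|scope| {
            // Spawn everything first so the hooks actually overlap.
            let handles: Vec<_> = hooks
                .iter()
                .map(|hook| scope.spawn(move || hook()))
                .collect();
            handles
                .into_iter()
                .map(|handle| {
                    handle.join().unwrap_or_else(|_| HookOutcome::Failed {
                        message: "hook panicked".to_string(),
                    })
                })
                .collect::<Vec<_>>()
        });
    
        let mut summary = HookRunSummary::default();
        for outcome in outcomes {
            match outcome {
                HookOutcome::Completed { added_context } => {
                    summary.completed += 1;
                    summary.added_context.extend(added_context);
                }
                HookOutcome::Failed { .. } => summary.failed += 1,
            }
        }
        summary
    }
    ```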
    
    On the AppServer side, hooks are exposed as operational metadata rather
    than transcript-native items:
    
    - new live notifications: hook/started, hook/completed
    - persisted/replayed hook results live on Turn.hookRuns
    - we intentionally did not add hook-specific ThreadItem variants
    
    Hook messages are not persisted; they remain ephemeral. The context
    changes they add are persisted (they get appended to the user's prompt).
  • feat(approvals) RejectConfig for request_permissions (#14118)
    ## Summary
    We need to support allowing request_permissions calls when using
    `Reject` policy
    
    <img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
    40 PM"
    src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
    />
    
    Note that this is a backwards-incompatible change for Reject policy. I'm
    not sure if we need to add a default based on our current use/setup
    
    ## Testing
    - [x] Added tests
    - [x] Tested locally
  • feat(core) Persist request_permission data across turns (#14009)
    ## Summary
    request_permissions flows should support persisting results for the
    session.
    
    Open Question: Still deciding if we need within-turn approvals - this
    adds complexity but I could see it being useful
    
    ## Testing
    - [x] Updated unit tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Stabilize protocol schema fixture generation (#13886)
    ## What changed
    - TypeScript schema fixture generation now goes through in-memory tree
    helpers rather than a heavier on-disk generation path.
    - The comparison logic normalizes generated banner and path differences
    that are not semantically relevant to the exported schema.
    - TypeScript and JSON fixture coverage are split into separate tests,
    and the expensive schema-export tests are serialized in `nextest`.
    
    ## Why this fixes the flake
    - The original fixture coverage mixed several heavy codegen paths into
    one monolithic test and then compared generated output that included
    incidental banner/path differences.
    - On Windows CI, that combination created both runtime pressure and
    output variance unrelated to the schema shapes we actually care about.
    - Splitting the coverage isolates failures by format, in-memory
    generation reduces filesystem churn, normalization strips generator
    noise, and serializing the heavy tests removes parallel resource
    contention.
    
    ## Scope
    - Production helper change plus test changes.
  • chore: plugin/uninstall endpoint (#14111)
    add `plugin/uninstall` app-server endpoint to fully rm plugin from
    plugins cache dir and rm entry from user config file.
    
    plugin-enablement is session-scoped, so uninstalls are only picked up in
    new sessions (like installs).
    
    added tests.
  • Add request permissions tool (#13092)
    Adds a built-in `request_permissions` tool and wires it through the
    Codex core, protocol, and app-server layers so a running turn can ask
    the client for additional permissions instead of relying on a static
    session policy.
    
    The new flow emits a `RequestPermissions` event from core, tracks the
    pending request by call ID, forwards it through app-server v2 as an
    `item/permissions/requestApproval` request, and resumes the tool call
    once the client returns an approved subset of the requested permission
    profile.
  • app-server: include experimental skill metadata in exec approval requests (#13929)
    ## Summary
    
    This change surfaces skill metadata on command approval requests so
    app-server clients can tell when an approval came from a skill script
    and identify the originating `SKILL.md`.
    
    - add `skill_metadata` to exec approval events in the shared protocol
    - thread skill metadata through core shell escalation and delegated
    approval handling for skill-triggered approvals
    - expose the field in app-server v2 as experimental `skillMetadata`
    - regenerate the JSON/TypeScript schemas and cover the new field in
    protocol, transport, core, and TUI tests
    
    ## Why
    
    Skill-triggered approvals already carry skill context inside core, but
    app-server clients could not see which skill caused the prompt. Sending
    the skill metadata with the approval request makes it possible for
    clients to present better approval UX and connect the prompt back to the
    relevant skill definition.
    
    
    ## example event in app-server-v2
    verified that we see this event when experimental api is on:
    ```
    < {
    <   "id": 11,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "additionalPermissions": {
    <       "fileSystem": null,
    <       "macos": {
    <         "accessibility": false,
    <         "automations": {
    <           "bundle_ids": [
    <             "com.apple.Notes"
    <           ]
    <         },
    <         "calendar": false,
    <         "preferences": "read_only"
    <       },
    <       "network": null
    <     },
    <     "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563",
    <     "availableDecisions": [
    <       "accept",
    <       "acceptForSession",
    <       "cancel"
    <     ],
    <     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <     "commandActions": [
    <       {
    <         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes",
    <     "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy",
    <     "skillMetadata": {
    <       "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md"
    <     },
    <     "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69",
    <     "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f"
    <   }
    < }
    ```
    
    Also verified that this is the event when the experimental API is off:
    ```
    < {
    <   "id": 13,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0",
    <     "availableDecisions": [
    <       "accept",
    <       "acceptForSession",
    <       "cancel"
    <     ],
    <     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <     "commandActions": [
    <       {
    <         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Users/celia/code/codex/codex-rs",
    <     "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt",
    <     "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4",
    <     "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a"
    <   }
    < }
    ```
  • Add in-process app server and wire up exec to use it (#14005)
    This is a subset of PR #13636. See that PR for a full overview of the
    architectural change.
    
    This PR implements the in-process app server and modifies the
    non-interactive "exec" entry point to use the app server.
    
    ---------
    
    Co-authored-by: Felipe Coury <felipe.coury@gmail.com>
  • [app-server] Support hot-reload user config when batch writing config. (#13839)
    - [x] Support hot-reload user config when batch writing config.
  • Add guardian approval MVP (#13692)
    ## Summary
    - add the guardian reviewer flow for `on-request` approvals in command,
    patch, sandbox-retry, and managed-network approval paths
    - keep guardian behind `features.guardian_approval` instead of exposing
    a public `approval_policy = guardian` mode
    - route ordinary `OnRequest` approvals to the guardian subagent when the
    feature is enabled, without changing the public approval-mode surface
    
    ## Public model
    - public approval modes stay unchanged
    - guardian is enabled via `features.guardian_approval`
    - when that feature is on, `approval_policy = on-request` keeps the same
    approval boundaries but sends those approval requests to the guardian
    reviewer instead of the user
    - `/experimental` only persists the feature flag; it does not rewrite
    `approval_policy`
    - CLI and app-server no longer expose a separate `guardian` approval
    mode in this PR
    
    ## Guardian reviewer
    - the reviewer runs as a normal subagent and reuses the existing
    subagent/thread machinery
    - it is locked to a read-only sandbox and `approval_policy = never`
    - it does not inherit user/project exec-policy rules
    - it prefers `gpt-5.4` when the current provider exposes it, otherwise
    falls back to the parent turn's active model
    - it fails closed on timeout, startup failure, malformed output, or any
    other review error
    - it currently auto-approves only when `risk_score < 80`
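    
    The fail-closed rule and the score threshold combine into a very small
    decision function. The sketch below assumes the risk score is an integer
    that fits in a `u8`; `ReviewError`, `Decision`, and `decide` are
    hypothetical names:
    
    ```rust
    /// Ways a guardian review can fail to produce a usable score.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ReviewError {
        Timeout,
        StartupFailure,
        MalformedOutput,
    }
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum Decision {
        Approve,
        Deny,
    }
    
    /// Scores strictly below this value are auto-approved.
    const AUTO_APPROVE_BELOW: u8 = 80;
    
    /// Approves only on a successful review with a low enough score;
    /// every error path falls through to `Deny`.
    fn decide(review: Result<u8, ReviewError>) -> Decision {
        match review {
            Ok(score) if score < AUTO_APPROVE_BELOW => Decision::Approve,
            _ => Decision::Deny,
        }
    }
    ```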
    
    ## Review context and policy
    - guardian mirrors `OnRequest` approval semantics rather than
    introducing a separate approval policy
    - explicit `require_escalated` requests follow the same approval surface
    as `OnRequest`; the difference is only who reviews them
    - managed-network allowlist misses that enter the approval flow are also
    reviewed by guardian
    - the review prompt includes bounded recent transcript history plus
    recent tool call/result evidence
    - transcript entries and planned-action strings are truncated with
    explicit `<guardian_truncated ... />` markers so large payloads stay
    bounded
    - apply-patch reviews include the full patch content (without
    duplicating the structured `changes` payload)
    - the guardian request layout is snapshot-tested using the same
    model-visible Responses request formatter used elsewhere in core
    
    ## Guardian network behavior
    - the guardian subagent inherits the parent session's managed-network
    allowlist when one exists, so it can use the same approved network
    surface while reviewing
    - exact session-scoped network approvals are copied into the guardian
    session with protocol/port scope preserved
    - those copied approvals are now seeded before the guardian's first turn
    is submitted, so inherited approvals are available during any immediate
    review-time checks
    
    ## Out of scope / follow-ups
    - the sandbox-permission validation split was pulled into a separate PR
    and is not part of this diff
    - a future follow-up can enable `serde_json` preserve-order in
    `codex-core` and then simplify the guardian action rendering further
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: require absolute cwd for windowsSandbox/setupStart (#13833)
    ## Summary
    - require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
    - reject relative cwd values at request parsing instead of normalizing
    them later in the setup flow
    - add RPC-layer coverage for relative cwd rejection and update the
    checked-in protocol schemas/docs
    
    ## Why
    windowsSandbox/setupStart was carrying the client-provided cwd as a raw
    PathBuf for command_cwd while config derivation normalized the same
    value into an absolute policy_cwd.
    
    That left room for relative-path ambiguity in the setup path, especially
    for inputs like cwd: "repo". Making the RPC accept only absolute paths
    removes that split entirely: the handler now receives one
    already-validated absolute path and uses it for both config derivation
    and setup.
    
    This keeps the trust model unchanged. Trusted clients could already
    choose the session cwd; this change is only about making the setup RPC
    reject relative paths so command_cwd and policy_cwd cannot diverge.
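
    As a rough sketch of the parse-time check described above, using only
    the standard library: `AbsoluteCwd` is a hypothetical stand-in for
    illustration and is not the `AbsolutePathBuf` type used in the
    codebase.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical wrapper that can only hold an absolute path.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct AbsoluteCwd(PathBuf);

    impl AbsoluteCwd {
        /// Rejects relative input up front instead of normalizing it later.
        pub fn parse(raw: &str) -> Result<Self, String> {
            let path = Path::new(raw);
            if path.is_absolute() {
                Ok(Self(path.to_path_buf()))
            } else {
                Err(format!("cwd must be an absolute path, got {raw:?}"))
            }
        }

        /// The single validated path, usable for both config derivation and setup.
        pub fn as_path(&self) -> &Path {
            &self.0
        }
    }
    ```

    With a type like this, an input such as `"repo"` fails at the request
    boundary, so there is no second, differently resolved copy of the
    path.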
    
    ## Testing
    - cargo test -p codex-app-server windows_sandbox_setup (run locally by
    user)
    - cargo test -p codex-app-server-protocol windows_sandbox (run locally
    by user)
  • feat(app-server-protocol): address naming conflicts in json schema exporter (#13819)
    This fixes a schema export bug where two different `WebSearchAction`
    types were getting merged under the same name in the app-server v2 JSON
    schema bundle.
    
    The problem was that v2 thread items use the app-server API's
    `WebSearchAction` with camelCase variants like `openPage`, while
    `ThreadResumeParams.history` and
    `RawResponseItemCompletedNotification.item` pull in the upstream
    `ResponseItem` graph, which uses the Responses API snake_case shape like
    `open_page`. During bundle generation we were flattening nested
    definitions into the v2 namespace by plain name, so the later definition
    could silently overwrite the earlier one.
    
    That meant clients generating code from the bundled schema could end up
    with the wrong `WebSearchAction` definition for v2 thread history. In
    practice this shows up on web search items reconstructed from rollout
    files with persisted extended history.
    
    This change does two things:
    - Gives the upstream Responses API schema a distinct JSON schema name:
    `ResponsesApiWebSearchAction`
    - Makes namespace-level schema definition collisions fail loudly instead
    of silently overwriting
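
    The "fail loudly" behavior can be illustrated with a small sketch.
    `insert_definition` is a hypothetical helper, definitions are modeled
    as plain strings rather than real JSON schema values, and allowing an
    identical re-insert is an assumption of this example.

    ```rust
    use std::collections::BTreeMap;

    /// Adds `definition` under `name`, returning an error instead of
    /// overwriting when a different definition already uses that name.
    pub fn insert_definition(
        namespace: &mut BTreeMap<String, String>,
        name: &str,
        definition: &str,
    ) -> Result<(), String> {
        match namespace.get(name) {
            // Same name, different shape: report the collision.
            Some(existing) if existing.as_str() != definition => {
                Err(format!("schema definition collision for `{name}`"))
            }
            // Assumption: re-inserting an identical definition is harmless.
            Some(_) => Ok(()),
            None => {
                namespace.insert(name.to_string(), definition.to_string());
                Ok(())
            }
        }
    }
    ```

    Under this scheme the snake_case shape has to be registered under its
    own name, such as `ResponsesApiWebSearchAction`, to be accepted.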
  • app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
    * Add an ability to stream stdin, stdout, and stderr
    * Streaming of stdout and stderr has a configurable cap on the total
    number of transmitted bytes (with an ability to disable it)
    * Add support for overriding environment variables
    * Add an ability to terminate running applications (using
    `command/exec/terminate`)
    * Add TTY/PTY support, with an ability to resize the terminal (using
    `command/exec/resize`)
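
    One way to picture the configurable byte cap is a budget that each
    streamed chunk is charged against. This is only a sketch: `OutputCap`
    is a hypothetical type, and truncating a chunk mid-way once the budget
    runs out is an assumption, not something this PR states.

    ```rust
    /// Hypothetical byte budget for streamed output; `None` disables the cap.
    pub struct OutputCap {
        remaining: Option<usize>,
    }

    impl OutputCap {
        pub fn new(limit: Option<usize>) -> Self {
            Self { remaining: limit }
        }

        /// Returns the prefix of `chunk` that still fits in the budget
        /// and charges it against the remaining total.
        pub fn take<'a>(&mut self, chunk: &'a [u8]) -> &'a [u8] {
            match self.remaining.as_mut() {
                // Cap disabled: forward everything.
                None => chunk,
                Some(left) => {
                    let n = chunk.len().min(*left);
                    *left -= n;
                    &chunk[..n]
                }
            }
        }
    }
    ```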
  • Allow full web search tool config (#13675)
    Previously, we could only configure whether web search was on/off.
    
    This PR enables sending a full web search config, which includes
    everything the Responses API supports: filters, location, etc.
  • feat: Add curated plugin marketplace + Metadata Cleanup. (#13712)
    1. Add a synced curated plugin marketplace and include it in marketplace
    discovery.
    2. Expose optional plugin.json interface metadata in plugin/list.
    3. Tighten plugin and marketplace path handling using validated absolute
    paths.
    4. Let manifests override skill, MCP, and app config paths.
    5. Restrict plugin enablement/config loading to the user config layer so
    that plugin enablement is global.
  • feat: structured plugin parsing (#13711)
    #### What
    
    Add structured `@plugin` parsing and TUI support for plugin mentions.
    
    - Core: switch from plain-text `@display_name` parsing to structured
    `plugin://...` mentions via `UserInput::Mention` and
    `[$...](plugin://...)` links in text, same pattern as apps/skills.
    - TUI: add plugin mention popup, autocomplete, and chips when typing
    `$`. Load plugin capability summaries and feed them into the composer;
    plugin mentions appear alongside skills and apps.
    - Generalize mention parsing to take a sigil parameter, which still
    defaults to `$`.
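
    To make the link-style mention format concrete, here is a minimal
    sketch of sigil-parameterized parsing. `Mention` and
    `parse_plugin_mentions` are hypothetical names, the scanner is
    deliberately simple, and it is not the parser added in core.

    ```rust
    /// Illustrative parsed mention: the label after the sigil and the
    /// part of the link target after `plugin://`.
    #[derive(Debug, PartialEq, Eq)]
    pub struct Mention {
        pub name: String,
        pub path: String,
    }

    /// Scans `text` for links shaped like `[<sigil>name](plugin://path)`.
    pub fn parse_plugin_mentions(text: &str, sigil: char) -> Vec<Mention> {
        const SCHEME: &str = "plugin://";
        let mut out = Vec::new();
        let mut rest = text;
        while let Some(open) = rest.find('[') {
            let after_open = &rest[open + 1..];
            // Always advance past this '[' so the loop makes progress.
            rest = after_open;
            let Some(label_and_tail) = after_open.strip_prefix(sigil) else { continue };
            let Some(close) = label_and_tail.find("](") else { continue };
            let label = &label_and_tail[..close];
            // Skip labels that are empty or that span another bracket.
            if label.is_empty() || label.contains(|c: char| c == '[' || c == ']') {
                continue;
            }
            let target = &label_and_tail[close + 2..];
            let Some(path) = target.strip_prefix(SCHEME) else { continue };
            let Some(end) = path.find(')') else { continue };
            out.push(Mention {
                name: label.to_string(),
                path: path[..end].to_string(),
            });
        }
        out
    }
    ```

    Passing the sigil as an argument lets the same scanner handle `$` or
    `@` labels without duplicating the logic.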
    
    <img width="797" height="119" alt="image"
    src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
    />
    
    Builds on #13510. Currently clients have to build their own `id` via
    `plugin@marketplace` and filter plugins to show by `enabled`, but we
    will add `id` and `available` as fields returned from `plugin/list`
    soon.
    
    #### Tests
    
    Added tests, verified locally.
  • [elicitations] Switch to use MCP style elicitation payload for mcp tool approvals. (#13621)
    - [x] Switch to use MCP style elicitation payload for mcp tool
    approvals.
    - [ ] TODO: Update the UI to support the full spec.
  • Enabling CWD Saving for Image-Gen (#13607)
    Codex now saves generated images to your current working directory.
  • check app auth in plugin/install (#13685)
    #### What
    On `plugin/install`, check whether the installed apps are already
    authenticated on ChatGPT and return the list of all apps that are not.
    Clients can use this list to trigger auth workflows as needed.
    
    Checks are best-effort, based on `codex_apps` loading, much like
    `app/list`.
    
    #### Tests
    Added integration tests, tested locally.
  • refactor: remove proxy admin endpoint (#13687)
    ## Summary
    - delete the network proxy admin server and its runtime listener/task
    plumbing
    - remove the admin endpoint config, runtime, requirement, protocol,
    schema, and debug-surface fields
    - update proxy docs to reflect the remaining HTTP and SOCKS listeners
    only
  • fix: accept two macOS automation input shapes for approval payload compatibility (#13683)
    ## Summary
    This PR:
    1. fixes a deserialization mismatch for macOS automation permissions in
    approval payloads by making core parsing accept both supported wire
    shapes for bundle IDs.
    2. adds `#[serde(default)]` to `MacOsSeatbeltProfileExtensions` so
    omitted fields deserialize to secure defaults.
    
    
    ## Why this change is needed
    `MacOsAutomationPermission` uses `#[serde(try_from =
    "MacOsAutomationPermissionDe")]`, so deserialization is controlled by
    `MacOsAutomationPermissionDe`. After we aligned v2
    `additionalPermissions.macos.automations` to the core shape, approval
    payloads started including `{ "bundle_ids": [...] }` in some paths.
    `MacOsAutomationPermissionDe` previously accepted only `"none" | "all"`
    or a plain array, so object-shaped bundle IDs failed with `data did not
    match any variant of untagged enum MacOsAutomationPermissionDe`. This
    change restores compatibility by accepting both forms while preserving
    existing normalization behavior (trim values and map empty bundle lists
    to `None`).
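
    The normalization described above can be sketched without serde by
    modeling the accepted wire shapes as an enum. All names here
    (`AutomationInput`, `AutomationPermission`, `normalize`) are
    illustrative rather than the real types, and dropping entries that
    are blank after trimming is an assumption of this example.

    ```rust
    /// Illustrative stand-ins for the wire shapes listed above.
    pub enum AutomationInput {
        /// `"none"` or `"all"`.
        Keyword(String),
        /// A plain array of bundle IDs.
        List(Vec<String>),
        /// The object form: `{ "bundle_ids": [...] }`.
        Object { bundle_ids: Vec<String> },
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum AutomationPermission {
        None,
        All,
        BundleIds(Vec<String>),
    }

    /// Maps every accepted shape onto one normalized permission value.
    pub fn normalize(input: AutomationInput) -> Result<AutomationPermission, String> {
        let ids = match input {
            AutomationInput::Keyword(keyword) => {
                return match keyword.as_str() {
                    "none" => Ok(AutomationPermission::None),
                    "all" => Ok(AutomationPermission::All),
                    other => Err(format!("unknown automation permission {other:?}")),
                }
            }
            // Both list-carrying shapes share the same normalization.
            AutomationInput::List(ids) | AutomationInput::Object { bundle_ids: ids } => ids,
        };
        let ids: Vec<String> = ids
            .iter()
            .map(|id| id.trim().to_string())
            .filter(|id| !id.is_empty()) // assumption: blank entries are dropped
            .collect();
        if ids.is_empty() {
            Ok(AutomationPermission::None)
        } else {
            Ok(AutomationPermission::BundleIds(ids))
        }
    }
    ```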
    
    ## Validation
    
    Confirmed that the error below no longer occurs when running:
    ```
    cargo run -p codex-app-server-test-client -- \
        --codex-bin ./target/debug/codex \
        -c 'approval_policy="on-request"' \
        -c 'features.shell_zsh_fork=true' \
        -c 'zsh_path="/tmp/codex-zsh-fork/package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh"' \
        send-message-v2 --experimental-api \
        'Use $apple-notes and run scripts/notes_info now.'
    ```
    Error before the fix:
    ```
    Error: failed to deserialize ServerRequest from JSONRPCRequest
    
    Caused by:
        data did not match any variant of untagged enum MacOsAutomationPermissionDe
    ```
  • support plugin/list. (#13540)
    Introduce a plugin/list endpoint that reads from the local
    marketplace.json. Also update the signature for plugin/install.
  • core/protocol: add structured macOS additional permissions and merge them into sandbox execution (#13499)
    ## Summary
    - Introduce strongly-typed macOS additional permissions across
    protocol/core/app-server boundaries.
    - Merge additional permissions into effective sandbox execution,
    including macOS seatbelt profile extensions.
    - Expand docs, schema/tool definitions, UI rendering, and tests for
    `network`, `file_system`, and `macos` additional permissions.
  • add @plugin mentions (#13510)
    ## Note: this adds plugin mentions via `@`, which conflicts with file
    mentions
    
    Depends on and builds upon #13433.
    
    - Introduces explicit `@plugin` mentions. A mention injects the plugin's
    MCP servers, app names, and skill name format into the turn context as a
    dev message.
    - There is no UI for these mentions yet, so raw text is parsed for now
    (unlike skills and apps, which have UI chips, autocomplete, etc.). The UI
    depends on an upcoming `plugins/list` app-server endpoint that can feed
    it.
    - Also annotates MCP and app tool descriptions with the plugin(s) they
    come from. This gives the model a first-class way of understanding which
    tools come from which plugins, which will help implicit invocation.
    
    ### Tests
    Added and updated tests, unit and integration. Also confirmed locally a
    raw `@plugin` injects the dev message, and the model knows about its
    apps, mcps, and skills.
  • feat(app-server): support mcp elicitations in v2 api (#13425)
    This adds a first-class server request for MCP server elicitations:
    `mcpServer/elicitation/request`.
    
    Until now, MCP elicitation requests only showed up as a raw
    `codex/event/elicitation_request` event from core. That made it hard for
    v2 clients to handle elicitations using the same request/response flow
    as other server-driven interactions (like shell and `apply_patch`
    tools).
    
    This also updates the underlying MCP elicitation request handling in
    core to pass through the full MCP request (including URL and form data)
    so we can expose it properly in app-server.
    
    ### Why not `item/mcpToolCall/elicitationRequest`?
    This is because MCP elicitations are related to MCP servers first, and
    only optionally to a specific MCP tool call.
    
    In the MCP protocol, elicitation is a server-to-client capability: the
    server sends `elicitation/create`, and the client replies with an
    elicitation result. RMCP models it that way as well.
    
    In practice an elicitation is often triggered by an MCP tool call, but
    not always.
    
    ### What changed
    - add `mcpServer/elicitation/request` to the v2 app-server API
    - translate core `codex/event/elicitation_request` events into the new
    v2 server request
    - map client responses back into `Op::ResolveElicitation` so the MCP
    server can continue
    - update app-server docs and generated protocol schema
    - add an end-to-end app-server test that covers the full round trip
    through a real RMCP elicitation flow
    - the new test exercises a realistic case where an MCP tool call
    triggers an elicitation, the app-server emits
    `mcpServer/elicitation/request`, the client accepts it, and the tool
    call resumes and completes successfully
    
    ### app-server API flow
    - Client starts a thread with `thread/start`.
    - Client starts a turn with `turn/start`.
    - App-server sends `item/started` for the `mcpToolCall`.
    - While that tool call is in progress, app-server sends
    `mcpServer/elicitation/request`.
    - Client responds to that request with `{ action: "accept" | "decline" |
    "cancel" }`.
    - App-server sends `serverRequest/resolved`.
    - App-server sends `item/completed` for the `mcpToolCall`.
    - App-server sends `turn/completed`.
    - If the turn is interrupted while the elicitation is pending,
    app-server still sends `serverRequest/resolved` before the turn
    finishes.
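
    The flow above can be written down as data, together with the three
    response actions a client may send. This is an illustrative sketch
    only: `ElicitationAction`, `Sender`, and `ELICITATION_FLOW` are
    hypothetical names, and the `"response: { action }"` entry is just a
    placeholder label for the client's reply, not a method name.

    ```rust
    /// The three actions a client can answer with, per the flow above.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ElicitationAction {
        Accept,
        Decline,
        Cancel,
    }

    impl ElicitationAction {
        /// Parses the wire string; unknown values are rejected.
        pub fn parse(raw: &str) -> Option<Self> {
            match raw {
                "accept" => Some(Self::Accept),
                "decline" => Some(Self::Decline),
                "cancel" => Some(Self::Cancel),
                _ => None,
            }
        }

        pub fn as_str(self) -> &'static str {
            match self {
                Self::Accept => "accept",
                Self::Decline => "decline",
                Self::Cancel => "cancel",
            }
        }
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Sender {
        Client,
        Server,
    }

    /// Message order for a turn in which a tool call triggers an elicitation.
    pub const ELICITATION_FLOW: [(Sender, &str); 8] = [
        (Sender::Client, "thread/start"),
        (Sender::Client, "turn/start"),
        (Sender::Server, "item/started"),
        (Sender::Server, "mcpServer/elicitation/request"),
        (Sender::Client, "response: { action }"), // placeholder label
        (Sender::Server, "serverRequest/resolved"),
        (Sender::Server, "item/completed"),
        (Sender::Server, "turn/completed"),
    ];
    ```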
  • image-gen-event/client_processing (#13512)
    Enables client-side processing of image-generation events (app-server
    setting).
  • plugin: support local-based marketplace.json + install endpoint. (#13422)
    Support marketplace.json that points to a local file, with
    ```
        "source":
        {
            "source": "local",
            "path": "./plugin-1"
        },
     ```
     
    Add a new plugin/install endpoint that adds the plugin to the cache folder and enables it in config.toml.
  • feat(app-server-test-client): OTEL setup for tracing (#13493)
    ### Overview
    This PR:
    - Updates `app-server-test-client` to load OTEL settings from
    `$CODEX_HOME/config.toml` and initializes its own OTEL provider.
    - Adds real client root spans to app-server test client traces.
    
    This updates `codex-app-server-test-client` so its Datadog traces
    reflect the full client-driven flow instead of a set of server spans
    stitched together under a synthetic parent.
    
    Before this change, the test client generated a fake `traceparent` once
    and reused it for every JSON-RPC request. That kept the requests in one
    trace, but there was no real client span at the top, so Datadog ended up
    showing the sequence in a slightly misleading way, where all RPCs were
    anchored under `initialize`.
    
    Now the test client:
    - loads OTEL settings from the normal Codex config path, including
    `$CODEX_HOME/config.toml` and existing --config overrides
    - initializes tracing the same way other Codex binaries do when trace
    export is enabled
    - creates a real client root span for each scripted command
    - creates per-request client spans for JSON-RPC methods like
    `initialize`, `thread/start`, and `turn/start`
    - injects W3C trace context from the current client span into
    request.trace instead of reusing a fabricated carrier
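
    For reference, the W3C trace context that gets injected boils down to
    a `traceparent` value of the form `version-traceid-parentid-flags`. A
    minimal sketch of that formatting follows; `format_traceparent` is a
    hypothetical helper, and the test client itself derives the context
    from the current span rather than from raw IDs like this.

    ```rust
    /// Formats a W3C `traceparent` value from raw IDs.
    /// Returns `None` for all-zero IDs, which the spec treats as invalid.
    pub fn format_traceparent(trace_id: u128, span_id: u64, sampled: bool) -> Option<String> {
        if trace_id == 0 || span_id == 0 {
            return None;
        }
        // Only the low "sampled" bit of the flags byte is set here.
        let flags: u8 = if sampled { 1 } else { 0 };
        // Version 00, 32 hex digits of trace ID, 16 of span ID, 2 of flags.
        Some(format!("00-{trace_id:032x}-{span_id:016x}-{flags:02x}"))
    }
    ```

    Each per-request client span would yield a different parent ID under
    the same trace ID, which is what produces the parent-child
    relationships in the resulting trace.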
    
    This gives us a cleaner trace shape in Datadog:
    - one trace URL for the whole scripted flow
    - a visible client root span
    - proper client/server parent-child relationships for each app-server
    request