Commit Graph

384 Commits

  • Add realtime start instructions config override (#14270)
    - add `realtime_start_instructions` config support
    - thread it into realtime context updates, schema, docs, and tests
  • Show spawned agent model and effort in TUI (#14273)
    - include the requested sub-agent model and reasoning effort in the
    spawn begin event\n- render that metadata next to the spawned agent name
    and role in the TUI transcript
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: add a separate reject-policy flag for skill approvals (#14271)
    ## Summary
    - add `skill_approval` to `RejectConfig` and the app-server v2
    `AskForApproval::Reject` payload so skill-script prompts can be
    configured independently from sandbox and rule-based prompts
    - update Unix shell escalation to reject prompts based on the actual
    decision source, keeping prefix rules tied to `rules`, unmatched command
    fallbacks tied to `sandbox_approval`, and skill scripts tied to
    `skill_approval`
    - regenerate the affected protocol/config schemas and expand
    unit/integration coverage for the new flag and skill approval behavior
  • feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155)
    Add additional macOS Sandbox Permissions levers for the following:
    
    - Launch Services
    - Contacts
    - Reminders
  • Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
    Summary
    - drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
    to keep MCP tooling focused on the final result shape
    - wire the new schema definitions through code mode, context, handlers,
    and spec modules so MCP tools serialize the exact output shape expected
    by the model
    - extend code mode tests to cover multiple MCP call scenarios and ensure
    the serialized data matches the new schema
    - refresh JS runner helpers and protocol models alongside the schema
    changes
    
    Testing
    - Not run (not requested)
  • Reuse McpToolOutput in McpHandler (#14229)
    We already have a type to represent the MCP tool output, reuse it
    instead of the custom McpHandlerOutput
  • Use realtime transcript for handoff context (#14132)
    - collect input/output transcript deltas into active handoff transcript
    state
    - attach and clear that transcript on each handoff, and regenerate
    schema/tests
  • fix(core) default RejectConfig.request_permissions (#14165)
    ## Summary
    Adds a default here so existing config deserializes
    
    ## Testing
    - [x] Added a unit test
  • start of hooks engine (#13276)
    (Experimental)
    
    This PR adds a first MVP for hooks, with SessionStart and Stop
    
    The core design is:
    
    - hooks live in a dedicated engine under codex-rs/hooks
    - each hook type has its own event-specific file
    - hook execution is synchronous and blocks normal turn progression while
    running
    - matching hooks run in parallel, then their results are aggregated into
    a normalized HookRunSummary
    
    On the AppServer side, hooks are exposed as operational metadata rather
    than transcript-native items:
    
    - new live notifications: hook/started, hook/completed
    - persisted/replayed hook results live on Turn.hookRuns
    - we intentionally did not add hook-specific ThreadItem variants
    
    Hooks messages are not persisted, they remain ephemeral. The context
    changes they add are (they get appended to the user's prompt)
  • fix: keep permissions profiles forward compatible (#14107)
    ## Summary
    - preserve unknown `:special_path` tokens, including nested entries, so
    older Codex builds warn and ignore instead of failing config load
    - fail closed with a startup warning when a permissions profile has
    missing or empty filesystem entries instead of aborting profile
    compilation
    - normalize Windows verbatim paths like `\?\C:\...` before absolute-path
    validation while keeping explicit errors for truly invalid paths
    
    ## Testing
    - just fmt
    - cargo test -p codex-core permissions_profiles_allow
    - cargo test -p codex-core
    normalize_absolute_path_for_platform_simplifies_windows_verbatim_paths
    - cargo test -p codex-protocol
    unknown_special_paths_are_ignored_by_legacy_bridge
    - cargo clippy -p codex-core -p codex-protocol --all-targets -- -D
    warnings
    - cargo clean
  • fix(protocol): preserve legacy workspace-write semantics (#13957)
    ## Summary
    This is a fast follow to the initial `[permissions]` structure.
    
    - keep the new split-policy carveout behavior for narrower non-write
    entries under broader writable roots
    - preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge
    that drops only redundant nested readable roots when projecting from
    `SandboxPolicy`
    - route the legacy macOS seatbelt adapter through that same legacy
    bridge so redundant nested readable roots do not become read-only
    carveouts on macOS
    - derive the legacy bridge for `command_exec` using the sandbox root cwd
    rather than the request cwd so policy derivation matches later sandbox
    enforcement
    - add regression coverage for the legacy macOS nested-readable-root case
    
    ## Examples
    ### Legacy `workspace-write` on macOS
    A legacy `workspace-write` policy can redundantly list a nested readable
    root under an already-writable workspace root.
    
    For example, legacy config can effectively mean:
    - workspace root (`.` / `cwd`) is writable
    - `docs/` is also listed in `readable_roots`
    
    The new shared split-policy helper intentionally treats a narrower
    non-write entry under a broader writable root as a carveout for real
    `[permissions]` configs. Without this fast follow, the unchanged macOS
    seatbelt legacy adapter could project that legacy shape into a
    `FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout
    under the writable workspace root. In practice, legacy callers on macOS
    could unexpectedly lose write access inside `docs/`, even though that
    path was writable before the `[permissions]` migration work.
    
    This change fixes that by routing the legacy seatbelt path through the
    cwd-aware legacy bridge, so:
    - legacy `workspace-write` keeps `docs/` writable when `docs/` was only
    a redundant readable root
    - explicit `[permissions]` entries like `'.' = 'write'` and `'docs' =
    'read'` still make `docs/` read-only, which is the new intended
    split-policy behavior
    
    ### Legacy `command_exec` with a subdirectory cwd
    `command_exec` can run a command from a request cwd that is narrower
    than the sandbox root cwd.
    
    For example:
    - sandbox root cwd is `/repo`
    - request cwd is `/repo/subdir`
    - legacy policy is still `workspace-write` rooted at `/repo`
    
    Before this fast follow, `command_exec` derived the legacy bridge using
    the request cwd, but the sandbox was later built using the sandbox root
    cwd. That mismatch could miss redundant legacy readable roots during
    projection and accidentally reintroduce read-only carveouts for paths
    that should still be writable under the legacy model.
    
    This change fixes that by deriving the legacy bridge with the same
    sandbox root cwd that sandbox enforcement later uses.
    
    ## Verification
    - `just fmt`
    - `cargo test -p codex-core
    seatbelt_legacy_workspace_write_nested_readable_root_stays_writable`
    - `cargo test -p codex-core test_sandbox_config_parsing`
    - `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D
    warnings`
    - `cargo clean`
  • feat(approvals) RejectConfig for request_permissions (#14118)
    ## Summary
    We need to support allowing request_permissions calls when using
    `Reject` policy
    
    <img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
    40 PM"
    src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
    />
    
    Note that this is a backwards-incompatible change for Reject policy. I'm
    not sure if we need to add a default based on our current use/setup
    
    ## Testing
    - [x] Added tests
    - [x] Tested locally
  • feat(core) Persist request_permission data across turns (#14009)
    ## Summary
    request_permissions flows should support persisting results for the
    session.
    
    Open Question: Still deciding if we need within-turn approvals - this
    adds complexity but I could see it being useful
    
    ## Testing
    - [x] Updated unit tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • guardian initial feedback / tweaks (#13897)
    ## Summary
    - remove the remaining model-visible guardian-specific `on-request`
    prompt additions so enabling the feature does not change the main
    approval-policy instructions
    - neutralize user-facing guardian wording to talk about automatic
    approval review / approval requests rather than a second reviewer or
    only sandbox escalations
    - tighten guardian retry-context handling so agent-authored
    `justification` stays in the structured action JSON and is not also
    injected as raw retry context
    - simplify guardian review plumbing in core by deleting dead
    prompt-append paths and trimming some request/transcript setup code
    
    ## Notable Changes
    - delete the dead `permissions/approval_policy/guardian.md` append path
    and stop threading `guardian_approval_enabled` through model-facing
    developer-instruction builders
    - rename the experimental feature copy to `Automatic approval review`
    and update the `/experimental` snapshot text accordingly
    - make approval-review status strings generic across shell, patch,
    network, and MCP review types
    - forward real sandbox/network retry reasons for shell and unified-exec
    guardian review, but do not pass agent-authored justification as raw
    retry context
    - simplify `guardian.rs` by removing the one-field request wrapper,
    deduping reasoning-effort selection, and cleaning up transcript entry
    collection
    
    ## Testing
    - `just fmt`
    - full validation left to CI
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add request permissions tool (#13092)
    Adds a built-in `request_permissions` tool and wires it through the
    Codex core, protocol, and app-server layers so a running turn can ask
    the client for additional permissions instead of relying on a static
    session policy.
    
    The new flow emits a `RequestPermissions` event from core, tracks the
    pending request by call ID, forwards it through app-server v2 as an
    `item/permissions/requestApproval` request, and resumes the tool call
    once the client returns an approved subset of the requested permission
    profile.
  • app-server: include experimental skill metadata in exec approval requests (#13929)
    ## Summary
    
    This change surfaces skill metadata on command approval requests so
    app-server clients can tell when an approval came from a skill script
    and identify the originating `SKILL.md`.
    
    - add `skill_metadata` to exec approval events in the shared protocol
    - thread skill metadata through core shell escalation and delegated
    approval handling for skill-triggered approvals
    - expose the field in app-server v2 as experimental `skillMetadata`
    - regenerate the JSON/TypeScript schemas and cover the new field in
    protocol, transport, core, and TUI tests
    
    ## Why
    
    Skill-triggered approvals already carry skill context inside core, but
    app-server clients could not see which skill caused the prompt. Sending
    the skill metadata with the approval request makes it possible for
    clients to present better approval UX and connect the prompt back to the
    relevant skill definition.
    
    
    ## example event in app-server-v2
    verified that we see this event when experimental api is on:
    ```
    < {
    <   "id": 11,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "additionalPermissions": {
    <       "fileSystem": null,
    <       "macos": {
    <         "accessibility": false,
    <         "automations": {
    <           "bundle_ids": [
    <             "com.apple.Notes"
    <           ]
    <         },
    <         "calendar": false,
    <         "preferences": "read_only"
    <       },
    <       "network": null
    <     },
    <     "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563",
    <     "availableDecisions": [
    <       "accept",
    <       "acceptForSession",
    <       "cancel"
    <     ],
    <     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <     "commandActions": [
    <       {
    <         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes",
    <     "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy",
    <     "skillMetadata": {
    <       "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md"
    <     },
    <     "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69",
    <     "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f"
    <   }
    < }`
    ```
    
    & verified that this is the event when experimental api is off:
    ```
    < {
    <   "id": 13,
    <   "method": "item/commandExecution/requestApproval",
    <   "params": {
    <     "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0",
    <     "availableDecisions": [
    <       "accept",
    <       "acceptForSession",
    <       "cancel"
    <     ],
    <     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <     "commandActions": [
    <       {
    <         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
    <         "type": "unknown"
    <       }
    <     ],
    <     "cwd": "/Users/celia/code/codex/codex-rs",
    <     "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt",
    <     "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4",
    <     "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a"
    <   }
    < }
    ```
  • protocol: keep root carveouts sandboxed (#13452)
    ## Why
    
    A restricted filesystem policy that grants `:root` read or write access
    but also carries explicit deny entries should still behave like scoped
    access with carveouts, not like unrestricted disk access.
    
    Without that distinction, later platform backends cannot preserve
    blocked subpaths under root-level permissions because the protocol layer
    reports the policy as fully unrestricted.
    
    ## What changed
    
    - taught `FileSystemSandboxPolicy` to treat root access plus explicit
    deny entries as scoped access rather than full-disk access
    - derived readable and writable roots from the filesystem root when root
    access is combined with carveouts, while preserving the denied paths as
    read-only subpaths
    - added protocol coverage for root-write policies with carveouts and a
    core sandboxing regression so those policies still require platform
    sandboxing
    
    ## Verification
    
    - added protocol coverage in `protocol/src/permissions.rs` and
    `protocol/src/protocol.rs` for root access with explicit carveouts
    - added platform-sandbox regression coverage in
    `core/src/sandboxing/mod.rs`
    - verified the current PR state with `just clippy`
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13452).
    * #13453
    * __->__ #13452
    * #13451
    * #13449
    * #13448
    * #13445
    * #13440
    * #13439
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • linux-sandbox: plumb split sandbox policies through helper (#13449)
    ## Why
    
    The Linux sandbox helper still only accepted the legacy `SandboxPolicy`
    payload.
    
    That meant the runtime could compute split filesystem and network
    policies, but the helper would immediately collapse them back to the
    compatibility projection before applying seccomp or staging the
    bubblewrap inner command.
    
    ## What changed
    
    - added hidden `--file-system-sandbox-policy` and
    `--network-sandbox-policy` flags alongside the legacy `--sandbox-policy`
    flag so the helper can migrate incrementally
    - updated the core-side Landlock wrapper to pass the split policies
    explicitly when launching `codex-linux-sandbox`
    - added helper-side resolution logic that accepts either the legacy
    policy alone or a complete split-policy pair and normalizes that into
    one effective configuration
    - switched Linux helper network decisions to use `NetworkSandboxPolicy`
    directly
    - added `FromStr` support for the split policy types so the helper can
    parse them from CLI JSON
    
    ## Verification
    
    - added helper coverage in `linux-sandbox/src/linux_run_main_tests.rs`
    for split-policy flags and policy resolution
    - added CLI argument coverage in `core/src/landlock.rs`
    - verified the current PR state with `just clippy`
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13449).
    * #13453
    * #13452
    * #13451
    * __->__ #13449
    * #13448
    * #13445
    * #13440
    * #13439
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • Add guardian approval MVP (#13692)
    ## Summary
    - add the guardian reviewer flow for `on-request` approvals in command,
    patch, sandbox-retry, and managed-network approval paths
    - keep guardian behind `features.guardian_approval` instead of exposing
    a public `approval_policy = guardian` mode
    - route ordinary `OnRequest` approvals to the guardian subagent when the
    feature is enabled, without changing the public approval-mode surface
    
    ## Public model
    - public approval modes stay unchanged
    - guardian is enabled via `features.guardian_approval`
    - when that feature is on, `approval_policy = on-request` keeps the same
    approval boundaries but sends those approval requests to the guardian
    reviewer instead of the user
    - `/experimental` only persists the feature flag; it does not rewrite
    `approval_policy`
    - CLI and app-server no longer expose a separate `guardian` approval
    mode in this PR
    
    ## Guardian reviewer
    - the reviewer runs as a normal subagent and reuses the existing
    subagent/thread machinery
    - it is locked to a read-only sandbox and `approval_policy = never`
    - it does not inherit user/project exec-policy rules
    - it prefers `gpt-5.4` when the current provider exposes it, otherwise
    falls back to the parent turn's active model
    - it fail-closes on timeout, startup failure, malformed output, or any
    other review error
    - it currently auto-approves only when `risk_score < 80`
    
    ## Review context and policy
    - guardian mirrors `OnRequest` approval semantics rather than
    introducing a separate approval policy
    - explicit `require_escalated` requests follow the same approval surface
    as `OnRequest`; the difference is only who reviews them
    - managed-network allowlist misses that enter the approval flow are also
    reviewed by guardian
    - the review prompt includes bounded recent transcript history plus
    recent tool call/result evidence
    - transcript entries and planned-action strings are truncated with
    explicit `<guardian_truncated ... />` markers so large payloads stay
    bounded
    - apply-patch reviews include the full patch content (without
    duplicating the structured `changes` payload)
    - the guardian request layout is snapshot-tested using the same
    model-visible Responses request formatter used elsewhere in core
    
    ## Guardian network behavior
    - the guardian subagent inherits the parent session's managed-network
    allowlist when one exists, so it can use the same approved network
    surface while reviewing
    - exact session-scoped network approvals are copied into the guardian
    session with protocol/port scope preserved
    - those copied approvals are now seeded before the guardian's first turn
    is submitted, so inherited approvals are available during any immediate
    review-time checks
    
    ## Out of scope / follow-ups
    - the sandbox-permission validation split was pulled into a separate PR
    and is not part of this diff
    - a future follow-up can enable `serde_json` preserve-order in
    `codex-core` and then simplify the guardian action rendering further
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • protocol: derive effective file access from filesystem policies (#13440)
    ## Why
    
    `#13434` and `#13439` introduce split filesystem and network policies,
    but the only code that could answer basic filesystem questions like "is
    access effectively unrestricted?" or "which roots are readable and
    writable for this cwd?" still lived on the legacy `SandboxPolicy` path.
    
    That would force later backends to either keep projecting through
    `SandboxPolicy` or duplicate path-resolution logic. This PR moves those
    queries onto `FileSystemSandboxPolicy` itself so later runtime and
    platform changes can consume the split policy directly.
    
    ## What changed
    
    - added `FileSystemSandboxPolicy` helpers for full-read/full-write
    checks, platform-default reads, readable roots, writable roots, and
    explicit unreadable roots resolved against a cwd
    - added a shared helper for the default read-only carveouts under
    writable roots so the legacy and split-policy paths stay aligned
    - added protocol coverage for full-access detection and derived
    readable, writable, and unreadable roots
    
    ## Verification
    
    - added protocol coverage in `protocol/src/protocol.rs` and
    `protocol/src/permissions.rs` for full-root access and derived
    filesystem roots
    - verified the current PR state with `just clippy`
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13440).
    * #13453
    * #13452
    * #13451
    * #13449
    * #13448
    * #13445
    * __->__ #13440
    * #13439
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • sandboxing: plumb split sandbox policies through runtime (#13439)
    ## Why
    
    `#13434` introduces split `FileSystemSandboxPolicy` and
    `NetworkSandboxPolicy`, but the runtime still made most execution-time
    sandbox decisions from the legacy `SandboxPolicy` projection.
    
    That projection loses information about combinations like unrestricted
    filesystem access with restricted network access. In practice, that
    means the runtime can choose the wrong platform sandbox behavior or set
    the wrong network-restriction environment for a command even when config
    has already separated those concerns.
    
    This PR carries the split policies through the runtime so sandbox
    selection, process spawning, and exec handling can consult the policy
    that actually matters.
    
    ## What changed
    
    - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through
    `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state,
    unified exec, and app-server exec overrides
    - updated sandbox selection in `core/src/sandboxing/mod.rs` and
    `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus
    `NetworkSandboxPolicy`, rather than inferring behavior only from the
    legacy `SandboxPolicy`
    - updated process spawning in `core/src/spawn.rs` and the platform
    wrappers to use `NetworkSandboxPolicy` when deciding whether to set
    `CODEX_SANDBOX_NETWORK_DISABLED`
    - kept additional-permissions handling and legacy `ExternalSandbox`
    compatibility projections aligned with the split policies, including
    explicit user-shell execution and Windows restricted-token routing
    - updated callers across `core`, `app-server`, and `linux-sandbox` to
    pass the split policies explicitly
    
    ## Verification
    
    - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to
    verify `RunUserShellCommand` does not inherit
    `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn
    - added coverage in `core/src/exec.rs` for Windows restricted-token
    sandbox selection when the legacy projection is `ExternalSandbox`
    - updated Linux sandbox coverage in
    `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy
    exec path
    - verified the current PR state with `just clippy`
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439).
    * #13453
    * #13452
    * #13451
    * #13449
    * #13448
    * #13445
    * #13440
    * __->__ #13439
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • feat(app-server-protocol): address naming conflicts in json schema exporter (#13819)
    This fixes a schema export bug where two different `WebSearchAction`
    types were getting merged under the same name in the app-server v2 JSON
    schema bundle.
    
    The problem was that v2 thread items use the app-server API's
    `WebSearchAction` with camelCase variants like `openPage`, while
    `ThreadResumeParams.history` and
    `RawResponseItemCompletedNotification.item` pull in the upstream
    `ResponseItem` graph, which uses the Responses API snake_case shape like
    `open_page`. During bundle generation we were flattening nested
    definitions into the v2 namespace by plain name, so the later definition
    could silently overwrite the earlier one.
    
    That meant clients generating code from the bundled schema could end up
    with the wrong `WebSearchAction` definition for v2 thread history. In
    practice this shows up on web search items reconstructed from rollout
    files with persisted extended history.
    
    This change does two things:
    - Gives the upstream Responses API schema a distinct JSON schema name:
    `ResponsesApiWebSearchAction`
    - Makes namespace-level schema definition collisions fail loudly instead
    of silently overwriting
  • Allow full web search tool config (#13675)
    Previously, we could only configure whether web search was on/off.
    
    This PR enables sending along a web search config, which includes all
    the stuff responsesapi supports: filters, location, etc.
  • config: add initial support for the new permission profile config language in config.toml (#13434)
    ## Why
    
    `SandboxPolicy` currently mixes together three separate concerns:
    
    - parsing layered config from `config.toml`
    - representing filesystem sandbox state
    - carrying basic network policy alongside filesystem choices
    
    That makes the existing config awkward to extend and blocks the new TOML
    proposal where `[permissions]` becomes a table of named permission
    profiles selected by `default_permissions`. (The idea is that if
    `default_permissions` is not specified, we assume the user is opting
    into the "traditional" way to configure the sandbox.)
    
    This PR adds the config-side plumbing for those profiles while still
    projecting back to the legacy `SandboxPolicy` shape that the current
    macOS and Linux sandbox backends consume.
    
    It also tightens the filesystem profile model so scoped entries only
    exist for `:project_roots`, and so nested keys must stay within a
    project root instead of using `.` or `..` traversal.
    
    This drops support for the short-lived `[permissions.network]` in
    `config.toml` because now that would be interpreted as a profile named
    `network` within `[permissions]`.
    
    ## What Changed
    
    - added `PermissionsToml`, `PermissionProfileToml`,
    `FilesystemPermissionsToml`, and `FilesystemPermissionToml` so config
    can parse named profiles under `[permissions.<profile>.filesystem]`
    - added top-level `default_permissions` selection, validation for
    missing or unknown profiles, and compilation from a named profile into
    split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` values
    - taught config loading to choose between the legacy `sandbox_mode` path
    and the profile-based path without breaking legacy users
    - introduced `codex-protocol::permissions` for the split filesystem and
    network sandbox types, and stored those alongside the legacy projected
    `sandbox_policy` in runtime `Permissions`
    - modeled `FileSystemSpecialPath` so only `ProjectRoots` can carry a
    nested `subpath`, matching the intended config syntax instead of
    allowing invalid states for other special paths
    - restricted scoped filesystem maps to `:project_roots`, with validation
    that nested entries are non-empty descendant paths and cannot use `.` or
    `..` to escape the project root
    - kept existing runtime consumers working by projecting
    `FileSystemSandboxPolicy` back into `SandboxPolicy`, with an explicit
    error for profiles that request writes outside the workspace root
    - loaded proxy settings from top-level `[network]`
    - regenerated `core/config.schema.json`
    
    ## Verification
    
    - added config coverage for profile deserialization,
    `default_permissions` selection, top-level `[network]` loading, network
    enablement, rejection of writes outside the workspace root, rejection of
    nested entries for non-`:project_roots` special paths, and rejection of
    parent-directory traversal in `:project_roots` maps
    - added protocol coverage for the legacy bridge rejecting non-workspace
    writes
    
    ## Docs
    
    - update the Codex config docs on developers.openai.com/codex to
    document named `[permissions.<profile>]` entries, `default_permissions`,
    scoped `:project_roots` syntax, the descendant-path restriction for
    nested `:project_roots` entries, and top-level `[network]` proxy
    configuration
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13434).
    * #13453
    * #13452
    * #13451
    * #13449
    * #13448
    * #13445
    * #13440
    * #13439
    * __->__ #13434
  • feat: structured plugin parsing (#13711)
    #### What
    
    Add structured `@plugin` parsing and TUI support for plugin mentions.
    
    - Core: switch from plain-text `@display_name` parsing to structured
    `plugin://...` mentions via `UserInput::Mention` and
    `[$...](plugin://...)` links in text, same pattern as apps/skills.
    - TUI: add plugin mention popup, autocomplete, and chips when typing
    `$`. Load plugin capability summaries and feed them into the composer;
    plugin mentions appear alongside skills and apps.
    - Generalize mention parsing to a sigil parameter, still defaults to `$`
    
    <img width="797" height="119" alt="image"
    src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
    />
    
    Builds on #13510. Currently clients have to build their own `id` via
    `plugin@marketplace` and filter plugins to show by `enabled`, but we
    will add `id` and `available` as fields returned from `plugin/list`
    soon.
    
    ####Tests
    
    Added tests, verified locally.
  • Clarify sandbox permission override helper semantics (#13703)
    ## Summary
    Today `SandboxPermissions::requires_additional_permissions()` does not
    actually mean "is `WithAdditionalPermissions`". It returns `true` for
    any non-default sandbox override, including `RequireEscalated`. That
    broad behavior is relied on in multiple `main` callsites.
    
    The naming is security-sensitive because `SandboxPermissions` is used on
    shell-like tool calls to tell the executor how a single command should
    relate to the turn sandbox:
    - `UseDefault`: run with the turn sandbox unchanged
    - `RequireEscalated`: request execution outside the sandbox
    - `WithAdditionalPermissions`: stay sandboxed but widen permissions for
    that command only
    
    ## Problem
    The old helper name reads as if it only applies to the
    `WithAdditionalPermissions` variant. In practice it means "this command
    requested any explicit sandbox override."
    
    That ambiguity made it easy to read production checks incorrectly and
    made the guardian change look like a standalone `main` fix when it is
    not.
    
    On `main` today:
    - `shell` and `unified_exec` intentionally reject any explicit
    `sandbox_permissions` request unless approval policy is `OnRequest`
    - `exec_policy` intentionally treats any explicit sandbox override as
    prompt-worthy in restricted sandboxes
    - tests intentionally serialize both `RequireEscalated` and
    `WithAdditionalPermissions` as explicit sandbox override requests
    
    So changing those callsites from the broad helper to a narrow
    `WithAdditionalPermissions` check would be a behavior change, not a pure
    cleanup.
    
    ## What This PR Does
    - documents `SandboxPermissions` as a per-command sandbox override, not
    a generic permissions bag
    - adds `requests_sandbox_override()` for the broad meaning: anything
    except `UseDefault`
    - adds `uses_additional_permissions()` for the narrow meaning: only
    `WithAdditionalPermissions`
    - keeps `requires_additional_permissions()` as a compatibility alias to
    the broad meaning for now
    - updates the current broad callsites to use the accurately named broad
    helper
    - adds unit coverage that locks in the semantics of all three helpers
    
    ## What This PR Does Not Do
    This PR does not change runtime behavior. That is intentional.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [elicitations] Switch to use MCP style elicitation payload for mcp tool approvals. (#13621)
    - [x] Switch to use MCP style elicitation payload for mcp tool
    approvals.
    - [ ] TODO: Update the UI to support the full spec.
  • Enabling CWD Saving for Image-Gen (#13607)
    Codex now saves the generated image on to your current working
    directory.
  • refactor: remove proxy admin endpoint (#13687)
    ## Summary
    - delete the network proxy admin server and its runtime listener/task
    plumbing
    - remove the admin endpoint config, runtime, requirement, protocol,
    schema, and debug-surface fields
    - update proxy docs to reflect the remaining HTTP and SOCKS listeners
    only
  • fix: accept two macOS automation input shapes for approval payload compatibility (#13683)
    ## Summary
    This PR:
    1. fixes a deserialization mismatch for macOS automation permissions in
    approval payloads by making core parsing accept both supported wire
    shapes for bundle IDs.
    2. added `#[serde(default)]` to `MacOsSeatbeltProfileExtensions` so
    omitted fields deserialize to secure defaults.
    
    
    ## Why this change is needed
    `MacOsAutomationPermission` uses `#[serde(try_from =
    "MacOsAutomationPermissionDe")]`, so deserialization is controlled by
    `MacOsAutomationPermissionDe`. After we aligned v2
    `additionalPermissions.macos.automations` to the core shape, approval
    payloads started including `{ "bundle_ids": [...] }` in some paths.
    `MacOsAutomationPermissionDe` previously accepted only `"none" | "all"`
    or a plain array, so object-shaped bundle IDs failed with `data did not
    match any variant of untagged enum MacOsAutomationPermissionDe`. This
    change restores compatibility by accepting both forms while preserving
    existing normalization behavior (trim values and map empty bundle lists
    to `None`).
    
    ## Validation
    
    saw this error went away when running
    ```
    cargo run -p codex-app-server-test-client -- \
        --codex-bin ./target/debug/codex \
        -c 'approval_policy="on-request"' \
        -c 'features.shell_zsh_fork=true' \
        -c 'zsh_path="/tmp/codex-zsh-fork/package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh"' \
        send-message-v2 --experimental-api \
        'Use $apple-notes and run scripts/notes_info now.'
    ```
    :
    ```
    Error: failed to deserialize ServerRequest from JSONRPCRequest
    
    Caused by:
        data did not match any variant of untagged enum MacOsAutomationPermissionDe
    ```
  • chore: remove unused legacy macOS permission types (#13677)
    ## Summary
    
    This PR removes legacy macOS permission model types from
    `codex-rs/protocol/src/models.rs`:
    
    - `MacOsPermissions`
    - `MacOsPreferencesValue`
    - `MacOsAutomationValue`
    
    The protocol now relies on the current `MacOsSeatbeltProfileExtensions`
    model for macOS permission data.
  • core/protocol: add structured macOS additional permissions and merge them into sandbox execution (#13499)
    ## Summary
    - Introduce strongly-typed macOS additional permissions across
    protocol/core/app-server boundaries.
    - Merge additional permissions into effective sandbox execution,
    including macOS seatbelt profile extensions.
    - Expand docs, schema/tool definitions, UI rendering, and tests for
    `network`, `file_system`, and `macos` additional permissions.
  • feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602)
    This PR adds a durable trace linkage for each turn by storing the active
    trace ID on the rollout TurnContext record stored in session rollout
    files.
    
    Before this change, we propagated trace context at runtime but didn’t
    persist a stable per-turn trace key in rollout history. That made
    after-the-fact debugging harder (for example, mapping a historical turn
    to the corresponding trace in datadog). This sets us up for much easier
    debugging in the future.
    
    ### What changed
    - Added an optional `trace_id` to TurnContextItem (rollout schema).
    - Added a small OTEL helper to read the current span trace ID.
    - Captured `trace_id` when creating `TurnContext` and included it in
    `to_turn_context_item()`.
    - Updated tests and fixtures that construct TurnContextItem so
    older/no-trace cases still work.
    
    ### Why this approach
    TurnContext is already the canonical durable per-turn metadata in
    rollout. This keeps ownership clean: trace linkage lives with other
    persisted turn metadata.
  • feat(app-server): support mcp elicitations in v2 api (#13425)
    This adds a first-class server request for MCP server elicitations:
    `mcpServer/elicitation/request`.
    
    Until now, MCP elicitation requests only showed up as a raw
    `codex/event/elicitation_request` event from core. That made it hard for
    v2 clients to handle elicitations using the same request/response flow
    as other server-driven interactions (like shell and `apply_patch`
    tools).
    
    This also updates the underlying MCP elicitation request handling in
    core to pass through the full MCP request (including URL and form data)
    so we can expose it properly in app-server.
    
    ### Why not `item/mcpToolCall/elicitationRequest`?
    This is because MCP elicitations are related to MCP servers first, and
    only optionally to a specific MCP tool call.
    
    In the MCP protocol, elicitation is a server-to-client capability: the
    server sends `elicitation/create`, and the client replies with an
    elicitation result. RMCP models it that way as well.
    
    In practice an elicitation is often triggered by an MCP tool call, but
    not always.
    
    ### What changed
    - add `mcpServer/elicitation/request` to the v2 app-server API
    - translate core `codex/event/elicitation_request` events into the new
    v2 server request
    - map client responses back into `Op::ResolveElicitation` so the MCP
    server can continue
    - update app-server docs and generated protocol schema
    - add an end-to-end app-server test that covers the full round trip
    through a real RMCP elicitation flow
    - The new test exercises a realistic case where an MCP tool call
    triggers an elicitation, the app-server emits
    mcpServer/elicitation/request, the client accepts it, and the tool call
    resumes and completes successfully.
    
    ### app-server API flow
    - Client starts a thread with `thread/start`.
    - Client starts a turn with `turn/start`.
    - App-server sends `item/started` for the `mcpToolCall`.
    - While that tool call is in progress, app-server sends
    `mcpServer/elicitation/request`.
    - Client responds to that request with `{ action: "accept" | "decline" |
    "cancel" }`.
    - App-server sends `serverRequest/resolved`.
    - App-server sends `item/completed` for the mcpToolCall.
    - App-server sends `turn/completed`.
    - If the turn is interrupted while the elicitation is pending,
    app-server still sends `serverRequest/resolved` before the turn
    finishes.
  • chore: add web_search_tool_type for image support (#13538)
    add `web_search_tool_type` on model_info that can be populated from
    backend. will be used to filter which models can use `web_search` with
    images and which cant.
    
    added small unit test.
  • image-gen-event/client_processing (#13512)
    enabling client-side to process with image-generation capabilities
    (setting app-server)
  • image-gen-core (#13290)
    Core tool-calling for image-gen, handles requesting and receiving logic
    for images using response API
  • chore: Nest skill and protocol network permissions under network.enabled (#13427)
    ## Summary
    
    Changes the permission profile shape from a bare network boolean to a
    nested object.
    
    Before:
    
    ```yaml
    permissions:
      network: true
    ```
    
    After:
    
    ```yaml
    permissions:
      network:
        enabled: true
    ```
    
    This also updates the shared Rust and app-server protocol types so
    `PermissionProfile.network` is no longer `Option<bool>`, but
    `Option<NetworkPermissions>` with `enabled: Option<bool>`.
    
    ## What Changed
    
    - Updated `PermissionProfile` in `codex-rs/protocol/src/models.rs`:
    - `pub network: Option<bool>` -> `pub network:
    Option<NetworkPermissions>`
    - Added `NetworkPermissions` with:
      - `pub enabled: Option<bool>`
    - Changed emptiness semantics so `network` is only considered empty when
    `enabled` is `None`
    - Updated skill metadata parsing to accept `permissions.network.enabled`
    - Updated core permission consumers to read
    `network.enabled.unwrap_or(false)` where a concrete boolean is needed
    - Updated app-server v2 protocol types and regenerated schema/TypeScript
    outputs
    - Updated docs to mention `additionalPermissions.network.enabled`
  • Feat: Preserve network access on read-only sandbox policies (#13409)
    ## Summary
    
    `PermissionProfile.network` could not be preserved when additional or
    compiled permissions resolved to
    `SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
    field. This change makes read-only + network
    enabled representable directly and threads that through the protocol,
    app-server v2 mirror, and permission-
      merging logic.
    
    ## What changed
    
    - Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
    protocol and app-server v2 protocol.
    - Kept backward compatibility by defaulting the new field to false, so
    legacy read-only payloads still
        deserialize unchanged.
    - Updated `has_full_network_access()` and sandbox summaries to respect
    read-only network access.
      - Preserved PermissionProfile.network when:
          - compiling skill permission profiles into sandbox policies
          - normalizing additional permissions
          - merging additional permissions into existing sandbox policies
    - Updated the approval overlay to show network in the rendered
    permission rule when requested.
      - Regenerated app-server schema fixtures for the new v2 wire shape.
  • feat(app-server): propagate app-server trace context into core (#13368)
    ### Summary
    Propagate trace context originating at app-server RPC method handlers ->
    codex core submission loop (so this includes spans such as `run_turn`!).
    This implements PR 2 of the app-server tracing rollout.
    
    This also removes the old lower-level env-based reparenting in core so
    explicit request/submission ancestry wins instead of being overridden by
    ambient `TRACEPARENT` state.
    
    ### What changed
    - Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
    - Taught `Codex::submit()` / `submit_with_id()` to automatically capture
    the current span context when constructing or forwarding a submission
    - Wrapped the core submission loop in a submission_dispatch span
    parented from Submission.trace
    - Warn on invalid submission trace carriers and ignore them cleanly
    - Removed the old env-based downstream reparenting path in core task
    execution
    - Stopped OTEL provider init from implicitly attaching env trace context
    process-wide
    - Updated mcp-server Submission call sites for the new field
    
    Added focused unit tests for:
    - capturing trace context into Submission
    - preferring `Submission.trace` when building the core dispatch span
    
    ### Why
    PR 1 gave us consistent inbound request spans in app-server, but that
    only covered the transport boundary. For long-running work like turns
    and reviews, the important missing piece was preserving ancestry after
    the request handler returns and core continues work on a different async
    path.
    
    This change makes that handoff explicit and keeps the parentage rules
    simple:
    - app-server request span sets the current context
    - `Submission.trace` snapshots that context
    - core restores it once, at the submission boundary
    - deeper core spans inherit naturally
    
    That also lets us stop relying on env-based reparenting for this path,
    which was too ambient and could override explicit ancestry.
  • Add under-development original-resolution view_image support (#13050)
    ## Summary
    
    Add original-resolution support for `view_image` behind the
    under-development `view_image_original_resolution` feature flag.
    
    When the flag is enabled and the target model is `gpt-5.3-codex` or
    newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
    `detail: "original"` to the Responses API instead of using the legacy
    resize/compress path.
    
    ## What changed
    
    - Added `view_image_original_resolution` as an under-development feature
    flag.
    - Added `ImageDetail` to the protocol models and support for serializing
    `detail: "original"` on tool-returned images.
    - Added `PromptImageMode::Original` to `codex-utils-image`.
      - Preserves original PNG/JPEG/WebP bytes.
      - Keeps legacy behavior for the resize path.
    - Updated `view_image` to:
    - use the shared `local_image_content_items_with_label_number(...)`
    helper in both code paths
      - select original-resolution mode only when:
        - the feature flag is enabled, and
        - the model slug parses as `gpt-5.3-codex` or newer
    - Kept local user image attachments on the existing resize path; this
    change is specific to `view_image`.
    - Updated history/image accounting so only `detail: "original"` images
    use the docs-based GPT-5 image cost calculation; legacy images still use
    the old fixed estimate.
    - Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
    at 85% quality unless lossless is required, while still allowing other
    formats when explicitly requested.
    - Updated tests and helper code that construct
    `FunctionCallOutputContentItem::InputImage` to carry the new `detail`
    field.
    
    ## Behavior
    
    ### Feature off
    - `view_image` keeps the existing resize/re-encode behavior.
    - History estimation keeps the existing fixed-cost heuristic.
    
    ### Feature on + `gpt-5.3-codex+`
    - `view_image` sends original-resolution images with `detail:
    "original"`.
    - PNG/JPEG/WebP source bytes are preserved when possible.
    - History estimation uses the GPT-5 docs-based image-cost calculation
    for those `detail: "original"` images.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/13050
    -  `2` https://github.com/openai/codex/pull/13331
    -  `3` https://github.com/openai/codex/pull/13049
  • realtime prompt changes (#13376)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • app-server service tier plumbing (plus some cleanup) (#13334)
    followup to https://github.com/openai/codex/pull/13212 to expose fast
    tier controls to app server
    (majority of this PR is generated schema jsons - actual code is +69 /
    -35 and +24 tests )
    
    - add service tier fields to the app-server protocol surfaces used by
    thread lifecycle, turn start, config, and session configured events
    - thread service tier through the app-server message processor and core
    thread config snapshots
    - allow runtime config overrides to carry service tier for app-server
    callers
    
    cleanup:
    - Removing useless "legacy" code supporting "standard" - we moved to
    None | "fast", so "standard" is not needed.
  • add fast mode toggle (#13212)
    - add a local Fast mode setting in codex-core (similar to how model id
    is currently stored on disk locally)
    - send `service_tier=priority` on requests when Fast is enabled
    - add `/fast` in the TUI and persist it locally
    - feature flag
  • Update realtime websocket API (#13265)
    - migrate the realtime websocket transport to the new session and
    handoff flow
    - make the realtime model configurable in config.toml and use API-key
    auth for the websocket
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server): add tracing to all app-server APIs (#13285)
    ### Overview
    This PR adds the first piece of tracing for app-server JSON-RPC
    requests.
    
    There are two main changes:
    - JSON-RPC requests can now take an optional W3C trace context at the
    top level via a `trace` field (`traceparent` / `tracestate`).
    - app-server now creates a dedicated request span for every inbound
    JSON-RPC request in `MessageProcessor`, and uses the request-level trace
    context as the parent when present.
    
    For compatibility with existing flows, app-server still falls back to
    the TRACEPARENT env var when there is no request-level traceparent.
    
    This PR is intentionally scoped to the app-server boundary. In a
    followup, we'll actually propagate trace context through the async
    handoff into core execution spans like run_turn, which will make
    app-server traces much more useful.
    
    ### Spans
    A few details on the app-server span shape:
    - each inbound request gets its own server span
    - span/resource names are based on the JSON-RPC method (`initialize`,
    `thread/start`, `turn/start`, etc.)
    - spans record transport (stdio vs websocket), request id, connection
    id, and client name/version when available
    - `initialize` stores client metadata in session state so later requests
    on the same connection can reuse it
  • feat: polluted memories (#13008)
    Add a feature flag to disable memory creation for "polluted"