codex

Add Smart Approvals guardian review across core, app-server, and TUI (#13860 )

## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved

## Runtime model
This PR does not introduce a new `approval_policy`.

Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
  - `user`
  - `guardian_subagent`

`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.

The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.

When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`

Users can still change `/approvals` afterward.

Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior

ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.

## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.

- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface

## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.

It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`

with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`

`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`

`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.

These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.

This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.

## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
  - resolved approval/denial state in history

## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)

Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)

---------

Co-authored-by: Codex <noreply@openai.com>

Charley Cunningham · 2026-03-13 15:27:00 -07:00

bc24017d64

app-server: add v2 filesystem APIs (#14245 )

Add a protocol-level filesystem surface to the v2 app-server so Codex
clients can read and write files, inspect directories, and subscribe to
path changes without relying on host-specific helpers.

High-level changes:
- define the new v2 fs/readFile, fs/writeFile, fs/createDirectory,
fs/getMetadata, fs/readDirectory, fs/remove, fs/copy RPCs
- implement the app-server handlers, including absolute-path validation,
base64 file payloads, recursive copy/remove semantics
- document the API, regenerate protocol schemas/types, and add
end-to-end tests for filesystem operations, copy edge cases

Testing plan:
- validate protocol serialization and generated schema output for the
new fs request, response, and notification types
- run app-server integration coverage for file and directory CRUD paths,
metadata/readDirectory responses, copy failure modes, and absolute-path
validation

Ruslan Nigmatullin · 2026-03-13 14:42:20 -07:00

f8f82bfc2b

feat(app-server, core): add more spans (#14479 )

## Description

This PR expands tracing coverage across app-server thread startup, core
session initialization, and the Responses transport layer. It also gives
core dispatch spans stable operation-specific names so traces are easier
to follow than the old generic `submission_dispatch` spans.

Also use `fmt::Display` for types that we serialize in traces so we send
strings instead of rust types

Owen Lin · 2026-03-13 13:16:33 -07:00

014e19510d

app-server: Add platform os and family to init response (#14527 )

This allows the client to pick os-specific behavior while interacting
with the app server, e.g. to use proper path separators.

Ruslan Nigmatullin · 2026-03-13 19:07:54 +00:00

50558e6507

feat: add plugin/read. (#14445 )

return more information for a specific plugin.

xl-openai · 2026-03-12 16:52:21 -07:00

1ea69e8d50

Rename reject approval policy to granular (#14516 )

Jack Mousseau · 2026-03-12 16:38:04 -07:00

b7dba72dbd

chore(app-server): stop exporting EventMsg schemas (#14478 )

Follow up to https://github.com/openai/codex/pull/14392, stop exporting
EventMsg types to TypeScript and JSON schema since we no longer emit
them.

Owen Lin · 2026-03-12 12:16:05 -07:00

4724a2e9e7

chore: use AVAILABLE and ON_INSTALL as default plugin install and auth policies (#14407 )

make `AVAILABLE` the default plugin installPolicy when unset in
`marketplace.json`. similarly, make `ON_INSTALL` the default authPolicy.

this means, when unset, plugins are available to be installed (but not
auto-installed), and the contained connectors will be authed at
install-time.

updated tests.

sayan-oai · 2026-03-11 20:33:17 -07:00

917c2df201

Include spawn agent model metadata in app-server items (#14410 )

- add model and reasoning effort to app-server collab spawn items and
notifications
- regenerate app-server protocol schemas for the new fields

---------

Co-authored-by: Codex <noreply@openai.com>

Ahmed Ibrahim · 2026-03-11 19:25:21 -07:00

bf5e997b31

chore(app-server): delete unused rpc methods from v1.rs (#14394 )

## Description

This PR trims `app-server-protocol`'s v1 surface down to the small set
of legacy types we still actually use.

Unfortunately, we can't delete all of them yet because:
- a few one-off v1 RPCs are still used by the Codex app
- a few of these app-server-protocol v1 types are actually imported by
core crates

This change deletes that unused RPC surface, keeps the remaining
compatibility types in place, and makes the crate root re-export only
the v1 structs that downstream crates still depend on.

## Why

The main goal here is to make the legacy protocol surface match reality.
Leaving a large pile of dead v1 structs in place makes it harder to tell
which compatibility paths are still intentional, and it keeps old
schema/types around even though nothing should be building against them
anymore.

This also gives us a cleaner boundary for future cleanup. Instead of
re-exporting all of `protocol::v1::*`, the crate now explicitly exposes
only the v1 types that are still live, which makes it much easier to see
what remains and delete more safely later.

## What changed

- Deleted the unused v1 RPC/request/response structs from
`app-server-protocol/src/protocol/v1.rs`.
- Kept the small set of v1 compatibility types that are still live,
including:
  - `initialize`
  - `getConversationSummary`
  - `getAuthStatus`
  - `gitDiffToRemote`
  - legacy approval payloads
  - config-related structs still used by downstream crates
- Replaced the blanket `pub use protocol::v1::*` export in
`app-server-protocol/src/lib.rs` with an explicit list of the remaining
supported v1 types.
- Regenerated the schema/type artifacts, which also updated the
`InitializeCapabilities` opt-out example to use `thread/started` instead
of the old `codex/event/session_configured` example.

## Validation

- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`

## Follow-up

The next cleanup is to keep shrinking the remaining v1 compatibility
surface as callers migrate off it. Once the remaining consumers stop
importing these legacy types, we should be able to remove more of the v1
module and eventually stop exporting it from the crate root entirely.

Owen Lin · 2026-03-12 01:41:16 +00:00

c1ea3f95d1

feat: search_tool migrate to bring you own tool of Responses API (#14274 )

## Why

to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.

## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch

Anton Panasenko · 2026-03-11 17:51:51 -07:00

77b0c75267

chore(app-server): stop emitting codex/event/ notifications (#14392 )

## Description

This PR stops emitting legacy `codex/event/*` notifications from the
public app-server transports.

It's been a long time coming! app-server was still producing a raw
notification stream from core, alongside the typed app-server
notifications and server requests, for compatibility reasons. Now,
external clients should no longer be depending on those legacy
notifications, so this change removes them from the stdio and websocket
contract and updates the surrounding docs, examples, and tests to match.

### Caveat
I left the "in-process" version of app-server alone for now, since
`codex exec` was recently based on top of app-server via this in-process
form here: https://github.com/openai/codex/pull/14005

Seems like `codex exec` still consumes some legacy notifications
internally, so this branch only removes `codex/event/*` from app-server
over stdio and websockets.

## Follow-up

Once `codex exec` is fully migrated off `codex/event/*` notifications,
we'll be able to stop emitting them entirely entirely instead of just
filtering it at the external transport boundary.

Owen Lin · 2026-03-12 00:45:20 +00:00

72631755e0

chore: wire through plugin policies + category from marketplace.json (#14305 )

wire plugin marketplace metadata through app-server endpoints:
- `plugin/list` has `installPolicy` and `authPolicy`
- `plugin/install` has plugin-level `authPolicy`

`plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when
installing.


added tests.

sayan-oai · 2026-03-11 12:33:10 -07:00

7b2cee53db

Show spawned agent model and effort in TUI (#14273 )

- include the requested sub-agent model and reasoning effort in the
spawn begin event\n- render that metadata next to the spawned agent name
and role in the TUI transcript

---------

Co-authored-by: Codex <noreply@openai.com>

Ahmed Ibrahim · 2026-03-11 12:33:09 -07:00

285b3a5143

chore: add a separate reject-policy flag for skill approvals (#14271 )

## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior

Celia Chen · 2026-03-11 12:33:09 -07:00

c1a424691f

feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155 )

Add additional macOS Sandbox Permissions levers for the following:

- Launch Services
- Contacts
- Reminders

Leo Shimonaka · 2026-03-11 12:33:09 -07:00

889b4796fc

Add ephemeral flag support to thread fork (#14248 )

### Summary
This PR adds first-class ephemeral support to thread/fork, bringing it
in line with thread/start. The goal is to support one-off completions on
full forked threads without persisting them as normal user-visible
threads.

### Testing

joeytrasatti-openai · 2026-03-11 12:33:08 -07:00

8ac27b2a16

app-server: propagate nested experimental gating for AskForApproval::Reject (#14191 )

## Summary
This change makes `AskForApproval::Reject` gate correctly anywhere it
appears inside otherwise-stable app-server protocol types.

Previously, experimental gating for `approval_policy: Reject` was
handled with request-specific logic in `ClientRequest` detection. That
covered a few request params types, but it did not generalize to other
nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or
`ConfigRequirements`.

This PR replaces that ad hoc handling with a generic nested experimental
propagation mechanism.

## Testing

seeing this when run app-server-test-client without experimental api
enabled:
```
 initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" }
> {
>   "id": "50244f6a-270a-425d-ace0-e9e98205bde7",
>   "method": "thread/start",
>   "params": {
>     "approvalPolicy": {
>       "reject": {
>         "mcp_elicitations": false,
>         "request_permissions": true,
>         "rules": false,
>         "sandbox_approval": true
>       }
>     },
>     "baseInstructions": null,
>     "config": null,
>     "cwd": null,
>     "developerInstructions": null,
>     "dynamicTools": null,
>     "ephemeral": null,
>     "experimentalRawEvents": false,
>     "mockExperimentalField": null,
>     "model": null,
>     "modelProvider": null,
>     "persistExtendedHistory": false,
>     "personality": null,
>     "sandbox": null,
>     "serviceName": null
>   }
> }
< {
<   "error": {
<     "code": -32600,
<     "message": "askForApproval.reject requires experimentalApi capability"
<   },
<   "id": "50244f6a-270a-425d-ace0-e9e98205bde7"
< }
[verified] thread/start rejected approvalPolicy=Reject without experimentalApi
```

---------

Co-authored-by: celia-oai <celia@openai.com>

Dylan Hurd · 2026-03-11 12:33:08 -07:00

d5694529ca

feat: Allow sync with remote plugin status. (#14176 )

Add forceRemoteSync to plugin/list.
When it is set to True, we will sync the local plugin status with the
remote one (backend-api/plugins/list).

xl-openai · 2026-03-11 12:33:08 -07:00

d751e68f44

Use realtime transcript for handoff context (#14132 )

- collect input/output transcript deltas into active handoff transcript
state
- attach and clear that transcript on each handoff, and regenerate
schema/tests

Ahmed Ibrahim · 2026-03-09 22:30:03 -07:00

2e24be2134

Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296 )

### Purpose
While trying to build out CLI-Tools for the agent to use under skills we
have found that those tools sometimes need to invoke a user elicitation.
These elicitations are handled out of band of the codex app-server but
need to indicate to the exec manager that the command running is not
going to progress on the usual timeout horizon.

### Example
Model calls universal exec:
`$ download-credit-card-history --start-date 2026-01-19 --end-date
2026-02-19 > credit_history.jsonl`

download-cred-card-history might hit a hosted/preauthenticated service
to fetch data. That service might decide that the request requires an
end user approval the access to the personal data. It should be able to
signal to the running thread that the command in question is blocked on
user elicitation. In that case we want the exec to continue, but the
timeout to not expire on the tool call, essentially freezing time until
the user approves or rejects the command at which point the tool would
signal the app-server to decrement the outstanding elicitation count.
Now timeouts would proceed as normal.

### What's Added

- New v2 RPC methods:
    - thread/increment_elicitation
    - thread/decrement_elicitation
- Protocol updates in:
    - codex-rs/app-server-protocol/src/protocol/common.rs
    - codex-rs/app-server-protocol/src/protocol/v2.rs
- App-server handlers wired in:
    - codex-rs/app-server/src/codex_message_processor.rs

### Behavior

- Counter starts at 0 per thread.
- increment atomically increases the counter.
- decrement atomically decreases the counter; decrement at 0 returns
invalid request.
- Transition rules:
- 0 -> 1: broadcast pause state, pausing all active stopwatches
immediately.
    - \>0 -> >0: remain paused.
    - 1 -> 0: broadcast unpause state, resuming stopwatches.
- Core thread/session logic:
    - codex-rs/core/src/codex_thread.rs
    - codex-rs/core/src/codex.rs
    - codex-rs/core/src/mcp_connection_manager.rs

### Exec-server stopwatch integration

- Added centralized stopwatch tracking/controller:
    - codex-rs/exec-server/src/posix/stopwatch_controller.rs
- Hooked pause/unpause broadcast handling + stopwatch registration:
    - codex-rs/exec-server/src/posix/mcp.rs
    - codex-rs/exec-server/src/posix/stopwatch.rs
    - codex-rs/exec-server/src/posix.rs

Channing Conger · 2026-03-09 22:29:26 -07:00

c6343e0649

fix(core) default RejectConfig.request_permissions (#14165 )

## Summary
Adds a default here so existing config deserializes

## Testing
- [x] Added a unit test

Dylan Hurd · 2026-03-10 04:56:23 +00:00

772259b01f

start of hooks engine (#13276 )

(Experimental)

This PR adds a first MVP for hooks, with SessionStart and Stop

The core design is:

- hooks live in a dedicated engine under codex-rs/hooks
- each hook type has its own event-specific file
- hook execution is synchronous and blocks normal turn progression while
running
- matching hooks run in parallel, then their results are aggregated into
a normalized HookRunSummary

On the AppServer side, hooks are exposed as operational metadata rather
than transcript-native items:

- new live notifications: hook/started, hook/completed
- persisted/replayed hook results live on Turn.hookRuns
- we intentionally did not add hook-specific ThreadItem variants

Hooks messages are not persisted, they remain ephemeral. The context
changes they add are (they get appended to the user's prompt)

Andrei Eternal · 2026-03-10 04:11:31 +00:00

244b2d53f4

feat(approvals) RejectConfig for request_permissions (#14118 )

## Summary
We need to support allowing request_permissions calls when using
`Reject` policy

<img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
40 PM"
src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
/>

Note that this is a backwards-incompatible change for Reject policy. I'm
not sure if we need to add a default based on our current use/setup

## Testing
- [x] Added tests
- [x] Tested locally

Dylan Hurd · 2026-03-09 18:16:54 -07:00

6da84efed8

feat(core) Persist request_permission data across turns (#14009 )

## Summary
request_permissions flows should support persisting results for the
session.

Open Question: Still deciding if we need within-turn approvals - this
adds complexity but I could see it being useful

## Testing
- [x] Updated unit tests

---------

Co-authored-by: Codex <noreply@openai.com>

Dylan Hurd · 2026-03-09 14:36:38 -07:00

d241dc598c

Stabilize protocol schema fixture generation (#13886 )

## What changed
- TypeScript schema fixture generation now goes through in-memory tree
helpers rather than a heavier on-disk generation path.
- The comparison logic normalizes generated banner and path differences
that are not semantically relevant to the exported schema.
- TypeScript and JSON fixture coverage are split into separate tests,
and the expensive schema-export tests are serialized in `nextest`.

## Why this fixes the flake
- The original fixture coverage mixed several heavy codegen paths into
one monolithic test and then compared generated output that included
incidental banner/path differences.
- On Windows CI, that combination created both runtime pressure and
output variance unrelated to the schema shapes we actually care about.
- Splitting the coverage isolates failures by format, in-memory
generation reduces filesystem churn, normalization strips generator
noise, and serializing the heavy tests removes parallel resource
contention.

## Scope
- Production helper change plus test changes.

Ahmed Ibrahim · 2026-03-09 13:51:50 -07:00

831ee51c86

chore: plugin/uninstall endpoint (#14111 )

add `plugin/uninstall` app-server endpoint to fully rm plugin from
plugins cache dir and rm entry from user config file.

plugin-enablement is session-scoped, so uninstalls are only picked up in
new sessions (like installs).

added tests.

sayan-oai · 2026-03-09 12:40:25 -07:00

6ad448b658

Add request permissions tool (#13092 )

Adds a built-in `request_permissions` tool and wires it through the
Codex core, protocol, and app-server layers so a running turn can ask
the client for additional permissions instead of relying on a static
session policy.

The new flow emits a `RequestPermissions` event from core, tracks the
pending request by call ID, forwards it through app-server v2 as an
`item/permissions/requestApproval` request, and resumes the tool call
once the client returns an approved subset of the requested permission
profile.

Jack Mousseau · 2026-03-08 20:23:06 -07:00

e6b93841c5

app-server: include experimental skill metadata in exec approval requests (#13929 )

## Summary

This change surfaces skill metadata on command approval requests so
app-server clients can tell when an approval came from a skill script
and identify the originating `SKILL.md`.

- add `skill_metadata` to exec approval events in the shared protocol
- thread skill metadata through core shell escalation and delegated
approval handling for skill-triggered approvals
- expose the field in app-server v2 as experimental `skillMetadata`
- regenerate the JSON/TypeScript schemas and cover the new field in
protocol, transport, core, and TUI tests

## Why

Skill-triggered approvals already carry skill context inside core, but
app-server clients could not see which skill caused the prompt. Sending
the skill metadata with the approval request makes it possible for
clients to present better approval UX and connect the prompt back to the
relevant skill definition.


## example event in app-server-v2
verified that we see this event when experimental api is on:
```
< {
<   "id": 11,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "additionalPermissions": {
<       "fileSystem": null,
<       "macos": {
<         "accessibility": false,
<         "automations": {
<           "bundle_ids": [
<             "com.apple.Notes"
<           ]
<         },
<         "calendar": false,
<         "preferences": "read_only"
<       },
<       "network": null
<     },
<     "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes",
<     "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy",
<     "skillMetadata": {
<       "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md"
<     },
<     "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69",
<     "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f"
<   }
< }`
```

& verified that this is the event when experimental api is off:
```
< {
<   "id": 13,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Users/celia/code/codex/codex-rs",
<     "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt",
<     "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4",
<     "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a"
<   }
< }
```

Celia Chen · 2026-03-08 18:07:46 -07:00

340f9c9ecb

Add in-process app server and wire up exec to use it (#14005 )

This is a subset of PR #13636. See that PR for a full overview of the
architectural change.

This PR implements the in-process app server and modifies the
non-interactive "exec" entry point to use the app server.

---------

Co-authored-by: Felipe Coury <felipe.coury@gmail.com>

Eric Traut · 2026-03-08 18:43:55 -06:00

da3689f0ef

[app-server] Support hot-reload user config when batch writing config. (#13839 )

- [x] Support hot-reload user config when batch writing config.

Matthew Zeng · 2026-03-08 17:38:01 -07:00

a684a36091

Add guardian approval MVP (#13692 )

## Summary
- add the guardian reviewer flow for `on-request` approvals in command,
patch, sandbox-retry, and managed-network approval paths
- keep guardian behind `features.guardian_approval` instead of exposing
a public `approval_policy = guardian` mode
- route ordinary `OnRequest` approvals to the guardian subagent when the
feature is enabled, without changing the public approval-mode surface

## Public model
- public approval modes stay unchanged
- guardian is enabled via `features.guardian_approval`
- when that feature is on, `approval_policy = on-request` keeps the same
approval boundaries but sends those approval requests to the guardian
reviewer instead of the user
- `/experimental` only persists the feature flag; it does not rewrite
`approval_policy`
- CLI and app-server no longer expose a separate `guardian` approval
mode in this PR

## Guardian reviewer
- the reviewer runs as a normal subagent and reuses the existing
subagent/thread machinery
- it is locked to a read-only sandbox and `approval_policy = never`
- it does not inherit user/project exec-policy rules
- it prefers `gpt-5.4` when the current provider exposes it, otherwise
falls back to the parent turn's active model
- it fail-closes on timeout, startup failure, malformed output, or any
other review error
- it currently auto-approves only when `risk_score < 80`

## Review context and policy
- guardian mirrors `OnRequest` approval semantics rather than
introducing a separate approval policy
- explicit `require_escalated` requests follow the same approval surface
as `OnRequest`; the difference is only who reviews them
- managed-network allowlist misses that enter the approval flow are also
reviewed by guardian
- the review prompt includes bounded recent transcript history plus
recent tool call/result evidence
- transcript entries and planned-action strings are truncated with
explicit `<guardian_truncated ... />` markers so large payloads stay
bounded
- apply-patch reviews include the full patch content (without
duplicating the structured `changes` payload)
- the guardian request layout is snapshot-tested using the same
model-visible Responses request formatter used elsewhere in core

## Guardian network behavior
- the guardian subagent inherits the parent session's managed-network
allowlist when one exists, so it can use the same approved network
surface while reviewing
- exact session-scoped network approvals are copied into the guardian
session with protocol/port scope preserved
- those copied approvals are now seeded before the guardian's first turn
is submitted, so inherited approvals are available during any immediate
review-time checks

## Out of scope / follow-ups
- the sandbox-permission validation split was pulled into a separate PR
and is not part of this diff
- a future follow-up can enable `serde_json` preserve-order in
`codex-core` and then simplify the guardian action rendering further

---------

Co-authored-by: Codex <noreply@openai.com>

Charley Cunningham · 2026-03-07 05:40:10 -08:00

e84ee33cc0

app-server: require absolute cwd for windowsSandbox/setupStart (#13833 )

## Summary
- require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
- reject relative cwd values at request parsing instead of normalizing
them later in the setup flow
- add RPC-layer coverage for relative cwd rejection and update the
checked-in protocol schemas/docs

## Why
windowsSandbox/setupStart was carrying the client-provided cwd as a raw
PathBuf for command_cwd while config derivation normalized the same
value into an absolute policy_cwd.

That left room for relative-path ambiguity in the setup path, especially
for inputs like cwd: "repo". Making the RPC accept only absolute paths
removes that split entirely: the handler now receives one
already-validated absolute path and uses it for both config derivation
and setup.

This keeps the trust model unchanged. Trusted clients could already
choose the session cwd; this change is only about making the setup RPC
reject relative paths so command_cwd and policy_cwd cannot diverge.

## Testing
- cargo test -p codex-app-server windows_sandbox_setup (run locally by
user)
- cargo test -p codex-app-server-protocol windows_sandbox (run locally
by user)

iceweasel-oai · 2026-03-06 22:47:08 -08:00

4b4f61d379

feat(app-server-protocol): address naming conflicts in json schema exporter (#13819 )

This fixes a schema export bug where two different `WebSearchAction`
types were getting merged under the same name in the app-server v2 JSON
schema bundle.

The problem was that v2 thread items use the app-server API's
`WebSearchAction` with camelCase variants like `openPage`, while
`ThreadResumeParams.history` and
`RawResponseItemCompletedNotification.item` pull in the upstream
`ResponseItem` graph, which uses the Responses API snake_case shape like
`open_page`. During bundle generation we were flattening nested
definitions into the v2 namespace by plain name, so the later definition
could silently overwrite the earlier one.

That meant clients generating code from the bundled schema could end up
with the wrong `WebSearchAction` definition for v2 thread history. In
practice this shows up on web search items reconstructed from rollout
files with persisted extended history.

This change does two things:
- Gives the upstream Responses API schema a distinct JSON schema name:
`ResponsesApiWebSearchAction`
- Makes namespace-level schema definition collisions fail loudly instead
of silently overwriting

Owen Lin · 2026-03-07 01:33:46 +00:00

90469d0a23

app-server: Add streaming and tty/pty capabilities to command/exec (#13640 )

* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)

Ruslan Nigmatullin · 2026-03-06 17:30:17 -08:00

e9bd8b20a1

Allow full web search tool config (#13675 )

Previously, we could only configure whether web search was on/off.

This PR enables sending along a web search config, which includes all
the stuff responsesapi supports: filters, location, etc.

Rohan Mehta · 2026-03-07 00:50:50 +00:00

61098c7f51

feat: Add curated plugin marketplace + Metadata Cleanup. (#13712 )

1. Add a synced curated plugin marketplace and include it in marketplace
discovery.
2. Expose optional plugin.json interface metadata in plugin/list
3. Tighten plugin and marketplace path handling using validated absolute
paths.
4. Let manifests override skill, MCP, and app config paths.
5. Restrict plugin enablement/config loading to the user config layer so
plugin enablement is at global level

xl-openai · 2026-03-06 19:39:35 -05:00

0243734300

feat: structured plugin parsing (#13711 )

#### What

Add structured `@plugin` parsing and TUI support for plugin mentions.

- Core: switch from plain-text `@display_name` parsing to structured
`plugin://...` mentions via `UserInput::Mention` and
`[$...](plugin://...)` links in text, same pattern as apps/skills.
- TUI: add plugin mention popup, autocomplete, and chips when typing
`$`. Load plugin capability summaries and feed them into the composer;
plugin mentions appear alongside skills and apps.
- Generalize mention parsing to a sigil parameter, still defaults to `$`

<img width="797" height="119" alt="image"
src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
/>

Builds on #13510. Currently clients have to build their own `id` via
`plugin@marketplace` and filter plugins to show by `enabled`, but we
will add `id` and `available` as fields returned from `plugin/list`
soon.

####Tests

Added tests, verified locally.

sayan-oai · 2026-03-06 11:08:36 -08:00

8a54d3caaa

[elicitations] Switch to use MCP style elicitation payload for mcp tool approvals. (#13621 )

- [x] Switch to use MCP style elicitation payload for mcp tool
approvals.
- [ ] TODO: Update the UI to support the full spec.

Matthew Zeng · 2026-03-06 01:50:26 -08:00

98dca99db7

Enabling CWD Saving for Image-Gen (#13607 )

Codex now saves the generated image on to your current working
directory.

Won Park · 2026-03-06 00:47:21 -08:00

ee1a20258a

check app auth in plugin/install (#13685 )

#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.

checks are best effort based on `codex_apps` loading, much like
`app/list`.

#### Tests
Added integration tests, tested locally.

sayan-oai · 2026-03-06 06:45:00 +00:00

014a59fb0b

refactor: remove proxy admin endpoint (#13687 )

## Summary
- delete the network proxy admin server and its runtime listener/task
plumbing
- remove the admin endpoint config, runtime, requirement, protocol,
schema, and debug-surface fields
- update proxy docs to reflect the remaining HTTP and SOCKS listeners
only

viyatb-oai · 2026-03-05 22:03:16 -08:00

6a79ed5920

fix: accept two macOS automation input shapes for approval payload compatibility (#13683 )

## Summary
This PR:
1. fixes a deserialization mismatch for macOS automation permissions in
approval payloads by making core parsing accept both supported wire
shapes for bundle IDs.
2. added `#[serde(default)]` to `MacOsSeatbeltProfileExtensions` so
omitted fields deserialize to secure defaults.

## Why this change is needed
`MacOsAutomationPermission` uses `#[serde(try_from =
"MacOsAutomationPermissionDe")]`, so deserialization is controlled by
`MacOsAutomationPermissionDe`. After we aligned v2
`additionalPermissions.macos.automations` to the core shape, approval
payloads started including `{ "bundle_ids": [...] }` in some paths.
`MacOsAutomationPermissionDe` previously accepted only `"none" | "all"`
or a plain array, so object-shaped bundle IDs failed with `data did not
match any variant of untagged enum MacOsAutomationPermissionDe`. This
change restores compatibility by accepting both forms while preserving
existing normalization behavior (trim values and map empty bundle lists
to `None`).

## Validation

saw this error went away when running
```
cargo run -p codex-app-server-test-client -- \
--codex-bin ./target/debug/codex \
-c 'approval_policy="on-request"' \
-c 'features.shell_zsh_fork=true' \
-c 'zsh_path="/tmp/codex-zsh-fork/package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh"' \
send-message-v2 --experimental-api \
'Use $apple-notes and run scripts/notes_info now.'
```
:
```
Error: failed to deserialize ServerRequest from JSONRPCRequest

Caused by:
data did not match any variant of untagged enum MacOsAutomationPermissionDe
```

Celia Chen · 2026-03-06 06:02:33 +00:00

f9ce403b5a

support plugin/list. (#13540 )

Introduce a plugin/list which reads from local marketplace.json.
Also update the signature for plugin/install.

xl-openai · 2026-03-05 21:58:50 -05:00

520ed724d2

core/protocol: add structured macOS additional permissions and merge them into sandbox execution (#13499 )

## Summary
- Introduce strongly-typed macOS additional permissions across
protocol/core/app-server boundaries.
- Merge additional permissions into effective sandbox execution,
including macOS seatbelt profile extensions.
- Expand docs, schema/tool definitions, UI rendering, and tests for
`network`, `file_system`, and `macos` additional permissions.

Celia Chen · 2026-03-05 16:21:45 -08:00

aaefee04cd

add @plugin mentions (#13510 )

## Note-- added plugin mentions via @, but that conflicts with file
mentions

depends and builds upon #13433.

- introduces explicit `@plugin` mentions. this injects the plugin's mcp
servers, app names, and skill name format into turn context as a dev
message.
- we do not yet have UI for these mentions, so we currently parse raw
text (as opposed to skills and apps which have UI chips, autocomplete,
etc.) this depends on a `plugins/list` app-server endpoint we can feed
the UI with, which is upcoming
- also annotate mcp and app tool descriptions with the plugin(s) they
come from. this gives the model a first class way of understanding what
tools come from which plugins, which will help implicit invocation.

### Tests
Added and updated tests, unit and integration. Also confirmed locally a
raw `@plugin` injects the dev message, and the model knows about its
apps, mcps, and skills.

sayan-oai · 2026-03-06 00:03:39 +00:00

4e77ea0ec7

feat(app-server): support mcp elicitations in v2 api (#13425 )

This adds a first-class server request for MCP server elicitations:
`mcpServer/elicitation/request`.

Until now, MCP elicitation requests only showed up as a raw
`codex/event/elicitation_request` event from core. That made it hard for
v2 clients to handle elicitations using the same request/response flow
as other server-driven interactions (like shell and `apply_patch`
tools).

This also updates the underlying MCP elicitation request handling in
core to pass through the full MCP request (including URL and form data)
so we can expose it properly in app-server.

### Why not `item/mcpToolCall/elicitationRequest`?
This is because MCP elicitations are related to MCP servers first, and
only optionally to a specific MCP tool call.

In the MCP protocol, elicitation is a server-to-client capability: the
server sends `elicitation/create`, and the client replies with an
elicitation result. RMCP models it that way as well.

In practice an elicitation is often triggered by an MCP tool call, but
not always.

### What changed
- add `mcpServer/elicitation/request` to the v2 app-server API
- translate core `codex/event/elicitation_request` events into the new
v2 server request
- map client responses back into `Op::ResolveElicitation` so the MCP
server can continue
- update app-server docs and generated protocol schema
- add an end-to-end app-server test that covers the full round trip
through a real RMCP elicitation flow
- The new test exercises a realistic case where an MCP tool call
triggers an elicitation, the app-server emits
mcpServer/elicitation/request, the client accepts it, and the tool call
resumes and completes successfully.

### app-server API flow
- Client starts a thread with `thread/start`.
- Client starts a turn with `turn/start`.
- App-server sends `item/started` for the `mcpToolCall`.
- While that tool call is in progress, app-server sends
`mcpServer/elicitation/request`.
- Client responds to that request with `{ action: "accept" | "decline" |
"cancel" }`.
- App-server sends `serverRequest/resolved`.
- App-server sends `item/completed` for the mcpToolCall.
- App-server sends `turn/completed`.
- If the turn is interrupted while the elicitation is pending,
app-server still sends `serverRequest/resolved` before the turn
finishes.

Owen Lin · 2026-03-05 07:20:20 -08:00

926b2f19e8

image-gen-event/client_processing (#13512 )

enabling client-side to process with image-generation capabilities
(setting app-server)

Won Park · 2026-03-04 16:54:38 -08:00

229e6d0347

plugin: support local-based marketplace.json + install endpoint. (#13422 )

Support marketplace.json that points to a local file, with
```
    "source":
    {
        "source": "local",
        "path": "./plugin-1"
    },
 ```
 
 Add a new plugin/install endpoint which add the plugin to the cache folder and enable it in config.toml.

xl-openai · 2026-03-04 19:08:18 -05:00

1e877ccdd2

feat(app-server-test-client): OTEL setup for tracing (#13493 )

### Overview
This PR:
- Updates `app-server-test-client` to load OTEL settings from
`$CODEX_HOME/config.toml` and initializes its own OTEL provider.
- Add real client root spans to app-server test client traces.

This updates `codex-app-server-test-client` so its Datadog traces
reflect the full client-driven flow instead of a set of server spans
stitched together under a synthetic parent.

Before this change, the test client generated a fake `traceparent` once
and reused it for every JSON-RPC request. That kept the requests in one
trace, but there was no real client span at the top, so Datadog ended up
showing the sequence in a slightly misleading way, where all RPCs were
anchored under `initialize`.

Now the test client:
- loads OTEL settings from the normal Codex config path, including
`$CODEX_HOME/config.toml` and existing --config overrides
- initializes tracing the same way other Codex binaries do when trace
export is enabled
- creates a real client root span for each scripted command
- creates per-request client spans for JSON-RPC methods like
`initialize`, `thread/start`, and `turn/start`
- injects W3C trace context from the current client span into
request.trace instead of reusing a fabricated carrier

This gives us a cleaner trace shape in Datadog:
- one trace URL for the whole scripted flow
- a visible client root span
- proper client/server parent-child relationships for each app-server
request

Owen Lin · 2026-03-04 13:30:09 -08:00

8dfd654196

328 Commits