- Split the feature system into a new `codex-features` crate.
- Cut `codex-core` and workspace consumers over to the new config and
warning APIs.
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
Some background. We're looking to instrument GA turns end to end. Right
now a big gap is grouping mcp tool calls with their codex sessions. We
send session id and turn id headers to the responses call but not the
mcp/wham calls.
Ideally we could pass the args as headers like with responses, but given
the setup of the rmcp client, we can't send as headers without either
changing the rmcp package upstream to allow per request headers or
introducing a mutex which break concurrency. An earlier attempt made the
assumption that we had 1 client per thread, which allowed us to set
headers at the start of a turn. @pakrym mentioned that this assumption
might break in the near future.
So the solution now is to package the turn metadata/session id into the
_meta field in the post body and pull out in codex-backend.
- send turn metadata to MCP servers via `tools/call` `_meta` instead of
assuming per-thread request headers on shared clients
- preserve the existing `_codex_apps` metadata while adding
`x-codex-turn-metadata` for all MCP tool calls
- extend tests to cover both custom MCP servers and the codex apps
search flow
---------
Co-authored-by: Codex <noreply@openai.com>
The idea is that codex-exec exposes an Environment struct with services
on it. Each of those is a trait.
Depending on construction parameters passed to Environment they are
either backed by local or remote server but core doesn't see these
differences.
## Description
Dependent on:
- [responsesapi] https://github.com/openai/openai/pull/760991
- [codex-backend] https://github.com/openai/openai/pull/760985
`codex app-server -> codex-backend -> responsesapi` now reuses a
persistent websocket connection across many turns. This PR updates
tracing when using websockets so that each `response.create` websocket
request propagates the current tracing context, so we can get a holistic
end-to-end trace for each turn.
Tracing is propagated via special keys (`ws_request_header_traceparent`,
`ws_request_header_tracestate`) set in the `client_metadata` param in
Responses API.
Currently tracing on websockets is a bit broken because we only set
tracing context on ws connection time, so it's detached from a
`turn/start` request.
Resubmit https://github.com/openai/codex/pull/15020 with correct
content.
1. Use requirement-resolved config.features as the plugin gate.
2. Guard plugin/list, plugin/read, and related flows behind that gate.
3. Skip bad marketplace.json files instead of failing the whole list.
4. Simplify plugin state and caching.
## Summary
If a subagent requests approval, and the user persists that approval to
the execpolicy, it should (by default) propagate. We'll need to rethink
this a bit in light of coming Permissions changes, though I think this
is closer to the end state that we'd want, which is that execpolicy
changes to one permissions profile should be synced across threads.
## Testing
- [x] Added integration test
---------
Co-authored-by: Codex <noreply@openai.com>
- this allows blocking the user's prompts from executing, and also
prevents them from entering history
- handles the edge case where you can both prevent the user's prompt AND
add n amount of additionalContexts
- refactors some old code into common.rs where hooks overlap
functionality
- refactors additionalContext being previously added to user messages,
instead we use developer messages for them
- handles queued messages correctly
Sample hook for testing - if you write "[block-user-submit]" this hook
will stop the thread:
example run
```
› sup
• Running UserPromptSubmit hook: reading the observatory notes
UserPromptSubmit hook (completed)
warning: wizard-tower UserPromptSubmit demo inspected: sup
hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' exactly once near the end.
• Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
lanterns lit
› and [block-user-submit]
• Running UserPromptSubmit hook: reading the observatory notes
UserPromptSubmit hook (stopped)
warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
stop: Wizard Tower demo block: remove [block-user-submit] to continue.
```
.codex/config.toml
```
[features]
codex_hooks = true
```
.codex/hooks.json
```
{
"hooks": {
"UserPromptSubmit": [
{
"hooks": [
{
"type": "command",
"command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
"timeoutSec": 10,
"statusMessage": "reading the observatory notes"
}
]
}
]
}
}
```
.codex/hooks/user_prompt_submit_demo.py
```
#!/usr/bin/env python3
import json
import sys
from pathlib import Path
def prompt_from_payload(payload: dict) -> str:
prompt = payload.get("prompt")
if isinstance(prompt, str) and prompt.strip():
return prompt.strip()
event = payload.get("event")
if isinstance(event, dict):
user_prompt = event.get("user_prompt")
if isinstance(user_prompt, str):
return user_prompt.strip()
return ""
def main() -> int:
payload = json.load(sys.stdin)
prompt = prompt_from_payload(payload)
cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
if "[block-user-submit]" in prompt:
print(
json.dumps(
{
"systemMessage": (
f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
),
"decision": "block",
"reason": (
"Wizard Tower demo block: remove [block-user-submit] to continue."
),
}
)
)
return 0
prompt_preview = prompt or "(empty prompt)"
if len(prompt_preview) > 80:
prompt_preview = f"{prompt_preview[:77]}..."
print(
json.dumps(
{
"systemMessage": (
f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
),
"hookSpecificOutput": {
"hookEventName": "UserPromptSubmit",
"additionalContext": (
"Wizard Tower UserPromptSubmit demo fired. "
"For this reply only, include the exact phrase "
"'observatory lanterns lit' exactly once near the end."
),
},
}
)
)
return 0
if __name__ == "__main__":
raise SystemExit(main())
```
Adds an environment crate and environment + file system abstraction.
Environment is a combination of attributes and services specific to
environment the agent is connected to:
File system, process management, OS, default shell.
The goal is to move most of agent logic that assumes environment to work
through the environment abstraction.
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:
- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`
## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.
There were multiple independent sources of nondeterminism in that setup:
- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.
So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.
## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.
## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## Description
This PR fixes a bad first-turn failure mode in app-server when the
startup websocket prewarm hangs. Before this change, `initialize ->
thread/start -> turn/start` could sit behind the prewarm for up to five
minutes, so the client would not see `turn/started`, and even
`turn/interrupt` would block because the turn had not actually started
yet.
Now, we:
- set a (configurable) timeout of 15s for websocket startup time,
exposed as `websocket_startup_timeout_ms` in config.toml
- `turn/started` is sent immediately on `turn/start` even if the
websocket is still connecting
- `turn/interrupt` can be used to cancel a turn that is still waiting on
the websocket warmup
- the turn task will wait for the full 15s websocket warming timeout
before falling back
## Why
The old behavior made app-server feel stuck at exactly the moment the
client expects turn lifecycle events to start flowing. That was
especially painful for external clients, because from their point of
view the server had accepted the request but then went silent for
minutes.
## Configuring the websocket startup timeout
Can set it in config.toml like this:
```
[model_providers.openai]
supports_websockets = true
websocket_connect_timeout_ms = 15000
```
## Stack Position
2/4. Built on top of #14828.
## Base
- #14828
## Unblocks
- #14829
- #14827
## Scope
- Port the realtime v2 wire parsing, session, app-server, and
conversation runtime behavior onto the split websocket-method base.
- Branch runtime behavior directly on the current realtime session kind
instead of parser-derived flow flags.
- Keep regression coverage in the existing e2e suites.
---------
Co-authored-by: Codex <noreply@openai.com>
Fix https://github.com/openai/codex/issues/14161
This fixes sub-agent [[skills.config]] overrides being ignored when
parent and child share the same cwd. The root cause was that turn skill
loading rebuilt from cwd-only state and reused a cwd-scoped cache, so
role-local skill enable/disable overrides did not reliably affect the
spawned agent's effective skill set.
This change switches turn construction to use the effective per-turn
config and adds a config-aware skills cache keyed by skill roots plus
final disabled paths.
## Summary
- reuse a guardian subagent session across approvals so reviews keep a
stable prompt cache key and avoid one-shot startup overhead
- clear the guardian child history before each review so prior guardian
decisions do not leak into later approvals
- include the `smart_approvals` -> `guardian_approval` feature flag
rename in the same PR to minimize release latency on a very tight
timeline
- add regression coverage for prompt-cache-key reuse without
prior-review prompt bleed
## Request
- Bug/enhancement request: internal guardian prompt-cache and latency
improvement request
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- apply persisted execpolicy network rules when booting the managed
network proxy
- pass the current execpolicy into managed proxy startup so host
approvals selected with "allow this host in the future" survive new
sessions
## Summary
- reuse rollout reconstruction when applying a backtrack rollback so
`reference_context_item` is restored from persisted rollout state
- build rollback replay from the flushed rollout items plus the rollback
marker, avoiding the extra reread/fallback path
- add regression coverage for rollback after compaction so turn-context
diffing stays aligned after backtracking
Co-authored-by: Codex <noreply@openai.com>
## Why
The unified-exec path was carrying zsh-fork state in a partially
flattened way.
First, the decision about whether zsh-fork was active came from feature
selection in `ToolsConfig`, while the real prerequisites lived in
session state. That left the handler and runtime defending against
partially configured cases later.
Second, once zsh-fork was active, its two runtime-only paths were
threaded through the runtime as separate arguments even though they form
one coherent piece of configuration.
This change keeps unified-exec on a single session-derived source of
truth and bundles the zsh-fork-specific paths into a named config type
so the runtime can pass them around as one unit.
In particular, this PR introduces this enum so the `ZshFork` variant can
carry the appropriate state with it:
```rust
#[derive(Debug, Clone, Eq, PartialEq)]
pub enum UnifiedExecShellMode {
Direct,
ZshFork(ZshForkConfig),
}
#[derive(Debug, Clone, Eq, PartialEq)]
pub struct ZshForkConfig {
pub(crate) shell_zsh_path: AbsolutePathBuf,
pub(crate) main_execve_wrapper_exe: AbsolutePathBuf,
}
```
This cleanup was done in preparation for
https://github.com/openai/codex/pull/13432.
## What Changed
- Replaced the feature-only `UnifiedExecBackendConfig` split with
`UnifiedExecShellMode` in `codex-rs/core/src/tools/spec.rs`.
- Derived the unified-exec mode from session-backed inputs when building
turn `ToolsConfig`, and preserved that mode across model switches and
review turns.
- Introduced `ZshForkConfig`, which stores the resolved zsh-fork
`AbsolutePathBuf` values for the configured `zsh` binary and `execve`
wrapper.
- Threaded `ZshForkConfig` through unified-exec command construction and
the zsh-fork preparation path so zsh-fork-specific runtime code consumes
a single config object instead of separate path arguments.
- Added focused tests for constructing zsh-fork mode only when session
prerequisites are available, and updated the zsh-fork expectations to be
target-platform aware.
## Testing
- `cargo test -p codex-core zsh_fork --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/14633).
* #13432
* __->__ #14633
## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved
## Runtime model
This PR does not introduce a new `approval_policy`.
Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
- `user`
- `guardian_subagent`
`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.
The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.
When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`
Users can still change `/approvals` afterward.
Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior
ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.
## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.
- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface
## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.
It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`
with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`
`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`
`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.
These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.
This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.
## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
- resolved approval/denial state in history
## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)
Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)
---------
Co-authored-by: Codex <noreply@openai.com>
## Description
This PR expands tracing coverage across app-server thread startup, core
session initialization, and the Responses transport layer. It also gives
core dispatch spans stable operation-specific names so traces are easier
to follow than the old generic `submission_dispatch` spans.
Also use `fmt::Display` for types that we serialize in traces so we send
strings instead of rust types
## Summary
- launch Windows sandboxed children on a private desktop instead of
`Winsta0\Default`
- make private desktop the default while keeping
`windows.sandbox_private_desktop=false` as the escape hatch
- centralize process launch through the shared
`create_process_as_user(...)` path
- scope the private desktop ACL to the launching logon SID
## Why
Today sandboxed Windows commands run on the visible shared desktop. That
leaves an avoidable same-desktop attack surface for window interaction,
spoofing, and related UI/input issues. This change moves sandboxed
commands onto a dedicated per-launch desktop by default so the sandbox
no longer shares `Winsta0\Default` with the user session.
The implementation stays conservative on security with no silent
fallback back to `Winsta0\Default`
If private-desktop setup fails on a machine, users can still opt out
explicitly with `windows.sandbox_private_desktop=false`.
## Validation
- `cargo build -p codex-cli`
- elevated-path `codex exec` desktop-name probe returned
`CodexSandboxDesktop-*`
- elevated-path `codex exec` smoke sweep for shell commands, nested
`pwsh`, jobs, and hidden `notepad` launch
- unelevated-path full private-desktop compatibility sweep via `codex
exec` with `-c windows.sandbox=unelevated`
## Summary
- persist the code mode runner process in the session-scoped code mode
store
- switch the runner protocol from `init` to `start` with explicit
session ids
- handle runner-side session processing without the init waiter queue
## Validation
- just fmt
- cargo check -p codex-core
- node --check codex-rs/core/src/tools/code_mode_runner.cjs
## Summary
This PR keeps app-server RPC request trace context alive for the full
lifetime of the work that request kicks off (e.g. for `thread/start`,
this is `app-server rpc handler -> tokio background task -> core op
submissions`). Previously we lose trace lineage once the request handler
returns or hands work off to background tasks.
This approach is especially relevant for `thread/start` and other RPC
handlers that run in a non-blocking way. In the near future we'll most
likely want to make all app-server handlers run in a non-blocking way by
default, and only queue operations that must operate in order (e.g.
thread RPCs per thread?), so we want to make sure tracing in app-server
just generally works.
Depends on https://github.com/openai/codex/pull/14300
**Before**
<img width="155" height="207" alt="image"
src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
/>
**After**
<img width="299" height="337" alt="image"
src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
/>
## What changed
- Keep request-scoped trace context around until we send the final
response or error, or the connection closes.
- Thread that trace context through detached `thread/start` work so
background startup stays attached to the originating request.
- Pass request trace context through to downstream core operations,
including:
- thread creation
- resume/fork flows
- turn submission
- review
- interrupt
- realtime conversation operations
- Add tracing tests that verify:
- remote W3C trace context is preserved for `thread/start`
- remote W3C trace context is preserved for `turn/start`
- downstream core spans stay under the originating request span
- request-scoped tracing state is cleaned up correctly
- Clean up shutdown behavior so detached background tasks and spawned
threads are drained before process exit.
## Why
to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.
## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
## Summary
- defer fresh-session `build_initial_context()` until the first real
turn instead of seeding model-visible context during startup
- rely on the existing `reference_context_item == None` turn-start path
to inject full initial context on that first real turn (and again after
baseline resets such as compaction)
- add a regression test for `InitialHistory::New` and update affected
deterministic tests / snapshots around developer-message layout,
collaboration instructions, personality updates, and compact request
shapes
## Notes
- this PR does not add any special empty-thread `/compact` behavior
- most of the snapshot churn is the direct result of moving the initial
model-visible context from startup to the first real turn, so first-turn
request layouts no longer contain a pre-user startup copy of permissions
/ environment / other developer-visible context
- remote manual `/compact` with no prior user still skips the remote
compact request; local first-turn `/compact` still issues a compact
request, but that request now reflects the lack of startup-seeded
context
---------
Co-authored-by: Codex <noreply@openai.com>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes
Testing
- Not run (not requested)
### Purpose
While trying to build out CLI-Tools for the agent to use under skills we
have found that those tools sometimes need to invoke a user elicitation.
These elicitations are handled out of band of the codex app-server but
need to indicate to the exec manager that the command running is not
going to progress on the usual timeout horizon.
### Example
Model calls universal exec:
`$ download-credit-card-history --start-date 2026-01-19 --end-date
2026-02-19 > credit_history.jsonl`
download-cred-card-history might hit a hosted/preauthenticated service
to fetch data. That service might decide that the request requires an
end user approval the access to the personal data. It should be able to
signal to the running thread that the command in question is blocked on
user elicitation. In that case we want the exec to continue, but the
timeout to not expire on the tool call, essentially freezing time until
the user approves or rejects the command at which point the tool would
signal the app-server to decrement the outstanding elicitation count.
Now timeouts would proceed as normal.
### What's Added
- New v2 RPC methods:
- thread/increment_elicitation
- thread/decrement_elicitation
- Protocol updates in:
- codex-rs/app-server-protocol/src/protocol/common.rs
- codex-rs/app-server-protocol/src/protocol/v2.rs
- App-server handlers wired in:
- codex-rs/app-server/src/codex_message_processor.rs
### Behavior
- Counter starts at 0 per thread.
- increment atomically increases the counter.
- decrement atomically decreases the counter; decrement at 0 returns
invalid request.
- Transition rules:
- 0 -> 1: broadcast pause state, pausing all active stopwatches
immediately.
- \>0 -> >0: remain paused.
- 1 -> 0: broadcast unpause state, resuming stopwatches.
- Core thread/session logic:
- codex-rs/core/src/codex_thread.rs
- codex-rs/core/src/codex.rs
- codex-rs/core/src/mcp_connection_manager.rs
### Exec-server stopwatch integration
- Added centralized stopwatch tracking/controller:
- codex-rs/exec-server/src/posix/stopwatch_controller.rs
- Hooked pause/unpause broadcast handling + stopwatch registration:
- codex-rs/exec-server/src/posix/mcp.rs
- codex-rs/exec-server/src/posix/stopwatch.rs
- codex-rs/exec-server/src/posix.rs
(Experimental)
This PR adds a first MVP for hooks, with SessionStart and Stop
The core design is:
- hooks live in a dedicated engine under codex-rs/hooks
- each hook type has its own event-specific file
- hook execution is synchronous and blocks normal turn progression while
running
- matching hooks run in parallel, then their results are aggregated into
a normalized HookRunSummary
On the AppServer side, hooks are exposed as operational metadata rather
than transcript-native items:
- new live notifications: hook/started, hook/completed
- persisted/replayed hook results live on Turn.hookRuns
- we intentionally did not add hook-specific ThreadItem variants
Hooks messages are not persisted, they remain ephemeral. The context
changes they add are (they get appended to the user's prompt)
## Summary
This is a fast follow to the initial `[permissions]` structure.
- keep the new split-policy carveout behavior for narrower non-write
entries under broader writable roots
- preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge
that drops only redundant nested readable roots when projecting from
`SandboxPolicy`
- route the legacy macOS seatbelt adapter through that same legacy
bridge so redundant nested readable roots do not become read-only
carveouts on macOS
- derive the legacy bridge for `command_exec` using the sandbox root cwd
rather than the request cwd so policy derivation matches later sandbox
enforcement
- add regression coverage for the legacy macOS nested-readable-root case
## Examples
### Legacy `workspace-write` on macOS
A legacy `workspace-write` policy can redundantly list a nested readable
root under an already-writable workspace root.
For example, legacy config can effectively mean:
- workspace root (`.` / `cwd`) is writable
- `docs/` is also listed in `readable_roots`
The new shared split-policy helper intentionally treats a narrower
non-write entry under a broader writable root as a carveout for real
`[permissions]` configs. Without this fast follow, the unchanged macOS
seatbelt legacy adapter could project that legacy shape into a
`FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout
under the writable workspace root. In practice, legacy callers on macOS
could unexpectedly lose write access inside `docs/`, even though that
path was writable before the `[permissions]` migration work.
This change fixes that by routing the legacy seatbelt path through the
cwd-aware legacy bridge, so:
- legacy `workspace-write` keeps `docs/` writable when `docs/` was only
a redundant readable root
- explicit `[permissions]` entries like `'.' = 'write'` and `'docs' =
'read'` still make `docs/` read-only, which is the new intended
split-policy behavior
### Legacy `command_exec` with a subdirectory cwd
`command_exec` can run a command from a request cwd that is narrower
than the sandbox root cwd.
For example:
- sandbox root cwd is `/repo`
- request cwd is `/repo/subdir`
- legacy policy is still `workspace-write` rooted at `/repo`
Before this fast follow, `command_exec` derived the legacy bridge using
the request cwd, but the sandbox was later built using the sandbox root
cwd. That mismatch could miss redundant legacy readable roots during
projection and accidentally reintroduce read-only carveouts for paths
that should still be writable under the legacy model.
This change fixes that by deriving the legacy bridge with the same
sandbox root cwd that sandbox enforcement later uses.
## Verification
- `just fmt`
- `cargo test -p codex-core
seatbelt_legacy_workspace_write_nested_readable_root_stays_writable`
- `cargo test -p codex-core test_sandbox_config_parsing`
- `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D
warnings`
- `cargo clean`
## Summary
We need to support allowing request_permissions calls when using
`Reject` policy
<img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
40 PM"
src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
/>
Note that this is a backwards-incompatible change for Reject policy. I'm
not sure if we need to add a default based on our current use/setup
## Testing
- [x] Added tests
- [x] Tested locally
## Summary
request_permissions flows should support persisting results for the
session.
Open Question: Still deciding if we need within-turn approvals - this
adds complexity but I could see it being useful
## Testing
- [x] Updated unit tests
---------
Co-authored-by: Codex <noreply@openai.com>
Adds a built-in `request_permissions` tool and wires it through the
Codex core, protocol, and app-server layers so a running turn can ask
the client for additional permissions instead of relying on a static
session policy.
The new flow emits a `RequestPermissions` event from core, tracks the
pending request by call ID, forwards it through app-server v2 as an
`item/permissions/requestApproval` request, and resumes the tool call
once the client returns an approved subset of the requested permission
profile.