mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
e476fc16ce810495cbfabe4f815d20947da0ba79
16 Commits
-
Prepare managed network sandbox context (#29456)
## Why Managed network configures commands to use local HTTP and SOCKS proxies. For commands delegated to the exec server, the proxy environment and the sandbox policy were prepared separately. On macOS, that meant a command could receive `HTTPS_PROXY=http://127.0.0.1:43123` while Seatbelt still denied access to port `43123`. ## What changed `NetworkProxy` now prepares the command environment and sandbox context together from the same runtime snapshot: ```text Prepared managed network ├── command environment: HTTPS_PROXY=http://127.0.0.1:43123 └── sandbox context: allow outbound to 127.0.0.1:43123 ``` That context travels with remote exec requests. The exec server preserves the managed proxy and CA environment, and macOS Seatbelt allows only the prepared loopback proxy ports without enabling broad network access or local binding. The protocol field is optional and the existing enforcement flag remains in place, preserving compatibility with callers that do not send the new context.
jif ·
2026-06-23 20:07:09 +01:00 -
path-uri: clarify host-native path conversion (#29501)
## Why Downstream refactors are producing confusing code with this functionality having a very generic name. Encoding the specific conversion approach in the method name makes it clearer. ## What Rename `PathUri::from_path` to `PathUri::from_host_native_path` and update its Rust call sites.
Adam Perry @ OpenAI ·
2026-06-23 00:02:33 +00:00 -
Carry sandbox intent to remote exec servers (#29108)
## What changed PR #29099 stopped sending the orchestrator's concrete sandbox wrapper to a remote exec-server. Remote commands now arrive as plain native argv. This PR adds the next piece: Codex also sends portable sandbox intent next to that plain argv. For a remote unified-exec command, the request can now include: - the canonical permission profile before local workspace-root materialization - the sandbox cwd and workspace roots as `PathUri` values - Windows sandbox settings - the legacy Landlock setting - whether managed networking must be enforced The important part is that symbolic entries such as `:workspace_roots` stay symbolic while crossing the boundary. The executor can then bind them to its own workspace-root paths instead of receiving orchestrator-local absolute paths. The data travels through `ExecRequest` into `ExecParams`. Older exec-servers can still deserialize requests because the new fields have defaults. ## Why The orchestrator should not decide how another machine implements sandboxing. For example: - a local macOS Codex would normally build a Seatbelt command - a remote Linux executor needs a Linux sandbox command instead The orchestrator now sends the plain command plus the policy it intended to enforce. A later PR can let the exec-server choose and build the correct sandbox for its own operating system. ## Important detail This keeps the portable intent separate from the local `SandboxType`. `SandboxType::None` is ambiguous: - it can mean the command was explicitly approved to run without a sandbox - it can also mean the orchestrator host has no concrete sandbox implementation available Those cases are different for remote execution. This PR adds `sandbox_requested` so an executor can still receive sandbox intent when the orchestrator cannot build a local wrapper. Explicit unsandboxed retries still send no sandbox context. ## Behavior today This PR only transports the intent. The exec-server accepts the new fields but does not apply them yet. Remote commands therefore remain unsandboxed after this PR, just as they are after PR #29099. ## Follow-up The next PR will make exec-server read this portable intent, bind symbolic workspace permissions to executor-native roots, choose the sandbox for its own operating system, build the wrapper locally, and then spawn the command.
jif ·
2026-06-21 12:33:21 +02:00 -
Resume exec-server sessions after disconnect (#28512)
Supersedes #28288 (closed). ## Why A short WebSocket interruption currently ends every client-side process handle, even though exec-server keeps the server session and its processes alive for a short time. This is especially visible for executor-backed stdio MCP servers: a temporary connection loss becomes a permanent `Transport closed` error. The server already has the information needed to resume the session, but the client opens a fresh session instead of using it. This change reconnects below the process and MCP layers. Existing process handles stay valid, missed output is recovered, and the same server-side processes continue running. ## State machine One logical `ExecServerClient` stays alive while its underlying RPC connection changes generations. ```text transport closes +------------------------------------------------+ | v +-------------+ +-------------+ | Connected | | Recovering | +-------------+ +-------------+ ^ | | session resumed, processes caught up | retryable error +------------------------------------------------+ loops until deadline | | deadline or permanent error v +-------------+ | Failed | +-------------+ ``` ### `Connected` - New RPC calls use the current connection. - Process notifications are published in sequence order. - A disconnect only starts recovery if it came from the current connection generation. Late events from older generations cannot replace the active connection. ### `Recovering` - New calls wait instead of choosing a half-connected RPC client. - Existing process handles, wake subscriptions, and event subscriptions stay open. - Streaming HTTP response bodies fail immediately because their byte streams cannot be resumed safely. - Recovery first waits for process starts that were already in flight. A start whose result became ambiguous is cleaned up after reconnection instead of being silently adopted. - The client reconnects with the learned `session_id`. The server may briefly report that the old connection is still attached, so that error is retried until the detach finishes. - The notification consumer starts before the resume handshake completes. This prevents a busy process from filling the notification queue and blocking the initialize response. - Before installing the new connection, the client catches up every recoverable process with `process/read`. ### `Failed` - Recovery stops after 25 seconds or after a permanent error. - Waiting calls are released with one stable disconnect error. - Existing process sessions receive a terminal failure instead of waiting forever. ## Recovering process events Output, exit, and close events share one sequence. During normal operation, the client buffers early events until every lower sequence has been published. After reconnection, the client reads each process starting after its last published sequence: 1. Retained output chunks are inserted by sequence number. 2. Exit and close state are reconstructed in their sequence positions. 3. Events already received as live notifications are ignored as duplicates. 4. Newly contiguous events are published in order. 5. If the server no longer retains enough output to fill a sequence gap, only that process is terminated and failed. The recovered connection remains usable for other processes. The server reports its full next event sequence for unbounded reads, including exit and close events. Closed processes remain readable for the same 30-second window used to retain detached sessions. ## Other details - Detached server sessions are retained for 30 seconds, leaving margin around the client's 25-second recovery deadline. - Session attach and detach update the active notification sender under the same attachment lock, so an old connection cannot clear a newly attached sender. - A dedicated error code distinguishes the temporary "session is still attached" race from permanent initialization errors. - Process starts are identity-checked on both client and server. Cleanup from an older start cannot remove a newer process that reused the same ID. - Mutating requests that were already in flight when the transport closed are not replayed, because the client cannot know whether the server applied them. Requests started after recovery is known wait for the replacement connection. - We assume the server/client version stays in sync (on the before/after this PR) ## User impact Long-running commands and stdio MCP servers can survive a temporary exec-server WebSocket interruption without changing process IDs or losing output produced during the outage.
jif ·
2026-06-17 10:20:39 +02:00 -
[codex] Carry exec-server cwd as PathUri (#28032)
## Why This is the second-to-last place in the exec-server protocol that needs to migrate to URIs to support cross-OS operation. ## What - Change `ExecParams.cwd` to `PathUri`. - Keep the cwd URI-shaped through core and rmcp producers, converting it to `AbsolutePathBuf` only in `LocalProcess::start_process`. - Reject non-native cwd URIs before launch and update the affected protocol documentation and call sites.
Adam Perry @ OpenAI ·
2026-06-13 20:56:42 +00:00 -
fix(exec-server): retain output until streams close (#18946)
## Why A Mac Bazel run hit a flake in `server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes` where the read path observed process exit but lost the expected buffered stdout (`first\nsecond\n`). See the [GitHub Actions job](https://github.com/openai/codex/actions/runs/24758468552/job/72436716505) and [BuildBuddy invocation](https://app.buildbuddy.io/invocation/37475a12-4ef2-45fb-ab8a-e49a2aba1d59). The underlying race is that process exit is not the same thing as stdout/stderr closure. If a child or grandchild inherits the pipe write end, or a process duplicates it with `dup2`, the watched process can exit while the stream is still open and more output can still arrive. The exec-server was starting exited-process retention cleanup from the exit event, so the process entry could be removed before the output streams had actually closed. While stress-testing the exec-server unit suite, `server::handler::tests::long_poll_read_fails_after_session_resume` exposed a separate test race: it started a short-lived command that could exit and wake the pending long-poll read before the session-resume assertion observed the resumed-session error. That test is intended to cover resume eviction, not process-exit delivery, so this change keeps the process alive and quiet while the second connection resumes the session. ## What changed - Keep exec-server process entries retained until stdout/stderr streams close, then start the post-exit retention timer from the closed event. - Wake long-poll readers when the closed event is emitted. - Add focused `local_process` unit coverage that proves late output is still retained after the short test retention interval has elapsed, and that closed process entries are eventually evicted. - Add a local and remote regression test where a parent exits while a child keeps inherited stdout open. The child waits on an explicit release file, so the test deterministically observes exit first, releases the child, then requires a nonzero-wait read from the exit sequence to receive the late output. - In `codex-rs/exec-server/src/server/handler/tests.rs`, make `long_poll_read_fails_after_session_resume` run a long-lived silent command instead of a short command that prints and exits. This isolates the test to session-resume behavior and prevents a normal process exit from satisfying the pending long-poll read first. ## Testing - `cargo test -p codex-exec-server exec_process_retains_output_after_exit_until_streams_close` - `cargo test -p codex-exec-server local_process::tests` - `cargo test -p codex-exec-server` - `just fix -p codex-exec-server` - `bazel test //codex-rs/exec-server:exec-server-unit-tests //codex-rs/exec-server:exec-server-exec_process-test //codex-rs/exec-server:exec-server-file_system-test //codex-rs/exec-server:exec-server-http_client-test //codex-rs/exec-server:exec-server-initialize-test //codex-rs/exec-server:exec-server-process-test //codex-rs/exec-server:exec-server-websocket-test` - `bazel test --runs_per_test=25 //codex-rs/exec-server:exec-server-unit-tests` ## Documentation No docs update needed; this is an internal exec-server correctness fix.
Michael Bolin ·
2026-04-23 19:49:58 +00:00 -
exec-server: expose arg0 alias root to fs sandbox (#19016)
## Why The post-merge `rust-ci-full` run for #18999 still failed the Ubuntu remote `suite::remote_env` sandboxed filesystem tests. That run checked out merge commit `ddde50c611e4800cb805f243ed3c50bbafe7d011`, so the arg0 guard lifetime fix was present. The Docker-backed failure had two remaining pieces: - The sandboxed filesystem helper needs to execute Codex through the `codex-linux-sandbox` arg0 alias path. The helper sandbox was only granting read access to the real Codex executable parent, so the alias parent also has to be visible inside the helper sandbox. - The remote-env tests were building sandbox contexts with `FileSystemSandboxContext::new()`, which captures the local test runner cwd. In the Docker remote exec-server, that host checkout path does not exist, so spawning the filesystem helper failed with `No such file or directory` before the helper could process the request. ## What Changed - Track all helper runtime read roots instead of a single root. - Add both the real Codex executable parent and the `codex-linux-sandbox` alias parent to sandbox readable roots. - Avoid sending an unused local cwd in remote filesystem sandbox contexts when the permission profile has no cwd-dependent entries. - Build the Docker remote-env test sandbox contexts with a cwd path that exists inside the container. - Add unit coverage for the alias-parent root and remote sandbox cwd handling. ## Verification - `cargo test -p codex-exec-server` - `cargo test -p codex-core remote_test_env_sandboxed_read_allows_readable_root` - `just fix -p codex-exec-server` - `just fix -p codex-core`
Michael Bolin ·
2026-04-22 21:34:22 +00:00 -
fix: fully revert agent identity runtime wiring (#18757)
## Summary This PR fully reverts the previously merged Agent Identity runtime integration from the old stack: https://github.com/openai/codex/pull/17387/changes It removes the Codex-side task lifecycle wiring, rollout/session persistence, feature flag plumbing, lazy `auth.json` mutation, background task auth paths, and request callsite changes introduced by that stack. This leaves the repo in a clean pre-AgentIdentity integration state so the follow-up PRs can reintroduce the pieces in smaller reviewable layers. ## Stack 1. This PR: full revert 2. https://github.com/openai/codex/pull/18871: move Agent Identity business logic into a crate 3. https://github.com/openai/codex/pull/18785: add explicit AgentIdentity auth mode and startup task allocation 4. https://github.com/openai/codex/pull/18811: migrate auth callsites through AuthProvider ## Testing Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
efrazer-oai ·
2026-04-21 14:30:55 -07:00 -
[2/8] Support piped stdin in exec process API (#18086)
## Summary - Add an explicit stdin mode to process/start. - Keep normal non-interactive exec stdin closed while allowing pipe-backed processes. ## Stack ```text o #18027 [8/8] Fail exec client operations after disconnect │ o #18025 [7/8] Cover MCP stdio tests with executor placement │ o #18089 [6/8] Wire remote MCP stdio through executor │ o #18088 [5/8] Add executor process transport for MCP stdio │ o #18087 [4/8] Abstract MCP stdio server launching │ o #18020 [3/8] Add pushed exec process events │ @ #18086 [2/8] Support piped stdin in exec process API │ o #18085 [1/8] Add MCP server environment config │ o main ``` Co-authored-by: Codex <noreply@openai.com>
Ahmed Ibrahim ·
2026-04-16 10:30:10 -07:00 -
Build remote exec env from exec-server policy (#17216)
## Summary - add an exec-server `envPolicy` field; when present, the server starts from its own process env and applies the shell environment policy there - keep `env` as the exact environment for local/embedded starts, but make it an overlay for remote unified-exec starts - move the shell-environment-policy builder into `codex-config` so Core and exec-server share the inherit/filter/set/include behavior - overlay only runtime/sandbox/network deltas from Core onto the exec-server-derived env ## Why Remote unified exec was materializing the shell env inside Core and forwarding the whole map to exec-server, so remote processes could inherit the orchestrator machine's `HOME`, `PATH`, etc. This keeps the base env on the executor while preserving Core-owned runtime additions like `CODEX_THREAD_ID`, unified-exec defaults, network proxy env, and sandbox marker env. ## Validation - `just fmt` - `git diff --check` - `cargo test -p codex-exec-server --lib` - `cargo test -p codex-core --lib unified_exec::process_manager::tests` - `cargo test -p codex-core --lib exec_env::tests` - `cargo test -p codex-core --lib exec_env_tests` (compile-only; filter matched 0 tests) - `cargo test -p codex-config --lib shell_environment` (compile-only; filter matched 0 tests) - `just bazel-lock-update` ## Known local validation issue - `just bazel-lock-check` is not runnable in this checkout: it invokes `./scripts/check-module-bazel-lock.sh`, which is missing. --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: pakrym-oai <pakrym@openai.com>
jif-oai ·
2026-04-13 09:59:08 +01:00 -
Run exec-server fs operations through sandbox helper (#17294)
## Summary - run exec-server filesystem RPCs requiring sandboxing through a `codex-fs` arg0 helper over stdin/stdout - keep direct local filesystem execution for `DangerFullAccess` and external sandbox policies - remove the standalone exec-server binary path in favor of top-level arg0 dispatch/runtime paths - add sandbox escape regression coverage for local and remote filesystem paths ## Validation - `just fmt` - `git diff --check` - remote devbox: `cd codex-rs && bazel test --bes_backend= --bes_results_url= //codex-rs/exec-server:all` (6/6 passed) --------- Co-authored-by: Codex <noreply@openai.com>
starr-openai ·
2026-04-12 18:36:03 -07:00 -
Adrian ·
2026-04-11 09:52:06 -07:00 -
Fix Windows exec-server output test flake (#17409)
Problem: The Windows exec-server test command could let separator whitespace become part of `echo` output, making the exact retained-output assertion flaky. Solution: Tighten the Windows `cmd.exe` command by placing command separators directly after the echoed tokens so stdout remains deterministic while preserving the exact assertion.
Eric Traut ·
2026-04-10 19:24:40 -07:00 -
feat: move exec-server ownership (#16344)
This introduces session-scoped ownership for exec-server so ws disconnects no longer immediately kill running remote exec processes, and it prepares the protocol for reconnect-based resume. - add session_id / resume_session_id to the exec-server initialize handshake - move process ownership under a shared session registry - detach sessions on websocket disconnect and expire them after a TTL instead of killing processes immediately (we will resume based on this) - allow a new connection to resume an existing session and take over notifications/ownership - I use UUID to make them not predictable as we don't have auth for now - make detached-session expiry authoritative at resume time so teardown wins at the TTL boundary - reject long-poll process/read calls that get resumed out from under an older attachment --------- Co-authored-by: Codex <noreply@openai.com>
jif-oai ·
2026-04-10 14:11:47 +01:00 -
feat: use
ProcessIdinexec-server(#15866)Use a full struct for the ProcessId to increase readability and make it easier in the future to make it evolve if needed
jif-oai ·
2026-03-26 16:45:36 +01:00 -
Add exec-server exec RPC implementation (#15090)
Stacked PR 2/3, based on the stub PR. Adds the exec RPC implementation and process/event flow in exec-server only. --------- Co-authored-by: Codex <noreply@openai.com>
starr-openai ·
2026-03-19 19:00:36 +00:00