codex

feat(shell-tool-mcp): add patched zsh build pipeline (#11668 )

## Summary
- add `shell-tool-mcp/patches/zsh-exec-wrapper.patch` against upstream
zsh `77045ef899e53b9598bebc5a41db93a548a40ca6`
- add `zsh-linux` and `zsh-darwin` jobs to
`.github/workflows/shell-tool-mcp.yml`
- stage zsh binaries under `artifacts/vendor/<target>/zsh/<variant>/zsh`
- include zsh artifact jobs in `package.needs`
- mark staged zsh binaries executable during packaging

## Notes
- zsh source is cloned from `https://git.code.sf.net/p/zsh/code`
- workflow pins zsh commit `77045ef899e53b9598bebc5a41db93a548a40ca6`
- zsh build runs `./Util/preconfig` before `./configure`

## Validation
- parsed workflow YAML locally (`yaml-ok`)
- validated zsh patch applies cleanly with `git apply --check` on a
fresh zsh clone

Jeremy Rose · 2026-02-13 01:34:48 +00:00

9cf7a07281

Bump rmcp to 0.15 (#11539 )

https://github.com/modelcontextprotocol/rust-sdk/pull/598 in 0.14 broke
some MCP oauth (like Linear) and
https://github.com/modelcontextprotocol/rust-sdk/pull/641 fixed it in
0.15

Gabriel Peal · 2026-02-11 22:04:17 -08:00

bd3ce98190

Upgrade rmcp to 0.14 (#10718 )

- [x] Upgrade rmcp to 0.14

Matthew Zeng · 2026-02-08 15:07:53 -08:00

9f1009540b

fix: correct login shell mismatch in the accept_elicitation_for_prompt_rule() test (#8931 )

Because the path to `git` is used to construct `elicitations_to_accept`,
we need to ensure that we resolve which `git` to use the same way our
Bash process will:


https://github.com/openai/codex/blob/c9c65606852c0cda9d983b4917359a0826a4b7f0/codex-rs/exec-server/tests/suite/accept_elicitation.rs#L59-L69

This fixes an issue when running the test on macOS using Bazel
(https://github.com/openai/codex/pull/8875) where the login shell chose
`/opt/homebrew/bin/git` whereas the non-login shell chose
`/usr/bin/git`.

Michael Bolin · 2026-01-08 12:37:38 -08:00

a70f5b0b3c

feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496 )

This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.

As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.

See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.

Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.

My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496

Michael Bolin · 2025-12-23 19:29:32 -08:00

e61bae12e3

fix: change codex/sandbox-state/update from a notification to a request (#8142 )

Historically, `accept_elicitation_for_prompt_rule()` was flaky because
we were using a notification to update the sandbox followed by a `shell`
tool request that we expected to be subject to the new sandbox config,
but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate
each incoming message to a new Tokio task, messages are not guaranteed
to be processed in order, so sometimes the `shell` tool call would run
before the notification was processed.

Prior to this PR, we relied on a generous `sleep()` between the
notification and the request to reduce the change of the test flaking
out.

This PR implements a proper fix, which is to use a _request_ instead of
a notification for the sandbox update so that we can wait for the
response to the sandbox request before sending the request to the
`shell` tool call. Previously, `rmcp` did not support custom requests,
but I fixed that in
https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it
into the `0.12.0` release (see #8288).

This PR updates `shell-tool-mcp` to expect
`"codex/sandbox-state/update"` as a _request_ instead of a notification
and sends the appropriate ack. Note this behavior is tied to our custom
`codex/sandbox-state` capability, which Codex honors as an MCP client,
which is why `core/src/mcp_connection_manager.rs` had to be updated as
part of this PR, as well.

This PR also updates the docs at `shell-tool-mcp/README.md`.

Michael Bolin · 2025-12-18 15:32:01 -08:00

46baedd7cb

Fixes mcp elicitation test that fails for me when run locally (#8020 )

Eric Traut · 2025-12-15 16:23:04 -08:00

d9554c8191

exec-server: additional context for errors (#7935 )

Add a .context() on some exec-server errors for debugging CI flakes.

Also, "login": false in the test to make the test not affected by user
profile.

Jeremy Rose · 2025-12-15 11:40:40 -08:00

2c6995ca4d

fix: policy/*.codexpolicy -> rules/*.rules (#7888 )

We decided that `*.rules` is a more fitting (and concise) file extension
than `*.codexpolicy`, so we are changing the file extension for the
"execpolicy" effort. We are also changing the subfolder of `$CODEX_HOME`
from `policy` to `rules` to match.

This PR updates the in-repo docs and we will update the public docs once
the next CLI release goes out.

Locally, I created `~/.codex/rules/default.rules` with the following
contents:

```
prefix_rule(pattern=["gh", "pr", "view"])
```

And then I asked Codex to run:

```
gh pr view 7888 --json title,body,comments
```

and it was able to!

Michael Bolin · 2025-12-11 14:46:00 -08:00

e0d7ac51d3

fix: add a hopefully-temporary sleep to reduce test flakiness (#7848 )

Let's see if this `sleep()` call is good enough to fix the test
flakiness we currently see in CI. It will take me some time to upstream
a proper fix, and I would prefer not to disable this test in the
interim.

Michael Bolin · 2025-12-11 00:51:33 +00:00

038767af69

fix: ensure accept_elicitation_for_prompt_rule() test passes locally (#7832 )

When I originally introduced `accept_elicitation_for_prompt_rule()` in
https://github.com/openai/codex/pull/7617, it worked for me locally
because I had run `codex-rs/exec-server/tests/suite/bash` once myself,
which had the side-effect of installing the corresponding DotSlash
artifact.

In CI, I added explicit logic to do this as part of
`.github/workflows/rust-ci.yml`, which meant the test also passed in CI,
but this logic should have been done as part of the test so that it
would work locally for devs who had not installed the DotSlash artifact
for `codex-rs/exec-server/tests/suite/bash` before. This PR updates the
test to do this (and deletes the setup logic from `rust-ci.yml`),
creating a new `DOTSLASH_CACHE` in a temp directory so that this is
handled independently for each test.

While here, also added a check to ensure that the `codex` binary has
been built prior to running the test, as we have to ensure it is
symlinked as `codex-linux-sandbox` on Linux in order for the integration
test to work on that platform.

Michael Bolin · 2025-12-10 15:17:13 -08:00

87f5b69b24

fix: allow sendmsg(2) and recvmsg(2) syscalls in our Linux sandbox (#7779 )

This changes our default Landlock policy to allow `sendmsg(2)` and
`recvmsg(2)` syscalls. We believe these were originally denied out of an
abundance of caution, but given that `send(2)` nor `recv(2)` are allowed
today [which provide comparable capability to the `*msg` equivalents],
we do not believe allowing them grants any privileges beyond what we
already allow.

Rather than using the syscall as the security boundary, preventing
access to the potentially hazardous file descriptor in the first place
seems like the right layer of defense.

In particular, this makes it possible for `shell-tool-mcp` to run on
Linux when using a read-only sandbox for the Bash process, as
demonstrated by `accept_elicitation_for_prompt_rule()` now succeeding in
CI.

Michael Bolin · 2025-12-09 09:24:01 -08:00

a7e3e37da8

fix: add integration tests for codex-exec-mcp-server with execpolicy (#7617 )

This PR introduces integration tests that run
[codex-shell-tool-mcp](https://www.npmjs.com/package/@openai/codex-shell-tool-mcp)
as a user would. Note that this requires running our fork of Bash, so we
introduce a [DotSlash](https://dotslash-cli.com/) file for `bash` so
that we can run the integration tests on multiple platforms without
having to check the binaries into the repository. (As noted in the
DotSlash file, it is slightly more heavyweight than necessary, which may
be worth addressing as disk space in CI is limited:
https://github.com/openai/codex/pull/7678.)

To start, this PR adds two tests:

- `list_tools()` makes the `list_tools` request to the MCP server and
verifies we get the expected response
- `accept_elicitation_for_prompt_rule()` defines a `prefix_rule()` with
`decision="prompt"` and verifies the elicitation flow works as expected

Though the `accept_elicitation_for_prompt_rule()` test **only works on
Linux**, as this PR reveals that there are currently issues when running
the Bash fork in a read-only sandbox on Linux. This will have to be
fixed in a follow-up PR.

Incidentally, getting this test run to correctly on macOS also requires
a recent fix we made to `brew` that hasn't hit a mainline release yet,
so getting CI green in this PR required
https://github.com/openai/codex/pull/7680.

Michael Bolin · 2025-12-07 06:39:38 +00:00

3c3d3d1adc

13 Commits