codex

Let Codex consult user-level code-review-* skills. (#30143)

## Why

I use the `$code-review` skill a lot and it'd be nice to add my own
additional review criteria in `$CODEX_HOME/skills/code-review-*`.

## What

Removes phrasing about "code-review-* skills in this repository". In
practice, that seems to be enough to get Codex to consult my user-level
code review skills in addition to the repo-level ones.

Adam Perry @ OpenAI · 2026-06-26 12:36:40 -07:00

ac85409b7b

docs: document remote executor integration testing (#29790)

## Why

Agents need a clear default for writing remote-compatible integration
tests and reproducible commands for each supported runner.

## What

Expand the `remote-tests` skill with fixture guidance, skip selection,
and Docker and Wine commands. Add always-visible `AGENTS.md` guidance
that points new core and app-server tests toward automatic environment
fixtures.

Stacked on #29789.

Adam Perry @ OpenAI · 2026-06-24 05:55:36 +00:00

31e428a1ef

test: branch on target OS instead of runner flavor (#29712)

## Why

Core tests should branch on the executor's operating system, not on
runner details such as Docker or Wine. This keeps platform behavior
stable as new test backends are added and reserves Wine-specific skips
for actual runner debt.

## What

- Add `TestTargetOs` and target/host-aware skip helpers while keeping
`TestEnvironment` internal.
- Replace topology enum access with remote predicates and a narrow
Docker accessor.
- Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
document the skip taxonomy.

## Validation

- `just test -p core_test_support`
- `just test -p codex-core
remote_test_env_can_connect_and_use_filesystem`
- `bazel test //codex-rs/core:core-all-wine-exec-test
--test_output=errors` reached test execution; unrelated existing
view-image, path, and timing failures remain.
- `just test -p codex-core` and `just test` reached broad test
execution; this checkout has unrelated helper, sandbox, and timing
failures.

Adam Perry @ OpenAI · 2026-06-23 14:27:13 -07:00

9a79536e6b

core: log AGENTS.md paths as URIs (#28989)

## Why

There's no need for path contortions when the paths only end up in our
own logs.

## What

Follow up on a previous PR's nit and update the path-types skill for
future reference.

Adam Perry @ OpenAI · 2026-06-18 16:16:19 -07:00

195c936fa2

Record more path migration guidance for codex. (#28851)

Some common themes pulled out of both human and automated reviews from
the last couple of days' migrations to `PathUri` and
`LegacyAppPathString`.

Adam Perry @ OpenAI · 2026-06-18 04:59:42 +00:00

285eff6c3e

Tell codex to avoid changing rollout format. (#28632)

Just adds a requirement to the path-types skill to nudge Codex away from
touching rollout types while migrating paths.

Adam Perry @ OpenAI · 2026-06-17 10:51:41 -07:00

b947695a98

Revert "Tell codex about PathUri serde compat. (#28595)" (#28627)

This reverts commit bd2a786326, which
didn't capture all the nuance we need for this migration.

Adam Perry @ OpenAI · 2026-06-16 17:18:20 -07:00

bfe90188ad

Tell codex about PathUri serde compat. (#28595)

This addresses another wrinkle I keep having to re-prompt codex about
when migrating to cross-OS paths.

Adam Perry @ OpenAI · 2026-06-16 15:01:22 -07:00

bd2a786326

Record invariants for path migration. (#28589)

## Why

Help Codex understand how to execute the migration to support cross-OS
paths.

## What

Expand the path-types skill with our goals and constraints.

Adam Perry @ OpenAI · 2026-06-16 21:05:32 +00:00

33d50234a8

Clarify model-generated and legacy app path types (#28577)

## Why

`ApiPathString` kind of implies that it can be used anywhere we pull a
path out of JSON, but it's not really appropriate for tool arguments
when the model might generate relative paths.

Prefer `String` for model-generated paths. We can handle the conversion
per feature for now and define a shared abstraction later if it makes
sense.

## What

Rename `ApiPathString` to `AppLegacyPathString` to clarify its role.

Expand the `path-types` skill to tell the model to leave tool args as
bare strings.

Adam Perry @ OpenAI · 2026-06-16 20:47:43 +00:00

322b83de5e

Run core integration tests against a Wine-backed Windows executor (#28401)

## Why

We want to exercise a Linux app-server against a Windows exec-server
without having to repeat every test case. This approach has some
precedent in the remote Docker test setup.

## What

Run the shared `codex-core` integration suite against Windows
exec-server behavior from Linux. This makes cross-OS path and shell
regressions visible while keeping unsupported cases owned by individual
tests.

- Add `local`, `docker`, and `wine-exec` test environment selection with
legacy Docker compatibility.
- Extend `codex_rust_crate` to generate a sharded Wine-exec variant
using a cross-built Windows server and pinned Bazel Wine/PowerShell
runtimes.
- Teach remote-aware helpers about Windows paths and track temporary
incompatibilities with source-local `skip_if_wine_exec!` calls and
follow-up reasons.

Adam Perry @ OpenAI · 2026-06-16 00:38:41 +00:00

1fe89de576

[codex] add path-types skill (#28347)

## Why

Codex contributors and agents need repository-scoped guidance for
choosing compatible Rust types for operating system paths during the
ongoing URI migration. Keeping the guidance in the repository makes the
app-server and exec-server rules available consistently without relying
on a personal skill installation.

## What

- Add the `path-types` skill at `.codex/skills/path-types/SKILL.md`.
- Document the intended uses of `ApiPathString`, `PathUri`,
`AbsolutePathBuf`, and `PathBuf` across protocol, internal, and shared
dependency boundaries.
- Keep migrations of existing types limited to explicit requests and
proportional edits.

## Validation

- Validated the skill structure with skill-creator's
`quick_validate.py`.

Adam Perry @ OpenAI · 2026-06-15 11:48:31 -07:00

71633f8b8f

[codex] Ignore pending PR review comments (#27080)

## Why

The PR babysitter could surface inline comments from a GitHub review
that was still in the `PENDING` state. That allowed Codex to start
acting on feedback before the reviewer submitted it.

## What changed

- Correlate inline comments with their parent review and ignore pending
reviews and their comments.
- Remove pending review IDs from saved watcher state so the feedback
surfaces normally after publication.
- Update the skill instructions and add regression coverage for the
draft-to-published transition.
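A minimal Python sketch of the correlation step, assuming review and comment payloads shaped like GitHub REST API objects (reviews carry `id` and `state`; inline review comments carry `pull_request_review_id`). The function names are hypothetical and are not taken from `gh_pr_watch.py`; the "seen IDs" state model is also an assumption.

```python
def pending_review_ids(reviews):
    """IDs of reviews the reviewer has not submitted yet."""
    return {r["id"] for r in reviews if r.get("state") == "PENDING"}


def visible_review_comments(reviews, comments):
    """Keep only inline comments whose parent review is not PENDING."""
    pending = pending_review_ids(reviews)
    return [c for c in comments if c.get("pull_request_review_id") not in pending]


def forget_pending_ids(seen_review_ids, reviews):
    """Drop pending review IDs from saved watcher state (hypothetical model).

    If a draft review's ID stayed in the "seen" set, its comments could be
    skipped after publication; removing it lets them surface normally.
    """
    return set(seen_review_ids) - pending_review_ids(reviews)
```

Filtering by parent review rather than by comment keeps the rule in one place: once the review's state changes from `PENDING`, all of its comments become visible together.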

## Validation

- `python3 -m pytest
.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py`
- Skill package validation with `quick_validate.py`
- Live verification on #26835: the draft comment stayed hidden and
surfaced after the review was submitted.

Adam Perry @ OpenAI · 2026-06-09 08:29:32 -07:00

a770e5b847

[codex] Clarify PR babysitter state mutations (#27038)

## Why

Codex is doing a bit too much on the PRs it's babysitting for me. In
particular, I'd like it to stay out of comment threads that involve
other humans; I should be the one doing human interaction. This is
tricky because it's still very useful to be able to leave review
comments myself and have Codex iterate on them.

## What

This updates `.codex/skills/babysit-pr/SKILL.md` with an explicit GitHub
state mutation policy.

Adam Perry @ OpenAI · 2026-06-08 13:09:06 -07:00

f3a8074975

Add skill for pushing CI configuration changes (#26473)

## Why

Codex agents that modify GitHub Actions configuration need clear
guidance when repository push protections require temporary approval.
Without it, an agent may pursue an unavailable exemption or stop before
checking whether the user already has access.

## What

Add a `pushing-ci-changes` skill that explains the restriction, directs
agents to attempt the push first, and tells them how to involve the user
when approval is required.

## Validation

Not run; this change only adds skill documentation.

Adam Perry @ OpenAI · 2026-06-04 15:40:16 -07:00

e695ec8ec6

codex-pr-body: avoid confidential references (#26260)

## Why

PR descriptions can be visible outside the context used to generate
them. In #23710, a generated description referenced an internal
document, showing that the skill needs an explicit guardrail against
exposing confidential context.

## What changed

- Updated the `codex-pr-body` guidance to prohibit confidential
references, including codenames and OpenAI-internal URLs.

Adam Perry @ OpenAI · 2026-06-03 15:29:57 -07:00

14272b21e9

[codex] Copy user Bazel settings into Codex worktrees (#25925)

## Why

Codex-created linked worktrees do not include ignored files from the
main worktree. Bazel users who keep local overrides in `user.bazelrc`
therefore lose those settings in every new worktree.

The setup must also work on Windows and must not overwrite a file that
already exists in the worktree.

## What changed

The checked-in Codex environment now invokes
`.codex/environments/setup.py`. The script resolves the main worktree
and current worktree, then uses
`copy_from_main_worktree_to_worktree(repo_relative_path)` to copy
ignored files into new worktrees without overwriting existing
destinations.

`main()` currently copies `user.bazelrc`. Additional repository-relative
paths can be added as further calls to the same helper.
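A rough Python sketch of the copy-without-overwrite behavior described above. It is not the checked-in `setup.py`: unlike `copy_from_main_worktree_to_worktree(repo_relative_path)`, the hypothetical `copy_if_missing` takes both roots explicitly, and resolving the main worktree as the parent of `git rev-parse --git-common-dir` assumes a non-bare repository with a standard `.git` layout (`--path-format=absolute` needs Git 2.31 or newer).

```python
import shutil
import subprocess
from pathlib import Path


def main_worktree_root(cwd: Path) -> Path:
    """Resolve the main worktree from inside any linked worktree.

    Assumes the shared git directory is `<main worktree>/.git`.
    """
    out = subprocess.run(
        ["git", "rev-parse", "--path-format=absolute", "--git-common-dir"],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout.strip()
    return Path(out).parent


def copy_if_missing(main_root: Path, worktree_root: Path, rel_path: str) -> bool:
    """Copy main_root/rel_path to worktree_root/rel_path; never overwrite.

    Returns True only when a file was actually copied.
    """
    src = main_root / rel_path
    dst = worktree_root / rel_path
    if not src.is_file() or dst.exists():
        # A missing source (no user.bazelrc) is fine, and an existing
        # destination is always preserved.
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)  # nested paths work too
    shutil.copy2(src, dst)
    return True
```

Using `pathlib` and `shutil` rather than shelling out to `cp` keeps the helper portable to Windows.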

## Validation

- Ran the setup script in a linked worktree and confirmed it handles a
missing main-worktree `user.bazelrc`.
- Verified the helper copies a main-worktree file, preserves an existing
worktree file, and creates parent directories for a nested path.

Adam Perry @ OpenAI · 2026-06-03 18:29:36 +00:00

2d5c264ebc

Uprev Rust toolchain pins to 1.95.0 (#24684)

## Summary
- Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
environment config.
- Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
match the new version.
- Leave purpose-specific toolchains unchanged, including the
`argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
build pin.
- Includes fixes for new lints from `just fix` and a few Codex-authored
fixes for lints that have no automatic suggestion.

Adam Perry @ OpenAI · 2026-05-26 20:59:47 -07:00

cca1e0ba1d

CI: Customize v8 building (#22086)

## Summary

Move `rusty_v8` artifact production into a hermetic Bazel path and bump
the `v8` crate to `147.4.0`.

The new flow builds V8 release artifacts from source for Darwin and
Linux targets, publishes both the current release-compatible artifacts
and sandbox-enabled variants, and keeps Cargo consumers on prebuilt
binaries by continuing to feed the `v8` crate the archive and generated
binding files it already expects.

## Why

We need control over V8 build-time features without giving up prebuilt
artifacts for downstream Cargo builds.

Upstream `rusty_v8` already supports source-only features such as
`v8_enable_sandbox`, but its normal prebuilt release assets do not cover
every feature combination we need. Building the artifacts ourselves lets
us enable settings such as the V8 sandbox and pointer compression at
artifact build time, then publish those outputs so ordinary Cargo builds
can still consume prebuilts instead of compiling V8 locally.

This keeps the fast consumer experience of prebuilt `rusty_v8` archives
while giving us a reproducible path to ship featureful variants that
upstream does not currently publish for us.

## Implementation Notes

The Bazel graph in this PR is not copied wholesale from `rusty_v8`;
`rusty_v8`'s normal source build is still GN/Ninja-based.

Instead, this change starts from upstream V8's Bazel rules and adapts
them to Codex's hermetic toolchains and dependency layout. Where we
intentionally follow `rusty_v8`, we mirror its existing artifact
contract:

- the same `v8` crate version and generated binding expectations
- the same sandbox feature relationship, where sandboxing requires
pointer compression
- the same custom libc++ model expected by Cargo's default
`use_custom_libcxx` feature
- the same release-style archive plus `src_binding` outputs consumed by
the `v8` crate

To preserve that contract, the Bazel release path pins the libc++,
libc++abi, and llvm-libc revisions used by `rusty_v8 v147.4.0`, builds
release artifacts with `--config=rusty-v8-upstream-libcxx`, and folds
the matching runtime objects into the final static archive.

## Windows

Windows is annoyingly handled differently.

Codex's current hermetic Bazel Windows C++ platform is `windows-gnullvm`
/ `x86_64-w64-windows-gnu`, while upstream `rusty_v8` publishes Windows
prebuilts for `*-pc-windows-msvc`. Those are different ABIs, so the
Bazel graph cannot truthfully reproduce the upstream MSVC artifacts
until we add a real MSVC-targeting C++ toolchain.

For now:

- Windows MSVC consumers continue to use upstream `rusty_v8` release
archives.
- Windows GNU targets are built in-tree so they link against a matching
GNU ABI.
- The canary workflow separately exercises upstream `rusty_v8` source
builds for MSVC sandbox artifacts, but MSVC is not yet part of the
Bazel-produced release matrix.

## Validation

This PR is technically self-validating through CI. I have already
published it as a release tag, so the artifacts from this branch are
published to
https://github.com/openai/codex/releases/tag/rusty-v8-v147.4.0. CI for
this PR should therefore consume our own release targets. I have also
tested locally on Linux and Darwin.

---------

Co-authored-by: Codex <noreply@openai.com>

Channing Conger · 2026-05-18 21:33:05 -07:00

7cdeab33d1

Deduplicate issue digest interactions by user (#22039)

## Summary

The issue digest uses recent posts, comments, and reactions to decide
which issues deserve attention. A single active user could previously
raise an issue's apparent importance by commenting or reacting multiple
times in the window.

This changes `codex-issue-digest` so `user_interactions` counts unique
human GitHub users per issue across new issue posts, new comments, and
new reactions. Raw reaction/comment counts are still preserved for
detail output, and the skill guidance now describes `Interactions` as a
unique-human-user count.
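A minimal Python sketch of counting unique human users per issue, assuming each event is an `(issue_number, user)` pair where `user` looks like a GitHub API user object with `login` and `type`. Treating `type == "Bot"` as the definition of "not human" is an assumption, and `summarize_interactions` is a hypothetical name, not the collector's actual function.

```python
from collections import Counter, defaultdict


def summarize_interactions(posts, comments, reactions):
    """Return (unique human users per issue, raw event counts per issue)."""
    unique_users = defaultdict(set)
    raw_counts = Counter()
    for events in (posts, comments, reactions):
        for issue, user in events:
            raw_counts[issue] += 1  # raw totals kept for detail output
            if user.get("type") == "Bot":
                continue  # bots never count as interactions
            unique_users[issue].add(user["login"])  # repeats collapse here
    interactions = {issue: len(logins) for issue, logins in unique_users.items()}
    return interactions, dict(raw_counts)
```

One user who posts, comments three times, and reacts still contributes exactly one interaction.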

Eric Traut · 2026-05-10 09:55:42 -07:00

76845d716b

[codex] Add Codex environment config (#20630)

## Why

This adds a checked-in Codex environment configuration so the repo
exposes a ready-to-run Codex action from the app environment metadata.

## What changed

- Added `.codex/environments/environment.toml` with a generated `Run`
action.
- The action runs the `codex` binary from `codex-rs/Cargo.toml` with
`mcp_oauth_credentials_store=file`.

## Verification

- Not run; configuration-only change.

pakrym-oai · 2026-05-01 10:01:45 -07:00

9b8d585075

[codex] Improve PR babysitter CI diagnostics and guardrails (#20484)

## Summary

- Surface failed GitHub Actions jobs in the PR babysitter watcher so
Codex can fetch job logs as soon as a job fails, instead of waiting for
the overall workflow run to complete.
- Update babysit-pr skill instructions, GitHub API notes, and heuristics
to prefer direct job log archives before falling back to `gh run view
--log-failed`.
- Add guardrails requiring explicit user confirmation before posting
replies to human-authored review comments.
- Add guardrails preventing Codex from patching unrelated flaky tests,
CI infrastructure, runner issues, dependency outages, or other failures
not caused by the PR branch.
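A small Python sketch of surfacing failed jobs before the whole workflow run finishes, assuming job dicts shaped like GitHub Actions job objects (`id`, `status`, `conclusion`). The function names are hypothetical. The URL is the REST endpoint for downloading a single job's logs; in practice the request must be authenticated and it responds with a redirect to the log archive.

```python
def failed_jobs(jobs):
    """Jobs that have already failed, even while sibling jobs still run."""
    return [
        job for job in jobs
        if job.get("status") == "completed" and job.get("conclusion") == "failure"
    ]


def job_log_url(owner, repo, job_id):
    """REST endpoint for one job's logs (requires an authenticated request)."""
    return f"https://api.github.com/repos/{owner}/{repo}/actions/jobs/{job_id}/logs"
```

Checking job-level conclusions rather than the run's conclusion is what lets the watcher react as soon as the first job fails.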

## Validation

- `python3 -m pytest
.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py`

Tom · 2026-04-30 19:58:19 -07:00

c39824c2fd

Refine Codex issue digest summaries (#20097)

## Why

The `codex-issue-digest` skill was producing more detail than the daily
digest needed, and broad all-area digests could miss active issues. In
particular, issue #16088 had substantial recent comments and reactions
but did not appear in the weekly all-areas output because GitHub search
was using default relevance ranking and the collector could exhaust its
candidate cap before later search queries got a fair sample.

That made the digest look quieter than the underlying user activity and
made threshold tuning misleading.

## What changed

- Make the digest summary headline-first and summary-only by default.
- Add an explicit opt-in flow for `## Details`, so the issue table is
shown only when requested or when the prompt asks for details upfront.
- Update the collector to request GitHub issue search results with
`sort=updated` and `order=desc`.
- Apply the search candidate cap per query instead of globally across
all queries.
- Bump the collector script version to `3`.
- Add tests that cover updated sorting and per-query candidate limits.
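A Python sketch of the two collector changes: requesting issue search results sorted by `updated` in descending order, and applying the candidate cap per query so later queries still get a fair sample. `search_url` and `collect_candidates` are hypothetical names, and `fetch` stands in for whatever performs the paginated HTTP request. In this sketch, duplicates still count toward a query's cap, which is a design assumption.

```python
from urllib.parse import urlencode


def search_url(query, per_page=100, page=1):
    """GitHub issue search URL, newest activity first."""
    params = {"q": query, "sort": "updated", "order": "desc",
              "per_page": per_page, "page": page}
    return "https://api.github.com/search/issues?" + urlencode(params)


def collect_candidates(queries, fetch, cap_per_query):
    """Take at most `cap_per_query` results from each query, deduplicated.

    `fetch(query)` is assumed to yield issue dicts in most-recently-updated
    order. A global cap could be exhausted by the first query alone.
    """
    seen = set()
    candidates = []
    for query in queries:
        taken = 0
        for issue in fetch(query):
            if taken >= cap_per_query:
                break
            taken += 1  # duplicates still count toward this query's cap
            if issue["number"] in seen:
                continue
            seen.add(issue["number"])
            candidates.append(issue)
    return candidates
```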

## Verification

- `pytest
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `ruff check
.codex/skills/codex-issue-digest/scripts/collect_issue_digest.py
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `git diff --check`
- Reran the all-areas weekly collector and confirmed #16088 is now
included with `55` interactions.

Eric Traut · 2026-04-28 16:53:59 -07:00

2223b31c06

Add Codex issue digest skill (#19779)

Problem: Maintainers need a shared way to run Codex GitHub issue digests
without copying large prompts or relying on manual GitHub page
summaries.

Solution: Add a reusable codex-issue-digest skill with a deterministic
GitHub collector, owner/all-label windows, reaction-aware activity
metrics, scaled attention markers, and focused tests.

Eric Traut · 2026-04-26 23:16:43 -07:00

4f1d5f00f0

Split DeveloperInstructions into individual fragments. (#18813)

Split DeveloperInstructions into individual fragments.

pakrym-oai · 2026-04-21 10:22:36 -07:00

2a226096f6

[codex] Tighten code review skill wording (#18818)

## Summary

This updates the code review orchestrator skill wording so the
instruction explicitly requires returning every issue from every
subagent.

## Impact

The change is limited to `.codex/skills/code-review/SKILL.md` and
clarifies review aggregation behavior for future Codex-driven reviews.

## Validation

No tests were run because this is a markdown-only skill wording change.

pakrym-oai · 2026-04-21 00:04:04 -07:00

a3ed5068c1

Add Code Review skill (#18746)

Adds a skill that centralizes the rules used when reviewing code for Codex.

pakrym-oai · 2026-04-20 16:01:16 -07:00

513dc28717

Attribute automated PR Babysitter review replies (#18379)

## Summary
PR Babysitter can reply directly to GitHub code review comments when
feedback is non-actionable, already addressed, or not valid. Those
replies should be visibly attributed so reviewers do not mistake an
automated Codex response for a message from the human operator.

This updates the skill instructions to require GitHub code review
replies from the babysitter to start with `[codex]`.

## Changes
- Adds the `[codex]` prefix requirement to the core PR Babysitter
workflow.
- Repeats the requirement in the review comment handling guidance where
agents decide whether to reply to a review thread.
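The skill enforces this rule through instructions rather than code. As a hypothetical illustration only, the rule amounts to an idempotent prefixing step like the following (`attribute_reply` is an invented name):

```python
CODEX_PREFIX = "[codex]"


def attribute_reply(body: str) -> str:
    """Start a babysitter reply with [codex] so reviewers can tell it apart.

    Idempotent: an already-prefixed body is not prefixed twice.
    """
    stripped = body.lstrip()
    if stripped.startswith(CODEX_PREFIX):
        return stripped
    return f"{CODEX_PREFIX} {stripped}"
```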

Eric Traut · 2026-04-17 12:27:48 -07:00

d8b91f5fa1

feat: introduce codex-pr-body skill (#18033)

## Motivation

Codex needs a repeatable workflow for updating PR metadata after a pull
request already exists. This is more specific than generic GitHub
handling: the assistant needs to preserve author-provided body content,
explain why the PR exists before listing implementation details, and
describe only the net change under review, including when Sapling stacks
put a PR on top of another PR instead of `main`.

## Changes

- Adds `.codex/skills/codex-pr-body/SKILL.md`.
- Documents how to infer the target PR from the current branch or
commit, including Sapling-specific PR metadata and `sl sl` output.
- Defines the expected PR body update behavior: inspect the existing
body, preserve key content such as images, avoid local absolute paths,
use Markdown formatting, include relevant issue/PR references, and call
out developer docs follow-up only when applicable.
- Captures stacked-PR handling so generated PR text describes the change
between the PR's base and head, rather than unrelated ancestor changes.

## Verification

Not run; this is a Codex skill documentation addition.

Michael Bolin · 2026-04-15 18:07:46 -07:00

d63ba2d5ec

Add project-local codex bug triage skill (#17064)

Add a `codex-bug` skill to help diagnose and fix bugs in Codex.

Eric Traut · 2026-04-07 19:20:04 -07:00

3fe0e022be

Fix PR babysitter review comment monitoring (#16363)

## Summary
- prioritize newly surfaced review comments ahead of CI and mergeability
handling in the PR babysitter watcher
- keep `--watch` running for open PRs even when they are currently
merge-ready so later review feedback is not missed
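A Python sketch of that ordering, under the assumption that the watcher reduces PR status to a dict of flags. The keys and action names are hypothetical, not the watcher's real state schema. Review feedback is checked first because addressing it usually pushes a new SHA, and a merge-ready PR that is still open keeps being watched.

```python
def next_action(state):
    """Choose the babysitter's next step; review feedback outranks CI."""
    if state["new_review_comments"]:
        return "address_review"       # may replace the SHA CI is testing
    if state["failed_checks"]:
        return "fix_or_retry_ci"
    if not state["mergeable"]:
        return "resolve_merge_blockers"
    if state["pr_open"]:
        return "keep_watching"        # merge-ready, but feedback may still arrive
    return "stop"                     # merged or closed
```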

Eric Traut · 2026-03-31 14:25:32 -06:00

0fe873ad5f

Update PR babysitter skill for review replies and resolution (#16112)

This PR updates the "PR Babysitter" skill to clarify that non-actionable
review comments should receive a direct reply explaining why no change
is needed, and actionable review comments should be marked "resolved"
after they are addressed.

Eric Traut · 2026-03-28 10:35:20 -06:00

3d1abf3f3d

Add remote test skill (#15324)

Teach Codex to run remote tests.

pakrym-oai · 2026-03-20 10:37:57 -07:00

4ddde54c19

Add PR babysitting skill for this repo (#12513)

## PR Notes

This PR adds a project-scoped `babysit-pr` skill for ongoing PR
monitoring (CI, reviews, mergeability).

Simply invoke this skill after creating a PR, and codex will do its best
to get it to a mergeable state:

### What the skill does
* Fixes CI failures related to the PR
* Retries CI failures due to flaky tests
* Addresses code review comments if it agrees with them
* Resolves merge conflicts with the main branch

### How the skill works
- Polls PR status on a loop (CI checks, workflow runs, review activity,
mergeability, and review decision).
- Detects new review feedback (including inline comments and automated
Codex review comments) and prompts/handles follow-up work.
- Distinguishes pending vs failed vs passed CI and identifies likely
flaky failures.
- Can retry failed checks/workflows when appropriate.
- Prioritizes actionable code review feedback over flaky CI retries (to
avoid rerunning CI on a SHA that is about to be replaced).
- Continues monitoring after fixes are applied and pushed, rather than
stopping after a progress update.
- Uses a slower backoff polling cadence once CI is green, while still
watching for new review feedback or state changes.
- Treats required review/approval as a blocking condition and keeps
watching until the PR is actually merge-ready (or merged/closed, or
human intervention is needed).
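A Python sketch of the backoff cadence: poll at a fixed base interval while CI is pending or failing, and double the delay up to a ceiling once CI is green, resetting whenever anything changes. The 30-second base and 600-second ceiling are hypothetical values, not the skill's actual settings.

```python
def next_poll_delay(current, ci_green, state_changed, base=30, max_delay=600):
    """Seconds to wait before the next poll (hypothetical constants)."""
    if state_changed or not ci_green:
        return base                      # react quickly to new activity
    return min(current * 2, max_delay)   # green and quiet: back off
```

Resetting on any state change keeps the loop responsive to new review feedback even after a long quiet stretch.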

### Intended outcome

Keep the PR moving with minimal manual babysitting by continuously
watching for CI failures, reviewer feedback, and merge blockers, and
responding in the right order until the PR is ready to merge.

Eric Traut · 2026-02-22 15:36:28 -08:00

7e569f1162

Teach codex to test itself (#11531)

For fun and profit!

pakrym-oai · 2026-02-11 20:03:19 -08:00

b8e0d7594f
