35 Commits

  • Let Codex consult user-level code-review-* skills. (#30143)
    ## Why
    
    I use the `$code-review` skill a lot and it'd be nice to add my own
    additional review criteria in `$CODEX_HOME/skills/code-review-*`.
    
    ## What
    
    Removes phrasing about "code-review-* skills in this repository" which
    in practice seems like enough to get Codex to consult my user-level code
    review skills in addition to the repo-level ones.
  • docs: document remote executor integration testing (#29790)
    ## Why
    
    Agents need a clear default for writing remote-compatible integration
    tests and reproducible commands for each supported runner.
    
    ## What
    
    Expand the `remote-tests` skill with fixture guidance, skip selection,
    and Docker and Wine commands. Add always-visible `AGENTS.md` guidance
    that points new core and app-server tests toward automatic environment
    fixtures.
    
    Stacked on #29789.
  • test: branch on target OS instead of runner flavor (#29712)
    ## Why
    
    Core tests should branch on the executor's operating system, not on
    runner details such as Docker or Wine. This keeps platform behavior
    stable as new test backends are added and reserves Wine-specific skips
    for actual runner debt.
    
    ## What
    
    - Add `TestTargetOs` and target/host-aware skip helpers while keeping
    `TestEnvironment` internal.
    - Replace topology enum access with remote predicates and a narrow
    Docker accessor.
    - Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
    document the skip taxonomy.
    
    ## Validation
    
    - `just test -p core_test_support`
    - `just test -p codex-core remote_test_env_can_connect_and_use_filesystem`
    - `bazel test //codex-rs/core:core-all-wine-exec-test --test_output=errors`
    reached test execution; unrelated existing view-image, path, and
    timing failures remain.
    - `just test -p codex-core` and `just test` reached broad test
    execution; this checkout has unrelated helper, sandbox, and timing
    failures.
  • core: log AGENTS.md paths as URIs (#28989)
    ## Why
    
    No need to do path contortions when it's for our own logs.
    
    ## What
    
    Follow up on a previous PR's nit and update the path-types skill for
    future reference.
  • Record more path migration guidance for codex. (#28851)
    Some common themes pulled out of both human and automated reviews from
    the last couple of days' migrations to `PathUri` and
    `LegacyAppPathString`.
  • Tell codex to avoid changing rollout format. (#28632)
    Just adds a requirement to the path-types skill to nudge Codex away from
    touching rollout types while migrating paths.
  • Revert "Tell codex about PathUri serde compat. (#28595)" (#28627)
    This reverts commit bd2a786326, which
    didn't capture all the nuance we need for this migration.
  • Tell codex about PathUri serde compat. (#28595)
    This addresses another wrinkle I keep having to re-prompt codex about
    when migrating to cross-OS paths.
  • Record invariants for path migration. (#28589)
    ## Why
    
    Help Codex understand how to execute the migration to support cross-OS
    paths.
    
    ## What
    
    Expand the path-types skill with our goals and constraints.
  • Clarify model-generated and legacy app path types (#28577)
    ## Why
    
    `ApiPathString` kind of implies that it can be used anywhere we pull a
    path out of JSON, but it's not really appropriate for tool arguments
    when the model might generate relative paths.
    
    Prefer `String` for model-generated paths and we can handle the
    conversion per feature for now and define a shared abstraction later if
    it makes sense.
    
    ## What
    
    Rename `ApiPathString` to `AppLegacyPathString` to clarify its role.
    
    Expand the `path-types` skill to tell the model to leave tool args as
    bare strings.
  • Run core integration tests against a Wine-backed Windows executor (#28401)
    ## Why
    
    We want to exercise a Linux app-server against a Windows exec-server
    without having to repeat every test case. This approach has slight
    precedent in the remote Docker test setup.
    
    ## What
    
    Run the shared `codex-core` integration suite against Windows
    exec-server behavior from Linux. This makes cross-OS path and shell
    regressions visible while keeping unsupported cases owned by individual
    tests.
    
    - Add `local`, `docker`, and `wine-exec` test environment selection with
    legacy Docker compatibility.
    - Extend `codex_rust_crate` to generate a sharded Wine-exec variant
    using a cross-built Windows server and pinned Bazel Wine/PowerShell
    runtimes.
    - Teach remote-aware helpers about Windows paths and track temporary
    incompatibilities with source-local `skip_if_wine_exec!` calls and
    follow-up reasons.
  • [codex] add path-types skill (#28347)
    ## Why
    
    Codex contributors and agents need repository-scoped guidance for
    choosing compatible Rust types for operating system paths during the
    ongoing URI migration. Keeping the guidance in the repository makes the
    app-server and exec-server rules available consistently without relying
    on a personal skill installation.
    
    ## What
    
    - Add the `path-types` skill at `.codex/skills/path-types/SKILL.md`.
    - Document the intended uses of `ApiPathString`, `PathUri`,
    `AbsolutePathBuf`, and `PathBuf` across protocol, internal, and shared
    dependency boundaries.
    - Keep migrations of existing types limited to explicit requests and
    proportional edits.
    
    ## Validation
    
    - Validated the skill structure with skill-creator's
    `quick_validate.py`.
  • [codex] Ignore pending PR review comments (#27080)
    ## Why
    
    The PR babysitter could surface inline comments from a GitHub review
    that was still in the `PENDING` state. That allowed Codex to start
    acting on feedback before the reviewer submitted it.
    
    ## What changed
    
    - Correlate inline comments with their parent review and ignore pending
    reviews and their comments.
    - Remove pending review IDs from saved watcher state so the feedback
    surfaces normally after publication.
    - Update the skill instructions and add regression coverage for the
    draft-to-published transition.
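
A minimal sketch of the filtering described above, assuming review and comment payloads shaped like GitHub's REST responses (a review `state` of `PENDING`, and a `pull_request_review_id` on each inline comment). The function names and the saved-state shape are hypothetical and are not taken from the watcher script.

```python
from typing import Iterable


def pending_review_ids(reviews: Iterable[dict]) -> set[int]:
    """Return IDs of reviews the reviewer has not submitted yet."""
    return {r["id"] for r in reviews if r.get("state") == "PENDING"}


def visible_review_comments(reviews: list[dict], comments: list[dict]) -> list[dict]:
    """Keep only inline comments whose parent review is not pending."""
    pending = pending_review_ids(reviews)
    return [c for c in comments if c.get("pull_request_review_id") not in pending]


def prune_seen_reviews(seen_ids: set[int], reviews: list[dict]) -> set[int]:
    """Forget pending reviews so they surface normally once published."""
    return seen_ids - pending_review_ids(reviews)
```

Pruning pending IDs from the saved state is what lets a draft comment appear as new feedback after the reviewer submits the review.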
    
    ## Validation
    
    - `python3 -m pytest .codex/skills/babysit-pr/scripts/test_gh_pr_watch.py`
    - Skill package validation with `quick_validate.py`
    - Live verification on #26835: the draft comment stayed hidden and
    surfaced after the review was submitted.
  • [codex] Clarify PR babysitter state mutations (#27038)
    ## Why
    
    Codex is doing a bit too much on my PRs that it's babysitting. In
    particular I'd like it to not interact with comment threads that involve
    other humans -- I should be the one doing human interaction. This is
    tricky because it's still very useful to be able to drop review comments
    myself and have Codex iterate on them.
    
    ## What
    
    This updates `.codex/skills/babysit-pr/SKILL.md` with an explicit GitHub
    state mutation policy.
  • Add skill for pushing CI configuration changes (#26473)
    ## Why
    
    Codex agents that modify GitHub Actions configuration need clear
    guidance when repository push protections require temporary approval.
    Without it, an agent may pursue an unavailable exemption or stop before
    checking whether the user already has access.
    
    ## What
    
    Add a `pushing-ci-changes` skill that explains the restriction, directs
    agents to attempt the push first, and tells them how to involve the user
    when approval is required.
    
    ## Validation
    
    Not run; this change only adds skill documentation.
  • codex-pr-body: avoid confidential references (#26260)
    ## Why
    
    PR descriptions can be visible outside the context used to generate
    them. In #23710, a generated description referenced an internal
    document, showing that the skill needs an explicit guardrail against
    exposing confidential context.
    
    ## What changed
    
    - Updated the `codex-pr-body` guidance to prohibit confidential
    references, including codenames and OpenAI-internal URLs.
  • [codex] Copy user Bazel settings into Codex worktrees (#25925)
    ## Why
    
    Codex-created linked worktrees do not include ignored files from the
    main worktree. Bazel users who keep local overrides in `user.bazelrc`
    therefore lose those settings in every new worktree.
    
    The setup must also work on Windows and must not overwrite a file that
    already exists in the worktree.
    
    ## What changed
    
    The checked-in Codex environment now invokes
    `.codex/environments/setup.py`. The script resolves the main worktree
    and current worktree, then uses
    `copy_from_main_worktree_to_worktree(repo_relative_path)` to copy
    ignored files into new worktrees without overwriting existing
    destinations.
    
    `main()` currently copies `user.bazelrc`. Additional repository-relative
    paths can be added as further calls to the same helper.
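
The copy behavior described above could look roughly like the sketch below. It is an illustration only: `copy_ignored_file` is a hypothetical name, and it takes both worktree roots as explicit arguments rather than resolving them, which the real `setup.py` is described as doing itself.

```python
import shutil
from pathlib import Path


def copy_ignored_file(main_root: Path, worktree_root: Path, repo_relative_path: str) -> bool:
    """Copy one ignored file into a worktree; return True only if a copy happened."""
    source = main_root / repo_relative_path
    destination = worktree_root / repo_relative_path
    if not source.is_file():
        # Nothing to copy, e.g. the user has no user.bazelrc in the main worktree.
        return False
    if destination.exists():
        # Never overwrite a file that already exists in the worktree.
        return False
    # Nested repository-relative paths may need their parent directories created.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    return True
```

Using `pathlib` and `shutil` keeps the sketch portable to Windows, which the PR lists as a requirement.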
    
    ## Validation
    
    - Ran the setup script in a linked worktree and confirmed it handles a
    missing main-worktree `user.bazelrc`.
    - Verified the helper copies a main-worktree file, preserves an existing
    worktree file, and creates parent directories for a nested path.
  • Uprev Rust toolchain pins to 1.95.0 (#24684)
    ## Summary
    - Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
    Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
    environment config.
    - Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
    match the new version.
    - Leave purpose-specific toolchains unchanged, including the
    `argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
    build pin.
    - Includes fixes for new lints from `just fix` and a few codex-authored
    fixes for lints without a suggestion.
  • CI: Customize v8 building (#22086)
    ## Summary
    
    Move `rusty_v8` artifact production into a hermetic Bazel path and bump
    the `v8` crate to `147.4.0`.
    
    The new flow builds V8 release artifacts from source for Darwin and
    Linux targets, publishes both the current release-compatible artifacts
    and sandbox-enabled variants, and keeps Cargo consumers on prebuilt
    binaries by continuing to feed the `v8` crate the archive and generated
    binding files it already expects.
    
    ## Why
    
    We need control over V8 build-time features without giving up prebuilt
    artifacts for downstream Cargo builds.
    
    Upstream `rusty_v8` already supports source-only features such as
    `v8_enable_sandbox`, but its normal prebuilt release assets do not cover
    every feature combination we need. Building the artifacts ourselves lets
    us enable settings such as the V8 sandbox and pointer compression at
    artifact build time, then publish those outputs so ordinary Cargo builds
    can still consume prebuilts instead of compiling V8 locally.
    
    This keeps the fast consumer experience of prebuilt `rusty_v8` archives
    while giving us a reproducible path to ship featureful variants that
    upstream does not currently publish for us.
    
    ## Implementation Notes
    
    The Bazel graph in this PR is not copied wholesale from `rusty_v8`;
    `rusty_v8`'s normal source build is still GN/Ninja-based.
    
    Instead, this change starts from upstream V8's Bazel rules and adapts
    them to Codex's hermetic toolchains and dependency layout. Where we
    intentionally follow `rusty_v8`, we mirror its existing artifact
    contract:
    
    - the same `v8` crate version and generated binding expectations
    - the same sandbox feature relationship, where sandboxing requires
    pointer compression
    - the same custom libc++ model expected by Cargo's default
    `use_custom_libcxx` feature
    - the same release-style archive plus `src_binding` outputs consumed by
    the `v8` crate
    
    To preserve that contract, the Bazel release path pins the libc++,
    libc++abi, and llvm-libc revisions used by `rusty_v8 v147.4.0`, builds
    release artifacts with `--config=rusty-v8-upstream-libcxx`, and folds
    the matching runtime objects into the final static archive.
    
    ## Windows
    
    Windows is annoyingly handled differently.
    
    Codex's current hermetic Bazel Windows C++ platform is `windows-gnullvm`
    / `x86_64-w64-windows-gnu`, while upstream `rusty_v8` publishes Windows
    prebuilts for `*-pc-windows-msvc`. Those are different ABIs, so the
    Bazel graph cannot truthfully reproduce the upstream MSVC artifacts
    until we add a real MSVC-targeting C++ toolchain.
    
    For now:
    
    - Windows MSVC consumers continue to use upstream `rusty_v8` release
    archives.
    - Windows GNU targets are built in-tree so they link against a matching
    GNU ABI.
    - The canary workflow separately exercises upstream `rusty_v8` source
    builds for MSVC sandbox artifacts, but MSVC is not yet part of the
    Bazel-produced release matrix.
    
    ## Validation
    This PR is technically self-validating through CI. I have already
    published it as a release tag, so the artifacts from this branch are
    published to
    https://github.com/openai/codex/releases/tag/rusty-v8-v147.4.0. CI for
    this PR should therefore consume our own release targets. I have also
    tested locally on Linux and Darwin.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Deduplicate issue digest interactions by user (#22039)
    ## Summary
    
    The issue digest uses recent posts, comments, and reactions to decide
    which issues deserve attention. A single active user could previously
    raise an issue's apparent importance by commenting or reacting multiple
    times in the window.
    
    This changes `codex-issue-digest` so `user_interactions` counts unique
    human GitHub users per issue across new issue posts, new comments, and
    new reactions. Raw reaction/comment counts are still preserved for
    detail output, and the skill guidance now describes `Interactions` as a
    unique-human-user count.
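
As a rough illustration of the counting change, the sketch below assumes each interaction is a dict with hypothetical `login`, `kind`, and `is_bot` keys; the collector's real data model is not shown in this PR. Keeping raw counts over all events, bots included, is also an assumption.

```python
from collections import Counter


def summarize_interactions(events: list[dict]) -> dict:
    """Count unique human users while preserving raw per-kind counts."""
    human_logins = {e["login"] for e in events if not e["is_bot"]}
    # Raw counts are kept for detail output (assumed here to include every event).
    raw_counts = Counter(e["kind"] for e in events)
    return {
        "user_interactions": len(human_logins),
        "raw_counts": dict(raw_counts),
    }
```

With this shape, one user who comments three times contributes 1 to `user_interactions` and 3 to the raw comment count.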
  • [codex] Add Codex environment config (#20630)
    ## Why
    
    This adds a checked-in Codex environment configuration so the repo
    exposes a ready-to-run Codex action from the app environment metadata.
    
    ## What changed
    
    - Added `.codex/environments/environment.toml` with a generated `Run`
    action.
    - The action runs the `codex` binary from `codex-rs/Cargo.toml` with
    `mcp_oauth_credentials_store=file`.
    
    ## Verification
    
    - Not run; configuration-only change.
  • [codex] Improve PR babysitter CI diagnostics and guardrails (#20484)
    ## Summary
    
    - Surface failed GitHub Actions jobs in the PR babysitter watcher so
    Codex can fetch job logs as soon as a job fails, instead of waiting for
    the overall workflow run to complete.
    - Update babysit-pr skill instructions, GitHub API notes, and heuristics
    to prefer direct job log archives before falling back to `gh run view
    --log-failed`.
    - Add guardrails requiring explicit user confirmation before posting
    replies to human-authored review comments.
    - Add guardrails preventing Codex from patching unrelated flaky tests,
    CI infrastructure, runner issues, dependency outages, or other failures
    not caused by the PR branch.
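
A hedged sketch of the first bullet: selecting jobs that have already failed from a workflow run's job list, even while the run as a whole is still in progress. The `status` and `conclusion` fields follow GitHub's Actions job objects; the function names are hypothetical, and the path helper only builds the REST path for a job's logs rather than reproducing the watcher's own fetch logic.

```python
def failed_jobs(jobs: list[dict]) -> list[dict]:
    """Return jobs that have finished with a failure, regardless of run status."""
    return [
        job
        for job in jobs
        if job.get("status") == "completed" and job.get("conclusion") == "failure"
    ]


def job_log_api_path(owner: str, repo: str, job_id: int) -> str:
    """Build the REST path for downloading a single job's logs."""
    return f"repos/{owner}/{repo}/actions/jobs/{job_id}/logs"
```

A path built this way could be passed to `gh api`, with `gh run view --log-failed` remaining the fallback the skill describes.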
    
    ## Validation
    
    - `python3 -m pytest .codex/skills/babysit-pr/scripts/test_gh_pr_watch.py`
  • Refine Codex issue digest summaries (#20097)
    ## Why
    
    The `codex-issue-digest` skill was producing more detail than the daily
    digest needed, and broad all-area digests could miss active issues. In
    particular, issue #16088 had substantial recent comments and reactions
    but did not appear in the weekly all-areas output because GitHub search
    was using default relevance ranking and the collector could exhaust its
    candidate cap before later search queries got a fair sample.
    
    That made the digest look quieter than the underlying user activity and
    made threshold tuning misleading.
    
    ## What changed
    
    - Make the digest summary headline-first and summary-only by default.
    - Add an explicit opt-in flow for `## Details`, so the issue table is
    shown only when requested or when the prompt asks for details upfront.
    - Update the collector to request GitHub issue search results with
    `sort=updated` and `order=desc`.
    - Apply the search candidate cap per query instead of globally across
    all queries.
    - Bump the collector script version to `3`.
    - Add tests that cover updated sorting and per-query candidate limits.
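
The two collector changes above can be sketched as follows. The `sort` and `order` values come from the PR text; the function names, the `per_page` default, and the dict-of-lists input are assumptions made for illustration.

```python
def issue_search_params(query: str, per_page: int = 100) -> dict:
    """Query parameters asking issue search for the most recently updated first."""
    return {"q": query, "sort": "updated", "order": "desc", "per_page": per_page}


def collect_candidates(results_by_query: dict[str, list[int]], cap_per_query: int) -> list[int]:
    """Take up to cap_per_query issue numbers from each query, de-duplicated in order."""
    seen: set[int] = set()
    candidates: list[int] = []
    for issue_numbers in results_by_query.values():
        # The cap applies to each query separately, so an early query
        # cannot use up the budget of the queries that follow it.
        for number in issue_numbers[:cap_per_query]:
            if number not in seen:
                seen.add(number)
                candidates.append(number)
    return candidates
```

Under a single global cap, the first query alone could fill the candidate list; with a per-query cap, every query contributes a sample.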
    
    ## Verification
    
    - `pytest
    .codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
    - `ruff check
    .codex/skills/codex-issue-digest/scripts/collect_issue_digest.py
    .codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
    - `git diff --check`
    - Reran the all-areas weekly collector and confirmed #16088 is now
    included with `55` interactions.
  • Add Codex issue digest skill (#19779)
    Problem: Maintainers need a shared way to run Codex GitHub issue digests
    without copying large prompts or relying on manual GitHub page
    summaries.
    
    Solution: Add a reusable codex-issue-digest skill with a deterministic
    GitHub collector, owner/all-label windows, reaction-aware activity
    metrics, scaled attention markers, and focused tests.
  • Split DeveloperInstructions into individual fragments. (#18813)
  • [codex] Tighten code review skill wording (#18818)
    ## Summary
    
    This updates the code review orchestrator skill wording so the
    instruction explicitly requires returning every issue from every
    subagent.
    
    ## Impact
    
    The change is limited to `.codex/skills/code-review/SKILL.md` and
    clarifies review aggregation behavior for future Codex-driven reviews.
    
    ## Validation
    
    No tests were run because this is a markdown-only skill wording change.
  • Add Code Review skill (#18746)
    Adds a skill that centralizes rules used during code review for Codex.
  • Attribute automated PR Babysitter review replies (#18379)
    ## Summary
    PR Babysitter can reply directly to GitHub code review comments when
    feedback is non-actionable, already addressed, or not valid. Those
    replies should be visibly attributed so reviewers do not mistake an
    automated Codex response for a message from the human operator.
    
    This updates the skill instructions to require GitHub code review
    replies from the babysitter to start with `[codex]`.
    
    ## Changes
    - Adds the `[codex]` prefix requirement to the core PR Babysitter
    workflow.
    - Repeats the requirement in the review comment handling guidance where
    agents decide whether to reply to a review thread.
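
The attribution rule is simple enough to express as a small helper. This is a sketch only; `attribute_reply` is a hypothetical name, since the PR changes skill instructions rather than adding code.

```python
CODEX_PREFIX = "[codex]"


def attribute_reply(body: str) -> str:
    """Ensure an automated review reply starts with the [codex] prefix."""
    stripped = body.lstrip()
    if stripped.startswith(CODEX_PREFIX):
        # Already attributed; avoid adding the prefix twice.
        return stripped
    return f"{CODEX_PREFIX} {stripped}"
```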
  • feat: introduce codex-pr-body skill (#18033)
    ## Motivation
    
    Codex needs a repeatable workflow for updating PR metadata after a pull
    request already exists. This is more specific than generic GitHub
    handling: the assistant needs to preserve author-provided body content,
    explain why the PR exists before listing implementation details, and
    describe only the net change under review, including when Sapling stacks
    put a PR on top of another PR instead of `main`.
    
    ## Changes
    
    - Adds `.codex/skills/codex-pr-body/SKILL.md`.
    - Documents how to infer the target PR from the current branch or
    commit, including Sapling-specific PR metadata and `sl sl` output.
    - Defines the expected PR body update behavior: inspect the existing
    body, preserve key content such as images, avoid local absolute paths,
    use Markdown formatting, include relevant issue/PR references, and call
    out developer docs follow-up only when applicable.
    - Captures stacked-PR handling so generated PR text describes the change
    between the PR's base and head, rather than unrelated ancestor changes.
    
    ## Verification
    
    Not run; this is a Codex skill documentation addition.
  • Add project-local codex bug triage skill (#17064)
    Add a `codex-bug` skill to help diagnose and fix bugs in Codex.
  • Fix PR babysitter review comment monitoring (#16363)
    ## Summary
    - prioritize newly surfaced review comments ahead of CI and mergeability
    handling in the PR babysitter watcher
    - keep `--watch` running for open PRs even when they are currently
    merge-ready so later review feedback is not missed
  • Update PR babysitter skill for review replies and resolution (#16112)
    This PR updates the "PR Babysitter" skill to clarify that non-actionable
    review comments should receive a direct reply explaining why no change
    is needed, and actionable review comments should be marked "resolved"
    after they are addressed.
  • Add remote test skill (#15324)
    Teach Codex to run remote tests.
  • Add PR babysitting skill for this repo (#12513)
    ## PR Notes
    
    This PR adds a project-scoped `babysit-pr` skill for ongoing PR
    monitoring (CI, reviews, mergeability).
    
    Simply invoke this skill after creating a PR, and Codex will do its best
    to get it to a mergeable state:
    
    ### What the skill does
    * Fixes CI failures related to the PR
    * Retries CI failures due to flaky tests
    * Addresses code review comments if it agrees with them
    * Resolves merge conflicts with the main branch
    
    ### How the skill works
    - Polls PR status on a loop (CI checks, workflow runs, review activity,
    mergeability, and review decision).
    - Detects new review feedback (including inline comments and automated
    Codex review comments) and prompts/handles follow-up work.
    - Distinguishes pending vs failed vs passed CI and identifies likely
    flaky failures.
    - Can retry failed checks/workflows when appropriate.
    - Prioritizes actionable code review feedback over flaky CI retries (to
    avoid rerunning CI on a SHA that is about to be replaced).
    - Continues monitoring after fixes are applied and pushed, rather than
    stopping after a progress update.
    - Uses a slower backoff polling cadence once CI is green, while still
    watching for new review feedback or state changes.
    - Treats required review/approval as a blocking condition and keeps
    watching until the PR is actually merge-ready (or merged/closed, or
    human intervention is needed).
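
The slower cadence once CI is green could be modeled along these lines. All numbers (a 30-second base, doubling, a 10-minute ceiling) and the function name are hypothetical; the PR does not state the actual intervals.

```python
def next_poll_interval(
    current: float,
    *,
    ci_green: bool,
    state_changed: bool,
    base: float = 30.0,
    maximum: float = 600.0,
) -> float:
    """Pick the delay in seconds before the next PR status poll."""
    if state_changed or not ci_green:
        # New review feedback, a new SHA, or running/failed CI: poll promptly.
        return base
    # CI is green and nothing changed: back off, but keep watching.
    return min(current * 2, maximum)
```

Resetting to the base interval on any state change keeps the watcher responsive to late review feedback while avoiding constant polling of an idle PR.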
    
    ### Intended outcome
    
    Keep the PR moving with minimal manual babysitting by continuously
    watching for CI failures, reviewer feedback, and merge blockers, and
    responding in the right order until the PR is ready to merge.