codex

[codex] Inject agent graph store into ThreadManager (#29736)

Pick up the AgentGraphStore migration.

- Inject an explicit optional agent graph store into `ThreadManager`
- Route all calls to spawn, close, recursive resume, and
subtree/archive/delete/feedback traversal through it
- Keep using `LocalAgentGraphStore` when SQLite is available

This required some changes to the interface to deal with futures:

- The interface now matches `ThreadStore`'s object-safe pattern by
returning a boxed `AgentGraphStoreFuture` directly, allowing
`ThreadManager` to hold `Arc<dyn AgentGraphStore>`

*Slight behavior change!* Unfiltered subtree enumeration now performs a
single all-status breadth-first traversal, so a closed grandchild
beneath an open edge is included; the previous Open-then-Closed
traversals could not cross mixed-status paths and silently omitted it.
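
The difference can be illustrated with a small sketch. This is not the `AgentGraphStore` implementation; the edge map, status strings, and function names are hypothetical and only model the two traversal strategies described above.

```python
from collections import deque

# Hypothetical edge map: parent -> [(child, edge status)].
EDGES = {
    "root": [("child", "open")],
    "child": [("grandchild", "closed")],
}

def _bfs(root, edges, accept):
    """Breadth-first walk that follows only edges accepted by `accept`."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        for child, status in edges.get(node, []):
            if accept(status) and child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

def subtree_all_status(root, edges):
    """Single traversal that crosses edges of any status."""
    return _bfs(root, edges, lambda status: True)

def subtree_open_then_closed(root, edges):
    """Two independent traversals, one per status, concatenated."""
    return (_bfs(root, edges, lambda s: s == "open")
            + _bfs(root, edges, lambda s: s == "closed"))
```

With this edge map, the per-status version stops at `child` because neither pass can cross the open edge and then the closed one, while the single pass reaches `grandchild`.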

Tom · 2026-06-24 13:24:10 -07:00

ece1dfece0

feat(network-proxy): experimental local credential broker (#28034)

## Why

Codex child processes can inherit injectable local credentials directly,
which lets commands read and exfiltrate the real values. This
experimental slice keeps supported workflows working while moving those
credentials behind the managed network proxy.

This PR contains only the proxy-owned broker implementation. The Codex
config and runtime integration is stacked separately in #29752.

## What changed

- discover supported credentials during child setup, retain real values
only in the in-memory proxy broker, and replace them with shaped dummy
values
- require a presented dummy to select a stored credential and preserve
unrelated explicit authorization headers
- bind GitHub cloud, GitHub Enterprise, and OpenAI credentials to their
intended hosts
- inject credentials only into TLS traffic by default; plaintext
injection requires the explicit dangerous opt-in
- use TLS ClientHello routing for CONNECT so non-TLS protocols remain
opaque tunnels
- expose a pure API that identifies environment keys still holding
broker-generated dummies without mutating the caller's environment

## Scope

- supported credentials: `GH_TOKEN`, `GITHUB_TOKEN`,
`GH_ENTERPRISE_TOKEN`, `GITHUB_ENTERPRISE_TOKEN`, and `OPENAI_API_KEY`
- GitHub cloud credentials match `github.com`, `api.github.com`, and
`*.ghe.com`
- GitHub Enterprise credentials match only the normalized non-cloud
`GH_HOST`
- OpenAI API keys match only `api.openai.com`
- this does not cover SSH agents, kube client certificates, filesystem
secret discovery, or context-injected secret scrubbing
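
A rough sketch of the host binding listed above, for illustration only. The mapping from environment key to credential family, the lowercase/trailing-dot normalization, and the function names are assumptions, not the broker's actual API.

```python
def _normalize(host):
    # Assumed normalization: case-insensitive, ignore a trailing dot.
    return host.lower().rstrip(".")

def is_github_cloud_host(host):
    host = _normalize(host)
    return host in ("github.com", "api.github.com") or host.endswith(".ghe.com")

def credential_allowed(env_key, host, gh_host=None):
    """Return True if the credential named by env_key may be sent to host."""
    host = _normalize(host)
    if env_key in ("GH_TOKEN", "GITHUB_TOKEN"):
        return is_github_cloud_host(host)
    if env_key in ("GH_ENTERPRISE_TOKEN", "GITHUB_ENTERPRISE_TOKEN"):
        # Enterprise credentials match only a non-cloud GH_HOST.
        if gh_host is None or is_github_cloud_host(gh_host):
            return False
        return host == _normalize(gh_host)
    if env_key == "OPENAI_API_KEY":
        return host == "api.openai.com"
    return False  # unsupported credentials are never injected
```

For example, `credential_allowed("OPENAI_API_KEY", "github.com")` is rejected, so a dummy presented to the wrong host never selects the stored value.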

## Validation

- `just test -p codex-network-proxy` (191 passed)
- focused opaque CONNECT, plaintext opt-in, dummy-selection, and
child-isolation regressions passed
- scoped Clippy check for `codex-network-proxy` passed

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
Co-authored-by: Codex <noreply@openai.com>

Winston Howes · 2026-06-24 13:21:16 -07:00

989f55defa

feat(app-server): list descendant threads by ancestor (#29591)

## Why

`thread/list` can filter direct children with `parentThreadId`, but
clients cannot request an entire spawned subtree. Discovering every
descendant requires repeated client-side requests and gives up the
database's existing filtering and pagination path.

## What changed

Experimental clients can use `ancestorThreadId` to return strict
descendants at any depth while `parentThreadId` retains its direct-child
meaning. The filters are mutually exclusive, the ancestor is excluded,
and every result preserves its immediate `parentThreadId` so callers can
reconstruct the tree.

## How it works

- **Explicit relationship:** Internal list parameters distinguish direct
children from transitive descendants without changing the meaning of
`parentThreadId`.
- **Existing graph:** Persisted parent-child spawn edges remain the
source of truth, so descendant lookup needs no schema migration or
ancestry cache.
- **Indexed traversal:** A recursive SQLite query starts from the
parent-edge index, walks each generation, and applies thread filters,
sorting, and cursor pagination in the same database request.
- **Reconstructable results:** The response stays flat and normally
ordered while carrying each descendant's immediate parent.
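
For readers unfamiliar with recursive queries, the shape of such a lookup might look like the sketch below. The table names, columns, and the `created_at` cursor are hypothetical and do not reflect the real state schema; only the recursive-CTE technique is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE threads (id TEXT PRIMARY KEY, created_at INTEGER NOT NULL);
CREATE TABLE spawn_edges (parent_id TEXT NOT NULL, child_id TEXT NOT NULL);
CREATE INDEX spawn_edges_parent ON spawn_edges(parent_id);
INSERT INTO threads VALUES ('root', 1), ('a', 2), ('b', 3), ('c', 4);
INSERT INTO spawn_edges VALUES ('root', 'a'), ('a', 'b'), ('root', 'c');
""")

# Seed with the ancestor's direct children, then walk one generation per step.
# Filtering, ordering, and paging happen in the same statement.
DESCENDANTS_SQL = """
WITH RECURSIVE descendants(id, parent_id) AS (
    SELECT child_id, parent_id FROM spawn_edges WHERE parent_id = :ancestor
    UNION
    SELECT e.child_id, e.parent_id
    FROM spawn_edges AS e JOIN descendants AS d ON e.parent_id = d.id
)
SELECT t.id, d.parent_id
FROM descendants AS d JOIN threads AS t ON t.id = d.id
WHERE t.created_at > :cursor
ORDER BY t.created_at
LIMIT :page_size
"""

def list_descendants(ancestor, cursor=0, page_size=10):
    """Return (thread id, immediate parent id) rows, excluding the ancestor."""
    params = {"ancestor": ancestor, "cursor": cursor, "page_size": page_size}
    return conn.execute(DESCENDANTS_SQL, params).fetchall()
```

The result stays flat, and each row carries its immediate parent so a caller can rebuild the tree.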

## Verification

Ran 550 tests across the protocol, state, rollout, and thread-store
crates, then reran the four focused state, store, and app-server
descendant-listing tests after the final diff reduction. Scoped Clippy
and formatting checks passed. Stable and experimental schema generation
was checked; the stable fixtures remain unchanged while the experimental
schema includes the new field.

Brent Traut · 2026-06-24 13:08:14 -07:00

8057603d0c

Skip credential refresh for WindowsApps launch failures (#29637)

## Summary

- keep the child error 1312 credential retry for normal executables
- return WindowsApps/AppX launch errors directly instead of rotating
sandbox credentials and retrying the same command

## Why

Windows AppX activation can return `ERROR_NO_SUCH_LOGON_SESSION` (1312)
even when the sandbox token is healthy. For executables under
`WindowsApps`, refreshing the sandbox account password cannot fix that
activation failure; it only triggers elevated setup before the same
command fails again.

This is a focused follow-up to #29624.

jif · 2026-06-24 20:59:53 +01:00

3ccef20ef4

Follow directory symlinks in filesystem walks (#29844)

Stack 3 of 3. Stacked on #29842.

## What changes

Adds an opt-in `followDirectorySymlinks` setting to `fs/walk`.

When enabled, the walk follows directory symlinks but continues to
ignore symlinked files. Canonical directory identities prevent symlink
cycles, while normal paths keep their existing spelling.
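
A minimal sketch of the cycle-prevention idea, assuming a POSIX filesystem where symlinks can be created. The `walk` function and its parameter are illustrative stand-ins, not the exec-server `fs/walk` implementation, and the sketch omits the depth, entry, and size limits.

```python
import os
import tempfile

def walk(root, follow_directory_symlinks=False):
    """Return entry paths as spelled during traversal; symlinked files are skipped."""
    entries = []
    visited = {os.path.realpath(root)}  # canonical directory identities
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as it:
            children = sorted(it, key=lambda e: e.name)
        for entry in children:
            if entry.is_symlink() and not (follow_directory_symlinks and entry.is_dir()):
                continue  # symlinked file, broken link, or following disabled
            if entry.is_dir():
                identity = os.path.realpath(entry.path)
                if identity in visited:
                    continue  # already walked under another spelling
                visited.add(identity)
                entries.append(entry.path)
                pending.append(entry.path)
            else:
                entries.append(entry.path)
    return entries

# Demo tree: root/link -> target, target/loop -> root (a cycle).
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "root")
    target = os.path.join(tmp, "target")
    os.makedirs(os.path.join(root, "skills"))
    os.makedirs(target)
    open(os.path.join(target, "SKILL.md"), "w").close()
    os.symlink(target, os.path.join(root, "link"))
    os.symlink(root, os.path.join(target, "loop"))
    os.symlink(os.path.join(target, "SKILL.md"), os.path.join(root, "file_link"))
    default = sorted(os.path.relpath(p, root) for p in walk(root))
    followed = sorted(os.path.relpath(p, root) for p in walk(root, True))
```

Deduplicating on the resolved path is what stops `link/loop` from re-entering `root`, while the reported paths keep the `link/...` spelling.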

Environment skill discovery enables the setting so symlinked skill
directories continue to work with the new single-RPC scan.

jif · 2026-06-24 20:52:36 +01:00

96d8e34712

[codex] Trace exec-server JSON-RPC requests (#27466)

## Why

Exec-server JSON-RPC calls can cross local and remote transports, but
trace context stopped at the RPC boundary. That made client and server
work difficult to correlate when diagnosing latency or failures.

## What changed

- Propagate the current W3C trace context on outbound JSON-RPC requests.
- Parent inbound request spans from received trace context.
- Record the received JSON-RPC method on server spans and keep each span
open through response enqueue.
- Add only the OTEL dependencies required by the exec-server crate.

## Stack

Review and land this stack in order:

1. #27466 — trace exec-server JSON-RPC requests **(this PR)**
2. #27467 — record bounded connection, request, and process lifecycle
metrics
3. #27470 — observe remote registration and Noise rendezvous lifecycle

## Validation

- `just test -p codex-exec-server --lib` (153 passed)
- `just bazel-lock-check`
- `just fix -p codex-exec-server`

richardopenai · 2026-06-24 12:50:18 -07:00

74dcce594d

Preserve Windows sandbox identity during credential retry (#29624)

## Summary

- recognize stale Windows sandbox credentials from both runner logon and
child startup failures
- refresh credentials once without changing the original command,
permissions, file rules, desktop mode, or managed-network identity
- add a Windows regression test that forces error 1312 and inspects the
real retry arguments

## Why

Elevated unified exec starts commands in two steps:

```text
Codex -> sandbox command runner -> requested command
```

Either process start can fail when Windows invalidates the sandbox logon
session. The child-side failure was previously returned as text, so the
parent could not reliably recognize Windows error 1312.

The existing retry also refreshed credentials with `proxy_enforced =
false`, even when the original request used managed networking. That
could change the selected Windows sandbox identity from offline to
online during the retry.

## How

- carry the failure stage and numeric Windows error code through the
command-runner IPC protocol
- preserve native `CreateProcessAsUserW` error codes instead of parsing
error messages
- keep every retry-sensitive field in one request and use it for both
attempts
- retry exactly once after refreshing credentials, then return the
second failure
- share the retry rule with the elevated capture path

The Windows test injects error 1312 on both attempts and verifies:

- two spawn attempts and one credential refresh
- stale credentials are replaced by refreshed credentials
- both attempts receive the same command, environment, cwd, permissions,
roots, deny paths, TTY settings, and private-desktop mode
- credential refresh receives the original `proxy_enforced` value

## Tests

- `just test -p codex-windows-sandbox`
- the new Windows-only regression test is included in the Windows
nextest CI archive

jif · 2026-06-24 20:20:52 +01:00

4907f0c2c3

[codex] suppress low usage remaining warnings when credits are available (#28593)

## Why

The TUI computed proactive `Heads up, you have less than ...` warnings
before considering workspace credits. As a result, users could see
included-limit warnings even when they could continue using Codex with
workspace credits.

`has_credits` alone is not sufficient to determine whether finite
credits are usable: a spend-control hard limit can cap the reported
balance to zero while `has_credits` still reflects the workspace's raw
balance. Unlimited credits are the opposite case: they are usable even
though no numeric balance is reported.

## What changed

- suppress proactive TUI rate-limit usage warnings and the lower-cost
model nudge when usable workspace credits are available
- treat credits as usable when `has_credits` is true and either
`unlimited` is true or the parsed balance is positive
- continue showing warnings when the usable balance is zero, including
when a spend-control limit has capped otherwise available workspace
credits
- add regression coverage for zero-balance, positive-balance, and
unlimited workspace-credit snapshots
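
As a sketch, the usability rule above reduces to a small predicate. The snapshot shape and names here are assumptions for illustration; the TUI's actual types differ.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional

@dataclass
class CreditsSnapshot:
    # Hypothetical shape: the balance arrives as text and may be absent.
    has_credits: bool
    unlimited: bool
    balance: Optional[str]

def credits_usable(snapshot):
    """has_credits AND (unlimited OR parsed balance > 0)."""
    if not snapshot.has_credits:
        return False
    if snapshot.unlimited:
        return True
    if snapshot.balance is None:
        return False
    try:
        return Decimal(snapshot.balance) > 0
    except InvalidOperation:
        return False  # an unparseable balance is treated as not usable

def should_warn(usage_is_low, snapshot):
    """Show the proactive warning only when no usable credits remain."""
    return usage_is_low and not credits_usable(snapshot)
```

A spend-capped workspace (`has_credits=True`, balance `"0"`) still warns, while an unlimited workspace with no numeric balance does not.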

## Validation

- `just test -p codex-tui rate_limit_usage_warnings_`

Brooks · 2026-06-24 18:43:17 +00:00

5013d10824

[codex] fix Windows ConPTY input handling (#29734)

## Why

Windows unified-exec TTY input did not behave like the non-Windows PTY
path. ConPTY sessions could receive the wrong line ending or mishandle
backspace, especially when sending input to a foreground program through
PowerShell or cmd. The local, legacy restricted, and elevated paths also
handled this normalization separately.

## What changed

- share one stateful Windows TTY input normalizer across local, legacy
restricted, and elevated runner paths
- translate LF and split CRLF into one Windows terminal Enter, encode
backspace as DEL, and preserve UTF-8 and control bytes such as Ctrl-C
- add Windows integration coverage for Unicode input, backspace, Enter,
and PowerShell foreground-child Ctrl-C behavior
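
A stateful normalizer of this kind could look roughly like the sketch below. It assumes that the terminal Enter is a single CR (0x0D) and that backspace arrives as BS (0x08); the class name is hypothetical and the real normalizer is Rust code shared by the runner paths.

```python
class WindowsTtyInputNormalizer:
    """Rewrites input bytes chunk by chunk, remembering a trailing CR."""

    def __init__(self):
        self._last_was_cr = False

    def feed(self, chunk: bytes) -> bytes:
        out = bytearray()
        for byte in chunk:
            if byte == 0x0A:  # LF
                # A bare LF becomes Enter; the LF of a CRLF pair is dropped,
                # even when CR and LF arrive in different chunks.
                if not self._last_was_cr:
                    out.append(0x0D)
                self._last_was_cr = False
                continue
            self._last_was_cr = byte == 0x0D
            # Backspace is sent as DEL; everything else passes through.
            out.append(0x7F if byte == 0x08 else byte)
        return bytes(out)
```

UTF-8 continuation bytes are all 0x80 or above, so multi-byte characters and control bytes such as Ctrl-C (0x03) pass through untouched.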

## Validation

- `just test -p codex-utils-pty` (13 tests passed; the Unicode
integration test retried once)
- the Unicode integration test passed five consecutive runs with retries
disabled
- integration coverage sends `cafeé 漢字` through cmd and PowerShell and
verifies that Ctrl-C interrupts a running PowerShell foreground child

iceweasel-oai · 2026-06-24 11:27:44 -07:00

a781761eda

Fix environment skill discovery after merge (#29887)

## Why

The merge of #29831 with the new `fs/walk` environment discovery path
left three `SkillFileDiscovery` initializers without the new namespace
fields. This makes `codex-core-skills` fail to compile and breaks CI for
every PR based on current `main`.

## What changed

- collect plugin roots from the directory entries already returned by
`fs/walk`
- keep the selected root as the namespace fallback
- initialize empty discovery results with empty namespace sets

This preserves the bounded `fs/walk` implementation while restoring the
namespace caching added by #29831.

jif · 2026-06-24 19:08:39 +01:00

8a6a34be75

Cache plugin namespace during executor skill discovery (#29831)

## Why

Executor skill discovery runs before the remote skills catalog is
available. For a remote environment, each `ExecutorFileSystem` operation
becomes an exec-server RPC.

Previously, every discovered `SKILL.md` independently resolved its
plugin namespace by walking its ancestors and probing both supported
manifest locations. In the common `plugin/skills/<skill>/SKILL.md`
layout, that repeats 8 RPCs per skill even though every skill under the
plugin root uses the same namespace. These lookups happen while skills
are parsed, so their cost grows linearly with the skill count and adds
directly to first-turn latency.

A selected capability root can also contain standalone skills, multiple
sibling plugins, nested plugins, or symlinked directories. The
optimization therefore needs to retain the nearest-ancestor namespace
for each skill rather than assuming the selected root represents exactly
one plugin.

## What changed

- record plugin-root candidates from directory entries already returned
during skill discovery
- prune candidates that are not ancestors of any discovered `SKILL.md`
before reading manifests
- resolve each relevant plugin root once, with one fallback lookup per
canonical traversal root for symlinked directories
- select the nearest cached plugin namespace for each discovered skill
- avoid namespace lookup entirely when the root contains no skills

No additional directory traversal is required. Namespace work now scales
with the number of plugin roots that contain discovered skills, rather
than the total number of skills or unrelated sibling plugins. Standalone
and nested-plugin names keep their previous behavior.
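
The nearest-ancestor selection can be sketched like this. The function names and the dictionary cache are hypothetical, and paths are treated as POSIX strings for simplicity; the actual code works on executor filesystem paths.

```python
from pathlib import PurePosixPath

def relevant_roots(candidate_roots, skill_paths):
    """Keep only candidates that are ancestors of some discovered SKILL.md."""
    ancestors = {
        str(parent)
        for skill in skill_paths
        for parent in PurePosixPath(skill).parents
    }
    return [root for root in candidate_roots if root in ancestors]

def nearest_namespace(skill_path, namespaces_by_root):
    """Return the namespace of the closest ancestor plugin root, if any."""
    # `parents` yields the closest ancestor first, so the first hit wins.
    for ancestor in PurePosixPath(skill_path).parents:
        namespace = namespaces_by_root.get(str(ancestor))
        if namespace is not None:
            return namespace
    return None  # standalone skill: no plugin namespace

# Each relevant manifest would be read once to fill this cache.
cache = {"/root/outer": "outer", "/root/outer/nested": "nested"}
```

Pruning first means a sibling plugin with no discovered skills never has its manifest read, and the per-skill step is a dictionary lookup rather than a filesystem RPC.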

## Benchmarks

I used a temporary counting `ExecutorFileSystem` around the real local
filesystem. Each filesystem operation was counted as one remote RPC and
given 1 ms of injected latency. Each variant ran three times; times
below are medians.

### One plugin with 100 skills

| Operation | Before | After | Delta |
| --- | ---: | ---: | ---: |
| `get_metadata` | 1,002 | 303 | -699 |
| `read_file` | 200 | 101 | -99 |
| `read_directory` | 102 | 102 | 0 |
| **Total filesystem RPCs** | **1,304** | **506** | **-798 (-61.2%)** |
| **Median load time** | **2.890 s** | **0.997 s** | **2.90× faster** |

The namespace-specific work drops from 800 RPCs to 2 in this layout.

### Multiple plugins under one selected root

These runs compare the correct pre-optimization implementation with the
final nearest-plugin-root cache. The total plugin skill count stays at
100 while the number of plugin roots changes.

| Layout | Before RPCs | After RPCs | Reduction | Before time | After time | Speedup |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| 2 plugins × 50 skills | 1,312 | 530 | 59.6% | 1,819 ms | 711 ms | 2.56× |
| 10 plugins × 10 skills | 1,344 | 578 | 57.0% | 1,850 ms | 778 ms | 2.38× |
| 50 plugins × 2 skills | 1,504 | 818 | 45.6% | 2,094 ms | 1,086 ms | 1.93× |
| 10 plugins × 10 skills + 10 standalone skills | 1,596 | 630 | 60.5% | 2,209 ms | 860 ms | 2.57× |

The remaining cost grows with the number of relevant plugin manifests.
Each relevant manifest is read once instead of once per skill, while
sibling plugins with no discovered skills are not read. Absolute latency
savings depend on the executor's real RPC latency.

## Tests

- `just test -p codex-core-skills` (109 passed across the library and
integration-test binaries)
- one integration test covers standalone, outer-plugin, nested-plugin,
and unused sibling-plugin layouts, and asserts the exact set of
manifests read

jif · 2026-06-24 17:14:34 +01:00

390b73133b

[codex] show external import result counts (#29567)

## What changed

- Show per-type import counts in the `/import` review UI and started
message.
- Render completion results as a multi-line summary with total
imported/failed counts and one row per import type.
- Add snapshot coverage for the updated review and completion output.

<img width="537" height="322" alt="Screenshot 2026-06-23 at 9 41 20 PM"
src="https://github.com/user-attachments/assets/166542eb-2097-4b2b-8130-8f6fd8c680ce"
/>


## Why

The TUI previously only reported that Claude Code import started or
finished. Users could not see how many items of each type were selected
or how many actually imported versus failed.

charlesgong-openai · 2026-06-24 08:56:57 -07:00

3694b48a82

Use fs/walk for environment skill discovery (#29842)

Stack 2 of 3. Base: #29841. Follow-up: #29844.

## What changes

Environment skill discovery currently walks remote filesystems through
repeated `readDirectory` and `getMetadata` calls. This switches that
scan to the bounded `fs/walk` operation from the base PR.

```text
Before: readDirectory(root) -> getMetadata(...) -> readDirectory(child) -> ...
After:  fs/walk(root, limits) -> filter the result for SKILL.md
```

This makes environment skill discovery one RPC while preserving
traversal warnings and the existing depth and directory limits. The scan
also has an explicit entry limit. The follow-up restores
directory-symlink traversal.

jif · 2026-06-24 16:32:35 +01:00

69b76e9d07

Add a bounded filesystem walk RPC (#29841)

Stack 1 of 3. Follow-ups: #29842 and #29844.

## What changes

Adds a general bounded `fs/walk` operation to the exec server.

The operation returns file and directory entries plus recoverable
per-path errors. It skips symlinks, preserves the existing filesystem
sandbox routing, and enforces depth, directory, entry, and response-size
limits.

This PR only defines and wires the filesystem operation. It does not
change any callers yet.

jif · 2026-06-24 16:05:43 +01:00

c14623d04c

Persist agent messages as response items (#29829)

## Why

Inter-agent messages are recorded in live history as
`ResponseItem::AgentMessage`, but rollouts stored
`InterAgentCommunication` and rebuilt the response item during resume.
This made the rollout differ from the actual Responses history.

## What changed

- store the prepared `agent_message` response item directly
- keep `trigger_turn` in a small local metadata record for fork
truncation
- keep reading older `inter_agent_communication` rollout items

jif · 2026-06-24 15:43:10 +01:00

b4f0f3eff1

[codex] Emit implicit skill usage for support reads (#29731)

## Summary
- Index all enabled skills for command-based usage detection, regardless
of `allow_implicit_invocation`.
- Preserve `allow_implicit_invocation` for the model-visible implicit
routing list.
- Add regression coverage for a support/preflight skill whose `SKILL.md`
is read and whose script is run while implicit invocation is disabled.

## Root cause
`allow_implicit_invocation` was used for both model routing and
command-based usage-event detection. That meant support skills like
`data-analytics:user-context` could be read or run by other skills, but
those accesses could not emit implicit usage events.

## Validation
- `just fmt`
- `just test -p codex-core-skills
service::tests::skills_for_config_indexes_usage_detection_for_non_implicit_skills`
- `just test -p codex-core-skills` now has the new test passing, but 3
unrelated local tests fail because
`/Users/alexsong/.agents/skills/test/SKILL.md` is invalid/missing YAML
frontmatter.

alexsong-oai · 2026-06-24 08:57:34 +00:00

f959e7fc98

Keep executor plugin MCP paths URI-native (#29628)

## Why

Executor-owned plugin roots are `PathUri`, but MCP config normalization
still converts them into a native `Path` using the app-server host's
rules. Relative `cwd` values can therefore resolve against the wrong
filesystem when host and executor path conventions differ.

This PR keeps executor MCP paths URI-native until the selected
environment launches the server, while retaining the existing host
parser behavior.

## What changed

- Keep one shared MCP normalization path with narrow host-`Path` and
executor-`PathUri` entrypoints.
- Preserve native host resolution for locally installed plugin MCP
configs.
- For executor configs, default `cwd` to the plugin root and resolve
relative working directories with the root URI's path convention.
- Accept explicit executor `file:` URIs only when they remain within the
selected plugin root.
- Preserve the selected environment id and existing remote
environment-variable ownership rules.
- Route the executor plugin provider through the URI-native entrypoint
without converting the root on the host.
- Ensure `codex doctor` does not probe executor-owned stdio commands or
foreign working directories on the host.
- Cover foreign Windows roots, relative and absolute executor working
directories, traversal rejection, runtime resolution, and doctor
behavior.

```text
plugin root:    file:///C:/plugins/demo
configured cwd: scripts
                  |
                  v
resolved cwd:  file:///C:/plugins/demo/scripts
                  |
                  v
launch through the selected executor
```

No new provider or filesystem abstraction is introduced.

## Stack

1. #29614 — add lexical `PathUri` containment.
2. #29620 — share URI-native manifest path resolution.
3. #28918 — keep selected plugin roots and resources URI-native.
4. #29626 — load executor skills without host path conversion.
5. **This PR** — resolve executor MCP working directories without host
path conversion.

jif · 2026-06-24 09:46:07 +01:00

3e39e92f03

[codex] Remove auto-compaction opt-out (#29815)

## Summary

- remove the default-on `auto_compaction` feature flag and generated
config schema entries
- restore unconditional pre-turn, model-switch/hash, and mid-turn
automatic compaction
- expose `new_context` whenever token-budget tooling is enabled
- remove the disabled-auto-compaction integration coverage introduced by
#28260

## Motivation

Roll back the internal auto-compaction escape hatch added in #28260.
Automatic compaction should no longer be suppressible with `--disable
auto_compaction`; existing manual `/compact` behavior remains unchanged.

## Testing

- `just write-config-schema`
- `just test -p codex-features` — 53 passed
- `just test -p codex-core 'suite::compact::'` — 36 passed
- `just test -p codex-core
suite::token_budget::new_context_tool_starts_new_window_before_follow_up`
— 1 passed
- `just fix -p codex-core -p codex-features`
- `just fmt`
- `just test -p codex-core` — 2,778 passed, 59 failed, 16 skipped;
failures were outside the changed compaction paths and were dominated by
missing first-party test binaries and shell-snapshot timeouts

rhan-oai · 2026-06-24 00:15:04 -07:00

2a320fedb5

test: use automatic environments in app-server integration tests (#29789)

## Why

Topology-neutral app-server integration tests should exercise automatic
environment selection so the same setup covers local and remote
executors.

## What

Migrate eligible tests to `TestAppServer::new_with_auto_env()` and
`send_thread_start_request_with_auto_env()`. Leave explicit-topology
tests unchanged, and skip the request-permissions case on Windows with a
TODO for cross-platform tool routing.

## Validation

- `just test -p codex-app-server`
- `bazel test //codex-rs/app-server:app-server-all-wine-exec-test
--test_output=errors`

Stacked on #29788.

Adam Perry @ OpenAI · 2026-06-23 22:48:06 -07:00

c2b3e3b4f5

test: run app-server integration tests under Wine (#29788)

## Why

Made a mistake when carving #29746 out of my local changes and the test
was missing from the build graph. Oops!

## What

Enable the app-server Wine exec test target. Remove the `manual` tag
from generated Wine-exec test variants so wildcard Bazel test
invocations select them. Refactor the smoke test to ensure it passes
with current Windows support.

Adam Perry @ OpenAI · 2026-06-24 05:23:29 +00:00

b17f30eb2a

connectors: own app metadata types (#29723)

## Why

Connector metadata is consumed by connector discovery, ChatGPT
integration, core, and TUI code. Treating app-server's wire DTO as the
shared domain model reverses the intended dependency direction.

## What changed

- Added connector-owned app branding, review, screenshot, metadata, and
info types.
- Added explicit conversions in app-server and TUI while preserving
app-server's wire payloads.
- Removed production app-server-protocol dependencies from connectors
and ChatGPT connector code.

## Stack

This is PR 4 of 6, stacked on [PR
#29722](https://github.com/openai/codex/pull/29722). Review only the
delta from `codex/split-config-layer-types`. Next: [PR
#29724](https://github.com/openai/codex/pull/29724).

## Validation

- Connector and tools coverage passed.
- App-server app-list coverage passed: 13 tests.

Adam Perry @ OpenAI · 2026-06-23 22:08:23 -07:00

e639e8c4bd

config: own layer provenance types (#29722)

## Why

Config layer provenance describes how effective configuration was
assembled, so it belongs with the config loader rather than in
app-server's serialized API types.

## What changed

- Moved `ConfigLayerSource`, `ConfigLayerMetadata`, and `ConfigLayer`
ownership into `codex-config`.
- Kept app-server's wire payloads unchanged and added explicit
conversions at the app boundary.
- Removed lower-level app-server-protocol dependencies from config
consumers.

## Stack

This is PR 3 of 6, stacked on [PR
#29721](https://github.com/openai/codex/pull/29721). Review only the
delta from `codex/split-auth-domain-types`. Next: [PR
#29723](https://github.com/openai/codex/pull/29723).

## Validation

- `codex-config` coverage passed.
- App-server config-manager and config RPC coverage passed.

Adam Perry @ OpenAI · 2026-06-24 04:03:04 +00:00

1d65ccabd5

[plugins] Enforce marketplace source admission requirements (#29753)

## Why

Managed marketplace source requirements only become effective when every
local marketplace mutation path applies the same admission decision.
This change centralizes that decision so CLI, app-server, and
external-agent migration flows cannot add, install from, or refresh a
disallowed source.

## What changed

- Match exact normalized Git repository URLs with an optional exact
`ref`.
- Match Git hosts with managed regular expressions.
- Match local marketplaces by exact absolute path.
- Preserve the expected path/name boundary for managed OpenAI
marketplaces.
- Enforce source admission during marketplace add, plugin install, and
configured Git marketplace upgrade.
- Continue upgrading independent marketplaces when one source is
rejected and return a per-marketplace error.
- Load the effective requirements stack at CLI, app-server, and
external-agent migration entry points.

This PR does not filter already configured marketplaces at runtime; that
remains in draft follow-up #29691.

## Stack

This is PR 2 of 3 and is based on #29690, which introduces the
requirements data shape and merge behavior.

## Test plan

- Source matcher coverage for Git URL/ref, host-pattern, local-path, and
managed marketplace cases.
- Marketplace add and plugin install coverage for allowed and rejected
sources.
- Marketplace upgrade coverage for rejection and per-marketplace
continuation.

xl-openai · 2026-06-23 20:13:11 -07:00

4fe02f4fcf

auth: move domain mode below app wire types (#29721)

## Why

Authentication mode is a domain concept used by login, model selection,
telemetry, and transports. Keeping the canonical type in app-server
protocol forces those lower-level crates to depend on an unrelated wire
API.

## What changed

- Added canonical `codex_protocol::auth::AuthMode` domain values.
- Kept the app-server wire DTO unchanged and added an explicit app-side
conversion.
- Removed production app-server-protocol dependencies from login,
model-provider-info, models-manager, and otel call paths.

## Stack

This is PR 2 of 6, stacked on [PR
#29714](https://github.com/openai/codex/pull/29714). Review only the
delta from `codex/split-json-rpc-protocols`. Next: [PR
#29722](https://github.com/openai/codex/pull/29722).

## Validation

- Auth and login coverage passed in the focused protocol/domain test
run.
- App-server account and auth conversion coverage passed.

Adam Perry @ OpenAI · 2026-06-24 03:10:20 +00:00

31372078d1

[codex] Assign response item IDs in forked history (#29767)

## Why

Fork-specific response items, including the subagent usage hint, are
appended directly to `InitialHistory::Forked`. This bypasses the normal
history insertion path that assigns missing response item IDs when
`Feature::ItemIds` is enabled, so the child could reconstruct and
persist those items without IDs.

## What changed

- When `Feature::ItemIds` is enabled, assign missing IDs to top-level
`ResponseItem`s while materializing `InitialHistory::Forked`, before
both reconstruction and persistence.
- Preserve existing IDs and use the same owned rollout items for live
history and persistence.
- Extract the existing single-item ID allocation logic for reuse by the
fork path.
- Add coverage that verifies a fork-only developer message receives the
same ID in live and persisted history with the feature enabled.

Normal history recording, compacted-history replacement, and fork
handling all continue to honor `Feature::ItemIds`. External-agent
imports, normal resume, and nested legacy compaction checkpoints are
unchanged.

## Testing

- `just test -p codex-core
record_initial_history_reconstructs_forked_transcript`
- `just test -p codex-core
record_initial_history_assigns_and_persists_id_for_forked_response_item`

pakrym-oai · 2026-06-24 03:03:19 +00:00

806a4b66c9

[codex] Ignore local curated plugins when remote catalog is active (#29765)

## Summary

- suppress configured `openai-curated` plugins when the remote plugin
feature is enabled and auth uses the Codex backend
- preserve `openai-api-curated` and non-Codex-backend behavior while
including remote catalog activation in the plugin load cache key
- add core plugin coverage and an app-server integration test for
runtime feature enablement

## Why

The Codex app enables remote plugins through process-local runtime
feature enablement, which can happen after app-server startup tasks have
already observed legacy local plugin state. The existing conflict logic
only preferred a remote plugin when the same plugin was already
installed remotely, so a configured legacy-only plugin could continue
exposing skills and other capabilities from `openai-curated`.

## Impact

When the remote catalog is active, legacy `openai-curated` plugins no
longer contribute skills, MCP servers, apps, or hooks. Remote installed
plugins continue to load normally, and `openai-api-curated` remains
unaffected. This does not change remote fetch, bundle sync, or uninstall
behavior.

## Validation

- `just test -p codex-core-plugins
remote_global_catalog_ignores_local_curated_plugins
remote_plugin_feature_keeps_local_curated_without_codex_backend`
- `just test -p codex-app-server
runtime_remote_plugin_enablement_excludes_local_curated_plugin_skills`
- `just fmt`
- `git diff --check`

xl-openai · 2026-06-23 19:51:31 -07:00

ff78e21215

[plugins] Add marketplace source requirements (#29690)

## Why

Managed deployments need a mergeable way to declare which marketplace
sources Codex may use. An enterprise-keyed TOML table avoids array merge
ambiguity and lets every requirements layer use the existing config
precedence rules without a marketplace-specific merger.

## Requirements shape

```toml
[marketplaces]
restrict_to_allowed_sources = true

[marketplaces.allowed_sources.company_plugins]
source = "git"
url = "https://github.com/example/company-plugins.git"
ref = "main"

[marketplaces.allowed_sources.internal_git]
source = "host_pattern"
host_pattern = "^git\\.example\\.com$"

[marketplaces.allowed_sources.local_plugins]
source = "local"
path = "/opt/company/codex-plugins"
```

`restrict_to_allowed_sources` follows normal scalar precedence.
`allowed_sources` follows normal recursive TOML table merge behavior:
distinct keys accumulate and fields under the same key use normal layer
precedence. The final `source` value later selects which fields the
marketplace admission policy interprets.

The raw rule fields remain optional while requirements layers are
composed, so a higher-priority layer can override only `ref`, `url`, or
another individual field. Source-specific validation and normalization
intentionally belong to the marketplace admission layer, not
requirements merging.
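
The merge rule described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `PartialSource`, `merge_layers`, and the `git_ref` field (standing in for the TOML key `ref`) are hypothetical names, and layers are assumed to be ordered from lowest to highest priority.

```rust
use std::collections::BTreeMap;

/// One `allowed_sources` entry with every field optional, so a layer may set
/// only the fields it wants to override. (Hypothetical type for illustration.)
#[derive(Clone, Debug, Default, PartialEq)]
pub struct PartialSource {
    pub source: Option<String>,
    pub url: Option<String>,
    pub git_ref: Option<String>, // stands in for the TOML key `ref`
    pub host_pattern: Option<String>,
    pub path: Option<String>,
}

/// Keep the higher-priority value when it is set, otherwise keep the lower one.
fn pick(low: &mut Option<String>, high: &Option<String>) {
    if high.is_some() {
        *low = high.clone();
    }
}

impl PartialSource {
    /// Overlay a higher-priority entry field by field.
    fn overlay(&mut self, higher: &PartialSource) {
        pick(&mut self.source, &higher.source);
        pick(&mut self.url, &higher.url);
        pick(&mut self.git_ref, &higher.git_ref);
        pick(&mut self.host_pattern, &higher.host_pattern);
        pick(&mut self.path, &higher.path);
    }
}

pub type AllowedSources = BTreeMap<String, PartialSource>;

/// Merge layers ordered from lowest to highest priority: distinct keys
/// accumulate, and fields under the same key follow layer precedence.
pub fn merge_layers(layers: &[AllowedSources]) -> AllowedSources {
    let mut merged = AllowedSources::new();
    for layer in layers {
        for (key, entry) in layer {
            merged.entry(key.clone()).or_default().overlay(entry);
        }
    }
    merged
}
```

With this shape, a higher layer that sets only `ref` under `company_plugins` changes the ref and leaves the inherited `source` and `url` alone. No validation happens during the merge, which matches the stated split between requirements merging and the admission layer.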

This initial shape includes `git`, `host_pattern`, and `local` sources.
It does not add npm or path-pattern rules.

## What changed

- Add the marketplace requirements TOML shape to
`ConfigRequirementsToml`, `ConfigRequirementsWithSources`, and
`ConfigRequirements`.
- Carry marketplace requirements through the existing regular
requirements merge path.
- Keep allowed-source entries as raw partial tables for downstream
policy interpretation.
- Cover partial same-key overlays, source changes, unknown fields, and
unmodified local paths.

This PR defines and composes the requirements only. Source admission is
implemented by the next PR in the stack.

## Stack

This is PR 1 of 3. #29753 adds source admission on top of this PR; draft
#29691 will add runtime enforcement after it is rebased later.

## Test plan

- `just test -p codex-config marketplace_`

xl-openai · 2026-06-23 19:42:13 -07:00

2696e7199b

[codex] Update bundled skill installer guidance (#29768)

## Summary

- Update the bundled skill installer's post-install guidance to say the
skill will be available on the user's next turn.
- Remove the obsolete instruction to restart Codex.

## Why

Codex refreshes its skill catalog between turns. The existing bundled
instruction predates that behavior and causes the model to recommend an
unnecessary restart.

## Impact

Released Codex builds will materialize accurate post-install guidance
for the bundled system skill.

## Related

- Canonical skill change: https://github.com/openai/skills/pull/507

## Validation

- `just fmt`
- `git diff --check`
- `just test -p codex-app-server
skills_changed_notification_is_emitted_after_skill_change` (passed
during investigation)

No test code was added because the existing live-refresh path and
focused integration test already verify that skill changes are picked up
without restarting.

sayan-oai · 2026-06-23 19:36:17 -07:00

6f65b9a98c

[codex] Reuse compacted history replacement for new context windows (#29762)

## Why

`start_new_context_window` independently replaced in-memory history and
persisted a compacted checkpoint instead of using the shared
compacted-history path. That bypassed the centralized missing-item-ID
assignment when `item_ids` is enabled, so fresh context messages could
enter the new context window and its persisted replacement history
without IDs.

This follows up on the token-budget compaction reset flow introduced in
[#29743](https://github.com/openai/codex/pull/29743).

## What changed

- Delegate new context-window installation to
`replace_compacted_history`.
- Reuse its ID assignment, in-memory replacement, world-state baseline,
checkpoint persistence, turn-context persistence, and session-start
bookkeeping.
- Add focused coverage that verifies generated IDs are present in live
history and preserved in the persisted replacement history.

## Testing

- `just test -p codex-core
start_new_context_window_assigns_and_persists_item_ids`
- `just test -p codex-core
new_context_tool_starts_new_window_before_follow_up`

pakrym-oai · 2026-06-23 18:53:35 -07:00

176af2b510

Let image generation extension hosts control output persistence (#29711)

## Why

Some extension hosts need generated images returned without writing them
to the local filesystem or giving the model a local path.

## What changed

**tl;dr**: we now conduct all extension operations in the image gen
extension

- Let hosts provide an optional image save root when installing the
extension.
- Save images and return path hints only when a save root is configured.
- Return image data without saving or adding a path hint when no save
root is configured.
- Preserve the extension-provided `saved_path` instead of persisting
extension images again in core.
- Leave built-in image generation unchanged.
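
A minimal sketch of the save-root rule listed above, assuming a hypothetical `finish_image` helper and `ImageOutput` type (only `saved_path` is a name taken from the PR text):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Result handed back to the host. `saved_path` is only set when the host
/// configured a save root. (Hypothetical type for illustration.)
#[derive(Debug)]
pub struct ImageOutput {
    pub bytes: Vec<u8>,
    pub saved_path: Option<PathBuf>,
}

/// Persist the image only when a save root is configured; otherwise return
/// the image data without touching the filesystem or adding a path hint.
pub fn finish_image(
    bytes: Vec<u8>,
    file_name: &str,
    save_root: Option<&Path>,
) -> io::Result<ImageOutput> {
    let saved_path = match save_root {
        Some(root) => {
            fs::create_dir_all(root)?;
            let path = root.join(file_name);
            fs::write(&path, &bytes)?;
            Some(path)
        }
        None => None,
    };
    Ok(ImageOutput { bytes, saved_path })
}
```

In this sketch the caller reuses `saved_path` as returned, rather than saving the image a second time.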

## Validation

- `just test -p codex-image-generation-extension`
- `just test -p codex-app-server
standalone_image_generation_returns_saved_path_hint_to_model`
- `just test -p codex-core
extension_tool_uses_granted_turn_permissions_without_local_persistence`
- `just test -p codex-core tools::handlers::extension_tools::tests`
- Tested on the Codex CLI with `save_root` set to `CODEX_HOME` and to `None`.
- Tested on the Codex app with both settings as well.

Won Park · 2026-06-23 18:51:49 -07:00

61f5a84930

test: add app-server auto environment helper (#29746)

## Why

Start moving towards app-server tests defaulting to running against
remote & foreign OS executors. To do so we need a point of indirection
similar to core integration tests' `build_with_auto_env`, but with the
flexibility of letting tests control environment registration if they
need to.

## What

This adds:

- `TestAppServer::new_with_auto_env()` for constructing an app server
with a default environment defined by the test runner (e.g. bazel)
- `TestAppServer::auto_env_params()` for tests to easily acquire turn
env params tailored to the automatic environment
- `TestAppServer::send_thread_start_request_with_auto_env()` to make it
easy for tests to start a thread using the automatic environment

The above methods all fail if the test calling them has set up an
environment where the automatic environment configuration conflicts with
test-created state.

## Validation

Adds a couple of basic smoke tests to the app-server test suite.
Follow-ups will migrate more tests to use it.

Adam Perry @ OpenAI · 2026-06-24 01:06:29 +00:00

283bc4cf01

chore: assign amsg_ IDs to agent messages (#29750)

## Why

The `ItemIds` path fills in missing IDs before response items are
persisted and emitted as raw item events. `ResponseItem::AgentMessage`
is part of that same response-item stream, but it was skipped by the
missing-ID repair path, leaving agent messages without stable item IDs
while messages and tool items received generated IDs.

Agent messages recorded through `InterAgentCommunication` also need the
generated ID to survive rollout persistence and resume. Otherwise
clients can observe an `amsg_` ID for the live raw response item, then
see that same persisted agent message lose its item ID after restart.

## What changed

- Assign missing `ResponseItem::AgentMessage` IDs with the `amsg_`
prefix.
- Persist the generated item ID on `InterAgentCommunication` and replay
it back into the reconstructed `ResponseItem::AgentMessage` on resume.
- Keep the persisted ID out of the model-visible inter-agent message
envelope.
- Keep `CompactionTrigger` and `Other` skipped because they do not get
generated item IDs.
- Update session/protocol tests for agent-message ID assignment and
resume preservation.
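
The repair rule above might look roughly like this. The `Item` enum is a simplified stand-in for `ResponseItem`, the `msg_` prefix and the `next_suffix` ID source are assumptions made for illustration, and only the `amsg_` prefix comes from the PR text.

```rust
/// Simplified stand-in for the response-item variants named above.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Message { id: Option<String> },
    AgentMessage { id: Option<String> },
    CompactionTrigger,
    Other,
}

/// Fill in missing IDs in place. Existing IDs are left untouched, and
/// variants that never carry generated IDs are skipped.
pub fn assign_missing_ids(items: &mut [Item], mut next_suffix: impl FnMut() -> String) {
    for item in items.iter_mut() {
        let (prefix, id) = match item {
            Item::Message { id } => ("msg_", id), // hypothetical prefix
            Item::AgentMessage { id } => ("amsg_", id),
            Item::CompactionTrigger | Item::Other => continue,
        };
        if id.is_none() {
            *id = Some(format!("{}{}", prefix, next_suffix()));
        }
    }
}
```

Persisting the ID that was generated, instead of generating a new one on resume, is what keeps a live `amsg_` ID and its replayed counterpart identical.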

## Manual Testing

Run the local dev build using `just c --enable item_ids` to ensure this
code is exercised:


https://github.com/openai/codex/blob/322e33512b2d38d38d705e2ef692a8aca50decac/codex-rs/core/src/session/mod.rs#L2713-L2715

In the `.jsonl` file, I saw entries like:

```json
{
  "timestamp": "2026-06-24T00:44:03.098Z",
  "type": "inter_agent_communication",
  "payload": {
    "id": "amsg_019ef715-849a-7a50-becc-ce63c6a9c994",
```

## Test plan

- `just test -p codex-core
record_inter_agent_communication_preserves_item_id_in_rollout_and_resume`
- `just test -p codex-core
record_inter_agent_communication_sets_turn_id_in_rollout_and_resume`
- `just test -p codex-protocol
inter_agent_communication_response_input_item_preserves_commentary_phase`

Michael Bolin · 2026-06-23 17:57:03 -07:00

97dce078c5

[codex] trace MCP startup latency (#28630)

## Summary

- add trace-level instrumentation around per-server MCP setup, client
construction, initialization, and initial tool listing
- trace Codex Apps tool and server-info cache loads
- attach `server_name` to server-scoped spans so slow startup work can
be attributed to a specific MCP server

## Why

`session_init.mcp_manager_init` can occasionally be slow, but its
existing coarse span does not identify whether time is spent loading the
Codex Apps cache, constructing a client, initializing a transport, or
listing tools. These definition-level spans provide that breakdown
without changing startup behavior.

## Validation

- `just test -p codex-mcp` (87 passed)
- `just test -p codex-rmcp-client` (86 passed, 2 skipped)

rphilizaire-openai · 2026-06-23 17:46:54 -07:00

322e33512b

core: add wait_for_environment for starting environments (#29745)

## Why

With `DeferredExecutor`, a sampling request can begin while an
environment is still starting. The model can see that pending state, but
needs a way to wait for the environment within the same turn before
continuing.

Environment startup is owned by Core, so the wait tool should use the
same request-frozen `StepContext` that advertised the starting
environment. This keeps tool registration and execution tied to the
exact startup operation the model saw, even if live thread state later
changes.

Supersedes #29735.

## What

- register `wait_for_environment` when the current `StepContext`
contains starting environments
- wait on the selected `StartingTurnEnvironment` shared resolution and
return a bounded ready or failed result
- rebuild the next request normally, removing the wait tool and exposing
ready environment tools, or reporting the environment as unavailable
after failure
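
As a rough illustration of a bounded wait on a shared startup result, here is a sketch built on a standard-library channel. The channel, the `WaitOutcome` type, and the `StillStarting` timeout case are assumptions; the PR text only states that the tool returns a bounded ready or failed result.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Outcome reported back to the model. (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
pub enum WaitOutcome {
    Ready,
    Failed(String),
    /// Assumption: the wait gives up after `bound` instead of blocking the turn.
    StillStarting,
}

/// Wait for the startup result, but never longer than `bound`.
pub fn wait_for_environment(
    startup: &Receiver<Result<(), String>>,
    bound: Duration,
) -> WaitOutcome {
    match startup.recv_timeout(bound) {
        Ok(Ok(())) => WaitOutcome::Ready,
        Ok(Err(reason)) => WaitOutcome::Failed(reason),
        Err(RecvTimeoutError::Timeout) => WaitOutcome::StillStarting,
        Err(RecvTimeoutError::Disconnected) => {
            WaitOutcome::Failed("startup task ended without a result".to_string())
        }
    }
}
```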

## Testing

- `just test -p codex-core deferred_executor_`
- verifies the wait tool is replaced by environment-backed tools after
startup
- verifies startup failure removes both the wait tool and unavailable
environment tools while notifying the model

sayan-oai · 2026-06-24 00:35:34 +00:00

61ff4d087e

Support thread-level originator overrides (#29477)

## Why

Work (TPP) threads can be launched from the Desktop app, but if they all
keep the Desktop app's default originator then downstream attribution
cannot distinguish local Work launches from cloud-backed Work launches.
`thread/start.serviceName` already carries that launch signal, while
`SessionMeta.originator` is the durable thread-level value that survives
resume and fork.

This change converts the Desktop Work service names into an effective
originator at thread creation time, persists that originator with the
thread, and keeps using it for later model requests and memory writes.

## What changed

- Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
per-thread originators, while preserving
`CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
- Persist the effective originator in `SessionMeta.originator`, read it
back on resume/fork, and inherit the parent originator for subagent
spawns when there is no persisted session metadata.
- Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
back to the live parent originator when the forked history no longer
includes `SessionMeta`.
- Thread the per-thread originator through Responses headers,
websocket/compaction request paths, thread-store creation, rollout
metadata, and memory stage-one telemetry.
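
The precedence rules above can be summarized in a small sketch. The function names and the mapped originator strings (`codex_work_local`, `codex_work_cloud`) are placeholders, since the PR text does not state the actual values.

```rust
/// Originator chosen at thread creation time: the internal override wins,
/// then the service-name mapping, then the default originator.
pub fn effective_originator(
    internal_override: Option<&str>, // value of CODEX_INTERNAL_ORIGINATOR_OVERRIDE, if set
    service_name: Option<&str>,
    default_originator: &str,
) -> String {
    if let Some(value) = internal_override {
        return value.to_string();
    }
    match service_name {
        // Placeholder originator strings, not the real values.
        Some("CODEX_WORK_LOCAL") => "codex_work_local".to_string(),
        Some("CODEX_WORK_CLOUD") => "codex_work_cloud".to_string(),
        _ => default_originator.to_string(),
    }
}

/// Originator used on resume, fork, or subagent spawn: prefer the persisted
/// `SessionMeta.originator`, else fall back to the live parent originator
/// (for example when a truncated fork no longer contains `SessionMeta`).
pub fn inherited_originator(persisted: Option<&str>, parent_originator: &str) -> String {
    persisted.unwrap_or(parent_originator).to_string()
}
```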

## Verification

- `just test -p codex-core
agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
thread_manager::tests::originator_override_precedes_service_name_remapping`
- `just test -p codex-core
agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
- `just test -p codex-memories-write`
- `just fix -p codex-core -p codex-memories-write`
- `git diff --check`

alexsong-oai · 2026-06-23 17:23:38 -07:00

1acb722e8a

core: reset context for token budget compaction (#29743)

## Why

When `Feature::TokenBudget` is enabled, compaction should behave like
`new_context`: start a fresh context window with the standard injected
context, without asking the server to summarize old history and without
carrying prior user or assistant messages into the next model request.

This is still a compaction operation from the client lifecycle
perspective. Manual `/compact` and auto-compaction should keep the same
observable side effects that clients and hooks expect, including compact
hooks and `TurnItem::ContextCompaction`.

## What changed

- Added `compact_token_budget` to run token-budget manual and inline
auto-compaction through a shared compaction lifecycle.
- Split pending `new_context` requests from forced context-window
startup: `take_new_context_window_request()` consumes pending requests,
and `start_new_context_window()` installs a fresh context window.
- Routed token-budget manual `/compact` and inline auto-compaction to
install a fresh context window locally instead of calling server/local
summarization.
- Preserved compact lifecycle side effects for token-budget compaction
by running pre/post compact hooks and emitting `ContextCompaction` item
start/completion events.
- Updated token-budget tests to assert fresh window IDs, absence of
server-side compaction calls, dropped prior transcript messages/tool
output after reset, and compact hook/item lifecycle behavior.
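
A toy model of the lifecycle described above, to make the observable behavior concrete. The `Session` and `Event` types are invented for illustration, and the relative ordering of hooks and item events is an assumption. The points taken from the PR text are that the prior transcript is dropped, that no summarization call is made, and that the compact hooks and `ContextCompaction` item events still fire.

```rust
/// Observable side effects of a compaction. (Hypothetical event names.)
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    PreCompactHook,
    ContextCompactionStarted,
    ContextCompactionCompleted,
    PostCompactHook,
}

/// Toy session state. (Hypothetical type for illustration.)
#[derive(Debug, Default)]
pub struct Session {
    pub window_number: u64,
    pub history: Vec<String>,
    pub events: Vec<Event>,
    pub server_summarize_calls: u32,
}

impl Session {
    /// Token-budget compaction: install a fresh window locally instead of
    /// summarizing, while keeping the usual compaction side effects.
    pub fn compact_token_budget(&mut self, injected_context: Vec<String>) {
        self.events.push(Event::PreCompactHook);
        self.events.push(Event::ContextCompactionStarted);
        // Fresh window: only the standard injected context survives, and
        // `server_summarize_calls` is deliberately left untouched.
        self.history = injected_context;
        self.window_number += 1;
        self.events.push(Event::ContextCompactionCompleted);
        self.events.push(Event::PostCompactHook);
    }
}
```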

## Testing

- `just test -p codex-core
token_budget_context_uses_new_window_after_compaction`
- `just test -p codex-core token_budget_compaction_runs_compact_hooks`
- `just test -p codex-core
token_budget_mid_turn_auto_compaction_resets_before_active_follow_up`

---------

Co-authored-by: pakrym-oai <pakrym@openai.com>

Michael Bolin · 2026-06-23 16:59:04 -07:00

32b65bbf7a

Update new_context_window instructions (#29739)

Andrey Mishchenko · 2026-06-23 16:52:40 -07:00

3b4186986f

[codex] rename rollout budget error to session budget error (#29744)

## Summary

- rename the rollout-budget exhaustion error from
`RolloutBudgetExceeded` to `SessionBudgetExceeded`
- expose the matching app-server v2 wire value as
`sessionBudgetExceeded`
- regenerate JSON/TypeScript schema fixtures and update the app-server
docs and focused tests

This is a naming-only follow-up to #29715 based on [Pavel's review
suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
Runtime behavior is unchanged.

## Tests

- `just test -p codex-core rollout_budget`
- `just test -p codex-app-server-protocol`
- `just fmt`
- `just write-app-server-schema`

rka-oai · 2026-06-23 16:49:13 -07:00

1ec3def0b5

fix: scope context remaining to body window (#29665)

## Why

With `model_auto_compact_token_limit_scope = "body_after_prefix"`, the
persistent prefix should not count against the active body window.
`get_context_remaining` and the token-budget reminder should report the
same usable body-after-prefix window that auto-compaction uses, rather
than the total token count since the session began.

This is stacked on #29664 so the mechanical move from `turn.rs` is
isolated from the behavior fix.

## What

- Extends `ContextWindowTokenStatus` with `context_remaining_tokens`.
- Updates `get_context_remaining` to use the shared context-window
accounting.
- Adds integration coverage for body-after-prefix reminder timing and
`get_context_remaining` output.
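
The accounting difference can be illustrated with a small sketch. The `LimitScope` enum, the function signature, and the exact arithmetic are assumptions made for illustration. The only behavior taken from the PR text is that the persistent prefix does not count against the window under `body_after_prefix`.

```rust
/// Which tokens count against the limit. (Hypothetical enum for illustration.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LimitScope {
    Total,
    BodyAfterPrefix,
}

/// Tokens still available before the limit is reached. Under
/// `BodyAfterPrefix`, the persistent prefix is excluded from the count.
pub fn context_remaining_tokens(
    scope: LimitScope,
    limit: u64,
    total_tokens: u64,
    prefix_tokens: u64,
) -> u64 {
    let counted = match scope {
        LimitScope::Total => total_tokens,
        LimitScope::BodyAfterPrefix => total_tokens.saturating_sub(prefix_tokens),
    };
    limit.saturating_sub(counted)
}
```

For example, with a limit of 100,000 tokens, 70,000 tokens used in total, and a 30,000-token prefix, this sketch reports 60,000 remaining under `BodyAfterPrefix` and 30,000 under `Total`.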

## Testing

- `just test -p codex-core body_after_prefix_window`
- `just test -p codex-core auto_compact_body_after_prefix`
- `just fix -p codex-core`

Michael Bolin · 2026-06-23 23:08:54 +00:00

77e7ce1374

refactor: extract context window token status (#29664)

## Why

This PR keeps the mechanical helper extraction separate from the
behavior change in #29665. The follow-up needs the token-window
accounting from `turn.rs` in another call path, but reviewing that is
much easier when the helper extraction is separate from the semantic
change.

## What

- Adds `session/context_window.rs` with `ContextWindowTokenStatus`.
- Moves the existing auto-compaction token-status calculation out of
`session/turn.rs`.
- Replaces the duplicated inline remaining-token calculation in
`turn.rs` with `tokens_until_compaction()`.

This PR is intended to be behavior-preserving. The
`get_context_remaining` behavior change is stacked separately in #29665.

## Testing

- `just test -p codex-core auto_compact_body_after_prefix`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/29664).
* #29665
* __->__ #29664

Michael Bolin · 2026-06-23 15:49:30 -07:00

4dde907d27

protocol: separate app and exec RPC ownership (#29714)

## Why

The app-server and exec-server expose separate JSON-RPC APIs, but
exec-server currently sources its serialized protocol and envelope types
through app-server-oriented code. Giving each API an explicit owner
makes the crate boundary legible without introducing shared generic
envelopes.

## What changed

- Added `codex-exec-server-protocol` to own exec DTOs, process IDs, and
JSON-RPC envelopes.
- Updated exec-server clients, transports, handlers, and tests to use
the new crate.
- Exposed app-server's existing JSON-RPC types through a public `rpc`
module while retaining root re-exports.
- Preserved existing wire shapes, including exec `PathUri` behavior.

## Stack

This is PR 1 of 6. Next: [PR
#29721](https://github.com/openai/codex/pull/29721), which moves auth
mode below the app wire boundary.

## Validation

- Exec-server protocol and server coverage passed in the focused
protocol test runs.
- App-server protocol schema fixtures passed.

Adam Perry @ OpenAI · 2026-06-23 22:37:31 +00:00

829f5b6b59

Load executor skills without host path conversion (#29626)

## Why

After #28918, selected skill roots are `PathUri`, but the executor skill
provider still converts them to the app-server host's `AbsolutePathBuf`.
A foreign Windows root therefore cannot be discovered by a Unix host,
and the inverse has the same problem.

This PR keeps executor skill discovery and reads on the filesystem that
owns the selected root while reusing the existing skill rules.

## What changed

- Generalize the existing skill traversal to operate on `PathUri`
through `ExecutorFileSystem`, preserving its depth, directory, symlink,
and sibling-metadata concurrency behavior.
- Add a small environment skill loader that reuses the shared discovery,
frontmatter validation, dependency parsing, product policy, and
prompt-visibility rules.
- Keep the environment id and entrypoint `PathUri` in the skill catalog,
then route `skills.read` back through the same environment filesystem.
- Preserve the executor's path convention when deriving catalog handles,
including literal backslashes in POSIX filenames.
- Resolve plugin namespaces from nearby manifests through URI-native
filesystem reads.
- Cover foreign Windows roots, executor-owned reads, namespaces,
metadata, policy, and path identity.

```text
selected root (PathUri)
        |
        v
shared discovery over ExecutorFileSystem
        |
        v
environment-bound catalog entry --skills.read--> same ExecutorFileSystem
```

No second filesystem abstraction or duplicate traversal implementation
is introduced.
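
To make the idea concrete, here is a minimal sketch of depth-limited discovery that only talks to a filesystem trait and treats roots as URI strings. The trait shape, the `InMemoryFs` test double, and the `SKILL.md` entrypoint name are assumptions for illustration; the real `ExecutorFileSystem` interface is not shown in this PR description.

```rust
use std::collections::BTreeMap;

/// Hypothetical, much-reduced filesystem interface keyed by `file:` URIs.
pub trait ExecutorFileSystem {
    /// Child names of a directory, or `None` if the URI is not a directory.
    fn read_dir(&self, dir_uri: &str) -> Option<Vec<String>>;
}

/// In-memory stand-in for an executor-owned filesystem.
#[derive(Default)]
pub struct InMemoryFs {
    pub dirs: BTreeMap<String, Vec<String>>,
}

impl ExecutorFileSystem for InMemoryFs {
    fn read_dir(&self, dir_uri: &str) -> Option<Vec<String>> {
        self.dirs.get(dir_uri).cloned()
    }
}

/// Walk at most `max_depth` levels below `root_uri` and collect entrypoint
/// URIs. Paths are never converted to the host's native path type.
pub fn discover_skills(
    fs: &dyn ExecutorFileSystem,
    root_uri: &str,
    max_depth: usize,
) -> Vec<String> {
    let mut found = Vec::new();
    let mut pending = vec![(root_uri.trim_end_matches('/').to_string(), 0usize)];
    while let Some((dir, depth)) = pending.pop() {
        let children = match fs.read_dir(&dir) {
            Some(children) => children,
            None => continue, // a file, or a directory that does not exist
        };
        for name in children {
            let child = format!("{}/{}", dir, name);
            if name == "SKILL.md" {
                found.push(child);
            } else if depth < max_depth {
                pending.push((child, depth + 1));
            }
        }
    }
    found.sort();
    found
}
```

Because the walk only joins URI segments and asks the trait for directory listings, a root such as `file:///C:/skills` works the same way whether the host running the code is Unix or Windows.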

## Stack

1. #29614 — add lexical `PathUri` containment.
2. #29620 — share URI-native manifest path resolution.
3. #28918 — keep selected plugin roots and resources URI-native.
4. **This PR** — load executor skills without host path conversion.
5. #29628 — resolve executor MCP working directories without host path
conversion.

jif · 2026-06-23 23:26:06 +01:00

220f5b76b2

code-mode: Remove Session::is_alive() (#29732)

Remove this unused API. This API is insidious in that it implies that
alive state should be determinable from the caller, and implies that a
preflight should indicate routing. Let's drop this, and handle errors
correctly from a failed session in the future.

Channing Conger · 2026-06-23 15:14:13 -07:00

db6e676afc

[codex] surface rollout budget exhaustion (#29715)

## Summary
- surface shared rollout-budget exhaustion as
`CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
- map it through the existing `CodexErrorInfo` and app-server v2
`codexErrorInfo` path
- keep local compaction from retrying after the shared rollout budget is
exhausted

This gives app-server clients a stable `rolloutBudgetExceeded` error
they can classify without guessing from `status="interrupted"`.

## Tests
- `just test -p codex-core rollout_budget`

rka-oai · 2026-06-23 15:01:28 -07:00

bbbea91960

[codex] define code mode host handshake protocol (#29515)

## Summary

- add validated protocol-version, capability, and session identifier
types
- define explicit `ClientToHost` and `HostToClient` JSON envelopes for
connection negotiation and session open/close acknowledgements
- reject invalid states and unknown fields during decoding, with
explicit wire-format and round-trip coverage

## Why

This establishes the transport-neutral encoding shape needed to build
and test the new code-mode host incrementally. Cell, tool callback, and
failure-domain messages are intentionally deferred until their actors
and behavior tests establish the required semantics.

This is additive protocol scaffolding and does not change the current
production code-mode implementation.

## Validation

Channing Conger · 2026-06-23 14:57:44 -07:00

be0dfcfbea

Make selected plugin roots URI-native (#28918)

## Why

Selected capability roots belong to the executor filesystem, not the
app-server host. Converting their path strings into the host's native
`Path` breaks whenever the two machines use different path conventions,
such as a Windows executor behind a Unix app-server.

This PR establishes `PathUri` as the selected-plugin boundary so the
executor remains authoritative for its paths.

## What changed

- Require `selectedCapabilityRoots[].location.path` to be a canonical
`file:` URI and deserialize it directly as `PathUri`; native path
strings are rejected.
- Update the app-server schema, generated TypeScript, examples, and
request coverage for the URI contract.
- Keep selected roots, resolved plugin locations, manifest paths, and
manifest resources as `PathUri`.
- Inspect and read plugin roots and manifests only through the selected
environment's `ExecutorFileSystem`.
- Parse executor manifests with the shared URI-native parser from #29620
instead of projecting them onto the host filesystem.
- Enforce resource containment lexically and preserve the root URI's
POSIX or Windows path convention.
- Cover foreign Windows plugin roots and URI-native manifest resources.

```text
thread/start
  selectedCapabilityRoots[].location.path = "file:///C:/plugins/demo"
                              | PathUri
                              v
                    ExecutorFileSystem
                              |
                              +--> plugin.json
                              +--> manifest resources
```

This PR stops at the shared selected-plugin representation. The next two
PRs remove the remaining host-path projections in the skill and MCP
consumers.
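
Lexical containment over `file:` URIs, as mentioned in the change list above, can be sketched like this. It is a simplified illustration rather than the `PathUri` implementation: it accepts only `file:///` URIs, does not decode percent-escapes, and the function names are hypothetical.

```rust
/// Split the path of a `file:///` URI into segments. Returns `None` for
/// anything else, including native path strings.
fn file_uri_segments(uri: &str) -> Option<Vec<&str>> {
    let path = uri.strip_prefix("file:///")?;
    Some(path.split('/').filter(|segment| !segment.is_empty()).collect())
}

/// Purely lexical check: the candidate must extend the root segment by
/// segment and must not contain `.` or `..`. No filesystem is consulted,
/// so the check behaves the same for POSIX and Windows roots.
pub fn is_lexically_contained(root_uri: &str, candidate_uri: &str) -> bool {
    let (root, candidate) = match (
        file_uri_segments(root_uri),
        file_uri_segments(candidate_uri),
    ) {
        (Some(root), Some(candidate)) => (root, candidate),
        _ => return false,
    };
    if candidate.iter().any(|segment| *segment == "." || *segment == "..") {
        return false;
    }
    candidate.len() >= root.len() && candidate[..root.len()] == root[..]
}
```

Comparing whole segments rather than string prefixes is what keeps `file:///C:/plugins/demo2` from being treated as inside `file:///C:/plugins/demo`.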

## Stack

1. #29614 — add lexical `PathUri` containment.
2. #29620 — share URI-native manifest path resolution.
3. **This PR** — keep selected plugin roots and resources URI-native.
4. #29626 — load executor skills without host path conversion.
5. #29628 — resolve executor MCP working directories without host path
conversion.

jif · 2026-06-23 22:51:19 +01:00

2e69966cd8

core: persist initial context window metadata (#29519)

## Why

PR #29494 made context-window IDs visible to the model by wrapping the
token-budget window payload in `<context_window>`, but rollout JSONL
consumers still could not see the initial window identity by tailing the
session file. Compacted rollout items carry window IDs only after
compaction has happened, so a session with no compaction had no durable
JSONL record for window 0.

This change gives tailing consumers a stable initial-window record at
session creation time.

## What Changed

- Added `session_meta.context_window.window_id` for the initial
context-window identity.
- `CreateThreadParams` now requires `initial_window_id: String`, so
thread-store callers cannot accidentally create new threads without
window-0 metadata.
- Live thread creation derives the persisted initial window ID from the
same `AutoCompactWindowIds` used to initialize `SessionState`, keeping
runtime state and JSONL metadata aligned.
- Rollout reconstruction uses `session_meta.context_window.window_id` as
the initial-window fallback and derives `window_number = 0`,
`first_window_id = window_id`, and `previous_window_id = None`
internally.
- Fork reconstruction intentionally uses the same rollout reconstruction
path; consumers that need to distinguish copied initial-window metadata
can use the rollout `thread_id`.
- Legacy compactions without `window_number` still use compaction-count
fallback accounting instead of being reset to window 0 by the
initial-window fallback.
- Compacted rollout metadata still takes precedence once compaction
records exist, preserving the richer chain fields there.

## JSONL Shape

Real rollout JSONL is one object per line. This example is expanded for
readability, but shows the new initial `session_meta.context_window`
record followed by the existing compacted rollout item shape that also
carries window IDs:

```jsonl
{
  "timestamp": "2026-06-22T12:00:00.000Z",
  "type": "session_meta",
  "payload": {
    "session_id": "<THREAD_ID>",
    "id": "<THREAD_ID>",
    "timestamp": "2026-06-22T12:00:00.000Z",
    "cwd": "/repo",
    "originator": "codex",
    "cli_version": "0.0.0",
    "source": "cli",
    "model_provider": "<MODEL_PROVIDER>",
    "context_window": {
      "window_id": "<INITIAL_WINDOW_ID>"
    }
  }
}
...
{
  "timestamp": "2026-06-22T12:34:56.000Z",
  "type": "compacted",
  "payload": {
    "message": "<COMPACTION_SUMMARY>",
    "replacement_history": [
      "..."
    ],
    "window_number": 1,
    "first_window_id": "<INITIAL_WINDOW_ID>",
    "previous_window_id": "<INITIAL_WINDOW_ID>",
    "window_id": "<NEXT_WINDOW_ID>"
  }
}
```

The nested `context_window` object is intentional: it gives rollout
consumers a stable namespace for context-window metadata while only
writing the non-derivable initial `window_id`. For the initial window,
`window_number`, `first_window_id`, and `previous_window_id` are derived
internally instead of being written to the rollout.
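
The derivation for the initial window might be expressed as follows. `WindowIdentity` and `reconstruct_window` are hypothetical names, and the sketch leaves out the legacy compaction-count fallback described earlier.

```rust
/// Context-window identity as seen by a rollout consumer.
/// (Hypothetical type; field names follow the JSONL keys above.)
#[derive(Clone, Debug, PartialEq)]
pub struct WindowIdentity {
    pub window_number: u64,
    pub window_id: String,
    pub first_window_id: String,
    pub previous_window_id: Option<String>,
}

/// Compacted rollout metadata wins when present. Otherwise the identity is
/// derived from `session_meta.context_window.window_id` alone.
pub fn reconstruct_window(
    initial_window_id: Option<&str>,
    latest_compacted: Option<&WindowIdentity>,
) -> Option<WindowIdentity> {
    if let Some(compacted) = latest_compacted {
        return Some(compacted.clone());
    }
    initial_window_id.map(|id| WindowIdentity {
        window_number: 0,
        window_id: id.to_string(),
        first_window_id: id.to_string(),
        previous_window_id: None,
    })
}
```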

## Verification

- `just test -p codex-protocol`
- `just test -p codex-rollout
recorder_materializes_on_flush_with_pending_items`
- `just test -p codex-core reconstruct_history`
- `just test -p codex-core
record_initial_history_reconstructs_forked_transcript`
- `just test -p codex-thread-store`
- `just test -p codex-state`
- `just test -p codex-app-server
thread_read_returns_summary_without_turns`
- `just test -p codex-rollout persistence_metrics`

Michael Bolin · 2026-06-23 21:50:50 +00:00

01f89c8c59

path-uri: remove legacy path deserialization (#29158)

## Why

I'd originally added `PathUri` legacy path deserialization thinking we'd
want it for having `PathUri` in public app-server APIs. Since then we've
added `LegacyAppPathString` to handle the messy conversions that we need
for backcompat. It's confusing for `PathUri` to support deserializing
legacy paths when we don't yet want to actually expose app-server
callers or rollout storage to the new URI format.

Stacked on top of #29472 to avoid breaking compatibility in case those
types ended up stored somewhere for someone.

## What changed

- Parse deserialized `PathUri` values exclusively as valid `file:` URIs.
- Replace legacy acceptance coverage with rejection coverage for
top-level filesystem paths and sandbox working directories.
- Serialize CWDs in hand-built exec-server process requests as `PathUri`
values.

Adam Perry @ OpenAI · 2026-06-23 21:47:00 +00:00

c26f961b85

core tests: rename automatic environment builder (#29728)

## Why

Use a clearer name for what happens when this helper sets up a test
environment.

## What

- Rename the builder and its harness wrapper to use `auto_env` instead
of `remote_env` because the helper will set up a local environment if
configured by the build system.

Adam Perry @ OpenAI · 2026-06-23 21:45:06 +00:00

5283522939

test: branch on target OS instead of runner flavor (#29712)

## Why

Core tests should branch on the executor's operating system, not on
runner details such as Docker or Wine. This keeps platform behavior
stable as new test backends are added and reserves Wine-specific skips
for actual runner debt.

## What

- Add `TestTargetOs` and target/host-aware skip helpers while keeping
`TestEnvironment` internal.
- Replace topology enum access with remote predicates and a narrow
Docker accessor.
- Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
document the skip taxonomy.
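
The distinction between OS-semantic skips and runner-specific skips can be sketched as two separate helpers. Apart from the `TestTargetOs` and `TestEnvironment` names, everything here (the variants, the `Runner` enum, and both helper functions) is an assumption for illustration.

```rust
/// Operating system of the executor under test.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TestTargetOs {
    Linux,
    MacOs,
    Windows,
}

/// How the executor is hosted. (Hypothetical enum for illustration.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Runner {
    Local,
    Docker,
    Wine,
}

#[derive(Clone, Copy, Debug)]
pub struct TestEnvironment {
    pub target_os: TestTargetOs,
    pub runner: Runner,
}

/// OS-semantic skip: depends only on the target OS, so it behaves the same
/// on a native Windows executor and on a Wine-backed one.
pub fn skip_for_target_os(
    env: &TestEnvironment,
    unsupported: &[TestTargetOs],
) -> Option<String> {
    if unsupported.contains(&env.target_os) {
        Some(format!("not supported on {:?} targets", env.target_os))
    } else {
        None
    }
}

/// Runner-debt skip: reserved for gaps in one specific test backend.
pub fn skip_for_runner_gap(env: &TestEnvironment, runner: Runner, reason: &str) -> Option<String> {
    if env.runner == runner {
        Some(reason.to_string())
    } else {
        None
    }
}
```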

## Validation

- `just test -p core_test_support`
- `just test -p codex-core
remote_test_env_can_connect_and_use_filesystem`
- `bazel test //codex-rs/core:core-all-wine-exec-test
--test_output=errors` reached test execution; unrelated existing
view-image, path, and timing failures remain.
- `just test -p codex-core` and `just test` reached broad test
execution; this checkout has unrelated helper, sandbox, and timing
failures.

Adam Perry @ OpenAI · 2026-06-23 14:27:13 -07:00

9a79536e6b

6841 Commits