codex

feat(app-server): add history_mode to thread (#29927 )

## Description

This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
This will be stored in `SessionMeta` in the JSONL rollout file and as a
new column in the SQLite thread_metadata table, and exposed on
`thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored
metadata, local/in-memory stores, rollout metadata extraction, and the
existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and
`thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported
for legacy full-history reads, `load_history`, live resume, and create
paths.
- Regenerated app-server schema fixtures and added
protocol/state/thread-store/app-server coverage for persistence and
fail-closed behavior.

## Compatibility floor
Because users may be running various versions of Codex binaries on the
same machine (TUI, Codex App, etc.), we will need to establish a
compatibility floor for upcoming paginated threads, which will change
how thread storage reads and writes work.

The overall plan here:
```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that
encounters a persisted `paginated` thread before it knows how to fully
support paginated history. In app-server, if a thread is `paginated`, we
will:

- allow metadata-only discovery paths like `thread/list` and
`thread/read(includeTurns=false)`, so clients can still see the thread
and inspect its `historyMode`
- reject legacy full-history/live-thread paths like
`thread/read(includeTurns=true)` and `thread/resume` with an unsupported
JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations
that would need to load or replay the full thread history for a
paginated thread. That gives us the behavior we want for Release N:
future paginated threads are visible, but this binary fails closed
instead of trying to operate on them as if they were legacy threads.
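
The fail-closed rule above can be sketched roughly as follows. This is a minimal illustration, not the app-server code: `UnsupportedHistoryMode`, `METADATA_ONLY_OPS`, and `check_operation` are hypothetical names, and the sketch assumes metadata-only discovery stays allowed even for unrecognized modes.

```python
from enum import Enum


class UnsupportedHistoryMode(Exception):
    """Raised when this binary cannot safely operate on a thread."""


class HistoryMode(Enum):
    LEGACY = "legacy"
    PAGINATED = "paginated"


# Hypothetical grouping; the strings mirror the RPCs named in the description.
METADATA_ONLY_OPS = {"thread/list", "thread/read(includeTurns=false)"}


def parse_history_mode(raw):
    """Parse a persisted value without ever falling back to legacy."""
    try:
        return HistoryMode(raw)
    except ValueError:
        raise UnsupportedHistoryMode(f"unknown historyMode: {raw!r}") from None


def check_operation(raw_mode, op):
    """Allow an operation only if a legacy-only binary can serve it."""
    if op in METADATA_ONLY_OPS:
        return  # discovery never needs to load the full history
    mode = parse_history_mode(raw_mode)
    if mode is not HistoryMode.LEGACY:
        raise UnsupportedHistoryMode(f"{op} is unsupported for {mode.value} threads")
```

The key design point is that the parser raises on anything it does not recognize, so a future mode can never be mistaken for `legacy`.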

Owen Lin · 2026-06-26 09:12:42 -07:00

5267e805fb

[codex] Restore thread recency with compatible migration history (#28671 )

## Summary

- Revert #28655, restoring the thread `recencyAt` behavior introduced by
#27910.
- Move `threads_recency_at` to migration 0039 so it no longer collides
with `external_agent_config_imports` at version 0038.
- Repair databases that already applied the recency migration as version
38 by moving the matching migration-history row to version 39 before
SQLx validation. The current version-38 migration can then apply
normally.
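
As a rough illustration of the repair step, the sketch below moves a mis-numbered history row before validation would run. The `migration_history` table, its columns, and the match on the description are simplified stand-ins chosen for this example; the real SQLx bookkeeping table and the matching criteria used by the PR are not shown in this description.

```python
import sqlite3


def repair_recency_version(conn):
    """Move a recency migration recorded as version 38 to version 39.

    Returns True if a row was moved. Table and column names are illustrative.
    """
    row = conn.execute(
        "SELECT description FROM migration_history WHERE version = 38"
    ).fetchone()
    if row is None or row[0] != "threads_recency_at":
        return False  # nothing recorded at 38, or 38 is already the right migration
    with conn:  # commit the renumbering atomically
        conn.execute("UPDATE migration_history SET version = 39 WHERE version = 38")
    return True
```

After the row moves to 39, version 38 is free again, so the migration that now owns that number can apply as usual.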

## Validation

- `just test -p codex-state
migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
- `just test -p codex-state -p codex-rollout -p codex-thread-store -p
codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
could not open the machine's existing read-only incident database at
`~/.codex/sqlite/state_5.sqlite`.
- `just fix -p codex-state`
- `just fmt`
- Verified that state migration versions are unique.

Jeremy Rose · 2026-06-17 18:52:18 +00:00

7dc7096ae1

Revert thread recencyAt for sidebar ordering (#28655 )

## Why

Revert #27910 to remove the newly introduced thread `recencyAt`
persistence and API behavior from `main`.

## What changed

This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
including the state migration, thread-store propagation, app-server API
surface, generated schemas, and related tests.

## Validation

Not run before opening; relying on CI for the initial fast signal.

pakrym-oai · 2026-06-16 21:39:30 -07:00

cb15c64760

Add thread recencyAt for sidebar ordering (#27910 )

## Summary

Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
sort key for product recency ordering while preserving the existing
meaning of `updatedAt` as the latest persisted thread mutation.

This is the server-side alternative to #27697. Rather than narrowing
`updatedAt`, clients can sort the sidebar by `recency_at` and continue
treating `updatedAt` as mutation time.

Paired Codex Apps PR:
[openai/openai#1024599](https://github.com/openai/openai/pull/1024599)

## Contract

- `recencyAt` initializes when a thread is created.
- A turn start advances `recencyAt` monotonically.
- Commentary, agent output, tool results, token/accounting updates, turn
completion, archive, unarchive, resume, and generic metadata writes do
not advance it.
- `updatedAt` retains its existing behavior and continues to advance for
persisted thread mutations.
- Current servers populate `recencyAt`; the response field is optional
in generated TypeScript so clients connected to older servers can fall
back to `updatedAt`.
- Filesystem-only fallback uses existing updated/mtime ordering when
SQLite is unavailable.
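
A small in-memory sketch of the contract above may help. The `ThreadTimes` class and function names are hypothetical, and treating a turn start as also advancing `updatedAt` is an assumption of this example rather than something the description states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreadTimes:
    updated_at: int
    recency_at: Optional[int] = None  # may be missing when talking to older servers


def on_turn_start(t, now):
    """A turn start advances recency monotonically (never backwards)."""
    t.recency_at = max(t.recency_at or 0, now)
    t.updated_at = max(t.updated_at, now)  # assumption: also a persisted mutation


def on_other_mutation(t, now):
    """Agent output, archive, resume, and similar writes touch updatedAt only."""
    t.updated_at = max(t.updated_at, now)


def sidebar_sort_key(t):
    """Client-side fallback: use updatedAt when recencyAt is absent."""
    return t.recency_at if t.recency_at is not None else t.updated_at
```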

## Persistence and compatibility

Migration 0038 adds second- and millisecond-precision recency columns,
backfills them from the existing updated timestamp, creates list
indexes, and includes an insert trigger so older binaries writing to a
migrated database seed recency without causing later mutations to
advance it.

Generic metadata upserts preserve existing recency values. Turn-start
updates use a dedicated monotonic touch, and process-local allocation
keeps millisecond cursor values unique. State DB list, search, read,
filtered-list repair, rollout fallback propagation, and app-server
conversions all carry the new field.
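
The insert-trigger idea can be illustrated with a reduced schema. The table layout, column names, and trigger name below are assumptions made for this sketch and are not copied from migration 0038.

```python
import sqlite3

SCHEMA = """
CREATE TABLE threads (
    id TEXT PRIMARY KEY,
    updated_at_ms INTEGER NOT NULL,
    recency_at_ms INTEGER
);
-- Seed recency for writers that do not know about the new column.
CREATE TRIGGER threads_seed_recency AFTER INSERT ON threads
WHEN NEW.recency_at_ms IS NULL
BEGIN
    UPDATE threads SET recency_at_ms = NEW.updated_at_ms WHERE id = NEW.id;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# An "older binary" inserts without recency, then performs a later mutation.
conn.execute("INSERT INTO threads (id, updated_at_ms) VALUES ('t1', 1000)")
conn.execute("UPDATE threads SET updated_at_ms = 2000 WHERE id = 't1'")

row = conn.execute("SELECT updated_at_ms, recency_at_ms FROM threads").fetchone()
print(row)  # recency stays at the seeded value; only updated_at_ms moved
```

Because the trigger fires only on insert, the later update leaves the seeded recency value untouched.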

## API

`Thread` responses include:

```ts
recencyAt?: number
```

`thread/list` and `thread/search` accept:

```json
{ "sortKey": "recency_at" }
```

Generated TypeScript and JSON schemas are included.

## Validation

- `just test -p codex-state` — 146 passed
- `just test -p codex-rollout` — 69 passed
- `just test -p codex-thread-store` — 81 passed
- `just test -p codex-app-server-protocol` — 231 passed
- Focused app-server list ordering, response mapping, archive/unarchive,
and resume lifecycle tests passed
- Scoped `just fix` for state, rollout, thread-store,
app-server-protocol, and app-server
- `just fmt`
- `git diff --check`
- Independent correctness, simplicity, elegance, security, and
test-quality reviews; actionable ordering, lifecycle, query-projection,
and timestamp-uniqueness findings were addressed

Jeremy Rose · 2026-06-16 17:06:22 -07:00

fac3158c2a

[codex] Record external agent import results (#28396 )

## Summary
- restore `externalAgentConfig/import/progress` notifications while
keeping `externalAgentConfig/import/completed` as the must-deliver event
- persist completed external-agent config imports in state DB by
`importId`, including concrete success/failure details for config,
AGENTS.md, skills, plugins, MCP servers, subagents, hooks, commands, and
sessions
- add `externalAgentConfig/import/readHistories` so clients can recover
persisted import results after missing the live completion notification
- include `errorType` on import failures in protocol
responses/notifications and persisted DB JSON so future code can
classify failures without another wire/storage shape change

## Validation
- `git diff --check`
- `just test -p codex-state external_agent_config_imports`
- `just test -p codex-app-server-protocol`
- `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-sqlite-read-details
just test -p codex-app-server
external_agent_config_import_sends_completion_notification_for_sync_only_import`

Also ran earlier broader checks before publishing:
- `just test -p codex-state`
-

`CODEX_SQLITE_HOME=/private/tmp/codex-app-server-external-agent-test-sqlite
just test -p codex-app-server external_agent_config`
- `just test -p codex-external-agent-migration`

charlesgong-openai · 2026-06-15 23:17:24 -07:00

314fa3d25b

feat(app-server): persist remote-control desired state (#27445 )

## Why

Remote-control runtime enablement and persisted enrollment preference
were represented by separate flags. That made startup rehydration, RPC
persistence, and new-enrollment seeding race with one another, and it
did not cleanly distinguish runtime-only CLI or daemon starts from
durable app-server RPC changes.

## What Changed

- Replace the parallel enablement, seed, and rehydration flags with one
transport-owned `RemoteControlDesiredState`.
- Add nullable enrollment-scoped persistence and preserve existing
preferences during enrollment upserts.
- Rehydrate plain startup only after auth and client scope resolve,
without overwriting a concurrent RPC transition.
- Make ordinary `remoteControl/enable` and `remoteControl/disable`
durable while retaining `ephemeral: true` for runtime-only callers.
- Have the daemon explicitly request ephemeral enablement and regenerate
the app-server schemas.

## Verification

- Covered migration and `NULL`/`0`/`1` persistence round trips.
- Covered plain-start rehydration and runtime-only versus durable
enrollment seeding.
- Covered durable enable, durable disable, and ephemeral enable through
app-server RPC.
- Covered the daemon's exact `{ "ephemeral": true }` request payload.

Related issue: N/A (internal remote-control persistence architecture
change).

Anton Panasenko · 2026-06-11 21:28:52 -07:00

d61dfeb23a

Index visible thread list ordering (#27391 )

## Summary

- add partial SQLite indexes for visible thread lists ordered by
creation or update time
- match the `archived` and non-empty `preview` filters used by
`thread/list`
- add query-plan coverage for both supported sort orders

## Query performance

Benchmarked the production query shape on a snapshot of my database with
~10k threads before and after applying these indexes. The query selected
the full thread projection with `archived = 0`, `preview <> ''`, the
`openai` provider filter, and a page size of 201. Results are the mean
of 30 runs after 5 warmups:

| Query | Before | After | Speedup |
| --- | ---: | ---: | ---: |
| First page, `created_at_ms DESC` | 132.3 ms | 15.1 ms | 8.78x |
| First page, `updated_at_ms DESC` | 123.6 ms | 15.5 ms | 7.99x |
| Cursor page near row 4,000, `created_at_ms DESC` | 51.8 ms | 16.8 ms | 3.07x |
| Cursor page near row 4,000, `updated_at_ms DESC` | 52.4 ms | 17.1 ms | 3.06x |

Before this change, SQLite used `idx_threads_archived`, filtered the
candidate rows, and built a temporary B-tree for the requested ordering.
With the partial indexes, SQLite reads matching visible rows directly in
timestamp order and stops at the page limit. `EXPLAIN QUERY PLAN` no
longer reports `USE TEMP B-TREE FOR ORDER BY`.

The result rows were identical before and after. The two partial indexes
occupy approximately 168 KiB combined on this snapshot.
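
The plan change can be reproduced on a toy table. This is a sketch under assumptions: the reduced schema and the index name `idx_threads_visible_created` are made up for the example, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads (id INTEGER PRIMARY KEY, archived INTEGER NOT NULL,"
    " preview TEXT NOT NULL, created_at_ms INTEGER NOT NULL)"
)

QUERY = (
    "SELECT id, preview FROM threads WHERE archived = 0 AND preview <> ''"
    " ORDER BY created_at_ms DESC LIMIT 201"
)


def plan(conn):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[3] for row in rows)


before = plan(conn)

# Partial index whose WHERE clause matches the list filters exactly.
conn.execute(
    "CREATE INDEX idx_threads_visible_created ON threads (created_at_ms DESC)"
    " WHERE archived = 0 AND preview <> ''"
)
after = plan(conn)

print("before:", before)
print("after: ", after)
```

The partial index is only eligible when the query repeats the index's filter terms as literals, which is why the index definition mirrors the `thread/list` filters.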

## Performance under contention

I noticed this issue on a high-contention database and tried to validate
the performance in that context using simulated contention.

A synthetic SQLite benchmark ran five concurrent readers, matching the
state database pool size, and fetched 101 rows per query. Results are
the median of three runs on fresh copies of the same database snapshot:

| Query | Before | After |
| --- | ---: | ---: |
| `created_at_ms` mean latency under saturation | 328 ms | 12 ms |
| `created_at_ms` throughput | 16 queries/s | 412 queries/s |
| `updated_at_ms` mean latency under saturation | 336 ms | 14 ms |
| `updated_at_ms` throughput | 15 queries/s | 357 queries/s |

For a burst of 100 queries queued through five connections, p95
completion time fell from 6.90 seconds to 226 ms for `created_at_ms`,
and from 6.31 seconds to 473 ms for `updated_at_ms`.

## Validation

- `just test -p codex-state` (135 tests passed)
- query-plan regression covers created-at and updated-at ordering,
requires the corresponding index, and rejects `TEMP B-TREE`
- `just fmt`

Zanie Blue · 2026-06-10 11:52:17 -05:00

2ef007dc1a

Move memory state to a dedicated SQLite DB (#24591 )

## Summary

Generated memory rows and their stage-one/stage-two job state currently
live in `state_5.sqlite` alongside thread metadata. That makes memory
cleanup and regeneration share the main state schema even though those
rows are memory-pipeline data and can be rebuilt independently from the
durable thread records.

This PR moves the memory-owned tables into a dedicated
`memories_1.sqlite` runtime database while keeping thread metadata in
`state_5.sqlite`.

## Changes

- Adds a separate memories DB runtime, migrator, path helpers, telemetry
kind, and Bazel compile data for `state/memory_migrations`.
- Introduces `MemoryStore` behind `StateRuntime::memories()` and moves
memory table/job operations onto that store.
- Drops the old memory tables from the state DB and recreates their
schema in `state/memory_migrations/0001_memories.sql`.
- Updates memory startup, citation usage tracking, rollout pollution
handling, `debug clear-memories`, and app-server `memory/reset` to
operate through the memories DB.
- Preserves cross-DB behavior by hydrating thread metadata from the
state DB when selecting visible memory outputs and checking stage-one
staleness.

## Verification

- Added/updated `codex-state` tests for deleted-thread memory visibility
and already-polluted phase-two enqueue behavior.
- Updated `debug clear-memories`, app-server `memory/reset`, and
memories startup tests to seed and assert memory rows through
`memories_1.sqlite`.

jif-oai · 2026-05-26 20:07:25 +02:00

aad59a0916

feat: dedicated goal DB (#23300 )

## Why

Thread goals are moving toward extension-owned runtime behavior, but
their persisted state was still stored in the shared state database.
This makes the goal store harder to isolate and keeps future storage
splits tied to ad hoc runtime plumbing.

This PR gives goals their own SQLite database while keeping the existing
`StateRuntime` entry point. The goal is to make this the pattern for
adding more dedicated runtime databases later.

This also reduces load on the existing DB and reduces contention.

## Limitation
Thread preview from goal is not supported anymore. I'm looking into this
[EDIT]: solved

## What changed

- Added a dedicated `goals_1.sqlite` database with its own
`goals_migrations` directory.
- Moved `thread_goals` creation into the goals DB migration set.
- Dropped the old `thread_goals` table from the main state DB with a
normal state migration. There is intentionally no backfill for existing
goal rows.
- Changed `GoalStore` to be backed only by the goals DB pool.
- Removed the old goal-write side effect that filled empty
`threads.preview` values from the goal objective.
- Added shared runtime DB path metadata so startup, telemetry, `codex
doctor`, and repair handling can include future DBs without bespoke path
lists.
- Updated Bazel compile data so the new goals migration directory is
available to `sqlx::migrate!`.

## Verification

- `cargo check --tests -p codex-state -p codex-cli -p codex-core -p
codex-app-server`
- `just fix -p codex-state`
- `just fix -p codex-cli`
- `just fix -p codex-app-server`

jif-oai · 2026-05-19 11:11:41 +02:00

ba57aab13a

goal: pause continuation loops on usage limits and blockers (#23094 )

Addresses #22833, #22245, #23067

## Why
`/goal` can keep synthesizing turns even when the next turn cannot make
meaningful progress. Hard usage exhaustion can replay failing turns, and
repeated permission or external-resource blockers can keep burning
tokens while waiting for user or system intervention.

## What changed
- Add resumable `blocked` and `usageLimited` goal states. As with
`paused`, goal continuation stops with these states.
- Move to `usageLimited` after usage-limit failures.
- Allow the built-in `update_goal` tool to set `blocked` only under
explicit repeated-impasse guidance. Updated the goal continuation prompt
to specify that the agent should use `blocked` only when it has made at
least three attempts to get past an impasse.
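
A compact sketch of the state handling described above, under stated assumptions: in the PR the model itself decides when to set `blocked` through `update_goal`, whereas this example collapses that decision into a hypothetical `impasse_attempts` counter, and `next_status` and `should_continue` are illustrative names.

```python
# States that stop automatic continuation until the goal is resumed.
STOPPED_STATES = {"paused", "blocked", "usageLimited"}

# Per the prompt guidance: at least three attempts before declaring `blocked`.
MIN_ATTEMPTS_BEFORE_BLOCKED = 3


def next_status(status, *, usage_limit_hit=False, impasse_attempts=0):
    """Compute the goal status after a turn (illustrative only)."""
    if status != "active":
        return status  # stopped goals stay stopped until explicitly resumed
    if usage_limit_hit:
        return "usageLimited"
    if impasse_attempts >= MIN_ATTEMPTS_BEFORE_BLOCKED:
        return "blocked"
    return "active"


def should_continue(status):
    """Only active goals synthesize another turn."""
    return status == "active" and status not in STOPPED_STATES
```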

Most of the files touched by this PR are because of the small app server
protocol update.

## Validation

I manually reproduced a number of situations where an agent can run into
a true impasse and verified that it properly enters `blocked` state. I
then resumed and verified that it once again entered `blocked` state
several turns later if the impasse still exists.

I also manually reproduced the usage-limit condition by creating a
simulated responses API endpoint that returns 429 errors with the
appropriate error message. Verified that the goal runtime properly moves
the goal into `usageLimited` state and the TUI updates appropriately.
Verified that `/goal resume` resumes (and immediately goes back into
`usageLimited` state if appropriate).


## Follow-up PRs

Small changes will be needed to the GUI clients to properly handle the
two new states.

Eric Traut · 2026-05-18 11:28:53 -07:00

0d344aca9b

Use goal preview metadata for goal-first threads (#21981 )

Fixes #20792

## Why

`/goal`-first threads are valid resumable threads, but they can be
missing from `codex resume` and app recents because discovery depends on
metadata derived from a normal first user message.

PR #21489 attempted to fix this by using the goal objective as
`first_user_message`. Review feedback pointed out that
`first_user_message` does more than provide visible text today: it gates
listing, supplies preview text, and participates in deciding whether a
later title should surface as a distinct thread name. Reusing it for the
goal objective could leave a `/goal`-first thread with
`first_user_message=<goal>` and `title=<later prompt>`, even though the
goal should only provide the initial visible preview.

This PR follows that feedback: it keeps `first_user_message` as is but
introduces a new `preview` field to separate concerns. The `preview`
field is populated from the first user message or the goal objective. We
can extend it in the future to include other sources.

## What Changed

- Added internal thread `preview` metadata in `codex-state`, including a
SQLite migration that backfills from `first_user_message` and from
existing `thread_goals` objectives when needed.
- Treated `ThreadGoalUpdated` as preview-bearing metadata so goal-first
threads can be listed and searched without mutating
`first_user_message`.
- Updated rollout listing, state queries, thread-store conversion, and
app-server mapping to use preview metadata while continuing to expose
the existing public `preview` field.
- Preserved title/name distinctness behavior around literal
`first_user_message`, so a later normal prompt after `/goal` does not
surface as a separate name just because the goal supplied the initial
preview.
- Preserved compatibility for older/internal metadata writes by deriving
preview from `first_user_message` when explicit preview metadata is
absent.
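
The preview derivation can be sketched as a simple fallback chain. The function names are hypothetical, and the priority order (explicit preview, then first user message, then goal objective) is an assumption of this example.

```python
from typing import Optional


def derive_preview(
    explicit_preview: Optional[str],
    first_user_message: Optional[str],
    goal_objective: Optional[str],
) -> str:
    """Pick the first non-empty source; never mutate first_user_message."""
    for candidate in (explicit_preview, first_user_message, goal_objective):
        if candidate:
            return candidate
    return ""


def is_listable(preview: str) -> bool:
    """Threads with an empty preview stay hidden from listings."""
    return preview != ""
```

For a `/goal`-first thread, `derive_preview(None, None, "ship the migration")` yields the objective, so the thread becomes listable while `first_user_message` remains unset.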

## Verification

- Manually verified that a thread that starts with a `/goal <objective>`
shows up in the resume picker.

Eric Traut · 2026-05-11 10:12:46 -07:00

f10ddc3f13

device-key: clean up unused crate (#21487 )

Ruslan Nigmatullin · 2026-05-07 09:01:44 -07:00

e64a8979b0

[codex-analytics] rework thread_source for thread analytics (#20949 )

## Summary
- make `thread_source` an explicit optional thread-level field on
`thread/start`, `thread/fork`, and returned thread payloads
- persist `thread_source` in rollout/session metadata so resumed live
threads retain the original value
- replace the old best-effort `session_source` -> `thread_source`
mapping with an explicit caller-supplied analytics classification

## Why
Before this change, analytics `thread_source` was populated by a
best-effort mapping from `session_source`. `session_source` describes
the runtime/client surface, not the actual thread-level origin, so that
projection was not accurate enough to distinguish cases such as `user`,
`subagent`, `memory_consolidation`, and future thread origins reliably.

Making `thread_source` explicit keeps one thread-level analytics field
while letting callers provide the real classification directly instead
of recovering it indirectly from `session_source`.

## Impact
For new analytics events, `thread_source` now reflects the explicit
thread-level classification supplied by the caller rather than an
inferred value derived from `session_source`. Existing protocol fields
remain optional; callers that omit `threadSource` now produce `null`
instead of a best-effort inferred value.

## Validation
- `just write-app-server-schema`
- `cargo test -p codex-analytics -p codex-core -p
codex-app-server-protocol --no-run`
- `cargo test -p codex-app-server-protocol
generated_ts_optional_nullable_fields_only_in_params`
- `cargo test -p codex-analytics
thread_initialized_event_serializes_expected_shape`
- `cargo test -p codex-core
resume_stopped_thread_from_rollout_preserves_thread_source`

rhan-oai · 2026-05-06 02:12:31 +00:00

b3d4f1a9f0

Add goal persistence foundation (1 / 5) (#18073 )

Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.

## Why

Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.

## What changed

- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.
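
The stale-update protection can be pictured with an in-memory stand-in. `GoalStore` here is a hypothetical Python class for illustration, not the Rust state runtime API.

```python
import uuid


class GoalStore:
    """Toy store keyed by thread; each replacement mints a fresh goal_id."""

    def __init__(self):
        self._goals = {}  # thread_id -> {"goal_id", "objective", "status"}

    def replace_goal(self, thread_id, objective):
        goal_id = str(uuid.uuid4())
        self._goals[thread_id] = {
            "goal_id": goal_id,
            "objective": objective,
            "status": "active",
        }
        return goal_id

    def update_status(self, thread_id, goal_id, status):
        """Apply the update only if it targets the current goal."""
        goal = self._goals.get(thread_id)
        if goal is None or goal["goal_id"] != goal_id:
            return False  # stale: the goal was replaced or deleted
        goal["status"] = status
        return True

    def status(self, thread_id):
        return self._goals[thread_id]["status"]
```

An update carrying the old `goal_id` is rejected after a replacement, so a slow writer cannot mark the new goal complete by accident.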

## Verification

- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.

Eric Traut · 2026-04-24 20:51:38 -07:00

0ee737cea6

app-server: persist device key bindings in sqlite (#19206 )

## Why

Device-key providers should only own platform key material. The
account/client binding used to authorize a signing payload is app-server
state, and keeping that state in provider-specific metadata makes the
same check harder to audit and harder to share across platform
implementations.

Persisting the binding in the shared state database gives the device-key
crate a platform-neutral source of truth before it asks a provider to
sign. It also lets app-server move potentially blocking key operations
off the main message processor path, which matters once providers may
wait for OS authentication prompts.

## What changed

- Add a `device_key_bindings` state migration plus `StateRuntime`
helpers keyed by `key_id`.
- Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
- Keep provider calls behind async store methods and run the synchronous
provider work through `spawn_blocking`.
- Wire app-server device-key RPC handling to the SQLite-backed binding
store and spawn response/error delivery tasks for device-key requests.
- Run the turn-start tracing test on the existing larger current-thread
test harness after the larger async surface made the default test stack
too small locally.

## Validation

- `cargo test -p codex-device-key`
- `cargo test -p codex-state device_key`
- `cargo test -p codex-state`
- `cargo test -p codex-app-server device_key`
- `cargo test -p codex-app-server
message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
- `cargo test -p codex-app-server`
- `just fix -p codex-device-key`
- `just fix -p codex-state`
- `just fix -p codex-app-server`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`

Ruslan Nigmatullin · 2026-04-23 21:55:56 -07:00

19badb0be2

Support multiple cwd filters for thread list (#18502 )

## Summary

- Teach app-server `thread/list` to accept either a single `cwd` or an
array of cwd filters, returning threads whose recorded session cwd
matches any requested path
- Add `useStateDbOnly` as an explicit opt-in fast path for callers that
want to answer `thread/list` from SQLite without scanning JSONL rollout
files
- Preserve backwards compatibility: by default, `thread/list` still
scans JSONL rollouts and repairs SQLite state
- Wire the new cwd array and SQLite-only options through app-server,
local/remote thread-store, rollout listing, generated TypeScript/schema
fixtures, proto output, and docs
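
A minimal sketch of the single-or-array `cwd` filter, with hypothetical helper names. Treating an empty array as "match nothing" is an assumption of this example.

```python
def normalize_cwd_filter(cwd):
    """Accept None, a single path, or a list of paths; return a set or None."""
    if cwd is None:
        return None  # no filter requested
    if isinstance(cwd, str):
        return {cwd}
    return set(cwd)


def matches_cwd(thread_cwd, cwd_filter):
    """A thread matches when its recorded cwd equals any requested path."""
    return cwd_filter is None or thread_cwd in cwd_filter


def filter_threads(threads, cwd):
    """threads is a list of (thread_id, cwd) pairs in this toy model."""
    wanted = normalize_cwd_filter(cwd)
    return [tid for tid, thread_cwd in threads if matches_cwd(thread_cwd, wanted)]
```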

## Test Plan

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-rollout`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server thread_list`
- `just fmt`
- `just fix -p codex-app-server-protocol -p codex-rollout -p
codex-thread-store -p codex-app-server`
- `cargo build -p codex-cli --bin codex`

acrognale-oai · 2026-04-22 06:10:09 -04:00

4f8c58f737

[tool search] support namespaced deferred dynamic tools (#18413 )

Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.

This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.

It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
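
The routing rule can be sketched as follows. `ToolName` mirrors the name used above, but its fields and the `DynamicToolRegistry` class are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class ToolName(NamedTuple):
    namespace: Optional[str]  # None for top-level tools
    name: str                 # leaf name the app understands


class DynamicToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, namespace=None, deferred=False):
        if deferred and namespace is None:
            raise ValueError("deferred dynamic tools must provide a namespace")
        key = ToolName(namespace, name)
        self._tools[key] = {"deferred": deferred}
        return key

    def dispatch(self, tool_name):
        """Route by the full name, but hand the app only the leaf name."""
        if tool_name not in self._tools:
            raise KeyError(tool_name)
        return tool_name.name
```

Keying the registry on the full name lets two namespaces expose the same leaf name without colliding.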

Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`

---------

Co-authored-by: Sayan Sisodiya <sayan@openai.com>

pash-openai · 2026-04-21 14:13:08 +08:00

dc1a8f2190

Moving updated-at timestamps to unique millisecond times (#17489 )

To allow guaranteed-unique cursors, we make two important updates:
* Add new `updated_at_ms` and `created_at_ms` columns with millisecond
precision
* Guarantee uniqueness: if multiple items are inserted in the same
millisecond, bump the new one by one millisecond until it becomes unique

This lets us use single-number cursors for forward and backward paging
through result sets, and guarantees that the cursor is a fixed point:
querying (timestamp > cursor) returns new items only.

This updated implementation is backwards-compatible since multiple
app-servers can be running and won't handle the previous method well.
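
The bump-until-unique rule and the resulting cursor query can be sketched in a few lines. `allocate_unique_ms` and `newer_than` are hypothetical names, and a plain set stands in for the database column.

```python
def allocate_unique_ms(now_ms, used):
    """Return now_ms, bumped by 1 ms at a time until it is unused."""
    candidate = now_ms
    while candidate in used:
        candidate += 1
    used.add(candidate)
    return candidate


def newer_than(cursor, used):
    """Single-number cursor: everything strictly after it is new."""
    return sorted(ts for ts in used if ts > cursor)
```

Three inserts in the same millisecond receive consecutive values, so a cursor taken at any of them splits the list unambiguously.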

David de Regt · 2026-04-14 11:55:34 -04:00

4f2fc3e3fa

app-server: Add transport for remote control (#15951 )

Ruslan Nigmatullin · 2026-04-06 14:55:59 -07:00

73dab2046f

chore: drop log DB (#16433 )

Drop the log table from the state DB

jif-oai · 2026-04-01 15:49:17 +02:00

c846a57d03

feat: change multi-agent to use path-like system instead of uuids (#15313 )

This PR adds a URI-based system to reference agents within a tree. This
comes from a sync between research and engineering.

The main agent (the one manually spawned by a user) is always called
`/root`. Any sub-agent spawned by it will be, for example,
`/root/agent_1`, where `agent_1` is chosen by the model.

Any agent can contact any other agent using its path.

Paths can be either absolute or relative to the calling agent.

Resume is not supported for now on this new path.
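
One way to picture absolute versus relative agent paths is filesystem-style resolution. This is a sketch under assumptions: the description does not say whether `..` segments are supported or how relative paths are anchored, so the behavior below is illustrative only.

```python
import posixpath

ROOT = "/root"


def resolve_agent_path(caller, target):
    """Resolve target against the calling agent's path (filesystem-like)."""
    # posixpath.join discards `caller` when `target` is already absolute.
    resolved = posixpath.normpath(posixpath.join(caller, target))
    if resolved != ROOT and not resolved.startswith(ROOT + "/"):
        raise ValueError(f"path escapes the agent tree: {target!r}")
    return resolved
```

Under these assumptions, `/root/agent_1` can address a sibling either as `/root/agent_2` or as `../agent_2`.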

jif-oai · 2026-03-20 18:23:48 +00:00

79ad7b247b

feat: add graph representation of agent network (#15056 )

Add a representation of the agent graph. This is now used for:
* Cascade close (closing a parent also closes its children)
* Cascade resume (the opposite)

Later, this will also be used for post-compaction stuffing of the
context
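
A toy version of the graph with cascading close and resume might look like this. `AgentGraph` and its methods are hypothetical names for illustration, and the sketch treats the network as a tree of parent-to-child edges.

```python
class AgentGraph:
    """Parent -> children edges plus the set of currently open agents."""

    def __init__(self, root):
        self._children = {root: []}
        self.open = {root}

    def spawn(self, parent, child):
        self._children[parent].append(child)
        self._children[child] = []
        self.open.add(child)

    def subtree(self, agent):
        """Return the agent and all of its descendants (iterative DFS)."""
        order, stack = [], [agent]
        while stack:
            node = stack.pop()
            order.append(node)
            stack.extend(self._children[node])
        return order

    def close(self, agent):
        """Cascade close: the agent and every descendant are closed."""
        self.open -= set(self.subtree(agent))

    def resume(self, agent):
        """Cascade resume: the agent and every descendant are reopened."""
        self.open |= set(self.subtree(agent))
```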

Direct fix for: https://github.com/openai/codex/issues/14458

jif-oai · 2026-03-19 10:21:25 +00:00

70cdb17703

Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859 )

### Summary
The goal is to use the latest turn's model and reasoning effort on
thread/resume if no override is provided in the thread/resume call.
This is part 1, in which we write the model and reasoning effort for a
thread to the SQLite DB; a follow-up PR will consume the two new fields
on thread/resume.

[part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
and this one can be merged independently.

Shijie Rao · 2026-03-17 10:14:34 -07:00

8e258eb3f5

dynamic tool calls: add param exposeToContext to optionally hide tool (#14501 )

This extends dynamic_tool_calls to allow us to hide a tool from the
model context but still use it as part of the general tool-calling
runtime (for example, from js_repl/code_mode).

Channing Conger · 2026-03-14 01:58:43 -07:00

70eddad6b0

feat: memories forgetting (#12900 )

Add diff based memory forgetting

jif-oai · 2026-02-26 13:19:57 +00:00

382fa338b3

nit: migration (#12772 )

jif-oai · 2026-02-25 13:56:52 +00:00

8d49e0d0c4

feat: record memory usage (#12761 )

jif-oai · 2026-02-25 13:48:40 +00:00

e4bfa763f6

Agent jobs (spawn_agents_on_csv) + progress UI (#10935 )

## Summary
- Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
auto-export, and store results in SQLite.
- Simplify workflow: remove run/resume/get-status/export tools; spawn is
deterministic and completes in one call.
- Improve exec UX: stable, single-line progress bar with ETA; suppress
sub-agent chatter in exec.

## Why
Enables map-reduce style workflows over arbitrarily large repos using
the existing Codex orchestrator. This addresses review feedback about
overly complex job controls and non-deterministic monitoring.

## Demo (progress bar)
```
./codex-rs/target/debug/codex exec \
  --enable collab \
  --enable sqlite \
  --full-auto \
  --progress-cursor \
  -c agents.max_threads=16 \
  -C /Users/daveaitel/code/codex \
  - <<'PROMPT'
Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
path = item-01..item-30, area = test.

Then call spawn_agents_on_csv with:
- csv_path: /tmp/agent_job_progress_demo.csv
- instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
- output_csv_path: /tmp/agent_job_progress_demo_out.csv
PROMPT
```

## Review feedback addressed
- Auto-start jobs on spawn; removed run/resume/status/export tools.
- Auto-export on success.
- More descriptive tool spec + clearer prompts.
- Avoid deadlocks on spawn failure; pending/running handled safely.
- Progress bar no longer scrolls; stable single-line redraw.

## Tests
- `cd codex-rs && cargo test -p codex-exec`
- `cd codex-rs && cargo build -p codex-cli`

daveaitel-openai · 2026-02-24 21:00:19 +00:00

dcab40123f

feat: add nick name to sub-agents (#12320 )

Add a random nickname to sub-agents, used for UX.

At the same time, also store and wire the role of the sub-agent.

jif-oai · 2026-02-20 14:39:49 +00:00

0f9eed3a6f

state: enforce 10 MiB log caps for thread and threadless process logs (#12038 )

## Summary
- enforce a 10 MiB cap per `thread_id` in state log storage
- enforce a 10 MiB cap per `process_uuid` for threadless (`thread_id IS
NULL`) logs
- scope pruning to only keys affected by the current insert batch
- add a cheap per-key `SUM(...)` precheck so windowed prune queries only
run for keys that are currently over the cap
- add SQLite indexes used by the pruning queries
- add focused runtime tests covering both pruning behaviors
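
The precheck-then-prune flow above can be sketched without SQLite. The following is a minimal in-memory model, not the merged implementation: `LogStore`, `Row`, `estimated_bytes`, and `insert_batch` are hypothetical names, a `VecDeque` per key stands in for the windowed prune queries, and dropping the oldest rows first is an assumption.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Partition key: logs are capped per thread, or per process when threadless.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Key {
    Thread(String),
    Process(String),
}

struct Row {
    thread_id: Option<String>,
    process_uuid: String,
    estimated_bytes: usize,
}

struct LogStore {
    cap_bytes: usize,
    rows: HashMap<Key, VecDeque<usize>>, // row sizes per key, oldest first
}

impl LogStore {
    fn new(cap_bytes: usize) -> Self {
        Self { cap_bytes, rows: HashMap::new() }
    }

    /// Stand-in for the per-key `SUM(...)` precheck.
    fn total(&self, key: &Key) -> usize {
        self.rows.get(key).map_or(0, |q| q.iter().sum())
    }

    /// Inserts a batch, then prunes only the keys that the batch touched.
    fn insert_batch(&mut self, batch: Vec<Row>) {
        let mut touched = HashSet::new();
        for row in batch {
            let key = match row.thread_id {
                Some(id) => Key::Thread(id),
                None => Key::Process(row.process_uuid),
            };
            self.rows.entry(key.clone()).or_default().push_back(row.estimated_bytes);
            touched.insert(key);
        }
        for key in touched {
            // Cheap precheck: skip the prune when the key is within the cap.
            if self.total(&key) <= self.cap_bytes {
                continue;
            }
            let cap = self.cap_bytes;
            let queue = self.rows.get_mut(&key).expect("touched key exists");
            let mut total: usize = queue.iter().sum();
            // Assumption: drop the oldest rows until the key fits again.
            while total > cap {
                match queue.pop_front() {
                    Some(size) => total -= size,
                    None => break,
                }
            }
        }
    }
}
```

Keys that the batch did not touch are never scanned, which is the point of scoping the prune to the current insert.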

## Why
This keeps log growth bounded by the intended partition semantics while
preserving a small, readable implementation localized to the existing
insert path.

## Local Latency Snapshot (No Truncation-Pressure Run)
Collected from session `019c734f-1d16-7002-9e00-c966c9fbbcae` using
local-only (uncommitted) instrumentation, while not specifically
benchmarking the truncation-heavy regime.

### Percentiles By Query (ms)
| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 110 | 0.332 | 0.999 | 1.811 | 2.978 | 3.493 |
| `insert_logs.precheck.process` | 106 | 0.074 | 0.152 | 0.206 | 0.258 | 0.426 |
| `insert_logs.precheck.thread` | 73 | 0.118 | 0.206 | 0.253 | 1.025 | 1.025 |
| `insert_logs.prune.process` | 58 | 0.291 | 0.576 | 0.607 | 1.088 | 1.088 |
| `insert_logs.prune.thread` | 44 | 0.318 | 0.467 | 0.728 | 0.797 | 0.797 |
| `insert_logs.prune_total` | 110 | 0.488 | 0.976 | 1.237 | 1.593 | 1.684 |
| `insert_logs.total` | 110 | 1.315 | 2.889 | 3.623 | 5.739 | 5.961 |
| `insert_logs.tx_begin` | 110 | 0.133 | 0.235 | 0.282 | 0.412 | 0.546 |
| `insert_logs.tx_commit` | 110 | 0.259 | 0.689 | 0.772 | 1.065 | 1.080 |

### `insert_logs.total` Histogram (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 7 |
| `<= 1.000` | 33 |
| `<= 2.000` | 40 |
| `<= 5.000` | 28 |
| `<= 10.000` | 2 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

## Local Latency Snapshot (Truncation-Heavy / Cap-Hit Regime)
Collected from a run where cap-hit behavior was frequent (`135/180`
insert calls), using local-only (uncommitted) instrumentation and a
temporary local cap of `10_000` bytes for stress testing (not the merged
`10 MiB` cap).

### Percentiles By Query (ms)
| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 180 | 0.524 | 1.645 | 2.163 | 3.424 | 3.777 |
| `insert_logs.precheck.process` | 171 | 0.086 | 0.235 | 0.373 | 0.758 | 1.147 |
| `insert_logs.precheck.thread` | 100 | 0.105 | 0.251 | 0.291 | 1.176 | 1.622 |
| `insert_logs.prune.process` | 109 | 0.386 | 0.839 | 1.146 | 1.548 | 2.588 |
| `insert_logs.prune.thread` | 56 | 0.253 | 0.550 | 1.148 | 2.484 | 2.484 |
| `insert_logs.prune_total` | 180 | 0.511 | 1.221 | 1.695 | 4.548 | 5.512 |
| `insert_logs.total` | 180 | 1.631 | 3.902 | 5.103 | 8.901 | 9.095 |
| `insert_logs.total_cap_hit` | 135 | 1.876 | 4.501 | 5.547 | 8.902 | 9.096 |
| `insert_logs.total_no_cap_hit` | 45 | 0.520 | 1.700 | 2.079 | 3.294 | 3.294 |
| `insert_logs.tx_begin` | 180 | 0.109 | 0.253 | 0.287 | 1.088 | 1.406 |
| `insert_logs.tx_commit` | 180 | 0.267 | 0.813 | 1.170 | 2.497 | 2.574 |

### `insert_logs.total` Histogram (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 16 |
| `<= 1.000` | 39 |
| `<= 2.000` | 60 |
| `<= 5.000` | 54 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### `insert_logs.total` Histogram When Cap Was Hit (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 0 |
| `<= 1.000` | 22 |
| `<= 2.000` | 51 |
| `<= 5.000` | 51 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### Performance Takeaways
- Even in a cap-hit-heavy run (`75%` cap-hit calls), `insert_logs.total`
stays sub-10ms at p99 (`8.901ms`) and max (`9.095ms`).
- Calls that did **not** hit the cap are materially cheaper
(`insert_logs.total_no_cap_hit` p95 `2.079ms`) than cap-hit calls
(`insert_logs.total_cap_hit` p95 `5.547ms`).
- Compared to the earlier non-truncation-pressure run, overall
`insert_logs.total` rose from p95 `3.623ms` to p95 `5.103ms`
(+`1.48ms`), indicating bounded overhead when pruning is active.
- This truncation-heavy run used an intentionally low local cap for
stress testing; with the real 10 MiB cap, cap-hit frequency should be
much lower in normal sessions.

## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-state` (in `codex-rs`)

Charley Cunningham · 2026-02-18 17:08:08 -08:00

7f3dbaeb25

Add process_uuid to sqlite logs (#11534 )

## Summary
This PR is the first slice of the per-session `/feedback` logging work:
it adds a process-unique identifier to SQLite log rows.

It does **not** change `/feedback` sourcing behavior yet.

## Changes
- Add migration `0009_logs_process_id.sql` to extend `logs` with:
  - `process_uuid TEXT`
  - `idx_logs_process_uuid` index
- Extend state log models:
  - `LogEntry.process_uuid: Option<String>`
  - `LogRow.process_uuid: Option<String>`
- Stamp each log row with a stable per-process UUID in the sqlite log
layer:
  - generated once per process as `pid:<pid>:<uuid>`
- Update sqlite log insert/query paths to persist and read
`process_uuid`:
  - `INSERT INTO logs (..., process_uuid, ...)`
  - `SELECT ..., process_uuid, ... FROM logs`
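
Generating the identifier once per process and reusing it for every row can be sketched as follows. This is a minimal std-only sketch, not the PR's code: `process_uuid()` is a hypothetical helper, and a randomly seeded hash stands in for a real UUID, which the actual change presumably takes from a UUID library.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;
use std::sync::OnceLock;

/// Returns an identifier that is generated on first use and then reused for
/// the lifetime of the process, shaped like `pid:<pid>:<token>`.
fn process_uuid() -> &'static str {
    static PROCESS_UUID: OnceLock<String> = OnceLock::new();
    PROCESS_UUID.get_or_init(|| {
        let pid = std::process::id();
        // Assumption: a 64-bit randomly seeded hash replaces the UUID here.
        let token = RandomState::new().hash_one(pid);
        format!("pid:{pid}:{token:016x}")
    })
}
```

Because the value lives in a `OnceLock`, every session and thread in the same process stamps its log rows with the same string, while a restarted process gets a new one even if the OS reuses the pid.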

## Why
App-server runs many sessions in one process. This change provides a
process-scoping primitive we need for follow-up `/feedback` work, so
threadless/process-level logs can be associated with the emitting
process without mixing across processes.

## Non-goals in this PR
- No `/feedback` transport/source changes
- No attachment size changes
- No sqlite retention/trim policy changes

## Testing
- `just fmt`
- CI will run the full checks

Charley Cunningham · 2026-02-14 17:27:22 -08:00

fce4ad9cf4

feat: add slug in name (#11739 )

jif-oai · 2026-02-13 15:24:03 +00:00

db66d827be

feat: mem v2 - PR5 (#11372 )

jif-oai · 2026-02-10 23:22:55 +00:00

2c9be54c9a

chore: unify memory job flow (#11334 )

jif-oai · 2026-02-10 20:26:39 +00:00

a6e9469fa4

feat: align memory phase 1 and make it stronger (#11300 )

## Align with the new phase-1 design

We now run phase 1 in parallel, subject to these constraints:
* Max 64 rollouts
* Max 1 month old
* Consider the most recent first
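
The selection rules above can be sketched as a filter, sort, and truncate. This is an illustration under assumptions, not the PR's query: `Rollout`, `select_phase1_candidates`, and the constant names are hypothetical, and "1 month" is approximated as 30 days.

```rust
use std::time::{Duration, SystemTime};

struct Rollout {
    id: String,
    updated_at: SystemTime,
}

const MAX_ROLLOUTS: usize = 64;
// Assumption: "1 month" is treated as 30 days.
const MAX_AGE: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Keeps rollouts no older than `MAX_AGE`, newest first, capped at `MAX_ROLLOUTS`.
fn select_phase1_candidates(mut rollouts: Vec<Rollout>, now: SystemTime) -> Vec<Rollout> {
    rollouts.retain(|r| match now.duration_since(r.updated_at) {
        Ok(age) => age <= MAX_AGE,
        Err(_) => true, // timestamp is in the future: treat it as fresh
    });
    rollouts.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
    rollouts.truncate(MAX_ROLLOUTS);
    rollouts
}
```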

This PR also adds stronger parallelization capabilities: stale-job
detection, retry policies, and ownership of computation to prevent
double computation.

jif-oai · 2026-02-10 13:42:09 +00:00

1d5eba0090

state: add memory consolidation lock primitives (#11199 )

## Summary
- add a migration for memory_consolidation_locks
- add acquire/release lock primitives to codex-state runtime
- add core/state_db wrappers and cwd normalization for memory queries
and lock keys

## Testing
- cargo test -p codex-state memory_consolidation_lock_
- cargo test -p codex-core --lib state_db::

jif-oai · 2026-02-09 21:04:20 +00:00

74ecd6e3b2

Leverage state DB metadata for thread summaries (#10621 )

Summary:
- read conversation summaries and cwd info from the state DB when
possible so we no longer rely on rollout files for metadata and avoid
extra I/O
- persist CLI version in thread metadata, surface it through summary
builders, and add the necessary DB migration hooks
- simplify thread listing by using enriched state DB data directly
rather than reading rollout heads

Testing:
- Not run (not requested)

jif-oai · 2026-02-05 16:39:11 +00:00

9ee746afd6

feat: resumable backfill (#10745 )

## Summary

This PR makes SQLite rollout backfill resumable and repeatable instead
of one-shot-on-db-create.

## What changed

- Added a persisted backfill state table:
  - state/migrations/0008_backfill_state.sql
  - Tracks status (pending|running|complete), last_watermark, and
    last_success_at.
- Added backfill state model/types in codex-state:
  - BackfillState, BackfillStatus (state/src/model/backfill_state.rs)
- Added runtime APIs to manage backfill lifecycle/progress:
  - get_backfill_state
  - mark_backfill_running
  - checkpoint_backfill
  - mark_backfill_complete
- Updated core startup behavior:
  - Backfill now runs whenever state is not Complete (not only when DB
    file is newly created).
- Reworked backfill execution:
  - Collect rollout files, derive deterministic watermark per path, sort,
    resume from last_watermark.
  - Process in batches (BACKFILL_BATCH_SIZE = 200), checkpoint after each
    batch.
  - Mark complete with last_success_at at the end.
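
The resume logic can be sketched as below. This is a simplified in-memory model, not the codex-state implementation: the watermark is assumed to be the path string itself, `max_batches` is a hypothetical parameter that simulates the process exiting early, and `imported` stands in for rows written to SQLite. `last_success_at` is omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
enum BackfillStatus {
    Pending,
    Running,
    Complete,
}

#[derive(Debug, Clone)]
struct BackfillState {
    status: BackfillStatus,
    last_watermark: Option<String>,
}

const BACKFILL_BATCH_SIZE: usize = 200;

/// Runs or resumes a backfill over `paths`, checkpointing after each batch.
fn run_backfill(
    state: &mut BackfillState,
    paths: &[&str],
    max_batches: usize,
    imported: &mut Vec<String>,
) {
    if state.status == BackfillStatus::Complete {
        return;
    }
    state.status = BackfillStatus::Running;
    // Resume: keep only paths strictly after the last checkpointed watermark.
    let mut pending: Vec<&str> = paths
        .iter()
        .copied()
        .filter(|p| state.last_watermark.as_deref().map_or(true, |w| *p > w))
        .collect();
    pending.sort();
    for (i, batch) in pending.chunks(BACKFILL_BATCH_SIZE).enumerate() {
        if i == max_batches {
            return; // simulated exit: status stays Running, watermark is kept
        }
        imported.extend(batch.iter().map(|p| p.to_string()));
        state.last_watermark = batch.last().map(|p| p.to_string());
    }
    state.status = BackfillStatus::Complete;
}
```

Because the watermark is persisted after every batch, a second call picks up at the first unprocessed path instead of starting over.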

## Why

Previous behavior could leave users permanently partially backfilled if
the process exited during initial async backfill. This change allows
safe continuation across restarts and avoids restarting from scratch.

jif-oai · 2026-02-05 14:34:34 +00:00

4033f905c6

feat: add phase 1 mem db (#10634 )

- Schema: thread_id (PK, FK to threads.id with cascade delete),
trace_summary, memory_summary, updated_at.
- Migration: creates the table and an index on (updated_at DESC,
thread_id DESC) for efficient recent-first reads.
- Runtime API (DB-only):
  - `get_thread_memory(thread_id)`: fetch one memory row.
  - `upsert_thread_memory(thread_id, trace_summary, memory_summary)`:
    insert/update by thread id and always advance updated_at.
  - `get_last_n_thread_memories_for_cwd(cwd, n)`: join thread_memory with
    threads and return newest n rows for an exact cwd match.
- Model layer: introduced ThreadMemory and row conversion types to keep
query decoding typed and consistent with existing state models.
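
The semantics of the runtime API can be modelled in memory as below. This is a sketch, not the SQLite-backed code: `MemoryDb` and its fields are hypothetical, two `HashMap`s stand in for the `threads` and `thread_memory` tables, and a counter replaces the real `updated_at` timestamp.

```rust
use std::collections::HashMap;

struct ThreadMemory {
    thread_id: String,
    trace_summary: String,
    memory_summary: String,
    updated_at: u64,
}

#[derive(Default)]
struct MemoryDb {
    clock: u64,                              // stand-in for a timestamp source
    thread_cwd: HashMap<String, String>,     // threads table: id -> cwd
    memories: HashMap<String, ThreadMemory>, // thread_memory, keyed by thread_id
}

impl MemoryDb {
    /// Inserts or replaces the row for `thread_id` and always advances `updated_at`.
    fn upsert_thread_memory(&mut self, thread_id: &str, trace: &str, memory: &str) {
        self.clock += 1;
        let row = ThreadMemory {
            thread_id: thread_id.to_string(),
            trace_summary: trace.to_string(),
            memory_summary: memory.to_string(),
            updated_at: self.clock,
        };
        self.memories.insert(thread_id.to_string(), row);
    }

    /// Returns the newest `n` rows whose thread has exactly this cwd,
    /// ordered by (updated_at DESC, thread_id DESC).
    fn get_last_n_thread_memories_for_cwd(&self, cwd: &str, n: usize) -> Vec<&ThreadMemory> {
        let mut rows: Vec<&ThreadMemory> = self
            .memories
            .values()
            .filter(|m| self.thread_cwd.get(&m.thread_id).map(String::as_str) == Some(cwd))
            .collect();
        rows.sort_by(|a, b| (b.updated_at, &b.thread_id).cmp(&(a.updated_at, &a.thread_id)));
        rows.truncate(n);
        rows
    }
}
```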

jif-oai · 2026-02-04 21:38:39 +00:00

4922b3e571

[feat] persist thread_dynamic_tools in db (#10252 )

Persist thread_dynamic_tools in sqlite and read first from it. Fall back
to rollout files if it's not found. Persist dynamic tools to both sqlite
and rollout files.
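
The read and write paths described above can be sketched as follows. This is a minimal illustration with hypothetical names (`load_dynamic_tools`, `persist_dynamic_tools`, `ToolsByThread`); plain maps stand in for the SQLite table and the rollout files.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct DynamicTool {
    name: String,
    description: String,
}

#[derive(Debug, PartialEq)]
enum Source {
    Sqlite,
    RolloutFile,
}

type ToolsByThread = HashMap<String, Vec<DynamicTool>>;

/// Reads from SQLite first and falls back to the rollout file when absent.
fn load_dynamic_tools(
    thread_id: &str,
    sqlite: &ToolsByThread,
    rollouts: &ToolsByThread,
) -> Option<(Source, Vec<DynamicTool>)> {
    if let Some(tools) = sqlite.get(thread_id) {
        return Some((Source::Sqlite, tools.clone()));
    }
    rollouts
        .get(thread_id)
        .map(|tools| (Source::RolloutFile, tools.clone()))
}

/// Writes to both stores so either one can serve later reads.
fn persist_dynamic_tools(
    thread_id: &str,
    tools: &[DynamicTool],
    sqlite: &mut ToolsByThread,
    rollouts: &mut ToolsByThread,
) {
    sqlite.insert(thread_id.to_string(), tools.to_vec());
    rollouts.insert(thread_id.to_string(), tools.to_vec());
}
```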

Verified that new sessions are written to the DB and old sessions are
backfilled at startup:
```
celia@com-92114 codex-rs % sqlite3 ~/.codex/state.sqlite \
  "select thread_id, position,name,description,input_schema from thread_dynamic_tools;"
019c0cad-ec0d-74b2-a787-e8b33a349117|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
....
019c10ca-aa4b-7620-ae40-c0919fbd7ea7|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
```

Celia Chen · 2026-02-03 00:06:44 +00:00

fb2df99cf1

chore: unify log queries (#10152 )

Unify log queries so that the SQLX code lives only in the runtime, and
use it for both the log client and the tests.

jif-oai · 2026-01-29 16:28:15 +00:00

e6c4f548ab

feat: adding thread ID to logs + filter in the client (#10150 )

jif-oai · 2026-01-29 16:53:30 +01:00

89c5f3c4d4

chore: improve client (#10149 )

<img width="883" height="84" alt="Screenshot 2026-01-29 at 11 13 12"
src="https://github.com/user-attachments/assets/090a2fec-94ed-4c0f-aee5-1653ed8b1439"
/>

jif-oai · 2026-01-29 11:25:22 +01:00

4ba911d48c

feat: add log db (#10086 )

Add a log DB. The goal is just to store our logs in a `.sqlite` DB to
make it easier to crawl them and drop the oldest ones.

jif-oai · 2026-01-29 10:23:03 +01:00

780482da84

feat: sqlite 1 (#10004 )

Add a `.sqlite` database to store rollout metadata (and later logs).
This PR is phase 1:
* Add the database and the required infrastructure
* Add a backfill of the database
* Persist the newly created rollout both in files and in the DB
* When we need to get metadata or a rollout, consider the `JSONL` as the
source of truth but compare the results with the DB and show any errors
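
The last point, where the `JSONL` result wins but disagreements are surfaced, can be sketched as below. `RolloutMeta` and `read_metadata` are hypothetical names, and the two fields are placeholders for whatever metadata the real rollout carries.

```rust
#[derive(Clone, Debug, PartialEq)]
struct RolloutMeta {
    id: String,
    cwd: String,
}

/// Returns the JSONL metadata as the source of truth, plus a description of
/// any disagreement with the DB so the caller can report it.
fn read_metadata(
    from_jsonl: RolloutMeta,
    from_db: Option<&RolloutMeta>,
) -> (RolloutMeta, Option<String>) {
    let discrepancy = match from_db {
        None => Some(format!("thread {} missing from DB", from_jsonl.id)),
        Some(db) if *db != from_jsonl => Some(format!(
            "thread {} differs: db={:?} jsonl={:?}",
            from_jsonl.id, db, from_jsonl
        )),
        Some(_) => None,
    };
    (from_jsonl, discrepancy)
}
```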

jif-oai · 2026-01-28 15:29:14 +01:00

3878c3dc7c

45 Commits