45 Commits

  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
    This will be stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N+ is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
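
    The fail-closed rule above can be sketched roughly as follows. This is a minimal illustration, not the app-server implementation: `check_operation`, `UnsupportedHistoryMode`, and the two operation sets are hypothetical names; only the mode strings and the RPC path names come from the description.

    ```python
    from enum import Enum


    class ThreadHistoryMode(Enum):
        LEGACY = "legacy"
        PAGINATED = "paginated"


    class UnsupportedHistoryMode(Exception):
        """Raised instead of silently treating a thread as legacy (hypothetical type)."""


    # Hypothetical grouping; the PR names these paths but not this table.
    METADATA_ONLY_OPS = {"thread/list", "thread/read(includeTurns=false)"}
    FULL_HISTORY_OPS = {"thread/read(includeTurns=true)", "thread/resume"}


    def check_operation(raw_mode: str, op: str) -> None:
        """Allow metadata discovery for any thread; allow other paths only for legacy."""
        if op in METADATA_ONLY_OPS:
            return  # metadata paths stay available so clients can inspect historyMode
        try:
            mode = ThreadHistoryMode(raw_mode)
        except ValueError:
            # Unknown or future modes fail closed rather than falling back to legacy.
            raise UnsupportedHistoryMode(f"unknown historyMode {raw_mode!r} for {op}")
        if mode is not ThreadHistoryMode.LEGACY:
            raise UnsupportedHistoryMode(f"{op} is unsupported for {mode.value} threads")
    ```

    In this sketch, `check_operation("paginated", "thread/list")` returns normally, while `check_operation("paginated", "thread/resume")` raises.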
  • [codex] Restore thread recency with compatible migration history (#28671)
    ## Summary
    
    - Revert #28655, restoring the thread `recencyAt` behavior introduced by
    #27910.
    - Move `threads_recency_at` to migration 0039 so it no longer collides
    with `external_agent_config_imports` at version 0038.
    - Repair databases that already applied the recency migration as version
    38 by moving the matching migration-history row to version 39 before
    SQLx validation. The current version-38 migration can then apply
    normally.
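
    A rough sketch of the repair step, under stated assumptions: the history table is modeled loosely on SQLx's `_sqlx_migrations` table but reduced to three columns, the "matching" row is identified by checksum, and `RECENCY_CHECKSUM` and `repair_recency_migration` are hypothetical names.

    ```python
    import sqlite3

    RECENCY_CHECKSUM = b"recency-migration-checksum"  # hypothetical placeholder value


    def repair_recency_migration(conn: sqlite3.Connection) -> bool:
        """Move a recency migration recorded as version 38 to version 39.

        Returns True when a row was moved. Assumes a simplified history table
        with (version, description, checksum) columns.
        """
        with conn:  # one transaction: commit on success, roll back on error
            cur = conn.execute(
                "UPDATE _sqlx_migrations SET version = 39 "
                "WHERE version = 38 AND checksum = ? "
                "AND NOT EXISTS (SELECT 1 FROM _sqlx_migrations WHERE version = 39)",
                (RECENCY_CHECKSUM,),
            )
            return cur.rowcount == 1


    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE _sqlx_migrations ("
        "version INTEGER PRIMARY KEY, description TEXT NOT NULL, checksum BLOB NOT NULL)"
    )
    conn.execute(
        "INSERT INTO _sqlx_migrations VALUES (38, 'threads recency at', ?)",
        (RECENCY_CHECKSUM,),
    )
    moved = repair_recency_migration(conn)
    versions = [row[0] for row in conn.execute("SELECT version FROM _sqlx_migrations")]
    ```

    Running the repair before migration validation leaves version 38 free for the migration that now owns it; a second call is a no-op.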
    
    ## Validation
    
    - `just test -p codex-state
    migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
    - `just test -p codex-state -p codex-rollout -p codex-thread-store -p
    codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
    could not open the machine's existing read-only incident database at
    `~/.codex/sqlite/state_5.sqlite`.
    - `just fix -p codex-state`
    - `just fmt`
    - Verified that state migration versions are unique.
  • Revert thread recencyAt for sidebar ordering (#28655)
    ## Why
    
    Revert #27910 to remove the newly introduced thread `recencyAt`
    persistence and API behavior from `main`.
    
    ## What changed
    
    This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
    including the state migration, thread-store propagation, app-server API
    surface, generated schemas, and related tests.
    
    ## Validation
    
    Not run before opening; relying on CI for the initial fast signal.
  • Add thread recencyAt for sidebar ordering (#27910)
    ## Summary
    
    Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
    sort key for product recency ordering while preserving the existing
    meaning of `updatedAt` as the latest persisted thread mutation.
    
    This is the server-side alternative to #27697. Rather than narrowing
    `updatedAt`, clients can sort the sidebar by `recency_at` and continue
    treating `updatedAt` as mutation time.
    
    Paired Codex Apps PR:
    [openai/openai#1024599](https://github.com/openai/openai/pull/1024599)
    
    ## Contract
    
    - `recencyAt` initializes when a thread is created.
    - A turn start advances `recencyAt` monotonically.
    - Commentary, agent output, tool results, token/accounting updates, turn
    completion, archive, unarchive, resume, and generic metadata writes do
    not advance it.
    - `updatedAt` retains its existing behavior and continues to advance for
    persisted thread mutations.
    - Current servers populate `recencyAt`; the response field is optional
    in generated TypeScript so clients connected to older servers can fall
    back to `updatedAt`.
    - Filesystem-only fallback uses existing updated/mtime ordering when
    SQLite is unavailable.
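
    The contract above might be modeled as below. This is an illustrative sketch only: `ThreadTimes`, `apply_event`, `sidebar_sort_key`, and the event names are hypothetical, and `updated_at` handling is simplified to "every persisted mutation sets it".

    ```python
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class ThreadTimes:
        updated_at: int
        recency_at: int


    def apply_event(times: ThreadTimes, event: str, now: int) -> None:
        """Apply one persisted event (event names here are illustrative)."""
        times.updated_at = now  # simplified: any persisted mutation bumps updated_at
        if event == "turn_start":
            # Monotonic touch: an older timestamp never moves recency backwards.
            times.recency_at = max(times.recency_at, now)


    def sidebar_sort_key(thread: dict) -> int:
        """Client-side fallback: older servers omit recencyAt, so use updatedAt."""
        recency: Optional[int] = thread.get("recencyAt")
        return recency if recency is not None else thread["updatedAt"]
    ```

    A turn completion at time 150 would leave `recency_at` untouched, while a later turn start advances it.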
    
    ## Persistence and compatibility
    
    Migration 0038 adds second- and millisecond-precision recency columns,
    backfills them from the existing updated timestamp, creates list
    indexes, and includes an insert trigger so older binaries writing to a
    migrated database seed recency without causing later mutations to
    advance it.
    
    Generic metadata upserts preserve existing recency values. Turn-start
    updates use a dedicated monotonic touch, and process-local allocation
    keeps millisecond cursor values unique. State DB list, search, read,
    filtered-list repair, rollout fallback propagation, and app-server
    conversions all carry the new field.
    
    ## API
    
    `Thread` responses include:
    
    ```ts
    recencyAt?: number
    ```
    
    `thread/list` and `thread/search` accept:
    
    ```json
    { "sortKey": "recency_at" }
    ```
    
    Generated TypeScript and JSON schemas are included.
    
    ## Validation
    
    - `just test -p codex-state` — 146 passed
    - `just test -p codex-rollout` — 69 passed
    - `just test -p codex-thread-store` — 81 passed
    - `just test -p codex-app-server-protocol` — 231 passed
    - Focused app-server list ordering, response mapping, archive/unarchive,
    and resume lifecycle tests passed
    - Scoped `just fix` for state, rollout, thread-store,
    app-server-protocol, and app-server
    - `just fmt`
    - `git diff --check`
    - Independent correctness, simplicity, elegance, security, and
    test-quality reviews; actionable ordering, lifecycle, query-projection,
    and timestamp-uniqueness findings were addressed
  • [codex] Record external agent import results (#28396)
    ## Summary
    - restore `externalAgentConfig/import/progress` notifications while
    keeping `externalAgentConfig/import/completed` as the must-deliver event
    - persist completed external-agent config imports in state DB by
    `importId`, including concrete success/failure details for config,
    AGENTS.md, skills, plugins, MCP servers, subagents, hooks, commands, and
    sessions
    - add `externalAgentConfig/import/readHistories` so clients can recover
    persisted import results after missing the live completion notification
    - include `errorType` on import failures in protocol
    responses/notifications and persisted DB JSON so future code can
    classify failures without another wire/storage shape change
    
    ## Validation
    - `git diff --check`
    - `just test -p codex-state external_agent_config_imports`
    - `just test -p codex-app-server-protocol`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-sqlite-read-details
    just test -p codex-app-server
    external_agent_config_import_sends_completion_notification_for_sync_only_import`
    
    Also ran earlier broader checks before publishing:
    - `just test -p codex-state`
    -
    `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-external-agent-test-sqlite
    just test -p codex-app-server external_agent_config`
    - `just test -p codex-external-agent-migration`
  • feat(app-server): persist remote-control desired state (#27445)
    ## Why
    
    Remote-control runtime enablement and persisted enrollment preference
    were represented by separate flags. That made startup rehydration, RPC
    persistence, and new-enrollment seeding race with one another, and it
    did not cleanly distinguish runtime-only CLI or daemon starts from
    durable app-server RPC changes.
    
    ## What Changed
    
    - Replace the parallel enablement, seed, and rehydration flags with one
    transport-owned `RemoteControlDesiredState`.
    - Add nullable enrollment-scoped persistence and preserve existing
    preferences during enrollment upserts.
    - Rehydrate plain startup only after auth and client scope resolve,
    without overwriting a concurrent RPC transition.
    - Make ordinary `remoteControl/enable` and `remoteControl/disable`
    durable while retaining `ephemeral: true` for runtime-only callers.
    - Have the daemon explicitly request ephemeral enablement and regenerate
    the app-server schemas.
    
    ## Verification
    
    - Covered migration and `NULL`/`0`/`1` persistence round trips.
    - Covered plain-start rehydration and runtime-only versus durable
    enrollment seeding.
    - Covered durable enable, durable disable, and ephemeral enable through
    app-server RPC.
    - Covered the daemon's exact `{ "ephemeral": true }` request payload.
    
    Related issue: N/A (internal remote-control persistence architecture
    change).
  • Index visible thread list ordering (#27391)
    ## Summary
    
    - add partial SQLite indexes for visible thread lists ordered by
    creation or update time
    - match the `archived` and non-empty `preview` filters used by
    `thread/list`
    - add query-plan coverage for both supported sort orders
    
    ## Query performance
    
    Benchmarked the production query shape on a snapshot of my database with
    ~10k threads before and after applying these indexes. The query selected
    the full thread projection with `archived = 0`, `preview <> ''`, the
    `openai` provider filter, and a page size of 201. Results are the mean
    of 30 runs after 5 warmups:
    
    | Query | Before | After | Speedup |
    | --- | ---: | ---: | ---: |
    | First page, `created_at_ms DESC` | 132.3 ms | 15.1 ms | 8.78x |
    | First page, `updated_at_ms DESC` | 123.6 ms | 15.5 ms | 7.99x |
    | Cursor page near row 4,000, `created_at_ms DESC` | 51.8 ms | 16.8 ms | 3.07x |
    | Cursor page near row 4,000, `updated_at_ms DESC` | 52.4 ms | 17.1 ms | 3.06x |
    
    Before this change, SQLite used `idx_threads_archived`, filtered the
    candidate rows, and built a temporary B-tree for the requested ordering.
    With the partial indexes, SQLite reads matching visible rows directly in
    timestamp order and stops at the page limit. `EXPLAIN QUERY PLAN` no
    longer reports `USE TEMP B-TREE FOR ORDER BY`.
    
    The result rows were identical before and after. The two partial indexes
    occupy approximately 168 KiB combined on this snapshot.
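
    The query-plan effect can be reproduced in miniature with Python's built-in `sqlite3` module. This is a sketch on a toy schema, not the real `threads` table: the column set and the index name `idx_threads_visible_created` are assumptions, and the planner's choice can vary with SQLite version and statistics.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE threads ("
        "id TEXT PRIMARY KEY, archived INTEGER NOT NULL, "
        "preview TEXT NOT NULL, created_at_ms INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO threads VALUES (?, ?, ?, ?)",
        [(f"t{i}", i % 2, "hello" if i % 3 else "", i) for i in range(1000)],
    )

    QUERY = (
        "SELECT id FROM threads WHERE archived = 0 AND preview <> '' "
        "ORDER BY created_at_ms DESC LIMIT 201"
    )


    def plan(sql: str) -> str:
        """Return the EXPLAIN QUERY PLAN detail column joined into one string."""
        return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


    plan_before = plan(QUERY)
    rows_before = conn.execute(QUERY).fetchall()

    # Hypothetical index name; the WHERE clause mirrors the query's filters exactly.
    conn.execute(
        "CREATE INDEX idx_threads_visible_created ON threads (created_at_ms DESC) "
        "WHERE archived = 0 AND preview <> ''"
    )
    plan_after = plan(QUERY)
    rows_after = conn.execute(QUERY).fetchall()
    ```

    Comparing `plan_before` with `plan_after` shows whether the temporary B-tree sort step disappears, and comparing the two row lists confirms that the index changes only the access path, not the result.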
    
    ## Performance under contention
    
    I noticed this issue on a database under high contention and used
    simulated contention to validate the performance in that context.
    
    A synthetic SQLite benchmark ran five concurrent readers, matching the
    state database pool size, and fetched 101 rows per query. Results are
    the median of three runs on fresh copies of the same database snapshot:
    
    | Query | Before | After |
    | --- | ---: | ---: |
    | `created_at_ms` mean latency under saturation | 328 ms | 12 ms |
    | `created_at_ms` throughput | 16 queries/s | 412 queries/s |
    | `updated_at_ms` mean latency under saturation | 336 ms | 14 ms |
    | `updated_at_ms` throughput | 15 queries/s | 357 queries/s |
    
    For a burst of 100 queries queued through five connections, p95
    completion time fell from 6.90 seconds to 226 ms for `created_at_ms`,
    and from 6.31 seconds to 473 ms for `updated_at_ms`.
    
    ## Validation
    
    - `just test -p codex-state` (135 tests passed)
    - query-plan regression covers created-at and updated-at ordering,
    requires the corresponding index, and rejects `TEMP B-TREE`
    - `just fmt`
  • Move memory state to a dedicated SQLite DB (#24591)
    ## Summary
    
    Generated memory rows and their stage-one/stage-two job state currently
    live in `state_5.sqlite` alongside thread metadata. That makes memory
    cleanup and regeneration share the main state schema even though those
    rows are memory-pipeline data and can be rebuilt independently from the
    durable thread records.
    
    This PR moves the memory-owned tables into a dedicated
    `memories_1.sqlite` runtime database while keeping thread metadata in
    `state_5.sqlite`.
    
    ## Changes
    
    - Adds a separate memories DB runtime, migrator, path helpers, telemetry
    kind, and Bazel compile data for `state/memory_migrations`.
    - Introduces `MemoryStore` behind `StateRuntime::memories()` and moves
    memory table/job operations onto that store.
    - Drops the old memory tables from the state DB and recreates their
    schema in `state/memory_migrations/0001_memories.sql`.
    - Updates memory startup, citation usage tracking, rollout pollution
    handling, `debug clear-memories`, and app-server `memory/reset` to
    operate through the memories DB.
    - Preserves cross-DB behavior by hydrating thread metadata from the
    state DB when selecting visible memory outputs and checking stage-one
    staleness.
    
    ## Verification
    
    - Added/updated `codex-state` tests for deleted-thread memory visibility
    and already-polluted phase-two enqueue behavior.
    - Updated `debug clear-memories`, app-server `memory/reset`, and
    memories startup tests to seed and assert memory rows through
    `memories_1.sqlite`.
  • feat: dedicated goal DB (#23300)
    ## Why
    
    Thread goals are moving toward extension-owned runtime behavior, but
    their persisted state was still stored in the shared state database.
    This makes the goal store harder to isolate and keeps future storage
    splits tied to ad hoc runtime plumbing.
    
    This PR gives goals their own SQLite database while keeping the existing
    `StateRuntime` entry point. The goal is to make this the pattern for
    adding more dedicated runtime databases later.
    
    This also reduces load on the existing DB and reduces contention.
    
    ## Limitation
    Thread preview from goal is not supported anymore. I'm looking into this
    [EDIT]: solved
    
    ## What changed
    
    - Added a dedicated `goals_1.sqlite` database with its own
    `goals_migrations` directory.
    - Moved `thread_goals` creation into the goals DB migration set.
    - Dropped the old `thread_goals` table from the main state DB with a
    normal state migration. There is intentionally no backfill for existing
    goal rows.
    - Changed `GoalStore` to be backed only by the goals DB pool.
    - Removed the old goal-write side effect that filled empty
    `threads.preview` values from the goal objective.
    - Added shared runtime DB path metadata so startup, telemetry, `codex
    doctor`, and repair handling can include future DBs without bespoke path
    lists.
    - Updated Bazel compile data so the new goals migration directory is
    available to `sqlx::migrate!`.
    
    ## Verification
    
    - `cargo check --tests -p codex-state -p codex-cli -p codex-core -p
    codex-app-server`
    - `just fix -p codex-state`
    - `just fix -p codex-cli`
    - `just fix -p codex-app-server`
  • goal: pause continuation loops on usage limits and blockers (#23094)
    Addresses #22833, #22245, #23067
    
    ## Why
    `/goal` can keep synthesizing turns even when the next turn cannot make
    meaningful progress. Hard usage exhaustion can replay failing turns, and
    repeated permission or external-resource blockers can keep burning
    tokens while waiting for user or system intervention.
    
    ## What changed
    - Add resumable `blocked` and `usageLimited` goal states. As with
    `paused`, goal continuation stops with these states.
    - Move to `usageLimited` after usage-limit failures.
    - Allow the built-in `update_goal` tool to set `blocked` only under
    explicit repeated-impasse guidance. Updated the goal continuation prompt
    to specify that the agent should use `blocked` only when it has made at
    least three attempts to get past an impasse.
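
    A small model of these states, as a hedged sketch rather than the runtime's code: the enum, `should_continue`, and `may_set_blocked` are hypothetical names, and the `active` and `complete` values are assumed from context; `paused`, `blocked`, and `usageLimited` come from the description.

    ```python
    from enum import Enum


    class GoalStatus(Enum):
        ACTIVE = "active"        # assumed value
        PAUSED = "paused"
        BLOCKED = "blocked"
        USAGE_LIMITED = "usageLimited"
        COMPLETE = "complete"    # assumed value


    # Resumable states in which goal continuation stops.
    STOPPED_BUT_RESUMABLE = {
        GoalStatus.PAUSED,
        GoalStatus.BLOCKED,
        GoalStatus.USAGE_LIMITED,
    }
    MIN_ATTEMPTS_BEFORE_BLOCKED = 3  # "at least three attempts" per the prompt guidance


    def should_continue(status: GoalStatus) -> bool:
        """Only an active goal synthesizes another turn in this sketch."""
        return status is GoalStatus.ACTIVE


    def may_set_blocked(impasse_attempts: int) -> bool:
        """Guard mirroring the repeated-impasse guidance."""
        return impasse_attempts >= MIN_ATTEMPTS_BEFORE_BLOCKED


    def after_turn_failure(status: GoalStatus, usage_limit_hit: bool) -> GoalStatus:
        """Move to usageLimited after a usage-limit failure; otherwise keep the status."""
        return GoalStatus.USAGE_LIMITED if usage_limit_hit else status
    ```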
    
    Most of the files touched by this PR changed because of the small
    app-server protocol update.
    
    ## Validation
    
    I manually reproduced a number of situations where an agent can run into
    a true impasse and verified that it properly enters `blocked` state. I
    then resumed and verified that it once again entered `blocked` state
    several turns later if the impasse still exists.
    
    I also manually reproduced the usage-limit condition by creating a
    simulated responses API endpoint that returns 429 errors with the
    appropriate error message. Verified that the goal runtime properly moves
    the goal into `usageLimited` state and the TUI updates appropriately.
    Verified that `/goal resume` resumes (and immediately goes back into
    `usageLimited` state if appropriate).
    
    
    ## Follow-up PRs
    
    Small changes will be needed to the GUI clients to properly handle the
    two new states.
  • Use goal preview metadata for goal-first threads (#21981)
    Fixes #20792
    
    ## Why
    
    `/goal`-first threads are valid resumable threads, but they can be
    missing from `codex resume` and app recents because discovery depends on
    metadata derived from a normal first user message.
    
    PR #21489 attempted to fix this by using the goal objective as
    `first_user_message`. Review feedback pointed out that
    `first_user_message` does more than provide visible text today: it gates
    listing, supplies preview text, and participates in deciding whether a
    later title should surface as a distinct thread name. Reusing it for the
    goal objective could leave a `/goal`-first thread with
    `first_user_message=<goal>` and `title=<later prompt>`, even though the
    goal should only provide the initial visible preview.
    
    This PR follows that feedback: it keeps `first_user_message` as is
    but introduces a new `preview` field to separate concerns. The
    `preview` field is populated from the first user message or the goal
    objective. We can extend it in the future to include other sources.
    
    ## What Changed
    
    - Added internal thread `preview` metadata in `codex-state`, including a
    SQLite migration that backfills from `first_user_message` and from
    existing `thread_goals` objectives when needed.
    - Treated `ThreadGoalUpdated` as preview-bearing metadata so goal-first
    threads can be listed and searched without mutating
    `first_user_message`.
    - Updated rollout listing, state queries, thread-store conversion, and
    app-server mapping to use preview metadata while continuing to expose
    the existing public `preview` field.
    - Preserved title/name distinctness behavior around literal
    `first_user_message`, so a later normal prompt after `/goal` does not
    surface as a separate name just because the goal supplied the initial
    preview.
    - Preserved compatibility for older/internal metadata writes by deriving
    preview from `first_user_message` when explicit preview metadata is
    absent.
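
    The preview fallback order described above could look roughly like this. `derive_preview` is a hypothetical helper; the precedence (explicit preview, then first user message, then goal objective) is inferred from the bullets rather than taken from the code.

    ```python
    from typing import Optional


    def derive_preview(
        explicit_preview: Optional[str],
        first_user_message: Optional[str],
        goal_objective: Optional[str],
    ) -> str:
        """Pick the first non-empty source; an empty result means no preview."""
        for candidate in (explicit_preview, first_user_message, goal_objective):
            if candidate:
                return candidate
        return ""
    ```

    With this shape, a `/goal`-first thread gets a non-empty preview from its objective without `first_user_message` being touched.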
    
    ## Verification
    
    - Manually verified that a thread that starts with a `/goal <objective>`
    shows up in the resume picker.
  • [codex-analytics] rework thread_source for thread analytics (#20949)
    ## Summary
    - make `thread_source` an explicit optional thread-level field on
    `thread/start`, `thread/fork`, and returned thread payloads
    - persist `thread_source` in rollout/session metadata so resumed live
    threads retain the original value
    - replace the old best-effort `session_source` -> `thread_source`
    mapping with an explicit caller-supplied analytics classification
    
    ## Why
    Before this change, analytics `thread_source` was populated by a
    best-effort mapping from `session_source`. `session_source` describes
    the runtime/client surface, not the actual thread-level origin, so that
    projection was not accurate enough to distinguish cases such as `user`,
    `subagent`, `memory_consolidation`, and future thread origins reliably.
    
    Making `thread_source` explicit keeps one thread-level analytics field
    while letting callers provide the real classification directly instead
    of recovering it indirectly from `session_source`.
    
    ## Impact
    For new analytics events, `thread_source` now reflects the explicit
    thread-level classification supplied by the caller rather than an
    inferred value derived from `session_source`. Existing protocol fields
    remain optional; callers that omit `threadSource` now produce `null`
    instead of a best-effort inferred value.
    
    ## Validation
    - `just write-app-server-schema`
    - `cargo test -p codex-analytics -p codex-core -p
    codex-app-server-protocol --no-run`
    - `cargo test -p codex-app-server-protocol
    generated_ts_optional_nullable_fields_only_in_params`
    - `cargo test -p codex-analytics
    thread_initialized_event_serializes_expected_shape`
    - `cargo test -p codex-core
    resume_stopped_thread_from_rollout_preserves_thread_source`
  • Add goal persistence foundation (1 / 5) (#18073)
    Adds the persisted goal foundation for the rest of the stack. This PR is
    intentionally limited to feature flag and state-layer behavior;
    app-server APIs, model tools, runtime continuation, and TUI UX are
    layered in later PRs.
    
    ## Why
    
    Goal mode needs durable thread-level state before clients or model tools
    can safely build on it. The state layer needs to know whether a goal
    exists, what objective it tracks, whether it is active, paused,
    budget-limited, or complete, and how much time/token usage has already
    been accounted.
    
    ## What changed
    
    - Added the `goals` feature flag and generated config schema entry.
    - Added the `thread_goals` state table and Rust model for persisted
    thread goals.
    - Added state runtime APIs for creating, replacing, updating, deleting,
    and accounting goal usage.
    - Added `goal_id`-based stale update protection so an old goal update
    cannot overwrite a replacement.
    - Kept this PR scoped to persistence and state runtime behavior, with no
    app-server, model-facing, continuation, or TUI behavior yet.
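
    The `goal_id`-based stale update protection can be illustrated with a guarded `UPDATE`. This is a sketch under assumptions: the column set of `thread_goals` shown here is invented for the example, and `replace_goal` and `update_status` are hypothetical helpers, not the state runtime APIs.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE thread_goals ("
        "thread_id TEXT PRIMARY KEY, goal_id TEXT NOT NULL, "
        "objective TEXT NOT NULL, status TEXT NOT NULL)"
    )


    def replace_goal(thread_id: str, goal_id: str, objective: str) -> None:
        """Create or replace the single goal row for a thread."""
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO thread_goals VALUES (?, ?, ?, 'active')",
                (thread_id, goal_id, objective),
            )


    def update_status(thread_id: str, goal_id: str, status: str) -> bool:
        """Update only if the caller still holds the current goal_id."""
        with conn:
            cur = conn.execute(
                "UPDATE thread_goals SET status = ? WHERE thread_id = ? AND goal_id = ?",
                (status, thread_id, goal_id),
            )
        return cur.rowcount == 1  # False means the goal was replaced in the meantime
    ```

    An update carrying the old `goal_id` matches no row after a replacement, so it cannot overwrite the new goal.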
    
    ## Verification
    
    - Added state runtime coverage for goal creation, replacement, stale
    update protection, status transitions, token-budget behavior, and usage
    accounting.
  • app-server: persist device key bindings in sqlite (#19206)
    ## Why
    
    Device-key providers should only own platform key material. The
    account/client binding used to authorize a signing payload is app-server
    state, and keeping that state in provider-specific metadata makes the
    same check harder to audit and harder to share across platform
    implementations.
    
    Persisting the binding in the shared state database gives the device-key
    crate a platform-neutral source of truth before it asks a provider to
    sign. It also lets app-server move potentially blocking key operations
    off the main message processor path, which matters once providers may
    wait for OS authentication prompts.
    
    ## What changed
    
    - Add a `device_key_bindings` state migration plus `StateRuntime`
    helpers keyed by `key_id`.
    - Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
    and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
    - Keep provider calls behind async store methods and run the synchronous
    provider work through `spawn_blocking`.
    - Wire app-server device-key RPC handling to the SQLite-backed binding
    store and spawn response/error delivery tasks for device-key requests.
    - Run the turn-start tracing test on the existing larger current-thread
    test harness after the larger async surface made the default test stack
    too small locally.
    
    ## Validation
    
    - `cargo test -p codex-device-key`
    - `cargo test -p codex-state device_key`
    - `cargo test -p codex-state`
    - `cargo test -p codex-app-server device_key`
    - `cargo test -p codex-app-server
    message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
    - `cargo test -p codex-app-server`
    - `just fix -p codex-device-key`
    - `just fix -p codex-state`
    - `just fix -p codex-app-server`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `git diff --check`
  • Support multiple cwd filters for thread list (#18502)
    ## Summary
    
    - Teach app-server `thread/list` to accept either a single `cwd` or an
    array of cwd filters, returning threads whose recorded session cwd
    matches any requested path
    - Add `useStateDbOnly` as an explicit opt-in fast path for callers that
    want to answer `thread/list` from SQLite without scanning JSONL rollout
    files
    - Preserve backwards compatibility: by default, `thread/list` still
    scans JSONL rollouts and repairs SQLite state
    - Wire the new cwd array and SQLite-only options through app-server,
    local/remote thread-store, rollout listing, generated TypeScript/schema
    fixtures, proto output, and docs
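
    A minimal sketch of accepting either a single `cwd` or an array of them. `normalize_cwd_filter` and `filter_threads` are hypothetical helpers, and treating an empty array as "match nothing" is an assumption of this sketch, not documented behavior.

    ```python
    from typing import Iterable, List, Optional, Set, Union

    CwdFilter = Optional[Union[str, List[str]]]


    def normalize_cwd_filter(cwd: CwdFilter) -> Optional[Set[str]]:
        """Turn a single path or a list of paths into a set; None means no filter."""
        if cwd is None:
            return None
        if isinstance(cwd, str):
            return {cwd}
        return set(cwd)


    def filter_threads(threads: Iterable[dict], cwd: CwdFilter) -> List[dict]:
        """Keep threads whose recorded cwd matches any requested path."""
        wanted = normalize_cwd_filter(cwd)
        if wanted is None:
            return list(threads)
        return [t for t in threads if t["cwd"] in wanted]
    ```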
    
    ## Test Plan
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-rollout`
    - `cargo test -p codex-thread-store`
    - `cargo test -p codex-app-server thread_list`
    - `just fmt`
    - `just fix -p codex-app-server-protocol -p codex-rollout -p
    codex-thread-store -p codex-app-server`
    - `cargo build -p codex-cli --bin codex`
  • [tool search] support namespaced deferred dynamic tools (#18413)
    Deferred dynamic tools need to round-trip a namespace so a tool returned
    by `tool_search` can be called through the same registry key that core
    uses for dispatch.
    
    This change adds namespace support for dynamic tool specs/calls,
    persists it through app-server thread state, and routes dynamic tool
    calls by full `ToolName` while still sending the app the leaf tool name.
    Deferred dynamic tools must provide a namespace; non-deferred dynamic
    tools may remain top-level.
    
    It also introduces `LoadableToolSpec` as the shared
    function-or-namespace Responses shape used by both `tool_search` output
    and dynamic tool registration, so dynamic tools use the same wrapping
    logic in both paths.
    
    Validation:
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core tool_search`
    
    ---------
    
    Co-authored-by: Sayan Sisodiya <sayan@openai.com>
  • Moving updated-at timestamps to unique millisecond times (#17489)
    To allow guaranteed-unique cursors, we make two important updates:
    * Add new updated_at_ms and created_at_ms columns that are in
    millisecond precision
    * Guarantee uniqueness -- if multiple items are inserted at the same
    millisecond, bump the new one by one millisecond until it becomes unique
    
    This lets us use single-number cursors for forwards and backwards paging
    through resultsets and guarantee that the cursor is a fixed point to do
    (timestamp > cursor) and get new items only.
    
    This updated implementation is backwards-compatible since multiple
    appservers can be running and won't handle the previous method well.
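
    The bump-until-unique rule and the single-number cursor can be sketched as follows. This is an in-memory illustration only: `UniqueMillis` and `items_after` are hypothetical names, and the real implementation works against the database columns rather than a Python set.

    ```python
    from typing import Iterable, List


    class UniqueMillis:
        """Hands out millisecond timestamps that never repeat."""

        def __init__(self, existing: Iterable[int] = ()) -> None:
            self._used = set(existing)

        def allocate(self, now_ms: int) -> int:
            candidate = now_ms
            while candidate in self._used:
                candidate += 1  # same millisecond already taken: bump by one
            self._used.add(candidate)
            return candidate


    def items_after(timestamps: Iterable[int], cursor: int) -> List[int]:
        """Forward paging: strictly newer than the cursor, so nothing repeats."""
        return sorted(ts for ts in timestamps if ts > cursor)
    ```

    Because every timestamp is unique, a strict `>` comparison against the cursor neither skips nor duplicates items.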
  • chore: drop log DB (#16433)
    Drop the log table from the state DB
  • feat: change multi-agent to use path-like system instead of uuids (#15313)
    This PR adds a URI-based system to reference agents within a tree. This
    comes from a sync between research and engineering.
    
    The main agent (the one manually spawned by a user) is always called
    `/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
    where `agent_1` is chosen by the model.
    
    Any agent can contact any other agent using its path.
    
    Paths can be either absolute or relative to the calling agent.
    
    Resume is not supported for now on this new path
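
    One possible reading of absolute versus relative agent paths, sketched with `posixpath`. Everything here is an assumption made for illustration: `resolve_agent_path` is a hypothetical helper, relative paths are resolved as if the calling agent were a directory, and `..` segments are allowed; the PR does not specify these details.

    ```python
    import posixpath

    ROOT = "/root"


    def resolve_agent_path(caller: str, target: str) -> str:
        """Resolve `target` against `caller` unless it is already absolute."""
        if target.startswith("/"):
            resolved = posixpath.normpath(target)
        else:
            resolved = posixpath.normpath(posixpath.join(caller, target))
        # Every agent lives under /root in this sketch.
        if resolved != ROOT and not resolved.startswith(ROOT + "/"):
            raise ValueError(f"{target!r} resolves outside {ROOT}: {resolved!r}")
        return resolved
    ```

    For example, from `/root/agent_1` the relative path `helper` would resolve to `/root/agent_1/helper` under these assumptions.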
  • feat: add graph representation of agent network (#15056)
    Add a representation of the agent graph. This is now used for:
    * Cascade close agents (closing a parent also closes its children)
    * Cascade resume (the opposite)
    
    Later, this will also be used for post-compaction stuffing of the
    context
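
    Cascade close and resume over an agent graph might look like the sketch below. `AgentGraph` and its methods are hypothetical; the ordering (children closed before parents, parents resumed before children) is a design choice of this example, not something the PR states.

    ```python
    from collections import defaultdict


    class AgentGraph:
        """Parent/child edges between agents plus the set of currently open agents."""

        def __init__(self, root: str = "/root") -> None:
            self._children = defaultdict(list)
            self.open = {root}

        def spawn(self, parent: str, child: str) -> None:
            self._children[parent].append(child)
            self.open.add(child)

        def _subtree(self, agent: str) -> list:
            """Return `agent` and all of its descendants, parents before children."""
            order, stack = [], [agent]
            while stack:
                node = stack.pop()
                order.append(node)
                stack.extend(reversed(self._children.get(node, [])))
            return order

        def close(self, agent: str) -> list:
            """Close `agent` and every open descendant, children before parents."""
            closed = [n for n in reversed(self._subtree(agent)) if n in self.open]
            self.open.difference_update(closed)
            return closed

        def resume(self, agent: str) -> list:
            """Reopen `agent` and every descendant, parents before children."""
            resumed = self._subtree(agent)
            self.open.update(resumed)
            return resumed
    ```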
    
    Direct fix for: https://github.com/openai/codex/issues/14458
  • Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859)
    ### Summary
    The goal is to use the latest turn's model and reasoning effort on
    thread/resume if no override is provided in the thread/resume call.
    This is part 1, in which we write the model and reasoning effort for a
    thread to the SQLite DB; a follow-up PR will consume the two new fields
    on thread/resume.
    
    [part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
    and this one can be merged independently.
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool-calling
    runtime (for example, from js_repl/code_mode).
  • feat: memories forgetting (#12900)
    Add diff-based memory forgetting
  • Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
    ## Summary
    - Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
    auto-export, and store results in SQLite.
    - Simplify workflow: remove run/resume/get-status/export tools; spawn is
    deterministic and completes in one call.
    - Improve exec UX: stable, single-line progress bar with ETA; suppress
    sub-agent chatter in exec.
    
    ## Why
    Enables map-reduce style workflows over arbitrarily large repos using
    the existing Codex orchestrator. This addresses review feedback about
    overly complex job controls and non-deterministic monitoring.
    
    ## Demo (progress bar)
    ```
    ./codex-rs/target/debug/codex exec \
      --enable collab \
      --enable sqlite \
      --full-auto \
      --progress-cursor \
      -c agents.max_threads=16 \
      -C /Users/daveaitel/code/codex \
      - <<'PROMPT'
    Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
    path = item-01..item-30, area = test.
    
    Then call spawn_agents_on_csv with:
    - csv_path: /tmp/agent_job_progress_demo.csv
    - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
    - output_csv_path: /tmp/agent_job_progress_demo_out.csv
    PROMPT
    ```
    
    ## Review feedback addressed
    - Auto-start jobs on spawn; removed run/resume/status/export tools.
    - Auto-export on success.
    - More descriptive tool spec + clearer prompts.
    - Avoid deadlocks on spawn failure; pending/running handled safely.
    - Progress bar no longer scrolls; stable single-line redraw.
    
    ## Tests
    - `cd codex-rs && cargo test -p codex-exec`
    - `cd codex-rs && cargo build -p codex-cli`
  • feat: add nick name to sub-agents (#12320)
    Add a random nickname to sub-agents, used for UX.
    
    Also store and wire through the role of the sub-agent.
  • state: enforce 10 MiB log caps for thread and threadless process logs (#12038)
    ## Summary
    - enforce a 10 MiB cap per `thread_id` in state log storage
    - enforce a 10 MiB cap per `process_uuid` for threadless (`thread_id IS
    NULL`) logs
    - scope pruning to only keys affected by the current insert batch
    - add a cheap per-key `SUM(...)` precheck so windowed prune queries only
    run for keys that are currently over the cap
    - add SQLite indexes used by the pruning queries
    - add focused runtime tests covering both pruning behaviors
    
    ## Why
    This keeps log growth bounded by the intended partition semantics while
    preserving a small, readable implementation localized to the existing
    insert path.
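
The precheck-then-prune flow can be sketched as below. The table shape, the `bytes` column, and the `insert_and_prune` helper are assumptions for illustration; the actual queries in the PR may differ. The threadless `process_uuid` case would follow the same pattern with a different key.

```python
import sqlite3

CAP_BYTES = 10 * 1024 * 1024  # 10 MiB, as in the PR


def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema; the real logs table has more columns.
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, thread_id TEXT, bytes INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_logs_thread ON logs (thread_id, id)")
    return conn


def insert_and_prune(conn, rows, cap=CAP_BYTES):
    """Insert (thread_id, bytes) rows, then prune only threads touched by this batch."""
    conn.executemany("INSERT INTO logs (thread_id, bytes) VALUES (?, ?)", rows)
    deleted = 0
    for thread_id in {t for t, _ in rows if t is not None}:
        # Cheap precheck: skip the windowed query unless the key is over the cap.
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(bytes), 0) FROM logs WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        if total <= cap:
            continue
        # Running total from newest to oldest; delete rows that push it past the cap.
        cur = conn.execute(
            """
            DELETE FROM logs WHERE id IN (
                SELECT id FROM (
                    SELECT id, SUM(bytes) OVER (ORDER BY id DESC) AS running
                    FROM logs WHERE thread_id = ?
                ) WHERE running > ?
            )
            """,
            (thread_id, cap),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted
```

Window functions require SQLite 3.25 or newer. Keeping the newest rows and dropping the oldest is an assumption about the pruning direction.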
    
    ## Local Latency Snapshot (No Truncation-Pressure Run)
    Collected from session `019c734f-1d16-7002-9e00-c966c9fbbcae` using
    local-only (uncommitted) instrumentation, while not specifically
    benchmarking the truncation-heavy regime.
    
    ### Percentiles By Query (ms)
    | query | count | p50 | p90 | p95 | p99 | max |
    |---|---:|---:|---:|---:|---:|---:|
    | `insert_logs.insert_batch` | 110 | 0.332 | 0.999 | 1.811 | 2.978 | 3.493 |
    | `insert_logs.precheck.process` | 106 | 0.074 | 0.152 | 0.206 | 0.258 | 0.426 |
    | `insert_logs.precheck.thread` | 73 | 0.118 | 0.206 | 0.253 | 1.025 | 1.025 |
    | `insert_logs.prune.process` | 58 | 0.291 | 0.576 | 0.607 | 1.088 | 1.088 |
    | `insert_logs.prune.thread` | 44 | 0.318 | 0.467 | 0.728 | 0.797 | 0.797 |
    | `insert_logs.prune_total` | 110 | 0.488 | 0.976 | 1.237 | 1.593 | 1.684 |
    | `insert_logs.total` | 110 | 1.315 | 2.889 | 3.623 | 5.739 | 5.961 |
    | `insert_logs.tx_begin` | 110 | 0.133 | 0.235 | 0.282 | 0.412 | 0.546 |
    | `insert_logs.tx_commit` | 110 | 0.259 | 0.689 | 0.772 | 1.065 | 1.080 |
    
    ### `insert_logs.total` Histogram (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 7 |
    | `<= 1.000` | 33 |
    | `<= 2.000` | 40 |
    | `<= 5.000` | 28 |
    | `<= 10.000` | 2 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ## Local Latency Snapshot (Truncation-Heavy / Cap-Hit Regime)
    Collected from a run where cap-hit behavior was frequent (`135/180`
    insert calls), using local-only (uncommitted) instrumentation and a
    temporary local cap of `10_000` bytes for stress testing (not the merged
    `10 MiB` cap).
    
    ### Percentiles By Query (ms)
    | query | count | p50 | p90 | p95 | p99 | max |
    |---|---:|---:|---:|---:|---:|---:|
    | `insert_logs.insert_batch` | 180 | 0.524 | 1.645 | 2.163 | 3.424 | 3.777 |
    | `insert_logs.precheck.process` | 171 | 0.086 | 0.235 | 0.373 | 0.758 | 1.147 |
    | `insert_logs.precheck.thread` | 100 | 0.105 | 0.251 | 0.291 | 1.176 | 1.622 |
    | `insert_logs.prune.process` | 109 | 0.386 | 0.839 | 1.146 | 1.548 | 2.588 |
    | `insert_logs.prune.thread` | 56 | 0.253 | 0.550 | 1.148 | 2.484 | 2.484 |
    | `insert_logs.prune_total` | 180 | 0.511 | 1.221 | 1.695 | 4.548 | 5.512 |
    | `insert_logs.total` | 180 | 1.631 | 3.902 | 5.103 | 8.901 | 9.095 |
    | `insert_logs.total_cap_hit` | 135 | 1.876 | 4.501 | 5.547 | 8.902 | 9.096 |
    | `insert_logs.total_no_cap_hit` | 45 | 0.520 | 1.700 | 2.079 | 3.294 | 3.294 |
    | `insert_logs.tx_begin` | 180 | 0.109 | 0.253 | 0.287 | 1.088 | 1.406 |
    | `insert_logs.tx_commit` | 180 | 0.267 | 0.813 | 1.170 | 2.497 | 2.574 |
    
    ### `insert_logs.total` Histogram (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 16 |
    | `<= 1.000` | 39 |
    | `<= 2.000` | 60 |
    | `<= 5.000` | 54 |
    | `<= 10.000` | 11 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ### `insert_logs.total` Histogram When Cap Was Hit (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 0 |
    | `<= 1.000` | 22 |
    | `<= 2.000` | 51 |
    | `<= 5.000` | 51 |
    | `<= 10.000` | 11 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ### Performance Takeaways
    - Even in a cap-hit-heavy run (`75%` cap-hit calls), `insert_logs.total`
    stays sub-10ms at p99 (`8.901ms`) and max (`9.095ms`).
    - Calls that did **not** hit the cap are materially cheaper
    (`insert_logs.total_no_cap_hit` p95 `2.079ms`) than cap-hit calls
    (`insert_logs.total_cap_hit` p95 `5.547ms`).
    - Compared to the earlier non-truncation-pressure run, overall
    `insert_logs.total` rose from p95 `3.623ms` to p95 `5.103ms`
    (+`1.48ms`), indicating bounded overhead when pruning is active.
    - This truncation-heavy run used an intentionally low local cap for
    stress testing; with the real 10 MiB cap, cap-hit frequency should be
    much lower in normal sessions.
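
The figures quoted in the takeaways can be re-derived from the tables above. The snippet below only restates numbers already reported; the per-bucket no-cap-hit histogram is derived by subtraction and is not a measured table from the PR.

```python
# Histogram bucket counts copied from the tables above, in bucket order
# (<= 0.100 ms up to > 100.000 ms).
no_pressure_hist = [0, 0, 7, 33, 40, 28, 2, 0, 0, 0, 0]
cap_heavy_hist = [0, 0, 16, 39, 60, 54, 11, 0, 0, 0, 0]
cap_hit_hist = [0, 0, 0, 22, 51, 51, 11, 0, 0, 0, 0]

# Share of insert calls that hit the cap in the stress run.
cap_hit_share = 135 / 180

# Increase in insert_logs.total p95 between the two runs, in ms.
p95_delta_ms = round(5.103 - 3.623, 3)

# Derived: calls in the stress run that did not hit the cap, per bucket.
no_cap_hit_hist = [a - b for a, b in zip(cap_heavy_hist, cap_hit_hist)]
```

The bucket counts sum to the reported call counts (110, 180, and 135), which indicates the buckets are not cumulative.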
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-state` (in `codex-rs`)
  • Add process_uuid to sqlite logs (#11534)
    ## Summary
    This PR is the first slice of the per-session `/feedback` logging work:
    it adds a process-unique identifier to SQLite log rows.
    
    It does **not** change `/feedback` sourcing behavior yet.
    
    ## Changes
    - Add migration `0009_logs_process_id.sql` to extend `logs` with:
      - `process_uuid TEXT`
      - `idx_logs_process_uuid` index
    - Extend state log models:
      - `LogEntry.process_uuid: Option<String>`
      - `LogRow.process_uuid: Option<String>`
    - Stamp each log row with a stable per-process UUID in the sqlite log
    layer:
      - generated once per process as `pid:<pid>:<uuid>`
    - Update sqlite log insert/query paths to persist and read
    `process_uuid`:
      - `INSERT INTO logs (..., process_uuid, ...)`
      - `SELECT ..., process_uuid, ... FROM logs`
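
A minimal sketch of the "generated once per process" stamp, assuming the `pid:<pid>:<uuid>` format listed above. The `process_uuid` function name and the use of a cached UUIDv4 are illustrative choices, not the code from the PR.

```python
import os
import uuid
from functools import lru_cache


@lru_cache(maxsize=1)
def process_uuid() -> str:
    """Return a stable identifier for this process, computed on first use."""
    return f"pid:{os.getpid()}:{uuid.uuid4()}"
```

Every log row written by the same process would carry the same value, while a restarted process gets a new one even if the OS reuses the pid.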
    
    ## Why
    App-server runs many sessions in one process. This change provides a
    process-scoping primitive we need for follow-up `/feedback` work, so
    threadless/process-level logs can be associated with the emitting
    process without mixing across processes.
    
    ## Non-goals in this PR
    - No `/feedback` transport/source changes
    - No attachment size changes
    - No sqlite retention/trim policy changes
    
    ## Testing
    - `just fmt`
    - CI will run the full checks
  • feat: align memory phase 1 and make it stronger (#11300)
    ## Align with the new phase-1 design
    
    Basically, we now run phase 1 in parallel, considering:
    * Max 64 rollouts
    * Max 1 month old
    * Consider the most recent first
    
    This PR also adds stronger parallelization capabilities: detecting
    stale jobs, retry policies, and ownership of computation to prevent
    double computation.
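
The three selection rules can be sketched as a filter, sort, and truncate. The `select_rollouts` helper and the `(rollout_id, created_at)` tuple shape are hypothetical, and "1 month" is approximated as 30 days here.

```python
from datetime import datetime, timedelta, timezone

MAX_ROLLOUTS = 64
MAX_AGE = timedelta(days=30)  # assumption: "1 month" approximated as 30 days


def select_rollouts(rollouts, now):
    """rollouts: iterable of (rollout_id, created_at). Returns the phase-1 candidates."""
    fresh = [r for r in rollouts if now - r[1] <= MAX_AGE]
    fresh.sort(key=lambda r: r[1], reverse=True)  # most recent first
    return fresh[:MAX_ROLLOUTS]
```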
  • state: add memory consolidation lock primitives (#11199)
    ## Summary
    - add a migration for memory_consolidation_locks
    - add acquire/release lock primitives to codex-state runtime
    - add core/state_db wrappers and cwd normalization for memory queries
    and lock keys
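
One way acquire/release primitives over a lock table might look is sketched below. The column names, the owner-based release check, and the use of `posixpath.normpath` as the cwd normalization are all assumptions; only the `memory_consolidation_locks` table name comes from the PR.

```python
import posixpath
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns for the memory_consolidation_locks table.
    conn.execute(
        "CREATE TABLE memory_consolidation_locks (lock_key TEXT PRIMARY KEY, owner TEXT NOT NULL)"
    )
    return conn


def lock_key(cwd):
    # Stand-in normalization so "/repo/./" and "/repo" map to the same key.
    return posixpath.normpath(cwd)


def acquire_lock(conn, cwd, owner):
    """Return True if this owner now holds the lock for the normalized cwd."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO memory_consolidation_locks (lock_key, owner) VALUES (?, ?)",
        (lock_key(cwd), owner),
    )
    return cur.rowcount == 1


def release_lock(conn, cwd, owner):
    """Release only if the caller is the current owner."""
    cur = conn.execute(
        "DELETE FROM memory_consolidation_locks WHERE lock_key = ? AND owner = ?",
        (lock_key(cwd), owner),
    )
    return cur.rowcount == 1
```

Relying on the primary key makes acquisition a single atomic statement, so two workers cannot both succeed for the same key.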
    
    ## Testing
    - cargo test -p codex-state memory_consolidation_lock_
    - cargo test -p codex-core --lib state_db::
  • Leverage state DB metadata for thread summaries (#10621)
    Summary:
    - read conversation summaries and cwd info from the state DB when
    possible so we no longer rely on rollout files for metadata and avoid
    extra I/O
    - persist CLI version in thread metadata, surface it through summary
    builders, and add the necessary DB migration hooks
    - simplify thread listing by using enriched state DB data directly
    rather than reading rollout heads
    
    Testing:
    - Not run (not requested)
  • feat: resumable backfill (#10745)
    ## Summary
    
    This PR makes SQLite rollout backfill resumable and repeatable instead
    of one-shot-on-db-create.
    
    ## What changed
    
    - Added a persisted backfill state table:
      - state/migrations/0008_backfill_state.sql
      - Tracks status (pending|running|complete), last_watermark, and
        last_success_at.
    - Added backfill state model/types in codex-state:
      - BackfillState, BackfillStatus (state/src/model/backfill_state.rs)
    - Added runtime APIs to manage backfill lifecycle/progress:
      - get_backfill_state
      - mark_backfill_running
      - checkpoint_backfill
      - mark_backfill_complete
    - Updated core startup behavior:
      - Backfill now runs whenever state is not Complete (not only when DB
        file is newly created).
    - Reworked backfill execution:
      - Collect rollout files, derive deterministic watermark per path, sort,
        resume from last_watermark.
      - Process in batches (BACKFILL_BATCH_SIZE = 200), checkpoint after each
        batch.
      - Mark complete with last_success_at at the end.
    
    ## Why
    
    Previous behavior could leave users permanently partially backfilled if
    the process exited during initial async backfill. This change allows
    safe continuation across restarts and avoids restarting from scratch.
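
The resume-from-watermark loop can be sketched as follows. This is a simplified model under stated assumptions: the state is a plain dict rather than a SQLite row, the watermark is taken to be the path string itself, and `run_backfill` and `new_state` are hypothetical names.

```python
import time

BACKFILL_BATCH_SIZE = 200


def new_state():
    return {"status": "pending", "last_watermark": None, "last_success_at": None}


def run_backfill(paths, state, ingest, batch_size=BACKFILL_BATCH_SIZE):
    """Ingest rollout paths in sorted order, checkpointing after each batch."""
    if state["status"] == "complete":
        return
    state["status"] = "running"
    watermark = state["last_watermark"]
    # Assumption: the path string doubles as its deterministic watermark.
    pending = sorted(p for p in paths if watermark is None or p > watermark)
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for path in batch:
            ingest(path)
        state["last_watermark"] = batch[-1]  # checkpoint
    state["status"] = "complete"
    state["last_success_at"] = time.time()
```

If the process dies mid-batch, the next run restarts from the last checkpoint, so items in the interrupted batch may be ingested twice. The sketch therefore assumes `ingest` is idempotent.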
  • feat: add phase 1 mem db (#10634)
    - Schema: thread_id (PK, FK to threads.id with cascade delete),
    trace_summary, memory_summary, updated_at.
    - Migration: creates the table and an index on (updated_at DESC,
    thread_id DESC) for efficient recent-first reads.
    - Runtime API (DB-only):
      - `get_thread_memory(thread_id)`: fetch one memory row.
      - `upsert_thread_memory(thread_id, trace_summary, memory_summary)`:
        insert/update by thread id and always advance updated_at.
      - `get_last_n_thread_memories_for_cwd(cwd, n)`: join thread_memory with
        threads and return newest n rows for an exact cwd match.
    - Model layer: introduced ThreadMemory and row conversion types to keep
    query decoding typed and consistent with existing state models.
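
The schema and two of the runtime calls described above can be approximated in a few lines. Column types, the `cwd` column on `threads`, the index name, the explicit `now` argument, and the "advance by at least one" rule are assumptions made for this sketch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE threads (id TEXT PRIMARY KEY, cwd TEXT NOT NULL);
CREATE TABLE thread_memory (
    thread_id TEXT PRIMARY KEY REFERENCES threads(id) ON DELETE CASCADE,
    trace_summary TEXT NOT NULL,
    memory_summary TEXT NOT NULL,
    updated_at INTEGER NOT NULL
);
CREATE INDEX idx_thread_memory_recent ON thread_memory (updated_at DESC, thread_id DESC);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for cascade delete
    conn.executescript(SCHEMA)
    return conn


def upsert_thread_memory(conn, thread_id, trace_summary, memory_summary, now):
    # On conflict, updated_at always moves forward, even if `now` is stale.
    conn.execute(
        """
        INSERT INTO thread_memory (thread_id, trace_summary, memory_summary, updated_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(thread_id) DO UPDATE SET
            trace_summary = excluded.trace_summary,
            memory_summary = excluded.memory_summary,
            updated_at = MAX(excluded.updated_at, thread_memory.updated_at + 1)
        """,
        (thread_id, trace_summary, memory_summary, now),
    )


def get_last_n_thread_memories_for_cwd(conn, cwd, n):
    return conn.execute(
        """
        SELECT m.thread_id, m.memory_summary
        FROM thread_memory m JOIN threads t ON t.id = m.thread_id
        WHERE t.cwd = ?
        ORDER BY m.updated_at DESC, m.thread_id DESC
        LIMIT ?
        """,
        (cwd, n),
    ).fetchall()
```

The `ON CONFLICT ... DO UPDATE` form requires SQLite 3.24 or newer.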
  • [feat] persist thread_dynamic_tools in db (#10252)
    Persist thread_dynamic_tools in sqlite and read first from it. Fall back
    to rollout files if it's not found. Persist dynamic tools to both sqlite
    and rollout files.
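
The read path described above amounts to a two-step lookup. In this sketch, `load_dynamic_tools` and its two reader callbacks are hypothetical, and "not found" is modeled as `None`.

```python
def load_dynamic_tools(thread_id, read_from_db, read_from_rollout):
    """Try SQLite first; fall back to the rollout file when the DB has no entry."""
    tools = read_from_db(thread_id)
    if tools is not None:
        return tools, "sqlite"
    return read_from_rollout(thread_id), "rollout"


# Stand-ins for the two stores; an older session exists only in rollout files.
db = {"t1": ["geo_lookup"]}
rollouts = {"t1": ["geo_lookup"], "t0": ["old_tool"]}
```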
    
    Saw that new sessions get populated to db correctly & old sessions get
    backfilled correctly at startup:
    ```
    celia@com-92114 codex-rs % sqlite3 ~/.codex/state.sqlite \
      "select thread_id, position,name,description,input_schema from thread_dynamic_tools;"
    019c0cad-ec0d-74b2-a787-e8b33a349117|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
    ....
    019c10ca-aa4b-7620-ae40-c0919fbd7ea7|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
    ```
  • chore: unify log queries (#10152)
    Unify log queries to only have SQLX code in the runtime and use it for
    both the log client and for tests
  • chore: improve client (#10149)
    <img width="883" height="84" alt="Screenshot 2026-01-29 at 11 13 12"
    src="https://github.com/user-attachments/assets/090a2fec-94ed-4c0f-aee5-1653ed8b1439"
    />
  • feat: add log db (#10086)
    Add a log DB. The goal is just to store our logs in a `.sqlite` DB to
    make it easier to crawl them and drop the oldest ones.
  • feat: sqlite 1 (#10004)
    Add a `.sqlite` database to be used to store rollout metadata (and
    later logs).
    This PR is phase 1:
    * Add the database and the required infrastructure
    * Add a backfill of the database
    * Persist the newly created rollout both in files and in the DB
    * When we need to get metadata or a rollout, consider the `JSONL` as the
    source of truth but compare the results with the DB and show any errors
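
The "JSONL wins, DB is cross-checked" read can be sketched as below. The `read_metadata` helper is hypothetical, and treating the first JSONL line as the metadata record is an assumption made for this example.

```python
import json


def read_metadata(jsonl_text, db_row):
    """Return metadata parsed from the JSONL plus any fields where the DB disagrees."""
    # Assumption: the first JSONL line carries the rollout metadata.
    from_file = json.loads(jsonl_text.splitlines()[0])
    mismatches = [
        f"{key}: jsonl={from_file[key]!r} db={db_row.get(key)!r}"
        for key in sorted(from_file)
        if db_row.get(key) != from_file[key]
    ]
    return from_file, mismatches
```

The caller always receives the file's values; the mismatch list exists only so discrepancies can be surfaced.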