Commit Graph

46 Commits

  • Add cwd to memory files (#11591)
    Add cwd to memory files so that the model can better handle memories
    that span multiple cwds.
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • fix: db stuff mem (#11575)
    * Documenting DB functions
    * Fixing 1 nit where stage-2 was sorting the stage 1 in the wrong
    direction
    * Added some tests
  • Ensure list_threads drops stale rollout files (#11572)
    Summary
    - trim `state_db::list_threads_db` results to entries whose rollout
    files still exist, logging and recording a discrepancy for dropped rows
    - delete stale metadata rows from the SQLite store so future calls don’t
    surface invalid paths
    - add regression coverage in `recorder.rs` to verify stale DB paths are
    dropped when the file is missing
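
    A minimal sketch of the trimming step described above, not the actual
    `state_db::list_threads_db` code. `ThreadRow` and `split_stale_rows` are
    hypothetical names, and the existence check is passed in as a closure so
    the logic can be exercised without touching the filesystem.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical, simplified stand-in for a thread row from the state DB.
    #[derive(Debug, Clone, PartialEq)]
    struct ThreadRow {
        thread_id: String,
        rollout_path: PathBuf,
    }

    /// Splits rows into (kept, stale), using `exists` to decide whether the
    /// rollout file behind each row is still present.
    fn split_stale_rows<F>(rows: Vec<ThreadRow>, exists: F) -> (Vec<ThreadRow>, Vec<ThreadRow>)
    where
        F: Fn(&Path) -> bool,
    {
        rows.into_iter()
            .partition(|row| exists(row.rollout_path.as_path()))
    }

    fn main() {
        let rows = vec![ThreadRow {
            thread_id: "a".to_string(),
            rollout_path: PathBuf::from("/definitely/missing/a.jsonl"),
        }];
        // Check against the real filesystem.
        let (kept, stale) = split_stale_rows(rows, |p| p.exists());
        for row in &stale {
            // A fuller implementation would also record a discrepancy and
            // delete the stale metadata row here.
            eprintln!(
                "dropping stale row {} ({})",
                row.thread_id,
                row.rollout_path.display()
            );
        }
        println!("kept {} rows, dropped {}", kept.len(), stale.len());
    }
    ```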
  • feat: mem slash commands (#11569)
    Add 2 slash commands for memories:
    * `/m_drop` delete all the memories
    * `/m_update` update the memories with phase 1 and 2
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
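
    The policy shape and the fail-closed rule can be sketched as follows.
    Variant and field names come from the description above; the field types,
    the `read_access` and `check_backend` helpers, and the
    `UnsupportedOperation` struct are assumptions made for illustration, and
    the other `WorkspaceWrite` fields are omitted.

    ```rust
    use std::path::PathBuf;

    /// Names are from the commit message; field types are assumed.
    #[derive(Debug, Clone, PartialEq)]
    enum ReadOnlyAccess {
        Restricted {
            include_platform_defaults: bool,
            readable_roots: Vec<PathBuf>,
        },
        FullAccess,
    }

    /// Only the read-access fields are modeled here.
    #[derive(Debug, Clone, PartialEq)]
    enum SandboxPolicy {
        ReadOnly { access: ReadOnlyAccess },
        WorkspaceWrite { read_only_access: ReadOnlyAccess },
    }

    /// Hypothetical error type standing in for the `UnsupportedOperation` failure.
    #[derive(Debug, PartialEq)]
    struct UnsupportedOperation;

    impl SandboxPolicy {
        /// Returns the read-access configuration carried by either variant.
        fn read_access(&self) -> &ReadOnlyAccess {
            match self {
                SandboxPolicy::ReadOnly { access } => access,
                SandboxPolicy::WorkspaceWrite { read_only_access } => read_only_access,
            }
        }
    }

    /// Fail closed: a backend without restricted-read support rejects the
    /// policy instead of silently granting full read access.
    fn check_backend(
        policy: &SandboxPolicy,
        supports_restricted_read: bool,
    ) -> Result<(), UnsupportedOperation> {
        match policy.read_access() {
            ReadOnlyAccess::FullAccess => Ok(()),
            ReadOnlyAccess::Restricted { .. } if supports_restricted_read => Ok(()),
            ReadOnlyAccess::Restricted { .. } => Err(UnsupportedOperation),
        }
    }

    fn main() {
        let policy = SandboxPolicy::ReadOnly {
            access: ReadOnlyAccess::Restricted {
                include_platform_defaults: true,
                readable_roots: vec![PathBuf::from("/workspace")],
            },
        };
        // A backend that has not implemented restricted reads refuses the policy.
        println!("{:?}", check_backend(&policy, false));
    }
    ```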
  • feat: new memory prompts (#11439)
    * Update prompt
    * Wire CWD in the prompt
    * Handle the no-output case
  • fix: flaky test (#11428)
    stage1_concurrent_claims_respect_running_cap was flaky due to SQLite
    lock contention, not cap logic correctness. The claim flow used deferred
    transactions (BEGIN) with read-then-write behavior, which can fail under
    concurrency with SQLITE_BUSY_SNAPSHOT/database is locked when upgrading
    a read transaction to a write transaction. We fixed this by using BEGIN
    IMMEDIATE for stage1 and phase2 claim paths, so lock acquisition happens
    up front and contenders serialize cleanly instead of failing during
    upgrade. After the change, codex-state tests pass and stress reruns of
    the flaky path no longer reproduced the failure.
  • feat: prevent double backfill (#11377)
    ## Summary
    
    Add a DB-backed lease to prevent duplicate `.sqlite` backfill workers
    from running concurrently.
    
    ### What changed
    - Added StateRuntime::try_claim_backfill(lease_seconds) that atomically
    claims backfill only when:
      - backfill is not complete, and
      - no fresh running worker currently owns it.
    - Updated backfill_sessions to use the claim API and exit early when
    another worker already holds the lease.
    - Added runtime tests covering:
      - singleton claim behavior,
      - stale lease takeover,
      - claim blocked after complete.
    - Set backfill lease to 900s in production and 1s in tests.
    
    ### Why
    
    This avoids duplicate backfill work and reduces backfill status churn
    under concurrent startup, while preserving current best-effort fallback
    behavior.
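
    The claim rule can be illustrated with an in-memory sketch. The real
    `StateRuntime::try_claim_backfill` claims atomically in the database;
    the `claimed_at` field and the free function below are assumptions used
    only to show the decision logic.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum BackfillStatus {
        Pending,
        Running,
        Complete,
    }

    #[derive(Debug, Clone, Copy)]
    struct BackfillState {
        status: BackfillStatus,
        /// Unix seconds of the most recent claim (assumed field).
        claimed_at: i64,
    }

    /// Claims the backfill only when it is not complete and no fresh running
    /// worker currently owns it. Returns true when the caller now owns it.
    fn try_claim_backfill(state: &mut BackfillState, now: i64, lease_seconds: i64) -> bool {
        let claimable = match state.status {
            BackfillStatus::Complete => false,
            BackfillStatus::Pending => true,
            // A running worker keeps ownership until its lease goes stale.
            BackfillStatus::Running => now - state.claimed_at >= lease_seconds,
        };
        if claimable {
            state.status = BackfillStatus::Running;
            state.claimed_at = now;
        }
        claimable
    }

    fn main() {
        let mut state = BackfillState {
            status: BackfillStatus::Pending,
            claimed_at: 0,
        };
        // 900s is the production lease mentioned above.
        println!("first worker claims: {}", try_claim_backfill(&mut state, 1_000, 900));
        println!("second worker claims: {}", try_claim_backfill(&mut state, 1_100, 900));
    }
    ```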
  • feat: mem v2 - PR4 (#11369)
    # Memories migration plan (simplified global workflow)
    
    ## Target behavior
    
    - One shared memory root only: `~/.codex/memories/`.
    - No per-cwd memory buckets, no cwd hash handling.
    - Phase 1 candidate rules:
      - Not currently being processed unless the job lease is stale.
      - Rollout updated within the max-age window (currently 30 days).
      - Rollout idle for at least 12 hours (new constant).
    - Global cap: at most 64 stage-1 jobs in `running` state at any time
    (new invariant).
    - Stage-1 model output shape (new):
      - `rollout_slug` (accepted but ignored for now).
      - `rollout_summary`.
      - `raw_memory`.
    - Phase-1 artifacts written under the shared root:
      - `rollout_summaries/<thread_id>.md` for each rollout summary.
      - `raw_memories.md` containing appended/merged raw memory paragraphs.
    - Phase 2 runs one consolidation agent for the shared `memories/`
    directory.
    - Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.
    
    ## Current code map
    
    - Core startup pipeline: `core/src/memories/startup/mod.rs`.
    - Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
    `core/src/memories/stage_one.rs`, templates in
    `core/templates/memories/`.
    - File materialization: `core/src/memories/storage.rs`,
    `core/src/memories/layout.rs`.
    - Scope routing (cwd/user): `core/src/memories/scope.rs`,
    `core/src/memories/startup/mod.rs`.
    - DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.
    
    ## PR plan
    
    ## PR 1: Correct phase-1 selection invariants (no behavior-breaking
    layout changes yet)
    
    - Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
    `core/src/memories/mod.rs`.
    - Thread this into `state::claim_stage1_jobs_for_startup(...)`.
    - Enforce idle-time filter in DB selection logic (not only in-memory
    filtering after `scan_limit`) so eligible threads are not starved by
    very recent threads.
    - Enforce global running cap of 64 at claim time in DB logic:
    - Count fresh `memory_stage1` running jobs.
    - Only allow new claims while count < cap.
    - Keep stale-lease takeover behavior intact.
    - Add/adjust tests in `state/src/runtime.rs`:
    - Idle filter inclusion/exclusion around 12h boundary.
    - Global running-cap guarantee.
    - Existing stale/fresh ownership behavior still passes.
    
    Acceptance criteria:
    - Startup never creates more than 64 fresh `memory_stage1` running jobs.
    - Threads updated <12h ago are skipped.
    - Threads older than 30d are skipped.
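
    As a rough illustration of these criteria, the sketch below applies the
    idle and max-age filters and the running cap in memory. The plan calls
    for enforcing them in the DB selection logic, so this is not the actual
    query. Only `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS` is named in the plan; the
    other constant names, the function names, and the treatment of the exact
    12h and 30d boundaries as eligible are assumptions. Stale-lease takeover
    is left out.

    ```rust
    const PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12;
    // Hypothetical names for the 30-day window and the cap of 64.
    const PHASE_ONE_MAX_ROLLOUT_AGE_DAYS: i64 = 30;
    const STAGE_ONE_RUNNING_CAP: usize = 64;

    /// A thread is eligible when it has been idle for at least 12 hours and
    /// was updated within the last 30 days. Timestamps are unix seconds.
    fn is_eligible(updated_at: i64, now: i64) -> bool {
        let idle_secs = now - updated_at;
        idle_secs >= PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS * 3_600
            && idle_secs <= PHASE_ONE_MAX_ROLLOUT_AGE_DAYS * 86_400
    }

    /// Returns the indices of threads to claim, never letting the number of
    /// fresh running jobs exceed the cap.
    fn claim_stage1(updated_ats: &[i64], now: i64, fresh_running: usize) -> Vec<usize> {
        let free_slots = STAGE_ONE_RUNNING_CAP.saturating_sub(fresh_running);
        let mut claimed = Vec::new();
        for (index, &updated_at) in updated_ats.iter().enumerate() {
            if claimed.len() == free_slots {
                break;
            }
            if is_eligible(updated_at, now) {
                claimed.push(index);
            }
        }
        claimed
    }

    fn main() {
        let now = 100 * 86_400;
        // One too-recent thread, one eligible thread, one too-old thread.
        let updated_ats = [now - 3_600, now - 13 * 3_600, now - 40 * 86_400];
        println!("claimed indices: {:?}", claim_stage1(&updated_ats, now, 0));
    }
    ```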
    
    ## PR 2: Stage-1 output contract + storage artifacts
    (forward-compatible)
    
    - Update parser/types to accept the new structured output while keeping
    backward compatibility:
    - Add `rollout_slug` (optional for now).
    - Add `rollout_summary`.
    - Keep alias support for legacy `summary` and `rawMemory` until prompt
    swap completes.
    - Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
    include the new keys.
    - Update prompt templates:
    - `core/templates/memories/stage_one_system.md`.
    - `core/templates/memories/stage_one_input.md`.
    - Replace storage model in `core/src/memories/storage.rs`:
    - Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
    files).
    - Introduce `raw_memories.md` aggregator writer from DB rows.
    - Keep deterministic rebuild behavior from DB outputs so files can
    always be regenerated.
    - Update consolidation prompt template to reference `rollout_summaries/`
    + `raw_memories.md` inputs.
    
    Acceptance criteria:
    - Stage-1 accepts both old and new output keys during migration.
    - Phase-1 artifacts are generated in new format from DB state.
    - No dependence on per-thread files in `raw_memories/`.
    
    ## PR 3: Remove per-cwd memories and move to one global memory root
    
    - Simplify layout in `core/src/memories/layout.rs`:
    - Single root: `codex_home/memories`.
    - Remove cwd-hash bucket helpers and normalization logic used only for
    memory pathing.
    - Remove scope branching from startup phase-2 dispatch path:
    - No cwd/user mapping in `core/src/memories/startup/mod.rs`.
    - One target root for consolidation.
    - In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
    consolidation scope.
    - Keep one logical consolidation scope/job key (global/user) to avoid a
    risky schema rewrite in same PR.
    - Add one-time migration helper (core side) to preserve current shared
    memory output:
    - If `~/.codex/memories/user/memory` exists and new root is empty,
    move/copy contents into `~/.codex/memories`.
    - Leave old hashed cwd buckets untouched for now (safe/no-destructive
    migration).
    
    Acceptance criteria:
    - New runs only read/write `~/.codex/memories`.
    - No new cwd-scoped consolidation jobs are enqueued.
    - Existing user-shared memory content is preserved.
    
    ## PR 4: Phase-2 global lock simplification and cleanup
    
    - Replace multi-scope dispatch with a single global consolidation claim
    path:
    - Either reuse jobs table with one fixed key, or add a tiny dedicated
    lock helper; keep 1h lease.
    - Ensure at most one consolidation agent can run at once.
    - Keep heartbeat + stale lock recovery semantics in
    `core/src/memories/startup/watch.rs`.
    - Remove dead scope code and legacy constants no longer used.
    - Update tests:
    - One-agent-at-a-time behavior.
    - Lock expiry allows takeover after stale lease.
    
    Acceptance criteria:
    - Exactly one phase-2 consolidation agent can be active cluster-wide
    (per local DB).
    - Stale lock recovers automatically.
    
    ## PR 5: Final cleanup and docs
    
    - Remove legacy artifacts and references:
    - `raw_memories/` and `memory_summary.md` assumptions from
    prompts/comments/tests.
    - Scope constants for cwd memory pathing in core/state if fully unused.
    - Update docs under `docs/` for memory workflow and directory layout.
    - Add a brief operator note for rollout: compatibility window for old
    stage-1 JSON keys and when to remove aliases.
    
    Acceptance criteria:
    - Code and docs reflect only the simplified global workflow.
    - No stale references to per-cwd memory buckets.
    
    ## Notes on sequencing
    
    - PR 1 is safest first because it improves correctness without changing
    external artifact layout.
    - PR 2 keeps parser compatibility so prompt deployment can happen
    independently.
    - PR 3 and PR 4 split filesystem/scope simplification from locking
    simplification to reduce blast radius.
    - PR 5 is intentionally cleanup-only.
  • feat: mem v2 - PR3 (#11366)
  • feat: mem v2 - PR1 (#11364)
  • feat: phase 2 consolidation (#11306)
    Consolidation phase of memories
    
    Cleaning and better handling of concurrency
  • feat: align memory phase 1 and make it stronger (#11300)
    ## Align with the new phase-1 design
    
    Basically, we now run phase 1 in parallel by considering:
    * Max 64 rollouts
    * Max 1 month old
    * Consider the most recent first
    
    This PR also adds stronger parallelization capabilities: stale-job
    detection, retry policies, and ownership of computation to prevent
    double computation.
  • memories: add extraction and prompt module foundation (#11200)
    ## Summary
    - add the new `core/src/memories` module (phase-one parsing, rollout
    filtering, storage, selection, prompts)
    - add Askama-backed memory templates for stage-one input/system and
    consolidation prompts
    - add module tests for parsing, filtering, path bucketing, and summary
    maintenance
    
    ## Testing
    - just fmt
    - cargo test -p codex-core --lib memories::
  • state: add memory consolidation lock primitives (#11199)
    ## Summary
    - add a migration for memory_consolidation_locks
    - add acquire/release lock primitives to codex-state runtime
    - add core/state_db wrappers and cwd normalization for memory queries
    and lock keys
    
    ## Testing
    - cargo test -p codex-state memory_consolidation_lock_
    - cargo test -p codex-core --lib state_db::
  • Leverage state DB metadata for thread summaries (#10621)
    Summary:
    - read conversation summaries and cwd info from the state DB when
    possible so we no longer rely on rollout files for metadata and avoid
    extra I/O
    - persist CLI version in thread metadata, surface it through summary
    builders, and add the necessary DB migration hooks
    - simplify thread listing by using enriched state DB data directly
    rather than reading rollout heads
    
    Testing:
    - Not run (not requested)
  • feat: resumable backfill (#10745)
    ## Summary
    
    This PR makes SQLite rollout backfill resumable and repeatable instead
    of one-shot-on-db-create.
    
    ## What changed
    
    - Added a persisted backfill state table:
      - state/migrations/0008_backfill_state.sql
      - Tracks status (pending|running|complete), last_watermark, and
        last_success_at.
    - Added backfill state model/types in codex-state:
      - BackfillState, BackfillStatus (state/src/model/backfill_state.rs)
    - Added runtime APIs to manage backfill lifecycle/progress:
      - get_backfill_state
      - mark_backfill_running
      - checkpoint_backfill
      - mark_backfill_complete
    - Updated core startup behavior:
      - Backfill now runs whenever state is not Complete (not only when DB
        file is newly created).
    - Reworked backfill execution:
      - Collect rollout files, derive deterministic watermark per path, sort,
        resume from last_watermark.
      - Process in batches (BACKFILL_BATCH_SIZE = 200), checkpoint after each
        batch.
      - Mark complete with last_success_at at the end.
    
    ## Why
    
    Previous behavior could leave users permanently partially backfilled if
    the process exited during initial async backfill. This change allows
    safe continuation across restarts and avoids restarting from scratch.
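
    The resume-and-checkpoint loop might look roughly like the sketch below.
    It assumes the watermark is simply the path string and that a restart
    resumes strictly after `last_watermark`. `run_backfill` and its signature
    are hypothetical, and checkpoints are returned to the caller rather than
    persisted.

    ```rust
    const BACKFILL_BATCH_SIZE: usize = 200;

    /// Processes paths in sorted order, skipping everything at or before
    /// `last_watermark`, and returns the watermark checkpointed after each batch.
    fn run_backfill(
        mut paths: Vec<String>,
        last_watermark: Option<&str>,
        batch_size: usize,
        mut process: impl FnMut(&str),
    ) -> Vec<String> {
        assert!(batch_size > 0, "batch size must be positive");
        // Sorting gives a deterministic order, so a watermark identifies a
        // stable resume point across restarts.
        paths.sort();
        let pending: Vec<String> = paths
            .into_iter()
            .filter(|p| last_watermark.map_or(true, |w| p.as_str() > w))
            .collect();

        let mut checkpoints = Vec::new();
        for batch in pending.chunks(batch_size) {
            for path in batch {
                process(path.as_str());
            }
            // Checkpoint after each batch so a crash loses at most one batch.
            if let Some(last) = batch.last() {
                checkpoints.push(last.clone());
            }
        }
        checkpoints
    }

    fn main() {
        let paths = vec!["b.jsonl".to_string(), "a.jsonl".to_string()];
        let checkpoints = run_backfill(paths, None, BACKFILL_BATCH_SIZE, |p| {
            println!("backfilling {p}");
        });
        println!("checkpoints: {checkpoints:?}");
    }
    ```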
  • feat: add phase 1 mem db (#10634)
    - Schema: thread_id (PK, FK to threads.id with cascade delete),
    trace_summary, memory_summary, updated_at.
    - Migration: creates the table and an index on (updated_at DESC,
    thread_id DESC) for efficient recent-first reads.
    - Runtime API (DB-only):
      - `get_thread_memory(thread_id)`: fetch one memory row.
      - `upsert_thread_memory(thread_id, trace_summary, memory_summary)`:
        insert/update by thread id and always advance updated_at.
      - `get_last_n_thread_memories_for_cwd(cwd, n)`: join thread_memory with
        threads and return newest n rows for an exact cwd match.
    - Model layer: introduced ThreadMemory and row conversion types to keep
    query decoding typed and consistent with existing state models.
  • Migrate state DB path helpers to versioned filename (#10623)
    Summary
    - add versioned state sqlite filename helpers and re-export them from
    the state crate
    - remove legacy state files when initializing the runtime and update
    consumers/tests to use the new helpers
    - tweak logs client description and database resolution to match the new
    path
  • chore: simplify user message detection (#10611)
    We no longer check response items with the `user` role, as they may be
    instructions, etc.
  • Prefer state DB thread listings before filesystem (#10544)
    Summary
    - add Cursor/ThreadsPage conversions so state DB listings can be mapped
    back into the rollout list model
    - make recorder list helpers query the state DB first (archived flag
    included) and only fall back to file traversal if needed, along with
    populating head bytes lazily
    - add extensive tests to ensure the DB path is honored for active and
    archived threads and that the fallback works
    
    Testing
    - Not run (not requested)
    
  • Avoid redundant transactional check before inserting dynamic tools (#10521)
    Summary
    - remove the extra transaction guard that checked for existing dynamic
    tools per thread before inserting new ones
    - insert each tool record with `ON CONFLICT(thread_id, position) DO
    NOTHING` to ignore duplicates instead of pre-querying
    - simplify execution to use the shared pool directly and avoid unneeded
    commits
    
    Testing
    - Not run (not requested)
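
    The insert-and-ignore semantics can be modeled in memory as below. This
    is only an analogue of `ON CONFLICT(thread_id, position) DO NOTHING`, not
    the SQL path itself; the map-backed table and the function name are
    hypothetical.

    ```rust
    use std::collections::btree_map::Entry;
    use std::collections::BTreeMap;

    /// Inserts tools keyed by (thread_id, position) and silently skips keys
    /// that already exist. Returns the number of rows actually inserted.
    fn insert_tools_ignoring_conflicts(
        table: &mut BTreeMap<(String, u32), String>,
        thread_id: &str,
        tool_names: &[&str],
    ) -> usize {
        let mut inserted = 0;
        for (position, name) in tool_names.iter().enumerate() {
            let key = (thread_id.to_string(), position as u32);
            // No pre-query: the conflict is resolved at insert time.
            if let Entry::Vacant(slot) = table.entry(key) {
                slot.insert((*name).to_string());
                inserted += 1;
            }
        }
        inserted
    }

    fn main() {
        let mut table = BTreeMap::new();
        let first = insert_tools_ignoring_conflicts(&mut table, "t1", &["geo_lookup"]);
        let second = insert_tools_ignoring_conflicts(&mut table, "t1", &["geo_lookup"]);
        println!("first call inserted {first}, second call inserted {second}");
    }
    ```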
  • chore: add phase to message responseitem (#10455)
    ### What
    
    add wiring for `phase` field on `ResponseItem::Message` to lay
    groundwork for differentiating model preambles and final messages.
    currently optional.
    
    follows pattern in #9698.
    
    updated schemas with `just write-app-server-schema` so we can see type
    changes.
    
    ### Tests
    Updated existing tests for SSE parsing and hydrating from history
  • [feat] persist thread_dynamic_tools in db (#10252)
    Persist thread_dynamic_tools in sqlite and read first from it. Fall back
    to rollout files if it's not found. Persist dynamic tools to both sqlite
    and rollout files.
    
    Saw that new sessions get populated to db correctly & old sessions get
    backfilled correctly at startup:
    ```
    celia@com-92114 codex-rs % sqlite3 ~/.codex/state.sqlite \
      "select thread_id, position,name,description,input_schema from thread_dynamic_tools;"
    019c0cad-ec0d-74b2-a787-e8b33a349117|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
    ....
    019c10ca-aa4b-7620-ae40-c0919fbd7ea7|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
    ```
  • feat: backfill timing metric (#10218)
    1. Add a metric to measure the backfill time
    2. Add a unit to the timing histogram
  • feat: reduce span exposition (#10171)
    This only avoids the creation of duplicate spans
  • chore: unify log queries (#10152)
    Unify log queries to only have SQLX code in the runtime and use it for
    both the log client and for tests
  • chore: improve client (#10149)
  • feat: log db client (#10087)
    ```
    just log -h
    if [ "${1:-}" = "--" ]; then shift; fi; cargo run -p codex-state --bin logs_client -- "$@"
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.21s
         Running `target/debug/logs_client -h`
    Tail Codex logs from state.sqlite with simple filters
    
    Usage: logs_client [OPTIONS]
    
    Options:
          --codex-home <CODEX_HOME>  Path to CODEX_HOME. Defaults to $CODEX_HOME or ~/.codex [env: CODEX_HOME=]
          --db <DB>                  Direct path to the SQLite database. Overrides --codex-home
          --level <LEVEL>            Log level to match exactly (case-insensitive)
          --from <RFC3339|UNIX>      Start timestamp (RFC3339 or unix seconds)
          --to <RFC3339|UNIX>        End timestamp (RFC3339 or unix seconds)
          --module <MODULE>          Substring match on module_path
          --file <FILE>              Substring match on file path
          --backfill <BACKFILL>      Number of matching rows to show before tailing [default: 200]
          --poll-ms <POLL_MS>        Poll interval in milliseconds [default: 500]
      -h, --help                     Print help
      ```
  • feat: add log db (#10086)
    Add a log DB. The goal is just to store our logs in a `.sqlite` DB to
    make it easier to crawl them and drop the oldest ones.
  • feat: sqlite 1 (#10004)
    Add a `.sqlite` database to be used to store rollout metadata (and
    later logs)
    This PR is phase 1:
    * Add the database and the required infrastructure
    * Add a backfill of the database
    * Persist the newly created rollout both in files and in the DB
    * When we need to get metadata or a rollout, consider the `JSONL` as the
    source of truth but compare the results with the DB and show any errors