## Description This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`. This will be stored in `SessionMeta` in the JSONL rollout file and as a new column in the SQLite thread_metadata table, and exposed on `thread/start` and on the `Thread` object in app-server. ## What changed - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`, defaulting old and new SessionMeta to `legacy`. - Carried `history_mode` through core session config, ThreadStore stored metadata, local/in-memory stores, rollout metadata extraction, and the existing SQLite `threads` table. - Added experimental `historyMode` to app-server v2 `Thread` and `thread/start`. - Made paginated stored threads metadata-discoverable but unsupported for legacy full-history reads, `load_history`, live resume, and create paths. - Regenerated app-server schema fixtures and added protocol/state/thread-store/app-server coverage for persistence and fail-closed behavior. ## Compatibility floor Because users may be running various versions of Codex binaries on the same machine (TUI, Codex App, etc.), we will need to establish a compatibility floor for upcoming paginated threads, which will change how thread storage reads and writes work. The overall plan here: ``` Release N: - Add historyMode to SessionMeta / Thread / SQLite metadata. - Teach binaries to understand paginated threads. - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread. - Default remains `"legacy"`. Release N+1: - First-party clients start opting into paginated threads where appropriate. - Internal dogfood / staged rollout. - Measure old-client usage and paginated-thread unsupported errors. Release N+2: - Only after Release N+ is overwhelmingly deployed, make paginated the default. - Accept that a small tail of N-1-or-older binaries may not understand paginated threads. ``` The important behavior change is fail-closed handling for a binary that encounters a persisted `paginated` thread before it knows how to fully support paginated history. In app-server, if a thread is `paginated`, we will: - allow metadata-only discovery paths like `thread/list` and `thread/read(includeTurns=false)`, so clients can still see the thread and inspect its `historyMode` - reject legacy full-history/live-thread paths like `thread/read(includeTurns=true)` and `thread/resume` with an unsupported JSON-RPC error - avoid silently treating an unknown or future `historyMode` as `legacy` Under the hood, the ThreadStore layer also rejects legacy operations that would need to load or replay the full thread history for a paginated thread. That gives us the behavior we want for Release N: future paginated threads are visible, but this binary fails closed instead of trying to operate on them as if they were legacy threads.
Memories
This directory owns reusable memory crates and the memory pipeline documentation.
Runtime orchestration for Phase 1 and Phase 2 still lives in codex-core under
codex-rs/core/src/memories/.
Crates
codex-rs/memories/read(codex-memories-read) owns the read path: memory developer-instruction injection, memory citation parsing, and read-usage telemetry classification.codex-rs/memories/write(codex-memories-write) owns the write path: Phase 1 and Phase 2 prompt rendering, filesystem artifact helpers, workspace diff helpers, and extension resource pruning.
Prompt Templates
Memory prompt templates live with the crate that uses them:
- The undated template files are the canonical latest versions used at runtime:
read/templates/memories/read_path.mdwrite/templates/memories/stage_one_system.mdwrite/templates/memories/stage_one_input.mdwrite/templates/memories/consolidation.md
- In
codex, edit those undated template files in place. - The dated snapshot-copy workflow is used in the separate
openai/project/agent_memory/writeharness repo, not here.
When it runs
The pipeline is triggered when a root session starts, and only if:
- the session is not ephemeral
- the memory feature is enabled
- the session is not a sub-agent session
- the state DB is available
It runs asynchronously in the background and executes two phases in order: Phase 1, then Phase 2.
Phase 1: Rollout Extraction (per-thread)
Phase 1 finds recent eligible rollouts and extracts a structured memory from each one.
Eligible rollouts are selected from the state DB using startup claim rules. In practice this means the pipeline only considers rollouts that are:
- from allowed interactive session sources
- within the configured age window
- idle long enough (to avoid summarizing still-active/fresh rollouts)
- not already owned by another in-flight phase-1 worker
- within startup scan/claim limits (bounded work per startup)
What it does:
- claims a bounded set of rollout jobs from the state DB (startup claim)
- filters rollout content down to memory-relevant response items
- sends each rollout to a model (in parallel, with a concurrency cap)
- expects structured output containing:
- a detailed
raw_memory - a compact
rollout_summary - an optional
rollout_slug
- a detailed
- redacts secrets from the generated memory fields
- stores successful outputs back into the state DB as stage-1 outputs
Concurrency / coordination:
- Phase 1 runs multiple extraction jobs in parallel (with a fixed concurrency cap) so startup memory generation can process several rollouts at once.
- Each job is leased/claimed in the state DB before processing, which prevents duplicate work across concurrent workers/startups.
- Failed jobs are marked with retry backoff, so they are retried later instead of hot-looping.
Job outcomes:
succeeded(memory produced)succeeded_no_output(valid run but nothing useful generated)failed(with retry backoff/lease handling in DB)
Phase 1 is the stage that turns individual rollouts into DB-backed memory records.
Phase 2: Global Consolidation
Phase 2 consolidates the latest stage-1 outputs into the filesystem memory artifacts and then runs a dedicated consolidation agent.
What it does:
- claims a single global phase-2 lock before touching the memories root (so only one consolidation inspects or mutates the workspace at a time)
- loads a bounded set of stage-1 outputs from the state DB using phase-2
selection rules:
- ignores memories whose
last_usagefalls outside the configuredmax_unused_dayswindow - for memories with no
last_usage, falls back togenerated_atso fresh never-used memories can still be selected - ranks eligible memories by
usage_countfirst, then by the most recentlast_usage/generated_at
- ignores memories whose
- computes a completion watermark from the claimed watermark + newest input timestamps
- syncs local memory artifacts under the memories root:
raw_memories.md(merged raw memories, stable ascending thread-id order)rollout_summaries/(one summary file per selected rollout)
- keeps the memories root itself as a git-baseline directory, initialized under
~/.codex/memories/.gitbycodex-git-utils - prunes stale rollout summaries that are no longer selected
- prunes memory extension resource files older than the extension retention window, so cleanup appears in the workspace diff
- writes
phase2_workspace_diff.mdin the memories root with the git-style diff from the previous successful Phase 2 baseline to the current worktree - if the memory workspace has no changes after artifact sync/pruning, marks the job successful and exits
If the memory workspace has changes, it then:
- spawns an internal consolidation sub-agent
- builds the Phase 2 prompt with the path to the generated workspace diff
- points the agent at
phase2_workspace_diff.mdfor the detailed diff context - runs it with no approvals, no network, and local write access only
- disables collab for that agent (to prevent recursive delegation)
- watches the agent status and heartbeats the global job lease while it runs
- resets the memory git baseline after the agent completes successfully; the generated diff file is removed before this reset so deleted content is not kept in the prompt artifact or unreachable git objects
- marks the phase-2 job success/failure in the state DB when the agent finishes
Selection and workspace-diff behavior:
- successful Phase 2 runs mark the exact stage-1 snapshots they consumed with
selected_for_phase2 = 1and persist the matchingselected_for_phase2_source_updated_at - Phase 1 upserts preserve the previous
selected_for_phase2baseline until the next successful Phase 2 run rewrites it - Phase 2 loads only the current top-N selected stage-1 inputs, syncs
rollout_summaries/directly to that selection, rendersraw_memories.mdin stable ascending thread-id order to avoid usage-rank churn, then lets the git-style workspace diff surface additions, modifications, and deletions against the previous successful memory baseline - when the selected input set is empty, stale
rollout_summaries/files are removed andraw_memories.mdis rewritten to the empty-input placeholder; consolidated outputs such asMEMORY.md,memory_summary.md, andskills/are left for the agent to update
Watermark behavior:
- The global phase-2 lock does not use DB watermarks as a dirty check; git workspace dirtiness decides whether an agent needs to run.
- The global phase-2 job row still tracks an input watermark as bookkeeping for the latest DB input timestamp known when the job was claimed.
- Phase 2 recomputes a
new_watermarkusing the max of:- the claimed watermark
- the newest
source_updated_attimestamp in the stage-1 inputs it actually loaded
- On success, Phase 2 stores that completion watermark in the DB.
- This avoids moving the recorded completion watermark backwards, but does not decide whether Phase 2 has work.
In practice, this phase is responsible for refreshing the on-disk memory workspace and producing/updating the higher-level consolidated memory outputs.
Why it is split into two phases
- Phase 1 scales across many rollouts and produces normalized per-rollout memory records.
- Phase 2 serializes global consolidation so the shared memory artifacts are updated safely and consistently.