82 Commits

  • [codex] Restore thread recency with compatible migration history (#28671)
    ## Summary
    
    - Revert #28655, restoring the thread `recencyAt` behavior introduced by
    #27910.
    - Move `threads_recency_at` to migration 0039 so it no longer collides
    with `external_agent_config_imports` at version 0038.
    - Repair databases that already applied the recency migration as version
    38 by moving the matching migration-history row to version 39 before
    SQLx validation. The current version-38 migration can then apply
    normally.
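
    The repair step can be pictured with a small SQLite sketch. It assumes a
    simplified `_sqlx_migrations` table (only `version`, `description`, and
    `checksum`), made-up description and checksum values, and a hypothetical
    `repair_recency_version` helper. The real repair lives in `codex-state`
    and may match rows differently.

    ```python
    import sqlite3

    # Assumed value for illustration; the real checksum differs.
    RECENCY_CHECKSUM = b"recency-checksum"


    def make_history(rows):
        """Build an in-memory, simplified migration-history table."""
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE _sqlx_migrations ("
            "version INTEGER PRIMARY KEY, "
            "description TEXT NOT NULL, "
            "checksum BLOB NOT NULL)"
        )
        conn.executemany("INSERT INTO _sqlx_migrations VALUES (?, ?, ?)", rows)
        conn.commit()
        return conn


    def repair_recency_version(conn: sqlite3.Connection) -> bool:
        """Move a recency migration recorded as version 38 to version 39.

        Returns True when a row was moved. A version-38 row that belongs to a
        different migration (different checksum) is left untouched.
        """
        with conn:  # commit on success, roll back on error
            cur = conn.execute(
                "UPDATE _sqlx_migrations SET version = 39 "
                "WHERE version = 38 AND checksum = ? "
                "AND NOT EXISTS "
                "(SELECT 1 FROM _sqlx_migrations WHERE version = 39)",
                (RECENCY_CHECKSUM,),
            )
        return cur.rowcount == 1
    ```

    Matching on the checksum rather than on the version alone is what keeps
    the sketch from rewriting a legitimate version-38 row.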
    
    ## Validation
    
    - `just test -p codex-state
    migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
    - `just test -p codex-state -p codex-rollout -p codex-thread-store -p
    codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
    could not open the machine's existing read-only incident database at
    `~/.codex/sqlite/state_5.sqlite`.
    - `just fix -p codex-state`
    - `just fmt`
    - Verified that state migration versions are unique.
  • Revert thread recencyAt for sidebar ordering (#28655)
    ## Why
    
    Revert #27910 to remove the newly introduced thread `recencyAt`
    persistence and API behavior from `main`.
    
    ## What changed
    
    This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
    including the state migration, thread-store propagation, app-server API
    surface, generated schemas, and related tests.
    
    ## Validation
    
    Not run before opening; relying on CI for the initial fast signal.
  • Add thread recencyAt for sidebar ordering (#27910)
    ## Summary
    
    Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
    sort key for product recency ordering while preserving the existing
    meaning of `updatedAt` as the latest persisted thread mutation.
    
    This is the server-side alternative to #27697. Rather than narrowing
    `updatedAt`, clients can sort the sidebar by `recency_at` and continue
    treating `updatedAt` as mutation time.
    
    Paired Codex Apps PR:
    [openai/openai#1024599](https://github.com/openai/openai/pull/1024599)
    
    ## Contract
    
    - `recencyAt` initializes when a thread is created.
    - A turn start advances `recencyAt` monotonically.
    - Commentary, agent output, tool results, token/accounting updates, turn
    completion, archive, unarchive, resume, and generic metadata writes do
    not advance it.
    - `updatedAt` retains its existing behavior and continues to advance for
    persisted thread mutations.
    - Current servers populate `recencyAt`; the response field is optional
    in generated TypeScript so clients connected to older servers can fall
    back to `updatedAt`.
    - Filesystem-only fallback uses existing updated/mtime ordering when
    SQLite is unavailable.
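
    On the client side, the optional field suggests a fallback along these
    lines. The `recency_key` and `sort_sidebar` helpers are hypothetical, and
    threads are shown as plain dictionaries rather than generated types.

    ```python
    from typing import Any


    def recency_key(thread: dict[str, Any]) -> int:
        """Prefer the server-owned recencyAt; fall back to updatedAt."""
        recency = thread.get("recencyAt")
        return thread["updatedAt"] if recency is None else recency


    def sort_sidebar(threads: list[dict[str, Any]]) -> list[dict[str, Any]]:
        """Order threads newest first by their recency key."""
        return sorted(threads, key=recency_key, reverse=True)
    ```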
    
    ## Persistence and compatibility
    
    Migration 0038 adds second- and millisecond-precision recency columns,
    backfills them from the existing updated timestamp, creates list
    indexes, and includes an insert trigger so older binaries writing to a
    migrated database seed recency without causing later mutations to
    advance it.
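
    A minimal SQLite sketch of the insert-trigger idea follows. The table is
    reduced to three columns and the trigger name is made up; the actual
    migration also covers the millisecond column and the list indexes.

    ```python
    import sqlite3

    SCHEMA = """
    CREATE TABLE threads (
        id TEXT PRIMARY KEY,
        updated_at INTEGER NOT NULL,
        recency_at INTEGER
    );
    -- Seed recency for writers that do not know about the column.
    CREATE TRIGGER threads_seed_recency AFTER INSERT ON threads
    WHEN NEW.recency_at IS NULL
    BEGIN
        UPDATE threads SET recency_at = NEW.updated_at WHERE id = NEW.id;
    END;
    """


    def open_db() -> sqlite3.Connection:
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA)
        return conn


    def old_binary_insert(conn, thread_id, updated_at):
        """An older binary inserts without mentioning recency_at."""
        conn.execute(
            "INSERT INTO threads (id, updated_at) VALUES (?, ?)",
            (thread_id, updated_at),
        )


    def old_binary_update(conn, thread_id, updated_at):
        """A later mutation bumps updated_at only; no UPDATE trigger exists."""
        conn.execute(
            "UPDATE threads SET updated_at = ? WHERE id = ?",
            (updated_at, thread_id),
        )
    ```

    Because the trigger fires only on insert, later writes from an older
    binary leave the seeded recency value alone.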
    
    Generic metadata upserts preserve existing recency values. Turn-start
    updates use a dedicated monotonic touch, and process-local allocation
    keeps millisecond cursor values unique. State DB list, search, read,
    filtered-list repair, rollout fallback propagation, and app-server
    conversions all carry the new field.
    
    ## API
    
    `Thread` responses include:
    
    ```ts
    recencyAt?: number
    ```
    
    `thread/list` and `thread/search` accept:
    
    ```json
    { "sortKey": "recency_at" }
    ```
    
    Generated TypeScript and JSON schemas are included.
    
    ## Validation
    
    - `just test -p codex-state` — 146 passed
    - `just test -p codex-rollout` — 69 passed
    - `just test -p codex-thread-store` — 81 passed
    - `just test -p codex-app-server-protocol` — 231 passed
    - Focused app-server list ordering, response mapping, archive/unarchive,
    and resume lifecycle tests passed
    - Scoped `just fix` for state, rollout, thread-store,
    app-server-protocol, and app-server
    - `just fmt`
    - `git diff --check`
    - Independent correctness, simplicity, elegance, security, and
    test-quality reviews; actionable ordering, lifecycle, query-projection,
    and timestamp-uniqueness findings were addressed
  • [codex] Record external agent import results (#28396)
    ## Summary
    - restore `externalAgentConfig/import/progress` notifications while
    keeping `externalAgentConfig/import/completed` as the must-deliver event
    - persist completed external-agent config imports in state DB by
    `importId`, including concrete success/failure details for config,
    AGENTS.md, skills, plugins, MCP servers, subagents, hooks, commands, and
    sessions
    - add `externalAgentConfig/import/readHistories` so clients can recover
    persisted import results after missing the live completion notification
    - include `errorType` on import failures in protocol
    responses/notifications and persisted DB JSON so future code can
    classify failures without another wire/storage shape change
    
    ## Validation
    - `git diff --check`
    - `just test -p codex-state external_agent_config_imports`
    - `just test -p codex-app-server-protocol`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-sqlite-read-details
    just test -p codex-app-server
    external_agent_config_import_sends_completion_notification_for_sync_only_import`
    
    Also ran earlier broader checks before publishing:
    - `just test -p codex-state`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-external-agent-test-sqlite
    just test -p codex-app-server external_agent_config`
    - `just test -p codex-external-agent-migration`
  • fix: Auto-recover from corrupted sqlite databases (#26859)
    Further investigation of the sqlite incidents showed that the problems
    are due to corruption from the older version of SQLite that we recently
    upgraded, and that the data is truly corrupted in the root database --
    recovery of all data is not possible. Given that the data is
    reconstructable from the rollouts on disk, we should just auto-backup
    the database and let codex rebuild the rollout info from the disk
    rollouts.
    
    The new behavior is that appserver auto-backs-up and rebuilds (with logs
    reflecting that behavior). The CLI now pops a message letting you know
    this happened and the paths of the backed-up corrupt db and the new
    database. There is also context added so that the desktop app can read
    the rebuild info from it and inform the user with it.
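
    The back-up-and-rebuild flow might look roughly like the sketch below.
    The `is_healthy` and `open_or_rebuild` helpers and the backup file naming
    are assumptions, and rebuilding rows from the on-disk rollouts is left
    out entirely.

    ```python
    import sqlite3
    import time
    from pathlib import Path


    def is_healthy(db_path: Path) -> bool:
        """True when SQLite can read the file and integrity_check says ok."""
        try:
            conn = sqlite3.connect(db_path)
            try:
                row = conn.execute("PRAGMA integrity_check").fetchone()
            finally:
                conn.close()
        except sqlite3.DatabaseError:
            return False
        return row is not None and row[0] == "ok"


    def open_or_rebuild(db_path: Path):
        """Open db_path, moving a corrupt file aside first.

        Returns (connection, backup_path). backup_path is None when no
        rebuild was needed, so callers can report both paths to the user.
        """
        backup_path = None
        if db_path.exists() and not is_healthy(db_path):
            stamp = int(time.time())
            backup_path = db_path.with_name(f"{db_path.name}.corrupt-{stamp}")
            db_path.rename(backup_path)
        return sqlite3.connect(db_path), backup_path
    ```

    A fuller version would also move any `-wal` and `-shm` sidecar files
    along with the main database file.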
  • Avoid no-op backfill state writes (#26420)
    ## Summary
    
    - avoid acquiring SQLite's writer slot when the singleton backfill row
    already exists
    - preserve race-safe repair when the row is missing
    - add regressions for writer contention and missing-row repair
    
    ## Why
    
    State runtime initialization and backfill-state reads previously
    executed `INSERT ... ON CONFLICT DO NOTHING` even in the steady state.
    SQLite still enters the writer path for that statement, so TUI and
    app-server startup could wait behind another writer for up to the
    configured five-second busy timeout.
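
    The read-first pattern can be sketched as follows. The `backfill_state`
    table layout and the `ensure_backfill_row` helper are assumptions made
    for illustration, not the actual `codex-state` schema.

    ```python
    import sqlite3


    def open_state_db() -> sqlite3.Connection:
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE backfill_state ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), status TEXT NOT NULL)"
        )
        return conn


    def ensure_backfill_row(conn: sqlite3.Connection) -> bool:
        """Make sure the singleton row exists; return True only if a write ran.

        The SELECT covers the steady state without asking for the writer
        lock. The INSERT keeps ON CONFLICT DO NOTHING so that two processes
        racing to repair a missing row cannot fail on the primary key.
        """
        row = conn.execute("SELECT 1 FROM backfill_state WHERE id = 1").fetchone()
        if row is not None:
            return False
        with conn:
            conn.execute(
                "INSERT INTO backfill_state (id, status) VALUES (1, 'pending') "
                "ON CONFLICT (id) DO NOTHING"
            )
        return True
    ```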
    
    ## Validation
    
    - `just test -p codex-state` (134 tests passed)
    - `just fix -p codex-state`
    - `just fmt`
  • [codex] Remove redundant SQLite dynamic tool storage (#24819)
    ## Why
    
    Dynamic tools are defined at thread start and already stored in rollout
    `SessionMeta`, which restores resumed and forked sessions. Persisting
    the same tools through SQLite creates a second runtime persistence path
    that is unnecessary prework for the explicit namespace refactor.
    
    ## What changed
    
    - Restore missing thread-start dynamic tools directly from rollout
    history, including when SQLite is enabled.
    - Remove SQLite dynamic-tool reads, writes, backfill, and thread
    metadata patch plumbing.
    - Add SQLite-enabled resume integration coverage that verifies a
    rollout-defined dynamic tool is still sent after resume.
    
    ## Compatibility
    
    The existing `thread_dynamic_tools` table is intentionally not dropped
    even though it's now unused. Older Codex binaries are allowed to open
    databases migrated by newer binaries and still reference this table;
    dropping it would break that mixed-version path. See
    [here](https://github.com/openai/codex/blob/main/codex-rs/state/src/migrations.rs#L10-L11).
    
    ## Verification
    
    - `just test -p codex-state -p codex-rollout -p codex-thread-store`
    - `just test -p codex-core --test all
    resume_restores_dynamic_tools_from_rollout_with_sqlite_enabled`
  • Move memory state to a dedicated SQLite DB (#24591)
    ## Summary
    
    Generated memory rows and their stage-one/stage-two job state currently
    live in `state_5.sqlite` alongside thread metadata. That makes memory
    cleanup and regeneration share the main state schema even though those
    rows are memory-pipeline data and can be rebuilt independently from the
    durable thread records.
    
    This PR moves the memory-owned tables into a dedicated
    `memories_1.sqlite` runtime database while keeping thread metadata in
    `state_5.sqlite`.
    
    ## Changes
    
    - Adds a separate memories DB runtime, migrator, path helpers, telemetry
    kind, and Bazel compile data for `state/memory_migrations`.
    - Introduces `MemoryStore` behind `StateRuntime::memories()` and moves
    memory table/job operations onto that store.
    - Drops the old memory tables from the state DB and recreates their
    schema in `state/memory_migrations/0001_memories.sql`.
    - Updates memory startup, citation usage tracking, rollout pollution
    handling, `debug clear-memories`, and app-server `memory/reset` to
    operate through the memories DB.
    - Preserves cross-DB behavior by hydrating thread metadata from the
    state DB when selecting visible memory outputs and checking stage-one
    staleness.
    
    ## Verification
    
    - Added/updated `codex-state` tests for deleted-thread memory visibility
    and already-polluted phase-two enqueue behavior.
    - Updated `debug clear-memories`, app-server `memory/reset`, and
    memories startup tests to seed and assert memory rows through
    `memories_1.sqlite`.
  • fix: main (#23675)
    Fix main due to conflicting merges
    This is only fixing some imports and mechanics
  • feat: rename 2 (#23668)
    Just a mechanical renaming
  • feat: rename 3 (#23669)
    Just a mechanical renaming
  • feat: rename 1 (#23667)
    Just a mechanical renaming
  • feat: dedicated goal DB (#23300)
    ## Why
    
    Thread goals are moving toward extension-owned runtime behavior, but
    their persisted state was still stored in the shared state database.
    This makes the goal store harder to isolate and keeps future storage
    splits tied to ad hoc runtime plumbing.
    
    This PR gives goals their own SQLite database while keeping the existing
    `StateRuntime` entry point. The goal is to make this the pattern for
    adding more dedicated runtime databases later.
    
    This also reduces load on the existing DB and reduces contention.
    
    ## Limitation
    Thread preview from goal is not supported anymore. I'm looking into this
    [EDIT]: solved
    
    ## What changed
    
    - Added a dedicated `goals_1.sqlite` database with its own
    `goals_migrations` directory.
    - Moved `thread_goals` creation into the goals DB migration set.
    - Dropped the old `thread_goals` table from the main state DB with a
    normal state migration. There is intentionally no backfill for existing
    goal rows.
    - Changed `GoalStore` to be backed only by the goals DB pool.
    - Removed the old goal-write side effect that filled empty
    `threads.preview` values from the goal objective.
    - Added shared runtime DB path metadata so startup, telemetry, `codex
    doctor`, and repair handling can include future DBs without bespoke path
    lists.
    - Updated Bazel compile data so the new goals migration directory is
    available to `sqlx::migrate!`.
    
    ## Verification
    
    - `cargo check --tests -p codex-state -p codex-cli -p codex-core -p
    codex-app-server`
    - `just fix -p codex-state`
    - `just fix -p codex-cli`
    - `just fix -p codex-app-server`
  • chore: isolate thread goal storage behind GoalStore (#23295)
    ## Why
    
    Thread goal persistence is being prepared for a dedicated storage
    boundary. Before that split, goal-specific reads, writes, accounting,
    and cleanup were exposed directly on `StateRuntime`, so core and
    app-server callsites stayed coupled to the full runtime instead of a
    goal-specific store.
    
    This PR introduces that boundary without changing the goal wire API or
    current persistence behavior. Callers now go through
    `StateRuntime::thread_goals()` and the new `GoalStore`, while
    `GoalStore` still uses the existing state DB pool underneath.
    
    ## What changed
    
    - Added `GoalStore` in `state/src/runtime/goals.rs` and exposed it from
    `StateRuntime` via `thread_goals()`.
    - Moved thread-goal reads, writes, status updates, pause, delete, and
    usage accounting onto `GoalStore`.
    - Updated core session goal handling, app-server goal RPCs, resume
    snapshots, and goal tests to use the store boundary.
    - Kept thread deletion responsible for cascading goal cleanup by
    deleting the goal through the store only after a thread row is removed.
    
    ## Testing
    
    - Existing goal persistence, resume, and accounting tests were updated
    to exercise the new `GoalStore` access path.
  • feat(cli): add codex doctor diagnostics (#22336)
    ## Why
    
    Users and support need a single command that captures the local Codex
    runtime, configuration, auth, terminal, network, and state shape without
    asking the user to know which diagnostic depth to choose first. `codex
    doctor` now runs the useful checks by default and makes the detailed
    human output the default because the command is usually run when someone
    already needs context.
    
    The command also targets concrete support failure modes we have seen
    while iterating on the design:
    
    - update-target mismatches like #21956, where the installed package
    manager target can differ from the running executable
    - terminal and multiplexer issues that depend on `TERM`, tmux/zellij
    state, color handling, and TTY metadata
    - provider-specific HTTP/WebSocket connectivity, including ChatGPT
    WebSocket handshakes and API-key/provider endpoint reachability
    - local state/log SQLite integrity problems and large rollout
    directories
    - feedback reports that need an attached, redacted diagnostic snapshot
    without asking the user to run a second command
    
    ## What Changed
    
    - Adds `codex doctor` as a grouped CLI diagnostic report with default
    detailed output and `--summary` for the compact view.
    - Adds stable report sections for Environment, Configuration, Updates,
    Connectivity, and Background Server, plus a top Notes block that
    promotes anomalies such as available updates, large rollout directories,
    optional MCP issues, and mixed auth signals.
    - Adds runtime provenance, install consistency, bundled/system search
    readiness, terminal/multiplexer metadata, `config.toml` parse status,
    auth mode details, sandbox details, feature flag summaries, update
    cache/latest-version state, app-server daemon state, SQLite integrity
    checks, rollout statistics, and provider-aware network diagnostics.
    - Adds ChatGPT WebSocket diagnostics that report the negotiated HTTP
    upgrade as `HTTP 101 Switching Protocols` and include timeout, DNS,
    auth, and provider context in detailed output.
    - Makes reachability provider-aware: API-key OpenAI setups check the API
    endpoint, ChatGPT auth checks the ChatGPT path, and custom/AWS/local
    providers check configured HTTP endpoints when available.
    - Adds structured, redacted JSON output where `checks` is keyed by check
    id and `details` is a key/value object for support tooling.
    - Integrates doctor with feedback uploads by attaching a best-effort
    `codex-doctor-report.json` report and adding derived Sentry tags for
    overall status and failing/warning checks.
    - Updates the TUI feedback consent copy so users can see that the doctor
    report is included when logs/diagnostics are uploaded.
    - Updates the CLI bug issue template to ask reporters for `codex doctor
    --json` and render pasted reports as JSON.
    
    ## Example Output
    
    The examples below are sanitized from local smoke runs with `--no-color`
    so the structure is reviewable in plain text.
    
    ### `codex doctor`
    
    ```text
    Codex Doctor v0.0.0 · macos-aarch64
    
    Notes
       ↑ updates      0.130.0 available (current 0.0.0, dismissed 0.128.0)
       ⚠ rollouts     1,526 active files · 2.53 GB on disk
       ⚠ mcp          MCP configuration has optional issues
       ⚠ auth         mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
    ─────────────────────────────────────────────────────────────
    
    Environment
      ✓ runtime      local debug build
          version                  0.0.0
          install method           other
          commit                   unknown
          executable               ~/code/codex.fcoury-doct…x-rs/target/debug/codex
      ✓ install      consistent
          context                  other
          managed by               npm: no · bun: no · package root —
          PATH entries (2)         ~/.local/share/mise/installs/node/24/bin/codex
                                   ~/.local/share/mise/shims/codex
      ✓ search       ripgrep 15.1.0 (system, `rg`)
      ✓ terminal     Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
          terminal                 Ghostty
          TERM_PROGRAM             ghostty
          terminal version         1.3.2-main-+b0f827665
          TERM                     xterm-256color
          multiplexer              tmux 3.6a
          tmux extended-keys       on
          tmux allow-passthrough   on
          tmux set-clipboard       on
      ✓ state        databases healthy
          CODEX_HOME               ~/.codex (dir)
          state DB                 ~/.codex/state_5.sqlite (file) · integrity ok
          log DB                   ~/.codex/logs_2.sqlite (file) · integrity ok
          active rollouts          1,526 files · 2.53 GB (avg 1.70 MB)
          archived rollouts        8 files · 3.84 MB (avg 491.11 KB)
    
    Configuration
      ✓ config       loaded
          model                    gpt-5.5 · openai
          cwd                      ~/code/codex.fcoury-doctor/codex-rs
          config.toml              ~/.codex/config.toml
          config.toml parse        ok
          MCP servers              1
          feature flags            36 enabled · 7 overridden (full list with --all)
          overrides                code_mode, code_mode_only, memories, chronicle, goals, remote_control, prevent_idle_sleep
      ✓ auth         auth is configured
          auth storage mode        File
          auth file                ~/.codex/auth.json
          auth env vars present    OPENAI_API_KEY
          stored auth mode         chatgpt
          stored API key           false
          stored ChatGPT tokens    true
          stored agent identity    false
      ⚠ mcp          MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
          configured servers       1
          disabled servers         0
          streamable_http servers  1
          optional reachability    openaiDeveloperDocs: https://developers.openai.com/mcp (HEAD connect failed; GET connect failed)
      ✓ sandbox      restricted fs + restricted network · approval OnRequest
          approval policy          OnRequest
          filesystem sandbox       restricted
          network sandbox          restricted
    
    Connectivity
      ✓ network      network-related environment looks readable
      ✓ websocket    connected (HTTP 101 Switching Protocols) · 15s timeout
          model provider           openai
          provider name            OpenAI
          wire API                 responses
          supports websockets      true
          connect timeout          15000 ms
          auth mode                chatgpt
          endpoint                 wss://chatgpt.com/backend-api/<redacted>
          DNS                      2 IPv4, 2 IPv6, first IPv6
          handshake result         HTTP 101 Switching Protocols
      ✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
          reachability mode        API key auth
          openai API               https://api.openai.com/v1 connect failed (required)
    
    Background Server
      ○ app-server   not running (ephemeral mode)
    
    ─────────────────────────────────────────────────────────────
    11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
    
    --summary compact output           --all expand truncated lists
    --json redacted report
    ```
    
    ### `codex doctor --summary`
    
    ```text
    Codex Doctor v0.0.0 · macos-aarch64
    
    Notes
       ↑ updates      0.130.0 available (current 0.0.0, dismissed 0.128.0)
       ⚠ rollouts     1,526 active files · 2.53 GB on disk
       ⚠ mcp          MCP configuration has optional issues
       ⚠ auth         mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
    ─────────────────────────────────────────────────────────────
    
    Environment
      ✓ runtime      local debug build
      ✓ install      consistent
      ✓ search       ripgrep 15.1.0 (system, `rg`)
      ✓ terminal     Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
      ✓ state        databases healthy
    
    Configuration
      ✓ config       loaded
      ✓ auth         auth is configured
      ⚠ mcp          MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
      ✓ sandbox      restricted fs + restricted network · approval OnRequest
    
    Updates
      ✓ updates      update configuration is locally consistent
    
    Connectivity
      ✓ network      network-related environment looks readable
      ✓ websocket    connected (HTTP 101 Switching Protocols) · 15s timeout
      ✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
    
    Background Server
      ○ app-server   not running (ephemeral mode)
    
    ─────────────────────────────────────────────────────────────
    11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
    
    Run codex doctor without --summary for detailed diagnostics.
    --all expand truncated lists       --json redacted report
    ```
    
    ### `codex doctor --json` shape
    
    ```json
    {
      "schema_version": 1,
      "overall_status": "fail",
      "checks": {
        "runtime.provenance": {
          "id": "runtime.provenance",
          "category": "Environment",
          "status": "ok",
          "summary": "local debug build",
          "details": {
            "version": "0.0.0",
            "install method": "other",
            "commit": "unknown"
          }
        },
        "sandbox.helpers": {
          "id": "sandbox.helpers",
          "category": "Configuration",
          "status": "ok",
          "summary": "restricted fs + restricted network · approval OnRequest",
          "details": {
            "approval policy": "OnRequest",
            "filesystem sandbox": "restricted",
            "network sandbox": "restricted"
          }
        }
      }
    }
    ```
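
    Support tooling could consume this shape roughly as follows. The
    `checks_by_status` helper is hypothetical and relies only on the
    `checks` and `status` keys shown above.

    ```python
    import json
    from collections import defaultdict


    def checks_by_status(report_json: str) -> dict[str, list[str]]:
        """Group check ids by their status string."""
        report = json.loads(report_json)
        grouped: dict[str, list[str]] = defaultdict(list)
        for check_id, check in report["checks"].items():
            grouped[check["status"]].append(check_id)
        return {status: sorted(ids) for status, ids in grouped.items()}
    ```

    Keying `checks` by id means a tool can also look up one check directly
    without scanning a list.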
    
    ### `/feedback` new sentry attachment
    
    <img width="938" height="798" alt="CleanShot 2026-05-13 at 15 36 14"
    src="https://github.com/user-attachments/assets/715e62e0-d7b4-4fea-a35a-fd5d5d33c4c0"
    />
    
    ### New section in CLI issue template
    
    <img width="1164" height="435" alt="CleanShot 2026-05-13 at 15 47 24"
    src="https://github.com/user-attachments/assets/9081dc25-a28c-4afa-8ba1-e299c2b4031d"
    />
    
    ## How to Test
    
    1. Run `cargo run --bin codex -- doctor --no-color`.
    2. Confirm the detailed report is the default and includes promoted
    Notes, grouped sections, terminal details, state DB integrity, rollout
    stats, provider reachability, WebSocket diagnostics, and app-server
    status.
    3. Run `cargo run --bin codex -- doctor --summary --no-color`.
    4. Confirm the compact view keeps the same sections and summary counts
    but omits detailed key/value rows.
    5. Run `cargo run --bin codex -- doctor --json`.
    6. Confirm the output is redacted JSON, `checks` is an object keyed by
    check id, and each check's `details` is a key/value object.
    7. Preview the CLI bug issue template and confirm the `Codex doctor
    report` field appears after the terminal field, asks for `codex doctor
    --json`, and renders pasted output as JSON.
    8. Start a feedback flow that includes logs.
    9. Confirm the upload consent copy lists `codex-doctor-report.json`
    alongside the log attachments.
    
    Targeted tests:
    
    - `cargo test -p codex-cli doctor`
    - `cargo test -p codex-app-server
    doctor_report_tags_summarize_status_counts`
    - `cargo test -p codex-feedback`
    - `cargo test -p codex-tui feedback_view`
    - `just argument-comment-lint`
    - `git diff --check`
  • Add process-scoped SQLite telemetry (#22154)
    ## Summary
    - add SQLite init, backfill-gate, and fallback telemetry without
    introducing a cross-cutting state-db access wrapper
    - install one process-scoped telemetry sink after OTEL startup and let
    low-level state/rollout paths emit through it directly
    - add process-start metrics for the process owners that initialize
    SQLite
    
    ---------
    
    Co-authored-by: Owen Lin <owen@openai.com>
  • sqlite: no more destructive version bumps (#21847)
    ## Why
    
    We'd like SQLite state to become required and load-bearing. As a first
    step, let's remove the mechanism that allows us to blow away the SQLite
    DB on a version bump, and instead rely on graceful migrations.
    
    The original motivation
    ([PR](https://github.com/openai/codex/pull/10623)) behind this mechanism
    was to care less about backwards compatibility while SQLite was being
    landed, but I'd say it's quite important now to keep the data in it.
    
    ## What changed
    
    - Make `STATE_DB_FILENAME` and `LOGS_DB_FILENAME` the full canonical
    filenames: `state_5.sqlite` and `logs_2.sqlite`.
    - Remove `STATE_DB_VERSION` / `LOGS_DB_VERSION` and the helper that
    constructed filenames from versions.
    - Stop `StateRuntime::init` from scanning for or deleting older SQLite
    DB filenames at startup.
    - Delete the tests that encoded legacy state/logs DB deletion behavior.
    
    ## Verification
    
    - `cargo test -p codex-state`
  • feat: move auto vaccum (#21378)
    The initial vacuum is no longer needed. We can assume all the DBs have
    been reclaimed by now.
  • Add goal persistence foundation (1 / 5) (#18073)
    Adds the persisted goal foundation for the rest of the stack. This PR is
    intentionally limited to feature flag and state-layer behavior;
    app-server APIs, model tools, runtime continuation, and TUI UX are
    layered in later PRs.
    
    ## Why
    
    Goal mode needs durable thread-level state before clients or model tools
    can safely build on it. The state layer needs to know whether a goal
    exists, what objective it tracks, whether it is active, paused,
    budget-limited, or complete, and how much time/token usage has already
    been accounted.
    
    ## What changed
    
    - Added the `goals` feature flag and generated config schema entry.
    - Added the `thread_goals` state table and Rust model for persisted
    thread goals.
    - Added state runtime APIs for creating, replacing, updating, deleting,
    and accounting goal usage.
    - Added `goal_id`-based stale update protection so an old goal update
    cannot overwrite a replacement.
    - Kept this PR scoped to persistence and state runtime behavior, with no
    app-server, model-facing, continuation, or TUI behavior yet.
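
    The stale-update guard can be illustrated with a conditional `UPDATE`.
    Only the `thread_goals` table name and the `goal_id` column come from
    this PR; the other columns and the helper names are assumptions.

    ```python
    import sqlite3


    def open_goals_db() -> sqlite3.Connection:
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE thread_goals ("
            "thread_id TEXT PRIMARY KEY, "
            "goal_id TEXT NOT NULL, "
            "status TEXT NOT NULL)"
        )
        return conn


    def replace_goal(conn, thread_id, goal_id):
        """Create or replace the goal for a thread under a fresh goal_id."""
        with conn:
            conn.execute(
                "INSERT INTO thread_goals (thread_id, goal_id, status) "
                "VALUES (?, ?, 'active') "
                "ON CONFLICT (thread_id) DO UPDATE "
                "SET goal_id = excluded.goal_id, status = 'active'",
                (thread_id, goal_id),
            )


    def update_status(conn, thread_id, expected_goal_id, status) -> bool:
        """Apply the update only if the caller still holds the current goal."""
        with conn:
            cur = conn.execute(
                "UPDATE thread_goals SET status = ? "
                "WHERE thread_id = ? AND goal_id = ?",
                (status, thread_id, expected_goal_id),
            )
        return cur.rowcount == 1
    ```

    An update that carries an outdated `goal_id` matches no row, so it
    cannot overwrite the replacement goal.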
    
    ## Verification
    
    - Added state runtime coverage for goal creation, replacement, stale
    update protection, status transitions, token-budget behavior, and usage
    accounting.
  • app-server: persist device key bindings in sqlite (#19206)
    ## Why
    
    Device-key providers should only own platform key material. The
    account/client binding used to authorize a signing payload is app-server
    state, and keeping that state in provider-specific metadata makes the
    same check harder to audit and harder to share across platform
    implementations.
    
    Persisting the binding in the shared state database gives the device-key
    crate a platform-neutral source of truth before it asks a provider to
    sign. It also lets app-server move potentially blocking key operations
    off the main message processor path, which matters once providers may
    wait for OS authentication prompts.
    
    ## What changed
    
    - Add a `device_key_bindings` state migration plus `StateRuntime`
    helpers keyed by `key_id`.
    - Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
    and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
    - Keep provider calls behind async store methods and run the synchronous
    provider work through `spawn_blocking`.
    - Wire app-server device-key RPC handling to the SQLite-backed binding
    store and spawn response/error delivery tasks for device-key requests.
    - Run the turn-start tracing test on the existing larger current-thread
    test harness after the larger async surface made the default test stack
    too small locally.
    
    ## Validation
    
    - `cargo test -p codex-device-key`
    - `cargo test -p codex-state device_key`
    - `cargo test -p codex-state`
    - `cargo test -p codex-app-server device_key`
    - `cargo test -p codex-app-server
    message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
    - `cargo test -p codex-app-server`
    - `just fix -p codex-device-key`
    - `just fix -p codex-state`
    - `just fix -p codex-app-server`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `git diff --check`
  • feat: log client use min log level (#18661)
    In the log client, use the log level filter as a minimum severity
    instead of exact match
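
    The change in filtering semantics amounts to swapping an equality test
    for a threshold. In this sketch the level names and their ordering are
    assumptions, and records are simple `(level, message)` tuples.

    ```python
    from enum import IntEnum


    class Level(IntEnum):
        TRACE = 0
        DEBUG = 1
        INFO = 2
        WARN = 3
        ERROR = 4


    def filter_min_level(records, min_level: Level):
        """Keep records at or above min_level instead of exactly equal to it."""
        return [record for record in records if record[0] >= min_level]
    ```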
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add sorting/backwardsCursor to thread/list and new thread/turns/list api (#17305)
    To improve the performance of UI loads from the app, add two main
    improvements:
    1. The `thread/list` API now takes a `sortDirection` request field and
    returns a `backwardsCursor` in the response, which lets you paginate
    forwards and backwards from a window. You can fetch the first few items
    to display immediately while paginating to fill in history, then
    paginate "backwards" on future loads to catch up with any changes since
    the last UI load without a full reload of the entire data set.
    2. A new `thread/turns/list` API also has `sortDirection` and
    `backwardsCursor`, with the same behavior as `thread/list`: a small
    fetch for immediate display followed by background fill-in and resync
    catch-up.
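    
    The windowed paging described above can be sketched over an in-memory
    list of sort keys. This is an assumption-laden illustration, not the
    app-server implementation: `list_page`, `Page`, and the exact cursor
    semantics (a cursor is the sort key of an already-seen item, and the
    backwards cursor is replayed with the opposite direction) are
    hypothetical.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum SortDirection {
        Asc,
        Desc,
    }
    
    /// Hypothetical response shape for one page of results.
    #[derive(Debug, PartialEq, Eq)]
    pub struct Page {
        pub items: Vec<u64>,
        /// Replay with the same direction to continue past this page.
        pub next_cursor: Option<u64>,
        /// Replay with the opposite direction to fetch items that sort
        /// before this page's first item (for example, newer threads).
        pub backwards_cursor: Option<u64>,
    }
    
    /// Return up to `limit` keys strictly past `cursor` in `direction`.
    pub fn list_page(
        keys: &[u64],
        cursor: Option<u64>,
        direction: SortDirection,
        limit: usize,
    ) -> Page {
        let mut candidates: Vec<u64> = keys
            .iter()
            .copied()
            .filter(|k| match (cursor, direction) {
                (None, _) => true,
                (Some(c), SortDirection::Asc) => *k > c,
                (Some(c), SortDirection::Desc) => *k < c,
            })
            .collect();
        candidates.sort_unstable();
        if direction == SortDirection::Desc {
            candidates.reverse();
        }
        let has_more = candidates.len() > limit;
        candidates.truncate(limit);
        Page {
            next_cursor: if has_more { candidates.last().copied() } else { None },
            backwards_cursor: candidates.first().copied(),
            items: candidates,
        }
    }
    ```
    
    In this sketch, a first load calls `list_page(&keys, None,
    SortDirection::Desc, n)` for the newest items, keeps following
    `next_cursor` to fill in history, and on a later load passes the saved
    `backwards_cursor` with `SortDirection::Asc` to pick up only what
    arrived since.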
  • Moving updated-at timestamps to unique millisecond times (#17489)
    To guarantee unique cursors, we make two important updates:
    * Add new `updated_at_ms` and `created_at_ms` columns with millisecond
    precision
    * Guarantee uniqueness: if multiple items are inserted in the same
    millisecond, bump the new one by one millisecond until it becomes unique
    
    This lets us use single-number cursors for forwards and backwards paging
    through result sets, and guarantees that the cursor is a fixed point:
    querying `timestamp > cursor` returns new items only.
    
    This updated implementation is backwards-compatible since multiple
    appservers can be running and won't handle the previous method well.
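    
    The bump-until-unique rule can be sketched with an in-memory set
    standing in for the timestamp column. `allocate_unique_ms` and
    `newer_than` are hypothetical helper names; the real change operates
    on SQLite columns.
    
    ```rust
    use std::collections::BTreeSet;
    
    /// Return a millisecond timestamp that is not yet in `used`, bumping
    /// by one millisecond on each collision, and record it as taken.
    pub fn allocate_unique_ms(used: &mut BTreeSet<i64>, now_ms: i64) -> i64 {
        let mut candidate = now_ms;
        while used.contains(&candidate) {
            candidate += 1;
        }
        used.insert(candidate);
        candidate
    }
    
    /// Because timestamps are unique, "newer than the cursor" is a strict
    /// comparison that never repeats or skips the cursor item itself.
    pub fn newer_than(used: &BTreeSet<i64>, cursor_ms: i64) -> Vec<i64> {
        used.range((cursor_ms + 1)..).copied().collect()
    }
    ```
    
    Three inserts in the same millisecond `1000` would receive `1000`,
    `1001`, and `1002` under this sketch.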
  • fix(sqlite): don't hard fail migrator if DB is newer (#16924)
    ## Description
    
    This PR makes the SQLite state runtime tolerate databases that have
    already been migrated by a newer Codex binary.
    
    Today, if an older CLI sees migration versions in `_sqlx_migrations`
    that it doesn't know about, startup fails. This change relaxes that
    check for the runtime migrators we use in `codex-state` so older
    binaries can keep opening the DB in that case.
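    
    As a rough illustration of the relaxed check (not the actual
    `codex-state` or SQLx code), the decision can be modeled as a set
    difference between the versions recorded in the database and the
    versions this binary knows. `check_applied_versions` and its
    `tolerate_unknown` flag are hypothetical.
    
    ```rust
    use std::collections::BTreeSet;
    
    /// Compare applied migration versions against the versions this
    /// binary ships. Returns the unknown versions when they are tolerated,
    /// or an error in strict mode.
    pub fn check_applied_versions(
        known: &BTreeSet<i64>,
        applied: &BTreeSet<i64>,
        tolerate_unknown: bool,
    ) -> Result<Vec<i64>, String> {
        let unknown: Vec<i64> = applied.difference(known).copied().collect();
        if unknown.is_empty() || tolerate_unknown {
            Ok(unknown)
        } else {
            Err(format!(
                "database has migrations unknown to this binary: {unknown:?}"
            ))
        }
    }
    ```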
    
    ## Why
    
    We can end up with mixed-version CLIs running against the same local
    state DB. In that setup, treating "the database is ahead of me" as a
    hard error is unnecessarily strict and breaks the older client even when
    the migration history is otherwise fine.
    
    ## Follow-up
    
    We still clean up versioned `state_*.sqlite` and `logs_*.sqlite` files
    during init, so older binaries can treat newer DB files as legacy. That
    should probably be tightened separately if we want mixed-version local
    usage to be fully safe.
  • feat: auto vaccum state DB (#16434)
    Start with a full vacuum the first time, then switch to incremental
    auto-vacuum.
  • feat: log db better maintenance (#16330)
    Run a DB clean-up more frequently, including an incremental `VACUUM`.
  • Align SQLite feedback logs with feedback formatter (#13494)
    ## Summary
    - store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
    exports keep span prefixes and structured event fields
    - render SQLite feedback exports with timestamps and level prefixes to
    match the old in-memory feedback formatter, while preserving existing
    trailing newlines
    - count `feedback_log_body` in the SQLite retention budget so structured
    or span-prefixed rows still prune correctly
    - bound `/feedback` row loading in SQL with the retention estimate, then
    apply exact whole-line truncation in Rust so uploads stay capped without
    splitting lines
    
    ## Details
    - add a `feedback_log_body` column to `logs` and backfill it from
    `message` for existing rows
    - capture span names plus formatted span and event fields at write time,
    since SQLite does not retain enough structure to reconstruct the old
    formatter later
    - keep SQLite feedback queries scoped to the requested thread plus
    same-process threadless rows
    - restore a SQL-side cumulative `estimated_bytes` cap for feedback
    export queries so over-retained partitions do not load every matching
    row before truncation
    - add focused formatting coverage for exported feedback lines and parity
    coverage against `tracing_subscriber`
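    
    A minimal sketch of whole-line truncation under a byte cap. It assumes
    each row already ends with its own newline and that the newest rows are
    the ones kept when the cap is exceeded; neither detail is stated above,
    and `truncate_whole_lines` is a hypothetical name.
    
    ```rust
    /// Keep the newest whole lines whose combined size fits in `max_bytes`.
    /// Lines are never split: a line that does not fit is dropped entirely,
    /// along with everything older than it.
    pub fn truncate_whole_lines(lines: &[String], max_bytes: usize) -> Vec<&str> {
        let mut kept = Vec::new();
        let mut total = 0usize;
        // Walk from newest to oldest so the most recent rows survive.
        for line in lines.iter().rev() {
            if total + line.len() > max_bytes {
                break;
            }
            total += line.len();
            kept.push(line.as_str());
        }
        // Restore chronological order for the export.
        kept.reverse();
        kept
    }
    ```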
    
    ## Testing
    - cargo test -p codex-state
    - just fix -p codex-state
    - just fmt
    
    codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore(otel): rename OtelManager to SessionTelemetry (#13808)
    ## Summary
    This is a purely mechanical refactor of `OtelManager` ->
    `SessionTelemetry` to better convey what the struct is doing. No
    behavior change.
    
    ## Why
    
    `OtelManager` ended up sounding much broader than what this type
    actually does. It doesn't manage OTEL globally; it's the session-scoped
    telemetry surface for emitting log/trace events and recording metrics
    with consistent session metadata (`app_version`, `model`, `slug`,
    `originator`, etc.).
    
    `SessionTelemetry` is a more accurate name, and updating the call sites
    makes that boundary a lot easier to follow.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-otel`
    - `cargo test -p codex-core`
  • Move sqlite logs to a dedicated database (#13772)
    ## Summary
    - move sqlite log reads and writes onto a dedicated `logs_1.sqlite`
    database to reduce lock contention with the main state DB
    - add a dedicated logs migrator and route `codex-state-logs` to the new
    database path
    - leave the old `logs` table in the existing state DB untouched for now
    
    ## Testing
    - just fmt
    - cargo test -p codex-state
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: limit number of rows per log (#13763)
    Avoid DB explosion. This is a temporary solution.
  • feat: memories forgetting (#12900)
    Add diff-based memory forgetting.
  • feat: add search term to thread list (#12578)
    Add `searchTerm` to `thread/list`, which searches for a match in thread
    titles (the condition being that `searchTerm` is a substring of `title`).
  • Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
    ## Summary
    - Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
    auto-export, and store results in SQLite.
    - Simplify workflow: remove run/resume/get-status/export tools; spawn is
    deterministic and completes in one call.
    - Improve exec UX: stable, single-line progress bar with ETA; suppress
    sub-agent chatter in exec.
    
    ## Why
    Enables map-reduce style workflows over arbitrarily large repos using
    the existing Codex orchestrator. This addresses review feedback about
    overly complex job controls and non-deterministic monitoring.
    
    ## Demo (progress bar)
    ```
    ./codex-rs/target/debug/codex exec \
      --enable collab \
      --enable sqlite \
      --full-auto \
      --progress-cursor \
      -c agents.max_threads=16 \
      -C /Users/daveaitel/code/codex \
      - <<'PROMPT'
    Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
    path = item-01..item-30, area = test.
    
    Then call spawn_agents_on_csv with:
    - csv_path: /tmp/agent_job_progress_demo.csv
    - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
    - output_csv_path: /tmp/agent_job_progress_demo_out.csv
    PROMPT
    ```
    
    ## Review feedback addressed
    - Auto-start jobs on spawn; removed run/resume/status/export tools.
    - Auto-export on success.
    - More descriptive tool spec + clearer prompts.
    - Avoid deadlocks on spawn failure; pending/running handled safely.
    - Progress bar no longer scrolls; stable single-line redraw.
    
    ## Tests
    - `cd codex-rs && cargo test -p codex-exec`
    - `cd codex-rs && cargo build -p codex-cli`
  • feat: add nick name to sub-agents (#12320)
    Add a random nickname to sub-agents, used for UX.
    
    Also store and wire through the role of the sub-agent.
  • state: enforce 10 MiB log caps for thread and threadless process logs (#12038)
    ## Summary
    - enforce a 10 MiB cap per `thread_id` in state log storage
    - enforce a 10 MiB cap per `process_uuid` for threadless (`thread_id IS
    NULL`) logs
    - scope pruning to only keys affected by the current insert batch
    - add a cheap per-key `SUM(...)` precheck so windowed prune queries only
    run for keys that are currently over the cap
    - add SQLite indexes used by the pruning queries
    - add focused runtime tests covering both pruning behaviors
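    
    The pruning flow in the list above can be sketched in memory. This is
    not the SQL implementation: `LogStore`, `PartitionKey`, and the
    per-row byte estimates are hypothetical stand-ins, and the cap is a
    parameter so the example stays small.
    
    ```rust
    use std::collections::{HashMap, HashSet, VecDeque};
    
    /// Partition rows by thread, or by process for threadless rows.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    pub enum PartitionKey {
        Thread(String),
        Process(String),
    }
    
    /// Hypothetical store: estimated bytes per row, oldest first.
    #[derive(Default)]
    pub struct LogStore {
        rows: HashMap<PartitionKey, VecDeque<usize>>,
    }
    
    impl LogStore {
        /// Insert a batch, then prune only the keys the batch touched.
        pub fn insert_batch(&mut self, batch: Vec<(PartitionKey, usize)>, cap_bytes: usize) {
            let mut touched = HashSet::new();
            for (key, bytes) in batch {
                self.rows.entry(key.clone()).or_default().push_back(bytes);
                touched.insert(key);
            }
            for key in touched {
                let Some(rows) = self.rows.get_mut(&key) else { continue };
                // Cheap precheck: sum the key; prune only when over the cap.
                let mut total: usize = rows.iter().sum();
                while total > cap_bytes {
                    match rows.pop_front() {
                        Some(oldest) => total -= oldest,
                        None => break,
                    }
                }
            }
        }
    
        pub fn total_bytes(&self, key: &PartitionKey) -> usize {
            self.rows
                .get(key)
                .map(|rows| rows.iter().sum::<usize>())
                .unwrap_or(0)
        }
    }
    ```
    
    Keys that the batch did not touch are never scanned, which mirrors the
    intent of scoping pruning to the current insert.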
    
    ## Why
    This keeps log growth bounded by the intended partition semantics while
    preserving a small, readable implementation localized to the existing
    insert path.
    
    ## Local Latency Snapshot (No Truncation-Pressure Run)
    Collected from session `019c734f-1d16-7002-9e00-c966c9fbbcae` using
    local-only (uncommitted) instrumentation, while not specifically
    benchmarking the truncation-heavy regime.
    
    ### Percentiles By Query (ms)
    | query | count | p50 | p90 | p95 | p99 | max |
    |---|---:|---:|---:|---:|---:|---:|
    | `insert_logs.insert_batch` | 110 | 0.332 | 0.999 | 1.811 | 2.978 | 3.493 |
    | `insert_logs.precheck.process` | 106 | 0.074 | 0.152 | 0.206 | 0.258 | 0.426 |
    | `insert_logs.precheck.thread` | 73 | 0.118 | 0.206 | 0.253 | 1.025 | 1.025 |
    | `insert_logs.prune.process` | 58 | 0.291 | 0.576 | 0.607 | 1.088 | 1.088 |
    | `insert_logs.prune.thread` | 44 | 0.318 | 0.467 | 0.728 | 0.797 | 0.797 |
    | `insert_logs.prune_total` | 110 | 0.488 | 0.976 | 1.237 | 1.593 | 1.684 |
    | `insert_logs.total` | 110 | 1.315 | 2.889 | 3.623 | 5.739 | 5.961 |
    | `insert_logs.tx_begin` | 110 | 0.133 | 0.235 | 0.282 | 0.412 | 0.546 |
    | `insert_logs.tx_commit` | 110 | 0.259 | 0.689 | 0.772 | 1.065 | 1.080 |
    
    ### `insert_logs.total` Histogram (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 7 |
    | `<= 1.000` | 33 |
    | `<= 2.000` | 40 |
    | `<= 5.000` | 28 |
    | `<= 10.000` | 2 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ## Local Latency Snapshot (Truncation-Heavy / Cap-Hit Regime)
    Collected from a run where cap-hit behavior was frequent (`135/180`
    insert calls), using local-only (uncommitted) instrumentation and a
    temporary local cap of `10_000` bytes for stress testing (not the merged
    `10 MiB` cap).
    
    ### Percentiles By Query (ms)
    | query | count | p50 | p90 | p95 | p99 | max |
    |---|---:|---:|---:|---:|---:|---:|
    | `insert_logs.insert_batch` | 180 | 0.524 | 1.645 | 2.163 | 3.424 | 3.777 |
    | `insert_logs.precheck.process` | 171 | 0.086 | 0.235 | 0.373 | 0.758 | 1.147 |
    | `insert_logs.precheck.thread` | 100 | 0.105 | 0.251 | 0.291 | 1.176 | 1.622 |
    | `insert_logs.prune.process` | 109 | 0.386 | 0.839 | 1.146 | 1.548 | 2.588 |
    | `insert_logs.prune.thread` | 56 | 0.253 | 0.550 | 1.148 | 2.484 | 2.484 |
    | `insert_logs.prune_total` | 180 | 0.511 | 1.221 | 1.695 | 4.548 | 5.512 |
    | `insert_logs.total` | 180 | 1.631 | 3.902 | 5.103 | 8.901 | 9.095 |
    | `insert_logs.total_cap_hit` | 135 | 1.876 | 4.501 | 5.547 | 8.902 | 9.096 |
    | `insert_logs.total_no_cap_hit` | 45 | 0.520 | 1.700 | 2.079 | 3.294 | 3.294 |
    | `insert_logs.tx_begin` | 180 | 0.109 | 0.253 | 0.287 | 1.088 | 1.406 |
    | `insert_logs.tx_commit` | 180 | 0.267 | 0.813 | 1.170 | 2.497 | 2.574 |
    
    ### `insert_logs.total` Histogram (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 16 |
    | `<= 1.000` | 39 |
    | `<= 2.000` | 60 |
    | `<= 5.000` | 54 |
    | `<= 10.000` | 11 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ### `insert_logs.total` Histogram When Cap Was Hit (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 0 |
    | `<= 1.000` | 22 |
    | `<= 2.000` | 51 |
    | `<= 5.000` | 51 |
    | `<= 10.000` | 11 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ### Performance Takeaways
    - Even in a cap-hit-heavy run (`75%` cap-hit calls), `insert_logs.total`
    stays sub-10ms at p99 (`8.901ms`) and max (`9.095ms`).
    - Calls that did **not** hit the cap are materially cheaper
    (`insert_logs.total_no_cap_hit` p95 `2.079ms`) than cap-hit calls
    (`insert_logs.total_cap_hit` p95 `5.547ms`).
    - Compared to the earlier non-truncation-pressure run, overall
    `insert_logs.total` rose from p95 `3.623ms` to p95 `5.103ms`
    (+`1.48ms`), indicating bounded overhead when pruning is active.
    - This truncation-heavy run used an intentionally low local cap for
    stress testing; with the real 10 MiB cap, cap-hit frequency should be
    much lower in normal sessions.
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-state` (in `codex-rs`)
  • feat: add --search to just log (#11995)
    Summary
    - extend the log client to accept an optional `--search` substring
    filter when querying codex-state logs
    - propagate the filter through `LogQuery` and apply it in
    `push_log_filters` via `INSTR(message, ...)`
    - add an integration test that exercises the new search filtering
    behavior
    
    Testing
    - Not run (not requested)
  • Add process_uuid to sqlite logs (#11534)
    ## Summary
    This PR is the first slice of the per-session `/feedback` logging work:
    it adds a process-unique identifier to SQLite log rows.
    
    It does **not** change `/feedback` sourcing behavior yet.
    
    ## Changes
    - Add migration `0009_logs_process_id.sql` to extend `logs` with:
      - `process_uuid TEXT`
      - `idx_logs_process_uuid` index
    - Extend state log models:
      - `LogEntry.process_uuid: Option<String>`
      - `LogRow.process_uuid: Option<String>`
    - Stamp each log row with a stable per-process UUID in the sqlite log
    layer:
      - generated once per process as `pid:<pid>:<uuid>`
    - Update sqlite log insert/query paths to persist and read
    `process_uuid`:
      - `INSERT INTO logs (..., process_uuid, ...)`
      - `SELECT ..., process_uuid, ... FROM logs`
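    
    A minimal sketch of a once-per-process identifier in the
    `pid:<pid>:<uuid>` shape. It uses only the standard library, so
    `pseudo_unique_token` is a stand-in for a real UUID generator and is
    an assumption of this example, as is the `process_uuid` function name.
    
    ```rust
    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hasher};
    use std::sync::OnceLock;
    
    /// Initialized on first use and then shared by every log row.
    static PROCESS_UUID: OnceLock<String> = OnceLock::new();
    
    /// Stand-in for a UUID: 128 bits derived from the randomly seeded
    /// `RandomState`. Real code would use a proper UUID library instead.
    fn pseudo_unique_token() -> String {
        let a = RandomState::new().build_hasher().finish();
        let b = RandomState::new().build_hasher().finish();
        format!("{a:016x}{b:016x}")
    }
    
    /// Stable identifier for the lifetime of this process.
    pub fn process_uuid() -> &'static str {
        PROCESS_UUID.get_or_init(|| {
            format!("pid:{}:{}", std::process::id(), pseudo_unique_token())
        })
    }
    ```
    
    Every call within one process returns the same string, so rows from
    different sessions in the same app-server share a `process_uuid`.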
    
    ## Why
    App-server runs many sessions in one process. This change provides a
    process-scoping primitive we need for follow-up `/feedback` work, so
    threadless/process-level logs can be associated with the emitting
    process without mixing across processes.
    
    ## Non-goals in this PR
    - No `/feedback` transport/source changes
    - No attachment size changes
    - No sqlite retention/trim policy changes
    
    ## Testing
    - `just fmt`
    - CI will run the full checks
  • Add cwd to memory files (#11591)
    Add cwd to memory files so that the model can better handle memories
    that span multiple cwds.
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • fix: db stuff mem (#11575)
    * Documenting DB functions
    * Fixing 1 nit where stage-2 was sorting the stage 1 in the wrong
    direction
    * Added some tests
  • Ensure list_threads drops stale rollout files (#11572)
    Summary
    - trim `state_db::list_threads_db` results to entries whose rollout
    files still exist, logging and recording a discrepancy for dropped rows
    - delete stale metadata rows from the SQLite store so future calls don’t
    surface invalid paths
    - add regression coverage in `recorder.rs` to verify stale DB paths are
    dropped when the file is missing
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
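    
    The shape of the read-access model can be sketched as follows. The
    variant and field names come from the list above; the field types
    (`bool`, `Vec<PathBuf>`), the `Default` impl, and the helpers
    `is_readable` and `check_backend_support` are assumptions made for
    illustration, and platform defaults are ignored here.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum ReadOnlyAccess {
        Restricted {
            include_platform_defaults: bool,
            readable_roots: Vec<PathBuf>,
        },
        FullAccess,
    }
    
    /// Assumed default, mirroring "current behavior is preserved".
    impl Default for ReadOnlyAccess {
        fn default() -> Self {
            ReadOnlyAccess::FullAccess
        }
    }
    
    /// Hypothetical check: a path is readable under `Restricted` only if
    /// it sits beneath one of the readable roots.
    pub fn is_readable(access: &ReadOnlyAccess, path: &Path) -> bool {
        match access {
            ReadOnlyAccess::FullAccess => true,
            ReadOnlyAccess::Restricted { readable_roots, .. } => {
                readable_roots.iter().any(|root| path.starts_with(root))
            }
        }
    }
    
    /// Hypothetical fail-closed guard for a backend that cannot yet
    /// enforce restricted reads: refuse rather than grant broader access.
    pub fn check_backend_support(access: &ReadOnlyAccess) -> Result<(), String> {
        match access {
            ReadOnlyAccess::FullAccess => Ok(()),
            ReadOnlyAccess::Restricted { .. } => Err("UnsupportedOperation".to_string()),
        }
    }
    ```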
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.