Commit Graph

88 Commits

  • feat: change multi-agent to use path-like system instead of uuids (#15313)
    This PR adds a URI-based system to reference agents within a tree. This
    comes from a sync between research and engineering.
    
    The main agent (the one manually spawned by a user) is always called
    `/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
    where `agent_1` is chosen by the model.
    
    Any agent can contact any other agent using its path.

    Paths can be either absolute or relative to the calling agent.

    Resume is not yet supported on this new path.
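
The absolute/relative addressing described in this commit can be sketched as follows. This is an illustration only, not the code from the PR: the function name `resolve_agent_path` is hypothetical, and it assumes that relative paths are joined onto the caller's own path and that `..` steps up to the parent agent, as in a file system.

```python
import posixpath

ROOT = "/root"  # the main agent is always called /root (from the commit message)


def resolve_agent_path(caller: str, target: str) -> str:
    """Resolve `target` to an absolute agent path.

    Assumption: a relative target is joined onto the caller's own path,
    and `..` refers to the parent agent.
    """
    joined = target if target.startswith("/") else posixpath.join(caller, target)
    resolved = posixpath.normpath(joined)
    # Reject anything that would escape the tree rooted at /root.
    if resolved != ROOT and not resolved.startswith(ROOT + "/"):
        raise ValueError(f"path escapes the agent tree: {target!r}")
    return resolved
```

Under these assumptions, `resolve_agent_path("/root/agent_1", "../agent_2")` would address a sibling, and an absolute path is returned unchanged after normalization.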
  • feat: add graph representation of agent network (#15056)
    Add a representation of the agent graph. This is now used for:
    * Cascade close (closing a parent also closes its children)
    * Cascade resume (the opposite)

    Later, this will also be used for post-compaction stuffing of the
    context.
    
    Direct fix for: https://github.com/openai/codex/issues/14458
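
A minimal sketch of cascade close over a parent/child graph, to make the behavior above concrete. The class `AgentGraph` and its methods are hypothetical names, and closing children before their parent is an assumption; the commit does not state the order.

```python
from collections import defaultdict


class AgentGraph:
    """Hypothetical in-memory agent graph: parent path -> child paths."""

    def __init__(self):
        self.children = defaultdict(list)
        self.open = set()

    def spawn(self, parent, child):
        # parent is None for the root agent.
        if parent is not None:
            self.children[parent].append(child)
        self.open.add(child)

    def close(self, path):
        """Close `path` and all of its descendants.

        Returns the closed paths in the order they were closed
        (assumption: descendants first, then the agent itself).
        """
        closed = []
        for child in list(self.children.get(path, [])):
            closed.extend(self.close(child))
        if path in self.open:
            self.open.remove(path)
            closed.append(path)
        return closed
```

Closing an agent that is already closed returns an empty list, so the operation is safe to repeat.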
  • Align SQLite feedback logs with feedback formatter (#13494)
    ## Summary
    - store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
    exports keep span prefixes and structured event fields
    - render SQLite feedback exports with timestamps and level prefixes to
    match the old in-memory feedback formatter, while preserving existing
    trailing newlines
    - count `feedback_log_body` in the SQLite retention budget so structured
    or span-prefixed rows still prune correctly
    - bound `/feedback` row loading in SQL with the retention estimate, then
    apply exact whole-line truncation in Rust so uploads stay capped without
    splitting lines
    
    ## Details
    - add a `feedback_log_body` column to `logs` and backfill it from
    `message` for existing rows
    - capture span names plus formatted span and event fields at write time,
    since SQLite does not retain enough structure to reconstruct the old
    formatter later
    - keep SQLite feedback queries scoped to the requested thread plus
    same-process threadless rows
    - restore a SQL-side cumulative `estimated_bytes` cap for feedback
    export queries so over-retained partitions do not load every matching
    row before truncation
    - add focused formatting coverage for exported feedback lines and parity
    coverage against `tracing_subscriber`
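
The whole-line truncation step mentioned in the summary can be illustrated with a short sketch. The PR implements this in Rust; the Python below is only an illustration, the function name is hypothetical, and it assumes that the newest lines are the ones kept when the budget is exceeded.

```python
def truncate_whole_lines(lines, max_bytes):
    """Keep the newest whole lines whose UTF-8 size fits in `max_bytes`.

    `lines` is ordered oldest to newest and each entry already ends with
    its newline. Lines are never split: a line either fits or is dropped
    together with everything older than it.
    """
    kept = []
    used = 0
    for line in reversed(lines):
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(line)
        used += size
    kept.reverse()
    return "".join(kept)
```

With `["aaaa\n", "bb\n", "c\n"]` and a 5-byte budget, only the last two lines survive, and the output never ends mid-line.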
    
    ## Testing
    - cargo test -p codex-state
    - just fix -p codex-state
    - just fmt
    
    codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859)
    ### Summary
    The goal is to get the latest turn model and reasoning effort on
    thread/resume if no override is provided in the thread/resume call.
    This is part 1, in which we write the model and reasoning effort for a
    thread to the sqlite db; a follow-up PR will consume the
    two new fields on thread/resume.
    
    [part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
    and this one can be merged independently.
  • Fix agent jobs finalization race and reduce status polling churn (#14843)
    ## Summary
    - make `report_agent_job_result` atomically transition an item from
    running to completed while storing `result_json`
    - remove brittle finalization grace-sleep logic and make finished-item
    cleanup idempotent
    - replace blind fixed-interval waiting with status-subscription-based
    waiting for active worker threads
    - add state runtime tests for atomic completion and late-report
    rejection
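
One common way to get an atomic running-to-completed transition in SQLite is a guarded `UPDATE` whose affected-row count tells the caller whether it won. The sketch below shows that general technique; the table `job_items`, its columns, and the function name are hypothetical and are not taken from the PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute(
    "CREATE TABLE job_items ("
    " id TEXT PRIMARY KEY, status TEXT NOT NULL, result_json TEXT)"
)
conn.execute("INSERT INTO job_items (id, status) VALUES ('item-01', 'running')")


def report_result(conn, item_id, result_json):
    """Store the result only if the item is still running.

    The status check and the write happen in one statement, so a late or
    duplicate report cannot overwrite an already completed item.
    """
    cur = conn.execute(
        "UPDATE job_items SET status = 'completed', result_json = ?"
        " WHERE id = ? AND status = 'running'",
        (result_json, item_id),
    )
    conn.commit()
    return cur.rowcount == 1


first = report_result(conn, "item-01", '{"score": 1}')
late = report_result(conn, "item-01", '{"score": 2}')  # rejected: not running
stored = conn.execute(
    "SELECT status, result_json FROM job_items WHERE id = 'item-01'"
).fetchone()
```

Because correctness rests on the `WHERE status = 'running'` guard rather than on timing, no grace sleep is needed in this sketch.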
    
    ## Why
    This addresses the race and polling concerns in #13948 by removing
    timing-based correctness assumptions and reducing unnecessary status
    polling churn.
    
    ## Validation
    - `cd codex-rs && just fmt`
    - `cd codex-rs && cargo test -p codex-state`
    - `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs`
    - `cd codex-rs && cargo test`
    - fails in an unrelated app-server tracing test:
    `message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children`
    timed out waiting for response
    
    ## Notes
    - This PR supersedes #14129 with the same agent-jobs fix on a clean
    branch from `main`.
    - The earlier PR branch was stacked on unrelated history, which made the
    review diff include unrelated commits.
    
    Fixes #13948
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool calling
    runtime (for example, from js_repl/code_mode).
  • chore(otel): rename OtelManager to SessionTelemetry (#13808)
    ## Summary
    This is a purely mechanical refactor of `OtelManager` ->
    `SessionTelemetry` to better convey what the struct is doing. No
    behavior change.
    
    ## Why
    
    `OtelManager` ended up sounding much broader than what this type
    actually does. It doesn't manage OTEL globally; it's the session-scoped
    telemetry surface for emitting log/trace events and recording metrics
    with consistent session metadata (`app_version`, `model`, `slug`,
    `originator`, etc.).
    
    `SessionTelemetry` is a more accurate name, and updating the call sites
    makes that boundary a lot easier to follow.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-otel`
    - `cargo test -p codex-core`
  • Reduce SQLite log retention to 10 days (#13781)
    ## Summary
    - reduce the SQLite-backed log retention window from 90 days to 10 days
    
    ## Testing
    - just fmt
    - cargo test -p codex-state
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: improve DB flushing (#13620)
    This branch:
    * Avoids flushing the DB when not necessary
    * Filters the events for which we perform an `upsert` into the DB
    * Adds a dedicated, lighter update function for `thread:updated_at`

    This should significantly reduce DB lock contention. If it is not
    sufficient, we can de-sync the DB flush for `updated_at`.
  • Move sqlite logs to a dedicated database (#13772)
    ## Summary
    - move sqlite log reads and writes onto a dedicated `logs_1.sqlite`
    database to reduce lock contention with the main state DB
    - add a dedicated logs migrator and route `codex-state-logs` to the new
    database path
    - leave the old `logs` table in the existing state DB untouched for now
    
    ## Testing
    - just fmt
    - cargo test -p codex-state
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: limit number of rows per log (#13763)
    Avoid DB explosion. This is a temporary solution.
  • Add timestamped SQLite /feedback logs without schema changes (#13645)
    ## Summary
    - keep the SQLite schema unchanged (no migrations)
    - add timestamps to SQLite-backed `/feedback` log exports
    - keep the existing SQL-side byte cap behavior and newline handling
    - document the remaining fidelity gap (span prefixes + structured
    fields) with TODOs
    
    ## Details
    - update `query_feedback_logs` to format each exported line as:
      - `YYYY-MM-DDTHH:MM:SS.ffffffZ {level} {message}`
    - continue scoping rows to requested-thread + same-process threadless
    logs
    - continue capping in SQL before returning rows
    - keep the existing fallback behavior unchanged when SQLite returns no
    rows
    - update parity tests to normalize away the new timestamp prefix while
    we still only store `message`
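
The exported line shape `YYYY-MM-DDTHH:MM:SS.ffffffZ {level} {message}` can be reproduced with a few lines of Python. This is only a sketch of the format: the function name is hypothetical, and appending a newline only when the message lacks one is an assumption about the newline handling.

```python
from datetime import datetime, timezone


def format_feedback_line(ts: datetime, level: str, message: str) -> str:
    """Render one export line as `<UTC timestamp> <level> <message>`."""
    # %f yields six fractional digits, matching the `.ffffff` in the format.
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    line = f"{stamp} {level} {message}"
    # Assumption: keep an existing trailing newline, add one otherwise.
    return line if line.endswith("\n") else line + "\n"


example = format_feedback_line(
    datetime(2026, 2, 26, 21, 39, 50, 737000, tzinfo=timezone.utc),
    "INFO",
    "turn started",
)
```

A parity test that normalizes away the timestamp, as the bullet above describes, would strip everything up to the first space before comparing.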
    
    ## Follow-up
    - TODO already in code: persist enough span/event metadata in SQLite to
    reproduce span prefixes and structured fields in `/feedback` exports
    
    ## Testing
    - `cargo test -p codex-state`
    - `just fmt`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602)
    This PR adds a durable trace linkage for each turn by storing the active
    trace ID on the rollout TurnContext record stored in session rollout
    files.
    
    Before this change, we propagated trace context at runtime but didn’t
    persist a stable per-turn trace key in rollout history. That made
    after-the-fact debugging harder (for example, mapping a historical turn
    to the corresponding trace in Datadog). This sets us up for much easier
    debugging in the future.
    
    ### What changed
    - Added an optional `trace_id` to TurnContextItem (rollout schema).
    - Added a small OTEL helper to read the current span trace ID.
    - Captured `trace_id` when creating `TurnContext` and included it in
    `to_turn_context_item()`.
    - Updated tests and fixtures that construct TurnContextItem so
    older/no-trace cases still work.
    
    ### Why this approach
    TurnContext is already the canonical durable per-turn metadata in
    rollout. This keeps ownership clean: trace linkage lives with other
    persisted turn metadata.
  • Add thread metadata update endpoint to app server (#13280)
    ## Summary
    - add the v2 `thread/metadata/update` API, including
    protocol/schema/TypeScript exports and app-server docs
    - patch stored thread `gitInfo` in sqlite without resuming the thread,
    with validation plus support for explicit `null` clears
    - repair missing sqlite thread rows from rollout data before patching,
    and make those repairs safe by inserting only when absent and updating
    only git columns so newer metadata is not clobbered
    - keep sqlite authoritative for mutable thread git metadata by
    preserving existing sqlite git fields during reconcile/backfill and only
    using rollout `SessionMeta` git fields to fill gaps
    - add regression coverage for the endpoint, repair paths, concurrent
    sqlite writes, clearing git fields, and rollout/backfill reconciliation
    - fix the login server shutdown race so cancelling before the waiter
    starts still terminates `block_until_done()` correctly
    
    ## Testing
    - `cargo test -p codex-state apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-state update_thread_git_info_preserves_newer_non_git_metadata`
    - `cargo test -p codex-core backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-app-server thread_metadata_update`
    - `cargo test`
    - currently fails in existing `codex-core` grep-files tests with
    `unsupported call: grep_files`:
        - `suite::grep_files::grep_files_tool_collects_matches`
        - `suite::grep_files::grep_files_tool_reports_empty_results`
  • app-server: source /feedback logs from sqlite at trace level (#12969)
    ## Summary
    - write app-server SQLite logs at TRACE level when SQLite is enabled
    - source app-server `/feedback` log attachments from SQLite for the
    requested thread when available
    - flush buffered SQLite log writes before `/feedback` queries them so
    newly emitted events are not lost behind the async inserter
    - include same-process threadless SQLite rows in those `/feedback` logs
    so the attachment matches the process-wide feedback buffer more closely
    - keep the existing in-memory ring buffer fallback unchanged, including
    when the SQLite query returns no rows
    
    ## Details
    - add a byte-bounded `query_feedback_logs` helper in `codex-state` so
    `/feedback` does not fetch all rows before truncating
    - scope SQLite feedback logs to the requested thread plus threadless
    rows from the same `process_uuid`
    - format exported SQLite feedback lines with the log level prefix to
    better match the in-memory feedback formatter
    - add an explicit `LogDbLayer::flush()` control path and await it in
    app-server before querying SQLite for feedback logs
    - pass optional SQLite log bytes through `codex-feedback` as the
    `codex-logs.log` attachment override
    - leave TUI behavior unchanged apart from the updated `upload_feedback`
    call signature
    - add regression coverage for:
      - newest-within-budget ordering
      - excluding oversized newest rows
      - including same-process threadless rows
      - keeping the newest suffix across mixed thread and threadless rows
      - matching the feedback formatter shape aside from span prefixes
      - falling back to the in-memory snapshot when SQLite returns no logs
      - flushing buffered SQLite rows before querying
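
The scoping rule above (the requested thread plus threadless rows from the same `process_uuid`) maps to a simple `WHERE` clause. The sketch below uses Python's `sqlite3` against a simplified table; the `id` column, the ordering, and the function name are assumptions, and the byte cap is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the logs table; only the scoping columns matter.
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, thread_id TEXT,"
    " process_uuid TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs (thread_id, process_uuid, message) VALUES (?, ?, ?)",
    [
        ("thread-a", "pid:1:aaa", "turn started"),
        (None, "pid:1:aaa", "server booted"),       # threadless, same process
        ("thread-b", "pid:1:aaa", "other thread"),   # different thread
        (None, "pid:2:bbb", "other process"),       # threadless, other process
    ],
)


def feedback_messages(conn, thread_id, process_uuid):
    """Return messages for one thread plus same-process threadless rows."""
    rows = conn.execute(
        "SELECT message FROM logs"
        " WHERE thread_id = ? OR (thread_id IS NULL AND process_uuid = ?)"
        " ORDER BY id",
        (thread_id, process_uuid),
    )
    return [message for (message,) in rows]
```

Rows from other threads and threadless rows from other processes are excluded, which is what keeps one session's export from mixing in another's logs.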
    
    ## Follow-up
    - SQLite feedback exports still do not reproduce span prefixes like
    `feedback-thread{thread_id=...}:`; there is a `TODO(ccunningham)` in
    `codex-rs/state/src/log_db.rs` for that follow-up.
    
    ## Testing
    - `cd codex-rs && cargo test -p codex-state`
    - `cd codex-rs && cargo test -p codex-app-server`
    - `cd codex-rs && just fmt`
  • feat: polluted memories (#13008)
    Add a feature flag to disable memory creation for "polluted"
  • Record realtime close marker on replacement (#13058)
    ## Summary
    - record a realtime close developer message when a new realtime session
    replaces an active one
    - assert the replacement marker through the mocked responses request
    path
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Charles Cunningham <ccunningham@openai.com>
  • feat: add debug clear-memories command to hard-wipe memories state (#13085)
    #### what
    adds a `codex debug clear-memories` command to help with clearing all
    memories state from disk, sqlite db, and marking threads as
    `memory_mode=disabled` so they don't get resummarized when the
    `memories` feature is re-enabled.
    
    #### tests
    add tests
  • feat: add local date/timezone to turn environment context (#12947)
    ## Summary
    
    This PR includes the session's local date and timezone in the
    model-visible environment context and persists that data in
    `TurnContextItem`.
    
    ## What changed
    - captures the current local date and IANA timezone when building a turn
    context, with a UTC fallback if the timezone lookup fails
    - includes current_date and timezone in the serialized
    <environment_context> payload
    - stores those fields on TurnContextItem so they survive rollout/history
    handling, subagent review threads, and resume flows
    - treats date/timezone changes as environment updates, so prompt caching
    and context refresh logic do not silently reuse stale time context
    - updates tests to validate the new environment fields without depending
    on a single hardcoded environment-context string
    
    ## test
    
    built a local build and saw it in the rollout file:
    ```
    {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n  <shell>zsh</shell>\n  <current_date>2026-02-26</current_date>\n  <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
    ```
  • feat: memories forgetting (#12900)
    Add diff based memory forgetting
  • feat: add search term to thread list (#12578)
    Add `searchTerm` to `thread/list`, which searches for a match in the
    titles (the condition being that `searchTerm` is a substring of `title`).
  • Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
    ## Summary
    - Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
    auto-export, and store results in SQLite.
    - Simplify workflow: remove run/resume/get-status/export tools; spawn is
    deterministic and completes in one call.
    - Improve exec UX: stable, single-line progress bar with ETA; suppress
    sub-agent chatter in exec.
    
    ## Why
    Enables map-reduce style workflows over arbitrarily large repos using
    the existing Codex orchestrator. This addresses review feedback about
    overly complex job controls and non-deterministic monitoring.
    
    ## Demo (progress bar)
    ```
    ./codex-rs/target/debug/codex exec \
      --enable collab \
      --enable sqlite \
      --full-auto \
      --progress-cursor \
      -c agents.max_threads=16 \
      -C /Users/daveaitel/code/codex \
      - <<'PROMPT'
    Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
    path = item-01..item-30, area = test.
    
    Then call spawn_agents_on_csv with:
    - csv_path: /tmp/agent_job_progress_demo.csv
    - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
    - output_csv_path: /tmp/agent_job_progress_demo_out.csv
    PROMPT
    ```
    
    ## Review feedback addressed
    - Auto-start jobs on spawn; removed run/resume/status/export tools.
    - Auto-export on success.
    - More descriptive tool spec + clearer prompts.
    - Avoid deadlocks on spawn failure; pending/running handled safely.
    - Progress bar no longer scrolls; stable single-line redraw.
    
    ## Tests
    - `cd codex-rs && cargo test -p codex-exec`
    - `cd codex-rs && cargo build -p codex-cli`
  • feat: add nick name to sub-agents (#12320)
    Add a random nickname to sub-agents, used for UX.

    At the same time, also store and wire the role of the sub-agent.
  • state: enforce 10 MiB log caps for thread and threadless process logs (#12038)
    ## Summary
    - enforce a 10 MiB cap per `thread_id` in state log storage
    - enforce a 10 MiB cap per `process_uuid` for threadless (`thread_id IS
    NULL`) logs
    - scope pruning to only keys affected by the current insert batch
    - add a cheap per-key `SUM(...)` precheck so windowed prune queries only
    run for keys that are currently over the cap
    - add SQLite indexes used by the pruning queries
    - add focused runtime tests covering both pruning behaviors
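
The precheck-then-prune pattern from the summary can be sketched with a `SUM(...)` precheck followed by a windowed delete. This is an illustration under assumptions, not the merged queries: the simplified table, the use of `LENGTH(CAST(message AS BLOB))` as the size measure, and the function name are all hypothetical. It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, thread_id TEXT, message TEXT)"
)
# Five 4-byte rows for one thread, plus one row for another thread.
conn.executemany(
    "INSERT INTO logs (thread_id, message) VALUES (?, ?)",
    [("t1", "aaaa")] * 5 + [("t2", "bbbb")],
)


def prune_thread(conn, thread_id, cap_bytes):
    """Delete the oldest rows of one thread until it fits under the cap."""
    # Cheap precheck: skip the windowed query when the key is under the cap.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(LENGTH(CAST(message AS BLOB))), 0)"
        " FROM logs WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    if total <= cap_bytes:
        return 0
    # Running total from newest to oldest; rows past the cap are removed.
    cur = conn.execute(
        "DELETE FROM logs WHERE id IN ("
        " SELECT id FROM ("
        "  SELECT id, SUM(LENGTH(CAST(message AS BLOB)))"
        "   OVER (ORDER BY id DESC) AS running"
        "  FROM logs WHERE thread_id = ?"
        " ) WHERE running > ?)",
        (thread_id, cap_bytes),
    )
    conn.commit()
    return cur.rowcount


deleted = prune_thread(conn, "t1", 10)        # 20 bytes stored, cap of 10
deleted_again = prune_thread(conn, "t1", 10)  # now under the cap
```

The same shape would apply to threadless rows keyed by `process_uuid` with `thread_id IS NULL`.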
    
    ## Why
    This keeps log growth bounded by the intended partition semantics while
    preserving a small, readable implementation localized to the existing
    insert path.
    
    ## Local Latency Snapshot (No Truncation-Pressure Run)
    Collected from session `019c734f-1d16-7002-9e00-c966c9fbbcae` using
    local-only (uncommitted) instrumentation, while not specifically
    benchmarking the truncation-heavy regime.
    
    ### Percentiles By Query (ms)
    | query | count | p50 | p90 | p95 | p99 | max |
    |---|---:|---:|---:|---:|---:|---:|
    | `insert_logs.insert_batch` | 110 | 0.332 | 0.999 | 1.811 | 2.978 | 3.493 |
    | `insert_logs.precheck.process` | 106 | 0.074 | 0.152 | 0.206 | 0.258 | 0.426 |
    | `insert_logs.precheck.thread` | 73 | 0.118 | 0.206 | 0.253 | 1.025 | 1.025 |
    | `insert_logs.prune.process` | 58 | 0.291 | 0.576 | 0.607 | 1.088 | 1.088 |
    | `insert_logs.prune.thread` | 44 | 0.318 | 0.467 | 0.728 | 0.797 | 0.797 |
    | `insert_logs.prune_total` | 110 | 0.488 | 0.976 | 1.237 | 1.593 | 1.684 |
    | `insert_logs.total` | 110 | 1.315 | 2.889 | 3.623 | 5.739 | 5.961 |
    | `insert_logs.tx_begin` | 110 | 0.133 | 0.235 | 0.282 | 0.412 | 0.546 |
    | `insert_logs.tx_commit` | 110 | 0.259 | 0.689 | 0.772 | 1.065 | 1.080 |
    
    ### `insert_logs.total` Histogram (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 7 |
    | `<= 1.000` | 33 |
    | `<= 2.000` | 40 |
    | `<= 5.000` | 28 |
    | `<= 10.000` | 2 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ## Local Latency Snapshot (Truncation-Heavy / Cap-Hit Regime)
    Collected from a run where cap-hit behavior was frequent (`135/180`
    insert calls), using local-only (uncommitted) instrumentation and a
    temporary local cap of `10_000` bytes for stress testing (not the merged
    `10 MiB` cap).
    
    ### Percentiles By Query (ms)
    | query | count | p50 | p90 | p95 | p99 | max |
    |---|---:|---:|---:|---:|---:|---:|
    | `insert_logs.insert_batch` | 180 | 0.524 | 1.645 | 2.163 | 3.424 | 3.777 |
    | `insert_logs.precheck.process` | 171 | 0.086 | 0.235 | 0.373 | 0.758 | 1.147 |
    | `insert_logs.precheck.thread` | 100 | 0.105 | 0.251 | 0.291 | 1.176 | 1.622 |
    | `insert_logs.prune.process` | 109 | 0.386 | 0.839 | 1.146 | 1.548 | 2.588 |
    | `insert_logs.prune.thread` | 56 | 0.253 | 0.550 | 1.148 | 2.484 | 2.484 |
    | `insert_logs.prune_total` | 180 | 0.511 | 1.221 | 1.695 | 4.548 | 5.512 |
    | `insert_logs.total` | 180 | 1.631 | 3.902 | 5.103 | 8.901 | 9.095 |
    | `insert_logs.total_cap_hit` | 135 | 1.876 | 4.501 | 5.547 | 8.902 | 9.096 |
    | `insert_logs.total_no_cap_hit` | 45 | 0.520 | 1.700 | 2.079 | 3.294 | 3.294 |
    | `insert_logs.tx_begin` | 180 | 0.109 | 0.253 | 0.287 | 1.088 | 1.406 |
    | `insert_logs.tx_commit` | 180 | 0.267 | 0.813 | 1.170 | 2.497 | 2.574 |
    
    ### `insert_logs.total` Histogram (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 16 |
    | `<= 1.000` | 39 |
    | `<= 2.000` | 60 |
    | `<= 5.000` | 54 |
    | `<= 10.000` | 11 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ### `insert_logs.total` Histogram When Cap Was Hit (ms)
    | bucket | count |
    |---|---:|
    | `<= 0.100` | 0 |
    | `<= 0.250` | 0 |
    | `<= 0.500` | 0 |
    | `<= 1.000` | 22 |
    | `<= 2.000` | 51 |
    | `<= 5.000` | 51 |
    | `<= 10.000` | 11 |
    | `<= 20.000` | 0 |
    | `<= 50.000` | 0 |
    | `<= 100.000` | 0 |
    | `> 100.000` | 0 |
    
    ### Performance Takeaways
    - Even in a cap-hit-heavy run (`75%` cap-hit calls), `insert_logs.total`
    stays sub-10ms at p99 (`8.901ms`) and max (`9.095ms`).
    - Calls that did **not** hit the cap are materially cheaper
    (`insert_logs.total_no_cap_hit` p95 `2.079ms`) than cap-hit calls
    (`insert_logs.total_cap_hit` p95 `5.547ms`).
    - Compared to the earlier non-truncation-pressure run, overall
    `insert_logs.total` rose from p95 `3.623ms` to p95 `5.103ms`
    (+`1.48ms`), indicating bounded overhead when pruning is active.
    - This truncation-heavy run used an intentionally low local cap for
    stress testing; with the real 10 MiB cap, cap-hit frequency should be
    much lower in normal sessions.
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-state` (in `codex-rs`)
  • feat: add --compact mode to just log (#11994)
    Summary:
    - add a `--compact` flag to the logs client to suppress thread/target
    info
    - format rows and timestamps differently when compact mode is enabled so
    only hour time, level, and message remain
  • feat: add --search to just log (#11995)
    Summary
    - extend the log client to accept an optional `--search` substring
    filter when querying codex-state logs
    - propagate the filter through `LogQuery` and apply it in
    `push_log_filters` via `INSTR(message, ...)`
    - add an integration test that exercises the new search filtering
    behavior
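
For readers unfamiliar with `INSTR`, the sketch below shows how a substring filter of this kind behaves in SQLite, using Python's `sqlite3`. The table is a simplified stand-in and `search_logs` is a hypothetical name; only the `INSTR(message, ...)` predicate comes from the summary above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO logs (message) VALUES (?)",
    [("connection reset",), ("Connection opened",), ("idle",)],
)


def search_logs(conn, needle):
    """Return messages containing `needle` as a substring."""
    # INSTR returns the 1-based position of the match, or 0 when absent.
    rows = conn.execute(
        "SELECT message FROM logs WHERE INSTR(message, ?) > 0 ORDER BY id",
        (needle,),
    )
    return [message for (message,) in rows]
```

Note that `INSTR` is case-sensitive, so `connection` matches only the first row here, while `onnection` matches the first two.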
    
    Testing
    - Not run (not requested)
  • Add process_uuid to sqlite logs (#11534)
    ## Summary
    This PR is the first slice of the per-session `/feedback` logging work:
    it adds a process-unique identifier to SQLite log rows.
    
    It does **not** change `/feedback` sourcing behavior yet.
    
    ## Changes
    - Add migration `0009_logs_process_id.sql` to extend `logs` with:
      - `process_uuid TEXT`
      - `idx_logs_process_uuid` index
    - Extend state log models:
      - `LogEntry.process_uuid: Option<String>`
      - `LogRow.process_uuid: Option<String>`
    - Stamp each log row with a stable per-process UUID in the sqlite log
    layer:
      - generated once per process as `pid:<pid>:<uuid>`
    - Update sqlite log insert/query paths to persist and read
    `process_uuid`:
      - `INSERT INTO logs (..., process_uuid, ...)`
      - `SELECT ..., process_uuid, ... FROM logs`
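
A value of the form `pid:<pid>:<uuid>` that is generated once and then reused for the life of the process can be sketched as below. This is a Python illustration of the idea rather than the Rust code in the PR; the function name and the use of a random version 4 UUID are assumptions.

```python
import functools
import os
import uuid


@functools.lru_cache(maxsize=None)
def process_uuid() -> str:
    """Return an identifier that is stable for the life of this process."""
    # The cache ensures the UUID is generated once and then reused.
    return f"pid:{os.getpid()}:{uuid.uuid4()}"
```

Including the pid keeps the value easy to correlate with OS-level tooling, while the UUID part avoids collisions when pids are recycled.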
    
    ## Why
    App-server runs many sessions in one process. This change provides a
    process-scoping primitive we need for follow-up `/feedback` work, so
    threadless/process-level logs can be associated with the emitting
    process without mixing across processes.
    
    ## Non-goals in this PR
    - No `/feedback` transport/source changes
    - No attachment size changes
    - No sqlite retention/trim policy changes
    
    ## Testing
    - `just fmt`
    - CI will run the full checks
  • Add cwd to memory files (#11591)
    Add cwd to memory files so that the model can better handle memories
    that span multiple cwds.
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • fix: db stuff mem (#11575)
    * Documenting DB functions
    * Fixing 1 nit where stage-2 was sorting the stage 1 in the wrong
    direction
    * Added some tests
  • Ensure list_threads drops stale rollout files (#11572)
    Summary
    - trim `state_db::list_threads_db` results to entries whose rollout
    files still exist, logging and recording a discrepancy for dropped rows
    - delete stale metadata rows from the SQLite store so future calls don’t
    surface invalid paths
    - add regression coverage in `recorder.rs` to verify stale DB paths are
    dropped when the file is missing
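
The stale-row filtering described above amounts to partitioning listed rows by whether their rollout file still exists. A minimal sketch, assuming each row is a dict with a hypothetical `rollout_path` key (the real code is Rust and also logs and records a discrepancy):

```python
import os
import tempfile


def partition_by_rollout_file(rows):
    """Split rows into (live, stale) by whether the rollout file exists."""
    live, stale = [], []
    for row in rows:
        target = live if os.path.isfile(row["rollout_path"]) else stale
        target.append(row)
    return live, stale


# Demo: one row points at a real file, the other at a missing one.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "rollout-present.jsonl")
    open(present, "w").close()
    missing = os.path.join(tmp, "rollout-missing.jsonl")
    rows = [
        {"id": "t1", "rollout_path": present},
        {"id": "t2", "rollout_path": missing},
    ]
    live, stale = partition_by_rollout_file(rows)
```

The `stale` list is what a caller would then delete from the metadata store so later listings do not surface the same invalid paths.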
  • feat: mem slash commands (#11569)
    Add 2 slash commands for memories:
    * `/m_drop` deletes all the memories
    * `/m_update` updates the memories with phase 1 and 2
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
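
The fail-closed rule can be modeled in a few lines. The sketch below mirrors the variant and field names from the list above, but it is a Python illustration, not the Rust protocol types; `check_backend` and its `supports_restricted_reads` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class FullAccess:
    """Read access to everything (the default that preserves old behavior)."""


@dataclass(frozen=True)
class Restricted:
    """Read access limited to the listed roots."""
    include_platform_defaults: bool
    readable_roots: Tuple[str, ...]


ReadOnlyAccess = Union[FullAccess, Restricted]


class UnsupportedOperation(Exception):
    """Raised when a backend cannot enforce the requested policy."""


def check_backend(access: ReadOnlyAccess, supports_restricted_reads: bool) -> ReadOnlyAccess:
    """Fail closed: never downgrade a restricted policy to full access."""
    if isinstance(access, Restricted) and not supports_restricted_reads:
        raise UnsupportedOperation("restricted read access is not implemented")
    return access
```

The point of raising instead of falling back is that a backend which cannot enforce the restriction refuses to run, so it never grants broader read access than was requested.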
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • feat: new memory prompts (#11439)
    * Update prompt
    * Wire CWD in the prompt
    * Handle the no-output case