474 Commits

  • start of hooks engine (#13276)
    (Experimental)
    
    This PR adds a first MVP for hooks, with SessionStart and Stop
    
    The core design is:
    
    - hooks live in a dedicated engine under codex-rs/hooks
    - each hook type has its own event-specific file
    - hook execution is synchronous and blocks normal turn progression while
    running
    - matching hooks run in parallel, then their results are aggregated into
    a normalized HookRunSummary
    
    On the AppServer side, hooks are exposed as operational metadata rather
    than transcript-native items:
    
    - new live notifications: hook/started, hook/completed
    - persisted/replayed hook results live on Turn.hookRuns
    - we intentionally did not add hook-specific ThreadItem variants
    
    Hooks messages are not persisted, they remain ephemeral. The context
    changes they add are (they get appended to the user's prompt)
  • Add request permissions tool (#13092)
    Adds a built-in `request_permissions` tool and wires it through the
    Codex core, protocol, and app-server layers so a running turn can ask
    the client for additional permissions instead of relying on a static
    session policy.
    
    The new flow emits a `RequestPermissions` event from core, tracks the
    pending request by call ID, forwards it through app-server v2 as an
    `item/permissions/requestApproval` request, and resumes the tool call
    once the client returns an approved subset of the requested permission
    profile.
  • Add in-process app server and wire up exec to use it (#14005)
    This is a subset of PR #13636. See that PR for a full overview of the
    architectural change.
    
    This PR implements the in-process app server and modifies the
    non-interactive "exec" entry point to use the app server.
    
    ---------
    
    Co-authored-by: Felipe Coury <felipe.coury@gmail.com>
  • [elicitations] Switch to use MCP style elicitation payload for mcp tool approvals. (#13621)
    - [x] Switch to use MCP style elicitation payload for mcp tool
    approvals.
    - [ ] TODO: Update the UI to support the full spec.
  • Enabling CWD Saving for Image-Gen (#13607)
    Codex now saves the generated image on to your current working
    directory.
  • feat(app-server): support mcp elicitations in v2 api (#13425)
    This adds a first-class server request for MCP server elicitations:
    `mcpServer/elicitation/request`.
    
    Until now, MCP elicitation requests only showed up as a raw
    `codex/event/elicitation_request` event from core. That made it hard for
    v2 clients to handle elicitations using the same request/response flow
    as other server-driven interactions (like shell and `apply_patch`
    tools).
    
    This also updates the underlying MCP elicitation request handling in
    core to pass through the full MCP request (including URL and form data)
    so we can expose it properly in app-server.
    
    ### Why not `item/mcpToolCall/elicitationRequest`?
    This is because MCP elicitations are related to MCP servers first, and
    only optionally to a specific MCP tool call.
    
    In the MCP protocol, elicitation is a server-to-client capability: the
    server sends `elicitation/create`, and the client replies with an
    elicitation result. RMCP models it that way as well.
    
    In practice an elicitation is often triggered by an MCP tool call, but
    not always.
    
    ### What changed
    - add `mcpServer/elicitation/request` to the v2 app-server API
    - translate core `codex/event/elicitation_request` events into the new
    v2 server request
    - map client responses back into `Op::ResolveElicitation` so the MCP
    server can continue
    - update app-server docs and generated protocol schema
    - add an end-to-end app-server test that covers the full round trip
    through a real RMCP elicitation flow
    - The new test exercises a realistic case where an MCP tool call
    triggers an elicitation, the app-server emits
    mcpServer/elicitation/request, the client accepts it, and the tool call
    resumes and completes successfully.
    
    ### app-server API flow
    - Client starts a thread with `thread/start`.
    - Client starts a turn with `turn/start`.
    - App-server sends `item/started` for the `mcpToolCall`.
    - While that tool call is in progress, app-server sends
    `mcpServer/elicitation/request`.
    - Client responds to that request with `{ action: "accept" | "decline" |
    "cancel" }`.
    - App-server sends `serverRequest/resolved`.
    - App-server sends `item/completed` for the mcpToolCall.
    - App-server sends `turn/completed`.
    - If the turn is interrupted while the elicitation is pending,
    app-server still sends `serverRequest/resolved` before the turn
    finishes.
  • image-gen-event/client_processing (#13512)
    enabling client-side to process with image-generation capabilities
    (setting app-server)
  • feat(app-server): propagate app-server trace context into core (#13368)
    ### Summary
    Propagate trace context originating at app-server RPC method handlers ->
    codex core submission loop (so this includes spans such as `run_turn`!).
    This implements PR 2 of the app-server tracing rollout.
    
    This also removes the old lower-level env-based reparenting in core so
    explicit request/submission ancestry wins instead of being overridden by
    ambient `TRACEPARENT` state.
    
    ### What changed
    - Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
    - Taught `Codex::submit()` / `submit_with_id()` to automatically capture
    the current span context when constructing or forwarding a submission
    - Wrapped the core submission loop in a submission_dispatch span
    parented from Submission.trace
    - Warn on invalid submission trace carriers and ignore them cleanly
    - Removed the old env-based downstream reparenting path in core task
    execution
    - Stopped OTEL provider init from implicitly attaching env trace context
    process-wide
    - Updated mcp-server Submission call sites for the new field
    
    Added focused unit tests for:
    - capturing trace context into Submission
    - preferring `Submission.trace` when building the core dispatch span
    
    ### Why
    PR 1 gave us consistent inbound request spans in app-server, but that
    only covered the transport boundary. For long-running work like turns
    and reviews, the important missing piece was preserving ancestry after
    the request handler returns and core continues work on a different async
    path.
    
    This change makes that handoff explicit and keeps the parentage rules
    simple:
    - app-server request span sets the current context
    - `Submission.trace` snapshots that context
    - core restores it once, at the submission boundary
    - deeper core spans inherit naturally
    
    That also lets us stop relying on env-based reparenting for this path,
    which was too ambient and could override explicit ancestry.
  • app-server service tier plumbing (plus some cleanup) (#13334)
    followup to https://github.com/openai/codex/pull/13212 to expose fast
    tier controls to app server
    (majority of this PR is generated schema jsons - actual code is +69 /
    -35 and +24 tests )
    
    - add service tier fields to the app-server protocol surfaces used by
    thread lifecycle, turn start, config, and session configured events
    - thread service tier through the app-server message processor and core
    thread config snapshots
    - allow runtime config overrides to carry service tier for app-server
    callers
    
    cleanup:
    - Removing useless "legacy" code supporting "standard" - we moved to
    None | "fast", so "standard" is not needed.
  • add fast mode toggle (#13212)
    - add a local Fast mode setting in codex-core (similar to how model id
    is currently stored on disk locally)
    - send `service_tier=priority` on requests when Fast is enabled
    - add `/fast` in the TUI and persist it locally
    - feature flag
  • Enable analytics in codex exec and codex mcp-server (#13083)
    Addresses #12913
    
    `codex exec` was not correctly defaulting to Otel metrics to enabled 
    `codex mcp-server` completely lacked an Otel collector
    
    Summary:
    - default to enabling analytics when `codex exec` initializes
    OpenTelemetry so the CLI actually reports metrics again
    - add a regression test that proves the flag remains enabled by default
    - added Otel collector to `codex mcp-server`
  • Suppress duplicate assistant output on stdout in interactive sessions (#13082)
    Addresses #12566
    
    Summary
    - stop printing the final assistant message on stdout when the process
    is running in a terminal so interactive users only see it once
    - add a helper that gates the stdout emission and cover it with unit
    tests
  • Allow clients not to send summary as an option (#12950)
    Summary is a required parameter on UserTurn. Ideally we'd like the core
    to decide the appropriate summary level.
    
    Make the summary optional and don't send it when not needed.
  • Use model catalog default for reasoning summary fallback (#12873)
    ## Summary
    - make `Config.model_reasoning_summary` optional so unset means use
    model default
    - resolve the optional config value to a concrete summary when building
    `TurnContext`
    - add protocol support for `default_reasoning_summary` in model metadata
    
    ## Validation
    - `cargo test -p codex-core --lib client::tests -- --nocapture`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • core: bundle settings diff updates into one dev/user envelope (#12417)
    ## Summary
    - bundle contextual prompt injection into at most one developer message
    plus one contextual user message in both:
      - per-turn settings updates
      - initial context insertion
    - preserve `<model_switch>` across compaction by rebuilding it through
    canonical initial-context injection, instead of relying on
    strip/reattach hacks
    - centralize contextual user fragment detection in one shared definition
    table and reuse it for parsing/compaction logic
    - keep `AGENTS.md` in its natural serialized format:
      - `# AGENTS.md instructions for {dirname}`
      - `<INSTRUCTIONS>...</INSTRUCTIONS>`
    - simplify related tests/helpers and accept the expected snapshot/layout
    updates from bundled multi-part messages
    
    ## Why
    The goal is to converge toward a simpler, more intentional prompt shape
    where contextual updates are consistently represented as one developer
    envelope plus one contextual user envelope, while keeping parsing and
    compaction behavior aligned with that representation.
    
    ## Notable details
    - the temporary `SettingsUpdateEnvelope` wrapper was removed; these
    paths now return `Vec<ResponseItem>` directly
    - local/remote compaction no longer rely on model-switch strip/restore
    helpers
    - contextual user detection is now driven by shared fragment definitions
    instead of ad hoc matcher assembly
    - AGENTS/user instructions are still the same logical context; only the
    synthetic `<user_instructions>` wrapper was replaced by the natural
    AGENTS text format
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-app-server
    codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
    -- --exact`
    - `cargo test -p codex-core
    compact::tests::collect_user_messages_filters_session_prefix_entries
    --lib -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_developer_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_user_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::resume_includes_initial_messages_and_sends_prior_items'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::review::review_input_isolated_from_parent_history' -- --exact`
    - `cargo test -p codex-exec --test all
    'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
    --exact`
    - `cargo test -p core_test_support
    context_snapshot::tests::full_text_mode_preserves_unredacted_text --
    --exact`
    
    ## Notes
    - I also ran several targeted `compact`, `compact_remote`,
    `prompt_caching`, `model_visible_layout`, and `event_mapping` tests
    while iterating on prompt-shape changes.
    - I have not claimed a clean full-workspace `cargo test` from this
    environment because local sandbox/resource conditions have previously
    produced unrelated failures in large workspace runs.
  • Revert "Add skill approval event/response (#12633)" (#12811)
    This reverts commit https://github.com/openai/codex/pull/12633. We no
    longer need this PR, because we favor sending normal exec command
    approval server request with `additional_permissions` of skill
    permissions instead
  • Enable request_user_input in Default mode (#12735)
    ## Summary
    - allow `request_user_input` in Default collaboration mode as well as
    Plan
    - update the Default-mode instructions to prefer assumptions first and
    use `request_user_input` only when a question is unavoidable
    - update request_user_input and app-server tests to match the new
    Default-mode behavior
    - refactor collaboration-mode availability plumbing into
    `CollaborationModesConfig` for future mode-related flags
    
    ## Codex author
    `codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
  • feat(app-server): add ThreadItem::DynamicToolCall (#12732)
    Previously, clients would call `thread/start` with dynamic_tools set,
    and when a model invokes a dynamic tool, it would just make the
    server->client `item/tool/call` request and wait for the client's
    response to complete the tool call. This works, but it doesn't have an
    `item/started` or `item/completed` event.
    
    Now we are doing this:
    - [new] emit `item/started` with `DynamicToolCall` populated with the
    call arguments
    - send an `item/tool/call` server request
    - [new] once the client responds, emit `item/completed` with
    `DynamicToolCall` populated with the response.
    
    Also, with `persistExtendedHistory: true`, dynamic tool calls are now
    reconstructable in `thread/read` and `thread/resume` as
    `ThreadItem::DynamicToolCall`.
  • feat: pass helper executable paths via Arg0DispatchPaths (#12719)
    ## Why
    
    `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` previously
    located `codex-execve-wrapper` by scanning `PATH` and sibling
    directories. That lookup is brittle and can select the wrong binary when
    the runtime environment differs from startup assumptions.
    
    We already pass `codex-linux-sandbox` from `codex-arg0`;
    `codex-execve-wrapper` should use the same startup-driven path plumbing.
    
    ## What changed
    
    - Introduced `Arg0DispatchPaths` in `codex-arg0` to carry both helper
    executable paths:
      - `codex_linux_sandbox_exe`
      - `main_execve_wrapper_exe`
    - Updated `arg0_dispatch_or_else()` to pass `Arg0DispatchPaths` to
    top-level binaries and preserve helper paths created in
    `prepend_path_entry_for_codex_aliases()`.
    - Threaded `Arg0DispatchPaths` through entrypoints in `cli`, `exec`,
    `tui`, `app-server`, and `mcp-server`.
    - Added `main_execve_wrapper_exe` to core configuration plumbing
    (`Config`, `ConfigOverrides`, and `SessionServices`).
    - Updated zsh-fork shell escalation to consume the configured
    `main_execve_wrapper_exe` and removed path-sniffing fallback logic.
    - Updated app-server config reload paths so reloaded configs keep the
    same startup-provided helper executable paths.
    
    ## References
    
    - [`Arg0DispatchPaths`
    definition](https://github.com/openai/codex/blob/e355b43d5c2a771f045296a6deae10d7c9c36ec6/codex-rs/arg0/src/lib.rs#L20-L24)
    - [`arg0_dispatch_or_else()` forwarding both
    paths](https://github.com/openai/codex/blob/e355b43d5c2a771f045296a6deae10d7c9c36ec6/codex-rs/arg0/src/lib.rs#L145-L176)
    - [zsh-fork escalation using configured wrapper
    path](https://github.com/openai/codex/blob/e355b43d5c2a771f045296a6deae10d7c9c36ec6/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs#L109-L150)
    
    ## Testing
    
    - `cargo check -p codex-arg0 -p codex-core -p codex-exec -p codex-tui -p
    codex-mcp-server -p codex-app-server`
    - `cargo test -p codex-arg0`
    - `cargo test -p codex-core tools::runtimes::shell::unix_escalation:: --
    --nocapture`
  • Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
    ## Summary
    - Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
    auto-export, and store results in SQLite.
    - Simplify workflow: remove run/resume/get-status/export tools; spawn is
    deterministic and completes in one call.
    - Improve exec UX: stable, single-line progress bar with ETA; suppress
    sub-agent chatter in exec.
    
    ## Why
    Enables map-reduce style workflows over arbitrarily large repos using
    the existing Codex orchestrator. This addresses review feedback about
    overly complex job controls and non-deterministic monitoring.
    
    ## Demo (progress bar)
    ```
    ./codex-rs/target/debug/codex exec \
      --enable collab \
      --enable sqlite \
      --full-auto \
      --progress-cursor \
      -c agents.max_threads=16 \
      -C /Users/daveaitel/code/codex \
      - <<'PROMPT'
    Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
    path = item-01..item-30, area = test.
    
    Then call spawn_agents_on_csv with:
    - csv_path: /tmp/agent_job_progress_demo.csv
    - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
    - output_csv_path: /tmp/agent_job_progress_demo_out.csv
    PROMPT
    ```
    
    ## Review feedback addressed
    - Auto-start jobs on spawn; removed run/resume/status/export tools.
    - Auto-export on success.
    - More descriptive tool spec + clearer prompts.
    - Avoid deadlocks on spawn failure; pending/running handled safely.
    - Progress bar no longer scrolls; stable single-line redraw.
    
    ## Tests
    - `cd codex-rs && cargo test -p codex-exec`
    - `cd codex-rs && cargo build -p codex-cli`
  • Add skill approval event/response (#12633)
    Set the stage for skill-level permission approval in addition to
    command-level.
    
    Behind a feature flag.
  • fix(exec) Patch resume test race condition (#12648)
    ## Summary
    The test exec_resume_last_respects_cwd_filter_and_all_flag makes one
    session “newest” by resuming it, but rollout updated_at is stored/sorted
    at second precision. On fast CI (especially Windows), the touch could
    land in the same second as initial session creation, making ordering
    nondeterministic.
    
    This change adds a short sleep before the recency-touch step so the
    resumed session is guaranteed to have a later updated_at, preserving the
    intended assertion without changing product behavior.
  • Allow exec resume to parse output-last-message flag after command (#12541)
    Summary
    - mark `output-last-message` as a global exec flag so it can follow
    subcommands like `resume`
    - add regression tests in both `cli` and `exec` crates verifying the
    flag order works when invoking `resume`
    
    Fixes #12538
  • fix: codex-arg0 no longer depends on codex-core (#12434)
    ## Why
    
    `codex-rs/arg0` only needed two things from `codex-core`:
    
    - the `find_codex_home()` wrapper
    - the special argv flag used for the internal `apply_patch`
    self-invocation path
    
    That made `codex-arg0` depend on `codex-core` for a very small surface
    area. This change removes that dependency edge and moves the shared
    `apply_patch` invocation flag to a more natural boundary
    (`codex-apply-patch`) while keeping the contract explicitly documented.
    
    ## What Changed
    
    - Moved the internal `apply_patch` argv[1] flag constant out of
    `codex-core` and into `codex-apply-patch`.
    - Renamed the constant to `CODEX_CORE_APPLY_PATCH_ARG1` and documented
    that it is part of the Codex core process-invocation contract (even
    though it now lives in `codex-apply-patch`).
    - Updated `arg0`, the core apply-patch runtime, and the `codex-exec`
    apply-patch test to import the constant from `codex-apply-patch`.
    - Updated `codex-rs/arg0` to call
    `codex_utils_home_dir::find_codex_home()` directly instead of
    `codex_core::config::find_codex_home()`.
    - Removed the `codex-core` dependency from `codex-rs/arg0` and added the
    needed direct dependency on `codex-utils-home-dir`.
    - Added `codex-apply-patch` as a dev-dependency for `codex-rs/exec`
    tests (the apply-patch test now imports the moved constant directly).
    
    ## Verification
    
    - `cargo test -p codex-apply-patch`
    - `cargo test -p codex-arg0`
    - `cargo test -p codex-core --lib apply_patch`
    - `cargo test -p codex-exec
    test_standalone_exec_cli_can_use_apply_patch`
    - `cargo shear`
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
    - `codex_protocol::protocol` and related protocol/model types (including
    `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
    - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
    parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • Wire realtime api to core (#12268)
    - Introduce `RealtimeConversationManager` for realtime API management 
    - Add `op::conversation` to start conversation, insert audio, insert
    text, and close conversation.
    - emit conversation lifecycle and realtime events.
    - Move shared realtime payload types into codex-protocol and add core
    e2e websocket tests for start/replace/transport-close paths.
    
    Things to consider:
    - Should we use the same `op::` and `Events` channel to carry audio? I
    think we should try this simple approach and later we can create
    separate one if the channels got congested.
    - Sending text updates to the client: we can start simple and later
    restrict that.
    - Provider auth isn't wired for now intentionally
  • feat: cleaner TUI for sub-agents (#12327)
    <img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
    src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
    />
  • client side modelinfo overrides (#12101)
    TL;DR
    Add top-level `model_catalog_json` config support so users can supply a
    local model catalog override from a JSON file path (including adding new
    models) without backend changes.
    
    ### Problem
    Codex previously had no clean client-side way to replace/overlay model
    catalog data for local testing of model metadata and new model entries.
    
    ### Fix
    - Add top-level `model_catalog_json` config field (JSON file path).
    - Apply catalog entries when resolving `ModelInfo`:
      1. Base resolved model metadata (remote/fallback)
      2. Catalog overlay from `model_catalog_json`
    3. Existing global top-level overrides (`model_context_window`,
    `model_supports_reasoning_summaries`, etc.)
    
    ### Note
    Will revisit per-field overrides in a follow-up
    
    ### Tests
    Added tests
  • [js_repl] paths for node module resolution can be specified for js_repl (#11944)
    # External (non-OpenAI) Pull Request Requirements
    
    In `js_repl` mode, module resolution currently starts from
    `js_repl_kernel.js`, which is written to a per-kernel temp dir. This
    effectively means that bare imports will not resolve.
    
    This PR adds a new config option, `js_repl_node_module_dirs`, which is a
    list of dirs that are used (in order) to resolve a bare import. If none
    of those work, the current working directory of the thread is used.
    
    For example:
    ```toml
    js_repl_node_module_dirs = [
        "/path/to/node_modules/",
        "/other/path/to/node_modules/",
    ]
    ```
  • feat(core): zsh exec bridge (#12052)
    zsh fork PR stack:
    - https://github.com/openai/codex/pull/12051 
    - https://github.com/openai/codex/pull/12052 👈 
    
    ### Summary
    This PR introduces a feature-gated native shell runtime path that routes
    shell execution through a patched zsh exec bridge, removing MCP-specific
    behavior from the shell hot path while preserving existing
    CommandExecution lifecycle semantics.
    
    When shell_zsh_fork is enabled, shell commands run via patched zsh with
    per-`execve` interception through EXEC_WRAPPER. Core receives wrapper
    IPC requests over a Unix socket, applies existing approval policy, and
    returns allow/deny before the subcommand executes.
    
    ### What’s included
    **1) New zsh exec bridge runtime in core**
    - Wrapper-mode entrypoint (maybe_run_zsh_exec_wrapper_mode) for
    EXEC_WRAPPER invocations.
    - Per-execution Unix-socket IPC handling for wrapper requests/responses.
    - Approval callback integration using existing core approval
    orchestration.
    - Streaming stdout/stderr deltas to existing command output event
    pipeline.
    - Error handling for malformed IPC, denial/abort, and execution
    failures.
    
    **2) Session lifecycle integration**
    SessionServices now owns a `ZshExecBridge`.
    Session startup initializes bridge state; shutdown tears it down
    cleanly.
    
    **3) Shell runtime routing (feature-gated)**
    When `shell_zsh_fork` is enabled:
    - Build execution env/spec as usual.
    - Add wrapper socket env wiring.
    - Execute via `zsh_exec_bridge.execute_shell_request(...)` instead of
    the regular shell path.
    - Non-zsh-fork behavior remains unchanged.
    
    **4) Config + feature wiring**
    - Added `Feature::ShellZshFork` (under development).
    - Added config support for `zsh_path` (optional absolute path to patched
    zsh):
    - `Config`, `ConfigToml`, `ConfigProfile`, overrides, and schema.
    - Session startup validates that `zsh_path` exists/usable when zsh-fork
    is enabled.
    - Added startup test for missing `zsh_path` failure mode.
    
    **5) Seatbelt/sandbox updates for wrapper IPC**
    - Extended seatbelt policy generation to optionally allow outbound
    connection to explicitly permitted Unix sockets.
    - Wired sandboxing path to pass wrapper socket path through to seatbelt
    policy generation.
    - Added/updated seatbelt tests for explicit socket allow rule and
    argument emission.
    
    **6) Runtime entrypoint hooks**
    - This allows the same binary to act as the zsh wrapper subprocess when
    invoked via `EXEC_WRAPPER`.
    
    **7) Tool selection behavior**
    - ToolsConfig now prefers ShellCommand type when shell_zsh_fork is
    enabled.
    - Added test coverage for precedence with unified-exec enabled.
  • chore: rm remote models fflag (#11699)
    rm `remote_models` feature flag.
    
    We see issues like #11527 when a user has `remote_models` disabled, as
    we always use the default fallback `ModelInfo`. This causes issues with
    model performance.
    
    Builds on #11690, which helps by warning the user when they are using
    the default fallback. This PR will make that happen much less frequently
    as an accidental consequence of disabling `remote_models`.
  • Feat: add model reroute notification (#12001)
    ### Summary
    Builiding off
    https://github.com/openai/codex/pull/11964/files/5c75aa7b89a70bc2cc410a6fd238749306ec4c5e#diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0,
    we have created a `model/rerouted` notification that captures the event
    so that consumers can render as expected. Keep the `EventMsg::Warning`
    path in core so that this does not affect TUI rendering.
    
    `model/rerouted` is meant to be generic to account for future usage
    including capacity planning etc.
  • Report syntax errors in rules file (#11686)
    Currently, if there are syntax errors detected in the starlark rules
    file, the entire policy is silently ignored by the CLI. The app server
    correctly emits a message that can be displayed in a GUI.
    
    This PR changes the CLI (both the TUI and non-interactive exec) to fail
    when the rules file can't be parsed. It then prints out an error message
    and exits with a non-zero exit code. This is consistent with the
    handling of errors in the config file.
    
    This addresses #11603
  • feat: introduce Permissions (#11633)
    ## Why
    We currently carry multiple permission-related concepts directly on
    `Config` for shell/unified-exec behavior (`approval_policy`,
    `sandbox_policy`, `network`, `shell_environment_policy`,
    `windows_sandbox_mode`).
    
    Consolidating these into one in-memory struct makes permission handling
    easier to reason about and sets up the next step: supporting named
    permission profiles (`[permissions.PROFILE_NAME]`) without changing
    behavior now.
    
    This change is mostly mechanical: it updates existing callsites to go
    through `config.permissions`, but it does not yet refactor those
    callsites to take a single `Permissions` value in places where multiple
    permission fields are still threaded separately.
    
    This PR intentionally **does not** change the on-disk `config.toml`
    format yet and keeps compatibility with legacy config keys.
    
    ## What Changed
    - Introduced `Permissions` in `core/src/config/mod.rs`.
    - Added `Config::permissions` and moved effective runtime permission
    fields under it:
      - `approval_policy`
      - `sandbox_policy`
      - `network`
      - `shell_environment_policy`
      - `windows_sandbox_mode`
    - Updated config loading/building so these effective values are still
    derived from the same existing config inputs and constraints.
    - Updated Windows sandbox helpers/resolution to read/write via
    `permissions`.
    - Threaded the new field through all permission consumers across core
    runtime, app-server, CLI/exec, TUI, and sandbox summary code.
    - Updated affected tests to reference `config.permissions.*`.
    - Renamed the struct/field from
    `EffectivePermissions`/`effective_permissions` to
    `Permissions`/`permissions` and aligned variable naming accordingly.
    
    ## Verification
    - `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
    -p codex-exec -p codex-utils-sandbox-summary`
    - `cargo build -p codex-core -p codex-tui -p codex-cli -p
    codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
  • feat(app-server): experimental flag to persist extended history (#11227)
    This PR adds an experimental `persist_extended_history` bool flag to
    app-server thread APIs so rollout logs can retain a richer set of
    EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
    on `thread/resume`).
    
    ### Motivation
    Today, our rollout recorder only persists a small subset (e.g. user
    message, reasoning, assistant message) of `EventMsg` types, dropping a
    good number (like command exec, file change, etc.) that are important
    for reconstructing full item history for `thread/resume`, `thread/read`,
    and `thread/fork`.
    
    Some clients want to be able to resume a thread without lossiness. This
    lossiness is primarily a UI thing, since what the model sees are
    `ResponseItem` and not `EventMsg`.
    
    ### Approach
    This change introduces an opt-in `persist_full_history` flag to preserve
    those events when you start/resume/fork a thread (defaults to `false`).
    
    This is done by adding an `EventPersistenceMode` to the rollout
    recorder:
    - `Limited` (existing behavior, default)
    - `Extended` (new opt-in behavior)
    
    In `Extended` mode, persist additional `EventMsg` variants needed for
    non-lossy app-server `ThreadItem` reconstruction. We now store the
    following ThreadItems that we didn't before:
    - web search
    - command execution
    - patch/file changes
    - MCP tool calls
    - image view calls
    - collab tool outcomes
    - context compaction
    - review mode enter/exit
    
    For **command executions** in particular, we truncate the output using
    the existing `truncate_text` from core to store an upper bound of 10,000
    bytes, which is also the default value for truncating tool outputs shown
    to the model. This keeps the size of the rollout file and command
    execution items returned over the wire reasonable.
    
    And we also persist `EventMsg::Error` which we can now map back to the
    Turn's status and populates the Turn's error metadata.
    
    #### Updates to EventMsgs
    To truly make `thread/resume` non-lossy, we also needed to persist the
    `status` on `EventMsg::CommandExecutionEndEvent` and
    `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
    command failed or was declined (similar for apply_patch). These
    EventMsgs were never persisted before so I made it a required field.
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • Add feature-gated freeform js_repl core runtime (#10674)
    ## Summary
    
    This PR adds an **experimental, feature-gated `js_repl` core runtime**
    so models can execute JavaScript in a persistent REPL context across
    tool calls.
    
    The implementation integrates with existing feature gating, tool
    registration, prompt composition, config/schema docs, and tests.
    
    ## What changed
    
    - Added new experimental feature flag: `features.js_repl`.
    - Added freeform `js_repl` tool and companion `js_repl_reset` tool.
    - Gated tool availability behind `Feature::JsRepl`.
    - Added conditional prompt-section injection for JS REPL instructions
    via marker-based prompt processing.
    - Implemented JS REPL handlers, including freeform parsing and pragma
    support (timeout/reset controls).
    - Added runtime resolution order for Node:
      1. `CODEX_JS_REPL_NODE_PATH`
      2. `js_repl_node_path` in config
      3. `PATH`
    - Added JS runtime assets/version files and updated docs/schema.
    
    ## Why
    
    This enables richer agent workflows that require incremental JavaScript
    execution with preserved state, while keeping rollout safe behind an
    explicit feature flag.
    
    ## Testing
    
    Coverage includes:
    
    - Feature-flag gating behavior for tool exposure.
    - Freeform parser/pragma handling edge cases.
    - Runtime behavior (state persistence across calls and top-level `await`
    support).
    
    ## Usage
    
    ```toml
    [features]
    js_repl = true
    ```
    
    Optional runtime override:
    
    - `CODEX_JS_REPL_NODE_PATH`, or
    - `js_repl_node_path` in config.
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    - 👉 `1` https://github.com/openai/codex/pull/10674
    -  `2` https://github.com/openai/codex/pull/10672
    -  `3` https://github.com/openai/codex/pull/10671
    -  `4` https://github.com/openai/codex/pull/10673
    -  `5` https://github.com/openai/codex/pull/10670
  • Cache cloud requirements (#11305)
    We're loading these from the web on every startup. This puts them in a
    local file with a 1hr TTL.
    
    We sign the downloaded requirements with a key compiled into the Codex
    CLI to prevent unsophisticated tampering (determined circumvention is
    outside of our threat model: after all, one could just compile Codex
    without any of these checks).
    
    If any of the following are true, we ignore the local cache and re-fetch
    from Cloud:
    * The signature is invalid for the payload (== requirements, sign time,
    ttl, user identity)
    * The identity does not match the auth'd user's identity
    * The TTL has expired
    * We cannot parse requirements.toml from the payload
  • feat: split codex-common into smaller utils crates (#11422)
    We are removing feature-gated shared crates from the `codex-rs`
    workspace. `codex-common` grouped several unrelated utilities behind
    `[features]`, which made dependency boundaries harder to reason about
    and worked against the ongoing effort to eliminate feature flags from
    workspace crates.
    
    Splitting these utilities into dedicated crates under `utils/` aligns
    this area with existing workspace structure and keeps each dependency
    explicit at the crate boundary.
    
    ## What changed
    
    - Removed `codex-rs/common` (`codex-common`) from workspace members and
    workspace dependencies.
    - Added six new utility crates under `codex-rs/utils/`:
      - `codex-utils-cli`
      - `codex-utils-elapsed`
      - `codex-utils-sandbox-summary`
      - `codex-utils-approval-presets`
      - `codex-utils-oss`
      - `codex-utils-fuzzy-match`
    - Migrated the corresponding modules out of `codex-common` into these
    crates (with tests), and added matching `BUILD.bazel` targets.
    - Updated direct consumers to use the new crates instead of
    `codex-common`:
      - `codex-rs/cli`
      - `codex-rs/tui`
      - `codex-rs/exec`
      - `codex-rs/app-server`
      - `codex-rs/mcp-server`
      - `codex-rs/chatgpt`
      - `codex-rs/cloud-tasks`
    - Updated workspace lockfile entries to reflect the new dependency graph
    and removal of `codex-common`.
  • chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
    Problem:
    1. turn id is constructed in-memory;
    2. on resuming threads, turn_id might not be unique;
    3. client cannot no the boundary of a turn from rollout files easily.
    
    This PR does three things:
    1. persist `task_started` and `task_complete` events;
    1. persist `turn_id` in rollout turn events;
    5. generate turn_id as unique uuids instead of incrementing it in
    memory.
    
    This helps us resolve the issue of clients wanting to have unique turn
    ids for resuming a thread, and knowing the boundry of each turn in
    rollout files.
    
    example debug logs
    ```
    2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
    2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
    2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
    ```
    
    backward compatibility:
    if you try to resume an old session without task_started and
    task_complete event populated, the following happens:
    - If you resume and do nothing: those reconstructed historical IDs can
    differ next time you resume.
    - If you resume and send a new turn: the new turn gets a fresh UUID from
    live submission flow and is persisted, so that new turn’s ID is stable
    on later resumes.
    I think this behavior is fine, because we only care about deterministic
    turn id once a turn is triggered.
  • feat: retain NetworkProxy, when appropriate (#11207)
    As of this PR, `SessionServices` retains a
    `Option<StartedNetworkProxy>`, if appropriate.
    
    Now the `network` field on `Config` is `Option<NetworkProxySpec>`
    instead of `Option<NetworkProxy>`.
    
    Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
    create the `StartedNetworkProxy`, which is a new struct that retains the
    `NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
    implemented for `NetworkProxyHandle` to ensure the proxies are shutdown
    when it is dropped.)
    
    The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
    the appropriate places.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
    * #11285
    * __->__ #11207
  • feat(sandbox): enforce proxy-aware network routing in sandbox (#11113)
    ## Summary
    - expand proxy env injection to cover common tool env vars
    (`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY` families +
    tool-specific variants)
    - harden macOS Seatbelt network policy generation to route through
    inferred loopback proxy endpoints and fail closed when proxy env is
    malformed
    - thread proxy-aware Linux sandbox flags and add minimal bwrap netns
    isolation hook for restricted non-proxy runs
    - add/refresh tests for proxy env wiring, Seatbelt policy generation,
    and Linux sandbox argument wiring
  • feat: do not close unified exec processes across turns (#10799)
    With this PR we do not close the unified exec processes (i.e. background
    terminals) at the end of a turn unless:
    * The user interrupt the turn
    * The user decide to clean the processes through `app-server` or
    `/clean`
    
    I made sure that `codex exec` correctly kill all the processes
  • Add resume_agent collab tool (#10903)
    Summary
    - add the new resume_agent collab tool path through core, protocol, and
    the app server API, including the resume events
    - update the schema/TypeScript definitions plus docs so resume_agent
    appears in generated artifacts and README
    - note that resumed agents rehydrate rollout history without overwriting
    their base instructions
    
    Testing
    - Not run (not requested)
  • Handle required MCP startup failures across components (#10902)
    Summary
    - add a `required` flag for MCP servers everywhere config/CLI data is
    touched so mandatory helpers can be round-tripped
    - have `codex exec` and `codex app-server` thread start/resume fail fast
    when required MCPs fail to initialize