Commit Graph

4678 Commits

  • Refactor ExecServer filesystem split between local and remote (#15232)
    For each feature we have:
    1. Trait exposed on environment
    2. **Local Implementation** of the trait
    3. Remote implementation that uses the client to proxy via network
    4. Handler implementation that handles RPC requests and calls into
    **Local Implementation**
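
    A minimal sketch of the four-part layout listed above, assuming a
    single read operation. The names `FileSystem`, `LocalFileSystem`,
    `RemoteFileSystem`, and `Handler` are hypothetical, an in-memory map
    stands in for the disk, and a direct reference stands in for the
    network client:

    ```rust
    use std::collections::HashMap;

    /// 1. Trait exposed on the environment (hypothetical name).
    pub trait FileSystem {
        fn read_to_string(&self, path: &str) -> Result<String, String>;
    }

    /// 2. Local implementation; an in-memory map stands in for the real disk.
    pub struct LocalFileSystem {
        pub files: HashMap<String, String>,
    }

    impl FileSystem for LocalFileSystem {
        fn read_to_string(&self, path: &str) -> Result<String, String> {
            self.files
                .get(path)
                .cloned()
                .ok_or_else(|| format!("not found: {path}"))
        }
    }

    /// 4. Handler: receives a decoded RPC request and calls the local
    /// implementation.
    pub struct Handler {
        pub local: LocalFileSystem,
    }

    impl Handler {
        pub fn handle_read(&self, path: &str) -> Result<String, String> {
            self.local.read_to_string(path)
        }
    }

    /// 3. Remote implementation: proxies each call through a client. The
    /// "client" here is a plain reference to the handler, standing in for a
    /// network hop.
    pub struct RemoteFileSystem<'a> {
        pub client: &'a Handler,
    }

    impl FileSystem for RemoteFileSystem<'_> {
        fn read_to_string(&self, path: &str) -> Result<String, String> {
            self.client.handle_read(path)
        }
    }

    /// Callers depend only on the trait, so either backend can be passed in.
    pub fn first_line(fs: &dyn FileSystem, path: &str) -> Result<String, String> {
        let text = fs.read_to_string(path)?;
        Ok(text.lines().next().unwrap_or("").to_string())
    }
    ```

    Because `first_line` takes `&dyn FileSystem`, the same call site works
    with the local and the remote backend.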
  • changed save directory to codex_home (#15222)
    Change the default save directory for generated images to
    codex_home/imagegen/thread_id/
  • feat(app-server): add mcpServer/startupStatus/updated notification (#15220)
    Exposes the legacy `codex/event/mcp_startup_update` event as an API v2
    notification.
    
    The legacy event has this shape:
    ```
    #[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
    pub struct McpStartupUpdateEvent {
        /// Server name being started.
        pub server: String,
        /// Current startup status.
        pub status: McpStartupStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
    #[serde(rename_all = "snake_case", tag = "state")]
    #[ts(rename_all = "snake_case", tag = "state")]
    pub enum McpStartupStatus {
        Starting,
        Ready,
        Failed { error: String },
        Cancelled,
    }
    ```
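
    For reference, the serde attributes above describe an internally
    tagged, snake_case JSON form. The sketch below hand-rolls that
    rendering without serde so it stays dependency-free; `status_to_json`
    is a hypothetical helper, and the shape of the new v2 notification
    itself is not shown in this commit message:

    ```rust
    /// Mirror of the legacy status enum, without the serde derives.
    #[derive(Debug, Clone, PartialEq)]
    pub enum McpStartupStatus {
        Starting,
        Ready,
        Failed { error: String },
        Cancelled,
    }

    /// Hand-rolled rendering of the internally tagged form that
    /// `#[serde(rename_all = "snake_case", tag = "state")]` describes.
    /// Only `"` and `\` are escaped here; a real encoder handles every case.
    pub fn status_to_json(status: &McpStartupStatus) -> String {
        match status {
            McpStartupStatus::Starting => r#"{"state":"starting"}"#.to_string(),
            McpStartupStatus::Ready => r#"{"state":"ready"}"#.to_string(),
            McpStartupStatus::Cancelled => r#"{"state":"cancelled"}"#.to_string(),
            McpStartupStatus::Failed { error } => {
                let escaped = error.replace('\\', "\\\\").replace('"', "\\\"");
                format!(r#"{{"state":"failed","error":"{escaped}"}}"#)
            }
        }
    }
    ```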
  • Plumb MCP turn metadata through _meta (#15190)
    ## Summary
    
    Some background. We're looking to instrument GA turns end to end. Right
    now a big gap is grouping mcp tool calls with their codex sessions. We
    send session id and turn id headers to the responses call but not the
    mcp/wham calls.
    
    Ideally we could pass the args as headers like with responses, but given
    the setup of the rmcp client, we can't send as headers without either
    changing the rmcp package upstream to allow per request headers or
    introducing a mutex which breaks concurrency. An earlier attempt made the
    assumption that we had 1 client per thread, which allowed us to set
    headers at the start of a turn. @pakrym mentioned that this assumption
    might break in the near future.
    
    So the solution now is to package the turn metadata/session id into the
    _meta field in the post body and pull out in codex-backend.
    
    - send turn metadata to MCP servers via `tools/call` `_meta` instead of
    assuming per-thread request headers on shared clients
    - preserve the existing `_codex_apps` metadata while adding
    `x-codex-turn-metadata` for all MCP tool calls
    - extend tests to cover both custom MCP servers and the codex apps
    search flow
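
    A rough sketch of the per-request approach described above, assuming
    `_meta` can be modeled as a string map. `build_meta` is a hypothetical
    helper, and values are kept as pre-serialized strings to avoid a JSON
    dependency:

    ```rust
    use std::collections::BTreeMap;

    /// Key named in the commit message for the per-call turn metadata.
    pub const TURN_METADATA_KEY: &str = "x-codex-turn-metadata";

    /// Returns the `_meta` entries for one `tools/call`, keeping whatever was
    /// already present (for example `_codex_apps`) and adding the turn
    /// metadata alongside it.
    pub fn build_meta(
        existing: &BTreeMap<String, String>,
        turn_metadata: &str,
    ) -> BTreeMap<String, String> {
        let mut meta = existing.clone();
        meta.insert(TURN_METADATA_KEY.to_string(), turn_metadata.to_string());
        meta
    }
    ```

    Since the metadata travels in each request body, no shared client state
    is mutated, which is what sidesteps the mutex mentioned above.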
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • adding full imagepath to tui (#15154)
    Add the full path to the TUI so the image can be opened from the TUI
    after being generated. Limited to the VS Code terminal for now.
  • add specific tool guidance for Windows destructive commands (#15207)
    updated Windows shell/unified_exec tool descriptions:
    
    `exec_command`
    ```text
    Runs a command in a PTY, returning output or a session ID for ongoing interaction.
    
    Windows safety rules:
    - Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
    - Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
    ```
    
    `shell`
    ```text
    Runs a Powershell command (Windows) and returns its output. Arguments to `shell` will be passed to CreateProcessW(). Most commands should be prefixed with ["powershell.exe", "-Command"].
    
    Examples of valid command strings:
    
    - ls -a (show hidden): ["powershell.exe", "-Command", "Get-ChildItem -Force"]
    - recursive find by name: ["powershell.exe", "-Command", "Get-ChildItem -Recurse -Filter *.py"]
    - recursive grep: ["powershell.exe", "-Command", "Get-ChildItem -Path C:\\myrepo -Recurse | Select-String -Pattern 'TODO' -CaseSensitive"]
    - ps aux | grep python: ["powershell.exe", "-Command", "Get-Process | Where-Object { $_.ProcessName -like '*python*' }"]
    - setting an env var: ["powershell.exe", "-Command", "$env:FOO='bar'; echo $env:FOO"]
    - running an inline Python script: ["powershell.exe", "-Command", "@'\nprint('Hello, world!')\n'@ | python -"]
    
    Windows safety rules:
    - Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
    - Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
    ```
    
    `shell_command`
    ```text
    Runs a Powershell command (Windows) and returns its output.
    
    Examples of valid command strings:
    
    - ls -a (show hidden): "Get-ChildItem -Force"
    - recursive find by name: "Get-ChildItem -Recurse -Filter *.py"
    - recursive grep: "Get-ChildItem -Path C:\\myrepo -Recurse | Select-String -Pattern 'TODO' -CaseSensitive"
    - ps aux | grep python: "Get-Process | Where-Object { $_.ProcessName -like '*python*' }"
    - setting an env var: "$env:FOO='bar'; echo $env:FOO"
    - running an inline Python script: "@'\nprint('Hello, world!')\n'@ | python -"
    
    Windows safety rules:
    - Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
    - Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
    ```
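
    The second safety rule (check the resolved target before a recursive
    delete or move) can be illustrated with a small containment check. This
    is only a sketch: `normalize` and `stays_within` are hypothetical
    helpers, the resolution is purely lexical (symlinks and junctions are
    not followed), and the paths in a typical call would be whatever the
    tool is about to pass to `Remove-Item` or `Move-Item`:

    ```rust
    use std::path::{Component, Path, PathBuf};

    /// Lexically resolves `.` and `..` without touching the filesystem.
    /// Symlinks and junctions are NOT resolved here.
    pub fn normalize(path: &Path) -> PathBuf {
        let mut out = PathBuf::new();
        for component in path.components() {
            match component {
                Component::CurDir => {}
                Component::ParentDir => {
                    out.pop();
                }
                other => out.push(other.as_os_str()),
            }
        }
        out
    }

    /// Returns true when `target`, resolved against `workspace` if it is
    /// relative, still lies inside `workspace` after normalization.
    pub fn stays_within(workspace: &Path, target: &Path) -> bool {
        let workspace = normalize(workspace);
        let resolved = if target.is_absolute() {
            normalize(target)
        } else {
            normalize(&workspace.join(target))
        };
        resolved.starts_with(&workspace)
    }
    ```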
  • Move terminal module to terminal-detection crate (#15216)
    - Move core/src/terminal.rs and its tests into a standalone
    terminal-detection workspace crate.
    - Update direct consumers to depend on codex-terminal-detection and
    import terminal APIs directly.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(tracing): tag app-server turn spans with turn_id (#15206)
    So we can find and filter spans by `turn.id`.
    
    We do this for the `turn/start`, `turn/steer`, and `turn/interrupt`
    APIs.
  • feat(tui): add /title terminal title configuration (#12334)
    ## Problem
    
    When multiple Codex sessions are open at once, terminal tabs and windows
    are hard to distinguish from each other. The existing status line only
    helps once the TUI is already focused, so it does not solve the "which
    tab is this?" problem.
    
    This PR adds a first-class `/title` command so the terminal window or
    tab title can carry a short, configurable summary of the current
    session.
    
    ## Screenshot
    
    <img width="849" height="320" alt="image"
    src="https://github.com/user-attachments/assets/8b112927-7890-45ed-bb1e-adf2f584663d"
    />
    
    ## Mental model
    
    `/statusline` and `/title` are separate status surfaces with different
    constraints. The status line is an in-app footer that can be denser and
    more detailed. The terminal title is external terminal metadata, so it
    needs short, stable segments that still make multiple sessions easy to
    tell apart.
    
    The `/title` configuration is an ordered list of compact items. By
    default it renders `spinner,project`, so active sessions show
    lightweight progress first while idle sessions still stay easy to
    disambiguate. Each configured item is omitted when its value is not
    currently available rather than forcing a placeholder.
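
    As an illustration of the "omit unavailable items" rule, here is a
    sketch in which `render_title` and the `lookup` callback are
    hypothetical, and joining segments with a single space is an assumption
    rather than something this description states:

    ```rust
    /// Renders an ordered list of item ids into a title, skipping any item
    /// whose value is unavailable. `lookup` is a hypothetical stand-in for
    /// the session state that knows each item's current value.
    pub fn render_title<F>(items: &[&str], lookup: F) -> String
    where
        F: Fn(&str) -> Option<String>,
    {
        items
            .iter()
            .filter_map(|&id| lookup(id))
            .filter(|value| !value.is_empty())
            .collect::<Vec<_>>()
            .join(" ")
    }
    ```

    With the default `spinner,project` list, an idle session whose spinner
    has no value would render only the project segment, with no placeholder
    in front of it.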
    
    ## Non-goals
    
    This does not merge `/title` into `/statusline`, and it does not add an
    arbitrary free-form title string. The feature is intentionally limited
    to a small set of structured items so the title stays short and
    reviewable.
    
    This also does not attempt to restore whatever title the terminal or
    shell had before Codex started. When Codex clears the title, it clears
    the title Codex last wrote.
    
    ## Tradeoffs
    
    A separate `/title` command adds some conceptual overlap with
    `/statusline`, but it keeps title-specific constraints explicit instead
    of forcing the status line model to cover two different surfaces.
    
    Title refresh can happen frequently, so the implementation now shares
    parsing and git-branch orchestration between the status line and title
    paths, and caches the derived project-root name by cwd. That keeps the
    hot path cheap without introducing background polling.
    
    ## Architecture
    
    The TUI gets a new `/title` slash command and a dedicated picker UI for
    selecting and ordering terminal-title items. The chosen ids are
    persisted in `tui.terminal_title`, with `spinner` and `project` as the
    default when the config is unset. `status` remains available as a
    separate text item, so configurations like `spinner,status` render
    compact progress like `⠋ Working`.
    
    `ChatWidget` now refreshes both status surfaces through a shared
    `refresh_status_surfaces()` path. That shared path parses configured
    items once, warns on invalid ids once, synchronizes shared cached state
    such as git-branch lookup, then renders the footer status line and
    terminal title from the same snapshot.
    
    Low-level OSC title writes live in `codex-rs/tui/src/terminal_title.rs`,
    which owns the terminal write path and last-mile sanitization before
    emitting OSC 0.
    
    ## Security
    
    Terminal-title text is treated as untrusted display content before Codex
    emits it. The write path strips control characters, removes invisible
    and bidi formatting characters that can make the title visually
    misleading, normalizes whitespace, and caps the emitted length.
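
    A sketch of what such a sanitization pass could look like. The
    character ranges, the 80-character cap, and the function names are
    assumptions chosen for illustration, not the values used in
    `terminal_title.rs`:

    ```rust
    /// Assumed cap for the illustration; the real limit is not stated here.
    pub const MAX_TITLE_CHARS: usize = 80;

    /// Invisible and bidirectional formatting characters to drop.
    fn is_invisible_or_bidi(c: char) -> bool {
        matches!(
            c,
            '\u{200B}'..='\u{200F}'   // zero-width space/joiners, LRM, RLM
            | '\u{202A}'..='\u{202E}' // bidi embeddings and overrides
            | '\u{2060}'..='\u{2064}' // word joiner, invisible operators
            | '\u{2066}'..='\u{2069}' // bidi isolates
            | '\u{FEFF}'              // zero-width no-break space
        )
    }

    pub fn sanitize_title(raw: &str) -> String {
        // Turn every whitespace character into a plain space first so that
        // newlines and tabs separate words instead of being deleted.
        let cleaned: String = raw
            .chars()
            .map(|c| if c.is_whitespace() { ' ' } else { c })
            .filter(|c| !c.is_control() && !is_invisible_or_bidi(*c))
            .collect();
        // Collapse runs of spaces, then cap the length in characters.
        let collapsed = cleaned.split_whitespace().collect::<Vec<_>>().join(" ");
        let capped: String = collapsed.chars().take(MAX_TITLE_CHARS).collect();
        capped.trim_end().to_string()
    }

    /// OSC 0 sequence (ESC ] 0 ; text BEL) carrying the sanitized title.
    pub fn osc0_sequence(raw: &str) -> String {
        format!("\x1b]0;{}\x07", sanitize_title(raw))
    }
    ```

    Dropping ESC and BEL from the payload is what keeps untrusted text from
    terminating the OSC sequence early or starting a new one.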
    
    References used while implementing this:
    
    - [xterm control
    sequences](https://invisible-island.net/xterm/ctlseqs/ctlseqs.html)
    - [WezTerm escape sequences](https://wezterm.org/escape-sequences.html)
    - [CWE-150: Improper Neutralization of Escape, Meta, or Control
    Sequences](https://cwe.mitre.org/data/definitions/150.html)
    - [CERT VU#999008 (Trojan Source)](https://kb.cert.org/vuls/id/999008)
    - [Trojan Source disclosure site](https://trojansource.codes/)
    - [Unicode Bidirectional Algorithm (UAX
    #9)](https://www.unicode.org/reports/tr9/)
    - [Unicode Security Considerations (UTR
    #36)](https://www.unicode.org/reports/tr36/)
    
    ## Observability
    
    Unknown configured title item ids are warned about once instead of
    repeatedly spamming the transcript. Live preview applies immediately
    while the `/title` picker is open, and cancel rolls the in-memory title
    selection back to the pre-picker value.
    
    If terminal title writes fail, the TUI emits debug logs around set and
    clear attempts. The rendered status label intentionally collapses richer
    internal states into compact title text such as `Starting...`, `Ready`,
    `Thinking...`, `Working...`, `Waiting...`, and `Undoing...` when
    `status` is configured.
    
    ## Tests
    
    Ran:
    
    - `just fmt`
    - `cargo test -p codex-tui`
    
    At the moment, the red Windows `rust-ci` failures are due to existing
    `codex-core` `apply_patch_cli` stack-overflow tests that also reproduce
    on `main`. The `/title`-specific `codex-tui` suite is green.
  • Log automated reviewer approval sources distinctly (#15201)
    ## Summary
    
    - log guardian-reviewed tool approvals as `source=automated_reviewer` in
    `codex.tool_decision`
    - keep direct user approvals as `source=user` and config-driven
    approvals as `source=config`
    
    ## Testing
    
    - `/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-fmt-quiet.sh`
    - `/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-test-quiet.sh -p codex-otel`
    (fails in sandboxed loopback bind tests under
    `otel/tests/suite/otlp_http_loopback.rs`)
    - `cargo test -p codex-core guardian -- --nocapture` (original-tree run
    reached Guardian tests and only hit sandbox-related listener/proxy
    failures)
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add exec-server exec RPC implementation (#15090)
    Stacked PR 2/3, based on the stub PR.
    
    Adds the exec RPC implementation and process/event flow in exec-server
    only.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Publish runnable DotSlash package for argument-comment lint (#15198)
    ## Why
    
    To date, the argument-comment linter introduced in
    https://github.com/openai/codex/pull/14651 had to be built from source
    to run, which can be a bit slow (both for local dev and when it is run
    in CI). Because of the potential slowness, I did not wire it up to run
    as part of `just clippy` or anything like that. As a result, I have seen
    a number of occasions where folks put up PRs that violate the lint, see
    it fail in CI, and then have to put up their PR again.
    
    The goal of this PR is to pre-build a runnable version of the linter and
    then make it available via a DotSlash file. Once it is available, I will
    update `just clippy` and other touchpoints to make it a natural part of
    the dev cycle so lint violations should get flagged _before_ putting up
    a PR for review.
    
    To get things started, we will build the DotSlash file as part of an
    alpha release. I don't expect the linter to change often, though, so I'll
    probably change this to only build as part of mainline releases once we
    have a working DotSlash file. (Ultimately, we should probably move the
    linter into its own repo so it can have its own release cycle.)
    
    ## What Changed
    - add a reusable `rust-release-argument-comment-lint.yml` workflow that
    builds host-specific archives for macOS arm64, Linux arm64/x64, and
    Windows x64
    - wire `rust-release.yml` to publish the `argument-comment-lint`
    DotSlash manifest on all releases for now, including alpha tags
    - package a runnable layout instead of a bare library
    
    The Unix archive layout is:
    
    ```text
    argument-comment-lint/
      bin/
        argument-comment-lint
        cargo-dylint
      lib/
        libargument_comment_lint@nightly-2025-09-18-<target>.dylib|so
    ```
    
    On Windows the same layout is published as a `.zip`, with `.exe` and
    `.dll` filenames instead.
    
    DotSlash resolves the package entrypoint to
    `argument-comment-lint/bin/argument-comment-lint`. That runner finds the
    sibling bundled `cargo-dylint` binary plus the single packaged Dylint
    library under `lib/`, then invokes `cargo-dylint dylint --lib-path
    <that-library>` with the repo's default lint settings.
  • Add experimental exec server URL handling (#15196)
    Add a config and attempt to start the server.
  • [hooks] use a user message > developer message for prompt continuation (#14867)
    ## Summary
    
    Persist Stop-hook continuation prompts as `user` messages instead of
    hidden `developer` messages + some requested integration tests
    
    This is a followup to @pakrym 's comment in
    https://github.com/openai/codex/pull/14532 to make sure stop-block
    continuation prompts match training for turn loops
    
    - Stop continuation now writes `<hook_prompt hook_run_id="...">stop
    hook's user prompt</hook_prompt>`
    - Introduces quick-xml dependency, though we already indirectly depended
    on it anyway via syntect
    - This PR only has about 500 lines of actual logic changes, the rest is
    tests/schema
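
    A sketch of how such a continuation message could be assembled,
    assuming the element is closed with `</hook_prompt>`. The PR uses
    quick-xml; the version below hand-rolls the escaping so it stays
    dependency-free, and `escape_xml` and `hook_prompt_message` are
    hypothetical names:

    ```rust
    /// Escapes the characters that are significant in XML text and in
    /// double-quoted attribute values.
    fn escape_xml(text: &str) -> String {
        let mut out = String::with_capacity(text.len());
        for c in text.chars() {
            match c {
                '&' => out.push_str("&amp;"),
                '<' => out.push_str("&lt;"),
                '>' => out.push_str("&gt;"),
                '"' => out.push_str("&quot;"),
                _ => out.push(c),
            }
        }
        out
    }

    /// Wraps a Stop hook's prompt in the `hook_prompt` element that is then
    /// persisted as a `user` message.
    pub fn hook_prompt_message(hook_run_id: &str, prompt: &str) -> String {
        format!(
            "<hook_prompt hook_run_id=\"{}\">{}</hook_prompt>",
            escape_xml(hook_run_id),
            escape_xml(prompt)
        )
    }
    ```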
    
    ## Testing
    
    Example run (with a sessionstart hook and 3 stop hooks) - this shows
    context added by session start, then two stop hooks sending their own
    additional prompts in a new turn. The model responds with a single
    message addressing both. Then when that turn ends, the hooks detect that
    they just ran using `stop_hook_active` and decide not to loop again.
    
    test files for this (unzip, move codex -> .codex):
    [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)
    
    ```
    › cats
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
      cat facts, cat breeds, cat names, or build something cat-themed in this repo.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: cook the stonpet
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: eat the cooked stonpet
    
    • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
      rested until the hyperspace juices settle.
    
      Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
      smoky, bright, and totally out of this dimension.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    ```
  • Move environment abstraction into exec server (#15125)
    The idea is that codex-exec exposes an Environment struct with services
    on it. Each of those is a trait.
    
    Depending on construction parameters passed to Environment they are
    either backed by local or remote server but core doesn't see these
    differences.
  • feat: add graph representation of agent network (#15056)
    Add a representation of the agent graph. This is now used for:
    * Cascade close agents (closing a parent also closes its children)
    * Cascade resume (the opposite)
    
    Later, this will also be used for post-compaction stuffing of the
    context
    
    Direct fix for: https://github.com/openai/codex/issues/14458
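
    A minimal sketch of the cascade idea, assuming the graph is a tree keyed
    by agent name. `AgentGraph`, `add_child`, and `cascade` are hypothetical
    names, and cycles are not handled:

    ```rust
    use std::collections::HashMap;

    /// Parent -> children edges of the agent network.
    #[derive(Default)]
    pub struct AgentGraph {
        children: HashMap<String, Vec<String>>,
    }

    impl AgentGraph {
        pub fn add_child(&mut self, parent: &str, child: &str) {
            self.children
                .entry(parent.to_string())
                .or_default()
                .push(child.to_string());
        }

        /// Returns `root` followed by all of its descendants, each parent
        /// before its children. A cascade close or resume would then apply
        /// the operation to every agent in this list.
        pub fn cascade(&self, root: &str) -> Vec<String> {
            let mut order = Vec::new();
            let mut stack = vec![root.to_string()];
            while let Some(agent) = stack.pop() {
                if let Some(kids) = self.children.get(&agent) {
                    // Push in reverse so children are visited in insertion order.
                    stack.extend(kids.iter().rev().cloned());
                }
                order.push(agent);
            }
            order
        }
    }
    ```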
  • feat: support product-scoped plugins. (#15041)
    1. Added SessionSource::Custom(String) and --session-source.
    2. Enforced plugin and skill products by session_source.
    3. Applied the same filtering to curated background refresh.
  • Add thread/shellCommand to app server API surface (#14988)
    This PR adds a new `thread/shellCommand` app server API so clients can
    implement `!` shell commands. These commands are executed within the
    sandbox, and the command text and output are visible to the model.
    
    The internal implementation mirrors the current TUI `!` behavior.
    - persist shell command execution as `CommandExecution` thread items,
    including source and formatted output metadata
    - bridge live and replayed app-server command execution events back into
    the existing `tui_app_server` exec rendering path
    
    This PR also wires `tui_app_server` to submit `!` commands through the
    new API.
  • Simple directory mentions (#14970)
    - Adds simple support for directory mentions in the TUI.
    - Codex App/VS Code will require a minor change to recognize a directory
    mention as such and change the link behavior.
    - Directory mentions have a trailing slash to differentiate from
    extensionless files
    
    
    <img width="972" height="382" alt="image"
    src="https://github.com/user-attachments/assets/8035b1eb-0978-465b-8d7a-4db2e5feca39"
    />
    <img width="978" height="228" alt="image"
    src="https://github.com/user-attachments/assets/af22cf0b-dd10-4440-9bee-a09915f6ba52"
    />
  • [hooks] turn_id extension for Stop & UserPromptSubmit (#15118)
    ## Description
    
    Adding an extension to the spec that exposes the turn_id to hook
    scripts. This is a codex-specific mechanic that allows connecting the
    hook runs with particular turns
    
    ## Testing
    
    hooks config / sample hooks to use. Extract this, rename codex ->
    .codex, and place this into a repo or your home folder. It includes:
    config.toml that enables hooks, hooks.json, and sample python hooks:
    
    
    [codex.zip](https://github.com/user-attachments/files/26102671/codex.zip)
    
    example run (note the turn_ids change between turns):
    
    ```
    › hello
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Running UserPromptSubmit hook: lighting the observatory lanterns
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: hello for turn: 019d036d-c7fa-72d2-b6fd-
    78878bfe34e4
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' near the end.
    
    • Aloha! Grateful to be here and ready to build with you. Show me what you want to tackle in wizard-
      tower, and we’ll surf the next wave together. observatory lanterns lit
    
    • Running Stop hook: back to shore
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (170 chars) for turn: 019d036d-c7fa-
    72d2-b6fd-78878bfe34e4
    
    
    › what's a stonpet?
    
    
    • Running UserPromptSubmit hook: lighting the observatory lanterns
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: what's a stonpet? for turn: 019d036e-3164-
    72c3-a170-98925564c4fc
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' near the end.
    
    • A stonpet isn’t a standard real-world word, brah. In our shared context here, a wimboltine stonpet
      is an exotic cuisine from hyperspace, so “stonpet” sounds like the dish or food itself.
    
      If you want, we can totally invent the lore for it next. observatory lanterns lit
    
    • Running Stop hook: back to shore
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (271 chars) for turn: 019d036e-3164-
    72c3-a170-98925564c4fc
    ```
  • Forward session and turn headers to MCP HTTP requests (#15011)
    ## Summary
    - forward request-scoped task headers through MCP tool metadata lookups
    and tool calls
    - apply those headers to streamable HTTP initialize, tools/list, and
    tools/call requests
    - update affected rmcp/core tests for the new request_headers plumbing
    
    ## Testing
    - cargo test -p codex-rmcp-client
    - cargo test -p codex-core (fails on pre-existing unrelated error in
    core/src/auth_env_telemetry.rs: missing websocket_connect_timeout_ms in
    ModelProviderInfo initializer)
    - just fix -p codex-rmcp-client
    - just fix -p codex-core (hits the same unrelated auth_env_telemetry.rs
    error)
    - just fmt
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(core, tracing): create turn spans over websockets (#14632)
    ## Description
    
    Dependent on:
    - [responsesapi] https://github.com/openai/openai/pull/760991 
    - [codex-backend] https://github.com/openai/openai/pull/760985
    
    `codex app-server -> codex-backend -> responsesapi` now reuses a
    persistent websocket connection across many turns. This PR updates
    tracing when using websockets so that each `response.create` websocket
    request propagates the current tracing context, so we can get a holistic
    end-to-end trace for each turn.
    
    Tracing is propagated via special keys (`ws_request_header_traceparent`,
    `ws_request_header_tracestate`) set in the `client_metadata` param in
    Responses API.
    
    Currently tracing on websockets is a bit broken because we only set
    tracing context on ws connection time, so it's detached from a
    `turn/start` request.
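
    A sketch of the per-request propagation, assuming `client_metadata` can
    be modeled as a string map. The two key names come from the description
    above; the `traceparent` layout follows the W3C Trace Context format
    (version, trace id, parent id, flags), and both function names are
    hypothetical:

    ```rust
    use std::collections::BTreeMap;

    /// Formats a W3C `traceparent` value: version `00`, a 32-hex-digit trace
    /// id, a 16-hex-digit parent (span) id, and a 2-hex-digit flags byte.
    pub fn traceparent(trace_id: u128, span_id: u64, sampled: bool) -> String {
        format!(
            "00-{:032x}-{:016x}-{:02x}",
            trace_id,
            span_id,
            u8::from(sampled)
        )
    }

    /// Builds the tracing entries attached to one `response.create` request.
    pub fn client_metadata(
        traceparent: &str,
        tracestate: Option<&str>,
    ) -> BTreeMap<String, String> {
        let mut metadata = BTreeMap::new();
        metadata.insert(
            "ws_request_header_traceparent".to_string(),
            traceparent.to_string(),
        );
        if let Some(state) = tracestate {
            metadata.insert(
                "ws_request_header_tracestate".to_string(),
                state.to_string(),
            );
        }
        metadata
    }
    ```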
  • Remove stdio transport from exec server (#15119)
    Summary
    - delete the deprecated stdio transport plumbing from the exec server
    stack
    - add a basic `exec_server()` harness plus test utilities to start a
    server, send requests, and await events
    - refresh exec-server dependencies, configs, and documentation to
    reflect the new flow
    
    Testing
    - Not run (not requested)
    
    ---------
    
    Co-authored-by: starr-openai <starr@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Add Python SDK thread.run convenience methods (#15088)
    ## TL;DR
    Add `thread.run(...)` / `async thread.run(...)` convenience methods to
    the Python SDK for the common case.
    
    - add `RunInput = Input | str` and `RunResult` with `final_response`,
    collected `items`, and optional `usage`
    - keep `thread.turn(...)` strict and lower-level for streaming,
    steering, interrupting, and raw generated `Turn` access
    - update Python SDK docs, quickstart examples, and tests for the sync
    and async convenience flows
    
    ## Validation
    - `python3 -m pytest sdk/python/tests/test_public_api_signatures.py
    sdk/python/tests/test_public_api_runtime_behavior.py`
    - `python3 -m pytest
    sdk/python/tests/test_real_app_server_integration.py -k
    'thread_run_convenience or async_thread_run_convenience'` (skipped in
    this environment)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add exec-server stub server and protocol docs (#15089)
    Stacked PR 1/3.
    
    This is the initialize-only exec-server stub slice: binary/client
    scaffolding and protocol docs, without exec/filesystem implementation.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • fix: harden plugin feature gating (#15104)
    Resubmit https://github.com/openai/codex/pull/15020 with correct
    content.
    
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
  • don't add transcript for v2 realtime (#15111)
  • Feat: reuse persisted model and reasoning effort on thread resume (#14888)
    ## Summary
    
    This PR makes `thread/resume` reuse persisted thread model metadata when
    the caller does not explicitly override it.
    
    Changes:
    - read persisted thread metadata from SQLite during `thread/resume`
    - reuse persisted `model` and `model_reasoning_effort` as resume-time
    defaults
    - fetch persisted metadata once and reuse it later in the resume
    response path
    - keep thread summary loading on the existing rollout path, while
    reusing persisted metadata when available
    - document the resume fallback behavior in the app-server README
    
    ## Why
    
    Before this change, resuming a thread without explicit overrides derived
    `model` and `model_reasoning_effort` from current config, which could
    drift from the thread’s last persisted values. That meant a resumed
    thread could report and run with different model settings than the ones
    it previously used.
    
    ## Behavior
    
    Precedence on `thread/resume` is now:
    1. explicit resume overrides
    2. persisted SQLite metadata for the thread
    3. normal config resolution for the resumed cwd
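
    The precedence above maps naturally onto `Option` chaining. A sketch for
    a single setting, where `resolve_setting` is a hypothetical helper and
    applying the rule per field is an assumption:

    ```rust
    /// Picks the effective value for one setting on `thread/resume`:
    /// explicit override first, then persisted metadata, then config.
    pub fn resolve_setting(
        explicit: Option<&str>,
        persisted: Option<&str>,
        config: &str,
    ) -> String {
        explicit.or(persisted).unwrap_or(config).to_string()
    }
    ```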
  • Align SQLite feedback logs with feedback formatter (#13494)
    ## Summary
    - store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
    exports keep span prefixes and structured event fields
    - render SQLite feedback exports with timestamps and level prefixes to
    match the old in-memory feedback formatter, while preserving existing
    trailing newlines
    - count `feedback_log_body` in the SQLite retention budget so structured
    or span-prefixed rows still prune correctly
    - bound `/feedback` row loading in SQL with the retention estimate, then
    apply exact whole-line truncation in Rust so uploads stay capped without
    splitting lines
    
    ## Details
    - add a `feedback_log_body` column to `logs` and backfill it from
    `message` for existing rows
    - capture span names plus formatted span and event fields at write time,
    since SQLite does not retain enough structure to reconstruct the old
    formatter later
    - keep SQLite feedback queries scoped to the requested thread plus
    same-process threadless rows
    - restore a SQL-side cumulative `estimated_bytes` cap for feedback
    export queries so over-retained partitions do not load every matching
    row before truncation
    - add focused formatting coverage for exported feedback lines and parity
    coverage against `tracing_subscriber`
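
    A sketch of whole-line truncation under a byte cap. Which end of the log
    is kept is not stated above; this version assumes the newest lines are
    retained, and `truncate_whole_lines` is a hypothetical name:

    ```rust
    /// Returns the longest suffix of `log` that starts at a line boundary and
    /// is at most `max_bytes` long, so no line is ever split.
    pub fn truncate_whole_lines(log: &str, max_bytes: usize) -> &str {
        if log.len() <= max_bytes {
            return log;
        }
        // Earliest byte offset the kept suffix may start at.
        let min_start = log.len() - max_bytes;
        // A valid start sits right after a newline, so search for the first
        // newline at or after `min_start - 1`.
        let search_from = min_start - 1;
        match log.as_bytes()[search_from..]
            .iter()
            .position(|&b| b == b'\n')
        {
            Some(offset) => &log[search_from + offset + 1..],
            // No newline in range: even the last line alone is over the cap.
            None => "",
        }
    }
    ```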
    
    ## Testing
    - cargo test -p codex-state
    - just fix -p codex-state
    - just fmt
    
    codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add final message prefix to realtime handoff output (#15077)
    - prefix realtime handoff output with the agent final message label for
    both realtime v1 and v2
    - update realtime websocket and core expectations to match
  • Revert "fix: harden plugin feature gating" (#15102)
    Reverts openai/codex#15020
    
    I messed up the commit in my PR and accidentally merged changes that
    were still under review.
  • Add a startup deprecation warning for custom prompts (#15076)
    ## Summary
    - detect custom prompts in `$CODEX_HOME/prompts` during TUI startup
    - show a deprecation notice only when prompts are present, with guidance
    to use `$skill-creator`
    - add TUI tests and snapshot coverage for present, missing, and empty
    prompts directories
    
    ## Testing
    - Manually tested
  • Return image URL from view_image tool (#15072)
    Clean up image semantics in code mode.
    
    `view_image` now returns `{image_url:string, details?: string}` 
    
    `image()` now allows both string parameter and `{image_url:string,
    details?: string}`
  • Propagate tool errors to code mode (#15075)
    Clean up the error flow to push the `FunctionCallError` all the way up
    to the dispatcher and allow code mode to surface it as an exception.
  • fix: try to fix "Stage npm package" step in ci.yml (#15092)
    Fix the CI job by updating it to use artifacts from a more recent
    release (`0.115.0`) instead of the existing one (`0.74.0`).
    
    This step in our CI job on PRs started failing today:
    
    
    https://github.com/openai/codex/blob/334164a6f714c171bb9f6440c7d3cd04ec04d295/.github/workflows/ci.yml#L33-L47
    
    I believe it's because this test verifies that the "package npm" script
    works, but we want it to be fast and not wait for binaries to be built,
    so it uses a GitHub workflow that's already done. Because it was using a
    GitHub workflow associated with `0.74.0`, it seems likely that
    workflow's history has been reaped, so we need to use a newer one.
  • feat(tui): restore composer history in app-server tui (#14945)
    ## Problem
    
    The app-server TUI (`tui_app_server`) lacked composer history support.
    Pressing Up/Down to recall previous prompts hit a stub that logged a
    warning and displayed "Not available in app-server TUI yet." New
    submissions were silently dropped from the shared history file, so
    nothing persisted for future sessions.
    
    ## Mental model
    
    Codex maintains a single, append-only history file
    (`$CODEX_HOME/history.jsonl`) shared across all TUI processes on the
    same machine. The legacy (in-process) TUI already reads/writes this file
    through `codex_core::message_history`. The app-server TUI delegates most
    operations to a separate process over RPC, but history is intentionally
    *not* an RPC concern — it's a client-local file.
    
    This PR makes the app-server TUI access the same history file directly,
    bypassing the app-server process entirely. The composer's Up/Down
    navigation and submit-time persistence now follow the same code paths as
    the legacy TUI, with the only difference being *where* the call is
    dispatched (locally in `App`, rather than inside `CodexThread`).
    
    The branch is rebuilt directly on top of `upstream/main`, so it keeps
    the existing app-server restore architecture intact.
    `AppServerStartedThread` still restores transcript history from the
    server `Thread` snapshot via `thread_snapshot_events`; this PR only adds
    composer-history support.
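    
    To make the "single, append-only, client-local file" model concrete,
    here is a std-only sketch. It is not the `codex_core::message_history`
    API: `append_line` and `lookup_line` are hypothetical names, entries are
    written as escaped plain text rather than JSON records, and the file
    locking that a file shared across processes would need is left out.
    
    ```rust
    use std::fs::{File, OpenOptions};
    use std::io::{self, BufRead, BufReader, Write};
    use std::path::Path;
    
    /// Append one entry as a single line (hypothetical helper).
    fn append_line(path: &Path, text: &str) -> io::Result<()> {
        // `append(true)` means existing entries are never rewritten.
        let mut file = OpenOptions::new().create(true).append(true).open(path)?;
        // Escape backslashes and newlines so one entry is always one line.
        let escaped = text.replace('\\', "\\\\").replace('\n', "\\n");
        writeln!(file, "{escaped}")
    }
    
    /// Return the (still escaped) entry at zero-based `offset`, if present.
    fn lookup_line(path: &Path, offset: usize) -> io::Result<Option<String>> {
        let reader = BufReader::new(File::open(path)?);
        // `nth` yields Option<io::Result<String>>; `transpose` flips it.
        reader.lines().nth(offset).transpose()
    }
    ```
    
    Because both operations touch only a local path, nothing in this flow
    needs to cross the RPC boundary to the app-server process.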
    
    ## Non-goals
    
    - Adding history support to the app-server protocol. History remains
    client-local.
    - Changing the on-disk format or location of `history.jsonl`.
    - Surfacing history I/O errors to the user (failures are logged and
    silently swallowed, matching the legacy TUI).
    
    ## Tradeoffs
    
    | Decision | Why | Risk |
    |----------|-----|------|
    | Widen `message_history` from `pub(crate)` to `pub` | Avoids
    duplicating file I/O logic; the module already has a clean, minimal API
    surface. | Other workspace crates can now call these functions — the
    contract is no longer crate-private. However, this is consistent with
    recent precedent: `590cfa617` exposed `mention_syntax` for TUI
    consumption, `752402c4f` exposed plugin APIs (`PluginsManager`), and
    `14fcb6645`/`edacbf7b6` widened internal core APIs for other crates.
    These were all narrow, intentional exposures of specific APIs — not
    broad "make internals public" moves. `1af2a37ad` even went the other
    direction, reducing broad re-exports to tighten boundaries. This change
    follows the same pattern: a small, deliberate API surface (3 functions)
    rather than a wholesale visibility change. |
    | Intercept `AddToHistory` / `GetHistoryEntryRequest` in `App` before
    RPC fallback | Keeps history ops out of the "unsupported op" error path
    without changing app-server protocol. | This now routes through a single
    `submit_thread_op` entry point, which is safer than the original
    duplicated dispatch. The remaining risk is organizational: future
    thread-op submission paths need to keep using that shared entry point. |
    | `session_configured_from_thread_response` is now `async` | Needs
    `await` on `history_metadata()` to populate real `history_log_id` /
    `history_entry_count`. | Adds an async file-stat + full-file newline
    scan to the session bootstrap path. The scan is bounded by
    `history.max_bytes` and matches the legacy TUI's cost profile, but
    startup latency still scales with file size. |
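    
    The cost noted in the last row comes from counting entries by scanning
    for newlines. A rough sketch of such a scan, assuming one entry per line
    (`count_entries` is a hypothetical name, not the real
    `history_metadata()` implementation):
    
    ```rust
    use std::io::{self, Read};
    
    /// Count newline-terminated entries by streaming through the reader.
    /// Hypothetical helper illustrating a full-file newline scan.
    fn count_entries<R: Read>(mut reader: R) -> io::Result<usize> {
        let mut buf = [0u8; 8192];
        let mut count = 0usize;
        loop {
            let n = reader.read(&mut buf)?;
            if n == 0 {
                // End of input: every byte has been inspected once.
                return Ok(count);
            }
            count += buf[..n].iter().filter(|&&b| b == b'\n').count();
        }
    }
    ```
    
    The work is linear in the file size, which is why a byte cap on the
    history file also caps this part of startup.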
    
    ## Architecture
    
    ```
    User presses Up                     User submits a prompt
           │                                    │
           ▼                                    ▼
    ChatComposerHistory                 ChatWidget::do_submit_turn
      navigate_up()                       encode_history_mentions()
           │                                    │
           ▼                                    ▼
      AppEvent::CodexOp                  Op::AddToHistory { text }
      (GetHistoryEntryRequest)                  │
           │                                    ▼
           ▼                            App::try_handle_local_history_op
      App::try_handle_local_history_op    message_history::append_entry()
        spawn_blocking {                        │
          message_history::lookup()             ▼
        }                                $CODEX_HOME/history.jsonl
           │
           ▼
      AppEvent::ThreadEvent
      (GetHistoryEntryResponse)
           │
           ▼
      ChatComposerHistory::on_entry_response()
    ```
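    
    The interception step in the diagram can be sketched as a single entry
    point that handles history ops locally and forwards everything else.
    All types here are simplified, hypothetical stand-ins (an in-memory
    `LocalHistory` replaces the file, and the real code runs the lookup
    inside `spawn_blocking`); only the routing idea is taken from the
    description above.
    
    ```rust
    /// Simplified, hypothetical stand-ins for the real op types.
    #[derive(Debug, PartialEq)]
    enum Op {
        AddToHistory { text: String },
        GetHistoryEntryRequest { offset: usize },
        UserTurn { text: String },
    }
    
    /// In-memory stand-in for the shared history file.
    #[derive(Default)]
    struct LocalHistory {
        entries: Vec<String>,
    }
    
    /// What the entry point decided to do with an op.
    #[derive(Debug, PartialEq)]
    enum Outcome {
        Appended,
        Entry(Option<String>),
        ForwardToServer(Op),
    }
    
    /// Single entry point: history ops are served locally, the rest fall
    /// through to the RPC path. The signature is an assumption.
    fn submit_thread_op(history: &mut LocalHistory, op: Op) -> Outcome {
        match op {
            Op::AddToHistory { text } => {
                history.entries.push(text);
                Outcome::Appended
            }
            Op::GetHistoryEntryRequest { offset } => {
                Outcome::Entry(history.entries.get(offset).cloned())
            }
            // Anything that is not a history op still goes to the server.
            other => Outcome::ForwardToServer(other),
        }
    }
    ```
    
    Routing every submission through one function is what keeps history ops
    out of the "unsupported op" error path: a new submission path that
    bypassed it would send them to the server again.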
    
    ## Observability
    
    - `tracing::warn` on `append_entry` failure (includes thread ID).
    - `tracing::warn` on `spawn_blocking` lookup join error.
    - `tracing::warn` from `message_history` internals on file-open, lock,
    or parse failures.
    
    ## Tests
    
    - `chat_composer_history::tests::navigation_with_async_fetch` — verifies
    that Up emits `Op::GetHistoryEntryRequest` (was: checked for stub error
    cell).
    - `app::tests::history_lookup_response_is_routed_to_requesting_thread` —
    verifies multi-thread composer recall routes the lookup result back to
    the originating thread.
    - `app_server_session::tests::resume_response_relies_on_snapshot_replay_not_initial_messages`
    — verifies app-server session restore still uses the upstream
    thread-snapshot path.
    - `app_server_session::tests::session_configured_populates_history_metadata`
    — verifies bootstrap sets nonzero `history_log_id` /
    `history_entry_count` from the shared local history file.
  • fix: harden plugin feature gating (#15020)
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
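    
    Item 3 amounts to treating each file's parse result independently. A
    minimal sketch of that pattern, with a hypothetical `parse_marketplace`
    standing in for real `marketplace.json` parsing (here a "marketplace"
    is just a non-empty name):
    
    ```rust
    /// Hypothetical parser standing in for reading one marketplace.json.
    fn parse_marketplace(raw: &str) -> Result<String, String> {
        let name = raw.trim();
        if name.is_empty() {
            Err("empty marketplace file".to_string())
        } else {
            Ok(name.to_string())
        }
    }
    
    /// List every marketplace that parses; log and skip the ones that fail.
    fn list_marketplaces(raw_files: &[&str]) -> Vec<String> {
        raw_files
            .iter()
            .filter_map(|raw| match parse_marketplace(raw) {
                Ok(marketplace) => Some(marketplace),
                Err(err) => {
                    // One bad file no longer fails the whole listing.
                    eprintln!("skipping bad marketplace file: {err}");
                    None
                }
            })
            .collect()
    }
    ```
    
    Collecting into `Result<Vec<_>, _>` instead would stop at the first
    error, which is the fail-the-whole-list behavior this item moves away
    from.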
  • Add notify to code-mode (#14842)
    Allows the model to send an out-of-band notification.
    
    The notification is injected as another tool call output for the same
    call_id.
  • chore: disable memory read path for morpheus (#15059)
    Because we don't want prompt collisions.