Commit Graph

54 Commits

  • [plugins] Fix plugin explicit mention context management. (#15372)
    - [x] Fix plugin explicit mention context management.
  • Feat/restore image generation history (#15223)
    Restore image generation items in resumed thread history
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • changed save directory to codex_home (#15222)
    saving image gen default save directory to
    codex_home/imagegen/thread_id/
  • Plumb MCP turn metadata through _meta (#15190)
    ## Summary
    
    Some background. We're looking to instrument GA turns end to end. Right
    now a big gap is grouping mcp tool calls with their codex sessions. We
    send session id and turn id headers to the responses call but not the
    mcp/wham calls.
    
    Ideally we could pass the args as headers like with responses, but given
    the setup of the rmcp client, we can't send as headers without either
    changing the rmcp package upstream to allow per request headers or
    introducing a mutex which break concurrency. An earlier attempt made the
    assumption that we had 1 client per thread, which allowed us to set
    headers at the start of a turn. @pakrym mentioned that this assumption
    might break in the near future.
    
    So the solution now is to package the turn metadata/session id into the
    _meta field in the post body and pull out in codex-backend.
    
    - send turn metadata to MCP servers via `tools/call` `_meta` instead of
    assuming per-thread request headers on shared clients
    - preserve the existing `_codex_apps` metadata while adding
    `x-codex-turn-metadata` for all MCP tool calls
    - extend tests to cover both custom MCP servers and the codex apps
    search flow
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add experimental exec server URL handling (#15196)
    Add a config and attempt to start the server.
  • Move environment abstraction into exec server (#15125)
    The idea is that codex-exec exposes an Environment struct with services
    on it. Each of those is a trait.
    
    Depending on construction parameters passed to Environment they are
    either backed by local or remote server but core doesn't see these
    differences.
  • feat(core, tracing): create turn spans over websockets (#14632)
    ## Description
    
    Dependent on:
    - [responsesapi] https://github.com/openai/openai/pull/760991 
    - [codex-backend] https://github.com/openai/openai/pull/760985
    
    `codex app-server -> codex-backend -> responsesapi` now reuses a
    persistent websocket connection across many turns. This PR updates
    tracing when using websockets so that each `response.create` websocket
    request propagates the current tracing context, so we can get a holistic
    end-to-end trace for each turn.
    
    Tracing is propagated via special keys (`ws_request_header_traceparent`,
    `ws_request_header_tracestate`) set in the `client_metadata` param in
    Responses API.
    
    Currently tracing on websockets is a bit broken because we only set
    tracing context on ws connection time, so it's detached from a
    `turn/start` request.
  • fix: harden plugin feature gating (#15104)
    Resubmit https://github.com/openai/codex/pull/15020 with correct
    content.
    
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
  • Propagate tool errors to code mode (#15075)
    Clean up error flow to push the FunctionCallError all the way up to
    dispatcher and allow code mode to surface as exception.
  • fix(subagents) share execpolicy by default (#13702)
    ## Summary
    If a subagent requests approval, and the user persists that approval to
    the execpolicy, it should (by default) propagate. We'll need to rethink
    this a bit in light of coming Permissions changes, though I think this
    is closer to the end state that we'd want, which is that execpolicy
    changes to one permissions profile should be synced across threads.
    
    ## Testing
    - [x] Added integration test
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
    - this allows blocking the user's prompts from executing, and also
    prevents them from entering history
    - handles the edge case where you can both prevent the user's prompt AND
    add n amount of additionalContexts
    - refactors some old code into common.rs where hooks overlap
    functionality
    - refactors additionalContext being previously added to user messages,
    instead we use developer messages for them
    - handles queued messages correctly
    
    Sample hook for testing - if you write "[block-user-submit]" this hook
    will stop the thread:
    
    example run
    ```
    › sup
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: sup
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' exactly once near the end.
    
    • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
      lanterns lit
    
    
    › and [block-user-submit]
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (stopped)
      warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
      stop: Wizard Tower demo block: remove [block-user-submit] to continue.
    ```
    
    .codex/config.toml
    ```
    [features]
    codex_hooks = true
    ```
    
    .codex/hooks.json
    ```
    {
      "hooks": {
        "UserPromptSubmit": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
                "timeoutSec": 10,
                "statusMessage": "reading the observatory notes"
              }
            ]
          }
        ]
      }
    }
    ```
    
    .codex/hooks/user_prompt_submit_demo.py
    ```
    #!/usr/bin/env python3
    
    import json
    import sys
    from pathlib import Path
    
    
    def prompt_from_payload(payload: dict) -> str:
        prompt = payload.get("prompt")
        if isinstance(prompt, str) and prompt.strip():
            return prompt.strip()
    
        event = payload.get("event")
        if isinstance(event, dict):
            user_prompt = event.get("user_prompt")
            if isinstance(user_prompt, str):
                return user_prompt.strip()
    
        return ""
    
    
    def main() -> int:
        payload = json.load(sys.stdin)
        prompt = prompt_from_payload(payload)
        cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
    
        if "[block-user-submit]" in prompt:
            print(
                json.dumps(
                    {
                        "systemMessage": (
                            f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                        ),
                        "decision": "block",
                        "reason": (
                            "Wizard Tower demo block: remove [block-user-submit] to continue."
                        ),
                    }
                )
            )
            return 0
    
        prompt_preview = prompt or "(empty prompt)"
        if len(prompt_preview) > 80:
            prompt_preview = f"{prompt_preview[:77]}..."
    
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                    ),
                    "hookSpecificOutput": {
                        "hookEventName": "UserPromptSubmit",
                        "additionalContext": (
                            "Wizard Tower UserPromptSubmit demo fired. "
                            "For this reply only, include the exact phrase "
                            "'observatory lanterns lit' exactly once near the end."
                        ),
                    },
                }
            )
        )
        return 0
    
    
    if __name__ == "__main__":
        raise SystemExit(main())
    ```
  • Prefer websockets when providers support them (#13592)
    Remove all flags and model settings.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add FS abstraction and use in view_image (#14960)
    Adds an environment crate and environment + file system abstraction.
    
    Environment is a combination of attributes and services specific to
    environment the agent is connected to:
    File system, process management, OS, default shell.
    
    The goal is to move most of agent logic that assumes environment to work
    through the environment abstraction.
  • Stabilize Windows cmd-based shell test harnesses (#14958)
    ## What is flaky
    The Windows shell-driven integration tests in `codex-rs/core` were
    intermittently unstable, especially:
    
    - `apply_patch_cli_can_use_shell_command_output_as_patch_input`
    - `websocket_test_codex_shell_chain`
    - `websocket_v2_test_codex_shell_chain`
    
    ## Why it was flaky
    These tests were exercising real shell-tool flows through whichever
    shell Codex selected on Windows, and the `apply_patch` test also nested
    a PowerShell read inside `cmd /c`.
    
    There were multiple independent sources of nondeterminism in that setup:
    
    - The test harness depended on the model-selected Windows shell instead
    of pinning the shell it actually meant to exercise.
    - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
    that could leave the read command wrapped as a literal string instead of
    executing it.
    - Even after getting the quoting right, PowerShell could emit CLIXML
    progress records like module-initialization output onto stdout.
    - The `apply_patch` test was building a patch directly from shell
    stdout, so any quoting artifact or progress noise corrupted the patch
    input.
    
    So the failures were driven by shell startup and output-shape variance,
    not by the `apply_patch` or websocket logic themselves.
    
    ## How this PR fixes it
    - Add a test-only `user_shell_override` path so Windows integration
    tests can pin `cmd.exe` explicitly.
    - Use that override in the websocket shell-chain tests and in the
    `apply_patch` harness.
    - Change the nested Windows file read in
    `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
    PowerShell `-EncodedCommand` script.
    - Run that nested PowerShell process with `-NonInteractive`, set
    `$ProgressPreference = 'SilentlyContinue'`, and read the file with
    `[System.IO.File]::ReadAllText(...)`.
    
    ## Why this fix fixes the flakiness
    The outer harness now runs under a deterministic shell, and the inner
    PowerShell read no longer depends on fragile `cmd` quoting or on
    progress output staying quiet by accident. The shell tool returns only
    the file contents, so patch construction and websocket assertions depend
    on stable test inputs instead of on runner-specific shell behavior.
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • fix(core): prevent hanging turn/start due to websocket warming issues (#14838)
    ## Description
    
    This PR fixes a bad first-turn failure mode in app-server when the
    startup websocket prewarm hangs. Before this change, `initialize ->
    thread/start -> turn/start` could sit behind the prewarm for up to five
    minutes, so the client would not see `turn/started`, and even
    `turn/interrupt` would block because the turn had not actually started
    yet.
    
    Now, we:
    - set a (configurable) timeout of 15s for websocket startup time,
    exposed as `websocket_startup_timeout_ms` in config.toml
    - `turn/started` is sent immediately on `turn/start` even if the
    websocket is still connecting
    - `turn/interrupt` can be used to cancel a turn that is still waiting on
    the websocket warmup
    - the turn task will wait for the full 15s websocket warming timeout
    before falling back
    
    ## Why
    
    The old behavior made app-server feel stuck at exactly the moment the
    client expects turn lifecycle events to start flowing. That was
    especially painful for external clients, because from their point of
    view the server had accepted the request but then went silent for
    minutes.
    
    ## Configuring the websocket startup timeout
    Can set it in config.toml like this:
    ```
    [model_providers.openai]
    supports_websockets = true
    websocket_connect_timeout_ms = 15000
    ```
  • [stack 2/4] Align main realtime v2 wire and runtime flow (#14830)
    ## Stack Position
    2/4. Built on top of #14828.
    
    ## Base
    - #14828
    
    ## Unblocks
    - #14829
    - #14827
    
    ## Scope
    - Port the realtime v2 wire parsing, session, app-server, and
    conversation runtime behavior onto the split websocket-method base.
    - Branch runtime behavior directly on the current realtime session kind
    instead of parser-derived flow flags.
    - Keep regression coverage in the existing e2e suites.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: improve skills cache key to take into account config layering (#14806)
    Fix https://github.com/openai/codex/issues/14161
    
    This fixes sub-agent [[skills.config]] overrides being ignored when
    parent and child share the same cwd. The root cause was that turn skill
    loading rebuilt from cwd-only state and reused a cwd-scoped cache, so
    role-local skill enable/disable overrides did not reliably affect the
    spawned agent's effective skill set.
    
    This change switches turn construction to use the effective per-turn
    config and adds a config-aware skills cache keyed by skill roots plus
    final disabled paths.
  • Reuse guardian session across approvals (#14668)
    ## Summary
    - reuse a guardian subagent session across approvals so reviews keep a
    stable prompt cache key and avoid one-shot startup overhead
    - clear the guardian child history before each review so prior guardian
    decisions do not leak into later approvals
    - include the `smart_approvals` -> `guardian_approval` feature flag
    rename in the same PR to minimize release latency on a very tight
    timeline
    - add regression coverage for prompt-cache-key reuse without
    prior-review prompt bleed
    
    ## Request
    - Bug/enhancement request: internal guardian prompt-cache and latency
    improvement request
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • fix: persist future network host approvals across sessions (#14619)
    ## Summary
    - apply persisted execpolicy network rules when booting the managed
    network proxy
    - pass the current execpolicy into managed proxy startup so host
    approvals selected with "allow this host in the future" survive new
    sessions
  • Fix turn context reconstruction after backtracking (#14616)
    ## Summary
    - reuse rollout reconstruction when applying a backtrack rollback so
    `reference_context_item` is restored from persisted rollout state
    - build rollback replay from the flushed rollout items plus the rollback
    marker, avoiding the extra reread/fallback path
    - add regression coverage for rollback after compaction so turn-context
    diffing stays aligned after backtracking
    
    Co-authored-by: Codex <noreply@openai.com>
  • refactor: make unified-exec zsh-fork state explicit (#14633)
    ## Why
    
    The unified-exec path was carrying zsh-fork state in a partially
    flattened way.
    
    First, the decision about whether zsh-fork was active came from feature
    selection in `ToolsConfig`, while the real prerequisites lived in
    session state. That left the handler and runtime defending against
    partially configured cases later.
    
    Second, once zsh-fork was active, its two runtime-only paths were
    threaded through the runtime as separate arguments even though they form
    one coherent piece of configuration.
    
    This change keeps unified-exec on a single session-derived source of
    truth and bundles the zsh-fork-specific paths into a named config type
    so the runtime can pass them around as one unit.
    
    In particular, this PR introduces this enum so the `ZshFork` variant can
    carry the appropriate state with it:
    
    ```rust
    #[derive(Debug, Clone, Eq, PartialEq)]
    pub enum UnifiedExecShellMode {
        Direct,
        ZshFork(ZshForkConfig),
    }
    
    #[derive(Debug, Clone, Eq, PartialEq)]
    pub struct ZshForkConfig {
        pub(crate) shell_zsh_path: AbsolutePathBuf,
        pub(crate) main_execve_wrapper_exe: AbsolutePathBuf,
    }
    ```
    
    This cleanup was done in preparation for
    https://github.com/openai/codex/pull/13432.
    
    ## What Changed
    
    - Replaced the feature-only `UnifiedExecBackendConfig` split with
    `UnifiedExecShellMode` in `codex-rs/core/src/tools/spec.rs`.
    - Derived the unified-exec mode from session-backed inputs when building
    turn `ToolsConfig`, and preserved that mode across model switches and
    review turns.
    - Introduced `ZshForkConfig`, which stores the resolved zsh-fork
    `AbsolutePathBuf` values for the configured `zsh` binary and `execve`
    wrapper.
    - Threaded `ZshForkConfig` through unified-exec command construction and
    the zsh-fork preparation path so zsh-fork-specific runtime code consumes
    a single config object instead of separate path arguments.
    - Added focused tests for constructing zsh-fork mode only when session
    prerequisites are available, and updated the zsh-fork expectations to be
    target-platform aware.
    
    ## Testing
    
    - `cargo test -p codex-core zsh_fork --lib`
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/14633).
    * #13432
    * __->__ #14633
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
    - pending guardian review state in the live status footer, including
    parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server, core): add more spans (#14479)
    ## Description
    
    This PR expands tracing coverage across app-server thread startup, core
    session initialization, and the Responses transport layer. It also gives
    core dispatch spans stable operation-specific names so traces are easier
    to follow than the old generic `submission_dispatch` spans.
    
    Also use `fmt::Display` for types that we serialize in traces so we send
    strings instead of rust types
  • sending back imagaegencall response back to responseapi (#14558)
    Sending back the ResponseItem::ImageGenerationCall as is, because it is
    now supported from the API-side.
  • Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
    ## Summary
    - launch Windows sandboxed children on a private desktop instead of
    `Winsta0\Default`
    - make private desktop the default while keeping
    `windows.sandbox_private_desktop=false` as the escape hatch
    - centralize process launch through the shared
    `create_process_as_user(...)` path
    - scope the private desktop ACL to the launching logon SID
    
    ## Why
    Today sandboxed Windows commands run on the visible shared desktop. That
    leaves an avoidable same-desktop attack surface for window interaction,
    spoofing, and related UI/input issues. This change moves sandboxed
    commands onto a dedicated per-launch desktop by default so the sandbox
    no longer shares `Winsta0\Default` with the user session.
    
    The implementation stays conservative on security with no silent
    fallback back to `Winsta0\Default`
    
    If private-desktop setup fails on a machine, users can still opt out
    explicitly with `windows.sandbox_private_desktop=false`.
    
    ## Validation
    - `cargo build -p codex-cli`
    - elevated-path `codex exec` desktop-name probe returned
    `CodexSandboxDesktop-*`
    - elevated-path `codex exec` smoke sweep for shell commands, nested
    `pwsh`, jobs, and hidden `notepad` launch
    - unelevated-path full private-desktop compatibility sweep via `codex
    exec` with `-c windows.sandbox=unelevated`
  • Support waiting for code_mode sessions (#14295)
    ## Summary
    - persist the code mode runner process in the session-scoped code mode
    store
    - switch the runner protocol from `init` to `start` with explicit
    session ids
    - handle runner-side session processing without the init waiter queue
    
    ## Validation
    - just fmt
    - cargo check -p codex-core
    - node --check codex-rs/core/src/tools/code_mode_runner.cjs
  • [apps] Add tool_suggest tool. (#14287)
    - [x] Add tool_suggest tool.
    - [x] Move chatgpt/src/connectors.rs and core/src/connectors.rs into a
    dedicated mod so that we have all the logic and global cache in one
    place.
    - [x] Update TUI app link view to support rendering the installation
    view for mcp elicitation.
    
    ---------
    
    Co-authored-by: Shaqayeq <shaqayeq@openai.com>
    Co-authored-by: Eric Traut <etraut@openai.com>
    Co-authored-by: pakrym-oai <pakrym@openai.com>
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
    Co-authored-by: guinness-oai <guinness@openai.com>
    Co-authored-by: Eugene Brevdo <ebrevdo@users.noreply.github.com>
    Co-authored-by: Charlie Guo <cguo@openai.com>
    Co-authored-by: Fouad Matin <fouad@openai.com>
    Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>
    Co-authored-by: xl-openai <xl@openai.com>
    Co-authored-by: alexsong-oai <alexsong@openai.com>
    Co-authored-by: Owen Lin <owenlin0@gmail.com>
    Co-authored-by: sdcoffey <stevendcoffey@gmail.com>
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Won Park <won@openai.com>
    Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
    Co-authored-by: celia-oai <celia@openai.com>
    Co-authored-by: gabec-openai <gabec@openai.com>
    Co-authored-by: joeytrasatti-openai <joey.trasatti@openai.com>
    Co-authored-by: Leo Shimonaka <leoshimo@openai.com>
    Co-authored-by: Rasmus Rygaard <rasmus@openai.com>
    Co-authored-by: maja-openai <163171781+maja-openai@users.noreply.github.com>
    Co-authored-by: pash-openai <pash@openai.com>
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • feat(app-server): propagate traces across tasks and core ops (#14387)
    ## Summary
    
    This PR keeps app-server RPC request trace context alive for the full
    lifetime of the work that request kicks off (e.g. for `thread/start`,
    this is `app-server rpc handler -> tokio background task -> core op
    submissions`). Previously we lose trace lineage once the request handler
    returns or hands work off to background tasks.
    
    This approach is especially relevant for `thread/start` and other RPC
    handlers that run in a non-blocking way. In the near future we'll most
    likely want to make all app-server handlers run in a non-blocking way by
    default, and only queue operations that must operate in order (e.g.
    thread RPCs per thread?), so we want to make sure tracing in app-server
    just generally works.
    
    Depends on https://github.com/openai/codex/pull/14300
    
    **Before**
    <img width="155" height="207" alt="image"
    src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
    />
    
    
    **After**
    <img width="299" height="337" alt="image"
    src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
    />
    
    ## What changed
    
    - Keep request-scoped trace context around until we send the final
    response or error, or the connection closes.
    - Thread that trace context through detached `thread/start` work so
    background startup stays attached to the originating request.
    - Pass request trace context through to downstream core operations,
    including:
      - thread creation
      - resume/fork flows
      - turn submission
      - review
      - interrupt
      - realtime conversation operations
    - Add tracing tests that verify:
      - remote W3C trace context is preserved for `thread/start`
      - remote W3C trace context is preserved for `turn/start`
      - downstream core spans stay under the originating request span
      - request-scoped tracing state is cleaned up correctly
    - Clean up shutdown behavior so detached background tasks and spawned
    threads are drained before process exit.
  • feat: search_tool migrate to bring you own tool of Responses API (#14274)
    ## Why
    
    to support a new bring your own search tool in Responses
    API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
    we migrating our bm25 search tool to use official way to execute search
    on client and communicate additional tools to the model.
    
    ## What
    - replace the legacy `search_tool_bm25` flow with client-executed
    `tool_search`
    - add protocol, SSE, history, and normalization support for
    `tool_search_call` and `tool_search_output`
    - return namespaced Codex Apps search results and wire namespaced
    follow-up tool calls back into MCP dispatch
  • Defer initial context insertion until the first turn (#14313)
    ## Summary
    - defer fresh-session `build_initial_context()` until the first real
    turn instead of seeding model-visible context during startup
    - rely on the existing `reference_context_item == None` turn-start path
    to inject full initial context on that first real turn (and again after
    baseline resets such as compaction)
    - add a regression test for `InitialHistory::New` and update affected
    deterministic tests / snapshots around developer-message layout,
    collaboration instructions, personality updates, and compact request
    shapes
    
    ## Notes
    - this PR does not add any special empty-thread `/compact` behavior
    - most of the snapshot churn is the direct result of moving the initial
    model-visible context from startup to the first real turn, so first-turn
    request layouts no longer contain a pre-user startup copy of permissions
    / environment / other developer-visible context
    - remote manual `/compact` with no prior user still skips the remote
    compact request; local first-turn `/compact` still issues a compact
    request, but that request now reflects the lack of startup-seeded
    context
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • spawn prompt (#14362)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • chore: add a separate reject-policy flag for skill approvals (#14271)
    ## Summary
    - add `skill_approval` to `RejectConfig` and the app-server v2
    `AskForApproval::Reject` payload so skill-script prompts can be
    configured independently from sandbox and rule-based prompts
    - update Unix shell escalation to reject prompts based on the actual
    decision source, keeping prefix rules tied to `rules`, unmatched command
    fallbacks tied to `sandbox_approval`, and skill scripts tied to
    `skill_approval`
    - regenerate the affected protocol/config schemas and expand
    unit/integration coverage for the new flag and skill approval behavior
  • Add store/load support for code mode (#14259)
    adds support for transferring state across code mode invocations.
  • Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
    Summary
    - drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
    to keep MCP tooling focused on the final result shape
    - wire the new schema definitions through code mode, context, handlers,
    and spec modules so MCP tools serialize the exact output shape expected
    by the model
    - extend code mode tests to cover multiple MCP call scenarios and ensure
    the serialized data matches the new schema
    - refresh JS runner helpers and protocol models alongside the schema
    changes
    
    Testing
    - Not run (not requested)
  • unifying all image saves to /tmp to bug-proof (#14149)
    image-gen feature will have the model saving to /tmp by default + at all
    times
  • Reuse McpToolOutput in McpHandler (#14229)
    We already have a type to represent the MCP tool output, reuse it
    instead of the custom McpHandlerOutput
  • Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296)
    ### Purpose
    While trying to build out CLI-Tools for the agent to use under skills we
    have found that those tools sometimes need to invoke a user elicitation.
    These elicitations are handled out of band of the codex app-server but
    need to indicate to the exec manager that the command running is not
    going to progress on the usual timeout horizon.
    
    ### Example
    Model calls universal exec:
    `$ download-credit-card-history --start-date 2026-01-19 --end-date
    2026-02-19 > credit_history.jsonl`
    
    download-cred-card-history might hit a hosted/preauthenticated service
    to fetch data. That service might decide that the request requires an
    end user approval the access to the personal data. It should be able to
    signal to the running thread that the command in question is blocked on
    user elicitation. In that case we want the exec to continue, but the
    timeout to not expire on the tool call, essentially freezing time until
    the user approves or rejects the command at which point the tool would
    signal the app-server to decrement the outstanding elicitation count.
    Now timeouts would proceed as normal.
    
    ### What's Added
    
    - New v2 RPC methods:
        - thread/increment_elicitation
        - thread/decrement_elicitation
    - Protocol updates in:
        - codex-rs/app-server-protocol/src/protocol/common.rs
        - codex-rs/app-server-protocol/src/protocol/v2.rs
    - App-server handlers wired in:
        - codex-rs/app-server/src/codex_message_processor.rs
    
    ### Behavior
    
    - Counter starts at 0 per thread.
    - increment atomically increases the counter.
    - decrement atomically decreases the counter; decrement at 0 returns
    invalid request.
    - Transition rules:
    - 0 -> 1: broadcast pause state, pausing all active stopwatches
    immediately.
        - \>0 -> >0: remain paused.
        - 1 -> 0: broadcast unpause state, resuming stopwatches.
    - Core thread/session logic:
        - codex-rs/core/src/codex_thread.rs
        - codex-rs/core/src/codex.rs
        - codex-rs/core/src/mcp_connection_manager.rs
    
    ### Exec-server stopwatch integration
    
    - Added centralized stopwatch tracking/controller:
        - codex-rs/exec-server/src/posix/stopwatch_controller.rs
    - Hooked pause/unpause broadcast handling + stopwatch registration:
        - codex-rs/exec-server/src/posix/mcp.rs
        - codex-rs/exec-server/src/posix/stopwatch.rs
        - codex-rs/exec-server/src/posix.rs
  • feat: support disabling bundled system skills (#13792)
    Support disable bundled system skills with a config:
    
    [skills.bundled]
    enabled = false
  • Enforce single tool output type in codex handlers (#14157)
    We'll need to associate output schema with each tool. Each tool can only
    have on output type.
  • start of hooks engine (#13276)
    (Experimental)
    
    This PR adds a first MVP for hooks, with SessionStart and Stop
    
    The core design is:
    
    - hooks live in a dedicated engine under codex-rs/hooks
    - each hook type has its own event-specific file
    - hook execution is synchronous and blocks normal turn progression while
    running
    - matching hooks run in parallel, then their results are aggregated into
    a normalized HookRunSummary
    
    On the AppServer side, hooks are exposed as operational metadata rather
    than transcript-native items:
    
    - new live notifications: hook/started, hook/completed
    - persisted/replayed hook results live on Turn.hookRuns
    - we intentionally did not add hook-specific ThreadItem variants
    
    Hooks messages are not persisted, they remain ephemeral. The context
    changes they add are (they get appended to the user's prompt)
  • Refactor tool output into trait implementations (#14152)
    First state to making tool outputs strongly typed (and `renderable`).
  • fix(protocol): preserve legacy workspace-write semantics (#13957)
    ## Summary
    This is a fast follow to the initial `[permissions]` structure.
    
    - keep the new split-policy carveout behavior for narrower non-write
    entries under broader writable roots
    - preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge
    that drops only redundant nested readable roots when projecting from
    `SandboxPolicy`
    - route the legacy macOS seatbelt adapter through that same legacy
    bridge so redundant nested readable roots do not become read-only
    carveouts on macOS
    - derive the legacy bridge for `command_exec` using the sandbox root cwd
    rather than the request cwd so policy derivation matches later sandbox
    enforcement
    - add regression coverage for the legacy macOS nested-readable-root case
    
    ## Examples
    ### Legacy `workspace-write` on macOS
    A legacy `workspace-write` policy can redundantly list a nested readable
    root under an already-writable workspace root.
    
    For example, legacy config can effectively mean:
    - workspace root (`.` / `cwd`) is writable
    - `docs/` is also listed in `readable_roots`
    
    The new shared split-policy helper intentionally treats a narrower
    non-write entry under a broader writable root as a carveout for real
    `[permissions]` configs. Without this fast follow, the unchanged macOS
    seatbelt legacy adapter could project that legacy shape into a
    `FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout
    under the writable workspace root. In practice, legacy callers on macOS
    could unexpectedly lose write access inside `docs/`, even though that
    path was writable before the `[permissions]` migration work.
    
    This change fixes that by routing the legacy seatbelt path through the
    cwd-aware legacy bridge, so:
    - legacy `workspace-write` keeps `docs/` writable when `docs/` was only
    a redundant readable root
    - explicit `[permissions]` entries like `'.' = 'write'` and `'docs' =
    'read'` still make `docs/` read-only, which is the new intended
    split-policy behavior
    
    ### Legacy `command_exec` with a subdirectory cwd
    `command_exec` can run a command from a request cwd that is narrower
    than the sandbox root cwd.
    
    For example:
    - sandbox root cwd is `/repo`
    - request cwd is `/repo/subdir`
    - legacy policy is still `workspace-write` rooted at `/repo`
    
    Before this fast follow, `command_exec` derived the legacy bridge using
    the request cwd, but the sandbox was later built using the sandbox root
    cwd. That mismatch could miss redundant legacy readable roots during
    projection and accidentally reintroduce read-only carveouts for paths
    that should still be writable under the legacy model.
    
    This change fixes that by deriving the legacy bridge with the same
    sandbox root cwd that sandbox enforcement later uses.
    
    ## Verification
    - `just fmt`
    - `cargo test -p codex-core
    seatbelt_legacy_workspace_write_nested_readable_root_stays_writable`
    - `cargo test -p codex-core test_sandbox_config_parsing`
    - `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D
    warnings`
    - `cargo clean`
  • feat(approvals) RejectConfig for request_permissions (#14118)
    ## Summary
    We need to support allowing request_permissions calls when using
    `Reject` policy
    
    <img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
    40 PM"
    src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
    />
    
    Note that this is a backwards-incompatible change for Reject policy. I'm
    not sure if we need to add a default based on our current use/setup
    
    ## Testing
    - [x] Added tests
    - [x] Tested locally
  • feat(core) Persist request_permission data across turns (#14009)
    ## Summary
    request_permissions flows should support persisting results for the
    session.
    
    Open Question: Still deciding if we need within-turn approvals - this
    adds complexity but I could see it being useful
    
    ## Testing
    - [x] Updated unit tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add request permissions tool (#13092)
    Adds a built-in `request_permissions` tool and wires it through the
    Codex core, protocol, and app-server layers so a running turn can ask
    the client for additional permissions instead of relying on a static
    session policy.
    
    The new flow emits a `RequestPermissions` event from core, tracks the
    pending request by call ID, forwards it through app-server v2 as an
    `item/permissions/requestApproval` request, and resumes the tool call
    once the client returns an approved subset of the requested permission
    profile.