Commit Graph

421 Commits

  • chore(core) update prefix_rule guidance (#15231)
    ## Summary
    Small tweaks to the prefix_rule guidance.
    
    ## Testing
    - [x] in progress
  • fix: allow restricted filesystem profiles to read helper executables (#15114)
    ## Summary
    
    This PR fixes restricted filesystem permission profiles so Codex's
    runtime-managed helper executables remain readable without requiring
    explicit user configuration.
    
    - add implicit readable roots for the configured `zsh` helper path and
    the main execve wrapper
    - allowlist the shared `$CODEX_HOME/tmp/arg0` root when the execve
    wrapper lives there, so session-specific helper paths keep working
    - dedupe injected paths and avoid adding duplicate read entries to the
    sandbox policy
    - add regression coverage for restricted read mode with helper
    executable overrides
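    
    The dedupe step above could look roughly like the following. This is a
    minimal Python sketch, not the actual Rust implementation; the function
    name `add_implicit_readable_roots` and the sample paths are hypothetical.
    
    ```python
    from pathlib import PurePosixPath
    
    
    def add_implicit_readable_roots(existing, injected):
        """Append injected roots to the existing ones, skipping duplicates."""
        result = list(existing)
        # Compare as parsed paths so "/a/b" and "/a/b/" count as the same root.
        seen = {PurePosixPath(p) for p in existing}
        for path in injected:
            key = PurePosixPath(path)
            if key not in seen:
                seen.add(key)
                result.append(str(key))
        return result
    
    
    # Hypothetical paths: one helper is already readable, one root is new
    # and appears twice (with and without a trailing slash).
    roots = add_implicit_readable_roots(
        ["/usr/bin", "/opt/codex/zsh"],
        ["/opt/codex/zsh", "/home/u/.codex/tmp/arg0", "/home/u/.codex/tmp/arg0/"],
    )
    ```
    
    Only the new root is appended, and it is appended once.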
    
    ## Testing 
    before this change: got this error when executing a shell command via
    zsh fork:
    ```
    "sandbox error: sandbox denied exec error, exit code: 127, stdout: , stderr: /etc/zprofile:11: operation not permitted: /usr/libexec/path_helper\nzsh:1: operation not permitted: .codex/skills/proxy-a/scripts/fetch_example.sh\n"
    ```
    
    saw this error go away after this change, meaning the readable roots
    are injected correctly.
  • feat: change multi-agent to use path-like system instead of uuids (#15313)
    This PR adds a URI-based system to reference agents within a tree. This
    comes from a sync between research and engineering.
    
    The main agent (the one manually spawned by a user) is always called
    `/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
    where `agent_1` is chosen by the model.
    
    Any agent can contact any other agent using its path.
    
    Paths can be either absolute or relative to the calling agent.
    
    Resume is not supported for now on this new path.
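    
    One way to picture absolute vs. relative agent paths is the sketch
    below. It is written in Python for illustration and assumes
    filesystem-like resolution against the caller's own path (including
    `..`), which this PR description does not spell out; the function name
    `resolve_agent_path` is hypothetical.
    
    ```python
    import posixpath
    
    
    def resolve_agent_path(caller: str, target: str) -> str:
        """Resolve `target` as seen from the agent at absolute path `caller`."""
        # posixpath.join keeps an absolute target as-is and appends a relative one.
        resolved = posixpath.normpath(posixpath.join(caller, target))
        # Assumption: every agent lives under /root, so reject anything else.
        if resolved != "/root" and not resolved.startswith("/root/"):
            raise ValueError(f"path escapes the agent tree: {target!r}")
        return resolved
    ```
    
    Under these assumptions, `resolve_agent_path("/root/agent_1", "helper")`
    gives `/root/agent_1/helper`, while an absolute target such as
    `/root/agent_2` is returned unchanged.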
  • [hooks] use a user message > developer message for prompt continuation (#14867)
    ## Summary
    
    Persist Stop-hook continuation prompts as `user` messages instead of
    hidden `developer` messages + some requested integration tests
    
    This is a followup to @pakrym 's comment in
    https://github.com/openai/codex/pull/14532 to make sure stop-block
    continuation prompts match training for turn loops
    
    - Stop continuation now writes `<hook_prompt hook_run_id="...">stop
    hook's user prompt</hook_prompt>`
    - Introduces quick-xml dependency, though we already indirectly depended
    on it anyway via syntect
    - This PR only has about 500 lines of actual logic changes, the rest is
    tests/schema
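    
    As a rough illustration of the wrapped message format, here is a Python
    sketch. It assumes the closing tag is `</hook_prompt>` and that the
    prompt text and attribute value are XML-escaped; the real code uses
    quick-xml in Rust, and the helper name `wrap_hook_prompt` is
    hypothetical.
    
    ```python
    from xml.sax.saxutils import escape, quoteattr
    
    
    def wrap_hook_prompt(hook_run_id: str, prompt: str) -> str:
        """Wrap a Stop-hook continuation prompt in a hook_prompt element."""
        # quoteattr adds the surrounding quotes and escapes the attribute value;
        # escape handles &, < and > in the element text.
        return (
            f"<hook_prompt hook_run_id={quoteattr(hook_run_id)}>"
            f"{escape(prompt)}"
            "</hook_prompt>"
        )
    ```
    
    For example, `wrap_hook_prompt("run-1", "cook the stonpet")` yields
    `<hook_prompt hook_run_id="run-1">cook the stonpet</hook_prompt>`.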
    
    ## Testing
    
    Example run (with a sessionstart hook and 3 stop hooks) - this shows
    context added by session start, then two stop hooks sending their own
    additional prompts in a new turn. The model responds with a single
    message addressing both. Then when that turn ends, the hooks detect that
    they just ran using `stop_hook_active` and decide not to infinite loop
    
    test files for this (unzip, move codex -> .codex):
    [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)
    
    ```
    › cats
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
      cat facts, cat breeds, cat names, or build something cat-themed in this repo.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: cook the stonpet
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: eat the cooked stonpet
    
    • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
      rested until the hyperspace juices settle.
    
      Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
      smoky, bright, and totally out of this dimension.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    ```
  • feat: support product-scoped plugins. (#15041)
    1. Added SessionSource::Custom(String) and --session-source.
    2. Enforced plugin and skill products by session_source.
    3. Applied the same filtering to curated background refresh.
  • Revert "fix: harden plugin feature gating" (#15102)
    Reverts openai/codex#15020
    
    I messed up the commit in my PR and accidentally merged changes that
    were still under review.
  • Return image URL from view_image tool (#15072)
    Clean up image semantics in code mode.
    
    `view_image` now returns `{image_url:string, details?: string}` 
    
    `image()` now allows both string parameter and `{image_url:string,
    details?: string}`
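    
    The two accepted argument shapes for `image()` can be normalized into
    one, as in this Python stand-in for the code-mode helper. The function
    name `normalize_image_arg` and the error handling are assumptions, not
    the actual implementation.
    
    ```python
    from typing import TypedDict, Union
    
    
    class ImageRef(TypedDict, total=False):
        image_url: str
        details: str
    
    
    def normalize_image_arg(arg: Union[str, dict]) -> ImageRef:
        """Accept a bare URL string or an {image_url, details?} object."""
        if isinstance(arg, str):
            return {"image_url": arg}
        if isinstance(arg, dict) and isinstance(arg.get("image_url"), str):
            result: ImageRef = {"image_url": arg["image_url"]}
            details = arg.get("details")
            if isinstance(details, str):
                result["details"] = details  # optional field, kept only if set
            return result
        raise TypeError("expected a URL string or an object with image_url")
    ```
    
    With this shape, the value returned by `view_image` can be passed
    straight back into `image()`.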
  • fix: harden plugin feature gating (#15020)
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
  • Add notify to code-mode (#14842)
    Allows model to send an out-of-band notification.
    
    The notification is injected as another tool call output for the same
    call_id.
  • [hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
    - this allows blocking the user's prompts from executing, and also
    prevents them from entering history
    - handles the edge case where you can both prevent the user's prompt AND
    add n amount of additionalContexts
    - refactors some old code into common.rs where hooks overlap
    functionality
    - refactors additionalContext being previously added to user messages,
    instead we use developer messages for them
    - handles queued messages correctly
    
    Sample hook for testing - if you write "[block-user-submit]" this hook
    will stop the thread:
    
    example run
    ```
    › sup
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: sup
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' exactly once near the end.
    
    • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
      lanterns lit
    
    
    › and [block-user-submit]
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (stopped)
      warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
      stop: Wizard Tower demo block: remove [block-user-submit] to continue.
    ```
    
    .codex/config.toml
    ```
    [features]
    codex_hooks = true
    ```
    
    .codex/hooks.json
    ```
    {
      "hooks": {
        "UserPromptSubmit": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
                "timeoutSec": 10,
                "statusMessage": "reading the observatory notes"
              }
            ]
          }
        ]
      }
    }
    ```
    
    .codex/hooks/user_prompt_submit_demo.py
    ```
    #!/usr/bin/env python3
    
    import json
    import sys
    from pathlib import Path
    
    
    def prompt_from_payload(payload: dict) -> str:
        prompt = payload.get("prompt")
        if isinstance(prompt, str) and prompt.strip():
            return prompt.strip()
    
        event = payload.get("event")
        if isinstance(event, dict):
            user_prompt = event.get("user_prompt")
            if isinstance(user_prompt, str):
                return user_prompt.strip()
    
        return ""
    
    
    def main() -> int:
        payload = json.load(sys.stdin)
        prompt = prompt_from_payload(payload)
        cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
    
        if "[block-user-submit]" in prompt:
            print(
                json.dumps(
                    {
                        "systemMessage": (
                            f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                        ),
                        "decision": "block",
                        "reason": (
                            "Wizard Tower demo block: remove [block-user-submit] to continue."
                        ),
                    }
                )
            )
            return 0
    
        prompt_preview = prompt or "(empty prompt)"
        if len(prompt_preview) > 80:
            prompt_preview = f"{prompt_preview[:77]}..."
    
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                    ),
                    "hookSpecificOutput": {
                        "hookEventName": "UserPromptSubmit",
                        "additionalContext": (
                            "Wizard Tower UserPromptSubmit demo fired. "
                            "For this reply only, include the exact phrase "
                            "'observatory lanterns lit' exactly once near the end."
                        ),
                    },
                }
            )
        )
        return 0
    
    
    if __name__ == "__main__":
        raise SystemExit(main())
    ```
  • Prefer websockets when providers support them (#13592)
    Remove all flags and model settings.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add FS abstraction and use in view_image (#14960)
    Adds an environment crate and environment + file system abstraction.
    
    Environment is a combination of attributes and services specific to
    the environment the agent is connected to:
    file system, process management, OS, default shell.
    
    The goal is to move most of the agent logic that assumes an environment
    to work through the environment abstraction.
  • feat: Add product-aware plugin policies and clean up manifest naming (#14993)
    - Add shared Product support to marketplace plugin policy and skill
    policy (not enforced yet).
    - Move marketplace installation/authentication under policy and model it
    as MarketplacePluginPolicy.
    - Rename plugin/marketplace local manifest types to separate raw serde
    shapes from resolved in-memory models.
  • Gate realtime audio interruption logic to v2 (#14984)
    - thread the realtime version into conversation start and app-server
    notifications
    - keep playback-aware mic gating and playback interruption behavior on
    v2 only, leaving v1 on the legacy path
  • Cleanup skills/remote/xxx endpoints. (#14977)
    Remove the skills/remote/xxx endpoints, as they are not in use for now.
  • generate an internal json schema for RolloutLine (#14434)
    ### Why
    i'm working on something that parses and analyzes codex rollout logs,
    and i'd like to have a schema for generating a parser/validator.
    
    `codex app-server generate-internal-json-schema` writes a
    `RolloutLine.json` file
    
    while doing this, i noticed we have a writer <> reader mismatch issue on
    `FunctionCallOutputPayload` and reasoning item ID -- added some schemars
    annotations to fix those
    
    ### Test
    
    ```
    $ just codex app-server generate-internal-json-schema --out ./foo
    ```
    
    generates a `RolloutLine.json` file, which i validated against jsonl
    files on disk
    
    `just codex app-server --help` doesn't expose the
    `generate-internal-json-schema` option by default, but you can do `just
    codex app-server generate-internal-json-schema --help` if you know the
    command
    
    everything else still works
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859)
    ### Summary
    The goal is for us to get the latest turn model and reasoning effort on
    thread/resume if no override is provided on the thread/resume func call.
    This is part 1, in which we write the model and reasoning effort for a
    thread to the sqlite db; there will be a followup PR to consume the
    two new fields on thread/resume.
    
    [part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
    and this one can be merged independently.
  • feat: show effective model in spawn agent event (#14944)
    Show effective model after the full config layering for the sub agent
  • fix: canonicalize symlinked Linux sandbox cwd (#14849)
    ## Problem
    On Linux, Codex can be launched from a workspace path that is a symlink
    (for example, a symlinked checkout or a symlinked parent directory).
    
    Our sandbox policy intentionally canonicalizes writable/readable roots
    to the real filesystem path before building the bubblewrap mounts. That
    part is correct and needed for safety.
    
    The remaining bug was that bubblewrap could still inherit the helper
    process's logical cwd, which might be the symlinked alias instead of the
    mounted canonical path. In that case, the sandbox starts in a cwd that
    does not exist inside the sandbox namespace even though the real
    workspace is mounted. This can cause sandboxed commands to fail in
    symlinked workspaces.
    
    ## Fix
    This PR keeps the sandbox policy behavior the same, but separates two
    concepts that were previously conflated:
    
    - the canonical cwd used to define sandbox mounts and permissions
    - the caller's logical cwd used when launching the command
    
    On the Linux bubblewrap path, we now thread the logical command cwd
    through the helper explicitly and only add `--chdir <canonical path>`
    when the logical cwd differs from the mounted canonical path.
    
    That means:
    - permissions are still computed from canonical paths
    - bubblewrap starts the command from a cwd that definitely exists inside
    the sandbox
    - we do not widen filesystem access or undo the earlier symlink
    hardening
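    
    The launch rule described above can be sketched as follows. This is a
    Python illustration of the decision only, not the Rust helper; the
    function name `bwrap_chdir_args` and the sample paths are hypothetical.
    
    ```python
    import os
    from typing import List
    
    
    def bwrap_chdir_args(logical_cwd: str, canonical_cwd: str) -> List[str]:
        """Return extra bubblewrap args: --chdir only when the two cwds differ."""
        if os.path.normpath(logical_cwd) == os.path.normpath(canonical_cwd):
            # The inherited cwd already exists inside the sandbox.
            return []
        # The logical cwd is a symlink alias, so start from the mounted path.
        return ["--chdir", canonical_cwd]
    
    
    # A caller would typically derive the canonical path like this:
    logical = os.getcwd()
    extra_args = bwrap_chdir_args(logical, os.path.realpath(logical))
    ```
    
    Mounts and permissions are still derived from the canonical path; only
    the starting directory of the command is affected.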
    
    ## Why This Is Safe
    This is a narrow Linux-only launch fix, not a policy change.
    
    - Writable/readable root canonicalization stays intact.
    - Protected metadata carveouts still operate on canonical roots.
    - We only override bubblewrap's inherited cwd when the logical path
    would otherwise point at a symlink alias that is not mounted in the
    sandbox.
    
    ## Tests
    - kept the existing protocol/core regression coverage for symlink
    canonicalization
    - added regression coverage for symlinked cwd handling in the Linux
    bubblewrap builder/helper path
    
    Local validation:
    - `just fmt`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core
    normalize_additional_permissions_canonicalizes_symlinked_write_paths`
    - `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core
    --tests -- -D warnings`
    - `cargo build --bin codex`
    
    ## Context
    This is related to #14694. The earlier writable-root symlink fix
    addressed the mount/permission side; this PR fixes the remaining
    symlinked-cwd launch mismatch in the Linux sandbox path.
  • [stack 2/4] Align main realtime v2 wire and runtime flow (#14830)
    ## Stack Position
    2/4. Built on top of #14828.
    
    ## Base
    - #14828
    
    ## Unblocks
    - #14829
    - #14827
    
    ## Scope
    - Port the realtime v2 wire parsing, session, app-server, and
    conversation runtime behavior onto the split websocket-method base.
    - Branch runtime behavior directly on the current realtime session kind
    instead of parser-derived flow flags.
    - Keep regression coverage in the existing e2e suites.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: make interrupt state not final for multi-agents (#13850)
    Make `interrupted` an agent state and make it not final. As a result, a
    `wait` won't return on an interrupted agent and no notification will be
    sent to the parent agent.
    
    The rationale is:
    * If a user interrupts a sub-agent for any reason, you don't want the
    parent agent to instantaneously ask the sub-agent to restart
    * If a parent agent interrupts a sub-agent, there is no need to add a
    noisy notification in the parent agent
  • Preserve background terminals on interrupt and rename cleanup command to /stop (#14602)
    ### Motivation
    - Interrupting a running turn (Ctrl+C / Esc) currently also terminates
    long‑running background shells, which is surprising for workflows like
    local dev servers or file watchers.
    - The existing cleanup command name was confusing; callers expect an
    explicit command to stop background terminals rather than a UI clear
    action.
    - Make background‑shell termination explicit and surface a clearer
    command name while preserving backward compatibility.
    
    ### Description
    - Renamed the background‑terminal cleanup slash command from `Clean`
    (`/clean`) to `Stop` (`/stop`) and kept `clean` as an alias in the
    command parsing/visibility layer, updated the user descriptions and
    command popup wiring accordingly.
    - Updated the unified‑exec footer text and snapshots to point to `/stop`
    (and trimmed corresponding snapshot output to match the new label).
    - Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt)
    no longer closes or clears tracked unified exec / background terminal
    processes in the TUI or core cleanup path; background shells are now
    preserved after an interrupt.
    - Updated protocol/docs to clarify that `turn/interrupt` (or
    `Op::Interrupt`) interrupts the active turn but does not terminate
    background terminals, and that `thread/backgroundTerminals/clean` is the
    explicit API to stop those shells.
    - Updated unit/integration tests and insta snapshots in the TUI and core
    unified‑exec suites to reflect the new semantics and command name.
    
    ### Testing
    - Ran formatting with `just fmt` in `codex-rs` (succeeded). 
    - Ran `cargo test -p codex-protocol` (succeeded). 
    - Attempted `cargo test -p codex-tui` but the build could not complete
    in this environment due to a native build dependency that requires
    `libcap` development headers (the `codex-linux-sandbox` vendored build
    step); install `libcap-dev` / make `libcap.pc` available in
    `PKG_CONFIG_PATH` to run the TUI test suite locally.
    - Updated and accepted the affected `insta` snapshots for the TUI
    changes so visual diffs reflect the new `/stop` wording and preserved
    interrupt behavior.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)
  • fix: fix symlinked writable roots in sandbox policies (#14674)
    ## Summary
    - normalize effective readable, writable, and unreadable sandbox roots
    after resolving special paths so symlinked roots use canonical runtime
    paths
    - add a protocol regression test for a symlinked writable root with a
    denied child and update protocol expectations to canonicalized effective
    paths
    - update macOS seatbelt tests to assert against effective normalized
    roots produced by the shared policy helpers
    
    ## Testing
    - just fmt
    - cargo test -p codex-protocol
    - cargo test -p codex-core explicit_unreadable_paths_are_excluded_
    - cargo clippy -p codex-protocol -p codex-core --tests -- -D warnings
    
    ## Notes
    - This is intended to fix the symlinked TMPDIR bind failure in
    bubblewrap described in #14672.
    Fixes #14672
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool calling
    runtime (for ex from js_repl/code_mode)
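    
    Conceptually, a hidden tool stays in the runtime registry but is left
    out of the list shown to the model. A minimal Python sketch follows,
    assuming the flag defaults to visible; the names `DynamicTool` and
    `split_tools` are hypothetical.
    
    ```python
    from dataclasses import dataclass
    
    
    @dataclass
    class DynamicTool:
        name: str
        expose_to_context: bool = True  # stand-in for exposeToContext (default assumed)
    
    
    def split_tools(tools):
        """Return (registry of all callable tools, names visible to the model)."""
        registry = {tool.name: tool for tool in tools}
        visible = [tool.name for tool in tools if tool.expose_to_context]
        return registry, visible
    
    
    registry, visible = split_tools(
        [DynamicTool("search"), DynamicTool("internal_fetch", expose_to_context=False)]
    )
    ```
    
    Here `internal_fetch` remains callable through `registry` (for example
    from code mode) but does not appear in `visible`.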
  • move plugin/skill instructions into dev msg and reorder (#14609)
    Move the general `Apps`, `Skills` and `Plugins` instructions blocks out
    of `user_instructions` and into the developer message, with new `Apps ->
    Skills -> Plugins` order for better clarity.
    
    Also wrap those sections in stable XML-style instruction tags (like
    other sections) and update prompt-layout tests/snapshots. This makes the
    tests less brittle in snapshot output (we can parse the sections), and
    it consolidates the capability instructions in one place.
    
    #### Tests
    Updated snapshots, added tests.
    
    `<AGENTS_MD>` disappearing in snapshots is expected: before this change,
    the wrapped user-instructions message was kept alive by `Skills`
    content. Now that `Skills` and `Plugins` are in the developer message,
    that wrapper only appears when there is real
    project-doc/user-instructions content.
    
    ---------
    
    Co-authored-by: Charley Cunningham <ccunningham@openai.com>
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
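    
    The alias migration rule can be illustrated with a small Python sketch.
    It models one config scope as a flat dict, which is a simplification of
    the real layered config, and it assumes the alias maps to
    `smart_approvals`; the function name `migrate_guardian_alias` is
    hypothetical.
    
    ```python
    def migrate_guardian_alias(scope: dict) -> dict:
        """Return a copy of one config scope with the deprecated alias migrated."""
        migrated = dict(scope)
        if migrated.get("guardian_approval") is True:
            del migrated["guardian_approval"]
            migrated.setdefault("smart_approvals", True)
            # Backfill only when no reviewer is configured in this scope.
            migrated.setdefault("approvals_reviewer", "guardian_subagent")
        return migrated
    ```
    
    A plain `smart_approvals = true` scope passes through unchanged, and an
    explicit `approvals_reviewer` in the same scope is never overwritten.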
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
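    
    A client might validate the `review` object along these lines. This is
    a Python sketch; the field types are guesses, since only the field
    names and the status values are given here, and `GuardianReview` and
    `parse_review` are hypothetical names.
    
    ```python
    from dataclasses import dataclass
    from typing import Any, Optional
    
    REVIEW_STATUSES = {"inProgress", "approved", "denied", "aborted"}
    
    
    @dataclass
    class GuardianReview:
        status: str
        risk_score: Optional[Any] = None  # type not specified in the PR
        risk_level: Optional[str] = None
        rationale: Optional[str] = None
    
    
    def parse_review(raw: dict) -> GuardianReview:
        """Validate the status and copy over the optional fields."""
        status = raw.get("status")
        if status not in REVIEW_STATUSES:
            raise ValueError(f"unknown review status: {status!r}")
        return GuardianReview(
            status=status,
            risk_score=raw.get("riskScore"),
            risk_level=raw.get("riskLevel"),
            rationale=raw.get("rationale"),
        )
    ```
    
    Because the notifications are marked `[UNSTABLE]`, a real client would
    likely want to tolerate unknown fields rather than reject them.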
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
      - pending guardian review state in the live status footer, including
        parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server, core): add more spans (#14479)
    ## Description
    
    This PR expands tracing coverage across app-server thread startup, core
    session initialization, and the Responses transport layer. It also gives
    core dispatch spans stable operation-specific names so traces are easier
    to follow than the old generic `submission_dispatch` spans.
    
    Also use `fmt::Display` for types that we serialize in traces so we send
    strings instead of Rust types.
  • Include spawn agent model metadata in app-server items (#14410)
    - add model and reasoning effort to app-server collab spawn items and
    notifications
    - regenerate app-server protocol schemas for the new fields
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • refactor: centralize filesystem permissions precedence (#14174)
    ## Stack
    
       fix: fail closed for unsupported split windows sandboxing #14172
       fix: preserve split filesystem semantics in linux sandbox #14173
       fix: align core approvals with split sandbox policies #14171
    -> refactor: centralize filesystem permissions precedence #14174
    
    ## Summary
    - add a shared per-path split filesystem precedence helper in
    `FileSystemSandboxPolicy`
    - derive readable, writable, and unreadable roots from the same
    most-specific resolution rules
    - add regression coverage for nested `write` / `read` / `none` carveouts
    and legacy bridge enforcement detection
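    
    A most-specific resolution rule of this kind can be sketched in Python
    as below. This is not the `FileSystemSandboxPolicy` code; it assumes
    that the deepest matching root wins and that unmatched paths default to
    `none`, and the name `resolve_access` is hypothetical.
    
    ```python
    from pathlib import PurePosixPath
    
    
    def resolve_access(path: str, entries: dict) -> str:
        """Return the access mode of the deepest entry that contains `path`."""
        target = PurePosixPath(path)
        best_depth, best_mode = -1, "none"  # assumed default: no access
        for root, mode in entries.items():
            root_path = PurePosixPath(root)
            if target == root_path or root_path in target.parents:
                depth = len(root_path.parts)
                if depth > best_depth:
                    best_depth, best_mode = depth, mode
        return best_mode
    
    
    # Nested write / read / none carveouts, as in the regression coverage.
    entries = {"/repo": "write", "/repo/docs": "read", "/repo/docs/private": "none"}
    ```
    
    With these entries, `/repo/src/main.rs` resolves to `write`,
    `/repo/docs/a.md` to `read`, and `/repo/docs/private/key` to `none`.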
    
    ## Testing
    - cargo test -p codex-protocol
    - cargo clippy -p codex-protocol --tests -- -D warnings
  • feat: search_tool migrate to bring you own tool of Responses API (#14274)
    ## Why
    
    To support the new bring-your-own search tool in the Responses API
    (https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search),
    we are migrating our bm25 search tool to the official way of executing
    search on the client and communicating additional tools to the model.
    
    ## What
    - replace the legacy `search_tool_bm25` flow with client-executed
    `tool_search`
    - add protocol, SSE, history, and normalization support for
    `tool_search_call` and `tool_search_output`
    - return namespaced Codex Apps search results and wire namespaced
    follow-up tool calls back into MCP dispatch
  • Add realtime start instructions config override (#14270)
    - add `realtime_start_instructions` config support
    - thread it into realtime context updates, schema, docs, and tests
  • Show spawned agent model and effort in TUI (#14273)
    - include the requested sub-agent model and reasoning effort in the
    spawn begin event
    - render that metadata next to the spawned agent name and role in the
    TUI transcript
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: add a separate reject-policy flag for skill approvals (#14271)
    ## Summary
    - add `skill_approval` to `RejectConfig` and the app-server v2
    `AskForApproval::Reject` payload so skill-script prompts can be
    configured independently from sandbox and rule-based prompts
    - update Unix shell escalation to reject prompts based on the actual
    decision source, keeping prefix rules tied to `rules`, unmatched command
    fallbacks tied to `sandbox_approval`, and skill scripts tied to
    `skill_approval`
    - regenerate the affected protocol/config schemas and expand
    unit/integration coverage for the new flag and skill approval behavior
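    
    The mapping from decision source to reject flag might be pictured like
    this. It is a Python sketch in which the flag names come from the
    summary above, while the source labels and the function name
    `should_reject_prompt` are hypothetical.
    
    ```python
    # Decision source -> RejectConfig flag that governs it.
    REJECT_FLAG_BY_SOURCE = {
        "prefix_rule": "rules",
        "unmatched_command": "sandbox_approval",
        "skill_script": "skill_approval",
    }
    
    
    def should_reject_prompt(decision_source: str, reject_config: dict) -> bool:
        """Reject the prompt only if the flag tied to its source is set."""
        flag = REJECT_FLAG_BY_SOURCE[decision_source]
        return bool(reject_config.get(flag, False))
    
    
    config = {"rules": False, "sandbox_approval": True, "skill_approval": False}
    ```
    
    With this config, sandbox fallback prompts are rejected while skill
    script prompts still reach the user.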
  • feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155)
    Add additional macOS Sandbox Permissions levers for the following:
    
    - Launch Services
    - Contacts
    - Reminders
  • Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
    Summary
    - drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
    to keep MCP tooling focused on the final result shape
    - wire the new schema definitions through code mode, context, handlers,
    and spec modules so MCP tools serialize the exact output shape expected
    by the model
    - extend code mode tests to cover multiple MCP call scenarios and ensure
    the serialized data matches the new schema
    - refresh JS runner helpers and protocol models alongside the schema
    changes
    
    Testing
    - Not run (not requested)
  • Reuse McpToolOutput in McpHandler (#14229)
    We already have a type to represent the MCP tool output, so reuse it
    instead of the custom McpHandlerOutput.
  • Use realtime transcript for handoff context (#14132)
    - collect input/output transcript deltas into active handoff transcript
    state
    - attach and clear that transcript on each handoff, and regenerate
    schema/tests
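    
    A rough sketch of the collect, attach, and clear cycle, assuming the
    transcript state is a plain string buffer. `HandoffTranscript` and its
    methods are hypothetical names, and tagging each delta with a role on
    its own line is a simplification.
    
    ```rust
    /// Hypothetical holder for the transcript of the active handoff.
    #[derive(Debug, Default)]
    pub struct HandoffTranscript {
        buffer: String,
    }
    
    impl HandoffTranscript {
        /// Append one input or output transcript delta, tagged with its role.
        pub fn push_delta(&mut self, role: &str, delta: &str) {
            self.buffer.push_str(role);
            self.buffer.push_str(": ");
            self.buffer.push_str(delta);
            self.buffer.push('\n');
        }
    
        /// Return the accumulated transcript for the handoff and leave the
        /// buffer empty, so the next handoff starts from a clean state.
        pub fn take_for_handoff(&mut self) -> String {
            std::mem::take(&mut self.buffer)
        }
    }
    ```
    
    `std::mem::take` does the attach and the clear in one step, so a
    transcript cannot leak into a later handoff.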
  • fix(core) default RejectConfig.request_permissions (#14165)
    ## Summary
    Adds a default here so existing config deserializes
    
    ## Testing
    - [x] Added a unit test
  • start of hooks engine (#13276)
    (Experimental)
    
    This PR adds a first MVP for hooks, with SessionStart and Stop
    
    The core design is:
    
    - hooks live in a dedicated engine under codex-rs/hooks
    - each hook type has its own event-specific file
    - hook execution is synchronous and blocks normal turn progression while
    running
    - matching hooks run in parallel, then their results are aggregated into
    a normalized HookRunSummary
    
    On the AppServer side, hooks are exposed as operational metadata rather
    than transcript-native items:
    
    - new live notifications: hook/started, hook/completed
    - persisted/replayed hook results live on Turn.hookRuns
    - we intentionally did not add hook-specific ThreadItem variants
    
    Hook messages are not persisted; they remain ephemeral. The context
    changes they add are persisted (they get appended to the user's prompt).
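    
    The run-in-parallel-then-aggregate design above can be sketched with
    scoped threads. This is an illustration under assumptions: the `Hook`
    alias, the fields of `HookRunSummary`, and the sample hooks are all
    hypothetical, and the real engine under codex-rs/hooks may be
    structured quite differently.
    
    ```rust
    use std::thread;
    
    /// A hook returns optional text to append to the prompt, or an error.
    pub type Hook = fn() -> Result<Option<String>, String>;
    
    /// Hypothetical shape for the normalized summary.
    #[derive(Debug, Default, PartialEq, Eq)]
    pub struct HookRunSummary {
        pub succeeded: usize,
        /// Names of hooks that returned an error.
        pub failed: Vec<String>,
        /// Context that gets appended to the user's prompt.
        pub appended_context: Vec<String>,
    }
    
    /// Run every matching hook on its own thread, block until all of them
    /// finish, then fold the outcomes into one summary in hook order.
    pub fn run_hooks(hooks: &[(String, Hook)]) -> HookRunSummary {
        let outcomes: Vec<Result<Option<String>, String>> = thread::scope(|scope| {
            let handles: Vec<_> = hooks
                .iter()
                .map(|(_name, hook)| scope.spawn(move || hook()))
                .collect();
            handles
                .into_iter()
                .map(|handle| handle.join().expect("hook thread panicked"))
                .collect()
        });
    
        let mut summary = HookRunSummary::default();
        for ((name, _), outcome) in hooks.iter().zip(outcomes) {
            match outcome {
                Ok(context) => {
                    summary.succeeded += 1;
                    summary.appended_context.extend(context);
                }
                Err(_) => summary.failed.push(name.clone()),
            }
        }
        summary
    }
    
    // Sample hooks, purely for demonstration.
    fn branch_context() -> Result<Option<String>, String> {
        Ok(Some("branch: main".to_string()))
    }
    fn silent() -> Result<Option<String>, String> {
        Ok(None)
    }
    fn failing() -> Result<Option<String>, String> {
        Err("exit status 1".to_string())
    }
    
    /// Usage example: three hooks matched the same event.
    pub fn demo() -> HookRunSummary {
        let hooks = [
            ("branch".to_string(), branch_context as Hook),
            ("silent".to_string(), silent as Hook),
            ("lint".to_string(), failing as Hook),
        ];
        run_hooks(&hooks)
    }
    ```
    
    Because `thread::scope` only returns once every spawned thread has been
    joined, the caller is blocked for the duration of the run, matching the
    synchronous behavior described above.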
  • fix: keep permissions profiles forward compatible (#14107)
    ## Summary
    - preserve unknown `:special_path` tokens, including nested entries, so
    older Codex builds warn and ignore instead of failing config load
    - fail closed with a startup warning when a permissions profile has
    missing or empty filesystem entries instead of aborting profile
    compilation
    - normalize Windows verbatim paths like `\\?\C:\...` before absolute-path
    validation while keeping explicit errors for truly invalid paths
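    
    For illustration, verbatim-path simplification might look like the
    string-level sketch below. `simplify_verbatim_path` is a hypothetical
    helper, not the function added in this change; it only relies on the
    standard Windows `\\?\` and `\\?\UNC\` prefixes and skips the other
    checks a real normalizer would need.
    
    ```rust
    /// Strip the Windows verbatim prefix from a path string.
    /// Paths without the prefix are returned unchanged.
    pub fn simplify_verbatim_path(path: &str) -> String {
        const VERBATIM_UNC: &str = r"\\?\UNC\";
        const VERBATIM: &str = r"\\?\";
    
        if let Some(rest) = path.strip_prefix(VERBATIM_UNC) {
            // Verbatim UNC form: put the regular `\\` UNC prefix back.
            format!(r"\\{rest}")
        } else if let Some(rest) = path.strip_prefix(VERBATIM) {
            // Verbatim drive form: the remainder is already an absolute path.
            rest.to_string()
        } else {
            path.to_string()
        }
    }
    ```
    
    The UNC prefix is checked first because it also starts with `\\?\`.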
    
    ## Testing
    - just fmt
    - cargo test -p codex-core permissions_profiles_allow
    - cargo test -p codex-core
    normalize_absolute_path_for_platform_simplifies_windows_verbatim_paths
    - cargo test -p codex-protocol
    unknown_special_paths_are_ignored_by_legacy_bridge
    - cargo clippy -p codex-core -p codex-protocol --all-targets -- -D
    warnings
    - cargo clean
  • fix(protocol): preserve legacy workspace-write semantics (#13957)
    ## Summary
    This is a fast follow to the initial `[permissions]` structure.
    
    - keep the new split-policy carveout behavior for narrower non-write
    entries under broader writable roots
    - preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge
    that drops only redundant nested readable roots when projecting from
    `SandboxPolicy`
    - route the legacy macOS seatbelt adapter through that same legacy
    bridge so redundant nested readable roots do not become read-only
    carveouts on macOS
    - derive the legacy bridge for `command_exec` using the sandbox root cwd
    rather than the request cwd so policy derivation matches later sandbox
    enforcement
    - add regression coverage for the legacy macOS nested-readable-root case
    
    ## Examples
    ### Legacy `workspace-write` on macOS
    A legacy `workspace-write` policy can redundantly list a nested readable
    root under an already-writable workspace root.
    
    For example, legacy config can effectively mean:
    - workspace root (`.` / `cwd`) is writable
    - `docs/` is also listed in `readable_roots`
    
    The new shared split-policy helper intentionally treats a narrower
    non-write entry under a broader writable root as a carveout for real
    `[permissions]` configs. Without this fast follow, the unchanged macOS
    seatbelt legacy adapter could project that legacy shape into a
    `FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout
    under the writable workspace root. In practice, legacy callers on macOS
    could unexpectedly lose write access inside `docs/`, even though that
    path was writable before the `[permissions]` migration work.
    
    This change fixes that by routing the legacy seatbelt path through the
    cwd-aware legacy bridge, so:
    - legacy `workspace-write` keeps `docs/` writable when `docs/` was only
    a redundant readable root
    - explicit `[permissions]` entries like `'.' = 'write'` and `'docs' =
    'read'` still make `docs/` read-only, which is the new intended
    split-policy behavior
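    
    The difference between the two cases can be sketched as follows. This
    is a simplified model under assumptions: `Access`, `effective_access`,
    and `legacy_bridge` are hypothetical names, and "most specific entry
    wins" is used here as a stand-in for the split-policy carveout
    behavior.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Access {
        Read,
        Write,
    }
    
    /// Split-policy resolution: the longest matching root decides, so a
    /// narrower read entry carves a read-only hole out of a writable root.
    pub fn effective_access(entries: &[(PathBuf, Access)], path: &Path) -> Option<Access> {
        entries
            .iter()
            .filter(|(root, _)| path.starts_with(root))
            .max_by_key(|(root, _)| root.components().count())
            .map(|(_, access)| *access)
    }
    
    /// Legacy bridge: readable roots nested under a writable root are
    /// redundant in the legacy model, so drop them instead of projecting
    /// them as read-only carveouts.
    pub fn legacy_bridge(writable: &[PathBuf], readable: &[PathBuf]) -> Vec<(PathBuf, Access)> {
        let mut entries: Vec<(PathBuf, Access)> = writable
            .iter()
            .cloned()
            .map(|root| (root, Access::Write))
            .collect();
        for root in readable {
            let redundant = writable.iter().any(|w| root.starts_with(w));
            if !redundant {
                entries.push((root.clone(), Access::Read));
            }
        }
        entries
    }
    ```
    
    With explicit entries `/repo = write` and `/repo/docs = read`, a file
    under `/repo/docs` resolves to `Read`. Passing the same roots through
    `legacy_bridge` drops the nested readable root, so the file resolves to
    `Write`.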
    
    ### Legacy `command_exec` with a subdirectory cwd
    `command_exec` can run a command from a request cwd that is narrower
    than the sandbox root cwd.
    
    For example:
    - sandbox root cwd is `/repo`
    - request cwd is `/repo/subdir`
    - legacy policy is still `workspace-write` rooted at `/repo`
    
    Before this fast follow, `command_exec` derived the legacy bridge using
    the request cwd, but the sandbox was later built using the sandbox root
    cwd. That mismatch could miss redundant legacy readable roots during
    projection and accidentally reintroduce read-only carveouts for paths
    that should still be writable under the legacy model.
    
    This change fixes that by deriving the legacy bridge with the same
    sandbox root cwd that sandbox enforcement later uses.
    
    ## Verification
    - `just fmt`
    - `cargo test -p codex-core
    seatbelt_legacy_workspace_write_nested_readable_root_stays_writable`
    - `cargo test -p codex-core test_sandbox_config_parsing`
    - `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D
    warnings`
    - `cargo clean`
  • feat(approvals) RejectConfig for request_permissions (#14118)
    ## Summary
    We need to support allowing request_permissions calls when using the
    `Reject` policy.
    
    Note that this is a backwards-incompatible change for Reject policy. I'm
    not sure if we need to add a default based on our current use/setup
    
    ## Testing
    - [x] Added tests
    - [x] Tested locally
  • feat(core) Persist request_permission data across turns (#14009)
    ## Summary
    request_permissions flows should support persisting results for the
    session.
    
    Open Question: Still deciding if we need within-turn approvals - this
    adds complexity but I could see it being useful
    
    ## Testing
    - [x] Updated unit tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>