Commit Graph

440 Commits

  • Remove remaining custom prompt support (#16115)
    ## Summary
    - remove protocol and core support for discovering and listing custom
    prompts
    - simplify the TUI slash-command flow and command popup to built-in
    commands only
    - delete obsolete custom prompt tests, helpers, and docs references
    - clean up downstream event handling for the removed protocol events
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
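
    The defaulting and forwarding rules above can be sketched roughly as
    follows. This is only an illustration in Python, not the actual shell
    wrappers; `build_lint_args` and the `NARROWING_FLAGS` set are assumptions
    made for the example.

    ```python
    from typing import List

    # Assumed set of cargo target selectors that count as "narrowing" (hypothetical).
    NARROWING_FLAGS = {"--lib", "--bins", "--tests", "--examples", "--benches", "--all-targets"}


    def build_lint_args(wrapper_args: List[str]) -> List[str]:
        """Split wrapper args at the first `--` and default to --all-targets."""
        if "--" in wrapper_args:
            idx = wrapper_args.index("--")
            lint_args, cargo_args = wrapper_args[:idx], wrapper_args[idx + 1:]
        else:
            lint_args, cargo_args = list(wrapper_args), []

        fixing = "--fix" in lint_args
        narrowed = any(arg in NARROWING_FLAGS for arg in cargo_args)
        if not fixing and not narrowed:
            # Non-fix runs inspect every Cargo target unless the caller narrowed them.
            cargo_args = ["--all-targets"] + cargo_args

        # Emit exactly one separator before the forwarded cargo arguments.
        return lint_args + (["--"] + cargo_args if cargo_args else [])
    ```

    Under these assumptions, a plain invocation yields `-- --all-targets`,
    while `-- --lib --bins` is forwarded unchanged with a single separator.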
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Add usage-based business plan types (#15934)
    ## Summary
    - add `self_serve_business_usage_based` and `enterprise_cbp_usage_based`
    to the public/internal plan enums and regenerate the app-server + Python
    SDK artifacts
    - map both plans through JWT login and backend rate-limit payloads, then
    bucket them with the existing Team/Business entitlement behavior in
    cloud requirements, usage-limit copy, tooltips, and status display
    - keep the earlier display-label remap commit on this branch so the new
    Team-like and Business-like plans render consistently in the UI
    
    ## Testing
    - `just write-app-server-schema`
    - `uv run --project sdk/python python
    sdk/python/scripts/update_sdk_artifacts.py generate-types`
    - `just fix -p codex-protocol -p codex-login -p codex-core -p
    codex-backend-client -p codex-cloud-requirements -p codex-tui -p
    codex-tui-app-server -p codex-backend-openapi-models`
    - `just fmt`
    - `just argument-comment-lint`
    - `cargo test -p codex-protocol
    usage_based_plan_types_use_expected_wire_names`
    - `cargo test -p codex-login usage_based`
    - `cargo test -p codex-backend-client usage_based`
    - `cargo test -p codex-cloud-requirements usage_based`
    - `cargo test -p codex-core usage_limit_reached_error_formats_`
    - `cargo test -p codex-tui plan_type_display_name_remaps_display_labels`
    - `cargo test -p codex-tui remapped`
    - `cargo test -p codex-tui-app-server
    plan_type_display_name_remaps_display_labels`
    - `cargo test -p codex-tui-app-server remapped`
    - `cargo test -p codex-tui-app-server
    preserves_usage_based_plan_type_wire_name`
    
    ## Notes
    - a broader multi-crate `cargo test` run still hits unrelated existing
    guardian-approval config failures in
    `codex-rs/core/src/config/config_tests.rs`
  • permissions: remove macOS seatbelt extension profiles (#15918)
    ## Why
    
    `PermissionProfile` should only describe the per-command permissions we
    still want to grant dynamically. Keeping
    `MacOsSeatbeltProfileExtensions` in that surface forced extra macOS-only
    approval, protocol, schema, and TUI branches for a capability we no
    longer want to expose.
    
    ## What changed
    
    - Removed the macOS-specific permission-profile types from
    `codex-protocol`, the app-server v2 API, and the generated
    schema/TypeScript artifacts.
    - Deleted the core and sandboxing plumbing that threaded
    `MacOsSeatbeltProfileExtensions` through execution requests and seatbelt
    construction.
    - Simplified macOS seatbelt generation so it always includes the fixed
    read-only preferences allowlist instead of carrying a configurable
    profile extension.
    - Removed the macOS additional-permissions UI/docs/test coverage and
    deleted the obsolete macOS permission modules.
    - Tightened `request_permissions` intersection handling so explicitly
    empty requested read lists are preserved only when that field was
    actually granted, avoiding zero-grant responses being stored as active
    permissions.
  • chore: remove skill metadata from command approval payloads (#15906)
    ## Why
    
    This is effectively a follow-up to
    [#15812](https://github.com/openai/codex/pull/15812). That change
    removed the special skill-script exec path, but `skill_metadata` was
    still being threaded through command-approval payloads even though the
    approval flow no longer uses it to render prompts or resolve decisions.
    
    Keeping it around added extra protocol, schema, and client surface area
    without changing behavior.
    
    Removing it keeps the command-approval contract smaller and avoids
    carrying a dead field through app-server, TUI, and MCP boundaries.
    
    ## What changed
    
    - removed `ExecApprovalRequestSkillMetadata` and the corresponding
    `skillMetadata` field from core approval events and the v2 app-server
    protocol
    - removed the generated JSON and TypeScript schema output for that field
    - updated app-server, MCP server, TUI, and TUI app-server approval
    plumbing to stop forwarding the field
    - cleaned up tests that previously constructed or asserted
    `skillMetadata`
    
    ## Testing
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-app-server-test-client`
    - `cargo test -p codex-mcp-server`
    - `just argument-comment-lint`
  • Protect first-time project .codex creation across Linux and macOS sandboxes (#15067)
    ## Problem
    
    Codex already treated an existing top-level project `./.codex` directory
    as protected, but there was a gap on first creation.
    
    If `./.codex` did not exist yet, a turn could create files under it,
    such as `./.codex/config.toml`, without going through the same approval
    path as later modifications. That meant the initial write could bypass
    the intended protection for project-local Codex state.
    
    ## What this changes
    
    This PR closes that first-creation gap in the Unix enforcement layers:
    
    - `codex-protocol`
      - treat the top-level project `./.codex` path as a protected carveout
        even when it does not exist yet
      - avoid injecting the default carveout when the user already has an
        explicit rule for that exact path
    - macOS Seatbelt
      - deny writes to both the exact protected path and anything beneath it,
        so creating `./.codex` itself is blocked in addition to writes inside it
    - Linux bubblewrap
      - preserve the same protected-path behavior for first-time creation
        under `./.codex`
    - tests
      - add protocol regressions for missing `./.codex` and explicit-rule
        collisions
      - add Unix sandbox coverage for blocking first-time `./.codex` creation
      - tighten Seatbelt policy assertions around excluded subpaths
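
    The "exact path or anything beneath it" rule can be illustrated with a
    purely lexical check. This is a sketch, not the Seatbelt or bubblewrap
    policy itself; `is_protected_write` is a hypothetical helper and it
    assumes both paths are already absolute and normalized.

    ```python
    from pathlib import PurePosixPath


    def is_protected_write(project_root: str, target: str) -> bool:
        """Return True if `target` is the top-level .codex path or lies beneath it.

        The check never touches the filesystem, so it also covers the case
        where .codex does not exist yet.
        """
        protected = PurePosixPath(project_root) / ".codex"
        candidate = PurePosixPath(target)
        return candidate == protected or protected in candidate.parents
    ```

    Because the comparison is path-component based, `/repo/.codexrc` and a
    nested `/repo/src/.codex` are not matched by this sketch.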
    
    ## Scope
    
    This change is intentionally scoped to protecting the top-level project
    `.codex` subtree from agent writes.
    
    It does not make `.codex` unreadable, and it does not change the product
    behavior around loading project skills from `.codex` when project config
    is untrusted.
    
    ## Why this shape
    
    The fix is pointed rather than broad:
    - it preserves the current model of “project `.codex` is protected from
    writes”
    - it closes the security-relevant first-write hole
    - it avoids folding a larger permissions-model redesign into this PR
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-sandboxing seatbelt`
    - `cargo test -p codex-exec --test all
    sandbox_blocks_first_time_dot_codex_creation -- --nocapture`
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • [hooks] add non-streaming (non-stdin style) shell-only PostToolUse support (#15531)
    CHAINED PR: the base is eternal/hooks-pretooluse-bash, not main, so the
    PR linked below should land first.

    This adds the post-tool hook that matches the pre-tool functionality in
    https://github.com/openai/codex/pull/15211

    PreToolUse runs for plain shell calls and allows blocking them before
    execution. PostToolUse runs after the command has executed.
    
    example run:
    
    ```
    › as a test, run in parallel the following commands:
      - echo 'one'
      - echo '[block-pre-tool-use]'
      - echo '[block-post-tool-use]'
    
    
    ⚠ MCP startup incomplete (failed: notion, linear)
    
    • Cruising through those three commands in parallel now, and I’ll share the exact outputs right after
      they land.
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    PreToolUse hook (blocked)
      warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
      feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo 'one'
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo '[block-post-tool-use]'
    
    • Ran echo '[block-post-tool-use]'
      └ [block-post-tool-use]
    
    • Ran echo 'one'
      └ one
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    PostToolUse hook (blocked)
      warning: wizard-tower PostToolUse demo blocked the Bash follow-through on purpose.
      feedback: PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to
    continue.
    
    PostToolUse hook (completed)
      warning: wizard-tower PostToolUse demo inspected Bash: echo 'one'
      hook context: PostToolUse demo saw the Bash result. Response preview: one
    
    ──────────────────────────────────────────────────────────────────────────────────────────────────────
    
    • Ran them in parallel. Little wave report:
    
      - echo 'one' completed and returned one.
      - echo '[block-pre-tool-use]' was blocked before execution by the PreToolUse hook.
      - echo '[block-post-tool-use]' ran, but its result was blocked afterward by the PostToolUse hook.
    
      The hook messages were:
    
      - PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
      - PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue.
    
    
    › for the blocked messages (correctly blocked in this test) did the tool output land back into
      context?
    
    
    • Yeah, from this run:
    
      - PreToolUse block: the command itself did not execute, so no command output landed in context. Only
        the block message did.
      - PostToolUse block: the command did execute, but the actual stdout did not come back into context.
        Only the post-tool block message did.
    
      So the blocked notifications landed in context, but the blocked command payloads themselves did not.
      Nice clean guardrail, aloha.
    ```
  • Move string truncation helpers into codex-utils-string (#15572)
    - move the shared byte-based middle truncation logic from `core` into
    `codex-utils-string`
    - keep token-specific truncation in `codex-core` so rollout can reuse
    the shared helper in the next stacked PR
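
    For readers unfamiliar with byte-based middle truncation, here is a
    minimal sketch of the general technique. It is not the helper from
    `codex-utils-string`; the function name, the `...` marker, and the way the
    budget is split are all assumptions.

    ```python
    def truncate_middle_bytes(text: str, max_bytes: int, marker: str = "...") -> str:
        """Keep the head and tail of `text` so the UTF-8 result fits in `max_bytes`."""
        data = text.encode("utf-8")
        if len(data) <= max_bytes:
            return text

        budget = max(max_bytes - len(marker.encode("utf-8")), 0)
        head_len = budget // 2
        tail_len = budget - head_len

        # errors="ignore" drops any multi-byte character that was cut in half.
        head = data[:head_len].decode("utf-8", errors="ignore")
        tail = data[len(data) - tail_len:].decode("utf-8", errors="ignore")
        return head + marker + tail
    ```

    The budget is measured in bytes rather than characters, so the result may
    contain fewer characters than the budget suggests when the input has
    multi-byte code points.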
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Move git utilities into a dedicated crate (#15564)
    - create `codex-git-utils` and move the shared git helpers into it with
    file moves preserved for diff readability
    - move the `GitInfo` helpers out of `core` so stacked rollout work can
    depend on the shared crate without carrying its own git info module
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat: communication pattern v2 (#15647)
    See internal communication
  • feat: disable notifier v2 and start turn on agent interaction (#15624)
    Make inter-agent communication start a turn.
    
    As part of this, we disable the v2 notifier to prevent some odd
    behaviour where, for example, the agent restarts working while you're
    talking to it.
  • chore(core) Add approvals reviewer to UserTurn (#15426)
    ## Summary
    Adds support for approvals_reviewer to `Op::UserTurn` so we can migrate
    `[CodexMessageProcessor::turn_start]` to use Op::UserTurn
    
    ## Testing
    - [x] Adds quick test for the new field
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: use serde to differentiate inter agent communication (#15560)
    Use `serde` to encode inter-agent communication into an assistant
    message, and try to decode a message to see whether it is such a
    communication.
    
    Note: this assumes serde on small patterns is fast enough
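
    The encode-then-try-decode idea can be sketched as below. Python's `json`
    stands in for `serde` here, and the `InterAgentMessage` fields are
    hypothetical; the real Rust type is not shown in this log.

    ```python
    import json
    from dataclasses import asdict, dataclass
    from typing import Optional


    @dataclass
    class InterAgentMessage:
        # Hypothetical field names, chosen only for illustration.
        author: str
        recipient: str
        content: str


    def encode(message: InterAgentMessage) -> str:
        """Serialize the message into the text of an assistant message."""
        return json.dumps(asdict(message))


    def try_decode(text: str) -> Optional[InterAgentMessage]:
        """Return the message if `text` decodes to the expected shape, else None."""
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return None
        if not isinstance(data, dict) or set(data) != {"author", "recipient", "content"}:
            return None
        if not all(isinstance(value, str) for value in data.values()):
            return None
        return InterAgentMessage(**data)
    ```

    An ordinary assistant message fails the decode and is treated as plain
    text, which is why the cost of attempting a decode on every message
    matters.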
  • [hooks] add non-streaming (non-stdin style) shell-only PreToolUse support (#15211)
    - add `PreToolUse` hook for bash-like tool execution only at first
    - block shell execution before dispatch with deny-only hook behavior
    - introduces common.rs matcher framework for matching when hooks are run
    
    example run:
    
    ```
    › run three parallel echo commands, and the second one should echo "[block-pre-tool-use]" as a test
    
    
    • Running the three echo commands in parallel now and I’ll report the output directly.
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo "first parallel echo"
      
    PreToolUse hook (blocked)
      warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
      feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo "third parallel echo"
    
    • Ran echo "first parallel echo"
      └ first parallel echo
    
    • Ran echo "third parallel echo"
      └ third parallel echo
    
    • Three little waves went out in parallel.
    
      1. printed first parallel echo
      2. was blocked before execution because it contained the exact test string [block-pre-tool-use]
      3. printed third parallel echo
    
      There was also an unrelated macOS defaults warning around the successful commands, but the echoes
      themselves worked fine. If you want, I can rerun the second one with a slightly modified string so
      it passes cleanly.
    ```
  • feat: new op type for sub-agents communication (#15556)
    Add `InterAgentCommunication` for v2 agent communication
  • tui: queue follow-ups during manual /compact (#15259)
    ## Summary
    - queue input after the user submits `/compact` until that manual
    compact turn ends
    - mirror the same behavior in the app-server TUI
    - add regressions for input queued before compact starts and while it is
    running
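
    A toy model of the queueing behavior, assuming a single in-flight manual
    compact. The class and method names are hypothetical and do not mirror the
    TUI code.

    ```python
    from typing import List


    class CompactAwareInput:
        """Hold follow-up input while a manual /compact turn is running."""

        def __init__(self) -> None:
            self.compacting = False
            self.pending: List[str] = []
            self.sent: List[str] = []

        def submit(self, text: str) -> None:
            if self.compacting:
                # Queue instead of sending while the compact turn is active.
                self.pending.append(text)
                return
            self.sent.append(text)
            if text.strip() == "/compact":
                self.compacting = True

        def on_compact_finished(self) -> None:
            """Flush queued input in order once the compact turn ends."""
            self.compacting = False
            queued, self.pending = self.pending, []
            for text in queued:
                self.submit(text)
    ```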
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore(context) Include guardian approval context (#15366)
    ## Summary
    Include the guardian context in the developer message for approvals
    
    ## Testing
    - [x] Updated unit tests
  • chore(core) update prefix_rule guidance (#15231)
    ## Summary
    Small tweaks to the prefix_rule guidance.
    
    ## Testing
    - [x] in progress
  • fix: allow restricted filesystem profiles to read helper executables (#15114)
    ## Summary
    
    This PR fixes restricted filesystem permission profiles so Codex's
    runtime-managed helper executables remain readable without requiring
    explicit user configuration.
    
    - add implicit readable roots for the configured `zsh` helper path and
    the main execve wrapper
    - allowlist the shared `$CODEX_HOME/tmp/arg0` root when the execve
    wrapper lives there, so session-specific helper paths keep working
    - dedupe injected paths and avoid adding duplicate read entries to the
    sandbox policy
    - add regression coverage for restricted read mode with helper
    executable overrides
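
    A rough sketch of the injection and dedupe logic described in the list
    above. The function name and the example paths are hypothetical; only the
    `$CODEX_HOME/tmp/arg0` root comes from the description.

    ```python
    from pathlib import PurePosixPath
    from typing import List


    def implicit_readable_roots(
        existing_roots: List[str], zsh_helper: str, execve_wrapper: str, codex_home: str
    ) -> List[str]:
        """Append helper paths to the readable roots without duplicating entries."""
        arg0_root = PurePosixPath(codex_home) / "tmp" / "arg0"
        wrapper = PurePosixPath(execve_wrapper)

        # Allow the shared arg0 root when the wrapper lives under it, so
        # session-specific wrapper paths stay readable.
        wrapper_root = arg0_root if arg0_root in wrapper.parents else wrapper

        roots = [PurePosixPath(root) for root in existing_roots]
        for candidate in (PurePosixPath(zsh_helper), wrapper_root):
            if candidate not in roots:
                roots.append(candidate)
        return [str(root) for root in roots]
    ```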
    
    ## Testing 
    before this change: got this error when executing a shell command via
    zsh fork:
    ```
    "sandbox error: sandbox denied exec error, exit code: 127, stdout: , stderr: /etc/zprofile:11: operation not permitted: /usr/libexec/path_helper\nzsh:1: operation not permitted: .codex/skills/proxy-a/scripts/fetch_example.sh\n"
    ```
    
    after this change the error went away, meaning the readable roots are
    injected correctly.
  • feat: change multi-agent to use path-like system instead of uuids (#15313)
    This PR adds a URI-based system to reference agents within a tree. This
    comes from a sync between research and engineering.
    
    The main agent (the one manually spawned by a user) is always called
    `/root`. Any sub-agent spawned by it will be, for example,
    `/root/agent_1`, where `agent_1` is chosen by the model.
    
    Any agent can contact any other agent using its path.
    
    Paths can be either absolute or relative to the calling agent.
    
    Resume is not supported for now with this new path system.
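
    One way to picture absolute versus relative agent paths is
    filesystem-style resolution. The sketch below is an assumption about the
    semantics, including the handling of `..` and the rule that results must
    stay under `/root`; `resolve_agent_path` is a hypothetical helper.

    ```python
    import posixpath


    def resolve_agent_path(caller: str, target: str) -> str:
        """Resolve `target` against the calling agent's absolute path."""
        # posixpath.join discards `caller` when `target` is already absolute.
        resolved = posixpath.normpath(posixpath.join(caller, target))
        if resolved != "/root" and not resolved.startswith("/root/"):
            raise ValueError(f"path escapes the agent tree: {target!r}")
        return resolved
    ```

    With these rules, `/root` addressing `agent_1` and `/root/agent_2`
    addressing `../agent_1` both resolve to `/root/agent_1`.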
  • [hooks] use a user message > developer message for prompt continuation (#14867)
    ## Summary
    
    Persist Stop-hook continuation prompts as `user` messages instead of
    hidden `developer` messages + some requested integration tests
    
    This is a followup to @pakrym 's comment in
    https://github.com/openai/codex/pull/14532 to make sure stop-block
    continuation prompts match training for turn loops
    
    - Stop continuation now writes `<hook_prompt hook_run_id="...">stop
    hook's user prompt</hook_prompt>`
    - Introduces quick-xml dependency, though we already indirectly depended
    on it anyway via syntect
    - This PR only has about 500 lines of actual logic changes, the rest is
    tests/schema
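
    The wrapper format can be illustrated with Python's standard XML helpers.
    This sketch assumes the element is closed with `</hook_prompt>` and that
    the prompt text is XML-escaped; the function names are hypothetical and
    the real implementation uses quick-xml in Rust.

    ```python
    import xml.etree.ElementTree as ET
    from typing import Tuple
    from xml.sax.saxutils import escape, quoteattr


    def render_hook_prompt(hook_run_id: str, prompt: str) -> str:
        """Wrap a Stop-hook continuation prompt in a hook_prompt element."""
        return f"<hook_prompt hook_run_id={quoteattr(hook_run_id)}>{escape(prompt)}</hook_prompt>"


    def parse_hook_prompt(message: str) -> Tuple[str, str]:
        """Recover (hook_run_id, prompt) from a rendered hook prompt message."""
        element = ET.fromstring(message)
        if element.tag != "hook_prompt":
            raise ValueError("not a hook prompt message")
        return element.attrib["hook_run_id"], element.text or ""
    ```

    Escaping on the way in and parsing on the way out keeps prompts containing
    `<` or `&` intact.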
    
    ## Testing
    
    Example run (with a sessionstart hook and 3 stop hooks) - this shows
    context added by session start, then two stop hooks sending their own
    additional prompts in a new turn. The model responds with a single
    message addressing both. Then when that turn ends, the hooks detect that
    they just ran using `stop_hook_active` and decide not to infinite loop
    
    test files for this (unzip, move codex -> .codex):
    [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)
    
    ```
    › cats
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
      cat facts, cat breeds, cat names, or build something cat-themed in this repo.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: cook the stonpet
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: eat the cooked stonpet
    
    • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
      rested until the hyperspace juices settle.
    
      Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
      smoky, bright, and totally out of this dimension.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    ```
  • feat: support product-scoped plugins. (#15041)
    1. Added SessionSource::Custom(String) and --session-source.
    2. Enforced plugin and skill products by session_source.
    3. Applied the same filtering to curated background refresh.
  • Revert "fix: harden plugin feature gating" (#15102)
    Reverts openai/codex#15020
    
    I messed up the commit in my PR and accidentally merged changes that
    were still under review.
  • Return image URL from view_image tool (#15072)
    Clean up image semantics in code mode.
    
    `view_image` now returns `{image_url:string, details?: string}` 
    
    `image()` now allows both string parameter and `{image_url:string,
    details?: string}`
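
    Accepting both shapes amounts to a small normalization step. The sketch
    below is in Python for illustration only (code mode itself is not
    Python); `normalize_image_arg` is a hypothetical name.

    ```python
    from typing import Dict, Union


    def normalize_image_arg(arg: Union[str, Dict[str, str]]) -> Dict[str, str]:
        """Accept a bare URL string or an {image_url, details?} mapping."""
        if isinstance(arg, str):
            return {"image_url": arg}
        if isinstance(arg, dict) and isinstance(arg.get("image_url"), str):
            normalized = {"image_url": arg["image_url"]}
            # `details` is optional and only copied when present.
            if arg.get("details") is not None:
                normalized["details"] = arg["details"]
            return normalized
        raise TypeError("expected a URL string or a mapping with 'image_url'")
    ```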
  • fix: harden plugin feature gating (#15020)
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
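
    Item 3 describes a skip-on-error pattern, sketched below. The function
    takes already-read file contents keyed by path so that it stays
    self-contained; the name and return shape are assumptions.

    ```python
    import json
    from typing import Any, Dict, List, Tuple


    def parse_marketplaces(documents: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
        """Parse each marketplace.json, skipping malformed files instead of failing."""
        loaded: Dict[str, Any] = {}
        skipped: List[str] = []
        for path, text in documents.items():
            try:
                loaded[path] = json.loads(text)
            except json.JSONDecodeError:
                # One bad file should not break the whole listing.
                skipped.append(path)
        return loaded, skipped
    ```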
  • Add notify to code-mode (#14842)
    Allows model to send an out-of-band notification.
    
    The notification is injected as another tool call output for the same
    call_id.
  • [hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
    - this allows blocking the user's prompts from executing, and also
    prevents them from entering history
    - handles the edge case where you can both prevent the user's prompt AND
    add n amount of additionalContexts
    - refactors some old code into common.rs where hooks overlap
    functionality
    - stops adding additionalContext to user messages; developer messages
    are now used for it instead
    - handles queued messages correctly
    
    Sample hook for testing - if you write "[block-user-submit]" this hook
    will stop the thread:
    
    example run
    ```
    › sup
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: sup
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' exactly once near the end.
    
    • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
      lanterns lit
    
    
    › and [block-user-submit]
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (stopped)
      warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
      stop: Wizard Tower demo block: remove [block-user-submit] to continue.
    ```
    
    .codex/config.toml
    ```
    [features]
    codex_hooks = true
    ```
    
    .codex/hooks.json
    ```
    {
      "hooks": {
        "UserPromptSubmit": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
                "timeoutSec": 10,
                "statusMessage": "reading the observatory notes"
              }
            ]
          }
        ]
      }
    }
    ```
    
    .codex/hooks/user_prompt_submit_demo.py
    ```
    #!/usr/bin/env python3
    
    import json
    import sys
    from pathlib import Path
    
    
    def prompt_from_payload(payload: dict) -> str:
        prompt = payload.get("prompt")
        if isinstance(prompt, str) and prompt.strip():
            return prompt.strip()
    
        event = payload.get("event")
        if isinstance(event, dict):
            user_prompt = event.get("user_prompt")
            if isinstance(user_prompt, str):
                return user_prompt.strip()
    
        return ""
    
    
    def main() -> int:
        payload = json.load(sys.stdin)
        prompt = prompt_from_payload(payload)
        cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
    
        if "[block-user-submit]" in prompt:
            print(
                json.dumps(
                    {
                        "systemMessage": (
                            f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                        ),
                        "decision": "block",
                        "reason": (
                            "Wizard Tower demo block: remove [block-user-submit] to continue."
                        ),
                    }
                )
            )
            return 0
    
        prompt_preview = prompt or "(empty prompt)"
        if len(prompt_preview) > 80:
            prompt_preview = f"{prompt_preview[:77]}..."
    
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                    ),
                    "hookSpecificOutput": {
                        "hookEventName": "UserPromptSubmit",
                        "additionalContext": (
                            "Wizard Tower UserPromptSubmit demo fired. "
                            "For this reply only, include the exact phrase "
                            "'observatory lanterns lit' exactly once near the end."
                        ),
                    },
                }
            )
        )
        return 0
    
    
    if __name__ == "__main__":
        raise SystemExit(main())
    ```
  • Prefer websockets when providers support them (#13592)
    Remove all flags and model settings.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add FS abstraction and use in view_image (#14960)
    Adds an environment crate and environment + file system abstraction.
    
    Environment is a combination of attributes and services specific to the
    environment the agent is connected to:
    file system, process management, OS, default shell.
    
    The goal is to move most of the agent logic that assumes a particular
    environment to work through the environment abstraction.
  • feat: Add product-aware plugin policies and clean up manifest naming (#14993)
    - Add shared Product support to marketplace plugin policy and skill
    policy (not enforced yet).
    - Move marketplace installation/authentication under policy and model it
    as MarketplacePluginPolicy.
    - Rename plugin/marketplace local manifest types to separate raw serde
    shapes from resolved in-memory models.
  • Gate realtime audio interruption logic to v2 (#14984)
    - thread the realtime version into conversation start and app-server
    notifications
    - keep playback-aware mic gating and playback interruption behavior on
    v2 only, leaving v1 on the legacy path
  • Cleanup skills/remote/xxx endpoints. (#14977)
    Remove the skills/remote/xxx endpoints, as they are not in use for now.
  • generate an internal json schema for RolloutLine (#14434)
    ### Why
    i'm working on something that parses and analyzes codex rollout logs,
    and i'd like to have a schema for generating a parser/validator.
    
    `codex app-server generate-internal-json-schema` writes a
    `RolloutLine.json` file
    
    while doing this, i noticed we have a writer <> reader mismatch issue on
    `FunctionCallOutputPayload` and reasoning item ID -- added some schemars
    annotations to fix those
    
    ### Test
    
    ```
    $ just codex app-server generate-internal-json-schema --out ./foo
    ```
    
    generates a `RolloutLine.json` file, which i validated against jsonl
    files on disk
    
    `just codex app-server --help` doesn't expose the
    `generate-internal-json-schema` option by default, but you can do `just
    codex app-server generate-internal-json-schema --help` if you know the
    command
    
    everything else still works
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859)
    ### Summary
    The goal is to use the latest turn's model and reasoning effort on
    thread/resume if no override is provided in the thread/resume call.
    This is part 1, which writes the model and reasoning effort for a
    thread to the sqlite db; a follow-up PR will consume the two new fields
    on thread/resume.
    
    [part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
    and this one can be merged independently.
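
    The write side (this PR) and the intended read side (the follow-up) can be
    sketched with Python's `sqlite3`. The table name, column names, and
    functions are hypothetical and do not reflect the actual schema.

    ```python
    import sqlite3
    from typing import Optional, Tuple


    def init_db(conn: sqlite3.Connection) -> None:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS threads ("
            "id TEXT PRIMARY KEY, model TEXT, reasoning_effort TEXT)"
        )


    def record_latest_settings(
        conn: sqlite3.Connection, thread_id: str, model: str, reasoning_effort: str
    ) -> None:
        """Part 1: persist the latest turn's model and reasoning effort."""
        conn.execute(
            "INSERT OR REPLACE INTO threads (id, model, reasoning_effort) VALUES (?, ?, ?)",
            (thread_id, model, reasoning_effort),
        )


    def settings_for_resume(
        conn: sqlite3.Connection,
        thread_id: str,
        model_override: Optional[str] = None,
        effort_override: Optional[str] = None,
    ) -> Tuple[Optional[str], Optional[str]]:
        """Part 2 (assumed): fall back to stored values when no override is given."""
        row = conn.execute(
            "SELECT model, reasoning_effort FROM threads WHERE id = ?", (thread_id,)
        ).fetchone()
        stored_model, stored_effort = row if row else (None, None)
        return model_override or stored_model, effort_override or stored_effort
    ```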
  • feat: show effective model in spawn agent event (#14944)
    Show effective model after the full config layering for the sub agent
  • fix: canonicalize symlinked Linux sandbox cwd (#14849)
    ## Problem
    On Linux, Codex can be launched from a workspace path that is a symlink
    (for example, a symlinked checkout or a symlinked parent directory).
    
    Our sandbox policy intentionally canonicalizes writable/readable roots
    to the real filesystem path before building the bubblewrap mounts. That
    part is correct and needed for safety.
    
    The remaining bug was that bubblewrap could still inherit the helper
    process's logical cwd, which might be the symlinked alias instead of the
    mounted canonical path. In that case, the sandbox starts in a cwd that
    does not exist inside the sandbox namespace even though the real
    workspace is mounted. This can cause sandboxed commands to fail in
    symlinked workspaces.
    
    ## Fix
    This PR keeps the sandbox policy behavior the same, but separates two
    concepts that were previously conflated:
    
    - the canonical cwd used to define sandbox mounts and permissions
    - the caller's logical cwd used when launching the command
    
    On the Linux bubblewrap path, we now thread the logical command cwd
    through the helper explicitly and only add `--chdir <canonical path>`
    when the logical cwd differs from the mounted canonical path.
    
    That means:
    - permissions are still computed from canonical paths
    - bubblewrap starts the command from a cwd that definitely exists inside
    the sandbox
    - we do not widen filesystem access or undo the earlier symlink
    hardening
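
    The launch rule can be sketched roughly as follows. `chdir_args` is a
    hypothetical helper name, not the function in the Linux sandbox crate,
    and both paths are assumed to be already resolved by the caller.

    ```rust
    use std::path::Path;

    /// Returns the extra bubblewrap arguments for the command's start directory.
    /// `canonical_cwd` is the mounted real path; `logical_cwd` is the path the
    /// caller launched from, which may be a symlink alias.
    fn chdir_args(logical_cwd: &Path, canonical_cwd: &Path) -> Vec<String> {
        if logical_cwd == canonical_cwd {
            // The inherited cwd already exists inside the sandbox namespace.
            Vec::new()
        } else {
            // Start from the mounted canonical path instead of the alias.
            vec![
                "--chdir".to_string(),
                canonical_cwd.to_string_lossy().into_owned(),
            ]
        }
    }
    ```

    Mount and permission computation is untouched in this sketch; only the
    starting directory changes, which matches the intent described above.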
    
    ## Why This Is Safe
    This is a narrow Linux-only launch fix, not a policy change.
    
    - Writable/readable root canonicalization stays intact.
    - Protected metadata carveouts still operate on canonical roots.
    - We only override bubblewrap's inherited cwd when the logical path
    would otherwise point at a symlink alias that is not mounted in the
    sandbox.
    
    ## Tests
    - kept the existing protocol/core regression coverage for symlink
    canonicalization
    - added regression coverage for symlinked cwd handling in the Linux
    bubblewrap builder/helper path
    
    Local validation:
    - `just fmt`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core
    normalize_additional_permissions_canonicalizes_symlinked_write_paths`
    - `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core
    --tests -- -D warnings`
    - `cargo build --bin codex`
    
    ## Context
    This is related to #14694. The earlier writable-root symlink fix
    addressed the mount/permission side; this PR fixes the remaining
    symlinked-cwd launch mismatch in the Linux sandbox path.
  • [stack 2/4] Align main realtime v2 wire and runtime flow (#14830)
    ## Stack Position
    2/4. Built on top of #14828.
    
    ## Base
    - #14828
    
    ## Unblocks
    - #14829
    - #14827
    
    ## Scope
    - Port the realtime v2 wire parsing, session, app-server, and
    conversation runtime behavior onto the split websocket-method base.
    - Branch runtime behavior directly on the current realtime session kind
    instead of parser-derived flow flags.
    - Keep regression coverage in the existing e2e suites.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: make interrupt state not final for multi-agents (#13850)
    Make `interrupted` an agent state and make it not final. As a result, a
    `wait` won't return on an interrupted agent and no notification will be
    sent to the parent agent.
    
    The rationales are:
    * If a user interrupts a sub-agent for any reason, you don't want the
    parent agent to instantaneously ask the sub-agent to restart
    * If a parent agent interrupts a sub-agent, there is no need to add a noisy
    notification in the parent agent
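
    A rough illustration of the state model, assuming a hypothetical
    `AgentStatus` enum whose variants other than `Interrupted` are made up
    for the example:

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AgentStatus {
        Running,
        Interrupted,
        Completed,
        Errored,
    }

    impl AgentStatus {
        /// Only final states wake a waiter; `Interrupted` is deliberately excluded.
        fn is_final(self) -> bool {
            matches!(self, AgentStatus::Completed | AgentStatus::Errored)
        }
    }

    /// A `wait`-style check: the first final status in a sequence of updates.
    fn first_final(statuses: &[AgentStatus]) -> Option<AgentStatus> {
        statuses.iter().copied().find(|s| s.is_final())
    }
    ```

    With this shape, a sequence that ends in `Interrupted` yields `None`,
    so the waiter keeps waiting until the agent later completes or errors.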
  • Preserve background terminals on interrupt and rename cleanup command to /stop (#14602)
    ### Motivation
    - Interrupting a running turn (Ctrl+C / Esc) currently also terminates
    long‑running background shells, which is surprising for workflows like
    local dev servers or file watchers.
    - The existing cleanup command name was confusing; callers expect an
    explicit command to stop background terminals rather than a UI clear
    action.
    - Make background‑shell termination explicit and surface a clearer
    command name while preserving backward compatibility.
    
    ### Description
    - Renamed the background‑terminal cleanup slash command from `Clean`
    (`/clean`) to `Stop` (`/stop`), kept `clean` as an alias in the
    command parsing/visibility layer, and updated the user descriptions and
    command popup wiring accordingly.
    - Updated the unified‑exec footer text and snapshots to point to `/stop`
    (and trimmed corresponding snapshot output to match the new label).
    - Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt)
    no longer closes or clears tracked unified exec / background terminal
    processes in the TUI or core cleanup path; background shells are now
    preserved after an interrupt.
    - Updated protocol/docs to clarify that `turn/interrupt` (or
    `Op::Interrupt`) interrupts the active turn but does not terminate
    background terminals, and that `thread/backgroundTerminals/clean` is the
    explicit API to stop those shells.
    - Updated unit/integration tests and insta snapshots in the TUI and core
    unified‑exec suites to reflect the new semantics and command name.
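
    The alias handling amounts to something like the sketch below.
    `parse_slash_command` and the single-variant enum are simplified
    stand-ins, not the TUI's actual parser.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum SlashCommand {
        Stop,
    }

    /// Accepts the new `/stop` name and the legacy `/clean` alias.
    fn parse_slash_command(input: &str) -> Option<SlashCommand> {
        let name = input.trim().strip_prefix('/')?;
        match name {
            "stop" | "clean" => Some(SlashCommand::Stop),
            _ => None,
        }
    }
    ```

    Both spellings resolve to the same command, so existing `/clean` muscle
    memory keeps working while the popup and footer advertise `/stop`.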
    
    ### Testing
    - Ran formatting with `just fmt` in `codex-rs` (succeeded). 
    - Ran `cargo test -p codex-protocol` (succeeded). 
    - Attempted `cargo test -p codex-tui` but the build could not complete
    in this environment due to a native build dependency that requires
    `libcap` development headers (the `codex-linux-sandbox` vendored build
    step); install `libcap-dev` / make `libcap.pc` available in
    `PKG_CONFIG_PATH` to run the TUI test suite locally.
    - Updated and accepted the affected `insta` snapshots for the TUI
    changes so visual diffs reflect the new `/stop` wording and preserved
    interrupt behavior.
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)
  • fix: fix symlinked writable roots in sandbox policies (#14674)
    ## Summary
    - normalize effective readable, writable, and unreadable sandbox roots
    after resolving special paths so symlinked roots use canonical runtime
    paths
    - add a protocol regression test for a symlinked writable root with a
    denied child and update protocol expectations to canonicalized effective
    paths
    - update macOS seatbelt tests to assert against effective normalized
    roots produced by the shared policy helpers
    
    ## Testing
    - just fmt
    - cargo test -p codex-protocol
    - cargo test -p codex-core explicit_unreadable_paths_are_excluded_
    - cargo clippy -p codex-protocol -p codex-core --tests -- -D warnings
    
    ## Notes
    - This is intended to fix the symlinked TMPDIR bind failure in
    bubblewrap described in #14672.
    Fixes #14672
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool calling
    runtime (for example, from js_repl/code_mode).
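
    Conceptually the flag splits "visible to the model" from "callable at
    runtime". A sketch under that assumption, with a hypothetical
    `DynamicTool` struct and helper names chosen for illustration:

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    struct DynamicTool {
        name: String,
        /// Mirrors `exposeToContext`: when false, the tool is hidden from the model.
        expose_to_context: bool,
    }

    /// Tool names that would be advertised in the model context.
    fn tools_for_model_context(tools: &[DynamicTool]) -> Vec<&str> {
        tools
            .iter()
            .filter(|t| t.expose_to_context)
            .map(|t| t.name.as_str())
            .collect()
    }

    /// Every registered tool stays callable by the runtime, hidden or not.
    fn is_callable(tools: &[DynamicTool], name: &str) -> bool {
        tools.iter().any(|t| t.name == name)
    }
    ```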
  • move plugin/skill instructions into dev msg and reorder (#14609)
    Move the general `Apps`, `Skills` and `Plugins` instructions blocks out
    of `user_instructions` and into the developer message, with new `Apps ->
    Skills -> Plugins` order for better clarity.
    
    Also wrap those sections in stable XML-style instruction tags (like
    other sections) and update prompt-layout tests/snapshots. This makes the
    tests less brittle in snapshot output (we can parse the sections), and
    it consolidates the capability instructions in one place.
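
    As an illustration only, the layout could be assembled like this. The
    tag names (`apps_instructions` and so on) are placeholders, not the
    tags this PR introduces.

    ```rust
    /// Wraps one instruction block in an XML-style tag pair.
    fn wrap_section(tag: &str, body: &str) -> String {
        format!("<{}>\n{}\n</{}>", tag, body, tag)
    }

    /// Emits the present sections in the fixed Apps -> Skills -> Plugins order.
    fn build_developer_sections(
        apps: Option<&str>,
        skills: Option<&str>,
        plugins: Option<&str>,
    ) -> String {
        vec![
            ("apps_instructions", apps),
            ("skills_instructions", skills),
            ("plugins_instructions", plugins),
        ]
        .into_iter()
        .filter_map(|(tag, body)| body.map(|b| wrap_section(tag, b)))
        .collect::<Vec<_>>()
        .join("\n")
    }
    ```

    Because each section has a stable open and close tag, a test can locate
    a section by tag instead of matching the whole prompt verbatim.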
    
    #### Tests
    Updated snapshots, added tests.
    
    `<AGENTS_MD>` disappearing in snapshots is expected: before this change,
    the wrapped user-instructions message was kept alive by `Skills`
    content. Now that `Skills` and `Plugins` are in the developer message,
    that wrapper only appears when there is real
    project-doc/user-instructions content.
    
    ---------
    
    Co-authored-by: Charley Cunningham <ccunningham@openai.com>
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
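
    The split between the two settings can be pictured as below. The enum
    and function names are illustrative, and `needs_approval` stands in for
    whatever `approval_policy` decides for a given request.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ApprovalsReviewer {
        User,
        GuardianSubagent,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum Route {
        RunWithoutReview,
        AskUser,
        AskGuardian,
    }

    /// Parses the two documented config values; anything else is rejected.
    fn parse_reviewer(value: &str) -> Option<ApprovalsReviewer> {
        match value {
            "user" => Some(ApprovalsReviewer::User),
            "guardian_subagent" => Some(ApprovalsReviewer::GuardianSubagent),
            _ => None,
        }
    }

    /// `approval_policy` decides *when* (here: `needs_approval`);
    /// `approvals_reviewer` decides *who*.
    fn route(needs_approval: bool, reviewer: ApprovalsReviewer) -> Route {
        match (needs_approval, reviewer) {
            (false, _) => Route::RunWithoutReview,
            (true, ApprovalsReviewer::User) => Route::AskUser,
            (true, ApprovalsReviewer::GuardianSubagent) => Route::AskGuardian,
        }
    }
    ```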
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
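
    A sketch of that config-load rule, using a hypothetical `ConfigScope`
    struct. Mapping the alias onto `smart_approvals` when the new key is
    unset is an assumption made for the example.

    ```rust
    #[derive(Debug, Clone, PartialEq, Default)]
    struct ConfigScope {
        smart_approvals: Option<bool>,
        /// Deprecated alias, removed by the migration.
        guardian_approval: Option<bool>,
        approvals_reviewer: Option<String>,
    }

    /// Migrates the deprecated alias within a single scope.
    fn migrate_guardian_alias(mut scope: ConfigScope) -> ConfigScope {
        if let Some(enabled) = scope.guardian_approval.take() {
            // Assumption: the alias feeds the new flag only if it is unset.
            if scope.smart_approvals.is_none() {
                scope.smart_approvals = Some(enabled);
            }
            // Backfill the reviewer only when this scope has not chosen one.
            if enabled && scope.approvals_reviewer.is_none() {
                scope.approvals_reviewer = Some("guardian_subagent".to_string());
            }
        }
        scope
    }
    ```

    A scope that only sets `smart_approvals = true` passes through
    unchanged, which is the "rollout/UI gate only" behavior described above.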
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
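
    On the client side, tracking parallel pending reviews could look like
    the sketch below. `PendingReviews` is a hypothetical client helper;
    only the four status strings and `targetItemId` come from the payload
    description above.

    ```rust
    use std::collections::BTreeSet;

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ReviewStatus {
        InProgress,
        Approved,
        Denied,
        Aborted,
    }

    /// Maps the wire value of `review.status` to the enum.
    fn parse_status(value: &str) -> Option<ReviewStatus> {
        match value {
            "inProgress" => Some(ReviewStatus::InProgress),
            "approved" => Some(ReviewStatus::Approved),
            "denied" => Some(ReviewStatus::Denied),
            "aborted" => Some(ReviewStatus::Aborted),
            _ => None,
        }
    }

    /// Pending reviews keyed by `targetItemId`, so parallel reviews aggregate.
    #[derive(Debug, Default)]
    struct PendingReviews {
        pending: BTreeSet<String>,
    }

    impl PendingReviews {
        /// Applies one started/completed notification.
        fn on_notification(&mut self, target_item_id: &str, status: ReviewStatus) {
            if status == ReviewStatus::InProgress {
                self.pending.insert(target_item_id.to_string());
            } else {
                self.pending.remove(target_item_id);
            }
        }

        fn pending_count(&self) -> usize {
            self.pending.len()
        }
    }
    ```

    Keying on `targetItemId` rather than on an emitted tool item is what
    lets the UI show a review before the underlying item exists.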
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
      - pending guardian review state in the live status footer, including
        parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server, core): add more spans (#14479)
    ## Description
    
    This PR expands tracing coverage across app-server thread startup, core
    session initialization, and the Responses transport layer. It also gives
    core dispatch spans stable operation-specific names so traces are easier
    to follow than the old generic `submission_dispatch` spans.
    
    Also use `fmt::Display` for types that we serialize in traces so we send
    plain strings instead of Rust type representations.
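
    For illustration, with a made-up `TransportKind` enum (not a type from
    this PR), the difference between the two formatters looks like this:

    ```rust
    use std::fmt;

    /// Hypothetical value recorded as a span field.
    #[derive(Debug, Clone, Copy)]
    enum TransportKind {
        Websocket,
        Http,
    }

    impl fmt::Display for TransportKind {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // Emit a stable, lowercase string rather than the Rust variant name.
            let label = match self {
                TransportKind::Websocket => "websocket",
                TransportKind::Http => "http",
            };
            f.write_str(label)
        }
    }
    ```

    Here `format!("{:?}", TransportKind::Websocket)` would record the
    variant name `Websocket`, while `to_string()` records `websocket`.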