Commit Graph

372 Commits

  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Add usage-based business plan types (#15934)
    ## Summary
    - add `self_serve_business_usage_based` and `enterprise_cbp_usage_based`
    to the public/internal plan enums and regenerate the app-server + Python
    SDK artifacts
    - map both plans through JWT login and backend rate-limit payloads, then
    bucket them with the existing Team/Business entitlement behavior in
    cloud requirements, usage-limit copy, tooltips, and status display
    - keep the earlier display-label remap commit on this branch so the new
    Team-like and Business-like plans render consistently in the UI
    
    ## Testing
    - `just write-app-server-schema`
    - `uv run --project sdk/python python
    sdk/python/scripts/update_sdk_artifacts.py generate-types`
    - `just fix -p codex-protocol -p codex-login -p codex-core -p
    codex-backend-client -p codex-cloud-requirements -p codex-tui -p
    codex-tui-app-server -p codex-backend-openapi-models`
    - `just fmt`
    - `just argument-comment-lint`
    - `cargo test -p codex-protocol
    usage_based_plan_types_use_expected_wire_names`
    - `cargo test -p codex-login usage_based`
    - `cargo test -p codex-backend-client usage_based`
    - `cargo test -p codex-cloud-requirements usage_based`
    - `cargo test -p codex-core usage_limit_reached_error_formats_`
    - `cargo test -p codex-tui plan_type_display_name_remaps_display_labels`
    - `cargo test -p codex-tui remapped`
    - `cargo test -p codex-tui-app-server
    plan_type_display_name_remaps_display_labels`
    - `cargo test -p codex-tui-app-server remapped`
    - `cargo test -p codex-tui-app-server
    preserves_usage_based_plan_type_wire_name`
    
    ## Notes
    - a broader multi-crate `cargo test` run still hits unrelated existing
    guardian-approval config failures in
    `codex-rs/core/src/config/config_tests.rs`
  • Add ChatGPT device-code login to app server (#15525)
    ## Problem
    
    App-server clients could only initiate ChatGPT login through the browser
    callback flow, even though the shared login crate already supports
    device-code auth. That left VS Code, Codex App, and other app-server
    clients without a first-class way to use the existing device-code
    backend when browser redirects are brittle or when the client UX wants
    to own the login ceremony.
    
    ## Mental model
    
    This change adds a second ChatGPT login start path to app-server:
    clients can now call `account/login/start` with `type:
    "chatgptDeviceCode"`. App-server immediately returns a `loginId` plus
    the device-code UX payload (`verificationUrl` and `userCode`), then
    completes the login asynchronously in the background using the existing
    `codex_login` polling flow. Successful device-code login still resolves
    to ordinary `chatgpt` auth, and completion continues to flow through the
    existing `account/login/completed` and `account/updated` notifications.
    
    ## Non-goals
    
    This does not introduce a new auth mode, a new account shape, or a
    device-code eligibility discovery API. It also does not add automatic
    fallback to browser login in core; clients remain responsible for
    choosing when to request device code and whether to retry with a
    different UX if the backend/admin policy rejects it.
    
    ## Tradeoffs
    
    We intentionally keep `login_chatgpt_common` as a local validation
    helper instead of turning it into a capability probe. Device-code
    eligibility is checked by actually calling `request_device_code`, which
    means policy-disabled cases surface as an immediate request error rather
    than an async completion event. We also keep the active-login state
    machine minimal: browser and device-code logins share the same public
    cancel contract, but device-code cancellation is implemented with a
    local cancel token rather than a larger cross-crate refactor.
    
    ## Architecture
    
    The protocol grows a new `chatgptDeviceCode` request/response variant in
    app-server v2. On the server side, the new handler reuses the existing
    ChatGPT login precondition checks, calls `request_device_code`, returns
    the device-code payload, and then spawns a background task that waits on
    either cancellation or `complete_device_code_login`. On success, it
    reuses the existing auth reload and cloud-requirements refresh path
    before emitting `account/login/completed` success and `account/updated`.
    On failure or cancellation, it emits only `account/login/completed`
    failure. The existing `account/login/cancel { loginId }` contract
    remains unchanged and now works for both browser and device-code
    attempts.
    
    
    ## Tests
    
    Added protocol serialization coverage for the new request/response
    variant, plus app-server tests for device-code success, failure, cancel,
    and start-time rejection behavior. Existing browser ChatGPT login
    coverage remains in place to show that the callback-based flow is
    unchanged.
  • chore: refactor network permissions to use explicit domain and unix socket rule maps (#15120)
    ## Summary
    
    This PR replaces the legacy network allow/deny list model with explicit
    rule maps for domains and unix sockets across managed requirements,
    permissions profiles, the network proxy config, and the app server
    protocol.
    
    Concretely, it:
    
    - introduces typed domain (`allow` / `deny`) and unix socket permission
    (`allow` / `none`) entries instead of separate `allowed_domains`,
    `denied_domains`, and `allow_unix_sockets` lists
    - updates config loading, managed requirements merging, and exec-policy
    overlays to read and upsert rule entries consistently
    - exposes the new shape through protocol/schema outputs, debug surfaces,
    and app-server config APIs
    - rejects the legacy list-based keys and updates docs/tests to reflect
    the new config format
    
    ## Why
    
    The previous representation split related network policy across multiple
    parallel lists, which made merging and overriding rules harder to reason
    about. Moving to explicit keyed permission maps gives us a single source
    of truth per host/socket entry, makes allow/deny precedence clearer, and
    gives protocol consumers access to the full rule state instead of
    derived projections only.
    
    ## Backward Compatibility
    
    ### Backward compatible
    
    - Managed requirements still accept the legacy
    `experimental_network.allowed_domains`,
    `experimental_network.denied_domains`, and
    `experimental_network.allow_unix_sockets` fields. They are normalized
    into the new canonical `domains` and `unix_sockets` maps internally.
    - App-server v2 still deserializes legacy `allowedDomains`,
    `deniedDomains`, and `allowUnixSockets` payloads, so older clients can
    continue reading managed network requirements.
    - App-server v2 responses still populate `allowedDomains`,
    `deniedDomains`, and `allowUnixSockets` as legacy compatibility views
    derived from the canonical maps.
    - `managed_allowed_domains_only` keeps the same behavior after
    normalization. Legacy managed allowlists still participate in the same
    enforcement path as canonical `domains` entries.
    
    ### Not backward compatible
    
    - Permissions profiles under `[permissions.<profile>.network]` no longer
    accept the legacy list-based keys. Those configs must use the canonical
    `[domains]` and `[unix_sockets]` tables instead of `allowed_domains`,
    `denied_domains`, or `allow_unix_sockets`.
    - Managed `experimental_network` config cannot mix canonical and legacy
    forms in the same block. For example, `domains` cannot be combined with
    `allowed_domains` or `denied_domains`, and `unix_sockets` cannot be
    combined with `allow_unix_sockets`.
    - The canonical format can express explicit `"none"` entries for unix
    sockets, but those entries do not round-trip through the legacy
    compatibility fields because the legacy fields only represent allow/deny
    lists.
    ## Testing
    `/target/debug/codex sandbox macos --log-denials /bin/zsh -c 'curl
    https://www.example.com' ` gives 200 with config
    ```
    [permissions.workspace.network.domains]
    "www.example.com" = "allow"
    ```
    and fails when set to deny: `curl: (56) CONNECT tunnel failed, response
    403`.
    
    Also tested backward compatibility path by verifying that adding the
    following to `/etc/codex/requirements.toml` works:
    ```
    [experimental_network]
    allowed_domains = ["www.example.com"]
    ```
  • [app-server-protocol] introduce generic ClientResponse for app-server-protocol (#15921)
    - introduces `ClientResponse` as the symmetrical typed response union to
    `ClientRequest` for app-server-protocol
    - enables scalable event stream ingestion for use cases such as
    analytics
    - no runtime behavior changes, protocol/schema plumbing only
  • permissions: remove macOS seatbelt extension profiles (#15918)
    ## Why
    
    `PermissionProfile` should only describe the per-command permissions we
    still want to grant dynamically. Keeping
    `MacOsSeatbeltProfileExtensions` in that surface forced extra macOS-only
    approval, protocol, schema, and TUI branches for a capability we no
    longer want to expose.
    
    ## What changed
    
    - Removed the macOS-specific permission-profile types from
    `codex-protocol`, the app-server v2 API, and the generated
    schema/TypeScript artifacts.
    - Deleted the core and sandboxing plumbing that threaded
    `MacOsSeatbeltProfileExtensions` through execution requests and seatbelt
    construction.
    - Simplified macOS seatbelt generation so it always includes the fixed
    read-only preferences allowlist instead of carrying a configurable
    profile extension.
    - Removed the macOS additional-permissions UI/docs/test coverage and
    deleted the obsolete macOS permission modules.
    - Tightened `request_permissions` intersection handling so explicitly
    empty requested read lists are preserved only when that field was
    actually granted, avoiding zero-grant responses being stored as active
    permissions.
  • chore: remove skill metadata from command approval payloads (#15906)
    ## Why
    
    This is effectively a follow-up to
    [#15812](https://github.com/openai/codex/pull/15812). That change
    removed the special skill-script exec path, but `skill_metadata` was
    still being threaded through command-approval payloads even though the
    approval flow no longer uses it to render prompts or resolve decisions.
    
    Keeping it around added extra protocol, schema, and client surface area
    without changing behavior.
    
    Removing it keeps the command-approval contract smaller and avoids
    carrying a dead field through app-server, TUI, and MCP boundaries.
    
    ## What changed
    
    - removed `ExecApprovalRequestSkillMetadata` and the corresponding
    `skillMetadata` field from core approval events and the v2 app-server
    protocol
    - removed the generated JSON and TypeScript schema output for that field
    - updated app-server, MCP server, TUI, and TUI app-server approval
    plumbing to stop forwarding the field
    - cleaned up tests that previously constructed or asserted
    `skillMetadata`
    
    ## Testing
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-app-server-test-client`
    - `cargo test -p codex-mcp-server`
    - `just argument-comment-lint`
  • [hooks] add non-streaming (non-stdin style) shell-only PostToolUse support (#15531)
    CHAINED PR - note that base is eternal/hooks-pretooluse-bash, not main
    -- so the following PR should be first
    
    Matching post-tool hook to the pre-tool functionality here:
    https://github.com/openai/codex/pull/15211
    
    So, PreToolUse calls for plain shell calls, allows blocking. This
    PostToolUse call runs after the command executed
    
    example run:
    
    ```
    › as a test, run in parallel the following commands:
      - echo 'one'
      - echo '[block-pre-tool-use]'
      - echo '[block-post-tool-use]'
    
    
    ⚠ MCP startup incomplete (failed: notion, linear)
    
    • Cruising through those three commands in parallel now, and I’ll share the exact outputs right after
      they land.
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    • Running PreToolUse hook: checking the observatory runes
    
    PreToolUse hook (blocked)
      warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
      feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo 'one'
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo '[block-post-tool-use]'
    
    • Ran echo '[block-post-tool-use]'
      └ [block-post-tool-use]
    
    • Ran echo 'one'
      └ one
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    • Running PostToolUse hook: reading the observatory aftermath
    
    PostToolUse hook (blocked)
      warning: wizard-tower PostToolUse demo blocked the Bash follow-through on purpose.
      feedback: PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to
    continue.
    
    PostToolUse hook (completed)
      warning: wizard-tower PostToolUse demo inspected Bash: echo 'one'
      hook context: PostToolUse demo saw the Bash result. Response preview: one
    
    ──────────────────────────────────────────────────────────────────────────────────────────────────────
    
    • Ran them in parallel. Little wave report:
    
      - echo 'one' completed and returned one.
      - echo '[block-pre-tool-use]' was blocked before execution by the PreToolUse hook.
      - echo '[block-post-tool-use]' ran, but its result was blocked afterward by the PostToolUse hook.
    
      The hook messages were:
    
      - PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
      - PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue.
    
    
    › for the blocked messages (correctly blocked in this test) did the tool output land back into
      context?
    
    
    • Yeah, from this run:
    
      - PreToolUse block: the command itself did not execute, so no command output landed in context. Only
        the block message did.
      - PostToolUse block: the command did execute, but the actual stdout did not come back into context.
        Only the post-tool block message did.
    
      So the blocked notifications landed in context, but the blocked command payloads themselves did not.
      Nice clean guardrail, aloha.
    ```
  • [app-server] Add a method to override feature flags. (#15601)
    - [x] Add a method to override feature flags globally and not just
    thread level.
  • app-server: Return codex home in initialize response (#15689)
    This allows clients to get enough information to interact with the codex
    skills/configuration/etc.
  • app-server: add filesystem watch support (#14533)
    ### Summary
    Add the v2 app-server filesystem watch RPCs and notifications, wire them
    through the message processor, and implement connection-scoped watches
    with notify-backed change delivery. This also updates the schema
    fixtures, app-server documentation, and the v2 integration coverage for
    watch and unwatch behavior.
    
    This allows clients to efficiently watch for filesystem updates, e.g. to
    react on branch changes.
    
    ### Testing
    - exercise watch lifecycles for directory changes, atomic file
    replacement, missing-file targets, and unwatch cleanup
  • Move git utilities into a dedicated crate (#15564)
    - create `codex-git-utils` and move the shared git helpers into it with
    file moves preserved for diff readability
    - move the `GitInfo` helpers out of `core` so stacked rollout work can
    depend on the shared crate without carrying its own git info module
    
    ---------
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Add fork snapshot modes (#15239)
    ## Summary
    - add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can
    request either a committed snapshot or an interrupted snapshot
    - share the model-visible `<turn_aborted>` history marker between the
    live interrupt path and interrupted forks
    - update the small set of direct fork callsites to pass
    `ForkSnapshotMode::Committed`
    
    Note: this enables /btw to work similarly as Esc to interrupt (hopefully
    somewhat in distribution)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [hooks] add non-streaming (non-stdin style) shell-only PreToolUse support (#15211)
    - add `PreToolUse` hook for bash-like tool execution only at first
    - block shell execution before dispatch with deny-only hook behavior
    - introduces common.rs matcher framework for matching when hooks are run
    
    example run:
    
    ```
    › run three parallel echo commands, and the second one should echo "[block-pre-tool-use]" as a test
    
    
    • Running the three echo commands in parallel now and I’ll report the output directly.
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    • Running PreToolUse hook: name for demo pre tool use hook
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo "first parallel echo"
      
    PreToolUse hook (blocked)
      warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
      feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
    
    PreToolUse hook (completed)
      warning: wizard-tower PreToolUse demo inspected Bash: echo "third parallel echo"
    
    • Ran echo "first parallel echo"
      └ first parallel echo
    
    • Ran echo "third parallel echo"
      └ third parallel echo
    
    • Three little waves went out in parallel.
    
      1. printed first parallel echo
      2. was blocked before execution because it contained the exact test string [block-pre-tool-use]
      3. printed third parallel echo
    
      There was also an unrelated macOS defaults warning around the successful commands, but the echoes
      themselves worked fine. If you want, I can rerun the second one with a slightly modified string so
      it passes cleanly.
    ```
  • feat: support disable skills by name. (#15378)
    Support disabling skills by name, primarily for plugin skills. We can’t
    use the path, since plugin skill paths may change across versions.
  • tui: queue follow-ups during manual /compact (#15259)
    ## Summary
    - queue input after the user submits `/compact` until that manual
    compact turn ends
    - mirror the same behavior in the app-server TUI
    - add regressions for input queued before compact starts and while it is
    running
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add realtime transcript notification in v2 (#15344)
    - emit a typed `thread/realtime/transcriptUpdated` notification from
    live realtime transcript deltas
    - expose that notification as flat `threadId`, `role`, and `text` fields
    instead of a nested transcript array
    - continue forwarding raw `handoff_request` items on
    `thread/realtime/itemAdded`, including the accumulated
    `active_transcript`
    - update app-server docs, tests, and generated protocol schema artifacts
    to match the delta-based payloads
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: change multi-agent to use path-like system instead of uuids (#15313)
    This PR add an URI-based system to reference agents within a tree. This
    comes from a sync between research and engineering.
    
    The main agent (the one manually spawned by a user) is always called
    `/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
    where `agent_1` is chosen by the model.
    
    Any agent can contact any agents using the path.
    
    Paths can be used either in absolute or relative to the calling agents
    
    Resume is not supported for now on this new path
  • Feat/restore image generation history (#15223)
    Restore image generation items in resumed thread history
  • feat(app-server): add mcpServer/startupStatus/updated notification (#15220)
    Exposes the legacy `codex/event/mcp_startup_update` event as an API v2
    notification.
    
    The legacy event has this shape:
    ```
    #[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
    pub struct McpStartupUpdateEvent {
        /// Server name being started.
        pub server: String,
        /// Current startup status.
        pub status: McpStartupStatus,
    }
    
    #[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
    #[serde(rename_all = "snake_case", tag = "state")]
    #[ts(rename_all = "snake_case", tag = "state")]
    pub enum McpStartupStatus {
        Starting,
        Ready,
        Failed { error: String },
        Cancelled,
    }
    ```
  • [hooks] use a user message > developer message for prompt continuation (#14867)
    ## Summary
    
    Persist Stop-hook continuation prompts as `user` messages instead of
    hidden `developer` messages + some requested integration tests
    
    This is a followup to @pakrym 's comment in
    https://github.com/openai/codex/pull/14532 to make sure stop-block
    continuation prompts match training for turn loops
    
    - Stop continuation now writes `<hook_prompt hook_run_id="...">stop
    hook's user prompt<hook_prompt>`
    - Introduces quick-xml dependency, though we already indirectly depended
    on it anyway via syntect
    - This PR only has about 500 lines of actual logic changes, the rest is
    tests/schema
    
    ## Testing
    
    Example run (with a sessionstart hook and 3 stop hooks) - this shows
    context added by session start, then two stop hooks sending their own
    additional prompts in a new turn. The model responds with a single
    message addressing both. Then when that turn ends, the hooks detect that
    they just ran using `stop_hook_active` and decide not to infinite loop
    
    test files for this (unzip, move codex -> .codex):
    [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)
    
    ```
    › cats
    
    
    • Running SessionStart hook: lighting the observatory
    
    SessionStart hook (completed)
      warning: Hi, I'm a session start hook for wizard-tower (startup).
      hook context: A wimboltine stonpet is an exotic cuisine from hyperspace
    
    • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
      cat facts, cat breeds, cat names, or build something cat-themed in this repo.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: cook the stonpet
    
    Stop hook (blocked)
      warning: Wizard Tower Stop hook continuing conversation
      feedback: eat the cooked stonpet
    
    • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
      rested until the hyperspace juices settle.
    
      Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
      smoky, bright, and totally out of this dimension.
    
    • Running Stop hook: checking the tower wards
    
    • Running Stop hook: sacking the guards
    
    • Running Stop hook: hiring the guards
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    
    Stop hook (completed)
      warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
    ```
  • feat: support product-scoped plugins. (#15041)
    1. Added SessionSource::Custom(String) and --session-source.
      2. Enforced plugin and skill products by session_source.
      3. Applied the same filtering to curated background refresh.
  • Add thread/shellCommand to app server API surface (#14988)
    This PR adds a new `thread/shellCommand` app server API so clients can
    implement `!` shell commands. These commands are executed within the
    sandbox, and the command text and output are visible to the model.
    
    The internal implementation mirrors the current TUI `!` behavior.
    - persist shell command execution as `CommandExecution` thread items,
    including source and formatted output metadata
    - bridge live and replayed app-server command execution events back into
    the existing `tui_app_server` exec rendering path
    
    This PR also wires `tui_app_server` to submit `!` commands through the
    new API.
  • Simple directory mentions (#14970)
    - Adds simple support for directory mentions in the TUI.
    - Codex App/VS Code will require minor change to recognize a directory
    mention as such and change the link behavior.
    - Directory mentions have a trailing slash to differentiate from
    extensionless files
    
    
    <img width="972" height="382" alt="image"
    src="https://github.com/user-attachments/assets/8035b1eb-0978-465b-8d7a-4db2e5feca39"
    />
    <img width="978" height="228" alt="image"
    src="https://github.com/user-attachments/assets/af22cf0b-dd10-4440-9bee-a09915f6ba52"
    />
  • Revert "fix: harden plugin feature gating" (#15102)
    Reverts openai/codex#15020
    
    I messed up the commit in my PR and accidentally merged changes that
    were still under review.
  • fix: harden plugin feature gating (#15020)
    1. Use requirement-resolved config.features as the plugin gate.
    2. Guard plugin/list, plugin/read, and related flows behind that gate.
    3. Skip bad marketplace.json files instead of failing the whole list.
    4. Simplify plugin state and caching.
  • Add notify to code-mode (#14842)
    Allows model to send an out-of-band notification.
    
    The notification is injected as another tool call output for the same
    call_id.
  • [hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
    - this allows blocking the user's prompts from executing, and also
    prevents them from entering history
    - handles the edge case where you can both prevent the user's prompt AND
    add n amount of additionalContexts
    - refactors some old code into common.rs where hooks overlap
    functionality
    - refactors additionalContext being previously added to user messages,
    instead we use developer messages for them
    - handles queued messages correctly
    
    Sample hook for testing - if you write "[block-user-submit]" this hook
    will stop the thread:
    
    example run
    ```
    › sup
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (completed)
      warning: wizard-tower UserPromptSubmit demo inspected: sup
      hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
    phrase 'observatory lanterns lit' exactly once near the end.
    
    • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
      lanterns lit
    
    
    › and [block-user-submit]
    
    
    • Running UserPromptSubmit hook: reading the observatory notes
    
    UserPromptSubmit hook (stopped)
      warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
      stop: Wizard Tower demo block: remove [block-user-submit] to continue.
    ```
    
    .codex/config.toml
    ```
    [features]
    codex_hooks = true
    ```
    
    .codex/hooks.json
    ```
    {
      "hooks": {
        "UserPromptSubmit": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
                "timeoutSec": 10,
                "statusMessage": "reading the observatory notes"
              }
            ]
          }
        ]
      }
    }
    ```
    
    .codex/hooks/user_prompt_submit_demo.py
    ```
    #!/usr/bin/env python3
    
    import json
    import sys
    from pathlib import Path
    
    
    def prompt_from_payload(payload: dict) -> str:
        prompt = payload.get("prompt")
        if isinstance(prompt, str) and prompt.strip():
            return prompt.strip()
    
        event = payload.get("event")
        if isinstance(event, dict):
            user_prompt = event.get("user_prompt")
            if isinstance(user_prompt, str):
                return user_prompt.strip()
    
        return ""
    
    
    def main() -> int:
        payload = json.load(sys.stdin)
        prompt = prompt_from_payload(payload)
        cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
    
        if "[block-user-submit]" in prompt:
            print(
                json.dumps(
                    {
                        "systemMessage": (
                            f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                        ),
                        "decision": "block",
                        "reason": (
                            "Wizard Tower demo block: remove [block-user-submit] to continue."
                        ),
                    }
                )
            )
            return 0
    
        prompt_preview = prompt or "(empty prompt)"
        if len(prompt_preview) > 80:
            prompt_preview = f"{prompt_preview[:77]}..."
    
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                    ),
                    "hookSpecificOutput": {
                        "hookEventName": "UserPromptSubmit",
                        "additionalContext": (
                            "Wizard Tower UserPromptSubmit demo fired. "
                            "For this reply only, include the exact phrase "
                            "'observatory lanterns lit' exactly once near the end."
                        ),
                    },
                }
            )
        )
        return 0
    
    
    if __name__ == "__main__":
        raise SystemExit(main())
    ```
  • Gate realtime audio interruption logic to v2 (#14984)
    - thread the realtime version into conversation start and app-server
    notifications
    - keep playback-aware mic gating and playback interruption behavior on
    v2 only, leaving v1 on the legacy path
  • Cleanup skills/remote/xxx endpoints. (#14977)
    Remote skills/remote/xxx as they are not in used for now.
  • generate an internal json schema for RolloutLine (#14434)
    ### Why
    i'm working on something that parses and analyzes codex rollout logs,
    and i'd like to have a schema for generating a parser/validator.
    
    `codex app-server generate-internal-json-schema` writes an
    `RolloutLine.json` file
    
    while doing this, i noticed we have a writer <> reader mismatch issue on
    `FunctionCallOutputPayload` and reasoning item ID -- added some schemars
    annotations to fix those
    
    ### Test
    
    ```
    $ just codex app-server generate-internal-json-schema --out ./foo
    ```
    
    generates an `RolloutLine.json` file, which i validated against jsonl
    files on disk
    
    `just codex app-server --help` doesn't expose the
    `generate-internal-json-schema` option by default, but you can do `just
    codex app-server generate-internal-json-schema --help` if you know the
    command
    
    everything else still works
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • fix: align marketplace display name with existing interface conventions (#14886)
    1. camelCase for displayName;
    2. move displayName under interface.
  • [stack 2/4] Align main realtime v2 wire and runtime flow (#14830)
    ## Stack Position
    2/4. Built on top of #14828.
    
    ## Base
    - #14828
    
    ## Unblocks
    - #14829
    - #14827
    
    ## Scope
    - Port the realtime v2 wire parsing, session, app-server, and
    conversation runtime behavior onto the split websocket-method base.
    - Branch runtime behavior directly on the current realtime session kind
    instead of parser-derived flow flags.
    - Keep regression coverage in the existing e2e suites.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: support remote_sync for plugin install/uninstall. (#14878)
    - Added forceRemoteSync to plugin/install and plugin/uninstall.
    - With forceRemoteSync=true, we update the remote plugin status first,
    then apply the local change only if the backend call succeeds.
    - Kept plugin/list(forceRemoteSync=true) as the main recon path, and for
    now it treats remote enabled=false as uninstall. We
    will eventually migrate to plugin/installed for more precise state
    handling.
  • Add marketplace display names to plugin/list (#14861)
    Add display_name support to marketplace.json.
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • feat: make interrupt state not final for multi-agents (#13850)
    Make `interrupted` an agent state and make it not final. As a result, a
    `wait` won't return on an interrupted agent and no notification will be
    send to the parent agent.
    
    The rationals are:
    * If a user interrupt a sub-agent for any reason, you don't want the
    parent agent to instantaneously ask the sub-agent to restart
    * If a parent agent interrupt a sub-agent, no need to add a noisy
    notification in the parent agen
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool calling
    runtime (for ex from js_repl/code_mode)
  • make defaultPrompt an array, keep backcompat (#14649)
    make plugins' `defaultPrompt` an array, but keep backcompat for strings.
    
    the array is limited by app-server to 3 entries of up to 128 chars
    (drops extra entries, `None`s-out ones that are too long) without
    erroring if those invariants are violating.
    
    added tests, tested locally.
  • Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
    ## Summary
    - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
    control for who reviews approval requests
    - route Smart Approvals guardian review through core for command
    execution, file changes, managed-network approvals, MCP approvals, and
    delegated/subagent approval flows
    - expose guardian review in app-server with temporary unstable
    `item/autoApprovalReview/{started,completed}` notifications carrying
    `targetItemId`, `review`, and `action`
    - update the TUI so Smart Approvals can be enabled from `/experimental`,
    aligned with the matching `/approvals` mode, and surfaced clearly while
    reviews are pending or resolved
    
    ## Runtime model
    This PR does not introduce a new `approval_policy`.
    
    Instead:
    - `approval_policy` still controls when approval is needed
    - `approvals_reviewer` controls who reviewable approval requests are
    routed to:
      - `user`
      - `guardian_subagent`
    
    `guardian_subagent` is a carefully prompted reviewer subagent that
    gathers relevant context and applies a risk-based decision framework
    before approving or denying the request.
    
    The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
    behavior keys off `approvals_reviewer`.
    
    When Smart Approvals is enabled from the TUI, it also switches the
    current `/approvals` settings to the matching Smart Approvals mode so
    users immediately see guardian review in the active thread:
    - `approval_policy = on-request`
    - `approvals_reviewer = guardian_subagent`
    - `sandbox_mode = workspace-write`
    
    Users can still change `/approvals` afterward.
    
    Config-load behavior stays intentionally narrow:
    - plain `smart_approvals = true` in `config.toml` remains just the
    rollout/UI gate and does not auto-set `approvals_reviewer`
    - the deprecated `guardian_approval = true` alias migration does
    backfill `approvals_reviewer = "guardian_subagent"` in the same scope
    when that reviewer is not already configured there, so old configs
    preserve their original guardian-enabled behavior
    
    ARC remains a separate safety check. For MCP tool approvals, ARC
    escalations now flow into the configured reviewer instead of always
    bypassing guardian and forcing manual review.
    
    ## Config stability
    The runtime reviewer override is stable, but the config-backed
    app-server protocol shape is still settling.
    
    - `thread/start`, `thread/resume`, and `turn/start` keep stable
    `approvalsReviewer` overrides
    - the config-backed `approvals_reviewer` exposure returned via
    `config/read` (including profile-level config) is now marked
    `[UNSTABLE]` / experimental in the app-server protocol until we are more
    confident in that config surface
    
    ## App-server surface
    This PR intentionally keeps the guardian app-server shape narrow and
    temporary.
    
    It adds generic unstable lifecycle notifications:
    - `item/autoApprovalReview/started`
    - `item/autoApprovalReview/completed`
    
    with payloads of the form:
    - `{ threadId, turnId, targetItemId, review, action? }`
    
    `review` is currently:
    - `{ status, riskScore?, riskLevel?, rationale? }`
    - where `status` is one of `inProgress`, `approved`, `denied`, or
    `aborted`
    
    `action` carries the guardian action summary payload from core when
    available. This lets clients render temporary standalone pending-review
    UI, including parallel reviews, even when the underlying tool item has
    not been emitted yet.
    
    These notifications are explicitly documented as `[UNSTABLE]` and
    expected to change soon.
    
    This PR does **not** persist guardian review state onto `thread/read`
    tool items. The intended follow-up is to attach guardian review state to
    the reviewed tool item lifecycle instead, which would improve
    consistency with manual approvals and allow thread history / reconnect
    flows to replay guardian review state directly.
    
    ## TUI behavior
    - `/experimental` exposes the rollout gate as `Smart Approvals`
    - enabling it in the TUI enables the feature and switches the current
    session to the matching Smart Approvals `/approvals` mode
    - disabling it in the TUI clears the persisted `approvals_reviewer`
    override when appropriate and returns the session to default manual
    review when the effective reviewer changes
    - `/approvals` still exposes the reviewer choice directly
    - the TUI renders:
    - pending guardian review state in the live status footer, including
    parallel review aggregation
      - resolved approval/denial state in history
    
    ## Scope notes
    This PR includes the supporting core/runtime work needed to make Smart
    Approvals usable end-to-end:
    - shell / unified-exec / apply_patch / managed-network / MCP guardian
    review
    - delegated/subagent approval routing into guardian review
    - guardian review risk metadata and action summaries for app-server/TUI
    - config/profile/TUI handling for `smart_approvals`, `guardian_approval`
    alias migration, and `approvals_reviewer`
    - a small internal cleanup of delegated approval forwarding to dedupe
    fallback paths and simplify guardian-vs-parent approval waiting (no
    intended behavior change)
    
    Out of scope for this PR:
    - redesigning the existing manual approval protocol shapes
    - persisting guardian review state onto app-server `ThreadItem`s
    - delegated MCP elicitation auto-review (the current delegated MCP
    guardian shim only covers the legacy `RequestUserInput` path)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: add v2 filesystem APIs (#14245)
    Add a protocol-level filesystem surface to the v2 app-server so Codex
    clients can read and write files, inspect directories, and subscribe to
    path changes without relying on host-specific helpers.
    
    High-level changes:
    - define the new v2 fs/readFile, fs/writeFile, fs/createDirectory,
    fs/getMetadata, fs/readDirectory, fs/remove, fs/copy RPCs
    - implement the app-server handlers, including absolute-path validation,
    base64 file payloads, recursive copy/remove semantics
    - document the API, regenerate protocol schemas/types, and add
    end-to-end tests for filesystem operations, copy edge cases
    
    Testing plan:
    - validate protocol serialization and generated schema output for the
    new fs request, response, and notification types
    - run app-server integration coverage for file and directory CRUD paths,
    metadata/readDirectory responses, copy failure modes, and absolute-path
    validation
  • feat(app-server, core): add more spans (#14479)
    ## Description
    
    This PR expands tracing coverage across app-server thread startup, core
    session initialization, and the Responses transport layer. It also gives
    core dispatch spans stable operation-specific names so traces are easier
    to follow than the old generic `submission_dispatch` spans.
    
    Also use `fmt::Display` for types that we serialize in traces so we send
    strings instead of rust types
  • app-server: Add platform os and family to init response (#14527)
    This allows the client to pick os-specific behavior while interacting
    with the app server, e.g. to use proper path separators.
  • feat: add plugin/read. (#14445)
    return more information for a specific plugin.