Commit Graph

237 Commits

  • [tui] Show speed in session header (#13446)
    - add a speed row to the startup/session header under the model row
    - render the speed row with the same styling pattern as the model row,
    using /fast to change
    - show only Fast or Standard to users and update the affected snapshots
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Preserve persisted thread git info in resume (#13504)
    ## Summary
    - ensure `thread.resume` reuses the stored `gitInfo` instead of
    rebuilding it from the live working tree
    - persist and apply thread git metadata through the resume flow and add
    a regression test covering branch mismatch cases
    
    ## Testing
    - Not run (not requested)
  • plugin: support local-based marketplace.json + install endpoint. (#13422)
    Support marketplace.json that points to a local file, with
    ```
        "source":
        {
            "source": "local",
            "path": "./plugin-1"
        },
    ```
    
    Add a new plugin/install endpoint that adds the plugin to the cache folder and enables it in config.toml.
  • allow apps to specify cwd for sandbox setup. (#13484)
    The electron app doesn't start up the app-server in a particular
    workspace directory.
    So sandbox setup happens in the app-installed directory instead of the
    project workspace.
    
    This allows the app to specify the workspace cwd so that the sandbox
    setup actually sets up the ACLs instead of exiting fast and then having
    the first shell command be slow.
  • config: enforce enterprise feature requirements (#13388)
    ## Why
    
    Enterprises can already constrain approvals, sandboxing, and web search
    through `requirements.toml` and MDM, but feature flags were still only
    configurable as managed defaults. That meant an enterprise could suggest
    feature values, but it could not actually pin them.
    
    This change closes that gap and makes enterprise feature requirements
    behave like the other constrained settings. The effective feature set
    now stays consistent with enterprise requirements during config load,
    when config writes are validated, and when runtime code mutates feature
    flags later in the session.
    
    It also tightens the runtime API for managed features. `ManagedFeatures`
    now follows the same constraint-oriented shape as `Constrained<T>`
    instead of exposing panic-prone mutation helpers, and production code
    can no longer construct it through an unconstrained `From<Features>`
    path.
    
    The PR also hardens the `compact_resume_fork` integration coverage on
    Windows. After the feature-management changes,
    `compact_resume_after_second_compaction_preserves_history` was
    overflowing the libtest/Tokio thread stacks on Windows, so the test now
    uses an explicit larger-stack harness as a pragmatic mitigation. That
    may not be the ideal root-cause fix, and it merits a parallel
    investigation into whether part of the async future chain should be
    boxed to reduce stack pressure instead.
    
    ## What Changed
    
    Enterprises can now pin feature values in `requirements.toml` with the
    requirements-side `features` table:
    
    ```toml
    [features]
    personality = true
    unified_exec = false
    ```
    
    Only canonical feature keys are allowed in the requirements `features`
    table; omitted keys remain unconstrained.
    
    - Added a requirements-side pinned feature map to
    `ConfigRequirementsToml`, threaded it through source-preserving
    requirements merge and normalization in `codex-config`, and made the
    TOML surface use `[features]` (while still accepting legacy
    `[feature_requirements]` for compatibility).
    - Exposed `featureRequirements` from `configRequirements/read`,
    regenerated the JSON/TypeScript schema artifacts, and updated the
    app-server README.
    - Wrapped the effective feature set in `ManagedFeatures`, backed by
    `ConstrainedWithSource<Features>`, and changed its API to mirror
    `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
    and result-returning `enable` / `disable` / `set_enabled` helpers.
    - Removed the legacy-usage and bulk-map passthroughs from
    `ManagedFeatures`; callers that need those behaviors now mutate a plain
    `Features` value and reapply it through `set(...)`, so the constrained
    wrapper remains the enforcement boundary.
    - Removed the production loophole for constructing unconstrained
    `ManagedFeatures`. Non-test code now creates it through the configured
    feature-loading path, and `impl From<Features> for ManagedFeatures` is
    restricted to `#[cfg(test)]`.
    - Rejected legacy feature aliases in enterprise feature requirements,
    and return a load error when a pinned combination cannot survive
    dependency normalization.
    - Validated config writes against enterprise feature requirements before
    persisting changes, including explicit conflicting writes and
    profile-specific feature states that normalize into invalid
    combinations.
    - Updated runtime and TUI feature-toggle paths to use the constrained
    setter API and to persist or apply the effective post-constraint value
    rather than the requested value.
    - Updated the `core_test_support` Bazel target to include the bundled
    core model-catalog fixtures in its runtime data, so helper code that
    resolves `core/models.json` through runfiles works in remote Bazel test
    environments.
    - Renamed the core config test coverage to emphasize that effective
    feature values are normalized at runtime, while conflicting persisted
    config writes are rejected.
    - Ran `compact_resume_after_second_compaction_preserves_history` inside
    an explicit 8 MiB test thread and Tokio runtime worker stack, following
    the existing larger-stack integration-test pattern, to keep the Windows
    `compact_resume_fork` test slice from aborting while a parallel
    investigation continues into whether some of the underlying async
    futures should be boxed.
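    
    The constrained setter shape described above can be sketched roughly as
    follows. This is a minimal illustration, not the actual `ManagedFeatures`
    code: `ManagedFeaturesSketch`, the `BTreeMap`-based `Features` alias, and
    the `Result<(), String>` return type are stand-ins assumed for the example
    (the PR itself uses `ConstraintResult<()>`), and treating a missing key as
    disabled is also an assumption.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Hypothetical stand-in for the real feature set: feature key -> enabled.
    type Features = BTreeMap<String, bool>;
    
    /// Illustrative wrapper: values pinned by requirements cannot be changed.
    struct ManagedFeaturesSketch {
        value: Features,
        pinned: Features,
    }
    
    impl ManagedFeaturesSketch {
        /// Starts from the pinned values so the initial state is always valid.
        fn new(pinned: Features) -> Self {
            Self { value: pinned.clone(), pinned }
        }
    
        /// Checks a candidate feature set without applying it.
        /// Assumption: a key missing from `candidate` counts as disabled.
        fn can_set(&self, candidate: &Features) -> Result<(), String> {
            for (key, required) in &self.pinned {
                let requested = candidate.get(key).copied().unwrap_or(false);
                if requested != *required {
                    return Err(format!("feature `{key}` is pinned to {required}"));
                }
            }
            Ok(())
        }
    
        /// Replaces the whole feature set, or returns an error and leaves it untouched.
        fn set(&mut self, candidate: Features) -> Result<(), String> {
            self.can_set(&candidate)?;
            self.value = candidate;
            Ok(())
        }
    
        /// Result-returning toggle: mutate a plain copy, then reapply through `set`.
        fn set_enabled(&mut self, key: &str, enabled: bool) -> Result<(), String> {
            let mut next = self.value.clone();
            next.insert(key.to_string(), enabled);
            self.set(next)
        }
    }
    ```
    
    In this sketch every mutation goes through `set`, so the wrapper stays the
    single enforcement boundary and a rejected change never alters the stored
    value.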
    
    ## Verification
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core feature_requirements_ -- --nocapture`
    - `cargo test -p codex-core
    load_requirements_toml_produces_expected_constraints -- --nocapture`
    - `cargo test -p codex-core
    compact_resume_after_second_compaction_preserves_history -- --nocapture`
    - `cargo test -p codex-core compact_resume_fork -- --nocapture`
    - Re-ran the built `codex-core` `tests/all` binary with
    `RUST_MIN_STACK=262144` for
    `compact_resume_after_second_compaction_preserves_history` to confirm
    the explicit-stack harness fixes the deterministic low-stack repro.
    - `cargo test -p codex-core`
    - This still fails locally in unrelated integration areas that expect
    the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
    wiremock mismatches.
    
    ## Docs
    
    `developers.openai.com/codex` should document the requirements-side
    `[features]` table for enterprise and MDM-managed configuration,
    including that it only accepts canonical feature keys and that
    conflicting config writes are rejected.
  • Add thread metadata update endpoint to app server (#13280)
    ## Summary
    - add the v2 `thread/metadata/update` API, including
    protocol/schema/TypeScript exports and app-server docs
    - patch stored thread `gitInfo` in sqlite without resuming the thread,
    with validation plus support for explicit `null` clears
    - repair missing sqlite thread rows from rollout data before patching,
    and make those repairs safe by inserting only when absent and updating
    only git columns so newer metadata is not clobbered
    - keep sqlite authoritative for mutable thread git metadata by
    preserving existing sqlite git fields during reconcile/backfill and only
    using rollout `SessionMeta` git fields to fill gaps
    - add regression coverage for the endpoint, repair paths, concurrent
    sqlite writes, clearing git fields, and rollout/backfill reconciliation
    - fix the login server shutdown race so cancelling before the waiter
    starts still terminates `block_until_done()` correctly
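    
    The patch and reconcile rules above can be illustrated with a small
    sketch. It assumes a tri-state patch field (omitted, explicit `null`, or a
    value) modelled as `Option<Option<String>>`; the struct names, the field
    names, and the helper functions are hypothetical and are not taken from the
    actual `thread/metadata/update` implementation.
    
    ```rust
    /// Hypothetical stored git metadata for a thread.
    #[derive(Debug, Clone, PartialEq, Default)]
    struct GitInfoSketch {
        branch: Option<String>,
        sha: Option<String>,
        origin_url: Option<String>,
    }
    
    /// Outer `None` = field omitted (keep), `Some(None)` = explicit null (clear),
    /// `Some(Some(v))` = set to `v`.
    #[derive(Debug, Default)]
    struct GitInfoPatch {
        branch: Option<Option<String>>,
        sha: Option<Option<String>>,
        origin_url: Option<Option<String>>,
    }
    
    fn apply_field(current: &mut Option<String>, patch: Option<Option<String>>) {
        if let Some(next) = patch {
            *current = next;
        }
    }
    
    /// Applies a patch to the stored metadata, touching only the patched fields.
    fn apply_patch(stored: &mut GitInfoSketch, patch: GitInfoPatch) {
        apply_field(&mut stored.branch, patch.branch);
        apply_field(&mut stored.sha, patch.sha);
        apply_field(&mut stored.origin_url, patch.origin_url);
    }
    
    /// Reconcile step: rollout data only fills gaps; stored values always win.
    fn fill_gaps(stored: &mut GitInfoSketch, rollout: &GitInfoSketch) {
        if stored.branch.is_none() {
            stored.branch = rollout.branch.clone();
        }
        if stored.sha.is_none() {
            stored.sha = rollout.sha.clone();
        }
        if stored.origin_url.is_none() {
            stored.origin_url = rollout.origin_url.clone();
        }
    }
    ```
    
    Note that this sketch does not distinguish a field that was explicitly
    cleared from one that was never set, so `fill_gaps` would refill a cleared
    field; a real implementation may need to track that separately.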
    
    ## Testing
    - `cargo test -p codex-state
    apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-state
    update_thread_git_info_preserves_newer_non_git_metadata`
    - `cargo test -p codex-core
    backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
    - `cargo test -p codex-app-server thread_metadata_update`
    - `cargo test`
    - currently fails in existing `codex-core` grep-files tests with
    `unsupported call: grep_files`:
        - `suite::grep_files::grep_files_tool_collects_matches`
        - `suite::grep_files::grep_files_tool_reports_empty_results`
  • chore(app-server): delete v1 RPC methods and notifications (#13375)
    ## Summary
    This removes the old app-server v1 methods and notifications we no
    longer need, while keeping the small set the main codex app client still
    depends on for now.
    
    The remaining legacy surface is:
    - `initialize`
    - `getConversationSummary`
    - `getAuthStatus`
    - `gitDiffToRemote`
    - `fuzzyFileSearch`
    - `fuzzyFileSearch/sessionStart`
    - `fuzzyFileSearch/sessionUpdate`
    - `fuzzyFileSearch/sessionStop`
    
    And the raw `codex/event/*` notifications emitted from core. These
    notifications will be removed in a followup PR.
    
    ## What changed
    - removed deprecated v1 request variants from the protocol and
    app-server dispatcher
    - removed deprecated typed notifications: `authStatusChange`,
    `loginChatGptComplete`, and `sessionConfigured`
    - updated the app-server test client to use v2 flows instead of deleted
    v1 flows
    - deleted legacy-only app-server test suites and added focused coverage
    for `getConversationSummary`
    - regenerated app-server schema fixtures and updated the MCP interface
    docs to match the remaining compatibility surface
    
    ## Testing
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server`
  • app-server: source /feedback logs from sqlite at trace level (#12969)
    ## Summary
    - write app-server SQLite logs at TRACE level when SQLite is enabled
    - source app-server `/feedback` log attachments from SQLite for the
    requested thread when available
    - flush buffered SQLite log writes before `/feedback` queries them so
    newly emitted events are not lost behind the async inserter
    - include same-process threadless SQLite rows in those `/feedback` logs
    so the attachment matches the process-wide feedback buffer more closely
    - keep the existing in-memory ring buffer fallback unchanged, including
    when the SQLite query returns no rows
    
    ## Details
    - add a byte-bounded `query_feedback_logs` helper in `codex-state` so
    `/feedback` does not fetch all rows before truncating
    - scope SQLite feedback logs to the requested thread plus threadless
    rows from the same `process_uuid`
    - format exported SQLite feedback lines with the log level prefix to
    better match the in-memory feedback formatter
    - add an explicit `LogDbLayer::flush()` control path and await it in
    app-server before querying SQLite for feedback logs
    - pass optional SQLite log bytes through `codex-feedback` as the
    `codex-logs.log` attachment override
    - leave TUI behavior unchanged apart from the updated `upload_feedback`
    call signature
    - add regression coverage for:
      - newest-within-budget ordering
      - excluding oversized newest rows
      - including same-process threadless rows
      - keeping the newest suffix across mixed thread and threadless rows
      - matching the feedback formatter shape aside from span prefixes
      - falling back to the in-memory snapshot when SQLite returns no logs
      - flushing buffered SQLite rows before querying
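    
    As a rough illustration of the byte-bounded selection described above, the
    sketch below keeps the newest suffix of rows that fits a byte budget. The
    function name, the one-byte newline overhead, and the choice to stop at the
    first row that does not fit are assumptions for the example; the real
    `query_feedback_logs` helper performs its bounding in the SQLite query and
    may behave differently at the edges.
    
    ```rust
    /// Keeps the newest rows whose combined size fits in `max_bytes`.
    /// `rows` is assumed to be ordered oldest to newest.
    fn newest_within_budget<'a>(rows: &[&'a str], max_bytes: usize) -> Vec<&'a str> {
        let mut kept = Vec::new();
        let mut used = 0usize;
        // Walk from newest to oldest so the most recent rows are kept first.
        for row in rows.iter().rev() {
            let size = row.len() + 1; // +1 for the trailing newline
            if used + size > max_bytes {
                break; // an oversized row ends the suffix
            }
            used += size;
            kept.push(*row);
        }
        // Restore chronological order for the exported attachment.
        kept.reverse();
        kept
    }
    ```
    
    Under these assumptions, a newest row that alone exceeds the budget yields
    an empty result, which is the case where a caller would fall back to the
    in-memory snapshot.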
    
    ## Follow-up
    - SQLite feedback exports still do not reproduce span prefixes like
    `feedback-thread{thread_id=...}:`; there is a `TODO(ccunningham)` in
    `codex-rs/state/src/log_db.rs` for that follow-up.
    
    ## Testing
    - `cd codex-rs && cargo test -p codex-state`
    - `cd codex-rs && cargo test -p codex-app-server`
    - `cd codex-rs && just fmt`
  • app-server service tier plumbing (plus some cleanup) (#13334)
    followup to https://github.com/openai/codex/pull/13212 to expose fast
    tier controls to app server
    (majority of this PR is generated schema jsons - actual code is +69 /
    -35 and +24 tests )
    
    - add service tier fields to the app-server protocol surfaces used by
    thread lifecycle, turn start, config, and session configured events
    - thread service tier through the app-server message processor and core
    thread config snapshots
    - allow runtime config overrides to carry service tier for app-server
    callers
    
    cleanup:
    - Removing useless "legacy" code supporting "standard" - we moved to
    None | "fast", so "standard" is not needed.
  • add fast mode toggle (#13212)
    - add a local Fast mode setting in codex-core (similar to how model id
    is currently stored on disk locally)
    - send `service_tier=priority` on requests when Fast is enabled
    - add `/fast` in the TUI and persist it locally
    - feature flag
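    
    A minimal sketch of how an optional Fast setting could map onto the
    request parameter mentioned above. `ServiceTierSketch`,
    `service_tier_param`, and `request_fields` are hypothetical names for
    illustration; only the `service_tier=priority` value comes from this
    change.
    
    ```rust
    /// Hypothetical local setting: `None` means Fast is off.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ServiceTierSketch {
        Fast,
    }
    
    /// Maps the local setting to the wire value, or omits the parameter entirely.
    fn service_tier_param(tier: Option<ServiceTierSketch>) -> Option<&'static str> {
        match tier {
            Some(ServiceTierSketch::Fast) => Some("priority"),
            None => None,
        }
    }
    
    /// Builds illustrative request fields; `service_tier` is only added when Fast is on.
    fn request_fields(model: &str, tier: Option<ServiceTierSketch>) -> Vec<(String, String)> {
        let mut fields = vec![("model".to_string(), model.to_string())];
        if let Some(value) = service_tier_param(tier) {
            fields.push(("service_tier".to_string(), value.to_string()));
        }
        fields
    }
    ```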
  • app-server: Silence thread status changes caused by thread being created (#13079)
    Currently we emit `thread/status/changed` with `Idle` status right
    before sending the `thread/started` event (which also carries `Idle`
    status).
    That notification is redundant: the thread did not exist before, so the
    client has no prior state to compare against. Silence these
    notifications.
  • fix(app-server): emit turn/started only when turn actually starts (#13261)
    This is a follow-up for https://github.com/openai/codex/pull/13047
    
    ## Why
    We had a race where `turn/started` could be observed before the thread
    had actually transitioned to `Active`. This was because we eagerly
    emitted `turn/started` in the request handler for `turn/start` (and
    `review/start`).
    
    That was showing up as flaky `thread/resume` tests, but the real issue
    was broader: a client could see `turn/started` and still get back an
    idle thread immediately afterward.
    
    The first idea was to eagerly call
    `thread_watch_manager.note_turn_started(...)` from the `turn/start`
    request path. That turns out to be unsafe, because
    `submit(Op::UserInput)` only queues work. If a turn starts and completes
    quickly, request-path bookkeeping can race with the real lifecycle
    events and leave stale running state behind.
    
    **The real fix** is to move `turn/started` to emit only after the turn
    _actually_ starts, so we do that by waiting for the
    `EventMsg::TurnStarted` notification emitted by codex core. We do this
    for both `turn/start` and `review/start`.
    
    I also verified this change is safe for our first-party codex apps -
    they don't have any assumptions that `turn/started` is emitted before
    the RPC response to `turn/start` (which is correct anyway).
    
    I also removed `single_client_mode` since it isn't really necessary now.
    
    ## Testing
    - `cargo test -p codex-app-server thread_resume -- --nocapture`
    - `cargo test -p codex-app-server
    'suite::v2::turn_start::turn_start_emits_notifications_and_accepts_model_override'
    -- --exact --nocapture`
    - `cargo test -p codex-app-server`
  • app-server: Update thread/name/set to support not-loaded threads (#13282)
    Currently `thread/name/set` only works for loaded threads.
    Expand the scope to also support persisted but not-yet-loaded ones for a
    more predictable API surface.
    This will make it possible to rename threads discovered via
    `thread/list` and similar operations.
  • [codex] include plan type in account updates (#13181)
    This change fixes a Codex app account-state sync bug where clients could
    know the user was signed in but still miss the ChatGPT subscription
    tier, which could lead to incorrect upgrade messaging for paid users.
    
    The root cause was that `account/updated` only carried `authMode` while
    plan information was available separately via `account/read` and
    rate-limit snapshots, so this update adds `planType` to
    `account/updated`, populates it consistently across login and refresh
    paths.
  • feat: load from plugins (#12864)
    Support loading plugins.
    
    Plugins can now be enabled via [plugins.<name>] in config.toml. They are
    loaded as first-class entities through PluginsManager, and their default
    skills/ and .mcp.json contributions are integrated into the existing
    skills and MCP flows.
  • app-server: Add ephemeral field to Thread object (#13084)
    Currently there is no way to tell that a thread is ephemeral; only the
    client that created it knows.
  • fix(app-server): make thread/start non-blocking (#13033)
    Stop `thread/start` from blocking other app-server requests.
    
    Before this change, `thread/start` ran inline on the request loop, so
    slow startup paths like MCP auth checks could hold up unrelated requests
    on the same connection, including `thread/loaded/list`. This moves
    `thread/start` into a background task.
    
    While doing so, it revealed an issue where we were doing nested locking
    (and there were some race conditions possible that could introduce a
    "phantom listener"). This PR also refactors the listener/subscription
    bookkeeping - listener/subscription state is now centralized in
    `ThreadStateManager` instead of being split across multiple lock
    domains. That makes late auto-attach on `thread/start` race-safe and
    avoids reintroducing disconnected clients as phantom subscribers.
  • [apps] Stablize app list updated event. (#13067)
    Stabilize the app list updated event so that we send only two updates:
    one when installed apps become available, and one when all directory
    apps are available. Previously an update was also sent when directory
    apps became available before installed apps, which cut off installed
    apps.
  • app-server: Replay pending item requests on thread/resume (#12560)
    Replay pending client requests after `thread/resume` and emit resolved
    notifications when those requests clear so approval/input UI state stays
    in sync after reconnects and across subscribed clients.
    
    Affected RPCs:
    - `item/commandExecution/requestApproval`
    - `item/fileChange/requestApproval`
    - `item/tool/requestUserInput`
    
    Motivation:
    - Resumed clients need to see pending approval/input requests that were
    already outstanding before the reconnect.
    - Clients also need an explicit signal when a pending request resolves
    or is cleared so stale UI can be removed on turn start, completion, or
    interruption.
    
    Implementation notes:
    - Use pending client requests from `OutgoingMessageSender` in order to
    replay them after `thread/resume` attaches the connection, using
    original request ids.
    - Emit `serverRequest/resolved` when pending requests are answered
    or cleared by lifecycle cleanup.
    - Update the app-server protocol schema, generated TypeScript bindings,
    and README docs for the replay/resolution flow.
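    
    The replay and resolution flow above can be sketched as a small
    bookkeeping structure. Everything here is an assumption made for
    illustration: `PendingRequestsSketch`, the `Outgoing` enum, and the numeric
    ids do not mirror the real `OutgoingMessageSender`; only the idea of
    replaying with original request ids and emitting a resolved notification
    comes from the description.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Illustrative outgoing messages from server to client.
    #[derive(Debug, Clone, PartialEq)]
    enum Outgoing {
        Request { id: u64, method: String },
        Resolved { id: u64 },
    }
    
    /// Hypothetical tracker for client requests that have not been answered yet.
    #[derive(Default)]
    struct PendingRequestsSketch {
        pending: BTreeMap<u64, String>, // request id -> method
    }
    
    impl PendingRequestsSketch {
        /// Records a new request and returns the message to send.
        fn send(&mut self, id: u64, method: &str) -> Outgoing {
            self.pending.insert(id, method.to_string());
            Outgoing::Request { id, method: method.to_string() }
        }
    
        /// On resume, re-sends every outstanding request with its original id.
        fn replay(&self) -> Vec<Outgoing> {
            self.pending
                .iter()
                .map(|(id, method)| Outgoing::Request { id: *id, method: method.clone() })
                .collect()
        }
    
        /// When a request is answered or cleared, emits a resolved notification once.
        fn resolve(&mut self, id: u64) -> Option<Outgoing> {
            self.pending.remove(&id).map(|_| Outgoing::Resolved { id })
        }
    }
    ```
    
    Because resolved requests are removed from the map, a client that resumes
    later only sees the requests that are still outstanding.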
    
    High-level test plan:
    - Added automated coverage for replaying pending command execution and
    file change approval requests on `thread/resume`.
    - Added automated coverage for resolved notifications in command
    approval, file change approval, request_user_input, turn start, and turn
    interrupt flows.
    - Verified schema/docs updates in the relevant protocol and app-server
    tests.
    
    Manual testing:
    - Tested reconnect/resume with multiple connections.
    - Confirmed state stayed in sync between connections.
  • notify: include client in legacy hook payload (#12968)
    ## Why
    
    The `notify` hook payload did not identify which Codex client started
    the turn. That meant downstream notification hooks could not distinguish
    between completions coming from the TUI and completions coming from
    app-server clients such as VS Code or Xcode. Now that the Codex App
    provides its own desktop notifications, it would be nice to be able to
    filter those out.
    
    This change adds that context without changing the existing payload
    shape for callers that do not know the client name, and keeps the new
    end-to-end test cross-platform.
    
    ## What changed
    
    - added an optional top-level `client` field to the legacy `notify` JSON
    payload
    - threaded that value through `core` and `hooks`; the internal session
    and turn state now carries it as `app_server_client_name`
    - set the field to `codex-tui` for TUI turns
    - captured `initialize.clientInfo.name` in the app server and applied it
    to subsequent turns before dispatching hooks
    - replaced the notify integration test hook with a `python3` script so
    the test does not rely on Unix shell permissions or `bash`
    - documented the new field in `docs/config.md`
    
    ## Testing
    
    - `cargo test -p codex-hooks`
    - `cargo test -p codex-tui`
    - `cargo test -p codex-app-server
    suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name
    -- --exact --nocapture`
    - `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs`
    still has unrelated existing failures in this environment)
    
    ## Docs
    
    The public config reference on `developers.openai.com/codex` should
    mention that the legacy `notify` payload may include a top-level
    `client` field. The TUI reports `codex-tui`, and the app server reports
    `initialize.clientInfo.name` when it is available.
  • Add oauth_resource handling for MCP login flows (#12866)
    Addresses bug https://github.com/openai/codex/issues/12589
    
    Builds on community PR #12763.
    
    This adds `oauth_resource` support for MCP `streamable_http` servers and
    wires it through the relevant config and login paths. It fixes the bug
    where the configured OAuth resource was not reliably included in the
    authorization request, causing MCP login to omit the expected
    `resource` parameter.
  • [apps] Improve app/list with force_fetch=true (#12745)
    - [x] Improve app/list with force_fetch=true: we now keep the cached
    snapshot until both installed apps and directory apps load.
  • Allow clients not to send summary as an option (#12950)
    Summary is a required parameter on UserTurn. Ideally we'd like the core
    to decide the appropriate summary level.
    
    Make the summary optional and don't send it when not needed.
  • core: bundle settings diff updates into one dev/user envelope (#12417)
    ## Summary
    - bundle contextual prompt injection into at most one developer message
    plus one contextual user message in both:
      - per-turn settings updates
      - initial context insertion
    - preserve `<model_switch>` across compaction by rebuilding it through
    canonical initial-context injection, instead of relying on
    strip/reattach hacks
    - centralize contextual user fragment detection in one shared definition
    table and reuse it for parsing/compaction logic
    - keep `AGENTS.md` in its natural serialized format:
      - `# AGENTS.md instructions for {dirname}`
      - `<INSTRUCTIONS>...</INSTRUCTIONS>`
    - simplify related tests/helpers and accept the expected snapshot/layout
    updates from bundled multi-part messages
    
    ## Why
    The goal is to converge toward a simpler, more intentional prompt shape
    where contextual updates are consistently represented as one developer
    envelope plus one contextual user envelope, while keeping parsing and
    compaction behavior aligned with that representation.
    
    ## Notable details
    - the temporary `SettingsUpdateEnvelope` wrapper was removed; these
    paths now return `Vec<ResponseItem>` directly
    - local/remote compaction no longer rely on model-switch strip/restore
    helpers
    - contextual user detection is now driven by shared fragment definitions
    instead of ad hoc matcher assembly
    - AGENTS/user instructions are still the same logical context; only the
    synthetic `<user_instructions>` wrapper was replaced by the natural
    AGENTS text format
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-app-server
    codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
    -- --exact`
    - `cargo test -p codex-core
    compact::tests::collect_user_messages_filters_session_prefix_entries
    --lib -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_developer_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::includes_user_instructions_message_in_request' --
    --exact`
    - `cargo test -p codex-core --test all
    'suite::client::resume_includes_initial_messages_and_sends_prior_items'
    -- --exact`
    - `cargo test -p codex-core --test all
    'suite::review::review_input_isolated_from_parent_history' -- --exact`
    - `cargo test -p codex-exec --test all
    'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
    --exact`
    - `cargo test -p core_test_support
    context_snapshot::tests::full_text_mode_preserves_unredacted_text --
    --exact`
    
    ## Notes
    - I also ran several targeted `compact`, `compact_remote`,
    `prompt_caching`, `model_visible_layout`, and `event_mapping` tests
    while iterating on prompt-shape changes.
    - I have not claimed a clean full-workspace `cargo test` from this
    environment because local sandbox/resource conditions have previously
    produced unrelated failures in large workspace runs.
  • Enforce user input length cap (#12823)
    Currently there is no bound on the length of a user message submitted in
    the TUI or through the app server interface. That means users can paste
    many megabytes of text, which can lead to bad performance, hangs, and
    crashes. In extreme cases, it can lead to a [kernel
    panic](https://github.com/openai/codex/issues/12323).
    
    This PR limits the length of a user input to 2**20 (about 1M)
    characters. This value was chosen because it fills the entire context
    window on the latest models, so accepting longer inputs wouldn't make
    sense anyway.
    
    Summary
    - add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol
    and surface it in TUI and app server code
    - block oversized submissions in the TUI submit flow and emit error
    history cells when validation fails
    - reject heavy app-server requests with JSON-RPC `-32602` and structured
    `input_too_large` data, plus document the behavior
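    
    A minimal sketch of the validation described above, assuming the limit is
    counted in Unicode scalar values (`char`s) rather than bytes. The constant
    name and the `-32602` code come from this change; the `InputTooLarge`
    struct and its field names are hypothetical and only illustrate what the
    structured error data might carry.
    
    ```rust
    /// Limit from the PR description: 2**20 characters.
    const MAX_USER_INPUT_TEXT_CHARS: usize = 1 << 20;
    
    /// JSON-RPC "Invalid params" error code.
    const INVALID_PARAMS: i64 = -32602;
    
    /// Hypothetical shape for the structured error data.
    #[derive(Debug, PartialEq)]
    struct InputTooLarge {
        code: i64,
        reason: &'static str,
        max_chars: usize,
        actual_chars: usize,
    }
    
    /// Rejects input longer than the cap; counts characters, not bytes.
    fn validate_user_input(text: &str) -> Result<(), InputTooLarge> {
        let actual_chars = text.chars().count();
        if actual_chars > MAX_USER_INPUT_TEXT_CHARS {
            return Err(InputTooLarge {
                code: INVALID_PARAMS,
                reason: "input_too_large",
                max_chars: MAX_USER_INPUT_TEXT_CHARS,
                actual_chars,
            });
        }
        Ok(())
    }
    ```
    
    Counting with `chars().count()` means multi-byte text is not penalized for
    its encoded size; whether the real check counts characters or bytes is not
    stated here.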
    
    Testing
    - ran the IDE extension with this change and verified that when I
    attempt to paste a user message that's several MB long, it correctly
    reports an error instead of crashing or making my computer hot.
  • Enable request_user_input in Default mode (#12735)
    ## Summary
    - allow `request_user_input` in Default collaboration mode as well as
    Plan
    - update the Default-mode instructions to prefer assumptions first and
    use `request_user_input` only when a question is unavoidable
    - update request_user_input and app-server tests to match the new
    Default-mode behavior
    - refactor collaboration-mode availability plumbing into
    `CollaborationModesConfig` for future mode-related flags
    
    ## Codex author
    `codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
  • Revert "Ensure shell command skills trigger approval (#12697)" (#12721)
    This reverts commit daf0f03ac8.
    
  • feat(app-server): thread/unsubscribe API (#10954)
    Adds a new v2 app-server API for a client to be able to unsubscribe from
    a thread:
    - New RPC method: `thread/unsubscribe`
    - New server notification: `thread/closed`
    
    Today clients can start/resume/archive threads, but there wasn’t a way
    to explicitly unload a live thread from memory without archiving it.
    With `thread/unsubscribe`, a client can indicate it is no longer
    actively working with a live Thread. If this is the only client
    subscribed to that given thread, the thread will be automatically closed
    by app-server, at which point the server will send `thread/closed` and
    `thread/status/changed` with `status: notLoaded` notifications.
    
    This gives clients a way to prevent long-running app-server processes
    from accumulating too many thread (and related) objects in memory.
    
    Closed threads will also be removed from `thread/loaded/list`.
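    
    The last-subscriber rule above can be sketched as follows. The
    `ThreadSubscriptionsSketch` type, numeric connection ids, and returning
    notification names as strings are assumptions for the example, not the
    app-server's actual bookkeeping.
    
    ```rust
    use std::collections::{BTreeSet, HashMap};
    
    /// Hypothetical map of loaded thread id -> subscribed connection ids.
    #[derive(Default)]
    struct ThreadSubscriptionsSketch {
        subscribers: HashMap<String, BTreeSet<u64>>,
    }
    
    impl ThreadSubscriptionsSketch {
        fn subscribe(&mut self, thread_id: &str, connection_id: u64) {
            self.subscribers
                .entry(thread_id.to_string())
                .or_default()
                .insert(connection_id);
        }
    
        /// Removes one subscriber; returns the notifications to send if the
        /// thread was closed because no subscribers remain.
        fn unsubscribe(&mut self, thread_id: &str, connection_id: u64) -> Vec<&'static str> {
            let now_empty = match self.subscribers.get_mut(thread_id) {
                Some(subs) => {
                    subs.remove(&connection_id);
                    subs.is_empty()
                }
                None => return Vec::new(),
            };
            if now_empty {
                // Unload the thread so it no longer appears as loaded.
                self.subscribers.remove(thread_id);
                vec!["thread/closed", "thread/status/changed"]
            } else {
                Vec::new()
            }
        }
    
        /// Sorted ids of threads that are still loaded.
        fn loaded_threads(&self) -> Vec<String> {
            let mut ids: Vec<String> = self.subscribers.keys().cloned().collect();
            ids.sort();
            ids
        }
    }
    ```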
  • Add app-server v2 thread realtime API (#12715)
    Add experimental `thread/realtime/*` v2 requests and notifications, then
    route app-server realtime events through that thread-scoped surface with
    integration coverage.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat(network-proxy): add embedded OTEL policy audit logging (#12046)
    **PR Summary**
    
    This PR adds embedded-only OTEL policy audit logging for
    `codex-network-proxy` and threads audit metadata from `codex-core` into
    managed proxy startup.
    
    ### What changed
    - Added structured audit event emission in `network_policy.rs` with
    target `codex_otel.network_proxy`.
    - Emitted:
      - `codex.network_proxy.domain_policy_decision` once per domain-policy
      evaluation.
      - `codex.network_proxy.block_decision` for non-domain denies.
    - Added required policy/network fields, RFC3339 UTC millisecond
    `event.timestamp`, and fallback defaults (`http.request.method="none"`,
    `client.address="unknown"`).
    - Added non-domain deny audit emission in HTTP/SOCKS handlers for
    mode-guard and proxy-state denies, including unix-socket deny paths.
    - Added `REASON_UNIX_SOCKET_UNSUPPORTED` and used it for unsupported
    unix-socket auditing.
    - Added `NetworkProxyAuditMetadata` to runtime/state, re-exported from
    `lib.rs` and `state.rs`.
    - Added `start_proxy_with_audit_metadata(...)` in core config, with
    `start_proxy()` delegating to default metadata.
    - Wired metadata construction in `codex.rs` from session/auth context,
    including originator sanitization for OTEL-safe tagging.
    - Updated `network-proxy/README.md` with embedded-mode audit schema and
    behavior notes.
    - Refactored HTTP block-audit emission to a small local helper to reduce
    duplication.
    - Preserved existing unix-socket proxy-disabled host/path behavior for
    responses and blocked history while using an audit-only endpoint
    override (`server.address="unix-socket"`, `server.port=0`).
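
    The fallback defaults and the audit-only unix-socket endpoint override
    listed above can be illustrated with a minimal sketch. The `AuditFields`
    struct, the `Target` enum, and the `audit_fields` function are
    hypothetical names used only for illustration and are not taken from
    `network_policy.rs`; only the literal field values come from the list
    above.

    ```rust
    /// Hypothetical, simplified subset of the fields on an audit event.
    #[derive(Debug, PartialEq)]
    pub struct AuditFields {
        pub http_request_method: String,
        pub client_address: String,
        pub server_address: String,
        pub server_port: u16,
    }

    /// Hypothetical description of where a request was headed.
    pub enum Target<'a> {
        Tcp { host: &'a str, port: u16 },
        UnixSocket,
    }

    /// Fill in audit fields, applying the documented fallbacks when a value
    /// is unavailable and the audit-only override for unix-socket targets.
    pub fn audit_fields(
        method: Option<&str>,
        client: Option<&str>,
        target: &Target<'_>,
    ) -> AuditFields {
        let (server_address, server_port) = match target {
            Target::Tcp { host, port } => (host.to_string(), *port),
            // Audit-only override: server.address="unix-socket", server.port=0.
            Target::UnixSocket => ("unix-socket".to_string(), 0),
        };
        AuditFields {
            http_request_method: method.unwrap_or("none").to_string(),
            client_address: client.unwrap_or("unknown").to_string(),
            server_address,
            server_port,
        }
    }
    ```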
    
    ### Explicit exclusions
    - No standalone proxy OTEL startup work.
    - No `main.rs` binary wiring.
    - No `standalone_otel.rs`.
    - No standalone docs/tests.
    
    ### Tests
    - Extended `network_policy.rs` tests for event mapping, metadata
    propagation, fallbacks, timestamp format, and target prefix.
    - Extended HTTP tests to assert unix-socket deny block audit events.
    - Extended SOCKS tests to cover deny emission from handler deny
    branches.
    - Added/updated core tests to verify audit metadata threading into
    managed proxy state.
    
    ### Validation run
    - `just fmt`
    - `cargo test -p codex-network-proxy` 
    - `cargo test -p codex-core` ran with one unrelated flaky timeout
    (`shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin`), and
    the test passed when rerun directly 
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
  • Support external agent config detect and import (#12660)
    Migration Behavior
    
    * Config
      *  Migrates settings.json into config.toml
      *  Only adds fields when config.toml is missing, or when those fields are
         missing from the existing file
      *  Supported mappings:
         * env -> shell_environment_policy
         * sandbox.enabled = true -> sandbox_mode = "workspace-write"
    
    * Skills
      *  Copies home and repo .claude/skills into .agents/skills
      *  Existing skill directories are not overwritten
      *  SKILL.md content is rewritten from Claude-related terms to Codex
    
    * AgentsMd
      *  Repo only
      *  Migrates CLAUDE.md into AGENTS.md
      *  Detect/import only proceed when AGENTS.md is missing or present but
         empty
      *  Content is rewritten from Claude-related terms to Codex
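
    The "only add missing fields" rule and the `sandbox.enabled` mapping can
    be sketched roughly as below. This is an illustration under stated
    assumptions, not the importer's code: `FlatConfig`, `merge_missing`, and
    `migrate_sandbox` are hypothetical names, config.toml is modeled as a
    flat string map, and emitting nothing when `sandbox.enabled` is absent or
    `false` is an assumption the description above does not confirm.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical flat view of config.toml: key -> value rendered as text.
    pub type FlatConfig = BTreeMap<String, String>;

    /// Merge migrated values without overwriting keys that already exist.
    pub fn merge_missing(existing: &mut FlatConfig, migrated: FlatConfig) {
        for (key, value) in migrated {
            existing.entry(key).or_insert(value);
        }
    }

    /// Map `sandbox.enabled = true` to `sandbox_mode = "workspace-write"`.
    /// Assumption: any other input produces no migrated field.
    pub fn migrate_sandbox(sandbox_enabled: Option<bool>) -> FlatConfig {
        let mut out = FlatConfig::new();
        if sandbox_enabled == Some(true) {
            out.insert("sandbox_mode".to_string(), "workspace-write".to_string());
        }
        out
    }
    ```

    With an empty config the migrated `sandbox_mode` is added; with a config
    that already sets `sandbox_mode`, the existing value is kept.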
  • feat: add search term to thread list (#12578)
    Add `searchTerm` to `thread/list`, which filters the results to threads
    whose title contains the term (that is, `searchTerm` is a substring of
    `title`).
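
    A minimal sketch of the matching condition, assuming a plain
    case-sensitive substring test (case handling is not stated here, so that
    part is an assumption). `filter_by_search_term` is a hypothetical helper,
    not the app-server implementation.

    ```rust
    /// Keep only the titles that contain `search_term` as a substring.
    /// Assumption: the comparison is case-sensitive.
    pub fn filter_by_search_term<'a>(titles: &[&'a str], search_term: &str) -> Vec<&'a str> {
        titles
            .iter()
            .copied()
            .filter(|title| title.contains(search_term))
            .collect()
    }
    ```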
  • feat: add service name to app-server (#12319)
    Add a service name to the app-server so that the app can use its own
    service name.
    
    This is set at the thread level because the app-server may later become a
    singleton on the machine.
  • feat: pass helper executable paths via Arg0DispatchPaths (#12719)
    ## Why
    
    `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` previously
    located `codex-execve-wrapper` by scanning `PATH` and sibling
    directories. That lookup is brittle and can select the wrong binary when
    the runtime environment differs from startup assumptions.
    
    We already pass `codex-linux-sandbox` from `codex-arg0`;
    `codex-execve-wrapper` should use the same startup-driven path plumbing.
    
    ## What changed
    
    - Introduced `Arg0DispatchPaths` in `codex-arg0` to carry both helper
    executable paths:
      - `codex_linux_sandbox_exe`
      - `main_execve_wrapper_exe`
    - Updated `arg0_dispatch_or_else()` to pass `Arg0DispatchPaths` to
    top-level binaries and preserve helper paths created in
    `prepend_path_entry_for_codex_aliases()`.
    - Threaded `Arg0DispatchPaths` through entrypoints in `cli`, `exec`,
    `tui`, `app-server`, and `mcp-server`.
    - Added `main_execve_wrapper_exe` to core configuration plumbing
    (`Config`, `ConfigOverrides`, and `SessionServices`).
    - Updated zsh-fork shell escalation to consume the configured
    `main_execve_wrapper_exe` and removed path-sniffing fallback logic.
    - Updated app-server config reload paths so reloaded configs keep the
    same startup-provided helper executable paths.
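
    The shape of the change can be sketched as follows. The field names come
    from the list above, but the `Option<PathBuf>` types and the
    `execve_wrapper` helper are assumptions made for illustration; the linked
    definition in `codex-arg0` is authoritative.

    ```rust
    use std::path::{Path, PathBuf};

    /// Sketch of the helper paths captured at startup.
    /// Assumption: both fields are optional paths.
    #[derive(Debug, Clone, Default)]
    pub struct Arg0DispatchPaths {
        pub codex_linux_sandbox_exe: Option<PathBuf>,
        pub main_execve_wrapper_exe: Option<PathBuf>,
    }

    /// Hypothetical helper: return the configured wrapper path or fail,
    /// with no fallback to scanning `PATH` or sibling directories.
    pub fn execve_wrapper(paths: &Arg0DispatchPaths) -> Result<&Path, String> {
        paths
            .main_execve_wrapper_exe
            .as_deref()
            .ok_or_else(|| "main_execve_wrapper_exe was not provided at startup".to_string())
    }
    ```

    Returning an error instead of searching keeps the failure visible when
    the startup plumbing did not supply a path.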
    
    ## References
    
    - [`Arg0DispatchPaths`
    definition](https://github.com/openai/codex/blob/e355b43d5c2a771f045296a6deae10d7c9c36ec6/codex-rs/arg0/src/lib.rs#L20-L24)
    - [`arg0_dispatch_or_else()` forwarding both
    paths](https://github.com/openai/codex/blob/e355b43d5c2a771f045296a6deae10d7c9c36ec6/codex-rs/arg0/src/lib.rs#L145-L176)
    - [zsh-fork escalation using configured wrapper
    path](https://github.com/openai/codex/blob/e355b43d5c2a771f045296a6deae10d7c9c36ec6/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs#L109-L150)
    
    ## Testing
    
    - `cargo check -p codex-arg0 -p codex-core -p codex-exec -p codex-tui -p
    codex-mcp-server -p codex-app-server`
    - `cargo test -p codex-arg0`
    - `cargo test -p codex-core tools::runtimes::shell::unix_escalation:: --
    --nocapture`
  • fix: clarify the value of SkillMetadata.path (#12729)
    Rename `SkillMetadata.path` to `SkillMetadata.path_to_skills_md` for
    clarity.
    
    Would ideally change the type to `AbsolutePathBuf`, but that can be done
    later.
  • codex-rs/app-server: graceful websocket restart on Ctrl-C (#12517)
    ## Summary
    - add graceful websocket app-server restart on Ctrl-C by draining until
    no assistant turns are running
    - stop the websocket acceptor and disconnect existing connections once
    the drain condition is met
    - add a websocket integration test that verifies Ctrl-C waits for an
    in-flight turn before exit
    
    ## Verification
    - `cargo check -p codex-app-server --quiet`
    - `cargo test -p codex-app-server --test all
    suite::v2::connection_handling_websocket`
    - I (maxj) tested remote and local Codex.app
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add app-server event tracing (#12695)
    To help with debugging
  • Ensure shell command skills trigger approval (#12697)
    Summary
    - detect skill-invoking shell commands based on the original command
    string, request approvals when needed, and cache positive decisions per
    session
    - keep implicit skill invocation emitted after approval and keep skill
    approval decline messaging centralized to the shell handler
    - expand and adjust skill approval tests to cover shell-based skill
    scripts while matching the new detection expectations
    
    Testing
    - Not run (not requested)
  • app-server: retain thread listener across disconnects (#12373)
    - keep the per-thread app-server listener alive when the last client
    unsubscribes or disconnects
    - preserve listener-side active turn history so running `thread/resume`
    can merge an in-progress turn snapshot after reconnect
    - add `ThreadStateManager` regressions for disconnect/unsubscribe
    retention and explicit thread teardown cleanup
    
    Added unit tests, and I manually tested to confirm the fix
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
        `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
        parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • fix: address flakiness in thread_resume_rejoins_running_thread_even_with_override_mismatch (#12381)
    ## Why
    `thread/resume` responses for already-running threads can be reported as
    `Idle` even while a turn is still in progress. This is caused by a
    timing window where the runtime watch state has not yet observed the
    running-thread transition, so API clients can receive stale status
    information at resume time.
    
    Possibly related: https://github.com/openai/codex/pull/11786
    
    ## What
    - Add a shared status normalization helper, `resolve_thread_status`, in
    `codex-rs/app-server/src/thread_status.rs` that resolves
    `Idle`/`NotLoaded` to `Active { active_flags: [] }` when an in-progress
    turn is known.
    - Reuse this helper across thread response paths in
    `codex-rs/app-server/src/codex_message_processor.rs` (including
    `thread/start`, `thread/unarchive`, `thread/read`, `thread/resume`,
    `thread/fork`, and review/thread-started notification responses).
    - In `handle_pending_thread_resume_request`, use both the in-memory
    `active_turn_snapshot` and the resumed rollout turns to decide whether a
    turn is in progress before resolving thread status for the response.
    - Extend `thread_status` tests to validate the new status-resolution
    behavior directly.
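
    The normalization rule can be sketched as below. The `ThreadStatus` enum
    here is a simplified stand-in (the real type may have more variants and a
    typed `active_flags`), and the boolean parameter is an assumed way of
    passing "an in-progress turn is known"; only the mapping itself is taken
    from the description above.

    ```rust
    /// Simplified stand-in for the app-server thread status type.
    #[derive(Debug, Clone, PartialEq)]
    pub enum ThreadStatus {
        NotLoaded,
        Idle,
        Active { active_flags: Vec<String> },
    }

    /// Resolve `Idle`/`NotLoaded` to `Active { active_flags: [] }` when an
    /// in-progress turn is known; leave every other status unchanged.
    pub fn resolve_thread_status(status: ThreadStatus, has_in_progress_turn: bool) -> ThreadStatus {
        match status {
            ThreadStatus::Idle | ThreadStatus::NotLoaded if has_in_progress_turn => {
                ThreadStatus::Active { active_flags: Vec::new() }
            }
            other => other,
        }
    }
    ```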
    
    ## Verification
    - `cargo test -p codex-app-server
    suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
  • Add field to Thread object for the latest rename set for a given thread (#12301)
    Exposes the latest name set for a thread through the app server. This
    enables other surfaces to use the core as the source of truth for thread
    naming. `threadName` is gathered using the helper functions used to
    interact with `session_index.jsonl`, and is hydrated in:
    - `thread/list`
    - `thread/read`
    - `thread/resume`
    - `thread/unarchive`
    - `thread/rollback`
    
    We don't do this for `thread/start` and `thread/fork`.
  • Add ability to attach extra files to feedback (#12370)
    Allow clients to provide extra files.
  • app-server: harden disconnect cleanup paths (#12218)
    Hardens codex-rs/app-server connection lifecycle and outbound routing
    for websocket clients. Fixes some FUD I was having
    
    - Added per-connection disconnect signaling (CancellationToken) for
    websocket transports.
    - Split websocket handling into independent inbound/outbound tasks
    coordinated by cancellation.
    - Changed outbound routing so websocket connections use non-blocking
    try_send; slow/full websocket writers are disconnected instead of
    stalling broadcast delivery.
    - Kept stdio behavior blocking-on-send (no forced disconnect) so local
    stdio clients are not dropped when queues are temporarily full.
    - Simplified outbound router flow by removing deferred
    pending_closed_connections handling.
    - Added guards to drop incoming response/notification/error messages
    from unknown connections.
    - Fixed listener teardown race in thread listener tasks using a
    listener_generation check so stale tasks do not clear newer listeners.
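
    The difference between the websocket and stdio send paths can be
    sketched with a bounded channel. This uses `std::sync::mpsc` as a
    stand-in for whatever channel type the app-server actually uses, and
    `Transport`, `RouteOutcome`, and `route` are hypothetical names.

    ```rust
    use std::sync::mpsc::{SyncSender, TrySendError};

    pub enum Transport {
        Websocket,
        Stdio,
    }

    #[derive(Debug, PartialEq)]
    pub enum RouteOutcome {
        Delivered,
        DisconnectSlowClient,
        ConnectionGone,
    }

    /// Websocket: never block; a full queue means the client is too slow and
    /// should be disconnected. Stdio: block until there is room.
    pub fn route(transport: Transport, tx: &SyncSender<String>, msg: String) -> RouteOutcome {
        match transport {
            Transport::Websocket => match tx.try_send(msg) {
                Ok(()) => RouteOutcome::Delivered,
                Err(TrySendError::Full(_)) => RouteOutcome::DisconnectSlowClient,
                Err(TrySendError::Disconnected(_)) => RouteOutcome::ConnectionGone,
            },
            Transport::Stdio => match tx.send(msg) {
                Ok(()) => RouteOutcome::Delivered,
                Err(_) => RouteOutcome::ConnectionGone,
            },
        }
    }
    ```

    Because the websocket branch returns immediately on a full queue, one
    slow consumer cannot stall delivery to the other connections.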
    
    Fixes
    https://linear.app/openai/issue/CODEX-4966/multiclient-handle-slow-notification-consumers
    
    ## Tests
    
    Added/updated transport tests covering:
    
    - broadcast does not block on a slow/full websocket connection
    - stdio connection waits instead of disconnecting on full queue
    
    I (maxj) have tested manually and will retest before landing
  • Refactor network approvals to host/protocol/port scope (#12140)
    ## Summary
    Simplify network approvals by removing per-attempt proxy correlation and
    moving to session-level approval dedupe keyed by (host, protocol, port).
    Instead of encoding attempt IDs into proxy credentials/URLs, we now
    treat approvals as a destination policy decision.
    
    - Concurrent calls to the same destination share one approval prompt.
    - Different destinations (or same host on different ports) get separate
    prompts.
    - "Allow once" approves the current queued request group only.
    - "Allow for session" caches that (host, protocol, port) and auto-allows
    future matching requests.
    - The "Never" policy continues to deny without prompting.
    
    Example:
    - 3 calls:
      - a.com:443
      - b.com:443
      - a.com:443
    => 2 prompts total (a, b), second a waits on the first decision.
    - a.com:80 is treated separately from a.com:443
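
    The dedupe behavior in the example can be sketched with a map keyed by
    (host, protocol, port). All names here (`ApprovalKey`,
    `SessionApprovals`, `Action`) are hypothetical and the sketch ignores
    concurrency; it only illustrates the keying and the prompt/wait/allow
    decisions described above.

    ```rust
    use std::collections::HashMap;

    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    pub struct ApprovalKey {
        pub host: String,
        pub protocol: String,
        pub port: u16,
    }

    impl ApprovalKey {
        pub fn new(host: &str, protocol: &str, port: u16) -> Self {
            Self { host: host.to_string(), protocol: protocol.to_string(), port }
        }
    }

    #[derive(Debug, PartialEq)]
    pub enum Action {
        Prompt,
        WaitForPending,
        AutoAllow,
    }

    enum State {
        Pending,
        AllowedForSession,
    }

    #[derive(Default)]
    pub struct SessionApprovals {
        state: HashMap<ApprovalKey, State>,
    }

    impl SessionApprovals {
        /// Decide what to do for a request to this destination.
        pub fn on_request(&mut self, key: ApprovalKey) -> Action {
            let action = match self.state.get(&key) {
                Some(State::AllowedForSession) => Action::AutoAllow,
                Some(State::Pending) => Action::WaitForPending,
                None => Action::Prompt,
            };
            if action == Action::Prompt {
                self.state.insert(key, State::Pending);
            }
            action
        }

        /// "Allow once": resolve the queued group, remember nothing.
        pub fn allow_once(&mut self, key: &ApprovalKey) {
            self.state.remove(key);
        }

        /// "Allow for session": cache the key so later requests auto-allow.
        pub fn allow_for_session(&mut self, key: ApprovalKey) {
            self.state.insert(key, State::AllowedForSession);
        }
    }
    ```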
    
    ## Testing
    - `just fmt` (in `codex-rs`)
    - `cargo test -p codex-core tools::network_approval::tests`
    - `cargo test -p codex-core` (unit tests pass; existing
    integration-suite failures remain in this environment)
  • feat: add nick name to sub-agents (#12320)
    Adds a random nickname to sub-agents, used for UX.
    
    Also stores and wires through the role of the sub-agent.
  • app-server: improve thread resume rejoin flow (#11776)
    The `thread/resume` response now includes the latest turn with all of its
    items in band, so no events are stale or lost.
    
    Testing
    - e2e tested using app-server-test-client using flow described in
    "Testing Thread Rejoin Behavior" in
    codex-rs/app-server-test-client/README.md
    - e2e tested in codex desktop by reconnecting to a running turn
  • Add configurable MCP OAuth callback URL for MCP login (#11382)
    ## Summary
    
    Implements a configurable MCP OAuth callback URL override for `codex mcp
    login` and app-server OAuth login flows, including support for non-local
    callback endpoints (for example, devbox ingress URLs).
    
    ## What changed
    
    - Added new config key: `mcp_oauth_callback_url` in
    `~/.codex/config.toml`.
    - OAuth authorization now uses `mcp_oauth_callback_url` as
    `redirect_uri` when set.
    - Callback handling validates the callback path against the configured
    redirect URI path.
    - Listener bind behavior is now host-aware:
      - local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to
        `127.0.0.1`
      - non-local callback URL hosts bind to `0.0.0.0`
    - `mcp_oauth_callback_port` remains supported and is used for the
    listener port.
    - Wired through:
      - CLI MCP login flow
      - App-server MCP OAuth login flow
      - Skill dependency OAuth login flow
    - Updated config schema and config tests.
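
    The host-aware bind rule can be sketched as a small lookup. The function
    name is hypothetical, and the sketch assumes the host has already been
    parsed out of `mcp_oauth_callback_url` with any IPv6 brackets removed.

    ```rust
    /// Choose the listener bind address from the callback URL host.
    /// Assumption: `callback_host` is the bare host, without brackets or port.
    pub fn bind_host_for_callback(callback_host: &str) -> &'static str {
        match callback_host {
            // Local callback hosts keep the listener on loopback.
            "localhost" | "127.0.0.1" | "::1" => "127.0.0.1",
            // Non-local hosts (for example, devbox ingress) bind on all interfaces.
            _ => "0.0.0.0",
        }
    }
    ```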
    
    ## Why
    
    Some environments need OAuth callbacks to land on a specific reachable
    URL (for example ingress in remote devboxes), not loopback. This change
    allows that while preserving local defaults for existing users.
    
    ## Backward compatibility
    
    - No behavior change when `mcp_oauth_callback_url` is unset.
    - Existing `mcp_oauth_callback_port` behavior remains intact.
    - Local callback flows continue binding to loopback by default.
    
    ## Testing
    
    - `cargo test -p codex-rmcp-client callback -- --nocapture`
    - `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture`
    - `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client`
    
    ## Example config
    
    ```toml
    mcp_oauth_callback_port = 5555
    mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"
    ```
  • app-server: expose loaded thread status via read/list and notifications (#11786)
    Motivation
    - Today, a newly connected client has no direct way to determine the
    current runtime status of threads from read/list responses alone.
    - This forces clients to infer state from transient events, which can
    lead to stale or inconsistent UI when reconnecting or attaching late.
    
    Changes
    - Add `status` to `thread/read` responses.
    - Add `statuses` to `thread/list` responses.
    - Emit `thread/status/changed` notifications with `threadId` and the new
    status.
    - Track runtime status for all loaded threads and default unknown
    threads to `idle`.
    - Update protocol/docs/tests/schema fixtures for the revised API.
    
    Testing
    - Validated protocol API changes with automated protocol tests and
    regenerated schema/type fixtures.
    - Validated app-server behavior with unit and integration test suites,
    including status transitions and notifications.