24 Commits

  • Load executor skills without host path conversion (#29626)
    ## Why
    
    After #28918, selected skill roots are `PathUri`, but the executor skill
    provider still converts them to the app-server host's `AbsolutePathBuf`.
    A foreign Windows root therefore cannot be discovered by a Unix host,
    and a foreign Unix root cannot be discovered by a Windows host.
    
    This PR keeps executor skill discovery and reads on the filesystem that
    owns the selected root while reusing the existing skill rules.
    
    ## What changed
    
    - Generalize the existing skill traversal to operate on `PathUri`
    through `ExecutorFileSystem`, preserving its depth, directory, symlink,
    and sibling-metadata concurrency behavior.
    - Add a small environment skill loader that reuses the shared discovery,
    frontmatter validation, dependency parsing, product policy, and
    prompt-visibility rules.
    - Keep the environment id and entrypoint `PathUri` in the skill catalog,
    then route `skills.read` back through the same environment filesystem.
    - Preserve the executor's path convention when deriving catalog handles,
    including literal backslashes in POSIX filenames.
    - Resolve plugin namespaces from nearby manifests through URI-native
    filesystem reads.
    - Cover foreign Windows roots, executor-owned reads, namespaces,
    metadata, policy, and path identity.
    
    ```text
    selected root (PathUri)
            |
            v
    shared discovery over ExecutorFileSystem
            |
            v
    environment-bound catalog entry --skills.read--> same ExecutorFileSystem
    ```
    
    No second filesystem abstraction or duplicate traversal implementation
    is introduced.
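    
    The handle-derivation bullet above can be illustrated with a minimal
    sketch. `PathConvention`, `segments`, and `relative_segments` are
    hypothetical names, not this PR's API. The sketch assumes the separator
    set is the only difference between conventions and ignores drive-letter
    and case-folding rules.
    
    ```rust
    /// Hypothetical stand-in for the executor's path convention.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum PathConvention {
        Posix,
        Windows,
    }
    
    /// Splits a path into non-empty segments using the separators of the
    /// given convention. Under POSIX a backslash is an ordinary filename
    /// character; under Windows both `\` and `/` separate segments.
    pub fn segments(path: &str, convention: PathConvention) -> Vec<&str> {
        let is_separator = move |c: char| match convention {
            PathConvention::Posix => c == '/',
            PathConvention::Windows => c == '/' || c == '\\',
        };
        path.split(is_separator).filter(|s| !s.is_empty()).collect()
    }
    
    /// Returns the segments of `entry` below `root`, or `None` when `entry`
    /// is not lexically contained in `root`.
    pub fn relative_segments<'a>(
        root: &str,
        entry: &'a str,
        convention: PathConvention,
    ) -> Option<Vec<&'a str>> {
        let root_segments = segments(root, convention);
        let entry_segments = segments(entry, convention);
        let contained = entry_segments.len() >= root_segments.len()
            && root_segments
                .iter()
                .zip(entry_segments.iter())
                .all(|(r, e)| r == e);
        if contained {
            Some(entry_segments[root_segments.len()..].to_vec())
        } else {
            None
        }
    }
    ```
    
    With the POSIX convention, `a\b` stays a single segment, so a handle
    derived from it does not collide with a nested `a/b` directory.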
    
    ## Stack
    
    1. #29614 — add lexical `PathUri` containment.
    2. #29620 — share URI-native manifest path resolution.
    3. #28918 — keep selected plugin roots and resources URI-native.
    4. **This PR** — load executor skills without host path conversion.
    5. #29628 — resolve executor MCP working directories without host path
    conversion.
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
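    
    A minimal sketch of the kind of rewrite described above. `parse_port`
    and the three wrappers are hypothetical helpers, not code from this
    repository. The PR places the `allow` once at each integration-test
    crate root; the sketch puts it on individual functions only so the
    snippet stays self-contained.
    
    ```rust
    use std::num::ParseIntError;
    
    /// Hypothetical fallible helper standing in for test setup code.
    fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
        raw.trim().parse()
    }
    
    /// Verbose form that avoids `expect` by panicking manually.
    pub fn port_verbose(raw: &str) -> u16 {
        parse_port(raw).unwrap_or_else(|err| panic!("port should parse: {err}"))
    }
    
    /// Same behavior with `expect`, permitted by a local `allow`.
    #[allow(clippy::expect_used)]
    pub fn port_with_expect(raw: &str) -> u16 {
        parse_port(raw).expect("port should parse")
    }
    
    /// `expect_err` covers the case where the test wants the failure.
    #[allow(clippy::expect_used)]
    pub fn port_rejection(raw: &str) -> ParseIntError {
        parse_port(raw).expect_err("port should be rejected")
    }
    ```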
  • skills: hide orchestrator skills with a local executor (#28333)
    ## Why
    
    App-server threads without a local executor need orchestrator-owned
    skills from the hosted `codex_apps` MCP server. Threads with the local
    executor already discover installed skills from the local filesystem.
    
    After the orchestrator skill provider was enabled for every app-server
    thread, local-executor threads also received the hosted skill catalog
    and the `skills.list` and `skills.read` tools. This changed the existing
    local behavior and could expose a second hosted copy of a skill that was
    already installed locally.
    
    ## What changed
    
    - Expose the thread's selected execution environments to extensions at
    thread startup.
    - Enable orchestrator skills only when the reserved local environment is
    not selected.
    - Apply that decision consistently to hosted skill catalog discovery,
    explicit skill injection, and the `skills.list` and `skills.read` tools.
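    
    The gating rule above can be sketched as a single predicate. The
    constant `LOCAL_ENVIRONMENT_ID`, its value, and both function names are
    assumptions made for illustration; the PR does not state the reserved
    identifier.
    
    ```rust
    /// Hypothetical identifier of the reserved local environment.
    pub const LOCAL_ENVIRONMENT_ID: &str = "local";
    
    /// Orchestrator skills are enabled only when the reserved local
    /// environment is not among the thread's selected environments.
    pub fn orchestrator_skills_enabled(selected_environment_ids: &[&str]) -> bool {
        !selected_environment_ids
            .iter()
            .any(|id| *id == LOCAL_ENVIRONMENT_ID)
    }
    
    /// The same decision drives tool exposure, so the hosted catalog and
    /// the `skills.*` tools cannot disagree.
    pub fn visible_skill_tools(selected_environment_ids: &[&str]) -> Vec<&'static str> {
        if orchestrator_skills_enabled(selected_environment_ids) {
            vec!["skills.list", "skills.read"]
        } else {
            Vec::new()
        }
    }
    ```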
    
    ## Verification
    
    - The existing no-executor app-server test continues to verify hosted
    skill discovery, invocation, and child-resource reads.
    - A new app-server test verifies that local-executor threads do not
    receive hosted skill context or `skills.*` tools.
  • Route image extension reads through turn environments v2 (#27498)
    ## Why
    
    Image generation used `std::fs::read` for referenced image paths, which
    did not support environment-backed filesystems or their sandbox context.
    
    ## What changed
    
    - Expose optional turn environments to extension tool calls.
    - Include each environment’s ID, working directory, filesystem, and
    sandbox context.
    - Read referenced images through the selected environment filesystem.
    - Keep sandbox usage at the extension call site so extensions can choose
    the appropriate access mode.
    - Consolidate image request construction into one async function.
    - Add coverage for successful environment reads and read failures.
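    
    A simplified sketch of reading a referenced image through the selected
    environment rather than `std::fs::read`. Everything here is
    hypothetical: the trait, the in-memory filesystem, and the function
    names are illustrative, the real call is async, and the sandbox context
    is omitted.
    
    ```rust
    use std::collections::HashMap;
    use std::io;
    
    /// Hypothetical, synchronous stand-in for an environment filesystem.
    pub trait EnvironmentFileSystem {
        fn read(&self, path: &str) -> io::Result<Vec<u8>>;
    }
    
    /// In-memory filesystem used only to exercise the sketch.
    #[derive(Default)]
    pub struct InMemoryFs {
        files: HashMap<String, Vec<u8>>,
    }
    
    impl InMemoryFs {
        pub fn with_file(mut self, path: &str, bytes: &[u8]) -> Self {
            self.files.insert(path.to_string(), bytes.to_vec());
            self
        }
    }
    
    impl EnvironmentFileSystem for InMemoryFs {
        fn read(&self, path: &str) -> io::Result<Vec<u8>> {
            self.files.get(path).cloned().ok_or_else(|| {
                io::Error::new(io::ErrorKind::NotFound, format!("no such file: {path}"))
            })
        }
    }
    
    /// Hypothetical view of one turn environment handed to a tool call.
    pub struct TurnEnvironment<'a> {
        pub id: &'a str,
        pub filesystem: &'a dyn EnvironmentFileSystem,
    }
    
    /// Reads the image from the filesystem of the selected environment.
    pub fn read_referenced_image(
        environments: &[TurnEnvironment<'_>],
        environment_id: &str,
        path: &str,
    ) -> io::Result<Vec<u8>> {
        let environment = environments
            .iter()
            .find(|env| env.id == environment_id)
            .ok_or_else(|| {
                io::Error::new(
                    io::ErrorKind::NotFound,
                    format!("unknown environment: {environment_id}"),
                )
            })?;
        environment.filesystem.read(path)
    }
    ```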
    
    ## Validation
    
    - `cargo check -p codex-image-generation-extension --tests`
    - `just fmt`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    
    `just test -p codex-image-generation-extension` could not complete
    because the build exhausted available disk space.
  • [codex-analytics] emit goal lifecycle analytics (#27078)
    ## Why
    - There is no analytics event for `/goal` behavior.
    - Existing events cannot identify goal execution or its resulting
    outcome.
    - The original update in
    [#26182](https://github.com/openai/codex/pull/26182) was implemented
    before `/goal` moved into `codex-goal-extension`.
    
    ## What Changed
    - Adds `codex_goal_event` serialization and enrichment to
    `codex-analytics`
    - Emits goal events from the canonical `codex-goal-extension` mutation
    and accounting paths:
      - `created` when a new logical goal is persisted
      - `usage_accounted` when cumulative goal usage is persisted
      - `status_changed` when the stored goal status changes
      - `cleared` when the goal is deleted
    - Preserves causal `turn_id` for turn-driven events and uses null
    attribution for external or idle lifecycle events
    - Changes goal deletion to return the deleted row so `cleared` retains
    the stable goal ID
    
    ## Event Details
    
    Includes standard analytics metadata along with goal-specific fields:
    - `goal_id`: Stable ID stored in the local SQLite goal row and shared
    across the goal's events
    - `event_kind`: Observed operation (one of the four lifecycle events
    listed under "What Changed")
    - `goal_status`: Resulting or last stored status: `active`, `paused`,
    `blocked`, `usage_limited`, etc.
    - `has_token_budget`: Indicates whether a token budget is configured
    - `turn_id`: Causal turn ID, or null when no causal turn exists
    - `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted`
    events; null otherwise
    - `cumulative_time_accounted_seconds`: Cumulative active time on
    `usage_accounted` events; null otherwise
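    
    The field rules above can be sketched as a plain Rust struct and
    builder. The type names, integer widths, and `build_event` are
    assumptions for illustration; the actual serialization in
    `codex-analytics` is not shown in this PR description.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum GoalEventKind {
        Created,
        UsageAccounted,
        StatusChanged,
        Cleared,
    }
    
    impl GoalEventKind {
        pub fn as_str(self) -> &'static str {
            match self {
                Self::Created => "created",
                Self::UsageAccounted => "usage_accounted",
                Self::StatusChanged => "status_changed",
                Self::Cleared => "cleared",
            }
        }
    }
    
    /// Hypothetical in-memory shape of one goal event.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct GoalEvent {
        pub goal_id: String,
        pub event_kind: &'static str,
        pub goal_status: String,
        pub has_token_budget: bool,
        pub turn_id: Option<String>,
        pub cumulative_tokens_accounted: Option<u64>,
        pub cumulative_time_accounted_seconds: Option<u64>,
    }
    
    /// Hypothetical view of the stored goal row.
    pub struct GoalSnapshot<'a> {
        pub goal_id: &'a str,
        pub status: &'a str,
        pub token_budget: Option<u64>,
        pub tokens_used: u64,
        pub seconds_used: u64,
    }
    
    /// Cumulative usage is populated only on `usage_accounted` events;
    /// `turn_id` is `None` when no causal turn exists.
    pub fn build_event(
        goal: &GoalSnapshot<'_>,
        kind: GoalEventKind,
        turn_id: Option<&str>,
    ) -> GoalEvent {
        let is_usage = kind == GoalEventKind::UsageAccounted;
        GoalEvent {
            goal_id: goal.goal_id.to_string(),
            event_kind: kind.as_str(),
            goal_status: goal.status.to_string(),
            has_token_budget: goal.token_budget.is_some(),
            turn_id: turn_id.map(str::to_string),
            cumulative_tokens_accounted: is_usage.then_some(goal.tokens_used),
            cumulative_time_accounted_seconds: is_usage.then_some(goal.seconds_used),
        }
    }
    ```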
    
    ## Validation
    - `just test -p codex-analytics -p codex-state -p codex-goal-extension`
    - `just test -p codex-core -E 'test(/goal/)'`
    - `just test -p codex-app-server`
    - `cargo build -p codex-analytics -p codex-core -p codex-state -p
    codex-app-server`
  • Allow creating a new goal after completion (#26681)
    ## Why
    
    Users have indicated that they want an agent to be able to create a new
    goal for itself after completing the previous goal. Currently, that's
    not possible because agents cannot overwrite an existing goal even if
    it's complete. This PR removes this limitation and allows `create_goal`
    to overwrite an existing goal if it is in the `complete` state.
    
    ## What changed
    
    `create_goal` now replaces the existing goal only when its status is
    `complete`. The replacement is performed atomically in the goal store,
    creates a fresh active goal with reset usage, and continues to reject
    creation while any unfinished goal exists. App server clients see a
    single `thread/goal/updated` event when the previous goal is replaced
    with the new one.
    
    The tool description and error message now reflect these semantics.
    
    ## What didn't change
    
    Agents still cannot create a new goal (that is, overwrite their existing
    goal) while the existing goal is active, blocked, paused, or in any
    state other than `complete`.
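    
    A minimal sketch of the replacement rule, assuming a single in-memory
    goal slot per thread. `Goal`, `GoalStatus`, and this `create_goal`
    signature are illustrative only; the real store performs the
    replacement atomically in its own persistence layer.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum GoalStatus {
        Active,
        Paused,
        Blocked,
        UsageLimited,
        Complete,
    }
    
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct Goal {
        pub objective: String,
        pub status: GoalStatus,
        pub tokens_used: u64,
    }
    
    /// Creates a goal when the slot is empty or holds a completed goal.
    /// Any unfinished goal causes the call to be rejected unchanged.
    pub fn create_goal(slot: &mut Option<Goal>, objective: &str) -> Result<(), String> {
        if let Some(existing) = slot.as_ref() {
            if existing.status != GoalStatus::Complete {
                return Err(format!(
                    "a goal with status {:?} already exists",
                    existing.status
                ));
            }
        }
        // The replacement starts active with reset usage.
        *slot = Some(Goal {
            objective: objective.to_string(),
            status: GoalStatus::Active,
            tokens_used: 0,
        });
        Ok(())
    }
    ```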
  • Block active goals after terminal turn errors (#26690)
    ## Why
    
    Terminal turn errors can leave a goal active. Automatic goal
    continuation may then repeatedly hit a permanent failure, including
    compaction requests rejected with HTTP 400, and consume excessive
    tokens.
    
    This PR changes the goal extension to treat all turn-ending errors
    (including non-retryable errors and retryable errors that have exceeded
    their retry count) as "blocking" for the goal. The downside is that some
    failures are transient and a later attempt might have succeeded (e.g. a
    429 due to a service outage); previously the goal runtime would have
    kept the agent going in these situations.
    
    ## What changed
    
    - Block the current active goal when a turn ends with an error other
    than a usage-limit error.
    - Preserve the existing `usage_limited` transition for usage-limit
    errors.
    - Share progress accounting, guarded state updates, metrics, and event
    emission in the goal runtime.
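    
    The resulting transition can be sketched as a small pure function. The
    enums are hypothetical simplifications. Treating a budget-limited goal
    as eligible for the usage-limit transition follows the description in
    #24628, and leaving every other combination unchanged is an assumption.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum GoalStatus {
        Active,
        BudgetLimited,
        Blocked,
        UsageLimited,
        Complete,
    }
    
    /// Hypothetical classification of a turn-ending error.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum TurnError {
        UsageLimit,
        Other,
    }
    
    /// Status of the thread goal after a turn ends with `error`.
    pub fn status_after_turn_error(current: GoalStatus, error: TurnError) -> GoalStatus {
        match (current, error) {
            // Existing behavior: usage-limit errors map to `usage_limited`.
            (GoalStatus::Active | GoalStatus::BudgetLimited, TurnError::UsageLimit) => {
                GoalStatus::UsageLimited
            }
            // New behavior: any other terminal error blocks the active goal,
            // so automatic continuation stops retrying.
            (GoalStatus::Active, TurnError::Other) => GoalStatus::Blocked,
            // Assumption: all remaining combinations are left untouched.
            (other, _) => other,
        }
    }
    ```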
  • fix: serialize goal progress accounting (#26155)
    ## Why
    
    Goal progress accounting can be reached from multiple completion paths
    for the same thread. Each path takes a progress snapshot, writes the
    usage delta, and then marks that snapshot as accounted. When two
    tool-completion hooks run at the same time, they can both observe the
    same unaccounted delta and charge it twice.
    
    ## What changed
    
    - Added a per-thread progress-accounting permit to
    `GoalAccountingState`.
    - Held that permit across the snapshot/write/mark-accounted critical
    section for active-turn, idle, and tool-finish accounting.
    - Added regression coverage for parallel tool-finish hooks so a shared
    token delta is charged once and only one progress event is emitted.
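    
    The critical section can be sketched with a standard-library mutex
    standing in for the per-thread permit. The struct layout and function
    names are hypothetical, and the real `GoalAccountingState` may use a
    different primitive. The point is only that snapshot, write, and
    mark-accounted happen under one guard.
    
    ```rust
    use std::sync::{Arc, Mutex};
    use std::thread;
    
    #[derive(Default)]
    struct Progress {
        observed_tokens: u64,
        accounted_tokens: u64,
        charged_tokens: u64,
        progress_events: u32,
    }
    
    /// Hypothetical accounting state; the mutex plays the role of the permit.
    pub struct GoalAccounting {
        progress: Mutex<Progress>,
    }
    
    impl GoalAccounting {
        pub fn new(observed_tokens: u64) -> Self {
            let progress = Progress { observed_tokens, ..Progress::default() };
            Self { progress: Mutex::new(progress) }
        }
    
        /// Snapshot, write, and mark-accounted under one guard, so a second
        /// caller sees a zero delta instead of charging the same usage again.
        pub fn account_progress(&self) -> u64 {
            let mut progress = self
                .progress
                .lock()
                .unwrap_or_else(|poisoned| poisoned.into_inner());
            let delta = progress.observed_tokens - progress.accounted_tokens;
            if delta == 0 {
                return 0;
            }
            progress.charged_tokens += delta;
            progress.accounted_tokens = progress.observed_tokens;
            progress.progress_events += 1;
            delta
        }
    
        /// Returns (charged tokens, emitted progress events).
        pub fn totals(&self) -> (u64, u32) {
            let progress = self
                .progress
                .lock()
                .unwrap_or_else(|poisoned| poisoned.into_inner());
            (progress.charged_tokens, progress.progress_events)
        }
    }
    
    /// Runs `hooks` concurrent tool-finish hooks and sums what they charged.
    pub fn run_parallel_hooks(accounting: Arc<GoalAccounting>, hooks: usize) -> u64 {
        let handles: Vec<_> = (0..hooks)
            .map(|_| {
                let accounting = Arc::clone(&accounting);
                thread::spawn(move || accounting.account_progress())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap_or(0)).sum()
    }
    ```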
    
    ## Testing
    
    - Not run locally.
    - Added `parallel_tool_finish_accounts_active_goal_progress_once`.
  • Add goal extension GoalApi (#25096)
    ## Summary
    
    - add an extension-owned `GoalApi` for thread goal get/set/clear
    operations
    - register live goal runtimes with the API from the goal extension
    backend
    - cover the API and runtime-effect paths in goal extension tests
    
    ## Stack
    
    Follow-up app-server wiring PR: #25108
    
    ## Validation
    
    - `just fmt`
    - `just fix -p codex-goal-extension`
    - `just test -p codex-goal-extension`
  • [codex] Require model for standalone web search (#25131)
    ## Why
    
    The standalone `/v1/alpha/search` request now requires a `model`, but
    the `web.run` extension currently omits it.
    
    Adds `model` to extension `ToolCall` invocation.
    
    Follow-up to #23823.
    
    ## What changed
    
    - Make `SearchRequest.model` required.
    - Expose the effective per-turn model on extension tool calls and pass
    it in standalone web-search requests.
    - Assert the model is forwarded in the app-server round-trip test.
    
    ## Testing
    
    - `just test -p codex-api -p codex-tools -p codex-web-search-extension
    -p codex-memories-extension -p codex-goal-extension`
    - `just test -p codex-core -E
    'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
    - `just test -p codex-app-server -E
    'test(standalone_web_search_round_trips_encrypted_output)'`
  • Handle goal usage limits from turn errors (#25095)
    ## Summary
    - handle goal usage-limit turn errors in the goal extension
    - exercise the extension path in the goal backend test
    
    ## Tests
    - `just fmt`
    - `just test -p codex-goal-extension`
    - `just fix -p codex-goal-extension`
  • extension-api: add TurnItemEmitter to tool calls (#24813)
    ## Why
    Extension-contributed tools need to emit visible turn items through
    Codex's normal event and persistence pipeline.
    
    ## What
    - Add `TurnItemEmitter` to extension `ToolCall`s and route the core
    implementation through `Session::emit_turn_item_*`.
    - Hold weak session and turn references so retained tool calls cannot
    keep host state alive.
    - Provide a no-op emitter for extension test callers.
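    
    The weak-reference design can be sketched as follows. `Session` here is
    a toy type and the enum layout is an assumption; only the idea (an
    emitter that cannot keep host state alive, plus a no-op variant for
    tests) comes from the description above.
    
    ```rust
    use std::sync::{Arc, Mutex, Weak};
    
    /// Toy stand-in for host session state that records emitted items.
    pub struct Session {
        items: Mutex<Vec<String>>,
    }
    
    impl Session {
        pub fn shared() -> Arc<Self> {
            Arc::new(Self { items: Mutex::new(Vec::new()) })
        }
    
        pub fn items(&self) -> Vec<String> {
            self.items
                .lock()
                .unwrap_or_else(|poisoned| poisoned.into_inner())
                .clone()
        }
    }
    
    /// Hypothetical emitter: live emitters hold only a weak reference.
    pub enum TurnItemEmitter {
        Live(Weak<Session>),
        Noop,
    }
    
    impl TurnItemEmitter {
        pub fn for_session(session: &Arc<Session>) -> Self {
            Self::Live(Arc::downgrade(session))
        }
    
        /// Returns true when the item reached a live session.
        pub fn emit(&self, item: &str) -> bool {
            match self {
                Self::Live(weak) => match weak.upgrade() {
                    Some(session) => {
                        session
                            .items
                            .lock()
                            .unwrap_or_else(|poisoned| poisoned.into_inner())
                            .push(item.to_string());
                        true
                    }
                    // The session is gone; a retained tool call does nothing.
                    None => false,
                },
                Self::Noop => false,
            }
        }
    }
    ```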
    
    ## Test Plan
    - `just test -p codex-core -E
    'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • Gate goal tools by thread eligibility (#24925)
    ## Why
    
    Goal tools create and update goal state for a persistent thread. The
    extension was only checking whether goals were enabled before
    advertising those tools, which meant they could be surfaced in contexts
    that should not receive thread goal controls: ephemeral threads without
    persistent thread state and review subagents.
    
    Those sessions can still run the goal extension lifecycle, but the
    thread tools should only be visible when the current thread can safely
    use them.
    
    ## What changed
    
    - Adds a `GoalRuntimeConfig` that separates goal enablement from whether
    goal tools are available for the current thread.
    - Computes tool eligibility on thread start from
    `persistent_thread_state_available` and `SessionSource`, hiding tools
    for review subagents.
    - Uses `GoalRuntimeHandle::tools_visible()` when contributing thread
    tools so enabled runtime state does not automatically imply tool
    exposure.
    - Adds backend coverage for hiding goal tools on ephemeral threads and
    review subagents.
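    
    A minimal sketch of the eligibility computation. The two-variant
    `SessionSource`, the field names, and placing `tools_visible` on the
    config rather than on `GoalRuntimeHandle` are simplifications made for
    illustration.
    
    ```rust
    /// Hypothetical reduction of `SessionSource` to the case that matters.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum SessionSource {
        Interactive,
        ReviewSubagent,
    }
    
    /// Keeps goal enablement separate from tool availability.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct GoalRuntimeConfig {
        pub goals_enabled: bool,
        pub tools_available: bool,
    }
    
    impl GoalRuntimeConfig {
        /// Computed once at thread start.
        pub fn for_thread(
            goals_enabled: bool,
            persistent_thread_state_available: bool,
            source: SessionSource,
        ) -> Self {
            let tools_available =
                persistent_thread_state_available && source != SessionSource::ReviewSubagent;
            Self { goals_enabled, tools_available }
        }
    
        /// Tools are contributed only when both conditions hold.
        pub fn tools_visible(&self) -> bool {
            self.goals_enabled && self.tools_available
        }
    }
    ```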
    
    ## Testing
    
    - Added `goal_tools_hidden_for_ephemeral_threads`.
    - Added `goal_tools_hidden_for_review_subagents`.
  • Add thread start contributor facts (#24915)
    Summary: add session source and persistent-state availability to
    `ThreadStartInput`; populate them from session init; update existing
    goal test harness constructors.
    
    Tests: `just fmt`; `git diff --check`. No full tests or clippy run, per
    request.
  • feat: handle goal usage limits in goal extension (#24628)
    ## Why
    
    The extracted goal runtime needs a host-callable path for turns that
    stop because the workspace usage limit is reached. In that case, any
    in-turn goal progress should be accounted before the goal becomes
    terminal, and active goal accounting must be cleared so later
    tool-finish or turn-stop handling does not keep charging usage to a
    stopped goal.
    
    ## What changed
    
    - Adds `GoalRuntimeHandle::usage_limit_active_goal_for_turn`, which
    accounts current active-goal progress, marks the active or
    budget-limited thread goal as `UsageLimited`, records terminal metrics
    when the status changes, clears active goal accounting, and emits the
    updated goal event.
    - Covers both active and budget-limited goals in
    `ext/goal/tests/goal_extension_backend.rs`, including the invariant that
    later token/tool events do not add usage after the goal has been
    usage-limited.
    
    ## Testing
    
    - Added
    `usage_limit_active_goal_accounts_progress_and_clears_accounting`.
    - Added `usage_limit_budget_limited_goal_accounts_remaining_progress`.
  • fix: restore goal accounting after thread resume (#24626)
    ## Why
    
    Goal idle accounting is supposed to survive a thread resume. Previously,
    the resume hook restored the active goal state inline from the extension
    lifecycle contributor, which left the runtime handle without a reusable
    restoration path and made the behavior hard to cover directly. When a
    thread with an active goal was resumed, goal accounting could lose track
    of the active idle goal instead of continuing to accrue elapsed time.
    
    ## What changed
    
    - Moved thread-resume restoration into
    `GoalRuntimeHandle::restore_after_resume()` so the runtime owns
    rehydrating active goal accounting from persisted thread goal state.
    - Kept disabled goal runtimes as a no-op and preserved the existing
    warning path when persisted goal state cannot be loaded.
    - Added a backend regression test that seeds an active goal, resumes the
    thread, waits briefly, and verifies elapsed idle time is reflected on
    the next external goal mutation.
    
    ## Testing
    
    - Not run locally; this metadata update only rewrote the PR title/body.
  • Add goal extension telemetry parity (#24615)
    ## Why
    
    `core/src/goals.rs` already emits OTEL metrics for goal creation,
    resume, terminal transitions, token counts, and duration. As `/goal`
    moves into `ext/goal`, the extension needs to preserve that telemetry
    contract instead of only emitting app-visible `ThreadGoalUpdated`
    events.
    
    This keeps the existing `codex.goal.*` metric surface intact while goal
    lifecycle ownership shifts toward the extension.
    
    ## What changed
    
    - Added an extension-local `GoalMetrics` helper that records the
    existing `codex.goal.*` counters and histograms through `codex-otel`.
    - Threaded an optional `MetricsClient` through `install_with_backend`,
    `GoalExtension`, `GoalRuntimeHandle`, and `GoalToolExecutor`.
    - Emitted created, resumed, and terminal goal metrics from the extension
    paths that create goals, restore active goals on thread resume, account
    budget limits, complete or block goals, and handle external goal
    mutations.
    - Updated existing goal extension test setup callsites to pass `None`
    for metrics when instrumentation is not under test.
    
    ## Verification
    
    Not run locally.
  • Expose conversation history to extension tools (#23963)
    ## Why
    
    Extension tools that need conversation context should be able to read it
    from the live tool invocation instead of reaching into thread
    persistence themselves.
    
    ## What changed
    
    - Add a `ConversationHistory` snapshot to extension `ToolCall`s and
    populate it from the current raw in-memory response history.
    - Expose all history items at this boundary so each extension can filter
    and bound the subset it needs before consuming or forwarding it.
    - Cover the adapter and registry dispatch paths and update existing
    extension tests that construct `ToolCall` literals.
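    
    Since the snapshot exposes every history item, each extension is
    expected to filter and bound it. A minimal sketch of one such filter,
    with a hypothetical `HistoryItem` enum standing in for the real
    response-item type:
    
    ```rust
    /// Hypothetical, simplified history item.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub enum HistoryItem {
        UserMessage(String),
        AssistantMessage(String),
        ToolOutput(String),
    }
    
    /// Returns at most `limit` of the most recent user messages, oldest
    /// first, ignoring every other item kind.
    pub fn recent_user_messages(history: &[HistoryItem], limit: usize) -> Vec<&str> {
        let mut picked: Vec<&str> = history
            .iter()
            .rev()
            .filter_map(|item| match item {
                HistoryItem::UserMessage(text) => Some(text.as_str()),
                _ => None,
            })
            .take(limit)
            .collect();
        picked.reverse();
        picked
    }
    ```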
    
    ## Test plan
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-extension-api`
    - `cargo test -p codex-goal-extension`
    - `cargo test -p codex-memories-extension`
    - `cargo test -p codex-core passes_turn_fields_to_extension_call`
    - `cargo test -p codex-core
    extension_tool_executors_are_model_visible_and_dispatchable`
  • [codex] Steer budget-limited goal extension turns (#23718)
    ## What
    - Add a small extension capability for injecting model-visible response
    items into the active turn
    - Have the goal extension inject hidden goal-context steering when
    tool-finish accounting reaches `BudgetLimited`
    - Cover the extension backend path with an assertion on the injected
    steering item
    
    ## Why
    PR #23696 persists and emits the budget-limited goal update from
    tool-finish accounting, but it leaves the model unaware of that
    transition. The existing core runtime steers the model to wrap up in
    this case; the extension path should do the same through an explicit
    host capability.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-goal-extension`
    - `cargo test -p codex-extension-api`
  • Fix thread settings clippy failure (#23724)
    ## Why
    
    `main` picked up two small Rust build failures after nearby merges:
    
    - #23507 added a real handler for
    `ServerNotification::ThreadSettingsUpdated`, but the same variant was
    still listed in the ignored-notification match arm. Full Clippy runs
    treat the resulting unreachable-pattern warning as an error.
    - #23666 added `turn_id` and `truncation_policy` to
    `codex_tools::ToolCall`, while the goal extension backend test fixtures
    from the goal-extension work still used the old shape. That left
    `codex-goal-extension` tests unable to compile once the branches met on
    `main`.
    
    ## What changed
    
    Removed the duplicate `ThreadSettingsUpdated` match pattern from
    `tui/src/chatwidget/protocol.rs`.
    
    Updated the goal extension test `tool_call` helper to populate the new
    `ToolCall` fields, and reused that helper for the one direct literal
    that still had the old field list.
    
    ## Verification
    
    - `just fix -p codex-tui`
    - `cargo test -p codex-goal-extension`
  • feat: account active goal progress in the goal extension (#23696)
    ## Why
    
    The goal extension can create and surface goals, but the live
    turn-accounting path still stopped short of persisting active-goal
    progress. That leaves token and wall-clock usage, plus
    `ThreadGoalUpdated` events, out of sync with the extension boundary once
    work actually advances or a goal transitions out of active state.
    
    ## What changed
    
    - Teach `GoalAccountingState` to track the current turn, active goal,
    token deltas, and wall-clock progress snapshots against the persisted
    goal id.
    - Flush active-goal accounting from tool-finish, turn-stop, and
    turn-abort lifecycle hooks, and emit `ThreadGoalUpdated` events when
    persisted progress changes.
    - Route `create_goal` and `update_goal` through the same accounting
    state so new goals start from the right baseline, final progress is
    flushed before status changes, and `update_goal` can mark a goal
    `blocked` as well as `complete`.
    - Keep budget-limited goals accruing through the end of the turn while
    clearing local active-goal state once a turn or explicit update is
    finished.
    - Expand backend and lifecycle coverage around store ids, baseline
    reset, tool-finish accounting, budget-limited carry-through, and
    blocked-goal updates.
    
    ## Testing
    
    - Added focused backend coverage in
    `codex-rs/ext/goal/tests/goal_extension_backend.rs` for baseline reset,
    tool-finish accounting, budget-limited turns, and blocked-goal updates.
    - Extended `codex-rs/core/src/session/tests.rs` to assert that lifecycle
    inputs expose the expected session, thread, and turn store ids.
  • feat: expose turn-start metadata to extensions (#23688)
    ## Why
    
    The goal extension needs more context when a turn starts than
    `turn_store` alone provides.
    
    In particular, goal accounting needs the stable turn id, the effective
    collaboration mode, and the cumulative token-usage baseline captured at
    turn start so it can:
    
    - suppress goal accounting for plan-mode turns
    - compute exact per-turn deltas from cumulative `total_token_usage`
    snapshots instead of relying on the most recent usage event alone
    - keep the extension-owned accounting path aligned with the host turn
    lifecycle
    
    ## What
    
    - extend `codex_extension_api::TurnStartInput` to expose `turn_id`,
    `collaboration_mode`, and `token_usage_at_turn_start`
    - pass the full `TurnContext` plus the captured token-usage baseline
    through the turn-start lifecycle emission path
    - initialize goal turn accounting from the turn-start baseline and
    collaboration mode
    - switch goal token accounting to compute deltas from cumulative
    `total_token_usage` snapshots
    - add coverage for the new turn-start lifecycle fields and for
    goal-accounting baseline behavior
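    
    Delta computation from cumulative snapshots can be sketched as below.
    `TurnAccounting`, its fields, and the two-variant `CollaborationMode`
    are hypothetical; the sketch also assumes token totals are plain `u64`
    counters, which the PR description does not state.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum CollaborationMode {
        Default,
        Plan,
    }
    
    /// Hypothetical per-turn accounting state seeded at turn start.
    pub struct TurnAccounting {
        mode: CollaborationMode,
        last_accounted_total: u64,
    }
    
    impl TurnAccounting {
        /// `token_usage_at_turn_start` is the cumulative total captured
        /// when the turn began.
        pub fn start(mode: CollaborationMode, token_usage_at_turn_start: u64) -> Self {
            Self { mode, last_accounted_total: token_usage_at_turn_start }
        }
    
        /// Returns the tokens to charge for a new cumulative snapshot.
        /// Plan-mode turns advance the baseline but charge nothing.
        pub fn take_delta(&mut self, total_token_usage: u64) -> u64 {
            let delta = total_token_usage.saturating_sub(self.last_accounted_total);
            self.last_accounted_total = self.last_accounted_total.max(total_token_usage);
            if self.mode == CollaborationMode::Plan {
                0
            } else {
                delta
            }
        }
    }
    ```
    
    Because each delta is taken against the last accounted cumulative
    total, a repeated or missed usage event does not double-charge or drop
    tokens.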
    
    ## Testing
    
    - added `turn_start_lifecycle_exposes_turn_metadata_and_token_baseline`
    in `codex-rs/core/src/session/tests.rs`
    - added `ext/goal/tests/accounting.rs` coverage for baseline-aware goal
    accounting and plan-mode suppression
  • feat: wire goal extension tools to the dedicated goal store (#23685)
    ## Why
    
    `ext/goal` already had the tool specs and contributor wiring for
    `/goal`, but the installed tools still depended on a placeholder backend
    that always errored. That meant the extension could not actually own
    goal persistence even though the dedicated `thread_goals` store already
    exists.
    
    This change wires the extension tools directly to the dedicated goal
    store so the extension can create, read, and complete goals against real
    state instead of falling back to host-side placeholders.
    
    ## What changed
    
    - make `install_with_backend(...)` require
    `Arc<codex_state::StateRuntime>` so goal storage is always available
    when the extension is installed
    - remove the unused no-backend/public backend abstraction from
    `ext/goal` and have the tool executors talk directly to `StateRuntime`
    - map `thread_goals` rows into the existing protocol response shape for
    `get_goal`, `create_goal`, and `update_goal`
    - preserve current thread-list behavior by filling an empty thread
    preview from the goal objective when a goal is created through the
    extension path
    - add integration coverage for the installed tool surface, including
    successful goal creation and duplicate-create rejection
    
    ## Testing
    
    - `cargo test -p codex-goal-extension`