122 Commits

  • [codex] Consolidate shared prompts in codex-prompts (#25151)
    ## Why
    
    `codex_core` is consistently a bottleneck for incremental builds during
    iteration. The simplest fix is to make the crate smaller.
    
    ## Summary
    
    `codex-core` owns several reusable prompt renderers and static prompt
    assets, which makes the crate harder to split apart.
    
    Rename `codex-review-prompts` to `codex-prompts` and move shared review,
    goal, permissions, compaction, realtime, hierarchical AGENTS.md, and
    `apply_patch` prompts into it. Move prompt-only tests and update
    consumers and `CODEOWNERS`.
    
    ## Validation
    
    - `just test -p codex-prompts -p codex-apply-patch`
    - `just test -p codex-core prompt_caching`
    - Bazel builds for the affected crates
  • goal: pause continuation loops on usage limits and blockers (#23094)
    Addresses #22833, #22245, #23067
    
    ## Why
    `/goal` can keep synthesizing turns even when the next turn cannot make
    meaningful progress. Hard usage exhaustion can replay failing turns, and
    repeated permission or external-resource blockers can keep burning
    tokens while waiting for user or system intervention.
    
    ## What changed
    - Add resumable `blocked` and `usageLimited` goal states. As with
    `paused`, goal continuation stops with these states.
    - Move to `usageLimited` after usage-limit failures.
    - Allow the built-in `update_goal` tool to set `blocked` only under
    explicit repeated-impasse guidance. Update the goal continuation prompt
    to specify that the agent should use `blocked` only when it has made at
    least three attempts to get past an impasse.
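
    The state rules above can be sketched as follows. This is a minimal
    illustration, not the actual runtime: the `GoalStatus` and `TurnOutcome`
    types, their variants, and the function names are hypothetical, and
    other goal states are omitted.

    ```rust
    /// Hypothetical goal status; variant names mirror the states named in
    /// this PR, but the real type and its serialization may differ.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum GoalStatus {
        Active,
        Paused,
        Blocked,
        UsageLimited,
    }

    impl GoalStatus {
        /// Among these states, only an active goal gets another
        /// synthesized continuation turn.
        pub fn allows_continuation(self) -> bool {
            matches!(self, GoalStatus::Active)
        }

        /// States that stop continuation but can be resumed later.
        pub fn is_resumable(self) -> bool {
            matches!(
                self,
                GoalStatus::Paused | GoalStatus::Blocked | GoalStatus::UsageLimited
            )
        }
    }

    /// Hypothetical summary of how a turn ended.
    pub enum TurnOutcome {
        Completed,
        UsageLimitError,
    }

    /// An active goal moves to `UsageLimited` after a usage-limit failure;
    /// every other combination leaves the status unchanged in this sketch.
    pub fn next_status(current: GoalStatus, outcome: TurnOutcome) -> GoalStatus {
        match (current, outcome) {
            (GoalStatus::Active, TurnOutcome::UsageLimitError) => GoalStatus::UsageLimited,
            (status, _) => status,
        }
    }
    ```

    Under these assumptions, a scheduler would check `allows_continuation()`
    before synthesizing the next turn, so `blocked` and `usageLimited` goals
    stay idle until resumed.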
    
    Most of the files touched by this PR changed because of the small app
    server protocol update.
    
    ## Validation
    
    I manually reproduced a number of situations where an agent can run into
    a true impasse and verified that it properly enters `blocked` state. I
    then resumed and verified that it once again entered `blocked` state
    several turns later if the impasse still existed.
    
    I also manually reproduced the usage-limit condition by creating a
    simulated responses API endpoint that returns 429 errors with the
    appropriate error message. Verified that the goal runtime properly moves
    the goal into `usageLimited` state and the TUI updates appropriately.
    Verified that `/goal resume` resumes (and immediately goes back into
    `usageLimited` state if appropriate).
    
    
    ## Follow-up PRs
    
    Small changes will be needed to the GUI clients to properly handle the
    two new states.
  • Fix goal update and add /goal edit command in TUI (#21954)
    ## Why
    
    Users have requested the ability to edit a goal's objective after a goal
    has been created. This PR exposes a new `/goal edit` command in the TUI
    to address this request.
    
    In the process of implementing this, I also noticed an existing bug in
    the goal runtime. When a goal's objective is updated through the
    `thread/goal/set` app server API, the goal runtime didn't emit a new
    steering prompt to tell the agent about the new objective. This PR also
    fixes this hole.
    
    ## What Changed
    
    - Adds `/goal edit` in the TUI, opening an edit box prefilled with the
    current goal objective.
    - Keeps active and paused goals in their current state, resets completed
    goals to active, keeps budget-limited goals budget-limited, and
    preserves the existing token budget.
    - Changes the existing `thread/goal/set` behavior so editing an
    objective preserves goal accounting instead of resetting it. The older
    reset-on-new-objective behavior was left over from before
    `thread/goal/clear`; clients that need to reset accounting can now clear
    the existing goal and create a new one.
    - Reuses the existing goal set API path; this does not add or change
    app-server protocol surface area.
    - Adds a dedicated goal runtime steering prompt when an externally
    persisted goal mutation changes the objective, so active turns receive
    the updated objective.
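
    The edit semantics listed above can be sketched as a pure function. The
    `Goal` struct, its fields, and `edit_objective` are hypothetical names
    chosen for illustration; the real runtime types may look different.

    ```rust
    /// Hypothetical subset of goal states relevant to editing.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum GoalStatus {
        Active,
        Paused,
        BudgetLimited,
        Complete,
    }

    /// Hypothetical goal record with the accounting fields that an edit
    /// must preserve.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Goal {
        pub objective: String,
        pub status: GoalStatus,
        pub token_budget: Option<u64>,
        pub tokens_used: u64,
    }

    /// Returns the edited goal and whether a steering prompt should be
    /// emitted (assumed here to be needed only when the objective text
    /// actually changed).
    pub fn edit_objective(goal: &Goal, new_objective: &str) -> (Goal, bool) {
        let objective_changed = goal.objective != new_objective;
        // Completed goals become active again; all other states are kept.
        let status = match goal.status {
            GoalStatus::Complete => GoalStatus::Active,
            other => other,
        };
        let updated = Goal {
            objective: new_objective.to_string(),
            status,
            // Budget and usage accounting carry over untouched.
            ..goal.clone()
        };
        (updated, objective_changed)
    }
    ```

    A client that wants the old reset behavior would, per the description
    above, clear the goal and create a new one rather than edit it.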
    
    ## Validation
    
    - Make sure `/goal edit` returns an error if no goal currently exists
    - Make sure `/goal edit` displays an edit box that can be optionally
    canceled with no side effects
    - Make sure that an edited goal results in a steer so the agent starts
    pursuing the new objective
    - Make sure the new objective is reflected in the goal if you use
    `/goal` to display the goal summary
    - Make sure that `/goal edit` doesn't reset the token budget or the
    time/token accounting on the updated goal
  • Improve goal continuation based on feedback (#22045)
    ## Summary
    
    This PR updates the goal continuation prompt to address feedback from
    early adopters. There are two primary changes:
    
    1. Goal continuation and budget-limit steering prompts now use hidden
    user-context messages instead of hidden developer messages.
    2. The goal continuation prompt is refined to improve the model's
    ability to fully complete the active goal rather than stop at a smaller
    or merely passing subset.
    
    The user-message transition is important for two reasons. First, it
    eliminates an issue where older steering messages could be responded to
    again after a new turn. Second, it works better with compaction because
    user messages are treated differently from developer messages during
    compaction.
    
    The prompt refinements make persistence explicit, ground work in current
    evidence, encourage `update_plan` for multi-step progress visibility,
    and require stronger completion audits before calling `update_goal`. It
    also removes the elapsed-time reporting in the prompt; I saw evidence
    that this was causing the model to shortcut work as it became nervous
    about time.
    
    These changes were tested with evals. Chriss4123 has also been running
    independent evals in
    [#19910](https://github.com/openai/codex/issues/19910), and many of the
    improvements in this PR were suggested by him.
    
    ## Verification
    
    - Tested with evals.
    - Added and updated focused `codex-core` coverage for hidden goal user
    context, continuation and budget-limit request shape, prompt rendering,
    and objective delimiter escaping.
  • [tool_suggest] More prompt polishes. (#20566)
    Tool suggest still misfires when the model needs tool_search, so this PR
    updates the prompts to further disambiguate it:
    
    - [x] rename it from `tool_suggest` to `request_plugin_install`
    - [x] rephrase "suggestion" to "install" in the tool descriptions.
    - [x] disambiguate "the tool" vs "the plugin/connector". 
    
    Tested with the Codex App and verified it still works.
  • Remove no-tool goal continuation suppression (#20523)
    ## Why
    
    `/goal` is supposed to keep Codex working until the goal is actually
    done. The previous continuation logic had two ways to stop early: the
    continuation prompt told the model to wait for new input when it felt
    blocked, and the runtime suppressed another continuation turn after a
    continuation finished without any tool calls.
    
    That made goals stop short even when the agent could still keep making
    progress (I received a few reports of this from users). It also relied
    on a brittle heuristic that treated "no registry tool calls" as
    equivalent to "should stop."
    
    ## What changed
    
    - removed the continuation prompt sentence that told the model to stop
    and wait for new input when it could not continue productively
    - removed the goal runtime suppression heuristic that stopped
    auto-continuation after a no-tool continuation turn
    - deleted the continuation-activity bookkeeping and left `tool_calls` as
    telemetry only
    - added focused regressions for the two intended behaviors: completed
    no-tool continuation turns still continue, while `request_user_input`
    keeps the existing turn open instead of spawning a new continuation
  • [tool_suggest] Improve tool_suggest triggering conditions. (#20091)
    ## Summary
    - Tighten `tool_suggest` guidance so it prefers explicit plugin install
    requests, while still allowing a connector install when the relevant
    plugin is already installed and a needed connector from that plugin is
    missing.
    - Tell the model not to call `tool_suggest` in parallel with other
    tools.
    
    ## Testing
    - `cargo test -p codex-tools tool_suggest`
    - `cargo test -p codex-core tool_suggest`
  • chore: split memories part 1 (#19818)
    Extract memories into 2 different crates
  • feat: use git-backed workspace diffs for memory consolidation (#18982)
    ## Why
    
    This PR makes the `morpheus` agent (memory phase 2) use a git diff to
    start its consolidation. The workflow is the following:
    1. The agent acquires a lock
    2. If `.codex/memories` does not exist or is not a git root, initialize
    everything (and make a first empty commit)
    3. Update `raw_memories.md` and `rollout_summaries/` as before: we
    select at most N phase 1 memories based on a given policy
    4. We use git (`gix`) to get a diff between the current state of
    `.codex/memories` and the last commit.
    5. Dump the diff in `phase2_workspace_diff.md`
    6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
    7. Wait for `morpheus` to be done
    8. Re-create `.git` and make a single commit on it. We do this because
    we don't want to preserve history through `.git`, and this is cheap
    anyway
    9. We release the lock
    On top of this, we keep the existing retry policies and related handling.
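
    Steps 4 to 8 can be sketched with in-memory snapshots standing in for
    the git baseline. This is only an illustration of the control flow: the
    real code uses `gix` on `.codex/memories`, and the `Snapshot`, `Change`,
    `workspace_diff`, and `run_phase2` names are hypothetical. Skipping the
    agent when the diff is empty follows the "skip clean workspaces" item in
    this PR's change list.

    ```rust
    use std::collections::BTreeMap;

    /// In-memory stand-in for the files under `.codex/memories`
    /// (path -> contents). The real workflow diffs against a git commit.
    pub type Snapshot = BTreeMap<String, String>;

    #[derive(Debug, PartialEq, Eq)]
    pub enum Change {
        Added(String),
        Modified(String),
        Removed(String),
    }

    /// Compare the committed baseline with the current workspace.
    pub fn workspace_diff(baseline: &Snapshot, current: &Snapshot) -> Vec<Change> {
        let mut changes = Vec::new();
        for (path, contents) in current {
            match baseline.get(path) {
                None => changes.push(Change::Added(path.clone())),
                Some(old) if old != contents => changes.push(Change::Modified(path.clone())),
                Some(_) => {}
            }
        }
        for path in baseline.keys() {
            if !current.contains_key(path) {
                changes.push(Change::Removed(path.clone()));
            }
        }
        changes
    }

    /// One consolidation pass. Returns `true` if the agent ran.
    /// `consolidate` stands in for spawning the agent and waiting for it.
    pub fn run_phase2<F>(baseline: &mut Snapshot, workspace: &mut Snapshot, mut consolidate: F) -> bool
    where
        F: FnMut(&[Change], &mut Snapshot),
    {
        let changes = workspace_diff(&*baseline, &*workspace);
        if changes.is_empty() {
            // Clean workspace: nothing to consolidate.
            return false;
        }
        consolidate(&changes, workspace);
        // Reset the baseline to the post-consolidation state, mirroring the
        // "single fresh commit" in step 8.
        *baseline = workspace.clone();
        true
    }
    ```

    Because the baseline is reset only after the agent finishes, a second
    pass with no new inputs sees an empty diff and does nothing.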
    
    The goals of this new workflow are:
    * Better support of any memory extensions such as `chronicle`
    * Allow the user to manually edit memories and have those edits
    considered by the phase 2 agent
     
    As a follow-up we will need to add support for user edits made while
    `morpheus` is running
    
    ## What Changed
    
    - Added memory workspace helpers that prepare the git baseline, compute
    the diff, write `phase2_workspace_diff.md`, and reset the baseline after
    successful consolidation.
    - Updated Phase 2 to sync current inputs into `raw_memories.md` and
    `rollout_summaries/`, prune old extension resources, skip clean
    workspaces, and run the consolidation subagent only when the workspace
    has changes.
    - Tightened Phase 2 job ownership around long-running consolidation with
    heartbeats and an ownership check before resetting the baseline.
    - Simplified the prompt and state APIs so DB watermarks are bookkeeping,
    while workspace dirtiness decides whether consolidation work exists.
    - Updated the memory pipeline README and tests for workspace diffs,
    extension-resource cleanup, pollution-driven forgetting, selection
    ranking, and baseline persistence.
    
    ## Verification
    
    - Added/updated coverage in `core/src/memories/tests.rs`,
    `core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
    and `core/tests/suite/memories.rs`.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add goal core runtime (4 / 5) (#18076)
    Adds the core runtime behavior for active goals on top of the model
    tools from PR 3.
    
    ## Why
    
    A long-running goal should be a core runtime concern, not something
    every client has to implement. Core owns the turn lifecycle, tool
    completion boundaries, interruptions, resume behavior, and token usage,
    so it is the right place to account progress, enforce budgets, and
    decide when to continue work.
    
    ## What changed
    
    - Centralized goal lifecycle side effects behind
    `Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
    - Starts goal continuation turns only when the session is idle; pending
    user input and mailbox work take priority.
    - Accounts token and wall-clock usage at turn, tool, mutation,
    interrupt, and resume boundaries; `get_thread_goal` remains read-only.
    - Preserves sub-second wall-clock remainder across accounting boundaries
    so long-running goals do not drift downward over time.
    - Treats token budget exhaustion as a soft stop by marking the goal
    `budget_limited` and injecting wrap-up steering instead of aborting the
    active turn.
    - Suppresses budget steering when `update_goal` marks a goal complete.
    - Pauses active goals on interrupt and auto-reactivates paused goals
    when a thread resumes outside plan mode.
    - Suppresses repeated automatic continuation when a continuation turn
    makes no tool calls.
    - Added continuation and budget-limit prompt templates.
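
    The sub-second remainder item above is easiest to see with a small
    example. The sketch below is an assumption about the general technique,
    not the actual accounting code; `WallClockAccount` and `naive_seconds`
    are hypothetical names.

    ```rust
    use std::time::Duration;

    /// Hypothetical accumulator that stores whole seconds but carries the
    /// sub-second remainder forward to the next accounting boundary.
    #[derive(Debug, Default)]
    pub struct WallClockAccount {
        pub seconds_used: u64,
        remainder: Duration,
    }

    impl WallClockAccount {
        /// Record the elapsed time since the previous boundary.
        pub fn record(&mut self, elapsed: Duration) {
            let total = self.remainder + elapsed;
            self.seconds_used += total.as_secs();
            // Keep only the fractional part for the next call.
            self.remainder = Duration::from_nanos(u64::from(total.subsec_nanos()));
        }
    }

    /// Truncating each interval independently loses the fractions, which
    /// is the downward drift the remainder is meant to avoid.
    pub fn naive_seconds(intervals: &[Duration]) -> u64 {
        intervals.iter().map(Duration::as_secs).sum()
    }
    ```

    With four 600 ms intervals, the accumulator reports 2 seconds while
    per-interval truncation reports 0.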
    
    ## Verification
    
    - Added focused core coverage for continuation scheduling, accounting
    boundaries, budget-limit steering, completion accounting, interrupt
    pause behavior, resume auto-activation, and wall-clock remainder
    accounting.
  • [codex] Update realtime V2 VAD silence delay and 1.5 prompt (#18092)
    ## Summary
    
    - set the realtime v2 server VAD silence delay to 500ms
    - update the default realtime 1.5 backend prompt to the v4 text
    - keep the session payload and prompt rendering tests aligned with those
    changes
    
    ## Why
    
    - the VAD change gives the voice path a longer pause before ending the
    user's turn
    - the prompt change makes the default bundled realtime prompt match the
    current v4 content
    
    ## Validation
    
    - `cargo +1.93.0 test -p codex-core realtime_prompt --manifest-path
    /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
    - `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
    codex-api
    realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
    --manifest-path
    /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
    - `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
    codex-app-server --test all
    'suite::v2::realtime_conversation::realtime_webrtc_start_emits_sdp_notification'
    --manifest-path /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml
    -- --exact`
  • fix (#17493)
  • Strengthen realtime backend delegation prompt (#17363)
    Encourages realtime prompt handling to delegate user requests to the
    backend agent by default when repo inspection, commands, implementation,
    or validation may help.
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] add memory extensions (#16276)
  • Move default realtime prompt into core (#17165)
    - Adds a core-owned realtime backend prompt template and preparation
    path.
    - Makes omitted realtime start prompts use the core default, while null
    or empty prompts intentionally send empty instructions.
    - Covers the core realtime path and app-server v2 path with integration
    coverage.
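
    The omitted / null / empty distinction above maps naturally onto a
    nested `Option`. This is a sketch under that assumption; the function
    name and the placeholder default text are hypothetical, and the real
    template lives in core.

    ```rust
    /// Placeholder for the core-owned default prompt (hypothetical text).
    pub const DEFAULT_REALTIME_PROMPT: &str = "<core default realtime prompt>";

    /// Outer `None` means the field was omitted, `Some(None)` means an
    /// explicit null, and `Some(Some(text))` means an explicit string.
    pub fn resolve_realtime_instructions(prompt: Option<Option<&str>>) -> String {
        match prompt {
            // Omitted: fall back to the core default.
            None => DEFAULT_REALTIME_PROMPT.to_string(),
            // Explicit null: intentionally send empty instructions.
            Some(None) => String::new(),
            // Explicit string, including the empty string, is sent as-is.
            Some(Some(text)) => text.to_string(),
        }
    }
    ```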
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • codex debug 2 (guardian approved) (#17118)
    Removes lines 8-14 from core/templates/agents/orchestrator.md.
  • codex debug 15 (guardian approved) (#17131)
    Removes lines 99-106 from core/templates/agents/orchestrator.md.
  • codex debug 13 (guardian approved) (#17129)
    Removes lines 85-91 from core/templates/agents/orchestrator.md.
  • codex debug 11 (guardian approved) (#17127)
    Removes lines 71-77 from core/templates/agents/orchestrator.md.
  • codex debug 9 (guardian approved) (#17125)
    Removes lines 57-63 from core/templates/agents/orchestrator.md.
  • codex debug 7 (guardian approved) (#17123)
    Removes lines 43-49 from core/templates/agents/orchestrator.md.
  • codex debug 5 (guardian approved) (#17121)
    Removes lines 29-35 from core/templates/agents/orchestrator.md.
  • codex debug 3 (guardian approved) (#17119)
    Removes lines 15-21 from core/templates/agents/orchestrator.md.
  • codex debug 1 (guardian approved) (#17117)
    Removes lines 1-7 from core/templates/agents/orchestrator.md.
  • extract models manager and related ownership from core (#16508)
    ## Summary
    - split `models-manager` out of `core` and add `ModelsManagerConfig`
    plus `Config::to_models_manager_config()` so model metadata paths stop
    depending on `core::Config`
    - move login-owned/auth-owned code out of `core` into `codex-login`,
    move model provider config into `codex-model-provider-info`, move API
    bridge mapping into `codex-api`, move protocol-owned types/impls into
    `codex-protocol`, and move response debug helpers into a dedicated
    `response-debug-context` crate
    - move feedback tag emission into `codex-feedback`, relocate tests to
    the crates that now own the code, and keep broad temporary re-exports so
    this PR avoids a giant import-only rewrite
    
    ## Major moves and decisions
    - created `codex-models-manager` as the owner for model
    cache/catalog/config/model info logic, including the new
    `ModelsManagerConfig` struct
    - created `codex-model-provider-info` as the owner for provider config
    parsing/defaults and kept temporary `codex-login`/`codex-core`
    re-exports for old import paths
    - moved `api_bridge` error mapping + `CoreAuthProvider` into
    `codex-api`, while `codex-login::api_bridge` temporarily re-exports
    those symbols and keeps the `auth_provider_from_auth` wrapper
    - moved `auth_env_telemetry` and `provider_auth` ownership to
    `codex-login`
    - moved `CodexErr` ownership to `codex-protocol::error`, plus
    `StreamOutput`, `bytes_to_string_smart`, and network policy helpers to
    protocol-owned modules
    - created `codex-response-debug-context` for
    `extract_response_debug_context`, `telemetry_transport_error_message`,
    and related response-debug plumbing instead of leaving that behavior in
    `core`
    - moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and
    `emit_feedback_request_tags_with_auth_env` to `codex-feedback`
    - deferred removal of temporary re-exports and the mechanical import
    rewrites to a stacked follow-up PR so this PR stays reviewable
    
    ## Test moves
    - moved auth refresh coverage from `core/tests/suite/auth_refresh.rs` to
    `login/tests/suite/auth_refresh.rs`
    - moved text encoding coverage from
    `core/tests/suite/text_encoding_fix.rs` to
    `protocol/src/exec_output_tests.rs`
    - moved model info override coverage from
    `core/tests/suite/model_info_overrides.rs` to
    `models-manager/src/model_info_overrides_tests.rs`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [plugins] Polish tool suggest prompts. (#15891)
    - [x] Polish tool suggest prompts to distinguish between missing
    connectors and discoverable plugins, and be very precise about the
    triggering conditions.
  • [apps][tool_suggest] Remove tool_suggest's dependency on tool search. (#14856)
    - [x] Remove tool_suggest's dependency on tool search.
  • [apps] Add tool call meta. (#14647)
    - [x] Add resource_uri and other things to _meta to shortcut resource
    lookup and speed things up.
  • Rename multi-agent wait tool to wait_agent (#14631)
    - rename the multi-agent tool name the model sees to wait_agent
    - update the model-facing prompts and tool descriptions to match
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Update tool search prompts (#14500)
    - [x] Add mentions of connectors because the model always thinks in
    connector terms in its CoT.
    - [x] Suppress list_mcp_resources in favor of tool search for available
    apps.
  • memories: focus write prompts on user preferences (#14493)
    ## Summary
    - update `codex-rs/core/templates/memories/stage_one_system.md` so phase
    1 captures stronger user-preference signals, richer task summaries, and
    cwd provenance without branch-specific fields
    - update `codex-rs/core/templates/memories/consolidation.md` so phase 2
    keeps separate sections for user preferences, reusable knowledge, and
    failure shields while staying cwd-aware but branchless
    - document the `codex` prompt-template maintenance rule in
    `codex-rs/core/src/memories/README.md`: the undated templates are
    canonical here and should be edited in place
    
    ## Testing
    - cargo test -p codex-core memories --manifest-path codex-rs/Cargo.toml
  • [apps] Add tool_suggest tool. (#14287)
    - [x] Add tool_suggest tool.
    - [x] Move chatgpt/src/connectors.rs and core/src/connectors.rs into a
    dedicated mod so that we have all the logic and global cache in one
    place.
    - [x] Update TUI app link view to support rendering the installation
    view for mcp elicitation.
    
    ---------
    
    Co-authored-by: Shaqayeq <shaqayeq@openai.com>
    Co-authored-by: Eric Traut <etraut@openai.com>
    Co-authored-by: pakrym-oai <pakrym@openai.com>
    Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
    Co-authored-by: guinness-oai <guinness@openai.com>
    Co-authored-by: Eugene Brevdo <ebrevdo@users.noreply.github.com>
    Co-authored-by: Charlie Guo <cguo@openai.com>
    Co-authored-by: Fouad Matin <fouad@openai.com>
    Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>
    Co-authored-by: xl-openai <xl@openai.com>
    Co-authored-by: alexsong-oai <alexsong@openai.com>
    Co-authored-by: Owen Lin <owenlin0@gmail.com>
    Co-authored-by: sdcoffey <stevendcoffey@gmail.com>
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Won Park <won@openai.com>
    Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
    Co-authored-by: celia-oai <celia@openai.com>
    Co-authored-by: gabec-openai <gabec@openai.com>
    Co-authored-by: joeytrasatti-openai <joey.trasatti@openai.com>
    Co-authored-by: Leo Shimonaka <leoshimo@openai.com>
    Co-authored-by: Rasmus Rygaard <rasmus@openai.com>
    Co-authored-by: maja-openai <163171781+maja-openai@users.noreply.github.com>
    Co-authored-by: pash-openai <pash@openai.com>
    Co-authored-by: Josh McKinney <joshka@openai.com>
  • feat: search_tool migrate to bring you own tool of Responses API (#14274)
    ## Why
    
    To support the new bring-your-own search tool in the Responses API
    (https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search),
    we are migrating our bm25 search tool to the official way of executing
    search on the client and communicating additional tools to the model.
    
    ## What
    - replace the legacy `search_tool_bm25` flow with client-executed
    `tool_search`
    - add protocol, SSE, history, and normalization support for
    `tool_search_call` and `tool_search_output`
    - return namespaced Codex Apps search results and wire namespaced
    follow-up tool calls back into MCP dispatch
  • feat: pres artifact part 5 (#13355)
    Mostly written by Codex
  • feat: presentation artifact p1 (#13341)
    Part 1 of presentation tool artifact
  • Adjusting plan prompt for clarity and verbosity (#13284)
    `plan.md` prompt changes to tighten plan clarity and verbosity.
  • Tune memory read-path for stale facts (#13088)
    ## Why
    - tighten Codex memory-read behavior around stale facts and conflicting
    memory
    - encode the risk-of-drift vs verification-effort decision rule directly
    in the read-path prompt
    - make partial stale-detail updates explicit so correcting only the
    answer is not treated as sufficient
    
    ## What changed
    - update `codex-rs/core/templates/memories/read_path.md`
    - add guidance for when to verify cheap local facts vs when to answer
    from older memory with visible provenance
    - strengthen same-turn `MEMORY.md` updates when stored concrete details
    are stale
    
    ## Notes
    - this is based on some staleness eval work
  • feat: memories forgetting (#12900)
    Add diff-based memory forgetting
  • Enable request_user_input in Default mode (#12735)
    ## Summary
    - allow `request_user_input` in Default collaboration mode as well as
    Plan
    - update the Default-mode instructions to prefer assumptions first and
    use `request_user_input` only when a question is unavoidable
    - update request_user_input and app-server tests to match the new
    Default-mode behavior
    - refactor collaboration-mode availability plumbing into
    `CollaborationModesConfig` for future mode-related flags
    
    ## Codex author
    `codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
  • feat: adding stream parser (#12666)
    Add a stream parser to extract citations (and others) from a stream.
    This supports cases where markers are split across different tokens.
    
    Codex never managed to make this code work, so everything was done
    manually. Please review carefully and do not touch this part of the code
    without a very clear understanding of it.
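
    To illustrate why split markers are tricky, here is a minimal sketch of
    one way to buffer a possibly incomplete marker between chunks. It is not
    the parser added in this PR: the `<cite>`/`</cite>` marker syntax and
    the `CitationStreamParser`, `Event`, and `partial_marker_len` names are
    all hypothetical.

    ```rust
    const OPEN: &str = "<cite>";
    const CLOSE: &str = "</cite>";

    #[derive(Debug, PartialEq, Eq)]
    pub enum Event {
        Text(String),
        Citation(String),
    }

    #[derive(Default)]
    pub struct CitationStreamParser {
        buffer: String,
        in_citation: bool,
    }

    /// Length of the longest proper prefix of `marker` that `text` ends
    /// with. `marker` is assumed to be ASCII so byte slicing is safe.
    fn partial_marker_len(text: &str, marker: &str) -> usize {
        (1..marker.len())
            .rev()
            .find(|&k| text.ends_with(&marker[..k]))
            .unwrap_or(0)
    }

    impl CitationStreamParser {
        /// Feed one chunk and return the events that are now unambiguous.
        pub fn push(&mut self, chunk: &str) -> Vec<Event> {
            self.buffer.push_str(chunk);
            let mut events = Vec::new();
            loop {
                if self.in_citation {
                    match self.buffer.find(CLOSE) {
                        Some(end) => {
                            let body: String = self.buffer.drain(..end).collect();
                            self.buffer.replace_range(..CLOSE.len(), "");
                            events.push(Event::Citation(body));
                            self.in_citation = false;
                        }
                        // Close marker not seen yet: wait for more input.
                        None => break,
                    }
                } else if let Some(start) = self.buffer.find(OPEN) {
                    let text: String = self.buffer.drain(..start).collect();
                    if !text.is_empty() {
                        events.push(Event::Text(text));
                    }
                    self.buffer.replace_range(..OPEN.len(), "");
                    self.in_citation = true;
                } else {
                    // Hold back a trailing fragment that might be the start
                    // of an open marker; emit everything before it.
                    let keep = partial_marker_len(&self.buffer, OPEN);
                    let emit_len = self.buffer.len() - keep;
                    let text: String = self.buffer.drain(..emit_len).collect();
                    if !text.is_empty() {
                        events.push(Event::Text(text));
                    }
                    break;
                }
            }
            events
        }

        /// Flush whatever is left as plain text at end of stream. An
        /// unterminated citation is flushed without its open marker, a
        /// simplification in this sketch.
        pub fn finish(&mut self) -> Vec<Event> {
            let rest = std::mem::take(&mut self.buffer);
            self.in_citation = false;
            if rest.is_empty() {
                Vec::new()
            } else {
                vec![Event::Text(rest)]
            }
        }
    }
    ```

    Feeding the chunks `"Hello <ci"`, `"te>doc1</c"`, and `"ite> world"`
    yields the text `"Hello "`, then nothing, then the citation `"doc1"`
    followed by the text `" world"`.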