Commit Graph

172 Commits

  • Inject SKILL.md when it's explicitly mentioned. (#7763)
    1. Skills load once in core at session start; the cached outcome is
    reused across core and surfaced to TUI via SessionConfigured.
    2. TUI detects explicit skill selections, and core injects the matching
    SKILL.md content into the turn when a selected skill is present.
  • Removed experimental "command risk assessment" feature (#7799)
    This experimental feature received lukewarm reception during internal
    testing. Removing from the code base.
  • refactoring with_escalated_permissions to use SandboxPermissions instead (#7750)
    helpful in the future if we want more granularity for requesting
    escalated permissions:
    e.g when running in readonly sandbox, model can request to escalate to a
    sandbox that allows writes
  • Fix: gracefully error out for unsupported images (#7478)
    Fix for #7459 
    ## What
    Since codex errors out for unsupported images, stop attempting to
    base64/attach them and instead emit a clear placeholder when the file
    isn’t a supported image MIME.
    
    ## Why
    Local uploads for unsupported formats (e.g., SVG/GIF/etc.) were
    dead-ending after decode failures because of the 400 retry loop. Users
    now get an explicit “cannot attach … unsupported image format …”
    response.
    
    ## How
    Replace the fallback read/encode path with MIME detection that bails out
    for non-image or unsupported image types, returning a consistent
    placeholder. Unreadable and invalid images still produce their existing
    error placeholders.
  • override instructions using ModelInfo (#7754)
    Making sure we can override base instructions
  • load models from disk and set a ttl and etag (#7722)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • Add remote models feature flag (#7648)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • feat(core) Add login to shell_command tool (#6846)
    ## Summary
    Adds the `login` parameter to the `shell_command` tool - optional,
    defaults to true.
    
    ## Testing
    - [x] Tested locally
  • fix: taking plan type from usage endpoint instead of thru auth token (#7610)
    pull plan type from the usage endpoint, persist it in session state /
    tui state, and propagate through rate limit snapshots
  • Call models endpoint in models manager (#7616)
    - Introduce `with_remote_overrides` and update
    `refresh_available_models`
    - Put `auth_manager` instead of `auth_mode` on `models_manager`
    - Remove `ShellType` and `ReasoningLevel` to use already existing
    structs
  • Add models endpoint (#7603)
    - Use the codex-api crate to introduce models endpoint. 
    - Add `models` to codex core tests helpers
    - Add `ModelsInfo` for the endpoint return type
  • Refactor execpolicy fallback evaluation (#7544)
    ## Refactor of the `execpolicy` crate
    
    To illustrate why we need this refactor, consider an agent attempting to
    run `apple | rm -rf ./`. Suppose `apple` is allowed by `execpolicy`.
    Before this PR, `execpolicy` would consider `apple` and `pear` and only
    render one rule match: `Allow`. We would skip any heuristics checks on
    `rm -rf ./` and immediately approve `apple | rm -rf ./` to run.
    
    To fix this, we now thread a `fallback` evaluation function into
    `execpolicy` that runs when no `execpolicy` rules match a given command.
    In our example, we would run `fallback` on `rm -rf ./` and prevent
    `apple | rm -rf ./` from being run without approval.
  • whitelist command prefix integration in core and tui (#7033)
    this PR enables TUI to approve commands and add their prefixes to an
    allowlist:
    <img width="708" height="605" alt="Screenshot 2025-11-21 at 4 18 07 PM"
    src="https://github.com/user-attachments/assets/56a19893-4553-4770-a881-becf79eeda32"
    />
    
    note: we only show the option to whitelist the command when 
    1) command is not multi-part (e.g `git add -A && git commit -m 'hello
    world'`)
    2) command is not already matched by an existing rule
  • Migrate model preset (#7542)
    - Introduce `openai_models` in `/core`
    - Move `PRESETS` under it
    - Move `ModelPreset`, `ModelUpgrade`, `ReasoningEffortPreset`,
    `ReasoningEffortPreset`, and `ReasoningEffortPreset` to `protocol`
    - Introduce `Op::ListModels` and `EventMsg::AvailableModels`
    
    Next steps:
    - migrate `app-server` and `tui` to use the introduced Operation
  • chore: add cargo-deny configuration (#7119)
    - add GitHub workflow running cargo-deny on push/PR
    - document cargo-deny allowlist with workspace-dep notes and advisory
    ignores
    - align workspace crates to inherit version/edition/license for
    consistent checks
  • [feedback] Add source info into feedback metadata. (#7140)
    Verified the source info is correctly attached based on whether it's cli
    or vscode.
  • support MCP elicitations (#6947)
    No support for request schema yet, but we'll at least show the message
    and allow accept/decline.
    
    <img width="823" height="551" alt="Screenshot 2025-11-21 at 2 44 05 PM"
    src="https://github.com/user-attachments/assets/6fbb892d-ca12-4765-921e-9ac4b217534d"
    />
  • Support all types of search actions (#7061)
    Fixes the 
    
    ```
    {
      "error": {
        "message": "Invalid value: 'other'. Supported values are: 'search', 'open_page', and 'find_in_page'.",
        "type": "invalid_request_error",
        "param": "input[150].action.type",
        "code": "invalid_value"
      }
    ```
    error.
    
    
    The actual-actual fix here is supporting absent `query` parameter.
  • [app-server] update doc with codex error info (#6941)
    Document new codex error info. Also fixed the name from
    `codex_error_code` to `codex_error_info`.
  • [app-server & core] introduce new codex error code and v2 app-server error events (#6938)
    This PR does two things:
    1. populate a new `codex_error_code` protocol in error events sent from
    core to client;
    2. old v1 core events `codex/event/stream_error` and `codex/event/error`
    will now both become `error`. We also show codex error code for
    turncompleted -> error status.
    
    new events in app server test:
    ```
    < {
    <   "method": "codex/event/stream_error",
    <   "params": {
    <     "conversationId": "019aa34c-0c14-70e0-9706-98520a760d67",
    <     "id": "0",
    <     "msg": {
    <       "codex_error_code": {
    <         "response_stream_disconnected": {
    <           "http_status_code": 401
    <         }
    <       },
    <       "message": "Reconnecting... 2/5",
    <       "type": "stream_error"
    <     }
    <   }
    < }
    
     {
    <   "method": "error",
    <   "params": {
    <     "error": {
    <       "codexErrorCode": {
    <         "responseStreamDisconnected": {
    <           "httpStatusCode": 401
    <         }
    <       },
    <       "message": "Reconnecting... 2/5"
    <     }
    <   }
    < }
    
    < {
    <   "method": "turn/completed",
    <   "params": {
    <     "turn": {
    <       "error": {
    <         "codexErrorCode": {
    <           "responseTooManyFailedAttempts": {
    <             "httpStatusCode": 401
    <           }
    <         },
    <         "message": "exceeded retry limit, last status: 401 Unauthorized, request id: 9a1b495a1a97ed3e-SJC"
    <       },
    <       "id": "0",
    <       "items": [],
    <       "status": "failed"
    <     }
    <   }
    < }
    ```
  • [app-server] feat: v2 apply_patch approval flow (#6760)
    This PR adds the API V2 version of the apply_patch approval flow, which
    centers around `ThreadItem::FileChange`.
    
    This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only)
    and related events (`item/started`, `item/completed` for
    `ThreadItem::FileChange`, which are emitted in both V1 and V2) through
    the app-server
    protocol. The new approval RPC is only sent when the user initiates a
    turn with the new `turn/start` API so we don't break backwards
    compatibility with VSCE.
    
    Similar to https://github.com/openai/codex/pull/6758, the approach I
    took was to make as few changes to the Codex core as possible,
    leveraging existing `EventMsg` core events, and translating those in
    app-server. I did have to add a few additional fields to
    `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those
    were fairly lightweight.
    
    However, the `EventMsg`s emitted by core are the following:
    ```
    1) Auto-approved (no request for approval)

    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    
    2) Approved by user
    - EventMsg::ApplyPatchApprovalRequest
    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    
    3) Declined by user
    - EventMsg::ApplyPatchApprovalRequest
    - EventMsg::PatchApplyBegin
    - EventMsg::PatchApplyEnd
    ```
    
    For a request triggering an approval, this would result in:
    ```
    item/fileChange/requestApproval
    item/started
    item/completed
    ```
    
    which is different from the `ThreadItem::CommandExecution` flow
    introduced in https://github.com/openai/codex/pull/6758, which does the
    below and is preferable:
    ```
    item/started
    item/commandExecution/requestApproval
    item/completed
    ```
    
    To fix this, we leverage `TurnSummaryStore` on codex_message_processor
    to store a little bit of state, allowing us to fire `item/started` and
    `item/fileChange/requestApproval` whenever we receive the underlying
    `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the
    `EventMsg::PatchApplyBegin` later.
    
    This is much less invasive than modifying the order of EventMsg within
    core (I tried).
    
    The resulting payloads:
    ```
    {
      "method": "item/started",
      "params": {
        "item": {
          "changes": [
            {
              "diff": "Hello from Codex!\n",
              "kind": "add",
              "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
            }
          ],
          "id": "call_Nxnwj7B3YXigfV6Mwh03d686",
          "status": "inProgress",
          "type": "fileChange"
        }
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "method": "item/fileChange/requestApproval",
      "params": {
        "grantRoot": null,
        "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686",
        "reason": null,
        "threadId": "019a9e11-8295-7883-a283-779e06502c6f",
        "turnId": "1"
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "result": {
        "decision": "accept"
      }
    }
    ```
    
    ```
    {
      "method": "item/completed",
      "params": {
        "item": {
          "changes": [
            {
              "diff": "Hello from Codex!\n",
              "kind": "add",
              "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
            }
          ],
          "id": "call_Nxnwj7B3YXigfV6Mwh03d686",
          "status": "completed",
          "type": "fileChange"
        }
      }
    }
    ```
  • Revert "[core] add optional status_code to error events (#6865)" (#6955)
    This reverts commit c2ec477d93.
    
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • [core] add optional status_code to error events (#6865)
    We want to better uncover error status code for clients. Add an optional
    status_code to error events (thread error, error, stream error) so app
    server could uncover the status code from the client side later.
    
    in event log:
    ```
    < {
    <   "method": "codex/event/stream_error",
    <   "params": {
    <     "conversationId": "019a9a32-f576-7292-9711-8e57e8063536",
    <     "id": "0",
    <     "msg": {
    <       "message": "Reconnecting... 5/5",
    <       "status_code": 401,
    <       "type": "stream_error"
    <     }
    <   }
    < }
    < {
    <   "method": "codex/event/error",
    <   "params": {
    <     "conversationId": "019a9a32-f576-7292-9711-8e57e8063536",
    <     "id": "0",
    <     "msg": {
    <       "message": "exceeded retry limit, last status: 401 Unauthorized, request id: 9a0cb03a485067f7-SJC",
    <       "status_code": 401,
    <       "type": "error"
    <     }
    <   }
    < }
    ```
  • storing credits (#6858)
    Expand the rate-limit cache/TUI: store credit snapshots alongside
    primary and secondary windows, render “Credits” when the backend reports
    they exist (unlimited vs rounded integer balances)
  • fix: typos in model picker (#6859)
    # External (non-OpenAI) Pull Request Requirements
    
    Before opening this Pull Request, please read the dedicated
    "Contributing" markdown file or your PR may be closed:
    https://github.com/openai/codex/blob/main/docs/contributing.md
    
    If your PR conforms to our contribution guidelines, replace this text
    with a detailed and high quality description of your changes.
    
    Include a link to a bug report or enhancement request.
  • fix: add more fields to ThreadStartResponse and ThreadResumeResponse (#6847)
    This adds the following fields to `ThreadStartResponse` and
    `ThreadResumeResponse`:
    
    ```rust
        pub model: String,
        pub model_provider: String,
        pub cwd: PathBuf,
        pub approval_policy: AskForApproval,
        pub sandbox: SandboxPolicy,
        pub reasoning_effort: Option<ReasoningEffort>,
    ```
    
    This is important because these fields are optional in
    `ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be
    able to determine what values were ultimately used to start/resume the
    conversation. (Though note that any of these could be changed later
    between turns in the conversation.)
    
    Though to get this information reliably, it must be read from the
    internal `SessionConfiguredEvent` that is created in response to the
    start of a conversation. Because `SessionConfiguredEvent` (as defined in
    `codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a
    number of them had to be added as part of this PR.
    
    Because `SessionConfiguredEvent` is referenced in many tests, test
    instances of `SessionConfiguredEvent` had to be updated, as well, which
    is why this PR touches so many files.
  • feat: remote compaction (#6795)
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • [app-server] feat: add v2 command execution approval flow (#6758)
    This PR adds the API V2 version of the command‑execution approval flow
    for the shell tool.
    
    This PR wires the new RPC (`item/commandExecution/requestApproval`, V2
    only) and related events (`item/started`, `item/completed`, and
    `item/commandExecution/delta`, which are emitted in both V1 and V2)
    through the app-server
    protocol. The new approval RPC is only sent when the user initiates a
    turn with the new `turn/start` API so we don't break backwards
    compatibility with VSCE.
    
    The approach I took was to make as few changes to the Codex core as
    possible, leveraging existing `EventMsg` core events, and translating
    those in app-server. I did have to add additional fields to
    `EventMsg::ExecCommandEndEvent` to capture the command's input so that
    app-server can statelessly transform these events to a
    `ThreadItem::CommandExecution` item for the `item/completed` event.
    
    Once we stabilize the API and it's complete enough for our partners, we
    can work on migrating the core to be aware of command execution items as
    a first-class concept.
    
    **Note**: We'll need followup work to make sure these APIs work for the
    unified exec tool, but will wait til that's stable and landed before
    doing a pass on app-server.
    
    Example payloads below:
    ```
    {
      "method": "item/started",
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": null,
          "exitCode": null,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "inProgress",
          "type": "commandExecution"
        }
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "method": "item/commandExecution/requestApproval",
      "params": {
        "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
        "parsedCmd": [
          {
            "cmd": "touch /tmp/should-trigger-approval",
            "type": "unknown"
          }
        ],
        "reason": "Need to create file in /tmp which is outside workspace sandbox",
        "risk": null,
        "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
        "turnId": "1"
      }
    }
    ```
    
    ```
    {
      "id": 0,
      "result": {
        "acceptSettings": {
          "forSession": false
        },
        "decision": "accept"
      }
    }
    ```
    
    ```
    {
      "params": {
        "item": {
          "aggregatedOutput": null,
          "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
          "cwd": "/Users/owen/repos/codex/codex-rs",
          "durationMs": 224,
          "exitCode": 0,
          "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
          "parsedCmd": [
            {
              "cmd": "touch /tmp/should-trigger-approval",
              "type": "unknown"
            }
          ],
          "status": "completed",
          "type": "commandExecution"
        }
      }
    }
    ```
  • core/tui: non-blocking MCP startup (#6334)
    This makes MCP startup not block TUI startup. Messages sent while MCPs
    are booting will be queued.
    
    
    https://github.com/user-attachments/assets/96e1d234-5d8f-4932-a935-a675d35c05e0
    
    
    Fixes #6317
    
    ---------
    
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • Handle "Don't Trust" directory selection in onboarding (#4941)
    Fixes #4940
    Fixes #4892
    
    When selecting "No, ask me to approve edits and commands" during
    onboarding, the code wasn't applying the correct approval policy,
    causing Codex to block all write operations instead of requesting
    approval.
    
    This PR fixes the issue by persisting the "DontTrust" decision in
    config.toml as `trust_level = "untrusted"` and handling it in the
    sandbox and approval policy logic, so Codex correctly asks for approval
    before making changes.
    
    ## Before (bug)
    <img width="709" height="500" alt="bef"
    src="https://github.com/user-attachments/assets/5aced26d-d810-4754-879a-89d9e4e0073b"
    />
    
    ## After (fixed)
    <img width="713" height="359" alt="aft"
    src="https://github.com/user-attachments/assets/9887bbcb-a9a5-4e54-8e76-9125a782226b"
    />
    
    ---------
    
    Co-authored-by: Eric Traut <etraut@openai.com>
  • feat: better UI for unified_exec (#6515)
    <img width="376" height="132" alt="Screenshot 2025-11-12 at 17 36 22"
    src="https://github.com/user-attachments/assets/ce693f0d-5ca0-462e-b170-c20811dcc8d5"
    />
  • [app-server] small fixes for JSON schema export and one-of types (#6614)
    A partner is consuming our generated JSON schema bundle for app-server
    and identified a few issues:
    - not all polymorphic / one-of types have a type descriminator
    - `"$ref": "#/definitions/v2/SandboxPolicy"` is missing
    - "Option<>" is an invalid schema name, and also unnecessary
    
    This PR:
    - adds the type descriminator to the various types that are missing it
    except for `SessionSource` and `SubAgentSource` because they are
    serialized to disk (adding this would break backwards compat for
    resume), and they should not be necessary to consume for an integration
    with app-server.
    - removes the special handling in `export.rs` of various types like
    SandboxPolicy, which turned out to be unnecessary and incorrect
    - filters out `Option<>` which was auto-generated for request params
    that don't need a body
    
    For context, we currently pull in wayyy more types than we need through
    the `EventMsg` god object which we are **not** planning to expose in API
    v2 (this is how I suspect `SessionSource` and `SubAgentSource` are being
    pulled in). But until we have all the necessary v2 notifications in
    place that will allow us to remove `EventMsg`, we will keep exporting it
    for now.
  • [App-server] add new v2 events:item/reasoning/delta, item/agentMessage/delta & item/reasoning/summaryPartAdded (#6559)
    core event to app server event mapping:
    1. `codex/event/reasoning_content_delta` ->
    `item/reasoning/summaryTextDelta`.
    2. `codex/event/reasoning_raw_content_delta` ->
    `item/reasoning/textDelta`
    3. `codex/event/agent_message_content_delta` →
    `item/agentMessage/delta`.
    4. `codex/event/agent_reasoning_section_break` ->
    `item/reasoning/summaryPartAdded`.
    
    Also added a change in core to pass down content index, summary index
    and item id from events.
    
    Tested with the `git checkout owen/app_server_test_client && cargo run
    -p codex-app-server-test-client -- send-message-v2 "hello"` and verified
    that new events are emitted correctly.
  • Reasoning level update (#6586)
    Automatically update reasoning levels when migrating between models
  • Set verbosity to low for 5.1 (#6568)
    And improve test coverage
  • feat: shell_command tool (#6510)
    This adds support for a new variant of the shell tool behind a flag. To
    test, run `codex` with `--enable shell_command_tool`, which will
    register the tool with Codex under the name `shell_command` that accepts
    the following shape:
    
    ```python
    {
      command: str
      workdir: str | None,
      timeout_ms: int | None,
      with_escalated_permissions: bool | None,
      justification: str | None,
    }
    ```
    
    This is comparable to the existing tool registered under
    `shell`/`container.exec`. The primary difference is that it accepts
    `command` as a `str` instead of a `str[]`. The `shell_command` tool
    executes by running `execvp(["bash", "-lc", command])`, though the exact
    arguments to `execvp(3)` depend on the user's default shell.
    
    The hypothesis is that this will simplify things for the model. For
    example, on Windows, instead of generating:
    
    ```json
    {"command": ["pwsh.exe", "-NoLogo", "-Command", "ls -Name"]}
    ```
    
    The model could simply generate:
    
    ```json
    {"command": "ls -Name"}
    ```
    
    As part of this change, I extracted some logic out of `user_shell.rs` as
    `Shell::derive_exec_args()` so that it can be reused in
    `codex-rs/core/src/tools/handlers/shell.rs`. Note the original code
    generated exec arg lists like:
    
    ```javascript
    ["bash", "-lc", command]
    ["zsh", "-lc", command]
    ["pwsh.exe", "-NoProfile", "-Command", command]
    ```
    
    Using `-l` for Bash and Zsh, but then specifying `-NoProfile` for
    PowerShell seemed inconsistent to me, so I changed this in the new
    implementation while also adding a `use_login_shell: bool` option to
    make this explicit. If we decide to add a `login: bool` to
    `ShellCommandToolCallParams` like we have for unified exec:
    
    
    https://github.com/openai/codex/blob/807e2c27f0a9f2e85c50e7e6df5533f0d9b853c7/codex-rs/core/src/tools/handlers/unified_exec.rs#L33-L34
    
    Then this should make it straightforward to support.
  • Include reasoning tokens in the context window calculation (#6161)
    This value is used to determine whether mid-turn compaction is required.
    Reasoning items are only excluded between turns (and soon will start to
    be preserved even across turns) so it's incorrect to subtract
    reasoning_output_tokens mid term.
    
    This will result in higher values reported between turns but we are also
    looking into preserving reasoning items for the entire conversation to
    improve performance and caching.
  • Changes to sandbox command assessment feature based on initial experiment feedback (#6091)
    * Removed sandbox risk categories; feedback indicates that these are not
    that useful and "less is more"
    * Tweaked the assessment prompt to generate terser answers
    * Fixed bug in orchestrator that prevents this feature from being
    exposed in the extension