Commit Graph

190 Commits

  • feat: better UI for unified_exec (#6515)
    <img width="376" height="132" alt="Screenshot 2025-11-12 at 17 36 22"
    src="https://github.com/user-attachments/assets/ce693f0d-5ca0-462e-b170-c20811dcc8d5"
    />
  • feat: Add support for --add-dir to exec and TypeScript SDK (#6565)
    ## Summary
    
    Adds support for specifying additional directories in the TypeScript SDK
    through a new `additionalDirectories` option in `ThreadOptions`.
    
    ## Changes
    
    - Added `additionalDirectories` parameter to `ThreadOptions` interface
    - Updated `CodexExec` to accept and pass through additional directories
    via the `--config` flag for `sandbox_workspace_write.writable_roots`
    - Added comprehensive test coverage for the new functionality
    
    ## Test plan
    
    - Added test case that verifies `additionalDirectories` is correctly
    passed as repeated flags
    - Existing tests continue to pass
    
    ---------
    
    Co-authored-by: Claude <noreply@anthropic.com>
  • Update full-auto description with on-request (#6523)
    This PR fixes #6522 by correcting the comment for `full-auto` in both
    `codex-rs/exec/src/cli.rs` and `codex-rs/tui/src/cli.rs` from `-a
    on-failure` to `-a on-request` to make it coherent with
    `codex-rs/tui/src/lib.rs:97-105`:
    
    ```rust
    pub async fn run_main(
        mut cli: Cli,
        codex_linux_sandbox_exe: Option<PathBuf>,
    ) -> std::io::Result<AppExitInfo> {
        let (sandbox_mode, approval_policy) = if cli.full_auto {
            (
                Some(SandboxMode::WorkspaceWrite),
                Some(AskForApproval::OnRequest),
            )
    ```
    
    Running `just codex --help` or `just codex exec --help` should now yield
    the correct description of `full-auto` CLI argument.
    
    Signed-off-by: lionelchg <lionel.cheng@hotmail.fr>
  • Fix warning message phrasing (#6446)
    Small fix for sentence phrasing in the warning message
    
    Co-authored-by: AndrewNikolin <877163+AndrewNikolin@users.noreply.github.com>
  • Add warning on compact (#6052)
    This PR introduces the ability for `core` to send `warnings` as it can
    send `errors. It also sends a warning on compaction.
    
    <img width="811" height="187" alt="image"
    src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a"
    />
  • [Hygiene] Remove include_view_image_tool config (#5976)
    There's still some debate about whether we want to expose
    `tools.view_image` or `feature.view_image` so those are left unchanged
    for now, but this old `include_view_image_tool` config is good-to-go.
    Also updated the doc to reflect that `view_image` tool is now by default
    true.
  • [codex] add developer instructions (#5897)
    we are using developer instructions for code reviews, we need to pass
    them in cli as well.
  • feat: compaction prompt configurable (#5959)
    ```
     codex -c compact_prompt="Summarize in bullet points"
     ```
  • Add item streaming events (#5546)
    Adds AgentMessageContentDelta, ReasoningContentDelta,
    ReasoningRawContentDelta item streaming events while maintaining
    compatibility for old events.
    
    ---------
    
    Co-authored-by: Owen Lin <owen@openai.com>
  • [exec] Add MCP tool arguments and results (#5899)
    Extends mcp_tool_call item to include arguments and results.
  • feat: deprecation warning (#5825)
    <img width="955" height="311" alt="Screenshot 2025-10-28 at 14 26 25"
    src="https://github.com/user-attachments/assets/99729b3d-3bc9-4503-aab3-8dc919220ab4"
    />
  • feature: Add "!cmd" user shell execution (#2471)
    feature: Add "!cmd" user shell execution
    
    This change lets users run local shell commands directly from the TUI by
    prefixing their input with ! (e.g. !ls). Output is truncated to keep the
    exec cell usable, and Ctrl-C cleanly
      interrupts long-running commands (e.g. !sleep 10000).
    
    **Summary of changes**
    
    - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask
    (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs.
    - Reuse the existing tool router: the task constructs a ToolCall for the
    local_shell tool and relies on ShellHandler, so no manual MCP tool
    lookup is required.
    - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the
    TUI can show command metadata, live output, and exit status.
    
    **End-to-end flow**
    
      **TUI handling**
    
    1. ChatWidget::submit_user_message (TUI) intercepts messages starting
    with !.
    2. Non-empty commands dispatch Op::RunUserShellCommand { command };
    empty commands surface a help hint.
    3. No UserInput items are created, so nothing is enqueued for the model.
    
      **Core submission loop**
    4. The submission loop routes the op to handlers::run_user_shell_command
    (core/src/codex.rs).
    5. A fresh TurnContext is created and Session::spawn_user_shell_command
    enqueues UserShellCommandTask.
    
      **Task execution**
    6. UserShellCommandTask::run emits TaskStartedEvent, formats the
    command, and prepares a ToolCall targeting local_shell.
      7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler.
    
      **Shell tool runtime**
    8. ShellHandler::run_exec_like launches the process via the unified exec
    runtime, honoring sandbox and shell policies, and emits
    ExecCommandBegin/End.
    9. Stdout/stderr are captured for the UI, but the task does not turn the
    resulting ToolOutput into a model response.
    
      **Completion**
    10. After ExecCommandEnd, the task finishes without an assistant
    message; the session marks it complete and the exec cell displays the
    final output.
    
      **Conversation context**
    
    - The command and its output never enter the conversation history or the
    model prompt; the flow is local-only.
      - Only exec/task events are emitted for UI rendering.
    
    **Demo video**
    
    
    https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
  • Fixed bug that results in a sporadic hang when attaching images (#5891)
    Addresses https://github.com/openai/codex/issues/5773
    
    Testing: I tested that images work (regardless of order that they are
    associated with the task prompt) in both the CLI and Extension. Also
    verified that conversations in CLI and extension with images can be
    resumed.
  • [Auth] Choose which auth storage to use based on config (#5792)
    This PR is a follow-up to #5591. It allows users to choose which auth
    storage mode they want by using the new
    `cli_auth_credentials_store_mode` config.
  • feat(tui): clarify Windows auto mode requirements (#5568)
    ## Summary
    - Coerce Windows `workspace-write` configs back to read-only, surface
    the forced downgrade in the approvals popup,
      and funnel users toward WSL or Full Access.
    - Add WSL installation instructions to the Auto preset on Windows while
    keeping the preset available for other
      platforms.
    - Skip the trust-on-first-run prompt on native Windows so new folders
    remain read-only without additional
      confirmation.
    - Expose a structured sandbox policy resolution from config to flag
    Windows downgrades and adjust tests (core,
    exec, TUI) to reflect the new behavior; provide a Windows-only approvals
    snapshot.
    
      ## Testing
      - cargo fmt
    - cargo test -p codex-core
    config::tests::add_dir_override_extends_workspace_writable_roots
    - cargo test -p codex-exec
    suite::resume::exec_resume_preserves_cli_configuration_overrides
    - cargo test -p codex-tui
    chatwidget::tests::approvals_selection_popup_snapshot
    - cargo test -p codex-tui
    approvals_popup_includes_wsl_note_for_auto_mode
      - cargo test -p codex-tui windows_skips_trust_prompt
      - just fix -p codex-core
      - just fix -p codex-tui
  • feat: annotate conversations with model_provider for filtering (#5658)
    Because conversations that use the Responses API can have encrypted
    reasoning messages, trying to resume a conversation with a different
    provider could lead to confusing "failed to decrypt" errors. (This is
    reproducible by starting a conversation using ChatGPT login and resuming
    it as a conversation that uses OpenAI models via Azure.)
    
    This changes `ListConversationsParams` to take a `model_providers:
    Option<Vec<String>>` and adds `model_provider` on each
    `ConversationSummary` it returns so these cases can be disambiguated.
    
    Note this ended up making changes to
    `codex-rs/core/src/rollout/tests.rs` because it had a number of cases
    where it expected `Some` for the value of `next_cursor`, but the list of
    rollouts was complete, so according to this docstring:
    
    
    https://github.com/openai/codex/blob/bcd64c7e7231d6316a2377d1525a0fa74f21b783/codex-rs/app-server-protocol/src/protocol.rs#L334-L337
    
    If there are no more items to return, then `next_cursor` should be
    `None`. This PR updates that logic.
    
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5658).
    * #5803
    * #5793
    * __->__ #5658
  • Added model summary and risk assessment for commands that violate sandbox policy (#5536)
    This PR adds support for a model-based summary and risk assessment for
    commands that violate the sandbox policy and require user approval. This
    aids the user in evaluating whether the command should be approved.
    
    The feature works by taking a failed command and passing it back to the
    model and asking it to summarize the command, give it a risk level (low,
    medium, high) and a risk category (e.g. "data deletion" or "data
    exfiltration"). It uses a new conversation thread so the context in the
    existing thread doesn't influence the answer. If the call to the model
    fails or takes longer than 5 seconds, it falls back to the current
    behavior.
    
    For now, this is an experimental feature and is gated by a config key
    `experimental_sandbox_command_assessment`.
    
    Here is a screen shot of the approval prompt showing the risk assessment
    and summary.
    
    <img width="723" height="282" alt="image"
    src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
    />
  • chore: drop approve all (#5503)
    Not needed anymore
  • Enable plan tool by default (#5384)
    ## Summary
    - make the plan tool available by default by removing the feature flag
    and always registering the handler
    - drop plan-tool CLI and API toggles across the exec, TUI, MCP server,
    and app server code paths
    - update tests and configs to reflect the always-on plan tool and guard
    workspace restriction tests against env leakage
    
    ## Testing
    Manually tested the extension. 
    ------
    https://chatgpt.com/codex/tasks/task_i_68f67a3ff2d083209562a773f814c1f9
  • Add ItemStarted/ItemCompleted events for UserInputItem (#5306)
    Adds a new ItemStarted event and delivers UserMessage as the first item
    type (more to come).
    
    
    Renames `InputItem` to `UserInput` considering we're using the `Item`
    suffix for actual items.
  • Auto compact at ~90% (#5292)
    Users now hit a window exceeded limit and they usually don't know what
    to do. This starts auto compact at ~90% of the window.
  • Add forced_chatgpt_workspace_id and forced_login_method configuration options (#5303)
    This PR adds support for configs to specify a forced login method
    (chatgpt or api) as well as a forced chatgpt account id. This lets
    enterprises uses [managed
    configs](https://developers.openai.com/codex/security#managed-configuration)
    to force all employees to use their company's workspace instead of their
    own or any other.
    
    When a workspace id is set, a query param is sent to the login flow
    which auto-selects the given workspace or errors if the user isn't a
    member of it.
    
    This PR is large but a large % of it is tests, wiring, and required
    formatting changes.
    
    API login with chatgpt forced
    <img width="1592" height="116" alt="CleanShot 2025-10-19 at 22 40 04"
    src="https://github.com/user-attachments/assets/560c6bb4-a20a-4a37-95af-93df39d057dd"
    />
    
    ChatGPT login with api forced
    <img width="1018" height="100" alt="CleanShot 2025-10-19 at 22 40 29"
    src="https://github.com/user-attachments/assets/d010bbbb-9c8d-4227-9eda-e55bf043b4af"
    />
    
    Onboarding with api forced
    <img width="892" height="460" alt="CleanShot 2025-10-19 at 22 41 02"
    src="https://github.com/user-attachments/assets/cc0ed45c-b257-4d62-a32e-6ca7514b5edd"
    />
    
    Onboarding with ChatGPT forced
    <img width="1154" height="426" alt="CleanShot 2025-10-19 at 22 41 27"
    src="https://github.com/user-attachments/assets/41c41417-dc68-4bb4-b3e7-3b7769f7e6a1"
    />
    
    Logging in with the wrong workspace
    <img width="2222" height="84" alt="CleanShot 2025-10-19 at 22 42 31"
    src="https://github.com/user-attachments/assets/0ff4222c-f626-4dd3-b035-0b7fe998a046"
    />
  • feat: add --add-dir flag for extra writable roots (#5335)
    Add a `--add-dir` CLI flag so sessions can use extra writable roots in
    addition to the ones specified in the config file. These are ephemerally
    added during the session only.
    
    Fixes #3303
    Fixes #2797
  • revert /name for now (#4978)
    There was a regression where we'd read entire rollout contents if there
    was no /name present.
  • feat: Set chat name (#4974)
    Set chat name with `/name` so they appear in the codex resume page:
    
    
    https://github.com/user-attachments/assets/c0252bba-3a53-44c7-a740-f4690a3ad405
  • Set codex SDK TypeScript originator (#4894)
    ## Summary
    - ensure the TypeScript SDK sets CODEX_INTERNAL_ORIGINATOR_OVERRIDE to
    codex_sdk_ts when spawning the Codex CLI
    - extend the responses proxy test helper to capture request headers for
    assertions
    - add coverage that verifies Codex threads launched from the TypeScript
    SDK send the codex_sdk_ts originator header
    
    ## Testing
    - Not Run (not requested)
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e561b125248320a487f129093d16e7
  • cli: fix zsh completion (#4692)
    Before this change:
    ```
    tamird@L03G26TD12 codex-rs % codex
    zsh: do you wish to see all 3864 possibilities (1285 lines)?
    ```
    
    After this change:
    ```
    tamird@L03G26TD12 codex-rs % codex
    app-server              -- [experimental] Run the app server
    apply                a  -- Apply the latest diff produced by Codex agent as a `git apply` to your local working tree
    cloud                   -- [EXPERIMENTAL] Browse tasks from Codex Cloud and apply changes locally
    completion              -- Generate shell completion scripts
    debug                   -- Internal debugging commands
    exec                 e  -- Run Codex non-interactively
    generate-ts             -- Internal: generate TypeScript protocol bindings
    help                    -- Print this message or the help of the given subcommand(s)
    login                   -- Manage login
    logout                  -- Remove stored authentication credentials
    mcp                     -- [experimental] Run Codex as an MCP server and manage MCP servers
    mcp-server              -- [experimental] Run the Codex MCP server (stdio transport)
    responses-api-proxy     -- Internal: run the responses API proxy
    resume                  -- Resume a previous interactive session (picker by default; use --last to continue the most recent)
    ```
  • Simplify request body assertions (#4845)
    We'll have a lot more test like these
  • Use response helpers when mounting SSE test responses (#4783)
    ## Summary
    - replace manual wiremock SSE mounts in the compact suite with the
    shared response helpers
    - simplify the exec auth_env integration test by using the
    mount_sse_once_match helper
    - rely on mount_sse_sequence plus server request collection to replace
    the bespoke SeqResponder utility in tests
    
    ## Testing
    - just fmt
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e2e238f2a88320a337f0b9e4098093
  • Add helper for response created SSE events in tests (#4758)
    ## Summary
    - add a reusable `ev_response_created` helper that builds
    `response.created` SSE events for integration tests
    - update the exec and core integration suites to use the new helper
    instead of repeating manual JSON literals
    - keep the streaming fixtures consistent by relying on the shared helper
    in every touched test
    
    ## Testing
    - `just fmt`
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61
  • feat: Freeform apply_patch with simple shell output (#4718)
    ## Summary
    This PR is an alternative approach to #4711, but instead of changing our
    storage, parses out shell calls in the client and reserializes them on
    the fly before we send them out as part of the request.
    
    What this changes:
    1. Adds additional serialization logic when the
    ApplyPatchToolType::Freeform is in use.
    2. Adds a --custom-apply-patch flag to enable this setting on a
    session-by-session basis.
    
    This change is delicate, but is not meant to be permanent. It is meant
    to be the first step in a migration:
    1. (This PR) Add in-flight serialization with config
    2. Update model_family default
    3. Update serialization logic to store turn outputs in a structured
    format, with logic to serialize based on model_family setting.
    4. Remove this rewrite in-flight logic.
    
    ## Test Plan
    - [x] Additional unit tests added
    - [x] Integration tests added
    - [x] Tested locally
  • add(core): managed config (#3868)
    ## Summary
    
    - Factor `load_config_as_toml` into `core::config_loader` so config
    loading is reusable across callers.
    - Layer `~/.codex/config.toml`, optional `~/.codex/managed_config.toml`,
    and macOS managed preferences (base64) with recursive table merging and
    scoped threads per source.
    
    ## Config Flow
    
    ```
    Managed prefs (macOS profile: com.openai.codex/config_toml_base64)
                                   ▲
                                   │
    ~/.codex/managed_config.toml   │  (optional file-based override)
                                   ▲
                                   │
                    ~/.codex/config.toml (user-defined settings)
    ```
    
    - The loader searches under the resolved `CODEX_HOME` directory
    (defaults to `~/.codex`).
    - Managed configs let administrators ship fleet-wide overrides via
    device profiles which is useful for enforcing certain settings like
    sandbox or approval defaults.
    - For nested hash tables: overlays merge recursively. Child tables are
    merged key-by-key, while scalar or array values replace the prior layer
    entirely. This lets admins add or tweak individual fields without
    clobbering unrelated user settings.
  • feat: codex exec writes only the final message to stdout (#4644)
    This updates `codex exec` so that, by default, most of the agent's
    activity is written to stderr so that only the final agent message is
    written to stdout. This makes it easier to pipe `codex exec` into
    another tool without extra filtering.
    
    I introduced `#![deny(clippy::print_stdout)]` to help enforce this
    change and renamed the `ts_println!()` macro to `ts_msg()` because (1)
    it no longer calls `println!()` and (2), `ts_eprintln!()` seemed too
    long of a name.
    
    While here, this also adds `-o` as an alias for `--output-last-message`.
    
    Fixes https://github.com/openai/codex/issues/1670
  • chore: refactor tool handling (#4510)
    # Tool System Refactor
    
    - Centralizes tool definitions and execution in `core/src/tools/*`:
    specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
    registry/dispatch (`registry.rs`), and shared context (`context.rs`).
    One registry now builds the model-visible tool list and binds handlers.
    - Router converts model responses to tool calls; Registry dispatches
    with consistent telemetry via `codex-rs/otel` and unified error
    handling. Function, Local Shell, MCP, and experimental `unified_exec`
    all flow through this path; legacy shell aliases still work.
    - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
    make adding tools predictable and testable.
    
    Example: `read_file`
    - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
    registered by `build_specs`).
    - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
    1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
    - E2E test: `core/tests/suite/read_file.rs` validates the tool returns
    the requested lines.
    
    ## Next steps:
    - Decompose `handle_container_exec_with_params` 
    - Add parallel tool calls
  • Use supports_color in codex exec (#4633)
    It knows how to detect github actions
  • Separate interactive and non-interactive sessions (#4612)
    Do not show exec session in VSCode/TUI selector.
  • Support CODEX_API_KEY for codex exec (#4615)
    Allows to set API key per invocation of `codex exec`
  • Add initial set of doc comments to the SDK (#4513)
    Also perform minor code cleanup.
  • fix: remove mcp-types from app server protocol (#4537)
    We continue the separation between `codex app-server` and `codex
    mcp-server`.
    
    In particular, we introduce a new crate, `codex-app-server-protocol`,
    and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
    `codex-rs/app-server-protocol/src/protocol.rs`.
    
    Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
    into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
    because it is referenced in a ton of places, we have to touch a lot of
    files as part of this PR.
    
    We also decide to get away from proper JSON-RPC 2.0 semantics, so we
    also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
    is basically the same `JSONRPCMessage` type defined in `mcp-types`
    except with all of the `"jsonrpc": "2.0"` removed.
    
    Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
    considerably simpler, as we can lean heavier on serde to serialize
    directly into the wire format that we use now.
  • Remove legacy codex exec --json format (#4525)
    `codex exec --json` now maps to the behavior of `codex exec
    --experimental-json` with new event and item shapes.
    
    Thread events:
    - thread.started
    - turn.started
    - turn.completed
    - turn.failed
    - item.started
    - item.updated
    - item.completed
    
    Item types: 
    - assistant_message
    - reasoning
    - command_execution
    - file_change
    - mcp_tool_call
    - web_search
    - todo_list
    - error
    
    Sample output:
    
    <details>
    `codex exec "list my assigned github issues"  --json | jq`
    
    ```
    {
      "type": "thread.started",
      "thread_id": "01999ce5-f229-7661-8570-53312bd47ea3"
    }
    {
      "type": "turn.started"
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_0",
        "item_type": "reasoning",
        "text": "**Planning to list assigned GitHub issues**"
      }
    }
    {
      "type": "item.started",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "in_progress"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_1",
        "item_type": "mcp_tool_call",
        "server": "github",
        "tool": "search_issues",
        "status": "completed"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_2",
        "item_type": "reasoning",
        "text": "**Organizing final message structure**"
      }
    }
    {
      "type": "item.completed",
      "item": {
        "id": "item_3",
        "item_type": "assistant_message",
        "text": "**Assigned Issues**\n- openai/codex#3267 – “stream error: stream disconnected before completion…” (bug) – last update 2025-09-08\n- openai/codex#3257 – “You've hit your usage limit. Try again in 4 days 20 hours 9 minutes.” – last update 2025-09-23\n- openai/codex#3054 – “reqwest SSL panic (library has no ciphers)” (bug) – last update 2025-09-03\n- openai/codex#3051 – “thread 'main' panicked at linux-sandbox/src/linux_run_main.rs:53:5:” (bug) – last update 2025-09-10\n- openai/codex#3004 – “Auto-compact when approaching context limit” (enhancement) – last update 2025-09-26\n- openai/codex#2916 – “Feature request: Add OpenAI service tier support for cost optimization” – last update 2025-09-12\n- openai/codex#1581 – “stream error: stream disconnected before completion: stream closed before response.complete; retrying...” (bug) – last update 2025-09-17"
      }
    }
    {
      "type": "turn.completed",
      "usage": {
        "input_tokens": 34785,
        "cached_input_tokens": 12544,
        "output_tokens": 560
      }
    }
    ```
    
    </details>