codex

Disable very flaky tests (#11394 )

Collected from last 20 builds of main in
https://github.com/openai/codex/commits/main/.

pakrym-oai · 2026-02-10 18:50:11 -08:00

bfd4e2112c

tui: double-press Ctrl+C/Ctrl+D to quit (#8936 )

## Problem

Codex’s TUI quit behavior has historically been easy to trigger
accidentally and hard to reason
about.

- `Ctrl+C`/`Ctrl+D` could terminate the UI immediately, which is a
common key to press while trying
  to dismiss a modal, cancel a command, or recover from a stuck state.
- “Quit” and “shutdown” were not consistently separated, so some exit
paths could bypass the
  shutdown/cleanup work that should run before the process terminates.

This PR makes quitting both safer (harder to do by accident) and more
uniform across quit
gestures, while keeping the shutdown-first semantics explicit.

## Mental model

After this change, the system treats quitting as a UI request that is
coordinated by the app
layer.

- The UI requests exit via `AppEvent::Exit(ExitMode)`.
- `ExitMode::ShutdownFirst` is the normal user path: the app triggers
`Op::Shutdown`, continues
rendering while shutdown runs, and only ends the UI loop once shutdown
has completed.
- `ExitMode::Immediate` exists as an escape hatch (and as the
post-shutdown “now actually exit”
signal); it bypasses cleanup and should not be the default for
user-triggered quits.

User-facing quit gestures are intentionally “two-step” for safety:

- `Ctrl+C` and `Ctrl+D` no longer exit immediately.
- The first press arms a 1-second window and shows a footer hint (“ctrl
+ <key> again to quit”).
- Pressing the same key again within the window requests a
shutdown-first quit; otherwise the
  hint expires and the next press starts a fresh window.

Key routing remains modal-first:

- A modal/popup gets first chance to consume `Ctrl+C`.
- If a modal handles `Ctrl+C`, any armed quit shortcut is cleared so
dismissing a modal cannot
  prime a subsequent `Ctrl+C` to quit.
- `Ctrl+D` only participates in quitting when the composer is empty and
no modal/popup is active.

The design doc `docs/exit-confirmation-prompt-design.md` captures the
intended routing and the
invariants the UI should maintain.

## Non-goals

- This does not attempt to redesign modal UX or make modals uniformly
dismissible via `Ctrl+C`.
It only ensures modals get priority and that quit arming does not leak
across modal handling.
- This does not introduce a persistent confirmation prompt/menu for
quitting; the goal is to keep
  the exit gesture lightweight and consistent.
- This does not change the semantics of core shutdown itself; it changes
how the UI requests and
  sequences it.

## Tradeoffs

- Quitting via `Ctrl+C`/`Ctrl+D` now requires a deliberate second
keypress, which adds friction for
  users who relied on the old “instant quit” behavior.
- The UI now maintains a small time-bounded state machine for the armed
shortcut, which increases
  complexity and introduces timing-dependent behavior.

This design was chosen over alternatives (a modal confirmation prompt or
a long-lived “are you
sure” state) because it provides an explicit safety barrier while
keeping the flow fast and
keyboard-native.

## Architecture

- `ChatWidget` owns the quit-shortcut state machine and decides when a
quit gesture is allowed
  (idle vs cancellable work, composer state, etc.).
- `BottomPane` owns rendering and local input routing for modals/popups.
It is responsible for
consuming cancellation keys when a view is active and for
showing/expiring the footer hint.
- `App` owns shutdown sequencing: translating
`AppEvent::Exit(ShutdownFirst)` into `Op::Shutdown`
  and only terminating the UI loop when exit is safe.

This keeps “what should happen” decisions (quit vs interrupt vs ignore)
in the chat/widget layer,
while keeping “how it looks and which view gets the key” in the
bottom-pane layer.

## Observability

You can tell this is working by running the TUIs and exercising the quit
gestures:

- While idle: pressing `Ctrl+C` (or `Ctrl+D` with an empty composer and
no modal) shows a footer
hint for ~1 second; pressing again within that window exits via
shutdown-first.
- While streaming/tools/review are active: `Ctrl+C` interrupts work
rather than quitting.
- With a modal/popup open: `Ctrl+C` dismisses/handles the modal (if it
chooses to) and does not
arm a quit shortcut; a subsequent quick `Ctrl+C` should not quit unless
the user re-arms it.

Failure modes are visible as:

- Quits that happen immediately (no hint window) from `Ctrl+C`/`Ctrl+D`.
- Quits that occur while a modal is open and consuming `Ctrl+C`.
- UI termination before shutdown completes (cleanup skipped).

## Tests

- Updated/added unit and snapshot coverage in `codex-tui` and
`codex-tui2` to validate:
  - The quit hint appears and expires on the expected key.
- Double-press within the window triggers a shutdown-first quit request.
- Modal-first routing prevents quit bypass and clears any armed shortcut
when a modal consumes
    `Ctrl+C`.

These tests focus on the UI-level invariants and rendered output; they
do not attempt to validate
real terminal key-repeat timing or end-to-end process shutdown behavior.

---
Screenshot:
<img width="912" height="740" alt="Screenshot 2026-01-13 at 1 05 28 PM"
src="https://github.com/user-attachments/assets/18f3d22e-2557-47f2-a369-ae7a9531f29f"
/>

Josh McKinney · 2026-01-14 17:42:52 +00:00

4283a7432b

Improve handling of config and rules errors for app server clients (#9182 )

When an invalid config.toml key or value is detected, the CLI currently
just quits. This leaves the VSCE in a dead state.

This PR changes the behavior to not quit and bubble up the config error
to users to make it actionable. It also surfaces errors related to
"rules" parsing.

This allows us to surface these errors to users in the VSCE, like this:

<img width="342" height="129" alt="Screenshot 2026-01-13 at 4 29 22 PM"
src="https://github.com/user-attachments/assets/a79ffbe7-7604-400c-a304-c5165b6eebc4"
/>

<img width="346" height="244" alt="Screenshot 2026-01-13 at 4 45 06 PM"
src="https://github.com/user-attachments/assets/de874f7c-16a2-4a95-8c6d-15f10482e67b"
/>

Eric Traut · 2026-01-13 17:57:09 -08:00

31d9b6f4d2

fix: integration test for #9011 (#9166 )

Adds an integration test for the new behavior introduced in
https://github.com/openai/codex/pull/9011. The work to create the test
setup was substantial enough that I thought it merited a separate PR.

This integration test spawns `codex` in TUI mode, which requires
spawning a PTY to run successfully, so I had to introduce quite a bit of
scaffolding in `run_codex_cli()`. I was surprised to discover that we
have not done this in our codebase before, so perhaps this should get
moved to a common location so it can be reused.

The test itself verifies that a malformed `rules` in `$CODEX_HOME`
prints a human-readable error message and exits nonzero.

Michael Bolin · 2026-01-13 23:39:34 +00:00

4c673086bc

chore: delete chatwidget::tests::binary_size_transcript_snapshot tui test (#6759 )

We're running into quite a bit of drag maintaining this test, since
every time we add fields to an EventMsg that happened to be dumped into
the `binary-size-log.jsonl` fixture, this test starts to fail. The fix
is usually to either manually update the `binary-size-log.jsonl` fixture
file, or update the `upgrade_event_payload_for_tests` function to map
the data in that file into something workable.

Eason says it's fine to delete this test, so let's just delete it

Owen Lin · 2025-11-17 11:11:41 -08:00

ae2a084fae

fix(tui): propagate errors in insert_history_lines_to_writer (#4266 )

## What?
Fixed error handling in `insert_history_lines_to_writer` where all
terminal operations were silently ignoring errors via `.ok()`.

  ## Why?
Silent I/O failures could leave the terminal in an inconsistent state
(e.g., scroll region not reset) with no way to debug. This violates Rust
error handling best practices.

  ## How?
  - Changed function signature to return `io::Result<()>`
  - Replaced all `.ok()` calls with `?` operator to propagate errors
- Added `tracing::warn!` in wrapper function for backward compatibility
  - Updated 15 test call sites to handle Result  with `.expect()`

  ## Testing
  - ✅ Pass all tests

  ## Type of Change
  - [x] Bug fix (non-breaking change)

---------

Signed-off-by: Huaiwu Li <lhwzds@gmail.com>
Co-authored-by: Eric Traut <etraut@openai.com>

Huaiwu Li · 2025-10-30 18:07:51 -07:00

9a638dbf4e

feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222 )

This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent`
in the core protocol (`protocol/src/protocol.rs`), which is also what
this field is named on `ExecCommandBeginEvent`. Honestly, I don't love
the name (it sounds like a single command, but it is actually a list of
them), but I don't want to get distracted by a naming discussion right
now.

This also adds `parsed_cmd` to `ExecCommandApprovalParams` in
`codex-rs/app-server-protocol/src/protocol.rs`, so it will be available
via `codex app-server`, as well.

For consistency, I also updated `ExecApprovalElicitRequestParams` in
`codex-rs/mcp-server/src/exec_approval.rs` to include this field under
the name `codex_parsed_cmd`, as that struct already has a number of
special `codex_*` fields. Note this is the code for when Codex is used
as an MCP _server_ and therefore has to conform to the official spec for
an MCP elicitation type.

Michael Bolin · 2025-10-15 13:58:40 -07:00

995f5c3614

update composer + user message styling (#4240 )

Changes:

- the composer and user messages now have a colored background that
stretches the entire width of the terminal.
- the prompt character was changed from a cyan `▌` to a bold `›`.
- the "working" shimmer now follows the "dark gray" color of the
terminal, better matching the terminal's color scheme

| Terminal + Background        | Screenshot |
|------------------------------|------------|
| iTerm with dark bg | <img width="810" height="641" alt="Screenshot
2025-09-25 at 11 44 52 AM"
src="https://github.com/user-attachments/assets/1317e579-64a9-4785-93e6-98b0258f5d92"
/> |
| iTerm with light bg | <img width="845" height="540" alt="Screenshot
2025-09-25 at 11 46 29 AM"
src="https://github.com/user-attachments/assets/e671d490-c747-4460-af0b-3f8d7f7a6b8e"
/> |
| iTerm with color bg | <img width="825" height="564" alt="Screenshot
2025-09-25 at 11 47 12 AM"
src="https://github.com/user-attachments/assets/141cda1b-1164-41d5-87da-3be11e6a3063"
/> |
| Terminal.app with dark bg | <img width="577" height="367"
alt="Screenshot 2025-09-25 at 11 45 22 AM"
src="https://github.com/user-attachments/assets/93fc4781-99f7-4ee7-9c8e-3db3cd854fe5"
/> |
| Terminal.app with light bg | <img width="577" height="367"
alt="Screenshot 2025-09-25 at 11 46 04 AM"
src="https://github.com/user-attachments/assets/19bf6a3c-91e0-447b-9667-b8033f512219"
/> |
| Terminal.app with color bg | <img width="577" height="367"
alt="Screenshot 2025-09-25 at 11 45 50 AM"
src="https://github.com/user-attachments/assets/dd7c4b5b-342e-4028-8140-f4e65752bd0b"
/> |

Jeremy Rose · 2025-09-26 16:35:56 -07:00

43b63ccae8

feat: update default (#4076 )

Changes:
- Default model and docs now use gpt-5-codex. 
- Disables the GPT-5 Codex NUX by default.
- Keeps presets available for API key users.

Thibault Sottiaux · 2025-09-22 20:10:52 -07:00

c93e77b68b

feat: include reasoning_effort in NewConversationResponse (#3506 )

`ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.

Michael Bolin · 2025-09-11 21:04:40 -07:00

9bbeb75361

fix: include rollout_path in NewConversationResponse (#3352 )

Adding the `rollout_path` to the `NewConversationResponse` makes it so a
client can perform subsequent operations on a `(ConversationId,
PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352).
* #3353
* __->__ #3352

Michael Bolin · 2025-09-09 00:11:48 -07:00

2a76a08a9e

fix: fix serde_as annotation and verify with test (#3170 )

I didn't do https://github.com/openai/codex/pull/3163 correctly the
first time: now verified with a test.

Michael Bolin · 2025-09-04 10:38:00 -07:00

91708bb031

prefer ratatui Stylized for constructing lines/spans (#3068 )

no functional change, just simplifying ratatui styling and adding
guidance in AGENTS.md for future.

Jeremy Rose · 2025-09-02 23:19:54 +00:00

578ff09e17

rework message styling (#2877 )

https://github.com/user-attachments/assets/cf07f62b-1895-44bb-b9c3-7a12032eb371

Jeremy Rose · 2025-09-02 17:29:58 +00:00

e442ecedab

do not show timeouts as "sandbox error"s (#2587 )

🙅🫸
```
✗ Failed (exit -1)
  └ 🧪 cargo test --all-features -q
    sandbox error: command timed out
```

😌👉
```
✗ Failed (exit -1)
  └ 🧪 cargo test --all-features -q
    error: command timed out
```

Jeremy Rose · 2025-08-25 17:52:23 -07:00

17e5077507

test: faster test execution in codex-core (#2633 )

this dramatically improves time to run `cargo test -p codex-core` (~25x
speedup).

before:
```
cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
```

after:
```
cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
```

both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
to exclude compile times.

approach inspired by [Delete Cargo Integration
Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
we move all test cases in tests/ into a single suite in order to have a
single binary, as there is significant overhead for each test binary
executed, and because test execution is only parallelized with a single
binary.

Jeremy Rose · 2025-08-24 11:10:53 -07:00

32bbbbad61

send-aggregated output (#2364 )

We want to send an aggregated output of stderr and stdout so we don't
have to aggregate it stderr+stdout as we lose order sometimes.

---------

Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>

Ahmed Ibrahim · 2025-08-23 16:54:31 +00:00

957d44918d

tui: coalesce command output; show unabridged commands in transcript (#2590 )

https://github.com/user-attachments/assets/effec7c7-732a-4b61-a2ae-3cb297b6b19b

Jeremy Rose · 2025-08-22 16:32:31 -07:00

d994019f3f

hide CoT by default; show headers in status indicator (#2316 )

Plan is for full CoT summaries to be visible in a "transcript view" when
we implement that, but for now they're hidden.


https://github.com/user-attachments/assets/e8a1b0ef-8f2a-48ff-9625-9c3c67d92cdb

Jeremy Rose · 2025-08-20 16:58:56 -07:00

e95cad1946

chore: upgrade to Rust 1.89 (#2465 )

Codex created this PR from the following prompt:

> upgrade this entire repo to Rust 1.89. Note that this requires
updating codex-rs/rust-toolchain.toml as well as the workflows in
.github/. Make sure that things are "clippy clean" as this change will
likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests`
are sufficient to check for correctness

Note this modifies a lot of lines because it folds nested `if`
statements using `&&`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465).
* #2467
* __->__ #2465

Michael Bolin · 2025-08-19 13:22:02 -07:00

50c48e88f5

tui: standardize tree prefix glyphs to └ (#2274 )

Replace mixed `⎿` and `L` prefixes with `└` in TUI rendering.

<img width="454" height="659" alt="Screenshot 2025-08-13 at 4 02 03 PM"
src="https://github.com/user-attachments/assets/61c9c7da-830b-4040-bb79-a91be90870ca"
/>

Jeremy Rose · 2025-08-13 19:14:03 -04:00

bb9ce3cb78

Re-add markdown streaming (#2029 )

Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.

easong-openai · 2025-08-12 17:37:28 -07:00

6340acd885

Revert "Streaming markdown (#1920 )" (#1981 )

This reverts commit 2b7139859e.

easong-openai · 2025-08-08 01:38:39 +00:00

52e12f2b6c

Streaming markdown (#1920 )

We wait until we have an entire newline, then format it with markdown and stream in to the UI. This reduces time to first token but is the right thing to do with our current rendering model IMO. Also lets us add word wrapping!

easong-openai · 2025-08-07 18:26:47 -07:00

2b7139859e

Stream model responses (#1810 )

Stream models thoughts and responses instead of waiting for the whole
thing to come through. Very rough right now, but I'm making the risk call to push through.

easong-openai · 2025-08-05 04:23:22 +00:00

906d449760

feat: initial import of Rust implementation of Codex CLI in codex-rs/ (#629 )

As stated in `codex-rs/README.md`:

Today, Codex CLI is written in TypeScript and requires Node.js 22+ to
run it. For a number of users, this runtime requirement inhibits
adoption: they would be better served by a standalone executable. As
maintainers, we want Codex to run efficiently in a wide range of
environments with minimal overhead. We also want to take advantage of
operating system-specific APIs to provide better sandboxing, where
possible.

To that end, we are moving forward with a Rust implementation of Codex
CLI contained in this folder, which has the following benefits:

- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to
[seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and
[landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in
order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption
and better, more predictable performance.

Currently, the Rust implementation is materially behind the TypeScript
implementation in functionality, so continue to use the TypeScript
implmentation for the time being. We will publish native executables via
GitHub Releases as soon as we feel the Rust version is usable.

Michael Bolin · 2025-04-24 13:31:40 -07:00

31d0d7a305

26 Commits