[codex] parallelize release code generation (#27702)

The release profile still uses one codegen unit, which serializes LLVM
code generation within each crate. That setting was selected alongside
fat LTO for optimization quality and binary size, but releases now use
ThinLTO and code generation dominates the critical-path build.

Use four codegen units. On an Apple M4 Max with 16 cores and 128 GiB
RAM, using rustc 1.96.0, four and eight units took 507.486 and 505.325
seconds respectively. Four therefore keeps the build-time gain while
limiting the stripped `codex` increase to 14.7%, compared with 21.5% at
eight units. The gzip-compressed binary grows 7.8% at four units.

The one-unit build from an empty target directory took 981.150 seconds.
That comparison also populated dependency and native build caches, so it
is directional rather than controlled. It agrees with the earlier clean
matrix where eight units reduced 671 seconds to 303 seconds:
https://gist.github.com/anp/4b88393a0acd35783d9f42156f3243d5

At the local 48% reduction, the current release's 55m22s critical-path
macOS Cargo step would save about 26 minutes from the 71m28s workflow:
https://github.com/openai/codex/actions/runs/27367405663

The prompt-image medians ranged from 3.9% faster to 0.9% slower. CLI
startup shifted by 1-2 ms while user and system CPU time were unchanged.

This is a draft because the release-latency improvement may not justify
the binary-size increase.
This commit is contained in:
Tamir Duberstein
2026-06-11 19:44:36 -07:00
committed by GitHub
Unverified
parent 16c7c79540
commit e23d4df4ff
+2 -2
View File
@@ -513,8 +513,8 @@ split-debuginfo = "off"
# sidecar symbols and stripped the binaries.
strip = false
# See https://github.com/openai/codex/issues/1411 for details.
codegen-units = 1
# Balance parallel release code generation against binary size.
codegen-units = 4
[profile.profiling]
inherits = "release"