Commit Graph

23 Commits

  • remove model_family from `config` (#7571)
    - Remove `model_family` from `config`
    - Make sure to still override config elements related to `model_family`
    like supporting reasoning
  • Migrate model family to models manager (#7565)
    This PR moves `ModelsFamily` to `openai_models`. It also propagates
    `ModelsManager` to session services and uses it to drive the model family.
    We also make `derive_default_model_family` private because it's a step
    towards what we want: one place that provides model configuration.
    
    This is a second step toward having one source of truth for model
    information and config: `ModelsManager`.
    
    The next step would be to remove `ModelsFamily` from config. That's a
    large change because it's used in 41 places, mostly before launching
    `codex`. We also need to make `find_family_for_model` private. That's
    also a big change because it's used in 21 places, almost all of them tests.
  • core: make shell behavior portable on FreeBSD (#7039)
    - Use /bin/sh instead of /bin/bash on FreeBSD/OpenBSD in the process
    group timeout test to avoid command-not-found failures.
    
    - Accept /usr/local/bin/bash as a valid SHELL path to match common
    FreeBSD installations.
    
    - Switch the shell serialization duration test to /bin/sh for improved
    portability across Unix platforms.
    
    With this change, `cargo test -p codex-core --lib` runs and passes on
    FreeBSD.
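
    The portability choices above could be sketched roughly as follows. This
    is a minimal illustration, not the code from the PR: `test_shell` and
    `is_accepted_bash_path` are hypothetical names, and the accepted paths
    other than `/usr/local/bin/bash` are assumptions.

    ```rust
    /// Hypothetical helper: pick a shell that exists on the target platform.
    /// FreeBSD and OpenBSD do not ship /bin/bash in the base system.
    fn test_shell() -> &'static str {
        if cfg!(any(target_os = "freebsd", target_os = "openbsd")) {
            "/bin/sh"
        } else {
            "/bin/bash"
        }
    }

    /// Hypothetical check: treat common bash install locations as valid
    /// SHELL values. Only /usr/local/bin/bash is named in the commit; the
    /// other two paths are assumed for illustration.
    fn is_accepted_bash_path(shell: &str) -> bool {
        matches!(shell, "/bin/bash" | "/usr/bin/bash" | "/usr/local/bin/bash")
    }
    ```

    Selecting the shell with `cfg!` keeps one test body for every platform
    instead of duplicating the test behind `#[cfg]` attributes.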
  • Move shell to use truncate_text (#6842)
    Move shell to use the configurable `truncate_text`
    
    ---------
    
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • shell_command returns freeform output (#6860)
    Instead of returning structured output and then re-formatting it into
    freeform, return the freeform output directly from the shell_command tool.
    
    Keep `shell` as the default tool for GPT-5.
  • chore(core) Add shell_serialization coverage (#6810)
    ## Summary
    Similar to #6545, this PR updates the shell_serialization test suite to
    cover the various `shell` tool invocations we have. Note that this does
    not cover unified_exec, which has its own suite of tests. This should
    provide some test coverage for when we eventually consolidate
    serialization logic.
    
    ## Testing
    - [x] These are tests
  • Update defaults to gpt-5.1 (#6652)
    ## Summary
    - update documentation, example configs, and automation defaults to
    reference gpt-5.1 / gpt-5.1-codex
    - bump the CLI and core configuration defaults, model presets, and error
    messaging to the new models while keeping the model-family/tool coverage
    for legacy slugs
    - refresh tests, fixtures, and TUI snapshots so they expect the upgraded
    defaults
    
    ## Testing
    - `cargo test -p codex-core config::tests::test_precedence_fixture_with_gpt5_profile`
    
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)
  • fix(core) serialize shell_command (#6744)
    ## Summary
    Ensures we're serializing calls to `shell_command`
    
    ## Testing
    - [x] Added unit test
  • Promote shared helpers for suite tests (#6460)
    ## Summary
    - add `TestCodex::submit_turn_with_policies` and extend the response
    helpers with reusable tool-call utilities
    - update the grep_files, read_file, list_dir, shell_serialization, and
    tools suites to rely on the shared helpers instead of local copies
    - make the list_dir helper return `anyhow::Result` so clippy no longer
    warns about `expect`
    
    ## Testing
    - `just fix -p codex-core`
    - `cargo test -p codex-core --test all suite::grep_files::grep_files_tool_collects_matches`
    - `cargo test -p codex-core suite::grep_files::grep_files_tool_collects_matches -- --ignored`
    (filter requests ignored tests so nothing runs, but the build stays
    clean)
    
    
    ------
    [Codex
    Task](https://chatgpt.com/codex/tasks/task_i_69112d53abac83219813cab4d7cb6446)
  • chore: Add shell serialization tests for json (#6043)
    ## Summary
    We can never have enough tests on this code path. This checks that JSON
    inside a shell call is deserialized correctly.
    
    ## Tests
    - [x] These are tests 😎
  • Add ItemStarted/ItemCompleted events for UserInputItem (#5306)
    Adds a new ItemStarted event and delivers UserMessage as the first item
    type (more to come).
    
    
    Renames `InputItem` to `UserInput` considering we're using the `Item`
    suffix for actual items.
  • fix(core) use regex for all shell_serialization tests (#5189)
    ## Summary
    I thought I had switched all of these to use a regex, but I missed two.
    This should address our [flaky test
    problem](https://github.com/openai/codex/actions/runs/18511206616/job/52752341520?pr=5185).
    
    ## Test Plan
    - [x] Only updated unit tests
  • fix: apply_patch shell_serialization tests (#4786)
    ## Summary
    Adds additional shell_serialization tests specifically for apply_patch
    and other cases.
    
    ## Test Plan
    - [x] These are all tests
  • feat: feature flag (#4948)
    Add a proper feature flag mechanism instead of having custom flags for
    everything. This is only for experimental/WIP parts of the code.
    It can be used through the CLI:
    ```bash
    codex --enable unified_exec --disable view_image_tool
    ```
    
    Or in `config.toml`:
    ```toml
    # Global toggles applied to every profile unless overridden.
    [features]
    apply_patch_freeform = true
    view_image_tool = false
    ```
    
    Follow-up:
    In a later PR, the goal is to have default `bundles` of features that we
    can associate with a model.
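
    One way the two sources of toggles could combine is sketched below. This
    is an assumption-laden illustration rather than the actual implementation:
    `resolve_features` is a hypothetical name, and the precedence shown
    (`--enable` and `--disable` applied on top of the `[features]` table) is
    assumed, not stated in the commit.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical resolver: start from the `[features]` table loaded from
    /// config.toml, then apply CLI overrides (assumed to take precedence).
    fn resolve_features(
        config: &BTreeMap<String, bool>,
        enable: &[&str],
        disable: &[&str],
    ) -> BTreeMap<String, bool> {
        let mut features = config.clone();
        // `--enable NAME` turns a feature on regardless of the config value.
        for name in enable {
            features.insert(name.to_string(), true);
        }
        // `--disable NAME` is applied last, so it wins if both are given.
        for name in disable {
            features.insert(name.to_string(), false);
        }
        features
    }
    ```

    With a config that sets `view_image_tool = true`, calling
    `resolve_features(&config, &["unified_exec"], &["view_image_tool"])`
    would yield `unified_exec` enabled and `view_image_tool` disabled, while
    leaving every other configured feature untouched.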
  • feat: make shortcut works even with capslock (#5049)
    Shortcuts were not working with Caps Lock on. This fixes that.
  • Make output assertions more explicit (#4784)
    Match using precise regexes.
  • Add helper for response created SSE events in tests (#4758)
    ## Summary
    - add a reusable `ev_response_created` helper that builds
    `response.created` SSE events for integration tests
    - update the exec and core integration suites to use the new helper
    instead of repeating manual JSON literals
    - keep the streaming fixtures consistent by relying on the shared helper
    in every touched test
    
    ## Testing
    - `just fmt`
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61
  • Use wait_for_event helpers in tests (#4753)
    ## Summary
    - replace manual event polling loops in several core test suites with
    the shared wait_for_event helpers
    - keep prior assertions intact by using closure captures for stateful
    expectations, including plan updates, patch lifecycles, and review flow
    checks
    - rely on wait_for_event_with_timeout where longer waits are required,
    simplifying timeout handling
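
    The pattern described above, a shared wait helper driven by a predicate
    closure, might look like the sketch below. It is a synchronous analogue
    built on `std::sync::mpsc` for illustration only; the real helpers in the
    test suite are not shown in this commit, and the `Event` enum and the
    signature here are hypothetical.

    ```rust
    use std::sync::mpsc::Receiver;
    use std::time::{Duration, Instant};

    /// Hypothetical event type standing in for the protocol's events.
    #[derive(Debug, Clone, PartialEq)]
    enum Event {
        PlanUpdate(u32),
        TaskComplete,
    }

    /// Receive events until `predicate` returns true or `timeout` elapses.
    /// The predicate is `FnMut`, so it can capture and update test state.
    fn wait_for_event_with_timeout<F>(
        rx: &Receiver<Event>,
        timeout: Duration,
        mut predicate: F,
    ) -> Option<Event>
    where
        F: FnMut(&Event) -> bool,
    {
        let deadline = Instant::now() + timeout;
        loop {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match rx.recv_timeout(remaining) {
                Ok(event) if predicate(&event) => return Some(event),
                Ok(_) => continue,
                // Timed out or the sender hung up: no matching event arrived.
                Err(_) => return None,
            }
        }
    }
    ```

    A test can count intermediate events inside the closure (for example,
    incrementing a counter on each `PlanUpdate`) and still return `true` only
    for the terminal event, which keeps stateful assertions without a
    hand-written polling loop.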
    
    ## Testing
    - just fmt
    
    
    ------
    https://chatgpt.com/codex/tasks/task_i_68e1d58582d483208febadc5f90dd95e
  • feat: Freeform apply_patch with simple shell output (#4718)
    ## Summary
    This PR is an alternative approach to #4711. Instead of changing our
    storage, it parses out shell calls in the client and reserializes them on
    the fly before we send them out as part of the request.
    
    What this changes:
    1. Adds additional serialization logic when
    `ApplyPatchToolType::Freeform` is in use.
    2. Adds a `--custom-apply-patch` flag to enable this setting on a
    session-by-session basis.
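
    To make the reserialization idea concrete, here is a minimal sketch of
    turning a structured shell result into freeform text. The `ShellOutput`
    struct, the `to_freeform` function, and the exact output layout are all
    assumptions for illustration; the PR's real types and format are not
    shown here.

    ```rust
    /// Hypothetical structured result of a shell call, as it might be stored.
    struct ShellOutput {
        exit_code: i32,
        duration_secs: f64,
        output: String,
    }

    /// Render the structured result as plain text just before it is sent,
    /// leaving the stored form untouched. The layout is illustrative only.
    fn to_freeform(out: &ShellOutput) -> String {
        format!(
            "Exit code: {}\nWall time: {:.1} seconds\nOutput:\n{}",
            out.exit_code, out.duration_secs, out.output
        )
    }
    ```

    Because the conversion happens at request time, storage keeps the
    structured form, which fits the stated plan of removing this in-flight
    rewrite later.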
    
    This change is delicate, but is not meant to be permanent. It is meant
    to be the first step in a migration:
    1. (This PR) Add in-flight serialization with config
    2. Update model_family default
    3. Update serialization logic to store turn outputs in a structured
    format, with logic to serialize based on model_family setting.
    4. Remove this rewrite in-flight logic.
    
    ## Test Plan
    - [x] Additional unit tests added
    - [x] Integration tests added
    - [x] Tested locally