3 Commits

  • [codex] Return TurnResult from Python turn handles (#23151)
    ## Why
    
    `TurnHandle.run()` returned the raw app-server `Turn`, whose live
    start/completed payloads do not include loaded `items`, so users saw
    empty `items` after starting a turn. That made the handle-based path
    behave differently from `Thread.run(...)`, and pushed examples toward
    persisted-thread reads plus helper extraction.
    
    This PR makes the run APIs standalone: starting a turn and running it
    returns collected turn data directly, or fails visibly when required
    stream events are missing.
    
    ## What Changed
    
    - Replaces the public `RunResult` export with `TurnResult`.
    - Adds turn metadata to `TurnResult`: `id`, `status`, `error`,
    `started_at`, `completed_at`, and `duration_ms`, alongside
    `final_response`, `items`, and `usage`.
    - Changes `TurnHandle.run()` and `AsyncTurnHandle.run()` to consume
    stream events with the same collector used by `Thread.run(...)`.
    - Exports `TurnError` from `openai_codex.types` for the new result
    shape.
    - Updates tests, examples, docs, and the walkthrough notebook to use
    `result.final_response` and `result.items` directly.
    - Removes persisted-thread helper paths and placeholder/skipped control
    flows from the public examples and notebook.
    
    ## Verification
    
    - `python3 -m py_compile ...` over changed SDK, example, and test Python
    files.
    - `python3 -c "import json;
    json.load(open('sdk/python/notebooks/sdk_walkthrough.ipynb'))"`
    - `git diff --check`
    - `PYTHONPATH=sdk/python/src python3 -c ...` import/signature smoke for
    `TurnResult`, `TurnHandle.run`, and `AsyncTurnHandle.run`.
  • [8/8] Add Python SDK Ruff formatting (#22021)
    ## Why
    
    The Python SDK needs the same tight formatter/lint loop as the rest of
    the repo: a safe Ruff autofix pass, Ruff formatting, editor save
    behavior, and CI checks that catch drift. Without that loop, SDK changes
    can land with formatting or import ordering that differs from what
    reviewers and CI expect.
    
    ## What
    
    - Add Ruff configuration to `sdk/python/pyproject.toml`, excluding
    generated protocol code and notebooks from the normal lint/format pass.
    - Update `just fmt` so it still formats Rust and also runs Python SDK
    Ruff autofix and formatting.
    - Add Python SDK CI steps for `ruff check` and `ruff format --check`
    before pytest.
    - Recommend the Ruff VS Code extension and enable Python
    format/fix/organize-on-save so Cmd+S uses the same tooling.
    - Apply the resulting Ruff formatting to SDK Python files, examples, and
    the checked-in generated `v2_all.py` output emitted by the pinned
    generator.
    - Add a guard test for the `just fmt` recipe so it keeps working from
    both Rust and Python SDK working directories.
    
    ## Stack
    
    1. #21891 `[1/8]` Pin Python SDK runtime dependency
    2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
    3. #21895 `[3/8]` Run Python SDK tests in CI
    4. #21896 `[4/8]` Define Python SDK public API surface
    5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
    6. #21910 `[6/8]` Add high-level Python SDK approval mode
    7. #22014 `[7/8]` Add Python SDK app-server integration harness
    8. This PR `[8/8]` Add Python SDK Ruff formatting
    
    ## Verification
    
    - Added `test_root_fmt_recipe_formats_rust_and_python_sdk` for the
    shared format recipe.
    - Ran `just fmt` after the recipe update.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [7/8] Add Python SDK app-server integration harness (#22014)
    ## Why
    
    The SDK had behavioral tests that replaced SDK client internals. Those
    tests could catch wrapper mistakes, but they did not prove the pinned
    app-server runtime, generated notification models, request routing, and
    sync/async public clients worked together.
    
    This PR adds deterministic integration coverage that starts the pinned
    `codex app-server` process and mocks only the upstream Responses HTTP
    boundary.
    
    ## What
    
    - Add `AppServerHarness` and `MockResponsesServer` helpers for isolated
    `CODEX_HOME`, mock-provider config, queued SSE responses, and captured
    `/v1/responses` requests.
    - Add shared helpers for SSE construction, stream assertions,
    approval-policy inspection, and image fixtures.
    - Split integration coverage into focused modules for run behavior,
    inputs, streaming, turn controls, approvals, and thread lifecycle.
    - Cover sync and async `Thread.run`, `TurnHandle.stream`, interleaved
    streams, approval-mode persistence, lifecycle helpers, final-answer
    phase handling, image inputs, loaded skill input injection, steering,
    interruption, listing, history reads, run overrides, and token usage
    mapping.
    - Replace public-wrapper tests that duplicated integration-test behavior
    with lower-level client tests only where direct client behavior is the
    thing under test.
    
    ## Stack
    
    1. #21891 `[1/8]` Pin Python SDK runtime dependency
    2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
    3. #21895 `[3/8]` Run Python SDK tests in CI
    4. #21896 `[4/8]` Define Python SDK public API surface
    5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
    6. #21910 `[6/8]` Add high-level Python SDK approval mode
    7. This PR `[7/8]` Add Python SDK app-server integration harness
    8. #22021 `[8/8]` Add Python SDK Ruff formatting
    
    ## Verification
    
    - Added pinned app-server integration tests under
    `sdk/python/tests/test_app_server_*.py` and
    `test_real_app_server_integration.py`.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>