* handle invalid commands
* better test
* format
* o3 instead of o3-mini

Signed-off-by: Ilan Bigio <ilan@openai.com>