`shuf` is part of GNU coreutils, which is not installed by default on
macOS and requires `brew install coreutils`. Replace it with
`echo $((RANDOM % <sides> + 1))`, which works in bash and zsh on both
Linux and macOS.
Also reword "true randomness" to "using a random number generator" to
more clearly distinguish programmatic RNG from LLM non-determinism.
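A minimal bash sketch of the `$RANDOM`-based roll. The `roll_die` name
and its argument check are hypothetical illustrations, not the wording
used in the skill itself. `$RANDOM` yields 0 to 32767, so the modulo
introduces a slight bias for side counts that do not divide 32768. That
bias is negligible for dice-sized ranges.

```shell
#!/usr/bin/env bash
# Hypothetical helper: roll one die with the given number of sides.
# Uses the shell's built-in $RANDOM (0..32767), so GNU `shuf` is not needed.
roll_die() {
  local sides="${1:-}"
  # Reject missing, non-numeric, or zero side counts.
  if ! [[ "$sides" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: roll_die <sides>" >&2
    return 1
  fi
  # Map 0..32767 onto 1..sides.
  echo $((RANDOM % sides + 1))
}

roll_die 20   # prints a number from 1 to 20
```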
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a tutorial-style quickstart guide that walks the reader through
creating their first Agent Skill — a `roll-dice` skill that teaches an
agent to roll dice using true system randomness. The guide covers
creating the `SKILL.md` file, verifying discovery via `/skills` in VS
Code, testing with a "Roll a d20" prompt, and a brief explanation of the
discovery/activation/execution lifecycle. Includes both bash and
PowerShell command variants, and a note about model variation in
tool-use reliability.
Also adds the quickstart as the first page under "For skill creators" in
`docs.json` navigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a new "Gotchas sections" subsection to "Patterns for effective
instructions" in the best practices guide. Covers what a gotcha is
(environment-specific facts that defy reasonable assumptions), how to
structure them (problem/correction pairs), and why they belong directly
in `SKILL.md` rather than a separate reference file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show `csv-analyzer/` (containing `SKILL.md` and `evals/evals.json`)
alongside `csv-analyzer-workspace/` so readers can see the full layout
at a glance.
Closes #238. Closes #239.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
How-to guide covering the full description optimization workflow:
writing effective descriptions, designing trigger eval queries
(should-trigger and should-not-trigger with near-miss examples), testing
trigger rates with a bash eval script, train/validation splits to avoid
overfitting, and the iterative optimization loop.
The guide is client-agnostic by default but includes a working Claude
Code example in the `check_triggered` function using
`--output-format json` and `jq` to detect `Skill` tool calls.
Adds the page to the "For skill creators" navigation group in
`docs.json`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A how-to guide for evaluating skill output quality using structured
evals. Covers the full eval workflow: designing test cases, running
with-skill vs. baseline comparisons, writing assertions, LLM-based
grading, aggregating benchmarks, analyzing patterns, human review, and
LLM-driven iterative improvement.
Derived from the workflow implemented by the `skill-creator` Skill, but
written as a standalone guide that readers can follow without using that
tool.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add "Using scripts" guide for skill creators
New guide at `docs/skill-creation/using-scripts.mdx` covering how to use
commands and scripts in skills:
- One-off commands with `uvx`, `pipx`, `npx`, `bunx`, `go run`,
`deno run` (tabbed by ecosystem, with pinned version examples)
- Referencing bundled scripts from `SKILL.md` using relative paths
- Self-contained scripts with inline dependency declarations (PEP 723,
Deno `npm:` imports, Bun auto-install, Ruby `bundler/inline` — tabbed
with a common HTML-parsing example)
- Designing scripts for agentic use: non-interactive execution, `--help`
documentation, error messages, structured output, and a compressed
checklist of further considerations
Also updates `docs/docs.json` to organize navigation into groups ("For
skill creators" and "For client implementors").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address review feedback on "Using scripts" guide
Relative paths note: Clarify that the convention applies to support
files like `references/*.md`, and explain *why* (the agent runs commands
from the skill root).
Structured output: Reframe motivation around composability with both
agents and standard tools (`jq`, `cut`, `awk`) rather than LLM parsing
ambiguity. Shorten prose; let the code example's inline comments carry
the contrast.
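To illustrate the composability argument, here is a small bash sketch
that emits one tab-separated `size<TAB>path` record per line instead of
prose such as "README.md is 1.2 KB". The `file_sizes` function is
hypothetical and is not the example used in the guide. Paths that
contain tabs or newlines would break this simple format.

```shell
#!/usr/bin/env bash
# Hypothetical example: print "size<TAB>path" for each regular file given.
# A fixed column order with one record per line lets agents and standard
# tools (cut, awk, sort) consume the output without guessing at prose.
file_sizes() {
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      # Diagnostics go to stderr so stdout stays machine-readable.
      echo "warning: skipping $f (not a regular file)" >&2
      continue
    fi
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done
}
```

Usage: `file_sizes *.md | sort -n | cut -f2` lists files smallest first,
and `file_sizes *.md | awk -F'\t' '{s += $1} END {print s}'` totals them.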
Predictable output size: Add `--output` flag as an alternative strategy
for scripts whose output is large and not amenable to pagination. The
`--output` flag acts as a consent mechanism — the agent must explicitly
choose a file destination or pass `-` to opt in to stdout, preventing
accidental context-window flooding.
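A rough bash sketch of the `--output` consent behavior described above.
`produce_report` and `run_report` are hypothetical stand-ins for a
script with large output. Without `--output`, the script refuses to
print and explains how to proceed. With a file path, it writes there and
prints only a one-line summary. With `-`, it streams to stdout.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a command whose output is too large to paginate.
produce_report() {
  local i
  for ((i = 1; i <= 1000; i++)); do
    printf 'row %d\n' "$i"
  done
}

# Require an explicit destination: a file path, or "-" to opt in to stdout.
run_report() {
  local output=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --output)
        if [ "$#" -lt 2 ]; then
          echo "error: --output requires a value (a file path, or - for stdout)" >&2
          return 2
        fi
        output="$2"
        shift 2
        ;;
      *)
        echo "error: unknown argument: $1" >&2
        return 2
        ;;
    esac
  done

  if [ -z "$output" ]; then
    # No consent given: fail with guidance instead of flooding the context.
    echo "error: output is large; pass --output <file>, or --output - for stdout" >&2
    return 2
  elif [ "$output" = "-" ]; then
    produce_report
  else
    produce_report > "$output" || return 1
    # Print only a short summary so the agent's context stays small.
    printf 'wrote %s lines to %s\n' "$(wc -l < "$output" | tr -d ' ')" "$output"
  fi
}
```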
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>