NAME

rad-experiment-benchmark - Run benchmarks on an existing worktree and output per-metric JSON

SYNOPSIS

rad-experiment benchmark [-r|--repo] <--worktree> <--bench-cmd> [-q|--quiet] [--build-cmd] [--pretty] <--metric> [--runs] [--label] [-h|--help]

DESCRIPTION

Runs the --bench-cmd command --runs times against a worktree, extracts every metric defined with --metric, and prints the result as JSON on stdout. It does not touch the COB store or require a Radicle profile; it is a stateless helper that pairs with rad-experiment-compute-delta(1).

Each --metric takes the form `name=unit:criteria:regex`, where `criteria` is `lower_is_better` or `higher_is_better`. The first --metric is the primary (optimization target); the rest are secondary. The output JSON contains the --label string, the worktree path, the runs count, and a `metrics` object keyed by metric name. Each metric entry holds its unit, its criteria, an is_primary flag, the median and standard deviation scaled by 1000, the raw per-run samples_x1000 array, and the sample count n.
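The metric format and the x1000 scaling described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the helper names `parse_metric` and `summarize`, the `median_x1000` key, and the rounding behaviour are assumptions made for illustration. Only the `name=unit:criteria:regex` layout, the two criteria values, `samples_x1000`, and `n` come from the description. The sketch caps the number of splits on `:` because the regex part may itself contain colons.

```python
import re
import statistics


def parse_metric(spec):
    """Split a 'name=unit:criteria:regex' string into its four parts (hypothetical helper)."""
    name, rest = spec.split("=", 1)
    # maxsplit=2 keeps any ':' inside the regex intact.
    unit, criteria, pattern = rest.split(":", 2)
    if criteria not in ("lower_is_better", "higher_is_better"):
        raise ValueError(f"unknown criteria: {criteria}")
    return {
        "name": name,
        "unit": unit,
        "criteria": criteria,
        "regex": re.compile(pattern),
    }


def summarize(samples):
    """Scale samples by 1000 and take the median (rounding is an assumption)."""
    scaled = [round(s * 1000) for s in samples]
    return {
        "median_x1000": round(statistics.median(scaled)),  # key name is assumed
        "samples_x1000": scaled,
        "n": len(scaled),
    }


spec = parse_metric(r"duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms")

# Made-up benchmark output lines, one per run.
outputs = ["duration: 12.5 ms", "duration: 11.0 ms", "duration: 13.25 ms"]
samples = [float(spec["regex"].search(line).group(1)) for line in outputs]
summary = summarize(samples)
print(spec["name"], spec["unit"], spec["criteria"], summary)
```

The first capture group of the regex supplies the numeric value, which matches how the example patterns in this page are written.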

OPTIONS

-r, --repo <REPO>: Repository path or Radicle RID (defaults to current directory)
--worktree <WORKTREE>: Path to the worktree to benchmark
--bench-cmd <BENCH_CMD>: Command that runs the benchmark
-q, --quiet: Suppress non-error output to stderr
--build-cmd <BUILD_CMD>: Optional pre-benchmark build step
--pretty: Pretty-print JSON output (for human inspection; agents should omit)
--metric <NAME=UNIT:CRITERIA:REGEX>: Metric definition (repeatable). Format: name=unit:criteria:regex. First --metric is primary
--runs <RUNS> [default: 5]: Number of benchmark runs
--label <LABEL> [default: benchmark]: Label for this benchmark (e.g. "baseline" or "candidate")
-h, --help: Print help (see a summary with '-h')

EXTRA

EXAMPLE — bench two worktrees in sequence:

git worktree add /tmp/repo-base 9b32764
git worktree add /tmp/repo-head 5574144

rad-experiment benchmark --worktree /tmp/repo-base --bench-cmd 'bash ./bench/benchmark.sh' --metric 'duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms' --runs 5 --label baseline > /tmp/baseline.json

rad-experiment benchmark --worktree /tmp/repo-head --bench-cmd 'bash ./bench/benchmark.sh' --metric 'duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms' --runs 5 --label candidate > /tmp/candidate.json
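For a quick manual look at two such reports, the sketch below compares the primary metric. It is an illustration only and does not describe what rad-experiment-compute-delta(1) does. It relies on the documented `metrics` object, the `is_primary` flag, and the `samples_x1000` array. The `criteria` key name and the helper names `primary_metric` and `compare` are assumptions, and the inline reports are made-up data standing in for /tmp/baseline.json and /tmp/candidate.json.

```python
import statistics


def primary_metric(report):
    """Return (name, entry) for the metric flagged is_primary."""
    for name, entry in report["metrics"].items():
        if entry["is_primary"]:
            return name, entry
    raise ValueError("no primary metric in report")


def compare(baseline, candidate):
    """Compare medians of the primary metric; returns (name, change_pct, improved)."""
    name, base = primary_metric(baseline)
    cand = candidate["metrics"][name]
    base_med = statistics.median(base["samples_x1000"])
    cand_med = statistics.median(cand["samples_x1000"])
    change_pct = (cand_med - base_med) * 100 / base_med
    # "criteria" as a key name is an assumption about the JSON layout.
    if base["criteria"] == "lower_is_better":
        improved = cand_med < base_med
    else:
        improved = cand_med > base_med
    return name, change_pct, improved


# Made-up reports; in practice load them with json.load() from the files
# written by the two commands above.
baseline = {"metrics": {"duration_ms": {
    "criteria": "lower_is_better", "is_primary": True,
    "samples_x1000": [12000, 12500, 13000]}}}
candidate = {"metrics": {"duration_ms": {
    "criteria": "lower_is_better", "is_primary": True,
    "samples_x1000": [10000, 10000, 11000]}}}

name, change_pct, improved = compare(baseline, candidate)
print(f"{name}: {change_pct:+.1f}% ({'improved' if improved else 'not improved'})")
```

Recomputing the median from `samples_x1000` avoids depending on the exact key name of the precomputed median, which this page does not spell out.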
