
rad-experiment-benchmark(1) General Commands Manual rad-experiment-benchmark(1)

rad-experiment-benchmark - Run benchmarks on an existing worktree and output per-metric JSON

rad-experiment benchmark [-r|--repo] <--worktree> <--bench-cmd> [-q|--quiet] [--build-cmd] [--pretty] <--metric> [--runs] [--label] [-h|--help]

Runs the --bench-cmd command --runs times against a worktree, extracts every metric defined with --metric, and prints the result as JSON on stdout. It does not touch the COB store or require a Radicle profile; it is a stateless helper that pairs with rad-experiment-compute-delta(1).

Each --metric takes the form `name=unit:criteria:regex`, where `criteria` is `lower_is_better` or `higher_is_better`. The first --metric is the primary (optimization target); the rest are secondary. The output JSON contains the --label string, the worktree path, the runs count, and a `metrics` object keyed by metric name. Each metric entry holds its unit, its criteria, an is_primary flag, the median and standard deviation scaled by 1000, the raw per-run samples_x1000 array, and the sample count n.
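To make the "scaled by 1000" fields concrete, here is a minimal shell sketch that turns a handful of per-run values into a scaled sample list and its median. The sample values are hypothetical, and both the rounding rule and the handling of an even number of samples are assumptions; the tool's exact arithmetic is not specified here.

```shell
# Hypothetical per-run values, as a metric regex might capture them (in ms).
samples="12.5 11.8 13.1 12.0 12.4"

# Scale each sample by 1000 and round to the nearest integer.
# (Rounding to nearest is an assumption, not documented behaviour.)
samples_x1000=$(printf '%s\n' $samples | awk '{ printf "%d\n", $1 * 1000 + 0.5 }')

# Median of the scaled samples: the middle element of the sorted list.
# For an even count this sketch averages the two middle elements (assumption).
median_x1000=$(printf '%s\n' $samples_x1000 | sort -n | awk '
  { a[NR] = $1 }
  END {
    if (NR % 2) print a[(NR + 1) / 2]
    else        print int((a[NR / 2] + a[NR / 2 + 1]) / 2)
  }')

echo "samples_x1000: $(echo $samples_x1000)"
echo "median_x1000:  $median_x1000"
```

Under these assumptions, a consumer divides a scaled value by 1000 to get back to the metric's unit, so a scaled median of 12400 corresponds to 12.4 ms.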

-r, --repo
    Repository path or Radicle RID (defaults to current directory)
--worktree
    Path to the worktree to benchmark
--bench-cmd
    Command that runs the benchmark
-q, --quiet
    Suppress non-error output to stderr
--build-cmd
    Optional pre-benchmark build step
--pretty
    Pretty-print JSON output (for human inspection; agents should omit)
--metric
    Metric definition (repeatable). Format: name=unit:criteria:regex. First --metric is primary
--runs
    Number of benchmark runs (default: 5)
--label
    Label for this benchmark (e.g. "baseline" or "candidate")
-h, --help
    Print help (see a summary with '-h')
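The name=unit:criteria:regex format can be pulled apart with plain shell parameter expansion, which is a quick way to sanity-check a spec before passing it to --metric. This sketch assumes the name ends at the first `=` and that only the next two `:` act as separators, so a `:` inside the regex (as in this page's example) stays part of the regex; the tool's actual parser may differ.

```shell
# The metric spec used in this page's example.
spec='duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms'

# Assumption: split on the first "=", then on the next two ":" only,
# so any ":" inside the regex is preserved.
name=${spec%%=*}        # text before the first "="
rest=${spec#*=}         # everything after the first "="
unit=${rest%%:*}        # first ":"-delimited field
rest=${rest#*:}
criteria=${rest%%:*}    # second ":"-delimited field
regex=${rest#*:}        # remainder, colons included

printf 'name=%s\nunit=%s\ncriteria=%s\nregex=%s\n' \
  "$name" "$unit" "$criteria" "$regex"
```

Single-quoting the spec, as the example commands do, keeps the shell from interpreting the backslashes, parentheses, and asterisks in the regex.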

EXAMPLE (benchmark two worktrees in sequence):

git worktree add /tmp/repo-base 9b32764
git worktree add /tmp/repo-head 5574144

rad-experiment benchmark --worktree /tmp/repo-base --bench-cmd 'bash ./bench/benchmark.sh' --metric 'duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms' --runs 5 --label baseline > /tmp/baseline.json

rad-experiment benchmark --worktree /tmp/repo-head --bench-cmd 'bash ./bench/benchmark.sh' --metric 'duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms' --runs 5 --label candidate > /tmp/candidate.json
