| rad-experiment-benchmark(1) | General Commands Manual | rad-experiment-benchmark(1) |
NAME
rad-experiment-benchmark - Run benchmarks on an existing worktree and output per-metric JSON
SYNOPSIS
rad-experiment benchmark [-r|--repo] <--worktree> <--bench-cmd> [-q|--quiet] [--build-cmd] [--pretty] <--metric> [--runs] [--label] [-h|--help]
DESCRIPTION
Runs the benchmark command (--bench-cmd) against a worktree --runs times (5 by default), extracts every metric defined with --metric, and prints the result as JSON on stdout. It does not touch the COB store or require a Radicle profile; it is a stateless helper that pairs with rad-experiment-compute-delta(1).
Each --metric takes the form `name=unit:criteria:regex`, where `criteria` is `lower_is_better` or `higher_is_better`. The first --metric is the primary (optimization target); the rest are secondary. The output JSON contains the --label string, the worktree path, the runs count, and a `metrics` object keyed by metric name. Each metric entry holds its unit, its criteria, an is_primary flag, the median and standard deviation scaled by 1000, the raw per-run samples_x1000 array, and the sample count n.
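To illustrate the `name=unit:criteria:regex` format, the following is a minimal Python sketch of how such a definition could be split and applied to one run's output. It is not the tool's actual implementation: the helper names `parse_metric` and `extract_x1000` are hypothetical, and it assumes that the value is taken from the first capture group and that "scaled by 1000" means multiplied by 1000 and rounded. The definition is split on the first `=` and then on the first two `:` only, because the regex part may itself contain colons.

```python
import re
from typing import NamedTuple, Optional


class MetricSpec(NamedTuple):
    name: str
    unit: str
    criteria: str
    pattern: "re.Pattern[str]"


def parse_metric(spec: str) -> MetricSpec:
    """Split 'name=unit:criteria:regex' into its four parts (hypothetical helper)."""
    name, sep, rest = spec.partition("=")
    # Split at most twice so colons inside the regex are preserved.
    parts = rest.split(":", 2)
    if not sep or not name or len(parts) != 3:
        raise ValueError(f"expected name=unit:criteria:regex, got {spec!r}")
    unit, criteria, regex = parts
    if criteria not in ("lower_is_better", "higher_is_better"):
        raise ValueError(f"unknown criteria {criteria!r}")
    return MetricSpec(name, unit, criteria, re.compile(regex))


def extract_x1000(spec: MetricSpec, output: str) -> Optional[int]:
    """Return the first capture group times 1000 (assumed scaling), or None."""
    match = spec.pattern.search(output)
    if match is None:
        return None
    return round(float(match.group(1)) * 1000)


spec = parse_metric(r"duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms")
value = extract_x1000(spec, "duration: 12.5 ms")
```

Here `spec.name` is `duration_ms` and the sample line `duration: 12.5 ms` is invented for illustration.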
OPTIONS
- -r, --repo <REPO>
- Repository path or Radicle RID (defaults to current directory)
- --worktree <WORKTREE>
- Path to the worktree to benchmark
- --bench-cmd <BENCH_CMD>
- Command that runs the benchmark
- -q, --quiet
- Suppress non-error output to stderr
- --build-cmd <BUILD_CMD>
- Optional pre-benchmark build step
- --pretty
- Pretty-print JSON output (for human inspection; agents should omit)
- --metric <NAME=UNIT:CRITERIA:REGEX>
- Metric definition (repeatable). Format: name=unit:criteria:regex, where criteria is lower_is_better or higher_is_better. The first --metric is the primary metric.
- --runs <RUNS> [default: 5]
- Number of benchmark runs
- --label <LABEL> [default: benchmark]
- Label for this benchmark (e.g. "baseline" or "candidate")
- -h, --help
- Print help (see a summary with '-h')
EXTRA
EXAMPLE — bench two worktrees in sequence:
git worktree add /tmp/repo-base 9b32764
git worktree add /tmp/repo-head 5574144
rad-experiment benchmark --worktree /tmp/repo-base --bench-cmd 'bash ./bench/benchmark.sh' --metric 'duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms' --runs 5 --label baseline > /tmp/baseline.json
rad-experiment benchmark --worktree /tmp/repo-head --bench-cmd 'bash ./bench/benchmark.sh' --metric 'duration_ms=ms:lower_is_better:duration\s*:\s*([0-9.]+)\s*ms' --runs 5 --label candidate > /tmp/candidate.json
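For a quick manual look at one of the resulting files, the sketch below shows one way to read the primary metric in Python. It relies only on the `metrics`, `is_primary`, and `samples_x1000` fields named in DESCRIPTION; the helper `primary_median` and the inline sample report are hypothetical, and recomputing the median from the raw samples is a choice made here to avoid guessing the name of the median field. It is not a substitute for rad-experiment-compute-delta(1).

```python
import json
import statistics
from typing import Tuple


def primary_median(report: dict) -> Tuple[str, float]:
    """Return (metric name, median in original units) for the primary metric."""
    for name, entry in report["metrics"].items():
        if entry["is_primary"]:
            # Samples are stored scaled by 1000; divide to recover the unit value.
            return name, statistics.median(entry["samples_x1000"]) / 1000
    raise ValueError("report has no primary metric")


# Invented sample data; a real report would be loaded with
# json.load(open("/tmp/baseline.json")) and contains more fields.
sample = json.loads(
    '{"metrics": {"duration_ms": '
    '{"is_primary": true, "samples_x1000": [12500, 12000, 13000]}}}'
)
name, median = primary_median(sample)
```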
| benchmark |