NAME

rad-experiment-reproduce - Add a reproduction to an experiment (auto-runs benchmarks, or accepts manual measurements)

SYNOPSIS

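rad-experiment reproduce [-r|--repo <REPO>] [--runs <RUNS>] [--arch <ARCH>] [-q|--quiet] [--os <OS>] [--pretty] [--cpu <CPU>] [--memory <MEMORY>] [--baseline-median <BASELINE_MEDIAN>] [--baseline-std <BASELINE_STD>] [--baseline-samples <BASELINE_SAMPLES>] [--baseline-n <BASELINE_N>] [--candidate-median <CANDIDATE_MEDIAN>] [--candidate-std <CANDIDATE_STD>] [--candidate-samples <CANDIDATE_SAMPLES>] [--candidate-n <CANDIDATE_N>] [--notes <NOTES>] [--json] [-h|--help] <ID>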

DESCRIPTION

Adds a reproduction to an existing experiment. Has two modes.

AUTO MODE (default). With no measurement flags, reproduce first ensures that both the base and candidate commits are reachable in your local clone, fetching the author's branch if needed. It then reads bench_cmd, build_cmd, and the per-metric regexes directly from the experiment COB and runs bench_cmd against each side --runs times (default 3). Finally, it extracts every metric defined on the COB (not just the primary) and records them all on the new reproduction. Experiments published before schema v5 don't carry the benchmark config and can no longer be reproduced automatically; in that case reproduce aborts with "can't reproduce. older configuration." A v5 experiment that is missing bench_cmd or a regex aborts with "can't reproduce. missing benchmark config."
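
The metric-extraction step can be pictured with a short Python sketch. This is not the tool's implementation: the function name collect_metrics, the sample benchmark output, the metric names, and the assumption that each regex exposes the number in its first capture group are all hypothetical. It only illustrates how one captured output per run turns into one sample per metric.

```python
import re


def collect_metrics(outputs, regexes):
    """Pull one sample per metric out of each benchmark run's output.

    outputs: captured output strings, one per run (hypothetical input).
    regexes: metric name -> pattern. ASSUMPTION: group 1 holds the number.
    """
    samples = {name: [] for name in regexes}
    for run_index, text in enumerate(outputs):
        for name, pattern in regexes.items():
            match = re.search(pattern, text)
            if match is None:
                # A run that does not report a metric leaves the set incomplete.
                raise ValueError(f"run {run_index}: no match for metric {name!r}")
            samples[name].append(float(match.group(1)))
    return samples


# Three made-up runs of a made-up benchmark, two metrics each.
outputs = [
    "time: 1.498 ms\npeak_rss: 2048 KiB",
    "time: 1.502 ms\npeak_rss: 2050 KiB",
    "time: 1.490 ms\npeak_rss: 2047 KiB",
]
regexes = {
    "time_ms": r"time: ([0-9.]+) ms",
    "peak_rss_kib": r"peak_rss: ([0-9]+) KiB",
}
samples = collect_metrics(outputs, regexes)
```

Each metric ends up with as many samples as there were runs, which is why a larger --runs value gives a tighter estimate for every metric at once.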

MANUAL MODE. Pass any of --baseline-median, --baseline-n, --candidate-median, or --candidate-n and the command switches to manual mode. All four become required, and only the primary metric is recorded.
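
The mode-selection rule can be restated as a small Python sketch. The function name select_mode and the error text are hypothetical and are not taken from the tool; the sketch only encodes the rule as described here: any one of the four flags selects manual mode, and then all four must be present.

```python
# The four flags that switch the command into manual mode.
MANUAL_TRIGGERS = {
    "--baseline-median",
    "--baseline-n",
    "--candidate-median",
    "--candidate-n",
}


def select_mode(flags):
    """Return "auto" or "manual" for the given flag names.

    Raises ValueError when manual mode is triggered but incomplete.
    The message wording is illustrative only.
    """
    given = MANUAL_TRIGGERS & set(flags)
    if not given:
        return "auto"
    missing = MANUAL_TRIGGERS - given
    if missing:
        raise ValueError("manual mode also requires: " + ", ".join(sorted(missing)))
    return "manual"
```

For example, select_mode(["--runs", "--notes"]) returns "auto", while passing only --baseline-median raises because the other three flags are missing.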

WARNING: reproduction runs UNTRUSTED CODE. It checks out a branch you may not control and executes its bench_cmd. Review the candidate diff first, or run inside a container or VM.

OPTIONS

-r, --repo <REPO>: Repository path or Radicle RID (defaults to current directory)
--runs <RUNS>: Number of benchmark runs (auto mode only, default: 3)
--arch <ARCH>: CPU architecture override (auto-detected if omitted)
-q, --quiet: Suppress non-error output to stderr
--os <OS>: OS override (auto-detected if omitted)
--pretty: Pretty-print JSON output (for human inspection; agents should omit)
--cpu <CPU>: CPU model override (auto-detected if omitted)
--memory <MEMORY>: Total physical memory in bytes (auto-detected if omitted)
--baseline-median <BASELINE_MEDIAN>: Baseline median (value × 1000, integer). Passing it selects manual mode; if no manual-mode flag is given, benchmarks are run automatically
--baseline-std <BASELINE_STD>: Baseline standard deviation (value × 1000, integer)
--baseline-samples <BASELINE_SAMPLES>: Baseline per-run samples (value × 1000, comma-separated)
--baseline-n <BASELINE_N>: Baseline sample count
--candidate-median <CANDIDATE_MEDIAN>: Candidate median (value × 1000, integer)
--candidate-std <CANDIDATE_STD>: Candidate standard deviation (value × 1000, integer)
--candidate-samples <CANDIDATE_SAMPLES>: Candidate per-run samples (value × 1000, comma-separated)
--candidate-n <CANDIDATE_N>: Candidate sample count
--notes <NOTES>: Notes
--json: Output as JSON
-h, --help: Print help (see a summary with '-h')
<ID>: Experiment ID
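
The measurement flags take integers in a "value × 1000" encoding. The Python sketch below shows one way to derive them from raw samples. The helper names to_fixed and manual_flags are hypothetical, and two choices are assumptions that the help text does not specify: rounding to the nearest integer, and using the sample (n - 1) standard deviation.

```python
import statistics


def to_fixed(value):
    """Encode a measurement as an integer: value x 1000, rounded to nearest."""
    return round(value * 1000)


def manual_flags(side, samples):
    """Build the measurement flags for one side ("baseline" or "candidate").

    ASSUMPTIONS: round-to-nearest and the sample (n - 1) standard deviation.
    Needs at least two samples, because statistics.stdev requires them.
    """
    return [
        f"--{side}-median", str(to_fixed(statistics.median(samples))),
        f"--{side}-std", str(to_fixed(statistics.stdev(samples))),
        f"--{side}-samples", ",".join(str(to_fixed(s)) for s in samples),
        f"--{side}-n", str(len(samples)),
    ]


# Five made-up baseline measurements in the primary metric's own unit.
baseline = [1.498, 1.502, 1.490, 1.510, 1.495]
flags = manual_flags("baseline", baseline)
```

With these five samples the median encodes as 1498 and the sample count is 5. Calling manual_flags("candidate", ...) on the candidate samples gives the other half of a manual-mode invocation.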

EXAMPLES

Auto mode — 3 runs of base + 3 runs of candidate, every metric from the COB:

rad-experiment reproduce 5574144

More samples for tighter confidence:

rad-experiment reproduce 5574144 --runs 10

Auto mode with a note explaining the environment:

rad-experiment reproduce 5574144 --notes "warm cache, perf governor"

Manual mode — primary metric only, no re-benchmarking:

rad-experiment reproduce 5574144 --baseline-median 1498 --baseline-n 5 --candidate-median 1430 --candidate-n 5 --notes "ran on a borrowed Hetzner box"
