NAME

rad-experiment-publish - Publish optimization experiments as signed COBs

SYNOPSIS

rad-experiment publish [--dry-run] [-r|--repo] [--index] [-q|--quiet] [--from-json] [--pretty] [-d|--description] [--base] [--head] [--metric] [--baseline-median] [--baseline-std] [--baseline-samples] [--baseline-n] [--candidate-median] [--candidate-std] [--candidate-samples] [--candidate-n] [--secondary] [--agent-system] [--agent-model] [--arch] [--os] [--cpu] [--memory] [--unit] [--criteria] [--bench-cmd] [--build-cmd] [--metric-regex] [--asi] [--bench-hash] [--json] [-h|--help] [PATH]

DESCRIPTION

Publishes optimization experiments as signed COBs. Three modes, chosen by argument shape:

1. Tape mode — `rad-experiment publish <file.jsonl>` Imports an append-only session log written by pi-autoresearch or cc's own autoresearch skill and publishes every kept or discarded experiment in it. Benchmark config (bench_cmd, build_cmd, per-metric regex) is read from the segment's config header if present, and filled in from CLI flags (`--bench-cmd`, `--build-cmd`, `--metric-regex`) otherwise; if neither source provides a bench_cmd, publish fails. An index file under `<jsonl_parent>/.community-computer/published.json` tracks which (base,head) pairs have already been published — delete it to re-publish.

2. JSON mode — `rad-experiment publish --from-json <path|->` Reads all required fields from a JSON file (or `-` for stdin). Accepts `compute-delta` output directly or a flat publish-flag shape. CLI flags override individual fields.

3. Flag mode — `rad-experiment publish --base X --head Y --metric M --bench-cmd '...' --metric-regex '<name>=<regex>' ...` Builds the experiment record from explicit flags plus an auto-detected environment record (CPU arch, OS, CPU brand, RAM — each overridable). bench_cmd is required; build_cmd is optional; the primary metric must have a `--metric-regex`.
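
The "chosen by argument shape" rule above can be pictured with a small sketch. This is not the tool's implementation: `select_mode` and `FLAG_MODE_ARGS` are hypothetical names, and the sketch only encodes what this page states (a positional PATH means tape mode and excludes the other shapes, `--from-json` means JSON mode with CLI flags acting as overrides, and otherwise the explicit flags select flag mode).

```python
from typing import Optional

# Flag-mode arguments named in the [PATH] option text; hypothetical constant.
FLAG_MODE_ARGS = ("base", "head", "metric")


def select_mode(path: Optional[str] = None,
                from_json: Optional[str] = None,
                **flags: Optional[str]) -> str:
    """Return 'tape', 'json', or 'flag' for a given argument shape."""
    has_flag_args = any(flags.get(name) is not None for name in FLAG_MODE_ARGS)
    if path is not None:
        # PATH is documented as mutually exclusive with the other two shapes.
        if from_json is not None or has_flag_args:
            raise ValueError("PATH cannot be combined with --from-json or flag-mode args")
        return "tape"
    if from_json is not None:
        # In JSON mode, CLI flags are allowed and override individual fields.
        return "json"
    if has_flag_args:
        return "flag"
    raise ValueError("no mode selected: give PATH, --from-json, or flag-mode args")
```

For instance, `select_mode(from_json="-", base="9b32764")` yields `"json"`, because a flag given alongside `--from-json` is an override, not a mode switch.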

In every mode the record is signed with your Radicle key and written to the COB store, and publish announces the new refs to your local Radicle node so peers see the experiment. The COB only references commits by OID — for peers to actually fetch those commits they must be reachable from a ref the publisher has pushed (any branch that contains them works). publish does not push branches itself; that's on you.

OPTIONS

--dry-run: Parse and print what would be published without touching the COB store. Tape mode only
-r, --repo <REPO>: Repository path or Radicle RID (defaults to current directory)
--index <PATH>: Override the idempotency index file path (tape mode). Default: `<jsonl_parent>/.community-computer/published.json`
-q, --quiet: Suppress non-error output to stderr
--from-json <PATH>: Read all required fields from a JSON file (or `-` for stdin).
Accepts `compute-delta` output directly (keys like `base_sha`, `primary_metric`, `baseline_median_x1000`, `metrics[primary].*`, `secondary_flags`) or a flat publish-flag shape (`base`, `metric`, `baseline_median`, ...). When set, `--base` / `--head` / `--metric` / `--baseline-median` / `--baseline-n` / `--candidate-median` / `--candidate-n` are satisfied from JSON; CLI flags override.
--pretty: Pretty-print JSON output (for human inspection; agents should omit)
-d, --description <DESCRIPTION>: Hypothesis: what was tried and why
--base <BASE>: Base commit SHA (flag mode)
--head <HEAD>: Candidate commit SHA (flag mode)
--metric <METRIC>: Primary metric name (flag mode)
--baseline-median <BASELINE_MEDIAN>: Baseline median (value × 1000, integer) — flag mode
--baseline-std <BASELINE_STD>: Baseline standard deviation (value × 1000, integer)
--baseline-samples <BASELINE_SAMPLES> [default: ]: Baseline per-run samples (value × 1000, comma-separated)
--baseline-n <BASELINE_N>: Baseline sample count (flag mode)
--candidate-median <CANDIDATE_MEDIAN>: Candidate median (value × 1000, integer) — flag mode
--candidate-std <CANDIDATE_STD>: Candidate standard deviation (value × 1000, integer)
--candidate-samples <CANDIDATE_SAMPLES> [default: ]: Candidate per-run samples (value × 1000, comma-separated)
--candidate-n <CANDIDATE_N>: Candidate sample count (flag mode)
--secondary <SECONDARY>: Secondary metric (repeatable). Format: `name:baseline_median_x1000:candidate_median_x1000`. Example: `--secondary "binary_size:1000000:950000"`
--agent-system <AGENT_SYSTEM> [default: claude-code]: Agent system
--agent-model <AGENT_MODEL> [default: claude-opus-4-6]: Agent model
--arch <ARCH>: CPU architecture override (auto-detected if omitted)
--os <OS>: OS override (auto-detected if omitted)
--cpu <CPU>: CPU model override (auto-detected if omitted)
--memory <MEMORY>: Total physical memory in bytes (auto-detected if omitted)
--unit <UNIT>: Unit for the primary metric (e.g. "ms", "µs")
--criteria <CRITERIA>: Criteria for the primary metric: "lower_is_better" or "higher_is_better"
--bench-cmd <BENCH_CMD>: Command that runs the benchmark (e.g. "bash ./bench/benchmark.sh"). Required in flag mode. In tape mode, fills in any segment whose header doesn't specify one
--build-cmd <BUILD_CMD>: Optional pre-benchmark build step
--metric-regex <NAME=REGEX>: Per-metric regex (repeatable). Format: name=<regex>. Must have an entry for the primary metric in flag mode. In tape mode, fills in entries missing from the segment header
--asi <ASI>: Actionable Side Information as a JSON object. v4 schema or later.
Free-form per-run diagnostics: hypothesis, rollback reason, profiler notes, etc. Must be a JSON object (keys are strings, values are arbitrary JSON).
--bench-hash <BENCH_HASH>: SHA-256 (hex) of the benchmark script at the time of the run. v4 schema or later
--json: Output as JSON
-h, --help: Print help (see a summary with '-h')
[PATH]: Path to an `autoresearch.jsonl` session log (tape mode).
When present, reads the tape and publishes every unpublished keep/discard. Mutually exclusive with `--from-json` and the flag-mode args (`--base`, `--head`, `--metric`, ...).
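
For `--bench-hash`, the option text asks for a hex SHA-256 of the benchmark script. One way to produce such a value is sketched below, assuming the digest is taken over the script's raw bytes (this page does not say exactly what input the tool expects). The helper names `sha256_hex` and `bench_hash` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def bench_hash(script_path: str) -> str:
    """Hash the raw bytes of the benchmark script (assumed input for --bench-hash)."""
    return sha256_hex(Path(script_path).read_bytes())
```

The result of `bench_hash("bench/benchmark.sh")` could then be passed as the `--bench-hash` value, provided that path is the script your `--bench-cmd` actually runs.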

EXTRA

EXAMPLES:

Tape mode — publish every unpublished keep/discard from a session log:

rad-experiment publish autoresearch.jsonl

Tape mode with a dry-run and a custom index location:

rad-experiment publish autoresearch.jsonl --dry-run
rad-experiment publish autoresearch.jsonl --index /tmp/cc.json

Flag mode — latency dropped 1500 ms → 1425 ms across 5 runs:

rad-experiment publish --base 9b32764 --head 5574144 --metric duration_ms --baseline-median 1500 --baseline-n 5 --candidate-median 1425 --candidate-n 5 --bench-cmd 'bash ./bench/benchmark.sh' --metric-regex 'duration_ms=duration\s*:\s*([0-9.]+)\s*ms' --description "Hoist allocation"
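
Before publishing, it can help to check that a `--metric-regex` value matches your benchmark's output. The sketch below splits a `name=<regex>` spec and applies the pattern to sample output. It rests on several assumptions: the tool's regex dialect may differ from Python's `re`, reading the first capture group as the metric value is a guess, and the sample output line and both helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def split_metric_regex(spec: str) -> Tuple[str, str]:
    """Split 'name=<regex>' at the first '=' into (name, pattern)."""
    name, _, pattern = spec.partition("=")
    if not name or not pattern:
        raise ValueError("expected name=<regex>")
    return name, pattern


def extract_metric(pattern: str, output: str) -> Optional[float]:
    """Return the first capture group as a float, or None if nothing matches."""
    match = re.search(pattern, output)
    return float(match.group(1)) if match else None


name, pattern = split_metric_regex(r"duration_ms=duration\s*:\s*([0-9.]+)\s*ms")
value = extract_metric(pattern, "warmup done\nduration: 1425 ms\n")
```

If `value` comes back as `None`, the pattern does not match the benchmark output and is worth fixing before running publish.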

With per-run samples and a standard deviation:

rad-experiment publish --base 9b32764 --head 5574144 --metric duration_ms --baseline-median 1500 --baseline-std 23 --baseline-samples 1488,1502,1497,1510,1503 --baseline-n 5 --candidate-median 1425 --candidate-std 18 --candidate-samples 1420,1432,1418,1428,1425 --candidate-n 5 --bench-cmd 'bash ./bench/benchmark.sh' --metric-regex 'duration_ms=duration\s*:\s*([0-9.]+)\s*ms'
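
The option list describes the median, standard deviation, and sample flags as "value × 1000, integer". A minimal sketch of that encoding follows, assuming it means multiplying the measured value by 1000 and rounding to the nearest integer. The rounding rule is an assumption, and `to_x1000` and `samples_flag` are hypothetical helpers.

```python
from typing import Iterable


def to_x1000(value: float) -> int:
    """Scale a measured value by 1000 and round to the nearest integer (assumed rule)."""
    return round(value * 1000)


def samples_flag(values: Iterable[float]) -> str:
    """Build the comma-separated string expected by --baseline-samples / --candidate-samples."""
    return ",".join(str(to_x1000(v)) for v in values)
```

Under this reading, a measured value of 1.5 would be passed as `1500`.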

With a secondary metric (binary size dropped 1 MB → 950 KB):

rad-experiment publish --base 9b32764 --head 5574144 --metric duration_ms --baseline-median 1500 --baseline-n 5 --candidate-median 1425 --candidate-n 5 --bench-cmd 'bash ./bench/benchmark.sh' --metric-regex 'duration_ms=duration\s*:\s*([0-9.]+)\s*ms' --secondary "binary_size_bytes:1000000:950000"
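
The `--secondary` value follows the documented `name:baseline_median_x1000:candidate_median_x1000` shape. A small sketch of building and checking such a string is shown here. Both helpers are hypothetical, and splitting from the right (so a name may itself contain `:`) is an assumption, not documented behaviour.

```python
from typing import Tuple


def format_secondary(name: str, baseline_x1000: int, candidate_x1000: int) -> str:
    """Join the three fields with ':' in the documented order."""
    return f"{name}:{baseline_x1000}:{candidate_x1000}"


def parse_secondary(spec: str) -> Tuple[str, int, int]:
    """Split from the right so the last two fields are the integer medians."""
    parts = spec.rsplit(":", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError("expected name:baseline_median_x1000:candidate_median_x1000")
    return parts[0], int(parts[1]), int(parts[2])
```

Because the flag is repeatable, one such string would be generated per secondary metric.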

Tagging a hand-written change instead of the default agent:

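rad-experiment publish --agent-system human --agent-model "$(whoami)" --base 9b32764 --head 5574144 --metric duration_ms --baseline-median 1500 --baseline-n 5 --candidate-median 1425 --candidate-n 5 --bench-cmd 'bash ./bench/benchmark.sh' --metric-regex 'duration_ms=duration\s*:\s*([0-9.]+)\s*ms'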

JSON mode — pipe compute-delta output straight into publish:

rad-experiment compute-delta --baseline base.json --candidate cand.json --primary-metric duration_ms --criteria lower_is_better --bench-cmd 'bash ./bench/benchmark.sh' --base-commit 9b32764 --head-commit 5574144 --description "Hoist allocation" | rad-experiment publish --from-json -

Or from the pending file that compute-delta writes:

rad-experiment publish --from-json /tmp/cc-experiment-pending/5574144.json
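
If you are not using compute-delta, the "flat publish-flag shape" can be produced by a script and piped to `--from-json -`. The sketch below is illustrative only. This page names just the keys `base`, `metric`, and `baseline_median`. The remaining keys (`head`, `baseline_n`, `candidate_median`, `candidate_n`) are assumed by analogy with the flag names and should be checked against the tool before use. `flat_publish_record` is a hypothetical helper.

```python
import json


def flat_publish_record(base: str, head: str, metric: str,
                        baseline_median: int, baseline_n: int,
                        candidate_median: int, candidate_n: int) -> str:
    """Serialize a flat record; key names beyond base/metric/baseline_median are assumptions."""
    record = {
        "base": base,                          # documented key
        "head": head,                          # assumed from --head
        "metric": metric,                      # documented key
        "baseline_median": baseline_median,    # documented key
        "baseline_n": baseline_n,              # assumed from --baseline-n
        "candidate_median": candidate_median,  # assumed from --candidate-median
        "candidate_n": candidate_n,            # assumed from --candidate-n
    }
    return json.dumps(record)


payload = flat_publish_record("9b32764", "5574144", "duration_ms", 1500, 5, 1425, 5)
```

Printing `payload` to stdout and piping it into `rad-experiment publish --from-json -` mirrors the compute-delta pipeline shown above. Any field can still be overridden with the matching CLI flag.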
