
Why Reruns

At synapse time, the validator checks that the miner’s sandbox_manifest.digest equals the image_hash they declared at registration. That check gates the synapse, but it does not prove that the miner actually ran that image: a miner could run a faster or lighter container and sign the declared hash regardless.

The asynchronous rerun closes that gap. After the round, the validator pulls the registered image, runs it against the same bundle and nonce, and compares the resulting traces with the ones the miner declared. A miner who tampered with the trace data, ran a different image, or whose pipeline is non-deterministic will produce different traces and be caught.

When a Rerun Is Triggered

For each primary in a task’s verification group, the validator inspects the post-round consensus signal:
| Group state | Reruns triggered |
| --- | --- |
| Consensus broken (group split > 50/50) | Every diverging primary |
| Some primaries diverge from majority | Only diverging primaries |
| All primaries agree | One randomly selected primary (sampling) |
This balances cost (we do not rerun every primary every round) against coverage (every primary gets sampled regularly, and any primary that looked off triggers a deeper check).
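
The trigger rules above can be sketched as a small selection function. This is a minimal sketch under stated assumptions, not the validator's actual code: the `votes` mapping (hotkey to the value that primary reported) and the name `select_reruns` are hypothetical, and "consensus broken" is assumed to mean that no value holds a strict majority, in which case every primary is treated as diverging.

```python
import random
from collections import Counter


def select_reruns(votes, rng=random):
    """Return the hotkeys to rerun for one verification group.

    votes: dict mapping primary hotkey -> reported value (hypothetical shape,
    e.g. a trace hash or a verdict string).
    """
    if not votes:
        return []

    counts = Counter(votes.values())
    top_value, top_count = counts.most_common(1)[0]

    # All primaries agree: sample exactly one of them at random.
    if len(counts) == 1:
        return [rng.choice(sorted(votes))]

    # A strict majority exists: rerun only the primaries outside it.
    if top_count * 2 > len(votes):
        return sorted(h for h, v in votes.items() if v != top_value)

    # No strict majority (assumed meaning of "consensus broken"):
    # there is no trusted reference, so every primary is rerun.
    return sorted(votes)
```

Passing a seeded `random.Random` instance as `rng` makes the sampling case reproducible, which can help when testing.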

Rerun Queue

Reruns persist in a SQLite queue at ~/.phylax/rerun_queue.sqlite3 on the validator host. The schema is small:
| Field | Type | Description |
| --- | --- | --- |
| id | int | Queue position |
| enqueued_at | timestamp | When the validator added the job |
| hotkey | text | Target miner hotkey |
| skill_type | text | Task skill type |
| bundle_sha256 | text | Bundle being rerun |
| nonce | hex | Nonce to replay |
| image_uri | text | Image to pull |
| image_hash | text | Digest the miner registered |
| original_trace_hashes | json | Hashes the miner declared |
| status | enum | pending, running, done, failed |
| result | json | Comparison result, if done |
The queue survives validator restarts. If the validator goes down, pending reruns resume from where the queue left off.
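
For illustration, the table above could be created with Python's built-in `sqlite3` module roughly as follows. The column names come from the schema; the concrete SQLite types, the `CHECK` constraint standing in for the enum, storing JSON as text, and the helper names `open_queue` and `enqueue` are assumptions made for this sketch.

```python
import json
import sqlite3

# Assumed DDL: SQLite has no native enum or json type, so both are TEXT here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rerun_queue (
    id                    INTEGER PRIMARY KEY AUTOINCREMENT,
    enqueued_at           TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    hotkey                TEXT NOT NULL,
    skill_type            TEXT NOT NULL,
    bundle_sha256         TEXT NOT NULL,
    nonce                 TEXT NOT NULL,
    image_uri             TEXT NOT NULL,
    image_hash            TEXT NOT NULL,
    original_trace_hashes TEXT NOT NULL,
    status                TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'running', 'done', 'failed')),
    result                TEXT
)
"""


def open_queue(path=":memory:"):
    """Open (or create) the queue database and return a connection."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def enqueue(conn, hotkey, skill_type, bundle_sha256, nonce,
            image_uri, image_hash, trace_hashes):
    """Insert one pending rerun job and return its id."""
    cur = conn.execute(
        "INSERT INTO rerun_queue (hotkey, skill_type, bundle_sha256, nonce, "
        "image_uri, image_hash, original_trace_hashes) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (hotkey, skill_type, bundle_sha256, nonce,
         image_uri, image_hash, json.dumps(trace_hashes)),
    )
    conn.commit()
    return cur.lastrowid
```

Because the rows live in a file on disk rather than in memory, a job committed as pending is still there after a restart, which is what lets the queue resume.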

The Rerun Worker

A separate thread inside the validator container drives the queue.
1. Pull next job

SELECT * FROM rerun_queue WHERE status = 'pending' ORDER BY enqueued_at LIMIT 1

2. Pull the miner image

docker pull <image_uri>@<image_hash>

If the digest of the pulled image does not match image_hash, mark the job failed with reason digest_mismatch and apply reputation x 0.5. A mismatch means the miner is publishing a different image from the one they registered.

3. Run the image

docker run --rm \
  --network none \
  --cpus 1 --memory 1g \
  --read-only --tmpfs /tmp \
  -v <bundle_path>:/work/bundle:ro \
  -e PHYLAX_NONCE=<nonce> \
  -e PHYLAX_SKILL_TYPE=<skill_type> \
  <image_uri>@<image_hash>

The container has no network, capped resources, and a 10-minute wall-clock limit. The output is the trace bundle tar.

4. Compare

  • fs_trace_hash must equal the miner’s declared hash exactly. The canary write is deterministic, so this is a precise equality check.
  • Semantic agreement on network.jsonl, process.jsonl, and secrets.jsonl must be ≥ 0.7. Semantic agreement uses set intersection over normalised record signatures (e.g. (dst_host, dst_port) for network records).

If both checks pass, the rerun confirms the miner. If either fails, the rerun marks the miner as divergent.

5. Update reputation

| Outcome | Delta |
| --- | --- |
| Pass | +0.02 (clamped to 1.0) |
| Fail | x 0.7 |
| Digest mismatch | x 0.5 |

6. Mark done

Update the queue row, log the result, and move to the next job.
The worker runs as fast as Docker pulls and runs allow. There is no rate limit.
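
The queue-facing part of the worker (pull the next job, mark it done) might look like the sketch below. It assumes a `rerun_queue` table with at least the `id`, `enqueued_at`, `status`, and `result` columns from the schema and a connection whose `row_factory` is `sqlite3.Row`. The function names are hypothetical, and resetting interrupted `running` jobs to `pending` on startup is an assumption about how the queue resumes, not documented behaviour.

```python
import json
import sqlite3


def claim_next_job(conn):
    """Claim the oldest pending job and mark it 'running'.

    Returns the job as a dict, or None when nothing is pending.
    Expects conn.row_factory to be sqlite3.Row.
    """
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT * FROM rerun_queue WHERE status = 'pending' "
            "ORDER BY enqueued_at, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE rerun_queue SET status = 'running' WHERE id = ?",
            (row["id"],),
        )
    job = dict(row)
    job["status"] = "running"
    return job


def finish_job(conn, job_id, status, result):
    """Record the final status ('done' or 'failed') and the JSON result."""
    with conn:
        conn.execute(
            "UPDATE rerun_queue SET status = ?, result = ? WHERE id = ?",
            (status, json.dumps(result), job_id),
        )


def requeue_interrupted(conn):
    """Assumed restart behaviour: jobs left 'running' go back to 'pending'."""
    with conn:
        cur = conn.execute(
            "UPDATE rerun_queue SET status = 'pending' WHERE status = 'running'"
        )
    return cur.rowcount
```

Since a single thread drives the queue, the select-then-update claim needs no extra locking; a multi-worker design would have to make the claim atomic.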

Why the fs Trace Hash Is Exact

The canary file write inside the sandbox is echo <canary_text> > /tmp/phylax/canary-<nonce>.txt. The text, path, and timing are all deterministic given the nonce, so the normalised fs.jsonl record for this write is byte-identical across any honest pipeline. If the miner’s pipeline produced a different hash for this file, they are doing something different from what they declared: either they did not actually emit the canary write, or they tampered with the JSONL records, or they ran a different image.

The other traces (network, process, secrets) are not deterministic byte for byte. Process IDs, timestamp microseconds, and ephemeral port numbers vary between runs of the same code, hence the 0.7 semantic agreement threshold for those.
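
The difference between the exact check and the thresholded check can be illustrated with a short sketch. The source only states that semantic agreement uses set intersection over normalised record signatures; dividing the intersection by the union (Jaccard similarity) is an assumption made here, as are the function names. Only the `(dst_host, dst_port)` signature for network records comes from the text.

```python
import json


def signatures(jsonl_text, fields):
    """Reduce a JSONL trace to a set of normalised record signatures.

    Volatile fields (timestamps, PIDs, ephemeral ports) are dropped simply by
    not listing them in `fields`.
    """
    sigs = set()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        sigs.add(tuple(record.get(f) for f in fields))
    return sigs


def semantic_agreement(declared, rerun):
    """Assumed metric: |intersection| / |union| of the two signature sets."""
    if not declared and not rerun:
        return 1.0  # two empty traces agree trivially
    return len(declared & rerun) / len(declared | rerun)


def traces_match(declared_fs_hash, rerun_fs_hash, agreements, threshold=0.7):
    """fs hash must be equal exactly; every other trace must meet the threshold."""
    if declared_fs_hash != rerun_fs_hash:
        return False
    return all(score >= threshold for score in agreements.values())
```

For example, if the miner declared two network destinations and the rerun observed those two plus a third, the score under this assumed metric is 2/3, which falls below 0.7 and marks the miner as divergent.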

Sandbox Isolation

The validator runs reruns under tight constraints:
| Constraint | Value |
| --- | --- |
| --network none | No network access |
| --cpus 1 | One vCPU max |
| --memory 1g | 1 GB RAM max |
| --read-only | Root filesystem read only |
| --tmpfs /tmp | Writable tmp inside the container |
| -v ...bundle:ro | Bundle mounted read only |
| 10 minute wall clock | SIGTERM after 10 minutes |
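These constraints limit what a malicious miner image can do. With no network, it cannot reach the host network. With a read-only root filesystem and a tmpfs, it cannot persist anything. With CPU, memory, and wall-clock caps, it cannot starve the validator host. Escaping the container would require breaking Docker's isolation itself.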
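
As a sketch of how a worker might assemble these constraints programmatically, the snippet below builds the `docker run` argument list from the flags in the table and applies the wall-clock limit with `subprocess.run`. The function names are hypothetical, and how the trace bundle tar is collected from the container is not specified in this page, so the sketch only reports a status.

```python
import subprocess

WALL_CLOCK_SECONDS = 10 * 60  # 10-minute limit from the constraints table


def build_run_argv(image_uri, image_hash, bundle_path, nonce, skill_type):
    """Build the docker run argument list using the documented flags."""
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--cpus", "1", "--memory", "1g",
        "--read-only", "--tmpfs", "/tmp",
        "-v", f"{bundle_path}:/work/bundle:ro",
        "-e", f"PHYLAX_NONCE={nonce}",
        "-e", f"PHYLAX_SKILL_TYPE={skill_type}",
        f"{image_uri}@{image_hash}",
    ]


def run_rerun(argv, timeout=WALL_CLOCK_SECONDS):
    """Run the container and classify the outcome.

    Caveat: on timeout, subprocess.run kills the docker CLI process, which
    does not by itself guarantee the container stops. A real worker would
    also need to stop the container explicitly (not shown here).
    """
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "nonzero_exit"
```

Passing the arguments as a list rather than a shell string avoids shell interpretation of miner-controlled values such as the image URI.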

Failure Modes

| Failure | What the worker does |
| --- | --- |
| Image not pullable (404, private registry) | Mark failed, log, apply reputation x 0.7 |
| Digest mismatch | Mark failed, log, apply reputation x 0.5 |
| Container exits non-zero | Mark failed, log, apply reputation x 0.7 |
| Container times out | Mark failed, log, apply reputation x 0.7 |
| Trace file missing in output | Mark failed, log, apply reputation x 0.7 |
| Trace fs hash mismatch | Mark failed, log, apply reputation x 0.7 |
| Trace semantic agreement < 0.7 | Mark failed, log, apply reputation x 0.7 |
| Otherwise | Mark done, log, apply reputation +0.02 |
In all cases the result is logged and the queue row is preserved, so operators can review the rerun history of any miner after the fact.
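
The reputation effects in the table reduce to one additive rule and a set of multipliers. The sketch below encodes them; the multipliers and the +0.02 clamp come from the table, while the outcome keys (other than `digest_mismatch`, which appears as a failure reason above) and the function name are hypothetical labels chosen for illustration.

```python
PASS_BONUS = 0.02
MAX_REPUTATION = 1.0

# Multipliers from the failure table; key names are illustrative only.
FAILURE_MULTIPLIER = {
    "image_not_pullable": 0.7,
    "digest_mismatch": 0.5,
    "nonzero_exit": 0.7,
    "timeout": 0.7,
    "trace_missing": 0.7,
    "fs_hash_mismatch": 0.7,
    "semantic_below_threshold": 0.7,
}


def apply_outcome(reputation, outcome):
    """Return the new reputation after one rerun outcome."""
    if outcome == "pass":
        # Additive reward, clamped so reputation never exceeds 1.0.
        return min(MAX_REPUTATION, reputation + PASS_BONUS)
    # Every failure is multiplicative, so repeated failures compound.
    return reputation * FAILURE_MULTIPLIER[outcome]
```

The asymmetry is the point of the design: one failure at full reputation costs 0.3, while each pass restores only 0.02, so a miner needs many consecutive clean reruns to recover from a single divergence.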

Throughput

A single validator runs reruns serially. Throughput is bounded by Docker pull time plus container run time.
| Phase | Typical |
| --- | --- |
| Image pull (first time) | 10 to 60 s |
| Image pull (cached) | < 1 s |
| Container run | 30 to 90 s |
| Trace comparison | < 1 s |
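Per rerun: 30 to 90 seconds amortised, which gives a serial worker a capacity of roughly 40 to 120 reruns per hour. A validator runs about twenty rounds per hour, so the load is about twenty reruns per hour in the sampling case and more in rounds where primaries diverge. The queue stays empty under normal load. If the queue grows persistently, the operator should: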
  • Confirm Docker pulls are not blocked by network or registry rate limits
  • Consider raising WEIGHT_UPDATE_INTERVAL to slow the round cadence
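
The capacity arithmetic behind these figures can be checked with a few lines. This is a back-of-the-envelope sketch: the function names are hypothetical, and `reruns_per_round` is an assumed input, since the number of reruns a round triggers depends on how many primaries diverge.

```python
def reruns_per_hour(seconds_per_rerun):
    """Serial capacity of one worker, given the amortised time per rerun."""
    return 3600 / seconds_per_rerun


def queue_growth_per_hour(rounds_per_hour, reruns_per_round, seconds_per_rerun):
    """How many jobs per hour the queue gains when demand exceeds capacity."""
    demand = rounds_per_hour * reruns_per_round
    return max(0.0, demand - reruns_per_hour(seconds_per_rerun))
```

With twenty rounds per hour and one sampled rerun per round, even the slow case of 90 seconds per rerun keeps the queue empty. At three reruns per round and 90 seconds each, demand exceeds capacity and the queue grows.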

What’s Next

  • Probe Events: What the rerun is comparing against, derived from the nonce.
  • SSSA Schema: The sandbox_manifest and trace_hashes fields the rerun checks.
  • Reputation: How rerun outcomes feed into per-type reputation.
  • Validator Setup: Where the rerun worker fits in the validator deployment.