Sandbox Reruns

Why Reruns

At synapse time, the validator checks that the miner’s sandbox_manifest.digest equals the image_hash they declared at registration. That check gates the synapse, but it does not prove that the miner actually ran that image: they could run a faster or lighter container and sign the declared hash regardless. The asynchronous rerun closes that gap. After the round, the validator pulls the registered image, runs it against the same bundle and nonce, and compares the resulting traces with the ones the miner submitted. A miner who tampered with the trace data, ran a different image, or has a non-deterministic pipeline will produce different traces and be caught.

When a Rerun Is Triggered

For each primary in a task’s verification group, the validator inspects the post-round consensus signal:

Group state	Reruns triggered
Consensus broken (no majority; group split 50/50 or worse)	Every diverging primary
Some primaries diverge from majority	Only diverging primaries
All primaries agree	One randomly selected primary (sampling)

This balances cost (we do not rerun every primary every round) against coverage (every primary gets sampled regularly, and any primary that looked off triggers a deeper check).
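The trigger table can be sketched as a small selection function. This is a minimal illustration, not the validator's actual code: the `results` mapping (hotkey to reported trace digest) is a hypothetical shape for the consensus signal, and treating "consensus broken" as "no digest holds a strict majority, so every primary is rerun" is an assumption.

```python
import random
from collections import Counter


def select_reruns(results, rng=random):
    """Return the hotkeys to rerun for one verification group.

    `results` maps each primary's hotkey to the trace digest it reported
    (hypothetical shape; the real consensus signal may be richer).
    """
    if not results:
        return []

    counts = Counter(results.values())
    top_digest, top_count = counts.most_common(1)[0]

    # Assumption: consensus is broken when no digest has a strict majority.
    # With no reference value, every primary is treated as diverging.
    if top_count * 2 <= len(results):
        return sorted(results)

    # A majority exists: rerun only the primaries that disagree with it.
    diverging = sorted(h for h, d in results.items() if d != top_digest)
    if diverging:
        return diverging

    # Everyone agrees: sample a single primary at random.
    return [rng.choice(sorted(results))]
```

Passing a seeded `random.Random` as `rng` makes the sampling branch reproducible in tests.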

Rerun Queue

Reruns persist in a SQLite queue at ~/.phylax/rerun_queue.sqlite3 on the validator host. The schema is small:

Field	Type	Description
`id`	int	Queue position
`enqueued_at`	timestamp	When the validator added the job
`hotkey`	text	Target miner hotkey
`skill_type`	text	Task skill type
`bundle_sha256`	text	Bundle being rerun
`nonce`	hex	Nonce to replay
`image_uri`	text	Image to pull
`image_hash`	text	Digest the miner registered
`original_trace_hashes`	json	Hashes the miner declared
`status`	enum	`pending`, `running`, `done`, `failed`
`result`	json	Comparison result, if done

The queue survives validator restarts. If the validator goes down, pending reruns resume from where the queue left off.
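As a rough sketch, the schema above could be created and written to with Python's standard `sqlite3` module. The column names and status values come from the table; the SQLite column types, the `CHECK` constraint, and the `open_queue` and `enqueue` helpers are assumptions made for illustration.

```python
import json
import sqlite3
import time

# Column names follow the schema table; the SQLite types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rerun_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    enqueued_at REAL NOT NULL,
    hotkey TEXT NOT NULL,
    skill_type TEXT NOT NULL,
    bundle_sha256 TEXT NOT NULL,
    nonce TEXT NOT NULL,
    image_uri TEXT NOT NULL,
    image_hash TEXT NOT NULL,
    original_trace_hashes TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'running', 'done', 'failed')),
    result TEXT
)
"""


def open_queue(path):
    """Open (or create) the queue database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def enqueue(conn, job):
    """Insert one pending rerun job and return its queue id."""
    cur = conn.execute(
        "INSERT INTO rerun_queue (enqueued_at, hotkey, skill_type, "
        "bundle_sha256, nonce, image_uri, image_hash, original_trace_hashes) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (
            time.time(),
            job["hotkey"],
            job["skill_type"],
            job["bundle_sha256"],
            job["nonce"],
            job["image_uri"],
            job["image_hash"],
            json.dumps(job["original_trace_hashes"]),  # JSON stored as text
        ),
    )
    conn.commit()
    return cur.lastrowid
```

Because every job is committed to disk as soon as it is enqueued, a restarted validator finds the same `pending` rows it left behind.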

The Rerun Worker

A separate thread inside the validator container drives the queue.

Pull next job

SELECT * FROM rerun_queue WHERE status = 'pending' ORDER BY enqueued_at LIMIT 1
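One way to run this query from the worker thread is sketched below. The `claim_next_job` helper is hypothetical, and two details are assumptions rather than documented behaviour: the job is flipped to `running` in the same transaction, and `id` is added as a tie-breaker for jobs with equal `enqueued_at`.

```python
import sqlite3


def claim_next_job(conn):
    """Fetch the oldest pending job and mark it running, or return None."""
    conn.row_factory = sqlite3.Row
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT * FROM rerun_queue WHERE status = 'pending' "
            "ORDER BY enqueued_at, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE rerun_queue SET status = 'running' WHERE id = ?",
            (row["id"],),
        )
    job = dict(row)
    job["status"] = "running"  # reflect the update in the returned copy
    return job
```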

Pull the miner image

docker pull <image_uri>@<image_hash>

If the digest of the pulled image does not match image_hash, mark the job failed with reason digest_mismatch and apply reputation x 0.5: the miner is publishing a different image than the one they registered.
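The pull and digest check might look like the following sketch. It assumes `image_hash` has the form `sha256:<hex>` and that the local `RepoDigests` list is what gets compared; the helper names and the `pull_failed` result string are hypothetical. The `run` parameter exists only so the logic can be exercised without a Docker daemon.

```python
import json
import subprocess


def pinned_ref(image_uri, image_hash):
    """Build the digest-pinned reference used by docker pull and docker run."""
    return f"{image_uri}@{image_hash}"


def digest_matches(repo_digests, image_hash):
    """True if any RepoDigests entry ends with the registered digest."""
    return any(entry.rsplit("@", 1)[-1] == image_hash for entry in repo_digests)


def pull_and_verify(image_uri, image_hash, run=subprocess.run):
    """Return 'ok', 'pull_failed', or 'digest_mismatch'."""
    ref = pinned_ref(image_uri, image_hash)
    pulled = run(["docker", "pull", ref], capture_output=True, text=True)
    if pulled.returncode != 0:
        return "pull_failed"

    inspected = run(
        ["docker", "image", "inspect", "--format", "{{json .RepoDigests}}", ref],
        capture_output=True,
        text=True,
    )
    if inspected.returncode != 0:
        return "pull_failed"

    # RepoDigests can be null for some images; treat that as an empty list.
    digests = json.loads(inspected.stdout or "[]") or []
    return "ok" if digest_matches(digests, image_hash) else "digest_mismatch"
```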

Run the image

docker run --rm \
  --network none \
  --cpus 1 --memory 1g \
  --read-only --tmpfs /tmp \
  -v <bundle_path>:/work/bundle:ro \
  -e PHYLAX_NONCE=<nonce> \
  -e PHYLAX_SKILL_TYPE=<skill_type> \
  <image_uri>@<image_hash>

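The container runs with no network access and capped CPU and memory. The 10 minute wall clock limit does not appear in the command above, so it is enforced from outside the container, with SIGTERM sent on expiry. The output is the trace bundle tar.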
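Since the command itself carries no timeout, the worker has to enforce the 10 minute limit. A possible sketch follows: the flags mirror the command above, while the `--name` flag, the use of `docker stop` on expiry, and reading the trace tar from stdout are assumptions, not documented behaviour.

```python
import subprocess

WALL_CLOCK_SECONDS = 600  # the 10 minute limit described in the text


def build_run_argv(name, bundle_path, nonce, skill_type, image_uri, image_hash):
    """Assemble the docker run command; --name is added so it can be stopped."""
    return [
        "docker", "run", "--rm", "--name", name,
        "--network", "none",
        "--cpus", "1", "--memory", "1g",
        "--read-only", "--tmpfs", "/tmp",
        "-v", f"{bundle_path}:/work/bundle:ro",
        "-e", f"PHYLAX_NONCE={nonce}",
        "-e", f"PHYLAX_SKILL_TYPE={skill_type}",
        f"{image_uri}@{image_hash}",
    ]


def run_with_deadline(argv, name, timeout=WALL_CLOCK_SECONDS):
    """Run the container and return (status, stdout_bytes).

    status is 'ok', 'nonzero_exit', or 'timeout'.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # docker stop sends SIGTERM, then SIGKILL after its grace period.
        subprocess.run(["docker", "stop", name], capture_output=True)
        proc.communicate()  # reap the docker client process
        return "timeout", b""
    return ("ok" if proc.returncode == 0 else "nonzero_exit"), out
```

Killing only the `docker run` client would not necessarily stop the container, which is why the sketch stops the container by name.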

Compare

fs_trace_hash must equal the miner’s declared hash exactly. The canary write is deterministic, so this is a strict equality check.
Semantic agreement on network.jsonl, process.jsonl, and secrets.jsonl must be ≥ 0.7. Semantic agreement is computed by set intersection over normalised record signatures (e.g. (dst_host, dst_port) for network records).

If both pass, the rerun confirms the miner. If either fails, the rerun marks the miner as divergent.
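The two checks can be sketched as follows. The text specifies set intersection over signatures but not the denominator, so dividing by the size of the union (a Jaccard ratio) is an assumption, as are the helper names. Only the network signature `(dst_host, dst_port)` comes from the text.

```python
AGREEMENT_THRESHOLD = 0.7


def network_signature(record):
    """Signature for a network record, as given in the text."""
    return (record["dst_host"], record["dst_port"])


def agreement(submitted, rerun, signature):
    """Share of distinct signatures common to both traces (Jaccard ratio).

    Assumption: the denominator is the union of both signature sets.
    """
    a = {signature(r) for r in submitted}
    b = {signature(r) for r in rerun}
    if not a and not b:
        return 1.0  # two empty traces agree trivially
    return len(a & b) / len(a | b)


def rerun_confirms(declared_fs_hash, rerun_fs_hash, agreements):
    """True only if the fs hash matches exactly and every score clears 0.7.

    `agreements` maps a trace name (network, process, secrets) to its score.
    """
    if declared_fs_hash != rerun_fs_hash:
        return False
    return all(score >= AGREEMENT_THRESHOLD for score in agreements.values())
```

Using sets means duplicate records and record order do not affect the score, which suits traces whose volatile fields have been normalised away.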

Update reputation

Outcome	Delta
Pass	+0.02 (clamped to 1.0)
Fail	x 0.7
Digest mismatch	x 0.5
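The table translates directly into a small update function. The deltas are taken from the table; the outcome strings and the function name are hypothetical.

```python
def apply_outcome(reputation, outcome):
    """Return the new reputation after one rerun outcome."""
    if outcome == "pass":
        return min(1.0, reputation + 0.02)  # additive, clamped to 1.0
    if outcome == "fail":
        return reputation * 0.7
    if outcome == "digest_mismatch":
        return reputation * 0.5
    raise ValueError(f"unknown outcome: {outcome!r}")
```

The asymmetry is deliberate in effect: a miner at 1.0 who fails once drops to 0.7 and needs fifteen consecutive passes (0.30 / 0.02) to climb back.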

Mark done

Update the queue row, log the result, and move to the next job.

The worker runs as fast as Docker pulls and container runs allow. There is no rate limit.

Why the fs Trace Hash Is Exact

The canary file write inside the sandbox is echo <canary_text> > /tmp/phylax/canary-<nonce>.txt. The text, path, and timing are all deterministic given the nonce, so the normalised fs.jsonl record for this write is byte-identical across any honest pipeline. If the miner’s pipeline produced a different hash for this file, they are doing something different from what they declared: they did not actually emit the canary write, they tampered with the JSONL records, or they ran a different image.

The other traces (network, process, secrets) are not deterministic byte for byte. Process IDs, timestamp microseconds, and ephemeral port numbers vary between runs of the same code, hence the 0.7 semantic agreement threshold for those.
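To illustrate why an exact comparison is feasible, the sketch below hashes normalised records so that two honest runs produce the same digest. Everything specific here is an assumption: the record field names, the set of fields removed during normalisation, the canonical JSON encoding, and the use of SHA-256 are illustrative, not the documented format.

```python
import hashlib
import json

# Hypothetical fields that differ between honest runs and are dropped
# before hashing. The real normalisation rules may differ.
VOLATILE_FIELDS = {"pid", "ts"}


def normalise(record):
    """Drop run-specific fields so honest runs yield identical records."""
    return {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}


def fs_trace_hash(records):
    """SHA-256 over canonical JSON lines of the normalised records."""
    digest = hashlib.sha256()
    for record in records:
        line = json.dumps(normalise(record), sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()
```

With sorted keys and fixed separators, any change to the canary path or content changes the digest, while differences confined to the dropped fields do not.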

Sandbox Isolation

The validator runs reruns under tight constraints:

Constraint	Value
`--network none`	No network access
`--cpus 1`	One vCPU max
`--memory 1g`	1 GB RAM max
`--read-only`	Root filesystem read only
`--tmpfs /tmp`	Writable tmp inside the container
`-v ...bundle:ro`	Bundle mounted read only
10 minute wall clock	SIGTERM after 10 minutes

A malicious miner image cannot escape the container, cannot reach the host network, cannot persist anything, and cannot starve the validator host.

Failure Modes

Failure	What the worker does
Image not pullable (404, private registry)	Mark `failed`, log, apply reputation x 0.7
Digest mismatch	Mark `failed`, log, apply reputation x 0.5
Container exits non zero	Mark `failed`, log, apply reputation x 0.7
Container times out	Mark `failed`, log, apply reputation x 0.7
Trace file missing in output	Mark `failed`, log, apply reputation x 0.7
Trace fs hash mismatch	Mark `failed`, log, apply reputation x 0.7
Trace semantic agreement < 0.7	Mark `failed`, log, apply reputation x 0.7
Otherwise	Mark `done`, log, apply reputation +0.02

In all cases the result is logged and the queue row is preserved, so operators can run a post-mortem on the rerun history of any miner.
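The failure table can be read as an ordered series of checks, sketched below. The `RerunObservation` fields and all reason strings except `digest_mismatch` are hypothetical names, and the order of the checks is an assumption that simply follows the table from top to bottom.

```python
from dataclasses import dataclass


@dataclass
class RerunObservation:
    """What the worker observed during one rerun (hypothetical shape)."""
    pulled: bool = True
    digest_ok: bool = True
    timed_out: bool = False
    exit_code: int = 0
    traces_present: bool = True
    fs_hash_ok: bool = True
    min_agreement: float = 1.0


def classify(obs):
    """Return (status, reason, outcome) for one rerun.

    outcome is 'pass' (+0.02), 'fail' (x 0.7), or 'digest_mismatch' (x 0.5).
    """
    checks = [
        (not obs.pulled, "image_not_pullable", "fail"),
        (not obs.digest_ok, "digest_mismatch", "digest_mismatch"),
        (obs.timed_out, "timeout", "fail"),
        (obs.exit_code != 0, "nonzero_exit", "fail"),
        (not obs.traces_present, "trace_missing", "fail"),
        (not obs.fs_hash_ok, "fs_hash_mismatch", "fail"),
        (obs.min_agreement < 0.7, "low_semantic_agreement", "fail"),
    ]
    for failed, reason, outcome in checks:
        if failed:
            return ("failed", reason, outcome)
    return ("done", "ok", "pass")
```

Checking the timeout before the exit code matters because a container stopped by a signal also exits non-zero.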

Throughput

A single validator runs reruns serially. Throughput is bounded by Docker pull time plus container run time.

Phase	Typical
Image pull (first time)	10 to 60 s
Image pull (cached)	< 1 s
Container run	30 to 90 s
Trace comparison	< 1 s

Per rerun: 30 to 90 seconds amortised, so a serial worker can clear roughly 40 to 120 reruns per hour. A validator runs about twenty rounds per hour, producing up to twenty reruns per hour, so the queue stays empty under normal load. If the queue grows persistently, the operator should:

Confirm Docker pulls are not blocked by network or registry rate limits
Consider raising WEIGHT_UPDATE_INTERVAL to slow the round cadence
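The capacity arithmetic behind these numbers is simple enough to check by hand. The helper names below are hypothetical; the phase durations are the typical values from the table, not measurements.

```python
def reruns_per_hour(pull_s, run_s, compare_s=1.0):
    """Serial capacity: how many reruns fit into one hour."""
    return 3600.0 / (pull_s + run_s + compare_s)


def backlog_growth_per_hour(arrivals_per_hour, pull_s, run_s, compare_s=1.0):
    """Jobs added to the queue per hour beyond what the worker can clear."""
    capacity = reruns_per_hour(pull_s, run_s, compare_s)
    return max(0.0, arrivals_per_hour - capacity)
```

With a cached pull (about 1 s) and a 90 s run, capacity is 3600 / 92, or about 39 reruns per hour, so twenty arrivals per hour leave no backlog. A cold 60 s pull on every job would cut capacity to 3600 / 151, or about 24 per hour, which is why blocked or rate-limited pulls are the first thing to check.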

What’s Next

Probe Events

What the rerun is comparing against, derived from the nonce.

SSSA Schema

The sandbox_manifest and trace_hashes fields the rerun checks.

Reputation

How rerun outcomes feed into per type reputation.

Validator Setup

Where the rerun worker fits in the validator deployment.
