Why Reruns
At synapse time the validator checks that the miner’s `sandbox_manifest.digest` equals the `image_hash` they declared at registration. That check gates the synapse, but it does not verify that the miner actually ran that image. A miner could run a faster or lighter container and sign the declared hash regardless.
The async rerun closes that gap. After the round, the validator pulls the registered image, runs it against the same bundle and nonce, and compares the resulting traces.
A miner who tampered with the trace data, ran a different image, or has a non-deterministic pipeline will produce different traces and be caught.
When a Rerun Is Triggered
For each primary in a task’s verification group, the validator inspects the post-round consensus signal:

| Group state | Reruns triggered |
|---|---|
| Consensus broken (group split > 50/50) | Every diverging primary |
| Some primaries diverge from majority | Only diverging primaries |
| All primaries agree | One randomly selected primary (sampling) |
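
The table above can be sketched as a small selection function. This is only an illustration under stated assumptions: each primary is reduced to a single comparable value (here a trace hash string), "consensus broken" is read as "no value holds a strict majority", and in that case every primary is treated as diverging because there is no majority to agree with. The function name `select_reruns` is hypothetical.

```python
import random
from collections import Counter


def select_reruns(declared: dict[str, str], rng: random.Random) -> list[str]:
    """Return the hotkeys of the primaries to rerun for one verification group.

    `declared` maps each primary's hotkey to the value it is compared on
    (assumed here to be one trace hash string per primary).
    """
    if not declared:
        return []

    counts = Counter(declared.values())
    top_value, top_count = counts.most_common(1)[0]

    # All primaries agree: sample a single primary at random.
    if len(counts) == 1:
        return [rng.choice(sorted(declared))]

    # Assumption: consensus is broken when no value holds a strict majority.
    # With no majority to compare against, every primary counts as diverging.
    if top_count * 2 <= len(declared):
        return sorted(declared)

    # A majority exists: rerun only the primaries that diverge from it.
    return sorted(h for h, v in declared.items() if v != top_value)


rng = random.Random(0)
print(select_reruns({"a": "h1", "b": "h1", "c": "h2"}, rng))  # only "c" diverges
print(select_reruns({"a": "h1", "b": "h2"}, rng))             # no majority
```

Passing the random generator in explicitly keeps the sampling case reproducible in tests.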
Rerun Queue
Reruns persist in a SQLite queue at `~/.phylax/rerun_queue.sqlite3` on the validator host. The schema is small:

| Field | Type | Description |
|---|---|---|
| `id` | int | Queue position |
| `enqueued_at` | timestamp | When the validator added the job |
| `hotkey` | text | Target miner hotkey |
| `skill_type` | text | Task skill type |
| `bundle_sha256` | text | Bundle being rerun |
| `nonce` | hex | Nonce to replay |
| `image_uri` | text | Image to pull |
| `image_hash` | text | Digest the miner registered |
| `original_trace_hashes` | json | Hashes the miner declared |
| `status` | enum | `pending`, `running`, `done`, `failed` |
| `result` | json | Comparison result, if done |
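
As an illustration, the fields above could map onto a SQLite table along these lines. This is a sketch, not the validator's actual DDL: the table name `rerun_queue`, the SQLite column types, the defaults, and the helper functions `open_queue`, `enqueue`, and `next_pending` are all assumptions. The example uses an in-memory database rather than the real queue file.

```python
import json
import sqlite3

# Assumed mapping of the documented fields onto SQLite column types.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rerun_queue (
    id                    INTEGER PRIMARY KEY AUTOINCREMENT,
    enqueued_at           TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    hotkey                TEXT NOT NULL,
    skill_type            TEXT NOT NULL,
    bundle_sha256         TEXT NOT NULL,
    nonce                 TEXT NOT NULL,   -- stored as a hex string
    image_uri             TEXT NOT NULL,
    image_hash            TEXT NOT NULL,
    original_trace_hashes TEXT NOT NULL,   -- JSON document
    status                TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'running', 'done', 'failed')),
    result                TEXT             -- JSON, filled in once done
);
"""


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) the queue database."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    return conn


def enqueue(conn: sqlite3.Connection, *, hotkey: str, skill_type: str,
            bundle_sha256: str, nonce: str, image_uri: str, image_hash: str,
            original_trace_hashes: dict) -> int:
    """Insert a pending job and return its queue position."""
    cur = conn.execute(
        "INSERT INTO rerun_queue (hotkey, skill_type, bundle_sha256, nonce,"
        " image_uri, image_hash, original_trace_hashes)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (hotkey, skill_type, bundle_sha256, nonce, image_uri, image_hash,
         json.dumps(original_trace_hashes, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def next_pending(conn: sqlite3.Connection):
    """Return the oldest pending job, or None if the queue is drained."""
    return conn.execute(
        "SELECT * FROM rerun_queue WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
```

The `CHECK` constraint stands in for the enum type, which SQLite does not have natively.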
The Rerun Worker
A separate thread inside the validator container drives the queue. Each job goes through three steps:

1. Pull the miner image. If the digest of the pulled image does not match the registered `image_hash`, mark the job failed with reason `digest_mismatch` and apply reputation x 0.5. The miner is publishing a different image than they registered.
2. Run the image against the same bundle and nonce as the original round.
3. Compare the rerun traces with what the miner declared:
   - `fs_trace_hash` must equal the miner’s declared hash exactly. The canary write is deterministic, so this is a precise equality check.
   - Semantic agreement on `network.jsonl`, `process.jsonl`, and `secrets.jsonl` must be ≥ 0.7. Semantic agreement uses set intersection over normalised record signatures (e.g. `(dst_host, dst_port)` for network records).
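
To make the semantic comparison concrete, here is a minimal sketch for network records. The text only says the score uses set intersection over normalised signatures; dividing the intersection by the union (a Jaccard ratio) is an assumption, as are the helper names and the sample records.

```python
import json


def network_signature(record: dict) -> tuple:
    """Reduce a network record to the fields that should be stable across runs."""
    return (record["dst_host"], record["dst_port"])


def load_signatures(jsonl_text: str, signature) -> set:
    """Parse JSONL text into a set of normalised record signatures."""
    return {
        signature(json.loads(line))
        for line in jsonl_text.splitlines()
        if line.strip()
    }


def semantic_agreement(declared: set, rerun: set) -> float:
    """Assumed scoring: size of the intersection divided by size of the union."""
    union = declared | rerun
    if not union:
        return 1.0  # two empty traces agree trivially
    return len(declared & rerun) / len(union)


# Hypothetical traces: timestamps differ, and the miner has one extra destination.
miner_jsonl = """
{"ts": 1.001, "dst_host": "example.com", "dst_port": 443}
{"ts": 1.002, "dst_host": "example.org", "dst_port": 443}
{"ts": 1.003, "dst_host": "example.net", "dst_port": 80}
{"ts": 1.004, "dst_host": "example.edu", "dst_port": 53}
"""
rerun_jsonl = """
{"ts": 9.500, "dst_host": "example.com", "dst_port": 443}
{"ts": 9.700, "dst_host": "example.org", "dst_port": 443}
{"ts": 9.900, "dst_host": "example.net", "dst_port": 80}
"""

score = semantic_agreement(
    load_signatures(miner_jsonl, network_signature),
    load_signatures(rerun_jsonl, network_signature),
)
print(score, score >= 0.7)
```

Because timestamps are dropped from the signature, the three shared destinations match even though the raw records differ.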
Why the fs Trace Hash Is Exact
The canary file write inside the sandbox is `echo <canary_text> > /tmp/phylax/canary-<nonce>.txt`. The text, path, and timing are all deterministic given the nonce. The normalised `fs.jsonl` record for this write is byte-identical across any honest pipeline.
If the miner’s pipeline produced a different hash for this file, they are doing something different from what they declared. Either they did not actually emit the canary write, or they tampered with the JSONL records, or they ran a different image.
Other traces (network, process, secrets) are not deterministic byte for byte. Process IDs, timestamp microseconds, and ephemeral port numbers vary between runs of the same code. Hence the 0.7 semantic agreement threshold for those.
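
The difference between the two checks can be illustrated with a small hashing sketch. Everything about the record shape here is assumed for illustration: the field names, the canonical JSON serialisation, and the use of SHA-256 are not taken from the actual normalisation rules.

```python
import hashlib
import json


def trace_hash(records: list[dict]) -> str:
    """Hash records via a canonical JSON form (sorted keys, fixed separators)."""
    canonical = "\n".join(
        json.dumps(r, sort_keys=True, separators=(",", ":")) for r in records
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def canary_record(nonce: str, canary_text: str) -> dict:
    """Hypothetical normalised fs record for the canary write."""
    content = (canary_text + "\n").encode("utf-8")  # echo appends a newline
    return {
        "op": "write",
        "path": f"/tmp/phylax/canary-{nonce}.txt",
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }


# Two honest runs with the same nonce yield the same fs hash.
run_a = trace_hash([canary_record("00ff", "canary")])
run_b = trace_hash([canary_record("00ff", "canary")])
print(run_a == run_b)

# A process record that carries a PID hashes differently on every run,
# which is why those traces need a semantic comparison instead.
proc_a = trace_hash([{"cmd": "echo", "pid": 4121}])
proc_b = trace_hash([{"cmd": "echo", "pid": 4377}])
print(proc_a == proc_b)
```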
Sandbox Isolation
The validator runs reruns under tight constraints:

| Constraint | Value |
|---|---|
| `--network none` | No network access |
| `--cpus 1` | One vCPU max |
| `--memory 1g` | 1 GB RAM max |
| `--read-only` | Root filesystem read only |
| `--tmpfs /tmp` | Writable tmp inside the container |
| `-v ...bundle:ro` | Bundle mounted read only |
| 10 minute wall clock | SIGTERM after 10 minutes |
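
Put together, the flags in the table correspond to a `docker run` invocation roughly like the one this sketch assembles. The function name and both bundle paths are hypothetical: the table elides the mount path, so the caller supplies it here. The 10 minute wall clock limit is not shown and would have to be enforced by whatever launches the command.

```python
def build_rerun_argv(image_ref: str, bundle_host_dir: str,
                     bundle_container_dir: str) -> list[str]:
    """Assemble the docker command line for one sandboxed rerun."""
    return [
        "docker", "run",
        "--network", "none",   # no network access
        "--cpus", "1",         # one vCPU max
        "--memory", "1g",      # 1 GB RAM max
        "--read-only",         # root filesystem read only
        "--tmpfs", "/tmp",     # writable tmp inside the container
        # Bundle mounted read only; both paths are supplied by the caller.
        "-v", f"{bundle_host_dir}:{bundle_container_dir}:ro",
        image_ref,
    ]


# Placeholder values for illustration only.
argv = build_rerun_argv(
    "registry.example/miner@sha256:<digest>",
    "/path/on/host/bundle",
    "/bundle",
)
print(" ".join(argv))
```

Building the command as a list rather than a shell string avoids quoting problems when it is passed to a process launcher.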
Failure Modes
| Failure | What the worker does |
|---|---|
| Image not pullable (404, private registry) | Mark failed, log, apply reputation x 0.7 |
| Digest mismatch | Mark failed, log, apply reputation x 0.5 |
| Container exits non-zero | Mark failed, log, apply reputation x 0.7 |
| Container times out | Mark failed, log, apply reputation x 0.7 |
| Trace file missing in output | Mark failed, log, apply reputation x 0.7 |
| Trace fs hash mismatch | Mark failed, log, apply reputation x 0.7 |
| Trace semantic agreement < 0.7 | Mark failed, log, apply reputation x 0.7 |
| Otherwise | Mark done, log, apply reputation +0.02 |
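
The table translates directly into a lookup. The sketch below uses hypothetical names (`Outcome`, `apply_outcome`) and deliberately does not clamp the result, since this page does not say whether reputation is bounded.

```python
from enum import Enum, auto


class Outcome(Enum):
    IMAGE_NOT_PULLABLE = auto()
    DIGEST_MISMATCH = auto()
    NONZERO_EXIT = auto()
    TIMEOUT = auto()
    TRACE_MISSING = auto()
    FS_HASH_MISMATCH = auto()
    LOW_SEMANTIC_AGREEMENT = auto()
    PASSED = auto()


# Multipliers from the failure table; only a digest mismatch is penalised at 0.5.
PENALTY = {
    Outcome.IMAGE_NOT_PULLABLE: 0.7,
    Outcome.DIGEST_MISMATCH: 0.5,
    Outcome.NONZERO_EXIT: 0.7,
    Outcome.TIMEOUT: 0.7,
    Outcome.TRACE_MISSING: 0.7,
    Outcome.FS_HASH_MISMATCH: 0.7,
    Outcome.LOW_SEMANTIC_AGREEMENT: 0.7,
}
PASS_BONUS = 0.02


def apply_outcome(reputation: float, outcome: Outcome) -> tuple[str, float]:
    """Return the job status and the updated reputation for one rerun."""
    if outcome is Outcome.PASSED:
        return "done", reputation + PASS_BONUS
    return "failed", reputation * PENALTY[outcome]


print(apply_outcome(1.0, Outcome.DIGEST_MISMATCH))
print(apply_outcome(1.0, Outcome.TIMEOUT))
```

Note the asymmetry: failures scale reputation multiplicatively, while a pass adds a small fixed amount, so recovering from a single failure takes many clean reruns.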
Throughput
A single validator runs reruns serially. Throughput is bounded by Docker pull time plus container run time.

| Phase | Typical |
|---|---|
| Image pull (first time) | 10 to 60 s |
| Image pull (cached) | < 1 s |
| Container run | 30 to 90 s |
| Trace comparison | < 1 s |

If the rerun queue falls behind:
- Confirm Docker pulls are not blocked by network or registry rate limits
- Consider raising `WEIGHT_UPDATE_INTERVAL` to slow the round cadence
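
As a rough capacity estimate, the typical phase times can be turned into reruns per hour. This is simple arithmetic under the assumption that phases run strictly back to back and that each "< 1 s" phase is counted as a full second; the function name is hypothetical.

```python
def reruns_per_hour(pull_s: float, run_s: float, compare_s: float = 1.0) -> float:
    """Serial throughput: one rerun takes pull + run + compare seconds."""
    return 3600.0 / (pull_s + run_s + compare_s)


# Cached image: pull counted as 1 s, container run between 30 and 90 s.
cached_best = reruns_per_hour(pull_s=1, run_s=30)
cached_worst = reruns_per_hour(pull_s=1, run_s=90)

# First-time pull: 10 to 60 s on top of the container run.
cold_best = reruns_per_hour(pull_s=10, run_s=30)
cold_worst = reruns_per_hour(pull_s=60, run_s=90)

print(f"cached: {cached_worst:.0f} to {cached_best:.0f} per hour")
print(f"cold:   {cold_worst:.0f} to {cold_best:.0f} per hour")
```

Under these assumptions a validator with warm image caches handles roughly 39 to 112 reruns per hour, and roughly 24 to 88 when every image has to be pulled first.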
What’s Next
Probe Events
What the rerun is comparing against, derived from the nonce.
SSSA Schema
The `sandbox_manifest` and `trace_hashes` fields the rerun checks.
Reputation
How rerun outcomes feed into per-type reputation.
Validator Setup
Where the rerun worker fits in the validator deployment.