# Prior Art & Algorithm Families

Phase R of the synced-clock/synced-tick research (see `./TIMER_RESEARCH_PLAN.md`). This document
surveys the distributed-systems algorithm families that bear on our 6 hard EDGE cases and records an
explicit **accept** or **reject** for each, with reasons tied to our hard constraints:

> **Constraints (apply to every decision below):** sparse peer-to-peer, **no master / no server**,
> broadcast/attendance-friendly, **no excessive per-tick message overhead**, best-effort transport
> (drop/dup/reorder/partition), and a **monotonic in-game tick that must never run backward**.

The current baseline in this repo: `tick = floor((syncedClock.now() - epoch) / tickMs)`, where
`epoch = strict-min(epochs seen)` ("oldest-epoch-wins", a min-register CRDT) and `syncedClock` is
**Cristian's algorithm + median offset** (`test-harness/SyncedPeerClock.js`, modelling v1
`SyncedClock.js`). It solves late-join + bounded skew but fails all 6 EDGEs.

The EDGEs (full text in `../../tests/synced-tick-distributed-spec.test.js`, `../../DECISIONS.md` #37):

- **EDGE 1** — simultaneous cold-start, no founder anchor → need a deterministic tie-break (min raw epoch hijacks by hours).
- **EDGE 2** — two converged groups merge → longer-present group wins; no averaging into a "third time".
- **EDGE 3a** — small recalibration → tick stays monotonic, never rewinds.
- **EDGE 3b** — large (~1h) recalibration → no 72,000-tick lurch; ease or abandon-and-resync.
- **EDGE 4** — late joiner with a "ready" clock must yield to a longer-present group; readiness ≠ authority.
- **EDGE 5a** — a peer that slept must self-demote its stale seniority and re-bootstrap, not yank the live group.
- **EDGE 5b** — a single forged-seniority peer must not move a group of 3 that agree (out of initial scope; informs design).
- **EDGE 6** — a large legitimate correction must re-sync *simulation state*, not just snap the tick counter.

---

## Summary table

| # | Algorithm family | EDGEs it serves | Decision |
|---|------------------|-----------------|----------|
| 1 | Cristian's algorithm (clock sync) | baseline for all; offset estimation under 3a | **Accept** (offset estimation only) / **Reject** as an authority or correction mechanism (EDGE 1, 2, 3a, 4, 5b) |
| 2 | NTP / Marzullo's intersection algorithm | EDGE 5b (corroboration/outlier rejection), 5a, 4 | **Accept** as candidate (interval intersection + corroboration) |
| 3 | Berkeley algorithm (averaging) | (would target EDGE 2/group merge) | **Reject** — averaging produces a forbidden "third time" |
| 4 | Lamport timestamps & Vector clocks | EDGE 1 (tie-break), 2/4 (presence ordering), 5a | **Accept** for deterministic tie-break + presence ordering; vector clocks **Reject** as the primary carrier (cost) |
| 5 | Hybrid Logical Clocks (HLC) | EDGE 3a (monotonic), 1/4 (seniority token), 6 | **Accept** as candidate (monotonic tick + non-forgeable seniority) |
| 6 | Bully / Ring / **Spanning Tree root-bridge election** | EDGE 1, 2, 4 (STP = closest analogue); 5a | **Accept** STP-style election; **Reject** Bully and Ring (topology assumptions) |
| 7 | Virtual synchrony / gossip anti-entropy | EDGE 2 (sparse group merge), 6 | **Accept** gossip convergence; virtual synchrony **Reject** as a hard requirement (membership cost) |
| 8 | Paxos / Raft (consensus) | (would target EDGE 1/2/4 authority) | **Reject** — too heavy for per-tick P2P; **borrow** monotonic-term + lease ideas (EDGE 4/5a) |
| 9 | Attendance timeout / **phi-accrual failure detector** | **EDGE 5a** (self-demotion), 5b liveness | **Accept** as candidate (staleness → self-demote) |
| 10 | PLL/FLL loop filters, **slewing vs stepping (adjtime)** | **EDGE 3a/3b** (smooth correction, no rewind) | **Accept** slew-not-step for small; pair with re-sync for large |
| 11 | Snapshot / anti-entropy / log-replay re-bootstrap | **EDGE 3b**, **EDGE 6** (state re-sync) | **Accept** as candidate (abandon-and-resync of simulation state) |

Every EDGE appears below under at least one **accepted candidate** and at least one **rejected
alternative**, both justified against the sparse-P2P-no-master constraints.

---

## 1. Cristian's algorithm (clock synchronization)

### What it is
A client estimates the offset to a time source by timing a request/response round trip: it sends at
`t0`, the source replies with its time `Ts`, the reply arrives at `t1`, and the client estimates the
source's "now" as `Ts + (t1 - t0)/2`, assuming a symmetric path. Repeated samples are filtered (our
system takes the **median** of recent samples, `SyncedPeerClock.js`). This is exactly our current
peer-clock mechanism: each peer pings the peers the transport reports and the responder answers with
its **own synced estimate**, so samples estimate the offset to the same shared frame and the mesh
converges transitively.

### EDGEs it addresses / candidate fit
- **EDGE 3a (small recal):** Cristian's median-offset *estimation* is the right way to compute the
  small ongoing correction; the question of how to *apply* it monotonically belongs to family 10.
- It is the substrate every other family sits on — you cannot do seniority or merge without first
  estimating offsets to peers.

### Why it does or does not fit our constraints
Fits the sparse/no-master constraint well for *offset estimation*: it is pairwise, rides the existing
ping/pong, needs no master. But it has no notion of **authority**: it tells you *how far apart* two
clocks are, not *which clock the group should adopt*. Its `(t1-t0)/2` symmetry assumption leaks a
bounded residual under asymmetric latency. Critically, it has **no tie-break** — when two peers
cold-start with no anchor (EDGE 1) Cristian's gives each a valid-looking offset to the other and the
min-epoch rule on top then hijacks; and a single confidently-sampled peer (EDGE 4, EDGE 5b) produces
a "ready" estimate that Cristian's cannot tell is *late/lone* rather than *authoritative*.

### Accept / Reject
**Accept** as the offset-estimation substrate (keep it; it is `SyncedPeerClock`). **Reject** as an
*authority* or *correction-application* mechanism for **EDGE 1, EDGE 2, EDGE 3a (application half),
EDGE 4, EDGE 5b** — it estimates distance, it does not elect a frame or bound a correction.

---

## 2. NTP / Marzullo's intersection algorithm (fault-tolerant clock selection)

### What it is
NTP layers fault tolerance over Cristian-style sampling. Each source is represented as a **confidence
interval** `[offset - error, offset + error]`. **Marzullo's algorithm** (and its NTP refinement, the
"intersection algorithm") finds the smallest interval consistent with the largest number of sources —
i.e. it discards **outliers** ("falsetickers") and keeps the agreeing majority ("truechimers"). NTP
also disciplines the local clock with a feedback loop (see family 10) rather than stepping.

### EDGEs it addresses / candidate fit
- **EDGE 5b (lone forged peer):** this is the canonical fit. Treat each peer's reported time as an
  interval; a lone dissenter whose interval does not intersect the cluster of the other three is
  rejected as a falseticker. The group does not move toward it. This is *corroboration by interval
  intersection* — adopting a different frame requires multiple peers' intervals to agree.
- **EDGE 5a:** the same intersection logic lets a returning sleeper notice its own interval no longer
  intersects the live majority → a signal to self-demote (complements family 9).
- **EDGE 4:** a "ready" late joiner whose interval sits 20 min from the established cluster is, by
  intersection, the falseticker; readiness does not place it inside the majority interval.

### Why it does or does not fit our constraints
Fits well: interval intersection is cheap, runs on the samples we already collect, needs no master,
and degrades gracefully on best-effort transport (more samples → tighter intervals). The assumption it
imports is that **the honest peers are a majority** — true for EDGE 5b's 3-vs-1, but it cannot defend
against a colluding majority (that is a Byzantine problem we are explicitly not solving now). It also
decides *who is an outlier*, not *who is senior*; it must be combined with a presence/seniority signal
(families 4/6) to answer EDGE 2/4 where neither clock is "wrong", only differently-settled.

### Accept / Reject
**Accept** as a candidate for the **corroboration / outlier-rejection** rule (EDGE 5b primarily; supports
EDGE 4 and EDGE 5a). Its purpose: "do not adopt a different frame unless ≥k peers' intervals corroborate
it." **Reject** as a standalone seniority mechanism — intersection cannot by itself express "longest
present wins".

---

## 3. Berkeley algorithm (averaging) — REJECTED

### What it is
A master-coordinator polls all nodes for their clocks, computes the **average** (discarding outliers),
and tells each node how much to adjust. The group converges to the *mean* of the members' clocks rather
than to any one source.

### EDGEs it would target / candidate fit
On paper it looks relevant to **EDGE 2** (two groups meeting and needing one shared time) and **EDGE 1**
(two peers settling together) — "just average the clocks."

### Why it does NOT fit our constraints
Two independent disqualifiers:
1. **It needs a master coordinator** to run the poll-and-average round. We are explicitly no-master /
   sparse P2P. A leaderless averaging variant exists (iterative gossip averaging) but inherits the
   second, fatal problem:
2. **Averaging produces a forbidden "third time".** The EDGE 2 spec is explicit: when {A,B} (longest
   present) meets {C,D} (30 min away), the network must NOT "average into a third time" — the
   longer-present group's frame must *survive intact* (A's tick must keep its cadence, not leap ~halfway
   toward C,D). Averaging is exactly the behavior the spec forbids: it would drag *every* peer,
   including the senior group, to a brand-new mean nobody was running on, breaking tick monotonicity for
   the senior peers and discarding the "seniority decides" requirement of EDGE 2 and EDGE 4. Averaging
   also has no answer to EDGE 5b — a lone forged claim would *pull the mean* toward itself, the opposite
   of the required behavior.

### Accept / Reject
**Reject** for all EDGEs. Averaging conflicts with the core project requirement that a chosen frame
*survives* a merge rather than being blended. Recorded here precisely so we never reach for it: for
EDGE 2/EDGE 1 we want *election* (one frame wins), not *averaging* (all frames blend).

---

## 4. Lamport timestamps & Vector clocks (logical time)

### What it is
- **Lamport timestamps:** a scalar logical counter per node, incremented on every event and on receive
  set to `max(local, received)+1`. Gives a consistent **total order** when ties are broken by node id;
  captures "happened-before" but not concurrency.
- **Vector clocks:** a vector of per-node counters; can *detect* concurrency (whether two events are
  causally ordered or genuinely simultaneous), at `O(N)` space per stamp.

### EDGEs it addresses / candidate fit
- **EDGE 1 (deterministic tie-break):** the cold-start case needs symmetry-breaking with no anchor.
  Lamport's `(timestamp, id)` total order is the standard deterministic tie-break — "lowest id founds
  the frame" is a Lamport-style id tie-break, and the spec demands reproducibility (no coin flip).
- **EDGE 2 / EDGE 4 (presence ordering):** "longest present" is a *causal/ordering* claim; a
  logical-time counter that only advances while a peer is genuinely participating is a candidate carrier
  for seniority that cannot be back-dated by simply setting a small wall-clock number.
- **EDGE 5a:** vector clocks could *detect* that a returning sleeper has been concurrent-absent (its
  vector is stale relative to the live group), supporting self-demotion.

### Why it does or does not fit our constraints
Scalar Lamport timestamps fit perfectly: `O(1)`, ride the attendance, no master, deterministic. **Vector
clocks do not fit as the primary carrier**: `O(N)` per attendance violates the "no excessive per-tick
overhead" constraint for larger meshes, and membership churn complicates the vector. Logical time also
solves *ordering* but not *physical time* — it cannot by itself tell you *what tick it is now in
wall-clock terms*, so it must be combined with the physical estimate (family 1) — which is precisely
what family 5 (HLC) does.

### Accept / Reject
**Accept** scalar Lamport `(counter, id)` for **deterministic tie-break (EDGE 1)** and as the
**presence-ordering** primitive feeding seniority (EDGE 2/4/5a). **Reject** vector clocks as the
on-the-wire carrier (per-message `O(N)` cost) — keep them only as an optional offline analysis tool.

---

## 5. Hybrid Logical Clocks (HLC) — ACCEPTED CANDIDATE

### What it is
HLC combines a **physical** timestamp with a bounded **logical** counter into a single value that is
(a) **monotonically non-decreasing**, (b) close to physical time (bounded drift from it), and (c)
totally ordered with an id tie-break. On each event the physical part tracks `max(local_physical,
received_physical, wall_clock)` and the logical counter increments only to break ties when the physical
part does not advance. It gives you "a clock that never goes backward and stays near real time, with a
deterministic order built in."

### EDGEs it addresses / candidate fit
- **EDGE 3a (monotonic tick — the headline fit):** HLC is monotonic *by construction*. Deriving the
  tick from an HLC instead of a raw `now()` means a small downward offset correction cannot make the
  tick rewind — the logical counter absorbs the non-advance. This is exactly the "never run backward,
  slow down instead" requirement.
- **EDGE 1 / EDGE 4 (seniority token without a forgeable raw wall-clock):** an HLC value embeds *both*
  a physical estimate and a logical/causal component, so seniority encoded as an HLC is harder to forge
  by simply lying about the wall clock (you would have to also have legitimately advanced the logical
  component through real participation). Its built-in `(value, id)` total order gives EDGE 1 its
  deterministic tie-break for free.
- **EDGE 6:** HLC gives a monotonic, totally-ordered stamp to label re-synced state so a correction
  reorders cleanly without a backward jump.

### Why it does or does not fit our constraints
Fits the constraints strongly: `O(1)` per message (one physical + one small logical field), rides the
attendance, no master, monotonic (satisfies the sacred no-rewind invariant), deterministic order. The
honest limit: HLC keeps the clock *monotonic and ordered*, but it does **not** by itself decide a
*large* correction's policy (EDGE 3b) — a 1-hour jump still needs the abandon-and-resync of family 11 —
nor does it perform the corroboration of family 2. It is a primitive, not a complete solution.

### Accept / Reject
**Accept as a candidate** — specifically as the representation of the derived tick / seniority token so
that the **monotonic-tick requirement (EDGE 3a)** and **non-forgeable seniority (EDGE 1/4)** hold by
construction. Combine with families 6 (election), 2 (corroboration), 10 (correction policy), 11 (state).

---

## 6. Bully / Ring / Spanning Tree root-bridge election (leader/root election)

### What it is
- **Bully:** the highest-id reachable node becomes leader; a node noticing the leader is gone holds an
  election by messaging all higher ids; lots of messages, assumes every node knows the full id space.
- **Ring election:** nodes arranged in a logical ring pass an election message around accumulating the
  max id; assumes a maintained ring topology.
- **Spanning Tree Protocol (STP) root-bridge election:** every bridge advertises its `(priority, id)`;
  the lowest wins and becomes root; advertisements are **gossiped hop-by-hop**, and on a **topology
  change (merge/partition) the tree re-converges automatically** to the best root currently reachable.

### EDGEs it addresses / candidate fit
**STP root election is the closest analogue to EDGE 1, EDGE 2, and EDGE 4**, and this is the central
insight of the research:
- **EDGE 1:** STP's deterministic "lowest `(priority,id)` wins" is exactly the leaderless tie-break the
  cold-start needs — no master, reproducible, converges.
- **EDGE 2:** STP *re-converges on merge*. Model each group's frame as a root advertisement carrying
  `(presence-seniority, founder-id)`; when partitions heal, the *more-senior root wins and the younger
  group adopts it* — precisely "longest-present group wins, the other does not blend." Partition/heal in
  our `NetworkSim` maps directly onto STP topology-change re-convergence.
- **EDGE 4:** STP advertises a *priority*, not mere reachability/readiness. Encoding presence-seniority
  as the priority means a "ready but late" joiner advertises a *junior* root and yields — readiness does
  not outrank seniority.
- **EDGE 5a:** a returning sleeper re-advertises; if its seniority field has been demoted (family 9) it
  no longer wins the root election, so it adopts the live root instead of yanking it.

The deep reason STP fits: its `(priority, id)`-min election is effectively a **CRDT join** (associative,
commutative, idempotent), so re-converging after arbitrary partition/merge is correct by construction —
which is exactly the property our min-epoch rule has but with a *richer, seniority-aware* key.

### Why it does or does not fit our constraints
STP fits the sparse/no-master/gossip constraints natively (it was designed for exactly hop-by-hop
gossip with topology churn). **Bully and Ring do not fit:** Bully assumes a node knows and can message
the entire higher-id set (chatty, not sparse-friendly, and message count explodes); Ring assumes a
maintained ring overlay we do not have in a churny P2P mesh. Both also elect a *single coordinator that
then acts as a master* — reintroducing the master we are forbidden from having — whereas STP elects a
*root reference* without making it a runtime coordinator.

### Accept / Reject
**Accept** the **STP-style `(seniority, founder-id)` gossiped root election** as the core authority
mechanism for **EDGE 1, EDGE 2, EDGE 4** (and it carries EDGE 5a once the seniority field self-demotes).
**Reject** Bully and Ring — their topology/coordinator assumptions violate sparse-P2P-no-master.

---

## 7. Virtual synchrony & gossip / epidemic anti-entropy (group membership / view change)

### What it is
- **Virtual synchrony:** a group-communication model where all members observe the same sequence of
  **views** (membership changes) and messages are delivered consistently relative to view changes —
  strong guarantees, historically via a membership/sequencer service.
- **Gossip / epidemic anti-entropy:** peers periodically exchange state digests with random/known peers
  and reconcile differences; state spreads epidemically and the mesh **converges** without any
  coordinator, tolerating drop/reorder/partition.

### EDGEs it addresses / candidate fit
- **EDGE 2 (sparse group merge):** gossip anti-entropy is the natural transport for spreading the
  winning root/frame across a healed partition — each peer learns the senior frame from whichever peer
  it can reach, and it propagates transitively. This is how the STP advertisement of family 6 actually
  travels in our mesh.
- **EDGE 6:** anti-entropy is also the mechanism for spreading *state* digests so a corrected peer can
  pull the agreed simulation state (overlaps family 11).
- **EDGE 5a:** view-change awareness ("the set of peers I can confirm with changed") is the trigger
  context for self-demotion.

### Why it does or does not fit our constraints
Gossip anti-entropy fits perfectly — it *is* the sparse, no-master, partition-tolerant convergence model
our transport already resembles (attendance are a gossip channel). **Virtual synchrony as a strict
requirement does not fit:** classic virtual synchrony relies on a membership/sequencer service and
strong delivery guarantees that are expensive and master-ish over best-effort P2P; demanding totally
consistent views per message violates the "no excessive overhead" constraint. We borrow its *concept*
(a notion of "current view / who is present") to drive seniority and self-demotion, without its
machinery.

### Accept / Reject
**Accept** gossip/epidemic anti-entropy as the **propagation substrate** for the elected frame and for
state digests (EDGE 2, EDGE 6; supports EDGE 5a). **Reject** virtual synchrony as a hard requirement —
keep only the lightweight "current view" concept; its sequencer/strong-delivery machinery is too heavy.

---

## 8. Paxos / Raft (consensus) — REJECTED as too heavy (ideas borrowed)

### What it is
Leader-based consensus protocols that get a quorum of nodes to **agree on a single value / an ordered
log** despite failures. Raft elects a leader per **term** (a number that only ever increases) and
replicates a log via attendance; a leader holds a **lease** kept alive by those attendance. Both
guarantee a single agreed history with majority quorums.

### EDGEs it would target / candidate fit
On paper, "agree on one shared epoch/frame" (EDGE 1/2/4) is a consensus problem, so Paxos/Raft *look*
applicable.

### Why it does NOT fit our constraints
- **Too heavy for per-tick P2P.** Consensus requires multiple message rounds and a stable quorum per
  decision. Our frame can be perturbed by churn/merge frequently, and we must not pay multi-round
  quorum traffic on the tick path — this directly violates "no excessive per-tick message overhead". A
  full replicated log is far more machinery than "agree which epoch is the root".
- **Leader = master.** Raft's elected leader is effectively the master we are forbidden from having; a
  partitioned minority cannot even make progress (by design), which is wrong for a game that must keep
  running locally during a partition (our peers keep ticking and reconcile later).
- Liveness assumptions (a stable majority, bounded message delay for the leader lease) are stronger than
  best-effort P2P provides.

### Ideas worth borrowing (noted, per the task)
- **Monotonic term number:** Raft's "term only increases, higher term wins" is a clean, forgery-resistant
  way to order authority epochs — reusable for **EDGE 4** (an established frame carries a higher
  "term/seniority" than a fresh joiner) and to ensure a stale returning peer cannot present a *higher*
  term than the live group.
- **Lease / attendance liveness:** "authority is only valid while you keep heart-beating" maps onto
  **EDGE 5a** self-demotion — a peer that stops confirming (sleeps) lets its lease lapse and must
  re-earn standing, rather than retaining stale seniority.

### Accept / Reject
**Reject** Paxos/Raft as the mechanism (too heavy, leader-as-master, partition-intolerant for our use).
**Borrow** the monotonic-term and lease-liveness *ideas* for EDGE 4 and EDGE 5a.

---

## 9. Attendance timeouts & phi-accrual failure detector (failure detection)

### What it is
- **Attendance timeout:** declare a peer suspect if no attendance arrives within a fixed window. Simple
  but brittle — one threshold trades off false positives vs detection latency.
- **Phi-accrual failure detector:** instead of a boolean, it outputs a *suspicion level* `φ` computed
  from the **statistical distribution of recent inter-arrival times**; `φ` rises smoothly as a attendance
  becomes overdue relative to history. The application picks an action threshold, adapting automatically
  to the observed network rather than a hand-tuned timeout.

### EDGEs it addresses / candidate fit
**Phi-accrual connects directly to EDGE 5a (self-demotion):** the precise rule we need is "I have not
*confirmed my clock with anyone* for T → treat myself as unsynced and drop my seniority claim." Phi-accrual
is the principled way to compute that "T has effectively elapsed" signal: a peer tracks the arrival
statistics of *clock-corroboration* exchanges, and when `φ` against the whole group crosses a threshold
(it has heard from *no one* recently), it self-demotes — re-bootstrapping onto whatever frame the live
group is running rather than re-asserting its now-stale seniority. This is exactly the sleeper-wakes
scenario: while asleep it confirmed with nobody, so on waking its detector already says "you are out of
contact → demote." It also feeds EDGE 5b (distinguishing a live-but-lone dissenter from a real group).

### Why it does or does not fit our constraints
Fits well: it runs on the attendance/ping stream we already have, is `O(1)` state per peer, needs no
master, and *adapts* to best-effort jitter rather than assuming a fixed RTT — a good match for variable
P2P links. Limit: it detects *absence/contact-loss*, not *correctness* — it tells you "I'm out of touch",
not "my time is wrong"; pairing with family 2 (intersection) gives the "and my interval no longer agrees"
half. A plain fixed timeout is the rejected alternative here: it works but mis-tunes across heterogeneous
links (false demotions on a slow link, slow demotions on a fast one).

### Accept / Reject
**Accept** phi-accrual as the candidate **staleness/self-demotion trigger for EDGE 5a** (and a liveness
input to EDGE 5b). **Reject** a single fixed-threshold attendance timeout as the primary mechanism — it
is the brittle alternative phi-accrual replaces — though a coarse fixed timeout is an acceptable fallback.

---

## 10. PLL/FLL loop filters; slewing vs stepping (clock disciplining)

### What it is
How real systems *apply* a clock correction:
- **Stepping** jumps the clock instantly to the corrected value (what `settimeofday` / a naive
  `tick = floor((now-epoch)/tickMs)` does on a new offset) — can jump *backward*.
- **Slewing** (`adjtime`) changes the clock's *rate* slightly so it eases toward the target over time
  without ever jumping or going backward.
- **PLL/FLL loop filters** (as in NTP's local clock discipline) are feedback controllers that adjust
  *frequency* to track the source smoothly, damping jitter and avoiding overshoot.

### EDGEs it addresses / candidate fit
**Directly connects to EDGE 3 (smooth correction without rewind):**
- **EDGE 3a (small recal):** slewing is the canonical answer — apply a small offset correction by
  briefly slowing/speeding the derived tick's effective rate so it eases into agreement and **never
  rewinds**. This is the same "never go backwards, slow down instead" rule v1 `SyncedClock.js`
  already gestures at (its monotonic clamp) and that our `SyncedPeerClock` models. A PLL/FLL filter is
  the refined version that also resists jitter-induced over-correction.
- **EDGE 3b (large recal):** slewing alone is the *rejected* approach — easing a 1-hour error at a safe
  rate would take far too long (or, if rate-capped loosely, still effectively lurch). The correct answer
  for a large delta is to **stop slewing and hand off to family 11** (abandon-and-resync). So family 10
  *defines the threshold*: below it, slew; above it, re-sync.

### Why it does or does not fit our constraints
Slewing fits perfectly and is essentially mandatory given the sacred no-rewind invariant: it is local,
`O(1)`, no master, and preserves monotonicity. The constraint it imposes is a *bounded correction rate*
(the tick may run slightly slow/fast but within the spec's "~no faster than real time + slack" bound).
Its limit is the large-jump case, which is *why* EDGE 3b needs a different family.

### Accept / Reject
**Accept** slew-not-step (optionally PLL/FLL-filtered) as the correction-application mechanism for
**EDGE 3a**, and as the *threshold definition* for EDGE 3b. **Reject** stepping/instant-jump (it rewinds
or lurches) and **reject** slewing as the answer to a *large* EDGE 3b correction (too slow) — that hands
off to family 11.

---

## 11. Snapshot / anti-entropy / log-replay re-bootstrap (state reconciliation)

### What it is
Mechanisms to make a node's *application state* match its peers:
- **Snapshot transfer:** a peer sends its current state; the receiver adopts it wholesale (we already
  have a bootstrap/`MSG_BOOT` path that does this for late-join).
- **Anti-entropy:** peers exchange digests and reconcile differing state (family 7's substrate).
- **Log-replay re-bootstrap:** re-run the deterministic simulation from a known snapshot + the input
  log up to the corrected tick.

### EDGEs it addresses / candidate fit
- **EDGE 6 (the decisive one):** a large *legitimate* correction must re-sync **simulation state**, not
  just snap the tick. Because the peer had been simulating on the wrong frame, its per-player tallies /
  positions / RNG stream were computed for the wrong ticks. The spec's teeth are a **state deep-equality**
  check (`C.getState().tally === A.getState().tally`), which only passes if, after the clock jump, the
  peer **abandons its simulation and re-bootstraps from a peer snapshot** (or re-replays). Snapshot
  transfer / log-replay is the family that provides this.
- **EDGE 3b (large recal):** the partner to family 10 — once a correction is judged "too large to slew",
  the action is `catchUpStatus → 'catching-up'`, pull a snapshot, re-simulate to the corrected synced
  target, then `→ 'live'`. The repo's existing bootstrap re-sim loop is the seam to reuse.

### Why it does or does not fit our constraints
Fits: we already have a bootstrap snapshot path and a deterministic step function (so log-replay is
viable and exact). It is invoked *rarely* (only on large corrections / merges), so the one-off snapshot
cost does not violate the per-tick overhead budget. Risks to design around: snapshot size for large game
state, and **livelock** if large corrections flap (needs hysteresis/backoff — flagged in the plan's open
questions). The rejected alternative for EDGE 6 is **tick-only snapping** (or HLC/logical relabeling
alone): it makes the *counter* agree while leaving *state* wrong — it would pass a tick check and **fail**
the EDGE 6 state-equality check.

### Accept / Reject
**Accept** snapshot/log-replay re-bootstrap as the candidate for **EDGE 3b and EDGE 6** state re-sync,
reusing the existing bootstrap seam, with hysteresis to avoid re-sync flapping. **Reject** tick-only
snapping for EDGE 6 — it satisfies the tick number but not the required state agreement.

---

## EDGE → (accepted candidate, rejected alternative) coverage check

| EDGE | Accepted candidate(s) | Rejected alternative(s) |
|------|-----------------------|--------------------------|
| **1** (cold-start tie-break) | STP-style `(seniority,id)` election (6); Lamport `(counter,id)` tie-break (4); HLC ordered value (5) | Cristian min-epoch — no tie-break, hijacks (1); Berkeley averaging (3); Bully/Ring (6) |
| **2** (group merge, no blend) | STP root re-convergence (6); gossip anti-entropy propagation (7) | Berkeley averaging → forbidden "third time" (3); Cristian (1); virtual synchrony as requirement (7) |
| **3a** (small recal, monotonic) | Slewing / PLL-FLL (10); HLC monotonic tick (5) | Stepping / raw `now()`-derived tick (10); Cristian application half (1) |
| **3b** (large recal, no lurch) | Abandon-and-resync via snapshot/log-replay (11); threshold from slew policy (10) | Slewing a 1h error — too slow (10); Berkeley (3) |
| **4** (seniority > readiness) | STP seniority-as-priority (6); Raft monotonic-term idea (8) | Cristian "ready" single-sample as authority (1); full Paxos/Raft — too heavy (8) |
| **5a** (sleeper self-demotes) | Phi-accrual staleness trigger (9); STP demoted re-advertise (6); Raft lease-lapse idea (8) | Static oldest-wins seniority / Bully (6); fixed-threshold timeout as primary (9) |
| **5b** (lone forged peer) | Marzullo intersection + corroboration quorum (2); phi-accrual liveness (9) | Cristian single-sample trust (1); Berkeley — mean chases the dissenter (3) |
| **6** (re-sync state) | Snapshot / log-replay re-bootstrap (11); gossip state digests (7); HLC stamping (5) | Tick-only snap (11); clock disciplining alone (10) |

Every EDGE has at least one accepted candidate and at least one rejected alternative, each justified
against the sparse-P2P / no-master / no-excessive-overhead / monotonic-tick constraints.

---

## Synthesis (feeds Phase D)

The shape that prior art points to — to be *designed and falsified*, not yet decided:

1. **Authority = an STP-style gossiped root election** over a `(seniority, founder-id)` key (families 6,
   7), generalizing today's min-epoch into a seniority-aware, CRDT-join key. Carries EDGE 1/2/4.
2. **Seniority measured/encoded so it is monotonic and not trivially forgeable** — an HLC-like value
   (family 5) advanced by genuine participation, with a Raft-style monotonic-term flavor (family 8),
   and **self-demoting** via a phi-accrual staleness detector (family 9). Carries EDGE 4/5a.
3. **Corroboration before adopting a foreign frame** — Marzullo interval intersection requiring ≥k
   agreeing peers (family 2), so a lone/forged claim cannot move a healthy group. Carries EDGE 5b.
4. **Correction policy split by magnitude** — slew small (family 10), abandon-and-resync large via the
   existing snapshot/bootstrap path (family 11), HLC keeping the tick monotonic throughout (family 5).
   Carries EDGE 3a/3b/6.
5. **Explicitly NOT averaging** (family 3 rejected) and **NOT full consensus** (family 8 rejected) — the
   frame is *elected and survives*, not blended or quorum-logged.

Open questions (cost of the election key on the attendance, exact small/large threshold, anti-flap
hysteresis, CRDT-framing of the seniority join) are carried in `./TIMER_RESEARCH_PLAN.md` §4.
