# Sparse Checkpoint Protocol (§13.3.2 — refines §13.3.1)

Co-designed 2026-06-08. This refines the §13.3.1 checkpoint-reliability layer so checkpoint traffic is
**sparse and content-driven** instead of one item per grid tick: a node that is idle (or agreed) sends
**nothing**, and a divergence triggers a re-send of exactly the window needed to repair it.

It REPLACES the slice-4c content-keyed dirty gate. That gate collided when a peer's state repeated across
ticks: an idle peer at the same sum has identical content hashes at different ticks, so an ack of an
earlier checkpoint wrongly cleared a later, divergent one, and the canonical holder fell silent before the
divergence finalized.

---

## 1. Substrate (unchanged): the snapshotInterval grid

Keep the existing `snapshotInterval` grid as the **state-hash + rollback-snapshot substrate**: a state hash is
recorded at every grid tick (it hashes finalized STATE, not the input log — identical state ⇒ identical
future). Checkpointing is **finalized-only**: a checkpoint never covers state inside the grace window.

The grid stays ALIGNED across nodes (cheap, lets any node compute its own hash at any grid tick). What
becomes sparse and per-node is the **sendable checkpoint** on top of it.
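A minimal sketch of the substrate, assuming state is a JSON-serializable dict and the grid starts at tick 0 on every node (which is what keeps it aligned). `grid_tick_at_or_before`, `state_hash`, and `GridHashes` are hypothetical names. The points illustrated are that the hash covers canonicalized state only, so no input history enters it, and that grid ticks still inside the grace window are not recorded.

```python
import hashlib
import json


def grid_tick_at_or_before(tick: int, snapshot_interval: int) -> int:
    """Largest grid tick <= tick (the grid starts at 0 on every node)."""
    return (tick // snapshot_interval) * snapshot_interval


def state_hash(state: dict) -> str:
    """Hash of finalized STATE only: the input log is not part of it."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class GridHashes:
    """Per-node record of state hashes at aligned grid ticks (hypothetical helper)."""

    def __init__(self, snapshot_interval: int):
        self.snapshot_interval = snapshot_interval
        self.hashes: dict[int, str] = {}

    def record(self, tick: int, state: dict, latest_finalized: int) -> None:
        # Only grid ticks, and only finalized ones: never inside the grace window.
        if tick % self.snapshot_interval != 0 or tick > latest_finalized:
            return
        self.hashes[tick] = state_hash(state)
```

Because the grid is aligned, any node can ask for its own hash at any grid tick a peer names.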

## 2. The checkpoint (a content-triggered chain link)

A checkpoint is emitted only when **finalized state has changed** since the last one. Trigger:

- State changes at tick `T` → `G_prev` = the grid tick at/before `T`.
- When `G_prev + checkpointInterval` **finalizes**, emit one link covering `[G_prev, G_prev+checkpointInterval]`,
  batching every change in that window:

```
checkpoint = { seq, startTick, endTick, startHash, endHash,
               stateHashes: [grid hashes over (startTick, endTick]],
               inputs: [ { author, tick, intent? } over (startTick, endTick] ],  // (source,tick) REF always;
                                                                                 // intent only if receiver lacks it
               selfAsserted: { "author@tick": bool } }   // §13.6, for the holder's OWN inputs
```

`seq` is a monotonic per-sender stamp. The link carries the fine grid `stateHashes` so a receiver can locate
divergence to a tick, and the **(source,tick) reference for every input in the window** — the input set behind
those hashes, which `resolve()` compares (load-bearing, §6). `intent` (the value) is **delta-encoded**: carried
only for inputs the receiver lacks (and corroborated — honest-send). Because checkpoint windows are
content-driven, **two nodes' windows do NOT line up** — node A makes a link where IT changed, B where IT
changed. Comparison is built to not assume alignment.
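The link can be modeled as a plain record. This is a hypothetical Python shape with the struct's fields snake_cased, assuming hashes are strings and `intent` is `None` wherever delta-encoding omitted the value. `hash_at` returns `None` outside the link, which is how a receiver can tell that two misaligned windows do not overlap at a tick.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InputRef:
    author: str
    tick: int
    intent: Optional[object] = None  # delta-encoded: None when the receiver already holds it


@dataclass
class Checkpoint:
    seq: int                      # monotonic per-sender stamp
    start_tick: int
    end_tick: int
    start_hash: str
    end_hash: str
    state_hashes: dict            # grid tick -> hash, over (start_tick, end_tick]
    inputs: list = field(default_factory=list)          # a ref for EVERY input in the window
    self_asserted: dict = field(default_factory=dict)   # "author@tick" -> bool (§13.6)

    def hash_at(self, tick: int) -> Optional[str]:
        """Hash this link asserts at a grid tick, or None if the tick is outside it."""
        if tick == self.start_tick:
            return self.start_hash
        if self.start_tick < tick <= self.end_tick:
            return self.state_hashes.get(tick)
        return None

    def refs(self) -> set:
        """The (source, tick) input set behind the hashes."""
        return {(i.author, i.tick) for i in self.inputs}
```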

Idle game ⇒ no change ⇒ no link ⇒ **silent**.
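The trigger can be sketched as grouping change ticks into windows. The batching rule here is an assumption the text does not pin down: a change at or before the open window's end tick joins that link, and a later change opens a new window at its own `G_prev`. `plan_links` and `due_links` are hypothetical names.

```python
def plan_links(change_ticks, snapshot_interval, checkpoint_interval):
    """Group finalized state-change ticks into checkpoint windows.

    Returns [(start_tick, end_tick, [change ticks batched into that link]), ...].
    Assumption: a change at or before the open window's end tick is batched
    into it; a later change opens a new window at its own G_prev.
    """
    windows = []
    for t in sorted(set(change_ticks)):
        if windows and t <= windows[-1][1]:
            windows[-1][2].append(t)  # batched into the open link
            continue
        g_prev = (t // snapshot_interval) * snapshot_interval  # grid tick at/before t
        windows.append((g_prev, g_prev + checkpoint_interval, [t]))
    return windows


def due_links(windows, latest_finalized):
    """A link is emitted only once G_prev + checkpointInterval has finalized."""
    return [w for w in windows if w[1] <= latest_finalized]
```

With no change ticks the plan is empty, which is the silent idle case.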

## 3. Per-peer watermark: `agreeUntil[peer]`

Each node stores, per peer P, `agreeUntil[P]` = the most recent finalized tick at which it has **confirmed**
its state equals P's. It is durable (the send/dirty gate reads it between comparisons) and raised only by
**confirmed agreement**, never by an idle assumption (§5 lists the only cases in which it may fall). Also store
`lastSeenSeq[P]`, the highest checkpoint `seq` accepted from P (§4).
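A small sketch of the per-peer store, assuming string peer ids and JSON as the durable form. `PeerWatermarks`, `accept_seq`, `to_json`, and `from_json` are hypothetical; the real storage backend is not specified here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PeerWatermarks:
    """Per-peer agreeUntil[P] and lastSeenSeq[P] (hypothetical store)."""
    agree_until: dict = field(default_factory=dict)    # peer id -> confirmed-agreement tick
    last_seen_seq: dict = field(default_factory=dict)  # peer id -> highest accepted seq

    def accept_seq(self, peer: str, seq: int) -> bool:
        """Record seq if newer than anything seen from peer; False means stale."""
        if seq <= self.last_seen_seq.get(peer, -1):
            return False
        self.last_seen_seq[peer] = seq
        return True

    def to_json(self) -> str:
        # Durable form: the send/dirty gate reads these between comparisons.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PeerWatermarks":
        data = json.loads(text)
        return cls(agree_until=data["agree_until"], last_seen_seq=data["last_seen_seq"])
```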

## 4. Comparison — walk BACK, anchor on the most recent match

We only care that the **most recent finalized state agrees** — not that the histories match. Different
histories that reconverge are equivalent (identical state ⇒ identical future), so an old divergence that has
since washed out is irrelevant.

```
on receive checkpoint cp from P:
    if cp.seq <= lastSeenSeq[P]: return            // stale: only accept a higher seq
    lastSeenSeq[P] = cp.seq
    e = the grid tick at/before min(cp.endTick, myLatestFinalized)
    if myHash(e) == cp.hashAt(e):                  // TOP matches → converged at P's latest
        agreeUntil[P] = max(agreeUntil[P], e)      //   rise or keep, NEVER lower
        return
    // TOP mismatches → genuine current divergence: find the anchor
    lo = max(cp.startTick, myWindowBottom)
    for each grid tick T from e - snapshotInterval down to lo:   // hashes exist only on the grid
        if myHash(T) == cp.hashAt(T):
            agreeUntil[P] = T                      //   set to most-recent match; may LOWER (real divergence)
            resolve(S = T, mine, cp) over (T, e]   //   §13.6.1 canon resolve of the divergent tail only
            return
    // walked to lo with no match → no anchor in the overlap
    if lo == myWindowBottom and myHash(lo) != cp.hashAt(lo):   // cp reaches my bottom and it ACTUALLY diverges
        request B8 state transfer                  //   unrecoverable: nothing older to restore to
    // else: cp starts above my bottom, so there is no overlap low enough to decide; wait for more data
```

Key properties this gives **for free**:
- **Re-convergence:** if the top matches, done — the old divergence is never looked at (no wasted rollback).
- **The "C-trap":** if B reconciled tick 25 (because of peer C) and a link from A that predates that reconcile
  (fresh `seq`, old content) arrives mismatching at 30, the backward walk finds B's most-recent match below the
  change (≈24) → replay, never a false transfer.
- **Minimal resolve range:** only `(mostRecentMatch, latest]`.
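A runnable sketch of the comparison, assuming `startTick`, the window bottom, and every hashed tick lie on the aligned grid, and that a missing hash never counts as a match. `Link`, `on_receive`, and the `(action, tick)` return are hypothetical; `resolve()` and the B8 transfer are left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Link:
    seq: int
    start_tick: int
    end_tick: int
    hashes: dict  # grid tick -> hash over [start_tick, end_tick]; start_tick holds startHash


def on_receive(cp, peer, wm, mine, latest_finalized, window_bottom, interval):
    """One comparison step. Returns (action, tick); caller runs resolve() on "resolve".

    wm is {"agree_until": {}, "last_seen_seq": {}}; mine maps my grid ticks to hashes.
    """
    def same(t):
        return t in mine and t in cp.hashes and mine[t] == cp.hashes[t]

    def diverges(t):
        return t in mine and t in cp.hashes and mine[t] != cp.hashes[t]

    if cp.seq <= wm["last_seen_seq"].get(peer, -1):
        return ("stale", None)  # only a higher seq is accepted
    wm["last_seen_seq"][peer] = cp.seq

    top = min(cp.end_tick, latest_finalized)
    e = (top // interval) * interval  # grid tick at/before the shared top
    if same(e):
        # Converged at P's latest: rise or keep, never lower.
        wm["agree_until"][peer] = max(wm["agree_until"].get(peer, e), e)
        return ("agree", e)

    lo = max(cp.start_tick, window_bottom)
    for t in range(e - interval, lo - 1, -interval):  # walk back one grid tick at a time
        if same(t):
            wm["agree_until"][peer] = t  # most-recent match; may lower (real divergence)
            return ("resolve", t)  # resolve only the tail (t, e]
    if lo == window_bottom and diverges(lo):
        return ("transfer", None)  # my window bottom is seen to disagree
    return ("wait", None)  # link starts above my bottom: not enough overlap to decide
```

In the first scenario of the checks below, the split at ticks 20 and 30 is never examined because the top matches; the C-trap scenario anchors at the grid tick below the change instead of escalating.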

## 5. Watermark update rules (the subtle part)

- **Rise gate = top match.** `agreeUntil` only RISES (`max`) when the freshest checkpoint's top hash
  matches — we are converged at the peer's latest finalized state. A matching-top checkpoint can never lower it.
- **Fall gate = fresh top-mismatch.** `agreeUntil` only FALLS when a **fresh** (higher-seq) checkpoint's top
  hash mismatches — a real current divergence; it falls to the most-recent match below. A stale (lower-seq)
  checkpoint is dropped, so an old-but-since-reconverged divergence can't drag it down.
- **Self-reconcile retreat.** When THIS node reconciles its own finalized state (canon drops/adds an input,
  rewriting state from tick `C` forward), its prior agreement with EVERY peer below `C` is invalid →
  `agreeUntil[P] = min(agreeUntil[P], C)` for all P (and it is now dirty to all peers from `C` forward).
- **Ack advances over the confirmed prefix.** When P acks my checkpoint `[agreeUntil, X]`: if my data for
  that window is unchanged → advance to `X`; if I changed it at tick `C` in the meantime (`agreeUntil < C ≤ X`)
  → advance to `C` only (below `C` is still exactly what P acked) and re-dirty `[C, latest]`.

Invariant: **`agreeUntil` rises monotonically except on a fresh top-mismatch or a self-reconcile.**
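The four update rules can be sketched as small pure functions over integer ticks. The names are hypothetical; `changed_at` stands for the tick `C` at which this node changed the acked window, or `None` if it did not.

```python
def rise_on_top_match(agree_until: int, top: int) -> int:
    """Rise gate: a matching top can only raise (or keep) the watermark."""
    return max(agree_until, top)


def self_reconcile(agree_until: dict, c: int) -> None:
    """This node rewrote its own finalized state from tick c: retreat for EVERY peer."""
    for peer in agree_until:
        agree_until[peer] = min(agree_until[peer], c)


def advance_on_ack(agree_until: int, x: int, changed_at=None):
    """P acked my checkpoint [agree_until, x].

    Returns (new watermark, tick to re-dirty from, or None).
    """
    if changed_at is not None and agree_until < changed_at <= x:
        # Below changed_at is still exactly what P acked; re-dirty [changed_at, latest].
        return changed_at, changed_at
    return max(agree_until, x), None
```

The fall gate is not a separate function here: it is the walk-back assignment to the most recent match in the comparison step.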

## 6. Send / dirty rule (where silence comes from)

`agreeUntil[P]` is the **agreement watermark, NOT the send cursor** — the send is driven by the sparse DELTA
P lacks, never by the `[agreeUntil, latest]` range (which would over-send the whole retained window after a
long idle).

- **Dirty to P** iff `_known[P]` (the §10.1/§10.2 per-peer holdings ledger, plus the liveness/beat knowledge)
  is **missing an input or liveness change in `(agreeUntil[P], latest]`**. This is *literally the input-wire
  forwarding condition* — the checkpoint dirty-bit and the §10.2 input-forwarding dirty-bit are the **same**
  signal. Gating on inputs+liveness (not "my state hash moved") is essential: a game whose state animates
  every tick is still deterministic, so only inputs/liveness can break agreement → a constantly-animating
  game with no inputs stays **silent**.
- **The send** covers `[anchor, latest]` where `anchor` = the most recent finalized grid tick whose hash is
  still the stable pre-change value (≈ one `snapshotInterval` before the change burst). Its `startHash` is the
  stable hash P already agreed on, so P — idle at that hash — recognizes the anchor instantly and applies just
  the change. The send carries: the grid hashes, the **(source,tick) reference for EVERY input in the window**
  (the input set behind the hashes — load-bearing, see below), the **intent values only for inputs P lacks**
  (corroborated, honest-send — the §13.3 delta-encoding), and the `selfAsserted` bits. If P does not recognize
  the recent anchor (it diverged deeper), it disagrees at the bottom → narrow the window downward toward the
  real anchor, or transfer if it aged out.
- **Ack** (only when this message ADVANCED P's `agreeUntil`): P replies `{ tick, hash, seq }`. The original
  sender, on receiving the ack, advances its own `agreeUntil[P]` to `tick` iff its hash at `tick` still matches.
  Idle + agreed ⇒ nothing advances ⇒ no ack ⇒ **literal zero traffic**. No periodic attendance (a *very* slow
  one may be added later as a non-determinism safety net, since the inputs-gate alone can't surface a
  same-inputs/different-state sim bug).
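The dirty test and the send payload can be sketched together. This assumes `_known[P]` can be modeled as a set of `(source, tick)` refs covering both inputs and liveness changes, and that corroboration is a known set of refs; `is_dirty` and `build_send` are hypothetical names.

```python
def is_dirty(events, known_by_peer, agree_until, latest):
    """Dirty to P iff P lacks an input/liveness event in (agree_until, latest].

    events: set of (source, tick) refs for my inputs and liveness changes.
    A state hash that merely animates is not an event, so it never dirties.
    """
    return any(agree_until < tick <= latest and (src, tick) not in known_by_peer
               for src, tick in events)


def build_send(window_inputs, known_by_peer, corroborated, grid_hashes, self_asserted):
    """Payload for [anchor, latest]: a ref for EVERY input, intent only where P lacks it.

    window_inputs: {(author, tick): intent}; corroborated: refs honest-send may ship.
    """
    inputs = []
    for ref, intent in sorted(window_inputs.items()):
        entry = {"author": ref[0], "tick": ref[1]}
        if ref not in known_by_peer and ref in corroborated:
            entry["intent"] = intent  # delta-encoded value
        inputs.append(entry)
    return {"stateHashes": dict(grid_hashes), "inputs": inputs,
            "selfAsserted": dict(self_asserted)}
```

An uncorroborated own input that P lacks makes the holder dirty, yet its value is withheld, so the send carries only the ref and hashes.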

**Why the references for inputs P already holds.** The (source,tick) refs ARE the input set `resolve()`
compares. If A's hash was computed over `{X,Y,Z}` and B has since **dropped Y** (a canon resolution with a
third peer), only the refs tell B the difference is specifically `Y`. A dropped input *below* the anchor is
caught by the anchor `startHash` itself mismatching, so refs-in-window + anchor-hash cover the whole set.
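From B's side, the `{X,Y,Z}` example reduces to a set comparison. A hypothetical helper, assuming refs are `(source, tick)` tuples and the two `startHash` values come from the same anchor tick:

```python
def locate_difference(my_start_hash, peer_start_hash, my_refs, peer_refs):
    """Say where two nodes' input sets differ for one checkpoint window.

    A difference below the anchor shows up as a startHash mismatch; inside the
    window, the (source, tick) refs name the exact input.
    """
    if my_start_hash != peer_start_hash:
        return ("below-anchor", None)
    return ("in-window", {"peer_only": peer_refs - my_refs,
                          "mine_only": my_refs - peer_refs})
```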

A divergence makes even an unchanged peer dirty: an uncorroborated own input is "lacked" by P (it reached no
one → not in `_known[P]`) → the holder is dirty → it ships **hashes-only** (honest-send withholds the
uncorroborated value); the hashes surface the split, P replies, the holder `resolve()`s and drops it.

### 6.1 Tombstones — never resurrect a canon-dropped input

A canon reconcile that DROPS an input tombstones it (§13.3). The `_known`-driven forward then has a trap: P
"lacks" a tombstoned input both when it *never saw* it AND when it *dropped it as void* — and re-forwarding /
re-accepting it would **resurrect** the void input and oscillate. So the forward and the checkpoint-apply
paths must treat a **tombstone as distinct from absence**: never re-ship an input P has tombstoned, and never
re-accept an input THIS node has tombstoned. Tombstones are retained to the same history floor as the inputs
they shadow (a tombstone that ages out can no longer protect against resurrection — by then the input is
below the window and only a transfer applies anyway).
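A sketch of keeping tombstones distinct from absence. It assumes the sender can see P's tombstones as a separate set alongside `_known[P]` (that ledger shape is an assumption), and that the history floor is a tick. All names are hypothetical.

```python
def should_forward(ref, known_by_peer, tombstoned_by_peer):
    """Forward only if P never saw the input: a tombstone is NOT absence."""
    return ref not in known_by_peer and ref not in tombstoned_by_peer


def should_accept(ref, my_inputs, my_tombstones):
    """Never re-accept an input this node canon-dropped (no resurrection)."""
    return ref not in my_inputs and ref not in my_tombstones


def prune_tombstones(tombstones, history_floor):
    """Tombstones are retained to the same floor as the inputs they shadow."""
    return {ref for ref in tombstones if ref[1] >= history_floor}
```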

## 7. Transfer escalation (§13.7)

Replay-repair needs an anchor `S` that is still in the retained history window. The backward walk requests a
**B8 state transfer** precisely when it reaches its window bottom and that bottom **actually diverges** from
the peer's hash there — there is no older agreeing state to restore to. `agreeUntil < windowBottom` alone is
not enough; the bottom must be *seen* to disagree (otherwise it is just a lagging watermark, not a real
unrecoverable split). Reliable checkpoint delivery guarantees a real change always surfaces as a checkpoint;
if it surfaces after its anchor aged out, this is where it is caught.
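The escalation choice can be isolated as a sketch. `repair_action` is hypothetical; it assumes the walk reports its anchor (or `None`) and that the peer's hash at the window bottom is `None` when no received link covers that tick.

```python
def repair_action(anchor, window_bottom, my_bottom_hash, peer_bottom_hash):
    """Pick replay vs B8 transfer vs wait for one peer.

    anchor: most recent agreeing grid tick found by the walk, or None.
    peer_bottom_hash: the peer's hash at window_bottom, or None if unseen.
    """
    if anchor is not None and anchor >= window_bottom:
        return "replay"  # restore the snapshot at anchor, re-simulate the tail
    if peer_bottom_hash is not None and peer_bottom_hash != my_bottom_hash:
        return "transfer"  # bottom seen to disagree: nothing older to restore to
    return "wait"  # a lagging watermark alone is not an unrecoverable split
```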

## 8. The §13.6 decision under sparse windows (already built)

The canon decision (`resolve(S, mine, peers)`) is unchanged EXCEPT the tie-break: with misaligned windows two
checkpoints' corroborated SETS span different ranges, so a content-hash tie-break is not a stable cross-node
key. The tie-break is **holder id** (`(score desc, holderId asc)` — a global, window-independent total order;
still cycle-free, the score being the absolute scalar). [Done in `CanonDecision.better()`.]
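The tie-break order can be written as a sort key. This is a hypothetical mirror of the ordering, not the code in `CanonDecision.better()`; it assumes a numeric score and string holder ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    holder_id: str
    score: float  # absolute scalar from the §13.6 decision


def rank_key(c: Candidate):
    """(score desc, holderId asc): a window-independent total order."""
    return (-c.score, c.holder_id)


def better(a: Candidate, b: Candidate) -> bool:
    """True if a outranks b under the order above."""
    return rank_key(a) < rank_key(b)
```

Since the key uses no window content, two nodes holding differently spanned checkpoints still pick the same winner.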

---

## Worked examples

- **Void straggler.** A has `+9` (state 180,…); B is `∅`. They never reconverge. The top of A's link mismatches
  B's; B walks back to tick 0 (initial states match) → anchor 0 → `resolve((0, latest])` → A's `A@0` is
  uncorroborated → canon `∅` → A drops it. Next compare: tops match → `agreeUntil = latest` → silent.
- **Steady state.** Both idle, tops match every comparison (and no new links emitted) → `agreeUntil` sits at
  the top, send window empty → zero traffic.
- **Re-convergence.** Diverge at 20, reconverge by 40, idle to 60. Top (60) matches → `agreeUntil = 60`,
  the 20–40 split never examined.
- **The C-trap.** B reconciles 25 (via C); a link from A that predates that reconcile (fresh `seq`, old content)
  arrives mismatching at 30. Backward walk finds B's match ≈24 (in-window) → replay, not transfer.
  `agreeUntil` stayed ≥ its old confirmed value because B's change was above it.
